The Relevance of G-Quadruplexes in Gene Promoters and the First Introns Associated with Transcriptional Regulation in Breast Cancer

Shu, Huiling; Xiao, Ke; Zhu, Wenyong; Zhang, Rongxin; Tao, Tiantong; Sun, Xiao

doi:10.3390/ijms26146874


by Huiling Shu †, Ke Xiao †, Wenyong Zhu, Rongxin Zhang, Tiantong Tao and Xiao Sun *

State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing 211189, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Int. J. Mol. Sci. 2025, 26(14), 6874; https://doi.org/10.3390/ijms26146874

Submission received: 10 June 2025 / Revised: 10 July 2025 / Accepted: 15 July 2025 / Published: 17 July 2025

(This article belongs to the Special Issue Molecular Research of Multi-omics in Cancer)


Abstract

The role of G-quadruplexes (G4s) in gene regulation has been widely documented, especially in gene promoters. However, the transcriptional mechanisms involving G4s in other regulatory regions remain largely unexplored. In this study, we integrated the G4-DNA data derived from 22 breast cancer patient-derived tumor xenograft (PDTX) models and MCF7 cell line as potential breast cancer-associated G4s (BC-G4s). Genome-wide analysis showed that BC-G4s are more prevalent in gene promoters and the first introns. The genes accommodating promoter or intronic BC-G4s show significantly higher transcriptional output than their non-G4 counterparts. The biased distribution of BC-G4s in close proximity to the transcription start site (TSS) is associated with an enrichment of transcription factor (TF) interactions. A significant negative correlation was detected between the G4–TF interactions within the first introns and their cognate promoters. These different interactions are complementary rather than redundant. Furthermore, the differentially expressed genes (DEGs) harboring promoter and first intron BC-G4s are significantly enriched in the cell cycle pathway. Notably, promoter BC-G4s of DEGs could be a central hub for TF–TF co-occurrence. Our analysis also revealed that G4-related single nucleotide variants (SNVs) affect the stability of G4 structures and the transcription of disease-related genes. Collectively, our results shed light on how BC-G4s within promoters and first introns regulate gene expression and reinforce the critical role of G4s and G4-related genes in breast cancer-associated processes.

Keywords: G-quadruplexes; the first intron; transcriptional control; transcription factor binding; breast cancer

1. Introduction

DNA G-quadruplexes (G4s) are non-canonical four-stranded structures formed in guanine-rich regions through the association of four G-tracts [1]. Putative G4-forming sequences (pG4s) are prevalent in functionally important genomic regions, especially around transcription start sites (TSSs) in the human genome [2,3]. A typical G4 motif is represented as G_xN_{1–7}G_xN_{1–7}G_xN_{1–7}G_x (where x ≥ 3 and N can be any base), i.e., four G-tracts separated by three loops of 1–7 nucleotides [4]. Computational analyses indicate that over 700,000 regions in the human genome have the potential to fold into G4 structures [5]. A number of studies have shown that G4s are implicated in various essential cellular processes, including transcriptional regulation, chromatin remodeling, alternative splicing, and genome maintenance [6,7,8,9,10,11].
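As an illustration of the canonical motif, the following minimal sketch scans a sequence with a regular expression. It assumes the usual reading of the motif as four G-tracts (x ≥ 3) separated by three loops of 1–7 nucleotides; the function names are hypothetical, and this simple pattern is not the prediction method used in this study.

```python
import re

# Canonical motif: four G-tracts (>= 3 Gs each) separated by three loops of 1-7 nt.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_pg4(seq):
    """Return (start, end, motif) for non-overlapping matches on the given strand."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    example = "TTGGGAGGGTGGGAGGGTT"
    print(find_pg4(example))                      # forward strand
    print(find_pg4(reverse_complement(example)))  # reverse strand
```

Scanning the reverse complement as well covers C-rich stretches that encode a G4 on the opposite strand.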

The impact of G4 structures on transcriptional activity has been widely discussed. Numerous bioinformatic studies have supported the idea that G4s are predominantly present at gene promoters, especially in oncogenes [12,13,14]. Accumulating evidence has revealed that endogenous G4s are enriched in active promoters, and that this enrichment correlates with elevated transcription [15,16]. A comprehensive analysis of transcription factor (TF) binding sites and G4 structures emphasized that G4s function as binding hubs for multiple TFs [17]. Consequently, such enhanced gene transcription is probably associated with the recruitment of increased numbers of TFs at promoter G4s [15]. Biochemical assays further verified that several TFs, including SP1, MAZ, and PARP-1, exhibit high-affinity binding to G4s in vitro [18,19]. In contrast to the great interest in promoter G4s, G4s located downstream of the TSS, especially within introns, are less studied. It has been reported that first introns are the longest and the most conserved compared with other downstream introns [20,21,22]. In addition, first introns contain more regulatory elements and active epigenetic marks linked to the level of gene expression [21,23]. Examination of sequences downstream of the TSSs demonstrated that G-rich elements are concentrated in the first introns [24]. These observations led us to hypothesize that G4s formed by G-rich intron 1 elements may serve as structural targets for modulating gene expression; it is therefore necessary to investigate the mechanisms through which these structures act.

Studies have shown that single nucleotide variants (SNVs) could also affect gene expression when located within transcription regulatory elements [8]. Since the majority of these SNVs are found in non-coding regions, it is still a challenge to elucidate the impact of SNVs on molecular function and disease [25,26]. Recent works highlighted a correlation between disease risk and the alteration of G4s caused by SNVs, which may be attributed to the change in gene expression related to the transformation as well as the gains/losses of G4 structures [8,27,28]. Thus, the connection between G4-related SNVs and the transcription of disease-associated genes in a specific cancer type deserves to be explored thoroughly.

Given the significant role of G4s in multiple biological processes, G4 structures have been considered a potential therapeutic tool against tumor cells [5]. An integrative analysis of differentially enriched G4 regions in breast cancer patient-derived tumor samples unveiled intratumor heterogeneity, thus contributing to breast cancer stratification and precision medicine for cancer treatment [29]. According to the GLOBOCAN cancer statistics for 2022, breast cancer has become the most frequently diagnosed cancer in 157 countries and ranks as the leading cause of cancer death in 112 countries [30]. Although a recent study focused on the different G4 patterns in distinct breast cancer subtypes [29], the overall characteristics of G4s in breast cancer, especially those in non-coding regulatory regions other than promoters, remain largely unexplored.

In this study, we first identified all candidate G4s associated with breast cancer (BC-G4s; see definition below) and investigated their genomic distribution patterns. We explored the relationship between BC-G4s and gene expression, focusing on the genes harboring BC-G4s in promoters and the first introns. Next, pathway enrichment patterns were analyzed on the differentially expressed genes (DEGs) with promoter and first intron BC-G4s. To elucidate the impact of BC-G4s on gene transcription, we systematically investigated TF interactions with BC-G4s in both promoters and the first introns. Additionally, we conducted TF enrichment analysis on promoter BC-G4s of DEGs and further explored the potential biological mechanisms underlying the TF network. Finally, we examined the alteration of BC-G4s caused by SNVs, as well as their influence on key genes related to breast cancer biology. Taken together, we discovered vital BC-G4s and G4-related genes which play a crucial role in breast cancer progression through in-depth analyses of breast cancer G4 data (Figure 1).

2. Results

2.1. BC-G4s Are Enriched in Gene Promoters and the First Introns

To comprehensively explore the roles of G4s in breast cancer development and progression, we mapped all candidate G4 structures associated with breast tumorigenesis. We collected in vivo G4 data from public high-throughput experiments in breast cancer, including 22 breast cancer patient-derived tumor xenograft (PDTX) models (27 biological samples) and the MCF7 breast adenocarcinoma cell line [29,31]. For each PDTX sample, peaks were considered high-confidence G4 regions if confirmed in two out of four technical replicates. These G4 sites were then merged to generate a single PDTX G4 dataset. Since approximately 78.6% of MCF7 G4s overlap with PDTX G4s (Supplementary Figure S1A), we finally merged them into a combined breast cancer-associated G4 dataset (hereafter called BC-G4s) for further analysis.
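The replicate filter can be sketched with a sweep over peak boundaries. This is only an illustration under stated assumptions: "two out of four" is read here as "covered by at least two replicates", peaks are half-open (start, end) intervals on a single chromosome, and the function name is hypothetical. The original pipeline may define replicate support differently.

```python
def regions_supported_by(replicates, min_support=2):
    """Return regions covered by at least `min_support` replicates.

    replicates: list of peak lists, one per technical replicate; each peak is a
    half-open (start, end) interval on the same chromosome.
    """
    events = []
    for peaks in replicates:
        for start, end in peaks:
            events.append((start, 1))   # a peak opens
            events.append((end, -1))    # a peak closes
    # Sort by position; at equal positions, process closes before opens.
    events.sort(key=lambda e: (e[0], e[1]))

    regions, depth, region_start = [], 0, None
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_support <= depth:
            region_start = pos                      # coverage just reached the threshold
        elif depth < min_support <= previous:
            if pos > region_start:
                regions.append((region_start, pos))  # coverage just dropped below it
            region_start = None
    return regions


if __name__ == "__main__":
    toy_replicates = [
        [(100, 200)],
        [(150, 250)],
        [(400, 500)],
        [(180, 220), (450, 480)],
    ]
    print(regions_supported_by(toy_replicates, min_support=2))
```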

The genomic feature distribution of BC-G4s was first evaluated. Notably, BC-G4s are highly abundant in promoters and introns, especially in the first introns (Figure 2A). Because non-canonical DNA structures can affect gene expression, we assessed whether the presence of BC-G4s in gene promoters and introns was coupled to the gene expression level. To this end, we systematically overlapped gene promoter and intronic regions with BC-G4s. To analyze expression profiles, we used counts per million (CPM) values derived from TCGA breast cancer raw count data and normalized with the trimmed mean of M-values (TMM) method. Consistent with our hypothesis, the genes harboring promoter or intronic BC-G4s show significantly higher expression than those without such G4 regions (Figure 2B; Wilcoxon two-sided test, p-value < 2 × 10⁻¹⁶). To disentangle the contributions of promoter and intronic BC-G4s to gene expression, we further divided genes into four categories: genes with both promoter and intronic G4s, genes with only promoter G4s, genes with only intronic G4s, and genes without these G4 regions. Planned comparisons demonstrated that the expression level of the genes harboring both promoter and intronic G4s is significantly higher than that of the other groups (Figure 2C; Wilcoxon two-sided test, p-value < 2.22 × 10⁻¹⁶).

In particular, we investigated the presence of BC-G4s from intron 1 to intron 8, since the average gene transcript possesses approximately 7.6 introns. We found that BC-G4s are more prevalent in the first introns than in the others, with a gradual decrease in the number of BC-G4s overlapping each succeeding intron (Figure 2D). Given that first introns are typically longer than other downstream introns [32], we probed whether the large proportion of BC-G4s in the first introns reflected the functional properties of these introns or was simply an artifact of their length. By conducting a permutation test that randomized the positions of 20,589 intronic BC-G4s across all intronic regions, we found that the observed proportion (50.8%) was not reached in any of the 10,000 permutations, arguing strongly against the possibility that the proportion of BC-G4s in the first introns is solely a by-product of their length (p-value < 0.0001, Figure 2E). Remarkably, the genes with BC-G4s in the first introns show substantially higher expression than those with G4s in other introns (Figure 2F and Supplementary Figure S1B; Wilcoxon two-sided test, p-value = 1.7 × 10⁻⁶). Intriguingly, the distribution of BC-G4s is biased towards the 5′ end of the first introns and the 3′ end of promoters, indicating high abundance in the vicinity of the TSSs (Figure 2G,H and Supplementary Figure S1C). In general, these results underscore the importance of BC-G4s in promoters and the first introns to gene expression in breast cancer.
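The logic of the permutation test can be sketched as follows. This is a simplified illustration rather than the implementation used here: it assumes that each randomized G4 lands in an intron with probability proportional to intron length, and it uses the common (hits + 1) / (permutations + 1) estimate of the empirical p-value. The function name and the toy intron lengths are hypothetical.

```python
import random


def permutation_p_value(introns, n_g4, observed_fraction, n_perm=1000, seed=0):
    """Empirical p-value for the fraction of G4s that fall in first introns.

    introns: list of (length_bp, is_first_intron) tuples.
    Each permutation places n_g4 G4s at random, weighting introns by length.
    """
    rng = random.Random(seed)
    lengths = [length for length, _ in introns]
    first_flags = [is_first for _, is_first in introns]
    hits = 0
    for _ in range(n_perm):
        placed = rng.choices(first_flags, weights=lengths, k=n_g4)
        if sum(placed) / n_g4 >= observed_fraction:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy genome: first introns make up 30% of the total intronic length.
    toy_introns = [(3000, True), (7000, False)]
    print(permutation_p_value(toy_introns, n_g4=200, observed_fraction=0.508))
```

With this estimator, an observed value that is never reached in 10,000 permutations gives an empirical p-value just below 0.0001.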

2.2. The Distribution of BC-G4s in Up-Regulated Genes Is Biased Toward the TSSs

To gain insight into the regulatory functions of BC-G4s, we focused on the promoter and first intron BC-G4s in the differentially expressed genes (DEGs). We integrated TCGA and GTEx raw count data processed using an identical pipeline to generate a transcriptomic profile comprising 1119 tumors, 113 tumor-adjacent normal (NAT) samples in breast cancer, and 92 normal samples in breast tissue [33,34]. We employed a stringent normalization method to remove unwanted variation across batches, enabling direct comparison of runs performed in different laboratories at different times (Supplementary Figure S2C) [35]. In total, 1058 up-regulated and 580 down-regulated genes were identified in tumors relative to NAT samples (Figure 3A). Among these DEGs, 303 up-regulated and 78 down-regulated genes contain promoter BC-G4s while 314 up-regulated and 100 down-regulated genes harbor BC-G4s in the first introns. Taken together, BC-G4s are prominently present in the promoters and first introns of up-regulated DEGs (Fisher’s exact test p-value = 8.992 × 10⁻¹³, p-value = 2.022 × 10⁻⁸ for promoter and the first intron, respectively).
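For illustration, a two-sided Fisher's exact test on a 2 × 2 table can be written with the standard library alone. The table used below (up- versus down-regulated DEGs, with versus without promoter BC-G4s) is built from the counts reported above and is an assumption about how the comparison was set up, so the result need not reproduce the reported p-value exactly.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def log_p(x):
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(total, col1)

    observed = log_p(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    p = sum(exp(log_p(x)) for x in range(low, high + 1) if log_p(x) <= observed + 1e-7)
    return min(1.0, p)


if __name__ == "__main__":
    # Assumed layout: rows = up-/down-regulated DEGs, columns = with/without promoter BC-G4s.
    up_with, up_total = 303, 1058
    down_with, down_total = 78, 580
    p = fisher_exact_two_sided(up_with, up_total - up_with, down_with, down_total - down_with)
    print(f"p = {p:.3g}")
```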

To further explore the biological implications, we compared Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment patterns between DEGs with BC-G4s in gene promoters and those with BC-G4s in the first introns (Figure 3B,C). Interestingly, both groups are markedly enriched in the cell cycle pathway compared to the DEGs lacking BC-G4s, which indicates that BC-G4s may be an integral part of cell cycle regulation in breast cancer (Figure 3B,C and Supplementary Figure S2D). In addition, BC-G4s in the up-regulated genes are preferentially found at the 5′ end of the first introns and the 3′ end of promoters, a pattern that is not observed in the down-regulated genes (Figure 3D,E and Supplementary Figure S2E). The proximity of BC-G4s to the TSSs in the up-regulated genes further underscores a pivotal role of G4 structures in the regulation of gene expression.

2.3. TF Binding to BC-G4s in the First Introns Compensates for G4–TF Interactions in the Cognate Promoters

The fact that BC-G4s are highly abundant near the TSSs prompted us to further assess their potential role in transcriptional regulation. We first evaluated the frequency of TF interactions with BC-G4s in promoters and introns of DEGs using ChIP-seq data for 148 TFs enriched in G4s. The results showed that BC-G4s in the first introns of DEGs bind over twice as many TFs as BC-G4s in downstream introns (Figure 4A; Mann–Whitney U test, p-value = 1.376 × 10⁻¹⁵). This supports the preceding conclusion that BC-G4s in the first introns have a greater impact on gene regulation than those in other introns. Considering that the first introns exhibit a higher density of active transcriptional regulatory signals [21], we speculated that BC-G4s in the first introns harbor more TF binding events owing to the functionality of first intron sites and their proximity to the TSS.

The frequent binding of TFs to BC-G4s in both promoters and first introns hinted at a relationship between these two groups of BC-G4s. Moreover, a study in Caenorhabditis elegans suggested that TFs binding to first introns are largely distinct from those binding to promoters, and that the different interactions are complementary rather than redundant [22]. Therefore, we tested this hypothesis for the BC-G4s by comparing the number of G4–TF interactions between promoters and first introns for each gene. As expected, we detected a significant negative correlation between promoter and first intron G4–TF interactions, implying that TF binding to the two regions is complementary to a certain extent in breast cancer (Figure 4B; Spearman’s correlation coefficient R = −0.37, p-value = 2.4 × 10⁻¹⁴).
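The per-gene comparison can be illustrated with a small standard-library implementation of Spearman's rank correlation (Pearson correlation computed on average ranks). The counts in the example are invented toy values, not data from this study, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation coefficient between two equal-length lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Toy per-gene counts of G4-TF interactions (hypothetical values).
    promoter_counts = [40, 35, 22, 10, 5]
    first_intron_counts = [2, 6, 9, 20, 31]
    print(spearman_rho(promoter_counts, first_intron_counts))
```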

Furthermore, we hypothesized that G4–TF interactions in the first introns may contribute to transcriptional regulation in two alternative ways. On the one hand, BC-G4s in the first introns could share some TFs with those in promoters, reinforcing TF binding and thereby providing robust regulation of gene expression. On the other hand, BC-G4s could bind different TFs in gene promoters and the first introns, allowing for independent modulation of transcription. To address this question, we calculated, on a gene-by-gene basis, the proportion of TFs that bind both promoter and first intron G4s relative to the total TFs bound by first intron G4s (Figure 4C). Our analysis revealed that 74% of first intron G4s share no TF with those in the cognate promoters, while approximately 4% share 10% of TFs with their respective promoter G4s. Thus, there is generally little overlap between G4–TF interactions in the first introns and their corresponding promoters, which suggests that G4s within different regions could regulate gene transcription cooperatively yet independently (Figure 4C). Collectively, BC-G4s in promoters and the first introns of DEGs mostly contribute to additive modulation of gene expression by recruiting different TFs.
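The gene-by-gene overlap measure described above reduces to a set intersection. The sketch below is a minimal illustration; the function name and the TF assignments per gene are hypothetical.

```python
def shared_tf_fraction(promoter_tfs, first_intron_tfs):
    """Fraction of TFs bound at first intron G4s that also bind promoter G4s.

    Returns None when the first intron G4s of the gene bind no TF at all.
    """
    promoter_tfs, first_intron_tfs = set(promoter_tfs), set(first_intron_tfs)
    if not first_intron_tfs:
        return None
    return len(promoter_tfs & first_intron_tfs) / len(first_intron_tfs)


if __name__ == "__main__":
    # Hypothetical TF sets for two toy genes.
    genes = {
        "GENE_A": ({"SP1", "MYC"}, {"MYC", "CTCF", "JUN", "E2F1"}),
        "GENE_B": ({"SP1", "MAZ"}, {"CTCF", "RAD21"}),
    }
    for gene, (promoter, intron) in genes.items():
        print(gene, shared_tf_fraction(promoter, intron))
```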

2.4. Promoter BC-G4s of DEGs Function as Hubs for TF Co-Binding

Extensive research has shown that the assembly of the general transcription factors on promoter DNA is required to facilitate transcription initiation [36,37]. Considering that BC-G4s in promoters bind far more TFs than those in the first introns, we focused specifically on the possible implications of interactions between promoter BC-G4s and various TFs. We identified 105 proteins (mostly TFs and cofactors) enriched in DEG promoter BC-G4s [38], and the top 20 most significantly enriched TFs are shown in Figure 5A. In addition, 6 of the 20 TFs were up-regulated in tumor relative to NAT samples (Supplementary Figure S3). In agreement with our previous observations, these TFs, such as E2F1 and E2F4, are largely connected with cell cycle control. We further explored the collaborative patterns of TFs enriched in DEG promoter BC-G4s. The protein–protein interactions (PPIs) among the set of 105 proteins were analyzed using the STRING database and portrayed as a network using Cytoscape (Figure 5B). PPI analysis highlighted a pronounced enrichment of confirmed interactions among these proteins (STRING PPI enrichment p-value < 1 × 10⁻¹⁶), which implies that these proteins may be biologically connected or function as a group. To focus on the key modules in the large-scale PPI network, significant protein clusters were subsequently identified using the Molecular Complex Detection (MCODE) plugin of Cytoscape, which implements a density-based graph-theoretic clustering algorithm. We selected the two highest-scoring clusters (including TFs and TF partners): cluster 1, composed of 10 nodes and 41 edges (HDAC1, HIF1A, SMAD3, CEBPB, RELA, TP53, ESR1, MYC, JUN, and STAT3), and cluster 2, composed of five nodes and nine edges (RAD21, SMC1A, STAG2, CTCF, and STAG1). Gene ontology (GO) enrichment analysis revealed that proteins in cluster 1 belong to the transcription regulator complex, whereas proteins in cluster 2 are mainly implicated in the establishment of meiotic sister chromatid cohesion (Figure 5C,D).

Furthermore, we were curious about whether promoter BC-G4s might be the main contributor to the co-binding patterns of TF clusters. To test this hypothesis, we extracted the overlaps among the DNA binding sites within each cluster using breast cancer ChIP-seq data (ChIP-ATLAS). We subsequently calculated the fold enrichment of these sites, over random chance, in the promoter BC-G4 regions of DEGs. Intriguingly, the co-localized sites of TFs exhibited marked abundance in promoter BC-G4s (Table 1). For instance, the sites with more than two co-localized TFs from cluster 1 overlap with 267 of the 424 G4 regions, which show significant enrichment at these promoter BC-G4s. Notably, most G4 regions of the DEGs are potentially bound by multiple TFs, implying that these promoter BC-G4s might act as central regulators for TF cooperation.

Recent research on TF binding profiles uncovered human “stripe factors” that frequently colocalize with various other TFs, creating vertical stripes in motif or ChIP-seq hierarchical clustering maps [39]. According to the published research, SMAD3, TP53, ESR1, MYC, and JUN from cluster 1, along with RAD21 and CTCF from cluster 2 are predicted to be “stripe factors”. It has been verified that the binding of these factors in human cells facilitates chromatin accessibility and recruits TF partners across the genome [39]. Additionally, their GC-rich binding motifs provide robust evidence for direct interactions with G4s which reinforces the notion that promoter BC-G4s are fundamental to the collaborative patterns of TFs (Figure 5E).

Based on the co-binding patterns of G4–TF and TF–TF interactions, we identified the two genes whose promoter BC-G4s are enriched with the most TF–TF interactions, the cell cycle-related gene AURKA and the mRNA processing gene CSTF1, as candidates with potential for future targeted therapy (Figure 5F). Specifically, both genes exhibit significantly elevated expression in tumor samples relative to NAT or normal samples. It has also been reported that overexpression of AURKA and CSTF1 is associated with poor prognosis in breast cancer patients [40].

2.5. SNVs in BC-G4s Modulate G4 Structures and the Transcription of Breast Cancer-Associated Genes

A single nucleotide variation (SNV) within a G4 motif may cause a drastic change in G4 structure and, consequently, the expression of the regulated gene. Therefore, we investigated the impact of genome-wide G4–SNV interactions in breast cancer on the folding propensity of G4s and gene activity. To target SNVs that might influence the stability of G4s, we analyzed those located in G4 motifs within the BC-G4 peak regions. We downloaded the simple somatic mutation files of five breast cancer projects from the International Cancer Genome Consortium (ICGC) database release 28 and extracted a total of 4,103,395 SNV coordinates related to breast cancer. Among these, 3587 distinct SNVs were identified within 3523 motifs in BC-G4s, accounting for 3.7% of all BC-G4 motifs. The SNV-type distributions in BC-G4s were first computed in which we observed a slightly greater abundance of SNVs in G-tract regions when compared to loop regions and a high level of G > A mutation in G-tract regions (Supplementary Figure S4A).

To probe into the effects of SNVs on BC-G4 structural stability, the minimum free energy (MFE) of BC-G4s with or without SNVs was calculated using the RNAfold program from the Vienna RNA package which is a powerful tool to predict secondary structures of single stranded RNA or DNA. In particular, the MFE of BC-G4s with SNVs in the G-tract region was found to be significantly higher than the MFE of the original sequences (Figure 6A; Wilcoxon two-sided test, p-value < 2 × 10⁻¹⁶) whereas this phenomenon was not observed in the loop region (Supplementary Figure S4B). The result suggests that SNVs within the G-tract regions lead to a pronounced increase in structural instability.

We further analyzed the SNV types affecting BC-G4 structural stability and corroborated that SNVs disrupting G-tract regions are more prone to induce BC-G4 structural alterations, predominantly leading to G4 destabilization as expected. In addition, specific transition events, such as G > A in G-tract regions and C > T in loop regions, are more susceptible to triggering structural destabilization of BC-G4s (Figure 6B). Conversely, a small minority of SNVs further stabilize G4 structures, potentially due to the formation of Watson–Crick base-pairs induced by the SNVs within G4 structures [41].

To assess the biomedical importance of SNVs within BC-G4s for gene regulation, we annotated their genomic locations, which revealed a notable enrichment in gene introns and promoter regions. In particular, intronic SNVs are most prevalent in the first introns. After normalization by the number of SNVs in the background, i.e., both inside and outside of the G4 motifs, G4-related SNVs exhibit noteworthy abundance in the promoter and 5′ UTR regions (Figure 6C; Fisher’s exact test p-value < 2.2 × 10⁻¹⁶ for both promoter and 5′ UTR regions). These results are indicative of stronger purifying selection acting on G4 loci for motif retention than on the remaining sequences in genic functional regions, thus enabling the stability and function of BC-G4s [42]. To assess the clinical significance of SNVs, we performed a survival analysis by SNV location using data from ICGC, which showed that breast cancer patients with SNVs in G-tract regions have a lower overall survival probability than those with SNVs in loop regions (Figure 6D; log-rank test, p-value = 0.04).

In view of the crucial importance of BC-G4s in gene promoters and the first introns, we selected the genes accommodating SNVs that affect the stability of these BC-G4s. We next submitted these genes to Phenolyzer, a tool for prioritizing human disease genes based on existing knowledge, to identify those associated with breast cancer [43]. From the top 10 genes with the highest scores, we selected 6 seed genes to examine their expression levels across tumor, NAT, and normal conditions (Supplementary Table S2). A seed gene is a gene with a direct relation to breast cancer according to existing databases. Notably, AKT1, ERBB2 (HER2), CDK4, and CCNE1 show significantly elevated expression in tumor samples, while PIK3CA displays lower expression in breast cancer (Figure 7A). According to previous research, each SNV may exert distinct effects depending on a combination of multiple factors, and our findings are consistent with this observation [27]. AKT1 and PIK3CA are both implicated in the PI3K/AKT pathway, which is vital to cell growth, and overexpression of ERBB2 also contributes to cell proliferation. The other three genes, CDKN1A, CDK4, and CCNE1, are all involved in cell cycle control.

Moreover, we focused on the gain of G4 motifs induced by SNVs. We first obtained 30 nucleotides from both sides of each SNV and identified pG4 patterns on both strands in the reference (REF) and alternative (ALT) sequences, respectively, using the pqsfinder package. The pqsfinder results indicated that 4613 SNVs lead to the formation of new G4 motifs. We subsequently overlapped these regions with experimentally validated G4 peaks and found 272 newly formed G4 motifs in breast cancer. Strikingly, a new G4 motif is formed in the promoter region of KRAS, which exhibits a markedly increased expression level in breast cancer (Figure 7B).
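The REF/ALT comparison can be sketched as follows. Note the substitution: pqsfinder is an R package with its own scoring model, and the sketch below replaces it with a simple canonical-motif regular expression purely for illustration, so it will not reproduce the pqsfinder calls. The function names, the 0-based coordinate convention, and the toy sequence are assumptions.

```python
import re

# Stand-in for pqsfinder: canonical motif with four G-tracts and three 1-7 nt loops.
G4_RE = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def has_pg4(seq):
    """True if the motif occurs on either strand of the sequence."""
    seq = seq.upper()
    return bool(G4_RE.search(seq) or G4_RE.search(seq.translate(_COMPLEMENT)[::-1]))


def snv_window(chrom_seq, pos, alt, flank=30):
    """Return (REF window, ALT window) spanning `flank` nt on each side of a 0-based SNV."""
    start = max(0, pos - flank)
    end = min(len(chrom_seq), pos + flank + 1)
    ref_window = chrom_seq[start:end].upper()
    offset = pos - start
    alt_window = ref_window[:offset] + alt.upper() + ref_window[offset + 1:]
    return ref_window, alt_window


def snv_creates_pg4(chrom_seq, pos, alt, flank=30):
    """True if the ALT window contains a motif while the REF window does not."""
    ref_window, alt_window = snv_window(chrom_seq, pos, alt, flank)
    return has_pg4(alt_window) and not has_pg4(ref_window)


if __name__ == "__main__":
    # Toy sequence: the third G-tract is interrupted by an A at position 39.
    toy = "T" * 30 + "GGGAGGGTGAGAGGG" + "T" * 30
    print(snv_creates_pg4(toy, pos=39, alt="G"))
```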

In addition, we examined whether SNV-driven G4 signals in the promoter region of KRAS could be detected in breast cancer or other cellular contexts to confirm cancer-type specific G4 formation. Intriguingly, the relatively weak G4 signals observed around this site, generated in vitro or in vivo from HepG2 and K562 cell lines, suggested that a stable G4 structure forms with difficulty in these cells (Figure 7C). In contrast, a subset of the breast cancer samples exhibited pronounced signals at the SNV, while others did not (Figure 7C). In summary, these results imply that this SNV in breast cancer may be fundamental to the formation of a new KRAS G4 structure.

3. Discussion

In this work, we have conducted a comprehensive and in-depth analysis of the genome-wide characteristics of BC-G4s generated from public G4 data in breast cancer. Our results reveal that BC-G4s play a prominent part in the regulation of gene expression and the cell cycle pathway. We also note that BC-G4s in the first introns have more TF interactions than those in downstream introns. Based on TF enrichment patterns and the impact of SNVs on BC-G4s, we have identified several important G4s and G4-related genes that may serve as potential drug targets for future breast cancer therapy.

Advances in computational methods for pG4 prediction and G4 sequencing technologies have facilitated the investigation of G4 structures in different cell lines [44,45,46]. We have identified BC-G4s derived from PDTX samples and the MCF7 cell line, thus covering multiple breast cancer subtypes. Previous works have emphasized the specificity of differentially enriched G4 regions in distinct breast cancer subtypes, whereas we aimed to provide a comprehensive overview of all candidate BC-G4s throughout the human genome. For this purpose, we surveyed their presence in gene regulatory elements. The prevalence of BC-G4s in promoters and introns prompted us to explore the correlation between genes marked by promoter or intronic BC-G4s and their expression levels. Remarkably, we observed a positive association between BC-G4s in gene promoters and the first introns and elevated gene expression (Figure 2B,F), which may be attributed to the proximity of these regions to the TSSs.

A prior study correlated enhanced gene transcription with various TFs binding to promoter G4s [17]. Herein, we extend this pattern to BC-G4s in the first introns. We observed that BC-G4s in the first introns could also recruit TFs and facilitate gene expression, implying a generally positive role in transcription for G4s located both upstream and downstream of the TSSs. Notably, TF interactions with BC-G4s in the first introns are not redundant with those in promoters, suggesting cooperative regulation of gene expression (Figure 8). A pioneering analysis of the interactions between TFs and the first introns in Caenorhabditis elegans described the combined effects of multiple regulatory regions [22]. Our results further suggest that G4s may be fundamental to the additive regulation of gene expression.

Replicative immortality is one of the hallmarks of cancer and typically arises from the deregulation of cell cycle pathways. The disruption of cell cycle control caused by molecular alterations in breast cancer drives genome instability and tumor progression [47]. Cell cycle-targeted therapy has been considered a promising anti-cancer strategy. We observed that BC-G4s were implicated in the modulation of cell cycle pathways, which inspired us to investigate novel inhibitors of cell cycle regulators from the perspective of BC-G4s (Figure 3B,C). We have selected several genes probably regulated by BC-G4s, including CDK4, CCNE1, CDKN1A, and AURKA, which are components of the cell cycle machinery with clinical potential. Others, such as AKT1, PIK3CA, and ERBB2, are also closely related to cell cycle regulation. More specifically, upstream oncogenic signaling such as the PI3K/AKT/mTOR pathway leads to the activation of the cyclin D-CDK4/6 complex, which facilitates the phosphorylation of RB1, in turn contributing to E2F release and the transition of the cell cycle from G1 to S phase [48,49,50]. Notably, E2F proteins such as E2F1 and E2F4 showed a marked enrichment in promoter BC-G4s of DEGs (Figure 5A). Over the past two decades, inhibitors of CDK4/6 have been widely used as first-line therapy for hormone receptor-positive and human epidermal growth factor receptor 2 (HER2)-negative metastatic breast cancer [51,52,53,54,55,56]. However, cell cycle alterations as well as upstream oncogenic signal transduction alterations can promote resistance to CDK4/6 inhibitors [57]. Given these findings, combinatorial strategies could help overcome such resistance, and targeting the BC-G4s involved in cell cycle processes may offer a promising avenue for clinical trials [48]. Furthermore, the BC-G4 in the promoter of AURKA is enriched with multiple TF–TF interactions, suggesting that this G4 region is highly active in breast cancer (Figure 5F). In recent years, the emergence of more effective biochemical methods has greatly contributed to the study of G4-protein interactions in diseases, thus adding new dimensions to clinical applications [58,59].

The well-known proto-oncogene KRAS is one of the most frequently mutated genes in various cancers, and the overexpression of KRAS can result in poor survival. The core promoter region of human KRAS, extending from +50 to −510 bp relative to the TSS, is characterized by a high G/C content [60]. The KRAS-G4, which comprises a GC-rich nuclease hypersensitive element (NHE), has been shown to be a transcriptional modulator recognized by several nuclear proteins [61,62,63]. Intriguingly, we discovered that the promoter region of KRAS probably forms a novel G4 proximal to the TSS as a result of SNVs in breast cancer, and that this G4 is distinct from the previously well-established KRAS-G4. Remarkably, the first approval of targeted therapy for non-small cell lung carcinoma (NSCLC) patients with KRAS mutation has shed light on the development of KRAS-targeting drugs [64]. Additionally, structural insights into KRAS-G4-ligand interactions contribute to the rational design of KRAS-G4 specific drugs [62]. Although it remains unclear how this newly formed, SNV-induced G4 correlates with the overexpression of KRAS, this finding may pave the way for drug innovation in the treatment of KRAS-mutant breast cancer patients.

Taken together, our results indicate that BC-G4s in promoters and the first introns are closely associated with enhanced gene expression and play a pivotal role in TF binding. Notably, BC-G4s appear to be involved in multiple aspects of cell cycle regulation. In addition to the structural alterations caused by SNVs, we observed the formation of a novel G4 motif in the KRAS promoter in breast cancer. Collectively, studies of G4-protein interactions and G4-variant effects may offer a promising new strategy for drug design in breast cancer.

4. Materials and Methods

4.1. G4 Dataset Acquisition

In this study, we focused mainly on experimentally validated G4 datasets in breast cancer. First, we downloaded the G4 coordinate files of 22 breast cancer patient-derived tumor xenograft (PDTX) models and the MCF7 breast adenocarcinoma cell line from the Gene Expression Omnibus (GEO) database (GSE152216 and GSE181373). These data were obtained by quantitative G4-chromatin immunoprecipitation (ChIP)-seq and single-nuclei (sn) CUT&Tag, both using the G4 structure-specific antibody BG4 [29,31]. We then merged all G4 peaks from the breast cancer tissues and cell line into a single consensus set of 47,797 G4 regions (hereafter BC-G4s) using the bedtools merge function (version 2.29.1). The distribution of BC-G4s across genomic features was analyzed using the ChIPseeker package (version 1.34.1) [65].
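
To illustrate the peak-merging step, the following is a minimal Python sketch of what an interval merge does, assuming BED-style 0-based, half-open coordinates. It is not the bedtools implementation; the function name and the example peaks are hypothetical. Overlapping and directly adjacent intervals are fused, which mirrors the default behavior of bedtools merge.

```python
from collections import defaultdict


def merge_intervals(peaks):
    """Merge overlapping or book-ended intervals per chromosome.

    peaks: iterable of (chrom, start, end) tuples, 0-based and half-open.
    Returns a list of merged (chrom, start, end) tuples sorted by position.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlapping or touching the open interval: extend it.
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        if current_start is not None:
            merged.append((chrom, current_start, current_end))
    return merged
```

Pooling the peaks of all samples into one list and merging them once yields a single non-redundant set of regions, which is the role the consensus BC-G4 set plays in the analyses that follow.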

The putative G-quadruplex (pG4) regions on both strands of the human genome (hg19 assembly) were obtained from GSE133379 [46], which includes multiple G4 motif subtypes such as 4G, 4GL15, Bulges, and GVBQ. The intersections between putative and experimental G4s were obtained using the bedtools intersect function. The in vitro G4 data captured by the G4-seq technique were extracted from GSE63874 [44]. The G4 files for the HepG2 and K562 cell lines, derived from G4 ChIP-seq, were downloaded from GSE145090 [17].

4.2. RNA-Seq Dataset Acquisition and Processing

We chose RNA-seq datasets from TCGA and GTEx because both consortia support population-level studies, with samples from patients and healthy individuals, respectively. The raw feature counts, generated with the identical processing protocol described by Rahman et al. [33], were obtained from GSE62944 (1119 tumor samples and 113 NAT samples in breast cancer) and GSE86354 (92 normal breast tissue samples from GTEx) (Supplementary Figure S2A) [33,34]. Dimensionality reduction was performed with the Rtsne package (version 0.17) on the log2 CPM (counts per million) values calculated by the edgeR package (version 3.40.2) (Supplementary Figure S2B). Because large-scale RNA-seq datasets generated in different laboratories and at different times contain batch-specific systematic variation, we employed the RUVg method from the RUVSeq package (version 1.32.0) for batch-effect removal and data integration [35]. As suggested by the official user guide, a list of housekeeping genes was used as the set of negative controls [66]. The batch-effect correction was assessed with a relative log expression (RLE) plot, in which the distributions were centered around zero and indistinguishable between conditions (Supplementary Figure S2C).
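
As a rough illustration of the two quantities used above, the sketch below computes log2 CPM values and relative log expression (RLE) in plain Python. It is a simplification and not the edgeR or RUVSeq code: the pseudocount of 1 and the function names are assumptions made for the example, and edgeR handles prior counts differently.

```python
import math
from statistics import median


def log2_cpm(counts, pseudocount=1.0):
    """counts: dict mapping gene -> list of raw counts, one per sample.

    Returns log2(CPM + pseudocount) per gene and sample, where CPM scales
    each count by the sample's total library size.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [math.log2(row[j] / lib_sizes[j] * 1e6 + pseudocount)
               for j in range(n_samples)]
        for gene, row in counts.items()
    }


def relative_log_expression(log_expr):
    """Subtract each gene's median log expression across samples.

    In a well-normalized dataset the per-sample RLE distributions
    should be centered near zero.
    """
    rle = {}
    for gene, row in log_expr.items():
        gene_median = median(row)
        rle[gene] = [value - gene_median for value in row]
    return rle
```

If two samples differ only in sequencing depth, their CPM values coincide and every RLE value is zero, which is why residual shifts in an RLE plot point to unwanted variation such as batch effects.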

4.3. Differential Expression Analysis

The count matrix was processed with the edgeR package [67]. We filtered out genes with very low counts, applied TMM (trimmed mean of M values) normalization, and converted the counts to CPM. To avoid the exaggerated false positives that can occur with large sample sizes, we used the Wilcoxon rank-sum test rather than parametric methods to identify differentially expressed genes (DEGs), owing to its solid false discovery rate (FDR) control and good power [68]. Differential expression analysis was performed between each pair of the three conditions (Supplementary Table S1). Each gene’s CPM values were passed to the wilcox.test function in R to calculate p-values, and FDR values were obtained using the Benjamini–Hochberg method. DEGs between tumor and NAT samples were identified if (1) FDR < 0.01 and (2) the expression change was ≥4-fold. The KEGG pathway enrichment plot was created using the clusterProfiler package (version 4.6.2).
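
The multiple-testing and thresholding steps can be sketched as follows. This is an illustrative Python version, not the R code used in the study: it takes per-gene p-values and log2 fold changes as given inputs (how the fold change is computed is not specified here and is left to the caller), applies the Benjamini–Hochberg adjustment, and keeps genes that meet both criteria. Function names are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def select_degs(genes, pvalues, log2_fold_changes, fdr_cutoff=0.01, min_fold=4.0):
    """Keep genes with FDR < fdr_cutoff and at least min_fold change in either direction."""
    fdr = benjamini_hochberg(pvalues)
    min_log2 = math.log2(min_fold)
    return [
        gene
        for gene, q_value, lfc in zip(genes, fdr, log2_fold_changes)
        if q_value < fdr_cutoff and abs(lfc) >= min_log2
    ]
```

A 4-fold change corresponds to an absolute log2 fold change of 2, so the second criterion captures both up-regulated and down-regulated genes.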

4.4. BC-G4 Overlapping Gene Promoter and Intron Expression Analysis

Promoter (1 kb upstream of the transcription start site, TSS) and intron coordinates were generated from the GENCODE release 19 annotation for the hg19 assembly (https://www.gencodegenes.org/human/release_19.html (accessed on 1 July 2013)). Gene promoters and introns harboring BC-G4s were identified using the bedtools annotate function. Significance was tested using the Wilcoxon test.
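
The promoter definition depends on strand, which is easy to get wrong. The sketch below shows one way to derive a 1 kb upstream window and test it for overlap with a G4 region, assuming BED-style 0-based, half-open gene coordinates. The function names are hypothetical, and clipping at the chromosome end is omitted for brevity.

```python
def promoter_interval(chrom, start, end, strand, upstream=1000):
    """Return the (chrom, start, end) window upstream of a gene's TSS.

    For '+' genes the TSS is the gene start; for '-' genes it is the gene end,
    so "upstream" extends towards larger coordinates.
    """
    if strand == "+":
        tss = start
        return (chrom, max(0, tss - upstream), tss)
    if strand == "-":
        tss = end
        return (chrom, tss, tss + upstream)
    raise ValueError("strand must be '+' or '-'")


def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]
```

A gene would then be assigned to the promoter-G4 group if any BC-G4 interval overlaps its promoter window.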

4.5. Transcription Factor Enrichment Analysis

The TFs enriched in promoter and intronic BC-G4s of DEGs were calculated using the ChIP-Atlas Enrichment Analysis web tool (https://chip-atlas.org/enrichment_analysis (accessed on 16 May 2024)) with the following parameters [69,70]: Experiment type: ChIP: TFs and others; Cell type Class: Breast; Threshold for Significance: 500; Enter dataset A: promoter and intronic BC-G4s of DEGs in bed format; Enter dataset B: Random permutation of dataset A (100 permutation times). The result table, containing 11 tab-separated columns, was downloaded. Rows were retained if column 5 (cell) was a breast cancer cell line, column 9 (log p-value) was < −3, and column 11 (fold enrichment) was > 1. In total, 148 TFs were retained for further study.
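
The row filter on the downloaded table can be sketched as below. Only the positions of the cell, log p-value, and fold-enrichment columns (5, 9, and 11) come from the text; the assumption that the TF name sits in column 3, the function name, and the set of accepted cell lines passed by the caller are hypothetical and would need to be checked against the actual file.

```python
def filter_enriched_tfs(lines, breast_cancer_cells, max_log_p=-3.0, min_fold=1.0):
    """Return the sorted TF names whose rows pass all three criteria.

    lines: iterable of tab-separated rows with 11 columns.
    breast_cancer_cells: set of cell names accepted for column 5.
    """
    kept = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 11:
            continue  # skip headers or malformed rows
        antigen = fields[2]  # assumed position of the TF name
        cell = fields[4]     # column 5: cell
        try:
            log_p = float(fields[8])    # column 9: log(p-value)
            fold = float(fields[10])    # column 11: fold enrichment
        except ValueError:
            continue
        if cell in breast_cancer_cells and log_p < max_log_p and fold > min_fold:
            kept.add(antigen)
    return sorted(kept)
```

Collecting names in a set means a TF profiled in several experiments is counted once, which matches reporting a single total number of TFs.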

4.6. ChIP Binding Site Analysis

ChIP-seq data of TFs in breast cancer were downloaded from http://chip-atlas.org/peak_browser (accessed on 16 May 2024). The midpoint of the ChIP-seq BED region was defined as the binding site. Promoter and intronic BC-G4s of DEGs were analyzed individually to verify whether the TF ChIP peak midpoints were located between the start and end of the fragment.
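
The midpoint rule above can be expressed in a few lines. The following is a minimal sketch with hypothetical function names; treating the G4 fragment as a half-open interval is an assumption, since the text does not say whether the end coordinate is inclusive.

```python
def peak_midpoint(start, end):
    """Integer midpoint of a BED interval, used as the TF binding site."""
    return (start + end) // 2


def tfs_bound_to_g4(g4, tf_peaks):
    """Return the sorted TFs whose peak midpoint falls inside the G4 region.

    g4: (chrom, start, end)
    tf_peaks: iterable of (tf_name, chrom, start, end)
    """
    chrom, g4_start, g4_end = g4
    bound = set()
    for tf_name, peak_chrom, peak_start, peak_end in tf_peaks:
        if peak_chrom != chrom:
            continue
        if g4_start <= peak_midpoint(peak_start, peak_end) < g4_end:
            bound.add(tf_name)
    return sorted(bound)
```

Using the midpoint rather than any overlap makes the assignment stricter: a broad peak that only grazes the edge of a G4 region is not counted as binding it.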

4.7. Transcription Factor Network Analysis

The protein–protein interaction (PPI) network of the 105 TFs enriched in promoter BC-G4s of DEGs was constructed using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) website (http://string-db.org/) with the interaction score cutoff set at 0.9 [71]. The network was subsequently visualized using Cytoscape software (version 3.10.0) [72]. The PPI enrichment p-value was taken from the STRING web tool. Next, the central clustering modules in the PPI network were identified using the “MCODE” plugin of Cytoscape with the default parameters (Network Scoring: Include Loops: false; Degree Cutoff: 2; Cluster Finding: Node Score Cutoff: 0.2; Haircut: true; Fluff: false; K-Core: 2; Max. Depth from Seed: 100), and the 2 clusters with the highest scores were selected. ChIP-seq BED files of 15 TFs from the top 2 clusters were downloaded from the ChIP-Atlas Peak Browser web tool. The peaks were further filtered based on the IDs that were previously determined in the TF enrichment result table. The common intervals among multiple BED files from the same cluster were calculated using bedtools multiinter. The enrichment analysis of intersections among TF binding sites in promoter BC-G4s of DEGs followed the same steps as the TF enrichment analysis. p-values were calculated with a two-tailed Fisher’s exact test. Human TF binding motifs were downloaded from JASPAR (https://jaspar.elixir.no/) [73].

4.8. SNV Dataset Acquisition

The International Cancer Genome Consortium (ICGC) has been used extensively as a resource for cancer genomic analyses, and data generated by TCGA are included in its current release (ICGC release 28, https://dcc.icgc.org/releases/release_28 (accessed on 26 November 2019)) [74]. ICGC release 28 contains five breast cancer projects: (1) BRCA-EU Breast ER+ and HER2− Cancer—EU/UK, (2) BRCA-FR Breast Cancer—FR, (3) BRCA-KR Breast Cancer—Very young women—KR, (4) BRCA-UK Breast Triple Negative/Lobular Cancer—UK and (5) BRCA-US Breast Cancer—TCGA, US. Single nucleotide variant (SNV) coordinates in breast cancer were extracted from the simple somatic mutation files downloaded from these five projects. We removed duplicated SNVs and obtained 4,103,395 SNV coordinates in total for subsequent analyses.

The donor survival time data were downloaded from https://icgc-xena-hub.s3.us-east-1.amazonaws.com/download/donor%2Fdonor.all_projects.overallSurvival.gz (accessed on 26 November 2019).

4.9. SNV Affecting BC-G4s

Overlaps between SNVs and BC-G4 motifs were calculated using the bedtools intersect function. To evaluate the effect of SNVs on the structural stability of BC-G4s, we compared the minimum free energy (MFE) of each BC-G4 before and after introduction of the SNV, as computed by RNAfold from the ViennaRNA Package (version 2.6.4) [75]. A lower MFE indicates a more stable structure, whereas a higher MFE indicates a less stable one. G4-related SNVs were annotated to genomic regions using bedtools, and an enrichment score was calculated for each annotation by comparing the proportion of G4-related SNVs in a given genomic region with that of background breast cancer SNVs, as follows:

\text{Enrichment Score}_{\text{G4, region}} = \frac{N_{\text{G4, region}} / N_{\text{G4, total}}}{N_{\text{BC, region}} / N_{\text{BC, total}}}

where N_G4,region denotes the number of G4-related SNVs in the given genomic region. N_G4,total represents the total number of G4-related SNVs. N_BC,region is the number of breast cancer SNVs in the genomic region. N_BC,total stands for the total number of breast cancer SNVs. An enrichment score greater than 1 suggests an over-representation of G4-related SNVs in the specified genomic region relative to the background SNV distribution in breast cancer. Fisher’s exact tests were conducted to assess the enrichment of the SNVs in BC-G4s.
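
The enrichment score defined above is a ratio of two proportions and can be computed directly. The sketch below is illustrative only: the function name is hypothetical, and the counts in the usage note are made-up round numbers, not values from this study.

```python
def enrichment_score(n_g4_region, n_g4_total, n_bc_region, n_bc_total):
    """Ratio of the G4-related SNV proportion to the background SNV proportion.

    n_g4_region: G4-related SNVs in the genomic region
    n_g4_total:  all G4-related SNVs
    n_bc_region: breast cancer SNVs in the genomic region
    n_bc_total:  all breast cancer SNVs
    """
    if n_g4_total <= 0 or n_bc_region <= 0 or n_bc_total <= 0:
        raise ValueError("totals and background region count must be positive")
    g4_proportion = n_g4_region / n_g4_total
    background_proportion = n_bc_region / n_bc_total
    return g4_proportion / background_proportion
```

For example, with hypothetical counts of 30 of 100 G4-related SNVs and 150 of 1000 background SNVs in a region, the proportions are 0.30 and 0.15, giving a score of about 2, that is, a two-fold over-representation.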

In addition, 30-nucleotide sequences flanking each SNV position were extracted with bedtools getfasta. For each pair, both the reference and variant sequences were searched for putative quadruplex motifs using the pqsfinder package (version 2.14.1) with default parameters [76]. We selected the newly formed G4 motifs with in vivo experimental support. Based on our previous study, genes harboring SNVs in the first introns and upstream regions that affect G4 structures or promote the formation of new G4 motifs were input into Phenolyzer to search for seed genes associated with breast cancer (Supplementary Table S2, Supplementary Figure S4C,D) [43]. The Kaplan–Meier method was used to estimate overall survival (OS), and the log-rank test was used to compare survival between groups.
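
The comparison of reference and variant sequences can be illustrated with a toy example. The sketch below does not use pqsfinder or its scoring scheme: as a stand-in it applies the canonical regular-expression motif (four runs of at least three guanines separated by loops of 1 to 7 nucleotides), searched on both strands. The function names and the example sequence are hypothetical.

```python
import re

# Canonical G4 motif; a deliberate simplification, not pqsfinder's algorithm.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def apply_snv(window, offset, ref, alt):
    """Return the window with the base at `offset` changed from ref to alt."""
    if window[offset] != ref:
        raise ValueError("reference base does not match the sequence")
    return window[:offset] + alt + window[offset + 1:]


def has_g4_motif(sequence):
    """True if the canonical motif occurs on either strand."""
    sequence = sequence.upper()
    reverse_complement = sequence.translate(COMPLEMENT)[::-1]
    return bool(G4_PATTERN.search(sequence) or G4_PATTERN.search(reverse_complement))


def snv_creates_g4(window, offset, ref, alt):
    """True if the motif is absent in the reference window but present in the variant."""
    return (not has_g4_motif(window)) and has_g4_motif(apply_snv(window, offset, ref, alt))
```

In the hypothetical window TTGGGAGGGTGGATGGGTT, changing the A at offset 12 to G completes a fourth run of three guanines, so the variant matches the motif while the reference does not.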

4.10. Statistical Analysis

Wilcoxon tests were conducted to compare gene expression values between groups in this study. To validate the higher frequency of BC-G4s in the first introns, a permutation test was performed by randomizing the intronic BC-G4 positions across all intron regions using bedtools shuffle function. The proportion of BC-G4s in the first introns was calculated in each of 10,000 random permutations and compared to the observed proportion. We conducted Fisher’s exact test to assess the enrichment of BC-G4s in promoters and the first introns of DEGs. Spearman’s correlation coefficient was calculated to investigate the relationship between G4–TF interactions in the first introns and the cognate promoters.
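
The logic of the permutation test can be sketched as follows. This is a simplified stand-in for bedtools shuffle, under stated assumptions: each G4 is placed independently in an intron with probability proportional to intron length, and the empirical p-value uses a +1 correction, which is a common convention rather than something stated in the text. Function names are hypothetical.

```python
import random


def first_intron_fraction(introns, n_g4, rng):
    """Randomly place n_g4 G4s across introns and return the first-intron fraction.

    introns: list of (length, is_first_intron) tuples.
    """
    lengths = [length for length, _ in introns]
    picks = rng.choices(range(len(introns)), weights=lengths, k=n_g4)
    return sum(1 for index in picks if introns[index][1]) / n_g4


def permutation_test(observed_fraction, introns, n_g4, n_permutations=10000, seed=0):
    """Return (empirical one-sided p-value, list of permuted fractions)."""
    rng = random.Random(seed)
    null = [first_intron_fraction(introns, n_g4, rng) for _ in range(n_permutations)]
    n_extreme = sum(1 for value in null if value >= observed_fraction)
    p_value = (n_extreme + 1) / (n_permutations + 1)
    return p_value, null
```

If the observed first-intron proportion exceeds every permuted value, the smallest attainable p-value is 1/(n_permutations + 1), which for 10,000 permutations is roughly 1 × 10⁻⁴.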

5. Conclusions

In summary, this work is a pioneering exploration of the genome-wide characteristics of BC-G4s. Based on TF enrichment patterns and the impact of SNVs on BC-G4s, we have identified several important G4s and G4-related genes that may serve as potential drug targets for future breast cancer therapy. Our analytical framework may also be useful in the search for novel treatments for cancers other than breast cancer.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms26146874/s1.

Author Contributions

Conceptualization, H.S. and X.S.; methodology, H.S.; software, H.S.; validation, H.S. and K.X.; formal analysis, H.S. and K.X.; investigation, H.S. and K.X.; resources, H.S. and K.X.; data curation, H.S. and K.X.; writing—original draft preparation, H.S.; writing—review and editing, H.S., K.X., W.Z., R.Z., T.T. and X.S.; visualization, H.S.; supervision, X.S.; project administration, X.S.; funding acquisition, K.X. and X.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Leading Technology Program of Jiangsu Province (grant number BK20222008), the National Natural Science Foundation of China (No. 62472084 and 62002060), and the Fundamental Research Funds for the Central Universities of China (No. 2242023K5005).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The BC-G4 data that support the findings are openly available on GitHub at https://github.com/hlshu/BC-G4peak/ (accessed on 11 June 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

G4	G-quadruplex
TSS	transcription start site
TF	transcription factor
SNV	single nucleotide variant
PDTX	patient-derived tumor xenograft
DEG	differentially expressed gene
PPI	protein–protein interaction
MFE	minimum free energy

References

1. Varshney, D.; Spiegel, J.; Zyner, K.; Tannahill, D.; Balasubramanian, S. The regulation and functions of DNA and RNA G-quadruplexes. Nat. Rev. Mol. Cell Biol. 2020, 21, 459–474.
2. Rhodes, D.; Lipps, H.J. G-quadruplexes and their regulatory roles in biology. Nucleic Acids Res. 2015, 43, 8627–8637.
3. Maizels, N.; Gray, L.T. The G4 genome. PLoS Genet. 2013, 9, e1003468.
4. Todd, A.K.; Johnston, M.; Neidle, S. Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res. 2005, 33, 2901–2907.
5. Kosiol, N.; Juranek, S.; Brossart, P.; Heine, A.; Paeschke, K. G-quadruplexes: A promising target for cancer therapy. Mol. Cancer 2021, 20, 40.
6. Georgakopoulos-Soares, I.; Parada, G.E.; Wong, H.Y.; Medhi, R.; Furlan, G.; Munita, R.; Miska, E.A.; Kwok, C.K.; Hemberg, M. Alternative splicing modulation by G-quadruplexes. Nat. Commun. 2022, 13, 2404.
7. Yuan, J.; He, X.; Wang, Y. G-quadruplex DNA contributes to RNA polymerase II-mediated 3D chromatin architecture. Nucleic Acids Res. 2023, 51, 8434–8446.
8. Lorenzatti, A.; Piga, E.J.; Gismondi, M.; Binolfi, A.; Margarit, E.; Calcaterra, N.B.; Armas, P. Genetic variations in G-quadruplex forming sequences affect the transcription of human disease-related genes. Nucleic Acids Res. 2023, 51, 12124–12139.
9. Zhang, R.; Shu, H.; Wang, Y.; Tao, T.; Tu, J.; Wang, C.; Mergny, J.L.; Sun, X. G-Quadruplex Structures Are Key Modulators of Somatic Structural Variants in Cancers. Cancer Res. 2023, 83, 1234–1248.
10. Armas, P.; David, A.; Calcaterra, N.B. Transcriptional control by G-quadruplexes: In vivo roles and perspectives for specific intervention. Transcription 2017, 8, 21–25.
11. Armas, P.; Calcaterra, N.B. G-quadruplex in animal development: Contribution to gene expression and genomic heterogeneity. Mech. Dev. 2018, 154, 64–72.
12. Robinson, J.; Raguseo, F.; Nuccio, S.P.; Liano, D.; Di Antonio, M. DNA G-quadruplex structures: More than simple roadblocks to transcription? Nucleic Acids Res. 2021, 49, 8419–8431.
13. Halder, K.; Halder, R.; Chowdhury, S. Genome-wide analysis predicts DNA structural motifs as nucleosome exclusion signals. Mol. Biosyst. 2009, 5, 1703–1712.
14. Huppert, J.L.; Balasubramanian, S. G-quadruplexes in promoters throughout the human genome. Nucleic Acids Res. 2007, 35, 406–413.
15. Shen, J.Z.; Varshney, D.; Simeone, A.; Zhang, X.Y.; Adhikari, S.; Tannahill, D.; Balasubramanian, S. Promoter G-quadruplex folding precedes transcription and is controlled by chromatin. Genome Biol. 2021, 22, 143.
16. Hansel-Hertsch, R.; Beraldi, D.; Lensing, S.V.; Marsico, G.; Zyner, K.; Parry, A.; Di Antonio, M.; Pike, J.; Kimura, H.; Narita, M.; et al. G-quadruplex structures mark human regulatory chromatin. Nat. Genet. 2016, 48, 1267–1272.
17. Spiegel, J.; Cuesta, S.M.; Adhikari, S.; Hansel-Hertsch, R.; Tannahill, D.; Balasubramanian, S. G-quadruplexes are transcription factor binding hubs in human chromatin. Genome Biol. 2021, 22, 117.
18. Cogoi, S.; Paramasivam, M.; Membrino, A.; Yokoyama, K.K.; Xodo, L.E. The KRAS Promoter Responds to Myc-associated Zinc Finger and Poly(ADP-ribose) Polymerase 1 Proteins, Which Recognize a Critical Quadruplex-forming GA-element. J. Biol. Chem. 2010, 285, 22003–22016.
19. Raiber, E.A.; Kranaster, R.; Lam, E.; Nikan, M.; Balasubramanian, S. A non-canonical DNA structure is a binding motif for the transcription factor SP1. Nucleic Acids Res. 2012, 40, 1499–1508.
20. Jo, S.S.; Choi, S.S. Analysis of the Functional Relevance of Epigenetic Chromatin Marks in the First Intron Associated with Specific Gene Expression Patterns. Genome Biol. Evol. 2019, 11, 786–797.
21. Park, S.G.; Hannenhalli, S.; Choi, S.S. Conservation in first introns is positively associated with the number of exons within genes and the presence of regulatory epigenetic signals. BMC Genom. 2014, 15, 526.
22. Fuxman Bass, J.I.; Tamburino, A.M.; Mori, A.; Beittel, N.; Weirauch, M.T.; Reece-Hoyes, J.S.; Walhout, A.J. Transcription factor binding to Caenorhabditis elegans first introns reveals lack of redundancy with gene promoters. Nucleic Acids Res. 2014, 42, 153–162.
23. Zhu, W.Y.; Huang, H.; Ming, W.L.; Zhang, R.X.; Gu, Y.; Bai, Y.F.; Liu, X.A.; Liu, H.D.; Liu, Y.; Gu, W.J.; et al. Delineating highly transcribed noncoding elements landscape in breast cancer. Comput. Struct. Biotechnol. J. 2023, 21, 4432–4445.
24. Eddy, J.; Maizels, N. Conserved elements with potential to form polymorphic G-quadruplex structures in the first intron of human genes. Nucleic Acids Res. 2008, 36, 1321–1333.
25. Lappalainen, T.; MacArthur, D.G. From variant to function in human disease genetics. Science 2021, 373, 1464–1468.
26. Maurano, M.T.; Humbert, R.; Rynes, E.; Thurman, R.E.; Haugen, E.; Wang, H.; Reynolds, A.P.; Sandstrom, R.; Qu, H.; Brody, J.; et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 2012, 337, 1190–1195.
27. Gong, J.Y.; Wen, C.J.; Tang, M.L.; Duan, R.F.; Chen, J.N.; Zhang, J.Y.; Zheng, K.W.; He, Y.D.; Hao, Y.H.; Yu, Q.; et al. G-quadruplex structural variations in human genome associated with single-nucleotide variations and their impact on gene activity. Proc. Natl. Acad. Sci. USA 2021, 118, e2013230118.
28. Neupane, A.; Chariker, J.H.; Rouchka, E.C. Analysis of Nucleotide Variations in Human G-Quadruplex Forming Regions Associated with Disease States. Genes 2023, 14, 2125.
29. Hansel-Hertsch, R.; Simeone, A.; Shea, A.; Hui, W.W.I.; Zyner, K.G.; Marsico, G.; Rueda, O.M.; Bruna, A.; Martin, A.; Zhang, X.; et al. Landscape of G-quadruplex DNA structural regions in breast cancer. Nat. Genet. 2020, 52, 878–883.
30. Bray, F.; Laversanne, M.; Sung, H.; Ferlay, J.; Siegel, R.L.; Soerjomataram, I.; Jemal, A. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2024, 74, 229–263.
31. Hui, W.W.I.; Simeone, A.; Zyner, K.G.; Tannahill, D.; Balasubramanian, S. Single-cell mapping of DNA G-quadruplex structures in human cancer cells. Sci. Rep. 2021, 11, 23641.
32. Bradnam, K.R.; Korf, I. Longer First Introns Are a General Property of Eukaryotic Gene Structure. PLoS ONE 2008, 3, e3093.
33. Rahman, M.; Jackson, L.K.; Johnson, W.E.; Li, D.Y.; Bild, A.H.; Piccolo, S.R. Alternative preprocessing of RNA-Sequencing data in The Cancer Genome Atlas leads to improved analysis results. Bioinformatics 2015, 31, 3666–3672.
34. Aran, D.; Camarda, R.; Odegaard, J.; Paik, H.; Oskotsky, B.; Krings, G.; Goga, A.; Sirota, M.; Butte, A.J. Comprehensive analysis of normal adjacent to tumor transcriptomes. Nat. Commun. 2017, 8, 1077.
35. Risso, D.; Ngai, J.; Speed, T.P.; Dudoit, S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 2014, 32, 896–902.
36. Chen, H.; Pugh, B.F. What do Transcription Factors Interact With? J. Mol. Biol. 2021, 433, 166883.
37. Haberle, V.; Stark, A. Eukaryotic core promoters and the functional basis of transcription initiation. Nat. Rev. Mol. Cell Biol. 2018, 19, 621–637.
38. Lambert, S.A.; Jolma, A.; Campitelli, L.F.; Das, P.K.; Yin, Y.; Albu, M.; Chen, X.; Taipale, J.; Hughes, T.R.; Weirauch, M.T. The Human Transcription Factors. Cell 2018, 172, 650–665.
39. Zhao, Y.B.; Vartak, S.V.; Conte, A.; Wang, X.; Garcia, D.A.; Stevens, E.; Jung, S.K.; Kieffer-Kwon, K.R.; Vian, L.; Stodola, T.; et al. “Stripe” transcription factors provide accessibility to co-binding partners in mammalian genomes. Mol. Cell 2022, 82, 3398–3411.
40. Morafraile, E.C.; Pérez-Peña, J.; Fuentes-Antrás, J.; Manzano, A.; Pérez-Segura, P.; Pandiella, A.; Galán-Moya, E.M.; Ocaña, A. Genomic Correlates of DNA Damage in Breast Cancer Subtypes. Cancers 2021, 13, 2117.
41. Zhang, N.; Gorin, A.; Majumdar, A.; Kettani, A.; Chernichenko, N.; Skripkin, E.; Patel, D.J. Dimeric DNA quadruplex containing major groove-aligned A•T•A•T and G•C•G•C tetrads stabilized by inter-subunit Watson-Crick A•T and G•C pairs. J. Mol. Biol. 2001, 312, 1073–1088.
42. Guiblet, W.M.; DeGiorgio, M.; Cheng, X.; Chiaromonte, F.; Eckert, K.A.; Huang, Y.F.; Makova, K.D. Selection and thermostability suggest G-quadruplexes are novel functional elements of the human genome. Genome Res. 2021, 31, 1136–1149.
43. Yang, H.; Robinson, P.N.; Wang, K. Phenolyzer: Phenotype-based prioritization of candidate genes for human diseases. Nat. Methods 2015, 12, 841–843.
44. Chambers, V.S.; Marsico, G.; Boutell, J.M.; Di Antonio, M.; Smith, G.P.; Balasubramanian, S. High-throughput sequencing of DNA G-quadruplex structures in the human genome. Nat. Biotechnol. 2015, 33, 877–881.
45. Lyu, J.; Shao, R.; Kwong Yung, P.Y.; Elsasser, S.J. Genome-wide mapping of G-quadruplex structures with CUT&Tag. Nucleic Acids Res. 2022, 50, e13.
46. Zheng, K.W.; Zhang, J.Y.; He, Y.D.; Gong, J.Y.; Wen, C.J.; Chen, J.N.; Hao, Y.H.; Zhao, Y.; Tan, Z. Detection of genomic G-quadruplexes in living cells using a small artificial protein. Nucleic Acids Res. 2020, 48, 11706–11720.
47. Fuentes-Antras, J.; Bedard, P.L.; Cescon, D.W. Seize the engine: Emerging cell cycle targets in breast cancer. Clin. Transl. Med. 2024, 14, e1544.
48. Spring, L.M.; Wander, S.A.; Andre, F.; Moy, B.; Turner, N.C.; Bardia, A. Cyclin-dependent kinase 4 and 6 inhibitors for hormone receptor-positive breast cancer: Past, present, and future. Lancet 2020, 395, 817–827.
49. Weinberg, R.A. The retinoblastoma protein and cell cycle control. Cell 1995, 81, 323–330.
50. Wang, J.Y.; Knudsen, E.S.; Welch, P.J. The retinoblastoma tumor suppressor protein. Adv. Cancer Res. 1994, 64, 25–85.
51. Dickler, M.N.; Tolaney, S.M.; Rugo, H.S.; Cortés, J.; Diéras, V.; Patt, D.; Wildiers, H.; Hudis, C.A.; O’Shaughnessy, J.; Zamora, E.; et al. MONARCH 1, A Phase II Study of Abemaciclib, a CDK4 and CDK6 Inhibitor, as a Single Agent, in Patients with Refractory HR+/HER2-Metastatic Breast Cancer. Clin. Cancer Res. 2017, 23, 5218–5224.
52. Finn, R.S.; Martin, M.; Rugo, H.S.; Jones, S.; Im, S.-A.; Gelmon, K.; Harbeck, N.; Lipatov, O.N.; Walshe, J.M.; Moulder, S.; et al. Palbociclib and Letrozole in Advanced Breast Cancer. N. Engl. J. Med. 2016, 375, 1925–1936.
53. Hortobagyi, G.N.; Stemmer, S.M.; Burris, H.A.; Yap, Y.-S.; Sonke, G.S.; Paluch-Shimon, S.; Campone, M.; Blackwell, K.L.; André, F.; Winer, E.P.; et al. Ribociclib as First-Line Therapy for HR-Positive, Advanced Breast Cancer. N. Engl. J. Med. 2016, 375, 1738–1748.
54. Im, S.-A.; Lu, Y.-S.; Bardia, A.; Harbeck, N.; Colleoni, M.; Franke, F.; Chow, L.; Sohn, J.; Lee, K.-S.; Campos-Gomez, S.; et al. Overall Survival with Ribociclib plus Endocrine Therapy in Breast Cancer. N. Engl. J. Med. 2019, 381, 307–316.
55. Turner, N.C.; Ro, J.; André, F.; Loi, S.; Verma, S.; Iwata, H.; Harbeck, N.; Loibl, S.; Huang Bartlett, C.; Zhang, K.; et al. Palbociclib in Hormone-Receptor-Positive Advanced Breast Cancer. N. Engl. J. Med. 2015, 373, 209–219.
56. Turner, N.C.; Slamon, D.J.; Ro, J.; Bondarenko, I.; Im, S.-A.; Masuda, N.; Colleoni, M.; DeMichele, A.; Loi, S.; Verma, S.; et al. Overall Survival with Palbociclib and Fulvestrant in Advanced Breast Cancer. N. Engl. J. Med. 2018, 379, 1926–1936.
57. Wander, S.A.; Cohen, O.; Johnson, G.N.; Kim, D.; Luo, F.; Mao, P.; Nayar, U.; Helvie, K.; Marini, L.; Freeman, S.; et al. Whole exome sequencing (WES) in hormone-receptor positive (HR+) metastatic breast cancer (MBC) to identify mediators of resistance to cyclin-dependent kinase 4/6 inhibitors (CDK4/6i). J. Clin. Oncol. 2018, 36, 12016.
58. Dai, Y.; Teng, X.; Zhang, Q.; Hou, H.; Li, J. Advances and challenges in identifying and characterizing G-quadruplex-protein interactions. Trends Biochem. Sci. 2023, 48, 894–909.
59. Shu, H.; Zhang, R.; Xiao, K.; Yang, J.; Sun, X. G-Quadruplex-Binding Proteins: Promising Targets for Drug Design. Biomolecules 2022, 12, 648.
60. Pramanik, S.; Chen, Y.; Song, H.; Khutsishvili, I.; Marky, L.A.; Ray, S.; Natarajan, A.; Singh, P.K.; Bhakat, K.K. The human AP-endonuclease 1 (APE1) is a DNA G-quadruplex structure binding protein and regulates KRAS expression in pancreatic ductal adenocarcinoma cells. Nucleic Acids Res. 2022, 50, 3394–3412.
61. Cogoi, S.; Xodo, L.E. G-quadruplex formation within the promoter of the KRAS proto-oncogene and its effect on transcription. Nucleic Acids Res. 2006, 34, 2536–2549.
62. Wang, K.-B.; Liu, Y.; Li, J.; Xiao, C.; Wang, Y.; Gu, W.; Li, Y.; Xia, Y.-Z.; Yan, T.; Yang, M.-H.; et al. Structural insight into the bulge-containing KRAS oncogene promoter G-quadruplex bound to berberine and coptisine. Nat. Commun. 2022, 13, 6016.
63. Cogoi, S.; Paramasivam, M.; Spolaore, B.; Xodo, L.E. Structural polymorphism within a regulatory element of the human KRAS promoter: Formation of G4-DNA recognized by nuclear proteins. Nucleic Acids Res. 2008, 36, 3765–3780.
64. Chen, K.; Zhang, Y.; Qian, L.; Wang, P. Emerging strategies to target RAS signaling in human cancer therapy. J. Hematol. Oncol. 2021, 14, 116.
65. Wang, Q.; Li, M.; Wu, T.; Zhan, L.; Li, L.; Chen, M.; Xie, W.; Xie, Z.; Hu, E.; Xu, S.; et al. Exploring Epigenomic Datasets by ChIPseeker. Curr. Protoc. 2022, 2, e585.
66. Eisenberg, E.; Levanon, E.Y. Human housekeeping genes, revisited. Trends Genet. 2013, 29, 569–574.
67. Zhou, X.; Lindsay, H.; Robinson, M.D. Robustly detecting differential expression in RNA sequencing data using observation weights. Nucleic Acids Res. 2014, 42, e91.
68. Li, Y.; Ge, X.; Peng, F.; Li, W.; Li, J.J. Exaggerated false positives by popular differential expression methods when analyzing human population samples. Genome Biol. 2022, 23, 79.
69. Oki, S.; Ohta, T.; Shioi, G.; Hatanaka, H.; Ogasawara, O.; Okuda, Y.; Kawaji, H.; Nakaki, R.; Sese, J.; Meno, C. ChIP-Atlas: A data-mining suite powered by full integration of public ChIP-seq data. EMBO Rep. 2018, 19, e46255.
70. Zou, Z.; Ohta, T.; Miura, F.; Oki, S. ChIP-Atlas 2021 update: A data-mining suite for exploring epigenomic landscapes by fully integrating ChIP-seq, ATAC-seq and Bisulfite-seq data. Nucleic Acids Res. 2022, 50, W175–W182.
71. Szklarczyk, D.; Franceschini, A.; Wyder, S.; Forslund, K.; Heller, D.; Huerta-Cepas, J.; Simonovic, M.; Roth, A.; Santos, A.; Tsafou, K.P.; et al. STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015, 43, D447–D452.
72. Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N.S.; Wang, J.T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13, 2498–2504.
73. Rauluseviciute, I.; Riudavets-Puig, R.; Blanc-Mathieu, R.; Castro-Mondragon, J.A.; Ferenc, K.; Kumar, V.; Lemma, R.B.; Lucas, J.; Chéneby, J.; Baranasic, D.; et al. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2023, 52, D174–D182.
74. Zhang, J.; Bajari, R.; Andric, D.; Gerthoffert, F.; Lepsa, A.; Nahal-Bose, H.; Stein, L.D.; Ferretti, V. The International Cancer Genome Consortium Data Portal. Nat. Biotechnol. 2019, 37, 367–369.
75. Gruber, A.R.; Lorenz, R.; Bernhart, S.H.; Neubock, R.; Hofacker, I.L. The Vienna RNA websuite. Nucleic Acids Res. 2008, 36, W70–W74.
76. Hon, J.; Martinek, T.; Zendulka, J.; Lexa, M. pqsfinder: An exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in R. Bioinformatics 2017, 33, 3373–3379.

Figure 1. Bioinformatics pipeline towards the identification of vital G4s and G4-related genes associated with breast cancer. We first integrated G4 data from 22 breast cancer PDTX models and MCF7 cell line to identify BC-G4s in the human genome. The importance of G4s in gene promoters and the first introns was then demonstrated through genome-wide analysis of BC-G4s. The crucial BC-G4s and G4-related genes were characterized according to differential expression analysis, TF enrichment patterns and genetic variation analysis.

Figure 2. Genome-wide analysis of BC-G4s. (A) Distribution patterns of BC-G4s in the human genome. (B) Violin plots show the expression levels of genes harboring promoter or intronic BC-G4s compared with those without G4 regions. The yellow violin plot stands for the G4 group, whereas the green plot represents the non-G4 group. (C) Violin plots show the differences among the expression levels of the four gene categories. The red, light blue, green and dark blue violin plots signify the genes with both promoter and intronic BC-G4s, genes with only promoter G4, genes with only intronic G4s and genes without these G4 regions, respectively. (D) Distribution patterns of BC-G4s in intronic regions (from intron 1 to intron 8). (E) Histograms depict the proportions of BC-G4s in the first introns estimated during 10,000 random iterations. The actual proportion of BC-G4s in the first introns (50.8%) is marked by a purple line in the right panel, while the proportions of BC-G4s from the 10,000 permutations are shown as histograms in the left panel. (F) Boxplots show the expression of genes with BC-G4s in distinct introns. The dark red and green boxes indicate the genes accommodating BC-G4s in the first introns or the other introns, respectively. (G) Density plot illustrates the normalized distance of BC-G4s to 5′ end of the first intron. (H) Density plot illustrates the normalized distance of BC-G4s to 5′ end of the promoter. **** p < 0.0001, Wilcoxon two-sided test.

Figure 3. Characteristics of BC-G4s in the promoters and the first introns of the differentially expressed genes (DEGs). (A) The volcano plot depicts DEGs between tumor and NAT groups in breast cancer. The up-regulated and down-regulated genes are shown in red and blue, respectively. (B,C) KEGG pathway enrichment results of DEGs with BC-G4s in gene promoters and the first introns, respectively. DEGs accommodating BC-G4s in promoters and in the first introns are both significantly enriched in the cell cycle pathway. (D,E) Distributions of the normalized distance of BC-G4s to the 5′ end of the first intron and of the promoter in up-regulated and down-regulated genes, respectively.

Figure 4. Relationship between promoter and first intron G4–TF interactions. (A) TF binding to intronic and promoter BC-G4s. The average number (±SEM) of TFs that interact with BC-G4s is plotted. The blue, green and purple bars represent BC-G4s located in the first introns, other introns, and promoters, respectively. (B) Spearman correlation between the numbers of promoter and first intron G4–TF interactions. (C) The barplot shows the gene-by-gene distribution of the proportion of TFs that bind both promoter and first intron G4s, relative to the total TFs bound by first intron G4s. **** p < 0.0001, Mann–Whitney U test.

Figure 5. Transcription factor regulatory network analysis. (A) The enrichment barplot shows the top 20 TFs most significantly enriched in promoter BC-G4s of DEGs. (B) STRING analysis of protein–protein interactions (PPI) among 105 proteins. A total of 219 edges are found between 86 of the proteins, whereas only 31 are expected by chance (p-value < 1 × 10⁻¹⁶). (C,D) The top 2 significant protein clusters from the PPI network. (C) Cluster 1 and its top 5 enriched GO terms. (D) Cluster 2 and its top 5 enriched GO terms. (E) DNA-binding profiles retrieved from the JASPAR database for the 6 human stripe factors in the two clusters, excluding RAD21. (F) Boxplot of the expression levels of AURKA and CSTF1, the two genes whose promoter BC-G4s were enriched with the most TF–TF interactions. A significant difference is observed between tumor and NAT samples for both genes.

Figure 6. Analysis of SNVs associated with BC-G4s. (A) DNA-based minimum free energy (MFE) shows the thermodynamic changes caused by SNVs within the G-tract region of BC-G4s. (B) The distribution patterns of different SNV types in the G-tract or loop region that lead to structural alterations of BC-G4s. (C) Barplots show the structural effects of G4-related SNVs in distinct genomic features (promoters, 5′ UTR, exons, CDS, 3′ UTR, introns, and the first introns). The green line depicts the enrichment of the SNV distribution within BC-G4s in comparison to the background SNV distribution in breast cancer. (D) Kaplan–Meier survival plot for SNVs located in the G-tract and loop regions. **** p < 0.0001, Wilcoxon two-sided test.

Figure 7. The key genes and SNVs associated with BC-G4s. (A) Boxplots of the expression levels and the corresponding SNV information of the top 6 seed genes in which SNVs in gene promoters and the first introns cause structural alterations of BC-G4s. (B) Boxplots of KRAS expression and the formation of a new G4 motif in the promoter region. (C) Track intensities of G4-seq for observed G4 (oG4) in vitro (blue), G4 ChIP-seq for HepG2 and K562 cell lines (light gray), and qG4-ChIP-seq for several example PDTX models (pink and green) around the SNV position (red) in the KRAS promoter region.

Figure 8. TF interactions with BC-G4s in promoters and the first introns cooperatively regulate gene transcription (BC-G4s form on the non-transcribed strand).

Table 1. The co-localized sites of TFs are significantly enriched in promoter BC-G4s of DEGs (G4Rs), and the sites are classified by the number (#) of these binding proteins.

Class Labeled by # of Proteins	# of Sites	Overlaps in G4R	Overlaps in Control	p-Value
Cluster 1
≥2	56,089	267/424	6/424	2.00578 × 10⁻⁹⁷
≥3	27,651	217/424	3/424	1.801687 × 10⁻⁷⁶
≥4	16,064	172/424	3/424	2.081164 × 10⁻⁵⁶
≥5	9648	116/424	3/424	1.390859 × 10⁻³⁴
≥6	3483	44/424	0/424	3.505748 × 10⁻¹⁴
≥7	739	15/424	0/424	5.381052 × 10⁻⁵
≥8	127	5/424	0/424	0.06176341
Cluster 2
≥2	47,269	164/424	10/424	1.053576 × 10⁻⁴⁴
≥3	32,097	74/424	5/424	3.748428 × 10⁻¹⁸
≥4	23,600	42/424	3/424	3.429332 × 10⁻¹⁰
≥5	18,483	36/424	1/424	2.652893 × 10⁻¹⁰
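
The overlap counts in Table 1 (for example, 267/424 in G4R versus 6/424 in the control) form 2 × 2 contingency tables. The table does not name the test behind the p-values, so the following is only a sketch under the assumption of a two-sided Fisher's exact test, written with the Python standard library; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows: G4R and control; columns: overlapping and non-overlapping regions.
    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Unnormalised integer weights of every table with the fixed margins.
    weights = {k: comb(row1, k) * comb(row2, col1 - k)
               for k in range(k_min, k_max + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(row1 + row2, col1)


if __name__ == "__main__":
    # Cluster 1, class >=8: 5/424 overlaps in G4R versus 0/424 in control.
    print(fisher_exact_two_sided(5, 424 - 5, 0, 424))
    # Cluster 1, class >=2: 267/424 versus 6/424.
    print(fisher_exact_two_sided(267, 424 - 267, 6, 424 - 6))
```

Under this assumption, the ≥8 class of Cluster 1 evaluates to approximately 0.0618, close to the tabulated value; this is consistent with, but does not confirm, the choice of test. Keeping the weights as exact integers avoids floating-point underflow for the very small p-values in the upper rows.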

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

