DNA G-Quadruplex-Binding Proteins: An Updated Overview

DNA G-quadruplexes (G4s) are non-canonical secondary structures formed in guanine-rich sequences. Within the human genome, G4s are found in regulatory regions such as gene promoters and telomeres to control replication, transcription, and telomere lengthening. In the cellular context, there are several proteins named as G4-binding proteins (G4BPs) that interact with G4s, either anchoring upon, stabilizing


Introduction
DNA can adopt alternative secondary structures beyond the double helix. In 1910, it was first reported that guanine derivatives form gels at high concentration in aqueous solution [1]. This property was largely unexplored until the 1960s, when X-ray fiber diffraction studies identified the G-tetrad structure. G-tetrads were postulated to be the basis for gelation [2]. In G-tetrads, four guanine bases are arranged via Hoogsteen base pairing in planar tetrameric squares (Figure 1). Several proximally located G-tetrads self-stack through π-π interactions and are further stabilized by centrally coordinated monovalent or divalent cations, forming a three-dimensional structure termed a G-quadruplex (G4) [3]. G4 structures remained overlooked until human telomeres were discovered to be composed of tandem repeats of guanine-rich DNA sequences [4]. From then on, G4s have been reported as highly prevalent in the human genome [5]. Moreover, G4s are highly conserved among different species [6], ranging from viruses [7] to prokaryotes [8] and other eukaryotes [9]. Within the genome, G4s are not randomly distributed. Instead, G4s are clustered in key regulatory sites such as gene promoters and telomeres, as well as in gene bodies [10]. Interestingly, nucleosome-depleted regions and promoters of actively transcribed genes are significantly enriched in G4s [11].
Altogether, these observations strongly support their close involvement in functions of the genome including DNA replication, transcription, and epigenetic modification [12].
The potential G4 motif is reported with the sequence $G_{x}$-$N_{1-7}$-$G_{x}$-$N_{1-7}$-$G_{x}$-$N_{1-7}$-$G_{x}$, where x ranges from 3 to 6 and N corresponds to any nucleotide (A, G, T or C) forming intermediate loops. G4s comprise several polymorphic structures and can fold into various topologies depending on the relative direction of the strands (i.e., parallel, antiparallel, or hybrid), the number of strands involved (i.e., intra or intermolecular), and the number of stacking G-tetrads (Figure 1) [13]. G4s are readily formed in vitro with temperature and buffer modifications, whilst the conditions under which G4s fold in vivo are not completely understood. It is presumed that several potential folding intermediates exist with favorable minima en route to the G4 form [14]. Once formed, the thermodynamic stability of G4s seems to be similar to the double helix DNA of similar length [15]. However, G4 stability and folding kinetics remain a topic of research.
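As an illustration of how the consensus motif above can be searched for in a sequence, the following minimal sketch expresses it as a regular expression. It is not the algorithm used in the cited studies: the function name `find_putative_g4s` is hypothetical, each G-tract is assumed to vary independently between 3 and 6 guanines, and only the strand supplied is scanned (a G4 on the opposite strand would appear as a C-rich motif and is not handled here).

```python
import re

# Consensus from the text: G(x)-N(1-7)-G(x)-N(1-7)-G(x)-N(1-7)-G(x), x = 3..6.
# Assumption: each G-tract length is chosen independently within 3..6.
# The lookahead lets overlapping candidates be reported as well.
G4_MOTIF = re.compile(r"(?=((?:G{3,6}[ACGT]{1,7}){3}G{3,6}))")


def find_putative_g4s(sequence):
    """Return (start, matched_sequence) for each putative G4 motif on this strand."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in G4_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Four human telomeric repeats (TTAGGG), as discussed in the text.
    telomeric = "TTAGGG" * 4
    for start, motif in find_putative_g4s(telomeric):
        print(start, motif)
```

A match only indicates sequence potential; as the following paragraphs discuss, many sequences that fit the motif are not detected as folded G4s in chromatin.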
Within the human genome (Figure 2), the first computational algorithms predicted that up to 375,000 G4s could be simultaneously formed [16]. Later, high-throughput G4-sequencing (G4-seq) in purified human DNA with G4-stabilizing ligands retrieved ∼700,000 G4s, accounting for ∼350,000 G4s that were not previously identified by bioinformatics and correspond to non-canonical G4s such as bulged and long loop structures [17]. More recently, the mapping of G4s by chromatin immunoprecipitation and sequencing (ChIP-seq) using the anti-G4 antibody, BG4, only revealed ∼1000-10,000 G4s [18]. Therefore, the number of G4s in chromatin accounted for ∼1% of those G4s identified by G4-seq. These data strongly support that not all sequences with G4 potential form these structures in a cellular context. In fact, G4s seem to fold under certain conditions such as upon specific stress stimuli, during specific stages of the cell cycle, or in a cell type-specific manner [19]. The formation of these DNA structures is strictly controlled through several proteins that bind and stabilize or resolve them [20]. In this sense, DNA G4s are resolved during replication to allow the progression of DNA polymerase forks. Any error in G4 unfolding poses a threat to genome stability by hindering efficient replication and could lead to replication errors, DNA damage, and even chromosomal reorganization [21]. G4s are also considered physical obstacles for the transcriptional machinery and must be resolved. Alternatively, G4s recruit different protein factors that positively influence replication and transcription. In addition, G4s regulate epigenetic processes and telomere homeostasis. It is important to note that apart from functioning in the neighboring regions, the role of G4 structures expands into additional long-range mechanisms (i.e., due to the formation of enhancer-promoter loops that allow normally separated sequences to meet) [22].
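The "∼1%" figure can be checked with a short calculation. The sketch below uses only the approximate counts quoted above; the function name `percent_of_g4_seq` is introduced here for illustration.

```python
# Approximate counts quoted in the text.
G4_SEQ_COUNT = 700_000            # G4s retrieved by G4-seq in purified DNA
CHIP_SEQ_RANGE = (1_000, 10_000)  # G4s mapped in chromatin by BG4 ChIP-seq


def percent_of_g4_seq(chip_count, g4_seq_count=G4_SEQ_COUNT):
    """Share of G4-seq sites that are also detected in chromatin, in percent."""
    return 100.0 * chip_count / g4_seq_count


low, high = (percent_of_g4_seq(n) for n in CHIP_SEQ_RANGE)
print(f"{low:.2f}% to {high:.2f}% of G4-seq sites")
```

With these inputs the range is roughly 0.14% to 1.43%, which is consistent with the order of magnitude stated in the text.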
Within a cellular context, G4 formation is largely dependent on interactions with an arsenal of protein factors. These interactors shift the equilibrium of unfolded guanine-rich sequences towards folded G4s or vice versa (stabilization versus destabilization), making G4 formation extraordinarily dynamic in live cells.
The participation of proteins is inherently required for the regulation of G4 formation across the whole genome and transcriptome, as well as for the fulfillment of G4 biological functions. The proteins that specifically bind to G4s are known as G4-binding proteins (G4BPs). Interest in G4BPs for drug development is increasing considerably, owing to the possibility of targeting the downstream processes directed by the interaction between a G4BP and its G4 counterpart. Within this context, the present review aims to provide an updated overview of the functional aspects and interactions of G4BPs in what can be referred to as "the cellular interactome" of DNA G4s. We also present the information available to date on the structural properties of G4BPs and discuss how to exploit G4-G4BP interactions to develop selective therapeutic approaches. Unlike previous work, this review focuses on DNA G4s (excluding RNA G4s), includes only extensively summarized and well-corroborated information, and is fully up to date. In addition, it has been prepared with a practical purpose, avoiding many chemical and structural details, at a level accessible to readers who are not specialists in the topic but are interested in the field. Readers interested in more detailed information may consult other recently published reviews [23,24].

Figure 2. Quantity of G4s identified in the human genome with different methodologies. Bioinformatic analyses search for the consensus G4 sequence. G4-seq maps G4s by DNA polymerase stalling followed by high-throughput sequencing. In G4-seq, fragmented genomic DNA is sequenced twice: first in non-G4-stabilizing conditions (Read 1) to provide a reference, and then in G4-stabilizing conditions (Read 2) to identify the positions of G4-dependent DNA polymerase stalling. G4 ChIP-seq maps DNA G4s using chromatin immunoprecipitation with G4-specific antibodies and next-generation sequencing.

Identification of G4BPs
The commonly used methods for the identification of G4BPs include affinity chromatography and quantitative techniques based on mass spectrometry. Both methodologies are often used in combination to isolate proteins that bind to specific G4 motifs [25]. In addition, fluorescence resonance energy transfer (FRET) technology provides information not only about the existence or absence of interaction, but also about how the protein changes the conformation of the DNA G4 [26]. Many G4BPs have been identified through pull-down methods implemented with cell lysates and using specific G4 sequences. However, the concentration and distribution of biological elements in cell lysates are quite different from those inside cells. Hence, the main limitation of these approaches is that they do not consider the native chromatin state. Chromatin immunoprecipitation (ChIP) has been coupled with mass spectrometry-based proteomics analysis to characterize the composition of particular chromatin-associated protein complexes [27]. Nevertheless, these approaches require high-affinity and high-selectivity antibodies that are not always available. Computational analyses are also exploited since structural features of known G4BPs can be used to predict new interactors [28]. However, in silico predictions require further validation with the biochemical experiments described above.
To overcome these limitations, a co-binding-mediated protein profiling (CMPP) approach has been recently developed to identify G4BPs in living cells [29]. In that work, cell-permeable and functionalized ligands are designed to bind endogenous G4s in cellular chromatin. When these ligands are in close proximity to the G4BPs, the G4BPs are labeled by subsequent photo-crosslinking. Importantly, the perturbation of G4-protein interactions by the photo-proximity crosslinking of a G4-binding probe was minimal. Moreover, compared with proteomic approaches, CMPP capture takes into account the local chromatin environment in a functioning cell and facilitates the detection of transient interactions that are usually lost during cell lysis or washing steps. Another study proposed a similar approach that was also based on photo-crosslinking [30]. They used a pyrrolidine derivative of pyridostatin serving as a trifunctional probe: an efficient derivative of pyridostatin that targets G4s with extensive sequence tolerance, a diazirine group with photo-crosslinking ability to enable the crosslinking of the probe with nearby proteins, and an alkyne group for the copper(I)-catalyzed azide−alkyne cycloaddition (CuAAC) reaction and subsequent pulling down. This method was named G4 ligand-mediated crosslinking and pull-down (G4-LIMCAP) and enabled the labeling and pulling down of G4BPs in vivo. Although there is some overlap between both photo-crosslinking methodologies, most of the G4BPs identified by CMPP were not found by G4-LIMCAP. The different outcomes may arise due to variations in chromatin states, G4 biology, and protein expression levels between the different cell lines used. Regardless of the methodology, one should not overlook the possibility that the G4 ligands partially influence the endogenous G4 landscape and, thus, the G4-interactome. For instance, G4 ligands may induce the stabilization of weaker and more transient G4s or alter the folded topology of G4s, affecting G4BP binding. For these reasons, as the researchers themselves point out, it is essential to validate candidate G4 interactors with additional approaches in vitro.
To sum up, methods for the detection of G4BPs comprise in vivo, in vitro, and in silico approaches. In general, in vivo and in silico approaches are useful to identify potential G4BPs, but require in vitro approaches to confirm G4-protein interactions.

G4BPs
The nucleic acid G4-interacting proteins database (G4IPDB) contains comprehensive and updated information about proteins interacting with G4s [31]. Available as a freely accessible platform, G4IPDB includes more than 100 entries with data about interacting proteins, target sequence, PubMed reference number and binding details if available. Whilst there are multiple studies describing G4-G4BP interactions, the functional outcome remains to be determined for many pairings [32]. The present review focuses only on G4BPs that are widely characterized and supported by well-corroborated information.
The categorization of G4BPs is attainable in several ways (Figure 3). Firstly, G4BPs are divided into the following two types according to the distribution of G4s in the genome: (i) DNA and (ii) RNA G4BPs. This review only covers DNA G4BPs. Secondly, G4BPs are classified into two main types based on their functional relationships with G4s: (i) G4BPs that are recruited by G4s without affecting their structure and (ii) G4BPs that have an effect on the G4 structure. These last G4BPs are also divided into the following two categories in consonance with the structural effects on G4 structure: (i) G4-stabilizing, which promote putative G4 sequences to form a stable G4 structure and (ii) G4-destabilizing, which have the ability to unfold G4s. In the following section, we summarize recent discoveries related to the involvement of G4BPs in the regulation of cellular processes that include telomere lengthening, replication, transcription, chromatin remodeling, and histone modification. The involvement of G4BPs in these fundamental processes depends on their binding sites or, in other words, their target G4 sequences.
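The classification scheme above can be captured in a small data structure, which may be convenient when annotating lists of G4BPs. The sketch below is only one possible encoding: the class and function names (`NucleicAcid`, `StructuralEffect`, `G4BP`, `group_by_effect`) are hypothetical, and the four example assignments follow statements made elsewhere in this review (XPB binds G4 without unwinding it, nucleolin induces CMYC G4 formation, and NM23-H2 and POT1 unfold G4s).

```python
from dataclasses import dataclass
from enum import Enum


class NucleicAcid(Enum):
    DNA = "DNA"
    RNA = "RNA"


class StructuralEffect(Enum):
    NONE = "recruited without affecting G4 structure"
    STABILIZING = "promotes formation of a stable G4"
    DESTABILIZING = "unfolds G4s"


@dataclass(frozen=True)
class G4BP:
    name: str
    target: NucleicAcid
    effect: StructuralEffect


# Example assignments taken from the descriptions in this review.
EXAMPLES = [
    G4BP("XPB", NucleicAcid.DNA, StructuralEffect.NONE),
    G4BP("Nucleolin", NucleicAcid.DNA, StructuralEffect.STABILIZING),
    G4BP("NM23-H2", NucleicAcid.DNA, StructuralEffect.DESTABILIZING),
    G4BP("POT1", NucleicAcid.DNA, StructuralEffect.DESTABILIZING),
]


def group_by_effect(proteins):
    """Map each structural effect to the names of the proteins that show it."""
    groups = {effect: [] for effect in StructuralEffect}
    for protein in proteins:
        groups[protein.effect].append(protein.name)
    return groups


if __name__ == "__main__":
    for effect, names in group_by_effect(EXAMPLES).items():
        print(f"{effect.name}: {', '.join(names)}")
```

A single protein may behave differently on different G4 targets, so a fuller model would attach the effect to a protein-G4 pair rather than to the protein alone.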

Telomeric G4BPs
Telomeres constitute the ends of eukaryotic chromosomes and consist of recurrent TTAGGG-containing sequences that form a G4 structure. In fact, the human telomeric sequence was one of the first sequences discovered to form G4s [33]. There are several G4BPs that bind to telomeric G4s and unfold them to maintain the length and integrity of telomeres. In particular, the shelterin protein POT1 (Protection of Telomeres 1) binds to the 3′ overhang of telomeric repeats and regulates the unwinding of the G4s with its heterodimeric partner TPP1 (TIN2 Interacting Protein) by a conformational selection mechanism. Initially, the POT1-G4 binding was shown to be selective to antiparallel G4s, while parallel G4s were unaffected [34]. Later, POT1 was demonstrated to unfold and bind to any conformational form of human telomeric G4 (antiparallel, hybrid, or parallel) without avidly interacting with duplex DNA or with other G4 structures [35]. The unfolding activity of telomeric G4s has also been observed for replication protein A (RPA), which is the most abundant single-stranded DNA-binding protein in human cells [36]. However, the access of RPA protein to telomeric DNA is blocked by the presence of POT1-TPP1 complexes [34]. Another telomere-associated protein complex called CST (CTC1-STN1-TEN1) unwinds G4 structures more rapidly than POT1 [37]. The telomeric repeat binding factor 2 (TRF2) associates with both telomeric G4s [38] and extratelomeric G4s, such as that included in the PCGF3 promoter [39]. In addition, the helicases WRN (Werner syndrome ATP-dependent helicase) and BLM (Bloom syndrome protein) of the RecQ family unfold telomeric G4s to maintain the integrity of telomeres [40]. In vitro assays proved that the helicase-nuclease DNA2 recognizes and resolves telomeric G4s through cleavage at the G4 site [41]. Furthermore, the regulator of telomere elongation helicase 1 (RTEL1) unwinds an intermolecular G4 formed with human telomeric sequence [42].
The oncogenic fusion protein TLS/FUS binds to G4 telomere DNA, which controls histone modifications of telomeres and telomere length [43]. Moreover, heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1) participates in the resolution of G4s at the end of telomeres to stimulate telomere elongation [44]. Interestingly, UP1, which is the proteolytic cleavage product of hnRNPA1, retains its G4-unfolding activity [45]. Finally, it is important to remark that human telomerase reverse transcriptase (hTERT) itself acts as a G4 resolvase to ensure proper telomere replication [46].

G4BPs Involved in Replication
Several G4BPs are recruited to the G4 formation sites to unwind them, allowing efficient DNA replication and maintaining genome integrity [47]. However, G4s may prevent the uncoupling of the leading and lagging-strand polymerases, thereby protecting proper replication [48]. Therefore, G4BPs seem to display a dual effect on the process of replication. Apart from telomeric G4s, BLM and WRN bind and unfold other G4s. Both helicases act as counterparts. While BLM functions on the leading strand of replication, WRN exerts its helicase function on the lagging strand of replication [48]. Fanconi anemia complementation group J (FANCJ) is a DNA helicase that removes G4 structures for efficient DNA replication and, in its absence, replication machinery is stalled at G4 sites, eventually leading to DNA damage [49]. Moreover, DEAH box protein 11 (DDX11) is an Fe-S helicase that resolves G4s and other non-canonical DNA structures [50]. Petite integration frequency 1 (Pif1) is a helicase that is highly conserved across many domains of life, and its potent G4-unwinding role is conserved among Pif1 helicases to suppress genome instability [51]. Pif1 has been characterized to unfold telomeric G4s as well [52]. In addition, breast cancer type 1 susceptibility protein (BRCA1) is a tumor suppressor protein that has been recently demonstrated to act as a direct G4BP, although further structural data are required [53].

G4BPs Involved in Transcription
More than 40% of human genes contain putative G4s in their promoter regions, suggesting a role for G4s in the control of gene transcription. In fact, G4 motifs function in two interconnected processes: transcriptional termination and activation to shape the transcriptome [10]. In particular, G4s in the promoter regions of oncogenes have been most intensively studied so far [54]. Consequently, G4BPs targeting G4s in oncogene promoters are more widely characterized. Several transcription factors bind to G4s harbored in promoter sites to aid or suppress gene transcription [55]. In fact, transcription factors account for a significant part of the G4BPs included in G4IPDB [31]. For instance, SP1 is able to bind both canonical SP1 duplex DNA and G4 structures within the promoter region of the CKIT oncogene [56]. Myc-associated zinc finger (MAZ) and poly(ADP-ribose) polymerase 1 (PARP-1) recognize a critical G4 in the KRAS promoter, increasing KRAS transcriptional output. MAZ specifically binds to duplex and G4 conformations, whereas PARP-1 shows specificity only for the G4 [57]. The Yin and Yang 1 (YY1) transcription factor binds directly to G4 structures and this interaction contributes to YY1 dimerization and YY1-mediated long-range DNA looping [58].
Apart from transcription factors, there are several other proteins that interact with G4s. Such is the case of nucleolin, the most abundant phosphoprotein in the nucleolus. Nucleolin induces CMYC G4 formation in vivo [59]. Nucleolin binding has also been shown for other G4s such as those harbored in BCL2, hTERT, VEGF, RET, PDGFA, and CKIT promoters, but the functional significance of these interactions has not been elucidated yet [60]. Interestingly, the nucleolin-hnRNPD heterodimer has been reported to bind to G4 structures as a prerequisite to immunoglobulin switch recombination [61]. Furthermore, nucleophosmin is a multifunctional protein that binds to the CMYC G4 motif through its C-terminal region [62]. Another protein, the nucleoside diphosphate kinase NM23-H2, exerts a completely different effect on the CMYC G4 from nucleolin, since NM23-H2 promotes the unfolding of the CMYC G4 structure, thereby activating CMYC transcription [63]. Moreover, tumor suppressor protein 53 (TP53), the well-known guardian of the genome, has recently been shown to act as a G4BP. In particular, mutant TP53 binds and stabilizes a well-characterized G4 structure in vitro, and the same mechanism could constitute the initial step for the transcriptional control of a large set of cancer-relevant genes [64]. The same occurs for wild-type TP53, which has the ability to selectively bind CMYC promoter G4s [65]. The hnRNPA1 protein is also able to bind and unfold G4s in promoter regions, including the KRAS promoter G4 [66] and the TRA2B promoter G4 [67]. In addition, transcriptional helicases xeroderma pigmentosum type B and D (XPB and XPD) are recruited to DNA G4s. XPD functions as a robust G4 helicase, whilst XPB binds G4 without unwinding it [68]. Finally, it is worth pointing out the G-quartet nuclease 1 (GQN1), which cuts the single-stranded DNA region upstream of the barrel formed by stacked G-tetrads independently of the flanking sequence and without cleaving duplex or single-stranded DNA [69].

G4BPs Involved in Chromatin Remodeling and Histone Modification
Several epigenetic and chromatin remodeling complexes bind selectively to G4s [70]. For instance, DNA (cytosine-5)-methyltransferase 1 (DNMT1) is sequestered at G4 sites to inhibit the methylation of proximal CpG island promoters [71]. A similar process occurs with additional DNMTs [72]. In addition, G4s guide the recruitment of RE1-silencing transcription factor-lysine-specific histone demethylase 1A (REST-LSD1) that induces gene repression by removing the gene-activating monomethylation H3K4me1 and dimethylation H3K4me2 [73]. The recruitment of chromatin remodeling complexes such as BRD3 is also mediated by G4s and favors transcription at G4 sites [74]. The chromatin remodeling protein ATRX, a member of the SWI/SNF family encoded by an X-linked gene, is reported to interact with G4s in vitro [75]. SMARCA4, another member of the SWI/SNF family, is also recruited to endogenous promoter G4s in chromatin [29]. G4 sites in the human genome frequently colocalize with binding sites of CCCTC-binding factor (CTCF), a chromatin remodeling factor with the capability of nucleosome repositioning, suggesting that CTCF is recruited by G4s at least in part [76].
The following table (Table 1) lists G4BPs that influence telomere lengthening, replication, transcription, chromatin remodeling, and histone modification, as described above.

Table 1. G4BPs involved in telomere lengthening, replication, transcription, and chromatin remodeling and histone modification. List of all G4BPs (alphabetically sorted) grouped according to their biological function. Literature column shows the reference in which the protein was first described as a G4BP.

Structural Properties of G4BPs
Although there is a limited number of high-resolution structures of G4BPs interacting with G4s available (Figure 4A), it is assumed that binding can occur at the following sites: (i) top-stacking with the upper G-tetrads, (ii) groove-binding, (iii) loop-binding, or a combination of those modes (Figure 4B). The different functions of G4BPs may be linked to the binding mode the protein assumes in interacting with a G4. For instance, a top-stacking binding mode would appear to be more practical for G4BPs that unwind multiple and different G4s. In contrast, G4BPs that are highly selective towards particular G4s may function through loop-binding because the orientation and length of the loops vary among different G4s (with a similar barrel core). Finally, groove-binders display a particular conformation able to fit into a groove of the G4 [23]. Interestingly, binding of a G4BP to a G4 does not necessarily equate to function. It was demonstrated that PARP-1 affinity for a G4 increased as the loop features were removed, but PARP-1 activation was no longer achieved [77].
Analyzing the amino acid composition and structural patterns of G4BPs provides further insights into G4-recognition mechanisms. G4BPs are enriched in shared domains or motifs that are established or predicted to function as binding regions. In particular, 77 human G4BPs shared a domain rich in glycine and arginine residues [78]. Such a highly conserved domain is termed the RGG (Arginine-Glycine-Glycine)/RG (Arginine-Glycine) motif or GAR (Glycine-Arginine-rich) domain and is composed of repeat sequences rich in RGG or RG [79]. The RGG domain is important in G4-protein interactions. For instance, the RGG motif in nucleolin is essential for the recognition of the CMYC G4 sequence and the promotion of G4 formation [80]. The pairs of interactions between amino acid residues in proteins and bases in DNA have been identified [81]. Within the RGG domain, the internal arrangement of RGG repeats and gap amino acids seems to play a more crucial role in the G4-binding mechanism than a critical number of RGG repeats [28]. Interestingly, the cold-inducible RNA-binding protein (CIRBP) was the first protein identified as a G4BP both in vitro and in cells from the exploration of the RGG motif [28]. In addition, proteins that bind oligonucleotides or oligosaccharides were observed to contain a particular motif named the oligonucleotide/oligosaccharide-binding (OB)-fold domain. The OB-fold has a five-stranded β-sheet coiled to form a closed β-barrel [82]. For instance, the OB-fold domain is included in several G4BPs such as CST [37] and POT1 [83] and participates in G4 interaction.
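Because the text emphasizes that the arrangement of RGG repeats and the gaps between them matter more than the repeat count alone, a simple scan that reports both can be a useful first look at a protein sequence. The sketch below is an illustrative assumption, not the predictive method of the cited work: the function name `scan_rgg_rg` is hypothetical and the example sequence is an invented toy string, not a real protein.

```python
import re

# Match RGG first so that an RGG tripeptide is not counted as a bare RG.
RGG_OR_RG = re.compile(r"RGG|RG")


def scan_rgg_rg(protein_sequence):
    """Count RGG and RG motifs and list the gap lengths between consecutive motifs."""
    seq = protein_sequence.upper()
    matches = list(RGG_OR_RG.finditer(seq))
    motifs = [m.group(0) for m in matches]
    # Gap = number of residues between the end of one motif and the start of the next.
    gaps = [nxt.start() - cur.end() for cur, nxt in zip(matches, matches[1:])]
    return {"RGG": motifs.count("RGG"), "RG": motifs.count("RG"), "gaps": gaps}


if __name__ == "__main__":
    toy_sequence = "MSRGGYRGGSRGAA"  # invented toy sequence for illustration only
    print(scan_rgg_rg(toy_sequence))
```

Such counts only flag candidate regions; whether a given RGG-rich segment actually binds G4s still requires the experimental validation discussed in the identification section.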

Concluding Remarks
The large number and evolutionary conservation of G4s point to their importance in biological functions. Since G4 structures are also a source of genomic instability, G4 formation has to be tightly controlled. In this sense, G4s are highly dynamic in vivo and their folding depends on the cell type and chromatin state. G4BPs participate in the stabilization or resolution of G4s. Thus, it is important to consider the G4 not as an isolated entity [22], but rather as a structure that exists as part of an interconnected network of other biomolecules, such as G4BPs, within living cells. To date, a broad spectrum of G4BPs has been identified, but there are some difficulties in the analysis of the G4 interactome. Firstly, an interaction observed in vitro may not confirm the existence of such an event in vivo due to the plasticity of G4 formation in distinct cellular contexts. Secondly, three-dimensional structures of G4-G4BP complexes are still sparse. However, given the development of methodologies for the identification of G4BPs, we suspect that the number of proteins with G4-binding specificity will increase in the future. While some of the known G4BPs appear to function as "pan-binders", others act in a more selective manner and with different affinity, which implies that selectivity is attainable. Although several domains involved in the recognition of G4s have already been characterized, there may still be others to be deciphered. Determining the features of G4BPs will be helpful in eliciting the details necessary to rationally design selective binders. Therefore, improving the selectivity of G4 binders to minimize off-target effects in the host cell remains a challenge for the future. Undoubtedly, a critical step to activate or inactivate physiological or pathological pathways is the recognition and processing of G4s by G4BPs. In this regard, the extensive research on G4BPs will provide new targets for drug design and pave the way for novel therapeutic approaches in human diseases.