Advances in the Structure of GGGGCC Repeat RNA Sequence and Its Interaction with Small Molecules and Protein Partners

The aberrant expansion of GGGGCC hexanucleotide repeats within the first intron of the C9orf72 gene represents the predominant genetic etiology underlying amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). The transcribed r(GGGGCC)n RNA repeats form RNA foci, which recruit RNA binding proteins and impede their normal cellular functions, ultimately resulting in fatal neurodegenerative disorders. Furthermore, the non-canonical translation of the r(GGGGCC)n sequence can generate dipeptide repeats, which have been postulated as pathological causes. Comprehensive structural analyses of r(GGGGCC)n have unveiled its polymorphic nature: it can adopt dimeric, hairpin, or G-quadruplex conformations, all of which can interact with RNA binding proteins. Small molecules capable of binding to r(GGGGCC)n have been discovered and proposed as potential lead compounds for the treatment of ALS and FTD. Some of these molecules act by preventing RNA–protein interactions or impeding the phase transition of r(GGGGCC)n. In this review, we summarize recent advances in the structural characterization of r(GGGGCC)n, its propensity to form RNA foci, and its interactions with small molecules and proteins. Specifically, we emphasize the structural diversity of r(GGGGCC)n and its influence on partner binding. Given the crucial role of r(GGGGCC)n in the pathogenesis of ALS and FTD, the primary objective of this review is to facilitate the development of therapeutic interventions targeting r(GGGGCC)n RNA.


Introduction
Amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) are two neurodegenerative disorders characterized by progressive degeneration and dysfunction of neuronal architecture [1][2][3][4][5]. Both diseases are typically fatal within three to five years after the onset of symptoms [6,7]. ALS, affecting approximately two individuals per 100,000, is characterized by the degeneration of motor neurons, leading to muscle weakness and atrophy [8,9]. FTD, the second most prevalent form of dementia in individuals under the age of 65, is typically characterized by atrophy of the frontal and/or temporal lobes, manifesting as heterogeneous symptoms encompassing behavioral changes (behavioral variant FTD, bvFTD), language impairment (primary progressive aphasia, PPA), or deterioration in motor skills [10]. Despite considerable efforts, the development of efficacious therapeutic strategies for the treatment of ALS and FTD remains a challenge [11].
The etiologies of ALS and FTD are varied. Sporadic ALS (sALS) accounts for 90% of ALS patients. The remaining 10% have familial ALS (fALS), which is caused by mutations. fALS can be caused by the dysfunction of mutated proteins, such as SOD1, FUS/TLS, and TDP-43 (the product of mutated TARDBP) [12,13].
Figure caption (partial): (ii) The transcribed r(GGGGCC)n aggregates in the nucleus to form RNA foci that recruit RNA binding proteins (RBPs), affecting the intracellular functions of RBPs, such as splicing. (iii) The r(GGGGCC)n RNA is transported into the cytoplasm and undergoes repeat-associated non-ATG translation, resulting in the synthesis of dipeptide repeats (DPRs). The DPRs form aggregates and associate with TDP-43, which induces cytotoxic effects in cells. The labels (i-iii) correspond to the three pathological mechanisms. The red star highlights the hexanucleotide repeat GGGGCC in the non-coding region of C9orf72.
This review provides a comprehensive overview of the recent progress made in understanding the structures of r(GGGGCC)n, and the interactions between r(GGGGCC)n and small molecules and between r(GGGGCC)n and protein partners. We focus on elucidating the structural diversity of r(GGGGCC)n and its implications for partner binding. Given the crucial role of r(GGGGCC)n in the pathogenesis of ALS and FTD, the primary objective of this review is to support the development of drugs targeting r(GGGGCC)n RNA.

The Solution Structures of r(GGGGCC)n RNA
The r(GGGGCC)n RNA is a guanine-rich sequence, which promotes the formation of G-quadruplex (G4) structures. While the tertiary structure of r(GGGGCC)n remains to be fully elucidated, its secondary structures have been extensively studied. Circular dichroism (CD) spectra are commonly used to detect G4 structures and to assign their topology; for instance, the spectral pattern indicates whether the topology is parallel, antiparallel, or of another type (Figure 2). Nuclear magnetic resonance (NMR) spectra provide more structural detail, in some cases yielding a full structure determination or evidence that different conformations co-exist (Figure 2a). Depending on the sequence length and solution conditions, r(GGGGCC)n can adopt different secondary structures, including G4 [58,59] and hairpin conformations [60]. In 2012, Adrian M. Isaacs and colleagues demonstrated that r(GGGGCC)3GGGGC can fold into G4 or double-stranded structures, with the topology influenced by the cations present in the solution [61]. In a K+ buffer, it forms a stable parallel intramolecular G4 structure, which is less stable in Na+ and Li+ solutions.
Further investigations by Pearson and colleagues employed CD spectroscopy (Figure 2a) and gel-shift assays, revealing that r(GGGGCC)n (n = 2, 5, 6, and 8) predominantly adopt highly stable uni- and multimolecular parallel G4 structures [62]. The abundance of G4 structures is influenced by the repeat number and the RNA concentration, with the proportion of multimolecular G4 structures increasing as the repeat number rises.
An equilibrium between G4 and hairpin structures has also been observed in r(GGGGCC)n. In the absence of K+ ions, r(GGGGCC)4 RNA forms a hairpin conformation [62], featuring single-stranded bulges within the RNA chain. However, in a K+ buffer (Figure 2c), it adopts a parallel G4 structure (Figure 2g) [59]. This equilibrium between hairpin and G4 structures is suggested to be linked to the presence of an abortive transcript containing hexanucleotide repeats [55]. The G4 structure may hinder the transcription of full-length RNA and recruit RBPs in cells, contributing to disease pathogenesis. The equilibrium shifts towards the hairpin conformation as the repeat number of r(GGGGCC)n increases. Specifically, r(GGGGCC)4 predominantly adopts a G4 topology, while r(GGGGCC)8 RNA exhibits both G4 and hairpin structures, even in a K+ buffer, as confirmed by various biophysical methods. In a Na+ buffer, however, r(GGGGCC)8 RNA adopts only a hairpin structure [55]. Furthermore, r(GGGGCC)4 undergoes a pH-dependent monomer-dimer equilibrium. At pH 6.0 and 25 °C, it exists as both a homodimer and a hairpin. Decreasing the temperature increases the population of dimeric RNA, whose structure differs distinctly from the G4 structures formed in the presence of K+ [63]. Conversely, at neutral pH, r(GGGGCC)4 primarily adopts a hairpin conformation.
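The dependence of G4 formation on repeat number can be illustrated with a short sequence scan. The Python sketch below is only a heuristic and is not taken from the cited studies: it assumes the commonly used rule of thumb that an intramolecular G4 needs four G-tracts of at least three guanines separated by loops of 1-7 nucleotides. The names `G4_MOTIF`, `repeat_rna`, and `has_putative_g4` are hypothetical, and the pattern says nothing about multimolecular G4s, hairpins, or cation effects.

```python
import re

# Assumed rule of thumb (not from the cited work): four G-tracts of >= 3 G,
# separated by loops of 1-7 nt, mark a putative intramolecular G4.
G4_MOTIF = re.compile(r"G{3,}(?:[ACGU]{1,7}?G{3,}){3}")


def repeat_rna(n: int, unit: str = "GGGGCC") -> str:
    """Return the RNA sequence made of n copies of the repeat unit."""
    return unit * n


def has_putative_g4(seq: str) -> bool:
    """True if the sequence contains the heuristic intramolecular G4 motif."""
    return G4_MOTIF.search(seq) is not None


if __name__ == "__main__":
    for n in (2, 3, 4, 8):
        print(f"r(GGGGCC){n}: putative intramolecular G4 = "
              f"{has_putative_g4(repeat_rna(n))}")
    # The r(GGGGCC)3GGGGC construct discussed above has four G-tracts.
    print("r(GGGGCC)3GGGGC:", has_putative_g4(repeat_rna(3) + "GGGGC"))
```

Under this heuristic, r(GGGGCC)4 and r(GGGGCC)3GGGGC contain the four G-tracts needed for an intramolecular G4, whereas two or three repeats do not; shorter repeats would have to associate intermolecularly, which the pattern does not model.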

Structure of d(GGGGCC)n DNA
High-resolution structures of d(GGGGCC)n have been successfully determined [64,65]. Janez Plavec and colleagues used NMR spectroscopy to elucidate the structure of d[(GGGGCC)3GGBrGG] (PDB code 2N2D) [66]. The incorporation of a bromine-substituted guanine residue (GBr) stabilized the conformation, yielding a more rigid structure amenable to structural analysis. The d[(GGGGCC)3GGBrGG] sequence adopted an antiparallel G4 topology (Figure 2b) [67].

Figure 2 (caption, partial): (f) and the parallel G4 topology formed by the r(GGGGCC)4 RNA (g). Panels (b,c,f,g) were reprinted from reference [59]. Panels (d,e) were reprinted from references [67] and [68], respectively.

Biological Phase Separation and Transition of r(GGGGCC)n
Biological liquid-liquid phase separation is a widely observed phenomenon in cells and plays a critical role in the formation of membraneless organelles, signal transduction, and DNA packaging [69][70][71][72]. As the strength of interactions in phase separation systems increases, a transition from a liquid to a solid state often occurs, resulting in the formation of insoluble gel-like states, many of which are associated with diseases [73]. Jain and colleagues demonstrated that r(GGGGCC)n can undergo phase separation both in vivo and in vitro [43]. They found that phase separation of r(GGGGCC)n occurs once the repeat number reaches a specific threshold, leading to a solution-gel phase transition as the strength of multi-base interactions increases (Figure 3). The formation of RNA foci depends on solution conditions: it is reinforced by Mg2+ but impaired by monovalent cations such as K+ or Na+. The authors proposed that inter-chain hydrogen bonds stabilize intermolecular G4s, which serve as the building blocks of RNA foci. However, direct evidence of the secondary structure of r(GGGGCC)n within RNA foci is still lacking.
Christopher E. Shaw and colleagues detected r(GGGGCC)n RNA foci in neuronal cell lines and zebrafish embryos expressing 38 or 72 repeats but not in those expressing 8 repeats [6]. This finding indicates that longer r(GGGGCC)n sequences lead to nuclear retention of transcripts and the formation of RNA foci, which are resistant to ribonuclease (RNase) [6,52]. Extended r(GGGGCC)n sequences exhibit significant neurotoxicity and bind to hnRNP H and other RBPs. RNA toxicity and sequestration of RBPs may impair RNA processing and contribute to neurodegenerative diseases.
Simon Alberti and colleagues demonstrated that RNA plays a crucial role in regulating the phase behavior of prion-like RBPs [74]. Lower RNA-to-protein ratios promote the separation of RBPs into liquid droplets, whereas higher ratios prevent droplet formation in vitro. When nuclear RNA levels are reduced or RNA binding is genetically ablated, excessive phase separation occurs, leading to the formation of cytotoxic solid-like assemblies in cells. The researchers proposed that the nucleus functions as a buffered system, with high RNA concentrations maintaining RBPs in a soluble state. Disruptions in RNA levels or in the RNA binding abilities of RBPs result in abnormal phase transitions [75].
Figure 3 (caption, partial): (c) Representative immunofluorescence images illustrating that r(GGGGCC)29 recruited endogenous hnRNP H. Figure 3 was reprinted from reference [43].

hnRNP H and TDP-43
Heterogeneous nuclear ribonucleoprotein H (hnRNP H) is a member of the hnRNP family and functions as a multifunctional RBP involved in mRNA maturation at various stages [76]. It contains a modular domain consisting of tandem quasi-RNA recognition motifs (HqRRM1,2) at the N-terminus and a third qRRM3 at the C-terminus, situated between two glycine-rich segments [44,77,78]. hnRNP H binds G-rich RNA sequences containing at least three consecutive guanines [44]. In the brain cells of ALS patients, hnRNP H has been found associated with insoluble aggregates of r(GGGGCC)n, leading to aberrant alternative splicing [52]. This phenotype has been used as a biomarker for disease diagnosis. Furthermore, ALS/FTD patients exhibit splicing alterations in several key targets together with insoluble hnRNP H, indicating that modifications along this axis are critical aspects of disease etiology [52].
James L. Manley and colleagues demonstrated that hnRNP H binds to r(GGGGCC)n in vitro and that this interaction depends on the formation of G4s. hnRNP H colocalizes with G4 aggregates in C9 patient-derived fibroblasts and astrocytes, but not in control cells, as shown by imaging with BG4, a G4 structure-specific antibody (Figure 4) [79]. Another study by Donald C. Rio and colleagues revealed that in sporadic ALS/FTD patients, insolubility of hnRNP H was associated with altered splicing of a wide range of targets [52]. Numerous ALS/FTD brains show high levels of insoluble hnRNP H sequestered in r(GGGGCC)4 RNA foci, resulting in RNA splicing defects involving intron retention [52]. These findings highlight previously unreported splicing abnormalities in ALS brains with highly insoluble hnRNP H, suggesting a potential feedback relationship between effective RBP concentrations and protein quality control in all ALS/FTD cases.
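The sequence requirement stated above for hnRNP H binding, at least three consecutive guanines, can be checked directly on a repeat sequence. The following Python sketch is illustrative only: the helper name `g_tracts` and the G-free comparison sequence are assumptions introduced here, and the presence of G-tracts is a necessary sequence feature under that rule, not a prediction of binding.

```python
import re


def g_tracts(seq: str, min_len: int = 3) -> list:
    """Return every maximal run of consecutive G with length >= min_len."""
    return [run for run in re.findall(r"G+", seq.upper()) if len(run) >= min_len]


if __name__ == "__main__":
    expanded = "GGGGCC" * 4    # r(GGGGCC)4
    comparison = "AAAACC" * 4  # hypothetical G-free sequence for contrast
    print("r(GGGGCC)4 G-tracts:", g_tracts(expanded))
    print("r(AAAACC)4 G-tracts:", g_tracts(comparison))
```

Each GGGGCC unit contributes one tract of four guanines, so every repeat added to the expansion adds another candidate site under this rule.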
Figure 4 was reprinted from reference [79].
TAR DNA binding protein 43 (TDP-43), another member of the hnRNP family, possesses two RNA recognition motifs (RRMs), a nuclear localization signal (NLS), and a prion-like domain at the C-terminus [80]. Numerous mutations in TDP-43 have been associated with ALS and FTD [81,82]. The accumulation of TDP-43 is a major pathological feature of ALS and FTD [83][84][85], and inclusion bodies are observed in patients with abnormal expansions of r(GGGGCC)n, serving as a histopathological marker in 97% of ALS cases and 45% of FTD cases.
In contrast to hnRNP H, which directly associates with r(GGGGCC)n, the pathogenic mechanism of TDP-43 in ALS/FTD is believed to involve its interaction with DPRs, which are non-ATG translation products of r(GGGGCC)n [15,86]. Edward B. Lee and colleagues discovered that DPRs induce TDP-43 protein lesions in an ALS/FTD model and trigger the onset and progression of FTD [81]. The amount and characteristics of the DPRs produced, rather than the length of the r(GGGGCC)n repeats, determine the duration and severity of TDP-43 dysfunction.

FUS
Fused in sarcoma (FUS) is a 526-residue protein [87]. It is predominantly expressed in neurons and is involved in DNA and RNA metabolism through its interactions with the motor proteins kinesin [88] and myosin-Va [89]. Missense mutations in the FUS gene have been associated with ALS [90,91], although the prevalence of FUS gene variants in the familial ALS population is low. Sua Myong and colleagues showed that wild-type FUS binds single-stranded RNAs, including r(GGGGCC)4, in a length-dependent manner and forms a highly dynamic protein-RNA complex. The FUS-RNA interaction involves two mechanisms: (i) stable binding of FUS monomers to single-stranded RNA (ssRNA), and (ii) weak interaction of two FUS units with RNA, resulting in a highly dynamic interaction.
Higuro and co-workers observed the formation and phase transition of FUS condensates in vitro using purified full-length wild-type and mutant FUS proteins and r(GGGGCC)4. They found that FUS specifically forms complexes with r(GGGGCC)4 in a G4 structure-dependent manner, leading to liquid-liquid phase separation followed by a liquid-to-solid transition. Importantly, amino acid mutations associated with ALS significantly affect G4-dependent FUS condensation. These findings provide insights into the relationship between protein aggregation and dysfunction of FUS in ALS [49].

Zfp106
Zfp106 is a C2H2 zinc finger protein characterized by the presence of seven WD40 domains and four putative zinc fingers [92]. It plays a crucial role in maintaining neuromuscular signaling. Zfp106 knockout mice exhibit gene expression patterns indicative of neuromuscular degeneration in their muscles and spinal cords. Interestingly, this phenotype can be reversed by motor neuron-specific restoration of Zfp106 with a transgene, highlighting its essential role in biological processes [93]. The gain-of-function model of C9orf72 neurodegeneration has been investigated in Drosophila [94], where Zfp106 effectively mitigates the neurotoxicity associated with expression of the GGGGCC repeat. This suggests that Zfp106 acts as a repressor of neurodegeneration in C9orf72 ALS models and demonstrates a functional interaction between Zfp106 and the r(GGGGCC)n sequence. Furthermore, Brian L. Black and colleagues conducted pull-down and mobility shift assays, providing evidence that Zfp106 specifically binds to r(GGGGCC)8 but not to r(AAAACC)8. The ability of Zfp106 to regulate normal cellular functions and to inhibit ALS by binding to r(GGGGCC)n makes it a potential drug target for treating ALS [45]. However, how Zfp106 regulates normal cellular processes via RNA binding and how it inhibits ALS progression by interacting with r(GGGGCC)n are still being investigated to guide drug design efforts [45].

ADARB2
ADARB2 is a member of the CNS-enriched adenosine deaminase family, known for its role in mediating A-to-I (adenosine to inosine) editing of RNA [95]. It consists of two double-stranded-specific adenosine deaminase repeats, three double-stranded RNA-binding domains, and one editase domain spanning from the N- to the C-terminus. The A-to-I editing activity primarily occurs within the 16-130 nucleotide interval. This enzyme selectively deaminates adenosine (A) residues in the double-stranded region of mRNA, converting them to inosine (I), which is recognized as guanine by the cellular translation machinery, resulting in codon alterations within the synthesized protein [46] (Figure 5). Jeffrey D. Rothstein and colleagues simultaneously performed RNA fluorescence in situ hybridization (RNA FISH) and immunofluorescence labeling of RBPs in an induced pluripotent stem cell-derived neuron (iPSN) line from C9orf72-related cases. Their study revealed the co-localization of ADARB2 protein with nuclear r(GGGGCC)n RNA foci, while mRNA levels remained unchanged. Co-precipitation of ADARB2 with r(GGGGCC)n repeats was also observed in vivo.
In vitro gel shift assays with recombinant ADARB2 clearly demonstrated its binding to r(GGGGCC)n, implying the possible formation of ADARB2-RNA complexes. Together, these findings indicate strong binding between ADARB2 and r(GGGGCC)n. Furthermore, this team verified in vivo that the formation of r(GGGGCC)n RNA foci requires ADARB2 protein: treatment of the iPSN line with specific siRNA targeting ADARB2 significantly reduced the number of RNA foci. However, further experimental evidence is still needed to fully elucidate the in vivo function of ADARB2 [96]. Another unresolved question is whether ADARB2 loses its editing activity upon interaction with r(GGGGCC)n; experimental validation of the downstream editing effects is currently lacking.
Figure 5 (caption, partial): siRNA knockdown of ADARB2 results in a significant reduction in the percent of iPSNs with nuclear RNA foci (arrows). Data in (E) indicate mean ± SEM (*** p < 0.001). Figure 5 was reprinted from reference [46].

Purα
Pur-alpha (Purα) is a highly conserved DNA and RNA binding protein in eukaryotic cells [97]. It performs diverse physiological functions, including transcription activation or inhibition, cell growth, and translation [98,99]. While predominantly localized in the nucleus, Purα is also widely distributed in the cytoplasm of neurons, particularly in synaptic branches [88]. In the nucleus, Purα stimulates gene transcription by binding to mRNA transcripts and accompanying them to the cytoplasm. It remains associated with the mRNA during transport over considerable distances and functions at specific sites of mRNA translation [100]. The absence of Purα can lead to various neurological disorders [101,102].
The r(GGGGCC)n repeat can sequester Purα, thereby impairing its normal functions such as gene transcription and mRNA translation, ultimately resulting in cell death [103]. In an ALS/FTD zebrafish model, Swinnen and colleagues demonstrated that the Pur2 domain of Purα binds to r(GGGGCC)90 repeat RNA [37]. Peng Jin and colleagues, studying the pathogenesis of ALS/FTD, revealed that r(GGGGCC)10 can sequester Purα, a major component of RBPs, from the whole-cell lysate of mouse spinal cord [47]. Rossi and colleagues found that Purα can aggregate into cytosolic and nuclear granules in HeLa cells transiently transfected with a plasmid expressing r(GGGGCC)31. Moreover, given the specific interaction between Purα and r(GGGGCC)n, it is conceivable that Purα may influence the outcome of RAN translation. Consequently, in ALS, reduced protein levels amplify certain cellular characteristics. Overexpression of Purα in mammalian and Drosophila model systems can rescue r(GGGGCC)n repeat-induced neurodegeneration [47].
Furthermore, Purα also interacts with the C-terminal region of FUS, another protein recruited by r(GGGGCC)n [104]. In vivo expression of Purα in various Drosophila tissues significantly exacerbates neurodegeneration caused by mutated FUS. Conversely, reducing Purα expression in neurons expressing mutated FUS significantly improves the climbing ability of the flies. This suggests that downregulation of Purα ameliorates locomotion defects, a classical symptom of ALS resulting from mutant FUS expression. These findings indicate that Purα may contribute to the pathogenesis of ALS mediated by FUS. However, it remains unclear which functional domains or subdomains of Purα mediate its interaction with FUS [105].
Binding of Purα to other cellular proteins can directly impact the expression of the PURA gene. Purα itself can bind to GC/GA-rich sequences in its own promoter and inhibit gene expression [106]. Similarly, binding of Purα to expanded polynucleotide repeat RNA may also affect the expression of the PURA gene. In both scenarios, the mechanism of action may involve the combination of Purα with cellular components, resulting in a reduction in effective intracellular Purα levels. The reduction in Purα could trigger a feedback mechanism of the PURA gene, although it is unknown whether this compensates for Purα sequestration [100].

Lead Small Molecules Binding to r(GGGGCC)n
Because r(GGGGCC)n forms RNA foci and recruits RBPs, it is a pharmacologically attractive target, and small molecules present an attractive option for targeting it. It is therefore of interest to investigate the binding of small molecules to r(GGGGCC)n (Figure 6). To date, a number of small molecules containing aromatic rings have been found to bind to r(GGGGCC)n.
Molecules 2023, 28, x FOR PEER REVIEW
Figure 6. Structures of small molecules that bind r(GGGGCC)n.

Binding of r(GGGGCC)8 with TMPyP4
G4 structures have previously been shown to bind 5,10,15,20-tetra(N-methyl-4-pyridyl)porphyrin (TMPyP4) [107,108]. TMPyP4 binds a variety of DNA and RNA G4 structures [109,110]. In 2014, Christopher E. Pearson and colleagues found that TMPyP4 could bind and distort the G4 formed by r(GGGGCC)8, inhibiting the interaction of some proteins with the repeat [23]. Several studies have shown that TMPyP4 disrupts the binding of hnRNPA1 to the r(GGGGCC)8 repeat, an interaction thought to be linked to ALS/FTD pathogenesis [23]. Therefore, it may be possible to develop therapeutic treatments that use TMPyP4 to disrupt the interaction of RBPs with the repeat. However, TMPyP4 may either stabilize or destabilize RNA G4s. Kelly and colleagues used molecular dynamics simulations to analyze RNA G4 structure and speculated that TMPyP4 might interact with RNA G4 in three different ways (top-stacking, bottom-stacking, and side-binding), maintaining stability under certain conditions [111]. The specific structure and binding mode of the complex have not yet been reported. Further study of the interaction between TMPyP4 and r(GGGGCC)n RNA, and of the toxicity that may result from disrupting RBP binding, will therefore be one direction for the development of related small-molecule drugs.

Binding of r(GGGGCC) 8 with Other Liands
Matthew D. Disney and colleagues has discovered three lead compounds, 1a, 2, and 3, that bind with r(GGGGCC) 8 in vitro, with Kds of 9.7, 10, and 16 µM, respectively [55]. These three small molecules were obtained by Hoechst or bis-benzimidazole query, and were derived from the small molecule library established by chemical similarity search. This library is enriched in compounds that have the potential to recognize RNA 1 × 1 nucleotide internal loops, among which 1a has been proven to bind 1 × 1 GG internal loops present in r(CGG) exp , and improve fragile X-associated tremor/ataxia syndrome (FXTAS)-associated defects [112].
Because r(GGGGCC)8 RNA exists in a dynamic equilibrium between hairpin and parallel G4 structures in solution, the binding constants of these lead compounds were measured in either K+-containing buffer (which favors the G4 structure) or Na+ buffer (which favors the hairpin). The Kd values of 1a and 3 were 3-10 times higher in the presence of K+ than in Na+ buffer, which was reported as demonstrating their preferential binding to the G4 structures of r(GGGGCC)8. In contrast, the affinity of 2 for r(GGGGCC)8 was not affected by Na+ but decreased significantly with K+, showing specific binding to the hairpin structure. Optical melting data further showed that compound 3 has no influence on the stability of r(GGGGCC)8, whereas compounds 1a and 2 increase it.
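To make the reported dissociation constants easier to compare, the short Python sketch below converts a Kd into the fraction of RNA expected to be bound at a given ligand concentration. It assumes simple 1:1 binding with the ligand in large excess over the RNA, so that free ligand is approximately total ligand. This is a textbook simplification and not necessarily the model fitted in [55]. The function and variable names are illustrative.

```python
# Minimal sketch: single-site binding isotherm, theta = [L] / ([L] + Kd).
# Assumption (not from the source): 1:1 binding, ligand in large excess.

# Kd values (in micromolar) quoted in the text for compounds 1a, 2, and 3.
REPORTED_KD_UM = {"1a": 9.7, "2": 10.0, "3": 16.0}


def fraction_bound(ligand_um: float, kd_um: float) -> float:
    """Return the fraction of RNA bound at a free ligand concentration."""
    if ligand_um < 0 or kd_um <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_um / (ligand_um + kd_um)


def occupancy_table(ligand_um: float) -> dict:
    """Fraction bound for each compound at one ligand concentration."""
    return {
        name: fraction_bound(ligand_um, kd)
        for name, kd in REPORTED_KD_UM.items()
    }


if __name__ == "__main__":
    # 10 micromolar is an arbitrary illustrative concentration.
    for name, theta in occupancy_table(10.0).items():
        print(f"compound {name}: fraction bound = {theta:.2f}")
```

Under this model, a ligand concentration equal to the Kd gives half-maximal occupancy, and a several-fold increase in Kd at a fixed ligand concentration lowers the bound fraction accordingly.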
The effects of the three ligands on non-ATG translation of r(GGGGCC)n were tested in HEK293 cells expressing r(GGGGCC)66 [55]. Poly(GP) and poly(GA) proteins, but not poly(GR) proteins, were produced in this system. Compound 3 (100 µM, 24 h) moderately limited poly(GP) synthesis while having no effect on poly(GA). Compounds 1a and 2, on the other hand, drastically reduced the amounts of poly(GP) and poly(GA) proteins, which markedly lowered the percentage of positive cells in the lesions. This suggests that ligand binding to r(GGGGCC)n could be a potentially effective therapeutic strategy for FTD/ALS.

Binding of r(GGGGCC)8 with CB096
Disney and colleagues discovered a benzimidazole derivative, CB096, that binds to r(GGGGCC)n. NMR, structure-activity relationship (SAR) studies, and molecular dynamics (MD) simulations with the r(GGGGCC)n hairpin structure were used to determine the molecular interaction between CB096 and r(GGGGCC)n (Figure 7) [113]. The TO-PRO-1 (TO-1) fluorescent dye displacement assay and microscale thermophoresis (MST) were used to screen for ligands that bind the r(GGGGCC)8 hairpin. When r(GGGGCC)n is folded, CB096 binds specifically to the repeating 1 × 1 GG internal loop structure 5′CGG/3′GGC of the hairpin and, as shown by NMR, breaks the base pair. The 5′s-NO2 group and the 2-methoxyphenyl group are crucial for binding to the r(GGGGCC)n hairpin structure. In a HEK293T cell model of ALS/FTD, CB096 slowed RAN translation and reduced poly(GP) DPR formation but did not affect r(GGGGCC)66 mRNA levels. In conclusion, the researchers showed that CB096 binds specifically to the 1 × 1 GG internal loop 5′CGG/3′GGC generated during the expansion of r(GGGGCC)n.

Binding of r(GGGGCC)n with DB1246, DB1247, and DB1273
Isaacs and colleagues screened a chemical library of small molecules for ligands that bind r(GGGGCC)4 [53]. A FRET-based G4 melting assay identified 44 hits among 138 small molecules. Three of these hits (DB1246, DB1247, and DB1273) are structurally similar and can bind and stabilize G4 structures, as shown by temperature-dependent CD spectroscopy [53]. Treatment with these compounds led to a significant reduction in both RNA foci formation and dipeptide repeat protein levels in Drosophila carrying r(GGGGCC)36 and improved survival in vivo [53]. These findings suggest that targeting the r(GGGGCC)n G4 with small molecules may be a promising therapeutic approach to alleviate two key pathologies associated with FTD/ALS.
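In melting-based screens of this kind, ligand stabilization is commonly expressed as a shift in melting temperature (ΔTm) between the RNA alone and the RNA with ligand. The Python sketch below shows one generic way to estimate Tm from a melting curve, namely the temperature at which the min-max normalized signal crosses 0.5. It is an illustration under stated assumptions, not the analysis procedure of [53]. The curves are synthetic sigmoids, and all function names and parameter values are hypothetical.

```python
import math


def melting_temperature(temps, signal):
    """Estimate Tm as the temperature where the normalized signal
    crosses 0.5, using linear interpolation between adjacent points."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("flat curve: no transition to analyze")
    norm = [(s - lo) / (hi - lo) for s in signal]
    for i in range(1, len(norm)):
        a, b = norm[i - 1], norm[i]
        if a != b and (a - 0.5) * (b - 0.5) <= 0:
            frac = (0.5 - a) / (b - a)
            return temps[i - 1] + frac * (temps[i] - temps[i - 1])
    raise ValueError("signal never crosses the half-maximal level")


def synthetic_curve(tm, width=3.0):
    """Synthetic two-state melting curve (illustrative, not measured data)."""
    temps = [float(t) for t in range(20, 96)]
    signal = [1.0 / (1.0 + math.exp((tm - t) / width)) for t in temps]
    return temps, signal


def delta_tm(tm_rna_alone, tm_with_ligand):
    """Positive values indicate stabilization by the ligand."""
    return tm_with_ligand - tm_rna_alone


if __name__ == "__main__":
    # Hypothetical midpoints of 60 and 66 degrees C, chosen for illustration.
    tm_free = melting_temperature(*synthetic_curve(60.0))
    tm_bound = melting_temperature(*synthetic_curve(66.0))
    print(f"delta Tm = {delta_tm(tm_free, tm_bound):.1f} degrees C")
```

A hit threshold on ΔTm would then be applied per compound. The source does not state the threshold used, so none is assumed here.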

Binding of r(GGGGCC)n with CB253
Andrei and colleagues incorporated a 19F-modified nucleotide in place of the C6 residue in an r(GGGGCC)2 duplex model (5′CCGGGG/3′GGGGCC) to investigate the binding mechanism of CB253 to r(GGGGCC)n (Figure 8) [114]. The 19F substitution enables the use of 19F NMR spectroscopy to investigate structure and interactions. Two types of internal loop, 1 × 1 GG and 2 × 2 GG, were detected and verified in the r(GGGGCC)2 hairpin structure. The 1 × 1 GG loop was the main conformation, and the two conformations slowly interconverted to reach an equilibrium. Addition of CB253 stabilizes the 2 × 2 GG internal loop structure of the r(GGGGCC)2 duplex, which then becomes the stable dominant conformation. CB253 forms key interactions with the N1-H of G3 and binds r(GGGGCC)2 at a 2:1 ratio. The precise 2,4-diamino substitution pattern within CB253's quinazoline scaffold is crucial for binding the r(GGGGCC)n hairpin RNA. In HEK293T and lymphoblastoid cells from C9orf72 patients, CB253 reduced the formation of stress granules induced by r(GGGGCC)66 and inhibited RAN translation in a dose-dependent manner, leading to a significant reduction in poly(GP) DPR levels. These findings indicate that CB253 is a promising chemical probe that can specifically bind to and stabilize the 2 × 2 GG internal loop of the r(GGGGCC)n hairpin structure and inhibit various C9orf72-specific pathological mechanisms by directly engaging r(GGGGCC)n.

Figure 8. CB253 selectively binds the hairpin form of r(GGGGCC)n. Reprinted from reference [114].

Summary and Perspective
In this review, we provide a comprehensive overview of the advancements in understanding the structure of r(GGGGCC)n and d(GGGGCC)n, the phase separation and transition of r(GGGGCC)n, the interactions of r(GGGGCC)n with RBPs, and the discovered ligands capable of inhibiting the non-ATG translation of r(GGGGCC)n and/or the interactions between r(GGGGCC)n and RBPs.
The relationship among the fatal neurodegenerative diseases ALS/FTD, the structure of r(GGGGCC)n RNA, and its interactions has garnered significant research attention. When the repeat number exceeds the threshold, r(GGGGCC)n RNA undergoes phase separation and transition, leading to the formation of nuclear RNA foci. These RNA foci recruit RBPs, disrupting the physiological functions of RNA splicing and maturation. Another pathogenic mechanism by which r(GGGGCC)n contributes to ALS or FTD is the cytotoxicity of repetitive dipeptide proteins generated through non-ATG translation. Aggregates of these repetitive dipeptide proteins can recruit numerous 26S proteasome complexes and stabilize a transient substrate-processing conformation of the 26S proteasome, suggesting impaired degradation processes [115].
Characterizing the repeat structure of r(GGGGCC)n RNA and elucidating the structure-function relationship are key areas of research in understanding the pathogenic causes. r(GGGGCC)n can adopt diverse structures, including hairpin and parallel G4 topologies, with the equilibrium between them depending on solution conditions. However, the three-dimensional structures of r(GGGGCC)n RNA are still unknown. Achieving a dominant conformation for structural studies may require optimization of the sequence and solution conditions. Another challenging aspect is determining the secondary structures of r(GGGGCC)n within RNA foci or gel-like states. Because RNA foci are non-crystalline solids and heterogeneous in nature, commonly used high-resolution structure determination methods such as X-ray crystallography or solution NMR are not applicable [116,117]. To date, the RNA structures within RNA foci remain unidentified. Advancements in RNA structure determination methodologies, such as solid-state NMR [118][119][120], are needed to overcome this limitation.
Several small molecules that bind to r(GGGGCC)n have been found to block RBP interactions, inhibit phase separation, and/or hinder non-ATG translation, as evidenced both in vivo and in vitro. Understanding the structural details of the interactions between r(GGGGCC)n RNA and ligands is crucial for facilitating the design of lead compounds to treat ALS/FTD. As with r(GGGGCC)n RNA itself, structures of the complexes formed between r(GGGGCC)n RNA repeats and small molecules have not yet been determined, and further methodological developments are needed to gain insights for drug design.
Another known treatment approach for ALS/FTD involves the use of antisense oligonucleotides (ASOs). Single-dose injections of ASOs that target repeat-containing RNAs while preserving the levels of mRNA encoding C9orf72 have resulted in sustained reductions in RNA foci and dipeptide-repeat proteins, leading to the amelioration of behavioral deficits. These efforts have identified the gain of toxicity as a central disease mechanism caused by repeat-expanded C9orf72 and established the feasibility of ASO-mediated therapy [16]. ALS brains treated with ASO therapeutics targeting the C9orf72 transcript or repeat expansion showed mitigation despite the presence of repeat-associated non-ATG translation products [46]. Moreover, introducing mRNA that encodes r(GGGGCC)n binding proteins into ALS/FTD cells has the potential to restore RBP functions by augmenting the intracellular pool of the RBPs recruited by RNA foci. This approach represents an alternative strategy for treating ALS by targeting r(GGGGCC)n RNA. Lastly, CRISPR/Cas9 gene editing has successfully removed the GGGGCC repeat expansion in C9orf72, reducing RNA foci and DPR formation, which makes it a promising approach for ALS treatment.