C9orf72-G4C2 Intermediate Repeats and Parkinson’s Disease: A Data-Driven Hypothesis

Pathogenic C9orf72-G4C2 repeat expansions are associated with ALS/FTD, but not with Parkinson’s disease (PD); yet the possible link between intermediate repeat lengths and PD remains inconclusive. We aimed to study the potential involvement of these repeats in PD. The number of C9orf72 repeats was determined by flanking and repeat-primed PCR assays, and the risk-haplotype was determined by SNP array. Their association with PD was assessed in a stratified manner: in PD patients who carry mutations in the LRRK2, GBA, or SMPD1 genes (PD-carriers, n = 388), and in PD patients who do not carry them (PD-NC, n = 718). Compared to 600 controls, the allelic distribution was significantly different only in PD-NC, both for the allele with the larger repeat size (p = 0.034) and for the combined number of repeats from both alleles (p = 0.023). Intermediate repeats (20–60 repeats) were associated with PD in PD-NC patients (p = 0.041; OR = 3.684, CI 1.05–13.0) but not in PD-carriers (p = 0.684). The C9orf72 risk-haplotype, determined in a subgroup of 588 PD patients and 126 controls, was observed at a higher frequency in PD-NC (dominant model, OR = 1.71, CI 1.04–2.81, p = 0.0356). All 19 alleles within the risk-haplotype were associated with higher C9orf72 RNA levels according to the GTEx database. Based on our data, we suggest a model in which intermediate repeats are a risk factor for PD in non-carriers, driven not only by the number of repeats but also by the variants’ genotypes within the risk-haplotype. Further studies are needed to elucidate this possible role of C9orf72 in PD pathogenesis.


Introduction
Parkinson's disease (PD) is a common neurodegenerative disorder, affecting about 2% of the elderly population worldwide [1]. Its complex genetic background has been revealed over the past decades, implicating many genes in the disease. A wide variety of genetic changes and mechanisms are involved in PD, including rare and common variants, recessive, dominant, and oligogenic inheritance, and epigenetics [2][3][4][5][6][7][8]. However, the full range of genetic changes in PD is still being uncovered. G4C2 hexanucleotide repeat expansions in C9orf72 are strongly associated with amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) [9,10], mostly in European and North American populations [11,12]. Although 30 or more repeats are considered pathogenic for ALS and FTD, most patients with C9orf72-associated ALS or FTD carry an expanded allele with hundreds, or even thousands, of repeats [12][13][14]. Interestingly, parkinsonism was observed in more than 40% of FTD and FTD/ALS patients with pathogenic C9orf72 expansions [15]. This observation has led researchers to investigate the possible association of C9orf72 expansions with PD.
Although rare PD patients carrying C9orf72 expansions of 30 to 60 repeats or more have been detected (<0.7%), no association with PD was found [16][17][18][19][20]. A few studies have suggested that intermediate-size repeat lengths in C9orf72 may be a risk factor for PD; however, these studies proposed different repeat lengths for this association: ≥7 repeats in Han Chinese [21], and ≥20 repeats in Caucasians [17]. In a multi-center meta-analysis of mostly Caucasian and Asian populations, the threshold for pathogenic expansions was set at >60 repeats and the intermediate repeat size at 17-60 repeats [22], suggesting an effect on PD-risk at a cutoff of 17 repeats, and even for as few as 10 repeats, whether taken as a standalone value or as a cutoff. In a recent comprehensive review by Bourinaris and Houlden [23], C9orf72 intermediate repeat lengths were reported in several parkinsonism and movement disorders, including Dopa-responsive PD, atypical parkinsonisms such as PSP and MSA, essential tremor plus parkinsonism, and spinocerebellar ataxia.
As previous studies have shown the pivotal role of genetically homogeneous populations, such as the Ashkenazi Jews (AJ), in understanding the genetic background of neurodegenerative diseases [24][25][26][27], we determined the C9orf72 repeat sizes in PD patients of Ashkenazi origin and examined their potential association with PD. We also studied the possible association of the shared risk-haplotype, which is observed in carriers of intermediate repeats, with PD-risk.

Population
This study included a cohort of 1106 consecutively recruited, unrelated PD patients of full Ashkenazi Jewish origin (Table 1). Patients were recruited between the years 2005 and 2015. The diagnostic criteria, recruitment, and genotyping for LRRK2, GBA, and SMPD1 mutations have been previously described [24,25,28]. Carrier patients (PD-carriers) were defined as carrying one or more of the following mutations: LRRK2 p.G2019S, SMPD1 p.L302P, or any of 10 GBA mutations (c.84insG, IVS2 + 1G > A, p.V394L, p.N370S, p.L444P, p.R496H, p.E326K, p.T369M, p.R44C, and 370Rec). Patients who did not carry any of these mutations were classified as non-carriers (PD-NC). The cohort of 600 ethnically matched control individuals used in this study has been previously described [26].

Determining the G4C2 Hexanucleotide Repeat Length in the C9orf72 Gene
To determine the number of repeats in C9orf72, flanking and repeat-primed PCR assays were performed as previously described [9,10], with some modifications [26]. This method detects all repeat expansions but can determine the exact number of repeats only up to 55 repeats. Therefore, for every individual who carried an allele with 30 repeats or more, an additional method was used to accurately determine the repeat number up to 145 repeats (AmplideX® PCR/CE C9orf72 Kit; Asuragen Genetics; Austin, TX, USA).

Assembly of the Risk-Haplotype within the C9orf72 Locus
To determine the presence of the risk-haplotype in the C9orf72 locus, we used the genotype data (from Vacic et al. [29], Affymetrix SNP 6.0 array) of 127 AJ controls (all part of our cohort of 600 controls) and 597 AJ-PD patients (594 of whom are part of our cohort of 1106 PD patients). Individuals in whom the presence of the risk-haplotype could not be determined because of missing genotypes were excluded (controls = 1, PD = 5). One PD patient was excluded because of a low genotyping rate, one was not tested for repeat size, and two PD patients who carried >145 repeats were also excluded. In total, 126 controls and 588 PD patients were analyzed.

Statistical Analyses of C9orf72 G4C2 Hexanucleotide Repeats
All statistical analyses were performed using IBM SPSS Statistics software v25 (IBM Corporation, New York, USA). Differences in continuous variables were tested using the Mann-Whitney U test or a two-tailed t-test. To test the difference in C9orf72 repeat lengths between patients and controls, the repeat sizes of both alleles (per individual) were included.
Categorical variables were compared using a two-sided χ² test, or Fisher's exact test when cell counts were below 5. Odds ratios (ORs) with 95% confidence intervals (CIs) were calculated to assess the association of C9orf72 G4C2 repeat lengths with PD. This association was examined using the longest repeat size (per individual) as the independent variable. Association of the risk-haplotype with PD was examined using a dominant model. Logistic regression analysis was performed when using repeat units as a quantitative trait (the largest allele or the sum of both alleles).
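The dominant-model odds ratio described above can be illustrated with a minimal Python sketch. It assumes a Woolf (log-scale Wald) 95% confidence interval, which may differ slightly from the interval SPSS reports, and the function names are hypothetical. The input counts are the overall risk-haplotype carrier counts given in the Results (167 of 588 PD patients and 24 of 126 controls) and serve only as an example; the stratified PD-NC and PD-carrier counts are not reproduced here.

```python
import math

def is_carrier_dominant(haplotype_copies: int) -> bool:
    """Dominant model: one or two copies of the risk-haplotype count as 'carrier'."""
    return haplotype_copies >= 1

def odds_ratio_wald(case_exp, case_unexp, ctrl_exp, ctrl_unexp, z=1.96):
    """Odds ratio for a 2x2 table with a Woolf (log-scale Wald) confidence interval.

    Assumption: this is a generic textbook interval, not necessarily the exact
    procedure used by the statistical package named in the Methods.
    """
    odds_ratio = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    se_log_or = math.sqrt(1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Overall carrier counts quoted in the Results (illustrative use only).
pd_carriers, pd_total = 167, 588
ctrl_carriers, ctrl_total = 24, 126

or_all, ci_low, ci_high = odds_ratio_wald(
    pd_carriers, pd_total - pd_carriers,
    ctrl_carriers, ctrl_total - ctrl_carriers,
)
print(f"OR = {or_all:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

Under the dominant model an individual with one or two copies of the haplotype is treated identically, so the 2x2 table only needs carrier and non-carrier counts per group.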

Allele Frequencies of C9orf72 G4C2 Hexanucleotide Repeats in Ashkenazi PD Patients
Allele frequencies of C9orf72 repeats were determined in our cohort of 1106 Ashkenazi PD patients (Table 1), which was divided into two groups based on genotypic status: carriers of PD-associated mutations (see Methods section) and non-carrier patients (PD-NC). We ran a stratified analysis based on the carrier status in LRRK2, GBA, and SMPD1, as a high percentage of our PD cohort carry risk alleles in these 3 genes (35.1%, 388/1106), and these carrier patients may mask the effect of the hexanucleotide repeat length on PD-risk in non-carrier patients. The most frequent alleles found were 2, 8, and 5 repeat units (66.2%, 12.1%, 8.9% in carrier, and 63.1%, 12.7%, 9.7% in non-carrier patients; Figure 1 and Supplementary Table S1). These were also the most common alleles in our previously published data of Ashkenazi controls (66.4%, 11.0%, and 10.2%) [26]. No significant difference in allele distribution was observed between patients with mutations (in LRRK2, GBA, or SMPD1 genes) and controls (Mann-Whitney U test p = 0.756; 4.23 ± 6.13, N = 776 alleles and 4.09 ± 5.36, N = 1200 alleles, respectively). However, a significant difference in allele distribution compared to controls was detected in PD-NC (Mann-Whitney U test p = 0.034; 4.71 ± 8.40, N = 1436 alleles). The difference was also significant for the total number of repeats (combining the numbers of repeats from both alleles; excluding individuals with expanded alleles of >145 repeats) in PD-NC (Mann-Whitney U test p = 0.023; 8.63 ± 5.57, N = 714, and 7.94 ± 5.01, N = 599) and was not significant in PD-carriers (Mann-Whitney U test p = 0.565; 8.10 ± 4.90, N = 387).

The Association of C9orf72 G4C2 Hexanucleotide Intermediate Repeat Lengths with PD in Ashkenazi Patients
We examined the association of the longest repeat allele in each individual with PD (in PD-carriers and PD-NC). First, we examined whether large expansions (>145 repeats) are present within our cohort of PD patients and controls. We found one PD-carrier patient (carrying the LRRK2 p.G2019S mutation, 1/388 = 0.3%), four PD-NC (4/718 = 0.6%), and one control (1/600 = 0.2%) that carried an expanded allele. No significant association was detected in either patient group compared to controls (Fisher's exact test p = 1.000 and p = 0.384, respectively). Of interest, one of these PD patients was later diagnosed with ALS, and two had dementia. For further analysis, we excluded these five carriers with C9orf72 G4C2 repeat expansions (all with >145 repeats). None of our patients or controls carried alleles between 60 and 145 repeats.
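As an illustration of the expanded-allele comparison, the following minimal Python sketch implements a two-sided Fisher's exact test from the hypergeometric distribution and applies it to the carrier counts stated above. The function name is hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is an assumption about the convention; it is the common default but is not specified in the text.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    x_min = max(0, col1 - row2)
    x_max = min(row1, col1)
    p_value = sum(
        prob(x) for x in range(x_min, x_max + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, p_value)

# Expanded-allele carriers vs. non-carriers, taken from the counts in the text.
p_pd_carriers = fisher_exact_two_sided(1, 388 - 1, 1, 600 - 1)  # PD-carriers vs. controls
p_pd_nc = fisher_exact_two_sided(4, 718 - 4, 1, 600 - 1)        # PD-NC vs. controls
print(round(p_pd_carriers, 3), round(p_pd_nc, 3))
```

With these inputs the sketch should give values in line with the reported p = 1.000 and p = 0.384.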

The C9orf72 Risk-Haplotype Is Associated with Higher RNA Expression Levels and with PD
We have previously shown that the C9orf72 expansions in AJ share a risk-haplotype that spans 107 kb (Goldstein et al. [26], hg19: chr9:27484575-27591569), encompasses the complete C9orf72 gene, and includes 44 informative single nucleotide variants (SNVs), with a significant association with a higher number of repeats (over 8 repeats). Here, we further examined the effects of these 44 SNVs (within the risk-haplotype) on the RNA expression levels (eQTL) and splicing (sQTL) of their adjacent genes, as reported by the Genotype-Tissue Expression (GTEx) project [30]. Of them, 11 SNVs had no QTLs, 4 had a low effect size (absolute NES < 0.2), and one had no significant effect (m-value < 0.9). Nine other variants were also excluded due to high allele frequency in AJ (>0.5 in the gnomAD v3.1 database of non-neuro cases). The other 19 SNVs within this 107 kb risk-haplotype were all associated with higher C9orf72 RNA expression levels compared to the expression levels of the non-risk alleles (Table 3, Figure 2). Cerebellum and nucleus accumbens were the tissues with the highest normalized effect size (NES). Moreover, as GTEx evaluates the effect of each SNV on the neighboring genes within a 2 Mb interval (1 Mb upstream and 1 Mb downstream), it is important to note that all 19 SNVs affected exclusively C9orf72 RNA levels and not any other gene within that 2 Mb window. These SNVs also affected splice variants (sQTL) in an exclusive manner, only for C9orf72, mainly in cerebellum (Table 3).
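The SNV filtering steps described above can be sketched as a simple predicate. This is only an illustration: the record type, field names, and the four toy records are hypothetical and are not taken from the study data; only the thresholds (absolute NES of 0.2, m-value of 0.9, allele frequency of 0.5) and the exclusion counts come from the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SnvRecord:
    snv_id: str                # hypothetical placeholder identifier
    nes: Optional[float]       # GTEx normalized effect size; None if no QTL is reported
    m_value: Optional[float]   # GTEx multi-tissue m-value; None if no QTL is reported
    aj_allele_freq: float      # allele frequency in AJ (gnomAD, non-neuro)

def passes_filters(snv: SnvRecord) -> bool:
    """Keep an SNV only if it has a QTL, a sufficient effect size, a
    significant m-value, and an allele frequency that is not above 0.5."""
    if snv.nes is None or snv.m_value is None:
        return False           # no QTL reported
    if abs(snv.nes) < 0.2:
        return False           # low effect size
    if snv.m_value < 0.9:
        return False           # no significant effect
    if snv.aj_allele_freq > 0.5:
        return False           # high allele frequency in AJ
    return True

# Toy records with invented values, one per outcome (not real variants).
toy_snvs = [
    SnvRecord("snv_A", nes=0.35, m_value=1.0, aj_allele_freq=0.12),   # kept
    SnvRecord("snv_B", nes=None, m_value=None, aj_allele_freq=0.10),  # no QTL
    SnvRecord("snv_C", nes=0.10, m_value=1.0, aj_allele_freq=0.15),   # low NES
    SnvRecord("snv_D", nes=0.40, m_value=0.95, aj_allele_freq=0.62),  # high frequency
]
kept_ids = [snv.snv_id for snv in toy_snvs if passes_filters(snv)]

# Bookkeeping check on the counts reported in the text.
excluded = {"no_qtl": 11, "low_nes": 4, "not_significant": 1, "high_frequency": 9}
remaining = 44 - sum(excluded.values())
print(kept_ids, remaining)
```

The bookkeeping line confirms that the four exclusion steps (11 + 4 + 1 + 9 = 25) leave 19 of the 44 SNVs.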
We therefore tested whether carrying the risk-haplotype is associated with PD. Genotyping data from 127 AJ controls and 597 AJ-PD patients were assembled (see Methods), and the presence of the 19-SNV risk-haplotype was determined. Overall enrichment of the risk-haplotype was observed in PD patients compared to controls: 167 out of 588 PD patients carried one or two copies of the risk-haplotype (28.4%) compared to 24 out of 126 controls (19.0%). When stratifying by mutation carrier status (PD-carriers and PD-NC), a significant association was detected in PD-NC: 28.6% of them carried one or two copies of the risk-haplotype (OR = 1.71, CI = 1.04-2.81, p = 0.0356), and a trend was observed in PD-carriers (28.0%, OR = 1.65, CI = 0.97-2.82, p = 0.0656). We also attempted to define the correlation between the presence of the risk-haplotype and the number of repeats, by looking at all alleles (n = 1428 alleles): a 100% negative correlation existed between the risk-haplotype and the 2-repeat allele (with zero percent risk-haplotype), and a 100% positive correlation was observed in carriers of 14-60 repeats (100% carried the risk-haplotype, Supplementary Figure S1). Thus, we used 14 repeats as the best predictor of carrying the risk-haplotype and re-calculated the association of 14 repeats and higher with PD in our cohorts. Among all alleles in PD-NC, 2.3% had 14-60 repeats, compared to only 1.

Discussion
More than 40 diseases, most of which affect the nervous system, have been found to have a genetic basis in expansions of simple short DNA sequences (reviewed in [31]). Among these diseases are myotonic dystrophy, Huntington's disease, spinocerebellar ataxia (SCA), spinal and bulbar muscular atrophy (SBMA), and Fragile-X syndrome. A common finding in these disorders is the correlation of the number of repeats with clinical phenotypes and penetrance. Interestingly, emerging studies suggest that expanded repeats and intermediate repeats can cause or act as risk factors for different neurological diseases, depending on the number of repeats. This was suggested for the FMR1 gene, in which over 200 CGG repeats cause mental retardation (Fragile-X syndrome), while premutations of 45-200 repeats are a risk factor for Fragile-X-tremor/ataxia syndrome (FXTAS) in males and premature ovarian failure 1 (FXPOI) in females. Other examples are ATXN2, in which more than 34 repeats cause SCA2, while 29-33 repeats are a risk factor for ALS [32,33], and ATXN1, in which more than 38 repeats cause SCA1, while ≥33 repeats are a risk factor for ALS, mostly in C9orf72 expansion carriers [34]. The latter is an example of the complexity of these mechanisms, as some of these risk alleles are more significant in specific subgroups of patients. To see whether a similar phenomenon also exists for C9orf72 intermediate repeats, we analyzed Parkinson's disease patients of Ashkenazi origin in a stratified manner: in carriers of mutations in the LRRK2, GBA, and/or SMPD1 genes, and in patients who do not carry these mutations. We showed that the expanded alleles (>60 repeats) had no association with PD, as shown by other groups and in a meta-analysis [16][17][18][19][20][21][22]; however, intermediate-size repeats of 20-60 were significantly enriched in PD-NC, increasing the risk for PD, while in PD-carriers there was no effect.
The high, significant odds ratio of 3.68 in non-carriers may reflect the exclusion of patients who carry known risk alleles in LRRK2, GBA, or SMPD1, as we believe that in these patients the risk for PD is likely influenced by these mutations and not by the number of C9orf72 intron 1 hexanucleotide repeats. Of note, Xi et al. also reported that intermediate repeats of 20-29 were found only in PD-NC, and not in PD patients who carry the LRRK2 p.G2019S mutation [19]. Based on these observations, we propose that C9orf72 G4C2 hexanucleotide repeats in intron 1 act as a risk factor for PD when the number of repeats is in the intermediate range. Although the exact definitions of intermediate and pathogenic repeat lengths are still debated, and may differ between populations, we believe that alleles with fewer than 60 repeats, in the intermediate range, should not be dismissed as a potential risk for PD.
How might intermediate numbers of repeats affect the risk of developing PD? In ALS, although the role of C9orf72 expansions is not yet fully established, three main mechanisms are proposed to contribute to its pathogenicity: C9orf72 loss-of-function, generation of toxic RNA aggregates, and short peptide accumulation [12,14]. As the G4C2 repeats in the C9orf72 gene are located within intron 1, near the promoter region, they may lead to changes in promoter regulation, depending on the number of repeats. Indeed, it was shown that ALS/FTD-expanded alleles are highly methylated and lead to lower levels of C9orf72 mRNA and protein [35][36][37].
Do these same mechanisms contribute to Parkinson's disease risk? We demonstrated here that the risk-haplotype, which is shared by the intermediate C9orf72 G4C2 repeats, includes many SNVs that have the same effect of increasing RNA expression levels of C9orf72, as reported in GTEx. We therefore suggest that the mechanism by which intermediate-size repeats increase PD risk could be higher expression of C9orf72. This mechanism was recently suggested for corticobasal degeneration (CBD), a rare neurodegenerative disease that shares some clinical features with PD [38]. Researchers showed a significant enrichment of intermediate repeats in autopsy-proven CBD, as well as increased C9orf72 RNA expression levels in human brain tissues and in CRISPR/Cas9 knock-in iPSCs, but no association with pathologic RNA foci or dipeptide aggregates.
One important question is whether the intermediate repeat sizes or the increase in C9orf72 expression, which is associated with the risk-haplotype, drives the risk for PD. As these two mostly go together, this question should be answered in an experimental set-up that separates the two events. Cali et al. tried to answer this question by knocking in 28 repeats into iPSCs that normally have either 2 or 6 repeats (suggestive of cells that do not carry the risk-haplotype) and demonstrated that these knock-in cells show higher expression of C9orf72 [38]. As GTEx data suggest higher expression of C9orf72 for all 19 variants within the risk-haplotype, it is tempting to suggest that PD-risk may be determined by the level of C9orf72 expression, mediated by both the risk-haplotype and the number of repeats, a hypothesis that needs further evaluation. This hypothesis raises other questions: does the overall level of C9orf72 expression depend on the total number of repeats in both alleles, and might the genomic region of the C9orf72 risk-haplotype extend over a larger interval than the minimal linkage disequilibrium block of 107 kb, as suggested by the gnomAD database? In addition, the GTEx data show that the effect of the risk-haplotype on C9orf72 RNA expression levels is not uniform across tissues. The effect size is large and significant mostly in brain tissues, as well as in the small intestine, but much smaller in whole blood and lymphocytes.
What could be the effect of intermediate-size repeats on cellular expression? Cali et al. performed a comparative gene expression analysis between cells with intermediate repeats and cells with a low number of repeats [38], demonstrating upregulation of genes that are enriched for vesicle trafficking and protein degradation pathways, including Golgi vesicle transport, response to ER stress, and autophagy, pathways that are involved in PD.
In summary, our stratified analysis suggests that intermediate-size hexanucleotide repeats in C9orf72 are a risk factor for PD in individuals who do not carry common AJ founder mutations in LRRK2, GBA, or SMPD1. These results should be interpreted with caution as no correction for multiple comparisons was performed, and similar analyses should be performed on a larger cohort of PD patients. However, we propose a model in which the risk for PD may be driven not only by the number of repeats, but also by the genotypes of SNVs within the risk-haplotype, affecting C9orf72 RNA expression levels. Further studies are needed to elucidate the possible role of C9orf72 in PD pathogenesis.