Article

Role of Repeat Tract Structure and the rs7158733 SNP in Spinocerebellar Ataxia 3

1
Ataxia Centre, Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, Queen Square, London WC1N 3BG, UK
2
Neurogenetics Service, Rare and Inherited Disease Laboratory, London North Genomic Laboratory Hub, Great Ormond Street Hospital for Children NHS Foundation Trust, London WC1N 3BH, UK
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Int. J. Mol. Sci. 2025, 26(20), 9836; https://doi.org/10.3390/ijms26209836
Submission received: 12 August 2025 / Revised: 19 September 2025 / Accepted: 20 September 2025 / Published: 10 October 2025
(This article belongs to the Special Issue Molecular Studies in Aging, 2nd Edition)

Abstract

Spinocerebellar ataxia 3 (SCA3) is a neurodegenerative condition caused by an expansion of a polyglutamine tract within the ATXN3 gene. Normal alleles range from 12 to 44 repeats, while pathogenic alleles have 52 repeats or more. The canonical ATXN3 repeat tract sequence includes three interruptions at positions 3 (CAA), 4 (AAG), and 6 (CAA). The intragenic rs7158733 single-nucleotide polymorphism (SNP) flanks the ATXN3 repeat region and substitutes a TAC1118 tyrosine codon with a TAA1118 stop codon, resulting in a shorter ataxin-3aS isoform. We examined the distribution of SCA3 allele repeat sizes in a UK-based cohort presenting with an ataxic phenotype. The 6596 alleles showed a clear gap between normal and expanded alleles, with no intermediate alleles containing 41 to 57 repeats. We used clone sequencing to characterize the structure of the ATXN3 repeat region in a sub-cohort of 44 SCA3 patients. We observed that the three canonical interruptions were typically preserved. No association between the interruptions and age at onset was detected, although this sub-cohort had limited statistical power. We genotyped the rs7158733 SNP in a sub-cohort of 79 SCA3 patients and found that 74.7% of expanded alleles carried the A1118 variant, which was associated with earlier disease onset. This study highlights the importance of rs7158733 genotyping alongside ATXN3 repeat sizing for patient evaluation, as this SNP modifies the effect of repeat size on age at onset in SCA3 for pathogenic alleles up to 69 repeats.

1. Introduction

Spinocerebellar ataxia type 3 (SCA3), also known as Machado–Joseph disease (MJD), is an autosomal dominantly inherited polyglutamine (polyQ) disorder and the most common SCA worldwide. It arises from a CAG repeat expansion within the ATXN3 gene, which encodes the protein ataxin-3. The disease shows a spectrum of clinical presentations, including but not limited to progressive cerebellar ataxia, external ophthalmoplegia, dysarthria, dysphagia, pyramidal signs, dystonia, rigidity, and distal muscle atrophy [1]. Healthy individuals typically carry 12 to 44 repeats, whereas patients carry expanded alleles of 52 to 86 repeats [1,2,3]. Age at onset spans a wide range, from 4 to 75 years [4], with a mean of approximately 40 years [5]. As in other polyglutamine disorders, the number of CAG repeats in the ATXN3 expanded allele is inversely correlated with the age at which the disease manifests and directly correlated with disease severity [1,3,6,7,8].
Interruptions within the CAG repeat tract of the genes responsible for polyglutamine diseases have gained increasing attention due to their potential impact on disease severity, diagnostic accuracy, genetic counselling, elucidation of disease mechanisms, and possible future therapeutic development [9,10,11]. The canonical sequence of the ATXN3 gene repeat tract includes three interruptions at positions 3 (CAA), 4 (AAG), and 6 (CAA) [2]. Like other repeat expansion disorders, SCA3 exhibits repeat instability in both germline and somatic cells. Interruptions within the repeat tract are believed to impact this instability, as uninterrupted stretches are typically more prone to instability and associated deleterious effects.
Three intragenic single-nucleotide polymorphisms (SNPs) flanking the ATXN3 CAG repeat tract were originally studied in SCA3 patients, with their initial nomenclature being based on the sequence described in Kawaguchi et al. [2]. A669TG/G669TG (rs1048755, A669/G669) is located 11.4 kbp upstream from the CAG repeat in exon 8 [12,13]. C987GG/G987GG (rs12895357, C987/G987) is immediately downstream of the CAG repeat tract in exon 10 [2]. Finally, TAA1118/TAC1118 (rs7158733, A1118/C1118) is located 132 bp downstream of the CAG repeat tract in exon 10 [13,14]. Here, we focus on the rs7158733 SNP, where the ochre stop codon TAA1118 is substituted for the codon TAC1118. Previous studies have found the C1118 allele to be more common in chromosomes without the expansion (63% of 124 control chromosomes [13] and 75% of 303 non-expanded chromosomes [14]). Conversely, the A1118 allele was particularly frequent in chromosomes carrying more than 33 repeats (73%) [14]. These results were corroborated in a worldwide haplotype study, where 76% of expanded chromosomes were found to carry the A1118 allele, whereas this allele was present in only 24% of the control chromosomes [15]. Similar results have been reproduced in other cohorts [16]. The pathological significance of the rs7158733 (A1118/C1118) SNP has remained elusive. Previous studies have been unable to identify an association between the A1118/C1118 alleles and age at onset in patients with SCA3/MJD [14,17], or to demonstrate an influence of the SNP in a preliminary analysis of patients’ clinical presentation [14]. However, data regarding the effect of the SNP on clinical progression or its association with other biomarkers have not yet been reported. The rs7158733 SNP seems to have an effect at both the RNA and protein levels. The presence of the TAA1118 codon has been predicted to have an impact on RNA secondary structure and the binding of RNA-binding proteins (RBPs) to the ATXN3 transcript [17].
The presence of the stop codon TAA1118 also produces a shorter isoform of ataxin-3, referred to as ataxin-3aS (short), whilst the presence of the TAC1118 codon produces the ataxin-3aL (long) isoform (Figure 1). Compared to other isoforms, ataxin-3aS has a lower protein concentration and shorter half-life, resulting from its degradation through both the autophagy pathway and the ubiquitin–proteasome system (UPS). However, ataxin-3aS shows a higher nuclear localization and insolubility than other isoforms, producing larger protein aggregates than ataxin-3aL [18]. Together, these characteristics indicate that ataxin-3aS is more prone to aggregation and, therefore, is likely to have a higher pathogenicity.
In this study, we aim to determine whether genetic biomarkers, such as the repeat tract configuration in the expanded ATXN3 allele and the rs7158733 SNP variant, are associated with distinct SCA3 phenotypes in our cohort. In particular, we seek to understand how these factors influence the age at onset, disease duration, and clinical presentation of SCA3. Additionally, we aim to identify any intermediate alleles in our large cohort, because the gap between normal and pathogenic alleles observed in SCA3, unlike that in SCA1 (Figure S3), has remained unresolved since the discovery of the gene over 30 years ago [2].

2. Results

2.1. ATXN3 Allele Distribution in a Large UK Cohort

A summary of the cohorts and sub-cohorts in this study is shown in Figure 2. SCA3 diagnostic tests were performed on 6596 discrete chromosomes from 3298 individuals by the Neurogenetics Unit at the National Hospital for Neurology and Neurosurgery, London. The frequency and distribution of the repeat sizes are shown in Figure 3. Normal alleles in this cohort range from 12 to 40 repeats (n = 6510; mean = 21 repeats) and are shown in Figure 3A, whilst expanded alleles range from 58 to 91 repeats (n = 86; mean = 69 repeats) and are shown in Figure 3B. The median normal allele was 22 repeats, with an interquartile range of 6 repeats. The modal normal allele also contained 22 repeats and was detected in 1471 chromosomes (22.3% of all tests; 22.6% of normal alleles). The median expanded allele was 70 repeats, with an interquartile range of 7 repeats. The modal expanded allele also contained 70 repeats and was detected in 16 chromosomes (0.24% of all tests; 18.6% of expanded alleles).
There was a clear distinction between normal and expanded alleles, and no intermediate alleles of 41 to 57 repeats were observed in the cohort.
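As a rough illustration of the gap described above, the following Python sketch bins a repeat count using the ranges observed in this cohort (12 to 40 repeats for normal alleles, 58 to 91 for expanded alleles). The function name and the "intermediate" label are illustrative assumptions, and the bounds are this cohort's observed ranges rather than diagnostic cut-offs.

```python
# Observed ranges in this cohort (taken from the text); not clinical thresholds.
NORMAL_MAX = 40     # largest normal allele observed
EXPANDED_MIN = 58   # smallest expanded allele observed


def classify_allele(repeats: int) -> str:
    """Bin a CAG repeat count relative to the cohort's observed gap."""
    if repeats <= NORMAL_MAX:
        return "normal"
    if repeats >= EXPANDED_MIN:
        return "expanded"
    # 41-57 repeats: no such allele was observed among the 6596 alleles
    return "intermediate"


# Number of repeat sizes that fall inside the unobserved gap (41..57).
gap_width = EXPANDED_MIN - NORMAL_MAX - 1
```

With these bounds, the modal normal allele (22 repeats) and the modal expanded allele (70 repeats) fall on either side of a gap spanning 17 repeat sizes.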

2.2. Clone Sequencing

A total of 440 clones were sequenced from the 44 subjects in the “Cloning” sub-cohort. There were 286 clones (65.0%) that contained expanded alleles, whilst 154 (35.0%) contained non-expanded alleles. The median number of clones per participant was 9.5 (Q1, Q3 = 8, 12). Individual CAG repeat sequences from these clones can be found in Table S1. Since the number of clones sequenced for each individual differed, clone numbers were expressed as a percentage to account for clone depth (Table S2).
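The normalisation by clone depth can be sketched as follows. This is a minimal Python illustration, assuming each clone has already been labelled by sequence type; the function name, the labels, and the example participant are hypothetical and are not taken from Tables S1 or S2.

```python
from collections import Counter


def clone_percentages(clone_labels):
    """Return the percentage of clones carrying each label for one participant."""
    counts = Counter(clone_labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical participant with 10 sequenced clones (illustrative labels only).
example = ["expanded"] * 6 + ["non-expanded"] * 4
percentages = clone_percentages(example)

# The same arithmetic applied to the cohort totals reported above.
cohort_expanded_pct = 100.0 * 286 / 440
```

Expressing counts as percentages lets a participant with 8 clones be compared with one who has 12, at the cost of hiding how many clones support each value.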
The canonical ATXN3 repeat tract contains three interruptions towards its 5′ end: CAA at position 3, AAG at position 4, and CAA at position 6 [2]. In the “Cloning” sub-cohort, 407 out of 440 clones (92.5%) contained sequences with the canonical ATXN3 repeat tract. A total of 29 participants in the “Cloning” sub-cohort (65.9%) had clones with only canonical repeat tract sequences. Information about the most frequent non-expanded allele, the most frequent expanded allele, and the most frequent loss of canonical interruption allele can be found in Table 1.

2.2.1. Loss of Canonical Interruptions

There were 25 clones (5.7%) containing sequences that lacked one or more canonical interruptions in expanded alleles that belonged to eight patients (18.2% of the “Cloning” sub-cohort). Four of these eight participants (Participants #8, 9, 17, and 42) showed non-canonical sequences in all clones for their expanded alleles, whilst the other four presented with a combination of canonical and non-canonical clones for their expanded alleles (Participants #5, 13, 25, and 31). None of the clones lacking one or more of the canonical interruptions contained additional downstream interruptions in the CAG tract. The canonical CAA interruption at position 6 was lost in 23 clones, whereas the AAG interruption at position 4 was missing in four clones, and the CAA interruption at position 3 was only absent in a single clone with a pure CAG stretch.

2.2.2. Presence of Additional Interruptions

Eight clones contained one additional CAA interruption in the CAG repeat tract and belonged to the expanded alleles from seven subjects (15.9% of the “Cloning” sub-cohort). However, all of these participants had clones with canonical alleles and clones with an additional interruption in their expanded allele sequences. The additional interruptions arose from CAG to CAA codon transitions in positions 8 (n = 3), 9 (n = 2), 10 (n = 1), 12 (n = 1), and 25 (n = 1). None of the clones with additional interruptions showed loss of any of the canonical interruptions.

2.2.3. Correlation with Age at Disease Onset

For the 44 subjects in the “Cloning” sub-cohort, age at disease onset information was available for 34 patients (77.3%). We examined the influence of pathogenic allele repeat length on the patient’s age at onset, finding a negative correlation between the age at disease onset and the median pathogenic allele repeat tract size (Figure 4), with a Pearson correlation coefficient r = −0.360 (p = 0.037) and fitting a linear regression model (slope = −1.039, y-intercept = 114.6, R2 = 0.129).
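To make the fitted line concrete, the short Python sketch below evaluates it at a given repeat size using the reported slope and intercept. The function name is an assumption for illustration. Because R2 is only 0.129, the line explains a small share of the variance, so the output is a cohort-level trend rather than a prediction for any individual.

```python
SLOPE = -1.039      # years of onset age per additional CAG repeat (reported)
INTERCEPT = 114.6   # years (reported)
PEARSON_R = -0.360  # reported correlation coefficient


def fitted_onset(repeats: float) -> float:
    """Age at onset (years) on the fitted regression line for a repeat size."""
    return INTERCEPT + SLOPE * repeats


# For simple linear regression, R^2 equals the squared Pearson coefficient.
r_squared = PEARSON_R ** 2

onset_at_70 = fitted_onset(70)  # the modal expanded allele size in the diagnostic cohort
```

Squaring the reported r gives about 0.13, which matches the reported R2 up to rounding.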

2.2.4. Non-Expanded Alleles

The size of non-expanded alleles had a unimodal distribution, with alleles of 23 CAG repeats being the most frequent (n = 8; 21.6%). The median number of CAG repeats in non-expanded alleles was 23.0 repeats (Q1, Q3 = 22.0, 30.0). All clones carrying non-expanded alleles contained canonical sequences.

2.3. rs7158733 SNP (A1118/C1118) Allele Frequencies

For expanded alleles, 59 out of 79 alleles (74.7%) carried the A1118 variant of the SNP, whilst for non-expanded alleles, 26 out of 79 (32.9%) carried the A1118 variant of the SNP (Table 2). There was a statistically significant association between the expanded allele and the A1118 variant of the SNP (McNemar’s χ2 = 18.46; df = 1; p < 0.001), with an OR = 3.54 (95% CI with exact approximation: 1.88, 7.14).
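The paired test statistic can be reproduced with a few lines of Python. McNemar's test uses only the discordant pairs, that is, patients whose two alleles carry different SNP variants. The counts below (46 and 13) are not taken from Table 2; they are inferred here as the values consistent with the reported marginals (59/79 and 26/79) and the reported χ2, and the uncorrected form of the statistic is assumed.

```python
def mcnemar(b: int, c: int):
    """McNemar chi-square (no continuity correction) and paired odds ratio.

    b and c are the two discordant-pair counts of a paired 2x2 table.
    """
    chi2 = (b - c) ** 2 / (b + c)
    odds_ratio = b / c
    return chi2, odds_ratio


# Discordant pair counts inferred from the reported statistics (assumption):
# b = expanded allele A1118, non-expanded allele C1118
# c = expanded allele C1118, non-expanded allele A1118
b, c = 46, 13
chi2, odds_ratio = mcnemar(b, c)
```

Under these assumed counts the sketch gives χ2 ≈ 18.46 and OR ≈ 3.54, in line with the values reported above. The confidence interval is not reproduced because the exact method used is not specified in enough detail here.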
The demographic and clinical characteristics of the SCA3/MJD participants based on their rs7158733 SNP genotype are shown in Table 3. The proportion of females was similar in both groups (Table 3).
Participants with the A1118 allele were significantly younger at baseline compared to C1118 carriers (difference in medians = 6 years) (p = 0.029) (Table 3 and Figure 5A), with a significantly earlier age at onset of the disease (difference in medians = 14 years) (p = 0.017) (Table 3 and Figure 5B). There was no significant difference in either the disease duration (difference in medians = −2 years) (p = 0.583) (Table 3 and Figure 5C) or the size of the expanded allele (difference in medians = −2 repeats) (p = 0.294) (Table 3 and Figure 5D) between subjects with A1118 and C1118. There was also no statistically significant difference in the distribution of the rs7158733 SNP based on ethnicity (χ2 = 9.88; df = 4; Fisher’s exact test, p = 0.054) (Figure S1).
When examining the relationship between age at onset and the number of CAG repeats in the expanded allele, stratifying by the rs7158733 SNP (A1118/C1118) variant of the expanded allele, we found that both SNP groups had a negative correlation between age at disease onset and pathogenic allele repeat tract size (Figure 6). The A1118 group (n = 48) had a Pearson correlation coefficient r = −0.320 (p = 0.027) and fit a linear regression model (slope = −0.779, y-intercept = 90.81, R2 = 0.102). The C1118 group (n = 15) had a Pearson correlation coefficient r = −0.862 (p < 0.0001) and fit a linear regression model (slope = −3.284, y-intercept = 264.4, R2 = 0.743). There was a significant difference in the regression line slopes (F = 9.348; p = 0.003), suggesting that the SNP acts as a modifier of the effect of the number of CAG repeats on the age at disease onset (Figure 6). Multiple linear regression analysis showed an earlier age of onset in the A1118 group adjusted by the number of CAG repeats in the expanded allele (−7.61 years on average, p = 0.032), with larger differences in age of onset for lower numbers of CAG repeats in the expanded allele (Figure 6). In our cohort, sex and the number of CAG repeats in the non-expanded allele were not associated with age of onset.
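The repeat size at which the two fitted lines cross can be worked out directly from the reported slopes and intercepts. The Python sketch below is an illustration of that arithmetic only; the variable and function names are assumptions, and the fitted lines carry their own uncertainty (notably for the C1118 group, n = 15), which this calculation ignores.

```python
A_SLOPE, A_INTERCEPT = -0.779, 90.81   # A1118 group (reported)
C_SLOPE, C_INTERCEPT = -3.284, 264.4   # C1118 group (reported)


def fitted_onset(slope: float, intercept: float, repeats: float) -> float:
    """Age at onset (years) on a fitted regression line."""
    return intercept + slope * repeats


# Setting the two lines equal and solving for the repeat size:
# A_INTERCEPT + A_SLOPE * x = C_INTERCEPT + C_SLOPE * x
crossover = (C_INTERCEPT - A_INTERCEPT) / (A_SLOPE - C_SLOPE)

# Fitted difference (C1118 minus A1118) below and above the crossover.
diff_at_62 = fitted_onset(C_SLOPE, C_INTERCEPT, 62) - fitted_onset(A_SLOPE, A_INTERCEPT, 62)
diff_at_75 = fitted_onset(C_SLOPE, C_INTERCEPT, 75) - fitted_onset(A_SLOPE, A_INTERCEPT, 75)
```

The lines cross at roughly 69.3 repeats. Below that size the C1118 line sits above the A1118 line (later fitted onset), and above it the order reverses.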
The effect of the number of CAG repeats in the non-expanded allele on age at disease onset was also explored. However, analyses showed no association between the number of CAG repeats in the non-expanded allele and the age at onset of the disease, stratifying based on rs7158733 SNP (A1118/C1118) variant.
When examining the baseline patient rating scales, there was no significant difference in scale for the assessment and rating of ataxia (SARA), inventory of non-ataxia signs (INAS), or activities of daily living (ADL) scores between patients with the A1118 rs7158733 SNP and patients with the C1118 SNP in cis with their expanded allele (Figure S2). Longitudinal progression changes in SARA, INAS, and ADL were also examined, and no significant differences between the rs7158733 SNP groups were found.
Moreover, we wanted to understand whether there was an association between the presence of the rs7158733 SNP and the presentation of the disease. Table S3 summarises the symptoms at onset in the “Genotyping” sub-cohort. In both groups, the most common symptom was ataxia, with very few patients displaying parkinsonism or spasticity as initial symptoms. There was no significant difference in the symptoms at onset between groups.

3. Discussion

We adopted a deep-genotyping approach to improve our understanding of the genetics of SCA3. Firstly, we aimed to identify intermediate alleles in a large cohort of subjects who have undergone SCA3 tests. Finding such alleles would close the gap between normal and pathogenic alleles currently seen in SCA3, which has not been observed for other CAG repeat disorders, where instability in intermediate and reduced penetrance alleles leads to a repeat expanding past the pathogenic threshold (Figure S3). This study showed a prominent gap between normal and pathogenic alleles in our cohort, with no intermediate alleles detected with 41 to 57 repeats. This gap has been previously described by several groups in different populations. When ATXN3 was first cloned in 1994, no intermediate alleles between 37 and 67 repeats were found in the original Japanese cohort [2]. Subsequent studies in an Azorean cohort [19], a Taiwanese cohort [20], a British cohort [3], a Dutch individual [21], and others have narrowed the gap to between 45 and 51 repeats. Examining the SCA3 diagnostic tests subsequently performed at the NHNN between 2015 and 2020, we found that the intermediate-allele gap is still present, with no alleles between 37 and 57 repeats (774 alleles were between 14 and 36 repeats and 30 alleles were between 58 and 76 repeats). A shift to using tethered-PCR for sizing accuracy [22] has further confirmed the gap. Out of 704 SCA3 tests performed at the North Thames Genomics Laboratory Hub (GLH) between 2020 and the present date in 2025, 662 had repeat sizes of 14 to 39, whilst 42 had repeat sizes of 59 to 79. In this series, the intermediate gap therefore still spans 40–58 repeats.
The confirmation of the gap between normal and pathogenic alleles in our cohort suggests that this unique behaviour for SCA3 may be due to the founder origins of the Machado and Joseph lineages, where such limited haplotypes have not been observed for other polyglutamine disorders [12,15,23,24,25,26,27,28,29,30,31,32,33]. The gap is still distinct when using an improved repeat sizing technique, spinocerebellar ataxia tethering PCR [22]. Among the 704 SCA3 diagnostic tests performed at the North Thames Genomics Laboratory Hub (GLH) from 2020 to date, no intermediate alleles with 40 to 58 repeats were detected.
To study the repeat tract structure and the role of interruptions in SCA3 pathology, we cloned and sequenced the ATXN3 repeat tract in a sub-cohort of patients. We sequenced 440 clones from 44 patients, with the majority (286 clones, 65.0%) containing expanded alleles. A negative correlation was observed between the age at disease onset and the median pathogenic allele repeat tract size. Most clones (92.5%) contained sequences with the canonical ATXN3 repeat tract with the three interruptions at positions 3 (CAA), 4 (AAG), and 6 (CAA), indicating that the loss or gain of any interruptions is rare. The canonical interruptions in the ATXN3 repeat tract appear consistent across different human populations and even in some non-human primates (e.g., chimpanzees and gorillas) [34]. Unlike most CAG repeat genes, such as ATXN1, the interruptions in ATXN3 are not completely lost from the repeat tract during pathological expansion. In our cohort, all expanded alleles, except one potential artefact, contained interruptions, with the polymorphic CAG present at position 7 of the tract in these alleles. Four subjects (9.1% of the “Cloning” sub-cohort) exhibited a consistent loss of the CAA interruption at position 6, a change previously described for non-expanded alleles [2]. Our data indicate that changes in the repeat tract structure are rare, and interruptions do not seem to influence the age at onset in SCA3, unlike interruptions in SCA1 [9] and Huntington’s Disease (HD) [35,36], where interruptions delay age at onset, and their loss leads to an earlier age at onset.
We genotyped a sub-cohort for the rs7158733 SNP and found that 74.7% of expanded alleles carried the A1118 variant of the SNP, a frequency concordant with previous studies [14,15,16]. Individuals with the A1118 SNP in cis with their expanded allele had a significantly lower baseline age and an earlier age at onset than those with the C1118 SNP, despite both groups having similar median expanded allele repeat sizes. Previous studies could not find differences in age at onset between the A1118 and C1118 groups [14,17], which could be explained by differences in the composition of the cohorts and/or data collection. For instance, Melo et al. studied an Azorean patient cohort with a predominance of the C1118 SNP in cis with expanded alleles (74.2%) and considered the age at onset as the “self-reported age of first symptoms of the disease” [17]. The Azorean population presents with the highest worldwide prevalence of SCA3/MJD [37]. This could bias the reported age at onset towards earlier values because of increased awareness in that population and better recognition of very early symptoms in the disease course. Additionally, European studies such as the European integrated project on spinocerebellar ataxias (EUROSCA) and the European spinocerebellar ataxia type 3/Machado–Joseph disease initiative (ESMI) protocols consider age at onset of gait difficulties as the age at onset of the disease, even if these are not the first symptoms of the disease.
Alleles carrying the TAA1118 codon code for a shorter ataxin-3 isoform (ataxin-3aS) [18]. Ataxin-3aS has a shorter in vitro half-life compared to ataxin-3aL and ataxin-3c, due to combined degradation via autophagy and the ubiquitin-proteasome system [18]. Additionally, ataxin-3aS shows higher enzymatic activity, a preferential nuclear localisation (even for fragments with low numbers of glutamines), and greater insolubility compared to the other isoforms [18]. These properties indicate that ataxin-3aS is a more pathogenic isoform, and hence, patients in the A1118 group would be expected to present with a more severe condition than C1118 carriers.
When stratifying the sub-cohort based on the rs7158733 (A1118/C1118) SNP, the regression line slope for the C1118 group is significantly steeper compared to the A1118 group. This indicates that for pathogenic allele sizes up to 69 repeats, the age at onset for individuals with a C1118 SNP is later than for those with an A1118 SNP. The milder phenotype in terms of age at onset within the C1118 group supports the proposition that the ataxin-3aS isoform expressed by the A1118 group is more pathogenic. For lower repeat numbers, the A1118 allele produces a more toxic form of ataxin-3 (ataxin-3aS), resulting in an earlier age at onset. The impact of longer polyglutamine tracts would override the effect of this isoform, thus both A1118 and C1118 groups would exhibit a similar age at onset. Therefore, it is possible that the rs7158733 (A1118/C1118) SNP acts as a disease modifier in carriers of expanded alleles with lower repeat numbers.
There was no significant difference in cross-sectional patient rating scales (SARA, INAS, or ADL) or symptoms at onset between patients in the A1118 group and the C1118 group. Similarly, the individual rates of progression in SARA total scores for both groups were not significantly different, which may be attributed to the low number of patients in the C1118 SNP group and subjects without a follow-up visit.
Looking at the “genotyping” sub-cohort as a whole, both A1118 and C1118 groups presented with a similar phenotype at onset, with over 90% of participants in both groups experiencing ataxia as their initial symptom. Although data collection could be biased towards cerebellar gait disorders, these figures seem to agree with other studies [38,39]. These results also agree with a previous report that could not identify any differences in disease presentation between both groups [14].
This study has some limitations. The sample size is relatively modest due to SCA3 being a rare disorder. In particular, the C1118 group in the “genotyping” sub-cohort is especially small compared to the A1118 group. Therefore, this study may have lacked enough statistical power to detect all clinically relevant effects and confounders. Comparing our cohort to the recently published European cohort of ESMI [40], the differences we observe, for instance, in the later age at onset in individuals carrying the C1118 SNP, may be due to differences in cohort composition, with our smaller cohort potentially amplifying any observations. However, our cohort is more ethnically diverse and is slightly older at baseline.
Integrating SNP genotyping into the diagnosis of SCA3/MJD can identify individuals suitable for allele-specific silencing therapies. Traditional diagnostic methods rely on PCR-based techniques, with further tests used if larger expansions or interruptions are suspected. Advanced PCR protocols, such as TP-PCR and tethering PCR, streamline diagnosis but do not provide intragenic haplotype details. Recent approaches include the use of haplotype analysis in diagnosis. Future advancements in bioinformatics may enable efficient diagnosis of repeat expansion disorders through Whole Genome Sequencing (WGS) in combination with ExpansionHunter [41], offering intragenic haplotype insights [22,42,43,44,45,46].
In conclusion, we found that the rs7158733 SNP modifies the effect of repeat size on age at onset in SCA3 for pathogenic alleles up to 69 repeats. Therefore, we recommend genotyping this SNP alongside ATXN3 repeat sizing for the diagnostic assessment of patients.

4. Materials and Methods

4.1. Ethical Statement

This study was approved by the London (Queen Square) NHS Research Ethics Committee (reference 09/H0716/53; initial approval date 17 September 2009) at the National Hospital for Neurology and Neurosurgery, London and the University College London NHS Health Research Authority (reference 17/LO/0381; approval date 28 April 2017).

4.2. Patient Cohort

The diagnostic test cohort consists of patients with an ataxic phenotype whose blood samples were sent to the Neurogenetics Unit at The National Hospital for Neurology and Neurosurgery in London to undergo a panel of diagnostic tests for SCA1, SCA2, SCA3, SCA6, and SCA7 between November 1993 and October 2013. For this study, we selected a cohort of 83 patients diagnosed with SCA3 who were seen at the Ataxia Centre in London. A total of 79 patients underwent genotyping for the rs7158733 SNP. Forty-four patients had their ATXN3 repeat tracts cloned and sequenced, with 40 also included in the group genotyped for the rs7158733 SNP. Four patients from the cloning sub-cohort did not undergo rs7158733 SNP genotyping because of DNA shortage from these historic samples. Clinical data were collected from patient records, and rating scales were obtained from at least one visit.

4.3. SCA3 Fragment Sizing and Cloning of SCA3 Allele CAG-Repeat Tracts

Genomic DNA was extracted as previously described [9]. SCA3 alleles were fragment sized by amplifying the CAG-repeat tracts with a FAM-labelled PCR primer, SCA3F (6-FAM-5′-CCAGTGACTACTTTGATT-3′), and SCA3R (5′-TGGCCTTTCACATGGATGTGAA-3′) before resolving fragments on an ABI 3730xl DNA analyzer with a GeneScan 500 LIZ Size Standard (Thermo Fisher Scientific, Waltham, MA, USA). Repeat lengths were calculated by subtracting the number of flanking bases in the PCR product outside the repeat region (172 bp) from the fragment size and then dividing by 3.
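The repeat-length calculation can be written as a small Python helper. The 172 bp flank comes from the text above; the function names and the rounding to the nearest whole repeat are assumptions, since capillary sizing typically returns non-integer fragment sizes and the rounding rule is not stated here.

```python
FLANK_BP = 172  # flanking bases in the PCR product outside the repeat region


def repeats_from_fragment(fragment_bp: float, flank_bp: int = FLANK_BP) -> int:
    """Convert a sized fragment (bp) to a CAG repeat count.

    Rounding to the nearest integer is an assumption, not a stated rule.
    """
    return round((fragment_bp - flank_bp) / 3)


def fragment_from_repeats(repeats: int, flank_bp: int = FLANK_BP) -> int:
    """Expected fragment size (bp) for a given CAG repeat count."""
    return flank_bp + 3 * repeats
```

For example, under these assumptions a 382 bp fragment corresponds to 70 repeats, and a 22-repeat allele would be expected at 238 bp.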
Two strategies were employed for clone sequencing. The CAG-repeat tracts were amplified with either ATXN3x12 BamHI Forward (5′-taccgagctcggatccGTGTCAAACTTCTGACCTCAAGCC-3′) and ATXN3x12 XhoI Reverse (5′-gccctctagactcgagATGAATGGTGAGCAGGCCTTACCT-3′) primers, digested with BamHI and XhoI restriction enzymes and cloned into a pcDNA3.1(+) vector, or with ATXN3x12 Rpt SNP For (5′-ACCACTCCTGGCCATGATAG-3′) and ATXN3x12 Rpt SNP Rev (5′-AGCAATCCTCTCCTGCCTTG-3′) primers and ligated into a pCR-Blunt vector using the Zero Blunt PCR Cloning kit (Thermo Fisher Scientific). Plasmids were propagated in Stbl3 E. coli (Thermo Fisher Scientific).

4.4. Characterisation of the rs7158733 SNP (A1118/C1118)

Genotyping of the rs7158733 SNP (A1118/C1118) was performed by allele-specific PCR using a 6-FAM-labelled forward primer and an unlabelled allele-specific reverse primer as previously described [16]. The primers used were rs7158733 For (FAM-5′-CCAGTGACTACTTTGATTCG-3′) and either rs7158733 C Rev (5′-AAAAATCACATGGAGCTCG-3′) for TAC1118 or rs7158733 A Rev (5′-AAAAATCACATGGAGCTCT-3′) for TAA1118. The thermocycling conditions were 94 °C for 2 min; 10 cycles of 94 °C for 15 s, 70 °C for 15 s decreasing 1 °C per cycle, 68 °C for 15 s; 32 cycles of 94 °C for 15 s, 60 °C for 30 s, 68 °C for 15 s using Invitrogen™ Platinum™ II Hot-Start PCR Master Mix (Thermo Fisher Scientific). PCR products were purified using a GeneJET Gel Extraction and DNA Cleanup Micro Kit (Thermo Fisher Scientific) according to the manufacturer’s protocol, eluting with 10 µL elution buffer.
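The touchdown profile described above can be laid out cycle by cycle with a short Python sketch. It assumes that the first touchdown cycle anneals at 70 °C and that the 1 °C decrement applies from the second cycle onwards; the function and parameter names are illustrative.

```python
def touchdown_annealing(start_c=70, step_c=1, touchdown_cycles=10,
                        final_c=60, final_cycles=32):
    """Annealing temperature (deg C) for every cycle of a touchdown PCR."""
    # Touchdown phase: start_c, then one step lower on each subsequent cycle.
    temps = [start_c - step_c * i for i in range(touchdown_cycles)]
    # Fixed-temperature phase for the remaining cycles.
    temps += [final_c] * final_cycles
    return temps


schedule = touchdown_annealing()
```

Under this reading, the touchdown phase runs from 70 °C down to 61 °C, so the following 32 cycles at 60 °C continue the same 1 °C progression, for 42 cycles in total.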
A total of 1 µL of purified 6-FAM-labelled PCR products was combined with 12 µL HiDi Formamide (Applied Biosystems, Waltham, MA, USA) and 0.3 µL GeneScan 500 LIZ® Size Standard (Applied Biosystems) and resolved on an ABI 3730xl DNA Analyzer (Applied Biosystems). Fragment analysis was performed using the Peak Scanner analysis module on the Thermo Fisher Cloud. The CAG repeat length for each allele associated with an rs7158733 SNP was calculated by subtracting 268 bp from the fragment size and dividing this by 3.

4.5. Statistical Analysis

Statistical analysis was conducted with Prism (version 10.2.3, GraphPad Software, Inc., Boston, MA, USA) and Stata (version 17.0, StataCorp LLC, College Station, TX, USA). Graphs were prepared in Prism.
In the “Cloning” sub-cohort, for a single participant, each allele (expanded or normal) is represented by a variable number of clones. For each participant, the size of their pathogenic expanded and normal alleles was calculated as the median size of all the clones representing that given allele. The median was chosen rather than the mean to avoid the influence of extreme observations on the data. The relationship between age at onset and median pathogenic allele size was examined using Pearson’s r correlation coefficient, and the data were fitted to a simple linear regression model.
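The choice of the median over the mean can be illustrated with a small Python example. The clone repeat counts below are hypothetical and are not taken from Table S1; they include one outlying clone to show how each summary responds.

```python
from statistics import mean, median


def allele_size(clone_repeats):
    """Summarise one allele as the median repeat count across its clones."""
    return median(clone_repeats)


# Hypothetical repeat counts from eight clones of one expanded allele,
# including a single outlying clone (84 repeats).
clones = [68, 69, 69, 70, 70, 70, 71, 84]

median_size = allele_size(clones)
mean_size = mean(clones)
```

Here the median stays at 70 repeats, whereas the mean is pulled up to about 71.4 by the single outlying clone.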
McNemar’s test for paired data was used to examine the association of the rs7158733 SNP variant with the expanded allele.
Groups for comparison were defined by the variant of the rs7158733 SNP in the expanded allele (A1118 or C1118). Bivariate analysis of differences in demographic, genetic variables, and disease characteristics between the groups was performed using Pearson’s χ2 tests and Mann–Whitney’s tests. Linear regression slopes for rs7158733 SNP groups were compared using a two-tailed F test. Adjusted effects of the A1118/C1118 SNP on age of onset were also analysed through multiple linear regression models, including the following covariates in the maximum model: number of CAG repeats in the expanded allele, and the interaction between the number of CAG repeats in the expanded allele and the SNP allele. To examine a possible effect of the number of CAG repeats in the non-expanded allele on age of onset, multiple linear regression models were fitted with the following covariates: version of the A1118/C1118 SNP, and number of CAG repeats in the expanded allele.
Data on differences in baseline patient rating scales (SARA, INAS, and ADL) were analysed with unpaired two-tailed t-tests.
Fisher’s exact tests were used to compare the frequency of different symptoms at onset in both groups.
A p-value of ≤0.05 was considered statistically significant.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms26209836/s1. References [47,48] are cited in the supplementary materials.

Author Contributions

Conceptualization, P.G.; methodology, S.N., H.G.-M., J.A. and R.L.; formal analysis, S.N. and H.G.-M.; investigation, S.N., H.G.-M., J.A. and R.L.; resources, P.G.; writing—original draft preparation, S.N., H.G.-M. and J.A.; writing—review and editing, S.N., H.G.-M. and P.G.; visualization, S.N.; supervision, P.G.; funding acquisition, P.G. All authors have read and agreed to the published version of the manuscript.

Funding

P.G. received support from the National Institute for Health Research University College Hospitals Biomedical Research Centre UCLH and the North Thames CRN. S.N., H.G.-M., J.A., and P.G. work at University College London Hospitals/University College London, which receives a proportion of funding from the Department of Health’s National Institute for Health Research Biomedical Research Centre’s funding scheme. P.G. received funding from CureSCA3 in support of H.G.-M.’s work. P.G. received funding from MRC MR/N028767/1 ESMI.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the London (Queen Square) NHS Ethics Committee (reference 09/H0716/53; initial approval date 17 September 2009) and the University College London NHS Health Research Authority (reference 17/LO/0381; approval date 28 April 2017).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The de-identified data used in this study are available from the corresponding author.

Acknowledgments

We thank the patients and their families for their valuable contribution to the research and for making their data available. We thank John Hardy for valuable discussions.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
ADL: Activities of daily living
ESMI: European spinocerebellar ataxia type 3/Machado–Joseph disease initiative
EUROSCA: European integrated project on spinocerebellar ataxias
HD: Huntington’s Disease
INAS: Inventory of non-ataxia signs
MJD: Machado–Joseph disease
RBP: RNA-binding protein
SARA: Scale for the assessment and rating of ataxia
SCA: Spinocerebellar ataxia
SNP: Single-nucleotide polymorphism
TP-PCR: Triplet repeat primed PCR
WGS: Whole Genome Sequencing

References

  1. do Carmo Costa, M.; Paulson, H.L. Toward understanding Machado-Joseph disease. Prog. Neurobiol. 2012, 97, 239–257.
  2. Kawaguchi, Y.; Okamoto, T.; Taniwaki, M.; Aizawa, M.; Inoue, M.; Katayama, S.; Kawakami, H.; Nakamura, S.; Nishimura, M.; Akiguchi, I. CAG expansions in a novel gene for Machado-Joseph disease at chromosome 14q32.1. Nat. Genet. 1994, 8, 221–228.
  3. Giunti, P.; Sweeney, M.G.; Harding, A.E. Detection of the Machado-Joseph disease/spinocerebellar ataxia three trinucleotide repeat expansion in families with autosomal dominant motor disorders, including the Drew family of Walworth. Brain A J. Neurol. 1995, 118 Pt 5, 1077–1085.
  4. Carvalho, D.R.; La Rocque-Ferreira, A.; Rizzo, I.M.; Imamura, E.U.; Speck-Martins, C.E. Homozygosity enhances severity in spinocerebellar ataxia type 3. Pediatr. Neurol. 2008, 38, 296–299.
  5. Sequeiros, J.; Coutinho, P. Epidemiology and clinical aspects of Machado-Joseph disease. Adv. Neurol. 1993, 61, 139–153.
  6. Jardim, L.B.; Pereira, M.L.; Silveira, I.; Ferro, A.; Sequeiros, J.; Giugliani, R. Neurologic findings in Machado-Joseph disease: Relation with disease duration, subtypes, and (CAG)n. Arch. Neurol. 2001, 58, 899–904.
  7. Schols, L.; Amoiridis, G.; Epplen, J.T.; Langkafel, M.; Przuntek, H.; Riess, O. Relations between genotype and phenotype in German patients with the Machado-Joseph disease mutation. J. Neurol. Neurosurg. Psychiatry 1996, 61, 466–470.
  8. Maciel, P.; Gaspar, C.; DeStefano, A.L.; Silveira, I.; Coutinho, P.; Radvany, J.; Dawson, D.M.; Sudarsky, L.; Guimaraes, J.; Loureiro, J.E.; et al. Correlation between CAG repeat length and clinical features in Machado-Joseph disease. Am. J. Hum. Genet. 1995, 57, 54–61.
  9. Menon, R.P.; Nethisinghe, S.; Faggiano, S.; Vannocci, T.; Rezaei, H.; Pemble, S.; Sweeney, M.G.; Wood, N.W.; Davis, M.B.; Pastore, A.; et al. The role of interruptions in polyQ in the pathology of SCA1. PLoS Genet. 2013, 9, e1003648.
  10. Wiethoff, S.; O’Connor, E.; Haridy, N.A.; Nethisinghe, S.; Wood, N.; Giunti, P.; Bettencourt, C.; Houlden, H. Sequencing analysis of the SCA6 CAG expansion excludes an influence of repeat interruptions on disease onset. J. Neurol. Neurosurg. Psychiatry 2018, 89, 1226–1227.
  11. Nethisinghe, S.; Lim, W.N.; Ging, H.; Zeitlberger, A.; Abeti, R.; Pemble, S.; Sweeney, M.G.; Labrum, R.; Cervera, C.; Houlden, H.; et al. Complexity of the Genetics and Clinical Presentation of Spinocerebellar Ataxia 17. Front. Cell. Neurosci. 2018, 12, 429.
  12. Martins, S.; Calafell, F.; Gaspar, C.; Wong, V.C.N.; Silveira, I.; Nicholson, G.A.; Brunt, E.R.; Tranebjaerg, L.; Stevanin, G.; Hsieh, M.; et al. Asian origin for the worldwide-spread mutational event in Machado-Joseph disease. Arch. Neurol. 2007, 64, 1502–1508.
  13. Goto, J.; Watanabe, M.; Ichikawa, Y.; Yee, S.B.; Ihara, N.; Endo, K.; Igarashi, S.; Takiyama, Y.; Gaspar, C.; Maciel, P.; et al. Machado-Joseph disease gene products carrying different carboxyl termini. Neurosci. Res. 1997, 28, 373–377.
  14. Stevanin, G.; Lebre, A.S.; Mathieux, C.; Cancel, G.; Abbas, N.; Didierjean, O.; Durr, A.; Trottier, Y.; Agid, Y.; Brice, A. Linkage disequilibrium between the spinocerebellar ataxia 3/Machado-Joseph disease mutation and two intragenic polymorphisms, one of which, X359Y, affects the stop codon. Am. J. Hum. Genet. 1997, 60, 1548–1552.
  15. Gaspar, C.; Lopes-Cendes, I.; Hayes, S.; Goto, J.; Arvidsson, K.; Dias, A.; Silveira, I.; Maciel, P.; Coutinho, P.; Lima, M.; et al. Ancestral origins of the Machado-Joseph disease mutation: A worldwide haplotype study. Am. J. Hum. Genet. 2001, 68, 523–528.
  16. Prudencio, M.; Garcia-Moreno, H.; Jansen-West, K.R.; Al-Shaikh, R.H.; Gendron, T.F.; Heckman, M.G.; Spiegel, M.R.; Carlomagno, Y.; Daughrity, L.M.; Song, Y.; et al. Toward allele-specific targeting therapy and pharmacodynamic marker for spinocerebellar ataxia type 3. Sci. Transl. Med. 2020, 12, eabb7086.
  17. Melo, A.R.V.; Raposo, M.; Ventura, M.; Martins, S.; Pavao, S.; Alonso, I.; Bettencourt, C.; Lima, M. Genetic Variation in ATXN3 (Ataxin-3) 3’UTR: Insights into the Downstream Regulatory Elements of the Causative Gene of Machado-Joseph Disease/Spinocerebellar Ataxia Type 3. Cerebellum 2023, 22, 37–45.
  18. Weishäupl, D.; Schneider, J.; Peixoto Pinheiro, B.; Ruess, C.; Dold, S.M.; von Zweydorf, F.; Gloeckner, C.J.; Schmidt, J.; Riess, O.; Schmidt, T. Physiological and pathophysiological characteristics of ataxin-3 isoforms. J. Biol. Chem. 2019, 294, 644–661.
  19. Tuite, P.J.; Rogaeva, E.A.; St George-Hyslop, P.H.; Lang, A.E. Dopa-responsive parkinsonism phenotype of Machado-Joseph disease: Confirmation of 14q CAG expansion. Ann. Neurol. 1995, 38, 684–687.
  20. Hsieh, M.; Tsai, H.F.; Lu, T.M.; Yang, C.Y.; Wu, H.M.; Li, S.Y. Studies of the CAG repeat in the Machado-Joseph disease gene in Taiwan. Hum. Genet. 1997, 100, 155–162.
  21. van Schaik, I.N.; Jöbsis, G.J.; Vermeulen, M.; Keizers, H.; Bolhuis, P.A.; de Visser, M. Machado-Joseph disease presenting as severe asymmetric proximal neuropathy. J. Neurol. Neurosurg. Psychiatry 1997, 63, 534–536.
  22. Cagnoli, C.; Brussino, A.; Mancini, C.; Ferrone, M.; Orsi, L.; Salmin, P.; Pappi, P.; Giorgio, E.; Pozzi, E.; Cavalieri, S.; et al. Spinocerebellar Ataxia Tethering PCR: A Rapid Genetic Test for the Diagnosis of Spinocerebellar Ataxia Types 1, 2, 3, 6, and 7 by PCR and Capillary Electrophoresis. J. Mol. Diagn. 2018, 20, 289–297.
  23. Mittal, U.; Sharma, S.; Chopra, R.; Dheeraj, K.; Pal, P.K.; Srivastava, A.K.; Mukerji, M. Insights into the mutational history and prevalence of SCA1 in the Indian population through anchored polymorphisms. Hum. Genet. 2005, 118, 107–114.
  24. Sena, L.S.; Furtado, G.V.; Pedroso, J.L.; Barsottini, O.; Cornejo-Olivas, M.; Nobrega, P.R.; Braga Neto, P.; Soares, D.M.B.; Vargas, F.R.; Godeiro, C.; et al. Spinocerebellar ataxia type 2 has multiple ancestral origins. Park. Relat. Disord. 2024, 120, 105985.
  25. Craig, K.; Takiyama, Y.; Soong, B.W.; Jardim, L.B.; Saraiva-Pereira, M.L.; Lythgow, K.; Morino, H.; Maruyama, H.; Kawakami, H.; Chinnery, P.F. Pathogenic expansions of the SCA6 locus are associated with a common CACNA1A haplotype across the globe: Founder effect or predisposing chromosome? Eur. J. Hum. Genet. 2008, 16, 841–847.
  26. Jonasson, J.; Juvonen, V.; Sistonen, P.; Ignatius, J.; Johansson, D.; Bjorck, E.J.; Wahlstrom, J.; Melberg, A.; Holmgren, G.; Forsgren, L.; et al. Evidence for a common Spinocerebellar ataxia type 7 (SCA7) founder mutation in Scandinavia. Eur. J. Hum. Genet. 2000, 8, 918–922.
  27. Zühlke, C.; Dalski, A.; Schwinger, E.; Finckh, U. Spinocerebellar ataxia type 17: Report of a family with reduced penetrance of an unstable Gln49 TBP allele, haplotype analysis supporting a founder effect for unstable alleles and comparative analysis of SCA17 genotypes. BMC Med. Genet. 2005, 6, 27.
  28. Lund, A.; Udd, B.; Juvonen, V.; Andersen, P.M.; Cederquist, K.; Davis, M.; Gellera, C.; Kolmel, C.; Ronnevi, L.O.; Sperfeld, A.D.; et al. Multiple founder effects in spinal and bulbar muscular atrophy (SBMA, Kennedy disease) around the world. Eur. J. Hum. Genet. 2001, 9, 431–436.
  29. Martins, S.; Matama, T.; Guimaraes, L.; Vale, J.; Guimaraes, J.; Ramos, L.; Coutinho, P.; Sequeiros, J.; Silveira, I. Portuguese families with dentatorubropallidoluysian atrophy (DRPLA) share a common haplotype of Asian origin. Eur. J. Hum. Genet. 2003, 11, 808–811.
  30. Veneziano, L.; Mantuano, E.; Catalli, C.; Gellera, C.; Durr, A.; Romano, S.; Spadaro, M.; Frontali, M.; Novelletto, A. A shared haplotype for dentatorubropallidoluysian atrophy (DRPLA) in Italian families testifies of the recent introduction of the mutation. J. Hum. Genet. 2014, 59, 153–157.
  31. Burke, J.R.; Wingfield, M.S.; Lewis, K.E.; Roses, A.D.; Lee, J.E.; Hulette, C.; Pericak-Vance, M.A.; Vance, J.M. The Haw River syndrome: Dentatorubropallidoluysian atrophy (DRPLA) in an African-American family. Nat. Genet. 1994, 7, 521–524.
  32. Warner, T.T.; Williams, L.; Harding, A.E. DRPLA in Europe. Nat. Genet. 1994, 6, 225.
  33. Yanagisawa, H.; Fujii, K.; Nagafuchi, S.; Nakahori, Y.; Nakagome, Y.; Akane, A.; Nakamura, M.; Sano, A.; Komure, O.; Kondo, I.; et al. A unique origin and multistep process for the generation of expanded DRPLA triplet repeats. Hum. Mol. Genet. 1996, 5, 373–379.
  34. Rubinsztein, D.C.; Leggo, J.; Coetzee, G.A.; Irvine, R.A.; Buckley, M.; Ferguson-Smith, M.A. Sequence variation and size ranges of CAG repeats in the Machado-Joseph disease, spinocerebellar ataxia type 1 and androgen receptor genes. Hum. Mol. Genet. 1995, 4, 1585–1590.
  35. Genetic Modifiers of Huntington’s Disease Consortium. CAG Repeat Not Polyglutamine Length Determines Timing of Huntington’s Disease Onset. Cell 2019, 178, 887–900.e14.
  36. Wright, G.E.B.; Collins, J.A.; Kay, C.; McDonald, C.; Dolzhenko, E.; Xia, Q.; Bečanović, K.; Drögemöller, B.I.; Semaka, A.; Nguyen, C.M.; et al. Length of Uninterrupted CAG, Independent of Polyglutamine Size, Results in Increased Somatic Instability, Hastening Onset of Huntington Disease. Am. J. Hum. Genet. 2019, 104, 1116–1126.
  37. Coutinho, P.; Silva, C.C.; Gonçalves, A.F.; Graça, R.I.; Lourenço, E.; Sequeiros, J.; Loureiro, J.L.; Guimarães, J.; Ribeiro, P. Epidemiologia da doença Machado-Joseph em Portugal. Rev. Port. Neurol. 1994, 3, 69–76.
  38. Schöls, L.; Bauer, P.; Schmidt, T.; Schulte, T.; Riess, O. Autosomal dominant cerebellar ataxias: Clinical features, genetics, and pathogenesis. Lancet Neurol. 2004, 3, 291–304.
  39. Vale, J.; Bugalho, P.; Silveira, I.; Sequeiros, J.; Guimaraes, J.; Coutinho, P. Autosomal dominant cerebellar ataxia: Frequency analysis and clinical characterization of 45 families from Portugal. Eur. J. Neurol. 2010, 17, 124–128.
  40. Elter, T.L.; Sturm, D.; Santana, M.M.; Schaprian, T.; Raposo, M.; Melo, A.R.V.; Lima, M.; Koyak, B.; Oender, D.; Grobe-Einsler, M.; et al. Regional distribution of polymorphisms associated to the disease-causing gene of spinocerebellar ataxia type 3. J. Neurol. 2024, 272, 54.
  41. Dolzhenko, E.; van Vugt, J.J.F.A.; Shaw, R.J.; Bekritsky, M.A.; van Blitterswijk, M.; Narzisi, G.; Ajay, S.S.; Rajan, V.; Lajoie, B.R.; Johnson, N.H.; et al. Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Res. 2017, 27, 1895–1903.
  42. Sequeiros, J.; Seneca, S.; Martindale, J. Consensus and controversies in best practices for molecular genetic testing of spinocerebellar ataxias. Eur. J. Hum. Genet. 2010, 18, 1188–1195.
  43. Martindale, J.E. Diagnosis of Spinocerebellar Ataxias Caused by Trinucleotide Repeat Expansions. Curr. Protoc. Hum. Genet. 2017, 92, 1–22.
  44. Costa, I.P.D.; Almeida, B.C.; Sequeiros, J.; Amorim, A.; Martins, S. A Pipeline to Assess Disease-Associated Haplotypes in Repeat Expansion Disorders: The Example of MJD/SCA3 Locus. Front. Genet. 2019, 10, 38.
  45. Lopes, S.M.; Faro, R.; Lopes, M.M.; Onofre, I.; Mendonça, N.; Ribeiro, J.; Januário, C.; Nobre, R.J.; Pereira de Almeida, L. Protocol for the Characterization of the Cytosine-Adenine-Guanine Tract and Flanking Polymorphisms in Machado-Joseph Disease: Impact on Diagnosis and Development of Gene-Based Therapies. J. Mol. Diagn. 2020, 22, 782–793.
  46. Ibañez, K.; Polke, J.; Hagelstrom, R.T.; Dolzhenko, E.; Pasko, D.; Thomas, E.R.A.; Daugherty, L.C.; Kasperaviciute, D.; Smith, K.R.; WGS for Neurological Diseases Group; et al. Whole genome sequencing for the diagnosis of neurological repeat expansion disorders in the UK: A retrospective diagnostic accuracy and prospective clinical validation study. Lancet Neurol. 2022, 21, 234–245.
  47. Borghero, G.; Pugliatti, M.; Marrosu, F.; Marrosu, M.G.; Murru, M.R.; Floris, G.; Cannas, A.; Parish, L.D.; Cau, T.B.; Loi, D.; et al. ATXN2 is a modifier of phenotype in ALS patients of Sardinian ancestry. Neurobiol. Aging 2015, 36, 2906.e1–2906.e5.
  48. Conforti, F.L.; Spataro, R.; Sproviero, W.; Mazzei, R.; Cavalcanti, F.; Condino, F.; Simone, I.L.; Logroscino, G.; Patitucci, A.; Magariello, A.; et al. Ataxin-1 and ataxin-2 intermediate-length PolyQ expansions in amyotrophic lateral sclerosis. Neurology 2012, 79, 2315–2320.
Figure 1. Schematic showing the ATXN3 gene including the three intragenic SNPs flanking the CAG repeat. There are two full-length isoforms of ataxin-3, which differ based on the inclusion of a third UIM motif encoded by the differentially spliced exon 11. The rs7158733 SNP TAA1118 variant results in a premature stop codon and a shorter isoform, ataxin-3aS.
Figure 2. Summary of the cohorts and sub-cohorts in this study. In total, 79 SCA3 patients from the Ataxia Centre, London, underwent genotyping for the rs7158733 SNP. Of those, 44 SCA3 patients had their ATXN3 repeat tracts cloned and sequenced; this included forty patients who were also genotyped for the rs7158733 SNP. Four patients from the cloning sub-cohort did not undergo rs7158733 SNP genotyping because of DNA shortage.
Figure 3. Frequency distribution of spinocerebellar ataxia type 3 (SCA3) alleles in a UK cohort at the Neurogenetics Unit, National Hospital for Neurology and Neurosurgery, London. In total, 6596 discrete chromosomes were analyzed for SCA3. Normal alleles range from 12 to 40 repeats (n = 6510; mean = 21 repeats) (A), whilst expanded alleles range from 58 to 91 repeats (n = 86; mean = 69 repeats) (B). The bar charts show the frequency distribution, whilst the box-and-whisker plots above indicate the descriptive statistics for alleles in the normal and expanded range. The box indicates the first quartile, the median, and the third quartile, and the whiskers indicate the minimum and maximum values.
Figure 4. Scatter-plot showing the correlation between SCA3 pathogenic allele size and age at disease onset. Median pathogenic allele size was determined from sequenced clones, rounded to the nearest whole repeat, based on the total length of the entire CAG/CAA/AAG repeat tract. Age at onset data were available for 34 out of 44 subjects in the “Cloning” sub-cohort. Median pathogenic allele size showed a negative correlation with respect to age at onset (Pearson r = −0.360). The bold line depicts the linear model fit result, and the narrow lines show the 95% confidence interval bounds, shaded in grey.
Figure 5. Violin plots comparing the baseline age, age at onset, disease duration, and expanded allele size of the SCA3/MJD subjects based on their expanded allele rs7158733 SNP variant. (A) Participants with the A1118 SNP expanded allele (n = 57) were significantly younger at baseline compared to those with the C1118 SNP expanded allele (n = 15) (difference in medians = 6; Mann–Whitney’s test, p = 0.029). (B) Participants with the A1118 SNP expanded allele (n = 56) had a significantly earlier age at disease onset compared to those with the C1118 SNP expanded allele (n = 15) (difference in medians = 14; Mann–Whitney’s test, p = 0.017). (C) There was no significant difference in the disease duration between participants with the A1118 SNP expanded allele (n = 56) and the C1118 SNP expanded allele (n = 15) (difference in medians = −2; Mann–Whitney’s test, p = 0.583). (D) There was no significant difference in the size of the expanded allele between participants with the A1118 SNP expanded allele (n = 51) and the C1118 SNP expanded allele (n = 20) (difference in medians = −2; Mann–Whitney’s test, p = 0.294). * p ≤ 0.05.
Figure 6. Scatter-plot showing the correlation between SCA3 pathogenic allele size and age at disease onset, stratified based on expanded allele rs7158733 SNP variant. Pathogenic allele sizes were determined from fragment sizing of rs7158733 SNP allele-specific PCR products, as detailed in the Materials and Methods. Age at onset data were available for 48 out of 59 subjects with an A1118 SNP in cis with their expanded allele and 15 out of 20 subjects with a C1118 SNP in cis with their expanded allele. The blue (A1118) and red (C1118) lines depict the linear model fit result for each genotype group, and the narrow lines and shaded areas show the corresponding 95% confidence interval bounds. Both SNP groups had a negative correlation between their ages at onset and pathogenic allele sizes (A1118 group Pearson r = −0.320; C1118 group Pearson r = −0.862).
Table 1. Summary of clone sequencing data.
Criteria | Sequence and Information
Most frequent non-expanded allele | (CAG)2(CAA)(AAG)(CAG)(CAA)(CAG)17; 23 repeats (7.5% of all clones; 10 participants)
Most frequent expanded allele | (CAG)2(CAA)(AAG)(CAG)(CAA)(CAG)68; 74 repeats (5.5% of all clones; 12 participants)
Most frequent loss of canonical interruption allele | (CAG)2(CAA)(AAG)(CAG)58; 62 repeats (0.7% of all clones; 1 participant, #17)
Table 2. Frequency of the different variants of the rs7158733 (A1118/C1118) SNP in expanded and non-expanded alleles. Percentages refer to the total number of pairs of chromosomes.
Expanded allele | Non-expanded allele A1118 | Non-expanded allele C1118 | Total
A1118 | 13 | 46 | 59 (74.7%)
C1118 | 13 | 7 | 20 (25.3%)
Total | 26 (32.9%) | 53 (67.1%) | 79 (100%)
Table 3. Demographic, genetic and disease characteristics of the SCA3/MJD subjects based on their expanded allele rs7158733 SNP variant. CAG repeat lengths were determined from fragment sizing of rs7158733 SNP allele-specific PCR products, as detailed in the Materials and Methods. Statistically significant differences are highlighted in bold.
Characteristic | A1118 | C1118 | Test result
Female (n = 79; %) | 34 (57.6) | 12 (60.0) | χ² = 0.03; df = 1; p = 0.852 a
Baseline age [n = 72; years, median (Q1, Q3)] | 51.0 (40.0, 58.5) | 57.0 (49.0, 65.0) | p = 0.029 b
CAG repeats, expanded allele [n = 71; median (Q1, Q3)] | 68.0 (63.0, 70.0) | 66.0 (61.0, 69.0) | p = 0.294 b
CAG repeats, non-expanded allele [n = 77; median (Q1, Q3)] | 22.0 (18.0, 26.0) | 22.0 (20.5, 29.5) | p = 0.339 b
Disease duration [n = 71; years, median (Q1, Q3)] | 11.0 (6.0, 16.75) | 9.0 (6.0, 14.0) | p = 0.583 b
Age at onset [n = 71; years, median (Q1, Q3)] | 37.0 (29.5, 46.0) | 51.0 (38.0, 53.0) | p = 0.017 b
a Pearson’s χ² test; b Mann–Whitney’s test.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nethisinghe, S.; Garcia-Moreno, H.; Alwan, J.; Labrum, R.; Giunti, P. Role of Repeat Tract Structure and the rs7158733 SNP in Spinocerebellar Ataxia 3. Int. J. Mol. Sci. 2025, 26, 9836. https://doi.org/10.3390/ijms26209836

