Molecular Characterization of Portuguese Patients with Hereditary Cerebellar Ataxia

Hereditary cerebellar ataxia (HCA) comprises a clinically and genetically heterogeneous group of neurodegenerative disorders characterized by incoordination of movement and speech, and unsteady gait. In this study, we performed whole-exome sequencing (WES) in 19 families with HCA and presumed autosomal recessive (AR) inheritance, to identify the causal genes. A phenotypic classification was performed, considering the main clinical syndromes: spastic ataxia, ataxia and neuropathy, ataxia and oculomotor apraxia (AOA), ataxia and dystonia, and ataxia with cognitive impairment. The most frequent causal genes were associated with spastic ataxia (SACS and KIF1C) and with ataxia and neuropathy or AOA (PNKP). We also identified three families with autosomal dominant (AD) forms arising from de novo variants in KIF1A, CACNA1A, or ATP1A3, reinforcing the importance of differential diagnosis (AR vs. AD forms) in families with only one affected member. Moreover, 10 novel causal variants were identified, and the detrimental effect of two splice-site variants was confirmed through functional assays. Finally, by reviewing the molecular mechanisms, we speculated that regulation of cytoskeleton function might be impaired in spastic ataxia, whereas DNA repair is clearly associated with AOA. In conclusion, our study provided a genetic diagnosis for HCA families and proposed common molecular pathways underlying cerebellar neurodegeneration.

Relevant variants identified by WES were confirmed by Sanger sequencing, which also allowed verifying intrafamilial segregation in several families. For PCR amplifications, we used Ranger Mix (Bioline, London, UK), purified products with Exo/SAP (GRiSP, Porto, Portugal), and performed Sanger sequencing using Big Dye Terminator Cycle Sequencing v1.1 (Applied Biosystems, Foster City, CA, USA) in an ABI 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). Sequencing analysis was carried out using the SeqScape v2.6 software (Applied Biosystems, Foster City, CA, USA).

Minigene Assays
The SPG11 and KIF1C minigene constructs were obtained by cloning the exonic and intronic sequences flanking the variants of interest (c.3039-5T > G in SPG11 and c.1166-2A > G in KIF1C) into the pCMVdi vector, kindly provided by Dr Alexandra Moreira [32]. Briefly, SPG11 exons 16, 17, and 18 with intronic regions or KIF1C exons 13, 14, and 15 with intronic regions were PCR amplified from patients' genomic DNA and cloned into the pCMVdi vector, using the Gibson assembly method. Sequences were modified by site-directed mutagenesis to generate the wild-type constructs, using a QuikChange II Kit (Agilent, Santa Clara, CA, USA), according to the manufacturer's protocol.
HEK293T cells were transfected with the minigene constructs using jetPRIME (Polyplustransfection, Illkirch, France), according to the manufacturer's protocol. RNA was extracted 48 h after transfection, using NZYol (Nzytech, Lisbon, Portugal), as per manufacturer's recommendations, followed by purification of the RNA aqueous phase, using an RNeasy mini kit (Qiagen, Hilden, Germany). RNA quantification was performed on NanoDrop 2000 (ThermoFisher Scientific, Waltham, MA, USA). cDNA was synthesized by reverse transcription-PCR of 2 µg total RNA with oligo (dT), using a SuperScript III first-strand synthesis system (Invitrogen, Carlsbad, CA, USA), according to the manufacturer's protocol. The resulting cDNA was amplified by PCR and loaded on agarose gel for extraction with a Zymoclean Gel DNA Recovery Kit (Zymo Research, Irvine, CA, USA). The resulting products were sequenced by Sanger sequencing using Big Dye Terminator Cycle Sequencing v1.1 (Applied Biosystems, Foster City, CA, USA) in an ABI 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA, USA).

Genetic and Molecular Characterization
We used WES to identify variants and genes causing HCA in 19 Portuguese families with family history compatible with AR transmission (including 9 with known consanguinity), in a total of 30 individuals: 19 index cases, one affected relative, and 10 non-affected relatives. For seven index cases, at least one relative was available. All relevant variants (rare, pathogenic, likely pathogenic, and/or predicted to be deleterious) were confirmed by Sanger sequencing, and segregation analysis was performed with all available family members. Phenotypic classification considered the main clinical symptom associated with ataxia: spasticity, neuropathy, oculomotor apraxia, dystonia, or cognitive impairment.
Variants were classified as pathogenic when they had been previously reported as disease-causing (14/24), or when they caused an early stop codon (nonsense and frameshift; 8/24) in genes where loss-of-function is a known disease mechanism (Table S2, supporting information). In addition, two variants were predicted to affect splicing by bioinformatics analysis (Table S3, supporting information): c.3039-5T > G in SPG11 and c.1166-2A > G in KIF1C, both in families with spastic ataxia. Their detrimental effect on splicing was confirmed by minigene splicing assays, which indicated that both are expected to result in frameshifts (Figure 1; Figure S1, supporting information).
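The two classification criteria stated above can be written as a small decision rule. The following is a minimal sketch only: the `Variant` record and its field names are hypothetical, the example entries are placeholders rather than variants from this cohort, and the rule covers just the two criteria named in the text, not a full variant-interpretation framework.

```python
from dataclasses import dataclass

# Hypothetical record type; field names are illustrative and not taken from the study.
@dataclass
class Variant:
    gene: str
    consequence: str               # e.g. "nonsense", "frameshift", "missense", "splice"
    previously_reported: bool      # already described as disease-causing
    lof_is_known_mechanism: bool   # loss-of-function is an established mechanism for the gene

TRUNCATING = {"nonsense", "frameshift"}

def is_pathogenic_by_stated_rules(v: Variant) -> bool:
    """Apply only the two criteria described in the text."""
    if v.previously_reported:
        return True
    return v.consequence in TRUNCATING and v.lof_is_known_mechanism

# Placeholder examples (not real cohort data).
examples = [
    Variant("GENE_A", "missense", True, False),
    Variant("GENE_B", "frameshift", False, True),
    Variant("GENE_C", "missense", False, True),
]
results = [is_pathogenic_by_stated_rules(v) for v in examples]
```

Variants that fail both criteria, such as novel missense or splice-site changes, are not thereby benign; they require the additional evidence described in the surrounding text.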
We also identified two novel missense variants (Figure 2 and Table S3, supporting information), both in heterozygosity, classified as likely pathogenic (2/24): one in CACNA1A (c.4996C > G; p.Arg1666Gly; family AR267), at an amino-acid residue where a different missense change has been determined to be pathogenic, located in a well-established functional (transmembrane) domain (Table S4, supporting information) and not found in population databases; and one in ATP1A3 (c.374T > A; p.Val125Glu; family AR278), a gene with a low rate of benign missense variation, not reported in population databases, located in a mutation hotspot within a transmembrane domain (Table S4, supporting information) and predicted to be deleterious. Moreover, the WT residues of human CACNA1A (Arg1666) and ATP1A3 (Val125) are conserved across species, while the mutated residues are predicted to change their interactions, probably affecting protein structure (Figure 2). Despite the apparent AR inheritance, the variant in ATP1A3 was confirmed to occur de novo (absent in the unaffected parents, family AR278). Unfortunately, we did not have parental samples to test for the CACNA1A variant (family AR267), but this variant probably occurred de novo, as de novo variants are a recurrent cause of CACNA1A-related ataxia [33]. In addition, we identified one de novo variant in KIF1A (c.761G > A, described as pathogenic) in a patient with spastic ataxia, and confirmed that the unaffected parents and sibling did not carry it (family AR49).
Thus, we reported 10 novel disease-associated variants in nine families (Table S1, supporting information). All variants were absent from population databases or were present only in heterozygosity with very low MAF.
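The coding-DNA and protein positions quoted for the missense variants can be cross-checked with simple arithmetic: a coding position p falls in codon ceil(p/3). The helper below is a sketch with a hypothetical function name; it assumes 1-based positions counted from the A of the start codon.

```python
def codon_position(cdna_position: int) -> tuple[int, int]:
    """Map a 1-based coding-DNA position to (codon number, position within codon)."""
    if cdna_position < 1:
        raise ValueError("coding positions are 1-based")
    codon = (cdna_position + 2) // 3          # integer form of ceil(p / 3)
    within = (cdna_position - 1) % 3 + 1      # 1, 2, or 3
    return codon, within

print(codon_position(4996))  # (1666, 1): matches p.Arg1666Gly in CACNA1A
print(codon_position(374))   # (125, 2): matches p.Val125Glu in ATP1A3
```

The within-codon positions agree with the reported amino-acid changes under the standard genetic code: a C > G change at the first base can turn an Arg codon (CGN) into Gly (GGN), and a T > A change at the second base can turn a Val codon (GTN) into Glu (GAA/GAG).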

Clinical Characterization
Beyond the 20 patients tested, we had clinical information (but no DNA sample) from nine additional affected relatives of probands (detailed data in Table 1).
Figure 2 (partial caption): [...] with cerebellar ataxia. (B) Sequence alignment of the residues surrounding the mutated residues in CACNA1A and ATP1A3 against other species was performed using the Clustal Omega program. Protein models of CACNA1A and ATP1A3 showing altered residue interactions upon amino acid changes (performed using DynaMut2). The Arg1666 and Val125 residues in CACNA1A and ATP1A3, respectively, are in the center, displaying the various interactions with nearby residues in different colors.

Ataxia and Neuropathy
This group included four families (10 patients). Onset was in the first decade in AR49 (KIF1A) and AR92 (PNKP), second decade in AR4 (SETX), and third decade in AR126 (POLG). Ataxia was the presenting feature in all families, with neuropathy developing later. It is noteworthy that in AR4 (SETX) and AR92 (PNKP) none of the patients had oculomotor apraxia. All patients from AR4 presented diplopia and nystagmus, but none from AR92 displayed such features. The patient from AR49 (KIF1A) had an extensor plantar response in the absence of spasticity. AR126 had a POLG-related ataxia, with the typical optic atrophy, vertical gaze palsy, and epilepsy. One of the members from AR92 died at age 35, with 29 years of disease duration. Imaging was only available in two families (AR4, AR49), both displaying cerebellar atrophy.

Ataxia and Oculomotor Apraxia
Both families (4 patients) had onset in the first decade of life, and disease-causing variants in PNKP. Ataxia was the first symptom in all patients. Oculomotor apraxia, neuropathy, and obesity were present in all individuals, and dystonia was highly frequent. Cerebellar atrophy was identified on magnetic resonance imaging (MRI); when tested, total proteins/albumin level was decreased, and cholesterol increased. In AR117, death occurred between 37 and 55 years of age, with a mean disease duration of 37 years.

Ataxia and Dystonia/Ataxia and Cognitive Impairment
AR2 (HEXB) and AR278 (ATP1A3), with one patient each, presented ataxia and dystonia. AR2 (HEXB) had adult-onset ataxia, very prominent oromandibular dystonia and muscle wasting, while AR278 (ATP1A3) had onset at age 11 years, with paroxysmal lower-limb dystonia induced by walking, and progressive ataxia four years later. Over the disease course, dystonia became permanent and generalized. Both AR2 and AR278 had cerebellar atrophy on MRI.
AR16 (FA2H) was the only family with ataxia and cognitive impairment as the major phenotype, in a patient with delayed motor and cognitive milestones, who later developed ataxia and seizures. Cortical and cerebellar atrophy was present on MRI, and increased latency was identified in visual evoked potentials.

Discussion
In this study, we performed WES on 30 individuals from 19 families with (apparently) recessive cerebellar ataxia, aiming at providing a conclusive genetic diagnosis for them. These families had been identified during our national population survey in Portugal [10], but remained for many years without a molecular diagnosis, after testing the most common genes. More recently, new causal-genes were identified in undiagnosed families, including PNKP in eight families with AOA4 [11], MAG in one family with AOA and neuropathy [12], and DAB1 in three AD families with SCA37 [34].
In this report, pathogenic and likely-pathogenic variants in several genes were identified and clearly associated with HCA (Table 1 and Table S1, supporting information). Division of the cohort into phenotypic subgroups had been performed during the original survey, and we retained this classification for a better genotype-phenotype characterization, and for clinician guidance in everyday practice. All the causal genes now identified had previously been described within their respective phenotypic subgroup [35][36][37][38]. The most frequent were SACS, KIF1C, and PNKP. Of note, Friedreich ataxia, AOA (AOA1 and 2) and L-2-hydroxyglutaric aciduria had been previously screened and found to be the most prevalent types of AR-HCA [10]; KIF1C and PNKP were described only a few years later to be the causative genes for SPAX2 and AOA4 [11,39]. Present data reinforce PNKP-related ataxia as one of the most prevalent AR-HCA in Portugal [11]. In our cohort, variants in both PNKP and SETX were always associated with neuropathy, but not always with oculomotor apraxia, which is why AR4 had not previously been tested for SETX. This presentation has been described in the meantime [40]. Oculomotor apraxia, thus, should now be regarded as a clinical sign with high specificity, but lower sensitivity, for AOA2 and AOA4. Additionally, we identified four novel disease variants in SACS and two in KIF1C, broadening the genetic spectrum of ARSACS and SPAX2.
We also provided functional data to validate the detrimental effect of two new splice-site variants (Figure 1) that would otherwise have been classified as variants of unknown clinical significance. Using minigene assays, we inferred that the SPG11 c.3039-5T > G variant creates a new 3′ acceptor splice site, while KIF1C c.1166-2A > G abolishes the canonical 3′ acceptor splice site. Both were predicted to modify the open reading frame, resulting in a premature stop codon.
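Whether an altered acceptor site modifies the open reading frame depends on the net number of nucleotides gained or lost in the mature transcript: the frame is preserved only when that number is a multiple of three. The sketch below illustrates this rule; the function name and the example sizes are hypothetical and are not the sizes observed in the minigene assays.

```python
def shifts_reading_frame(net_nucleotide_change: int) -> bool:
    """True if inserting (+) or removing (-) this many nucleotides changes the frame."""
    return net_nucleotide_change % 3 != 0

# Hypothetical scenarios, for illustration only.
scenarios = {
    "retention of 4 intronic nucleotides": +4,
    "loss of 7 exonic nucleotides": -7,
    "skipping of a 120-nucleotide exon": -120,
}
for label, change in scenarios.items():
    outcome = "frameshift" if shifts_reading_frame(change) else "in-frame"
    print(f"{label}: {outcome}")
```

An in-frame change can still be deleterious, for example by removing a functional domain, so this check addresses only the reading frame.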

Heterozygous De Novo Variants
All families analyzed had been classified as AR-HCA, based either on the presence of affected relatives in the same generation with unaffected parents; absence of other affected relatives; or consanguinity. Nevertheless, de novo variants in heterozygosity were identified (or presumed) in three families with only one affected member (Figure 2), highlighting that various modes of inheritance should be considered when analyzing sequencing data in such cases. We identified one previously reported de novo missense variant in KIF1A [41], and two novel variants, in CACNA1A and ATP1A3, that were classified as likely pathogenic. KIF1A has been associated with AR and de novo AD diseases [42]. Patients with de novo KIF1A-related variants have shown a complex phenotype, including developmental delay, ataxia, spastic paraplegia, and neuropathy, with onset in the first year of life [41,43,44]. Our patient from family AR49 had had cerebellar ataxia since age 5 years, no spasticity, but an extensor plantar response, and neuropathy. Age of onset and combination of neurological syndromes were similar to those observed in Friedreich ataxia, but this was not the case in other patients previously described [43,44]. Nevertheless, our patient raises the hypothesis that heterozygous variants in KIF1A should be considered in 'Friedreich ataxia-like' patients.
In contrast to KIF1A, CACNA1A and ATP1A3 are known to cause only autosomal dominant disorders [45][46][47][48]. CAG repeat expansions in the last exon of the longest isoform of CACNA1A cause SCA6, and missense mutations affecting the calcium channel-encoding sequence cause progressive cerebellar ataxia, familial hemiplegic migraine (FHM1), and episodic ataxia type 2 (EA2) [49]. A congenital form of cerebellar ataxia with cognitive impairment has also been recently described [50]. A spastic ataxic phenotype, as observed in family AR267, has only been described once [35], probably being a less frequent presentation in CACNA1A-related disorders. Interestingly, CACNA1A variants were one of the most prevalent causes of AD-HCA, after SCA3 and dentatorubral-pallidoluysian atrophy (DRPLA), in our population-based survey [10]. Variants in ATP1A3 produce a wide spectrum of AD neurological and psychiatric disorders, ranging from infantile to adult onset. Cerebellar ataxia is classically present in CAPOS syndrome (cerebellar ataxia, areflexia, pes cavus, optic atrophy, and sensorineural hearing loss), but atypical phenotypes with ataxia (either relapsing, of acute onset or slowly progressive) have been increasingly recognized [51]. The proband of AR278 had paroxysmal dystonia induced by walking, resembling paroxysmal exercise-induced dyskinesia (PED), and, later, slowly progressive ataxia and generalized dystonia. A phenotype similar to PED has been reported in a family with four affected individuals, two of whom also display cerebellar ataxia [52].
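The distinction between a confirmed and a presumed de novo variant comes down to whether both parental genotypes are available. The following is a minimal sketch of that logic, assuming a simple genotype encoding ('het', 'hom', 'ref', or None for a missing sample); the function name and encoding are hypothetical, and confirmed parentage is assumed rather than tested.

```python
from typing import Optional

def variant_origin(proband: str, mother: Optional[str], father: Optional[str]) -> str:
    """Classify the origin of a variant from trio genotypes.

    Genotypes are 'het', 'hom', or 'ref'; None means no sample was available.
    """
    if proband == "ref":
        return "not carried by proband"
    if mother is None or father is None:
        return "undetermined (parental sample missing)"
    if mother == "ref" and father == "ref":
        return "de novo (assuming confirmed parentage)"
    return "inherited"

print(variant_origin("het", "ref", "ref"))  # both parents tested and negative
print(variant_origin("het", None, None))    # no parental samples available
```

Under this scheme, a heterozygous variant absent from both tested parents is labeled de novo, whereas a variant without parental samples remains undetermined and can only be presumed de novo from other evidence.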

Molecular Mechanisms
HCA can be caused by loss-of-function, gain-of-function, or a dominant-negative effect, in a multitude of apparently unrelated genes. Generally, AR-HCA is associated with loss-of-function variants, whereas AD-HCA can be caused by a combination of gain and/or loss of function [7]. Our study highlights the diversity of cerebellar ataxia-related genes and phenotypes, possibly reflecting different disease mechanisms (Table S2, supporting information). A pivotal question that remains unanswered is why the Purkinje cells are particularly affected [6,7]. Most genes/proteins identified in our study do not show elevated cerebellar expression, compared to other brain regions (Table S5, supporting information), as per RNA and protein data from the Human Protein Atlas, and in agreement with current literature [7]. We reviewed the molecular mechanisms underlying the main clinical phenotypes (Table S2, supporting information) and discuss potential common disease pathways in AR-HCA below.

Spastic Ataxia
Spasticity and pyramidal signs are hallmarks of several cerebellar ataxias [53,54]. As demonstrated by the diversity of genes causing spastic ataxia in our study, there is no single molecular pathway underlying this phenotype. Nevertheless, we identified one molecular mechanism that may be shared between a set of genes: SACS, KIF1C, SPG11, and SYNE1 interact with the cytoskeleton and may function to ensure proper vesicle/organelle trafficking and/or neuronal structure.
SACS encodes the large protein sacsin, the function of which has not yet been clearly established. The presence of both a ubiquitin-like (UBL) domain and a DnaJ domain (implicated in the chaperone-mediated folding process), and a potential role in the degradation of aberrant ataxin-1, suggest that sacsin may integrate the ubiquitin-proteasome system and Hsp70 chaperone function [55]. Moreover, studies in SACS knockout mice revealed that sacsin regulates the neurofilament cytoskeleton and mitochondria dynamics [56,57]. Most SACS variants lead to the complete loss of sacsin, although some missense variants are associated with low levels of the protein [58,59]. Since most variants are upstream of the C-terminal DnaJ domain, it is speculated that ARSACS is associated with loss of chaperone function [60]. Concordantly, we identified known and novel variants in an unspecific region or within the sacsin-repeating region with homology with Hsp90 (Table S4, supporting information); all upstream of the DnaJ domain.
Kinesin family member 1C (KIF1C) is a ubiquitously expressed motor protein involved in both anterograde transport of vesicles and retrograde transport from Golgi to the ER [61]. Few reports have analyzed the impact of KIF1C variants, but it was suggested that variants in the motor domain may affect microtubule binding and impair ATP hydrolysis, while truncating variants lead to reduced protein levels and may impair cargo binding [62]. Therefore, truncating variants, such as the ones identified in our study, are probably associated with a loss-of-function mechanism.
SPG11 encodes spatacsin, a large protein involved with neuronal axonal growth, intracellular cargo trafficking, and lysosome function. SPG11 variants span the entire spatacsin protein [63]. Functional studies of truncating variants indicated that spatacsin loss-of-function is at the basis of neurodegeneration, as evidenced by reduced expression of spatacsin [64]; reduction in the anterograde vesicle trafficking, indicative of impaired axonal transport [65]; reduction in axonal complexity and neurite outgrowth [65]; or the presence of abnormal lysosomes [64]. Given the nature of the variants we identified, it is plausible that they also cause loss of spatacsin function.
SYNE1 is a large gene that encodes the synaptic nuclear envelope protein 1 (SYNE1), also known as nuclear envelope spectrin 1 (nesprin 1). This protein mediates the link from the nuclear envelope to the actin cytoskeleton (LINC complex), being important for nuclear migration [66,67]. Most SYNE1 variants associated with ARCA1 are nonsense or frameshift, and span the giant nesprin-1 isoform, but exclude the C-terminal Klarsicht-ANC-Syne-homology (KASH) domain [68]. Thus, ARCA1 may be caused by defective synaptic transmission through impaired synaptic vesicle trafficking and/or defects in cerebellar dendrites; however, this remains to be confirmed [69]. Nevertheless, many missense variants were also identified in ARCA1, including one identified by us, within a spectrin repeat domain.

Ataxia and Neuropathy
Peripheral neuropathy is a common feature associated with HCAs; but, as far as we know, this combination has not been linked with specific disease pathways. In this study, we found three genes (POLG, SETX, and PNKP) with various molecular functions, causing recessive ataxia and neuropathy.
POLG encodes the catalytic subunit of the mitochondrial DNA polymerase gamma protein (POLG), which functions in the replication and repair of mtDNA [72]. Functional studies showed that missense variants cause decreased activity, DNA binding and processivity of the polymerase, and reduction of functional mtDNA [73,74]. The POLG variant identified in our study (p.Trp748Ser) is within the C-terminal domain of the polymerase, an important domain for its activity [75]. We can, therefore, speculate that it causes reduction or loss of POLG function.
SETX and PNKP are discussed in the following sub-section.

Ataxia and Oculomotor Apraxia
AOA has been associated with the DNA damage repair pathway. Ataxia telangiectasia, AOA1, AOA2, AOA4, XRCC1-AOA, and spinocerebellar ataxia with axonal neuropathy 1 are all caused by genes with a role in DNA repair [9]. We identified variants in SETX and PNKP, causing AOA2 and AOA4, respectively. Interestingly, in one family with pathogenic variants in SETX and one in PNKP, oculomotor apraxia was not present. Both had been phenotypically classified as 'ataxia with neuropathy', due to severe neuropathy. This possible occurrence has been previously described and is reinforced in this study.
SETX encodes a large nuclear protein termed senataxin, with an RNA/DNA helicase domain. Senataxin is involved in RNA metabolism, DNA maintenance, and damage response. Several types of biallelic variants in SETX were identified in AOA2 [76][77][78], probably causing a loss-of-function phenotype. In addition, dominant missense variants in SETX cause amyotrophic lateral sclerosis (ALS) with juvenile onset (ALS4) [79], suggesting that gain-of-function leads to ALS4. Altered gene expression and mRNA processing, and increased susceptibility to oxidative DNA damage, have all been associated with SETX variants [79][80][81][82]. There have been no functional studies of the SETX p.Gly2047Cys variant [83], but as it lies within the DNA/RNA helicase domain, it may cause a reduction or loss of senataxin function.
Variants in PNKP, encoding the nuclear polynucleotide kinase 3′-phosphatase protein (PNKP), result in a range of AR disorders. The enzymatic activity of PNKP is involved in the repair of both DNA double-strand breaks (DSB) and single-strand breaks (SSB); impaired function can cause neurodevelopmental dysfunction (microcephaly, seizures, and developmental delay (MCSZ)) [84] and neurodegeneration (AOA4 and Charcot-Marie-Tooth disease (CMTD)) [11,36]. Few functional studies have described the effects of PNKP variants. Nevertheless, most variants seem to cause reduced stability and levels of PNKP, and reduced DNA phosphatase activity in MCSZ or reduced kinase activity in neurodegenerative diseases. These ultimately lead to reduced DNA repair [85,86]. Moreover, several types of variants were identified in PNKP; most are in the kinase domain, particularly in neurodegenerative diseases [11,87], as shown in this study.

Ataxia and Cognitive Impairment or Dystonia
Additionally, other neurological signs may be present in AR-HCA, such as dystonia, and cognitive impairment. In our study, we identified a known variant in HEXB in one family with Sandhoff disease, and a new variant in FA2H causing ataxia with cognitive impairment.
Sandhoff disease is caused by variants in HEXB encoding the β subunit of the enzyme hexosaminidase (HEXB) [88]. HEXB is a lysosomal hydrolase that catalyzes the degradation of GM2 ganglioside. The spectrum of HEXB variants is wide, but most reduce GM2 hydrolysis, resulting in the accumulation of GM2 ganglioside [89,90]. Moreover, it was reported that several variants, including missense, lead to reduced mRNA expression and affect HEXB protein structure [91][92][93]. Particularly, the p.Arg505Gln variant, located in the catalytic domain, was reported to disturb the biochemical properties and cause loss of activity of HEXB and accumulation of GM2 ganglioside [90,94].
FA2H encodes the fatty acid 2-hydroxylase (FA2H) protein, which is involved in the synthesis of 2-hydroxy fatty acid galactolipids, the most abundant lipids in the myelin sheath [95]. Pathogenic variants in FA2H are speculated to cause neurodegeneration through a loss-of-function mechanism, since there is evidence of decreased hydroxylation of myelin lipids [96]. Moreover, a missense variant causing neurodegeneration with brain iron accumulation, led to reduced FA2H protein expression [97]. Since the FA2H variant we identified is predicted to cause a frameshift, it is also possible it leads to loss-of-function.

Conclusions
Our study provided the molecular diagnosis of 19 families with various types of HCA through WES, expanding the genetic and phenotypic spectrum of HCA. Most of the mutated genes caused a spastic ataxia phenotype, but PNKP, associated with either ataxia and neuropathy or AOA, was also amongst the most frequent causal genes. We highlight three families with de novo variants in KIF1A, CACNA1A, and ATP1A3, despite having been initially classified as AR. These results demonstrate the importance of performing a differential diagnosis (AR vs. AD forms) in the absence of affected relatives of the proband. All the remaining families were associated with AR-HCA related genes. In addition, we provided a review on potential common mechanisms underlying neurodegeneration in AR-HCA, namely cytoskeleton function in spastic ataxia, and DNA repair in AOA. Translation of genetic findings into a better understanding of HCA mechanisms may help the development of effective therapies [6,7].