A Systematic Review of Genotype–Phenotype Correlation across Cohorts Having Causal Mutations of Different Genes in ALS

Amyotrophic lateral sclerosis is a rare and fatal neurodegenerative disease characterised by progressive deterioration of upper and lower motor neurons that eventually culminates in severe muscle atrophy, respiratory failure and death. There is a concerning lack of understanding regarding the mechanisms that lead to the onset of ALS and as a result there are no reliable biomarkers that aid in the early detection of the disease nor is there an effective treatment. This review first considers the clinical phenotypes associated with ALS, and discusses the broad categorisation of ALS and ALS-mimic diseases into upper and lower motor neuron diseases, before focusing on the genetic aetiology of ALS and considering the potential relationship of mutations of different genes to variations in phenotype. For this purpose, a systematic review is conducted collating data from 107 original published clinical studies on monogenic forms of the disease, surveying the age and site of onset, disease duration and motor neuron involvement. The collected data highlight the complexity of the disease’s genotype–phenotype relationship, and thus the need for a nuanced approach to the development of clinical assays and therapeutics.


Introduction
Amyotrophic lateral sclerosis, or ALS, is characterised by a progressive and fatal degeneration of upper and/or lower motor neurons (UMN and LMN, respectively) resulting in muscle weakness and wasting. Classical ALS is the most common form of motor neuron disease (MND) [1] and is defined by the selective deterioration of both UMN and LMN [2]. The global incidence of ALS varies between 1 and 2.6 cases per 100,000 people per year [3], with the average age of onset ranging from 54 to 67 years old [4]. The prevalence of ALS increases with age, reaching 1/5000 among people aged 70-79 years old [5]. Consequently, as the population ages, it is expected that the world's total number of cases will reach more than 375,000 by 2040 [6]. Owing to the lack of a reliable diagnostic test, absence of validated biomarkers, and phenotypes that are easily confounded with other MNDs, including primary lateral sclerosis (PLS) and progressive muscular atrophy (PMA), there is a delay of approximately 11-12 months in reaching a definite diagnosis [7]. Currently, diagnosis is based on a set of clinical criteria.
Multiple features have been associated with a poor prognosis. Elderly onset is associated with a rapid progression of symptoms, especially among elderly females presenting with a bulbar-onset phenotype [12]. Disease progression can be assessed either by diagnostic delay or by the ALS functional rating scale (ALSFRS). Poor prognosis is associated with a diagnosis given less than 8 months after symptom onset, or with a loss of more than 1.4 points/month on the ALSFRS [13].
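As a purely illustrative sketch of the two prognostic thresholds quoted above [13], the following Python snippet computes an average ALSFRS decline between two assessments and flags either marker. The function names and the example scores are hypothetical and are not taken from any study reviewed here; this is arithmetic on the stated thresholds, not a clinical tool.

```python
def alsfrs_decline_per_month(score_start: float, score_end: float, months: float) -> float:
    """Average number of ALSFRS points lost per month between two assessments."""
    if months <= 0:
        raise ValueError("months must be positive")
    return (score_start - score_end) / months


def has_poor_prognosis_marker(diagnostic_delay_months: float, decline_per_month: float) -> bool:
    """True if either threshold quoted in the text [13] is met.

    Thresholds: diagnosis given less than 8 months after symptom onset,
    or more than 1.4 ALSFRS points lost per month.
    """
    return diagnostic_delay_months < 8 or decline_per_month > 1.4


# Hypothetical patient: 9 points lost over 6 months, diagnosed 10 months after onset
rate = alsfrs_decline_per_month(38, 29, 6)
print(rate, has_poor_prognosis_marker(10, rate))
```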
In the past 30 years, there have been a large number of studies investigating the genetic underpinnings of ALS. To date, over 30 genes have been related to the disease; yet it is important to note that mutations in these genes explain only ~20% of total ALS cases [14] whilst the majority of cases remain unexplained and present no family history. ALS is therefore considered to be a mainly sporadic disease (sALS), with ~80% of cases having no known genetic basis [3], although twin studies have estimated heritability at 40-45% [15] or 61% [16]. Known gene mutations explain some 70% of familial cases (fALS) [17,18], and they have also been identified in 10% of sporadic cases [18]. In European cohorts, the hexanucleotide repeat expansion in the C9orf72 gene is the most common genetic cause of fALS (33.7%) and sALS (5.1%), followed by SOD1 (14.8% in fALS and 1.2% in sALS cases), TARDBP/TDP-43 (4.2% in fALS and 0.8% in sALS), and FUS (2.8% in fALS and 0.3% in sALS) [19].
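To put the European figures from [19] side by side, the short sketch below sums the reported percentages for the four most common genes. It assumes that each patient carries at most one of these mutations, which the source does not state, so the totals are indicative only.

```python
# Percentage of European fALS and sALS cases attributed to each gene, as quoted from [19]
gene_share = {
    "C9orf72": {"fALS": 33.7, "sALS": 5.1},
    "SOD1": {"fALS": 14.8, "sALS": 1.2},
    "TARDBP": {"fALS": 4.2, "sALS": 0.8},
    "FUS": {"fALS": 2.8, "sALS": 0.3},
}


def combined_share(form: str) -> float:
    """Sum of the four gene percentages for 'fALS' or 'sALS'.

    Assumption (not from the source): the four groups do not overlap.
    """
    return round(sum(shares[form] for shares in gene_share.values()), 1)


print(combined_share("fALS"), combined_share("sALS"))
```

Under that assumption the four genes together account for roughly 55.5% of familial and 7.4% of sporadic cases, consistent with the statement that known mutations explain some 70% of fALS once rarer genes are included.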
To understand the molecular mechanisms underlying ALS, it is useful to study genotype-phenotype relationships, to determine whether certain gene mutations are associated with specific clinical features or outcomes. Genotype-phenotype relationships have previously been examined for certain gene mutations, and several informatics resources exist to collect genotype-phenotype data [20][21][22][23][24], but a systematic understanding across different gene mutations has not been established. As a step towards this, the present review gathers together the clinical summary statistics from previously studied cohorts.

Site of Onset Variability
The majority of ALS cases (~70%) have spinal onset, usually presenting with focal limb weakness [30] such as foot drop or a weak hand [7]. The disease then tends to spread in a contiguous manner, initiating at distinct focal regions of the body and then propagating from the primarily affected area to adjacent secondary sites [31].
In 25% of ALS cases, symptoms develop initially in the bulbar-innervated muscles [30,32]. Bulbar-onset ALS is more common in women [7], especially after 70 years (M:F ratio 1:1.6 [12]).
Cognitive-onset ALS patients usually present symptoms characteristic of frontotemporal dementia (FTD), such as changes in behaviour, personality and cognition, all of which are suggestive of frontal impairments [35].
In summary, initial site of symptom onset varies among ALS patients from classic limb-onset to rare cognitive-onset phenotypes, and a poor prognosis is often associated with bulbar and respiratory onset [25].

Motor Neuron Involvement in ALS Variants
ALS patients can present with either an LMN- or UMN-predominant phenotype (Figure 2). Signs of pure LMN dysfunction are considered as progressive muscular atrophy (PMA), whereas predominant UMN signs are associated with primary lateral sclerosis (PLS) [30]. PMA and PLS are both rare diseases and represent 5% of MND patients [27].

UMN-Dominant ALS Variants
Patients can present with predominant UMN dysfunction, as in primary lateral sclerosis (PLS) or pseudobulbar palsy. The UMN-predominant phenotype can then progress to ALS, which is observed in 40% of PLS cases [36]. Patients diagnosed with PLS because they do not meet the diagnostic criteria for ALS can still slowly develop signs of LMN dysfunction and therefore present both UMN and LMN signs [27]. However, LMN involvement and limb atrophy in PLS are exceptionally rare [37], and the prognosis for PLS patients is better than that for patients diagnosed with ALS, as symptom progression is relatively slow.

LMN-Dominant ALS Variants
Conversely, some patients develop an LMN-dominant phenotype, which can be defined as progressive muscular atrophy (PMA) or as the flail-arm or flail-leg syndrome variants. PMA patients are similar to classic ALS patients without obvious signs of UMN dysfunction. However, 50% to 60% of PMA patients develop degeneration of upper motor neurons during the progression of the disease [38], and post-mortem histopathology has demonstrated that some PMA patients show UMN involvement which could not be detected upon clinical examination [39,40]. In patients with flail-arm or flail-leg syndromes, an LMN pattern of weakness and atrophy is observed in the upper limbs or lower limbs, respectively. Similar to PMA, flail-arm and flail-leg syndromes have been described as LMN variants but can show UMN involvement in the later stages of disease [41]. In these syndromes, involvement of secondary sites should not occur within 12 months of initial onset [42], and the prognosis is better than that seen in ALS, with median survival times of 5 to 6 years [41,43].

Non-Motor Involvement in ALS and Overlap with FTD
For many years, ALS was described as a neurodegenerative disorder with no extra-motor involvement. However, non-motor involvement is now accepted in the ALS phenotype [44], with neuroimaging demonstrating reduced grey matter in motor and non-motor brain regions of ALS patients [45], and histopathology suggesting widespread neuronal and glial TDP-43 pathology in the CNS [46]. With regard to symptomatology, a low proportion of ALS patients experience non-motor impairment as a first indication of pathology (3% of sporadic cases and 15% of familial cases) [47]. It has been estimated that approximately 35% of ALS patients present behavioural and/or cognitive changes, with 15% meeting the Neary criteria [48] for FTD diagnosis (ALS-FTD) [47]. The reported percentage seems to be much lower in most gene-specific studies and varies considerably between them, but it should be noted that the number of patients and studies for which these clinical parameters are reported is relatively small (Table S2). ALS and FTD are sometimes described as part of one continuum, with pure ALS patients (without any non-motor involvement) and pure FTD cases (for whom no motor dysfunction has been described) representing opposite ends of the spectrum.

ALS patients having FTD usually meet the criteria for behavioural variant FTD, characterised by defects in cognitive functions, personality traits and behavioural collapse. Among ALS cases experiencing non-motor dysfunction, language (particularly deficits in verbal fluency) and cognition are the most affected categories [49], and apathy is the most frequently encountered personality impairment [47].

Dementia in ALS Patients: ALS-FTD Variants
ALS-FTD diagnosis is made upon the presence of an ALS phenotype associated with behavioural or cognitive defects that fulfil FTD diagnostic criteria: (1) progressive impairment of behavioural/cognitive functions and observation of at least three behavioural symptoms defined by Rascovsky et al. [50]; or (2) loss of insight and/or presence of psychotic features associated with at least two Rascovsky et al. [50] symptoms; or (3) language impairment combined with semantic dementia (defined in [48]).

Cognitive Changes in Non-Demented ALS Patients: ALSci and ALSbi Variants
Non-demented ALS patients presenting with behavioural impairment are classified as ALSbi variant, while ALS patients experiencing cognitive impairment, including language defects, are considered to be ALSci variant [47]. Based on the revised diagnostic criteria from Strong et al. [51], ALS patients can be diagnosed as ALSci variant if executive impairment (social cognition), language dysfunction, or a combination of the two is evident during diagnosis. Diagnostic criteria for ALSbi variant require apathy with or without other behavioural symptoms, or two or more behavioural changes, such as disinhibition, loss of sympathy/empathy, perseverative/stereotypic/compulsive behaviour, hyperorality/dietary change, loss of insight and psychotic symptoms.
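The decision rules summarised above can be expressed as a small rule-based sketch. The function name, argument names and symptom labels below are hypothetical, and the snippet encodes only the criteria as paraphrased in this section, not the full Strong et al. [51] criteria; returning both labels when both sets of criteria are met is an assumption of this sketch.

```python
# Behavioural changes listed in the text (labels are illustrative strings)
BEHAVIOURAL_CHANGES = {
    "disinhibition",
    "loss of sympathy/empathy",
    "perseverative/stereotypic/compulsive behaviour",
    "hyperorality/dietary change",
    "loss of insight",
    "psychotic symptoms",
}


def classify_non_demented(executive_impairment: bool,
                          language_dysfunction: bool,
                          apathy: bool,
                          behavioural_changes: set) -> list:
    """Return the variant labels suggested by the criteria paraphrased above."""
    labels = []
    # ALSci: executive impairment, language dysfunction, or both
    if executive_impairment or language_dysfunction:
        labels.append("ALSci")
    # ALSbi: apathy (with or without other symptoms), or two or more behavioural changes
    recognised = set(behavioural_changes) & BEHAVIOURAL_CHANGES
    if apathy or len(recognised) >= 2:
        labels.append("ALSbi")
    return labels


print(classify_non_demented(False, True, False, {"disinhibition"}))
```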

Genetics of ALS
Superoxide dismutase 1 (SOD1) was the first gene demonstrated to be associated with ALS in 1993 [52]. SOD1 is ubiquitously expressed in human cells and serves to protect them from harmful reactive oxygen species (ROS). Mutated forms of SOD1 are believed to result in a toxic gain of function, provoking the presence of misfolded protein aggregates, increased endoplasmic reticulum (ER) stress, and oxidative stress and ultimately accelerating motor neuron degeneration [17].
In 2001, mutations in ALSIN2 (ALS2) were shown to be implicated in juvenile forms of ALS [53][54][55] and PLS [56]. The ALS2 protein has been found to act as a guanine nucleotide exchange factor for the GTPase Rab5, which is involved in endosome trafficking [57]. Mutations in ALS2 have been shown to inhibit activation of Rab5 and its translocation to mitochondria, leaving ALS2-mutated motor neurons more susceptible to oxidative stress [58]. However, in murine studies, genetic ablation of ALS2 has failed to recapitulate the pathological features seen in ALS [59,60], although primary motor neurons from these mice did show greater sensitivity to oxidative stress and aberrant morphology, suggesting that ALS2 mutations may indeed play a role in motor neuron susceptibility in ALS.
Genetic mutations were next reported in 2004 for the senataxin (SETX), angiogenin (ANG), and vesicle-associated membrane protein-associated protein B (VAPB) genes. SETX plays a role in numerous cellular functions including RNA metabolism and has been shown to regulate RNA polymerase II transcription termination [61] and its yeast homolog, SEN1, has been linked with processing of non-coding RNA [62]. SETX mutations are strongly associated with juvenile-onset ALS [63] and associations have been confirmed in American, Italian and Dutch cohorts [63][64][65]. ANG is highly expressed in the human central nervous system [66] and has been reported to show neuroprotective properties [67]. Indeed, expression of ALS-associated ANG variants has been shown to cause motor neuron death in cell culture models [67]. ANG has also been reported to play a role in the transcription of ribosomal RNA [68] and many ALS-associated variants are believed to elicit a loss of function in ANG, thus eliminating any neuroprotective functionality [69]. VAPB is a protein closely associated with the endoplasmic reticulum and is thought to be involved in the induction of the unfolded protein response (UPR) [70], as well as cellular processes including lipid transport [71], protein secretion [72], and calcium homeostasis [73]. The P56S mutation in VAPB has been implicated in an early-onset and slow-progressing form of fALS [74] and follow-up studies have highlighted how this mutation can result in nuclear envelope defects [75], and provoke VAPB ER aggregates [72]. However, murine models expressing the P56S mutation show widespread VAPB aggregates but demonstrate no motor neuron pathology or ALS phenotypes [76].
The next genetic mutation associated with ALS did not arrive until 2008, when mutations in TAR DNA-binding protein (TARDBP), encoding TDP-43, were reported in patients [77]. TDP-43 is an RNA/DNA-binding protein that plays important roles in several RNA metabolism processes [78]. Ubiquitinated TDP-43 was first shown to be present in CNS inclusions of ALS patients in 2006 [79] and subsequent studies have confirmed TDP-43 as the major protein component of pathological inclusions present in approximately 90% of ALS patients [80]. However, TDP-43 pathology is not unique to ALS and has been reported in numerous neurodegenerative conditions including FTD [79], Parkinson's disease [81], Huntington's disease [82], Alzheimer's disease [83], and dementia with Lewy bodies [84].
Then, in 2009, multiple mutations in the nuclear RNA-binding protein, Fused in Sarcoma (FUS) and FIG4 phosphoinositide 5-phosphatase (FIG4), were associated with ALS [85,86]. FUS is another RNA/DNA-binding protein involved in mechanisms of RNA splicing and DNA repair [87] and is implicated in both ALS and FTD [88]. Mutations in FUS, particularly those near the nuclear localisation signal (NLS) domain, cause cytoplasmic protein mislocalisation and are associated with a severe phenotype in murine models [89]. FIG4 is involved in vesicle trafficking due to its role in the regulation of the membrane bound phosphoinositide, PI(3,5)P2 [90]. Mutations in FIG4 were initially shown to cause neurodegeneration in Charcot-Marie-Tooth (CMT) neuropathy [91]. However, others have questioned the role of FIG4 in ALS pathology after failing to find pathogenic mutations in their Taiwanese [92] and Italian [93] cohorts.
In 2010, mutations in Optineurin (OPTN), Spatacsin paraplegia 11 (SPG11), Valosin-containing protein (VCP), and Ataxin-2 (ATXN2) were all implicated in ALS. Three different OPTN mutations were identified in ALS patients [94] and researchers were able to demonstrate the increased immunoreactivity of OPTN in both TDP-43 and SOD1 inclusions found in the spinal cord of sALS patients, suggesting a role for OPTN in general ALS pathogenesis.
The link between SPG11 and ALS was established when mutations were found to be associated with autosomal recessive juvenile ALS [95]. Mutations to SPG11 are the most common cause of autosomal recessive hereditary spastic paraplegia [96] and loss of function mutations have been shown to elicit lysosomal dysfunction and UMN + LMN degeneration in mice [97]. ATXN2 encoding the ataxin-2 polyglutamine (polyQ) protein was associated with ALS when researchers identified the presence of intermediate length polyQ expansions (27-33 Qs) in 4.7% of their North-American ALS cohort [98]. Ataxin-2 protein has been shown to regulate mRNA stability and translation [99,100] and upregulation of the fly homolog of Ataxin-2 was found to enhance neurodegeneration in Drosophila via its interaction with wild-type and mutated forms of TDP-43 [98]. Involvement of Ataxin-2 in ALS pathogenesis has since been confirmed in European and Chinese patient cohorts [101,102]. VCP is an ATP-driven chaperone protein that plays a role in ubiquitin-regulated protein degradation [103], autophagy [104], and mRNA processing [105,106]. VCP mutations were shown to be present in 1-2% of familial ALS patients in an Italian cohort [107] and mice expressing ALS-associated VCP mutations have been shown to develop a slow-progressing ALS phenotype [108].
In 2011, mutations in ubiquilin-2 (UBQLN2), sequestosome-1 (SQSTM1), and chromosome 9 open reading frame 72 (C9orf72) were discovered. Ubiquilin-positive inclusions have been implicated in both sALS and fALS [109], whilst mutations in SQSTM1 have been observed in rare ALS and FTD cases [110] and can be shown to lead to p62 protein inclusions in motor neurons of both patient groups [111]. The G4C2 hexanucleotide repeat expansion mutation (HREM) within C9orf72 [112,113] is perhaps the most significant genetic mutation associated with ALS thus far, and is estimated to be present in 34% of familial cases and 5% of sporadic cases in Europe [19,114]. In healthy subjects, the G4C2 repeat length ranges from 2 to 23 units [112], whilst intermediate expansions ranging from 24 to 30 [115] and large expansions ranging from 30 to many hundreds of units have been observed in ALS patients [112,116]. Although rare, C9orf72 expansions have been implicated in other neurodegenerative and psychiatric diseases including Parkinson's disease [117] and schizophrenia [118], suggesting a wider role for C9orf72 in neuropathology and perhaps offering some insight towards the heterogeneous phenotype seen in C9orf72 ALS.
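The repeat-length ranges quoted above can be summarised as a simple lookup, sketched below. The function name and category labels are hypothetical. Because the text gives 30 units as both the upper bound of the intermediate range and the lower bound of the large range, this sketch arbitrarily assigns exactly 30 units to the intermediate category; that choice is an assumption, not a finding of the cited studies.

```python
def classify_g4c2_repeats(units: int) -> str:
    """Map a C9orf72 G4C2 repeat count to the ranges quoted in the text."""
    if units < 2:
        # The text reports 2 units as the lower end of the healthy range
        raise ValueError("repeat count below the range reported in the text")
    if units <= 23:
        return "normal"        # 2 to 23 units in healthy subjects [112]
    if units <= 30:
        return "intermediate"  # 24 to 30 units [115]; 30 assigned here by assumption
    return "large"             # more than 30, up to many hundreds of units [112,116]


for count in (10, 24, 30, 31, 500):
    print(count, classify_g4c2_repeats(count))
```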
In 2012, Profilin 1 (PFN1) was implicated in familial and sporadic cases of ALS [119]. Mutant PFN1 has been shown to cause motor neuron degeneration through the formation of insoluble aggregates and disrupted cytoskeleton dynamics in mice [120] and co-aggregation of PFN1 and TDP-43 has been reported in cell lines expressing mutant PFN1 [119].
Then, in 2013, heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1) was reported to be involved in ALS after researchers identified three hnRNPA1 variants-two of which were associated with familial ALS and the other of which was associated with a sporadic case [121]. hnRNPA1 is known to colocalise with TDP-43 [121] and post-mortem studies have shown that motor neurons of ALS patients display marked reductions in hnRNPA1 alongside concomitant TDP-43 inclusions [122].
In 2014, mutations in Tubulin alpha-4A (TUBA4A) and Matrin-3 (MATR3) were implicated in ALS. Mutations in TUBA4A were first identified in a European and American cohort [123] and then validated in Belgian and Chinese cohorts in 2017 and 2018 [124,125]. TUBA4A mutations have been shown to cause cytoskeletal defects in primary motor neurons [123] and are recognised as a rare cause of ALS and FTD [125].
MATR3 was first associated with ALS after exome sequencing identified mutations in Italian, UK and US kindreds, alongside increased levels of MATR3 protein in spinal cord sections of ALS patients relative to controls [126]. MATR3 has been found to interact with TDP-43 and both proteins were shown to co-aggregate in skeletal muscle tissue of ALS patients [126]. MATR3 is known to play various roles in RNA metabolism and alternative splicing [127,128] and recent evidence suggests ALS-associated MATR3 mutations play a role in defective nuclear export of FUS and TDP-43 mRNA [129].
In 2015, NIMA-related kinase 1 (NEK1) was recognised as an ALS-risk gene [130] and was shown to interact with two other ALS genes, ALS2 and VAPB, both of which are involved in endosomal trafficking. Subsequent studies provided further evidence for the pathogenic role of NEK1 in ALS [131,132] and pathway analyses have shown NEK1 to interact with C21orf2, with both proteins involved in DNA repair mechanisms [133]. Mutations in TANK-binding kinase 1 (TBK1) were also associated with ALS in 2015 after exome sequencing identified eight loss of function mutations in 13 fALS pedigrees [134].
Cyclin F (CCNF) was implicated in ALS in 2016 with variants identified in both familial and sporadic cases [135]. In the same study, researchers were able to demonstrate how mutant CCNF led to aberrant ubiquitination and aggregation of proteins including TDP-43. More recently, CCNF was shown to be a binding partner of another ALS protein, VCP. Binding of mutated CCNF to VCP increased VCP ATPase activity, which in turn led to increased TDP-43 aggregation in U2OS cells [136].
Then, in 2018, the most recent genetic mutations implicated in ALS were discovered when research demonstrated the pathological involvement of Kinesin family member 5A (KIF5A) [137]. KIF5A is a protein expressed specifically in neurons and is involved in regulating neuronal microtubule dynamics [138,139]. KIF5A is also associated with spastic paraplegia and Charcot-Marie-Tooth neuropathy [140] and mutations have been reported in ALS patients in Chinese [141], European [142,143], and US cohorts [137].

Correlation of Genotype/Phenotype: Methods, Results and Discussion
To evaluate whether there is a correlation between associated genes and phenotype in ALS, a systematic search of original papers was performed using key words summarised in Table S1, while adhering to PRISMA guidelines (see checklist in Supplementary Materials).

Protocol
A systematic search was performed in PubMed using the key words: ALS, genotype phenotype, patient, and onset. To make sure that clinical data would also be obtained for rare genes involved in ALS and listed in Vijayakumar et al. [14], the following search terms were added: ALS, phenotype, patient and the gene name such as TBK1, VCP, SQSTM1, CCNF, NEK1, OPTN, FIG4, PFN1, ATXN2, VAPB, ANG, ALS2, SPG11, UBQLN2, KIF5A, and MATR3. There were no language, type of study, or publication date restrictions.
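As a sketch of how the key-word combinations above could be assembled programmatically, the snippet below builds one general query plus one query per rare gene. Joining terms with `AND` is an assumption made for illustration; the exact search syntax used is summarised in Table S1 and is not reproduced here. The helper name `build_queries` is hypothetical.

```python
# Key words for the general search, as listed in the text
BASE_TERMS = ["ALS", "genotype phenotype", "patient", "onset"]

# Rare genes for which an additional gene-specific search was run
RARE_GENES = [
    "TBK1", "VCP", "SQSTM1", "CCNF", "NEK1", "OPTN", "FIG4", "PFN1",
    "ATXN2", "VAPB", "ANG", "ALS2", "SPG11", "UBQLN2", "KIF5A", "MATR3",
]


def build_queries() -> list:
    """Return the general query followed by one query per rare gene.

    Assumption: terms are combined with a boolean AND.
    """
    queries = [" AND ".join(BASE_TERMS)]
    for gene in RARE_GENES:
        queries.append(" AND ".join(["ALS", "phenotype", "patient", gene]))
    return queries


for query in build_queries()[:3]:
    print(query)
```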

Eligibility Criteria
The search combining the different key words resulted in 355 articles. Reviews and duplicated papers were excluded. To avoid redundancy, papers re-using previously published clinical data were excluded. All studies used in the systematic review were peer-reviewed, written in English, and published original clinical data related to patients affected by monogenic forms of ALS. At least one of the following parameters had to be described in the paper: age of onset, site of onset, motor neuron population being affected (UMN, LMN, UMN+LMN), disease duration, number of patients with FTD, and number of patients with cognitive impairment. A total of 107 papers were then eligible for the analysis (see PRISMA flow chart in Figure 3).
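The eligibility rules above amount to a filter over candidate records, which might be sketched as follows. The record structure and all field names (`is_review`, `reuses_published_data`, and so on) are hypothetical and chosen only for illustration; the actual screening was performed manually by the reviewers.

```python
# At least one of these clinical parameters must be reported (field names are illustrative)
CLINICAL_FIELDS = (
    "age_of_onset",
    "site_of_onset",
    "motor_neuron_involvement",
    "disease_duration",
    "n_ftd",
    "n_cognitive_impairment",
)


def is_eligible(study: dict) -> bool:
    """Apply the exclusion and inclusion rules paraphrased from the text."""
    # Exclusions: reviews and papers re-using previously published clinical data
    if study.get("is_review") or study.get("reuses_published_data"):
        return False
    # Inclusions: peer-reviewed, in English, original data on monogenic ALS
    if not (study.get("peer_reviewed") and study.get("language") == "English"
            and study.get("original_monogenic_als_data")):
        return False
    # At least one clinical parameter must be described
    return any(study.get(field) is not None for field in CLINICAL_FIELDS)


example = {
    "peer_reviewed": True,
    "language": "English",
    "original_monogenic_als_data": True,
    "age_of_onset": 55.0,
}
print(is_eligible(example))
```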


Data Extractions and Synthesis
The papers were thoroughly reviewed by OC, LLG, VM and SD. Key information was extracted from each study, and grouped into cohort characteristics (ethnicity/country of the study, number of patients), age of onset (distribution and mean and standard deviation (SD)), site of onset (spinal, bulbar, respiratory, other/unknown), motor neurons being affected (UMN, LMN, UMN+LMN), disease duration (mean and SD), percentage of patients with FTD, and percentage of patients with cognitive impairment. All data are collated per gene in Table S2.
For the summary Table (Figure 4), the age of onset and disease duration are presented as the weighted mean ± SD, and the site of onset, motor neuron impairments, and FTD comorbidity are presented as weighted percentages, in all cases taking into account the number of patients studied as described below:
$$\text{Weighted mean} = \frac{\sum (\text{Mean} \times n)}{\sum n}$$
where Mean = mean of the parameter of interest given in the referenced study; n = number of patients studied for the corresponding parameter in the given study; and the sums run over all studies reporting that parameter for a given gene.
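The patient-weighted averaging described above can be sketched in a few lines of Python. The function name and the two example studies are hypothetical and do not correspond to entries in Table S2; the same function applies to per-study percentages. How the pooled SD was derived is not specified in this section, so it is not reproduced here.

```python
def weighted_mean(studies: list) -> float:
    """Patient-weighted mean of per-study values.

    `studies` is a list of (value, n) pairs, where value is the per-study
    mean (or percentage) and n is the number of patients it is based on.
    """
    total_n = sum(n for _, n in studies)
    if total_n == 0:
        raise ValueError("no patients reported for this parameter")
    return sum(value * n for value, n in studies) / total_n


# Hypothetical example: two studies reporting age of onset for the same gene
onset = [(50.0, 10), (60.0, 30)]
print(weighted_mean(onset))    # (50*10 + 60*30) / 40

# Hypothetical example: percentage of spinal-onset patients in the same two studies
spinal = [(80.0, 10), (60.0, 30)]
print(weighted_mean(spinal))   # (80*10 + 60*30) / 40
```

Weighting by n means that a large cohort influences the summary value more than a single case report, which matters here because many of the rarer genes are represented by very small studies.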

Characteristics of Studies
A total of 1630 ALS patients were included in the systematic review. The total number of reported patients for each gene is shown in Figure 4, column 4. As not all studies reported all clinical parameters, the total number of patients studied for each parameter is reported in the first subcolumn for each parameter. On average, 59% of the population was male, with considerable variation between genes (see Table S2). Most of the studies were conducted in Europe, North America and Asia.

Overall Findings and Discussion
For most genetic forms of ALS reported in Table S2 and in Figure 4, the age of onset ranges between 50 and 70 years old. Exceptions to this include cases of juvenile ALS, which are observed with mutations in SPG11 [95,144], FUS [145,146] and ALS2 [53,55] (Table S2). FUS patients are known to show considerable variation in phenotype: some show early onset and fast progression, whilst others show a later age of onset and a slower-progressing phenotype [147]. This variation in the FUS phenotype has been hypothesised to arise due to the different effects exerted by missense and truncating mutations [148]. Interestingly, the studies reviewed here suggest that FUS mutations are indeed associated with a relatively early age of onset (41.8 ± 14.5 years) and a fast-progressing phenotype, with average disease duration lasting 30.6 months (Figure 4). Another gene sometimes associated with early-onset ALS is SETX. Patients with SETX mutations have been reported to display a slow-progressing phenotype in which bulbar and respiratory muscles seem largely unaffected [149]. However, in one reported case, a patient did go on to experience bulbar symptoms 3 years after onset [150]. Moreover, in the studies retrieved in this review, SETX patients show neither an early age of onset nor a particularly slow phenotype: the average age of onset for SETX patients was 59.5 ± 24.7 years, with an average disease duration of 43.8 ± 37.5 months.
Many ALS-associated genes show variation in site of onset. Among the 22 genes included in Figure 4, cases of spinal onset are predominant in 19. This is in line with previous findings that suggest spinal onset accounts for approximately two-thirds of ALS cases [32]. For example, SOD1, hnRNPA1, TUBA4A, and ALS2 show a high percentage of patients with spinal onset (>80%), while spinal onset in VCP, NEK1, and TBK1 cases accounted for 50%, 50% and 55% of cases, respectively. Some other ALS-associated gene mutations were associated with a lower proportion of spinal onset, e.g., 33% of C9orf72 cases and 40% of UBQLN2 cases. However, previous research suggests that C9orf72 ALS demonstrates frequent occurrence of both spinal [151] and bulbar onset [152]. Moreover, it has been reported that site of onset in C9orf72 ALS can be used to predict disease duration. For instance, the average age of onset in patients with spinal onset was 59.3 years, increasing to 62.3 years in patients with bulbar onset, and male patients with spinal onset seem to display a faster-progressing phenotype [153].
A striking 95% of SOD1 cases were classified as spinal onset. Indeed, animal studies have provided support for the notion that SOD1 pathology begins at the periphery and proceeds in a retrograde manner [154,155]. Recently, a homozygous mutation that eliminates the enzymatic activity of SOD1 was found to result in a severe LMN phenotype and mild cerebellar atrophy in a young child [156] and the presence of a SOD1 p.D12Y variant was shown to result in a LMN-predominant phenotype [157]. Similarly, seven studies reported a non-negligible percentage of patients with pure LMN signs (Table S2, Figure 4, 47.6% pure LMN vs. 45.2% UMN+LMN, [158][159][160][161]). Overall, these studies seem to suggest that SOD1 mutations exert profound effects at the distal nerve. In addition, the observation that both overexpression and absence of SOD1 activity lead to pathology should be an important consideration in the development of therapeutics that aim to alter SOD1 levels as a novel treatment in ALS [162].
Figure 4 was sorted in descending order of the percentage of patients showing LMN signs. Not all studies reported UMN and/or LMN signs, and thus the percentages given in Figure 4 represent only a small proportion of the studies (see Table S2 for more details). However, it is interesting to see that the majority of the gene mutations do indeed elicit a phenotype that is characterised by both UMN and LMN signs, consistent with the classical clinical definition of ALS. FUS, C9orf72 and TARDBP all demonstrated the presence of both UMN and LMN signs, with both neuronal populations affected in 66.7%, 72.7% and 44.4% of cases, respectively. Surprisingly, only 33% of FIG4, PFN1, MATR3 and NEK1 cases showed both UMN and LMN signs, although it should be noted that 4 of the 14 studies reviewed in relation to these genes did not provide details regarding the pattern of motor neuron involvement.
Some ALS-associated genes demonstrated >20% of patients with pure LMN signs (SOD1, FUS, PFN1, ATXN2, TARDBP, TBK1, and hnRNPA1), while pure UMN signs had >20% preponderance in several genes (ANG, TBK1, FIG4, MATR3, NEK1, hnRNPA1).
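To make the arithmetic behind such per-gene percentages explicit, the following is a minimal sketch of how counts from individual studies could be pooled and then screened against the 20% threshold. It is illustrative only: the gene labels, study labels and patient counts are hypothetical placeholders and are not taken from Table S2, and the function names `pooled_percentages` and `genes_above` are assumptions introduced here rather than part of the review's methodology. Studies that did not report motor neuron involvement are skipped, which mirrors the caveat that the percentages reflect only a subset of studies.

```python
from collections import defaultdict

# Hypothetical per-study records:
# (gene, study label, pure LMN count, pure UMN count, UMN+LMN count).
# All values are invented for illustration and are NOT taken from Table S2.
STUDIES = [
    ("GENE_A", "study 1", 6, 0, 4),
    ("GENE_A", "study 2", 4, 1, 5),
    ("GENE_B", "study 1", 1, 3, 6),
    ("GENE_B", "study 2", None, None, None),  # motor neuron signs not reported
]


def pooled_percentages(studies):
    """Pool patient counts per gene and convert them to percentages.

    Studies with any missing count are excluded from the pooled totals.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for gene, _label, lmn, umn, both in studies:
        if lmn is None or umn is None or both is None:
            continue  # skip studies that did not report UMN/LMN involvement
        totals[gene][0] += lmn
        totals[gene][1] += umn
        totals[gene][2] += both

    result = {}
    for gene, (lmn, umn, both) in totals.items():
        n = lmn + umn + both
        if n == 0:
            continue  # no classified patients for this gene
        result[gene] = {
            "pure_LMN": 100.0 * lmn / n,
            "pure_UMN": 100.0 * umn / n,
            "UMN+LMN": 100.0 * both / n,
            "n_patients": n,
        }
    return result


def genes_above(percentages, category, threshold=20.0):
    """Return the sorted list of genes whose percentage in `category` exceeds `threshold`."""
    return sorted(g for g, p in percentages.items() if p[category] > threshold)


if __name__ == "__main__":
    pooled = pooled_percentages(STUDIES)
    for gene in sorted(pooled):
        print(gene, pooled[gene])
    print("pure LMN > 20%:", genes_above(pooled, "pure_LMN"))
    print("pure UMN > 20%:", genes_above(pooled, "pure_UMN"))
```

Pooling patient counts before computing percentages weights each study by its size; averaging per-study percentages instead would give small studies disproportionate influence, so the choice between the two should be stated whenever such figures are reported.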

Conclusions
Over 150 years have passed since ALS was first reported by Charcot and still the aetiology of the disease remains elusive. Although research is progressing and genetic studies continue to identify novel gene associations [14,249-251], many questions remain surrounding the pathological mechanisms associated with already established mutations, their role in the ALS phenotype, and the as yet undiscovered mechanisms that underlie sporadic onset of disease. Here, we have performed a systematic review in an attempt to highlight genotype-phenotype correlations for 23 of the more commonly reported mutated genes in ALS. This has proven to be challenging as many genetic studies do not capture or report a complete summary of clinical data. Whilst it is understandable that such data are difficult to acquire, we hope to illustrate that there is a need for improved and more widely available clinical and informatics resources that would enable genotype-phenotype associations to be easily visualised in ALS.
Whilst we have illustrated the relationship between commonly reported mutated genes and various clinical measures including age and site of onset, disease duration and motor neuron involvement, a limitation of the current review is that we do not consider variation among phenotypes of patients having different mutations of the same gene. For many genes involved in ALS, including FUS, SOD1, and TARDBP, the phenotype may be different depending upon the specific genetic mutation in question. In SOD1 patients, for instance, the A4V mutation results in a much more aggressive phenotype (death occurring ~1.2 years after onset [252]) than the H46R mutation, for which patients show a relatively mild phenotype (duration of ~17 years [253]). It could be of value in future work to comprehensively review variations in genotype-phenotype correlations among the different mutations reported by single-gene studies, which in turn could contribute towards a comprehensive database of ALS genotype-phenotype correlation. Such a resource could ultimately improve our mechanistic understanding of ALS by enabling a more robust assessment of how the ALS phenotype responds to different variants across multiple genes.
Additional limitations include that many of the studies surveyed are relatively small, involving low numbers of patients. Moreover, only a subset of studies report a clinical breakdown of phenotype, ethnic breakdown is not always reported, and some ethnicities have minimal representation.
Despite these limitations, the collected data reveal a landscape of highly variable phenotypic associations, underlining the complexity of the disease, and the need for nuanced approaches to the development of clinical assays and therapeutics.