Individual Oligogenic Background in p.D91A-SOD1 Amyotrophic Lateral Sclerosis Patients

The p.D91A is one of the most common ALS-causing SOD1 mutations and is known to be either recessive or dominant. The homozygous phenotype is characterized by prolonged survival and slow progression of disease, whereas the affected heterozygous phenotypes can vary. To date, no genetic protective factors located close to SOD1 have been associated with the mild progressive homozygous phenotype. Using Next Generation Sequencing (NGS), we characterized a small cohort of sporadic and familial p.D91A-SOD1 heterozygous (n = 2) or homozygous (n = 5) ALS patients, to reveal any additional contributing variant in 39 ALS-related genes. We detected unique sets of non-synonymous variants, four of which were of uncertain significance and several in untranslated regions of ALS-related genes. Our results supported an individual oligogenic background underlying both sporadic and familial p.D91A cases irrespective of their p.D91A mutant alleles. We suggest that a comprehensive genomic view of p.D91A-SOD1 ALS patients may be useful in identifying emerging variants and improving disease diagnosis as well as guiding precision medicine.


Introduction
Amyotrophic lateral sclerosis (ALS) represents the third most common neurodegenerative disease, characterized by the progressive adult-onset degeneration of upper and lower motor neurons [1]. There are two main forms of ALS, familial (FALS) and apparently sporadic (SALS) accounting for about 10% and 90% of cases, respectively [2]. This complex disease is caused by the interplay of causative genetic factors (monogenic or oligogenic) and risk factors (genetic and non-genetic) [3].
The first ALS-related gene described was superoxide dismutase 1 (SOD1), whose mutations affect about 12% of FALS and 1% of SALS [1]. The most common mutation affecting SOD1 and causing ALS is the substitution of alanine for aspartate at position 91 of exon 4, called p.D91A (also known as p.D90A; dbSNP 155 ID rs80265967; NM_000454.5, c.272A>C) [4]. Prevalence of p.D91A in ALS cases varies globally and is distinctly absent in some populations (https://gnomad.broadinstitute.org/, accessed on 22 October 2021). In particular, this is the most prevalent variant in Europe [5].
The first group of patients displays a slowly evolving phenotype linked to the p.D91Ahomozygous genotype [1,9,[14][15][16] with no other identified genetic modifier responsible for the mild phenotype. In addition, a p.D91A-homozygous patient has been recently described having vocal cord impairment, which is not a typical clinical sign associated with this genotype [15].
The second group of patients, concerning the heterozygous pattern of zygosity, includes cases with compound heterozygous mutations (p.D91A/p.D96N; p.D91A/p.D90V), showing a lower limb site of onset and a slow progressive phenotype with a variable disease duration, ranging from 7 to 28 years [21,22,24]. In addition, five clinically affected p.D91A-heterozygous cases with slow progression of disease were reported: one patient with a negative family history for the disease [23], another belonging to a large family carrying the TDP-43 p.G298S mutation [17], and three individuals of the same family [16]. However, it is not clear if other SOD1 variants were investigated in these cases.
The third p.D91A SOD1-related phenotype, shown in Table 1, is characterized by an aggressive evolution of the disease and is found in individuals described carrying a dominant p.D91A variant co-segregating with ALS [4]. In this group falls the case of a p.D91A-heterozygous affected carrier showing TDP-43 aggregates, with a family history of ALS and other neurodegenerative diseases [25]. Few other case reports described coexistence of TDP-43 inclusions with dominantly inherited SOD1 variants since these aggregates are neuropathological hallmarks of ALS-FTD and SALS patients [25][26][27][28].
Furthermore, the p.D91A-heterozygous mutation plus the pathogenic C9ORF72 repeat expansion or the variant of uncertain significance (VUS) UBQLN2-Q460R [29] were already described in patients associated with the ALS-FTD phenotype [19,30].
Differences in the genotype-phenotype correlations delineated above may have considerable therapeutic implications. Indeed, recruitment for antisense therapy was recently discouraged in p.D91A-heterozygous affected carriers after finding evidence of one case showing TDP-43 aggregates as autoptic findings [25].
Although no further variants have been identified in the conserved region surrounding SOD1 that may explain the mild progressive phenotype in homozygous mutation carriers, the existence of contributing genetic factors in other DNA regions cannot be ruled out [31].
Based on these premises, in this study we investigated by targeted NGS the presence of additional variants in 39 ALS-related genes in SALS and FALS patients carrying the p.D91A-SOD1 heterozygous or homozygous mutation, with the aim to reveal any other contributing variant able to explain the homozygous mild phenotype.

Patients
Informed consent was obtained from all the participants included in the study. ALS patients (three familial and four sporadic cases, four women and three men) from southern Italy unrelated families, diagnosed with ALS according to the El Escorial criteria [32], were previously recruited at our institution and genetically defined as carriers of the p.D91A-SOD1 heterozygous or homozygous mutation by Sanger sequencing analysis, as previously reported [14,33]. Clinical features and known genetic background of patients described in this study are shown in Table 2. In our cohort, patients affected by the biallelic p.D91A variant, mainly showed a prolonged survival unlike p.D91A heterozygous affected carriers. ALS patients without SOD1 mutations, together with individuals affected either by a motor neuronal (MN) phenotype or other neurodegenerative diseases not associated to ALS and belonging to our Southern Italian reference population were used as control samples (C1-C8) since we were not interested in pleiotropic effect [34]. Our filtering strategy aimed to identify rare and polymorphic variants able to synergistically act with the already known causative mutation, so data were normalized for two different types of confounding factors, such as genetic background and overlapping phenotypes at the same time. The study was approved by the local Ethics Committee of Azienda Ospedaliero Universitaria of Bari N 1025.

NGS Analysis
Genomic DNA quality and quantity were evaluated using standard agarose electrophoresis and Qubit™ Fluorometer (Invitrogen, Waltham, MA, USA). Based on quantitative results, samples were normalized to 50 ng/µL to be used as an input in targeted NGS sequencing analysis. Deep sequencing analysis was performed using a custom ALSrelated gene panel including genes known to be associated or possibly associated with ALS and overlapped phenotypes. In detail, we targeted the coding regions of 39 ALS-related genes with at least 25 bp of intronic flanking regions, together with the promoter region of the following subset of genes: SOD1, TARDBP, FUS, ANG, ALS2, TBK1, SPG11, PFN1, TUBA4A, SETX, VCP, MATR3, VAPB, CCNF, NEK1, HNRPA1, and ERBB4 (Supplementary  Table S1) [35]. Libraries were prepared using the custom Ion AmpliSeq kit (Life Technologies, Carlsbad, CA, USA), and sequencing analysis was run on an Ion Torrent Personal Genome Machine™ (PGM™) sequencer (Thermo Fisher Scientific, Carlsbad, CA, USA).
To set the bioinformatic pipeline, we followed the best practice consensus recommendations developed by the College of American Pathologists and the American Medical Informatics Association [36]. Primary and secondary data analysis were performed using the Torrent Suite (Thermo Fisher Scientific, Carlsbad, USA), with the Human genome [19] to align sequences and a Germline low stringency variant caller setting. Tertiary level data analysis was carried out using Partek Flow software build version 10.0.21.0302 (Partek Inc., St. Louis, MO, USA. Variant annotation was performed using Ensemble transcript release 75, SnpEff, VEP.84 databases. Variant frequencies in ALS patients and controls belonging to the project MinE database, were annotated using the project MinE data browser (http://databrowser.projectmine.com/, accessed on 20 April 2021), whose current dataset contains WGS data from 4366 ALS cases and 1832 controls [37]. Variants were then pre-filtered for SNP low stringency quality parameters (FAO > 2, FDP > 6, QUAL > 20, STB < 0.9) to filter out false positives and retain >99% of true positives calls (a strategy optimized for amplicon-based semiconductor sequencing) [38]. Variants with a QD < 1 were also filtered out based on our experience on SNV false positive calls and the observation that true positive calls have high mean coverage and quality by depth values [39]. Variant prioritization was carry out filtering for Read Depth ≥ 20X (minimum read depth for germline variants calling), and for MAF-European Ancestry Population-Freq< 0.5. We further removed intronic variants and synonymous variants not affecting canonical splice sites. The same NGS and bioinformatic pipelines were applied either on patients (P1-P7) or controls (C1-C8) to compare and filter patients' variants. Comparison between patient and control variants were performed using the Summarize cohort mutations by merging pairs option in Partek Flow. Variant classification was performed querying VarSome (https://varsome.com/, accessed on 20 April 2021), a search engine for human genomic variation freely available and implemented to automatically classify and report the variant classification according to ACMG guidelines [40], and Clin-Var (https://www.ncbi.nlm.nih.gov/clinvar/, accessed on 20 April 2021) [41]. We also used CADD GRCh37-v1.6 (https://cadd.gs.washington.edu/snv, accessed on 24 October 2021) [42] prediction ranking for deleteriousness of variants, without setting an arbitrary cut-off for our disease model, since p.D91A variant is reported having a CADD score of 9.481, which is below the scaled scores of 10 (predicted to correspond to the 10% most deleterious substitutions in the human genome). Inferential statistics was not conducted because of the small sample size. Descriptive statistics for variant frequencies was calculated in our case series, in our South Italian reference population [33], as well as inferred by the related population databases.

eQTLs
Expression Quantitative Trait Loci (eQTL) analysis, helpful to understand the effect of genetic variations on the transcriptome in healthy post-mortem tissues donors, was performed using Genotype-Tissue Expression GTEx Portal v.8 (www.gtexportal.org, accessed on 24 October 2021) [43]. The eQTL for each of the 19 variants of interest was calculated for five tissues of interest (whole blood, brain cortex, brain frontal cortex, spinal cord, and skeletal muscle) using GTEx eQTL Calculator, generating a p-value for each variant-gene pair T-statistics in an eQTL. T-test results were corrected using Bonferroni correction test.

Results
Data obtained from deep sequencing analysis of p.D91A-SOD1 patients are reported in detail in supplementary data. In particular, coverage analysis and alignment quality parameters are shown in Supplementary Tables S1 and S2, respectively. Results obtained after pre-filtering for variant quality and prioritization for germline read depth and frequency in the European population showed only already known variants (Supplementary Table S3). We obtained a p.D91A average read depth across the seven ALS patients of 616X, and a mean quality by depth of 8.4 and 38.3 for heterozygous and homozygous, respectively.
ous substitutions in the human genome). Inferential statistics was not conducted because of the small sample size. Descriptive statistics for variant frequencies was calculated in our case series, in our South Italian reference population [33], as well as inferred by the related population databases.

eQTLs
Expression Quantitative Trait Loci (eQTL) analysis, helpful to understand the effect of genetic variations on the transcriptome in healthy post-mortem tissues donors, was performed using Genotype-Tissue Expression GTEx Portal v.8 (www.gtexportal.org, accessed on 24 October 2021) [43]. The eQTL for each of the 19 variants of interest was calculated for five tissues of interest (whole blood, brain cortex, brain frontal cortex, spinal cord, and skeletal muscle) using GTEx eQTL Calculator, generating a p-value for each variant-gene pair T-statistics in an eQTL. T-test results were corrected using Bonferroni correction test.

Results
Data obtained from deep sequencing analysis of p.D91A-SOD1 patients are reported in detail in supplementary data. In particular, coverage analysis and alignment quality parameters are shown in Supplementary Table S1 and S2, respectively. Results obtained after pre-filtering for variant quality and prioritization for germline read depth and frequency in the European population showed only already known variants (Supplementary Table S3). We obtained a p.D91A average read depth across the seven ALS patients of 616X, and a mean quality by depth of 8.4 and 38.3 for heterozygous and homozygous, respectively.
The variant prioritization strategy adopted ( Figure 1) showed the presence of 19 SNVs in p.D91A-SOD1 patients (Table 3). Using the VarSome tool, four variants were classified as a VUS (rs45488900, rs41266793, rs139334167, rs76708676).    In Table 4 we list the co-occurrence of variants in ALS-associated genes in each patient: 19 variants were exclusively found in patients and not in controls; two variants, in PON1 and/or GRN, instead, were detected in both patients and controls (except for P1 and C1). Only two variants were shared by two or three patients, including one segregating with the genotype zygosity. This was the case of TUBA4A/TUBA4B rs45488900, shared by two p.D91A-homozygous SALS patients of our cohort, P4 (Het) and P5 (Hom). The entire list of annotated variants detected in patients and controls is available in Supplementary Table S4. To investigate the possible relation between variants detected by our analysis and gene loci affecting gene expression, particularly for untranslated region variants, we also calculated their potential effect on gene expression through their mapping on eQTLs. Data retrieved by GTEx Portal v.8 and corrected by a Bonferroni correction test (Supplementary  Table S5) showed a tissue-specific effect for three out of 19 variants queried in non-diseased tissues of interest (rs11555696, 183 rs34099167, and rs118018900).

Discussion
In this study, we performed targeted NGS analysis in a small group of south Italian ALS patients, previously genetically characterized as p.D91A carriers, hypothesizing that genetic factors in other ALS-related genes, in combination with the p.D91A-SOD1 variant, may contribute to the different disease phenotypes in homozygous and heterozygous cases.
Recently, Sanger sequencing analysis performed in 997 ALS patients from southern Italy by our research group, revealed that 2% of patients had SOD1 mutations [33]. In particular, the frequency of p.D91A affected individuals represented 0.8% of all ALS cases diagnosed (0.6% p.D91A-hom and 0.2% p.D91A-het). These data are in line with the frequency of this mutation reported by other Italian research groups [57].
Previous reports demonstrated the absence of a neuroprotective factor in the genomic region near SOD1 in p.D91A-homozygous ALS patients, suggesting the existence of a putative protective factor modulating the phenotype located elsewhere in the genome [31].
The present investigation showed that p.D91A-heterozygous and -homozygous ALS cases do not contain a genetic modifier near SOD1, nor near ALS-linked genes, highlighting the presence of unique variant gene sets in each patient (Table 4). A similar conclusion was recently published in a study investigating the A90V-SOD1 mutation in SALS patients, suggesting that additional genetic variants could contribute to disease penetrance [24].
We identified 19 non-synonymous variants, including four of uncertain significance, in ALS-D91A carriers. Most substitutions (13/19) exclusively found in p.D91A patients were in non-coding regions of ALS-related genes, while the remaining (6/19) were missense mutations without any clear evidence of pathogenic effects. Although large genes (i.e., SETX or NEK1) have more chance to accumulate rare variants, the 19 variants identified in genes related to ALS were exclusively found in patients and not in controls. However, no clear genotype-phenotype correlation was established due to the small sample size.
All but one patient (P1) showed the presence of variants already identified as risks factors for neurodegenerative diseases [56][57][58], rs662 (PON1) and rs5848 (GRN) ( Table 4). Previous studies inconsistently suggested an effect of PON1 SNPs on ALS susceptibility, and rs662 was associated with bulbar onset and reduced survival in ALS cases very recently [55]. However, all the patients carrying this risk factor in our cohort showed a spinal onset of the disease. The GRN rs5848 polymorphism was reported in Alzheimer's Disease (AD) and Parkinson's Disease (PD) patients as risk factor for ubiquitin-and TDP-43 -positive frontotemporal degeneration [59]. Interestingly, this genetic variant lies in the binding-site for the miR-659 of the 3'UTR of GRN and may alter gene regulation [59].
Two variants classified as VUS were already described, the rs139334167 in SPG11 and the rs45488900 in TUBA4A. The missense variation affecting SPG11 was previously reported in a case of PD [44], while the rs45488900 affects TUBA4A but also the upstream region (n.-1456G>T) of TUBA4B. The encoded protein of the latter, was found differentially over-expressed in post-mortem pre-frontal cortex samples of patients affected by atypical ubiquitin-positive frontotemporal lobar degeneration, characterized by ubiquitin and FUS positive inclusions, while TUBA4A was down-expressed in the cerebellum of the same group of patients when compared to controls [58]. Interestingly, rs45488900 was shared by p.D91A-homozygous SALS patients showing a slow course but with a different clinical picture of the disease ( Table 2; Table 4). Due to the small number of homozygous SALS in our cohort, no inference on the possible role of this variant in association with the phenotype could be made, but this aspect remains noteworthy and deserves to be explored in a larger number of cases. We did not find any literature reports describing two other variants classified as VUS. These substitutions, positioned in the 5'UTR of HFE (rs41266793) and in the 3' UTR of VAPB (rs76708676), were detected in the homozygous case P3, and in the P5 patient with the longest observed disease duration (22 years) ( Table 3).
Further investigating the potential influence of identified variants on gene expression by eQTL analysis, we also observed that two polymorphisms, rs34099167 and rs11555696, were associated with deregulated expression levels of DCTN1 and NEK1 in some of the tissues considered (Supplementary Table S5). Interestingly, the altered expression of these two genes was already found by our research team using unsupervised clustering of gene expression in motor cortex samples, identifying two transcriptome-based SALS subgroups of patients [60]. In particular, NEK1 was found down-regulated in one cluster of patients while the second one was characterized by increased expression of DCTN1 and reduced levels of SOD1 [60]. NEK1 belongs to NIMA-related serine/threonine kinases family and is involved in mitochondrial membrane regulation, DNA damage response, ciliogenesis and maintenance of the cytoskeleton network [29], while DCTN1 is a motor protein involved in dynein-mediated axonal retrograde transport and ciliogenesis [61]. Moreover, dynactin (Dctn1) acting with overexpressed dynamitin (Dctn2), was shown to produce a late-onset progressive motor neuron disease inhibiting axonal transport in transgenic mice [62]. In our data, rs34099167 and/or rs11555696 were detected in 3 out of 7 patients, both homozygotes and heterozygotes, showing different clinical features, although no correlation between phenotype and genotype was established (Table 4).
Recently, the analysis of two different cohorts, with a majority of apparently sporadic cases, showed an oligogenic basis of ALS associated with earlier age of disease [63,64]. Differences in genotype-phenotype correlations would have considerable therapeutic implications. ALS is a devastating pathology in which multiple variants cooperate in influencing disease onset, severity or duration and, until now, no truly effective treatment exists [65]. Thanks to recent efforts to selectively treat SOD1-related ALS patients, ASO therapies designed to knock-down the expression of the gene have emerged [66]. To this regard are of particular interest two ongoing phase 3 studies using intrathecally administered ASO Tofersen, an orphan drug capable of reducing SOD1 protein levels [67]. To ensure the appropriate recruitment of patients to clinical trials it appears evident the importance to establish if p.D91A mutation is pathogenic in the heterozygous state and if other contributing factors influence the phenotype.

Conclusions
Our study suggests the possibility that additional genetic factors contribute to the individual oligogenic basis of p.D91A-SOD1 carriers. In particular, all patients, except for P2 carrying only risk factors, showed an oligogenic pattern in line with the model proposed for ALS etiopathogenesis in which mutations in two or more genes are required to develop the disease, but they are not all necessarily truly pathogenic [1,19]. Increasing the number of sequenced p.D91A patients could be useful in identifying emerging genetic factors and improving disease diagnosis, as well as guiding precision medicine.