Cathepsin B p.Gly284Val Variant in Parkinson’s Disease Pathogenesis

Parkinson’s disease (PD) is generally considered a sporadic disorder, but a strong genetic background is often found. The aim of this study was to identify the underlying genetic cause of PD in two affected siblings and to subsequently assess the role of mutations in Cathepsin B (CTSB) in susceptibility to PD. A typical PD family was identified and whole-exome sequencing was performed in two affected siblings. Variants of interest were validated using Sanger sequencing. CTSB p.Gly284Val was genotyped in 2077 PD patients and 615 unrelated healthy controls from the Czech Republic, Ireland, Poland, Ukraine, and the USA. A gene burden analysis was conducted for the CTSB gene in an additional 769 PD probands from the Mayo Clinic Florida familial PD cohort. CTSB expression and activity in patient-derived fibroblasts and controls were evaluated by qRT-PCR, western blot, immunocytochemistry, and enzymatic assay. The CTSB p.Gly284Val candidate variant was only identified in affected family members. Functional analysis of CTSB in patient-derived fibroblasts under basal conditions did not reveal overt changes in endogenous expression, subcellular localization, or enzymatic activity in the heterozygous carriers of the CTSB variant. The identification of CTSB p.Gly284Val may support the hypothesis that the CTSB locus harbors variants with differing penetrance that can determine the disease risk.


Introduction
Parkinson's disease (PD) is the second most common neurodegenerative disorder [1]. Clinical symptoms include bradykinesia, resting tremor, muscular rigidity, and a good response to levodopa or dopamine-agonist treatment [2]. The degeneration of dopaminergic neurons in the substantia nigra and the accumulation of α-synuclein are the main pathological hallmarks of PD [3]. It is mostly a sporadic disease, but about 10% of cases are familial [4]. Mutations in PD-related genes are usually found in patients with early-onset PD (autosomal recessive) or in patients with a positive family history (autosomal dominant) [4]. In addition to the well-described causative genes, many more have been identified as potential risk factors. The most recent genome-wide association study (GWAS) revealed almost 100 susceptibility loci [5]. Although the genetics of PD in the Polish population have been explored, known genetic causes are not commonly observed [6]. This suggests that there may be specific PD-associated genetic variants within unresolved loci in the Polish population. Understanding the genetic causes of PD within families can be informative for the more frequent sporadic form of the disease. There are a number of examples in the genetics of PD that show variable penetrance at a single locus, e.g., SNCA and LRRK2, which were both identified through family-based studies but later found to harbor common variants of intermediate/low penetrance. Herein, we describe a genetic investigation of a Polish kindred with two siblings affected by PD and a basic functional evaluation of the nominated CTSB mutant in patient-derived fibroblasts.

Results
The proband was a 58-year-old female (II-2) with a 17-year history of PD (Figure 1). She developed right-hand rigidity and was responsive to levodopa treatment. Eleven years after disease onset, she developed on/off fluctuations and underwent bilateral deep brain stimulation of the subthalamic nucleus. Her brother (II-1; Figure 1) developed a right-hand tremor at the age of 61 and was then diagnosed with PD. Six years after disease onset, he continued to show a good levodopa response (Supplementary Video S1) [7]. No other history of PD was known within the family, although information was limited for siblings who currently live in Germany.
The analysis of the three main early-onset PD (EOPD) genes, PRKN, PINK1, and DJ1 (exon sequencing and MLPA), did not show the presence of pathogenic variants (Figure 2). The WES analysis performed for both affected subjects (II-1, II-2) identified 317 rare (MAF < 1%) missense or loss-of-function variants shared between the sibling pair. Of the 317 variants, 269 were heterozygous and 48 were homozygous (Supplementary Table S1). Seventeen of the identified variants, including 11 heterozygous and 6 homozygous, were previously reported to be associated with PD (Table 1).
The variant CTSB p.Gly284Val (c.851G > T) was nominated as the top candidate variant for the disease within the family. Interestingly, the CTSB locus was previously identified as a GWAS target [5], and this variant is predicted as likely pathogenic according to in silico analysis (MutationTaster 2021 (Berlin Institute of Health, Berlin, Germany), SIFT4G (Bioinformatics Institute, Singapore), and CADD v1.4 (University of Washington, Hudson-Alpha Institute for Biotechnology, St. Louis, MO, USA), score 26.7). To further validate our findings, five independent cohorts (Table 2) were screened, and no additional carriers of CTSB p.Gly284Val were found. Gene burden analysis comparing the Mayo Clinic Biobank control cohort with the Mayo Clinic familial PD cohort was also negative (Supplementary Table S2).
To address the potential significance of the p.Gly284Val substitution, we first inspected the surrounding sequence and the localization of the residue within the 3D structure of the protein (Figure 3). CTSB is synthesized as a preproenzyme which, upon cleavage of the N-terminal signal and pro-peptide regions, is further processed into the mature, active protease consisting of light and heavy chains (Figure 3A). Although Gly284 does not directly localize to the active site of the enzyme, it resides within a highly conserved region of the heavy chain that folds into the R-lobe of CTSB and may modulate catalytic activity (Figure 3B,C).
To functionally characterize the novel mutation, we examined CTSB mRNA and protein expression, as well as its enzymatic activity and localization, using patient-derived fibroblasts from both PD patients and age-matched controls (Figure 4). During culture, there were no apparent differences in cell proliferation, morphology, or survival between patients and controls.
Under standard, non-stress conditions, heterozygous mutant fibroblasts showed no major differences in total CTSB transcript or protein levels (Figure 4A,B). There was also no significant change in the enzymatic activity of CTSB (Figure 4C). Immunocytochemistry showed a similar lysosomal distribution of the CTSB signal between patient and control fibroblasts (Figure 4D).

Figure 4 (legend, panels C and D). (C) CTSB activity assay using cell lysates showed no differences in total enzymatic activity in the mutant fibroblasts compared to the control. Circle: control; square: patient #1 carrying CTSB G284V; triangle: patient #2 carrying CTSB G284V. (D) Representative images of CTSB immunofluorescence staining (green) in fibroblasts at baseline condition. Scale bar: 20 µm. n = 3 independent experiments. Data are normalized to set control as 1 and are shown as mean with standard error. One-way ANOVA.

Discussion
Our study identified a novel, rare CTSB p.Gly284Val variant in the affected family that may play a role in the pathogenesis of PD. Although neither our CTSB gene burden analysis in small cohorts of controls and PD cases nor our genotyping of CTSB p.Gly284Val found an association, accumulating evidence from recent large genetic studies has revealed a strong involvement of CTSB in PD development. The CTSB locus was identified as a genetic risk locus in a recent GWAS analysis [5]. CTSB was identified as a PD expression quantitative trait locus in a transcriptomic analysis [21]. The CTSB variant rs1293298 was also identified as a modifier of risk and age of onset in GBA-associated PD and Lewy body dementia [22]. In the limited number of functional studies of CTSB in PD, a direct involvement of CTSB in α-synuclein degradation was revealed. CTSB and CTSL were shown to jointly cleave α-synuclein within its amyloid region and circumvent fibril formation [13]. However, CTSB also contributed to the generation of C-terminally truncated α-synuclein species in symptomatic SNCA p.Ala53Thr transgenic mice [23]. Yet, the impact of a CTSB variant found in PD patients on protein activity had not been investigated previously. A structure-informed multiple sequence alignment of human CTSB against a selection of homologs found that Gly284 is evolutionarily conserved and lies within the heavy chain of the active enzyme [24]. This indicates the potential structural importance of this residue; thus, a mutation such as p.Gly284Val may adversely affect protein function.
Cathepsins are lysosomal proteases that are mainly found in the acidic compartments where they are most active, but each cathepsin has its own specific optimal pH environment [25]. Cathepsins take part in different physiological and pathological processes and play critical roles in intracellular protein degradation, energy metabolism, and immune response [26]. The importance of the lysosomal pathway in PD pathogenesis was described previously, and cathepsins are among the most crucial lysosomal proteins [27]. A significant reduction in lysosomal degradation capacity, substantially enlarged lysosomes, and an increased lysosome number were observed in CTSB and CTSL double-knockout human neuroblastoma cells [28]. Moreover, CTSB knockout cells exhibited accumulation of the lysosomal protein LAMP1 (lysosomal-associated membrane protein 1) [28]. Lack of CTSB was also shown to impair lysosomal trafficking during neural development [29]. Additionally, CTSB was found to indirectly control transcription factor EB (TFEB), which is the most important regulator of autophagy and lysosomal gene expression [30].
In the functional evaluation of the variant, we did not find clear evidence supporting haploinsufficiency in fibroblasts under basal conditions. However, CTSB activity is context-dependent, and the impact of the variant may be different in neurons or other models. Thus, a more thorough analysis of the CTSB variant in relevant cells of the brain and/or under disease-relevant stress conditions may be needed to uncover potential functional deficits. Lysosomal dysfunction has a well-established role in the pathogenesis of PD. Currently, there are several ongoing clinical trials targeting this pathway (for example, NCT02914366 and NCT04127578). Therefore, further analysis of lysosomal dysfunction in PD may result in the development of potential new therapeutic targets.

Clinical Examination
Four members of a family of mixed Polish-German origin from southern Poland with typical PD (Figure 1) were recruited from the Department of Neurology in the Faculty of Health Science of the Medical University of Warsaw in Warsaw, Poland. The clinical diagnosis of PD was evaluated by two neurologists (DK and LM) at the time of specimen collection, using the UK Parkinson's Disease Society Brain Bank clinical diagnostic criteria. A blood sample and a single 3 mm skin punch biopsy were collected.

Exome Sequencing in Sib-Pairs
Whole-exome sequencing (SureSelect Human All Exon v6 enrichment, Illumina NovaSeq 6000 platform, annotations according to the Department of Medical Genetics, Institute of Mother and Child pipeline, VEP2.7) was performed on both affected family members [31]. Golden Helix SNP and Variation Suite (SVS) was used to annotate variants and identify variants shared between the sib-pair. Shared variants were filtered to retain non-synonymous coding and potential splicing variants with a gnomAD European Non-Finnish minor allele frequency (MAF) < 0.01. The cosegregation of nominated variants was confirmed with Sanger sequencing in affected and nonaffected family members. The study design is summarized in Figure 2.
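The sib-pair filtering step can be illustrated with a minimal Python sketch. It is not the Golden Helix SVS workflow used in the study: the `Variant` record, its field names, and the set of retained consequence labels are assumptions made for illustration, and only the MAF cut-off of 0.01 is taken from the text.

```python
from dataclasses import dataclass

# Consequence labels treated as protein-altering here. This set is an
# assumption for illustration, not the configuration of the actual pipeline.
KEPT_CONSEQUENCES = {"missense", "stop_gained", "frameshift", "splice_site"}
MAF_THRESHOLD = 0.01  # gnomAD European Non-Finnish MAF cut-off stated in the text


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record."""
    chrom: str
    pos: int
    ref: str
    alt: str
    consequence: str
    gnomad_nfe_maf: float  # 0.0 when the variant is absent from gnomAD

    @property
    def key(self):
        # Variants are matched between siblings on position and alleles.
        return (self.chrom, self.pos, self.ref, self.alt)


def shared_rare_variants(sib_a, sib_b, maf_threshold=MAF_THRESHOLD):
    """Return variants from sib_a that are also in sib_b and pass both filters."""
    keys_b = {v.key for v in sib_b}
    return [
        v for v in sib_a
        if v.key in keys_b
        and v.consequence in KEPT_CONSEQUENCES
        and v.gnomad_nfe_maf < maf_threshold
    ]
```

Matching on chromosome, position, and alleles rather than on gene name keeps two different variants in the same gene from being counted as shared.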

Replication Cohorts and Genotyping
Genotyping was performed in five different cohorts (2077 PD cases and 615 controls) collected from independent sites (Table 2). All individuals were of European Non-Finnish descent. The study was approved by the ethical review board of each institution, and all participants provided informed written consent. A custom Applied Biosystems Taqman SNP Genotyping Assay (Thermo Fisher Scientific, Waltham, MA, USA) was designed for CTSB c.851G > T p.Gly284Val (NM_001908). Genotyping was performed using a QuantStudio 7 Real-Time PCR system. QuantStudio™ 7 Real-Time PCR Software (Thermo Fisher Scientific, Waltham, MA, USA) was used for analysis.
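To put a screen with no carriers into perspective, the following sketch computes an exact one-sided upper confidence bound on the carrier frequency when zero carriers are observed among n individuals. This calculation is not part of the original analysis; the function name is hypothetical, and the bound assumes independent sampling from a single population.

```python
def zero_carrier_upper_bound(n_individuals, confidence=0.95):
    """Upper bound on carrier frequency when 0 of n individuals carry the variant.

    Solves (1 - p) ** n = 1 - confidence for p, which is the largest frequency
    still compatible with observing no carriers at the given confidence level.
    """
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_individuals)


# Replication set described in the text: 2077 PD cases, no additional carriers.
bound_cases = zero_carrier_upper_bound(2077)
print(f"95% upper bound on carrier frequency in cases: {bound_cases:.5f}")
```

For large n the result is close to the familiar "rule of three" approximation, 3/n.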

Gene Burden Analysis
The Mayo Clinic biobank control cohort consists of 885 unrelated samples of European Caucasian descent with no history of neurologic disease. The average age was 57 ± 15 (range 20-96) years with 438 (49.5%) males. The Mayo Clinic Florida familial PD cohort consists of 769 unrelated patients of European Caucasian descent diagnosed with PD and with a family history of PD [32]. The average age was 59 ± 18 (range 23-91) years with 462 (60%) males. All participants provided informed written consent prior to the commencement of this study. A gene burden analysis using SKAT was performed on rare (European, non-Finnish, MAF < 0.01) nonsynonymous variants with a CADD score greater than 20 within the CTSB gene (n = 16). Gender and age were used as covariates.
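The variant selection behind the burden analysis can be sketched as follows. The study used SKAT with gender and age as covariates; the sketch below deliberately swaps in a much simpler collapsing test (carriers versus non-carriers, Fisher's exact test) without covariates, purely to illustrate the idea of aggregating rare variants per gene. The dictionary field names and helper functions are hypothetical.

```python
from math import comb

MAF_MAX = 0.01   # rarity threshold stated in the text
CADD_MIN = 20.0  # CADD threshold stated in the text


def qualifying_variants(variants):
    """Keep rare, nonsynonymous variants with CADD > 20 (field names are hypothetical)."""
    return [
        v["id"] for v in variants
        if v["nonsynonymous"] and v["maf"] < MAF_MAX and v["cadd"] > CADD_MIN
    ]


def count_carriers(genotypes, variant_ids):
    """Count samples carrying at least one qualifying variant.

    `genotypes` maps a sample id to the set of variant ids that sample carries.
    """
    wanted = set(variant_ids)
    return sum(1 for carried in genotypes.values() if carried & wanted)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are cases/controls and columns are carriers/non-carriers. The p-value
    sums the probabilities of all tables that are no more likely than the
    observed one, with the margins held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with the observed table are included.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))
```

Unlike this collapsing test, SKAT is a variance-component test that tolerates variants with opposite directions of effect, so the sketch should not be read as a reimplementation of the reported analysis.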

Protein Sequence and Structure Analysis
A multiple sequence alignment of human CTSB with a series of homologs from different species was performed in T-coffee Expresso [33,34]. The output was processed in ESPript 3 [35] to produce a structure-informed alignment. To visualize the location of Gly284, the crystal structure of human CTSB (PDB: 6AY2) [36], obtained via X-ray diffraction, was chosen for analysis in CCP4mg [37] (Figure 3).

Generation of Fibroblasts and Cell Culture
In an attempt to confirm the pathogenicity of CTSB p.Gly284Val, we compared the CTSB expression and activity using fibroblasts derived from patients and controls with matched age, APOE, and MAPT genotypes. Three independent experiments were performed for each functional analysis.

RNA Extraction and qRT-PCR
Total RNA isolation was performed using TRIzol® Reagent by Life Technologies (Carlsbad, CA, USA). Frozen tissue (55 mg) was homogenized in 1 mL of TRIzol reagent using a pestle and incubated for 5 min at room temperature. Then, 0.2 mL of chloroform was added, and the sample was capped, mixed vigorously by hand for 15 s, and incubated at room temperature for 3 min. The samples were centrifuged at 12,000× g for 15 min at 4 °C. Following centrifugation, the aqueous phase was transferred to a fresh tube containing 0.5 mL of isopropyl alcohol and incubated at room temperature for 10 min, followed by centrifugation at 12,000× g for 10 min at 4 °C. The pellet was washed once with 1 mL of 75% ethanol and centrifuged at 7500× g for minutes at 4 °C. The RNA pellet was left to air dry briefly and reconstituted using nuclease-free water. RNA quality was assessed using the Agilent RNA 6000 Nano kit (Agilent Technologies, Waldbronn, Germany). The High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Cheshire, UK) was used to convert total RNA to single-stranded cDNA. QuantStudio™ Real-Time PCR Software (Thermo Fisher Scientific, Waltham, MA, USA) was used for the gene expression analysis (Taqman® Gene Expression Assay, Probe ID Hs05518041_s1, FAM).
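The text does not state how relative CTSB transcript levels were calculated or which reference gene was used. As one common possibility, the sketch below applies the comparative Ct (2^-ΔΔCt) method; the choice of method, the function name, and the use of a reference gene are assumptions, not details from the study.

```python
def relative_expression(ct_target, ct_reference, ct_target_control, ct_reference_control):
    """Fold change of a target transcript relative to a control sample (2^-ddCt).

    Each Ct is first normalized to a reference (housekeeping) gene measured in
    the same sample; the normalized control value is then subtracted.
    Assumes roughly 100% amplification efficiency for both assays.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Illustrative Ct values only (not data from the study): the target crosses the
# threshold one cycle earlier in the sample than in the control.
fold_change = relative_expression(24.0, 18.0, 25.0, 18.0)
print(fold_change)
```

With this convention the control sample has a value of 1 by construction, which matches the normalization described in the Figure 4 legend.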

CTSB Enzymatic Activity Assay
The CTSB activity was compared between patient-derived fibroblasts and control fibroblasts using a commercially available CTSB activity assay kit (Abcam, Cambridge, UK, ab65300). The harvested cell pellets were resuspended in 50 µL of chilled Cell Lysis Buffer and incubated on ice for 30 min. Then, the samples were centrifuged at 4 °C for 4 min and the supernatants were collected. The protein concentration was determined by the BCA method. To measure the CTSB activity, cell lysates containing 10 µg of protein were loaded into a 96-well plate and incubated at 37 °C with 2 µL of 10 mM CB Substrate Ac-RR-AFC (Abcam, Cambridge, UK) and 50 µL of the activity assay buffer for 2 h. Fluorescence signals in the plate were measured by a 2104 EnVision® Multilabel Plate Reader (PerkinElmer, Inc., Waltham, MA, USA) at Ex/Em 400/505 nm.
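The Figure 4 legend reports data normalized so that the control equals 1 and shown as mean with standard error across three independent experiments. A minimal sketch of that summary step is given below; the data layout, group labels, and function names are assumptions, and the readings in the example are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_to_control(experiments, control="control"):
    """Divide every reading by the control reading from the same experiment.

    `experiments` is a list of dicts, one per independent experiment, mapping
    a group label to its fluorescence reading (arbitrary units).
    """
    return [
        {group: value / exp[control] for group, value in exp.items()}
        for exp in experiments
    ]


def mean_and_sem(values):
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Invented readings for three independent experiments (not study data).
readings = [
    {"control": 100.0, "patient_1": 110.0},
    {"control": 200.0, "patient_1": 180.0},
    {"control": 50.0, "patient_1": 50.0},
]
normalized = normalize_to_control(readings)
patient_mean, patient_sem = mean_and_sem([exp["patient_1"] for exp in normalized])
```

Normalizing within each experiment before averaging removes plate-to-plate differences in absolute fluorescence, at the cost of giving the control group zero variance.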