Proteomics Analysis of Lymphoblastoid Cell Lines from Patients with Amyotrophic Lateral Sclerosis

Amyotrophic lateral sclerosis (ALS) consists of the progressive degeneration of motor neurons, caused by poorly understood mechanisms, and has no cure. Some of the cellular perturbations associated with ALS can be detected in peripheral cells, including lymphocytes from blood. A related cell system that is well suited for research consists of human lymphoblastoid cell lines (LCLs), which are immortalized lymphocytes. LCLs can be easily expanded in culture and maintained for long periods as stable cultures. We investigated, on a small set of LCLs, whether a proteomics analysis using liquid chromatography followed by tandem mass spectrometry reveals proteins that are differentially present in ALS versus healthy controls. We found that individual proteins, and the cellular and molecular pathways in which these proteins participate, are detected as differentially present in the ALS samples. Some of these proteins and pathways are already known to be perturbed in ALS, while others are new and are of interest for further investigation. These observations suggest that a more detailed proteomics analysis of LCLs, using a larger number of samples, represents a promising approach for investigating ALS mechanisms and searching for therapeutic agents. Proteomics data are available via ProteomeXchange with identifier PXD040240.


Introduction
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease also known as Lou Gehrig's disease [1,2]. It manifests as the gradual loss of upper and lower motor neurons in the motor cortex, the brain stem nuclei, and the anterior horn of the spinal cord [1,2]. The incidence of ALS is 1-2.6 per 100,000 people per year, the point prevalence is 3-6 per 100,000 people in Europe and the United States, and the lifetime risk is approx. 1 in 300 [3,4]. The prognosis for survival is 2 to 5 years [3]. There are two types of ALS, differentiated by the genetic background: inherited (familial) and sporadic. The sporadic form represents approx. 90% of all cases [4,5]. The causal factors of sporadic ALS are currently not known. There is no cure or effective treatment for ALS. The treatment consists of multidisciplinary care, including nutritional and respiratory support and symptom management [2,6].
Studying the pathological processes of ALS in the affected cells of living patients is difficult. The pathological changes in ALS affect the motor neurons, which are the neurons that control the muscles. The death of these neurons leads to general progressive paralysis, which is invariably fatal. Human motor neurons are not accessible for biochemical or cellular investigations, except from deceased patients. Post-mortem neurons display processes that occur in the advanced stage of the disease, which is not very informative about how the pathology is initiated. Moreover, these neurons are affected by the delay between death and the time when the cells become available for analysis. This delay causes damage that makes the disease-specific changes more difficult to detect.

Results and Discussion
Differentially present proteins. Proteins differentially present in the ALS versus the healthy donor samples were identified using statistical packages for proteomics: Scaffold, NormalyzerDE [25], proDA [26], and ROTS [27].
The Scaffold package includes two tests: Student's t-test and Fisher's exact test. Fisher's exact test, performed with the Benjamini-Hochberg adjustment for multiple testing and a threshold of p < 0.05, identified 223 proteins with p values below the critical value reported by the test, 0.0178. We excluded from this list the proteins that had fewer than two non-zero values in the eight samples, and the proteins that showed a difference between the averages of the ALS and the control sets of less than 150%, corresponding to a log2 ratio within +/− 0.854. Under these conditions, 120 proteins were differentially present, of which 52 were downregulated in ALS and 68 were upregulated (Table 1). Some of these proteins have been reported to be dysregulated in ALS. For instance, among the 20 most downregulated proteins, 4 are known to be downregulated in ALS: the drebrin-like protein DBNL [28], the calcyclin-binding protein (CacyBP) [29], the UV excision repair protein RAD23 homolog B (R23B) [30], and the fructose-bisphosphate aldolase A (ALDOA), which is downregulated in rapidly progressing ALS [31].
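The critical value mentioned above can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up procedure. This is not Scaffold's code, and how Scaffold derives its reported critical value internally is an assumption here; the function name and the p-values are hypothetical and serve only to show how a critical p value and the number of retained tests follow from a list of raw p values.

```python
def benjamini_hochberg_critical_value(p_values, alpha=0.05):
    """Return (critical_p, n_rejected) for the BH step-up procedure.

    critical_p is the largest sorted p-value p_(k) that satisfies
    p_(k) <= (k / m) * alpha; every test with p <= critical_p is retained.
    Returns (None, 0) when no test passes.
    """
    m = len(p_values)
    critical_p = None
    n_rejected = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / m * alpha:
            # Keep updating: BH uses the LAST rank that passes, not the first.
            critical_p = p
            n_rejected = rank
    return critical_p, n_rejected


# Hypothetical p-values, for illustration only (not data from this study).
example_p = [0.001, 0.008, 0.012, 0.019, 0.041, 0.30, 0.55, 0.74, 0.90, 0.95]
critical_p, n_rejected = benjamini_hochberg_critical_value(example_p, alpha=0.05)
print(critical_p, n_rejected)
```

In this toy list, the fifth p value (0.041) is below 0.05 but is not retained, because its rank-specific limit is 5/10 × 0.05 = 0.025. This is why the critical value reported for a data set can be well below the nominal threshold.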
Among the 20 most upregulated proteins in ALS, the trifunctional purine biosynthetic protein adenosine-3 (glycyl-tRNA synthetase (GART) enzyme) has a potential connection with ALS: a dominant mutation in this protein produces a toxic gain of function that causes a peripheral neuropathy principally affecting the upper limbs [32].
If the proteins that we detected as differentially expressed are indeed perturbed in ALS, and are not just statistical noise, proteins known to be perturbed in ALS should be much less frequent among the proteins that are not differentially expressed in our samples. This is indeed the case: among 25 proteins that are not differentially expressed (having ratios of ALS/healthy between 0.7 and 1.3), none was reported to have differential expression in ALS. Compared to this frequency (0/25), the frequency of ALS-associated proteins among the 20 most downregulated proteins is 4/20 = 20%. The difference is statistically significant: because the test cannot use a reference frequency of zero, we used the next higher frequency, 1/25, as the reference and compared it with 4/20, which leads to p(Exp-Obs>=3) = 0.007 by the binomial test, two-sided.
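The binomial comparison above can be reproduced with a short sketch. It assumes that the reference frequency 1/25 is used as the per-protein probability and that the quantity of interest is the probability of observing at least 4 ALS-associated proteins among 20; the function name is illustrative and not taken from any package used in the study.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Probability of at least k successes in n trials, X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_reference = 1 / 25   # reference frequency used in place of 0/25
n_proteins = 20        # the 20 most downregulated proteins
n_known = 4            # proteins already reported as downregulated in ALS

p_value = binomial_upper_tail(n_known, n_proteins, p_reference)
print(round(p_value, 3))
```

With these inputs the tail probability is close to the reported value of 0.007. Under this reference probability, no count below the expected value of 0.8 is as improbable as a count of 4, so a two-sided exact test that sums outcomes no more probable than the observed one gives the same figure.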
Other strategies frequently used for detecting differentially present proteins in proteomics data are based on the statistical package limma ("linear models for microarray data"). Limma was initially developed for DNA microarray analysis and subsequently used extensively for the analysis of RNA-seq data [33]. More recently, limma was applied to the analysis of proteomics data [31,34]. We analyzed our data using three packages based on limma. We submitted to these packages the same set of 120 differentially expressed proteins mentioned previously. NormalyzerDE allows the differential expression analysis of liquid chromatography-mass spectrometry data using the empirical Bayes limma approach [25]. The software was accessed online via a web server. The proteomics data across the eight samples were normalized using the CycLoess procedure available in this package. Subsequently, the data were log2 transformed using the option available in the package. The analysis revealed three differentially expressed proteins (adj p < 0.1): transaldolase (ALDOA), phosphoribosyl-formyl-glycinamidine synthase (PFAS), and the ATP synthase, H+ transporting, mitochondrial F1 complex, alpha subunit 1 (ATP5F1A or ATP5A1). Two of these proteins, ALDOA and PFAS, were also detected as significantly differentially present by the Scaffold Fisher's exact test (Table 1). Among the three differentially expressed proteins, two are known to be downregulated in ALS: ALDOA [24] and ATP5F1A [35].
ROTS (reproducibility-optimized test statistic) is a Bioconductor R package that adjusts a modified t-statistic according to the inherent properties of the data and provides a ranking of the features based on the differential expression between the two groups [27]. ROTS has been applied successfully in a range of studies from transcriptomics to proteomics, showing a competitive performance against other state-of-the-art methods. The file generated by Scaffold was normalized using the R function "normalize.quantiles" from Bioconductor. The value 0.3 was added to all values to allow the application of the log function to the initial zero values, and the data were subjected to a log2 transformation to correct the high skewness of the primary data. One protein was detected as significantly downregulated (FDR < 0.1), transaldolase (ALDOA) (average 36.6 in controls versus 0 in ALS). As mentioned, ALDOA is known to be downregulated in ALS.
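The preprocessing described above (quantile normalization, addition of 0.3, log2 transformation) can be sketched as follows. This is a simplified stand-in and not the R function "normalize.quantiles": ties are broken by position rather than averaged, missing values are not handled, and the function names and spectral counts are hypothetical.

```python
from math import log2


def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one column per sample).

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n_rows = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value over samples.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n_rows)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def log2_with_offset(columns, offset=0.3):
    """Add a small offset so that zero counts can be log2-transformed."""
    return [[log2(value + offset) for value in col] for col in columns]


# Hypothetical spectral counts: 3 samples (columns) x 4 proteins.
counts = [
    [5.0, 0.0, 12.0, 3.0],
    [8.0, 1.0, 20.0, 2.0],
    [4.0, 0.0, 10.0, 6.0],
]
normalized = quantile_normalize(counts)
transformed = log2_with_offset(normalized, offset=0.3)
```

After normalization every sample has the same set of values, only assigned to different proteins, which is the defining property of quantile normalization. The offset is applied after normalization, following the order given in the text.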
proDA (inference of protein differential abundance by probabilistic dropout analysis) is a recently introduced method for identifying differentially abundant proteins in label-free mass spectrometry, which boosts statistical power for small sample sizes by using variance moderation [26]. The method is implemented as an open-source R package [26]. As the method requires log2 transformed data, 0.3 was added to all the values. Two proteins were detected as significantly differentially expressed (p < 0.1). The first is gamma enolase (ENO2), p-adj = 0.0032, average 23.9 in controls vs. 12.5 in ALS. ENO2 is also detected as significantly downregulated in ALS by the Fisher's exact test in Scaffold. The second is HSPC108, downregulated in ALS (0.25 in ALS vs. 2.65 in controls), also detected as differentially expressed in Scaffold by Fisher's exact test. ENO2 levels are known to be elevated in the cerebrospinal fluid (CSF) of ALS patients [36], and in the CSF and serum of patients with a cervico-thoracic form of ALS and of patients with a disease duration of 1 to 4 years [37].
Pathway analysis. Ingenuity pathway analysis (IPA). The list of differentially expressed proteins (Table 1) was submitted to the QIAGEN IPA package (https://digitalinsights.qiagen.com/IPA [38], accessed on 15 September 2022). The 20 most statistically enriched pathways are listed in Table 2. Perturbations in 6 of these 20 pathways (30%) have been reported to occur in ALS, as detailed below. The pathway with the most statistically significant enrichment is the "EIF2 pathway". EIF2α is a key factor for the initiation of protein translation. Recent studies suggest that endoplasmic reticulum stress may play a critical role in ALS pathogenesis through an altered regulation of proteostasis, the cellular pathway that balances protein synthesis and degradation. EIF2α is a key factor in this process [39]. In response to proteotoxic stress, EIF2α is phosphorylated by MARK2, which is activated via phosphorylation in human patients with ALS [40]. Our differentially expressed protein set is also enriched in components of the "Ran signaling pathway". Ran is a key effector of nucleocytoplasmic transport, and its deficit could play an important role in the pathology of ALS [40-42]. The differential protein set is also enriched in members of the "Protein Ubiquitination pathway", the disruption of which is a widely accepted factor in ALS [43]. Another enriched pathway is the "Antigen presenting pathway", which was reported to modulate the age of onset of ALS [44]. Blood levels of components of the "Interferon-gamma (IFN-γ) pathway" differ significantly in ALS patients [45], and the components of this pathway are enriched in our differential protein set. Another pathway that presents enrichment is the "Actin Cytoskeleton Signaling", which contributes to motor neuron degeneration [46]. The modulation of actin polymerization affects nucleocytoplasmic transport in multiple forms of ALS [47].
Two other pathways associated with neuronal pathology are also in the top 20 of the enriched pathways: "Multiple Sclerosis Signaling Pathway" and "Neuro-inflammation Signaling Pathway". This observation supports the assumption that although LCLs are peripheral cells, some processes in these cells correlate with processes in neuronal cells, and therefore LCLs can be used to derive information about the pathology of the neuronal cells affected by the disease.
The last column in Table 2 ("Ratio") represents the ratio between the number of proteins that belong to the specified pathway and are differentially present in ALS, and the total number of proteins in the pathway. The average ratio is 6.7% (range 2.1-21%). These percentages seem low, but it should be taken into account that we detected only approx. 1300 proteins, which probably represent only approx. 10% of the total number of proteins expressed in LCLs. Based on this assumption, it can be extrapolated that if all the proteins were detected, the "coverage" of the differential pathways would be on average approx. 67%, that is, the majority of the proteins participating in the top 20 pathways.
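The extrapolation above amounts to dividing the observed ratio by the fraction of the proteome that was detected. The sketch below makes that arithmetic explicit; it assumes, as the text does, that about 10% of the expressed proteins were detected, and additionally that the differential proteins are distributed in the same proportion among detected and undetected proteins. The function name is illustrative, and the result is capped at 100% because a coverage cannot exceed the whole pathway.

```python
def extrapolated_coverage(observed_ratio, detected_fraction):
    """Scale an observed pathway ratio to full proteome coverage (max 1.0)."""
    return min(1.0, observed_ratio / detected_fraction)


detected_fraction = 0.10  # approx. 1300 detected proteins, taken as ~10% of the total
average_ratio = 0.067     # average "Ratio" over the top 20 pathways (Table 2)

average_coverage = extrapolated_coverage(average_ratio, detected_fraction)
print(round(average_coverage, 2))
```

Applied to the ends of the reported range, the same scaling gives about 21% for the lowest ratio (2.1%) and reaches the 100% cap for the highest ratio (21%), which indicates that the estimate is only a rough average.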
pathfindR is an R package for the identification of enriched pathways based on protein-protein interaction networks [48]. pathfindR can identify relevant pathways which cannot be identified by other tools [48]. The package can use five protein interaction networks (Biogrid, STRING, GeneMania, IntAct, KEGG) and seven gene sets: GO-CC (cellular component), GO-BP (biological process), GO-MF (molecular function), GO-All (all combined), KEGG, Biocarta, and Reactome. These combinations detect a total of 223 significantly enriched pathways (p < 0.05). The combination Biogrid-Reactome detects the highest number (119) of significantly enriched pathways.
The most frequent significantly enriched pathways are related to rRNA, ribosomes, protein translation (15 occurrences), antigen presentation (9), protein ubiquitination-protein degradation (6), and apoptosis regulation (9). Some of these perturbations have been detected previously in ALS. Relevant for the first category, the following perturbations in ALS have been reported: a profound destabilization of ribosomal and mitochondrial RNAs [49], perturbation in the ribosomal function [50], and increased ribosome numbers in axons as an early event [51]. Regarding the second category, protein homeostasis, ALS is associated with proteostasis collapse [43]. Central to the maintenance of proteostasis are the predominant protein degradation pathways, the ubiquitin-proteasome system (UPS), and the autophagy system.
Gene set enrichment analysis (GSEA) is a computational method that determines whether a set of genes shows statistically significant similarity with gene sets from a large collection that have certain characteristics, e.g., participate in specific cellular or molecular processes, or are perturbed in diseases or by various interventions, e.g., specific gene modifications or treatment with chemical agents [52]. GSEA was initially introduced for the analysis of cDNA microarrays and was subsequently adapted for RNA-Seq data. Based on the correlation between the RNA and protein levels, the method could also provide information about protein sets, as illustrated below. Compared to other pathway identification methods, many of the gene sets available in GSEA are not pieced together from interactions between protein pairs or protein subsets but are the direct results of RNA-seq or microarray experiments, with no theoretical deductions regarding the pathway that may or may not be actually functioning in cells.
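The core statistic of GSEA can be sketched as a weighted running sum over a ranked gene list, following the general description of the method [52]. The code below is an illustration and not the GSEA software: the function name, the gene identifiers, and the ranking values are hypothetical, and the permutation step used to estimate significance is not shown.

```python
def enrichment_score(ranked_genes, ranking_metric, gene_set, weight=1.0):
    """Running-sum enrichment score of one gene set in a ranked list.

    ranked_genes   -- gene identifiers sorted from most up- to most down-regulated
    ranking_metric -- dict mapping each gene to the value used for ranking
    gene_set       -- set of genes whose enrichment is tested
    weight         -- exponent applied to the metric (1.0 = weighted statistic)
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    hit_norm = sum(
        abs(ranking_metric[g]) ** weight for g, is_hit in zip(ranked_genes, hits) if is_hit
    )
    if hit_norm == 0:
        return 0.0
    running, best = 0.0, 0.0
    for gene, is_hit in zip(ranked_genes, hits):
        if is_hit:
            # Step up, in proportion to the gene's weight within the set.
            running += abs(ranking_metric[gene]) ** weight / hit_norm
        else:
            # Step down by a constant amount for every gene outside the set.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


# Hypothetical ranked list: positive = higher in ALS, negative = lower in ALS.
metric = {"A": 2.0, "B": 1.5, "C": 0.5, "D": -0.5, "E": -1.0, "F": -2.0}
ranked = sorted(metric, key=metric.get, reverse=True)
es_down = enrichment_score(ranked, metric, {"E", "F"})
es_up = enrichment_score(ranked, metric, {"A", "B"})
print(es_down, es_up)
```

A negative score indicates that the members of the set cluster at the downregulated end of the ranked list, and a positive score indicates clustering at the upregulated end.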
From the list of 614 proteins present in at least 2 samples, as generated by Scaffold and having associated NCBI gi tags, GSEA identified 484 proteins. The parameters used for analysis were: 1000 permutations; no collapse (i.e., data are used in the original format); enrichment statistic: weighted; max size 500; min size 8; normalization mode: none. Eight types of cumulative gene sets are available in GSEA for comparison with the submitted set. The significance threshold used was p < 0.05 and the FDR (false discovery rate) threshold was 0.25. Across all 8 sets, GSEA identified 147 sets that are significantly enriched in genes present in our set of ALS differentially expressed genes. Six of these are clearly relevant for ALS, based on published information about processes perturbed in ALS, or interventions that could alleviate ALS. These enrichments are shown in Figure 1.
(a) "E2F1_UP.V1_UP" set, which contains genes up-regulated by everolimus. Everolimus, an inhibitor of the TORC pathway, is a derivative of rapamycin, which is an mTOR inhibitor. This signaling pathway is upregulated in ALS and reduces autophagy. Rapamycin has been evaluated for ALS therapy by the restoration of autophagy [53].
(b) "Rutella response to HGF_DN", this set contains genes induced by HGF (hepatocyte growth factor), which is currently explored as a potential therapy for ALS [54]. The implication is that the LCLs from ALS patients present a reduction in proteins whose restoration by HGF could treat ALS.
(c) and (d) "KEGG spliceosome" and "module 183 cancer RNA splicing". The RNA splicing process is known to be defective in ALS [55].
(e) "QI_Hypoxia", which contains HIF-induced genes, as detected by applying hypoxia to a prostate cancer cell line. Deficiency in HIF1-alpha signaling is known to occur in ALS, due to defective import of HIF1-alpha from the cytoplasm into the nucleus, which is necessary to trigger the induction of hypoxia-responsive genes [56,57].
(f) and (g) "GOBP positive regulation of protein localization to nucleus" and "GOBP regulation of protein localization to nucleus"; there is ample evidence that nucleocytoplasmic transport deficits could play an important role in the pathology of ALS [40].
In all the above cases, the enrichment occurs for genes that are downregulated in the ALS LCLs, and the direction of the perturbation is consistent with the pathological significance of the GSEA pathways. This observation further indicates that the perturbations detected in LCLs in our proteomics study correctly reflect the pathological processes that occur in ALS in human patients.
An interesting observation is that the proteins upregulated in our data do not appear to belong to any functional pathways. The explanation could be that the common feature of these proteins is not participation in some biological pathways, but common physical characteristics. For instance, these proteins could be more prone to folding errors, or more resistant to degradation by the unfolded protein response machinery, leading to the accumulation of these proteins, which then misleadingly appear to be upregulated. Notably, such a defect cannot be detected by RNA-seq or microarray studies, but can be revealed by proteomics, which detects protein fragments and does not depend on the folding state of the proteins.
In conclusion, the proteomics analysis of LCLs derived from ALS donors versus controls revealed statistically significant differences in some proteins and pathways. A subset of these pathways has been previously reported to be perturbed in ALS, while others have not been previously linked to ALS. The latter observation suggests that proteomics of LCLs from ALS donors versus controls could be a useful strategy to derive new information about the ALS pathological processes and identify new markers that could be used for drug discovery.
Compared to primary cells freshly obtained from donors, LCLs offer a stable platform, which can be used repeatedly over long periods, to compare the effects of various interventions. Moreover, the cells can be expanded to large numbers, sufficient for drug discovery by high content screening of thousands of small compounds, or for genome-wide gene knockout or overexpression.
Apart from the statistical significance reported by the software packages used, proteins known to be perturbed in ALS are significantly more frequent among the proteins detected as differential by the proteomics study. This increases the confidence that the proteomics analysis, although performed on a small number of samples, indeed detects proteins differentially expressed in ALS.
A new hypothesis supported by this study is that many proteins upregulated in ALS do not belong to specific upregulated pathways, but rather share either a higher propensity to misfold or, once unfolded, a higher resistance to degradation by the unfolded protein response (UPR) system. Investigations aimed at finding such common characteristics, such as common motifs in the protein structures, could shed a new mechanistic light on ALS, and could suggest new therapies.
Overall, the study yielded promising results, which support the utility of LCLs, and in particular of the proteomics approach, for ALS studies. Further studies using higher numbers of samples and other proteomics methods could be used to confirm and extend the reported observations.

Cell Lines
ALS and control lymphoblastoid cell lines (B-lymphocytes immortalized with the Epstein-Barr virus) [9,10] were obtained from the NINDS Repository at Coriell Cell Biorepositories, Camden, New Jersey. In total, 4 cell lines originated from ALS donors (2 females and 2 males, 44 years old) and 4 lines were derived from healthy donors (2 females and 2 males, 41 to 47 years old). The cells were maintained in T25 flasks in a vertical position in 10 mL of RPMI 1640 medium containing 10% heat-inactivated fetal bovine serum (FBS), penicillin/streptomycin at 100 U/mL and 100 µg/mL, in an incubator in 5% CO2 at 37 °C.
Protein extraction. Each cell line was grown to 500,000 cells/mL in 2 T25 flasks with 10 mL of medium. The proteins from FBS were removed by placing the cells from each flask in 50 mL conical tubes and centrifuging at 1410 RPM for 10 min at 4 °C, followed by resuspension of the pellets in 40 mL of washing buffer (0.29 M mannitol, 10 mM Tris pH 7.4). The centrifugations were repeated 3 times. The last pellet was resuspended in 1 mL of cold lysis buffer (1 M NaOH) after adding to the pellet 10 µL of Protease and Phosphatase Inhibitors Cocktail (MSSAFE-1VL, Sigma Aldrich, St. Louis, MO, USA). The pellet was resuspended by repeated pipetting and incubated on ice for 30 min, followed by microfugation at 14,000 RPM for 20 min at 4 °C. The supernatant was collected and the protein concentration was measured using the BCA colorimetric assay kit (Pierce Cat. No. 23227). Samples of 250-300 µg of protein were flash-frozen in liquid nitrogen and shipped for mass spectrometry analysis.
Mass spectrometry. Sample pre-processing. The samples were subjected to desalting and buffer exchange using Amicon Ultra 0.5 mL 3 kDa centrifugal filters (Millipore, Burlington, MA, USA), as previously described [58]. The total volume was brought to 200 µL using HPLC-grade water and centrifuged for 10 min at 13,000 RPM. HPLC-grade water (100 µL) was added to each filter and centrifuged for another 5 min at 13,000 RPM. The volume left in the filter (70-100 µL) was pipetted repeatedly up and down. The filter was then placed upside down in a new 1.5 mL tube and centrifuged for 2 min at 3500 RPM. The proteins left in the filter were collected by placing 100 µL of HPLC water in the filter and repeating the transfer to the new 1.5 mL tube. The protein concentration was then measured using a Bradford assay. Volumes of solution containing 200 µg of protein were dried down completely in a SpeedVac, resuspended in 20 µL of 6 M urea, 100 mM of Tris buffer pH 7.8, and sonicated for 30 min. Each sample was reduced by adding 1 µL of a solution containing 200 mM of dithiothreitol (DTT) and 100 mM of Tris buffer, gently vortexed, and incubated at room temperature (RT) for one hour. Subsequently, the samples were alkylated using 4 µL of the alkylating agent, 200 mM of iodoacetamide (IAA) and 100 mM of Tris buffer, then gently vortexed and incubated in the dark for an hour at RT. Each sample was reduced again by adding 4 µL of the same reducing agent as before. HPLC water (155 µL) was added to each sample to reduce the urea concentration, and then 20 µg of trypsin in 100 µL of 20 mM Tris was added and the samples were incubated for 16-18 h at 37 °C. The reaction was stopped by adding a drop of glacial acetic acid to bring the pH < 6, and the samples were dried completely in a SpeedVac. The dried samples were then resolubilized in 100 µL of 0.1% formic acid and sonicated for 15 min.
For each sample, a volume of 50 µL of 50% acetonitrile and 0.1% formic acid was pushed through 1 mg top-tips (Glygen) 3 times into waste, followed by 3 washes using 0.1% formic acid. The samples were added to the top-tips, pushed into the same 1.5 mL tube, reapplied to the same top-tip, and pushed into the same tube. Then, 50 µL of 0.1% formic acid was pushed through the tip into waste twice. After that, 50 µL of elution buffer (50% acetonitrile with 0.1% formic acid) was added to the tip and pushed into a new 0.6 mL tube. The same 50 µL volume was then pushed through the tip a second time. A new volume of 50 µL elution buffer was added to the tip and pushed into the same 0.6 mL tube, and then pushed again through the tip. The total volume in the new 0.6 mL tube was 100 µL. The samples were then dried down completely and re-solubilized in 50 µL 2% acetonitrile and 0.1% formic acid.
Proteomics analysis. The samples were analyzed using a NanoAcquity UPLC coupled with a QTOF Xevo G2-XS mass spectrometer (both from Waters Corp, Milford, MA, USA). The peptides were loaded on a NanoAcquity BEH130 C18 1.7 µm reversed phase chromatography UPLC column (Waters, Milford, MA, USA) that was coupled to a fused silica nano-ESI emitter (363 µm OD × 20 µm ID × 6.25 cm length, New Objective, Littleton, MA, USA). The samples (3 µL, 12 µg protein) were injected onto the column followed by a linear gradient with a flow rate of 0.6 µL/min: 1-7% organic solvent B (acetonitrile with 0.1% formic acid) for 1-9 min, 7% B (9-15
Protein identification. The raw data were submitted to ProteoWizard MSConvert to convert the input raw data to mzML readable data, and then the mzML data were further submitted to our in-house Mascot Daemon server (Matrix Science, London, UK) for database search. The following parameters were used: NCBI database (Homo sapiens) with a peptide tolerance of 1.3 Da and zero 13C, an MS/MS tolerance of 0.8 Da, trypsin enzyme, 3 maximum missed cleavages, a fixed modification of carbamidomethyl on cysteine, and a variable modification of oxidation on methionine.
The output of the database search (.dat files) was further analyzed by the software package Scaffold v. 4.3.4 (Proteome Software, Portland, OR, USA) for protein identification, label-free quantitation, and statistical analysis. The data files were loaded and analyzed with the following options: protein threshold 90%, minimum number of peptides = 1; peptide threshold 80%; total spectrum count, no filter. The threshold for the minimum number of peptides was selected as 1, to avoid missing true positives. The caveat is that the number of false positives is higher than for more stringent thresholds.
To assess the integrity of the data and of the proteins identified, we tightened the thresholds for protein identification. The thresholds used in this publication were a protein threshold of 90%, a minimum number of peptides of 1, and a peptide threshold of 80%. These settings had an FDR of 1.3% for proteins and an FDR of 2.68% for spectra. To tighten them, we increased the peptide threshold and the minimum number of peptides identified. We performed 3 trials, each with a more stringent threshold. The thresholds used were as follows: protein threshold of 90, min peptide of 1, and peptide threshold of 90 (90-1-90); protein threshold of 90, min peptide of 2, and peptide threshold of 90 (90-2-90); and, finally, protein threshold of 90, min peptide of 2, and peptide threshold of 95 (90-2-95). The first variation, 90-1-90, had an FDR of 1.3% for proteins and an FDR of 1.49% for spectra. The second, 90-2-90, had an FDR of 0.4% for proteins and an FDR of 1.46% for spectra. The final trial, 90-2-95, had an FDR of 0.3% for proteins and an FDR of 0.82% for spectra. Under these tighter thresholds, we were able to identify almost all of the proteins in our tightest constraint, and others were still found, but were considered to be insignificant (p-value ≤ 0.05) under the specific conditions. This confirmed that the proteins identified in this study are most likely true positives.
Proteins differentially expressed between the ALS and the control cells were assessed using Scaffold and the software packages NormalyzerDE [25], proDA [26], and ROTS [27].
Author Contributions: A.R., C.C.D. and D.W. performed the experimental design; A.R. selected and prepared the cell lysates; D.W. and C.C.D. performed the mass spectrometry experiments and the primary data analysis; A.R. and E.B. computed the differential expression, the pathway analysis and the biological significance; A.R., C.C.D. and D.W. wrote the manuscript. All authors have read and agreed to the published version of the manuscript.