Absolute Quantification of Pan-Cancer Plasma Proteomes Reveals Unique Signature in Multiple Myeloma

Simple Summary
A precise mass spectrometry-based method was utilized to study proteins in the blood samples of over a thousand cancer patients. By accurately identifying and measuring protein levels using mass spectrometry, we focused on multiple myeloma and found potential markers for diagnosing the disease. These markers, including the complement C1 complex, JCHAIN, and CD5L, were combined in a prediction model with high accuracy for identifying multiple myeloma patients. Our findings could significantly impact cancer research by improving diagnostic tools.

Abstract
Mass spectrometry based on data-independent acquisition (DIA) has developed into a powerful quantitative tool with a variety of implications, including precision medicine. Combined with stable isotope recombinant protein standards, this strategy provides confident protein identification and precise quantification on an absolute scale. Here, we describe a comprehensive targeted proteomics approach to profile a pan-cancer cohort consisting of 1800 blood plasma samples representing 15 different cancer types. We successfully performed an absolute quantification of 253 proteins in multiplex. The assay had low intra-assay variability with a coefficient of variation below 20% (CV = 17.2%) for a total of 1013 peptides quantified across almost two thousand injections. This study identified a potential biomarker panel of seven protein targets for the diagnosis of multiple myeloma patients using differential expression analysis and machine learning. The combination of markers, including the complement C1 complex, JCHAIN, and CD5L, resulted in a prediction model with an AUC of 0.96 for the identification of multiple myeloma patients across various cancer patients. All these proteins are known to interact with immunoglobulins.


Introduction
Cancer accounts for more than ten million deaths worldwide and is considered the second most common cause of mortality today. It comprises more than 200 different subgroups, making it a very heterogeneous disease. However, disease progression typically follows the same trajectory in diverse cancers. These general features of the disease can be described as events in which cells undergo dysregulated and autonomous growth, eventually breaching tissue boundaries and leading to metastasis [1]. Molecular events resulting from alterations in the genome of cancer cells are critical components for identifying cancer type-specific biomarkers and have been studied and mapped for several decades [2][3][4]. Recent advances in high-throughput sequencing technologies have enabled large-scale efforts to accurately map genomic alterations across many different types and subtypes of cancer.

Investigated Targets and Analytical Performance
An assay covering 1013 peptides from 253 proteins was established using spiked SIS-PrESTs in a plasma background. In total, 146 proteins were absolutely quantified spanning more than six orders of magnitude in concentration range (Figure S1) with 395 SIS-PrEST [...] 128 (51%) were actively secreted proteins in blood according to the annotation of the human secretome [19]. To assess the performance and reproducibility of the targeted assay, we determined the intra-assay variation using pool samples randomly distributed onto all plates. The intra-assay coefficient of variation (CV) ranged from 7.16% to 20% per plate (median CV = 11.3%) and was 17.2% across all 31 plates. The pool samples across all plates were highly correlated, with a median Pearson's r of 0.99 (Figure 2B). The overall biological variation between all subjects was low, with a median normalized IQR = 1.7 (Figure 2C). This signifies that even with the large variety of cancer types, most of the absolute protein variation was less than two-fold when compared between different cancers. However, two proteins with notably high interindividual variation were observed: pregnancy zone protein (PZP) and apolipoprotein(a) (LPA). The large difference in PZP was due to its 10-fold higher concentration in the female population compared to males. The LPA variability was caused by its quantification peptide overlapping with a repeated kringle domain whose count is dependent on the individual's genotype, as reported in [20]. Overall, it was possible to report absolute concentration measurements of plasma proteins with low bias with respect to their levels.
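The intra-assay variation reported above can be illustrated with a minimal sketch, assuming that the pool replicates are available as concentrations per peptide and plate. The data layout, the function names and the toy values are hypothetical and serve only to show how a per-plate median CV could be derived; they are not taken from the study.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation in percent: sample SD divided by the mean."""
    return stdev(values) / mean(values) * 100.0


def median_cv_per_plate(pool):
    """Return the median peptide CV for each plate.

    `pool` maps plate -> peptide -> list of pool-replicate concentrations
    (a hypothetical layout chosen for this illustration).
    """
    return {
        plate: median(cv_percent(reps) for reps in peptides.values())
        for plate, peptides in pool.items()
    }


# Toy pool measurements for a single plate (invented numbers).
toy_pool = {
    "P01": {
        "PEPTIDE_A": [10.0, 11.0, 9.0],      # CV = 10%
        "PEPTIDE_B": [100.0, 120.0, 80.0],   # CV = 20%
    }
}
plate_cvs = median_cv_per_plate(toy_pool)
```

With more than one plate, the same dictionary yields the range of per-plate CVs, and pooling all replicates of a peptide across plates before calling `cv_percent` would give an across-plate value.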

Identification of Potential Biomarkers
Differential expression analysis was performed comparing each cancer type against the rest on the peptide level (Figure 3A). In cases of male- and female-specific cancers, only the patients of the relevant sexes were compared. We could observe that in cases where proteins were quantified with multiple peptides, all of them were significantly downregulated or upregulated. Here, cases such as the platelet basic protein (PPBP) in acute myeloid leukemia, which was previously identified as a dysregulated hub gene, can be highlighted [21]. Other examples are C1qB and C1qC, for which four peptides were identified as significantly downregulated in MM. All the targets identified as differentially expressed have been summarized in a protein network (Figure 3B). In further investigation, we focused on the unique protein pattern of MM patients, which formed an isolated island of four proteins, all part of the complement C1 complex.

Downregulation of Components of the C1 Complex in Multiple Myeloma
Multiple myeloma, a heterogeneous hematologic malignancy, displays dysregulation within the plasma proteome due to the accumulation of plasma cells within the bone marrow, which ultimately displace healthy blood cells. In this study, a profound alteration in the plasma proteome of multiple myeloma patients was observed in the label-free MS data acquired alongside the absolute quantification MS data. Notably, this alteration stemmed from a widespread reduction in peptides linked to IGHM and IGHA1, as seen in the volcano plot (Figure S2A). The label-free data also showed a couple of non-IgG-related proteins that undergo dysregulation in multiple myeloma, among them albumin and APOA1, in line with the established literature [22].
The diagnosis of multiple myeloma relies on the detection of a monoclonal spike (M spike), often originating from lambda or kappa light variable chains detected by electrophoresis. Interestingly, the kappa variable chain KV37 exhibited the most substantial increase, with a fold change of 82.6. However, this elevation of a single light chain was not present in all patients and therefore did not exceed the statistical threshold.
Differential expression analysis revealed a significant decrease in plasma levels of four components of the complement C1 complex in MM patients (Figure 4A), namely the proteins C1qB, C1qC, C1r and C1s. Interestingly, this observation was specific to MM and was not detected in any other cancer, including the three immune cell malignancies: lymphoma, acute myeloid leukemia (AML), and chronic lymphocytic leukemia (CLL). We could observe this effect in our label-free data as well (Figure S2B).
The C1 complex, as part of the innate immune system, initiates the activation of the classical complement pathway. It consists of five components: C1q is built up from C1qA (not quantified), C1qB and C1qC [23], and the further components are the peptidase C1r and the serine protease C1s. Complement activation occurs after the binding of the globular domain of C1q to target molecules, including IgM and IgG (Figure 4B). The binding of C1q initiates the activation of C1r, which in turn leads to C1s activation. The activated C1s initiates the subsequent proteolytic complement cascade, leading to cell lysis, the activation of phagocytes, and the induction of inflammation [23]. The complement system and its activation or suppression have been related to pro- as well as anti-tumoral effects in a wide variety of cancers [24].

Decreased JCHAIN and CD5L Plasma Levels Distinguish Multiple Myeloma
To further identify the unique patterns within the plasma proteome of MM patients, we trained a model based on a random forest algorithm to predict disease outcome. We could distinguish MM patients from all other cancer diagnoses with high confidence based on their plasma protein signature (AUC = 0.96) (Figure 5A). Here, the model identified the downregulated plasma proteins JCHAIN and CD5L as the most powerful features for separating MM from the 14 other cancer types (Figure 5B). Further proteins that defined the plasma protein signature of MM in our study were the previously described decreased levels of the complement proteins C1q, C1r, and C1s, as well as upregulated TGFBI, CFD, and MGP and downregulated CBPN (Figure S2). Eight of these nine proteins are linked to the regulation of the complement system and interaction with immunoglobulins. As a joining chain, JCHAIN connects the Fc regions of IgM and IgA and is necessary for the transport of these polymeric immunoglobulins across epithelial cells [25]. JCHAIN-negative IgM has been reported to induce a stronger complement activation than JCHAIN-positive IgM [25][26][27].
As quantitative information was not available on either IgM or IgA in the analyzed patients, it was not possible to specify whether decreased JCHAIN levels were accompanied by decreased IgM levels. However, we also found the circulating protein CD5L, also called apoptosis inhibitor of macrophage, to be downregulated. CD5L has been reported to be an integral part of IgM; it binds to the Fc region of IgM and uses the immunoglobulin as a carrier, which prevents its renal excretion [28][29][30]. We found CD5L and JCHAIN to be highly dependent (Figure S4), as recently shown by Oskam and coworkers [31]. As for TGFBI, CFD, and MGP, MM patients displayed the highest median plasma concentration in comparison to the other cancer patients. TGFBI has been reported to be both tumor-suppressive and tumor-promoting in multiple cancers, depending on the cancer progression [32].

Discussion
Cancer is the second leading cause of mortality in the world. Improved methods that can detect changes in cancer-associated proteins are needed. Our study presents a large pan-cancer initiative in which proteins were absolutely quantified in human plasma using SIS-PrEST technology. The analytical strategy based on SIS-PrESTs was capable of quantifying proteins with high precision, as the standards are long polypeptides added in the first step of sample preparation. Therefore, the biological variance across protein targets could be accurately measured. A disadvantage of using internal standards for quantification is that they have to be spiked in prior to the sample preparation and DIA analysis. Therefore, the availability of targets for absolute quantification can be a limiting factor. However, the DIA strategy acquires data on all detectable proteins in the samples, so the label-free part of the dataset can be explored to identify proteins, albeit without the precision of absolute quantification. Furthermore, the analytical sensitivity of today's non-depleted targeted proteomics measurements is limiting. Today's data acquisition of blood-based tests is comprehensive and has a promising quantitative performance, but the assay is restricted by the dynamic range of plasma, limiting its full potential. Notably, more than 56 FDA- or CLIA-approved biomarkers could be measured in multiplex using this strategy, and by not depleting the plasma, the quantitative integrity of the samples can be assured. This shows that targeted proteomics is an attractive alternative to more sensitive methods based on affinity reagents.
Within this study, we identified proteins that are implicated in the regulation of complement activation and interaction with immunoglobulins and which we suggest as a plasma biomarker panel for MM. These target proteins include JCHAIN, CD5L and four proteins of the C1 complex. Notably, the protein most predictive for MM in the random forest model was JCHAIN, which links two monomer units of either IgM or IgA together. In the case of IgM, the JCHAIN dimerizes and acts as a nucleating unit for the IgM pentamer. The work of Wang et al. [33] has shown that the loss of CD5L converts non-pathogenic Th17 cells into pathogenic cells, causing autoimmunity. By altering lipids, CD5L affects Rorγt, the master regulator of Th17 cells, shifting the immune balance. As CD5L is a major switch of Th17 cell functional states in vivo, this may indicate that Th17 cell functions are dysregulated in myeloma patients.
The role of the complement system and its components as potential biomarkers for cancer has been debated in the literature. Here, the role of C1 in cancer has been a double-edged sword. In clear-cell renal-cell carcinoma, the tumor-induced formation of the complement C1 complex in its microenvironment has been described [24]. In contrast, the protein C1q has previously been related to pro-apoptotic and anti-tumor activities in prostate, breast or ovarian cancers [34][35][36]. Furthermore, decreased serum levels of C1q have been described in patients with MM and have been suggested as a potential biomarker for the tumor burden [37]. The systemic decrease of C1q levels in plasma highlighted in our study supports previous findings. Furthermore, we observed a decrease not only of C1q but also of C1r and C1s, suggesting a downregulation of all the proteins forming the C1 complex. Interestingly, this decrease does not extend to other complement proteins, which highlights the proteins related to the C1 complex as possible biomarkers in MM.
The level of TGFBI has been suggested as a biomarker for tumor progression [32]. Another upregulated protein, CFD, is part of the alternative complement pathway. CFD is a serine protease that cleaves factor B to form C3-convertase [38]. Whereas decreased CFD levels have been reported in obesity [39], there are scant reports of its direct involvement in cancer. CFD has been suggested as a biomarker for cutaneous squamous-cell carcinoma [40]. Matrix Gla protein (MGP) has been connected to the inhibition of calcification, and there is evidence of its relation to the progression of different cancers [41]. Finally, carboxypeptidase N catalytic chain (CPN1) is part of the carboxypeptidase N complex, which has been shown to lead to the inactivation of C3a, C4a, and C5a [42][43][44]. CPN has been suggested as a prognostic biomarker in breast cancer, and it has been reported that MM patients sensitive to bortezomib treatment have lower CPN levels [45,46]. Therefore, we suggest that these four proteins might not be unique identifiers for the classification of MM patients. However, the overall protein levels of these targets provide an interplay that is unique for MM and requires further investigation. It should nevertheless be noted that the majority of these proteins interact with the complement cascade and immunoglobulins.
In conclusion, we describe a targeted proteomics approach capable of measuring hundreds of proteins with their concentrations reported on an absolute scale. This multiplex approach provides a complementary strategy to standardized clinical assays and provides the absolute concentrations of a large number of plasma proteins. Here, we show that this approach can be used with liquid biopsies to identify protein targets which are unique for the detection of multiple myeloma patients.

Ethical Statement
The research adheres to all pertinent ethical guidelines. This pan-cancer study received approval from the Swedish Ethical Review Authority (EPM dnr 2019-00222) and aligned with donor consents in U-CAN (28631533, EPN Uppsala 2010-198 with amendments), with all participants providing written informed consent. The study protocol is in accordance with the ethical principles outlined in the 1975 Declaration of Helsinki.

Cohort
A sample cohort consisting of blood plasma from 1800 cancer patients was provided by the biobank of the Uppsala-Umeå Comprehensive Cancer Consortium (U-CAN). All samples were collected following the same protocol. Briefly, blood was collected by venipuncture in 6 mL EDTA tubes (Vacuette Cat. no. 456243, Greiner Bio-One; Kremsmünster, Austria) and centrifuged at 3000 rcf at room temperature (RT) immediately after sample collection. Plasma was transferred to 0.5 mL tubes, frozen, and stored at −80 °C. The plasma samples were fully randomized into thirty-three 96-well plates and deidentified.

Sample Preparation
A set of 276 absolutely quantified SIS-PrESTs of 276 proteins was pooled at close-to-endogenous levels in healthy blood plasma (Table S1) [47]. In brief, the matrix from 3 layers of Empore C18 disks (Supelco, Sigma Aldrich, St. Louis, MO, USA) was activated with 100% acetonitrile (ACN) and equilibrated with 0.1% TFA. The digest was loaded into the StageTip, washed twice with 80 µL of 0.1% TFA, and eluted twice with 30 µL of 80% ACN and 0.1% formic acid (FA). The StageTips were centrifuged for 2 min at 1000 rcf after each addition. Eluted peptides were vacuum dried at 45 °C for 30 min. Prior to analysis, samples were dissolved in Solvent A (3% ACN, 0.1% FA). The samples were processed in batches of two to four plates per digestion.

Mass Spectrometry Analysis
Peptides were quantified in an online system consisting of an Ultimate 3000 LC (Thermo Fisher Scientific, Santa Clara, CA, USA) connected to a QExactive HF MS (Thermo Fisher Scientific, Santa Clara, CA, USA). A sample corresponding to 2 µg of raw plasma was loaded onto a trap column (PN 164535, Thermo Fisher Scientific, Santa Clara, CA, USA) and washed for 3 min at 7 µL/min with 100% Solvent A. The peptides were separated on an analytical column (PN ES902, Thermo Fisher Scientific, Santa Clara, CA, USA) using a 40 min linear gradient of 1-32% Solvent B (95% ACN, 0.1% FA) at 0.7 µL/min. The columns were washed with 3 two-minute seesaw gradients of 1-99% Solvent B and equilibrated for 9 min. MS analysis was performed using a DIA method with cycles consisting of a full MS scan (30,000 resolution, AGC = 3 × 10^6, 300-1200 m/z, IT = 105 ms) followed by 30 DIA scans (30,000 resolution, AGC = 1 × 10^6, NCE = 26, 10 m/z isolation window, IT = 55 ms).

Absolute Quantification
A list of proteotypic peptides was generated by in silico digestion of the FASTA file including the amino acid sequences of all 276 spiked-in SIS-PrESTs using EncyclopeDIA (ver. 1.2.2) [48], with the whole human proteome as background (Homo sapiens, UniProt ID: #UP000009606, 20,370 entries, accessed on 26 October 2020). One missed cleavage was allowed, and other parameters were adjusted according to the MS method. A spectral library was generated for all peptides using the Prosit machine learning algorithm [49]. The background proteome and the first 6 analyzed raw files were imported into Skyline (ver. 20.2.0.286) [50] and the peaks were manually inspected. Peptides for which light and heavy signals were not both detected were deleted, together with interfering transitions. Peptide retention times were predicted by an indexed retention time (iRT) library which included the 12 most intense APOA1 peptides. The Skyline file prepared in this way was used to import all of the resulting raw files using 3 min windows around the predicted peptide retention time, with mass accuracy set to 5 ppm. Data for both light and heavy signals were exported for further analysis.
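The in silico digestion step can be sketched as follows. This is a simplified illustration and not the EncyclopeDIA implementation: it assumes trypsin-like specificity (cleavage C-terminal to K or R, but not before P), which is an assumption made here, allows one missed cleavage as stated above, and treats a peptide as proteotypic when it does not occur in the digest of any background sequence. The sequences and function names are hypothetical.

```python
def cleavage_fragments(sequence):
    """Split a protein sequence after K or R unless the next residue is P."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder
    return fragments


def digest(sequence, missed_cleavages=1):
    """Return all peptides containing up to `missed_cleavages` missed sites."""
    fragments = cleavage_fragments(sequence)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def proteotypic_peptides(target, background, missed_cleavages=1):
    """Keep target peptides that are absent from every background digest."""
    seen = set()
    for sequence in background:
        seen.update(digest(sequence, missed_cleavages))
    return [p for p in digest(target, missed_cleavages) if p not in seen]


# Toy sequences (invented, not real SIS-PrEST or proteome entries).
target_peptides = digest("AAKGGRPCCKDD")
unique_peptides = proteotypic_peptides("AAKGGRPCCKDD", ["MMKAAKTT"])
```

In this toy case the peptide `AAK` also occurs in the background sequence and is therefore dropped from the proteotypic list.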
Exported results were imported into RStudio (ver. 1.4.1717). First, the data were filtered to contain only quantified peptides (rdotp > 0.7, dotp > 0.5, 0.01 < ratio to standard < 1000). Samples that failed the APOA1 iRT regression or had fewer than 120 proteins quantified were excluded from the analysis. Additionally, peptides with a quantification rate of less than 50% across all the samples were excluded. Non-paired transitions were filtered out, and the heavy-to-light ratio was calculated from the summed AUCs of transitions present in both the light and heavy channels. Further, the data were median-normalized using the pool samples.
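The filtering and ratio calculation described above can be illustrated with a short sketch. It is written in Python for illustration only and is not the R code used in the study. The transition names, the dictionary layout and the direction of the ratio (endogenous light signal over the heavy standard) are assumptions made here; the thresholds are those given in the text.

```python
def ratio_to_standard(light, heavy):
    """Ratio of summed AUCs over transitions present in both channels.

    `light` and `heavy` map transition name -> AUC (hypothetical layout).
    Returns None when no paired transition exists or the heavy sum is zero.
    """
    paired = light.keys() & heavy.keys()  # drop non-paired transitions
    heavy_sum = sum(heavy[t] for t in paired)
    if not paired or heavy_sum == 0:
        return None
    return sum(light[t] for t in paired) / heavy_sum


def passes_filters(rdotp, dotp, ratio):
    """Apply the thresholds from the text to a single peptide measurement."""
    return (
        ratio is not None
        and rdotp > 0.7
        and dotp > 0.5
        and 0.01 < ratio < 1000
    )


# Toy transitions: "y6" and "b3" are unpaired and therefore ignored.
light_auc = {"y4": 100.0, "y5": 300.0, "y6": 50.0}
heavy_auc = {"y4": 200.0, "y5": 600.0, "b3": 10.0}
example_ratio = ratio_to_standard(light_auc, heavy_auc)
```

If the opposite convention is intended, the division in `ratio_to_standard` would simply be inverted.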

Label-Free Data Extraction
Label-free data were extracted using EncyclopeDIA. First, mzML files were generated from raw files using msConvert within ProteoWizard [51], followed by an EncyclopeDIA [52] search against a spectral library generated from a list of blood plasma proteins with Prosit integrated into ProteomicsDB [49]. The whole human proteome (Homo sapiens, UniProt ID: #UP000005640, reviewed, 20,371 entries, accessed 11 August 2021) was used as a background proteome.

Disease Prediction
A random forest prediction model was built to classify multiple myeloma patients based on peptide levels in plasma. First, missing peptide levels were imputed using the impute.knn function from the impute R package (ver. 1.64.0). The model was built using the train function in the caret R package (ver. 6.0.90) using 70% of the cohort and 5-fold cross-validation. The model was tested on the remaining 30%, and specificity, sensitivity, and AUC scores were summarized in a receiver operating characteristic (ROC) curve.
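To make the evaluation metrics concrete, the following sketch shows how an AUC and a sensitivity/specificity pair can be computed from predicted class probabilities on a held-out set. It does not reproduce the caret workflow and does not train a random forest; the labels, scores, threshold and function names are hypothetical. The AUC is computed here as the fraction of (MM, non-MM) pairs in which the MM sample receives the higher score, with ties counted as one half, which is equivalent to the area under the ROC curve.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called MM."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy held-out set: 1 = MM, 0 = other cancer (invented values).
test_labels = [1, 1, 0, 0, 0]
test_scores = [0.9, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(test_labels, test_scores)
sens, spec = sensitivity_specificity(test_labels, test_scores)
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 − specificity would trace the ROC curve itself.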

Conclusions
In conclusion, our study pioneers a targeted proteomics approach with a precise quantification of hundreds of plasma proteins, offering a complementary strategy to standard clinical assays. We identified potential biomarkers, including JCHAIN, C1 complex proteins, and others, for multiple myeloma detection. This work underscores the promise of targeted proteomics in cancer diagnostics and biomarker discovery.