2.1. Depth of the Proteome
Deep-proteomic profiling identified 8603 and 8330 unique proteins (each based on at least one unique peptide), at 1% protein false discovery rate (FDR), from the KRASWT and KRASG12V tumors, respectively (Table S1). We calculated the normalized spectral abundance factors (NSAF), developed by Washburn and coworkers [24], to estimate the relative abundance of proteins within the tumors and to allow a quantitative comparison (Table S1). NSAF has previously been shown to produce robust quantitative spectral-counting data, even allowing the reliable estimation of copy numbers per protein if calibrated using absolute quantitative reference values within a sample [25]. For both samples, we covered five orders of magnitude of dynamic range of the proteome (Figure 2a) with broad coverage of cancer-relevant protein classes, including 458 transcription factors [27], 318 kinases/kinase subunits [28], 157 phosphatases/phosphatase subunits [29], and 345 proteases [30]. We detected 9 and 11 components of the mitochondrial import complexes TOM and TIM, respectively, including small proteins such as translocase of inner mitochondrial membrane 8 (TIMM8), TIMM9, TIMM10, TOMM5, translocase of outer mitochondrial membrane 6 (TOMM6), and TOMM7, demonstrating good coverage of membrane proteins in general. Our data show nearly complete coverage of important cancer signaling pathways, as shown for the PI3K/AKT/mTOR pathway, which is the primary target in targeted CRC therapy (Figure 2).
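As a minimal sketch of the NSAF calculation referred to above, the following assumes the commonly used definition: each protein's spectral count is divided by its length, and the result is normalized to the sum over all proteins in the sample. The protein names, counts, and lengths are hypothetical placeholders, not values from Table S1.

```python
def nsaf(spectral_counts, lengths):
    """Return NSAF per protein: (SpC / L) divided by the sum of SpC / L over all proteins."""
    # Spectral abundance factor: spectral count normalized by protein length
    saf = {protein: spectral_counts[protein] / lengths[protein]
           for protein in spectral_counts}
    total = sum(saf.values())
    # Normalizing to the sample total makes values comparable between samples
    return {protein: value / total for protein, value in saf.items()}


# Hypothetical example: three proteins in one sample
counts = {"P1": 100, "P2": 50, "P3": 10}
lengths = {"P1": 500, "P2": 250, "P3": 100}
values = nsaf(counts, lengths)
```

Because the values sum to 1 within a sample, NSAF-based ratios between two tumors compare each protein's share of the total signal rather than raw counts.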
In addition, we identified a total of 4144 unique high-confidence (<1% FDR; phosphoRS > 0.9) phosphorylation sites from 4888 phosphopeptides, 763 of which were at least 4-fold differentially regulated between the two tumors, based on spectral counting. The differential phosphoproteins were significantly enriched in REACTOME [31] pathways, for instance those related to SUMOylation (4.45 × 10^−11), signaling by Rho GTPases (1.10 × 10^−6), and mRNA splicing (2.00 × 10^−3). Among the proteins showing increased levels of phosphorylation in the KRASG12V sample were MEK and ERK, reflecting the expected activation of the RAS/MEK/ERK pathway by the activating G12V mutation in KRAS in this tumor (Figure 2).
2.2. Quantitative Comparison of KRASG12V and KRASWT Tumors
By nature, tumor samples are heterogeneous. Even under controlled conditions where clear standard operating procedures (SOPs) are strictly followed for sampling and sample preparation (as was done for the tumors analyzed in depth in this study), a considerable amount of preanalytical variability cannot be excluded. The reasons for this include differing levels of tumor-infiltrating immune cells, necrosis, or simply the tumor content. We therefore decided to compare the proteomes of the KRASWT and KRASG12V tumors with deep-proteome profiling data from 19 different human (primary) cells and tissues (Figure 2c), which represented potential sources of contamination such as plasma, platelets, B cells, T cells, connective and muscle tissue, and also HepaRG cells (liver stem cells), primary hepatocytes, as well as lymphocytes and megakaryocytes as hematopoietic progenitor cells. We thus compared the expression patterns for a total of 9408 unique proteins identified in our tumor samples with these 19 different reference proteome samples (Table S2). Importantly, Euclidean clustering showed that the CRC tumors represent a distinct cluster, with the nearest neighboring cluster comprising liver and kidney stem cells, while blood contaminants and immune cells are further apart.
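The distance measure underlying such a clustering can be sketched as follows. This is only an illustration of Euclidean distances between abundance profiles and a nearest-neighbor lookup, not the clustering procedure used for Figure 2c; the sample names and log-scale abundance values are hypothetical.

```python
import math

# Hypothetical log10 abundance profiles over the same four proteins
profiles = {
    "tumor_WT":   [-3.0, -4.0, -5.0, -6.5],
    "tumor_G12V": [-3.1, -4.2, -4.8, -6.4],
    "HepaRG":     [-3.5, -4.5, -5.5, -5.5],
    "plasma":     [-6.0, -2.0, -6.5, -3.0],
}


def nearest_neighbor(name, profiles):
    """Return the sample whose profile has the smallest Euclidean distance to `name`."""
    others = (other for other in profiles if other != name)
    return min(others, key=lambda other: math.dist(profiles[name], profiles[other]))


closest_to_wt = nearest_neighbor("tumor_WT", profiles)
closest_to_plasma = nearest_neighbor("plasma", profiles)
```

In a full analysis, the pairwise distance matrix would typically be passed to a hierarchical clustering routine; the nearest-neighbor lookup here only shows how "closer" and "further apart" are quantified.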
Next, to determine whether different levels of contamination with blood, liver, or immune cells might lead to artifactual proteomic differences between the tumor samples, we compared proteins showing a minimum fold-change of 4.0 between the KRASWT and KRASG12V tumors with the top 100 proteins (based on NSAF values) of each of our 19 reference proteomes (Table S3). Our rationale was that if a substantial contamination of one of the tumor samples by, for example, blood, immune cells, or healthy liver tissue had led to the false-positive identification of differential tumor proteins, it would be accompanied by a considerable enrichment of the top 100 proteins from the respective contaminant. In fact, we found no major source of preanalytical variability among the 800 proteins that were at least 4.0-fold regulated between the KRASWT and KRASG12V tumors. Gene ontology analysis using PANTHER [35] showed an enrichment of cell migration (p-value 1.85 × 10^−5), regulation of the MAPK cascade (7.99 × 10^−5), and transcription by RNA polymerase II (1.60 × 10^−4) in the KRASG12V tumor, while no significantly enriched pathways were detected in the KRASWT sample. In general, the two proteomes of KRASWT and KRASG12V show a similar expression of pathways and biological functions (Figure 3).
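The contamination check described in this paragraph amounts to a set overlap between the differentially regulated proteins and each reference proteome's top-ranked proteins. A minimal sketch, using hypothetical placeholder identifiers and top-4 lists instead of the top 100 proteins of Table S3:

```python
def contaminant_overlap(differential, reference_top):
    """Return, per reference proteome, the fraction of its top proteins
    that also appear among the differentially regulated tumor proteins."""
    return {
        name: len(differential & set(top)) / len(top)
        for name, top in reference_top.items()
    }


# Hypothetical identifiers; real input would be protein accessions
differential = {"A", "B", "C", "D"}
reference_top = {
    "plasma": ["X", "Y", "A", "Z"],
    "platelets": ["Q", "R", "S", "T"],
}
overlap = contaminant_overlap(differential, reference_top)
```

A high fraction for one reference proteome would point to that tissue or cell type as a likely source of artifactual differences; what counts as "high" is a judgment call that this sketch does not settle.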
Of the 800 proteins that were differentially regulated between the tumors, 31 had somatic mutations detected by WES and/or RNAseq, and 596 transcripts showed the same trend of regulation in the RNAseq data. Interestingly, the average fold-change for those 31 mutated proteins was >12, suggesting a major impact of those mutations on protein expression. Transcriptome and proteome data correlate only poorly for both tumors, WT and G12V (Figure 3d,e). The G12V/WT ratios derived from either the proteome or the transcriptome also correlate poorly (Figure 3f). Among the 250 proteins with the strongest discordance between proteome- and transcriptome-derived G12V/WT ratios were 8 transcription factors, 4 kinases, 2 phosphatases, 18 proteases, 10 histones, several members of the WNT-signaling pathway (WNT2, FRZB, SFRP4), the ADP-ribosylation factors ARF1 and ARF3, MUC2, CYP1B1, the SRC signaling inhibitor SRCIN1, eIF5A2, and LAMA2.
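One way to rank proteins by discordance is the absolute difference between the log2 G12V/WT ratios from the proteome and the transcriptome. The sketch below assumes this measure, which is not necessarily the one used for the ranking above, and uses raw PSM-count ratios as a stand-in for the NSAF-based protein ratios. The PSM counts and RNAseq fold-changes are those reported in this section for MUC2, MUC5B, and LAMA2.

```python
import math


def discordance(protein_ratio, rna_ratio):
    """Absolute difference between log2 protein and log2 mRNA G12V/WT ratios."""
    return abs(math.log2(protein_ratio) - math.log2(rna_ratio))


# (protein G12V/WT ratio from PSM counts, mRNA G12V/WT ratio from RNAseq)
ratios = {
    "MUC2": (2361 / 2, 7.4),
    "MUC5B": (56 / 8, 5.6),
    "LAMA2": (464 / 39, 0.84),
}

# Most discordant protein first
ranked = sorted(ratios, key=lambda gene: discordance(*ratios[gene]), reverse=True)
```

Working on the log2 scale treats up- and downregulation symmetrically, so a protein that is 4-fold higher and one that is 4-fold lower contribute equally.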
Mucin-2 (MUC2), for which the T1636S single nucleotide variant (SNV) was detected by WES in the G12V sample, showed the strongest protein upregulation in G12V (MUC2↑G12V; 1 unique peptide and 2 peptide-spectrum matches (PSMs) in the WT vs. 97 unique peptides and 2361 PSMs in the G12V), while it was upregulated to a much lesser degree on the transcriptome level (7.4-fold upregulated in G12V RNAseq vs. WT RNAseq data). It has been reported that patients with low expression of MUC2 (here, WT) had significantly lower cell differentiation and more lymph node metastases than those with higher MUC2 levels (here, G12V) [37]. In contrast, high expression of Mucin-5 (MUC5) was associated with lymph node metastasis, poor cellular differentiation, and poor prognosis [37]. Indeed, MUC5B was also clearly upregulated in the G12V tumor proteome (MUC5B↑G12V; 4 peptides and 8 PSMs in WT vs. 17 peptides and 56 PSMs in G12V; 5.6-fold higher in the G12V transcriptome), and the reduced expression of MUC5B in the WT might derive from an N1032I SNV that was detected in its WES data. Interestingly, MUC12 was also found to be clearly higher in the G12V tumor on both the proteome and the transcriptome levels (MUC12↑G12V; absent in WT, 26 peptides and 76 PSMs in G12V; 13.0-fold higher in G12V RNAseq) and was found to be mutated in both tumors, with the SNV V874I in the G12V (both WES and RNAseq) and the SNV V3487I in the WT (only WES). MUC12 expression has been identified as a marker of prognosis in Stage II and III CRC, with low MUC12 expression measured by qPCR being associated with a worse prognosis [38].
Another protein that showed strong regulation between both tumors was Laminin subunit alpha-2 (LAMA2↑G12V; 17 peptides and 39 PSMs in WT vs. 114 peptides and 464 PSMs in G12V), for which an A1805S SNV was detected by WES in the G12V. Interestingly, LAMA2 was not differentially regulated on the mRNA level (ratio G12V/WT of 0.84), and is a very strong example of discordance between transcriptome and proteome. LAMA2 is a suggested tumor suppressor [39] and is frequently mutated in other cancers, such as lung cancers [40]. Similarly, LAMA1 was not detected in the WT tumor while being moderately expressed in the G12V tumor (LAMA1↑G12V), and the transcriptome data showed a 4.3-fold higher expression level in the G12V tumor.
The ADP-ribosylation factor 1 (ARF1) has been reported as significantly elevated in various cancers [41], and its expression in prostate cancer correlated with activation of ERK1 and ERK2, leading to cell proliferation [42]. We found ARF1 to be substantially higher in the WT tumor proteome (ARF1↑WT; no peptides/PSMs in G12V vs. 10 peptides and 137 PSMs in WT), which might represent an EGFR-independent activation of the RAS/MEK/ERK pathway and a potential escape mechanism for anti-EGFR therapy. The strong regulation on the protein level is in stark contrast to the transcriptome data, which in fact show a slightly lower ARF1 mRNA level in the WT tumor (1.3-fold higher in the G12V transcriptome data). Screening for ARF1 in association with mutations in KRAS could provide an enhanced predictive signature for anti-EGFR therapy, but ARF1 should clearly be measured on the protein level.
Receptor tyrosine-protein kinase erbB-2 (ERBB2/HER2) showed a massively higher protein expression in the WT than in the G12V tumor (ERBB2↑WT; 46 peptides and 907 PSMs vs. 4 peptides and 12 PSMs), in accordance with a detected copy number gain (amplified region chr17:37,690,344–40,762,015) and 69-fold higher normalized counts in the RNAseq data of the WT tumor. ERBB2/HER2 expression in the WT tumor was also substantially higher than in any of our 19 reference proteomes. The opportunity to target ERBB2/HER2 in colorectal cancer has recently emerged [43], as it is amplified and/or mutated in 5% of CRC tumors, most often in KRAS WT tumors [43], which agrees with our data.
Mesothelin (MSLN) has recently been described as a prognostic marker for Stage II/III CRC, where its expression was associated with a lower survival rate. We could not detect MSLN in the G12V tumor proteome, while it was highly expressed in the WT tumor (MSLN↑WT; 22 peptides and 118 PSMs), which is in good agreement with the transcriptomic data.
Another protein that has been associated with a poor outcome in Stage II and III CRC is the serine protease HTRA3, which was not detectable in the WT tumor proteome but clearly elevated in the G12V (HTRA3↑G12V; 19 peptides and 68 PSMs). The transcriptome data showed the same trend (5.3-fold higher mRNA level in the G12V tumor).
Gremlin-1 has been associated with improved survival in locally advanced Stage II and III CRC [44]. It could not be detected in the WT tumor proteome but was highly expressed in the G12V tumor (GREM1↑G12V) and showed an 11.1-fold higher mRNA level in the G12V tumor, in accordance with the proteome data.
Furthermore, eukaryotic translation initiation factor 5A-2 (eIF5A2) was not detected in the WT tumor but was highly expressed in the G12V tumor (eIF5A2↑G12V). eIF5A2 has been shown to promote chemoresistance to doxorubicin in CRC [45]. Its overexpression in the G12V tumor, which is not eligible for anti-EGFR therapy, might have had important implications for chemotherapy. The eIF5A2 mRNA level was also higher in the G12V transcriptome (2.0-fold), but to a much lower extent.
Taken together, our deep-proteome profiling of the KRASWT and KRASG12V tumors revealed a number of important expression changes (Figure 3), some of which might have a direct application to therapeutic options. These phenotypic insights into actual protein expression levels are a valuable complement to the precision oncology data on individual tumors.
2.3. Identification and Quantification of SNV Predicted from the WES and RNAseq Data
A total of 51 and 77 point mutations in the KRASWT and KRASG12V samples, respectively, were predicted by both WES and RNAseq, while 316 and 384 were predicted only by WES, with no evidence from RNAseq (Table S1). We used this information to conduct mutation-directed database searches in order to identify peptides in our deep proteome profiling dataset that represent evidence for the presence of those mutations on the protein level. Indeed, we were able to identify a total of three and seven mutations in the KRASWT and KRASG12V samples, respectively, with high confidence at 1% group-specific FDR, as had been proposed by Nesvizhskii [46]. For all of these mutations, we could also identify the corresponding canonical sequences in the same tumor sample.
We hypothesized that targeted mass spectrometry could be utilized to quantify the actual mutation rates on the protein level, and thus enable an improved phenotyping of individual tumor samples by precise quantification of both mutated and canonical variants of target proteins. We therefore synthesized stable-isotope labeled standards (SIS) for both the mutated and the canonical variants of eight mutated proteins (Table 1), and developed corresponding parallel reaction monitoring (PRM) assays. Among our candidates was KRASG12V. Notably, the canonical sequence LVVVGAGGVGK is shared between KRAS, HRAS, and NRAS, so quantification of this peptide alone would not allow the determination of the G12V mutation rate. We therefore added unique peptides that represent HRAS, NRAS, and KRAS. We evaluated our PRM assays using the KRASG12V and KRASWT samples, and were able to achieve absolute quantification for KRAS from as little as 3 µg of total tissue protein digest on-column, using nano-LC-PRM.
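The shared-peptide problem can be illustrated with a small in silico tryptic digest. The sketch below assumes simple trypsin rules (cleavage after K or R, not before P, no missed cleavages) and uses only the first 16 N-terminal residues, which are taken to be identical in KRAS, HRAS, and NRAS; the helper names are hypothetical.

```python
import re


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages).
    Splitting on a zero-width pattern requires Python 3.7 or newer."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


def substitute(sequence, position, ref, alt):
    """Apply a single amino-acid substitution at a 1-based position."""
    if sequence[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return sequence[:position - 1] + alt + sequence[position:]


# N-terminal fragment assumed identical across the three RAS isoforms
RAS_NTERM = "MTEYKLVVVGAGGVGK"
proteins = {
    "KRAS": RAS_NTERM,
    "HRAS": RAS_NTERM,
    "NRAS": RAS_NTERM,
    "KRAS_G12V": substitute(RAS_NTERM, 12, "G", "V"),
}

# Map each peptide to the set of proteins that can produce it
owners = {}
for name, seq in proteins.items():
    for pep in tryptic_peptides(seq):
        owners.setdefault(pep, set()).add(name)
```

In this toy digest, LVVVGAGGVGK maps to all three canonical isoforms, whereas the G12V peptide LVVVGAVGVGK maps only to the mutated KRAS entry, which is why isoform-specific peptides are needed to obtain the KRAS-specific denominator.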
To further evaluate the utility of our assays, we obtained additional mCRC liver-metastasis samples (T1–T6) and matched healthy liver tissue (H1–H6) from another six patients with KRASG12V-positive tumors, as defined by hotspot mutation testing for the presence of selected mutations. Notably, these samples were not part of a well-designed study, but were 'real-life' biopsies from our internal biobank and therefore represent the ideal setting to evaluate whether targeted MS can be used to improve the phenotyping of samples that were not collected using specific SOPs for proteome analysis. All these patients were ineligible for targeted anti-EGFR treatment and received different treatments or combinations of treatments, including anti-VEGF-A treatment with bevacizumab as well as chemotherapy for several cycles. In line with the poor prognosis of patients with Stage III and Stage IV CRC, four of the six patients from the biobank have passed away. Starting with 50 µg of total protein for sample preparation and loading 3 µg of total tissue protein digest on-column, we quantified the eight proteins shown in Table 1 and their mutation rates.
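A mutation rate on the protein level can be sketched as the mutated variant's share of the total (mutated plus canonical) amount. This definition is an assumption made for illustration, since the text does not spell out the formula, and the amounts below are hypothetical.

```python
def mutation_rate(mutated_amount, canonical_amount):
    """Fraction of a protein present as the mutated variant.

    Both amounts must be in the same unit (e.g. amol per 3 µg digest).
    Returns None when neither variant was quantified.
    """
    total = mutated_amount + canonical_amount
    if total == 0:
        return None
    return mutated_amount / total


# Hypothetical amounts in amol per 3 µg of digest
half = mutation_rate(1500.0, 1500.0)      # equal amounts of both variants
only_mutant = mutation_rate(530.0, 0.0)   # canonical variant not detected
undetected = mutation_rate(0.0, 0.0)      # protein not quantified at all
```

For KRAS, the canonical amount would have to come from KRAS-specific quantification rather than from the shared peptide LVVVGAGGVGK, which also reports on HRAS and NRAS.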
We were able to quantify SRPX2 in all eight tumor samples (90–860 amol/3 µg; median 470 ± 260 amol) but in none of the six healthy control tissues, indicating a clear upregulation of the protein in CRC, in agreement with previous reports that SRPX2 plays a role in the progression of CRC [47]. The mutation rate of SRPX2E234K was 66% in the KRASWT sample, while the mutated protein could not be detected in any of the G12V tumors.
We were able to quantify RPS6KA5 in only seven tumor samples (G12V, WT, T1–T5), where its concentration ranged from 110 to 680 amol/3 µg (median 370 ± 340 amol), and in none of the healthy tissues. Interestingly, survival data from CRC patients in the Human Protein Atlas [48] indicate a better prognosis with high expression of RPS6KA5. The only sample where the RPS6KA5D554N SNV could be detected was the KRASWT tumor, with a mutation rate of 58%.
PTBP1 could be quantified in all samples, ranging from 2.7–21.2 fmol/3 µg in the tumor samples (median 12.7 ± 7.2 fmol/3 µg) and 3.7–9.1 fmol/3 µg in the controls (median 7.4 ± 2.4 fmol/3 µg), showing no clear trend of regulation between cancer and control tissue. The mutated variant PTBP1K508E could only be detected in the original KRASG12V tumor and in tumor T6, amounting to 38% and 26% mutation rates, respectively.
ARL2 could be quantified in all samples, with mutation rates of 35–100% in the tumors (median 620 ± 180 amol/3 µg) and 52.7–100% in the control tissues (median 170 ± 50 amol/3 µg). ARL2V141A is a natural variant with frequencies between 0.5 and 0.7 in various cohorts. The presence of the mutation in all eight patients of this small cohort might be random, but it might be important to follow this up in a larger study.
PPP1R14C, an inhibitor of the serine/threonine-protein phosphatase PP1-alpha catalytic subunit (PPP1CA), could only be quantified in five tumors (KRASWT, KRASG12V, T1, T3, T6), but in none of the healthy controls, again indicating a potential upregulation of the protein in an mCRC setting (median 70 ± 20 amol/3 µg; 530 amol in G12V). Interestingly, in all five tumors, only the mutated variant PPP1R14CT10A could be detected, notably a natural variant with frequencies up to 0.8 in East Asian cohorts.
HAUS7 showed a 100% mutation rate in the KRASWT and the T1 tumor sample, but no consistent trend between the tumors (median 180 ± 10 amol/3 µg) and the paired healthy controls (median 80 ± 20 amol/3 µg) was observed.
TBC1D2BA8G could be detected in two G12V tumors (KRASG12V 49%, T4 100%). Notably, this mutation was also detected in healthy control H4 at a mutation rate of 100%.
summarizes the results for KRAS for the eight tumors and six controls. Total KRAS expression in the tumor samples varies between 2.0 and 5.7 fmol/3 µg (median 3.0 ± 1.4 fmol/3 µg), and between 0.4 and 1.9 fmol/3 µg (median 1.6 ± 0.6 fmol/3 µg) in the healthy control tissues. KRAS was upregulated in all tumors (T1–T6) compared to their healthy controls (H1–H6), by between 1.3-fold in T4/H4 and 14.8-fold in T1/H1. The strong upregulation of KRAS in T1 (at a mutation rate of 86%, in stark contrast to the WES-predicted frequency of 32%) might indicate a massive activation of the MEK/ERK signaling pathway in this patient's tumor. While the G12V SNV could not be detected in any of the healthy controls H1–H6 nor in the KRASWT tumor, the mutation rate for the G12V-positive tumors varied considerably (KRASG12V 50%, T1 86%, T2 100%, T3 52%, T4 38%, T5 10%, and T6 42%). The surprisingly low mutation rate of only 10% in T5 may indicate that this patient is one of the false-negative patients who might benefit from targeted anti-EGFR treatment despite the detection of the KRASG12V mutation during hotspot sequencing.