Proteomic Approach for Searching for Universal, Tissue-Specific, and Line-Specific Markers of Extracellular Vesicles in Lung and Colorectal Adenocarcinoma Cell Lines

Tumor-derived extracellular vesicles (EVs), including exosomes, contain proteins that mirror the molecular landscape of producer cells. Because they are potentially detectable in biological fluids, EVs are of great interest for the screening of cancer biomarkers. To reveal universal, tissue-specific, and line-specific markers, we performed label-free mass spectrometric profiling of EVs originating from the human colon cancer cell lines Caco-2, HT29, and HCT-116, as well as from the lung cancer cell lines NCI-H23 and A549. A total of 651 proteins were identified in the EV samples using at least two peptides. These proteins were highly enriched in exosome markers. We found 11 universal, eight tissue-specific, and 29 line-specific markers, the levels of which were increased in EVs compared to the whole lysates. The EV proteins were involved in the EGFR, Rap1, integrin, and microRNA signaling associated with metastasis and cancer progression. An EV protein-based assay could be developed as a liquid biopsy tool.


Introduction
According to the World Health Organization (WHO) data for 2018, lung cancer (LC) and colorectal cancer (CRC) are the most common causes of cancer-related deaths worldwide, accounting for an estimated 1.76 million and 862,000 deaths, respectively. They also rank first (2.09 million cases) and third (1.80 million cases) in terms of occurrence (https://www.who.int/news-room/fact-sheets/detail/cancer).
For LC, non-small cell lung carcinoma (NSCLC) represents 85% of all primary lung tumor cases. NSCLC generally encompasses adenocarcinoma (38.5% of all cases), squamous cell carcinoma (20% of all cases), and large cell carcinoma (2.9% of all cases) [1]. In recent years, the incidence of adenocarcinoma has increased and has overtaken squamous cell carcinoma in terms of prevalence [1,2]. Only 15% of patients with NSCLC are diagnosed at an early stage, while late diagnosis and metastatic and PIK3CA genes [29]. The detection of KRAS and BRAF mutations is a crucial step in predicting resistance to targeted therapy [9]. These mutations are normally mutually exclusive.
We performed high-resolution mass-spectrometric profiling of EVs and the whole cell lysate (WhL), followed by label-free quantification, to determine the proteomic cargo of EVs and reveal the EV proteomic core, or universal EV markers, as well as markers that can distinguish the tissue of origin and tumor variants within the same type of cancer. We also attempted to elucidate the biological significance of these factors, especially the involvement of EV markers in cancer progression and metastasis. Finally, to verify putative EV protein markers, we applied a targeted mass-spectrometric method, i.e., selected reaction monitoring (SRM) with stable isotope-labeled peptide standards (SIS).

Proteins Identified in Extracellular Vesicles Are Enriched in Exosomal Markers
Mass spectrometric analysis of the EVs derived from lung cancer and colorectal cancer cells resulted in the identification of 850 proteins in all EV samples studied (potential contaminants, false positive identifications, and proteins identified only by peptides containing modifications were excluded). The mass-spectrometric data are available via ProteomeXchange with the identifiers PXD020467 and PXD020454. All proteins identified in the EV samples are listed in Supplement 1. According to the guidelines for the interpretation of mass spectrometric data by the HPP (Human Proteome Project), the reliability of protein identification is increased if two or more unique peptides are mapped to the protein (https://www.hupo.org/HPP-Data-Interpretation-Guidelines). In general, the higher the coverage of the amino acid sequence of the protein, the more reliable the results obtained. Moreover, as shown previously, label-free quantification with several peptides per protein provides precise measurements and can detect even subtle biologically determined changes in the proteome, e.g., spaceflight-induced changes in mouse liver [30]. Figure 1a shows the distribution of proteins by the number of unique peptides per protein: 651 proteins (76% of all identifications) were identified by at least two peptides, and 340 proteins (40% of all identifications) were identified by at least four peptides. The latter subset was designated as the group of most reliably identified proteins and was used for label-free quantification. Of the 103 proteins often identified in exosomes according to the ExoCarta database (ExoCarta Top100 list) (http://exocarta.org/exosome_markers_new), 70 overlapped with the most reliably identified protein subset (Figure 1b). These included SDCB, TSG101, and ALIX, which are involved in exosome biogenesis, as well as the integrins ITGA6 and ITGB1, MFGE8, and the lipid raft protein flotillin (FLOT1), which participate in cell adhesion. Additionally, protein abundance correlates with the peptide count and the number of unique peptides identified per protein [31]. Moreover, as Figure 1c shows, the most abundant proteins in the EV fraction belong to the ExoCarta Top100 list (http://exocarta.org/exosome_markers_new). Conventional exosome markers, i.e., CD9, CD63, and CD81, were also identified in the mass-spectrometric experiments, but only by one (CD63) or two (CD9 and CD81) unique peptides. The functional annotation analysis (Supplementary Figure S17) revealed 129 proteins annotated by the GO "biological process" term "vesicle-mediated transport" (FDR = 1 × 10^−46), as well as 145 and 14 proteins annotated by the GO "cellular component" terms "vesicle" (FDR = 2.02 × 10^−44) and "extracellular exosome" (FDR = 4.87 × 10^−9), respectively. Taken together, these results indicate that the proteins identified in the EV samples are enriched in exosomal markers.
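The peptide-count filtering and the ExoCarta enrichment measure described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the toy peptide counts are assumptions made for illustration only, and `PROT_X` and `PROT_Y` are hypothetical placeholders rather than real identifications.

```python
from typing import Dict, Set


def filter_by_unique_peptides(peptide_counts: Dict[str, int], min_peptides: int) -> Set[str]:
    """Return the proteins identified by at least `min_peptides` unique peptides."""
    return {protein for protein, n in peptide_counts.items() if n >= min_peptides}


def marker_enrichment_percent(proteins: Set[str], marker_list: Set[str]) -> float:
    """Percentage of `proteins` that also appear in `marker_list` (0.0 for an empty set)."""
    if not proteins:
        return 0.0
    return 100.0 * len(proteins & marker_list) / len(proteins)


# Toy input (made-up counts; PROT_X and PROT_Y are hypothetical placeholders).
toy_counts = {"TSG101": 6, "ITGB1": 9, "CD63": 1, "CD81": 2, "PROT_X": 4, "PROT_Y": 3}
# Stand-in for a curated marker list such as the ExoCarta Top100 list.
toy_markers = {"TSG101", "ITGB1", "CD63", "CD81"}

for threshold in (1, 2, 4):
    subset = filter_by_unique_peptides(toy_counts, threshold)
    percent = marker_enrichment_percent(subset, toy_markers)
    print(f"min unique peptides = {threshold}: {len(subset)} proteins, {percent:.1f}% in marker list")
```

Plotting the percentage against the threshold gives the kind of curve described for Figure 1c, with the minimal number of unique peptides on the x-axis and the share of marker proteins on the y-axis.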
To assess the morphology and integrity of EVs, a pooled sample of EVs isolated from LC and CRC cells was visualized using cryo-electron microscopy (Cryo-EM) (Figure 1d). Typical round-shaped vesicular morphology was observed, with the diameter of the majority of EVs under 200 nm. This observation indicates successful isolation of small to medium size vesicles with an intact lipid bilayer.
Figure 1. (a) Distribution of proteins by the number of unique peptides per protein; 850 proteins were identified in all EV samples studied (potential contaminants, false positive identifications, and proteins identified only by peptides containing modifications were excluded); (b) Venn diagram that shows the intersection of proteins that were identified by at least 4 unique peptides (the most confidently identified proteins) and proteins that were often identified in exosomes (http://exocarta.org/exosome_markers_new, ExoCarta Top100 list); (c) correlation between enrichment in ExoCarta protein markers and peptide coverage of the identified proteins. ExoCarta protein enrichment is shown as the percentage of proteins often identified in exosomes (http://exocarta.org/exosome_markers_new, ExoCarta Top100 list) from the total number of identified proteins (y-axis). The peptide count of the identified proteins is shown as the minimal number of unique peptides per protein (x-axis); (d) Cryo-EM image of pooled EVs isolated from LC and CRC cells; scale bar, 200 nm.

Universal EV Proteins Distinctively Distinguish Whole Cell Lysate and EV Samples
Mass spectrometric analysis of EVs and WhL resulted in the identification of 3314 proteins (potential contaminants, false positive identifications, and proteins identified only by peptides containing modifications were excluded). Mass-spectrometric data are available via ProteomeXchange with the identifier PXD020454. To perform the relative protein quantification, 1859 proteins were used (identified using at least four peptides). Detailed results of the label-free quantification are presented in Supplement 2. The result is shown in Figure 2.

Figure 2. (a) Volcano plot that shows the differences in protein abundance in the EV fraction (in red) and in the whole lysate (WhL) (in black) for all cell lines studied. A total of 933 proteins were identified by at least 4 unique peptides per protein and were statistically significant (Student's t-test, truncation: permutation-based FDR = 0.01, S0 = 1, Perseus 1.6.0.7 software (Max Planck Institute of Biochemistry, Martinsried, Germany)). EV markers (11 proteins) are shown by their gene names in red. Normalized label-free quantification (LFQ) intensities were used for visualization of the protein level. The LFQ intensities were log2 transformed. Differences in protein abundance (considering the log2 transformation) are provided on the x-axis, while -Log p (log10 transformed p-value) is plotted on the y-axis. The blue dots represent proteins APOB, HBB, and HIST1H4A that could be artefacts of exosome isolation. Detailed data are presented in Supplement 2; (b) universal EV protein markers were more abundant in the EV samples compared to WhL samples; the log2 transformed fold change that reflects the difference in protein abundance is plotted on the y-axis; proteins included in the ExoCarta Top100 list of molecules most often identified in exosomes (http://exocarta.org/exosome_markers_new) are shown in white.
Two-sample testing (Student's t-test, permutation-based false discovery rate (FDR) = 0.01) yielded 18 proteins (including ANXA6, CDC42, CNP, FN1, GNAI2, HSPG2, ITGB1, ITGB3, JUP, MVP, RAP1B, SLC2A1, TLN1, TUBA4A, and UBE2N) that were significantly more abundant in the EV samples compared to the WhL samples. After two-sample testing, proteins with significant differences were subjected to additional filtering. First, the putative artefacts of EV isolation, i.e., apolipoprotein B-100 (APOB) and hemoglobin subunit beta (HBB), were excluded from further analysis as they could be components of residual fetal bovine serum (FBS). Second, the quality of the peptides and their suitability for targeted analysis (see "Material and Method" section) were assessed. After filtration, 11 proteins remained (Figure 2b). Among them, the content of TUBA4A, HSPG2, ITGB3, CNP, and FN1 proteins in the EV samples was at least 10 times higher than that in WhL samples (Figure 2b). Proteins ITGB1, CDC42, GNAI2, and FN1 are often identified in exosomes (http://exocarta.org/exosome_markers_new, ExoCarta Top100 list). Seven proteins, i.e., MVP, TLN1, SLC2A1, TUBA4A, HSPG2, ITGB3, and CNP, may represent new markers of exosomes isolated from cells of epithelial origin. The revealed universal protein markers clearly distinguished the EVs from the WhL.
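The quantities plotted in a volcano plot of this kind can be sketched in a few lines of Python. The sketch below is illustrative only and makes several assumptions: it computes the log2 fold change and a pooled-variance Student's t statistic from log2-transformed intensities, it does not reproduce the permutation-based FDR or the S0 parameter used in Perseus, and the intensity values are invented toy numbers.

```python
import math
from statistics import mean, variance
from typing import Sequence


def log2_fold_change(ev_log2: Sequence[float], whl_log2: Sequence[float]) -> float:
    """Difference of mean log2 intensities (EV minus WhL)."""
    return mean(ev_log2) - mean(whl_log2)


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Student's t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def is_at_least_n_fold(log2_fc: float, n_fold: float) -> bool:
    """True if a log2 fold change corresponds to an n-fold (or larger) increase."""
    return log2_fc >= math.log2(n_fold)


# Toy log2 LFQ intensities for one hypothetical protein (invented values).
ev = [26.0, 26.5, 25.5]
whl = [22.0, 22.5, 21.5]

fc = log2_fold_change(ev, whl)
print("log2 fold change:", fc)
print("linear fold change:", 2 ** fc)
print("t statistic:", round(student_t(ev, whl), 3))
print("at least 10-fold higher in EVs:", is_at_least_n_fold(fc, 10))
```

On the log2 scale, the "at least 10 times higher" criterion corresponds to a difference of about 3.32 (log2 of 10), and the twofold criterion used elsewhere in the text corresponds to a difference of 1.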

Lung Tissue-Specific EVs Loaded by EGFR Ligand EPS15 and Colon Tissue-Specific EVs Enriched in the Differentiation Regulator DMBT1
Label-free quantification and two-sample testing (Student's t-test, FDR = 0.01) resulted in 39 proteins with significant differences in their abundance in lung tissue- and colon tissue-derived EVs (Figure 3).

Figure 3. (a) Proteins with significantly different abundance in lung tissue- and colon tissue-derived EVs (Figure S1 shows these proteins in detail); (b) Venn diagram that shows the intersection of proteins that were more abundant in CRC-derived EVs compared to LC-derived EVs, and proteins that were more abundant in EV samples compared to WhL samples of CRC cells; (c) Venn diagram that shows the intersection of proteins that were more abundant in LC-derived EVs compared to CRC-derived EVs and proteins that were more abundant in the EV samples compared to the WhL samples of LC cells; (d) volcano plot that shows the differences in protein abundance in the EV fraction (in red) and in the WhL (in black) for the CRC cell lines. Overall, 1654 proteins identified by at least 4 unique peptides per protein were statistically significant (Student's t-test, truncation: permutation-based FDR = 0.01, S0 = 2, Perseus 1.6.0.7 software (Max Planck Institute of Biochemistry, Martinsried, Germany)). Figure S2 shows the volcano plot in detail; (e) volcano plot that shows the differences in protein abundance in the EV fraction (in red) and in the WhL (in black) for LC cell lines. Overall, 1622 proteins identified by at least 4 unique peptides per protein were statistically significant (Student's t-test, truncation: permutation-based FDR = 0.01, S0 = 2, Perseus 1.6.0.7 software (Max Planck Institute of Biochemistry, Martinsried, Germany)). Figure S3 shows the volcano plot in detail. For (d,e), normalized LFQ intensities were used for visualization of the protein level. The LFQ intensities were log2 transformed. Differences in protein abundance (taking into account the log2 transformation) are shown on the x-axis, while -Log p (log10 transformed p-value) is plotted on the y-axis. Tissue-specific EV proteins that were also determined to be universal markers are highlighted in red. Detailed data are presented in Supplement 3.
Proteins with significant content differences were grouped into two clusters. These clusters comprised 17 and 22 proteins that were more abundant in CRC-derived EVs and LC-derived EVs, respectively (Figure 3; Supplement 3). Applying two-sample testing (Student's t-test, FDR = 0.01), we determined 11 and 14 proteins to have levels at least twofold higher in the EV samples compared to the WhL samples of the CRC and LC cell lines, respectively (Supplement 3, Supplementary Figures S2 and S3). After overlapping the two sets of semi-quantitative data, 10 tissue-specific exosome markers were determined (Figure 3, Table 1). The TUBA4A and CNP proteins were also determined to be universal EV protein markers. SDCB, VPS28, and TSG101, which are involved in exosome biogenesis, as well as the EGFR ligand EPS15 and the retinol-inducible protein GPRC5A, were uniquely found to be specific for the lung tissue EV samples. Collagen COL6A2, the differentiation regulator DMBT1, and the complement component C4B (P0C0L5) were uniquely found to be specific for exosomes of the colon tissue.

Line-Specific EV Proteins Distinguish Different Variants within the Same Type of Cancer
LC cell lines A549 and NCI-H23, as well as CRC cell lines Caco-2, HCT116, and HT29, were used to search for line-specific EV proteins. The investigated cell lines carried mutations in the KRAS, BRAF, PIK3CA, c-MYC, and P53 genes (Supplementary Table S1), which determine different levels of susceptibility to pharmacological effects and different metastatic potentials.
Label-free quantification and two-sample testing (Student's t-test, FDR = 0.01) resulted in 69 proteins with significant differences in abundance in A549 line-derived EVs and NCI-H23 line-derived EVs (Figure 4).
Figure 4. (a) Heatmap of proteins with significantly different abundance in A549 line-derived EVs and NCI-H23 line-derived EVs (Figure S4 shows the heatmap in detail); (b) Venn diagram that shows the intersection of proteins that were more abundant in A549 line-derived EVs compared to NCI-H23 line-derived EVs and proteins that were more abundant in the EV fraction compared to the WhL of the A549 cell line; (c) Venn diagram that shows the intersection of proteins that were more abundant in NCI-H23 line-derived EVs compared to A549 line-derived EVs and proteins that were more abundant in the EV fraction compared to the WhL of the NCI-H23 cell line; (d) volcano plot that shows the differences in protein abundance in the EV fraction (in red) and in the WhL (in black) for the A549 cell line. Overall, 1121 proteins identified by at least 4 unique peptides per protein were statistically significant (Student's t-test, truncation: permutation-based FDR = 0.01, S0 = 2, Perseus 1.6.0.7 software (Max Planck Institute of Biochemistry, Martinsried, Germany)). Figure S5 shows the volcano plot in detail; (e) volcano plot that shows the differences in protein abundance in the EV fraction (in red) and in the WhL (in black) for the NCI-H23 cell line. Overall, 1173 proteins identified by at least 4 unique peptides per protein were statistically significant (Student's t-test, truncation: permutation-based FDR = 0.01, S0 = 2, Perseus 1.6.0.7 software (Max Planck Institute of Biochemistry, Martinsried, Germany)). Figure S6 shows the volcano plot in detail. For (d,e), normalized LFQ intensities were used for visualization of the protein level. The LFQ intensities were log2 transformed. Differences in protein abundance (taking into account the log2 transformation) are plotted on the x-axis, while -Log p (log10 transformed p-value) is shown on the y-axis. Line-specific EV proteins that were also determined to be universal and tissue-specific markers are highlighted in red and green, respectively. Detailed data are presented in Supplement 4.
Proteins with significant content differences were grouped into two clusters. These clusters included 35 and 34 proteins that were more abundant in A549 line-derived EVs and NCI-H23 line-derived EVs, respectively. Comparison of the EV and WhL proteomes revealed 62 and 56 proteins whose levels were at least twofold higher in the EV samples compared to the WhL samples of the A549 and NCI-H23 cell lines, respectively (Supplement 4, Supplementary Figures S5 and S6). The intersection of the two data sets yielded 21 and 8 proteins that were A549 line- and NCI-H23 line-specific, respectively (Figure 4, Table 1). The A549 line- and NCI-H23 line-specific protein subsets included the universal EV markers TUBA4A and CNP, as well as the tissue-specific EV markers TSG101 and VPS28.
Label-free quantification and multiple-sample testing (ANOVA, FDR = 0.01) resulted in 132 proteins with significant differences in abundance between Caco-2 line-derived EVs, HCT116 line-derived EVs, and HT29 line-derived EVs (Figure 5). Proteins with significant content differences were grouped into three clusters (Figure 5a). These clusters included 11 (cluster 1), two (cluster 2), and 86 (cluster 3) proteins that were more abundant in Caco-2 line-derived EVs, HCT116 line-derived EVs, and HT29 line-derived EVs, respectively. Two-sample testing of the EV and WhL proteomes yielded 48, 28, and 20 proteins whose levels were at least twofold higher in the EV fraction compared to the WhL samples of the Caco-2, HCT116, and HT29 cell lines, respectively (Supplement 5, Figures S8-S10). Pairwise intersection of the two data sets resulted in seven, two, and six proteins that were Caco-2 line-, HCT116 line-, and HT29 line-specific, respectively (Figure 5, Table 1). The Caco-2 line-, HCT116 line-, and HT29 line-specific protein subsets included the universal EV markers FN1 and ITGB3, as well as the tissue-specific EV marker GPRC5A.
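The two-criterion selection used for tissue-specific and line-specific markers (more abundant in the EVs of one group than in the EVs of the other groups, and at least twofold higher in EVs than in the WhL of the same group) amounts to a set intersection. A minimal Python sketch follows. The function names, protein labels, and fold-change values are hypothetical and only illustrate the logic; they are not taken from the supplements.

```python
import math
from typing import Dict, Set


def enriched_over_lysate(log2_fc_ev_vs_whl: Dict[str, float], min_fold: float = 2.0) -> Set[str]:
    """Proteins whose EV/WhL ratio is at least `min_fold` (input is log2 fold change)."""
    threshold = math.log2(min_fold)
    return {protein for protein, fc in log2_fc_ev_vs_whl.items() if fc >= threshold}


def specific_markers(
    higher_in_group_evs: Set[str],
    log2_fc_ev_vs_whl: Dict[str, float],
    min_fold: float = 2.0,
) -> Set[str]:
    """Intersection of the two criteria, as in the Venn diagrams."""
    return higher_in_group_evs & enriched_over_lysate(log2_fc_ev_vs_whl, min_fold)


# Hypothetical cluster of proteins more abundant in one line's EVs than in other lines' EVs.
toy_cluster = {"PROT_A", "PROT_B", "PROT_C"}
# Hypothetical log2 fold changes (EV versus WhL) for the same cell line.
toy_log2_fc = {"PROT_A": 3.1, "PROT_B": 0.4, "PROT_D": 2.2}

print(sorted(specific_markers(toy_cluster, toy_log2_fc)))
```

In this toy case only `PROT_A` satisfies both criteria: `PROT_B` is below the twofold threshold, `PROT_C` was not significantly enriched over the lysate, and `PROT_D` is not in the line-specific cluster.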
The universal, tissue-specific, and line-specific EV proteins are listed in Table 1. Their label-free results (fold change, p-value), annotations in terms of subcellular localization, and frequency of detection based on the "ExoCarta Top100 list" are presented in Table 2.
Figure 5. (a) Heatmap of proteins with significantly different abundance between Caco-2, HCT116, and HT29 line-derived EVs (Figure S7 shows the heatmap in detail); (b) Venn diagram that shows the intersection of proteins that were more abundant in Caco-2 line-derived EVs compared to HCT116 and HT29 line-derived EVs and proteins that were more abundant in the EV fraction compared to the WhL of the Caco-2 cell line; (c) Venn diagram that shows the intersection of proteins that were more abundant in HCT116 line-derived EVs compared to Caco-2 and HT29 line-derived EVs and proteins that were more abundant in the EV fraction compared to the WhL of the HCT116 cell line; (d) Venn diagram that shows the intersection of proteins that were more abundant in HT29 line-derived EVs compared to HCT116 and Caco-2 line-derived EVs and proteins that were more abundant in the EV fraction compared to the WhL of the HT29 cell line; (e) volcano plot that shows the differences in protein abundance in the EV fraction (in red) and in the WhL (in black) for the Caco-2 cell line. Overall, 1149 proteins identified by at least 4 unique peptides per protein were statistically significant (Student's t-test, truncation: permutation-based FDR = 0.01, S0 = 2, Perseus 1.6.0.7 software (Max Planck Institute of Biochemistry, Martinsried, Germany)). Figure S8 shows the volcano plot in detail; (f) volcano plot that shows the differences in protein abundance in the EV fraction (in red) and in the WhL (in black) for the HCT116 cell line. Overall, 1241 proteins identified by at least 4 unique peptides per protein were statistically significant (Student's t-test, truncation: permutation-based FDR = 0.01, S0 = 2, Perseus 1.6.0.7 software (Max Planck Institute of Biochemistry, Martinsried, Germany)). Figure S9 shows the volcano plot in detail; (g) volcano plot that shows the differences in protein abundance in the EV fraction (in red) and in the WhL (in black) for the HT29 cell line. Overall, 1552 proteins identified by at least 4 unique peptides per protein were statistically significant (Student's t-test, truncation: permutation-based FDR = 0.01, S0 = 2, Perseus 1.6.0.7 software (Max Planck Institute of Biochemistry, Martinsried, Germany)). Figure S10 shows the volcano plot in detail. For (e-g), normalized LFQ intensities were used for visualization of the protein level. The LFQ intensities were log2 transformed. Differences in protein abundance (taking into account the log2 transformation) are on the x-axis, while -Log p (log10 transformed p-value) is shown on the y-axis. Detailed data are presented in Supplement 5.
Table 2. Label-free results (fold change, p-value); annotations in terms of subcellular localization (Uniprot database) and frequency of detection based on the "ExoCarta Top100 list" (http://exocarta.org/exosome_markers_new). * Number of peptides identified in EV samples only; ** frequency of detection based on the "ExoCarta Top100 list" (http://exocarta.org/exosome_markers_new) containing molecules commonly identified in exosomes; *** subcellular localization based on the Uniprot database.

EV Protein Markers Are Enriched in Interactions and Are Involved in EGFR, Rap1, and Integrin Signaling
To assess the biological significance of universal, tissue-specific, and line-specific EV markers, we performed a STRING interaction analysis (Figure 6).
As expected, due to the small sample size, the colon tissue- and HCT-116 cell-derived EV proteins were not enriched in their interactions (Figure 6c,h). At the same time, statistically significant interactions were established for lung tissue, NCI-H23 line-, A549 line-, and Caco-2 line-specific proteins with medium confidence, as well as for universal EV proteins and HT29 cell-derived EV proteins with the highest confidence.
The EV markers were also annotated based on their functions (Biological Processes (Gene Ontology, GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome Pathways) and subcellular localization (cellular component (GO)) where possible (Figures S11-S16). Table 3 shows the most statistically significant groups of biological functions, pathways, and subcellular localization annotations. The detailed data are presented in Supplements 2-5.
The EV proteins of most groups were annotated as "extracellular exosome", "extracellular region", or "ESCRT I complex" by subcellular localization and belonged to the "extracellular matrix organization", "vesicle-mediated transport", "endosomal sorting complex required for transport (ESCRT)", or "endocytosis" groups in terms of biological processes and pathways. These data support the EV identification. At the same time, EV proteins were involved in the "negative regulation of epidermal growth factor receptor signaling", "Rap1 signaling pathway", "microRNAs in cancer", "integrin cell surface interactions", "ECM-receptor interaction", and "platelet activation, signaling, and aggregation" groups. Notably, most EV proteins within the annotation groups, even in the small-sized groups, were enriched in their interactions according to the STRING analysis. These results suggest that EVs regulate oncogenic pathways, i.e., EGFR, Rap1, integrin, and microRNA signaling, and can thereby contribute to metastasis and cancer progression.

Figure 6. STRING interaction analysis (a) for 11 proteins that were more abundant in EVs compared to the WhL: The network was built based on the highest confidence (0.9) and enriched in interactions (PPI enrichment p-value = 3.19 × 10⁻⁵); (b) for the 6 proteins that were specific for lung tissue-derived EVs, the network was built based on medium confidence (0.4), and the network was enriched in interactions (PPI enrichment p-value = 0.0005); (c) for the 4 proteins that were specific for colon tissue-derived EVs, the network was built based on medium confidence (0.4), and the network was not enriched in interactions (PPI enrichment p-value = 1); (d) for the 8 proteins that were specific for NCI-H23-derived EVs, the network was built based on medium confidence (0.4), and the network was enriched in interactions (PPI enrichment p-value = 0.035); (e) for the 21 proteins that were specific for A549 cell-derived EVs, the network was built based on medium confidence (0.4), and the network was enriched in interactions (PPI enrichment p-value = 0.0008); (f) for the 7 proteins that were specific for Caco-2-derived EVs, the network was built based on medium confidence (0.4), and the network was enriched in interactions (PPI enrichment p-value = 0.035); (g) for the 6 proteins that were specific for HT29 cell-derived EVs, the network was built based on the highest confidence (0.9), and the network was enriched in interactions (PPI enrichment p-value = 0.03); (h) for the 2 proteins that were specific for HCT-116 cell-derived EVs, the network was built based on medium confidence (0.4), and the network was not enriched in interactions (PPI enrichment p-value = 1). All networks were enriched using the intersection of 8612 genes present on all platforms as the background, along with evidence from experimental protein-protein interactions (PPI) (purple lines), text-mining (bright green lines), and curated (turquoise blue lines) databases.

Verification of Universal, Tissue-Specific, and Line-Specific EV Marker Levels by Targeted Mass-Spectrometry
To evaluate the levels of putative universal, tissue-specific, and line-specific EV markers, we applied SRM analysis with SIS. We measured the abundance of 12 proteins (CNP, EPS15, FN1, HSPG2, ITGB3, MFGE8, PTGFRN, RACGAP1, SDC4, SDCB1, TSG101, and TUBA4A) in all EV and WhL samples. Figure 7 demonstrates the levels of universal, tissue-specific, and line-specific EV markers.

Figure 7. Levels of universal, tissue-specific, and line-specific EV markers in the EV and WhL samples (sample labels include Caco2_M and HT29_M). * p-value < 0.01; ** p-value < 0.05. Error bars represent the standard deviation for measurements performed at 15, 9, 6, and 3 points for universal, lung tissue-specific, colon tissue-specific, and line-specific markers, respectively. For each cell line, EV isolation was performed in three biological replicates and yielded three EV samples. Each sample was analyzed using SRM with SIS in three technical replicates with a coefficient of variation below 20%. ND, not detected.

Figure 7 shows that among the universal markers, ITGB3 and HSPG2 were detected in the EV samples only, and their measured levels were 0.48 ± 0.44 fmol/µg and 0.23 ± 0.18 fmol/µg, respectively. The FN1 protein content was fourfold (p-value = 1.1 × 10⁻⁴) higher in the EV samples compared to the WhL. CNP and TUBA4A failed to be verified as universal markers, as there was no significant difference in their content. Nevertheless, three of five universal EV markers (FN1, ITGB3, and HSPG2) distinguished the EV samples from the WhL samples.
For lung-specific markers, the CNP, TSG101, and EPS15 content was 14.5-fold (p-value = 5.7 × 10⁻⁵), 6.6-fold (p-value = 7.7 × 10⁻⁴), and 5.9-fold (p-value = 3.3 × 10⁻⁴) higher, respectively, in the LC-derived EVs compared to the CRC-derived EVs. The colon-specific marker TUBA4A was 3.6-fold (p-value = 3.7 × 10⁻³) more abundant in CRC-derived EVs compared to LC-derived EVs. The levels of the putative lung-specific marker SDCB1 were higher in LC-derived EVs compared to CRC-derived EVs, but the difference was not significant. Thus, four of five tissue-specific markers (CNP, TSG101, EPS15, and TUBA4A) distinguished LC-derived EVs from CRC-derived EVs.
The A549-specific EV protein TSG101 levels were 3.4-fold (p-value = 2.9 × 10⁻³) higher in the A549 line-derived EV samples compared to the NCI-H23 line-derived EV samples. The A549-specific EV protein PTGFRN was detected only in the A549 line-derived EV samples at levels of 0.16 ± 0.07 fmol/µg. The NCI-H23-specific EV protein MFGE8 levels were 2.2-fold (p-value = 0.03) higher in NCI-H23 line-derived EV samples compared to the A549 line-derived EV samples. The NCI-H23-specific EV protein SDC4 was detected only in NCI-H23 line-derived EV samples at levels of 0.18 ± 0.17 fmol/µg. Therefore, four of five LC line-specific markers (TSG101, MFGE8, PTGFRN, and SDC4) distinguished NCI-H23 line-derived EVs from A549 line-derived EVs.
The HT29 line-specific markers PTGFRN and RACGAP1 were detected in HT29-derived EV samples at levels of 0.12 ± 0.03 fmol/µg and 2.0 ± 2.4 fmol/µg, respectively. The HCT-116 line-specific marker ITGB3 content was significantly higher (ANOVA p-value = 0.01) in HCT-116-derived EVs compared to the other CRC cell lines. The Caco-2 line-specific marker FN1 content was higher in Caco-2-derived EVs compared to the other CRC cell lines, but the difference was not significant (ANOVA p-value = 0.08). Therefore, three out of four CRC line-specific markers (ITGB3, PTGFRN, and RACGAP1) distinguished among HT29-, HCT-116-, and Caco-2-derived EVs.
The high standard deviation calculated for biological replicates may result from errors introduced by the sample preparation procedure, e.g., interference from residual FBS proteins that can contaminate the EV preparations.

Discussion
At a qualitative level, the mass spectrometric analysis allowed us to determine the components of the ESCRT complex (TSG101 and PDCD6IP) and the proteins associated with exosome biogenesis (SDCBP, SDC4, VPS28, VPS37B, MFGE8, ARF6, VPS32, CD82, FLOT1, and FLOT2) [12,13], which were identified in our experiments with high confidence (by at least four unique peptides). Moreover, exosome-characteristic tetraspanins CD63 (one unique peptide), CD9 (two unique peptides), and CD81 (two unique peptides) were identified by means of shotgun mass-spectrometry, but they did not pass the four unique peptide threshold for label-free quantification. Thus, mass-spectrometric profiling allowed for the simultaneous analysis of exosome-characteristic proteins in the EV samples. At the same time, EVs are heterogeneous in their composition and traits [13]. Therefore, the results of the mass spectrometric analysis showed an average proteomic landscape of EVs. The enrichment of various EV populations followed by mass spectrometric analysis is required for a more accurate characterization of EV protein composition.
At the same time, a number of proteins apparently originating from supplemental FBS (e.g., APOA1, APOB, fibrinogen chains, etc.) were identified. This observation suggests that residual FBS from the media contaminated the samples and interfered with the mass-spectrometric analysis, as FBS-derived peptides compete with EV-derived peptides for MS acquisition. Because amino acid sequences are conserved among mammals, the proteins from FBS could be identified as human EV proteins, thus masking valuable biological data. The fact that culture media supplemented with 10-20% FBS can mimic diluted blood plasma will need to be taken into account when performing a proteomic analysis on plasma-isolated EVs. Highly abundant plasma proteins such as albumin, α-2-macroglobulin (A2M), and hemoglobin subunit α (HBA1) have been assigned to EVs isolated from blood plasma [34], and some of them (i.e., albumin and A2M) are listed in the "ExoCarta Top100 list" (http://exocarta.org/exosome_markers_new), but it is very difficult to confirm that they originate from EVs.
Using label-free mass-spectrometric profiling, we determined 11 proteins whose levels were higher in all EV samples compared to WhL. We denominated them as universal EV markers. Applying SRM with SIS, we verified FN1, ITGB3, and HSPG2 as EV-specific proteins. Previously, FN1 was determined to be an LC-associated marker on A549 cell line-derived and blood-derived EVs using mass-spectrometric profiling [25]. Moreover, FN1 was found to be up-regulated in the blood-derived EVs of smokers and patients with chronic obstructive pulmonary disease by proteomic methods [34]. Fibronectin is a ubiquitous and essential component of the extracellular matrix (ECM). It plays a role in tissue remodeling and wound-healing alongside the HSPG2 protein [35], which we also verified as a universal EV protein in our study. The verified universal EV marker ITGB3 serves as a receptor for components of the ECM, including FN1, and plays roles in the progression of different cancer-associated processes, including initiation, proliferation, survival, migration, and invasion [36]. Notably, we named the core EV proteins "universal EV proteins", but the applicability of this term to other cell types (liver, brain, skin epithelium, etc.) must be proven. Nevertheless, six so-called universal EV proteins in our study (FN1, GNAI2, ITGB1, CDC42, MVP, and TLN1) overlapped with the core EV proteins determined from proteomic profiling of the EVs derived from 60 cell lines of different origins (NCI-60) [37].
In terms of biological processes, some markers are associated with pro-cancerous properties and are involved in tumor progression and metastasis. According to the database annotation, universal EV proteins (CDC42, GNAI2, ITGB3, TLN1, and TUBA4A) are involved in platelet activation, which helps cancer cells escape immune surveillance and provides a prometastatic microenvironment [38]. The universal EV proteins CDC42, GNAI2, ITGB1, ITGB3, and TLN1 were assigned to the Rap1 signaling pathway, which regulates cell invasion and metastasis by affecting cell adhesion and modulating the expression of matrix metalloproteinases [39]. The Ras-associated protein-1 (Rap1) that triggers the Rap1 signaling pathway was identified in our study using eight peptides, and its abundance was at least four-fold higher in the EV fraction compared to the WhL samples.
Among the CRC cell line markers, the Caco-2 line-specific EV protein prominin-1 (PROM1, also known as CD133) is a marker of cancer stem cells and is associated with metastasis in CRC; PROM1 overexpression renders tumors resistant to chemotherapy and radiation therapy [40]. The knock-down of APLP2, which is also a Caco-2 line-specific EV protein, reduced the proliferation of this particular cell line [41]. For LC cell line markers, the A549 line-specific EV proteins CD109 and PTGFRN were found to be metastasis-associated in lung cancer [42]. Moreover, the CD109 protein triggered the process of metastasis in an NSCLC mouse model with a genetic background similar to that of the human A549 cell line, including a mutation in the KRAS gene [43]. The NCI-H23 line-specific EV protein ICAM-1 promotes cell-endothelial adhesion, which is an important step in metastasis development [44].
Several EV markers apparently involved in miRNA regulation could be oncogenic. The HT29 line-derived EV proteins ST14 and KIF23 were annotated against the KEGG database as belonging to the signaling pathway "MicroRNAs in cancer". The proteins involved in RNA-mediated gene silencing, i.e., TSN, DHX9, SND1, MOV10, and CNOT1, were identified in our experiments by at least four peptides, but their levels were higher in the WhL samples compared to the EVs derived from the HT29 cell line.
These results highlight the oncogenic proteomes of EVs that are associated with LC and CRC progression and metastasis, the components of which represent a promising source of predictive and prognostic markers.
The verified lung cancer-specific EV protein EPS15 is of special interest. EPS15 (epidermal growth factor receptor substrate 15) is involved in the receptor-mediated endocytosis of EGFR [45]. Overexpression of the EPS15 gene is considered to be a favorable prognostic factor [46]. Moreover, in our study, EPS15 and the other lung cancer-specific EV markers, GPRC5 and TSG101, were associated with the "negative regulation of the epidermal growth factor receptor" signaling pathway. The EGFR itself was identified in both LC cell lines.
Furthermore, a number of EV markers play the role of tumor suppressors and inhibit proliferation. The CRC-specific EV marker DMBT1 may act as a tumor suppressor [34,35], whose loss could be a poor prognostic factor [47]. The suppressor of tumorigenicity 14 (ST14) protein, which is specific for the CRC cell line HT29, maintains epithelial barrier integrity and suppresses intestinal carcinogenesis [48]. The ability to suppress cancer metastasis was shown for the HCT-116 line-specific EV protein stomatin (STOM) [49]. The serine peptidase inhibitor Kunitz type 2 (SPINT2), a Caco-2 line-specific EV protein, inhibits HGF and suppresses the progression of various types of cancer [50]. The universal marker MVP was shown to assist in the removal of tumor suppressor microRNAs (miR-193a) from cancer cells [51]. These data also suggest that tumor suppressor proteins could be withdrawn from cancer cells by EVs.
The controversial cancer-related functions of EV proteins suggest that EV groups with specific molecular signatures and diversified functions may be isolated. Putatively, one EV subset may serve cancer cells as a disposal system for oncosuppressive molecules, while another EV subset may be a means for the transmission of an oncogenic signal. However, this concept must be experimentally proven.
The label-free quantitative analysis determined universal, tissue-specific, and line-specific EV protein markers. We verified 12 EV markers (CNP, EPS15, FN1, HSPG2, ITGB3, MFGE8, PTGFRN, RACGAP1, SDC4, SDCB1, TSG101, and TUBA4A) via targeted mass-spectrometry (SRM using SIS, 1 peptide per protein). These proteins could be the backbone for the development of an SRM-based assay for LC and CRC screening. The next step will be the validation of EV markers on plasma samples from patients with LC and CRC. In this case, putative LC and CRC biomarkers should be analyzed against the complex background of EVs originating from blood cells, i.e., platelets, erythrocytes, and leukocytes [52]. Previously, we studied EVs isolated from the blood plasma of healthy volunteers and used SRM analysis to determine the levels of the exosomal markers CD9, CD82, and HSPA8 [53]. According to the literature, the content of the FN1, TUBA4A, and MVP proteins, identified by us as universal EV markers, is up-regulated in the blood plasma of patients with NSCLC compared to healthy donors [54].
Being prometastatic or tumor suppressive, the proteins in EVs play a crucial role in tumorigenesis and, therefore, have high diagnostic potential. The simultaneous targeted mass-spectrometric measurement of EV markers in human blood is a promising liquid biopsy tool.

Materials and Methods
To derive EVs, we used the colon cancer cell lines Caco-2, HT29, and HCT-116, as well as the lung cancer cell lines NCI-H23 and A549 as model objects. Cell culturing was performed in the presence of FBS to avoid starvation and oxidative stress. To prevent the contamination of the human EV proteome by bovine EV proteins, the culture medium was supplemented with exosome-depleted FBS prior to the mass-spectrometric analysis. The starting volume was rather small (18 mL) compared to that used in previous studies on EV isolation from culture medium (120-500 mL [37,55]) to make the exosome isolation protocol applicable to liquid biopsy volumes (7.5-40 mL [23,56,57]) in the future.

Cultivation of Cell Lines
The lung adenocarcinoma cell lines A549 and NCI-H23 and colorectal cancer cell lines HT29, HCT-116, and CaCo-2 were obtained from the cell culture bank of the Institute of Biomedical Chemistry (IBMC), Moscow, Russia.
For proteomic analysis of the lung adenocarcinoma cell lines (A549 and NCI-H23) and colorectal adenocarcinoma cell lines (HT29, HCT-116, and CaCo-2), the cell lines were cultured in a medium supplemented with exosome-depleted FBS.
When the cells reached 70-80% confluency, they were washed twice with phosphate-buffered saline (PBS), and the culture medium was replaced with an exosome-free medium (supplemented with FBS previously depleted of exosomes via ultracentrifugation at 100,000× g for 14 h). The culture medium was collected after 24 h for further analysis. Cells were detached from the culture flasks via the addition of 2 mL of 0.25% trypsin-EDTA (Gibco™, Paisley, UK) for 5-10 min at 37 °C. The number of cells and cell viability were measured using a TC20™ Automated Cell Counter (BioRad, Hercules, CA, USA) and a cell counting kit (BioRad, Hercules, CA, USA). Cell viability ranged from 97.1% to 99.6% (Supplementary Figure S19). All cell lines were tested for mycoplasma contamination.

Isolation of EVs from the Culture Medium and Sample Preparation for Mass Spectrometry Analysis
Isolation of the exosomes from 18 mL of the culture medium supplemented with FBS and their tryptic digestion were carried out as described previously [53]. Briefly, an equal volume (18 mL) of the culture medium from each sample was centrifuged at 5000× g for 30 min at 4 °C [...]. In the obtained samples, the peptide concentrations were determined by the colorimetric method using a Pierce™ Quantitative Colorimetric Peptide Assay kit (Pierce, Rockford, IL, USA) in accordance with the manufacturer's recommendations. The peptides were dried and dissolved in 0.1% formic acid to a final concentration of 2 µg/µL. Based on these measurements, an equal quantity of total peptides, i.e., 2 µg, was compared for all samples via label-free mass-spectrometric profiling. The total peptide amount is presented in Supplementary Table S2.

Obtaining WhL
Cells were washed free of the culture medium by centrifugation in cold PBS at 1500× g for 5 min at 4 °C, and the procedure was repeated 3 times. Ten volumes of a lysis buffer containing 1% SDS (Sigma-Aldrich, St. Louis, MO, USA) in 0.1 M Tris Cl (pH 7.6) were added to the pellet, and the samples were sonicated with a Bandelin Sonopuls probe ("BANDELIN electronic GmbH & Co. KG", Berlin, Germany) at 50% power for 5 min on ice. Then, the samples were centrifuged for 15 min at 14,000× g at 4 °C. The protein concentration was determined by the colorimetric method using a Pierce™ BCA Protein Assay Kit (Pierce, Rockford, IL, USA) in accordance with the manufacturer's recommendations.

Tryptic Digestion of the WhL
Tryptic digestion of the proteins was carried out according to the FASP (Filter-Aided Sample Preparation) protocol [58] with some changes. Briefly, each sample in an amount of 100 µg was transferred to concentration filters with a cut-off of 10 kDa (Merck Millipore Limited, Tullagree, Ireland) by centrifugation at 11,000× g for 15 min at 20 °C. To break the disulfide bonds, each sample was incubated with 30 mM Tris TCEP (Thermo Fisher Scientific, Waltham, MA, USA) and 50 mM CAA (Sigma-Aldrich, St. Louis, MO, USA) at 42 °C for 1 h. Then, the samples were washed 3 times with a buffer containing 8 M urea (Sigma-Aldrich, St. Louis, MO, USA) in 100 mM Tris Cl, pH 8.5, and washed twice with 50 mM TEAB, pH 8.5, by centrifugation at 11,000× g for 15 min at 20 °C. Then, 100 µL of a buffer containing 0.02% ProteaseMAX (Promega, Fitchburg, WI, USA) in 50 mM TEAB (Sigma-Aldrich, St. Louis, MO, USA), pH 8.5, and trypsin (Promega, Fitchburg, WI, USA) at a "trypsin to total protein" ratio of 1:50 was added. The samples were incubated overnight at 37 °C. After incubation, the peptides were eluted by centrifugation at 11,000× g for 15 min at 20 °C, and the filter was washed twice with 50 µL of 1% formic acid. In the obtained samples, the peptide concentration was determined by the colorimetric method using a Pierce™ Quantitative Colorimetric Peptide Assay kit (Pierce, Rockford, IL, USA) in accordance with the manufacturer's recommendations. The peptides were dried and dissolved in 0.1% formic acid to a final concentration of 2 µg/µL.

Shotgun Mass Spectrometry
Tandem mass spectrometric analysis was performed for each sample in three technical replicates. The peptide mixture was loaded onto a Zorbax 300SB-C18 trap column (5 µm particle diameter, 5 × 0.3 mm) (Agilent Technologies, Santa Clara, CA, USA) and washed with the mobile phase C (5% acetonitrile in 0.1% formic acid and 0.05% trifluoroacetic acid) at a flow rate of 3 µL/min for 5 min. The peptides were separated on an analytical column Zorbax 300SB-C18 (3.5 µm particle diameter, 150 mm × 75 µm) (Agilent Technologies, Santa Clara, CA, USA) in a mobile phase B gradient (80% solution of acetonitrile in 0.1% formic acid) at a flow rate of 0.3 µL/min. The following parameters of the acetonitrile gradient were used: the analytical column was washed with 2% mobile phase B for 3 min, and then the concentration of mobile phase B was linearly increased to 40% over 67 min. Then, the concentration of mobile phase B was increased to 100% over 2 min, and the analytical column was washed for 9 min with 100% mobile phase B. Next, the concentration of mobile phase B was reduced to 2% over 2 min, and the analytical column was equilibrated with 2% mobile phase B for 7 min.
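The gradient program described above can be written down as a small timetable, which makes the total run length easy to check. The following is a minimal sketch, assuming that each "for N min" in the text denotes the duration of that segment and that the composition changes linearly within a segment; the names `SEGMENTS` and `percent_b` are illustrative and do not come from the instrument software.

```python
# Gradient segments as (duration in min, %B at start, %B at end),
# transcribed from the description of the analytical gradient.
SEGMENTS = [
    (3, 2, 2),      # wash at 2% B
    (67, 2, 40),    # linear ramp to 40% B
    (2, 40, 100),   # ramp to 100% B
    (9, 100, 100),  # wash at 100% B
    (2, 100, 2),    # return to 2% B
    (7, 2, 2),      # re-equilibration at 2% B
]


def percent_b(t_min):
    """Return %B at time t_min, interpolating linearly within a segment."""
    start = 0.0
    for duration, b_start, b_end in SEGMENTS:
        end = start + duration
        if t_min <= end:
            fraction = (t_min - start) / duration
            return b_start + fraction * (b_end - b_start)
        start = end
    raise ValueError("time is beyond the end of the gradient")


# Total length of one LC run under these assumptions.
total_run_min = sum(duration for duration, _, _ in SEGMENTS)
print(total_run_min)
```

Under this reading, the segments add up to a 90 min method per injection.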
Mass spectrometry analysis was performed using a Q Exactive™ HF Hybrid Quadrupole-Orbitrap™ Mass Spectrometer (Thermo Scientific, Waltham, MA, USA) equipped with an Orbitrap mass analyzer. Mass spectra were acquired in the positive ion mode with a resolution of 60,000 (m/z = 400) for the MS and 15,000 (m/z = 400) for the MS/MS scans. The AGC target was set to 3 × 10⁶ and 2 × 10⁵ with a maximum ion injection time of 25 and 150 ms for the MS and MS/MS levels, respectively. The survey MS scan was followed by the MS/MS spectra of the 20 most abundant precursors if the AGC target was greater than 10⁴. HCD fragmentation with the normalized collision energy (NCE) set to 28% was used. The dynamic exclusion duration was 60 s.

Data Analysis: Protein Identification and Label-Free Relative Quantitation
For identification and label-free quantification, mass spectrometry data were loaded into the MaxQuant software (version 1.6.0.16, Max Planck Institute of Biochemistry, Martinsried, Germany). Proteins were identified using the built-in Andromeda algorithm. Identification was carried out using the FASTA file (Uniprot release 25-10-2019, EMBL-EBI, Hinxton Cambridge, UK) and its inverted counterpart to calculate the frequency of false positive identifications (FDR), alongside a built-in database of potential contaminants. The carbamidomethylation of cysteine was used as a fixed modification, and methionine oxidation and N-terminal acetylation were used for variable modification.
The tolerance for the precursor and fragment ions was 20 ppm. For proteins and peptides, the FDR threshold value was 0.01. Quantitative analysis was carried out on the basis of the area under the peak of the parent ion with calculation of the LFQ value performed using the algorithm built into MaxQuant (version 1.6.0.16, Max Planck Institute of Biochemistry, Martinsried, Germany) [59]. Unique peptides without modifications were used for the quantitative assessment. Potential contaminants, false positive identifications, and proteins identified only by peptides containing modifications were removed from the potentially identified proteins.
The statistical analysis was performed in the Perseus 1.6.0.7 software (Max Planck Institute of Biochemistry, Martinsried, Germany). To compare three groups, we used a multi-sample ANOVA test. To compare two groups, we used a two-sample t-test. The permutation-based FDR threshold (correction for multiple comparisons) was 0.01, with S0 = 2. We compared only the proteins for which at least 4 unique peptides per protein were identified.
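The two-group comparison can be illustrated with a short calculation on log2-transformed LFQ intensities. This is a simplified sketch, not the Perseus implementation: it assumes that S0 acts as a constant added to the standard error in the denominator of the t-statistic (the SAM-style modification), and it omits the permutation-based FDR step. The function name and the input values are hypothetical.

```python
import math
from statistics import mean, variance


def s0_t_statistic(group_a, group_b, s0=2.0):
    """Compare two groups of raw LFQ intensities for one protein.

    Returns (log2 difference, Student's t, S0-modified statistic).
    Assumes each group has at least two values with non-zero variance overall.
    """
    a = [math.log2(x) for x in group_a]
    b = [math.log2(x) for x in group_b]
    n_a, n_b = len(a), len(b)
    diff = mean(a) - mean(b)  # x-axis value of a volcano plot
    # Pooled variance of the classical equal-variance Student's t-test
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = diff / se
    d_stat = diff / (se + s0)  # S0 damps proteins with small differences
    return diff, t_stat, d_stat


# Hypothetical intensities for one protein in EV and WhL replicates
diff, t_stat, d_stat = s0_t_statistic([8.0, 16.0, 32.0], [1.0, 2.0, 4.0])
print(diff, t_stat, d_stat)
```

The effect of a larger S0 is that a protein needs a larger log2 difference, and not only a small variance, to rank as significant.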
Venn diagrams were generated using the online tool Venny version 2.1 (BioinfoGP, Madrid, Spain). The STRING database v.11.0 was used to retrieve the protein-protein interactions (PPIs) from the lists of EV proteins. Medium (0.4) and highest (0.9) confidence scores were applied. The active interaction sources were text mining, experiments, and databases. The built-in functional enrichment analysis results according to the cellular components (GO), Reactome pathways, and KEGG pathways (where available) were used for visualization.
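The Venn diagram logic used to define line-specific markers is a set intersection: a protein qualifies if it is more abundant in the EVs of one line than in the EVs of the other lines and also more abundant in the EVs than in the WhL of that line. A minimal sketch follows; the function name and the protein identifiers are placeholders, not results from the study.

```python
def line_specific_markers(higher_than_other_lines, higher_than_whl):
    """Return proteins present in both comparison lists."""
    return set(higher_than_other_lines) & set(higher_than_whl)


# Placeholder identifiers standing in for protein lists exported from the
# statistical analysis.
up_vs_other_lines = {"PROT_A", "PROT_B", "PROT_C"}
up_vs_whl = {"PROT_B", "PROT_C", "PROT_D"}

markers = line_specific_markers(up_vs_other_lines, up_vs_whl)
print(sorted(markers))
```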

Synthesis of SIS
The target peptides were selected from the shotgun mass-spectrometric data: analysis of 5 cell lines in 3 biological replicates and in 3 technical replicates resulted in 45 LC-MS/MS runs. The criteria of selection were as follows. For universal markers, the LFQ intensity of the proteins had to be calculated in at least 40 of the 45 LC-MS/MS runs. The amino acid sequence had to be unique within the biological species Homo sapiens. The amino acid sequence could not contain cysteine (C), methionine (M), N-terminal glutamic acid (E), glutamine (Q), or tryptophan (W), and could not contain missed cleavage sites. The length of the peptides had to be within the range of 9-20 amino acid residues. For proteins with a large number of peptides, the quality of the high-resolution MS spectra was manually evaluated.
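The sequence-based selection criteria can be expressed as a simple filter. The sketch below rests on several assumptions that are not stated in the text: the "N-terminal" restriction is read as applying to both E and Q, W is excluded at any position, a missed cleavage is an internal K or R not followed by P, and the peptide is expected to end in K or R so that a labeled C-terminal residue can be incorporated. Uniqueness within Homo sapiens is passed in as a flag because it requires a database search. The function name and the example sequences are hypothetical.

```python
def passes_sis_criteria(peptide, is_unique_in_human=True):
    """Check a candidate tryptic peptide against the selection criteria."""
    seq = peptide.upper()
    if not is_unique_in_human:
        return False
    if not 9 <= len(seq) <= 20:          # length window of 9-20 residues
        return False
    if any(aa in seq for aa in "CMW"):   # residues prone to modification
        return False
    if seq[0] in "EQ":                   # N-terminal E/Q can cyclize
        return False
    # Missed cleavage: internal K or R that trypsin should have cut after
    for i, aa in enumerate(seq[:-1]):
        if aa in "KR" and seq[i + 1] != "P":
            return False
    return seq[-1] in "KR"               # tryptic C-terminus (assumption)


# Hypothetical sequences used only to exercise the filter
candidates = ["LSEPAELTDAVK", "ELSEPAELTDAVK", "LSEPACLTDAVK", "LSEPAKLTDAVK"]
selected = [p for p in candidates if passes_sis_criteria(p)]
print(selected)
```

The manual evaluation of spectrum quality and the requirement for LFQ values in at least 40 of 45 runs are not covered by this filter.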
Solid-phase peptide synthesis was performed using the Overture™ Robotic Peptide Library Synthesizer (Protein Technologies, Manchester, UK), as described previously [60]. In the synthesis of isotope-labeled peptides, the isotopically-labeled amino acids Fmoc-Lys-OH-13C6.15N or Fmoc-Arg-OH-13C6.15N (Cambridge Isotope Laboratories, Cambridge, MA, USA) were used instead of the usual lysine or arginine.

Quantitative Analysis of EV Markers by Targeted Mass-Spectrometry
Each experimental sample was analyzed in three technical replicates. The measurements were carried out on the same samples as the shotgun mass-spectrometric analysis. Before analysis, the samples were dried in a vacuum concentrator and reconstituted in 0.1% formic acid containing SIS in an equimolar concentration of 200 fmol/µL. The final content of each SIS was 40 fmol/µg of total peptides. Chromatographic separation was performed using an Agilent 1200 series system (Agilent Technologies, Santa Clara, CA, USA) connected to a TSQ Quantiva triple quadrupole mass analyzer (Thermo Scientific, Waltham, MA, USA). A sample was separated using an analytical column ZORBAX SB-C18 (150 × 0.5 mm, 5 µm particle diameter) (Agilent Technologies, Santa Clara, CA, USA) in a gradient of acetonitrile with a flow rate of 20 µL/min. First, the column was equilibrated with 5% solution B (80% acetonitrile in 0.1% formic acid) and 95% solution A (0.1% formic acid) for 5 min. Then, the concentration of solution B was linearly increased to 50% over 30 min, after which the concentration of solution B was increased to 99% in 1 min, and the column was washed with 99% solution B for 5 min. Then, the concentration was returned to the initial conditions in 1 min, and the column was equilibrated for 9 min. A mass spectrometry analysis was performed in the dynamic selected-reaction monitoring (dSRM) mode using the following settings of the MS detector: the capillary voltage was 4000 V, the flow rate of the drying gas (nitrogen) was 7 L/min, the flow rate of the auxiliary gas (nitrogen) was 5 L/min, the capillary temperature was 350 °C, the isolation window for the first and third quadrupole was 0.7 Da, the scan cycle time was 1.2 s, and the collision gas (argon) pressure in the second quadrupole was set at 1.5 mTorr. The retention time window on the reverse phase column was 2.2 min for each precursor ion.
The transition and normalized collision energy (V) lists are presented in Supplement 6 (Sheet "SRM Table"). The data were loaded into the Skyline software v4.1.0 (MacCoss Lab Software, Seattle, WA, USA), where the SRM spectra were manually evaluated. The ratio of natural peptides to their SIS counterparts was automatically calculated for each peptide.
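The conversion from a natural-to-SIS ratio to a protein level in fmol/µg can be sketched as follows. This is an illustrative calculation, assuming single-point quantification in which the endogenous peptide content equals the light/heavy peak-area ratio multiplied by the spiked SIS content (40 fmol per µg of total peptides); the function names and the peak areas are hypothetical and are not Skyline outputs from the study.

```python
from statistics import mean, stdev

SIS_FMOL_PER_UG = 40.0  # spiked SIS content per µg of total peptides


def endogenous_fmol_per_ug(light_area, heavy_area, sis_fmol_per_ug=SIS_FMOL_PER_UG):
    """Single-point estimate: (natural / SIS peak-area ratio) x spiked SIS content."""
    if heavy_area <= 0:
        raise ValueError("SIS peak area must be positive")
    return light_area / heavy_area * sis_fmol_per_ug


def mean_and_cv(values):
    """Mean and coefficient of variation (%) across technical replicates."""
    m = mean(values)
    return m, stdev(values) / m * 100.0


# Hypothetical peak areas for three technical replicates of one sample
replicates = [
    endogenous_fmol_per_ug(1.15e4, 1.0e6),
    endogenous_fmol_per_ug(1.20e4, 1.0e6),
    endogenous_fmol_per_ug(1.25e4, 1.0e6),
]
level, cv_percent = mean_and_cv(replicates)
print(round(level, 3), round(cv_percent, 1))
```

A replicate set would be accepted under the 20% coefficient-of-variation criterion mentioned for the technical replicates when `cv_percent` stays below 20.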

Cryo-EM
Prior to the cryo-EM study, 3 µL of the sample was applied to a Lacey Carbon EM grid treated with a glow discharge (30 s, 25 mA) in a Pelco EasiGlow. After blotting for 2.5 s at 4 °C, the grid with the specimen was plunge-frozen in liquid ethane chilled with liquid nitrogen in a Vitrobot Mark IV (Thermo Fisher Scientific, Waltham, MA, USA).
The cryo-EM study was carried out on a Titan Krios 60-300 (Thermo Fisher Scientific, Waltham, MA, USA) transmission electron microscope equipped with a Falcon II direct electron detector (Thermo Fisher Scientific, Waltham, MA, USA) and a Cs image corrector (CEOS, Heidelberg, Germany) at an accelerating voltage of 300 kV. Images were collected at 18,000× magnification (pixel size 3.7 Å) in low-dose mode using EPU software (Thermo Fisher Scientific, Waltham, MA, USA).

Conclusions
Studying the proteomic cargo of EVs revealed the proteins associated with platelet activation, EGFR, Rap1, integrin, and microRNA signaling that could regulate metastasis and cancer progression. The EV protein subsets can distinguish different tissues and cell lines within the same type of cancer. Additionally, the proteomic core of the EVs of an epithelial lineage was established. The resulting EV protein list will provide a backbone for the development of a targeted mass-spectrometry assay that can be applied as a liquid biopsy tool.