Small Extracellular Vesicles Isolated from Serum May Serve as Signal-Enhancers for the Monitoring of CNS Tumors

Liquid biopsy-based methods to test biomarkers (e.g., serum proteins and extracellular vesicles) may help to monitor brain tumors. In this proteomics-based study, we aimed to identify a characteristic protein fingerprint associated with central nervous system (CNS) tumors. Overall, 96 human serum samples were obtained from four patient groups, namely glioblastoma multiforme (GBM), non-small-cell lung cancer brain metastasis (BM), meningioma (M) and lumbar disc hernia patients (CTRL). After the isolation and characterization of small extracellular vesicles (sEVs) by nanoparticle tracking analysis (NTA) and atomic force microscopy (AFM), liquid chromatography-mass spectrometry (LC-MS) was performed on two different sample types (whole serum and serum sEVs). Statistical analyses (ratio, Cohen’s d, receiver operating characteristic; ROC) were carried out to compare patient groups. To recognize differences between the two sample types, pairwise comparisons (Welch’s test) and ingenuity pathway analysis (IPA) were performed. To our knowledge, this is the first study that compares the proteome of whole serum and serum-derived sEVs. Of the 311 proteins identified, 10 whole serum proteins and 17 sEV proteins showed the highest intergroup differences. Sixty-five proteins were significantly enriched in sEV samples, while 129 proteins were significantly depleted compared to whole serum. Based on principal component analysis (PCA), sEVs are more suitable to discriminate between the patient groups. Our results support that sEVs have greater potential than whole serum for monitoring CNS tumors.

for the reliable evaluation of actual tumor status [25][26][27] but none of these proteins alone was found to be sufficiently specific and sensitive to serve as a monitoring marker.
Given that previous attempts to find surrogate serum markers for brain tumors have failed when based on a single or only a few candidate factors, we attempted to identify a characteristic protein fingerprint of 10-20 candidate markers associated with CNS tumors.
For this purpose, 96 serum samples were collected from four patient groups according to the criteria of the National Ethical Committee and proteomics analysis was performed using liquid chromatography and mass spectrometry (LC-MS). The serum samples were obtained from patients diagnosed with the two most common types of brain tumors [28], namely malignant glioblastoma multiforme (GBM) and benign meningioma (M), as well as from patients with a prevalent brain metastasis [29] originating from non-small-cell lung cancer (BM). Patients with lumbar disc herniation served as controls (CTRL). Following a statistical selection, these four patient groups were compared with respect to the identified proteins. In parallel, small extracellular vesicles (sEVs) were isolated from the serum samples by differential centrifugation, and proteomics and statistical analyses were also performed on these sEV samples, allowing a comparison of the suitability of the two sample types. To the best of our knowledge, this is the first study that compares the proteome of whole serum and serum-derived sEV samples. Results from the proteomics analysis indicate that using a protein fingerprint of serum-derived sEVs instead of analyzing whole serum increases the accuracy of distinguishing between the clinical samples, that is, between the patient groups. Our results support that sEVs have a greater potential for the proteomics-based monitoring of CNS tumors compared to whole serum analysis.

EV Samples Show sEV Properties with Similar Concentration and Size Distribution in the Different Patient Groups
To verify the value of circulating extracellular vesicles as potential biomarkers for CNS tumors, EVs were isolated from the serum of patients with glioblastoma multiforme (GBM), single brain metastasis originating from non-small-cell lung cancer (BM) and meningioma (M), as well as from control patients with lumbar disc herniation (CTRL). Each group included 24 individuals of both genders and various ages. Extracellular vesicles were isolated from the sera by differential centrifugation and were characterized by atomic force microscopy (AFM) and nanoparticle tracking analysis (NTA). Pools of 6 samples were formed in all groups, allowing four parallel samples to be tested per group (see Section 4.1). Western blot analyses were also performed to confirm the EV nature of the isolates (Figure S1).
EVs were divided into subtypes based on their size range, separating small EVs (sEVs) and medium/large EVs (m/lEVs) [30]. AFM analysis revealed that the small EV subtype includes various structures. Mean and mode diameters of the particles, represented by an average of the 16 sample pools, were measured by NTA as 112 nm and 86 nm, respectively (Figure 1A).
The quantitative characterization of serum sEVs by NTA (Figure 1B) revealed no significant differences between the four patient groups regarding the size and concentration of circulating sEVs. However, within the groups high individual differences were observed in the measured parameters of the sEVs.

Statistical Analysis of LC-MS Data Reveals Characteristic Proteomic Fingerprints for Each Patient Group and Informs on the Suitability of the Two Different Sample Types in Distinguishing CNS Tumors
We aimed to identify the differences between the four patient groups to reveal the characteristic protein profiles associated with the CNS tumors in question. Using an intensity ratio of >2 or <0.5 together with a Cohen's d effect size of >2 as cut-offs, we investigated which proteins show a reliable intensity difference and which proteins can separate at least one group from the others based on a receiver operating characteristic (ROC) analysis. Moreover, utilizing principal component analysis (PCA) with k-means clustering, we were able to compare the suitability of the two different sample types to distinguish between the CNS tumors in question. Figure 2 shows the flowchart of LC-MS data processing and the results of the statistical analyses.

Proteomics analyses by LC-MS (Step 1) were performed on whole serum and sEV samples obtained from patients with GBM, BM, M and CTRL. Individual samples (n = 24) in each group were arranged into 4 pools (see Section 4.1) to eliminate individual variances, reduce sample number, shorten the time of LC-MS measurements and reduce the need for materials. The spectral library constructed in data-independent acquisition (DIA) mode revealed 311 proteins (see Table S1). Based on Pearson's correlation analyses (Step 2), one of the sEV control samples had to be excluded from further statistical analyses (Table S2). After excluding unreliable proteins, as well as proteins with missing values (Step 3), a total of 262 proteins remained for the final analysis. Following basic processing, up- and down-regulated protein discovery (Step 4) resulted in 41 whole serum proteins and 45 sEV proteins. In addition to comparing each CNS tumor group to CTRL, between-group differences among the CNS tumor groups were also assessed in the protein selection process. As clinical relevance is an important consideration for selecting the proteins identified, Cohen's d effect size was adopted as an indicator of between-group difference. The Cohen's d effect size analysis (Step 5) with a threshold of d > 2 yielded 10 and 21 proteins in the whole serum and sEV samples, respectively. In the ROC analyses (Step 6), 10 whole serum proteins (MMP9, HSPB1, CASP14, HBG1, IGHG4, DEFA1, VWF, HNRNPA1, S100A8, TLN1) and 17 sEV proteins (MMP9, HSPB1, CASP14, HBG1, FGB, GGCT, PF4, S100A7, FN1, ANPEP, FLG2, HSPA8, IGLL1, MMRN1, S100A14, SBSN, SPRR2E) were found to meet the AUC = 1 selection criterion. Table S3 lists the UniProt ID, gene symbol, ratio of intensity means (> 2 or < 0.5) and Cohen's d effect size (> 2) for the selected proteins. The two sample types shared four significantly altered proteins (highlighted in Table S3), namely MMP9, CASP14, HBG1 and HSPB1.
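The three selection criteria described above (ratio of intensity means, Cohen's d and ROC AUC) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the pooled-standard-deviation form of Cohen's d, the treatment of AUC values of 0 and 1 as equally perfect separation and the example intensities are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def intensity_ratio(a, b):
    """Ratio of the mean intensities of two groups."""
    return mean(a) / mean(b)


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation (one common definition)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    if pooled_var == 0:
        raise ValueError("effect size is undefined when both groups have zero variance")
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def auc(pos, neg):
    """ROC AUC as the fraction of (pos, neg) pairs with pos > neg; ties count 0.5."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def passes_selection(a, b, ratio_hi=2.0, ratio_lo=0.5, d_min=2.0):
    """Apply the three cut-offs named in the text to one protein in two groups."""
    r = intensity_ratio(a, b)
    d = abs(cohens_d(a, b))
    # Assumption: AUC of 0 is perfect separation in the opposite direction.
    separable = auc(a, b) in (0.0, 1.0)
    return (r > ratio_hi or r < ratio_lo) and d > d_min and separable


# Hypothetical pooled intensities (4 pools per group), not data from this study.
tumor = [8.0, 9.0, 10.0, 9.0]
ctrl = [3.0, 4.0, 3.5, 4.5]
print(passes_selection(tumor, ctrl))
```

With only four pools per group, an AUC of 1 simply means that the two intensity ranges do not overlap, which is why the criterion is paired here with the ratio and effect-size thresholds.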
Following protein selection, PCA (Step 7) was performed to visualize the dataset, where several potentially correlated proteins were projected into a smaller number of variables. K-means clustering (Step 8) on the whole serum PCA biplot resulted in 3 inhomogeneous or incomplete clusters. The calculated cluster homogeneity and completeness scores were 0.56 and 0.73, respectively. In contrast to whole serum samples, the clustering of sEV samples formed homogeneous and complete clusters, with homogeneity and completeness scores of 1. The results of the PCA analyses and k-means clustering indicate considerable differences between the whole serum and sEV samples (Figure 2B). We found that the accuracy of distinguishing between various CNS tumors can be increased using a protein panel from serum-derived sEVs, compared to analyzing whole serum samples.
Statistical comparison of the proteome of sEV and whole serum samples was performed to reveal quantitative differences affecting the suitability of different sample types to provide biomarkers for CNS tumor status monitoring. Pairwise statistical comparison (Welch's test) was used to identify proteins significantly enriched or depleted in sEV samples compared to whole serum samples (Figure 3). Sixty-five proteins were found to be significantly enriched in sEV samples, while 129 proteins were significantly depleted (p < 0.05). Using our sEV purification protocol detailed in Section 4, we obtained a uniform particle size range of sEVs, but the magnitude of quantitative changes in the sEV versus whole serum proteome suggested the possible presence of lipoprotein and serum protein contamination. The level of apolipoproteins was decreased in sEV-enriched samples (sEV/serum mean ratio of 0.66); however, this fraction could not be completely eliminated. Besides, well-known high-abundance serum proteins (e.g., ALB) also dominated the protein content of sEV-enriched samples. However, the enrichment of non-tissue specific (ITGA2B, ITGB3, LGALS3BP), epithelial cell (CD5L) and platelet related (STOM, TSPAN9) EV marker proteins [31] confirms sEV enrichment (sEV/serum mean ratio of 26.58), while it also demonstrates the presence of sEVs produced during clotting. Among the 17 proteins of the sEV marker panel described in Section 2.2, only 6 were significantly enriched in the sEV samples, and 5 of the 10 proteins comprising the specific serum panel had higher abundance in whole serum (Figure 3). These findings suggest that the better suitability of sEV-enriched samples to serve as a biomarker source is not explained by a total increase in the abundance of specific proteins. (Detailed proteomics findings, protein annotation and sEV enrichment data are available in Table S1).
Additional sample processing (sEV isolation) may introduce higher technical variance in the case of sEV samples and may thus reduce the analytical suitability of this sample type. Our analysis revealed a similar level of variance for proteins quantified in each sample type (excluding contaminants): median coefficients of variation within each patient group were in the ranges of 20.78-23.87% for sEV and 20.21-24.45% for serum samples (see Figure S2 for CV distributions).
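As an illustration of the pairwise comparison and the variance check described in this section, the following Python sketch computes Welch's t statistic with the Welch-Satterthwaite degrees of freedom, and a coefficient of variation. It is a sketch under stated assumptions rather than the analysis code of this study: the function names and the example intensities are hypothetical, and the p-value is deliberately not computed because that requires the t-distribution (in practice, a statistics library such as SciPy's `ttest_ind` with `equal_var=False` could be used instead).

```python
from math import sqrt
from statistics import mean, stdev, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cv_percent(x):
    """Coefficient of variation in percent (sample SD divided by mean)."""
    return 100.0 * stdev(x) / mean(x)


# Hypothetical intensities of one protein in sEV and whole serum pools.
sev = [10.0, 12.0, 11.0, 13.0]
serum = [1.0, 2.0, 1.5, 1.5]

t, df = welch_t(sev, serum)
print(f"t = {t:.2f}, df = {df:.2f}, sEV/serum mean ratio = {mean(sev) / mean(serum):.2f}")
print(f"CV(sEV) = {cv_percent(sev):.1f}%")
```

Unlike Student's t-test, Welch's test does not assume equal variances in the two sample types, which is a reasonable precaution when one of them has undergone additional processing.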

Biological Background Might Be Responsible for the Increased Suitability of sEV Samples to Provide Biomarkers for CNS Tumor Status Monitoring
To gain insight into the biological background of the obtained proteomics data, IPA was applied. We performed 'Core Analyses' for whole serum and sEV data separately, yielding a list of significantly influenced 'Diseases and Functions' in each patient group (p < 0.05). Using 'Comparison Analysis,' we were able to develop heatmaps covering the relevant systemic and tumor-related functions, as well as the activated or inhibited immune functions (Figure 4A). Regarding whole serum samples, many of the significantly influenced functions identified are related to CNS involvement and active immune regulatory processes, but the patient groups are not clearly distinguished on the heatmaps (Figure 4A, left panels). In contrast, on two of the three sEV proteome-based heatmaps M was evidently separated from the malignant tumor groups (Figure 4A, right panels), where tumor progression-related functions (e.g., angiogenesis, proliferation and migration of tumor cells) were detected to be highly activated and the activated immune functions (e.g., cell movement or activation of myeloid cells) predominate over inhibited immune functions (e.g., phagocytosis).
Next, we attempted to specify the common biological role of the characteristic protein profiles identified. Therefore, we constructed two networks containing the selected 10 and 17 proteins identified based on whole serum and the sEV data, respectively (Figure 4B). Using the 'Grow tool,' the top ten influenced 'Diseases and Functions' were integrated into the networks. In the case of the whole serum network (Figure 4B, left panel), nine different related 'Diseases and Functions' were identified, including viral infection, apoptosis, necrosis or cell movement of phagocytes and myeloid cells, and only one was cancer-related. In contrast, the top ten influenced diseases identified on the sEV network based on the identified 17 proteins (Figure 4B, right panel) were all tumor-associated, suggesting their potential involvement in the pathophysiology of cancers.

Discussion
Non-invasive diagnostic tests are of outstanding clinical importance because of their minimal burden and risk to the patient, their repeatability, low cost, high information content and easy accessibility. In CNS tumors, a minimally invasive technique for describing the actual tumor status would be particularly important. Conventional MRI tests commonly used for the monitoring of CNS tumors are not fully appropriate for discriminating between various tumor types (e.g., they cannot differentiate between glioblastomas and solitary metastases, CNS lymphomas or other glioma grades [32]) and cannot distinguish recurrence from pseudoprogression. Brain biopsies, as another option, are highly challenging and risky, especially when multiple sampling is required for long-term follow-up [33]. For several cancer types, blood-based tumor markers, such as PSA, AFP and CA125, have been introduced into clinical practice, and research for the identification of further noninvasive biomarkers applicable for monitoring a wider range of malignant diseases is ongoing [34]. However, regarding CNS tumors these studies have generally failed, presumably for several reasons, including (1) the barrier function of the blood-brain barrier (BBB), which releases less tumor 'information' into the systemic circulation, (2) the presence of molecules released into the blood from other sources and (3) possibly the complexity of tumor tissues (such as glioblastoma multiforme). These issues hamper attempts to use a single or only a few biomarkers to diagnose and monitor CNS tumors.
Based on these considerations, we aimed to detect the characteristic protein fingerprint of some common CNS tumors, trying to amplify the signal/information that brain tumors release into the circulation. For this purpose, the protein content of 96 clinical serum samples and related sEV samples isolated from the whole serum was measured by LC-MS. Serum samples were collected from three brain tumor groups considered as the most common malignant, benign and metastatic brain tumors (glioblastoma multiforme, meningioma [28] and brain metastasis of non-small-cell lung cancer [29]) and a control group (lumbar disc herniation).
To examine whether the proteomes of serum and sEV samples are suitable for differentiating between the CNS tumors in question, that is, whether they are applicable to diagnose and monitor the disease, the proteomes of these four patient groups were compared. The effectiveness of tumor type distinction may be increased if the analysis is restricted to proteins which exhibit significant between-group differences. Protein selection was carried out as described in the literature [35] (using ratio of intensity means; Cohen's d effect size; ROC) but much stricter thresholds were applied (>2, <0.5; d > 2; AUC = 1, respectively). Statistical selection yielded a collection of proteins whose intensity showed significant between-group differences, and thus these proteins can be regarded as the most suitable molecules for distinguishing between the tumor types examined. Specifically, protein selection yielded a 10- and a 17-membered protein panel for whole serum and sEV samples, respectively. While none of these proteins appeared to be able to distinguish between the patient groups individually, their combination was found to reliably discriminate between the different patient groups, suggesting that instead of a few candidates, a specific protein panel is required for a perfect differentiation between various tumor types.
To evaluate group distinction efficiency, PCA with k-means clustering was carried out according to the literature [36]. Homogeneity and completeness scores of the clusters were calculated to measure the performance of k-means clustering. Cluster homogeneity means that each cluster contains only samples from the same group, while completeness means that all samples of a given group are assigned to the same cluster. Both scores are bounded below by 0 and above by 1. A score of 1 indicates perfect homogeneity or completeness. PCA revealed that sEV samples were more suitable for group distinction. Despite identical statistical analyses for the two sample types, the homogeneity and completeness scores for the whole serum analysis were 0.56 and 0.73, respectively, compared to scores of 1 and 1 for the analysis of sEV samples. The explanation for these findings is illustrated on a PCA biplot (Figure 2). Regarding serum samples, the proteins that can separate two given groups by the appropriate ratio and effect size may have similar intensities in other groups as well. For example, DEFA1 is important in distinguishing the CTRL group from the BM and GBM groups; however, it shows similar intensities in the GBM and BM groups, hampering the separation of these groups (see DEFA1 arrow pointing between the BM and GBM groups in the whole serum PCA plot). Still, DEFA1 cannot be removed, because it plays a key role in separating the CTRL group from malignant tumors. In contrast, the majority of the proteins identified in the sEV samples were able to separate any given group from all the others.
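The two clustering scores can be made concrete with a short Python sketch using their common entropy-based definitions (homogeneity = 1 - H(group | cluster) / H(group); completeness = 1 - H(cluster | group) / H(cluster)). That these definitions match the implementation used in this study is an assumption, as are the function names and the example labels; scikit-learn offers the same quantities as `homogeneity_score` and `completeness_score`.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy of a labeling (natural logarithm)."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, given):
    """H(labels | given): remaining uncertainty in `labels` once `given` is known."""
    n = len(labels)
    joint = Counter(zip(given, labels))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity(groups, clusters):
    """1.0 when every cluster contains samples from a single group only."""
    h = entropy(groups)
    return 1.0 if h == 0 else 1.0 - conditional_entropy(groups, clusters) / h


def completeness(groups, clusters):
    """1.0 when all samples of each group fall into a single cluster."""
    return homogeneity(clusters, groups)


# Hypothetical labels: 4 pools per patient group, not the study's cluster output.
groups = ["GBM"] * 4 + ["BM"] * 4 + ["M"] * 4 + ["CTRL"] * 4
perfect = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4
merged = [0] * 8 + [1] * 4 + [2] * 4  # GBM and BM fall into one cluster

print(homogeneity(groups, perfect), completeness(groups, perfect))
print(homogeneity(groups, merged), completeness(groups, merged))
```

In the second example, merging two groups into one cluster lowers homogeneity while completeness stays at 1, which shows why both scores are needed to describe a clustering result.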
To check whether the poorer performance of whole serum proteins in distinguishing between the patient groups is attributable to the number of proteins considered, we performed another PCA including only up- and down-regulated proteins selected from the whole identified panel (see Figure 2, Step 4), yielding a similar number of proteins for the two sample types. The PCA of these 41 whole serum and 45 sEV proteins yielded results similar to the previous analysis of carefully selected proteins, and the sEV sample type again performed better. Although the sEV sample was far from perfect in this case (4 groups were recognized with a homogeneity score of 0.66 and a completeness score of 0.66), the results for the whole serum analysis indicated that not even the sample groups can be recognized based on these proteins (only 2 groups were recognized; homogeneity: 0.07, completeness: 0.40) (Figure S3). These findings support that sEVs have a better efficiency in distinguishing between various patient groups, irrespective of the number of proteins analyzed for the comparison of sEV and whole serum samples.
To investigate the background of our observations, we performed a quantitative proteomics comparison of the two sample types. A quantitative evaluation of sEV purification protocols has been suggested based on a quantitative LC-MS-based proteomics approach, using enrichment analysis of carefully selected sEV markers along with medium-specific contamination marker proteins (e.g., lipoproteins and serum proteins). To the best of our knowledge, we are the first group to quantitatively compare the proteome of serum-derived small extracellular vesicles with that of the original whole serum samples. sEV enrichment may increase the relative abundance of proteins present in higher concentration within sEVs, and the increased signal-to-noise ratio may be beneficial for the quantitative LC-MS analysis of such proteins. On the other hand, proteins in serum originate from different sources in the human body. Any fractionation (e.g., enrichment of a specific sEV population) may decrease the suppressing effect of the uninformative protein fraction released from sources not specific for the target disease. No association was revealed between being an sEV marker and sEV enrichment, suggesting that it is not the overall enrichment process that is responsible for the increased suitability of sEV samples to provide biomarkers for CNS tumor monitoring. Instead, the removal of an uninformative protein fraction, providing a more specific sample, may explain why the sEV sample is more applicable for distinguishing between various CNS cancer patient groups. Compared to whole serum samples, EVs may be more suitable for investigating tumor-related molecular patterns, as the characteristic fingerprint molecules are present in higher concentrations in sEV samples and are accompanied by fewer contaminating molecules that may bias the analytical findings.
To understand the biological background for our proteomics-based data, IPA was used for the separate analyses of whole serum and sEV data. 'Core Analyses' were performed, yielding a list of significantly influenced 'Diseases and Functions' comprising tumor-related functions as well as activated or inhibited immune functions in each patient group (p < 0.05). 'Comparison Analysis' was carried out to compare the affected 'Diseases and Functions' in the different patient groups. Regarding whole serum samples, many of the significantly influenced functions identified were associated with CNS involvement and active immune regulatory processes but the patient groups were not clearly distinguished on the heatmaps. In contrast, on the sEV proteome-based heatmaps the benign M was clearly separated from the malignant tumors, for which numerous tumor progression-related functions (e.g., angiogenesis, proliferation and migration of tumor cells) were found to be highly activated. The generated IPA heatmaps also revealed that the proteome of sEV samples may provide more specific information on the immune reactions characteristic to the patient groups. We assume that activated immune functions (e.g., cellular infiltration and migration of phagocytes) may play a crucial role in the development of an immune-suppressive microenvironment, while antitumoral immune responses (e.g., phagocytosis, inflammation) might be inhibited.
Serum is a dual source of biomolecular information on cancer, as it contains the molecules released by cancer cells, as well as those released during the immune system's tumor-specific responses [37]. Therefore, the differences observed in the serum vesicles isolated from different patient groups may not only mirror tumor-specific processes but also those related to the associated immune responses [38,39]. Samples enriched in sEVs can offer an amplified source of relevant information, representing not only the specific tumor tissue but also the associated immune responses. Thus, an appropriate protein panel, covering both sources, may have improved efficiency for CNS tumor classification and monitoring.
In addition, the networks developed based on the IPA 'Grow tool' demonstrated that the biological background of the sEV-based characteristic protein profile is more specifically associated with the tumor types compared with the whole serum based protein profile. The role of some of the proteins included in the sEV-based characteristic protein profile has already been described, for example in GBM biology, making these proteins promising targets for extracellular vesicle-based biomarker development [25].
In addition to the proteomics-based comparison of EV samples, we also examined the EV concentration of individual serum samples. Interestingly, no significant differences were detected between the four patient groups regarding the concentration of serum sEVs, with a mean size of 112.2 nm. Osti et al. observed higher EV plasma levels in patients with GBM, brain metastases and extra-axial brain tumors compared to healthy controls. Other researchers also demonstrated higher EV concentrations in tumor patients when unfractionated EV isolates [40] or a wider spectrum of EVs were analyzed [41,42]. However, other non-neoplastic diseases of the central nervous system may also increase the number of small-sized circulating EVs, as demonstrated in acute ischemic stroke [43] or multiple sclerosis patients [44]. Our vesicle number measurement results, as well as the findings detailed above, suggest that the elevated sEV concentration cannot be clearly attributed to the presence of the tumor, as immune responses or other systemic responses also contribute to the circulating EV population. Our proteomics-based findings, coupled with the available literature data, suggest that circulating small-sized EVs show important qualitative but not quantitative differences between benign or malignant brain tumors and spinal disc herniation.
Liu and colleagues highlighted that serum is not the perfect choice for a representative sampling of circulating EVs [45], as a high fraction of EVs may be lost during coagulation and blood components (e.g., platelets) may also release microvesicles (MV) during clotting, altering the original MV content of blood samples. However, serum is still the preferred sample form for blood-based clinical diagnoses and it is a practical choice for future clinical developments. It should be noted that co-purification of proteins [46] and lipoprotein particles [47] in EV isolation methods is a common and well-known challenge [31]. The presence of protein aggregates [48] and lipoproteins in sEV isolates may provide an additional explanation for the lack of increase in the concentration of enriched sEV particles in cancer patients' serum, contrary to literature data on plasma [49] or serum samples [40]. Efforts to eliminate lipoproteins are described in numerous papers reporting on attempts to introduce more sophisticated methods (e.g., combination of ultracentrifugation and size-exclusion chromatography) [50]. These laborious and instrumentation-demanding methods are of high importance in scientific research on the molecular contents of EVs, but they may not be applicable in routine clinical practice. Our sEV isolation protocol has several advantages, as it does not require expensive equipment or highly trained professionals and the entire procedure (along with characterization) is performed within 4 h; therefore, this technique could be easily incorporated into clinical practice.
Our quantitative proteomics results demonstrate that even a simple sEV enrichment protocol can increase the diagnostic potential of serum samples for the identification and classification of patients with different CNS cancers. This finding also supports that even a low-efficacy sEV enrichment/purification method may be appropriate to enhance the analytical applicability of serum samples for CNS cancer monitoring; however, in such cases a quantitative description of enrichment efficiency is required for the correct interpretation of the analytical results [51].
In conclusion, our findings support that extracellular vesicles have a greater potential for the monitoring of CNS tumors compared to whole serum samples. Using EV samples is a possible way to amplify the signals released by brain tumors into the circulation. Given the easy-to-implement isolation and enrichment protocol established in this study, the introduction of EV analysis would be beneficial in clinical practice.

Patients
Blood samples of 96 patients treated between March 2015 and January 2018 at the Department of Neurosurgery, University of Debrecen were analyzed. Samples were obtained from patients with primary glioblastoma multiforme (GBM), meningioma (M) and single brain metastasis originating from non-small-cell lung cancer (BM). Control samples (CTRL) were collected from patients with spinal disc herniation without evidence of cancer. This non-tumor patient group served as the control group in comparison to the patients with different intracranial tumors, to distinguish the effects of tumorous processes from those of CNS involvement on circulating sEVs. Each group contained 24 individuals of mixed ages and genders. As shown in Table 1, pools of six samples were created from the individuals, allowing four parallel samples to be tested per group. Blood samples were collected one day prior to the neurosurgical procedure in each tumor case. None of the patients received radio- or chemotherapy before tumor resection. Blood samples were stored by the Neurosurgical Brain Tumor and Tissue Bank of Debrecen according to the criteria of the National Research Ethics Committee. An informed consent form was signed by each patient; the study was conducted in accordance with the Declaration of Helsinki. This study was carried out according to two ethical approvals, namely

Preparation of Serum Samples, sEV Isolation and Characterization
Blood samples were collected into BD Vacutainer SST II Advance Tubes (Becton, Dickinson and Company, Franklin Lakes, NJ, USA), allowed to clot for at least 1 h at room temperature and centrifuged for 20 min at 3000× g, 10 °C to remove cells. Following the 3000× g centrifugation, the supernatant serum was transferred to new Eppendorf tubes and centrifuged for 30 min at 10,000× g, 4 °C to remove debris and large vesicles. A one-milliliter serum aliquot was diluted with DPBS (Ca2+-free, Mg2+-free, Lonza Group Ltd., Basel, Switzerland) to 8 mL and ultracentrifuged for 70 min at 100,000× g, 4 °C (polycarbonate tubes, fixed angle T-1270 rotor, Thermo Fisher Scientific, Waltham, MA, USA). The pellet was resuspended in 100 µL DPBS and stored at −80 °C until further processing. This sEV isolation protocol served to reach intermediate recovery and intermediate specificity according to MISEV2018 [31].
As suggested by a recent study [53], sEVs were diluted in particle-free DPBS and analyzed using a NanoSight NS300 instrument with a 532 nm laser (Malvern Panalytical Ltd., Malvern, UK). The measurements were performed on the 16 sEV sample pools (described in 4.1). Six videos of 60 s were recorded for each sample under constant settings (Camera level: 15; Threshold: 4; 25 °C; 60-80 particles/frame) and analyzed to obtain data on size distribution and particle concentration.

'In Solution' Digestion
Individual samples containing 20 µg protein were diluted to 10 µL with 0.1 M NH4HCO3 (pH = 8.0) buffer; 12 µL 0.1% RapiGest SF (Waters, Milford, MA, USA) and 2 µL 55 mM dithioerythritol solution were added and the samples were kept at 60 °C for 30 min to unfold and reduce proteins. A volume of 2 µL 200 mM iodoacetamide solution was added to alkylate the proteins, and the samples were kept for an additional 30 min in the dark at room temperature. The samples were digested overnight at 37 °C with trypsin (Thermo Scientific, Waltham, MA, USA, enzyme/protein ratio: 0.4 to 1). The digestion was stopped by the addition of 1 µL of concentrated formic acid.

LC-MS
The separation of the digested samples was carried out on a nanoAcquity UPLC (Waters, Milford, MA, USA) using a Waters ACQUITY UPLC M-Class Peptide C18 (130 Å, 1.78 µm, 75 µm × 250 mm) column with a nonlinear 90 min gradient. Eluents were water (A) and acetonitrile (B), each containing 0.1% (v/v) formic acid, and the separation of the peptide mixture was performed at 45 °C with a 0.35 µL/min flow rate using an optimized nonlinear LC gradient (3-40% B). The LC was coupled to a high-resolution Q Exactive Plus quadrupole-orbitrap hybrid mass spectrometer (Thermo Scientific, Waltham, MA, USA). The quantitative measurements of digested individual samples were performed in DIA mode. The survey scan for the DIA method operated at 35,000 resolution. The full scan was performed between 380 and 1020 m/z. The AGC target was set to 5 × 10⁶ or 120 ms maximum injection time. In the 400-1000 m/z region, 22 m/z wide overlapping windows were acquired at 17,500 resolution (AGC target: 3 × 10⁶ or 100 ms injection time, normalized collision energy: 27 for charge 2). The quantitative analysis was performed in EncyclopeDIA 0.81 [54] using default settings after deconvolution, peak picking and conversion of raw MS files to mzML format in ProteoWizard [55]. A comprehensive spectral library [56] of 10,000 human proteins was used for peptide identification. Protein quantities calculated by the EncyclopeDIA software based on summed intensities of the automatically filtered peptides were used in further statistical evaluations.

Statistical Analysis
The data collected on whole serum and extracellular vesicles were reduced and analyzed using statistical methods. Pearson's correlation analysis was used to identify outlier samples [57]. Contaminating proteins (cytokeratins) and proteins with missing values were excluded from the proteomic data [58]. Data were log-transformed to reduce skewness and increase linearity [59]. Cohen's d effect size was calculated to measure the difference between the mean protein intensities of two groups [60,61]. Pairwise ROC analysis allowed us to find the proteins that can separate at least one group from the others [62]. A protein was accepted only if its ROC AUC (area under the ROC curve) value was equal to 1. In order to transform several (potentially) correlated proteins into a (smaller) number of uncorrelated variables and to visualize the dataset, PCA with k-means clustering was performed [63][64][65][66]. Homogeneity and completeness scores of the clusters were calculated to measure the performance of k-means clustering [67]. A two-tailed Welch's t-test was performed to identify the significantly enriched or depleted proteins in sEV samples. The statistical analyses were performed using the R statistical program (version 3.6.3 with pROC, FactoMineR, factoextra and ggplot2 packages; Vienna, Austria), the Python programming language (version 3.8, Scotts Valley, CA, USA) and Perseus (MaxQuant, Munich, Germany). Values of p < 0.05 were considered significant (see Appendix A for more details). GraphPad Prism 8 (San Diego, CA, USA) was used for further data visualization.

In Silico Analysis of LC-MS Data
Protein data derived from the LC-MS measurements were analyzed by IPA (Qiagen Bioinformatics, Hilden, Germany). Using fold change values, 'Core Analysis' was performed for whole serum and sEV data separately to identify 'Diseases and Functions' that can be significantly influenced by the described proteomes (p < 0.05). After 'Comparison Analysis,' we created heatmaps of the relevant 'Diseases and Functions,' that is, tumor-related and immunological functions showing regulatory differences between the three CNS tumor groups. The activation z-score calculated by IPA indicates the extent and direction of the effect that the given proteins have on a function/disease.
The selected 10 whole serum proteins and 17 sEV proteins were introduced to custom pathways as well. Then, 'Connect tool' of IPA was used to reveal the relationships between these molecules and 'Grow tool' was applied to search the top ten 'Diseases and Functions' assigned to the 10 whole serum proteins or 17 sEV proteins. Results are displayed in two networks created by the IPA Path Designer.
Confidence level was set to 'Experimentally observed' for all IPA procedures, which enables literature data-based analysis but excludes unproven predictions.

Data Availability
All datasets generated during the current study are available from the corresponding author upon reasonable request.
We have submitted all relevant data of our experiments to the EV-TRACK [68] knowledgebase (EV-TRACK ID: EV200080).

Conclusions
Our study aimed to detect the characteristic protein fingerprint of the most common CNS tumors. To amplify the signal that brain tumors release into the circulation, we examined the protein content of small extracellular vesicles isolated from serum in addition to that of whole serum.
Comparative proteomic analysis suggests that sEVs may be more suitable for investigating tumor related molecular patterns, because these molecules are present in higher concentrations in sEV samples compared to whole serum samples and have less 'noise' that may bias the analytical findings. In silico analyses revealed that the biological background of the sEV-based characteristic protein profile of the samples is more specifically associated with the tumor types compared with the whole serum based protein profile. Samples enriched in sEVs can offer an amplified source of relevant information, representing not only the specific tumor tissue but also the associated immune responses.
These findings revealed that circulating small-sized extracellular vesicles were more suitable for separating different patient groups. The number of proteins applied for monitoring cannot be reduced to a few individual molecules, instead, a specific protein panel is required for perfect differentiation. To the best of our knowledge, we are the first group to quantitatively compare the proteome of serum derived small extracellular vesicles with that of the original whole serum samples.
In conclusion, our findings support that extracellular vesicles have a greater potential for monitoring CNS tumors, compared to whole serum samples. Considering that analyzing sEVs can be performed easily, incorporating our method into clinical practice would be of great benefit.

Supplementary Materials:
Figure S1: Western blot analyses of classical EV markers, Figure S2: Intragroup coefficients of variation (CV) distributions, Figure S3: PCA dotplot constructed after statistical selection based on the means of intensity ratio; Table S1: 311 membered protein table of DIA mode constructed spectral library; Table S2: Sample correlation matrix; Table S3: List of the selected proteins.

Acknowledgments:
The authors thank Lilla Pinter for her technical assistance and Zsolt Szegletes for taking the AFM images. The authors thank Dora Bokor, PharmD, for proofreading the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

AFM: Atomic force microscopy
BBB: Blood brain barrier
BM: Brain metastasis originating from non-small-cell lung cancer
CNS: Central nervous system

The collected data about the whole serum and extracellular vesicles were reduced and analyzed using statistical methods. Pearson's correlation analysis was used to investigate the outlier samples [57]. Contaminating proteins (cytokeratins) and proteins with missing values were excluded from the proteomic data [58]. Data were log-transformed to reduce skewness and increase linearity [59].
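The preprocessing described above can be illustrated with a minimal sketch. The text does not state the logarithm base, the correlation threshold or the exact outlier criterion, so the base-2 logarithm, the "lowest mean correlation" rule, the sample names and all intensity values below are assumptions made purely for illustration.

```python
import math

def log2_transform(intensities):
    """Log2-transform strictly positive protein intensities (base 2 is an assumption)."""
    return [math.log2(v) for v in intensities]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_correlation_per_sample(samples):
    """Average Pearson r of each sample against all other samples.

    `samples` maps a sample name to its list of log intensities,
    with proteins in the same order for every sample.
    """
    result = {}
    for name_a, values_a in samples.items():
        others = [pearson_r(values_a, values_b)
                  for name_b, values_b in samples.items() if name_b != name_a]
        result[name_a] = sum(others) / len(others)
    return result

# Hypothetical intensities for five proteins in four sample pools.
raw = {
    "pool_1": [1200.0, 56000.0, 310.0, 9800.0, 450.0],
    "pool_2": [1100.0, 60000.0, 290.0, 10500.0, 500.0],
    "pool_3": [1300.0, 52000.0, 350.0, 9100.0, 420.0],
    "pool_4": [40000.0, 900.0, 8000.0, 300.0, 15000.0],  # deliberately discordant
}
logged = {name: log2_transform(values) for name, values in raw.items()}
mean_r = mean_correlation_per_sample(logged)

# Candidate outlier: the pool that correlates worst with the rest (illustrative rule).
flagged = min(mean_r, key=mean_r.get)
print(flagged, round(mean_r[flagged], 3))
```

In practice the flagged sample would be inspected rather than removed automatically; the sketch only shows how a sample correlation matrix can point to a discordant pool.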
Cohen's d effect size was calculated to measure the difference between the mean protein intensities of two groups. The formula of Cohen's d effect size is

$$d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}}}$$

where $\bar{X}$ is the mean protein intensity in a given group, SD is the standard deviation and n is the sample size [60]. In this study, an effect size of at least 2 was required. An effect size of 2 indicates that the mean of group 1 lies at the 97.7th percentile of group 2 and that the non-overlapping area of the two distributions is at least 81.1% [61]. Pairwise ROC analysis allowed us to find the proteins that can separate at least one group from the others [62]. ROC analysis uses the true positive rate (sensitivity) and the true negative rate (specificity) at various threshold settings. Plotting the sensitivity against 1 − specificity gives the ROC curve; the area under this curve (AUC) measures the separability provided by the given variable (protein). AUC = 0.5 represents a variable that is unsuitable for separating the two groups. If AUC = 1, the separation using the given variable is error-free. In our study, a protein was accepted only if its calculated AUC value was equal to 1.
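As a minimal sketch of these two selection criteria, the example below computes Cohen's d with the pooled standard deviation and the AUC as the fraction of (group 1, group 2) pairs that are ranked correctly, which is equivalent to the area under the empirical ROC curve. The group names and log-intensity values are hypothetical, and taking the absolute value of d for the threshold is an assumption; the study itself used R packages such as pROC.

```python
import math
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d using the pooled standard deviation of the two groups."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * stdev(group1) ** 2 +
                  (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)

def roc_auc(positives, negatives):
    """AUC as the probability that a positive value exceeds a negative one.

    Ties count as one half. This equals the area under the empirical ROC curve.
    """
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical log-intensities of one protein in four pools per group.
gbm = [21.4, 21.9, 22.3, 21.7]
ctrl = [19.8, 20.1, 19.5, 20.4]

d = cohens_d(gbm, ctrl)
auc = roc_auc(gbm, ctrl)

# Assumed reading of the criteria: |d| >= 2 and error-free separation (AUC = 1).
selected = abs(d) >= 2 and auc == 1.0
print(round(d, 3), auc, selected)
```

With only four pools per group, an AUC of 1 simply means that the two sets of values do not overlap, which is why the effect size is used as a complementary criterion.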
In order to transform several (potentially) correlated proteins into a (smaller) number of uncorrelated variables, and to visualize the dataset, principal component analysis (PCA) with k-means clustering was performed. The goal of PCA is to replace a large number of correlated variables with a smaller set of uncorrelated principal components. These components can be thought of as linear combinations of the original variables that are optimally weighted and derived from the correlation matrix of the data [63].
K-means clustering was performed on the obtained PCA score plots using the Hartigan-Wong algorithm [64,65]. The optimal number of clusters was determined with the silhouette method, and the recommended value was used for clustering [66].
Homogeneity and completeness scores of the clusters were calculated to measure the performance of k-means clustering [67]. Cluster homogeneity means that each cluster contains only samples from the same group, and completeness means that all samples of a given group are assigned to the same cluster. Both scores are bounded below by 0 and above by 1. A score of 1 indicates perfect homogeneity or completeness.
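The two scores can be illustrated with a short sketch. It assumes the entropy-based definitions that are also implemented in scikit-learn as `homogeneity_score` and `completeness_score`; the text does not state which implementation was used, and the cluster assignments below are hypothetical.

```python
import math
from collections import Counter

def _entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def _conditional_entropy(target, given):
    """Conditional entropy H(target | given) of two aligned label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g])
                for (g, _), c in joint.items())

def homogeneity_completeness(groups, clusters):
    """Return (homogeneity, completeness) for true groups vs. cluster labels."""
    h_groups = _entropy(groups)
    h_clusters = _entropy(clusters)
    homogeneity = (1.0 if h_groups == 0
                   else 1.0 - _conditional_entropy(groups, clusters) / h_groups)
    completeness = (1.0 if h_clusters == 0
                    else 1.0 - _conditional_entropy(clusters, groups) / h_clusters)
    return homogeneity, completeness

# Four pools per patient group, as in the study design.
groups = ["GBM"] * 4 + ["BM"] * 4 + ["M"] * 4 + ["CTRL"] * 4

# Hypothetical clustering 1: every group recovered as its own cluster.
perfect = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4
# Hypothetical clustering 2: GBM and BM pools merged into a single cluster.
merged = [0] * 8 + [1] * 4 + [2] * 4

print(homogeneity_completeness(groups, perfect))
print(homogeneity_completeness(groups, merged))
```

The second case shows why both scores are reported: merging two groups into one cluster lowers homogeneity while completeness stays at 1, because every group is still kept together.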
Two-tailed Welch's t-test was performed to identify significantly enriched or depleted proteins in sEV samples.
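For illustration, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom for one protein measured in sEV and whole serum samples. The intensity values are hypothetical. The two-tailed p-value requires the Student t distribution, which is not part of the Python standard library; with SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both the statistic and the p-value.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two groups are not assumed to share a variance.
    """
    na, nb = len(a), len(b)
    va = variance(a) / na  # squared standard error of the mean of a
    vb = variance(b) / nb  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical log-intensities of one protein in the two sample types.
sev = [24.1, 24.6, 23.9, 24.4]
serum = [22.0, 22.3, 21.8, 22.5]

t_stat, dof = welch_t(sev, serum)

# A positive t suggests enrichment in sEVs, a negative t suggests depletion.
direction = "enriched" if t_stat > 0 else "depleted"
print(round(t_stat, 3), round(dof, 2), direction)
```

Because the test is repeated for every quantified protein, the direction of the statistic is what separates the enriched proteins from the depleted ones once the p < 0.05 threshold has been applied.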