Mapping the Melanoma Plasma Proteome (MPP) Using Single-Shot Proteomics Interfaced with the WiMT Database

Simple Summary: We developed a clinical proteomics methodology, known as Wise MS Transfer (WiMT), for deep identification of blood proteins in undepleted plasma samples. We applied it to the analysis of undepleted melanoma plasma samples as a proof of principle. Malignant melanoma is the most aggressive type of skin cancer, and early diagnostic and prognostic predictors are essential to establish the most suitable treatment tailored to the patient. Our results showed the greatest identification of proteins and biological processes reported to date for a “dilute and shoot” approach within plasma samples from melanoma patients. More than 1200 proteins related to key biological processes in melanoma progression were mapped, including signaling (the PI3K–Akt signaling pathway), immune system processes (complement and coagulation cascade), and secretion (exosome proteins). These proteins and related biological processes constitute the core of blood components that could be monitored by mass spectrometry in clinical proteomic studies from undepleted plasma samples in melanoma.

Abstract: Plasma analysis by mass spectrometry-based proteomics remains a challenge due to its large dynamic range of 10 orders of magnitude. We created a methodology for protein identification known as Wise MS Transfer (WiMT). Melanoma plasma samples from biobank archives were directly analyzed using simple sample preparation. WiMT is based on the transfer of MS1 features between several MS runs together with custom protein databases for ID generation. This entails a multi-level dynamic protein database built with different immunodepletion strategies and single-shot proteomics. The highest number of melanoma plasma proteins from undepleted and unfractionated plasma was reported, mapping >1200 proteins from >10,000 protein sequences with confirmed significance scoring. Of these, more than 660 proteins were annotated by WiMT from the resulting ~5800 protein sequences. We could verify approximately 4000 proteins by MS1t analysis from HeLa extracts. The WiMT platform provided an output in which 12 previously well-known candidate markers were identified. We also identified low-abundance proteins with functions related to (i) cell signaling, (ii) immune system regulators, and (iii) proteins regulating folding, sorting, and degradation, as well as (iv) vesicular transport proteins. WiMT holds the potential for use in large-scale screening studies with simple sample preparation, and can lead to the discovery of novel proteins with key melanoma disease functions.


Introduction
The diagnosis and prognosis of malignant melanoma (MM) are mainly determined by histological tumor characterization and staging [1]. There is, however, an increasing need to identify predictive molecular biomarkers serologically, as blood samples can be obtained in a minimally invasive manner [2,3]. Although plasma holds most of the blood components, the characterization of plasma/serum proteomes is still challenging, especially for low-abundance proteins. Covering the entire proteome is difficult due to its large dynamic range (more than 10 orders of magnitude) and the presence of a small group of highly concentrated proteins (such as immunoglobulins and albumin), which represent 99% of the total plasma protein content [4][5][6].
In the 2000s, the search for new MM biomarkers was performed mainly by proteomic fingerprinting and two-dimensional gel electrophoresis coupled to mass spectrometry (MS) analysis, with the identification of only a few proteins or proteomic profiles that could distinguish patient groups from different disease stages [7][8][9]. Immunodepletion of the most abundant proteins [10], sample fractionation, or a combination of these methodologies can allow for a deeper characterization of plasma/serum MM proteomes. This improves the number of identifications from a few hundred to thousands of proteins by LC-MS/MS [11][12][13][14][15][16].
Currently, there is a lack of techniques and methodologies able to encompass the entire plasma/serum proteome without modifying sample characteristics. This is essential for accurate protein quantification in clinical proteomic studies [17][18][19].
Different strategies have successfully been adopted by MS-based proteomic workflows for the characterization of low-abundance proteins in samples with large dynamic ranges. These procedures explore and/or improve either MS1 or MS2 events in combination with a purpose-designed database to reduce the search space [20][21][22][23][24][25][26][27]. At the MS1 level, these strategies focus on MS1 information transference (MS1 transfer or MS1t) between experiments. Peptides are identified by comparing the eluting precursors in different chromatographic runs, and high mass accuracy and reproducible retention times are required to ensure correct assignments [18,19,28–32]. The MS1t principle has been reflected in many practical applications, such as match between runs (MBR) in the MaxQuant software [32][33][34][35], or has been simply defined as the transfer of MS1 features with easily interfaced software/algorithms such as OpenMS [34,36–38] or Proteome Discoverer [39] (Thermo Fisher Scientific, San José, CA, USA) to increase identifications and to improve label-free quantification (LFQ) workflows by reducing missing values. In 2016 and 2019, Geyer et al. [18,19] applied MBR to increase plasma proteome coverage by transferring MS1 information. This was done using a database of depleted and fractionated plasma samples without previous functional characterization of the proteins identified in the database. This strategy allowed the identification of around 1000 proteins, reaching the low-abundance region down to ~10 ng/mL.
In the present work, we describe the melanoma plasma proteome (MPP) obtained from the analysis of undepleted and unfractionated plasma samples from patients with malignant melanoma. This is performed using the Wise MS Transfer procedure (WiMT), which includes the MS1t principle, a custom in-depth database, and single-shot proteomics. The custom database was built and characterized using different immunodepletion strategies for plasma samples. The information covered by the database was transferred to plasma samples from MM patients by MS1t. We suggest increasing the undepleted MPP coverage by performing a single nLC-MS/MS run (using a nano liquid chromatography system interfaced to high-resolution mass spectrometry) without any prior interference with the sample integrity. The database design is simple and tracks protein abundance according to the depletion level required for positive annotation and identification. Most of the proteins from the database have been identified in the unfractionated samples, allowing for deeper characterization of undepleted MPP, in which the main biological processes and protein classes were successfully mapped.

Custom Database Development
We developed a custom database containing more than 1300 proteins identified in plasma from malignant melanoma patients using different immunodepletion approaches (Table S1). The proteins were categorized depending on the strategy applied, and we related the protein abundances to the depletion level needed for identifying the proteins in the database. The custom database was divided into four levels depending on the degree of plasma depletion. The undepleted samples represent the first level, and the top7, top14, and SuperMix strategies constitute the second (Low-Dep), third (Mid-Dep), and fourth (Deep-Dep) levels, respectively. The database contains a total of 1385 identified proteins, of which 554 are from undepleted plasma. With the immunodepletion strategies, the number of identified proteins increased by ~18%, 40%, and 98% for top7, top14, and SuperMix depletion, respectively (Figure 1A). While many reports have compared the immunodepletion strategies in terms of efficiency, reproducibility, and specificity [16,40–42], to the best of our knowledge this study is the first to surpass 1000 identified proteins without previous sample fractionation, using a simple "dilute-and-shoot" approach. Unsupervised hierarchical clustering of quantified proteins showed that depleted samples clustered together (Figure 1B). Thus, it was possible to observe the quantitative difference between the immunodepletion approaches, confirming the enrichment of lower-abundance proteins (Cluster A). Cluster B comprises those proteins that are enriched at the Low-Dep and Mid-Dep levels but show a decrease in abundance at the Deep-Dep level. As previously reported, the SuperMix depletes 50-60 highly abundant proteins from plasma [10,43,44]. We compared the proteins identified in Cluster B with the proteins reported to be captured using SuperMix sample preparation [43]. At least 52% of the proteins within the cluster were depleted by this methodology (Table S2).
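The four-level organization described above can be illustrated with a small sketch. It assumes that each protein is assigned to the shallowest depletion level at which it was identified, which is one plausible reading of the design and not necessarily how the authors implemented it. The function names and the example memberships are hypothetical; only the level names come from the text.

```python
# Hypothetical illustration: level names follow the text, but the example
# memberships below are invented and do not reproduce Table S1.
LEVELS = ["Undepleted", "Low-Dep", "Mid-Dep", "Deep-Dep"]  # shallow -> deep


def assign_levels(identified):
    """Map each protein to the shallowest level at which it was identified.

    `identified` maps a level name to the set of proteins seen at that level.
    """
    first_level = {}
    for level in LEVELS:
        for protein in sorted(identified.get(level, set())):
            # setdefault keeps the first (shallowest) level already recorded.
            first_level.setdefault(protein, level)
    return first_level


def cumulative_counts(first_level):
    """Number of distinct proteins accumulated after each level is added."""
    counts, running = {}, 0
    for level in LEVELS:
        running += sum(1 for lv in first_level.values() if lv == level)
        counts[level] = running
    return counts


example = {
    "Undepleted": {"ALB", "APOA1", "CRP"},
    "Low-Dep": {"APOA1", "CRP", "LDHA"},
    "Mid-Dep": {"CRP", "LDHA", "SPP1"},
    "Deep-Dep": {"LDHA", "SPP1", "MCAM", "TNC"},
}
levels = assign_levels(example)
totals = cumulative_counts(levels)
```

In this toy input, a protein seen in undepleted plasma stays at the first level even if it is also detected after depletion, so the level acts as a rough proxy for how much depletion a protein needs before it becomes detectable.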
Figure 1A shows the numbers of unique proteins identified in undepleted and depleted samples. A comparison of the top7 and top14 strategies with the undepleted plasma results showed that most of the proteins lost in the process were immunoglobulins, at ~32% and 53%, respectively. Excluding the immunoglobulins, approximately 83% of the proteins were lost due to top7 depletion. Consequently, these were also lacking when applying the top14 strategy. The protein abundance distribution is illustrated in Figure 1C. Data collection from undepleted and depleted samples using different strategies can compensate for this loss of information, allowing a broader coverage of the plasma proteome.

Figure 1. Descriptive results of the proteomic analysis of immunodepleted plasma samples. (A) Comparative analysis of the number of proteins identified by each approach. (B) A hierarchical clustering heat map of the proteins identified as common among the 4 groups of samples studied. The gradient from blue to red represents the Z-score scale ranging from −1.4 to 1.4. (C) Protein abundance distribution curve. Classical plasma proteins, tissue leakage, and signaling proteins are highlighted. Tissue leakage proteins were defined as plasma proteins that are not secreted into the blood stream, classified as intracellular proteins (by available information) according to The Human Protein Atlas database (https://www.proteinatlas.org/search/protein_class:Plasma+proteins, and https://www.proteinatlas.org/humanproteome/blood+protein/secreted+to+blood, accessed on: 5 March 2020) [45][46][47].

Functional Building and Characterization of the Custom Database
We were able to characterize the custom protein database according to:
• biological processes;
• protein classes; and
• pathway biology and enrichment within these signaling cascades.
We were able to verify the functional correlations by applying the respective depletion levels. Protein enrichment is directly related to the increase in plasma proteome coverage. Proteins considered as classical plasma proteins, proteins deriving from tissue leakage, and signaling proteins were identified (see Figure 1C). Most of the tissue leakage proteins were concentrated in the region of the medium abundance of the plasma proteome, as has been previously discussed [2,4]. Signaling proteins such as interleukin-36 gamma (IL36G), macrophage colony-stimulating factor 1 (CSF1), tumor necrosis factor ligand superfamily member 13B (TNFSF13B), and C-C motif chemokine 14 (CCL14) were successfully identified. It was possible to access larger ranges of concentration in the plasma proteome since the number of low abundance proteins identified increased from the Low-dep to Deep-dep level. For instance, we were able to identify 141, 301, 601, and 62 proteins at a concentration level of <100 µg/L (low-abundance proteins, LAP) [48], according to the Human Protein Atlas (https://www.proteinatlas.org/humanproteome/blood+protein, accessed on: 5 March 2020), using the top7, top14, SuperMix, and undepleted plasma approaches, respectively [45][46][47]. Proteins identified at each depletion level were submitted to functional annotation enrichment analysis. As expected, proteins related to the acute phase, blood coagulation, and complement pathway were not enriched by any of the immunodepletion techniques (see Figure 2 and Table S3), since most of these proteins can be found in the high and medium concentration ranges [18]. The Low-dep strategy improved the identification of proteins related to angiogenesis, lysosomes, and cell projection, as well as the cytoskeleton. 
Although there was a higher number of proteins identified with the Mid-dep approach as compared with the Low-dep approach, it was possible to see similarities between the two methodologies, particularly for signaling and secreted proteins. For both strategies, most of the proteins lost in the process were immunoglobulins. The Mid-dep strategy improved our identification, with the enrichment of cell junction, proteasome, tyrosine kinase, and stress response protein kinases. Remarkably, we could annotate membrane, transmembrane, and receptor proteins, together with tissue remodeling and MHC I proteins enriched using the Deep-dep approach.
The results showed a close association between the depletion level and the enrichment of groups of proteins with related functions. As a general trend, the deeper the immunodepletion approach, the higher the number of proteins identified per functional group and at lower concentrations (Figure 3). Our approach allows the identification of growth factors, which are known to be present in plasma at concentrations around ng/L, as described in The Human Blood Atlas (available at: https://www.proteinatlas.org/humanproteome/blood+protein, accessed on: 11 August 2020) [45][46][47]. These proteins are commonly identified by immunoassays, whereas their detection by MS is still a challenge [49]. This means that the customized database has great potential for biomarker research [2]. To verify the validity of the WiMT developments, we confirmed that known and established melanoma biomarkers were clearly identified, including lactate dehydrogenase, metalloproteinases, and S100 proteins [50].
Figure 3. Functional groups identified in whole and immunodepleted plasma samples. The proteins were annotated using the information included in The Human Protein Atlas database for plasma proteins (https://www.proteinatlas.org/search/protein_class:Plasma+proteins, accessed on: 29 July 2020) [45][46][47]. The graphs were built using the protein concentrations in blood reported in the same database. The boxes represent the median and whisker ranges: 5th-95th percentiles.
We also found the enrichment of proteins related to the biosynthesis of amino acids, carbon metabolism, and glutathione metabolism pathways (Table S4), which have been related to different disorders such as cancer and neurodegenerative diseases [51][52][53][54][55][56][57]. The PI3K-Akt signaling pathway has been found to be altered in several types of cancer, including melanoma. It regulates multiple (patho)physiological processes such as cellular growth, survival, invasion, and angiogenesis in melanoma [58]. This pathway is enriched at the Mid-dep and Deep-dep levels, showing a higher number of proteins identified in the latter. Therefore, we can achieve a better understanding of some aspects of these diseases and discover potential biomarkers.
Other strategies can be applied to improve database development and consequently peptide/protein identification by WiMT, including extensive fractionation of immunodepleted samples and/or the addition of orthogonal enrichment methods such as ProteoMiner ® [59]. However, the inclusion of fractionated samples requires an improvement in bioinformatic strategies for chromatogram alignment and MS1 transfer.
Since the database was mainly built based on samples from healthy individuals, its applicability is not restricted to melanoma studies, and it could be applied to other diseases. We included the undepleted data from the analysis of a plasma pool from melanoma patients to maintain the main characteristics of melanoma to the greatest degree possible. Consequently, this enabled us to identify low-abundance proteins that could not be identified in the analysis of a single sample. More specific proteins could be identified by developing a personalized database with depleted samples from patients with the disease in question (in our case melanoma patients). In WiMT, the researcher can adapt the library to address the biological question.

Evaluating the MS1 Transfer Procedure (MS1t)
We optimized an experimental model using diluted HeLa protein digests to evaluate the MS1t. The HeLa digest is a well-known standard and is commonly used by the proteomics community to evaluate the performance of instruments, new sample preparations, or data acquisition methodologies [27,60,61]. It has also been applied in the evaluation of other MS1 transfer methodologies and single-cell proteomics [62,63]. In addition, most of the proteins identified in different types of cancer cell lines can be found in HeLa, which means that this standard provides a qualitative representation of different cell proteomes [64].
The MS1t consists of the transference of MS1 features between two sets of MS data for the identification of peptides, relying on high mass accuracy and reproducible retention times for the comparison of the eluting precursors [18,19,28–32]. However, biological fluids such as plasma contain proteins over a wide dynamic range of concentrations, and in some cases the input of material to be analyzed by LC-MS/MS may be limited. These factors could compromise MS1t efficiency with regard to both quantitative and qualitative aspects. In this context, HeLa digest dilution series were used to evaluate the dynamic and linearity ranges of MS1t.
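The matching step behind MS1 transfer can be sketched as follows. This is a minimal illustration, assuming retention times have already been aligned between runs; it is not the algorithm used by MaxQuant, OpenMS, Proteome Discoverer, or the authors. The `Feature` class, the `transfer_ids` function, the 5 ppm and 0.5 min tolerances, and the peptide labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    mz: float                      # precursor m/z
    rt: float                      # retention time in minutes (assumed aligned)
    charge: int
    peptide: Optional[str] = None  # None for an unidentified MS1 feature


def ppm_error(observed, reference):
    """Mass deviation of `observed` from `reference` in parts per million."""
    return (observed - reference) / reference * 1e6


def transfer_ids(library, targets, ppm_tol=5.0, rt_tol=0.5):
    """Return {target index: peptide} for targets with one unambiguous match.

    The tolerances are illustrative defaults, not values from the paper.
    """
    transferred = {}
    for i, target in enumerate(targets):
        candidates = {
            lib.peptide
            for lib in library
            if lib.peptide is not None
            and lib.charge == target.charge
            and abs(ppm_error(target.mz, lib.mz)) <= ppm_tol
            and abs(target.rt - lib.rt) <= rt_tol
        }
        if len(candidates) == 1:  # ambiguous matches are not transferred
            transferred[i] = candidates.pop()
    return transferred


# Invented features: the library run carries identifications, the target run does not.
library = [
    Feature(mz=500.2500, rt=30.0, charge=2, peptide="PEPTIDEA"),
    Feature(mz=650.3300, rt=45.2, charge=2, peptide="PEPTIDEB"),
]
targets = [
    Feature(mz=500.2510, rt=30.2, charge=2),  # about 2 ppm and 0.2 min away
    Feature(mz=650.3400, rt=45.2, charge=2),  # about 15 ppm away
    Feature(mz=500.2500, rt=30.0, charge=3),  # charge state differs
]
matches = transfer_ids(library, targets)
```

Only the first target receives an identification here. Refusing ambiguous matches is a conservative design choice for the sketch; production tools typically add retention time alignment and a false discovery estimate on top of this kind of matching.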
Overall, the MS1t analysis allowed us to verify approximately 4000 protein identifications linearly transferred from 1 µg to 10 ng analyses. Figure 4A confirms that the regression coefficient using the protein intensity medians per concentration (or dilution) group was greater than 0.99. This demonstrates that MS1t maintains linearity as the input material is reduced. To assess the identification data in detail, MS1t, standard DDA, and DIA analyses were compared for each dilution (see Figure 4B). As expected, the identification rate decreased dramatically as the input material was lowered in DDA mode. At 200 ng, the difference between MS1t and DDA was about 1500 proteins, and with 10 ng, this difference increased to more than 3500 proteins. In the same way, in DIA mode (MS2 acquisition) the protein identification decreased dramatically in samples with the lowest input material (<40 ng). In contrast to standard DDA and DIA, MS1t appeared consistent across the dilution series. The loss in DDA arose because the intensity of some precursors at MS1 was not high enough for them to be selected for fragmentation. In DIA experiments, the low intensity of fragments could be the reason for the lack of a positive identification. Thus, MS1t takes advantage of MS1 detection features to increase proteome coverage. MS1t also appeared robust in terms of variation, as shown in Figure 4C, where the median CV values of all dilution points remained lower than 10%. We found that more than 75% of the proteins in all cases had a CV < 25%.

The MS1t analysis was also performed on proteins grouped according to their abundance. At all dilution points, proteins identified by MS1t and DDA were ranked by their respective intensities and divided into 10 groups of ~300 proteins each. As illustrated in Figure S1, linearity was maintained across the groups regardless of the protein abundance, indicating that MS1t was achievable for proteins with a linear range of at least 2-3 orders of magnitude. When MS1t was contrasted with DDA and DIA in the different groups, it was evident that MS1t became more important as the protein abundances decreased, especially at low levels (Figure S2). In particular, with 40 ng or less input material, low-abundance proteins were accessible mostly by MS1t (groups 8, 9, and 10). Overall, these results indicate that proteins present in the nanogram range could be accessible "exclusively" by MS1t with high transference confidence. This is particularly relevant in plasma/serum studies where the proteome coverage could be improved by applying the MS1t concept, especially for low-abundance key regulators. Furthermore, although the HeLa model was successfully applied in the evaluation of our methodology, the use of a melanoma cell line could reveal additional information regarding key and/or low-abundance melanoma proteins for their effective identification through MS1 transfer.
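The two figures of merit used in this evaluation, the linearity of median intensity against input amount and the coefficient of variation (CV), can be computed as in the sketch below. The intensity values are invented for illustration, and fitting on log10-transformed values is an assumption, since the text does not state how the regression was performed.

```python
import math
import statistics


def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line y ~ x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def cv_percent(values):
    """Coefficient of variation (sample SD divided by mean) in percent."""
    return statistics.stdev(values) / statistics.fmean(values) * 100.0


# Invented numbers: injected amount (ng) and median protein intensity per
# dilution group. They are not the data behind Figure 4A.
input_ng = [10, 40, 200, 1000]
median_intensity = [1.1e5, 4.2e5, 2.0e6, 9.8e6]

# Assumption: regression on log10-transformed amount and intensity.
log_input = [math.log10(v) for v in input_ng]
log_intensity = [math.log10(v) for v in median_intensity]
linearity = r_squared(log_input, log_intensity)

# Invented replicate intensities for one protein at one dilution point.
replicate_cv = cv_percent([100.0, 110.0, 90.0])
```

With these toy values the fit is close to a straight line and the replicate CV is 10%, which mirrors the kind of thresholds quoted in the text (regression coefficient above 0.99, median CV below 10%).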

MM Plasma Proteome Assessment by Applying WiMT
Unlike other methodologies for MS1 transfer in plasma, which used extensive depletion (top 14-20) together with prior peptide fractionation [18,19], we developed a strategy based on single-shot proteomics (without peptide fractionation). It takes advantage of the power of SuperMix depletion (reaching more than 1000 proteins after LC-MS) combined with other depletion strategies in order to compensate for the specific losses during each stage of depletion. Thus, in a single-shot experiment it is possible to run each depletion method using the same LC-MS conditions under which the undepleted samples are analysed, while keeping the MS1 transfer as simple as possible and equivalent to the procedure used with diluted HeLa.
The custom database described in the first section was applied for peptide identification using MS1t from immunodepleted to undepleted plasma from MM patients, with more than 1200 proteins and 10,000 peptides identified in total with significant scoring. About 660 proteins and ~5800 peptides were annotated by WiMT (Tables S5 and S6). Although these peptides were not identified from MS2 spectra, their presence was supported by MS1t evaluation and FDR filtering, and the multiple-dilution HeLa experiment provided evidence of a substantial improvement in protein identification by MS1t. Furthermore, 80% of the proteins reported in The Human Melanoma Proteome Atlas from depleted plasma samples of the same patients were also identified here [65]. Additionally, our improvement in protein identification is similar to that reported by Geyer et al. in 2016 [18].
The total expression dataset analyzed from melanoma patients is important as it builds on the expansion of our melanoma database over time. The WiMT approach increased the number of proteins identified in undepleted plasma and consequently enriched several biological processes and pathways that could only be accessed at the highest levels of depletion. The database was built with a depletion protocol, without using any sample fractionation procedures. Therefore, each depletion method is represented by nLC-MS/MS shotgun sequencing, providing the MS1 features and enabling good chromatogram alignment for the MS1t. A graphical representation of the four-layer custom database is shown in Figure 5A. The intensities of the proteins transferred to the undepleted samples are represented by colors depending on the depletion level. The proteins identified show different intensities according to the immunodepletion strategy applied due to MS signal improvements. Thus, higher-intensity layers are associated with the plasma depletion extension (Low-dep, Mid-dep, and Deep-dep). With this strategy, 1088 proteins were identified in a pool of undepleted plasma samples from MM patients (Table S7). Approximately 81-94% of the proteins identified in each of the 3 immunodepletion strategies were transferred to the undepleted plasma samples. As discussed in The Human Melanoma Proteome Atlas [65], we could also identify more than 60% of the FDA-approved biomarkers. The identification in the non-depleted samples of more than 80% of the proteins identified from this pool of metastatic depleted melanoma samples supported the identification process [65]. The analysis included 12 potential MM biomarker candidates such as lactate dehydrogenase, C-reactive protein, serum amyloid A, osteopontin, and the melanoma cell adhesion molecule, among others (Table 1). 
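The share of each depletion level that is recovered in undepleted plasma after MS1t, quoted above as approximately 81-94%, is a simple set overlap. The sketch below shows the calculation on invented identifiers; the function name, the protein labels, and the resulting percentages are hypothetical and do not reproduce Table S7.

```python
def transfer_rate(level_proteins, transferred):
    """Percentage of a depletion level's proteins also found after MS1t."""
    if not level_proteins:
        raise ValueError("level has no proteins")
    return 100.0 * len(level_proteins & transferred) / len(level_proteins)


# Invented identifiers standing in for database entries per depletion level.
database = {
    "Low-Dep": {"P1", "P2", "P3", "P4", "P5"},
    "Mid-Dep": {"P2", "P3", "P4", "P6"},
    "Deep-Dep": {"P3", "P6", "P7", "P8", "P9", "P10"},
}
# Invented set of proteins identified in undepleted plasma after MS1t.
undepleted_after_ms1t = {"P1", "P2", "P3", "P4", "P6", "P7", "P8", "P9"}

rates = {
    level: transfer_rate(proteins, undepleted_after_ms1t)
    for level, proteins in database.items()
}
```

Reporting the rate per level, and not only for the pooled database, makes it visible whether the deepest level transfers less efficiently than the shallower ones.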
In general, the deregulation of these protein abundances is also associated with other types of cancer and even with other diseases, which means that these proteins are not melanoma-specific [66][67][68][69][70][71][72]. However, their abundances can be included in protein signatures/patterns to capture changes related to the physiological characteristics of melanoma patients.

Figure 5. ... [45][46][47]. The graphs were built using the protein concentrations in blood reported in the same database. The boxes represent the median and whiskers for the 5th and 95th percentiles, respectively. (D) KEGG pathway enrichment analysis for the comparison of undepleted plasma before and after MS1t. Each circle represents a pathway, the size of each circle is related to the number of proteins, and the colors distinguish the results obtained before and after MS1t.

Table 1. List of proteins considered as MM biomarker candidates in plasma or serum by previous works. Among the 17 proteins, lactate dehydrogenase and C-reactive protein have been approved by the FDA. Using the WiMT strategy, we were able to identify 12 proteins in the pools and 11 in individual plasma from MM patients. The immunodepletion strategies were important for the identification of osteopontin, chitinase-3-like protein 1, laminin, tenascin C, and collagen type VI. The percentage of MM patients that had proteins identified with WiMT was calculated, and most were identified in all 10 patients.
In addition, the WiMT strategy covered more than 40% (176/435) of the secreted blood proteins previously identified by immunoassays and collected based on published research articles as described in The Human Protein Atlas (https://www.proteinatlas.org/humanproteome/blood+protein/proteins+detected+by+immunoassay, accessed on: 28 November 2021) [45][46][47]. There are 55 out of 110 currently FDA-approved blood biomarkers in the list above, with just 11 falling within the very- to ultra-low abundance protein range, i.e., from 10 µg/L to below 10 ng/L, respectively. In this context, none of these proteins were identified by applying WiMT, nor have they ever been identified in similar studies on undepleted plasma [18].
However, the WiMT strategy allowed the identification of 247 proteins with concentrations under 10 ng/mL, which opens up the possibility of identifying novel melanoma biomarkers with this approach, as has been predicted when more than 1000 proteins are identified in plasma samples [2].
To support the identification process, the multi-level design of the database allows us to categorize the proteins identified in the undepleted samples according to the depletion-level information that we obtained from the well-characterized database. Significant differences were observed in protein abundances when comparing the four levels (Figure 5B). A decrease in the median abundance of proteins was observed from the category identified in undepleted plasma to the SuperMix category. These results support the identification process based on a clear association between the protein expressions in the undepleted samples and the depletion levels from the database. No significant difference was found between the top7 and top14 categories, as expected from the characterization of the Low-dep and Mid-dep levels.
Although the number of proteins and their intensities increase from top7 to top14, this increment is not sufficient for the observation of statistically significant changes. More importantly, this shows that it is possible to increase the coverage of the MPP using a WiMT strategy similar to the one achieved with the database. Furthermore, the identification of the same functional groups enriched by these immunodepletion approaches (see Figures 3 and 5C) was achieved. The increased proteome coverage was reached without any additional steps in sample processing. In this way, the quantitative characteristics of the sample were maintained, helping to ensure protein transfer. Figure 5D shows the pathways enriched in the analysis of non-depleted plasma as compared with the results obtained with WiMT. This resulted in the enrichment of the pathways discussed above, including the PI3K-Akt signaling pathway, the biosynthesis of amino acids, carbon metabolism, adherens junctions, proteasomes, and lysosomes. In addition, WiMT increased the number of proteins identified in relation to some of these pathways.
In addition, WiMT was applied to the analysis of plasma samples from 10 malignant melanoma patients in the early stages of the disease, i.e., when the primary tumors were detected. Using the custom database, 1134 proteins were identified (Table S7). Notably, almost 90% of the proteins identified in a pool of depleted samples from metastatic melanoma patients [65], including the potential melanoma biomarkers, were covered using the WiMT methodology in undepleted samples. Most of the biomarkers (Table 1) can be found in all the patients, except for osteopontin, which was identified in 40% of the samples. The proteomap based on KEGG pathway enrichment showed that the plasma proteome can be divided into six major groups: environmental information processing, genetic information processing, organismal systems, metabolism, cellular processes, and human diseases (see Figure 6A). These groups can be further divided into categories such as signaling molecules and interaction, signal transduction, biosynthesis, the immune system, and vesicular transport, for instance. The third level of categorization shows the detection of proteins related to the PI3K-Akt, MAPK, and Ras signaling pathways, cell adhesion molecules, complement and coagulation cascades, glycolysis, peptidases, proteasomes, lysosomes, and exosome proteins. Gene ontology analysis showed that more than 200 biological processes were enriched, including the immune system, cell adhesion, angiogenesis, inflammatory response, and positive regulation of the ERK1 and ERK2 cascade. These biological processes and pathways have been found to be dysregulated in many types of cancers, including melanoma, which indicates that the proteins identified could be potential biomarkers, as discussed previously. Evaluating the percentages of genes identified in each process, the 10 patients showed similar results, indicating good reproducibility of the results among the patients using WiMT strategy (see Figure 6B). 
When comparing the biological processes enriched using the MS1t strategy and the protein identification by MS2 spectra annotation only, 102 biological processes were specifically enriched. Within these, angiogenesis, cell adhesion, adherens junction organization, cell migration, and the regulation of cell proliferation were identified as constituting the main biological processes (Figure 6C). Quantitative results showed a good correlation between the two MM pool replicates, with R = 0.97 and a p-value < 0.0001 (Pearson correlation) (Figure S3), and a coefficient of variation lower than 10% (85% of the proteins with CV 25%). Similar results were obtained in other experiments with undepleted plasma samples using the WiMT strategy, with a CV < 10% and Pearson correlation above 0.96 (p-value < 0.0001). These results showed a good correlation (R² of 0.6154) between the estimated protein abundances by mass spectrometry and the estimated protein concentrations in blood (available at https://www.proteinatlas.org/humanproteome/blood+protein, accessed on: 5 March 2020) [45][46][47], which is comparable to the values obtained previously [18] in the analysis of undepleted plasma with similar strategies (see Figure S3). A good correlation was also observed among the results from the 10 MM patients, with a mean R = 0.92 (Pearson correlation) and a coefficient of variation lower than 5% (Figure S4).
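The replicate agreement reported above rests on the Pearson correlation of protein abundances. As a minimal sketch of that calculation (not the authors' actual pipeline), the hypothetical helper `pearson_r` below correlates two replicate abundance vectors and, as an assumption made here, skips proteins that are missing in either replicate.

```python
import math
from typing import Optional, Sequence


def pearson_r(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Pearson correlation over proteins quantified in both replicates.

    Values are expected to be log2 abundances; None marks a missing value.
    Pairwise-complete handling is an assumption of this sketch.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two proteins quantified in both replicates")
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


# Illustrative (made-up) log2 abundances for four proteins in two replicates.
replicate_1 = [20.1, 18.4, None, 25.0]
replicate_2 = [20.3, 18.1, 22.0, 24.8]
print(round(pearson_r(replicate_1, replicate_2), 3))
```

The abundance values in the usage lines are invented for illustration only and do not come from the study.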
Altogether, our results provide evidence of a successful MS1 transfer, generating the identity and quantification of more than 1200 proteins in undepleted plasma and detecting 12 out of 17 potential MM biomarker candidates. Here, the WiMT was applied to characterize MPP; however, this strategy can be expanded for the study of different pathologies. Although the WiMT is based on well-known concepts (MS1 transfer), the accuracy and methods for controlling the false discovery rates after transfer depend on the platforms used and have previously been discussed [29,31,33,36,78,79]. For reliable MS1 transfer, a highly reproducible retention time and accurate determination of m/z are required for chromatogram alignment. Therefore, the use of robust HPLC systems coupled to high-resolution mass spectrometry such as Q Exactive HF-X (ThermoScientific) is indispensable. We strongly recommend the use of WiMT only for relative quantification in discovery proteomics, and complementing the analysis, when possible, with orthogonal experiments and biological or clinical information. The validation of selected differentially expressed proteins could be performed in another cohort by use of low-resolution mass spectrometers such as triple quadrupoles, or immunoassays.

Blood Sample Collection and Storage
Blood sample collection was performed before tumor resection surgery at Semmelweis University Hospital. The samples underwent automated fractionation into plasma, serum, lymphocytes, and erythrocytes [80,81] and were stored at −80 °C within 2 h. The samples were then transferred in dry ice to the melanoma biobank (Lund, Sweden) where they were stored at −80 °C until further processing. The project was approved by the local Ethical Committee 727 and the Ethical Committee at Semmelweis University (191-4/2014), as well as the Swedish Ethical Review Authority in Lund (code DNR 2014/311). All patients provided written informed consent. Here, the analyses were performed using a pool of 57 MM patients at different stages of the disease, plasma samples from 10 patients at the primary tumor stage, and a pool of 30 healthy individuals.

Plasma Immunodepletion
A pool of plasma samples from healthy individuals (n = 30) was depleted using a Multiaffinity Removal Column human-7 (4.6 × 50 mm), Multiaffinity Removal Column human-14 (4.6 × 100 mm) (Agilent Technologies, Santa Clara, CA, USA), and Seppro® SuperMix LC2 (6.4 × 63 mm) (Sigma-Aldrich, St. Louis, MO, USA) coupled to a 1260 Infinity LC System (Agilent Technologies, Santa Clara, CA, USA). Each immunodepletion protocol was performed according to the manufacturer's instructions on technical replicates (Figure S5A-C). To eliminate the variation caused by technical issues, the replicates for each strategy were pooled together for further steps.
After depletion, the samples were submitted to a buffer exchange using an Amicon Ultra Centrifugal filter (0.5 mL-10 kDa, Millipore, County Cork, Ireland). Briefly, samples were transferred to the Amicon 10 kDa and centrifuged at 13,000× g for 20 min. Then, 400 µL of 50 mM ammonium bicarbonate (Ambic) was added, followed by centrifugation at 13,000× g for 20 min. This step was repeated, and centrifugation was carried out for 30 min. Lastly, 70 µL of 10% of sodium dodecyl sulfate (SDS)/25 mM of 1,4-dithiothreitol (DTT) in 100 mM of triethylammonium bicarbonate buffer (TEAB) were added, the Amicon was turned upside down, and the sample was recovered in a tube by centrifugation at 1000× g for 5 min.

Sample Digestion
Samples were digested in an S-Trap (Protifi, Farmingdale, NY, USA) plate, as described by Kuras et al. (2020) [82]. Briefly, 70 µg of protein, quantified by Pierce 660 nm protein assay (Thermo Scientific, Waltham, MA, USA), was used for sample processing in the top7 approach, and all protein content in top14 and SuperMix. For sample reduction, samples were incubated in SDS/25 mM DTT in 100 mM TEAB for 5 min at 99 °C, with shaking at 500 rpm. Alkylation was performed with iodoacetamide at a final concentration of 50 mM for 30 min at room temperature in the dark. The samples were then acidified by adding orthophosphoric acid to a final concentration of 1.2% and diluted 7× with binding buffer (90% methanol, 100 mM TEAB). Samples were transferred to the S-Trap plate and captured proteins were washed 4 times with binding buffer. Each step was performed with centrifugation at 1000× g for 2 min. The protein digestion was carried out by adding LysC (Wako Chemicals, Richmond, VA, USA) in 50 mM TEAB in a ratio of 1:50 (enzyme:protein) and incubating the S-Trap plate at 37 °C for 2 h, followed by the addition of trypsin (Promega, Madison, WI, USA) (1:50) in 50 mM TEAB and incubation at 37 °C overnight. Peptide elution was performed in 3 steps by adding 80 µL of 50 mM TEAB, 0.2% formic acid (FA), and then 50% acetonitrile (ACN)/0.2% FA, centrifuging the S-Trap plate at 1000× g for 2 min after each step. The peptides were dried down and resuspended in 40 µL of 2% ACN/0.1% trifluoroacetic acid (TFA). Peptide content was estimated using the Pierce Quantitative Colorimetric Peptide Assay (Thermo Scientific, Waltham, MA, USA) prior to nLC-MS/MS analysis.

LC-MS/MS Analysis
The data were acquired using the data-dependent acquisition (DDA) mode in an UltiMate 3000 RSLCnano system coupled with the high-resolution Q Exactive HF-X mass spectrometer (Thermo Fisher Scientific, San José, CA, USA) to guarantee retention time reproducibility and mass accuracy. The full MS scan was set with an acquisition range of m/z 375-1500, a resolution of 120,000 (at m/z 200), a target AGC value of 3 × 10⁶, and a maximum injection time (IT) of 100 ms. The top 20 precursors were fragmented with a normalized collision energy (NCE) of 28. For the MS2 acquisition, the instrument was set with a resolution of 15,000 (at m/z 200), a target AGC value of 1 × 10⁶, a maximum IT of 50 ms, and an isolation window of 1.2 m/z. Dynamic exclusion was 40 s. Approximately 2 µg of peptides were analyzed for each sample with at least 2 replicates. Peptide elution was performed with a gradient of ACN and FA for 120 min, using the trap column C18 Acclaim PepMap™ 100 (2 cm × 75 µm i.d.; 100 Å) and the column PepMap™ RSLC C18 (2 µm, 100 Å, 75 µm i.d. × 50 cm).

Data Analysis
Data analysis was performed in Proteome Discoverer 2.4 (Thermo Scientific, San José, CA, USA). For peptide identification, MSPepSearch was used against the Human spectral library ProteomeTools_HCD28_PD using a UniProt human database (Date: 28 January 2020). SEQUEST HT was also used against the same UniProt human database for peptides left unassigned by MSPepSearch. For the peptide search, cysteine carbamidomethylation was set as a static modification, methionine oxidation as a dynamic modification, and acetylation, methionine loss (met-loss), and met-loss plus acetylation as dynamic modifications at the protein terminus. The precursor and fragment mass tolerance were set at 10 ppm and 0.02 ppm, respectively, and up to 2 missed cleavages were allowed. The confidence level used was FDR < 0.01 at the peptide level and FDR < 0.05 at the protein level. The Feature Mapper node was used in the consensus workflow for chromatographic alignment and identification of peptides based on MS1 information. For chromatographic alignment, the maximum RT shift was set at 3 min and the mass tolerance at 10 ppm. For feature linking and mapping, the RT and mass tolerance were set at 0 min and 0 ppm, respectively, the default for Proteome Discoverer, and the minimum S/N threshold was 5. Peptide and protein quantifications were performed based on the label-free quantification approach, using the precursor ion intensity to infer peptide abundance and considering all peptides to calculate the abundance at the protein level.
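To make the role of the two alignment tolerances concrete, the following is a minimal sketch of tolerance-based MS1 feature matching. It is not the Feature Mapper algorithm of Proteome Discoverer, whose internals are not described here; the `Feature` class and the functions `ppm_error` and `match_feature` are hypothetical names introduced for illustration. The defaults reuse the alignment values stated above (10 ppm, 3 min).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Feature:
    mz: float       # precursor m/z
    rt_min: float   # retention time at the peak apex, in minutes


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Signed mass error in parts per million relative to the reference m/z."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def match_feature(
    query: Feature,
    identified: Sequence[Feature],
    max_ppm: float = 10.0,
    max_rt_shift_min: float = 3.0,
) -> Optional[Feature]:
    """Return the identified feature closest to `query` within both tolerances.

    Candidates are ranked by absolute ppm error, then by retention time
    difference; this ranking is an assumption of the sketch.
    """
    best: Optional[Feature] = None
    best_key = None
    for candidate in identified:
        ppm = abs(ppm_error(query.mz, candidate.mz))
        rt_diff = abs(query.rt_min - candidate.rt_min)
        if ppm <= max_ppm and rt_diff <= max_rt_shift_min:
            key = (ppm, rt_diff)
            if best_key is None or key < best_key:
                best, best_key = candidate, key
    return best


# Illustrative values only: features identified by MS2 in a depleted run,
# and an unidentified MS1 feature from an undepleted run.
depleted_run = [Feature(500.000, 30.0), Feature(500.004, 35.0), Feature(600.000, 30.5)]
undepleted_feature = Feature(500.002, 31.0)
print(match_feature(undepleted_feature, depleted_run))
```

In this example the second candidate is within the mass tolerance but outside the 3 min retention time window, so only the first candidate can donate its identification.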

Bioinformatic Analysis
For data analysis, the proteins identified in at least 1 technical replicate were included. The characterization of undepleted and depleted plasma proteome was performed using the DAVID functional annotation tool, analyzing the functional categories "UP_KEYWORDS" and "KEGG Pathway" and considering the results with a p-value and FDR < 0.05. The quantitative analysis was performed with the Perseus 1.6.12.0 software. The data were log2-transformed, normalized by subtracting the median, and filtered by 1 valid value in each group. The coefficient of variation (CV) between experiments was determined considering the lognormal distribution of the MS experiment results [83][84][85]. The heatmap was built using the mean values and normalizing the proteins by Z-score. The construction of boxplot graphs was performed with the GraphPad Prism 8.3.1 software using the estimated concentration of proteins in the blood available in The Human Protein Atlas database for plasma proteins (available at https://www.proteinatlas.org/humanproteome/blood+protein, accessed on: 5 March 2020). The proteins were annotated using the information available in the same database (available at https://www.proteinatlas.org/search/protein_class:Plasma+proteins, accessed on: 29 July 2020) [45][46][47].
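The preprocessing steps named above (log2 transformation, median subtraction, valid-value filtering, and a CV that accounts for lognormally distributed intensities) can be sketched as follows. This is an illustration in plain Python rather than the Perseus workflow itself; the function names are hypothetical, and the CV uses the standard lognormal relation CV = sqrt(exp(σ²) − 1), with σ the standard deviation of the natural-log intensities, which is assumed here to correspond to the approach cited in [83][84][85].

```python
import math
import statistics
from typing import List, Optional, Sequence


def log2_median_normalize(intensities: Sequence[Optional[float]]) -> List[Optional[float]]:
    """log2-transform one sample and subtract the sample median.

    Missing or non-positive intensities are kept as None.
    """
    logged = [math.log2(v) if v is not None and v > 0 else None for v in intensities]
    median = statistics.median(v for v in logged if v is not None)
    return [v - median if v is not None else None for v in logged]


def has_min_valid(values: Sequence[Optional[float]], min_valid: int = 1) -> bool:
    """True if a protein has at least `min_valid` quantified values in a group."""
    return sum(v is not None for v in values) >= min_valid


def lognormal_cv(log2_values: Sequence[float]) -> float:
    """CV (as a fraction) on the raw intensity scale, from log2 replicate values.

    Requires at least two values. The log2 standard deviation is converted to
    natural-log units before applying CV = sqrt(exp(sigma^2) - 1).
    """
    sigma_ln = statistics.stdev(log2_values) * math.log(2)
    return math.sqrt(math.exp(sigma_ln ** 2) - 1.0)


# Illustrative raw intensities for three proteins in one sample.
print(log2_median_normalize([2.0, 4.0, 8.0]))
# Illustrative log2 abundances of one protein across three replicates.
print(round(lognormal_cv([20.0, 20.1, 19.9]) * 100, 1), "% CV")
```

Because the normalization subtracts a per-sample median on the log2 scale, it does not change the within-protein spread that enters the CV.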

Evaluation of MS1-Transferring Efficiency-HeLa Digest Dilution Series
The peptide dilution series were prepared from commercial HeLa digest (Pierce™ HeLa Protein Digest Standard, Thermo Scientific, Waltham, MA, USA). The analysis considered the following concentrations: 1000, 500, 200, 100, 40, 20, and 10 ng/µL. Each dilution was analyzed by LC-MS/MS by injecting 1 µL of solution. For the DIA analyses, the dilutions were spiked with iRT Kit peptides (Biognosys, Schlieren, Switzerland) for retention time normalization. All analyses were performed in triplicate.
Sample loading, separation, and data acquisition were performed in the same LC-MS/MS system described previously. Samples were separated using 97 min gradients, and data were acquired in both DDA and DIA modes. In DDA, the instrument was set as follows: full MS in a range of m/z 375-1500, resolution of 120,000 (at m/z 200), target AGC value of 3 × 10⁶, and maximum injection time (IT) of 50 ms. The top 20 precursors were fragmented with an NCE of 28. MS2 acquisition was set with a resolution of 15,000 (at m/z 200), a target AGC value of 1 × 10⁵, maximum IT of 19 ms, and an isolation window of 1.2 m/z. Dynamic exclusion was set to 40 s. For DIA acquisition, the instrument was set as follows: full-MS scan parameters were kept the same as described for DDA experiments; for fragmentation analysis, the NCE was set at 28, resolution at 30,000 (at m/z 200), the AGC target value at 1 × 10⁶, and the MSX count and isolation window at 18 and m/z 16, respectively. Data from DDA were analyzed using the same parameters described previously. For DIA, protein identification was performed in Spectronaut (Biognosys, Schlieren, Switzerland) with the following parameters: chromatogram alignment and RT calibration were performed with the Biognosys iRT kit; the MS quantity level was set as MS1; the quantity type was set as the height and precursor; and the protein q-value cutoff was 1%. The spectral library was built in Spectronaut (Biognosys, Schlieren, Switzerland) using all Proteome Discoverer results from the DDA data.
To demonstrate the linear nature of the MS1 transfers, a linear regression analysis per protein was performed, where the MS intensities (log2 expression values) were used as a response variable and the dilution points as an independent variable. The rate of false discoveries was analyzed by following a target-decoy strategy [86]. A decoy set of proteins was created by randomizing the MS intensities from the original set of proteins (target) so that each protein adopted an incorrect intensity value. The linear regression analysis was repeated using both sets of proteins (target + decoy) and the R² parameter was used as a score to determine the FDR. In this case, an FDR threshold of 5% was set and these proteins were considered to be truly linearly transferred. Linear regression analyses were performed using the R software [87,88].
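The target-decoy evaluation of linearity can be sketched as below. The original analysis was run in R and its exact randomization scheme is not specified here, so this Python illustration makes two labeled assumptions: decoys are generated by shuffling each protein's intensities across the dilution points, and the FDR at a score cutoff is estimated as the number of decoys passing divided by the number of targets passing. The function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no linear trend can be scored
    return (sxy * sxy) / (sxx * syy)


def make_decoys(target_profiles: Sequence[Sequence[float]], seed: int = 0) -> List[List[float]]:
    """Assumed decoy model: shuffle each protein's intensities across dilutions."""
    rng = random.Random(seed)
    decoys = []
    for profile in target_profiles:
        shuffled = list(profile)
        rng.shuffle(shuffled)
        decoys.append(shuffled)
    return decoys


def score_cutoff_at_fdr(
    target_scores: Sequence[float],
    decoy_scores: Sequence[float],
    max_fdr: float = 0.05,
) -> Optional[float]:
    """Lowest target score cutoff whose estimated FDR (decoys/targets) <= max_fdr."""
    cutoff_found = None
    for cutoff in sorted(set(target_scores), reverse=True):
        n_target = sum(s >= cutoff for s in target_scores)
        n_decoy = sum(s >= cutoff for s in decoy_scores)
        if n_decoy / n_target <= max_fdr:
            cutoff_found = cutoff
    return cutoff_found


# Illustrative use: dilution points (x) and made-up log2 intensities per protein.
dilution_points = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
targets = [
    [10.0, 11.1, 11.9, 13.0, 14.1, 15.0, 16.0],
    [12.0, 12.9, 14.2, 15.0, 15.8, 17.1, 18.0],
]
decoys = make_decoys(targets)
target_r2 = [r_squared(dilution_points, p) for p in targets]
decoy_r2 = [r_squared(dilution_points, p) for p in decoys]
print(score_cutoff_at_fdr(target_r2, decoy_r2))
```

Proteins whose R² is at or above the returned cutoff would be the ones treated as linearly transferred under these assumptions; a return value of `None` means no cutoff reaches the requested FDR.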

Sample Description
A pool (n = 57) of MM samples from patients at different stages of the disease (as described in The Human Melanoma Proteome Atlas [65,89]) and 10 individual samples from MM patients in the early stage (primary tumor) of disease were analyzed. All patients had undergone surgical resection of their tumors and subsequent histopathological characterization supported by imaging studies. Table 2 displays the clinicopathological properties of the patients analyzed herein.

Sample Digestion
EDTA plasma samples were diluted with MilliQ water (1:10), and an aliquot of 8.75 µL (~70 µg of protein) of each sample was used for the S-Trap protein digestion protocol. For sample reduction, 42.25 µL of 10% SDS/25 mM DTT in 100 mM TEAB solution was added to the diluted plasma. The reduction, alkylation, and digestion steps were performed as described previously [82].

LC-MS/MS Analysis
The samples were loaded, separated, and analyzed in the same system, as described for custom database development. The elution gradient and the parameters for data acquisition were kept the same.

Data Analysis
The data analysis was performed using the same parameters described previously. The proteomap analysis was performed using the online tool Bionic Visualizations-Proteomaps (Homo sapiens database) [90][91][92]. For MS1t, all depleted samples were processed together with the undepleted ones in the same workflow. To determine the FDR of the MS1 transfer, a target-decoy strategy was followed [86]. A decoy set of proteins was created by simulating a distribution of values similar to that followed by the MS1 intensities of the transferred proteins (target dataset). To calculate the FDR values, an empirical score [20,93,94] per protein was created based on the protein intensity (Ii), the protein abundance rank (proteins sorted by ascending MS intensities, Rank1), and the probability of being a plasma protein.
The probability of being a plasma protein (PDi) was determined based on cumulative distribution functions that utilized the previous identification of plasma proteins taken from both public repositories (Table S8) and in-house experiments (Table S9). Finally, the empirical scoring scheme was applied (Figure S6) for both the decoy and target protein datasets with an FDR threshold set at 5%. Proteins with FDR < 0.05 were considered correctly transferred. The analysis was performed using R software [87,88] and SPSS Statistics 21.0 (IBM, Somers, IL, USA). Bioinformatic analysis was performed as previously described for the characterization of the custom database.

Conclusions
We established the MPP of undepleted patient samples using our newly developed WiMT strategy, mapping more than 1200 proteins and 10,000 peptides in non-depleted plasma samples from MM patients. The MPP is mainly characterized by proteins related to cancer pathway signaling processes, the immune system, genetic information processing (protein folding, sorting, and degradation), cellular processes (protein transport) and the biosynthesis of metabolites. These results represent the proteins and processes that could be followed by the proteomic analysis of undepleted plasma from melanoma patients. Our results show great potential for large-scale screening in melanoma proteomics studies, providing an invaluable tool for monitoring blood proteins in melanoma patients. The developments are generic and can be applied to other neoplastic diseases.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/cancers13246224/s1. Figure S1: Linearity analysis of protein abundance groups depending on the protein amount analyzed. Figure S2: Comparison of the results obtained by MS1t, DDA, and DIA according to the range of protein abundance. Figure S3: Correlation analysis between pool replicates and protein blood concentration. Figure S4: Pearson correlation analysis among the results from the 10 MM patients. Figure S5A-C: Chromatograms obtained from the immunodepletion strategies. Figure S6: Empirical scores used for MS1 transfer evaluation. Table S1: Custom database description. Classification of proteins based on the depletion levels. Table S2: Cluster B proteins depleted by the SuperMix strategy. Table S3: Summary of functional annotation analysis results for the undepleted plasma and the 3 immunodepletion approaches. Table S4: KEGG pathway analysis results for the undepleted plasma and the 3 immunodepletion approaches. Table S5: Peptide identification in the pool of plasma samples from MM patients. Table S6: Peptide identification in plasma samples from the 10 MM patients. Table S7: Proteins identified in the analysis of MM pool and primary tumor patients by MS2 and WiMT. Table S8: Probability of proteins from public repository being plasma proteins. Table S9: Probability of proteins from in-house experiments being plasma proteins.