Inflammatory Proteins HMGA2 and PRTN3 as Drivers of Vulvar Squamous Cell Carcinoma Progression

Simple Summary
Our study aimed to advance the understanding of vulvar squamous cell carcinoma (VSCC) biology by identifying biological pathways that drive the progression of this disease. We followed an experimental path from global proteomic analysis of vulvar tumors to the targeted, quantitative assessment of specific proteins in both the tumors and the blood of VSCC patients. The proteomic analysis advanced the knowledge of VSCC biology by pointing to inflammation as a driver of progression and by providing grounds for the hypothesis that disturbances of the vulvovaginal microflora trigger the inflammatory response. The study results indicate prognostic protein markers and potential therapeutic targets for improved and personalized management of VSCC.

Abstract
Current knowledge on the biology of vulvar squamous cell carcinoma (VSCC) is limited. We aimed to identify protein markers of VSCC tumors that would allow patients to be stratified by progression risk. Early-stage tumors from patients who progressed (progVSCC) and from those who were disease-free (d-fVSCC) during follow-up, along with normal vulvar tissues, were examined by mass spectrometry-based proteomics. Differentially expressed proteins (DEPs) were then verified in solid tissues and blood samples of patients with VSCC tumors and vulvar premalignant lesions. In progVSCC vs. d-fVSCC tumors, the immune response was the most over-represented Gene Ontology category for the identified DEPs. Pathway profiling suggested that bacterial infections are linked to aggressive VSCC phenotypes. High Mobility Group AT-Hook 2 (HMGA2) and Proteinase 3 (PRTN3) were revealed as proteins predicting VSCC progression. HMGA2 and PRTN3 abundances are associated with an aggressive phenotype and hold promise as markers for VSCC patient stratification. It appears that vulvovaginal microflora disturbances trigger an inflammatory response contributing to cancer progression, suggesting that bacterial rather than viral infection status should be considered in the development of targeted therapies in VSCC.


High-pH Reversed-Phase Fractionation
For extensive peptide fractionation, 755 µg of the iTRAQ-labeled tissue digest (dissolved in 10 mM ammonium hydroxide) was separated with an H-Class Waters UHPLC system at high pH (10 mM ammonium hydroxide, pH 10) on a reversed-phase column (Waters XBridge Peptide BEH C18 Column: 130 Å, 5 µm, 4.6 mm × 250 mm) at a flow rate of 1 mL/min with a gradient of 3-55% acetonitrile (ACN) over 60 min, followed by a washing step (10 min at 90% ACN) and column equilibration (15 min at 3% ACN). Sixty 1 mL fractions were collected, dried completely using a SpeedVac, and dissolved in 50 µL of 0.1% formic acid (FA); 20 µL of each fraction was used for nano-LC-MS/MS analysis.

LC-MS/MS Analysis
A Waters nanoACQUITY UPLC system was used for peptide separation prior to MS/MS analysis. Mobile phase A consisted of 0.1% FA and mobile phase B was ACN/0.1% FA. A 20 µL aliquot of each fraction was injected onto a reversed-phase trapping column (180 µm × 20 mm, C18, 5 µm, ACQUITY UPLC Symmetry, Waters) using mobile phase A. Peptides were transferred to a reversed-phase analytical column (75 µm × 250 mm, nanoACQUITY UPLC BEH130 C18 Column, 1.7 µm, Waters) and separated at a flow rate of 250 nL/min with a gradient of 3-33% mobile phase B over 150 min. The column was directly coupled to the ion source of a Q-Exactive Orbitrap mass spectrometer (Thermo Fisher Scientific), which was operated in data-dependent mode, switching from MS to MS/MS, with the following settings: positive polarity mode (2-3 kV), capillary temperature of 250 °C, HCD fragmentation (NCE: 27), isolation width of 1.2 Th, one MS scan (300-2000 m/z, 70k resolution, 60 ms max fill time, 1e6 AGC) followed by a maximum of 12 MS/MS scans (17.5k resolution, 60 ms max fill time, 5e5 AGC), a dynamic exclusion of 30 s, and profile mode acquisition.

Qualitative MS/MS Data Processing and iTRAQ Quantitative Analysis
Data processing was performed as previously described [1]. Briefly, Mascot Distiller (version 2.5.1.0, Matrix Science, London, UK) was used to pre-process the MS/MS data with default settings for iTRAQ labeling. This included merging redundant spectra and removing noisy spectra. Next, a two-step database search procedure was carried out [2]. A first, less restrictive search was conducted using the MASCOT search engine and the Swiss-Prot Homo sapiens database (from 2015); this enabled the calculation of MS and MS/MS measurement errors and recalibration of the data for a repeated, stringent MASCOT search. The initial search parameters were set as follows: enzyme, trypsin; fixed modifications, cysteine modification by MMTS and iTRAQ labeling of peptide N-termini and lysine side chains; variable modification, oxidation (M); maximum missed cleavages, 1; peptide tolerance, 100 ppm; MS/MS tolerance, 0.2 Da. The data were calibrated and filtered using the MScan program with a queries threshold of 10. The merged file of all fractions was then searched again using MASCOT against the Swiss-Prot database supplemented with a decoy database of randomized sequences mixed in, enabling the calculation of the false discovery rate (FDR). This procedure provided q-value estimates for each peptide-spectrum match (PSM) in the dataset. All PSMs with q-values >0.01 were removed from further analysis. Relative to the initial search, the parameters for this second search used a peptide tolerance of 6 ppm and a fragment ion tolerance of 0.2 Da. Further data filtration was performed using MScan with the following criteria: at least two peptide observations were required per protein; proteins that matched the same set of peptides were clustered into one protein group; and proteins identified by a subset of peptides from another protein were removed from the analysis.
MS/MS spectra of peptides meeting the above acceptance criteria were subjected to the quantitative analysis step to obtain a list of differentially expressed proteins, as indicated by the iTRAQ reporter tags. The list of differentially expressed proteins, with an estimate of the statistical significance of each protein ratio, was obtained using the in-house program Diffprot as described previously [2].
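The decoy-based filtering described above can be illustrated with a minimal sketch. It assumes a simple decoys/targets FDR estimator on a combined target-decoy result list and a "higher score is better" convention; the function name and the example scores are hypothetical, and this is not the MScan implementation, whose internals are not described here.

```python
from typing import List, Tuple


def target_decoy_qvalues(psms: List[Tuple[float, bool]]) -> List[float]:
    """Return a q-value for each PSM, in input order.

    psms: (score, is_decoy) pairs; a higher score means a better match.
    The FDR at each score threshold is estimated as decoys / targets
    (an assumed, commonly used estimator for mixed target-decoy searches).
    """
    # Rank PSMs from best to worst score.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)

    # Running FDR estimate when accepting everything down to each rank.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. the running minimum of the FDR taken from the worst rank upward.
    qvalues = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues


# Hypothetical scores, for illustration only.
example = [(50.0, False), (45.0, False), (40.0, True), (35.0, False), (30.0, False)]
q = target_decoy_qvalues(example)

# Keep target PSMs with q <= 0.01, mirroring the removal of q-values > 0.01.
accepted = [p for p, qv in zip(example, q) if qv <= 0.01 and not p[1]]
```

Taking the running minimum makes the q-value monotone in score, so a PSM is never rejected while a lower-scoring one is accepted.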

Verification by Parallel Reaction Monitoring (PRM)
Using total protein digests from each sample, a targeted proteomic method, parallel reaction monitoring (PRM), was employed to verify the differentially expressed proteins. Three to five unique peptides of each protein of interest were selected. To screen for the candidate proteins obtained in the iTRAQ experiment, a panel of 115 peptides from 49 proteins was chosen to verify putative prognostic proteins in 25 d-fVSCC and 26 progVSCC samples. Additionally, 14 controls, 5 HSIL and 25 prosVSCC samples were included in the PRM analysis. Plasma samples obtained from 68 patients were analyzed with PRM assays to accurately quantitate the 49 qualified proteins using stable-isotope standards (3-5 peptides per protein).

Selection of Peptides for PRM
The selection of peptides for PRM was performed according to the criteria described previously [3]. Briefly, tryptic peptides best suited for targeted analysis were selected using databases such as UniProt, PeptideAtlas, SwissProt-Expasy, NCBI BLASTp, and the NCBI SNP database, with the following criteria: the peptide sequence had to be unique to the targeted protein and frequently observed in spectral databases; the peptide length could not exceed 21 amino acids; residues that are easily chemically modified and sequences prone to modifications were avoided; peptides with missed cleavages or low digestion efficiency were excluded; peptides containing high-frequency single-nucleotide polymorphisms were excluded; and peptides containing reported post-translational modifications or known biological features affecting their accurate measurement were excluded.
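A few of the sequence-level criteria above can be checked automatically, as in the following minimal sketch. Only the 21-residue length limit and the exclusion of missed cleavages come from the text. The set of "easily modified" residues (M, C, W) is an assumption made for illustration, uniqueness is passed in as a precomputed flag, and the function names and example sequences are hypothetical. The database-dependent criteria (spectral observations, SNPs, reported modifications) are not covered.

```python
import re

MAX_LENGTH = 21  # upper length limit stated in the selection criteria

# Assumption: residues treated here as prone to chemical modification.
# The text does not list the residues that were actually avoided.
ASSUMED_LABILE_RESIDUES = set("MCW")


def has_missed_cleavage(peptide: str) -> bool:
    """True if an internal K or R is followed by a residue other than P.

    Trypsin cleaves C-terminal to K/R but generally not before proline,
    so an internal K/R followed by anything else marks a missed cleavage.
    """
    return re.search(r"[KR](?=[^P])", peptide) is not None


def is_prm_candidate(peptide: str, is_unique: bool) -> bool:
    """Apply the sequence-only filters to one tryptic peptide."""
    return (
        is_unique
        and len(peptide) <= MAX_LENGTH
        and not has_missed_cleavage(peptide)
        and not (set(peptide) & ASSUMED_LABILE_RESIDUES)
    )


# Hypothetical peptide sequences, for illustration only.
candidates = ["LVEALSPGQR", "AKLVEALR", "LMEALSPGQR"]
kept = [p for p in candidates if is_prm_candidate(p, is_unique=True)]
```

In practice such a filter would only pre-screen candidates; the remaining criteria require database lookups and manual review.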

Trypsin Digestion of Patient Tissue Lysates for Verification
From each cell lysate (obtained as described above in the discovery phase), 100 µg of protein was transferred into a microplate well already containing 1% NaDOC in 25 mM ammonium bicarbonate (AmmBic), to a final volume of 105 µL, followed by the addition of 8 µL of 3.75% NaDOC in 100 mM AmmBic. Disulfide bonds were reduced by the addition of 4 µL of 50 mM tris(2-carboxyethyl)phosphine (TCEP) in 100 mM AmmBic and incubation at 60 °C for 30 min. To block cysteine groups, 4 µL of 100 mM iodoacetamide (IAA) in 100 mM AmmBic was added and incubated at 37 °C for 30 min. Next, 4 µL of 100 mM dithiothreitol (DTT) in 100 mM AmmBic was added and incubated at 37 °C for 30 min to quench excess IAA. To digest the proteins, 10 µL (4 µg) of trypsin (protein:enzyme ratio of 25:1) was added and the plate was incubated for 16 h at 37 °C. Then, 50 µg of digest from each sample was transferred into a new microplate well, 40.3 µL of a stable-isotope standard (SIS) mixture (115 SIS peptides balanced and mixed together) was added, and NaDOC was precipitated by the addition of 42.2 µL of 1% FA. The precipitate was removed by centrifugation at 3000 × g for 5 min, and the supernatant was cleaned on an SPE plate (Waters, Oasis HLB, micro-elution plate) by adding it to 350 µL of 0.2% FA, followed by washing with 500 µL of LC-MS water and eluting with 100 µL of 55% ACN, 0.1% FA. Each sample was dried in a SpeedVac, dissolved in 50 µL of 0.1% FA and stored at −80 °C until PRM analysis.

Digestion of Plasma Samples
Five µL of plasma was transferred into 50 µL of 9 M urea. To reduce disulfide bonds, 9 µL of 50 mM TCEP in 100 mM AmmBic was added and incubated at 37 °C for 30 min with shaking. Next, to block cysteine groups, 5 µL of 100 mM IAA in 100 mM AmmBic was added and incubated with shaking at 37 °C for 10 min. To quench IAA, 2.5 µL of 100 mM DTT in 100 mM AmmBic was added and incubated with shaking at 37 °C for 30 min. After this time, 428.5 µL of 100 mM AmmBic was added to dilute the urea to 0.9 M, and 35 µL (14 µg) of trypsin in 100 mM AmmBic (protein:enzyme ratio of 25:1) was added and incubated at 37 °C for 16 h. SIS peptides were added to each sample, and the sample was desalted using SPE cartridges (Waters, Oasis HLB 1cc (10 mg)) as follows: the samples were diluted in 0.1% FA, loaded, washed with 1 mL of LC-MS water, and eluted with 200 µL of 55% ACN, 0.1% FA. Finally, the samples were dried using a SpeedVac and kept at −80 °C prior to PRM analysis.
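The urea dilution in the plasma protocol above can be checked with simple volume arithmetic, sketched below. The volumes and concentrations are taken from the text; the function and variable names are hypothetical, and the protein amount is only what the stated 25:1 ratio implies, not a measured value.

```python
from typing import Iterable


def diluted_molarity(stock_molar: float, stock_ul: float, added_ul: Iterable[float]) -> float:
    """Concentration of a stock after the other volumes are added.

    Assumes volumes are additive and that none of the added solutions
    contains the solute (here, urea).
    """
    total_ul = stock_ul + sum(added_ul)
    return stock_molar * stock_ul / total_ul


# Volumes (µL) added to the 50 µL of 9 M urea before trypsin, from the text:
# plasma, TCEP, IAA, DTT, AmmBic diluent.
additions_ul = [5.0, 9.0, 5.0, 2.5, 428.5]

# 9 M x 50 µL in a 500 µL total volume: evaluates to 0.9 M
# (to within floating-point rounding), matching the stated target.
urea_before_trypsin = diluted_molarity(9.0, 50.0, additions_ul)

# Protein amount implied by 14 µg of trypsin at a 25:1 protein:enzyme ratio.
implied_protein_ug = 14.0 * 25.0
```

Adding the 35 µL of trypsin solution afterwards lowers the urea concentration slightly further, which can be computed by appending 35.0 to `additions_ul`.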

PRM nanoLC-MS Analysis
The Q-Exactive nanoLC-MS system was used as for the iTRAQ analysis described above, except that the instrument was operated in PRM mode: multiplexed mode for natural and SIS peptides (MSX: 2), an isolation width of 1.0 Th, a resolution of 35k (tissue samples) or 70k (plasma), a 60/120 ms fill time, and an AGC of 1e6. Data were analyzed in Skyline. For each peptide, up to 10 fragment ions were selected from a spectral library generated on a pure mixture of SIS peptides; the fragments chosen were those that gave the highest signals and were free of signal interferences for both the endogenous and heavy (SIS) peptides. For the SID-PRM analysis, the quantity of the endogenous peptide is reported as the Peak Area Ratio (PAR), which is the sum of the peak areas of all transitions for the endogenous peptide divided by the sum of the peak areas of the transitions of its heavy standard. Adding equivalent amounts of standard peptides to the analyzed samples and calculating the PAR normalizes the relative abundance of natural peptides between samples with respect to MS signal fluctuations and differences in post-digestion sample processing. NanoLC-PRM-MS analysis was carried out with 5 µg of digested proteins (1 µg/µL in 0.1% FA), with three blank (0.1% FA) runs between samples. All PRM data were processed using Skyline Ver. 3 software with default values for peak integration. All integrated peaks were manually inspected to ensure correct peak detection and accurate integration. All peptides were targeted using 5 to 10 interference-free ion pairs per peptide.

Figure S1. Overlapping canonical pathways identified by core analysis in Ingenuity Pathway Analysis (IPA) that were significantly enriched in the dataset. Nodes correspond to canonical pathways, whereas links correspond to the molecules shared between two pathways (no line means no shared molecules between two pathways). The color intensity of nodes corresponds to the −log p-value (a more intense color indicates a larger −log p-value).
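The Peak Area Ratio (PAR) defined in the PRM nanoLC-MS analysis above can be written as a short function. This is a minimal sketch, not the Skyline calculation itself: the function name, the dictionary layout keyed by transition name, and the example peak areas are hypothetical, and it assumes the same interference-free transitions were integrated for the endogenous peptide and its heavy standard.

```python
from typing import Dict


def peak_area_ratio(endogenous: Dict[str, float], heavy: Dict[str, float]) -> float:
    """PAR = summed endogenous transition areas / summed heavy (SIS) areas.

    Both arguments map a transition name (e.g. "y5") to its integrated
    peak area and must cover the same set of transitions.
    """
    if set(endogenous) != set(heavy):
        raise ValueError("endogenous and heavy transitions must match")
    heavy_total = sum(heavy.values())
    if heavy_total <= 0:
        raise ValueError("heavy standard has no signal")
    return sum(endogenous.values()) / heavy_total


# Hypothetical peak areas for one peptide, for illustration only.
light_areas = {"y4": 1000.0, "y5": 2000.0, "y6": 3000.0}
heavy_areas = {"y4": 2000.0, "y5": 4000.0, "y6": 6000.0}
par = peak_area_ratio(light_areas, heavy_areas)  # 6000 / 12000
```

Because the same amount of each standard is spiked into every sample, PAR values for a given peptide can be compared across samples even when the absolute MS signal drifts between runs.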