Application of Proteogenomics to Urine Analysis towards the Identification of Novel Biomarkers of Prostate Cancer: An Exploratory Study

Simple Summary
Prostate cancer (PCa) is one of the most common cancers. Because current approaches for PCa diagnosis are limited and invasive, it is crucial to identify more accurate and non-invasive biomarkers for its detection. The aim of our study was to non-invasively uncover new protein targets for detecting PCa using a proteomics and proteogenomics approach. This work identified several dysregulated mutant protein isoforms in urine from PCa patients, some of them predicted to have a protective or an adverse role in these patients. These results are promising given that urine is collected non-invasively, and they offer an opportunity for research and development of PCa biomarkers.

Abstract
To identify new protein targets for PCa detection, first, a shotgun discovery experiment was performed to characterize the urinary proteome of PCa patients. This revealed 18 differentially abundant urinary proteins in PCa patients. Second, selected targets were clinically tested by immunoblot, and the soluble E-cadherin fragment was detected for the first time in the urine of PCa patients. Third, the proteogenome landscape of these PCa patients was characterized, revealing 1665 mutant protein isoforms. Statistical analysis revealed 6 differentially abundant mutant protein isoforms in PCa patients. Analysis of the likely effects of mutations on protein function and on protein-protein interactions (PPIs) involving the dysregulated mutant protein isoforms suggests a protective role of mutations HSPG2*Q1062H and VASN*R161Q and an adverse role of AMBP*A286G and CD55*S162L in PCa patients. This work characterized the urinary proteome with a focus on the proteogenome profile of PCa patients, which is usually overlooked in the analysis of PCa and body fluids. Combined analysis of mass spectrometry data using two different software packages was performed for the first time in the context of PCa, which increased the robustness of the data analysis. The application of proteogenomics to urine proteomic analysis can be very enriching in mutation-related diseases such as cancer.


Introduction
Prostate cancer (PCa) is one of the most prevalent cancers among men and the fifth leading cause of cancer-related death [1]. When detected at early stages, PCa can be treated. However, PCa diagnosis is challenging, largely due to the low specificity of PSA tests, particularly in the diagnostic window of 4-10 ng/mL [2], which underscores the need to identify new and more accurate biomarkers.
An ideal biomarker for PCa should be non-invasively assessed, inexpensive, highly sensitive, and specific [3]. For anatomical reasons, urine is enriched in prostatic secretions and better reflects the molecular changes associated with the prostate than blood, which contains markers and confounding factors from the whole body. Urine can be serially collected, requiring minimal processing steps, and presents a simpler matrix with more stability than blood [4].
The phenotypic role of proteins, combined with the variety of techniques available for proteome analysis, makes the search for protein markers in cancer a very attractive strategy [5]. Some promising single-protein biomarkers have been reported, such as AMBP [6] and zinc-alpha-2-glycoprotein (AZGP1) [7,8]. AMBP discriminated PCa and benign prostatic hyperplasia (BPH) patients with higher accuracy than that estimated for PSA [9], using 2D-DIGE MALDI-TOF/TOF and immunoturbidimetry as discovery and validation approaches, respectively. AZGP1 significantly improved the prediction of PCa in a cohort of candidates for a prostatic biopsy, using isobaric stable isotope labeling and 2D-LC-MS/MS as the discovery method and Western blot as the validation approach. Multi-marker panels have been shown to improve performance because they better reflect the complexity and heterogeneity of cancer, addressing the limitations of single biomarkers. Although promising, no urine protein panel is available for clinical practice, due partly to failure in clinical validation, reflecting the need to discover new biomarkers and/or new combinations of biomarkers [7,8]. Interestingly, and to the best of our knowledge, only one assay (Promark®) that quantifies a protein panel in prostate tissue by mass spectrometry (MS) is commercially available [10] and, to date, only four mRNA-based urine tests (PCA3 [11], SelectMDX [12], ExoDx Prostate(IntelliScore) [13], and MyProstateScore [14]) have been commercialized.
Cancer is driven by accumulated mutations and other genomic alterations [15]. Mutations in proteins can affect their structure, function, and stability, which may increase their susceptibility to degradation [16]. As in other types of cancer, a weak correlation between RNA and protein expression is observed in PCa. Therefore, the effect of mutations should also be directly investigated at the protein level [17]. To address this inference problem, genome and proteome data have been integrated (proteogenome analysis) to identify mutant protein isoforms. Integrated proteogenome analysis can provide new insights into PCa pathophysiology and unveil powerful clinically applicable biomarkers. A shotgun proteomics approach combined with a mutation database has been used to detect mutated peptides related to various types of cancer, such as breast [18], colon [19], and rectal cancer [20]. Still, in PCa, this approach is mostly unexplored. In 2018, Kwon et al. first applied a proteogenome approach to identify six mutated peptides in the conditioned media from human PCa cell lines related to androgen-independent PCa, which are specific markers for PCa and for metastasis sites [21]. More recently, the same team identified seventy mutant peptides in PCa cell lines, of which seven were differentially expressed in PCa compared to normal tissues [22].
To identify a panel of putative protein markers to be evaluated in a non-invasively collected body fluid for PCa screening, the urine proteome and proteogenome of PCa patients were characterized by an MS-based approach. The integration of results was used to select candidate targets for small-scale clinical testing. MS is widely used to discover urinary protein biomarkers for cancer, including PCa [23]. Usually, biomarker discovery relies on a shotgun proteomics approach, followed by a validation phase using antibody-based techniques or targeted MS. Considering the complex mixture of proteins in urine, separation methodologies are important to increase sensitivity. Thus, a combination of gel-based and gel-free methods, such as GeLC-MS/MS, appears to be a robust and reproducible method for proteome analysis [24], warranting its application in the present work.
This work aims to improve the diagnosis of PCa by investigating the effect of new mutations in proteins that can be detected in urine, a non-invasively collected fluid. Additionally, it overcomes the limitations of prior studies by using a combination of two software packages for MS data analysis, a proteogenome approach, and a detailed revision and integration of other exploratory proteome analyses to select protein targets.

Patients and Sample Collection
Urine samples were collected, without a prior prostate massage, from patients diagnosed with PCa at the Portuguese Oncology Institute of Porto (IPO Porto, Porto, Portugal), before surgery or therapy. Patients with other types of cancer, obesity, or autoimmune diseases were excluded, and cancer-free subjects had no clinically apparent prostatic disease. All available clinical data of the subjects enrolled in this study (discovery (d) and testing cohorts) are presented in Tables S1 and S2. The discovery cohort comprised five PCa patients and five cancer-free subjects (controls). The testing cohort comprised thirty PCa patients and thirty cancer-free subjects. Patients with benign prostate diseases, such as BPH, were not included because samples were unavailable.

Urine Sample Preparation
Urine samples were kept at 4 °C and centrifuged at 4000× g for 20 min at 4 °C. The supernatant (4.5 mL per sample) was collected and stored at −80 °C until laboratory analysis. Each urine sample was concentrated using a filter device (10 kDa cut-off, Vivaspin 500, Sartorius Biotech) by sequential centrifugations at 10,000× g for 10 min at 10 °C. Afterward, the retentate was resuspended in 0.5 M Tris pH 6.8 and 4% SDS, and protein concentration was assessed with the DC™ kit (Bio-Rad, Hercules, CA, USA).

Protein Identification and Quantification
The MaxQuant (version 1.6.5.0, Thermo software) and Proteome Discoverer (version 2.2, Thermo Fisher Scientific) software packages were used for peptide identification and label-free quantification. The Andromeda search engine in MaxQuant, and the MS Amanda and Sequest HT search engines in Proteome Discoverer, were used to search the MS/MS spectra against the UniProt (TrEMBL and Swiss-Prot) protein sequence database for Homo sapiens (version December 2018). The search parameters in both packages were as follows: methionine oxidation, protein N-terminal acetylation, and phosphorylation as variable modifications, and cysteine carbamidomethylation as a fixed modification. The precursor mass tolerance was 20 ppm for MaxQuant and 10 ppm for Proteome Discoverer, and the fragment ion mass tolerance was 0.15 Da (MaxQuant) and 0.02 Da (Proteome Discoverer). Minimal peptide length was set to 7 amino acids and, at most, 2 missed cleavages were allowed in both software packages. The false discovery rate (FDR) for identification was set to 1% at peptide and protein levels. Only the top-ranking protein of each group (master protein), identified with at least two peptides, was considered. Exclusion of contaminants relied on those identified by the MaxQuant software and on the cRAP protein sequences from The GPM (https://www.thegpm.org/crap/, accessed on 2 April 2019).
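As an illustration of what the precursor tolerances above mean in absolute terms, the following minimal Python sketch converts a relative tolerance in ppm into an absolute window. The function names and the example mass value of 1000 are assumptions chosen for illustration only; they are not part of the MaxQuant or Proteome Discoverer configuration.

```python
def ppm_to_absolute(mass: float, ppm: float) -> float:
    """Convert a relative tolerance (ppm) into an absolute window in the unit of `mass`."""
    return mass * ppm / 1e6


def within_ppm(observed: float, theoretical: float, ppm: float) -> bool:
    """Return True if the observed value lies within the ppm window around the theoretical one."""
    return abs(observed - theoretical) <= ppm_to_absolute(theoretical, ppm)


# Precursor tolerances quoted in the text; the mass value is purely illustrative.
example_mass = 1000.0
maxquant_window = ppm_to_absolute(example_mass, 20)  # 20 ppm setting
discoverer_window = ppm_to_absolute(example_mass, 10)  # 10 ppm setting

# A hypothetical precursor measured 0.015 units away from its theoretical value
# passes the 20 ppm filter but not the 10 ppm filter.
passes_20ppm = within_ppm(1000.015, example_mass, 20)
passes_10ppm = within_ppm(1000.015, example_mass, 10)
```

Because the tolerance is relative, the absolute window grows with precursor mass, which is one reason the two packages can accept slightly different sets of peptide-spectrum matches.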
The MS proteome data have been deposited on the ProteomeXchange Consortium via the PRIDE [26] partner repository with the data set identifier PXD017902.

Exploratory Analysis of Urine Proteome Data
The protein abundances in Proteome Discoverer (normalized to the respective median) and the normalized LFQ intensities in MaxQuant were log2-transformed. In an exploratory analysis of proteome data, the proteins identified in all individuals were used as variables to perform Principal Component Analysis (PCA) and heatmap analyses. These analyses were performed in MetaboAnalyst 5.0 [27]. To identify dysregulated proteins in PCa patients, the fold change in protein abundance between PCa patients and cancer-free subjects was then calculated from the average log2 difference of protein intensities. The statistical significance of this difference was assessed with Student's t-test.
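The fold-change and t-test computation described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' actual pipeline. The intensity values are invented, and the pooled-variance (equal variance) form of Student's t-test is an assumption, since the text does not state which variant was used.

```python
import math
from statistics import mean, variance


def log2_fold_change(case, control):
    """Average log2 difference between two groups of raw (positive) intensities."""
    return mean(math.log2(x) for x in case) - mean(math.log2(x) for x in control)


def student_t(case, control):
    """Two-sample Student's t statistic (pooled variance) on log2-transformed values.

    Returns the t statistic and its degrees of freedom; the p-value would be
    read from the t distribution with that many degrees of freedom.
    """
    a = [math.log2(x) for x in case]
    b = [math.log2(x) for x in control]
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical intensities for one protein in five patients and five controls.
pca_intensities = [8.0, 16.0, 32.0, 16.0, 8.0]
control_intensities = [4.0, 8.0, 4.0, 8.0, 4.0]

lfc = log2_fold_change(pca_intensities, control_intensities)
fold_change = 2 ** lfc  # >1 means higher in patients, <1 means lower
t_stat, dof = student_t(pca_intensities, control_intensities)
```

Working on the log2 scale makes up- and downregulation symmetric around zero, so a fold change below 1 corresponds to a negative log2 difference.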

Comparison with a Previous Bioinformatic Analysis of Putative Urinary Markers of PCa and Selection of Candidate Protein Targets for the Testing Phase
Dysregulated proteins were compared with the results of a bioinformatic analysis focused on comparing and mining the proteome profile of tumor prostate tissue and urine from PCa patients reported by several MS studies [28]. The bioinformatic analysis reported 2641 and 616 dysregulated proteins in tumor prostate tissue and urine from PCa patients, respectively. To place urine proteome as a reflection of events taking place in prostate tissue and to identify specific urinary protein targets for PCa, the dysregulated proteins identified in tumor prostate tissue and urine from PCa patients were compared, resulting in 339 overlapping proteins. In this sense, the dysregulated proteins identified by MS in the present work, common to the 2641 dysregulated proteins expressed in tumor prostate tissue or to the 339 urinary proteins with prostate expression, correspond to the selection criteria of candidate proteins to be tested. Then, the selected proteins were compared with the normal human urinary proteome [29].
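The selection criterion above is a set-membership test, which can be sketched as below. The protein identifiers are placeholders, not data from this study, and the function name is hypothetical.

```python
def select_candidates(dysregulated, tissue_proteins, urine_with_prostate_expression):
    """Keep dysregulated proteins found in either reference list.

    `tissue_proteins` stands for the dysregulated tumor-tissue proteins and
    `urine_with_prostate_expression` for the urinary proteins that overlap with
    prostate tissue, as described in the text.
    """
    reference = set(tissue_proteins) | set(urine_with_prostate_expression)
    return sorted(p for p in set(dysregulated) if p in reference)


# Placeholder identifiers for illustration only.
dysregulated_in_this_work = {"PROT_A", "PROT_B", "PROT_C", "PROT_D"}
tissue_reference = {"PROT_A", "PROT_C", "PROT_X"}
urine_prostate_reference = {"PROT_C", "PROT_Y"}

candidates = select_candidates(
    dysregulated_in_this_work, tissue_reference, urine_prostate_reference
)
```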

Measurement of Urinary PSA Levels
Urinary PSA levels were determined using the same method (Elecsys total PSA, 08791732500) used to determine serum PSA levels. This electrochemiluminescence assay is used in the clinical routine of IPO Porto. It quantifies total PSA (free + complexed PSA) using a Cobas e 801 module, a member of Roche Cobas 8000 Modular Analyzer (Roche, Woerden, The Netherlands).

Identification of Cancer-Associated Mutations
Considering the high impact of mutations on cancer progression, the proteogenome profile of urine from PCa patients was explored. For this, mass spectra resulting from the MS analysis were searched against a database built into the Pinnacle software (https://rimuhc.ca/-/protein-quantification-software-pinnacle?redirect=%2Fproteomics-software, accessed on 5 January 2022). This type of analysis aimed to investigate the existence of cancer-associated mutations that were translated in proteins present in the urine from PCa patients. To select high-confidence urinary proteins with a very likely origin in the prostate, only mutations on proteins present in all samples and with known prostate expression were considered. The prostate proteome was searched in the HPA database and in the above-mentioned bioinformatic analysis [28]. The prostate proteome in the HPA consisted of proteins with evidence at the protein level and its last access was on 8 November 2021.

Exploratory Analysis of Urine Proteogenome Data
The abundances of proteins with known prostate expression in Pinnacle were log2-transformed. In an exploratory analysis of proteogenome data, the levels of mutant protein isoforms identified in all individuals were used as variables to perform Principal Component Analysis (PCA) and heatmap analyses. These analyses were performed in MetaboAnalyst 5.0 [27]. To identify dysregulated proteins with mutations in PCa patients, the fold change in protein abundance between PCa patients and cancer-free subjects was then calculated from the average log2 difference of protein intensities. The statistical significance of this difference was assessed with Student's t-test.

Integration with the Cancer Genome Atlas (TCGA), DisGeNET and Literature Data
To investigate whether mutations identified in proteins with known prostate expression were already described in PCa, TCGA, DisGeNET (v7.0), and literature data were searched.
TCGA is a cancer genomics consortium that generates data (https://www.cancer.gov/tcga, accessed on 12 January 2022) encompassing the profiling of over 20,000 primary tumors and matched non-tumoral samples related to various human cancers, including PCa. The characterization of PCa samples disclosed 20,237 mutated genes and 33,334 mutations. DisGeNET is one of the largest repositories of Gene-Disease (GDA) and Variant-Disease (VDA) Associations [31]. The latest version of DisGeNET contains 1,134,942 GDAs and 369,554 VDAs. In the present work, variants associated with PCa were extracted from the Prostate Carcinoma C0600139 (January 2022).

Comparison of the Levels of Native and Mutant Forms of Proteins in the Urine from PCa Patients
To investigate the influence of mutations on the abundance of proteins with known expression in the prostate, the levels of their native and mutant forms were compared.

Prediction of the Likely Impact of Single-Residue Substitutions in Proteins
The PolyPhen-2 (Polymorphism Phenotyping v2) web tool was used to predict the likely impact of each amino acid substitution on the structure and function of the proteins with known prostate expression [32]. Each mutation is assigned a score, which is the probability of the substitution being damaging, in addition to a sensitivity and specificity value of the prediction confidence. According to the PolyPhen-2 tool, single-residue substitutions in the protein sequence can be classified as benign (score: 0-0.4), possibly damaging (score: 0.4-0.9), or probably damaging (score: 0.9-1) [33].
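The score ranges quoted above translate into a simple classification rule, sketched below. The cut-offs are those stated in this text; how a score falling exactly on a boundary (0.4 or 0.9) is assigned is an assumption of this sketch, and the function name is hypothetical rather than part of the PolyPhen-2 tool.

```python
def classify_polyphen_score(score: float) -> str:
    """Map a PolyPhen-2 probability score to the categories used in this work.

    Boundary handling (0.4 -> possibly damaging, 0.9 -> probably damaging)
    is an assumption; the text only gives the ranges.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("PolyPhen-2 scores are probabilities between 0 and 1")
    if score < 0.4:
        return "benign"
    if score < 0.9:
        return "possibly damaging"
    return "probably damaging"


# Illustrative scores only; they are not results from this study.
labels = [classify_polyphen_score(s) for s in (0.05, 0.55, 0.98)]
```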

Protein-Protein Interaction Analysis
Due to the pivotal role of protein-protein interactions (PPIs) in cancer and the possible effect of mutations on their dynamics, the interactions between proteins in which point mutations have been identified were explored. For this, the STRING database v11.5 was accessed on 12 January 2022, and only protein interactions with a confidence score of ≥0.4 were considered [34]. However, we must be cautious when extrapolating the significance of these PPIs to biological fluids such as urine, as most PPIs are identified or predicted from studies in cells and tissues.
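The confidence filter described above amounts to keeping only edges whose score reaches the threshold. The sketch below assumes interactions are available as (protein A, protein B, score) tuples with scores on a 0 to 1 scale; the protein names and scores are placeholders, and this is not a call to the STRING API.

```python
def filter_interactions(edges, min_score=0.4):
    """Keep interactions whose confidence score is at least `min_score`."""
    return [(a, b, score) for a, b, score in edges if score >= min_score]


def connected_proteins(edges):
    """Return the set of proteins that take part in at least one retained interaction."""
    return {protein for a, b, _ in edges for protein in (a, b)}


# Placeholder network for illustration only.
raw_edges = [
    ("PROT_A", "PROT_B", 0.92),
    ("PROT_A", "PROT_C", 0.40),
    ("PROT_C", "PROT_D", 0.15),
]

kept_edges = filter_interactions(raw_edges)
kept_nodes = connected_proteins(kept_edges)
```

In this toy network, PROT_D drops out because its only interaction falls below the threshold, which mirrors how proteins without qualifying interactions end up unconnected in the network.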

Prediction of the Likely Impact of Single-Residue Substitutions in Protein-Protein Affinity
Considering the impact of mutations on PPIs, the SAAMBE-SEQ Web Server was used to predict the effect of point mutations detected in this work on protein binding affinity [35].

Statistical Data Analysis
Statistical analyses were carried out in R software for Windows version 3.6.2 and GraphPad Prism version 6.0 (GraphPad Software, Inc.; San Diego, CA, USA). The Shapiro-Wilk normality test and visual inspection of the histograms were used to assess the data distribution. To evaluate the effect size of the dysregulated proteins when comparing the tested groups, Cohen's d was determined. Differences were considered statistically significant if the p-value was ≤0.05. The clinical parameters and protein levels are expressed as mean ± standard deviation (SD).
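For reference, Cohen's d with a pooled standard deviation can be computed as in the sketch below. The pooled-SD form and the 0.2 and 0.5 benchmarks are Cohen's conventional definitions rather than details stated in this work, which only uses the 0.8 cut-off for a "large" effect; the input values are invented.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def effect_size_label(d: float) -> str:
    """Label |d| using the conventional 0.2 / 0.5 / 0.8 benchmarks."""
    magnitude = abs(d)
    if magnitude > 0.8:
        return "large"
    if magnitude > 0.5:
        return "medium"
    if magnitude > 0.2:
        return "small"
    return "negligible"


# Hypothetical log2 protein levels in patients and controls.
patients_log2 = [3.0, 4.0, 5.0, 4.0, 3.0]
controls_log2 = [2.0, 3.0, 2.0, 3.0, 2.0]

d_value = cohens_d(patients_log2, controls_log2)
d_label = effect_size_label(d_value)
```

The sign of d follows the direction of the difference, so the absolute value is used when judging magnitude.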

Urine Proteome Profile of PCa Patients and Cancer-Free Subjects
To identify potential protein targets for PCa prediction, shotgun proteomics was performed in urine collected from PCa patients and cancer-free subjects. To strengthen the MS data analysis, a combination of two different software packages, MaxQuant and Proteome Discoverer, using three search engines in total (Andromeda, MS Amanda, and Sequest HT), was used.
Considering only the top-ranking protein of each group identified with at least two peptides and filtering out identifications from reversed sequences and contaminants, 605 and 592 urinary proteins were identified by MaxQuant and Proteome Discoverer, respectively. In total, 732 distinct proteins were identified, counting proteins common to both software packages only once.
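If the total of 732 is read as the number of distinct proteins across both packages, which is an interpretation of the sentence above rather than a figure reported in the text, the overlap follows from inclusion-exclusion. The short sketch below works through that arithmetic; the variable names are illustrative.

```python
# Counts reported in the text.
maxquant_count = 605
proteome_discoverer_count = 592
distinct_total = 732

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
shared = maxquant_count + proteome_discoverer_count - distinct_total
only_maxquant = maxquant_count - shared
only_proteome_discoverer = proteome_discoverer_count - shared

# Consistency check: the three disjoint parts must add up to the total.
parts_sum = shared + only_maxquant + only_proteome_discoverer
```

Under this reading, 465 proteins would be shared by both packages, with 140 found only by MaxQuant and 127 only by Proteome Discoverer.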

Exploratory Analysis of Urine Proteome Data
To select proteins of interest for PCa monitoring, only proteins present in all samples analyzed by MaxQuant (82 proteins) and by Proteome Discoverer (84 proteins) were considered for further analysis. These high-confidence proteins were separately used for Principal Component Analysis (PCA) (Figures 1A and 2A) and heatmap analyses (Figures 1B and 2B). With both software packages, no separation of groups was observed in the PCA. However, the proteins identified by MaxQuant alone seem to discriminate between PCa patients and non-cancer subjects based on two protein clusters depicted in the heatmap: AZGP1 (zinc-alpha-2-glycoprotein)-SPP1 (osteopontin), and CD14 (monocyte differentiation antigen CD14)-MASP2 (mannan-binding lectin serine protease 2) (Figure 1B). In the first cluster, proteins are mostly upregulated in PCa patients compared to non-cancer subjects, while in the second cluster proteins are predominantly downregulated in PCa patients.
Then, differential protein analysis revealed 18 dysregulated proteins in PCa (p-value ≤ 0.05), with 4 proteins identified only by Proteome Discoverer, 9 proteins only by MaxQuant, and 5 proteins (Cadherin-1 (CDH1), EGF-containing fibulin-like extracellular matrix protein 1 (EFEMP1), Prostate-specific antigen (PSA) (KLK3), Secreted and transmembrane protein 1 (SECTM1), and Transthyretin (TTR)) identified by both software packages. Altogether, 11 proteins were significantly downregulated (fold change less than 1), and 7 proteins were significantly upregulated (fold change greater than 1) in PCa patients (Tables 1 and 2). Reassuringly, the most widely used biomarker for PCa diagnosis, PSA, was one of the dysregulated proteins found by both software packages. When the tested groups were compared, proteins showing significant differences (p-value ≤ 0.05) also revealed a "large" effect size (|Cohen's d| > 0.8) (Tables 1 and 2). Besides a large effect size, dysregulated proteins identified by both software packages presented a consistent direction of dysregulation. It is noteworthy that in the heatmap of MaxQuant data, eleven proteins (TTR, KLK3, SECTM1, CDH13, AMY2A, EFEMP1, ITIH4, HSPG2, PTGDS, CDH1, and LMAN2) responsible for the separation of groups were also found dysregulated in PCa patients. Decreased levels of SECTM1, CDH13, AMY2A, EFEMP1, ITIH4, HSPG2, PTGDS, CDH1, and LMAN2 and increased levels of TTR and KLK3 characterized the urine proteome of PCa patients.
To select the most promising proteins for further analysis, dysregulated proteins revealed by MS analysis were compared with proteins resulting from a bioinformatic analysis integrating urine and tumor tissue proteomes of PCa from several MS studies [28]. From this comparison, some common proteins emerged, such as AMBP, CDH1, EFEMP1, KLK3, SECTM1, LMAN2, and TTR.
In the previous study from our group, AMBP, KLK3, LMAN2, and TTR were found dysregulated in both urine and tumor tissue from PCa patients, while SECTM1 was found dysregulated only in urine from PCa patients, and CDH1 and EFEMP1 only in PCa tissue.
Taken together, and keeping in mind that candidate targets should be urinary proteins with prostate expression, AMBP, CDH1, EFEMP1, KLK3, LMAN2, and TTR were selected for testing in an independent cohort. The presence of these proteins in the urine was already expected, because they are characteristic of the normal human urine proteome [29].

Measurement of Candidate Protein Targets in Urine
Five protein targets (AMBP, CDH1, EFEMP1, LMAN2, and TTR) were selected for immunoblot-based testing in a larger and independent cohort (testing group). However, none of the MS findings could be reproduced (Table S3, Figure S1). Measurement of urinary PSA levels in the testing cohort also did not agree with the MS findings (p = 0.29, Mann-Whitney test). The results are shown in Figure 3.

Identification of Cancer-Associated Mutations
To characterize the proteogenome landscape of urine from PCa patients, MS/MS spectra were searched against a repository of information from a wide variety of databases encompassing somatic mutations. This search resulted in identifying 6418 mutated peptides corresponding to 1665 mutant protein isoforms. Of these, 609 mutated peptides, which correspond to 417 mutant protein isoforms, were associated with cancer. Only mutant protein isoforms that occurred in all urine samples (322 proteins) were selected for further analysis. Immunoglobulins and highly abundant urinary proteins (serum albumin, uromodulin, serotransferrin) were excluded due to their high abundance in biological samples and the lack of specificity for cancer, resulting in 170 proteins. These 170 proteins corresponded to 122 proteins after filtering out duplicates. As our focus was high confidence proteins with mutations whose origin was very likely the prostate, these data were integrated with the prostate proteome searched in the HPA database and in a bioinformatic analysis [28], resulting in 86 proteins with known expression in the prostate (Table S4). Among these proteins are some of known relevance for PCa, namely Acid ceramidase (ASAH1), Extracellular superoxide dismutase [Cu-Zn] (SOD3), Glutathione S-transferase P (GSTP1), Osteopontin (SPP1), Prostatic acid phosphatase (PAP), and Zinc-alpha-2-glycoprotein (ZAG).

Exploratory Analysis of Urine Proteogenome Data
The levels of the mutant protein isoforms were used for PCA (Principal Component Analysis) (Figure 4A) and heatmap analyses (Figure 4B). No group separation was observed in the PCA of the proteogenome profile of PCa patients. However, the heatmap indicates a discrimination between PCa patients and non-cancer subjects based on two protein clusters: ITIH4*G893S (Inter-alpha-trypsin inhibitor heavy chain H4)-LMAN2*D222N (Vesicular integral-membrane protein VIP36), and KLK3*C209Y (PSA)-MVB12B*T198M (Multivesicular body subunit 12B) (Figure 4B). In the first cluster, mutant forms of proteins are mostly downregulated in PCa patients compared to non-cancer subjects, while in the second cluster mutant forms of proteins are predominantly upregulated in PCa patients.

Integration with the Cancer Genome Atlas (TCGA), DisGeNET and Literature Data
According to TCGA, DisGeNET, and the literature, only three of the mutations identified in the 86 proteins with known prostate expression have already been described. These mutations (rs17632542, rs1695, rs7041) were mapped on KLK3 (PSA) [36], GSTP1 (Glutathione S-transferase P) [37,38], and GC (Vitamin D-binding protein) [39], respectively. To the best of our knowledge, there is no association of the remaining mutant protein isoforms with PCa. Especially notable are the proteins SPP1, VASN, ASAH1, RBP4, and ASS1, which, until now, have had no mutation related to PCa described in the literature.
Comparing the proteome profile analysis of MaxQuant and Proteome Discoverer with the proteogenome profile of PCa patients resulted in 30 and 31 common proteins, respectively. Of these common proteins, AMBP, CDH1, EFEMP1, HSPG2, ITIH4, KLK3, LMAN2, PTGDS, VASN, and CD55 stood out. The native forms of AMBP, CDH1, EFEMP1, HSPG2, ITIH4, KLK3, LMAN2, and PTGDS were found dysregulated in urine from PCa patients, but among their mutant protein isoforms only AMBP*A286G and HSPG2*Q1062H were also found dysregulated (Figure S2). In the remaining common proteins, the presence of mutations did not affect their abundance in urine. The native forms of VASN and CD55 were not found dysregulated in the urine from PCa patients, but their mutant protein isoforms (VASN*R161Q; CD55*S162L) were.
The mutations identified in these proteins and in those with recognized relevance to PCa are summarized in Table 3.   [61]. Dysregulation of the LMAN2 gene has been indicated in some cancers [62][63][64], while the role in PCa remains obscure. However, raised LMAN2 urinary levels were detected in PCa patients [44].

P41222
Prostaglandin-H2 D-isomerase PTGDS L130M missense PTGDS is involved in prostaglandins metabolism and lipid transport. The PTGDS gene is downregulated in malignant prostate tissues compared to non-malignant tissues and integrates a signature that predicts relapse after prostatectomy. In vitro, its overexpression increased death and suppressed the growth of PCa cells [65,66].

Q13510
Acid ceramidase ASAH1 V246A missense ASAH1 hydrolyzes ceramide to sphingosine and fatty acid [67] and its protein levels are elevated in tumor prostate tissue [68]. Its increased levels have been suggested as a therapeutic target in PCa as they have been correlated with metastasis establishment and resistance to chemotherapy [69,70].
A58T missense SOD3 is a known tumor suppressor gene in PCa. It is an antioxidant enzyme that catalyzes the dismutation of the superoxide radical anion [71]. SOD3-reduced levels were reported in PCa patients, and its overexpression in PCa cells prevented cell proliferation, migration, and invasion, suggesting a role as a therapeutic target and predictive marker [72,73].

P09211
Glutathione S-transferase P GSTP1 I105V missense GSTP1 is a known tumor suppressor gene in PCa and is responsible for cellular detoxification through glutathione conjugation [74]. PCa is characterized by loss of GSTP1 function, mostly due to hypermethylation of its regulatory CpG island [75], and it is purported to occur early in prostatic carcinogenesis [76,77]. SPP1 is a bone matrix protein involved in bone remodeling, modulation of inflammation, cell adhesion, and migration and angiogenesis [78]. In PCa, SPP1 is associated with metastasis and proliferation [79], lower overall survival and biochemical relapse-free survival, and high GS [80]. Higher SPP1 levels were reported in PCa patients [80][81][82].

P15309
Prostatic acid phosphatase PAP G68D missense PAP is one of the main secreted proteins by the prostate cells and was the first serum screening marker for PCa. PAP was latter replaced by PSA [83,84].

P25311
Zinc-alpha-2glycoprotein ZAG P187L; A46T missense ZAG promotes adipocyte lipolysis, resulting in cancer cachexia [85]. Elevated levels of this protein have been proposed as a serum marker for PCa [86,87], and a significant predictive ability was found for urinary ZAG [8].

Q4ZJI4
Sodium/hydrogen exchanger 9B1 SLC9B1 N70S missense SLC9B1 is a Na + /H + transporter responsible for preserving cellular homeostasis [88], but this transporter has not yet been correlated with any type of cancer.

Q9P2J8
Zinc finger protein 624 ZNF624 S207F missense ZNF624 has not been well studied yet, but in breast cancer it was one of the target genes of a microRNA found to be significantly and independently correlated with patient prognosis [89].
This table shows the UniProt IDs, protein and gene names, mutation site/description and type, and the role of proteins in PCa.

Prediction of the Likely Impact of Single-Residue Substitutions in Proteins
To determine the potential impact of point mutations on protein function, the PolyPhen-2 tool was used. Most point mutations were predicted to be possibly or probably damaging. Notably, the AMBP*A286G and CD55*S162L mutant protein isoforms were predicted to be probably damaging, while SLC9B1*N70S, ZNF624*S207F, VASN*R161Q, and HSPG2*Q1062H were predicted to be benign. The results are presented in Tables 4 and S5.
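Substitution-level predictors generally need the protein, the residue position, and the reference and substituted amino acids as separate fields. As an illustration only, the following minimal Python sketch splits the isoform labels used in this work (gene*RefPosAlt, e.g., AMBP*A286G) into those fields. The `Substitution` type and `parse_label` function are hypothetical helpers introduced here, not part of PolyPhen-2 or of the pipeline used in this study.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """One single-residue substitution (hypothetical helper type)."""
    gene: str      # gene symbol, e.g. "AMBP"
    ref: str       # reference amino acid (one-letter code)
    position: int  # 1-based residue position
    alt: str       # substituted amino acid (one-letter code)


# Assumed label layout: GENE*<ref><position><alt>, e.g. "AMBP*A286G".
_LABEL = re.compile(r"^([A-Z0-9]+)\*([A-Z])(\d+)([A-Z])$")


def parse_label(label: str) -> Substitution:
    """Split a label such as 'AMBP*A286G' into its four fields."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a gene*RefPosAlt label: {label!r}")
    gene, ref, position, alt = match.groups()
    return Substitution(gene, ref, int(position), alt)


# The six differentially abundant mutant isoforms named in the text.
labels = [
    "AMBP*A286G", "SLC9B1*N70S", "HSPG2*Q1062H",
    "ZNF624*S207F", "VASN*R161Q", "CD55*S162L",
]
parsed = [parse_label(label) for label in labels]
```

Each parsed record can then be paired with the corresponding UniProt accession before submission to a prediction tool.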

Protein-Protein Interaction Analysis
In addition to impacting the function of proteins, mutations can also affect interactions between proteins and, consequently, important biological processes and signaling pathways. To predict interactions between the proteins in which point mutations were identified, the STRING search tool was used. As shown in Figure 5, the network consisted of 86 proteins (nodes) connected through 214 edges with different confidence levels. The protein-protein interaction enrichment p-value was <1.0 × 10^−16. Reactome enrichment analysis showed 12 pathways enriched in this network (Table S6). Regulation of Insulin-like Growth Factor (IGF) transport and uptake by Insulin-like Growth Factor Binding Proteins (IGFBPs) was the third most important pathway in this network, while Extracellular matrix (ECM) organization was the tenth. This network shows predicted interactions between most of the proteins.
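To put the reported network size in perspective, two standard graph summaries can be derived from the node and edge counts alone. The sketch below assumes the network is treated as a simple undirected graph; the function names are illustrative and these values are not reported in the original analysis.

```python
def average_degree(n_nodes: int, n_edges: int) -> float:
    """Mean number of edges per node in an undirected graph (2E / N)."""
    return 2 * n_edges / n_nodes


def density(n_nodes: int, n_edges: int) -> float:
    """Fraction of all possible node pairs that are linked (2E / (N(N-1)))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Counts reported for the STRING network (86 nodes, 214 edges).
N_NODES = 86
N_EDGES = 214

avg = average_degree(N_NODES, N_EDGES)
dens = density(N_NODES, N_EDGES)
print(f"average degree: {avg:.2f}, density: {dens:.3f}")
```

Under this assumption, each protein has on average about five predicted interaction partners, and roughly 6% of all possible protein pairs are linked.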

Prediction of the Likely Impact of Single-Residue Substitutions in Protein-Protein Affinity
To predict the impact of point mutations on PPIs, the SAAMBE-SEQ tool was used. The likely effect of the AMBP*A286G, HSPG2*Q1062H, VASN*R161Q, and CD55*S162L point mutations on protein-protein interactions was examined. Point mutations detected on SLC9B1 and ZNF624 were not examined, as these proteins do not interact with any proteins in the network. The impact of point mutations on proteins involved in the IGF pathway was also explored. This analysis revealed that the likely effect of these point mutations is destabilizing for PPIs (Table S7).

Discussion
The limitations and the invasive nature of serum PCa screening have driven the discovery of new candidate urinary biomarkers, especially protein markers. However, so far, none has translated into a clinically useful tool, reflecting the need to discover novel biomarkers and/or new combinations of biomarkers. Thus, this study aimed to take advantage of a non-invasively collected biofluid, urine, and a high-throughput approach, proteomics, to identify new protein targets for predicting the risk of developing PCa. This work was divided into three stages: characterization of the urine proteome profile and selection of protein targets; testing of shortlisted protein targets in a larger, independent cohort; and characterization of the urine proteogenome profile. The urine proteome profiles of PCa and cancer-free subjects were analyzed with two software packages, and 18 dysregulated proteins were found, 5 of which (TTR, EFEMP1, CDH1, SECTM1, KLK3) were common to both packages. The integration of the urine proteome profile of PCa patients with proteome data from other studies reviewed by us [28] supported the selection of potential discriminatory protein targets. As a result, AMBP, CDH1, EFEMP1, LMAN2, and TTR stood out as potential targets and were tested in an independent cohort of patients. In this testing phase, incubation with anti-E-cadherin did not result in a band around 120 kDa (full-length protein), but rather in a band at about 80 kDa. This 80 kDa fragment corresponds to soluble E-cadherin (sE-cadherin), which has been previously identified in tissue and serum from PCa patients [93,94] and in urine from patients with other cancers [95,96], using antibody-based techniques. To our knowledge, this is the first report of the sE-cadherin fragment being detected in the urine of PCa patients. Kuefer et al. [93] suggested that the 80 kDa fragment originates from the extracellular domain of full-length E-cadherin.
Increased levels of sE-cadherin have been reported in serum and tumor prostate tissue from PCa patients and are correlated with disease stage [94,97,98]. Differential abundances of these MS-detected proteins were tested in an independent cohort using immunoblot, but the variations observed by immunoblot differed from those found by MS. Additionally, urinary PSA levels were assessed in this independent cohort, but did not distinguish PCa patients from controls, which agrees with other studies [99].
The proteogenome landscape of urine from PCa patients was then characterized and 1665 mutant protein isoforms were identified, of which 417 were cancer-related mutations. After considering only mutations present in all urine samples and proteins with known prostate expression, 86 mutant protein isoforms emerged. Among these proteins are some of known relevance for PCa, namely Acid ceramidase (ASAH1), Extracellular superoxide dismutase [Cu-Zn] (SOD3), Glutathione S-transferase P (GSTP1), Osteopontin (SPP1), Prostatic Acid Phosphatase (PAP), and Zinc-Alpha-2-Glycoprotein (ZAG). PAP is gaining renewed interest due to its superior predictive role of cause-specific survival and GS compared to serum PSA in men with high-risk PCa [100,101]. Remarkably, it was recently suggested that a form of PAP (PLPAcP) is associated with early PCa [102]. Identifying a new mutation in this protein in a non-invasive biological fluid, together with the prediction that the PAP mutation is probably damaging, strengthens the renewed interest in its study in PCa. Mutations found on the 86 proteins were searched for in databases and the literature and, to the best of our knowledge, only the rs17632542 [36,103–105], rs1695 [37,38,106,107], and rs7041 [39] mutations, mapped on the PSA, GSTP1, and GC proteins, respectively, have been described in the PCa context. The recovery of these known variants supports the proteogenome analysis performed in the present study.
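The two-step shortlisting described above (mutations present in all urine samples, in proteins with known prostate expression) can be sketched as a simple set-based filter. This is a minimal illustration under stated assumptions, not the procedure actually used: the function name, the sample identifiers, and the gene labels below are all hypothetical toy values.

```python
from typing import Dict, List, Set


def shortlist_isoforms(
    detections: Dict[str, Set[str]],
    all_samples: Set[str],
    prostate_genes: Set[str],
) -> List[str]:
    """Keep isoform labels detected in every sample whose gene is prostate-expressed.

    detections maps a label such as 'GENEA*A10T' to the set of sample IDs
    in which that mutant isoform was identified.
    """
    kept = []
    for label, samples in detections.items():
        gene = label.split("*", 1)[0]
        in_all_samples = samples >= all_samples  # superset test
        if in_all_samples and gene in prostate_genes:
            kept.append(label)
    return sorted(kept)


# Toy input: three hypothetical urine samples and three hypothetical isoforms.
all_samples = {"U1", "U2", "U3"}
detections = {
    "GENEA*A10T": {"U1", "U2", "U3"},  # in all samples, prostate-expressed
    "GENEB*R20Q": {"U1", "U3"},        # missing from one sample
    "GENEC*S30L": {"U1", "U2", "U3"},  # in all samples, not prostate-expressed
}
prostate_genes = {"GENEA", "GENEB"}

shortlisted = shortlist_isoforms(detections, all_samples, prostate_genes)
```

In this toy example only the first isoform passes both criteria.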
The analysis of the urine proteogenome profile of PCa patients revealed 6 differentially abundant mutant protein isoforms, namely AMBP*A286G, SLC9B1*N70S, HSPG2*Q1062H, ZNF624*S207F, VASN*R161Q, and CD55*S162L. From the comparison of the proteome and proteogenome profile of PCa patients, AMBP, CDH1, EFEMP1, KLK3, and LMAN2 proteins stood out. Their native form was found dysregulated in urine from PCa patients, but the same was not observed with their mutant form, with the exception of AMBP*A286G and HSPG2*Q1062H. These results may explain the differences between MS and immunoblot data, because the antibodies either do not recognize the mutated peptides or do not specifically recognize them.
PPIs play a pivotal role in most biological processes. Dysregulation of these protein interactions may result in pathological conditions, such as cancer, being involved in tumor progression, invasion, and metastasis [108,109]. In this sense, PPIs have been claimed as promising therapeutic targets for numerous types of cancer, including PCa. For this type of cancer, 28 small molecules and 14 peptides have been proposed to disrupt PPIs with relevance to PCa progression [110]. To explore PPIs between proteins with known prostate expression and the pathways in which these interactions are involved, the STRING tool was used. In this analysis, regulation of IGF transport and uptake by IGFBPs proved to be the third most important pathway in the network. The IGF axis is a network of ligands (IGF1, IGF2, insulin), IGFBPs, and receptors (IGF1R, IGF2R, INSR), the latter being responsible for mediating the activity of IGFs [111]. IGFs are oncogenic regulators, promoting prostate tumor growth, survival, and proliferation, and the role of the IGF axis has been well documented in PCa. For instance, IGFBP-2 enhanced proliferation of androgen-independent prostate cancer cells [112] and IGF-I levels were found raised in serum and prostate tissue from PCa patients, being a predictor of risk for this type of cancer [113,114]. In accordance with this, IGF1R and INSR act as oncogenes in PCa, enhancing tumor growth, proliferation, invasion, and angiogenesis [115]. Considering the relevance of the IGF pathway in PCa, the impact of mutations on the interaction of proteins involved in this pathway was predicted. According to SAAMBE-SEQ, the mutations were predicted to destabilize all PPIs involved in the IGF pathway, which could affect this pathway and consequently the progression of PCa.
To investigate the likely impact of each amino acid substitution on protein function and PPIs involving the dysregulated mutant protein isoforms (AMBP*A286G, SLC9B1*N70S, HSPG2*Q1062H, ZNF624*S207F, VASN*R161Q, and CD55*S162L), the PolyPhen-2 and SAAMBE-SEQ prediction tools were used. The role of the SLC9B1 and ZNF624 proteins in cancer is unknown, so the downregulation of their mutant protein isoforms and the prediction of their benign impact do not allow conclusions to be drawn. HSPG2, in its intact form, is a well-described pro-angiogenic molecule, being correlated with GS and increased cell proliferation and viability [55,56,116]. The intact form of this protein was found increased in tumor prostate tissue, but in sera from PCa patients raised levels of HSPG2-derived fragments resulting from matrix metalloproteinase 7 (MMP7) degradation were observed. These fragments originated mostly from domain IV and were not present in sera from non-cancer subjects, suggesting that HSPG2 cleavage occurs during metastasis and before the protein enters the bloodstream. Using an in silico analysis, Grindel et al. predicted that domains III and V of HSPG2 are the most prone to cleavage by MMP7 and generate new peptides for other extracellular proteases to digest [55]. Curiously, in this work, the mutated peptide identified in the mutant HSPG2 isoform is located on domain III. The cleavage of HSPG2 and other components of the basement membrane occurs during PCa cell invasion and is orchestrated by proteases such as MMPs, cathepsin L, and BMP1/Tolloid-like proteases. Both cathepsin L and BMP1/Tolloid-like proteases cleave HSPG2 in domain V, originating the endorepellin [117] and LG3 [118] peptides, respectively. Unlike the intact form, cleaved endorepellin and LG3 peptides behave as powerful anti-angiogenic factors, being claimed as potential therapeutic targets for cancer [118].
In fact, the administration of endorepellin to mice with squamous cell carcinomas and lung carcinomas resulted in mitigation of tumor growth, angiogenesis, and metabolism and in promotion of tumor hypoxia [119]. Accordingly, diminished LG3 levels were observed in breast cancer cells and in plasma from breast cancer patients [120]. Only the LG3 peptide has been detected in urine [121,122]. In PCa, both the existence and the role of these peptides are unknown, and the only recognized HSPG2 protease is MMP7. A complex network between HSPG2 and other basement membrane components, such as collagens, laminin, and nidogen, is responsible for ECM integrity. When this integrity is disturbed, the metastatic process is compromised [123]. In the present work, mutations were identified in HSPG2, collagens, nidogen, and in other proteins involved in ECM organization. When the impact of these mutations on PPIs was predicted, they all proved to be destabilizing, which may eventually affect ECM dynamics and tumor progression. All these results, together with the fact that the HSPG2*Q1062H point mutation was predicted to be benign and the mutant peptide was downregulated in PCa patients, suggest that this mutant peptide may have beneficial effects in patients with PCa and open doors for its study in PCa treatment. Concerning the AMBP protein, it is cleaved into three chains, namely Alpha-1-microglobulin, Bikunin, and Trypstatin. The function of the AMBP protein in cancer remains undisclosed. However, it has been claimed that the AMBP-derived product bikunin is underexpressed in oral squamous cell carcinoma and plays an antitumor role [40]. In line with this, there is evidence that bikunin significantly prevented tumor invasion and metastasis in Lewis lung carcinoma and ovarian carcinoma cells [124,125]. Curiously, in this work, the mutant peptide identified in the AMBP isoform is located on the bikunin fragment.
The mutation identified in AMBP was predicted to be probably damaging, destabilized all PPIs in which AMBP was involved, and resulted in an upregulation of the mutant AMBP isoform in PCa patients. This may suggest a detrimental role of this mutation in PCa patients. Regarding CD55, it blocks the complement response by accelerating the decay of C3 and C5 convertases [126] and is involved in PCa cell survival and metastasis [92]. This interplay between CD55 and C3 is visible by their interaction in the STRING network. The mutation detected on the CD55 protein was predicted to be probably damaging and destabilizing for the CD55-C3 interaction. With these findings, it seems reasonable to suspect a detrimental role of this mutation in PCa patients. Regarding VASN, it is a known inhibitor of TGF-β signaling [127]. The TGF-β pathway has a dual role in cancer, because it prevents cell proliferation in early stages, whereas in advanced stages it stimulates proliferation, epithelial-to-mesenchymal transition (EMT), and evasion of immune surveillance, and attenuates apoptosis [128]. The mechanism involved in this inhibitory action of VASN on TGF-β was revealed in breast cancer cell lines. It was demonstrated that a soluble form of VASN resulting from the proteolytic shedding of its extracellular domain by a disintegrin and metalloprotease domain 17 (ADAM17) is responsible for controlling the TGF-β pathway [129]. In PCa, the role of VASN is largely unexplored, including the interplay between the VASN and TGF-β pathways. However, overexpression of VASN in prostate tumor tissue and in serum from PCa patients and the subsequent promotion of cell proliferation and PCa progression have already been reported, in agreement with other types of cancer [90]. Interestingly, in this work, the mutated peptide identified in the VASN protein is located on the extracellular domain of the protein, the domain cleaved by ADAM17.
The mutation identified in VASN resulted in a downregulation of this mutant protein isoform in PCa patients and was predicted to be benign, which may suggest a protective role of this mutation in PCa patients.
These findings indicate that, in mutational diseases such as cancer and in biofluids with high proteolytic activity such as urine, proteogenomics and the study of peptides can be highly informative, because point mutations can go unnoticed at the protein level but are detected at the peptide level. This may sharpen or renew interest in underexplored targets, as observed in this work. We hope to address some of these questions in future work. Furthermore, it would be interesting to test these mutant peptides by a targeted MS approach such as MRM, but this is beyond the scope of this work. This work's novelty lies in the proteogenome characterization of urine from PCa patients and the combined analysis of MS data using two different software packages, increasing certainty in the identification of urinary proteins modulated by PCa.

Conclusions
The majority of mutations identified in this work have never been associated with PCa, and some are predicted to be damaging, which offers an auspicious opportunity for research and development of PCa biomarkers, especially in the HSPG2 context. Additionally, the discovery of cancer-associated mutations in PCa-related proteins in urine is promising given this biofluid's non-invasive and dynamic nature.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14082001/s1: Table S1, Clinical data of subjects included in the discovery cohort group; Table S2, Clinical data of subjects included in the testing cohort group; Table S3, Summary of statistical analysis results of shortlisted proteins evaluated in the testing group; Table S4, List of mutant protein isoforms identified in the 86 proteins with known prostate expression; Table S5, Prediction of likely impact of point mutations on protein function using PolyPhen-2 tool; Table S6, Reactome pathway enrichment analysis of the network; Table S7, Prediction of likely impact of point mutations on protein-protein interactions using SAAMBE-SEQ tool; Figure S1, Original Western blots figures; Figure S2, Levels of AMBP*A286G, SLC9B1*N70S, HSPG2*Q1062H, ZNF624*S207F, VASN*R161Q, and CD55*S162L mutant protein isoforms and respective levels of native form (when applicable) in the urine from PCa patients.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the IPO-Porto Ethics Committee (Comissão de Ética para a Saúde, Reference 282R/2017).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
Data generated during Mass Spectrometry analysis is available in the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifier PXD017902.

Conflicts of Interest:
The authors declare no conflict of interest.