An Integrated Meta-Analysis of Secretome and Proteome Identify Potential Biomarkers of Pancreatic Ductal Adenocarcinoma

Pancreatic ductal adenocarcinoma (PDAC) is extremely aggressive and has an unfavorable prognosis, and there are no biomarkers for early detection of the disease or identification of individuals at high risk for morbidity or mortality. The cellular and molecular complexity of PDAC leads to inconsistencies in clinical validations of many proteins that have been evaluated as prognostic biomarkers of the disease. The tumor secretome, a potential source of biomarkers in PDAC, plays a crucial role in cell proliferation and metastasis, as well as in resistance to treatments, which together contribute to a worse clinical outcome. The massive amount of proteomic data from pancreatic cancer generated by previous studies can be integrated and explored to uncover secreted proteins relevant to the diagnosis and prognosis of the disease. The present study aimed to perform an integrated meta-analysis of public PDAC proteome and secretome data to identify potential biomarkers of the disease. Our meta-analysis combined mass spectrometry data obtained from two systematic reviews of the pancreatic cancer literature, which independently selected 20 studies of the secretome and 35 of the proteome. Next, we predicted the secreted proteins using seven in silico tools or databases, which identified 39 secreted proteins shared between the secretome and proteome data. Notably, the expression of 31 genes encoding these secretome-related proteins was upregulated in PDAC samples from The Cancer Genome Atlas (TCGA) when compared to control samples from TCGA and the Genotype-Tissue Expression (GTEx) project. The prognostic value of these 39 secreted proteins in predicting survival outcome was confirmed using gene expression data from four PDAC datasets (validation set). The gene expression of these secreted proteins was able to distinguish high- and low-survival patients in nine additional tumor types from TCGA, demonstrating that deregulation of these secreted proteins may also contribute to the prognosis in multiple cancer types. Finally, we compared the prognostic value of the identified secreted proteins with that of PDAC biomarker signatures from the literature. This analysis revealed that our gene signature performed as well as or better than the signatures from these previous studies. In conclusion, our integrated meta-analysis of PDAC proteome and secretome identified 39 secreted proteins as potential biomarkers, and the tumor gene expression profile of these proteins in patients with PDAC is associated with worse overall survival.


Integration of Secretome and Proteome Meta-Analysis Identifies 39 Secreted Proteins in Pancreatic Ductal Adenocarcinoma
We performed an integrative meta-analysis of pancreatic cancer secretome and proteome data to identify clinically relevant diagnostic and prognostic biomarkers. According to the meta-analysis study design and the inclusion and exclusion criteria (Figure 1), we selected 20 secretome studies and 35 proteome studies in pancreatic cancer that reported protein data obtained by mass spectrometry (Tables 1 and 2) [3,6,29,34,37,41,43,45-47]. These data identified 782 proteins in the pancreatic cancer secretome and 517 proteins in pancreatic ductal adenocarcinoma tumor samples. Interestingly, no protein was shared by all secretome and proteome studies. Therefore, we selected proteins present in two or more studies. This approach yielded 156 and 132 proteins in the pancreatic cancer secretome and proteome, respectively (Figure 2 and Tables S1-S4). The intersection between the secretome and proteome protein lists revealed 43 proteins in common between the two meta-analysis strategies (Figure 2). These 43 shared proteins were further verified as secreted using prediction analysis regarding the nature of secretion (Figure 2). These analyses supported the selection of proteins that are not derived from other mechanisms such as cell death. Among the 43 proteins evaluated with SignalP, SecretomeP, ExoCarta, Vesiclepedia, TargetP, and TMHMM, Vimentin (VIM), Glyceraldehyde-3-phosphate dehydrogenase (GAPDH), and Superoxide dismutase 2 (SOD2) were predicted to have a mitochondrial localization (TargetP), and the Transforming growth factor beta induced (TGFBI) protein was predicted to contain transmembrane helices (TMHMM). These four proteins were eliminated, resulting in a final list of 39 proteins predicted as secreted. This set of proteins is also detected in plasma, as confirmed in the Plasma Proteome database (Table 3 and Table S5).
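The selection logic described above (keep proteins reported in two or more studies, intersect the secretome and proteome lists, then exclude proteins flagged by the localization predictors) can be illustrated with a minimal sketch. The function name `recurrent_proteins` and the toy study contents are assumptions made for illustration; they are not the study's data or code.

```python
from collections import Counter


def recurrent_proteins(studies, min_studies=2):
    """Return the proteins reported in at least `min_studies` studies."""
    counts = Counter()
    for proteins in studies:
        counts.update(set(proteins))  # count each protein once per study
    return {p for p, n in counts.items() if n >= min_studies}


# Hypothetical toy input: each set is the protein list of one study.
secretome_studies = [
    {"ENO1", "LDHA", "VIM"},
    {"ENO1", "TPI1", "VIM"},
    {"LDHA", "TPI1", "ALB"},
]
proteome_studies = [
    {"ENO1", "VIM", "ALB"},
    {"ENO1", "LDHA", "VIM"},
    {"LDHA", "PKM"},
]

secretome = recurrent_proteins(secretome_studies)
proteome = recurrent_proteins(proteome_studies)
shared = secretome & proteome  # proteins common to both meta-analyses

# Exclusion step: drop proteins flagged as mitochondrial or transmembrane.
# VIM is used here only because the text names it as a TargetP exclusion.
flagged = {"VIM"}
final = sorted(shared - flagged)
print(final)
```

In this toy example the recurrence filter and intersection leave ENO1, LDHA, and VIM, and the exclusion step removes VIM.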
| Reference | Samples | Labeling | Mass spectrometry | Validation |
|---|---|---|---|---|
| [43] | AsPC1, MiaPaCa2, Panc1 | Free | LC-MS/MS | - |
| [41] | Panc1 | iTRAQ | LC-MS/MS | WB, ELISA |
| [45] | Paca44, Panc1, BxPc3, MiaPaca2, HPSC, A818-4 | Free | LC-MS/MS | - |
| [46] | BxPC-3, MIA PaCa-2, Panc1, AsPC-1 | Free | LC-MS/MS | WB, IHC, ELISA |
| [93] | CAPAN-2, RLT-PSC | SILAC | LC-MRM/MS | - |
| [47] | PC-1.0, PC-1 (Hamster) | SILAC | Nano-RPLC-MS/MS | WB |
| [49] | BxPC3, MiaPaca2, Panc1 | Free | ESI-MS/MS | WB |
| [50] | SOJ-6, BxPC-3, MiaPaCa-2, Panc-1 | Free | MALDI-TOF MS | WB |
| [34] | PAN02 (Mouse) | Free | MS/MS | ELISA |
| [51] | NIT-1 | Free | MS/MS | WB |
| [52] | Panc-1 | Free | LC-MS/MS | - |
| [53] | MiaPaCa-2, BxPc-3, Panc-1, AsPc-1 | Free | MALDI-TOF MS | WB |
| [54] | Adenocarcinoma tissue | Free | LC-MS/MS | WB |
| [55] | BON-1, NCI-H727, SHP-77 | Free | LC-MS/MS | WB |
| [3] | BxPc3, MIA-PaCa2, Panc1, CAPAN1, CFPAC1, SU.86.86, HPDE, PJ | Free | LC-MS/MS | ELISA |
| [56] | KLM, PK-59, MIAPaCa2 | Free | MS/MS | WB |
| [57] | Panc-1, SW1990 | iTRAQ | LC-MS/MS | - |
| [58] | MIA PaCa-2 | labeling | MALDI-TOF MS | WB |
| [59] | CAPAN... | | | |

Figure 2. Schematic representation of the workflow used to identify secreted proteins in pancreatic cancer studies. Two meta-analyses of publicly available proteomic studies were used to identify secreted proteins in pancreatic ductal adenocarcinomas: meta-analysis 1 identified 782 proteins in the secretome, and meta-analysis 2 identified 517 proteins in the proteome. Subsequently, the proteins present in two or more studies were selected. This strategy resulted in a final list containing 156 proteins in the secretome (meta-analysis 1) and 132 proteins in the proteome (meta-analysis 2). The intersection between the proteins identified 43 shared proteins between the two meta-analyses. These 43 proteins were further verified as secreted using the algorithms available at the Center for Biological Sequence Analysis (CBS): SignalP [94] (identifies classical secreted proteins, presence of signal peptide); SecretomeP [95] (identifies non-classical secreted proteins); and the databases Vesiclepedia [96] (protein data in secretory vesicles), ExoCarta [97] (protein data in exosomes) and Plasma Proteome [98] (protein data identified in the blood). Proteins were excluded from further analysis if detected by the CBS TargetP [99] (mitochondrial protein) or TMHMM [100] (transmembrane helix protein) algorithms, resulting in a final list of 39 proteins in the pancreatic ductal adenocarcinoma shared by proteome and secretome studies.

The protein-protein interaction (PPI) network and gene ontology (GO) of the 39 proteins were generated using the Search Tool for Retrieval of Interacting Genes (STRING) database [101]. Data from this database revealed a complex interaction network between 26 proteins with strong associations represented by thick lines; the disconnected nodes in the network were hidden (Figure 3). Gene Ontology analysis revealed significant protein enrichment in categories of extracellular exosomes, membrane-bound vesicles, extracellular part, and blood microparticles (Figure 3 and Table 4). This analysis also showed involvement in biological processes such as glycolysis, regulation of apoptotic processes, vesicle-mediated transport, and stress response. The enriched molecular function was enzymatic and RNA binding (Table 4). These results further confirm our findings showing that the proteins identified in our integrative pancreatic cancer secretome meta-analysis are secreted and located mainly in extracellular compartments.

Figure 3. The STRING network [101] illustrates potential interactions between secreted proteins with a minimum confidence score of 0.7. Proteins in the interaction network are represented as nodes connected by lines whose thickness reflects a confidence index higher than 0.7. The top five enriched GO terms are represented by the network node colors for each protein: red, extracellular exosome; blue, membrane-bounded vesicle; green, blood microparticle; yellow, cytoplasmic membrane-bounded vesicle lumen; purple, secretory granule lumen.
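The network display rule stated in the text (interactions with a confidence score of at least 0.7, with disconnected nodes hidden) can be sketched as a simple edge filter. This is an illustrative sketch only: the function `filter_network`, the gene pairs, and the scores are hypothetical, and treating the cutoff as inclusive is an assumption.

```python
def filter_network(nodes, edges, min_score=0.7):
    """Keep edges scoring at least `min_score`; report nodes left unconnected."""
    kept = [(a, b, s) for a, b, s in edges if s >= min_score]
    connected = {n for a, b, _ in kept for n in (a, b)}
    hidden = sorted(set(nodes) - connected)  # nodes that would be hidden
    return kept, sorted(connected), hidden


# Hypothetical nodes and confidence scores, for illustration only.
nodes = ["ENO1", "PGK1", "LDHA", "ALB", "PRSS1"]
edges = [
    ("ENO1", "PGK1", 0.99),
    ("ENO1", "LDHA", 0.95),
    ("ALB", "PRSS1", 0.45),  # below the cutoff, so this edge is dropped
]

kept, connected, hidden = filter_network(nodes, edges)
print(connected, hidden)
```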

Secretome-Related Gene Expression is Enriched in Pancreatic Ductal Adenocarcinoma
The gene expression levels of the secretome proteins identified in our integrative meta-analysis were analyzed using the web-based Gene Expression Profiling Interactive Analysis (GEPIA) tool [102]. This tool allows the comparison of transcriptome profiles from TCGA and GTEx using RNA sequencing data uniformly processed by the Toil pipeline [103]. We obtained gene expression profiles of 4743 tumors comprising 10 cancers (TCGA) and 2737 corresponding normal tissues (TCGA and GTEx) (Table S6). In addition to PDAC, the following tumor types were selected to allow a comparison between some prevalent cancers: gastric carcinoma (GC), colon adenocarcinoma (COAD), hepatocellular carcinoma (HCC), lung squamous cell carcinoma (Lung SCC), breast carcinoma (BC), head and neck squamous cell carcinoma (HNSCC), esophageal carcinoma (ESCC), lung adenocarcinoma (Lung AD), and acute myeloid leukemia (AML). The expression profile of the 39 genes encoding secreted proteins was analyzed using the GEPIA tool to identify prognostic biomarkers of pancreatic ductal adenocarcinoma. The analysis revealed that 31 genes are significantly upregulated in PDAC (log2 fold change > 1 and q-value cutoff = 0.01) when compared to normal tissues (Figure 4). When we extended the analysis to other TCGA cancer types, the gene expression profile of the 39 secretome-related genes was enriched in PDAC, although PDAC shares a high number of significantly upregulated genes with other cancer types such as GC, COAD, and HCC (Figure 5, Figure S1A). This analysis also showed that ARHGDIA and PARK7 are exclusively upregulated in PDAC (Figure S1A). The Euclidean cluster analysis performed on the 39 secretome genes demonstrates that PDAC exhibits a gene expression profile that is clearly distinct from other cancers (Figure 5). This analysis also demonstrated the opposite expression profile for acute myeloid leukemia (AML) (Figure 5). The principal component analysis (PCA) of the expression profiles of the 39 secretome genes in the 10 tumor types showed that PDAC and AML separate from the other cancer types, indicating that the expression profile of the 39 secretome genes can clearly distinguish PDAC from other cancers (Figure S2).
We also evaluated the expression profile of the secretome proteins in normal tissues in order to find pancreas-specific proteins deregulated in PDAC. According to the GTEx database [104], the majority of the proteins identified in our study are uniformly expressed across the tissues analyzed (Figure S1B); however, trypsin-1 (PRSS1) is specifically upregulated in normal pancreatic tissue when compared to other normal tissues (Figure S1B). Interestingly, we found that PRSS1 is downregulated in PDAC (Figures 4 and 5), although it has often been related to pancreatic cancer [105,106].
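The two computational ideas used in this section, the differential expression thresholds (absolute log2 fold change above 1 and q-value below 0.01) and the Euclidean distance underlying the cluster analysis, can be sketched as follows. The fold changes, q-values, and tumor profiles are hypothetical numbers chosen for illustration, and the helper names `classify` and `euclidean` are not part of GEPIA.

```python
import math


def classify(log2fc, q, fc_cutoff=1.0, q_cutoff=0.01):
    """Label a gene 'up', 'down', or 'ns' (not significant)."""
    if q < q_cutoff and log2fc > fc_cutoff:
        return "up"
    if q < q_cutoff and log2fc < -fc_cutoff:
        return "down"
    return "ns"


def euclidean(u, v):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical (log2 fold change, q-value) pairs for three genes.
results = {"LDHA": (2.1, 0.001), "PRSS1": (-3.4, 0.0005), "ALB": (0.4, 0.2)}
calls = {gene: classify(fc, q) for gene, (fc, q) in results.items()}

# Hypothetical log2 fold-change profiles (same gene order) per tumor type.
profiles = {
    "PDAC": [2.1, -3.4, 0.4],
    "GC": [1.8, -0.2, 0.1],
    "AML": [-1.5, 0.0, -0.3],
}
d_gc = euclidean(profiles["PDAC"], profiles["GC"])
d_aml = euclidean(profiles["PDAC"], profiles["AML"])
print(calls, round(d_gc, 2), round(d_aml, 2))
```

A hierarchical clustering would group tumor types with small pairwise distances first; in this toy example the PDAC profile lies closer to GC than to AML.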
Cancers 2020, 12, x 11 of 32
Figure 5. Genes that are specifically upregulated or downregulated in each tumor type (absolute values of fold-change > 1.0 and q-value < 0.01; ANOVA) are shown in red and blue, respectively. Both columns (tumor types) and rows (secretome genes) were clustered using Euclidean distance. Three genes (ARHGDIA, AHSG, and EEF1A1) presented missing values in 90%-100% of the tumor types and are not represented in the heatmap (constant rows were removed during data pre-processing). The differential expression levels from tumor tissue versus combined normal TCGA and GTEx data were calculated using the web-based Gene Expression Profiling Analysis tool (GEPIA) [102]. GC: Gastric carcinoma; COAD: Colon adenocarcinoma; HCC: Hepatocellular carcinoma; Lung SCC: Lung squamous cell carcinoma; BC: Breast carcinoma; HNSCC: Head and neck squamous cell carcinoma; ESCC: Esophageal squamous cell carcinoma; Lung AD: Lung adenocarcinoma; AML: Acute myeloid leukemia.

Secretome-Related Gene Expression Profile of 39 Proteins Predicts Shorter Survival in Patients with Pancreatic Ductal Adenocarcinoma
Secretome components from the tumor environment, including proinflammatory cytokines, play a fundamental role in the development of alterations that result in proliferation, metastasis, and resistance to treatments [32,36,44]. Considering that these alterations contribute to the worse prognosis of patients with PDAC, we determined whether the tumor expression levels of the 39 secretome genes correlate with patient prognosis. We used the SurvExpress platform [107], a web-based biomarker validation tool that provides survival analysis and risk assessment of cancer datasets, to assess whether our set of secretome proteins was able to discriminate overall survival in patients with PDAC. SurvExpress generated a prognostic index (risk score) based on the gene expression of the 39 secretome proteins and the survival data of the patients with PDAC. These patients were divided into two groups, high- and low-risk, maximizing the number of patients in risk groups by employing an ordered prognostic index optimization algorithm in SurvExpress (Table S7). We evaluated the prognostic value of the gene expression of the 39 secreted proteins and analyzed their association with survival of patients with PDAC (Cox regression analysis) in four different data sets with patient survival information (TCGA, PACA-AU-ICGC, GSE21501, GSE28735). The 39 secretome genes initially analyzed in the TCGA dataset identified patients with significantly shorter survival (hazard ratio (HR) = 5.36; log-rank p-value = 1.16 × 10^-16; concordance index (CI) = 74.75; n = 176). In order to validate our findings, survival analysis was performed on three additional independent data sets. These 39 secretome genes were also significantly associated with patient outcome in the ICGC datasets (HR ...) (Figure 6). This analysis also demonstrated increased expression of LDHA, ENO1, and PGK1 in the TCGA, PACA-AU-ICGC, and GSE21501 datasets; NME1 was upregulated in all PDAC datasets (Table S8). Enrichment of secretome genes in PDAC patients with low survival (high-risk group) can be observed in the heatmap generated by cluster gene expression analysis (Figure S3). This shows the robustness of the gene expression signature of our set of secreted proteins, which demonstrated significant association with patient survival in the independent PDAC validation sets.
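To make the quantities in this section concrete, the sketch below computes a Cox-style prognostic index (the sum of coefficient times expression over the signature genes), splits patients into risk groups, and evaluates a concordance index. It is a simplified illustration under stated assumptions: the coefficients, expression values, and survival times are invented; the median split is a stand-in for, not a reproduction of, the optimization algorithm applied in SurvExpress; and the concordance measure is Harrell's C, expressed as a fraction rather than a percentage.

```python
def prognostic_index(expression, betas):
    """Prognostic index: sum over genes of (Cox coefficient x expression)."""
    return sum(betas[g] * expression[g] for g in betas)


def split_by_median(pis):
    """Assign 'high' risk to indices above the median, 'low' otherwise."""
    ordered = sorted(pis)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return ["high" if pi > median else "low" for pi in pis]


def concordance_index(times, events, scores):
    """Harrell's C: share of comparable pairs ranked correctly by the score."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable when patient i had an observed event
            # before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical coefficients and patients, for illustration only.
betas = {"LDHA": 0.8, "ENO1": 0.5}
patients = [
    {"LDHA": 3.0, "ENO1": 2.0},
    {"LDHA": 1.0, "ENO1": 1.0},
    {"LDHA": 2.0, "ENO1": 3.0},
    {"LDHA": 0.5, "ENO1": 0.5},
]
times = [5, 30, 12, 40]   # follow-up in months
events = [1, 0, 1, 1]     # 1 = death observed, 0 = censored

pis = [prognostic_index(p, betas) for p in patients]
groups = split_by_median(pis)
c_index = concordance_index(times, events, pis)
print(groups, c_index)
```

In this toy cohort every comparable pair is ordered correctly, so the concordance index equals 1; real cohorts give intermediate values.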
Figure 6. The expression of secretome genes predicts cancer outcomes across four pancreatic ductal adenocarcinoma (PDAC) studies. Survival analysis, based on the expression of 39 mRNAs translated into proteins identified as secreted in PDAC, was calculated using the online platform SurvExpress [107], in four additional and independent PDAC datasets. Cancer patients were stratified into high- (red) and low-risk (green) groups. The adjusted hazard ratio (HR) with the corresponding 95% confidence interval, log-rank p-value (P), and the number of successfully stratified patients (N) determined by Cox univariate regression analysis are shown in each Kaplan-Meier survival plot. Datasets of PDAC patients: PDAC-TCGA [108], PACA-AU-ICGC [109], GSE21501 [19], and GSE28735 [20,79].

Secretome Gene Expression Predicts Cancer Outcomes in Different Cancer Studies
Considering that several of our 39 secretome genes are upregulated in other cancer types, we evaluated the prognostic value of these genes in the nine additional malignancies from TCGA. We found that, in addition to PDAC, our set of 39 genes encoding secretome-related proteins predicts a worse prognosis in HNSCC, ESCC, GC, HCC, Lung SCC, Lung AD, COAD, BC, and AML (Figure 7 and Table S9). The heatmap for each of these cancer types shows that the expression profile of our set of 39 secretome-related genes is able to separate patients into high- and low-risk groups (Figure S3). The ability of our 39 secretome genes to stratify patients into risk groups was supported by high hazard ratios in the nine TCGA cohorts, showing that changes in the expression of these 39 secretome genes are associated with worse overall survival (Figure 7). Therefore, these results suggest that this set of 39 secretome genes is a predictor of cancer survival outcomes.

Comparison with Prognostic Gene Signatures of Pancreatic Ductal Adenocarcinoma
Several prognostic gene signatures for PDAC have been proposed [17-24,110]. However, we found no genes shared between our set of secretome-related genes and these previous signatures from the literature. We also compared the performance of our signature with that of nine gene expression signatures [17-24,110] in predicting worse survival in PDAC, tested in four different PDAC datasets (TCGA, ICGC, GSE21501, GSE28735) available in the SurvExpress tool. These signatures were able to separate risk groups based on the gene expression profiles (data not shown). The signatures proposed by Collisson et al. [23] and Donahue et al. [18] showed slightly better results than the set of secretome-related genes identified in our meta-analysis; however, Donahue's signature, which corresponds to a set of 171 genes, had a lower performance in the dataset GSE28735 (Figure 8). Our signature also had results comparable to those of Haider et al. [22] (Figure 8). However, it is important to emphasize that our dataset comprises proteins that are expressed and secreted in PDAC, constituting a rich source of biomarkers.

Figure 8. Performance of the expression profile of the 39 secretome genes as potential pancreatic ductal adenocarcinoma (PDAC) biomarkers compared to nine gene signatures previously proposed for PDAC [17-24,110]. The color of the circles in the heat scatter plot represents the agreement index, while the size of the circle is based on the log-rank p-value of the risk group separation based on the SurvExpress tool. Rows and columns were grouped based on the Euclidean distance between the agreement index values. Datasets of PDAC patients: PDAC-TCGA [108], PACA-AU-ICGC [109], GSE21501 [19], and GSE28735 [20,79].

In Silico Validation of Secreted Proteins
Immunohistochemical staining data for the 31 proteins with increased gene expression in PDAC tumor samples according to GEPIA were retrieved from the Human Protein Atlas (HPA) database [111]. PDAC immunohistochemical images were analyzed, and six proteins (L-lactate dehydrogenase A chain, LDHA; Phosphoglycerate kinase 1, PGK1; Pyruvate kinase, PKM; 14-3-3 protein sigma, SFN; Fibronectin, FN1; Galectin-1, LGALS1) showed medium or high immunostaining in PDAC tumor tissue but low or undetected staining in normal pancreatic tissue, indicating that these proteins are also potential biopsy-based markers for screening high-risk PDAC patients (Figure 9 and Figure S4). Additionally, four proteins (Triosephosphate isomerase, TPI1; Galectin-3, LGALS3; Galectin-3-binding protein, LGALS3BP; Filamin-A, FLNA) showed medium immunostaining in normal tissue and high immunostaining in tumor tissue (Figure S4).
Figure 9. Identification of secreted proteins in normal and pancreatic ductal adenocarcinoma (PDAC) tissues using immunohistochemical staining data available at the Human Protein Atlas database [111]. These proteins were selected based on their significantly increased gene expression, as identified using the GEPIA tool [102].

Discussion
Several studies have focused on the investigation of PDAC secretome-proteome in order to identify molecular mechanisms and biomarkers of this malignancy. However, the molecular complexity of pancreatic cancer and the failure of validation for most proposed biomarkers increase the urgency of effective strategies for identifying promising biomarkers of the disease. The heterogeneity of pancreatic cancer proteomic research using different samples, and fractionation and mass spectrometry techniques generates a rich source of datasets that can be explored, integrated, and compared to identify potential biomarkers of the disease. Here, we integrated the proteomic profiles of pancreatic cancer cell lines and tumors obtained by a meta-analysis of the secretome and proteome. This analysis identified 39 biologically relevant secreted proteins in PDAC. The gene expression profile of these proteins predicted worse overall survival in four independent cohorts of PDAC patients. The expression of these secretome-related genes also predicted worse overall survival in nine additional tumor types from the TCGA. Our meta-analysis approach using different proteomic datasets provided higher statistical power to address the biological heterogeneity of PDAC as well as overall patient survival. This strategy allowed us to identify a panel of proteins in pancreatic cancer proteomic studies, which are potential prognostic biomarkers.
Our approach also demonstrated the lack of overlap between the studies selected by the two meta-analysis strategies. We found no protein shared between all secretome and proteome studies. This discrepancy may be due to the diversity of proteomic techniques used by different studies; to limitations in proteomic analysis related to sample preparation, sample heterogeneity, and the complexity of the proteome to be analyzed, especially regarding protein expression levels; and to limitations of the analytical methods [112-115]. To overcome these limitations, we selected the proteins present in two or more studies, aiming to increase the possibility of detecting proteins involved in pancreatic cancer. We also integrated the secretome and proteome meta-analyses to select tumor proteins that are secreted or found in biological fluids. When we compared our list of 39 secretome proteins with PDAC gene signatures from the literature, we found no overlap with these previous studies [17-24,110]. This may be because global gene transcription levels insufficiently reflect global protein levels or because of limitations related to the sensitivity of proteomic techniques, which points to the need to integrate transcriptomic and proteomic analyses to identify critical molecular changes of cancer [112,115-118]. Interestingly, our results show that the tumor transcription levels of our set of proteins that are expressed and secreted in pancreatic cancer are useful as potential prognostic biomarkers when compared to previously proposed transcript-based signatures for PDAC.
We found a set of differentially expressed proteins with prognostic power that are biologically important in human cancers. For example, albumin (ALB), which was listed in 10 proteome and three secretome studies, has been identified as a poor prognostic factor in cancer patients [119,120]. Serum ALB is the most abundant blood protein in mammals; however, in diseases such as PDAC, its low level may be associated with an advanced stage of the disease [120]. Lower serum ALB levels can increase the risk of venous thromboembolism, which is the second leading cause of death in pancreatic cancer patients [119,121,122]. Triosephosphate Isomerase 1 (TPI1), found in seven secretome and four proteome studies, is a crucial enzyme in carbohydrate metabolism. Proteomic analysis of sera from pancreatic cancer patients showed TPI1 as one of the most abundant proteins in patients with poor survival before and after chemotherapy. TPI1 could be further investigated as a prognostic marker, as its levels gradually increase as the disease progresses [123]. The prognostic value of TPI1 was also evaluated in gastric cancer, where patients with higher TPI1 expression had lower overall survival [124].
We also identified alpha-enolase (ENO1), a glycolytic enzyme involved in the synthesis of pyruvate that is found in the cytoplasm, on the cell surface, and in the nucleus [125]. In addition to its glycolytic function, ENO1 plays a crucial role in cancer cell invasion and metastasis, in part because it also acts as a plasminogen receptor [126][127][128][129][130]. This plasminogen receptor function, coupled with the high expression of ENO1 on the surface of tumor cells, facilitates the binding of large amounts of plasminogen to the cell surface, enabling plasmin activation and enhancing the ability of PDAC cells to degrade the extracellular matrix, thus favoring tumor invasion [127]. ENO1 also regulates pancreatic cancer adhesion, invasion, and metastasis by controlling the expression of αvβ3 integrin [128]. The αvβ3 integrin signaling is known to play a crucial role in tumor growth, angiogenesis, and metastasis. The αvβ3 integrin is the target of preclinical experiments in cancer treatment, which have demonstrated positive anti-angiogenic and anti-tumor effects [131,132]. These effects have been observed for cell surface ENO1, which has been found to be significantly increased in tissues and plasma of PDAC patients with shorter survival [133]. Thus, further studies should be performed to investigate the role of secreted ENO1 in tumor biology. Additionally, ENO1 has also been reported as a potential prognostic biomarker in breast cancer, head and neck cancers, and gliomas [125,134,135].
Our enrichment analysis also showed changes in the regulation of glycolytic processes. Increased aerobic glycolysis (Warburg effect) is observed exclusively in cancers and is highly dependent on unregulated metabolic enzymes [136]. In addition to the glycolytic enzymes ENO1 and TPI1, lactate dehydrogenase A (LDHA) was also identified in our study and corresponds to a central enzyme in regulating the Warburg effect; it catalyzes the conversion of pyruvate to lactate in the final stage of anaerobic glycolysis and is upregulated in various cancers [137,138]. In gastric cancer, LDHA was upregulated in tumor tissues and promoted tumor cell migration and invasion [137]. Moreover, LDHA has been reported to improve growth and inhibit apoptosis in pancreatic tumor cells [138]. Mohammad et al. [139] showed that increased expression of LDHA and PKM2 in pancreatic biopsy specimens from patients correlates with poor survival. Studies investigating LDH expression levels in tumor and serum found that high tissue expression is not entirely consistent with elevated serum levels, indicating that tumor LDHA expression and serum levels are two independent predictors of the disease [140]. In our analysis, LDHA was identified in six secretome studies and only two proteome studies, suggesting that this enzyme may be secreted to perform its functions in tissues distant from the primary tumor focus, exacerbating tumor progression. In addition to its prognostic role, further studies verifying the biological significance of serum LDHA levels need to be performed. Therefore, the high expression of glycolytic enzymes, combined with their canonical and non-canonical functions, may partly explain the aggressiveness of cancers such as pancreatic cancer, influencing different prognostic outcomes and playing a pivotal role in defining personalized treatments [123,141].
Nucleoside diphosphate kinase A (NME1/NDPK-A) is a tumor suppressor with increased expression in different PDAC datasets and in other cancer types. The increased expression may be linked to anticancer mechanisms, as it exhibits anti-metastatic function [142]. Several studies support our findings, showing that NME1 is highly expressed in pancreatic cancer samples and predicts worse prognosis in patients [143][144][145]. Contrary to our results, Liu et al. [146] evaluated the prognostic value of NME1 by meta-analysis and concluded that negative regulation of NME1 was associated with a poor prognosis in breast cancer, esophageal cancer, nasopharyngeal cancer, and lymphoma. An association between negative regulation of NME1 and worse prognosis was also demonstrated in colon cancer [147]. Further investigations are needed to clarify the conflicting findings on NME1 expression across cancer types and to address the relationship between expression and clinicopathological features.
Our meta-analysis used proteomic data, while our validation was performed with TCGA transcriptomic data. However, only one-third of the RNA species are significantly correlated with the corresponding proteins in human cells [168]. Thus, it is essential to emphasize that a combination of different factors influences a direct association between protein levels and their coding transcripts. These include the availability of a wide range of resources for protein biosynthesis, as well as temporal and spatial variations resulting from transcriptional and post-transcriptional mechanisms that control gene expression [26,169,170]. Previous integrated multi-platform analysis in PDAC has revealed associations of non-coding RNAs with tumor-specific mRNA subtypes [171]. These authors also showed that the differential regulation of gene expression via DNA methylation and microRNAs (miRNAs) could also distinguish tumor subtypes [171]. Recently, our research group has provided data on the role of miRNAs in the regulation of gene networks, including pathways of the adaptive and innate immune response involved in PDAC [172]. Thus, although transcript levels are not enough to predict protein levels in different conditions, the expression of 31 genes out of the 39 identified in our PDAC proteome-secretome meta-analysis was found upregulated in PDAC samples.
The transcripts and proteins identified in our study have the potential to be used in conjunction with miRNAs to increase the sensitivity and specificity of PDAC diagnosis. To exemplify this point, the TIMP1 protein, which was identified in our study, has been indicated along with LCN2 as a potential serum marker for the early detection of familial pancreatic cancer [173]. Moreover, the combination of both LCN2 and TIMP1 with miR-196b was able to distinguish high-grade lesions and stage I from controls with absolute sensitivity and specificity [174]. Also, in accordance with our data, plasma extracellular vesicle long RNA profiling has identified a diagnostic signature, which includes the TIMP1 transcript, for the detection of pancreatic cancer [175]. This increased detection of TIMP1 transcript and protein might be partially explained by the effects of miRNAs such as miR-221/222 and miR-21. These miRNAs are overexpressed in pancreatic cancer cells, and, as a result, they promote cellular proliferation, invasion, and chemoresistance by targeting TIMP-2 or inducing the expression of the invasion-related genes matrix metalloproteinase-2 and -9 [176,177].
The additional TCGA cancer types that we analyzed also shared many upregulated genes of the 39 secretome-related proteins, indicating that, at least in part, these tumors may share programs that promote oncogenesis and cancer progression. Some studies suggest that cancer may be categorized by gene and protein expression patterns of the secretome since similarities are observed between tumors originating from different tissues [134,178,179]. An explanation for these similarities is proposed by Robinson et al. [179], who showed a potential mechanism by which cancer cells relieve secretory stress by decreasing tissue-specific gene expression, facilitating proliferation and the secretion of invasion-promoting proteins. Recently, we have demonstrated that tumor types highly associated with cachexia share a high number of upregulated secretome genes [178]. These observations are in accordance with our findings since only PRSS1 was explicitly expressed in the normal pancreas among the 39 proteins that we identified. However, it is important to note that ARHGDIA (Rho GDP-dissociation inhibitor 1) and PARK7 (Protein/nucleic acid deglycase DJ-1) are specifically upregulated in PDAC, and the increased expression of ARHGDIA and PARK7 can be considered as a potential biomarker of the disease. ARHGDIA, a specific regulator of Rho protein exchange reactions crucial for the JNK pathway, was previously identified by a bioinformatic pipeline that searched for candidate genes related to pancreatic cancer using protein-protein interactions and a shortest path approach [180]. Also, in accordance with our results, PARK7 was found to be significantly elevated in PDAC [181], correlated with tumor invasion and worse patient outcome, and responsible for promoting invasion and metastasis of pancreatic cancer cells [182].
Our main contribution using this strategy consists of the selection of secreted proteins identified in pancreatic cancer proteomic analyses that stratify patients with low and high survival. However, our study has some limitations that should be pointed out, such as the reuse of proteomic data from studies with different biological samples, different protein fractionation approaches, and different mass spectrometry techniques. Another limitation of our investigation was the use of proteins reported only in the body of the studies, not extending to supplementary data, which could have broadened our range of identified proteins. We also noticed that some proteins identified in our analysis correspond to high-abundance proteins in samples, suggesting that less abundant proteins may not have been identified by the proteomic techniques used in the studies retrieved by our meta-analysis. Also, our gene expression and survival analyses were performed in silico with the help of databases and bioinformatics tools available online, and experimental validation of the secreted proteins identified in this meta-analysis may help to correlate the results obtained herein with patient prognosis. Many of the observations made in this study were based on functions investigated at the cellular level, but the molecular mechanisms underlying the secretion of these proteins should be better studied, clarifying whether they are secreted to perform autocrine or paracrine functions, or if they have different functions and modes of action in different tissues. Thus, further research is needed to identify the molecular pathways and contributions of these secreted proteins in the pathophysiology of pancreatic cancer.
Importantly, the molecular subtyping of PDAC is in its infancy and remains without clinically relevant molecular subtypes [13]. Despite the extensive genomic characterization, gene signatures have provided limited prognostic information for the disease [183]. However, it is only by assessing the clinical importance of molecular subtypes that a relevant molecular profile may be identified [13].

Integration of Secretome and Proteome Meta-Analysis to Identify Pancreatic Cancer Biomarkers
We integrated two meta-analyses to identify proteins as potential biomarkers of pancreatic cancer: a meta-analysis of pancreatic cancer secretome and meta-analysis of pancreatic ductal adenocarcinoma proteome. We searched protein data identified in these studies, published from 2005 to 2017, through the PubMed Central (PMC) electronic database at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM). The study design of each meta-analysis followed the stages of the PRISMA statement [184].

Pancreatic Cancer Secretome Meta-Analysis
To search for pancreatic cancer secretome studies, we used the following keywords: pancreatic cancer, pancreatic cancers, pancreatic neoplasm, pancreas cancer, pancreas cancers, cancer of the pancreas, cancer of pancreas and secretome, "cell secretome", "cancer secretome", "secretome analysis", "secretome proteomic", "secretome proteomics", "secretome profiling", "secretome mass spectrometry", "microvesicles and proteome", "microvesicles and proteomic", "microvesicles and proteomics", "microvesicles and protein profiling", "microvesicles and mass spectrometry", "exosome and proteome", "exosome and proteomic", "exosome and proteomics", "exosome and protein profiling", "exosome and mass spectrometry", apoptotic bodies and proteome, apoptotic bodies and proteomic, apoptotic bodies and proteomics, apoptotic bodies and protein profiling, apoptotic bodies and spectrometry, extracellular vesicles and proteome, extracellular vesicles and proteomic, extracellular vesicles and proteomics, extracellular vesicles and proteomics, extracellular vesicles and protein profiling, extracellular vesicles and mass spectrometry, "secreted proteins" and proteome, "secreted proteins" and proteomic, "secreted proteins" and proteomics, "secreted proteins" and spectrometry mass, conditioned medium and proteome, conditioned medium and proteomic, conditioned medium and proteomics, "conditioned medium and protein profiling", conditioned medium, and spectrometry mass. The criteria for study inclusion in this meta-analysis were: proteomic studies in pancreatic cancer samples (tumor tissue or pancreatic sulcus) or pancreatic cancer cell lines; only mass spectrometry studies were considered, and only data with statistical significance were included for the integrative analysis. 
Studies were excluded if they did not meet the criteria mentioned above or were studies conducted in non-pancreatic cancers, reviews, studies with treatments prior to proteomic analysis, studies without proteomic analysis, in silico studies, unpublished studies, or studies published before 2005. As this analysis revealed that most studies involved PDAC, we performed the meta-analysis of the proteome using only this histological type.

Pancreatic Ductal Adenocarcinoma Proteome Meta-Analysis
To search for PDAC proteome studies, we used the following keywords: "pancreatic ductal adenocarcinoma" and proteome, "pancreatic ductal adenocarcinoma" and proteomic, "pancreatic ductal adenocarcinoma" and proteomics, "pancreatic ductal adenocarcinoma" and "protein profiling", and "pancreatic ductal adenocarcinoma" and mass spectrometry. The following criteria were used to include studies in this meta-analysis: proteomic studies in human pancreatic ductal adenocarcinoma samples, only mass spectrometry studies were considered, studies that included normal tissues for comparison (case-control), and only data with statistical significance were included for the integrative analysis. Studies were excluded if they did not meet the criteria mentioned above or were studies on pancreatic ductal non-adenocarcinoma cancers, reviews, studies on experimental models, studies with treatments prior to proteomic analysis, studies without proteomic analysis, studies without case-control, studies without full access, in silico studies, protocols, and studies published before 2005.

Extraction of Meta-Analysis Data and in Silico Confirmation of Secreted Proteins
The studies were independently analyzed, and those that met the inclusion criteria were selected. The following information was extracted from each study: (1) study details: authors, year of publication, scientific journal, experimental model, mass spectrometry technique used, and validation strategies; (2) outcome measures: the proteins identified by the selected studies were tabulated to a Microsoft Excel file, mapped to the same gene symbol, and those identified simultaneously in two or more studies were selected for the study. The proteins shared between the two meta-analyses were identified by a Venn diagram tool (http://bioinfogp.cnb.csic.es/tools/venny/).
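The selection rule described above (keep proteins reported in two or more studies, then intersect the secretome and proteome lists) can be sketched in a few lines. This is only an illustrative sketch: the study names and protein lists below are hypothetical toy input, the function name `recurrent_proteins` is ours, and upper-casing stands in for the gene symbol mapping step, which in practice requires a curated identifier mapping.

```python
from collections import Counter


def recurrent_proteins(studies, min_studies=2):
    """Return gene symbols reported in at least `min_studies` studies.

    `studies` maps a study label to the list of proteins it reported.
    Upper-casing is a stand-in (assumption) for mapping every protein
    to the same gene symbol.
    """
    counts = Counter()
    for proteins in studies.values():
        # A set ensures each study contributes at most once per protein.
        counts.update({p.upper() for p in proteins})
    return {gene for gene, n in counts.items() if n >= min_studies}


# Hypothetical toy input, not the data of the meta-analysis.
secretome_studies = {
    "secretome_A": ["ENO1", "TPI1", "LDHA"],
    "secretome_B": ["eno1", "LDHA", "TIMP1"],
    "secretome_C": ["TPI1", "ALB"],
}
proteome_studies = {
    "proteome_A": ["ENO1", "ALB", "PRSS1"],
    "proteome_B": ["ENO1", "ALB", "LDHA"],
    "proteome_C": ["TPI1"],
}

secretome_hits = recurrent_proteins(secretome_studies)
proteome_hits = recurrent_proteins(proteome_studies)

# Equivalent to the overlapping region of a two-set Venn diagram.
shared = sorted(secretome_hits & proteome_hits)
print(shared)
```

With this toy input, only ENO1 is recurrent in both lists, which illustrates how strongly the two-study requirement and the intersection narrow the candidate set.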
To confirm that all proteins commonly identified in the meta-analyses (secretome and proteome) are secreted, their amino acid sequences (FASTA file) were obtained from the UniProtKB database from the UniProt consortium. Next, the amino acid sequences or symbols of each protein were used to predict secreted proteins using the online tools SignalP 4.1, SecretomeP 2.0, TargetP 1.1, and TMHMM v. 2.0, and the databases Vesiclepedia and Exocarta [94][95][96][97][99,100]. The SignalP 4.1 server was used to select classically secreted proteins that have a signal peptide and a D value above 0.45 [94]. Proteins predicted to belong to the non-classical secretion pathway, without a signal peptide, were selected with the SecretomeP tool, using a neural network (NN) score > 0.6 as the cut-off [95]. The Vesiclepedia and Exocarta databases were used to designate secreted proteins in exosomal fractions [96,97]. After secreted proteins were established by the classical pathway, the non-classical pathway, or exosomes, they were submitted to the TargetP and TMHMM servers [99,100] for the exclusion of mitochondrial or transmembrane helix proteins, respectively.
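The decision logic of this filtering step can be summarized as a small classifier over the tool outputs. The sketch below assumes the predictions have already been collected into one record per protein; the `Prediction` fields, the protein names, and all scores are hypothetical, and treating any predicted transmembrane helix as grounds for exclusion is our simplifying assumption. It does not call SignalP, SecretomeP, TargetP, or TMHMM.

```python
from dataclasses import dataclass

SIGNALP_D_CUTOFF = 0.45      # D value threshold stated in the text
SECRETOMEP_NN_CUTOFF = 0.6   # NN score threshold stated in the text


@dataclass
class Prediction:
    """Collected tool outputs for one protein (hypothetical record layout)."""
    gene: str
    has_signal_peptide: bool
    signalp_d: float
    secretomep_nn: float
    in_exosome_db: bool      # listed in Vesiclepedia or Exocarta
    mitochondrial: bool      # TargetP-style localization call
    tm_helices: int          # TMHMM-style helix count


def secretion_route(p):
    """Return 'classical', 'non-classical', 'exosome', or None."""
    # Exclusion step; assumption: any predicted helix excludes the protein.
    if p.mitochondrial or p.tm_helices > 0:
        return None
    if p.has_signal_peptide and p.signalp_d > SIGNALP_D_CUTOFF:
        return "classical"
    if not p.has_signal_peptide and p.secretomep_nn > SECRETOMEP_NN_CUTOFF:
        return "non-classical"
    if p.in_exosome_db:
        return "exosome"
    return None


# Hypothetical records, not results from the study.
candidates = [
    Prediction("PROT_A", True, 0.80, 0.10, False, False, 0),
    Prediction("PROT_B", False, 0.10, 0.75, False, False, 0),
    Prediction("PROT_C", False, 0.05, 0.30, True, False, 0),
    Prediction("PROT_D", True, 0.90, 0.20, False, False, 2),
    Prediction("PROT_E", False, 0.02, 0.20, False, False, 0),
]

routes = {p.gene: secretion_route(p) for p in candidates}
secreted = sorted(gene for gene, route in routes.items() if route is not None)
print(routes)
print(secreted)
```

The exclusion check is applied first here for brevity; applying it after the route assignment, as described in the text, gives the same set of retained proteins.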

Protein-Protein Interaction Network and Gene Ontology Analysis
The proteins identified as secreted were submitted to the STRING database (Search Tool for Retrieval of Interacting Genes, version 10.5) [101] for the construction of a protein-protein interaction (PPI) network and ontology analysis of pancreatic cancer secretome components. For the construction of networks, we considered experiments, databases, co-expression, neighborhood, and co-occurrence as active interaction sources. The minimum required interaction score was 0.700 (high confidence), and nodes disconnected from the network were hidden to simplify the display. The PPI enrichment p-value provided by STRING indicates the statistical significance of the network. For the ontology analysis, we considered the top 15 terms with the lowest False Discovery Rate (FDR). The database was accessed in September 2018.
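The filtering applied to the network and to the enrichment table can be illustrated offline on exported results. The sketch below assumes the interactions are available as (protein, protein, combined score) tuples and the enrichment terms as (term, FDR) pairs; these layouts, the node names, and all numbers are hypothetical and are not the STRING export format or API.

```python
MIN_SCORE = 0.700  # "high confidence" threshold stated in the text


def high_confidence_edges(edges, min_score=MIN_SCORE):
    """Keep interactions whose combined score reaches the threshold."""
    return [(a, b, s) for a, b, s in edges if s >= min_score]


def connected_nodes(edges):
    """Nodes that still have at least one edge (others are hidden)."""
    return {node for a, b, _ in edges for node in (a, b)}


def top_terms(terms, n=15):
    """Return the `n` enrichment terms with the lowest FDR."""
    return sorted(terms, key=lambda term: term[1])[:n]


# Hypothetical interactions and enrichment terms.
edges = [
    ("P1", "P2", 0.95),
    ("P1", "P3", 0.72),
    ("P2", "P3", 0.700),
    ("P4", "P5", 0.40),
]
terms = [("term_a", 0.03), ("term_b", 0.001), ("term_c", 0.2)]

kept = high_confidence_edges(edges)
visible = sorted(connected_nodes(kept))
best = top_terms(terms, n=2)
print(visible)
print(best)
```

In this toy network, P4 and P5 disappear from the display because their only interaction falls below the threshold.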

Gene Expression Profile in Pancreatic Ductal Adenocarcinoma
The transcriptional profile of genes encoding proteins identified in our meta-analysis as secreted in human pancreatic cancer was evaluated in 10 different cancers from the TCGA database [108] and compared with normal tissues from the TCGA and GTEx [104] databases, after being uniformly processed and unified by the Toil pipeline [103], with the web-based Gene Expression Profiling Interactive Analysis tool (GEPIA) [102]. Differentially expressed genes between tumor samples and normal samples were determined by one-way ANOVA, applying log2 fold-change > 1 and q-value < 0.01 as statistical cutoffs. These differentially expressed genes were further filtered for genes encoding predicted secreted proteins obtained from the HPA database (The Human Protein Atlas) [111] for 10 cancer types (PDAC, HNSCC, ESCC, GC, HCC, Lung AD, Lung SCC, COAD, AML, and BC) and compared with the set of secreted proteins identified by our meta-analysis. For data visualization, we constructed heatmaps and performed Principal Component Analysis (PCA) in the ClustVis web tool [185]. Additionally, secreted proteins with corresponding increased gene expression in pancreatic tumor samples according to GEPIA were submitted to the HPA database for immunohistochemical analysis of the selected proteins in tumor and normal tissues.
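The two cutoffs and the subsequent restriction to secreted proteins amount to a simple filter over a table of per-gene statistics. The sketch below assumes each gene already has a log2 fold-change (tumor versus normal) and a q-value; the gene names and values are hypothetical, and the `direction` parameter is our addition to make explicit whether the fold-change cutoff is applied to upregulated genes only or to the absolute value.

```python
def differentially_expressed(stats, fc_cutoff=1.0, q_cutoff=0.01, direction="up"):
    """Select genes passing the fold-change and q-value cutoffs.

    `stats` maps gene -> (log2 fold-change, q-value).
    direction="up" keeps log2FC > cutoff; direction="both" uses |log2FC|.
    """
    selected = set()
    for gene, (log2_fc, q_value) in stats.items():
        effect = log2_fc if direction == "up" else abs(log2_fc)
        if effect > fc_cutoff and q_value < q_cutoff:
            selected.add(gene)
    return selected


# Hypothetical statistics, not values from TCGA or GTEx.
stats = {
    "GENE_A": (2.3, 0.001),   # passes both cutoffs
    "GENE_B": (0.4, 0.0001),  # significant but small effect
    "GENE_C": (1.8, 0.2),     # large effect but not significant
    "GENE_D": (-1.5, 0.005),  # downregulated
}
secreted_set = {"GENE_A", "GENE_B", "GENE_D"}  # hypothetical secreted genes

up = differentially_expressed(stats, direction="up")
both = differentially_expressed(stats, direction="both")
up_secreted = sorted(up & secreted_set)
print(up_secreted)
```

Requiring both cutoffs discards genes with a significant but small change (GENE_B) as well as genes with a large but unreliable change (GENE_C).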

Prognostic Value of Secreted Protein Translated Transcripts in Predicting Pancreatic Ductal Adenocarcinoma Outcome
The SurvExpress database [107] was used for survival analysis and risk assessment in four different datasets of PDAC patients (PDAC-TCGA [108], PACA-AU-ICGC [109], GSE21501 [19], GSE28735 [20,79]), and in nine different cancer types from TCGA [108]. This tool allowed us to associate the expression of the 39 genes encoding secreted proteins identified in pancreatic cancer with the survival of cancer patients using Cox proportional hazards regression, according to risk groups estimated by an optimization algorithm. Morpheus [186] was applied to select the best set of biomarkers from SurvExpress results, using a clustering analysis based on Euclidean distance.
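To make the notion of risk groups concrete, the sketch below computes a prognostic index as the linear predictor of a Cox model (the sum of each gene's coefficient multiplied by its expression) and splits patients into two groups. It is a simplified illustration under stated assumptions: the coefficients and expression values are hypothetical rather than fitted, and a plain median split replaces the optimization algorithm mentioned above.

```python
from statistics import median


def prognostic_index(expression, coefficients):
    """Linear predictor of a Cox model: sum of beta_i * expression_i."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def split_by_median(indices):
    """Assign 'high' risk above the median index and 'low' otherwise.

    Assumption: a median split stands in for the optimized cut-point.
    """
    cut = median(indices.values())
    return {pid: ("high" if pi > cut else "low") for pid, pi in indices.items()}


# Hypothetical coefficients; real ones would come from a fitted Cox model.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.2}

# Hypothetical expression values per patient.
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0},
    "P2": {"GENE_A": 1.0, "GENE_B": 1.0},
    "P3": {"GENE_A": 3.0, "GENE_B": 0.0},
    "P4": {"GENE_A": 0.0, "GENE_B": 2.0},
}

indices = {pid: prognostic_index(expr, coefficients) for pid, expr in patients.items()}
groups = split_by_median(indices)
print(groups)
```

A positive coefficient means that higher expression raises the index and therefore the estimated risk, so patients with high GENE_A expression fall into the high-risk group in this toy example. Survival in the two groups would then be compared with a dedicated survival analysis package.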

Conclusions
Our integrative secretome and proteome meta-analysis in pancreatic cancer identified a set of 39 secreted proteins as potential biomarkers of the disease. The tumor gene expression profile of these 39 proteins predicted shorter survival in four different PDAC datasets (TCGA, ICGC, GSE21501, GSE28735) and nine different cancer types from the TCGA. The differential expression profile of this set of secreted proteins predicted worse overall survival in cancer patients and may also be used as potential therapeutic targets by acting on progression and resistance processes of pancreatic cancer.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/3/716/s1, Figure S1: Expression profile of 39 secretome genes in normal and tumor tissues from GTEx (Genotype-Tissue Expression) and TCGA (The Cancer Genome Atlas), respectively, Figure S2: Principal component analysis (PCA) of 39 secretome genes in 10 different tumor types from TCGA compared to their respective normal tissues, Figure S3: The expression profile of 39 secretome genes by PDAC stratifies patients into low-and high-risk groups, Figure S4: Validation of secreted protein expression in normal and pancreatic ductal adenocarcinoma (PDAC) tissues using immunohistochemical staining available at the Human Protein Atlas database, Table S1: Total number of proteins identified in proteomic studies of pancreatic cancer, Table S2: Secretome proteins (n = 156) identified in two or more pancreatic cancer proteomic studies, Table S3: Proteins (n = 132) identified in two or more pancreatic cancer proteomic studies, Table S4: Selected studies and proteins included in the secretome and proteome pancreatic cancer meta-analyses, Table S5: Verification of secretion nature of 43 proteins, Table S6: The Cancer Genome Atlas (TCGA) and The Genotype-Tissue Expression (GTEx) samples used for gene expression analyses, Table S7: Expression of 39 secreted proteins based on samples from four different PDAC tumor data sets predicts poor overall survival, Table S8: Secretome genes with increased expression in high-risk groups from different datasets for PDAC in SurvExpress, Table S9

Conflicts of Interest:
The authors declare no conflict of interest.