Potential Early Markers for Breast Cancer: A Proteomic Approach Comparing Saliva and Serum Samples in a Pilot Study

Breast cancer is the second leading cause of cancer death for women in the United States, and early detection could give patients the opportunity to receive early intervention. The current methods of diagnosis rely on mammograms, which have relatively high false-positive rates that cause anxiety in patients. We sought to identify protein markers in saliva and serum for the early detection of breast cancer. A rigorous analysis was performed on individual saliva and serum samples from women without breast disease and women diagnosed with benign or malignant breast disease, using the isobaric tags for relative and absolute quantitation (iTRAQ) technique and a random effects model. A total of 591 and 371 proteins were identified in saliva and serum samples, respectively, from the same individuals. The differentially expressed proteins were mainly involved in exocytosis, secretion, immune response, neutrophil-mediated immunity and cytokine-mediated signaling pathways. Using a network biology approach, significantly expressed proteins in both biological fluids were evaluated for protein-protein interaction networks and further analyzed as potential biomarkers for breast cancer diagnosis and prognosis. Our systems approach illustrates a feasible platform for investigating the responsive proteomic profile in benign and malignant breast disease using saliva and serum from the same women.


Introduction
Breast cancer is the second leading cause of cancer mortality for women in the United States and is estimated to result in 43,250 deaths in 2022 [1]. Early detection of breast cancer can reduce breast cancer-related mortality. Among women aged 50 years and older, reports have demonstrated a 20-40% reduction in breast cancer mortality in women who underwent mammography and clinical breast examination [2]. Among women screened at younger ages (40-49 years), mortality rates decrease by 13-23%. A detailed analysis of these data suggests that a survival rate of 96% could be achieved if women underwent mammography every three months [3]. However, the cost and risks of mammography (such as radiation exposure) at such an increased frequency are undesirable. Furthermore, despite the overall accuracy of mammography, the screening procedure may result in relatively high rates of false-positive (56%) and false-negative (22%) diagnoses in women younger than 50 years, especially in women with dense parenchymal breast tissue [4,5]. Because of these shortcomings, there is a need for additional diagnostic methods that enhance the sensitivity and specificity of breast cancer detection, particularly in women with dense breast tissue, and thereby reduce the need for unnecessary biopsies. As a complement to mammography, the measurement of biomarkers in saliva and/or serum could be critical for the early detection of breast cancer.
Saliva is considered an easily obtained clear fluid that reflects an individual's protein profile at the time of collection. Testing saliva as a diagnostic fluid meets the criteria for an inexpensive, non-invasive, reliable and relatively simple procedure that can be repeated with minimal discomfort to patients. In addition, providing a saliva sample may cause less anxiety in study participants than providing a blood sample [6].
The clinical utility of saliva as a diagnostic fluid is being recognized in several diseases, including cancer [7][8][9]. A meta-analysis revealed that salivary proteins represent good biomarkers for the diagnosis of several cancer types, including breast cancer [10,11]. Earlier studies of transcriptomic and proteomic signatures in saliva identified sensitive and specific biomarkers for the detection of breast cancer using two-dimensional difference gel electrophoresis (2D-DIGE) [9]. A recent review systematically summarizes proteomics-based technologies for comparing dysregulated proteins in breast cancer across several body fluids, including saliva and serum [12]. A variety of methods, including surface-enhanced laser desorption/ionization [13] and nano-liquid chromatography-quadrupole-time-of-flight technology [14], have been used to discover biomarkers for breast cancer in saliva and plasma. Moreover, the isobaric tags for relative and absolute quantitation (iTRAQ) technique has been used to identify salivary proteins as potential biomarkers for breast disease [15]. In a comparison study, the global-tagging iTRAQ technique was found to be more sensitive than the cysteine-specific isotope-coded affinity tag (cICAT) method, which in turn was as sensitive as the 2D-DIGE technique [16]. iTRAQ has an advantage over ICAT and other methods because several samples can be analyzed simultaneously, which reduces the time spent on mass spectrometry analysis [17]. Another advantage of iTRAQ is the ability to identify proteins with a wide range of pI values and molecular weights. In addition, iTRAQ allows relative and absolute quantification across different sample states, enabling a synchronous comparison of biological fluids such as saliva and serum from normal, benign and malignant breast disease cases.
We hypothesize that protein changes occurring in breast cells and their environment will be reflected in the saliva and serum of breast cancer patients. We further hypothesize that protein changes in the benign stages will differ from those in the malignant stages of breast disease. In the present study, we compared the proteomic profile in saliva and serum samples from women without breast disease (referred to as normal in our study), with benign breast disease, and with malignant breast disease using the iTRAQ technique. Several proteins were identified in both the benign and malignant groups that could be potential biomarkers for early detection and prognosis of breast cancer in women.
Based on the AUC values calculated from receiver operating characteristic (ROC) curve analysis of the saliva proteins for distinguishing breast tumor from normal breast tissue (Table 1), 14 proteins were rated as having outstanding (AUC > 90%) diagnostic ability, and 22 proteins each were rated as excellent (80-90%) and acceptable (70-80%) [18].

Proteins Identified in Serum Samples
A total of 371 proteins were identified in the serum samples by iTRAQ analysis (Table S2). Of these, the expression of 56 proteins was significantly (p < 0.05) altered in at least one of the B/N, M/N or M/B comparisons (Table 2). Per comparison, 29 proteins in B/N samples (13 up-regulated, 16 down-regulated), 30 proteins in M/N samples (11 up-regulated, 19 down-regulated) and 15 proteins in M/B samples (4 up-regulated, 11 down-regulated) were differentially expressed.

Enrichment Analysis of Proteins in Saliva Samples
GO enrichment analysis showed that in all three comparisons (B/N, M/N and M/B), most salivary proteins were involved in exocytosis, secretion, immune response, neutrophil-mediated immunity and the cytokine-mediated signaling pathway, but the number of proteins associated with these processes varied between groups (Figure 1A). Most proteins were localized in the extracellular exosome, extracellular space, secretory granule lumen, secretory vesicle or cytoplasmic vesicles, and again the number of proteins varied among the groups. In terms of molecular functions, the proteins were annotated with enzyme inhibitor activity, calcium ion binding, endopeptidase regulator activity and peptidase activity (Figure 1A, Table S3). KEGG pathway analysis identified a total of 25, 9 and 16 pathways (p < 0.05), and Reactome pathway analysis identified 44, 36 and 25 pathways (p < 0.05), for the B/N, M/N and M/B groups of saliva samples, respectively. The overall comparison among the groups can be found in Table S3. The top 10 enriched Reactome pathways associated with the significant proteins in each group are shown for B/N (Figure 1B), M/N (Figure 1C) and M/B (Figure 1D). Based on Reactome pathway analysis (p < 0.05), the saliva proteins identified by iTRAQ analysis in the B/N, M/N and M/B groups were mainly involved in neutrophil degranulation and the innate immune response.

Enrichment Analysis of Proteins in Serum Samples
The serum proteins were mostly involved in the regulation of biological processes, were located primarily in organelles or the extracellular region, and mostly displayed binding, catalytic or structural molecule activities (Figure 2A, Table S4).

Protein-Protein Interaction (PPI) Networks for Proteins in Saliva
The network of B/N consisted of 798 interactions among 28 significant saliva proteins and 602 of their first interacting neighbors. Among the major hub proteins in the B/N group, CDC42, HSP7C, PSA1 and PSA5 were down-regulated and had multiple interacting partners, whereas S10A8, CATD, FINC and LG3BP were up-regulated with a moderate number of interactions (Figure 3). The network of M/N consisted of 620 interactions among 44 significant proteins and 521 of their first interacting neighbors. The major hubs in the M/N group were CDC42, H2AX, PSB3 and PDIA1, which were down-regulated and had several to a moderate number of interacting partners, whereas LG3BP, STOM, ACTN2 and VPS41 were up-regulated with fewer interacting partners (Figure 3). In addition, the network of M/B consisted of 522 interactions among 19 significant proteins and 407 of their first interacting neighbors. Among the major hub proteins in the M/B group, TERA, FINC, HSP7C, PSB3, S10AB and ANXA1 were down-regulated and had several to a moderate number of interacting partners, whereas HS71A, PSA1, A2MG and TTHY were up-regulated with a moderate number of interactions (Figure 3). All the PPIs for each group in saliva are listed in Table S5.

Protein-Protein Interaction Networks for Proteins in Serum
Overall, the serum networks contained fewer significant proteins, fewer interactions and far fewer interacting partners than the corresponding PPI networks of saliva proteins. In particular, the network of B/N consisted of 161 interactions among 20 significant proteins and 151 of their first interacting neighbors. Among the major hub proteins in the B/N group, DYHC1, APOB, CATD, TSP1 and A2MG were down-regulated and had a moderate number of interacting partners, whereas CE290, PRDX2, CADH5 and APOC1 were up-regulated with a moderate number of interactions (Figure 3). The network of M/N consisted of 175 interactions among 20 significant proteins and 165 of their first interacting neighbors. The major hubs in the M/N group were MED30, SMC3, APOB and APOA1, which were down-regulated and had several to a moderate number of interacting partners, whereas PRDX2, HBB, FIBA and FETUA were up-regulated with fewer interacting partners (Figure 3). Additionally, the network of M/B consisted of 66 interactions among 11 significant proteins and 61 of their first interacting neighbors. Among the major hub proteins in the M/B group, APOA1, APOA2, GPKOW and TSP1 were down-regulated and had a moderate number of interacting partners, whereas VINC and HBB were up-regulated with a moderate number of interactions (Figure 3). All the PPIs for each group in the serum samples are listed in Table S5.

Protein Ratios across Serum and Saliva in B/N and M/N Groups
Following the iTRAQ analysis, proteins commonly identified in the saliva and serum samples of the B/N and M/N groups were fitted with two-way ANOVA models without an interaction term. As a result, we identified 17 proteins that differed significantly (p < 0.05) between serum and saliva (Table S6). A subset of these proteins, detected in 6-8 saliva and serum samples, is shown in Figure 4. These included alpha-1B-glycoprotein precursor (A1BG), fibrinogen alpha chain isoform alpha-E preproprotein (FIBA), alpha-1-antichymotrypsin precursor (AACT), extracellular matrix protein 1 isoform 3 precursor (ECM1), peroxiredoxin-2 (PRDX2), 78 kDa glucose-regulated protein precursor (ERP78) and galectin-3-binding protein precursor (LG3BP). PRDX2, A1BG, ECM1, ERP78 and FIBA showed lower ratios in saliva than in serum, whereas LG3BP and AACT ratios were higher in saliva than in serum from the same subjects. Upon further comparison between B/N and M/N in the saliva and serum samples, TSP1 was found to differ significantly in serum (p < 0.05). All of the above proteins were measured as ratios from the iTRAQ analysis and will need to be validated in the future by direct quantitation using Western blot analysis or ELISA.
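The additive (no-interaction) two-way ANOVA described above can be sketched for a single protein as follows. This is a minimal illustration assuming a balanced design, with fluid (saliva vs. serum) and comparison (B/N vs. M/N) as the two factors; applying it to log2-transformed ratios is our assumption, not a detail stated in the text. Because there is no interaction term, any interaction variation is pooled into the residual.

```python
from statistics import mean

def additive_two_way_anova(cells):
    """Two-way ANOVA without interaction for a balanced design.

    cells maps (fluid, comparison) -> list of replicate values for one protein
    (e.g. log2 iTRAQ ratios). Every cell must hold the same number of values.
    Returns F statistics and degrees of freedom for both main effects.
    """
    fluids = sorted({f for f, _ in cells})
    comps = sorted({c for _, c in cells})
    n = len(next(iter(cells.values())))  # replicates per cell
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch only handles balanced designs")
    values = [x for v in cells.values() for x in v]
    grand = mean(values)
    # Main-effect sums of squares from the marginal means of each factor
    ss_fluid = len(comps) * n * sum(
        (mean([x for (f, _), v in cells.items() if f == lvl for x in v]) - grand) ** 2
        for lvl in fluids)
    ss_comp = len(fluids) * n * sum(
        (mean([x for (_, c), v in cells.items() if c == lvl for x in v]) - grand) ** 2
        for lvl in comps)
    ss_total = sum((x - grand) ** 2 for x in values)
    # No interaction term: interaction variation stays in the residual
    ss_resid = ss_total - ss_fluid - ss_comp
    df_fluid, df_comp = len(fluids) - 1, len(comps) - 1
    df_resid = len(values) - len(fluids) - len(comps) + 1
    ms_resid = ss_resid / df_resid
    return {
        "F_fluid": (ss_fluid / df_fluid) / ms_resid,
        "F_comparison": (ss_comp / df_comp) / ms_resid,
        "df": (df_fluid, df_comp, df_resid),
    }
```

The p-values would then come from the upper tail of the F distribution with the returned degrees of freedom (for example, `scipy.stats.f.sf(F, df1, df2)`). Unbalanced data, which is likely when proteins are detected in only 6-8 samples, would instead require a regression-based fit.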

Prognostic Performance Analysis
When the association between the expression levels of genes encoding the significant proteins and prognostic outcome was investigated through survival analyses, all protein sets in the B/N and M/N saliva and serum samples (Figure 5) showed a significant impact on overall patient survival in breast cancer (p < 0.05), the exception being the M/B serum protein set. Based on the HR and p-values, the prognostic performance of the protein sets in the saliva data was more significant than that of the protein sets in the serum data for all groups. In addition, the B/N and M/N comparisons showed better prognostic performance than the M/B comparison in both the saliva and serum data. The prognostic performance of each gene encoding a significant protein, based on the high-risk vs. low-risk groups of the invasive breast carcinoma (BRCA) dataset obtained from The Cancer Genome Atlas (TCGA), was used to draw Kaplan-Meier (KM) plots and is presented as box plots for the significant proteins in the B/N, M/B and M/N groups in saliva as well as in serum (Figures S1-S6).

Discussion
When comparing the proteomic profiles of saliva and serum samples from the same women diagnosed with benign or malignant breast disease with those of women with no breast disease, we identified proteins that differed in expression levels. Analysis of the significant protein changes further suggested the involvement of several biological pathways and functions. We constructed potential protein-protein interaction networks among the hub proteins detected in serum and saliva samples and their interacting partners to identify candidate biomolecular markers to be explored for diagnosis or prognosis. Since mammography can lead to false positives and anxiety in subjects, using more than one biomarker from our analysis could improve the early diagnosis of breast cancer through non-invasive testing of saliva and/or serum.
Interestingly, several proteins in our saliva and serum proteomic analysis qualified as having outstanding or excellent diagnostic power based on their AUC values (Tables 1 and 2). However, the ROC curve analysis was based on RNA-Seq data from breast tumor tissues compared with normal tissues from the TCGA database; therefore, it is worthwhile to investigate in a larger cohort which of these secretory proteins succeed as biomarkers for early breast cancer diagnosis.
Several circulating proteins have been identified in the plasma and serum of patients with breast cancer [19], but highly sensitive and specific biomarkers are still lacking. Below, we discuss some of the pertinent proteins that were significantly altered among the different groups (B/N, M/N and M/B) in our analysis of saliva and/or serum, and their relevance as potential biomarkers for breast cancer.
Saliva: Fibronectins bind cell surfaces and various compounds, including collagen, fibrin, heparin, DNA and actin. In our analysis, fibronectin isoform 11 preproprotein (FINC) was up-regulated 1.97-fold in B/N (p < 0.05) and did not change in M/N, whereas it was down-regulated to 0.56-fold in the M/B group (p < 0.05). A liquid biopsy detecting FINC on circulating extracellular vesicles has been reported as a promising method for detecting early breast cancer [20]. FINC was one of the hub proteins, with 15 interacting partners, and had an AUC of 93.05%, indicating outstanding diagnostic power.
SPARC-like protein 1 isoform 1 precursor (SPRL1) is an extracellular matrix glycoprotein that has been implicated in the pathogenesis of several disorders, including cancer. In our analysis, SPRL1 was down-regulated to 0.15-fold in B/N (p < 0.05) and to 0.44-fold in M/N (p > 0.05). Previously, significantly reduced expression of SPRL1 was observed in human breast cancer tissues compared with normal breast epithelial tissues, at both the mRNA and protein levels. In addition, the down-regulation of SPRL1 was significantly correlated with lymphatic metastasis [21]. SPRL1 showed outstanding diagnostic power, with an AUC of 96.5%.
Histone H2AX (H2AX) is a histone protein of the H2A family encoded by the H2AFX gene. An important phosphorylated form is γH2AX (S139), which forms when DNA double-strand breaks occur. In our analysis, H2AX was marginally up-regulated (1.16-fold) in B/N (p > 0.05) but significantly down-regulated in M/N (0.5-fold, p < 0.05), and down-regulated to 0.39-fold in the M/B group (p > 0.05). Evaluating the formation of γH2AX in breast tumor tissue could potentially be a sensitive means of early breast cancer detection, as these levels may reflect endogenous genomic instability in breast cancer tissues [22]. Additionally, the detection of γH2AX could benefit early cancer screening, including for breast cancer [23]. Although we found H2AX to be down-regulated in the M/N group, it is notable that H2AX was detectable in saliva, which could make it convenient for monitoring breast disease. H2AX was one of the hub proteins, with 102 interacting partners, and had an AUC of 93.7%, indicating outstanding diagnostic power.
Cystatin-SN precursor (CYTN) belongs to the type 2 cystatin superfamily, which restricts the proteolytic activities of cysteine proteases. In our analysis, CYTN was marginally up-regulated 1.94- and 1.96-fold in the M/N and M/B groups, respectively, while only a modest change (1.09-fold) was noted in benign samples (p > 0.05). CYTN promotes cell proliferation, clone formation and metastasis in breast cancer cells and has been proposed as a potential prognostic biomarker and therapeutic target for breast cancer [24]. CYTN showed outstanding diagnostic power, with an AUC of 93.1%.
Serum: Hemoglobin subunit beta (HBB) is a member of the globin family, a structurally conserved group of proteins often containing the heme group, which have the ability to reversibly bind O2 and other gaseous ligands in erythrocytes [25]. In our analysis, HBB was up-regulated 1.97- and 2.04-fold in the M/N and M/B groups, respectively (p < 0.05), but moderately down-regulated in the B/N group. This protein has been implicated as a potential biomarker of breast cancer progression [26]. It was one of the hub proteins, with 5 interacting partners, and had an AUC of 93.7%, indicating outstanding diagnostic power.
Retinol-binding protein 4 (RET4) is a recently identified adipokine that is elevated in patients with obesity or type 2 diabetes [27]. The iTRAQ analysis revealed that RET4 was up-regulated 1.48-fold in the M/N group (p < 0.05), and our data suggest that it may be detectable earlier (1.40-fold increase in B/N, p > 0.05). In a case-control study, higher serum RET4 levels were associated with the risk of breast cancer [28]. It was one of the hub proteins, with just 1 interacting partner (TTHY), and had an AUC of 93.5%, indicating outstanding diagnostic power.
Cadherin-5 isoform X1 (CADH5) is a member of the cadherin family of calcium-dependent cell adhesion proteins. Previously, CADH5 emerged from a glycoproteomic approach as a novel biomarker for metastatic breast cancer [29]. The iTRAQ analysis revealed that CADH5 was up-regulated 1.18-fold in the M/N group (p > 0.05) and, as suggested by its 1.50-fold increase in B/N (p < 0.05), may already be detectable in benign breast disease. It was one of the hub proteins, with 8 interacting partners, and had an AUC of 91.6%, indicating outstanding diagnostic power.
Von Willebrand factor preproprotein (VWF) is a large multimeric plasma glycoprotein that plays important roles in normal hemostasis. VWF can also affect cancer cell metastasis [30], and more recently the same group showed that breast cancer cells mediate endothelial cell activation and promote VWF release [31]. In our analysis, VWF was elevated 1.57-fold in serum samples from women diagnosed with benign breast disease (p < 0.05), so it may be a potential marker of endothelial cell damage early in the disease. It was one of the hub proteins, with 4 interacting partners, and had an AUC of 91.6%.
Alpha-2-macroglobulin isoform X1 (A2MG) is a protease inhibitor and cytokine transporter that inhibits a wide range of proteases, including trypsin, thrombin and collagenase. Although it has a high AUC value for diagnosis (92.4%), it was modestly down-regulated in both benign and malignant samples (p < 0.05). Others have reported its levels to be lower [32] or higher [14] in breast cancer.
Peroxiredoxin-2 (PRDX2), like peroxiredoxins in general, catalyzes the reduction of peroxides and maintains the balance of intracellular H2O2 levels. In our analysis, PRDX2 was up-regulated 1.89- and 2.16-fold in the B/N and M/N groups, respectively (p < 0.05), but showed no change in the M/B group. High mRNA expression of PRDX1/2/4/5/6 was significantly associated with shorter relapse-free survival in breast cancer patients [33]. It was one of the hub proteins, with 16 interacting partners, and had an AUC of 80.2%, indicating excellent diagnostic power.
Among the proteins commonly identified in serum and saliva, PRDX2, LG3BP and TSP1 are promising candidates for further investigation in a larger cohort to distinguish the benign from the malignant stage of breast disease. Moreover, some of the proteins identified in the present study have been associated with Hallmarks of Cancer-specific proteins in breast cancer [34], including FINC, proteasome subunit alpha type-1 isoform 2 (PSA1), proteasome subunit alpha type-5 isoform 1 (PSA5), proteasome subunit beta type-3 (PSB3), phosphoglycerate kinase 1 (PGK1), heat shock cognate 71 kDa protein isoform 1 (HSP7C) and glutathione S-transferase P (GSTP1), which may provide insights into the early detection of breast disease.

Study Subjects
Subjects were recruited at the Hershey Medical Center Breast Cancer Center during their routine visit for a mammogram. Sixty healthy adult women with no breast disease, 13 adult women diagnosed with benign breast disease and 15 adult women diagnosed with malignant breast disease were enrolled in the study. All participants provided written informed consent, following the protocol approved by the Pennsylvania State University Institutional Review Board (STUDY00005159). The inclusion criteria were: English-speaking female volunteers, 25-85 years of age, who had undergone a mammogram examination and were current non-smokers. Exclusion criteria included any evidence of cancer other than breast cancer and treatment for breast cancer before saliva and blood sample collection. Subjects with any abnormality detected on the mammogram were advised to undergo a biopsy. Diagnoses of the breast biopsy tissues, based on surgical pathology reporting of hematoxylin and eosin-stained sections, were provided by board-certified pathologists in the Department of Pathology at the Penn State College of Medicine. Table 3 provides the characteristics of the subjects used for iTRAQ analysis.

Collection and Storage of Biological Samples
Saliva and blood samples were collected in the fasting state. Saliva samples were centrifuged at 10,000 rpm for 15 min at 4 °C, and the clear supernatants were aliquoted into 1 mL screw-cap vials. For serum, clotted blood was separated and centrifuged at 1300 rpm for 15 min at 4 °C. The clear serum was aliquoted into 1 mL screw-cap vials. All biological samples were stored at −80 °C until analyzed.

Sample Processing and Labeling Procedure for iTRAQ Analysis
Eight saliva and serum samples from the participants in each group were processed for iTRAQ analysis as described earlier [35,36]. The serum samples, but not the saliva samples, were depleted of the 6 most abundant proteins (albumin, IgG, IgA, transferrin, haptoglobin and antitrypsin) using a Multiple Affinity Removal System LC Column (Agilent Technologies, Santa Clara, CA). Briefly, equal amounts of protein (100 µg) from each sample were digested with trypsin and subsequently labeled with one of 8 unique isobaric tags using the iTRAQ® Reagent-8Plex Multiplex kit (AB SCIEX, Framingham, MA). After MS/MS fragmentation, the quantitative reporter fragments, ranging from 113 to 121 Da, show what proportion of each peptide peak came from each of the individually labeled samples. The Penn State College of Medicine's Proteomic Core Facility received the tagged samples, which were resolved by two-dimensional liquid chromatography prior to triple time-of-flight (TOF) mass spectrometry. Peptide identification, protein grouping and subsequent protein quantitation were performed using the Paragon algorithm as implemented in Protein Pilot 5.0 software (ProteinPilot 5.0, which contains the Paragon Algorithm 5.0.0.0, build 4632 from ABI/MDS-SCIEX), searching the NCBI human database plus a list of 389 common contaminants (see Appendix A for details). The values reported by this software are ratios between samples with defined diagnoses (e.g., B/N, M/N or M/B). A B/N ratio significantly greater than 1 indicates a differential increase of the protein in benign compared with normal samples, and a B/N ratio significantly less than 1 indicates a differential decrease in benign compared with normal samples (and similarly for M/N and M/B).
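How reporter-ion intensities turn into the B/N, M/N and M/B ratios described above can be sketched in a few lines. This is only an illustration: the `CHANNELS` mapping of 8-plex reporter masses to diagnostic groups is hypothetical (the channel layout is not given here), and summarizing peptides by their median ratio is a simple stand-in, not the Paragon algorithm used by ProteinPilot.

```python
from statistics import median

# Hypothetical assignment of iTRAQ 8-plex reporter ions (m/z) to groups;
# the actual channel layout used in the study is not stated.
CHANNELS = {"N": [113, 114, 115], "B": [116, 117, 118], "M": [119, 121]}

def peptide_ratio(reporters, num, den):
    """Mean reporter intensity of group `num` divided by that of group `den`."""
    mean_num = sum(reporters[mz] for mz in CHANNELS[num]) / len(CHANNELS[num])
    mean_den = sum(reporters[mz] for mz in CHANNELS[den]) / len(CHANNELS[den])
    return mean_num / mean_den

def protein_ratio(peptides, num, den):
    """Median of peptide-level ratios as a simple protein-level summary.

    A value above 1 (e.g. for "B" over "N") means the protein is relatively
    higher in the numerator group; below 1 means it is relatively lower.
    """
    return median(peptide_ratio(p, num, den) for p in peptides)
```

Whether such a ratio is *significantly* different from 1 still requires the statistical model described in the Statistical Analyses section.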

Gene Set Over-Representation Analysis
Functional annotations associated with the significant proteins determined from the B/N, M/N and M/B comparisons in the saliva and serum data were identified in terms of biological processes and signaling and metabolic pathways by over-representation analyses using ConsensusPathDB [37]. Kyoto Encyclopedia of Genes and Genomes (KEGG) [38] and Reactome [39] were used as the pathway databases, while the biological process, cellular component and molecular function annotations were determined using Gene Ontology (GO) [40] annotations. The significance of over-representation was evaluated with Fisher's exact test, followed by Benjamini-Hochberg correction of the p-values. Functional enrichment results with an adjusted p-value < 0.05 were considered statistically significant.
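The over-representation test described above can be sketched in plain Python: a one-sided Fisher's exact test is the upper tail of the hypergeometric distribution, and the resulting p-values are then adjusted with the Benjamini-Hochberg procedure. This is a minimal illustration of the statistics, not ConsensusPathDB's implementation; the gene-set dictionary and the background set are hypothetical inputs.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K in set, n query genes).

    This upper tail is the one-sided Fisher's exact test p-value for over-representation.
    """
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def over_representation(query, background, gene_sets):
    """Test each gene set for enrichment in `query`; returns BH-adjusted p-values."""
    background = set(background)
    query = set(query) & background
    names, pvals = [], []
    for name, members in gene_sets.items():
        members = set(members) & background
        overlap = len(query & members)
        names.append(name)
        pvals.append(hypergeom_sf(overlap, len(background), len(members), len(query)))
    return dict(zip(names, benjamini_hochberg(pvals)))
```

Sets with an adjusted p-value below 0.05 would be reported as enriched, matching the threshold stated above.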

Construction of Protein-Protein Interaction Network
Using physical protein-protein interaction (PPI) data comprising 68,948 interactions among 10,835 proteins that were experimentally detected in humans and stored in the BioGRID database (version 4.4.210) [41], PPI networks were constructed around the significant proteins found in the three comparisons (B/N, M/N, M/B) in the saliva and serum data by enriching them with their first-neighbor interactions. The PPI networks were visualized with Cytoscape (v.3.7.0) software [42].
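The first-neighbor expansion described above can be sketched as a simple edge-list filter. This is an illustration only: the text does not state whether interactions among the first neighbors themselves were kept, so that choice is exposed here as a hypothetical `include_neighbor_edges` flag, and the degree count is just one simple way to rank hub proteins by their number of interacting partners.

```python
from collections import Counter

def first_neighbor_network(interactions, seeds, include_neighbor_edges=False):
    """Expand seed proteins with their direct (first-neighbor) interactors.

    interactions: iterable of (protein_a, protein_b) pairs, e.g. physical PPIs.
    Returns (nodes, edges); each undirected edge is stored as a sorted tuple.
    """
    seeds = set(seeds)
    # Normalize undirected edges and drop self-interactions
    edges = {tuple(sorted(pair)) for pair in interactions if pair[0] != pair[1]}
    kept = {e for e in edges if e[0] in seeds or e[1] in seeds}
    nodes = seeds | {p for e in kept for p in e}
    if include_neighbor_edges:
        # Optionally also keep interactions among the first neighbors themselves
        kept |= {e for e in edges if e[0] in nodes and e[1] in nodes}
    return nodes, kept

def degrees(edges):
    """Number of interacting partners per protein, usable for ranking hubs."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts
```

Feeding the resulting node and edge sets into a visualization tool such as Cytoscape would then give networks like those in Figure 3.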

Prognostic Performance Analysis
The prognostic performance of the significant proteins in all three comparisons (B/N, M/N, M/B) for the saliva and serum data was assessed with survival analyses following the established pipeline [43,44], using RNA-Sequencing (RNA-Seq) data from 1102 patients with invasive breast carcinoma obtained from the TCGA database. Each individual was classified into the low- or high-risk group according to their risk score, the prognostic index (PI), defined as the linear component of the Cox model:

$$\mathrm{PI} = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \qquad (1)$$

where $x_i$ is the expression value of each gene and $\beta_i$ is the coefficient obtained from the Cox fitting. Survival analyses were performed using the survival package (v.3.3.1) [45] in R (v.4.0.4). KM survival plots visualized the survival time statistics calculated by the log-rank test, and a log-rank p-value < 0.05 was used as the cut-off for statistical significance of survival in each group. In addition, the hazard ratio (HR) was calculated to quantify the relative hazard for each KM plot.
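The risk-scoring and log-rank steps can be sketched in Python as follows. This is a minimal illustration, not the R survival pipeline used in the study: the β coefficients are taken as given (in practice they would come from a Cox fit such as `survival::coxph`), the median cut-off between high and low risk is an assumption because the text does not state the cut-off, and the two-group log-rank statistic is computed directly.

```python
import math
from statistics import median

def prognostic_index(expression, betas):
    """PI = beta_1*x_1 + ... + beta_p*x_p for one patient (Equation 1)."""
    return sum(b * x for b, x in zip(betas, expression))

def split_by_median(scores):
    """True = high risk. The median cut-off is an assumption of this sketch."""
    cut = median(scores)
    return [s > cut for s in scores]

def logrank_test(times, events, high_risk):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, high_risk) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)  # all at risk, high-risk at risk
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, high_risk) if ti == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

In use, one would compute the PI for every patient, split the cohort with `split_by_median`, and pass survival times, event indicators and the resulting group labels to `logrank_test`; a p-value below 0.05 would match the significance cut-off stated above.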

ROC Curve Analysis
ROC curve analysis was performed for each significant protein in each of the three comparisons (B/N, M/N and M/B) for the saliva and serum data, using RNA-Seq data from the BRCA dataset comprising 1102 tumor and 113 normal samples, to assess the diagnostic capability of the protein markers. The AUC of each ROC curve was calculated to measure how well the marker discriminates between the two diagnostic groups (tumor and normal). In general, an AUC of 0.5 suggests no discrimination, 0.7 ≤ AUC < 0.8 suggests acceptable discrimination, 0.8 ≤ AUC < 0.9 suggests excellent discrimination and AUC ≥ 0.9 is considered outstanding discrimination [18]. ROC analyses were performed with the pROC package [46] in R.
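The AUC can be illustrated through its rank-based (Mann-Whitney) form: the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, with ties counted as one half. The Python sketch below is illustrative only, since pROC was used in the study. It assumes that higher values indicate tumor, and the labels follow the thresholds quoted above, with values below 0.7 left ungraded; the function names and data are hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """Empirical ROC AUC as P(tumor value > normal value), ties counted as 1/2.

    Assumes higher expression indicates the tumor class.
    """
    score = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                score += 1.0
            elif t == n:
                score += 0.5
    return score / (len(tumor) * len(normal))


def discrimination_label(auc: float) -> str:
    """Categories quoted in the text; values below 0.7 are not graded there."""
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "not graded (< 0.7)"


# Hypothetical expression values for one marker.
auc = auc_mann_whitney([5.2, 6.1, 4.8, 7.0], [3.9, 5.0, 4.1])
label = discrimination_label(auc)
```

For a marker that is down-regulated in tumors, the direction is reversed and 1 - AUC applies; pROC can handle this through the `direction` argument of `roc`.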

Statistical Analyses
To combine protein ratios across separate iTRAQ runs and to determine whether protein ratios differed significantly in each of the three comparisons (B/N, M/N and M/B), the ratios were modeled using a random effects model as described previously [35,47].
Briefly, this procedure used a weighted average of individual ratios across multiple iTRAQ runs to estimate an overall ratio. The weights were proportional to the inverse of
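The weighting scheme of [35,47] is not fully specified here. As a hedged illustration only, the Python sketch below assumes a standard DerSimonian-Laird random-effects estimator, in which each run's log2 ratio is weighted by the inverse of its within-run variance plus an estimated between-run variance (tau²). This may differ from the published procedure, and the function name and data are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class PooledRatio(NamedTuple):
    ratio: float      # pooled ratio on the original (linear) scale
    log_ratio: float  # pooled log2 ratio
    se: float         # standard error of the pooled log2 ratio
    tau2: float       # estimated between-run variance
    p_value: float    # two-sided test of log2 ratio = 0 (ratio = 1)


def random_effects_ratio(log_ratios: Sequence[float], variances: Sequence[float]) -> PooledRatio:
    """DerSimonian-Laird random-effects pooling of per-run log2 protein ratios."""
    k = len(log_ratios)
    w = [1.0 / v for v in variances]  # inverse within-run variance weights
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ratios)) / sum_w
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ratios))  # Cochran's Q
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    # Between-run variance; zero when there is a single run or no excess spread.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, log_ratios)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    p_value = math.erfc(abs(pooled / se) / math.sqrt(2))  # two-sided normal p-value
    return PooledRatio(2.0 ** pooled, pooled, se, tau2, p_value)


# Hypothetical log2 ratios of one protein from three iTRAQ runs.
result = random_effects_ratio([0.9, 1.2, 0.6], [0.04, 0.09, 0.05])
```

When the runs agree closely, tau² is estimated as zero and the estimate reduces to an ordinary inverse-variance weighted average; larger between-run disagreement flattens the weights and widens the standard error.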

Figure 1. GO classification and enrichment analysis of significantly expressed proteins in saliva samples of the B/N, M/N and M/B groups. (A) GO classification in biological processes, cellular components and molecular functions. Top 10 enriched Reactome pathways for (B) B/N, (C) M/N and (D) M/B groups.

Figure 2. GO classification and enrichment analysis of significantly expressed proteins in serum samples of the B/N, M/N and M/B groups. (A) GO classification in biological processes, cellular components and molecular functions. Top 10 enriched Reactome pathways for (B) B/N, (C) M/N and (D) M/B groups.

Figure 3. PPI networks for significant proteins in saliva and serum in the B/N, M/B and M/N groups. Proteins colored red are up-regulated and proteins colored green are down-regulated.

Figure 4. Protein ratios in the B/N and M/N groups for common proteins identified in the iTRAQ analysis. Box plots of significant proteins in 6-8 saliva and serum samples are shown.

Figure 5. KM plots estimating patient survival based on the significant proteins in the B/N, M/B and M/N groups for breast cancer, with the log-rank test p-value and HR indicated for each curve. Censored samples are shown as "+" marks. The horizontal axis represents time to event.

Table 1. Significant saliva proteins identified in normal, benign and malignant samples.

Table 2. Significant serum proteins identified in normal, benign and malignant samples.

Table 3. Characteristics of subjects used for iTRAQ analysis. Saliva and blood samples were obtained before the subjects received any treatment.