Plasma Proteome Signature to Predict the Outcome of Breast Cancer Patients Receiving Neoadjuvant Chemotherapy

Simple Summary The prognostic impact of plasma protein biomarkers in breast cancer patients treated with neoadjuvant chemotherapy (NCT) was evaluated using a proteomics approach. Three biomarkers were identified among differentially expressed proteins. The plasma concentration of APOC3 was higher in the pathological complete response (pCR) group, whereas MBL2, ENG, and P4HB were upregulated in the non-pCR group. Univariate survival analysis was performed to identify protein biomarkers that could classify patients into low- and high-risk groups. The results showed that MBL2 and P4HB were statistically significantly associated with disease-free survival (log-rank test p < 0.05); P4HB was statistically significantly associated with overall survival (log-rank test p < 0.05), whereas MBL2 was statistically significantly associated with distant metastasis-free survival (log-rank test p < 0.05). The results demonstrated that protein markers from non-invasive liquid biopsy sampling correlate with pCR and survival in breast cancer patients receiving NCT. Further investigation of these protein markers may help clarify their role in predicting prognosis and thus their therapeutic potential for preventing metastasis. Abstract The plasma proteome of 51 non-metastatic breast cancer patients receiving neoadjuvant chemotherapy (NCT) was prospectively analyzed by high-resolution mass spectrometry coupled with nano-flow liquid chromatography using blood drawn at the time of diagnosis. Plasma proteins were identified as potential biomarkers, and their correlation with clinicopathological variables and survival outcomes was analyzed. Of 51 patients, 20 (39.2%) were HR+/HER2-, five (9.8%) were HR+/HER2+, five (9.8%) were HER2+, and 21 (41.2%) were triple-negative subtype. During a median follow-up of 52.0 months, there were 15 relapses (29.4%) and eight deaths (15.7%). Four potential biomarkers were identified among differentially expressed proteins: APOC3 had higher plasma concentrations in the pathological complete response (pCR) group, whereas MBL2, ENG, and P4HB were higher in the non-pCR group. Proteins statistically significantly associated with survival and capable of differentiating low- and high-risk groups were MBL2 and P4HB for disease-free survival, P4HB for overall survival, and MBL2 for distant metastasis-free survival (DMFS). In the multivariate analysis, only MBL2 was a consistent risk factor for DMFS (HR: 9.65, 95% CI 2.10–44.31). The results demonstrate that the proteomes from non-invasive sampling correlate with pCR and survival in breast cancer patients receiving NCT. Further investigation may clarify the role of these proteins in predicting prognosis and thus their therapeutic potential for the prevention of recurrence.


Introduction
Neoadjuvant chemotherapy (NCT) provides several benefits for locally advanced breast cancer (BC) patients. Down-staging of tumors may increase the probability of breast conservation with better cosmesis [1][2][3]. Additionally, it allows in vivo monitoring of the response of tumors to therapy, which could be helpful for predicting pathological responses [4][5][6][7][8][9]. Tumors that respond well to a given therapy show better outcomes, and pathological complete response (pCR) is a surrogate factor for survival in the neoadjuvant setting [10][11][12]. However, the value of pCR for predicting prognosis in different subtypes of BC, especially in estrogen receptor (ER)-positive and HER2-negative tumors, is under debate. Because of substantial inconsistencies between clinical, radiological, and pathologic responses [13], extensive clinical/laboratory research has focused on achieving a more accurate prediction of the treatment response.
The Oncotype DX ® test provides genomic-based personalized information that enables the design of individualized treatments for patients with ER-positive primary BC [14]. The assay provides information on clinicopathological variables and has been validated for use in predicting the response to chemotherapy in ER-positive and HER2-negative BC. However, the assay requires a surgical specimen (formalin-fixed paraffin-embedded (FFPE) >10 × 10 µm) and two to three weeks of processing time, with costs higher than USD 4000. In addition, its indication is limited to the ER-positive and HER2-negative subtype. Plasma cancer biomarkers can provide useful information relevant to the diagnosis of several malignancies. In BC, several serum tumor biomarkers have been proposed, including the MUC-1 antigen (CA 15-3), the onco-fetal protein carcinoembryonic antigen (CEA), the oncoprotein HER-2/neu, and the cytokeratin tissue polypeptide specific antigen (TPS). However, because of their low sensitivity and specificity, the clinical utility of these serum markers is limited [15][16][17][18]. Compared with tissue biopsy, a quick and easy 'liquid biopsy' provides a less invasive and simpler method to assess tumor response through simple blood collection [19,20]. Circulating tumor cells (CTCs), which are detected by liquid biopsy, serve as a prognostic factor in metastatic BC, where a high CTC level is associated with poor prognosis [21,22]. However, analysis of CTCs in non-metastatic tumors has shown conflicting results [23][24][25][26][27], including in BC patients treated with NCT [28][29][30][31].
In translational research, proteomic-based BC research uses clinical samples such as plasma and tissues, to improve BC care through screening, early or companion diagnostics studies, predicting prognosis, subtyping, predicting metastasis and therapeutic responses, and discovering new drug targets [32][33][34][35][36][37]. Some protein signatures associated with the response to NCT were identified using FFPE tissues. PYCR1 and ALDH18A1 were identified by comparing the proteome of tissue before and after NCT with that of normal tissue [38]. The combination of four proteins, RAC2, RAB6A, BIEA, and IPYR, showed the best performance for predicting recurrence after NCT in triple-negative BC (TNBC) patients [39]. CD45 was identified as a predictor of pCR to neoadjuvant HER2-targeted therapy using the spatial proteomics approach [40]. The VEGF inhibition response predictor score, which is derived using nine-protein signatures, predicts the response to bevacizumab NCT in HER2-negative BC [41]. Analysis of blood samples identified differentially expressed proteins related to the pathological criteria for predicting patient responses to NCT [42][43][44][45][46], and blood exosomes [47] have been analyzed using proteomic approaches to predict the response to NCT.
Recent proteomic-centric multiomics studies analyzed BC clinical samples to classify BC into subtypes using protein and phosphorylated protein information, which led to the identification of novel subtypes such as basal-like and luminal B tumors by infiltration of immunological components [48]. Mass spectrometry-based proteomics that include analysis of post-translational modifications such as phosphorylation or acetylation, combined with next-generation DNA and RNA sequencing profiles, may provide a more comprehensive description of breast tumors [49][50][51]. Proteogenomic approaches highlight the potential of proteomics for clinical research in cancer through the identification of targetable signaling pathways and more precise curation of the biological signatures of tumor heterogeneity. Specifically, different genetic backgrounds may affect the inhibitory relationship between target kinases and tumor suppressors by post-translational modification.
Reliable biomarkers related to treatment response and survival in BC patients receiving NCT are currently lacking, and comorbidities are important for chemotherapy indication and regimen selection [52]. Proteome analysis may provide more reliable and direct information that can be used for monitoring the treatment response and for selecting specific treatment agents. Additionally, plasma proteome analysis can overcome the limitations of current biomarkers and tools by providing faster processing times (<2 days) and by using small samples (40 µL plasma). The aim of the present study was to identify potential biomarkers for predicting the response to therapy and for predicting recurrence by performing whole proteome analyses of the plasma of BC patients undergoing neoadjuvant systemic therapy.

Study Participants and Surveillance
Inclusion criteria for the baseline database of the prospective cohort were, (1) women aged >20 years, (2) pathologically diagnosed primary invasive breast cancer, (3) no distant metastasis at time of diagnosis (no de novo stage IV disease), (4) had undergone preoperative systemic treatment with curative intent, and (5) had agreed and signed consent. All patients had 4 mL blood drawn at time of diagnosis before chemotherapy. Among the 60 initial consecutively enrolled patients treated with NCT at Asan Medical Center in Seoul, Korea, between February 2014 and April 2017, 51 were eligible for final analysis with sufficient protein extracted for analysis. All patients' survival outcomes were updated, including loco-regional recurrences, distant metastasis and death information. The study was approved by the Institutional Review Board (IRB) of Asan Medical Center (Seoul, Korea; IRB-e no. 2013-1048), and was performed in compliance with the REMARK criteria [53]. Written informed consent was obtained from all participants. All experiments were performed in accordance with the relevant guidelines and regulations.
All patients of the study received standard treatment, and regular surveillance was performed. The initial diagnostic and follow-up work-up included mammography, breast ultrasound imaging, magnetic resonance imaging, chest X-rays, blood sampling, and clinical examination. ER and progesterone receptor expression were evaluated based on the Allred score [54]. HER2 status was considered negative if the immunohistochemistry score was 1+, or if the score was 2+ and the result of fluorescence or silver in situ hybridization for HER2 amplification was negative [55]. Clinical and histopathological staging was based on the 7th edition of the Cancer Staging Manual of the American Joint Committee on Cancer [56]. The clinical treatment response was evaluated by both physical examination and imaging assessments at each treatment timeline (baseline, after the first treatment, and after completing the course of NCT). Tumor response was assessed by the Response Evaluation Criteria In Solid Tumors (RECIST 1.1) [57,58].

Sample Preparation
For the proteomic analyses, 51 clinical plasma samples were prepared for LC-MS analysis. Plasma samples were loaded onto a MARS14 column (100 × 4.6 mm; Agilent Technology, Palo Alto, CA, USA) on a Shimadzu binary HPLC system (20A Prominence; Shimadzu, Tokyo, Japan) in order to deplete 14 highly abundant proteins; the unbound fraction was lyophilized with a cold trap (CentriVap Cold Traps; Labconco, Kansas City, MO, USA). Dried samples were resuspended in 400 µL of 5% SDS in 50 mM TEAB (pH 7.55), and dithiothreitol was added to a final concentration of 20 mM for 10 min at 95 • C to reduce disulfide bonds. Reduced samples were then incubated with 40 mM iodoacetamide for 30 min at room temperature in the dark. By a 10-fold dilution of 12% phosphoric acid, acidified samples were loaded onto S-Trap mini columns (ProtiFi, Farmingdale, NY, USA; Cat. No: CO2-mini-80). We treated suspension-trapping (S-trap) proteolysis according to the manufacturer's protocol, followed by the addition of 10 µg Lys-C/trypsin mixture and incubation for 16 h at 37 • C [59]. The eluted peptide mixture was lyophilized using a cold trap and stored at −80 • C until use.

Nano-LC-ESI-MS/MS Analysis
The LC system was an Dionex UltiMate 3000 RSLCnano system (Thermo Fisher Scientific, Waltham, MA, USA). Mobile phase A was 0.1% formic acid and 5% DMSO in water and mobile phase B was 0.1% formic acid, 5% DMSO and 80% acetonitrile in water. Samples were reconstituted with 25 µL of mobile phase A, injected with a full sample loop injection of 5 µL into a C18 Pepmap trap column (20 × 100 µm i.d., 5 µm, 100 Å; Thermo Fisher Scientific), and separated in Acclaim™ Pepmap 100 C18 column (500 × 75 µm i.d., 3 µm, 100 Å; Thermo Fisher Scientific) over 200 min (250 nL/min) at 50 • C. The column was priory equilibrated with 95% mobile phase A and 5% mobile phase B. A gradient of 5-40% B for 150 min, 40-95% for 2 min, 95% for 23 min, 95-5% B for 10 min, and 5% B for 15 min were applied. The LC system was coupled to a Q Exactive plus mass spectrometer (Thermo Fisher Scientific) with a nano-ESI source. The instrument was operated in the datadependent mode. One scan cycle included one MS1 scan at a resolution of 70,000 at m/z 400 followed by 20 MS2 scans in higher energy collisional dissociation mode to fragment the 20 most abundant precursor ions identified in the MS1 spectrum. The target value for MS1 by Orbitrap was 3 × 10 6 with a maximum injection time of 100 ms. The ion target value for MS2 was set to 1 × 10 6 with a maximum injection time of 50 ms and a resolution of 17,500 at m/z 400. The dynamic exclusion was enabled with the following settings: repeat count = 1 and exclusion duration = 20 s. All MS data were deposited in the Proteomics Identification Database (PRIDE) archive under PXD028251 [60].

Protein Identification by Database Search
Individual raw files acquired MS analysis and were retrieved against the reviewed Human Uniprot-SwissProt protein database (released on May 2017) [61] using the SEQUEST-HT on Proteome Discoverer (Version 2.2, Thermo Fisher Scientific). Search parameters used were as follows: 10-ppm tolerance for precursor ion mass and 0.02 Da for fragmentation mass. Trypsin peptides tolerate up to two false cleavages. Carbamidomethylation of cysteines was set as fixed modification and N-terminal acetylation and methionine oxidation were set as variable modifications. The false discovery rate (FDR) was calculated using the target-decoy search strategy, and the peptides within 1% of the FDR were selected using the post-processing semi-supervised learning tool Percolator [62] based on the SEQUEST result. Label-free quantitation (LFQ) of proteins was calculated using the precursor ion peak intensity for unique and razor peptides of each protein and excluded peptides with methionine oxidation.

Differential Data Analysis by Normalization and Filling Missing Data
The normalization method by endogenous normalization proteins, which is mainly used in LFQ, was performed [63,64]. In this study, four proteins (C6, HPX, KNG1, and Cancers 2021, 13, 6267 5 of 19 SERPINC1) were selected by NormFinder software [65] because the difference between the pCR and non-pCR group was the smallest. Since the quantitative values of the four proteins depend on the characteristics of the plasma sample, they were scaled by dividing the median values of the corresponding proteins in all samples. After that, the geometric mean of the adjusted ratio values of the four proteins for each sample is calculated, and this is defined as the normalization scaling factor (NSF) for that sample. The normalized quantitative values of the remaining proteins except for the four proteins were derived by dividing the raw protein quantitative values in each sample by the NSF. The details of this method are described in previous studies [66][67][68].
Proteins were selected based on >80% of quantified proteins in all samples, and the missing data were filled by the local least squared imputation method, calculating through correlation with 100% quantified proteins at the raw abundance [69]. After that, normalization was performed.

Statistical Clinical Model Generation Based on Feature Selection
Feature selection was performed to identify the optimal subset from four proteins [apolipoprotein C3 (APOC3), endoglin (ENG), mannose-binding lectin 2 (MBL2), and prolyl 4-hydrolase beta (P4HB)] to classify patients into two NCT response groups using a random forest (RF)-based backward elimination process [70]. This process consisted of the following two steps: first, 10,000 decision trees containing four variables were randomly generated, and area under the curve (AUC) values were calculated. The AUC values were used to determine the optimal number of proteins using out-of-bag error estimation, yielding a value of three. Second, 50 iterations and three-fold cross-validation were performed using the three selected variables to calculate the predictive importance of each variable included in the model. Three proteins (>0.5 probability of selection) were selected. Data preprocessing, including centering and scaling, were performed before model building. The training and validation sets were divided into thirds using whole data. In the training set, a linear kernel support vector machine (SVM) model [71] with optimized cost parameters was generated by three-fold cross-validation with three repeats. The RF model [72] was optimized with a mtry parameter by three repeats of three-fold crossvalidation with 10,000 trees and nodeSize = 5. Machine learning (ML) model prediction values were obtained in the validation set (without the training set samples), and ROC analysis was performed. This process was performed by randomly changing sets 100 times.

Mining Public Microarray Data
Microarray gene expression data (series accession number: GSE22513 [73][74][75] and GSE22093 [76,77]) were downloaded from the Gene Expression Omnibus database [78]. The GEO2R interactive web tool was used to extract three identifiers that matched the three selected genes according to the platform record and their expression values. When there were two or more probes for one gene, the median value was estimated.

Statistical Methods
Survival analyses were performed including DFS, DMFS, and OS. DFS was defined as the time from the date of enrollment into the study to the first date of any type of recurrence. DMFS and overall survival (OS) were defined as the time elapsed between the date of enrollment into the study and the date of distant metastasis or the date of death from any cause, respectively. Statistical analysis was performed using IBM SPSS Statistics Version 26. The univariate Kaplan-Meier method was used to estimate survival probabilities. Multivariate Cox proportional hazards regression analyses were performed for each proteome using the following clinical parameters: patient age at diagnosis; clinical tumor stage; nodal status; hormone receptor (HR) status; and HER2 status. A p-value of <0.05 was considered to be statistically significant.
Proteome data evaluation was performed by statistical language R 3.6.0 and RStudio 1.1.456 with the several packages that contained ggplot2 for displaying violin and volcano plots, mixOmics for PLA-DA [79], RVAideMemoire for calculating VIP scores, stats for applying the t-test, pcaMethods for missing data imputation, survival for survival analysis, survminer for determining cutoff values by maximally selected rank statistics (minimal proportion of observations per group: 20%), and GEOquery for downloading GEO sets.

Proteome Results from Clinical Plasma Samples by LC-MS/MS
A workflow was established for biomarker identification in BC patients with or without pCR after NCT ( Figure 1A). To identify prognostic marker candidates for pCR, clinical plasma samples were collected from 51 BC patients, including 15 with and 36 without pCR after NCT. Depleted plasma samples from the 51 study participants were used to analyze constitutive proteins via single LC-MS/MS runs, which led to the identification of 594 proteins. Among them, 548 proteins were quantified in one or more samples using a label-free quantification method. Among these, four relatively stable abundant proteins (C6, HPX, KNG1, and SERPINC1) were used to normalize the raw abundance of the other candidates, which were quantified in at least 80% of the samples [81] (Supplementary Figure S1A and Supplementary Table S2). Before normalization, missing values were imputed [82]. After normalization, 254 common proteins out of 305 proteins showed a significant positive correlation with the plasma concentrations of the published Plasma Proteome Database [83] (ρ = 0.657; Pearson's correlation coefficient, permutation p < 0.001; Figure S1B). Partial least squares-discriminant analysis (PLS-DA) indicated that the pCR and non-pCR groups were separated into two components, component 1 (6%) and component 2 (17%) ( Figure 1B). VIP score-ordered contributions are shown in Figure 1C. The top 26 proteins had VIP scores >1.5. Statistical analysis was performed to identify pCR prediction marker candidates. Four signature proteins annotated by molecular functional terms and processes were selected for building the clinical model. Finally, three biomarkers were selected and used in the survival analysis, including recurrence, death, and metastasis events.

Differentially Abundant Plasma Proteins between pCR and Non-pCR BC Patients
Statistical analysis was performed using the Student's t-test to identify differentially abundant plasma proteins (DAPs) between the two groups. A volcano plot was drawn to represent log2 fold-changes against negative log10 p-values. We identified a single upregulated protein in the pCR group and three proteins in the non-pCR group (p < 0.05 and

Differentially Abundant Plasma Proteins between pCR and Non-pCR BC Patients
Statistical analysis was performed using the Student's t-test to identify differentially abundant plasma proteins (DAPs) between the two groups. A volcano plot was drawn to represent log2 fold-changes against negative log10 p-values. We identified a single upregulated protein in the pCR group and three proteins in the non-pCR group (p < 0.05 and |fold-change| > 2; Figure 2A and Supplementary Table S3). We examined whether the abundance of the four proteins was related to the subtypes as confounding factors. Each protein was stratified by pCR status, and the quantitative differences according to subtype (HER2 and HR positive or negative) were statistically analyzed (p > 0.05; Supplementary Figure S2). The results confirmed that the subtype did not affect the four proteins as a confounding factor. Functional annotation of the proteins was performed using Enrichr [84]. Significant Functional annotation of the proteins was performed using Enrichr [84]. Significant differences between the two groups were identified using WikiPathways ( Figure 2B). P4HB and ENG were involved in the "VEGFA-VEGFR2 signaling pathway". ENG was also involved in "transforming growth factor beta binding" and "hypothesized pathways in pathogenesis of cardiovascular disease". P4HB was also associated with "type I collagen synthesis in the context of osteogenesis imperfecta". MBL2 was related to "complement system", "Ebola virus pathway on host", and "regulation of toll-like receptor signaling pathway". APOC3 was linked to "PPAR signaling pathway", "composition of lipid particles", and "statin inhibition of cholesterol production". In addition, we focused on the TNBC subtype, which shows the greatest long-term clinical benefit from pCR in BC [85]. Statistical analysis was performed as described above by dividing patients into pCR and non-pCR groups only in the TNBC subtype (Supplementary Figure S3). In all BC patients, one highly abundant protein, MBL2, was identified in the non-pCR group, and three highly abundant proteins, DCD, KNG1-2, and TLN1, were identified in the non-pCR group. Two highly abundant proteins, ALCAM and MAN1A1, were identified in the pCR group.

Multivariate Analysis for Predicting pCR Outcome
Multivariate analysis was performed using ML classifiers based on random forest (RF) [72] and SVM [71] to improve the predictive performance for distinguishing pCR from non-pCR patients. First, feature selection was performed with the four significant proteins by AUC-based RF backward elimination [70] according to a probability of selection >0.5 (Table 2) and independently performed 305 proteins as input (Supplementary Table  S4). The RF and SVM models were built with three proteins (MBL2, ENG, and P4HB). To avoid overfitting, threefold cross-validation was performed three times to generate 10,000 decision trees from the RF model, and a linear SVM model was applied. To confirm the robustness of the ML models, the sample was randomly trisected 100 times, and the model was then built with 2/3 of the sample and validated with 1/3 of the sample. Evaluation of the performance of the classifiers showed that the median AUC values for SVM and RF were 0.861 (95% CI: 0.845-0.873) and 0.861 (95% CI: 0.830-0.867), respectively ( Figure 3). The three plasma biomarkers were also expressed in tissues of BC patients. The mRNA expression levels of five proteins obtained from BC tissues by fine needle aspiration prior to NCT were analyzed. The data were obtained from two publicly available GEO datasets (GSE22513 [73][74][75] and GSE22093 [76,77]). ML models were built as described above. In GSE22513, the median AUC values for SVM and RF were 0.631 (95% CI: 0.613-0.643) and 0.646 (95% CI: 0.633-0.669), respectively. In GSE22093, the median AUC values for SVM and RF were 0.709 (95% CI: 0.684-0.713) and 0.658 (95% CI: 0.645-0.666), respectively.

Uniprot Accession
No.

Survival Analysis
During a median follow-up of 52.0 months, 15 relapses (29.4%) and eight deaths (15.7%) were observed. To determine the correlation of single plasma proteins with longterm clinical indicators such as DFS, OS, and DMFS, we performed univariate survival analysis for the three proteins in the model (MBL2, ENG, and P4HB), and the remaining 302 proteins were quantified. The Kaplan-Meier method was used to select the cutoff values based on the maximally selected rank statistics. At first, pCR was statistically a better prognostic factor than non-PCR for DFS, OS, and DMFS (log-rank test p < 0.05). MBL2 and P4HB for DFS, P4HB for OS, and MBL2 for DMFS were statistically significant in dividing patients into low-risk and high-risk groups (log-rank test p < 0.05; Figure 4, Supplementary Figure S4A,B). Among the remaining 302 proteins quantified, the prognosis with respect to the three survival results in the two patient groups was separated by a threshold of protein quantification values: 84 proteins for DFS, 46 proteins for OS, and 96 proteins for DMFS were statistically significant for classifying patients into high-risk and low-risk groups (log-rank test p < 0.05; Supplementary Table S5). We also analyzed the prognosis of patients according to Miller-Payne grades in the patients with partial response, but couldn't find a significant correlation ( Figure S5). In the multivariate Cox analysis of survival with following factors: patient age at diagnosis; clinical tumor stage; nodal status; hormone receptor (HR) status; and HER2 status, no factor showed significant correlation. However, in DMFS analysis with proteins, MBL2 was identified as the only consistent risk factor (HR: 9.65, 95% CI: 2.10-44.31, p = 0.004; Table 3). In other survival analyses including DFS and OS, none of the proteins demonstrated a significant correlation. All three protein (MBL2, ENG, P4HB) levels were significantly increased as the pathological stages elevated (Supplementary Table S6).  Patients were divided into two risk groups according to MBL2 abundance: low abundance group (n = 34, 17.6%) and high abundance group (n = 17, 52.9%); HR, hormone receptor. * No events in the node negative group.

Discussion
Despite considerable advances in our understanding of BC biology, the design of therapeutic approaches is dependent on and guided by molecular profiling that categorizes tumors according to HR and HER2 status [86]. Despite the improvement of treatment strategies related to HR and HER2 status, recent emerging global trends show increased BC mortality rates [87], which are attributed to treatment resistance and highly proliferative BC variants within these subtypes [88]. Thus, the identification of novel markers that can detect resistance and prognosis is an important issue.

Discussion
Despite considerable advances in our understanding of BC biology, the design of therapeutic approaches is dependent on and guided by molecular profiling that categorizes tumors according to HR and HER2 status [86]. Despite the improvement of treatment strategies related to HR and HER2 status, recent emerging global trends show increased BC mortality rates [87], which are attributed to treatment resistance and highly proliferative BC variants within these subtypes [88]. Thus, the identification of novel markers that can detect resistance and prognosis is an important issue.
The pCR can be a potential surrogate marker with a prognostic value for predicting survival in the HER2-positive and triple-negative subtypes [11,[89][90][91]. In these sub-types, patients who achieve pCR have a better prognosis than those who fail to achieve pCR [10][11][12]. Thus, predicting tumor response before or during NCT is important for evaluating patients' prognosis. However, in luminal subtype BC, reliable factors associated with tumor response and prognosis are relatively rare.
In luminal subtype BC, a genomic assay based on tissue biopsy samples such as the Oncotype DX ® provides information on clinicopathological factors and is a powerful precision medicine tool for cancer patients. However, its application is limited to the HR-positive and HER2-negative BC subtypes [14]. Moreover, the assay requires a relatively large specimen and is limited by high costs and long processing times.
Another strategy to obtain information for tumor assessment is a 'liquid biopsy'. Compared with tissue biopsy, a quick and easy 'liquid biopsy' allows for a less invasive and simpler assessment of tumor response through simple blood collection [19]. Plasma cancer biomarkers and the number of circulating cancer cells provide useful information that is relevant to cancer diagnosis. CTCs are potential biomarkers of prognosis after treatment [24,26,92,93]. Although studies show that CTC detection is a potential prognostic factor in metastatic BC [21,22,94,95], its clinical value and prognostic impact in non-metastatic BC patients, especially those treated with NCT, are under debate with conflicting results [23,92,96,97].
Proteomics is the analysis method that is similar to CTC analysis and also another form of "non-invasive analysis" as liquid biopsy. Proteomic profiles derived from liquid biopsy samples can provide more direct profiling of diseases. Thus, proteomics is expected to yield promising results for the early diagnosis of cancer and for evaluating the efficacy of antitumor therapy. Current clinical proteomics methods for cancer management are focused on biomarker discovery and validation [98]. Because proteins function through specific pathways rather than individually, anti-cancer strategies can be designed by targeting the biomarker that affects specific pathways related to cancer development [99,100].
In the present study, we analyzed the plasma proteomes of locally advanced BC patients receiving NCT. We used a biomarker discovery workflow system to identify candidate protein biomarkers of tumor response and prognosis in BC patients using LC-MS/MS and three proteins were selected for validation. These proteins were used to generate models that independently predicted the risk of recurrence regardless of tumor subtype. The three protein markers were P4HB, ENG, and MBL2. P4HB is a protein disulfideisomerase that is one of the core genes in the beta subunit of prolyl 4-hydroxylase [101]. P4HB is related to the carcinogenesis and development of multiple tumors. P4HB is highly expressed in colon cancer, and knockdown of P4HB promotes cancer cell apoptosis [102]. In liver cancer, knockdown of P4HB inhibits the migration and invasion of HepG2/ADR cells, as demonstrated by culturing HepG2 cells (a human hepatocellular carcinoma cell line) in the presence of increasing concentrations of adriamycin [103]. P4HB is also highly expressed in renal clear cell carcinoma and significantly related to poor OS [104]. These findings indicate that P4HB may serve as a potential molecular marker for the diagnosis and treatment of cancer. We demonstrated that low serum P4HB level is significantly associated with better DFS and OS, which is consistent with the findings reported by Yang et al. [105]. The study demonstrated that downregulation of P4HB represses the promoting effects of overexpressed COL10A1 on the proliferation, migration, and invasion of BC cells and, conversely, upregulation of P4HB promotes BC cell proliferation and clone-forming ability, as well as increasing BC cell migration and invasion [105].
ENG (also known as CD105) is a receptor for transforming growth factor β that is expressed at high levels on the cell surface of tumor blood vessels and tumor stromal components [106]. ENG shows affinity for "newly forming" angiogenic endothelium, whereas CD34 and CD31 react not only with angiogenic vessels, but also with the endothelium of normal vessels; ENG is thus superior to CD34 and CD31 for the evaluation of tumor angiogenesis [107]. Elevated expression of ENG is often observed in the actively proliferating endothelium [108,109]. There is a significant correlation between markers of cell proliferation such as Ki-67 and cyclin-A [108]. Thus, ENG is a potential marker of tumor-associated angiogenesis and prognosis [109]. Li et al. showed that plasma ENG levels are elevated in BC patients at risk of metastasis, and ENG overexpression is significantly correlated with metastatic disease, suggesting the value of ENG for predicting metastasis [110]. Kumar et al. demonstrated that the reactivity of ENG in blood vessels of BC tissues correlates with a poor prognosis [111]. Although the results did not reach statistical significance, we also observed that patients with high ENG levels had poor survival rates.
MBL2 is an activator of the lectin pathway and a crucial component of the innate immune system, and inflammatory reactions are critical for tumor progression and can promote human carcinogenesis [112,113]. MBL can inhibit tumor progression via the complement system and through MBL-dependent cell-mediated cytotoxicity [114][115][116]. However, recent studies show controversial results. Holm et al. showed that high plasma levels of MBL2 are a marker of poor survival in colorectal cancer patients [117]. Yitting et al. reported that the MBL complement activation pathway is activated in patients with colorectal cancer compared with healthy controls; however, MBL pathway deficiency rates are similar between patients and healthy controls [118]. Additionally, local expression of MBL2 genes is higher in women with ovarian cancer than in controls [119]. In this study, plasma MBL2 levels were higher in the non-pCR group, and high plasma MBL2 levels were associated with poor survival (DFS and DMFS). Taken together, these results suggest that MBL2 can have a protective effect against tumors, as well as a tumorigenic effect.
The present study had several limitations. First, the sample size was small, which limits the statistical power. To overcome any possible overfitting issue associated with the small sample size, cross-validation was used for model development. In addition, the validation study included additional plasma samples collected under IRB. Second, because the study sample was heterogeneous regarding the distribution of tumor subtypes, representing the general BC population is difficult. However, considering the prevalence of each subtype of BC, the present results are valuable for discovering prognosis-related protein signatures. Third, although tumor-derived proteomes are present at high concentrations in the blood of cancer patients, abundant proteins can be derived from cellular sources other than the tumor, which is the major limitation of "liquid biopsy". In terms of homeostasis, the plasma proteome can reflect differences in the immune or inflammatory status of patients, which may affect the response to chemotherapy. The plasma proteome is thus a critical indicator of the chemotherapy response.
Despite the limitations listed above, the findings of this study show that certain proteomes are associated with chemotherapy response and prognosis in patients receiving NCT.

Conclusions
This study demonstrated that proteins from non-invasive liquid biopsy sampling correlate with pCR and survival in BC patients receiving NCT. Among them, potential druggable targets were identified. Plasma protein analyses identified differentially expressed proteins between groups with distant metastasis, independently from the achievement of pCR. Quantitative protein analyses by liquid biopsy may provide a means to predict response and recurrence with minimal amounts of sample, at a lower cost, and with faster times. Further investigation of these proteomes may reveal their role in predicting prognosis, which could serve as a novel therapeutic strategy.
Supplementary Materials: The following materials are available online at https://www.mdpi. com/article/10.3390/cancers13246267/s1. Figure S1: (A) Boxplots of normalized plasma protein abundances in the 51 samples (15 BC patients with pathological complete response (pCR) after neoadjuvant therapy and 36 patients with non-pCR) measured by LC-MS analysis. (B) Scatterplot of 254 plasma proteins between log2 plasma concentration in the Plasma Proteome Database (bottom) and normalized log2 abundance (Pearson correlation coefficient (ρ): 0.657 and p-value: 9.9 × 10 −5 ). Figure S2: (A) Boxplot of four proteins (MBL2, ENG, P4HB, and APOC3) in groups according to pCR and HER2 presence (B). Boxplot of four proteins (MBL2, ENG, P4HB, and APOC3) in groups divided by pCR and hormone receptor presence. Figure S3: Volcano plots are depicted with the fold-change of each protein abundance; the p-value was calculated by performing a t-test. In the triple-negative BC (TNBC) subtype (n = 21), the averages of the plasma proteomic abundance data in the pCR group (n = 8) were compared with the averages of the data for the non-pCR group (n = 13) with TNBC. The red circle shows four plasma proteins showing significant increases in the non-pCR group. The green circle shows two plasma proteins with significant decreases in the non-pCR group. Gray circles are plasma proteins with no statistical significance. Figure S4: Kaplan-Meier plots of pCR in relation to three proteins (MBL2, ENG, and P4HB) against overall survival (A) and distant metastasisfree survival (B). Figure S5: Kaplan-Meier plots of Miller-Payne grades in disease-free survival (A), distant meta-free survival (B), and overall survival (C). Statistical significance was determined using the log-rank test. p-values < 0.05 are displayed in bold. Table S1. Demographic and clinical characteristics of the 51 breast cancer patients with or without pCR. Table S2. Normalized abundance of 305 plasma proteins in 51 breast patients with or without pCR. Table S3. Results of volcano plot analysis between the pCR group and the non-pCR group. Table S4. Selected feature proteins by AUC-based RF backward elimination with 305 proteins as input. Table S5. Univariate survival analysis of recurrence, metastasis, and death. Table S6. Association between three biomarkers with clinical and pathological stage.

Informed Consent Statement:
The patients/participants provided written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Data Availability Statement:
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Conflicts of Interest:
The authors declare no conflict of interest.