A Targeted Proteomics Approach for Screening Serum Biomarkers Observed in the Early Stage of Type I Endometrial Cancer

Endometrial cancer (EC) is the most common gynecologic malignancy, and it arises in the inner part of the uterus. Identification of serum biomarkers is essential for diagnosing the disease at an early stage. In this study, we selected 44 healthy controls and 44 patients with type I EC at tumor stage 1, and we used the Immuno-oncology panel and the Target 96 Oncology III panel, each of which simultaneously detects the levels of 92 cancer-related proteins in serum using a proximity extension assay (PEA). By applying this methodology, we identified 20 proteins associated with the outcome in binary logistic regression with a p-value below 0.01 for the first panel, and 24 proteins with a p-value below 0.02 for the second one. The final multivariate logistic regression model, combining proteins from the two panels, reached a sensitivity of 97.67% and a specificity of 83.72%. These results support the use of the proposed algorithm after a validation phase.


Introduction
Endometrial cancer (EC) arises in the inner part of the uterus and represents the fourth most common female malignancy in Europe [1]. Unlike other cancers, the incidence and mortality of EC are rapidly increasing worldwide, especially in North America and Western Europe (incidence 12.9-20.2 per 100,000 women and mortality 2.0-3.7 per 100,000 women) [2]. Although genetic predisposition and racial background might promote EC development, the most important EC-predisposing factors seem to be associated with health and lifestyle conditions (e.g., obesity, metabolic syndrome, diabetes, polycystic ovary syndrome, high estrogen levels) [3][4][5][6].
EC is classified into two subtypes with distinct clinical, pathological and molecular features. Commonly, type I ECs display a low grade (I or II) endometrioid morphology and are estrogen-dependent; thus, they are associated with a good prognosis. Type II ECs include non-endometrioid adenocarcinomas, serous, clear cell and undifferentiated carcinomas, and carcinosarcomas; they are usually hormone-receptor-negative, high-grade tumors with poor prognosis [7].
Type I EC comprises the large majority of endometrial cancers (~90%), while type II EC comprises ~10%. In type I EC, stage 1 is the most frequent [8]. In EC, 80% of patients are in the early stages, while 20% are in more advanced phases [9]. At stage I, the five-year survival rate is 95%, and the survival decreases dramatically to 14% for stage IV [10]. The identification of biomarkers at an early stage would lead to a prompt diagnosis, reduce inappropriate and invasive examinations and improve patient care and prognosis [11].
For this study, a total of 88 women (44 suffering from EC and 44 non-EC controls) were recruited at the Institute for Maternal and Child Health-IRCCS "Burlo Garofolo" (Trieste, Italy) from 2019 to 2021. All EC patients had type I endometrioid adenocarcinomas at tumor stage 1. Because type I EC accounts for the large majority of endometrial cancers (~90%) and stage 1 is its most frequent stage, we focused on patients with type I EC at stage 1 to identify candidate serum protein biomarkers.
All procedures complied with the Declaration of Helsinki and were approved by the Institute's Technical and Scientific Committee. All patients signed informed consent forms. The clinical and pathological characteristics of the patients are described in Supplementary Table S1. The median age of patients was 67 years (interquartile range [IQR] 55-71, with a minimum of 44 and a maximum of 81), while the median age of controls was 35 years (IQR , with a minimum of 22 and a maximum of 77 years). Controls were chosen excluding oncologic patients, subjects seropositive for human immunodeficiency virus (HIV), hepatitis B virus (HBV) or hepatitis C virus (HCV), and patients with leiomyomas or adenomyosis. For EC cases, we likewise ruled out women with other oncologic pathologies, HIV-, HBV- or HCV-seropositive patients, and patients with leiomyomas or adenomyosis. We excluded control patients with benign tumors (myoma), chronic inflammatory disease (adenomyosis) or viral infections because these pathologies may influence the abundance of several proteins in serum and, consequently, affect the proteomic analysis.

Serum Sample Collection and the PEA
To obtain serum, blood was centrifuged at 5000 rcf for 5 min. Once obtained, the serum was preserved at −80 °C. Sera were shipped to Olink® Proteomics (Dag Hammarskjölds väg 52B, SE-752 37 Uppsala, Sweden). In total, 40 µL of serum was used for proximity extension assay (PEA) analysis with the Immuno-oncology panel and the Target 96 Oncology III panel in 96-well format, in which 92 oligonucleotide-labeled antibody probe pairs bind to their specific targeted proteins. The protein names, gene names, and abbreviations for the 92 proteins of the Immuno-oncology panel and the Target 96 Oncology III panel are reported in Tables S2 and S3. The PEA technology includes three core steps. It starts with an overnight incubation of 16-22 h, during which the 92 antibody pairs, labelled with DNA oligonucleotides, bind to their respective proteins in the samples. The second step consists of 2 h of extension and amplification: oligonucleotides that are brought into proximity hybridize and are extended using a DNA polymerase, and the newly created DNA barcode is amplified by PCR. The last step consists of 4.5 h of detection, in which the amount of each DNA barcode is quantified by microfluidic qPCR.
Negative Control for Olink Explore is also included in triplicate on each plate and consists of buffer run as a normal sample. These are used to monitor any background noise generated when DNA-tags come in close proximity without prior binding to the appropriate protein. The negative controls set the background levels for each protein assay and are used to calculate the limit of detection (LOD) and to assess the potential contamination of the assays. The Plate Control was another control included in triplicate on each plate. The median of the Plate Control triplicates is used to normalize each assay and compensate for the potential variation between runs and plates. Once the data were obtained from the plate reading, they were analyzed, including normalization and linearization, according to the manufacturer's protocols. The protein level is expressed as NPX (Normalized Protein eXpression), an arbitrary unit on a Log2 scale. It is calculated from Ct values, and data pre-processing (normalization) is performed to minimize both intra- and inter-assay variation. NPX data allow users to identify changes in individual protein levels across their sample set, and then use these data to establish protein signatures.
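Because NPX is expressed on a Log2 scale, a difference between two NPX values translates into a ratio on the linear scale. The short sketch below illustrates only this arithmetic; the function name and the example values are hypothetical and are not taken from the study data or from the manufacturer's software.

```python
def npx_fold_change(npx_a: float, npx_b: float) -> float:
    """Linear-scale ratio implied by two NPX values (NPX is in log2 units)."""
    return 2.0 ** (npx_a - npx_b)


# Hypothetical values: a difference of 1 NPX corresponds to a doubling
# of the relative protein level on the linear scale.
ratio = npx_fold_change(7.5, 6.5)
print(f"Fold change for a 1-NPX difference: {ratio}")
```

Since NPX is an arbitrary relative unit, such ratios are meaningful only when comparing the same protein assay across samples, not when comparing different proteins.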
The Olink Target 96 Immuno-oncology panel includes proteins associated with biological functions linked to immune response and immuno-oncology diseases. The biomarkers in this panel include proteins involved in processes such as promotion and inhibition of tumor immunity, chemotaxis, vascular and tissue remodeling, apoptosis and cell killing, and metabolism and autophagy. The Olink Target 96 Oncology III panel comprises 92 human proteins that participate in biological mechanisms that are central to the initiation and progression of cancer, e.g., angiogenesis, cell communication, cellular metabolic processes, apoptosis, and cell proliferation/differentiation. All the proteins that make up the two panels are reported in Supplementary Tables S2 and S3. These panels do not focus on specific malignancies. The categorization of the proteins included in the panels was carried out by referring to widely used public-access bioinformatic databases, including Uniprot, Human Protein Atlas, Gene Ontology (GO) and DisGeNET.

Statistical Analyses
Each of the two panels comprised 92 proteins. We first excluded all proteins with more than 25% of values below the limit of detection (LOD). Olink suggests excluding assays with fewer than 25-50% of samples above the LOD (https://www.olink.com/faq/howis-the-limit-of-detection-lod-estimated-and-handled/, accessed on 4 July 2022), but we adopted a more restrictive approach and required at least 75% of samples above the LOD. For each of the remaining proteins, we calculated the median value and the interquartile range. We carried out binary logistic regressions to study the association with the outcome. Before proceeding to the multivariate logistic regression, for each panel, we selected the proteins more strongly associated with the dependent variable on the basis of the binary logistic regression p-value. Our first approach was to carry out a first selection of proteins with a least absolute shrinkage and selection operator (LASSO) multivariate logistic regression, but the results were unfortunately not comparable with a traditional binary logistic regression approach. For each panel separately, the selected proteins were simultaneously considered in a multivariate logistic regression model. A downward selection was applied to exclude, one at a time, the protein with the highest p-value, if p ≥ 0.05. We thus obtained two final predictive models which included only proteins significantly and simultaneously associated with the outcome. For each model, we reported the Pseudo-R-squared value, the Area under the Receiver Operating Characteristic (ROC) Curve (AUC), sensitivity and specificity. Finally, we considered the proteins of the two final models together in a multivariate logistic regression model. We hypothesized that, by improving the predictive models, we might obtain a group of proteins that could be included in an ad hoc panel. Again, we adopted a stepdown procedure and obtained a third model.
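The LOD filter and the stepdown selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the toy data and the `pvalues_for` callable (which stands in for refitting a multivariate logistic regression and returning one p-value per protein) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def fraction_below_lod(values: Sequence[float], lod: float) -> float:
    """Fraction of samples whose value falls below the limit of detection."""
    return sum(v < lod for v in values) / len(values)


def filter_by_lod(npx: Dict[str, Sequence[float]],
                  lod: Dict[str, float],
                  max_below: float = 0.25) -> List[str]:
    """Keep proteins with at most `max_below` of their values below the LOD."""
    return [protein for protein, values in npx.items()
            if fraction_below_lod(values, lod[protein]) <= max_below]


def step_down(variables: Sequence[str],
              pvalues_for: Callable[[List[str]], Dict[str, float]],
              alpha: float = 0.05) -> List[str]:
    """Drop, one at a time, the variable with the highest p-value if p >= alpha.

    `pvalues_for` is a hypothetical callable that refits the model on the
    variables still in play and returns their p-values.
    """
    kept = list(variables)
    while kept:
        pvals = pvalues_for(kept)
        worst = max(kept, key=lambda name: pvals[name])
        if pvals[worst] < alpha:
            break  # every remaining variable is significant
        kept.remove(worst)
    return kept


# Toy data (hypothetical): protein B has half of its values below the LOD.
toy_npx = {"A": [1.0, 2.0, 3.0, 4.0], "B": [0.1, 0.2, 3.0, 4.0]}
toy_lod = {"A": 0.5, "B": 0.5}
print(filter_by_lod(toy_npx, toy_lod))

# Stub p-values (hypothetical); a real refit would change them at every step.
fixed = {"P1": 0.001, "P2": 0.20, "P3": 0.04, "P4": 0.06}
print(step_down(list(fixed), lambda kept: {name: fixed[name] for name in kept}))
```

In practice the p-values would come from a statistical package that refits the logistic regression after each removal; the stub above keeps them fixed only to make the control flow visible.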

Results
In the first panel, the Immuno-oncology panel, ten proteins had more than 25% of values below the LOD (IL_1_alpha, FGF2, IL2, IL33, CD28, IL5, PTN, CXCL12, IL4, IL13) and were excluded from further analyses. Of the remaining proteins, median values and interquartile ranges are reported for cases and controls, as well as the odds ratios, 95% confidence intervals and p-values of the binary logistic regression (Table 1). The binary logistic regression analyses allowed us to identify the proteins more strongly associated with the outcome. For the first panel, we selected 20 proteins with a binary logistic regression p-value below 0.01. These 20 proteins were considered together in a multivariate logistic regression model (Table 2). After applying a downward selection as described in the Methods section, four proteins remained, significantly and simultaneously associated with the outcome (Table 3). This model has a Pseudo R-squared = 0.605, an AUC = 95.4% (95% CI 91.5-99.3%), reaching a sensitivity of 97.67% with a specificity of 74.42% (Table 4 and Figures 1 and 2). For regression coefficients reported in Table 3 and predicted probability cut points reported in Table 4, the following model will identify cases and controls with the specified sensitivity and specificity:
Predicted probability = 1/(1 + exp(−(−87.09041 − 3.352554 × Gal-9 + 9.833984 × Gal-1 + 5.496387 × MMP7 − 3.052633 × FASLG))).
In the second panel, the Target 96 Oncology III panel, there were 19 proteins with more than 25% of values below the limit of detection (TBL1X, IL17F, TPMT, KLK4, NT5C3A, GAMT, HEXA, TNFAIP8, AIF1, CNPY2, SEPT9, CDC27, CXCL14, LAP3, SPINK4, YTHDF3, ACTN4, GGA1, TPT1), so they were excluded from further analyses. Of the remaining proteins, in Table 5 we report median values and interquartile ranges for cases and controls, as well as the odds ratios, 95% confidence intervals and p-values of the binary logistic regression. With the results of the binary logistic regression, for the second panel, we selected 24 proteins with a binary logistic regression p-value below 0.02, as we only had 11 proteins with a p-value below 0.01 (Table 6). These 24 proteins were considered together in the multivariate logistic regression model. After applying a downward selection, we were left with five proteins, significantly and simultaneously associated with the outcome (Table 7). This model has a Pseudo R-squared = 0.436, an AUC = 88.9% (82.1-95.6%), reaching a sensitivity of 95.45% with a specificity of 69.77% (Table 8 and Figures 3 and 4). For the regression coefficients reported in Table 7 and predicted probability cut points reported in Table 8, the following model identified cases and controls with the specified sensitivity and specificity:
Predicted probability = 1/(1 + exp(−(−21.76806 + 0.8618858 × CDHR2 + 3.415408 × NCS1 + 0.6490679 × MLN − 2.915442 × FLT3 − 1.256405 × COL9A1))).
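Sensitivity, specificity and AUC figures such as those reported for the two panels are derived from the predicted probabilities and the true case/control status. The sketch below shows one way to compute them; the function names, the toy probabilities and the cut point of 0.5 are hypothetical, and the actual cut points of the study are those reported in the tables, which are not reproduced here.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(probs: Sequence[float],
                            labels: Sequence[int],
                            cut: float) -> Tuple[float, float]:
    """Classify as case when probability >= cut; labels: 1 = case, 0 = control."""
    tp = sum(p >= cut and y == 1 for p, y in zip(probs, labels))
    fn = sum(p < cut and y == 1 for p, y in zip(probs, labels))
    tn = sum(p < cut and y == 0 for p, y in zip(probs, labels))
    fp = sum(p >= cut and y == 0 for p, y in zip(probs, labels))
    return tp / (tp + fn), tn / (tn + fp)


def auc(probs: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the share of case-control pairs ranked correctly (ties count 0.5)."""
    cases = [p for p, y in zip(probs, labels) if y == 1]
    controls = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy example (hypothetical probabilities for three cases and three controls).
toy_probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(toy_probs, toy_labels, cut=0.5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc(toy_probs, toy_labels):.3f}")
```

Lowering the cut point raises sensitivity at the expense of specificity, which is why a table of cut points accompanies each model.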
The third model was generated by considering in a multivariate logistic regression all proteins included in the two final models, i.e., Gal-1, Gal-9, MMP7, FASLG, CDHR2, NCS1, MLN, FLT3 and COL9A1 (Table 9). After the stepdown procedure, the final model included all variables previously included in the Immuno-oncology final model, plus COL9A1 (Table 10). This model has a Pseudo R-squared = 0.691, an AUC = 96.9% (93.9-99.9%), reaching a sensitivity of 97.67% with a specificity of 83.72% (Table 11, Figures 5 and 6). Regression coefficients are reported in Table 10. Predicted probability cut points for specificity higher than sensitivity are reported in Table 11. The predicted probability can be calculated from the following model:
Predicted probability = 1/(1 + exp(−(−121.4969 − 4.713017 × Gal-9 + 11.1979 × Gal-1 + 9.248928 × MMP7 − 4.163016 × FASLG − 2.687621 × COL9A1))).
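As an illustration, the combined five-protein model can be evaluated in a few lines of code. The intercept and coefficients below are the ones reported for the third model; the function name and the input NPX values are hypothetical, and the numerically stable form of the logistic function is an implementation choice rather than part of the published method.

```python
import math
from typing import Dict

# Intercept and coefficients of the third (combined) model as reported in the text.
INTERCEPT = -121.4969
COEFFICIENTS = {
    "Gal-9": -4.713017,
    "Gal-1": 11.1979,
    "MMP7": 9.248928,
    "FASLG": -4.163016,
    "COL9A1": -2.687621,
}


def predicted_probability(npx: Dict[str, float]) -> float:
    """Logistic model: 1 / (1 + exp(-(intercept + sum(coef * NPX))))."""
    linear = INTERCEPT + sum(coef * npx[name] for name, coef in COEFFICIENTS.items())
    # Equivalent formulations chosen to avoid overflow in exp() for large |linear|.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    return math.exp(linear) / (1.0 + math.exp(linear))


# Hypothetical NPX values, for illustration only (not patient data).
sample = {"Gal-9": 8.0, "Gal-1": 9.0, "MMP7": 10.0, "FASLG": 9.0, "COL9A1": 2.0}
print(f"Predicted probability: {predicted_probability(sample):.6f}")
```

The resulting probability would then be compared with a cut point chosen from the reported table to classify a sample as case or control.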

Bioinformatic Analysis
We used gProfiler as the classification tool for proteomic enrichment data analysis. For the Immuno-oncology panel, proteins (Figure 7) were classified into groups according to their molecular function, biological processes, and protein classes. Regarding molecular function, proteins were categorized into: cytokine receptor binding, cytokine activity, receptor-ligand activity, signaling receptor activator activity, signaling receptor regulator activity, signaling receptor binding, chemokine activity, and chemokine receptor binding. For biological processes, proteins were categorized into: immune response, immune system process, positive regulation of immune system process, cell surface receptor signaling pathway, regulation of immune system process, cytokine-mediated signaling pathway, response to cytokine, and cellular response to cytokine stimulus. For protein class, proteins were categorized into: extracellular region, external side of plasma membrane, extracellular space, cell surface, side of membrane, cell periphery, plasma membrane, and integral component of plasma membrane. The Reactome tool grouped these proteins into eight pathways: chemokine receptors bind chemokines, interleukin-10 signaling, cytokine signaling in immune system, immune system, signaling by interleukins, peptide ligand-binding receptors, TNFR2 non-canonical NF-kB pathway, and class A/1 (Rhodopsin-like receptors).
The same analysis was performed for the Target 96 Oncology III panel (Figure 8). Proteins were classified into groups according to their molecular function, biological processes, and protein class. The only molecular function category was protein binding. The only biological process was negative regulation of endothelial cell proliferation. The protein class categories were: extracellular region, extracellular space, vesicle, extracellular exosome, extracellular vesicle, extracellular organelle, and extracellular membrane-bounded organelle. Reactome tool analysis classified these proteins into four pathways: CLEC7A/inflammasome pathway, Defective CSF2RA causes SMDP4, Defective CSF2RB causes SMDP5, and Innate Immune System.

Discussion
Biomarkers play a key role in oncological applications, including diagnosis of the disease, prognosis and determination of personalized treatment [42]. High-throughput TMT LC-MS/MS technology has allowed the identification and quantification of several candidate biomarkers in a large cohort of patients [43]. In recent years, there has been a great effort to identify and validate biomarkers in EC, but, until now, no candidate biomarker has reached the clinical stage [43,44].
To identify early candidate biomarkers in EC, we exploited for the first time PEA technology in targeted proteomics, which had already been used successfully for the identification of biomarkers in several pathologies. From the 92 proteins of the Immuno-oncology panel, only 20 were selected with a binary logistic regression p-value below 0.01. A multivariate logistic regression analysis based on four proteins (Gal-9, Gal-1, MMP7, FASLG) allowed us to separate cases from controls with an AUC = 95.4%. From the Target 96 Oncology III panel, only 24 proteins were selected with a binary logistic regression p-value below 0.02. A multivariate logistic regression analysis based on five proteins (CDHR2, NCS1, MLN, FLT3, COL9A1) allowed us to separate cases from controls with an AUC = 88.9%. According to these results, the performance of the Immuno-oncology panel was better than that of the Target 96 Oncology III panel. To further improve the model, we performed a multivariate logistic regression including the proteins of both final models; after the stepdown procedure, all proteins from the first model (Gal-1, Gal-9, MMP7, FASLG) and one from the second model (COL9A1) were retained, obtaining an AUC = 96.9%.
In this work, we identified five proteins, namely, Gal-1, Gal-9, MMP7, FASLG, and COL9A1, that based on their known function in endometrial and other cancers might represent useful early-stage EC biomarkers, upon a validation phase.
Gal-1 is a small lectin that binds beta-galactoside and a wide array of complex carbohydrates. This protein acts as an immunosuppressive molecule and is expressed by different types of cancer cells [51]. Once secreted, Gal-1 binds to the glycosylated receptors of immune cells, leading to their inhibition and consequently to the immune escape of cancer cells [52]. In EC, Mylonas and colleagues performed an immunohistochemistry study of Gal-1 and Gal-9, finding a correlation between these proteins and EC clinicopathological features [53]. Indeed, high expression of Gal-1 is associated with poor prognosis, while high expression of Gal-9 is associated with early pathological changes. In addition, Gal-1 also correlates with lymphangiosis, a poor prognostic marker in EC [54].
MMP7 is a small enzyme that degrades several types of gelatins, casein and fibronectin [55]. MMP7 promotes tumor cell invasion and migration by digesting the extracellular matrix (ECM) and components of cell surface proteins [56]. This protein is a promising diagnostic and prognostic biomarker of pancreatic cancer [57] and bladder cancer [58]. Its biochemical characteristics make MMP7 a potential target in several types of cancers [56]. Downregulation of MMP7 leads to reduced proliferation and migration of tumor cells in gallbladder cancer [59] and reduces cisplatin resistance in NSCLC cells [60]. In EC, high expression of MMP7 correlates with higher lymph node invasion [61] and increased risk of metastasis [62]. These data are also supported by in vitro assays (Misugi et al.), confirming that increased expression of MMP7 in high-grade ECs may be correlated with tumor invasion and that the protein may be a prognostic marker in EC [63].
COL9A1 is a structural component of hyaline cartilage and of the vitreous of the eye; the COL9A1 gene localizes to chromosome 6q13 [64]. Several studies correlate COL9A1 with breast cancer [65] and oral squamous cell carcinoma [66]. In EC, computational analysis of RNAseq data deposited in the TCGA database shows that COL9A1 expression is increased in both primary and metastatic tumors (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7173926/, accessed on 4 July 2022).
Lastly, FASLG binds to TNFRSF6/FAS receptor inducing apoptosis [67]. This protein is important to maintain the immune homeostasis and for the elimination of cancer cells [68]. Epithelial mesenchymal transition (EMT) and stiffness make cancer cells more aggressive by excessive production and secretion of FASLG [69]. McGlorthan et al. described a possible mechanism explaining how progesterone and calcitriol induce apoptosis in EC, relying on the induction of FasL, Fas, and FADD expression, which, in turn, activates the caspase-8 pathway [70]. Accordingly, a genetic study showed that the homozygous CC variant of FASL −844 T>C polymorphism confers protection against EC [71].
The Reactome analysis of our five candidate proteins also identified several pathways related to the immune system. Among them, chemokines and chemokine receptors not only take part in immune regulation but also play a key role in tumor development. Moreover, chemokines and chemokine receptors are related to angiogenesis, metastasis, drug resistance, and immunity in breast cancer [72]. For example, IL-10 plays a key role in the regulation of several genes involved in cell proliferation and migration in gastric cancer cells [73]. Cytokines are small proteins that play an important role in cellular functions, such as proliferation, differentiation, and survival, as well as in the response to pathogens. These proteins induce the activation of the JAK-STAT pathway, which is fundamental in the regulation of the immune system and in tumor surveillance [74].
Altogether, in our study, we found a panel of proteins whose secretion in blood correlates with early EC and can be exploited as a diagnostic biomarker upon further validation, although we know that there are some limitations. First, since it was very difficult for us to find controls of the same age as the EC patients, the controls are younger than the EC patients (median age 35 vs. 67 years). We acknowledge that this is a limitation and that levels of some proteins might change with age. The next step will be the validation of these results with age-matched cases and controls.
Another limitation of our study is the small number of patients, which does not allow us to generalize to the overall population. Even if the results are satisfactory, more studies are certainly required to confirm and consolidate these findings.

Conclusions
In conclusion, by combining proteins from the Immuno-oncology panel and the Target 96 Oncology III panel, we generated an algorithm that discriminated early type I EC patients from controls with high specificity and sensitivity through the analysis of Gal-1, Gal-9, MMP7, COL9A1, and FASLG serum levels. Although Gal-1, Gal-9, MMP7, COL9A1, and FASLG are overexpressed in different kinds of cancers, the analysis of their serum levels allows one to discriminate between healthy controls and women affected by type I EC, and combining this analysis with typical clinical manifestations of EC, such as bleeding and pelvic pain, might help early EC diagnosis, avoiding invasive diagnostic techniques.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biomedicines10081857/s1, Table S1: The clinical and pathological characteristics of the patients.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of IRCCS Burlo Garofolo and Regional Ethics Committee (protocol code RC18/19 approved in 2019 and CEUR-2020-Os-030).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to ethical reasons.