Application of the Interaction between Tissue Immunohistochemistry Staining and Clinicopathological Factors for Evaluating the Risk of Oral Cancer Progression by Hierarchical Clustering Analysis: A Case-Control Study in a Taiwanese Population

The aim of this single-center case-control study is to investigate the feasibility and accuracy of oral cancer protein risk stratification (OCPRS) to analyze the risk of cancer progression. All patients diagnosed with oral cancer in Taiwan, between 2012 and 2014, and who underwent surgical intervention were selected for the study. The tissue was further processed for immunohistochemistry (IHC) for 21 target proteins. Analyses were performed using the results of IHC staining, clinicopathological characteristics, and survival outcomes. Novel stratifications with a hierarchical clustering approach and combinations were applied using the Cox proportional hazard regression model. Of the 163 participants recruited, 102 patients were analyzed, and OCPRS successfully identified patients with different progression-free survival (PFS) profiles in high-risk (53 subjects) versus low-risk (49 subjects) groups (p = 0.012). OCPRS was composed of cytoplasmic PLK1, phosphoMet, and SGK2 IHC staining. After controlling for the influence of clinicopathological features, high-risk patients were 2.33 times more likely to experience cancer progression than low-risk patients (p = 0.020). In the multivariate model, patients with extranodal extension (HR = 2.66, p = 0.045) demonstrated a significantly increased risk for disease progression. Risk stratification with OCPRS provided distinct PFS groups for patients with oral cancer after surgical intervention. OCPRS appears suitable for routine clinical use for progression and prognosis estimation.


Introduction
Oral squamous cell cancer (OSCC) is the sixth most common cancer worldwide, with 630,000 new cases and 350,000 deaths estimated annually [1]. Its striking worldwide incidence and socioeconomic burden encourage extensive research on factors that could modify clinical outcomes [2]. Despite multidisciplinary interventions, a high incidence of oral cancer recurrence and metastasis affects the quality of life and survival in patients [3,4]. Therefore, it remains crucial to identify prognostic biomarkers and risk stratifications for improved disease management.
Reported and well-known risk factors for oral cancer include alcohol consumption, betel nut use, and cigarette smoking [5,6]. Several predictive factors, including age, ethnicity, gender, primary site, grade, and therapy, have demonstrated associations between these sociodemographic factors and survival in oral and pharyngeal carcinoma [2]. In clinical practice, the tumor-node-metastasis (TNM) system is the most prevalent tool for prognostic evaluation of OSCC. However, this system lacks immediacy and convenience, necessitating extensive physical and imaging examinations. Furthermore, biological phenotypes and clinical presentations differ even at identical diagnostic stages. More accurate and timely prognostic biomarkers are requisite for OSCC, especially in Asian populations.
Reportedly, DNA repair gene XRCC1 polymorphisms could alter the activity of the XRCC1 protein, leading to defective DNA repair and influencing p53 gene mutation, which has demonstrated a negative impact in Taiwanese patients with OSCC [7]. In Korea, Choi et al. have reported that patterns with single nucleotide polymorphisms in ECRG1 and FGFR4 genes were associated with the clinical nodal status. The FGFR4 Arg allele carrier correlated with advanced nodal stage when compared with the Gly allele [8]. Conversely, positive prognostic markers have been reported, including APOBEC3A. In Taiwanese data, high APOBEC3A expression, especially among APOBEC3B-deletion alleles, has been associated with better overall survival (OS) [9]. Capillary electrophoresis-mass spectrometry (CE-MS) metabolome analysis of saliva samples has been performed in Japanese patients with OSCC, and 25 metabolites have been identified as potential markers that could be used to distinguish between patients with OSCC and healthy controls [10]. Proteins encoded by EGFR, TP53, CCND1, and RB1 are associated with OSCC progression [11][12][13]. In a study evaluating 55 patients regularly using betel nut, immunohistochemistry (IHC) of cyclin D1, MDM2, and γ-catenin has revealed their prognostic potential in buccal squamous cell carcinoma (SCC) [11]. In the United States, a similar report demonstrated that APE1, a DNA repair and redox gene regulator, served as a potential prognostic signature that identifies patients with worsened survival [14]. In Brazil, the expression of the DNA nucleotide repair proteins TFIIH and XPF had potential value for predicting progression in patients with tongue cancer [15]. DNA mismatch repair deficiency in Australian patients with oral cancer was associated with more advanced primary tumors [16]. However, genomic and molecular profiles of OSCC to guide clinical medicine remain scarce.
Hierarchical agglomerative clustering analysis has demonstrated a rapid and invaluable strategy to manage high-dimensional datasets [17], demonstrating the ability to simultaneously dissect substantial data with multiple layers of the clustering structure. Hierarchical clustering algorithms have been applied for exploratory analysis of gene expression data [18]. It is highly sensitive to background noise and can recognize interactions between factors by considering the similarity within a cluster and dissimilarity between clusters [19]. Previously, hierarchical agglomerative clustering algorithms have been successfully applied to the functional grouping of biological data [20,21]. These methods have been used to identify important clinical features in various cancers, including lung and breast cancers [22,23]. However, hierarchy results derived from the clustering algorithm are generally difficult to apply to clinical settings. Therefore, corresponding risk modules based on the agglomerative clustering results are necessary to generalize the findings for clinical applications.
Reportedly, some well-established IHC markers provide useful prognostic and predictive information in addition to classical clinical factors [24]; however, most have not been validated for clinical use. The Cancer Genome Atlas demonstrated a comprehensive landscape of somatic genomic alterations for head and neck cancer [25]. Similarly, the Gene Expression Omnibus (GEO) database provides a high-throughput platform for recognizing additional potential predictors [26]. DNA amplification scattered from 8q22.2 to 8q24.3, detected by molecular methods, is a candidate molecular signature associated with poor prognosis in patients with OSCC [27]. Patient-tailored identification of biomarkers and candidate therapeutic alterations remains an important unmet need. Synthetic lethality (SL) appears to play an important role in oral cancer, as suggested by the promising results of antineoplastic poly (ADP-ribose) polymerase (PARP) inhibitors [28,29]. The primary aim of this study was to stratify different patient risks based on newly discovered IHC markers associated with synthetic lethality using a hierarchical clustering approach, providing the best prognostic information for patients with OSCC. We analyzed the IHC data and clinicopathological and prognostic features in Taiwanese patients with OSCC using our study cohort. Finally, to classify patients and obtain an overall prediction model, we generated a prognostic model integrating oral cancer protein risk stratification (OCPRS) with the most significant IHC markers.

Data Set
This study was approved by the Institutional Review Board and Ethics Committee of Kaohsiung Medical University Hospital (KMUHIRB-E(I)-20170034, approved on 10 March 2017). The data were analyzed anonymously, and therefore, no informed consent was required. All methods were performed under approved guidelines and regulations. We collected 163 cases of oral cavity cancers from the Kaohsiung Medical University Hospital with a 5-year follow-up. The inclusion criteria were: 20 years of age or older at diagnosis, histology of SCC with grade 1 to grade 3, ICD-9 site code specific for the oral cavity, patients who underwent surgical interventions, and diagnosis between 2012 and 2014. The exclusion criteria included: patients who underwent biopsy without surgical intervention, with secondary malignancy, tumor histology of carcinoma in situ, and SCC from the nasopharynx, oropharynx, hypopharynx, and larynx. Histological grades were defined as grade 1, well differentiated; grade 2, moderately differentiated; grade 3, poorly differentiated. We collected medical and demographic data, including age, gender, alcohol consumption, betel nut usage, tobacco habits, and other clinical parameters, retrospectively from the medical records or during patient interviews. The clinicopathological factors included histologic type and grade, tumor size, lymph node status, surgical margin, perineural invasion (PNI), lymphovascular invasion (LVI), and extranodal extension (ENE). Patients without complete clinical data and clinicopathological factors were excluded, and 102 patients were analyzed. We evaluated the results of a retrospective study with the primary endpoint of assessing outcomes at a comprehensive cancer institution in southern Taiwan. We analyzed the OS and progression-free survival (PFS) (defined as the time from registration to objective disease progression or death from any cause) after surgical intervention.

Computation of Gene Expression Profiles for Oral Cavity Cancer versus Non-Cancerous Tissues
We adopted the approach previously used to identify novel IHC prognostic markers associated with synthetic lethality in colorectal cancer and lung adenocarcinoma [30,31]. As in previous studies, we selected a list of SL-associated genes, including several oncogenes, tumor-suppressor genes, and genome stability genes. From these validated SL-associated genes, twenty-one genes were used for IHC staining at different cellular locations. We combined the associated 32 individual IHC expressions and identified novel IHC prognostic markers among them. Figure 1 illustrates the study workflow for selecting target genes from the validated SL gene pairs and for deriving the protein staining matrix from the 32 individual IHC expressions. First, we selected 742 SL pairs relevant to OSCC, and obtained the microarray gene expression data from The Cancer Genome Atlas (TCGA) of 79 Asian OSCC. Gene expression datasets were screened according to the following parameters: cancerous and noncancerous tissues, no treatments, no metastasis, and Affymetrix chips (up to November 2010). OSCC genes were downloaded from the GEO database [26]. Gene expression data were collected from patients of Han Chinese origin (57 OSCC and 22 noncancerous tissues from Taiwanese patients, GSE25099), the same ethnicity as that of the IHC and clinicopathological data used previously [27]. Gene expression profiles for the 57 OSCC and 22 noncancerous tissues in the dataset were quantile-normalized using "expresso" in R, and log ratios were computed for the target gene expression in each cancerous tissue versus the mean expression in the noncancerous tissues. The selected SL gene pairs were further sorted by the fractions of the upregulation and downregulation patterns, and the SL pairs showing 1.5-fold differential expression in the fractions computed from gene pairs were selected as target genes.
Overall, 21 genes were selected using the above criteria, and the cancer specimens collected from the Taiwanese population in the current study were then used to produce tissue microarrays with three cancerous and one noncancerous tissue cores, as in our previous study [32]. The tissue microarrays were further processed for IHC for 21 target proteins in different cellular components, including nucleus (nu), cytoplasm (cy), and membrane (mem). Hence, a total of 32 protein staining scores were obtained from the 21 target proteins.
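The log-ratio step described above can be illustrated with a minimal Python sketch. It assumes raw-scale (non-logged) expression intensities, a base-2 logarithm, and a symmetric 1.5-fold cutoff applied per gene; the function names `log_ratios` and `regulation_fractions` are hypothetical, and the authors' actual R pipeline may differ in all of these respects.

```python
import math

def log_ratios(tumor_values, normal_values):
    """Log2 ratio of each tumor sample versus the mean of the noncancerous samples.

    Assumes raw-scale intensities (an assumption, not stated in the text).
    """
    normal_mean = sum(normal_values) / len(normal_values)
    return [math.log2(t / normal_mean) for t in tumor_values]

def regulation_fractions(ratios, fold=1.5):
    """Fractions of tumor samples that are up- or downregulated by at least `fold`."""
    cutoff = math.log2(fold)
    up = sum(r >= cutoff for r in ratios) / len(ratios)
    down = sum(r <= -cutoff for r in ratios) / len(ratios)
    return up, down

# Toy values for one hypothetical gene: four tumor samples, two noncancerous samples.
ratios = log_ratios([30.0, 10.0, 5.0, 10.0], [10.0, 10.0])
up_fraction, down_fraction = regulation_fractions(ratios)
```

In this toy example one of four tumor samples exceeds the cutoff in each direction, so both fractions are 0.25.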

Protein Staining
Representative sections of the hematoxylin and eosin (H&E)-stained biopsy-confirmed tissues of the 102 patients with OSCC were selected by pathologists (Chun-Chieh Wu and Yi-Ting Chen). Three cancerous and one noncancerous tissue cores (diameter 2 mm) were longitudinally cut from each paraffin block and mounted with fine steel needles in new paraffin blocks to produce tissue microarrays.
The scoring criteria used were the same as those previously described [32]. Staining intensity was graded as negative (0), indeterminate (±), weakly positive (1+), moderately positive (2+), or strongly positive (3+). Negative (0) indicates no expression of the detected protein, indeterminate means that the staining is weak and its percentage cannot be accurately counted, weakly positive indicates <5% expression of the detected protein, moderately positive implies a focal expression in 5-20% of the cancer cells, and strongly positive indicates diffuse expression in >20% of the cancer cells. For the cancer tissue, staining intensity was compared with that of noncancerous oral mucosa, categorized as either overexpression or underexpression. Results of duplicate cores of each cancer tissue were combined to give a tumor score. When the two scores differed, the mean of the two scores was used as the overall tumor score. Cores were considered assessable if there was enough tumor tissue for evaluating the immunohistochemical staining. If one core was not assessable, the overall tumor score was the mean of the remaining assessable cores. Cores were regarded as not assessable in case of sampling error (<10% tumor cells in the core, for example only stroma) or absent core (<10% of the tissue was present in the core). For each protein, the staining results of each patient were visualized using a heatmap plot with normalized staining scores (Figure 1).
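The scoring rules above can be summarized in a short Python sketch. The function names are hypothetical, non-assessable cores are represented as `None`, and the indeterminate (±) grade is left out because the text does not assign it a numeric value.

```python
def intensity_grade(percent_positive):
    """Map the percentage of positive cancer cells to the 0 to 3+ grade.

    Thresholds follow the text: 0 = none, 1+ = <5%, 2+ = 5-20%, 3+ = >20%.
    The indeterminate grade is not modeled here (assumption).
    """
    if percent_positive <= 0:
        return 0
    if percent_positive < 5:
        return 1
    if percent_positive <= 20:
        return 2
    return 3

def tumor_score(core_scores):
    """Mean score of the assessable cores; None marks a non-assessable core."""
    assessable = [s for s in core_scores if s is not None]
    if not assessable:
        return None  # no assessable core, so no tumor score can be given
    return sum(assessable) / len(assessable)

# Two assessable cores with different grades and one non-assessable core.
score = tumor_score([intensity_grade(12), intensity_grade(35), None])
```

Here the two assessable cores are graded 2 and 3, so the tumor score is their mean, 2.5.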

Ward's Agglomerative Hierarchical Clustering
Overall, 32 protein staining expressions from 102 patients were normalized and converted into a 32 × 102 matrix. Agglomerative hierarchical clustering with Ward's method was used to cluster the protein staining expression matrix and build a hierarchy of the included protein stainings. Ward's agglomerative hierarchical clustering algorithm divided the protein staining expressions into n partitions according to their similarity. Silhouette analysis was used to estimate the optimal number of clusters for the input n × m matrix by estimating the average distance between clusters. The silhouette index s_i measures the similarity between clusters and indicates whether the clustering configuration is appropriate. The hierarchical clustering of protein staining proceeded in three steps. First, each object in the n × m matrix started as its own cluster. Second, we used the merge cost formula shown in Equation (2) to identify the closest pair of clusters and merged the pair with the minimum merge cost. Third, the tree of cluster merges was returned, and the second step was repeated until all objects were merged into the optimal number of clusters indicated by the silhouette index. Thus, each cluster C_j includes k hierarchical protein stainings P with their staining expression. The detailed algorithm and description of Ward's agglomerative hierarchical clustering are provided in the Supplementary Materials.
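The three steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it uses the textbook form of Ward's merge cost, (n_A n_B / (n_A + n_B)) times the squared distance between centroids, which may differ in notation from Equation (2) in the Supplementary Materials, and it uses the standard mean silhouette width. The function names are hypothetical, and each row stands for one protein staining profile across patients.

```python
import math
from itertools import combinations

def centroid(cluster):
    """Mean vector of a cluster given as a list of equal-length tuples."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_cluster(rows, n_clusters):
    """Start from singletons and repeatedly merge the pair with minimum cost."""
    clusters = [[tuple(r)] for r in rows]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because combinations always yields i < j
    return clusters

def mean_silhouette(clusters):
    """Average silhouette width; needs at least two clusters of distinct points."""
    scores = []
    for ci, c in enumerate(clusters):
        for p in c:
            if len(c) == 1:
                scores.append(0.0)  # common convention for singleton clusters
                continue
            a = sum(math.dist(p, q) for q in c) / (len(c) - 1)
            b = min(sum(math.dist(p, q) for q in o) / len(o)
                    for oi, o in enumerate(clusters) if oi != ci)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data: five staining profiles measured on two hypothetical patients.
rows = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 50)]
clusters = ward_cluster(rows, 3)
```

To choose the number of clusters, one would run `ward_cluster` for a range of cluster counts and keep the count with the largest `mean_silhouette`, which mirrors the role of the silhouette index in the text.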

OCPRS
The detailed algorithm and description of OCPRS are provided in the Supplementary Materials. From it, a risk stratification formula was derived to provide a quick and simple risk estimation using the PLK1_cy, PhosphoMet_cy, and SGK2_cy staining results.
The agglomerative distance D_h for the high-risk strata was computed from the means of PLK1_cy, PhosphoMet_cy, and SGK2_cy in the high-risk cluster, which were 1.490, 0.962, and 0.981, respectively. The agglomerative distance D_l for the low-risk strata was computed in the same way from the means of PLK1_cy, PhosphoMet_cy, and SGK2_cy in the low-risk cluster, which were 2.310, 1.840, and 1.590, respectively. The explicit expressions for D_h and D_l are given in the Supplementary Materials.
Lastly, each patient was dichotomized into the high- or low-risk strata by comparing D_h and D_l using Equation (S10) (Supplementary Materials).
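As an illustration only, the comparison of the two distances can be sketched as a nearest-centroid rule in Python. The cluster means are those reported above; the use of Euclidean distance, the tie-breaking rule, and the function name `assign_risk` are assumptions, since the actual definitions of the distances and of Equation (S10) are given only in the Supplementary Materials.

```python
import math

# Cluster means reported in the text, ordered as (PLK1_cy, PhosphoMet_cy, SGK2_cy).
HIGH_RISK_MEANS = (1.490, 0.962, 0.981)
LOW_RISK_MEANS = (2.310, 1.840, 1.590)

def assign_risk(scores):
    """Assign a patient to the strata whose cluster means are closest.

    Assumption: Euclidean distance stands in for the agglomerative
    distances D_h and D_l; ties are assigned to the high-risk strata.
    """
    d_h = math.dist(scores, HIGH_RISK_MEANS)
    d_l = math.dist(scores, LOW_RISK_MEANS)
    return "high" if d_h <= d_l else "low"

# Hypothetical staining scores for two patients.
first_patient = assign_risk((1.0, 1.0, 1.0))
second_patient = assign_risk((3.0, 2.0, 2.0))
```

Under these assumptions, a patient with weak cytoplasmic staining of all three proteins falls closer to the high-risk means, consistent with the lower mean staining scores reported for the high-risk cluster.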

Statistical Analyses
The patient baseline characteristics are presented as frequency, percentage, or mean and standard deviation (SD). Survival outcomes, including PFS and OS, between the high- and low-risk strata derived by OCPRS were analyzed using the Kaplan-Meier method. Survival differences between the high- and low-risk strata were tested using the log-rank test. A Cox proportional hazard regression model was used to identify the independent risks of baseline characteristics and OCPRS risk strata for survival outcomes. All statistical tests were two-sided, and a p-value < 0.05 was considered statistically significant. Analyses were performed in the computing environment R 3.5.3 (R Core Team, 2019).
Overall, 37 patients demonstrated alcohol addiction (36.3%), 75 patients admitted betel nut use (73.5%), and 87 patients were tobacco users (85.3%). Most primary sites were of buccal origin (58.8%). According to the pathological grading system, grade 1 was observed in 48 patients, grade 2 in 52 patients, and grade 3 in 2 patients. A positive margin of the surgical specimen was observed in 6 patients, LVI in 10 patients, PNI in 13 patients, and ENE in 9 patients. The mean OSCC tumor size was 2.4 ± 1.5 cm. Seventy-six patients (74.5%) presented positive lymph node invasion. According to the 8th edition of the AJCC/UICC TNM staging system [33,34], pathological stages I and II were observed in 61 patients, with pathological stages III and IV observed in 41 patients. Finally, 26 patients died, and disease progression was documented in 36 patients during the follow-up period.
Figure 2 presents Ward's agglomerative hierarchical clustering results according to the protein staining expression matrix illustrated in Figure 1. Figure 2A illustrates the average silhouette width of each cluster using a line plot, and the dashed line indicates that the optimal number of clusters is ten according to the silhouette index. Figure 2B presents the dendrogram of the hierarchical clustering results for the 32 protein stainings. Figure 2C visualizes the protein staining in a scatter plot, with each protein staining colored according to its assigned cluster.

Hierarchical Clustering Results of Protein Staining
During the follow-up period, 36 subjects experienced disease progression, and 26 fatalities were documented. Table 2 summarizes the distribution of death and progressed subjects according to the agglomerative distance dichotomous results. Two of the ten protein staining clusters derived from the hierarchical clustering analysis showed significant survival differences in PFS or OS. Within the 8th protein staining cluster (including PLK1_cy, PhosphoMet_cy, and SGK2_cy), 53 and 49 subjects were dichotomized into the high-risk and low-risk strata, respectively. The high-risk strata of the 8th cluster identified 69.4% (25 of 36) of progressed subjects and 76.9% (20 of 26) of dead subjects. The log-rank test showed a significant difference in PFS between the high-risk and low-risk strata (p = 0.012). Although the 5th protein staining cluster (including EGFR_mem, CDK6_nu, and PIM1_cy) showed significant survival differences in OS between strata (p = 0.015), identifying 100% (36 of 36) of progressed subjects and 96.2% (25 of 26) of dead subjects in the high-risk strata, the results were attributed to the extremely imbalanced dichotomous results between the high-risk (101 subjects) and low-risk (1 subject) strata. Hence, the 5th protein staining cluster was excluded from further analysis. The results demonstrated that protein staining, including PLK1_cy, PhosphoMet_cy, and SGK2_cy, could significantly predict oral cancer progression.
In Table 2, n_a and n_b indicate the numbers of progressed and dead subjects within strata, respectively, and p_a and p_b indicate the log-rank test p-values for progression-free and overall survival, respectively. * Statistically significant (p < 0.05).
Figure 3A compares the survival curves between the high-risk and low-risk strata derived from the 8th protein staining cluster. The high-risk strata showed significantly poorer PFS than the low-risk strata. Figure 3B and Supplementary Figure S1 show IHC staining of the 8th protein staining cluster in high-risk and low-risk patients (see also Figure 4). However, there were no differences in baseline characteristics between these two groups except for disease progression (47.2% vs. 22.4%, p = 0.016), as shown in Supplementary Table S2.
The PFS and OS results using the Cox proportional hazard regression analysis are summarized in Tables 3 and 4, respectively. In the survival analysis, the 8th protein staining cluster was included and analyzed together with the common survival predictors. In PFS, the high-risk subjects demonstrated a significantly increased risk for disease progression when compared with low-risk subjects in both univariate (hazard ratio (HR) = 2.41, 95% confidence interval (CI) = 1.19-4.91, p = 0.015) and multivariate (HR = 2.33, 95% CI = 1.14-4.75, p = 0.020) analyses. In the multivariate model, patients with ENE (HR = 2.66, 95% CI = 1.02-6.95, p = 0.045) demonstrated a significantly increased disease progression risk. In addition, Supplementary Table S3 compares the predictive ability of different protein localizations for overall mortality and disease progression. In our analysis, different localizations of 11 proteins had little impact on mortality and disease progression.
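As a quick consistency check on the reported Cox results, a hazard ratio and its 95% confidence interval can be converted back to an approximate two-sided Wald p-value. The Python sketch below assumes the interval is symmetric on the log scale (as a Wald interval is); the function name `wald_p_from_ci` is hypothetical, and this check is not part of the authors' analysis.

```python
import math

def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate standard error, z statistic, and two-sided p-value
    from a hazard ratio and its 95% confidence interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(HR)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

# Multivariate PFS estimate for the OCPRS high-risk strata reported in the text.
se, z, p = wald_p_from_ci(2.33, 1.14, 4.75)
```

For the multivariate OCPRS estimate (HR = 2.33, 95% CI = 1.14-4.75), this gives a p-value of roughly 0.02, in line with the reported p = 0.020.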

Hierarchical Clustering Analysis of the Optimal Combination of Protein Staining
In OS, no significant difference was observed in the risk of death between high-risk and low-risk subjects in the univariate analysis (HR = 1.79, 95% CI = 0.80-4.02, p = 0.157). However, OS was highly associated with common clinical factors, including grade and ENE. Patients with higher histological grade demonstrated a significantly increased mortality risk when compared with subjects presenting lower histological grade in both univariate (HR = 3.65, 95% CI = 1.46-9.10, p = 0.006) and multivariate (HR = 3.05, 95% CI = 1.17-7.90, p = 0.022) OS analyses. Patients with ENE showed a significantly increased mortality risk when compared with those without ENE in both univariate (HR = 6.94, 95% CI = 2.71-17.82, p < 0.001) and multivariate (HR = 3.46, 95% CI = 1.05-11.41, p = 0.042) OS analyses. Collectively, stratification via the 8th protein staining cluster (including PLK1_cy, PhosphoMet_cy, and SGK2_cy) was a novel predictor of disease progression in oral cancer.

Discussion
Oral cancer is a multifactorial malignancy. Several studies have evaluated demographic, epidemiological, histopathological, and molecular prognostic factors that could impact disease outcomes [2]. Previously, some researchers have analyzed the correlation between different factors and prognosis; however, no single factor determines the prognosis of patients with oral cancer on its own. Determining the outcomes and prognosis in patients with oral cancer should therefore incorporate diverse aspects and statistical methods. The hierarchical agglomerative clustering algorithm could effectively recognize the interactions within high-dimensional protein staining matrices by considering similarities in protein clusters. A corresponding risk module, the OCPRS risk estimation module, was derived from the agglomerative clustering results, enabling the generalization of the current IHC findings to clinical settings. Furthermore, the OCPRS module could be applied to survival analysis, including the Cox model, to investigate the simultaneous impact of baseline clinical characteristics and OCPRS risk on the survival outcome.
In our study, we collected OSCC demographic data, staging, imaging, surgical interventions, pathological interpretations, survivals, and outcomes concerning 163 patients, and analyzed the profiles of 102 patients. All patients underwent surgery, performed according to national guidelines in centralized settings, with adequate specimens acquired. No singular factor interfered with the progression and survival outcomes. This is favorable, suggesting that the prognosis of patients with OSCC should not consider solitary factors or single markers. Next, we incorporated all clinicopathological features and IHC staining results and utilized the hierarchical clustering analysis to determine optimal combinations of protein staining. We stratified patients into high-risk and low-risk groups according to the 8th protein staining cluster (including PLK1_cy, PhosphoMet_cy, and SGK2_cy). Finally, we developed a unique and novel approach to adopt the prognostic usefulness of a scoring system, OCPRS, for the diagnosis of patients with oral cancer. The OCPRS was generated based on a hierarchical agglomerative algorithm, sensitive to background noise that allows a decrease in type 1 errors (false positive). Consistent with previous studies, we demonstrated that the agglomerative hierarchical clustering algorithm is advantageous for handling high-dimensional data with uncertain interactions between factors [35][36][37]. OCPRS could recognize the interaction between factors by considering the similarities within a protein cluster.
Furthermore, we investigated and predicted disease progression using the Cox proportional hazards regression analysis, controlled for the influence of clinicopathological features, indicating that patients with high-risk cancer were 2.33 times more likely to experience cancer progression than those with low-risk cancer (95% CI for the hazard ratio = 1.14-4.75; p = 0.020). However, similar results were not observed in OS (HR = 1.79, 95% CI = 0.80-4.02, p = 0.157) in univariate analysis. Patients with ENE demonstrated an increased risk of progression in both univariate (p = 0.030) and multivariate models (p = 0.045). Patients with higher histological grade and ENE demonstrated an increased mortality risk in both the univariate (p = 0.006 and p < 0.001) and multivariate models (p = 0.022 and p = 0.042, respectively). Patients with higher histological grade presented a higher risk of progression in the univariate model (p = 0.043), with no significant effects in the multivariate model (p = 0.079). Patients with PNI, larger tumor size, lymph node invasion, and pathological stage demonstrated an increased mortality risk in the univariate model, with no significant effects in the multivariate model. These results suggest that there exist interactions that interfere with these clinicopathological factors [38,39].
EGFR and CDK6 were significantly associated with overall survival in the 5th protein staining cluster (including EGFR_mem, CDK6_nu, and PIM1_cy), which identified 100% (36 of 36) of progressed subjects and 96.2% (25 of 26) of dead subjects in the high-risk strata. However, the results were attributed to the extremely imbalanced dichotomous results between the high-risk (101 subjects) and low-risk (1 subject) strata. The results indicate that EGFR and CDK6 might achieve good sensitivity for detecting high-risk patients, but their low specificity for survival outcome estimation led to the omission of EGFR and CDK6 in the current study. Hence, further research on the interactions between EGFR, CDK6, and other factors, using machine learning approaches such as evolutionary and optimization algorithms, is necessary to identify high-risk subjects more accurately.
In addition, our study raised an interesting issue regarding the localization of IHC staining. In previous studies of ovarian cancer, different localizations of thyroid hormone receptors and αvβ3 integrin demonstrated different nuclear and non-nuclear signaling pathways in carcinogenesis [40,41]. The translocation of transmembrane proteins from the cell membrane to the nucleus may be another crucial mechanism [42]. In oral and oropharyngeal cancer, different membrane and cytoplasmic CD44 expressions also determined distinct clinical outcomes [43]. In head and neck cancer, translocation of ING3 affected tumorigenesis and cancer progression [44]. However, overall mortality and disease progression appeared unaffected by different localizations of 11 proteins in our study.

Conclusions
Here, we developed a new statistical approach using hierarchical clustering. By combining predictive IHC biomarkers, we newly defined OCPRS to predict PFS outcomes for oral cancer. OCPRS is clinically available and easily measurable for the staining of surgical specimens. High-risk OCPRS with evaluation of cytoplasmic PLK1, PhosphoMet, and SGK2 staining is useful for the stratification of clinical outcomes. Therefore, OCPRS may be used as a promising biomarker for predicting progression outcomes and stratifying risk groups for oral cancer.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/diagnostics11060925/s1, Figure S1. IHC staining (magnification 400×) of 8th protein staining cluster and associated H&E images (magnification 200×) of high-risk and low-risk patients, Table S1. The antibodies and retrieval buffers for each protein, Table S2. Baseline characteristics according to identified protein cluster, Table S3. Comparison of the prediction ability of different protein location on overall mortality and disease progression.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare that they have no conflict of interest.