Epigenetic Alterations Associated with the Overall Survival and Recurrence Free Survival among Oral Squamous Cell Carcinoma Patients.

Oral squamous cell carcinoma (OSCC) is a fatal disease caused by complex interactions between environmental, genomic, and epigenetic alterations. In the current study, we aimed to identify clusters of genes whose promoter methylation status correlated with various tested clinical features. Molecular datasets of genetic and methylation analysis based on whole-genome sequencing of 159 OSCC patients were obtained from The Cancer Genome Atlas (TCGA) data portal. Genes were clustered based on their methylation status and were tested for their association with demographic, pathological, and clinical features of the patients. Overall, seven clusters of genes were revealed that showed a significant association with the overall survival/recurrence-free survival of patients. The top-ranked genes within cluster 4, which showed the worst prognosis, primarily acted as paraneoplastic genes, while the genes within cluster 6 primarily acted as anti-tumor genes. A significant difference in mean age was found among the clusters. No significant correlation was found between tumor staging and the clusters. In conclusion, our results provide a proof-of-principle for the existence of phenotypic diversity among the epigenetic clusters of OSCC and demonstrate the utility of epigenetic alterations in developing new prognostic and therapeutic tools for OSCC patients.


Introduction
Head and neck cancer ranks as the sixth most common malignancy, with a yearly incidence of 830,000 cases worldwide. Oral squamous cell carcinoma (OSCC) comprises about 90% of the subtypes within the spectrum of head and neck neoplasms, with ~40%-50% mortality [1,2]. Research advances in the study of OSCC complexity have revealed that the induction and development of OSCC are due to a sum of genetic changes, epigenetic alterations, and environmental risk factors, especially tobacco, alcohol consumption, and viral infections [3][4][5][6][7].
Due to the heterogeneous nature of oral cancer, the functional and cosmetic results, and the coexistence of frequent medical comorbidities, treatment options of OSCC are evaluated through a multidisciplinary approach, before reaching a final plan for the specific OSCC patient. However, surgical resection with microscopically clear margins of the primary tumor and prophylactic or therapeutic clearance of the lymph nodes, followed by various reconstructive approaches, remains the fundamental treatment for OSCC with adjuvant therapy reserved for high-risk disease [8][9][10][11][12].
Briefly, the term epigenetics describes potentially reversible heritable changes in the genome that are not due to changes in the primary nucleotide sequence of the deoxyribonucleic acid (DNA) itself but rather to the interpretation of the genome. The primary mechanisms of epigenetic carcinogenesis involve DNA methylation, histone modifications, and small and non-coding RNAs (ncRNA), which ultimately orchestrate complex gene regulatory pathways [13,14]. In the past few years, it has become apparent that mutations in genes encoding proteins that regulate epigenetic modifications are common in human cancers. Some of these mutations drive tumor initiation, whereas others influence cell growth, immune invasion, metastasis, heterogeneity, and even drug resistance [15]. Given the potential reversibility of these changes, targeting epigenetic alterations is increasingly being recognized as an attractive strategy for cancer therapy [16].
Several advancements have recently been made in the development of potentially useful painless and non-invasive diagnostic and prognostic tools for OSCC, based on combined clinical and molecular data. Predicting human cancer-related clusters using high-throughput data sets and genetic and epigenetic networks is critical to gain an understanding of disease mechanisms, and is also essential for the development of new diagnostics and therapeutics. Clusters, in general, are of great importance because they not only provide concrete hypotheses about the molecular complexes and signaling pathways, but also offer mechanistic hypotheses about the causes of disease [17]. Clustering algorithms were initially proposed to identify the functional modules or protein complexes in particular phenotypes, and this work found that genes that cause similar diseases exhibit an increased tendency for their protein products to interact with each other. In recent years, many studies have shown the utility of these approaches in extracting disease-related clusters/subnetworks and inferring disease-causing genes [17][18][19][20][21][22].
Here, we aimed to identify the clusters of genes whose promoter methylation levels correlated with various tested clinical features, using available clinical and methylation data derived from 159 OSCC patients from The Cancer Genome Atlas (TCGA).

Methods
Molecular data sets of 528 head and neck carcinoma patients were obtained from the TCGA data portal (https://cancergenome.nih.gov/) [23] and the Genome Data Analysis Center (GDAC). Genomic processing of the molecular data sets was done using the cBioPortal for Cancer Genomics (http://www.cbioportal.org/) [24].
The molecular datasets included genetic and methylation analysis based on whole-genome sequencing. The human papillomavirus (HPV) status was defined using an empirical definition of >1000 mapped RNA-seq reads, primarily aligning to the viral genes E6 and E7 [25]. The HPV status by mapping of RNA-seq reads was concordant with the genomic, sequencing, and molecular data, and indicated that 36 tumors were HPV-positive and 243 were HPV-negative. To eliminate unnecessary molecular or genetic diversity, only HPV-negative and pathologically proven oral cavity tumors were included in this study (n = 159).
For the analysis of epigenetic alterations, we classified the samples into consensus clusters and determined differentially expressed marker genes for each subtype. In this way, we were able to divide the patients into several subgroups based on the methylation profiles of their genes. The clustering analysis for the study cohort was based on data available from the Broad Institute TCGA Genome Data Analysis Center (2016) [24]. Clusters were calculated with a consensus non-negative matrix factorization (NMF) clustering method, which converted the input data set (Table S1) to a non-negative matrix through column rank normalization and assigned differentially expressed major genes to different subtypes. This method is based on an unsupervised learning algorithm that identifies molecular patterns in complex biological systems when applied to gene expression data [24]. The top 4160 genes, with maximum standard deviations across beta values, were selected (default cutoff 2). Cophenetic correlation coefficients were applied to improve the assignment of samples to the different clusters. The reliability of each assignment was measured by how consistently a sample was assigned to the same cluster across many iterations of the clustering algorithm with random initializations. The consistency of each cluster was determined using the average silhouette value. The silhouette width compares the average distance of each sample to the samples in the same cluster with the smallest distance to samples not in the same cluster. A silhouette width close to 1 means that the sample is well clustered, whereas a silhouette width close to −1 means that the sample is misclassified. The silhouette width was calculated using the silhouette function of the R cluster package [26].
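The silhouette width described above can be illustrated with a minimal sketch. It assumes the standard definition s = (b − a) / max(a, b), where a is the mean distance to samples in the same cluster and b is the smallest mean distance to the samples of any other cluster, and it assumes Euclidean distance. The function names and the toy beta values are hypothetical and are not taken from the TCGA data or from the pipeline used in this study.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def silhouette_widths(samples, labels):
    """Return one silhouette width per sample; needs at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # singleton cluster: width is 0 by convention
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(euclidean(samples[i], samples[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(samples[i], samples[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def average_width_per_cluster(widths, labels):
    """Average the silhouette widths within each cluster."""
    totals = {}
    for w, lab in zip(widths, labels):
        s, n = totals.get(lab, (0.0, 0))
        totals[lab] = (s + w, n + 1)
    return {lab: s / n for lab, (s, n) in totals.items()}


# Hypothetical beta values for four samples and two marker genes.
toy_samples = [[0.10, 0.20], [0.15, 0.25], [0.80, 0.90], [0.85, 0.95]]
good_widths = silhouette_widths(toy_samples, [1, 1, 2, 2])
bad_widths = silhouette_widths(toy_samples, [1, 2, 1, 2])
```

With the first labeling every sample sits close to its own cluster, so the widths approach 1. With the second labeling the widths become negative, which is the misclassification signal mentioned in the text.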
The pathological staging was based on the American Joint Committee on Cancer, 7th edition [27]. Overall survival (OS) and recurrence-free survival (RFS) were estimated from the clinically available data using Kaplan-Meier analysis. Follow-up time was defined as the time that passed from the date of the initial diagnosis, as seen on the pathological report of the biopsy, until either the date of death or the last clinical follow-up, as recorded in the files. The correlation between several clinical parameters (such as pathological staging, alcohol and smoking consumption, gender, and race) and promoter methylation was examined to investigate the impact of epigenetic alterations on clinical characteristics.
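As an illustration of the Kaplan-Meier estimate mentioned above, the following is a minimal sketch of the product-limit calculation. The function name and the follow-up values are hypothetical; they are not patient data from this cohort, and the study itself may have used a statistical package rather than code of this kind.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up in months from diagnosis to the event or last follow-up
    events -- 1 if the endpoint (death or recurrence) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # endpoints observed exactly at time t
        leaving = 0   # everyone leaving the risk set at time t (events + censored)
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and event indicators for five patients.
toy_curve = kaplan_meier([6, 12, 12, 20, 30], [1, 1, 0, 1, 0])
```

Censored patients (event indicator 0) still count in the risk set up to their last follow-up, which is why the definition of follow-up time given above matters for the estimate.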

Statistical Analysis
Cross-tab analysis was done to investigate the correlation between clinical parameters and methylation status (cluster-based), using a two-sided Chi-square test. In addition, the association between recurrence and the different clusters was assessed using Fisher's exact test; P value <0.05 was considered to be statistically significant.
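The two tests named above can be sketched as follows. This is a minimal illustration, assuming a Pearson chi-square test of independence on a table of counts and a two-sided Fisher's exact test restricted to a 2 x 2 table (for example, one cluster versus all others against recurrence yes/no). The function names and the counts are hypothetical, and the sketch assumes every expected count is greater than zero.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability of the chi-square distribution (integer dof >= 1)."""
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < dof / 2.0:
        q += math.exp(-y) * y ** a / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi_square_test(table):
    """Return (statistic, degrees of freedom, p-value) for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test p-value for a 2 x 2 count table."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    denom = math.comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum all tables at least as extreme (no more probable) than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


# Hypothetical counts: rows are two patient groups, columns are yes/no.
stat, dof, p_chi2 = chi_square_test([[10, 20], [20, 10]])
p_fisher = fisher_exact_2x2([[8, 2], [1, 5]])
```

Fisher's exact test is usually preferred when some cells have small counts, which is plausible when recurrences are split across several clusters.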

Results
The study cohort included 159 patients, 105 males and 54 females. The mean age at diagnosis was 62 ± 13 years. Alcohol and tobacco consumption were reported in 63% and 51% of patients, respectively (Table 1). The primary tumor distribution is presented in Figure 1; the tongue was the most common primary tumor site (44%). Based on the aforementioned criteria, 79% of patients had negative margins, 10% had close margins, and 6% had positive margins. Perineural invasion (PNI) was found in 74 (46%) patients, of whom only 14 (18%) had local recurrence. Neck dissection (either selective or radical) was performed in 137 (86%) patients. A total of 70 (44%) patients had lymph node metastasis, as seen in the histopathology, with an average of 2 positive lymph nodes per patient. The mean follow-up period was 26 months. Thirty-eight patients presented with local recurrence (27 males and 11 females), and the average time to recurrence (measured from the day of diagnosis) was 16 months. Clinical parameters that were found to be significant risk factors for recurrence were alcohol consumption (P-value = 0.01), primary tumors located in the buccal mucosa (P-value = 0.03), positive surgical margins (P-value = 0.04), and the pathological T staging of the tumor (P-value = 0.05).

Clusters Analysis
Each sample that was included in this study was assigned to a specific cluster based on the promoter methylation of several marker genes. Overall, 4160 major genes were analyzed and were grouped into seven different clusters. Table 2 lists the top 10 methylated genes for each cluster. Each sample was assigned to the most representative cluster, based on core samples that were identified on the basis of a positive silhouette width. Thus, each cluster included samples that were more similar to the other samples in the same cluster than to those in any other cluster, as assessed using Student's t-test. The consistency of each cluster was determined by the average silhouette value: the silhouette width was calculated for each sample, and the overall average across all samples was then computed. Figure 2 shows the average silhouette value for each of the seven clusters, which ranged between 0.12 and 0.22. As shown, the samples showed a well-clustered pattern.
Table 2. List of top 10 marker genes with p ≤ 0.05 in each cluster (a positive value of the column difference means that the gene is upregulated in this subtype, and vice versa).

Correlation between Clusters Pattern and Demographic, Pathological, and Clinical Features
Demographically, the most predominant cluster was cluster 3 (30% of the patients), with a mean age of 67.55 years; the male-to-female ratio in this cluster was 26/20. A significant difference was found in mean age among the clusters: patients belonging to clusters 2 and 6 were significantly younger than patients in the other clusters (P-value < 0.05) (Table 3). In terms of pathological parameters (Table 4), no significant correlation was found between tumor staging and the different clusters. Oral tongue was the predominant primary tumor site in all clusters, except for cluster 4, which showed the worst prognosis in terms of mean survival time and recurrence-free survival (P-value = 0.001).
A pairwise comparison between the different clusters revealed a clear relation between the clusters and the overall survival/recurrence-free survival of patients (Figures 3 and 4, respectively). The overall mean survival time was 66 months. Nonetheless, the survival time for cluster 6 was significantly higher than the mean survival time of all other clusters, especially clusters 4 and 2 (80 versus 36 and 54 months, respectively; P-value = 0.04). Moreover, cluster 4 showed the worst survival time compared to all other clusters (P-value < 0.05). In terms of recurrence-free survival, the overall locoregional recurrence rate was 23%; cluster 1 showed a recurrence rate of 19%, and cluster 6 a rate of 20%. On the other hand, in clusters 2 and 3 the recurrence rate was 28% and 30%, respectively.
To suggest and pinpoint specific candidate genes that might be related to OSCC development, we identified the top five methylated genes in both the worst and the best clusters in terms of mean survival time and recurrence-free survival (i.e., cluster 4 and cluster 6, respectively). The top five methylated genes in cluster 4 were PNMAL2, RGS7BP, GCKR, PAX7, and IL2RA. The methylation event resulted in up-regulation and overexpression of all of them, except for the RGS7BP gene. Interestingly, these genes are well known for their paraneoplastic mechanism. On the other hand, 3 genes (ANKRD11, BCL11A, and FOXN3) out of the top five methylated genes in cluster 6 are well known for their anti-tumor action (Table 2).
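The source does not name the test behind the pairwise survival comparisons. The log-rank test is a common choice for comparing two Kaplan-Meier curves, so the following is a minimal sketch under that assumption, not a description of the analysis actually performed. The function name and the toy follow-up data are hypothetical, and the sketch assumes at least one observed event while both groups still have patients at risk.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*  -- follow-up times (lists) for each group
    events_* -- 1 if the endpoint was observed, 0 if censored
    """
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A at time t.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    stat = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical survival times (months); every endpoint is observed.
lr_stat, lr_p = logrank_test([1, 2], [1, 1], [3, 4], [1, 1])
```

With seven clusters there are 21 possible pairwise comparisons, so a correction for multiple testing would normally be considered when interpreting the resulting p-values.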

Discussion
OSCC is a fatal human disease that undoubtedly remains a health priority and poses significant therapeutic challenges. Although a slightly improved prognosis has been reported over the past decades, OSCC is still a major public health problem with poor overall 5-year survival rates. Therefore, the search for somatic genetic and epigenetic causes has been of interest to many head and neck surgeons, oncologists, and oncological researchers. Many laboratories have discovered oncogenes and tumor suppressors for OSCC. However, cumulative evidence has revealed more complex mechanisms underlying the development and progression of this disease, including, among others, interactions between genomic and epigenetic alterations [28][29][30][31]. Epigenetic alterations are responsible for the regulation of ontologically related gene expression networks, at an appropriate level of environmental conditions and time, leading to the development of both normal and diseased phenotypes.
As the first step toward identifying specific epigenetic alterations at specific genes that might be involved in the development and progression of OSCC, we tested the association between demographic, pathological, and clinical features of 159 OSCC patients and different clusters of expressed genes that differed significantly in their methylation patterns, using the available high-throughput data of OSCC patients. A significant heterogeneity of the epigenetic landscape in OSCC was observed in the current study, confirmed by the well-clustered pattern in the clustering analysis.
Interestingly, specific cluster patterns based on methylation status were found to be associated with worse overall survival and recurrence-free survival in patients. The survival difference between the two identified clusters, cluster 4 vs. cluster 6, represented a clear difference in the lifetime of these patients. These results agree with previous reports that showed a significant association between promoter methylation status and worse survival of OSCC patients [31][32][33][34]. These results demonstrate the utility of detecting epigenetic alterations for potential clinical application in OSCC patients. Notably, our data showed that oral tongue was the predominant primary tumor site in all clusters, except for cluster 4, which showed the worst prognosis in terms of mean survival time and recurrence-free survival. Although no correlation was observed between the cancer site and the cluster patterns, this result might indicate that epigenetic alterations are site-related.
In terms of pathological parameters, our data showed no significant correlation between the tumor staging or pathological features and the cluster patterns. This finding is consistent with a previous study, which showed a significant correlation between methylation status and worse patient survival, independent of other potential prognostic factors, such as tumor size, lymph node status, clinical stage, and history of tobacco and alcohol use [32]. This finding might indicate that different epigenetic alterations independently affect tumor biology and prognosis, and seemingly indicates that methylation events in particular genes can be used as an independent prognostic factor. We speculate that this might also be explained by the tumorigenic effect of epigenetic alterations, which alter the normal cell biology and trigger tumorigenesis but have little effect on the tumor microenvironment that directly affects the pathological features of the tumor.
Focusing on the individual gene members of the epigenetic signatures related to the worst and best survival (cluster 4 vs. cluster 6, respectively), we see that genes within cluster 4 are highly relevant to the paraneoplastic genes, while the genes within cluster 6 primarily act as anti-tumor genes. Among the top five genes in cluster 4, PNMAL2 is a protein-coding gene that acts as a paraneoplastic Ma antigen [35][36][37]. The next most methylated gene in this cluster was RGS7BP; down-regulation of this gene acts as a paraneoplastic alteration. Another gene is GCKR, which plays a significant role in cancer cell metabolism. The PAX7 gene plays critical roles during fetal development and cancer growth [38]. Finally, IL2RA encodes the interleukin 2 receptor subunit alpha protein, which is involved in suppressing the activity of the immune system against tumor cells [39]. To conclude, the overall epigenetic signature of this cluster is paraneoplastic. On the other hand, 3 out of the top five methylated genes in cluster 6 act as anti-tumor genes. For instance, ANKRD11 encodes an ankyrin repeat domain-containing protein that inhibits ligand-dependent activation of transcription and proliferation. The BCL11A gene encodes a C2H2-type zinc-finger protein. During hematopoietic cell differentiation, this gene is down-regulated, and it is believed to be involved in lymphoma pathogenesis, since translocations associated with B-cell malignancies also deregulate its expression. In cluster 6, this gene was down-regulated, which might demonstrate its anti-cancer role in this cluster [40]. Finally, FOXN3 is a protein-coding gene that has a well-known suppressive role in the progression of colon cancer [41]. To conclude, these results suggest and nominate a number of candidate genes that might be related to the prognosis of OSCC patients. However, further confirmation and validation studies need to be performed to avoid false-positive results.
Indeed, different drugs for human cancers directed at epigenetic modulators have entered clinical development in recent years. These drugs are directed at the three main players of epigenetic modification: first, the enzymes that place the active and repressive epigenetic marks (writers), targeted by, e.g., the DNA methyltransferase (DNMT) inhibitors 5-azacytidine (Azacitidine), 5-aza-2′-deoxycytidine (Decitabine), and others; second, those that remove these modifications (erasers), targeted by, e.g., histone deacetylase (HDAC) inhibitors; and finally, proteins that recognize the marks (readers). However, except for some agents, the clinical activity of the current epigenetic inhibitors as single agents is largely restricted to hematopoietic malignancies rather than solid cancer types [16][42][43][44][45]. The current results demonstrate the utility of targeting epigenetic modifications for improving the overall survival and recurrence-free survival among OSCC patients. Additionally, it seems that the potential of developing epigenetic agents against our suggested targets should be explored as the basis for rational combinations with other anti-cancer treatment strategies (i.e., chemotherapy, radiotherapy, and immune therapy). Here, we speculate that epigenome-wide studies of OSCC development in larger cohorts might help to locate or suggest single targets for OSCC treatment.
The current study has several limitations. First, some of the clinical information was missing for some patients; in particular, 29% of patients lacked information on the primary site of disease, which could lead to masked associations and false-positive results. Second, the follow-up of the patients was still under 5 years. Finally, while this study revealed interesting insights into the involvement of epigenetic events in OSCC development, the data should be confirmed and validated in new independent case-control cohorts.

Conclusions
In conclusion, our results provide a proof-of-principle for the existence of phenotypic diversity among the epigenetic clusters in OSCC, and demonstrate the utility of epigenetic alterations in developing new prognostic and therapeutic tools for OSCC patients. Given that staging and pathological systems are imperfect predictors of overall survival and recurrence-free survival among OSCC patients, our results suggest a number of novel gene clusters that could potentially be used as prognostic markers and in patient selection for adjuvant therapy following primary surgical treatment. We speculate that pharmacological manipulation targeting these specific genes might convert the most aggressive tumors into a more benign or manageable counterpart in the clinic, to improve survival.