Molecular Differences between Squamous Cell Carcinoma and Adenocarcinoma Cervical Cancer Subtypes: Potential Prognostic Biomarkers

The most frequently diagnosed histological types of cervical cancer (CC) are squamous cell carcinoma (SCC) and adenocarcinoma (ADC). Clinically, the prognosis of both types is controversial. A molecular profile that distinguishes each histological subtype and predicts the prognosis would be of great benefit to CC patients. Methods: The transcriptome of CC patients from The Cancer Genome Atlas (TCGA) was analyzed using the DESeq2 package to obtain the differentially expressed genes (DEGs) between ADC and SCC. The DEGs were validated on a publicly available Mexican-Mestizo patient transcriptome dataset (GSE56303). The global biological pathways involving the DEGs were obtained using the Webgestalt platform. The associations of the DEGs with Overall Survival (OS) were assessed. Finally, three DEGs were validated by RT-qPCR in an independent cohort of Mexican patients. Results. The molecular profiles of ADC and SCC of the CC patients of the TCGA database and the Mexican-Mestizo cohort (GSE56303) were determined obtaining 1768 and 88 DEGs, respectively. Strikingly, 70 genes were concordant—with similar Log2FoldChange values—in both cohorts. The 70 DEGs were involved in IL-17, JAK/STAT, and Ras signaling. Kaplan-Meier OS analysis from the Mexican-Mestizo cohort showed that higher GABRB2 and TSPAN8 and lower TMEM40 expression were associated with better OS. Similar results were found in an independent Mexican cohort. Conclusions: Molecular differences were detected between the ADC and SCC subtypes; however, further studies are required to define the appropriate prognostic biomarker for each histological type.


Introduction
Despite early screening programs, cervical cancer (CC) is an unresolved health issue. Although its incidence has decreased by approximately 40%, it is currently the fourth leading cause of cancer-related death in women, accounting for an estimated 341,000 deaths worldwide [1]. Unfortunately, in developing countries, most tumors are diagnosed at advanced clinical stages (75-90%), at which point the tumors are often unresponsive to

Analysis of Differential Expression
TCGA data of all CC samples (Stage I-IV) [17,18] were downloaded using the Bioconductor package TCGA biolinks [19]. Differential expression analysis was carried out using the DESeq2 package [20], where an adjusted (adj) p-value < 0.01 and a log2FoldChange >2 or <2 were considered significant. Additionally, we analyzed the expression profiles of an independent of 89 Mexico-Mestizo patients diagnosed with local advanced cervical cancer (LACC), obtained from the publicly available database GEO GSE56303 using the R package GEOquery [2,21,22]. This dataset was previously used to obtain an mRNA signature capable of predicting the treatment response in CC patients [2]. The data obtained were normalized using the robust multiarray averaging (RMA) method, and differential expression analysis was carried out using the limma package [23]. An adj p-value < 0.05 was considered in order to indicate a significant difference. To construct the heatmaps, a Z-Score transformation was applied to the normalized gene expression values; next, we employed a Hierarchical cluster analysis using the Euclidian distance and the complete-linkage clustering algorithm with the hclust R package.

Cervical Samples
The Mexican independent cohort included 31 CC patients from 2010 to 2013 through the Instituto Nacional de Cancerología of Mexico City (INCan). Of these, fifteen were ADCs and sixteen were SCCs. This study was approved by INCan's Review Board and Ethics Committee (015/012/IBI-CEI/961/15). All patients of this study agreed and signed the consent form. After a punch biopsy, tumor samples were segmented into two pieces, one for pathological confirmation and another for nucleic acid separation.

RNA Isolation and qPCR
In order to validate differentially expressed genes (DEG), we performed validation by qPCR. Total RNA was extracted from three cuts of FFPE tissue blocks using the RNeasy FFPE Kit (Cat. No. 73504 Qiagen. Hilden, Germany). RNA was quantified in a Qubit 3.0 Fluorometer with a Qubit RNA HS Assay Kit (Cat. No. Q32852. Thermo Fisher Scientific. Waltham, MA, USA). Reverse transcription was performed with 1000 ng of RNA using a High-Capacity cDNA Reverse Transcription Kit (Cat. No. 4374966. Thermo Fisher Scientific. Vilnius, Lituania), following the manufacturer's recommendations. Quantitative real time PCR was performed with Luminaris HiGreen qPCR Master Mix (Cat. No. K0994) on Step One System (Thermo Fisher Scientific) with primers for GABRB2 Fw: GCACTGGGCAGACTAAGTTGG Rv: GGGTCATTGACACTCTGCGCA, TSPAN8 Fw: TGGTCCTGTATTGCCAGATCG Rv: GGTCTAGCTAGCCGAGACATTT and TMEM40 Fw: CCAGAAGTTTAGGCTGACAGGGT Rv: GGGCTGGTGCCAACATTCAGAC, data were normalized with GAPDH housekeeping gene.

CC Transcriptome and Pathway Analyses
Pathway analysis of the differentially expressed genes of both cohorts (TCGA and Mexican-Mestizo cohort) was assessed using a Gene Set Enrichment Analysis (GSEA) with the Webgestalt platform [24], where a p-value < 0.05 was considered to indicate a significant difference.

OS Analysis
We obtained the clinical data from the 309 cervical tumors from the TCGA, the 89 patients from the Mexican-Mestizo cohort and the 31 patients in the Mexican independent cohort. For each cohort, patients were divided in two groups: high and low, depending in the median expression of each gene. Next, the OS of each group was evaluated using the Kaplan-Meier method, and the statistical significance of survival differences was determined with the log-rank test. Univariate and multivariate Cox proportional hazard regressions were assessed using the survival package, implemented in the R language.
A p-value < 0.05 was considered as significant. OS was defined as the time diagnosis to the date of death or last contact. It is important to explain that for the OS analyses, we possessed complete 5-year follow-up information for the Mexican-Mestizo cohort and the Mexican independent cohort. However, for the TCGA patients, we used only the data that were available, since follow-up information was not available for all patients 3. Results 3.1. Determination of the Molecular Profile Distinguishing ADC and SCC among CC Patients from the TCGA Database Our first aim was to determine molecular differences between SCC and ADC. Thus, we analyzed the transcriptomes of the TCGA CC cohort. This dataset contains information on 260 SCC and 49 ADC tumors from patients of different ethnicities. The clinical characteristics are shown in Table 1. It should be noted that the clinical data of the TCGA cohort only contained the HPV information from 22 patients (7%) (Table S1), whereas 63 patients (70.78%) from the Mexican-Mestizo cohort were positive to HPV infection, along with all patients from Mexican independent cohort (Table S2). The results showed 1768 unique DEGs between SCC and ADC (adj p-value < 0.05, log2FoldChange >2 or <2) (Table S3), where most of the ADC tumors were clustered together ( Figure 1). This finding confirms the existence of molecular differences between the two histological types of cervical tumors. A list of the top DEGs between the two histological types is depicted in Table 2.

Validation of the Molecular Profile Obtained from the TCGA Database in a Mexican-Mestizo Independent Cohort
To validate the differential expression profile between SCC and ADC obtained from the TCGA data analysis, we utilized an online, publicly available transcriptome dataset (GSE56303) [2]. This dataset only includes a Mexican-Mestizo population, which comprises 89 CC patients, 81 of whom had SCC and 8 of whom had ADC. The analysis of DEGs between the two histological types revealed a profile of 130 transcripts that separated CC patients into ADC vs. SCC subgroups (p adj < 0.05) (Table S4). Strikingly, the 130 differentially expressed transcripts corresponded to 88 unique genes, of which 52 were overexpressed and 36 were downregulated ( Figure 2). HNF1A-AS1, SPINK1, EPS8L3, PROM1, and TM4SF5 were the most overexpressed genes, whereas CLCA2, CALML3, PKP1, GPR87, and SPRR2A were the most downregulated genes in ADC (Table 3). This finding suggested a histology-driven molecular profile.
Notably, of the 88 unique genes differentially expressed between ADC and SCC patients in the Mexican-Mestizo cohort, 70 were also differentially expressed in the TCGA cohort and even presented a similar log2FoldChange values ( Figure 3) (Table S5). These findings show a high concordance of the molecular differences of the two histological types between cohorts.  Each column represents a patient with CC, and each row represents the expressio gene. Gene expression changes are represented in blue (upregulated), red (downregulated white (no significant change or the absence of data). Patients with SCC are represented in gree ADCs in purple. There were 1768 unique DEGs between SCC and ADC tumors (adj p-value < log2FoldChange >2 or <2), and most of the ADC tumors were clustered together.

Validation of the Molecular Profile Obtained from the TCGA Database in a Mexican-Mestizo Independent Cohort
To validate the differential expression profile between SCC and ADC obtained the TCGA data analysis, we utilized an online, publicly available transcriptome da (GSE56303) [2]. This dataset only includes a Mexican-Mestizo population, which prises 89 CC patients, 81 of whom had SCC and 8 of whom had ADC. The analy DEGs between the two histological types revealed a profile of 130 transcripts that rated CC patients into ADC vs. SCC subgroups (p adj < 0.05) (Table S4). Strikingly, th differentially expressed transcripts corresponded to 88 unique genes, of which 52 overexpressed and 36 were downregulated ( Figure 2). HNF1A-AS1, SPINK1, EP PROM1, and TM4SF5 were the most overexpressed genes, whereas CLCA2, CAL PKP1, GPR87, and SPRR2A were the most downregulated genes in ADC (Table 3). finding suggested a histology-driven molecular profile. Each column represents a patient with CC, and each row represents the expression of a gene. Gene expression changes are represented in blue (upregulated), red (downregulated), and white (no significant change or the absence of data). Patients with SCC are represented in green and ADCs in purple. There were 1768 unique DEGs between SCC and ADC tumors (adj p-value < 0.05, log2FoldChange >2 or <2), and most of the ADC tumors were clustered together. Notably, of the 88 unique genes differentially expressed between ADC and SCC patients in the Mexican-Mestizo cohort, 70 were also differentially expressed in the TCGA cohort and even presented a similar log2FoldChange values ( Figure 3) (Table S5). These findings show a high concordance of the molecular differences of the two histological types between cohorts.

Signaling Pathway Enrichment Analysis in DEGs between ADC and SCC Subtypes
Once we recognized that there was a differential expression profile between SCC and ADC, we decided to determine which signaling pathways the DEGs were involved in. For this purpose, we performed an analysis using the Webgestalt platform exploiting three different databases-the Kyoto Encyclopedia of Genes and Genomes (KEGG), WikiPathways, and Reactome databases-to obtain a global pathway assessment of the deregulation of both subtypes. It is important to mention that each database has a different set of manually curated genes; thus, the results could differ between them. In the KEGG database analysis, the pathways with the highest number of deregulated genes were the cytokine-cytokine receptor interaction, estrogen signaling and IL-17 signaling pathways. Interestingly, we also found that the JAK-STAT signaling pathway was significantly enriched in this analysis ( Figure 4A). When assessing the WikiPathways database, we found that Ras signaling was highly enriched in DEGs ( Figure 4B). In the Reactome database analysis, we also found deregulation in pathways associated with the immune response, such as cytokine-cytokine receptor interaction, and again, the estrogen signaling pathways, and the IL-17 and JAK-STAT signaling pathways ( Figure 4C). Interestingly, our results show that SCC tumors have a higher level of activation of critical cancer pathways, such as IL17, JAK-STAT, and Ras, than ADC tumors ( Figure 4A-C). Overall, we assessed different databases and observed enrichment of multiple signaling pathways that are commonly dysregulated in cancer, suggesting that our results are highly reliable and independent of the database used.
tion of both subtypes. It is important to mention that each database has a different set of manually curated genes; thus, the results could differ between them. In the KEGG database analysis, the pathways with the highest number of deregulated genes were the cytokine-cytokine receptor interaction, estrogen signaling and IL-17 signaling pathways. Interestingly, we also found that the JAK-STAT signaling pathway was significantly enriched in this analysis ( Figure 4A). When assessing the WikiPathways database, we found that Ras signaling was highly enriched in DEGs ( Figure 4B). In the Reactome database analysis, we also found deregulation in pathways associated with the immune response, such as cytokine-cytokine receptor interaction, and again, the estrogen signaling pathways, and the IL-17 and JAK-STAT signaling pathways ( Figure 4C). Interestingly, our results show that SCC tumors have a higher level of activation of critical cancer pathways, such as IL17, JAK-STAT, and Ras, than ADC tumors ( Figure 4A-C). Overall, we assessed different databases and observed enrichment of multiple signaling pathways that are commonly dysregulated in cancer, suggesting that our results are highly reliable and independent of the database used.  To determine whether the DEGs between SCC and ADC are correlated with OS, we used survival data from the TCGA, Mexican-Mestizo, and independent Mexican cohorts. The Kaplan-Meier curves showed no significant difference between the survival of SCC vs. ADC patients in the TCGA dataset ( Figure S1). However, when we used the survival data from the Mexican-Mestizo cohort, the Kaplan-Meier analysis showed that high expression of the DEGs GABRB2 and TSPAN8 and underexpression of TMEM40 were associated with favorable OS ( Figure 5). Moreover, the univariate and multivariate cox-regressions, using these genes, HPV infection, and tumor histology as covariates, showed that only TMEM40 and GABRB2 were able to independently differentiate the OS of the patients, whereas HPV infection or tumor histology were not significant factors for the OS (Likelihood ratio test p = 0.05, Wald test p = 0.09, Logrank test p = 0.06) (Table 4). Additionally, we performed qPCR to validate the expression of the DEGs GABRB2, TSPAN8, and TMEM40 in the 31 patients of the independent Mexican cohort. Using the median expression value of each gene validated, patients were sorted into two groups, those with high and low expression. Similarly to Mexican-Mestizo cohort, Kaplan-Meier curves showed that Mexican patients with high expression of GABRB2 and underexpression of TMEM40 were associated with favorable OS (Figure 6).
color represents the significance of the analysis, where red shows a more significantly enriched pathway.
To determine whether the DEGs between SCC and ADC are correlated with OS, we used survival data from the TCGA, Mexican-Mestizo, and independent Mexican cohorts. The Kaplan-Meier curves showed no significant difference between the survival of SCC vs. ADC patients in the TCGA dataset ( Figure S1). However, when we used the survival data from the Mexican-Mestizo cohort, the Kaplan-Meier analysis showed that high expression of the DEGs GABRB2 and TSPAN8 and underexpression of TMEM40 were associated with favorable OS ( Figure 5). Moreover, the univariate and multivariate cox-regressions, using these genes, HPV infection, and tumor histology as covariates, showed that only TMEM40 and GABRB2 were able to independently differentiate the OS of the patients, whereas HPV infection or tumor histology were not significant factors for the OS (Likelihood ratio test p = 0.05, Wald test p = 0.09, Logrank test p = 0.06) ( Table 4). Additionally, we performed qPCR to validate the expression of the DEGs GABRB2, TSPAN8, and TMEM40 in the 31 patients of the independent Mexican cohort. Using the median expression value of each gene validated, patients were sorted into two groups, those with high and low expression. Similarly to Mexican-Mestizo cohort, Kaplan-Meier curves showed that Mexican patients with high expression of GABRB2 and underexpression of TMEM40 were associated with favorable OS (Figure 6).

Discussion
It is well described that persistent infection with HR-HPV is the causal factor in the development of CC [25]. Several reports have described that HPV16 is more frequent in SCCs, and HPV18 and HPV45 in ADCs [5][6][7]. Hence, it is to be expected that there are molecular and clinical differences in both histologic subtypes. In this regard, several reports have described controversial data about the prognosis of CC patients with ADC and SCC. For instance, some of them have reported that ADC is associated with a poorer prognosis than SCC [9][10][11]. However, other studies described that ADC and SCC displayed equivalent survival outcomes [11,12]. For instance, in a small cohort of patients, the 5-year

Discussion
It is well described that persistent infection with HR-HPV is the causal factor in the development of CC [25]. Several reports have described that HPV16 is more frequent in SCCs, and HPV18 and HPV45 in ADCs [5][6][7]. Hence, it is to be expected that there are molecular and clinical differences in both histologic subtypes. In this regard, several reports have described controversial data about the prognosis of CC patients with ADC and SCC. For instance, some of them have reported that ADC is associated with a poorer prognosis than SCC [9][10][11]. However, other studies described that ADC and SCC displayed equivalent survival outcomes [11,12]. For instance, in a small cohort of patients, the 5-year OS was comparable for ADC and SCC (98.2% and 95.2%, respectively, p = 0.369) [14]. Additionally, in a larger cohort of patients, it was reported that there was no difference in the 5-year survival rate between SCC and ADC patients (87.3% vs. 82.4%; p > 0.05) [12]. In this respect, researchers have focused on identifying molecular differences unique to one or another histological subtype so that they can be used as prognostic biomarkers for these patients. Nonetheless, this field needs to be investigated further. In this work, we identified a molecular profile of DEGs between ADC and SCC, in CC patients from TCGA and in the independent Mexican-Mestizo cohort. Despite the fact that the TCGA database includes patients from different ethnic groups and all clinical stages, we detected that, of the DEGs, 70 overlapped with genes from the Mexican-Mestizo database and even displayed a similar log2FoldChange values, thus demonstrating a high concordance in the molecular differences between the two histological types in the datasets. Similarly, other studies have demonstrated molecular differences between SCC and ADC histologic types of CC patients. For instance, Lin W.E. et al. analyzed TCGA data and reported 1733 DEGs between ADC and SCC from CC patients [13]. Likewise, Chao A. et al., through microarray assays, revealed a profile of DEGs between ADC and SCC samples from CC patients; TSPAN-3, CEACAM5, TACSTD1, MSLN, and S100P were of particular interest, as they were overexpressed in ADC [15]. Moreover, transcription factors encoding genes PAX6, PDX1, HNF4A, HNF1A, HNF4G, and FOXA3 show higher expression levels in cervical ADC as compared with cervical SCC [16]. These findings confirm that there are molecular differences between ADC and SCC.
The molecular differences between ADC and SCC also have an impact on signaling pathways that promote disease. In this work, the KEGG, WikiPathways, and Reactome databases were employed to elucidate these differences. We found that SCC tumors have a higher level of activation of critical cancer pathways, such as IL-17, JAK/STAT, and Ras signaling, compared to ADC tumors. In this respect, it is well-known that IL-17 is the central cytokine of the Th17 response during persistent infection with HR-HPV; this response triggers chronic inflammation of a long duration and the production of IL-17, among other pro-inflammatory cytokines, hence creating a favorable environment for the tumor development associated with associated with poor CC prognosis [26][27][28]. Additionally, Punt S. et al. reported that the predominant cell type expressing IL-17 in SCC CC is the neutrophilic granulocyte and directly contributes to tumorigenesis [29,30]. Although there are no studies that evaluate the correlation between IL-17 and its histological subtypes (ADC and SCC), we observed that SCC has a higher level of IL-17 pathway activation. JAK/STAT signaling was another deregulated pathway in our study. This pathway has been found involved in proliferation, invasion, survival, inflammation, and immunity in CC patients [31]. Strikingly, there is a connection between JAK/STAT pathway and Th17 cells, since JAK/STAT pathway is necessary for the differentiation of Th cells [32]. The aberrant signaling of JAK/STAT pathway has not been associated with any histological subtype (ADC vs. SCC). However, it is expected that this pathway is altered in both subtypes, nonetheless we were able to distinguish that it is enriched in the SCC subtype, which is an interesting fact that needs to be analyzed in depth. Another pathway enriched in our study was Ras signaling. Aberrant activation of this pathway is common in several cancers, including CC, which often results from the presence of mutations and amplifications of KRAS. For instance, Zou Y. et al. reported that the frequency of the KRAS mutation ranged from 8.0-17.5% in cervical ADC [33], to absent or rare in SCC, which suggested that KRAS mutations are frequent and might be a driving factor for the development of cervical ADC but not SCC [34,35]. However, in our study, we observed that the Ras pathway has a higher level of activation in SCC than ADC, which may be due to mutations or alterations in other elements of the pathway [33]. According to the fact that these pathways are more overexpressed in SCC than in ADC, it leads us to speculate that SCC seems to be more aggressive than ADC; however, the candidate pathways and genes for distinguishing the subtypes need to be further verified.
A main focus of this work was to identify potential genes associated with the prognosis of ADC or SCC, and we generated Kaplan-Meier survival curves with the Mexican-Mestizo, Mexican, and TCGA patient's clinical data. We noticed that GABRB2 and TSPAN8 overexpression and TMEM40 under-expression were associated with favorable OS, in the Mexican-Mestizo and independent Mexican cohorts. Regarding the role of these genes in cancer, there has been only one report associating GABRB2 overexpression with lymph node metastasis in thyroid cancer [36]. TSPAN8 expression is correlated with a poor prognosis in breast cancer [37], renal cell carcinoma [38], and pancreatic cancer [39]. However, its role in CC had not yet been studied. On the other hand, TMEM40 has been poorly studied in cancer, but a study proposed that this protein plays a crucial role in proliferation and apoptosis via the p53 signaling pathway [40] and may be a potential diagnostic biomarker for bladder cancer [41]. In our study, we noticed that the expression levels of GABRB2, TSPAN8, and TMEM40 were not associated with the OS of TCGA CC patients ( Figure  S1). In this respect, it is well-known that the TCGA database contains incomplete or no information regarding the survival of many patients. This often occurs because almost all of the patients are untreated and thus have no response data and short follow-up periods [17]. Moreover, Lin W.E. noted that the TCGA database is interesting because it includes patients of different ethnicities; however, although TCGA datasets are generally large, they may not be representative of the general population [13]. It is possible that for this reason, in our analysis, we did not find any association between the expression of these genes and OS with the TCGA CC patients. Likewise, it is reasonable to assume that there is a consistent molecular profile for each histological subtype. Taken together, it is important to note that we found molecular differences between the two histologies (ADC vs. SCC); nevertheless, in order to identify a specific prognostic biomarker for each subtype it is necessary to expand the sample of ADCs, since, in the cohorts used, including the TCGA cohort, the number of ADCs is limited. Therefore, the correlation of these genes with the survival or prognosis of patients' needs to be studied meticulously.

Conclusions
These findings are relevant since they show a high concordance of molecular differences for ADC versus SCC between independent cohorts, independently of HPV type, thus opening a window of opportunity to identify new prognostic biomarkers by histological type. Nonetheless, further studies are required to define these findings.