Comprehensive Genomic Review of TCGA Head and Neck Squamous Cell Carcinomas (HNSCC)

The aim of this present study was to comprehensively describe somatic DNA alterations and transcriptional alterations in the last extension of the HNSCC subsets in TCGA, encompassing a total of 528 tumours. In order to achieve this goal, transcriptional analysis, functional enrichment assays, survival analysis, somatic copy number alteration analysis and somatic alteration analysis were carried out. A total of 3491 deregulated genes were found in HNSCC patients, and the functional analysis carried out determined that tissue development and cell differentiation were the most relevant signalling pathways in upregulated and downregulated genes, respectively. Somatic copy number alteration analysis showed a “top five” altered HNSCC genes: CDKN2A (deleted in 32.03% of patients), CDKN2B (deleted in 28.34% of patients), PPFIA1 (amplified in 26.02% of patients), FADD (amplified in 25.63% of patients) and ANO1 (amplified in 25.44% of patients). Somatic mutations analysis revealed TP53 mutation in 72% of the tumour samples followed by TTN (39%), FAT1 (23%) and MUC16 (19%). Another interesting result is the mutual exclusivity pattern that was discovered between the TP53 and PIK3CA mutations, and the co-occurrence of CDKN2A with the TP53 and FAT1 alterations. On analysis to relate differential expression genes and somatic copy number alterations, some genes were overexpressed and amplified, for example, FOXL2, but other deleted genes also showed overexpression, such as CDKN2A. Survival analysis revealed that overexpression of some oncogenes, such as EGFR, CDK6 or CDK4 were associated with poorer prognosis tumours. These new findings help us to develop new therapies and programs for the prevention of HNSCC.


Introduction
Head and neck squamous cell carcinomas (HNSCCs) encompass a heterogeneous group of malignancies that begin in the upper aerodigestive tract mucosae. HNSCCs include cancers of the oral cavity, larynx, and oropharynx [1,2]. According to the GLOBOCAN project, HNSCCs are the sixth most common cancers worldwide, with an annual estimation of 355,000 attributable deaths and 633,000 incident cases [3]. In this vein, the overall five-year survival rate for HNSCCs has not decreased in recent decades and it is still about 60% [4]. Treatment of HNSCCs relies on different combinations of surgery, radiotherapy, and chemotherapy. The rationale behind therapy decision-making has been classically based on TNM staging and the site involved; nonetheless, in recent years, the usefulness of the presence of high-risk human papillomavirus (HR-HPV) infection has been proven [5]. Tobacco use and alcohol intake are the best-characterised modifiable risk factors of HNSCCs, although its magnitude factors may change significantly within the variation of each subgroup [6].
Until 2010, very few efforts had been made in order to unravel the molecular biology of HNSCCs [7]. Recent initiatives have addressed this issue. Among these, the Cancer Genome Atlas (TCGA) consortium has provided the largest publicly available HNSCC dataset, which is considered to be a cornerstone for future research [8]. The impact of differentially expressed genes (DEGs) on the aggressiveness and evolution of tumours has revealed new roles which contribute to carcinogenesis. Today's primary challenge is to differentiate between somatic mutations with oncogenic potential and all of those which crop up later in a tumour's lifetime without contributing to the tumour's development. In this sense, the concept of positive selection has opened a new avenue for the discovery of novel tumorigenic roles in cancer. Mutations that improve the fitness of the tumour cells would be under positive selection pressure, meaning, therefore, that they would recur at elevated rates across patients. This principle is widely used to identify these signals and differentiate them from alterations with real oncogenic power [2,9]. Since somatic genomic alterations are still considered to be very important oncogenic events, changes in tumour transcriptomes have now outdated the former classification of oncogenes which was based exclusively on their mutational profile [10]. The differences in mRNA expression is a new and important point for (1) discovering new signatures related to patients' responses to drugs or treatments, (2) finding out potential oncogenic events and (3) providing further understanding of tumour cell signalling and the adaptability of tumour cells to changing environments. Nevertheless, tumour staging for HNSCC based on transcriptional signatures is not commonly used in treatment decision-making, although it can be very informative from a biological perspective [11,12]. In the past, sequencing technology was limited by the huge amount of generated data that required high levels of computational expertise, nowadays, however, bioinformatic advances have provided the possibility for this information to be integrated, leaving a huge window of opportunity for modern oncology [13].
The aim of the present study was to comprehensively describe somatic DNA alterations and transcriptional alterations in the last extension of the HNSCC subsets in TCGA, encompassing a total of 528 tumours.

Samples and Clinical Data
Data from TCGA-HNSCC patients was collected in a database that had been specifically designed for this purpose, with repeated verification. The clinical variables used were age, sex, tobacco consumption, alcohol consumption, tumour stage and HPV (human papilloma virus) infection. The HPV infection status was determined according to the TCGA network group 2015 classification, using an empiric definition of >1000 mapped RNA sequencing (RNA-Seq) reads, primarily aligning to viral genes E6 and E7.

Transcriptional Analysis
RNA-seq counts were downloaded from the Genomic Data Commons (GDC) data portal. The normalisation and differential expression analyses were then calculated following the instructions detailed in the TCGAbiolinks R package [14]. This package uses a first within-lane normalization to adjust for GC-content effect on read counts, and a second between-lane normalization to adjust for distributional differences. Differentially expressed genes were considered as those which had a (log2(FC) > 1) and false discovery range (FDR) q-value < 0.01 when comparing the mRNA level expression between the TCGA-HNSCC tumour and the normal samples. "exactTest" methods were used to calculate the p-value FDR correction that computes a gene-wise exact test for differences in the means between two groups of negative-binomially distributed counts.
Survival analysis was performed by computing all the genes of the human transcriptome to see which of them can have an impact on HNSCC patients and in what way (if greater or lesser expression) it correlates with worse survival. After that, we have filtered data to show only those that have been linked to cancer and, therefore, fall under the classification of COSMIC cancer gene census (CGC).

Functional Enrichment Assays
Differentially expressed genes were divided into up-or down-regulated groups. Both lists were then used for functional annotation into biological process categories. For this purpose, we used the Metascape web app express analysis mode. This app allows the user to use expression information to determine how GO categories are differentially affected by higher or lower expressed genes in tumour samples.

Somatic Copy Number Alteration (SCNA) Analysis
The performance of the GISTIC algorithm will not be revised in this paper. In summary, we downloaded GISTIC results from the Broad Institute's latest data Firehose run. Significant peak alterations were identified as those with a q-value < 0.25, as described in GISTIC documentation. Using threshold data, we were able to classify each patient for each gene into deeply amplified (+2), diploid (+1, 0 and −1) or deeply deleted (−2). The genes with alterations which affected more than 10% of the TCGA-HNSCC patients were considered as positively selected oncodriver alterations.

Somatic Alteration Analysis
Both the results from the MuTect2 and the VarScan variant callers were downloaded from the GDC data portal using the TCGAbiolinks R package (R version 3.6.1, Vienna, Austria). Mutation annotation format (MAF) files were concatenated and filtered from the duplicated and silent mutations using the Maftools R package. Likewise, all of the mutation data calculations were performed with this last package.

Samples and Clinical Data
The TCGA-HNSCC cohort included 528 tumour samples resected from different locations across the patients' heads and necks. This cohort consisted primarily of tumours from the oral cavity (72.40%), larynx (21.96%) and oropharynx (5.64%). In fact, most of the samples came from the tongue, tonsil and larynx (91.24% of the total 528 samples).
With regards to the patients' data, there was a larger male than female representation (73.11% versus 26.89%, respectively) and samples from 50 to 70-year-old subjects (62.12% of the whole cohort) were the most abundant. In terms of habits, 76.70% of patients were smokers or former smokers, and 66.66% had a history of alcohol consumption. Only tumours in 38 patients (7.19%) could not be associated with either of these two unsafe habits.
With regards to the HPV infection status, we were able to conclude that the current TCGA-HNSC cohort was formed by 36 HPV (+), 243 HPV (−) and 249 HPV "unknown" status samples. Regarding the resection site, the highest proportion of HPV (+) were found in the tonsil (78.94%), whereas the least common areas were the cheek mucosa, lip, mouth and oropharynx, with no infected samples. The second and third positions were BOT (base of tongue) (54.54%) and hypopharynx (50%), respectively.

Transcriptional Alterations and Functional Enrichment Assays
In our study, we identified 3491 DEGs in HNSCC tumour samples when compared to healthy oral tissue. In Figure 1A  The functional annotation of the top transcriptionally altered genes (log (FC) > 4) demonstrated their integration in tumour-relevant signalling pathways. In particular, we found upregulated genes involved in tissue development and the repression of those implicated in cell differentiation ( Figure  2). There were also other genetic circuits enriched in these DEGs involved in cellular matrix reorganization or metabolic rewiring, both of which are important activities for tumour cell fitness. Further analysis indicated the presence of differentially expressed cancer-related genes. In total, 148 oncogenes and tumour suppressors from the COSMIC cancer genes census list were deregulated in these tumours. We highlighted different expression concretely in 10 genes that are commonly known to be deregulated in HNSCC ( Figure 1B). For example, samples from base of tongue, that are concentrated in the left site of the figure, clearly show a high expression of CDKN2A compared to normal samples.
The functional annotation of the top transcriptionally altered genes (log (FC) > 4) demonstrated their integration in tumour-relevant signalling pathways. In particular, we found upregulated genes involved in tissue development and the repression of those implicated in cell differentiation ( Figure 2). There were also other genetic circuits enriched in these DEGs involved in cellular matrix reorganization or metabolic rewiring, both of which are important activities for tumour cell fitness. We also performed gene expression profile attending on tissue origin. We grouped tumour samples according to their localization, based on risk factors and molecular mechanisms implicated, in oral cavity (tongue, cheek mucosa, floor of mouth, base of tongue, hard palate, gum, lower gum, mouth, anterior floor of mouth, mandible, border of tongue, retromolar area, upper gum, palate, ventral surface of tongue), lip (lip, overlapping lesion of lip), oropharynx tonsil, oropharynx, supraglotis, posterior wall of orophayrnx, pharynx and hypopharynx (laringe, hypharynx). A total of 286 samples belonged to the oral cavity group, 55 to oropharynx, 137 to hypopharynx, and 86 to lip. If we select, only those genes that are differentially expressed according to their localization, we observed that in the oral cavity, we obtained 133 genes, in oropharynx 1209 genes, in hypopharynx 224, and in lip 86. To visualize the different profiles obtained in each group, the 20 most relevant DEGs, attending to p-value, from each group are summarized in Tables 1-4. Figure 2. The functional annotation analysis shows the association of the genes with several biological processes. (A) Top differentially expressed genes both up-or down-regulated were annotated functionally in GO terms derived from the biological process. Each GO term is represented as a node and links show the gene sharing level between two nodes. Each colour represents one GO term as indicated in the legend. (B) GO terms are segregated depending on when they are more likely to be affected by up-or down-regulated genes. The heat map indicates the significant level of each term annotation depending on the number of genes contained from the input list.
We also performed gene expression profile attending on tissue origin. We grouped tumour samples according to their localization, based on risk factors and molecular mechanisms implicated, in oral cavity (tongue, cheek mucosa, floor of mouth, base of tongue, hard palate, gum, lower gum, mouth, anterior floor of mouth, mandible, border of tongue, retromolar area, upper gum, palate, ventral surface of tongue), lip (lip, overlapping lesion of lip), oropharynx tonsil, oropharynx, supraglotis, posterior wall of orophayrnx, pharynx and hypopharynx (laringe, hypharynx). A total of 286 samples belonged to the oral cavity group, 55 to oropharynx, 137 to hypopharynx, and 86 to lip. If we select, only those genes that are differentially expressed according to their localization, we observed that in the oral cavity, we obtained 133 genes, in oropharynx 1209 genes, in hypopharynx 224, and in lip 86. To visualize the different profiles obtained in each group, the 20 most relevant DEGs, attending to p-value, from each group are summarized in Tables 1-4.    Survival analysis revealed 25 genes, that are DEGSs in HNSCC and that can be associated significantly with poorer prognosis. The majority of them were identified as oncogenes, and only 4 of them could be identified as tumour suppressor gene (Table 5).
We have also compared expression levels in HPV (+) samples compared with HPV (−) samples. Although we have obtained more than 2000 DEGs in this analysis, we resumed more significantly differently expressed genes in Table 6.
We compared every tumour sample (diploid or somatic copy number modified) with the normal control samples to assess somatic copy number alterations (SCNAs) in terms of inducing alterations in the transcriptome. The results showed that 112 out of the 285 SCN-altered genes were differentially expressed in HNSCC samples. In total, 91 genes were upregulated while 21 showed lower levels of expression, 12 of them were cancer-related genes: two deleted (CDKN2A and PTPRD) and nine amplified (EGFR among others).
After obtaining DEGs and SCNA results, we have made a thorough calculation to relate expression and alterations in the number of copies. Table 7 shows those differently expressed genes in HNSCC, that have a copy number change in more than 10% of the cohort, with (log(FC) > 4 and < −4), and those that are classified in COSMIC.

Somatic Alteration Analysis
In the same HNSCC samples, around 94% of the VarScan-reported mutations had already been found in the MuTect2 results. The remaining 6% were distributed in different mutation categories, therefore demonstrating that VarScan2 can recall more mutations from every type due to its softer false positive filtering as: 276 frame shift deletions, 63 frame shift insertions, 147 in frame deletions, 5 in frame insertions, 2273 missense mutations, 193 nonsense mutations, 1 non-stop mutation, 71 splicing variants and 7 alterations in the transcription start site (TSS).
The TCGA-HNSCC cohort was the ninth most mutated out of TCGA's projects ( Figure 4A). This indicated an average mutation rate per sample of 95 somatic alterations. In terms of proportion, missense variations were the most common, followed by nonsense and frameshift deletions, and cytosine SNPs (single nucleotide polymorphisms) (C > T, C > G or C > A) ( Figure 4B) shows the highest frequencies. In fact, when looking at the top ten most mutated genes ( Figure 4C) within the cohort, FAT1 and CDKN2A are the only ones which were more affected by nonsense or frameshift deletions than missense ones. TP53 was mutated in 72% of the tumour samples followed by TTN (39%), FAT1 (23%) and MUC16 (19%).  Among these, we can highlight the mutual exclusivity pattern found between the TP53 and PIK3CA mutations ( Figure 4D); while, on the other hand, CDKN2A has shown co-occurrence with TP53 and FAT1 alterations. Finally, our results suggest different, very mutated protein domains in the cohort related to transcriptional control, signal transduction and, interestingly, G protein-coupled receptors (GPCR).

Discussion
A total of 528 patients were included in our study, 70.11% of patients were males, as had been the case in another similar study, however oral cavity tumours were more prevalent in our study, with 72.4% versus 62%. The tumour stage was predominantly (64.2%) advanced, (stage III and IV), similar to previous studies [15].
Transcriptional alterations analysis identified 3491 DEGs in HNSCC tumour samples when compared to healthy oral tissue. The oncogenes and tumour suppressor genes from the COSMIC cancer genes list were selected from the total DEGs, reducing these DEGs to 148. According to previous results, the more relevant DEGs were CDKN2A, BRCA2, CDK6, BRCA1, CDK4, TP63, EGFR, SOX2, KIT and ERBB4 [16,17].
The upregulated D-cyclin-dependent kinases, CDK4 and CDK6, were pro-tumorigenic proteins and core controllers of G1/S cell cycle phase transition [18]. Their tyrosin kinase domain phosphorylated the retinoblastoma protein (RB), impairing it from forming a complex with the E2F transcription factor. Once released, E2F induces the expression of the several genes that are necessary for the cell's progression into the DNA synthesis phase. There are different regulators of CDK activity that could act as tumour suppressors, even by dually inhibiting cell cycle progression and enhancing pro-apoptotic factors. In this sense, HPV (+) cells show P16 INK4A upregulation, a tumour suppressor gene transcribed from the CDKN2A locus [19]. This polypeptide inhibits CDK4/6 kinase activity and arrests the cell cycle at G1/S checkpoint. At the same time, P16 INK4A can also stabilize TP53, a potent tumour suppressor and pro-apoptotic protein, by sequestering the murine double murine 2 (MDM2) ubiquitin into the nucleus. The high levels of p16 in HPV+ tumours unlikely result in significant stabilisation of p53 or inhibition of the cell cycle, as RB and p53 are degraded because of viral genes E7 and E6 activity. On the other hand, HPV (−) cancers centre around genes that control cell motility, invasion, EMT and angiogenesis [20].
The upregulation of the epidermal growth factor receptor (EGFR) into tumour cells can intensify its signalling and promote both abnormal rates of cell proliferation and cell survival [21]. The EGFR gene belongs to the ERBB family of kinase receptor proteins alongside three other members: ERBB2, ERBB3 and ERBB4. Intriguingly, contrary to what was expected and published for other cancers [22,23], the ERBB4 gene is found deeply downregulated (log 2 (FC) = −3.67) in the HNSCC tumour samples. Other pathways, in which members are upregulated, are the PIK3CA/AKT/mTOR and WNT/B-catenin signalling axes, some of the Hippo pathway transductors and the DNA repair machinery members. It has been proven that these mechanisms were also implicated in previous HNSCC studies, not only by DEGs but also by other epigenetic biomarkers [24].
We also found recurrent repression of KIT receptor and other elements of its signalling route. This is not the first time this pro-tumorigenic pathway has been found to be downregulated in cancer. In thyroid carcinoma, lower levels of KIT positively correlate with more malignant phenotypes [25].
Functional analysis reveals that the more relevant signalling pathways in upregulated genes were those that were involved in tissue development, such as endoderm development, tissue morphogenesis, or skeletal system development. On the other hand, those downregulated genes were more implicated in the cell differentiation molecular pathway, salivary secretion, cornification or starch and sucrose metabolism. Altered molecular mechanism identification is useful for designing new treatment strategies such as immunotherapy, since these targeted therapies are effective against determined molecules [26].
In order to help to develop new therapies and programs for HNSCC prevention, as there is a lot of information in our analysis, we wanted to make our results more useful by performing a studies profile in HNSCC attending on localization. We could observe that, especially in oropharynx, DEGs' number were significantly different (1209 genes exclusively different express in oropharynx) comparing results with the other localization, as none of the other group showed more than 224 DEGs. Although cautiously, these differences in DEGs results, the majority of the HPV (+) tumours are concentrated in the oropharynx group, where samples from the tonsil are included. Differential expression profiles developed by other authors, as well as by our group some years before, in HNSCC, revealed heterogeneity in results that reflects the nature of cancer [27,28].
Data obtained from survival analysis reveal different oncogenes overexpressed and whose higher expression correlates with worse survival. Other genes classified as tumour suppressors, whose lower expression correlates with worse survival but which, in the general cohort, are found with greater expression than normal samples. Concretely, EXT2 and IGF2BP2, are two known tumour suppressor genes whose highest expression correlates with worse survival. Taken altogether, there are genes classified as oncogenes or tumour suppressor genes in the COSMIC and that in our cohort show survival impacts that support this hypothesis, but there are some others that deviate from this theory, which behaviours show contrary to those expected genes that are classified as oncogenes, overexpressed and whose higher expression correlates with worse survival. CDKN2A, which is used in the diagnosis of HNSCC to check malignancy, has different behaviour in our results, as its overexpression is usually associated with better survival. It may be due to the fact that this correlation can be made if we are in context of HPV infection. Analyses that report differential expression genes and somatic copy number alterations, show that is complicated to establish a cause-effect edge. Some genes are overexpressed and amplified, such as, for example, FOXL2, but there are other deleted genes that are over expressed, such as MUC4. This can be explained because, although there is a percentage of samples where the gene is not present, in the other, they have a tendency to increase its expression in comparison with normal tissue. The same can occur with amplified or repressed genes. Although a gene is amplified, there are other known mechanisms that act in a superior level that copy number alterations, like epigenetic changes or cotranscriptional repression.
Today's primary challenge is to differentiate somatic mutations with oncogenic potential from all those which crop up later in a tumour's lifetime without contributing to the tumour development. In this sense, the concept of positive selection has opened a new avenue for the discovery of novel tumorigenic roles in cancer. Mutations that improve the fitness of the tumour cells would be under positive selection pressure, meaning that they would recur at elevated rates across patients. This is the principle that is widely used to identify these signals and differentiate them from alterations with real oncogenic power [29].
SCNAs are very frequent genomic aberrations in cancer. Despite the fact that they directly affect the genome architecture, the main force through which they trigger their oncogenic potential is transcriptional deregulation. In fact, the amplifications of many oncogenes and the deletions of tumour suppressors were related to the cancer's aggressiveness, endangering the patients' survival.
In this study, we used the GISTIC2.0 algorithm results to identify recurrent SCNAs in the TCGA-HNSCC tumour samples. Our results demonstrated the co-amplification of segments 11q13 and 11q22, both embedding caspase cascade inhibitors, as has been previously described [8]. Modifications of 27 arm-level with gains in 3q, 5p and 8q, and losses in 3p and 8p chromosomal arms, have already been described in lung squamous cell carcinomas [30]. Taking into account the five most SCN-altered genes in HNSCC, deletion was observed in CDKN2A, CDKN2B, and amplification in PPFIA1, FADD and ANO1. These results concur with recent studies [31] and also with other known cancer driver amplifications, such as SOX2, PIK3CA, EGFR and CCDN1. SOX2 is a well-known oncogene that has already been shown SCNA in HNSCC and lung squamous cell carcinoma [32]. PIK3CA amplifications have been associated with an increased risk of local recurrence [33]. EGFR and CCDN1 amplification have been associated with clinical stage, tumour differentiation, and lymph node metastasis in HNSCC [31].
The TCGA mutation-calling pipeline offers four different methods to elucidate every single variation inside the tumour's whole exome sequencing (WES) results. They are used in parallel and independently to better compare final results. In this sense, VarScan has been designed to be more permissive in the introduction of false positives, which was demonstrated to give a better performance [34] when dealing with "normal allele alteration" problems. On the other hand, MuTect2 is the algorithm with the most stringent false-positive filter [34]. These methods and others have been challenged in many reviews, showing that MuTect2 has the best performance dealing with both low-and high-frequency somatic mutation discovery [35,36]. In this study, we decided to use the combination of results from both the VarScan and MuTect2 pipelines.
The most relevant mutations in our cohort were TP53 (72%) followed by TTN (39%), FAT1 (23%) and MUC16 (19%). TP53 has also been observed as the most prevalent mutation in previous studies [37,38] and also CDKN2A, EGFR and FAT1 were observed as being the most relevant [38]. Among all mutations, we highlighted the mutual exclusivity pattern found between the TP53 and PIK3CA mutations. PIK3CA mutations have been associated with tobacco and alcohol consumption in previous studies [39]. Maybe, HPV status can explain this mutually exclusive mutations based on HPV status, as p53 mutations are very rare in HPV (+) cancers, while PIK3CA mutations are common, at least in a genetically well-defined subgroup of HPV (+) cancers [40].
Although these results do not concur with Babur et al. [41], the mutual exclusivity pattern is necessary in order to delineate the functional relations and involvement in common pathways of cancer-causing alterations. The identification of these patterns permits clinicians and scientists, not only to establish new tumour classifications or new preventive therapies, but also to design potential treatment targets and identify tumour vulnerabilities. Therefore, further analysis of mutual exclusivity patterns in HNSCC is still necessary [42].
cBioportal is a web application that simplifies data acquisition and processing from many high-quality publications, including the TCGA-HNSCC cohort. According to the cBioPortal analysis, the top five alteration genes were CDKN2A, CDKN2B, FGF3, FGF4, and FGF19 whereas in our results, those were CDKN2A, CDKN2B, PPFIA1, FADD, and ANO1. Theses discrepancies can be explained because there are two ways of using cBioportal: first, we can consult a dataset's highlighting alterations in the "summary mode" or, second, a "by gene" query is available if we have a pre-selected genes list of our interest. cBioportal summary mode on TCGA-HNSCC "provisional" dataset show FGF3, FGF4 and FGF19 as copy number altered genes in 25% of the patients and no mention about PPFIA1, FADD or ANO1 genes. Nevertheless, the same platform when using the "by gene" query and the same dataset reports copy number alterations in these genes as 26% in the case of PPFIA1 and 25% in the cases of ANO1 and FADD.
We also have found differences in mutation data, while the top five genes in cBioPortal were TP53, FAT1, CDKN2A, NOTCH1, and PIK3CA, whereas in our results, they were TP53, TTN, FAT1, and MUC16. We did both summary and "by gene" queries in cBioportal and results were very similar as in copy number alteration matter. In this case, a reviewer search returned TP53, FAT1, CDKN2A, NOTCH1 and PIK3CA as the top mutated genes in the TCGA-HNSCC "provisional" dataset. There were no mentions about TTN or MUC16 mutation frequency, as we have reported. Once again, if we do a personalized query including all these genes, we face the same issue where TTN and MUC16 show larger mutation proportion (42% and 20%, respectively) than NOTCH1 or PIK3CA, both mutated in 18% of the samples. These discrepancies in cBioportal reports are intriguing and may be much better explained from its creator team and should not suppose any issue for the acceptance of our results as accurate and veracious.
Identification of those genes that are drivers in the tumorigenesis process is difficult as, even in the same tumour samples, different somatic alterations can be identified [43]. After analysing missense mutations and truncated mutations in oncogenes and tumour suppressors (results not shown), we have found that generally, in oncogenes, missense mutations are superior to truncated mutations but not in tumour suppressors, except from those more famous in HNSCC as CDKN2, FAT1 or PTEN. This means that missense mutations also can produce downregulation in a protein activity, so they are not always associated with gain of function. Some tumour suppressors analysed in this study were not described as tumour suppressors in HNSCC and it is possible that missense mutations that we have found were really random.
Although we were unable to assess the HPV infection with SCNA or somatic mutations, as only a small proportion were HPV (+), previous studies that have associated these alterations with HPV infections have shown enrichment mutations in HPV-negative in TP53, CDKN2A and PIK3CA, and also copy number gains in EGFR and CCND1, similar to our results [44]. Despite the already mentioned limited HPV (+) proportion, we performed differentially expression gene analysis in HPV (−) and HPV (+), obtaining similar results as other authors before us [45].
Clustering genes in similar signalling pathways or even targets similar to other cancers can help to determine the treatment with targeted drugs that are already available or under development. HNSCC classical classification is not always able to be associated with a determined prognostic. Genetic expression analysis, somatic mutations and SCNA can help us to determine patterns that will enable the tumoral behaviour to be determined or the possible targeted therapies to be identified.

Conclusions
Comprehensive analysis of HNSCC expression genes and somatic mutations allows us to identify deregulated genes and the most relevant molecular pathways altered in patients with HNSCC. The most prevalent mutations were found in TP53, TTN, FAT1 and MUC16. Other relevant results were the mutual exclusivity patterns between the TP53 and PIK3CA mutations and CDKN2A co-occurrence with TP53 and FAT1. Functional studies confirmed that deregulated genes play a role in mechanisms involved in cell differentiation and tissue development. Together, these new findings help us to develop new therapies and programs for the prevention of HNSCC.

Acknowledgments:
We would like to thank Rubén Fernández Caloto for his invaluable implication and collaboration with the bioinformatic data management of this project.

Conflicts of Interest:
The authors declare that there is no conflict of interest.