Genomic Signature of Oral Squamous Cell Carcinomas from Non-Smoking Non-Drinking Patients

Simple Summary: A clinically distinct cohort of non-smoking non-drinking patients who develop oral cavity squamous cell carcinomas has been identified, with previous work suggesting that these patients tend to be older, female, and have poor outcomes. Our study characterised tumour molecular alterations in these patients, identifying differences in genomic profiles as compared to patients who smoke and/or drink. Associations between molecular alterations and other clinical and pathological characteristics were also explored.

Abstract: Molecular alterations in 176 patients with oral squamous cell carcinomas (OSCC) were evaluated to delineate differences in non-smoking non-drinking (NSND) patients. Somatic mutations and DNA copy number variations (CNVs) in a 68-gene panel and human papilloma virus (HPV) status were interrogated using targeted next-generation sequencing. In the entire cohort, TP53 (60%) and CDKN2A (24%) were most frequently mutated, and the most common CNVs were EGFR amplifications (9%) and deletions of BRCA2 (5%) and CDKN2A (4%). Significant associations were found for TP53 mutation and nodal disease, lymphovascular invasion and extracapsular spread, CDKN2A mutation or deletion with advanced tumour stage, and EGFR amplification with perineural invasion and extracapsular spread. PIK3CA mutation, CDKN2A deletion, and EGFR amplification were associated with worse survival in univariate analyses (p < 0.05 for all comparisons). There were 59 NSND patients who tended to be female and older than patients who smoke and/or drink, and showed enrichment of CDKN2A mutations, EGFR amplifications, and BRCA2 deletions (p < 0.05 for all comparisons), with a younger subset showing higher mutation burden. HPV was detected in three OSCC patients and not associated with smoking and drinking habits. NSND OSCC exhibits distinct genomic profiles and further exploration to elucidate the molecular aetiology in these patients is warranted.


Introduction
Squamous cell carcinomas of the head and neck (HNSCC) are a heterogeneous group of cancers arising in the upper aerodigestive tract, with oral cavity cancers being the most common. HNSCC is traditionally viewed as a disease of smokers [1] and drinkers [2], but non-smoking non-drinking (NSND) patients also develop HNSCC. Chronic exposures to heavy metals from sources other than tobacco, such as contaminated food and soil, may also constitute a risk factor [3]. The human papilloma virus (HPV) is more common in oropharyngeal patients with no tobacco risk factors [4] and has a clear role in the development of oropharyngeal SCCs, but its role in oral cavity SCC (OSCC) patients without tobacco or alcohol risk factors remains poorly defined [5].
Retrospective audits of OSCC patients at our centre have revealed a larger than expected group of non-smoking (40%) and NSND (24%) patients who are predominantly female, have a bimodal age distribution, and a predilection for disease on the oral tongue. Furthermore, NSND patients with OSCC appear to have worse disease-specific mortality than smoking or drinking (SD) patients [6,7]. Other retrospective studies have also explored this NSND group, and whilst they concur that the group is more likely to be female and have oral cavity tumours, no consensus pattern in age distribution or survival outcomes has emerged [8][9][10][11][12][13][14][15]. One previous study reported poorer survival in the NSND group, but this was confined to young NSND patients [12], whilst another found a non-significant trend towards improved survival in the NSND group as a whole [11].
NSND patients are unlikely to be a homogeneous group, and the suggested bimodal age distribution and adverse clinical outcomes of NSND patients highlight these patients as an important group requiring further study. Delineation of molecular alterations in NSND patients may provide insights into the aetiology of OSCC in these patients.
Figure caption (partial): [...] [16][17][18][19][20][21][22][23][24][25][26][27][28][29][30], stratified by human papilloma virus (HPV) status as available. Studies dedicated to oral squamous cell carcinomas (OSCC) are shown separately. Percentages of patients with a gene mutation are shown; red indicates low percentages and yellow indicates high percentages. Grey boxes indicate that no data were available for that gene for a particular publication.
The impact of risk factors on somatic mutation load may also contribute to the clinical course of NSND patients: Tobacco use has been associated with a distinct somatic mutation signature in HNSCC with an enrichment of C>A transversions, although this signature appears much more pronounced in laryngeal cancers than OSCC [31]. Furthermore, a mutation signature related to APOBEC cytidine deaminase editing has been identified in HPV-positive HNSCC [32]. Notably, alcohol consumption has been associated with T>C transitions in oesophageal [33] and hepatocellular [34] carcinomas, although this has not been reported for HNSCC.
Apart from somatic mutations, HNSCCs exhibit significant genomic instability. Many HNSCCs show abundant DNA copy number variations (CNV), with prominent amplifications of chromosome 3q26/28 (the locus containing the PIK3CA oncogene), deletions of chromosome 9p21.3 (containing the CDKN2A tumour suppressor) as well as focal amplifications of EGFR and CCND1, and deletions of FAT1 and NOTCH1 [28]. There is one report on CNVs in a small cohort of non-smokers with oral tongue cancers that found no genomic differences as compared to smokers [35], but CNVs in the NSND group of HNSCC patients have not been addressed previously.
To refine our understanding of gene mutation profiles and somatic CNVs in OSCC and to elucidate potential genomic associations with tobacco and alcohol consumption, we performed targeted sequencing of 176 OSCCs from a community-based patient cohort for a panel of 68 frequently mutated HNSCC genes. To examine the involvement of HPV in OSCC from NSND and SD patients, our amplicon panel also included the genomes of the four most prevalent HPV risk subtypes (HPV subtypes 16, 18, 33, and 35). Mutation data were interrogated for associations with patient reported smoking and drinking habits, HPV status, clinicopathologic data, and survival outcomes.

Materials and Methods
Patients. A total of 176 patients with newly diagnosed OSCC presenting to the Royal Melbourne Hospital, Parkville, Australia, were examined. This study was approved by the relevant Human Research Ethics committees (RMH HREC 2013.087, RMH HREC 2012.071). For 103 patients diagnosed between January 2007 and August 2010, archival tumour blocks were retrieved from pathology archives. Regions of tumour with >50% neoplastic cell content were marked out by a specialist head and neck pathologist (C.M.A.) based on hematoxylin and eosin (H&E) stained sections, and macrodissected from 10 µm unstained serial sections. For 73 patients diagnosed between January 2014 and July 2016, fresh tumour and blood samples were obtained at surgery. Fresh-frozen tumour tissue was embedded in OCT medium and assessed for adequate (>50%) neoplastic cell content based on H&E-stained sections.
Disease stage at presentation was classified according to the AJCC 7th edition [36]. Patient smoking and drinking habits were recorded. Individuals who had smoked fewer than 100 cigarettes in their lifetime were classified as non-smokers, with all patients who were current or former smokers classified as smokers. Individuals without regular alcohol consumption (<1 standard drink per week) were classified as non-drinkers. All patients were treated by radical intent surgery and referred for adjuvant radiotherapy (with or without chemotherapy) as clinically appropriate. Clinical, treatment, and follow-up details were collected in a dedicated database, with a census date set at 1/1/2020 (minimum patient follow-up time of 3.5 years). Follow-up was performed in line with current clinical guidelines, with disease-free patients discharged after 5 years.
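The exposure definitions above can be sketched as a small classification rule. This is an illustrative Python sketch, not the study's code: the function and argument names are hypothetical, and treating exactly 100 lifetime cigarettes as "smoker" and exactly 1 standard drink per week as "drinker" is an assumption about the boundary cases.

```python
def is_smoker(lifetime_cigarettes: int) -> bool:
    """Non-smokers smoked fewer than 100 cigarettes in their lifetime."""
    return lifetime_cigarettes >= 100  # boundary handling is an assumption


def is_drinker(standard_drinks_per_week: float) -> bool:
    """Non-drinkers consume less than 1 standard drink per week."""
    return standard_drinks_per_week >= 1  # boundary handling is an assumption


def exposure_group(lifetime_cigarettes: int, standard_drinks_per_week: float) -> str:
    """Return 'NSND' for non-smoking non-drinking patients, else 'SD'
    (smoking and/or drinking)."""
    if is_smoker(lifetime_cigarettes) or is_drinker(standard_drinks_per_week):
        return "SD"
    return "NSND"


# Hypothetical patients, for illustration only.
examples = [(0, 0.0), (99, 0.5), (100, 0.0), (0, 1.0)]
groups = [exposure_group(c, d) for c, d in examples]
```

Under this rule a patient is assigned to the SD group if either exposure criterion is met, which mirrors the "smoke and/or drink" wording used throughout.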
Targeted gene panel sequencing. HNSCC somatic mutation and RNASeq data for 313 patients with oral cavity SCC were retrieved from the TCGA data portal and analysed to select genes for the curation of a dedicated 500 kb custom Agilent SureSelect XT2 amplicon panel for next-generation sequencing. Gene selection was based on mutation prevalence, RNA expression, and likelihood of contributing to oncogenesis as assessed by two previously described algorithms, OncodriveClust [37] and MutSigCV [38]. The finalised panel included 68 candidate genes, achieving a mean coverage of 95% (range 86-100%, Supplementary Table S1). To enable tumour typing for HPV status, HPV genomes for the four main high-risk subtypes (HPV subtypes 16, 18, 33, and 35) were included. DNA was extracted using the DNeasy Blood & Tissue, AllPrep DNA/RNA Mini and GeneRead FFPE extraction kits (Qiagen), according to manufacturer's instructions. Libraries were prepared using the Agilent SureSelect XT2 system and single-end sequencing performed on an Illumina Next-Seq platform.
Mutation detection. Raw data were processed and mutation calling performed using GATK software [39,40]. Local realignment and base recalibration steps were performed prior to variant calling. Identified SNPs and indels were filtered and annotated with SnpEff [41]. Mutations identified exclusively on forward or reverse reads were found to be enriched in the FFPE samples as compared to the fresh-frozen samples, a known FFPE sequencing artefact [42]. Accordingly, a strand bias filter removing any mutation calls based solely on forward or reverse reads was applied across all samples to remove such sequencing artefacts.
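The strand bias filter described above can be illustrated with a minimal Python sketch. It assumes each call carries counts of variant-supporting reads on the forward and reverse strands; the field names `alt_fwd` and `alt_rev` and the example records are hypothetical and do not come from the study's pipeline.

```python
def passes_strand_filter(alt_fwd: int, alt_rev: int) -> bool:
    """Keep a call only if the variant allele is seen on both strands."""
    return alt_fwd > 0 and alt_rev > 0


# Hypothetical calls, for illustration only.
calls = [
    {"id": "var_a", "alt_fwd": 12, "alt_rev": 9},   # both strands: kept
    {"id": "var_b", "alt_fwd": 15, "alt_rev": 0},   # forward only: removed
    {"id": "var_c", "alt_fwd": 0, "alt_rev": 7},    # reverse only: removed
]

kept = [c for c in calls if passes_strand_filter(c["alt_fwd"], c["alt_rev"])]
kept_ids = [c["id"] for c in kept]
```

Applying the same rule to FFPE and fresh-frozen samples, as the text describes, keeps the two sample types comparable at the cost of discarding some genuine low-coverage variants.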
For fresh-frozen tumour samples, somatic mutations were identified based on the sequencing data from the matched blood samples. Matched normal samples were not available for FFPE tumour samples, and putative somatic mutations were identified by filtering against germline variants identified in the 1000 Genomes Project, the normal samples from our prospective cohort and a previously curated database created for identification of somatic mutations in colorectal cancer cell lines [43]. Pathogenicity prediction was performed using the previously published PolyPhen-2 algorithm, with scores above 0.85 considered to be likely pathogenic [44].
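For tumours without matched normal tissue, the filtering logic above amounts to subtracting known germline variants and then flagging likely pathogenic changes by PolyPhen-2 score. The Python sketch below is a simplified illustration under stated assumptions: variants are keyed as hypothetical `(chromosome, position, ref, alt)` tuples, the three germline sources are represented as plain sets, and all coordinates are invented placeholders.

```python
POLYPHEN_LIKELY_PATHOGENIC = 0.85  # scores above this are treated as likely pathogenic


def filter_putative_somatic(tumour_variants, germline_sources):
    """Drop any tumour variant present in at least one germline source."""
    known_germline = set().union(*germline_sources)
    return [v for v in tumour_variants if v not in known_germline]


def is_likely_pathogenic(polyphen_score):
    """True when a PolyPhen-2 score is available and exceeds the threshold."""
    return polyphen_score is not None and polyphen_score > POLYPHEN_LIKELY_PATHOGENIC


# Invented placeholder variants, for illustration only.
population_variants = {("chrA", 100, "C", "T")}   # stands in for 1000 Genomes
cohort_normals = {("chrB", 250, "G", "A")}        # stands in for matched bloods
curated_database = set()                          # stands in for the curated list
tumour = [
    ("chrA", 100, "C", "T"),
    ("chrB", 250, "G", "A"),
    ("chrC", 42, "T", "C"),
]

putative = filter_putative_somatic(
    tumour, [population_variants, cohort_normals, curated_database]
)
```

Because this approach can only remove germline variants that are already catalogued, rare private germline variants may remain among the putative somatic calls in tumours without a matched normal.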
HPV detection. Read counts mapping to viral sequences were normalised against library size. Samples with post-normalisation read counts for any single HPV subtype of greater than 1000 were considered to be HPV-positive.
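The calling rule above can be sketched as follows. The text does not state the scale used for normalisation, so the per-million scaling here is an assumption made purely for illustration, as are the function names and subtype labels; only the threshold of 1000 and the four subtypes come from the text.

```python
HPV_SUBTYPES = ("HPV16", "HPV18", "HPV33", "HPV35")
POSITIVE_THRESHOLD = 1000       # normalised read count, from the text
ASSUMED_SCALE = 1_000_000       # ASSUMPTION: counts scaled per million library reads


def normalised_counts(viral_reads, library_size, scale=ASSUMED_SCALE):
    """Scale raw per-subtype read counts by total library size."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {s: viral_reads.get(s, 0) * scale / library_size for s in HPV_SUBTYPES}


def positive_subtypes(viral_reads, library_size):
    """Return subtypes whose normalised count is greater than the threshold."""
    norm = normalised_counts(viral_reads, library_size)
    return sorted(s for s, value in norm.items() if value > POSITIVE_THRESHOLD)


def is_hpv_positive(viral_reads, library_size):
    """A sample is HPV-positive if any single subtype exceeds the threshold."""
    return len(positive_subtypes(viral_reads, library_size)) > 0
```

Evaluating each subtype separately, rather than summing across subtypes, follows the "any single HPV subtype" wording of the criterion.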
DNA copy number analysis. DNA copy number analysis was conducted using ExomeDepth [45], which has been demonstrated to be a robust technique for determination of CNVs from targeted capture sequencing data [46]. A variant of the standard ExomeDepth pipeline was used [47], whereby low mappability regions as computed for 36-mers were removed from the SureSelect probe set prior to read mapping [48], with blood samples used as a reference set.
Statistical Analysis. All statistical analyses were performed using the R software for statistical computing [49]. Differences between groups were assessed using Fisher's exact test for categorical variables and the Kruskal-Wallis test for continuous variables. Mutation counts were compared between groups of interest using a generalised linear model [50]. Each gene mutated in at least 5% of patients (mutations in >10 cases) and with at least 50% of mutations assigned as likely pathogenic was correlated with clinicopathologic variables. Between-group survival differences by mutation status were assessed using Kaplan-Meier analysis and Cox proportional hazards models adjusting for clinicopathologic variables. Overall survival was defined as time from diagnosis to death, with patients censored if alive at last contact. Two-sided p-values < 0.05 were considered statistically significant.
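As an illustration of the categorical comparisons, the following Python sketch implements a two-sided Fisher's exact test for a 2x2 table using only the standard library. It is not the study's R code; the function name is hypothetical, and the two-sided p-value is taken as the sum of all table probabilities no larger than that of the observed table, which is one common convention. The example counts are those reported in the Results for HPV status (1/59 NSND vs. 2/117 SD) and sex (43/59 NSND vs. 28/117 SD female).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed
    # one; the small tolerance guards against floating-point ties.
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# HPV-positive vs. HPV-negative, NSND (1 of 59) vs. SD (2 of 117).
p_hpv = fisher_exact_two_sided(1, 58, 2, 115)

# Female vs. male, NSND (43 of 59) vs. SD (28 of 117).
p_sex = fisher_exact_two_sided(43, 16, 28, 89)
```

For the HPV table the observed configuration is the single most probable one, so every possible table counts as at least as extreme and the p-value is 1, in line with the reported p = 1.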

Patient Clinical Characteristics and HPV Status
Clinical details of 176 OSCC patients examined in this study are summarised in Table 1. A total of 82 patients had early stage (stage I/II) disease and 94 patients had local or regionally advanced disease (stage III/IV). All patients were treated with radical intent surgery and were referred for radiotherapy and/or chemotherapy following discussion at a multidisciplinary team meeting. Sixty-three percent (110/176) of patients received adjuvant radiotherapy and 22% (39/176) were treated with chemotherapy.
Clinicopathologic details and treatment delivery were similar between retrospective patients (n = 103) diagnosed between January 2007 and August 2010 and prospectively recruited patients (n = 73) diagnosed between January 2014 and July 2016. However, the proportions of non-drinkers and NSND patients were higher in the prospective cohort, consistent with the reported trend of reduced alcohol consumption among Australians over this time period [51] (Supplementary Table S2).
Presence of HPV was identified through our targeted sequencing approach in 3 out of 176 (1.7%) OSCCs (Figure 2); one case was positive for HPV-16 and two cases for HPV-33. This HPV detection rate is consistent with a previous study from our centre, which used orthogonal methods (PCR-ELISA and RNA in situ hybridization) to identify HPV [52]. All of the overlapping patients between the two studies had concordant HPV detection results (39/39 patients, 2/39 HPV-positive), supporting the accuracy of targeted next-generation sequencing for virus detection. As a further control, a small set of prospectively collected oropharyngeal tumours, which are known to have a high prevalence of HPV infection [5], were also sequenced, with 57% (4 out of 7) of tumours found to be positive for HPV-16, consistent with the prevalence reported by a previous systematic review [53]. A single OSCC NSND patient (1.7%, 1/59) was HPV-positive, similar to the HPV-positive rate in SD patients (1.7%, 2/117, p = 1). There were no significant associations between HPV status and clinicopathologic variables in OSCC patients (Supplementary Table S3).
NSND patients were significantly older than SD patients (mean age of 70 years vs. 64 years, p = 0.004). However, there was evidence for a bimodal age distribution (Figure 3), consistent with our previously reported findings that included a subset of the current cohort [6]. As anticipated, a significantly higher proportion of NSND patients (73%, 43/59) were female as compared to SD patients (28%, 28/117; p < 0.001), while other clinical features were similar (Supplementary Table S4). NSND patients showed poorer five-year overall survival as compared to SD patients in univariate analysis (HR 1.7, 95% CI 1.0-2.8, p = 0.05, Supplementary Figure S1), although this was not maintained in multivariate analysis adjusting for clinicopathologic features (Supplementary Table S5).

Genomic Alterations and Clinical Associations for OSCC Patients
Non-synonymous somatic mutations in 68 cancer genes were identified in 93%  Figures S2-S4).
A total of 17 genes were mutated in at least 5% of patients and had at least 50% of mutations assigned as likely pathogenic. Associations with clinicopathologic variables were examined for these genes as well as the five genes with recurrent CNVs (Table 2).
Table 2. Univariate analysis for selected gene mutations and copy number alterations against clinicopathologic variables. "Group 1" indicates the referent variable, whilst "Group 2" indicates the comparison variable. Only comparisons where p < 0.05 are shown. NSND = non-smoker and non-drinker; SD = smokers and/or drinker; LN = lymph node; LVI = lymphovascular invasion; PNI = perineural invasion; ECS = extracapsular spread; OR = odds ratio, CI = confidence interval; * p < 0.05.
Univariate analysis for five-year overall survival was not significant for TP53 (Figure 5), CDKN2A, and FAT1 (Supplementary Figure S7) mutations (p > 0.05). Significantly poorer outcomes were observed for patients with PIK3CA mutated tumours as compared to patients with PIK3CA wild-type tumours (HR 2.0, 95% CI 1.0-3.9, p = 0.045) (Figure 5), although this did not remain significant in a multivariate analysis adjusting for clinicopathologic variables (Table 3). No other gene mutation was associated with a statistically significant survival difference (Supplementary Table S8). EGFR amplification was significantly associated with poorer survival (HR 2.7, CI 1.4-5.4, p = 0.004) as was CDKN2A deletion (HR 2.8, CI 1.1-7.1, p = 0.026) in univariate analyses (Figure 5), but this was not maintained when adjusting for other variables (Table 3).

Table 3. Univariate and multivariate Cox proportional hazards analysis assessing PIK3CA mutation, EGFR amplification or CDKN2A mutation and clinicopathologic variables in OSCC patients. NSND = non-smoker and non-drinker; SD = smokers and/or drinker; LN = lymph node; PNI = perineural invasion; LVI = lymphovascular invasion; HR = hazard ratio, AHR = adjusted hazard ratio, CI = confidence interval; * p < 0.05.

Mutation Differences Between NSND and SD Patients
We observed more mutated genes in non-drinkers (mean 4.3 vs. 3.4 in drinkers, p = 0.001), non-smokers (mean 4.2 vs. 3.4 in smokers, p = 0.008), and the NSND patients (mean 4.7 vs. mean 3.3 in SD patients, p < 0.001). The mutation spectrum comparing NSND to SD patients is visualised in Supplementary Figure S8. Examination of mutation counts identified five patients among the NSND group who had higher numbers of mutations (>12) as compared to the SD group (Figure 6). These five patients were younger than the remainder of the NSND group (mean 53 years vs. 71 years, p = 0.013). The distribution of mutation types (transitions, transversions, and indels) in these five patients was compared to the distribution in other NSND patients as well as the SD group (Table 4). There was no significant difference between this high mutation group and the remainder of the NSND group (p = 0.297). However, compared to the SD group, there was a decrease in the proportion of insertions/deletions and an enrichment of T > C transitions (p = 0.019 for the NSND high mutation group, p = 0.067 for the NSND group as a whole). There was no evidence of tobacco-associated enrichment of C > A transversions or alcohol-associated enrichment of T > C transitions among SD patients.
Table 4. Distribution of mutational alterations, comparing the SD group with the entire NSND group or subsets with low or high mutation load. NSND = non-smoker and non-drinker; SD = smokers and/or drinker. * p < 0.05.
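The grouping of variants into transitions, transversions, and indels, and into classes such as T > C or C > A, can be sketched in Python as below. This is an illustration under stated assumptions rather than the study's method: alleles are assumed to be VCF-style `ref`/`alt` strings, substitutions are reported relative to the pyrimidine base (so A > G is counted as T > C), and anything other than a single-base change is treated as an indel for simplicity. All names and example variants are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TRANSITION_CLASSES = {"C>T", "T>C"}


def substitution_class(ref: str, alt: str) -> str:
    """Return 'indel' or one of six pyrimidine-referenced classes, e.g. 'T>C'."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1:
        return "indel"  # simplification: any length change or multi-base event
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    if ref in "AG":
        # Report purine-referenced changes on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_type(ref: str, alt: str) -> str:
    """Collapse a variant into 'transition', 'transversion', or 'indel'."""
    cls = substitution_class(ref, alt)
    if cls == "indel":
        return "indel"
    return "transition" if cls in TRANSITION_CLASSES else "transversion"


# Hypothetical variants, for illustration only.
variants = [("A", "G"), ("T", "C"), ("G", "T"), ("C", "T"), ("AT", "A")]
class_counts = Counter(substitution_class(r, a) for r, a in variants)
type_counts = Counter(mutation_type(r, a) for r, a in variants)
```

Per-group counts of this kind form the rows of a contingency table, which can then be compared between patient groups.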

Discussion
This study surveyed the molecular profiles of 176 OSCC patients, 34% of whom were NSND patients, providing insights into the aetiology of this subgroup. HPV was excluded as a major contributor to carcinogenesis in oral cavity cancers in the NSND group, with a similar low prevalence in both this subgroup (1.7%) and SD patients (1.7%). Nonetheless, none of the HPV-positive OSCCs in this study harboured a TP53 mutation, consistent with the well-established role of HPV E6 protein as an inhibitor of TP53 [54].
In the context of the targeted gene panel, a subset of our NSND OSCC patients had a higher mutation burden than SD patients. This was an unexpected finding as the a priori expectation was that smokers/drinkers would accumulate more mutations over time as a result of carcinogen exposure. The increase in mutation burden, particularly of T > C transitions, in the NSND group could imply an underlying mutational process, but with our limited targeted sequencing, mutational signatures could not be explored in depth. An alternate hypothesis is that the oncogenes and tumour suppressor genes targeted by our sequencing panel may play a more dominant role in NSND patients. Sequencing of the entire exome or genome and replication in an independent cohort would be required to differentiate between these possibilities.
In NSND patients, the well described tumour suppressor CDKN2A was found to be mutated at almost twice the frequency of SD patients (35.6% vs. 17.9%), and this was also evident when comparing smokers to non-smokers. However, the frequency of CDKN2A deletions was not significantly different between groups (NSND: 1/59, 1.7%; SD 6/117, 5.1%). Notably, CDKN2A promoter methylation is another mechanism of CDKN2A inactivation, which is known to be common in HNSCC as a whole (20% of cases in TCGA data [28]) but could not be evaluated in our cohort. Whilst an association between smoking and CDKN2A inactivation has not previously been identified in OSCC, a meta-analysis in non-small cell lung carcinoma (NSCLC) has reported a positive association between p16 promoter methylation and smoking [55].
Amplification of EGFR was more common in the NSND group than the SD group (16.9% vs. 5.1%). Overexpression of EGFR has been found to be correlated with smoking and poorer overall survival in oropharyngeal SCC [56], and in NSCLC, EGFR mutations are more common in non-smokers than smokers and are clinically helpful in guiding the use of targeted therapy [57]. In a similar vein, exploration of EGFR as a biomarker for EGFR-directed therapy in NSND OSCC patients may be warranted. BRCA2 deletions were more frequently identified in the NSND group than the SD group (11.9% vs. 1.7%), although the significance of these deletions is uncertain.
Our study also highlighted a number of more general molecular associations in OSCC. TP53 mutation was associated with nodal disease, lymphovascular invasion, and extracapsular spread, consistent with previous reports in the OSCC literature [58]. Mutations and deletions of CDKN2A were independently associated with advanced tumour stage in our cohort, and some investigators have associated CDKN2A copy number loss with poor prognosis in HNSCC [59], which was also observed in univariate analysis in our patients. In addition, EGFR amplification was associated with poor overall survival in univariate analysis and with perineural invasion and extracapsular spread. Extracapsular spread has previously been associated with EGFR amplification [60] or high expression levels of EGFR [61,62], as has perineural invasion [63]. Whilst overexpression of EGFR has been associated with worse survival in oropharyngeal cancers [56], previous work has not identified an association between EGFR amplification and survival [64]. Finally, PIK3CA mutations were found to be associated with poor prognosis in OSCC patients in univariate analysis, which has previously been reported in a cohort of HPV-positive oropharyngeal SCCs [65].
Caveats of our study are that tobacco and alcohol histories were self-reported and exposure to second-hand tobacco is difficult to quantify, which may lead to some erroneous classifications of NSND status. The cohort size in our study was limited although molecular findings were broadly consistent with the OSCC literature. Our survey of molecular alterations was limited to a panel of genes, precluding more detailed examination of mutation signatures or larger-scale DNA copy-number or structural alterations that may drive oncogenesis in the NSND group. In addition, transcriptomic and epigenomic alterations may contribute to OSCC in NSND patients. Examination of independent cohorts will be required to validate our findings. As the proportion of NSND HNSCC patients is relatively small, this will likely require aggregation of clinically annotated HNSCC sequencing datasets across multiple institutions.

Conclusions
In summary, we have excluded HPV as a primary driver underlying oral carcinogenesis in NSND patients and have identified significant molecular differences between the NSND and SD groups in OSCC including cancer gene alterations and mutation burden based on our targeted gene panel. Further studies are warranted to elucidate the molecular aetiology of OSCC in NSND patients.

Informed Consent Statement:
Informed consent was obtained from prospective subjects involved in the study (HREC 2013.087). A waiver of informed consent for retrospectively recruited patients was granted by the relevant Ethics committee (HREC 2012.071).

Data Availability Statement:
The molecular data presented in this study are available in the Supplementary Data. Associated clinical data cannot be provided to maintain patient confidentiality.