Comprehensive Analysis of Mutation-Based and Expressed Genes-Based Pathways in Head and Neck Squamous Cell Carcinoma

Background: Over- or under-expression of mRNA results from genetic alterations. Comprehensive pathway analyses based on mRNA expression are as important as single gene level mutations. This study aimed to compare the mutation-and mRNA expression-based signaling pathways in head and neck squamous cell carcinoma (HNSCC), and to match these with potential drug or druggable pathways. Methods: Altogether, 93 recurrent/metastatic HNSCC patients were enrolled. We performed targeted gene sequencing using Agilent 244 customized gene panel for NGS, and nanostring nCounter® for mRNA expression. mRNA expression was classied into over- or under expression groups based on the expression levels that showed 2-fold change in tumor compared with non-tumor tissues. We investigated mutational and nanostring data using the CBSJukebox® system, which is a big-data driven platform to analyze druggable pathways, genes, and protein-protein interaction. Analyses were based on 14 databases related to protein interaction, signaling pathways and drugs such as NCBI, Uniprot, KEGG, Biogrid, DIP, HPRD, and Drugbank. High score interaction genes were mapped to investigate druggable pathways. We calculated Treatment Benet Prediction Score (TBPS) to identify suitable drugs. Results: By mapping the high score interaction genes to identify druggable pathways, we found that highly related signaling pathways with mutation mapping were Inuenza A, pathways in cancer, EBV, HPV, HTLV-1 pathways. Based on mRNA expression and interaction gene scoring model, several pathways were found to be associated with over- and under-expression. Comparison between mutation-based pathway and over-/under-expressed genes-based pathways, we observed that 19.8% of NGS based pathways matched with mRNA over-expressed genes-based pathways, and 43.2% of NGS based pathways consist with mRNA under-expressed genes-based pathways. TBPS found several matching drugs such as immune checkpoint inhibitors, EGFR inhibitors, and FGFR inhibitors. Conclusions: Mutation-based pathway was associated with mRNA under-expressed genes-based pathway. These results suggest that HNSCCs are mainly caused by the loss-of-function mutations. However, big data based platform for druggable pathway can nd potential matching drugs. RAF1: RAF proto-oncogene serine/threonine-protein kinase, EGFR: Epidermal Growth Factor Receptor, EPHA2: Ephrin type-A receptor ESR2: Estrogen receptor Fibroblast growth factor receptor Fibroblast Growth Factor Receptor Receptor RAC-alpha serine/threonine-protein kinase, RAF1: RAF proto-oncogene serine/threonine-protein kinase, EGFR: Epidermal Growth Factor Receptor, Ephrin type-A receptor Estrogen receptor factor receptor Factor

inhibitor, FGFR inhibitor, CDK4/6 inhibitor, and immune checkpoint inhibitor (ClinicalTrials.gov: NCT03292250) [6]. Although potentially targetable genetic alterations in genes such as PIK3CA, EGFR, and FGFR have been identi ed in HNSCC, in depth functional studies to validate their roles as predictive biomarkers have not performed.
Integration of cancer genes into networks offers opportunities to reveal protein-protein interactions (PPIs) with functional and therapeutic signi cance. PPIs network based on cancer gene landscape can give us insight into how these genes contribute to deregulated oncogenic pathways [7][8][9][10]. Pathological over-or under-expression of mRNA results from cancer speci c genetic alterations. Genetic mutation without change in mRNA expression might not result in the functional change at the protein level. mRNA expression based pathways are as important as single gene level mutational analysis.
In this study, we aimed and to compare the mutational and mRNA expression based signal transduction pathways in HNSCC and establish a cancer-associated PPI network in an e cient high throughput format. The objective of this study is to integrate the DNA mutational landscape and mRNA expression patterns into the PPI network pathways, which could be used to match potential drug or druggable pathway in HNSCC.

Patients and data collection
Altogether, 93 recurrent/metastatic HNSCC patients from 19 institutions were enrolled. The details of the study population were described in our previous report [5]. In brief, pretreatment tumor tissues (somatic) and matched normal DNA (germline) from prospectively recruited patients with HNSCC were used for the analysis. Clinicopathological data were collected from patient medical records. Informed consents were obtained. Institutional Review Boards of each institute approved this study protocol.
Targeted gene sequencing and mRNA expression assay Genomic DNA was isolated from formalin-xed para n-embedded tissue samples for the targeted sequencing of 244 head and neck cancer-related genes. The genomic regions of the 244 genes were captured by the customized SureSelectXT Target Enrichment library generation kit (Agilent, Santa Clara, CA, USA), and sequenced using the Illumina HiSeq 2500 platform with a depth of coverage > 1000x. nCounter Analysis System (Nanostring Technologies Seattle, WA, USA) was used to screen for the expression of 93 immune-related genes. Counts were normalized to the internal controls and reference genes using the nSolver software version 4.0.

Basic scheme of protein-protein interaction network analysis
To analyze with a deep insight of combinatorial signaling events evolved in cell communication, we applied novel PPI method named CBSJukebox®. Figure 1 shows the analytic ow in CBSJukebox®. In brief, CBSJukebox® enabled to compare DNA mutation-based pathways and over/under expression based pathways leveraged PPI analysis; and further, CBSJukebox® enabled to perform a simple signal pathway analysis as well as high interaction frequency ratio genes analysis.

Gene list enrichment
The variants selected for DNA mutation-based analysis included nonsynonymous single-nucleotide variant (SNV), frameshift inserts, frameshift deletions, stop-gain, stop-loss and copy number variation (CNV). The signi cantly different mRNAs expression subtypes were identi ed as over or under expressed genes based on Student's t test (p-value < 0.05 and |fold-change| > 2) and compared with those expressed in normal tissue for further over/under expression based pathway analysis.
PPI mapping of mutated genes and over-or underexpressed genes A multi-functional analytical tool CBSJukebox® was used to match DNA mutated genes with Entrez Gene identi cation included interaction distance, interaction type, interaction detection method, number of interactive information related database, number of related literature and number of interaction detection method [11]. In this study, we investigated the directly interacting genes with the start genes (mutation genes, over/under-expressed genes), and the organism chosen was Homo sapiens.

Signal Transduction Pathway analysis
For each patient, CBSJukebox® identi ed genes that interacted with start genes and mapped genes in signal transduction pathways from KEGG (Kyoto Encyclopedia of Genes and Genomes, https://www.ncbi.nlm.nih.gov/pubmed/11752249) database as well as provided the type of interaction information (interaction distance and ratio etc.). We selected top 10 signal transduction pathways among all of the recorded pathways in KEGG database based on the weight of numbers of interactions as well as interacting genes.

High Interaction Frequency Ratio Genes Analysis
For each signal transduction pathway, CBSJukebox® calculated the interaction frequency ratio of interacting genes that interacted with start genes. A 100% interaction frequency gene deems the gene that has highest interaction frequency with start genes within each signal transduction pathway. We calculated the interaction frequency ratio of each gene lied within each signal transduction pathway and set the high interaction frequency ratio cut-off of 75%.

Treatment Bene t Prediction Score (TBPS) calculation
We applied a novel algorithm that calculated gene interaction score for the top 10 signal transduction pathways that divided the number of interactions for each interacting gene (between start genes and interacting genes) in a speci c signal transduction pathway by the total number of interactions. Then we calculated each gene's treatment bene t prediction score (TBPS) by a sum of gene interaction scores included in the top 10 signal transduction pathways [12].
Potential treatment recommendation for patient CBSJuekbox® current version enables to suggest potential treatment options in the order of the genes' TBPS. The genes with high TBPS considered as potential targets for patient treatment could match with the drug target genes from DrugBank (Drugbank, https://academic.oup.com/nar/article/46/D1/D1074/4602867) database. In this study, we only considered drug targets, not limited to drug conditions of approval, indication, and non-prescription.
Comparison of mutation-based pathway and over-/under expressed genes-based pathways The top 10 mutation-based pathways (MBP), top 10 mRNA over-expressed genes-based pathway (OEBP) and top 10 mRNA under-expressed genes-based pathways (UEBP) for each patient were analyzed. The matching rate of MBP and OEBP, and the matching rate of MBP and OEBP were compared.

Validation of TBPS in two HNSCC patients treated with targeted agents
We validated TBPS in two HNSCC patients who were treated with molecular targeted gene therapies. One patient was a 55-year-old male patient. He had recurrent cancer and metastatic oral cavity cancer with Q75E mutation in PIK3CA. The other patient was a 38-year-old female patient and she had recurrent and metastatic paranasal sinus squamous cell carcinoma with frame shift mutations in FGFR1. These two patients were enrolled in TRIUMPH trial (NCT03292250) [6], umbrella trial for recurrent/metastatic HNSCC, consisting of ve targeted therapies including PI3K inhibitor, pan-HER inhibitor, FGFR inhibitor, CDK4/6 inhibitor and immune checkpoint inhibitor. These two patients received alpelisib (BYL719) monotherapy and nintedanib monotherapy, respectively, and showed partial responses. We calculated TBPS in these two patients and analyzed the correlation between TBPS and drug matching results.
We excluded tumor samples without any mutations, because such tumors cannot perform in pathway mapping analysis. We also excluded tumor sample with FoxoG error [13] and QC ag. Altogether, 77 samples which were available with regard to both mutational data and over-/under-expression mRNA data were nally analyzed. Top 10 signaling pathway discovered by mutation-based analysis and mRNA expressed genes-based analysis We compared the top 10 pathways frequently discovered by mutation-based analysis and mRNA gene expressed-based analyses. Two pathways, Kaposi's sarcoma associated herpes virus infection and HTLV-I infection pathways, were found overlapping both in mutation-based analysis and in over/ underexpression genes-based analysis (  Overlapping of mutation-based pathway and over/underexpressed genes based-pathways Comparing MBP and mRNA of over/under-expression genes-based pathway, we observed that 19.1% (147/770) of MBP were overlapping with OEBP, and 42.7% (329/770) of MBP were overlapping with UEBP (Table 3). Calculation of Treatment Bene t Prediction Score (TBPS) Table 4(and Supplementary Table S1) shows the results of TBPS and suggested drug. In the OEBP results, the patient #4 had alteration in T cell receptor signaling pathway, and CD3E gene was identi ed as druggable gene. Muromonab was suggested as targeted agent with TBPS 72.7 for the patient #4. The patient #5 had alteration in cell adhesion molecules (CAMs) pathway, and CD274 (PD-L1) overexpression. PD-L1 inhibitors, such as atezolizumab, avelumab, durvalumab were suggested for the patient #5. Interestingly, JAK1, which is not a well-known target for HNSCC, was identi ed in the patients #16, and roxilitinib was suggested. In the UEBP results, FYN was identi ed as a candidate gene in patient #57, and dasatinib was suggested as the matching drug. and UEBP (Fig. 3, Supplementary Figure S1).
To validate TBPS and suggest the matching drug, we analyzed the data of two HNSCC patients who showed good response to the treatment with PIK3CA inhibitor and FGFR inhibitor. When analyzing data of the nintedanib responding patient with FGFR1 mutation using cutoff of frequency ratio as 75%, nintedanib was suggested in mRNA expression-based analysis with TBPS of 2.5 (Table 5). Alpelisib

Discussion
In this study, we described a novel approach for pathway analysis using mutation data and mRNA expression data. Mutated-gene-related-pathways were associated mainly with mRNA under-expressiongenes-related-pathways. These results suggest that HNSCC are mainly related to loss-of-function mutations. However, big data based platform for druggable pathways can nd potential matching drugs.
Our model is based on 14 open databases for protein, interaction, and signaling pathways such as NCBI, Uniprot, KEGG, Biogrid, DIP, HPRD, and Drugbank. High interaction genes were mapped to investigate druggable pathways. We hypothesized that integration of each mutation and the respective mRNA expression into signaling pathway can identify their functional signi cance and therapeutic target.
Pathway network based on cancer gene landscape can give us insight into how these genes contribute to deregulated oncogenic pathways. Several studies [5,[14][15][16] had similar approaches based on pathway analysis. However, we developed a novel scoring model that measures the overlap between mutation and mRNA expression data, and calculates the interaction relationship score for discovering a potential target drug.
Each mutation and mRNA expression data from signaling nodes and hubs transmit pathological cues along molecular networks to achieve integrated tumorigenic pathways. From the interaction of receptors with deregulated growth factors to dimerization of receptor tyrosine kinases triggered by gene mutations, PPIs initiate a cascade of interactions to promote uncontrolled cell proliferation [9]. In response to oncogenic stimulation, PPIs play essential roles in linking networks that relay oncogenic signals, and therefore allow the suggestion of the target drug.
Unlike existing methods, our model is capable of ranking and scoring the signi cant KEGG pathways reported in the cancer research literature. We used the prior knowledge speci ed in the pathway in order to identify the particular pathway in gene/protein interaction that could explain the molecular basis of carcinogenesis. Our novel algorithm, so called CBSJukebox®, calculates the interaction frequency ratio of interacting genes. Based on the interaction frequency ratio, we can calculate each gene's TBPS using a sum of gene interaction scores. TBPS suggests the matching drug, and visualize the likelihood of responding probability.
During the experiment, we also observed that not only oncogenic pathways but also non-oncogenic pathways were deregulated and activated in HNSCC. These multiple pathways involvement implies that targeting multiple pathway is useful for further re ning the anti-cancer chemotherapy. We also found that overly activated pathway measured by mRNA over-expression and suppressed pathway measured by mRNA under-expression were quite different. However, biologically important pathways were overlapping in both mutation-based and expression-based pathways.
This study has limitations. This model was developed in silico and has not yet been fully validated in the patients. We tried to validate TBPS in two HNSCC patients who showed good response to FGFR inhibitor and PIK3 inhibitor. Our developed CBSJukebox® system suggested both the FGFR inhibitor and PIK3 inhibitor. However, when we applied interaction frequency ratio cut-off of 75%, the PIK3 inhibitor was excluded. This might be have been caused by the insu cient availability of gene data that interacted with PIK3CA pathway in public database. We will expand and use the updated public database in the future to re ne our CBSJukebox® system. Future work will focus on validation of the suggested drugs that were identi ed in this study with large sample size. Regarding future work, our Bayesian network model offers an easy way of incorporating additional data types such as CNV, proteomics data, methylation data, and so on, and such model extension should be attempted.
In conclusion, our pathway based systematic analysis of mutational and mRNA expression pathways provides novel mechanistic and clinical insights into the precision therapeutics for HNSCC. NGS-based mutated-gene-related-pathways were associated with mRNA under-expression-genes-related-pathways.
These results suggest that HNSCC are mainly caused by the loss-of-function mutations. However, big data based platform for druggable pathways can nd potential matching drugs.

Availability of data and materials
The data that support the ndings of this study are available from KCSG (Korean Cancer Study Group) but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of KCSG.

Figure 1
Over ow of protein-protein interaction analysis (CBSJukebox® Analysis) Figure 2 Comparison of mutation-based pathways, over-expressed genes-based pathways, and under-expressed genes-based pathways Oncoplot for top 20 pathways analyses in all patients

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. gureS1full.tif