Effect of MHC Linked 7-Gene Signature on Delayed Hepatocellular Carcinoma Recurrence

Dysregulated immune response significantly affects hepatocellular carcinoma’s (HCC) prognosis. Human Leukocyte Antigens are key in devising immune responses against HCC. Here, we investigated how HLAs modulate HCC development at the transcriptomic level. RNA-seq data of 576 patients from two independent cohorts was retrieved. The clinicopathological relevance of all HLA genes was investigated using Fisher-Exact, correlation, and Kaplan–Meier and cox regression survival tests. Clustering of ~800 immune-related genes against HLAs was completed using a ward-agglomerative method. Networks were generated using 40 HLA associated unique genes and hub genes were investigated. HLAs including HLA-DMA, HLA-DMB, HLA-DOA and HLA-DRB6 were associated with delayed recurrence in both discovery (204 HCC cases) and validation (372 HCC cases) cohorts. Clustering analyses revealed 40 genes associated with these four HLAs in both cohorts. A set of seven genes (NCF4, TYROBP, LCP2, ZAP70, PTPRC, FYN and WAS) was found co-expressed at gene–gene interaction level in both cohorts. Furthermore, survival analysis revealed seven HLA-linked genes as predictors of delayed recurrence. Multivariate analysis also predicted that mean expression of 7-gene is an independent predictor of delayed recurrence in both cohorts. We conclude that the expression of 7-gene signature may lead to improved patient prognosis. Further studies are required for consideration in clinical practice.


Introduction
HCC is a common heterogeneous malignancy of the liver with a high mortality rate. It is second most common cause of mortality in males and overall third most common worldwide [1,2]. Despite continuously improving therapeutic approaches, the overall survival rate of HCC patients is still very low. Studies revealed that 40-80% of HCC patients will have recurrence and metastasis within 5 years after receiving treatment [1]. Therefore, a new and distinct approach is required for early-stage diagnosis and improving survival of HCC patients. 2 of 14 Major Histocompatibility Complex (MHC) assists the immune system in "host defense mechanism", which comprises of genes that encode proteins present on the cell surface, also known as Human Leukocyte Antigens (HLAs) [3]. MHCs can be divided into three classes: class I, II and III. The class I region contains genes including three classical (HLA-A, HLA-B and HLA-C), three non-classical (HLA-E, HLA-F and HLA-G) and twelve non-coding genes or pseudogenes [4]. Recent studies demonstrated the association of HLA expression with the prognosis of cancer, suggesting new immunotherapeutic options. However, contradictory results are observed in HLA expression-based prognosis prediction. For instance, overexpression of HLA-G, an MHC class I gene, was found in high grade tumors and suggested as a new checkpoint in different cancers [5,6]. The expression of HLA-B and HLA-E showed linkage with cancer cell migration, metastasis and poor overall survival in pancreatic and endometrial cancers [7,8]. On the contrary, MHC class I genes were also linked with better prognosis in breast and gastrointestinal tumors, suggesting tissue specific roles of HLAs [9,10]. Similarly, overexpression of MHC class II genes including HLA-DP, HLA-DQ and HLA-DR showed association with better prognosis in lung cancer [11]. Additionally, HLA-DR expression was also suggested to improve the prognosis of metastatic breast cancer patients [12].
In HCC, MHC class II molecules are most commonly expressed [13]. Previous studies have also shown that HLAs may play a role in HCC prognosis [14][15][16]. However, due to contradictory results on prognosis prediction, no HLA based immunotherapies are suggested in clinical practice. Additionally, limited information is available regarding the linkage of major HCC associated pathways with HLA expression. Therefore, a comprehensive and integrated study is required to ascertain the prognostic role of all HLA genes in HCC as well as their linkage with genetic pathways associated with HCC.
Thus, the current study is devised to investigate the association of HLA genes with the prognosis and survival of HCC patients. For that purpose, high throughput whole transcriptomic data of 372 and 204 HCC patients along with complete clinical information was retrieved for the analysis. Initially, the clinicopathological relevance of 23 HLA genes from all classes was established. Next, the prognostic role of immune response genes linked with HLA expression was also evaluated. The current study has immense clinical implications as we classified HCC patients into different prognostic groups based on the expression of HLAs and HLA-associated immune response genes.

RNA Sequencing Data of 204 HCC Patients (Discovery Cohort)
Overall study design is presented in Figure 1. The tissues samples of 204 HCCs patients diagnosed between 2005 to 2013 were obtained from the Bio-Resource Center, Korea Biobank Network. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards ((IRB # 2012-0389, ASAN Medical Center, Korea), (COMSATS University Islamabad (CIIT/Bio/ERB/16/21)). Library preparation and RNA quality assessment is described in previous studies [17,18]. Quality check was performed using FASTQC on raw FASTQ files. Alignment was done using STAR RNA-Seq Aligner v2.x7.2 [19]. Duplicates were marked using a Picard tool (https://broadinstitute.github.io/picard/ last accessed on 30 March 2020). After post-alignment clean-up, all the mapped sequences were merged into a Binary Aligned/Mapped (BAM) file. Gene expression analysis was performed using HTSEQ-count (https://htseq.readthedocs.io/en/master/ last accessed on 30 March 2020). The numbers of aligned reads that overlap each gene in the annotations were counted. Lastly, Reads per Kilobase Million (RPKM) values were generated for the analysis. To calculate z-score, = mean expression in reference sample was subtracted from each expression in the tumor sample and the result was divided by the standard deviation of expression in the reference sample [20]. Clinical information included age, sex, HBV, HCV, tumor size, Edmonson most grade, alpha-fetoprotein level, microvascular invasion, cirrhosis, fibrosis, overall survival and disease free survival (Table S1). each expression in the tumor sample and the result was divided by the standard deviation of expression in the reference sample [20]. Clinical information included age, sex, HBV, HCV, tumor size, Edmonson most grade, alpha-fetoprotein level, microvascular invasion, cirrhosis, fibrosis, overall survival and disease free survival (Table S1).

Figure 1.
A detailed workflow of the study.

RNA Sequencing Data of 372 HCC Patients (Validation Cohort)
RPKM and z-score values of 372 HCC patients were downloaded from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/ last accessed on 30 March 2020) [21]. Complete clinical information of these samples was downloaded from the cBi-oPortal database (http://www.cbioportal.org/public-portal/, last accessed on 30 March 2020). Clinical information includes sex, HBV, HCV, tumor size, M stage, T stage, N stage, vascular invasion, cirrhosis, fibrosis, overall survival and disease free survival (Table S2).

RNA Sequencing Data of 372 HCC Patients (Validation Cohort)
RPKM and z-score values of 372 HCC patients were downloaded from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/ last accessed on 30 March 2020) [21]. Complete clinical information of these samples was downloaded from the cBioPortal database (http://www.cbioportal.org/public-portal/, last accessed on 30 March 2020). Clinical information includes sex, HBV, HCV, tumor size, M stage, T stage, N stage, vascular invasion, cirrhosis, fibrosis, overall survival and disease free survival (Table S2).

Statistical Analysis
SPSS 21.0 software (IBM, Armonk, NY, USA) and R version 3.5.1 were used for all statistical analyses. Multiple statistical tests were performed including Fisher Test, Chi square Test, Kaplan-Meier survival and univariate and multivariate cox regression survival analysis. p < 0.05 was considered as statistically significant.

Clustering and Correlation Analysis
For clustering analysis, more than~800 immune pathway and major HCC-related pathway genes were extracted through extensive text mining and database search. Correlation and clustering analysis of those genes against HLAs were performed. Spearman test was performed for correlation analysis. Hierarchical clustering was performed using Ward's agglomerative method [22]. HLAs in samples were indicated by a colored bar: red for upregulation and yellow for downregulation. Rows indicated clustering of genes ( Figure S1).

Network Analysis of Candidate Co-Expressed Genes
The gene interactions among HLA genes and immune pathway genes were retrieved using GeneMania based parameters including physical contacts, co-expression, genetic interactivity, shared protein domains and co-localization. The hub was obtained using the CytoHubba [23]. Frequently occurring hub nodes in the gene interaction network with respect to mcc centrality measure, betweenness, bottleneck and degree measure were selected for the analysis.

Gene Signature Score Calculation
Lastly, a signature score was calculated for the HLA-linked immune response genes which showed association with HCC prognosis. The following equation was used to calculate gene score: where x i : mean of expression of genes. Briefly, mean expression of all genes was calculated separately for each sample. Then mean of mean was calculated across the samples. Lastly, if the mean expression in each sample was greater than the overall mean, we gave it a score of 1. If the mean expression in each sample was less than the overall mean, we gave it a score of 0. After generating the score for significant genes using Kaplan Meier plots and univariate, multivariate cox regression methods were applied on validation and discovery cohorts for analyzing the association of gene signature with disease free survival (DFS) and overall survival (OS). Associations with p-values < 0.05 were considered significant.

Clinical Analysis of MHC Genes (Discovery Cohort)
Initially, mRNA expression of all HLA genes was screened from both discovery and validation for the analysis. Tables S1 and S2 consist of clinical characteristics of HCC patients for discovery and validation cohorts, respectively. In the discovery cohort, high expression of MHC Class I genes showed significant association with different clinical features including sex (HLA-G p = 0.003, HLA-L p = 0.018), non-HBV (HLA-E p = 0.004), Edmonson grade (HLA-C p = 0.029, HLA-E p = 0.029), tumor size (HLA-L p = 0.047) and cirrhosis (HLA-F, p = 0.034). According to survival analysis results, HLA-E (p = 0.041) showed association with better disease-free survival while HLA-G (p < 0.05) was associated with better disease free as well as overall survival of HCC patients (Table S3) (Table S3).

Clinical Analysis of HLA Genes (Validation Cohort)
Association between HLA genes and clinicopathological factors in the validation cohort is summarized in Table S4. According to the results, elevated levels of MHC class 1 genes including HLA-C, HLA-H, HLA-E, HLA-F, HLA-G, and HLA-H were associated with HCV (HLA-C p = 0.002, HLA-E p = 0.005, HLA-F p = 0.000, HLA-G p = 0.009, HLA-H p = 0.007). HLA-C showed an association with sex, alcohol consumption, cirrhosis with p values = 0.001, 0.045, 0.043 respectively. Of note, the HLA-H gene showed maximum association with clinical parameters including age (p = 0.048), sex (p = 0.023), tumor-stage (0.004) and T stage (p = 0.002). HLA-G (p = 0.007) and HLA-E (p = 0.003) also showed association with disease recurrence. Moreover, survival curves demonstrated that patients with high levels of HLA-E (p = 0.034) and HLA-H (0.045) were significantly associated with improved survival (Table S4). In addition, MHC class II genes including HLA-DMA  (Table S4). Interestingly, four genes of MHC Class II HLA-DOA, HLA-DMA, HLA-DMB and HLA-DRB6 showed significant association with disease free survival in both cohorts.

Clustering Analysis of HLA Genes with Adaptive Immune and HCC Related Genes
Out of a total 822 genes, expression levels of 505 genes were identified in both cohorts. Hierarchical clustering was performed on both cohorts using normalized expression of 505 genes by applying a heatmap function and ward d2 clustering method. Significant clusters of genes were selected which showed possible co-expression with HLAs ( Figure S1). Selected clusters were further validated using the Spearman correlation method (Table S5).
Lastly, out of all these genes, a set of 40 unique genes showing positive correlation with HLAs were selected for further analysis (Table S6).

Network Analysis and Hub Gene Identification of 40 Genes Co-Expressed with HLAs
Next, gene-gene interaction among 40 unique genes and HLAs were obtained from GeneMania. The gene-gene interaction network showed interactions based on various parameters where co-expression based interactions (in light purple color) were 72.63%, physical interactions among the genes scored 6.40% (marked in pink color) of all interactions, genetic interactions (in light green color) were 5.60% and the rest were the rare interactions ( Figure S2).
Furthermore, degree centrality, betweenness, closeness, bottleneck and MCC calculation methods were chosen as the measure of hub node identification in CytoHubba. Network results showed the top 10 genes as potential hub nodes ( Figure S3). Common genes present in all five networks including NCF4, TYROBP, LCP2, ZAP70, PTPRC, FYN and WAS were identified as key hub nodes.

Clinical Analysis of Seven Genes Signature
Next, the clinical relevance of these seven HLA associated genes was evaluated. All these genes showed significant associations with different clinical features (Table 1). Upregulation of TYROBP and WAS showed significant association with age <60 (p value = 0.035).
Similarly, in validation cohort, high expression of these genes showed significant association with different clinical features including sex (FYN p = 0.038), grade (FYN p = 0.001), tumor stage WAS p = 0.037, ZAP70 p = 0.043) and T stage (WAS p = 0.049, ZAP70 p = 0.049) ( Table 2). Kaplan-Meier survival curves results showed that elevated expression of LCP2, PTPRC, WAS, ZAP70 and FYN were associated with delayed recurrence. Furthermore, upregulation of FYN and ZAP70 showed significant association with better survival (Figure 3).

Survival Analysis of 7 Genes Signature Score
Lastly, after identifying the significance of these seven MHC linked genes at multiple genetic and prognostic levels, we generated a score based on the mean expression of all significant genes. Interestingly, Kaplan-Meier survival analysis revealed that upregulation of seven gene signature in both cohorts (Discovery p = 0.0021) and (Validation p = 0.0066) showed improved disease-free survival in HCC patients (Figure 4a,b). Of note, we also downloaded another HCC expression cohort (ID: GSE45114) to further establish the association between seven gene signature and disease recurrence. Interestingly, after gene signature score estimation, we identified that upregulation of seven gene signature showed association with increased DFS of HCC patients (p = 0.017) (Figure 4c).    significant genes. Interestingly, Kaplan-Meier survival analysis revealed that upregulation of seven gene signature in both cohorts (Discovery p = 0.0021) and (Validation p = 0.0066) showed improved disease-free survival in HCC patients (Figure 4a,b). Of note, we also downloaded another HCC expression cohort (ID: GSE45114) to further establish the association between seven gene signature and disease recurrence. Interestingly, after gene signature score estimation, we identified that upregulation of seven gene signature showed association with increased DFS of HCC patients (p = 0.017) (Figure 4c).

Discussion
Clinically, HCC is one of the most challenging cancers with an increasing mortality rate. In order to improve patient prognosis, the focus of HCC is gradually transforming from treatment to boosting host immune response [24,25]. Hence, identification of potential biomarkers for devising new immunotherapies and boosting patient prognosis are the prime focus of the study. In this study, using RNA-seq data of 576 HCC patients, a panel of seven genes (including LCP2, TYROBP, ZAP70, PTPRC, FYN, WAS and NCF4) has been identified which are independent predictors of delayed tumor recurrence and improved patient prognosis.
According to the statistical analysis results, several HLA genes (including HLA-A, HLA-C, HLA-E, HLA-F, HLA-G, HLA-L, HLA-DMB, HLA-DQA2 and HLA-DRA) showed association with different clinicopathological features in both HCC cohorts. Recent studies demonstrated the linkage of all MHC classes with HCC prognosis [26]. Interestingly, according to our results, HLA-E, HLA-DMA, HLA-DMB, HLA-DOA and HLA-DRB6 showed association with improved disease-free survival in both cohorts, emphasizing the need to investigate the functional implications of HLA expression in HCC using pre-clinical and clinical studies.
Interestingly, our study showed strong association of five HLA genes (including HLA-DMA, HLA-DMB, HLA-DOA, HLA-DRB6 and HLA-E) with delayed tumor recurrence in both cohorts. Previous studies have demonstrated aberrant expression of HLA class II components in epithelial ovarian cancer cells [27]. Consistent with our findings, HLA-DMB expression in ovarian cancer epithelial cells was associated with CD8 T cell infiltration and a marked improvement in disease survival [27]. Similarly, a recent study showed decreased expression of HLA-DMA, HLA-DMB and HLA-DOA in HBV associated HCC patients [26]. In a recent study, HLA-DRB6 was found to be upregulated among cervical cancer tissues and cell line data [28]. Additionally, a decreased expression of HLA-DRB6 was also reported in HCC patients compared to non-tumor liver tissues [26,29]. Furthermore, a cell line based study suggested that the upregulation of HLA-E represents a mechanism of tumor escape from NK cytotoxicity in ovarian cancer and melanoma [30]. Additionally, dysregulation of HLA-E expression and genetic mutations are also previously reported in HCC. However, the clear mechanism of how HLA-E affects tumor environment is largely unknown.
Recently, reactivation of the T cell receptor (TCR) signaling pathway gained focus in HCC [31,32]. Interestingly, most of the HLA linked genes identified in our study work in concordance to modulate TCR signaling, except NCF4. For instance, activation of CD3 heterodimer complex (TCR-CD3) on T cell membrane stimulates ZAP70 responsible for both recruitment and phosphorylation of numerous downstream adaptors proteins. Two main substrates for ZAP70 are linked for the activation of T cells (LAT) and LCP-2/SLP-76 [33]. Interestingly, ZAP70 influences calcium mobilization (via PL-Cγ1) and ras activation (via GRB2) by interacting with LAT, while for cytoskeleton rearrangement (via VAV1), LCP-2 complex plays a pivotal role. Regulation of these signaling cascades sets the threshold of activation of naive T cells so that they are not activated by selfpeptide-MHC complexes but respond only to foreign peptide-MHC complex [34]. This signaling triggers Rho family GTPases (Rac1 and Rac2), leading to actin polymerization via Wiskott-Aldrich syndrome protein (WASP) and ARP2-ARP3 complex. This structural rearrangement is desired for both recognition and induction of cytotoxic effect on APC cells. Hence, enhanced LCP-2 significantly boosts cell mediated immune response. Involvement of LCP-2 as the key mediator responsible for enhanced T cell activation and immune function is already known [35].
Additionally, several co-stimulatory factors also regulate maturation of naive T-cells. TCR-CD3 complex also activates Src family kinase members (Lck and Fyn), leading to activation of several signaling pathways, such as the ERK pathway [36]. Interestingly, FYN irrespective of Lck can also stimulate the ERK pathway while PTPRC (CD45) maintains dephosphorylated form at Tyr 505 of C-terminal residues of Lck and Fyn retaining their active conformation inside cell [37]. Of note, TYROBP is also previously identified as a substrate of PTPRC gene in TCR signaling [38].
Of note, these seven genes were linked with different cancers in previous studies. For instance, LCP2 expression was linked with improved survival in breast and ovarian cancer [39]. Similarly, ZAP70, PTPRC, FYN and WASP2 were found to be dysregulated in different cancers including breast, leukemia, colorectal cancer, etc. [40][41][42][43]. However, none of the studies showed any linkage of these seven genes with delayed tumor recurrence in HCC. Our integrated approach provided a novel insight in predicting new prognostic subtypes of HCC. However, due to the sparsity of the high throughput sequencing data and mixture of HBV and HCV infected patients, further in-vitro and in-vivo studies are warranted to evaluate the effect of HBV/HCV infection before consideration of gene signature in clinical practice.
In conclusion, our data suggests that the seven gene signature can boost immune response in HCC patients by mainly regulating TCR signaling. Earlier studies showed that TCR associated immunotherapies are also emerging as new treatment options against HCC. Interestingly, we demonstrated that despite the substantial heterogeneity of HCC at the genomic level, whole transcriptomic data analysis is an excellent strategy to identify new biomarkers for improving HCC prognosis. Of note, this immune related signature is also validated in three independent cohorts, suggesting it as a potential biomarker for HCC patients worldwide.  Table S1: Clinicopathological characteristics of Discovery cohort (204 HCC patients), Table S2: Clinicopathological characteristics of Validation cohort (372 HCC patients), Table S3: Association of expression of HLA genes with clinical features in Discovery cohort, Table S4: Association of expression of HLA genes with clinical features in Validation cohort, Table S5: MHC gene and their  correlated gene list, Table S6

Conflicts of Interest:
The authors declare no conflict of interest.