A Multi-Omics Analysis of NASH-Related Prognostic Biomarkers Associated with Drug Sensitivity and Immune Infiltration in Hepatocellular Carcinoma

Background: Nonalcoholic steatohepatitis (NASH)-driven hepatocellular carcinoma (HCC) is becoming a major health-related problem. The exploration of NASH-related prognostic biomarkers and therapeutic targets is necessary. Methods: Data were downloaded from the GEO database. The “glmnet” package was used to identify differentially expressed genes (DEGs). The prognostic model was constructed by the univariate Cox and LASSO regression analyses. Validation of the expression and prognosis by immunohistochemistry (IHC) in vitro. Drug sensitivity and immune cell infiltration were analyzed by CTR-DB and ImmuCellAI. Results: We constructed a prognostic model that identified the NASH-related gene set (DLAT, IDH3B, and MAP3K4), which was validated in a real-world cohort. Next, seven prognostic transcription factors (TFs) were identified. The prognostic ceRNA network included three mRNAs, four miRNAs, and seven lncRNAs. Finally, we found that the gene set was associated with drug response which was validated in six clinical trial cohorts. Moreover, the expression level of the gene set was inversely correlated with CD8 T cell infiltration in HCC. Conclusions: We established a NASH-related prognostic model. Upstream transcriptome analysis and the ceRNA network provided clues for mechanism exploration. The mutant profile, drug sensitivity, and immune infiltration analysis further guided precise diagnosis and treatment strategies.


Introduction
Hepatocellular carcinoma (HCC) is one of the most common cancers with an increasing mortality rate [1,2]. Nonalcoholic steatohepatitis (NASH), a manifestation of systemic metabolic disease, has been recognized as a leading cause of HCC and liver transplantation in recent years [3]. Although the incidence of NASH-driven HCC is lower than that of other causes (such as hepatitis virus), the number of NASH patients is large, and the incidence of NASH is projected to double within 10 years [4].

Construction of a Prognostic Model and Nomogram
The least absolute shrinkage and selection operator (LASSO) regression model was performed using the "glmnet" package in TCGA-LIHC. Univariate Cox regression analysis further screened the DEGs with prognostic value using the "survival" package. The risk score was calculated according to the centralized and standardized mRNA expression data in TCGA-LIHC. Risk score = ∑ n i=1 exp(Xi) coef (Xi). Here, risk score = expression level of DLAT × 0.255685 + expression level of IDH3B × 0.027537 + expression level of MAP3K4 × 0.190317. Next, the overall survival (OS) curves and receiver operating characteristic (ROC) curves were produced by the "survival" package and "timeROC" package, respectively. The "rms" and "survival" packages were used to form nomogram and calibration plots.

Specimen Collection
Liver tissues were obtained from Xiangya hospital. Two highly qualified pathologists independently diagnosed these samples. HCC tissues and their adjacent normal tissues were collected and immediately transferred to −80 • C for storage once excised. Formalin-fixed tissue samples were embedded with paraffin for subsequent immunohistochemistry (IHC). This study was performed in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Xiangya hospital (ID: 202105066). All participants/patients have given informed consent.

Immunohistochemistry (IHC)
After dehydration and paraffin embedding, the liver tissues fixed with 4% paraformaldehyde were prepared into tissue chips. After dewaxing and hydration, 10 mM sodium citrate was used at 95 • C for 15 min for antigen repair. After cooling to room temperature, endogenous peroxidase was blocked with 3% H2O2 for 10 min. The diluted primary antibodies (DLAT Afantibody Cat# AF302309; IDH3B Afantibody Cat# AF10574; MAP3K4 Afantibody Cat# AF05249) were then incubated at 4°C overnight. Next, biotin-labeled IgG (Abcam Cat# ab6721) was used to detect antibody binding. DAB-H 2 O 2 and hematoxylin were prepared for staining. Images were photographed with a microscope. The staining score was defined by both the staining intensity score and the positive staining percentage score. The staining intensity was divided into 4 grades: negative is 0; weakly positive is 1; moderately positive is 2; strongly positive is 3. The percentage of positive staining was divided into 4 groups: 1 means positive in 1-25%; 2 means positive in 26-50%; 3 means positive in 51-75%; 4 means positive in 76-100%.

Drug Sensitivity and Response Analysis
We collected the IC50 data and mRNA gene expression from the Genomics of Therapeutics Response Portal (GTRP) (https://portals.broadinstitute.org/ accessed on 10 October 2022) [18]. Pearson correlation analysis was performed to obtain the correlation between gene mRNA expression and drug IC50. Data on drug response to cancer treatments were obtained from the cancer treatment response gene signature database (CTR-DB) (http://ctrdb. cloudna.cn/ accessed on 10 October 2022) [19]. CTR_Microarray_14, CTR_Microarray_15, CTR_Microarray_35, CTR_Microarray_71, CTR_RNAseq_410, and CTR_RNAseq_386 were analyzed. The ability to predict drug response was based on the AUC value.

Immune Cell Abundance Identifier (ImmuCellAI)
The association between gene expression and immune cell infiltration was analyzed by ImmuCellAI (https://github.com/lydiaMyr/ImmuCellAI accessed on 10 October 2022) [20]. Spearman's coefficient was used for evaluation. The immune infiltration data were obtained from TCGA. Heatmaps were drawn through the "ggplot2" package.

Statistical Analysis
Statistical analysis was performed in R software (v.3.6.3) and GraphPad Prism. The differential expression analyses were detected with t-test. The survival analyses were determined by the Logrank test. In all analyses, p-value < 0.05 was considered statistically significant. *, **, and *** indicated p < 0.05, p < 0.01, and p < 0.001, respectively.

Screening of the Common DEGs in NASH and HCC
In search of gene expression microarrays regarding NASH and HCC in the GEO database, three datasets (GSE89632, GSE99807, and GSE87630) were finally included ( Figure 1A). Subsequently, differential expression analysis was conducted. A total of 487 DEGs were obtained in GSE89632, 828 DEGs were obtained in GSE99807, and 1247 DEGs in GSE87630. The corresponding volcano maps and heat maps (top 20) are shown in Figure 1B-D. Next, we further identified 19 DEGs that appeared in all 3 databases ( Figure 1E). The gene symbols and interactions of these 19 significant DEGs are presented in Figure 1F and Table S1.

Functional Enrichment Analysis for the Common DEGs
We performed functional enrichment analysis to predict the underlying biological function and corresponding pathways between NASH and HCC. The enriched GO functions included citrate metabolic process and acetyl-CoA metabolic process in the BP category ( Figure 2A); mitochondrial matrix, oxidoreductase complex, and secretory granule lumen in the CC category ( Figure 2B); oxidoreductase activity, acting on the aldehyde or oxo group of donors and sulfur compound binding in the MF category ( Figure 2C). Some enriched KEGG pathways were also observed, among which citrate cycle, carbon metabolism, and pyruvate metabolism were the most highly enriched pathways ( Figure 2D). We performed functional enrichment analysis to predict the underlying biological function and corresponding pathways between NASH and HCC. The enriched GO functions included citrate metabolic process and acetyl-CoA metabolic process in the BP category ( Figure 2A); mitochondrial matrix, oxidoreductase complex, and secretory granule lumen in the CC category ( Figure 2B); oxidoreductase activity, acting on the aldehyde or oxo group of donors and sulfur compound binding in the MF category ( Figure 2C). Some enriched KEGG pathways were also observed, among which citrate cycle, carbon metabolism, and pyruvate metabolism were the most highly enriched pathways ( Figure 2D).

Construction and Validation of a Prognostic Model
As shown in Figure 1F, we screened out 19 common DEGs. The LASSO regression algorithm and Cox regression analysis were used to refine the gene set ( Figure 3A-C) to reduce the overfitting of high-dimensional genes. In this manner, three genes were identified as the most valuable predictive DEGs, and the risk score system was established using the formula mentioned above. In addition, we also analyzed the expression levels of DLAT, IDH3B, and MAP3K4 in HCC ( Figure 3D) and their prognostic value in different clinical subgroups: T3 and T4 stage, pathological stage III and IV, fibrosis Ishak score 3-6, and Child-Pugh grade B and C. The results indicated that the lower expression of the three genes might be correlated with better prognosis in advanced patients ( Figure 3E-G, Figure S1).   In order to confirm whether this risk model could predict the prognosis better, patients from the TCGA-LIHC database were randomly divided into a training set (50%) and a test set (50%). The median risk score was defined as the threshold, dividing the HCC patients into high-risk and low-risk groups. We found that the high-risk group had a higher mortality rate compared with the low-risk group in the training set ( Figure 4A). Kaplan-Meier curve showed that patients in the high-risk group had a worse prognosis (p < 0.001) ( Figure 4B). Moreover, the prognostic accuracy of OS was 0.775 at one year, 0.699 at three years, and 0.658 at five years ( Figure 4C). In the test set, the high-and low-risk groups exhibited significantly different survival probabilities (p = 0.008) with an AUC of 0.780, 0.726, and 0.699 in a time-dependent ROC at one, three, and five years ( Figure 4D-F). In addition, the prognostic model constructed based on our risk score was superior to the single gene ( Figure 4G-I) and other models that have been published [21][22][23][24].

Construction of the Nomogram and Calibration Curves
To better guide clinical decision-making, we established a nomogram combining multiple clinical parameters, including age, gender, T stage, pathological stage, and risk scores ( Figure 5A). In the nomogram, each signature was assigned points according to its risk contribution to OS. Furthermore, we constructed calibration curves, which illustrated a remarkable consensus with the actual survival time of patients with HCC ( Figure 5B). Based on these findings, we found that the nomogram containing our risk scores performed well in predicting OS in HCC.

Construction of the Nomogram and Calibration Curves
To better guide clinical decision-making, we established a nomogram combining multiple clinical parameters, including age, gender, T stage, pathological stage, and risk scores ( Figure 5A). In the nomogram, each signature was assigned points according to its risk contribution to OS. Furthermore, we constructed calibration curves, which illustrated a remarkable consensus with the actual survival time of patients with HCC ( Figure 5B). Based on these findings, we found that the nomogram containing our risk scores performed well in predicting OS in HCC.

Validation of the Expression and Prognosis in a Real-World HCC Cohort
We collected 30 pairs of HCC tissues and normal liver tissues to make the tissue microarray (Table 1). Excluding sample loss during tissue microarray fabrication and staining, 28 pairs of samples are available. The protein expression levels of DLAT, IDH3B, and MAP3K4 were detected by IHC. The high/low expression of the target protein was relative, defined by the median IHC staining score. Consistent with the previous analysis, our results showed that DLAT presented a high expression level, and the patients with high DLAT had poorer OS (Log-rank p = 0.049) ( Figure 6A

Prediction of the Upstream TFs in HCC
To better understand the regulatory mechanisms, transcriptome analysis was performed using GRNdb in HCC. First, we predicted 32 TFs targeted DLAT, 35 TFs targeted IDH3B, and 14 TFs targeted MAP3K4 (Table S2) and constructed a transcriptional regulatory network ( Figure 7A). To further explore the intrinsic relationship of the gene set, we screened TFs targeting at least two genes for subsequent analysis ( Figure 7B). The correlation between TFs and target genes was shown by heatmaps ( Figure 7C-E). Based on correlation analysis (R > 0.3) and prognostic analysis (p < 0.05), we finally identified seven potential TFs: RELA, RAD21, PML, EZH2, ELF4, NFYA, and SUPT20H ( Figure 7F). The corresponding relationships between TFs and target genes are presented in Figure 7G.

Construction of ceRNA Network in HCC
In recent years, more and more evidence has revealed the role of mRNA-miRNA-lncRNA networks in a variety of human cancers. The predicted miRNAs and lncRNAs are from mirDIP and StarBase V2.0. These two databases combined nearly 30 different sources, and we selected the targets with the highest score class for further analysis. Finally, we identified 8 miRNAs that targeted DLAT, 10 miRNAs that targeted IDH3B, and 9 miRNAs that targeted MAP3K4 ( Figure 8A-D, Table S3). Based on the classical inverse relationship between miRNAs and target genes, we hypothesized that the upstream miRNAs should theoretically display lower expression levels and favorable prognosis. Therefore, we further evaluated the expression and prognosis of 27 predicted miRNAs in HCC. The results showed that four miRNAs (hsa-miR-1271-5p, hsa-miR-582-3p, hsa-miR-491-5p, and hsa-miR-148a-3p) out of 27 miRNAs functioned as significant prognostic biomarkers for patients with HCC ( Figure S2A).
Next, we further predicted those lncRNAs that can potentially bind to the four key miRNAs ( Figure 8E, Table S4) based on the StarBase. There is a negative correlation between lncRNAs and miRNAs, according to the ceRNA hypothesis. Combining the expression analysis and survival analysis results for these predicted lncRNAs, we defined the seven lncRNAs: LINC00847 and LINC01011 targeted hsa-miR-582-3p; MAPKAPK5-AS1 targeted hsa-miR-1271-5p; AC078846.1 targeted hsa-miR-491-5p; and SNHG4, LINC00667, and AC093227.1 targeted hsa-miR-148a-3p ( Figure S2B). Through the above-integrated analysis, we have constructed the mRNA-miRNA-lncRNA triple network ( Figure 8F), which was significantly associated with the prognosis of HCC. The network may also be developed as promising prognostic biomarkers or therapeutic targets for HCC in the future.
IDH3B, and 14 TFs targeted MAP3K4 (Table S2) and constructed a transcriptional regu tory network ( Figure 7A). To further explore the intrinsic relationship of the gene set, screened TFs targeting at least two genes for subsequent analysis ( Figure 7B). The cor lation between TFs and target genes was shown by heatmaps ( Figure 7C-E). Based correlation analysis (R > 0.3) and prognostic analysis (p < 0.05), we finally identified sev potential TFs: RELA, RAD21, PML, EZH2, ELF4, NFYA, and SUPT20H ( Figure 7F). T corresponding relationships between TFs and target genes are presented in Figure 7G.

The Expression and Mutation Profile in Pan-Cancer
To further explore the role of the gene set (DLAT, IDH3B, and MAP3K4) across diverse cancer types, we analyzed their expression levels and mutant profiles. We found that the gene set had consistently high expression levels in CHOL, ESCA, LIHC, LUAD, LUSC, and STAD but low expression levels in KIRC and THCA ( Figure 9A-C). To further explore the role of the gene set (DLAT, IDH3B, and MAP3K4) across diverse cancer types, we analyzed their expression levels and mutant profiles. We found that the gene set had consistently high expression levels in CHOL, ESCA, LIHC, LUAD, LUSC, and STAD but low expression levels in KIRC and THCA ( Figure 9A-C).  Mutation data of DLAT, IDH3B, and MAP3K4 were analyzed and visualized in the GSCA online tool. The landscape of mutation in each cancer sample was presented in the waterfall plot ( Figure 10A). By comparison, missense mutations, SNPs, and C > T accounted for the largest fraction of SNV in pan-cancer ( Figure 10B). SNV heatmap indicated that the mutation rate of MAP3K4 was the highest, and uterine corpus endometrial carcinoma (UCEC) had the highest mutation rate ( Figure 10C, Table S5). On the other hand, CNV percentages in pan-cancer are shown in Figure 10D. As for DLAT, the highest copy number amplification (>33%) was found in DLBC, and the highest copy number deletion (>81%) was found in TGCT. As for IDH3B, the highest copy number amplification (>62%) was found in UCS, and the highest copy number deletion (>29%) was found in READ. As for MAP3K4, the highest copy number amplification (>44%) was found in UCS, and the highest copy number deletion (>77%) was found in KICH (Table S5). In addition, the prognostic value of the gene set mutations was also analyzed in pan-cancer ( Figure 10E,F). The endpoints of OS, PFI, DFI, and DSS data are in accordance with Liu's suggestion [25]. The results suggested that the gene set mutant had better PFS and DFI than the wild type in UCEC but had worse PFS in OV. Moreover, the gene set amplification/deletion type in THCA, SARC, UCEC, and LGG had consistently better OS, PFS, and DSS than the wild type.

Drugs Sensitivity and Response Analysis
The correlations between the gene set expression and drug sensitivity (top 30) in pancancer were summarized in Figure 11A and Table S6. We found that the expression level of the gene set was generally negatively correlated with drug sensitivity. Next, we evaluated the response of the gene set to therapeutic drugs in six real-world trial cohorts (Table 2). Figure 11B-D revealed the ability of DLAT to predict response to Imatinib, FOLFIRI, and FOLFOX6 regimen were 0.744, 0.824, and 0.756. Figure 11E,F revealed the ability of IDH3B to predict response to Paclitaxel and Fluorouracil were 0.868 and 0.84. Figure 11G revealed the ability of MAP3K4 to predict response to Peginterferon alfa-2a was 0.921.

Correlation Analysis of Immune Cell Infiltration
With the popularity of immunotherapy, attention has been paid to the role of the tumor microenvironment in the process of tumor development, metastasis, recurrence, and drug resistance [26]. We analyzed the relationship between the gene set expression levels and immune cell infiltration in 33 human tumors. As shown in Figure 12A-C, we selected six common immune cells: CD8 T cell, CD4 T cell, B cell, macrophage, neutrophil, and dendritic cell. To show the results more clearly as a whole, we used the GSVA score to replace the gene set expression value ( Figure 12D) (Table S7). GSVA score represented the comprehensive level of gene set expression and was positively correlated with gene set expression [27]. It is generally accepted that CD8 T cells are the main performers of the immune response. Specifically in LIHC, we found that DLAT, MAP3K4, and CD8 T cell infiltration were negatively correlated ( Figure 12E). Further evaluation by the infiltration score results suggested that the expression level of the gene set was inversely correlated with immune cell infiltration in general ( Figure 12F).
2). Figure 11B-D revealed the ability of DLAT to predict response to Imatinib, FOLFIRI, and FOLFOX6 regimen were 0.744, 0.824, and 0.756. Figure 11E,F revealed the ability of IDH3B to predict response to Paclitaxel and Fluorouracil were 0.868 and 0.84. Figure 11G revealed the ability of MAP3K4 to predict response to Peginterferon alfa-2a was 0.921.

Discussion
In recent decades, NASH has gradually developed into the most common chronic liver disease, which is closely associated with obesity, type 2 diabetes mellitus, dyslipidemia, and hypertension [28]. NASH-driven HCC is a serious public health concern as the number of cases increases year by year. Current international clinical practice guidelines for HCC do not consider etiology. Therefore, the explorations of NASH-related biomarkers and prognostic models may help stratify patients and predict prognosis, and reveal potential therapeutic targets.

Discussion
In recent decades, NASH has gradually developed into the most common chronic liver disease, which is closely associated with obesity, type 2 diabetes mellitus, dyslipidemia, and hypertension [28]. NASH-driven HCC is a serious public health concern as the number of cases increases year by year. Current international clinical practice guidelines for HCC do not consider etiology. Therefore, the explorations of NASH-related biomarkers and prognostic models may help stratify patients and predict prognosis, and reveal potential therapeutic targets.
Studies of the mechanisms of NASH-driven HCC have provided us with some potential targets. STARD1 is considered to affect the progression of HCC by promoting the synthesis of primary bile acids through the mitochondrial pathway [29]. Activation of the E2F1-E2F2-CPT2 axis established a lipid-rich pro-tumor environment [30]. FGF21, which controls inflammation and anti-fibrosis, has been identified as a novel intervention target for NASH-HCC transformation [31]. Targeted inhibition of the LPL/FABP4/CPT1 axis can block malignant transformation during NASH [32]. Studies of senescent cells also indicated that nicotinamide adenine dinucleotide (NAD), silence information regulator protein family (sirtuins), and mechanistic target of rapamycin (mTOR) were the major signaling pathways involved in NASH-driven HCC [33]. Augmenter of liver regeneration (ALR) is an important enzyme to maintain mitochondrial dynamics [34]. Zheng et al. proved that the ubiquitination degradation of ALR had a potential contribution to NASH-driven HCC progression at the mitochondrial level [35].
We defined three common DEGs between NASH and HCC. DLAT, or dihydrolipoamide S-acetyltransferase, has been reported to mediate glycolysis reprogramming in non-small cell lung cancer [36]. It is also considered to be one of the key proteins regulating cuproptosis [37]. IDH3B (isocitrate dehydrogenase 3β) is thought to cooperate with the cell cycle and participate in cell proliferation in squamous esophageal carcinoma [38]. MAP3K4 (mitogenactivated protein kinase kinase kinase 4), as an important member of the MAPK pathway, is known to be a regulatory gene related to abnormal lipid droplet accumulation [39]. However, their roles in HCC have not been reported.
We constructed a prognostic risk model using these three DEGs through LASSO regression analysis and univariate Cox analysis. Transcriptome and ceRNA network analyses were performed to better understand the regulatory mechanisms. We finally identified seven potential TFs that may be involved in the upstream regulation of this gene set. In addition, we constructed a prognosis-related ceRNA network consisting of seven lncRNAs, four miRNAs, and three mRNAs. The network may also be developed as promising prognostic biomarkers or therapeutic targets for HCC in the future. The mutant profile and drug sensitivity/response analysis further guided gene diagnosis and clinical medication. It is well known that the treatment of HCC is a comprehensive treatment strategy based on surgery. There are few drugs for HCC compared to other cancers. Therefore, the hypothesis and exploration of "drug functional relocation" and "drug combination" have become more important and urgent for HCC. Although none of these drugs were used in the treatment of HCC, this study at least provided novel therapeutic hypotheses and potential therapeutic targets.
It is well known that immunotherapy for advanced HCC has shown promising results in animal models and clinical trials. However, surprisingly, NASH-HCC mouse models did not respond to PD-1/PD-L1 treatment but instead exacerbated liver fibrosis and increased the incidence of tumors [40]. Dominik Pfister et al. analyzed three clinical trials (CheckMate 459, IMbrave 150, and Keymat-240) and found that immunotherapy did not improve survival in patients with non-viral HCC [41][42][43][44]. Our study showed that the infiltration of immune cells, mainly CD8 T cells, was negatively correlated with the gene set expression. We preliminarily believe that DLAT and MAP3K4 are one of the targets, and interfering with their expression may improve the immunotherapy efficacy of NASH-HCC.
However, our study has some limitations. First, further exploration of its biological functions and mechanisms in vivo and in vitro is necessary, which we are carrying out. Secondly, single-cell sequencing and flow sorting are needed to further verify the correlation between the gene set and the tumor immune microenvironment. Thirdly, the implementation of an international multi-regional clinical trial is necessary to avoid selection bias.

Conclusions
To sum up, by combining machine learning with multi-omics analysis, we found a set of NASH-related prognostic biomarkers, which may contribute to the development of more personalized and precise treatment strategies for HCC.

Institutional Review Board Statement:
The Ethics Committee approved this study at Xiangya Hospital (202105066).