Prognosis-Related Molecular Subtypes and Immune Features Associated with Hepatocellular Carcinoma

Simple Summary Currently, there is no effective method to detect the prognosis for hepatocellular carcinoma (HCC). This study used bioinformatics techniques to determine HCC molecular subtypes and prognosis-related biomarkers. A total of 3330 intersectional differentially expressed genes (DEGs) with the same differential direction in four datasets were identified by differential expression analysis. Intersectional DEGs were involved in the cell cycle, FOXO signaling pathway, and complement and coagulation cascades. Then, two subtypes were identified using a non-negative matrix decomposition method. Subtype C2 displayed better overall survival than subtype C1. Moreover, 217 prognostic related-genes were identified using the Cox regression and Kaplan-Meier curves. The area under the curve >0.80 of prognostic relate-genes were selected to construct random survival forest and the least absolute shrinkage and selection operator model and obtained seven feature genes (SORBS2, DHRS1, SLC16A2, RCL1, IGFALS, GNA14 and FANCI). Risk score model and recurrence model were constructed based on feature genes, and FANCI was inferred as a key gene by univariate Cox model. High expression of FANCI was mainly involved in cell cycle, DNA replication and mismatch repair. Interestingly, Single Sample Gene Set Enrichment Analysis was used to quantify immune infiltration and showed that Th2 cells and T helper cells were significantly up regulated in HCC compared to controls. Furthermore, we found the presence of two mutation sites as well as methylation modifications occurred in FANCI. Overall, we identified two types of HCC and identified that FANCI will serve as a potential biomarker for HCC prognosis and be important to the diagnosis and treatment of HCC. Abstract Bioinformatics tools were used to identify prognosis-related molecular subtypes and biomarkers of hepatocellular carcinoma (HCC). Differential expression analysis of four datasets identified 3330 overlapping differentially expressed genes (DEGs) in the same direction in all four datasets. Those genes were involved in the cell cycle, FOXO signaling pathway, as well as complement and coagulation cascades. Based on non-negative matrix decomposition, two molecular subtypes of HCC with different prognoses were identified, with subtype C2 showing better overall survival than subtype C1. Cox regression and Kaplan-Meier analysis showed that 217 of the overlapping DEGs were closely associated with HCC prognosis. The subset of those genes showing an area under the curve >0.80 was used to construct random survival forest and least absolute shrinkage and selection operator models, which identified seven feature genes (SORBS2, DHRS1, SLC16A2, RCL1, IGFALS, GNA14, and FANCI) that may be involved in HCC occurrence and prognosis. Based on the feature genes, risk score and recurrence models were constructed, while a univariate Cox model identified FANCI as a key gene involved mainly in the cell cycle, DNA replication, and mismatch repair. Further analysis showed that FANCI had two mutation sites and that its gene may undergo methylation. Single-sample gene set enrichment analysis showed that Th2 and T helper cells are significantly upregulated in HCC patients compared to controls. Our results identify FANCI as a potential prognostic biomarker for HCC.


Introduction
Primary liver cancer is the fourth leading cause of cancer-related deaths worldwide, and it shows remarkable histological and biological heterogeneity [1,2]. Hepatocellular carcinoma (HCC) is the most common type, accounting for >90% of primary liver cancers [3] and with a 5-year survival rate below 5% [4]. The major risk factors for HCC include infection with hepatitis C or hepatitis B virus, excessive alcohol consumption, smoking, and diet [5]. Genetic risk factors may interact with these environmental risk factors, such as mutations in the genes CTNNB1 (encoding β-catenin), TP53, and AXINI [6].
Although early HCC can be treated surgically, recurrence or metastasis may still occur [7]. One major reason is intratumoral heterogeneity, which reduces the efficacy of current therapies against other cancers [8]. Sorafenib can significantly improve overall survival (OS) in patients with advanced HCC, but it is extremely expensive and therefore inaccessible to many patients [9]. Identifying genes that help predict HCC prognosis may facilitate personalized treatment and management.
Since HCC is so heterogeneous, tumors in different regions of the body may represent different subtypes [10]. These subtypes can differ in metabolic and signaling pathways, leading to differences in patient survival [11]. For example, one study has identified three HCC subtypes whose tumor microenvironments differ in terms of immune cell compositions [12], and these differences can influence cancer progression [13,14]. However, only a few studies have identified HCC subtypes based on a direct transcriptomic comparison of tumor and non-tumor tissues.
In this manuscript, we used bioinformatics to compare gene expression between tumor and non-tumor tissue in order to identify HCC molecular subtypes and biomarkers to aid the prediction of prognosis. In addition, we used single-sample gene set enrichment analysis (ssGSEA) to determine the levels of immune infiltration and characterize the tumor immune microenvironment. Our study identified two molecular subtypes of HCC patients and detected key genes that may serve as potential biomarkers and therapeutic targets for HCC.

Human Subject
In this study, six pairs of HCC and paired paracancerous normal tissue samples were collected from six patients with pathological diagnosis of HCC at different clini-cal stages from the Guangxi Medical University Cancer Hospital. Clinical follow-up found early tumor progression in patients 2 (RFS 66 Days) and 5 (RFS 248 Days) after surgery. All included patients consented to a sequencing and experimental protocol carried out for all research procedures and the approval for these was provided by the ethics review committee of the Guangxi Medical University Cancer Hospital (LW20221156).

Data Collection
In order to obtain the intersection genes and the main datasets that have more of the same differentially expressed genes, we used more large samples for analysis in this study. Gene expression profiles of 371 primary tumor samples, 3 recurrent tumor samples, and 50 normal tissue samples were downloaded from the liver hepatocellular carcinoma (LIHC) dataset in The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/, accessed on 17 September 2021) [15]. In the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/, accessed on 17 September 2021) [16]

Screening for Differentially Expressed Genes (DEGs) and Enrichment Analysis
The TCGA datasets and GEO datasets (GSE14520, GSE76427, and GSE25097) were screened for differentially expressed genes (DEGs) between HCC and normal tissues using the limma package in R [17]. Statistically significant DEGs (p < 0.05) were then identi-

Screening for Differentially Expressed Genes (DEGs) and Enrichment Analysis
The TCGA datasets and GEO datasets (GSE14520, GSE76427, and GSE25097) were screened for differentially expressed genes (DEGs) between HCC and normal tissues using the limma package in R [17]. Statistically significant DEGs (p < 0.05) were then identified, and DEGs differentially expressed in the same up or down expressed direction across the four datasets were considered as overlapping DEGs and analyzed for enrichment of Gene Ontology (GO) functions (including molecular function, MF; cellular component, CC; biological process, BP) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways using the clusterProfiler package in R [18]. Gene set enrichment analysis (GSEA) was then performed, and the results were analyzed using the fgsea package in R.

Cox Regression and Kaplan-Meier Analyses
Cox regression and Kaplan-Meier curve analyses were performed to identify overlapping DEGs that were significantly associated with OS in both GSE14520 and TCGA datasets. Genes with a hazard ratio >1 or <1 in both datasets based on the Cox regression model were selected, and those genes that also showed a significant effect on survival based on Kaplan-Meier analysis were defined as prognosis-related genes. The Kaplan-Meier curves were used to estimate the conditional survival (CS) [19][20][21] of HCC patients, under the assumption that the survival rate for 0-5 years was 100% for patients in the GSE14520 dataset. The functional enrichment of OS-related genes was further verified using metascape (https://metascape.org, accessed on 20 September 2021).

Construction of Random Survival Forest and Least Absolute Shrinkage and Selection Operator Regression Models
A random forest model was constructed based on prognosis-related genes whose area under the receiver operating characteristic curve (AUC) >0.80 (p < 0.01) using the "coxph" function in the survival package. Prognostic genes with a relative importance >0.2 were considered as final signature genes.
To further reduce model overfitting, we performed least absolute shrinkage and selection operator (LASSO) regression based on the final signature genes using the glmnet package [22]. After 10-fold cross-validation of the parameter selection in the LASSO model, the results were further processed using the "ploidy history" function to obtain feature genes, which were used to calculate the classification efficiency for the 5-year risk score using the timeROC package [23].

Construction of the Gaussian Finite Mixture Model
In order to identify feature genes with a strong ability to diagnose HCC recurrence, we constructed a Gaussian mixture model (GMM) using 127 combinations of expression profiles obtained from the TCGA and GSE14520 datasets. The optimal cluster was determined based on the AUC calculated for each model.

Construction of the Feature Gene-Based Risk Score Prognostic Model
Feature genes associated with OS were determined by univariate Cox regression analysis using the forestplot package in R. The risk score for each patient was calculated using the "predict" function of the survival package in R [24]. HCC patients were divided into low-risk and high-risk groups based on the median risk score. Time-dependent ROC analysis of the GSE14520 and TCGA datasets was performed using the survival ROC package in R. Nomograms were plotted using the rms package in R, and the consistency index and 95% confidence interval were calculated using the survcomp installation package in order to evaluate the predictive power of the model. The results were validated using calibration curves.

Non-Negative Matrix Factorization
Clustering analysis based on prognosis-related genes was performed using nonnegative matrix factorization (NMF) in the factoextra package in R (https://CRAN.Rproject.org/package=factoextra, accessed on 21 September 2021) and the k-mean clustering algorithm. The average contour width was used to identify the optimal number of clusters. Subtypes were compared in terms of survival analysis.

Subtype-Related Drug Sensitivity and Chemotherapeutic Response
To explore the distribution of clinical data (tumor stage, age, sex, and survival time) between the two HCC subtypes in the TCGA and GSE14520 datasets, we used the dplyr package in R. A submap algorithm was also used to predict the responsiveness of the HCC subtypes to immunotherapy. The Tumor Immune Dysfunction and Exclusion (TIDE) database (http://tide.dfci.harvard.edu/, accessed on 22 September 2021) was used to predict the responsiveness of patients to immune checkpoint inhibitors, while the SubMap module of the GenePattern database (https://cloud.genepattern.org/gp, accessed on 25 September 2021) [25] was used to identify similarities between the different subtypes in the GSE14520 and TCGA datasets. p values greater than 0.05 indicated high similarity.
The therapeutic response of HCC patients in the GSE14520 and TCGA datasets to anticancer drugs was evaluated based on the Genomics of Drug Sensitivity in Cancer (www.cancerRxgene.org, accessed on 25 September 2021) using the pRophetic package in R. The IC 50 values of the samples were estimated by ridge regression. The mean value for duplicate genes was determined using the allSolidTumors package in R.

Gene Expression-Related Stemness Index and Key Gene Expression
To calculate the mRNA expression-based stemness index (mRNAsi) in tumor tissues, we constructed a predictive model using the one-class logistic regression algorithm [26]. The mRNA-based signature contained the expression profiles of 10,362 genes. A stemness index model was then used to rank the 211 HCC samples using the Spearman correlation. The HCC samples stratified by the stemness index were used in subsequent integrative analyses. For external validation, the expression of key genes in the GSE138178 and GSE84006 datasets was explored using the Oncomine (https://www.oncomine.org/, accessed on 27 September 2021) [27] and Tumor Immune Estimation Resource (TIMER) (http://timer.cistrome.org/, accessed on 28 September 2021) databases, with selection criteria defined as p < 0.001 and fold change >1.5.

ssGSEA
The relative levels of immune cell infiltration in HCC and control samples were determined by ssGSEA using the GSVA package in R [18]. Correlations among the 24 types of immune cells were then explored by immunity network analysis, while the correlation of feature genes with immune infiltration was assessed by Pearson correlation analysis. The CIBERSORT algorithm was used to quantify the proportions of immune cells in the HCC samples.

Mutant Genes and DNA Methylation Analysis in HCC
The mutation data of overlapping DEGs in TCGA were visualized and analyzed using the maftools package [28], and the position of genetic mutations was determined using the lollipop package [29]. Differences in the total number of 450k probes and differentially methylated positions between HCC and control samples in the GSE136319 dataset were also identified. Associations among methylation, gene expression, and clinical phenotypes as well as the correlation between key gene expression and methylation status in TCGA HCC samples were assessed using the MEXPRESS tool [30].

Transcriptome Sequencing
The raw fastq data used Trimmomatic to remove linkers and low-quality reads, the reads genome alignment using TopHat2 software, and the genes used StringTie count quantification, then use the TMM normalization algorithm to normalize the reads, and finally calculate the FPKM value. Finally, edgeR software was used for differential gene analysis. Use ggplot2 for statistical graphing. The accession number for tran-scriptome sequencing reported in this paper is HRA002748 (https://ngdc.cncb.ac.cn/gsa-human/ browse/HRA002748, accessed on 8 August 2022).

DEGs in HCC and Their Functional Enrichment
To identify genes related to prognosis in HCC, we first performed differential analysis using data from the TCGA, GSE76427, GSE25097, and GSE14520 datasets (Figure 2A,B). Of the 4036 intersected DEGs overlapping across the four datasets, 2058 were upregulated, 1272 were downregulated, and 706 DEGs with different expression directions ( Figures 2C and S1). Functional enrichment analysis indicated that activated various HCCrelated pathways, such as the P53 signaling pathway, tryptophan metabolism, as well as primary bile acid biosynthesis ( Figure 2D). Overlapping DEGs may be involved in the cellular amino acid catabolic process, carboxylic acid catabolism, and the other metabolic processes ( Figure 2E).
GSEA showed that DEGs positively correlated with the cell cycle, mismatch repair, and DNA replication, but negatively correlated with mineral absorption, PPAR signaling, as well as complement and coagulation cascades ( Figure S2A). The AUC for predicting the 5-year OS of HCC patients in GSE14520 was 58% ( Figure S2B). Cox regression and Kaplan-Meier analyses also showed that 217 of the 3330 overlapping DEGs were closely associated with HCC prognosis, while further enrichment analysis using Metascape revealed that these prognostic genes were significantly enriched in small molecule catabolism, small molecule biosynthesis, and the mitotic cell cycle ( Figure S2C).

Identification of Diagnostic Genes in HCC
To evaluate the diagnostic value of prognosis-related genes in TCGA and GSE14520, their AUC values were calculated, and 138 of the 217 prognostic genes with AUC >0.80 were selected ( Figure 3A). Those 138 genes were then subjected to univariate Cox analysis to obtain 10 survival-related genes with relative importance >0.2 ( Figure 3B). Subsequent LASSO regression identified seven feature genes with an AUC of 0.744 for predicting 5-year OS: SORBS2, DHRS1, SLC16A2, RCL1, IGFALS, GNA14, and FANCI ( Figure 3C-E). The expression data of the seven feature genes were then integrated into three clusters of 127 combinations using the GMM model, and the cluster with the highest AUC was selected to identify feature genes with strong power for predicting HCC recurrence. The average accuracy of the seven feature genes in 1 of the 127 combinations was 0.9901, as determined by the GMM classifier ( Figure 3F). Further investigation of the independent prognostic value of the feature genes by univariate Cox regression analysis indicated that FANCI was significantly associated with poor OS (hazard ratio >1) in both datasets ( Figure 3G,H), suggesting that it may serve as a novel predictive biomarker of HCC recurrence. pathways, such as the P53 signaling pathway, tryptophan metabolism, as well as primary bile acid biosynthesis ( Figure 2D). Overlapping DEGs may be involved in the cellular amino acid catabolic process, carboxylic acid catabolism, and the other metabolic processes ( Figure  2E).

Feature Gene-Based Prognostic Risk Score as a Prognostic Tool in HCC
HCC patients were divided into high-and low-risk groups according to the median risk score (Figure 4A,B) and their AUC values for predicting 1-, 3-and 5-year OS were greater than 0.65 for GSE14520 and TCGA data ( Figure 4C,D). OS prediction was quantified using nomograms that integrated feature genes with clinicopathological risk factors ( Figure 4E). Calibration plots also showed that the nomograms performed well against an ideal model ( Figure 4F). HCC patients were divided into high-and low-risk groups according to the median risk score (Figure 4A,B) and their AUC values for predicting 1-, 3-and 5-year OS were greater than 0.65 for GSE14520 and TCGA data ( Figure 4C,D). OS prediction was quantified using nomograms that integrated feature genes with clinicopathological risk factors ( Figure 4E). Calibration plots also showed that the nomograms performed well against an ideal model ( Figure 4F).

Identification of HCC Subtypes by NMF of Prognostic Genes
In order to identify HCC molecular subtypes, HCC samples from the GSE14520 and TCGA databases were clustered by NMF based on the 217 prognosis-related genes ( Figure 5A-C). We were able to divide HCC patients into two molecular subtypes ( Figures 5C and S3A): subtype C1 with poor prognosis for HCC, and subtype C2 with good OS (Figures 5E and S2C). We found that the silhouette width value was 0.85 in GSE14520 ( Figure 5D) and 0.88 in TCGA ( Figure S3B), which suggested a good correlation between the HCC samples and the two different subtypes.
are drawn upward to determine the points received from the predictor. The sum of these points is reported on the "Points" axis. A line is then drawn downward to determine the 3-and 5-year survival probability based on the seven feature genes. (F) Calibration plots showing the performance of nomograms with an ideal model for 3-and 5-year survival. RFS, recurrence-free survival; TCGA, The Cancer Genome Atlas.

Identification of HCC Subtypes by NMF of Prognostic Genes
In order to identify HCC molecular subtypes, HCC samples from the GSE14520 and TCGA databases were clustered by NMF based on the 217 prognosis-related genes ( Figure  5A-C). We were able to divide HCC patients into two molecular subtypes (Figures 5C and  S3A): subtype C1 with poor prognosis for HCC, and subtype C2 with good OS (Figures 5E  and S2C). We found that the silhouette width value was 0.85 in GSE14520 ( Figure 5D) and 0.88 in TCGA ( Figure S3B), which suggested a good correlation between the HCC samples and the two different subtypes.

Sensitivity of HCC Subtypes to Immunotherapy and Chemotherapeutic Drugs
A comparison of clinical data distribution between the two HCC subtypes in GSE14520 and TCGA indicated that there was no significant difference in age between the two subtypes, but men were more prone to both subtypes of disease than women. In addition, subtype C1 showed shorter survival than subtype C2 ( Figure 6A), as well as greater responsiveness to CTLA4-R therapy, based on data from GSE14520 (nominal p value = 0.03; Figure 6B) and TCGA (nominal p value = 0.09; Figure 6C). In contrast, subtype C2 in TCGA showed significantly greater responsiveness to anticancer drugs ZM.447439 and AG.14699 than subtype C1 ( Figure 6D).
A comparison of clinical data distribution between the two HCC subtypes in GSE14520 and TCGA indicated that there was no significant difference in age between the two subtypes, but men were more prone to both subtypes of disease than women. In addition, subtype C1 showed shorter survival than subtype C2 ( Figure 6A), as well as greater responsiveness to CTLA4-R therapy, based on data from GSE14520 (nominal p value = 0.03; Figure 6B) and TCGA (nominal p value = 0.09; Figure 6C). In contrast, subtype C2 in TCGA showed significantly greater responsiveness to anticancer drugs ZM.447439 and AG.14699 than subtype C1 ( Figure 6D).

Stemness Index and FANCI Expression
Ranking of the HCC samples according to stemness index values showed that their clinico-demographic features significantly correlated with mRNAsi ( Figure S4A). FANCI positively correlated with the stemness index ( Figure S4B), and it was upregulated in both the GSE138178 and GSE84006 datasets ( Figure S4C,D). Further analysis of FANCI mRNA expression in various cancer types using the Oncomine database showed that FANCI was significantly upregulated in liver cancer compared to normal tissues ( Figure S4E). These results were confirmed by TCGA RNA-sequencing data in TIMER, which indicated that FANCI levels were significantly higher in tumors than in adjacent normal tissues ( Figure S4F).

Enrichment of FANCI in Biological Pathways
To explore the role of FANCI in HCC prognosis, we performed time-ROC analysis, which showed that the AUC for predicting 5-year OS was highest in the GSE14520 dataset ( Figure 7A), while the AUCs for predicting 1-, 3-, 5-year OS were greater than 0.60 in TCGA ( Figure 7B). The AUC for FANCI was higher than 0.90 in all four datasets ( Figure 7C). Enrichment analysis showed that FANCI positively correlated with the regulation of fibrinolysis, epoxygenase P450 pathway, and protein activation cascade, while it was negatively correlated with mismatch repair, centrosome separation, and translesion synthesis ( Figure 7D). In addition, FANCI positively correlated with the cell cycle, DNA replication, and proteasome, while it was negatively correlated with FOXO, IL-17, and p53 signaling pathways ( Figure 7E). We also found that FANCI expression was higher in HCC tissues than controls in the Human Protein Atlas database (https://www.proteinatlas.org/, accessed on 15 October 2021) ( Figure 8A).Similarly, the results of transcriptome sequencing in 6 HCC patients showed that FANCI was highly expressed in HCC compared to adjacent, especially in patients 2 (RFS 66 Days) and 5 (RFS 248 Days) with tumor progression (Figure 8B-D). Meanwhile, FANCI was identified in previous studies as a down-stream gene for Wnt signalling that regulates early recurrence of HCC [31]. These results promptand that it may positively regulate the Fanconi anemia pathway ( Figure 8E).

Immune Cell Infiltration
To explore the potential clinical significance of immune cell infiltration in HCC, we determined the infiltration levels in all four datasets ( Figure 9A). T helper type 2 (Th2), T helper, and pre-dendritic cells were significantly upregulated in HCC samples compared to controls, while T cells and cytotoxic cells correlated significantly with subtype C1 ( Figure 9B). Correlations among the 24 immune cell types in HCC tissues were also analyzed ( Figure 9C), and four clusters were constructed showing positive and negative correlations with one another ( Figure 9D). In addition, dendritic and Th12 cells correlated with the seven feature genes, while a significant correlation was observed between Th2 cells and FANCI ( Figure 9E). The CIBERSORT algorithm also showed that most infiltrated immune cells were T cells ( Figure 9E).

Somatic Mutations and DNA Methylation
Somatic single-nucleotide variants were identified in 364 HCC patients based on sequencing data showing at least 20-fold coverage. Mutations were found to inactivate the tumor suppressor genes TP53 (31% of all patients), AXINI (8%), and RB1 (5%) ( Figure  S5A). In addition, we found that FANCI was mutated at two sites in HCC patients, and those mutations may affect protein function ( Figure S5B). We further found that FANCI mRNA levels negatively correlated with protein levels, implying that the gene is subject to methylation ( Figure S5D). Analysis of genes differentially methylated between HCC and non-tumor tissues in Peruvian hepatocellular carcinoma patients in GSE136319 showed that methylation modification was investigated ( Figure S5C). Furthermore, the MEXPRESS tool showed that the methylation level in the promoter region of FANCI was significantly higher in HCC samples than in normal tissues in TCGA ( Figure S6).

Immune Cell Infiltration
To explore the potential clinical significance of immune cell infiltration in HCC, we determined the infiltration levels in all four datasets ( Figure 9A). T helper type 2 (Th2), T helper, and pre-dendritic cells were significantly upregulated in HCC samples compared to controls, while T cells and cytotoxic cells correlated significantly with subtype C1 ( Figure 9B). Correlations among the 24 immune cell types in HCC tissues were also analyzed ( Figure 9C), and four clusters were constructed showing positive and negative correlations with one another ( Figure 9D). In addition, dendritic and Th12 cells correlated with the seven feature genes, while a significant correlation was observed between Th2 cells and FANCI ( Figure 9E). The CIBERSORT algorithm also showed that most infiltrated immune cells were T cells ( Figure 9E).

Discussion
HCC is one of the most common tumors in the world, but it remains a fatal disease due to its poor prognosis, highlighting the need to identify HCC biomarkers. Studies have shown that the cachexia of cancer affects the quality of life of many patients with advanced cancer [32]. Cachexia is prevalent in patients with HCC and associated with worse prognosis, which develops in approximately 25% of HCC patients during the disease course [33]; however, cachexia is considered a covariate also need to further study. In this manuscript, we used bioinformatics to determine prognosis-related molecular subtypes of HCC using four public datasets. Our analysis identified two subtypes of HCC (C1 and C2) and seven feature genes that may serve as potential biomarkers and therapeutic targets for HCC. Among them, FANCI showed good prognostic performance and was positively associated with the cell cycle, DNA replication, and mismatch repair.
The identification of novel HCC biomarkers remains critically important. For instance, MITD1 has been reported as a novel liver cancer biomarker involved in cytokinesis [34]. We found that the DEGs overlapping across four databases were involved mainly in the cell cycle, FOXO signaling pathway, mismatch repair, as well as complement and coagulation cascades. Consistent with our results, another study identified DEGs in HCC that were significantly enriched in mismatch repair and complement and coagulation cascades [35]. Juglanthraquinone, a natural compound, can induce apoptosis in HCC cells by activating the Akt/FOXO signaling pathway [36].
In the present study, random forest survival and LASSO regression models identified SORBS2, DHRS1, SLC16A2, RCL1, IGFALS, GNA14, and FANCI as feature genes that may be involved in HCC occurrence and may influence prognosis. Earlier studies showed that IGFALS might be a useful diagnostic and therapeutic target for HCC [37] and that SORBS2 can accurately predict the prognosis of HCC patients [38]. Expression of SORBS2 in TCGA GBM cohorts is associated with worse outcome [39]. IGFALS has also been identified as a tumor suppressor gene, which is silenced by methylation in HCC [40,41], and IGFALS was associated with disease-free survival of gastric cancer [42]. For their part, DHRS1 [43], SLC16A2 [44], and GNA14 [45] are significantly underexpressed in HCC tissues compared to normal tissues and have shown potential as prognostic biomarkers of HCC. In addition, RCL1 has shown strong potential for predicting overall and disease-free survival of HCC patients [46], while FANCI has been identified as a reliable marker of hepatitis B virusassociated HCC [47]. Previous studies have indicated that RCL1 could independently predict breast cancer prognosis [48] and copy number variants of RCL1 are associated with a range of neuropsychiatric phenotypes [49]. Above all, our study showed that SORBS2, FANCI, DHRS1, and IGFALS can be mutated in HCC, and the effects of these mutations should be explored in future work.
Mutations in FANCI of familial colorectal cancer that regulate DNA repair and were associated with the Fanconi anemia repair pathway [50]. Furthermore, FANCI mutations were found that mainly involved in breast cancer and ovarian cancer [51,52]. Therefore, suggesting that FANCI is mutated in cancers, and may play a vital role.
Since tumors in different regions may represent different subtypes [10], the identification of HCC subtypes may improve the prognosis of HCC patients and provide new therapeutic strategies. The histological subtypes of primary liver cancer are mainly HCC and intrahepatic cholangiocarcinoma confined to the liver [53]. HCC has recently been stratified into three subtypes differing in metabolic and signaling pathways, including altered kynurenine metabolism, Wnt/β-catenin-associated lipid metabolism, and PI3K/AKT/mTOR signaling [11]. Another study defined three other major HCC subtypes: mitogenic and stem cell-like tumors with chromosomal instability, CTNNB1-mutated tumors displaying immune suppression, and metabolic disease-associated tumors [54]. The clustering of immune cells in the HCC microenvironment led to yet another classification of subtypes as immunocompetent, immunodeficient, or immunosuppressive [12]. In the present work, we identified two molecular subtypes of HCC patients that were associated with different prognosis. Comparison of their clinico-demographic features showed that subtype C1 had significantly shorter OS than subtype C2, consistent with the results of the TCGA groups. In addition, men were more prevalent than women in both subtypes, consistent with a previous study where men showed higher risk of non-alcoholic fatty liver disease and HCC than women [55].
Compared to healthy liver samples, most of the immune cell subpopulations required for antitumor immune response are reduced in HCC samples, whereas gene signatures defining T helper and Th2 cells are significantly increased [56]. In addition, underexpression of tumor antigens in HCC cells reduces T cell activation and tumor infiltration, resulting in a less efficient control of tumor growth, leading to worse clinical outcomes [57]. In the present research, ssGSEA showed that Th2 and T helper cells were significantly upregulated in the four datasets, and cytotoxic and T cells strongly infiltrated tumor tissue in both HCC subtypes.
Our analysis demonstrated that the expression of the seven feature genes positively correlated with dendritic, natural killer, and Th17 cell infiltration. The tumor microenvironment disrupts the maturation and activation of dendritic cells, resulting in dendritic cells with immunosuppressive potential in HCC and breast cancer [58]. Moreover, dysfunction of natural killer cells contributes to HCC development [59], while overexpression of Th17 cells has been associated with worse prognosis of HCC patients [60]. Interestingly, here, we found a significant correlation between FANCI and Th2 cells. Since Th2 cells are associated with immune evasion [61], we hypothesize that FANCI may promote HCC development by evading HCC immune cells.
However, the present study had some limitations. Although our analysis used multiple datasets, it did not feature HCC patients in terms of diversity, so validation with a large sample size in the HCC population is also needed to obtain the generalizability of the key results. Moreover, the corresponding information of datasets did not mention, such as amino acid levels. We screened the seven feature genes based on the bioinformatics analysis, which are highly expressed in HCC, and their roles should be further studied both in vitro and in vivo. Further studies are also needed in animal or cell experiments to validate the effect of FANCI on postoperative recurrence. In fact, all our bioinformatic findings need to be confirmed in preclinical studies and, ultimately, prospective clinical trials.

Conclusions
We defined two molecular subtypes of HCC that are associated with different prognoses, and we identified FANCI as a good prognostic indicator in HCC.
Key Points: Two subtypes of HCC were identified based on tumor and non-tumor data using non-negative matrix decomposition.
Genes from four HCC datasets were significantly enriched in the cell cycle, FOXO signaling, as well as complement and coagulation cascades.
FANCI in HCC positively correlated with the cell cycle, DNA replication, and mismatch repair.
FANCI was able to predict the survival of HCC patients, making it a potential prognostic biomarker.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/cancers14225721/s1, Figure S1: Pie charts of 4036 intersected DEGs overlapping across the four datasets; Figure S2: Gene set enrichment analysis and survival analysis of hepatocellular carcinoma (HCC) samples; Figure S3: Identification of hepatocellular carcinoma (HCC) subtypes based on the TCGA dataset; Figure S4: Clinico-demographic features associated with the mRNA expression-based stemness index (mRNAsi) and FANCI expression in hepatocellular carcinoma (HCC); Figure S5: Genomic landscape of hepatocellular carcinoma and DNA methylation changes; Figure S6: Correlation of FANCI expression and methylation status in hepatocellular carcinoma samples from The Cancer Genome Atlas, as determined by the MEXPRESS tool. Data Availability Statement: Data are available in a repository and can be accessed using a unique identifier other than a DOI: The data underlying this article are available in (The Cancer Genome Atlas (TCGA) database and Gene Expression Omnibus (GEO) database, and can be accessed with [GSE138178, GSE14520, GSE76427, GSE25097, GSE84006, and GSE136319) in GEO database.

Acknowledgments:
The authors are grateful to the patients whose publicly available data made this project possible.

Conflicts of Interest:
The authors declare that there are no conflicts of interest.