HAMP as a Potential Diagnostic, PD-(L)1 Immunotherapy Sensitivity and Prognostic Biomarker in Hepatocellular Carcinoma

Hepatocellular carcinoma (HCC) remains a global medical problem. Programmed cell death protein 1 (PD-1) is a powerful weapon against many cancers, but it is not sensitive to some patients with HCC. We obtained datasets from the Gene Expression Omnibus (GEO) database on HCC patients and PD-1 immunotherapy to select seven intersecting DEGs. Through Lasso regression, two intersecting genes were acquired as predictors of HCC and PD-1 treatment prognosis, including HAMP and FOS. Logistic regression was performed to build a prediction model. HAMP had a better ability to diagnose HCC and predict PD1 treatment sensitivity. Further, we adapted the support vector machine (SVM) technique using HAMP to predict triple-classified outcomes after PD1 treatment in HCC patients, which had an excellent classification ability. We also performed external validation using TCGA data, which showed that HAMP was elevated in the early stage of HCC. HAMP was positively correlated with the infiltration of 18 major immune cells and the expression of 2 important immune checkpoints, PDCD1 and CTLA4. We discovered a biomarker that can be used for the early diagnosis, prognosis and PD1 immunotherapy efficacy prediction of HCC for the first time and developed a diagnostic model, prognostic model and prediction model of PD1 treatment sensitivity and treatment outcome for HCC patients accordingly.


Introduction
Liver cancer, as one of the four main causes of cancer-related death all over the world, has brought heavy human and economic burdens to society [1,2]. Hepatocellular carcinoma (HCC) takes up the vast majority of primary liver cancer cases, which is a primary malignant neoplasm of epithelial liver cells [3]. According to a recent study using the SEER registry, the incidence of HCC is predicted to continue climbing until 2030 [4]. In the face of the low survival rate and high mortality rate of liver cancer worldwide, how to effectively reduce the burden associated with liver cancer remains a major issue in the field of global public health as well as chronic disease prevention and control, while the optimization of screening strategies for the liver cancer population is an essential goal that deserves continuous exploration.
Although the diagnosis and treatment techniques of liver cancer have developed in recent years, early detection and monitoring are still obscure [4][5][6]. Currently, alphafetoprotein (AFP), accompanied by ultrasound scanning (USS), multiphasic computed tomography (CT) and magnetic resonance imaging (MRI), is widely used for the surveillance and diagnosis of HCC [7]. Some tools from other serum biomarkers and liquid biopsy are also in development but require fuller clinical evaluation [8,9]. It is worth pointing out

Acquisition of Datasets
With the search criteria of "PD-1 Receptor" and "Liver Neoplasm", the dataset GSE140901 was obtained from the Gene Expression Omnibus (GEO) of NCBI (https://www.ncbinlm. nih.gov/geo/, accessed on 2 February 2022), which is based on the GPL19965 platform and contains a total of 24 samples, all of which were treated with PD-1 immunotherapy, and they were classified as 6 with partial remission (PR), 10 with stable disease (SD) and 8 with progressive disease (PD) according to the treatment efficacy. The raw data on the "Liver Tumor" criterion were downloaded from GSE14520 based on the GPL571 platform, and 43 samples were availa2015013052@stu.gzucm.edu.cnble, of which 22 were tumor samples and 21 were normal samples. The patients back ground data was showed in Tables 1-3.

Extraction of Differentially Expressed Genes and Lasso Regression
The GSE14520 dataset was grouped in accordance with whether the patients had liver cancer, and the GSE140901 dataset was grouped in accordance with whether the liver cancer patients were in remission after PD-1 immunotherapy; these two datasets were analyzed for discrepancies using the DESeq2 and ggrepel software packages, and they were visualized in volcano plots at |log2FC| ≥ 1 with a p-value < 0.05 to screen for DEGs. Subsequently, the intersection sets of the above two differential gene sets were excavated as predictor variables and further screened via Lasso regression. Lasso regression is characterized by the ability to perform variable selection and complexity regularization while fitting a generalized linear model. Then, UpSet plots were built for the aforementioned genes so as to acquire the pivotal genes that were simultaneously associated with liver cancer disease and the prognosis of PD-1 immunotherapy in liver cancer patients. In the field of bioinformatics research, when the number of sample groups is relatively large, it is more intuitive to employ UpSet plots to observe the information that is common among all sample groups as well as the information unique to each sample group.

Model Construction and Validation, Results Evaluation
The two pivotal intersecting genes screened in 2.2 were constructed as logistic regression models for the GSE14520 and GSE140901 datasets, respectively. Thereafter, subject working characteristic curves were plotted using the pROC package, and the area under the curve (AUC) was calculated to assess the discriminatory ability of the prediction models. It should be highlighted that the nomograms of the critical factors were created separately for visualization in this study. Nomograms are able to visualize the results of regression analysis and allow for a more intuitive presentation of multiple indicators that combine to diagnose or predict disease risk or prognosis. Meanwhile, this study also plotted survival curves using the Kaplan-Meier (KM) method by using individual overall survival times (OS) as an outcome variable for the GSE14520 and GSE140901 datasets, separately, and performed a logrank test to compare significant differences in survival curves, aiming to screen genes that specifically affect the prognosis of PD-1 treatment.

Construction of Support Vector Machine Model
The support vector machine (SVM) algorithm is capable of mapping the categorized data to a higher dimensional feature space with certain fault tolerance conditions based on an appropriate kernel function and classifying the data by constructing an optimal classification hyperplane in this space. This algorithm is suitable for small sample, nonlinear, high-dimensional data classification problems, with high reliability in its prediction, stability and generalization ability, and it is widely applied to processing classification problems and regression analysis. However, the model built in this study incorporates only one feature variable and does not require an SVM algorithm for high-level processing data; compared with logistic regression, SVM is more suitable for small-scale data sets. Moreover, SVM is better for handling nonlinear relationships; therefore, it is feasible for us to model with the SVM algorithm. The present study segmented the GSE140901 dataset into three categorical outcome variables on the basis of post-treatment efficacy, constructing prediction models of pivotal genes screened from 1.3 by SVM and assessing them through the subject working characteristic curves.

Validation of Pivotal Gene in TCGA Database
To provide a further test of the actual value of the pivotal gene in clinical samples, we downloaded liver cancer and its normal samples from the TCGA database (https://www. cancer.gov/aboutnci/organization/ccg/research/structural-genomics/tcga, accessed on 2 February 2022). The patients' background for TCGA can be found in Table 4. Furthermore, as the prognosis of HCC is determined by liver function, we also provided the Child-Pugh scores of these patients in the background data. Benefiting from the R software (version 4.0.2, https://www.Rproject.org, accessed on 2 February 2022) and its powerful data analysis capabilities and box plotting in the "ggplot2" R package, we compared the expression levels of the pivotal gene in normal versus HCC tissues, as well as in different tumor stages. The expression differences between HCC and normal tissue samples were identified with a p-value of <0.05.
Sequentially, the pROC package was employed to perform sensitivity-and specificitybased ROC curve analyses, which were designed to assess the accuracy of the pivotal gene diagnosis. Overall survival (OS), disease-specific survival (DSS) and progression-free interval (PFI) were deemed to be indicators to explore the correlation between pivotal gene expression and HCC patient prognosis. When it concerned survival analysis, the Kaplan-Meier method and logrank test were deployed.
What is more, the correlations between the pivotal gene and the 24 immune cells' infiltration levels in HCC were assessed using CIBERSORT (https://cibersort.stanford. edu/, accessed on 2 February 2022) and the R software packages "ggplot2" and "ggpubr". As a final step, we separately evaluated the co-expression of the pivotal gene with PDCD1 and CTLA4 in HCC using Spearman correlation coefficients, which are the target genes of two well-recognized anticancer miracle drugs.

Extraction of Differential Genes and Lasso Regression
Through differential analysis, 1154 DEGs were screened in GSE14520, of which 635 were upregulated and 519 were downregulated ( Figure 1A); 109 DEGs were screened in GSE140901, of which 56 were upregulated and 53 were downregulated ( Figure 1B). The intersection of the aforesaid two differential genes was taken to obtain seven genes, which were considered predictor variables for Lasso regression; thus, the λ value with the smallest cross-validation error was selected via ten-fold cross-validation, resulting in finding the best predictor by continuously fitting the model. Under Lasso regression ( Figure 1C,D), we shortlisted four predictors for HCC including FOS, HAMP, IL13RA2 and LCN2, as well as four predictors for the prognosis of the PD-1 treatment, including SPP1, FOS, CXCL2 and HAMP, ulteriorly cross-tabulating the two to obtain two genes, HAMP and FOS ( Figure 1E).

Model Building and Validation, Evaluation of Results
The two pivotal genes identified in Section 3.1 were regarded as independent variables for logistic regression to build a prediction model. In this study, the prediction performance was evaluated using ROC curves, and nomograms were drawn using RMS. As a general rule, we use the AUC value for the evaluation of the predictive model, which corresponds to a larger AUC for better prediction. When AUC = 1, the model is considered

Model Building and Validation, Evaluation of Results
The two pivotal genes identified in Section 3.1 were regarded as independent variables for logistic regression to build a prediction model. In this study, the prediction performance was evaluated using ROC curves, and nomograms were drawn using RMS. As a general rule, we use the AUC value for the evaluation of the predictive model, which corresponds to a larger AUC for better prediction. When AUC = 1, the model is considered as a perfect  (Figure 3A-D). Additionally, after KM curve drawing and logrank analysis ( Figure 3E,F), it was surprising to discover that HAMP had a better prognostic performance in PD-1 treatment (p < 0.05).  (Figure 3A-D). Additionally, after KM curve drawing and logrank analysis ( Figure 3E,F), it was surprising to discover that HAMP had a better prognostic performance in PD-1 treatment (p < 0.05).

Construction of Support Vector Machine Model
Setting HAMP as an independent variable, we constructed an SVM model utilizing the e1071 package and applied a lattice parameter search function to find the best cost value. When cost = four, we constructed the best support vector machine model, and the performance of the model was assessed using multi-category ROC curves (PD: 2.50 [0.20,1.00], AUC = 0.9; PR: 2.50 [0.00,0.20], AUC = 0.6; SD: 2.50 [0.00,1.00], AUC = 1.0). Meanwhile, the model also had good discriminative performance in the prognostic multiclassification ( Figure 4).

Construction of Support Vector Machine Model
Setting HAMP as an independent variable, we constructed an SVM model utilizing the e1071 package and applied a lattice parameter search function to find the best cost value. When cost = four, we constructed the best support vector machine model, and the performance of the model was assessed using multi-category ROC curves (PD: 2.50 [0.20,1.00], AUC = 0.9; PR: 2.50 [0.00,0.20], AUC = 0.6; SD: 2.50 [0.00,1.00], AUC = 1.0). Meanwhile, the model also had good discriminative performance in the prognostic multiclassification ( Figure 4).

The Clinical Value of the Pivotal Gene in the TCGA Database
Based on the TCGA data, by comparing the expression levels of HAMP in normal samples with those of HCC samples, it was obvious that the expression level of HAMP was significantly reduced in HCC tissues ( Figure 5A). This conclusion was also applicable in the comparison of paired samples ( Figure 5B). When examining the relevance of the tumor stage, we noticed that the expression of HAMP displayed a striking decrease in the early stage of HCC ( Figure 5C-E). When focusing on Figure 6, the AUC was up to 0.946, strongly demonstrating the high accuracy of the HAMP prediction model in the diagnosis of HCC.

The Clinical Value of the Pivotal Gene in the TCGA Database
Based on the TCGA data, by comparing the expression levels of HAMP in normal samples with those of HCC samples, it was obvious that the expression level of HAMP was significantly reduced in HCC tissues ( Figure 5A). This conclusion was also applicable in the comparison of paired samples ( Figure 5B). When examining the relevance of the tumor stage, we noticed that the expression of HAMP displayed a striking decrease in the early stage of HCC ( Figure 5C-E). When focusing on Figure 6, the AUC was up to 0.946, strongly demonstrating the high accuracy of the HAMP prediction model in the diagnosis of HCC.
The survival association analysis for HCC illustrated that the expression levels of HAMP were associated with DSS (p < 0.05) ( Figure 7B). DSS is a good response to the clinical benefit of a specific disease, that is, the reduction or increase in death due to a specific disease. Hence, we configured a prognostic prediction model for DSS to assess the survival probability at 1, 3, and 5 years for patients with HCC, which suggested that a higher HAMP level tends to be associated with a better prognosis ( Figure 7D).
From the results of CIBERSORT, it appeared that for HCC, the level of immune cell infiltration was remarkably correlated with HAMP expression to a great extent, except for Tcm ( Figure 8). Meanwhile, the infiltration level of B cells, macrophages, and iDC was most closely correlated with HAMP expression (Figure 8). Furthermore, the results in Figure 9 leave no doubt about the significant co-expression relationship that HAMP has with PDCD1 (p < 0.001) and CTLA4 (p = 0.004). Immunotherapy targets mainly included CD274, PDCD1, and CTLA4. CD274 correlation was calculated as cor = 0.066, p = 0.203 > 0.05, so it was not included. Moreover, CD8, T-cell failure, tumor-associated fibroblasts (CAF), and tumor-associated macrophages (TAM) markers belonged to tumor microenvironment markers and were not in our scope of concern.  The survival association analysis for HCC illustrated that the expression levels of HAMP were associated with DSS (p < 0.05) ( Figure 7B). DSS is a good response to the clinical benefit of a specific disease, that is, the reduction or increase in death due to a specific disease. Hence, we configured a prognostic prediction model for DSS to assess the survival probability at 1, 3, and 5 years for patients with HCC, which suggested that a higher HAMP level tends to be associated with a better prognosis ( Figure 7D).  The survival association analysis for HCC illustrated that the expression levels of HAMP were associated with DSS (p < 0.05) ( Figure 7B). DSS is a good response to the clinical benefit of a specific disease, that is, the reduction or increase in death due to a specific disease. Hence, we configured a prognostic prediction model for DSS to assess the survival probability at 1, 3, and 5 years for patients with HCC, which suggested that a higher HAMP level tends to be associated with a better prognosis ( Figure 7D).  From the results of CIBERSORT, it appeared that for HCC, the level of immune cell infiltration was remarkably correlated with HAMP expression to a great extent, except for Tcm ( Figure 8). Meanwhile, the infiltration level of B cells, macrophages, and iDC was most closely correlated with HAMP expression (Figure 8). Furthermore, the results in Figure 9 leave no doubt about the significant co-expression relationship that HAMP has with PDCD1 (p < 0.001) and CTLA4 (p = 0.004). Immunotherapy targets mainly included CD274, PDCD1, and CTLA4. CD274 correlation was calculated as cor = 0.066, p = 0.203 > 0.05, so it was not included. Moreover, CD8, T-cell failure, tumor-associated fibroblasts (CAF), and tumor-associated macrophages (TAM) markers belonged to tumor microenvironment markers and were not in our scope of concern.

Discussion
In the present study, firstly, we attained two crossover genes, HAMP and FOS, by differential gene extraction and lasso regression analysis. Next, to pick a more valuable biomarker, we employed logistic regression to build prediction models using HAMP and FOS as independent variables. Combining the results of ROC curve and nomogram, we concluded that HAMP has a better ability to diagnose HCC and predict PD-1 treatment sensitivity. As a further step, we constructed the SVM model of HAMP to predict the three-stage outcome after PD-1 treatment in HCC patients, which had good classification

Discussion
In the present study, firstly, we attained two crossover genes, HAMP and FOS, by differential gene extraction and lasso regression analysis. Next, to pick a more valuable biomarker, we employed logistic regression to build prediction models using HAMP and FOS as independent variables. Combining the results of ROC curve and nomogram, we concluded that HAMP has a better ability to diagnose HCC and predict PD-1 treatment sensitivity. As a further step, we constructed the SVM model of HAMP to predict the threestage outcome after PD-1 treatment in HCC patients, which had good classification ability. In addition, we performed external validation by using TCGA data, which revealed that the expression level of HAMP was significantly reduced in HCC tissues and was significantly elevated in the early stage of HCC, suggesting an important role of HAMP in the early diagnosis of HCC. More notably, we found that HAMP was significantly and positively correlated with the expression of 18 major immune cell infiltrates and 2 important immune checkpoints, PDCD1 and CTLA4. This finding lays a foundation for HAMP to guide the immunotherapy of HCC, which could be beneficial for clinical precision treatment and saving medical resources.
Hepcidin antimicrobial peptide (HAMP), a liver-produced protein-coding gene, is the master regulator of iron homeostasis maintenance [24], and it can be detected in human urine and serum [25]. Under normal physiological conditions, HAMP can function by promoting endocytosis and the degradation of ferroportin/SLC40A1, resulting in the retention of iron in iron export cells and reduced iron influx in plasma [26][27][28]. In previous studies, HAMP has been regularly associated with hemochromatosis, especially type 2B hemochromatosis [29]. This is significantly attributed to the metabolism of iron in the placenta and the effect of Hfe on hemoglobin production [30,31]. Likewise, in this research, we found that HAMP also has a non-negligible role in the diagnosis of HCC based on analysis of variance, Lasso regression and model construction. In Figure 2A,B, the areas under the curve for HAMP and FOS are both over 0.98, which is higher than the performance of alpha-fetoprotein (AFP). Considering the result may be affected by the small number of GEO samples we included, as well as sample bias existing in each dataset, we used another dataset as external validation to ensure the robustness of the biomarkers. Although some bias or error in our results is inevitable due to the small sample size or the source bias present in the dataset population, we identified a biomarker with excellent sensitivity and specificity, whether reliable or not, but worthy of further clinical and basic experimental confirmation and excavation. The conclusion coincides with the notions of some other researchers; in other words, it can be corroborated with them. The work of Silvia Udali et al. highlighted the emergent role of the downregulation of HAMP induced by DNA promoter hypermethylation [32]. Other investigators have elucidated the potential of HAMP as a diagnostic biomarker for HCC from the perspectives of HAMP-SLC40A1 signaling, cyclin4-dependent kinase-1/STAT3 pathway and Circ_0004913 [33][34][35]. Since the existing studies have focused more on the mechanism of the effect of HAMP on HCC, this research was more concerned with its practical application in the clinical diagnosis and treatment process; thus, our validation analysis of the TCGA data corroborated this very favorably. Although a small proportion of the paired samples had results contrary to our findings, it is the consensus among the medical community that HCC is a multifactorial predisposing disease, so these deviations are permitted to exist.
Of more interest, however, is that this study further examines the predictive role of HAMP in assessing the sensitivity and prognosis of patients with HCC to PD-1 therapy, which is of great significance both in terms of improving effective clinical treatment rates as well as saving healthcare resources. PD-1 immunotherapy serves to increase the antitumor activity of T cells in the body by relieving the suppression of immune function, which is mediated as a result of the combination of PD-1 and PDL-1, thus providing a killing and suppressing impact on cancer cells [36]. In some cancers, for example, cholangiocarcinoma, renal cell carcinoma and colorectal cancer, the relevant evidence implies that HAMP may contribute to immune activation in the tumor microenvironment and inhibit cancer cell development by acting synergistically with immune cells and chemokines [37,38]. This may corroborate the predictive role of HAMP in liver cancer immunotherapy. Additionally, a protein dynamics analysis of iron and iron interactions during T cell activation uncovered that pathways rich in iron-interacting proteins were likely impaired by iron deficiency during T cell activation and emphasized the role of iron in T cell differentiation [39]. At this point, we reasonably conclude that the iron metabolism mechanism is an intermediate pivot linking HAMP and PD-1; nevertheless, more experimental evidence still needs to be performed.
The main inadequacy of this study was that the superior prognosis of HAMP was not as well captured in the OS and PFI of patients with HCC. Regarding this point, we speculated that it was due to a high number of missing follow-ups in the extracted patient data; thus, the time of death of these patients was not precisely estimated. In addition, the cause of HCC is still not understood, due to the limitation of data sources.
In a nutshell, this investigation suggested whether HAMP could be considered a new biomarker for diagnosing early-stage HCC and used to predict the sensitivity and prognosis of PD-1 immunotherapy, in addition to orienting the development of new drugs.

Conflicts of Interest:
The authors declare no conflict of interest.