Differentiating Hepatocellular Carcinoma from Hepatitis C Using Metabolite Profiling

Hepatocellular carcinoma (HCC) accounts for most liver cancer cases worldwide. Infection with the hepatitis C virus (HCV) is considered a major risk factor for liver cancer. To identify the risk of cancer, metabolic profiling of serum samples from patients with HCC (n=40) and HCV (n=22) was performed by 1H nuclear magnetic resonance spectroscopy. Multivariate statistical analysis showed a distinct separation of the two patient cohorts, indicating a metabolic difference between the HCC and HCV patient groups based on signals from lipids and other individual metabolites. Univariate analysis showed that three metabolites (choline, valine and creatinine) were significantly altered in HCC. A PLS-DA model based on these three metabolites showed a sensitivity of 80%, a specificity of 71% and an area under the receiver operating characteristic curve of 0.83, outperforming the clinical marker alpha-fetoprotein (AFP). The robustness of the model was tested using Monte Carlo cross-validation (MCCV). This study showed that metabolite profiling could provide an alternative approach for HCC screening in HCV patients, many of whom are at high risk of developing liver cancer.


Introduction
Hepatocellular Carcinoma (HCC) is the most common type of liver cancer and the third leading cause of cancer mortality worldwide, especially in China and South East Asia [1]. Although most cases (85%) occur in developing countries, the incidence of HCC in the U.S. has tripled over the past twenty years [2]. The five-year survival rate is very poor, less than 5% [3]. Early diagnosis can give patients an opportunity to receive curative treatments; this then improves outcomes [4]. The current diagnostic methods include cross sectional imaging and biopsy in cases where the imaging does not meet established diagnostic criteria. Once cancer develops in a hepatitis C infected liver, the disease is predictably destructive. For this reason, identification of patients at high risk for the development of cancer would allow for: 1) closer surveillance and 2) chemoprevention protocols. The major risk factors of HCC include infection with Hepatitis B or C virus (HBV or HCV), with the highest risk occurring when patients develop cirrhosis. It is estimated that patients with HCV and cirrhosis have much higher risk (15-20 fold) to develop HCC [5].
Serologic biomarkers such as alpha-fetoprotein (AFP) have been used to help diagnose or assess prognosis in HCC for decades. In patients with inflammatory conditions such as hepatitis, the value of AFP is limited, as AFP levels can be elevated beyond the threshold in the absence of measurable cancer and negative in cases of obvious malignancy [6]. For this reason, the physician cannot argue for an intervention, such as liver transplant, based on AFP alone. This lack of specificity diminishes its value in screening hepatitis patients [6][7][8][9]. Other serum markers, such as serum Lens culinaris agglutinin-reactive AFP (AFP-L3), des γ-carboxy prothrombin (DCP) and the secreted isoforms of ERBB3 (sERBB3), have been observed to perform better for the diagnosis of HCC [10][11][12][13][14]. However, most of these markers have not been integrated into clinical practice.
Given the importance of liver function in metabolism, metabolites might provide alternative biomarker candidates. In particular, metabolite profiling provides a broad and systematic view of metabolic change in complex biological samples and is potentially useful for identifying metabolite biomarkers. Utilizing high-throughput analytical techniques such as nuclear magnetic resonance spectroscopy (NMR) and mass spectrometry (MS), metabolite profiling provides a detailed and quantitative analysis of tens to hundreds of metabolites and has therefore been applied to numerous areas including drug response, early disease diagnosis, toxicity and nutritional studies [15][16][17][18]. A number of biomarker candidates have been proposed for different cancers, including lung [19,20], prostate [21], colon [22], breast [23,24] and esophageal [25,26].
Several metabolite-profiling studies have focused on detecting HCC in different patient populations. Yang et al. applied high-resolution magic-angle spinning (HRMAS) in order to study adjacent, high-grade and adjacent low-grade liver cancer tissues and found several metabolites that clearly differentiated the samples, including lactate and several amino acids [27]. NMR was also used to screen urine samples from HCC patients in a Nigerian population [28]. Multivariate partial least squares discriminant analysis (PLS-DA) models, based on markers such as creatinine, carnitine, creatine and acetone, were found to differentiate HCC patients from both healthy controls and patients with cirrhosis with high accuracy. Liquid chromatography (LC)-MS and gas chromatography (GC)-MS have also been used to discover promising metabolite marker candidates, including amino acids and lipids [29−33]. These studies have identified metabolites with high classification accuracy, revealing metabolite profiling to be a promising approach. However, additional studies are needed; specifically, studies focusing on metabolite markers that distinguish patients with a risk of developing HCC. First, many of the earlier studies have focused on separating HCC patients and healthy controls, which is less relevant clinically since healthy subjects are unlikely to develop HCC. Second, several of the metabolite marker candidates were discovered based on a limited number of samples and lack sufficient validation. Additionally, only a few of these studies focus on the population of the U.S. Considering that the risk of HCC differs across regions and ethnic groups, studies on different populations are also important.
In the present work, serum samples from 40 HCC patients with underlying HCV, collected before radiation or chemotherapy treatments, and from 22 HCV patients with cirrhosis were studied. Most of these patients are Caucasians. Metabolite profiling was performed using 1H NMR, and the data were analyzed statistically using several approaches including partial least squares discriminant analysis (PLS-DA). A good model could be built based on the entire NMR spectrum as well as on only three metabolite biomarkers, and these results were internally cross-validated. This study is the first to identify good serum metabolite biomarkers by NMR to distinguish HCC patients from a population of patients with HCV and cirrhosis in the U.S.

Chemicals
Deuterium oxide (D2O, 99.9% D) and sodium azide (NaN3) were purchased from Cambridge Isotope Laboratories, Inc. (Andover, MA). The sodium salt of trimethylsilylpropionic acid-d4 (TSP), used as the internal standard, was from Sigma-Aldrich (Milwaukee, WI). All chemical reagents were analytical grade and used without further purification.

Serum Sample Collection and Storage
Human serum samples (n = 62) were obtained from the Indiana University/Lilly tissue bank, and consisted of two cohorts: HCC patients (n = 40) with underlying HCV, and HCV patients (n = 22) without HCC. A summary of sample information can be seen in Table 1. Frozen samples were transported to Purdue University under dry ice and then kept at -80 °C until analysis. The study was approved by the Institutional Review Boards at both Purdue University and Indiana University School of Medicine.

Sample Preparation and Acquisition of NMR Spectra
Samples were prepared by mixing 400 µL serum with 5 µL sodium azide (0.01% w/v) and 130 µL D2O. The solution (530 µL) was then transferred to a 5-mm NMR tube. A 60 µL, 0.5 mM TSP solution contained in a capillary insert was used as an internal standard. For the 1D NMR experiments, the spectra were acquired at 298 K on a Bruker Avance-500 spectrometer equipped with a TXI gradient cryoprobe, using standard 1D NOESY and 1D CPMG (Carr-Purcell-Meiboom-Gill) pulse sequences, each coupled with water presaturation. For each spectrum, 128 transients were collected with 16k time domain data points and using a spectral width of 6,000 Hz. All spectra were Fourier transformed using a 1.0 Hz exponential line broadening. Each acquired spectrum was then phased, baseline corrected and aligned with reference to alanine (δ=1.479 ppm) using Bruker Topspin 3.0 software.

Statistical Analysis
After excluding the spectral region δ 4.7-5.2 ppm containing the residual water resonance, each spectrum was binned to 4096 points (bin size 0.003 ppm), and then normalized to the area of the TSP signal at 0.0 ppm. The spectral data from both the CPMG and NOESY experiments were initially mean centered and subjected to orthogonal-signal-corrected (OSC) partial least squares (PLS) analysis using Matlab (R2008a; Mathworks, Natick, MA) and the PLS Toolbox (version 4.11, Eigenvector Research Inc.).
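The preprocessing steps described above (exclusion of the water region, binning, normalization to the TSP signal and mean centering) can be illustrated with a minimal Python sketch. It is not the Matlab/PLS Toolbox code used in this study; the function names, the dictionary-based bin storage and the width of the TSP reference window (±0.05 ppm) are assumptions made only for illustration.

```python
def bin_spectrum(ppm, intensity, bin_width=0.003, exclude=(4.7, 5.2)):
    """Sum intensities into fixed-width ppm bins, skipping the excluded region."""
    bins = {}
    for shift, value in zip(ppm, intensity):
        if exclude[0] <= shift <= exclude[1]:
            continue  # residual water region is dropped
        index = int(shift // bin_width)  # floor division also handles negative ppm
        bins[index] = bins.get(index, 0.0) + value
    return bins


def normalize_to_reference(bins, bin_width=0.003, ref_window=(-0.05, 0.05)):
    """Divide every bin by the summed area of the bins in the reference window.

    The window around 0.0 ppm (TSP) is an assumed value, not taken from the study.
    """
    ref_area = sum(v for i, v in bins.items()
                   if ref_window[0] <= i * bin_width <= ref_window[1])
    if ref_area == 0:
        raise ValueError("reference signal not found")
    return {i: v / ref_area for i, v in bins.items()}


def mean_center(rows):
    """Subtract the mean of each variable (column) across all samples (rows)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]
```

In this sketch each spectrum is binned and normalized separately, after which the bins shared by all samples would be assembled into a samples-by-variables table and passed to `mean_center` before multivariate modeling.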
In a second, more targeted analysis, a total of 19 metabolites were identified in the CPMG spectra by comparing their chemical shifts and multiplicities with the Human Metabolome Database [34]. The individual spectral regions for each of the 19 metabolite signals were then integrated. After auto-scaling, these peak integrals for both the HCC patients (n = 40) and HCV patients (n = 22) were subjected to principal component analysis (PCA) as well as partial least squares discriminant analysis (PLS-DA) with 7-fold internal cross-validation for model building. A receiver operating characteristic (ROC) curve was used to evaluate the performance of the model. Monte Carlo cross-validation (MCCV) with 200 iterations was used to assess the model robustness using Matlab, PLS Toolbox version 4.11 and in-house code. For each iteration, the whole dataset was randomly divided into a training set (60% of the whole dataset) and a testing set (40%). A PLS-DA model was built on the training set with 7-fold internal cross-validation and used to predict the testing set. The internal cross-validation prediction on the training set and the external prediction of the testing set were combined as the prediction result for each MCCV run. The overall true positive and true negative numbers were summarized, after which the sensitivity and specificity were calculated and compared with the results of a permutation analysis. In the permutation analysis, the sample classification was randomly permuted and 200 MCCV iterations were performed as above.
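The MCCV and permutation procedure can be outlined with a short Python sketch. Several simplifications are assumed and should be read as such: a nearest-centroid classifier stands in for PLS-DA, only the held-out 40% of samples is scored in each iteration (the internal cross-validation predictions on the training set are not pooled), and in permutation mode the labels are reshuffled at every iteration. All function names are hypothetical.

```python
import random


def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT PLS-DA): store the mean vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class with the closest centroid (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def mccv(X, y, n_iter=200, train_frac=0.6, positive=1, permute=False, seed=0):
    """Monte Carlo cross-validation; returns pooled (sensitivity, specificity)."""
    rng = random.Random(seed)
    labels = list(y)
    tp = fn = tn = fp = 0
    for _ in range(n_iter):
        if permute:
            rng.shuffle(labels)  # break the link between samples and class labels
        idx = list(range(len(X)))
        rng.shuffle(idx)
        cut = int(round(train_frac * len(idx)))
        train, test = idx[:cut], idx[cut:]
        if len({labels[i] for i in train}) < 2:
            continue  # skip a split whose training set lacks one class
        model = nearest_centroid_fit([X[i] for i in train],
                                     [labels[i] for i in train])
        for i in test:
            pred = nearest_centroid_predict(model, X[i])
            if labels[i] == positive:
                tp += pred == positive
                fn += pred != positive
            else:
                tn += pred != positive
                fp += pred == positive
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

Calling `mccv(X, y)` and `mccv(X, y, permute=True)` on the same data gives the true-model and permuted-label results that are compared in this type of analysis; a model that captures real class differences should clearly exceed the permuted result.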
Third, feature selection using the Student's t-test was performed for each metabolite between the HCC and HCV cohorts to focus the analysis on the most important metabolites for classification. Three significant metabolites (valine, creatinine and choline) with low (uncorrected, vide infra) p-values (<0.05) were selected as potential biomarkers. A new PLS-DA model was built, followed by MCCV and permutation with 200 iterations. Except for using 3 metabolite signals instead of 19, all the other procedures are the same as above. PCA analysis was also performed on these 3 biomarkers.
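The feature selection step can be sketched in Python as follows. The sketch assumes the pooled-variance (equal variance) form of the unpaired Student's t-test and obtains the two-sided p-value by numerical integration of the t density, so that only the standard library is needed; the function names and the input layout (a dictionary mapping each metabolite name to its HCC and HCV values) are illustrative assumptions, not the code used in this study.

```python
import math


def student_t_test(a, b):
    """Unpaired two-sample Student's t-test (pooled variance).

    Returns (t, two-sided p). Assumes the pooled variance is non-zero.
    """
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)


def two_sided_p(t, df):
    """Two-sided p-value via Simpson integration of the t density over [0, |t|]."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    steps = 2 * max(1000, int(upper * 100))  # even number of Simpson intervals
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def select_features(groups, alpha=0.05):
    """Keep metabolites whose uncorrected p-value is below alpha.

    groups maps a metabolite name to a pair (hcc_values, hcv_values).
    """
    selected = []
    for name, (hcc, hcv) in groups.items():
        _, p = student_t_test(hcc, hcv)
        if p < alpha:
            selected.append(name)
    return selected
```

As in the text, no multiple-testing correction is applied in this sketch; with 19 tests at a threshold of 0.05, roughly one false positive would be expected by chance alone, which is why the selected markers still require validation.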

Results
The CPMG and NOESY spectra, averaged over the samples from each of the HCC and HCV patient cohorts, along with a difference spectrum, are shown in Figure 1 (a) and (c), respectively. We can observe clear changes in the CPMG spectra from several of the metabolite signals, including those from glucose, valine, alanine, lactate and choline. The changes from NOESY spectra are also clear, with most contributions coming from broad lipid signals. However, the large variation between samples makes it difficult to give any solid conclusion. The metabolic differences in both the NOESY and CPMG spectra between HCC and HCV patients can be identified using OSC-PLS analysis. The score plot for OSC-PLS analysis of the CPMG spectra is shown in Figure 1 (b). The two patient cohorts are separated and clustered in different areas of this score plot, with a few HCC samples overlapping the HCV region. The AUC for separation along LV1 was 0.71, with moderate sensitivity (0.74) but poor specificity (0.60). The loading plot (Supplemental Figure S1) indicates a number of peaks contribute to the separation. The score plot from the OSC-PLS analysis of the NOESY spectra shown in Figure 1 (d), shows an even better separation between the two patient cohorts, and the loading plot (Supplemental Figure S2) shows mostly lipid peaks. These results show promise for the future study of lipids. However, a major challenge in using NOESY to study lipids is that it cannot fully distinguish lipids with different fatty acid chains as they overlap. As a result, the following analysis will focus on CPMG spectra since they contain a larger number of peaks from identifiable and quantifiable metabolites. Considering the contribution to the loading plots from many low-lying and unidentified metabolite peaks, as well as noise, a more targeted approach was also pursued. 
Individual peaks from 19 known metabolites (See Supplemental Information Table S1) were integrated and analyzed to reduce the contribution from chemical noise and to focus the analysis on known metabolite species so as to provide more mechanistic information. Initially, PCA analysis was performed on the 19 metabolites to see the data clustering. The results are shown in Figure S3; as anticipated, clear separation of the two groups was not observed in the PCA results. A PLS-DA model was built based on these metabolite signals to investigate classification and discrimination. The cross-validated prediction result and ROC curve are shown in Figure S4. The two sample classes are somewhat separated by this model, but a number of misclassifications still exist. The area under curve (AUC) is 0.71.
The model was further tested by MCCV, and the results of the classification confusion matrix are shown in Supplemental Table S2. The low sensitivity (54%) and specificity (58%) that result from the MCCV procedure indicate that the model is not very strong. However, this model is still better than the permutation result (these data are provided in Table S2 as the values in parentheses). The sensitivity and specificity of the permutation test are only 50% and 48%, respectively, which is essentially a random result, as anticipated. The sensitivity and specificity results for both the true model and permutation test from 200 iterations are also plotted (see Supplemental Information Figure S5). Although the separation is not very impressive, it is still present, which indicates that the predictive model is better than a random one. Analysis of the PLS-DA loading plots (Supplemental Figure S6) indicated that only a few metabolites, such as valine, choline, alanine, creatine and asparagine, contributed to the separation. Feature selection was therefore used to further filter the metabolite signals and focus the analysis on the true differences between the two patient cohorts. P-values from the unpaired Student's t-test were calculated for all 19 metabolites, and those metabolites with p < 0.05 were selected. Only three metabolites (choline, valine, and creatinine) passed this filter, and the p-values, fold changes, NMR chemical shifts and multiplicities for these three metabolites are listed in Table 2. Box-plots of the intensity data for the three metabolites (Figure 2) indicate that choline and valine are up-regulated in HCC, while creatinine is down-regulated.
A new PLS-DA model was built based on the three metabolites, and the cross-validation prediction results are shown in Figure 3. A much better result can be seen both in the classification and the ROC curve. The new AUC is 0.83, indicating that this is an improved model. A sensitivity of 80% can be obtained with a specificity of 71%, outperforming the clinical marker AFP, which has a sensitivity of 41% to 65% and specificity of 80% to 94% when using an AFP level > 20 µg/L as the cutoff for HCC vs. HCV [35]. PCA on these three markers showed some separation along PC1, as shown in Figure S7. To better evaluate the robustness of this model, the same MCCV and permutation procedures were used again, and the results can be found in Table 3. This time, the average sensitivity and specificity are 71% and 73% for the true model, a significant increase over the results of the model based on 19 metabolites. As expected, the permutation results show essentially a random distribution (sensitivity = 54% and specificity = 39%). To better visualize the difference, the results of the MCCV procedure are plotted in Figure 4. True model results cluster towards the top-left corner of the plot, representing good sensitivity and specificity. The permutation results are spread about the center of the plot and are well separated from the true model.
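For reference, the performance measures quoted here follow standard definitions, which can be written compactly in Python. The sketch below is illustrative only: the function names are hypothetical, and the AUC is computed by the rank-based (Mann-Whitney) formulation, i.e. the fraction of positive-negative sample pairs in which the positive sample receives the higher score, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve from pairwise score comparisons (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Here the positive class would correspond to HCC and the scores to the cross-validated model predictions; each threshold on the score yields one sensitivity/specificity pair, and the AUC summarizes performance over all thresholds.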

Discussion
A metabolite profiling approach was applied to identify biomarker candidates for distinguishing HCC patients within an HCV population. The effectiveness of current HCC surveillance markers or methods such as alpha-fetoprotein (AFP) and abdominal ultrasound (US) is limited by low sensitivity and specificity. Hence the effectiveness of such approaches in reducing HCC mortality has remained modest [36]. Improved detection methods, such as blood-based biomarkers, are needed to improve this situation.
Metabolite biomarkers provide an opportunity to enhance the detection of HCC [29,33]. As shown in the present work, the entire 1H NMR spectrum can be used to develop a diagnostic metabolite profile with good sensitivity and specificity [20]. This approach is based on the combination of a large number of metabolite signals, many of which have not yet been identified. The use of feature selection, based on the Student's t-test, resulted in 3 relatively strong biomarker candidates. We decided to use the uncorrected p-values, in part because each of these biomarker candidates has some precedent in cancer metabolism and in part because of our desire to avoid possible false negatives. In the case of creatinine, a gender imbalance in the two patient cohorts may be reducing its significance (vide infra). The resulting PLS-DA model based on these 3 metabolites shows good performance, at least better than the model based on the entire CPMG NMR spectrum. In contrast, the use of 19 metabolites without feature selection performs much more poorly. The OSC-PLS analysis of the full NOESY spectra showed a clear separation between HCC and HCV patients, and this approach may be quite useful for distinguishing these cohorts. Future studies to validate these findings are planned, including an investigation of which types of lipids are contributing to the separation. However, the combination of isolated lipid signals and identified metabolites could not provide an improved model compared with the one built using the 3 metabolites (data not shown).
An investigation of age and gender effects on the model was also performed to evaluate possible confounding effects. The averages and standard deviations of the age distributions in HCC and HCV groups are quite similar, indicating that there is no confounding effect to be anticipated due to age. However, the gender distribution differs significantly between the two groups. We, therefore, performed a Student's t-test for the 3 markers between the male and female patients in each of the two patient cohorts. All p-values were above 0.05 (see Supplemental Table S3), indicating that any gender effect can be neglected for these metabolites in this study. The results also show that the disease effect on creatinine levels dominated any gender effect; creatinine increased overall, in males compared with females, and in females with HCV compared to those with HCC. Interestingly, the increase in creatinine levels for females was highly significant (Table S4).
The three metabolites identified by feature selection do have some precedent as biomarkers. Creatinine was found to decrease in the samples from HCC patients compared to those from patients with HCV without cancer. Unique to this study was the ability to show differences within two diseased states, as opposed to other studies that focused on differences between diseased states (cirrhosis or cancer) compared to normal controls. For example, creatinine was seen to decrease in the urine of liver cancer patients compared with healthy controls as detected by MS [37]. In an NMR study focused on African subjects, creatinine was lower in urine samples of patients with cirrhosis compared to the urine from healthy controls [28]. More recently, creatinine was found to be decreased in the serum of patients with HCC compared with healthy subjects [33]. Corroborating its potential role as a cancer biomarker, aberrations in serum or urine creatinine levels were also associated with other cancers such as lung cancer (in urine) [20], pancreatic cancer (in serum) [38], esophageal cancer (in serum) [25] and colorectal cancer (in urine) [39]. Creatinine levels are generally higher in males than in females and correlate with muscle mass [40]. It is important to emphasize that studies with unmatched gender participation can result in biased results for metabolites that are sensitive to gender. However, in this study, we find that the HCC patient group, which does have a significantly larger number of males compared to the HCV group, actually exhibits a lower concentration of creatinine, indicating a definitive pathological role for creatinine. In fact, among female patients alone the creatinine change from HCV to HCC is quite significant (p=0.003, Supplemental Tables S3 & S4 and Figure S8). One can anticipate that better gender-matched cohorts might well increase the significance of creatinine as a biomarker for HCC. Nevertheless, the specific molecular mechanism of its association with HCC and/or HCV remains to be explored.
In contrast, valine and choline were found to be upregulated in HCC patients. The elevation of valine has been observed in HCC tissue [27] and blood [41], as well as the serum of HBV infected cirrhosis patients [42]. An important step of valine catabolism occurs largely in the liver. This step involves oxidative decarboxylation of branched-chain α-keto acids generated from valine and other branched-chain amino acids in extrahepatic tissues [43,44]. Previous studies showed that methacrylylcoenzyme A (MC-CoA), a toxic compound generated in valine catabolism, is less detoxified in HCC or cirrhosis patients. MC-CoA induces a change of valine metabolism resulting in increased serum valine [45]. It is worth noting that changes in valine levels have been found in some digestive system cancers, such as oral cancer [46] and gastric cancer [47].
Changes in choline metabolism have also been related to HCC previously. The Lin group found decreased choline in HCC and cirrhosis patient sera compared with normal sera, although they did not compare HCC and cirrhotic patients [48]. In HCC tissue, choline was found to be upregulated [27], which is consistent with previous in vivo MRS studies [49]. Generally, choline is an essential metabolite in the synthesis of phospholipids for cancer cell membranes [50]. This metabolism has been studied and monitored by NMR previously [51][52][53]. Choline is also associated with many cancer types. For example, it has been shown to be associated with colorectal cancer [54], high grade gliomas [55], and breast cancer [56]. Thus, the metabolism of the membrane phospholipids caused by accelerated cell proliferation could be a reason for elevated choline in the sera of HCC patients [27].

Conclusions
1H NMR metabolic profiling of serum samples has been shown to differentiate HCC from HCV patients. In addition to a good separation based on broad lipid signals in the NMR spectra, three metabolites (creatinine, valine and choline) were found to differentiate the two disease groups, and each metabolite has some precedent as a potential HCC biomarker in human serum or urine. Moreover, these metabolites are readily detected in serum by a number of analytical methods, indicating that upon further validation they could be straightforwardly translated into clinical practice.
A distinguishing feature of this study is that it focuses on a particularly challenging patient cohort, i.e., those with underlying HCV. It is extremely difficult to differentiate HCC patients with underlying HCV from HCV patients for several reasons: 1) mediators associated with inflammation often overlap with those associated with cancer and therefore teasing out cancer specific differences is difficult; 2) changes associated with fibrosis also overlap with cancer and the majority of HCV patients do not develop cancer until the liver has become severely fibrotic; and 3) confirmation of cancer requires pathologic evidence that is not found in cases where resection or transplant has not been performed or where occult disease is present, but only detected from the most sophisticated tests. Patients with HCV were of particular interest for this study since they represent the largest cohort of HCC patients within the US and are at the highest risk for developing HCC during their lifetimes. The results of this study indicate the promise of developing metabolite profiles for the detection of HCC. Future studies will focus on adding MS detected biomarker candidates and expansion of the studies with additional sample cohorts. We anticipate that additional metabolite biomarkers will significantly improve the detection model and provide an alternative to current modalities.

Conflict of Interest
The authors declare no conflict of interest.