Exploring the Genomic Landscape of Hepatobiliary Cancers to Establish a Novel Molecular Classification System

Simple Summary
Machine learning identified hepatobiliary molecular subtypes with overlapping oncogenomic pathways independent of traditional cancer taxonomy. This new classification system identified tumor biology features that may affect future hepatobiliary staging systems and lead to improved selection of oncologic treatment plans.

Abstract
The taxonomy of hepatobiliary cancer (HBC) categorizes tumors by location or histopathology (tissue of origin, TO). Tumors originating from different TOs can also be grouped by overlapping genomic alterations (GAs) into molecular subtypes (MSs). The aim of this study was to create novel HBC MSs. Next-generation sequencing (NGS) data from the AACR-GENIE database were used to examine the genomic landscape of HBCs. Machine learning and gene enrichment analysis identified MSs and their oncogenomic pathways. Descriptive statistics were used to compare subtypes and their associations with clinical and molecular variables. Integrative analyses generated three MSs with different oncogenomic pathways independent of TO (n = 324; p < 0.05). The HC-1 “hyper-mutated-proliferative state” MS had rapidly dividing cells susceptible to chemotherapy; the HC-2 “adaptive stem cell-cellular senescence” MS had epigenomic alterations enabling immune evasion and treatment resistance; and the HC-3 “metabolic-stress pathway” MS had metabolic alterations. The discovery of HBC MSs is an initial step in the evolution of cancer taxonomy and the incorporation of genomic profiling into the TNM system. The goal is to develop a precision oncology machine learning algorithm to guide treatment planning and improve HBC outcomes. Future studies should validate these findings, incorporate clinical outcomes, and compare the MS classification with the AJCC 8th edition staging system.

Recent advancements in molecular, genomic, epigenomic, and transcriptomic research signal a potential paradigm shift in HBC taxonomy and staging systems [19,20]. A growing body of evidence suggests that tumors from differing TOs can be grouped by similar genomic alterations (GAs), leading to a molecular classification or subtyping [21][22][23][24][25][26][27][28]. This approach is exemplified by the latest AJCC breast cancer staging, which has adopted a Prognostic Stage (PS) [29][30][31]. The PS was developed to address limitations of prior staging systems, which failed to adequately discern between breast cancer subtypes based on tumor biology and response to treatment [32][33][34][35]. This modern staging system integrates molecular subtypes, defined by hormone receptor and HER2 status, with genomic profiles such as the 21-gene assay Recurrence Score (RS) to predict treatment outcomes more accurately [36].
In light of these developments, we contend that there is a critical need for updated HBC taxonomy and staging systems that integrate molecular classifications. We hypothesize that the tumor biology, prognosis, and treatment response of HBC are significantly influenced by the oncogenomic pathways of their molecular subtypes. Therefore, this proof-of-concept study was conducted to identify molecular subtypes (MSs) unique to HBC and thereby enhance our understanding of HBC tumor biology.

Selection of a Genomic Database
We conducted a multidimensional analysis of next-generation sequencing (NGS) data to characterize the genomic landscape of hepatobiliary cancers (HBCs), using the American Association for Cancer Research (AACR) Genomics Evidence Neoplasia Information Exchange (GENIE) Project. AACR-GENIE, an international precision oncology collaborative, compiles genomic data linked with electronic medical records from 19 leading cancer centers, providing an expansive registry [37][38][39]. Protocols for data generation and for sharing de-identified information are institutionally approved at all participating AACR-GENIE sites.

Gene Panels Used for Next-Generation Sequencing of Hepatobiliary Cancers
The gene panels from the AACR-GENIE database (version 6.1) used in this study are detailed in the Pipeline for Annotating Mutations and Filtering Putative Germline Single Nucleotide Polymorphisms (SNPs), available at https://www.aacr.org/wp-content/uploads/2020/02/20200127_GENIE_Data_Guide_7.pdf (accessed on 2 March 2023) [37][38][39]. This compendium included 48 genes consistently represented across the gene panels used by the contributing institutions. The data encompass mutation profiles, copy-number variants (CNVs) from three centers, and structural rearrangements from two centers. Paired tumor-normal sequencing was performed at two centers, whereas the remaining centers performed tumor-only sequencing [37][38][39]. To enhance the validity of our findings, we excluded patients whose genomic panel did not provide complete coverage of the analyzed genes.
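The coverage-based exclusion can be pictured as a set comparison: a patient is kept only if the panel used for that patient sequences every gene in the analysis set. The minimal sketch below assumes hypothetical panel names, gene lists, and record fields; it is not the real AACR-GENIE panel content or the authors' code.

```python
# Hypothetical panel definitions: panel names and gene lists are illustrative
# placeholders, not the real AACR-GENIE panel contents.
PANELS = {
    "PANEL_A": {"TP53", "ARID1A", "CTNNB1", "IDH1", "TERT", "KRAS"},
    "PANEL_B": {"TP53", "ARID1A", "CTNNB1", "IDH1", "TERT"},
    "PANEL_C": {"TP53", "ARID1A", "CTNNB1", "IDH1"},  # does not sequence TERT
}

# Hypothetical target gene set (the study retained 48 shared genes).
TARGET_GENES = {"TP53", "ARID1A", "CTNNB1", "IDH1", "TERT"}


def missing_genes(panel_name, target=TARGET_GENES, panels=PANELS):
    """Return the target genes that the named panel does not sequence."""
    return target - panels[panel_name]


def keep_fully_covered(patients, target=TARGET_GENES, panels=PANELS):
    """Keep only patients whose panel sequences every target gene."""
    return [p for p in patients if not missing_genes(p["panel"], target, panels)]


patients = [
    {"id": "P1", "panel": "PANEL_A"},
    {"id": "P2", "panel": "PANEL_B"},
    {"id": "P3", "panel": "PANEL_C"},
]
kept = keep_fully_covered(patients)
print([p["id"] for p in kept])  # ['P1', 'P2']
```

Patient P3 is dropped because PANEL_C lacks TERT; extra genes on a panel (KRAS on PANEL_A) do not matter.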

Patient Selection
From the AACR-GENIE database, 329 patients aged 18-89 years with hepatobiliary cancer were identified and classified into three TO subtypes: cholangiocarcinoma (CCA, n = 115), hepatocellular carcinoma (HCC, n = 127), and gallbladder cancer (GBC, n = 87) (Figure 1). The CCA TO subtype included intrahepatic, perihilar, and distal cholangiocarcinomas because the database's current classification system does not separate them. Exclusion criteria were non-hepatobiliary cancers, hepatoblastoma, liver angiosarcoma, co-diagnosis of HCC and intrahepatic cholangiocarcinoma (ICC), or limited genomic sequencing data (Figure 1). In most cases a single tissue sample underwent NGS; for patients with two samples, primary tumor data were used.
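The inclusion and exclusion rules reduce to a per-patient filter. This is a minimal sketch assuming hypothetical field names (`age`, `diagnosis`, `has_ngs`) and diagnosis labels; the real GENIE clinical files use different columns and OncoTree codes. The final assertion only checks that the three reported TO counts sum to the 329 patients identified.

```python
from collections import Counter

# Hypothetical diagnosis labels; real GENIE OncoTree codes and column names differ.
TO_SUBTYPES = {"CCA", "HCC", "GBC"}


def eligible(record):
    """Return True if a patient record meets the inclusion criteria.

    Adults aged 18-89 with one of the three TO subtypes and usable NGS data.
    Any other diagnosis (non-hepatobiliary cancer, hepatoblastoma, liver
    angiosarcoma, combined HCC-ICC) fails the membership test.
    """
    return (
        18 <= record["age"] <= 89
        and record["diagnosis"] in TO_SUBTYPES
        and record["has_ngs"]
    )


records = [
    {"age": 64, "diagnosis": "HCC", "has_ngs": True},
    {"age": 17, "diagnosis": "HCC", "has_ngs": True},      # below age window
    {"age": 58, "diagnosis": "CCA", "has_ngs": True},
    {"age": 70, "diagnosis": "HCC-ICC", "has_ngs": True},  # combined diagnosis
    {"age": 66, "diagnosis": "GBC", "has_ngs": True},
    {"age": 61, "diagnosis": "GBC", "has_ngs": False},     # no usable NGS data
]
cohort = [r for r in records if eligible(r)]
print(Counter(r["diagnosis"] for r in cohort))

# The three reported TO counts add up to the 329 patients identified.
reported = {"CCA": 115, "HCC": 127, "GBC": 87}
assert sum(reported.values()) == 329
```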

Identification of HBC Molecular Subtypes
A comprehensive evaluation of the molecular data for gene mutations and CNVs was undertaken using machine learning. The initial dataset contained 981 features, including 36 duplicates and 194 CNV events. After excluding empty features and patients lacking CNV data, 786 features across 324 patients remained. PREX2 was excluded because of inadequate sequencing data, leaving 47 genes. An AI-based deep integrative analysis used both mutational and CNV data to identify unique molecular subtypes. This analysis condensed the molecular alterations into a single set of 99 features in 324 patients (5 patients were excluded for insufficient CNV data), from which three features with correlated events were removed. Nonnegative matrix factorization (NMF) was then used to determine the optimal number of molecular subtypes, based on the 38 most variable features. The Uncorrelated Shrunken Centroids (USC) method, with a delta value below 5 (Δ < 5) and a maximum correlation of 0.7 between genomic regions, was used to classify hepatobiliary tumors into distinct molecular subtypes.
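As an illustration of the NMF step, the dependency-free sketch below ranks features by variance, factors the nonnegative alteration matrix with the classic Lee-Seung multiplicative updates, and labels each tumor by its dominant factor. It is a minimal sketch under stated assumptions: the toy matrix, function names, and iteration count are hypothetical. It does not reproduce the study's NMF implementation, its rank-selection criterion, or the subsequent USC classifier.

```python
import random


def top_variable_features(matrix, names, k):
    """Keep the k columns (features) with the highest variance across samples."""
    n = len(matrix)

    def variance(j):
        col = [row[j] for row in matrix]
        mean = sum(col) / n
        return sum((v - mean) ** 2 for v in col) / n

    keep = sorted(range(len(names)), key=variance, reverse=True)[:k]
    return [[row[j] for j in keep] for row in matrix], [names[j] for j in keep]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iters=1000, seed=0, eps=1e-9):
    """Factor nonnegative V (features x samples) as W @ H using Lee-Seung
    multiplicative updates for the Frobenius-norm objective."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # W^T V, W^T W H
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # V H^T, W H H^T
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h


def assign_subtypes(h):
    """Label each sample (column of H) by the factor with the largest weight."""
    return [max(range(len(h)), key=lambda r: h[r][j]) for j in range(len(h[0]))]


# Toy binary alteration matrix (samples x features); feature E is altered in
# every sample and therefore has zero variance.
names = ["A", "B", "C", "D", "E"]
samples = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
selected, kept_names = top_variable_features(samples, names, k=4)
w, h = nmf(transpose(selected), rank=2)
labels = assign_subtypes(h)
print(kept_names, labels)
```

In practice the rank (number of subtypes) would be chosen by comparing several values of `rank` over repeated random starts; that step is omitted here.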

Bioinformatics and Biostatistical Analyses
The 'oncoprint' feature of the R/Bioconductor package 'TCGAbiolinks' was used to visualize the mutational profile of each HBC molecular subtype. Correlation and unsupervised hierarchical clustering analyses were performed with the MultiExperiment Viewer v4.9 [41][42][43][44]. Gene enrichment pathway analyses were performed with Metascape [45] to interpret the MS genomic data. Pearson's χ² test was used to assess the association between TO and MS subtypes in 311 patients (13 were excluded for lack of data; Figure 1). Statistical analyses were performed in R using the dplyr and tidyr packages, and p < 0.05 was considered significant. Summary statistics described the cohort and the individual subtypes, and Pearson's χ² test was used to assess associations between categorical variables such as TO subtype, molecular subtype, age group, and race. Data were visualized with the R package ggplot2.
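For readers unfamiliar with the test, the sketch below computes Pearson's χ² statistic for an r × c contingency table such as TO subtype by molecular subtype. A 3 × 3 table has (3 − 1)(3 − 1) = 4 degrees of freedom, and for even degrees of freedom the χ² survival function has a closed form, so no statistics library is needed. The counts are hypothetical toy values, not the study's Table 2. In R the same test is typically run with `stats::chisq.test()`; the authors' exact call is not stated.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c contingency table.

    Returns (statistic, degrees of freedom, p-value). The p-value uses the
    closed-form chi-square survival function, valid only for even df
    (a 3 x 3 TO-by-MS table has df = 4).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df % 2:
        raise ValueError("closed-form p-value implemented only for even df")
    half = stat / 2
    # P(X > stat) = exp(-stat/2) * sum_{k < df/2} (stat/2)^k / k!
    p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return stat, df, p_value


# Hypothetical toy counts (rows: HCC, CCA, GBC; columns: HC-1, HC-2, HC-3).
toy = [[5, 10, 5],
       [5, 5, 10],
       [10, 5, 5]]
stat, df, p = chi_square_independence(toy)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
```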

Machine Learning Distinguishes Three Unique HBC Molecular Subtypes
Of the initial cohort, 324 hepatobiliary tumors were analyzed with a machine learning algorithm, which identified three distinct HBC molecular subtypes, designated HC-1, HC-2, and HC-3 (p < 0.05) (Figures 2 and 3). Subtype HC-1 had a hypermutated phenotype, with prevalent mutations in TP53 and ARID1A and CNVs in CCND1 and FGF4; it also showed high CNV frequencies (>5%) across all analyzed genes. HC-2 was marked by pathogenic mutations in TP53, TERT, and CTNNB1, with generally low CNVs except in genes such as MYC, RECQL4, and CDKN2A. The HC-3 subtype, characterized by lower mutation rates, displayed significant mutations in IDH1 and higher CNVs in MDM2 and CDKN2A.
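Statements such as "prevalent mutations" or "CNVs above 5%" rest on per-subtype alteration frequencies, the same quantity an oncoprint displays. A minimal sketch of how such frequencies could be tabulated, assuming hypothetical `GENE_mut` / `GENE_amp` labels and toy samples rather than the study data:

```python
def alteration_frequencies(samples, subtype):
    """Fraction of samples assigned to `subtype` that carry each alteration."""
    members = [s for s in samples if s["subtype"] == subtype]
    if not members:
        return {}
    counts = {}
    for s in members:
        for alt in s["alterations"]:
            counts[alt] = counts.get(alt, 0) + 1
    return {alt: c / len(members) for alt, c in counts.items()}


def above_threshold(freqs, threshold=0.05):
    """Alterations present in more than `threshold` of samples, most frequent first."""
    return sorted((a for a, f in freqs.items() if f > threshold),
                  key=lambda a: (-freqs[a], a))


# Toy samples with hypothetical alteration labels.
samples = [
    {"subtype": "HC-1", "alterations": {"TP53_mut", "ARID1A_mut", "CCND1_amp"}},
    {"subtype": "HC-1", "alterations": {"TP53_mut", "FGF4_amp"}},
    {"subtype": "HC-3", "alterations": {"IDH1_mut"}},
    {"subtype": "HC-3", "alterations": set()},
]
hc1 = alteration_frequencies(samples, "HC-1")
print(above_threshold(hc1))  # ['TP53_mut', 'ARID1A_mut', 'CCND1_amp', 'FGF4_amp']
```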

Gene Enrichment Analyses Elucidate Oncogenomic Pathways
Gene mutation and CNV profiles for each molecular subtype (n = 324) guided our gene enrichment analyses (Figure 4). The HC-1 subtype was enriched for alterations in cell growth, mitotic transition, and chromosome organization, indicative of a proliferative phenotype. In contrast, HC-2 tumors showed disruption of pathways related to stem cell function and cellular senescence, while HC-3 tumors exhibited changes in metabolic processes and stress response pathways (p < 0.05).
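As we understand Metascape's documentation, pathway enrichment is scored with a cumulative hypergeometric p-value, and Benjamini-Hochberg q-values are reported for multiple testing. The sketch below shows that calculation for a single pathway; it is not Metascape's code, and the gene counts in the usage lines are hypothetical.

```python
from math import comb


def hypergeometric_p(hits, list_size, set_size, universe):
    """One-sided P(X >= hits) for X ~ Hypergeometric(universe, set_size, list_size).

    hits: input genes that fall in the pathway
    list_size: number of input genes (e.g. genes altered in one subtype)
    set_size: pathway size; universe: total number of background genes
    """
    upper = min(list_size, set_size)
    tail = sum(comb(set_size, k) * comb(universe - set_size, list_size - k)
               for k in range(hits, upper + 1))
    return tail / comb(universe, list_size)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted q-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


# Hypothetical counts: 10 altered genes, 4 of which fall in a 6-gene pathway,
# against a 47-gene background.
p = hypergeometric_p(hits=4, list_size=10, set_size=6, universe=47)
print(p, benjamini_hochberg([p, 0.2, 0.03]))
```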

Comparative Analysis of HBC Molecular and TO Subtypes
After the MSs were established, 311 HBC cases remained for comparative analysis. Their demographics mirrored those of the initial genomic evaluation: primarily male (57.6%), Caucasian (74.9%), non-Hispanic (92.9%), and aged 55-69 years (51.8%), with a median age of 64 years. HCC constituted the largest TO subtype (38.3%), followed by CCA (34.4%) and GBC (27.3%). Gender differences were noted, with HCC more likely in males and GBC more likely in females (p < 0.001). No significant differences in TO subtypes were observed by age or ethnicity (Table 2). The HC-2 molecular subtype was the most prevalent, representing 56.9% of HBC tumors, and was more frequent in HCC and GBC (Table 2 and Figure 5). Conversely, the HC-3 subtype, comprising 32.5% of the cohort, was predominantly associated with CCA. The least common subtype, HC-1, accounted for 10.6% and was evenly distributed across all TOs. Significant differences in genomic alterations produced distinct mutation profiles for each MS (p < 0.001) (Supplemental Figure S1a-c), with oncogenic mutations particularly abundant in the HC-2 and HCC subtypes (p = 0.0005) (Figure 6).
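Assuming each percentage in the preceding paragraph is computed over the 311 patients, the underlying patient counts can be back-calculated as a consistency check. The counts produced here are inferred from the rounded percentages, not taken from Table 2.

```python
N = 311  # patients in the comparative cohort

# Percentages reported in the text (one decimal place).
REPORTED = {
    "HC-1": 10.6, "HC-2": 56.9, "HC-3": 32.5,
    "HCC": 38.3, "CCA": 34.4, "GBC": 27.3,
}


def implied_count(pct, n=N):
    """Smallest count c in 0..n whose share 100*c/n rounds to pct (1 decimal)."""
    for c in range(n + 1):
        if round(100 * c / n, 1) == pct:
            return c
    return None


counts = {group: implied_count(pct) for group, pct in REPORTED.items()}
print(counts)
# Both partitions (molecular subtype and TO subtype) should cover every patient.
assert counts["HC-1"] + counts["HC-2"] + counts["HC-3"] == N
assert counts["HCC"] + counts["CCA"] + counts["GBC"] == N
```

With n = 311, each additional patient shifts a share by about 0.32 percentage points, so at most one count can round to a given one-decimal percentage.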
Cancers 2024, 16, x FOR PEER REVIEW 9 of 15
In this study, machine learning facilitated the creation of a modern taxonomy system comprising three HBC-specific MSs. Unlike previous GI malignancy classifications, which included a heterogeneous mix of tumors, the HBC MSs we identified are more homogeneous, a feature we believe is critical for unraveling the obscure oncogenomic pathways that influence HBC tumor biology.
Gene enrichment analysis shed light on several oncogenomic pathways associated with each HBC MS, revealing their untapped therapeutic potential. Our data suggest that HBC patients with the HC-1 subtype, characterized by a "hyper-mutated-proliferative state", could potentially derive the most benefit from chemotherapy. This subgroup's predisposition to rapid cell division renders these tumors more susceptible to cytotoxic chemotherapy than slower-dividing cells. This may explain why previous studies have reported a lackluster response to chemotherapy, given that only a subset of hepatobiliary tumors falls under the HC-1 classification [2][3][4][5][6][61][62][63][64]. Future research should aim to determine the predictive value of the HC-1 classification for chemotherapy responsiveness.
The HC-2 subtype, dubbed the "adaptive stem cell-cellular senescence" subtype, exhibited epigenomic alterations that may enable evasion of immune surveillance and contribute to treatment resistance. These cells can become dormant under treatment-induced stress and potentially resurge when conditions are favorable, which may serve as a mechanism of treatment resistance. Additionally, the prevalence of CTNNB1 mutations in HC-2 tumors suggests a disrupted WNT/beta-catenin pathway. While inhibiting upstream regulators of this pathway has been explored, direct targeting of beta-catenin might improve response in tumors with such mutations [60,65]. The absence of FDA-approved drugs targeting beta-catenin highlights the need for research into HC-2-specific therapies to enhance treatment efficacy and survival outcomes in HBC.
The HC-3 "metabolic-stress pathway" subtype, despite its lower mutational load compared with HC-1 and HC-2, features clinically actionable IDH1 mutations. Nevertheless, most HC-3 tumors lacked this mutation, underscoring how few HBC patients are currently suitable for targeted therapy. Alterations in this subtype are implicated in proteomic mutation pathways and in the metabolism of lipids, fats, and amino acids [48,66]. Further investigation is warranted to elucidate how these metabolic genomic alterations influence tumor biology and to potentially discover novel treatments [67].
An intriguing aspect of this study was the observed disparity in baseline patient characteristics, hinting at potential inequalities in NGS utilization, referrals to high-volume cancer centers, or access to national cancer centers for non-Caucasian HBC patients. Although NGS has been covered by Medicare since 2018, recent studies have indicated disparities in NGS usage among racial and insurance groups in non-HBC malignancies [68]. This finding, not previously confirmed in HBC patients, warrants further research into the underlying causes of such disparities in NGS utilization.
Our study, a retrospective analysis of the AACR-GENIE database, has several inherent limitations that warrant careful interpretation of the results. These limitations include selection bias, confounding factors, and potentially incomplete data, which could affect the generalizability of our findings. Acknowledging these challenges, we emphasize the need for diverse patient populations and the use of real-world data in future research to enhance the reliability of outcomes and extend their relevance to broader patient groups.
Despite the breadth of the AACR-GENIE database, the absence of transcriptomic data leaves a notable gap in our exploration of oncogenomic pathways. Additionally, the current amalgamation of intrahepatic and extrahepatic cholangiocarcinoma cases under a single category limits our ability to distinguish the nuances between these subtypes. Subsequent updates to the database are anticipated to enable more refined classification, facilitating deeper insights into the molecular subtypes identified.
The scarcity of detailed clinical data also hampers our ability to draw robust correlations between molecular subtypes and clinical endpoints such as prognosis and treatment response.As the database evolves, incorporating more comprehensive clinical details will be crucial for refining these molecular subtypes and understanding their implications in a clinical context.
Our study sets the stage for future research aimed at validating and building upon the initial hypothesis. Through subsequent, more inclusive and methodologically robust studies, including phase III clinical trials and multivariable analyses, we aim to mitigate the impact of biases and enhance the external validity of our findings. The ultimate goal is to foster a deeper understanding of hepatobiliary cancer biology and to potentially guide clinical decision-making.

Conclusions
This study introduces HBC molecular subtypes as a pioneering step toward revolutionizing cancer taxonomy and integrating genomic profiling into HBC staging systems. Our goal is to develop a precision oncology algorithm, informed by machine learning, to refine treatment planning and enhance patient outcomes in HBC. Future studies should validate our findings, incorporate clinical data, and evaluate how our HBC MS classification compares with the current AJCC HBC staging system.
Author Contributions: A.J.S., R.K.M., D.G., M.G. and J.I.J.O.: conception and design of work, data analysis, drafting, and revising of the manuscript. M.G.-K., J.A.S.-B., M.E.-M., J.G. and A.K.: data analysis, drafting, and revising manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: M.E.M. would like to thank the Asociación Española Contra el Cáncer and Instituto de la Salud Carlos III Miguel Servet Project (# CP17/00188).

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Figure 1. CONSORT diagram for hepatobiliary cancer (HBC) patients included in the analyses.

Figure 3. (a) Gene alterations of the HC-1 subtype, which favored mutations in TP53 and ARID1A. (b) Gene alterations of the HC-2 subtype, which demonstrated mutations in TP53, TERT, and CTNNB1. (c) Gene alterations of the HC-3 subtype, which had mutations in IDH1. HC, hepatobiliary cancer molecular subtype.

Figure 4. Gene enrichment pathway analyses using molecular subtype genomic data identified different oncogenomic pathways influencing the tumor biology of the distinct molecular subtypes (p < 0.05). HC, hepatobiliary cancer molecular subtype.

* Analysis performed prior to the creation of molecular subtypes.

Table 2. Comparison of hepatobiliary tumor molecular and tissue of origin subtypes.
