Figure 1.
(A) Study design for the identification and validation of a metaflammation signature. The workflow involved differential expression analysis of several disease-specific datasets: TCGA-CHOL (cholangiocarcinoma, n = 45), GSE107943 (CCA, n = 163), GSE23343 (Type 2 Diabetes, n = 20), and GSE58208 (HBV infection, n = 102). A Venn analysis identified a consistent core set of genes altered across these conditions. This core gene set was validated using data from the Human Protein Atlas and subsequently analyzed using pathway enrichment (KEGG/Reactome), protein–protein interaction networks (STRING), and clinical survival analysis. The findings were synthesized to define a metaflammation signature and construct a model linking chronic metabolic inflammation to disease pathogenesis. (B) Workflow for the integrative transcriptomic analysis and validation. The schematic describes a bioinformatics pipeline beginning with transcriptomic data acquisition from the TCGA-CHOL cohort and GEO (GSE107943, GSE23343, GSE58208) microarray datasets. Following preprocessing and normalization, differential expression analysis was conducted for conditions CCA, T2D, and HBV. The resulting gene lists were merged to identify a core metaflammation gene set, which was then subjected to functional enrichment and protein–protein interaction network analysis. The clinical relevance of key hub genes was evaluated in the TCGA cohort through survival analysis, and the protein-level expression of prioritized hubs was confirmed with immunohistochemistry data from the Human Protein Atlas. Note: CCA datasets are derived from bile duct tissue, while T2D and HBV datasets are from whole liver tissue. This tissue source heterogeneity is a key limitation discussed in the text.
![Cancers 18 00923 g001 Cancers 18 00923 g001]()
Figure 2.
(A) Evaluation of within-dataset batch correction for microarray data. PCA plots of the GSE58208 (HBV) dataset show samples before and after batch correction, illustrating the effects of the ComBat algorithm. Initially, samples are colored by processing batch and biological condition (HBV+ vs. HBV-). After correction, the plots indicate reduced clustering by technical batch while maintaining clear separation by biological condition, demonstrating the effective removal of non-biological variance. Similar quality control assessments were conducted on other datasets, such as GSE107943. (B) Comparative Differential Gene Expression Analysis Across Datasets. Volcano plots show the results of differential gene expression across four transcriptomic studies comparing disease to control groups. The x-axis represents log2 fold change (log2FC) in gene expression, and the y-axis indicates statistical significance, marked by −log10(FDR). Dashed horizontal lines denote significance thresholds (FDR < 0.05), while vertical dashed lines indicate fold-change thresholds (|log2FC| > 1). Data points are highlighted in red for significantly upregulated genes (FDR < 0.05, log2FC > 1), blue for downregulated genes (FDR < 0.05, log2FC < −1), and gray for non-significant genes. The studies include TCGA-CHOL (tumor vs. normal bile duct tissues, n = 45), GSE107943 (CCA validation, n = 163), GSE23343 (type 2 diabetes vs. control, n = 20), and GSE58208 (HBV+ vs. HBV−, n = 102).
![Cancers 18 00923 g002 Cancers 18 00923 g002]()
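The coloring rule described for the volcano plots in Figure 2B can be sketched as a small classifier. This is a minimal illustration, assuming strict inequalities at the stated thresholds (FDR < 0.05, |log2FC| > 1); the `GeneResult` type, the `classify` function, and the example values are hypothetical and not taken from the study.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str
    log2fc: float
    fdr: float


def classify(result, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Return 'up', 'down', or 'ns' using the thresholds from the caption."""
    if result.fdr < fdr_cutoff and result.log2fc > lfc_cutoff:
        return "up"    # drawn in red
    if result.fdr < fdr_cutoff and result.log2fc < -lfc_cutoff:
        return "down"  # drawn in blue
    return "ns"        # drawn in gray


# Hypothetical values, for illustration only (not results from the study).
results = [
    GeneResult("GENE_A", 2.3, 0.001),
    GeneResult("GENE_B", -1.7, 0.01),
    GeneResult("GENE_C", 0.4, 0.0005),  # significant, but small effect
    GeneResult("GENE_D", 3.0, 0.20),    # large effect, but not significant
]
labels = {r.gene: classify(r) for r in results}
print(labels)
```

A gene must pass both thresholds to be colored, so a highly significant gene with a small fold change is still reported as non-significant in the plot.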
Figure 3.
(A) Identification and functional characterization of a core metaflammation gene set. Sub-panel A: Venn diagram of DEG overlap across three pathological states. The diagram shows the intersection of differentially expressed genes (DEGs) from three disease-specific signatures: CCA (the consensus of DEGs from the TCGA-CHOL and GSE107943 cohorts, n = 2347 genes), type 2 diabetes (T2D; GSE23343, n = 894 genes), and hepatitis B virus (HBV) infection (GSE58208, n = 1247 genes). The central overlap of 156 genes is statistically significant (hypergeometric p = 2.3 × 10⁻¹⁵, 42.6-fold enrichment) and is defined as the core metaflammation module. Sub-panel B: Functional composition of the 156 core genes, categorized by primary biological function; inflammatory/cytotoxic and regulatory pathways predominate and together characterize the metaflammation phenotype. Sub-panel C: Summary of the core module’s role. The integrative analysis reveals a conserved 156-gene signature common to distinct inflammatory-metabolic disease states, composed predominantly of inflammatory/cytotoxic (75 genes) and regulatory (30 genes) pathways. (B) Expression heatmap of 20 representative core metaflammation genes. Unsupervised hierarchical clustering of 20 core genes was performed across four sample conditions: normal liver/bile duct (Normal), cholangiocarcinoma (CCA), type 2 diabetic liver (T2D), and hepatitis B virus-infected liver (HBV). Genes are shown in rows, with expression levels color-coded: red for upregulation and blue for downregulation. Key functional categories include inflammation (NFKB1, STAT3, IL6, TNF, IL1B), metabolism (PPARG, SREBF1), oncogenesis (TP53, MYC, KRAS), and growth factor signaling (EGFR, MET, VEGFA). Normal samples clustered distinctly, reflecting baseline expression. CCA tumors showed significant upregulation of oncogenic and inflammatory genes, T2D livers showed altered expression of metabolic regulators, and HBV-infected livers showed strong upregulation of immune mediators.
![Cancers 18 00923 g003 Cancers 18 00923 g003]()
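The fold-enrichment idea behind the Figure 3 overlap can be illustrated as observed overlap divided by the overlap expected if the three DEG lists were drawn independently from a common background. This is only a sketch: the caption does not state the background size or the exact procedure, so the background of 15,892 genes (the intersected gene count in Table 2) is an assumption, the function names are hypothetical, and the result is not expected to reproduce the reported 42.6-fold value.

```python
def expected_overlap(list_sizes, background):
    """Expected size of the common overlap if the lists were independent
    random draws from a background of `background` genes."""
    expected = float(background)
    for size in list_sizes:
        expected *= size / background
    return expected


def fold_enrichment(observed, list_sizes, background):
    """Observed overlap divided by the overlap expected by chance."""
    return observed / expected_overlap(list_sizes, background)


# DEG list sizes and observed overlap as given in the caption.
deg_sizes = [2347, 894, 1247]  # CCA, T2D, HBV
observed_core = 156

# ASSUMPTION: background size taken from Table 2; the authors may have
# used a different background or a different test.
assumed_background = 15892

fold = fold_enrichment(observed_core, deg_sizes, assumed_background)
print(f"fold enrichment under the assumed background: {fold:.1f}")
```

The estimate is sensitive to the background: a larger assumed gene universe lowers the expected overlap and raises the fold enrichment, which is one reason the background should be reported alongside the statistic.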
Figure 6.
(A) Representative immunohistochemical validation of hub protein expression and subcellular localization. Immunohistochemical images from the Human Protein Atlas show contrasting protein expression between normal bile ducts and cholangiocarcinoma (CCA) for key proteins in the metaflammation hub. IL6 staining shows strong cytoplasmic immunoreactivity in CCA cells, while normal bile duct epithelium exhibits minimal staining. In contrast, PPARG shows nuclear expression in normal bile duct epithelium but is markedly reduced or absent in CCA tissue. This suggests shifts in subcellular localization during CCA progression: cytoplasmic accumulation of pro-inflammatory mediators (e.g., IL6, TNF) and loss of nuclear localization of metabolic regulators (e.g., PPARG). (B) Prognostic value of the metaflammation gene expression signature in CCA. In the TCGA-CHOL cohort (n = 45), high-risk patients have significantly poorer overall survival than low-risk patients (HR = 2.8, 95% CI: 1.8–4.3, p < 0.001). Key findings include strong prognostic stratification by Kaplan–Meier analysis (p < 0.001), with markedly reduced 3-year survival in the high-risk group; independent predictive value in multivariate Cox regression (p < 0.001); and a significant negative correlation between the metaflammation score and survival time (Spearman ρ = −0.458, p < 0.0016). Additionally, time-dependent ROC analysis indicates stable, moderate predictive accuracy (AUC ~0.65–0.70) for survival up to 36 months.
![Cancers 18 00923 g006 Cancers 18 00923 g006]()
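The Kaplan–Meier stratification mentioned in Figure 6B rests on the product-limit estimator, which can be sketched in a few lines. The follow-up times and event flags below are hypothetical and for illustration only; how the authors defined the high-risk and low-risk groups is not stated in the caption.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time per patient (e.g., months)
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical follow-up data (months), not taken from TCGA-CHOL.
example_times = [6, 12, 12, 20, 30]
example_events = [1, 1, 0, 1, 0]
curve = kaplan_meier(example_times, example_events)
for t, s in curve:
    print(f"t = {t:>2} months, S(t) = {s:.2f}")
```

Running the estimator separately on each risk group yields the two curves that a log-rank test then compares; patients censored at a given time are still counted as at risk for deaths occurring at that same time.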
Figure 7.
(A) Integrative model of module crosstalk. The schematic illustrates the interaction of three core functional modules, each shown with its module score: metabolic (6.8), inflammatory (8.4), and growth-promoting (5.2). It highlights how key regulator genes integrate signals across the modules. This crosstalk is proposed to be central to the metaflammation state that contributes to cholangiocarcinoma pathogenesis in the context of T2D and HBV infection. (B) Proposed model of convergent mechanisms linking T2D and HBV to CCA. The schematic outlines how type 2 diabetes (T2D) and hepatitis B virus (HBV) infection may converge to induce metaflammation, activating key inflammatory and oncogenic pathways, including NF-κB, JAK-STAT/STAT3, and cytokine networks (TNF, IL-6). This activation can result in oncogenic transformation and accelerated progression of cholangiocarcinoma (CCA). The findings indicate that the co-presence of T2D and HBV increases CCA risk and might lead to drug resistance, metastasis, and decreased survival rates.
![Cancers 18 00923 g007 Cancers 18 00923 g007]()
Table 1.
Dataset Characteristics and Preprocessing Summary.
| Dataset | Platform | Samples (Case/Control) | Normalization | Batch Correction |
|---|---|---|---|---|
| TCGA-CHOL | RNA-Seq | 36/9 | TMM + DESeq2 VST (for visualization) | ComBat-seq (on raw counts) |
| GSE107943 | Microarray | 104/59 | RMA | ComBat (post-RMA) |
| GSE23343 | Microarray | 10/10 | RMA | None required |
| GSE58208 | Microarray | 62/40 | RMA | ComBat (post-RMA) |
| Total for Core Analysis | | 212/118 | | |
| Contextual Dataset | Platform | Samples | Normalization | Batch Correction |
| GSE89632 | RNA-Seq | Variable (by analysis) | TMM + DESeq2 VST (for visualization) | ComBat-seq (on raw counts) |
Table 2.
Characteristics of Integrated Datasets for Core Analysis.
| Characteristic | TCGA-CHOL | GSE107943 | GSE23343 | GSE58208 | Total |
|---|---|---|---|---|---|
| Samples (n) | 45 | 163 | 20 | 102 | 330 |
| Platform | RNA-Seq | Microarray | Microarray | Microarray | Mixed |
| Tissue | Bile duct | Bile duct | Liver | Liver | Mixed |
| Conditions | CCA/Normal | CCA/Normal | T2D/Control | HBV+/HBV- | 4 |
| Genes | 19,645 | 20,329 | 12,625 | 23,042 | 15,892 |

Note: Number of genes after intersection across all platforms.
Table 3.
Characteristics of the Core Metaflammation Gene Set.
| Category | Number | Percentage | Representative Genes |
|---|---|---|---|
| Total Genes | 156 | 100% | — |
| Upregulated | 92 | 59% | IL6, TNF, STAT3, AKT1 |
| Downregulated | 64 | 41% | PPARG, ADIPOQ, IRS1 |
| Metabolic | 58 | 37% | PPARG, SREBF1, FASN |
| Inflammatory | 72 | 46% | IL6, TNF, IL1B, CXCL8 |
| Signaling | 42 | 27% | AKT1, STAT3, NFKB1 |
| Cancer-related | 38 | 24% | MYC, VEGFA, EGFR |
Table 4.
Top Enriched Pathways for the Core Metaflammation Gene Set.
| Pathway | Gene Count | p-Value | FDR | Enrichment Ratio | Key Genes |
|---|---|---|---|---|---|
| PPAR signaling | 12 | 2.1 × 10⁻¹⁰ | 3.2 × 10⁻⁸ | 8.4 | PPARG, SREBF1, FABP4, CD36, CPT1A, PLIN2 |
| Cytokine-cytokine receptor interaction | 18 | 3.4 × 10⁻⁹ | 2.1 × 10⁻⁶ | 6.2 | IL6, TNF, CXCL8, IL1B, CCL2, CCR5 |
| Metabolic pathways | 24 | 7.8 × 10⁻⁸ | 5.4 × 10⁻⁵ | 4.1 | Multiple enzymes (HK2, PFKFB3, ACLY, etc.) |
| PI3K-Akt signaling | 14 | 1.8 × 10⁻⁷ | 1.2 × 10⁻⁴ | 5.8 | AKT1, mTOR, PIK3CA, IRS1, ITGB1 |
| TNF signaling | 8 | 5.6 × 10⁻⁷ | 3.8 × 10⁻⁴ | 7.2 | TNF, NFKB1, JUN, MAPK8, CASP8 |
Table 6.
Top network hubs identified by integrated centrality analysis.
| Gene | Degree Centrality | Betweenness Centrality |
|---|---|---|
| IL6 | 28 | 0.12 |
| TNF | 26 | 0.11 |
| AKT1 | 24 | 0.10 |
| STAT3 | 22 | 0.09 |
| NFKB1 | 20 | 0.08 |
| PPARG | 18 | 0.07 |
| JUN | 17 | 0.06 |
| MYC | 16 | 0.05 |
| FOS | 15 | 0.04 |
| VEGFA | 14 | 0.03 |
Table 7.
Survival Analysis of Hub Genes in the TCGA-CHOL Cohort.
| Gene | HR (95% CI) | p-Value | Median OS (High) | Median OS (Low) |
|---|---|---|---|---|
| IL6 | 2.1 (1.4–3.2) | 0.001 | 18.4 months | 32.7 months |
| TNF | 1.8 (1.2–2.7) | 0.004 | 20.1 months | 30.5 months |
| PPARG | 0.5 (0.3–0.8) | 0.002 | 31.9 months | 19.8 months |
| AKT1 | 1.6 (1.1–2.3) | 0.02 | 22.3 months | 29.6 months |
| STAT3 | 1.5 (1.0–2.2) | 0.04 | 23.8 months | 28.4 months |
Table 8.
Multivariate Cox Regression and Bootstrap Validation.
| Variable | HR (Original) | 95% CI (Original) | p-Value | HR (Bootstrap Mean) | 95% Bootstrap CI | % Significant Iterations |
|---|---|---|---|---|---|---|
| IL6 (high) | 2.41 | 1.52–3.82 | 0.001 | 2.38 | 1.48–3.91 | 98.2% |
| TNF (high) | 2.12 | 1.34–3.35 | 0.004 | 2.08 | 1.29–3.48 | 94.7% |
| PPARG (low) | 0.48 | 0.28–0.82 | 0.002 | 0.51 | 0.31–0.89 | 96.1% |
| AKT1 (high) | 1.72 | 1.08–2.74 | 0.02 | 1.68 | 0.95–2.98 | 78.3% |
| STAT3 (high) | 1.58 | 0.98–2.55 | 0.04 | 1.54 | 0.89–2.71 | 72.1% |
| Age | 1.01 | 0.98–1.04 | 0.42 | 1.01 | 0.97–1.05 | 32.4% |
| Sex (Male) | 1.12 | 0.71–1.77 | 0.61 | 1.09 | 0.68–1.82 | 28.7% |
| Stage (III/IV) | 1.89 | 1.21–2.95 | 0.005 | 1.91 | 1.18–3.12 | 92.3% |
Table 9.
Summary of Protein Expression Validation.
| Gene | Normal Expression (Score) | CCA Expression (Score) | Change | IHC Score |
|---|---|---|---|---|
| IL6 | Low (1.2) | High (3.4) | ↑ 2.2 | 8.7 |
| TNF | Low (1.5) | Medium (2.8) | ↑ 1.3 | 7.9 |
| PPARG | Medium (2.8) | Low (1.6) | ↓ 1.2 | 8.2 |
| AKT1 | Low (1.8) | High (3.6) | ↑ 1.8 | 9.1 |
| STAT3 | Medium (2.4) | High (3.2) | ↑ 0.8 | 7.5 |