1. Introduction
Primary sclerosing cholangitis (PSC) is a chronic cholestatic liver disease characterized by progressive inflammation and fibrosis of the intrahepatic and extrahepatic bile ducts, ultimately leading to biliary strictures, cirrhosis, and hepatic failure [1]. No pharmacological therapy has been validated through randomized controlled trials, and liver transplantation remains the only curative option for patients with end-stage disease [1]. Ulcerative colitis (UC) is a chronic, non-specific inflammatory bowel disease affecting the colonic and rectal mucosa, manifesting primarily as recurrent bloody diarrhea, abdominal pain, and mucoid stool, with a steadily increasing global incidence [2].
Notably, PSC and UC share a remarkably high clinical comorbidity: approximately 60–80% of PSC patients have concomitant UC, and the prevalence of PSC among UC patients is substantially higher than that in the general population [1,3]. This pronounced co-occurrence strongly implies that the two diseases may share underlying genetic susceptibility and immunopathological mechanisms, yet the molecular basis driving this comorbid association has not been fully elucidated.
From an immunological standpoint, both PSC and UC are considered immune-mediated diseases. In PSC, aberrant immune cell infiltration into the periductal region and the consequent release of pro-inflammatory cytokines lead to biliary epithelial injury and progressive fibrosis [1]. In UC, dysregulation of both innate and adaptive mucosal immunity sustains a chronic inflammatory response, with aberrant activation of cytokine networks regarded as a core pathological driver [2,4]. The “gut–liver axis” hypothesis posits that gut-derived immune cells and inflammatory mediators can reach the liver via the portal venous circulation, triggering bile duct injury and thereby serving as a critical link between the two diseases [5]. However, the specific immune cell subsets and core regulatory molecules that bridge the gut–liver immune axis remain to be systematically defined at the genetic level.
Genome-wide association studies (GWAS) have provided important clues regarding the genetic underpinnings of PSC and UC. Previous GWAS have identified multiple risk loci for each disease, some of which show genetic overlap between the two conditions [6]. Cross-trait genetic analyses of multiple immune-mediated diseases have further uncovered a shared genetic architecture across autoimmune disorders [7]. Nevertheless, conventional GWAS have primarily focused on individual disease signal discovery and lack in-depth, systematic dissection of the shared mechanisms underlying PSC–UC comorbidity. Moreover, the majority of GWAS-identified risk loci reside in non-coding genomic regions, making it difficult to directly infer causal genes and effector cell types [6].
In recent years, the development of multiple post-GWAS analytical approaches has provided powerful tools to bridge this gap. Colocalization analyses of expression quantitative trait loci (eQTLs) and splicing QTLs (sQTLs), leveraging large-scale functional genomics resources such as GTEx [8], can link GWAS signals to gene expression regulation. Transcriptome-wide association studies (TWAS) aggregate SNP-level signals to the gene level through gene expression prediction models [9]. In particular, cell-type prioritization algorithms [10] have enabled the resolution of GWAS signals at the cell-type level. Concurrently, advances in single-cell RNA sequencing (scRNA-seq) technology have provided unprecedented resolution for directly observing disease-state changes in cellular composition and transcriptional programs [10].
Despite these advances, no study has yet systematically integrated multi-layered omics data to comprehensively dissect the shared immunogenetic mechanisms of PSC–UC comorbidity across tissue, cellular, and gene levels. Accordingly, the present study aimed to construct a multi-dimensional integrative analytical framework: at the tissue level, to identify commonly affected tissues through QTLEnrich (v2) [8], MAGMA [11], and gsMap (v1.73.7) [12] spatial mapping; at the cellular level, to systematically prioritize key pathogenic cell types by integrating single-cell transcriptomic differential analysis with multiple GWAS-informed algorithms; at the gene level, to pinpoint core pathogenic genes through cross-validation using colocalization analysis, chromatin accessibility analysis, co-expression network analysis, and cell-type-specific TWAS; and at the variant level, to perform fine-resolution dissection through risk locus annotation and conditional analysis. Through this strategy, our study seeks to uncover the core immune cell types and key regulatory genes underlying PSC–UC comorbidity, to provide novel genetic evidence for understanding the molecular basis of the gut–liver immune axis, and to lay a theoretical foundation for the development of comorbidity-targeted therapeutic strategies.
2. Materials and Methods
2.1. Sources of Genome-Wide Summary Statistics
Our study focused on GWAS data for PSC and UC, each incorporating summary statistics from two independent studies. For PSC, GWAS data were obtained from the FinnGen database (https://www.finngen.fi/en, accessed on 13 December 2025; ID: K11_CHOLANGI; 2317 cases and 437,418 controls) and the IEU OpenGWAS database (https://opengwas.io/, accessed on 16 December 2025; ID: ieu-a-1112; 2871 cases and 12,019 controls). For UC, GWAS data were similarly retrieved from the FinnGen database (ID: K11_UC_STRICT2; 7220 cases and 492,160 controls) and the IEU OpenGWAS database (ID: ukb-b-19386; 1987 cases and 461,023 controls).
2.2. Single-Cell Multiome (ATAC + Gene Expression) Dataset from Healthy Donor PBMCs
The peripheral blood mononuclear cell (PBMC) single-cell multiome dataset from a healthy donor was obtained from the public 10x Genomics dataset repository (https://www.10xgenomics.com/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-10-k-1-standard-1-0-0, accessed on 16 December 2025) and is distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. The dataset contains a median of 1826 genes and 3776 unique molecular identifiers (UMIs) per cell for gene expression, and 13,486 high-quality fragments per cell for chromatin accessibility, covering 108,377 open-chromatin peaks and 15,494 genes, with 85,468 peak–gene links identified. Further details are provided in the Data Availability Statement.
2.3. Single-Cell Transcriptomic Data
Single-cell transcriptomic data for PSC were obtained from the publicly available Gene Expression Omnibus (GEO) dataset GSE247128, from which 8 PSC samples and 6 healthy control samples were selected for downstream analysis. Single-cell transcriptomic data for UC were obtained from GEO dataset GSE250487, comprising 8 UC samples and 4 healthy controls.
2.4. Quality Control
A series of quality control (QC) procedures were applied to the GWAS data prior to analysis to ensure data accuracy and reliability. The minor allele frequency (MAF) was calculated for each SNP, and variants with MAF < 0.01 were excluded to reduce the uncertainty associated with low-frequency variants. All GWAS data were converted to the GRCh38/hg38 genome build to ensure format consistency for downstream analyses. Given the strong association of the major histocompatibility complex (MHC) region on chromosome 6 with autoimmunity and its extensive linkage disequilibrium (LD), which could confound GWAS analyses, SNPs located within the MHC region were excluded.
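The two variant-level filters described above can be sketched in a few lines of Python. This is an illustration only: the source does not state the MHC boundaries it used, so the GRCh38 coordinates below are an assumed, commonly quoted definition, and the record keys (`chrom`, `pos`, `eaf`) and SNP names are hypothetical.

```python
# Assumed GRCh38 coordinates for the extended MHC region on chromosome 6.
# The source does not report the boundaries it used; adjust as needed.
MHC_CHROM = "6"
MHC_START = 28_510_120
MHC_END = 33_480_577
MAF_THRESHOLD = 0.01  # variants with MAF below this value are excluded


def minor_allele_frequency(effect_allele_freq):
    """Fold an allele frequency onto the [0, 0.5] minor-allele scale."""
    return min(effect_allele_freq, 1.0 - effect_allele_freq)


def passes_gwas_qc(variant):
    """Return True if a variant survives the MAF and MHC filters.

    `variant` is a dict with hypothetical keys: chrom (str),
    pos (int, GRCh38) and eaf (effect-allele frequency).
    """
    if minor_allele_frequency(variant["eaf"]) < MAF_THRESHOLD:
        return False
    in_mhc = (
        variant["chrom"] == MHC_CHROM
        and MHC_START <= variant["pos"] <= MHC_END
    )
    return not in_mhc


# Toy records with placeholder identifiers (not real variants).
variants = [
    {"snp": "snpA", "chrom": "1", "pos": 1_000_000, "eaf": 0.30},   # kept
    {"snp": "snpB", "chrom": "2", "pos": 5_000_000, "eaf": 0.995},  # MAF 0.005, dropped
    {"snp": "snpC", "chrom": "6", "pos": 31_000_000, "eaf": 0.20},  # inside MHC, dropped
    {"snp": "snpD", "chrom": "6", "pos": 50_000_000, "eaf": 0.20},  # chr6 outside MHC, kept
]
kept = [v["snp"] for v in variants if passes_gwas_qc(v)]
```

Lifting coordinates to GRCh38 before applying the MHC filter matters here, because the region boundaries differ between genome builds.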
For single-cell transcriptomic data, a systematic QC pipeline was implemented. Data were loaded and initialized using the Seurat package (version 5.0.0) [13], and the proportions of mitochondrial genes and hemoglobin genes (including HBA1, HBA2, and HBB) were calculated to assess cell quality. Stringent filtering criteria were applied to retain high-quality cells: a minimum of 1000 unique molecular identifiers (UMIs) per cell, between 200 and 5000 detected genes per cell, mitochondrial gene proportion ≤ 15%, and hemoglobin gene proportion ≤ 3%. Dimensionality reduction was performed using principal component analysis (PCA) followed by Uniform Manifold Approximation and Projection (UMAP). The Harmony algorithm [14] was applied to correct for batch effects arising from sample origin.
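The cell-level thresholds listed above translate into a simple per-cell predicate. The sketch below is plain Python rather than the Seurat workflow the study used; the `MT-` prefix for mitochondrial genes, the inclusive handling of the 200 and 5000 gene bounds, and the function names are assumptions made for illustration.

```python
# Hemoglobin genes named in the text; the study's full list may be longer.
HEMOGLOBIN_GENES = {"HBA1", "HBA2", "HBB"}


def percent_of_counts(counts, is_member):
    """Percentage of a cell's UMIs that fall in genes selected by `is_member`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    selected = sum(c for gene, c in counts.items() if is_member(gene))
    return 100.0 * selected / total


def cell_metrics(counts):
    """Summarize one cell from a {gene_symbol: UMI count} mapping."""
    return {
        "n_umis": sum(counts.values()),
        "n_genes": sum(1 for c in counts.values() if c > 0),
        # Assumption: human mitochondrial genes carry the "MT-" prefix.
        "pct_mito": percent_of_counts(counts, lambda g: g.startswith("MT-")),
        "pct_hb": percent_of_counts(counts, lambda g: g in HEMOGLOBIN_GENES),
    }


def passes_cell_qc(n_umis, n_genes, pct_mito, pct_hb):
    """Apply the stated thresholds; bounds are treated as inclusive (assumed)."""
    return (
        n_umis >= 1000
        and 200 <= n_genes <= 5000
        and pct_mito <= 15.0
        and pct_hb <= 3.0
    )
```

A cell is retained when `passes_cell_qc(**cell_metrics(counts))` is true, so the four criteria are applied jointly rather than sequentially.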
2.5. Genome-Wide Meta-Analysis
To enhance statistical power, a fixed-effect GWAS meta-analysis was performed separately for each trait, combining the summary statistics from its two independent cohorts and thereby improving the ability to detect small-effect genetic variants.
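For a single variant, a fixed-effect meta-analysis is usually computed by inverse-variance weighting. The text names neither the software nor the weighting scheme, so the sketch below assumes the standard inverse-variance form; the function name and inputs are illustrative.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect estimate for one variant.

    betas, ses: per-cohort effect sizes and standard errors, aligned to
    the same effect allele. Returns (beta, se, z, two-sided p).
    """
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / SE_i^2
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal p-value
    return beta, se, z, p


# Two cohorts with equal precision: the pooled effect is the simple mean
# and the pooled standard error shrinks by a factor of sqrt(2).
beta, se, z, p = fixed_effect_meta([0.10, 0.20], [0.05, 0.05])
```

Alleles must be harmonized across cohorts before pooling; a flipped effect allele would enter the weighted sum with the wrong sign.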
2.6. Tissue-Specific eQTL/sQTL Enrichment Analysis Using QTLEnrich
Expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) across 49 tissues from GTEx v8 were used to evaluate tissue-level relevance for PSC and UC. QTLEnrich is a rank-based and permutation-based method that assesses whether phenotypic associations are enriched among eQTLs and sQTLs in specific tissues and quantifies the statistical significance of such enrichment. The method accounts for three potential confounders: minor allele frequency (MAF), distance to the transcription start site (TSS) of the target gene, and local linkage disequilibrium (LD). Adjusted fold enrichment and enrichment p-values were used to evaluate the significance of QTLEnrich results.
2.7. Tissue-Specific MAGMA Enrichment Analysis
To complement the QTLEnrich analysis, MAGMA enrichment analysis was performed to explore the tissue-specific genomic features associated with PSC and UC. GWAS summary data for both traits were formatted for MAGMA input, and gene-level enrichment analyses were conducted. Tissues with p < 0.01 were considered credibly enriched.
2.8. Spatial Transcriptomic Mapping of Tissue Enrichment Specificity for PSC and UC
To investigate the spatial distribution of PSC- and UC-associated cellular signals, we integrated single-cell spatial transcriptomic (sc-ST) data with GWAS summary statistics using the genetically informed spatial mapping of cells for complex traits (gsMap) algorithm [12]. This algorithm integrates cross-species analyses spanning mouse embryonic and brain tissues, macaque cerebral cortex, and human GWAS data to identify the spatial distribution patterns of disease-associated cell populations. Its core principle involves mapping GWAS-derived trait-associated gene expression patterns onto spatially resolved cells to evaluate the association between specific anatomical regions and complex traits at cellular resolution. Based on spatial transcriptomic atlases of E16.5 mouse embryonic tissues (covering 25 organs), we generated PSC- and UC-specific enrichment maps and spatial gene expression profiles, establishing phenotypic spatial pathogenesis maps at single-cell resolution.
2.9. Biological Pathway Enrichment Analysis Using GeneEnrich
A gene set enrichment analysis approach was adopted to systematically analyze the genetic variants and expression profiles associated with PSC and UC. Using the GeneEnrich tool, hypergeometric and permutation tests were performed to evaluate the enrichment of candidate gene sets in biologically relevant functional or phenotypic gene sets. Empirical p-values were calculated via permutation testing to mitigate tissue-specific bias. Functional gene sets were sourced from Gene Ontology (GO), Reactome, the Kyoto Encyclopedia of Genes and Genomes (KEGG), the Molecular Signatures Database (MSigDB), and Mouse Genome Informatics (MGI). An empirical p < 0.05 within each database was considered nominally significant.
2.10. Single-Cell Transcriptomic Atlas Annotation for Identification of Cellular Signals in PSC and UC
Preprocessed single-cell data were subjected to multi-dimensional visualization using Harmony-corrected dimensionality reduction results. Cell neighborhood graphs were constructed based on Harmony-reduced embeddings, and a multi-resolution clustering strategy (32 resolutions ranging from 0.01 to 3.0) was employed. The optimal resolution was determined via clustering tree evaluation, yielding stable cell subpopulations. To characterize the molecular features of each cluster, differentially expressed marker genes were identified using the FindAllMarkers function (minimum expression fraction of 25%; log fold-change threshold of 0.25), and heatmaps were generated based on these marker genes. Cell-type annotation was performed automatically using the SingleR algorithm [15]. To assess whether cell-type proportions differed significantly between disease and control groups, the relative abundance of each cell type was calculated for each individual sample as the number of cells of that type divided by the total number of cells in that sample. The resulting per-sample proportions were then compared between groups using the Wilcoxon rank-sum test, with each donor sample treated as the independent unit of observation.
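The per-sample proportion comparison can be illustrated with a small, dependency-free sketch. It computes the rank-sum p-value by enumerating every possible group assignment, which is feasible for the handful of donors involved here; statistical packages may instead use a normal approximation, particularly when ties are present, so this is an assumed variant rather than the study's exact routine. All function names are hypothetical.

```python
from itertools import combinations


def cell_type_proportion(cell_labels, cell_type):
    """Fraction of one sample's cells annotated as `cell_type`."""
    return sum(1 for lab in cell_labels if lab == cell_type) / len(cell_labels)


def midranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def exact_rank_sum_p(group_a, group_b):
    """Two-sided Wilcoxon rank-sum p-value by full enumeration.

    Each value is one donor sample's proportion, so the sample (not the
    cell) is the unit of observation.
    """
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    expected = n_a * (len(ranks) + 1) / 2            # mean rank sum under H0
    observed = abs(sum(ranks[:n_a]) - expected)
    hits = total = 0
    for idx in combinations(range(len(ranks)), n_a):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed - 1e-9:
            hits += 1
    return hits / total
```

With three samples per group the smallest attainable two-sided p-value is 2/20 = 0.1, which illustrates why small single-cell cohorts limit what a sample-level test can detect.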
2.11. Identification of PSC- and UC-Associated Cellular Signals Using the seismicGWAS Method
seismicGWAS [16] introduces the Seismic framework, which computes a novel cell-type specificity score that captures both the expression intensity and consistency across different cell types. Compared with methods such as scDRS, FUMA, and S-MAGMA for integrating GWAS and single-cell data, seismicGWAS robustly and efficiently detects cell-type–trait associations by incorporating a specificity score that accounts for gene expression variability, thereby avoiding arbitrary threshold selection. Specifically, GWAS summary statistics containing MAGMA Z-scores and cell-type-annotated single-cell transcriptomic data were imported, and the seismicGWAS R package was used to compute a gene-level specificity score for each cell type. This score reflects two aspects: (i) whether the gene is expressed at a consistently higher level in the target cell type compared with other cell types and (ii) whether the gene is expressed across all cells within that cell type. Cell-type–trait associations were then calculated using specificity scores and MAGMA Z-scores. Cell types with p ≤ 0.05 were considered significantly associated with PSC or UC.
2.12. Identification of PSC- and UC-Associated Cellular Signals Using the ECLIPSER Method
ECLIPSER [17,18] employs a Bayesian Fisher’s exact test against background GWAS locus scores to estimate cell-type-specific enrichment fold changes and p-values for each trait (GWAS locus set), tissue, and cell-type combination, with the cell-type specificity threshold set at the 95th percentile of background locus scores. The Bayesian approach estimates 95% confidence intervals for the enrichment fold change and is applicable to traits with few loci or where no locus exceeds the enrichment threshold. The analysis utilizes multi-level annotation data obtained from functional genomics platforms. Briefly, significant genetic loci associated with the target trait were first identified from GWAS results and then expanded using LD proxy relationships (r² > 0.8) to comprehensively capture potential functional variants. During the preparatory stage, Wilcoxon rank-sum tests were performed for between-group differential expression analysis within each cell type, with a minimum of 3 cells per group, a minimum gene expression fraction of 10%, and a log₂FC threshold of 0.5. Genes meeting the differential expression criteria (adjusted p < 0.05 and |log₂FC| > 0.5) were retained. Cell types with p ≤ 0.05 were deemed to exhibit statistically significant enrichment.
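The two preparatory filters in this section, LD-proxy expansion of each lead variant and retention of differentially expressed genes, reduce to threshold checks. The sketch below illustrates only those checks and is not part of ECLIPSER itself; the function names and input shapes are hypothetical.

```python
def ld_proxies(lead_snp, r2_to_lead, min_r2=0.8):
    """Variants in LD with the lead SNP above the r^2 cutoff (lead excluded).

    r2_to_lead: {snp_id: r^2 with the lead variant} from a reference panel.
    """
    return sorted(
        snp for snp, r2 in r2_to_lead.items()
        if snp != lead_snp and r2 > min_r2
    )


def is_retained_de_gene(adjusted_p, log2fc, max_adjusted_p=0.05, min_abs_log2fc=0.5):
    """Keep a gene when adjusted p < 0.05 and |log2FC| > 0.5."""
    return adjusted_p < max_adjusted_p and abs(log2fc) > min_abs_log2fc


# Placeholder identifiers, not real variants.
proxies = ld_proxies("lead", {"lead": 1.0, "snp1": 0.92, "snp2": 0.80, "snp3": 0.35})
```

Both cutoffs are strict inequalities as written in the text, so a proxy at exactly r² = 0.8 or a gene at exactly |log₂FC| = 0.5 is excluded.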
2.13. CELLECT Analysis for Identification of PSC- and UC-Associated Cellular Signals
To evaluate the contribution of cell-type-specific gene expression to disease heritability, we employed two complementary methods within the CELLECT (Cell-type Expression-specific Integration for Complex Traits) framework: heritability-based stratified LD score regression (S-LDSC) [19] and gene-set-based MAGMA gene analysis [11]. S-LDSC was applied to the PSC and UC GWAS summary statistics. The baseline cell-type dataset was derived from the Tabula Muris reference [20]. LD scores were calculated using European samples from the 1000 Genomes Project Phase 3 as a reference panel. p < 0.05 was considered indicative of significant enrichment. The MAGMA gene set analysis module tested the correlation between gene-level association statistics and cell-type-specific mean gene expression levels. MAGMA performed pairwise conditional analyses for cell-type combinations to identify cell types whose association signals were independent of other prominent cell types. As with S-LDSC, a significance threshold of p < 0.05 was applied.
2.14. Multi-Dimensional Evidence Integration for Cell-Type Prioritization
To prioritize the most likely pathogenic cell types for PSC and UC, we consolidated evidence from four analytical approaches: (i) single-cell atlas annotation and differential analysis, (ii) ECLIPSER analysis, (iii) seismicGWAS analysis, and (iv) CELLECT analysis. For simplicity, each line of evidence was assigned equal weight (1 point), and the total score for each cell type was calculated as the sum across all evidence streams. A higher total score indicated a greater likelihood of the cell type being a key participant in disease pathogenesis [21].
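The equal-weight scoring scheme amounts to counting, for each cell type, how many of the four evidence streams support it. A minimal sketch follows; the cell-type names in the example are placeholders rather than results from this study, and the tie-breaking rule (alphabetical) is an assumption.

```python
from collections import Counter


def prioritize_cell_types(evidence):
    """Rank cell types by the number of supporting evidence streams.

    evidence: {stream_name: iterable of cell types supported by that stream}.
    Each stream contributes at most 1 point per cell type.
    Returns (cell_type, score) pairs, highest score first; ties are
    broken alphabetically (an arbitrary choice made for this sketch).
    """
    scores = Counter()
    for supported in evidence.values():
        for cell_type in set(supported):   # de-duplicate within a stream
            scores[cell_type] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Placeholder inputs for illustration only.
example = {
    "single_cell_differential": ["type_A", "type_B"],
    "ECLIPSER": ["type_B"],
    "seismicGWAS": [],
    "CELLECT": ["type_A", "type_B", "type_C"],
}
ranking = prioritize_cell_types(example)
```

Because every stream carries the same weight, the maximum attainable score is 4, and a cell type supported by only two streams can still rank first when no other type gathers more support.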
2.15. Weighted Gene Co-Expression Network Analysis of Prioritized Cell Types to Identify Core Module Genes
The high-dimensional Weighted Gene Co-expression Network Analysis (hdWGCNA) method [22] was applied to perform weighted gene co-expression network analysis on the cell type identified through multi-dimensional evidence integration. Annotated single-cell Seurat objects were loaded, and the target cell type was specified for analysis. The WGCNA function was used to initialize the analysis environment, selecting genes expressed in at least 5% of cells as the candidate gene set. Metacells (k = 25) were constructed using the MetacellsByGroups function based on cell type and sample origin to reduce single-cell data sparsity while preserving biological variability. Metacell expression data were then normalized and scaled. Following PCA-based dimensionality reduction, the Harmony algorithm was applied to correct for batch effects arising from sample origin. For network construction, the SetDatExpr function was used to extract the expression matrix of the target cell type, and the TestSoftPowers function was employed to test soft-thresholding powers. An appropriate soft threshold was selected to construct a signed topological overlap matrix (TOM) network. Co-expression modules were identified using a dynamic tree-cutting algorithm, and module eigengene values were calculated. Module–cell type associations were further analyzed.
2.16. eCAVIAR Colocalization Analysis for PSC and UC
To identify high-confidence genes and regulatory mechanisms (eQTLs/sQTLs) potentially mediating the association at common risk loci for PSC and UC, we employed the Bayesian colocalization method eCAVIAR, which assesses whether co-occurring GWAS and eQTL/sQTL signals tag the same causal variant or haplotype, thereby accounting for local LD and allelic heterogeneity [23]. eCAVIAR incorporates built-in fine-mapping functionality and can process large-scale GWAS summary statistics. A maximum of two independent causal variants per locus was assumed. Input data comprised Z-scores (effect size beta divided by standard error) for each variant from both the GWAS and GTEx eQTL/sQTL studies. The LD window around each lead variant was defined as the chromosomal region containing variants with r² > 0.1 (calculated using the 1000 Genomes Project Phase 3 as a reference panel), extended by 50,000 bp on each side. A colocalization posterior probability (CLPP) exceeding 0.01 was considered significant.
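The input preparation described here (Z-scores, the LD-defined window, and the CLPP cutoff) can be sketched as follows. This covers only the bookkeeping around eCAVIAR, not the colocalization model; the function names, tuple layout, and clamping of the window start at position 1 are assumptions for illustration.

```python
def z_score(beta, se):
    """Z-score as effect size divided by its standard error."""
    return beta / se


def ld_window(lead_pos, variants, min_r2=0.1, flank_bp=50_000):
    """Window spanning all variants with r^2 > min_r2 to the lead, plus flanks.

    variants: iterable of (position, r2_with_lead) on the lead's chromosome.
    Returns (start, end) in base pairs; start is clamped at 1 (assumed).
    """
    positions = [pos for pos, r2 in variants if r2 > min_r2]
    positions.append(lead_pos)             # the lead always lies in its window
    start = max(min(positions) - flank_bp, 1)
    end = max(positions) + flank_bp
    return start, end


def is_colocalized(clpp, threshold=0.01):
    """Apply the CLPP > 0.01 significance rule stated in the text."""
    return clpp > threshold


# Placeholder positions: the variant at 1,200,000 has r^2 = 0.05 and is ignored.
window = ld_window(
    1_000_000,
    [(950_000, 0.40), (1_020_000, 0.15), (1_200_000, 0.05)],
)
```

Defining the window from LD rather than a fixed distance lets its width adapt to the local haplotype structure, while the 50 kb flanks guard against truncating the signal at the window edge.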
2.17. fastENLOC Colocalization Analysis for PSC and UC
To further refine the identification of genes and regulatory mechanisms mediating the association at common risk loci, we employed the Bayesian colocalization method fastENLOC [24]. This method uses its embedded DAP-G algorithm to perform fine-mapping on both GWAS and eQTL/sQTL loci separately, estimating the posterior probability that each variant is causal and then evaluating the colocalization probability that the two signals share the same causal variant. Importantly, fastENLOC does not impose an upper limit on the number of independent causal variants per locus. Input data comprised Z-scores for each variant from the GWAS and GTEx eQTL/sQTL summary data. The LD window for each lead variant was defined as described above for eCAVIAR. Colocalization results were expressed as the regional colocalization probability (RCP). Following the method’s recommendations, an RCP exceeding 0.1 was considered evidence of significant colocalization.
2.18. Open Chromatin to Gene Expression Analysis
The Open4Gene analysis aimed to explore genes expressed in immune cells from normal tissues and to determine whether stably expressed genes in immune cells exhibit potential associations with specific cell types identified in the PSC and UC single-cell transcriptomic atlases. Open4Gene is a Hurdle model-based statistical method specifically designed to handle the zero-inflated characteristics common in single-cell data. The analytical pipeline consisted of several key steps: scRNA-seq and assay for transposase-accessible chromatin with sequencing (ATAC-seq) data were first normalized, dimensionally reduced, and clustered using the Seurat and Signac packages. RNA expression matrices, ATAC peak matrices, and cell metadata (including cell-type annotations and technical covariates) were extracted from the Seurat objects. ATAC peaks were then linked to gene promoter regions using a window size of 100 kb to define peak–gene pairs. A two-component Hurdle model was employed to test the association of each peak–gene pair: the zero component used a binomial distribution (logit link function) to model the relationship between ATAC open-chromatin status and the probability of zero gene expression; the count component used a truncated negative binomial distribution (log link function) to model the relationship between ATAC signal intensity and non-zero gene expression levels [25].
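Written out, the two-component Hurdle model described above takes roughly the following form. The notation is introduced here for illustration and is not taken from the Open4Gene publication: Y_gc is the expression count of gene g in cell c, A_pc is the accessibility of peak p in that cell, x_c collects the cell-level covariates, and θ is the negative binomial dispersion.

```latex
\begin{aligned}
\operatorname{logit}\,\Pr(Y_{gc} = 0)
  &= \alpha_0 + \alpha_1 A_{pc} + \boldsymbol{\gamma}^{\top}\mathbf{x}_c, \\
Y_{gc} \mid Y_{gc} > 0
  &\sim \operatorname{TruncNB}(\mu_{gc}, \theta), \qquad
\log \mu_{gc} = \beta_0 + \beta_1 A_{pc} + \boldsymbol{\delta}^{\top}\mathbf{x}_c .
\end{aligned}
```

Under this sketch, a peak–gene association corresponds to α₁ ≠ 0 or β₁ ≠ 0. Some implementations parameterize the zero component as the probability of non-zero expression instead, which only flips the sign of the coefficients in the first equation.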
2.19. Identification of Candidate Genes Using scPrediXcan
scPrediXcan [26] is a cell-type-specific association analysis framework that integrates state-of-the-art deep learning methods with a conventional transcriptome-wide association study (TWAS) framework. The method predicts epigenomic features from DNA sequences, enabling high-precision prediction of cell-type-specific gene expression and capturing complex gene regulatory syntax overlooked by linear models. The scPrediXcan framework consists of three core steps: first, a ctPred model is constructed using a multi-layer perceptron (MLP) to predict gene expression percentiles based on epigenomic data and observed cell-type-specific gene expression. Second, the deep learning model is linearized into an SNP-based elastic net model (ℓ-ctPred) compatible with GWAS summary statistic-based association testing. Finally, ℓ-ctPred and the S-PrediXcan framework are used to execute TWAS at the cell-type level, testing gene–trait associations across different cell types.
2.20. Genomic Risk Locus Analysis for PSC and UC
Genomic risk locus identification and annotation were performed using the FUMA platform (https://fuma.ctglab.nl/snp2gene, accessed on 18 February 2026). GWAS summary files for PSC and UC, containing SNP identifiers and LD reference information, were uploaded. The FUMA platform performed initial quality control to remove missing values and low-quality SNPs. A genome-wide significance threshold of p < 5 × 10⁻⁸ was applied to identify risk loci associated with PSC and UC.
2.21. Conditional Analysis of Genomic Risk Loci
Conditional analysis was performed on the high-confidence risk loci identified for PSC and UC to investigate whether independent secondary association signals existed within the significantly colocalized GWAS signal regions. Specifically, the lead variant at each locus was used as a conditioning variable, and conditional association analysis was performed on the GWAS summary statistics using the COJO tool in the GCTA software (v1.94.1) [27]. Only variants with MAF < 0.0001 were removed beforehand; this lenient threshold was chosen so that low-frequency lead variants were retained. Variant allele frequencies required for the analysis were obtained from the 1000 Genomes Project.
2.22. Prioritized Gene Expression Annotation and Exon-Level Expression Analysis
Exon-level expression data from the GTEx portal (https://gtexportal.org/, accessed on 22 February 2026) were used to annotate prioritized genes, enabling analysis of transcript isoform diversity, structural features of each transcript, and tissue-specific expression distributions across human tissues.
4. Discussion
By constructing a systematic multi-omics integrative analytical framework, this study provided an in-depth dissection of the shared immunogenetic mechanisms underlying PSC–UC comorbidity across tissue, cellular, and gene levels. The core findings can be summarized at three levels: (1) the intestine and immune-related tissues represent commonly enriched tissues for the genetic signals of both PSC and UC; (2) NK cells constitute the core immune effector cell type driving comorbidity between the two diseases; and (3) STAT3 serves as a key hub gene connecting the immunopathological mechanisms of PSC and UC, with its locus harboring multiple independent functional variants.
4.1. The Intestine and Immune-Related Tissues: The Common Tissue-Level Foundation of PSC–UC Comorbidity
QTLEnrich analysis confirmed widespread tissue-specific enrichment of eQTLs and sQTLs among the GWAS association signals for both PSC and UC, with intestinal tissues showing significant enrichment for both diseases. MAGMA tissue enrichment analysis further revealed that PSC genetic signals were highly enriched in EBV-transformed lymphocytes, while UC signals were significantly enriched in the small intestine terminal ileum, spleen, and whole blood. These results collectively point to the immune and digestive systems as the shared genetic foundation of PSC and UC. The enrichment of PSC genetic signals in lymphocytes aligns with its nature as an immune-mediated disease [1], while the enrichment of UC signals in the spleen and whole blood further underscores the importance of systemic immune activation in IBD [4]. Using the gsMap algorithm to map GWAS signals onto the spatial transcriptomic atlas of E16.5 mouse embryos, we observed common spatial enrichment of PSC and UC genetic signals in the gastrointestinal tract. This finding provides spatial-dimensional genetic support for the “gut–liver axis” hypothesis from a developmental biology perspective, suggesting that the comorbid relationship between PSC and UC may be rooted in shared gene expression regulatory programs during digestive system development.
4.2. NK Cells: The Core Immune Effector Cell Type in PSC–UC Comorbidity
A notable finding of this study is the identification of NK cells as the core effector cell type underlying PSC–UC comorbidity through multi-dimensional evidence integration. It is important to formally address why seismicGWAS and ECLIPSER did not independently identify NK cells as a statistically significant cell type, despite their prioritization by CELLECT and single-cell differential analysis. Several methodological factors likely contribute to this discrepancy. First, seismicGWAS computes a cell-type specificity score that rewards genes expressed at consistently higher levels across all cells within a given cell type; NK cells, which represent a relatively rare population in the datasets analyzed, may exhibit higher transcriptional heterogeneity, thereby reducing the specificity score despite their biological relevance. Second, ECLIPSER relies on the enrichment of GWAS risk loci within cell-type-specific differentially expressed genes identified from the single-cell atlases used in this study; given the modest sample sizes of the PSC and UC single-cell datasets, the statistical power to identify cell-type-specific differential expression patterns may be insufficient to drive ECLIPSER enrichment to significance. Third, the CELLECT framework employs the Tabula Muris reference, which encompasses a broader range of cell types and may better capture the heritability contribution of NK cells through its gene set-based heritability enrichment approach, which is less sensitive to single-cell dataset sample size. Taken together, the failure of seismicGWAS and ECLIPSER to reach significance should be interpreted in the context of method-specific sensitivity, dataset-specific limitations, and the inherent challenge of detecting enrichment for rare cell types in small single-cell cohorts.
NK cells are core effector cells of the innate immune system, and recent studies have amply demonstrated their functional diversity across different tissues [29]. NK cells exhibit remarkable phenotypic and functional heterogeneity in different tissue microenvironments, with their tissue-resident properties and local immunoregulatory functions far exceeding classical understanding [29,30]. In the liver, NK cells account for a substantial proportion of hepatic lymphocytes and participate in diverse functions including anti-infection defense, immune surveillance, and immunoregulation [29]. In the pathological context of PSC, activated NK cells can secrete pro-inflammatory cytokines such as interferon-γ (IFN-γ) and tumor necrosis factor-α (TNF-α), directly injuring biliary epithelial cells or promoting periductal inflammatory reactions [1]. In the intestine, dysfunction of NK cells and innate lymphoid cells has been established to be closely linked to the pathogenesis of IBD [4,30].
Notably, our findings position NK cells at the intersection of the PSC–UC “gut–liver axis.” According to the gut–liver axis hypothesis, gut-derived immune cells can migrate to the liver via the portal venous circulation [5]. As innate immune cells with high migratory capacity, NK cells may serve as the critical “messenger” cells that relay intestinal inflammatory signals to the liver. NK cells activated within the intestinal inflammatory microenvironment may enter the liver through the portal vein and exert pro-inflammatory and pro-fibrotic effects in the periductal region, thereby establishing a cellular-level link between the intestinal inflammation of UC and the bile duct injury of PSC. This hypothesis provides a novel immune cell-based mechanistic framework for explaining the high comorbidity rate between PSC and UC.
4.3. STAT3: The Hub Gene Linking the Immunopathological Mechanisms of PSC and UC
Through cross-validation by five independent analytical algorithms (eCAVIAR colocalization, fastENLOC colocalization, Open4Gene chromatin accessibility analysis, hdWGCNA co-expression network analysis, and scPrediXcan cell-type-specific TWAS), STAT3 was identified as the sole high-confidence key gene. STAT3 (signal transducer and activator of transcription 3) is a central effector molecule of the JAK–STAT signaling pathway [31] and plays an indispensable role in immune cell differentiation, activation, survival, and inflammatory response regulation [32]. In NK cell biology, STAT3 is a key downstream mediator of cytokine signaling from interleukin-21 (IL-21) and interleukin-15 (IL-15), among others, regulating NK cell maturation, activation, and cytotoxic function [31,32].
In the pathological context of PSC, sustained
STAT3 activation can drive biliary epithelial cell proliferation and inflammatory mediator secretion, further recruiting immune cells to the periductal region and establishing a self-perpetuating inflammatory positive-feedback loop [
1]. In UC, hyperactivation of the IL-6/
STAT3 signaling pathway promotes Th17 cell differentiation, suppresses regulatory T cell function, and drives the chronicity of intestinal mucosal inflammation [
4,
33]. Thus, as a shared key gene for PSC and UC,
STAT3 may simultaneously drive hepatic and intestinal immunopathological processes through aberrant activation in NK cells and other immune cells, offering a possible molecular-level explanation for the comorbid relationship between the two diseases.
Exon-level analysis further suggested a potential association between tissue-differential expression of the
STAT3 gene and alternative splicing events. While isoform-specific quantification was not performed in the present study, the exon-level expression heterogeneity across tissues raises the intriguing hypothesis that tissue-specific splicing regulation may contribute to the differential functional effects of
STAT3 in the liver versus the colon. This possibility merits formal investigation using sQTL-based isoform analyses in future work.
STAT3 has at least two major splice isoforms—
STAT3α and
STAT3β—which differ significantly in function [
34,
35,
36].
STAT3α possesses a complete transactivation domain and primarily exerts pro-inflammatory and pro-survival effects, whereas
STAT3β, truncated at the C-terminus, displays certain inhibitory functions. The exon-level expression differences observed in the exon 1–4, 5–7, 14–19, and 21–25 regions are consistent with the possibility that tissue-specific splicing patterns of
STAT3 may contribute to its differential functional effects in the liver and colon. The high expression of
STAT3 in EBV-transformed lymphocytes is consistent with the MAGMA tissue enrichment results, further supporting the central role of
STAT3 in immune cell function.
4.4. Genetic Architecture Complexity of the STAT3 Locus
Colocalization analysis identified the high-risk variant rs3736161 within the
STAT3 locus on chromosome 17. This SNP is located directly within the
STAT3 gene region, suggesting that this genetic variant may influence PSC and UC disease risk by directly regulating
STAT3 expression or splicing patterns. More importantly, GCTA-COJO conditional analysis revealed 35 independent association signals within this region in addition to the lead SNP rs3736161 (including rs1053004). This allelic heterogeneity suggests that the
STAT3 locus harbors multiple independent risk variants, which may contribute to disease risk through different molecular mechanisms [
6]. The clinical implication is that therapeutic strategies targeting the
STAT3 pathway may need to account for inter-individual variation in genetic background at this locus.
4.5. Clinical Translational Implications
Our findings offer several insights for the clinical management of PSC–UC comorbidity. First, the identification of NK cells as core effector cells suggests that monitoring the number and functional status of NK cells in peripheral blood or tissue may aid in assessing disease activity and progression risk in patients with PSC–UC comorbidity. Second, the identification of
STAT3 as a key hub gene provides a theoretical basis for targeted therapeutic strategies. Multiple inhibitors acting on the JAK–STAT pathway upstream of
STAT3 (including the JAK inhibitors ruxolitinib, tofacitinib, filgotinib, and upadacitinib) have entered clinical use or trial stages [
32,
33,
34]. Notably, JAK inhibitors such as upadacitinib have demonstrated significant clinical benefits in the treatment of UC [
37], and their efficacy may be partly mediated through the suppression of excessive
STAT3 activation in NK cells. This hypothesis warrants further investigation in clinical studies of patients with PSC–UC comorbidity and provides a theoretical rationale for exploring the potential therapeutic value of JAK inhibitors in PSC.
4.6. Limitations
Several limitations of this study should be addressed in future work. First, the GWAS data were derived primarily from European-ancestry cohorts, and the generalizability of the findings to other ancestries requires further validation. Second, the relatively limited sample size of the single-cell transcriptomic data may have affected the statistical power of certain analyses; larger single-cell cohorts are needed for validation. Third, this study was primarily based on computational biology analyses; although cross-validation by multiple algorithms enhanced the reliability of the results, the specific molecular mechanisms by which STAT3 mediates PSC–UC comorbidity in NK cells require verification through in vitro functional experiments and animal models. Fourth, the single-cell multiome data used for Open4Gene analysis were derived from healthy donor PBMCs and may not directly reflect disease-state chromatin accessibility changes. Fifth, the comorbid relationship between PSC and UC may also involve other dimensions such as the gut microbiome and metabolome, which were not addressed in this study. Sixth, no formal genome-wide genetic correlation analysis between PSC and UC was performed in the present study; incorporating such cross-trait analyses in future work would provide a direct quantitative estimate of shared genetic architecture to complement the multi-omics integrative findings reported here. Future research should integrate multi-dimensional omics data, validate the present findings in larger multi-center, multi-ancestry cohorts, and elucidate the precise molecular mechanisms of the STAT3/NK cell axis in gut–liver immune crosstalk through functional experiments.
5. Conclusions
By constructing a multi-layered integrative analytical framework spanning GWAS to single-cell transcriptomics, this study systematically examined the shared immunogenetic mechanisms underlying PSC–UC comorbidity. The results indicate that (1) the intestine and immune-related tissues are key tissues commonly enriched for the genetic signals of PSC and UC, supporting the important role of the “gut–liver axis” in comorbidity pathogenesis; (2) through multi-dimensional evidence integration across four complementary methods, NK cells were consistently identified as the core immune effector cell type in PSC–UC comorbidity, pointing to a pivotal role of the innate immune system in gut–liver immune crosstalk; (3) STAT3 was cross-validated by five independent algorithms as the sole high-confidence key comorbidity gene, and its broad expression in NK cells and multiple immune cell types, together with its tissue-differential exon-level expression patterns, suggests that STAT3 may simultaneously regulate hepatic and intestinal immunopathological processes through different molecular mechanisms; and (4) multiple independent association signals, represented by rs3736161, exist within the STAT3 locus, suggesting a complex genetic regulatory architecture at this site. In summary, to our knowledge, this study is the first to identify the NK cell–STAT3 axis as a central molecular nexus linking the immunopathological mechanisms of PSC and UC from a multi-omics integrative perspective. These findings provide novel genetic evidence for understanding the gut–liver immune axis in comorbidity and lay a theoretical foundation for developing therapeutic strategies targeting the STAT3/NK cell axis.