The Genetics of Primary Biliary Cholangitis: A GWAS and Post-GWAS Update

Primary biliary cholangitis (PBC) is a chronic, progressive cholestatic liver disease in which the small intrahepatic bile ducts are destroyed by autoimmune reactions. Among autoimmune diseases, which are polygenic complex traits caused by the combined contribution of genetic and environmental factors, PBC exhibits the strongest involvement of genetic heritability in disease development. As at December 2022, genome-wide association studies (GWASs) and associated meta-analyses identified approximately 70 PBC susceptibility gene loci in various populations, including those of European and East Asian descent. However, the molecular mechanisms through which these susceptibility loci affect the pathogenesis of PBC are not fully understood. This study provides an overview of current data regarding the genetic factors of PBC as well as post-GWAS approaches to identifying primary functional variants and effector genes in disease-susceptibility loci. Possible mechanisms of these genetic factors in the development of PBC are also discussed, focusing on four major disease pathways identified by in silico gene set analyses, namely, (1) antigen presentation by human leukocyte antigens, (2) interleukin-12-related pathways, (3) cellular responses to tumor necrosis factor, and (4) B cell activation, maturation, and differentiation pathways.


Introduction
Primary biliary cholangitis (PBC) is a chronic cholestatic liver disease characterized by the destruction of the intrahepatic bile ducts, portal hypertension, the development of biliary cirrhosis, and ultimately, hepatic failure. Histologically, PBC is characterized by chronic, non-suppurative destructive cholangitis, ductopenia, interface hepatitis, and fibrosis [1][2][3][4]. Although recent studies have reported a good prognosis for the majority of PBC patients treated with ursodeoxycholic acid, obeticholic acid, and bezafibrate, approximately 10% to 20% of patients do not respond to these agents, eventually progressing to end-stage hepatic failure [5]. Therefore, novel therapies are needed for these treatment-resistant cases.
The destruction of bile ducts is mediated by autoimmune responses, including both adaptive immune responses (including CD4 + T cells, CD8 + T cells, and B cells) and innate immune responses (including natural killer [NK] cells), against biliary epithelial cells (BECs) [4,6,7]. Therefore, PBC is considered an organ-specific autoimmune disease. PBC also has the following features in common with autoimmune diseases. First, PBC exhibits a

Human Leukocyte Antigen (HLA)
Class II HLA are signal transduction molecules that present specific antigenic peptides to T cell antigen receptors expressed on CD4 + helper T cells. The HLA class II locus on chromosome 6 includes groups of genes encoding HLA class II proteins that are strongly associated with susceptibility to various autoimmune or immune-related diseases [17]. As HLA loci are highly polymorphic due to natural selection driven by a wide range of pathogens [18], accurate detection of disease-susceptibility loci can be challenging. For example, deviations from the Hardy-Weinberg equilibrium in case-control association studies for each single-nucleotide polymorphism (SNP) in the HLA-DRB locus are caused by gene copy number variations. Therefore, to identify disease-susceptibility alleles in HLA loci, it is necessary to perform HLA allele typing using approaches such as the Lu-minex sequence-specific oligo method or the next-generation sequencing-based HLA allele typing method.
Recently, tools have been established that can predict HLA alleles from existing GWAS data [30,31]. Furthermore, next-generation sequencer HLA typing methods have also been developed [30]. These approaches will help elucidate the contribution of HLA genes to PBC on a larger scale.
Generally, a GWAS aims to comprehensively search for genetic factors that contribute to the onset of a disease. However, by using the same methodology, it is also possible to comprehensively explore genetic factors associated with disease severity, response to treatment, and side effects. The IL28B and ITPA genes are known to be strongly associated with response to treatment and side effects of pegylated interferon-α and ribavirin therapy for chronic hepatitis C [46][47][48][49]. Recently, genetic loci near L2TFL1, FOXP4, TMEM65, OAS1, KANSL1, DPP9, RAVER1, and IFNAR2 were reported as susceptibility loci for severity to coronavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [50]. As for PBC, CTSZ/NELFCD, which has not been reported as a PBC-susceptibility locus, was associated with the progression of PBC by comparing an advanced jaundice/liver failure group with a non-advanced, early-stage group of Japanese PBC patients [51].
European [44] 16q24.1 IRF8 rs11117432 2.82 × 10 −24 --European [35] Asian [45] a The Lowest p-value in the international meta-analysis (Mells et al, 2021), b e-QTL genes with GWAS-lead SNPs for liver and immunological organs (whole blood and spleen) registered in GTEX portal. c e-QTL genes with GWAS-lead SNPs registered in ImmuNexUT (p < 0.0001). d PBC susceptibility locus detected by the Regional Heritability Mapping (RHM) method. The genes shown in red are e-QTL genes that do not match the nearest gene.
GWAS data can also be used to examine the effects of multiple variants that are located within a chromosomal unit region but contribute only marginally to pathogenesis. For example, STAT4, ULK4, and KCNH5 were identified as novel Japanese PBC susceptibility loci in a study utilizing regional gene mapping, a method used to detect polygenic effects [52] ( Table 1).

Specific and Non-Specific PBC Susceptibility Genes
Over 80% of identified PBC susceptibility gene loci overlap with loci associated with other autoimmune diseases, such as Crohn's disease, ulcerative colitis, multiple sclerosis, systemic lupus erythematosus, autoimmune thyroid disease, rheumatoid arthritis, psoriasis, ankylosing spondylitis, systemic sclerosis, celiac disease, type 1 diabetes, Sjogren's syndrome, vitiligo, and Bechet's disease. Conversely, PBC susceptibility loci near TMEM163, TET2, ITGB8, ZC3HAV1L, RIN3, EXOC3L4, DPEP2, and SPIB are specific for PBC and not associated with other autoimmune diseases [16,53]. Under the genetic background in which loci are shared among various autoimmune diseases, it is assumed that the development of PBC is caused by PBC-specific genetic and environmental factors. Further cross-disease analyses are warranted to confirm this hypothesis.

Identification of Primary Functional (Causal) Variants and Effector Genes
Within a given disease-susceptibility gene locus, the variant exhibiting the strongest association with disease susceptibility is known as the "GWAS lead variant." The GWAS lead variant in each disease susceptibility locus often does not contribute per se to disease susceptibility. The lowest p-value for the association with disease susceptibility is frequently affected by LD with a "primary functional variant" that directly contributes to disease development. In other words, the GWAS is a means of identifying the "mapped gene," which is the gene nearest to the GWAS lead variant. Therefore, it is necessary to identify the primary functional variant itself from among many variants in the disease-susceptibility gene locus.
Most of the primary functional variants are located in untranslated regions (UTRs) or expression regulatory regions, such as gene expression promoters and enhancers located in the intronic or intergenic region, and they do not alter the amino acid sequence of the relevant gene product. Primary functional variants usually control the expression of the mapped gene in which they are located. However, as DNA adopts a higher-dimension structure during transcription that associates with physically distant gene regions, primary functional variants often control the expression of other genes that can be more than several hundred kilobases apart (Figure 1). In this way, the gene regulated by the primary functional variant (i.e., the effector gene) sometimes does not match the mapped gene (genes shown in red in Table 1), as determined by analyses of public expression quantitative trait loci (e-QTL) databases, including GTEx [54] and ImmuNexUT [55]. Therefore, inferring an association with pathogenesis only from the mapped gene may lead to misinterpretation of disease pathways. Figure 1. Example of the variant that regulates the expression of genes, separated from the primary functional variant by several hundred kilobases. As DNA adopts a higher-dimensional structure during transcription that associates with physically distant gene regions, primary functional variants often control the expression of other genes that are separated by several hundred kilobases (i.e., Gene A) from the mapped gene (i.e., Gene B).
To date, we have identified 11 primary functional variants in 60 PBC susceptibility gene loci (Table 1). In order to identify primary functional variants from many variants that exhibit higher LD with GWAS lead SNPs, we selected candidate variants by in silico analysis. Gene expression-regulating variants are located within DNA sequences that show higher scores for both histone marks (H3K27Ac, H3K4Me1, and H3K4Me3) and histone loosening (DNase high-sensitivity sites and peak of ATAC-seq) in promoters, enhancers, and insulators. Furthermore, differences between major and minor alleles in transcription factor binding to the variant, in addition to significant e-QTL, are also characteristics of primary functional variants. Variants that regulate mRNA structure and translation are located in the 3 -and 5 -UTRs, respectively. Splicing-regulating variants are located within the splice site, exonic splicing enhancers, exonic splicing silencers, intronic splicing enhancers, or intronic splicing silencers. Candidate variants can be efficiently narrowed down by thoroughly analyzing each variant. Moreover, for the definitive identification of primary functional variants, in vitro functional evaluation of candidates selected via in silico analysis should be performed.
Using the allele knock-in version of a cell line constructed using genome-editing techniques such as CRISPR/Cas9, the endogenous effect of a primary functional variant can be evaluated without interference from other variants that show high LD. These analyses can also be used to elucidate the mechanism of pathogenesis. Among primary functional variants in each PBC susceptibility locus, approximately 40% were found to be associated with the expression of each mapped gene (Table 1, genes shown in black among "e-QTL genes"). For example, rs4979462, rs17032850, rs227361, rs9459874, and rs1012656, located in TNFSF15, NFKB1, MANBA, FGFR1OP, and CCR6, respectively, were identified as primary functional variants that regulate the expression of each mapped gene [56][57][58]. Alternatively, approximately 40% of primary functional variants were associated with the expression of genes other than the mapped genes ( Table 1, genes shown in red among "e-QTL genes"). For example, rs12946510, rs2293370, and rs1944919, which are located near IKZF3, TIM-MDC1, and POU2AF1, respectively, regulate the expression of ORMDL3, POGLUT1, and COLCA1/COLCA2, respectively [42,59,60] (Table 1). Therefore, such discrepancies between mapped genes and e-QTL genes can be corroborated by in vitro functional evaluation using genome-editing techniques. This type of functional evaluation was applied to the variants that regulate splicing. CD28, which was reported as a susceptibility locus for lymphocyte and eosinophil counts, multiple sclerosis, ulcerative colitis, celiac disease, rheumatoid arthritis, asthma, and PBC, has a shared disease-related primary functional variant (i.e., rs2013278) that regulates the CD28 alternative splicing that generates loss-of-function isoforms (CD28i and CD28∆ex2) [61].

In Silico Gene Set Analysis
Compared with monogenic diseases that are caused by mutations in a single gene within a patient's DNA sequence, each GWAS lead variant in the disease-susceptibility gene locus will show a weak association with the risk of a polygenic complex disease. Although the HLA class II locus showed a relatively high OR, similarly to other autoimmune diseases, all other PBC susceptibility gene loci outside HLA showed an OR of less than 2.0. However, by analyzing disease-susceptibility gene loci in detail by gene set analysis, it is possible to obtain a more comprehensive understanding of disease pathways. In such cases, gene set analysis using only mapped genes may be insufficient, because primary functional variants can sometimes be associated with the expression of genes other than the mapped genes. Gerussi et al. summarized the PBC susceptibility genes as of 2021 and classified the contribution of candidate causal genes, as well as mapped genes, into interleukin (IL)-12-mediated signaling, the cellular response to tumor necrosis factor (TNF), and the activation, maturation, and differentiation of B cells [53].
In addition, functional enrichment analysis based on protein-protein interactions using STRING [62] was found to be very useful in further identifying disease pathways in detail. The following functional enrichment analysis using STRING on these three pathways highlighted the importance of using not only mapped genes, but also e-QTL genes.

IL-12-Mediated Signaling Pathway
Interleukin-12 induces interferon (IFN)-γ production. By the enrichment analysis using STRING, the "Interleukin-12-mediated signaling pathway (GO: 0035722)" in the Gene Ontology database includes mapped PBC susceptibility genes and e-QTL genes such as IL12A, IL12B, IL12RB2, STAT4, and TYK2 (shown as red nodes in Figure 2; p = 0.0034) [62]. In addition, "Regulation of interleukin-12 production (GO: 0032655)" includes genes such as IL12B, NFKB1, IRF5, and IRF8 (shown as green nodes in Figure 2; p = 0.0472) [62]. However, IRF5 was not a PBC susceptibility mapped gene, but an e-QTL gene. Therefore, "Regulation of interleukin-12 production (GO: 0032655)" in the Gene Ontology database did not reach a level of statistical significance when using PBC susceptibility genes alone. Figure 2. PBC susceptibility mapped genes and e-QTL genes whose protein products are involved in Gene Ontology (GO) pathways related to "IL-12-mediated signaling." Among 59 PBC susceptibility mapped genes and 60 e-QTL genes, those defined in the GO database as "Interleukin-12-mediated signaling pathway (GO: 0035722)" and "Regulation of interleukin-12 production (GO: 0032655)" are shown as red and green nodes, respectively. However, after removal of "the e-QTL gene but not the mapped gene" (gene name shown in red), "Regulation of interleukin-12 production (GO: 0032655)" in the Gene Ontology database did not reach a level of statistical significance when using PBC susceptibility genes alone. Death domain receptor 3 (DR3, also known as TNFRSF25), the receptor for TNF-like cytokine 1A (TL1A; encoded by the Asian-specific PBC susceptibility gene TNFSF15), promotes IFN-γ production through activated CD4 + T cells. This suggests that TL1A is positioned in the pathway, leading to IL-12 receptor (IL12R) stimulation and IFN-γ production [63]. Additionally, studies have reported the increased expression of IFN-γrelated genes in the liver of PBC patients [64], increased expression of IFN-α/IFN-β and IFN-γ in lesions of PBC liver [65], and the appearance of PBC-like liver lesions in female mice with chronic expression of IFN-γ [66]. These data also suggest the importance of the IL-12 signaling pathway.

Cellular Response to TNF
TNF-α itself plays a role in disrupting the barrier function of tight junctions in cholangiocytes without inducing major structural changes [67]. Additionally, each member of the TNF superfamily and TNF receptor (TNFR) superfamily plays a distinct role. By conducting enrichment analysis using STRING, the "Cellular response to tumor necrosis factor (GO: 0071356)" in the Gene Ontology database includes genes such as TNFSF8, TNFSF15, TNFSF11, TNFRSF1A, LTBR, NFKB1, PSMA6, SMPD3, and CD58 (shown as red nodes in Figure 3; p = 0.0041) [62]. However, TNFSF8, LTBR, PSMA6, and SMPD3 were not PBC susceptibility mapped genes, but e-QTL genes. Therefore, "Cellular response to tumor necrosis factor (GO: 0071356)" in the Gene Ontology database did not reach a level of statistical significance when using PBC susceptibility genes alone. Figure 3. PBC susceptibility mapped genes and e-QTL genes whose protein products are involved in GO pathways related to "Cellular response to tumor necrosis factor." Genes identified in the GO database as part of pathways associated with "Cellular response to tumor necrosis factor (GO: 0071356)" are shown as red nodes. However, after removal of "the e-QTL genes but not the mapped genes" (gene names shown in red), "Cellular response to tumor necrosis factor (GO: 0071356)" in the Gene Ontology database did not reach a level of statistical significance when using PBC susceptibility genes alone. TNFR superfamily member 1A (TNFRSF1A, also known as TNFR1) is a receptor for TNF-α. Activated TNFRSF1A initiates caspase-8-mediated apoptosis via receptorinteracting serine/threonine-protein kinase 3 (RIPK3) [68]. TNF superfamily member 8 (TNFSF8, also known as CD30L and CD153) inhibits class-switch DNA recombination and antibody production in human IgD + IgM + B cells by engaging with the ligand TNFR superfamily member 8 (TNFRSF8, also known as CD30) [69]. Although TNF superfamily member 11 (TNFSF11, also known as RANKL and CD254) plays a key role in the differentiation and activation of osteoclasts, the pathway including TNFSF11 in dendritic cells (DCs) promotes naïve T cell proliferation and DC survival [70]. This pathway also regulates lymph node organogenesis and lymphocyte differentiation [71]. The lymphotoxin β receptor (also known as TNFRSF3) is involved in the thymic homing of lymphoid progenitor and peripheral antigen-presenting cells, thymocyte trafficking and shedding, differentiation of medullary thymic epithelial cells, and the establishment of central tolerance [72]. TNF-α and related molecules play important roles in several immunologic reactions related to the pathogenesis of PBC.

Activation, Maturation, and Differentiation of B Cells
By the enrichment analysis using STRING, the "Regulation of B cell activation (GO: 0050864)" in the Gene Ontology database includes genes such as CD28, FCRL3, IKZF3, ID2, PPP2R3C, and NDFIP1 (shown as blue nodes in Figure 4; p = 0.0156). The "B cell activation (GO: 0042113)" in the Gene Ontology database includes genes such as IL7R, CXCR5, CCR6, IKZF3, FCRL1, and PLCL2 (shown as green nodes in Figure 4; p = 0.0383). Finally, the "B-lymphocyte cell line (BTO: 0001522)" in the Gene Ontology database includes genes such as HLA-DRB1, HLA-DQB1, WDFY4, SPIB, IL7R, and IL21R (shown as red nodes in Figure 4; p = 0.0139) [62]. However, ID2, PPP2R3C, FCRL1, and PLCL2 were not PBC susceptibility mapped genes, but e-QTL genes. Therefore, "Regulation of B cell activation (GO: 0050864)" and "B cell activation (GO: 0042113)" in the Gene Ontology database did not reach a level of statistical significance when using PBC susceptibility genes alone. . PBC susceptibility mapped genes and e-QTL genes whose protein products are involved in GO pathways related to "Activation, maturation, and differentiation of B cells." Genes identified in the GO database as part of pathways associated with "Regulation of B cell activation (GO: 0050864)," "B cell activation (GO: 0042113)," and "B-lymphocyte cell line (BTO: 0001522)" are shown as blue, green, and red nodes, respectively. However, after removal of "the e-QTL genes but not the mapped genes" (gene names shown in red), "Regulation of B cell activation (GO: 0050864)" and "B cell activation (GO: 0042113)" in the Gene Ontology database did not reach a level of statistical significance when using PBC susceptibility genes alone. POU class 2 homeobox associating factor 1 (POU2AF1) and its cofactor, Spi-B transcription factor (SPIB), are essential molecules in the differentiation and maturation of B cells into plasma cells [73]. IKAROS family zinc finger 3 (IKZF3, also known as aiolos) is a transcription factor essential for B cell differentiation into plasma cells and long-term B cell memory [74]. The C-X-C motif chemokine receptor 5 (CXCR5) is involved in the homing of follicular helper T cells (i.e., it is involved in affinity selection after T cell-B cell interactions) to germinal centers, and IL-21 is a growth factor for CXCR5-positive T cells [75,76]. Additionally, certain subgroups of helper T cells, such as CCR9 + CXCR5 + cells, express more IL-7 receptor (IL7R)-α and secrete a higher proportion of IL-21, thereby inducing greater IgG production than CCR9 − CXCR5 − helper T cells [75]. Fc receptor-like (FCRL) family members are preferentially expressed on B cells and exert immunomodulatory functions involving the immunoreceptor tyrosine-based inhibitory motif (ITIM) and immunoreceptor tyrosine-based activation motif. FCRL1 has two ITIMs in its cytoplasmic tail and enhances B cell antigen receptor (BCR) signaling. In contrast, other FCRL members, including FCRL3, have an ITIM in their cytoplasmic tail and inhibit BCR signaling [77,78]. These results strongly suggest that B cell activation, maturation, and differentiation pathways are actively involved in the development of PBC.

Integration of GWAS and Transcriptome Data
Based on gene ontology analyses, PBC susceptibility mapped genes and e-QTL genes contain a signal transduction system leading to the production of interferon-γ (IFN-γ), which is involved in immune and inflammatory reactions, and genes involved in B cell differentiation. These genes are assumed to play an important role in the development of PBC. In support of these data, an integrated pathway analysis of Japanese PBC susceptibility genes and analyses of PBC patient liver biopsy transcriptome data indicated that genes with characteristic expression patterns in PBC patients are primarily regulated by upstream factors such as IFNG and CD40L [64].

In Silico Drug Efficacy Screening
An in silico drug efficacy screening study using disease-susceptibility genes identified by GWAS has recently been reported for several diseases [79][80][81]. This method can identify the candidate molecular targets of drugs that show protein-protein interaction with the protein product of disease-susceptibility genes by using drug databases, the therapeutic target database, and the protein-protein interaction database. In addition to the drugs that are used for each disease, this method could potentially identify previously unexploited molecular targets of drugs. We identified immunotherapeutic drugs such as ustekinumab, abatacept, and denosumab; retinoids such as acitretin; and fibrates such as bezafibrate, as potential repurposed drug candidates for the treatment of PBC via in silico drug efficacy screening [39].

Conclusions
In this study, we summarized the PBC susceptibility genes reported as at December 2022 and discussed the latest post-GWAS approaches to identifying primary functional variants (causal variants) and effector genes. The understanding of the mechanisms by which these genetic factors mediate the pathogenesis of PBC has been expanded through the use of in silico gene set analyses, e-QTL and STRING database searching, and studies examining knock-in versions of cell lines constructed using genome-editing techniques such as CRISPR/Cas9. Recent advances in informatics and genetic statistics methodologies have led to new analytical methods that utilize GWAS data. For example, SNP imputation analysis can be used to infer the genotypes of variants not loaded on a GWAS array by combining with thousands of individual whole-genome sequencing data. High-density association mapping, which involves case-control association studies for all of the variants in the disease-susceptible gene regions, can detect variants with p-values less than those of the GWAS lead variants located in the same LD blocks. Protein kinase C β (PRKCB) and chromosome 3q13.33 (including ARHGAP31, TMEM39A, POGLUT1, TIMMDC1, and CD80) were identified as the susceptibility gene loci for PBC in the Japanese population using high-density association mapping [41,42].
SNP imputation analysis can also be used for meta-analysis combining GWAS data that were analyzed by different GWAS arrays. In a recent genome-wide meta-analysis (meta-GWAS) of five European and two East Asian cohorts, SNP imputation analysis was performed [44,45]. By performing such SNP imputation analysis, it was possible to evaluate the association of primary functional variants with disease susceptibility, the correlation of primary functional variants with e-QTL, and the LD of primary functional variants with other variants. Thus, SNP imputation analysis is indispensable for post-GWAS analysis.
Whole-genome sequencing analysis of individual PBC patients can aid in the identification of disease-causing rare variants that have not been identified by GWASs and subsequent SNP imputation analyses. At present, this analysis has succeeded in detecting rare pathological variants in many Mendelian diseases and some immune-related diseases, such as cold medicine-related Stevens-Johnson syndrome [82]. Whole-genome sequencing analysis will be potentially useful in the identification of disease-causing or disease-modifying rare variants of PBC.
Additionally, for more accurate post-GWAS analysis, protein-QTL analysis, an approach derived from a novel proteomics assay capable of measuring thousands of human protein analytes, enables accurate determination of correlations between variants and expression levels, without the uncertainty of e-QTL data that results from occasional non-correlations between mRNA and protein levels [83].
In the near future, re-analysis of existing GWAS data may yield important findings that have thus far been overlooked in the pathogenesis of multifactorial diseases such as PBC.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.