Trans-Ancestral Fine-Mapping and Epigenetic Annotation as Tools to Delineate Functionally Relevant Risk Alleles at IKZF1 and IKZF3 in Systemic Lupus Erythematosus

Background: Prioritizing tag-SNPs carried on extended risk haplotypes at susceptibility loci for common disease is a challenge. Methods: We utilized trans-ancestral exclusion mapping to reduce risk haplotypes at IKZF1 and IKZF3 identified in multiple ancestries from SLE GWAS and ImmunoChip datasets. We characterized functional annotation data across each risk haplotype from publicly available datasets including ENCODE, RoadMap Consortium, PC Hi-C data from 3D genome browser, NESDR NTR conditional eQTL database, GeneCards Genehancers and TF (transcription factor) binding sites from Haploregv4. Results: We refined the 60 kb associated haplotype upstream of IKZF1 to just 12 tag-SNPs tagging a 47.7 kb core risk haplotype. There was preferential enrichment of DNAse I hypersensitivity and H3K27ac modification across the 3′ end of the risk haplotype, with four tag-SNPs sharing allele-specific TF binding sites with promoter variants, which are eQTLs for IKZF1 in whole blood. At IKZF3, we refined a core risk haplotype of 101 kb (27 tag-SNPs) from an initial extended haplotype of 194 kb (282 tag-SNPs), which had widespread DNAse I hypersensitivity, H3K27ac modification and multiple allele-specific TF binding sites. Dimerization of Fox family TFs bound at the 3′ and promoter of IKZF3 may stabilize chromatin looping across the locus. Conclusions: We combined trans-ancestral exclusion mapping and epigenetic annotation to identify variants at both IKZF1 and IKZF3 with the highest likelihood of biological relevance. The approach will be of strong interest to other complex trait geneticists seeking to attribute biological relevance to risk alleles on extended risk haplotypes in their disease of interest.


Introduction
Systemic Lupus Erythematosus (SLE) is a complex autoimmune disease of unknown etiology. However, genome-wide association analysis of cohorts has proven to be a successful means of identifying novel susceptibility loci for lupus [1][2][3][4][5][6][7][8][9][10][11]. The 84 autosomal genetic risk factors identified in the largest of these Genome-wide association studies (GWAS) studies, in a Euro-Canadian cohort [12]) implicate many different gene families from diverse biochemical pathways. Dysregulation of these molecular pathways could have serious consequences for the function of multiple immune cell types. The Ikaros family of Kruppel zinc finger transcription factors is one such gene family. The importance of this gene family in SLE pathogenesis is evidenced by the associations (P meta < 5 × 10 −8 ) for three family members: IKZF1 (Ikaros) (rs2366293-C, rs4917014-T), IKZF3 (Aiolos) (rs2941509-T) and IKZF2 (Helios) (rs6435760-C) [12]. annotation and trans-ancestral fine mapping we seek to narrow down the core regions of association at IKZF1 and IKZF3 and define sets of candidate causal variants at each locus.

Defining the Risk Haplotype at IKZF1 in SLE
The strongest risk allele at IKZF1 (rs4917014-T) from our European SLE GWAS [12] is located 38.5 kb upstream of the TSS for IKZF1 (Pmeta < 5 × 10 −8 ). The variant lies within the proximal end of the risk haplotype in the control samples from this GWAS ( Figure A1A-C). This 60 kb risk haplotype (EUR_GWAS) ( Figure A1D), which carries a total of 186 variants (using boundary cut-off of r 2 > 0.75 with rs4917014) is bounded by rs1870027 and rs17552904 (chr7:50258234-50318308, hg19).
The association was replicated in a meta-analysis with two Chinese (ASN) GWAS [7,31,32]. In these Chinese datasets, rs4917014 is located on an overlapping, albeit slightly longer risk haplotype ASN_GWAS, comprising 198 variants over 65 kb, bounded by rs4598207 and rs6964608 (chr7:50258479-50324037, hg19) ( Figure A1C). There are no other associations outside these risk haplotypes in either the European or Chinese populations.
The trans-ancestral SLE ImmunoChip study [33] provided minimal additional information, because the gene-centric genotyping platform used for the study had sparse coverage of the IKZF1 risk haplotype. Only five of the variants on the risk haplotypes from the European/Chinese GWAS studies were included on the chip. However, the dataset revealed that the MAF of those five risk alleles were more similar in samples of European and Asian origin to those of African origin. There was association for all five variants in African Americans and European samples (Table A1). We cannot explore the association in African samples in more detail because there is currently no published SLE GWAS in samples of African origin.

Refining the IKZF1 Risk Haplotype Using the 1000 Genomes Super-Populations
We narrowed down the risk haplotype with a trans-ancestral mapping approach, using healthy individuals taken from the five superpopulations from the 1000G super-population data: AFR-African; AMR-Admixed American; EAS-East Asian; EUR-European and SAS-South Asian. The refined region around rs4917014 shared across ancestries, using an LD cut-off of r 2 > 0.75 with rs4917014, comprised 15 SNPs across only 47.7 kb, bounded by rs34767118 and rs876039 (chr7:50271064-50308811) (Figure 1). This region is most likely to harbor alleles of functional significance at IKZF1.  Figure A1) was used to refine the risk haplotype to 15 variants in tight LD (r 2 > 0.75) with rs4917014 over a distance of 47.7 kb upstream of the IKZF1 transcriptional start site.

Functional Annotation of IKZF1 Risk Alleles
Given the limited cell types used for the published protein expression data in SLE samples [26] and the fact that the authors did not select cells based on specific risk alleles at IKZF1, we employed several strategies to investigate the mechanisms by which risk alleles may impact IKZF1 expression levels. We used publicly available epigenetic data in a diverse set of immune cell types to search for  Figure A1) was used to refine the risk haplotype to 15 variants in tight LD (r 2 > 0.75) with rs4917014 over a distance of 47.7 kb upstream of the IKZF1 transcriptional start site.

Functional Annotation of IKZF1 Risk Alleles
Given the limited cell types used for the published protein expression data in SLE samples [26] and the fact that the authors did not select cells based on specific risk alleles at IKZF1, we employed several strategies to investigate the mechanisms by which risk alleles may impact IKZF1 expression levels. We used publicly available epigenetic data in a diverse set of immune cell types to search for enrichment of epigenetic signals which overlapped the risk alleles within the 47.7 kb IKZF1 risk haplotype and therefore more likely to have functional significance.

Determination of Chromatin Status
Alignment of the risk alleles upstream of IKZF1 revealed that only the seven SNPs on the risk haplotype lie within a predicted enhancer (orange) using the Combined Genome Segmentation data from ENCODE in LCLs ( Figure A1G). The remaining five variants were located within areas of heterochromatin (grey) or low activity (green). Taken together, these data suggest that the seven variants within the predicted enhancer region are more likely to be functionally active.

Chromatin Looping with Risk Alleles
The IKZF1 promoter is the hub of chromatin looping events at the locus. Analysis of Promoter Capture Hi-C data showed three interaction regions at IKZF1 (Figures 2 and A1F) [29]. These data revealed that the proximal promoter (chr7:50341186-50347256) (TSS) interacts with the 3 end of the enhancer region (chr7:50305428-50311993) (Enh) in multiple immune cell types ( Figure A2A). The Enh region contains a set of seven risk alleles. A second interaction between the TSS and a shorter sequence in intron 3 (chr7:50411807-50412756) (I3) did not involve the Enh region (data not shown). There was cell-type specificity in the Enh-TSS looping activities ( Figure A2A), with the strongest interaction (CHICAGO score > 11) seen in neutrophils, T and B lymphocytes. Each of the cell types which exhibited strong interaction also demonstrated higher than median IKZF1 expression for the human cells/tissues assessed by the GeneAtlas U133A microarray (BIOGPS) [34].

Determination of Chromatin Status
Alignment of the risk alleles upstream of IKZF1 revealed that only the seven SNPs on the risk haplotype lie within a predicted enhancer (orange) using the Combined Genome Segmentation data from ENCODE in LCLs ( Figure A1G). The remaining five variants were located within areas of heterochromatin (grey) or low activity (green). Taken together, these data suggest that the seven variants within the predicted enhancer region are more likely to be functionally active.

Chromatin Looping with Risk Alleles
The IKZF1 promoter is the hub of chromatin looping events at the locus. Analysis of Promoter Capture Hi-C data showed three interaction regions at IKZF1 (Figures 2 and A1F) [29]. These data revealed that the proximal promoter (chr7:50341186-50347256) (TSS) interacts with the 3′ end of the enhancer region (chr7:50305428-50311993) (Enh) in multiple immune cell types ( Figure A2A). The Enh region contains a set of seven risk alleles. A second interaction between the TSS and a shorter sequence in intron 3 (chr7:50411807-50412756) (I3) did not involve the Enh region (data not shown). There was cell-type specificity in the Enh-TSS looping activities ( Figure A2A), with the strongest interaction (CHICAGO score > 11) seen in neutrophils, T and B lymphocytes. Each of the cell types which exhibited strong interaction also demonstrated higher than median IKZF1 expression for the human cells/tissues assessed by the GeneAtlas U133A microarray (BIOGPS) [34].

Discovery of Allele-Specific Transcription Factor Binding Sites
We characterized the transcription factors which are predicted to show allele-specific differences in binding affinity (from Haploreg v4.1) to each of the 12 risk alleles defined by GWAS. Ten of these polymorphism are predicted to exhibit allele-specific binding of one or more TFs (Table A3). Five of the risk alleles within the PC Hi-C Enh region exhibit strong allele-specific binding affinity (>3 fold predicted change) for TFs which also bind to variants in the IKZF1 PC Hi-C TSS/promoter interaction region or the GeneHancer promoter region (Table 1). These five risk variants, through shared binding events have the greatest potential for genetic control of IKZF1 gene expression through chromatin looping events, leading to dimerization of the shared TF and increased regulatory activity on gene expression. Figure 4 summarizes the epigenetic landscape across IKZF1. The TFs predicted to show allele-specific binding (ASTF) lie within one of the CTCF regions within the upstream associated region and at one of the multiple EP300 binding sites across the locus. Both of these elements are characteristic of enhancer regions. There is also evidence of several epigenetic modifications across the region which commonly reside in active enhancers (H3K27ac), active regulatory elements/promoters (H3K9ac); promoter/TSS (H3K4me3) or are located in the gene body of CpG genes with higher expression (H3K4me1 and H3K4me2).

Identification of cis-eQTLs at IKZF1
None of the SLE risk alleles in the PC Hi-C Enh or TSS/Promoter regions are themselves cis-eQTLs for IKZF1 expression in whole blood from the GTEx2015_v6 data or from the NESDR NTR conditional eQTL database [37,38].
However, four of the ten risk variants predicted to exhibit allele-specific TF binding share the same TFs with other polymorphism in the promoter GH07J050293 interaction region, which are also cis-eQTLs for IKZF1 in whole blood in either the GTEx2015_v6(*) or the NESDA NTR conditional eQTL(#) databases (Table 1). These six promoter eQTLs are: rs11765436/rs7802443-RXRA-rs11185603; rs9886239-PU.1-rs11185603; rs11761922/rs7781977-BDP1-rs876038; rs10269380-Brachyury-rs876038 and rs7777365-FOXA-rs876039. It will be important to establish whether the TFs involved form a "bridge" to support the chromatin looping between the enhancer and promoter regions and whether there is a potential contribution of SLE risk alleles to control gene expression at IKZF1.

Extended IKZF3 Haplotype across Multiple Genes in European SLE GWAS Study
In our European GWAS [12] we identified a single associated haplotype at the IKZF3 locus which stretches from intron 19 of ERBB2 (rs903506), across IKZF3, ZPBP2, GSDMB and ORMDL3 into the upstream region of ORMDL3 (rs9303281) ( Figure A5A), a distance 194 kb (chr17:37879762-38074046). This European IKZF3 risk haplotype (EUR-IKZF3 haplotype) present at a frequency of 3% in Europeans, is tagged by the minor risk alleles of 282 variants with each of the five genes within the haplotype boundary containing multiple risk alleles. The peak association from conditional analyses is in the 3 UTR of IKZF3 (rs2941509). However, the tight LD across the locus in Europeans means that it is not possible to discriminate between any of the 282 tag SNPs as possessing functional significance. specific binding (ASTF) lie within one of the CTCF regions within the upstream associated region and at one of the multiple EP300 binding sites across the locus. Both of these elements are characteristic of enhancer regions. There is also evidence of several epigenetic modifications across the region which commonly reside in active enhancers (H3K27ac), active regulatory elements/promoters (H3K9ac); promoter/TSS (H3K4me3) or are located in the gene body of CpG genes with higher expression (H3K4me1 and H3K4me2). ChIP-Seq signal wiggle density graphs for chromatin marks from ENCODE/BROAD in GM12878 EBV-LCL cells for-H3K27ac (active enhancer region), H3K9ac (active regulatory elements/promoters), H3K4me1 (found in gene body of CpG genes with higher expression), H3K4me2 (found in gene body of CpG genes with higher expression) and H3K4me3 (associated with promoter/TSS). The vertical viewing range for each of these epigenetic tracks is set to viewing maximum at 50, to allow comparison of signal between each epigenetic modification.

Identification of cis-eQTLs at IKZF1
None of the SLE risk alleles in the PC Hi-C Enh or TSS/Promoter regions are themselves cis-eQTLs for IKZF1 expression in whole blood from the GTEx2015_v6 data or from the NESDR NTR conditional eQTL database [37,38].
However, four of the ten risk variants predicted to exhibit allele-specific TF binding share the same TFs with other polymorphism in the promoter GH07J050293 interaction region, which are also cis-eQTLs for IKZF1 in whole blood in either the GTEx2015_v6(*) or the NESDA NTR conditional eQTL(#) databases (Table 1). These six promoter eQTLs are: rs11765436/rs7802443-RXRA-rs11185603; rs9886239-PU.1-rs11185603; rs11761922/rs7781977-BDP1-rs876038; rs10269380-Brachyury-rs876038 and rs7777365-FOXA-rs876039. It will be important to establish whether the TFs involved form a "bridge" to support the chromatin looping between the enhancer and promoter regions and whether there is a potential contribution of SLE risk alleles to control gene expression at IKZF1.  Table 1. Panel B: Genomic architecture of IKZF1 and the location of the 15 upstream risk alleles. Panel C: Clusters of statistically significant enrichment (score range 200-1000) ChIP-Seq peaks for EP300 and CTCF (Transcription Factor ChIP-seq Uniform Peaks from ENCODE/Analysis) in GM12878 EBV-LCLs, aligned with the PC-Hi-C interaction intervals across IKZF3. Panel D: ChIP-Seq signal wiggle density graphs for chromatin marks from ENCODE/BROAD in GM12878 EBV-LCL cells for-H3K27ac (active enhancer region), H3K9ac (active regulatory elements/promoters), H3K4me1 (found in gene body of CpG genes with higher expression), H3K4me2 (found in gene body of CpG genes with higher expression) and H3K4me3 (associated with promoter/TSS). The vertical viewing range for each of these epigenetic tracks is set to viewing maximum at 50, to allow comparison of signal between each epigenetic modification.

Fine-Mapping the IKZF3 Risk Haplotype Using the 1000 Genomes Super-Populations
In an attempt to narrow down the region of the European risk haplotype to define the segment most likely to harbor alleles of functional significance, we adopted a trans-ancestral approach, which utilized the five 1000G super-population datasets, to discover the minimal risk haplotype shared between ancestries.
The frequency of the European risk haplotype in the EUR-GWAS (3%) and EUR 1000G samples (2.9%) is~6-fold less in the African AFR 1000G samples (12.5%), whereas in AMR individuals the frequency was marginally below (2.3%) that seen in EUR samples. In both Asian super-populations, the EUR-IKZF3 haplotype was present at <0.1%, so we did not include the two Asian super-populations in further trans-ancestral analyses.
The alignment of the haplotype blocks from AFR, EUR and AMR 1000G samples allowed us to identify a common shared haplotype block containing the rs2941509 risk variant, of 107 kb ( Figure A5B). In all three datasets, the 3 of this refined haplotype is at the 3 end of IKZF3, between the immediate 3 flanking region (within an IKZF1 ChIP-binding site from ENCODE in EBV-LCLs) (rs9674624) and the 3 UTR (rs3764354). The 5 boundary of the risk haplotype was defined using the AFR 1000G samples because in both EUR and AMR samples the 5 LD break is in the same place, upstream of ORMDL3 (rs112191651-rs4795405). However, in the AFR samples, the haplotype block is shorter, with the 5 boundary lying within an IKZF1 binding site in the IKZF3-ZPBP2 bi-directional promoter (rs4795397-rs12936231). Taken together, these results show that the AFR samples are a key discriminator in narrowing down the common shared haplotype. Using the 1000G data we have successfully reduced the length of the core IKZF3 risk haplotype by over 44% from 194 kb (EUR GWAS) to 107 kb (AFR 1000G)(chr17:37916823-38023745). We have also reduced the number of tag SNPs from 282 (EUR GWAS) to 152 (AFR 1000G) ( Figure A5B).
Using genotypes from 2452 AA healthy control samples on the ImmunoChip we further reduced the length of the risk haplotype block, at both the 5 and 3 ends, by a total of 6 kb compared to the same block in the AFR 1000 Genomes dataset ( Figure A5B). In a similar manner to our results in the AFR 1000G samples, the haplotype carrying the European risk alleles (EUR-IKZF3 haplotype) in the AA (African-American) ImmunoChip cohort was present at a higher frequency (~12%) than in European samples. However, in the HA (Hispanic-American) (n controls = 2016) ImmunoChip cohort, the haplotype carrying the European risk alleles was at a reduced frequency (2.5%) compared to the European GWAS haplotype ( Figure A5B), albeit it the same length, so would not add any further information in fine-mapping the European signal.
In summary, the LD break-points in both the AA ImmunoChip and AFR 1000G datasets allow us to massively reduce, by >47%, the IKZF3 risk haplotype first identified in the Euro-Canadian SLE GWAS, leading to a risk haplotype covering 101 kb (chr17:37920146-38021117), restricted to the coding region for IKZF3 and carrying only 140 European tag-SNPs.

Trans-Ancestral Exclusion Mapping of IKZF3 using the SLE ImmunoChip Data
We replicated the association signal at IKZF3 in a EA (European-American) SLE ImmunoChip cohort (n cases = 6748, n controls = 11,516), with a total of 93 tag-SNPs in LD with rs2941509 (OR rs2941509 = 1.27, CI 1.14-1.41) showing highly significant association (Table A4).
We used trans-ancestral exclusion mapping as a method of narrowing down the EUR-IKZF3 risk haplotype to variants with greater potential for biological significance, by excluding sets of variants based on the strength of association and MAF in two ancestries. Our analyses split the associated variants into two groups with 27 of the 93 tagging variants (Group 1) showing association with SLE (OR > 1.27) in the AA ImmunoChip cohort (n cases = 2970, n controls = 2452). The remaining 66 variants (Group 2) were not associated (OR < 1.14) with lupus in AA samples. None of the Group 1 or 2 variants were associated with other autoimmune diseases (from the GWAS Catalogue).
Furthermore, the Group 1 risk alleles (3.6% in EA samples) were much rarer in the AA ImmunoChip cohort (MAF < 0.1%). Conversely, for the Group 2 variants, the risk alleles from the EA study present at a higher MAF in the AA cohort (MAF >12%) compared with the EA samples. However, the increased frequency of the Group 2 variants did not lead to increased association in the AA population. Added to this, meta-analysis of the EA and AFR ImmunoChip datasets revealed that the OR of Group 2 SNPs was not increased by either a fixed effects (OR) or by random effects (OR(R)) model and we found high heterogeneity between the two ancestries (I > 50) (Table A4). These results led to the exclusion of 66 variants on the risk haplotype which included the lead SNP identified in our original GWAS study (rs2941509) [12]. Therefore, we focused our further functional annotation on the 27 Group 1 SNPs because they showed association in both populations and were more likely to harbor alleles of functional significance for lupus.
We employed a subsequent round of trans-ancestral exclusion mapping to split the remaining 27 group 1 variants into two sets, based on the degree of association in the AA cohort (Table A4, Figure 5). The 17 variants in Group 1A, which extend across the regulatory region of the gene (between the promoter region and I3), exhibited a stronger association (OR > 1.5) in the AA cohort compared to that seen in the EA population (OR > 1. 27). This is despite the meta-analysis of Class 1A variants only providing marginal improvement in association, because of the low MAF in the AA cohort for these SNPs (Table 2). Conversely, the nine SNPs in Group 1B, which lie within the coding region including all six Zinc Fingers (I3-E7), showed similar strength of association in the AA and EA samples, despite the radically reduced MAF for variants in the AA cohort. We will include both Group 1A and 1B variants in our functional annotation of IKZF3 but have greater confidence that the variants in Group 1A will have a better predictive ability of biological significance than those in Group 1B. cohort for these SNPs (Table 2). Conversely, the nine SNPs in Group 1B, which lie within the coding region including all six Zinc Fingers (I3-E7), showed similar strength of association in the AA and EA samples, despite the radically reduced MAF for variants in the AA cohort. We will include both Group 1A and 1B variants in our functional annotation of IKZF3 but have greater confidence that the variants in Group 1A will have a better predictive ability of biological significance than those in Group 1B. Figure 5. Trans-ancestral exclusion mapping to refine risk alleles at IKZF3. Location of the 93 European tag-SNPs carried on the 101 kb core risk haplotype across IKZF3 coded on the antisense strand, shared between healthy EA (European American) and AA (African American) individuals from the SLE ImmunoChip study. Trans-ancestral exclusion mapping led to the removal of 66 variants (Group 2) which had MAF > 12% but which were not associated (p > 0.01) in the AA samples. The remaining 27 variants (Group 1) showed stronger association in the AA samples, despite having MAF < 0.1%. This group of variants, were split into Group 1A (variants located in promoter-I3 regulatory region of the gene) and Group 1B (variants in the I3-E7 region covering the six Zinc Fingers). Group 1A variants were more strongly associated (OR > 1.5) than the Group 1B variants (OR > 1.27) in the AA cohort. Figure 5. Trans-ancestral exclusion mapping to refine risk alleles at IKZF3. Location of the 93 European tag-SNPs carried on the 101 kb core risk haplotype across IKZF3 coded on the antisense strand, shared between healthy EA (European American) and AA (African American) individuals from the SLE ImmunoChip study. Trans-ancestral exclusion mapping led to the removal of 66 variants (Group 2) which had MAF > 12% but which were not associated (p > 0.01) in the AA samples. The remaining 27 variants (Group 1) showed stronger association in the AA samples, despite having MAF < 0.1%. This group of variants, were split into Group 1A (variants located in promoter-I3 regulatory region of the gene) and Group 1B (variants in the I3-E7 region covering the six Zinc Fingers). Group 1A variants were more strongly associated (OR > 1.5) than the Group 1B variants (OR > 1.27) in the AA cohort.

Analysis of Expression Levels
As with IKZF1, none of the IKZF3 risk alleles are cis-eQTLs for IKZF3 in whole blood [37,38]. At IKZF3, this may reflect the lack of power in cis-eQTL analysis given the low MAF of the risk alleles (MAF = 0.03). However, at the protein level there is a significant increase of in the MFI detection of IKZF3 positive CD27 + IgD − switched memory (SwM) B cells and CD27 + IgD + double-positive non-switched memory (NSM) B cells in 10 SLE cases and 10 healthy controls, with moderate increases in the detection of MFI in CD27 − IgD − DN B cells and CD27 − IgD + mature naive B cells (naive) in the patients compared with the healthy controls [26].
Nevertheless, recognizing that the risk alleles at IKZF3 may exert their function through epigenetic mechanisms rather than direct transcriptional regulation and that this function may be cell-type and/or activation state specific, we looked for epigenetic mechanisms operating across the risk haplotype which may indicate that specific risk alleles may act in this way.
The strongest interactions were between the IKZF3-ZPBP2 promoter and the most 3 interaction fragment (3 E4-7) were in naïve CD4+ T cells, total CD4+ T cells, activated total CD4+ T cells, non-activated total CD4+ T cells, naïve CD8+ T cells, total CD8+ T cells, naïve B cells and total B cells (tB) (CHICAGO interaction score > 5.5) ( Figure A2B). This 3 E4-7 interaction region contains the four DNA binding zinc fingers (ZnF 1-4) and the first~8.4 kb, around the TSS, of a shorter IKZF3 isoform, implying the promoter-gene interaction may affect the expression of these two functional regions of the locus. Interactions of the promoter with all three coding fragments are greatest in lymphocytes, which reflects the predominant lymphocyte expression pattern of IKZF3. However, the 3 E4-7 interaction region does not contain the two dimerization Zinc Fingers (ZnF 5-6) ( Figure 6). The lack of direct interaction between the promoter and dimerization domains means that the risk alleles in the promoter region may only have an indirect interaction with variants in the dimerization domains (in E8) [29,39].  the locus. Interactions of the promoter with all three coding fragments are greatest in lymphocytes, which reflects the predominant lymphocyte expression pattern of IKZF3. However, the 3′ E4-7 interaction region does not contain the two dimerization Zinc Fingers (ZnF 5-6) ( Figure 6). The lack of direct interaction between the promoter and dimerization domains means that the risk alleles in the promoter region may only have an indirect interaction with variants in the dimerization domains (in E8) [29,39]. across IKZF3, taken from Pi-HiC data [29]. The strongest interactions (CHICAGO Score > 5.5) were seen in T and B lymphocytes: Naïve CD4+ T cells (nCD4),  Table A5). However, only one of the GeneHancer elements within IKZF3 undertakes chromatin looping with the major bi-directional IKZF3 promoter (GH17J039859). This element is the second promoter (GH17039839), located in intron 1, which contains the ribosomal protein L39 pseudogene 4 (interaction confidence score = 190) (data not shown). (GH17J039859) contains three Group 1 risk alleles but GH17039839 does not contain any risk alleles) (Table A5). Nevertheless, the bi-directional IKZF3 promoter (GH17J039859) interacts with GeneHancer element upstream of GSDMB and ORMDL3 (GH17J039916) (interaction confidence score = 652). GH17J039916 lies within the original 194 kb EUR associated LD region but not the 101 kb core risk haplotype.

Accessibility of the Chromatin across IKZF3
Extracting the Combined Genome Segmentation data from ENCODE in LCLs, revealed that the entire IKZF3 risk haplotype is within regions of open chromatin ( Figure 6). We also found that the Group 1 variants were preferentially enriched within the three PC Hi-C interaction regions (17 out of 27 SNPs) (Table A4), giving further evidence of potential biological function for these risk alleles. By contrast, although there are 12 GeneHancer regions across IKZF3, which contain 17 Group 1 variants, of the two GeneHancer (promoter) regions interacting at IKZF3 only one of these, the GH17J039859 primary promoter, contained risk alleles (Table A5). Figure 7 illustrates the enrichment of DNAseI hotspots at the DNA interaction regions across the whole of IKZF3 from the PC Hi-C or GeneHancer datasets. of the two GeneHancer (promoter) regions interacting at IKZF3 only one of these, the GH17J039859 primary promoter, contained risk alleles (Table A5). 2.7.4. Cell-Type Specificity in DNAse Sensitivity in the IKZF3 Interaction Regions Figure 7 illustrates the enrichment of DNAseI hotspots at the DNA interaction regions across the whole of IKZF3 from the PC Hi-C or GeneHancer datasets.  The hotspot signal for individual Group 1 risk alleles mirrors the locus-wide signal so that we can see signal enrichment (SignalValue > 2.5) in 14 Group 1 variants spread across the entire risk haplotype ( Figure A4B). The most convincing DNAseI hotspots (SignalValue > 5) were seen at Group 1 SNPs predominantly residing within the promoter (IKZF3-ZPBP2) and the 5 I3 regions (PC Hi-C experiments). In terms of cell type specificity, the hotspots in B cells are restricted to the promoter region but there is enhanced enrichment of hotspots seen in T cell types within the coding region, including at rs113370572 within the E4-7 interaction fragment. We therefore established that 26 of the Group 1 SNPs were in regions of open chromatin in lymphoblastoid cell lines LCLs ( Figure 6) and that there is a degree of cell-type specificity of DNAse1 HS ( Figure A4B).

Cell-Type Specificity in DNAse Sensitivity in the IKZF3 Interaction Regions
For each allele of the tag-SNPs on the core associated haplotypes for IKZF1 and IKZF3, we extracted the predicted allele-specific differences in binding affinity of transcription factor (taken from the ENCODE TF Binding experiments) from Haploreg v4.1. These differences were calculated as the change in log-odds (LOD) score between the Ref and Alt alleles for each tag-SNP-using Position Weight Matrices (PWM) for any TF binding motifs overlapping a 29 bp region around each risk allele, which reached a stringency (threshold of p < 4 −8 ) for either the Ref or Alt allele [30].

Discovery of Allele-Specific Transcription Factor Binding Sites
We extracted the allele-specific differences in TF binding affinity predicted at each of the Group 1 SNPs from the Haploreg database. These results revealed that 18 Group 1 variants exhibited allele-specific differences in binding affinity for one or more of the transcription factors from ENCODE (AS-TF) ( Table 2). The table shows the relative strength of this allele-specific binding (using a between cut-off of log-odds >2) for the minor risk (Alt) allele compared with the non-risk (Ref) allele. Ten of these 18 variants lie within one of the four interaction regions described for IKZF3 from the PC Hi-C data.
We also found that variants within the IKZF3-ZPBP2 bi-directional promoter (chr17:38,020,431-38,024,500) share TF binding sites with the Group 1 risk alleles within the coding region of IKZF3 (Table 2). Dimerization between these TFs may be a mechanism to stabilize chromatin looping events [40,41] across IKZF3 and the promoter region.
One example of how TF dimerization may be involved in reinforcing chromatin looping is for the Fox(o) family of transcription factors [42]. Figure 8 illustrates how potential dimerization between members of the Foxo family of TFs, which when bound to three IKZF3 risk alleles could stabilize chromatin looping across the locus. The IKZF3-ZPBP2 promoter polymorphism rs111678394 (Foxi1/Foxo_1) can interact with two variants in intron 7: rs113730542 (Fox) ( Table A6) and/or rs112876941 (Foxo_2) via Fox family dimerization (Tables 2 and A7). Figure A6 categorizes the SNP-by-SNP functional annotations across IKZF3, revealing that only four variants rs111678394, rs112412105, rs75148376 and rs113370572 lie with a PC Hi-C interaction, a DNAse HS and exhibit a predicted allele-specific TF binding. The variants lie within an interval of just 87.6 kb (chr17: 38,021,116-37,933,467). However, we also know that the entire IKZF3 region has been identified as a SuperEnhancer in B lymphocytes [43] (Figure A3), which complicates the prioritization of individual variants as having greater functional relevance than others. Some of the additional epigenetic modifications which characterize this SuperEnhancer/core risk haplotype are illustrated in Figure 9. The region is bounded by CTCF binding sites, demonstrating that there is a TAD (topologically associated domain) within IKZF3 ( Figure 9C). We also found multiple EP300 binding sites across the locus, which are also commonly seen in enhancer regions. There are several epigenetic modifications across the entire locus found in EBV-LCLs which characterize: active enhancers (H3K27ac); active regulatory elements/promoters (H3K9ac); promoter/TSS (H3K4me3) or are located in the gene body of CpG genes with higher expression (H3K4me1 and H3K4me2) ( Figure 9D). cut-off of log-odds >2) for the minor risk (Alt) allele compared with the non-risk (Ref) allele. Ten of these 18 variants lie within one of the four interaction regions described for IKZF3 from the PC Hi-C data.

IKZF3 Risk Alleles Lie within a SuperEnhancer in B Cells
We also found that variants within the IKZF3-ZPBP2 bi-directional promoter (chr17:38,020,431-38,024,500) share TF binding sites with the Group 1 risk alleles within the coding region of IKZF3 (Table 2). Dimerization between these TFs may be a mechanism to stabilize chromatin looping events [40,41] across IKZF3 and the promoter region.
One example of how TF dimerization may be involved in reinforcing chromatin looping is for the Fox(o) family of transcription factors [42]. Figure 8 illustrates how potential dimerization between members of the Foxo family of TFs, which when bound to three IKZF3 risk alleles could stabilize chromatin looping across the locus. The IKZF3-ZPBP2 promoter polymorphism rs111678394 (Foxi1/Foxo_1) can interact with two variants in intron 7: rs113730542 (Fox) (Table A6) and/or rs112876941 (Foxo_2) via Fox family dimerization (Tables 2 and A7). we have just shown the interaction between the IKZF2-ZPBP2 promoter and 3′ E4-7 interaction fragments from PC Hi-C, which brings together the TSSfull length (promoter of the full-length isoform) and the TSSshort (TSS of the shorter isoform) of IKZF3 (grey dotted lines). The Fox family members (red diamonds) bind to the risk alleles in the promoter (rs111678394) and dimerize with the Fox TFs binding two risk variants downstream of the 3′ E4-7 fragment: rs113730542 and rs112876941. Since Fox transcription factors act as dimers this potential for Fox dimerization may stabilize the interaction between the IKZF3-ZPBP2 and 3′ E4-7 fragments. 2.7.6. IKZF3 Risk Alleles Lie within a SuperEnhancer in B Cells Figure A6 categorizes the SNP-by-SNP functional annotations across IKZF3, revealing that only four variants rs111678394, rs112412105, rs75148376 and rs113370572 lie with a PC Hi-C interaction, a DNAse HS and exhibit a predicted allele-specific TF binding. The variants lie within an interval of just 87.6 kb (chr17: 38,021,116-37,933,467). However, we also know that the entire IKZF3 region has been identified as a SuperEnhancer in B lymphocytes [43] (Figure A3), which complicates the prioritization of individual variants as having greater functional relevance than others. Some of the additional epigenetic modifications which characterize this SuperEnhancer/core risk haplotype are illustrated in Figure 9. The region is bounded by CTCF binding sites, demonstrating that there is a TAD (topologically associated domain) within IKZF3 ( Figure 9C). We also found multiple EP300 binding sites across the locus, which are also commonly seen in enhancer regions. There are several epigenetic modifications across the entire locus found in EBV-LCLs which characterize: active enhancers (H3K27ac); active regulatory elements/promoters (H3K9ac); promoter/TSS (H3K4me3) or are located in the gene body of CpG genes with higher expression (H3K4me1 and H3K4me2) ( Figure  9D).  The transcription factors which are predicted to exhibit significant (LOD < 3) allele-specific binding (ASTF) to group 1 risk alleles within the PC-Hi-C interaction regions, taken from Table 2. Panel B: Genomic architecture of IKZF3 and the location of the 26 Group 1 risk alleles (Table 2). Panel C: Clusters of statistically significant enrichment (score range 200-1000) ChIP-Seq peaks for EP300 and CTCF (Transcription Factor ChIP-seq Uniform Peaks from ENCODE/Analysis) in GM12878 EBV-LCLs, aligned with the PC-Hi-C interaction intervals across IKZF3. Panel D: ChIP-Seq signal wiggle density graphs for chromatin marks from ENCODE/BROAD in GM12878 EBV-LCL cells for-H3K27ac (active enhancer region), H3K9ac (active regulatory elements/promoters), H3K4me1 (found in gene body of CpG genes with higher expression), H3K4me2 (found in gene body of CpG genes with higher expression) and H3K4me3 (associated with promoter/TSS). The vertical viewing range for each of these epigenetic tracks is set to viewing maximum at 50, to allow comparison of signal between each epigenetic modification.

Discussion
There is clear evidence from large scale SLE GWAS studies that three members of the Ikaros family of transcription factors (TF) are associated with lupus across multiple ancestries. The Ikaros transcription factors are important regulators of multiple immune cell types but in each case, the risk alleles tag an extended risk haplotype, so the identity of the causal risk alleles is unknown. Identifying these causal risk alleles will be an important step forward in understanding how genetics may alter the function of IKZF1 and IKZF3 in SLE.
Since three members of the same family show evidence of association for the same disease, it provides a convincing argument that these TFs play an important role in disease pathogenesis and indeed builds the case for a comprehensive analysis of the association signals in order to define the causal risk alleles at each locus. We therefore used a multi-omic strategy to build up a picture of the genetic, epigenetic and functional annotation across the associated loci, to pin-point the risk alleles which are likely to make the strongest contribution to the genetic-dysregulation of IKZF1 and IKZF3. At each locus we identified a set of risk alleles across multiple ancestries which are located within regions of open chromatin, are predicted to show differences allele-specific TF binding affinity, be part of regions displaying chromatin looping and show chromatin modification characteristic of the presence of a SuperEnhancer.
Given the differences in the prevalence and severity of SLE between different ancestries [32], our strategy was to take advantage of the minor allele frequency differences for risk alleles between ancestries to track down the causal risk alleles at IKZF1 and IKZF3. Through a combination of aligning tag SNPs on European risk haplotypes with the corresponding alleles in non-Europeans and subsequent fine-mapping using the multi-ancestral SLE ImmunoChip dataset, we identified the core risk haplotypes at both loci. At IKZF1 we successfully reduced the core risk haplotype by~37% down to 37.7 kb, located 38.5 kb upstream of the transcriptional start site and which includes just 12 tag-SNPs variants for functional annotation, by excluding 174 associated variants.
At IKZF3, after haplotype alignments between ancestries, we were still left with 93 tag SNPs over 101 kb in the core risk haplotype. Therefore, the nature of the fine-mapping and subsequent functional annotation was more demanding at this locus. It was therefore necessary to incorporate a trans-ancestral exclusion mapping process to exclude tag SNPs from functional annotation based on their MAF and OR. We did this using the African American samples from the SLE multi-ancestry ImmunoChip, because there is no published SLE GWAS in African American samples. This exclusion strategy was based on the assumption that since SLE is more common in samples of African origin, it was reasonable to assume that European tag-SNPs (MAF EA = 3%), would be more common and exhibit stronger association in SLE cases of African origin. Using this approach, we excluded a total of 66 SNPs (from the 93 tag SNPs) which exhibited MAF AA > MAF EA with MAF > 3%OR AA < OR EA , leaving just 27 SNPs over 101 kb for functional annotation. Therefore, in this manuscript, we set out to discover which of the risk variants at IKZF1 and IKZF3 were candidate causal risk alleles for SLE or other immune-related disease. Our results revealed that neither set of risk alleles were cis-eQTLs, nor caused amino acid changes in the Ikaros (encoded by IKZF1) or Aiolos (encoded by IKZF3) proteins. Consequently, we went on to investigate whether the risk alleles acted via epigenetic mechanisms, such as DNA methylation and DNA hypersensitivity, both of which can influence TF binding and chromatin looping.
Although the utility of DNA methylation in unravelling epigenetic mechanisms is immense, there are only two studies of this heritable, cell-type specific mark in SLE samples, both of which utilized probe-based rather than sequencing-based platforms. The first study revealed significant hypomethylation (correlated with increased gene transcription) at IKZF3 in CD4+ T cells but not at IKZF1 [44]. There was no ancestry specific analysis published on this dataset, which may be due to the moderate sample size of each cohort. The second study in Danish SLE samples revealed no evidence of hypermethylation (corresponding to down-regulated gene expression) at IKZF1 or IKZF3 in B cells, T cells, monocytes or granulocytes [45]. Determination of a detailed allele-specific methylation map across IKZF1 and IKZF3 which takes into account trans-ancestral differences in allele frequencies in SLE awaits sequence-based methylation study in immune cell types from SLE samples of different ancestries during flare and during more quiescent disease.
The data in this manuscript suggest that by far the biggest epigenetic determinant of cell-specific differences in gene regulation at IKZF1 and IKZF3 come from measurements of DNAse hypersensitivity. Hotspots delineating regions of open chromatin work provide a permissive landscape to allow allele specific TF binding and chromatin looping. All three types of event contribute to an accessible scaffold for post translational modification of chromatin tails, such as acetylation of lysine 27 on histone 3 (H3K27ac), which delineate enhancer elements.
There is widespread open chromatin in multiple cell types across the risk haplotypes for IKZF1 in T cell types and in a more diverse set of immune cell types across IKZF3 (Figures 2 and 3). This made it impossible to prioritize specific risk alleles as being more functionally significant. Similarly, it was not possible to prioritize specific risk alleles which were colocalized with sites of preferential marking by H3K27ac. This is in line with a previous report, which indicated that both IKZF1 and IKZF3 contain SuperEnhancers (SE) for multiple immune cell types [43] (Figure A3). These SE groups of enhancers, usually found at master transcription factors, which control the identity of a given cell types. Finally, the chromatin looping observed at IKZF1 and IKZF3 bring the risk alleles within the enhancers into closer proximity to promoter elements and make the DNA backbone more accessible to large numbers of additional TFs which characterize SuperEnhancers.
In summary, through a process of layered functional annotation at, using publicly available resources, we have found that the core SLE risk alleles at IKZF1 and IKZF3 are part of "functionally active DNA," within SuperEnhancers. Taken together, these results suggest that the IKZF1 and IKZF3 risk alleles may contribute to the genetic dysregulation of the SuperEnhancers and the consequential dysregulation in the function of immune cell types. However, we accept that confirmation of these findings requires detailed "wet lab" experimentation, which is outside the remit of this current manuscript.

Datasets
We used 1000-Genome imputed GWAS data from the European GWAS [12] and the two Chinese GWAS [7,31]. The entire 1000-Genome imputed SLE ImmunoChip data from Europeans (n cases = 6748, n controls = 11,516) and African Americans (AA) (n cases = 2970, n controls = 2452) was available through collaboration [33]. The 1000 Genomes data for the five super-populations was downloaded from the 1000 Genomes website via Ensembl. All the genetic data were aligned using the UCSC hg19 build.

Haplotype Analysis of the Genetic Datasets
Haplotypes were derived in each dataset, using the Solid-Spine algorithm in Haploview, (HWE cut off of 0.0001 and minor allele frequency cut off of 0.01) [46]. Visual inspection of overlapping haplotype blocks in the European SLE GWAS was used to identify continuous risk haplotypes across IKZF1 and IKZF3, using an inter-block D score of > 0.75 and to select sets of tag SNPs. The European risk alleles and haplotypes were used as a template to align the haplotypes from the other datasets and to track the presence of the European risk haplotype in these populations. The core risk haplotypes were defined by minimal alignment of the haplotype blocks from each dataset.

Trans-Ancestral Meta-Analysis
Trans-ancestral meta-analysis was undertaken using PLINK with the default settings for combining two datasets using a random effect and a fixed effects model [47]. A test of heterogeneity was used to confirm that the datasets were homogenous using a p value cut off of >0.01.

Trans-Ancestral Exclusion Mapping
Trans-ancestral exclusion mapping was carried out at IKZF3 using the EUR (n cases = 6748, n controls = 11,516) and AA (n cases = 2970, n controls = 2452) samples from the SLE ImmunoChip dataset and the EUR and AFR samples from the 1000 Genomes data. Variants were included in the analysis if >75% individuals were typed in each study. The SNPs were aligned by genomic position across all four studies, recording minor allele frequency (MAF) and/or association p value/OR for each variant. SNPs were grouped by the differences in MAF between EA/EUR and AA/AFR samples, taking into account the association p value where available. A set of European risk alleles which were most likely to tag the causal alleles at IKZF3 in Europeans were defined as being absent/very rare (MAF < 0.01) in Africans.

Functional Annotation of Risk Alleles
The H3K27ac epigenetic data for the core association intervals and flanking regions (<10kb) was downloaded from the RoadMap Consortium in a total of 27 blood cell-types together with three fibroblast cell-types and a lung endothelial cell-type for use as a control. The epigenetic data contained the consolidated imputed epigenetic data based on the p value signals from each of the individual epigenetic marks in each of the cell-types. We used the UCSC genome browser (hg19) to subset each epigenetic track for the required intervals and then exported the signal data via Galaxy [48]. Where the SNPs of interest were <10 bp away from the edge of the 25-bp epigenetic interval containing it, we averaged the enrichment from two adjacent intervals. The Signal Values for the DNAse I Hotspot data from ENCODE/Washington were downloaded for each of the risk alleles at IKZF1 and IKZF3 using UCSC/Galaxy. We accessed the PC Hi-C data across IKZF1 and IKZF3 in immune cell types from the 3D Genome Browser [39,49]. The Combined Genome Segmentation data from ENCODE in EBV-LCLs was extracted from the UCSC Genome Browser [50]. We used the R package haploR to extract cis-eQTL data for risk alleles across IKZF1 and IKZF3 from Haploreg [30,51] and accessed conditional cis-eQTLs across both genes from the NESDR NTR conditional eQTL database [38]. We exported the enhancers intervals inferred across IKZF1 and IKZF3 from the GeneHancer database [35].

Allele-Specific Transcription Factor Binding
For each allele of the tag-SNPs on the core associated haplotypes for IKZF1 and IKZF3, we extracted the predicted allele-specific differences in binding affinity of transcription factor from Haploreg v4.1 using haploR [51]. These differences were calculated as the change in log-odds (LOD) score between the Ref and Alt alleles for each tag-SNP-using Position Weight Matrices (PWM) for any TF binding motifs overlapping a 29 bp region around each risk allele, which reached a stringency (threshold of P < 4 −8 ) for either the Ref or Alt allele [30].

Visualisation of Genomic Data
We visualized the epigenetic and genomic data within the UCSC genome browser or using the Gviz package from Bioconductor, within R [52].

Acknowledgments:
We would like to express our appreciation to all the patients who have given samples to make this research possible. We thank David Morris for useful discussions regarding the SLE association data. We gratefully acknowledge the Alliance for Lupus Research for funding and support for the SLE ImmunoChip study and to Rob Graham (Genentech Inc.) for additional funding for genotyping for part of the SLE ImmunoChip cohort. Thanks to Carl Langefeld and all the other authors and contributors for the collaboration on the SLE ImmunoChip data. Thank you also to Phil Tombleson for his work on the data administration as part of the National Institute for Health Research Biomedical Research Centre (NIHR BRC) at Guy's and St Thomas' NHS Foundation and King's College London.

Conflicts of Interest:
The authors declare no conflict of interest.