Trans-Ethnic Mapping of BANK1 Identifies Two Independent SLE-Risk Linkage Groups Enriched for Co-Transcriptional Splicing Marks

BANK1 is a susceptibility gene for several systemic autoimmune diseases in several populations. Using the genome-wide association study (GWAS) data from Europeans (EUR) and African Americans (AA), we performed an extensive fine mapping of ankyrin repeats 1 (BANK1). To increase the SNP density, we used imputation followed by univariate and conditional analysis, combined with a haplotypic and expression quantitative trait locus (eQTL) analysis. The data from Europeans showed that the associated region was restricted to a minimal and dependent set of SNPs covering introns two and three, and exon two. In AA, the signal found in the Europeans was split into two independent effects. All of the major risk associated SNPs were eQTLs, and the risks were associated with an increased BANK1 gene expression. Functional annotation analysis revealed the enrichment of repressive B cell epigenomic marks (EZH2 and H3K27me3) and a strong enrichment of splice junctions. Furthermore, one eQTL located in intron two, rs13106926, was found within the binding site for RUNX3, a transcriptional activator. These results connect the local genome topography, chromatin structure, and the regulatory landscape of BANK1 with co-transcriptional splicing of exon two. Our data defines a minimal set of risk associated eQTLs predicted to be involved in the expression of BANK1 modulated through epigenetic regulation and splicing. These findings allow us to suggest that the increased expression of BANK1 will have an impact on B-cell mediated disease pathways.


BANK1 Transethnic Finemapping
Using the high quality filtered genome-wide data from the European and African American genome-wide association scans, we performed a univariate genetic logistic regression association analysis of BANK1 using an additive model. In Europeans, 1133 imputed markers mapped on BANK1 were tested for association. We selected a set of 307 markers with a p value lower than 0.05 (Supplemental Table S1A and Supplemental Figure S1A). Eighteen tagging markers captured all 307 (100%) alleles at r 2 ≥0.8, representing the 18 respective linkage disequilibrium groups organized in four haplotypic blocks. Only one marker remained as an independent signal, rs17031708 ( Figure 1, Table 1 and Supplemental Table S1A). The best hit for association was the SNP rs10028805, (haplotypic block two, and linkage disequilibrium group five), with OR = 0.8177, confidence interval (CI) = 95% (0.7656, 0.8735), and a Pcorr = 2.22E−09 ( Figure 2a, Table 1, and Supplemental Table S1A). Conditioning on rs10028805, the association disappeared, except that of the five low frequency variants (MAF < 5%), rs148572654, rs10212686, rs62321740, 4:102857352:AT:ATT, and 4:102945562:GT:G (Table 1 and Figure 2b), representing linkage groups 8, 11, 12, 16, and 19, respectively. Only by conditioning on rs10028805 and three of these low frequency variants, namely, rs148572654, rs62321740, and 4:102945562:GT:G, did all of the signals of association in BANK1 disappear (Table 1, Supplemental  Table S1A, and Figure 2c). In addition, these four markers constituted the haplotypes significantly associated with the affected phenotype. The best was the allelic combination '1111', corresponding to a risk haplotype with frequencies of 62% in the cases and 56% in the controls, and a value of p = 7.64E−13 (Table 2, Figure 3a).

BANK1 Transethnic Finemapping
Using the high quality filtered genome-wide data from the European and African American genome-wide association scans, we performed a univariate genetic logistic regression association analysis of BANK1 using an additive model. In Europeans, 1133 imputed markers mapped on BANK1 were tested for association. We selected a set of 307 markers with a p value lower than 0.05 (Supplemental Table S1A and Supplemental Figure S1A). Eighteen tagging markers captured all 307 (100%) alleles at r 2 ≥0.8, representing the 18 respective linkage disequilibrium groups organized in four haplotypic blocks. Only one marker remained as an independent signal, rs17031708 (Figure 1, Table 1 and Supplemental Table S1A). The best hit for association was the SNP rs10028805, (haplotypic block two, and linkage disequilibrium group five), with OR = 0.8177, confidence interval (CI) = 95% (0.7656, 0.8735), and a Pcorr = 2.22E−09 ( Figure 2a, Table 1, and Supplemental Table S1A). Conditioning on rs10028805, the association disappeared, except that of the five low frequency variants (MAF < 5%), rs148572654, rs10212686, rs62321740, 4:102857352:AT:ATT, and 4:102945562:GT:G (Table 1 and Figure 2b), representing linkage groups 8, 11, 12, 16, and 19, respectively. Only by conditioning on rs10028805 and three of these low frequency variants, namely, rs148572654, rs62321740, and 4:102945562:GT:G, did all of the signals of association in BANK1 disappear (Table 1, Supplemental Table S1A, and Figure 2c). In addition, these four markers constituted the haplotypes significantly associated with the affected phenotype. The best was the allelic combination '1111', corresponding to a risk haplotype with frequencies of 62% in the cases and 56% in the controls, and a value of p = 7.64E−13 (Table 2, Figure 3a).         In African Americans, 1752 imputed SNPs were mapped on BANK1, of which 260 showed association test p values lower than 0.05 (Supplemental Table S1B and Supplemental Figure S1B). Running the tagger tool, a set of 50 markers captured all 260 (100%) alleles at r 2 ≥0.8, representing their 50 respective linkage disequilibrium groups (Table 3 and Supplemental Table S1B). These 50 tagging markers were organized in 8 haplotypic blocks and 11 independent signals ( Figure 4). The best hit was the SNP rs17200824 with OR = 0.6787, CI = 95% (0.5947, 0.7747), and a Pcorr = 9.24E−09. This best-hit was in linkage group 13, the same as the branch-point site SNP rs17266594 and rs10516487 (R61H) (Figure 5a, Table 3, and Supplemental Table S1B). After conditioning on rs17200824, the association signals remained in 11 of the tagging markers ( Figure 5b, Table 3, and Supplemental Table S1B). Only after the conditioning on rs17200824 on two of these 11 variants, rs4295265 and rs149302668, had all of the signals on the BANK1 association disappeared ( Table 3, Supplemental Table S1B and Figure 5c). In addition, these two markers constituted haplotypes that are significantly associated with the affected phenotype; the best was the allelic combination '11', corresponding to a risk haplotype with frequencies of 90% in the cases and 84.6% in the controls, and a value of Pcorr = 3.17E−08 (Table 4 and Figure 3b). It is noted that rs4295265 (Pcorr = 3.89E-07) was located on intron seven, at 2,319 bp upstream to the exonic SNP rs3733197 (A 353 T, in the ankyrin domain) (Pcorr = 1.44E−07), described as being associated to SLE1 and sharing the same haplotypic block eight (pairwise r 2 = 0.75). The SNP rs149302668, located on intron 11, was a low frequency variant (MAF = 0.01348).
In EUR, markers significantly associated with SLE were located to haplotypic blocks one and two, including intron one and intron two-exon two-intron three. In the first haplotypic block, covering intron one, the markers were organized in linkage disequilibrium groups one, two, and three. Group one showed the best signal of association in the block with rs4518254 (Pcorr = 6.36E-09) as a best hit. The second haplotypic block included two linkage disequilibrium groups, five and six, covering intron two-exon two-intron three. Group five contained rs10028805 (intron2) (Pcorr = 2.22E-09) and rs10516486 (L98L, exon2). For group six, rs13106926 (intron2) (Pcorr = 1.36E-08) was the best hit; this group also contained the branch-point site, SNP rs17266594 (intron1), and rs10516487 (R61H, exon two) [1].
After conditioning on the best-hit, rs10028805, all of the signals of association for the common variation (MAF > 5%) in BANK1 disappeared; hence, rs10028805 would be the statistically most plausible causal SNP of the BANK1 association to SLE in the EUR sample, as previously suggested. It has been remarked that, as a result of the LD status among the markers in 'block two-group five' (intraBK = 0.96) (Supplemental Figure S2B), we would expect a similar effect with all of them. We observed that from the conditioning on each of the three SNPs, rs10028805, rs13136297, and rs4411998, the same results were obtained, (=0.99). However, traces of association with a low-frequency variation remained (Table 1, Supplemental Table S1A, and Figure 2b). Only after the conditioning on rs10028805 and three low frequency variants, rs148572654, rs62321740, and 4:102945562:GT:G, did the signals of association in the BANK1 completely disappear (Table 1, Supplemental Table S1A, and Figure 2c).
In AA, the markers that were significantly associated (10 −8 ) were located in the same regions as in EUR, intron one, and intron two-exon two-intron three ( Table 2 and Supplemental Table S1B). Intron one was split in two haplotypic blocks, one and two. In the second block, markers were distributed in four linkage disequilibrium groups (one, two, three, and four) and the best hits were in the third group (equivalent to EUR 'block one-group three ), SNP rs4699262 (Pcorr = 3.19E-08). Region intron two-exon two-intron three was divided into three haplotypic blocks (three, four, and five) containing 12 linkage disequilibrium groups (Supplemental Table S1B). The best hits were in the fourth haplotypic block, group 13 (partially equivalent to EUR group six), with the best-hit rs17200824 (Pcorr = 9.24E-09) and the branch-point site SNP rs17266594 (intron one) and rs10516487 (R61H, exon two).   Table 3. Fifty BANK1 tagging markers with association p value lower than 0.05, representing their respective linkage disequilibrium groups in the AA (African American) sample. (p *-p value conditioning on rs17200824; p **-p value conditioning on rs17200824, rs4295265, and rs149302668; A*-represents the allele used in the association analysis; HP.BK-haplotypic block; LK.GR-linkage disequilibrium group).    Note that AA group eight, partially equivalent to EUR group six, containing the best-hits, did not contain associated markers at the genome-wide association study (GWAS) level in AA (SNP rs10028805 had a p value = 4.72E-04) (Supplemental Table S1B), and was independent of group 13 (AAgroups_8_13 = 0.18 and EURgroups_5_6 = 0.70).

HP.BK
No marker, including the best hit in the AA population, rs17200824, was able to completely eliminate the associations of all of the others, (Table 3 and Supplemental Table S1B). Only by conditioning on rs17200824 and two variants, rs4295265 and rs149302668, did all of the signals of the association in BANK1 disappear (Table 3, Supplemental Table S1B, and Figure 5c). Note that rs4295265 (Pcorr = 3.89E-07) was located on intron seven at 2319 bp upstream to the exonic SNP rs3733197 (Pcorr = 1.44E-07) (exon seven, A353T ankyrin domain), described as SLE associated one. Thus, the elimination of the genetic association in AA required conditioning on the markers located  Note that AA group eight, partially equivalent to EUR group six, containing the best-hits, did not contain associated markers at the genome-wide association study (GWAS) level in AA (SNP rs10028805 had a p value = 4.72E-04) (Supplemental Table S1B), and was independent of group 13 (AAgroups_8_13 = 0.18 and EURgroups_5_6 = 0.70).
No marker, including the best hit in the AA population, rs17200824, was able to completely eliminate the associations of all of the others, (Table 3 and Supplemental Table S1B). Only by conditioning on rs17200824 and two variants, rs4295265 and rs149302668, did all of the signals of the association in BANK1 disappear (Table 3, Supplemental Table S1B, and Figure 5c). Note that rs4295265 (Pcorr = 3.89E-07) was located on intron seven at 2319 bp upstream to the exonic SNP rs3733197 (Pcorr = 1.44E-07) (exon seven, A353T ankyrin domain), described as SLE associated one.
Thus, the elimination of the genetic association in AA required conditioning on the markers located in at least two separate blocks, suggesting two independent but closely related signals, as well as suggestive signals in the intron [7].
We then performed a trans-racial 'EUR-AA' meta-analysis. We selected 165 markers that showed association p values less than 0.05 in both of the EUR and AA samples (Supplemental Table S2). The set of meta-analysis best signals for association were located on the region comprising intron two, exon two, and intron three. In EUR, it coincided with haplotypic block two containing linkage disequilibrium groups five and six, while in AA, the same region was divided into three haplotypic blocks containing seven linkage disequilibrium groups (5, 6, 7, 8, 9 10, and 13). Note that the markers in the EUR linkage disequilibrium groups five and six, in the AA sample, were divided into groups of 5, 8, 9, and 10, and 6, 7, and 13 ( Table 5, Supplemental Table S2, and Supplemental Figure S3B). The meta-analysis of the SNP rs10028805 in groups EUR 5 and AA 8 showed OR = 0.8167, CI = 95% (0.7712, 0.8648), and P val = 4.33E-12. Interestingly, rs10028805 had a P val = 4.71 E-04 in the AA sample, clearly under the accepted GWAS threshold. It should be noted that the linkage disequilibrium group AA 13 (and EUR 6) contained the rs71597109, best hit of the meta-analysis, OR = 0.7898, CI = 95% (0.7423, 0.8404), and P val = 6.86E-14. In addition, SNP rs71597109 shared, with group AA 13, rs17200824 and the markers described as the best causal hits of the BANK1 association to with SLE, such as the branch-point site SNP rs17266594 on intron two, and the R61H rs10516487 on exon two. We then performed an eQTL analysis using various sources of data available, [8][9][10][11], as described. The analysis showed that several associated and non-associated SNPs had eQTL effects. In fact, all of the strongly associated SNPs were eQTLs covering a region from the promoter of BANK1 to UTR-3 /downstream region. However, the strongest eQTLs, according to the BIOS browser10, were localized to intron two, close to exon two, exon two, and intron three, correlating with the most associated SLE signals at the GWAS significance level. These eQTLs were rs13136297 (Cis-eQTL Exon-ratio p = 5.29E-39) and rs10028805 (Cis-eQTL exon-level p = 2.74E-52), on the European haploblock two, linkage group five, and the AA haploblock three, linkage group eight. Importantly, risk was associated with an increased expression of BANK1 (Supplemental Table S3 and Figure 6), and importantly, the effects were directly observed in the RNA-Seq data at the exon level of the expression. Using the BIOS data, we also identified a cis-meQTL, specifically rs6833764 containing CpG cg01116491 (p = 3.64E-35), located in the European haploblock one linkage group one; this SNP is however not associated with SLE in AA (Supplemental Table S3).
In order to better define the functional effects on the BANK1 gene expression, we performed an analysis of the ENCODE tracks and investigated the possible causes for the eQTLs. This analysis revealed that the number of annotations in the BANK1 SNPs increased from the promoter, peaking in the region of association of SLE, centered in intron two and exon two. Using the meta-analysis data, these annotations can be observed in the Supplemental Table S4 (see the various sheets for transcription factor binding site, histones, and junctions). These annotations showed an enrichment of the splice junctions (p value = 1.98E-08, Fisher's exact test), and in various histone marks, particularly histone three lysine 36 27 trimethylation (H3K27me3, p-value = 1.20E-10) (Supplemental Table S4), in the lymphoblastoid cell line Gm12878, further supporting a clear regulatory BANK1 region in this area. The activating histone marks, such as H3K4me1, H3K4me3 in peripheral, and cord blood B cells and CD56+ NK cells, covered the whole associated region. Some spots were also observed in the monocytes and CD34+ hematopoietic stem cells (HSC), but these were clearly less abundant across the region. In addition, the best-hits were enriched for the sites of binding of the transcription factor EZH2 in the Gm12878 cell line (p value = 2.99E-44). One transcription factor binding site for RUNX3, an activator of the transcription of B cells, was observed in the GM12878 lymphoblastoid cell line, where SNP rs13106926 is located (Supplemental Table S4) (UCSC genome browser, coordinates chr4: 102,739,791-102,739,791 of GRCh37/Hg19) (Supplemental Material) 14. This SNP is also found within an active histone mark, histone 3 lysine 27 acetylation mark (H3K27Ac), according to the UCSC browser14. This SNP is an eQTL in lymphoblastoid cell lines, according to the Harvard browser15, but no records were obtained on the BIOS browser. This SNP is located within the European haplotypic block two, group six, and the AA block three, group six, and is strongly associated with SLE (Supplemental Table S2).
Combining all of the information, our data supports the functional effects limited to the regions of intron two, exon two, and intron three, in both European and African Americans. As expected, the AA population shows a more dispersed haplotypic structure than Europeans, separating the haplotypic block two containing linkage disequilibrium groups five and six, into three haplotypic blocks (three, four, and five) with seven linkage disequilibrium groups (5, 6, 7, 8, 9 10, and 13) (Supplemental Table S2 and Supplemental Figure S3B). Combining all of the information, our data supports the functional effects limited to the regions of intron two, exon two, and intron three, in both European and African Americans. As expected, the AA population shows a more dispersed haplotypic structure than Europeans, separating the haplotypic block two containing linkage disequilibrium groups five and six, into three haplotypic blocks (three, four, and five) with seven linkage disequilibrium groups (5, 6, 7, 8, 9 10, and 13) (Supplemental Table S2 and Supplemental Figure S3B). . Transcribed areas, promoter, and enhancer marks are depicted as green, red, and orange rectangles, respectively. Fantom5 tissue-specific transcribed enhancers loci are shown as enhancer 1 (hg19 chr4:102728165-102728530) specific to spleen, blood, and B-lymphocytes; enhancer 2 (hg19 chr4:102894116-102894225) specific to lymph node, esophagus, B-lymphocytes, melanocytes, and basophils. Shown below is the SLE association plot for European data and expression quantitative trait locus (eQTL) map for BANK1 gene in blood (top markers shown for exon-ratio and exon-level eQTL, according to BIOS [whole blood data] and to MuTHER [LCL data]). . Transcribed areas, promoter, and enhancer marks are depicted as green, red, and orange rectangles, respectively. Fantom5 tissue-specific transcribed enhancers loci are shown as enhancer 1 (hg19 chr4:102728165-102728530) specific to spleen, blood, and B-lymphocytes; enhancer 2 (hg19 chr4:102894116-102894225) specific to lymph node, esophagus, B-lymphocytes, melanocytes, and basophils. Shown below is the SLE association plot for European data and expression quantitative trait locus (eQTL) map for BANK1 gene in blood (top markers shown for exon-ratio and exon-level eQTL, according to BIOS [whole blood data] and to MuTHER [LCL data]). Table 5. The EUR-African Americans' (AA) meta-analysis best signals for association located on the region comprising intron two, exon two, and intron three. In the EUR, the association best-hit coincided with haplotypic block two, containing linkage disequilibrium groups five and six; the AA the region was divided into three haplotypic blocks containing seven linkage disequilibrium groups (5, 6, 7, 8, 9 10, and 13). (A*-represents the allele used in the association analysis; FEM-fixed effect model).

Discussion
We identified a set of SNPs in intron one and surrounding exon two, clearly associated with SLE in the two studied populations, namely European and African Americans. In our current study, we did not present in vitro functional tests for these SNPs. We carried out a bioinformatics analysis and showed that all of the strongest associated SNPs are BANK1-associated eQTLs, and associate with the enrichment of various epigenetic marks known to associate with pre-mRNA co-transcriptional exon splicing [8,12]. Furthermore, the data supports, in particular regions containing three SNPs with potential functional properties, rs13136297 (Cis-eQTL exon ratio), rs10028805 (Cis-eQTL exon level), and rs13106926 (eQTL), the latter located within the RUNX3 TFBS and an active regulatory element histone mark in a lymphoblastoid cell line. The combined presence of the histone marks observed, and the enrichment of the splice junctions further support such co-transcriptional splicing as a mechanism regulating the splicing of exon two of BANK1 involving a higher level of regulation of the chromatin topography of the locus. This is also supported by the eQTL data obtained from BIOS browser. These authors analyzed the functionality of exon ratio cis-eQTLs in mRNA splicing, showing an enrichment of splicing events and a close proximity to the splicing acceptor and donor sites [9].
Recently, it was shown that the SNPs of BANK1, associated with SLE and identified through targeted sequencing, increased the expression of the SLC39A8 gene in the LPS stimulated CD14+ monocytes. SLC39A8 is a gene located immediately downstream of BANK1. The authors utilized the GEUVADIS database, based on the expression data in the lymphoblastoid cell lines (LCLs) and HapMap SNPs [10]. We have used the Zhernakova data [9], based on whole blood RNA sequencing information from a large number of European individuals, and still, not all of the SNPs have been found and the complete information is not fully available. The SNP shown by Raj et al. [11] to be strongly associated with SLE and an eQTL for SLC39A8 is actually a weak eQTL for this gene, and the effect is probably related to its LD with SNPs in the promoter of SLC39A8. The main eQTL peak for SLC39A8 is located within the promoter of that gene [9], not BANK1. We have analyzed our data for a putative association of the SLC39A8 locus and SLE, and did not observe any evidence of this association (data not shown). It disproves the involvement of SLC39A8 in the SLE pathogenesis, proposed by Raj and co-authors [11].
Our genetic analysis and the fact that risk was associated with an increased BANK1 expression at the exon level, supports our original postulate that the increased expression of the exon two coded domain of BANK1 may be involved in the development of autoimmunity, and that this expression and the splicing of exon two are epigenetically regulated.
We show the enrichment of given epigenetic marks, such as H3K27me3 and EZH2, in the BANK1 associated SNP region. Furthermore, the presence of such epigenetic marks supports the observation of the eQTLs in the SLE-associated SNPs, and that the region could function as an alternative promoter. In fact, this might be the case, as the BANK1 gene has two transcription initiation sites, and the first description of the gene had missed the first initiation site [2].
We found one SNP lying with five nucleotides within the binding site of RUNX3, a transcriptional activator. However, this TFBS is only observed in the lymphoblastoid cell lines and not primary B cells. Importantly, the RUNX3 transcriptional activity is induced during the Epstein-Barr virus (EBV) infection of B cell immortality, leading to the repression of RUNX1, with both factors considered as interesting transcriptional regulators of autoimmunity [13][14][15]. An infection with EBV has been considered as an environmental trigger of SLE [16,17]. Nevertheless, we do not know if RUNX3 is induced in lupus B cells or if the modifications in the RUNX3 transcriptional activity have a role in the BANK1 expression. Thus, the role of the SNP rs13106926 as an eQTL, associated with SLE and the RUNX3 transcriptional activity in EBV infected cells is plausible, but needs to be experimentally tested. Another transcription factor with numerous sites across the region, particularly in intron one, was the histone methyltransferase EZH2, a member of the polycomb repressive complexes that promote silencing by catalyzing posttranslational modifications of histones. EZH2 catalyzes the histone three, lysine 27 trimethylation rich in transcriptional start sites of the repressed genes. This transcription factor has been shown to protect germinal center B cells and impede the mutation of the activation-induced cytidine deaminase, an enzyme key in the class switch recombination of the immunoglobulin heavy chain locus [18]. Its role in SLE B cells and the induction of BANK1 is also not known, but a role for BANK1 in the class switch recombination was recently suggested in the mouse [6].
Alternative splicing involves various chromatin modifications, including histone modifications [19]. These changes may function as scaffolds or adaptors for the binding proteins that are part of the splicing machinery. In particular, an important protein is RNA Pol II, in charge of the elongation, which interacts with histone modifiers to recruit SR proteins [19]. These proteins are particularly efficient during co-transcriptional splicing events. In addition, the modulation of H3K36me3 or H3K4me3 levels by the overexpression or downregulation of their respective histone methyltransferases, changes the tissue-specific alternative splicing pattern in a predictable fashion [20]. Studies have shown that a chromatin adaptor protein (CHD1) has a chromodomain that specifically recognizes H3K4me3 and interacts with specific small RNA proteins28, particularly components of the U2 snRNP complexes.
We reported [7] that the protective allele (61H) of the associated coding SNP rs10516487 led to the formation of a site for the splicing trans-factor Srp40, suppressing splicing of exon two and, hence, of the FL isoform. Thus, the differences in the expression of the D2 and FL isoforms of BANK1 lie behind the risk for SLE, and it is feasible that the co-transcriptional regulation of exon two splicing, combined with the presence of R61, promotes the expression of the FL isoform and enhanced signaling in susceptible individuals. Our data and our animal work fully support an increase in the FL BANK1 expression, with increased BANK1-dependent toll-like receptor 7/9 signaling. Interestingly, the RNA splicing events are controlled by the abundance of SR proteins modulated by the signaling pathway activity, including the pathways controlled by the phosphatidyl-inositol-3 (PI3K) kinase, a key molecule in B cell signaling [21], which regulates the phosphorylation of the serine of the Srp40 trans-activator protein. Therefore, the presence of co-transcriptional alternative splicing modifiers supports our previous data.

Sample Collection and Genotyping
We used GWAS data from 5478 individuals of European ancestry, including 4254 SLE patients and 1224 controls genotyped using the Illumina HumanOmni1_Quad_v1-0_B chip [12]. The U.K. subjects with SLE in the study were recruited, with the study having obtained ethical approval from the London Ethics Committee. Individuals were invited into the study and given information sheets as well as verbal explanations of what the research entailed. For those individuals willing to participate, informed written consent was obtained. The recruitment in continental Europe and Canada were subject to local reviews and ethical approval. Copies of the relevant supporting documentation were sent to the investigators at King's College at the commencement of the study [12]. In order to increase the number of controls, data from the European controls that were genotyped on the same platform were obtained from the dbGaP database (http://www.ncbi.nlm.nih.gov/gap), including the DCEG Imputation Reference Dataset (phs000396.v1.p1; 1175 individuals), the GENIE UK-ROI Diabetic Nephropathy GWAS (phs000389.v1.p1; 903 individuals), and the High Density SNP Association Analysis of Melanoma (phs000187.v1.p1; 1047 individuals). In total, the initial dataset consisted of 4254 SLE patients and 4349 controls. We used a set of individuals with African American (AA) ancestry, collected from several of the SLEGEN groups and from the Oklahoma Medical Research Foundation, following ethics committee approval and informed consent [22]. The African American samples were genotyped with the same Illumina platform HumanOmni1_Quad_v1-0_B chip as the European individuals (unpublished data).

Data Filtering
In order to obtain a quality-controlled working dataset satisfying current state-of-the-art criteria for the association studies, data filtering was conducted using PLINK v1.07 [23], applying the following criteria: minimum total call rate per sample of 90% (18 individuals were eliminated), minimum call rate per marker of 98%, minor allele frequency (MAF) threshold of 0.01%, Hardy-Weinberg Equilibrium (HWE) p-value at a minimum of 0.0001 for cases, and 0.01 for controls, as well as a final cutoff p value of 0.00001 for differential missingness in the no-call genotypes between the cases and controls.

Stratification Analysis
To assess the genetic diversity in the EUR data set, markers (linkage equilibrium r 2 < 0.1) with maximum differences in the allele frequencies between the HapMap subpopulations (CEU, MXC, YRI, CHB, EUR, AMR, AFR, and ASN) were selected. A first principal component analysis (without sigma threshold) showed that all of the samples were indeed of European descent. In the next step, the genetic diversity within the European context was examined using 13,222 markers (r 2 < 0.1), with maximum allele differences between the HapMap European subpopulations CEU and TSI (north central and southern Europe, respectively). The principal component analysis was performed with smartpca, EIGENSOFT 4.0 beta package [24]. A set of 60 samples was detected as outliers, using a sigma_threshold = 6.0. This low number of outliers supports the European homogeneity of the sample. An association analysis by logistic regression corrected for stratification with the first 10 principal components resulted in a λGC = 1.05. The cryptic relatedness was assessed using REAP [25], by removing one individual from each sample pair with a kinship-coefficient greater than 0.055. Using this procedure, 248 samples were removed. The final data set contained 573,370 autosomal markers from 8277 individuals (4212 cases and 4065 controls).
The final African American data set contained 2899 individuals, 1761 SLE patients, and 1138 controls. This data set was supplied after QC-filtering, and removing the outliers and 'closely-related' samples. For each African American sample, coordinates for the four first PCs were provided.

Statistical Analyses
The allelic association for each marker was tested using the Plink additive logistic regression, testing for the minor allele, as in the literature [23]. The Tracy-Widom test was used to evaluate the significance of the principal components. Association tests included the axes of the 10 most significant principal components as covariates. The dependency of association for the genetic variants with the lowest p values were systematically examined by conditional testing. Plots were generated using a modified version of LocusZoom [26]. The haplotypic relations, defined by D' (four gametes rule), were computed using Haploview version 4.2 [27]. The haploview tagger tool was used (threshold r 2 ≥ 0.8) to select the minimum set of tagging markers containing the allelic variability of the whole BANK1 region, defining the linkage disequilibrium groups of the markers. The haplotype analyses were carried out with Plink, and the phase was confirmed with BEAGLE version 3.3 [28].
The markers within the region of the BANK1 gene were selected for imputation with IMPUTE2 [29] using GWAS and 1000Genomes data as reference (http://www.1000genomes.org). Prior to that, the GWAS data set was pre-phased with Shapeit [30], using only the 1000G EUR populations as reference. For imputation, a restrictive QC-filter was applied (SNP genotyping rate ≥99%, sample genotyping rate ≥98%) without restriction of the allele frequencies, in order to include rare and low frequency variants. To ensure a high degree of reliable imputation, a conservative 'info_value' threshold of >0.8 for each marker was applied. The same imputation procedure was used with the AA data set, with the only difference being that the AFR 1000Genomes' populations were used as reference in the Shapeit pre-imputation.
A trans-racial 'EUR-AA' meta-analysis was implemented using Metal [31] and Plink. The beta parameter in the logistic regression (natural logarithm of the odds ratio) was used as the effect and the sample size in each study was used as the weighting factor. Two criteria for selecting the markers resulting in the meta-analysis were used, namely: SLE association p value < 0.05 values and the same direction of the effect in both studies, and heterogeneity-test p value > 0.01.

eQTL/meQTL Analysis
In order to explore the relationship between SLE and the BANK1 gene expression, we analyzed the expression quantitative trait loci (eQTLs) from several recent databases for lymphoblastoid cell lines (LCLs), cells, or tissues obtained from donors of European ancestry [8][9][10]. MuTHER [32] includes the gene expression data from fat, skin, and LCLs. The BIOS browser [9] (genenetwork.nl/biosqtlbrowser) containing RNASeq data from the peripheral blood derived from 2116 Dutch individuals was used to define the remaining eQTLs of BANK1. This data set provides quantitative gene and exon expression data various cis-eQTLs and meQTLs, combined with the cell type and context-specific data. The RTeQTL [33] browser (http://eqtl.rc.fas.harvard.edu) was used to search for information on rs13106926.

Annotation Analysis
We used the genome annotation data assembled under the GenomeRunner project [34] to annotate the set of BANK1 associated SNPs. Genome annotations include the cell type specific histone modification marks, transcription factor binding sites, splice junction regions, and other regulation-related tracks from the ENCODE and Roadmap Epigenomics consortia [35,36]. The analysis was restricted to the cell type-specific regulatory marks, Gm12878, B-cell lymphoblastoid cell line for the ENCODE data, and the 'HSC_and_Bcell' tissue IDs for the Roadmap Epigenomics data. Only the annotations that overlap with at least one SNP are reported.
The enrichment p values were calculated using Fisher's exact test, and were adjusted for multiple testing using the false discovery rate (FDR) method. Briefly, the enriched co-localization of the set of BANK1 associated SNPs was assessed against the null hypothesis of the random co-localization. To evaluate the random co-localization, a set of all of the SNPs from the BANK1 region (chr4: 102614068-103075430, the human GRCh37/hg19 genome assembly) was obtained from dbSNP build 138, totaling 8772 'background' SNPs.

Conclusions
Our analysis of two populations showed that the eQTL SNPs could extend to various portions of the gene, but were indeed centered on the same major region of the BANK1 gene, surrounding exon two. The genetic association of BANK1 in AA individuals is particularly strong and supports an important role of the B cell-related pathways in the disease in those individuals. The linkage disequilibrium groups AA 13 and EUR 6 contained, what could be considered the best meta-analysis hits, the SNP rs71597109 and markers described as the best causal hits of the BANK1 association to SLE, such as the branch-point site SNP rs17266594 on intron two, and the R61H rs10516487 on exon two.
The presence of at least two independent genetic effects in AA shows that BANK1 contributes to the inheritance of risk through two separate haplotype blocks containing eQTLs. Funding: The work presented in this paper has been supported by the Ministerio de Economía y Competitividad, Spain (SAF2016-78631-P), partly co-financed by FEDER funds of the European Union, the Gustaf den V:e-80-års Fond and the Swedish Association against Rheumatism to M.E.A-R. In addition, this work was financed by the NIH P01 grant P01-AI-083194 to C.D.L., J.B.H., R.K., and M.E.A-R. JBH: NIH grants: R01 AI024717, U01 HG00866, P30 AR070549 and U01 AI130830 and the US Department of Veterans Affairs: I01 BX001834.C.D.L.: Center for Public Health Genomics. R.K.: NIH grant R01-AR33062. J.A.J.: NIH grants U54GM104938, P30AR053483.

Conflicts of Interest:
The authors declare no conflict of interest.