Functional SNPs in the Human Autoimmunity-Associated Locus 17q12-21

Genome-wide association studies (GWASes) revealed several single-nucleotide polymorphisms (SNPs) in the human 17q12-21 locus associated with autoimmune diseases. However, follow-up studies are still needed to identify causative SNPs directly mediating autoimmune risk in the locus. We have chosen six SNPs in high linkage disequilibrium with the GWAS hits that showed the strongest evidence of causality according to association pattern and epigenetic data and assessed their functionality in a local genomic context using luciferase reporter system. We found that rs12946510, rs4795397, rs12709365, and rs8067378 influenced the reporter expression level in leukocytic cell lines. The strongest effect visible in three distinct cell types was observed for rs12946510 that is predicted to alter MEF2A/C and FOXO1 binding sites.


Introduction
The study of human 17q12-21 locus began with the discovery of its association with inheritable predisposition to breast cancer [1]. Later, strong association with asthma was also discovered [2]. Since the identification of the first single-nucleotide polymorphisms (SNPs) in the locus that strongly correlate with susceptibility to childhood asthma and ORMDL3 gene expression level [3], 17q12-21 has been the most replicated and most significant asthma locus defined by genome-wide association studies (GWASes) and meta-analyses of GWASes, and the list of candidate genes was substantially extended [4]. More recently, several GWASes also mapped autoimmunity association signals to 17q12-21 for primary biliary cirrhosis [5,6], rheumatoid arthritis [7], type I diabetes [8], ulcerative colitis [9], and Crohn's disease [10]. 17q12-21 has also been linked to allergy [11]. Interestingly, some variants have opposing risk alleles for autoimmunity and asthma. Taken together, these data highlight 17q12-21 as an important locus for immune system in general.
Autoimmunity-associated SNPs in the 17q12-21 locus overlap with several genes: NEUROD2, PPP1R1B, STARD3, TCAP, PNMT, PGAP3, ERBB2, MIR4728, MIEN1, GRB7, IKZF3, ZPBP2, and GSDMB. The role of these genes in the immune system is not obvious, with the exception of IKZF3 that encodes Aiolos transcription factor and is particularly important for normal development and function of B cells [12]. Considering also the fact that regulatory SNPs may influence gene expression over very long distances, identification of causative variants and their target genes becomes an important task on our way to deeper understanding of molecular mechanisms of autoimmunity. While the target genes can be discovered using genome editing, the true causative SNPs in the locus must be identified first.

SNPs Selection
We used the results of fine-mapping of autoimmunity-associated loci performed by Farh et al. [16] and of asthma-associated 17q12-21 SNPs performed by Schmiedel et al. [17]. The list of candidate causal SNPs for 39 immune and non-immune diseases and enhancer annotations from the former work was downloaded from the data portal [19]. For each autoimmune GWAS hit in the dense 17q12-21 autoimmune SNP cluster (GRCh37/hg19 chr17:37740161-38066240) we picked a candidate with the highest PICS (probabilistic identification of causal SNPs) probability (PP) among those that overlap any immune enhancer. Of 136 polymorphisms analyzed by Schmiedel et al., we chose SNPs highlighted by the authors for their overlap with DNase I hypersensitivity sites highly specific for lymphocytes ( Figure 3a in [17]).

PCR Mutagenesis
After sequencing of cloned enhancers alternative SNP alleles were introduced by overlap extension PCR using the following primers (Table 3). Table 3. Primers used for PCR mutagenesis.

Activation of U-937
The protocol of activation was developed previously [37]. Cells were seeded in a culture-treated flask at 0.8 mln/mL density and 10 ng/mL of phorbol 12-myristate 13-acetate (Sigma-Aldrich, St. Louis, MO, USA) was added to culture medium. Twenty-four hours later 1 µg/mL of lipopolysaccharide (LPS; E. coli O111:B4, L2630, Sigma-Aldrich, St. Louis, MO, USA) was also added for another 3 h. After electroporation, cells were transferred to fresh medium containing 1 µg/mL LPS.

Luciferase Reporter Assay
Cells were transfected using Neon™ Transfection System (Thermo Fisher Scientific, Waltham, MA, USA) with the following parameters (Table 4). For two million cells in 100 µL of buffer R we added 0.5 µg of pRL-CMV Renilla luciferase control vector (Promega, Fitchburg, WI, USA), 1.2 pmol of the pGL3-based test vector, and salmon sperm DNA (Sigma 31149) to the total DNA amount of 26.5 µg. Luciferase activity was measured 24 h after electroporation using Dual-Luciferase Reporter Assay System (Promega, Fitchburg, WI, USA), Hidex Bioscan Plate Chameleon Luminometer and MicroWin 2000 software. Minimum of two independent experiments were performed, each including three to four technical replicates.

Statistical Analysis
Statistical significance of the difference in the luciferase signal between alternative SNP alleles was determined using two-tailed unpaired Student's t-test. Calculation was performed in Microsoft Excel. Data were represented as mean ± SEM.

Selection of 17q12-21 SNPs for Experimental Validation
A mathematical algorithm developed by Farh et al., called PICS [16], assigns each variant a probability of being causative.
Schmiedel and co-authors [17] used epigenetic data to analyze 136 asthma-associated SNPs in the 17q21 locus and found that three of them overlapped DNase I hypersensitivity sites highly specific for lymphocytes: rs12946510 (already included in our list), rs2313430, and rs4795397. The latter two appeared to be also associated with ulcerative colitis as was shown in a GWAS not covered by Farh et al. [38]. All candidate SNPs are in strong linkage disequilibrium (LD) with each other in European ancestry (Figure 1b). Short summary of the selected candidate SNPs is present in Table 5. More details can be found in Supplementary Table S1.

Possible Target Genes of the Selected Candidate SNPs
In our previous works we studied influence of regulatory SNPs on the promoter activity of the nearest genes as they were the most probable targets for the studied variants. However, functional polymorphisms do not always regulate expression of the nearest gene [25,[45][46][47][48][49]. A number of efforts were made to identify eQTL in several human cell types and tissues, including whole blood, lymphoblastoid cell lines, and some major populations of primary leukocytes [20][21][22][23][24][25][26][27][28][29][30][31][32][33][34][35]50]. These data have shown that regulatory SNPs can indeed be located several million bases away from their target genes or even at another chromosome, and that transcriptional regulation they exert is often specific for cell type and functional state of the cells (for example, activation mode). We performed a publication search for eQTL in primary leukocytes, whole blood, and lymphoblastoid cell lines, and found evidence that selected SNPs may be associated with the expression levels of at least eight genes within the 100 kb distance (Table 6). Therefore, we decided to use a minimal synthetic promoter to study the influence of selected SNPs on enhancer activity in the luciferase reporter assay.

Effects of the Selected Variants on Transcription in Luciferase Reporter Assay
We cloned putative enhancers containing selected SNPs into luciferase reporter vector to test how each variant could affect enhancer properties of the surrounding genomic DNA in a range of human leukocytic cell lines. As gene targets of the selected SNPs are not precisely known, we used an artificial minimal promoter from the pGL4.24 vector. After determining the likely borders of enhancers surrounding each SNP (Figure 1a), corresponding DNA fragments were amplified from human genomic DNA and inserted downstream of the luciferase gene. Alternative alleles were introduced by overlap extension PCR.

PSMD3
rs12709365/ rs13380815 primary monocytes (24 h LPS-stimulated) by LD (r 2 = 0.874) [20] * "Direct data" type of evidence indicates that a candidate SNP was present on the genotyping chip used in the corresponding study. Otherwise, we looked for expression quantitative trait loci (eQTLs) in strong linkage disequilibrium (LD; r 2 > 0.8 according to the 1000 Genomes Pilot Project [44]) with any of the candidate SNPs. We provide here the variant in the highest LD with an index eQTL, or a pair of variants if they are equally associated. LPS = lipopolysaccharide; IFNγ = interferon gamma; EBV = Epstein-Barr virus.

Effects of the Selected Variants on Transcription in Luciferase Reporter Assay
We cloned putative enhancers containing selected SNPs into luciferase reporter vector to test how each variant could affect enhancer properties of the surrounding genomic DNA in a range of human leukocytic cell lines. As gene targets of the selected SNPs are not precisely known, we used an artificial minimal promoter from the pGL4.24 vector. After determining the likely borders of enhancers surrounding each SNP (Figure 1a), corresponding DNA fragments were amplified from human genomic DNA and inserted downstream of the luciferase gene. Alternative alleles were introduced by overlap extension PCR.  [51,52] view of the candidate functional SNPs (black lines) associated with autoimmune diseases in the 17q12-21 locus and the corresponding putative enhancers (highlighted in green). Histone acetylation tracks provided by ENCODE [53,54] illustrate enhancer activity in a variety of primary human leukocytes. Gene schemes are shown in purple. Thin lines correspond to introns, thicker parts correspond to exons with coding sequences being the thickest. Direction of transcription is indicated by arrows next to gene names. DHS-DNaseI hypersensitivity clusters in 125 cell types from ENCODE (V3). TF binding-ChIP-seq peaks for 161 transcription factors in 91 cell types from ENCODE. (b) Linkage disequilibrium between candidate SNPs (r 2 in CEU (Utah Residents with To maximize the chance to observe the putative enhancers work, we used a variety of human leukocytic cell lines for transfection ( Table 7).
The results of the luciferase assay are shown in Figure 2. As alternating alleles have similar frequencies in European population for all the studied SNPs, we refer to them as risk and non-risk instead of major and minor alleles. For rs12946510, the difference in luciferase expression between two allelic variants was statistically significant in both B cell lines and activated U-937. Direction of the effect was cell-dependent. Risk allele inhibited enhancer activity in mature B cell line MP1, contrary to pre-B cell line Nalm6 and activated U-937 where it boosted luciferase expression.
Among two candidate SNPs linked to the index variant rs2305480, only rs4795397 showed weak but statistically significant influence on luciferase activity. Its risk allele corresponded to lower luciferase signal in MP1 cells.
In another pair of tightly linked candidates, rs12709365 and rs13380815, associated with the same GWAS hit, only rs12709365 showed statistically reliable inhibition of luciferase expression by the risk allele in activated U-937.
Finally, risk allele of rs8067378 corresponded to twice higher reporter signal in MT-2 cell line.

Figure 2.
Effects of the candidate functional SNPs on enhancer activity in luciferase reporter assay. Two allelic variants of putative enhancers containing each tested SNP were cloned into modified pGL3 vector downstream of the firefly luciferase gene. A panel of leukocytic cell lines were transfected with these reporter constructs together with pRL. Luminescence was measured 24 h after. Relative firefly to Renilla signal was normalized to the value obtained from the promoter-only control. White boxes-protective alleles; black boxes-risk alleles; green boxes-size-matched negative controls. N ≥ 6. Mean ± SEM. * p < 0.05 for the difference between alternative alleles.
For rs12946510, the difference in luciferase expression between two allelic variants was statistically significant in both B cell lines and activated U-937. Direction of the effect was cell-dependent. Risk allele inhibited enhancer activity in mature B cell line MP1, contrary to pre-B cell line Nalm6 and activated U-937 where it boosted luciferase expression.
Among two candidate SNPs linked to the index variant rs2305480, only rs4795397 showed weak but statistically significant influence on luciferase activity. Its risk allele corresponded to lower luciferase signal in MP1 cells.
In another pair of tightly linked candidates, rs12709365 and rs13380815, associated with the same GWAS hit, only rs12709365 showed statistically reliable inhibition of luciferase expression by the risk allele in activated U-937.
Finally, risk allele of rs8067378 corresponded to twice higher reporter signal in MT-2 cell line. We next analyzed in silico which transcription factors could differentially bind to alternative alleles of the studied SNPs using PERFECTOS-APE software [63,64] with JASPAR collection of binding motifs [65] (Supplementary Table S2). The most potent polymorphism among our candidates-rs12946510-was predicted to alter binding affinity of five transcription factors to the corresponding DNA more than 20-fold: MEF2A, SOX3, MEF2C, FOXO1, and SOX6. Of them, MEF2C and FOXO1 are highly expressed in MP1 cell line, while SOX6 and MEF2A are scarce and SOX3 is totally absent [66]. ChIP-seq data from ENCODE is available for MEF2A and MEF2C, and their signals overlap rs12946510 in the lymphoblastoid cell line GM12878 (Figure 3). Therefore, in MP1 cells rs12946510 could act through creating MEF2A/MEF2C binding site or by disrupting the binding site for FOXO1. We next analyzed in silico which transcription factors could differentially bind to alternative alleles of the studied SNPs using PERFECTOS-APE software [63,64] with JASPAR collection of binding motifs [65] (Supplementary Table S2). The most potent polymorphism among our candidates-rs12946510-was predicted to alter binding affinity of five transcription factors to the corresponding DNA more than 20-fold: MEF2A, SOX3, MEF2C, FOXO1, and SOX6. Of them, MEF2C and FOXO1 are highly expressed in MP1 cell line, while SOX6 and MEF2A are scarce and SOX3 is totally absent [66]. ChIP-seq data from ENCODE is available for MEF2A and MEF2C, and their signals overlap rs12946510 in the lymphoblastoid cell line GM12878 (Figure 3). Therefore, in MP1 cells rs12946510 could act through creating MEF2A/MEF2C binding site or by disrupting the binding site for FOXO1. . C to T substitution dramatically alters motif P-value for MEF2A and moderately for MEF2C and FOXO1 (fold-change is given according to PERFECTOS-APE [63,64]).

Discussion
When autoimmunity-associated SNPs from the GWAS Catalog [67] are visualized in the Genome Browser [51,52], a dense cluster can be seen in the human 17q12-21 locus. This area contains

Discussion
When autoimmunity-associated SNPs from the GWAS Catalog [67] are visualized in the Genome Browser [51,52], a dense cluster can be seen in the human 17q12-21 locus. This area contains hundreds of SNPs in strong LD (r 2 > 0.8), and each can potentially be causative, i.e., underlie the molecular mechanism that modulates disease risk. Several approaches exist to narrow down the list of candidate causative variants (fine-mapping) having their own strengths and weaknesses, including simple heuristic methods, penalized regression, Bayesian methods, trans-ethnic fine-mapping, genomic annotation, and integration of gene expression data with GWAS data [15,68].
We used the results of two fine-mapping studies. Farh and colleagues [16] developed PICS algorithm that makes use of densely-mapped genotyping data to estimate each SNP's probability of being a causal variant given the observed pattern of association at the locus and combined this excellent mathematical tool with experimental data on histone modifications in primary human cells. Schmiedel et al., analyzed 136 asthma-associated SNPs in the 17q12-21 locus for their coincidence with DNase hypersensitivity sites in 62 primary cell types [17]. Of the six predicted autoimmunity-associated functional SNPs in the 17q12-21 locus, four showed ability to alter reporter expression level in our experimental system: rs12946510, rs4795397, rs12709365, and rs8067378-one for each index SNP.
Polymorphism rs12946510 showed the brightest results in our experimental follow-up. The highest difference between alternative alleles was observed for this SNP and it was the only candidate which revealed its functionality in several cell types. This variant was described in detail by Farh et al., as an example of a polymorphism with bright genetic and epigenetic evidence of causality which simultaneously is an eQTL [16]. Its risk allele is associated with decreased IKZF3 expression in peripheral blood [24]. IKZF3 encodes Aiolos transcription factor which is first detected at low levels in pro-B cells and is upregulated in pre-B and mature peripheral B cells. Aiolos is necessary for normal B cell development and generation of long-lived, high-affinity bone marrow plasma cells [12]. In our reporter system, risk allele of rs12946510 also inhibited transcription in mature B cell line MP1. In subjects carrying the risk allele of rs12946510 this variant is likely to decrease enhancer activity in mature B cells resulting in lower Aiolos expression and impaired B cell function. This, in turn, can lead to autoimmunity through spontaneous production of autoantibodies or other mechanisms [69].
An opposing effect of rs12946510 on luciferase expression was observed in Nalm6 and activated U-937, whose primary counterparts (pre-B cells and activated monocytes respectively) are not normally present in peripheral blood. Given that promoter-enhancer pairs are often cell-type specific [70], the question which genes in these cells may be sensitive to rs12946510 requires further investigation.
Most likely, in MP1 cell line rs12946510 acts through creating MEF2C binding site or by disrupting the binding site for FOXO1. MEF2C is B cell specific in mice [71]. It orchestrates lineage commitment and is needed for normal development and functioning of B cells [72]. FOXO1 is another transcription factor important for B cell development. It promotes and stabilizes specification to the B cell lineage [73].
Two possible explanations can be proposed for divergent effect of rs12946510 on enhancer activity in MP1 (inhibition by the risk allele) and Nalm6/U-937 (activation by the risk allele). This divergence is possible if the allelic effect is mediated through altered MEF2C binding in both cases, as MEF2C was previously shown to be either activating or repressing depending on its protein partners [72]. Alternatively, bidirectional effect observed in different cell types could be explained by effective predominance of either one or another transcription factor which binding sites are altered by rs12946510, because risk allele creates MEF2C binding motif while disrupting it for FOXO1.
Two candidate polymorphisms associated with the same GWAS hit, rs13380815 and rs12709365, could not be differentiated by fine-mapping strategy based on genomic annotation, because they are separated by only 183 nucleotides on the chromosome and overlap the same epigenetic marks. They cannot be prioritized by mathematical, trans-ethnic, or gene expression approaches ether, due to absolute LD (r 2 = 1) in all populations studied by the 1000 Genomes Project. Only experimental testing enabled to pinpoint one of them-rs12709365-as a possible causative SNP for inflammatory bowel disease and rheumatoid arthritis.
Fine-mapping strategy used by Schmiedel et al. [17] could not discriminate between two possibly functional SNPs, rs2313430 and rs4795397, associated with the same index SNP. In contrast, PICS algorithm, which takes into account the statistical data and linkage structure of the locus, assigns to these two variants very different probabilities of being causative: 0.0006 for rs2313430, and 0.0113 for rs4795397. In line with this prediction, rs4795397 showed statistically significant effect on luciferase expression. However, this data should be taken with caution as the surrounding DNA sequence did not demonstrate any enhancer properties in our system despite strong H3K27Ac and DHS signals in primary immune cells.
Polymorphism rs8067378 is an index SNP for primary biliary cirrhosis and at the same time it has the highest probability of being causative among its LD group members according to PICS. Verlaan et al., mentioned it as one of the top three allelic expression-associated SNPs in the locus, observed in both the CEU (Utah Residents with Northern and Western European Ancestry) and YRI (Yoruba in Ibadan, Nigeria) populations. These authors also found allelic differences in protein binding for this polymorphism using electrophoretic mobility shift assay, however they could not detect any allelic effect in luciferase reporter system [74]. This discrepancy between our results probably comes from a different study design used by Verlaan et al., because a three-times shorter genomic segment was used as an enhancer which did not improve expression from the SV40 promoter, and only one leukocytic cell line was used for transfection (Jurkat) in which we did not detect any significant effect either.
Identification of causative polymorphisms in each GWAS locus is the first step towards our understanding of how non-coding genetic variation translates into disease. This understanding can be built through identification of intermediate phenotypes, such as gene expression and functional differences between immune cells bearing alternative alleles. Studying narrow sub-populations of primary cells from donors of different genotypes can provide useful insights, however addressing the question of causal relationships between intermediate phenotypes and disease requires additional approaches, including Mendelian randomization and longitudinal studies [14]. Genetic engineering can also help to address this point without a need for huge sample sizes if adequate ex vivo cell systems are used.
In conclusion, using systematic approach we found four autoimmunity-associated SNPs in the human 17q12-21 locus with strong statistical, epigenetic and functional evidence of causality. Their target genes in various cell types are of great interest and can be further identified using genome editing coupled with transcriptomics. Funding: Most of this work was supported by the Russian Science Foundation (http://rscf.ru/en), grant # 14-14-01140. Publication mining for eQTL ( Table 6) was supported by the Program of fundamental research for state academies for 2013-2020, research topic 01201363823.

Conflicts of Interest:
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.