Expression of DDX11 and DNM1L at the 12p11 Locus Modulates Systemic Lupus Erythematosus Susceptibility

Objectives: This study employed genetic and functional analyses using OASIS meta-analysis of multiple existing GWAS and gene-expression datasets to identify novel SLE genes. Methods: Four hundred and ten genes were mapped using SNIPPER to 30 SLE GWAS loci and investigated for expression in three SLE GEO-datasets and the Cordoba GSE50395-dataset. Blood eQTL for significant SNPs in SLE loci and STRING for functional pathways of differentially expressed genes were used. Confirmatory qPCR on SLE monocytes was performed. The entire 12p11 locus was investigated for genetic association using two additional GWAS. Expression of 150 genes at this locus was assessed. Based on this significance, qPCRs for DNM1L and KRAS were performed. Results: Fifty genes were differentially expressed in at least two SLE GEO-datasets, with all probes directionally aligned. DDX11, an RNA helicase involved in genome stability, was downregulated in both GEO and Cordoba datasets. The most significant SNP, rs3741869 in OASIS locus 12p11.21, containing DDX11, was a cis-eQTL regulating DDX11 expression. DDX11 was found repressed. The entire 12p11 locus showed three association peaks. Gene expression in GEO datasets identified DNM1L and KRAS, besides DDX11. Confirmatory qPCR validated DNM1L as an SLE susceptibility gene. DDX11, DNM1L and KRAS interact with each other and multiple known SLE genes including STAT1/STAT4 and major components of IFN-dependent gene expression, and are responsible for signal transduction of cytokines, hormones, and growth-factors, deregulation of which is involved in SLE-development. Conclusion: A genomic convergence approach with OASIS analysis of multiple GWAS and expression datasets identified DDX11 and DNM1L as novel SLE-genes, the expression of which is altered in monocytes from SLE patients. This study lays the foundation for understanding the pathogenic involvement of DDX11 and DNM1L in SLE by identifying them using a systems-biology approach, while the 12p11 locus harboring these genes was previously missed by four independent GWAS.


Introduction
Systemic lupus erythematosus (SLE) is a complex disorder manifesting as a syndrome [1]. In SLE, identification of susceptibility genes is of great relevance for understanding specific pathobiological mechanisms and expanding the number of molecular targets for clinical testing and drug discovery. Despite the identification of a large number of genes in multiple SLE genome-wide association studies (GWAS) and candidate gene studies, together they explain only 15 to 30% of SLE heritability [2][3][4]. It is possible that several loci of modest significance remain to be discovered due to the complex, syndromic nature of SLE, along with other GWAS limitations. Multiple strategies have been applied to tease out such modest genetic effects including GWAS meta-analyses, inclusion of different populations, gene-based testing and biological pathway analysis [2,4].
A complex disease such as SLE could be thought of as a mixture of multiple resembling phenotypes, each a result of a separate mutation/susceptibility variant, pooled in GWAS cohorts [2]. Finding a particular gene, then, depends on the enrichment of a causal variant carrying haplotype in the study sample. Genotyping a multitude of single-nucleotide polymorphisms (SNPs) in GWAS leads to multiple testing and proportional signal problems, preventing modest associations from being distinguished from random noise. GWAS have majorly contributed to genetic discovery of complex disorders; however, genotyping an enormous number of variants leads to a high rate of false positivity. This is dealt with by multiple testing corrections, which directly result in a high false negativity rate leading to missing heritability, a major cause of disappointment with GWAS. The other approach to deal with the inherent false positivity has been association binning methods such as genebased testing. Such tests generally work by assigning the most significant, or a weighted p value to a gene. Binning methods, by decreasing the number of tests per study, potentially increase power and reduce the multiple-testing burden. Caveats with such approaches include skewing of the association signal due to weighting procedures and to the large extent of LD in historically more recent mutations. OASIS, based on linkage disequilibrium (LD) clustering (Available online: https://ldlink.nci.nih.gov/?tab=ldmatrix (accessed on 10 May 2021)), can be used to mine existing GWAS datasets for new genes [2,5]. OASIS provides an alternative to increasing sample size for GWAS by composite analysis, unifying two aspects of the LD phenomenon: strength of association and number of surrounding significant SNPs. A genomic convergence approach [6] mapping genetic association signals on expression data, and other biological studies, can then be used for verification of particular genes. The concept is that multiple datasets with results pointing in the same direction is evidence of a true scientific finding [6]. In genomic studies, the most frequent datasets that have been converged are genetic association studies highlighting a candidate gene and the expression of that gene in a biologically relevant tissue.
We mapped 410 genes to 30 SLE loci identified using OASIS analysis of two dbGAP GWAS datasets (6077 subjects; 0.75 million SNPs) [2]. Using a genomic convergence approach, these genes were investigated for expression in three SLE GEO datasets and a fourth from Cordoba, Spain. This methodology identified DDX11, located at the 12p11 locus, as an SLE susceptibility gene. The most significant SNP in GWAS is an eQTL modulating expression of DDX11. DDX11 is an RNA helicase involved in genome stability and is repressed in SLE. Since the 12p11 locus has previously been linked to familial SLE as well, we performed a comprehensive screen of the 12p11 locus (~50K SNPs) in two additional GWAS. This study confirmed DDX11 and further identified DNM1L and its eQTLs to be associated with SLE. DNM1L, involved in mitophagy, is upregulated in SLE. The expression of both DDX11 and DNM1L was confirmed to be significantly altered in SLE by quantitative PCR on SLE monocytes.
Convergence of results from several datasets of multiple data types, such as genetic association and expression, provides robust evidence of pathobiological involvement of genes. This study, by utilizing genomic convergence of the 12p11 locus with four European GWAS and four gene expression datasets, followed by eQTL and protein network analysis, identified DDX11 and DNM1L as novel SLE susceptibility genes.

Global Genomic Convergence
SNIPPER located 410 genes (Table S1) in 30 SLE loci identified by OASIS 2, and these were tested in three GEO expression datasets. Over 1000 expression probe sets in GEO datasets GSE30153, GSE13887 and >500 probe sets in GEO dataset GSE10325 were identified for the reference gene list. Significant t-tests narrowed the search to 103 probes in dataset 4193 (GSE30153), 141 probes in dataset 4719 (GSE13887) and 66 probes in dataset 4185 (GSE10325). An additional 215 probes were found to be significant in GSE10325: T-cells, B-cells and Monocytes datasets. Hence, a total of 525 statistically significant probe sets in six GEO datasets were finalized. This composite probe list was matched against the SNIPPER reference gene list of 410 genes to remove those that the GEO query included, but were not found in the OASIS SLE loci. Hence, 187 unique genes in the six GEO datasets were identified that had significant changes in expression and were located in the 30 SLE loci (data not shown).
Of the 187 candidate genes, 55 genes were found to be differentially expressed in at least two GEO datasets with all probes directionally aligned (Table S2). Ten genes crossed the false discovery rate (FDR) and seven crossed the Bonferroni correction (Table S2). DDX11, located in OASIS locus 19 (12p11.21), was downregulated but did not cross the FDR for expression. In the Cordoba expression dataset, GSE50395, 518 genes were found to have differential expression. Of these, 10 matched with the reference list of 410 genes (Table S2). BARHL1 had the most significant expression change (p = 8.9 × 10 −5 ) and DDX11 had the second highest expression change (p = 8 × 10 −3 ).

Locus 12p11-Specific Genomic Convergence
As 12p11 had previously been linked to SLE [7], this locus was further tested for association using OASIS in two additional GWAS (db3 and db4). Both these GWAS originally failed to identify association at this locus mainly due to statistical corrections for multiple testing. LocusZoom.js v0.12 plots showed the association signal at 12p11.21 ( Figure 1A-D). In db3, there was a modest association peak near DDX11 with the most significant SNP being rs7485934 (p = 1.45 × 10 −4 ) located at 31.1 Mb. In db4, rs11050576 located at 30.1 Mb was the most significant SNP (p = 5.81 × 10 −4 ) though the signal at 30-33 Mb was low (ranging mostly p = 1 × 10 −2 to 1 × 10 −3 ) compared to db3.
Genetic association of the entire 12p11 locus in db3 and db4 showed three association peaks located at 24-26 Mb, 30-33 Mb and 33-35 Mb ( Figure S1D). The first association peak containing KRAS was located upstream of the 12p11 locus and had low LD ( Figure S2).
LD was also modest in the second association peak containing DDX11 and DNM1L ( Figure S3). The third association peak was in a high LD, gene-sparse locus ( Figure S4).
The NCBI gene database identified 150 genes at the 12p11 locus, which were tested for significant expression changes in GEO expression datasets as described above. Besides DDX11, two genes were found to be differentially expressed in at least two GEO datasets with all probes directionally aligned viz DNM1L (GSE13887: p = 0.046; GSE10325: p = 0.041) and KRAS (GSE13887: p = 0.03; GSE10325: p = 0.02). Both these genes were upregulated in SLE.

Protein Network Analysis
Protein network analysis using STRING of 187 genes (Table S2) showed a complex interaction network. DDX11 interacted indirectly with several genes found to be significant in OASIS loci and GEO datasets. DDX11 interaction with STAT1/STAT4 was mediated via RIF1, RBM25 and then BAZ1A ( Figure 3A). Interestingly, the remaining 132 genes included several known SLE genes such as IRF5, BLK, TNIP1 and CD44; however, these genes were either significant in a single GEO dataset only or did not have uniform expression. STRING analysis provided further supportive evidence that DDX11 is potentially involved in SLE pathobiology. Interaction analysis of 55 genes (Table S3), showed that DDX11 interacted directly but modestly with MED6 (interaction score 0.31) ( Figure 3B). DNM1L interacted with PSMD14 (0.41), and this link connected it with KRAS via TNPO3 (0.17) ( Figure 3B). Clustering using k-means into three groups showed that DDX11, DNM1L and KRAS interacted through the PSMD14 node with major SLE genes, STAT1/STAT4 and IFIH1. Interestingly, PSMD14, located in an OASIS significant locus on chromosome 2, crossed the FDR as well as Bonferroni corrections (Table S3) and is, therefore, an important candidate gene for SLE.

Discussion
This study employed a genetic and functional analysis using OASIS meta-analysis of multiple existing GWAS and gene expression datasets to identify novel SLE genes. The most important finding of this study was the identification of DDX11 and DNM1L as SLE susceptibility genes at the 12p11 locus. DDX11 was downregulated in both the GEO and the Cordoba SLE expression datasets. DDX11 downregulation in SLE was confirmed using qPCR on monocytes. The most significant SNP, rs3741869, in OASIS locus 19 containing the gene DDX11, was a cis-eQTL regulating the expression of DDX11. DNM1L was upregulated in SLE, and several SNPs associated in GWAS db1 and db2 function as eQTLs regulating its expression. Resequencing studies will identify the functional variant(s) in DDX11 and DNM1L. It may be possible that their expression is influenced by haplotypes of eQTL SNPs, rather than by a single variant. Both DDX11 and DNM1L interact with multiple known SLE genes including STAT1/STAT4 and IFIH1, all of them closely associated with SLE development, and also identified using genomic convergence of OASIS loci and expression analysis. This study provides a major step forward in the SLE genetics of 12p11 locus by extensively mapping the association signal in four independent GWAS that had originally designated this locus as negative, functionally identifying DDX11 and DNM1L as susceptibility genes whose expression is altered in SLE and identifying the potentially responsible eQTLs. Further supportive evidence of the involvement of these genes in SLE pathways was provided by protein network analysis.
The DEAD/H box helicase, DDX11, is an RNA helicase involved in genome stability. DDX11 mutations cause the Warsaw Breakage Syndrome (WABS), which shows features of genome instability similar to Fanconi anemia (FA) [8]. DDX11 functions as an FA pathway backup, and its deficiency leads to impaired strand repair [9], while its inhibition caused apoptosis in melanomas [10]. Previously, another melanoma-associated gene, Melanoma differentiation antigen 5 (MDA5), has been shown to be important in SLE [1]. Moreover, DDX11 is involved in immunoglobulin diversification [9], which makes it an even more interesting candidate gene for SLE.
DNM1L has been shown to alter mitophagy in T and B cells of SLE mouse models [11,12]. Mitophagy is autophagic removal of dysfunctional mitochondria and is important in immune regulation. Altered mitophagy led to auto-antibody production and lupus nephritis [11,12]. DNM1L expression modulates mitophagy, though heterogeneously, and needs further study [13].
Wide mapping of the 12p11 locus in four GWAS identified two further association peaks. KRAS was the only gene at 24-26 Mb (upstream of 12p11) with significantly altered expression in GEO datasets. However, this finding could not be replicated by qPCR of SLE and HD. KRAS remains an interesting SLE candidate gene because novel mutations have been identified in a few patients using next-generation sequencing [14][15][16]. The second association peak at 34 Mb is in a high LD, gene-sparse region. None of the genes located here showed significant expression change in SLE. This signal could be due to: (i) LD with DDX11 and DNM1L SLE susceptibility variant(s); (ii) resulting from population stratification or (iii) harboring a novel gene. Given the high LD in the region and its proximity to the DDX11/DNM1L locus, it is likely that the signal is due to LD with this locus.
Furthermore, this study identified several other SLE candidate genes such as PSMD14, which need additional data for verification. Hence, the replication-based genomic convergence approach with OASIS, and expression studies, can help identify novel SLE genes. This was previously not possible even with gene-based testing [2]. The identification of DDX11 and DNM1L as SLE susceptibility genes will provide further insight into SLE pathogenesis. Limitations of this study include a need for larger sample sizes to verify the expression change in DDX11 and DNM1L, and correlation with various sub-phenotypes of SLE to identify the risk phenotype. Further functional studies including proteomic, immunohistopathologic, in vitro and in vivo strategies would help to confirm the pathogenic relevance of these genes and may provide diagnostic therapeutic targets for SLE modulation. This study lays the foundation for understanding the pathogenic involvement of DDX11 and DNM1L in SLE by identifying them using a systems-biology approach, while the 12p11 locus harboring these genes was previously missed by four independent GWAS.

eQTL and Statistical Analysis
Association plots of the 12p11 locus were obtained using LocusZoom.js vs0.12 [Available online: http://locuszoom.org/ (accessed on 10 May 2021)] for all SNPs in db1 and db2. However, due to the large number of SNPs in db3 and db4, the most significant SNPs in OASIS analysis using a 20 kb window were plotted with LocusZoom. Significant SNPs in loci harboring differentially expressed genes in SLE were tested using Blood eQTL Browser (expression quantitative trait loci [Available online: https: //genenetwork.nl/bloodeqtlbrowser/ (accessed on 10 May 2021)] [22]. Statistical analysis was performed using SSPS V.15.0 (SPSS Inc., Chicago, IL, USA). For DDX11, DNM1L and KRAS expression, after normalization using GAPDH as internal control, statistical significance was assessed using t-tests. Receiver Operator Characteristic curves and P-P plots were generated using SPSS. Graphs were generated using GraphPad Prism 8.4.3.

Quantitative Real-Time PCR (qPCR)
Changes of DDX11 were validated in an independent cohort of 31 SLE patients and 32 healthy donors (HD) from Cordoba, Spain (Table 1) by quantitative real-time RT-PCR using a LightCycler thermal cycler system (Roche Diagnostics, Indianapolis, IN, USA), using GAPDH as housekeeping gene. Specific primers for DDX11 were previously reported [23]. These samples were collected after obtaining approval from the ethics committee of the Reina Sofia Hospital, Cordoba. All subjects provided written informed consent. Briefly, RT was performed using an NZY First-Strand cDNA Synthesis Kit Reverse Transcription Kit (Nzytech, Lisbon, Portugal), following the manufacturer's instructions. For the qPCR, the LightCycler thermocycler system (Roche Diagnostics, Indianapolis, IN, USA) was used. The reaction was carried out with SYBR ® Green (Promega Biotech, Madrid, Spain) according to the manufacturer's instructions. Expression of DDX11 was corrected by the geometric average of α-actin (ACT) and glyceraldehyde-3-phosphate dehydrogenase (GAPDH). The data were analyzed by the 2-∆∆Ct method. The expression of DNM1L and KRAS was similarly assessed in 20 SLE cases and 20 HD controls, as described for DDX11 above.

Protein Network Analysis
All genes with significant and consistent directional change in expression in the GEO datasets were tested for protein-protein interactions using STRING [Available online: http://string-db.org (accessed on 10 May 2021)] [24]. Gene pairs that were either coexpressed or involved in experimentally validated interactions with at least a low score (0.15) were evaluated. Funding: This study was supported by grants from the Instituto de Salud Carlos III (PI18/00837), cofinanciado por el Fondo Europeo de Desarrollo Regional de la Unión Europea "Una manera de hacer Europa," Spain, and the Spanish Inflammatory and Rheumatic Diseases Network (RIER), Instituto de Salud Carlos III (RD16/0012/0015). C.L.-P. was supported by a contract from the Spanish Junta de Andalucía ("Nicolas Monardes" program). A.I.-C. was supported by a "Sara Borrell" contract (CD19/00255).

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of the Reina Sofia University Hospital (protocol code 4183; date of approval: 29 January 2021).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
All data relevant to the study are included in the article or uploaded as Supplementary Materials. All datasets are in publicly available repositories with their accession numbers in the manuscript.