Approaching Shared Pathophysiology in Immune-Mediated Diseases through Functional Genomics

Immune-mediated diseases (IMDs) are complex pathologies that are strongly influenced by environmental and genetic factors. Associations between genetic loci and susceptibility to these diseases have been widely studied, and hundreds of risk variants have emerged during the last two decades, with researchers observing a shared genetic pattern among them. Nevertheless, the pathological mechanism behind these associations remains a challenge that has just started to be understood thanks to functional genomic approaches. Transcriptomics, regulatory elements, chromatin interactome, as well as the experimental characterization of genomic findings, constitute key elements in the emerging understandings of how genetics affects the etiopathogenesis of IMDs. In this review, we will focus on the latest advances in the field of functional genomics, centering our attention on systemic rheumatic IMDs.


Introduction
Immune mediated diseases (IMDs) are a diverse group of pathologies with different etiologies, characterized by a dysregulation of the immune system. These diseases show different effects on the organism, including either systemic or local symptoms, which may overlap between the diseases [1]. This complexity makes their diagnosis a clinical challenge, as different IMDs are found to have shared comorbidities and may co-occur in the same patient. A common example is cardiovascular disorders, which are present in several of these diseases [2][3][4][5], as well as the presence of autoantibodies, which have great clinical and diagnostic significance [6,7].
Thus, the high rates of familial clustering and comorbidities observed across IMDs indicate that they share molecular mechanisms of disease pathogenesis [8]. In the last decades, large-scale genetic studies, such as genome-wide association studies (GWAS) and Immunochip studies [9,10] have been essential to our understanding of IMD genetics, allowing the identification of a considerable number of loci associated with each individual disease [11][12][13][14], but also suggesting the existence of a common genetic background in autoimmunity [8].
the disease are not clear [15,16]. In this regard, functional genomics is very useful in order to identify the mechanism of action of disease-associated variants and therefore the mechanisms underlying complex diseases, thus allowing progress in translating genetic findings to the clinic [17,18]. As shown in Figure 1, the process from the identification of a disease-associated variant to its characterization at the phenotypic level includes different functional assessments, which differ depending on the genomic location of the variant.
The purpose of this review is to highlight the latest advances in the field of functional genomics, with a particular focus on systemic rheumatic diseases.

Figure 1.
Overview of different techniques used in functional genomics. These techniques cover the spectrum from early association studies, through techniques that explore the physical interaction of these variants and their effects on the transcriptome, to phenotype characterization and gene function. GWAS: genome-wide association studies; WES: whole-exome sequencing; WGS: whole-genome sequencing; sc-RNA-seq: single-cell RNA sequencing, TALEs: Transcription activator-like effectors; CRISPR: clustered regularly interspaced short palindromic repeats; Cas9: CRISPR associated protein 9.

Shared Genetics Across IMDs
In recent years, many efforts have been made in order to identify risk loci shared among IMDs by combining GWAS or Immunochip data across several diseases. This strategy allows a direct comparison of the genetic component of these diseases, as well as an increase in statistical power to detect associations with low-effect variants. To date, a considerable amount of pairwise cross-disease meta-analyses of GWAS data of systemic rheumatic diseases has been published [19- Overview of different techniques used in functional genomics. These techniques cover the spectrum from early association studies, through techniques that explore the physical interaction of these variants and their effects on the transcriptome, to phenotype characterization and gene function. GWAS: genome-wide association studies; WES: whole-exome sequencing; WGS: whole-genome sequencing; sc-RNA-seq: single-cell RNA sequencing, TALEs: Transcription activator-like effectors; CRISPR: clustered regularly interspaced short palindromic repeats; Cas9: CRISPR associated protein 9.
The purpose of this review is to highlight the latest advances in the field of functional genomics, with a particular focus on systemic rheumatic diseases.

Shared Genetics across IMDs
In recent years, many efforts have been made in order to identify risk loci shared among IMDs by combining GWAS or Immunochip data across several diseases. This strategy allows a direct comparison of the genetic component of these diseases, as well as an increase in statistical power to detect associations with low-effect variants. To date, a considerable amount of pairwise cross-disease meta-analyses of GWAS data of systemic rheumatic diseases has been published [19][20][21], which has led to the identification of many risk loci shared between pairs of these diseases. Furthermore, five studies combining the GWAS or Immunochip data of multiple IMDs simultaneously have been published, thus identifying a total of 75 new shared risk loci with some degree of pleiotropy in autoimmunity, which could partially explain the comorbidity observed among IMDs [22][23][24][25][26]. Good examples of known shared risk loci in IMDs are PTPN22, IL23R and TNFAIP3, which have allowed the repositioning of anti-TNF and anti-IL-12/IL-23 therapies to be used in rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE), among others [1].
On the other hand, a recent study identified shared germline variants that predispose patients to RA, SLE and primary Sjögren's syndrome (SjS) through whole-exome sequencing of 31 families, highlighting related T-cell-activating genes [27]. This familial aggregation suggests that a specific molecular pattern, leading to common pathogenesis in certain IMDs, could exist. Furthermore, Li et al. [22] quantified pairwise genetic sharing across 17 IMDs from the Immunobase resource, revealing a closer association among major systemic IMDs (including RA, SLE and systemic sclerosis (SSc)) than with other autoimmune disorders, such as psoriasis and inflammatory bowel disease. These studies support the idea that genetic pathways are shared among apparently clinically different IMDs and therefore a molecular reclassification of these diseases could lead to the discovery of new biomarkers for patient stratification and prognosis [28]. In line with this, a recent study stratified seven systemic IMDs into groups of molecular patterns, taking into account high dimensional molecular data including genome, transcriptome, and methylome data from whole blood samples, performing an unsupervised clustering analysis. Authors observed that systemic IMDs clustered in three different groups, representing "inflammatory", "lymphoid" and "interferon" groups, with specific molecular patterns independently from their clinical classification [29].
The emergence of GWAS and associated genotyping platforms led to an increase in the number of variants associated with complex diseases, allowing for the development of more accurate genomic risk score (GRS) calculations, a direct application of genomic data to the clinical setting. GRS measures the additive effect of single nucleotide polymorphisms (SNPs), calculating the relative risk of individuals suffering from a given disease [30][31][32]. GRSs have been applied to different IMDs such as SLE, RA, SSc and psoriatic arthritis (PsA) [33][34][35][36]. Furthermore, genomic data can be useful for making a differential diagnosis, which is especially interesting in the case of inflammatory arthritis, since these conditions present with similar symptoms in the early stages [37]. In addition, the different relevant symptoms of the disease can be used to increase the predictive power of GRS, such as the appearance of lupus nephritis in SLE [33], or the appearance of autoantibodies in SSc [35].

Transcriptomic Studies
The analysis of the transcriptomic profile has been a major advance in genomics, especially thanks to the development of RNA sequencing (RNA-seq) technology [38], which has led to relevant findings in IMDs [39]. In this regard, transcriptomics has emerged as a useful approach to elucidate the target genes and pathways affected by genetic variants [40]. With GWAS, a global picture of the genetic variation across the genome influencing disease has been obtained. However, as previously mentioned, one of the main challenges in assigning biological meaning to GWAS findings is the fact that most disease-associated variants are located in non-coding regions of the genome. In recent years, it has been shown that these GWAS variants act as expression quantitative trait loci (eQTL), thus influencing disease by affecting gene expression levels through different mechanisms.
However, in the last few years, it has become clear that different cells show different gene expression profiles within a cell type, making it essential to evaluate the transcriptomic profile at single-cell resolution. The technique of single-cell RNA sequencing (scRNA-seq) [41][42][43] has made it possible to analyze the transcriptional profiles of key cells in the development of multiple and well-known diseases [44,45]. In this way, researchers have gone from studying large cell populations based on surface molecules to being able to differentiate cell subpopulations thanks to their gene expression profiles within a classical classification [46]. However, correctly identifying the true cell types present in a sample remains a challenge. In this regard, several bioinformatics tools have been recently developed in order to address this issue [47]. For example, some methods, such as that of Seurat [48] and Scanpy [49], allow for the clustering of subpopulations using the method of analyzing shared nearest neighbors, which mixes principal component analysis and graphical analysis.
Currently, there are no published studies of scRNA-seq among cross-diseases in IMDs, but given the relevance and impact of this technique; it is important to highlight existing studies in separate IMDs. In addition, it is expected that this novel approach will allow for the identification of common expression patterns and molecular pathways in autoimmunity. Several studies of scRNA-seq have been performed on immune cells of healthy controls compared against diseased cells. A signaling gradient, both spatial and transcriptional, was found for NOTCH3 in perivascular fibroblasts from both RA patients and healthy controls, but they were upregulated in RA fibroblasts. Murine models showed a decrease in inflammation when deleting or blocking Notch3 [50]. Furthermore, another study found that a subpopulation of fibroblasts-myofibroblasts, characterized by the presence of high expression levels of the ACTA2 gene-were detected in fibroblasts from lung samples of healthy individuals and interstitial lung disease patients with SSc. This cellular subtype is considered to be a key player in the development of the disease, since myofibroblasts have a high capacity for synthesis and proliferation, and also produce very high levels of collagen type I. Therefore, this study suggests that ACTA2 transcript levels could be used as a potential biomarker for SSc [51].
Several scRNA-seq studies have been performed in T-cells, a major cell type in the immune system with an important role in the pathogenesis of IMDs [52,53]. In order to search for biomarkers in RA, a recent study used scRNA-seq data to construct models of disease-associated cell types, their expression profiles and putative interactions using a multicellular model of disease. The authors observed that the analysis of the T-cell transcriptome can serve as a biomarker of different diseases, but also that the profile of T-cells can lead to the discovery of protein biomarkers in these cells [54].
Furthermore, an important application of this technique is the analysis of the B-cell receptor (BCR) and the T-cell receptor (TCR). These receptors, which are the result of the recombination of immunoglobulin genes during the development of these cells in the bone marrow, play a relevant role in autoimmunity [55][56][57]. The rise of high-throughput genotyping techniques and bioinformatic tools has allowed for the analysis of these highly variable and complex regions, such as BCR and TCR [58,59]. In this sense, islet antigen-reactive T-cell clonotypes were identified by their TCR profile in the peripheral blood of type 1 diabetes (T1D) patients [60]. A similar strategy was carried out in RA, in which the transcriptomic profile of the cell and the TCR of CD4+ T-cell clones was used, combining it with data on clonal expansion, tissue infiltration and membrane markers, enabling the detailed characterization of these cells [61].
Given the large amount of data being generated using different -omics approaches between different fields, their bioinformatics integration is crucial in order to achieve biological insights from genetics associations. An example is RolyPoly, which prioritizes disease-associated variants detected by GWAS using Bulk RNA-seq and scRNA-seq expression data [62].

Epigenomics Studies
Our knowledge of the chromatin landscape and the epigenetic regulatory elements, as well as their importance in disease pathogenesis, has undergone exponential growth in recent years. In this regard, different global initiatives and projects have emerged with the aim of sharing and putting together the knowledge in this area. Some of these projects are the Encyclopedia of DNA Elements (ENCODE) project [63], the Roadmap Epigenomics Consortium [64] and the Blueprint of Haematopoietic Epigenomes project [65], which aim to identify all the functional elements of the human genome across dozens of tissues.
Several epigenomic studies have been conducted in IMDs such as SSc or SjS in recent years, including histone modification, DNA methylation and non-coding RNA studies [66,67]. In this sense, a complex multidimensional study in RA fibroblast-like synoviocytes (FLS), an aggressive phenotype of fibroblasts in RA, was performed with the objective of establishing a high-resolution epigenomic landscape in RA. This study included histone modifications, open chromatin, RNA expression and DNA methylation analysis in RA FLS [68]. The effect of variants on histone regions is relevant, as seen in SLE. An enrichment of enhancer histone quantitative trait locus (hQTL) variants seems to have an important effect on gene expression in haplotypes with an eQTL in an SLE model, affecting it through the eQTL itself in most cases. Furthermore, its effect has been seen in HLA class II, when it is located in these regions, supporting data showing high allelic imbalance, even in the epigenome, in regions of post-translational histone modifications (PTM), using 3D chromatin techniques [69].
Another example of a multidimensional epigenomic study in IMDs is that performed in Takayasu arteritis, identifying the functional consequences of the susceptibility locus rs2069837 in a non-coding region of the IL6. Change in this region causes an alteration in a chromatin loop, leading to the recruitment of a repressor complex, which ends with the suppression of the anti-inflammatory gene GPNMB [70].
During the last years, our understanding of higher-order chromatin architecture has deeply changed. eQTL analyses have correlated genomic loci with variations in expression levels of nearby genes [71], thus providing a hypothetical mechanical link between a large number of loci associated with IMDs and the regulation of gene expression [72,73]. Nevertheless, little is known about how these genes and regulatory elements physically interact with each other across kilobase and megabase distances. In this sense, recent advances in 3D conformation of the genome can help to link local changes in chromatin with an effect on the expression of certain genes.
Nowadays, long-range chromatin interactions across an entire genome can be detected thanks to sequencing-based chromatin conformation capture techniques, highlighting Hi-C [74], which allows an ensemble average of interactions across the genome to be observed as a heatmap of interaction frequency. This spatial organization seems to be a general property of the genome, as it is consistent throughout and stable across different cell types, as well as being highly conserved between species. However, focusing at individual loci and at a higher resolution, more variability emerges. In this regard, a recent study performing Hi-C on lymphoblastoid cell lines from 20 individuals demonstrated that common SNPs can influence 3D chromatin conformation, and that this variation is often accompanied by variation in gene expression, transcription factor binding and histone modification [75].
In order to obtain high-resolution interaction maps, a new technique called capture Hi-C (CHi-C) enables an enrichment of Hi-C interactions through the capture of specific regions [76,77]. In this regard, Martin et al. characterized for the first time the interactions of confirmed susceptibility loci for four IMDs, namely RA, T1D, PsA and juvenile idiopathic arthritis (JIA) in different cell lines [76]. Interestingly, a considerable amount of disease-associated SNPs interacted with distant promoters instead of with the nearest gene, thus implicating different target genes. Subsequent studies using CHi-C data from T-and B-cell lines identified a novel causal gene, IL20RA, in the pan-autoimmune genetic susceptibility region 6q23 [77]. Associations within this region have been generally assigned to TNFAIP3, as it is the closest gene. Thus, these results provide new insights into IMD genetics, as well as a new perspective in the way we perceive the causal gene in disease, with the obvious implications that these findings have on pathogenic mechanisms and the discovery of potential therapeutic targets. In line with the above, a recently-published study performed CHi-C on RA-, JIAand PsA-associated loci in T-helper and B-cell lines in order to identify high confidence candidate causal genes [78]. These genes were significantly enriched in disease-relevant pathways, and were used to interrogate drug target information to identify existing treatments that could be potentially repositioned to treat these diseases. In addition, the versatility of the CHi-C technique allows the identification of regions interacting with promoters (promoter CHi-C) [79,80]. In this regard, Burren et al. identified common candidate genes for five IMDs in activated T-cells using promoter CHi-C, highlighting an interaction-mediated regulation of IL2RA expression [81]. Furthermore, recently published results observed a specific promoter interaction landscape in THP-1 macrophages stimulated with lipopolysaccharide, with an enrichment of immune risk variants within the interacting regions [82]. These results demonstrate that the study of chromatin interactomes from specific cell types can help elucidate common patterns in regulatory pathways of IMDs.
A more recent technique called HiChIP has enabled the generation of contact maps combining Hi-C libraries with chromatin immunoprecipitation (ChIP), a technique used to investigate the interaction between proteins (such as transcription factors or determined histones) and DNA [83]. In this regard, a recent HiChIP study that combined the detection of H3K27ac and cohesin subunit Smc1 HiChIP in T-cells of T1D mice model showed hyperconnected 3D clusters in T1D-associated regions, including important genes in autoimmunity such as BCL11B and ETS1, whereas these hyperconnected clusters were not present in T-cells from diabetes-resistant mice [84]. Another study performed in immune cell line models used H3K27ac HiChIP in order to link SLE risk variants, traditionally associated with the autoimmunity risk gene TNIP1, with enhancer regions, observing that the TNIP1 haplotype extended to neighboring genes [85].
The combination of the different techniques described above has led to the identification of relevant genes in autoimmune diseases. As an example, recent studies combining publicly available Hi-C, CHi-C and HiChIP data identified novel candidate genes for SSc and PsA [86].

Experimental Characterization
Despite major advances in functional genomics techniques, characterization studies remain the most successful tool to identify the effect of a mutation. In this way we can determine the cellular phenotype and are able to sort out the effects of suppressing, rescuing, over-expressing or under-expressing a gene in a controlled environment. There are different approaches in characterization studies, the most classic being fluorescent reporter studies (e.g., luciferase) and genome editing, such as transcription activator-like effectors (TALE) and the clustered regularly interspersed palindromic repeats (CRISPR) technique associated with Cas9 (CRISPR-Cas9).
Reporter studies have come a long way, with the development of high-throughput techniques using massive parallel reporter assays (MPRAs). An example is the association of non-coding candidate SNPs in several IMDs with the disruption of a regulatory circuit driven by NF-κB, which constrains T-cell activation through the upregulation of TNFAIP3 (an inhibitor of NF-κB), thus leading to an autoimmune response in activated T-cells [87]. However, these techniques are still not as powerful as genetic editing studies, as they study variants outside their cellular and organic context. Among all the characterization tools, CRISPR-Cas9 stands out. This technique is based on the endonuclease repair capability of Cas9, and it is capable of making changes to the genome through genetic guidance [88].
Recent studies have demonstrated that this technology can be used to investigate and validate pathological mechanisms of IMDs proposed by the different genomic approaches previously mentioned. In this line, a recently published study targeted the variant rs6927172, which is strongly associated with some IMDs and is located upstream of the TNFAIP3 gene. Using CRISPR/Cas9, the authors induced the deletion of this variant in HEK293 cells, resulting in reduced TNFAIP3 expression, and, interestingly, they also found a reduction in OLIG3 and IL20RA expression [89]. In a similar study performed in primary T-helper cells, researchers also observed a reduced expression of TNFAIP3 after inducing the deletion of rs6927172, as well as other edits [87]. In addition, it is worth mentioning a CRISPR/Cas9 screening study developed in primary human T-cells in which the authors identified genes regulating the TCR response after stimulation [90]. The combination of CRISPR/Cas9 with other characterization and functional genomics techniques constitutes the perfect toolset to uncover mechanistic insights of potential functional variants. In this regard, a recent study combining CRISPR/Cas9 and luciferase assays with Hi-C and CHi-C data observed that the allele rs13239597-A, which is strongly associated with SSc and SLE, acts as an allele-specific enhancer regulating IRF5 expression [91].
On the other hand, the CRISPR/Cas9 technology can also be used to directly treat IMDs. In a very interesting study, authors were able to activate the transcription of the insulin gene in fibroblasts from T1D patients using this technology [92]. Thus, more attention will be paid to CRISPR/Cas9 technology in the near future as a revolutionary method to study and treat IMDs.

New Approaches for Drug Targeting
The identification of new targets is a critical step in the drug discovery process. In this sense, the recent massive accumulation of different genomic data, mainly through GWAS, and their annotation through different functional data, establish the perfect framework to elucidate the underlying pathogenic mechanisms of IMDs and thus prioritize potential new therapeutic targets [93].
In this regard, one of the main examples of the usefulness of genomic studies in the identification of potential new drug targets is a study published by Okada et al. [11]. Through the largest GWAS conducted in RA, the authors identified more than 100 loci associated with this disorder. Subsequently, using bioinformatic methods based on functional annotation, they identified a total of 98 biological candidate genes, some of which were targets of RA therapies, whereas others were targets of drugs potentially useful for the treatment of this disease. A more recent GWAS performed in SSc identified a total of 27 independent risk loci for this condition. The subsequent functional analysis, using HiChIP data, of the most probable causal variants allowed the identification of 43 robust target genes, highlighting CD80 and BLK as potential drug targets [14]. Furthermore, meta-analysis of GWAS data, including different IMDs, also led to the identification of shared risk loci, as well as potential new drug targets through drug enrichment analysis [24,26]. It is worth mentioning that drugs with mechanisms based on genetic evidence have a higher probability to be approved through the drug development process [94].
On the other hand, the biological function of a genetic change can be easy to observe, thus making it easy to identify a potential drug target. For example, the knowledge of mutations and deletions occurring in the JAK3 gene that cause severe combined immunodeficiency [95] was useful to develop drugs such as tofacitinib to treat inflammation in RA [96].
Additionally, genomic and epigenomic data, using functional genomic approaches, are being integrated in order to facilitate drug repurposing in IMDs [97]. In this regard, analysis of capture Hi-C data from B-cells and T-cells to identify causal genes for rheumatic diseases revealed many disease-associated genes to be targets of existing drugs [78]. A recent study demonstrated an approach that integrates functional genomics and immune-mediated annotations with the evidence of physical interaction in order to prioritize drug targets, an approach which has been validated and applied in RA [98]. In this sense, the emergence of TNF as a highly ranked target confirms the utility of this approach [98,99].

Conclusions
As noted throughout this review, the combination of different functional genomics approaches is proving useful for the identification of potentially causal variants and causal genes in autoimmunity. In this regard, the whole process, from the early discovery of genetic associations through association studies to the validation and functional repercussions of these signals, has become an essential and holistic strategy (Figure 1). Today, it is evident that these approaches that in the past were applied separately must be taken from a global point of view in order to fully understand the underlying pathological mechanisms of IMDs. The reduction in the cost of these techniques has allowed an increasing number of researchers to afford to use a combination of these experiments to refine associations with IMD susceptibility. Thus, the genetic background of IMDs and their manifestation through their comorbidities is likely to lead to the discovery of shared genetic mechanisms. Elucidating the key features of the genetic changes shared by these diseases will allow a better classification of patients, as well as the development of more effective therapeutic interventions. Funding: This work was supported by Red de Investigación en Inflamación y Enfermedades Reumáticas (RIER) from Instituto de Salud Carlos III (RD16/0012/0013).