Genetic and Real-World Clinical Data, Combined with Empirical Validation, Nominate Jak-Stat Signaling as a Target for Alzheimer’s Disease Therapeutic Development

As genome-wide association studies (GWAS) have grown in size, the number of genetic variants that have been associated per disease has correspondingly increased. Despite this increase in the number of single-nucleotide polymorphisms (SNPs) identified per disease, their biological interpretation has in many cases remained elusive. To address this, we have combined GWAS results with orthogonal sources of evidence, namely the current knowledge of molecular pathways; real-world clinical data from six million patients; RNA expression across tissues from Alzheimer’s disease (AD) patients, and purpose-built rodent models for experimental validation. In more detail, first we show that when examined at a pathway level, analysis of all GWAS studies groups AD in a cluster with disorders of immunity and inflammation. Using clinical data, we show that the degree of comorbidity of these diseases with AD correlates with the strength of their genetic association with molecular participants in the Janus kinases/signal transducer and activator of transcription (JAK-STAT) pathway. Using four independent RNA expression datasets we then find evidence for the altered regulation of JAK-STAT pathway genes in AD. Finally, we use both in vitro and in vivo rodent models to demonstrate that Aβ induces gene expression of the key drivers of this pathway, providing experimental evidence to validate these data-driven observations. These results therefore nominate JAK-STAT anomalies as a prominent aetiopathological event in AD and hence a potential target for therapeutic development, and moreover demonstrate a de novo multi-modal approach to derive information from rapidly increasing genomic datasets.


Introduction
As genome wide association studies (GWAS) have grown in size, often now numbering tens of thousands of research participants, the number of genes identified as contributing to disease susceptibility have correspondingly grown. This is as true for Alzheimer's Disease (AD) as it is for many other disorders, and bioinformatic and pathway analyses of this large number of susceptibility genes is providing a highly efficient method of proposing and prioritising underlying biological pathways for further study [1,2]. In some cases, this understanding adds to existing observations-such as the evidence from GWAS that pathways of inflammation are important in AD-whereas in other cases pathway analysis yields less expected findings, such as evidence that cholesterol synthesis and endocytic recycling are part of the pathological process [3]. However, such analyses have their limitations due to our incomplete understanding of the canonical molecular pathways. Namely, many pathways described as categorical entities through data sources such as Kyoto Encyclopedia of Genes and Genomes (KEGG) and similar repositories, are skeletal at best and only contain a fraction of the genes that might be involved in any given process. Furthermore, many genes are nominated in multiple different pathways through their pleiotropic function. As an example, one of the most replicated associations with AD, the gene CLU encoding clusterin, is involved in processes as diverse as complement signaling, protein binding and chaperoning, and cell survival [4]. In the context of this incomplete understanding together with inherent molecular pathway complexity, determining the underlying biology of disease from GWAS studies alone becomes difficult and hence inevitably limited.
In an effort to address this limitation, we reasoned that it should be possible to hone pathway analysis by utilising orthogonal datasets. Specifically, we hypothesised that pathways are more relevant to disease aetiopathogenesis if diseases that shared pathways also shared morbidity. Put another way, if two or more diseases are more commonly found to co-occur rather than by chance, and if those comorbid diseases also share molecular pathways, one would predict that those shared pathways are more likely to play a role in aetiopathogenesis. In order to test this reasoning, we combined pathway analysis of the GWAS associations from all diseases (as reported in the GWAS catalog; https://www.ebi.ac.uk/gwas/) together with a co-morbidity study from real-world data to identify shared pathological processes. We then tested the resulting pathway in observational and empirically derived genome wide expression datasets from human and rodent studies, and finally validated the results in empirical studies in rat models in vitro and in vivo. The results, demonstrating a role for JAK-STAT signaling in AD, are in line with the known contribution of inflammatory processes to the disease, but they further nominate a specific target for therapy and provide a possible approach to interpretation of GWAS data for other disease areas.

Overlap of Susceptibility Genes Across Human Disease
In order to identify biological pathways shared across different diseases, we utilised the GWAS catalogue [5] to obtain a list of all known gene associations with disease derived from GWAS studies. We used the experimental factor ontology (EFO) [6] to identify disease studies, filtering by diseases with at least 25 associated genes, and only including the 25 strongest associated genes ranked by p value where more than 25 genes have been found to show some association. No other filters, such as date of study, were applied. For any two genes 'α'and 'β'sampled in the GWAS-catalogue, we then calculated the number of KEGG pathways that these two genes share, obtaining a gene × gene matrix of which we show a section in Figure 1. In order to determine whether the susceptibility genes of any given disease share more pathways with the susceptibility genes of any other given disease than expected from chance alone, we used a non-parametric Wilcoxon rank-sum test. For any two genes (as indicated on the X and Y axes), the color of the corresponding cell represents how many KEGG pathways these two genes share. Each one of these 15 genes has been associated through a GWAS study with at least one of these diseases. Although performed for each disease in the GWAS Catalogue using the 20 most strongly associated genes (i.e., lowest p value), this figure only shows the top five genes and only three diseases for representational purposes.
Formally, if we denote the gene × gene matrix as 'n(α,β)' (with 'α' and 'β' being any gene sampled in the GWAS-catalogue), 'A' as the set of genes associated with disease 'A', and 'B' as the set of genes associated with disease 'B', the first sample 'S+' of our test becomes: while the second sample 'S-' becomes: then, a one-tailed Wilcoxon rank-sum test compares 'S+' against sample 'S-'. The statistical results of this comparison determine whether diseases 'A' and 'B' share more pathways than expected from chance according to their GWAS associations.

Contribution of Biological Pathways to Diseases
Given that the results from the analysis described above suggested an intersection between AD and inflammatory diseases and hence confirming known associations, as discussed in the results and discussion sections, we subsequently focused on this overlap. For each inflammatory disease sampled in the GWAS-catalogue that had statistical power for subsequent analysis (see below), we calculated a so-called pathway load for each KEGG pathway. This is a numeric value that represents the proportion of susceptibility genes that a given disease has on a given KEGG pathway. Given disease 'A' and pathway 'p', the pathway load is equal to the number of susceptibility genes that disease 'A' has on pathway 'p', divided by the total number of times that any associated gene of disease 'A' is annotated as belonging to any KEGG pathway. Number of shared pathways. Number of pathways shared by the top five genes of three of the studied diseases, namely Crohn's Disease and Type 1 and Type 2 Diabetes mellitus. For any two genes (as indicated on the X and Y axes), the color of the corresponding cell represents how many KEGG pathways these two genes share. Each one of these 15 genes has been associated through a GWAS study with at least one of these diseases. Although performed for each disease in the GWAS Catalogue using the 20 most strongly associated genes (i.e., lowest p value), this figure only shows the top five genes and only three diseases for representational purposes.
Formally, if we denote the gene × gene matrix as 'n(α,β)' (with 'α' and 'β' being any gene sampled in the GWAS-catalogue), 'A' as the set of genes associated with disease 'A', and 'B' as the set of genes associated with disease 'B', the first sample 'S + ' of our test becomes: while the second sample 'S − ' becomes: then, a one-tailed Wilcoxon rank-sum test compares 'S + ' against sample 'S − '. The statistical results of this comparison determine whether diseases 'A' and 'B' share more pathways than expected from chance according to their GWAS associations.

Contribution of Biological Pathways to Diseases
Given that the results from the analysis described above suggested an intersection between AD and inflammatory diseases and hence confirming known associations, as discussed in the results and discussion sections, we subsequently focused on this overlap. For each inflammatory disease sampled in the GWAS-catalogue that had statistical power for subsequent analysis (see below), we calculated a so-called pathway load for each KEGG pathway. This is a numeric value that represents the proportion of susceptibility genes that a given disease has on a given KEGG pathway. Given disease 'A' and pathway 'p', the pathway load is equal to the number of susceptibility genes that disease 'A' has on pathway 'p', divided by the total number of times that any associated gene of disease 'A' is annotated as belonging to any KEGG pathway.
Formally, if we denote 'm(A,p)'as the number of genes of disease 'A' that belong to pathway 'p', the pathway load is: We divide by ∀ρ m(A, ρ) to control for diseases whose genes may have been more thoroughly studied, and therefore included in more KEGG pathways, than less studied diseases.

Shared Contribution of Biological Pathways to Disease Comorbidity with AD
We determine the comorbidity of inflammatory diseases with AD using the US National Hospital Discharge Survey (NHDS https://www.cdc.gov/nchs/nhds/, [7]) records from 1979 to 2006, which can be found in the Inter-university Consortium for Political and Social Research (ICPSR), file ICPSR 24281 (http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/24281), or in the National Archive of Computerized Data on Aging (NACDA [8]). This randomised survey collects medical information, including final diagnosis coded in ICD-9-CM and demographic data, from the discharge records of a number of hospitals in the US, totaling 6,552,504 patient discharges in the sample used here. The record only includes hospitals with an average patient stay length of 30 days or less, excluding military and hospitals from institutions such as prisons [7]. This is an epidemiological resource that observationally collects diagnosis and demographics of patients, and must not be mistaken with other bioinformatics resources that curate information on pathways of mechanisms, such as KEGG or Reactome. In NHDS we identified all patients in the dataset with a diagnosis of AD (ICD-9-CM code 331.0) and aged above 60 years in order to focus on late onset AD cases. For each case we find one control matched in gender, age, race and survey year (i.e., the year at which the survey was ran on that patient). To eliminate the diagnoses with the lowest statistical power, we analyse only the inflammatory diseases that had more than 15 patients among the AD cases and their matched controls. For each inflammatory disease, we calculate the number of patients of that disease in the AD population divided by the number of patients of that disease in the matched population. For descriptive purposes we call this value ratioAD(A). Finally, for each KEGG pathway, we calculate whether the pathway load (load(A,p)) values correlate with the AD ratios (ratioAD(A)) using Pearson's correlation.

Pathway Gene Expression Dysregulation in AD
In order to examine the relationship between gene expression and disease we obtained RNA expression data from four independent datasets; 1) AddNeuroMed (ANM), a longitudinal multi-centre cohort study with blood samples from 105 AD cases, 125 Mild Cognitive Impairment (MCI) individuals and 114 controls [9,10]; 2) the Dementia Case Register (DCR), a single centre longitudinal cohort utilising the exact same protocol as ANM with blood samples from 90 AD cases, 65 MCI individuals and 73 controls; 3) the largest post-mortem AD dataset available in ArrayExpress, E-GEOD-44770, with samples of post-mortem pre-frontal cortex from 129 AD cases and 101 controls [11]; 4) a rodent in-vitro dataset of neuronal gene response to amyloid peptide [12]. Throughout this paper, these datasets are respectively denoted "Dataset 1", "Dataset 2", "Dataset 3" and "Dataset 4". The case ascertainment, diagnostic process and other details of ANM have been described previously [9,10,13] while DCR used an identical clinical and laboratory protocol. Demographics from the three human cohorts (Datasets 1 to 3) are reported in Table 1. To test for statistically significant differences in RNA-expression in these four datasets for a given pathway, we use a per gene general linear model (GLM) with a binomial link function, which models AD status (two levels per person-either AD patient or control) as a function of RNA-expression while controlling for a number of covariates (gender, age and sampling centre). The p values we report correspond to the RNA-expression variable per gene.

Proof of Concept in an In Vitro Rat Model of Aβ Exposure
In order to explore JAK-STAT signaling in vitro, primary neuronal cultures were generated from Sprague Dawley E18 rat embryos by papain dissociation according to the manufacturer's instructions (Worthington, Lakewood, NJ, USA) and cultured as previously described [14]. Briefly, brains were harvested and maintained in sterile PBS, hippocampi and cortices were dissected out using a dissection microscope (MoticEurope, Barcelona, Spain), triturated using a sterile glass Pasteur pipette and maintained in serum-free medium. Viable cells were counted using a hemacytometer (ThermoFisher Scientific, Waltham, MA, USA). Neurons were plated in Neurobasal medium supplemented with B27, 0.30% glutamine, 100 U/mL penicillin and 100 µg/mL streptomycin, at a density of 300,000 cells/mL in plates coated with poly-D-lysine and incubated at 37 • C in 5% CO 2 atmosphere. Neuronal cultures were treated 7-9 days post-plating.
Aβ1-42 peptide was purchased from Dr. David Teplow (California, UCLA) and was resuspended in 100% 1,1,1,3,3,3 hexafluoro-2-propanol (HFIP) at a final concentration of 1 mM. For complete solubilisation the peptide was homogenized using a Teflon plugged 250 µL Hamilton syringe. HFIP was removed by evaporation in a SpeedVac (Fisher Scientific UK Ltd. Loughborough, Leicestershire, UK), Aβ1-42 resuspended at a concentration of 5 mM in dimethylsulfoxide (DMSO) and sonicated for 10 min. Oligomers were prepared as previously described [15]: Aβ1-42 was diluted in PBS to 400 µM and 1/10 volume 2% sodium dodecyl sulfate (SDS) in H 2 O added. Aβ was incubated for 24 h at 37 • C and further diluted to 100 µM in PBS followed by 18 h incubation at 37 • C. Rat primary neuronal cultures were treated with 3 µM Aβ for either 4 h or 24 h. These time points were selected based on the commonly observed time progression of changes in RNA after Aβ treatment.

Proof of Concept in an In Vivo Rat Model of Aβ Exposure
We further tested whether the observations made in vitro were also found in in vivo models. Male wild type Wistar rats (approximately 300g body weight, 8 weeks of age) were subjected to bilateral injections of 50 µM Aβ1-42 or PBS (n = 5 per group) into the lateral ventricles (QPS, Grambach, Austria). Brains were collected 3 h post injection. Entorhinal cortex was subdissected, frozen in liquid nitrogen and processed for RNA extraction. Frozen tissues were thawed and homogenised in Trizol and total RNA extracted according to the manufacturer's instructions (Invitrogen, Paisley, UK).
Total RNA from entorhinal cortex (1 µg) was reverse transcribed using random hexamers and a Taqman RT kit (Applied Biosystems, Cheshire, UK) according to the manufacturer's instructions. PCR primers were designed using the Universal Probe Library package (Roche Molecular Biochemicals, Lewes, UK) and used in SYBR Green-based PCR reactions performed on a StepOnePlusTM (96 well) thermal cycler (Applied Biosystems, California, USA). Relative quantification of gene expression between samples was determined using the 2 −∆∆CT method as described by Livak and Schmittgen [16]. Internal control genes used to normalise for RNA input were HPRT and β-actin.
All samples were run in triplicate from three independent experiments. The mean crossing threshold (CT) values for both the target and internal control genes in each sample were determined and the 2 −∆∆CT calculations performed. The fold change in the target genes (after normalising to the internal control gene) were calculated for each sample and the mean calculated. Statistical significance was determined by Student's t-test. Data are represented as normalised fold increases over control samples. Values are given as mean ± s.e.m (standard error of the mean).

Ethical Considerations
All animal studies described in this manuscript were ethically reviewed and carried out in accordance with Animals (Scientific Procedures) Act 1986. We further certify that the research (including Embryonic and Fetal Research if appropriate) was conducted according to the requirements of POL-GSKF-410 and associated relevant SOPs, and that all related documentation is stored in an approved HBSM database. Human biological samples were sourced ethically and their research use was in accord with the terms of the informed consents

Genes Associated with AD Show Shared Susceptibility to Diseases of Immunity
Given that recent large scale association studies suggest genes related to AD risk are involved in many different biological pathways, only some of which would have been predicted in advance, we wondered whether some of these pathway associations would be shared with other diseases. To address this question systematically, we first obtained from the GWAS catalogue [5] a list of all genes that have been found to be associated with all diseases in that dataset. Subsequently, for each gene appearing in the list, we determined the number of KEGG pathways that that gene shared with each other gene on the list. As a general concept, we assume at this point that if two genes, one associated with one disorder and the other associated with a second disorder, both appear in the same biological pathways, then those pathways should commonly be associated with both diseases.
Using this data of genes associated with diseases and the KEGG pathways that those genes appear in, we then generated a gene-gene matrix, plotting for each gene associated with each disease the numbers of KEGG pathways shared with genes associated with each other disease in the dataset. Not surprisingly, we find many genes participate in many different pathways. This is illustrated for three  Figure 1, using for this explanatory purpose five genes only. In this illustrative segment of the data, it can be seen that the genes associated with Crohn's disease in particular participate in many different named KEGG pathways. Again, this is unsurprising as these genes are part of the human leukocyte antigen system encoding the major histocompatability complex; a much-studied master regulator of immunity. Using this gene-gene matrix we are then able to explore which diseases share pathways with other diseases. Again, in the illustrative segment of the data in Figure 1 it can be seen that the genes associated with Crohn's disease share no pathways with the genes associated with Type 2 diabetes (T2DM) but do share considerable numbers of KEGG pathways with genes associated with Type 1 diabetes (T1DM). Note that this observation is not driven by the same genes being associated with any two diseases, as we exclude from the comparison the number of pathways that HLA-DRB1 shares with itself in the T1DM-Crohn's instance. Rather the T1DM-Chrohn's association is driven by different genes sharing common pathways such as CTLA4 and INS, or HLA-DRB1 with genes others than itself, which are genes of the HLA complex genes associated with both T1DM and Crohn's disease.
The GWAS catalogue includes 59 diseases where studies have identified at least 25 associated genes per condition, and these were therefore used in this analysis. Using the Wilcoxon rank-sum test, most of these conditions did not show significant overlap of pathways derived from GWAS genes in the pathway space, except for a cluster of 18 disorders that strongly overlapped with each other (Bonferroni corrected p value < 0.05). Fourteen of these diseases were disorders commonly classified as diseases of immunity, while the other four were AD, age-related macular degeneration (ARMD), T1DM and Hypothyroidism (see Figure 2 for a segment of the data and Figure S1 for the full matrix). Part of this cluster is evident in Figure 2 as 11 diseases that overlap with each other in 44 out of 55 disease-disease pairs (repeated pairs eliminated) while the full cluster of 18 disorders includes 127 out of 153 significant pairs as seen in Figure S1. The disorder with fewest associations with other diseases from the cluster was seasonal allergic rhinitis (8 out of 18), while the disorders out of the cluster which had highest number of associations with the cluster were atopic eczema and HIV-1 infection (both 2 out of 18).

Among all KEGG Pathways, Associations with JAK-STAT Signaling are Shared between Diseases Co-Morbid with AD
Having demonstrated that AD, together with ARMD, hypothyroidism and T1DM, share multiple KEGG pathways with disorders of immunity, we then explored which of these pathways was responsible for this overlap. In order to do this systematically, we first determined the strength of association of each disease with each KEGG pathway, calculating the proportion of susceptibility genes that a given disease has for each KEGG pathway ('pathway load'). Then, using the US National Hospital Discharge Survey (NHDS), a dataset including diagnostic information from more than six million patients, we calculated the co-morbidity between disorders of immunity and late onset AD on discharge from in-patient care between 1979 and 2006. Finally, for each individual pathway we then calculated the degree of correlation between pathway load and comorbidity with AD. Most of the KEGG pathways showed no significant correlation, with only three exceptions, indicating possible meaningful shared pathways contributing to disease comorbidity (see Figure 3). Of these, 'Cytokine-cytokine interaction'(KEGG ID hsa04060) and 'Viral myocarditis'(hsa05416) did not survive Bonferroni correction for multiple comparisons (uncorrected p value 0.05), while 'JAK-STAT'(hsa04630) did survive (uncorrected p value 0.00022) suggesting that the shared GWAS genes on this pathway might contribute to the observed co-morbidity between AD and these disorders of immunity Overlap of molecular pathways in susceptibility genes for all disease. Susceptibility genes from association studies of multiple diseases overlap in the pathway space. Although we analyzed 59 diseases with at least 25 susceptibility genes each, for representational purposes the figure shows those classified as either eye disease, immune system disease, or nervous system disease. The color of each square in the table represents the p value obtained when calculating whether the GWAS-genes of diseases A and B overlap in the pathway space more than would be expected from chance. Full data is presented in Figure S1.

Among all KEGG Pathways, Associations with JAK-STAT Signaling are Shared between Diseases Co-Morbid with AD
Having demonstrated that AD, together with ARMD, hypothyroidism and T1DM, share multiple KEGG pathways with disorders of immunity, we then explored which of these pathways was responsible for this overlap. In order to do this systematically, we first determined the strength of association of each disease with each KEGG pathway, calculating the proportion of susceptibility genes that a given disease has for each KEGG pathway ('pathway load'). Then, using the US National Hospital Discharge Survey (NHDS), a dataset including diagnostic information from more than six million patients, we calculated the co-morbidity between disorders of immunity and late onset AD on discharge from in-patient care between 1979 and 2006. Finally, for each individual pathway we then calculated the degree of correlation between pathway load and comorbidity with AD. Most of the KEGG pathways showed no significant correlation, with only three exceptions, indicating possible meaningful shared pathways contributing to disease comorbidity (see Figure 3). Of these, 'Cytokine-cytokine interaction'(KEGG ID hsa04060) and 'Viral myocarditis'(hsa05416) did not survive Bonferroni correction for multiple comparisons (uncorrected p value 0.05), while 'JAK-STAT'(hsa04630) did survive (uncorrected p value 0.00022) suggesting that the shared GWAS genes Overlap of molecular pathways in susceptibility genes for all disease. Susceptibility genes from association studies of multiple diseases overlap in the pathway space. Although we analyzed 59 diseases with at least 25 susceptibility genes each, for representational purposes the figure shows those classified as either eye disease, immune system disease, or nervous system disease. The color of each square in the table represents the p value obtained when calculating whether the GWAS-genes of diseases A and B overlap in the pathway space more than would be expected from chance. Full data is presented in Figure S1. on this pathway might contribute to the observed co-morbidity between AD and these disorders of immunity Figure 3. Pathways association with disease from real-world data. As described in the methods, we calculate the correlation between the co-incidence of the inflammatory diseases (denoted ratioAD(A)) and the proportion of susceptibility genes of these diseases on any given pathway (denoted load (A,p)).This figure shows this correlation for the three pathways where such correlation was the most significant. The ratioAD(A) of each disease is represented in the Y-axis, while load(A,p) is represented in the X-axis. Each disease is represented by a number and a color, as shown in the legend to the right. The p values corresponding to each pathway are, from left to right, 0.00022, 0.027 & 0.039. Only JAK-STAT signaling survives multiple testing correction. T1DM stands for Type 1 Diabetes mellitus; COPD for Chronic obstructive pulmonary disease; Cytokine-cytokine for Cytokine-cytokine receptor interaction pathway; JAK-STAT for JAK-STAT signaling pathway.

Evidence for Altered JAK-STAT Pathway Gene Expression in AD Blood, Brain and in an In Vitro Model with Established Relevance to AD
If the KEGG pathway is responsible for the observed co-morbidity between AD and disorders of immunity, then we reasoned that altered JAK-STAT signaling would be a feature of AD. In order to explore this, we utilised gene expression datasets from blood from human cohort studies (Datasets 1 and 2), from brain from post-mortem human studies (Dataset 3) and from a rodent in vitro model relevant to AD (Dataset 4) with robust proof of concept using empirical studies with geneknockdown, in animal models and in post-mortem human brain.
We first examined expression of genes from the JAK-STAT pathway (KEGG ID hsa04630) in blood from patients with AD compared to unaffected controls, reasoning that post-mortem brain might have more late-stage, secondary changes of inflammation, possibly less relevant to aetiopathological pathways. We used two cohorts, one a multinational longitudinal study (AddNeuroMed-ANM; 105 AD and 114 unaffected age matched subjects; denoted Dataset 1 in this manuscript), and another, a single centre longitudinal study utilising the exact same protocol (Dementia Case Register-DCR; 95 AD and 78 control blood samples [10]; denoted Dataset 2). In ANM, 15 of the 47 JAK-STAT genes measured in this dataset were significantly dysregulated when comparing controls with AD subjects (p < 0.05 after Bonferroni correction, see Figure 4A). A binomial exact test reveals that this proportion of significant genes in the JAK-STAT pathway is larger than expected by chance (p value 5 × 10 −9 ). In DCR, six of the 47 JAK-STAT genes were significantly altered (p < 0.05), also revealing significance at the pathway level (p value 0.03). p values obtained in ANM were significantly correlated (p = 4 × 10 −7 for Spearman correlation, see Figure 4B) with those obtained from DCR. We also merged both datasets and repeated the analysis, finding again 15 out of 47 genes were dysregulated (p < 0.05) in the AD population when compared with the control population. We . Pathways association with disease from real-world data. As described in the methods, we calculate the correlation between the co-incidence of the inflammatory diseases (denoted ratioAD(A)) and the proportion of susceptibility genes of these diseases on any given pathway (denoted load(A,p)). This figure shows this correlation for the three pathways where such correlation was the most significant. The ratioAD(A) of each disease is represented in the Y-axis, while load(A,p) is represented in the X-axis. Each disease is represented by a number and a color, as shown in the legend to the right. The p values corresponding to each pathway are, from left to right, 0.00022, 0.027 & 0.039. Only JAK-STAT signaling survives multiple testing correction. T1DM stands for Type 1 Diabetes mellitus; COPD for Chronic obstructive pulmonary disease; Cytokine-cytokine for Cytokine-cytokine receptor interaction pathway; JAK-STAT for JAK-STAT signaling pathway.

Evidence for Altered JAK-STAT Pathway Gene Expression in AD Blood, Brain and in an In Vitro Model with Established Relevance to AD
If the KEGG pathway is responsible for the observed co-morbidity between AD and disorders of immunity, then we reasoned that altered JAK-STAT signaling would be a feature of AD. In order to explore this, we utilised gene expression datasets from blood from human cohort studies (Datasets 1 and 2), from brain from post-mortem human studies (Dataset 3) and from a rodent in vitro model relevant to AD (Dataset 4) with robust proof of concept using empirical studies with gene-knockdown, in animal models and in post-mortem human brain.
We first examined expression of genes from the JAK-STAT pathway (KEGG ID hsa04630) in blood from patients with AD compared to unaffected controls, reasoning that post-mortem brain might have more late-stage, secondary changes of inflammation, possibly less relevant to aetiopathological pathways. We used two cohorts, one a multinational longitudinal study (AddNeuroMed-ANM; 105 AD and 114 unaffected age matched subjects; denoted Dataset 1 in this manuscript), and another, a single centre longitudinal study utilising the exact same protocol (Dementia Case Register-DCR; 95 AD and 78 control blood samples [10]; denoted Dataset 2). In ANM, 15 of the 47 JAK-STAT genes measured in this dataset were significantly dysregulated when comparing controls with AD subjects (p < 0.05 after Bonferroni correction, see Figure 4A). A binomial exact test reveals that this proportion of significant genes in the JAK-STAT pathway is larger than expected by chance (p value 5 × 10 −9 ). In DCR, six of the 47 JAK-STAT genes were significantly altered (p < 0.05), also revealing significance at the pathway level (p value 0.03). p values obtained in ANM were significantly correlated (p = 4 × 10 −7 for Spearman correlation, see Figure 4B) with those obtained from DCR. We also merged both datasets and repeated the analysis, finding again 15 out of 47 genes were dysregulated (p < 0.05) in the AD population when compared with the control population. We then examined JAK-STAT genes in the MCI population compared to unaffected controls in both cohort datasets, finding 10 out of 47 genes dysregulated in both studies, with binomial test being also significant (p value 9 × 10 −5 ). We then examined the expression of JAK-STAT signaling genes in post-mortem brain tissue utilising a study of 129 AD subjects and 101 controls [11] (Dataset 3). In this cohort, 153 JAK-STAT genes were sampled, of which 53 showed significant dysregulation (p value < 0.05), and 37 of which survived multiple comparison correction (p value < 0.05 after FDR adjustment). A binomial exact test reveals that this proportion of significant genes in the JAK-STAT pathway is larger than expected by chance (p value 2 × 10 −16 ). With respect to the five genes that were significant in ANM (Dataset 1) and that were also sampled in DCR (Dataset 2), TYK2 (p value 0.0002), IFNAR2 (Interferon-alpha/beta receptor beta chain, 0.04) and PIAS1 (E3 SUMO-protein ligase, 2 × 10 −5 ) were also significantly altered in Dataset 3 (see Table 2). We then examined the expression of JAK-STAT signaling genes in post-mortem brain tissue utilising a study of 129 AD subjects and 101 controls [11] (Dataset 3). In this cohort, 153 JAK-STAT genes were sampled, of which 53 showed significant dysregulation (p value < 0.05), and 37 of which survived multiple comparison correction (p value < 0.05 after FDR adjustment). A binomial exact test reveals that this proportion of significant genes in the JAK-STAT pathway is larger than expected by chance (p value 2 × 10 −16 ). With respect to the five genes that were significant in ANM (Dataset 1) and that were also sampled in DCR (Dataset 2), TYK2 (p value 0.0002), IFNAR2 (Interferon-alpha/beta receptor beta chain, 0.04) and PIAS1 (E3 SUMO-protein ligase, 2 × 10 −5 ) were also significantly altered in Dataset 3 (see Table 2). Finally, we also analyzed a previously reported [12] in vitro cell model derived dataset (Dataset 4). This study demonstrated that in rodent neurons, exposure to Aβ induced gene expression changes that overlap extensively with changes in gene expression induced by the Wnt signaling antagonist, Dickkopf-1 (Dkk1); itself a gene induced in response to Aβ in rodent cells, animal models and human brain. This gene expression signature was subsequently validated in animal models with cerebral amyloid plaque pathology and in AD brain, as well as in a human DKK1 over-expressing mouse model. Knock-down of particular genes on this pathway prevented Aβ induced phenotypes including neurotoxicity and tau-phosphorylation. This experimental dataset contained measurements of 32 JAK-STAT genes, with 8 of these being significantly dysregulated (p values < 0.05). Binomial testing showed that this proportion of dysregulated genes was higher than expected by chance alone (p value 0.0001). Of the four genes that were significant in ANM (Dataset 1) and were also sampled in all other datasets (DCR-Dataset 2, Array Express-Dataset 3; and rodent model-Dataset 4), Akt1 (p value 0.001), Ifnar2 (0.007) and Pias1 (0.016) were also dysregulated in Dataset 4 (see Table 2).

Empirical Evidence for JAK-STAT Dysregulation in Both In Vitro and In Vivo Models of Aβ-Induced Neurotoxicity
We then further validated these bioinformatics results using empirical studies, measuring the key drivers of the pathway, Jak1, Jak2, Jak3 and Tyk2, in an in vitro rat model of neuronal toxicity. Previously, we and many others have demonstrated that rodent neurons exposed to amyloid peptides are susceptible of Aβ-induced toxicity and other phenotypes, including tau phosphorylation and synaptic alterations [12,[17][18][19][20][21][22][23] We therefore exposed primary neuronal cultures to 3 µM of oligomerised Aβ42 and real time PCR was performed at 30 m, 4 h and 24 h after Aβ exposure ( Figure 5). Hprt1 and Actb were used as housekeeping genes for data normalization. Tyk2 showed a time-dependent increase in expression detectable at 4 h (p = 0.0077) and maintained up to 24 h (p = 0.0063). Jak1 and Jak2 mRNA levels were only significantly increased at 24 h with p values of p = 0.024 and p = 0.015, respectively. Jak3 expression was modulated by Aβ at any time point (p > 0.05).
We proceeded to test the response of the JAK-STAT pathway in an acute in vivo rat model of Aβ toxicity (QPS; Austria GmbH). Five male rats were subjected to bilateral intracerebroventricular injection of 50 µM Aβ1-42 or equal volumes of phosphate buffered saline (PBS). After 3 h following injection, animals were sacrificed, and brains harvested for RNA extraction. We measured the mRNA levels for Jak1, Jak2, Jak3 and Tyk2 in entorhinal cortex from Aβ-injected rats or PBS-sham controls by real time PCR. 3 h after Aβ injection, the levels of Jak1 (p value 0.001), Jak2 (0.0071) and Tyk3 (0.0032) were significantly upregulated (p < 0.05) in the brains of these rats but not in control animals ( Figure 6) providing further validation for the implication of this pathway in Aβ-mediated toxicity. We proceeded to test the response of the JAK-STAT pathway in an acute in vivo rat model of Aβ toxicity (QPS; Austria GmbH). Five male rats were subjected to bilateral intracerebroventricular injection of 50 μM Aβ1-42 or equal volumes of phosphate buffered saline (PBS). After 3h following injection, animals were sacrificed, and brains harvested for RNA extraction. We measured the mRNA levels for Jak1, Jak2, Jak3 and Tyk2 in entorhinal cortex from Aβ-injected rats or PBS-sham controls by real time PCR. 3h after Aβ injection, the levels of Jak1 (p value 0.001), Jak2 (0.0071) and Tyk3 (0.0032) were significantly upregulated (p < 0.05) in the brains of these rats but not in control animals ( Figure 6) providing further validation for the implication of this pathway in Aβ-mediated toxicity. Figure 6. JAK kinases are upregulated in an in vivo rat model of Aβ toxicity. Entorhinal cortex of male wild type rats injected with Aβ (n = 5) or with PBS (n = 5) bilaterally into the lateral ventricles was analyzed by real time PCR to determine the levels of Jak1, Jak2, Jak3 and Tyk2. Each sample was run in triplicate. Jak1, Jak2 and Tyk2 were significantly upregulated (p = 0.001, p = 0.0071, p = 0.0032 respectively) in the Aβ-injected animals but not in the PBS injected control group. Data represented as normalized fold induction. Values given as mean ± s.e.m. Statistical significance determined by Student´s t-test.  We proceeded to test the response of the JAK-STAT pathway in an acute in vivo rat model of Aβ toxicity (QPS; Austria GmbH). Five male rats were subjected to bilateral intracerebroventricular injection of 50 μM Aβ1-42 or equal volumes of phosphate buffered saline (PBS). After 3h following injection, animals were sacrificed, and brains harvested for RNA extraction. We measured the mRNA levels for Jak1, Jak2, Jak3 and Tyk2 in entorhinal cortex from Aβ-injected rats or PBS-sham controls by real time PCR. 3h after Aβ injection, the levels of Jak1 (p value 0.001), Jak2 (0.0071) and Tyk3 (0.0032) were significantly upregulated (p < 0.05) in the brains of these rats but not in control animals ( Figure 6) providing further validation for the implication of this pathway in Aβ-mediated toxicity. Figure 6. JAK kinases are upregulated in an in vivo rat model of Aβ toxicity. Entorhinal cortex of male wild type rats injected with Aβ (n = 5) or with PBS (n = 5) bilaterally into the lateral ventricles was analyzed by real time PCR to determine the levels of Jak1, Jak2, Jak3 and Tyk2. Each sample was run in triplicate. Jak1, Jak2 and Tyk2 were significantly upregulated (p = 0.001, p = 0.0071, p = 0.0032 respectively) in the Aβ-injected animals but not in the PBS injected control group. Data represented as normalized fold induction. Values given as mean ± s.e.m. Statistical significance determined by Student´s t-test. Figure 6. JAK kinases are upregulated in an in vivo rat model of Aβ toxicity. Entorhinal cortex of male wild type rats injected with Aβ (n = 5) or with PBS (n = 5) bilaterally into the lateral ventricles was analyzed by real time PCR to determine the levels of Jak1, Jak2, Jak3 and Tyk2. Each sample was run in triplicate. Jak1, Jak2 and Tyk2 were significantly upregulated (p = 0.001, p = 0.0071, p = 0.0032 respectively) in the Aβ-injected animals but not in the PBS injected control group. Data represented as normalized fold induction. Values given as mean ± s.e.m. Statistical significance determined by Student's t-test.

Discussion
We have presented here a series of integrated analyses predicated on the underlying hypothesis that co-morbidity of disease can, in some cases, indicate shared genetic susceptibility to disease and that this is manifested most robustly at the level of pathways more than at the level of single genes. By combining genome wide pathway association from all diseases together with their comorbidity with AD, we identify JAK-STAT signaling as a shared factor correlated with the degree of comorbidity. In the first data-driven phase utilizing gene association data we find this disease cluster to include a series of disorders of immunity and inflammation together with age-related macular degeneration (ARMD) and Type 1 Diabetes mellitus (T1DM). The association with ARMD is particularly interesting as it has previously been found to be a risk factor for AD [24,25], because Aβ is a component of the drusen pathology in the retina of people with ARMD [26,27] and because the gene most associated with ARMD, CFH, encodes for a protein replicated as a biomarker of AD, complement factor H [28][29][30]. That our approach of using all GWAS data in a pathway clustering analysis identifies a relationship between AD and ARMD is all the more remarkable in because this association, in our study, is not driven by CFH. The association we find with T1DM is also intriguing as, although T2DM diabetes is one of the most substantiated risk factors for AD [31], our data now suggests that there might be a relationship also with early onset T1DM that is worth further attention.
However, the most extensive association between shared pathways and disease we find is with disorders of immunity and inflammation. The role of inflammation in AD has been apparent for many years. This evidence is very extensive and comes from many directions [32]. It includes evidence the use of non-steroidal anti-inflammatory drugs appear to decrease the risk of AD [33][34][35], post-mortem studies showing that inflammation is associated with AD pathology [36,37], and in vivo data showing that markers of inflammation are predominant amongst protein biomarkers both in AD and in pre-dementia conditions [38,39]. In addition, there is increasing evidence from GWAS studies and from rare mutations, that genes encoding proteins involved in immunity are amongst the most consistently associated with disease [32]. However, when considered in isolation, the pathways and processes identified by AD genetic studies are predominantly those of complement signaling and microglial function [3,40,41]. These pathways are clearly important in diseases with very considerable evidence to support their role, but the approach we have used here, triangulating between GWAS, real-world and empirical data, and including all disease and all genes, nominates a pathway as part of the aetiopathogenic process that is not identified by such AD gene focused studies.
As in any 'big data' approach, there are limitations both to the datasets available and to our use of them. First, in using the GWAS catalogue as a primary data-source, we limit ourselves to disease-gene associations where a significant number of genes have been identified. We do this in order to provide sufficient power for analysis, but acknowledge that the limit of 25 susceptibility genes to enable a disease to enter analysis is both arbitrary and dependent on the size and numbers of studies that happen to have been conducted to date. Almost certainly, we miss information as a consequence of data limitation. Secondly, by segregating genes into pathways we attempt to overcome the intrinsic limitation of GWAS studies, in that biology is mechanistically enacted at the level of pathway and not gene, let alone SNP. Given the large number of SNPs and genes in the human genome, two diseases may have no elements measured in GWAS, or even sequencing studies, in common and yet share an overlapping disease pathogenesis. Measuring association not with SNP but with multiple SNPs across a gene ('gene-wide association') is one attempt to overcome the limitation; here we go one step beyond this with a pathway-wide association approach. However, in attempting to derive such information from the GWAS data, we are severely limited by current understanding of biological pathways, which is rudimentary at best. This limitation is bound to hinder our derivation of knowledge from information in this context. Thirdly, in seeking to identify diseases comorbid with AD, we have crudely utilised a dataset of concurrent diagnoses, taking no account of some of the confounds or other concerns of conventional epidemiology. Indeed we cannot be sure whether the co-morbidity we observe is due to the disease itself or the drugs used to treat the disease. However, we note that similar claims-level analyses of real-world clinical data have recently proven valuable in studying genetic and environmental factors shared amongst diseases [42].
We accept the limitations of our approach described above. However, in mitigation of these potential limitations, the datasets we examine are huge; namely, all genetic studies with all genes and all diseases in the first phase, and a dataset of over six million people in the second. Furthermore, we would suggest that some confounds are less critical in the analysis we present here. For example, in studies of risk and protection then clearly understanding direction of effect-whether it is the disease or the treatment that affects risk-is fundamental. However, it becomes less important, potentially irrelevant, where we are determining simply whether the same processes are involved, as both disease and treatment will at some level and in some cases affect the same molecular pathways, which are the axis of our analysis. Finally, despite the limitations of deriving knowledge from data using this approach, the fact that the findings replicate in observational molecular studies in man and in experimental studies in rodents offers strong support to the results.
The JAK-STAT signaling pathway, nominated as a potential target for therapy through data-driven genomics and real-world data in this study, is a key regulator of the response to mediators of inflammation, including cytokines, chemokines [43] and microglia activation [44]. The binding of cytokines (interleukin, interferon and growth factors) and other ligands (such as hormones) to their receptors increases tyrosine kinase activity of Janus Kinases (JAKs, including TYK2), which in turn phosphorylate the receptors and recruit STATs which are themselves then phosphorylated. Subsequent dimerization leads to nuclear translocation and transcription factor activity. Given this critical role in the modulation of the inflammatory response, it is not surprising that JAK-STAT signaling has previously been associated with inflammatory disease and targeted for therapeutics, some of which have been approved for clinical use [45]. The pathway has also been identified as of importance in relation to diabetes [46] but less often in relation to AD. Very high concentrations of Aβ have been shown to increase tyrosine phosphorylation and transcriptional activity in a Tyk2 dependent manner in rodent models, and tyrosine phosphorylation of STAT3 is increased in AD brain [47], while inhibition of JAK-STAT3 signaling inhibits activation of astrocytes and microglia in animal models of neurodegeneration [48]. JAK-STAT signaling has also been identified as a component of plasticity, specifically long-term depression (LTD) [49,50], interesting not least because LTD and long term potentiation are regulated, in opposite directions, by GSK3, the predominant tau-kinase implicated in AD pathogenesis [51][52][53].
In summary, a combined, sequential analysis of GWAS data agnostic to disease type, combined with real-world data of co-incidence of AD with other diseases, nominated JAK-STAT signaling amongst other pathways as a possible underlying pathogenetic mechanism shared across multiple diseases. Remarkably, these diseases-inflammatory disorders, ARMD and Diabetes had previously been implicated as risk factors for AD. Adding to the weight of evidence for JAK-STAT signaling in AD, we subsequently found altered gene expression of the pathway in multiple human and rodent datasets and in empirical studies of Aβ exposure in rodents, both in vitro and in vivo. These data are not the first to nominate JAK-STAT signaling for therapeutic intervention [54], as experiments suggest that humanin and colivelin protect against AD-related neurotoxicity thought its activity in JAK-STAT. However the combination of genetic, human and rodent, observational and empirical data, make a strong case to pursue this pathway as a target for therapies for AD, not least because clinically approved compounds already exist. This further suggests that this novel integration of orthogonal data is a promising approach to find novel targets in complex disorders.