2.1. Pathway Proximity Captures the Similarities between Autoimmune Disorders
Conventionally, functional enrichment analysis relies on the significance of the overlap between a set of genes associated with a condition of interest and a list of genes involved in known biological processes (pathways). Using known pathway genes, one can identify pathways associated with the disease via a statistical test (e.g., Fisher’s exact test for the overlap between genes, or a z-score comparing the observed number of common genes to the number expected if genes were randomly sampled from the data set). We start with the observation that such an approach (hereafter referred to as the conventional approach) often misses key biological processes involved in the disease due to the limited overlap between the disease and pathway genes. To show that this is the case, we focus on nine autoimmune disorders for which we obtain disease-associated genes from the literature, and we calculate p-values based on the overlap between these genes and the pathway genes for each of the 674 pathways in the Reactome database (one-sided Fisher’s exact test). Intriguingly, Table 1 demonstrates that this conventional approach yields fewer than ten significantly enriched pathways in five of the nine diseases, potentially underestimating the molecular underpinnings of these diseases.
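The overlap test behind the conventional approach can be illustrated with a minimal sketch. It assumes a one-sided Fisher’s exact test computed as the upper tail of a hypergeometric distribution; the function name, the toy gene identifiers, and the choice of background set are hypothetical and are not taken from the study.

```python
from math import comb


def overlap_pvalue(disease_genes, pathway_genes, background_genes):
    """One-sided Fisher's exact test for gene overlap (hypergeometric upper tail).

    Returns P(X >= observed overlap) when len(disease_genes) genes are drawn
    without replacement from the background. All inputs are illustrative.
    """
    background = set(background_genes)
    disease = set(disease_genes) & background
    pathway = set(pathway_genes) & background
    n_total, n_disease, n_pathway = len(background), len(disease), len(pathway)
    n_overlap = len(disease & pathway)

    # Sum the hypergeometric probabilities for overlaps at least as large as observed.
    upper = min(n_disease, n_pathway)
    tail = sum(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_disease - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_disease)


# Toy example: 20 background genes, 5 disease genes, 5 pathway genes, 4 shared.
background = [f"g{i}" for i in range(20)]
disease = ["g0", "g1", "g2", "g3", "g4"]
pathway = ["g0", "g1", "g2", "g3", "g10"]
p_value = overlap_pvalue(disease, pathway, background)
```

With few disease genes, even a biologically relevant pathway may share only one or two genes with the disease, which is the limitation the text attributes to overlap-based enrichment.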
Alternatively, the shortest distance between genes in the interactome can be used to find pathways closer than random expectation to a given set of genes [7, 26], substantially increasing the number of pathways relevant to the disease pathology. Using network-based proximity [7], we define the pathway span of a disease as the set of pathways significantly proximal to the disease (see Methods). We show that the number of pathways involved in diseases increases substantially when proximity is used (Table 1).
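The idea of a pathway being "closer than random expectation" can be sketched as a z-score over shortest path lengths. The sketch below makes several simplifying assumptions that are not from the paper: the network is a small connected toy graph, distance is averaged over pathway genes to their nearest disease gene, and random gene sets are drawn uniformly (the cited proximity method may use a different, for example degree-aware, sampling scheme). All names are hypothetical.

```python
import random
from collections import deque
from statistics import mean, pstdev


def bfs_distances(graph, source):
    """Shortest path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_distance(graph, from_genes, to_genes):
    """Average, over from_genes, of the distance to the nearest gene in to_genes.

    Assumes the graph is connected, so every distance is defined.
    """
    total = 0
    for gene in from_genes:
        dist = bfs_distances(graph, gene)
        total += min(dist[target] for target in to_genes)
    return total / len(from_genes)


def proximity_z(graph, pathway_genes, disease_genes, n_random=1000, seed=0):
    """z-score of the observed distance against uniformly sampled gene sets."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    observed = closest_distance(graph, pathway_genes, disease_genes)
    null = []
    for _ in range(n_random):
        random_pathway = rng.sample(nodes, len(pathway_genes))
        random_disease = rng.sample(nodes, len(disease_genes))
        null.append(closest_distance(graph, random_pathway, random_disease))
    sigma = pstdev(null)
    if sigma == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - mean(null)) / sigma


# Toy interactome: a chain g0 - g1 - ... - g9 (purely illustrative).
toy_graph = {f"g{i}": set() for i in range(10)}
for i in range(9):
    toy_graph[f"g{i}"].add(f"g{i + 1}")
    toy_graph[f"g{i + 1}"].add(f"g{i}")
```

A strongly negative z-score marks a pathway whose genes lie closer to the disease genes than random gene sets do; the threshold used to call a pathway "significantly proximal" is defined in Methods and is not reproduced here.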
To show the biological relevance of the pathways identified using interactome-based proximity, we check how well these pathways highlight genetic and phenotypic relationships among the nine autoimmune disorders. First, to serve as a background model, we build a disease network for the autoimmune disorders (diseasome) using the genes and symptoms shared between these diseases as well as the comorbidity information extracted from medical insurance claim records (see Methods). The autoimmune diseasome (Figure 1a) is densely connected, covering 33 of the 36 possible links between the nine diseases (with average degree
= 7.3 and clustering coefficient
). The three missing links are those between ulcerative colitis and rheumatoid arthritis, ulcerative colitis and Graves’ disease, and Graves’ disease and type 1 diabetes. On the other hand, several diseases such as celiac disease, Crohn’s disease, systemic lupus erythematosus, and multiple sclerosis are connected to each other with multiple evidence types in the autoimmune diseasome based on genetic (shared genes) and phenotypic (shared symptoms and comorbidity) similarities, emphasizing the shared pathological components underlying these diseases.
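As a quick consistency check on the figures reported above, the number of possible links and the average degree follow directly from the node and link counts. The symbols N (number of diseases), E (number of links), and the angle-bracket notation for average degree are notational assumptions introduced here.

```latex
\binom{N}{2} = \binom{9}{2} = 36,
\qquad
\langle k \rangle = \frac{2E}{N} = \frac{2 \times 33}{9} \approx 7.3
```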
We compare the autoimmune diseasome generated using shared genes, common symptoms, and comorbidity to the disease network in which the disease-disease connections are identified using the pathways they share. We identify the pathways enriched in the diseases using both the conventional and proximity approaches mentioned above and check whether the number of common pathways between two diseases is significant (two-tailed Fisher’s exact test). The disease network based on pathways shared across diseases using the overlap between the pathway and disease genes is markedly sparser than the original diseasome, containing 17 links (Figure 1b). None of the diseases share pathways with psoriasis, and among the connections supported by multiple evidence types in the original diseasome, the links between Crohn’s disease and celiac disease as well as between Crohn’s disease and systemic lupus erythematosus are missing. In contrast, the disease network based on shared pathways using proximity of the pathway genes to the disease genes consists of 34 links; the only unconnected disease pairs are Crohn’s disease with Graves’ disease and type 1 diabetes with psoriasis. This suggests that it captures the connectedness of the original diseasome better than the conventional approach.
We next turn our attention to the pathways shared across diseases identified by both the conventional and proximity-based approaches and observe that most of the common pathways involve biological processes relevant to the immune system endophenotypes. In particular, we see that inflammasome-related pathways, such as signaling of cytokines (interferon gamma, interleukins such as IL6 and IL7) and lymphocytes (ZAP70, PD1, TCR, among others), are overrepresented. While conventional enrichment finds that most of these pathways are shared among only 4–5 diseases, proximity-based enrichment points to the commonality of these pathways among almost all the diseases. Furthermore, proximity-based enrichment uncovers the ubiquitous involvement of additional interleukin (IL2, IL3, IL5) and lymphocyte (BCR) molecules in autoimmune disorders. These findings suggest that proximity-based pathway enrichment identifies biological processes relevant to the diseases, highlighting the common etiology across autoimmune disorders.
2.2. Diseases Targeted by the Same Drugs Exhibit Functional Similarities
Having observed that pathway proximity to diseases in the interactome captures the underlying biological mechanisms across diseases, we seek to investigate the potential implications of the connections between diseases for drug discovery. We hypothesize that a drug indicated for several autoimmune disorders would exert its effect by targeting the shared biological pathways across these diseases. To test this, we use 25 drugs that are indicated for two or more of the autoimmune disorders in Hetionet [
27] and split disease pairs into two groups: (i) diseases for which a common drug exists and (ii) diseases for which no drugs are shared. We then count the number of pathways in common between two diseases for each pair in the two groups using pathway enrichment based on both the gene overlap and proximity in the interactome. We find that the diseases targeted by the same drugs tend to involve an elevated number of common pathways compared to the disease pairs that do not have any drug in common (
Figure 2). The average number of pathways shared among diseases that are targeted by the same drug is 3.4 and 38 using overlap- and proximity-based enrichment, respectively, whereas the remaining disease pairs share 2 and 31 pathways on average using the two enrichment approaches. We note that, due to the relatively small sample size and potentially incomplete drug indication information, we interpret the elevated number of pathways as a trend rather than a general rule across all diseases (assessed by a one-tailed Mann-Whitney U test for each of the overlap- and proximity-based approaches). Nevertheless, taken together with the high overall pathway-level commonalities observed in the autoimmune disorders in the previous section, this result suggests that the drugs used for multiple indications are likely to target common pathways involved in these diseases.
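The group comparison described above can be illustrated with a small sketch of a one-tailed Mann-Whitney U test. This is an assumption-laden illustration rather than the analysis code of the study: the p-value is obtained exactly by enumerating all relabelings of the pooled values, which is practical only for small samples, and the shared-pathway counts below are invented toy numbers.

```python
from itertools import combinations


def u_statistic(x, y):
    """Number of (x, y) pairs with x > y, counting ties as one half."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


def mann_whitney_greater(x, y):
    """One-tailed test of whether values in x tend to exceed values in y.

    The p-value is the fraction of all relabelings of the pooled values whose
    U statistic is at least as large as the observed one (exact for small samples).
    """
    pooled = list(x) + list(y)
    indices = range(len(pooled))
    u_observed = u_statistic(x, y)
    n_extreme = 0
    n_total = 0
    for chosen in combinations(indices, len(x)):
        chosen_set = set(chosen)
        relabeled_x = [pooled[i] for i in chosen]
        relabeled_y = [pooled[i] for i in indices if i not in chosen_set]
        n_total += 1
        if u_statistic(relabeled_x, relabeled_y) >= u_observed:
            n_extreme += 1
    return u_observed, n_extreme / n_total


# Hypothetical shared-pathway counts for disease pairs with and without a common drug.
with_common_drug = [5, 6, 7]
without_common_drug = [1, 2, 3]
u_value, p_value = mann_whitney_greater(with_common_drug, without_common_drug)
```

With only a handful of disease pairs per group, the smallest attainable p-value is bounded by the number of possible relabelings, which is one reason a small sample supports only a statement about a trend.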
2.3. Proximal Pathway Enrichment Analysis Reveals Drugs Targeting the Autoimmune Endophenotypes
The results indicating that the drugs used for multiple autoimmune disorders potentially target common pathways raise the following question: “Can pathway-level commonalities between diseases be leveraged to quantify the impact of a given drug on these diseases?” To this end, we propose PxEA, a novel method for Proximal pathway Enrichment Analysis that scores the likelihood of a set of pathways (e.g., targeted by a drug) to be represented among another set of pathways (e.g., disease pathways) based on the proximity of the pathway genes in the interactome. As opposed to Gene Set Enrichment Analysis (GSEA) [28], which uses gene sets and the ranking of genes based on differential expression, PxEA uses pathway sets and the ranking of pathways based on proximity in the interactome. PxEA scores a drug based on whether or not the pathways targeted by the drug are proximal to a pathway set of interest, such as pathways shared across different diseases. For a given drug and a pair of diseases, we first identify the pathways in the pathway span of both diseases, then rank the pathways with respect to the proximity of the drug targets to the pathway genes, and finally calculate a running-sum statistic corresponding to the enrichment score between the drug and the disease pair (Figure 3, see Methods for details).
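The three steps above can be sketched with a GSEA-style running sum. This is a minimal, unweighted illustration under stated assumptions, not the exact PxEA scoring defined in Methods: pathways are ranked from most to least proximal using a hypothetical dictionary of proximity z-scores (lower meaning closer), each hit adds 1/n_hit, each miss subtracts 1/n_miss, and the score is the largest deviation of the running sum from zero.

```python
def running_sum_enrichment(ranked_pathways, pathway_set):
    """Largest deviation from zero of an unweighted running sum over a ranking."""
    hits = [pathway in pathway_set for pathway in ranked_pathways]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("ranking must contain both hits and misses")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on pathways in the set of interest, down otherwise.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def pxea_like_score(proximity_by_pathway, shared_pathways):
    """Rank pathways by drug-target proximity (most proximal first) and score them."""
    ranked = sorted(proximity_by_pathway, key=proximity_by_pathway.get)
    return running_sum_enrichment(ranked, shared_pathways)


# Hypothetical proximity z-scores of one drug's targets to four pathways.
drug_proximity = {"A": -3.0, "B": -2.5, "C": 0.1, "D": 1.2}
score_shared = pxea_like_score(drug_proximity, {"A", "B"})
score_distal = pxea_like_score(drug_proximity, {"C", "D"})
```

In this toy case the score is positive when the pathways of interest sit at the proximal end of the ranking and negative when they sit at the distal end, which mirrors how a drug whose targets are close to the shared disease pathways would receive a high score.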
We employ PxEA to score 25 drugs indicated for at least two of the seven autoimmune disorders (there were no common drugs for celiac and Graves’ diseases). For each disease, we first run PxEA using the pathways proximal to the disease and the proximity of the drugs used for that disease to these pathways. We then run PxEA for each disease pair, using the pathways proximal to both of the diseases in the pair and the drugs commonly used for the two diseases. We notice that several drugs indicated for multiple conditions score higher using common pathways between two diseases than using the pathways of the disease they are indicated for (
Figure 4). This is not surprising considering that many of the drugs used for autoimmune disorders target common immune and inflammatory processes. For instance, sildenafil, a drug used for the treatment of erectile dysfunction and to relieve the symptoms of pulmonary arterial hypertension, is reported by Hetionet to show a palliative effect on type 1 diabetes and multiple sclerosis. In fact, sildenafil is not specific to either of these two conditions and targets a number of the 57 pathways in common between type 1 diabetes and multiple sclerosis, including but not limited to pathways mentioned in Table 2, such as “IL-3, 5 and GM CSF signaling”, “regulation of signaling by CBL”, “regulation of KIT signaling”, “IL receptor SHC signaling”, and “growth hormone receptor signaling”.
Similarly, prednisone, a synthetic anti-inflammatory glucocorticoid agent that is indicated for six of the autoimmune disorders, is assigned a higher PxEA score using the pathways shared by Crohn’s disease and systemic lupus erythematosus compared to using the pathways involved only in Crohn’s disease, systemic lupus erythematosus, multiple sclerosis, psoriasis, rheumatoid arthritis, or ulcerative colitis. Thus, prednisone does not specifically target any of the six autoimmune disorders but rather acts on the endophenotypes that manifest across these diseases. We observe a similar trend in meloxicam, an anti-inflammatory drug that shows analgesic and antipyretic effects by inhibiting prostaglandin synthesis. Consistent with its known mechanism of action, meloxicam is proximal to the “cholesterol biosynthesis”, “fatty acid, triacylglycerol, and ketone body metabolism”, and “prostanoid ligand receptors” pathways in the interactome. While meloxicam is originally indicated for rheumatoid arthritis and systemic lupus erythematosus, the higher PxEA score when common arthritis and lupus pathways are used suggests that it targets common inflammatory processes in these two diseases.
2.4. Targeting the Common Pathology of Type 2 Diabetes and Alzheimer’s Disease
Type 2 diabetes (T2D) and Alzheimer’s disease (AD), two diseases highly prevalent in an ageing society, are known to exhibit increased comorbidity [29, 30]. Recently, repurposing anti-diabetic agents to prevent insulin resistance in AD has gained substantial attention due to the therapeutic potential it offers [31]. Indeed, the pathway spans of T2D and AD cover 170 and 82 pathways, respectively, 35 of which are shared between the two diseases, significantly linking them at the pathway level (two-sided Fisher’s exact test).
We use PxEA to score 1466 drugs from DrugBank using the 35 pathways involved in the common pathology of T2D and AD. When we look at the drugs ranked at the top of the list (Table 3), we spot orlistat, a drug indicated for obesity and T2D in Hetionet. Interestingly, existing studies also suggest a role for this drug in the treatment of AD [32]. Orlistat targets extracellular communication (Ras-Raf-MEK-ERK, NOTCH, and GM-CSF/IL-3/IL-5 signaling) and lipid metabolism pathways (Figure 5). Several of the proteins in the pathways pertinent to the common T2D-AD pathology, such as APOA1, PSEN2, PNLIP, LPL, and IGHG1, are either targets of orlistat themselves or lie in the close vicinity of its targets. The next top-scoring drugs are chenodeoxycholic acid and obeticholic acid, bile acids that are in clinical trials for T2D (NCT01666223) and are argued to modulate cognitive changes in AD [33].
It is noteworthy that the top-scoring drugs belong to a diverse set of Anatomical Therapeutic Chemical (ATC) classes, covering alimentary tract and metabolism drugs (A05, A06, A08, A12), blood substitutes (B05), dermatologicals (D11), as well as cardiovascular (C01, C07), genito-urinary (G02), nervous (N07), and respiratory (R03) system drugs. The diversity of the ATC classes of the top-scoring drugs indicates that PxEA is not biased towards any particular ATC class. We also calculate the significance of the PxEA scores by permuting the ranking of the pathways. We find that the adjusted p-values (corrected for multiple hypothesis testing using the Benjamini–Hochberg procedure) for the top candidates all fall below the minimum value that can be resolved with the 10,000 permutations used in the calculation.
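The significance calculation can be sketched as a permutation test followed by Benjamini–Hochberg adjustment. The sketch below is an illustration under assumptions, not the code of the study: `score_fn` is a hypothetical placeholder for any scoring function applied to a ranking, and the empirical p-value uses the common (b + 1) / (n + 1) convention, under which 10,000 permutations cannot resolve values smaller than roughly 1e-4. The paper may use a different convention.

```python
import random


def permutation_null(ranked_pathways, score_fn, n_perm=10000, seed=0):
    """Scores obtained after repeatedly shuffling the pathway ranking."""
    rng = random.Random(seed)
    shuffled = list(ranked_pathways)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_scores.append(score_fn(shuffled))
    return null_scores


def empirical_pvalue(observed, null_scores):
    """Fraction of null scores at least as large as observed, with a +1 correction."""
    n_extreme = sum(score >= observed for score in null_scores)
    return (n_extreme + 1) / (len(null_scores) + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjustment.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Toy usage: position of pathway "A" in a shuffled ranking as a stand-in score.
null = permutation_null(["A", "B", "C", "D"], lambda ranking: ranking.index("A"), n_perm=50)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Because the empirical p-value is bounded below by the number of permutations, several top candidates can share the same smallest attainable value, and more permutations would be needed to rank them by significance.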