Evaluation of the Common Molecular Basis in Alzheimer’s and Parkinson’s Diseases

Alzheimer’s disease (AD) and Parkinson’s disease (PD) are the most common neurodegenerative disorders related to aging. Though several risk factors are shared between these two diseases, the exact relationship between them is still unknown. In this paper, we analyzed how these two diseases relate to each other from the genomic, epigenomic, and transcriptomic viewpoints. Using an extensive literature mining, we first accumulated the list of genes from major genome-wide association (GWAS) studies. Based on these GWAS studies, we observed that only one gene (HLA-DRB5) was shared between AD and PD. A subsequent literature search identified a few other genes involved in these two diseases, among which SIRT1 seemed to be the most prominent one. While we listed all the miRNAs that have been previously reported for AD and PD separately, we found only 15 different miRNAs that were reported in both diseases. In order to get better insights, we predicted the gene co-expression network for both AD and PD using network analysis algorithms applied to two GEO datasets. The network analysis revealed six clusters of genes related to AD and four clusters of genes related to PD; however, there was very low functional similarity between these clusters, pointing to insignificant similarity between AD and PD even at the level of affected biological processes. Finally, we postulated the putative epigenetic regulator modules that are common to AD and PD.


Introduction
Alzheimer's disease (AD) is a complex neurodegenerative disorder, clinically characterized by a gradual decline in memory and impairment of other cognitive functions like communication, movement, higher visual processing, and language inability [1]. About 5.7 million people were living with AD in the U.S. in 2018 [2]. The manifestation of AD is primarily attributed to the extracellular beta-amyloid (Aβ42/40) aggregates and intracellular hyperphosphorylated Tau protein that accumulate in the brain of AD patients, causing neuroinflammation and brain cell death. AD is classified into two distinct categories: early onset AD (EOAD), which accounts for less than 5% of the AD population, whereas late-onset AD (LOAD) accounts for about 95% of AD patients [3]. EOAD is a Mendelian pattern disease, whereas LOAD is genetically complex and associated with several genes. The heritability contribution in LOAD is estimated to be around 58-79% [4], and the gene, APOE, has been named as the most important genetic risk factor in LOAD.
On the other hand, Parkinson's disease (PD) is the second most common age-related neurodegenerative disease. PD is caused by the death of dopamine-generating cells of substantia nigra in the mid-brain region, which affects the function of the central nervous system. Clinically, PD is characterized by syndromes like resting tremor, rigidity, bradykinesia, gait impairment, and postural instability [5,6]. Aggregation of the α-synuclein protein has been considered to be the principal cause for PD.
The relationship between AD and PD is not yet clear. AD and PD share common pathological overlaps despite occurring at different brain locations and having different clinical features. Xie et al. summarized the common pathological overlap between AD and PD, which relates to genes, nicotinic receptors, locus coeruleus, iron, mitochondrial dysfunction, oxidative stress, and neuroinflammation, tau protein, and α-Synuclein protein [7]. Patients with AD have been shown to possess a higher chance of developing PD. One study shows that out of 29 patients with PD, as many as 16 (55%) have mild or severe dementia, which is related to AD [8]. The survival time of PD patients with AD is also lower than that without AD [8]. Both diseases have common risk factors like oxidative stress and aging. Insufficiency of vitamin D has been also reported for both AD and PD patients when compared to the healthy controls [9]. However, so far, no common genetic risk factors have been reported for AD and PD.
In this paper, we examined the genomic, epigenomic, and transcriptomic level similarity between AD and PD. Genome-wide association studies (GWAS) identified more than 50 risk loci associated with LOAD and PD. Furthermore, several studies confirmed the effect of miRNAs in neurodegenerative diseases like AD and PD and reported the associated differentially-expressed microRNAs (miRNAs). miRNAs are non-coding single-stranded RNAs that are very small (20-22 nt) in size and function as negative gene regulators. miRNAs have been also used as a biomarker in early detection and staging information of diseases. Though we know the associated miRNAs and genes in AD and PD, the similarity or possible relationships between them is still unknown. Here, based on a literature search, we identify the different genes and miRNAs that have been associated with AD and PD. Next, we discuss how they may be related in these diseases considering their high likelihood of co-occurrence and predicted common epigenetic modules shared by AD and PD. Lastly, we report the transcriptomic level similarity between AD and PD using regulatory co-expression network prediction and network-based analysis.

Genetic Associations of AD According to GWAS
EOAD is a Mendelian pattern disease. Three genes, APP, PSEN1, and PSEN2, are considered to be genomic biomarkers in EOAD [10]. These three genes are involved in APP breakdown and Aβ generation. For example, PSEN1 encodes the subunit of γ-secretase, and mutations in PSEN1 is a common cause of EOAD. PSEN1 mutant fibroblasts increase the ratio of Aβ42 to Aβ40 [11]. Mutation in these three genes has been attributed to a wide range (between 12-77%) of EOAD patient [12].
The genetic contribution of EOAD is estimated to be 60-80% [13]. In contrast to EOAD, LOAD is a non-Mendelian disease and demonstrates a complicated relationship with genomics. The first degree relative of an LOAD patient has about a two-times higher probability of developing LOAD in their lifetime than the individual not having first degree LOAD relatives [3]. Genome-wide association studies identified more than 50 risk loci associated with LOAD. A summary of all major GWAS for LOAD is shown in Tables 1 and 2 and Figure 1. These genes were found to be related to the Aβ pathway, as well as to the immune system, lipid metabolism, and synaptic function. LOAD-related functional effects of these genes are summarized as in [14]: • Lipid metabolic pathway: APOE, CLU, ABCA7 • Immune system: CLU, CR1, CD33, ABCA7, MS4A, EPHA1 • Complement system: CR1, CLU, ABCA7, CD2AP • Endocytosis pathway: BIN1, PICLAM, CD2AP Though genes like PLD3 have higher risk (Figure 1), they are less common in the LOAD population. Therefore, in the following, we briefly review only those genes that are more common. a) b) Figure 1. (a) Rare and common variants of AD genes and their risk. Red color signifies Mendelian genes, and green signifies non-Mendelian genes (adapted from [15]). (b) Rare and common variants of PD genes and their risk (adapted from [16]).  APOE located on chromosome 19 is the most potent risk factor and the only confirmed susceptibility locus of LOAD. The most common genotype of APOE is APOE3 and has an odds ratio (OR) estimated around 3.2, whereas APOE4, which is present in about 20% of LOAD population, has OR estimated to be around 14.2 [27]. Here, OR is the quantification of the odds that an outcome will occur given a specific exposure [28] compared to the odds of the outcome occurring in the absence of the exposure, with a higher value (>1) reflecting that the exposure is associated with higher odds of outcome and can be designated as a risk factor. However, the APOE2 allele shows some protective effects in AD. APOE has several implications in the AD pathway [29]; it controls lipoprotein metabolism and also affects Aβ clearance by binding with Aβ protein. There is a strong connection of APOE with inflammation, cholesterol transport, and the central nervous system [30]. Neuroimaging studies showed that an APOE4-positive individual has higher deposits of Aβ plaques in the brain compared to an APOE4-negative individual [31]. Few APOE receptors, notably Lrp1, Apoer2, and Vldlr, were identified in the postsynaptic density, which interacts with the synaptic system. Reelin signaling by these receptors activates some pathways that protect Aβ polymerization [32].
Association of the gene CLU (also known as APOJ) and AD has been confirmed in several GWAS experiments. CLU encodes the major brain apolipoprotein, and CLU expression was reported to increase in LOAD brain and also was associated with the reduction of white matter and lower fractional anisotropy in a young, healthy human [33]. This gene is also related to both Aβ clearance and Aβ aggregation. CLU has an essential relationship with inflammation and the immune system [10]. Studies found an increase in CLU concentration in the brain, plasma, and CSF of the patient with AD [34]. Moreover, CLU variants can alter the coupling between the prefrontal cortex and hippocampus [35].
BIN1 is another critical risk locus of LOAD, and altered expression of BIN1 was found in the AD brain. BIN1 mainly increases the risk of AD by modulating tau pathology [36]. Lower BIN1-amphiphysin 2 expression promotes the propagation of tau pathology [37]. BIN1 can also interact with cytoplasmic linker protein CLIP-170; studies found an interaction between tau protein and BIN1 in human neuroblastoma cell [38]. BIN1 is also related to clathrin-mediated endocytosis, which can significantly affect APP processing and Aβ production. A relation between the clathrin-mediated endocytosis gene and toxic effects of Aβ was shown in a study [39]. It also plays a vital role in inflammation. BIN1 participates in phagocytosis and binds to α integrins, which is related to immune response [40]. Studies also found a possible link between the reduction of intracellular Ca++ release and BIN1 protein. Ca++ increase is linked with presenilin mutation, amyloid plaques, and ApoE4 expression, and maintaining calcium homeostasis is essential for normal neuronal function and synaptic transmission [27,41].
Complement receptor 1 (CR1) is the receptor of the C3b/C4b peptide. It encodes monomeric single-pass type I transmembrane glycoprotein, which is involved in immune complement cascade. Four CR1 SNPs (rs646817, rs1746659, rs11803956, and rs12034383) were found to increase Aβ42 concentration in AD patients, which is suggestive of CR1's role in Aβ metabolism. This gene also might increase Aβ oligomerization over Aβ fibrillogenesis, which causes more neurodegeneration [42]. Further studies suggested that CR1 (rs6656401) is associated with cerebral amyloid angiopathy and vascular amyloid deposition [43]. CR1 mRNA level also correlates with neurofibrillary tangles and phosphorylated tau [42]. CR1 can modulate the complement activation system, which leads to inflammation. A detailed review of this process can be found in [44].
TREM2 is another high-risk gene linked to AD, although it is present in a lower percentage of the population. Studies found the mutation in TREM2 is related to an autosomal recessive form of dementia [45]. Rare missense mutations raise LOAD risk with a similar effect size of APOE [46]. TREM2 R47H raises AD risk by 1.7-3.4-fold [29,47]. TREM2 also correlated with an increase in tau levels in cerebrospinal fluid [48].

Genetic Associations of PD According to GWAS
Genome-wide association studies confirmed that PD has a significant genetic contribution. Previous studies reported about 20 loci and 15 genes related to PD. A summary of all major GWAS for PD is shown in Tables 3 and 4. From a genetics viewpoint, common variation of loci α-synuclein (SNCA), leucine-rich repeat kinase 2 (LRRK2), and microtubule-associated protein tau (MAPT) showed significant relationships with PD. Moreover, mutation in nine genes, namely SNCA, LRRK2, VPS35, EIF4G1, CHCHD2, PRKN, DJ1, PINK1, and ATP13A2, is associated with the monogenic form of PD [49]. Table 3. Genome-wide association studies (GWAS) in PD.

Study Ethnic Group
Sample Size Locus SNPs [50] European PD cases: 5353; control: 5551 Missense and multiplication mutations in the SNCA gene are believed to be the primary cause of the monogenic form of PD. However, these mutations only account for 10% of PD cases [55]. Mutation in SNCA was first identified in PD in 1997, and until now, five different point mutations have been confirmed as the cause of PD [56]. The non-coding intron in the SNCA gene increases PD susceptibility. Mutated alleles of SNCA change the expression and property of α-synuclein protein, which leads to abnormal aggregation of α-synuclein. The first identified mutation of SNCA was p.A53T, which causes PD. These patients have early age onset (38-49 years) within the Mediterranean origin and rapid disease progression. However, this mutation only accounts for 0.5% of familial and sporadic cases of PD. The second SNCA mutation is p.A30P, with a variable age onset (54-76 years). Cognitive impairment is frequent and early in the patients having this mutation. The third mutation was identified as the heterozygous p.E46K mutation with age ranging from 49-67 years. The fourth mutation p.H50Q was identified in 2013 in a PD patient of age 60 and also in the PD brain-driven DNA. The fifth missense mutation of SNCA is p.G51D; it has an early age onset in the 30s. This mutation leads to PD with unusual clinical and biochemical features. Multiplication of the SNCA gene is more common than these single-point mutations. SNCA duplication and triplication has been reported worldwide. A two-fold expression level of α-synuclein protein has been identified in those patients. SNCA duplication is more common than triplication and has late age onset and slow disease progression compared to the triplication. A common variant of SNCA was also identified as a risk factor of sporadic PD [57].
In 2004, mutation of the LRRK2 gene was identified as a genetic cause of PD. The frequency of LRRK2 mutation in hereditary PD has been estimated to be 4% with an average age onset of 60 years, and sporadic PD is estimated to be around 1% [58]. The most frequent mutation of LRRK2 is G2019S, whereas some of the other common mutations are R1441G, R1441C, Y1699C, and R1441H.
Another monogenic cause of PD is D620N mutation in the VPS35 gene, which was first identified in 2011 in an Austrian family [59]. This mutation accounts for about 1% of familial PD cases. This mutation has a mean age of onset around 53 years with slow disease progression. Other monogenic causes of PD such as the mutation of PARK, PINK1, ATP13A2, and DJ-1, typically have a lower age of onset (<45 years) [49].
Another important gene related to PD is MAPT, which encodes the tau protein. Tau aggregates frequently can be seen in the brain of AD patients. The toxic interaction between tau and α-synuclein may lead to the deposition of both proteins in the brain [60]. α-synuclein also binds with tau, which can reduce the rate of axonal transport. MAPT haplotypes, especially H1 haplotypes, have been identified as a risk factor of PD [61]. MAPT exhibits a mutual regulation with the lysosome function. Interestingly, the autophagy-lysosome pathway is also related to PD [62].

Common Regulator Genes in AD/PD
In order to identify the common regulator genes for AD and PD, we first performed an inner merge of the GWAS reported gene loci for AD and PD. We have found only a single common gene HLA-DRB5 reported for both diseases. HLA-DRB5 has a strong involvement with the immune system. The biological processes related to HLA-DRB5 are adaptive immune response, the T cell receptor signaling pathway, the interferon-gamma-mediated signaling pathway, and antigen processing [63]. Its association with AD and PD has been reported in several other reports [64][65][66].
Outside of GWAS studies, various other studies reported common risk loci for AD and PD, one such gene being SIRT1. It defends against microglia-dependent amyloid β though the NF-kB signaling pathway [67]. Pharmacological and overexpression studies revealed the role of SIRT1 in impacting Aβ plaques [68,69]. A study found that overexpression of SIRT1 suppresses the α synuclein aggregate formation in PD [70], while inactivation of SIRT1 also elevates mitochondrial apoptosis and immune system alterations [71]. Mitochondria are implicated in regulation of cellular redox potency, which is important for normal physiological processes, the deregulation of which is associated with the pathogenesis of aging, neurodegenerative diseases, such as Parkinson's and Alzheimer's disease (PD, AD), cardiovascular diseases, inflammation, and metabolic disorders [72]. Additionally, miRNA-34a, miRNA 122, and miRNA 132 inhibit Sirt1; regulation of miRNA-34a and miR132 was reported for AD, while miRNA132 was reported for PD in the literature [73][74][75], which potentially corroborates the involvement of SIRT1 in both AD and PD. A few other genes have also demonstrated shared genetic mechanisms in both AD and PD such as PON1, GSTO, and NEDD9 [7]. PON1 is associated with pesticide metabolism, oxidative stress, and inflammation. A study found that GSTO increases the risk and gene expression level in the brain of both AD and PD patients [76], whereas Li et al. reported NEDD9 as a common risk factor of AD and PD [77]. However, more studies are needed on these genes to determine whether they can be considered as shared risk factors for both diseases.

Differential Expression Analysis and Functional Enrichment on the GEO Dataset
To get better insights into the genomic level similarity between AD and PD, we next used the gene expression data that were downloaded from the Gene Expression Omnibus (GEO) repository [116]. Next, we performed differential expression (DE) analysis on these datasets to identify the important genes in AD/PD. We found that out of the reported gene list, 38 genes were expressed in this dataset in AD, and 1444 genes were expressed in PD (p < 0.05), as shown in Table 7. None of these DE genes were however reported in GWAS studies in AD patients. However, in PD, nine DE genes HIP1R, FAM171A1, BIN3, MAPT, RIT2, ALAS1, SH3GL2, ITPKB, and SNCA were reported in PD based on GWAS studies. Next, we performed a functional enrichment analysis on these DE genes. The enriched biological processes are shown in Figure 4.

Gene Co-Expression Network Prediction and Network Analysis on the GEO Dataset
In order to analyze how each gene regulates the others in these diseases, we predicted the gene co-expression network for AD and PD separately using the same GEO dataset. We used only a subset of the genes that were differentially expressed or reported in the GWAS experiments to predict these networks. Next, we took a consensus cutoff of 0.96 for PD and 0.90 for AD to select only the high confidence edges from these networks and visualize the relationship between these genes in AD and PD. AD and PD network data are given in the Supplementary material.  Using graph modularity on these networks, we identified a few distinct clusters for AD and PD, which are shown in Figure 5b,c. Each cluster in the network signifies a group of genes that work together closely in the disease. In this dataset, we found six closely-related clusters in AD and four clusters in PD. Next, we functionally enriched the genes in the cluster to relate them to specific biological functions. The identified functions of these clusters are shown in Figures 6 and 7. Here, Clusters 0-5 are identified as AD clusters, and 6-9 are identified as PD clusters. Darker green color corresponds to higher similarity between clusters. Figure 5d visualizes the functional similarity among the clusters of AD and PD. The functional similarity is defined as the number of common functions of the clusters divided by the total number of functions from any two clusters, each chosen from the ones listed in Figures 6 and 7. We found that the functional similarity between the clusters was quite low for AD and PD. This suggests that these clusters affect a different set of functions in each disease.

Literature Mining for GWAS/miRNA Studies
First, we explored the miRNAs/genes reported in different databases like Alzgene [117], PDgene [50], and phenomiR [118] to identify the relevant genes and miRNAs associated with AD and PD. We found that some of these databases are outdated and do not contain current information from the literature. For example, the phenomiR database was last updated in 2011 [118]. Hence, we next manually queried the published literature on or before 2018 through PubMed, ScienceDirect, Scopus, and Google Scholar searches using search terms like "AD/Alzheimer's + GWAS/gene", "PD/Parkinson's + GWAS/gene", "AD/Alzheimer's + miRNA/microRNA", "PD/Parkinson's + miRNA/microRNA", "AD/Alzheimer's + risk loci", "PD/Parkinson's + risk loci", and "LOAD + gene/GWAS/microRNA/miRNAs" to update the information obtained in the previous step for both PD and AD. For GWAS, we only considered the studies having a large number of samples. However, for miRNAs, we listed out all the reported miRNAs in AD/PD as there are fewer reports associated with miRNAs.

Analysis on GEO Data
The transcript expression data for AD/PD were downloaded from the GEO database [119]. For AD, we used GEO accession number GSE84422 as the data source of our studies [120]. GSE84422 contains RNA samples from the brain of 125 human subjects and profiled using Affymetrix Genechip microarrays. For PD, we used GSE accession number GSE20295. It consists of 93 samples taken from different brain regions of PD patients and controls [121].

Gene Coexpression Network Inference Algorithm
Predicting gene-gene interactions is a popular research area and has already been significantly documented in the literature. Genes interact among themselves via transcription factors, through mutual co-expression of a gene group. High-throughput data captured under different conditions by next-generation sequencing (NGS) or RNA-seq make it feasible to computationally predict the gene coexpression network. There are several network inference algorithms that have been implemented over the last few years to infer networks from a snapshot of the transcriptome. However, the performance of these algorithms widely varies over the different datasets and possesses a different inherent bias. There is no single algorithm that performs best in different settings. Hence, in order to predict a high confidence gene coexpression network, we used six popular network inference algorithms. These include two mutual information-based algorithms: (i) context likelihood of relatedness (CLR) [122,123] and (ii) maximum relevance minimum redundancy backward (MRNETB) [124]. We also used basic correlation-based network inference methods: (iii) Pearson and (iv) Spearman correlation, as well as (v) the distance correlation (DC)-based method and (vi) one regression-based gene network inference algorithm called the ensemble of trees (GENIE3) [125]. We next integrated the individual network predictions from each of these six different methods to get one high-confidence interaction network. To integrate the results, we used the wisdom of crowds approach, which is a phenomenon where aggregation of information of a group outperforms the results from an individual. Marbach et al. [126] showed this consensus-based approach outperformed any individual network inference algorithm and predicted a more robust and high-confidence inferred network. Therefore, the wisdom of crowds approach gave us a more accurate picture of gene regulation; this network inference pipeline was previously validated in our prior work [127][128][129]. A flowchart of the steps involved in the gene coexpression network prediction algorithm is shown in Figure 4a.
Unfortunately, some of these network inference algorithms are quite computationally expensive and not feasible to run for thousands of transcripts. Therefore, we re-implemented the parallelized version of these algorithms in CUDA-GPU; the basic idea was to compute the correlation between any gene pair on a different GPU thread. Our implementation achieved about 1000-times speed-up, which enabled us to predict the coexpression network for a large number of transcripts. Predicting high-confidence gene coexpression networks is an essential step towards understanding the role of genes or miRNAs in diseases. It not only shows us how one gene affects another gene in a specific disease, but also gives us the ability to identify how several genes work as a single group in a specific disease.

Gene Set and Functional Similarity Analysis on the GEO Dataset
We used the statistical method LIMMAto find the differentially-expressed (DE) genes from the GEO dataset [130]. Functional analysis on DE genes was performed using the CluterProfiler package in R [131]. We used the Python package networkX and the Gephi tool for analyzing the gene co-expression networks and the subsequent cluster analysis. On the predicted gene coexpression network, we performed modularity-based community detection to identify the clusters in AD/PD. Next, we performed the functional analysis on each cluster to identify the functions of the genes in each cluster. Functional similarity was calculated using the Jaccard index, which is calculated as the common functions between any two clusters divided by the union of functions from the two clusters.

Common miRNA Identification and Pathway Analysis
After identifying causal and common miRNAs between AD and PD, we analyzed the potential effect of these miRNAs in biological pathways. We used the DIANA-miRpath tool to find out the association of critical biological pathways through functional analysis with these deregulated miRNAs [132]. DIANA-miRpath is a bioinformatics tool that identifies experimentally-validated or predicted target genes associated with miRNAs. On the list of genes, it performs merging and meta-analysis algorithms to identify pathways associated with miRNAs. We used the miRTarBasedatabase to predict associated pathways from this tool; miRTarBase predicts biological pathways using only experimentally-confirmed miRNA target genes in a disease [133]. Next, we explored the literature again to gather information about how these miRNAs associate with the identified biological processes in the context of AD and PD.

Conclusions and Discussions
In this paper, we analyzed the similarity of the two most widely-occurring neurodegenerative diseases: AD and PD. Major GWAS studies identified approximately 50 risk loci for PD and AD. However, we found only one common risk loci (HLA-DRB5) that has been reported for AD and PD in these GWAS studies. HLA-DRB5 has a strong connection with the central nervous system; it has been reported several times before for AD and PD. Other studies from the literature also reported some common risk loci for AD and PD where the gene SIRT1, among others, has been implicated, which plays a dual role in impacting Aβ plaque formation and α-synuclein aggregation. Literature mining also identified 15 common miRNAs that have been reported to be associated with both AD and PD, among which hsa-miR-16 and hsa-miR-29a/b/c could be common epigenetic regulators in these two diseases. The 15 common miRNAs are mainly involved in the TGF-beta signaling pathway, MAPK signaling pathway, neurotrophin signaling pathway, glycosphingolipid biosynthesis, lacto and neolacto series, Ras signaling pathway, and arrhythmogenic right ventricular cardiomyopathy (ARVC).
To get more insights into the reasons behind the co-occurrence of AD and PD, we separately predicted the gene co-expression networks for AD and PD. Using cluster analysis, we found six different clusters in AD and four different clusters in PD, which work together in each of these diseases. We also calculated the functional similarity of these clusters in a combined AD and PD setting, but found very low functional similarity between them; this suggests that very different biological processes are activated in these two diseases, which corroborated our finding that there were not many common genetic loci between AD and PD. Additionally, this may also suggest that the 15 common miRNAs reported for AD and PD may serve as mostly a defense mechanism against brain toxicity and may not play a causal role in either AD or PD.
In a complex heterogeneous disease, different genes' activation can lead to the same disease outcome [134]. Possibly, AD and PD have different genetic roots, but converge to a similar phenotypic outcome as PD and AD share a few similar symptoms. In this study, we did not considered patient-specific variability of the gene expression while predicting the coexpression networks. One future direction of this study is to consider patient-specific variability to find the genome level similarity between AD and PD.
Funding: This work was supported by NSF-1802588 to P.G.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: