Characterizing Gene and Protein Crosstalks in Subjects at Risk of Developing Alzheimer’s Disease: A New Computational Approach

Alzheimer’s disease (AD) is a major public health threat; however, despite decades of research, the disease mechanisms are not completely understood, and there is a significant dearth of predictive biomarkers. The availability of systems biology approaches has opened new avenues for understanding disease mechanisms at a pathway level. However, to the best of our knowledge, no prior study has characterized the nature of pathway crosstalks in AD, or examined their utility as biomarkers for diagnosis or prognosis. In this paper, we build the first computational crosstalk model of AD incorporating genetics, antecedent knowledge, and biomarkers from a national study to create a generic pathway crosstalk reference map and to characterize the nature of genetic and protein pathway crosstalks in mild cognitive impairment (MCI) subjects. We perform initial studies of the utility of incorporating these crosstalks as biomarkers for assessing the risk of MCI progression to AD dementia. Our analysis identified Single Nucleotide Polymorphism-enriched pathways representing six of the seven Kyoto Encyclopedia of Genes and Genomes pathway categories. Integrating pathway crosstalks as a predictor improved the accuracy by 11.7% compared to standard clinical parameters and apolipoprotein E ε4 status alone. Our findings highlight the importance of moving beyond discrete biomarkers to studying interactions among complex biological pathways.
Processes 2017, 5, 47; doi:10.3390/pr5030047


Introduction
The prognostics of diseases such as Alzheimer's disease (AD) is of national importance. AD alone affects about 10% of the population over 65 years old [1,2], and is among the leading causes of death in patients over 75 years of age in the U.S. [3]. There is evidence suggesting that the progression to AD dementia begins years before it is clinically determined and is preceded by a phase of mild cognitive impairment (MCI), during which AD-related treatments are likely to be more effective. Thus, it is important to discover the mechanisms underlying risk of AD and to develop accurate biomarkers that reflect the complexity of the disease at an individual level. Although a number of biomarkers are currently being evaluated for use to predict AD or study disease progression (e.g., tau, p-tau181P, β-amyloid1-42, apolipoprotein E ε4 (APOE ε4), and microRNAs) [4–7], none of these markers are yet fully validated or approved for predicting the risk of AD. Indeed, AD is no longer seen as a disease of single discrete lesions, but as a perturbation of altered cortical networks by pathological processes in interlinked pathways. Hence, the application of systems biology methods to the discovery and characterization of novel biomarkers [8–20] has taken on greater promise and urgency.
The cellular mechanisms underlying many neurological disorders are complex, with crosstalks between multiple molecular pathways likely contributing to disease initiation and progression. In living organisms, pathways are said to crosstalk if they are linked together to perform biological functions as a system. Crosstalks can also be defined as interactions between signal transduction pathways, and usually take the form of protein or transmembrane interactions. A number of potential crosstalks have been noted in vitro in AD, such as those between amyloid and tau pathways, oxidative phosphorylation, the p53 signaling pathway, and apoptosis [21–23]. Another example is the reported crosstalk among MAPK, insulin, and calcium signaling pathways [24]. There is also evidence of crosstalk among pathways involved in the regulation of glycolysis metabolism, pathways involved in the regulation of the actin cytoskeleton, and apoptosis [24]. The latter crosstalk is also associated with other neurodegenerative disorders, such as Huntington disease and amyotrophic lateral sclerosis [24]. Furthermore, several cellular signaling pathways have been reported in AD, such as Wnt signaling, 5′ adenosine monophosphate-activated protein kinase, mammalian target of rapamycin, Sirtuin 1, and peroxisome proliferator-activated receptor gamma co-activator 1-α, and possible crosstalk between these pathways has been discussed [25]. For a review of multiple interacting pathways in neurodegenerative disease, see [26]. In clinical AD research studies of diagnosis or prognosis, biomarkers are typically treated as discrete entities, in part because biological pathway crosstalks between genes or proteins have not yet been fully characterized at a systems biology level in AD.
From the computational methodology standpoint, the study of pathway crosstalks is still in its infancy. Existing methods predict crosstalks between known metabolic pathways using chemical protein interaction networks [24,27–29]. However, these computational methods do not take advantage of the different types of evidence available: chemical evidence, such as direct binding; biochemical evidence, such as phosphorylation; and functional evidence, such as transcriptional regulation. Moreover, the discovery, characterization, and utilization of pathway crosstalks as biomarkers for disease prognosis have not been investigated.
Here, we use clinical, cognitive, and genetic data from a national cohort study, the Alzheimer's Disease Neuroimaging Initiative (ADNI-1), along with a systematic computational methodology to discover and characterize biological pathway crosstalks in subjects with MCI. We further examine the utility of these novel biomarkers to discriminate stable MCI from those who progress to AD dementia. The first part of the methodology (Figure 1) focuses on utilizing several existing types of evidence, such as chemical interaction, genetic interaction, domain interaction, and transcription factors, to identify potential pathway crosstalks. In the second part (Figure 2), Single Nucleotide Polymorphisms (SNPs) are used to find patient-specific pathway crosstalks as biomarkers. In the third part, we build and test initial prognostic models that use pathway crosstalks as biomarkers to predict patient progression from MCI to AD dementia (see Results). To the best of our knowledge, this is the first such systematic characterization of biological pathway crosstalk biomarkers associated with the risk of AD.
Figure 2. Identification of patient-specific pathway crosstalks. The methodology has three steps: (1) mapping the Single Nucleotide Polymorphisms (SNPs) to genes and in turn to pathways using the SNP and gene location information, (2) choosing a genetic model and calculating a patient-specific SNP enrichment score for each pathway using the patient's allele information, and (3) overlaying the pathway enrichment scores on the reference crosstalk map to build patient-specific pathway crosstalk maps.

Materials and Methods
Our methodology consists of the following steps: (A) identifying potential pathway crosstalks by using existing gene and protein data (Figure 1), (B) identifying patient-specific pathway crosstalks via SNP information (Figure 2), and (C) identifying significant pathway crosstalks as biomarkers for predicting progression from MCI to AD dementia.

Identification of Potential Pathway Crosstalks
We quantify how likely it is that a pair of pathways will crosstalk based on biological datasets that provide evidence for possible crosstalks (including chemical interaction, genetic interaction, and transcription factors). To have a more robust pathway crosstalk map, we incorporate a wide array of evidence. The scores from each of these types of evidence are then combined to build one generic pathway crosstalk reference map analogous to the "Kyoto Encyclopedia of Genes and Genomes" (KEGG) pathway reference map.
The likelihood of a pathway pair crosstalking can be scored by utilizing one of two different methods. The first method is based on the presence of common elements, such as kinases and enzymes. The second method is based on the presence of interacting elements, such as chemically interacting proteins. In the following sections, we discuss the different types of evidence used and their corresponding scoring methods.

Scoring Pathway Crosstalks Based on Common Elements
The pathway pairs were scored for how likely they are to crosstalk based on common elements from each of the following types of evidence:

• Shared enzymes and metabolites: The number of enzymes and metabolites shared by a pair of pathways is utilized as one type of evidence to identify potential pathway crosstalks. This is reasonable because a variation in the concentration of common enzymes or metabolites will affect both pathways.
• Phosphorylation: Phosphorylation, performed by protein kinases, is the addition of a phosphate group to a protein, which results in a change of the protein's function. Co-phosphorylated proteins in different pathways suggest potential pathway crosstalks.
• Transcriptional regulation: Genes with common transcription factors are likely coexpressed. Coexpressed genes in different pathways provide an avenue for the pathways to crosstalk. For each pathway pair, we find the group of transcription factors that have coexpressed genes in both pathways.

For each pair of pathways, $P_i$ and $P_j$, we define the scoring function as Equation (1), where $Y(P_i)$ is the set of proteins (enzymes, metabolites, transcription factors, kinases) associated with pathway $P_i$.
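As a rough illustration of common-element scoring, the sketch below counts the elements shared by two pathways. The exact form of Equation (1) is not reproduced in this text, so the plain intersection count used here is an assumption, as are the function names and the toy pathway data.

```python
from itertools import combinations


def common_element_score(elements_i, elements_j):
    """Number of elements shared by two pathways.

    Assumption: the score is the raw intersection size |Y(P_i) & Y(P_j)|;
    the published Equation (1) may normalize this differently.
    """
    return len(set(elements_i) & set(elements_j))


def score_all_pairs(pathway_elements):
    """Score every unordered pathway pair, keyed by a sorted name pair."""
    return {
        (a, b): common_element_score(pathway_elements[a], pathway_elements[b])
        for a, b in combinations(sorted(pathway_elements), 2)
    }


# Hypothetical pathways mapped to their kinases, enzymes, or transcription factors.
toy_pathways = {
    "pathway_A": {"K1", "K2", "E1"},
    "pathway_B": {"K2", "E1", "T1"},
    "pathway_C": {"T2"},
}
toy_scores = score_all_pairs(toy_pathways)
```

The same function can be applied separately to each evidence type (shared enzymes and metabolites, kinases, transcription factors), giving one score per evidence type and pathway pair.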

Scoring Pathway Crosstalks Based on Interacting Elements
The pathway pairs were scored for how likely they are to crosstalk based on interacting elements from each of the following types of evidence:

• Chemical interactions: Protein interactions have previously been used to identify pathway crosstalks [24,30]. Chemical interaction between proteins belonging to different pathways provides a mechanism for pathways to crosstalk.
• Genetic interactions: The use of genetic interactions for identifying pathway crosstalks stems from the concept of "between-pathway" interactions. This essentially states that if there is a genetic interaction between pathways, one pathway covers for the defects in the other pathway.
• Protein domain: Protein function is closely related to fundamental units of protein structure called "domains". In the domain interaction network, a pair of proteins has an edge if they are associated with the same set of protein domains. These edges are taken into consideration to assess potential pathway crosstalks because of the common domains.
• Synthetically lethal gene pairs: Gene pairs whose simultaneous low or non-expression can cause the organism to die are called synthetically lethal pairs [31,32]. The presence of synthetically lethal pairs of genes across two pathways is a possible sign of pathway crosstalks.

For each pair of pathways, $P_i$ and $P_j$, we define the scoring function as Equation (2), where $N_{inter}(P_i, P_j)$ is the number of interactions (genetic, chemical, domain, synthetically lethal) that exist among the proteins associated with pathway $P_i$ and the proteins associated with pathway $P_j$.
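The quantity N_inter can be illustrated with a short sketch that counts interactions having one endpoint in each pathway. This is a minimal sketch under stated assumptions: each undirected interaction is listed once as a pair, the score is the raw count, and all names and data are hypothetical.

```python
def interaction_score(proteins_i, proteins_j, interactions):
    """Count interactions that link a protein of one pathway to a protein of the other.

    Assumption: `interactions` lists each undirected interaction once as (a, b).
    """
    set_i, set_j = set(proteins_i), set(proteins_j)
    count = 0
    for a, b in interactions:
        # An interaction counts if its endpoints fall on opposite sides.
        if (a in set_i and b in set_j) or (b in set_i and a in set_j):
            count += 1
    return count


# Hypothetical proteins and interactions.
toy_proteins_i = {"A", "B"}
toy_proteins_j = {"C", "D"}
toy_interactions = [("A", "C"), ("D", "B"), ("A", "B"), ("C", "E")]
toy_n_inter = interaction_score(toy_proteins_i, toy_proteins_j, toy_interactions)
```

Here ("A", "B") lies within a single pathway and ("C", "E") leaves both pathways, so only two interactions contribute.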

Significance Estimation of Pathway Crosstalk Scores
Estimating p-values using Monte Carlo methods [33] is a robust technique for statistical significance assessments. This technique was utilized to assess the significance of the scores obtained for the pathway crosstalks using the different types of evidence, as follows:

1. For each pair of pathways, a score for how likely they are to crosstalk is calculated based on each type of evidence.
2. Each pathway is randomized by replacing all proteins in that pathway with randomly selected proteins from the set of all proteins in the organism. This pathway randomization step is repeated W = 1000 times, i.e., we obtain W sets of pathways with randomized proteins.
3. The evidence-specific scores for each pathway pair are recalculated W times using each set of pathways with randomized proteins.
4. An evidence-specific p-value is estimated for each pathway pair as R/W, where R is the number of randomized versions of that pathway pair that produce an evidence-specific score greater than or equal to the score obtained for the original pathway pair.
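The four steps above can be sketched for a single pathway pair as follows. This is a simplified illustration, not the authors' implementation: it randomizes the two pathways of one pair directly (rather than building W full randomized pathway sets), assumes sampling without replacement that preserves pathway size, and uses a hypothetical shared-protein score.

```python
import random


def monte_carlo_p_value(pathway_i, pathway_j, all_proteins, score_fn,
                        n_randomizations=1000, seed=0):
    """Estimate p = R / W for one pathway pair.

    R counts randomized pairs whose score is >= the observed score;
    W is `n_randomizations`.
    """
    rng = random.Random(seed)
    universe = sorted(all_proteins)
    observed = score_fn(pathway_i, pathway_j)
    hits = 0
    for _ in range(n_randomizations):
        # Replace every protein by a random one, keeping the pathway size.
        rand_i = set(rng.sample(universe, len(pathway_i)))
        rand_j = set(rng.sample(universe, len(pathway_j)))
        if score_fn(rand_i, rand_j) >= observed:
            hits += 1
    return hits / n_randomizations


def shared_count(a, b):
    """Hypothetical score: number of shared proteins."""
    return len(a & b)


toy_universe = {f"p{k}" for k in range(200)}
toy_overlapping = {f"p{k}" for k in range(10)}
toy_disjoint = {f"p{k}" for k in range(100, 110)}

p_identical = monte_carlo_p_value(toy_overlapping, toy_overlapping,
                                  toy_universe, shared_count)
p_disjoint = monte_carlo_p_value(toy_overlapping, toy_disjoint,
                                 toy_universe, shared_count)
```

Because the estimate is R/W, its resolution is 1/W; with W = 1000 the smallest nonzero p-value is 0.001, and an estimate of exactly 0 only means that no randomization reached the observed score.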

Combining the Scores for Each Pathway Crosstalk
For each pathway pair, we combine the evidence-specific p-values obtained using Monte Carlo methods. This gives a combined estimate of crosstalk likelihood between the pathway pair. To combine the p-values, we use the QFAST information fusion methodology proposed by Bailey and Gribskov [34], which is based on a theorem by Feller [35]. The QFAST methodology uses the product of the individual p-values as a test statistic to calculate the combined p-value; using the product of p-values as a test statistic has been shown to be a desirable method for information fusion [34]. One issue to consider is that some pathway pairs may not be scored by some types of evidence due to missing data. For those cases, we assign a p-value of 1 to denote that the particular evidence offers no information about those pathways crosstalking. The QFAST formula to calculate the combined p-value is Equation (3):
$$p_{\mathrm{combined}} = \left(\prod_{i=1}^{n} P_i\right) \sum_{k=0}^{n-1} \frac{\left(-\ln \prod_{i=1}^{n} P_i\right)^{k}}{k!} \qquad (3)$$
where $P_i$ is the p-value obtained for evidence $i$, and $n$ is the number of evidence types.
A generic pathway crosstalk reference map is then built as a network, where the nodes represent pathways and the edges represent a statistically significant combined p-value for crosstalk likelihood between a pathway pair (at a significance level of α = 0.01).
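To make the combination and thresholding concrete, the sketch below implements the QFAST product rule in its usual closed form (the product of the n p-values, multiplied by the sum over k from 0 to n-1 of (-ln product)^k / k!) and keeps the pathway pairs whose combined p-value falls below α = 0.01. The function names, the strict inequality, the handling of a zero product, and the toy p-values are assumptions made for illustration.

```python
import math


def qfast_combined_p(p_values):
    """Combine independent p-values using their product as the test statistic."""
    n = len(p_values)
    product = math.prod(p_values)
    if product == 0.0:
        # A Monte Carlo estimate of 0 makes the product 0; ln(0) is undefined,
        # so the combined p-value is returned as 0 (an assumption of this sketch).
        return 0.0
    neg_log = -math.log(product)
    term, total = 1.0, 1.0  # k = 0 term of the series
    for k in range(1, n):
        term *= neg_log / k  # builds (-ln product)^k / k! incrementally
        total += term
    return product * total


def build_reference_map(pair_p_values, alpha=0.01):
    """Return the set of pathway pairs (edges) with a significant combined p-value."""
    return {pair for pair, ps in pair_p_values.items()
            if qfast_combined_p(ps) < alpha}


# Hypothetical evidence-specific p-values; 1.0 marks missing evidence.
toy_pair_p_values = {
    ("pathway_A", "pathway_B"): [0.001, 0.02, 1.0],
    ("pathway_A", "pathway_C"): [0.5, 0.4, 1.0],
}
toy_reference_edges = build_reference_map(toy_pair_p_values)
```

Assigning a p-value of 1 to missing evidence leaves the product unchanged but still increases n, so the combined p-value becomes slightly more conservative for pairs with missing data.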

Identification of Patient-Specific Pathway Crosstalks
To determine which of the pathway crosstalks in the generic reference map may be utilized as a biomarker for progression from MCI to AD dementia, we identify patient-specific pathway crosstalks. For this purpose, we make use of SNP data. SNPs are variations in the deoxyribonucleic acid (DNA) sequence at particular locations, which can influence phenotypes such as proneness to disease or reaction to drugs. Initiatives such as the ADNI collect patient-specific SNP information. We utilize this information to identify patient-specific pathway crosstalks via the following four steps (Figure 2):

1. Obtain a mapping of SNPs to pathways using genetic information.
2. Identify the list of SNPs that are present in a patient.
3. Use the mapping obtained in Step 1 and the patient-specific SNP list in Step 2 to obtain the pathways that are "SNP-enriched" in the patient.
4. Use the "SNP-enriched" pathways from Step 3 to obtain patient-specific pathway crosstalks.

Obtain a Mapping of SNPs to Pathways
Every SNP is assigned a chromosome number and a location on the genome, which can be used to map SNPs to genes, and, in turn, SNPs to pathways. Starting with a list of all genes that map to at least one pathway, we assign an SNP to a gene if it is present within 10 kilobase pairs (kbp) upstream or downstream of that gene. This method has been previously used by Silver et al. [36,37]. Note that since SNPs are mapped to all genes within a range of 10 kbp, the same SNP may be mapped to more than one gene. The set of SNPs assigned to a pathway is the union of all SNPs assigned to the genes of that pathway.
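The window-based mapping can be sketched as follows. This is an illustrative sketch only: the data layout (dictionaries of coordinates), the function names, and the toy coordinates are assumptions, and a real genome-wide run would likely use an interval index rather than the nested loop shown here.

```python
WINDOW_BP = 10_000  # 10 kbp upstream or downstream, as described in the text


def map_snps_to_genes(snps, genes, window=WINDOW_BP):
    """Assign each SNP to every gene whose extended span contains it.

    snps:  {snp_id: (chromosome, position)}
    genes: {gene_name: (chromosome, start, end)}
    """
    gene_to_snps = {gene: set() for gene in genes}
    for snp_id, (snp_chrom, pos) in snps.items():
        for gene, (chrom, start, end) in genes.items():
            if snp_chrom == chrom and start - window <= pos <= end + window:
                gene_to_snps[gene].add(snp_id)
    return gene_to_snps


def map_snps_to_pathways(gene_to_snps, pathway_genes):
    """A pathway's SNP set is the union of the SNP sets of its genes."""
    return {
        pathway: set().union(*(gene_to_snps.get(g, set()) for g in gene_list))
        for pathway, gene_list in pathway_genes.items()
    }


# Hypothetical coordinates.
toy_genes = {"GENE1": ("1", 100_000, 120_000), "GENE2": ("1", 125_000, 130_000)}
toy_snps = {
    "rs_a": ("1", 95_000),   # within 10 kbp upstream of GENE1
    "rs_b": ("1", 122_000),  # within 10 kbp of both genes
    "rs_c": ("2", 100_000),  # different chromosome
    "rs_d": ("1", 50_000),   # too far from either gene
}
toy_gene_to_snps = map_snps_to_genes(toy_snps, toy_genes)
toy_pathway_snps = map_snps_to_pathways(toy_gene_to_snps,
                                        {"pathway_X": ["GENE1", "GENE2"]})
```

The SNP `rs_b` shows the case noted in the text where one SNP maps to more than one gene.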

Identify Patient-Specific SNPs That Are Present
For each patient, we identify a list of SNPs that are present based on the homozygous minor (recessive) genetic model. This genetic model requires a minor allele count of 2 for an SNP to be considered present, i.e., the minor allele is inherited from both parents.
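A minimal sketch of the homozygous minor model is given below, assuming genotypes are stored as two-character strings and the minor allele of each SNP is known; both the encoding and the names are hypothetical.

```python
def minor_allele_count(genotype, minor_allele):
    """Number of copies of the minor allele in a two-allele genotype string."""
    return sum(1 for allele in genotype if allele == minor_allele)


def snps_present(genotypes, minor_alleles):
    """SNPs considered present under the homozygous minor (recessive) model."""
    return {
        snp for snp, genotype in genotypes.items()
        if minor_allele_count(genotype, minor_alleles[snp]) == 2
    }


# Hypothetical patient genotypes; "A" is taken as the minor allele throughout.
toy_genotypes = {"rs1": "AA", "rs2": "AG", "rs3": "GG"}
toy_minor_alleles = {"rs1": "A", "rs2": "A", "rs3": "A"}
toy_present = snps_present(toy_genotypes, toy_minor_alleles)
```

Only `rs1` carries two copies of the minor allele, so it is the only SNP counted as present; the heterozygous `rs2` is excluded under this model.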

Identify Patient-Specific SNP-Enriched Pathways
Given the set of SNPs assigned to a pathway, $SNP_{pathway}$, the set of SNPs that are present in a patient, $SNP_{patient}$, and the set of SNPs of interest, $SNP_{interest}$, we define an enrichment score for this pathway and patient as Equation (4), where $SNP_{interest}$ is the set of all SNPs found on the human genome or a set of relevant SNPs from the scientific literature.
A p-value for the enrichment score is calculated using Monte Carlo methods, as discussed previously. The "SNP-enriched" pathways for each patient are then defined as the pathways with a statistically significant p-value for that patient (at a significance level of α = 0.05).

Identify Patient-Specific Pathway Crosstalks
Given the SNP-enriched pathways for each patient, we build patient-specific pathway crosstalk maps from the generic pathway crosstalk reference map, analogous to building organism-specific pathway maps from the KEGG pathway reference map. A pathway crosstalk, i.e., an edge in the patient-specific crosstalk map, is present if both pathways are SNP-enriched for that patient.
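The overlay step amounts to filtering the reference edges, as in the following sketch; the edge representation as name pairs and the toy data are assumptions.

```python
def patient_crosstalk_map(reference_edges, enriched_pathways):
    """Keep a reference edge only if both of its pathways are SNP-enriched."""
    enriched = set(enriched_pathways)
    return {(a, b) for a, b in reference_edges if a in enriched and b in enriched}


# Hypothetical reference map and one patient's SNP-enriched pathways.
toy_reference = {
    ("pathway_A", "pathway_B"),
    ("pathway_A", "pathway_C"),
    ("pathway_B", "pathway_D"),
}
toy_enriched = {"pathway_A", "pathway_B", "pathway_D"}
toy_patient_edges = patient_crosstalk_map(toy_reference, toy_enriched)
```

The edge between pathway_A and pathway_C is dropped because pathway_C is not enriched for this patient.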

Identification of Biased Pathway Crosstalk
The pathways and patient-specific pathway crosstalks that are biased towards MCI progressive patients or MCI non-progressive patients (at a significance level of α = 0.01) are incorporated as features into the model to predict progression from MCI to AD dementia. The bias of an active pathway crosstalk towards MCI progressive patients is quantified using the hypergeometric test φ(n, x, v, w) (Equation (5)), with the following parameters:

• Population: n is the total number of patients.
• Success in population: x is the total number of MCI progressive patients and y is the number of MCI non-progressive patients.
• Sample: v is the total number of patients (both MCI progressive and MCI non-progressive patients) a pathway crosstalk is enriched in.
• Success in sample: w is the number of MCI progressive patients and z is the number of MCI non-progressive patients the pathway crosstalk is enriched in.

Similarly, the bias of an active pathway crosstalk towards MCI non-progressive patients can be calculated via φ(n, y, v, z).
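The bias test can be illustrated with the standard upper-tail hypergeometric probability. Since Equation (5) is not reproduced in this text, the upper-tail form (the probability of observing w or more successes) is an assumption, as are the function name and the example counts for v, w, and z; n = 91, x = 41, and y = 50 are the cohort sizes reported in the Datasets section.

```python
from math import comb


def hypergeom_upper_tail(n, x, v, w):
    """P(at least w successes in a sample of v) for a population of n with x successes."""
    upper = min(v, x)
    favourable = sum(comb(x, k) * comb(n - x, v - k) for k in range(w, upper + 1))
    return favourable / comb(n, v)


# Cohort sizes from the text; v, w, and z below are hypothetical.
n_patients, n_progressive, n_non_progressive = 91, 41, 50
v_enriched, w_progressive = 10, 9
z_non_progressive = v_enriched - w_progressive

bias_towards_progressive = hypergeom_upper_tail(
    n_patients, n_progressive, v_enriched, w_progressive)
bias_towards_non_progressive = hypergeom_upper_tail(
    n_patients, n_non_progressive, v_enriched, z_non_progressive)
```

A crosstalk would be kept as a feature if either probability falls below the stated significance level of 0.01.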

Datasets
In this study, we utilize cellular subsystems that model biological pathways. Henceforth, we will refer to a cellular subsystem as a pathway. To create a potential pathway crosstalk reference map, we used cellular pathway data from the KEGG database [38–40]. We obtained evidence for human chemical interaction, genetic interaction, and synthetic lethal gene pairs from BioGRID [41], domain interaction from GeneMania [42], transcription factors from the FANTOM database [43,44], and protein phosphorylation data from [45]. We obtained SNPs associated with genes that were manually curated to be associated with AD from the Comparative Toxicogenomics Database [46], and we obtained a compilation of genes from the literature that have been identified as likely risk factors of AD from SNPedia [47]. This information was utilized as our biologically meaningful knowledge priors. Some of the genes associated with Alzheimer's that were used in this study can be found in Table 1.
Table 1. Some of the genes associated with Alzheimer's disease (AD) that were used in this study.

Gene | Description
APP | Amyloid beta (A4) precursor protein. Mutations in this gene have been implicated in autosomal dominant AD and cerebroarterial amyloidosis (NCBI Entrez Gene)
IL-1β | Four new genetic studies underscore the relevance of IL-1 to Alzheimer's pathogenesis, showing that homozygosity of a specific polymorphism in the IL-1α gene at least triples Alzheimer's risk, especially for an earlier age of onset and in combination with homozygosity for another polymorphism in the IL-1β gene [48]
SOD2 | A polymorphism in SOD2 is associated with development of AD [49]
NOS3 | NOS3 may be a new genetic risk factor of late onset AD [50]

The data used in the preparation of this manuscript were obtained from the ADNI [51] database. The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of the ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD.
For our predictive study, we utilized the dataset from an earlier study by Shaffer et al. [52] based on ADNI-1. That particular study identified 97 MCI patients and predicted progression to AD dementia based on their clinical parameters, MRI results, PET scans, cerebrospinal fluid (CSF) markers (tau, p-tau181P, and β-amyloid1-42), the APOE ε4 genotype, and results from at least one follow-up clinical examination. Out of the 97 patients from the earlier study, only 91 patients have corresponding SNP data in the ADNI database. Hence, for the current study, we only utilized these 91 patients. However, this reduction in the number of patients did not considerably affect the ratio of MCI progressive patients to MCI non-progressive patients. The original study had 43 MCI progressive patients and 54 MCI non-progressive patients, and the reduced dataset has 41 MCI progressive patients and 50 MCI non-progressive patients. Thus, there is still sufficient representation of the two classes of patients.

Sample Characteristics
The mean age for all 91 MCI patients was 74.96 ± 7.32 years (mean ± standard deviation). The male-to-female ratio was 2.37, and 96.7% of subjects were white. A total of 36.26% of subjects had a family history of AD, and 54.94% had a positive finding for the APOE ε4 genotype. The mean follow-up duration for all of the subjects was 31.6 ± 10.6 months. Of these, 41 progressed to AD during follow-up (MCI progressive patients) and 50 did not (MCI non-progressive patients), with MCI progressive patients tending to have longer follow-up times by about 4.5 months. Statistically, MCI progressive patients did not differ from MCI non-progressive patients in mean age, sex ratio, education, race, ethnicity, family history of AD, or APOE ε4 prevalence. See Table 2 for details.

SNP-Enriched Pathways and Associated Crosstalks
Our analysis identified SNP-enriched pathways that represent six of the seven KEGG pathway categories, including Cellular Processes, Metabolism, Environmental Information Processing, Genetic Information Processing, Human Diseases, and Organismal Systems. This broad array of pathway categories represents the complex nature of AD pathogenesis, which has been attributed to many different biological mechanisms, ranging from amyloid toxicity to metabolic dysfunction to immune dysregulation. Figure 3 depicts the distribution of SNP-enriched pathways amongst the six KEGG categories. The majority of enriched pathways are classified under Human Diseases (31%). This supports the well-established relationships between AD and multiple other cardiovascular, autoimmune, and neurodegenerative diseases. For instance, diabetes, obesity, and heart diseases are well-established risk factors of AD, so much so that AD has been referred to as type 3 diabetes. As such, finding SNP-enriched pathways for cardiovascular, endocrine, and metabolic diseases in individuals with MCI is anticipated [53].
Similarly, the enrichment of metabolic pathways, organismal systems including nervous and immune system pathways, and common signaling pathways of the environmental information processing category is also expected and well-supported in the literature [54–68]. Interestingly, several genetic information processing pathways, including cell cycle regulation and DNA replication and repair, were found to be enriched. Evidence for the roles of these pathways in AD has only recently begun to surface [69–71]. Our findings of the SNP-enrichment of these pathways among MCI individuals may provide support for further investigations into such pathways.
SNP-enriched pathway crosstalks were discovered between six KEGG categories, with the greatest number of crosstalks occurring between Human Diseases and Organismal Systems. It is difficult to stipulate the significance of these findings. However, given that the etiology of many diseases, including AD, is complex and likely involves the failure/dysregulation of many pathways that are involved in the normal functioning of multiple organ systems, such significant crosstalk between these two categories among MCI individuals is not unexpected. The aging process itself may facilitate a greater number of crosstalks in many pathways, since aging is associated with degeneration in many tissues and raises the risk for other chronic diseases besides dementia.
To investigate the genetic load in regards to AD, we further examined enriched pathway crosstalks specifically relating to the KEGG AD pathway. We identified 97 AD-related crosstalks and grouped the participating pathways by KEGG category (Figure 4). In line with the overall findings of crosstalk enrichment, the AD-specific pathway crosstalks primarily fell between the categories Human Diseases and Organismal Systems, supporting the importance of the pathways within these categories in AD genetic load. In contrast, pathways of Metabolism and Genetic Information Processing had very few crosstalks, suggesting that genetic load in these processes is not as important to the disease process, at least in this particular cohort. Similar findings were seen in the analysis of all pathway crosstalks. Focusing in on the AD pathway, we observe significant crosstalk between all pathway categories, supporting the complex etiology of this disease.

SNP-Enriched Features with Baseline Clinical Parameters
We predicted progression from MCI to AD dementia using a support vector machine (SVM) with a linear kernel function, with baseline clinical parameters (age, education, and Alzheimer's Disease Assessment Scale-cognitive subscale (ADAS-Cog)), significant pathways, or significant pathway crosstalks as predictors. The results for 100 iterations of 10-fold cross-validation are shown in Table 3. The model built with the clinical parameters only produced an accuracy of 59.19 ± 2.46% with 83.64 ± 0.29% of training data points as support vectors. The model built with significant pathways alone produced an accuracy of 56.78 ± 3.5% with 68.36 ± 3.5% support vectors. Typically, we expect a random guessing model to yield an accuracy of 50%; thus, both models only perform moderately above a random model. A high percentage of support vectors indicates that an SVM model is overfitted and unlikely to generalize well. Thus, if we have two models that produce the same accuracy, then we pick the model that has the lower percentage of support vectors. In both models, 68% or more of the training data points were used as support vectors, which indicates highly overfitted models, consistent with the poor cross-validation accuracy.
Incorporating both the baseline clinical parameters and significant pathways as predictors produced a model with an accuracy of 64.57 ± 3.56% with 63.3 ± 1.15% support vectors. This combined model demonstrated a 5.38% increase in accuracy compared to the baseline clinical parameters model and a 7.79% increase in accuracy compared to the model using significant pathways alone. Additionally, the reduced support vector percentage of this combined model indicates a better generalizability than the baseline clinical parameters model (20.34% decrease in support vectors) and the significant pathways model (5.04% decrease in support vectors).
With our novel approach of using significant pathway crosstalks to predict AD progression, our model provides an accuracy of 60.97 ± 3.24%, which is higher than using baseline clinical parameters or significant pathways alone. Furthermore, this crosstalks model has the lowest support vector percentage, 50.83 ± 4.77%, and thus the greatest generalizability of all of the models.
The enhancement of the significant pathway crosstalks model with the inclusion of baseline clinical parameters produced a model that has the greatest accuracy of 70.9 ± 3.3% with a moderate support vector percentage of 54.29 ± 0.56%. These initial results support the utility of using pathway crosstalks as significant predictors of progression from MCI to AD dementia and warrant replication in larger samples followed for longer periods.
We compared models built using the clinical parameters and the SNP-enriched features (significant pathways or significant pathway crosstalks) to a logistic regression model with only clinical parameters by Shaffer et al. [52] (Table 4). We also noticed that the average accuracy of the logistic regression model slightly increased (from 58.7% to 59.10 ± 1.71%) when we repeatedly created random 10-folds instead of using the 10 original folds from Shaffer et al. [52]. It decreased (to 57.04 ± 2%) when we removed the six patients that did not have corresponding SNP data in the ADNI database. Our method, when incorporating either significant pathways or significant pathway crosstalks, had a higher average accuracy on 100 randomly generated 10-folds than the method by Shaffer et al. [52]. Notably, the combination of the baseline clinical parameters, APOE ε4, and significant pathway crosstalks in our logistic regression model yielded an accuracy of 72.1 ± 2.66%. A similar accuracy was obtained using a linear kernel SVM built on the SNP-enriched features. This indicates that the pathways and pathway crosstalks indeed lead to better prediction of progression from MCI to AD dementia.

Randomized SNP-Enriched Features
To demonstrate that the pathway crosstalks found in this study have true predictive power and that the results are not a random occurrence, we generated 25 random samples of pathway crosstalks with no prior association to Alzheimer's and performed 100 iterations of 10-fold cross-validation for each of these 25 samples. The results are shown in Table 5. The model with the baseline clinical parameters and randomized significant pathway crosstalks gave an accuracy of 59.27 ± 3.66% with 83.47 ± 1.84% support vectors. Compared to the original model that uses baseline parameters and significant pathway crosstalks (instead of randomized pathway crosstalks), this model is about 12 percentage points less accurate and uses 29.1 percentage points more support vectors. As expected, our randomly generated pathway crosstalks show worse performance than the significant pathway crosstalks. The model accuracy is still moderately above that of a random guessing model, likely due to the presence of the clinical parameters. A similar trend was seen when investigating models with baseline clinical parameters and all AD biomarkers to determine the effects of randomized pathways.
In this work, we focus on the development of a novel computational methodology for the discovery of pathway crosstalks to be used as biomarkers for the prognosis of AD. To demonstrate the efficacy of our methodology, we compared it with methods and results from prior studies in this area, which used ADNI-1 data. Although more recent data are available, ADNI-1 data were used so that we could benchmark our methodology against these prior studies. In future work, we will continue our characterization efforts by incorporating the newer ADNI datasets as well as increasing the sensitivity of the proposed methodology through the use of the additive genetic model for the identification of patient-specific SNPs. There are also some limitations to our study. The ADNI is not a population-based study; it is essentially a biomarker cohort at research sites, and our sample size was relatively small: we relied on a sample that was previously studied, since our initial goal was to examine the additive value of crosstalk biomarkers. We also did not incorporate other biomarkers such as tau, p-tau181P, β-amyloid1-42, APOE ε4, and microRNAs at this time, since our main focus was on methodological development for discovering and characterizing pathway crosstalks. However, the ADNI results have formed the basis for many current clinical prevention drug trials, and hence the ADNI is a highly relevant dataset. Moreover, its careful selection criteria and its rich, openly available biomarker, genetic, and longitudinal cognitive data are enormous strengths. Indeed, the study of pathway crosstalks may yield novel insights into how AD pathological (e.g., beta-amyloid, tau) and neuronal loss (e.g., apoptosis, atrophy) mechanisms interact, and our methods lay the foundation for such future work.
The generic pathway crosstalk reference map was built using several different datasets, and hence the question arises as to whether all datasets should be treated equally. For simplicity, in this study, we treated all datasets equivalently. However, a modification to our information fusion method would allow us to introduce parameters to weigh evidence differently based on expert knowledge or trustworthiness. In the future, we would like to perform additional experiments to see the effects of these parameters on AD prognosis. This is non-trivial, as we would first need to define a weighting scheme and then develop additional methods to gauge the weights for different evidence.
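One way such a randomized control could be drawn is sketched below. The sketch assumes that each random sample is matched in size to the significant crosstalk set and is drawn only from pathway pairs outside that set; the text does not specify either detail. The function name and pathway labels are hypothetical.

```python
import random


def sample_random_crosstalk_sets(all_pairs, significant_pairs, n_sets=25, seed=0):
    """Draw n_sets random crosstalk sets that avoid the significant pairs.

    Assumption: each random set has the same size as the significant set.
    """
    significant = set(significant_pairs)
    pool = sorted(set(all_pairs) - significant)
    if len(pool) < len(significant):
        raise ValueError("not enough non-significant pairs to sample from")
    rng = random.Random(seed)
    return [rng.sample(pool, len(significant)) for _ in range(n_sets)]


# Toy example with made-up pathway identifiers.
pathways = ["P1", "P2", "P3", "P4", "P5", "P6"]
all_pairs = [(a, b) for i, a in enumerate(pathways) for b in pathways[i + 1:]]
significant_pairs = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]
random_sets = sample_random_crosstalk_sets(all_pairs, significant_pairs)
```

Each of the resulting sets would then replace the significant crosstalk features in the prediction model and be scored with the same repeated 10-fold cross-validation, so that any accuracy gap can be attributed to the choice of crosstalks and not to the number of features.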
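To make the idea of weighting evidence concrete, the following sketch fuses per-source crosstalk scores for one pathway pair with a weighted mean. This is only an illustration under stated assumptions: it is not the fusion rule used in this study, and the function name, the source labels, and the weights are hypothetical. With no weights supplied, every source counts equally, which mirrors the equal treatment of datasets described above.

```python
def fuse_scores(evidence_scores, weights=None):
    """Weighted mean of per-source crosstalk scores for one pathway pair.

    evidence_scores: dict mapping a source name to its score.
    weights: optional dict mapping the same source names to non-negative
             weights; omitted weights default to equal treatment.
    """
    if weights is None:
        weights = {source: 1.0 for source in evidence_scores}
    total = sum(weights[source] for source in evidence_scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[s] * score for s, score in evidence_scores.items())
    return weighted / total


# Hypothetical scores from two evidence sources for a single pathway pair.
scores = {"source_a": 0.2, "source_b": 0.8}
equal = fuse_scores(scores)
trusted_a = fuse_scores(scores, {"source_a": 3.0, "source_b": 1.0})
```

Under this scheme, the open question raised above becomes how to choose the weight dictionary, for example from expert ratings of each dataset.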

Conclusions
AD is a major public health challenge, and there remain substantial gaps in our knowledge of its biology and treatment targets. Fully characterizing AD at a systems biology level is a priority for these reasons. In this work, we demonstrate a new methodology to build a pathway crosstalk reference map using the combined power of several gene and protein knowledge antecedents, and use this map to discover AD-specific pathway crosstalks by enrichment with patient-specific SNP information. Our pilot data document the promise of utilizing those SNP-enriched pathway crosstalks to identify potential AD-linked mechanisms at a systems level. More specifically, we demonstrate a three-step methodology to build a generic pathway crosstalk reference map by combining several sources of protein/gene evidence. We then used the identified pathway crosstalks from this map as potential AD biomarkers by enriching them with patient-specific SNP information. In an initial sample of at-risk subjects, we found that utilizing SNP-enriched pathway crosstalks as additional features significantly improved the accuracy of predicting progression from MCI to AD dementia.
In addition, we verified some previously identified pathways and identified some new pathway crosstalks that warrant further study. Furthermore, we built the prediction model including the identified pathways and crosstalks, and compared our model's outputs with a previous study. These prediction model comparison analyses show that the identified pathways and crosstalks, together with other clinical information, can be used as significant biomarkers for predicting progression from MCI to AD dementia. Additional analysis would be required to understand the biological mechanisms that explain the association of these pathways with AD.
In summary, this is the first report to our knowledge that characterizes biological crosstalk pathways in subjects at risk of AD using gene and protein knowledge antecedents and studies their potential utility as prognostic biomarkers. Further application of this methodology to the full ADNI-1 and ADNI-2 cohorts as well as to other population studies is warranted, and may yield further insights into disease mechanisms as well as novel targets for biomarker development and drug discovery.

Figure 1. Identification of potential pathway crosstalks. The methodology has three steps: (1) quantifying crosstalk likelihood using multiple individual evidence to score each pathway pair, (2) obtaining a combined score using information fusion, and (3) building the crosstalk reference map.

Figure 2. Identification of patient-specific pathway crosstalks. The methodology has three steps: (1) mapping the Single Nucleotide Polymorphisms (SNPs) to genes and in turn to pathways using the SNP and gene location information, (2) choosing a genetic model and calculating a patient-specific SNP enrichment score for each pathway using the patient's allele information, and (3) overlaying the pathway enrichment scores on the reference crosstalk map to build patient-specific pathway crosstalk maps.

Figure 3. The distribution of the types of SNP-enriched pathways identified in this study and a comparison to the pathway distribution of the Kyoto Encyclopedia of Genes and Genomes (KEGG). NOTE: Although there are seven KEGG pathway categories, here we only show the six KEGG pathway categories that included identified SNP-enriched pathways in this study.

Figure 4. Pathways found to have significant crosstalk with the AD pathway and corresponding KEGG categories (shown in colored blocks). Specific KEGG pathway types are listed below each category with the number of occurrences in parentheses. NOTE: Although there are seven KEGG pathway categories, here we only show the six KEGG pathway categories that included identified SNP-enriched pathways in this study.

greater number of crosstalks in many pathways, since aging is associated with degeneration in many tissues and raises the risk for other chronic diseases besides dementia.

3.4. Comparison of Model Performances from Shaffer et al. (2013) with Our Model Performance Including SNP-Enriched Features

Table 2. Baseline characteristics of the mild cognitive impairment (MCI) study sample.

Table 3. Performance of support vector machine (SVM) models with baseline clinical parameters.

Table 4. Performance of the Shaffer et al. [52] model with clinical parameters with 97 patients in comparison to our model with 97 and 91 patients.

Table 5. Performance of models with randomized pathway crosstalk features.