Prediction of Parkinson’s Disease Risk Based on Genetic Profile and Established Risk Factors

Background: Parkinson’s disease (PD) is a neurodegenerative disorder, and literature suggests that genetics and lifestyle/environmental factors may play a key role in the triggering of the disease. This study aimed to evaluate the predictive performance of a 12-Single Nucleotide Polymorphisms (SNPs) polygenic risk score (PRS) in combination with already established PD-environmental/lifestyle factors. Methods: Genotypic and lifestyle/environmental data on 235 PD-patients and 464 controls were obtained from a previous study carried out in the Cypriot population. A PRS was calculated for each individual. Univariate logistic-regression analysis was used to assess the association of PRS and each risk factor with PD-status. Stepwise-regression analysis was used to select the best predictive model for PD combining genetic and lifestyle/environmental factors. Results: The 12-SNPs PRS was significantly increased in PD-cases compared to controls. Furthermore, univariate analyses showed that age, head injury, family history, depression, and Body Mass Index (BMI) were significantly associated with PD-status. Stepwise-regression suggested that a model which includes PRS and seven other independent lifestyle/environmental factors is the most predictive of PD in our population. Conclusions: These results suggest an association between both genetic and environmental factors and PD, and highlight the potential for the use of PRS in combination with the classical risk factors for risk prediction of PD.


Introduction
Parkinson's disease (PD) is a progressive neurodegenerative movement disorder and the second most common neurodegenerative disorder worldwide after Alzheimer's disease [1]. Selective loss or death of dopamine-secreting neurons of the substantia nigra and accumulation of Lewy bodies in the spinal cord and brain are key pathological features of the disease [2]. PD is characterised by motor symptoms such as rigidity, bradykinesia, and resting tremor, as well as non-motor symptoms [2]. The prevalence of PD varies by age: it affects ~0.3% of the general population, ~1% of the population over 60 years, and 3.5% of the population over 85 years [1,3]. To date, the etiology of PD is still unclear [2]. However, PD is considered a multifactorial disorder, and previous epidemiological studies suggest that both genetic and environmental factors play an essential role in the triggering of the disease [2][3][4].
Over the years, more than 90 genetic variants associated with sporadic PD, its progression, and age at onset have been identified through multiple Genome Wide Association Studies (GWAS) [5]. Despite the large number of genetic studies and reported variants in PD, some of the variants have a small effect on disease risk, and an important proportion of the overall genetic contribution to PD risk is not clearly understood [1]. In addition, previous studies reported that environmental factors, including lifestyle factors, may also play a role in the development of the disease. Environmental factors such as diet (e.g., dairy products, soft drinks, and red meat consumption), depression, exposure to pesticides, rural living, and head injury were positively associated with PD, whereas smoking, alcohol, coffee consumption, and physical activity were inversely associated with the disease [3,6,7,8].
Although the published genetic and epidemiological studies explain a substantial part of the genetic and phenotypic variability and etiology of PD, a large fraction of the genetic and environmental contribution remains to be studied [1]. Genetic and environmental factors may interact with each other in a complex manner, increasing the risk of developing PD [2][3][4]. Several studies indicated that polygenic risk scores (PRS), which combine the effect of multiple genetic variants, can capture the overall genetic background of complex traits and diseases, including PD [1,4,9,10]. Furthermore, combining the genetic background with environmental findings may provide a better understanding of the disease. Although a small GWAS was previously performed in the Greek population, and some of the investigated SNPs (rs6599389 and rs356220) had the same OR direction as in our study in the Greek-Cypriot population, a PRS was not calculated [11]. Therefore, the aim of this pilot study was to evaluate the predictive performance of a PRS consisting of 12 SNPs that have been previously associated with PD in GWAS, and to test the combined effect of the PRS and already established risk factors on PD risk. This is a proof-of-concept study that aims to explore our population. Although the sample size of our study is relatively small, the study provides important results and will initiate the investigation of combined environmental and genetic factors of PD in the Greek-Cypriot population. Future studies with a larger sample size and more SNPs could refine future models, which could then be evaluated for their clinical utility.

Dataset
This investigation used genotypic and demographic/lifestyle data obtained from a previous case-control study carried out in the Greek-Cypriot population (235 unrelated PD patients and 464 age- and sex-matched unrelated healthy controls) [3]. All patients were recruited into the study after clinical diagnosis of PD. Demographic/lifestyle data from PD cases and controls were collected through a personal interview. The study was approved by the Cyprus National Bioethics Committee (EEBK/EΠ/2014/29), and all subjects gave written informed consent in accordance with the 1964 Declaration of Helsinki. Genotypic data for 12 (rs12185268, rs10513789, rs6599389, rs356220, rs7617877, rs17115100, rs13312, rs1801582, rs4837628, rs823118, rs356182, rs17649553) of the 13 SNPs genotyped by Georgiou et al. [3] were used for the current study. These SNPs were reported to be associated with PD in previous GWAS or interaction studies [12][13][14][15][16]. One SNP was excluded from our study due to a discrepancy between the genotypes. Detailed information on the study's methodology and original SNP selection can be found in the original publication [3]. Based on the previous study, the selected SNPs have been associated with PD (p ≤ 5 × 10⁻⁸) in at least one of the five large GWAS meta-analysis studies for PD in the European population and had OR < 0.81 or OR > 1.23 and MAF > 5%.

PRS Calculation
A weighted PRS based on 12 SNPs previously related to PD [3] (Table 1) was calculated for each individual. The PRS was calculated following the approach previously described in Mavaddat et al. [17], using the formula $\mathrm{PRS} = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \dots + \beta_n x_n$, where $\beta_k$ is the log(OR) of the $k$-th SNP from previously published studies, $x_k$ is the dosage (0, 1, or 2) of the $k$-th SNP for each individual in our dataset, and $n$ is the number of SNPs (here, 12).
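As an illustration of the weighted sum above, the following minimal Python sketch computes a PRS for one individual. The function name `weighted_prs` and the three example odds ratios and dosages are hypothetical and chosen only for illustration; they are not the weights or genotypes used in this study, whose analyses were run in R.

```python
import math


def weighted_prs(dosages, odds_ratios):
    """Weighted PRS: sum over SNPs of log(OR_k) * dosage_k.

    dosages      -- per-SNP allele counts (0, 1 or 2) for one individual
    odds_ratios  -- per-SNP odds ratios taken from published studies
    """
    if len(dosages) != len(odds_ratios):
        raise ValueError("one dosage per SNP is required")
    for d in dosages:
        if d not in (0, 1, 2):
            raise ValueError("dosage must be 0, 1 or 2")
    return sum(math.log(o) * d for o, d in zip(odds_ratios, dosages))


# Hypothetical three-SNP example (illustrative values, not the study's weights).
example_ors = [1.25, 0.80, 1.30]
example_dosages = [2, 1, 0]
example_score = weighted_prs(example_dosages, example_ors)
print(round(example_score, 4))
```

Because the weights are log odds ratios, a protective allele (OR below 1) contributes a negative term, so the score can decrease as well as increase with each allele carried.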

Imputation
Missing data were completed using imputation packages in R. Briefly, genomic prediction for the missing genotypic data was carried out using the Ridge Regression Best Linear Unbiased Predictor (rrBLUP), implemented in the R package rrBLUP [18], while demographic data were imputed using the Multivariate Imputation by Chained Equations (MICE) package [19].

Statistical Analysis
All analyses were carried out using packages for the R software (version 3.6.3). Univariate logistic regression analysis was applied to assess the association between each previously established PD risk factor and PD status. The Area Under the Curve (AUC) was also calculated to measure the ability of each risk factor to distinguish between PD patients and controls. In addition, logistic regression analysis was applied to assess the relationship between the PRS and PD status. A p-value < 0.05 was considered statistically significant. Odds ratios (OR) and 95% confidence intervals (CI) were calculated.
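The two summary measures named above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the study's R code: `auc` uses the rank-based (Mann-Whitney) definition of the AUC, and `odds_ratio_wald` computes the cross-product OR with a Wald interval from a 2x2 table, which for a single binary risk factor gives the same point estimate as univariate logistic regression. Both function names are hypothetical, and interval methods used by specific R functions may differ from the Wald interval shown here.

```python
import math


def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control.

    Ties count as 0.5, which matches the Mann-Whitney formulation of the AUC.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def odds_ratio_wald(a, b, c, d, z=1.96):
    """OR and Wald 95% CI for a 2x2 table with non-zero cells.

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An AUC of 0.5 corresponds to no discrimination between cases and controls, and a confidence interval that excludes 1 corresponds to a p-value below 0.05 for the same Wald test.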
The continuous variables PRS and BMI were stratified into quartiles, and the OR of each quartile was assessed using logistic regression, with the 25-50% range (second quartile) and the 20-24.94 range (normal weight) as references, respectively. PRS was standardised based on the control values.
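A minimal Python sketch of the standardisation and quartile assignment described above follows. It assumes that standardising "based on the control values" means subtracting the control mean and dividing by the control standard deviation, and that quartile cut-points are derived from the controls; the source does not spell out either detail. The helper names and the eight control scores are hypothetical.

```python
import statistics


def standardise_by_controls(scores, control_scores):
    """Z-score every value using the mean and sample SD of the controls only."""
    mu = statistics.mean(control_scores)
    sd = statistics.stdev(control_scores)
    return [(s - mu) / sd for s in scores]


def quartile_index(value, cut_points):
    """Return 1..4 for the quartile holding value, given three ascending cut-points."""
    return 1 + sum(value > c for c in cut_points)


# Hypothetical control PRS values, for illustration only.
controls = [0.1, 0.4, 0.5, 0.7, 0.9, 1.0, 1.2, 1.5]
z_controls = standardise_by_controls(controls, controls)

# Three cut-points (25th, 50th, 75th percentiles) of the standardised controls.
cuts = statistics.quantiles(z_controls, n=4)
quartiles = [quartile_index(z, cuts) for z in z_controls]
```

With the second quartile as the reference category, each of the other quartiles is then compared against it in the logistic regression.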
Stepwise-regression analysis was used to select the best performing variables in the predictive models, using all the variables from the univariate logistic regression analyses.
Stepwise regression was performed in R following three approaches: forward selection, backward elimination, and bidirectional elimination. AUC and goodness-of-fit (GOF) were calculated for all the different models. PRS was adjusted for all covariates and possible confounders.
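The forward-selection approach can be sketched generically as a greedy search. The Python sketch below is an assumption-laden illustration: the selection criterion is not stated in the text, so it is passed in as a callable (for example, a function returning the AIC of a logistic model fitted on the given variables), and the toy criterion and its gain values are entirely hypothetical.

```python
def forward_select(candidates, criterion):
    """Greedy forward selection.

    Starting from the empty model, repeatedly add the candidate variable that
    lowers criterion(selected_variables) the most; stop when no addition
    lowers it. Lower criterion values are assumed to be better (as with AIC).
    """
    selected = []
    best = criterion(tuple(selected))
    remaining = list(candidates)
    while remaining:
        scored = [(criterion(tuple(selected + [v])), v) for v in remaining]
        score, variable = min(scored)
        if score >= best:
            break
        selected.append(variable)
        remaining.remove(variable)
        best = score
    return selected, best


# Toy criterion: a baseline of 100, minus a hypothetical gain per variable,
# plus a penalty of 2 per variable included (mimicking an AIC-style penalty).
toy_gain = {"age": 10, "prs": 5, "noise": 1}


def toy_criterion(variables):
    return 100 - sum(toy_gain[v] for v in variables) + 2 * len(variables)


chosen, final_score = forward_select(["prs", "age", "noise"], toy_criterion)
```

Backward elimination runs the same idea in reverse, starting from the full model and dropping variables, and the bidirectional approach allows both moves at each step.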
Cases and controls were stratified into deciles based on the predicted probabilities of the multivariate model, which includes the significant variables/risk factors. The ORs of the extreme deciles were evaluated using logistic regression with the 40-50% range as the reference. Plots were produced using the ggplot2 package in R.
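The decile analysis described above can be illustrated with a short Python sketch. It assumes equal-sized deciles assigned by rank of predicted probability and computes unadjusted cross-product ORs against the fifth decile (the 40-50% range); this corresponds to the point estimates of a logistic regression with decile as a categorical predictor, but it is a simplified stand-in rather than the study's R code. The function names are hypothetical.

```python
def decile_labels(probabilities):
    """Assign each predicted probability to a decile 1..10 by rank."""
    n = len(probabilities)
    order = sorted(range(n), key=lambda i: probabilities[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 10 // n + 1
    return labels


def decile_odds_ratios(probabilities, is_case, reference=5):
    """Unadjusted OR of each decile relative to the reference decile.

    Deciles with no controls are skipped because their odds are undefined.
    """
    labels = decile_labels(probabilities)
    cases = {d: 0 for d in range(1, 11)}
    controls = {d: 0 for d in range(1, 11)}
    for label, case in zip(labels, is_case):
        if case:
            cases[label] += 1
        else:
            controls[label] += 1
    if cases[reference] == 0 or controls[reference] == 0:
        raise ValueError("reference decile needs both cases and controls")
    ref_odds = cases[reference] / controls[reference]
    return {
        d: (cases[d] / controls[d]) / ref_odds
        for d in range(1, 11)
        if controls[d] > 0
    }
```

An OR well below 1 in the lowest deciles and well above 1 in the highest deciles indicates that the model separates low-risk from high-risk individuals.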

Study Participants
This study included the genotypic and demographic data of 235 PD patients with a mean ± standard deviation (SD) age of 66.5 ± 10.5 years and 464 age- and sex-matched controls with a mean ± SD age of 65 ± 10.7 years.

Parkinson's Disease Risk
In this study, we assessed the predictive performance of a PRS consisting of 12 SNPs and tested the combined effect of the PRS and already established PD risk factors. All demographic data before and after the imputation are shown in Table 2 and Table S1, respectively. Univariate logistic regression analyses showed that four risk factors were positively associated with PD risk: age (p = 1.25 × 10⁻⁵; OR: 1.04; 95% CI: 1.02-1.05), head injury (p = 3.21 × 10⁻³; OR: 1.67; 95% CI: 1.19-2.36), family history (p = 4.55 × 10⁻¹⁴; OR: 5.27; 95% CI: 3.44-8.19), and depression (p ≤ 2.00 × 10⁻¹⁶; OR: 7.47; 95% CI: 4.98-11.37). BMI at enrollment was inversely associated with PD risk (p = 4.03 × 10⁻⁶; OR: 0.91; 95% CI: 0.87-0.94) (Table 3 and Table S2). Logistic regression analysis showed that the 12-SNP PRS was significantly associated with PD (p = 1.87 × 10⁻²; OR: 1.39; 95% CI: 1.06-1.84) (Table 3, Figure 1, Table S2 and Figure S1).
The percentage of cases and controls in PRS and BMI quartiles and their ORs were also assessed. Tables 4 and 5 show the results before the imputation, while Table S3 and Table S4 show the results after the imputation. PRS quartile analysis showed that participants in the lowest quartile exhibited a non-significantly lower risk of PD (OR: 0.74; 95% CI: 0.46-1.19; p = 2.21 × 10⁻¹) compared to the reference quartile. In contrast, participants in the highest quartile exhibited a non-significantly increased risk of PD (OR: 1.14; 95% CI: 0.74-1.76; p = 5.44 × 10⁻¹). BMI quartile analysis showed that obese (24.95-29.94) participants have an approximately 2.5 times lower risk of developing PD compared to participants with normal BMI (OR: 0.42; 95% CI: 0.25-0.69; p = 7.64 × 10⁻⁴), whereas underweight (<20) participants are 3 times more likely to develop PD compared to participants with normal BMI (OR: 2.99; 95% CI: 1.21-7.88; p = 2.07 × 10⁻²). These trends are similar in the imputed data as well.
Stepwise regression analysis was carried out using forward selection, backward elimination, and bidirectional elimination approaches. All approaches resulted in the same model and suggested that the best predictive model for PD in our study includes eight independent variables: PRS, age, gender, head injury, family history, depression, smoking, and BMI (Table 6 and Table S5, before and after imputation, respectively). All analyses of imputed and non-imputed data yielded similar results.
Table 6. Stepwise-regression analysis by forward selection, backward elimination, and bidirectional elimination approaches using data before imputation.
The distribution of cases and controls in deciles of the final multivariate model was investigated (Table 7 and Figure 2), and the OR by decile was also calculated (Figure 3) on the non-imputed data. The OR of the first decile was 0.14 (95% CI: 0.03-0.47) with a p-value of 3.66 × 10⁻³. This decile includes 1.9% of cases versus 16.2% of controls. This trend was also observed in the second decile, with 14.2% of controls and 2.6% of cases (OR: 0.21; 95% CI: 0.06-0.66; p = 1.14 × 10⁻²). Interestingly, in the last two deciles (9th and 10th) the ORs were 4.98 (95% CI: 2.20-11.84) and 12 (95% CI: 4.76-33.08), respectively. In addition, the percentages of cases and controls were reversed, with the proportion of cases being higher than the proportion of controls. These analyses were also carried out on the imputed data, and the results are shown in Table S6 and Figures S2 and S3. These results show that the multivariate model enables the stratification of the population according to the risk of developing PD.

Discussion
A large number of studies suggest that a combination of genetic and environmental/lifestyle factors plays a key role in the triggering of PD [20]. Although several GWAS have been carried out on PD, the 90 genetic variants reported to be associated with the disease in the latest GWAS explain only approximately 16% of the PD burden [4]. In addition, epidemiological studies reported various environmental/lifestyle factors that are either positively or negatively associated with the development of the disease [4]. The incorporation of these genetic and non-genetic factors in predictive models may help in the identification of individuals with a higher risk of developing PD. The aim of this study was to investigate the predictive performance of a PRS consisting of 12 SNPs that have been previously associated with PD, and also to assess the association between already established environmental PD risk factors and PD risk. Although the largest PD GWAS meta-analysis has reported 90 risk loci associated with PD risk in the European population [16], in our study we assessed only 12 SNPs that were previously selected and investigated in our population. We carried out an evaluation/replication study because the number of Greek-Cypriot patients with PD was relatively small, and thus sufficient statistical power for a discovery study could not be reached.
In this study, we assessed several risk factors that were previously reported to be associated with the development of PD and found that age, head injury, family history, and depression were positively associated with PD, while BMI was inversely associated with the disease (Table 3 and Table S2). These results are consistent with the results of previous studies: head injury (OR: 1.55; 95% CI: 1.33-1.81) [21], family history (RR: 4.45; 95% CI: 3.39-5.83) [22], and depression (OR: 15.1; 95% CI: 5.64-40.78) [8]. In addition, Chen et al. [23] suggested in a meta-analysis that being overweight may decrease the risk of developing PD. In our BMI analysis, being obese was significantly associated with a decreased risk of PD, while being underweight was significantly associated with an increased risk of PD, compared with individuals with normal BMI. These results suggest that BMI might be associated with the development of PD. Our findings are in concordance with Noyce et al. [24], but these associations might also be a consequence of PD itself, e.g., due to poor nutrition.
Furthermore, a PRS consisting of 12 SNPs that were genotyped in a previous study by Georgiou et al. [3] was also calculated. Logistic regression analysis showed that the 12-SNP PRS was significantly associated with PD status (OR: 1.39; 95% CI: 1.06-1.84). In addition, division of the PRS into quartiles suggested that individuals with a higher PRS have a higher risk of developing PD, while individuals with a lower PRS have a lower risk, although these quartile differences were not statistically significant. A recent study by Jacobs et al. demonstrated that individuals in the highest PRS decile had about 3.5 times higher risk of developing PD compared to individuals in the lowest PRS decile [4]. In a previous study, Escott-Price et al. reported that PRS is correlated with age of onset in PD, as the average PRS was significantly higher in patients with early onset compared to late onset [9]. In another study, Ibanez et al. replicated these results using GWAS loci from Nalls et al. [16] and suggested that genetics plays an essential role both in PD risk and in its age of onset [1]. On the other hand, Butcher et al. used common variants that are associated with PD and showed that the PRS of those variants is not significantly associated with PD risk [25]. In a more recent study, Nalls et al. performed a two-stage PRS analysis using ~90 and ~2000 variants from the NeuroX-dbGaP dataset and showed that the AUC of the PRS with the larger number of variants was better; based on their calculations, these PRSs explain ~16% and 26% of PD heritability [26]. Similar to our results, individuals with PRS values in the highest quartile had a higher risk of developing PD, while individuals with PRS values in the lowest quartile had a lower risk compared to the reference range [26]. Furthermore, Paul et al. used 23 GWAS SNPs and suggested an association between the PRS and faster cognitive decline and progression of motor symptoms [27]. In addition, Iwaki et al. [28] showed that PRS may modify the penetrance and age of onset in LRRK2 p.G2019S carriers.
To date, the largest Genome-Wide Polygenic Risk Score (GPRS) analysis was carried out by Han et al. [29] using data from ~80,000 individuals and 6.2 million variants, and showed that the GPRS is associated with age of onset, PD risk, and UPDRS scores.
In this study, we performed stepwise-regression analysis using the three different approaches in order to estimate the best predictive model for PD in our population. All three approaches suggested that the best predictive model includes PRS as well as seven additional independent factors (age, gender, head injury, family history, depression, smoking, and BMI). A similarly designed study carried out by Jacobs et al. [4] using data from the UK Biobank demonstrated that family history, non-smoking, low alcohol consumption, sleepiness, depression, family history of dementia, early menarche, and epilepsy are strongly associated with PD. No essential differences were observed when all the significant risk factors were combined. Interestingly, model performance was moderately improved with the inclusion of the PRS in the PREDICT-PD algorithm [4].
We also investigated the distribution of cases and controls using the final multivariate model. In the lowest deciles, the percentage of controls was large and the percentage of cases was small. In the highest deciles, by contrast, the percentages were reversed, with the proportion of cases being greater than the proportion of controls. These results highlight the capability of this multivariate model to stratify the population according to the risk of developing PD.
The main limitations of our study are the small cohort size and the small number of SNPs used for the PRS calculation, which lead to lower statistical power and a minimal ability of the genetic factors to differentiate cases from controls (e.g., AUC of PRS: 0.55). Selection bias might also be present, as a small single population was sampled and some members of the population are more likely to be included than others. In addition, as several data were collected through Yes/No answers from the participants, interview bias might also be present. However, this study is important as it is the first pilot study that aims to evaluate the predictive performance of a PRS and design a predictive model for PD in the Greek-Cypriot population, a relatively small population. After appropriate evaluation, a future model with more SNPs and individuals could be used as a complementary diagnostic method. In addition, this study initiated the investigation of combined genetic and environmental factors of PD in our population.

Conclusions
In conclusion, these results suggest an association between five environmental/lifestyle factors as well as a 12-SNPs PRS and risk of PD. In addition, the model, which combines eight independent factors, could be useful for the calculation of a risk score predictive of PD in the Greek-Cypriot population.
Therefore, this may facilitate a better understanding of gene-gene as well as gene-environment interactions in the development of PD. Further investigation with a larger cohort and a PRS with additional variants may increase the statistical power and confirm that the combination of these factors could potentially be used for predictive testing.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/genes12081278/s1, Table S1: Demographic characteristics and lifestyle/environmental exposure risk factors of Cypriot PD cases and controls after imputation; Table S2: Univariate logistic regression analysis of lifestyle/environmental risk factors after imputation; Table S3: ORs, 95% CI and distribution of cases and controls in PRS quartiles after imputation; Table S4: ORs, 95% CI and distribution of cases and controls in BMI quartiles after imputation; Table S5: Stepwise-regression analysis by forward selection, backward elimination and bidirectional elimination approaches using the imputed data; Table S6: ORs, 95% CI and distribution of cases and controls in deciles of the final multivariate model after imputation; Figure S1: PRS distribution between PD cases and controls after imputation. This plot shows the probability density versus PRS in cases and controls; Figure S2: Cases and controls distribution in deciles using the multivariable model after imputation. The distributions of cases and controls are shown in blue and orange, respectively; Figure S3: OR by decile of the multivariable model after imputation.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.