1. Introduction
Parkinson’s disease (PD) is a progressive neurodegenerative movement disorder and the second most common neurodegenerative disorder worldwide after Alzheimer’s disease [1]. Selective loss or death of dopamine-secreting neurons of the substantia nigra and accumulation of Lewy bodies in the spinal cord and brain are key pathological features of the disease [2]. PD is characterised by motor symptoms such as rigidity, bradykinesia, and resting tremor, as well as non-motor symptoms [2]. The prevalence of PD varies by age: it affects ~0.3% of the general population, ~1% of the population over 60 years, and 3.5% of the population over 85 years [1,3]. To date, the etiology of PD remains unclear [2]. However, PD is considered a multifactorial disorder, and previous epidemiological studies suggest that both genetic and environmental factors play an essential role in triggering the disease [2,3,4].
Over the years, more than 90 genetic variants associated with sporadic PD, its progression, and age at onset have been identified through multiple genome-wide association studies (GWAS) [5]. Despite the large number of genetic studies and reported variants in PD, some of the variants have a small effect on disease risk, and an important proportion of the overall genetic contribution to PD risk is not clearly understood [1]. In addition, previous studies reported that environmental factors, including lifestyle, may also play a role in the development of the disease. Factors such as diet (e.g., consumption of dairy products, soft drinks, and red meat), depression, exposure to pesticides, rural living, and head injury were positively associated with PD, whereas smoking, alcohol, coffee consumption, and physical activity were inversely associated with the disease [3,6,7,8].
Although published genetic and epidemiological studies explain a substantial part of the genetic and phenotypic variability and etiology of PD, a large fraction of the genetic and environmental contribution remains to be studied [1]. Genetic and environmental factors may interact with each other in a complex manner, increasing the risk of developing PD [2,3,4]. Several studies indicated that polygenic risk scores (PRS), which combine the effects of multiple genetic variants, can capture the overall genetic background of complex traits and diseases, including PD [1,4,9,10]. Furthermore, combining the genetic background with environmental findings may provide a better understanding of the disease. Although a small GWAS was previously performed in the Greek population, and some of the investigated SNPs (rs6599389 and rs356220) had the same OR direction as in our study in the Greek-Cypriot population, a PRS was not calculated [11]. Therefore, the aim of this pilot study was to evaluate the predictive performance of a PRS consisting of 12 SNPs that have previously been associated with PD in GWAS, and to test the combined effect of the PRS and already established risk factors on PD risk. This is a proof-of-concept study that aims to explore our population. Although the sample size of our study is relatively small, it provides important results and will initiate the investigation of combined environmental and genetic factors of PD in the Greek-Cypriot population. Future studies with a larger sample size and more SNPs could refine future models, which could then be evaluated for their clinical utility.
2. Materials and Methods
2.1. Dataset
This investigation used genotypic and demographic/lifestyle data obtained from a previous case-control study carried out in the Greek-Cypriot population (235 unrelated PD patients and 464 age- and sex-matched unrelated healthy controls) [3]. All patients were recruited into the study after clinical diagnosis of PD. Demographic/lifestyle data from PD cases and controls were collected through a personal interview. The study was approved by the Cyprus National Bioethics Committee (ΕΕΒΚ/ΕΠ/2014/29) and all subjects gave written informed consent in accordance with the 1964 Declaration of Helsinki. Genotypic data for 12 (rs12185268, rs10513789, rs6599389, rs356220, rs7617877, rs17115100, rs13312, rs1801582, rs4837628, rs823118, rs356182, rs17649553) of the 13 SNPs genotyped by Georgiou et al. [3] were used for the current study. These SNPs were reported to be associated with PD in previous GWAS or interaction studies [12,13,14,15,16]. One SNP was excluded from our study due to a discrepancy between the genotypes. Detailed information on the study’s methodology and original SNP selection can be found in the original publication [3]. Based on the previous study, the selected SNPs have been associated with PD ($p \leq 5 \times 10^{-8}$) in at least one of the five large GWAS meta-analyses for PD in the European population and had 0.81 > OR > 1.23 and MAF > 5%.
2.2. PRS Calculation
A weighted PRS based on 12 SNPs previously associated with PD [3] (Table 1) was calculated for each individual. The PRS was calculated following the approach previously described in Mavaddat et al. [17], using the formula

$$\mathrm{PRS} = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \dots + \beta_n x_n,$$

where $\beta_k$ is the log(OR) of the $k$th SNP from previously published studies and $x_k$ is the dosage (0, 1, 2) of the $k$th SNP for each individual in our dataset.
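As a minimal sketch of the weighted sum defined by the formula, the following Python snippet computes a PRS for one individual. The function name `weighted_prs` and the odds ratios in `EXAMPLE_ODDS_RATIOS` are illustrative assumptions; the values are placeholders and are not the published weights from Table 1.

```python
import math

# Illustrative per-SNP odds ratios. These are placeholder values,
# NOT the published weights used in the study.
EXAMPLE_ODDS_RATIOS = {
    "rs356182": 1.30,
    "rs12185268": 0.80,
    "rs823118": 1.10,
}


def weighted_prs(dosages, odds_ratios):
    """Return sum_k log(OR_k) * x_k for one individual.

    dosages maps SNP id -> effect-allele dosage (0, 1 or 2);
    odds_ratios maps SNP id -> odds ratio reported for that allele.
    """
    missing = sorted(set(odds_ratios) - set(dosages))
    if missing:
        raise ValueError(f"missing dosages for: {missing}")
    return sum(math.log(odds_ratios[snp]) * dosages[snp] for snp in odds_ratios)


# Hypothetical individual: two copies, one copy and zero copies.
individual = {"rs356182": 2, "rs12185268": 1, "rs823118": 0}
score = weighted_prs(individual, EXAMPLE_ODDS_RATIOS)
```

Because the weights are log odds ratios, a protective allele (OR < 1) contributes a negative term and a SNP with dosage 0 contributes nothing.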
2.3. Selected Demographic Data
Fifteen demographic variables were selected and assessed for this study: age (years), gender (female/male), outdoor work (yes/no), pesticides or toxic substances (yes/no), pesticides (yes/no), well water drinking (yes/no), head injury (yes/no), family history of PD (yes/no), hypertension (yes/no), statin use (yes/no), depression (yes/no), smoking, current or ever (yes/no), physical activity (yes/no), Body Mass Index (BMI) (kg/m²), and coffee consumption (yes/no).
2.4. Imputation
Missing data were completed by imputation using R packages. Briefly, genomic prediction of missing genotypes was carried out using the Ridge Regression Best Linear Unbiased Predictor (rrBLUP) implemented in the R package rrBLUP [18], while imputation of demographic data was performed using the Multivariate Imputation by Chained Equations (MICE) package [19].
2.5. Statistical Analysis
Analyses were performed using a series of packages in R (version 3.6.3). Univariate logistic regression analysis was applied to assess the association between each previously established PD risk factor and PD status. The area under the curve (AUC) was also calculated to measure the ability of a risk factor to distinguish between PD patients and controls. In addition, logistic regression analysis was applied to assess the relationship between the PRS and PD status. A p-value < 0.05 was considered statistically significant. Odds ratios (OR) and 95% confidence intervals (CI) were calculated.
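The two summary measures named in this paragraph can be illustrated with a short Python sketch. It assumes a Wald-type interval for the OR (the text does not state which interval was used) and computes the AUC as the probability that a randomly chosen case has a higher value than a randomly chosen control. The function names `odds_ratio_with_ci` and `auc` are hypothetical and are not taken from the R packages used in the study.

```python
import math


def odds_ratio_with_ci(beta, standard_error, z=1.96):
    """Convert a logistic regression coefficient to (OR, lower, upper).

    Assumes a Wald interval, exp(beta +/- z * SE); z = 1.96 gives ~95%.
    """
    return (
        math.exp(beta),
        math.exp(beta - z * standard_error),
        math.exp(beta + z * standard_error),
    )


def auc(case_values, control_values):
    """Fraction of case/control pairs ranked correctly (ties count 0.5)."""
    if not case_values or not control_values:
        raise ValueError("both groups must be non-empty")
    correct = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                correct += 1.0
            elif case == control:
                correct += 0.5
    return correct / (len(case_values) * len(control_values))
```

An AUC of 0.5 corresponds to no discrimination between cases and controls, and an AUC of 1.0 to perfect separation. The pairwise loop is quadratic in sample size, which is acceptable for a cohort of a few hundred individuals.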
The continuous variables PRS and BMI were stratified into quartiles, and the OR of each quartile was assessed using logistic regression, with the 25–50% range (second quartile) and the 20–24.94 kg/m² range (normal weight) as the respective references. PRS was standardised based on the control values.
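A minimal Python sketch of the PRS standardisation and quartile assignment is given here. It assumes that standardisation means subtracting the control mean and dividing by the control standard deviation, and that quartile cut points are taken from a reference group supplied by the caller; neither detail is specified in the text. The function names are hypothetical.

```python
import bisect
import statistics


def standardise_on_controls(scores, control_scores):
    """Z-score every value using the mean and sample SD of the controls."""
    mean = statistics.mean(control_scores)
    sd = statistics.stdev(control_scores)
    return [(score - mean) / sd for score in scores]


def quartile_labels(scores, reference_scores):
    """Assign each score to quartile 1..4 using reference cut points.

    A score exactly equal to a cut point is placed in the upper quartile.
    """
    cut_points = statistics.quantiles(reference_scores, n=4)
    return [bisect.bisect_right(cut_points, score) + 1 for score in scores]


# Toy values for illustration only.
controls = [1, 2, 3, 4, 5]
z_scores = standardise_on_controls([1, 3, 5], controls)
labels = quartile_labels([1, 2, 3.5, 5], controls)
```

With the quartile labels in hand, the second quartile can be set as the reference level of a categorical predictor in the logistic regression.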
Stepwise regression analysis was used to select the best-performing variables for the predictive models, using all the variables from the univariate logistic regression analyses. Stepwise regression was performed in R following three approaches: forward selection, backward elimination, and bidirectional elimination. AUC and goodness-of-fit (GOF) were calculated for all the different models. PRS was adjusted for all covariates and possible confounders.
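To illustrate the first of the three approaches, the following Python sketch implements generic forward selection. It assumes a caller-supplied `criterion` function that returns a score to be minimised for a given variable set (for example an information criterion of a fitted logistic model); the selection criterion used in the study is not stated, and the toy criterion below is purely hypothetical.

```python
def forward_selection(candidates, criterion):
    """Greedily add the variable that most lowers the criterion.

    Stops when no remaining variable improves the current score.
    Returns (selected_variables, final_score).
    """
    selected = []
    best_score = criterion(selected)
    remaining = list(candidates)
    while remaining:
        scored = [(criterion(selected + [var]), var) for var in remaining]
        score, var = min(scored)
        if score >= best_score:
            break
        selected.append(var)
        remaining.remove(var)
        best_score = score
    return selected, best_score


# Hypothetical criterion: each variable has a fixed "gain" and every
# added variable costs a penalty of 2. Not derived from the study data.
TOY_GAINS = {"age": 10, "depression": 6, "coffee": 1}


def toy_criterion(variables):
    return 100 - sum(TOY_GAINS[v] for v in variables) + 2 * len(variables)


chosen, final_score = forward_selection(TOY_GAINS, toy_criterion)
```

Backward elimination mirrors this loop by starting from the full variable set and removing one variable at a time, and the bidirectional approach considers both an addition and a removal at each step.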
Cases and controls were stratified into deciles based on the predicted probabilities of the multivariate model, which included the significant variables/risk factors. The ORs of the extreme deciles were evaluated using logistic regression, with the 40–50% range as the reference. Plots were produced using the ggplot2 package in R.
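The decile stratification can be sketched in Python as follows. The sketch assumes rank-based deciles over the pooled sample of cases and controls, which the text does not specify, and the function name `decile_counts` is hypothetical.

```python
def decile_counts(probabilities, is_case):
    """Count controls and cases in each decile of predicted probability.

    Deciles are rank-based over the pooled sample (an assumption).
    Returns ten (n_controls, n_cases) tuples, lowest decile first.
    """
    if len(probabilities) != len(is_case):
        raise ValueError("inputs must have the same length")
    n = len(probabilities)
    order = sorted(range(n), key=lambda i: probabilities[i])
    counts = [[0, 0] for _ in range(10)]
    for rank, index in enumerate(order):
        decile = rank * 10 // n
        counts[decile][1 if is_case[index] else 0] += 1
    return [tuple(pair) for pair in counts]


# Toy data: 20 individuals, the upper half labelled as cases.
toy_probabilities = [i / 20 for i in range(20)]
toy_status = [p >= 0.5 for p in toy_probabilities]
toy_counts = decile_counts(toy_probabilities, toy_status)
```

In this toy example the model separates the groups perfectly, so the lower five deciles contain only controls and the upper five only cases; real data would show a gradual shift in the case proportion across deciles.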
4. Discussion
A large number of studies suggest that a combination of genetic and environmental/lifestyle factors plays a key role in triggering PD [20]. Although several GWAS have been carried out on PD, the 90 genetic variants reported to be associated with the disease in the latest GWAS explain only approximately 16% of the PD burden [4]. In addition, epidemiological studies have reported various environmental/lifestyle factors that are either positively or negatively associated with the development of the disease [4]. Incorporating these genetic and non-genetic factors into predictive models may help identify individuals at higher risk of developing PD. The aim of this study was to investigate the predictive performance of a PRS consisting of 12 SNPs that have previously been associated with PD, and to assess the association between already established environmental PD risk factors and PD risk. Although the largest PD GWAS meta-analysis reported 90 risk loci associated with PD risk in the European population [16], in our study we assessed only the 12 SNPs that were previously selected and investigated in our population. We carried out an evaluation/replication study because the number of Greek-Cypriot patients with PD was relatively small, and thus sufficient statistical power for a discovery study could not be reached.
In this study, we assessed risk factors previously reported to be associated with the development of PD and found that age, head injury, family history, and depression were positively associated with PD, while BMI was inversely associated with the disease (Table 3 and Table S2). These results are consistent with those of previous studies (head injury (OR: 1.55; 95% CI: 1.33–1.81) [21]; family history (RR: 4.45; 95% CI: 3.39–5.83) [22]; depression (OR: 15.1; 95% CI: 5.64–40.78) [8]). In addition, Chen et al. [23] suggested in a meta-analysis that being overweight may decrease the risk of developing PD. In our BMI analysis, being obese was significantly associated with a decreased risk of PD, while being underweight was significantly associated with an increased risk of PD, compared with individuals with normal BMI. These results suggest that BMI might be associated with the development of PD. Our findings are in concordance with Noyce et al. [24], although these associations might also be a consequence of PD itself, e.g., due to poor nutrition.
Furthermore, a PRS consisting of 12 SNPs that were genotyped in a previous study by Georgiou et al. [3] was calculated. Logistic regression analysis showed that the 12-SNP PRS was significantly associated with PD status (OR: 1.39; 95% CI: 1.06–1.84). In addition, dividing the PRS into quartiles highlighted that individuals with a higher PRS have a higher risk of developing PD, while individuals with a lower PRS have a lower risk. A recent study by Jacobs et al. demonstrated that individuals in the highest PRS decile had about 3.5 times higher risk of developing PD compared with individuals in the lowest PRS decile [4]. In a previous study, Escott-Price et al. [9] reported that PRS is correlated with age of onset in PD, as the average PRS was significantly higher in patients with early onset compared with late onset [9]. In another study, Ibanez et al. replicated the results using GWAS loci from Nalls et al. [16] and suggested that genetics plays an essential role both in PD risk and in its age of onset [1]. On the other hand, Butcher et al. used common variants associated with PD and showed that the PRS of those variants is not significantly associated with PD risk [25]. In a more recent study, Nalls et al. performed a two-stage PRS analysis using ~90 and ~2000 variants from the NeuroX-dbGaP dataset and showed that the AUC of the PRS with the larger number of variants was better; based on their calculations, these PRSs explain ~16% and 26% of PD heritability [26]. Similar to our results, individuals with PRS values in the highest quartile had a higher risk of developing PD, while individuals with PRS values in the lowest quartile had a lower risk compared with the reference range [26]. Furthermore, Paul et al. used 23 GWAS SNPs and suggested an association between the PRS and faster cognitive decline and progression of motor symptoms [27]. In addition, Iwaki et al. [28] showed that PRS may modify the penetrance and age of onset in LRRK2 p.G2019S carriers. To date, the largest genome-wide polygenic risk score (GPRS) analysis was carried out by Han et al. [29], who used data from ~80,000 individuals and 6.2 million variants and showed that the GPRS is associated with age of onset, PD risk, and UPDRS scores.
In this study, we performed stepwise regression analysis using the three different approaches in order to identify the best predictive model for PD in our population. All three approaches suggested that the best predictive model includes the PRS as well as seven additional independent factors (age, gender, head injury, family history, depression, smoking, and BMI). A similarly designed study carried out by Jacobs et al. [4] using data from the UK Biobank demonstrated that family history, non-smoking, low alcohol consumption, sleepiness, depression, family history of dementia, early menarche, and epilepsy are strongly associated with PD. No essential differences were observed when all the significant risk factors were combined. Interestingly, model performance was moderately improved with the inclusion of the PRS in the PREDICT-PD algorithm [4].
We also investigated the distribution of cases and controls using the final multivariate model. In the lowest deciles, the percentage of controls was large and the percentage of cases was small. In the highest deciles, by contrast, the pattern was reversed, with the proportion of cases greater than the proportion of controls. These results highlight the capability of this multivariate model to stratify the population according to the risk of developing PD.
The main limitations of our study are the small size of the cohort and the small number of SNPs used for the PRS calculation, which lead to lower power of the study and minimal ability of genetic factors to differentiate cases from controls (e.g., AUC of PRS: 0.55). Selection bias may also be present, as a small single population was sampled and some members of the population are more likely to be included than others. In addition, as several data were collected through yes/no answers from the participants, interview bias may also be present. However, this study is important as it is the first pilot study that aims to evaluate the predictive performance of a PRS and design a predictive model for PD in the Greek-Cypriot population, a relatively small population. After appropriate evaluation, a future model with more SNPs and individuals could be used as a complementary diagnostic method. In addition, this study initiated the investigation of combined genetic and environmental factors of PD in our population.