Imputation and Reanalysis of ExomeChip Data Identifies Novel, Conditional and Joint Genetic Effects on Parkinson’s Disease Risk

Given that improved imputation software and high-coverage whole genome sequence (WGS)-based haplotype reference panels now enable inexpensive approximation of WGS genotype data, we hypothesised that WGS-based imputation and analysis of existing ExomeChip-based genome-wide association (GWA) data would identify novel intronic and intergenic single nucleotide polymorphism (SNP) effects associated with complex disease risk. In this study, we reanalysed a Parkinson’s disease (PD) dataset comprising 5540 cases and 5862 controls genotyped using the ExomeChip-based NeuroX array. After genotype imputation and extensive quality control, GWA analysis was performed using PLINK and a recently developed machine learning approach (GenEpi) to identify novel, conditional and joint genetic effects associated with PD. In addition to stronger replication of previously reported loci, we identified five novel genome-wide significant loci associated with PD: three (rs137887044, rs78837976 and rs117672332) with 0.01 < MAF < 0.05, and two (rs187989831 and rs12100172) with MAF < 0.01. Conditional analysis within genome-wide significant loci revealed four loci (p < 1 × 10−5) with multiple independent risk variants, while GenEpi analysis identified SNP–SNP interactions in seven genes. In addition to identifying novel risk loci for PD, these results demonstrate that WGS-based imputation and analysis of existing exome genotype data can identify novel intronic and intergenic SNP effects associated with complex disease risk.


Introduction
Over the past decade, genome-wide association studies (GWAS) have successfully identified many individual common genetic variants (i.e., single nucleotide polymorphisms (SNPs)) associated with the risk of a wide range of complex diseases. However, due to insufficient statistical power, the genetic effects identified by typical GWAS tend to explain only a small fraction of the overall genetic variation underlying complex diseases [1]. To account for this missing heritability, it is important to explore two further sources of risk: low-frequency SNPs (SNPs with minor allele frequency (MAF) below 0.05) at novel or established risk loci, and interactions between SNPs that may contribute more strongly to disease risk than the SNPs' main effects. However, because most GWAS focus on generating genetic data in new samples and use standard statistical tools to detect common SNPs with marginal effects, they do not identify heterogeneous effects or epistatic interactions between multiple SNPs.
Next-generation sequencing (NGS) technology allowed the development and use of cost-effective genotyping arrays to efficiently genotype and assess common genome-wide genetic variation in large samples, leading to the discovery of thousands of risk SNPs for many complex diseases. The genetic resolution of those large genotyped datasets can be increased via imputation of unobserved common and rare variants with advanced […]

[…] gender discrepancy (individuals whose genetically predicted sex differed from their reported sex) and individuals showing excess heterozygosity (deviating by more than ±3 SD from the sample heterozygosity rate) were excluded. After this initial individual filtering, SNPs with per-SNP missingness greater than 5%, SNPs with a minor allele count (MAC) below 3 and SNPs not in Hardy-Weinberg equilibrium (HWE p-value < 1 × 10−6) were removed. Next, pairwise identity by descent (IBD) was calculated using PLINK v1.9 after pruning for linkage disequilibrium (LD), in which SNPs with r2 > 0.02 within a 50-SNP sliding window were pruned out; the IBD estimates were then used to remove cryptically related individuals (from each pair of individuals with 'pi_hat' > 0.2, the individual with the lower call rate was removed). Finally, principal component analysis (PCA) was used to identify and exclude individuals whose genetic ancestry was inconsistent with European descent, relative to the 1000 Genomes (1000 G) reference panel.
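The per-SNP filters above (missingness > 5%, MAC < 3, HWE p < 1 × 10−6) can be sketched in Python as follows. This is a minimal illustration, not the PLINK implementation used in the study; it roughly mirrors what PLINK's --geno 0.05, --mac 3 and --hwe 1e-6 options do. Genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with None for a missing call, the Hardy-Weinberg test is the exact test described by Wigginton, Cutler and Abecasis (2005), and the function names are hypothetical.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Exact Hardy-Weinberg test p-value (Wigginton, Cutler & Abecasis 2005)."""
    n_homr, n_homc = min(n_hom1, n_hom2), max(n_hom1, n_hom2)
    rare = 2 * n_homr + n_het                 # copies of the rarer allele
    n = n_het + n_homr + n_homc               # genotyped individuals
    if n == 0:
        return 1.0
    probs = [0.0] * (rare + 1)                # relative P(het count | allele counts)
    mid = rare * (2 * n - rare) // (2 * n)    # start near the expected het count
    if (rare - mid) % 2:
        mid += 1                              # het count must share parity with `rare`
    probs[mid] = 1.0
    total = 1.0
    het, homr, homc = mid, (rare - mid) // 2, n - mid - (rare - mid) // 2
    while het >= 2:                           # walk down: 2 fewer hets per step
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (homr + 1) * (homc + 1))
        total += probs[het - 2]
        het, homr, homc = het - 2, homr + 1, homc + 1
    het, homr, homc = mid, (rare - mid) // 2, n - mid - (rare - mid) // 2
    while het <= rare - 2:                    # walk up: 2 more hets per step
        probs[het + 2] = probs[het] * 4.0 * homr * homc / ((het + 2) * (het + 1))
        total += probs[het + 2]
        het, homr, homc = het + 2, homr - 1, homc - 1
    p_obs = probs[n_het]
    # p-value: total probability of het counts no more likely than the observed one
    return min(1.0, sum(p for p in probs if p <= p_obs) / total)


def snp_passes_qc(genotypes, max_missing=0.05, min_mac=3, min_hwe_p=1e-6):
    """Apply the missingness, minor-allele-count and HWE filters to one SNP."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1 - len(called) / len(genotypes)
    n_hom_ref, n_het, n_hom_alt = called.count(0), called.count(1), called.count(2)
    alt_count = n_het + 2 * n_hom_alt
    mac = min(alt_count, 2 * len(called) - alt_count)   # minor allele count
    return (missing_rate <= max_missing
            and mac >= min_mac
            and hwe_exact_p(n_het, n_hom_ref, n_hom_alt) >= min_hwe_p)


print(snp_passes_qc([0] * 50 + [1] * 40 + [2] * 10))   # well-behaved SNP
print(snp_passes_qc([0] * 99 + [1]))                    # MAC = 1, removed
```

In the study this filtering was done by PLINK on the whole dataset; the sketch only shows what each threshold tests for a single SNP.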

Imputation and Post Imputation QC
Genotype imputation using the HRC reference panel (version r1.1, which consists of 64,940 haplotypes of predominantly European ancestry) was performed with minimac4 using the next-generation genotype imputation service and methods [12] available on the Michigan Imputation Server (https://imputationserver.sph.umich.edu/index.html accessed on 18 January 2020). Input data were prepared for imputation according to the data preparation guidelines provided by the Michigan Imputation Server, and post-imputation QC was done using PLINK v2.0 and vcftools [13]. As a pre-imputation step, QC against the HRC reference was carried out using the toolbox provided by Will Rayner (http://www.well.ox.ac.uk/~wrayner/tools/ accessed on 21 January 2020). Genotype imputation for each chromosome was performed by the server after several QC and phasing steps, and the imputed data, comprising a dose.vcf file and an info file for each chromosome, were downloaded directly from the server. The imputation quality of each imputed SNP was evaluated using the minimac4 info score provided in the .info file. Post-imputation QC retained SNPs with MAF ≥ 0.001, info score ≥ 0.5, and HWE p-value ≥ 1 × 10−7 in controls for further analysis. Using PLINK, the imputed genotype posterior probabilities in the VCF files were converted to Oxford-format (.gen) best-guess genotypes for the GenEpi interaction analyses.
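To make the relationship between posterior probabilities, dosages and best-guess genotypes concrete, the sketch below derives both from per-sample probability triples and applies the info-score and MAF filters. The record layout (ImputedSnp) and the optional min_prob hard-call threshold are hypothetical, the HWE-in-controls filter is omitted for brevity, and PLINK's own conversion rules may differ in detail.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Probs = Tuple[float, float, float]  # P(0), P(1), P(2) copies of the alternate allele


@dataclass
class ImputedSnp:  # hypothetical record, not a minimac4 or PLINK data structure
    snp_id: str
    rsq: float          # minimac4 info score for this SNP
    probs: List[Probs]  # one posterior-probability triple per sample


def dosage(p: Probs) -> float:
    """Expected alternate-allele count, between 0 and 2."""
    return p[1] + 2.0 * p[2]


def best_guess(p: Probs, min_prob: float = 0.0) -> Optional[int]:
    """Most probable genotype; None (missing) if its probability is below min_prob."""
    g = max(range(3), key=lambda k: p[k])
    return g if p[g] >= min_prob else None


def alt_freq(snp: ImputedSnp) -> float:
    """Alternate-allele frequency estimated from dosages."""
    return sum(dosage(p) for p in snp.probs) / (2 * len(snp.probs))


def passes_post_imputation_qc(snp: ImputedSnp, min_rsq: float = 0.5,
                              min_maf: float = 0.001) -> bool:
    # Info-score and MAF filters only; the HWE-in-controls filter is omitted here.
    f = alt_freq(snp)
    return snp.rsq >= min_rsq and min(f, 1.0 - f) >= min_maf


snp = ImputedSnp("rs_example", 0.8, [(0.9, 0.1, 0.0), (0.1, 0.8, 0.1), (0.0, 0.2, 0.8)])
print([best_guess(p) for p in snp.probs])  # [0, 1, 2]
print(passes_post_imputation_qc(snp))
```

Dosages keep the imputation uncertainty and were used for the association analysis, whereas best-guess calls discard it; the latter were used here only because GenEpi takes hard genotypes.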

Association Analysis and Conditional Analysis
Association analysis of imputed genotype dosage data was performed using PLINK v2.0. First, PCA was carried out on LD-pruned SNPs (using the same criteria described in the QC section) to generate eigenvectors. Then, logistic regression adjusting for the first two principal components (PCs), age, and sex was performed to estimate the additive effect of each SNP on PD risk.
Conditional analyses were then applied to identify secondary association signals. For each genome-wide significant locus identified in the association analysis, we performed region-wise conditional analysis using PLINK, testing all SNPs in the region while adjusting for the most significant ("index") SNP in that region as well as for the same covariates used in the association analysis.
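The additive model and its conditional extension can be sketched with a small Newton-Raphson (IRLS) logistic regression and a Wald test. This is a minimal illustration on simulated data, not PLINK's implementation: genotypes are coded as 0/1/2 allele counts, two simulated covariates stand in for age and sex, the helper names are hypothetical, and conditioning is done, as described above, by adding the index SNP as an extra covariate.

```python
import math
import numpy as np


def fit_logistic(y, X, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson (IRLS); X must contain an intercept column.
    Returns coefficient estimates and their standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))           # fitted probabilities
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])      # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))      # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def association_test(y, snp, covariates, condition_on=None):
    """Additive test of one SNP, optionally conditioning on an index SNP.
    Returns (odds ratio per allele, two-sided Wald p-value)."""
    cols = [np.ones(len(y)), snp.astype(float)]
    if condition_on is not None:
        cols.append(condition_on.astype(float))   # index SNP as an extra covariate
    cols.extend(covariates.T)                      # one column per covariate
    beta, se = fit_logistic(y, np.column_stack(cols))
    z = beta[1] / se[1]
    return math.exp(beta[1]), math.erfc(abs(z) / math.sqrt(2.0))


# Simulated data: only index_snp affects risk; linked_snp copies it in ~90% of samples.
rng = np.random.default_rng(1)
n = 4000
index_snp = rng.binomial(2, 0.3, n)
linked_snp = np.where(rng.random(n) < 0.1, rng.binomial(2, 0.3, n), index_snp)
age_std = rng.normal(0.0, 1.0, n)                 # standardised age (stand-in)
sex = rng.binomial(1, 0.5, n).astype(float)
covariates = np.column_stack([age_std, sex])
logit = -0.4 + 0.6 * index_snp + 0.2 * age_std
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

or_marg, p_marg = association_test(y, linked_snp, covariates)
or_cond, p_cond = association_test(y, linked_snp, covariates, condition_on=index_snp)
print(f"marginal OR={or_marg:.2f}, p={p_marg:.1e}; conditional OR={or_cond:.2f}, p={p_cond:.2f}")
```

In this simulation the linked SNP is strongly associated on its own, but once the index SNP is in the model its apparent effect is no longer significant. A residual signal that survives conditioning is what the analysis treats as a secondary, independent association.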

Interaction Analysis
To identify joint effects of SNPs on PD risk, we used a recently developed computational package called GenEpi [3], which applies a gene-based machine learning approach to discover pair-wise epistasis associated with a phenotype. GenEpi first groups genetic variants by locus (i.e., gene) using gene information from the UCSC human genome annotation database [14]. It then reduces the dimensionality of the genetic features in each locus using LD: features are grouped into LD blocks using given r2 and D' thresholds, and the feature with the largest MAF is selected to represent each block. The selected genotype features of each gene are then modelled independently by L1-regularised regression. In the next stage, to identify cross-gene epistasis features, both the individual SNPs and the previously selected within-gene epistasis features are pooled and used in L1-regularised regression to select the final genotype feature set. Users can also include environmental factors when building the final model, which can be evaluated using 2-fold cross-validation (CV). Because this study focuses on identifying SNP-SNP interactions associated with PD risk (not on prediction), the SNP-SNP interactions were further examined by generating counts and frequencies of each two-locus genotype using PLINK v1.9, to understand how each interaction operates.
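The two-locus genotype summary described above amounts to a 3 × 3 cross-tabulation of genotypes at the two SNPs, computed separately in cases and controls. The sketch below shows that tabulation under assumed codings (genotypes as 0/1/2 alternate-allele counts, None for a missing call, status 1 for cases and 0 for controls); the function name is hypothetical and this is not PLINK's output format.

```python
from collections import Counter


def two_locus_table(geno_a, geno_b, status):
    """Counts and within-group frequencies of each two-locus genotype combination.

    geno_a, geno_b: genotypes (0/1/2 alternate-allele counts, None = missing) at two SNPs
    status: 1 for a case, 0 for a control
    Returns {(g_a, g_b): (n_cases, freq_in_cases, n_controls, freq_in_controls)}.
    """
    counts = {1: Counter(), 0: Counter()}
    for a, b, s in zip(geno_a, geno_b, status):
        if a is None or b is None:
            continue  # skip samples missing either genotype
        counts[s][(a, b)] += 1
    n_cases = sum(counts[1].values())
    n_controls = sum(counts[0].values())
    table = {}
    for combo in sorted(set(counts[1]) | set(counts[0])):
        n_ca, n_co = counts[1][combo], counts[0][combo]
        table[combo] = (
            n_ca, n_ca / n_cases if n_cases else 0.0,
            n_co, n_co / n_controls if n_controls else 0.0,
        )
    return table


table = two_locus_table([0, 0, 1, 2, 2, None], [0, 1, 1, 2, 2, 0], [1, 0, 1, 0, 0, 1])
for combo, row in table.items():
    print(combo, row)
```

Comparing the case and control frequency columns for each combination shows which joint genotypes drive an interaction signal, for example a combination that is consistently rarer in cases than in controls.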
For the current study, GenEpi was applied to best-guess genotypes of the SNPs with nominally significant association results (p-value < 0.05), using thresholds of D' > 0.8 and r2 > 0.8 to generate LD blocks of features, and including the first two ancestry PCs, age and sex as environmental factors.
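GenEpi's LD-based feature reduction with these thresholds can be approximated by the following sketch. It is not GenEpi's code and makes several assumptions: phased haplotypes coded 0/1 (one list per SNP, in positional order) are available, each SNP is compared only with its immediate predecessor when deciding whether to extend the current block, and the SNP with the largest MAF represents each block. GenEpi's actual block-building rule may differ.

```python
def ld_stats(h1, h2):
    """r^2 and |D'| between two biallelic SNPs from phased haplotypes (alleles 0/1)."""
    n = len(h1)
    p_a, p_b = sum(h1) / n, sum(h2) / n
    p_ab = sum(1 for x, y in zip(h1, h2) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


def ld_blocks(haplotypes, r2_min=0.8, dprime_min=0.8):
    """Greedy blocks: extend the current block while a SNP is in strong LD
    (r2 > r2_min and D' > dprime_min) with the preceding SNP."""
    blocks = [[0]] if haplotypes else []
    for j in range(1, len(haplotypes)):
        r2, dp = ld_stats(haplotypes[j - 1], haplotypes[j])
        if r2 > r2_min and dp > dprime_min:
            blocks[-1].append(j)
        else:
            blocks.append([j])
    return blocks


def maf(h):
    f = sum(h) / len(h)
    return min(f, 1 - f)


def block_representatives(haplotypes, **thresholds):
    """Index of the largest-MAF SNP in each LD block."""
    return [max(block, key=lambda j: maf(haplotypes[j]))
            for block in ld_blocks(haplotypes, **thresholds)]


haps = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
]
print(ld_blocks(haps), block_representatives(haps))
```

Requiring both r2 and D' to exceed 0.8 is stricter than either alone. In the toy data, SNPs 1 and 2 have D' = 1 but r2 ≈ 0.67, so they fall into separate blocks.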

Genome-Wide Association Analysis
In the current study, the initial individual-level PD case-control GWAS dataset downloaded from dbGaP was quality controlled to exclude low-quality variants and samples (see Supplementary Table S1 for the composition of the sample), using customised quality control procedures designed to retain low-frequency variants for further analysis (Methods and Supplementary Table S2). After quality control, a total of 10,533 individuals (5167 cases, 5366 controls) and 110,504 SNPs remained for genotype imputation.
After filtering out low-quality individuals and SNPs, the remaining dataset was processed following the data preparation guidelines provided by the Michigan Imputation Server. Briefly, for each chromosome, VCF files created using VCFCooker were sorted by genomic position and uploaded to the server as input files. The server's imputation process, including pre-phasing and imputation to the HRC reference panel using minimac4, took about 15 h after successful input validation and quality control. The full imputed dataset downloaded from the server contained approximately 40 million SNPs. Post-imputation quality screening using the minimac4 info score, MAF and HWE p-value as parameters still yielded a substantially larger dataset than the original: compared with the NeuroX genotyped dataset, the final imputed dataset contained 1,465,938 SNPs with good imputation quality (minimac4 info score ≥ 0.5, MAF ≥ 0.001, and HWE p-value ≥ 1 × 10−7 in controls), an increase of approximately 1200%. Among the imputed SNPs, 733,576 were common (MAF ≥ 0.05) and 732,362 were low frequency (MAF < 0.05). Moreover, although the majority (73%) of SNPs in the initial NeuroX dataset were exonic, many intronic and intergenic SNPs were imputed; the imputed dataset comprised 53% intronic, 40% intergenic and 7% exonic SNPs (Table 1 and Supplementary Table S5).
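The size figures in this paragraph can be checked with simple arithmetic: the imputed set is roughly 13 times the genotyped set, which corresponds to an increase of about 1227% (reported above as approximately 1200%), and the common and low-frequency counts sum to the total.

```latex
\frac{1\,465\,938}{110\,504} \approx 13.27,
\qquad
\frac{1\,465\,938 - 110\,504}{110\,504} \times 100\% \approx 1227\%,
\qquad
733\,576 + 732\,362 = 1\,465\,938
```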
After testing each SNP for association with PD risk using logistic regression with age, sex and the first two ancestry PCs as covariates (Methods), association results were obtained for 1,465,918 SNPs; for the remaining 20 very rare variants, the logistic regression failed to converge and produced NA values (see Supplementary Data S1 'NeuroX_Reanalysis_Summary_Statistics.txt' and Supplementary Note S1 'Description_NeuroX_Reanalysis_Summary_Statistics.txt'). A total of 11 independent association signals for PD reached genome-wide significance (p ≤ 5 × 10−8), including 5 newly identified signals located more than 1 Mb from previously reported PD risk loci (Table 2, Figure 1, Supplementary Table S3, Supplementary Figure S2). Of these novel loci, three are driven by low-frequency (0.01 < MAF < 0.05) variants (rs137887044 in WDR41 on chromosome 5q14.1, rs78837976 in MUC12 on 7q22.1, and rs117672332 in ITGAE/HASPIN on 17p13.2), and two by rare (MAF < 0.01) variants (rs187989831 near TEKT4 on chromosome 2q11.1 and rs12100172 in CARS2 on 13q34). LocusZoom plots of the identified novel loci are shown in Figure 2. Overall, this single-SNP analysis of the imputed data identified seven PD risk loci that were not reported in the original Nalls et al. (2014) study: the five novel loci, plus two other loci, namely rs983361 in SNCA at 4q22.1, which has been reported to be associated with PD age at onset [15], and rs7221167 in MAPT at 17q21.31, which has been reported previously but failed final filtering and QC in the Nalls et al. (2019) PD GWAS [5] (Table 2).
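The novelty criterion used above (a genome-wide significant signal lying more than 1 Mb from any previously reported PD risk locus) can be sketched as a simple positional filter. The function name, input tuples and example coordinates are hypothetical and only illustrate the rule; they are not the study's actual signals or the list of known loci.

```python
GENOME_WIDE_P = 5e-8            # genome-wide significance threshold (p <= 5e-8)
NOVELTY_WINDOW_BP = 1_000_000   # "more than 1 Mb" from any known locus


def novel_signals(signals, known_loci, window_bp=NOVELTY_WINDOW_BP):
    """Return IDs of genome-wide significant signals lying more than window_bp
    from every previously reported locus on the same chromosome.

    signals:    iterable of (signal_id, chromosome, position_bp, p_value)
    known_loci: iterable of (chromosome, position_bp) for previously reported loci
    """
    known = list(known_loci)
    novel = []
    for signal_id, chrom, pos, p in signals:
        if p > GENOME_WIDE_P:
            continue  # not genome-wide significant
        if all(k_chrom != chrom or abs(k_pos - pos) > window_bp for k_chrom, k_pos in known):
            novel.append(signal_id)
    return novel


# Hypothetical example: one known locus on chromosome 4.
known = [("4", 90_000_000)]
signals = [
    ("sigA", "4", 90_500_000, 1e-9),  # 0.5 Mb from the known locus -> not novel
    ("sigB", "4", 92_000_000, 1e-9),  # 2 Mb away -> novel
    ("sigC", "5", 1_000, 1e-7),       # not genome-wide significant
]
print(novel_signals(signals, known))  # ['sigB']
```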

PD Risk Loci Reported in Other GWAS
Conditional analysis revealed nine loci with more than one independent risk signal, including two loci (within SNCA and HASPIN) whose secondary signals reached genome-wide significance (p < 5 × 10−8). Of those, one secondary association signal lies in the newly identified gene HASPIN (rs11653889 and rs117672332 at 17p13.2), whereas the loci within GBA, TMEM175, SNCA, and GAK/DGKQ had previously been identified as multi-signal loci by PD GWAS. In addition, of the secondary association signals identified in conditional analysis, four (rs113319394, rs3806789, rs74125084 and rs11653889) are in at least modest LD (r2 > 0.1) with the index SNP and five (rs112344141, rs181580861, rs72765119, rs28645997, rs3851784) are in very low LD (r2 ≤ 0.01) with the index SNP. The secondary association signal at 4q22.1 (rs3806789 in SNCA) showed the largest decrease in p-value (from 2.10 × 10−2 to 9.13 × 10−10), with a conditional odds ratio of 1.2 when conditioned on rs356182, indicating significant allelic heterogeneity at this locus. Detailed summary statistics for all nine secondary signals are given in Table 3 (LocusZoom plots of these loci are available in Supplementary Figure S1).

Table 3 abbreviations: Secondary SNP = secondary association single-nucleotide polymorphism; CHR = secondary SNP chromosome; BP = secondary SNP base position in GRCh37 (hg19); EA = secondary SNP effect allele; EAF = secondary SNP effect allele frequency; Index SNP = most significant SNP used to condition on; r2 = LD between the secondary and index SNP; OR = odds ratio and p-value = p-value for the secondary SNP from standard association analysis; OR_cond = odds ratio and p-value_cond = p-value for the secondary SNP from conditional analyses.

Comparison of Association Results with Nalls et al. (2014) Findings
The NeuroX dataset was previously used by Nalls et al. (2014) to replicate 26 SNP loci found to be associated with PD risk (p < 5 × 10−8) in a meta-analysis of genome-wide association data (Discovery Phase). Of these 26 SNPs, eight were not available in the NeuroX dataset because of failed assay design or quality control, so a suitable proxy SNP was selected for each in the replication study. Of the 26 PD risk loci examined in the original NeuroX study by Nalls et al. (2014), 18 were replicated (p < 0.05) using the same SNP and an additional four loci were replicated using proxy SNPs.
In the current study, seven of the eight SNPs missing from the NeuroX dataset were successfully imputed; the exception was rs8118008, which is absent from the HRC reference panel. Analysis of the imputed genotype data replicated 21 of the 22 PD risk loci that were originally replicated in Nalls et al. (2014), including the rs8118008 locus, which, although not imputed itself, was replicated using a better-correlated proxy SNP, rs8125675 (r2 = 1), than the proxy used in Nalls et al. (2014) (rs55785911, r2 = 0.85). Notably, our analysis was able to impute and replicate (p = 0.031) an additional PD risk locus (rs62120679) that was not replicated in Nalls et al. (2014) using a moderately correlated (r2 = 0.49) proxy SNP (rs10402629) (Table 4 and Supplementary Table S4).

Table 4 notes: […] (hg19); EA = effect allele; EAF = effect allele frequency; OR = odds ratio and p-value = p-value of the association analysis; Imp_rsq = minimac4 info score. In the replication phase of Nalls et al. results, * indicates the SNPs that failed assay design or quality control, for which a suitable proxy SNP was used (proxy rs71628662 for rs35749011; proxy rs1955337 for rs1474055; proxy rs62267708 for rs115185635; proxy rs118117788 for rs117896735; proxy rs12283611 for rs3793947; proxy rs1077989 for rs1555399; proxy rs10402629 for rs62120679; proxy rs55785911 for rs8118008).
In the current study, rs8118008 could not be imputed because it is not available in the HRC panel, so a perfect (r2 = 1) proxy SNP, rs8125675, was selected. SNPs with divergent replication results are shown in bold.

Novel Low-Frequency Variants Associated with PD
Along with the known genetic loci associated with PD, we also identified five novel loci, three of which were driven by low-frequency variants with large effect sizes (OR ≥ 1.85; Table 2, Figure 1). One of these loci is driven by a low-frequency intronic variant in the WDR41 gene at chromosome 5q14.1 (rs137887044, OR = 1.850 [1.489-2.295], p = 2.41 × 10−8); WDR41 has previously been implicated in multiple neurological disorders. The LocusZoom plot for the 1 Mb region around this novel SNP (Figure 2b) shows another genome-wide significant SNP (rs148662448 near WDR41) in strong LD (r2 = 0.899) with rs137887044. As expected, when conditioned on rs137887044, rs148662448 no longer showed evidence for association (p = 0.411), indicating a single association signal at this locus.
Our analysis highlighted two novel rare variants. The first, at chromosome 2q11.1 (rs187989831, p = 7.56 × 10−10), lies near TEKT4. As shown in the LocusZoom plot (Figure 2a), two other genome-wide significant SNPs (rs1281734107 and rs78890475) lie close to the novel index SNP and are in low LD (r2 ≤ 0.2) with it. However, conditioning on rs187989831 left only weak evidence for residual association of these two SNPs (0.005 < p < 0.03). The second rare novel variant (rs74125032, p = 2.15 × 10−10) lies within an intron of the CARS2 gene on chromosome 13q34. Both variants have extreme ORs in this study, possibly because they are extremely rare and have very low genotype frequencies within the NeuroX dataset.

Joint Genetic Effects on PD Risk
Machine learning (GenEpi) association analyses identified significant (p ≤ 3.77 × 10−6) SNP-SNP interactions at five independent genomic loci harbouring eight different genes (Table 5). Seven of the eight genes (GAK, TMEM175, SNCA, PLEKHM1, CRHR1, MAPT and NSF) have been implicated by previous GWAS as harbouring individual SNPs associated with PD risk, whereas the joint effect of two SNPs at chromosome 7p15.3 (rs2965400 and rs6461595, p = 3.77 × 10−6) within an intron of the DNAH11 gene has not previously been implicated in PD, although DNAH11 has been reported to be associated with cholesterol level and (age-related) cognitive decline. Furthermore, the most significant interaction effect on PD was found between SNPs rs34186148 and rs242941 (p = 4.78 × 10−10) at chromosome 17q21.31 in the CRHR1 gene, with the homozygous CC genotype at both SNPs being protective against PD. The protein-coding CRHR1 gene has been reported to be associated with anxiety and depression, which are common in PD. Figure 3 shows the genotype combinations of the interacting SNPs in CRHR1 (Figure 3a) and DNAH11 (Figure 3b), and of the most significant interaction at each of the other three independent loci (TMEM175 at 4p16.3, SNCA at 4q22.1 and NSF at 17q21.31), highlighting the case-control frequency differences in genotype combinations that underlie the significant associations with PD.

Discussion
In this study, we reanalysed an ExomeChip-based NeuroX dataset, previously used for the replication of GWA meta-analysis results [6,16,17], to identify novel common and rare SNPs and their interactions associated with PD risk. Starting with only 110,504 NeuroX SNPs passing QC, predominantly (73%) exonic and less common variants, we imputed 1,465,938 high-quality SNPs using the HRC reference panel. The imputed dataset comprised 53% intronic, 40% intergenic and 7% exonic SNPs and spanned a wide frequency range, including rarer as well as more common SNPs, across chromosomes 1-22 and X. Our review of the literature found only examples focusing on genome-wide imputation of exonic variants. For example, Auer et al. (2012) performed genotype imputation of exome sequence variants in a sample of more than 13,000 African Americans with Affymetrix GWA genotyping array (Affy6.0) data, using a reference comprising 761 African Americans with both Affy6.0 genotype data (838,337 SNPs with MAF > 0.01 spread across the genome) and exome sequence data, to identify exonic variants associated with blood cell counts [18]. Similar studies in the same cohort were performed by Johnsen et al. (2013) and Du et al. (2014) to identify novel low-frequency variants that contribute to von Willebrand factor [19] and adult body height [20]. In contrast, we imputed common and rare variants across the genome using a WGS-based HRC reference panel, starting from predominantly rare exonic variants. Compared with the association results from the original NeuroX dataset, the results after imputation produced (i) more robust evidence for replication, with smaller p-values for most of the originally significant SNPs, and (ii) a larger number of genome-wide significant loci associated with PD.
Association analysis of the imputed genetic data confirmed several known PD risk loci and also allowed us to identify five novel association signals driven by low-frequency or rare variants in or near TEKT4, WDR41, MUC12, CARS2, and ITGAE/HASPIN. First, the low-frequency variant identified in WDR41 at chromosome 5q14.1 was associated with a nearly twofold increased risk of PD, and WDR41, which is associated with several neurodegenerative disorders, is a potential candidate PD risk gene. WDR41 is a protein-coding gene, and diseases associated with it include striatal degeneration, autosomal dominant 1, a rare autosomal-dominant movement disorder with some motor symptoms similar to PD, and frontotemporal dementia and/or amyotrophic lateral sclerosis 1, an autosomal dominant neurodegenerative disorder [21]. Importantly, several variants in WDR41 have shown near genome-wide significant association signals in previous GWAS of Alzheimer's disease (AD) (p = 7 × 10−7) [22]; caudate nucleus volume (p = 2 × 10−7), the caudate being a subcortical brain structure implicated in many common neurological and psychiatric disorders [23]; and epileptogenesis (p = 5 × 10−6) [24] in European populations. AD is also an age-related neurodegenerative condition caused by damage to brain cells, and both PD and AD can involve common symptoms such as anxiety, depression, and sleep disturbances; some studies have noted shared risk variants across AD and PD [25]. However, none of these studies identified the same rare coding variants for both diseases, perhaps due to limited sample sizes and different data processing methods. Together, these previous findings and the results of the current study suggest that WDR41 is a strong candidate gene for PD risk.
Second, the variant near TEKT4 is very rare and thus showed an extreme odds ratio in the NeuroX sample. Effect estimates for low-frequency and rare variants have large standard errors because only a small number of minor alleles are observed in the analysed cases and controls, which can produce such extreme values. Analyses in larger samples are therefore required to obtain more accurate effect estimates. In addition, the minimac4 info scores of the variants producing extreme OR values in this study lie close to the QC threshold (r2 = 0.5334 for rs187989831 and r2 = 0.50635 for rs74125032), which could indicate that lower imputation quality adversely affected the association tests and effect estimates. Indeed, imputing and analysing rare variants is challenging, both because they are observed in few individuals in the study sample and because, compared with common variants, they tend to be weakly correlated with surrounding variants. Hence, replication via direct genotyping, ideally in larger samples, is required to validate such findings.
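The standard-error argument above can be made concrete with Woolf's approximation for the standard error of a log odds ratio computed from a 2 × 2 allele-count table. This is a simplified illustration: the study's ORs come from covariate-adjusted logistic regression on dosages, not from raw allele counts, and the counts below are hypothetical, chosen only so that the totals match the post-QC sample of 5167 cases and 5366 controls (10,334 and 10,732 alleles).

```python
import math


def allelic_or_ci(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Allelic odds ratio with Woolf's standard error and a Wald confidence interval.

    All four counts must be > 0; a zero cell makes log(OR) undefined (a common
    workaround is adding 0.5 to every cell, the Haldane-Anscombe correction).
    Returns (odds_ratio, se_log_or, ci_lower, ci_upper).
    """
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    se_log_or = math.sqrt(1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major)
    log_or = math.log(odds_ratio)
    return (odds_ratio, se_log_or,
            math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or))


# Hypothetical counts: 10,334 case alleles and 10,732 control alleles in both scenarios.
common = allelic_or_ci(2000, 8334, 1600, 9132)  # common variant, thousands of minor alleles
rare = allelic_or_ci(10, 10324, 2, 10730)       # rare variant, a dozen minor alleles in total
print("common: OR=%.2f SE=%.3f CI=(%.2f, %.2f)" % common)
print("rare:   OR=%.2f SE=%.3f CI=(%.2f, %.2f)" % rare)
```

For the rare variant the 1/2 and 1/10 terms dominate the sum, so its standard error is roughly twenty times that of the common variant and its 95% confidence interval spans more than an order of magnitude, even though the total sample size is identical.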
Although the TEKT4-associated variant is rare and requires validation, it may be an important finding because of a potential link to seizures and to sudden unexplained death in PD. Diseases potentially related to TEKT4 include juvenile myoclonic epilepsy [26], a condition characterised by recurrent seizures that cause rapid, uncontrolled muscle jerks, muscle rigidity, convulsions, and loss of consciousness.
Of the genes implicated by the other novel SNP loci, CARS2 is associated with combined oxidative phosphorylation deficiency and ovarian cancer. Mutations in CARS2 have been associated with progressive myoclonus epilepsy [27] and can lead to a severe epileptic encephalopathy and complex movement disorder [28]. Epilepsy is an uncommon comorbidity of PD; although rare, the coexistence of epilepsy and PD may influence PD progression [29], and Gruntz et al. (2018) reported that incident PD is associated with an increased risk of incident epileptic seizures [30]. This suggests that the genes at these two rare-variant loci (TEKT4 and CARS2) could be candidate PD risk genes. Moreover, because epilepsy is associated with an increased risk of sudden unexplained death in epilepsy [31], patients carrying risk variants at these loci may be at increased risk of sudden unexpected death. However, these two variants have extreme odds ratios in this study, perhaps due to their very low genotype frequency within the analysed dataset. MUC12 is associated with Tn polyagglutination syndrome and colorectal cancer, and previous GWAS have reported effects of genetic variants in MUC12 on hemoglobin levels and of variants in CARS2 on diastolic blood pressure. In contrast, no disease has been reported to be associated with the HASPIN gene, making it an important gene for further analysis.
In addition, our results show strong evidence for multiple association signals at two loci: one at chromosome 17p13.2 in HASPIN, supporting a role for this gene in PD risk, and one at chromosome 4q22.1 in SNCA. SNPs at chromosome 4q22.1 are well known for their association with PD [15,32-34] and with several other diseases, including dementia with Lewy bodies [35].
Notably, this analysis using genotype imputation identified seven PD risk loci that were not reported in the original Nalls et al. (2014) study, which used the same NeuroX dataset, without genotype imputation, to replicate its discovery-phase findings: the five novel genetic loci described above and two other loci, in SNCA and MAPT, that have been reported in other PD GWAS. All genetic variants at the five novel loci are low-frequency or rare, whereas the variants at the two previously reported loci are common. In addition to replicating the PD risk loci identified and replicated in Nalls et al. (2014), our analysis imputed and replicated (p = 0.031) an additional PD risk locus (rs62120679 in SPPL2B) that was not replicated using a proxy SNP (rs10402629) in Nalls et al. (2014). These results support the utility of genotype imputation with a dense reference panel such as HRC for assessing genetic variants across a wide frequency range.
Interestingly, our results provide support for the findings of a recent meta-analysis of whole-exome sequencing data by Gaare et al. (2020) [16] that was replicated using a cohort genotyped using the NeuroX array. This 2020 study found no evidence of rare mutation enrichment in genes within PD-associated loci. Similarly, our study found genome-wide significant associations of rare SNPs only within novel PD risk loci and not within known PD-associated loci.
The interaction analysis using the GenEpi machine learning approach identified eight SNP pairs with joint genetic effects associated with PD, including a strong genome-wide significant interaction signal at chromosome 17q21.31 in CRHR1, although neither of the two SNPs was individually significantly associated with PD risk. Given that SNPs in CRHR1 have previously been reported to be associated with PD [5,32,36,37] and Alzheimer's disease [38], and that SNP-SNP interactions were also identified in SNCA, GAK and MAPT, which are well-known PD risk genes, these joint effects are plausibly genuine and illustrate the utility of our approach for identifying joint genetic effects associated with complex diseases such as PD. However, GenEpi applies two criteria before modelling the genotype features: first, it excludes features with genotype frequency (the proportion of a genotype among all samples in the dataset) ≤ 5%; and second, it excludes features weakly associated with the disease (χ2 test p ≥ 0.01). This limits the discovery of joint genetic effects of SNPs with relatively small main effects and of interactions involving less common SNPs.
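The two GenEpi pre-filters described above can be sketched as follows, which also shows why features with small marginal effects never reach the interaction model. This is an illustration under stated assumptions, not GenEpi's implementation: each genotype feature is assumed to be a binary carrier indicator, the association test is taken to be a Pearson χ2 test on the 2 × 2 feature-by-status table without continuity correction, and the function names are hypothetical.

```python
import math


def chi2_2x2_p(a, b, c, d):
    """Pearson chi-square p-value (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def keep_feature(indicator, status, min_freq=0.05, max_p=0.01):
    """indicator: 1 if the sample carries the genotype feature, else 0; status: 1 case, 0 control."""
    if sum(indicator) / len(indicator) <= min_freq:
        return False  # genotype frequency <= 5%: excluded
    a = sum(1 for x, s in zip(indicator, status) if x == 1 and s == 1)  # carrier cases
    b = sum(1 for x, s in zip(indicator, status) if x == 1 and s == 0)  # carrier controls
    c = sum(1 for x, s in zip(indicator, status) if x == 0 and s == 1)  # non-carrier cases
    d = sum(1 for x, s in zip(indicator, status) if x == 0 and s == 0)  # non-carrier controls
    return chi2_2x2_p(a, b, c, d) < max_p  # p >= 0.01: excluded


status = [1] * 100 + [0] * 100
strong = [1] * 60 + [0] * 40 + [1] * 20 + [0] * 80   # enriched in cases
uncommon = [1] * 2 + [0] * 98 + [1] * 2 + [0] * 98   # 2% carriers
neutral = [1] * 30 + [0] * 70 + [1] * 30 + [0] * 70  # no marginal effect
print([keep_feature(f, status) for f in (strong, uncommon, neutral)])
```

A feature such as `neutral`, which could still matter in combination with another SNP, is discarded before any interaction is modelled; this is the limitation noted above for SNPs with small main effects and for less common SNPs.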
Overall, the novel individual association signals in TEKT4 and WDR41 and the SNP-SNP interaction effect in CRHR1 identified in this study are important because, although TEKT4 and WDR41 have not previously been reported to be associated with PD, previous findings indicate possible associations of these genes with several neurodegenerative and neurological disorders, making them strong biological candidates due to their established pleiotropy. Furthermore, variants in these genes may have utility as prognostic or diagnostic markers to stratify patients with complex symptomatology (e.g., PD and other neurodegenerative or neurological disorders). Although follow-up studies are required to confirm some findings, this study highlights the utility of genome-wide genotype imputation, followed by careful and thorough statistical analyses, in existing custom and ExomeChip array-based genetic datasets to identify intronic and intergenic risk loci, despite their sparse, inconsistent and predominantly exonic coverage.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/genes12050689/s1, Figure S1: LocusZoom plots of genome-wide significant PD loci having significant secondary association signals, Figure S2: Q-Q plot for the association results of genotype data (a) without imputation and (b) with imputation, Table S1: Composition of the initial PD-NeuroX sample, Table S2: Summary of QC process, Table S3: Summary of the GWAS results for genome-wide significant loci, Table S4: Comparison of association results with Nalls et al., Table S5: Genomic region of the SNPs in the imputed dataset, Table S6: Number of cases and controls for each genotype combination of two SNPs found in interaction analysis, Data S1: 'NeuroX_Reanalysis_Summary_Statistics.txt', Note S1: 'Description_NeuroX_Reanalysis_Summary_Statistics.txt'.