Exploring MicroRNA Biomarkers for Parkinson’s Disease from mRNA Expression Profiles

Parkinson’s disease (PD) is a chronic, progressive neurodegenerative disease characterized by both motor and nonmotor features. The diagnose of PD is based on a review of patients’ signs and symptoms, and neurological and physical examinations. So far, no tests have been devised that can conclusively diagnose PD. In this study, we explore both microRNA and gene biomarkers for PD. Microarray gene expression profiles for PD patients and healthy control are analyzed using a principal component analysis (PCA)-based unsupervised feature extraction (FE). 244 genes are selected to be potential gene biomarkers for PD. In addition, we implement these genes into Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and find that the 15 microRNAs (miRNAs), hsa-miR-92a-3p, 16-5p, 615-3p, 877-3p, 100-5p, 320a, 877-5p, 23a-3p, 484, 23b-3p, 15a-5p, 324-3p, 19b-3p, 7b-5p and 505-3p, significantly target these 244 genes. These miRNAs are shown to be significantly related to PD. This reveals that both selected genes and miRNAs are potential biomarkers for PD.


Introduction
Parkinson's disease (PD), first described by Dr. James Parkinson in 1817, is a chronic, progressive neurodegenerative disease characterized by both motor and nonmotor features [1]. PD motor symptoms such as shaking, rigidity, and slowness of movement are caused by the loss of striatal dopaminergic neurons [2]. The nonmotor symptoms of PD include sleep disorders, depression, and cognitive changes [3][4][5].
The incidence and prevalence of PD increase with age. So far, it is problematic to conclusively diagnose PD because of the lack of a reference standard test [6]. The diagnose of PD is based on a review of patients' signs and symptoms, and neurological and physical examinations. Resting tremor, cogwheel rigidity, and bradykinesia are three "cardinal signs" of PD, and postural instability, a late finding in PD, is the fourth cardinal sign of PD [6].
Owing to the lack of a standard test for diagnosing PD, genetic testing of mutations in disease-causing genes may be one of a helpful way to diagnose familial Parkinson's disease (fPD) and sporadic Parkinson's disease (sPD). A number of PD disease-causing genes have been discovered and debated for both physicians and patients regarding diagnostic and presymptomatic genetic testing of PD in the clinic [7].
A common form of monogenic PD with dominant inheritance is caused by mutations in the gene for leucine-rich repeat kinase 2 (LRRK2) [8,9]; H-Synuclein (SNCA), which is a presynaptic neuronal protein, is linked genetically and neuropathologically to PD [10]; mutations in Parkin are the second most common known cause of PD [11]; mutations in DJ-1 and PTEN Kinase 1 (PINK1) can cause PD [11]; there is a significant inverse association of the ubiquitin carboxy-terminal hydrolase L1 (UCHL1) S18Y variant with PD [12]; and variants in glucocerebrosidase (GBA) and SNCA influence PD risk [13].
In addition to neurological and physical examinations and genetic testing, the gene expression differences between PD and healthy controls can be used as a potential prognosis of PD. Several studies have explored gene biomarkers for PD [14,15]. In this study, we explore gene and microRNA (miRNA) biomarkers for PD.

Gene Expression
We downloaded three mRNA expression profiles of PD and healthy control from Gene Expression Omnibus (GEO) under the GEO ID, GSE20295, GSE20163, and GSE20164. For all three profiles, raw data files (CEL files) (Supplementary file S1) were read with the function ReadAffy of the package Affy (Affymetrix; Thermo Fisher Scientific, Inc., Waltham, MA, USA) in R. Loaded CEL files were treated by mas5 function in the affy package. Then, write.exprs function was used to output normalized mRNA expression profiles. The tissue of the data was substantia nigra (see Supplementary file S1). Integrating the three datasets, we had 35 normal control and 25 PD patients' substantia nigra mRNA expression profiles.

Principal Component Analysis Based Unsupervised Feature Extraction
We applied principal component analysis (PCA)-based unsupervised feature extraction (FE) to mRNA expression profiles in order to select mRNAs that were expressed distinctly between controls and PD patients (see Supplementary file S2 for more details). This method has been successful in identifying potential biomarkers for other neurological disorders [16,17]. PCA was applied to only mRNAs identified by PCA-based unsupervised feature extraction (FE). PC loadings and PC scores were attributed to samples and genes, respectively. Furthermore, PC loadings are associated with the distinction between controls and PD patients; PD patients and controls were differentiated using linear discriminant analysis (LDA) with these selected PC loadings (see Supplementary file S2 for more details).

Validation of Obtained mRNAs
In order to validate the obtained mRNAs, we computed the area under the curve (AUC) of the receiver operating characteristic curve (ROC). Gene symbols associated with mRNAs selected by PCA-based unsupervised FE were uploaded to Enrichr [18,19], and various enriched biological terms were identified (see Supplementary file S4 for more details).

Results
255 probes (Supplementary file S3) were identified using PCA-based unsupervised FE. PCA was applied to 255 probes, and PC loadings attributed to samples were computed. The fourth PC loadings turned out to be associated with distinction between PD patients and controls. Table 1 shows the confusion table obtained by the linear discriminant analysis (LDA) using the fourth PC loading. Sensitivity of PD is 0.88, precision of PD is 0.73, F1 score (F-measure) is 0.8, and accuracy is 0.80. AUC is 0.95. Odds ratio is 20.5. p value computed by Fisher's exact test is 2.5 × 10 −6 . All of these evaluations suggest that the fourth PC loadings can significantly discriminate controls and PD patients.
Although we found 255 probes that successfully discriminate PD patients from controls, biological evaluation of obtained probes is important. Two hundred and forty four gene symbols corresponding to these 255 probes (See Supplementary Materials) were uploaded to Enrichr for biological evaluation. There turned out to be many biological terms enriched in these gene symbols. Table 2 shows the top-ranked five categories in "Disease Perturbations from GEO down" of Enrichr (full list is available in Supplementary Material S4). The first four among these five categories are the PD. Since the GEO datasets used in Enrichr are not the same datasets used in our study, PCA based unsupervised FE successfully identified genes downregulated in PD patients. As for "Disease Perturbations from GEO up" of Enrichr, PD is less enriched, but there are still seven PD experiments with a significant p-value (Table 3, full list in Supplementary Material S4). The next biological term investigated is KEGG pathway (Table 4). Although PD was not top ranked, it is the eighth most significant enriched KEGG pathway. Tables 1-4 suggest that the identified 244 genes are significantly related to PD. In this regard, we found that 15 miRNAs significantly target 244 gene symbols by checking "miRTarBase 2017" in Enrichr (Table 5); 113 gene symbols out of 244 gene symbols were targeted by either of 15 miRNAs. These 15 miRNAs were reported to be related to PD (Table 5). Thus, we should consider these 15 miRNAs as key regulators of PD instead of 244 gene symbols. In reality, miRNAs are generally suggested as key regulator of PD [20].

More Brain Synapse-Related Biological Terms Are Enriched
Although in the previous section, we notice that downregulated genes and upregulated genes in PD GEO datasets and in PD KEGG pathway are enriched in the 244 identified gene symbols, more detailed analyses give us more convincing insight about the relationship between these 244 gene symbols and PD. Other than PD, some KEGG pathways in Table 4 are also related to PD. For example, as for ribosome that is top-ranked in Table 4, ribosomal protein s15 phosphorylation was reported to mediate LRRK2 neurodegeneration in Parkinson's disease [30] As for phagosome that was the second top-ranked, LRRK2 was reported to be a negative regulator of Mycobacterium tuberculosis phagosome maturation in macrophages [31]. As for synaptic vesicle cycle that was the third top-ranked, synaptic vesicle trafficking was reported to be related to Parkinson's disease [32]. These results suggest that 244 gene symbols are expected to be closely related to PD progression mechanisms.
Although LRRK2 itself was not included in 244 gene symbols, significant number of genes obtained from "Single Gene Perturbations from GEO up/down" affected by LRKK2 KO/KI (knocked out/knocked in) were included in 244 gene symbols (Tables 6 and 7). This also supports that 244 gene symbols are related to mechanisms of PD.
Other than PD specificity, it is important to check if 244 genes are specifically upregulated in brain/synapse, since, otherwise, genes identified are not convincing. Tables 8 and 9 show that 244 genes are highly brain tissue-specific, while Table 10 shows that 244 genes are overlapped with genes downregulated in other tissues than brain (full list is in Supporting Materials).    Table 9. Top ranked five gene expression profiles in "Genotype-Tissue Expression (GTEx) Tissue Sample Gene Expression Profiles up" whose altered genes are associated with 244 gene symbols.  Thus, we can conclude that 244 gene symbols are not only PD-specific but also brain tissue-specific. This also suggests that we successfully identified gene-related PD mechanisms. In spite of successfully identifications, it is not very easy to investigate as many as 244 gene symbols one by one. It is better to find a more limited number of factors that regulate 244 gene symbols.

Term
For the selected 15 miRNAs in Table 5, we confirmed our results by comparing other studies. hsa-miR-92a appeared as novel hub miR in both regulatory and co-expression network, indicating its strong functional role in PD; GBA deficiency is associated with PD, and miR-16-5p has been shown to correspond to enhanced GBA protein levels [22,33]; and both PD and HD (Huntington's disease) are neurodegenerative and caused by protein inclusions. miR-615-3p was identified as differentially expressed in HD prefrontal cortex compared to non-neurological disease controls, and hsa-miR-615-3p was identified up-regulated in HD [23]; miR-100, miR-23a, and miR484 were identified to be PD-related miRNAs [25]; miR-320a was identified as a PD-related miRNA [26]; tumor necrosis factor-alpha (TNF-α), a pro-inflammatory cytokine, was elevated in blood, CSF, and striatum regions of the brain in PD patients, and hsa-miR-23a, hsa-miR-23b, and hsa-miR-320a significantly decreased in the presence of TNF-α [27]; mmu-miR-15a-5p, mmu-miR-19b-3p, and mmu-miR-7b-5p miR-7 were shown to downregulate the inflammatory response in cellular in vitro and/or in vivo PD neurotoxic models, and miR-19b was shown to be downregulated in the prodromal stage of alpha-synucleinopathies and pinpointed this miRNA as a potential biomarker also for PD and DLB [20]; and miR-505-3p was identified as a PD-predictive miRNA by microarrays [29].

Conclusions
PD is a long-term degenerative disorder that usually affects elderly people. Since there are no standard tests that can conclusively diagnose PD, biological biomarkers can help in early diagnosis. In this study, PD-related mRNAs are selected using a PCA-based, unsupervised FE method, and miRNA biomarkers of PD are explored based on these selected mRNAs. Two hundred and forty-four genes and 15 miRNAs are identified to be related to PD. Biological evidence shows the selected genes and miRNAs are potential PD biomarkers. However, since the tissues used in this study are substantia nigra from postmortem brain, it needs to be further verified whether these selected biomarkers can be used as blood/serum PD biomarkers.