Dysregulation of Human Somatic piRNA Expression in Parkinson’s Disease Subtypes and Stages

Piwi interacting RNAs (piRNAs) are small non-coding single-stranded RNA species 20–31 nucleotides in size generated from distinct loci. In germline tissues, piRNAs are amplified via a “ping-pong cycle” to produce secondary piRNAs, which act in transposon silencing. In contrast, the role of somatic-derived piRNAs remains obscure. Here, we investigated the identity and distribution of piRNAs in human somatic tissues to determine their function and potential role in Parkinson’s disease (PD). Human datasets were curated from the Gene Expression Omnibus (GEO) database and a workflow was developed to identify piRNAs, which revealed 902 somatic piRNAs of which 527 were expressed in the brain. These were mainly derived from chromosomes 1, 11, and 19 compared to the germline tissues, which were from 15 and 19. Approximately 20% of somatic piRNAs mapped to transposon 3′ untranslated regions (UTRs), but a large proportion were sensed to the transcript in contrast to germline piRNAs. Gene set enrichment analysis suggested that somatic piRNAs function in neurodegenerative disease. piRNAs undergo dysregulation in different PD subtypes (PD and Parkinson’s disease dementia (PDD)) and stages (premotor and motor). piR-has-92056, piR-hsa-150797, piR-hsa-347751, piR-hsa-1909905, piR-hsa-2476630, and piR-hsa-2834636 from blood small extracellular vesicles were identified as novel biomarkers for PD diagnosis using a sparse partial least square discriminant analysis (sPLS-DA) (accuracy: 92%, AUC = 0.89). This study highlights a role for piRNAs in PD and provides tools for novel biomarker development.


Introduction
piRNAs (piwi-interacting RNAs) form the largest class of small non-coding RNAs in animals and are characterized by their length distribution of 21-30 nucleotides, interaction with PIWI proteins, and 2 -O-methylation at the 3 end of the single-stranded mature product [1][2][3][4]. To date, the most vital function of piRNAs known is the formation of a piRNA-PIWI silencing complex, which suppresses the expression of transposons in germline cells [5,6]. Because of the ubiquitous epigenetic reprogramming in the germline, transposons may transform their silent status to activation and subsequently alter their chromosome position or propagate, leading to a significant increased risk of genomic instability [7][8][9]. The deficiency of the piRNA-PIWI complex gives rise to the activation of transposons and causes sterility [10][11][12]. In animal models, the piRNA-PIWI complex suppresses transposons by depositing repressing genetic biomarkers (usually H3K9me3) at a transcriptional level or by utilizing PIWI-dependent cleavage of target mRNAs at a posttranscriptional level [13][14][15]. However, in mammalian germ cells, most pre-pachytene piR-NAs appear to be within coding regions or are 3 UTR derived [16,17]. The most abundant piRNA population, pachytene piRNAs, also show lack of transposon sequences [17][18][19]. The widespread existence and high expression level of non-transposon recognizing piR-NAs suggest additional functions beyond transposon-silencing [20]. Previous studies have revealed that Aub (homolog of Piwi in Drosophila), a piRNA binding complex, increases the translation of self-renewal and differentiation factors in germ stem cells, as well as stabilize and more accurate knowledge on the identity, abundance, distribution, and differences of piRNAs in neuronal tissues will help to inform and provide evidence of piRNAs as playing an essential role in neurodegenerative disease processes.
The piRNA-related discoveries in human somatic tissues are limited [64]. Moreover, some somatic piRNAs investigated by related research appear ambiguous as their alignment positions vastly overlap with other ncRNAs (e.g., tRNAs, rRNAs, and snoR-NAs) [28,42,65,66], which suggests these somatic piRNAs might be overestimated as they are probably not bona fide piRNAs. In this study, we analyzed high-throughput small RNA sequencing data from several datasets downloaded from the GEO database with accession numbers and references provided therein. We used a relatively strict piRNA identification method by filtering other annotated small ncRNAs first and then annotated the rest of the sequences by a piRNA annotation database. Then, we analyzed the expression patterns and characteristics of germline piRNAs in tissue-specific somatic cells. To investigate piRNAs in PD tissues, we performed a comparative study of PD patients and controls and evaluated piRNAs' functional role in disease onset and progression. Our investigation identifies a small but appreciable number of somatic piRNAs that exist in somatic cells and are dysregulated in human PD tissues. Furthermore, we identify the nature and potential gene targets of these piRNAs and propose a subset as disease biomarkers.

Workflow
In order to identify and compare piRNA expression patterns in germline tissues and somatic tissues, we built an analysis workflow. The workflow of the study includes preprocessing steps (quality control (QC), adapter trimming, and length filtering), annotation of piRNAs (mapping and annotation based on piRNA database). Next, tissue-specific piRNA features (chromosome distribution, genomic context, length distribution, and pingpong cycle) were investigated. For the comparison study between PD and the control, we implemented differential expression analysis, followed by gene enrichment analysis for piRNAs-aligned to genes/predicted targets. Finally, classifiers were used to differentiate between the control and PD.

Collection of High-Throughput Small RNA Sequencing Datasets from GEO
We collected germline and somatic tissue small RNA sequencing datasets from GEO [67]. A previous study demonstrated that small RNA sequencing captures and sequences small RNAs with moderate-to-low-abundance while able to detect modest expression differences across samples [68]. We obtained the piRNA expression patterns in various tissues by analyzing their corresponding small RNA sequencing datasets. The description of datasets used for the study is provided in Table 1.

Transposons and Transcripts Annotation
Repeats' annotation was downloaded from UCSC repeat browser [91], while transcripts annotation (protein-coding, lncRNAs, pseudogenes) was obtained from Ensemble [83]. In our study, we classified LINE, SINE, LTR, and DNA in repeats as transposons and the rest of the repetitive sequence types, such as simple repeats, as other repeats.

Differential Analysis and Enrichment Analysis
We used the sample size, variance homogeneity of variance, and existence of outliers to construct differential analysis by the Wilcoxon test (piRNAs) or Deseq2 (v1.30.1) [92]. Combat-seq algorithm [93] was used to correct batch effects when needed. We implemented gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis by a web-based resource g: Profiler [94].

Classifier Construction
Sparse partial least squares discriminant analysis (sPLS-DA) [95][96][97] was used to construct control and PD classifiers. For each classifier, we used expression counts data (piRNAs, piRNAs in each repeat region, and piRNAs in each gene region) normalized by library size to train the classifier. In each training process, we first evaluated the performance of sPLS-DA by component numbers from one to ten, using leave-one-out cross-validation [98,99] to obtain optimized component numbers. We then used the gridsearch method to access the optimal number of variables to select on each component for a further sparse partial least squares discriminant analysis (sPLS-DA). Finally, we used accuracy and area under the curve (AUC) of the receiver operating characteristic curve (ROC) [100] to identify the quality of each classifier. We used package mixOmics [101] to complete the processes mentioned above.

piRNA Expression in Somatic Cells and Comparison to Germline Cells
In order to characterize the properties of somatic piRNAs, we built a pipeline for extracting and identifying piRNAs from next generation sequencing (NGS) datasets. The pipeline is shown in (Figure 1). As the ovarian tissue dataset has a different experimental design and sequencing depth with others, piRNAs were considered expressed if reads were >2 in all samples or reads > 5 in 50% of samples, while for other tissues, the standard was reads > 2 in all samples or reads > 10 in 50%. We first compared piRNAs from somatic tissue to the testis and ovaries. The total number of piRNAs identified from the testis, ovary, and somatic tissue were 19981, 6363, and 902, respectively (Figure 2a). Consistent with previous studies, we identified a large number of testis piRNAs and a small number of somatic RNAs [102]. There were 49 common piRNAs between tissues studied, 51 between somatic and testis tissues, and 312 between somatic and ovaries, representing >30% of all somatic piRNAs. When the brain piRNAs were considered, the total number of piRNAs were 527 (Figure 2b). There was a dramatic reduction in piRNAs overlapping with testis, 3 in total, while 208 overlapped with ovaries. When we further compared brain regions, we observed that piRNAs were widely dispersed between the amygdala, prefrontal cortex, and dorsal lateral prefrontal cortex (Figure 2c). The expression pattern of somatic piRNA types appears to be tissue-specific with a large proportion of different piRNAs in the germline (testis, sperm, and ovary) (  Figure S1), which suggests that somatic piRNAs might function differently than those from germline piRNAs.   piRNAs are known to be expressed from specific loci in the genome. In order to identify the specific regions of the genome from where the piRNAs originated, we mapped the reads. Testis/sperm piRNAs display a pronounced chromosome 15 and 19 alignment preference, which is lost in the ovary (Figure 3a). Chromosome 19 alignment can also be observed in somatic tissue; however, there appears to be piRNA loci, such as chromosome 4 for ovary, 1 for brain tissues, liver, and pancreas, and 11 for liver ( Figure 3a). Somatic piRNAs also show a high mitochondrial chromosome alignment rate among all three brain tissues studied and the liver in contrast to germline tissues ( Figure 3b). We observed that testis and sperm piRNAs showed a higher percentage of lncRNAalignment ( Figure 4b). These are mainly mammalian pachytene piRNAs, which are primarily generated from transposon-depleted lncRNAs and thought to regulate genes required for male fertility [17]. We detected a higher ratio of protein-coding derived piR-NAs, especially in somatic tissues, suggesting a function different that in germline cells. This phenomenon is partly contributed by some coding-gene (~25%) containing transposon sequences in the 3′UTR [105]. Some studies demonstrated that these genes have the potential to be controlled by PIWI and transposon-derived piRNA complex [106], but it is more likely that there are more protein-coding-derived piRNAs that exist across tissues. Currently the detailed function of coding-protein-derived piRNAs remains unknown.
We also identified many piRNAs aligned to pseudogenes, which is more evident in testis and sperm tissue as their lower protein-coding gene alignment rate (Figure 4b). A recent study suggested that some pseudogene-derived piRNAs have the potential to target and regulate their cognate protein-coding genes [106]. A more detailed study might thus unravel a regulation network among pseudogenes, piRNAs, and protein-coding genes [107].
Next, to more clearly define a function for somatic piRNAs, we performed KEGG enrichment analysis with genes aligned by piRNAs at the 3′UTR region. Previous studies reported the existence of pre-pachytene piRNAs derived from 3′UTR of protein-coding To determine whether somatic piRNAs were functioning via the well-established transposon suppressing pathway, we identified the transposon sequences aligned to the piRNAs. We unexpectedly detected a lower percentage (~20%) of piRNAs aligned to transposon sequences in both somatic tissues and germline tissues ( Figure 4a). This lower percentage in germline was also found by other studies [103,104]. Unlike other tissues, amygdala and germline piRNAs contain more reverse-stranded repeat transposons than stranded. Considering the prevalent model related to piRNA-guided transposon silencing, where the PIWI-piRNA complex interacts with a nascent transposon transcript, a higher reverse-stranded piRNA percentage might indicate that repressing transposon expression is a vital function in both amygdala and germline tissues. In LINE-alignment piRNAs, somatic and ovarian piRNAs show a reverse-stranded preference, while testis and sperm piRNAs present the opposite. However, testis piRNAs show a reverse-stranded preference in SINEalignment piRNAs, while somatic piRNAs are just the opposite (Figure 4a). Because LINE and SINE might have different impacts in somatic and germline tissues, their aligned-piRNAs might have varying regulatory functions.
sues with significant ping-pong cycle signals synthesize piRNAs, which possess a 5′U bias and an A nucleotide bias at the 10th position. We found these attributes in testis, sperm, and ovary derived piRNAs from our analysis pipeline. However, we were not able to uncover evidence of ping-pong cycle biogenesis in somatic tissues based upon the distribution of mapped reads ( Figure 6). We further confirmed this with an algorithm, pingpongpro [108], which also was not able to detect a ping-pong cycle signal in somatic tissues.  Genomic context of piRNAs. The percentage of annotated piRNAs reads mapped to transposons/transposon subtypes (a) and lncRNA/protein-coding genes/pseudogenes (b) of total annotated piRNA reads number. Ratio = reads number mapped to transposons (or lncRNA, protein-coding genes, pseudogenes)/total annotated piRNAs reads number. Symbols: *, p < 0.05; ** p < 0.01; *** p <0.001.
We observed that testis and sperm piRNAs showed a higher percentage of lncRNAalignment ( Figure 4b). These are mainly mammalian pachytene piRNAs, which are primarily generated from transposon-depleted lncRNAs and thought to regulate genes required for male fertility [17]. We detected a higher ratio of protein-coding derived piRNAs, especially in somatic tissues, suggesting a function different that in germline cells. This phenomenon is partly contributed by some coding-gene (~25%) containing transposon sequences in the 3 UTR [105]. Some studies demonstrated that these genes have the potential to be controlled by PIWI and transposon-derived piRNA complex [106], but it is more likely that there are more protein-coding-derived piRNAs that exist across tissues. Currently the detailed function of coding-protein-derived piRNAs remains unknown.
We also identified many piRNAs aligned to pseudogenes, which is more evident in testis and sperm tissue as their lower protein-coding gene alignment rate (Figure 4b).
A recent study suggested that some pseudogene-derived piRNAs have the potential to target and regulate their cognate protein-coding genes [106]. A more detailed study might thus unravel a regulation network among pseudogenes, piRNAs, and protein-coding genes [107].
Next, to more clearly define a function for somatic piRNAs, we performed KEGG enrichment analysis with genes aligned by piRNAs at the 3 UTR region. Previous studies reported the existence of pre-pachytene piRNAs derived from 3 UTR of protein-coding genes in germline cells [17]. To our surprise, both brain tissue and testis tissue achieve enrichment results for "Parkinson's disease" and "Pathways of Neurodegeneration-multiple diseases", respectively ( Figure 5). However, the function of these sense-piRNAs is indistinct. They might be transacting with their derived mRNAs by an unknown mechanism. When we used the transcript (including CDS, exon, 3 UTR, and 5 UTR) aligned by piRNAs to perform enrichment analysis of the three different brain tissues, testis, and ovary, all tissues identified an enrichment of neurodegeneration related genes.  To further define attributes of piRNAs from somatic tissues, we characterized their length features. Somatic piRNAs show shorter and more evenly distributed lengths than germline piRNAs ( Figure 6). According to the ping-pong cycle biogenesis pathway, tissues with significant ping-pong cycle signals synthesize piRNAs, which possess a 5 U bias and an A nucleotide bias at the 10th position. We found these attributes in testis, sperm, and ovary derived piRNAs from our analysis pipeline. However, we were not able to uncover evidence of ping-pong cycle biogenesis in somatic tissues based upon the distribution of mapped reads (Figure 6). We further confirmed this with an algorithm, pingpongpro [108], which also was not able to detect a ping-pong cycle signal in somatic tissues.

piRNAs in Parkinson's Disease
To determine the significance of somatic piRNAs in human disease, we detected 296 piRNAs in the prefrontal cortex of which 20 piRNA expression levels were significantly different in PD; and 508 piRNAs in amygdala of which 55 piRNA expression levels were significantly different in PD (Figure 7a). Among these, one piRNA (piR-hsa-748391) was expressed differently in PD in both prefrontal cortex and amygdala tissues. Unfortunately, none of 3′UTR derived piRNAs from neurodegeneration-related genes mentioned above were among these differently expressed piRNAs. A target predicting method raised by a previous paper [109] showed 55 differently expressed piRNAs in amygdala target 20 proteins. Of these 20 proteins, MT-CO1 and MT-CO3 were included in the Parkinson's disease KEGG enrichment result, while MT-CO1, MT-CO3, and GRIA4 was included in the Pathways of neurodegeneration-multiple diseases KEGG enrichment result. However, this stringent target prediction method predicted that none of the genes were targeted by piRNAs differently expressed in PD prefrontal cortex.
We also found five piRNAs were differently expressed among the control, premotor stage PD, and motor stage PD in amygdala but not in the control vs. PD (premotor plus motor stage) (Figure 7b). Among these, piR-hsa-131693 is predicted to target MT-CO3 and MT-CO3 related pseudogenes (MT-CO3, MTCO3P12, MTCO3P18, MTCO3P8, MTCO3P38, MTCO3P7, MTCO3P13, and MTCO3P22) and the expression of piR-has-131693 decreases as PD disease progresses in amygdala. In the prefrontal cortex, we also identified different piRNAs expression patterns in PD and PDD, although there were no significant enrichment results with their predicted target genes (Figure 7a). In PD SEVs, we found 17 differently expressed piRNAs, of which piR-hsa-2435261 was predicted to target TANGO2 and piR-hsa-1516701 was predicted to target JAK3. In PD LEVs, two differently expressed piRNAs were detected with no specific predicted target genes. Figure 6. The length distribution of annotated piRNAs. The y-axis represents the ratio of piRNA reads with specific length/base to total annotated piRNAs reads in that tissue. Colors represent the first base of annotated piRNAs.

piRNAs in Parkinson's Disease
To determine the significance of somatic piRNAs in human disease, we detected 296 piRNAs in the prefrontal cortex of which 20 piRNA expression levels were significantly different in PD; and 508 piRNAs in amygdala of which 55 piRNA expression levels were significantly different in PD (Figure 7a). Among these, one piRNA (piR-hsa-748391) was expressed differently in PD in both prefrontal cortex and amygdala tissues. Unfortunately, none of 3 UTR derived piRNAs from neurodegeneration-related genes mentioned above were among these differently expressed piRNAs. A target predicting method raised by a previous paper [109] showed 55 differently expressed piRNAs in amygdala target 20 proteins. Of these 20 proteins, MT-CO1 and MT-CO3 were included in the Parkinson's disease KEGG enrichment result, while MT-CO1, MT-CO3, and GRIA4 was included in the Pathways of neurodegeneration-multiple diseases KEGG enrichment result. However, this stringent target prediction method predicted that none of the genes were targeted by piRNAs differently expressed in PD prefrontal cortex.
We also found five piRNAs were differently expressed among the control, premotor stage PD, and motor stage PD in amygdala but not in the control vs. PD (premotor plus motor stage) (Figure 7b). Among these, piR-hsa-131693 is predicted to target MT-CO3 and MT-CO3 related pseudogenes (MT-CO3, MTCO3P12, MTCO3P18, MTCO3P8, MTCO3P38, MTCO3P7, MTCO3P13, and MTCO3P22) and the expression of piR-has-131693 decreases as PD disease progresses in amygdala. In the prefrontal cortex, we also identified different piRNAs expression patterns in PD and PDD, although there were no significant enrichment results with their predicted target genes (Figure 7a). In PD SEVs, we found 17 differently expressed piRNAs, of which piR-hsa-2435261 was predicted to target TANGO2 and piRhsa-1516701 was predicted to target JAK3. In PD LEVs, two differently expressed piRNAs were detected with no specific predicted target genes. We observed a higher percentage of transposon-derived piRNAs in the PD group in the prefrontal cortex (control vs. PD, control vs. PDD, control vs. PD plus PDD), but no significant difference between PD and PDD ( Figure 8a). However, the transposon-derived piRNA expression difference between PD and control in amygdala is not significant (Figure 8b). In contrast, piRNAs in PD showed higher pseudogene alignment in the prefrontal cortex (Figure 9a), where the different piRNA-aligned pseudogenes were HSP90AA1 and EEF1A1-related pseudogenes. Interestingly, these two genes have been reported as PDrelated genes and showed a downregulation trend in PD-affected tissues [110,111]. In amygdala, the overall piRNAs-aligned pseudogenes ratio is not different in Parkinson's disease (Figure 9b), but some related pseudogene-aligned piRNAs, including HSP90, MTCO1, MTCO2, YWHAZ, GAPDH, and MTND were upregulated in the PD group.
We defined piRNAs-aligned genes with a stringent standard (piRNAs-aligned number > 10 in more than 50% samples or piRNAs-aligned number > 2 in all samples). Using differently expressed analysis (p < 0.05), we identified 24 genes with different piRNAsalignment numbers between the control and PD in prefrontal cortex and 162 genes in amygdala. Figure 10 shows the top 10 results for each enrichment term. We also obtained Parkinson's disease KEGG results in the amygdala tissue and prefrontal tissue with a relaxed standard (piRNAs-aligned number >2 in more than 10 samples) (data not shown). From the Parkinson's disease KEGG enrichment result, five of differently expressed piR-NAs-aligned genes (NDUFA4, CALM3, UCHL1, CALM2, and HSPA5) in the prefrontal cortex (15 genes) overlapped with differently expressed piRNAs-aligned genes in the amygdala (16 genes). We observed a higher percentage of transposon-derived piRNAs in the PD group in the prefrontal cortex (control vs. PD, control vs. PDD, control vs. PD plus PDD), but no significant difference between PD and PDD (Figure 8a). However, the transposonderived piRNA expression difference between PD and control in amygdala is not significant (Figure 8b). In contrast, piRNAs in PD showed higher pseudogene alignment in the prefrontal cortex (Figure 9a), where the different piRNA-aligned pseudogenes were HSP90AA1 and EEF1A1-related pseudogenes. Interestingly, these two genes have been reported as PD-related genes and showed a downregulation trend in PD-affected tissues [110,111]. In amygdala, the overall piRNAs-aligned pseudogenes ratio is not different in Parkinson's disease (Figure 9b), but some related pseudogene-aligned piRNAs, including HSP90, MTCO1, MTCO2, YWHAZ, GAPDH, and MTND were upregulated in the PD group.
We defined piRNAs-aligned genes with a stringent standard (piRNAs-aligned number > 10 in more than 50% samples or piRNAs-aligned number > 2 in all samples). Using differently expressed analysis (p < 0.05), we identified 24 genes with different piRNAsalignment numbers between the control and PD in prefrontal cortex and 162 genes in amygdala. Figure 10 shows the top 10 results for each enrichment term. We also obtained Parkinson's disease KEGG results in the amygdala tissue and prefrontal tissue with a relaxed standard (piRNAs-aligned number >2 in more than 10 samples) (data not shown). From the Parkinson's disease KEGG enrichment result, five of differently expressed piRNAs-aligned genes (NDUFA4, CALM3, UCHL1, CALM2, and HSPA5) in the prefrontal cortex (15 genes) overlapped with differently expressed piRNAs-aligned genes in the amygdala (16 genes).

sPLS-DA of piRNAs Expression Level in Control and PD
Lastly, we employed a sparse partial least square discriminant analysis, a supervised principal component analysis analog statistical method [95,97], to distinguish control and PD based on piRNAs expression levels. The leave-one-out cross-validation method was Figure 10. Gene set enrichment analysis for genes with different piRNAs-aligned rates. piRNAsaligned genes was defined by a stringent standard (piRNAs-aligned number > 10 in > 50% of samples or > 2 in 100% of samples). The gene ontology and KEGG pathway enrichment analyses were established by genes with different piRNA aligned rates in PD and control groups. The top 10 results for each enrichment term: BP (biological process), CC (cellular component), MF (molecular function), and KEGG are shown. The tissues are indicated (a-h). Gene ratio = gene number in query list (piRNAs aligned in gene list)/total gene number of specific pathway in database.

sPLS-DA of piRNAs Expression Level in Control and PD
Lastly, we employed a sparse partial least square discriminant analysis, a supervised principal component analysis analog statistical method [95,97], to distinguish control and PD based on piRNAs expression levels. The leave-one-out cross-validation method was applied to construct 2-classes classifiers/3-classes classifiers (results are shown in Supplementary Table S1). These results show the potential that piRNAs can be biomarkers to separate PD from controls, suggesting piRNAs undergo perturbation in different PD subtypes (PD and PDD) and stages (premotor and motor). We also used piRNAs counts in piRNAs-aligned genes/repeats as indicators to train sPLS-DA classifiers, which showed validation performance improvement over the above piRNAs model to some extent. We also used a random forest algorithm and the results obtained were similar (data not shown). We also observed that a classifier based on piRNAs in blood extracellular vesicles achieve an excellent performace (accuracy: 92%, AUC = 0.89). The sPLS-DA based scatter plots for individuals show a clear separation (Figure 11).  shown in Supplementary Table S1). These results show the potential that piRNAs can be biomarkers to separate PD from controls, suggesting piRNAs undergo perturbation in different PD subtypes (PD and PDD) and stages (premotor and motor). We also used piRNAs counts in piRNAs-aligned genes/repeats as indicators to train sPLS-DA classifiers, which showed validation performance improvement over the above piRNAs model to some extent. We also used a random forest algorithm and the results obtained were similar (data not shown). We also observed that a classifier based on piRNAs in blood extracellular vesicles achieve an excellent performace (accuracy: 92%, AUC = 0.89). The sPLS-DA based scatter plots for individuals show a clear separation (Figure 11). Figure 11. sPLS-DA of piRNA expression data. The 3D sPLS-DA plots for prefrontal cortex (a) and amygdala (b) piRNA expression levels are shown. The optimized components value of blood SEVs is 1, therefore was not presented as a 3d PLS-DA plot.

Discussion
In this study, we identified piRNAs that display specific expression patterns across germline and somatic tissues. Compared with germline derived piRNAs, the overall abundance in somatic tissues was much lower but still appreciable. A previous study proposed that piRNA targeting has a greater correlation with binding energy compared to piRNA abundance, suggesting that piRNAs might act in a concentration-independent manner [112]. This supports the notion that somatic piRNAs can exert their functions with low expression levels. We found evidence for a ping-pong cycle biogenesis pathway signal in germline (testis, sperm, and ovary) but not in somatic tissues, which suggests a different piRNA generation pathway in the soma. The chromosomal loci from where piR-NAs are generated showed some overlap between somatic and germline tissues; however, most somatic piRNAs were generated from loci distinct from germline cells.
While the existence of piRNAs from somatic tissues has been described, the function of these piRNAs remains unclear [107,113,114]. It could be hypothesized that somatic piR-NAs function similarly to germline piRNAs, where the primary function is to repress transposons [115]. We found only 20% of piRNAs derived from transposon areas in both germline and somatic tissues. As somatic cells have more extensive epigenetic biomarkers than germline cells [116], the primary function of somatic piRNAs might be different from repressing transposons. Notably, transposon-derived somatic piRNAs have different expression levels and stranded preferences with transposon-derived germline piRNAs, suggesting that transposon repression may not be its only function.
Previous studies have shown evidence that piRNAs regulate protein expression and participate in many bioprocesses [117][118][119][120]. Some studies suggest that piRNAs repress gene expression by cleaving mRNAs of protein-coding genes [13,14,121,122]. Another Figure 11. sPLS-DA of piRNA expression data. The 3D sPLS-DA plots for prefrontal cortex (a) and amygdala (b) piRNA expression levels are shown. The optimized components value of blood SEVs is 1, therefore was not presented as a 3d PLS-DA plot.

Discussion
In this study, we identified piRNAs that display specific expression patterns across germline and somatic tissues. Compared with germline derived piRNAs, the overall abundance in somatic tissues was much lower but still appreciable. A previous study proposed that piRNA targeting has a greater correlation with binding energy compared to piRNA abundance, suggesting that piRNAs might act in a concentration-independent manner [112]. This supports the notion that somatic piRNAs can exert their functions with low expression levels. We found evidence for a ping-pong cycle biogenesis pathway signal in germline (testis, sperm, and ovary) but not in somatic tissues, which suggests a different piRNA generation pathway in the soma. The chromosomal loci from where piRNAs are generated showed some overlap between somatic and germline tissues; however, most somatic piRNAs were generated from loci distinct from germline cells.
While the existence of piRNAs from somatic tissues has been described, the function of these piRNAs remains unclear [107,113,114]. It could be hypothesized that somatic piRNAs function similarly to germline piRNAs, where the primary function is to repress transposons [115]. We found only 20% of piRNAs derived from transposon areas in both germline and somatic tissues. As somatic cells have more extensive epigenetic biomarkers than germline cells [116], the primary function of somatic piRNAs might be different from repressing transposons. Notably, transposon-derived somatic piRNAs have different expression levels and stranded preferences with transposon-derived germline piRNAs, suggesting that transposon repression may not be its only function.
Previous studies have shown evidence that piRNAs regulate protein expression and participate in many bioprocesses [117][118][119][120]. Some studies suggest that piRNAs repress gene expression by cleaving mRNAs of protein-coding genes [13,14,121,122]. Another study showed non-transposon-derived piRNAs have the potential to induce DNA methylation at non-transposon regions in a complementary manner in two human cell lines [123]. In addition, studies have indicated that piRNAs activate mRNA translation [15] and localize mRNA [124,125]. Conversely, some evidence has also shown that mRNAs act as a source of piRNAs, which in turn regulate their target mRNAs [22,126]. We hypothesized that piRNAs' generation and function follow base-pairing rules, and we performed enrichment analysis with the genes aligned by piRNAs. The enrichment results showed neurodegeneration disease in both brain tissues (dorsolateral prefrontal cortex, prefrontal cortex, and amygdala) and germline tissues (testis and ovary), which suggested piRNAs may be involved in neurodegenerative processes. To further test this, we performed a comparative study of Parkinson's disease in the prefrontal cortex and amygdala. The prefrontal cortex and amygdala were previously shown to be affected in Parkinson's disease in a study using lesions with immunoreactions against α-synuclein and focusing on limbic and motor regions, and in PET imaging studies measuring blood flow changes, respectively [127,128]. Especially, amygdala was demonstrated to be damaged in the early stage and related to some incipient symptoms, such as anxiety determined by the association of anxiety symptoms and amygdala volume in 110 early stage patients [129]. Imaging studies have also addressed the link between anxiety and the amygdala in Parkinson's disease especially highlighting the fear circuit [130]. We found that the piRNAs in Parkinson's disease brains have a higher pseudogene alignment rate, while the cognate protein-coding genes of some of those pseudogenes showed neurodegeneration-related functions. Pseudogene-derived piRNAs have been proposed to suppress their cognate gene level in late spermatocytes [106]. Based on these results, we propose a pseudogenes-piRNAs protein-coding gene regulation network, where those pseudogenes-related piRNAs have the potential to regulate their cognate genes in brains.
Finally, we used piRNAs expression levels as features to construct sPLS-DA classifiers with the prefrontal cortex, amygdala, and blood extracellular vesicles [97]. The blood extracellular vesicle-related classifier is particularly interesting, as it has potential to be measured as a noninvasive biomarker in Parkinson's disease [131][132][133][134][135] To our surprise, the classifier's performance with small extracellular vesicle piRNAs was excellent (accuracy: 92%, AUC = 0.89). This compares favorably to previous studies where miRNAs were proposed as biomarkers and may further complement studies where the amount of exosomes, α-synuclein, or other proteins were proposed [134,[136][137][138]. The variable importance in projection (VIP) score in sPLS-DA indicates a feature's relevance and importance to the trained model. We found piR-has-92056, piR-hsa-150797, piR-hsa-347751, piR-hsa-1909905, piR-hsa-2476630, and piR-hsa-2834636 showed the highest relevance in our trained model. piR-hsa-92056 and piR-hsa-150797 are derived from SPPL3, which is a signal peptide peptidase and identified as a risk gene of PD in a GWAS study [139]. piR-hsa-2476630 is derived from DOK3, which inhibits lipopolysaccharide signaling in macrophages [140], while lipopolysaccharide causes chronic neuroinflammation and progressive neurodegeneration [141]. piR-has-347751 is derived from RHEB, which alleviates neurodegeneration and induces axon regrowth [142]. piR-hsa-2834636 is derived from HBA2, of which the mRNA is dysregulated in the frontal cortex of neurodegenerative disease [143]. While these associations suggest specific targets through which the piRNAs may act, validation of each specific biochemical step remains to be performed. Model organisms may thus help in this regard, and a study from our lab demonstrated dysregulation of specific miRNAs and piRNAs in a PD model [144]. Ultimately, the function piRNAs in blood extracellular vesicles might reveal information concerning cell-cell communication and disease progression.
While we were able to detect specific piRNAs in somatic tissue and demonstrate their dysregulation, the abundance is very small, especially compared to germline tissues. Moreover, we were not able to detect evidence of an amplification system or biological process to produce secondary piRNAs. How these modest levels of piRNAs affect their cognate genes remains to be determined. Recent studies suggest that multiple piRNAs may target a specific mRNA or have broad targeting capacity [109,112]. Thus, the predicted targets of the piRNAs in this study may require further verification and/or refinement. Furthermore, while the notion of dysregulated piRNAs from blood is attractive, their biological origins, such as from brain or other tissues, remain elusive. Previous work on the role of inflammatory processes in neurodegeneration suggests a blood-based pathway but requires additional investigation [145]. Finally, while recent studies have linked transposon expression in the brain to Alzheimer's disease processes, the role of these sequences in PD is largely unknown [146,147].

Conclusions
In summary, we identified and characterized somatic piRNAs from a multitude of non-germline tissues. Their distribution and source of biogenesis appeared to differ from germline tissues. Their function, based upon cognate RNA targets appeared to be different as well. Surprisingly, their gene targets were enriched for neurodegenerative disease and specific genes identified highlighted pseudogenes related to Parkinson's disease. The results presented here provide insights into the potential role of piRNAs in human neurodegenerative disorders.  Institutional Review Board Statement: Ethical review and approval were waived for this study due to anonymous data obtained from public database sources.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data were obtained from the GEO database https://www.ncbi.nlm. nih.gov/geo/ accessed on 14 January 2022. All data used in this study can be obtained from accession numbers provided in Table 1.

Acknowledgments:
The authors thank members of the Wong lab for discussion, critical reading, and suggestions for the manuscript. The Information and Technology Communication Office (ICTO) at the University of Macau is gratefully acknowledged for providing resources and assistance with the project.

Conflicts of Interest:
The authors declare no conflict of interest.