Sex-Specific Transcriptome Differences in Substantia Nigra Tissue: A Meta-Analysis of Parkinson’s Disease Data

Parkinson’s disease (PD) is one of the most common progressive neurodegenerative diseases. Clinical and epidemiological studies indicate that sex differences, as well as genetic components and ageing, can influence the prevalence, age at onset and symptomatology of PD. This study undertook a systematic meta-analysis of substantia nigra microarray data using the Transcriptome Mapper (TRAM) software to integrate and normalize a total of 10 suitable datasets from multiple sources. Four different analyses were performed according to default parameters, to better define the segments differentially expressed between PD patients and healthy controls, when comparing men and women data sets. The results suggest a possible regulation of specific sex-biased systems in PD susceptibility. TRAM software allowed us to highlight the different activation of some genomic regions and loci involved in molecular pathways related to neurodegeneration and neuroinflammatory mechanisms.


Introduction
Parkinson's disease (PD) is a common progressive neurodegenerative disease, affecting the older population (1-2% prevalence in individuals over 65 years, about 4% over the age of 85 years) [1,2]. The main clinical manifestations of the disease include resting tremor, bradykinesia (slowed movements), rigidity (increased muscular tone), postural instability, and gait impairment. Besides motor symptoms, PD patients manifest a wide range of non-motor symptoms such as depression, fatigue, sleep disturbance and autonomic dysfunction [3,4]. It has been demonstrated that the clinical manifestations of the disease, in particular motor symptoms, are mainly a consequence of the neurodegeneration of dopaminergic (DA) neurons in the substantia nigra (SN) pars compacta [5], and that symptoms usually appear when about 50-60% of these neurons are lost [2,6]. Furthermore, another pathological hallmark of PD is the presence of eosinophilic intraneuronal cytoplasmic protein-rich inclusions known as Lewy bodies (LBs) [7].
Parkinson's disease is considered a complex disease with an etiology that, to date, is not completely understood. Several factors, including genetic susceptibility and environmental influences, have been identified as possible causes. It has been demonstrated that Mendelian form of PD are caused by autosomal dominant and recessive gene mutations; likewise, genome wide experiments have highlighted an increasing number of loci that can be predisposed to the development of the sporadic form of the disease [8][9][10][11][12].
In addition to the well-known Mendelian transmission of PD, several studies have recently drawn attention to the role of epigenetics in the neurodegeneration that occurs in PD patients [11,13,14]. Epigenetic mechanisms in PD involve DNA demethylation, over-expression of the SNCA (synuclein α) gene and deregulation of the histone acetylation/deacetylation equilibrium [15][16][17][18]. Furthermore, the ubiquitin-proteasome system (UPS) has a remarkable impact in inherited PD through the well-known parkin protein, an E3 ubiquitin-ligase protein that changes in PD, loses its function and leads to an accumulation of anomalous proteins [19]. Finally, long non-coding RNA (lncRNA) and microRNA (miRNA) are thought to be involved in neuronal pathways and diseases such as PD [20].
As well as the genetic component, ageing [21] and sex [22,23] can also contribute as risk factors to the onset of the disease. While the role of ageing in PD has already been long established, epidemiological studies have also shown that, regardless of age and ethnicity, the male sex is associated with a higher development rate of PD. In fact, on average, men are twice as likely to develop PD and the disease is 1.3-3.7 times more prevalent in men than in women [22][23][24][25][26][27]. Clinical studies have demonstrated that sex differences can also influence the age of PD's onset and its symptomatology. Indeed, women develop PD almost two years later than men and initially display tremor symptoms instead of bradykinesia or rigidity and more pronounced symptoms such as depression, anxiety, pain, and sleep disturbances [22,24,27].
It has been hypothesized that estrogens may have an important role in defining such differences; several experiments have shown that estrogens could have a neuroprotective activity towards the dopaminergic (DA) system, both at dopamine synthesis and function level [22][23][24][27][28][29]. It has also recently been suggested that basic sex differences in brain organization, structure and function may be due to several genetic factors [30] and could justify some of the differences observed between male and female PD patients [30][31][32]. Indeed, experiments performed in animal models (such as on rats and monkeys) and neuroimaging measurement on the human brain seem to show that females have a higher number of DA neurons than males, as well as a greater striatal uptake [25,29,31], which could explain the delayed PD onset in women.
Furthermore, transcriptome analysis has allowed for a better understanding in relation to genes or loci potentially associated with the disease. Several experiments conducted on post-mortem brain tissue of PD patients and related controls, mainly on SN whole tissue or DA neurons obtained with laser capture microdissection (LMD), demonstrated differential gene expression profiles between PD patients and controls [33][34][35][36][37][38][39][40][41][42].
In a previous study, we used Transcriptome Mapper (TRAM) software to analyze a large amount of publicly available microarray data from independent studies conducted on the SN brain region of PD patients and controls, analyzing both post-mortem whole tissue and isolated LMD DA neuron expression data [43]. Thanks to original methods for parsing, normalizing and statistically analyzing expression data, the software was able to generate maps showing differential expression between two sample groups and to identify chromosomal segments and gene clusters which are relevant for different biological conditions [44].
To date only a few gene expression studies have focused on the characterization of transcriptome in male and female PD patients. Such studies, conducted on LMD DA neurons, have highlighted a sex-specific gene expression profile and showed a predominantly male deregulation of the genes involved in the DA neurons degenerative pathways or in mechanisms already associated with the pathogenesis of PD, thus confirming male sex as a risk factor for the disease [41,42].
We believe that further investigations in male and female brain transcriptome profiles could therefore contribute to a better understanding of the physiological and pathological molecular pathways and enable the development of sex-specific therapies.
For this reason, we conducted a new TRAM meta-analysis study to better define the differences in gene expression of PD patients vs. controls, when comparing men and women data sets from SN whole tissue.

Database Search and Selection
We conducted our search for gene expression data related to SN whole tissue of PD patients and healthy controls using the genomics repositories Gene Expression Omnibus (GEO) [45] and ArrayExpress [46]: in GEO with the search terms "Parkinson Disease" and "Homo sapiens"[Organism]; in the ArrayExpress database with the terms "Parkinson Disease" and then filtered for "Homo Sapiens"[Organism], "RNA assay" and "array assay"[Experiment type] and "all array". We then selected those series including microarray experiments conducted on SN whole tissue and where sex was specified.
A further search was carried out using the criteria described in the TRAM guidelines [44] and listed in our previous work [43]. In particular, all the selected experiments were carried out on specific brain structure and clinical conditions (SN pars compacta from post-mortem brains from individuals with PD and matched controls), based on availability of raw or pre-processed data (see Supplementary  Table S1 for details). We did not include data from exon array or other probes, as a too high number of data rows could interfere with program execution. Other exclusion criteria were also: the absence of identifiers corresponding to those found in the GEO sample records (GSM) or Array express sample records; platforms without standard format (for example with an atypical number of genes, i.e., <5000 or >60,000); data with expression values not evidently marked as linear or logarithmic.
Searches were executed up to July 2017.

Literature Search
We performed a systematic literature search to identify further articles related to global gene expression profile experiments in PD patients that were not present in GEO or ArrayExpress. A general search was conducted on PubMed up to July 2017, using the terms "Parkinson disease" and "microarray analysis" and "human". The medical subject headings (MeSH) terms "Parkinson disease", "microarray analysis" (or "gene expression profiling" or "oligonucleotide array sequence analysis"), "substantia nigra" and "human" were also used for a more advanced search. The search did not detect additional data.

Tram Analysis
Transcriptome Mapper software is freely available at http://apollo11.isto.unibo.it/software. We used the TRAM version 1.2 (April 2014) [44] which contains an updated version of UniGene and Entrez Genes databases compared to the original 2011 version [44].
At first, all platforms used in the array experiment that are not present in the TRAM 1.2 version were manually extracted and imported, so that, during the import phase, the software could associate a specific gene symbol via UniGene parsing with each probe identifier.
In order to be imported and analyzed by TRAM, all samples selected for each series were downloaded as tab-delimited text format. Samples were then divided into pools, according to different sex and brain condition: SN from male control subjects (pool A); SN from female control subjects (pool B); SN from male PD-affected patients (pool C); SN from female PD-affected patients (pool D).
As previously stated, when importing the data, the software was able to assign the appropriate gene symbol to each probe identifier through UniGene parsing and subsequently normalize them (via intra-and inter-sample normalization) in order to compare gene expression data from experiments performed with different platforms and/or biological conditions [44]. It is noteworthy that the TRAM software evaluates the possible risk of bias as it is intrinsically resistant to the systematic differences between batches (groups) of samples [44].
Four different analyses were performed according to standard setting ("Map" mode) using default parameters [43,44,[47][48][49] and named as below: • TRAM control (CTR), comparing the transcriptome map of SN from male control subjects (pool A) and SN from female control subjects (pool B); • TRAM PD, comparing the transcriptome map of SN from male patients affected by PD (pool C) and SN from female patients affected by PD (pool D); • TRAM MALE, comparing the transcriptome map of SN from male patients affected by PD (pool C) and SN from male control subjects (pool A); • TRAM FEMALE, comparing the transcriptome map of SN from female patients affected by PD (pool D) and SN from female control subjects (pool B).
Briefly, the "Map" mode analysis was set to evaluate segments of 500,000 bp with a sliding window of 250,000 bp [44]. The expression value for each genomic segment is calculated as the mean of the expression values of all the loci included in that segment. As a result, TRAM highlighted those segments where the expression value was significantly different between the two considered conditions and contained at least three over/under-expressed genes (genes at the top/bottom 2.5% percentile of values) [44].
The statistical significance was calculated taking into account all genes in the genome (genome median) and corrected for multiple comparisons possible causing false discovery rate (FDR) due to the high number of segments or genes in a genome (q-value). A segment or a gene was considered to be statistically significantly over-or under-expressed for q < 0.05. [44].

Other Analyses
European Bioinformatics Institute (EMBL-EBI) Expression Atlas [50], UniGene [51], National Center for Biotechnology Information (NCBI) Entrez Gene [52] and Gene Ontology (GO) [53] were used to obtain gene-specific information and to functionally characterize the set of genes derived from the TRAM analysis.

Database and Literature Search
The gene expression data search in GEO and ArrayExpress data repository, followed by the selection according to the criteria described in the Materials and Methods section, allowed us to include a total of 10 series in the meta-analysis. More details about series and samples are listed in Table 1 and Supplementary Table S1.  Table S2).
Results are listed in Table 2 and highlight the differential expression of nine segments: five containing over-expressed genes and four classified as under-expressed.
As expected, the highest expression ratio was observed for a segment on chromosome Y (Yq11.1 region) including the ncRNA TTTY15 and the known genes USP9Y and DDX3Y. USP9Y is a member of the peptidase C19 family, encoding for a protein that is similar to ubiquitin-specific proteases; DDX3Y is a member of the DEAD-box RNA helicase family.
Conversely, the most under-expressed segment is located on chromosome X (Xq13.2) and significance was detected for the ncRNAs XIST and JPX and for the EST cluster Hs.720466.
Eleven EST clusters (Hs.731466 EST was validated for coding UBE2QL1 as seen in the UniGene database search) were found as over-or under-expressed, but not associated with known genes.  Table S3).
Results are reported in Table 3 and underline the differential expression of nine segments, only one of which is under-expressed.
As per TRAM CTR results, the first over-expressed segment maps on chromosome Y and includes the significant genes TTTY15, USP9Y and DDX3Y.
Only one segment is reported as under-expressed and is located on chromosome 8 (8p23.1), including three genes of the defensins family (DEFA4, DEFA1 and DEFA3).
Eleven expressed sequence tags (EST) clusters (Hs.635716 EST was validated for coding CDR1-AS1 as seen in the UniGene database search) were found as over-or under-expressed, but not associated with known genes.  Table S4).
The results, as reported in Table 4, underline the deregulation of 12 segments, only two of which are under-expressed.
The segment with the highest expression ratio is located on chromosome 1 (1p22.2), showing as statistically significant GBP1, a gene of the GBP family of guanylate binding protein, the protein kinase coding gene PKN2 and the EST cluster Hs.732899 as statistically significant. Another segment with the same expression ratio (1.31) maps on chromosome 15 (15q21.1) and includes the genes SHC4, an SHC adaptor protein 4, EID1, an EP300 interacting inhibitor of differentiation and the EST cluster Hs.585204.
The lowest expression ratio was reported for the segment on chromosome 19 comprising the genes POP4, encoding for a protein subunit of the small nucleolar ribonucleoprotein complexes, C19orf12, encoding for a small transmembrane protein and the ncRNA LOC284395.
Seventeen EST clusters (Hs.732899 was validated for coding GBP2 as seen in the UniGene database search, Hs.98445 for coding RASA2, Hs.741442 for coding GS1-124K5.11) were found as over-or under-expressed, but not associated with known genes.  Table S5). Table 5 are listed the statistically significant segments obtained, six of which are over-expressed and only one segment is under-expressed.
The highest expression ratio was observed for a segment on chromosome 8 (8p23.1) that includes three genes belonging to the Defensins protein family (DEFA4, DEFA1, DEFA3) as over-expressed.
The only under-expressed segment is located on chromosome X (Xq27.1), containing the ncRNA LINC00632, the gene CDR1 named as cerebellar degeneration related protein 1, not yet functionally characterized, and the antisense CDR1-AS1 (corresponding in Table 4 to Hs.635716).
Four EST clusters (Hs.637575 was validated for coding LOC101928809 as seen in the UniGene database search, C8orf46 for coding VXN vexin, Hs.660769 for coding ARID3A, Hs.635716 for coding CDR1-AS) were found as over-or under-expressed, but not associated with known genes.   Table 3. * Cytoband was derived from the UCSC Genome Browser [54].   Table 4.   Table 5.

Discussion
The main aim of this study was to characterize the male vs. female gene expression differences observed in the SN brain region of PD patients and matched controls.
Our meta-analysis was conducted on SN whole tissue to assess the global alteration occurring in this brain region, considering the contribution of both DA neurons, whose degeneration leads to the onset of the major (motor and non-motor) symptoms, and glial cells (non-neuronal cells). The TRAM software was chosen for its particular features, mainly for its ability to process microarray data obtained from independent studies performed with different array platforms and for the intra-and inter-sample normalization (scaled quantile) method, useful to avoid possible batch effect [44]. In addition, the statistical validity of the software has been confirmed in different studies [43,[47][48][49].
In our previous study, we used TRAM to focus on segments differentially expressed by the disease but not by sex [43]. Starting from the same data collection, plus new microarray series, we specifically investigated the sex differences. In particular, we performed a comparison of healthy controls in order to generate a sex-specific transcriptome reference map of normal substantia nigra (TRAM CTR) to compare with the others of unhealthy tissues (TRAM PD, TRAM MALE and TRAM FEMALE) and possibly evaluate the differences due only to the disease and not to the sex.

Sex-Biased Gene Expression in Parkinson's and Control SN
The transcriptome map of healthy controls showed the evident different expression in male vs. female SN of segments located on sex chromosomes (TRAM CTR, Table 2). This confirm the robustness of TRAM statistics, as it identifies sample-specific segments. In particular, we can note that the Y chromosome segment, containing the male sex specific genes TTTY15, USP9Y and DDX3Y, is the most over-expressed in both analyses involving the comparison between male vs. female subjects (TRAM  CTR and TRAM PD, respectively Tables 2 and 3).
Therefore, the present study supports the known differential expression of X-and Y-linked genes in a sex-specific brain gene expression analysis [41,55] as well as the obvious deregulation of loci not mapping on sex chromosomes [41,42]. Among them, we can highlight several genes which encode for proteins with specific neuronal function, found to be upregulated in males. i.e., the MSRA gene in SN control samples (TRAM CTR, Table 2), whose product has a key role in oxidative stress response and a high expression in the brain, including the SN, with a higher immuno-reactivity in neurons than in glial cells [56,57]; the SCN3A and SCN2A genes, that encode for glycoprotein members of a family of voltage-gated sodium channels involved in the generation and propagation of action potentials in neurons and muscle cells [58,59]; the CSRNP3 gene, encoding for a transcriptional activator, whose mouse ortholog Mbu-1 shows a high expression in the central nervous system, in particular in the brain and spinal cord in the course of embryonic development but also in adults [60]; the NSUN2 gene, whose product is a methyltransferase involved in the stabilization of the anticodon-codon pairing and the correct translation of the mRNA, and whose loss-of-function is associated with increased cellular stress and neurodevelopmental disorders [61].
It is noteworthy that these genes have not yet been reported in relevant research literature as being associated with sex-biased gene expression, so they could be further investigated to better understand the anatomical and functional dimorphism of SN tissue.
Nonetheless, the under-expressed segments in Table 2 reflect the main deregulation of ncRNA genes or unknown transcripts in men, indicating a possible higher activity of epigenetic mechanisms in women. Other studies have pointed out the significance of epigenetic mechanisms in promoting sexual dimorphism in the brain and also their involvement in several neurological and psychiatric disorders, suggesting that epigenetics could have a role in the different regulation and susceptibility of these diseases [62].
In the TRAM PD analysis (Table 3), in addition to the confirmed presence of the segment on chromosome Y, we can also note the over-expression of GABRB2, GABRA6, and GABRA1 genes, encoding for various subunits of a chlorine channel that mediates the rapid inhibition of synaptic transmission in the central nervous system. Gamma-aminobutyric acid GABA is the major inhibitory neurotransmitter of the brain. Sex differences in brain neurochemistry have been well documented, with studies reporting GABAergic variations together with DA and serotonergic systems alterations [31]. In addition, in vitro and in vivo studies are focused on the relationship between deregulation of GABA receptors and PD [63,64]. In particular, it has been assumed that a decrease in GABA receptors in the dentate nucleus is related to tremors generation [65]. As shown by the results of the TRAM PD analysis, the under-expression of some GABA receptors genes in females could suggest that the early tremor symptom in women could be associated with the down-regulation of the GABA system. Furthermore, in Table 3 we connote another nervous tissue-associated gene, CDR1 (cerebellar degeneration related protein 1) and the antisense CDR1-AS1 (Hs.635716), both of which are over-expressed in male PD. These two loci are also reported as significant in the TRAM FEMALE analysis (Table 5), where they are under-expressed in female PD vs. control subjects. Recent studies described CDR1 and its antisense counterpart as highly expressed in the brain [66]. CDR1-AS1 is a circular RNA (circRNA) that binds and inhibits microRNA (miRNA) such as miR-7, which are in turn involved in alpha synuclein repression; in particular, this mechanism has been investigated in humans and mice, where the loss-of-function of CDR1-AS1 is associated with brain dysfunctions [67].
If we consider the two specific analyses for PD, TRAM MALE and TRAM FEMALE, the difference between their relative results is noticeable.
In the TRAM FEMALE analysis (Table 5), the main over-expression in pathological tissue of segments including immune system activation-related loci is very clear: i.e., DEFA4, DEFA1 and DEFA3 genes belonging to the defensins family of proteins and involved in host defense; CXCL9, CXCL10 and CXCL11 genes, that encode for the Chemokine protein family; AZU1, PRTN3 and ELANE genes, known to be expressed coordinately and to encode for proteins incorporated into azurophil granules during neutrophil differentiation.
When we compare all the affected males vs. females in the TRAM PD analysis, the only under-expressed segment is the one on chromosome 8 including the defensins gene family, confirming the sexually dimorphic activation of gliosis/inflammation specific genes in PD female subjects. Some of these genes were already commented on in our previous work [43] and several studies described the activation of microglia and increased expression of immune system mediators in the brain as a common feature of many neurodegenerative diseases, including PD [68][69][70]. Notably, recent findings have associated higher brain levels of inflammatory markers with more severe symptoms of stress-related depression and fatigue [71] and suggested that sex differences in neuroinflammation may explain the female bias in these symptoms [72]. Based on these data, we can hypothesize that depression and anxiety which mainly characterize females affected by PD could be related to the alteration of inflammatory pathways.
In the TRAM MALE analysis, the results highlight several genes that cannot be related to common functional pathways. Among them, we can underline as over-expressed the GPNMB gene (glycoprotein nmb) whose locus has been indicated as a susceptibility factor for PD in gene expression and methylation assays [73]. The locus is also present in the "Top Results" list of PDGene, a publicly available database providing collection of genome wide association studies (GWAS) performed on PD phenotypes [74]. Interestingly, other segments resulted in our study seem to include genes with genetic variant associated to the disease. Further investigation integrating GWAS and gene expression data, could be a robust approach for the identification of candidate genes and functions associated.

Other Results
It is interesting to observe the high number of Hs. sequences shown to be under-or over-expressed in a statistical manner. These sequences are from H. sapiens transcriptome, still not fully characterized and supposedly coding for proteins or non-coding transcripts, such as lncRNA, miRNA or other small non-coding RNAs (sncRNAs). Also noticeable is the presence of LOC sequences (genes of uncertain function) and large intergenic non-coding RNAs (lincRNAs), recently emerging as key regulators of several cellular processes. Some evidence suggests their possible function in the central nervous system as regulators in brain evolution and neuronal development [75]. It is worthwhile considering that the Homo sapiens (Hs.). sequences are constantly updated in the UniGene database, and they could be linked to gene sequences in the future.
One of the TRAM software's main features is to detect the differential expression of a chromosomal segment region, not just single genes, providing statistical analysis of over-or under-expressed regions when compared to the whole genome or to the relative chromosome [44]. Results could suggest common regulatory mechanisms for the many genes within the significant segment, whose genetic variability has already been associated with PD.
Remarkably, the present study identifies some segments immediately adjacent to known PD-related genes, i.e., the segment on chromosome 4 (4q21.1) that in the TRAM FEMALE analysis (Table 5), was shown to be adjacent to the one comprising the SNCA gene (PARK1/PARK4), encoding for alpha-synuclein. Currently, this protein is one of the main pathophysiological hallmarks of PD since its accumulation forms the typical intracellular aggregates characteristic of some neurodegenerative diseases, the LBs [8][9][10][11][12].
In the same analysis, we can note the segment on chromosome 1 (1q21) known to be the location of the GBA gene (glucosylceramidase β), another important risk factor for PD and dementia with LBs [12,76]. In addition, in the TRAM CTR analysis the segment on chromosome 2 was over-expressed in men vs. women (2q24) and including the SCN3A gene as statistically significant, it is very near to SKT39 (serine/threonine kinase 39), a gene recently confirmed as associated with PD in a genome-wide association study [77]. Interestingly, the same study also indicates also SCN3A as a novel risk locus.

Conclusions
Taken together, the TRAM meta-analysis results lead us to hypothesize a possible activation of specific sex-biased systems in PD susceptibility, although further investigations are needed to confirm the hypothesis.
Most notably, these data could suggest the existence of differences in the activation of neurodegeneration and neuroinflammatory mechanisms. In particular, the under-expression of some GABA receptors genes observed in females could be related to the prevalence of tremors as their initial symptom. In addition, the increased expression of proinflammatory genes detected in women could explain the different development of the PD in the two sexes, with females shown to be more likely to be affected by depression and anxiety, symptoms that reduce patients' quality of life.
The validation of the molecular pathways and genes which proved to be significant in this study could help to understand the etiology of PD and contribute in identifying those factors underpinning differences between men and women. Having a better understanding of the genes involved in the disease could also be helpful in the design of new drugs to slow disease progression and to potentially define sex-specific therapies.
Finally, the TRAM software has provided some confirmation and some new insights into PD, proving to be a useful tool in the study of a complex disease, where the need to integrate different types of genetic technologies and approaches is fully acknowledged.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/9/6/275/s1, Table S1: Samples selected for the meta-analysis of gene expression profiles of PD patients vs. healthy controls, Table S2: List of 36,654 TRAM mapped loci for which an expression value A/B was calculated (SN samples from male controls vs. female controls), Table S3: List of 36,654 TRAM mapped loci for which an expression value C/D was calculated (SN samples from PD male patients vs. PD female patients), Table S4: List of 36,654 TRAM mapped loci for which an expression value C/A was calculated (SN samples from PD male patients vs. male controls), Table S5: List of 36,654 TRAM mapped loci for which an expression value D/B was calculated (SN samples from PD female patients vs. female controls).
Author Contributions: E.M. and R.C. conceived and designed the study; E.M. and L.L. performed the data collection and meta-analysis; E.M., L.L. and R.C. analyzed the data; F.Fa. and F.P. critically revised the meta-analysis results and the manuscript; F.Fr. and A.T. supervised the work and revised the whole manuscript. E.M, L.L. and R.C. wrote the manuscript. All authors drafted, critically discussed and approved the final manuscript.