Gene Set Enrichment Analysis and Genetic Experiment Reveal Changes in Cell Signaling Pathways Induced by α-Synuclein Overexpression

Abnormal accumulation of alpha synuclein (α-Syn) in sporadic and familial Parkinson’s disease (PD) may be a key step in its pathogenesis. In this study, the expression matrix of the GSE95427 dataset after α-Syn overexpression in human glioma cell line H4 was obtained from the GEO database. We used the Gene Set Enrichment Analysis (GSEA) method to reanalyze this dataset to evaluate the possible functions of α-Syn. The results showed that the tumor necrosis factor alpha (TNF-α) signal was significantly activated in α-Syn-overexpressing cells, and oxidative phosphorylation signal, extracellular matrix signal, cell cycle related signal and fatty acid metabolism signal were significantly inhibited. Moreover, we employed the α-Syn-expressing transgenic Drosophila model of Parkinson’s disease and knocked-down eiger, a TNF superfamily ligand homologue, indicating that the TNF-α pathway plays a role in the common pathogenesis of synucleinopathies. Our analysis based on GSEA data provides more clues for a better understanding of α-Syn function.


Introduction
The protein α-Synuclein (α-Syn) was discovered in 1988 as a component of cholinergic synapses in the discharge organ of Torpedo californica [1]. In humans, α-Syn is abundant in the brain, primarily in synaptic terminals [2,3] at the tips of neurons. Pre-synaptic terminals transmit signals between neurons by releasing various neurotransmitters from synaptic vesicles, and are essential for normal brain function [4]. In 1997 and 1998, evidence that mutations in the α-Syn gene were found in sporadic and familial Parkinson's disease (PD) [5,6]. This understanding of the pathogenesis of PD was considered by many scientists to be at least as important as the description of the toxicity of MPTP (a neurotoxin capable of destroying dopamine-producing nerve cells in the substantia nigra) 15 years ago [7]. The gene encoding α-Syn is also called "PARK1", and its mutations, such as A30P and A53T, are closely related to PD. PD, the second most common neurodegenerative disorder, is characterized by a progressive loss of dopaminergic neurons within the substantia nigra pars compacta of the midbrain [8]. The neuropathological hallmark of the disease is the presence of intracytoplasmic inclusions called Lewy bodies (LBs) and Lewy neurites (LNs) [9]. Although it is clear that LB and LN mainly contain α-Syn, the mechanism leading to the aggregation of α-Syn needs to be clarified [10]. Perhaps α-Syn is a multidimensional factor that might subtend several neurobiological underpinnings. Thus, the cause of α-Syn aggregation and its relationship to dopamine neuron loss in PD is the subject of much current work [11][12][13]. The main problem in the molecular pathology of PD is not only to understand the aggregation of α-Syn, but also to know which key signals are associated with it, to regulate the pathological development of PD. Therefore, it is of great significance to study the gene networks related to α-Syn. However, many physiological functions of α-Syn are only partially understood; the exact pathomechanisms of α-Syn underlying neurodegenerative diseases remain elusive.

Differential Expression Matrix Acquisition
We installed and loaded the Bioconductor package (https://bioconductor.org/, accessed date: 24 December 2018) in R3.5 software. Bioconductor was developed in the R statistical programming language, which is a popular toolkit for analyzing high-throughput genomic data [20]. After reading the original expression matrix we found that the platform information used by the GSE95427 dataset was GPL570, and its corresponding package file in Bioconductor was hgu133plus2.db. After installing the package, the corresponding relationship between the gene name information and the probe can be obtained, and then the probe and gene name can be converted to obtain the final expression matrix. Finally, the group matrix of the α-Syn overexpression group and control group was constructed using the limma package [21] in Bioconductor, and the differentially expressed genes were output after Bayesian test and linear model fitting.

Acquisition of Heat Map and Volcano Map
All differentially expressed genes were analyzed using the Subset function, and 806 genes with differences greater than twice and p-values less than 0.05 were found. The 806 differentially expressed genes were mapped using the Pheatmap function, and the heat map analysis results were obtained. All the obtained differential genes were plotted using the ggplot2 function to obtain a volcano map, and the names of the genes with a log2fold ratio change of more than 2 were displayed on the map.

GO and KEGG Analysis
The ClusterProfiler package [22] in the Bioconductor toolkit was used to obtain a gene list from 806 genes with a difference of more than twice and a p value of less than Biomedicines 2023, 11, 263 3 of 13 0.05, and the gene names were converted into ENTREZID to represent genes in the NCBI database. GO analysis of cell components, biological processes, and molecular functions was performed using the function enrichGO, and KEGG [23] analysis was performed by the function enrichKEGG, with the partition p-value pvalueCutoff = 0.01 and the estimated false discovery rate qvalueCutoff = 0.01. Finally, the Barplot function was used to plot.

Gene set Enrichment Analysis
After loading the clusterProfiler package, regardless of the fold of gene differential expression, the gene list was obtained from all genes, the gene names were converted to ENTREZID, and the genes were sorted according to their differential expression values from high to low. The Hallmark gene set file was obtained from the MSigDB database [24], gene set enrichment analysis was carried out with GSEA function, and the overall distribution map was obtained using the dotplot function. After installing the enrichplot package (https://github.com/GuangchuangYu/enrichplot, accessed date: 24 December 2018), the gseaplot2 function in the package was used to obtain enrichment information maps of individual gene sets.

Differentially Expressed Genes
To analyze differentially expressed genes in response to α-Syn overexpression it is necessary to construct a linear model, which is a common method for experimental data analysis. The limma package allows the construction of linear models and differential expression analysis. This package allows for simultaneously comparison of multiple experimental groups [21]. First, a linear model was fitted for each gene expression, and then Empirical Bayes was used to analyze the residuals to obtain the appropriate t-statistic, which was optimized for the variance estimation of the experiment to make the analysis results more reliable [21]. After constructing the matrix of the α-Syn overexpression group and control group using the limma program package, 20,186 of all differentially expressed genes were output after the steps of the Bayesian test and linear model fitting ( Figure 1). Further, a total of 806 genes with a difference of more than twice and a p value of less than 0.05 were screened ( Figure 1). pression analysis. This package allows for simultaneously comparison of multiple experimental groups [21]. First, a linear model was fitted for each gene expression, and then Empirical Bayes was used to analyze the residuals to obtain the appropriate t-statistic, which was optimized for the variance estimation of the experiment to make the analysis results more reliable [21]. After constructing the matrix of the α-Syn overexpression group and control group using the limma program package, 20,186 of all differentially expressed genes were output after the steps of the Bayesian test and linear model fitting ( Figure 1). Further, a total of 806 genes with a difference of more than twice and a p value of less than 0.05 were screened ( Figure 1).

Clustering of Differential Expression Profiles
Clustering analysis was performed on 806 genes with a difference of more than twice and a p-value of less than 0.05. By observing the dendrogram (cluster analysis of rows and columns), it is evident that the expression patterns of the two samples belonging to the α-Syn overexpression group were similar, as were the two samples belonging to the control group ( Figure 2). Compared to the control group, 332 genes were upregulated and 474 genes were downregulated in the α-Syn overexpression group ( Figure 2). This indicates that both the control group and the α-Syn overexpression group had good reproducibility of their respective samples, which proves that there is no significant problem in the experimental treatment and that the data are reliable for further analysis.

Clustering of Differential Expression Profiles
Clustering analysis was performed on 806 genes with a difference of more than twice and a p-value of less than 0.05. By observing the dendrogram (cluster analysis of rows and columns), it is evident that the expression patterns of the two samples belonging to the α-Syn overexpression group were similar, as were the two samples belonging to the control group ( Figure 2). Compared to the control group, 332 genes were upregulated and 474 genes were downregulated in the α-Syn overexpression group ( Figure 2). This indicates that both the control group and the α-Syn overexpression group had good reproducibility of their respective samples, which proves that there is no significant problem in the experimental treatment and that the data are reliable for further analysis.

Interaction Network of Differentially Expressed Genes
The STRING database (https://string-db.org/cgi/input.pl, accessed date: 10 May 2021) is a database for searching protein interaction on line [25]; it was searched, and the names of 806 genes with a difference of more than twice and a p-value less than 0.05 were used as inputs. Finally, the network view was used to show the direct or predicted association network of a specific protein. We found that 707 of the 806 differentially expressed genes

Interaction Network of Differentially Expressed Genes
The STRING database (https://string-db.org/cgi/input.pl, accessed date: 10 May 2021) is a database for searching protein interaction on line [25]; it was searched, and the names of 806 genes with a difference of more than twice and a p-value less than 0.05 were used as inputs. Finally, the network view was used to show the direct or predicted association network of a specific protein. We found that 707 of the 806 differentially expressed genes could be densely located in a network through protein interactions, accounting for 87.7% of the total differentially expressed genes ( Figure 3) and indicating that the genes with differences after α-Syn overexpression were very closely linked.

GO and KEGG Enrichment Analysis
The differentially expressed genes were analyzed by traditional Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. GO enrichment analysis can be divided into three categories: cellular component, molecular function and biological process. According to the 806 differentially expressed genes

GO and KEGG Enrichment Analysis
The differentially expressed genes were analyzed by traditional Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. GO enrichment analysis can be divided into three categories: cellular component, molecular function and biological process. According to the 806 differentially expressed genes selected above, the hypergeometric distribution relationship between these differentially expressed genes and some specific branches in the known GO classification was calculated. GO analysis will return a p-value for each GO with differential genes, and a small p-value indicates that the differential genes are enriched in GO. The results showed that extracellular matrix signals were significantly enriched in cellular components, molecular functions and biological processes after α-Syn overexpression ( Figure 4A-C). In addition, KEGG pathway enrichment analysis revealed that only a few signals, such as TNF-α signaling pathway, was enriched after α-Syn overexpression ( Figure 4D).
Biomedicines 2023, 11, 263 7 of 15 extracellular matrix signals were significantly enriched in cellular components, molecular functions and biological processes after α-Syn overexpression ( Figure 4A-C). In addition, KEGG pathway enrichment analysis revealed that only a few signals, such as TNF-α signaling pathway, was enriched after α-Syn overexpression ( Figure 4D).

Gene Set Enrichment Analysis
Given the limited amount of information obtained from the GO and KEGG enrichment analysis, gene set enrichment analysis (GSEA), a method that can examine the enrichment signal, was utilized to further analyze the data [24,26]. The GSEA method defines a specific gene set related to a particular biological process in advance based on existing research results, and then uses statistical methods to evaluate the distribution trend of all genes obtained in a specific experiment (regardless of the gene differential expression fold) corresponding to the predetermined gene set. Its input data consist of two parts: one is the preset known gene sets (which can be GO annotation, KEGG annotation, or other custom gene set conforming to the format), and the other is the expression matrix obtained by the experiment according to the change in gene expression value from large

Gene Set Enrichment Analysis
Given the limited amount of information obtained from the GO and KEGG enrichment analysis, gene set enrichment analysis (GSEA), a method that can examine the enrichment signal, was utilized to further analyze the data [24,26]. The GSEA method defines a specific gene set related to a particular biological process in advance based on existing research results, and then uses statistical methods to evaluate the distribution trend of all genes obtained in a specific experiment (regardless of the gene differential expression fold) corresponding to the predetermined gene set. Its input data consist of two parts: one is the preset known gene sets (which can be GO annotation, KEGG annotation, or other custom gene set conforming to the format), and the other is the expression matrix obtained by the experiment according to the change in gene expression value from large to small, and then judge whether the genes under each annotation in the gene set are enriched in a certain position of the expression matrix. GSEA analysis did not consider the fold of differential expression of genes, and the input variable was the differential expression of all genes in the two comparison groups, while the traditional GO and KEGG analysis methods artificially set the false discovery rate (FDR) to screen out a certain proportion of differentially expressed genes, and the input variable was a specific gene list. Therefore, compared with traditional GO and KEGG analysis, the results of GESA are more scientific. Here, GSEA was used to reanalyze the gene set enriched after α-Syn overexpression. The results showed that in the Hallmark gene set, overexpression of α-Syn activated 9 sub-gene sets and inhibited 18 sub-gene sets. The most obvious set of activating genes was the set associated with TNF-α signaling (NFKB-dependent TNF-α signaling) ( Figure 5 and Table 1). However, oxidative phosphorylation-related signals (oxidative phosphorylation), extracellular matrix-related signals (epithelial-mesenchymal transition), and cell cycle-related signals (E2F targets) were significantly inhibited ( Figure 5 and Table 1). The details of the genes involved in the top four enriched gene sets are shown in Table 1. Simultaneously, we found that the genes involved in the top four enriched gene sets appeared to different degrees in the protein interaction networks (Figure 3), especially for the TNF-α signal-related gene set and extracellular matrix-related signal: TNF-α signal-related gene set (39/88), oxidative phosphorylation-related signals (3/112), extracellular matrix-related signals (41/92), and cell cycle-related signals (4/89).

, 263
Therefore, compared with traditional GO and KEGG analysis, the results more scientific. Here, GSEA was used to reanalyze the gene set enriched after expression. The results showed that in the Hallmark gene set, overexpress activated 9 sub-gene sets and inhibited 18 sub-gene sets. The most obvious s ing genes was the set associated with TNF-α signaling (NFKB-dependent T ing) ( Figure 5 and Table 1). However, oxidative phosphorylation-related sig tive phosphorylation), extracellular matrix-related signals (epithelial-mesen sition), and cell cycle-related signals (E2F targets) were significantly inhibit and Table 1). The details of the genes involved in the top four enriched gene se in Table 1. Simultaneously, we found that the genes involved in the top four e sets appeared to different degrees in the protein interaction networks (

Reliability Analysis of Gene Enrichment
To further evaluate the reliability of the results of the gene enrichment analysis, the gseaplot2 function was used to obtain the enrichment information map of a single gene set. We selected 6 gene sets to view detailed enrichment information. From the density distribution and enrichment scores of the ranking association matrix in the gene list it can be seen that both activated and repressed gene sets show relatively reliable enrichment, indicating that these gene sets are indeed significantly enriched ( Figure 6). In the GSEA enrichment graphs, the curves represent the running sum of enrichment scores, the middle part of the graph shows the position of genes associated with specific pathways, and the bottom part of the graph shows how the fold change is distributed along with the gene list. The normalized enrichment score (NES) and the adjusted p-values were shown in the graphs.

Knockdown of TNF-α Pathway Partially Rescues the DA Neurons Degeneration Caused by α-Syn Overexpression
To further confirm the results obtained from GSEA we employed the α-Syn-expressing transgenic Drosophila model of PD for the in vivo experiment. The TNF-α pathway was chosen for further investigation because it was strongly activated in the top four enriched genes following α-Syn overexpression. We performed qRT-PCR detection of the Drosophila TNF-α homologue eiger [27] and its two receptors, Grindelwald (Grnd) [28] and Wengen (Wgn) [29]. Tyrosine hydroxylase (TH) is the rate-limiting enzyme for catecholamine synthesis, catalyzes the hydroxylation of tyrosine to dopamine [30], and is widely used to immunolabel DA neurons in Drosophila [31][32][33][34][35][36]. We found that α-Syn.A30P overexpression with a tyrosine hydroxylase promoter driver (TH-Gal4) [37] in dopamine neurons significantly increased the levels of eiger and Grnd in the head ( Figure 7A). In Drosophila, we and other studies have shown that age-dependent specific loss of lateral protocerebral posterior 1 (PPL1) neurons was detected in several models of PD, including mutants for PINK1 [35], parkin [36], and α-Syn or Lrrk2 overexpression models [34,37]. When α-Syn.A30P was introduced into the DA neurons with TH-Gal4 driver, significant suppression of PPL1 DA neuronal loss was detected in the aged flies ( Figure 7B,C). Inter- In the GSEA enrichment graphs, the curves represent the running sum of enrichment scores, the middle part of the graph shows the position of genes associated with specific pathways, and the bottom part of the graph shows how the fold change is distributed along with the gene list. The normalized enrichment score (NES) and the adjusted p-values were shown in the graphs.

Knockdown of TNF-α Pathway Partially Rescues the DA Neurons Degeneration Caused by α-Syn Overexpression
To further confirm the results obtained from GSEA we employed the α-Syn-expressing transgenic Drosophila model of PD for the in vivo experiment. The TNF-α pathway was chosen for further investigation because it was strongly activated in the top four enriched genes following α-Syn overexpression. We performed qRT-PCR detection of the Drosophila TNF-α homologue eiger [27] and its two receptors, Grindelwald (Grnd) [28] and Wengen (Wgn) [29]. Tyrosine hydroxylase (TH) is the rate-limiting enzyme for catecholamine synthesis, catalyzes the hydroxylation of tyrosine to dopamine [30], and is widely used to immunolabel DA neurons in Drosophila [31][32][33][34][35][36]. We found that α-Syn.A30P overexpression with a tyrosine hydroxylase promoter driver (TH-Gal4) [37] in dopamine neurons significantly increased the levels of eiger and Grnd in the head ( Figure 7A). In Drosophila, we and other studies have shown that age-dependent specific loss of lateral protocerebral posterior 1 (PPL1) neurons was detected in several models of PD, including mutants for PINK1 [35], parkin [36], and α-Syn or Lrrk2 overexpression models [34,37]. When α-Syn.A30P was introduced into the DA neurons with TH-Gal4 driver, significant suppression of PPL1 DA neuronal loss was detected in the aged flies ( Figure 7B,C). Interestingly, when the eiger was knocked down, it partially prevented PPL1 DA neurons degeneration in α-Syn.A30P overexpression flies ( Figure 7B,C). Thus, we demonstrate that the TNF-α pathway is a physiological ligand in the α-Syn-mediated PD model. estingly, when the eiger was knocked down, it partially prevented PPL1 DA neurons degeneration in α-Syn.A30P overexpression flies ( Figure 7B,C). Thus, we demonstrate that the TNF-α pathway is a physiological ligand in the α-Syn-mediated PD model.

Discussion
TNF-α is increasingly being recognized as a key pro-inflammatory cytokine involved in chronic inflammation and PD neurodegeneration. Microglial release and recombinant TNF-α have been shown to disrupt α-Syn degradation and lead to its accumulation in PC12 cells and midbrain neurons [38]. TNF-α can affect the autophagy pathway and regulate lysosomal acidification in dopaminergic cells, leading to the α-Syn accumulation. This may represent a novel mechanism for TNF-α-induced degeneration of dopaminergic neurons in PD. Another study also observed an increase in TNF-α expression after injecting α-Syn into the striatum of mice [39]. Both KEGG signaling pathway enrichment and gene set enrichment analyses showed that TNF-α-related signals or gene sets were significantly activated in cells overexpressing α-Syn, which is consistent with previous studies. Our study found that both eiger and Grnd were up-regulated in the α-Syn overexpression Drosophila heads. A recent study showed that Grnd and Wgn have different affinities for

Discussion
TNF-α is increasingly being recognized as a key pro-inflammatory cytokine involved in chronic inflammation and PD neurodegeneration. Microglial release and recombinant TNF-α have been shown to disrupt α-Syn degradation and lead to its accumulation in PC12 cells and midbrain neurons [38]. TNF-α can affect the autophagy pathway and regulate lysosomal acidification in dopaminergic cells, leading to the α-Syn accumulation. This may represent a novel mechanism for TNF-α-induced degeneration of dopaminergic neurons in PD. Another study also observed an increase in TNF-α expression after injecting α-Syn into the striatum of mice [39]. Both KEGG signaling pathway enrichment and gene set enrichment analyses showed that TNF-α-related signals or gene sets were significantly activated in cells overexpressing α-Syn, which is consistent with previous studies. Our study found that both eiger and Grnd were up-regulated in the α-Syn overexpression Drosophila heads. A recent study showed that Grnd and Wgn have different affinities for Eiger [40]. Ectopic Eiger expression leads to high-affinity interactions of Eiger: Grnd complexes in vesicles, which is a prerequisite for Eiger-induced apoptosis [40]. They also found that Wgn binds to Eiger with a much lower affinity in intracellular vesicles.
Studies have shown that the α-Syn protein can enter neurons and localize in mitochondria, interact with ATP synthase α subunit, and regulate the ATP synthase function [41]. However, this regulation appears to be opposite to that we found in the GSEA analysis. They found that low monomeric α-Syn had the ability to increase ATP synthase efficiency. As a physiological regulator of mitochondrial bioenergy, the α-Syn protein improves the efficiency of energy production through its interaction with ATP synthase. This may be particularly important when stress or PD mutations lead to energy depletion or neurotoxicity. In contrast, another study found that the highly neurotoxic α-Syn protein induces mitochondrial damage and mitochondrial autophagy [42], which is the site of oxidative phosphorylation. Gene set enrichment analysis revealed that oxidative phosphorylation was significantly inhibited. Therefore, how oxidative phosphorylation affects the aggregation of α-Syn protein and the pathological process of PD remains to be further studied.
Studies have also explored the role of cell cycle changes in neurodegenerative diseases such as Alzheimer's disease. In one study, the authors examined the effect of α-Syn protein expression levels on the cell cycle index of PC12 cells and found that overexpression of α-Syn protein resulted in an increased rate of cell division, and a large number of cells appeared enriched in the S phase [43]. The α-Syn accelerates cell cycle and promotes neurotoxicity [44]. In addition, the research team from which the dataset was derived in this study also found that α-Syn causes severe transcriptional dysregulation, including downregulation of important cell cycle-related genes [18]. Gene set enrichment analysis revealed that several cell cycle-related gene sets were inhibited, such as E2F_TARGETS and G2M_CHECKPOINT, which provides more possibilities for future research on cell cycle changes, α-Syn protein aggregation and the impact on the pathological process of PD.
In addition to the above enriched gene sets, extracellular matrix signaling, DNA damage repair signaling, misfolded protein response, peroxisome-related signaling, inflammation, and immune signaling were significantly affected by α-Syn overexpression. Moreover, H4 cells are rather glial cells than neurons. The data obtained using H4 cells may be more specific to glial cells than to neurons. Multiple system atrophy (MSA) is another representative synucleinopathy [45]. It is well known that in MSA the α-Syn deposition is mainly observed in glia cells [46]. If the results using H4 cells are further examined in MSA models, more substantial conclusions may be obtained. Furthermore, lund human mesencephalic (LUHMES) cells can differentiate into neuronal cells with a robust dopaminergic phenotype [18]. In PD, major degeneration occurs in the dopaminergic neurons [47]. Analysis of the LUHMES-cell dataset may be more feasible for the study of PD. From the original point of view, the new factors found in the present study (Table 1) should be examined in vivo in future studies. If these new factors are validated in the future and translated into human medicine, they may be used as potential biomarkers or treatment targets for PD. In conclusion, the results of the gene set enrichment analysis provide more clues and ideas for future studies on α-Syn function.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.