Integrated Analysis of Methylomic and Transcriptomic Data to Identify Potential Diagnostic Biomarkers for Major Depressive Disorder

Major depressive disorder (MDD) is a mental illness with high incidence and complex etiology, that poses a serious threat to human health and increases the socioeconomic burden. Currently, high-accuracy biomarkers for MDD diagnosis are urgently needed. This paper aims to identify novel blood-based diagnostic biomarkers for MDD. Whole blood DNA methylation data and gene expression data from the Gene Expression Omnibus database are downloaded. Then, differentially expressed/methylated genes (DEGs/DMGs) are identified. In addition, we made a systematic analysis of the DNA methylation on 5′-C-phosphate-G-3′ (CpGs) in all of the gene regions, as well as different gene regions, and then we defined a “dominant” region. Subsequently, integrated analysis is employed to identify the robust MDD-related blood biomarkers. Finally, a gene expression classifier and a methylation classifier are constructed using the random forest algorithm and the leave-one-out cross-validation method. Our results demonstrate that DEGs are mainly involved in the inflammatory response-associated pathways, while DMGs are primarily concentrated in the neurodevelopment- and neuroplasticity-associated pathways. Our integrated analysis identified 46 hypo-methylated and up-regulated (hypo-up) genes and 71 hyper-methylated and down-regulated (hyper-down) genes. One gene expression classifier and two DNA methylation classifiers, based on the CpGs in all of the regions or in the dominant regions are constructed. The gene expression classifier possessed the best predictive ability, followed by the DNA methylation classifiers, based on the CpGs in both the dominant regions and all of the regions. In summary, the integrated analysis of DNA methylation and gene expression has identified 46 hypo-up genes and 71 hyper-down genes, which could be used as diagnostic biomarkers for MDD.


Introduction
Major depressive disorder (MDD) is one of the most common mental disorders around the world, and as estimated by the World Health Organization, approximately 350 million people of all ages worldwide suffer from depression [1]. The symptoms of depression include pervasive and persistent low mood, lack of motivation, and loss of interest in social interactions, which are important public health problems contributing to severe morbidity and mortality [2][3][4]. At present, the most common classical method for diagnosing depression is scale assessment, and imagological diagnosis may provide effective information in MDD classification [5,6]. Recently, as a relatively low-invasive and accessible method, peripheral blood (PB) examination has become an important complement to the classic diagnostic method mentioned above, improving the accuracy of an MDD diagnosis [7]. For example, microRNAs, exosomes, or a certain protein (e.g., C-reactive protein) in PB samples have been used as powerful tools to distinguish MDD patients from healthy controls [8,9]. To date, high-accuracy biomarkers for MDD diagnosis and prognosis are still lacking. Therefore, it is of great significance to investigate the molecular mechanism of MDD, aiming to identify precise targets and necessary biomarkers for the diagnosis of MDD.
MDD is a complex and heterogeneous disease strongly associated with genetic and environmental factors [10]. Each genetic or environmental factor alone cannot sufficiently explain MDD [11,12]. This then motivates people in the field of MDD research to select epigenetic mechanisms as prime candidates for mediating the genetic and environmental interactions in several brain regions [13]. Furthermore, DNA methylation is one of the major epigenetic modifications and plays an important role in the etiology of complex diseases [14]. Thus far, most DNA methylation studies have used candidate gene approaches and have been predominantly focused on gene promoter regions [15][16][17]. Accompanied by the rise of DNA methylation chip arrays and whole-genome bisulfite sequencing technology, several attempts have been made to decipher the relationship between DNA methylation and depression [18,19]. Although many genome-wide studies have indicated that DNA methylation is associated with depression, both positive and negative associations have been reported, and conflicting results are often observed [20,21].
DNA methylation is a very complicated phenomenon. It can occur in different regions such as in transcriptional start sites (TSS), gene bodies, and beyond. Except for the most studied promoter region, the function of other regions has been a largely underexplored domain. Perhaps the average methylation level of a specific gene is not a good reflection of methylation and disease, and a certain region or a CpG site may be better. In this study, we made a systematic analysis of DNA methylation on CpGs in all gene regions, as well as different gene regions (i.e., TSS1500, TSS200, 5 untranslated region (UTR), first exon (Exon1st), gene body, and 3 UTR), and then defined a "dominant" region, making the DNA methylation research in MDD more elaborate.
Both DNA methylation and gene expression are associated with depression, and conventional wisdom holds that DNA methylation has a negative regulatory effect on gene expression. Their combined analysis gives us a more in-depth understanding of MDD. Hence, we aimed to screen for genes associated with methylation alterations, as well as gene expression changes, through integrated analysis to provide more accurate diagnosis biomarkers for MDD. In this study, integrated analysis of DNA methylation and gene expression data identified 46 hypo-methylated and up-regulated (hypo-up) genes and 71 hyper-methylated and down-regulated (hyper-down) genes, and the random forest (RF) algorithm and leave-one-out (LOO) cross-validation method indicated that they could be used as diagnostic biomarkers for MDD.

Data Collection
The Gene Expression Omnibus (GEO) database is the largest and most comprehensive public gene expression data resource archiving and sorting high-throughput gene expression and genomics data. Our goal was to identify diagnostic biomarkers through a relatively accessible and low-invasive method. Therefore, we used the keywords "MDD" or "major depressive disorder" and "blood sample" for retrieval from GEO. After a careful review, the DNA methylation dataset GSE113725, including blood samples of 100 MDD patients and 50 healthy controls, was downloaded from the GEO database (GPL13534 Illumina HumanMethylation450 BeadChip) [22]. In addition, a gene expression profile, GSE98793, deposited by Leday et al., was selected based on the GPL570 HG-U133_Plus_2 platform, containing blood samples of 128 MDD patients and 64 healthy controls [23]. The demographic and clinical features for GSE113725 and GSE98793 are listed in Supplementary Tables S1 and S2. The workflow of this study is shown in Figure 1.
(GPL13534 Illumina HumanMethylation450 BeadChip) [22]. In addition, a gene expression profile, GSE98793, deposited by Leday et al., was selected based on the GPL570 HG-U133_Plus_2 platform, containing blood samples of 128 MDD patients and 64 healthy controls [23]. The demographic and clinical features for GSE113725 and GSE98793 are listed in Supplementary Tables S1 and S2. The workflow of this study is shown in Figure 1.

Screening of Differentially Expressed Genes (DEGs)
The "Affy" package in the R software (version 3.5.2) was adopted to process the raw data in CEL format. After eliminating batch differences and performing data background correction, normalization, and summarization, a robust multiarray average was created for further analysis. The "limma" package was applied to assess the differential expression between MDD patients and healthy controls [24]. Benjamini and Hochberg's (BH) method was used to control the false discovery rate across all genes. The threshold for identifying of DEGs was set at a BH-adjusted p-value of <0.05 and a | Log2 fold-change| > 0.2.

Differential Methylation Analysis
Illumina Infinium Human Methylation450 BeadChip array covering 99% of the genes' annotated promoter (TSS1500, TSS200), 5′UTR, Exon1st, gene body, and 3′UTR in the RefGENE database is one of the most classic DNA methylation detection techniques. TSS200/TSS1500 stands for 0-200/200-1500 bases upstream of the TSS, while gene body refers to the region between the initiation codon and stop codon. The analysis process is as follows:

•
(1) Methylation level of each CpG. The methylation level of each CpG can be calculated by the equation β = M / (M + U + a), where M > 0, U > 0, and a ≥ 0. M and U denote the number of methylated and unmethylated probes, respectively. Since M and U are small, "a" is set to 100 to stabilize the β-value [25].

•
(2) Methylation level of different regions. In this study, we employed the "ChAMP" package (version 2.18.3) to measure the methylation level of the different

Screening of Differentially Expressed Genes (DEGs)
The "Affy" package in the R software (version 3.5.2) was adopted to process the raw data in CEL format. After eliminating batch differences and performing data background correction, normalization, and summarization, a robust multiarray average was created for further analysis. The "limma" package was applied to assess the differential expression between MDD patients and healthy controls [24]. Benjamini and Hochberg's (BH) method was used to control the false discovery rate across all genes. The threshold for identifying of DEGs was set at a BH-adjusted p-value of <0.05 and a | Log2 fold-change| > 0.2.

Differential Methylation Analysis
Illumina Infinium Human Methylation450 BeadChip array covering 99% of the genes' annotated promoter (TSS1500, TSS200), 5 UTR, Exon1st, gene body, and 3 UTR in the RefGENE database is one of the most classic DNA methylation detection techniques. TSS200/TSS1500 stands for 0-200/200-1500 bases upstream of the TSS, while gene body refers to the region between the initiation codon and stop codon. The analysis process is as follows: To measure the methylation difference between MDD patients and healthy controls, a linear model was built. Ten quantiles of the delta beta value of all genes and all intergenic CpG sites were calculated, and the DMGs were defined as a ∆β value of <1/10 quantile or >9/10 quantile and a BH-adjusted p-value of <0.05.

Identification of the Dominant Hypo/Hyper-Methylated Regions
In this study, we defined the dominant hypo/hyper-methylated regions (hereafter, dominant regions). A dominant region refers to the smallest delta beta value between the MDD patients and healthy controls. It is worth noting that there may be more than one dominant region in an individual gene. Herein, if the difference between the delta beta value of another region and the smallest delta beta value was smaller than 0.005, the region was regarded as one of the dominant regions [26].

GO and KEGG Enrichment Analyses
To better understand the biological functions of the DEGs and DMGs, GO enrichment analysis was performed to provide structured annotations on three subontologies: Biological process (BP), molecular function (MF), and cellular component (CC). KEGG pathway enrichment analysis of the DEGs and DMGs was also implemented. Both enrichment analyses employed the R package "clusterprofiler". A BH-adjusted p-value of <0.05 was set as the cut-off criterion.

Classifier Construction and LOO Validation
RF machine learning is a nonlinear classifier that trains a large number of decision trees and uses the class predicted the most from these trees as the final prediction, which has been widely used in bioinformatics analysis, such as in in vivo transcription factorbinding prediction [27], and enhancer identification [28]. For this study, RF, implemented by the R package "randomForest," was employed to build prediction models to distinguish MDD patients from healthy controls on 46 hypo-up genes and 71 hyper-down genes. Three types of classifiers were trained based on the log2 transformed gene expression level data, the average β-value of the CpGs in all of the regions, and the average β-value of the CpGs in the dominant regions.

LOO Cross-Validation
LOO is a cross-validation method that removes only one sample from the training set, and each learning set is created by taking all of the samples except one (test set left out). We employed the LOO cross-validation method to monitor the performance of the classifiers using the R package "caret." The discriminative ability of each classifier was measured by receiver op-erating characteristic (ROC) curves, and the area under the ROC curve (AUROC) was calculated using the R package "ROCR."

Identification of the DEGs in MDD
To identify the DEGs in the MDD patients and healthy controls, we selected the most frequently used dataset, namely, GSE98793, containing blood samples of 128 MDD patients and 64 healthy controls. By employing the linear modeling approach, a total of 1506 DEGs were identified, 713 of which were up-regulated and the other 793 down-regulated ( Figure 2A and Table S3). The top 50 genes with significant differences in up-and down-regulated genes were selected to construct a heat map to show the changes in the DEG expression ( Figure 2B). in up-and down-regulated genes were selected to construct a heat map to show the changes in the DEG expression ( Figure 2B).

GO and KEGG Enrichment Analysis of the DEGs
To explore the biological relevance of the DEGs, we performed GO and KEGG enrichment analysis and found that the DEGs were associated with the following: (1) BP terms: Neutrophil-mediated immunity, neutrophil activation involved in immune response, antimicrobial humoral response, etc.; (2) CC terms: Specific granule lumen, secretory granule lumen, cytoplasmic vesicle lumen, etc.; (3) MF terms: Serine-type endopeptidase activity, serine-type peptidase activity, hydrolase activity, acting on acid phosphorus-nitrogen bonds, etc.; (4) KEGG pathways: Hematopoietic cell lineage, asthma, complement, and coagulation cascades, etc. ( Figure 3A-D). These results tie in well with previous studies, wherein inflammation was shown to trigger depression [29][30][31]. Proinflammatorycytokines, including IL-1, IL-6, and TNF-α, exhibited higher circulating levels in MDD patients than in non-depressed individuals [32]. The results of our analysis may provide new inflammation-associated diagnostic biomarkers for depression.

GO and KEGG Enrichment Analysis of the DEGs
To explore the biological relevance of the DEGs, we performed GO and KEGG enrichment analysis and found that the DEGs were associated with the following: (1) BP terms: Neutrophil-mediated immunity, neutrophil activation involved in immune response, antimicrobial humoral response, etc.; (2) CC terms: Specific granule lumen, secretory granule lumen, cytoplasmic vesicle lumen, etc.; (3) MF terms: Serine-type endopeptidase activity, serine-type peptidase activity, hydrolase activity, acting on acid phosphorus-nitrogen bonds, etc.; (4) KEGG pathways: Hematopoietic cell lineage, asthma, complement, and coagulation cascades, etc. ( Figure 3A-D). These results tie in well with previous studies, wherein inflammation was shown to trigger depression [29][30][31]. Proinflammatorycytokines, including IL-1, IL-6, and TNF-α, exhibited higher circulating levels in MDD patients than in non-depressed individuals [32]. The results of our analysis may provide new inflammationassociated diagnostic biomarkers for depression.

Identification and GO/KEGG Enrichment Analysis of the DMGs
We identified the significant DMGs based on the linear modeling approach and the delta β value of the CpGs of all of the regions. A total of 8313 DMGs were identified, including 4636 hyper-methylated genes and 3677 hypo-methylated genes (Supplementary  Table S4). We also performed GO and KEGG enrichment analysis of all of the DMGs, and the results showed that the DMGs were associated with following: (1) BP terms: Axonogenesis, neuron projection guidance, axon guidance, etc.; (2) CC terms: Cell leading edge, synaptic membrane, cell-substrate junction, etc.; (3) MF terms: Actin binding, DNAbinding transcription activator activity, DNA-binding transcription activator activity, RNA polymerase II-specific, etc.; (4) KEGG pathways: Axon guidance, MAPK signaling pathway, Rap1 signaling pathway, etc. ( Figure 4A-D). The results highlighted that the circulating DNA methylation probably participates in neurodevelopment and neuroplasticity, which are significantly associated with MDD.

Integrated Analysis of the Gene Expression and DNA Methylation
It is generally considered that there is a negative regulatory relationship between DNA methylation and gene expression. Therefore, we obtained the overlap of hypo-methylated and up-regulated genes, as well as hyper-methylated and down-regulated genes. As a result, 46 hypo-up genes and 71 hyper-down genes were identified ( Figure 5A,B). All of the gene symbols are listed in Table 1. The heat map showed the changes in the hypo-up and hyper-down genes between the MDD patients and healthy controls ( Figure 5C,D).

Identification and GO/KEGG Enrichment Analysis of the DMGs
We identified the significant DMGs based on the linear modeling approach and the delta β value of the CpGs of all of the regions. A total of 8313 DMGs were identified, including 4636 hyper-methylated genes and 3677 hypo-methylated genes (Supplementary Table S4). We also performed GO and KEGG enrichment analysis of all of the DMGs, and the results showed that the DMGs were associated with following: (1) BP terms: Axonogenesis, neuron projection guidance, axon guidance, etc.; (2) CC terms: Cell leading edge, synaptic membrane, cell-substrate junction, etc.; (3) MF terms: Actin binding, DNA-binding transcription activator activity, DNA-binding transcription activator activity, RNA polymerase II-specific, etc.; (4) KEGG pathways: Axon guidance, MAPK signaling pathway, Rap1 signaling pathway, etc. ( Figure 4A-D). The results highlighted that the circulating DNA methylation probably participates in neurodevelopment and neuroplasticity, which are significantly associated with MDD.

Integrated Analysis of the Gene Expression and DNA Methylation
It is generally considered that there is a negative regulatory relationship between DNA methylation and gene expression. Therefore, we obtained the overlap of hypo-methylated and up-regulated genes, as well as hyper-methylated and The 46 hypo-up genes were involved in such pathways (Table S5) as PI3K-Akt signaling pathway, the IL-17 signaling pathway, axon guidance, and neuroactive lig-   C20orf85, CCDC151, CEACAM4, CEACAM6, CHRM4,  CNTNAP3, DHRS7C, DISC1, DMRTC2, EVPLL, GBAP1, GPR45, GRINA,  GUCY2D, HYAL3, INSCI, TGA2B, LGSN, LRRC4C, MMP3, NEU4, OLIG1,  PADI4, PCDHB12, PCDHB5, PLOD1, PRKAR2B, PVRL2, SEC14L4,  SIPA1L2, SLC26A4, SLC26A9, SLC30A3, SPARC, TAS1R2, TBCC, TCTE3,  TDRD9, TENC1, THBS2, TMEM53, TNFSF13, TREML1, VTCN1 hyper-down The 46 hypo-up genes were involved in such pathways (Table S5) as PI3K-Akt signaling pathway, the IL-17 signaling pathway, axon guidance, and neuroactive ligand-receptor interaction. The 71 hyper-down genes were involved in pathways (Table S6), such as the NF-kappa B signaling pathway, the MAPK signaling pathway, neuroactive ligand-receptor interaction, and the synaptic vesicle cycle. It is worth noting that most of the 46 hypo-up genes and 71 hyper-down genes were associated with both the inflammatory response and neuroplasticity. We also found that a proportion of the 71 hyper-down genes were engaged in nucleotide excision repair, DNA replication, ribosome, and ribosome biogenesis in the eukaryotes pathways. The results reveal that depression should be accompanied by changes in the metabolism of biological macromolecules, such as DNA and proteins.

Identification of the DMGs Based on CpGs in the Different Regions
In a certain gene, DNA methylation occurs in different regions, including TSS1500, TSS200, 1stExon, gene body, 5 UTR, and 3 UTR, as well as other regions, the function of which remains unclear. Genomic annotation of the methylation based on the CpGs in different regions revealed a biased genomic distribution. As shown in Figure 6A, the maximum number of DMGs is was in the gene body region (>3000), followed by the TSS1500 region (>2000), while the 1stExon region had the minimum (<500). The DNA methylation distribution tendency of the overlap of the DEGs and DMGs was identical in that the gene body region had more than 100 DMGs ranking first, with the TSS1500 region coming second with more than 80, and the 1stExon region being last with at least 30 ( Figure 6B). In terms of individual regions, the gene body had the highest percentage of DMGs. Unfortunately, little research on the DNA methylation in this region has been reported.
To get a closer look at the distribution of each region, we divided all of the overlaps of DMGs and DEGs into four groups: Hyper-up group, hyper-down group, hypo-up group, and hypo-down group. The methylation distribution of the four groups in each region is listed in Table 2. Figure 6C shows that the distribution of the CpGs of the hyper-down and hypo-up genes in the different regions is quite similar. The gene body and TSS1500 region remained in the top two, while the least in the hyper-down group was in 1stExon and in the hypo-up group was TSS200. The hyper-up group had the largest number of DMGs, and the DNA methylation distribution of the hypo-down group in the six regions was similar to that of the hyper-up group. We also found that the hyper-up group had the largest number in the gene body region (>60). It has been reported that the methylation of the gene body may have a positive impact on gene expression [33]. The relationship between gene body methylation and gene expression remains to be further elucidated. Our results suggest that region-specific methylation may play a potential role in the diagnosis of depression.

Classifier Construction and ROC Curve
We employed the RF algorithm and the LOO cross-validation method to construct classifiers based on the gene expression and methylation data of the 46 hypo-up genes and 71 hyper-down genes to distinguish the MDD patients from the healthy controls. The "importance" function of the "randomForest" package was used to calculate the average importance of each gene in the six classifiers, and their importance was ranked in descending order. Figure 7A-F shows the importance scores of the top 20 genes in each classifier, and all of the genes and importance scores are listed in Tables 3 and 4.

Classifier Construction and ROC Curve
We employed the RF algorithm and the LOO cross-validation method to construct classifiers based on the gene expression and methylation data of the 46 hypo-up genes and 71 hyper-down genes to distinguish the MDD patients from the healthy controls. There were two types of methylation classifiers for the 46 Supplementary  Tables S7 and S8, while for the 71 hyper-down genes, the information is presented in Supplementary Tables S9 and S10.

The Importance Score of the 46 Hypo-Up Genes and the 71 Hyper-Down Genes in Each Classifier
The "importance" function of the "randomForest" package was used to calculate the average importance of each gene in the six classifiers, and their importance was ranked in descending order. Figure 7A-F shows the importance scores of the top 20 genes in each classifier, and all of the genes and importance scores are listed in Tables 3 and 4.  Table 3. Gene symbol and importance scores of 46 hypo-up genes in gene expression classifier and gene methylation classifiers.   Table 3. Gene symbol and importance scores of 46 hypo-up genes in gene expression classifier and gene methylation classifiers.  Table 4. Gene symbol and importance scores of 71 hyper-down genes in gene expression classifier and gene methylation classifiers.

Importance of 71 Hyper-Down Genes Based on DNA Methylation Level of Dominant Regions
Gene Symbol Importance Gene Symbol Importance Gene Symbol Importance

Determine the Number of Genes with the Best Predictive Power in Each Classifier
To obtain the best classification predictive power of each classifier, we added the candidate genes into each classifier one-by-one in order of importance. Figure

The Predictive Ability of Each Classifier
The ROC curve shows that the gene expression classifier exhibited the best predictive ability (AUC = 0.964, p = 1.1 × 10 −25 ) for the hypo-up genes. The predictive ability (AUC = 0.712, p = 3.7 × 10 −5 ) of the methylation classifier based on the CpGs in the dominant regions performed slightly better than that of the classifier based on the CpGs of all of the regions (AUC = 0.677, p = 5.3 × 10 −4 ) ( Figure 9A-C). In regard to the hyper-down genes, the gene expression classifier still presented the best predictive ability (AUC = 0.9993, p = 1.9× 10 −29 ). The predictive ability of the two methylation classifiers was similar for the classifier based on the CpGs of all of the regions (AUC = 0.712, p = 3.2× 10 −5 ) and on the CpGs of the dominant regions (AUC = 0.716, p = 2.2 × 10 −5 ) ( Figure 9D-F). The results reveal that for both the hypo-up and hyper-down genes, the classifier based on the gene expression data possessed the best predictive ability (AUC > 0.95), while the classifier based on the CpGs in the dominant regions had a relatively higher predictive ability than the classifier based on the CpGs in all of the regions. candidate genes into each classifier one-by-one in order of importance. Figure 8A-C shows that the top 25, top 12, and top 23 are the best predictors of the hypo-up gene expression classifier, the gene methylation classifier based on the CpGs in all of the regions, that based on the CpGs in the dominant regions, respectively. Figure 8D-F shows that the top 31, top 2, and top 18 are the best predictors of the hyper-down gene expression classifier, the gene methylation classifier based on the CpGs in all of the regions, and that based on the CpGs in the dominant regions, respectively.

Discussion
MDD is a mental disorder with high morbidity, a complicated etiology, and a severe socioeconomic burden, lacking diagnostic biomarkers. By conducting bioinformatics analysis and mining of the GSE98793 gene expression dataset and the

Discussion
MDD is a mental disorder with high morbidity, a complicated etiology, and a severe socioeconomic burden, lacking diagnostic biomarkers. By conducting bioinformatics analysis and mining of the GSE98793 gene expression dataset and the GSE113725genomewide methylation dataset, we obtained the following results. DEGs are mainly enriched in the inflammatory response-associated pathways, while DMGs are mainly enriched in the neurodevelopmental-and neuroplasticity-associated pathways. Through integrated analysis of the gene expression and DNA methylation data, 46 hypo-up genes and 71 hyper-down genes were identified. These genes are mainly involved in immune activation, synaptic development, and DNA repair. Classifiers based on the gene expression and DNA methylation in all regions, as well as in the different regions, were established by the random forest algorithm and the LOO cross-validation method. The results reveal that for both the hypo-up and hyper-down genes, the classifier based on the gene expression data exhibited the best predictive power, while the methylation classifier based on the CpGs in the dominant regions possessed a relatively higher predictive ability than the methylation classifier based on the CpGs in all regions.
Sigmund Freud wrote that "the complex of melancholia behaves like an open wound" [34]. Clinical and translational studies have shown that inflammatory responses are associated with the onset and maintenance of MDD. Glial cells, including microglia and astrocytes, are the primary immune mediators of the brain and respond accordingly to external stimuli. Proinflammatory (TNF-α and IL-1β) and anti-inflammatory (IL-1, IL-10, and TGFβ1) factors are released under stress. Increasing levels of proinflammatory mediators such as IL-1, IL-2, IL-6, and TNF-α have been observed in patients with depression [35]. Herein, the GO enrichment analysis indicated that DEGs are associated with neutrophil-mediated immunity and the neutrophil activation involved in immune response, and the results are consistent with previous studies. A growing body of research shows that inflammation is closely related to depression [36], but the exact molecular mechanisms remain to be elucidated. Based on the results of our analysis, we hypothesize that inflammatory responses are involved in the progression of depression, and circulating inflammatory factors could be potential diagnostic biomarkers for MDD.
Interestingly, different from DEGs, DMGs are enriched in axonogenesis, neuron projection guidance, axon guidance, GO terms, and the MAPK signaling pathways. The postmortem and meta-analyses of magnetic resonance imaging studies indicate that hippocampal volume decreases in patients with depression [37,38]. There are two hypotheses for this phenomenon, namely, the neuroplasticity hypothesis and the neurogenesis hypothesis. The former suggests that stress induces the atrophy of mature neurons in the hippocampus, while the latter suggests that stress decreases the number of newborn neurons and neural precursor cells in the dentate gyrus of the hippocampus. As mentioned in the last paragraph, astrocytes and microglia participate in the inflammatory response in the brain, and they also secrete nutrients, such as BDNF, to nourish neurons. BDNF positively regulates nerve polymorphisms, and BDNF expression is decreased in the hippocampus of depressed patients [39,40]. Research suggests that the methylation level of the BDNF promoter region in MDD patients is increased, while the mRNA expression is decreased [41]. It has also been reported that IL-6 modulates synaptic plasticity [42]. We presume that there could be extremely complex crosstalk among inflammatory response, neuroplasticity, gene expression, and DNA methylation, but the molecular mechanisms remain a mystery. TSS1500, TSS200, 5 UTR, and 1stExon are all related to transcriptional initiation and can be subsumed into the promoter region. Our results showed that DMGs have the highest distribution in the promoter region, and the gene body region also accounts for a large proportion. Basically, there is one CpG island (CGI) in every 10 base pairs in the human genome. However, the content of CGI surrounding the TSS of protein-coding genes is as high as 60%. The promoter region is the most studied region of DNA methylation, and it is considered that the CGI methylation of the promoter region is a hallmark of inhibiting gene expression.
Little research has been done on DNA methylation in the gene body region. Benefitting from the development of sequencing technology, we have a more complete understanding of genome-wide DNA methylation modification. Except for the promoter region, many CGIs are distributed in the gene body and intergenic region. Ehrlich et al. [43] found that brain tissue contains some of the highest levels of DNA methylation in the gene body. In the human brain, 16% of all CGIs are methylated, while 98% of the annotated 5 promoter regions are unmethylated and, surprisingly, CGI methylation in the gene body region is up to 34% [44]. Although the methylation function of the gene body is unclear, we speculate that methylation in the gene body region may be more dynamic than in the promoter region. Evidence suggests that the degree of gene body methylation in dividing cells is positively correlated with gene expression [45]. Unlike the methylation of the promoter region, which inhibits transcription initiation, gene body methylation does not prevent-and may even promote-transcription elongation [33]. Our findings provide new insight into research related to gene body methylation and depression.
We constructed three types of classifiers: A gene expression classifier, a methylation classifier based on the CpGs in all of the regions, and a methylation classifier based on the CpGs in the dominant regions for both the 46 hypo-up genes and the 71 hyper-down genes. The classifier based on the gene expression data exhibited satisfactory predictive ability, with an AUC > 0.95. The predictive ability of the methylation classifier based on the CpGs in the dominant regions was relatively better than that of the methylation classifier based on the CpGs in all of the regions. The relationship between DNA methylation and depression is a controversial topic. The results presented by genome-wide DNA methylation studies are multitudinous [20,21]. A detailed study of a certain region(s) or CpG(s) should better demonstrate the relationship between the two. A post-hoc investigation indicated that FKBP5 intron methylation has a negative correlation with transcription activation in MDD patients [46]. In a prospective analysis of major depressive disorder in adolescent girls, the authors found that all four significant CpGs in NR3C1 were in the gene body region: Two sites were located within a transcription factor-binding site (TFBS) region, one was in a region of open chromatin, and one site associated with an enhancer element [47]. Benjamard et al. [48] reported that in the promoter region of parvalbumin, methylation was significantly increased at CpG2 and decreased at CpG4 in the MDD group compared to the control group. Some alterations of CpGs are limited to specific gene phenotypes. In the SS genotype of 5-HTTLPR, depression is significantly involved with a decrease in methylation levels at CpG21, CpG25, and CpG26 [49]. Studies based on a certain region(s) or CpG(s) level should be the future research trend of the relationship between DNA methylation and MDD.
Although all of the classifiers demonstrated favorable predictive ability, the limitations cannot be ignored. First, the samples of gene expression and DNA methylation data came from different cohorts. Complicated diseases (such as MDD) involve molecular changes at multiple levels, such as at the genome, epigenome, and transcriptome levels. Researchers hope to systematically and comprehensively study the pathogenesis of diseases from multiple dimensions and perspectives. However, due to limited data sources and research funding, many studies have been conducted using datasets of similar disease models or similar research backgrounds. For example, Reference [26] integrated DNA methylation and transcriptome data and identified 85 hypo-up genes that could be potential diagnostic biomarkers for Parkinson's disease. The same integrated analysis was performed in Reference [50] to predict gastric cancer. Integrated analyses of other omics are also common in medical research, namely, multi-genome [51], and multi-transcriptome [52] analyses. Not only is this phenomenon observed in clinical studies, but also in botanical studies (e.g., multi-transcriptome in References [53,54], and multi-ChIP-seq in Reference [55]). Although our data did not match completely, we used the integrated analysis method of different omics to provide a new perspective and direction for depression. Second, to what extent changes in peripheral blood genes are associated with genes in the brain is unknown, and integrated analysis of peripheral blood samples and brain tissues is necessary. Finally, further experimental validation will improve the credibility of genes in the classifiers as potential biomarkers for MDD.

Conclusions
For the first time, three types of classifiers (i.e., a gene expression classifier, a methylation classifier based on the CpGs in all of the regions, and a methylation classifier based on the CpGs in the dominant regions) were constructed and compared with one another on the basis of integrated analysis in MDD. The results showed that for the 46 hypo-up genes and the 71 hyper-down genes, the gene expression classifier presented the best predictive power, while the methylation classifier based on the CpGs in the dominant regions was relatively better than the methylation classifier based on the CpGs in all of the regions. Taken together, we identified a blood signature consisting of 46 hypo-up genes and 71 hyper-down genes, which may play a potential role in the diagnosis of MDD.
Supplementary Materials: The following are available online at https://www.mdpi.com/2073-442 5/12/2/178/s1: Table S1: Demographic and clinical features for GSE113725. Table S2: Demographic and clinical features for the GSK-HiTDiP and Janssen-BRC (GSE98793) case-control studies. Table S3: Gene symbols of 1056 differentially expressed genes (DEGs). Table S4: Gene symbols of 8313 differently methylated genes (DMGs).  Table S7: Relevant information about the methylation of CpGs in all regions of the 46 hypo-up genes. Table S8: Relevant information about the methylation of CpGs in the dominant regions of the 46 hypo-up genes. Table S9: Relevant information about the methylation of CpGs in all regions of the 71 hyper-down genes. Table S10: Relevant information about methylation of CpGs in the dominant regions of the 71 hyper-down genes.
Author Contributions: Y.X. and G.W. designed the study, conducted the transcriptional and methylation data analysis, and drafted the manuscript. L.X. and L.C. carried out integration of the methylation and gene expression data and constructed classifiers based on gene expression and DNA methylation using the random forest algorithm and the leave-one-out cross-validation method. Y.Z. and C.Z. were responsible for data screening and collection. All authors have read and agreed to the published version of the manuscript.