Genome-Wide Identiﬁcation and Coexpression Network Analysis of DNA Methylation Pathway Genes and Their Di ﬀ erentiated Functions in Ginkgo biloba

: DNA methylation plays a vital role in diverse biological processes. DNA methyltransferases ( DNMTs ) genes and RNA-directed DNA methylation (RdDM)-related genes are key genes responsible for establishing and maintaining genome DNA methylation in plants. In the present study, we systematically identiﬁed nine GbDNMTs in Ginkgo biloba , including the three common families of GbMET1a / 1b , GbCMT2 , and GbDRMa / b / 2a / 2b / 2c , and a fourth family— GbDNMT3 —which is absent in most angiosperms. We also identiﬁed twenty RdDM-related genes, including four GbDCLs , six GbAGOs, and ten GbRDRs . Expression analysis of the genes showed the di ﬀ erent patterns of individual genes, and 15 of 29 genes displayed expression change under ﬁve types of abiotic stress. Gene coexpression analysis and weighted gene co-expression network analysis (WGCNA) using 126 public transcriptomic datasets revealed that these genes were clustered into two groups. In group I, genes covered members from all six families which were preferentially expressed in the ovulate strobile and fruit. A gene ontology (GO) enrichment analysis of WGCNA modules indicated that group I genes were most correlated with the biological process of cell proliferation. Group II only consisted of RdDM-related genes, including GbDRMs , GbAGOs , and GbRDRs , but no GbDCLs , and these genes were speciﬁcally expressed in the cambium, suggesting that they may function in a dicer-like (DCL)-independent RdDM pathway in speciﬁc tissues. The gene module related to group II was most enriched in signal transduction, cell communication, and the response to the stimulus. These results demonstrate that gene family members could be conserved or diverged across species, and multi-member families in the same pathway may cluster into di ﬀ erent modules to function di ﬀ erentially. The study provides insight into the DNA methylation genes and their possible functions in G. biloba , laying a foundation for the further study of DNA methylation in gymnosperms. in various tissues and under abiotic stresses were investigated. Furthermore, a large number of public RNA-seq data of G. biloba were used to explore the function of DNA methylation pathway genes more deeply. We obtained high-quality RNA-seq data of 126 samples from the NCBI Sequence Read Archive (SRA) database and assigned them to 18 modules with WGCNA. We further identiﬁed modules highly related to DNA methylation pathway genes, which were used for exploring their functions through gene ontology (GO) enrichment analysis [36]. Our study provides valuable information for further understanding the functional roles of DNA methylation in G. biloba . highly correlated modules The results show the very distinguishing enriched biological processes of GO for each module. In the four modules associated with group I, genes of the turquoise module are mainly involved in cell proliferation, genes related to lipid biosynthesis and anion transport are enriched in the midnightblue module, cellular localization-related genes are enriched in the green module, and genes in the grey60 module are involved in polysaccharide metabolism. In the brown module, which is highly correlated with group II DNA methylation genes, signal transduction, cell communication, and the response to stimulus are highly enriched biological processes. In the blue module, which is negatively correlated with almost all of the DNA methylation genes, the photosynthesis and cation transport-related genes are mostly enriched. These results demonstrate that di ﬀ erent groups of DNA methylation genes may have di ﬀ erentiated functions, which involve di ﬀ erent upstream- or downstream- genes.


Chromosomal Localization and Gene Tandem Duplication
Chromosome position and gene tandem duplication were illustrated by TBtools V0.6731 [39]. The tandem duplication of DNA methylation pathway genes was confirmed according to Zhao et al., which included two standards: more than 70% of the longer sequence aligned with the shorter sequence, and a higher than 70% similarity between two aligned sequences [40]. Two genes with five or fewer genes apart in 100-kb chromosome length were considered to be tandem duplicated genes [41].

Plant Tissues and Abiotic Stress Treatments for the Gene Expression Analysis
To investigate the gene expression patterns in various tissues, the seeds were collected from a 20-year-old ginkgo tree (cv. Jiafoshou), grown at Nanjing Forestry University (32 • 04 47" N, 118 • 48 56" E), Nanjing, China. After cleaning and removing the outer seed coat, seeds were stratified at 4°C for 1.5 months. The physiological mature seeds were then germinated at 27°C and sown into the soil at germinating prophase. Different aged seedlings were then used for both tissue sampling and the abiotic stress treatments. For the young root (YR), young stem (YS), and young leaf (YL) samples, three 1.5-month-old seedlings were collected, of which all roots, main stem without branches, and two uppermost leaves from the same individual seedling were collected, respectively, as one replicate. Mature leaf (ML) and immature fruit (IF) samples were collected from the outer middle part of the mother tree on 2 May, of which each replicate contains two leaves and two fruits, respectively. A total of 15 samples with three replicates for five tissues were all immediately frozen in liquid nitrogen and stored at −80°C after collecting for future use. Some other tissues were integrated into this Forests 2020, 11, 1076 4 of 20 study with their publicly available data, that included: adult root (AR), adult stem (AS), immature leaf (IL, 30 March), microstrobilus (M), ovulate strobilus (OS), and mature fruit (MF, 15 October), with three replicates each, collected from the 31-year-old ginkgo trees (cv. Jiafoshou) planted at Yangtze University, China (30 • 20 06" N, 116 • 19 24" E) [27]; kernels (with the removal of the testae from the seeds) collected in July (K7), August (K8), September (K9), November (K11), and December (K12) with only one replicate, also from the same mature tree at Nanjing Forestry University [31]; cambium samples from the different aged trees (15Y, 20Y, 22Y, 193Y, 211Y, 236Y, 538Y, 553Y, and 667Y) [30]; and a chichi sample at the elongating stage with a 1.7-cm length collected from Shandong Province, China (35 • 37 -35 • 40 N, 116 • 36 -117 • 38 E) [33].
For the abiotic stress treatments, 35 two-month seedlings, germinating from the seeds described above, were treated with a Hoagland solution with 20% (W/V) PEG6000 to simulate drought stress; two leaves from two random individual seedlings were then collected at 0 (control), 2, 6, 12, and 24 h, respectively, with three replicates. Additionally, 20 six-month seedlings were treated at 40°C (Heat), with 150 µM NaCl (Salt) and 100 mM methyl jasmonate (MeJA) for 24 h, respectively, then the uppermost leaves from two individual seedlings were collected as one replicate, with three replicates for each; the seedlings before the treatments (0 h) were collected as control. All samples were immediately frozen in liquid nitrogen and stored at −80°C. The data for the UV-B treatment were downloaded from NCBI (PRJNA595103) [35]. In the experiments, the four-month seedlings were treated with the high-dose UV-B (21.42 kJ/m 2 /day) for 14 d, then the upper leaves were collected with three replicates; the seedlings treated with the UV-B free white light for 14 d were collected as control.

Transcriptome Sequencing and Gene Expression Analysis of DNA Methylation Pathway Genes
Total RNAs of collected samples were extracted using the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany), and then quality-checked using the kaiaoK5500 Spectrophotometer (Kaiao, Beijing, China) and quantified using the RNA Nano 6000 Assay kit (Agilent Technologies, Santa Clara, CA, USA). A total amount of 2 µg RNA per sample was used to generate the sequencing library using NEBNext Ultra RNA Library Prep Kit for Illumina (#E7530, NEB, Ipswich, MA, USA). The libraries were sequenced on an Illumina NovaSeq platform and 150 bp paired-end reads were generated. The raw data have been uploaded to the NCBI database under the project PRJNA650527.
Publicly available transcriptomic data mentioned above for more tissues were downloaded from the NCBI Sequence Read Archive (SRA) database (Table S3). All of the raw RNA-seq data were subject to adapter and low-quality base trimming using Trimmomatic v0.36 [42]. The clean reads were then mapped to the reference genome of G. biloba using Spliced Transcripts Alignment to a Reference (STAR) v2.5.3a [43] and the number of reads was calculated for each gene by featureCounts v1.6.2 [44]. The fpkm (fragment counts normalized per kilobase of feature length per million mapped fragments) function of DESeq2 v1.28.1 package was used to calculate and normalize gene expression level using the "median ratio method" [45]. In tissue samples, the Pearson correlation coefficient (PCC) of the data from different replicates was calculated to inspect the consistency (Table S3). The expression value for each gene was represented by the averaged FPKM of the replicates for most of the samples and the FPKM in the only one replicate for the kernel, cambium, and chichi samples. Differentially expressed genes under abiotic stresses were identified using the DESeq2. For each treatment, gene expression in the treated sample was compared with that in the untreated control.

Coexpression Analysis of DNA Methylation Pathway Genes
RNA-Seq data of 126 G. biloba samples, not including the data generated in this study, were downloaded from SRA. Detailed information on these samples is listed in Table S4. The SRA Toolkit v2.9.2 (https://github.com/ncbi/sra-tools) was used to convert the raw data into a fastq format. The cleaning, trimming, mapping, and counting of reads were performed as described above. The FPKM values normalized using the "median ratio method" were calculated and applied for co-expression analysis using the weighted gene co-expression network analysis (WGCNA) v1.69 package [46] with Forests 2020, 11, 1076 5 of 20 a soft-thresholding power 8. The GO enrichment analysis was performed for genes in each module using the topGO v2.40.0 package [47]. To visualize the connections of 29 genes in the DNA methylation pathway, the Pearson correlation coefficient (PCC) values were extracted and imported into Cytoscape v3.7.1 [48] to generate the network.

Identification and Structural Analysis of DNA Methylation Pathway Gene Families
The hidden Markov model (HMM) profile analyses identified nine DNA methyltransferases (DNMTs) genes in G. biloba (Table 1), including two methyltransferases (GbMETs); one chromomethylase (GbCMT); five domains rearranged methyltransferases (GbDRMs); and notably, one GbDNMT3, which was lost in angiosperms and recently functionally identified in Physcomitrella patens. Three families involved in the RNA-directed DNA methylation (RdDM) pathway were identified, including four Dicer-likes (GbDCLs), six Argonautes (GbAGOs), and 10 RNA-dependent RNA Polymerases (GbRDRs) ( Table 1). The sequences analysis showed that the open reading frames (ORFs) of the GbDNMTs ranged from 665 to 225,575 bp, containing 0 to 22 introns and encoding varied from 144 to 1536 amino acids. The molecular weight (MW) of GbDNMTs ranged from 15.18 to 171.27 KDa, and the protein isoelectric point (PI) ranged from 5.10 to 9.46 (Table 1). Protein structural domain and motif analysis indicated that all of the GbDNMTs have the DNA-methylase domain and most of them have the conserved motif 2, except for three GbDRM2 (Figure 1a, Figure S1). Moreover, GbMETs have two DNMT1-RFD domains and two BAH domains, GbCMT2 has BAH and Chromo domains, and GbDRMs and GbDNMT3 only have the DNA-methylase domain (Figure 1a). structure of proteins showed seven conserved domains in all four GbDCLs, except for an extra Double Stranded RNS-binding (dsRB) domain at the C-terminus of GbDCL1 (Figure 1b). The same conserved motifs were also observed in all GbDCLs, except for GbDCL1, which had an extra motif 2 in the middle, and GbDCL3b, which had an extra motif 3 ( Figure S1). The six candidate AGO genes identified in G. biloba have a range from 2800 to 81,193 bp ORFs with one to 23 introns, and encode 741 to 1199-aa proteins (Table 1). Domain analysis using SMART showed that these GbAGO proteins contain the same DUF1785, PAZ, and PIWI domains ( Figure 1b). The motif analysis also showed that all 10 motifs are conserved in six GbAGOs, except for the fact that GbAGO3 has an extra motif 7 and 10 ( Figure S1).
All 10 RDR proteins in G. biloba share a common RdRP domain corresponding to the catalytic β' subunit of RDR [49]. Additionally, an RNA recognition motif (RRM) domain exists at the N-terminus of GbRDR6a and GbRDR6b (Figure 1b). The GbRDRs have a range between 1734 and 521,276 bp ORFs and encode for proteins with 491 to 1187 amino acids. Most of them have zero to three introns, except for GbRDR3 with 18 introns and GbRDR1a with nine introns (Table 1). Interestingly, the RdRP domain of GbRDR1b might consist of two isolated segments (95 and 320 amino acids), encoded by two different portions of the coding region in the G. biloba genome, respectively. The motif analysis showed that the GbRDRs can be divided into three sets. The set with GbRDR1a-1e has the conserved motif 1-7; the second set with GbRDR6a and GbRDR6c has all 10 motifs; and the third set with GbRDR1f and RDR3 is special, with only two different motifs ( Figure S1). There are four DCLs in G. biloba, of which two DCL3 (GbDCL3a and GbDCL3b) were identified, as previously reported in maize and Pinus tabuliformis [19,37]. GbDCL1 has the shortest ORF, but with the least number of introns, and encodes a 2164-aa protein, while the other three GbDCLs have seven to nine times longer ORFs, but more introns, and encode shorter proteins (Table 1, Figure 1b). The structure of proteins showed seven conserved domains in all four GbDCLs, except for an extra Double Stranded RNS-binding (dsRB) domain at the C-terminus of GbDCL1 (Figure 1b). The same conserved motifs were also observed in all GbDCLs, except for GbDCL1, which had an extra motif 2 in the middle, and GbDCL3b, which had an extra motif 3 ( Figure S1).
The six candidate AGO genes identified in G. biloba have a range from 2800 to 81,193 bp ORFs with one to 23 introns, and encode 741 to 1199-aa proteins (Table 1). Domain analysis using SMART showed that these GbAGO proteins contain the same DUF1785, PAZ, and PIWI domains (Figure 1b). The motif analysis also showed that all 10 motifs are conserved in six GbAGOs, except for the fact that GbAGO3 has an extra motif 7 and 10 ( Figure S1).
All 10 RDR proteins in G. biloba share a common RdRP domain corresponding to the catalytic β' subunit of RDR [49]. Additionally, an RNA recognition motif (RRM) domain exists at the N-terminus of GbRDR6a and GbRDR6b (Figure 1b). The GbRDRs have a range between 1734 and 521,276 bp ORFs and encode for proteins with 491 to 1187 amino acids. Most of them have zero to three introns, except for GbRDR3 with 18 introns and GbRDR1a with nine introns (Table 1). Interestingly, the RdRP domain of GbRDR1b might consist of two isolated segments (95 and 320 amino acids), encoded by two different portions of the coding region in the G. biloba genome, respectively. The motif analysis showed that the GbRDRs can be divided into three sets. The set with GbRDR1a-1e has the conserved motif 1-7; the second set with GbRDR6a and GbRDR6c has all 10 motifs; and the third set with GbRDR1f and RDR3 is special, with only two different motifs ( Figure S1).

Phylogenetic Analysis of DNA Methylation Pathway Genes in G. biloba
To study the evolutionary relationship of different DNA methylation pathway gene families, full-length protein sequences from six plant species (one monocot of Oryza sativa; three dicots of Populus trichocarpa, Solanum lycopersicum, and Arabidopsis thaliana; and two gymnosperms of Pinus tabuliformis and G. biloba) were aligned to construct the unrooted neighbor-joining trees. All DNMTs were clustered into individual subfamilies, in which two GbMETs were close to AtMET1 so were named GbMET1a and GbMET1b, respectively; the only GbCMT was close to OsCMT2 and AtCMT2, so was named GbCMT2 ( Figure 2). DNMT3, with a deficiency in angiosperms and overlooked in the past, was identified in this study. The phylogenetic tree showed that the DNMT3s were clustered together with DRMs, which are close homologs in plants, and formed four clades: DRMa/b, DRM2, DRM3, and DNMT3. From three of these clades, two GbDRMs, three GbDRM2s, and one GbDNMT3 were identified, and no DRM3 was obtained in G. biloba ( Figure 2).

Phylogenetic Analysis of DNA Methylation Pathway Genes in G. biloba
To study the evolutionary relationship of different DNA methylation pathway gene families, full-length protein sequences from six plant species (one monocot of Oryza sativa; three dicots of Populus trichocarpa, Solanum lycopersicum, and Arabidopsis thaliana; and two gymnosperms of Pinus tabuliformis and G. biloba) were aligned to construct the unrooted neighbor-joining trees. All DNMTs were clustered into individual subfamilies, in which two GbMETs were close to AtMET1 so were named GbMET1a and GbMET1b, respectively; the only GbCMT was close to OsCMT2 and AtCMT2, so was named GbCMT2 ( Figure 2). DNMT3, with a deficiency in angiosperms and overlooked in the past, was identified in this study. The phylogenetic tree showed that the DNMT3s were clustered together with DRMs, which are close homologs in plants, and formed four clades: DRMa/b, DRM2, DRM3, and DNMT3. From three of these clades, two GbDRMs, three GbDRM2s, and one GbDNMT3 were identified, and no DRM3 was obtained in G. biloba ( Figure 2).  (Table S2). For the domains rearranged methyltransferase (DRM)/DNA methyltransferase 3 (DNMT3) family, the DNA methylation domain region from more species was used [7].

Chromosomal Location and Tandem Duplication of DNA Methylation Pathway Genes in the G. biloba Genome
Nine GbDNMTs were found to be distributed on five chromosomes with a variable distribution: GbMET1a and GbMET1b were located on chromosomes 1 and 6, respectively. The single GbCMT2 was located on chromosome 2. Each member of GbDRMs was positioned separately on five different chromosomes. GbDNMT3 was located on chromosome 9. Tandem duplication was not found in any DNA methyltransferase family ( Figure 3).

Chromosomal Location and Tandem Duplication of DNA Methylation Pathway Genes in the G. biloba Genome
Nine GbDNMTs were found to be distributed on five chromosomes with a variable distribution: GbMET1a and GbMET1b were located on chromosomes 1 and 6, respectively. The single GbCMT2 was located on chromosome 2. Each member of GbDRMs was positioned separately on five different chromosomes. GbDNMT3 was located on chromosome 9. Tandem duplication was not found in any DNA methyltransferase family ( Figure 3). Twenty RdDM related genes were distributed on the seven chromosomes of G. biloba. Four GbDCLs were located on chromosomes 1, 4, and 5, respectively; six GbAGOs were found on chromosomes 1, 9, and 13; and 10 GbRDRs were gathered on chromosomes 2 and 3. With more than a 75% similarity of the coding sequence (CDS), tandem duplication was observed for the GbDCL3a and GbDCL3b on chromosome 1, GbRDR1e and 1f on chromosome 2, and GbRDR6a and 6c on chromosome 3 (Figure 3).

Expression Patterns of DNA Methylation Pathway Genes in Various Tissues of G. biloba
DNA methylation is involved in various aspects of plant growth and development; even sex determination [2,50]. To investigate the expression pattern of DNA methylation pathway genes at different growth and development stages of various tissues, we collected root, stem, leaf, flower, and fruit samples from seedlings or adult plants of G. biloba for the transcriptomic analysis. Expressions from some other tissues, including kernels and cambium, were also included using the public dataset. The results showed the different tissue specificities among the genes (Figure 4a). For GbDNMTs, GbDRMs had a relatively higher expression then GbMETs and GbCMT. Almost all of the GbDNMTs showed the highest expression in female flower (OS) and fruit (IF and MF) tissues, medium expression in other vegetative tissues (root, stem, and cambium), and low expression in leaves and kernels. GbDRMb and GbDRM2c were constitutively expressed in all tissues, while GbDRM2b was preferentially expressed in the different aged cambium, suggesting their different functions. Twenty RdDM related genes were distributed on the seven chromosomes of G. biloba. Four GbDCLs were located on chromosomes 1, 4, and 5, respectively; six GbAGOs were found on chromosomes 1, 9, and 13; and 10 GbRDRs were gathered on chromosomes 2 and 3. With more than a 75% similarity of the coding sequence (CDS), tandem duplication was observed for the GbDCL3a and GbDCL3b on chromosome 1, GbRDR1e and 1f on chromosome 2, and GbRDR6a and 6c on chromosome 3 (Figure 3).

Expression Patterns of DNA Methylation Pathway Genes in Various Tissues of G. biloba
DNA methylation is involved in various aspects of plant growth and development; even sex determination [2,50]. To investigate the expression pattern of DNA methylation pathway genes at different growth and development stages of various tissues, we collected root, stem, leaf, flower, and fruit samples from seedlings or adult plants of G. biloba for the transcriptomic analysis. Expressions from some other tissues, including kernels and cambium, were also included using the public dataset. The results showed the different tissue specificities among the genes (Figure 4a). For GbDNMTs, GbDRMs had a relatively higher expression then GbMETs and GbCMT. Almost all of the GbDNMTs showed the highest expression in female flower (OS) and fruit (IF and MF) tissues, medium expression in other vegetative tissues (root, stem, and cambium), and low expression in leaves and kernels. GbDRMb and GbDRM2c were constitutively expressed in all tissues, while GbDRM2b was preferentially expressed in the different aged cambium, suggesting their different functions. Furthermore, GbDNMT3 was hardly detectable in any tissue, although it was identified (Figure 4a).  Genes involved in the RdDM showed distinct expression patterns. GbDCL1 was expressed in all tissues, but had a slightly lower expression in the root and stem of seedlings, while GbDCL2 was preferentially expressed in kernels. Although GbDCL3a and GbDCL3b are tandem duplicates, they had a higher expression in the cambium, and OS and fruits, respectively, suggesting that their function diverged after the duplication event. Four of the GbAGOs showed the highest expression compared with other families. GbAGO2a and GbAGO18 were highly expressed in almost all tissues. GbAGO10 had a higher expression in the stem, fruits and kernels, and GbAGO2b was specifically expressed in the cambium. Most of the GbRDRs were highly expressed in the cambium and chichi. GbRDR1c and GbRDR6a also showed a higher expression in OS and fruits; GbRDR6c was preferentially expressed in the adult root (AR), and GbRDR1f had a higher expression in mature fruit and early stages of kernels (Figure 4a). These results showed a very differentiated expression pattern between different families and among different members of the same family.

Expression Changes of DNA Methylation Pathway Genes under Different Abiotic Stresses in G. biloba
To study the function of the genes in the abiotic stress responses, we treated the seedlings of G. biloba with drought, a high temperature, a high concentration of salt, and methyl jasmonate (MeJA), and the expressions of DNA methylation pathway genes were then detected. The expression under ultraviolet (UV) treatment downloaded from NCBI was also included. The results showed that, under each treatment, at least one member from each family exhibited a changed expression, and several Genes involved in the RdDM showed distinct expression patterns. GbDCL1 was expressed in all tissues, but had a slightly lower expression in the root and stem of seedlings, while GbDCL2 was preferentially expressed in kernels. Although GbDCL3a and GbDCL3b are tandem duplicates, they had a higher expression in the cambium, and OS and fruits, respectively, suggesting that their function diverged after the duplication event. Four of the GbAGOs showed the highest expression compared with other families. GbAGO2a and GbAGO18 were highly expressed in almost all tissues. GbAGO10 had a higher expression in the stem, fruits and kernels, and GbAGO2b was specifically expressed in the cambium. Most of the GbRDRs were highly expressed in the cambium and chichi. GbRDR1c and GbRDR6a also showed a higher expression in OS and fruits; GbRDR6c was preferentially expressed in the adult root (AR), and GbRDR1f had a higher expression in mature fruit and early stages of kernels (Figure 4a). These results showed a very differentiated expression pattern between different families and among different members of the same family.

Expression Changes of DNA Methylation Pathway Genes under Different Abiotic Stresses in G. biloba
To study the function of the genes in the abiotic stress responses, we treated the seedlings of G. biloba with drought, a high temperature, a high concentration of salt, and methyl jasmonate (MeJA), and the expressions of DNA methylation pathway genes were then detected. The expression under ultraviolet (UV) treatment downloaded from NCBI was also included. The results showed that, under each treatment, at least one member from each family exhibited a changed expression, and several different members responded to different stresses (Figure 4b). Most of the changed genes were involved in multiple stresses, such as GbCMT2, GbDRM2a, GbDCL3b, GbAGO2b/3, and GbRDR1a/1b/1c/6b, while some others only responded to specific stresses, such as GbMET1b, GbDRMa/2b/2c, GbDCL2, and GbAGO2a/10. For the responding GbDNMTs, all genes were down-regulated to different extents under different stresses, except for GbCMT2, which was significantly up-regulated (by 2.22-fold) under UV treatment. GbDCL3b showed an increased expression under high temperature, salt, and MeJA treatment, but a decreased expression under drought. GbAGO2a and GbAGO3 were significantly up-regulated under heat stress, while GbAGO2b/3/10 was down-regulated under short-time drought stress. GbAGO2b was also significantly down-regulated under salt and UV stress. GbRDRs were the most differentially regulated under different treatments. Both GbRDR1b and GbRDR1c were involved in drought stress, but were changed oppositely. GbRDR1b also displayed an increased expression under UV stress. However, GbRDR6b was significantly down-regulated under salt stress (Figure 4b).
In total, more DNA methylation pathway genes were involved under the drought, high temperature, and salt stresses than under MeJA and UV stresses, indicating the different roles of DNA methylation in different abiotic stress responses.

Coexpression Analysis of DNA Methylation Pathway Genes in G. biloba
To explore the function of DNA methylation pathway genes, we downloaded all 135 available public transcriptomic data of G. biloba samples, which include varied tissues, mutants, and treatments, from NCBI for the gene coexpression analysis. After filtering out the low-quality data, expressions from 126 samples were used. The expression profiles across all samples showed three main patterns: group I genes (pink) and group II genes (blue) are preferentially expressed in a different subset of samples, respectively, while group III genes are constitutively expressed in all of the samples (Figure 5a). In addition, GbDNMT3 was not only absent in previous tissues, but also not expressed (FPKM < 1) in any of the samples here, strongly suggesting that it may exist as a pseudogene. Further retrieval of the sample information with high concentrated expressions indicated that group I genes are highly expressed in the ovulate strobilus and fruit, and group II genes are more specifically expressed in the cambium and root (Figure 5a, Table S4). A coexpression analysis was then performed and the network with a Pearson correlation coefficient (PCC) higher than 0.65 was constructed. Consistently, group I and group II genes clustered together, respectively, while group III genes were not closely correlated with each other. In group I, the 11 genes included three families of GbDNMTs (GbMETs, GbCMT, and three GbDRMs), two GbDCLs, two GbAGOs, and one GbRDR. In group II, only RdDM-related genes were involved, which included GbDRM2b, GbAGO2b/3, and six GbRDRs (Figure 5b). This suggests that all of the DNA methylation pathway genes may function together as different machines to play a role in different tissues or different growth stages in G. biloba.

Weighted Gene Co-Expression Network Analysis (WGCNA) of DNA Methylation Pathway Genes
The same dataset was then used for the identification of genes coexpressed with DNA methylation pathway genes through weighted gene co-expression network analysis (WGCNA). In total, 18 expressed gene modules were detected, with an average size of 1618 genes, the most genes (4712) in the turquoise module, and the least genes (471) in the grey60 module (Figure 6a).

Weighted Gene Co-expression Network Analysis (WGCNA) of DNA Methylation Pathway Genes
The same dataset was then used for the identification of genes coexpressed with DNA methylation pathway genes through weighted gene co-expression network analysis (WGCNA). In total, 18 expressed gene modules were detected, with an average size of 1618 genes, the most genes (4712) in the turquoise module, and the least genes (471) in the grey60 module (Figure 6a). Relating modules to the expression of DNA methylation pathway genes demonstrated that different modules were correlated with different sets of genes (Figure 6b). The turquoise module was the most highly correlated with group I DNA methylation genes, with most of the PCCs being higher than 0.9, followed by the green, mignightblue and grey60 modules, which were also significantly correlated with the group I genes. The brown module had a close relationship with group II DNA methylation genes, with an average PCC of 0.8, while the blue module showed a significant negative correlation with both groups I and II genes. In addition, some other modules were correlated with other individual DNA methylation genes, such as the salmon module with GbRDR6c; pink and salmon modules with GbAGO2a; and greenyellow, black, and tan modules with GbDCL2 (Figure 6b).
To investigate the detailed roles that DNA methylation plays, gene ontology (GO) enrichment analysis was performed for the six highly correlated modules (Figure 7). The results show the very distinguishing enriched biological processes of GO for each module. In the four modules associated with group I, genes of the turquoise module are mainly involved in cell proliferation, genes related to lipid biosynthesis and anion transport are enriched in the midnightblue module, cellular localization-related genes are enriched in the green module, and genes in the grey60 module are involved in polysaccharide metabolism. In the brown module, which is highly correlated with group II DNA methylation genes, signal transduction, cell communication, and the response to stimulus are highly enriched biological processes. In the blue module, which is negatively correlated with almost all of the DNA methylation genes, the photosynthesis and cation transport-related genes are mostly enriched. These results demonstrate that different groups of DNA methylation genes may have differentiated functions, which involve different upstream-or downstream-genes. Forests 2020, 11, x FOR PEER REVIEW 12 of 20 Relating modules to the expression of DNA methylation pathway genes demonstrated that different modules were correlated with different sets of genes (Figure 6b). The turquoise module was the most highly correlated with group I DNA methylation genes, with most of the PCCs being higher than 0.9, followed by the green, mignightblue and grey60 modules, which were also significantly correlated with the group I genes. The brown module had a close relationship with group II DNA methylation genes, with an average PCC of 0.8, while the blue module showed a significant negative correlation with both groups I and II genes. In addition, some other modules were correlated with other individual DNA methylation genes, such as the salmon module with GbRDR6c; pink and salmon modules with GbAGO2a; and greenyellow, black, and tan modules with GbDCL2 (Figure 6b).
To investigate the detailed roles that DNA methylation plays, gene ontology (GO) enrichment analysis was performed for the six highly correlated modules (Figure 7). The results show the very distinguishing enriched biological processes of GO for each module. In the four modules associated with group I, genes of the turquoise module are mainly involved in cell proliferation, genes related to lipid biosynthesis and anion transport are enriched in the midnightblue module, cellular localization-related genes are enriched in the green module, and genes in the grey60 module are involved in polysaccharide metabolism. In the brown module, which is highly correlated with group II DNA methylation genes, signal transduction, cell communication, and the response to stimulus are highly enriched biological processes. In the blue module, which is negatively correlated with almost all of the DNA methylation genes, the photosynthesis and cation transport-related genes are mostly enriched. These results demonstrate that different groups of DNA methylation genes may have differentiated functions, which involve different upstream-or downstream-genes.

Key Genes Coexpressed with DNA Methylation Pathway Genes
Focusing on the whole DNA methylation pathway, instead of the individual methylation genes, we identified the top 30 genes that coexpressed with the whole group I or group II DNA methylation genes, respectively ( Figure 8, Table S5). Reasonably, the group I coexpressed genes all belong to the turquoise module from the WGCNA. Consistent with the GO enrichment results, it includes the genes directly involved in DNA replication and the cell cycle, such as replication factor C subunit 2, DNA helicase, double strand RNA binding protein, chromatin assembly factor, kinetochore protein, chromatin remodeling 24, and histone H2AXa. It also has some genes encoding transcription factors, such as the NAC domain protein, receptor protein kinase-like protein, and protease with unknown functions. The group II coexpressed genes all fall into the brown module, including coding genes of RNA-helicase, E3 ubiquitin-protein ligase, protein kinases, transcription factors, and more unknown proteins, which may function in cell communication and the response to stimulus (Figure 8). These genes provide the candidates that the DNA methylation pathway genes may interact with or regulate directly for downstream functions.

Key Genes Coexpressed with DNA Methylation Pathway Genes
Focusing on the whole DNA methylation pathway, instead of the individual methylation genes, we identified the top 30 genes that coexpressed with the whole group I or group II DNA methylation genes, respectively ( Figure 8, Table S5). Reasonably, the group I coexpressed genes all belong to the turquoise module from the WGCNA. Consistent with the GO enrichment results, it includes the genes directly involved in DNA replication and the cell cycle, such as replication factor C subunit 2, DNA helicase, double strand RNA binding protein, chromatin assembly factor, kinetochore protein, chromatin remodeling 24, and histone H2AXa. It also has some genes encoding transcription factors, such as the NAC domain protein, receptor protein kinase-like protein, and protease with unknown functions. The group II coexpressed genes all fall into the brown module, including coding genes of RNAhelicase, E3 ubiquitin-protein ligase, protein kinases, transcription factors, and more unknown proteins, which may function in cell communication and the response to stimulus (Figure 8). These

The Structure of DNA Methylation Pathway Genes
In this study, four subfamilies of DNMTs were identified in G. biloba, among which METs, CMTs, and DRMs are commonly present in plants, while the fourth one-DNMT3-is absent in most angiosperms and had only been functionally identified in the gymnosperm, Physcomitrella patens. It

The Structure of DNA Methylation Pathway Genes
In this study, four subfamilies of DNMTs were identified in G. biloba, among which METs, CMTs, and DRMs are commonly present in plants, while the fourth one-DNMT3-is absent in most angiosperms and had only been functionally identified in the gymnosperm, Physcomitrella patens. It has been reported that DNMT3 mediates the de novo methylation of CG and CHG in a way that is independent from DRM [7]. However, the expression of GbDNMT3 was barely detectable in various tissues and all of the public data, suggesting that GbDNMT3 may be a pseudogene and has lost its function in G. biloba. Moreover, most angiosperms have DNMT2 [51][52][53], but we did not find it in G. biloba. The function of DNMT2 may be replaced by DRM. GbMETs are conserved between G. biloba and Arabidopsis, and all have two DNA-methylase domains [51].
Only three classes of DCL genes have been identified in G. biloba, including GbDCL1, GbDCL2, and GbDCL3 (containing GbDCL3a and GbDCL3b), which is different from a previous study, which revealed four families containing one member each [54]. This was possibly because of the updated genome version we used in this study. Moreover, different DCLs are responsible for the formation of specific small RNAs. DCL1 plays an essential role in the formation of microRNA (miRNA). DCL2 is responsible for producing 22-nt siRNA, and DCL4 is related to the generation of trans-acting small interfering RNAs (ta-siRNAs) and 21-nt siRNAs [55]. DCL3 generates 24-nt siRNAs which participate in RdDM [56,57]. However, the functions of various DCLs in G. biloba still need to be studied. Based on protein sequence analysis, the identity of GbDCL3a and GbDCL3b is 54.39%, while the identity of GbDCL3a and Ptc|DCL3a is 59.63%, and the identity of GbDCL3b and Ptc|DCL3b is 60.16%. This suggests that GbDCL3a and GbDCL3b were separated before the divergence of the two species, consistent with the results obtained for Pinus tabulaeformis [37].
The AGO family is a highly basic binding protein, which is characterized by PAZ and PIWI domains [58]. All kinds of sRNAs produced by DCLs combine with the AGO protein to form the core of RNA-induced silencing complexes (RISCs) [55]. Six AGOs were identified in G. biloba, including GbAGO2a, 2b, 3, 7, 10, and 18. Interestingly, AGO1 and AGO4 were identified in most of the plants, while not obtained in our study. Moreover, GbAGO2a, GbAGO2b, and GbAGO3 are close to each other in the phylogenetic tree, and their positions on the chromosomes are also close, which is consistent with previous studies on Arabidopsis. However, the results show that the function of AtAGO3 is different from that of AtAGO2 and it mainly recruits 24-nt siRNA to regulate epigenetic silencing [59].
The function of RDRs is to synthesize the precursor of double-stranded RNA with the single-stranded RNA template [18]. There are two types of RDRs in Arabidopsis thaliana, including RDRα (AtRDR1, AtRDR2 and AtRDR6), which has been well-studied, and RDRγ (AtRDR3, AtRDR4 and AtRDR5), which still has an unknown function [60,61]. The major function of AtRDR1 is an anti-virus function. Using the single-stranded RNA produced by the virus, AtRDR2 is related to the synthesis of endogenous siRNAs (hsiRNAs), and AtRDR6 is involved in the formation of ta-siRNAs and nat-siRNAs [60]. In the present study, the G. biloba genome encoded three types of RDR genes, including RDR1, RDR3 and RDR6. GbRDR3 is the only acidic protein of the RDR family and is located on chromosome 1 alone, while the function of its homolog in Arabidopsis is still unclear.

The Expression Pattern of DNA Methylation Pathway Genes in Various Tissues and under Abiotic Stresses
The transcript abundance of methylation pathway genes is uneven in different gene families and family members. It has been reported that DNA methylation plays an important role in reproductive development and sex determination [62]. In Pinus tabuliformis, most of the DNA methylation-related sRNA pathway genes have higher expression levels in female than in male cones [37]. In Populus tomentosa, DNA methylation genes MET1 and DECREASED DNA METHYLATION 1 (DDM1) are located in a sex-determination region in the sex chromosome, and the expression levels in female flowers are higher than in male flowers [50]. In maize, sex determination is associated with DNA methylation genes [63]. In the present study, half of the DNA methylation pathway genes, especially the group I genes, had higher expression levels in the ovulate strobilus (female flower) than in the microstrobilus (male flower), consistent with a previous study demonstrating that the differential expression levels of DNA methylation genes may be related to sex determination in G. biloba [62]. Some genes also had higher expression levels in fruit, suggesting that DNA methylation may be involved in reproductive development. In addition, the RdDM-related genes are more highly expressed in meristem tissues than in tissues that grow primarily by cell expansion in Arabidopsis thaliana [64]. In the present study, most of the genes also had high expression levels in the cambium compared with the other vegetative tissues, which is consistent with the study on Arabidopsis thaliana. This indicates that DNA methylation may play an indispensable role in the vegetative growth of G. biloba. Furthermore, GbRDR1b and GbRDR1f and GbRDR6a and GbRDR6c are both tandem duplication pairs with similar expression patterns, while GbDCL3a and GbDCL3b have quite different expressions in various tissues and under stresses, suggesting the divergence of their functions after duplication in G. biloba.
Most of the DNA methylation pathway genes (15/29) in this study showed a changed expression under different abiotic stresses. GbAGO2a was significantly up-regulated (5.25-fold) under heat stress, which is consistent with the change after two-hour treatment in maize. GbAGO2b exhibited a decreased expression under salt and drought treatments, similar to the expression changes in maize [65]. GbRDR1a was significantly up-regulated after six-hour drought treatment, which was congruent with the result obtained in tomato [18]. However, GbDRM2a was down-regulated under salt stress, while DRM2 showed an increased expression in chickpea roots [66]. Similarly, the expression of GbDCL3 was decreased (0.32-fold after 6h and 0.47-fold after 12 h) under drought stress, while it was increased in pepper [58]. These results indicate that some gene family members are functionally conserved across species, while other members play roles in different ways. Moreover, certain G. biloba genes (such as GbCMT2, GbDCL3b, and GbAGO3) have distinct expression patterns under different abiotic stresses, suggesting that they might respond to different abiotic stresses with specific mechanisms [67].

The Differentiated Functions of DNA Methylation Pathway Genes
The coexpression network of DNA methylation pathway genes showed that they were clustered into two groups. Group I contains three types of GbDNMTs and three other RdDM-related families, which covers the genes responsible for the de novo methylation and maintenance of all three types of DNA methylation (CG, CHG, and CHH). All of these genes were preferentially expressed in the ovulate strobilus and fruits in G. biloba. The genes of group II are only RdDM-related and more specifically expressed in the cambium. WGCNA analysis revealed that the genes of group I and group II are highly correlated with different modules, in which distinct GO enrichments were observed, e.g., group I genes were closely related to the turquoise module where cell proliferation genes were enriched, while the cell communication and signal transduction genes were enriched in the brown module, which significantly relates to group II genes. This indicates that multi-member families in the same pathway may differentiate into separate machines to work independently, with same or different functions, in the entire or specific tissues, respectively. This agrees with a previous report on rice, illustrating that 27 starch synthesis genes of seven families were clustered into two groups, and were responsible for starch biosynthesis in the leaf (source) and seed (sink), respectively [68].
It should be noted that there were no DCLs in group II genes. Previous studies have shown that two-thirds of the target region of RdDM still existed even when most of the siRNA disappeared in dcl mutants, signifying that there is a DCL-independent RdDM pathway mediated by some DCL-independent siRNA in Arabidopsis thaliana [11,69]. Our results imply that a DCL-independent RdDM pathway may play a role in the growth and development of G. biloba, especially in the cambium. However, this needs to be further confirmed by experiments.

Conclusions
In this study, we systematically identified DNA methylation pathway genes by bioinformatics methods, and analyzed their potential functions by WGCNA in G. biloba. Twenty-nine genes of seven families were identified and clustered into two groups according to their expression. Through WGCNA, two modules (turquoise and brown modules) demonstrated the highest correlation with group I and group II DNA methylation genes, respectively, in which the genes encoding enzymes or transcription factor, such as Gb_08699, Gb_08989 and Gb_15278 in the turquoise module, were believed to play an important role in DNA repair or cell proliferation, while the genes in the brown module, such as the transcription factors Gb_07584 and Gb_06045, may be very important for adapting to environmental stress. In addition, most of the DNA methylation pathway genes were highly expressed in reproductive tissues and meristem tissues, consistent with the previous research in rice and Arabidopsis thaliana [10,64]. In summary, we identified the DNA methylation pathway genes and explored their expression pattern in various tissues and environmental conditions, and further identified the genes highly correlated with them, laying a foundation for DNA methylation related research in G. biloba. A further DNA methylation level test would be helpful for better understanding its epigenetic effects. With the development of research methods, DNA methylation-related gene editing, the modification of specific methylation sites, or developing epimarkers could greatly boost molecular design breeding or forest selection and cultivation.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/11/10/1076/s1, Figure S1: Motif and gene structure of DNA methylation genes in Ginkgo biloba; Figure S2: Visualization of the eigengene network representing the relationships among the modules and the DNA methylation pathway genes; Figure S3: Gene Ontology (GO) enrichment of genes from six correlated WGCNA modules. p values were calculated using topGO; Table S1: Pfam and SMART domain IDs of DNA methylation pathway proteins; Table S2: Gene ID of MET, CMT, AGO, RDR and DCL gene families used for the phylogenetic analysis; Table S3: RNA-seq data of different tissue samples collected from this study and SRA database; Table S4: RNA-seq data of 126 samples downloaded from NCBI SRA database and used for WGCNA; Table S5: Top 30 coexpressed genes with two groups of DNA methylation pathway genes.

Conflicts of Interest:
The authors declare no conflict of interest.