Cytoplasm Types Affect DNA Methylation among Different Cytoplasmic Male Sterility Lines and Their Maintainer Line in Soybean (Glycine max L.).

Cytoplasmic male sterility (CMS) lines and their maintainer line have the same nucleus but different cytoplasm types. We used three soybean (Glycine max L.) CMS lines, JLCMS9A, JLCMSZ9A, and JLCMSPI9A, and their maintainer line, JLCMS9B, to explore whether methylation levels differed in their nuclei. Whole-genome bisulfite sequencing of these four lines was performed. The results show that the cytosine methylation level in the maintainer line was lower than in the CMS lines. Compared with JLCMS9B, the Gene Ontology (GO) enrichment analysis of DMR (differentially methylated region, DMR)-related genes of JLCMS9A revealed that their different 5-methylcytosine backgrounds were enriched in molecular function, whereas JLCMSZ9A and JLCMSPI9A were enriched in biological process and cellular component. The Kyoto Encyclopedia of Genes and Genome (KEGG) analysis of DMR-related genes and different methylated promoter regions in different cytosine contexts, hypomethylation or hypermethylation, showed that the numbers of DMR-related genes and promoter regions were clearly different. According to the DNA methylation and genetic distances separately, JLCMS9A clustered with JLCMS9B, and JLCMSPI9A with JLCMSZ9A. Thus, the effects of different cytoplasm types on DNA methylation were significantly different. This may be related to their genetic distances revealed by re-sequencing these lines. The detected DMR-related genes and pathways that are probably associated with CMS are also discussed.


Introduction
Instead of traditional sexual cross-breeding, the seed industry now primarily uses cytoplasmic male sterility (CMS) for hybrid seed production [1]. Because CMS lines generate pollen abortion, using a CMS system avoids the need to artificially remove the maternal line pollen in cross-breeding programs. CMS systems also improve the genetic purity of hybrid seeds and increase seed yield. A CMS system involves three-lines, a CMS line (A line), its maintainer line (B line), and a restoring line (R line) [2]. The A line is controlled by both nuclear and cytoplasmic genes and is the donor of the sterile genes in the cytoplasm. The B line is the donor of the fertility genes in the cytoplasm and nucleus. When a B line has been backcrossed with the corresponding A line for over 10 generations, the two lines share the same nucleus. To reproduce an A line, the A line (maternal line) is crossed to obtain fertile pollen from the B line (paternal line). Soybean (Glycine max L.) is a major crop that provides protein and oil. The first CMS systems were reported by Davis [3], but no further reported.
DNA methylation is an important regulatory mode in eukaryotes that plays major roles in maintaining genome stability and regulating gene expression. Most DNA methylation occurs as 5-methylcytosine [10]. The level of DNA methylation in a genome depends not only on the establishment and maintenance of DNA methylation (hypermethylation), but also on the loss of DNA methylation (hypomethylation). In general, hypomethylation induces gene expression and activates transposable elements, whereas hypermethylation reduces gene expression and inactivates transposable elements [11]. In plants and other organisms, DNA methylation occurs in three sequence contexts, CG (or CpG), CHG, and CHH (where H is A, T, or C) [12].
Differences in DNA methylation patterns between CMS lines and their maintainer lines have been reported in plants [13,14]. In wheat, the DNA methylation was significantly affected by CMS. Three alloplasmic-type A lines with different male sterile cytoplasms, ms (S) -90-110 (S-type), ms (T) -90-110 (T-type) and ms (K) -90-110 (K-type) were used to do methylation-sensitive amplified polymorphism (MSAP). K-type cytoplasm induced more differences than T-type or S-type cytoplasm, as was indicated by the ratios of methylated to fully methylated sites detected by MSAP. Because the K-type was between two genera (Aegilops and Triticum), whereas the T-and S-types were within Triticum genus between Triticum spelta and Triticum timopheevii species the genetic distance among the cytoplasm was greater among nucleus for the K-type than for the T-and S-types. [13]. These differences in genetic distance may explain the variations in methylation patterns. Although MSAP techniques have been used widely in plants, this method is not good at detecting cytosine methylation at non-CCGG sites [15]. Whole-genome bisulfite sequencing (WGBS) is a sensitive and stable method for characterizing genome-wide methylation patterns at single-base resolution and is the gold standard for detecting methylation at CG, CHG, and CHH sites [16,17]. Until now, there is only limited information available about whether DNA methylation is affected by different CMS types in hybrid soybean. Some reports have speculated that methylation levels, especially the methylation levels of some differentially methylated region (DMR)-related genes, are probably affected by CMS [18,19]. In this study, to analyze polymorphisms of DNA methylation and DNA methylation associated with CMS in soybean, we compared the cytoplasm types of three CMS lines (A lines) with the cytoplasm type of the maintainer line (B line) by WGBS. Compared with the shared maintainer line, the three alloplasmic types sterile lines are whether different in their genomic DNA methylation and what difference between them. At the same time, in order to understand what causing to the difference, we also re-sequenced the four lines, expecting to analyze the cytoplasm types affect DNA methylation reason.

Plant Materials
The plant materials used in this study were developed at the Jilin Academy of Agricultural Sciences (Changchun, China). The CMS RN-type was found in 1994 [20], followed by the CMS ZDand XXT-types in 1998 [21]. In particular, the CMS line JLCMS9A (RN-type) was used as the female parent to produce the first commercialized hybrid HYBSOY-1 in the world in 2002 [22]. The soybean CMS lines (A lines), JLCMS9A, JLCMSPI9A, and JLCMSZ9A, have three types of sterile cytoplasm, RN-, ZD-, and XXT-types, respectively, and share the same maintainer line, JLCMS9B (B line). Seeds from the three sterile lines were developed over more than 15 generations by backcrosses with the maintainer line JLCMS9B as the male parent. For the present study, the CMS and maintainer lines were grown in the field. Intact young terminal leaves were harvested and immediately stored in liquid nitrogen for DNA extraction. Total genomic DNA was extracted using the cetyltrimethylammonium bromide method [23].

WGBS Library Construction
After agarose gel electrophoresis detecting no degradation, Qubit 2.0 fluorimeter (Life Technologies, CA, USA) was used to detect the concentration of the genomic DNA, and a NanoPhotometer ® spectrophotometer (Implen, CA, USA) was used to detect the purity. The genomic DNA was sonicated into 200-300-bp DNA fragments using a Covaris S220 ultrasonicator (Covaris, MA, USA). The ends of the fragmented DNA were repaired and poly (A) was added at the 3 ends. All the cytosines had methylated DNA adaptors added. After bisulfite treatment using an EZ DNA Methylation Gold Kit (Zymo Research, Irvine, CA, USA), the unmethylated cytosines were converted to uracil, and then PCR amplified to thymine; while methylated cytosines remained after PCR. Finally, the DNA libraries were obtained. After constructing the libraries, initial quantification was calculated using Qubit 2.0, and each library was diluted to 1 ng/µL. The insertion length of the library was detected using the Agilent 2100 Bioanalyzer platform (Agilent, Santa Clara, CA, USA). The effective concentration of each library was quantified accurately by real-time quantitative PCR (Q-PCR; library effective concentration >2 nM) to ensure the quality of the library. The qualified libraries were sequenced by the Beijing Novo Zhiyuan Company (China) on an Illumina HiSeq Sequencing Platform (HiSeq PE125) using double-ended sequencing.

WGBS Data Analysis
After removing the adaptors, raw reads with an N ratio <10% (N represents undetermined bases), low quality bases (quantity score <20), and high proportion of low-quality bases (>50% of the whole read) were removed. The remaining reads were considered clean reads. Bismark software (https://www.bioinformatics.babraham.ac.uk/projects/bismark/) was used to compare the clean reads with the soybean reference genome sequence (https://phytozome.jgi.doe.gov/pz/portal.html#!info? alias=Org_Gmax). To identify methylation sites, the methylation level was calculated as ML = mC/(mC + umC) [24], where ML is the methylation level, and mC and umC are the numbers of methylated and unmethylated cytosines, respectively. To determine the influence of the bisulfite conversion rate, the methylation level was corrected as follows: ML_corrected = (ML−r)/(1−r), where ML_corrected is the corrected methylation level and r is the bisulfite non-conversion rate [25]. DMRs were identified using the Bioconductor package DSS [26,27]. The distribution and significance of the mapped DMR locations in the genome were analyzed using Circos maps [28]. During the methylation level analysis, the regions with the most differences in methylation levels between samples are generally selected for clustering and classification analyses [29]. Gene ontology (GO) functional enrichment analysis was performed using GOseq software [30,31], and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were compared and analyzed [32]. The corrected data analysis was tested using p-values, and p < 0.05 was selected as the significance value. We also used the A lines and B line for clustering analysis during the methylation level analysis [33].

Re-Sequencing of the A Lines and the B Line
After detecting, method same as part of 2.2, DNA extracted from the soybean CMS lines (A lines) JLCMS9A, JLCMSPI9A, and JLCMSZ9A and their maintainer line JLCMS9B (B line) were randomly fragmented by sonication to 350 bp, then a Truseq Library Construction Kit (Illumina, TX, USA) was used to construct the libraries. The ends of the fragmented DNA were repaired, a poly(A) tail was added, sequencing adaptors were removed, then the fragments were purified and amplified by PCR. The libraries were sequenced on an Illumina Novaseq 6000 platform (Illumina, TX, USA). After constructing the libraries, the detecting method of the library is the same as part of 2.2. Clean data were obtained by filtering the raw data and the clean reads were aligned against the soybean reference genome (https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Gmax).
The re-sequencing data were analyzed to detect single nucleotide polymorphisms (SNPs) and insertions/deletions (Indels) (<50 bp) using MPILEUP in SAMTOOLS [34] and annotated using ANNOVAR [35]. SNPs can be used to calculate distances between different lines. A phylogenetic tree Plants 2020, 9, 385 4 of 20 was constructed for the A lines and the B line using the neighbor joining method with 100 bootstrap replicates using TreeBeST 1.9.2 (http://treesoft.sourceforge.net/treebest.shtml). Methylation levels were calculated using a 2-Kb/bin sequencing environment, and a Pearson's correlation coefficient analysis was performed [36].

Whole DNA Methylation Levels of Soybean Lines with Different Cytoplasm types
We obtained 35.51-39.87 Gb of clean reads from the four libraries (three CMS A lines and the maintainer B line). The soybean genome is 1.1 Gb, so the sequencing depth was approximately 20×. The Q30 values were 94.89-96.97%, and the bisulfite conversion rate was over 99%. The unique mapping rate to the soybean reference genome was 65.27-74.53% (Table 1). These results confirm the WGBS data were enough to continue the analyses. The GC content of the JLCMS9A and JLCM9B was 24.14% and 23.23%, respectively. Whereas the GC content of the JLCMSPI9A and JLCMZ9A was lower, at 20.35% and 20.74%, respectively (Table 1).  In the four soybean lines, the methylated cytosine (mC) levels were 9.76-11.27%. Among the methylated cytosines, the methylated cytosine at CpG (mCpG) levels were 42.14-46.65%, methylated cytosine at CHG (H can be A, T, or G; mCHG) levels were 31.8-36.65%, and methylated cytosine at CHH (mCHH) levels were 1.91-3.29% (Table 2). We calculated the average cytosine methylation levels in the 2-Kb upstream sequences of the transcription initiation sites and 2-Kb downstream sequences of the transcription termination sites in each of the four genomes and the results are shown in Figure 1. Compared JLCMS9B among three sterile lines in the gene bodies, respective the 2-Kb upstream and 2-Kb downstream sequences the mCHH methylation level at less than 10% was lower than the mCG level or mCHG level. In gene body region, mCG methylation level was highest, at~40-50% 50%, in any cytosine context for JLCMS9A. Additionally, in the gene bodies, respective the 2-Kb upstream and 2-Kb downstream sequences methylation patterns of JLCMSZ9A and JLCMSPI9A were very similar. The trends of the methylation Plants 2020, 9, 385 5 of 20 in any cytosine context for both JLCMSZ9A and JLCMSPI9A were different from that of JLCMS9B. To better understand the DNA methylation levels of different functional genomic elements (promoters, exons, introns) and repeat sequences in each methylation context, cytosine-containing loci were located in different functional regions of the genome (Figure 2). Among them, intronic regions had the highest level of DNA methylation, followed by promoters, exons, and 5' untranslated regions, and 3' untranslated regions had the lowest level of DNA methylation in the CpG and CHG contexts. The methylation level curves for all cytosine contexts in the functional regions in the CMS lines almost coincided with those in JLCMS9B.
We calculated the average cytosine methylation levels in the 2-Kb upstream sequences of the transcription initiation sites and 2-Kb downstream sequences of the transcription termination sites in each of the four genomes and the results are shown in Figure 1. Compared JLCMS9B among three sterile lines in the gene bodies, respective the 2-Kb upstream and 2-Kb downstream sequences the mCHH methylation level at less than 10% was lower than the mCG level or mCHG level. In gene body region, mCG methylation level was highest, at ~40-50% 50%, in any cytosine context for JLCMS9A. Additionally, in the gene bodies, respective the 2-Kb upstream and 2-Kb downstream sequences methylation patterns of JLCMSZ9A and JLCMSPI9A were very similar. The trends of the methylation in any cytosine context for both JLCMSZ9A and JLCMSPI9A were different from that of JLCMS9B. To better understand the DNA methylation levels of different functional genomic elements (promoters, exons, introns) and repeat sequences in each methylation context, cytosine-containing loci were located in different functional regions of the genome (Figure 2). Among them, intronic regions had the highest level of DNA methylation, followed by promoters, exons, and 5' untranslated regions, and 3' untranslated regions had the lowest level of DNA methylation in the CpG and CHG contexts. The methylation level curves for all cytosine contexts in the functional regions in the CMS lines almost coincided with those in JLCMS9B.   The correlation coefficients among the four lines were >0.50 (as shown in Figure S1). The results show that JLCMSZ9A and JLCMSPI9A, as well as JLCMS9A and JLCMS9B were closest, with correlation coefficients of >0.88 in the three mC contexts ( Figure S1). The correlation coefficients among the four lines were >0.50 (as shown in Figure S1). The results show that JLCMSZ9A and JLCMSPI9A, as well as JLCMS9A and JLCMS9B were closest, with correlation coefficients of >0.88 in the three mC contexts ( Figure S1).

DMR Analyses in the Three CMS Types
We used statistical significance values to reveal significant regional differences in the DMRs (Figure 3). Compared with JLCMS9B, the significant levels of hypermethylation or hypomethylation among the CHH sequences were higher than those among the CG and CHG sequences in the DMRs for JLCMS9A. Compared with JLCMS9B, the significant levels of hypermethylation among the CG and CHG sequences were higher than that of the CHH sequences in the DMRs for JLCMSZ9A and JLCMSPI9A. The GO enrichment analysis of differentially methylated genes (DMGs) in JLCMS9A vs JLCMS9B in the CG background showed they were enriched in molecular function terms, while in JLCMSZ9A vs JLCMS9B and JLCMSPI9A vs JLCMS9B they also were enriched in biological process and cellular component terms compared with JLCMS9A ( Figure 4). The GO enrichment analysis of differentially methylated genes (DMGs) in JLCMS9A vs JLCMS9B in the CG background showed they were enriched in molecular function terms, while in JLCMSZ9A vs JLCMS9B and JLCMSPI9A vs JLCMS9B they also were enriched in biological process and cellular component terms compared with JLCMS9A ( Figure 4).

DNA Methylation Levels and Genetic Distances between Three Soybean CMS Lines and the Maintainer Line
We used the regions that had the most differences in methylation levels to construct an epigenetic map of the three CMS lines and the maintainer line ( Figure 6). The results indicate that the pairs JLCMS9A and JLCMS9B, and JLCMSZ9A and JLCMSPI9A had short epigenetic distances and similar methylation patterns ( Figure 6).

DNA Methylation Levels and Genetic Distances between Three Soybean CMS Lines and the Maintainer Line
We used the regions that had the most differences in methylation levels to construct an epigenetic map of the three CMS lines and the maintainer line ( Figure 6). The results indicate that the pairs JLCMS9A and JLCMS9B, and JLCMSZ9A and JLCMSPI9A had short epigenetic distances and similar methylation patterns ( Figure 6). Figure 6. Epigenetic map of three cytoplasmic male sterility lines and the maintainer line according to differences in their methylation levels. Blue, low methylation level; red high methylation level.

16
We re-sequenced the genomes of the A lines and the B line and constructed a phylogenetic tree to determine their genetic distances (Tables S1 and S2). The results show that the pairs JLCMS9A and JLCMS9B and JLCMSZ9A and JLCMSPI9A shared close genetic distances (Figure 7). This finding is consistent with the results obtained from the epigenetic map ( Figure 6).

DMRs Potentially Involved in CMS among the A Lines and B Line in Soybean
To determine if DMRs in the gene body or promoter regions affect the functions of genes related to CMS, we selected DMGs that had the highest statistical significance in the KEGG pathway enrichment analysis. The selected DMGs are listed in Table 3. We found that the DMGs that encode mitochondrial proteins or the related pathway were enriched in the three A lines and in the B line. For example, the genes encoding ATPase 4 and transcription factors such as WRKY42 and MYB2 were enriched between JLCMS9A and JLCMS9B as well as among JLCMSZ9A, JLCMSZ9A, and JLCMS9B. We re-sequenced the genomes of the A lines and the B line and constructed a phylogenetic tree to determine their genetic distances (Tables S1 and S2). The results show that the pairs JLCMS9A and JLCMS9B and JLCMSZ9A and JLCMSPI9A shared close genetic distances (Figure 7). This finding is consistent with the results obtained from the epigenetic map ( Figure 6).
16 Figure 6. Epigenetic map of three cytoplasmic male sterility lines and the maintainer line according to differences in their methylation levels. Blue, low methylation level; red high methylation level.
We re-sequenced the genomes of the A lines and the B line and constructed a phylogenetic tree to determine their genetic distances (Tables S1 and S2). The results show that the pairs JLCMS9A and JLCMS9B and JLCMSZ9A and JLCMSPI9A shared close genetic distances (Figure 7). This finding is consistent with the results obtained from the epigenetic map ( Figure 6).

DMRs Potentially Involved in CMS among the A Lines and B Line in Soybean
To determine if DMRs in the gene body or promoter regions affect the functions of genes related to CMS, we selected DMGs that had the highest statistical significance in the KEGG pathway enrichment analysis. The selected DMGs are listed in Table 3. We found that the DMGs that encode mitochondrial proteins or the related pathway were enriched in the three A lines and in the B line. For example, the genes encoding ATPase 4 and transcription factors such as WRKY42 and MYB2 were enriched between JLCMS9A and JLCMS9B as well as among JLCMSZ9A, JLCMSZ9A, and JLCMS9B.

DMRs Potentially Involved in CMS among the A Lines and B Line in Soybean
To determine if DMRs in the gene body or promoter regions affect the functions of genes related to CMS, we selected DMGs that had the highest statistical significance in the KEGG pathway enrichment analysis. The selected DMGs are listed in Table 3. We found that the DMGs that encode mitochondrial proteins or the related pathway were enriched in the three A lines and in the B line. For example, the genes encoding ATPase 4 and transcription factors such as WRKY42 and MYB2 were enriched between JLCMS9A and JLCMS9B as well as among JLCMSZ9A, JLCMSZ9A, and JLCMS9B.

Cytoplasmic Effects of DNA Methylation among CMS Lines (A line) and Their Maintainer Line (B line)
As the A lines and the B line have the same nucleus donor, the most genetic differences were in the cytoplasm. We used the CMS lines JLCMS9A, JLCMSZ9A, and JLCMSPI9A to determine whether the cytoplasm type affected their nuclear methylation levels. Different genomic regions have different methylation patterns and perform different biological functions [37], so the differences in DNA methylation levels can help in understanding the roles of DNA methylation at the genome level [38]. We found significant differences in the DNA methylation levels in the cytoplasm of these three CMS lines. The RN-type cytoplasm donor is a landrace, Ru Nan Tian Er Dan [39], The ZD-type cytoplasm donor is cultivar line, ZD8319 [21], The XXT-type cytoplasm donor is not publisedpublished due to the need of patent application [21]. The cytoplasm ZD-and XXT-types in JLCMSPI9A and JLCMSZ9A, respectively, affected the nuclear methylation to a much greater extent than the RN-type in JLCMS9A. We re-sequenced the genomes of the A lines and B lines, which showed their genetic distances were very short, probably an effect of their cytoplasmic genomes. A similar result was also found in wheat [13]. In rice, the difference in the genetic distance between the cytoplasm and nucleus for the WA-type is much greater than for the G-and D-types because the former is between wild and cultivated rice species whereas the G-and D-types is within indica subspecies between African and Asian cultivars. [14]. The authors suggested that these differences may be responsible for the variations in methylation levels in rice genomes.

Enrichment Analysis of DMR-Related Genes (DMGs) Potentially Related to CMS
CMS is most likely related to variations in the mitochondrial genomes of plants [18,40,41]. Five out of the six selected methylated genes in male sterile rice PA64S (sterility) were downregulated than PA64S (fertility), which indicated that DNA methylation is involved in the sterility-fertility transition of PA64S under two environmental conditions [40]. In soybean, genome-wide DNA methylation profiles of the CMS line NJCMS5A and its maintainer NJCMS5B have been reported, and several DMGs that participated in pollen and flower development were identified as possibly associated with CMS [18,41]. These results indicate that different cytoplasm types can affect nuclear DNA methylation levels and that DMRs may lead to sterility-fertility transitions. We also found mitochondrial DMGs and related pathways, including those involving transcription factors WRKY42 and MYB2, in JLCMSZ9A, JLCMSZ9A, and JLCMS9B. A mitochondrial DMG that encodes a WRKY transcription factor was reported recently in soybean [41]. But this gene ID is different to the mitochondrial DMG that encodes WRKY42 detected among the A lines and B line in this study. Despite these findings, the specific mechanism of sterility-fertility transition in soybean is still unclear and the effects of DMGs on sterility need further clarification. For example, further studies into the structure on buds [41] and RNA transcript expression in flower development [19] may help in understanding the relationship between DNA methylation and CMS in plants.

Conclusions
Different cytoplasmic affects on DNA methylation among different male sterile lines and their maintainer line in soybean using WGBS. Significant differences in DNA methylation levels and different methylation regions existed among three different male sterile cytoplasmic types. The ZDand XXT-types of cytoplasm affected the methylation much more than the RN-type, and the genetic distances among these three cytoplasmic types probably lead nuclear methylation levels difference. The relationships among the four lines were the same as determined by both their genetic and epigenetic distances. Compared with JLCMS9B, the GO enrichment analysis of DMR-related genes of JLCMS9A revealed that their different 5-methylcytosine backgrounds were enriched in molecular function, whereas JLCMSZ9A and JLCMSPI9A were enriched in biological process and cellular component. The detected DMR-related genes and pathways that are probably associated with CMS also are discussed.
Supplementary Materials: The following are available online at http://www.mdpi.com/2223-7747/9/3/385/s1. Author Contributions: L.Z. and C.Z. conceived and designed the study. C.L. and Y.L. performed the experiments. B.P. and P.W. collected the plant materials. C.L. and Y.L. analyzed the data. G.Z. modified the data. X.D. and R.L. provided advice and assistance. C.L. wrote the manuscript with contributions from all the authors. C.Z. and L.Z. critically revised the manuscript. All of the authors read and approved the manuscript. All authors have read and agreed to the published version of the manuscript.