Identification and Bioinformatic Analysis of the GmDOG1-Like Family in Soybean and Investigation of Their Expression in Response to Gibberellic Acid and Abscisic Acid

Seed germination is one of the most important stages of the plant life cycle, and DOG1 (Delay of germination1) plays a pivotal regulatory role in seed dormancy and germination. In this study, we identified the DOG1-Like (DOG1L) family in soybean (Glycine max), a staple oil crop worldwide, and investigated the chromosomal distribution, structure and expression patterns of its members. The GmDOG1L family comprises 40 members, which can be divided into six subgroups according to their evolutionary relationships with other known DOG1-Like genes. These GmDOG1Ls are distributed on 18 of the 20 soybean chromosomes, and the number of exons varies greatly among them. Members of the different subgroups have similar motif compositions. qRT-PCR assays showed that the expression of different GmDOG1Ls differed significantly among tissues, and some GmDOG1Ls were expressed primarily in soybean seeds. Gibberellic acid (GA) markedly inhibited the expression of most GmDOG1Ls, whereas abscisic acid (ABA) inhibited the expression of some GmDOG1Ls while promoting that of others. We speculate that some GmDOG1Ls regulate seed dormancy and germination through direct or indirect links to the ABA and GA pathways, within complex interaction networks. This study provides an important theoretical basis for further investigation of the regulatory roles of the GmDOG1L family in soybean seed germination.


Introduction
Soybean is an important legume and is now widely grown in many countries as a staple oil crop [1]. To meet the growing demand for soybean as food, oil and other resources, soybean production must increase, and maize-soybean relay intercropping is one model for increasing soybean yield [2]. Seed germination is an important stage of the plant life cycle: it contributes to the spread and distribution of wild species and enhances the quality and yield of cultivated crops [3].
Most angiosperms begin a new stage of growth and development after seed dormancy, a very useful survival mechanism that protects plants from adverse environments [4,5]. Successful germination of seeds in the field is essential for stable, high yields [6]. Therefore, in-depth investigation of the regulatory mechanisms of seed dormancy and germination is of considerable scientific significance.
DOG1 is a key gene in the control of seed dormancy and germination and was first identified through quantitative trait locus (QTL) analysis in Arabidopsis [7]. Subsequent studies revealed that the amount of DOG1 protein regulates the level of seed dormancy and germination capacity [8,9]. DOG1 starts to accumulate during seed maturation, and its protein structure is altered during after-ripening. It has been reported that DOG1 is stably expressed after maturation and that DOG1 protein abundance in freshly harvested seeds is important for seed dormancy release [8]. Interestingly, several studies have demonstrated that DOG1 is highly conserved in diverse plant species, including Lactuca sativa, Brassica rapa, Hordeum vulgare, Triticum aestivum and Oryza sativa [10][11][12][13].
The phytohormone abscisic acid (ABA) and the DOG1 protein are essential regulators of seed dormancy and germination. DOG1 is believed to be associated with the ABA signaling pathway via clade A type 2C protein phosphatases (PP2Cs) [14]. ABA is essential for DOG1 function, and DOG1 indirectly promotes the transcription of the ABA biosynthesis gene NCED [8]. In addition, AHG1 (ABA-HYPERSENSITIVE GERMINATION 1) and AHG3, two clade A PP2Cs that are important components of the ABA signaling pathway, act downstream of DOG1 and are critical for DOG1 function [14]. This indicates that DOG1 controls seed germination by inhibiting specific PP2Cs, which are negative regulators of ABA signaling.
The successful germination of soybean seeds is a prerequisite for high yield. According to previous studies, the high oil and protein content of soybean seeds lowers their germination rate, which affects final yield [3]. Moreover, soybean seed germination is impaired under diverse environmental stresses, such as salinity, drought or flooding, which further reduces final yield [15,16]. Studies of seed germination and dormancy have mainly focused on ABA and GA, and the molecular mechanisms of DOG1 are extensively documented in Arabidopsis [7]. However, the role of DOG1 in seed germination and dormancy remains largely unknown in soybean, owing to a lack of genetic information on the DOG1L gene family in this species.
In this study, the characteristics and functions of the GmDOG1L family were analyzed using several bioinformatic methods. We investigated the chromosomal distribution, gene structure and tissue expression patterns of GmDOG1L family members, as well as their responsiveness to phytohormone treatment. This study aims to improve understanding of the characteristics of soybean DOG1 genes and their relationship with phytohormones.

Identification of GmDOG1L Genes
A soybean-specific Hidden Markov Model (HMM) of the DOG1 domain was used to identify GmDOG1-Likes (GmDOG1Ls), yielding a total of 40 non-redundant GmDOG1-Like candidates in the soybean genome. For each of these 40 GmDOG1Ls, we collected the chromosomal location, number of exons and coding sequence (CDS) length, and named them GmDOG1-Like-1 (GmDOG1-L1) to GmDOG1-Like-40 (GmDOG1-L40) (Table 1). A search of the NCBI CDD confirmed that all 40 GmDOG1L genes contain the conserved DOG1 domain (PF14144), supporting the reliability of the identification. Interestingly, several of the 40 GmDOG1Ls also contained a bZIP domain, presumably because DOG1 is characterized by three domains (PD870616, PD004114 and PD388003), one of which, PD004114, is also present in bZIP domain-containing transcription factors described elsewhere [7]. The CDS length of the 40 GmDOG1Ls varied considerably, from 1554 bp for the longest, GmDOG1-L16, to only 285 bp for the shortest, GmDOG1-L7. Accordingly, GmDOG1-L16, with 517 amino acid residues, is the largest protein in the GmDOG1L family, and GmDOG1-L7, with 94 residues, is the smallest. Their predicted isoelectric points (pI) ranged from 4.97 (GmDOG1-L9) to 9.97 (GmDOG1-L12) (Table 1).
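The protein lengths above follow from the CDS lengths by simple codon arithmetic: a complete CDS is a whole number of codons, and the final stop codon adds no residue. The short Python check below reproduces that arithmetic for the longest and shortest GmDOG1Ls. The helper name is our own, and it assumes that each reported CDS length includes the stop codon.

```python
def protein_length_from_cds(cds_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acid residues encoded by a complete CDS.

    Assumes the CDS is a whole number of codons and, by default, that its
    last codon is a stop codon, which adds no residue to the protein.
    """
    if cds_bp <= 0 or cds_bp % 3 != 0:
        raise ValueError(f"CDS length {cds_bp} bp is not a positive multiple of 3")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    # CDS lengths reported in the text for the longest and shortest GmDOG1Ls
    for name, cds_bp in [("GmDOG1-L16", 1554), ("GmDOG1-L7", 285)]:
        print(f"{name}: {cds_bp} bp -> {protein_length_from_cds(cds_bp)} aa")
```

Run as a script, it prints 517 aa for GmDOG1-L16 and 94 aa for GmDOG1-L7, matching the residue counts given above.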

GmDOG1L Gene Structure and Phylogenetic Analysis
In this study, we used the HMM method to search for the DOG1L family in soybean and found a total of 40 GmDOG1-like genes. Using the same method, we found 20 DOG1-like genes in Arabidopsis thaliana, 20 in Oryza sativa and 53 in Triticum aestivum; the gene names and gene IDs are listed in Table S1. A total of 135 DOG1Ls from five species, namely 40 GmDOG1-Like, 20 AtDOG1-Like, 53 TaDOG1-Like (Triticum aestivum), 20 OsDOG1-Like (Oryza sativa) and two HvDOG1-Like (Hordeum vulgare), were clustered into six subgroups (I-VI) (Figure 1). Among them, AtDOG1 (AT5G45830), AtDOG1-Like-1 (AT4G18660), AtDOG1-Like-2 (AT4G18680), AtDOG1-Like-3 (AT4G18690) and AtDOG1-Like-4 (AT4G18650) were the first five DOG1 genes discovered [7], followed by HvDOG1-L1, HvDOG1-L2, TaDOG1-L1, TaDOG1-L4 and OsDOG1-L3, for which a seed dormancy function has been confirmed [13]. GmDOG1Ls were present in all six subgroups. The six DOG1-like genes proven to regulate seed dormancy were concentrated in subgroups I and II. Subgroup I contained only HvDOG1-L2, GmDOG1-L2 and GmDOG1-L26. Meanwhile, GmDOG1-L8, GmDOG1-L37 and GmDOG1-L40 were phylogenetically very close to AtDOG1, with GmDOG1-L37 the closest. Based on the phylogenetic analysis, we speculate that GmDOG1-L8, GmDOG1-L37 and GmDOG1-L40 may be DOG1 genes in soybean.
To narrow the scope of the study, a phylogenetic tree was constructed using only the GmDOG1Ls, the five AtDOG1Ls and five other DOG1-like genes proven to regulate seed dormancy [13] (Figure S1). Ten conserved motifs of DOG1L proteins (labeled with different colors in Figure 2) were identified with MEME. Motif composition was largely consistent among DOG1-Likes within the same group of the phylogenetic tree (Figure 2). For example, in subgroup IV, AtDOG1 consisted only of motif 1 and motif 8, and GmDOG1-L8 and GmDOG1-L40 had the same composition. GmDOG1-L10, GmDOG1-L13 and GmDOG1-L11 all contained both motifs of AtDOG1. AtDOG1-L2, AtDOG1-L3 and AtDOG1-L4 contained motif 1 and motif 8, whereas AtDOG1-L1 consisted of motif 5 and motif 8. These observations suggest that motif 8 may be the most important component of DOG1. In addition, TaDOG1-L1, TaDOG1-L4, HvDOG1-L1 and OsDOG1-L3 also contained motif 8. Among the GmDOG1Ls, 37 of 40 contained motif 1; GmDOG1-L5, GmDOG1-L27 and GmDOG1-L37 lacked it, while GmDOG1-L7 consisted of motif 1 alone. Interestingly, HvDOG1-L2 shared no motif with the other known DOG1-Likes. Because motif 8 was found in most known DOG1 sequences, it appears to be a conserved motif shared by DOG1-like proteins and may play an important role. In summary, based on the motif analysis, we speculate that GmDOG1-L8, GmDOG1-L10, GmDOG1-L13, GmDOG1-L11 and GmDOG1-L40 are potential DOG1 genes in soybean: all of them contain the motifs of AtDOG1, and their motif compositions are similar to those of three other DOG1-Like genes known to regulate seed dormancy.
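The comparisons in the motif analysis ("the same composition as AtDOG1", "contains both motifs of AtDOG1", "contains motif 8") are set operations on each protein's motif list. The Python sketch below expresses them that way. It includes only the four proteins whose complete motif composition is stated in the text; a full analysis would first parse the MEME output, whose format is not shown here, and the function names are our own.

```python
# Motif compositions stated in the text (MEME motif numbers). Only proteins whose
# complete composition is given ("consisted of" / "same composition") are included.
MOTIFS: dict[str, set[int]] = {
    "AtDOG1": {1, 8},
    "AtDOG1-L1": {5, 8},
    "GmDOG1-L8": {1, 8},
    "GmDOG1-L40": {1, 8},
}


def shares_reference_motifs(motifs: dict[str, set[int]], reference: str = "AtDOG1") -> list[str]:
    """Proteins (other than the reference) whose motif set contains every reference motif."""
    ref = motifs[reference]
    return sorted(g for g, m in motifs.items() if g != reference and ref <= m)


def genes_with_motif(motifs: dict[str, set[int]], motif: int) -> list[str]:
    """Proteins whose motif set includes the given motif."""
    return sorted(g for g, m in motifs.items() if motif in m)


if __name__ == "__main__":
    print(shares_reference_motifs(MOTIFS))  # ['GmDOG1-L40', 'GmDOG1-L8']
    print(genes_with_motif(MOTIFS, 8))      # ['AtDOG1', 'AtDOG1-L1', 'GmDOG1-L40', 'GmDOG1-L8']
```

Using a subset test (`<=`) rather than equality captures "contains the motifs of AtDOG1" even when a protein carries additional motifs.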
The exon-intron organization of DOG1L genes varied (Figure S2), with the number of exons ranging from one to 12. AtDOG1 contained three exons; AtDOG1-L1, AtDOG1-L2, AtDOG1-L3, TaDOG1-L1, HvDOG1-L1 and OsDOG1-L3 had only one exon, and AtDOG1-L4 had two. Exon information for HvDOG1-L2 and TaDOG1-L4 was not available. Overall, the distribution of exons and introns in the DOG1L family was complex, and even DOG1Ls grouped together in the evolutionary analysis differed in exon composition.
We also analyzed the promoter sequences of the 40 GmDOG1-Like genes (Figure S3), focusing on four cis-acting elements: ABRE (involved in abscisic acid responsiveness), the TATC-box (involved in gibberellin responsiveness), the AuxRR-core (a regulatory element involved in auxin responsiveness) and the TGA-box (part of an auxin-responsive element). GmDOG1-L5, GmDOG1-L10, GmDOG1-L11, GmDOG1-L37 and GmDOG1-L40 contained ABRE elements, GmDOG1-L13 and GmDOG1-L27 contained TGA elements, and GmDOG1-L28 contained both ABRE and AuxRR elements. Interestingly, GmDOG1-L37 contained three ABRE elements. These results suggest that these GmDOG1Ls may be involved in phytohormone responses in soybean.
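Promoter scans of this kind are normally done with a dedicated cis-element database that lists several sequence variants per element. As a much simpler illustration of the idea, the Python sketch below locates an ABRE core motif on both DNA strands. The consensus ACGTG is our assumption (a commonly cited ABRE core), not the exact element definition behind Figure S3, and the promoter fragment in the example is invented.

```python
ABRE_CORE = "ACGTG"  # assumed, simplified ABRE core consensus; real element sets list several variants

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str = ABRE_CORE) -> list[tuple[int, str]]:
    """0-based start positions of motif ('+') and of its reverse complement ('-').

    Overlapping matches are reported; a palindromic motif is counted once, on '+'.
    """
    seq, motif = seq.upper(), motif.upper()
    rc_motif = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc_motif:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    promoter = "TTACGTGGCCACGTAA"  # invented fragment, not a GmDOG1L promoter
    print(find_motif(promoter))   # [(2, '+'), (9, '-')]
```

Counting hits per promoter with `len(find_motif(seq))` gives the kind of per-gene element count reported above (e.g. three ABREs), under the simplified motif definition assumed here.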
According to previous research, the soybean genome has undergone one whole genome triplication (WGT) event and two whole genome duplication (WGD) events, the legume WGD and the Glycine WGD, and about 75% of soybean genes have multiple paralogs [17]. Among paralogous genes, 50% display expression subfunctionalization, which may cause phenotypic variation [18,19]. In addition, dispersed duplicates generally arise by transposition of DNA or RNA and may play an important role in creating new genes and changing gene function [20,21]. Using MCScanX, we found that 24 of the 40 GmDOG1Ls were located in duplicated regions, suggesting that these genes arose from large-scale duplication events (Figure S5 and Table S2). Gene families can also expand through tandem duplication, which generates consecutive gene copies in the genome [22,23]; however, no tandem duplications were detected in the GmDOG1L gene family.
We then calculated the nonsynonymous (Ka) and synonymous (Ks) substitution rates of these duplicated gene pairs (Table S3). Only three pairs, GmDOG1-L2/GmDOG1-L1, GmDOG1-L16/GmDOG1-L3 and GmDOG1-L32/GmDOG1-L25, had Ka/Ks ratios greater than 1, indicating positive selection; all remaining pairs had Ka/Ks ratios below 1, indicating purifying selection [24]. In addition, the Ks values of GmDOG1-L2/GmDOG1-L1, GmDOG1-L10/GmDOG1-L34, GmDOG1-L10/GmDOG1-L21, GmDOG1-L15/GmDOG1-L34, GmDOG1-L16/GmDOG1-L3 and GmDOG1-L32/GmDOG1-L25 were between 0.3 and 1.3, suggesting that these pairs diverged after the legume WGD and before the Glycine WGD. The Ks values of the other duplicated gene pairs were below 0.3, indicating that they diverged after the Glycine WGD [17,25].
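MCScanX performs duplicate classification itself; the Python sketch below only illustrates, in simplified form, the criterion behind the statement that no tandem duplications were found: two homologous genes count as tandem duplicates when they are immediate neighbors on the same chromosome. The input format (a gene to (chromosome, rank) map plus a list of homologous pairs) and all gene names are hypothetical.

```python
def tandem_pairs(
    gene_positions: dict[str, tuple[str, int]],
    homolog_pairs: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return homologous pairs that are adjacent on the same chromosome.

    gene_positions maps a gene to (chromosome, rank), where rank is the gene's
    index in the ordered gene list of that chromosome (hypothetical format).
    """
    tandem = []
    for a, b in homolog_pairs:
        chrom_a, rank_a = gene_positions[a]
        chrom_b, rank_b = gene_positions[b]
        # Tandem: same chromosome and no other gene between them
        if chrom_a == chrom_b and abs(rank_a - rank_b) == 1:
            tandem.append((a, b))
    return tandem


if __name__ == "__main__":
    positions = {  # hypothetical genes and positions
        "geneA": ("Gm01", 10),
        "geneB": ("Gm01", 11),
        "geneC": ("Gm05", 3),
        "geneD": ("Gm01", 40),
    }
    pairs = [("geneA", "geneB"), ("geneA", "geneC"), ("geneB", "geneD")]
    print(tandem_pairs(positions, pairs))  # [('geneA', 'geneB')]
```

An empty result for all GmDOG1L homolog pairs corresponds to the finding that the family contains no tandem duplicates.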
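The two interpretive rules used above, Ka/Ks relative to 1 for the mode of selection and the Ks windows for timing relative to the WGD events, can be written as small functions. In the sketch below, the Ks boundaries (0.3 and 1.3) are taken from the text, treating Ka/Ks = 1 as neutral is the usual convention, and the example values are hypothetical rather than taken from Table S3.

```python
def selection_mode(ka: float, ks: float) -> str:
    """Classify a duplicated gene pair by its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio > 1:
        return "positive selection"
    if ratio < 1:
        return "purifying selection"
    return "neutral evolution"


def divergence_window(ks: float) -> str:
    """Place a duplication relative to the soybean WGD events using the Ks ranges in the text."""
    if 0.3 < ks < 1.3:
        return "after legume WGD, before Glycine WGD"
    if ks < 0.3:
        return "after Glycine WGD"
    return "unassigned"  # Ks >= 1.3, or exactly 0.3: not covered by the text


if __name__ == "__main__":
    for ka, ks in [(0.30, 0.20), (0.05, 0.50), (0.02, 0.10)]:  # hypothetical values
        print(ka, ks, selection_mode(ka, ks), "|", divergence_window(ks))
```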

Expression Profiles of GmDOG1L Genes in Various Tissues of Soybean
Plants 2020, 9, 937
To understand the roles of GmDOG1Ls in soybean growth and development, we selected 10 GmDOG1L genes from different subgroups for tissue-specific expression analysis. First, we selected GmDOG1-L37 from subgroup II, which was the closest of the 40 GmDOG1Ls to AtDOG1 in the phylogenetic analysis and contains three ABRE elements. Second, we selected GmDOG1-L11 from subgroup II, because it shares the motif composition of AtDOG1 and contains ABRE elements. Third, we chose GmDOG1-L27 from subgroup II, which is close to HvDOG1-L1 and TaDOG1-L1 and contains TGA elements. Fourth, we included GmDOG1-L10 from subgroup II, which is most closely related to AtDOG1-L3, has a motif composition similar to that of AtDOG1 and contains ABRE elements. In addition, because the soybean genome has undergone one WGT and two WGD events, functional differentiation may have occurred among some GmDOG1Ls. To explore this possibility, we chose GmDOG1-L2 and GmDOG1-L26 from subgroup I, GmDOG1-L39 from subgroup III, GmDOG1-L30 from subgroup IV, GmDOG1-L3 from subgroup V and GmDOG1-L1 from subgroup VI. These 10 GmDOG1L genes served as representatives of the different subgroups to test whether GmDOG1L expression patterns differ among tissues.
qRT-PCR was used to investigate the expression of these GmDOG1Ls in eight soybean tissues: root, stem, leaf, flower, apical meristem, pod, developing seed and dry seed. All 10 genes were expressed in all eight tissues, at different levels (Figure 3). Interestingly, GmDOG1-L1, GmDOG1-L2, GmDOG1-L3 and GmDOG1-L39 were most highly expressed in pods, followed by dry seeds, whereas GmDOG1-L11, GmDOG1-L27, GmDOG1-L30 and GmDOG1-L37 were most highly expressed in dry seeds. Unexpectedly, GmDOG1-L26 was most highly expressed in leaves, followed by pods and dry seeds. Although GmDOG1-L26 and GmDOG1-L2 both belong to subgroup I, their expression differed significantly among soybean tissues. Finally, GmDOG1-L26, GmDOG1-L27 and GmDOG1-L30 were highly expressed in developing seeds.
All 10 GmDOG1L genes were expressed at relatively high levels in seeds, consistent with previous studies reporting that DOG1 is a key regulator of seed dormancy and germination. In particular, GmDOG1-L11, GmDOG1-L27, GmDOG1-L30 and GmDOG1-L37 were expressed primarily in dry seeds, indicating that these genes may function mainly in seeds. Based on the tissue-specific expression analysis, we therefore propose GmDOG1-L11, GmDOG1-L27, GmDOG1-L30 and GmDOG1-L37 as candidate GmDOG1 genes.

Expression Analysis of GmDOG1Ls under Phytohormone Treatment
To further investigate the relationship between GmDOG1Ls and phytohormones, seven GmDOG1L genes from different groups were analyzed by qRT-PCR. Compared with the control (CK), exogenous ABA treatment up-regulated GmDOG1-L10, GmDOG1-L27 and GmDOG1-L30 and down-regulated GmDOG1-L11, GmDOG1-L26, GmDOG1-L37 and GmDOG1-L39 to different extents (Figure 5). Interestingly, all seven GmDOG1L genes were down-regulated by GA treatment, suggesting that GA inhibits their expression. Under treatment with fluridone (FL), an inhibitor of ABA biosynthesis, GmDOG1-L10, GmDOG1-L11 and GmDOG1-L30 were up-regulated, while GmDOG1-L26, GmDOG1-L27, GmDOG1-L37 and GmDOG1-L39 were down-regulated. Intriguingly, all seven GmDOG1L genes were also down-regulated by paclobutrazol (PAC), an inhibitor of GA biosynthesis.

Discussion
DOG1 is a key gene regulating seed dormancy and germination [7]. Compared with the wild type, dog1 mutant seeds germinated much faster, whereas germination was delayed in seeds overexpressing AtDOG1 [7]. In this study, we used the HMM approach to search for DOG1-like genes in soybean and identified 40 GmDOG1Ls. We performed phylogenetic analysis with 40 GmDOG1-Like genes, 20 AtDOG1-Like genes, 20 OsDOG1-Like genes, 53 TaDOG1-Like genes and two HvDOG1-Like genes. In the phylogenetic tree, GmDOG1-L37 was closest to AtDOG1, followed by GmDOG1-L8 and GmDOG1-L40. Moreover, GmDOG1-L37 lay between AtDOG1 and OsDOG1-L3, one of the genes proven to regulate seed germination. The 135 DOG1-like genes were divided into six subgroups, and AtDOG1 and the five DOG1-like genes proven to regulate seed dormancy all clustered in subgroups I and II, which may indicate that these two subgroups play a special role in regulating seed dormancy. In contrast, none of the remaining four subgroups contained a DOG1-like gene confirmed to regulate seed dormancy, which may imply that the DOG1-like genes in these subgroups are not true DOG1 genes or have lost the ability to regulate seed dormancy.
Within each subgroup, DOG1Ls shared characteristic motif compositions (Figure 2), whereas exon-intron organization showed no clear pattern (Figure S2). Previous investigations have shown that gene duplication is an important mechanism for generating new genes and helps plants adapt to changing environments [26,27]. In this study, most GmDOG1Ls (24 of 40) were located in duplicated blocks, indicating that WGD or segmental duplication played an important role in the expansion of the GmDOG1L family [28]. We speculate that the GmDOG1L gene family expanded primarily through gene duplication, especially WGD.
All 10 GmDOG1Ls selected for expression analysis were relatively highly expressed in dry seeds (Figure 3). Among them, GmDOG1-L11, GmDOG1-L27, GmDOG1-L30 and GmDOG1-L37 showed the highest expression in dry seeds, and we therefore consider them potential DOG1 genes in soybean. Further examination showed that genes in the same subgroup had similar expression patterns. For example, GmDOG1-L11 and GmDOG1-L27 from subgroup II were both highly expressed in dry seeds. GmDOG1-L26 and GmDOG1-L30 had similar motif compositions and similar expression patterns, including high expression in leaves. These tissue-specific expression results suggest that some of the 40 GmDOG1L family members are not true GmDOG1 genes and have different biological functions. In addition, transcriptome data showed that the expression of GmDOG1-L3, GmDOG1-L6, GmDOG1-L12, GmDOG1-L16, GmDOG1-L27, GmDOG1-L32, GmDOG1-L36 and GmDOG1-L37 was sustained during seed development; we therefore speculate that they may also be DOG1 genes in soybean.
The phytohormones ABA and GA play leading roles in regulating seed dormancy and germination, and they act antagonistically in diverse plant developmental processes, including seed dormancy and germination [29,30]. For example, ABA treatment enhances seed dormancy and inhibits germination [6], whereas GA treatment breaks dormancy and promotes germination [6]. qRT-PCR analysis of GmDOG1Ls under phytohormone treatment revealed no consistent relationship between GmDOG1L expression and ABA: ABA inhibited the expression of some GmDOG1Ls and promoted that of others. The same was true for FL. In contrast, both GA and PAC down-regulated the GmDOG1L genes. Overall, the seven genes showed no uniform response to phytohormone treatment. We therefore speculate that some of the 40 GmDOG1Ls may not be true GmDOG1 genes and may not regulate seed dormancy and germination, which could explain their inconsistent responses to phytohormones.

Conclusions
In this study, we performed a comprehensive analysis of the GmDOG1L gene family, providing a perspective on the evolution of this family. Using a DOG1 HMM, we identified 40 GmDOG1Ls in the soybean genome. These GmDOG1Ls are distributed on 18 soybean chromosomes and were divided into six subgroups according to their evolutionary relationships. The GmDOG1Ls have similar motif compositions. Gene duplication analysis indicated that WGD or segmental duplications may have driven the expansion of the GmDOG1L family. Expression profiling showed that GmDOG1-L11, GmDOG1-L27, GmDOG1-L30 and GmDOG1-L37 had the highest expression in seeds among the GmDOG1Ls tested. No consistent relationship was detected between GmDOG1L expression and ABA or FL, but GmDOG1L expression was inhibited by GA and PAC treatments.
More importantly, this study identified several potentially useful GmDOG1L genes, such as GmDOG1-L11, GmDOG1-L37 and GmDOG1-L40. These genes are close to identified DOG1-like genes in the phylogenetic tree, their motif compositions resemble those of identified DOG1-like genes, and their expression in seeds was the highest in the qRT-PCR assay. Among them, GmDOG1-L37 is the strongest candidate for the GmDOG1 gene in soybean: it was the closest to AtDOG1 in the phylogenetic analysis, showed the highest expression in seeds in the qRT-PCR analysis, was increasingly expressed during seed development according to transcriptome data, and has three ABRE elements in its promoter. These results provide a basis for further understanding the molecular functions of the GmDOG1L family and the specific mechanisms by which it regulates seed dormancy and germination.

Identification of GmDOG1Ls
The soybean genome and protein sequences were downloaded from Phytozome12 (http://www.phytozome.net/). The Hidden Markov Model (HMM) of the conserved DOG1 domain (PF14144) was downloaded from the Pfam database (http://pfam.xfam.org/) [13,31,32]. Candidate GmDOG1Ls were identified by scanning the soybean proteins with HMMER 3.1 using the DOG1 domain HMM [33,34]. Protein sequences with E-value < 0.001 were then used to build a soybean-specific HMM with HMMER 3.1, and this new HMM was used to identify GmDOG1Ls (E-value < 0.001). To confirm the results, the 40 GmDOG1L protein sequences were submitted to the NCBI Conserved Domain Database search (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) for verification [35].
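Both search rounds above keep proteins with E-value < 0.001. Assuming the searches were run with hmmsearch and its --tblout per-sequence table, and that the threshold was applied to the full-sequence E-value (the text specifies neither the HMMER program nor the output format), that filtering step could be sketched as follows. The gene names in the example are hypothetical.

```python
def hits_from_tblout(lines, max_evalue: float = 1e-3) -> list[str]:
    """Sorted, unique target names whose full-sequence E-value is below max_evalue.

    `lines` is an HMMER3 per-sequence table (hmmsearch --tblout): '#' lines are
    comments, fields are whitespace-separated, field 1 is the target name and
    field 5 the full-sequence E-value; the free-text description (field 19)
    may contain spaces, hence maxsplit=18.
    """
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(maxsplit=18)
        target, full_seq_evalue = fields[0], float(fields[4])
        if full_seq_evalue < max_evalue:
            hits.add(target)
    return sorted(hits)


if __name__ == "__main__":
    sample = [  # hypothetical tblout rows
        "# target name  accession  query name  accession  E-value ...",
        "hypothetical_gene_1 - DOG1 PF14144.9 1.2e-30 105.3 0.1 1.5e-30 105.0 0.1 1.0 1 0 0 1 1 1 1 -",
        "hypothetical_gene_2 - DOG1 PF14144.9 0.5 10.1 0.0 0.6 9.8 0.0 1.0 1 0 0 1 1 1 0 -",
    ]
    print(hits_from_tblout(sample))  # ['hypothetical_gene_1']
```

With a real table, passing an open file handle (which yields lines) works the same way as the list above.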

Synteny Analysis of GmDOG1Ls
Duplication events among GmDOG1Ls were identified using the Multiple Collinearity Scan toolkit X (MCScanX) with default parameters [44]. For the resulting duplicated gene pairs, the nonsynonymous (Ka) and synonymous (Ks) substitution rates were calculated with KaKs-Calculator [45]. The positions of genes and segmentally duplicated regions on the soybean chromosomes were drawn with Circos [46,47].

Gene Expression Analysis
Total RNA preparation, first-strand cDNA synthesis and qRT-PCR were performed as previously described [48]. Total RNA was treated with DNase I, and 2 µg of total RNA was then reverse-transcribed using Moloney murine leukemia virus reverse transcriptase (200 units per reaction; Promega Corporation) according to the manufacturer's protocol. The soybean housekeeping gene GmTubulin was used as the endogenous reference gene, and each reaction was repeated three times [3].
Each 10 µL qRT-PCR reaction contained 0.4 µL forward and reverse primers, 3.6 µL DNase-free ddH2O, 1 µL cDNA and 5 µL Vazyme™ AceQ qPCR SYBR Green Master Mix. The cycling program was 94 °C for 2 min 30 s, followed by 40 cycles of 94 °C for 10 s and 60 °C for 32 s. Values represent three biological replicates. qRT-PCR was performed using Vazyme™ AceQ qPCR SYBR Green Master Mix on a QuantStudio 6 Flex Real-Time PCR System (Thermo Fisher Scientific, USA) [3]. The expression levels of GmDOG1Ls were calculated with the comparative CT method [49]. Primers were designed with the NCBI Primer-BLAST tool (https://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome), and their sequences are listed in Table S4.
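The comparative CT method is cited [49] without the formula being given. Assuming the widely used 2^-ΔΔCT form, which presumes roughly 100% amplification efficiency for both the target and reference assays, the calculation can be sketched as below, with GmTubulin as the reference gene and the untreated control (CK), or a chosen tissue, as the calibrator. All Ct values in the example are invented.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_control, ct_reference_control) -> float:
    """Fold change of a target gene in a sample vs. a calibrator by the 2^-ddCt method.

    Each argument is a list of Ct values (e.g. replicate wells). Assumes both
    assays amplify with ~100% efficiency, as the 2^-ddCt formula requires.
    """
    delta_ct_sample = mean(ct_target) - mean(ct_reference)               # normalize to GmTubulin
    delta_ct_control = mean(ct_target_control) - mean(ct_reference_control)
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** -delta_delta_ct


if __name__ == "__main__":
    # Invented Ct values: target and GmTubulin in a treated sample and in CK
    fold = relative_expression([24.0, 24.0, 24.0], [18.0, 18.0, 18.0],
                               [26.0, 26.0, 26.0], [18.0, 18.0, 18.0])
    print(fold)  # 4.0, i.e. up-regulated relative to CK
```

A value above 1 corresponds to the "up-regulated" calls in the phytohormone experiments and a value below 1 to "down-regulated"; significance testing across biological replicates would still be needed on top of this.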
Supplementary Materials: The following are available online at http://www.mdpi.com/2223-7747/9/8/937/s1, Figure S1: Phylogenetic analysis of DOG1Ls from soybean, Arabidopsis, Hordeum vulgare, Triticum aestivum and Oryza sativa, Figure S2: Phylogenetic analysis and exon-intron organization of DOG1Ls, Figure S3: Cis-acting elements of GmDOG1Ls, Figure S4: Distribution of GmDOG1Ls on soybean chromosomes; the forty GmDOG1L genes are located on 18 chromosomes, and chromosomes 9 and 16 have no GmDOG1L genes, Figure S5: Syntenic relationships among GmDOG1Ls, Table S1: Gene names and gene IDs of DOG1-Likes in four species, Table S2: Information about the duplicated regions of GmDOG1Ls, Table S3: Duplication events of GmDOG1Ls, Table S4: Primer sequences used in this study.
Author Contributions: K.S. and Y.Y. designed the research. Y.Y. and C.Z. performed most of the data analysis and experiments. U.C. performed part of the experiments. L.Y., C.L., T.P., X.W., J.D., J.L., F.Y., T.Y. and W.Y. provided valuable suggestions for this study. K.S., W.L. and Y.Y. analyzed the data and wrote the manuscript. All authors read and approved the final manuscript.
Funding: This work was supported by the grants from the National Key Research and Development Program of China (2016YFD0300209) and the National Natural Science Foundation of China (31872804, 31701064). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Conflicts of Interest:
The authors declare that they have no competing interests.