Characterization of the MADS-Box Gene Family in Akebia trifoliata and Their Evolutionary Events in Angiosperms

As the largest clade of modern plants, flower plants have evolved a wide variety of flowers and fruits. MADS-box genes play key roles in regulating plant morphogenesis, while basal eudicots have an evolutionarily important position of acting as an evolutionary bridge between basal angiosperms and core eudicots. Akebia trifoliata is an important member of the basal eudicot group. To study the early evolution of angiosperms, we identified and characterized the MADS-Box gene family on the whole-genome level of A. trifoliata. There were 47 MADS-box genes (13 type I and 34 type II genes) in the A. trifoliata genome; type I genes had a greater gene length and coefficient of variation and a smaller exon number than type II genes. A total of 27 (57.4%) experienced whole or segmental genome duplication and purifying selection. A transcriptome analysis suggested that three and eight genes were involved in whole fruit and seed development, respectively. The diversification and phylogenetic analysis of 1479 type II MADS-box genes of 22 angiosperm species provided some clues indicating that a γ whole genome triplication event of eudicots possibility experienced a two-step process. These results are valuable for improving A. trifoliata fruit traits and theoretically elucidating evolutionary processes of angiosperms, especially eudicots.


Introduction
Throughout history, plants, especially flowering plants, have evolved ideal molecular mechanisms for morphogenesis, primarily driven by the differential growth of various tissues [1], which is regulated by intertwining network cis-element and trans-acting factors also called transcription factors [2]. Transcription factors encoded by the MADS-box gene families play fundamental roles in organogenesis control and signal transduction during the tissue growth process and have therefore been extensively studied [3]. In addition, MADS-box genes are also widely used in gene or genome duplication analyses for plant species evolution because they are highly conserved [4]. Thus, further identification of MADS-box gene families from recently published plant genomes is important work.
The MADS acronym is derived from the initial names of the first four proteins with a MADS-box to be reported: MCM1 from Saccharomyces cerevisiae [5], AGAMOUS from Arabidopsis thaliana [6], DEFICIENS from Antirrhinum majus [7] and SRE from Homo sapiens [8], which shared the highly conserved 180-bp-long motif called the MADS-box, suggesting that MADS-box genes widely exist in fungi, animals and plants [9]. The conserved MADS domain with approximately 60 amino acid residues at the N-terminus of all MADSbox proteins can bind to the CArG box (CC-A-rich-GC) in the promoter of their target genes [10]. Based on the sequence structure, there are two evolutionary clades of MADSbox genes: type I, including three subgroups (Mα, Mβ and Mγ), and type II, including A. trifoliata as well as in other angiosperms, especially basal eudicots, indirectly contributing to the genetic improvement of commercial traits such as seed yield.

Identification of MADS-Box Genes in A. trifoliata
Genome sequence and annotations of A. trifoliata downloaded from the Genome Warehouse of the National Genomics Data Center under the accession number GWH-BISH00000000 (https://ngdc.cncb.ac.cn/gwh (accessed on 8 March 2022)) were employed to identify the MADS-box genes. Despite the existence of recent publications on the A. trifoliata genome [25], the corresponding genomic data files are still unavailable online. Therefore, a high quality A. trafoliata genome assembled and submitted by our group to NCBI was employed the identification and characterization in this study. To identify MADS-box genes, we used the MADS-box typical domain SRF-TF (PF00319) from the Pfam database to search for the predicted genes using the hmmsearch tool in HMMER software (v3.3.2, Sara El-Gebali, UK) with the '10e-5 e value parameter [26]. Then, the candidate genes with incomplete domains or gene structure were manually removed. To further confirm the MAD-box feature, the conserved domains and motifs were annotated using the NCBI CDD (v3. 19, Aron Marchler-Bauer, USA) and MEME Suite tools (v5.3.2, Timothy L. Bailey, Australia), respectively [27,28]. The MADS-box subfamilies were classified according to sequence similarity and the phylogenetic tree branches of reported MADS-box subfamilies from the reference genome of A. thaliana [29]. Another phylogenetic tree was constructed based only on A. trifoliata MADS-box genes to further validate the classification results. These two phylogenetic trees were both constructed by using IQtree (v2.2.0.3, Lam-Tung Nguyen, Austria) software with the maximum likelihood method, automatically selecting the optimal substitution model and evaluating branch support values via UFBoot2 tests [30][31][32]. Multiple sequence alignment analysis of MADS-box domains was aligned by ClustalW (v2.0.10, Julie D. Thompson, UK) with default parameters [33]. Previously reported A. trifoliata MADS-box genes from the GenBank database and references [22,23] were anchored to our homologues using BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi (accessed on 26 March 2022)) with blastn mode.

Chromosomal Distribution of MADS-Box Genes
The A. trifoliata genome was previously assembled into 16 chromosomes based on the assembled genome data. The chromosomal positions of the MADS-box genes were assigned using the GFF3 annotation of genome data. The potential cluster of MADS-box genes was identified by sliding window analysis assuming a window size of 250 kb [34]. Then, TBtools (v1.098769, Chengjie Chen, China) was applied to visualize the positions of genes on chromosomes [35].

Synteny and Duplication Analysis of MADS-Box Genes
Both paralogous gene pairs and syntenic blocks of MADS-box genes in the A. trifoliata genome were identified using TBtools with default parameters of e-value 1 × 10 −10 and BlastP hits number 5 for a gene based on the MCScanX tool [36]. A collinear block was defined as at least 5 paralogous gene pairs that were significantly aligned (1 × 10 −5 as default). Then, the duplication types containing singleton, dispersed, proximal, tandem and WGD/segmental of the MADS-box genes were determined according to the MCScanX synteny analysis results. The nonsynonymous substitution rate (Ka) and synonymous substitution rate (Ks) of MADS-box duplicated gene pairs were calculated using TBtools with the NG method.

Construction of A Phylogenetic Tree of Type II MADS-Box Genes from 22 Species
A total of 22 species genomes (A. trichopoda, N. colorata, O. sativa, Sorghum bicolor, Apostasia shenzhenica, V. vinifera, A. majus, Helianthus annuus, A. thaliana, Nelumbo nucifera, P. somniferum, Macleaya cordata, A. coerulea, Beta vulgaris, B. distachyon, Populus trichocarpa, Theo-broma cacao, P. patens, Selaginella moellendorffii, Spirodela polyrhiza, Zostera marina and A. trifoliata) were employed to identify MADS-box genes, in which A. thaliana [37], O. sativa [38] and A. trichopoda [39]) were applied as references to identify the subfamilies of the MADS-box genes. The same identified methods as for A. trifoliata were used to detect the MADS-box genes for 22 plant species. After classifying the MADS-box subfamilies, the phylogenetic tree of every type II subfamily was constructed and modified using IQtree and FigTree v1.4.4 (https://github.com/4ambaut/figtree (accessed on 21 June 2022)) [30]. The parameter settings were the same as those listed above for the phylogenetic tree of A. trifoliata and A. thaliana to detect potential duplication events. Gene trees were rooted with genes from mosses or basal angiosperms and cladogram transformation of the branches was applied.

Expression Analysis of A. trifoliata MADS-Box Genes
The 271.49 Gb Illumina transcriptome data of A. trifoliata consisted of 36 samples of three tissues (peel, flesh and seed) at four different stages (young, enlargement, colouring and maturity) with three biological replicates, respectively. Details of the data are present in NCBI BioProject PRJNA671772 (accession IDs: SAMN16551931-33, young stage of rind, flesh and seed; SAMN16551934-36, enlargement stage; SAMN16551937-39, coloring stage; SAMN16551940-42, mature stage). The sequences of these samples were aligned to the A. trifoliata reference genome using Hisat2 (v2.1.0, Mihaela Pertea, USA) software with default parameters [40]. Then, stringtie [41] was used to evaluate the abundance count of each gene in 36 samples. The gene expression levels in each sample were estimated using fragments per kilobase of transcript per million fragments mapped (FPKM) values according to the abundance count data by using R package DESeq2 (v1.36.0, Michael I Love, Germany) [42]. For differential gene expression, 12 groups of 4 stages and 3 tissues with 3 replicates were calculated as the count matrix. The means of each gene expression in three replicates were normalized to log 2 (FPKM + 1) for visual display. After identifying the MADS-box genes from the whole genome, the expression clusters of these genes were plotted using the R package ggplot2 (v3.3) (https://ggplot2.tidyverse.org (accessed on 28 June 2022)) [43].

The Number and Type of Identified MADS-Box Genes in the A. trifoliata Genome
A total of 47 MADS-box genes were identified from the A. trifoliata genome through HMM analysis (Table 1), in which there was a broad range in the gene length (from 360 bp to 80,167 bp), exon number (from one to 19), isoelectric point (from 4.67 to 11.42), amino acid number (from 81 to 545) and molecular weight (from 9.1 to 62.53) of the putative proteins encoded by the MADS-box genes. The phylogenetic tree consisting of both 108 reference MADS-box genes (consisting of both three and 14 subfamilies of type I and type II, respectively) in A. thaliana and 47 MADS-box genes identified in A. trifoliata was constructed to classify the MADS-box types in A. trifoliata. The phylogenetic tree ( Figure 1) showed that 47 MADS-box genes of A. trifoliata were unevenly classified into all three subfamilies of type I and 13 subfamilies of type II reference MADS-box genes of A. thaliana and only FLC subfamilies of type II in A. thaliana were not detected in A. trifoliata ( Figure 1).
The type I group consisted of 13 genes and the type II group consisted of the remaining 34 genes (Table 1). A comparison analysis found that type I genes have a significantly smaller length and fewer exons than type II genes (Table S1) at the p = 0.01 level, while the differences in isoelectric point, number of amino acids and molecular weight of the putative proteins between them were not significant at the corresponding statistical test level. For example, the largest exon number of type I was only 3 for AktMγ_3, while that of type II was up to 19 for AktSOC1/TM3_2.
In addition, two reference genes (AT5G55690 and AT5G58890 in A. thaliana) were classified into Mγ subfamilies rather than the previously classified Mβ subfamilies [29], and three A. trifoliata MADS-box genes, EVM0009117, EVM0013722 and EVM0016918 were the paraphyletic group and classified into the same clade. A comparison analysis of the MADS- box domain of the putative proteins encoded by 44 Mβ and Mγ genes (6 from A. trifoliata and 38 from A. thaliana) also found that AT5G55690, AT5G58890, EVM0009117, EVM0013722 and EVM0016918 seemed closer to the Mγ clade than to the Mβ clade ( Figure S1). Conservatively, we classified the three genes into the Mβ/Mγ clade (Table 1).  The type I group consisted of 13 genes and the type II group consisted of the remaining 34 genes (Table 1). A comparison analysis found that type I genes have a significantly smaller length and fewer exons than type II genes (Table S1) at the p = 0.01 level, while the differences in isoelectric point, number of amino acids and molecular weight of the putative proteins between them were not significant at the corresponding statistical test level. For example, the largest exon number of type I was only 3 for AktMγ_3, while that of type II was up to 19 for AktSOC1/TM3_2.
In addition, two reference genes (AT5G55690 and AT5G58890 in A. thaliana) were classified into Mγ subfamilies rather than the previously classified Mβ subfamilies [29], and three A. trifoliata MADS-box genes, EVM0009117, EVM0013722 and EVM0016918 were the paraphyletic group and classified into the same clade. A comparison analysis of

Phylogeny and Conserved Motifs of MADS-Box Genes
Based on the A. trifoliata MADS-box gene phylogenetic tree (Figure 2a), 47 identified MADS-box genes in A. trifoliata were also classified as described above, except EVM0004910 (AktAG/TM8_4). Motifs and conserved domains of typical MADS-box family members were further annotated to check the MADS domain. The results showed that 46 out of 47 identified MADS-box genes in A. trifoliata had the conserved motifs (motif 1) of MADS proteins, except one Akt Mβ_1. Though there was no motif annotated with the MEME Suite in Akt Mβ_1, a MADS domain could be detected with NCBI CDD (Figure 2b). In addition, all 13 type I and 9 MIKC* genes contained only the conserved MADS domain, while all 25 MIKC C genes had both MADS and K-box domains, except for AktSOC1/TM3_2, which had MADS, LPLAT and EFh domains (Figure 2c). Generally, the coding DNA sequence of type I genes with a smaller gene length had good continuity compared with that of type II genes (Figure 2d). Compared with A. trifoliata MADS-box genes from a previous report and the GenBank database, only a few genes (10) related to floral organ development were reported (Table S2), and most of the genes (37) identified in our study have never been deeply studied before.
EVM0004910 (AktAG/TM8_4). Motifs and conserved domains of typical MADS-box family members were further annotated to check the MADS domain. The results showed that 46 out of 47 identified MADS-box genes in A. trifoliata had the conserved motifs (motif 1) of MADS proteins, except one Akt Mβ_1. Though there was no motif annotated with the MEME Suite in Akt Mβ_1, a MADS domain could be detected with NCBI CDD ( Figure  2b). In addition, all 13 type I and 9 MIKC* genes contained only the conserved MADS domain, while all 25 MIKC C genes had both MADS and K-box domains, except for Ak-tSOC1/TM3_2, which had MADS, LPLAT and EFh domains (Figure 2c). Generally, the coding DNA sequence of type I genes with a smaller gene length had good continuity compared with that of type II genes (Figure 2d). Compared with A. trifoliata MADS-box genes from a previous report and the GenBank database, only a few genes (10) related to floral organ development were reported (Table S2), and most of the genes (37) identified in our study have never been deeply studied before.

Chromosomal Position and Duplication Type of MADS-Box Genes
The physical position of all 47 identified MADS-box genes is ranked in Table 1, and 46 MADS-box genes were mapped to almost all 16 A. trifoliata chromosomes except chromosome 6 ( Figure 3), while only AktMIKC*_2 was assigned to the unassembled contig.

Estimation of Ka/Ks Value
The ratio of the nonsynonymous substitution rate (Ka) to the synonymous substitution rate (Ks) was calculated according to the paralogous pairs of 27 WGD or segmental duplication MADS-box genes, and we detected a total of 19 paralogous pairs among them. All 19 Ka/Ks values of these A. trifoliata MADS-box duplicated gene pairs were far less than 1 (Table S3). The largest Ka/Ks value of the paralogous MADS-box gene pair (Ak-tMIKC*_3 and AktMIKC*_9) was only 0.52, and the Ka/Ks value of Akt AG_3 and Akt AG_3 WGD or segmental duplicated-type genes are marked in red font, tandem duplicated-type genes are marked in blue font, and other dispersed or proximal duplicated genes are marked in black font. The black line to the right of the gene names represents cluster loci.

Estimation of Ka/Ks Value
The ratio of the nonsynonymous substitution rate (Ka) to the synonymous substitution rate (Ks) was calculated according to the paralogous pairs of 27 WGD or segmental duplication MADS-box genes, and we detected a total of 19 paralogous pairs among them. All 19 Ka/Ks values of these A. trifoliata MADS-box duplicated gene pairs were far less than 1 (Table S3). The largest Ka/Ks value of the paralogous MADS-box gene pair (AktMIKC*_3 and AktMIKC*_9) was only 0.52, and the Ka/Ks value of Akt AG_3 and Akt AG_3 was as low as 0.11. In addition, the Ka/Ks value of the paralogous type II MADS-box gene pair was commonly lower than that of type I (Table S3).

MADS-Box Genes of Evolutionarily Important Species
The genomes of twenty two evolutionarily important species, two mosses, two basal angiosperms, six monocots, five basal eudicots and seven core eudicots (Figure 4a), were employed to identify and classify the MADS-box genes, in which three species, A. thaliana, O. sativa and A. trichopoda, were used as references. A total of 1469 MADS-box genes with 19 subfamilies were identified, and the average number of MADS-box genes was 67, with a range from 24 to 162 (Tables S4 and S5). There was a medium coefficient (R = 0.52, p = 0.012) between the MADS-box gene number and genomic size among the 22 species. For type I, the number of Mα genes was larger than that of Mβ and Mγ genes in 17 out 22 species, and for type II, there were more MIKC C genes than MIKC* genes in all 22 species except Physcomitrella patens (Table S5, Figure 4b). Notably, EVM0004910 (AktAG/TM8_4) of A. trifiliata was classified into TM8 (Table S5) rather than the AG clade ( Figure 1). In addition, there was no Mγ in either moss, no FLC in all 22 species except for seven core eudicots and no OsMADS32 clade in 12 eudicots (Table S5). Both mosses only contained the GGM13 of MIKC C and lacked the remaining 14 subfamilies (Table S5). We also found that the coefficient of variation (CV) of type I genes was larger than that of type II genes (Figure 4b).

Possible Duplication Events on Different Divergent Angiosperm Branches
Plant MADS-box genes, especially type II MADS-box genes, are widely used in gene or genome duplication analyses; in particular, most duplicated MADS-box genes in A. trifoliata were derived from a WGD event (Figures 3 and S2). A total of 15 phylogenetic trees of type II MADS-box gene subfamilies in 22 angiosperm species were constructed,

Possible Duplication Events on Different Divergent Angiosperm Branches
Plant MADS-box genes, especially type II MADS-box genes, are widely used in gene or genome duplication analyses; in particular, most duplicated MADS-box genes in A. trifoliata were derived from a WGD event (Figures 3 and S2). A total of 15 phylogenetic trees of type II MADS-box gene subfamilies in 22 angiosperm species were constructed, and only the phylogenetic tree of OsMADS32 was not produced because of the lower number of these genes ( Figure S3). Although the trace of these gene duplications could be lost in evolutionary processes, we still detected at least one potential ancestral duplication event in the phylogenetic trees of 15 subfamilies except TM8 and AGL6 ( Figure S3 and Table S5), in which those of MIKC*, GGM13, ANR1, SEP and AG were probably duplicated from ancestral angiosperms or seed plant-wide WGDs (ζ or ε) due to the duplications early in basal angiosperms species, those of AGL12, PI, SQUA and SVP supported monocot-specific WGD (τ). Interestingly, similarly to the A. trifoliata, other basal eudicots also had duplicated genes in most of subfamilies of type II genes including MIKC*, AG, ANR1, AP3/DEF, SEP, SOC1/TM3, SQUA and SVP which highly overlap with core eudicots (Table S5 and Figure  S3). These subfamilies were not only duplicated in the core eudicot lineage. Moreover, DEF/AP3, SQUA, SOC1/TM3 and SVP possibility derived from a duplication event in early eudicot (ψ) and a core eudicot-wide diploid fusion event (ω) as presented in scenario 2 rather than scenario 1 in Figure 4c ( Figure S3 and Table S5).

Differential Expression Analysis of MADS-Box Genes of A. trifoliata in Various Fruit Tissues
The expression of many MADS-box genes in the flesh, seeds and peels showed that all type I genes were expressed at very low levels in all samples (Figure 5a,b). In contrast, many type II genes have a higher expression level. Three type II genes (AktAG_3, AktAG_1 and AktSEP_3) exhibited high expression at every stage in all tissues, especially flesh ( Figure 5c). In addition, many type II genes also exhibited differential expression levels among different tissues. For example, AktAG_2, AktAGL6_2, AktMIKC*_8, AktMIKC*_9, AktMIKC*_5, AktAGL15, AktGGM13 and AktAGL6_1 exhibited high expression levels in seeds and low expression levels in both flesh and peel (Figure 5c). Moreover, AktMIKC*_8 had a differential expression level among different developmental stages of seeds, with an obvious increase with growth progress (Figure 5a). Because SEEDSTICK (STK) subfamily are a sister group of AG genes, it is difficult to differentiate them based solely on a phylogenetic tree. We found that the AktAG_2 gene was relatively highly expressed in all seed stages, while AktAG_1 and AktAG_3 genes were expressed at relatively lower levels in seed. Therefore, AktAG_2 should be classified as part of STK functional genes in seed development.
had a differential expression level among different developmental stages of seeds, with an obvious increase with growth progress (Figure 5a). Because SEEDSTICK (STK) subfamily are a sister group of AG genes, it is difficult to differentiate them based solely on a phylogenetic tree. We found that the AktAG_2 gene was relatively highly expressed in all seed stages, while AktAG_1 and AktAG_3 genes were expressed at relatively lower levels in seed. Therefore, AktAG_2 should be classified as part of STK functional genes in seed development

Discussion
The name of MADS-box transcription factors indicates that they could exist widely and be highly conserved in plants, animals and fungi, which has been confirmed by various studies [45], and some homology has even been found in the common ancestor to prokaryotes and eukaryotes [46]. In plants, MADS-box transcription factors have evolved important regulatory functions in various biological processes, such as disease resistance [47], signal transduction [48] and morphogenesis, especially flower formation and seed development [49]. MADS-box genes were identified from all plant genomes, but whether their number is related to evolutionary divergence or genomic size was unclear.
Previous studies have shown that the number of MADS-box genes varies slightly among different species and usually ranges in the dozens, although up to 300 have been identified in the extreme case of Wheat [50]. The numbers were 70, 65, 47 and 90 in the genome basal angiosperm Nymphaea colorata [51], monocot Sorghum biocolour [52], basal eudicto Aquilegia coerulea [53] and core eudicot V. vinifera [54], respectively, and they were very close to the corresponding numbers (Table S5). This indicated that the number of MADS-box genes could be slightly related to the evolutionary clade. In the present study, we identified the numbers of these taxa in the genomes of 25 species from five different evolutionary clades: mosses, basal angiosperms, monocots, basal eudicots and core eudicots (Tables S4 and S5). Although those of moss were obviously fewer than those of all angiosperms, their variation was slight among the four clades within angiosperms, in which their variation was smaller in monocots than in eudicots. This result further supported the view that the number of MADS-box genes was not usually related to the position on phylogenetic trees of species, and only a dozen genes indicated that they could regulate various biological processes. In addition, previous reports showed 71, 81, 162 and 300 MADS-box genes in 430 Mb O. sativa [55], 475 Mb V. vinifera [56], 2870 Mb P. somniferum [57] and 1700 Mb Wheat [58] genomes, respectively, which suggested that the greater the number of MADS-box genes was, the larger the genomic size was. In this study, we found that the correlation coefficient between the MADS-box gene number and genomic size was 0.52 (Table S5), indicating a moderate relationship. Therefore, we agree with the view that the number of MADS-box genes could be generally related to genomic size and scarcely related to species divergence.
Here, we identified 47 MADS-box genes from the 619 Mb A. trifoliata genome (Table 1), of which 10 were previously reported [22,23] and 37 were newly identified by genomewide analysis (Table S2) which were classified into all three subfamilies of type I and 13 subfamilies of type II MADS-box genes when only the A. thaliana genome was used as the reference (Figure 1), while there were 14 subfamilies of type II genes when A. thaliana, O. sativa and A. trichopoda were used as the reference (Table S4). Our comparison analysis found that EVM0004910 in the AG lineage ( Figure 1) was reclassified into TM8 (Table S4). This could be explained by the fact that there was no TM8 clade in A. thaliana [29]. Therefore, there were a total of seventeen subfamilies, and only two subfamilies, OsMADS32 and FLC, were not identified in the MADS-box genes of the A. trifoliata genome (Table 1).
Structurally, type I genes encode only one simple SRF-like MADS domain, containing one to two exons, whereas type II genes encode MEF2-like or MIKC-type MADS domains with multiple additional domains, such as the K-box (keratin-like domain), I-box (intervening domain) and variable C-terminal domain (C-terminal region) [59], and evolutionarily, type II genes are more conserved than type I genes [4]. In type I genes, AktMγ_3 has three exons, and the other has one or two exons, while the exon number in type II was largely variable (Table 1). We found that the average amino acid number encoded by type I and type II genes was close, while the exon number of type II was significantly greater than that of type I (Table S1); additionally, the DNA sequence of type I had good continuity compared with that of type II (Figure 2d). Similar results have been reported in grape [54], rice [37] and B. distachyon [60].
The number of MADS-box genes was small, but the subfamilies, especially those of type II genes, were very numerous (Table S5), which indicated that they could have experienced different evolutionary events, such as different genomic duplications and natural selection styles. In addition, despite the small number of MADS-box genes, they were assigned to all 16 chromosomes except chromosome 6 ( Figure 3). Both the rich subfamilies and wide chromosomal distribution suggested that whole genomic duplication (WGD) or segmental duplication could be important duplication styles for A. trifoliata MADS-box genes. By performing a synteny analysis, we found that 27 (57.4%) out of the 47 total identified A. trifoliata MADS-box genes experienced WGD or segmental duplication events. Moreover, all Ka/Ks values were far less than 1 (Table S3), which clearly indicates purifying selection. Thus, we propose that MADS-box genes in A. trifoliata mainly experienced WGD or segmental duplication and strong purifying selection in the long evolutionary period.
Previous studies have suggested that type II genes exhibit higher conservation than type I genes [4]. In the present study, we found that the coefficient of variation (CV) of all three subfamilies, Mα, Mβ and Mγ, of type I genes was greater than 0.8, while that of both MIKC* and MIKC C of type II genes was less than 0.5 among 22 species (Figure 4a). In addition, although AT5G55690 and AT5G58890 were previously classified into Mβ (Table 1), the amino acid sequence encoded by them has a high similarity with Mγ ( Figure S1), which provides a reasonable explanation for why they were classified into the Mγ clade in Figure 1 and indicated that there was a rapid evolution from Mβ to Mγ. Both results confirmed that type II genes were highly conserved. Type II genes also played a key role in investigating the species diversification of flowering plants because the ABC flowering model was commonly regulated by several subfamilies of type II [12,14], even some subfamilies of type II genes involved in specification of ovule and flower development in seed plants were significant expansion in ferns [61], so we focused on the phylogenetic tree analysis for subfamilies of type II ( Figure S3).
Only GGM13 subfamilies of MIKC C were identified in both moss species (Table S5), which indicated that these genes could have originated early, generally before basal angiosperm divergence. In contrast, FLC subfamilies were only detected in core eudicots (Table S5) and were putatively core eudicot-specific, which suggested that they could have recently originated, after core eudicots diverged from basal eudicots. Previous studies have suggested a series of WGDs in angiosperms, mainly including ancestral angiosperms or seed plant-wide WGDs (ζ or ε), monocot-specific WGDs (τ) and core eudicot-specific γ WGTs [44,62,63]. In our study, traces of ζ or ε could be detected in the phylogenetic tree of DEF/AP3, SQUA, SOC1/TM3 and SVP. On the one hand, there were at least two copies of their homologues in many species among 20 angiosperms, especially in A. trichopoda, a model basal angiosperm (Table S5). On the other hand, their topological structures are largely consistent with the early angiosperm species phylogeny ( Figure S3). Likewise, traces of monocot-specific τ could also be detected in the phylogenetic tree of the AGL12, PI, SQUA and SVP subfamilies ( Figure S3). Only the traditional γ-WGT would be challenged by the phylogenetic tree of some subfamilies of type II genes.
The traditional view is that the γ event occurred approximately 117 million years ago (Mya) and at the early stages of core eudicots after the divergence of the Ranunculales and core eudicots [57,63]. In fact, both the composition and occurrence time of the γ event are still in doubt. For example, a shared WGD event on the common ancestor of basal eudicot, which is phylogenetic close to the core eudicot γ-WGT was also detected in A. trifolita and many other basal eudicots based on the transcriptome data [61]. However, another genome analyses of A. coerulea provided some evidence suggesting that the γ-WGT event may have consisted of two steps [64]. Obviously, precisely positioning the γ-WGT within a narrow time window is difficult mainly due to the following two reasons: one is the absence of knowledge about the WGT, such as whether a single event or two independent events resulted in the triplication of the genome, and the other is the short time space from eudicot divergence to core eudicot divergence. The phylogenetic trees of DEF/AP3, SQUA, SOC1/TM3 and SVP in our study supported one common eudicot-wide duplication event (ψ) and core eudicot-wide additional diploid fusion event (ω), putatively occurring before and after core eudicot divergence from basal eudicots, respectively ( Figure S3), which could be explained well by scenario 2 in Figure 4c (Figures 4c and S3 and Table S5). Furthermore, the genes of the FLC and AGL15 subfamilies were only duplicated in the core eudicot lineage, and two clades of homologues were also more likely to be duplicated from a potential single ω diploidization event rather than an indivisible γ triploidization event (Figures 4c and S3 and Table S5). Parts of type II MADS-box gene subfamilies, such as AP3/DEF, SEP, SOC1/TM3 and SQUA, were considered to be duplicated from core eudicot-wide γ-WGT events in previous reports [14]. Unfortunately, MADS-box genes were not identified completely due to the use of transcriptomic rather than genomic data. For example, only a few MADS-box genes have been previously identified in A. trifoliata (Table S2) [22,23] and in A. coerulea (Table S5) [53]. Importantly, previous phylogenetic analyses of other basal eudicots also afforded some strong evidence of two close duplication events supporting the ψ and ω hypothesis [65]. Comprehensively, the phylogenetic trees of these subfamilies agreed well with scenario 2 rather than scenario 1 of Figure 4c, which suggested that γ could be a multiple event consisting of eudicot-wide ψ and core eudicotspecific ω rather than a single γ event. Although the origin of the γ-WGT event is currently a highly-debated field which cannot be solved simply by using the phylogenetic gene tree of the MADS-box family, our results provide an important clue to solve this problem in the future. Differential expression analysis using comparative transcriptomics can provide important information about gene functions. We found that many MADS-box genes, especially type I genes, were expressed at extremely low levels in all three tissues (flesh, seed and peel) at all developmental stages, while AktAG_3, AktAG_1 and AktSEP_3 exhibited high expression in almost all samples ( Figure 5), which indicated that only a few MADS-box genes could meet the regulatory requirement of fruit formation and that MADS-box gene expression is usually organ-or tissue-specific. Some previous reports suggested that some genes of both the AG and SEP subfamilies could be related to fruit development [66,67]; therefore, we concluded that the three genes could regulate whole fruit development. In addition, we also found that eight genes of five subfamilies (MIKC*, AG, AGL6, AGL15 and GGM13) were only expressed a high level in seeds, and in all subfamilies except AG15 which existed in all twenty angiosperms (Table S5). There are various reports stating that MIKC* [68], AG [67], AGL6 [69] and GGM13 [70] participate in regulating seed development. Therefore, AktAG_2, AktAGL6_2, AktMIKC*_8, AktMIKC*_9, AktMIKC*_5, AktAGL15, AktGGM13 and AktAGL6_1 positively regulated seed development in A. trifoliata.

Conclusions
We identified 47 nonredundant MADS-box genes in the A. trifoliata genome, and 46 of them were physically assigned to 16 high-quality assembled pseudochromosomes. All 47 genes were classified into 17 subfamilies, and many of them putatively experienced WGD or segmental duplication and purification in the evolutionary process. Several candidate MADS-box genes involved in fruit or seed development were screened. Importantly, traces of major WGD events in whole angiosperms could be detected in many subfamilies of type II genes. Interestingly, the phylogenetic tree of the DEF/AP3, SQUA, SOC1/TM3, SVP, FLC and AGL15 subfamilies provides some evidence that γ events are multiple events rather than a single event consisting of a eudicot-wide ψ and core eudicot-specific ω. Comprehensively, both new data resources and insight into traditional γ events would be helpful to practically improve the objective traits of A. trifoliata fruits and to completely elucidate the evolutionary process of early-stage eudicots.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/genes13101777/s1, Figure S1: Comparison of MADS-box domains between Mβ and Mγ.; Figure S2: Synteny results of the A. trifoliata genome and MADS-box genes.; Figure S3: Phylogenetic tree of type II genes from 22 plant species.; Table S1: Statistical analysis of gene structures between type I and type II MADS-box genes; Table S2: Previously reported A. trifoliata MADS-box genes; Table S3: Substitution rate and selection pressure of MADS-box duplicated gene pairs; Table S4