Genome-Wide Identification MIKC-Type MADS-Box Gene Family and Their Roles during Development of Floral Buds in Wheel Wingnut (Cyclocarya paliurus)

MADS-box transcription factors (TFs) have fundamental roles in regulating floral organ formation and flowering time in flowering plants. In order to understand the function of MIKC-type MADS-box family genes in Cyclocarya paliurus (Batal.) Iljinskaja, we first implemented a genome-wide analysis of MIKC-type MADS-box genes in C. paliurus. Here, the phylogenetic relationships, chromosome location, conserved motif, gene structure, promoter region, and gene expression profile were analyzed. The results showed that 45 MIKC-type MADS-box were divided into 14 subfamilies: BS (3), AGL12 (1), AP3-PI (3), MIKC* (3), AGL15 (3), SVP (5), AGL17 (2), AG (3), TM8 (1), AGL6 (2), SEP (5), AP1-FUL (6), SOC1 (7), and FLC (1). The 43 MIKC-type MADS-box genes were distributed unevenly in 14 chromosomes, but two members were mapped on unanchored scaffolds. Gene structures were varied in the same gene family or subfamily, but conserved motifs shared similar distributions and sequences. The element analysis in promoters’ regions revealed that MIKC-type MADS-box family genes were associated with light, phytohormone, and temperature responsiveness, which may play important roles in floral development and differentiation. The expression profile showed that most MIKC-type MADS-box genes were differentially expressed in six tissues (specifically expressed in floral buds), and the expression patterns were also visibly varied in the same subfamily. CpaF1st24796 and CpaF1st23405, belonging to AP3-PI and SEP subfamilies, exhibited the high expression levels in PA-M and PG-F, respectively, indicating their functions in presenting heterodichogamy. We further verified the MIKC-type MADS-box gene expression levels on the basis of transcriptome and qRT-PCR analysis. This study would provide a theoretical basis for classification, cloning, and regulation of flowering mechanism of MIKC-type MADS-box genes in C. paliurus.


Introduction
MADS-box gene family, one of the most extensively studied transcription factors (TFs), have fundamental roles in developmental control and signal transduction in plants, animals, and fungi [1]. According to the sequence structure, MADS-box gens have been divided into two super clades: type I and type II. Among them, type I encodes one SRF-like MADS domain with a simple structure, containing only 1 to 2 exons, whereas type II constitutes MIKC C and MIKC * in plants encodes a MEF2-like and MIKC type MADS domain with semi-conservative K-box (keratin-like domain) and poorly conservative I-box (intervening domain) and variable C-terminal domain (C-terminal region) [2,3]. MIKCtype MADS-box genes are best known for their functions: regulating the development of

Identification of the MIKC-Type MADS-Box Gene Family in C. paliurus
MIKC-type MADS-box genes were identified from the C. paliurus genome using the HMM program for the K-box domain search and the local BLASTP search with Arabidopsis MADS-box gene sequences. As a result, 45 identified MIKC-type MADS-box genes were serially named as CpMADS01 through CpMADS45 for convenience (Table 1). In addition, the results showed that the MADS-box genes varied substantially in the length of mRNA transcripts and their encoded protein sequences. The length of 45 CpMADS mRNA products (CDS) ranged from 474 to 1350 bp and that of translated protein sequences varied from 157 to 449 amino acids. The amino acid sequence of CpMADS25 was the shortest, with only 157 amino acid residues; CpMADS18 was the longest, with 449 amino acid residues. The theoretical isoelectric point (PI) was between 4.99 (CpMADS05) and 9.96 (CpMADS22). The molecular mass of MADS-box members ranged from 18.12 (CpMADS25) to 50.77 kDa (CpMADS18) ( Table 1).

Chromosome Location and Synteny Analysis of MIKC-Type MADS-Box Genes
All MIKC-type MADS-box genes except for CpMADS44 and CpMADS45 in C. paliurus were mapped to 14 chromosomes (Chrs) (Figure 1). The maximum number of MADS-box genes per chromosome was found on Chr02 and Chr13, both with seven MADS-box genes. Six genes were located on Chr01; four genes were found on Chr06; and three genes each were located on Chr04, Chr05, and Chr07. Chr10, Chr11, Chr12, and Chr14 encompassed the minimum numbers of MADS-box genes, with only one each. Among MIKC-type MADS-box genes from C. paliurus, a total of 127 pairs of segmental duplicated genes were identified, as shown in Figure S1 and Table S1, and five pairs of tandem duplicated genes (CpaF1st02480 and CpaF1st02481, CpaF1st23405 and CpaF1st23406, CpaF1st34609 and CpaF1st34610, CpaF1st34822 and CpaF1st34828, CpaF1st46308 and CpaF1st46310) were observed, as shown in Figure 1.

Chromosome Location and Synteny Analysis of MIKC-Type MADS-Box Genes
All MIKC-type MADS-box genes except for CpMADS44 and CpMADS45 in C. paliurus were mapped to 14 chromosomes (Chrs) (Figure 1). The maximum number of MADSbox genes per chromosome was found on Chr02 and Chr13, both with seven MADS-box genes. Six genes were located on Chr01; four genes were found on Chr06; and three genes each were located on Chr04, Chr05, and Chr07. Chr10, Chr11, Chr12, and Chr14 encompassed the minimum numbers of MADS-box genes, with only one each. Among MIKCtype MADS-box genes from C. paliurus, a total of 127 pairs of segmental duplicated genes were identified, as shown in Figure S1 and Table S1, and five pairs of tandem duplicated genes (CpaF1st02480 and CpaF1st02481, CpaF1st23405 and CpaF1st23406, CpaF1st34609 and CpaF1st34610, CpaF1st34822 and CpaF1st34828, CpaF1st46308 and CpaF1st46310) were observed, as shown in Figure 1. Syntenic analysis was performed for three plants (Arabidopsis thaliana, Populus trichocarpa, and Juglans regia) to investigate the evolutionary relationship of MIKC-type MADSbox genes among these three species (Figure 2). A total of 23, 7, and 6 orthologous gene pairs of orthologous MADS-box genes were identified between C. paliurus and J. regia, C. paliurus and P. trichocarpa, and C. paliurus and A. thaliana, respectively ( Figure 2). In contrast, 13 collinear gene pairs identified between C. paliurus and J. regia were found between C. paliurus and P. trichocarpa/A. thaliana, 6 collinear gene pairs identified between C. paliurus and P. trichocarpaa were presented between C. paliurus and J. regia/A. thaliana, and all 6 collinear gene pairs identified between C. paliurus and A. thaliana were detected between C. paliurus and P. trichocarpaa/J. regia ( Figure 2). Moreover, we also noticed that there were five common collinear gene pairs in each of the following species: C. paliurus, J. regia, and A. thaliana ( Figure 2). Syntenic analysis was performed for three plants (Arabidopsis thaliana, Populus trichocarpa, and Juglans regia) to investigate the evolutionary relationship of MIKC-type MADS-box genes among these three species (Figure 2). A total of 23, 7, and 6 orthologous gene pairs of orthologous MADS-box genes were identified between C. paliurus and J. regia, C. paliurus and P. trichocarpa, and C. paliurus and A. thaliana, respectively ( Figure 2). In contrast, 13 collinear gene pairs identified between C. paliurus and J. regia were found between C. paliurus and P. trichocarpa/A. thaliana, 6 collinear gene pairs identified between C. paliurus and P. trichocarpaa were presented between C. paliurus and J. regia/A. thaliana, and all 6 collinear gene pairs identified between C. paliurus and A. thaliana were detected between C. paliurus and P. trichocarpaa/J. regia ( Figure 2). Moreover, we also noticed that there were five common collinear gene pairs in each of the following species: C. paliurus, J. regia, and A. thaliana

Phylogeny, Structural, and Conserved Motifs Analysis of MIKC-Type MADS-Box Genes
To identify the evolutionary relationship of MIKC-type MADS-box proteins, we con structed a neighbor-joining (NJ) phylogenetic tree on the basis of the full-length protei sequences ( Figure 3). The MADS-box proteins were divided into 14 subfamilies in C. pal urus ( Figure 3). Among the 45 MIKC-type MADS-box members, only three members namely, CpaF1st12920, CpaF1st25075, and CpaF1st14025, were classified as MIKC*. Th other 42 genes were classified as MIKC C , which were divided into 13 subfamilies. Th subfamily SOC1 with seven MADS-box family members were the largest clade. In con trast, the subfamilies AGL12, TM8, and FLC contained the smallest number of MADS-bo families with one member. Moreover, the numbers of the subfamilies BS, AP3-PI, AGL15 SVP, AGL17, AG, AGL6, SEP, and AP1-FUL were 3, 3, 3, 5, 2, 3, 2, 5, and 7, respectively.

Phylogeny, Structural, and Conserved Motifs Analysis of MIKC-Type MADS-Box Genes
To identify the evolutionary relationship of MIKC-type MADS-box proteins, we constructed a neighbor-joining (NJ) phylogenetic tree on the basis of the full-length protein sequences ( Figure 3). The MADS-box proteins were divided into 14 subfamilies in C. paliurus ( Figure 3). Among the 45 MIKC-type MADS-box members, only three members, namely, CpaF1st12920, CpaF1st25075, and CpaF1st14025, were classified as MIKC*. The other 42 genes were classified as MIKC C , which were divided into 13 subfamilies. The subfamily SOC1 with seven MADS-box family members were the largest clade. In contrast, the subfamilies AGL12, TM8, and FLC contained the smallest number of MADS-box families with one member. Moreover, the numbers of the subfamilies BS, AP3-PI, AGL15, SVP, AGL17, AG, AGL6, SEP, and AP1-FUL were 3, 3, 3, 5, 2, 3, 2, 5, and 7, respectively. The gene structure of 45 MIKC-type MADS-box genes in C. paliurus was analyzed ( Figure 4A,B). We observed that the gene structure of MIKC-type MADS-box members was relatively variable as the exon number ranged from 4 to 26. Among them, CpaF1st12920 of MIKC* subfamily contained the maximum exons of up to 26, which was obviously different from the structures of the other two subfamily members (the exon numbers were 8 and 19, respectively). Similarly, the CpaF1st23406 of AP1-FUL subfamily contained 15 exons, but members in other subfamily contained only seven to eight exons. By contrast, the exon numbers of BS subfamily members were extremely few with four to seven exons in each gene. These results indicated that the MIKC-type MADS-box genes structures were also visibly varied, even in the same gene family or subfamily.
A total of 10 different motifs were detected in MIKC-type MADS-box gene family of C. paliurus ( Figure 4C). A total of 32 of 45 MADS-box genes contained at least 4 main motifs (motifs 1, 2, 4, and 5). Motif 9 was specific to the subfamily AGL15 proteins CpaF1st16532 and CpaF1st11746. Motif 8 only existed in AP1-FUL. Motif 10 was shared by subfamilies FLC, SOC1, AGL15, and SVP. Each member belonging to SEP and AGL6 subfamilies shared the same seven motifs among the gene structures. The C-terminal of C. paliurus MADS-box family members consisted of five motifs: motif 5, motif 6, motif 7, motif 8, and motif 5. Moreover, the N-terminal was composed of only two motifs: motif 1 and motif 5 ( Figure 4C). This evidence collectively supported that the same subfamily consists of almost identical conserved motifs. The gene structure of 45 MIKC-type MADS-box genes in C. paliurus was analyzed ( Figure 4A,B). We observed that the gene structure of MIKC-type MADS-box members was relatively variable as the exon number ranged from 4 to 26. Among them, CpaF1st12920 of MIKC* subfamily contained the maximum exons of up to 26, which was obviously different from the structures of the other two subfamily members (the exon numbers were 8 and 19, respectively). Similarly, the CpaF1st23406 of AP1-FUL subfamily contained 15 exons, but members in other subfamily contained only seven to eight exons. By contrast, the exon numbers of BS subfamily members were extremely few with four to seven exons in each gene. These results indicated that the MIKC-type MADS-box genes structures were also visibly varied, even in the same gene family or subfamily.
A total of 10 different motifs were detected in MIKC-type MADS-box gene family of C. paliurus ( Figure 4C). A total of 32 of 45 MADS-box genes contained at least 4 main motifs (motifs 1, 2, 4, and 5). Motif 9 was specific to the subfamily AGL15 proteins CpaF1st16532 and CpaF1st11746. Motif 8 only existed in AP1-FUL. Motif 10 was shared by subfamilies FLC, SOC1, AGL15, and SVP. Each member belonging to SEP and AGL6 subfamilies shared the same seven motifs among the gene structures. The C-terminal of C. paliurus MADS-box family members consisted of five motifs: motif 5, motif 6, motif 7, motif 8, and motif 5. Moreover, the N-terminal was composed of only two motifs: motif 1 and motif 5 ( Figure 4C). This evidence collectively supported that the same subfamily consists of almost identical conserved motifs.

Estimation of Nonsynonymous (Ka) and Synonymous (Ks) Substitution Rates and Ka/Ks Values
To explore the selection pressure experienced by MIKC-type MADS-box genes after the duplication during evolution, we calculated the substitution ratios of nonsynonymous (Ka) versus synonymous substitution (Ks) for each paralogous pair (Table 2). We identified a total of five pairs of tandem replications in C. paliurus. The Ka/Ks ratio ranged from 0.2165 (CpaF1st02480 and CpaF1st02481) to 1.3276 (CpaF1st46308 and CpaF1st46310) with an average of 0.6473. The ratios of Ka/Ks of three pairs of paralogous MADS-box gene pairs (CpaF1st02480 and CpaF1st02481, CpaF1st23405 and CpaF1st23406, and CpaF1st46308 and CpaF1st46310) were less than 1, suggesting that these genes are under purifying selection. The value of Ka/Ks (CpaF1st34609 and CpaF1st34610, and CpaF1st34822 and CpaF1st34828) were greater than 1, suggesting positive selection for

Estimation of Nonsynonymous (Ka) and Synonymous (Ks) Substitution Rates and Ka/Ks Values
To explore the selection pressure experienced by MIKC-type MADS-box genes after the duplication during evolution, we calculated the substitution ratios of nonsynonymous (Ka) versus synonymous substitution (Ks) for each paralogous pair ( Table 2). We identified a total of five pairs of tandem replications in C. paliurus. The Ka/Ks ratio ranged from 0.2165 (CpaF1st02480 and CpaF1st02481) to 1.3276 (CpaF1st46308 and CpaF1st46310) with an average of 0.6473. The ratios of Ka/Ks of three pairs of paralogous MADS-box gene pairs (CpaF1st02480 and CpaF1st02481, CpaF1st23405 and CpaF1st23406, and CpaF1st46308 and CpaF1st46310) were less than 1, suggesting that these genes are under purifying selection. The value of Ka/Ks (CpaF1st34609 and CpaF1st34610, and CpaF1st34822 and CpaF1st34828) were greater than 1, suggesting positive selection for both duplicates in the two gene pairs. The results indicated that most of the gene pairs have evolved under purifying selection pressure and that the duplication might play an important role in the expansion of MADS-box gene family in C. paliurus.

Promoter Region Analysis of MIKC-Type MADS-Box Genes
To identify cis-acting elements involving C. paliurus flower development, we analyzed the promoter sequences of MIKC-type MADS-box genes. In addition to the cis-acting element, which commonly exists in eukaryotes including TATA-box and CAAT-box, the most predominant cis-acting elements in the promoters' regions of MADS-box family genes were associated with light, phytohormone, and temperature responsiveness ( Figure 5). Additionally, the common cis-acting elements, such as ARE, G-box, ABRE, CGTCA-motif, and TGACG-motif, were frequently identified in the majority of MADS-box members, with at least 30 genes containing the above elements. By contrast, we also found several ciselements (Sp1, P-box, A-box, GARE-motif, and MRE), which were distributed in minority members, with at least 16 genes containing the above elements. Moreover, a number of cis-elements were related to light responsiveness through G-box, G-Box, GT1-motif, MBS, Sp1, and MRE. The CGTCA-motif and TGACG-motif were associated with methyl jasmonate (MeJA) responsiveness. The LTR was involved in low-temperature responsiveness. The ABRE and GARE-motif were related to abscisic acid and gibberellin responsiveness, respectively. The CAT-box was related to regulation of meristem expression (Table S1). These results indicated that different components of cis-elements in MADS-box genes might effectively perform different biological functions during flower development and differentiation.

Expression Profiles of MIKC-Type MADS-Box Genes among Different Tissues
To investigate potential functions of MIKC-type MADS-box genes among six tissues of C. paliurus during the floral initiation, we performed RNA-seq analysis to examine expression patterns of 45 MADS-box genes ( Figure S2). Except for SOC1 and SVP subfamilies, which were highly expressed not only in floral buds but also in leaves, most of the other subfamilies were expressed specifically in floral buds. Further, we found that the expression levels of AP1-FUL, AP3-PI, AG, and SEP subfamilies belonging to A, B, C, and E modules, respectively, were higher than the other 10 subfamily members. However, the AP1-FUL (A module) subfamily CpaF1st12808 gene and AG (C module) subfamily CpaF1st24126 were weakly expressed in each floral bud. These results indicated that the MIKC-type MADS-box gene expression levels were also visibly varied in the same gene family, and even in the same subfamily. Interestingly, the expression level of AP3-PI (B) subfamily CpaF1st24796 in PA male floral buds (PA-M) was higher than in PA female floral buds (PA-F) and PG male floral buds (PG-M). These results strongly suggest that the high expression level of CpaF1st24796 could promote the developmental priority of the male flower, and eventually present as a PA mating type. On the other hand, the expression of SEP subfamily CpaF1st23405 in PG female floral buds (PG-F) was higher than in PA female floral buds (PA-F) and PG male floral buds (PG-M), which may preferentially promote female floral bud development and eventually contribute to its PG mating type. Moreover, most gene pairs of tandem replications also showed the different expression patterns. For instance, CpaF1st02481 was strongly expressed in four floral buds, but CpaF1st02480 was weakly expressed in them ( Figure S2). In addition, a total of nine ABCE module MADS-box genes, namely, SEP subfamily CpaF1st02480, CpaF1st02666, and CpaF1st12676 genes; AG subfamily CpaF1st08032 and CpaF1st38580 genes; AP3-PI subfamily CpaF1st24796 gene; and AP1-FUL subfamily CpaF1st29740 and CpaF1st18733 genes, were selected for qRT-PCR analysis. Generally, the relative expressions of selected genes were consistent with FPKM values from RNA-seq, indicating the expression profiles of our transcriptomic results were reliable ( Figure 6).  Table S1.

Expression Profiles of MIKC-Type MADS-Box Genes among Different Tissues
To investigate potential functions of MIKC-type MADS-box genes among six tissues of C. paliurus during the floral initiation, we performed RNA-seq analysis to examine expression patterns of 45 MADS-box genes ( Figure S2). Except for SOC1 and SVP subfamilies, which were highly expressed not only in floral buds but also in leaves, most of the other subfamilies were expressed specifically in floral buds. Further, we found that the expression levels of AP1-FUL, AP3-PI, AG, and SEP subfamilies belonging to A, B, C, and E modules, respectively, were higher than the other 10 subfamily members. However, the AP1-FUL (A module) subfamily CpaF1st12808 gene and AG (C module) subfamily CpaF1st24126 were weakly expressed in each floral bud. These results indicated that the  Table S1.
CpaF1st02480 was weakly expressed in them ( Figure S2). In addition, a total of nine ABCE module MADS-box genes, namely, SEP subfamily CpaF1st02480, CpaF1st02666, and CpaF1st12676 genes; AG subfamily CpaF1st08032 and CpaF1st38580 genes; AP3-PI subfamily CpaF1st24796 gene; and AP1-FUL subfamily CpaF1st29740 and CpaF1st18733 genes, were selected for qRT-PCR analysis. Generally, the relative expressions of selected genes were consistent with FPKM values from RNA-seq, indicating the expression profiles of our transcriptomic results were reliable ( Figure 6).

Discussion
MIKC-type MADS-box transcription factors are one of the notable protein families in flowering plants involved in flower induction, floral initiation, and floral morphogenesis [11]. In this study, a genome-wide identification of the MIKC-type MADS-box protein family was carried out in C. paliurus. The identified 45 C. paliurus MIKC-type MADS-box proteins were classified into 14 distinct subfamilies, which is also consistent with the classification and identified results of A. thaliana [12]. Comparing the number of each subfamily, we found that only one and three MIKC-type MADS-box members in subfamilies FLC and MIKC* were far less than six in A. thaliana, respectively (Figure 3). On the contrary, six and five members in subfamily AP1-FUL and SVP were more than three and two in A. thaliana, respectively ( Figure 3). Previously, the identification of MIKC-type MADS-box genes have been reported in many other plants, such as 38 members in Vitis vinifera [6], 39 members in Prunus persica [13], 36 members in Punica granatum [14], 201 members in Triticum aestivum [15], and 110 members in Gossypium hirsutum [16]. The differences in the number of MIKC-type MADS-box members between C. paliurus and T. aestivum or G. hirsutum may be due to the fact that T. aestivum and G. hirsutum experienced gene duplication events, including genome-wide duplication, tandem duplication, and fragment duplication. Comparing the number of duplicated genes of MIKC-type MADS-box genes between C. paliurus and A. thaliana, we found the 132 (127 segmental duplicated and 5 tandem duplicated genes) gene pairs to be more than 81 in A. thaliana [12], which may promote the expansion of the MIKC-type MADS-box gene family. Moreover, the 583.45 Mb assembled genome size of C. paliurus was normally larger than 119.67 Mb in A. thaliana [17]. Comparison of the MIKC-type MADS-box gene number between C. paliurus and A. thaliana showed that there was no significant correlation between genome size and number of members.
The structures and conserved motifs of each subfamily genes were analyzed on the basis of our classification results from phylogenetic tree analysis. The MIKC-type MADS-box genes structures were visibly varied, even in the same gene subfamily, which is similar to the results reported in Beta vulgaris [18], but different from the results in Punica granatum [14] and citrus [19]. Motif analysis demonstrated that most closely related MADS-box proteins from the same subfamilies shared similar motif distributions and sequences. Similar results were also reported in Gossypium hirsutum [16] and Lactuca sativa [20]. On the basis of the expression analysis, most of the same subfamily genes showed the same expression patterns in different tissues, but small amounts of genes from the same subfamilies were also differentially expressed. For instance, AG subfamily genes CpaF1st38580 and CpaF1st08032 showed high expression levels in four types of floral buds, but CpaF1st24126 presented contrary expression patterns. The study indicated that the same subfamily genes had divergent and differentiated expressions in terms of temporal and spatial characteristics during the process of evolution. These results may have been caused by the inherent need for paragenetic genes of the same origin to avoid functional redundancy and to produce subfunctions and new functions. In addition, most genes of paralogous MIKC-type MADS-box gene pairs in C. paliurus showed different expression patterns, indicating that C. paliurus may undergo multiple selection directions during the evolutionary process [21].
Promoter region analysis discovered that most of the components of cis-elements in MIKC-type MADS-box genes were associated with light, phytohormone, and temperature responsiveness. In addition, most MADS-box genes showed higher expression levels in floral buds than leaves. These results indicated that MIKC-type MADS-box genes might play an important role in flower development and differentiation, which was also supported by the results in A. thaliana [12]. As a typical heterodichogamous species, C. paliurus possesses two complementary morphs with asynchronous flowering (PG and PA mating types), which may effectively prevent from selfing, reduce intramorph inbreeding, and heavily contribute to the pattern of genetic diversity in the process of species evolution. Important as they are, the molecular mechanisms involving heterodichogamy remain unknown. In our study, the expression level of AP3-PI subfamily CpaF1st24796 in PA-M was higher than in PA-F and PG-M. On the other hand, the expression of SEP subfamily CpaF1st23405 in PG-F was higher than in PA-F and PG-M ( Figure S1). These results strongly suggest that the high expression of CpaF1st24796 or CpaF1st23405 in PA-M or PG-F may promote male or female flower development priority, eventually presenting as the PA or PG mating type. These two gene functions were also consistent with previous reports on A. thaliana [22], Rosa chinensis [23], and Populus trichocarpa [24].

Plant Materials, cDNA Synthesis, and Transcriptome Sequencing
The female flowers, male flowers, and leaves in various protandry (PA) and protogyny (PG) individuals were sampled at germination stage (24 March 2019) from germplasm bank of C. paliurus, which is located in Baima experimental base of Nanjing Forestry University (31 • 35 N, 119 • 09 E), Nanjing, Jiangsu Province, China. Collected samples were immediately frozen in liquid nitrogen and stored at −80 • C. Each tissue sample contained three biological replicates, and each replica was composed of tissues from 3 individuals. Total RNA was extracted from various tissues using E.Z.N.A Plant RNA Isolation Kit (OMEGA, USA). RNA concentration and purity were determined by NanoDrop spectrophotometer 2000C (Thermo Fisher Scientific, Waltham, MA, USA), and RNA integrity was confirmed by 1.0% agarose gel electrophoresis. The RNAs with the values of OD 260 /OD 280 within 1.8-2.0 were reverse transcribed to cDNA using PrimeScript™ RT reagent kit with gDNA Eraser (TaKaRa, China), according to the instructions of the manufacturer. Then, the cDNAs were diluted at 1:10 with RNase-free water and stored at −20 • C for qRT-PCR analyses. All cDNA libraries were loaded onto an Illumina HiSeqTM2000 system (2 × 100 bp read length), and the quality of raw reads was checked by FastQC [25]. The poor-quality sequences (the bases of Q-score < 10%), short sequences (<50 bp), primer sequences, and adapter sequences were removed and excluded from the raw reads.

Identification of MIKC-Type MADS-Box Genes in C. paliurus
The C. paliurus whole protein sequence was downloaded from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/, accessed on 10 August 2021). To identify the MIKC-type MADS-box proteins from C. paliurus, we used Arabidopsis MADS-box protein sequences as a query [12] (Table S2), and the local alignment search tool (BLASTP) [26] with E value less than 1e-10 was applied for global search of MIKCtype MADS-box proteins. After this initial screening, the HMM (hidden Markov model) profile for the Pfam MADS-domain (PF01486) was obtained from the pfam database (http: //pfam.xfam.org/, accessed on 10 August 2021). Then, the HMM profile was used to search against C. paliurus proteomes using the HMMER v3.3.2 [27] with default parameters. The obtained sequences were filtered by the following two criteria: (1) incomplete sequences were removed if the sequence lengths were less than the length of conserved motif (33 bp); (2) sequences with 'N' >10 bp were also excluded from further analysis.

Chromosome Location and Synteny Analysis of MIKC-Type MADS-Box Genes
MIKC-type MADS-box genes of C. paliurus were exactly located on the chromosome regions using the MapGene2Chrom wb v2 tool (http://mg2c.iask.in/mg2c_v2.0/, accessed on 11 August 2021) on the basis of genome annotation downloaded from the NCBI database. Multiple Collinearity Scan toolkit (MCScanX) [28] was adopted to identify the synteny relationship of homologous MIKC-type MADS-box genes obtained from C. paliurus and other three species (A. thaliana, P. trichocarpa, and J. regia).

Prediction of Cis-Regulatory Elements of Promoter Region
The 2000 bp upstream sequences of C. paliurus MIKC-type MADS-box genes were extracted from whole-genome sequence and were analyzed using PlantCARE promoter analysis tool (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/, accessed on 13 August 2021) with default parameters for the prediction of various cis-acting regulatory elements [29,30].

Estimation of Ka/Ks Values
The duplicated gene pairs belonging to tandem and segmental duplications were extracted from whole genome genes. Then, we calculated the Ka and Ks substitution rates by using KaKs_Calculator toolkit [31] on the basis of verified duplicated gene pairs of the MIKC-type MADS-box family.

Expression Analysis of Selected MIKC-Type MADS-Box Genes
After the reads' quality control and adapter trimming, the clean RNA-seq reads were aligned to reference genomes using HiSAT2 with default parameters, and the expected number of FPKM fragments mapped were calculated using StringTie program [32] with default parameters.
Nine genes related to floral bud differentiation and development, as well as hormonemediated signaling pathway, were selected for qRT-PCR analysis, including CpaF1st02666, CpaF1st02480, CpaF1st12676, CpaF1st08032, CpaF1st38580, CpaF1st24796, CpaF1st29740, and CpaF1st18733. Moreover, internal control 18S ribosomal RNA were selected according to Chen's report [7]. The reaction conditions of PCR and methods of relative gene expression calculation were presented by Chen [7].