Gene Structural Specificity and Expression of MADS-Box Gene Family in Camellia chekiangoleosa

MADS-box genes encode transcription factors that affect plant growth and development. Camellia chekiangoleosa is an oil tree species with ornamental value, but there have been few molecular biological studies on the developmental regulation of this species. To explore their possible role in C. chekiangoleosa and lay a foundation for subsequent research, 89 MADS-box genes were identified across the whole genome of C. chekiangoleosa for the first time. These genes were present on all the chromosomes and were found to have expanded by tandem duplication and fragment duplication. Based on the results of a phylogenetic analysis, the 89 MADS-box genes could be divided into either type I (38) or type II (51). Both the number and proportion of the type II genes were significantly greater than those of Camellia sinensis and Arabidopsis thaliana, indicating that C. chekiangoleosa type II genes experienced a higher duplication rate or a lower loss rate. The results of both a sequence alignment and a conserved motif analysis suggest that the type II genes are more conserved, meaning that they may have originated and differentiated earlier than the type I genes did. At the same time, the presence of extra-long amino acid sequences may be an important feature of C. chekiangoleosa. Gene structure analysis revealed the number of introns of MADS-box genes: twenty-one type I genes had no introns, and 13 type I genes contained only 1~2 introns. The type II genes have far more introns and longer introns than the type I genes do. Some MIKCC genes have super large introns (≥15 kb), which are rare in other species. The super large introns of these MIKCC genes may indicate richer gene expression. Moreover, the results of a qPCR expression analysis of the roots, flowers, leaves and seeds of C. chekiangoleosa showed that the MADS-box genes were expressed in all those tissues. Overall, compared with that of the type I genes, the expression of the type II genes was significantly higher. 
The CchMADS31 and CchMADS58 genes (type II) were highly expressed specifically in the flowers, which may in turn regulate the size of the flower meristem and petals. CchMADS55 was expressed specifically in the seeds, which might affect seed development. This study provides additional information for the functional characterization of the MADS-box gene family and lays an important foundation for in-depth study of related genes, such as those involved in the development of the reproductive organs of C. chekiangoleosa.


Introduction
Oil tea is one of the four major types of oil trees in the world and generally refers to any of several Camellia species in the Theaceae family that have a high seed oil content. Oil tea species are the most important woody oil plant species in China. Camellia chekiangoleosa Hu, one of the main cultivars in China, has a short fruit
They all had a conserved MADS domain at the N-terminus, and this domain consisted of approximately 59 amino acids. Multiple sequence alignments of the MADS domains of the 89 proteins and the sequence logos (Figure S1) revealed four highly conserved amino acids (aa) (aa 21, 25, 32, and 39). According to the alignment results (Figure S2), the difference between type I and type II MADS-box domains mainly involved the N-terminal aa; moreover, the domain of the type II MADS-box proteins was more conserved.
The physicochemical properties of the MADS-box proteins of C. chekiangoleosa showed that the lengths of the 89 MADS-box proteins ranged from 102 to 1528 aa (Table 1). Most CchMADS proteins (61.8%) were 200~300 aa long, 24.7% were between 100 and 200 aa long, and the rest were longer than 300 aa. CchMADS76 was the longest (1528 aa). The predicted molecular weights ranged from 11.521 to 176.716 kDa, and the predicted isoelectric points ranged from 4.92 to 10.48. Subcellular localization prediction indicated that all the proteins were localized in the nucleus.

Phylogenetic Analysis of CchMADS Proteins
A phylogenetic tree was constructed based on the MADS-box genes of A. thaliana, Camellia sinensis, and C. chekiangoleosa (Figure 1), and the results showed that the CchMADS proteins could be divided into two categories: type I and type II. The type I proteins could be further divided into three subfamilies, namely, Mα (27), Mβ (2), and Mγ (9), and the type II proteins could be divided into two subfamilies, namely, MIKCC (45) and MIKC* (6). The MIKCC subfamily assignments of the CchMADS genes are shown in Table S1.
Although the total number of MADS-box genes in C. chekiangoleosa (89) was similar to that in C. sinensis (83) (Table S2), C. sinensis had more type I genes and fewer type II MIKCC genes than C. chekiangoleosa. Moreover, the numbers of genes in the Mβ and MIKCC subfamilies differed markedly between the two species. In addition, although the total number of MADS-box genes in A. thaliana (106) was much higher than that in C. chekiangoleosa and the number of genes in each type I subfamily was higher, the number of type II MIKCC genes was lower in A. thaliana than in C. chekiangoleosa.
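As a quick consistency check, the subfamily counts reported for C. chekiangoleosa can be summed into type totals and converted into a type II share. The following is only a minimal sketch; the dictionary keys and function names are illustrative and are not taken from the study.

```python
# Subfamily counts for C. chekiangoleosa as reported in the text.
# Keys and the (type, count) layout are illustrative choices.
SUBFAMILY_COUNTS = {
    "Malpha": ("I", 27),
    "Mbeta": ("I", 2),
    "Mgamma": ("I", 9),
    "MIKCC": ("II", 45),
    "MIKC*": ("II", 6),
}


def type_totals(counts):
    """Sum subfamily counts into totals per MADS-box type."""
    totals = {}
    for gene_type, n in counts.values():
        totals[gene_type] = totals.get(gene_type, 0) + n
    return totals


def share(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


totals = type_totals(SUBFAMILY_COUNTS)
n_all = sum(totals.values())
type2_share = share(totals["II"], n_all)
print(totals, n_all, type2_share)
```

The totals agree with the 38 type I and 51 type II genes stated in the abstract.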

Gene Structure and Motif Analysis
Gene structure analysis (Figure 2c) showed that 21 of the 38 type I genes had no introns. Four genes (CchMADS33, CchMADS39, CchMADS76, CchMADS87) contained 4-12 introns, and the remaining 13 genes contained 1~2 introns. Each of the 51 type II genes contained at least one intron. Except for 8 genes (CchMADS11, CchMADS23, CchMADS35, CchMADS63, CchMADS70, CchMADS82, CchMADS83, CchMADS84) containing 1-4 introns, the remaining 43 genes contained between 6 and 11 introns. Overall, the average intron number of the type II genes (6.3) was much higher than that of the type I genes (1.6). In addition, the length of the introns varied greatly among genes. The introns of CchMADS54, which were only 7 kb, were the longest among those of the type I genes. Among the type II genes, 39.2% of the introns were larger than 10 kb, and 11 genes, namely, CchMADS24, CchMADS31, CchMADS32, CchMADS36, CchMADS42, CchMADS44,
In this study, a total of 10 conserved motifs (motifs 1-10) were identified in the CchMADS proteins (Figure 2b). Motif 1, motif 2, and motif 4 were widely present across the CchMADS proteins; they corresponded to the MADS domain, with motif 1 being the classic MADS domain. In addition, motif 3 appeared only in the Mα and MIKCC subfamilies. However, the type I proteins were quite different from one another: motif 6 was specific to the Mγ subfamily, and motifs 7, 9, and 10 were specific to the Mα subfamily. The MIKCC proteins were more conserved. Motifs 5 and 8 were specific to MIKCC proteins, and motif 5 was a highly conserved K domain motif.
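Intron numbers and lengths such as those reported in the gene structure analysis can be derived from exon coordinates in an annotation file. The sketch below assumes one transcript per gene, non-overlapping exons, and 1-based inclusive coordinates (as in GFF3); the gene identifiers and coordinates are hypothetical and are not data from the study.

```python
from collections import defaultdict


def exon_intervals(exon_rows):
    """Group (gene_id, start, end) exon records into sorted intervals per gene."""
    by_gene = defaultdict(set)
    for gene_id, start, end in exon_rows:
        by_gene[gene_id].add((start, end))
    return {gene: sorted(ivs) for gene, ivs in by_gene.items()}


def intron_lengths(intervals):
    """Lengths of the gaps between consecutive exons (1-based inclusive coords)."""
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:])
    ]


# Hypothetical exon records: geneA has three exons, geneB has one.
rows = [
    ("geneA", 1, 100),
    ("geneA", 201, 300),
    ("geneA", 1301, 1400),
    ("geneB", 50, 900),
]

introns = {gene: intron_lengths(ivs) for gene, ivs in exon_intervals(rows).items()}
intron_counts = {gene: len(lengths) for gene, lengths in introns.items()}
print(intron_counts, introns)
```

A gene with a single exon yields an empty list, which corresponds to the intron-less type I genes described above.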

Figure 2. [...] Different colors of boxes represent different motif numbers. The length of a box indicates the motif length. (c) Structure of CchMADS genes.

Chromosomal Localization and Duplication of CchMADS Genes
Chromosomal localization was based on the gff3 annotation files. We found that 86 of the 89 CchMADS genes were unevenly distributed across the 15 chromosomes (Figure 3). These genes were named CchMADS01 to CchMADS86 according to their chromosomal localization. Only three CchMADS genes (CchMADS87, CchMADS88, CchMADS89) could not be mapped to any chromosome. Each chromosome carried between 2.25% and 14.61% of the CchMADS genes. Chromosomes 3, 5, 13, and 15 had the fewest CchMADS genes (2 each), whereas chromosome 4 had the most (13).
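The reported range of 2.25% to 14.61% can be reproduced from the per-chromosome counts given above if the shares are taken relative to all 89 genes rather than the 86 mapped ones. The short sketch below assumes that denominator; the chromosome keys are illustrative labels, and only the chromosomes whose counts are stated in the text are included.

```python
TOTAL_GENES = 89  # all identified CchMADS genes, mapped or not (assumed denominator)

# Per-chromosome gene counts stated in the text (fewest and most only).
reported_counts = {"chr3": 2, "chr4": 13, "chr5": 2, "chr13": 2, "chr15": 2}


def percent_share(n_genes, total=TOTAL_GENES):
    """Share of the gene family located on one chromosome, in percent."""
    return round(100 * n_genes / total, 2)


shares = {chrom: percent_share(n) for chrom, n in reported_counts.items()}
print(min(shares.values()), max(shares.values()))
```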

There was one pair of tandemly duplicated genes on each of chromosomes 7 and 8 (CchMADS41 and CchMADS42, and CchMADS52 and CchMADS53, respectively) (Table S3). One further pair of tandemly duplicated genes (CchMADS88 and CchMADS89) could not be mapped to any chromosome. In addition, 22.5% of segmentally duplicated genes were located on different chromosomes (Figure 4). Many duplicated sequences were detected on different chromosomes, which may be one of the driving forces of gene evolution. The nonsynonymous (Ka) and synonymous (Ks) substitution rates of the duplicated pairs were analyzed (Table S2), and all Ka/Ks values were less than 1, indicating that these genes evolved under purifying selection.
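The interpretation of Ka/Ks used above follows the usual convention: a ratio below 1 suggests purifying selection, a ratio near 1 suggests neutral evolution, and a ratio above 1 suggests positive selection. The sketch below illustrates that rule only; the function name, the optional neutral band, and the example values are hypothetical and are not results from Table S2.

```python
def selection_mode(ka, ks, neutral_band=0.0):
    """Return (Ka/Ks, label) using the conventional threshold of 1.

    neutral_band is an illustrative tolerance around 1; with the default
    of 0 only an exact ratio of 1 is labeled neutral.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if ratio < 1 - neutral_band:
        label = "purifying"
    elif ratio > 1 + neutral_band:
        label = "positive"
    else:
        label = "neutral"
    return ratio, label


# Hypothetical duplicated pairs: (name, Ka, Ks).
pairs = [("pair_1", 0.10, 0.40), ("pair_2", 0.30, 0.20)]
labels = {name: selection_mode(ka, ks)[1] for name, ka, ks in pairs}
print(labels)
```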

Cis-Acting Elements of MADS-Box Gene Family-Associated Promoters
To further study the regulatory mechanism of the MADS-box gene family in the development of C. chekiangoleosa, the cis-acting elements in the promoters were analyzed (Figure S3). Approximately 50 cis-acting elements were detected, and the analysis revealed 21 elements with clear functions. Each gene had more than three light-responsive elements, which constituted the most abundant type (994), followed by hormone-responsive elements (509), including abscisic acid response elements, gibberellin response elements, methyl jasmonate (MeJA) response elements, etc. Most of the other response elements (303) were related to fruit and seed development, including circadian rhythm control, endosperm expression, and seed-specific regulation. This might mean that CchMADS genes play an important role in the reproductive growth of C. chekiangoleosa. The least abundant cis-acting elements were those involved in responses to abiotic stress (140), including drought stress, calli, etc.

. Sci. 2023, 24, x FOR PEER REVIEW

CchMADS43, CchMADS47, CchMADS56, CchMADS84) could interact with more other MADS proteins. Most of these proteins, such as AGL6, SVP, and TT16, are related to the development of flowers. These proteins (such as SHP2, STK, and A are not only related to flower development but also affect fruit development, m and seed dispersal.

Expression of CchMADS Genes in Different Tissues
To further confirm the expression patterns of CchMADS genes in different organs and predict their potential role in plant growth and development, 18 genes were selected to explore their expression patterns in the roots, flowers, leaves, and seeds of C. chekiangoleosa. The primer sequences are shown in Table S4, and the relative expression of the CchMADS genes in the four tissues is shown in Table S5. In Figure 6, red blocks indicate high expression, and blue blocks indicate low expression. Overall, the expression of type I genes in all four tissues was lower than that of type II genes. Among the type I genes, the members of the Mβ and Mγ subfamilies (CchMADS13, CchMADS20, CchMADS40, CchMADS77) were expressed at very low levels in the four tissues. The genes of the Mα subfamily (CchMADS39, CchMADS7, CchMADS21) were slightly more highly expressed, especially CchMADS39, whose expression in the seeds was relatively high. Among the type II genes, the expression of CchMADS12 in the leaves was relatively high, and CchMADS10 was highly expressed in the roots and leaves. Notably, CchMADS31 and CchMADS58 showed high expression specifically in the flowers, and similarly, CchMADS55 showed high expression specifically in the seeds, which might mean that these type II genes play important roles in the development of reproductive organs.

Number and Characteristics of CchMADS Genes
After verification via the SMART program, 89 MADS-box gene family members (CchMADS1 to CchMADS89) with intact MADS domains were ultimately identified in C. chekiangoleosa. However, these genes were not evenly distributed on the chromosomes. Zhang [29] found a similar number of genes in C. sinensis (83), which was more than that in Salix suchowensis (60) [30] and Sesamum indicum (57) [31] but less than that in A. thaliana (106) [32] and poplar (105) [33]. The genome sizes of these species varied (C. sinensis, 3 Gb; C. chekiangoleosa, 2.73 Gb; S. suchowensis, 356 Mb; S. indicum, 337 Mb; A. thaliana, 207 Mb; poplar, 431 Mb). In general, the number of gene family members is related to genome size. However, this relationship does not seem to hold for some species, which may be the result of complex historical events, such as genome duplication; the specific reasons need to be further explored.
Our phylogenetic tree, which followed the classification of A. thaliana [32], revealed that the proportion of type II genes in C. chekiangoleosa (57.3%) was higher than that in C. sinensis (43.4%) and A. thaliana (47.4%). This suggests that type II CchMADS genes experienced a higher duplication rate or a lower gene loss rate after genome duplication. Although the proportion of type II genes was quite different, the MIKCC subfamily (45 members) of C. chekiangoleosa had the most genes with homologs in C. sinensis and A. thaliana. This indicated that the genes of the MIKCC subfamily were conserved between different species. In addition, through the subfamily classification in the MIKCC group (Table S1), we found that the gene numbers of the AGL17, SOC1, and SEP subfamilies in C. chekiangoleosa (8, 8, and 5, respectively) were much higher than those in C. sinensis (3, 5, and 2, respectively) [29]. There were no genes of the AGL6, AGL12, and Bsister subfamilies in C. sinensis. In previous research on A. thaliana, AGL17, SOC1, SEP, and AGL6 were found to affect the flower organs [19,34,35], AGL12 to affect root cell differentiation, and Bsister to affect ovule and seed development [36,37]. These homologous genes in C. chekiangoleosa may play a similar role, contributing to the large flowers and fruits characteristic of C. chekiangoleosa.
Other subfamilies were more varied, and the most obvious was the Mβ subfamily. The Mβ subfamily of C. chekiangoleosa was 1/6 the size of that of C. sinensis and poplar and only 1/10 the size of that of A. thaliana. Gene loss might have occurred during the evolutionary process because the Mβ genes in C. chekiangoleosa failed to play an important role. Similar results were found in sesame and soybean [31,38]. Therefore, compared with other CchMADS genes, the MIKCC genes may have undergone more duplication and differentiation in C. chekiangoleosa, while other CchMADS genes have been severely lost.
In addition, we found that the longest amino acid sequence of a MADS-box protein in C. chekiangoleosa was 1528 aa (CchMADS76), which was longer than that in other species, such as Malus domestica (the longest being 593 aa) [22], Setaria italica L. (the longest being 477 aa) [28], and Solanum lycopersicum (the longest being 389 aa) [39]. In C. sinensis [29], a sequence of up to 2691 aa was found, which has not been reported in other species. Both genes belong to the Mα subfamily, and such extra-long sequences may be unique to the Camellia genus.

Expansion of the CchMADS Gene Family
Gene duplication is thought to be the product of errors in DNA replication and reconstruction. The copied genes may acquire new functions and enhance the ability of plants to adapt to the environment [40,41]. In this study, both the segmentally duplicated (18 pairs) and tandemly duplicated (3 pairs) gene pairs belonged to the Mα and MIKCC subfamilies. The proportion of the Mα subfamily (54%) was relatively high. This is similar to the findings of a study of soybean, where tandem duplication and segmental duplication of type I genes were found to have occurred more frequently than those of type II genes [38]. This may be because type I genes originated and differentiated later than did type II genes, and type II genes were more conserved. Ka and Ks values are considered important indicators of the selection pressure on protein-coding genes [42]. By calculating the Ka and Ks values, we found that the CchMADS genes evolved under purifying selection. Segmental duplication and tandem duplication may be driving forces of gene family expansion and play an important role in functional gene diversity [43].

Intron Specificity of CchMADS Genes
The greater the number and length of introns, the more diverse the ways in which genes can be spliced, which in turn affects gene expression and protein activity [44,45,46]. The structure of the type I genes of C. chekiangoleosa was simple, and most of them had no or only one intron, which is similar to the genes of A. thaliana. Compared with the type I genes of C. chekiangoleosa, the type I genes of C. sinensis had more introns (average of 3.8). The reason might be that C. sinensis has more type I genes and greater variation. In contrast, the number of introns (6.3) and the proportion of introns greater than 10 kb (39.2%) in type II CchMADS genes far exceeded those in C. sinensis (3.6 and 4.8%, respectively). Such a high proportion was not found in the type II genes of other species, such as A. thaliana and S. suchowensis. Therefore, these type II genes may play an important role in C. chekiangoleosa.
It has been suggested that genes with super large introns and transposons are more highly expressed [47,48]. More than 10 MIKCC genes with super large introns (≥15 kb) were found in C. chekiangoleosa, compared with only 3 in C. sinensis, and such genes were not found in other species. MIKCC genes play an important role in plant flower organ development, flowering time duration, and sex determination of male and female flowers [49]. The presence of super large introns in Camellia species, especially in C. chekiangoleosa, may have contributed to the specificity of Camellia plants. The morphology and size of the flowers and fruits of C. chekiangoleosa are quite different from those of other species, and the greater number and length of the introns of its type II genes may mean that the expression of these genes is relatively diverse. These characteristics may affect the expression of MIKCC genes in the reproductive organs of C. chekiangoleosa and play an important role in the formation of flower morphology and size.

Gene Expression and Potential Function
According to the results of qPCR, the AGL6-like gene CchMADS31 and the PI-like gene CchMADS58 were highly expressed specifically in the flowers. The AGL104-like gene CchMADS55 was expressed specifically in the seeds. All of these genes were type II. It is believed that AGL6 is involved in the gene regulation of floral meristems. PI affects the development of petals and stamens and is responsible for regulating the expression of genes associated with flower development [50,51]. AGL104 double mutants have defects in pollen viability and pollen tube growth, resulting in delayed germination and reduced fertility [52]. Therefore, CchMADS31 and CchMADS58 may regulate the flower meristem of C. chekiangoleosa and affect the size of the petals, and CchMADS55 may affect seed development. In addition, the type II genes CchMADS10 and CchMADS12 were also highly expressed in the roots and leaves, which suggests that some type II genes not only play important roles in the reproductive organs of C. chekiangoleosa but also participate in the development of roots, leaves, and other tissues.

Identification of C. chekiangoleosa MADS-Box Genes
The C. chekiangoleosa genomic data were published by our team in 2022 and are publicly accessible at https://ngdc.cncb.ac.cn/gwh (accessed on 17 January 2022). The A. thaliana MADS-box protein sequences were obtained from The Arabidopsis Information Resource (TAIR) database (http://www.arabidopsis.org/, accessed on 10 January 2022) and utilized for BLASTP (E value = 1 × 10^−20) searches against the protein sequences of C. chekiangoleosa. Moreover, the HMM profile of the MADS-box domain (Pfam accession PF00319) was downloaded from the Pfam database (http://pfam.xfam.org/, accessed on 11 January 2022) and then used to retrieve the MADS-box protein sequences from all the annotated genes of the C. chekiangoleosa genome via the HMMER program version 3.0 [53].
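The two searches described above each yield a list of candidate proteins that must then be combined. The sketch below filters BLAST tabular output (the standard 12-column format, with the subject ID in column 2 and the E value in column 11) at the stated cut-off and compares it with a set of HMMER hits. How the two lists were actually merged is not stated in the text, so the sketch simply reports the intersection, the union, and the hits unique to each search; all sequence identifiers are hypothetical.

```python
def blast_hits(tab_lines, max_evalue=1e-20):
    """Subject IDs from BLAST tabular lines whose E value passes the cut-off."""
    hits = set()
    for line in tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject_id, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits.add(subject_id)
    return hits


def compare_candidates(blast_ids, hmm_ids):
    """Summarize the overlap between the two candidate sets."""
    return {
        "both": blast_ids & hmm_ids,
        "blast_only": blast_ids - hmm_ids,
        "hmm_only": hmm_ids - blast_ids,
        "union": blast_ids | hmm_ids,
    }


# Hypothetical BLASTP rows (query, subject, identity, ..., E value, bit score).
blast_rows = [
    "AT_query1\tCch_prot_001\t85.0\t60\t9\t0\t1\t60\t1\t60\t3e-30\t120",
    "AT_query1\tCch_prot_002\t40.0\t55\t33\t0\t1\t55\t4\t58\t1e-05\t45",
    "AT_query2\tCch_prot_003\t78.0\t59\t13\t0\t1\t59\t2\t60\t2e-25\t101",
]
hmm_ids = {"Cch_prot_001", "Cch_prot_004"}  # hypothetical HMMER hits

summary = compare_candidates(blast_hits(blast_rows), hmm_ids)
print(summary)
```

Candidates found by only one of the two searches are natural targets for the subsequent domain verification.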

Multiple Alignment and Phylogenetic Analysis
A sequence logo of the identified C. chekiangoleosa MADS-box genes was generated using WebLogo3 (http://weblogo.threeplusone.com, accessed on 16 February 2022) with the default parameters [58]. We subsequently used ClustalX version 2.1 to perform a multisequence alignment of the MADS-box domains, and ESPript 3.0 (https://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi, accessed on 18 February 2022) was then used to visualize the resulting alignment.
A phylogenetic tree was constructed using the maximum likelihood (ML) method in MEGA version 11.0 [59]. The MADS-box protein sequences of C. sinensis were acquired from the article of Zhang [29]. Finally, the phylogenetic tree was visualized using iTOL version 6 (https://itol.embl.de/, accessed on 12 March 2022) [60].

Chromosomal Localization and Gene Structure Analysis
The distribution of the 89 MADS-box genes and gene density were visualized using TBtools version 1.098745 [61]. The conserved motifs were analyzed using MEME (http://meme-suite.org/, accessed on 20 March 2022) [62]. The parameters were set to a repeat motif site of any number, a maximum number of motifs of 10, and a width of each motif ranging from 6 to 60 residues. An exon/intron map was constructed in the Gene Structure Display Server program (http://gsds.cbi.pku.edu.cn/, accessed on 21 March 2022) [63].

Gene Duplication and Promoter Cis-Acting Regulatory Element Analysis
MCScanX [64] was used to analyze gene duplication. Multi-sequence and BLASTp alignments (E-value = 1 × 10^−20) were performed to obtain the similarities between these CchMADS genes. The major criteria used for identifying potential gene duplications were as follows: (a) the alignable sequence covers at least 75% of the longer gene, and (b) the similarity of the aligned regions is at least 75% [65]. When a duplicated gene pair consisted of two consecutive genes on the same chromosome, it was considered a tandemly duplicated gene pair. The Ka and Ks values were determined via KaKs_Calculator [42].
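The two duplication criteria and the tandem rule above can be expressed as simple predicates. The sketch below is an illustration of those rules only and does not reproduce MCScanX; the `Gene` record, its `order` field (rank of the gene along its chromosome), and the example values are hypothetical, and the thresholds are read as "at least 75%".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    order: int   # rank of the gene along its chromosome (hypothetical field)
    length: int  # sequence length in the same units as the aligned length


def is_duplicate_pair(a, b, aligned_len, similarity_pct,
                      min_coverage=0.75, min_similarity=75.0):
    """Criteria (a) and (b): coverage of the longer gene and similarity."""
    longer = max(a.length, b.length)
    coverage_ok = aligned_len / longer >= min_coverage
    similarity_ok = similarity_pct >= min_similarity
    return coverage_ok and similarity_ok


def is_tandem(a, b):
    """Tandem rule: two consecutive genes on the same chromosome."""
    return a.chrom == b.chrom and abs(a.order - b.order) == 1


# Hypothetical genes and alignment statistics.
g1 = Gene("geneA", "chr7", 10, 900)
g2 = Gene("geneB", "chr7", 11, 1000)
g3 = Gene("geneC", "chr2", 40, 950)

pair_12 = (is_duplicate_pair(g1, g2, aligned_len=800, similarity_pct=82.0),
           is_tandem(g1, g2))
pair_13 = (is_duplicate_pair(g1, g3, aligned_len=600, similarity_pct=90.0),
           is_tandem(g1, g3))
print(pair_12, pair_13)
```

In the second pair the aligned region covers only about 63% of the longer gene, so the pair fails criterion (a) despite its high similarity.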
The upstream regions (2000 bp) of the start codon (ATG) of the MADS-box genes were used as the gene promoter sequences and retrieved from the C. chekiangoleosa genome, and their cis-acting elements were analyzed using PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/, accessed on 2 April 2022) online software. The results were visualized by TBtools.
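Retrieving the region upstream of the start codon has to take the strand into account: for a minus-strand gene the promoter lies at higher forward-strand coordinates and must be reverse-complemented. The sketch below assumes 1-based inclusive CDS coordinates on the forward strand (as in GFF3); the function names and the toy sequence are hypothetical and do not represent the pipeline used in the study.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter(chrom_seq, cds_start, cds_end, strand, upstream=2000):
    """Sequence immediately upstream of the start codon, 5'->3' on the gene strand.

    cds_start and cds_end are 1-based inclusive forward-strand coordinates.
    The region is clipped at the chromosome ends.
    """
    if strand == "+":
        lo = max(0, cds_start - 1 - upstream)
        return chrom_seq[lo:cds_start - 1]
    if strand == "-":
        hi = min(len(chrom_seq), cds_end + upstream)
        return reverse_complement(chrom_seq[cds_end:hi])
    raise ValueError("strand must be '+' or '-'")


# Toy chromosome: positions 11-16 hold a CDS on either strand.
toy = "TTTTTGGGGGATGAAACCC"
plus_promoter = promoter(toy, 11, 16, "+", upstream=5)
minus_promoter = promoter(toy, 11, 16, "-", upstream=3)
print(plus_promoter, minus_promoter)
```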

Protein-Protein Interaction Network Analysis
The MADS-box protein interaction network of C. chekiangoleosa was analyzed using the STRING online database (https://string-db.org/, accessed on 12 April 2022). The protein interaction network was visualized using Cytoscape version 3.9.1 software [66].

Plant Materials and Expression Analysis in Different Tissues
The experimental qPCR materials, which included 5-year-old live seedlings, were obtained from the germplasm resource garden of the Zhongshan Botanical Garden of Jiangsu Province (118°69 N, 32°51 E). Seeds were collected in August 2021, and roots, flowers, and leaves were collected in March 2022. We took three biological replicates per tissue, froze them in liquid nitrogen, and stored them in a −80 °C freezer. Total RNA was isolated using an RNA kit (RNAprep Pure Plant Kit, Tiangen, Beijing, China) according to the manufacturer's instructions. The quality and concentration of the RNA samples were determined using a NanoDrop 2000c spectrophotometer (Thermo Scientific, Wilmington, DE, USA) and 1% agarose gel electrophoresis. cDNA was synthesized from 1000 ng of total RNA in a 20 µL reaction volume using PrimeScript™ RT Master Mix (TaKaRa, Dalian, China). The 2^−ΔΔCT method was used to calculate the relative expression in the various tissues. The expression levels were log10-transformed, and HemI version 1.0 software (http://hemi.biocuckoo.org/down.php, accessed on 20 May 2022) was used to construct an expression profile heatmap of the CchMADS genes.
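The 2^−ΔΔCT calculation and the subsequent log10 transformation can be sketched as follows. The text does not name the reference gene or the calibrator sample, so both are treated here as generic inputs, and the CT values are hypothetical.

```python
import math


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^-ddCT method.

    dCT is target minus reference CT within a sample; ddCT is the sample
    dCT minus the calibrator dCT. Reference gene and calibrator are
    assumptions supplied by the caller.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    delta_delta = delta_sample - delta_calibrator
    return 2 ** (-delta_delta)


def log10_expression(value):
    """log10 transformation applied before drawing the heatmap."""
    return math.log10(value)


# Hypothetical CT values: the target amplifies 3 cycles earlier (relative to
# the reference) in the sample than in the calibrator.
fold = relative_expression(ct_target=22.0, ct_reference=18.0,
                           ct_target_cal=25.0, ct_reference_cal=18.0)
print(fold, log10_expression(fold))
```

A ΔΔCT of −3 corresponds to an eightfold higher relative expression, since each cycle represents a doubling under the method's assumption of equal amplification efficiency.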

Conclusions
In this study, a total of 89 MADS-box genes were identified in C. chekiangoleosa. Fragment duplication and tandem duplication were the driving forces for the expansion of MADS-box family members. These genes could be divided into type I and type II, of which there were 38 and 51 genes, respectively. The proportion of type II genes in C. chekiangoleosa was higher than that in the other species analyzed. Through phylogenetic, conserved motif, and other analyses, we found that the structure of the type II genes is more conserved than that of the type I genes. The presence of extra-long amino acid sequences may be an important feature of the Camellia genus. Compared with type I genes, type II genes are present in a higher proportion, have a higher number of introns, and have longer introns in the genome of C. chekiangoleosa. The super large introns of many genes of the MIKCC subfamily may indicate increased gene expression. Further qPCR analysis found that the overall expression level of the type II genes was higher than that of the type I genes. Some type II genes were highly expressed specifically in the reproductive organs, indicating that these genes may be involved in regulating the development of these organs. In addition, we found evidence of the expression of the MADS-box genes in the roots and leaves. This study provides additional information for the functional characterization of the MADS-box gene family. At the same time, it establishes an important foundation for the in-depth study of reproductive organ development and other related genes of C. chekiangoleosa.