Genome-Wide Analysis of the MADS-Box Transcription Factor Family in Solanum lycopersicum

MADS-box family genes encode transcription factors that are involved in multiple developmental processes in plants, especially floral organ specification, fruit development, and ripening. However, a comprehensive analysis of the MADS-box gene family in tomato, an important model plant for studying flower and fruit development and ripening, has been lacking. To gain insight into the MADS-box genes in tomato, 131 tomato MADS-box genes were identified. These genes could be divided into five groups (Mα, Mβ, Mγ, Mδ, and MIKC) and were found to be located on all 12 chromosomes. We further analyzed the phylogenetic relationships between the Arabidopsis and tomato MADS-box genes, as well as the protein motif structure and exon-intron organization, to better understand the tomato MADS-box gene family. Additionally, owing to the role of MADS-box genes in floral organ identity and fruit development, the expression patterns of MADS-box genes at different stages of tomato development were examined. We analyzed 15 tomato MADS-box genes involved in floral organ identity and five tomato MADS-box genes related to fruit development by qRT-PCR. Collectively, our study provides a comprehensive and systematic analysis of the tomato MADS-box genes and should be valuable for the further functional characterization of important members of the MADS-box gene family.


Introduction
The MADS-box family genes encode transcription factors (TFs) that are widely distributed in eukaryotes and play fundamental roles in diverse biological functions [1]. The name MADS-box is derived from the initials of the first four transcription factors discovered in this family: MINICHROMOSOME MAINTENANCE 1 (MCM1), AGAMOUS (AG), DEFICIENS (DEF), and SERUM RESPONSE FACTOR (SRF) [2]. Their N-terminus contains a highly conserved DNA-binding MADS-domain of 56-60 amino acids [2]. Thus, a protein containing the MADS-domain is referred to as a MADS-box protein. MADS-box genes are divided into two groups (type I and type II) throughout the eukaryotes [3]. Type I MADS-box transcription factors can be further classified into four subclasses (Mα, Mβ, Mγ, and Mδ) according to the M domain of the encoded protein, although only a few type I genes have been characterized for their biological function [4]. Type II MADS-box transcription factors include the Myocyte Enhancer Factor 2-like (MEF2-like) group from animals and yeast and the plant-specific MIKC-type group. The name of the MIKC-type proteins is derived from their four characteristic domains: the MADS (M), intervening (I), keratin-like (K), and C-terminal (C) domains.
To obtain a genome-wide analysis of the tomato MADS-box gene family, we identified 131 MADS-box genes from tomato that are highly homologous to MADS-box proteins known in other plant species, analyzed their phylogenetic relationships, gene structure, and conserved motifs, determined their exon-intron organization, and predicted their chromosomal localization. Furthermore, we obtained predictions of the expression patterns of these tomato MADS-box genes in different organs. In addition, the expression patterns of some MADS-box genes related to tomato floral organ identity, fruit development, and ripening were determined by qPCR at different stages of tomato development. These results provide details of the tomato MADS-box family and may be useful for more comprehensive investigations of tomato MADS-box gene family members.

Identification of MADS-Box Genes in Tomato
To identify tomato MADS-box genes comprehensively, a set of 131 tomato MADS-box genes that are highly homologous to the MADS-box proteins reported in other plants was recovered using BLAST searches against the NCBI and SGN databases. Redundant sequences were removed. Since 35 tomato MADS-box genes have been officially named, the newly identified 96 tomato MADS-box genes were designated as SlMADS2-SlMADS98 (Table 1). Additionally, the molecular characteristics of the 131 MADS-box genes in tomato were analyzed. The names of the tomato MADS-box genes, the locus/gene names in SGN, the length of the amino acid sequence, the molecular weight, and the isoelectric point are shown in Table 1. The amino acid sequence length of the 131 predicted tomato MADS-box proteins varied from 54 (SlMBP19) to 417 (SlMADS52), the relative molecular mass ranged from 6224.26 Da (SlMBP19) to 47275.1 Da (SlMADS52), and the isoelectric point (pI) varied from 4.41 (SlMADS70) to 11.03 (SlMADS77 and SlMADS80). The Arabidopsis homologs of the tomato MADS-box genes and the references describing the functions of these homologs are shown in Table S1. In addition, the functional reports of several tomato genes that have been studied previously are listed.
Table 1. Overview of MADS-box genes identified in tomato. The list of predicted genes and related information includes the gene name, gene locus, molecular details, classification into two types according to the phylogenetic analysis shown in Figure S1, homologs in Arabidopsis, and the reference for the gene function investigation. pI, isoelectric point; MW, molecular weight.
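The length and molecular weight columns of Table 1 can be derived from a protein sequence alone. The sketch below is a minimal illustration and not the tool used in this study: it sums standard average residue masses and adds one water molecule for the free termini. The function name `protein_summary` and the example peptide are hypothetical, and the isoelectric point is left out because its value depends on the pKa set chosen.

```python
# Average residue masses in Da (amino acid minus one water), standard values.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the N- and C-termini


def protein_summary(seq):
    """Return the length and average molecular weight (Da) of a protein."""
    seq = seq.strip().upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    mw = sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER
    return {"length": len(seq), "mw_da": round(mw, 2)}


if __name__ == "__main__":
    # Hypothetical short peptide, used only to show the call.
    print(protein_summary("MGRGKIEIKR"))
```

Applied to each predicted protein, a routine of this kind yields the per-gene values from which the reported minimum and maximum lengths and masses can be read off.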

Classification and Phylogenetic Analysis of Tomato MADS-Box Genes
To study the phylogenetic relationships among MADS-box genes in tomato and Arabidopsis [6], a phylogenetic tree was constructed by the neighbor-joining (NJ) method using MEGA 5.02 (Figure S1A). Based on previous reports on Arabidopsis, the 131 tomato MADS-box genes were classified into two types: type I (81) and type II (50). Based on the phylogenetic tree, the type I and type II tomato MADS-box genes were then subdivided into more detailed subgroups. Type I could be further divided into four groups (Mα, Mβ, Mγ, and Mδ), while type II (MIKC) could be further divided into MIKC* and MIKC^C. The MIKC^C-type genes included the AP3/PI, SVP, AGL15, SEPALLATA (SEP), AGL6, AP1, FLOWERING LOCUS C (FLC), SOC1, AGAMOUS (AG), TM8, and DEFICIENS (DEF)/GLOBOSA (GLO) subfamilies, similar to the MADS-box genes in other plant species. In addition, separate phylogenetic trees of the type I and type II MADS-box proteins in tomato were constructed to show the relationships among the members of each type more clearly (Figure S1B,C).

Conserved Motif and Gene Structure Analysis of Tomato MADS-Box Genes
The intron-exon organization was analyzed to investigate the structural diversity and evolution of the 131 tomato MADS-box genes. As shown in Figure 1, we obtained the intron/exon arrangement of each gene by comparing its CDS with its genomic sequence using the Gene Structure Display Server (GSDS). The number of introns in tomato MADS-box genes ranged from one to 11. Similar to Arabidopsis, the distribution of introns in tomato differed between type I and type II genes [6]. In our study, we found that the Mα, Mβ, and Mγ groups of the type I genes usually had no introns or one intron, which might be explained by the diversity of the reverse-transcribed origin or by differences in the acquisition or loss of introns by the ancestors of these three groups of genes [6]. Based on the genomic data, the Mδ group of type I and the type II genes contained multiple introns. Among the Mδ clade and the type II genes, 52 of 56 (92.9%) genes had more than five introns. Additionally, closely related tomato MADS-box genes generally had similar gene structures, differing only in the lengths of their introns and exons. However, some closely related gene pairs showed different intron/exon arrangements. For instance, SlMBP61 has one exon, whereas its close homologs SlMBP51 and SlMBP10 both have two, although their phylogenetic relationship displayed a high bootstrap value (Figure S1B).
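The intron count of a gene follows directly from its exon coordinates once the CDS has been aligned to the genomic sequence. The sketch below is a hedged illustration of that bookkeeping, not a reimplementation of GSDS: the coordinate convention (1-based, inclusive) and the function names `introns_from_exons` and `percent` are assumptions made for the example.

```python
def introns_from_exons(exons):
    """Derive intron intervals from exon intervals.

    exons: list of (start, end) pairs, 1-based and inclusive, on the
    genomic sequence of a single gene (assumed convention).
    """
    ordered = sorted(exons)
    introns = []
    for (_, end_prev), (start_next, _) in zip(ordered, ordered[1:]):
        if start_next <= end_prev:
            raise ValueError("exons overlap")
        if start_next > end_prev + 1:
            # The gap between two consecutive exons is an intron.
            introns.append((end_prev + 1, start_next - 1))
    return introns


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    # Hypothetical three-exon gene: two introns are expected.
    print(introns_from_exons([(1, 185), (300, 380), (500, 560)]))
    # Share of Mδ and type II genes with more than five introns (from the text).
    print(percent(52, 56))
```

A single-exon gene returns an empty list, which corresponds to the intron-less structure described for many Mα, Mβ, and Mγ genes.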
To better analyze conserved motifs in tomato MADS-box proteins, we identified conserved motifs using the Multiple EM for Motif Elicitation (MEME) program and annotated them using SMART. A total of 10 conserved motifs, named 1 to 10, were identified (Figure 2). The details of the motifs are shown in Figure S2. As expected, genes of the same type tend to possess the same motifs. Motif 1, one of the most typical MADS-box domains, comprised 42 amino acids and was found in the majority of tomato MADS-box proteins. Motif 3 was also conserved across most of the tomato MADS-box proteins, including both type I and type II proteins. Motifs 2 and 4 represent the K domain, which plays an important role in protein-protein interactions among MADS-box proteins, and they were found only in type II MADS-box proteins. Motif 2 was identified in almost all the type II proteins except for TM8/TDR8, SlMADS86, SlMADS87, and SlMADS83. Most type II proteins also had motif 4, with seven exceptions (SlGLO2, SlGLO1, SlMBP11, SlMADS84, SlMADS86, SlMADS87, and SlMADS83). Motif 9 is also a MADS-box domain and is present in a small number of tomato MADS-box proteins. In contrast, motifs 6, 7, 8, and 10 were only weakly conserved in tomato MADS-box proteins and were found only in type I MADS-box proteins.
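Statements such as "motifs 2 and 4 were found only in type II proteins" amount to a simple set comparison over a motif presence table. The sketch below shows one way to run that comparison, assuming the MEME output has already been reduced to a mapping from gene to motif numbers. The function name `exclusive_motifs` and the gene names in the example are hypothetical placeholders, not data from this study.

```python
def exclusive_motifs(motifs_by_gene, type_by_gene):
    """Return, for each gene type, the motifs seen only in that type.

    motifs_by_gene: dict mapping gene name -> set of motif numbers
    type_by_gene:   dict mapping gene name -> type label, e.g. "I" or "II"
    """
    types_seen = {}
    for gene, motifs in motifs_by_gene.items():
        for motif in motifs:
            types_seen.setdefault(motif, set()).add(type_by_gene[gene])

    exclusive = {}
    for motif, types in types_seen.items():
        if len(types) == 1:
            exclusive.setdefault(next(iter(types)), []).append(motif)
    return {label: sorted(motifs) for label, motifs in exclusive.items()}


if __name__ == "__main__":
    # Hypothetical presence table for four placeholder genes.
    motifs = {
        "geneA": {1, 3, 6},
        "geneB": {1, 7},
        "geneC": {1, 2, 3, 4},
        "geneD": {1, 2},
    }
    types = {"geneA": "I", "geneB": "I", "geneC": "II", "geneD": "II"}
    print(exclusive_motifs(motifs, types))
```

Motifs shared by both types, like motifs 1 and 3 in the example, do not appear in the result, which mirrors the distinction drawn in the text between broadly conserved and type-specific motifs.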

Chromosomal Locations of Tomato MADS-Box Genes
According to the physical genome annotation files obtained from the SGN and NCBI databases, the 131 MADS-box genes were located on all 12 tomato chromosomes, implying that the MADS-box transcription factor family may have multiple functions in tomato plants.
As shown in Figure 3, the tomato MADS-box genes are distributed unevenly among the chromosomes. Chromosome 1 contains the most MADS-box genes (24), while chromosomes 8 and 9 contain the fewest (two each). Gene duplication events play a significant role in expanding gene families and increasing genome complexity in eukaryotes [73,74]. Both tandem amplification and segmental duplication of chromosomal regions can enlarge gene families. In this study, the results showed that chromosomes 1, 3, 4, 10, and 12 contain gene clusters or gene hotspots; in particular, chromosome 1 contains eight MADS-box genes within a short region. Additionally, we confirmed that internal chromosome duplication (tandem duplication) occurred in these genes.
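Gene clusters of the kind described above can be located by sorting the genes on each chromosome and grouping neighbors that lie close together. The sketch below illustrates the idea under stated assumptions: the 200 kb maximum gap is an arbitrary illustrative threshold rather than the criterion used in this study, and the function name `find_clusters` and the example coordinates are hypothetical.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000, min_size=2):
    """Group genes on the same chromosome whose start positions are close.

    genes:    iterable of (name, chromosome, start_position)
    max_gap:  largest distance in bp between neighbors within one cluster
              (assumed threshold, chosen only for illustration)
    min_size: smallest number of genes reported as a cluster
    """
    by_chrom = defaultdict(list)
    for name, chrom, start in genes:
        by_chrom[chrom].append((start, name))

    clusters = []
    for chrom in sorted(by_chrom):
        current, previous = [], None
        for start, name in sorted(by_chrom[chrom]):
            if previous is not None and start - previous > max_gap:
                # Gap too large: close the running group and start a new one.
                if len(current) >= min_size:
                    clusters.append((chrom, current))
                current = []
            current.append(name)
            previous = start
        if len(current) >= min_size:
            clusters.append((chrom, current))
    return clusters


if __name__ == "__main__":
    # Hypothetical coordinates for placeholder genes.
    example = [
        ("geneA", 1, 100_000), ("geneB", 1, 150_000), ("geneC", 1, 5_000_000),
        ("geneD", 3, 10), ("geneE", 3, 50),
    ]
    print(find_clusters(example))
```

Clusters found this way are only candidates for tandem duplication; sequence similarity between the clustered genes would still need to be checked.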

Predictions of Expression Profiles of Tomato MADS-Box Genes in Different Organs
To investigate the expression patterns of tomato MADS-box genes in different tissues, we analyzed transcript expression (RNA-seq) data from nine tomato tissues at different developmental stages: whole root (RT), young leaf (YL), mature leaf (ML), young flower bud (YFB), fully open flower (F), and fruit at the immature green (IMG), mature green (MG), breaker (B), and mature (MF) stages of development and ripening. These datasets were searched using the SGN locus/gene names of 124 tomato MADS-box genes; SlMADS4, SlMADS11, SlMADS37, SlMADS44, SlMADS46, SlMADS56, SlMADS68, SlMADS70, and SlMADS89 were excluded because they could not be found reliably in the Tomato Functional Genomics Database (TFGD). Then, we constructed a hierarchical clustering heat map using these datasets (Figure 4).
The expression profiles revealed that 117 genes were expressed in at least one tomato plant organ, while the other seven genes (SlMADS24, SlMADS25, SlMADS26, SlMADS33, SlMADS45, SlMADS61, and SlMADS74) were either expressed at levels too low to be detected or had temporally and spatially specific expression patterns that resulted in no expression in the organs tested. Most tomato MADS-box genes displayed a broad expression range across all the organs and developmental stages, which is consistent with previous reports that the MADS-box genes may play multiple roles in plant growth and development [75,76]. However, some genes exhibited tissue-specific expression. For example, the expression of SlMADS12, SlMADS20, SlMADS21, SlMADS22, and SlMADS23 was restricted to the whole root, and the SlMADS16, SlMADS17, and SlMADS132 transcripts were observed only during flower development. These results suggest that these genes may be involved in the regulation of biological processes in the tomato root or in flower growth and development. Eight genes (SlMBP2, SlMBP6, SlMBP10, SlMBP21, TAP3, SlMADS78, SlMADS92, and SlMADS98) showed especially high expression in young flower buds (YFB) and fully open flowers (F), indicating that these genes may play important roles in floral organ development. We further found that most type II genes (SlMBP3, SlMBP7, SlMBP11, SlMBP15, MADS-RIN, MADS-MC, TAGL1, and LeAP1) were highly expressed during flower or fruit development; in particular, the expression values of MADS-RIN and MADS-MC in fruits were more than 1000, suggesting that these genes may be associated with the reproductive growth of tomato. However, the expression of most type II genes showed no significant difference among tissues.
In short, these results indicate that the MADS-box genes had different expression levels in various tomato organs, and the predictions of the organ expression profiles of the tomato MADS-box gene family may provide insight for future studies on the functions of MADS-box genes in tomato plant growth and development.
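The grouping of genes into undetected, organ-specific, and broadly expressed categories can be expressed as a small rule over each gene's expression profile. The sketch below is one possible formulation, not the procedure used in this study: the detection threshold of 1.0, the mapping of tissue abbreviations to organs, and the function names `classify_profile` and `log2_transform` are assumptions. The log2(x + 1) transform is included because it is a common preprocessing step before drawing a clustered heat map.

```python
import math

# Assumed mapping from the tissue abbreviations used in the text to organs.
ORGAN_OF = {
    "RT": "root", "YL": "leaf", "ML": "leaf",
    "YFB": "flower", "F": "flower",
    "IMG": "fruit", "MG": "fruit", "B": "fruit", "MF": "fruit",
}


def classify_profile(profile, threshold=1.0):
    """Label a gene by the organs in which it is detected.

    profile:   dict mapping tissue abbreviation -> expression value
    threshold: minimum value counted as expressed (assumed cutoff)
    """
    organs = {ORGAN_OF[t] for t, value in profile.items() if value >= threshold}
    if not organs:
        return "not detected"
    if len(organs) == 1:
        return next(iter(organs)) + "-specific"
    return "broad"


def log2_transform(profile):
    """Compress the dynamic range with log2(x + 1) before clustering."""
    return {t: math.log2(value + 1.0) for t, value in profile.items()}


if __name__ == "__main__":
    # Hypothetical profile of a gene detected only in the root.
    root_gene = {t: 0.0 for t in ORGAN_OF}
    root_gene["RT"] = 12.0
    print(classify_profile(root_gene))
    print(log2_transform({"RT": 3.0}))
```

A gene detected in both young flower buds and open flowers is still labeled flower-specific under this rule, because the classification is made per organ rather than per tissue sample.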

The Critical Tomato MADS-Box Genes Involved in Floral Organ Development
Based on a phylogenetic analysis with the petunia MADS-box genes that participate in floral organ development, which have been reported previously [32,77] (Figure S3), 15 tomato MADS-box genes that are possibly involved in floral organ development were selected, including two class A genes (MADS-MC and SlMBP20), four class B genes (TAP3, TM6, SlMBP1, and SlMBP2), two class C genes (TAG1 and TAGL1), two class D genes (SlMBP3 and SlMBP22), and five class E genes (TAGL2, TM5, SlMADS1, SlMBP21, and SlMBP6), as shown in Table S2.
According to the expression profile predictions shown in Figure 4, among the 15 MADS-box genes, nine genes (MADS-MC, TAP3, TM6, SlMBP1, SlMBP2, TAG1, SlMBP22, SlMBP21, and SlMBP6) were very highly expressed in flowers. However, TAGL1, SlMBP3, TAGL2, TM5, and SlMADS1 were mainly expressed during the stages of fruit development and ripening, and the expression of SlMBP20 was particularly high in leaves.
To further investigate the potential role of these MADS-box genes in floral organ development, the expression patterns of the 15 tomato MADS-box genes in the four whorls of floral organs (sepal, petal, stamen, and carpel) were compared by qPCR. As shown in Figure 5A,B, the class A genes MADS-MC and SlMBP20 were highly expressed in sepals. The expression levels of the class B genes TAP3, SlMBP2, and SlMBP1 were notably high in the petal and stamen (Figure 5C-E), whereas the transcription of TM6 was markedly higher in the carpel and stamen than in the other floral organs (Figure 5F). The two class C genes, TAG1 and TAGL1, were mainly expressed in stamens and carpels (Figure 5G,H). Moreover, the class D genes, SlMBP3 and SlMBP22, showed organ-specific expression patterns and were exclusively expressed in the carpel (Figure 5I,J). Among the class E genes, TAGL2 and TM5 were expressed at significantly higher levels in the petal, stamen, and carpel than in the sepal (Figure 5K,L), and SlMBP6 was highly expressed in petals and carpels (Figure 5M). In comparison, the other class E genes (SlMADS1 and SlMBP21) exhibited higher expression levels in the sepal and carpel (Figure 5N,O).
In addition, the interaction networks of these 15 tomato MADS-box proteins were predicted using STRING (Figure S4). The results showed that they interact with other proteins, directly or indirectly. The TAGL2, SlMBP3, TAGL1, and SlMBP22 proteins are predicted to interact directly with each other. In addition, the TAGL1 protein showed complex interactions with several other proteins, including the SlMBP3 and TAP3 proteins. The TM5 protein was also predicted to interact directly with the TAP3 and TM6 proteins.

Differential Expression Analysis of Tomato MADS-Box Genes at Different Stages of Fruit Development and Ripening
According to the expression profile predictions, we selected five tomato MADS-box genes (SlMBP3, MADS-RIN, TAGL1, TM4, and SlMBP7) that may take part in fruit development and ripening. We analyzed their expression patterns by qPCR at five stages of fruit development and ripening: the immature green (IMG), mature green (MG), and breaker (B) stages, as well as four days after breaker (B+4) and seven days after breaker (B+7). SlMBP3 exhibited a strikingly high expression level at the IMG stage and extremely low expression at the other stages (Figure 6A). The expression levels of MADS-RIN and TAGL1 increased from the MG to the B+4 stage and then decreased at B+7 (Figure 6B,C). TM4 expression increased continuously during fruit development and reached its maximum at the B+7 stage (Figure 6D). Compared with the other stages, the expression of SlMBP7 was slightly higher at B (Figure 6E). These results indicate that the qPCR data were consistent with the predicted expression profiles and that our predictions are suitable for investigating the expression patterns of tomato MADS-box genes.
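The text does not state how relative transcript levels were calculated from the qPCR data. One widely used approach is the 2^-ΔΔCt method, sketched below under that assumption: the target gene is normalized to a reference gene and then compared with a calibrator sample. The function name `relative_expression` and all Ct values in the example are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^-ddCt method (assumed, not from the text).

    ct_target, ct_reference:         Ct values in the sample of interest
    ct_target_cal, ct_reference_cal: Ct values in the calibrator sample
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: the target amplifies two cycles earlier in the
    # sample than in the calibrator, with an unchanged reference gene.
    print(relative_expression(22.0, 18.0, 24.0, 18.0))
```

The method assumes that the amplification efficiency of both primer pairs is close to 100%, so each cycle of difference corresponds to a twofold change.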

Characterization of MADS-Box Genes in Tomato
The MADS-box genes control diverse biological processes in plants, including vegetative growth and reproductive development. They mainly play key roles in the developmental processes of inflorescences, flowers, and fruits [78,79]. Twenty-four MIKC^C-type MADS-box transcription factors have been identified in tomato, and their functions and evolution were thoroughly studied in 2006 [35]. However, the MIKC^C-type MADS-box members are only part of the MADS-box transcription factor family, and to date, there has been no comparative report on the tomato MADS-box genes. Genome-wide analysis of gene families is a major and necessary approach for analyzing the structure, evolution, and function of genes. In this study, 131 tomato MADS-box proteins were identified, and 96 new tomato MADS-box proteins with unknown functions were systematically named (Table 1). This study is the first comparative analysis of the tomato MADS-box gene family, and we believe that resolving the confusion in gene naming will facilitate further functional analysis of the tomato MADS-box genes.
First, we analyzed the phylogenetic relationships of the 131 tomato MADS-box proteins with the Arabidopsis MADS-box proteins to classify the tomato MADS-box proteins into five subfamilies (MIKC, Mα, Mβ, Mγ, and Mδ), as shown in Figure S1A. Compared with Arabidopsis, a larger number of MADS-box proteins were found in tomato. In total, 81 tomato MADS-box genes were determined to be type I genes, including the Mα, Mβ, Mγ, and Mδ groups, which is more than in Arabidopsis. We speculate that tomato type I MADS-box genes may have a higher duplication rate and/or a lower gene loss rate after duplication. In contrast, 50 tomato MADS-box genes were classified as type II genes, including MIKC^C and MIKC*, which is comparable to the number in Arabidopsis. These results indicate that gene retention after duplication has differed among species, leading to different numbers of MADS-box genes in different species under different evolutionary constraints [80]. Then, in order to investigate the phylogenetic relationships of MADS-box genes within tomato, a phylogenetic tree was constructed for each of the two types of tomato MADS-box genes (Figure S1B,C). This showed that tomato MADS-box genes are conserved within subfamilies.
To obtain insight into the structural diversity of the tomato MADS-box genes, the intron-exon organization was analyzed (Figure 1). Previous studies have postulated that an intron-rich gene would lose multiple introns simultaneously by retrotransposition, thereby producing intron-less ancestral genes. In this study, we found that the Mα, Mβ, and Mγ groups of the type I genes usually have no introns or one intron, which may reflect the loss of multiple introns during MADS-box gene family diversification. In addition, the distribution of introns differed between the tomato type I and type II genes, and the Mδ group of type I and the type II genes had more introns than the Mα, Mβ, and Mγ group genes. Similar cases have also been detected in Arabidopsis and rice [7,13], suggesting evolutionary conservation among plants. However, some closely related gene pairs showed different intron/exon arrangements, indicating that a more complicated gene structural evolution may exist in tomato MADS-box genes. The conserved motif analysis indicated that members of the same group shared most conserved motifs (Figure 2), suggesting that these conserved motifs play important roles in group-specific functions. However, high structural divergence was found between the different groups. An analysis of the gene structures and conserved motifs could provide more clues about the evolutionary relationships of the MADS-box family in tomato.
Gene duplication (segmental duplication and tandem duplication) as well as transposition events are prevalent forces that result in the expansion of gene families and in genome complexity in eukaryotes [74]. The duplication of two or more genes located on one chromosome is defined as a tandem duplication event, whereas gene duplication that occurs between different chromosomes is identified as segmental duplication [73,81]. Both tandem and segmental duplication can play crucial roles in MADS-box gene expansion in the tomato genome. In our study, a chromosomal location analysis of the tomato MADS-box genes showed that the MADS-box genes are distributed on all 12 chromosomes (Figure 3). The tomato MADS-box genes had a high-density distribution on chromosome 1, which had 24 genes, suggesting that this density might be caused by tandem duplications. The closely related tomato MADS-box genes formed tandem arrays on chromosomes 1, 3, 4, 10, and 12, which may have helped tomato evolve characteristics distinct from those of other plants.
Since gene expression profiles can provide significant clues about gene function, the expression of tomato MADS-box genes in whole root (RT), young leaf (YL), mature leaf (ML), young flower bud (YFB), fully open flower (F), and five different fruit tissues was examined using transcript expression (RNA-seq) data. All 124 tested tomato MADS-box genes, that is, all genes except SlMADS4, SlMADS11, SlMADS37, SlMADS44, SlMADS46, SlMADS56, SlMADS68, SlMADS70, and SlMADS89, showed distinct expression patterns (Figure 4). This finding may provide insight for future studies on the functions of MADS-box genes in tomato plant growth and development. For example, we found that the SlMADS23 gene was expressed only in the root, so we speculate that SlMADS23 may play a key role in root growth and development. The roots of plants determine the capacity of plants to acquire and distribute nutrients and water, and provide a means of adapting to environmental conditions. Thus, root architecture is extremely important for plant development and breeding. In the future, we will test whether SlMADS23 is related to root growth by constructing a SlMADS23 overexpression vector and generating transgenic overexpression tomato plants to study the regulation of root growth and development by SlMADS23. In addition, mutagenesis of SlMADS23 with the CRISPR/Cas9 system may also prove to be a helpful strategy.

Prediction of MADS-Box Genes Involved in the Regulation of Flower Development and Floral Organ Identity
An investigation of the genetic and molecular basis of flower development and floral organ identity in Arabidopsis and petunia suggested that MADS-box genes play fundamental roles in floral organ identity and flower development [82]. It has been confirmed that five classes of MADS-box genes (A-E) were involved in specifying floral organ identity [83][84][85]. In Arabidopsis, the class A genes (AP1 and AP2), the class B genes (AP3 and PI), the class C gene (AG), the class D gene (AGL11), and the class E gene (SEP1, SEP2, SEP3, and SEP4) were MADS-box genes, which have been reported to be involved in the regulation of floral organ development [83,86]. In petunia, lots of MADS-box genes, including the class A genes PETUNIA FLOWERING GENE (PFG), FLORAL BINDING PROTEIN 26 (FBP26), and FBP29, the class B genes TM6, PMADS1/GP, PMADS2, and FBP1, the class C genes PMADS3, FBP6, and FBP24, the class D genes FBP11 and FBP7, as well as FBP2, FBP4, FBP5, FBP9, and the class E genes FBP23, PMADS4, and PMADS12 played important roles in flower development [17,87].
In this paper, we investigated the phylogenetic relationships between the tomato MADS-box genes and those of Petunia hybrida to select 15 tomato MADS-box genes that may play specific roles in flower development (Table S2). According to the expression profile predictions shown in Figure 4, the highest expression values for most of these genes (MADS-MC, TAP3, TM6, SlMBP1, SlMBP2, TAG1, SlMBP22, SlMBP21, and SlMBP6) were observed in flower development stages. Furthermore, qPCR was used to study the expression patterns of these 15 tomato MADS-box genes in the four whorls of floral organs (sepal, petal, stamen, and carpel).
AP1 is an Arabidopsis class A gene that confers sepal identity in the first floral whorl [88]. In petunia, three genes (PFG, FBP26, and FBP29) have been identified as orthologs of AP1/SQUA in Arabidopsis [17]. Our phylogenetic analysis showed that MADS-MC and SlMBP20 belong to this clade (Figure S3), and their expression was particularly high in the sepal, suggesting that they might play a role similar to that of AP1 (Figure 5A,B). The class B genes are involved in specifying petal and stamen identity in angiosperms [89]. Regarding the class B genes, five close homologs of the petunia genes TOMATO MADS-BOX GENE6 (TM6), GREEN PETAL (GP)/PETUNIA MADS BOX GENE 1 (PMADS1), PMADS2, and FBP1 were found in tomato [77]. The qPCR analysis showed that TAP3, SlMBP2, and SlMBP1 have petal- and stamen-specific expression, while the TM6/TDR6 transcripts were mainly detected in the petal and carpel (Figure 5C-F). These results are similar to those reported for the homologous genes in petunia [77]. Two tomato MADS-box genes involved in C functions, TAG1 and TAGL1, belong to the monophyletic AGAMOUS (AG) subfamily. These two genes were mainly expressed in stamens and carpels (Figure 5G,H), which is consistent with their function in specifying stamen and carpel development [52]. SlMBP3 and SlMBP22, which are highly homologous to the two petunia class D MADS-box genes FBP11 and FBP7, were each shown to be exclusively expressed in the carpel (Figure 5L,I). This result suggests that SlMBP3 and SlMBP22 may have functions similar to those of the petunia FBP7 and FBP11 genes, which are related to the establishment of real ovules or carpel-like structures [90]. The Arabidopsis SEP genes, the typical class E genes, are necessary for the specification of sepal, petal, stamen, and carpel identity through interaction with the class A, B, C, and D genes [16].
In petunia, seven class E genes (FBP2, FBP4, FBP23, FBP5, FBP9, pMADS12, and pMADS4) have been assigned to the SEPALLATA (SEP) clade [32]. The tomato TAGL2, TM5, SlMADS1, SlMBP21, and SlMBP6 genes are homologous to these petunia class E genes (Figure S3), and some differences in their expression patterns were observed (Figure 5J-N), indicating that these five tomato class E genes may be involved in the identity of multiple floral organs. Thus, we believe that the expression patterns of the tomato MADS-box genes identified in our study will be an important tool for understanding the mechanisms of flower development in tomato. Previous reports have found that MADS-box proteins are able to form multiple homodimeric or heterodimeric complexes in plants, and that combinations of MADS-box proteins often derive their regulatory specificity from other DNA-binding or accessory factors. To understand how the tomato MADS-box genes act in flower development and floral organ identity, it is necessary to identify the network of protein-protein interactions among them. Therefore, the predicted interaction networks of the 15 tomato MADS-box proteins involved in floral organ development were analyzed in our report (Figure S4). In many domesticated crops, selecting for inflorescence architecture is an important way to improve flower production and yield. In tomato, SlMBP21 forms protein complexes with JOINTLESS (J) and MACROCALYX (MC) that act as a transcription activator for the development of the tomato flower abscission zone [50]; SlMBP21, J, and MC thus share a common function in abscission zone development. In breeding, altering any of these genes may therefore affect plant growth. In this study, the predicted interaction networks may help us to understand how the tomato MADS-box genes act in flower development and floral organ identity.

Potential Functions of Tomato MADS-Box Genes during Fruit Development
To better understand the roles of tomato MADS-box genes in fruit development and ripening, we selected five tomato MADS-box genes (SlMBP3, MADS-RIN, TAGL1, TM4, and SlMBP7) that were predicted to be abundantly expressed at different stages of fruit development and ripening. We then measured their relative expression levels in fruit samples from five different stages of development by qPCR (Figure 6). As shown in Figure 6A, the expression level of SlMBP3 was higher at the MG stage than at the other stages. The expression levels of MADS-RIN and TAGL1 increased during the transition from the MG stage to the B+4 stage and then decreased at the B+7 stage (Figure 6B,C). Moreover, the expression level of TM4 increased continuously during fruit development (Figure 6D). The SlMBP7 gene showed a relatively high expression level at the B stage (Figure 6E). Recent reports have identified a number of MADS-box genes that are required for the regulation of fruit development and ripening. One of the most representative is the tomato MADS-RIN gene, which is one of the earliest-acting ripening regulators and plays crucial roles in fruit ripening through ethylene-dependent and ethylene-independent regulatory pathways [19,91]. In addition, TAGL1, TM4, and SlMBP7 have been found to regulate fruit ripening in tomato [21,22]. Our results are consistent with the functional studies of these genes. In addition, the expression of SlMBP3 was particularly high at the MG stage, suggesting that it might play an important role in fruit development and ripening. These results help to advance our understanding of the function of MADS-box genes in the regulation of fruit development and ripening in tomato.

Plant Material and Growth Condition
In this article, tomatoes (Solanum lycopersicum 'Ailsa Craig', AC++) from the Laboratory of Molecular Biology of Tomato, Bioengineering College, Chongqing University, Chongqing, China, were grown under controlled greenhouse conditions with a 16-h day (25 °C)/8-h night (18 °C) cycle, 80% humidity, and a light intensity of 250 µmol·m⁻²·s⁻¹, and were managed routinely. The tomato flowers were tagged at anthesis, and the floral organs, namely sepals (Se), petals (Pe), stamens (St), and carpels (Ca), were collected from flowers at anthesis. Fruit color and days post-anthesis (DPA) were used to distinguish the ripening stages of tomato fruits. We defined 20 DPA as immature green (IMG) and 35 DPA as mature green (MG), at which point the fruits are green and shiny and no obvious color change is observed.
The 38-DPA tomato fruits, whose color changes from green to yellow, were characterized as breaker (B) fruits. In addition, samples of B+4 (4 days after breaker) and B+7 (7 days after breaker) fruits were used in our study. All samples were frozen in liquid nitrogen immediately and stored at −80 °C.

Identification of MADS-Box Genes in Tomato
The Sol Genomics Network (SGN, Available online: http://solgenomics.net/) and the National Center for Biotechnology Information (NCBI, Available online: https://www.ncbi.nlm.nih.gov/) databases were used to comprehensively identify the MADS-box protein sequences of tomato. BLAST searches, using all the Arabidopsis and rice MADS-box protein sequences as queries, were performed to check the predicted tomato MADS-box protein sequences in the databases. Subsequently, we examined all the candidate protein sequences with the PROSITE (Available online: http://www.expasy.org/prosite/) and SMART (Available online: http://smart.embl-heidelberg.de/) databases for reliability. The tomato protein sequences containing the typical conserved domain of the MADS-box protein family were selected for multiple amino acid sequence alignment and phylogenetic tree analysis. We then obtained the corresponding DNA sequences from the SGN database. Additionally, the molecular weights and isoelectric points of the tomato MADS-box proteins were calculated using the ExPASy proteomics server.
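The molecular-weight part of this step can be illustrated with a minimal sketch. This is not the ExPASy implementation; it assumes the common convention of summing average residue masses and adding one water molecule, and the function name and the example peptide are hypothetical. The isoelectric point is not covered here.

```python
# Average residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added for the free N- and C-termini


def molecular_weight(protein_seq: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified protein."""
    seq = protein_seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


if __name__ == "__main__":
    # Hypothetical short peptide, used only to show the call.
    example_peptide = "MGRGKVQLKR"
    print(f"{molecular_weight(example_peptide):.2f} Da")
```

Ambiguous residue codes (for example X or B) are rejected rather than guessed, which keeps the estimate explicit about what it covers.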

Phylogenetic Analysis of Tomato MADS-Box Proteins
Multiple sequence alignment of the two groups comprising all 131 tomato MADS-box genes (Table 1) was generated using ClustalX 1.81. The alignment results were used to construct a phylogenetic tree with the MEGA5.02 program, and the evolutionary history was inferred using the neighbor-joining method [92].
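Neighbor-joining operates on a matrix of pairwise distances derived from the alignment. The sketch below shows one simple way such a matrix could be computed. The distance model used in MEGA is not stated here, so the p-distance with pairwise gap deletion is an assumption, and the sequence names and toy alignment are hypothetical. Tree construction itself is not shown.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion of gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(alignment: dict) -> dict:
    """Symmetric nested dict of p-distances for every pair of sequences."""
    names = list(alignment)
    dist = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        dist[a][b] = d
        dist[b][a] = d
    return dist


if __name__ == "__main__":
    # Hypothetical toy alignment; real input would be the ClustalX output.
    toy_alignment = {
        "seqA": "MGRGKVEL",
        "seqB": "MGRGKIEL",
        "seqC": "MGR-KIQL",
    }
    for name, row in distance_matrix(toy_alignment).items():
        print(name, {other: round(d, 3) for other, d in row.items()})
```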

The Analysis of Gene Structure and Conserved Motif
The tomato MADS-box coding sequences (CDS) and the corresponding genomic DNA sequences were collected from the SGN and NCBI databases to predict gene structure. The online tool Gene Structure Display Server 2.0 (GSDS 2.0, Available online: http://gsds.cbi.pku.edu.cn/index.php) was used to construct an exon/intron map [93].
Conserved motifs in the tomato MADS-box protein sequences were identified with the online software MEME Version 4.12.0 (Available online: http://meme-suite.org/tools/meme). MEME was run with the following parameters: 10 different motifs, a motif width of 6-200 amino acids, and any number of repetitions. The SMART database was used to annotate the MEME motifs.

Chromosomal Locations and Identification of Interaction Network
To determine the chromosomal locations of the tomato MADS-box genes, we obtained the physical genome annotation files from the SGN and NCBI databases. The physical map of the tomato MADS-box genes was drawn using Tomato-EXPEN 2000 (Available online: https://solgenomics.net/cview/map.pl).
The interaction network was constructed with STRING (functional protein association networks, Available online: https://string-db.org/) using the multiple-protein sequence search [94].

Digital Gene Expression Analysis of Tomato MADS-Box Genes
To obtain the expression profiles of the tomato MADS-box genes, RNA-seq data indexed by SGN locus/gene names were analyzed. We downloaded the RNA-seq data from the Tomato Functional Genomics Database (TFGD); the sequence data were obtained from various tissues of the wild species S. pimpinellifolium (LA1589). In addition, data from the tomato lab and the Tomato eFP Browser were used to analyze the expression of the tomato MADS-box genes. A heatmap was generated with HemI 1.0 (Heatmap Illustrator, Available online: http://hemi.biocuckoo.org/) using the relative expression values or ratios of each tomato MADS-box gene [95].
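As an illustration of what "relative expression values" for a heatmap could look like, the sketch below scales each gene's values to that gene's maximum across tissues. The exact scaling applied before plotting is not specified in this section, so this scheme is an assumption, and the gene names and numbers are invented for the example.

```python
def row_relative(expression: dict) -> dict:
    """Scale each gene's expression values to the gene's own maximum (range 0 to 1)."""
    scaled = {}
    for gene, by_tissue in expression.items():
        peak = max(by_tissue.values())
        if peak <= 0:
            # Gene not detected in any tissue: keep the row at zero.
            scaled[gene] = {tissue: 0.0 for tissue in by_tissue}
        else:
            scaled[gene] = {tissue: value / peak for tissue, value in by_tissue.items()}
    return scaled


if __name__ == "__main__":
    # Hypothetical values for two made-up genes; tissue codes follow the text
    # (Rt = root, YL = young leaf, F = fully open flower).
    toy_expression = {
        "geneA": {"Rt": 12.0, "YL": 0.0, "F": 0.0},
        "geneB": {"Rt": 2.0, "YL": 4.0, "F": 8.0},
    }
    for gene, row in row_relative(toy_expression).items():
        print(gene, row)
```

Scaling per gene makes tissue preference visible even when absolute expression levels differ widely between genes; a log transform would be an equally plausible choice.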

Total RNA Extraction and qPCR Analysis
To study the expression patterns of the MADS-box genes involved in floral organ identity and fruit development in tomato, total RNA was extracted from the sepals (Se), petals (Pe), stamens (St), and carpels (Ca) of tomato flowers and from tomato fruits at different developmental stages, including IMG (immature green), MG (mature green), B (breaker), B+4 (4 days after breaker), and B+7 (7 days after breaker), using RNAiso Plus (Takara) in accordance with the manufacturer's instructions. After DNase digestion (Promega, Madison, WI, USA), cDNA was synthesized by reverse transcription with oligo(dT)20 as a primer using M-MLV reverse transcriptase (Promega, Madison, WI, USA). For gene expression quantification, qPCR analysis was performed on the CFX96™ Real-Time System (Bio-Rad, Hercules, CA, USA) using the GoTaq qPCR Master Mix (Promega, Madison, WI, USA). For each reaction, 1.0 µL of mixed primers, 1.0 µL of cDNA, and 3.0 µL of ddH2O were used. NRT (no reverse transcription control) and NTC (no template control) reactions were performed to rule out effects of genomic DNA and environmental contamination. The tomato SlCAC gene was used as an internal standard [96], and the 2^(−ΔΔCT) method was used to calculate relative gene expression levels [97]. All experiments were performed with three biological replicates, each with three technical replicates. Standard curves were run at the same time. All primers were designed with Primer 5.0 software and are shown in Table S3.
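The 2^(−ΔΔCT) calculation can be sketched as follows. The reference gene corresponds to SlCAC in this study; the choice of calibrator sample is not stated in this section, so it is left as a parameter, and the CT values in the example are invented. The sketch assumes an amplification efficiency close to 100% for both genes.

```python
def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_calibrator: float,
    ct_reference_calibrator: float,
) -> float:
    """Fold change of a target gene by the 2^(-ddCT) method."""
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    # Compare the test sample with the calibrator sample.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical CT values: the target reaches threshold two cycles earlier
    # (relative to the reference gene) in the test sample than in the calibrator.
    fold = relative_expression(
        ct_target_sample=22.0,
        ct_reference_sample=18.0,
        ct_target_calibrator=24.0,
        ct_reference_calibrator=18.0,
    )
    print(fold)
```

In practice, the CT values passed in would be means over the technical replicates of one biological replicate, with the fold changes then summarized across biological replicates.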

Data Analysis
In this study, data are presented as mean ± standard deviation. Origin 8.0 software (Available online: https://www.originlab.com/) was used to perform the data analysis, and differences between means were assessed by t-test, with p < 0.05 considered significant (* p < 0.05).
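The summary statistics and the test can be sketched as below. The type of t-test is not specified in this section, so an unpaired, two-tailed Student's t-test with pooled variance is assumed. The critical value 2.776 applies only to 4 degrees of freedom (three replicates per group) at p = 0.05, and the replicate values are invented.

```python
from math import sqrt
from statistics import mean, stdev

# Two-tailed critical t value for p = 0.05 with 4 degrees of freedom (n = 3 per group).
T_CRIT_DF4_P05 = 2.776


def summarize(values):
    """Return (mean, sample standard deviation) for one group of replicates."""
    return mean(values), stdev(values)


def pooled_t_statistic(group_a, group_b):
    """Unpaired Student's t statistic (pooled variance) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    return (mean(group_a) - mean(group_b)) / standard_error, df


if __name__ == "__main__":
    # Hypothetical relative expression values from three biological replicates each.
    stage_one = [1.0, 2.0, 3.0]
    stage_two = [4.0, 5.0, 6.0]
    m, sd = summarize(stage_one)
    print(f"stage_one: {m:.2f} ± {sd:.2f}")
    t_value, df = pooled_t_statistic(stage_one, stage_two)
    significant = df == 4 and abs(t_value) > T_CRIT_DF4_P05
    print(f"t = {t_value:.3f}, df = {df}, significant at p < 0.05: {significant}")
```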

Conclusions
In this study, we conducted a comprehensive and systematic analysis of the tomato MADS-box transcription factor family. A total of 131 genes encoding MADS-box transcription factors, including 81 type I and 50 type II genes, were identified in the tomato genome. We then classified the genes according to the phylogenetic relationships between tomato and Arabidopsis. The phylogenetic relationships, gene structures, conserved motifs, chromosomal distribution, and expression patterns of the genes were characterized. The tomato MADS-box genes showed differential expression levels in different organs. Since MADS-box genes are key TFs that regulate floral organ identity and fruit development and ripening in plants, we examined the expression of 15 tomato MADS-box genes putatively involved in floral organ development and of five tomato MADS-box genes at different stages of fruit development and ripening. These results provide evidence of the relationship between MADS-box genes and floral organ and fruit development. Our study provides comprehensive information on the tomato MADS-box gene family, enables a better understanding of the structure-function relationships among its members, and lays a solid foundation for their comprehensive functional characterization. Furthermore, our bioinformatics and evolutionary analyses will be helpful for better understanding the evolutionary relationships of the MADS-box family in higher plants.

Conflicts of Interest:
The authors declare no conflict of interest.