Article

Genome-Wide Identification and Analysis of the MADS-Box Gene Family in Theobroma cacao

1 Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
2 Institute of Marine Materials Science and Engineering, College of Ocean Science and Engineering, Shanghai Maritime University, Shanghai 201306, China
* Author to whom correspondence should be addressed.
Genes 2021, 12(11), 1799; https://doi.org/10.3390/genes12111799
Submission received: 13 October 2021 / Revised: 10 November 2021 / Accepted: 12 November 2021 / Published: 15 November 2021
(This article belongs to the Section Plant Genetics and Genomics)

Abstract

MADS-box genes encode a class of transcription factors that have been studied extensively and are involved in many plant growth and development processes, especially floral organ specification, flowering time and initiation, and fruit development. In this study, we identified 69 candidate MADS-box genes in Theobroma cacao and clustered 68 of them into five subgroups (Mα: 11; Mβ: 2; Mγ: 14; Mδ: 9; MIKC: 32) based on their phylogenetic relationships with Arabidopsis MADS-box genes; the remaining gene, TcMADS39, could not be assigned to any subgroup. Most TcMADS genes within the same subgroup showed a similar gene structure and highly conserved motifs. Chromosomal distribution analysis revealed that the TcMADS genes were distributed across all 10 chromosomes. Additionally, promoter cis-acting elements, physicochemical properties and subcellular localization were analyzed. This study provides a comprehensive analysis of MADS-box genes in Theobroma cacao and lays the foundation for further functional research.

1. Introduction

MADS-box genes encode eukaryotic transcription factors that play a prominent role in plant development. MADS-box proteins contain a highly conserved DNA-binding MADS domain, approximately 50–60 amino acids long, in their N-terminal region; this domain is involved in recognizing and binding the CArG motifs of target genes [1]. The family name is an acronym of its four first-discovered members: MCM1 in Saccharomyces cerevisiae [2], AGAMOUS in Arabidopsis thaliana [3], DEFICIENS in Antirrhinum majus [4] and SRF in Homo sapiens [5]. Based on protein domain structure, MADS-box genes are divided into two categories, type I and type II. Type I MADS-box genes can be further classified into the Mα, Mβ, Mγ and Mδ subclasses. The type II lineage, also known as the MIKC type, has a characteristic MIKC structure composed of an N-terminal MADS domain, the I (intervening) and K (keratin-like) regions and a variable C-terminal transcriptional activation domain [6]. MIKC-type genes are further divided into two subgroups, MIKC^C and MIKC*, according to their structural features [7].
The MADS-box gene family functions in many important physiological and developmental processes, such as the regulation of floral organ specification [3,4], the integration of flowering signals and flowering initiation [8,9], fruit development [10], meristem identity specification [11] and seed development [12]. For example, wheat VERNALIZATION1 (VRN1) is a key regulator of flowering time and floral meristem determination [13]. The MADS-box gene FLOWERING LOCUS C (FLC) controls the vernalization pathway in Arabidopsis [14]. Apple MdDAM1 plays a role in bud dormancy and growth cessation in autumn [15]. Although MADS-box genes are best known for their roles in flower development and in the classical ABC model of floral organ identity, some of them have also been shown to function in root and leaf morphogenesis [16,17]. To date, MADS-box proteins have been characterized in various plants, including Arabidopsis [18], Populus trichocarpa [19], pineapple [20], Saccharum spontaneum [21] and Erigeron breviscapus [22]. However, little is known about the MADS-box gene family in Theobroma cacao.
Theobroma cacao is an economically important tropical tree native to South America. It is widely cultivated for its fruits (cacao pods), whose beans are the raw material for chocolate, cocoa butter, cosmetics and confectionery [23]. Additionally, some studies have proposed that ingredients found in cocoa may have cardiovascular benefits [24]. The Theobroma cacao genome was sequenced and assembled in 2010 [25], enabling the genome-wide identification and analysis of important gene families such as the NAC domain transcription factor family [26], the WRKY transcription factor family [27] and the glutathione peroxidase (GPX) family [28]. Metabolome and transcriptome profiling of Theobroma cacao pods has also been reported [29]. Against this background, we conducted a bioinformatic analysis of the MADS-box members of Theobroma cacao at the gene level. We identified 69 MADS-box genes, investigated their phylogenetic relationships, classified them, and analyzed their gene structures, motifs and chromosomal locations. Subcellular localization prediction and promoter cis-acting element analysis were also performed. Our results may provide a basis for further functional studies of cacao MADS-box genes and a reference for research into the underlying molecular mechanisms.

2. Materials and Methods

2.1. Identification of MADS-Box Genes in Theobroma cacao

Theobroma cacao genome sequences and annotation files were obtained from Ensembl Plants (http://plants.ensembl.org/index.html, accessed on 16 April 2021). The hidden Markov model (HMM) profile of the MADS domain (accession PF00319) was retrieved from the Pfam database (release 34.0; http://pfam.xfam.org/, accessed on 16 April 2021) [30].
MADS-box proteins in Theobroma cacao were identified using two approaches. First, the downloaded Pfam HMM profile was used with HMMER (version 3.3.2) to search for proteins containing the MADS domain. Second, to avoid missing candidates, proteins from the first search with an e-value < 1 × 10^−20 were aligned with ClustalW (version 2.1) [31], and this alignment was used to construct a new HMM. The new model was used to search all Theobroma cacao protein sequences with HMMER (version 3.3.2) at an e-value cut-off of 0.05. Finally, the predicted proteins were validated by protein domain searches with SMART (http://smart.embl-heidelberg.de/, accessed on 19 June 2021) and the NCBI Conserved Domain Search (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, accessed on 19 June 2021) to confirm the presence of the MADS domain in all candidate proteins.
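The two-threshold search above can be sketched as a filter over HMMER's per-sequence table output (`hmmsearch --tblout`). This is a minimal illustration, not the authors' pipeline. It assumes the standard tabular layout (whitespace-separated fields, comment lines starting with `#`, target name in column 1 and full-sequence E-value in column 5). The sample rows, protein IDs and query name are hypothetical and truncated to the first seven columns. For brevity both thresholds are applied to one table here, whereas in the text the 0.05 cut-off is applied to the search with the rebuilt model.

```python
"""Filter hmmsearch --tblout output at two E-value thresholds (illustrative sketch)."""


def parse_tblout(text):
    """Yield (target_name, full_sequence_evalue) pairs from --tblout text.

    Lines starting with '#' are comments. Fields are whitespace-separated:
    column 1 is the target name and column 5 the full-sequence E-value.
    """
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        yield fields[0], float(fields[4])


def select_hits(text, max_evalue):
    """Return the set of target names whose full-sequence E-value is below max_evalue."""
    return {name for name, evalue in parse_tblout(text) if evalue < max_evalue}


# Hypothetical, truncated --tblout rows (only the first seven columns shown).
SAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias
protA  -  MADS_query  -  3.2e-30  105.1  0.2
protB  -  MADS_query  -  1.1e-12  45.3  0.1
protC  -  MADS_query  -  0.2  9.8  0.0
"""

# Strict threshold: high-confidence seeds for building a new HMM.
seeds = select_hits(SAMPLE, 1e-20)
# Permissive threshold: candidates passed on to SMART/CDD domain checks.
candidates = select_hits(SAMPLE, 0.05)
print(sorted(seeds), sorted(candidates))  # ['protA'] ['protA', 'protB']
```

Returning sets means hits merged from several searches are automatically free of duplicates.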

2.2. Phylogenetic Analysis and Classification of MADS-Box Genes

To understand the phylogenetic relationships and to classify the MADS-box genes, a rooted neighbor-joining (NJ) phylogenetic tree of Theobroma cacao (TcMADS) and Arabidopsis MADS-box proteins was constructed using MEGA X (version 10.2.2) [32]. TcMADS genes were classified according to their phylogenetic relationships with the corresponding Arabidopsis MADS-box members. Arabidopsis MADS-box protein sequences were downloaded from TAIR (https://www.arabidopsis.org/, accessed on 24 July 2021) using the accession numbers reported by Parenicová et al. [18]. All protein sequences were aligned with MUSCLE using default parameters [33]. The NJ tree was built with the Poisson model, pairwise deletion of gaps and 1000 bootstrap replicates. An individual phylogenetic tree of the TcMADS genes was built with the same method and visualized with ggtree [34].
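MEGA X computes these distances internally. To make the "Poisson model, pairwise deletion" setting concrete, the sketch below computes the Poisson-corrected distance d = −ln(1 − p) between two aligned protein sequences, where p is the proportion of differing residues among the columns in which neither sequence has a gap. It covers only the distance step, not neighbor joining or bootstrapping, and the toy sequences are made up.

```python
import math


def poisson_distance(seq1, seq2, gap="-"):
    """Poisson-corrected amino-acid distance with pairwise deletion.

    p is the proportion of differing residues among the alignment columns
    where neither sequence has a gap; d = -ln(1 - p).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from the same alignment")
    compared = differences = 0
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            continue  # pairwise deletion: drop this column for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    p = differences / compared
    if p >= 1.0:
        return math.inf  # correction is undefined when every site differs
    return -math.log(1.0 - p)


def distance_matrix(alignment):
    """Pairwise Poisson distances for a dict of name -> aligned sequence."""
    names = sorted(alignment)
    return {
        (x, y): poisson_distance(alignment[x], alignment[y])
        for x in names
        for y in names
    }


# Toy alignment (made-up sequences).
toy = {"seq1": "MKRIENK", "seq2": "MKR-ENR", "seq3": "MKRIENK"}
matrix = distance_matrix(toy)
print(round(matrix[("seq1", "seq2")], 4))  # 0.1823
```

For seq1 versus seq2, the gapped column is skipped, leaving six compared sites with one difference, so p = 1/6 and d = −ln(5/6).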

2.3. Conserved Motif and Gene Structure Analysis

The online MEME program (https://meme-suite.org/meme/tools/meme, accessed on 29 July 2021) was used to identify conserved motifs in the MADS-box proteins with the following settings: maximum number of motifs, 10; minimum motif width, 6; maximum motif width, 50; site distribution, any number of repetitions [35]. Intron–exon structures were taken from the Theobroma cacao GTF annotation file downloaded from Ensembl Plants. Conserved motifs and gene structures were both visualized with TBtools (version 0.665).
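Intron counts can be derived from the GTF file by counting exon records per transcript: a transcript with n exons has n − 1 introns. The sketch below assumes standard nine-column, tab-separated GTF lines carrying a `transcript_id "..."` attribute. It is not how TBtools works internally, and the two transcripts in the example are hypothetical.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def intron_counts(gtf_lines):
    """Return {transcript_id: intron count}, computed as exon records minus one."""
    exons = defaultdict(int)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GTF has nine tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            exons[match.group(1)] += 1
    return {tid: n - 1 for tid, n in exons.items()}


# Hypothetical records: T1 has three exons (plus a CDS line that is ignored), T2 one exon.
example_gtf = [
    '1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\tsrc\tCDS\t300\t400\t.\t+\t0\tgene_id "G1"; transcript_id "T1";',
    '1\tsrc\texon\t500\t600\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '2\tsrc\texon\t50\t900\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]
print(intron_counts(example_gtf))  # {'T1': 2, 'T2': 0}
```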
The online tools ProtParam (https://web.expasy.org/protparam/, accessed on 10 August 2021) and Compute pI/Mw (https://web.expasy.org/compute_pi/, accessed on 10 August 2021) were used to analyze physicochemical properties, including the theoretical isoelectric point (pI), average molecular weight (MW), instability index and aliphatic index. The number of amino acids (aa) and open reading frame (ORF) lengths were obtained with ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/, accessed on 11 August 2021). BUSCA (https://busca.biocomp.unibo.it/, accessed on 4 August 2021) was used to predict the subcellular localization (SL) of TcMADS proteins.
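Two of these quantities are simple enough to sketch. The aliphatic index reported by ProtParam follows Ikai (1980): AI = X(Ala) + 2.9 × X(Val) + 3.9 × (X(Ile) + X(Leu)), where X is the mole percent of each residue. For a complete ORF, the length in base pairs is 3 × (number of amino acids + 1) when the stop codon is included. This matches the 237 bp and 1803 bp reported in the Results for the 78-aa TcMADS23 and 600-aa TcMADS7 proteins. The code is an illustrative sketch, not the ExPASy implementation, and the test peptide is made up.

```python
def aliphatic_index(protein):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def orf_length_bp(n_residues, include_stop=True):
    """Length in bp of a complete ORF encoding n_residues amino acids."""
    return 3 * (n_residues + (1 if include_stop else 0))


print(orf_length_bp(78), orf_length_bp(600))  # 237 1803
print(round(aliphatic_index("MAVIL"), 2))
```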

2.4. Chromosomal Localization and Gene Duplication

Chromosomal locations and chromosome lengths for the TcMADS genes were obtained from Ensembl Plants. All identified genes were mapped to the 10 chromosomes with MG2C (http://mg2c.iask.in/mg2c_v2.1/, accessed on 25 June 2021) according to their chromosomal positions and relative distances. Potential duplication of TcMADS genes was assessed using two criteria: (a) the alignment covers > 75% of the longer sequence, and (b) the similarity within the aligned region is > 75% [36]. Bio-Linux was used to screen for tandem repeat sequences. TcMADS protein sequences were aligned with MAFFT (version 7.481), the multiple protein alignments were checked, and the corresponding DNA sequences were converted into codon alignments [37]. These codon alignments were used to calculate Ka/Ks ratios with KaKs_Calculator Toolbox (version 2.0).
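The two duplication criteria can be written as a simple predicate. In this sketch the alignment statistics (aligned length and number of similar positions) are assumed to come from an external pairwise aligner. How "similar positions" are counted depends on that aligner and is an assumption here, and the numbers in the example are hypothetical.

```python
def is_potential_duplicate(aligned_length, similar_positions, len_a, len_b,
                           min_coverage=0.75, min_similarity=0.75):
    """Return True if a protein pair meets both duplication criteria.

    (a) the alignment covers more than min_coverage of the longer sequence;
    (b) similarity within the aligned region exceeds min_similarity.
    """
    if aligned_length <= 0:
        return False
    coverage = aligned_length / max(len_a, len_b)
    similarity = similar_positions / aligned_length
    return coverage > min_coverage and similarity > min_similarity


# Hypothetical pairs: (aligned length, similar positions, length A, length B)
print(is_potential_duplicate(200, 170, 220, 210))  # True: coverage 0.91, similarity 0.85
print(is_potential_duplicate(150, 140, 220, 150))  # False: coverage 0.68 is below 0.75
```

Both thresholds are strict inequalities, matching the "> 75%" wording of the criteria.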

2.5. Analysis of Cis-Acting Element in MADS-Box Genes’ Promoters

The 2 kb sequences upstream of the TcMADS coding sequences (CDS) were retrieved from the Theobroma cacao genome with TBtools according to gene ID and submitted to PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/, accessed on 5 August 2021). After filtering and screening, four classes of cis-acting elements were retained: light-responsive, wound-responsive, gibberellin-responsive and auxin-responsive elements. The types and numbers of cis-acting elements upstream of each gene were summarized with TBtools.
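Retrieving an upstream region has to respect strand. For a gene on the minus strand, "upstream" lies beyond the CDS end coordinate, and the sequence must be reverse-complemented. The sketch below illustrates this for 1-based inclusive coordinates (the GTF convention), truncating at chromosome ends. It is not TBtools' implementation, and the toy chromosome is made up.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq, cds_start, cds_end, strand, length=2000):
    """Return up to `length` bp immediately upstream of a CDS, oriented 5'->3'.

    cds_start and cds_end are 1-based inclusive coordinates on the forward strand.
    """
    if strand == "+":
        begin = max(0, cds_start - 1 - length)
        return chrom_seq[begin:cds_start - 1]
    if strand == "-":
        end = min(len(chrom_seq), cds_end + length)
        return reverse_complement(chrom_seq[cds_end:end])
    raise ValueError("strand must be '+' or '-'")


TOY_CHROM = "AAAACCCCGGGGTTTT"
print(upstream_region(TOY_CHROM, 9, 12, "+", length=4))  # CCCC
print(upstream_region(TOY_CHROM, 5, 8, "-", length=4))   # CCCC (reverse complement of GGGG)
```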

3. Results

3.1. Identification of MADS-Box Genes in Theobroma cacao

To identify MADS-box genes in Theobroma cacao, two HMM analyses were performed. The first HMMER search, using the MADS domain profile as a query against the cacao protein database, yielded 68 putative MADS proteins after duplicates were removed. In the second HMM analysis, proteins with an e-value < 0.05 were selected as candidates and the longest transcript of each gene was retained; after the MADS domain was confirmed with SMART and the NCBI Conserved Domain Search Service, 69 MADS-box genes were obtained (Supplementary File S1). These 69 genes were renamed TcMADS1 to TcMADS69 according to their chromosomal locations and subjected to further analyses. Detailed characteristics of the TcMADS genes, including number of amino acids (aa), average molecular weight (MW), theoretical pI, instability index and aliphatic index, are listed in Table 1. Protein length ranged from 78 (TcMADS23) to 600 (TcMADS7) amino acids, with an average length of amino acids, and molecular weights ranged from 8995.45 Da (TcMADS23) to 66752.75 Da (TcMADS7). Thirteen MADS-box proteins were acidic (pI < 6.5), 52 were alkaline (pI > 7.5) and four were neutral (pI between 6.5 and 7.5). The instability index analysis indicated that most TcMADS proteins were unstable (instability index > 40), except for TcMADS1, TcMADS9, TcMADS12, TcMADS37, TcMADS47, TcMADS55, TcMADS57 and TcMADS67. Subcellular localization was predicted with BUSCA: most TcMADS proteins were predicted to localize to the nucleus (63.77%) or the chloroplast (34.78%), and only TcMADS11 was predicted to localize to the endomembrane system.
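Two of the steps above can be sketched: keeping the longest transcript per gene, and binning proteins by pI and instability index. The transcript and gene identifiers below are hypothetical. Ties in isoform length are resolved by keeping the first isoform encountered, an arbitrary choice that the text does not specify. The pI and instability thresholds are the ones stated above.

```python
def longest_per_gene(proteins, transcript_to_gene):
    """Map each gene to the transcript ID of its longest protein isoform.

    proteins: dict transcript_id -> amino-acid sequence
    transcript_to_gene: dict transcript_id -> gene_id
    """
    best = {}
    for tid, seq in proteins.items():
        gene = transcript_to_gene[tid]
        # Strict '>' keeps the first-seen isoform when lengths tie.
        if gene not in best or len(seq) > len(proteins[best[gene]]):
            best[gene] = tid
    return best


def pi_class(pi):
    """Acidic below pI 6.5, alkaline above 7.5, neutral in between."""
    if pi < 6.5:
        return "acidic"
    if pi > 7.5:
        return "alkaline"
    return "neutral"


def is_unstable(instability_index):
    """Proteins with an instability index above 40 are predicted to be unstable."""
    return instability_index > 40


proteins = {"geneA.t1": "MKR", "geneA.t2": "MKRIEN", "geneB.t1": "MAG"}
tx2gene = {"geneA.t1": "geneA", "geneA.t2": "geneA", "geneB.t1": "geneB"}
print(longest_per_gene(proteins, tx2gene))  # {'geneA': 'geneA.t2', 'geneB': 'geneB.t1'}
print(pi_class(5.9), pi_class(7.0), pi_class(9.3))  # acidic neutral alkaline
```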

3.2. Phylogenetic Analysis and Classification of the MADS-Box Genes

To understand the phylogenetic relationships among MADS-box genes in the cacao tree and to assign them to the established subfamilies, we used MEGA X to construct a rooted neighbor-joining tree based on an amino acid sequence alignment of 69 proteins from Theobroma cacao and 96 from Arabidopsis (Figure 1) [18]. This tree also allows the possible functions of these genes to be inferred from functional studies of their Arabidopsis homologs. Following the general MADS-box classification in Arabidopsis, the TcMADS genes were grouped into two types, type I and type II. Based on the phylogenetic relationships, the type I genes were further subdivided into four subfamilies, Mα (11), Mβ (2), Mγ (14) and Mδ (9), and 32 members were classified as type II (MIKC). The Mβ group has the fewest members (2), whereas the corresponding Arabidopsis group contains 16, which suggests that Mβ genes may have been lost during evolution. Notably, TcMADS39 could not be assigned to any of these subfamilies; we therefore placed it in an unclassified group (UN).

3.3. Conserved Motif and Structure Analysis

To gain insight into the structural diversity and similarity of MADS-box genes in the cacao tree, we analyzed their intron–exon arrangements and conserved motifs in the context of their phylogenetic relationships. We first constructed an individual phylogenetic tree of the TcMADS proteins with the same NJ method described above (Figure 2A) and then mapped their intron–exon structures onto it (Figure 2B). A striking distribution of introns was previously reported for Arabidopsis MADS-box genes, and a similar pattern was seen here: TcMADS genes of the MIKC subfamily contained multiple introns, as did those of the Mδ group, whereas the remaining three subfamilies (Mα, Mβ and Mγ) usually had no introns, or only one or two. The lower intron content of the Mα, Mβ and Mγ groups might reflect a different tendency to lose or gain introns, or a reverse-transcribed origin of the ancestors of these three subfamilies [18]. In our study, the number of introns in TcMADS genes ranged from one (TcMADS4, TcMADS14, TcMADS15, TcMADS47, TcMADS49, TcMADS63) to eighteen (TcMADS7). Closely related genes had similar gene structures, differing mainly in exon and intron lengths. The shortest TcMADS coding sequence was only 237 bp (TcMADS23), and the longest was that of TcMADS7 (1803 bp).
To further characterize the conserved motifs shared among the MADS-box subfamilies of Theobroma cacao, we used MEME (Multiple Em for Motif Elicitation). A total of 10 conserved motifs were predicted and named Motif 1 to Motif 10 (Figure 2C). Motif 1 was present in all genes; notably, TcMADS53 contained two copies of Motif 1. Motif 2 was also present in almost all TcMADS genes. Motifs 3, 5, 8, 9 and 10 were observed only in the Mγ subfamily, suggesting that they may be specific to the Mγ group. In general, TcMADS genes of the same subfamily had similar motif compositions, and we speculate that they may have similar biological functions.
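Whether a motif is confined to one subfamily, as reported here for Motifs 3, 5, 8, 9 and 10 in Mγ, can be checked mechanically from a gene-by-motif presence table. The sketch assumes the MEME results have already been parsed into a set of motif names per gene. The gene names, subfamily labels (ASCII "Mgamma" for Mγ) and motif assignments below are hypothetical. "Confined" here means every gene carrying the motif belongs to one subfamily, not that every member of that subfamily carries it.

```python
from collections import defaultdict


def subfamily_specific_motifs(motifs_by_gene, subfamily_of):
    """Return {motif: subfamily} for motifs found in only one subfamily.

    motifs_by_gene: dict gene -> set of motif names detected in that gene
    subfamily_of:   dict gene -> subfamily label
    """
    carriers = defaultdict(set)  # motif -> subfamilies in which it occurs
    for gene, motifs in motifs_by_gene.items():
        for motif in motifs:
            carriers[motif].add(subfamily_of[gene])
    return {motif: next(iter(groups))
            for motif, groups in carriers.items() if len(groups) == 1}


motifs = {
    "g1": {"Motif 1", "Motif 2", "Motif 3"},
    "g2": {"Motif 1", "Motif 3"},
    "g3": {"Motif 1", "Motif 2"},
}
groups = {"g1": "Mgamma", "g2": "Mgamma", "g3": "MIKC"}
print(subfamily_specific_motifs(motifs, groups))  # {'Motif 3': 'Mgamma'}
```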

3.4. Genome Distribution and Gene Evolution Analysis of TcMADS Genes

According to the location information in the genome annotation file downloaded from Ensembl Plants, the 69 TcMADS genes were distributed across all 10 chromosomes (Figure 3A) and were renamed according to their chromosomal positions. Chromosomes (Chr) I and II carried the largest share of MADS-box genes (18.84% each), whereas ChrVII, ChrIX and ChrX each carried only two MADS-box genes (2.90%). ChrIII and ChrVIII each contained nine TcMADS genes, and the remaining MADS-box genes were located on ChrIV (5), ChrV (10) and ChrVI (4). As shown in Figure 3B, the Mγ subfamily showed a chromosomal bias, being mainly confined to ChrV.
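The percentages above follow directly from per-chromosome counts out of 69 genes: 18.84% corresponds to 13 genes and 2.90% to two. The sketch below recomputes the shares from counts reconstructed from this paragraph and checks that they sum to 69. The counts for ChrI and ChrII are inferred from the percentage. The third chromosome carrying two genes, listed alongside ChrVII and ChrX, is taken to be ChrIX, because the cacao genome has 10 chromosomes.

```python
def chromosome_shares(counts):
    """Percentage of genes on each chromosome, rounded to two decimals."""
    total = sum(counts.values())
    return {chrom: round(100.0 * n / total, 2) for chrom, n in counts.items()}


# Counts reconstructed from the text (ChrI and ChrII inferred from 18.84% of 69).
counts = {
    "ChrI": 13, "ChrII": 13, "ChrIII": 9, "ChrIV": 5, "ChrV": 10,
    "ChrVI": 4, "ChrVII": 2, "ChrVIII": 9, "ChrIX": 2, "ChrX": 2,
}
shares = chromosome_shares(counts)
print(sum(counts.values()))              # 69
print(shares["ChrI"], shares["ChrVII"])  # 18.84 2.9
```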
Some MADS-box genes were densely clustered on particular chromosomes, so we screened the 69 TcMADS genes for tandem duplicates. Three genes on ChrV (TcMADS41, TcMADS42 and TcMADS43) were duplicates of one another, two further pairs were found on ChrV (TcMADS45 & TcMADS46 and TcMADS49 & TcMADS50), and one pair was found on ChrII (TcMADS68 & TcMADS69), giving six duplicated gene pairs in total. The ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates (Ka/Ks) was calculated for these six pairs. As shown in Table 2, the Ka/Ks values of TcMADS43 & TcMADS42 and TcMADS43 & TcMADS41 were > 1, suggesting that these genes were under positive selection during evolution and that any new protein functions may have benefited the survival and reproduction of the cacao tree. The remaining four gene pairs had Ka/Ks < 1, indicating that they evolved under purifying selection.
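The pair count and the interpretation rule used here can be made explicit. The sketch expands each tandem cluster into all of its pairwise combinations, so the three-gene cluster on ChrV yields three pairs and six pairs result in total. It then labels a pair by its Ka/Ks ratio. The Ka and Ks values in the example are hypothetical, not those in Table 2, and a ratio above 1 is only suggestive of positive selection.

```python
from itertools import combinations


def pairs_from_clusters(clusters):
    """Expand each tandem cluster into all of its pairwise gene combinations."""
    return [pair for cluster in clusters for pair in combinations(cluster, 2)]


def selection_mode(ka, ks):
    """Classify a gene pair by Ka/Ks: >1 positive, <1 purifying, =1 neutral."""
    if ks == 0:
        raise ValueError("Ka/Ks is undefined when Ks is 0")
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"


clusters = [
    ("TcMADS41", "TcMADS42", "TcMADS43"),
    ("TcMADS45", "TcMADS46"),
    ("TcMADS49", "TcMADS50"),
    ("TcMADS68", "TcMADS69"),
]
pairs = pairs_from_clusters(clusters)
print(len(pairs))  # 6
print(selection_mode(0.30, 0.20), selection_mode(0.05, 0.50))  # positive purifying
```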

3.5. Analysis of Putative Promoter Regions in TcMADS Genes

Cis-regulatory elements act as molecular switches by binding transcription factors, thereby regulating transcription initiation and activity. To explore the putative functions of TcMADS genes, we extracted and examined the 2 kb sequences upstream of the transcription start site. PlantCARE analysis identified four types of cis-acting elements in the promoter regions: light-responsive, wound-responsive, gibberellin-responsive and auxin-responsive elements, suggesting that TcMADS genes may be involved in responses to abiotic stress and hormones. The distribution of these cis-acting elements in the promoters is shown in Supplementary Figure S1. Light-responsive elements were present in almost all MADS-box promoter regions and were especially abundant in TcMADS28, TcMADS46, TcMADS62 and TcMADS65.
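Tallying element classes per promoter amounts to mapping each element name to a response class and counting. In the sketch below, the mapping lists a few PlantCARE element names commonly annotated with these functions (for example G-box for light and TGA-element for auxin). This mapping is an illustrative assumption that should be checked against the actual PlantCARE annotations, and the gene names and hits are hypothetical. Elements outside the four retained classes are ignored.

```python
from collections import Counter, defaultdict

# Illustrative element-name -> response-class mapping (verify against PlantCARE output).
RESPONSE_CLASS = {
    "G-box": "light",
    "Box 4": "light",
    "WUN-motif": "wound",
    "GARE-motif": "gibberellin",
    "P-box": "gibberellin",
    "TGA-element": "auxin",
    "AuxRR-core": "auxin",
}


def count_element_classes(hits):
    """hits: iterable of (gene_id, element_name); returns gene -> Counter of classes."""
    table = defaultdict(Counter)
    for gene, element in hits:
        response = RESPONSE_CLASS.get(element)
        if response is not None:  # drop elements outside the four retained classes
            table[gene][response] += 1
    return table


hits = [
    ("geneA", "G-box"), ("geneA", "Box 4"), ("geneA", "TGA-element"),
    ("geneB", "WUN-motif"), ("geneB", "ABRE"),
]
table = count_element_classes(hits)
print(dict(table["geneA"]), dict(table["geneB"]))  # {'light': 2, 'auxin': 1} {'wound': 1}
```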

4. Discussion

MADS-box proteins are major transcription factors involved in almost every biological process, and many of them have been systematically identified and analyzed in a variety of species. Although many MADS-box genes have conserved functions in flower development and fruit ripening [38,39,40,41], some have acquired novel functions in specific species during evolution [42]. To date, no detailed analysis of MADS-box genes has been performed in Theobroma cacao. A better understanding of the members and structural characteristics of this family can provide new ideas for further functional analysis. Previous studies show that family size varies among species, with 107 members in Arabidopsis [18], 105 in Populus trichocarpa [19], 48 in pineapple [20], 182 in Saccharum spontaneum [21], 44 in Erigeron breviscapus [22] and 64 in Salix suchowensis [43]. In this study, 69 MADS-box proteins were identified in the cacao tree (Table 1), fewer than in Arabidopsis. One possible explanation is that the cacao MADS-box family has experienced a higher rate of gene loss than its Arabidopsis counterpart, consistent with the important role of gene duplication and loss in shaping this family across species [44]. The 69 TcMADS genes were renamed TcMADS1 to TcMADS69 based on their chromosomal locations and classified into two types according to their phylogenetic relationships with Arabidopsis: type I, comprising subclasses Mα (11 genes), Mβ (2 genes), Mγ (14 genes) and Mδ (9 genes), and type II, MIKC (32 genes). The remaining TcMADS gene was placed in group UN. Most TcMADS genes belong to the MIKC subfamily, and Theobroma cacao has a comparable number of Mδ and Mγ genes but fewer Mα, Mβ and MIKC genes than Arabidopsis, suggesting that Arabidopsis may have undergone more gene duplication events than Theobroma cacao.
The structures of the two types of MADS-box genes were clearly different, and MIKC genes were more conserved than those of the other groups. Type II genes usually have multiple introns, whereas most Mα, Mβ and Mγ members have few or no introns, indicating that these genes may have experienced more intron loss during gene family diversification. Previous studies proposed that intron number correlates with expression level: the fewer the introns, the higher the expression [45,46]. The same pattern of intron–exon structure in type I and type II genes occurs in diverse species, including watermelon [47], Brachypodium distachyon [48], rice [49] and lettuce [50]. Overall, genes within the same group are structurally similar to one another but differ from genes in other groups; we therefore speculate that TcMADS genes may have undergone complex structural evolution.
Phylogenomic analyses show that gene and genome duplication events have contributed to the diversification of MADS-box transcription factors and play significant roles in shaping the regulatory networks that underlie key phenotypic traits [51]. In this study, six tandem-duplicated gene pairs were identified, all of which belong to the Mδ subfamily. As ubiquitous genetic components, promoters drive transcription and precisely control the timing and location of gene expression in response to developmental and environmental signals [52]. The cis-acting elements located upstream of transcription start sites play vital roles in regulating gene expression during growth and development [53]. The promoter analysis suggested that TcMADS genes may be involved in diverse stress and hormone responses, which provides a starting point for studying the functions of individual genes.

5. Conclusions

In this study, we conducted a systematic analysis of the MADS-box gene family in Theobroma cacao. Using HMM profiles and the Theobroma cacao genome data, we identified 69 MADS-box genes. These genes were distributed across all 10 chromosomes and were phylogenetically classified into five subfamilies plus one unclassified gene (UN); genes within the same subfamily showed high similarity in gene structure and conserved motifs. Furthermore, cis-acting element analysis suggests that TcMADS genes may be involved in diverse stress responses. In summary, these results provide further information on MADS-box genes and establish a foundation for future study of this family in Theobroma cacao.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/genes12111799/s1, File S1: MADS-box gene sequences identified in Theobroma cacao in this study; Table S1: Sequence logos of ten conserved motifs in Theobroma cacao; Figure S1: The predicted cis-regulatory elements in promoters of TcMADS genes.

Author Contributions

Conceptualization, Q.Z.; methodology, Q.Z.; software, Q.Z. and S.H.; formal analysis, S.H. and Z.S.; investigation, Q.Z.; resources, Q.Z. and J.C.; writing—original draft preparation, Q.Z.; writing—review and editing, Y.G. and D.L.; visualization, Q.Z. and J.M.; supervision, Q.Z.; project administration, R.W. and Y.G.; funding acquisition, Y.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (31370669).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be available on reasonable request.

Acknowledgments

We thank Ang Dong (Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University) and Shiya Shen (Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University) for their kind technical support. This work was supported by a grant from the National Natural Science Foundation of China (31370669).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. De Bodt, S.; Raes, J.; Van de Peer, Y.; Theissen, G. And then there were many: MADS goes genomic. Trends Plant Sci. 2003, 8, 475–483.
  2. Passmore, S.; Elble, R.; Tye, B.K. A protein involved in minichromosome maintenance in yeast binds a transcriptional enhancer conserved in eukaryotes. Genes Dev. 1989, 3, 921–935.
  3. Yanofsky, M.F.; Ma, H.; Bowman, J.L.; Drews, G.N.; Feldmann, K.A.; Meyerowitz, E.M. The protein encoded by the Arabidopsis homeotic gene agamous resembles transcription factors. Nature 1990, 346, 35–39.
  4. Sommer, H.; Beltrán, J.P.; Huijser, P.; Pape, H.; Lönnig, W.E.; Saedler, H.; Schwarz-Sommer, Z. Deficiens, a homeotic gene involved in the control of flower morphogenesis in Antirrhinum majus: The protein shows homology to transcription factors. EMBO J. 1990, 9, 605–613.
  5. Norman, C.; Runswick, M.; Pollock, R.; Treisman, R. Isolation and properties of cDNA clones encoding SRF, a transcription factor that binds to the c-fos serum response element. Cell 1988, 55, 989–1003.
  6. Kaufmann, K.; Melzer, R.; Theissen, G. MIKC-type MADS-domain proteins: Structural modularity, protein interactions and network evolution in land plants. Gene 2005, 347, 183–198.
  7. Henschel, K.; Kofuji, R.; Hasebe, M.; Saedler, H.; Münster, T.; Theissen, G. Two ancient classes of MIKC-type MADS-box genes are present in the moss Physcomitrella patens. Mol. Biol. Evol. 2002, 19, 801–814.
  8. Li, D.; Liu, C.; Shen, L.; Wu, Y.; Chen, H.; Robertson, M.; Helliwell, C.A.; Ito, T.; Meyerowitz, E.; Yu, H. A repressor complex governs the integration of flowering signals in Arabidopsis. Dev. Cell 2008, 15, 110–120.
  9. Deng, W.; Ying, H.; Helliwell, C.A.; Taylor, J.M.; Peacock, W.J.; Dennis, E.S. FLOWERING LOCUS C (FLC) regulates development pathways throughout the life cycle of Arabidopsis. Proc. Natl. Acad. Sci. USA 2011, 108, 6680–6685.
  10. Ferrándiz, C.; Liljegren, S.J.; Yanofsky, M.F. Negative regulation of the SHATTERPROOF genes by FRUITFULL during Arabidopsis fruit development. Science 2000, 289, 436–438.
  11. Ferrándiz, C.; Gu, Q.; Martienssen, R.; Yanofsky, M.F. Redundant regulation of meristem identity and plant architecture by FRUITFULL, APETALA1 and CAULIFLOWER. Development 2000, 127, 725–734.
  12. Köhler, C.; Hennig, L.; Spillane, C.; Pien, S.; Gruissem, W.; Grossniklaus, U. The Polycomb-group protein MEDEA regulates seed development by controlling expression of the MADS-box gene PHERES1. Genes Dev. 2003, 17, 1540–1553.
  13. Li, C.; Lin, H.; Chen, A.; Lau, M.; Jernstedt, J.; Dubcovsky, J. Wheat VRN1, FUL2 and FUL3 play critical and redundant roles in spikelet development and spike determinacy. Development 2019, 146, dev175398.
  14. Michaels, S.D.; Amasino, R.M. FLOWERING LOCUS C encodes a novel MADS domain protein that acts as a repressor of flowering. Plant Cell 1999, 11, 949–956.
  15. Moser, M.; Asquini, E.; Miolli, G.V.; Weigl, K.; Hanke, M.V.; Flachowsky, H.; Si-Ammour, A. The MADS-box gene MdDAM1 controls growth cessation and bud dormancy in Apple. Front. Plant Sci. 2020, 11, 1003.
  16. Gan, Y.; Filleur, S.; Rahman, A.; Gotensparre, S.; Forde, B.G. Nutritional regulation of ANR1 and other root-expressed MADS-box genes in Arabidopsis thaliana. Planta 2005, 222, 730–742.
  17. Kutter, C.; Schöb, H.; Stadler, M.; Meins, F.J.; Si-Ammour, A. MicroRNA-mediated regulation of stomatal development in Arabidopsis. Plant Cell 2007, 19, 2417–2429.
  18. Parenicová, L.; de Folter, S.; Kieffer, M.; Horner, D.S.; Favalli, C.; Busscher, J.; Cook, H.E.; Ingram, R.M.; Kater, M.M.; Davies, B.; et al. Molecular and phylogenetic analyses of the complete MADS-box transcription factor family in Arabidopsis: New openings to the MADS world. Plant Cell 2003, 15, 1538–1551.
  19. Leseberg, C.H.; Li, A.; Kang, H.; Duvall, M.; Mao, L. Genome-wide analysis of the MADS-box gene family in Populus trichocarpa. Gene 2006, 378, 84–94.
  20. Zhang, X.; Fatima, M.; Zhou, P.; Ma, Q.; Ming, R. Analysis of MADS-box genes revealed modified flowering gene network and diurnal expression in pineapple. BMC Genom. 2020, 21, 8.
  21. Fatima, M.; Zhang, X.; Lin, J.; Zhou, P.; Zhou, D.; Ming, R. Expression profiling of MADS-box gene family revealed its role in vegetative development and stem ripening in S. spontaneum. Sci. Rep. 2020, 10, 20536.
  22. Tang, W.; Tu, Y.; Cheng, X.; Zhang, L.; Meng, H.; Zhao, X.; Zhang, W.; He, B. Genome-wide identification and expression profile of the MADS-box gene family in Erigeron breviscapus. PLoS ONE 2019, 14, e0226599.
  23. Mustiga, G.M.; Gezan, S.A.; Phillips-Mora, W.; Arciniegas-Leal, A.; Mata-Quirós, A.; Motamayor, J.C. Phenotypic description of Theobroma cacao L. for yield and vigor traits from 34 hybrid families in Costa Rica based on the genetic basis of the parental population. Front. Plant Sci. 2018, 9, 808.
  24. Corti, R.; Flammer, A.J.; Hollenberg, N.K.; Lüscher, T.F. Cocoa and cardiovascular health. Circulation 2009, 119, 1433–1441.
  25. Argout, X.; Salse, J.; Aury, J.M.; Guiltinan, M.J.; Droc, G.; Gouzy, J.; Allegre, M.; Chaparro, C.; Legavre, T.; Maximova, S.N.; et al. The genome of Theobroma cacao. Nat. Genet. 2011, 43, 101–108.
  26. Shen, S.; Zhang, Q.; Shi, Y.; Sun, Z.; Zhang, Q.; Hou, S.; Wu, R.; Jiang, L.; Zhao, X.; Guo, Y. Genome-wide analysis of the NAC Domain transcription factor gene family in Theobroma cacao. Genes 2019, 11, 35.
  27. Silva Monteiro de Almeida, D.; Oliveira Jordão do Amaral, D.; Del-Bem, L.E.; Bronze Dos Santos, E.; Santana Silva, R.J.; Peres Gramacho, K.; Vincentz, M.; Micheli, F. Genome-wide identification and characterization of cacao WRKY transcription factors and analysis of their expression in response to witches’ broom disease. PLoS ONE 2017, 12, e0187346.
  28. Martins Alves, A.M.; Pereira Menezes Reis, S.; Peres Gramacho, K.; Micheli, F. The glutathione peroxidase family of Theobroma cacao: Involvement in the oxidative stress during witches’ broom disease. Int. J. Biol. Macromol. 2020, 164, 3698–3708.
  29. Li, F.; Wu, B.; Yan, L.; Qin, X.; Lai, J. Metabolome and transcriptome profiling of Theobroma cacao provides insights into the molecular basis of pod color variation. J. Plant Res. 2021, 134, 1323–1334.
  30. El-Gebali, S.; Mistry, J.; Bateman, A.; Eddy, S.R.; Luciani, A.; Potter, S.C.; Qureshi, M.; Richardson, L.J.; Salazar, G.A.; Smart, A.; et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019, 47, D427–D432.
  31. Larkin, M.A.; Blackshields, G.; Brown, N.P.; Chenna, R.; McGettigan, P.A.; McWilliam, H.; Valentin, F.; Wallace, I.M.; Wilm, A.; Lopez, R.; et al. Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23, 2947–2948.
  32. Kumar, S.; Stecher, G.; Li, M.; Knyaz, C.; Tamura, K. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 2018, 35, 1547–1549. [Google Scholar] [CrossRef]
  33. Edgar, R.C. MUSCLE: A multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 2004, 5, 113. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  34. Yu, G. Using ggtree to visualize data on tree-like structures. Curr. Protoc. Bioinform. 2020, 69, e96. [Google Scholar] [CrossRef] [PubMed]
  35. Bailey, T.L.; Boden, M.; Buske, F.A.; Frith, M.; Grant, C.E.; Clementi, L.; Ren, J.; Li, W.W.; Noble, W.S. MEME SUITE: Tools for motif discovery and searching. Nucleic Acids Res. 2009, 37, W202–W208. [Google Scholar] [CrossRef] [PubMed]
  36. Yang, S.; Zhang, X.; Yue, J.X.; Tian, D.; Chen, J.Q. Recent duplications dominate NBS-encoding gene expansion in two woody species. Mol. Genet. Genom. 2008, 280, 187–198. [Google Scholar] [CrossRef]
  37. Suyama, M.; Torrents, D.; Bork, P. PAL2NAL: Robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006, 34, W609–W612. [Google Scholar] [CrossRef] [Green Version]
  38. Qi, X.; Liu, C.; Song, L.; Li, M. PaMADS7, a MADS-box transcription factor, regulates sweet cherry fruit ripening and softening. Plant Sci. 2020, 301, 110634. [Google Scholar] [CrossRef]
  39. Liu, J.H.; Xu, B.Y.; Zhang, J.; Jin, Z.Q. The interaction of MADS-box transcription factors and manipulating fruit development and ripening. Yi Chuan 2010, 32, 893–902. [Google Scholar] [PubMed]
  40. Vrebalov, J.; Ruezinsky, D.; Padmanabhan, V.; White, R.; Medrano, D.; Drake, R.; Schuch, W.; Giovannoni, J. A MADS-box gene necessary for fruit ripening at the tomato ripening-inhibitor (rin) locus. Science 2002, 296, 343–346. [Google Scholar] [CrossRef]
  41. Fujisawa, M.; Nakano, T.; Shima, Y.; Ito, Y. A large-scale identification of direct targets of the tomato MADS box transcription factor RIPENING INHIBITOR reveals the regulation of fruit ripening. Plant Cell 2013, 25, 371–386. [Google Scholar] [CrossRef] [Green Version]
  42. Smaczniak, C.; Immink, R.G.; Angenent, G.C.; Kaufmann, K. Developmental and evolutionary diversity of plant MADS-domain factors: Insights from recent studies. Development 2012, 139, 3081–3098. [Google Scholar] [CrossRef] [Green Version]
  43. Qu, Y.; Bi, C.; He, B.; Ye, N.; Yin, T.; Xu, L.A. Genome-wide identification and characterization of the MADS-box gene family in Salix suchowensis. PeerJ 2019, 7, e8019. [Google Scholar] [CrossRef] [Green Version]
  44. Airoldi, C.A.; Davies, B. Gene duplication and the evolution of plant MADS-box transcription factors. J. Genet. Genom. 2012, 39, 157–165. [Google Scholar] [CrossRef]
  45. Chung, B.Y.; Simons, C.; Firth, A.E.; Brown, C.M.; Hellens, R.P. Effect of 5’UTR introns on gene expression in Arabidopsis thaliana. BMC Genom. 2006, 7, 120. [Google Scholar] [CrossRef] [Green Version]
  46. Jeffares, D.C.; Penkett, C.J.; Bähler, J. Rapidly regulated genes are intron poor. Trends Genet. 2008, 24, 375–378. [Google Scholar] [CrossRef]
  47. Wang, P.; Wang, S.; Chen, Y.; Xu, X.; Guang, X.; Zhang, Y. Genome-wide analysis of the MADS-box gene family in Watermelon. Comput. Biol. Chem. 2019, 80, 341–350. [Google Scholar] [CrossRef] [PubMed]
  48. Wei, B.; Zhang, R.Z.; Guo, J.J.; Liu, D.M.; Li, A.L.; Fan, R.C.; Mao, L.; Zhang, X.Q. Genome-wide analysis of the MADS-box gene family in Brachypodium distachyon. PLoS ONE 2014, 9, e84781. [Google Scholar]
  49. Arora, R.; Agarwal, P.; Ray, S.; Singh, A.K.; Singh, V.P.; Tyagi, A.K.; Kapoor, S. MADS-box gene family in rice: Genome-wide identification, organization and expression profiling during reproductive development and stress. BMC Genom. 2007, 8, 242. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  50. Ning, K.; Han, Y.; Chen, Z.; Luo, C.; Wang, S.; Zhang, W.; Li, L.; Zhang, X.; Fan, S.; Wang, Q. Genome-wide analysis of MADS-box family genes during flower development in lettuce. Plant Cell Environ. 2019, 42, 1868–1881. [Google Scholar] [CrossRef]
  51. Shan, H.; Zahn, L.; Guindon, S.; Wall, P.K.; Kong, H.; Ma, H.; DePamphilis, C.W.; Leebens-Mack, J. Evolution of plant MADS box transcription factors: Evidence for shifts in selection associated with early angiosperm diversification and concerted gene duplications. Mol. Biol. Evol. 2009, 26, 2229–2244. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  52. Hernandez-Garcia, C.M.; Finer, J.J. Identification and validation of promoters and cis-acting regulatory elements. Plant Sci. 2014, 217–218, 109–119. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  53. Ho, C.L.; Geisler, M. Genome-wide computational identification of biologically significant cis-regulatory elements and associated transcription factors from rice. Plants 2019, 8, 441. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Phylogenetic tree of MADS-box genes in Arabidopsis and Theobroma cacao. MADS-box genes from Arabidopsis and Theobroma cacao are shaded light pink and light green, respectively. In the second (narrow) ring from the outside, the area of each color shows the proportion of genes from each species in each group. Subgroups are marked by colored backgrounds and circles.
Figure 2. Phylogenetic relationships, gene structures and conserved motifs of the TcMADS genes. (A) Unrooted neighbor-joining (NJ) tree (left) constructed in MEGA X from the cacao MADS-box protein sequences. (B) Exon–intron structures of the Theobroma cacao MADS-box genes (center), displayed with TBtools. (C) Conserved motif composition of the TcMADS proteins (right). Detailed information on the ten motifs is provided in Supplementary Table S1.
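Panel A was built with MEGA X. To show what the neighbor-joining criterion does, here is a minimal Python sketch of the textbook NJ algorithm. It is not MEGA X's implementation, and it leaves out distance estimation from aligned sequences and bootstrap support. The five-taxon distance matrix (taxa a to e) is a hypothetical textbook example, not TcMADS data. Internal node labels such as `node0` are arbitrary names chosen for this sketch.

```python
from itertools import combinations


def neighbor_joining(dist):
    """Build an unrooted tree by neighbor joining.

    dist: dict of dicts with dist[x][y] == dist[y][x] for all distinct taxa x, y.
    Returns a list of edges (node, child, branch_length).
    """
    d = {x: dict(row) for x, row in dist.items()}  # work on a copy
    edges = []
    next_id = 0
    while len(d) > 2:
        nodes = list(d)
        n = len(nodes)
        # r(x): total distance from x to every other current node
        r = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}
        # Q-criterion: join the pair minimizing (n - 2) * d(f, g) - r(f) - r(g)
        f, g = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"node{next_id}"  # arbitrary label for the new internal node
        next_id += 1
        len_f = 0.5 * d[f][g] + (r[f] - r[g]) / (2 * (n - 2))
        len_g = d[f][g] - len_f
        edges += [(u, f, len_f), (u, g, len_g)]
        # Distance from the new node u to each remaining node k
        d[u] = {k: 0.5 * (d[f][k] + d[g][k] - d[f][g])
                for k in nodes if k not in (f, g)}
        for k, value in d[u].items():
            d[k][u] = value
        for x in (f, g):  # remove the two joined nodes
            del d[x]
            for row in d.values():
                row.pop(x, None)
    a, b = d  # the last two nodes are connected by a single edge
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical five-taxon example (not MADS-box data)
taxa = "abcde"
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {x: {y: matrix[i][j] for j, y in enumerate(taxa) if j != i}
        for i, x in enumerate(taxa)}
for edge in neighbor_joining(dist):
    print(edge)
```

Each iteration joins the pair of nodes that minimizes the Q-criterion rather than the pair with the smallest raw distance. This is why NJ does not assume a molecular clock and produces an unrooted tree, as in panel A.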
Figure 3. (A) Physical distribution of TcMADS genes across the 10 chromosomes. (B) Number of TcMADS genes in each subfamily on each chromosome.
Table 1. Detailed information on the MADS-box gene family in Theobroma cacao.
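| Gene Name | Gene ID | pI | MW (Da) | Length (aa) | Instability Index | Aliphatic Index | Subcellular Localization | ORF (bp) |
|---|---|---|---|---|---|---|---|---|
| TcMADS1 | TCM_000239 | 9.48 | 46,436.9 | 406 | 39.09 | 77.32 | nucleus | 1221 |
| TcMADS2 | TCM_000266 | 9.59 | 26,095.8 | 224 | 46.52 | 87.1 | chloroplast | 675 |
| TcMADS3 | TCM_000725 | 9.91 | 36,097.68 | 315 | 58.23 | 83.87 | nucleus | 948 |
| TcMADS4 | TCM_000878 | 6.85 | 26,140.89 | 237 | 41.11 | 78.23 | chloroplast | 714 |
| TcMADS5 | TCM_000931 | 9.12 | 29,034.14 | 250 | 45.33 | 84.56 | chloroplast | 753 |
| TcMADS6 | TCM_000992 | 6.55 | 25,353.79 | 224 | 53.32 | 85.76 | nucleus | 675 |
| TcMADS7 | TCM_001181 | 5.86 | 66,752.75 | 600 | 48.76 | 77.52 | nucleus | 1803 |
| TcMADS8 | TCM_001182 | 5.43 | 37,831.38 | 337 | 62.2 | 73.77 | nucleus | 1014 |
| TcMADS9 | TCM_001335 | 9.19 | 38,279.15 | 338 | 38.49 | 68.11 | nucleus | 1017 |
| TcMADS10 | TCM_001841 | 9.85 | 31,109.18 | 269 | 62.9 | 86.17 | chloroplast | 810 |
| TcMADS11 | TCM_005456 | 8.91 | 30,163.08 | 262 | 53.63 | 84.89 | endomembrane system | 789 |
| TcMADS12 | TCM_005458 | 9.08 | 27,810.84 | 243 | 38.49 | 82.3 | nucleus | 732 |
| TcMADS13 | TCM_005818 | 8.51 | 42,280.27 | 375 | 42.75 | 101.09 | nucleus | 1128 |
| TcMADS14 | TCM_006323 | 9.42 | 28,732.02 | 254 | 52.22 | 62.24 | nucleus | 765 |
| TcMADS15 | TCM_006324 | 9.15 | 27,699.77 | 243 | 43.97 | 85.56 | nucleus | 732 |
| TcMADS16 | TCM_006325 | 5.42 | 24,228.12 | 218 | 45.89 | 55.96 | nucleus | 657 |
| TcMADS17 | TCM_007324 | 9.52 | 20,330.45 | 174 | 48.32 | 100.29 | chloroplast | 525 |
| TcMADS18 | TCM_007378 | 8.96 | 24,754.12 | 219 | 51.13 | 85.11 | chloroplast | 660 |
| TcMADS19 | TCM_007713 | 7.74 | 27,574.3 | 233 | 66.54 | 80.73 | chloroplast | 702 |
| TcMADS20 | TCM_007787 | 9.12 | 36,022.57 | 310 | 46.31 | 92.13 | nucleus | 933 |
| TcMADS21 | TCM_008703 | 8.5 | 29,428.29 | 258 | 46.99 | 82.79 | nucleus | 777 |
| TcMADS22 | TCM_008716 | 5.92 | 22,921.21 | 199 | 56.26 | 89.65 | nucleus | 600 |
| TcMADS23 | TCM_008973 | 9.39 | 8995.45 | 78 | 51.89 | 96.15 | chloroplast | 237 |
| TcMADS24 | TCM_011475 | 8.97 | 28,016.08 | 241 | 57.36 | 80.17 | nucleus | 726 |
| TcMADS25 | TCM_011478 | 6.61 | 27,447.96 | 240 | 58.38 | 73.62 | nucleus | 723 |
| TcMADS26 | TCM_011687 | 6.33 | 39,385.48 | 351 | 47.73 | 78.66 | nucleus | 1056 |
| TcMADS27 | TCM_012489 | 6.85 | 23,710.15 | 210 | 40.52 | 71.57 | nucleus | 633 |
| TcMADS28 | TCM_014051 | 6.13 | 41,766.39 | 370 | 51.46 | 77.22 | chloroplast | 1113 |
| TcMADS29 | TCM_014337 | 8.79 | 46,247.61 | 407 | 52.14 | 88.87 | nucleus | 1224 |
| TcMADS30 | TCM_014345 | 9.06 | 27,737.47 | 239 | 50.17 | 76.78 | nucleus | 720 |
| TcMADS31 | TCM_014661 | 9.83 | 24,429.07 | 210 | 55.07 | 86.33 | chloroplast | 633 |
| TcMADS32 | TCM_015044 | 8.82 | 27,249.02 | 236 | 53.76 | 86.78 | nucleus | 711 |
| TcMADS33 | TCM_015049 | 9.88 | 27,106.38 | 237 | 47.23 | 90.13 | chloroplast | 714 |
| TcMADS34 | TCM_015674 | 5.47 | 27,657.45 | 238 | 69.7 | 87.65 | nucleus | 717 |
| TcMADS35 | TCM_016147 | 9.24 | 24,830.39 | 215 | 63.31 | 73.95 | nucleus | 648 |
| TcMADS36 | TCM_017242 | 8.51 | 24,320.82 | 209 | 47.08 | 86.75 | nucleus | 630 |
| TcMADS37 | TCM_018979 | 9.07 | 27,740.63 | 243 | 34.06 | 85.14 | nucleus | 732 |
| TcMADS38 | TCM_018981 | 8.77 | 28,366.14 | 248 | 62.86 | 80.24 | nucleus | 747 |
| TcMADS39 | TCM_019362 | 8.23 | 30,811.58 | 267 | 58.24 | 70.15 | nucleus | 804 |
| TcMADS40 | TCM_021050 | 9.7 | 17,902.07 | 155 | 45.29 | 89.94 | chloroplast | 468 |
| TcMADS41 | TCM_022993 | 9.51 | 40,637.63 | 356 | 53.16 | 76.71 | chloroplast | 1071 |
| TcMADS42 | TCM_023006 | 9.2 | 38,815.19 | 354 | 58.36 | 70.54 | nucleus | 1065 |
| TcMADS43 | TCM_023041 | 8.93 | 38,451.37 | 354 | 59.01 | 69.49 | nucleus | 1065 |
| TcMADS44 | TCM_024579 | 6.04 | 28,901.4 | 252 | 61.39 | 85.48 | nucleus | 759 |
| TcMADS45 | TCM_025670 | 8.86 | 26,594.65 | 233 | 64.66 | 74.12 | chloroplast | 702 |
| TcMADS46 | TCM_025671 | 9.37 | 26,384.43 | 233 | 63.02 | 70.39 | nucleus | 702 |
| TcMADS47 | TCM_025674 | 9.64 | 16,948.73 | 150 | 36.03 | 79.4 | nucleus | 453 |
| TcMADS48 | TCM_025676 | 10.29 | 11,744.9 | 103 | 56.26 | 68.25 | chloroplast | 312 |
| TcMADS49 | TCM_026842 | 9.26 | 30,499 | 273 | 46.06 | 71.87 | nucleus | 822 |
| TcMADS50 | TCM_026845 | 9.47 | 23,787.52 | 207 | 49.15 | 78.74 | chloroplast | 624 |
| TcMADS51 | TCM_029234 | 4.91 | 19,812.05 | 174 | 42.71 | 76.21 | nucleus | 525 |
| TcMADS52 | TCM_029518 | 9.52 | 20,156.2 | 182 | 44.45 | 79.84 | chloroplast | 549 |
| TcMADS53 | TCM_029519 | 9.64 | 49,499.24 | 437 | 50.15 | 68.56 | nucleus | 1314 |
| TcMADS54 | TCM_029596 | 9.68 | 34,012.76 | 294 | 66.03 | 74.05 | nucleus | 885 |
| TcMADS55 | TCM_032402 | 9.25 | 13,764.45 | 132 | 34.38 | 60.68 | nucleus | 399 |
| TcMADS56 | TCM_032403 | 7.74 | 19,620.34 | 172 | 50.98 | 74.24 | chloroplast | 519 |
| TcMADS57 | TCM_034148 | 9.08 | 26,051.7 | 225 | 33.21 | 88.36 | chloroplast | 678 |
| TcMADS58 | TCM_034501 | 7.64 | 25,504.16 | 227 | 55.81 | 91.06 | nucleus | 684 |
| TcMADS59 | TCM_034549 | 9.62 | 21,512.84 | 184 | 50.3 | 88.42 | chloroplast | 555 |
| TcMADS60 | TCM_034757 | 5.45 | 37,593.54 | 333 | 59.52 | 84.83 | nucleus | 1002 |
| TcMADS61 | TCM_034970 | 5.26 | 38,476.16 | 337 | 60.18 | 82.43 | nucleus | 1014 |
| TcMADS62 | TCM_035212 | 8.87 | 24,432.87 | 209 | 61.75 | 78.42 | nucleus | 630 |
| TcMADS63 | TCM_036473 | 9.34 | 18,742.04 | 162 | 41.19 | 99.38 | chloroplast | 489 |
| TcMADS64 | TCM_036541 | 9.43 | 25,550.24 | 222 | 61.61 | 91.4 | chloroplast | 669 |
| TcMADS65 | TCM_036568 | 9.52 | 23,170.78 | 203 | 56.62 | 87 | nucleus | 612 |
| TcMADS66 | TCM_037394 | 9.62 | 21,551.01 | 186 | 42.9 | 92.31 | chloroplast | 561 |
| TcMADS67 | TCM_040735 | 9.76 | 24,088.7 | 214 | 39.18 | 87.01 | chloroplast | 645 |
| TcMADS68 | TCM_042799 | 4.84 | 26,046.39 | 233 | 57.5 | 68.28 | nucleus | 702 |
| TcMADS69 | TCM_042848 | 4.81 | 26,121.69 | 233 | 55.41 | 75.41 | nucleus | 702 |

The columns pI through Aliphatic Index list physicochemical characteristics of the encoded proteins. pI, isoelectric point; MW, molecular weight; ORF, open reading frame.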
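A few lines of code can check or reproduce two relationships in Table 1. First, every ORF value equals three times (protein length + 1), which is consistent with the ORF length including the stop codon. Second, the aliphatic index follows Ikai's definition. Tools such as ExPASy ProtParam also conventionally read an instability index above 40 as predicting an unstable protein. The Python sketch below assumes those conventions and is not the pipeline used in this study. Only a few Table 1 rows are transcribed, and the peptide "AVIL" is hypothetical.

```python
# A few rows transcribed from Table 1: (gene, length in aa, ORF in bp, instability index)
TABLE1_SAMPLE = [
    ("TcMADS1", 406, 1221, 39.09),
    ("TcMADS7", 600, 1803, 48.76),
    ("TcMADS23", 78, 237, 51.89),
    ("TcMADS48", 103, 312, 56.26),
]


def protein_length_from_orf(orf_bp: int) -> int:
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # the stop codon is not translated


def aliphatic_index(seq: str) -> float:
    """Ikai aliphatic index: x(Ala) + 2.9*x(Val) + 3.9*(x(Ile) + x(Leu)), x in mole percent."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def stability_class(instability_index: float) -> str:
    """ProtParam-style reading: an index above 40 predicts an unstable protein."""
    return "unstable" if instability_index > 40 else "stable"


for gene, length_aa, orf_bp, ii in TABLE1_SAMPLE:
    # Consistency check between the Length (aa) and ORF (bp) columns
    assert protein_length_from_orf(orf_bp) == length_aa, gene
    print(gene, length_aa, stability_class(ii))

print(aliphatic_index("AVIL"))  # hypothetical 4-residue peptide
```

Under this reading, TcMADS1 (index 39.09) would be classified as stable and the other three sampled proteins as unstable. The 40 cutoff is a heuristic and does not measure stability.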
Table 2. Tandemly duplicated gene pairs and their Ka, Ks and Ka/Ks values.
| Tandem Duplicated Gene Pair | Chromosome | Ka | Ks | Ka/Ks |
|---|---|---|---|---|
| TcMADS43 & TcMADS42 | ChrV | 0.108301 | 0.103758 | 1.04379 |
| TcMADS43 & TcMADS41 | ChrV | 0.213254 | 0.170408 | 1.25143 |
| TcMADS42 & TcMADS41 | ChrV | 0.215878 | 0.236528 | 0.912695 |
| TcMADS49 & TcMADS50 | ChrV | 0.098915 | 0.237876 | 0.415826 |
| TcMADS45 & TcMADS46 | ChrV | 0.078058 | 0.111367 | 0.700902 |
| TcMADS68 & TcMADS69 | ChrII | 0.035449 | 0.086453 | 0.410042 |
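Table 2 reports Ka (nonsynonymous substitutions per nonsynonymous site), Ks (synonymous substitutions per synonymous site) and their ratio. By the usual convention, Ka/Ks > 1 suggests positive selection, Ka/Ks < 1 suggests purifying selection, and Ka/Ks = 1 suggests neutral evolution. The Python sketch below recomputes the ratios from the Table 2 values and applies that reading. It does not estimate Ka or Ks, which requires a codon alignment and a substitution model. It also applies no statistical test, so a ratio close to 1 (such as 1.04 for TcMADS43 & TcMADS42) is not, on its own, evidence of selection.

```python
# Rows transcribed from Table 2: (gene 1, gene 2, Ka, Ks, reported Ka/Ks)
TABLE2 = [
    ("TcMADS43", "TcMADS42", 0.108301, 0.103758, 1.04379),
    ("TcMADS43", "TcMADS41", 0.213254, 0.170408, 1.25143),
    ("TcMADS42", "TcMADS41", 0.215878, 0.236528, 0.912695),
    ("TcMADS49", "TcMADS50", 0.098915, 0.237876, 0.415826),
    ("TcMADS45", "TcMADS46", 0.078058, 0.111367, 0.700902),
    ("TcMADS68", "TcMADS69", 0.035449, 0.086453, 0.410042),
]


def selection_mode(ka: float, ks: float) -> str:
    """Conventional reading of Ka/Ks; no significance test is applied."""
    if ks == 0:
        raise ValueError("Ks is zero, so Ka/Ks is undefined")
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"


for g1, g2, ka, ks, reported in TABLE2:
    ratio = ka / ks  # should match the reported Ka/Ks up to rounding
    print(f"{g1} & {g2}: Ka/Ks = {ratio:.4f} (reported {reported}), "
          f"{selection_mode(ka, ks)}")
```

The recomputed ratios agree with the reported Ka/Ks values to within rounding. By this convention, two pairs fall above 1 and four fall below 1.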
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Zhang, Q.; Hou, S.; Sun, Z.; Chen, J.; Meng, J.; Liang, D.; Wu, R.; Guo, Y. Genome-Wide Identification and Analysis of the MADS-Box Gene Family in Theobroma cacao. Genes 2021, 12, 1799. https://doi.org/10.3390/genes12111799

AMA Style

Zhang Q, Hou S, Sun Z, Chen J, Meng J, Liang D, Wu R, Guo Y. Genome-Wide Identification and Analysis of the MADS-Box Gene Family in Theobroma cacao. Genes. 2021; 12(11):1799. https://doi.org/10.3390/genes12111799

Chicago/Turabian Style

Zhang, Qianqian, Sijia Hou, Zhenmei Sun, Jing Chen, Jianqiao Meng, Dan Liang, Rongling Wu, and Yunqian Guo. 2021. "Genome-Wide Identification and Analysis of the MADS-Box Gene Family in Theobroma cacao" Genes 12, no. 11: 1799. https://doi.org/10.3390/genes12111799

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
