Phylogenomic Analysis of the PEBP Gene Family from Kalanchoë

: The PEBP family comprises proteins that function as key regulators of flowering time throughout the plant kingdom and they also regulate growth and plant architecture. Within the PEBP protein family, three subfamilies can be distinguished in angiosperms: MOTHER OF FT AND TFL1 -like (MFT), FLOWERING LOCUS T-like (FT-like), and TERMINAL FLOWER1-like ( TFL1 -like). Taking advantage of the genome sequences available from K. fedtschenkoi and K. laxiflora , we performed computational analysis to identify the members of the PEBP gene family in these species. The analyses revealed the existence of 11 PEBP genes in K. fedtschenkoi and 18 in K. laxiflora , which are clustered in two clades: FT -like and TFL1 -like. The PEBP genes had conserved gene structure and the proteins had highly conserved amino acid sequences in the positions crucial for the protein functions. The analysis of Ka/Ks ratio revealed that most recently duplicated genes are under positive selection. Despite being an economically important genus, the genetics underlying the regulation of flowering in Kalanchoë is poorly understood. The results of this study may provide a new insight into the molecular control of flowering that will allow further studies on flowering control in Kalanchoë.


Introduction
The family of phosphatidylethanolamine-binding proteins (PEBPs) is a group of proteins present in all eukaryote kingdoms. Despite extensive sequence conservation, PEBP proteins act as regulators of various signaling pathways to control growth and differentiation [1,2]. Phylogenetic studies suggest that the PEBP gene family can be divided into three subfamilies: MOTHER OF FT AND TFL1-like (MFT-like), FLOWERING LOCUS T-like (FT-like), and TERMINAL FLOWER 1-like (TFL1-like) [1,3]. It is thought that the MFT-like clade is the evolutionary ancestor to the FT-like and TFL1-like clades. A duplication of an ancestral MFT-like gene might have given rise to the MFT-like clade and the FT/TFL1-like clade. Further diversification of function and a second duplication resulted in the emergence of separate FT-like and TFL1-like clades [3,4]. A recent study revealed that the MFT-like subfamily exists in both basal land plants (bryophytes and pteridophytes) and seed plants (gymnosperms and angiosperms), while FT-like and TFL1-like genes are only found in gymnosperm and angiosperms. This suggests that the first duplication event took place after the divergence of basal land plants from the common ancestor of seed plants, while the second duplication event occurred before the divergence of seed plants. Additionally, within the three systematically analyzed the gene structure, gene family evolution and protein attributes to identify relevant targets for future functional genomic studies.

Identification of PEBP Sequences
The sequences were obtained by annotation and using full-length A. thaliana sequences as query sequences in BLASTP and TBLASTX searches against genomes and proteomes of K. fedtschenkoi  (Table 1). Obtained protein sequences were analyzed for the presence of PEBP domains using the CDD database [38].  [39] was performed to determine the homologous relationships among PEBP proteins from Kalanchoë and other species.

Protein Characterization
The protein analysis was performed using ProtParam [40] for prediction of protein length, molecular weight, pI and instability index.
To identify conserved motifs in PEBP family proteins, the online MEME v.4.12.0 (Multiple Expectation Maximization for Motif Elicitation) [41] program was used with minimum motif length of 6 and maximum of 100, the maximum number of motifs was set to 20. Protein structure prediction was performed using SWISSMODEL and Swiss PdbViewer [42].

Gene Structure Analysis
The information about the gene length and distribution of exons and introns in PEBP genes was obtained from Phytozome. The intron/exon organization for PEBP genes was determined by aligning the CDS sequences to their corresponding genomic DNA sequences and using the result as the input for graphical display at the Gene Structure Display Server v2 [43].
Gene CDS sequences were aligned and percent identity matrix was generated using Clustal Omega with default parameters [44].

Gene Duplication Analysis
MEGA 7.0 software [45] was used to calculate rate of nonsynonymous substitutions (Ka) and synonymous substitutions (Ks). Codon-based testing of purifying selection for analysis between sequences was conducted using the modified Nei-Gojobori (assumed transition/transversion bias = 2) method. All ambiguous positions were removed for each sequence pair. The dates of the duplication events (T-Duplication time; Mya-Million years ago) between the duplicated Kalanchoë genes associated with terminal branches in the tree species clades were calculated by the equation [46] T = Ks/2λ × 10 −6 Mya, the λ = 1.5 × 10 −8 (1) Diversity analysis (nonsynonymous substitutions per nonsynonymous site; Ka/synonymous substitutions per synonymous site-Ks) was performed using the DnaSP v5.10 program [47] with a sliding window mode (window size 50, step 10).

Identification of PEBP Sequences
BLAST searches using PEBP-like nucleotide and protein sequences from A. thaliana against plant protein and nucleotide databases of two Kalanchoë species resulted in 29 PEBP-like genes being retrieved; predicted to encode 30 proteins. There were 11 genes in K. fedtschenkoi that is a diploid species and 18 in K. laxiflora that is a tetraploid species. Based on annotations provided by the Phytozome database the genes and corresponding proteins were assigned to three groups: FT-like (seven genes from K. fedtschenkoi and 12 from K. laxiflora), TFL1-like (three genes from K. fedtschenkoi and four from K. laxiflora), and BFT-like (one gene from K. fedtschenkoi and two from K. laxiflora). The detailed information about Phytozome gene IDs assigned gene names and number of transcripts are included in Table 1.

Comparative Phylogenetic Analysis
The unrooted neighbor-joining phylogenetic tree was constructed using predicted amino acid protein sequences from Kalanchoë and six other species in which functional assessment of FT and TFL1 functions was confirmed by transgenic approach [4,15,48] (Figure 1). The phylogenetic analysis revealed that tree main clades MFT-like, FT-like, and TFL1/CEN/BFTlike could be distinguished in the constructed tree as described previously [2]. In the FT-like clade, proteins of K. fedtschenkoi and K. laxiflora grouped very closely together forming three distinct groups. Furthermore, the Kalanchoë FT-like proteins shared high homology with those from other perennial plants. Within TFL1/CEN/BFT-like clade, the BFT-like proteins formed clearly separated subclade containing Kalanchoë BFT-like proteins together with Populus and Arabidopsis BFT, as well as CsAFT, an antiflorigen protein from Chrysanthemum. The TFL1-like proteins from Kalanchoë grouped together with both TFL1-and CEN-like proteins.
The comparative phylogenetic analysis confirmed that PEBP-like proteins from Kalanchoë are homologous to FT-like, TFL1-like and CEN-like, and BFT-like proteins from other species, and that MFT-like sequences could not be identified in Kalanchoë.
The gene homology analysis using DELTA-BLAST further confirmed that Kalanchoë FT-like, TFL1-like, and BFT-like genes are homologous to the A. thaliana FT gene (69-78% sequence similarity), the CEN/TFL genes (62-77% sequence similarity), and the BFT gene (66-67% sequence similarity), respectively (Table S1). Furthermore, the homology analysis comparing the Kalanchoë PEBP genes and sequences available in the Genbank for various species confirmed that Kalanchoë sequences have high homology with perennial species (Table S2).

Kalanchoë Protein Characterization
The length of the proteins ranged from 112 to 203 aa (average: 176 aa; median: 177 aa) in FT-like proteins and 145 to 179 aa (average: 172 aa; median: 175 aa) in TFL1-like proteins. The BFT-like proteins were either 176 or 177 aa. All identified proteins were characterized by a conserved PEBP domain. Domain analysis of primary transcript results from the CDD database confirmed the presence of the PEBP domain in the N-terminal regions of these proteins, except for KlFTL1 that contained only a partial PEBP domain. The multiple alignment of domain sequences followed by phylogenetic analysis confirmed the presence of three protein groups corresponding to FT-like, TFL1like, and BFT-like proteins ( Figure 2). The Kalanchoë PEBP proteins shared 40% to 100% identity at the amino acid sequence level (Table S3). The FT-like protein identity ranged between 81% to 96% in K. fedtschenkoi and 71% to 99% in K. laxiflora. There was between 72% to 100% sequence identity among FT-like proteins between both species. The TFL1-like protein identity ranged between 65% to 69% in K. fedtschenkoi and from 50% to 100% in K. laxiflora. There was between 51% to 100% sequence identity among TFL1-like proteins between both species. The BFT-like proteins in K. laxiflora showed 98% identity, while between species the identity was 98%.
The PEBP proteins had highly conserved amino acid sequences in the positions crucial for the protein functions (description in Figure 2). The differences in the conserved residues were observed in KfTFL1.1 and KlTFL1.1a at position 62 where lysine (K) was identified instead of arginine (R) that has, however, similar properties, and a deletion of the second aspartic acid (D) in the DPDXP motif that forms the anion-binding site was observed in the KlTFL1.4 protein. In addition, in BFT-like proteins the arginine residues were observed at positions 129 and 131, but not at position 130.
The molecular weights of the predicted molecules were ~19-23 kDa for all proteins except from KlFTL1 (12.28 kDa) and KlTFL1.1b (15.89 kDa). The pIs were 5.24 and 9.27 and instability indexes were between 27.75 and 54.72. Of 30 analyzed proteins, 10 were predicted as stable and 20 as unstable. The detailed information about the proteins' molecular weight, isoelectric points and instability indexes are included in Table 1.
The MEME protein motif search tool identified a total of ten conserved motifs in Kalanchoë PEBP proteins ranging from 6 to 44 amino acids. The motif pattern appeared to be highly conserved among the proteins ( Figure S1).
The protein models of KfFT3 and KfTFL1.3 were constructed based on the similarity with crystal structure of A. thaliana FT, 1wkpA, and TFL1, 1wkoA ( Figure 3). KfFT3 shared 76% identity with 1wkpA, and TFL1 shared 69% identity with 1wkoA.  Similarly, to FT-like genes, the second and third exons were highly conserved with 62 and 41 bp, respectively, except the KlFTL1.1 gene (KlFTL1.1b), which had a third exon of 160 bp. The fourth exon in TFL1-like group was 221 bp in all the genes. All three BFT-like genes were characterized by structures comprising three exons and two introns. The first exon was 207 or 210 bp, while second and third exons were 103 and 221 bp, respectively, in all the genes. The introns had variable length in all the gene groups from 74 to 1396 bp. The 4-exon FT-like genes were characterized by a structure with two short introns (<200 bp) and one long intron (>500 bp). The long intron was found either in the second (12 out 18 genes) or the third position (six out 18 genes). The 4-exon TFL1-like genes had either all three short introns (<200 bp; five out of seven genes), or had a first medium length intron (~240 bp) and a second long intron (~600 bp) (two out of seven genes). The BFT-like genes were characterized by a first short intron (<100 bp) and one medium length intron (~320 bp).
The identity of coding sequences for all PEBP-like genes in Kalanchoë ranged between 50.5% and 100% (Table S3). The FT-like gene identity ranged between 72% to 99 % in K. fedtschenkoi and 69% to 99% in K. laxiflora. There was between 69% to 100% sequence identity among FT-like genes between both species. The TFL1-like gene identity ranged between 65% to 68% in K. fedtschenkoi and 56% to 99% in K. laxiflora. There was between 56% to 100% sequence identity among TFL1-like genes between both species. The BFT-like genes in K. laxiflora showed 98% identity, while between species the identity was 97% and 99%.

Gene Duplication Analysis
The number of synonymous and nonsynonymous substitutions per site of the duplicated PEBP genes in Kalanchoë associated with terminal branches in the tree species were determined using MEGA 7.0. We determined that average GC content in the third codon position that was 59% (Table  S5)  with proposed gene duplication events; red marks indicate probable respective gene duplication events (D1 to D3-4) that occurred before divergence of K. fedtschenkoi and K. laxiflora; blue marks indicate gene duplications observed only in K. laxiflora associated with probable WGD that lead to speciation; and the green mark indicates gene duplication in K. fedtschenkoi that occurred after split between K. fedtschenkoi and K. laxiflora.
Codon-based test of purifying selection revealed that the majority of duplicated gene pairs were under purifying selection, except one FT-like gene pair in K. fedtschenkoi (KfFT2-KfFT7) and three FTlike gene pairs in K. laxiflora (KlFT1-KfFTL1, KlFT2-KfFT6 and KlFT5-KfFT10) ( Table 2). To further evaluate the sequence diversity in FT and TFL1 homologs, we performed a sliding window analysis of rates of pairwise nonsynonymous (Ka) and synonymous (Ks) substitutions ( Figure 6). While exon 4 of the FT proteins had almost no amino acid substitutions indicating strong purifying selection, exon 4 of TFL1 had higher Ka/Ks ratio, suggesting more relaxed selection.

Discussion.
The PEBP family is one of the most ancient gene families, with a highly conserved gene structure and high protein sequence similarities across species [1,2]. Its members include genes with very important functions in flowering induction and plant architecture [3,4]. The PEBP genes, particularly FT-like and TFL1-like genes, have been identified and their detailed functions were studied in many plant species including model plants, crop and vegetable species, and ornamental plants [7,9,15,21,50,51]. In this study, we focused on PEBP genes from Kalanchoë that is an important genus of flowering ornamental plants. So far, there is no information available about PEBP genes in Kalanchoë or other species of the Crassulaceae family. Therefore, using available genome sequence data we identified PEBP gene members in two species-K. fedtschenkoi and K. laxiflora-and characterized their gene structures, gene family evolution, and protein features.
The FT gene is expressed in the phloem companion cells of the leaves under flower inductive conditions and corresponds to an FT protein of approximately 20 kDa with a theoretical isoelectric point of ~8-9 [52][53][54][55]. The FT protein is a key component of the florigen complex, which is translocated from leaves to SAM where it promotes flowering [4,6,56]. The TFL1 gene is expressed in the SAM [7] and the TFL1 protein is a signal that is translocated from the inner SAM cells to the outer cells and coordinates the cell identity [57]. With regard to flowering, the FT protein promotes flowering in plants, while [6] TFL1 represses flowering [49]. However, other TFL1-like proteins can be expressed in the vascular tissues and translocate to SAM to repress flowering [15,49,58,59].
Previous functional studies have shown that the PEBP family can be divided into three major functional clades [2,3,24]. In the present study, phylogenetic analysis of the deduced protein sequences demonstrated that Kalanchoë proteins can be classified into an FT-like clade and a TFL1/BFT-like clade. The FT-like proteins were closely related to proteins of perennial species, which were reported to regulate induction of flowering [19,48,[60][61][62]. Thus, these proteins are likely the FT homologs that can regulate flower transition and initiation in Kalanchoë. Similar to FT-like proteins, the TFL1-like proteins showed close relation to TFL1-like proteins that controls inflorescence meristem identity and delays the transition to the reproductive phase at the SAM [63][64][65]. Interestingly, the BFT-like proteins showed high homology to a mobile floral inhibitor from Chrysanthemum seticuspe [15]. Old physiological studies suggested an existence of a flowering inhibitor produced in leaves of Kalanchoë plants [66,67]. Thus, BFT-like proteins might fulfill the function of a mobile flower repressor. In angiosperms, all three PEBP gene families were identified in all species (including Phoenix dactylifera which was previously reported to lack MFT-like genes) [3]. However, in our study we were unable to identify MFT-like sequences in neither K. fedtschenkoi nor K. laxiflora. The inability to identify MFT-like genes might be due to genome misassembly resulting in mosaic gene sequences or gene loss in the assembly due to collapse of the repetitive surroundings [68,69]. However, even though very unlikely, the loss of MFT-like genes cannot be ruled out. K. fedtschenkoi and K. laxiflora represent the only species with available genome data from the Saxifragales order. Thus, the comparison with other closely related species is currently not possible.
It has been demonstrated in A. thaliana that FT and TFL1 may have an interchangeable roles by replacing a single amino acid [7] or protein segment [8,16] (Figure 2). Differences in FT/TFL1-like protein activities might be due to their binding affinity towards FD and/or 14-3-3 proteins. In the Kalanchoë PEBP protein alignment, we identified four amino acids, i.e., R62, P94, F101, and R130, predicted to participate in binding of 14-3-3 proteins [9,21]. Amino acid alignment revealed that Kalanchoë FT-like proteins had conserved tyrosine at position 85 (Y85) characteristic for floral promoters, while TFL1-like proteins had conserved histidine at position 85 (H85) characteristic for floral repressors [7]. Generally, FT-like proteins contain a region called segment B, which forms a loop in the protein structure and is essential for FT-like proteins to function as floral promoters. FTlike proteins usually contain tyrosine at position 134 (Y134) and tryptophan at position 138 (W138) in segment B, whereas the flowering repressor proteins contain non-tyrosine and non-tryptophan amino acids in these positions, respectively [4,8,49]. Additionally, FT-like proteins contain a triad region-Segment C-that is required for full functionality of FT-like proteins but not TFL1-like proteins [4,8]. Therefore, the presence of the mentioned residues indicates that Kalanchoë FT-like can act as flower activators and TFL1/BFT-like as flower repressors.
The number of PEBP-like genes found in different plant species varies greatly with up to 19 genes found in Glycine max and 24 copies identified in Musa acuminate [3]. The average number of PEBP genes in monocots (~17) was shown to be roughly twice the number in dicots (~8) [3]. In our study, we identified 11 genes in K. fedtschenkoi and 18 in K. laxiflora. These numbers are similar to eudicot species, such as Brassica rapa (12 PEBP genes), Solanum lycopersicum (12 PEBP genes), and G. max (19 PEBP genes), which were demonstrated to experience additional whole genome duplication events throughout their evolutionary history [3]. Recent analysis of the K. fedtschenkoi genome provided strong evidence for two ancestral WGD events in this species [35]. The comparison between PEBP trees from K. fedtschenkoi and K. laxiflora suggests that the latter species experienced a recent WGD event that based on Ks values associated with lateral branches took place between 0.9 and 3.2 MYA (Table 2 and Figure 5). Thus, the number of PEBP genes is consistent with diploid/ tetraploid nature of the analyzed Kalanchoë species.
The investigated PEBP genes had highly similar sequences and exon-intron structures (Table S3 and Figure 4). Particularly, the exonic structures were highly conserved with characteristic for PEBP gene family exon 2 (62 nt) and exon 3 (41 nt) invariable in size in FT-like and TFL1-like genes. The only exception from FT-like genes includes KlFTL1 that appears to be a pseudogene lacking the entire 4th exon. However, incomplete gene sequence can be also a result of a mistake during genome assembly. Furthermore, the BFT-like genes from both species demonstrated novel gene structure with three exons and two introns resulting from a fusion between exon 2 and exon 3 (103 nt). Even though PEBP genes have highly conserved gene structures, some species exhibit novel features such as additional intron/exon in Musa acuminate [3] and Chenopodium rubrum [70], and exon fusion in Zea mays of FT-like genes [24]. KlTFL1.1 was predicted to have alternative transcript as a result of downstream alternative usage of transcription start site (TSS). Alternative usage of TSSs and alternative splicing are key mechanisms to generate gene variation in eukaryotes. Both mechanisms are known to play important roles in tissue-specific gene expression and functional variation, which have significant impact on biological processes [71]. Alternative splicing in TFL1/CEN paralogs in saffron was reported to influence terminal flowering and flowering time [72]. Thus alternative transcript of the TFL1.1 gene in K. laxiflora may be relevant in spatiotemporal expression of the gene. However, it is also possible that downstream TSS might produce a truncated protein whose function is deteriorated or lost.
The evaluation of sequence diversity in FT and TFL1 homologs revealed that the majority of the most recently duplicated genes are under positive selection (Table 2). Furthermore, a more detailed sliding window analysis of Ka and Ks revealed strong differences in the substitution rates in exon 4 between FT-and TFL1-like genes. This is consistent with previous studies showing that segment B situated in exon 4 evolved very rapidly in TFL1 orthologs, but is almost invariant in FT orthologs. Thus, the residues encoded by the fourth exon of FT determine the function of the protein [1,8].
In Kalanchoë, flowering time is an important economic trait. Despite efforts to understand the mechanisms underlying the impact of photoperiod and temperature on the induction of flowering, little is known about the genetic basis of flower transition. The FT protein is a key integrator among different flowering pathways in angiosperms that promotes flowering [73]. In many plant species FT homologs regulate aspects of plant development in response to photoperiod and temperature [48,51,62,74,75]. Thus, the extended family of FT genes identified in Kalanchoë might be of significant relevance to flower induction in the response to different environmental cues. In breeding programs and commercial cultivation of many perennial plants a prolonged juvenility is one of the major problems. Modifying the FT/TFL1 ratio can change the flowering time [73]. Furthermore, TFL1 expression may be relevant for prevention of precocious flowering [76]. The overexpression of an FT homolog in poplar induced early flowering [62] and downregulation of a TFL1 homolog accelerated the flowering age [64]. The continuous flowering trait in roses and strawberries is associated with loss of function of the TFL1 homolog. The continuous flowering plants are characterized by short juvenile phase and rapid flowering after seed germination [77]. Thus, modification of FT/TFL1 expression may provide possibility to shorten the juvenile phase in Kalanchoë species and obtain plants that flower for a long period of time with no need of environmental control.
Supplementary Materials: The following are available online at www.mdpi.com/link, Figure S1: Alignments of the amino acid sequences of the PEBP proteins, Figure S2: Results of MEME motif analysis with PEBP proteins from Kalanchoë, Table S1: Results of homology analysis of Kalanchoë genes with Arabidopsis thaliana, Table S2: Results of homology analysis of Kalanchoë genes with various species, Table S3: CDS and protein identity in Kalanchoë, Table S4: Predicted subcellular localization of Kalanchoë PEBP proteins, Table S5: GC content (%) at first (P1), second (P2) and third (P3) codon position in PEBP family genes in Kalanchoë. Funding: Part of this research was granted by CAPES (Coordination for the Improvement of Higher Level Personnel, Brazil, process number 9110-13-5). The authors did not receive funds for covering the publication costs.