Genome-Wide Analysis of Somatic Embryogenesis-Related Transcription Factors in Cultivated Strawberry ( Fragaria × ananassa ) and Evolutionary Relationships among Rosaceae Species

: Somatic embryogenesis is a plant regeneration method commonly used in tissue culture. Its molecular mechanisms are well-known in model plants such as Arabidopsis thaliana L. LEAFY COTYLEDON1 ( LEC1 ), LEAFY COTYLEDON2 ( LEC2 ), FUSCA3 ( FUS3 ), ABSCISIC ACID INSENSITIVE3 ( ABI3 ), and BABYBOOM ( BBM ) genes are considered master regulators in the induction, growth, and maturation of somatic embryos. However, the study of these transcription factors in fruit crops with high agronomic and economic value such as cultivated strawberry ( Fragaria × ananassa Duch.) and other Rosaceae species is scarce. The purpose of this study was the in silico characterization of LEC1 , ABI3 , FUS3 , LEC2 , and BBM (LAFL-B) genes from F. × ananassa genome and the study of the evolutionary relationships within the Rosaceae family. Synteny analyses and molecular evolutionary rates were performed to analyze the evolution of each transcription factor within the Rosaceae family. Synteny was conserved between F. × ananassa and other Rosaceae genomes, and paralogous genes were selected through negative selection. Additionally, the exon–intron organization and multiple alignments showed that gene structure and DNA-binding domains were conserved in F. × ananassa transcription factors. Finally, phylogenetic trees showed close evolutionary relationships between F. × ananassa and its orthologous proteins in the Rosoideae subfamily. Overall, this research revealed novel insights in the LAFL-B network in F. × ananassa and other species of the Rosaceae family. These results provide useful in silico information and new resources for the establishment of more efﬁcient propagation systems or the study of ploidy effects on somatic embryogenesis. These results indicate that most positions for LEC1 , L1L , LEC2 , FUS3 , ABI3, and BBM genes are conserved between F. × ananassa and other Rosaceae family genomes.


Introduction
Plants exhibit high plasticity during their developmental stages, and it is a survival strategy to overcome environmental constraints [1]. For instance, zygotic embryogenesis is an essential process for plant reproduction and allows the regeneration of a complete plant from a single cell through the fertilization process [2]. In a similar manner, somatic embryogenesis (SE) is an alternative regeneration process, for which a new individual is generated from a vegetative cell [3,4], occurring naturally in some Kalanchoe species [5]. It can also be manipulated by in vitro tissue culture under appropriate stress signals such as plant growth regulators and wounding, as well as other mechanisms [5,6], offering multiple possibilities to investigate the bases and potential applications of SE in economically important crops. In general, SE is well-known at the morphological, physiological, and molecular level in plant model species such as Arabidopsis thaliana [7][8][9], Medicago truncatula [10], and others, such as Coffea canephora [11]. However, in fruit crops, such as To know the master regulator genes underlying the SE process in F. × ananassa is a key step to elucidating the molecular bases of SE growth and physiology [14][15][16]. In addition, the evolutionary comparison of these genes with orthologous of other members from the Rosaceae family is essential to establishing bases for understanding the embryogenic ability of this family and the discovery of new biotechnological applications in these crops. Therefore, the goal of this research was the in silico characterization of LAFL-B gene networking in F. × ananassa and the study of their evolutionary relationships with other species of the Rosaceae family. Results suggested that LEC1, L1L, LEC2, FUS3, ABI3, and BBM genes and proteins were present in variable copy number in F. × ananassa genome and conserved at the structural level. In addition, evolutionary relationships with LABL-B genes in other Rosaceae species suggest that F. × ananassa genes were closely conserved with genes of Rosoideae subfamily, including F. vesca, Rubus occidentalis, and Rosa chinensis. Taken together, these in silico analyses constitute the first detailed report about SE-related TFs in F. × ananassa and other Rosaceae species.

Synteny Analysis and Molecular Evolutionary Rates between Somatic Embryogenesis-Related Transcription Factors in Fragaria × ananassa and Other Rosaceae Species
Chromosomal locations and syntenic relationships between LEC1s, L1Ls, LEC2s, FUS3s, ABI3s, and BBMs of F. × ananassa, and orthologous genes of F. vesca, R. occidentalis, R. chinensis, M. × domestica, P. persica, and P. communis were analyzed using Synteny Viewer tool, available on GDR database, and the diagram was constructed using Basic Circos, available on TBtools software [46]. Molecular evolutionary rates for paralogous gene pairs were calculated using Simple Ka/Ks Calculator, available on TBtools software [46]. FaL1L3 and FvL1L2 coding-sequences were subjected to the identification of site-specific positive selection operating on amino acids using Selecton version 2.4 (http://selecton. tau.ac.il/index.html; [47]), performing the model M8 (positive selection enabled). We consider that Ka/Ks > 1 and probability < 0.01 indicate amino acid residues under positive selection. FaL1L3 protein was used as reference sequence to show amino acid under positive selection.
Additionally, orthologous genes were also detected in diploid F. vesca and other Rosaceae species. In the case of F. vesca, the following were identified: LEC1 (1), L1L (2), LEC2 (1), FUS3 (1), ABI3 (1), and BBM (1) loci. Two alleles were identified for F. vesca ABI3 and BBM genes ( Table 2 and Table S3). In the remaining Rosaceae members, a variable number of SE-related TF was identified ( Table 2 and Table S3), although ABI3 genes could not be found in P. communis (Table 2). LEC1 exhibited a higher number of genes in R. occidentalis, M. × domestica, and P. communis compared to F. × ananassa and F. vesca (Table 2  and Table S3). L1L, LEC2, FUS3, and ABI3 genes showed the highest loci number in F. × ananassa compared to F. vesca and other Rosaceae species (Table 2). In the case of BBM, F. × ananassa displayed two loci, as well as R. chinensis, while other species only showed one locus each (Table 2). In P. persica, L1L, LEC2, and ABI3 showed two, three, and six alleles, respectively (Table 2). Regarding specific molecular characteristic for LAFL-B genes and proteins in other Rosaceae species, LEC1 exhibited the lower length for genes and proteins (Table S3). In contrast, the higher gene and protein lengths were observed for R. chinensis and P. persica BBM, respectively, according to higher molecular weights in these proteins (Table S3). In relation to pI, RoLEC1.3 and MdLEC2.1 presented the lower values (Table S3).  To gain insights into the conservation of LEC1, L1L, LEC2, FUS3, ABI3, and BBM TFs in F. × ananassa and other members of Rosaceae family, structural analyses of genes and proteins were performed (Figures 1-3). Full-length coding sequences were aligned with corresponding genomic sequences to obtain the exon-intron organization. Protein structures were analyzed through discovery of motifs and multiple alignments of DNA binding domains. Finally, F. × ananassa and other Rosaceae genes and proteins were compared with their orthologous sequences in A. thaliana.
The FaLEC1 gene did not show introns, and information about its UTR regions was missing for F. × ananassa genome, while one intron and UTR sequences were detected in its F. vesca and A. thaliana orthologous sequences ( Figure 1A). In a similar manner, RoLEC1.2 showed a single intron, but RoLEC1.3 displayed four introns ( Figure 1A). Similarly, FaLEC1, FaL1L1, and FaL1L4 were intronless, and FaL1L2 and FaL1L3 displayed one and two introns ( Figure 1A), respectively. In the remaining Rosaceae species, almost all L1L orthologous genes showed one intron except RoL1L, RcL1L, and PpL1L2 orthologous genes, all of them being intronless ( Figure 1A). Regarding the structure of protein domains, all LEC1s and L1Ls sequences showed the CBF-A/NF-YB domain conserved for its amino acid sequences ( Figure 1B). A clear difference in the motif content was observed between LEC1 and L1L protein sequences for the Rosaceae species ( Figure 1B). In the case of LEC1 orthologous, two additional motifs were detected for M. × domestica, P. persica, and P. communis proteins. They were not present in FaLEC1, FvLEC1, and other orthologous sequences of R. occidentalis and R. chinensis ( Figure 1B). Regard the L1L proteins, a higher number of motifs were detected for F. × ananassa L1L proteins and the rest of Rosaceae sequences ( Figure 1B). Additionally, the CBF-A/NF-YB domain exhibited high conservation of amino acid residues for FaLEC1 and FaL1Ls and orthologous Rosaceae proteins. However, this domain was lower conserved in RoLEC1.1, RoLEC1.3, and RoLEC1.4 sequences ( Figure S1). The Asp (D) residue was present for the majority of the proteins except for FaL1L2, FaL1L4, and FvL1L2, showing similar amino acid residue Glu (E) ( Figure S1). In the case of RoLEC1.1, RoLEC1.3, and RoLEC1.4, this amino acid was substituted by Lys (K) or Arg (R) ( Figure S1). proteins were performed (Figures 1-3). Full-length coding sequences were aligned with corresponding genomic sequences to obtain the exon-intron organization. Protein structures were analyzed through discovery of motifs and multiple alignments of DNA binding domains. Finally, F. × ananassa and other Rosaceae genes and proteins were compared with their orthologous sequences in A. thaliana.
The FaLEC1 gene did not show introns, and information about its UTR regions was missing for F. × ananassa genome, while one intron and UTR sequences were detected in its F. vesca and A. thaliana orthologous sequences ( Figure 1A). In a similar manner, RoLEC1.2 showed a single intron, but RoLEC1.3 displayed four introns ( Figure 1A). Similarly, FaLEC1, FaL1L1, and FaL1L4 were intronless, and FaL1L2 and FaL1L3 displayed one and two introns ( Figure 1A), respectively. In the remaining Rosaceae species, almost all L1L orthologous genes showed one intron except RoL1L, RcL1L, and PpL1L2 orthologous genes, all of them being intronless ( Figure 1A). Regarding the structure of protein domains, all LEC1s and L1Ls sequences showed the CBF-A/NF-YB domain conserved for its amino acid sequences ( Figure 1B). A clear difference in the motif content was observed between LEC1 and L1L protein sequences for the Rosaceae species ( Figure 1B). In the case of LEC1 orthologous, two additional motifs were detected for M. × domestica, P. persica, and P. communis proteins. They were not present in FaLEC1, FvLEC1, and other orthologous sequences of R. occidentalis and R. chinensis ( Figure 1B). Regard the L1L proteins, a higher number of motifs were detected for F. × ananassa L1L proteins and the rest of Rosaceae sequences ( Figure 1B). Additionally, the CBF-A/NF-YB domain exhibited high conservation of amino acid residues for FaLEC1 and FaL1Ls and orthologous Rosaceae proteins. However, this domain was lower conserved in RoLEC1.1, RoLEC1.3, and RoLEC1.4 sequences ( Figure S1). The Asp (D) residue was present for the majority of the proteins except for FaL1L2, FaL1L4, and FvL1L2, showing similar amino acid residue Glu (E) ( Figure S1). In the case of RoLEC1.1, RoLEC1.3, and RoLEC1.4, this amino acid was substituted by Lys (K) or Arg (R) ( Figure S1). In the case of AFL genes, the exon-intron organization of FaLEC2s, FaFUS3s, and FaABI3s genes shared a similar number of introns ( Figure 2A). FaLEC2.1 and FaLEC2.2 contained four introns, while FaLEC2.3, FaLEC2.4, FaFUS3s, and FaABI3s genes displayed five introns each ( Figure 2A). Most of the LEC2, FUS3, and ABI3 orthologous genes of other Rosaceae species contained five introns, although some genes, such as FvLEC2 and PcFUS3, only showed two introns ( Figure 2A). Others, such as MdLEC2.1, displayed six introns ( Figure 2A). In the case of PpLEC2 alleles, a larger 5 -UTR was detected ( Figure 2A). Regarding the proteins, orthologous sequences for LEC2s, FUS3, and ABI3 contained the common B3 domain ( Figure 2B). FaLEC2.2 and FaLEC2.3 only showed the conserved B3 domain, but FaLEC2.1 and FaLEC2.4 also presented an N-terminal motif ( Figure 2B). MdLEC2.2 and PpLEC2 isoforms contained an additional motif ( Figure 2B). In the case of FaFUS3s and FvFUS3, the previous motif (motif 2) was conserved and located in the C-terminal region. It was also shared by R. chinensis ( Figure 2B). The remaining FUS3 sequences of the Rosaceae species showed a similar pattern of B3 domain and motifs ( Figure 2B). In general, FaABI3s and orthologous sequences in other Rosaceae species, as well as A. thaliana, displayed a high number of motifs and the same profile of domain distribution in the protein sequence ( Figure 2B). In general, the B3 domains were conserved, and differences in specific amino acid residues were observed between FaLEC2, FaFUS3, and FaABI3 sequences ( Figure S2). Moreover, some Rosaceae sequences, such as RcFUS3, MdFUS3.1, PcFUS3, and MdABI3.1, exhibited additional fragments of sequences compared to the other orthologous ( Figure S2). In the case of AFL genes, the exon-intron organization of FaLEC2s, FaFUS3s, and FaABI3s genes shared a similar number of introns ( Figure 2A). FaLEC2.1 and FaLEC2.2 contained four introns, while FaLEC2.3, FaLEC2.4, FaFUS3s, and FaABI3s genes displayed five introns each ( Figure 2A). Most of the LEC2, FUS3, and ABI3 orthologous genes of other Rosaceae species contained five introns, although some genes, such as FvLEC2 and PcFUS3, only showed two introns ( Figure 2A). Others, such as MdLEC2.1, displayed six introns ( Figure 2A). In the case of PpLEC2 alleles, a larger 5′-UTR was detected ( Figure  2A). Regarding the proteins, orthologous sequences for LEC2s, FUS3, and ABI3 contained the common B3 domain ( Figure 2B). FaLEC2.2 and FaLEC2.3 only showed the conserved B3 domain, but FaLEC2.1 and FaLEC2.4 also presented an N-terminal motif ( Figure 2B). MdLEC2.2 and PpLEC2 isoforms contained an additional motif ( Figure 2B). In the case of FaFUS3s and FvFUS3, the previous motif (motif 2) was conserved and located in the Cterminal region. It was also shared by R. chinensis ( Figure 2B). The remaining FUS3 sequences of the Rosaceae species showed a similar pattern of B3 domain and motifs ( Figure  2B). In general, FaABI3s and orthologous sequences in other Rosaceae species, as well as A. thaliana, displayed a high number of motifs and the same profile of domain distribution in the protein sequence ( Figure 2B). In general, the B3 domains were conserved, and differences in specific amino acid residues were observed between FaLEC2, FaFUS3, and FaABI3 sequences ( Figure S2). Moreover, some Rosaceae sequences, such as RcFUS3, MdFUS3.1, PcFUS3, and MdABI3.1, exhibited additional fragments of sequences compared to the other orthologous ( Figure S2). The same number and length of introns were observed between F. × ananassa and F. vesca BBM genes ( Figure 3A). Both loci FaBBM1, FaBBM2, and two alleles of FvBBM showed six introns ( Figure 3A). In a similar manner, RcBBM1 also showed six introns, but RcBBM2 and other Rosaceae orthologous displayed seven or eight introns ( Figure 3A). The 5 -UTR of RcBBM2 showed more length than other genes ( Figure 3A). In the context of protein structures, FaBMM1 and FaBBM2 proteins exhibited AP2/ERF domain and identical structural motifs for FvBBM isoforms, as well as R. occidentalis and R. chinensis orthologous sequences ( Figure 3B). For M. × domestica, P. persica, and P. communis, BBM proteins only retained a single motif compared to the previously mentioned species ( Figure 3B). Protein alignments for AP2/ERF domain for BBM proteins displayed conservation nearly to 100% in all Rosaceae species, and slight differences with respect to the AtBBM protein ( Figure S3). The results of the molecular characterization of SE-related TFs LEC1, L1L, LEC2, FUS3, ABI3, and BBM (Figures 1-3) indicated that gene and protein structures are highly conserved at structural level in F. × ananassa, and shared similarity with other species of Rosaceae family. The same number and length of introns were observed between F. × ananassa and F. vesca BBM genes ( Figure 3A). Both loci FaBBM1, FaBBM2, and two alleles of FvBBM showed six introns ( Figure 3A). In a similar manner, RcBBM1 also showed six introns, but RcBBM2 and other Rosaceae orthologous displayed seven or eight introns ( Figure 3A). The 5′-UTR of RcBBM2 showed more length than other genes ( Figure 3A). In the context of protein structures, FaBMM1 and FaBBM2 proteins exhibited AP2/ERF domain and identical structural motifs for FvBBM isoforms, as well as R. occidentalis and R. chinensis orthologous sequences ( Figure 3B). For M. × domestica, P. persica, and P. communis, BBM proteins only retained a single motif compared to the previously mentioned species (Figure 3B). Protein alignments for AP2/ERF domain for BBM proteins displayed conservation nearly to 100% in all Rosaceae species, and slight differences with respect to the AtBBM protein ( Figure S3). The results of the molecular characterization of SE-related TFs LEC1, L1L, LEC2, FUS3, ABI3, and BBM (Figures 1-3) indicated that gene and protein structures are highly conserved at structural level in F. × ananassa, and shared similarity with other species of Rosaceae family.  Table S3.

Syntenic Relationships and Molecular Evolutionary Analysis of LEC1, L1L, LEC2, FUS3, ABI3, and BBM Genes in Fragaria × ananassa and Six Other Rosaceae Species
To gain a better understanding of the evolution aspect of SE-related TFs in F. × ananassa and within the Rosaceae family, a comparison of conserved syntenic regions of F. x ananassa with F. vesca, R. occidentalis, R. chinensis, M. × domestica, P. persica, and P. communis chromosomes were performed. We observed that all genes corresponding to FaLEC1, FaL1Ls, FaLEC2s, FaFUS3s, FaABI3s, and FaBBMs were mapped to 12 chromosomes (Table  1 and Figure 4). In other species of the Rosaceae family, all genes except MdLEC2.1 (unknown position) were also assigned to its chromosome positions ( Figure 4). Next, synteny blocks between F. × ananassa and other genomes were detected (Table S4), and the higher number of conserved syntenic regions was observed with M. × domestica and F. vesca genomes ( Figure 4 and Table S4). The lowest synteny was observed between F. × ananassa and P. communis genome ( Figure 4 and Table S4). Otherwise, FaFUS3.2 and FaBBM1 genes showed a higher number of syntenic relationships with other Rosaceae species ( Figure 4 and Table S4), while FaLEC2.4 were assigned to a single conserved syntenic block with To gain a better understanding of the evolution aspect of SE-related TFs in F. × ananassa and within the Rosaceae family, a comparison of conserved syntenic regions of F. × ananassa with F. vesca, R. occidentalis, R. chinensis, M. × domestica, P. persica, and P. communis chromosomes were performed. We observed that all genes corresponding to FaLEC1, FaL1Ls, FaLEC2s, FaFUS3s, FaABI3s, and FaBBMs were mapped to 12 chromosomes (Table 1 and Figure 4). In other species of the Rosaceae family, all genes except MdLEC2.1 (unknown position) were also assigned to its chromosome positions ( Figure 4). Next, synteny blocks between F. × ananassa and other genomes were detected (Table S4), and the higher number of conserved syntenic regions was observed with M. × domestica and F. vesca genomes (Figure 4 and Table S4). The lowest synteny was observed between F. × ananassa and P. communis genome (Figure 4 and Table S4). Otherwise, FaFUS3.2 and FaBBM1 genes showed a higher number of syntenic relationships with other Rosaceae species (Figure 4 and Table S4), while FaLEC2.4 were assigned to a single conserved syntenic block with FvLEC2 gene (Figure 4 and Table S4). In the case of FaL1L2, syntenic relationships were not detected with any species. These results indicate that most positions for LEC1, L1L, LEC2, FUS3, ABI3, and BBM genes are conserved between F. × ananassa and other Rosaceae family genomes. FvLEC2 gene (Figure 4 and Table S4). In the case of FaL1L2, syntenic relationships were not detected with any species. These results indicate that most positions for LEC1, L1L, LEC2, FUS3, ABI3, and BBM genes are conserved between F. × ananassa and other Rosaceae family genomes.  Table S4.
To determine the impact of gene duplications of SE-related TFs in the Rosaceae family, molecular evolutionary rates were estimated (Table 3). Firstly, gene duplication patterns were analyzed for each pair of paralogous genes [58,59]. Secondly, paralogous genes pairs were identified considering a coverage of alignment length >70% and an identity for aligned regions >70% [60]. Thirdly, molecular evolutionary rates were calculated for paralogous of F. × ananassa and between F. × ananassa-F. vesca genes. A total of seven and one paralogous genes were found in F. × ananassa and F. vesca genomes, respectively (Table 3). In the remaining species, only one gene pair was paralogous ( Table 3). Most of the  Table S4.
To determine the impact of gene duplications of SE-related TFs in the Rosaceae family, molecular evolutionary rates were estimated (Table 3). Firstly, gene duplication patterns were analyzed for each pair of paralogous genes [58,59]. Secondly, paralogous genes pairs were identified considering a coverage of alignment length >70% and an identity for aligned regions >70% [60]. Thirdly, molecular evolutionary rates were calculated for paralogous of F. × ananassa and between F. × ananassa-F. vesca genes. A total of seven and one paralogous genes were found in F. × ananassa and F. vesca genomes, respectively (Table 3). In the remaining species, only one gene pair was paralogous ( Table 3). Most of the intraspecies paralogous genes were generated by dispersed duplications (DSD), and only MdLEC1.1 and MdLEC1.2 were a consequence of proximal duplication (PD) ( Table 3). The selective pressures were calculated with synonymous site (Ks), non-synonymous site (Ka), and the ratio Ka/Ks, for understanding the gene family expansion. Values of Ka/Ks~0 indicated neutral selection, while values > or <1 indicated positive and negative selection, respectively [61]. In this study, Ka/Ks values ranged between 0.237 and 1.25 (Table 3). All paralogous genes intraspecies showed Ka/Ks values <1, suggesting that these genes are under negative selection (Table 3). For genes duplicated between F. × ananassa and F. vesca, one gene displayed positive selection (Ka/Ks > 1), but most of the genes were under negative selection (Ka/Ks < 1). These results suggest that the negative selection promoted the removal of deleterious genes during the evolutionary history of LEC1, L1L, LEC2, FUS3, ABI3, and BBM genes in Fragaria × ananassa and other species of Rosaceae family. To gain insights into amino acid sites subjected to positive selection, the calculation of Ka/Ks values at each protein position were performed ( Figure 5 and Table S5). A total of 59 specific sites showed positive selection (Ka/Ks = 1.3-1.5, Table S5) and were represented in the FaL1L3 sequence ( Figure 5). The CBF-A/NF-YB domain presented 18 residues with positive selection, although most of them were part of lower conserved regions ( Figure 5).
intraspecies paralogous genes were generated by dispersed duplications (DSD), and only MdLEC1.1 and MdLEC1.2 were a consequence of proximal duplication (PD) ( Table 3). The selective pressures were calculated with synonymous site (Ks), non-synonymous site (Ka), and the ratio Ka/Ks, for understanding the gene family expansion. Values of Ka/Ks ~ 0 indicated neutral selection, while values > or <1 indicated positive and negative selection, respectively [61]. In this study, Ka/Ks values ranged between 0.237 and 1.25 (Table 3). All paralogous genes intraspecies showed Ka/Ks values <1, suggesting that these genes are under negative selection (Table 3). For genes duplicated between F. × ananassa and F. vesca, one gene displayed positive selection (Ka/Ks > 1), but most of the genes were under negative selection (Ka/Ks < 1). These results suggest that the negative selection promoted the removal of deleterious genes during the evolutionary history of LEC1, L1L, LEC2, FUS3, ABI3, and BBM genes in Fragaria × ananassa and other species of Rosaceae family. To gain insights into amino acid sites subjected to positive selection, the calculation of Ka/Ks values at each protein position were performed ( Figure 5 and Table S5). A total of 59 specific sites showed positive selection (Ka/Ks = 1.3-1.5, Table S5) and were represented in the FaL1L3 sequence ( Figure 5). The CBF-A/NF-YB domain presented 18 residues with positive selection, although most of them were part of lower conserved regions ( Figure 5).  Table S5.

Discussion
Somatic embryogenesis is a developmental event through which somatic cells experience reprogramming and acquisition of embryogenic competence to form somatic embryos, and then a complete plant [4,5]. This process offers multiple opportunities for the study of molecular bases regulating the zygotic embryogenesis and development pathways in model plants, such as A. thaliana and M. truncatula [6,10], and even in economically important crops, such as Coffea canephora [11,63]. In the Rosaceae family, most studies related to SE have been focused on the establishment of culture conditions for somatic embryo development, based on other studies on Rubus, Rosa, Malus, Prunus, and Pyrus  Table S3.

Discussion
Somatic embryogenesis is a developmental event through which somatic cells experience reprogramming and acquisition of embryogenic competence to form somatic embryos, and then a complete plant [4,5]. This process offers multiple opportunities for the study of molecular bases regulating the zygotic embryogenesis and development pathways in model plants, such as A. thaliana and M. truncatula [6,10], and even in economically important crops, such as Coffea canephora [11,63]. In the Rosaceae family, most studies related to SE have been focused on the establishment of culture conditions for somatic embryo development, based on other studies on Rubus, Rosa, Malus, Prunus, and Pyrus genus [64][65][66][67][68] and F. × ananassa [14][15][16]. However, the study of molecular bases in these agronomic species has not been addressed in depth [13,69,70].

Fragaria × ananassa Genome Contains a Variable Loci Number of SE-related TFs
Molecular networks of SE are known to be controlled by TFs acting as master regulators of the initiation and development of embryogenic program in A. thaliana, M. truncatula, and Coffea canephora [7][8][9]20,21], among others. This information will increase and can add to basic knowledge and eventually play a key role in the improvement of cultivated strawberry. Numerous SE-related TFs have been reported, but only six genes, including LEC1, L1L, LEC2, FUS3, ABI3, and BBM, are major regulators of the cell totipotency, establishment, growth, and maturation of somatic embryos [7][8][9]17] offering significant biotechnological applications [20,21]. F. × ananassa is an allopolyploid species (2n = 8x = 56), i.e., its genome is constituted by multiple diploid genomes, including the F. vesca subgenome (2n = 2x = 14) [36]. Consequently, F. × ananassa could contribute with a significant and broader repertoire number of TFs than F. vesca. Moreover, it could be useful for the understanding of molecular mechanisms underlying the SE process in different species [36], due to the fact that F. × ananassa showed the highest loci number for L1L, LEC2, FUS3, and ABI3 genes (Tables 1 and 2). In a similar way, the doubled-haploid genome of M. x domestica [53] also contained a higher loci number for SE-related genes ( Table 2 and Table  S3), due to a higher number of protein-coding genes [36,53]. The polyploidy generated by whole-genome duplications (WGD) is an important driving force in the evolution of plant genomes [71]. These loci numbers could be a result of the polyploidy generated by the F. × ananassa genome, compared to the diploid genome of F. vesca [49]. Surprisingly, some species with diploid genome, such as R. occidentalis [51], exhibited a high number of LEC1 loci ( Table 2 and Table S3). For instance, this could be the cause of the expansion of loci numbers for SE-related TFs in F. × ananassa. However, not all diversity within TF families is explained by WGD [72]. It is worthwhile to consider the tandem, transposon-mediated, segmental duplication, and retroduplication as additional mechanisms of gene duplication [73], which could be acting in F. × ananassa and other Rosaceae species. Otherwise, the loss of duplicated genes is a natural process occurring in plant genomes and can be the result of WGD or loss-of-function mutations [73]. In the case of LEC1, four loci were detected in Rubus against one locus in the Rosa and Fragaria genome ( Table 2). This could be related with the early Rubus divergency from the lineage of Rosa and Fragaria genus [74]. Interestingly, ABI3 genes seem to have been subjected both to gain and to loss of duplicated genes in the Amygdaloideae subfamily. For instance, the ABI3 gene was not found in P. communis, while two copies were detected in M. × domestica, compared to one copy found in P. persica (Table 2). This fact suggest that these events have occurred after that Prunus diverged from a common ancestor of Malus and Pyrus [74,75].

Genes and Proteins of LAFL-B Network Are Conserved in Fragaria × ananassa and Other
Rosaceae Species LEC1 and L1L are essential genes responsible for embryo identity in early phases and the development and maturation of embryos [7,22]. Furthermore, these genes are central regulators in the establishment of the developmental program of somatic embryo from vegetative cells [7,22]. In this study, a single LEC1 locus in octoploid F. × ananassa and diploid F. vesca genome was identified (Table 1). This has been also reported for A. thaliana, M. truncatula, Manihot esculenta, and V. vinifera [7,22,33,76], as well as in some Rosaceae species, such as R. chinensis and P. persica (Table 2, [35]). In the case of L1L, four and two loci were found in the octoploid and diploid genome of F. × ananassa and F. vesca, respectively (Tables 1 and 2). Most species contain one or two L1L loci [33,35,76], suggesting that the ploidy of F. × ananassa [36,77] could be related with the expansion of this gene family. The increase in the number of gene copies in TF families is, in part, a result of WGD events. It is necessary for the emergence of TFs with specific roles in adaptative traits [72]. However, it is noteworthy that the higher ploidy levels promote the efficiency decline of plant in vitro regeneration in some species [78,79]. In general, LEC1 locus and L1Ls loci of F. × ananassa displayed similar lengths to their respective F. vesca paralogous (Table 2), according to the predominance of the F. vesca subgenome as part of the F. × ananassa genome [36]. Regarding the intron content, some genes, such as FaLEC1, FaL1L1, and FaLEC4, did not show introns in their genomic sequences ( Figure 1A). This is similar to reports for NF-YB4 and NF-YB8 in V. vinifera [33]. Otherwise, the intron content and length in FaL1L3 locus was different from its F. vesca orthologous and paralogous genes (Table 3), respectively. These differences in the absence, presence, or variation of introns is a common characteristic in plant genomes, related to its evolution and the expansion of the functional diversity in proteins [80][81][82]. Specifically, introns can increase the number of protein isoforms when splicing sites are present [80]. Otherwise, the functionality of LEC1 and L1L TFs is given by the CBF-A/NF-YB domain. This allows the interaction with NF-YA and NF-YC subunits to DNA-binding and regulation of the gene transcription [22,23]. Also, this domain is fully conserved at the amino acid sequence in F. × ananassa and other Rosaceae species ( Figure S1). The Asp (D) residue is considered the key amino acid for the activity of these proteins, in contrast to what has been observed in other proteins of the family NF-YB in Arabidopsis [23]. In general, this residue was conserved, although some proteins, such as FaL1L2, FaL1L4, and FvL1L2, showed the Glu (E) amino acid in the same sequence position ( Figure S1). This characteristic was also reported for its orthologous sequences in V. vinifera [33]. It is noteworthy that F. × ananassa LEC1 and L1L and orthologous proteins in other Rosaceae species exhibited similar high distribution of exclusive protein motifs compared to A. thaliana sequences ( Figure 1B).
LEC2, FUS3, and ABI3 genes belong to the LAV family, forming part of the B3 superfamily [26]. These are involved in cell totipotency, embryo identity, and storing reserve compounds during early and late embryogenesis [8,24,25]. F. × ananassa displayed four LEC2, three FUS3, and three ABI3 loci (Table 1), showing the highest loci number compared to F. vesca and other Rosaceae species (Table 2). In some species, such as M. × domestica ( Table 2) or M. esculenta [76], a maximum number of two loci for each gene was detected at the genome level. In some instances, WGD may be the result of the loss of duplicated genes [83]. In contrast, the higher loci number in F. × ananassa could be a consequence of a greater chromosome number and the genetic redundancy could be a result of the accumulation of non-deletional mutations through different WGD events [73]. P. persica LEC2 and ABI3 were detected with three and six alleles, respectively ( Table 2 and Table  S3). This indicated that these genes could contribute with different proteins isoforms for fine-tuning of phenotypic responses, similar to observations for some TFs in other species [84]. In general, genes showed conservation of number, length, and exon-intron structure between F. × ananassa and A. thaliana genes. However, slight differences in FaLECs and FvLEC2 with AtLEC2 were detected ( Figure 1A). First, FaLEC2.3 and FaLEC2.4 showed a longer 3 -end intron compared to A. thaliana ( Figure 1A). Second, FaLEC2s contained five introns compared to three introns in FvLEC2 or PcFUS3 ( Figure 1A). These differences in the organization of genes would have implications for the regulation of gene expression and the generation capacity of splicing protein variants among species [80,81]. Regarding conservation of protein structures, FaLEC2 only showed the conserved B3 domain, similar to those observed in other members of the Rosaceae family ( Figure 2B) and V. vinifera orthologous sequences [32]. In contrast, FUS3 and ABI3 proteins contained a higher number of conserved motifs in F. × ananassa and other Rosaceae species. Some were not present in A. thaliana proteins ( Figure 2B). The acquisition of new genes, including TFs, with conserved sequences could occur by genome duplication from ancestor species [72,73]. On the other hand, amino acid residues defining the function of LEC2 against FUS3 and ABI3 proteins displayed conservation in FaLEC2 proteins and their orthologous in Rosaceae species ( Figure S2), according to previous reports regarding M. esculenta and T. cacao [34,76].
The BBM gene was initially considered as an auxiliary gene in the SE process that increased the number of somatic embryos when overexpressed [9]. However, recently, Hortsman et al. [17] reported that the BBM TF is responsible for the expression activation of LEC1, LEC2, FUS3, and ABI3 genes in A. thaliana. A total of two BBM loci were identified in F. × ananassa (Table 1), similar to R. chinensis (Table 2) and Rosa canina [85]. In the case of F. × ananassa genes, the chromosome number was not related to the BBM loci number (Table 1). It could be related to losses of duplicated genes during genome evolution [83]. Regarding the gene structure, FaBBM1 and FaBBM2 genes exhibited similar length of introns compared to F. vesca and other Rosaceae species. However, the number of introns was lower than the A. thaliana orthologous gene ( Figure 3A). Although the BBM genes conserved intron structures in the Rosaceae species, its impaired molecular function proposed for the SE process in F. × ananassa [13] or its function in the shoot regeneration in R. canina [85] appeared to be contrary to that traditionally reported in other species, such as A. thaliana [19] or C. canephora [11]. Otherwise, clear differences were observed in FaBBMs and other BBM orthologous proteins ( Figure 3B) of Rosoideae against Amygdaloideae subfamily [37,38]. For instance, BBM proteins in Fragaria, Rubus, and Rosa genus contained additional conserved motifs compared to their orthologous in Malus, Prunus, and Pyrus genus, suggesting that these proteins evolved independently in these subgroups of the Rosaceae family [40,86]. The AP2/ERF domain is used for DNA-binding [27] and displayed small differences in the amino acid sequence with A. thaliana ( Figure S3), according to those previously reported in other species such as R. canina, M. truncatula, C. arabica, and Glycine max [85,87]. The FaBBM1 gene reported in this study is the same as the BBM gene previously identified in the cultivar "Benihopp" by Gao et al. [13]. Furthermore, it could act as a putative inhibitor of SE because of its lower expression levels in embryogenic callus than in non-embryogenic callus and somatic embryos [13]. In contrast, the expression of the BBM orthologous gene promotes the development of somatic embryos in other species such as Saccharum officinarum, C. canephora, and C. arabica [11,87,88]. Therefore, although BBM genes of F. × ananassa and other Rosaceae species were similar to its orthologous gene in other species, such as the model plant A. thaliana, the molecular function of this SE-related TF could depend on the species or even a particular plant lineage.

LAFL-B Genes and Proteins of Rosoideae Evolved Independently of Amygdaloideae Subfamily
Most of the evolutionary approaches consider the study of loci number, gene duplications, synteny analyses, calculation of molecular evolutionary rates of genes, and phylogenetic relationships between proteins. Firstly, all Rosaceae species showed LAFL and BBM genes in their genomes, except for the ABI3 gene that was not found in P. communis, and loci, and allele number were different between species (Table 2). F. × ananassa holds a polyploid genome constituted by 56 chromosomes [36] and in general, contains a higher number of SE-related genes than diploid F. vesca and other Rosaceae genomes (Tables 1 and 2). Similarly, the hybrid M. × domestica also have a high number of chromosomes [53] and showed a high number of LEC1 genes (Table 2). Moreover, a number of LAFL-B genes appeared to be based on the number of protein-coding genes compared to what was observed among F. × ananassa and M. × domestica versus F. vesca and P. persica (Table 2). Secondly, gene duplications contribute to expansion of gene families, and they are grouped into four types [58]. These include the tandem duplications (TD), where two genes are adjacent in the same chromosome, and the proximal duplications (PD), where two genes are in the same chromosome and separated by a few genes. Transposed duplications (TRD), DNA, or RNA-based molecular mechanisms generate two gene copies distantly; and dispersed duplications (DSD) generate two copies of genes, which are not close nor colinear. In the case of LAFL-B genes belonging to the Rosaceae family species, the majority have been generated by DSD through the genome (Table 3). These are a result of the polyploidization and are the type of duplication more prevalent in the Rosaceae family [58]. Thirdly, regarding the interspecies synteny of the LAFL-B genes in Rosaceae family, a greater number of syntenic genes were observed between F. × ananassa genome compared to M. × domestica and F. vesca (Figure 4). This could be related to the ploidy level [89] and the existence of a common ancestor [36,49], respectively. A weak synteny was detected between F. × ananassa and P. communis (Figure 4), reflecting greater evolutionary distances [38,40]. Fourthly, the estimation of the relationship between the number of non-synonymous and synonymous substitutions (Ka/Ks) inform the type of selective pressures on gene sequences [61]. In the case of the LAFL-B genes, this selection pressure was negative for paralogous pairs of F. × ananassa and other genomes (Table 3). Otherwise, the duplication of paralogous genes from F. × ananassa and F. vesca ancestor was similar, and these genes were selected by negative selection. The predominance of negative selection for SE-related genes is a decreasing mechanism of genetic diversity [90], which could have an adverse impact on the efficiency of SE in these species. However, FaL1L3 and FvL1L1 were under positive selection (Table 3), showing 59 amino acid residues subjected to a positive selective pressure, mostly out of the CBF-A/NF-YB domain ( Figure 5). These facts indicate that FaL1L3 and its paralogous sequence in F. vesca contain beneficial mutations [61], suggesting diversification and functional adaptation of these genes from a common ancestor [91]. Fifthly, phylogenetic trees display evolutionary relationships between protein sequences for different species (Figure 6). On the other hand, LEC1, L1L, FUS3, ABI3, and BBM proteins of F. × ananassa were clustered in the same group of F. vesca, R. chinensis, and R. occidentalis. However, RcLEC2.2 and RoLEC2.2 were grouped closer to Amygdaloideae species. These are members of Rosoideae subfamily, while P. persica, M. × domestica, and P. communis belong to Amygdaloideae, and showed lower similarity ( Figure 6). Overall, the LAFL-B gene network in F. × ananassa contained the higher loci number within the Rosaceae family; is more closely related to F. vesca and other Rosoideae subfamily species; was generated by dispersed duplications; and was under negative selection.
The Rosaceae family is composed of Dryadoideae, Rosoideae, and Amygdaloideae subfamilies [39]. Rosoideae includes Fragaria, Rosa, and Rubus genus, F. × ananassa and F. vesca being evolutionarily closer to R. chinensis than R. occidentalis [74]. On the other hand, Prunus, Malus and Pyrus are members of the Amygdaloideae subfamily [38,75]. M. × domestica and Pyrus communis share a common ancestor, while P. persica belongs to a different clade within this subfamily [38,75]. In this sense, our results about LAFL-B gene families exhibited clear relationships according to the evolutionary history of the Rosaceae family [38,74,75]. Taking into account the diversity of loci number in SE-related gene families (Table 2), multiple duplication events occurring in the two families and in specific genus [38,75] could have triggered the gain or loss of gene number in a specific lineage. Additionally, F. × ananassa, F. vesca, and R. chinensis showed closer relationships at structural level of genes and proteins than R. occidentalis. Moreover, Rosoideae species presented higher differences with orthologous genes of Amygdaloideae species (Figures 1-4 and Figure 6) according to higher evolutionary distance [38,74,75]. In a similar manner, genes and proteins shared similar structural properties within Amygdaloideae lineage. In the case of P. persica, the divergency from the clade of M. × domestica and P. communis [38,75] could be the event that determined higher differences between genes and proteins of LAFL-B (Figures 1-4 and Figure 6). Regarding syntenic relationships, the results suggest that most of the genes related to the SE process conserve their position in each genome ( Figure 4). Lower phylogenetic distances determine a high number of syntenic regions, according to that observed in Fragaria genomes [36,49]. Furthermore, syntenic genes of Rosaceae may be present in a common ancestor of two larger families ( Figure 4). Finally, the positive selection of genes suggested that paralogous genes FaL1L3 and FvL1L1 contain beneficial mutations ( Figure 5), contributing to some functional advantages in the early SE [22]. However, although many genes duplicated are retained in the genome, not all genes are functionals [73], making necessary other additional epigenomic, transcriptomic or proteomic studies.

Conclusions
F. × ananassa is an important crop belonging to the Rosaceae family. Genomic information is available to address the molecular basis for the SE process in a polyploid species that has not been explored in depth. In this study, the presence of LEC1, L1L, LEC2, FUS3, ABI3, and BBM genes of F. × ananassa and other Rosaceae genomes provides insights into the TFs that would act as regulators of the SE process. In general, genes and proteins of the LAFL-B network showed conservation at a structural level, through gene structure and DNA-binding domains of proteins in each TF family. In addition, evolutionary analyses indicated that F. × ananassa contained the highest loci number for L1L, LEC2, FUS3, ABI3, and BBM with respect to other Rosaceae species. Furthermore, it may contribute to a wider range of targets for the establishment of in vitro regeneration systems of F. × ananassa than F. vesca. Regarding the evolutionary history of these TFs, interspecies synteny analyses displayed a greater number of synteny blocks for LAFL-B genes among F. × ananassa, F. vesca, and M. × domestica genomes. The lower molecular evolutionary rates indicated that the negative selection was predominant in genes for F. × ananassa and other Rosaceae species. Finally, phylogenetic analyses showed that LAFL-B TFs were most closely related to its orthologous proteins of Rosoideae, compared to the Amygdaloideae subfamily.
Globally, the knowledge about SE-related TFs involved in the induction, development, and maturation of somatic embryos offers genomic targets for obtaining new F. × ananassa varieties with better and more efficient characteristics for clonal propagation. Moreover, new biotechnological approaches incorporating omics techniques could be used for the subsequent breeding of this species, considering that SE is a versatile regeneration system allowing the transformation and generation of new plants for cultivated strawberry with interesting and improved agronomic traits. However, future studies need to be directed to reveal how the dynamics of these TFs work during the SE and how the ploidy affects this process in F. × ananassa. These studies will be important for improved strawberry propagation.