Genome Wide Identiﬁcation and Analysis of the R2R3-MYB Transcription Factor Gene Family in the Mangrove Avicennia marina

: Drought and salinity stress have become the major factors for crop yield loss in recent years. Drastically changing climatic conditions will only add to the adverse effects of such abiotic stresses in the future. Hence, it is necessary to conduct extensive research to elucidate the molecular mechanisms that regulate plants’ response to abiotic stress. Halophytes are plants that can grow in conditions of high salinity and are naturally resistant to a number of abiotic stresses. Avicennia marina is one such halophyte, which grows in tropical regions of the world in areas of high salinity. In this study, we have analysed the role of R2R3-MYB transcription factor gene family in response abiotic stress, as a number of transcription factors have been reported to have a deﬁnite role in stress manifestation. We identiﬁed 185 R2R3 MYB genes at genome-wide level in A. marina and classiﬁed them based on the presence of conserved motifs in the protein sequences. Cis-regulatory elements (CREs) present in the promoter region of these genes were analysed to identify stress responsive elements. Comparative homology with genes from other plants provided an insight into the evolutionary changes in the A. marina R2R3 MYB genes. In silico expression analysis revealed 34 AmR2R3 MYB genes that were differentially regulated in the leaves and root tissue of A. marina subjected to drought and salinity stress. This study is the ﬁrst report of the R2R3 MYB gene family in the A. marina genome and will help in selecting candidates for further functional characterisation.


Introduction
The phenomenon of global warming is responsible for drastic climate changes leading to an increase in the incidences of both biotic and abiotic stress conditions all over the world. In light of this worldwide problem, sustenance and enhancement of agricultural productivity in future is a global concern since such extremes in climatic conditions adversely affect the development, growth and productivity of major agricultural systems. Plants in general, evolve different mechanisms to cope with environmental stresses such as salinity, drought and low/high temperature. Among various strategies used for the development of abiotic stress tolerant plants, the identification of candidate genes and dissecting signaling mechanisms in abiotic stress responses have been widely attempted by researchers [1]. Recent studies have analysed the expression patterns of an array of genes regulating plant growth and development in order to counteract adverse environmental stress conditions [2,3]. Comprehension and analysis of the role of transcription factors is believed to be a more practical approach than focusing on individual gene components. Transcription factors (TFs) have been reported to play an important role in plant growth, development, and stress response through self-regulation [4][5][6][7]. They also regulate the expression of downstream target genes [5,6]. These abilities make them suitable candidates for genetic manipulation of complex stress tolerance traits. However, the difficulty often tated CDS of A. marina. Thereafter, sequence identifiers provided by [14] were used to download the peptide sequences for R2R3-MYB transcription factors of rice and Arabidopsis from Rice Genome Annotation project (http://rice.plantbiology.msu.edu/) and TAIR (https://www.arabidopsis.org/) respectively. These sequences were aligned and the alignment file was used to build the HMM profile of R2R3-MYB domain using HM-MER v3.3 (http://hmmer.org/) and this profile was used to identify the R2R3-MYB domain containing protein sequences amongst the MYB TFs in A. marina genome. Chemical properties of the protein sequences were evaluated with the ProtParam tool of ExPASy (https://web.expasy.org/protparam/; [37]) and subcellular localisation was predicted with BUSCA (http://busca.biocomp.unibo.it/; [38])

Chromosomal Location and Nomenclature
The standalone version of BLAST software (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/ blast+/) was used to map the corresponding CDS sequences of the MYB TFs of A. marina onto the genome using Blastn. The output file was used to locate the positions of all predicted MYB genes on the various scaffolds of A. marina genome. The MYB coding genes were named sequentially based on their position on the chromosomes/scaffolds, starting from AmMYB1. Only the R2R3-MYB TFs were taken for further analysis and were localised onto the chromosomes using MapChart [39].

Phylogenetic Analysis
The protein sequences of the R2R3-MYB TFs were used to generate the phylogenetic tree on MEGA X software ( [40]; https://www.megasoftware.net/). The sequences were aligned with ClustalW and the tree was constructed with the neighbour-joining method. Bootstrap value was set to 100 and branches corresponding to partitions reproduced in less than 50% bootstrap replicates were collapsed. The evolutionary distances were computed using the Jones-Taylor-Thornton (JTT) matrix-based method.

Exon-Intron Structure and Conserved Motif Analysis
The CDS sequences of R2R3 MYB TFs were mapped onto the genome of A. marina using Blastn and the start and end positions of the mapped gene positions were used to prepare the bed file for extracting the fasta sequences of the genomic regions. The fasta formatted CDS sequences and genomic sequences were uploaded to Gene Structure Display Server (http://gsds.gao-lab.org/) to generate exon-intron structure. Conserved motifs in the protein sequences of R2R3 MYB TFs were identified with MEME tool of MEME-suite (http://meme-suite.org/tools/meme) with the parameters set as Zero to one occurrences per sequence; motifs to be found = 15 and motif width = 6-200.

Duplication, dN/dS and Homology/Synteny Analysis
Protein sequences of R2R3 MYB TFs in A. marina were aligned with clustalW and duplicate genes were identified using MCScanX [41]. The ratios of non-synonymous (dN) and synonymous (dS) substitutions were calculated using PAL2NAL v14 (http:// www.bork.embl.de/pal2nal/#Download) and CODEML program of PAML package (http: //abacus.gene.ucl.ac.uk/software/paml.html). The R2R3 AmMYB genes were compared with the CDS sequences of R2R3 MYB genes of Arabidopsis thaliana and Oryza sativa and their genomic positions were recorded. The Circos tool (circos-0.69-9; http://circos.ca/) was used to generate the figure using the genomic positions and chromosome lengths.

Conserved Motif Identification in Promoter Region
A stretch of 2000 bp upstream of the transcription start site of A. marina R2R3 MYB TFs were extracted from the genome using BEDtools [42]. The motifs in promoter sequences were identified with PlantCARE [43]; http://bioinformatics.psb.ugent. be/webtools/plantcare/html/).

In Silico Expression Analysis
The short reads of Avicennia marina control leaf (Accession SRR2029733), control root (Accession SRR2029734), salt stressed leaf (Accession SRR2029735), salt stressed root (Accession SRR2029736), drought stressed leaf (Accession SRR2029738) and drought stressed root (Accession SRR2029739) were obtained from the SRA database of NCBI (https://www.ncbi.nlm.nih.gov/sra). These were aligned onto the CDS sequences of A. marina R2R3 MYB TFs and their normalised expression values were calculated using the tools provided in Trinity package v2.11.0 (https://github.com/trinityrnaseq/trinityrnaseq/ releases; [44]). Detailed instructions for differential gene expression analysis can be found at: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Differential-Expression [45]. The TMM method was used for preparing expression matrix and the FPKM values were normalised by calculating the log2 values. Z score values were calculated for the log 2 normalized FPKM according to the formulae described by [46] values to further remove bias andthese expression values were used to generate the heat map on MeV v4.8.1 (http://mev.tm4.org/; [47]).

R2R3-MYB Transcription Factor Family in Avicennia marina
A total of 284 MYB TF proteins were identified in the genome of A. marina based on the presence of the MYB consensus sequence as well as the HMM profile of MYB-DNA binding domain. They were named AmMYB1-AmMYB284 based on their positions on A. marina chromosome/scaffold. Of these, 185 members were identified as R2R3 MYB based on their similarity to the R2R3 MYB proteins of rice and Arabidopsis (File S1). Analysis of the chemical properties of A. marina R2R3 MYB proteins revealed that their length ranged from 100 bp to 1748 bp, molecular weight ranged from 11.46 kDa to 191.24 kDa, and their isoelectric point ranged from 4.65 to 10.28 (Table 1). A majority of them were predicted to be localised to nucleus (89%) while a few were localised to chloroplast (7%). Four of the AmR2R3 MYB proteins (AmMYB41, AmMYB79, AmMYB130 and AmMYB133) contained a signal peptide implying that these are secretory proteins that localise to extracellular space while one member (AmMYB251) was localised to the mitochondrion (Table 1).

Phylogenetic Analysis and Classification of R2R3 MYB Genes
The phylogenetic tree was prepared based on the alignment of the R2R3 MYB proteins of A. marina amongst themselves. The tree was corelated with the gene structure (intronexon arrangement) and the motif composition of these proteins. Based on the type of motifs present in the protein sequences, the R2R3 AmMYB were classified into 8 groups (I-VIII) and sub-groups (Figure 1a). Majority of the members belonged to group III and contained motifs 1, 2, 3, and 5 (motif information is provided in Figure S1) while some of the members also contained the motif 7. The similarities were also reflected in the exon-intron structure of the genes. For example, the members of group I were either intronless or consisted of only one intron. Similar results were found for most of the groups with the exception of few members which deviated from the group morphology. Interestingly, AmMYB63 and AmMYB137 did not contain motifs similar to the other AmR2R3 MYB but both had the R2R3 MYB domain and a number of distinct motifs ( Figure S2).

Chromosomal Location and Gene Duplication
The recent A. marina genome assembly reports a chromosome level assembly with 32 super scaffolds representing the chromosomes [35]. The R2R3 MYB genes of A. marina were found to be located exclusively in these 32 chromosomes (Supplementary Table S1, Figure 2). Chromosomes 8 and 11 of A. marina contained the loci for highest number of R2R3 MYB genes (10 each) while chromosome 7 contained only one. Most of the chromosomes contained 3-7 R2R3 MYB genes implying that the distribution of AmR2R3 MYB genes is fairly uniform in all chromosomes. Analysis of duplication events revealed seven tandem duplication events and six collinear gene pairs that have resulted in the course of evolution of AmR2R3 MYB family (Figure 2). These gene pairs were analysed for the ratio of rates of non-synonymous (dN) to synonymous (dS) substitutions (Table 2). Generally, a value of <1 suggests purifying selection resulting in functional constraint, a value of >1 implies positive selection resulting in accelerated evolution and a value = 1 implies neutral selection. It was observed that the value of dN/dS for all the gene pairs was < 1 with the value for AmMYB225/AmMYB226 almost nearing 1 ( Table 2).

Promoter Motif Analysis
The promoter regions of genes contain conserved motifs which act as recognition and binding sites for various proteins. These interactions play an integral role in the regulation of gene expression and thereby affect the important biological processes in an organism. Therefore, conserved motifs in the promoter region of AmR2R3 MYB genes were analysed to evaluate their role in abiotic stress tolerance. The most frequently found motifs were core elements necessary for transcription such as CAAT-box (6126) and TATA-box (7599). A large number of cis-regulatory elements (CREs) related to stresss tolerance were identified ( Figure 3, Table 3). Most abundant of these were the water and drought response elements MYB (872) and ABRE elements (668) followed by MYC (593), which is a drought response element, ARE (327) i.e., anaerobic-responsive elements and STRE (308) which is stressresponsive element. A number of other stress response CREs, such as ARE and ERE elements (oxidative stress responsive elements), W box (binding site for WRKY TFs), LTR (low temperature responsive elements), and DRE core (cold and dehydration responsive element), were also identified in the promoter regions of AmR2R3 MYB genes (Table 3).

Homology with R2R3 MYB Genes from Rice and Arabidopsis
Many transcription factor domains are known to be conserved across plant genera [48]. Probable reason for the high level of conservation could be the essential function of these genes in growth and development of plants. Thus, analysing the homology between different plants can provide an idea about the evolution of the gene family. Therefore, the AmR2R3 MYB proteins were compared with those from Arabidopsis and rice. It was interesting to note that none of the A. marina R2R3 MYB proteins grouped with the R2R3 MYB from rice and Arabidopsis (Figure 4a). The Am R2R3 MYB formed a distinctly separate clade while a number of members from Arabidopsis (with locus Ids starting with AT) and rice (with locus Ids starting with LOC_Os) showed similarity to each other. Thereafter, the AmR2R3 MYB genes (CDS sequences) were compared to the R2R3 MYB genes of rice (OsR2R3 MYB) and Arabidopsis (AtR2R3 MYB). We observed that 28 out of the 185 AmR2R3 MYB genes found homologous counterparts in AtR2R3 MYB genes and 27 found homologous counterparts in OsR2R3 MYB genes (Supplementary Table S2). Many of the AmR2R3 MYB genes had multiple homologs in Arabidopsis and rice (Figure 4b).

In Silico Gene Expression Analysis
In silico expression analysis was carried out using the RNA-Seq data for control and treated A. marina samples exposed to drought and salinity stress available at NCBI. The raw FPKM values were calculated using publicly available perl scripts and normalised by calculating their log 2 values and z scores (Supplementary Table S3). The hierarchical clustering of values based on the pattern of expression clustered the AmR2R3 MYB genes into distinct blocks (Figure 5a). It was observed that drought stress had a more noticeable effect on the expression of AmR2R3 MYB genes as compared with salt stress. The differential expression was also more pronounced in the leaf tissue compared with root. AmMYB275,  61, 65, 237, 135, 190, 216, 1, 29, 209, 257, 112, 122, and 187 were upregulated in leaf and root tissues in response to drought stress and AmMYB267 was especially upregulated in root tissue (Figure 5b). On the other hand, AmMYB117, 158, 168, 183, 26, 80, and 140 were down regulated in leaf and root tissue in response to drought stress. AmMYB146, AmMYB164, and AmMYB51 were down regulated in leaf tissue while having no significant differential expression in root (Figure 5c). Salt stress had little effect on the expression of AmR2R3 MYB genes. Two of the members showed a noticeable change in expression in response to salt stress. AmMYB87 was upregulated in response to salt stress and downregulated in response to drought in both leaf and root tissue while the opposite was observed for AmMYB39 (Figure 5d).

Discussion
Plants response to environmental stress is one of the most extensively studied research area and assumes significance in developing strategies for addressing the growing concern of declining plant productivity in response to adverse climatic impacts. Abiotic stress conditions like drought and salinity are responsible for heavy yield losses in plants [49]. Therefore, understanding the molecular basis of abiotic stress response in plants has been a focal research area. Halophytes such as Avicennia marina typically grow in conditions of high salinity in various regions of the world [50]. Such plants can be a valuable resource for identifying genes which regulate their enhanced response to such stresses. The R2R3 MYB gene family is one of the most widespread family of transcription factors and transcription regulators with diverse functions, especially in abiotic stress response in plants [51]. In this study, we have undertaken the identification and analysis of the R2R3 MYB gene family in Avicennia marina which has led to a number of interesting observations.

The Evolution of A. marina R2R3 MYB Gene Family
There are numerous parameters to assess the evolutionary history of gene families and phylogenetic relationships form the foundation of such analyses. The AmR2R3 MYB were grouped into eight groups and sub-groups based on the presence of similar motifs in their peptide sequences. A key factor that contributes to the evolution of a gene family is the duplication events. A number of tandem duplications as well as collinear AmR2R3 MYB gene pairs were identified and classified into negatively, positively and neutrally selected genes based on their dN/dS ratios. The AmR2R3 MYB genes were under negative or purifying selection. This type of expansion of a gene family ensures functional conservation of the genes in the course of evolution and is commonly observed in transcription factor families [52,53].
On one hand, the expansion of R2R3 MYB gene family in A. marina was found to be under negative selection leading to the conservation of gene function. However, contrasting results were obtained on analysing the phylogenetic association of the members between A. marina, Arabidopsis, and O. sativa. There was some intermixing observed for R2R3 MYB proteins from Arabidopsis and rice, both of which are glycophytes, while the R2R3 MYB proteins of the halophyte A. marina formed a clade that was distinctly separate from the counterparts in both Arabidopsis and rice. This outcome is fortified by the fact that Avicennia is a monotypic genus which implies that it would have a markedly different genetic composition to model plants like rice and Arabidopsis [50]. This supports the idea that the R2R3 MYB family in A. marina has evolved to help the plant to combat abiotic stress and is therefore crucial to its survival and growth in conditions of high salinity. However, a one on one BLASTn based comparison between the CDS sequences of R2R3 MYB from A. marina with those from Arabidopsis thaliana and Oryza sativa revealed 28 and 27 members of AmR2R3 MYB family to have similarities with AtR2R3 MYB and OsR2R3 MYB respectively. This suggests that although there may be similarities between the R2R3 MYB genes of A. marina with the other two plants at the nucleotide level, the results may not be reflected in phylogenetic tree generated using the corresponding proteins. This may be due to the fact that the algorithms for phylogenetic analysis consider only the best alignment scenarios when comparing gene sequences from multiple genera simultaneously. Therefore, AmR2R3 MYB formed a separate clade, as the protein sequences for R2R3 MYB from A. thaliana and O. sativa were more similar to each other than to A. marina.

AmR2R3 MYB in the Regulation of Gene Expression During Drought and Salinity Stress
The conserved DNA motifs in promoter regions of genes allow various regulators to bind and control gene expression [54]. Analysis of these motifs in AmR2R3 MYB gene promoter regions led to identification of a number of motifs involved in abiotic stress response. The "MYB" motif was found to be most abundant and has been reported to be involved in drought response by acting as a favourable binding site for bZIP TFs [55,56]. Similarly, "MYC" CRE is associated with response to drought stress and acts as a binding site for bHLH TFs [57,58]. "ABRE" and "DRE" motifs have been well characterised and known to regulate gene expression in response to drought stress in ABA-dependent and ABA-independent gene expression, respectively [59]. In addition to these, numerous other stress responsive elements, such as "STRE", "LTR", and "W box", were also found which indicate that AmR2R3 MYB gene family plays a crucial role in abiotic stress tolerance in A. marina.
In silico gene expression analysis revealed a number of members of the AmR2R3 MYB family which are differentially expressed during drought and salinity stress. Interestingly, the expression of these genes was affected more by drought stress than salt and was more evident in leaf tissue rather than root. Roots are the first tissues to be affected by salinity [60] and the R2R3 MYB genes are known to be differentially expressed in plants like Arabidopsis and rice [3] under conditions of high salinity. However, the absence of such marked difference in expression of AmR2R3 MYB could be attributed to the natural adaptation of A. marina to regions of high salinity which allows it to tolerate higher levels of salinity. Drought stress, on the other hand, had a more noticeable effect on the AmR2R3 MYB, especially in the leaf tissue. Only two members, AmMYB87 and AmMYB39, showed considerable differential expression in response to salt stress and could be promising candidates for functional characterisation.

Conclusions
This study enumerates the unique characteristics of the R2R3 MYB gene family in the monotypic halophyte Avicennia marina. It sheds light on the evolution and functional diversification of the AmR2R3 MYB family based on comparisons with model plants such as Arabidopsis and rice. We have also identified important candidates that may be crucial to abiotic stress response in plants. This study can act as a foundation for selecting candidates for further functional characterisation.
Supplementary Materials: The following are available online at https://www.mdpi.com/2073-439 5/11/1/123/s1, Figure S1: Logos and sequences of the motifs identified in the AmR2R3 MYB peptide sequences, Figure S2: Motifs identified in the peptide sequences of AmMYB63 and AmMYB137, File S1: File containing the fasta formatted CDS and peptide sequences of AmR2R3 MYB genes predicted in this study, Table S1: Nomenclature of scaffold to designate chromosomes, Table S2 Funding: This study was funded by the institutional grant from Department of Biotechnology, Government of India.

Data Availability Statement:
Publicly available datasets were analyzed in this study. This data can be found here: Whole genome assembly and annotation files for Avicennia marina can be found at [35]. RNA-seq data used in gene expression analysis was downloaded from NCBI SRA database: Accession numbers SRR2029733, SRR2029734, SRR2029735, SRR2029736, SRR2029738, SRR2029739.