Insight into the B3Transcription Factor Superfamily and Expression Profiling of B3 Genes in Axillary Buds after Topping in Tobacco (Nicotiana tabacum L.)

Members of the plant-specific B3 transcription factor superfamily play important roles in various growth and developmental processes in plants. Even though there are many valuable studies on B3 genes in other species, little is known about the B3 superfamily in tobacco. We identified 114 B3 proteins from tobacco using comparative genome analysis. These proteins were classified into four subfamilies based on their phylogenetic relationships, and include the ARF, RAV, LAV, and REM subfamilies. The chromosomal locations, gene structures, conserved protein motifs, and sub-cellular localizations of the tobacco B3 proteins were analyzed. The patterns of exon-intron numbers and arrangement and the protein structures of the tobacco B3 proteins were in general agreement with their phylogenetic relationships. The expression patterns of 114 B3 genes revealed that many B3 genes show tissue-specific expression. The expression levels of B3 genes in axillary buds after topping showed that the REM genes are mainly up-regulated in response to topping, while the ARF genes are down-regulated after topping.


Introduction
Transcription factors (TFs) are an important group of proteins that regulate gene expression at the transcriptional level by binding to specific DNA sequences. TFs participate in many biological processes as vital elements, functioning in a coordinated fashion to direct the cell life cycle, cell migration, and cellular organization during embryonic development and in response to signals from outside the cell. With the completion of genome sequencing in many plant species, genome-wide identification and analyses of genes encoding plant TFs is now straightforward [1]. In sixteen monocot and 38 eudicot species with genome sequences, the total number of TF genes identified ranged from 888 to 3714 [2]. Many TF families are present in plants as well as animals, bacteria, and yeast, including the homeodomain, MYB, bHLH, MADS, and bZIP families [1]. However, other classes of TFs are considered plant-specific, such as the AP2/EREBP, NAC, WRKY, GRAS, SBP and B3 families [1,[3][4][5]. In our study, we chose to focus on the B3 transcription factor superfamily.
All members of the B3 superfamily contain the plant-specific B3 domain, which has an average length of 100 amino acid residues. This domain was initially characterized in the Viviparous1 (VP1) gene of maize [6] and the VP1 orthologue Abscisic acid-insensitive 3 (ABI3) from Arabidopsis thaliana [7] and possesses a sequence-specific DNA binding activity [5,8]. Several other domains, including APETALA2 (AP2), auxin response factor (ARF), auxin/indole-3-acetic acid (Aux/IAA), and zinc finger Cys-and Trp-containing domain (zf-CW), are also found in many multi-domain B3 proteins in addition to the B3 domain, and are considered to mediate protein-protein interaction and/or dimerization [9]. Based on the presence of these domains, the B3 superfamily can be classified into four distinct gene families, which include the LEAFY COTYLEDON2 [LEC2]-ABSCISIC ACID INSENSITIVE3 [ABI3]-VAL (LAV), RELATED TO ABI3 and VP1 (RAV), AUXIN RESPONSE FACTOR (ARF) and REPRODUCTIVE MERISTEM (REM) families [4].
The LAV family consists of the AFL and the VAL subfamily [4,10]. There are six LAV members in the Arabidopsis genome. Three well-studied genes, ABI3, FUSCA3 (FUS3), and LEAFY COTYLEDON2 (LEC2), belong to the AFL subgroup; these genes are known to control embryogenesis and the accumulation of storage reserves, and the genes show distinct temporal and spatial expression patterns during embryogenesis. In addition, the three AFL proteins can cooperate with each other and are influenced by mutual regulatory interactions during seed development and other developmental processes [10][11][12][13][14][15][16][17][18]. The three members of the VAL subgroup are VP1/ABI3-LIKE 1 (VAL1), VAL2 and VAL3, also known as HIGH-LEVEL EXPRESSION OF SUGAR-INDUCIBLE GENE 2 (HSI2), HSI2-LIKE 1 (HSL1) and HSI2-LIKE 2 (HSL2), respectively [19][20][21], and three other protein domains are present in addition to the B3 domain: a zf-CW domain, an ethylene response factor-associated amphiphilic repression (EAR) motif and a plant homeodomain-like (PHD-L) domain. Based on these domains, VAL proteins repress the seed maturation program and initiate germination and vegetative development by coordinating repression of the AFL network during seed germination rather than seed development [10,[21][22][23][24][25][26][27].
The REM family represents the most diverse subfamily in the B3 superfamily. Relatively few members have been functionally characterized, which is inconsistent with the large number of REM proteins. All REMs contain one to seven B3 domains, which are often present in more than two copies. The B3 domains present in the REM proteins show variation with respect to amino acid sequence and length [30,99]. Moreover, REM genes are phylogenetically divergent and extensively duplicated, and are mostly located in clusters in the Arabidopsis genome [4,30,100]. The first REM gene isolated, BoREM1, is expressed specifically in reproductive meristems in cauliflower, and was proposed to be involved in the determining the identity of the floral meristem [101]. AtREM1, the Arabidopsis ortholog of BoREM1, is preferentially transcribed in reproductive meristems, and functions in the floral organ development [102]. VRN1 functions in maintenance of the vernalization response and is proposed to be involved in the epigenetic repression of FLOWERING LOCUS C (FLC) in a non-sequence-specific DNA binding manner [99,[103][104][105][106]. Another REM gene, VERDANDI (VDD), plays a role in female gametophyte development in Arabidopsis, as a direct target of the MADS-domain ovule identity complex that includes SEEDSTICK (STK), SEPALLATA3 (SEP3) and SHATTERPROOF1/2 (SHP1/2) [107][108][109]. Six REM genes are differentially expressed in the SHP2 expression domain [110]. Moreover, eight REM genes were demonstrated to be targets of the floral homeotic MADS transcription factor AGAMOUS [111].
Tobacco is cultivated as an economic crop, a potential bioenergy crop and is an important research model plant for studying fundamental biological processes and plant disease susceptibility [112][113][114][115]. During growth, the shoot architecture of tobacco is controlled by apical dominance and development process. After removal of the apical dominance by topping (removing the inflorescence just before flowering), the axillary buds rapidly grow into branches, which can severely affect the biologic and economic yields of tobacco. At present, few genes involved in the development of axillary buds have been characterized in tobacco. It is well known that B3 TFs have multiple functions in various pathways that regulate branch development by controlling hormone response genes, sugar response genes, and meristem developmental genes. However, the B3 gene families remain relatively poorly characterized in tobacco. Therefore, it is a necessary first step to analyze the B3 gene families at a genome-wide level. Fortunately, the genomes of four species of Nicotiana, including several cultivars of the allotetraploid tobacco have been sequenced [115][116][117][118]. In addition, mutant libraries of tobacco based on both activation-tagging and chemical mutagenesis have been constructed and characterized [119,120]. These resources provide an opportunity to enhance our understanding of the function of B3 gene families in branching development in tobacco compared with those of Arabidopsis.
In this work, we identified 114 B3 genes in the tobacco genome using comparative genome analysis. The evolutionary relationships and gene structures were investigated to gain insight into the role of B3 genes in regulating the processes of growth and development in tobacco. The expression patterns of 114 B3 genes analyzed using RNA-seq data from common tobacco revealed that many B3 genes show tissue-specific expression. The expression levels of B3 genes in axillary buds were assayed by Quantitative Reverse Transcription Polymerase chain reaction (qRT-PCR) after topping treatment, which showed that the expression of REM genes mainly responded positively to topping; on the contrary, ARF genes were down-regulated after topping.

Identification and Classification of B3TF Family Members in Tobacco
Whole genome protein sequences from tobacco and Arabidopsis were downloaded from http://solgenomics.net/organism/Nicotiana_tabacum/genome (SNG database) and http://www. arabidopsis.org/. The following strategy was used to analyze each B3 gene from the genome of tobacco. The Hidden Markov Model (HMM) profile of the B3 domain (PF02362) was obtained from the Pfam database (http://pfam.sanger.ac.uk). The conserved B3 domain was used to search for B3 protein sequences using HMMER software with default parameters. B3 domain in the proteins obtained was predicted and identified again, and the proteins with no B3 domains were eliminated. The B3 proteins in tobacco and Arabidopsis were compared to identify orthologs by searching a local database that was built using Arabidopsis B3 proteins, and the E-value was set at 1e-10. BLASTP and TBLASTN with default parameters were also performed to identify further the NtB3 proteins using AtB3 protein sequences from tobacco protein and mRNA sequences, respectively. After removing the redundant sequences, all of the non-redundant and high-confidence proteins were assigned as tobacco B3 family members. The isoelectric points and protein molecular weights were predicted with the ExPASy Proteomics analysis tools (http://expasy.org). The sub-cellular localizations were predicted using the Protein Subcellular Localization Prediction Tool (https://www.genscript.com/psort.html).When the subcellular locations of TFs were not predicted in the nuclear, the ortholog B3 proteins of Arabidopsis were localized according to the data of TAIR and NCBI.

Multiple Sequence Alignments and Phylogenetic Analysis
Multiple sequence alignments of the predicted total B3 proteins in tobacco were conducted using ClustalW with the default pairwise and multiple alignment parameters. The protein sequences of each of the LAV, RAV, and ARF subfamilies from both tobacco and Arabidopsis were aligned using ClustalW with default parameters. In addition, multiple sequence alignments of the REM subfamilies from both tobacco and Arabidopsis were performed using Muscle with default parameters, as the alignments of the REMs based on ClustalW failed to be conducted because of the complexity of sequence similarity. Phylogenetic trees were generated using the ML (Maximum likelihood) method as implemented in MEGA 6.0 with 1000 bootstrap replicates based on the JTT matrix-based model [121,122].

Chromosomal Location
The chromosomal locations of the B3 genes were retrieved from the SGN database. The physical positions of the B3 genes in identified tobacco were mapped to the chromosomes with MapDraw V2.1 software [123].

Gene Structure and Conserved Motif Analysis
The structures of the B3 genes were analyzed and illustrated using the Gene Structure Display Server (GSDS) tool (http://gsds.cbi.pku.edu.cn; [124]). The distributions of the conserved motifs in the tobacco B3 proteins were predicted with MEME 5.0.2 (Multiple Expectation Maximization for Motif Elicitation, http://meme-suite.org/) using the following parameters: the optimum motif width was set from 6 to 200, and the maximum number of motifs to identify was set to 15 motifs.

Expression Pattern Analysis of Tobacco B3 Genes
High-throughput RNA-seq data of Nicotiana tabacum (N. tabacum) was downloaded from https: //www.ncbi.nlm.nih.gov/sra. The RNA-seq data for B3 genes in 11 tobacco tissues were extracted and used to analyze the expression of B3 genes in different tissues by the FPKM (fragments per kilobase per million reads mapped) method. To investigate the tissue-specific expression patterns of B3 genes in tobacco, we analyzed and extracted the data of the B3 genes separately from each tissue included in this study with a consistent method using Tophat and Cufflinks in the R statistical computing environment. To visualize the expression pattern of B3 genes in different tissues, the treated expression data were standardized and analyzed using Cluster software, and then processed into a CDT profile, which could then be used to generate the Heat Map by Treeview V1.1.6 software.

Plant Growth, Topping Treatment, and Quantitative Real-Time PCR Analysis
The N. tabacum flue-cured cultivar K326 was used to study B3 gene expression in this study. The plants were grown in the greenhouse, and two groups of rapidly growing wild-type plants were used in this research. One group was allowed to grow normally without topping while the other group was topped, and the axillary buds from the upper and lower parts of the tobacco plants in the two groups were then collected for RNA extraction to analyze the expression of tobacco B3 genes. Total RNA was extracted using the Plant RNA Extraction Kit (TaKaRa, Dalian, China) with three biological replicates according to the manufacturer's instructions. We used qRT-PCR to determine the relative mRNA levels of the genes, which were calculated using the 2 −∆∆Ct method. The gene-specific primers given in Table S1 based on the 114 non-coding regions of the B3 genes were designed for qRT-PCR specifically to avoid non-specific amplification.

Identification of B3 Genes in the Tobacco Genome
In this study, 114 B3 tobacco genes were identified through extensive searches for B3-type domains and the annotation information for B3 genes in the SGN database. The characteristic features of B3 genes are listed in Table 1 and Table S2. The lengths of proteins encoded by the B3 genes varied from 99 to 1106 amino acids, with predicted protein masses ranging from 11,220.96 kD to 122,010.84 kD. The theoretical isoelectric points (pI) ranged from 4.5 to 9.8. The results of sub-cellular localization prediction suggested that 89 B3 proteins are localized to the nuclear, five are localized to the chloroplast, one protein is localized to mitochondrion, and the remaining 21 proteins are localized in the cytoplasm ( Table 1, Table S2). However, the Arabidopsis ortholog proteins of five tobacco ARFs localized to the cytoplasmic and chloroplast, ten REMs localized to the cytoplasmic, and two REMs localized to the chloroplast were all localized to the nuclear according to the data of TAIR and NCBI. Arabidopsis ortholog proteins of six tobacco REMs localized to the cytoplasmic were all localized to the chloroplast (Table S2).The results indicated that the predictions of subcellular locations of ortholog B3 proteins are not completely identical in different species.
To survey the size characteristics of B3 proteins in plants, the findings of some previous studies on B3 genes in several representative species, including the dicot species Arabidopsis (Brassicaceae) and tomato (Solanaceae), and the monocot rice, are summarized in Table 2. The total number of predicted B3 genes in tobacco was found to be similar to the numbers in Arabidopsis (118), rice (91), and tomato (97). However, the number of ARFs found in tobacco was double that found in tomato. Moreover, the numbers of ARF subfamilies in the parental species of allotetraploid tobacco were both twenty-three, nearly half that of N. tabacum. The numbers of RAV and LAV subfamilies in the parental species were all four, compared with five members in allotetraploid tobacco. The numbers of REM subfamilies in the parental species were forty-four and fifteen, respectively, of which total number is equal to that found in allotetraploid tobacco. The statistics of the number of B3 genes in four species showed that the distribution of B3 genes into the four subfamilies in different species is similar, especially when comparing tobacco with tomato, regardless of the total number of B3 genes or the number of genes in each subfamily.

Phylogenetic Relationship Analysis of B3 Proteins in Tobacco
To investigate the phylogenetic relationships among the tobacco B3 genes, a local Arabidopsis B3-family protein database was searched with the 114 tobacco B3 protein sequences to identify the orthologous genes in the tobacco B3 family. Based on the phylogenetic classification of the B3 superfamily in Arabidopsis, the 114 tobacco B3 proteins were classified into four groups, as in Arabidopsis (Figure 1, Figures S1-5). A phylogenetic analysis was then performed based on the multiple sequence alignment of all tobacco B3 proteins. The phylogenetic tree constructed using the Maximum Likelihood method clustered the B3 proteins into four large clades ( Figure 1). There are five proteins classified in the LAV subfamily, five proteins in the RAV subfamily, 45 proteins in the ARF subfamily, and the remaining 59 proteins were classified into the REM subfamily, which is considerably larger than the others (Figure 1, Figures S2-5). The members of the REM subfamily in Arabidopsis is inconsistent with different studies that used different classification standards [4,9,30], showing REM proteins to be a very complicated group, and that the classification is no longer apparent. In our study, we did not further classify the REM family into smaller groups. In summary, similar to surveys on B3 superfamily proteins in other plant species, the REM subfamily contains the largest number of proteins of the four subfamilies in the B3 superfamily, and the ARF subfamily also contains a large number of proteins. Both the LAV and RAV subfamilies are small, with just five members each, considerably fewer than in either the ARF or REM subfamilies. Distribution of the 114 tobacco B3 proteins in four subfamilies is consistent with the classification of B3 proteins in three other species, especially in tomato, which, like tobacco, belongs to the botanical family Solanaceae (Table 2). These results suggested that the tobacco protein sequences and their classification are reliable, and that they could therefore be used in subsequent analyses.

Chromosomal Location of B3Genes in the Tobacco Genome
The chromosomal distribution results revealed that 54 of the B3 genes were unevenly distributed on 18 tobacco chromosomes, 12 genes were anchored to unattributed scaffolds that consist of only scaffold length and ID, and 17 genes had no chromosomal location information in the tobacco genome ( Figure 2, Table S3). No B3 genes were mapped to six tobacco chromosomes, including chromosomes 6, 7, 8, 16, 20, and 21. Tobacco chromosome 17 has 10 B3 genes, which is the most of any chromosome. We also found that chromosome 17 is the longest among all of the 24 tobacco chromosomes, and that the B3 genes located on this chromosome mainly consist of REM and ARF subfamily genes. Chromosomes 12, 15, and 19 also have a large number ofB3 genes, containing 5, 6, and 5 genes, respectively. Five chromosomes (1, 4, 5, 11, and 24) have only a single gene each, which is the lowest number ( Figure 2, Table S3). Thirty-one of the B3 genes were found to have 1-3 homologous genes (Table S3). Both NtREM22 and NtREM41 have three identical homologous genes, including NtREM53, NtREM56, and NtREM57. NtREM24, NtREM25, and NtREM40 have two homologous genes (Table S3). To gain further insight into the evolutionary relationships among the 114 tobacco B3 proteins, ML bootstrap consensus trees for each B3 protein subfamily in tobacco and Arabidopsis were generated using Clustal (Figures S2-4) and Muscle ( Figure S5). These classifications showed that NtLAV3 and NtLAV4 are in a clade with VAL2 of Arabidopsis, and NtLAV1 and NtLAV5 are more closely related to VAL3 in A. thaliana ( Figure S2). In the RAV subfamily, NtRAV1, 2, and 5 group in a large clade with the Arabidopsis NGA proteins, and NtRAV3, and NtRAV4 are in another large clade with three AtNGAL proteins ( Figure S3). Overall, the tobacco ARFs show closer evolutionary relationships to the ARFs of Arabidopsis compared to the evolutionary relationships of the other three subfamilies between tobacco and Arabidopsis ( Figure S4). There are 13 tobacco proteins that have >90% sequence homology with Arabidopsis proteins ( Figure S4). However, the evolutionary relationships of the REMs between tobacco and Arabidopsis are very low ( Figure S5).

Chromosomal Location of B3 Genes in the Tobacco Genome
The chromosomal distribution results revealed that 54 of the B3 genes were unevenly distributed on 18 tobacco chromosomes, 12 genes were anchored to unattributed scaffolds that consist of only scaffold length and ID, and 17 genes had no chromosomal location information in the tobacco genome ( Figure 2, Table S3). No B3 genes were mapped to six tobacco chromosomes, including chromosomes 6, 7, 8, 16, 20, and 21. Tobacco chromosome 17 has 10 B3 genes, which is the most of any chromosome. We also found that chromosome 17 is the longest among all of the 24 tobacco chromosomes, and that the B3 genes located on this chromosome mainly consist of REM and ARF subfamily genes. Chromosomes 12, 15, and 19 also have a large number of B3 genes, containing 5, 6, and 5 genes, respectively. Five chromosomes (1, 4, 5, 11, and 24) have only a single gene each, which is the lowest number ( Figure 2, Table S3). Thirty-one of the B3 genes were found to have 1-3 homologous genes (Table S3). Both NtREM22 and NtREM41 have three identical homologous genes, including NtREM53, NtREM56, and NtREM57. NtREM24, NtREM25, and NtREM40 have two homologous genes (Table S3).

Gene Structure and Conserved Motif Analysis of B3 Superfamily Proteins
To understand the structural diversity of the tobacco B3 genes, we analyzed the exon/intron arrangements in the coding regions. As a whole, the exon/intron numbers were high, and they varied from 2 to 15 in the B3 superfamily genes. The

Gene Structure and Conserved Motif Analysis of B3 Superfamily Proteins
To understand the structural diversity of the tobacco B3 genes, we analyzed the exon/intron arrangements in the coding regions. As a whole, the exon/intron numbers were high, and they varied from 2 to 15 in the B3 superfamily genes. The ARF subfamily members are predicted to have 2-15 exons, with an average of 10.64. While 11 of the ARF genes have only 4-5 exons, most of 45 tobacco ARF genes have more than 10 exons. The NtARF22 in individual clade have the minimum number of exons. Genes located on the same branches of the phylogenetic tree share similar gene structures and exon/intron numbers (Figure 3). The five members of the RAV subfamily have 2-3 exons each, and the gene structures are also similar within the clade. All tobacco LAV genes have more than 10 exons except for NtLAV4, which contains six exons, and the exon/intron organizations of the LAV genes are consistent. However, the REM subfamily members have diverse gene models and exon/intron numbers. The genes in the REM subfamily have 2-11 predicted exons, with an average of 6.6. In the REM subfamily, the exon numbers may differ between some neighboring genes in the phylogenetic tree. In the B3 protein family, the REM subfamily is the most numerous and divergent, not only in tobacco, but in all species in which these genes have been studied (Figure 3).
The functional conserved domains in the tobacco B3 proteins were compared and analyzed ( Figure S6). In ARF subfamily, all members have a B3 domain and a middle domain, except for NtARF19 and NtARF40 (Table S2, Figure S6). The NtARF19 and NtARF40 proteins are short and have only a B3 domain without the auxin response factor and AUX_IAA domains ( Figure S6). The B3 domain and middle domain in most of the ARF proteins are located close to the N termini (Table S4, Figure S6). Twenty-two members of the ARF subfamily contained all three motifs, and these proteins were mainly distributed in three large clades ( Figure S6). In general, proteins that show close evolutionary relationships were always similar with respect to the types and quantities of motifs they contained. The five RAV proteins have only a B3 domain ( Figure S6). Three members of the LAV subfamily possess both a B3 domain and a zf-CW (CW-type zinc-finger) domain, while the remaining two members only contain a B3 domain ( Figure S6). The REM proteins each contain 1-5 B3 domains, and they are evenly distributed in the REM proteins that contain between 3 and 5 B3 domains. In the REM proteins that contain two B3 domains, they are mainly found in the N-terminal and C-terminal regions (Table S4). The B3 domains are located in the N-termini and middle regions in most of the REM proteins that have a single B3 domain (Table S4). The similar distributions and numbers of B3 domains in REM proteins were congruent with the branch distances in the evolutionary tree between two REM proteins. The length variations in several of the domains in the B3 superfamily may affect the exact core structure of the proteins [9]. In tobacco, the length of the B3 domain varies from 41 to 106 amino acids, with an average of 90.9, and the ARF and AUX_IAA domains also show similar length variations that range from 30 to 83 and 38 to 115 amino acids, respectively. The length of the zf-CW domain in the three LAV proteins was very consistent, at 42 amino acids in all three proteins (Table S2, Table S4).
To further explore the structure of the tobacco B3 proteins, the conserved motifs of the B3 proteins were analyzed based on the phylogenetic relationships. Ten conserved motifs, which we named 1 to 10, were identified using the MEME tool (Figures 4 and 5, Supplementary Data S1). Proteins in the same clades tended to have similar motif types and arrangements (Figure 4), which supports the results of the gene structure analysis and the phylogenetic analysis. In addition, we conducted analyses of the conserved B3, ARF, and Aux/IAA domains ( Figures S7-9), and found that Motifs 1,2,6 are fragments on the B3 domain, motif 10 is the ARF domain, and Motif 8 is the Aux/IAA domain ( Figure 5, Figures S7-9).    . Sequence logos of the ten conserved motifs identified using MEME tools from the B3 proteins in tobacco. Total height of the residues stack indicates the information content of that position in the motif. Height of residues within the stack indicates the probability of each residue at that position.

Tissue-Specific Expression Profiling of the B3 Genes in Tobacco
The expression patterns of the B3 genes at different developmental stages were investigated by analyzing the RNA-sequencing data in 12 different tobacco tissues that represent the entire tobacco growth cycle (Figures 6 and 7). In tobacco seedlings, the highly expressed genes in the shoot apex and shoot were mainly in the ARF subfamily. The ARF genes NtARF35, NtARF12, NtARF18, NtARF25, NtARF4, NtARF31, NtARF6, and NtARF45 are expressed at high levels in the shoot apex, while NtARF9, NtARF38, NtARF28, NtARF32, NtARF35, and NtRAV1 show high expression levels in the seedling shoot ( Figure 6). In seedling roots, in addition to the ARF genes NtARF14, NtARF23, and NtARF2, several REM genes are also expressed at high levels, such as NtREM37 and NtREM38 ( Figure  6). A few genes in the shoot are expressed at similar levels in the shoot apex; however, these genes exhibited the opposite expression pattern in root tissue compared with the other two tissues. An example is NtREM37, which suggests that expression of these genes is tissue-specific during the seedling stage in tobacco. In summary, most of the genes expressed in the shoot apex are tissue specific. There are 10 ARF and 13 REM genes show shoot-specific expression, and both the RAV and LAV families have one member that is specifically expressed in the shoot. Genes that show rootspecific expression are found in both the ARF and REM families; notably, the LAV family contains two genes that are expressed specifically in the root ( Figure 6). In adult plants, NtARF12, NtARF13, NtARF31, NtARF40, NtARF44, NtREM26, NtREM3, NtREM43, and NtREM19 are expressed at very Figure 5. Sequence logos of the ten conserved motifs identified using MEME tools from the B3 proteins in tobacco. Total height of the residues stack indicates the information content of that position in the motif. Height of residues within the stack indicates the probability of each residue at that position.

Tissue-Specific Expression Profiling of the B3 Genes in Tobacco
The expression patterns of the B3 genes at different developmental stages were investigated by analyzing the RNA-sequencing data in 12 different tobacco tissues that represent the entire tobacco growth cycle (Figures 6 and 7). In tobacco seedlings, the highly expressed genes in the shoot apex and shoot were mainly in the ARF subfamily. The ARF genes NtARF35, NtARF12, NtARF18, NtARF25, NtARF4, NtARF31, NtARF6, and NtARF45 are expressed at high levels in the shoot apex, while NtARF9, NtARF38, NtARF28, NtARF32, NtARF35, and NtRAV1 show high expression levels in the seedling shoot ( Figure 6). In seedling roots, in addition to the ARF genes NtARF14, NtARF23, and NtARF2, several REM genes are also expressed at high levels, such as NtREM37 and NtREM38 (Figure 6). A few genes in the shoot are expressed at similar levels in the shoot apex; however, these genes exhibited the opposite expression pattern in root tissue compared with the other two tissues. An example is NtREM37, which suggests that expression of these genes is tissue-specific during the seedling stage in tobacco. In summary, most of the genes expressed in the shoot apex are tissue specific. There are 10 ARF and 13 REM genes show shoot-specific expression, and both the RAV and LAV families have one member that is specifically expressed in the shoot. Genes that show root-specific expression are found in both the ARF and REM families; notably, the LAV family contains two genes that are expressed specifically in the root ( Figure 6). In adult plants, NtARF12, NtARF13, NtARF31, NtARF40, NtARF44, NtREM26, NtREM3, NtREM43, and NtREM19 are expressed at very low levels in all tissues except flowers, including young flowers, mature flowers, and senescent flowers (Figure 7). The NtARF12, NtARF13, NtARF31, and NtREM26 genes show their highest expression levels in young flowers, and are also expressed at high levels in senescent flowers. Similarly, NtREM3, NtREM43, NtREM19, NtARF40, NtARF2, and NtARF25 have their highest expression levels in mature flowers, and also show high levels of expression in senescent flowers. NtREM19 and NtARF4 are expressed at high levels in mature flowers and senescent flowers. As shown in Figure 7, over the blooming cycle of the flower, the expression levels of several ARF genes including NtARF23, NtARF45, NtARF45 and NtARF5 decreased gradually, while the relative expression of some REM genes, including NtREM15, NtREM41, NtREM27, and NtREM18, increased. We also found that for genes with high expression levels in young flowers, expression was much higher than in mature and senescent flowers. In the dry seed capsule, several genes from the REM subfamily show very high expression levels; these include NtREM33, NtREM4, NtREM55, NtREM10, NtREM36, and NtREM37. In both the mature and senescent leaf, B3 genes have similar expression profiles. During leaf senescence, the expression levels of NtARF21, NtARF11, NtARF16, NtARF28 NtREM36, and NtREM37 declined. NtARF22 and NtRAV2 were found to be expressed in mature leaves, and NtARF43 showed a high level of expression in senescent leaves. In general, NtARF12, NtARF13, NtARF31, NtARF40, NtREM26, NtREM3, NtREM43, and NtREM19 show flower-specific expression, and the expression of NtREM4, NtREM9, NtREM10, NtREM55, and NtLAV3 is specific to the dry capsule. NtRAV2 is specifically expressed in the mature leaf, and the expression of NtREM30 and NtREM31 is specific to the root (Figure 7). low levels in all tissues except flowers, including young flowers, mature flowers, and senescent flowers (Figure 7). The NtARF12, NtARF13, NtARF31, and NtREM26 genes show their highest expression levels in young flowers, and are also expressed at high levels in senescent flowers. Similarly, NtREM3, NtREM43, NtREM19, NtARF40, NtARF2, and NtARF25 have their highest expression levels in mature flowers, and also show high levels of expression in senescent flowers. NtREM19 and NtARF4 are expressed at high levels in mature flowers and senescent flowers. As shown in Figure 7, over the blooming cycle of the flower, the expression levels of several ARF genes including NtARF23, NtARF45, NtARF45 and NtARF5 decreased gradually, while the relative expression of some REM genes, including NtREM15, NtREM41, NtREM27, and NtREM18, increased. We also found that for genes with high expression levels in young flowers, expression was much higher than in mature and senescent flowers. In the dry seed capsule, several genes from the REM subfamily show very high expression levels; these include NtREM33, NtREM4, NtREM55, NtREM10, NtREM36, and NtREM37. In both the mature and senescent leaf, B3 genes have similar expression profiles. During leaf senescence, the expression levels of NtARF21, NtARF11, NtARF16, NtARF28 NtREM36, and NtREM37 declined. NtARF22 and NtRAV2 were found to be expressed in mature leaves, and NtARF43 showed a high level of expression in senescent leaves. In general, NtARF12, NtARF13, NtARF31, NtARF40, NtREM26, NtREM3, NtREM43, and NtREM19 show flower-specific expression, and the expression of NtREM4, NtREM9, NtREM10, NtREM55, and NtLAV3 is specific to the dry capsule. NtRAV2 is specifically expressed in the mature leaf, and the expression of NtREM30 and NtREM31 is specific to the root (Figure 7).

Expression Profiles of B3 Genes in Tobacco Axillary Buds in Response to Topping
Topping is widely practiced in tobacco cultivation, and tobacco axillary buds grow rapidly after topping. Therefore, we quantified the expression levels of all newly identified tobacco B3 genes in the upper and lower parts of the tobacco axillary buds after topping using qRT-PCR (Figures 8 and  9, Figures S10-12). No SYBR Green signals were detected for four genes (NtARF19, NtARF42, NtREM14, and NtREM39), while normal signals were detected for the internal control gene. Comparing the results of the B3 gene expression levels between HB1U (upper axillary buds before topping) and HB2U (upper axillary buds after topping) showed that the expression levels of NtREM8, NtREM30, NtREM33, NtREM35, NtREM36, NtREM37, NtREM47, and NtREM48 increased, while the expression levels of 13 genes including NtREM16, NtREM25, NtREM27, NtREM54, NtREM57, NtARF1, NtARF22, NtARF31, NtARF35, NtARF40, NtARF41, NtLAV5, and NtRAV1 decreased ( Figure  9). We also compared the expression levels of B3 genes between HB1D (lower axillary buds before topping) and HB2D (lower axillary buds after topping), and found that in this group, the relative expression of eight genes (NtARF32, NtARF39, NtREM20, NtREM26, NtREM40, NtREM55, NtREM56, and NtREM59) increased, while expression of six genes (NtRAV2, NtRAV5, NtREM31, NtARF28, NtARF30, and NtARF33) decreased (Figure 9). It is worth noting that there are six genes in which expression changed significantly in these two types of comparisons (Figure 9). Among of them, NtREM33 and NtREM48 were up-regulated, while NtARF31 was down-regulated, in both the upper and lower axillary buds after topping. Expression of the remaining four genes was up-regulated in the upper axillary buds but was down-regulated in the lower axillary buds. The relative expression of many B3 genes changed significantly in the axillary buds after topping; this was especially true for NtREM8, NtREM30, NtREM32, NtREM36, NtREM37, and NtREM56, where expression increased by 23-, 9-, 8-, 8-, and 8-fold, respectively, compared to expression before topping. Also, the relative changes in expression after topping were 7-fold for NtRAV1 and 14-fold for NtARF30 compared to before topping (Figure 9). From these results, we found that many genes showed significant changes

Expression Profiles of B3 Genes in Tobacco Axillary Buds in Response to Topping
Topping is widely practiced in tobacco cultivation, and tobacco axillary buds grow rapidly after topping. Therefore, we quantified the expression levels of all newly identified tobacco B3 genes in the upper and lower parts of the tobacco axillary buds after topping using qRT-PCR (Figures 8 and 9, Figures S10-12). No SYBR Green signals were detected for four genes (NtARF19, NtARF42, NtREM14, and NtREM39), while normal signals were detected for the internal control gene. Comparing the results of the B3 gene expression levels between HB1U (upper axillary buds before topping) and HB2U (upper axillary buds after topping) showed that the expression levels of NtREM8, NtREM30, NtREM33, NtREM35, NtREM36, NtREM37, NtREM47, and NtREM48 increased, while the expression levels of 13 genes including NtREM16, NtREM25, NtREM27, NtREM54, NtREM57, NtARF1, NtARF22, NtARF31, NtARF35, NtARF40, NtARF41, NtLAV5, and NtRAV1 decreased (Figure 9). We also compared the expression levels of B3 genes between HB1D (lower axillary buds before topping) and HB2D (lower axillary buds after topping), and found that in this group, the relative expression of eight genes (NtARF32, NtARF39, NtREM20, NtREM26, NtREM40, NtREM55, NtREM56, and NtREM59) increased, while expression of six genes (NtRAV2, NtRAV5, NtREM31, NtARF28, NtARF30, and NtARF33) decreased (Figure 9). It is worth noting that there are six genes in which expression changed significantly in these two types of comparisons (Figure 9). Among of them, NtREM33 and NtREM48 were up-regulated, while NtARF31 was down-regulated, in both the upper and lower axillary buds after topping. Expression of the remaining four genes was up-regulated in the upper axillary buds but was down-regulated in the lower axillary buds. The relative expression of many B3 genes changed significantly in the axillary buds after topping; this was especially true for NtREM8, NtREM30, NtREM32, NtREM36, NtREM37, and NtREM56, where expression increased by 23-, 9-, 8-, 8-, and 8-fold, respectively, compared to expression before topping. Also, the relative changes in expression after topping were 7-fold for NtRAV1 and 14-fold for NtARF30 compared to before topping (Figure 9).
From these results, we found that many genes showed significant changes in expression after topping, and the responses of these genes to topping were not identical in the upper and lower axillary buds of tobacco (Figures 8 and 9).
Genes 2019, 10, x FOR PEER REVIEW 17 of 30 in expression after topping, and the responses of these genes to topping were not identical in the upper and lower axillary buds of tobacco (Figures 8 and 9).

Discussion
The plant-specific B3 superfamily, which was defined because the B3 proteins contain at least one B3 domain, is one of the largest families of transcription factors, and its members play important roles in phytohormone, sugar, and other signaling pathways involved in plant growth and development [12,13,17,18,33,125]. Although there have been several previous studies of the B3 family in plants, our study is the first to investigate and characterize the B3 family genes in tobacco. In this study, we identified 114 B3 genes through a comprehensive search for the B3 protein domain, and a phylogenetic analysis grouped the proteins into four subfamilies that reflected the grouping of the Arabidopsis B3 proteins [4], and included the ARF, LAV, RAV, and REM subfamilies. Although

Discussion
The plant-specific B3 superfamily, which was defined because the B3 proteins contain at least one B3 domain, is one of the largest families of transcription factors, and its members play important roles in phytohormone, sugar, and other signaling pathways involved in plant growth and development [12,13,17,18,33,125]. Although there have been several previous studies of the B3 family in plants, our study is the first to investigate and characterize the B3 family genes in tobacco. In this study, we identified 114 B3 genes through a comprehensive search for the B3 protein domain, and a phylogenetic analysis grouped the proteins into four subfamilies that reflected the grouping of the Arabidopsis B3 proteins [4], and included the ARF, LAV, RAV, and REM subfamilies. Although tobacco is an allotetraploid plant (2n = 48), and the number of chromosomes in tobacco is twice the number found in tomato and rice, the total number of B3 genes in tobacco is not much more than that in tomato and rice, and fewer than that in Arabidopsis. However, the number of ARF in parental species of the allotetraploid tobacco was both the same with that of tomato and nearly half to that of N. tabacum. The numbers of RAV and LAV subfamilies in parental species were nearly the same numbers as those in tomato and the allotetraploid tobacco. The numbers belonging to the REM superfamily is variable in different species, and relatively, very few members of the REM subfamily have been functionally characterized. Therefore, based on the distribution and numbers of the ARF, LAV, and RAV subfamilies in different species, and the close genetic relationship between the parental species of the allotetraploid tobacco, we can speculate that the genome of allotetraploid tobacco has not yet been perfectly assembled and annotated in the SNG database.
The B3 proteins mainly contain five types of functional domains: B3, AP2, zf-CW, ARF, and Aux/IAA. The B3 domain possesses sequence-specific DNA binding activity and tends to recognize the CACCTG sequence in AtRAV1 [32]. In rice, experimental evidence has shown that the B3 domain contributes to nuclear localization of ARF proteins in addition to the DNA binding activity [126]. In our research, we found that the B3 domain is mainly located in the N-terminal region of B3 proteins, except for proteins in the REM subfamily. It is known that some localization-related signal sequences are often located at the N-termini of proteins. Most of the REM proteins contain more than one B3 domain, and show variation in the sequence and length of the B3 domains, which we found to be the case with tobacco REM proteins ( Figure S6). These multiple B3 domains in proteins could be derived from one or more B3 domain duplication events. During evolution, domain duplications can generate new genes that enable the plant to respond to external stimuli, and also enrich the functions of proteins in a gene family, thereby increasing functional diversity [9]. However, the functional characteristics of these proteins with domain duplications are not easy to determine, because many of the REM proteins are functionally redundant. The functions of only two REM genes have been well studied to date. In addition, the crystal structures of RAV1 and AT1G16640.1 from Arabidopsis showed that an open β-barrel consisting of seven β-strands and two α-helices between the β-strands constitutes the primary structure of the B3 domain [4]. This tertiary structural model was verified in different members of the B3 family by modeling, and the tertiary structure of the B3 domains is similar in the different subfamilies of the B3 superfamily, even though the protein sequence similarities are low [4,31], which shows that the structure of the B3 domains is well conserved. The zf-CW domain is present in most of the VAL proteins in the LAV subfamily, and possesses DNA-binding activity. In tobacco, we identified five LAV proteins, of which three members, NtLAV2, 3, and 5, have a zf-CW domain. Interestingly, the evolutionary relationships between these three proteins are not very close, but NtLAV1 and NtLAV3 share genetic relationships of >99% with NtLAV5 and NtLAV4, respectively; two proteins without zf-CW domains. We speculate that loss of the zf-CW domain in NtLAV5 and NtLAV4 occurred during evolution. The AP2 domain, which plays an important role in the development of the meristem, floral organs, and seed coat in Arabidopsis, is another DNA binding domain found in some RAV proteins in addition to the B3 domain. RAV proteins mainly include two types, one type has an AP2 domain, and the other does not. In our study, the five tobacco RAV members do not contain AP2 domains. Based on the evolutionary relationships between the RAV proteins of Arabidopsis and tobacco, the five tobacco RAV proteins are closer to NGA and NGAL in Arabidopsis, which do not have AP2 domains. In Arabidopsis, NGALs are involved in flower and seed development, while the NGA proteins are mainly involved in flower and leaf development and regulation of leaf morphogenesis [44,45]. Therefore, the RAV genes of tobacco may have significant roles in flower, seed, and leaf development. The middle domain and Aux/IAA domain are both found in ARF proteins. In tobacco, the middle domain is located at the N-terminal end of the protein, and the Aux/IAA domain or domain III/IV is usually located at the C-terminus, which is consistent with the findings of previous studies [62,70]. NtARF19 and NtARF40 are very short proteins that have only a B3 domain. Again, as with NtLAV5 and NtLAV4, we can speculate that domain loss occurred in these two proteins during evolution. It is interesting that no expression or very low levels of expression have been detected for NtARF19, indicating that NtARF19 may have become a pseudogene, similar to AtARF23 in Arabidopsis.
The exon-intron organization analysis, motif analysis, and gene expression analysis further supported our phylogenetic analysis. Genes for the more closely related proteins tend to have similar exon-intron arrangements, and the proteins have similar structures. Unique features in the gene and/or protein structures of some B3 genes are partially due to the extensive expansion of this gene superfamily through gene duplication or gene fragmentation and loss in other genes.
The results of previous chromosome localization studies have indicated that REM genes are inclined to cluster in the genome, and this was observed in 11 plant species including maize, rice, sorghum, and Arabidopsis. A similar situation was also found in tobacco (Figure 2), where the largest cluster consists of nine REM genes in a 103.6 kb genomic region located on chromosome 15. Furthermore, we also found that many B3 genes are also clustered on other chromosomes. The large number of REM genes that arose from extensive gene duplication may be responsible for the clustering. However, there is at present no practical experimental evidence to explain this observation.
The expression levels of genes in different tissues and developmental stages are closely related to their function. We investigated the expression patterns of tobacco B3 genes in different tissues, which contributed to an understanding of the potential functions of these genes. In seedlings, we found that many genes are expressed in the stem and root. In the seedling shoot, several genes in the REM and RAV subfamilies have high expression levels in addition to the ARF genes. In seedling roots, in addition to NtLAV1 and NtLAV5, the highly expressed genes are mainly members of the ARF and REM subfamilies. In the seedling shoot apex, there are fewer genes with high expression levels than in the other two tissues, but the number of genes that are highly expressed is similar between the three tissues. Most of the genes expressed in the shoot apex had high expression levels and are mainly in the ARF subfamily. We hypothesize that the high auxin content in the shoot apex activates transcription of these ARF genes. It is worth mentioning that the highly expressed genes in different tissues exhibit tissue-specific expression patterns. In various tissues and organs of adult tobacco, constitutively expressed genes are rarely found in analyses of transcriptome data. Many genes are specifically transcribed at certain developmental stages. NtARF44 showed high expression levels in young flowers; NtREM3, NtREM43, and NtARF2 are highly expressed in mature flowers; and NtREM27, NtREM15 and NtARF3, NtARF11, and NtARF15 were expressed at high levels in senescent flowers. However, the expression of NtARF2 and NtARF25 increased and then decreased during flower development, peaking in mature flowers, indicating that these two genes can regulate the entire floral developmental process. In Arabidopsis, ARFs also are key factors in floral morphogenesis [77,81,82]. All the evidence indicates that these genes are specifically responsive at different stages of flower development in tobacco (Figure 7). At the same time, some genes are highly expressed in two associated developmental stages. NtREM19 shows high expression levels in mature and senescent flowers, and NtREM10 is highly expressed in senescent flowers and dry capsules, indicating that these genes play vital roles in the progression between different stages of development ( Figure 7). NtREM4, NtREM55, NtREM38, and NtLAV3 are highly expressed only in the dried capsule, suggesting that these genes are associated with fruit development. The expression patterns of the tobacco REM genes are supported by previous investigations into the expression and transcript features of BoREM1 and other REM genes [101][102][103]105,110,111]. NtARF11, NtARF15, and NtRAV1 are highly expressed in young leaves; NtARF43 is the only gene that showed a high level of expression in senescent leaves and a low level of expression in other tissues. NtARF10 and NtARF21 are highly expressed during three stages of leaf development. These results show that ARF genes are involved in the development and senescence of tobacco leaves, which is consistent with previous reports [76,77,127]. RAV gene expression in tobacco leaves may be the same as the NGATHA genes that regulate plant leaf development via the auxin signaling pathway in Arabidopsis [41]. We also found that NtLAV1 and NtLAV5, which are closely related phylogenetically, have similar expression patterns, and their expression levels are very low in almost all tissues assayed. The same things were found for NtLAV3 and NtLAV4, which have a close genetic relationship and are mainly expressed in the dry capsule. NtLAV3 and NtLAV4 share a close evolutionary relationship with the VAL genes that function to repress the seed maturation program and vegetative development [10,[22][23][24].
In addition, we investigated the expression of B3 genes in tobacco axillary buds after topping, which is an important agronomic practice for producing tobacco leaves of good quality. Because the upper and lower parts of the tobacco plant are affected differently by apical dominance, we sampled the axillary buds from the upper and lower parts of the plants. Compared to the control, many tobacco REM genes showed significant up-regulated expression after topping, in both upper and lower axillary buds. However, only four ARF genes were up-regulated to some extent. Interestingly, all up-regulated ARF genes were in the lower axillary buds, and the expression levels of all three ARF genes that showed significant changes in expression levels in the upper axillary buds were down-regulated. It is interesting that all the ARF genes with significant down-regulating expression were transcriptional repressors, such as NtARF22, NtARF35, and NtARF41 in upper axillary buds, NtARF28, NtARF30, and NtARF33 in lower axillary buds, whose middle domain is rich in proline (P), serine (S), and threonine (T). The ARF genes with significantly up-regulating expression were transcriptional activators, such as NtARF32 and NtARF39 in lower axillary buds, whose middle domain is rich in glutamine (Q)-rich middle region ( Figure 10). These results further support the model of gene response to auxin in that the ARF genes in the upper axillary buds can respond to high auxin concentrations in the control of apical dominance, and the activity of these proteins are repressed when auxin concentrations decrease as a result of removing apical dominance by topping.   The increased expression of many REM genes shows that REM genes regulate the development of axillary buds. In addition, the expression level of NtRAV1 in the upper axillary buds was significantly lower, and NtRAV2 and NtRAV5 in the lower axillary buds were down-regulated in response to topping treatment, which indicated that the RAV family plays a crucial role in axillary bud growth and development. It is worth noting that the sequence homology between NtRAV2 and NtRAV5 is >90%, and the gene structures and protein structures, as well as the changes in expression in response to topping are very similar, indicating that these two genes might have redundant functions. In the LAV family, the NtLAV5 gene was found to be down-regulated in the upper axillary buds and up-regulated in the lower axillary buds after topping. We speculate that NtLAV5 regulates the growth of axillary buds via auxin pathways. The sum of all the evidence presented here shows that B3 gene family is very important in tobacco growth and development, especially with respect to axillary bud growth and development. Therefore, our systematic analysis of the B3 gene family in tobacco will be of great value for future research into the molecular mechanisms that control growth and development in tobacco, especially the growth and development of the axillary buds.  Table S1: List of primers used in quantitative real time-PCR analysis; Table S2: Characteristics of B3 transcription factor superfamily in tobacco; Table S3: Detail information of B3 Gene ID and Chromosomal location; Table S4: Summary of functional domains present in tobacco B3 proteins; Figure S1: ML original tree of the 114 tobacco B3 proteins.; Figure S2: ML bootstrap consensus tree of LAV subfamily which contains five and six proteins in tobacco and Arabidopsis, respectively.; Figure S3: ML bootstrap consensus tree of RAV subfamily which contains five and thirteen proteins in tobacco and Arabidopsis, respectively.; Figure S4: ML bootstrap consensus tree of ARF subfamily which contains 45 and 23 proteins in tobacco and Arabidopsis, respectively; Figure S5: ML bootstrap consensus tree of REM subfamily which contains 59 and 76 proteins in tobacco and Arabidopsis, respectively; Figure S6: Protein structures of B3 family members in tobacco.; Figure S7: Sequence logos of the B3 domain which is represented by Motifs 1, 2, and 6.; Figure S8: Sequence logos of the ARF domain which is represented by Motif 10; Figure S9: Sequence logos of the AUX/IAA domain which is represented by Motif 8; Figure S10: qRT-PCR analysis of all the LAV and RAV subfamily genes.; Figure S11: qRT-PCR analysis of 43 ARF subfamily genes. The expression levels of NtARF19 and NtARF42 were not detected; Figure S12: qRT-PCR analysis of 57 REM subfamily genes. The expression levels of NtREM14 and NtREM39 were not detected; Supplementary Data S1: Sequences of the ten conserved motifs identified using MEME tools from all the B3 proteins in tobacco.
Author Contributions: D.W. and Y.S. conceived and designed the research. F.X., T.S., S.Y., J.C., X.L., G.L. and D.W. performed technical work for bioinformatic analysis. F.X., X.W., J.H., and M.C. performed technical work for expression pattern analysis. F.X., T.S., and D.W. analyzed and summarized the data. F.X. and D.W. wrote the manuscript. All authors read and approved the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.