Genome-Wide Characterization and Expression Analysis of GeBP Family Genes in Soybean

The glabrous-enhancer-binding protein (GeBP) family is a family of plant-specific transcription factors whose members share a central DNA-binding domain. Previous studies have shown that GeBP genes are involved in the control of cell expansion but not cell proliferation in Arabidopsis. However, there has not yet been a comprehensive analysis of GeBP gene function in soybean (Glycine max L.). Here, we identified and named nine GmGeBP genes in the soybean genome. These genes were distributed on 7 of the 20 chromosomes, and their intron numbers ranged from zero to one. In the phylogenetic tree, 52 GeBP genes from four plant species clustered into four major groups. RNA-seq analysis showed that eight of the nine GmGeBP genes were differentially expressed across 14 tissues. Additionally, among the nine GmGeBP genes, only GmGeBP4 was highly expressed in abnormal trichome soybeans, and it was therefore predicted to be involved in trichome development. This genome-wide analysis provides an overview of the evolution and functions of GmGeBP genes in two kinds of soybean plants. These results will help to clarify the potential functions and characteristics of GmGeBP genes in the soybean life cycle.


Introduction
Soybean is an important oil and food crop with a high protein content, and it provides a vital source of human food, animal feed and cooking oil around the world [1]. However, soybean plants are often exposed to harmful environmental conditions, such as salinity, drought, pH and temperature variations and heavy metals, which limit soybean production. To alleviate the damage caused by such conditions, plants have developed a variety of protective structures during evolution [2]. For example, epidermal hairs, a specialized epidermal structure, are widely distributed amongst land plants and protect them from environmental stress [3,4].
Transcription factors (TFs) are important regulators of developmental processes and stress responses. In plants, TFs often bind cis-acting elements to alter gene expression in response to stress, eventually protecting against or reducing damage to the plant. The glabrous-enhancer-binding protein (GeBP) family is a family of transcription factors specific to plants, whose members share a central DNA-binding domain [3,5]. In Arabidopsis, GLABROUS1 (GL1) was one of the first genes found to regulate the initiation of epidermal hair growth, and the glabrous-enhancer-binding protein can regulate the occurrence of epidermal hairs by adjusting the expression pattern of GL1 [6]. Conserved motif analysis of the Arabidopsis GeBP sequence revealed a basic amino acid region and a leucine zipper region, suggesting that GeBP may belong to the bZIP transcription factor family.
However, the interval between these two conserved regions in GeBP exceeds 9 residues, which does not fit the standard definition of a bZIP protein [7,8]. The basic structure of a typical bZIP conserved domain is N-x7-R/K-x9-L-x6-L-x6-L, in which the basic region and the Leu zipper region are separated by 9 residues. Among the 21 GeBP family members in Arabidopsis, research has focused on GeBP and the GeBP-LIKE (GPL) proteins, which form a distinct clade and share an additional, as yet undefined, conserved region [3].
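The spacing rule above can be illustrated with a regular expression. The following is a minimal sketch, assuming the signature is read literally as N-x7-R/K-x9-L-x6-L-x6-L with x standing for any residue; the function names and the alanine-padded test sequences are hypothetical and are not real GeBP or bZIP proteins.

```python
import re

# Signature quoted in the text: N-x7-R/K-x9-L-x6-L-x6-L
# (x = any residue; the basic region and the first Leu are 9 residues apart).
CANONICAL_SPACER = 9


def bzip_pattern(spacer: int = CANONICAL_SPACER) -> re.Pattern:
    """Compile the bZIP-like signature with a configurable spacer length."""
    return re.compile(rf"N.{{7}}[RK].{{{spacer}}}L.{{6}}L.{{6}}L")


def has_signature(protein: str, spacer: int = CANONICAL_SPACER) -> bool:
    """Return True if the protein contains the signature with this spacer."""
    return bzip_pattern(spacer).search(protein.upper()) is not None


def make_synthetic(spacer: int) -> str:
    """Hypothetical test sequence (not a real protein), padded with alanine."""
    return "N" + "A" * 7 + "R" + "A" * spacer + "L" + "A" * 6 + "L" + "A" * 6 + "L"
```

With these definitions, has_signature(make_synthetic(9)) is true, whereas a sequence built with a longer spacer, such as make_synthetic(12), fails the canonical test and matches only when the spacer is relaxed to 12. This mirrors the argument that a protein with a longer interval falls outside the standard bZIP definition.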
It is worth noting that the GeBP/GPL genes represent a newly defined class of leucine zipper (Leu zipper) transcription factors, and they play redundant roles in the regulation of the cytokinin hormone pathway [9]. A recent study identified the Arabidopsis GeBP-LIKE 4 (GPL4) transcription factor as an inhibitor of root growth that is induced rapidly in the root tips in response to cadmium (Cd) [10]. These findings suggest that GeBP family genes are involved not only in plant development but also in protection against environmental stress. The constitutive expressor of pathogenesis-related genes 5 (CPR5) in Arabidopsis (Arabidopsis thaliana) displays highly pleiotropic functions, particularly in pathogen responses, cell proliferation, cell expansion and cell death. GeBP/GPLs were found to be involved in the control of cell expansion in a CPR5-dependent manner, but not in the control of cell proliferation, by regulating a set of genes that represents a subset of the CPR5 pathway [11].
The soybean is an ancient tetraploid and a major oil and protein crop around the world. Although GeBP gene members in Arabidopsis and rice have been characterized [3], a comprehensive genome-wide analysis of GmGeBP genes in soybean is still lacking. Hence, we identified and characterized the GmGeBP gene family in soybean at the genome-wide level. The cis-elements of the GmGeBP promoter regions were also analyzed. The expression patterns of GmGeBP genes in several tissues and under different treatments such as drought, salinity, cold and heat were investigated using publicly available transcriptome data and our quantitative real-time polymerase chain reaction (qRT-PCR) results. Our study not only broadens the available information on the GmGeBP family in soybean, but also provides a useful resource and new insights for future functional analyses of GmGeBP genes.

Identification of GeBP Gene Family in Soybean
The nine members of the GmGeBP gene family were distributed on 7 of the 20 soybean chromosomes; GmGeBP3 and GmGeBP4 were located on chromosome 10, and GmGeBP8 and GmGeBP9 on chromosome 20 (Table 1). The bioinformatics analysis showed that the nine members differed in amino acid length, molecular weight (MW) and isoelectric point (pI). The amino acid lengths ranged from 353 to 448 residues, the molecular weights ranged from 39.647 to 49.235 kDa and the pI values ranged from 4.65 to 9.08 (Table 1). The pI values differed little among the GmGeBP proteins, with the exception of GmGeBP2, which was basic. WoLF PSORT was used to predict the subcellular locations of all GmGeBP proteins. Eight of the GmGeBP proteins were most likely localized in the nucleus, while the GmGeBP8 protein was predicted to be localized in the cytoplasm.

Chromosomal Distribution and Intron-Exon Patterns of GmGeBP Genes
As shown in Figure 1A, the physical positions of the GmGeBP genes were obtained from the Phytozome database and used to map the genes to their corresponding chromosomes. Seven of the twenty chromosomes carried GmGeBP genes, with one or two genes per chromosome. Chromosomes 10 and 20 each harbored two GmGeBP genes, while each of the remaining five chromosomes (chromosomes 3, 5, 13, 15 and 19) contained one. The location of each GmGeBP gene on the soybean chromosomes is marked in Figure 1A, and the detailed data are shown in Table 1. Gene duplication events were also analyzed, and the segmentally duplicated pairs are connected by blue lines in Figure 1.
Table 1 notes: bp, base pair; aa, amino acids; Da, Dalton. WoLF PSORT predictions: chlo (chloroplast), cyto (cytosol), nucl (nucleus), E.R. (endoplasmic reticulum), mito (mitochondria), plas (plasma membrane), extr (extracellular), cysk (cytoskeleton), vacu (vacuolar membrane). TargetP predictions: C (chloroplast), M (mitochondrion), S (secretory pathway), - (any other location); values indicate score (0.00-1.00) and reliability class (1-5; best class is 1).
Gene structure divergence plays an important role in the evolution of gene families and provides additional evidence for the analysis of phylogenetic relationships [12]. Therefore, the intron-exon structures of the GmGeBP genes were drawn using the Gene Structure Display Server (Figure 1B). The analysis showed that the GmGeBP genes contained at most one intron: two genes (GmGeBP2 and GmGeBP4), which clustered similarly, had one intron each, and the remaining seven genes had none. This suggested that the structure of the GmGeBP genes was relatively stable and not prone to alternative splicing. The CDS sequence data and genomic sequence data are shown in Supplementary Files S1 and S2.

Conserved Motif Analysis of GmGeBP Proteins
The MEME analysis revealed three conserved motif patterns among the GmGeBP proteins (Figure 1C). The number of motifs in each GmGeBP protein ranged from five to eight. GmGeBP1 and GmGeBP7 contained eight conserved motifs, and GmGeBP3 and GmGeBP9 contained seven. Additionally, four of the nine GmGeBP proteins (GmGeBP2, GmGeBP4, GmGeBP5 and GmGeBP8) contained six conserved motifs, while GmGeBP6 had only five. Motifs 1, 2, 3 and 4 were present in all of the GmGeBP proteins, which indicated that they were conserved and might play important roles. Motif 5 was present in GmGeBP1, 3, 4, 7, 8 and 9, which could be regarded as one clade. Motif 6 occurred at two locations in the GmGeBP1 and GmGeBP7 proteins. Furthermore, motif 7 was located at the beginning of the GmGeBP1, GmGeBP7, GmGeBP4 and GmGeBP9 proteins. In general, closely related GmGeBP proteins on adjoining branches of the phylogenetic tree had the same or a similar motif composition. For example, GmGeBP1 and GmGeBP7 both had eight conserved motifs. The low sequence diversity of the GmGeBP protein domains suggested that the GmGeBP family members remained stable after genome duplication.

Phylogenetic Analysis of GmGeBP Genes in Plants
To investigate the phylogenetic relationships and the evolutionary history of GeBP proteins in soybean, a total of 52 GeBP proteins from Arabidopsis thaliana, Glycine max, Oryza sativa and Medicago sativa L. were used to construct an unrooted phylogenetic tree with the neighbor-joining method. As shown in Figure 2, the GeBP members from the different plant species were divided into four major groups, named I, II, III and IV. Group I was the largest subfamily, with 21 members, while group II was the smallest, with four OsGeBP members. Group I could be further subdivided into two subgroups (a and b) according to the bootstrap values. GmGeBP5, GmGeBP6, GmGeBP2 and the MtGeBP genes clustered in subgroup b. Group III contained the majority of the GmGeBP genes; within this clade, GmGeBP4 grouped with GmGeBP8, GmGeBP3 with GmGeBP9 and GmGeBP1 with GmGeBP7. Overall, the GmGeBP genes were more closely related to the lucerne genes than to those of the other species, which is consistent with the fact that soybean and lucerne are both legumes. Most GmGeBP proteins in the same group shared identical motifs, and the related genes had similar exon-intron patterns.

Analysis of Cis-Elements in Putative GmGeBP Gene Promoters
To investigate the regulation patterns of GmGeBP genes, the cis-elements of the GmGeBP gene promoters were analyzed. The 2000 bp sequence upstream of the start codon of each GmGeBP gene was obtained from Phytozome. The cis-elements were divided into seven categories: stress response, hormone response, light response, promoter-related, development-related, site-binding-related and unknown. Eleven stress-responsive cis-elements were identified, including HSE, LTR, box S, TC-rich repeats, W box, WUN-motif, MBS, MBSI, MBSII, GC-motif and Box-W1, which are associated with plant responses to heat, low temperature, defense stresses, drought, anaerobic induction and fungal elicitors. Six kinds of hormone-responsive cis-elements were identified, including elements responsive to salicylic acid, MeJA, gibberellins, auxin and ethylene (Figure 3). Nine kinds of development-related cis-elements were identified, namely as-2-box, circadian, AC-I, AC-II, O2-site, MSA-like, ERE, GCN4_motif and Skn-1_motif, which are associated with shoot-specific expression, circadian control, phloem expression, zein metabolism regulation, cell cycle regulation, ethylene-induced expression and endosperm expression. A relatively large number of light-responsive, promoter-related and site-binding-related cis-elements were also observed in the GmGeBP promoters (Supplementary Files S5 and S6).

RNAseq Analysis of GmGeBP Genes in Various Tissues
To further investigate the expression patterns of GmGeBP genes during soybean development, RNA-seq atlas data were used to analyze the expression profiles in 14 different tissues (young leaf, flower, one cm pod, pod shell 10 days after flowering (DAF), pod shell 14 DAF, seed 10 DAF, seed 14 DAF, seed 21 DAF, seed 25 DAF, seed 28 DAF, seed 35 DAF, seed 42 DAF, root and nodule) of the soybean cultivar Williams 82. The RNA-seq atlas data for the GmGeBP genes were downloaded from Soybase (http://soybase.org/soyseq/, accessed on 23 October 2021) [13]. No RNA-seq atlas data were available for GmGeBP2, which might indicate that this gene is a pseudogene or is expressed only at specific developmental stages or under special conditions. As shown in Figure 4, hierarchical clustering grouped the remaining eight GmGeBP genes into two main classes. Class I contained two members (GmGeBP4 and GmGeBP8) that were not expressed in any of the fourteen tissues. Class II contained six members (GmGeBP1, 3, 5, 6, 7 and 9), all of which were clearly upregulated in all fourteen tissue types. Within this class, GmGeBP3 and GmGeBP9 showed expression patterns that differed from those of the other members. From the young leaf stage to the flower stage, two genes (GmGeBP5 and GmGeBP7) were downregulated, while the other six GmGeBP genes were upregulated. Half of the genes showed marked peaks in transcript abundance in the one cm pod, and four genes (GmGeBP1, 3, 5 and 7) were downregulated at the two pod-shell stages. At the seed stage, the expression of four genes (GmGeBP3, 5, 6 and 7) increased at 14 DAF, 25 DAF and 35 DAF but fell at 21 DAF, 28 DAF and 42 DAF. Meanwhile, the expression of the other two genes dropped at 14 DAF, 25 DAF and 35 DAF but increased again at 21 DAF, 28 DAF and 42 DAF. Most genes in this class, with the exception of GmGeBP3 and GmGeBP9, were significantly expressed in the roots and nodules.

Expression Analysis of Soybean GeBP Genes in Response to Trichome Development
To compare the expression profiles of GmGeBP genes between trichome and abnormal trichome soybeans (Figure 5), qRT-PCR was used to measure the expression of the nine GmGeBP genes. As shown in Figure 5, six of the nine genes (66.7%; GmGeBP1, GmGeBP2, GmGeBP4, GmGeBP5, GmGeBP8 and GmGeBP9) were upregulated in abnormal trichome soybean compared with trichome soybean, whereas three genes (GmGeBP3, 6 and 7) were weakly downregulated. Only GmGeBP4 was highly expressed in abnormal trichome soybean, while the other genes exhibited no obvious changes.

Discussion
Based on the following observations, GeBP as a TF is predicted to play a role in hormone pathways. First, the GeBP protein binds to a cis-regulatory element of the GLABROUS1 gene, which is a myb gene involved in epidermal cell determination and regulated by the hormone gibberellin (GA) [6,14,15]. Second, the transcript levels of GeBP are positively regulated by BREVIPEDICELLUS (BP), a gene of the KNOTTED1 homeodomain (KNOX) family that positively regulates the cytokinin pathway in the shoot apical meristem (SAM) [16][17][18]. A preliminary analysis of the GeBP gene family has been performed in Arabidopsis and rice. However, this family has not previously been studied in soybean. Therefore, a systematic analysis of the GmGeBP gene family was performed, and gene expression patterns were compared between abnormal trichome and trichome soybeans.
In the present study, nine GmGeBP gene family members were identified in the soybean genome using bioinformatics methods. Segmental and tandem duplications are known to have played a role in the evolution and expansion of gene families in plants [19]. Gene duplication events were therefore also examined for the GmGeBP genes (Supplementary File S8). No GmGeBP gene was identified as the product of a tandem duplication event.
In contrast, four pairs of GmGeBP genes were found to be segmental duplicates. Thus, 8 of the 9 GmGeBP genes (88.9%) were duplicated genes, which suggested that gene duplication played a major role in the expansion of the GmGeBP family.
The analysis of the gene structure showed that GmGeBP2 and GmGeBP4 each contained one intron. It has been reported that two GeBP genes in rice contain one intron, and that two GeBP genes in tomato contain seven introns and one intron, respectively. Consequently, our result indicated that the structures of GeBP genes were stable. Furthermore, three pairs of GmGeBP genes (GmGeBP1 and 7, GmGeBP3 and 9, GmGeBP5 and 6) exhibited similar intron-exon structures and intron numbers, indicating high conservation during evolution and good agreement with the groupings defined in the phylogenetic analysis (Figure 2). The phylogenetic analysis revealed that the GeBP genes from the different plant species were separated with a high bootstrap value (95%). The analysis including Arabidopsis thaliana, rice and lucerne indicated that the genes were mainly classified into four groups, consistent with previous studies. The majority of the GmGeBP proteins in the same group shared consensus motifs, and the related genes had similar exon-intron structures.
Promoters in the upstream regions of genes play important roles in the developmental and environmental regulation of gene expression, and may therefore provide useful information for investigating the functions of GmGeBP genes at different developmental stages [20]. Thus, the cis-elements in the promoter sequences were predicted using PlantCARE. The cis-elements identified in the GmGeBP promoters had diverse functions; for example, they could participate in the response to drought, heat, low temperature, anaerobic induction and defense stresses. They were also involved in induction by GA, MeJA, auxin and ethylene. Among these cis-elements, ABRE, TGACG-motif, TATC-box, TCA-element, ARE and AuxRE are related to abscisic acid (ABA), MeJA, gibberellin, salicylic acid, anaerobic induction and auxin, respectively [21][22][23][24]. These results will contribute to further understanding the various functional roles of GmGeBP genes in trichome formation and in the response to adverse conditions.
To reveal the potential functions of GmGeBP genes, the expression profiles of eight GmGeBP genes in different tissues were analyzed. According to the RNA-seq analysis, these genes showed three kinds of expression patterns: no expression, constitutive expression and tissue-specific expression. GmGeBP4 and GmGeBP8 showed little expression across all fourteen tissues and developmental stages. Meanwhile, four genes (GmGeBP1, 5, 6 and 7) were continuously highly expressed in all tested tissues. Two genes (GmGeBP3 and 9) showed tissue-specific expression patterns and were mainly expressed in the roots and nodules.
Previous studies have shown that GeBP genes are involved in the regulation of cell expansion in a CPR5-dependent manner, but not in the regulation of cell proliferation [25]. A specific phenotype of the cpr5 mutant is abnormal trichome development, which is not found in any other constitutive pathogen response mutant [26,27]. To further clarify the role of GeBP genes in the development of soybean trichome cells, we compared the expression of the nine GmGeBP genes between abnormal and normal trichome soybeans via qRT-PCR. It has been reported that the GeBP/GPL genes play an inhibitory role in cell expansion by counteracting the positive role of CPR5 [11]. In our results, one GmGeBP family member (GmGeBP4) displayed increased expression in abnormal trichome soybean compared with trichome soybean. Thus, we speculate that GmGeBP4 might be related to the regulation of CPR5-dependent processes involved in trichome formation.
Taken together, we performed a genome-wide analysis of GmGeBP genes in soybean and obtained a range of information, including sequence, gene duplication, conserved motif, gene structure and phylogenetic relationship data. The promoter analysis and the expression patterns of GmGeBP genes in different tissues helped to reveal the potential functions of members of the GmGeBP gene family. Finally, the expression levels of GmGeBP genes in abnormal trichome soybean and trichome soybean provided clues for further studies of the roles of GeBP genes in the CPR5-related cell expansion process.

Plant Materials and Stress Treatment
The trichome soybean seeds "A3127" and abnormal trichome soybean seeds "Guoyu98-2" were provided by Yumin Wang. The seeds were germinated and grown in a controlled chamber (25 °C/20 °C, day/night, 16 h/12 h light/dark cycle) [28]. The leaves were collected, frozen immediately in liquid nitrogen and stored at −80 °C. Three biological replicates and three technical replicates were performed for each experiment.

Identification of GeBP Genes in Soybean
In this study, the Soybase database (http://www.soybase.org, accessed on 15 September 2021) was used to download the information for soybean GeBP genes, including the gene locations, gene sequences, CDS sequence data and protein information (Supplementary Files S1-S3) [13]. The physicochemical parameters were obtained from ExPASy (http://www.expasy.org/tools/, accessed on 15 September 2021). The ORF length, chromosome location, number of exons and gene strand for each gene were obtained from the Phytozome database (http://www.phytozome.net/soybean.php, accessed on 15 September 2021).
Furthermore, potential GeBP genes from three other species were selected to analyze the evolutionary relationships among the different plant species, and these GeBP genes were confirmed using the same method as above.

Subcellular Localization, Conserved Motifs and Gene Structure Analysis of GmGeBP Proteins
First, the subcellular localizations of the GmGeBP proteins were predicted using ProtComp 9.0. Then, the conserved motifs of the GmGeBP proteins were obtained from MEME version 4.11.0 (http://meme-suite.org/tools/meme, accessed on 15 September 2021). The gene structure analysis was performed using the Gene Structure Display Server (GSDS; http://gsds.cbi.pku.edu.cn/, accessed on 15 September 2021). Genes were identified as segmental duplicates if they were located on duplicated chromosomal blocks [29]. Paralogs were classified as tandem-duplicated genes if the two genes were separated by five or fewer genes within a 100 kb region [30].
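The tandem-duplication criterion above can be expressed as a small check. This is a minimal sketch only: the Gene record, the function name and the way the 100 kb region is measured (from the smaller start to the larger end of the two genes) are assumptions made for illustration, not the procedure used in [30].

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are in base pairs."""
    name: str
    chrom: str
    start: int
    end: int


def is_tandem_pair(a: Gene, b: Gene, all_genes: Iterable[Gene],
                   max_intervening: int = 5, window_bp: int = 100_000) -> bool:
    """Apply the 'five or fewer intervening genes within 100 kb' rule.

    all_genes must contain every annotated gene, including a and b,
    so that intervening genes can be counted.
    """
    if a.chrom != b.chrom:
        return False
    # Rank the genes on this chromosome by start position.
    ordered = sorted((g for g in all_genes if g.chrom == a.chrom),
                     key=lambda g: g.start)
    intervening = abs(ordered.index(a) - ordered.index(b)) - 1
    # Assumption: the region spans both genes end to end.
    span = max(a.end, b.end) - min(a.start, b.start)
    return intervening <= max_intervening and span <= window_bp
```

Under this reading, two paralogs 30 kb apart with one gene between them would count as a tandem pair, while paralogs 500 kb apart or on different chromosomes would not.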

Chromosomal Location and Phylogenetic Tree Construction of GeBPs
Information about the chromosomal locations of GmGeBP genes was obtained from the Soybase database. The multiple sequence alignment of a total of 52 GeBP genes was performed with Clustal X2.0 software using default parameters [31], which were as follows: the gap opening and extension penalty of the pairwise alignment were 10 and 0.1, respectively; the gap opening and extension penalty of the multiple alignment were 10 and 0.2, respectively; the separation distance was 4. The phylogenetic tree was constructed with MEGA version 11.0 using the neighbor-joining (NJ) method and a bootstrap analysis with 1000 replications [32].

Promoter Analysis
Regions 2000 bp upstream of the translation start site of each GmGeBP gene were downloaded from Soybase (Supplementary File S4). Cis-elements in promoters of each GmGeBP gene were obtained using the PlantCARE server (Supplementary Files S5 and S6).
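Extracting an upstream region depends on the strand of the gene. The following is a minimal sketch of that step, assuming 1-based inclusive CDS coordinates given on the forward strand; the function names and the toy sequence are hypothetical, and this is not the Soybase or PlantCARE implementation.

```python
# Translation table for complementing DNA bases (N is kept as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, cds_start: int, cds_end: int,
                    strand: str, length: int = 2000) -> str:
    """Return up to `length` bp upstream of the translation start site.

    cds_start and cds_end are 1-based, inclusive, forward-strand
    coordinates. The result is oriented 5'->3' relative to the gene and
    is shorter than `length` if the chromosome end is reached first.
    """
    if strand == "+":
        # The start codon begins at cds_start; take the bases before it.
        lo = max(0, cds_start - 1 - length)
        return chrom_seq[lo:cds_start - 1]
    if strand == "-":
        # The start codon lies at cds_end; upstream is to its right.
        hi = min(len(chrom_seq), cds_end + length)
        return reverse_complement(chrom_seq[cds_end:hi])
    raise ValueError("strand must be '+' or '-'")
```

For a minus-strand gene the upstream bases lie at higher forward-strand coordinates, so the slice is reverse-complemented to keep the promoter in the same orientation as the gene before any cis-element search.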

RNA Extraction and qRT-PCR Analysis
The total RNA was extracted from fresh samples using the Trizol method according to the manufacturer's instructions. The quality of the RNA was checked via agarose gel electrophoresis before reverse transcription. The first-strand cDNA synthesis was performed according to the manufacturer's instructions. The gene-specific primers for the GmGeBP genes were designed using Primer 5.0 (Supplementary File S7), and the raw data from the qRT-PCR analysis are shown in Supplementary File S9. GmActin (LOC100792119) was used as an internal control. Three biological replicates were used per sample, and each sample was run with three technical replicates. The relative expression level was calculated using the 2^(−ΔΔCT) method [33].
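For reference, the relative expression calculation can be sketched as follows. This is a minimal illustration of the standard 2^(−ΔΔCT) formula, assuming that technical replicates are averaged before the calculation; the function name is hypothetical and any CT values passed to it in examples are invented, not data from this study.

```python
from statistics import mean
from typing import Sequence


def relative_expression(ct_target_sample: Sequence[float],
                        ct_reference_sample: Sequence[float],
                        ct_target_control: Sequence[float],
                        ct_reference_control: Sequence[float]) -> float:
    """Fold change of a target gene by the 2^(-ddCT) method.

    Each argument holds the technical-replicate CT values for one
    gene/condition pair; the reference gene here would be GmActin.
    """
    # dCT: target normalized to the reference gene within each condition.
    dct_sample = mean(ct_target_sample) - mean(ct_reference_sample)
    dct_control = mean(ct_target_control) - mean(ct_reference_control)
    # ddCT: sample condition relative to the control condition.
    ddct = dct_sample - dct_control
    return 2.0 ** (-ddct)
```

As a worked example with invented numbers, a target CT of 22 against a reference CT of 18 in the sample (ΔCT = 4), compared with 24 against 18 in the control (ΔCT = 6), gives ΔΔCT = −2 and a fold change of 2^2 = 4.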

Conclusions
In this study, we identified a GeBP gene family containing nine GmGeBP genes in soybean, which were distributed on 7 of the 20 chromosomes. The GmGeBP proteins had three conserved motif patterns, indicating that their function is conserved. We also identified nine kinds of development-related cis-elements and six kinds of hormone-responsive cis-elements in the putative GmGeBP gene promoters, indicating that the gene family is involved in multiple life processes. The trichome is a specialized epidermal cell that protects plants from environmental stress and is an ideal model system for studying cell differentiation, cell cycle regulation and morphogenesis. In our results, one of the GeBP gene family members, GmGeBP4, was highly expressed in abnormal trichome soybean compared with normal trichome soybean, and was predicted to be involved in trichome development. Our study provides a theoretical basis for future functional studies of GmGeBP4 in trichome development.