Genome-Wide Identification and Expression Analysis of the HD-Zip Gene Family in Wheat (Triticum aestivum L.)

The homeodomain-leucine zipper (HD-Zip) gene family, as plant-specific transcription factors, plays an important role in plant development and growth as well as in the response to diverse stresses. Although HD-Zip genes have been extensively studied in many plants, they had not yet been studied in wheat, especially those involved in response to abiotic stresses. In this study, 46 wheat HD-Zip genes were identified using a genome-wide search method. Phylogenetic analysis classified these genes into four groups, numbered 4, 5, 17 and 20 respectively. In total, only three genes with A, B and D homoeologous copies were identified. Furthermore, the gene interaction networks found that the TaHDZ genes played a critical role in the regulatory pathway of organ development and osmotic stress. Finally, the expression profiles of the wheat HD-Zips in different tissues and under various abiotic stresses were investigated using the available RNA sequencing (RNA-Seq) data and then validated by quantitative real-time polymerase chain reaction (qRT-PCR) to obtain the tissue-specific and stress-responsive candidates. This study systematically identifies the HD-Zip gene family in wheat at the genome-wide level, providing important candidates for further functional analysis and contributing to the better understanding of the molecular basis of development and stress tolerance in wheat.


Introduction
The homeodomain-leucine zipper (HD-Zip) gene family is one of the key transcription factors in plants, playing a vital role in various abiotic stresses and signal transduction [1][2][3]. Generally, HD-Zip proteins possess the conserved HD domain, acting as a specific DNA binding site at the C-terminal, together with the adjacent leucine-zipper (LZ) motif that is responsible for protein dimerization [4,5]. Based on the sequence homology, DNA binding specificity and physiological functioning, HD-Zip genes were further divided into four groups, namely HD-Zip I, II, III and IV [6]. Members of groups I and II interacted with similar DNA binding sites of the pseudo-palindromic sequence CAATNATTG [5,7]. However, the members of group II encoded an additional common CPSCE motif consisting of five conserved amino acids-Cys, Pro, Ser, Cys and Glu-which acted as a sensory domain to redox the cell state and were located downstream of the Zip domain [8,9]. Group III and IV contained steroidogenicacute regulatory protein-related lipid transfer (START)

Genome-Wide Identification of HD-Zip Gene Family in Wheat
Wheat genome and protein sequences (release 36, accessed in 2017) were obtained from the Ensemble plants database [43] to predict HD-Zip genes. These sequences were first used to construct a local protein database with which to search against known HD-Zip protein sequences collected from A. thaliana (48) and O. sativa (48), through a local protein basic local alignment search (BLASTP) program (https://blast.ncbi.nlm.nih.gov) with an E-value cut-off <10 −5 and an identity of 50% as the threshold. The hidden Markov model (HMM) profile of the conserved HD domain of homeobox (PF00046) and the leucine zipper (LZ) domain (PF02183) sequences were download from the PFAM database [44] and used to examine all wheat protein sequences using the HMMER search tool [45]. After manual curating, the obtained protein sequences were checked using theNational Center for Biotechnology Information (NCBI)-Conserved domain database (CDD) search [46] to identify the conserved protein domain with the default parameters. The redundant sequences containing complete HD and LZ domains were further removed by alignment and the remaining ones was considered as putative wheat HD-Zip genes (TaHDZ). The coding sequences of these HD-Zip genes were retrieved from the wheat genome annotation information. Finally, Compute pI/MW tool in ExPASy database [47] was used to calculate the biochemical parameters of TaHDZs. Sub-cellular localization of these genes was predicted by Cell-Ploctool software [48].

Phylogenetic Analysis and Gene Duplication
The Arabidopsis HD-Zip proteins were downloaded from the Arabidopsis information resource (TAIR) database (http://web.arabidopsis.org) and those of rice were obtained from the rice genome annotation project (http://rice.plantbiology.msu.edu/). To investigate the evolutionary relationships among these TaHDZ genes, the ClutsalX1.83 program [49] was used to align the protein sequences of the wheat, rice and Arabidopsis HD-Zip genes (Supplementary file 1). The phylogenetic tree was constructed using MEGA6.0) [50] with the neighbor-joining (NJ) method and 1000 bootstrap replications.
The chromosome localizations of these genes were analyzed by mapping the gene sequences back to chromosome using the nucleotide basic local alignment search (BLASTN) program (https://blast. ncbi.nlm.nih.gov) with the E-value cutoff <10 −5 and the best hits were identified. Gene duplication was investigated following the method as described by Wang et al. [51]. Based on chromosome position and phylogenetic relationship, the homoeologous copies distributed in three sub-genomes A, B or D of wheat were identified. Then, the Circos tool [52] was used to visualize the duplicated regions in the wheat genome and the same colors show the homoeologous chromosomal segments.

Gene Structure and Protein Conserved Motifs Analysis
The gene structure information was obtained from the Ensemble plants database [43] and displayed in the gene structure display server (GSDS) program [53]. Conserved motifs of these genes were determined using the multiple EM for motif elicitation (MEME) program [54] with the parameters as follow: optimum motif widths of 6-200 residues and a maximum of 20 motifs. The schematic diagram of the amino acid motifs for each TaHDZ gene was drawn accordingly.

Expression Profile Analysis of TaHDZ Genes
The publicly available wheat RNA-Seq datasets were downloaded from the URGI database [55] and the NCBI sequence read archive (SRA) database (https://www.ncbi.nlm.nih.gov/sra), then used to analyze the expression profiles of the identified TaHDZ genes (Table S2). A total of 5 tissues including root, stem, leaf, spike and grain, as well as four stress treatments including cold (SRR1460552), heat (SRR1542413), drought (SRR1542409) and salt (SRR2306546) were used to identify tissue-specific or stress-responsive ones. Evaluation of the quality of RNA-Seq reads and trimming of the low-quality readings with a Phred quality (Q) score < 20 were performed using FastQCv0.11.5 [56]. Then, TopHat version 2.1.1 [57] was used to map the RNA-Seq reads to the wheat genome (release v36). Cufflinks Version 2.2.1 [58] was used to calculate the value of fragments per kilo base of transcript per million fragments mapped (FPKM) of these genes with the default parameters. The expression level was first normalized and then the log 10 -transformed values were used for visualizing the heat map using the R software (https://www.r-project.org/).

Interaction Network of TaHDZ Genes
To predict the regulatory role of TaHDZ genes, the interaction networks of TaHDZs with other wheat genes was constructed based on the orthologous relationship between Arabidopsis and wheat using the AraNetV2 tool [59]. The Arabidopsis gene ontology (GO) biological processes were used as the major reference set. The Cytoscape plugin, BiNGO [60], was used to analyze the orthologous gene and identify the biological pathways of the specific gene sets.

Plant Materials, Growth Conditions and Abiotic Stress Treatments
Seeds were planted in pots and filled with 1/2 Hoagland's liquid medium and grown in a greenhouse at 22 • C, with a 16 h photoperiod (12,000 lux) and 8 h dark period. Wheat variety Dekang No. 685 (DK, salt-tolerant) and Chinese spring (CS, salt-susceptible) were used for the salt treatment. Besides, Hanxuan No. 10 (HX, drought-tolerant) and CS were used for the drought treatment. For salt stress, two-week seedlings of DK and CS were treated with 200 mM sodium chloride (NaCl) for 24 h. HX and CS were treated with 19.2% polyethylene glycol (PEG) for 24 h to represent the drought treatment. Then, the plants materials were collected and immediately frozen in liquid nitrogen for RNA extraction. All samples were replicated three times.

Quantitative Real-Time Polymerase Chain Reaction Analysis
cTotal RNA was isolated from all the collected samples using the Plant RNA isolation kit (OmegaBioTek, Norcross, GA, USA), following the manufacturer's instructions. The RNA quality was checked using 1.0% (w/v) agarose gel stained with ethidium bromide (EB) and spectrophotometer analysis and then DNase I treatment was conducted to remove the DNA contaminations (Takara, Shiga-ken, Japan). The first strand complementary DNA (cDNA)s were synthesized using the cDNA amplification kit (Vazyme, Nanjing, China). The TaHDZ gene primers and 18S, as the reference gene, were used for quantitative real-time polymerase chain reaction (qRT-PCR), designed using Primer Premier 5.0 software [61] and are listed in Table S1. Then, qRT-PCR was performed using CFX96Touch (Bio-Rad, Hercules, CA, USA). Three biological replicates for each sample were performed and the expression level was evaluated using the 2 −∆∆Ct method.

Identification of TaHDZ Genes in Wheat
The availability of the complete genome sequence allowed the identification and analysis of gene family at the genome level in wheat. Increasing numbers of gene families such as MAPKKK, Aux/IAA and Annexin have been reported in wheat [34,62,63]. HD-Zip, as one of the plant-specific transcription factor gene families, is vital for regulating plant development and physiological processes as well as in response to abiotic stresses. However, little is known about the HD-Zip gene family in wheat. We identified and characterized the wheat HD-Zip gene family based on a genome-wide search approach. A total of 46 non-redundant genes containing the complete HD and LZ domains were obtained, which were considered as the putative wheat HD-Zip genes. These TaHDZ proteins ranged in length from 100 to 883 amino acids, with molecular weights ranging from 14.62 kDa to 95.73 kDa and the isoelectric points ranged from 4.61 to 10.68.
Subcellular localization analysis indicated that 21 TaHDZ are localized in the nucleus, 20 in the chloroplast, whereas only three and two were found in the peroxisome and mitochondrion, respectively (Table 1). Moreover, the proteins were grouped into 28 clusters based on their phylogenetic relationship. Among them, 15 clusters were assigned to different A, B or D sub-genomes, which were considered as the homoeologous copies of one TaHDZ gene. Finally, there are 28 clusters of HD-Zip gene family in wheat and termed as TaHDZ1-A to TaHDZ28-D according to their chromosomes position ( Table 1). The size of HD-Zip family in wheat is similar to that of rice, maize and peal millet, as well as Arabidopsis with 48, 55, 52 and 48, respectively, while are much higher than that of grape (Vitis vinifera L.) with 31 members, chrysanthemum (Chrysanthemum morifolium L.) with 17 members and citrus (Citrus sinensis L.) with 27 members and much lower than that of soybean (Glycine max L.) with 88 members and poplar (Populus trichocarpa L.) with 65 members [9,[64][65][66][67]. A previous study reported that the abundance of the HD-Zip gene family was not correlated to genome size, which was mainly dependent on tandem and segmental duplications during the process of genome evolution [65]. Our results show that the HD-Zip gene family did not expand during wheat polyploidization and genome evolution. Besides, fewer duplication events of HD-Zip genes were found in wheat.

Chromosome Localization Analysis of TaHDZs
Chromosome distribution was further analyzed by BLASTN against wheat genome sequence. Results revealed that the HD-Zip genes were unevenly distributed on wheat chromosomes. In total, 12, 19 and 15 TaHDZ were located on the A, B and D sub-genome, respectively. Chromosome groups two and four had 10 HD-Zip genes each, representing the most abundant regions, followed by group five with seven. At the same time, no TaHDZ genes were found on chromosomes 3A, 5A, 6B, or 7B ( Figure 1). Gene duplication is generally the main factor causing the expansion of the given gene family [68]. As a genetically allohexaploid species, wheat has a complex origin and evolutionary history, derived from three diploid donor species through two naturally interspecific hybridization events. As aresult, each wheat gene generally has three homologous loci arising from polyploidization [69]. Though sequence similarity and chromosome localization analysis, three TaHDZ genes, including TaHDZ9, TaHDZ15 andTaHDZ17, were found to have three copies on each of the A, B and D homoeologous chromosome. Twelve TaHDZ genes (TaHDZ1, -6, -7, -8, -10, -13, -16, -20, -23, -24, -25 and -28) contain two copies in the A, B, or D homoeologous chromosome. Just one copy of the remaining 13 genes was found in wheat chromosomes (Figure 1). Previous studies revealed that inversion and crossover might occur between homologous chromosomes during polyploidization to fractionate the coding region or delete some homologous sequences [70]. Our results indicated considerable homologous genes loss may occur in the wheat HD-Zip gene family, causing the loss of some homologous copies. The retention and dispersion of specific HD-Zip genes in homologous chromosomes could provide some insights into the mechanism of wheat chromosome interaction and genome evolution.

Phylogenetic Analysis of TaHDZ Genes
To investigate the phylogenetic relationships of the HD-Zip gene family, 46 TaHDZ proteins, together with 48 and 39 publicly available Arabidopsis and rice HDZ proteins, were selected for phylogenetic analysis. Based on the classification criteria in Arabidopsis and rice [6], the wheat HDZ proteins were clustered into four groups, I to IV. Specifically, 20, 17, 4 and 5 TaHDZ proteins were classified into groups I, II, III and IV, respectively. Group I was further divide into eight sub-groups, including group I-a (3), group I-β1 (3), group I-β2 (0), group I-б (7), group I-ε (0), group I-ξ (3), group I-r (4) and group I-φ (0) (Figure 2). Among them, group I was the most abundant in wheat, rice and Arabidopsis, accounting for 43%, 35% and 35%, respectively, similar to that of grape and sorghum having a percentage of 42% and 32%, respectively [9]. In contrast, in maize group II is the largest group, accounting for 32.72% [12,65]. Furthermore, no proteins were categorized into subgroup I-β2 and I-ε in wheat, maize, or rice but more than two members were present in both soybean and Arabidopsis ( Table 2). The HD-Zip gene family distribution has been reported to be predominant with a species bias [9,12]. This result indicated that gene loss in these sub-groups might occur during the divergence of dicots and monocots. Additionally, the number of proteins in group I-б in wheat was significantly higher, with seven members, compared to one in maize, two in rice, three in Arabidopsis and two in soybean, suggesting some TaHDZ genes also expanded, although duplication events of HD-Zip genes were unusual in wheat (Table 2 and Figure 3).   I-a   Group  I-β1   Group  I-β2   Group  I-ε   Group  I-б   Group  I-ξ   Group  I-r   Group  I-φ   Group  II   Group  III   Group  IV  Total   wheat  3  3  0  0  7  3  4  0  17  4  5  46  Maize  3  4  0  0  1  5  4  0  18  5  15  55  Rice  2  3  0  0  2  4  3  0  12  5  8  39  Arabidopsis  4  2  3  2  3  0  2  1  10  5  16  48  Soybean  8  4  2  2  2  8  2  0  27  12 19 86

Co-Expression Network between TaHDZ Genes and Other Genes in Wheat
To identify the biological function and interaction relationship between TaHDZs and other wheat genes, their interaction network was constructed using an orthology-based method. A total of nine TaHDZ genes were homologous with Arabidopsis, with 191 gene pairs of network interactions found, suggesting that TaHDZ proteins are widely involved in the wheat metabolic network and regulated diverse biological processes and pathways (Figure 4). Interacting genes were classified into three major gene ontology classes using GO annotations: diverse biological process, cellular component and molecular function (Table S2). Previous studies found that AtHB7 is involved in the regulation of organ development and plays a critical role in the response to osmotic stress [71]. In this study, TaHDZ20-B was found as the orthologous gene of AtHB7 in the wheat HD-Zip family. Further analysis found that they interacted with 71 wheat genes, including the stress-responsive gene LEA, MAPKKK18, MYB and NAC, suggesting they may be mainly involved in the response to abiotic stresses (Figure 4). The identified homology genes and the putative co-expression network analysis of TaHDZs have provided useful information for the further study of the biological function and transduction pathways of HD-Zip genes in wheat.

Conserved Motifs and Expression Profile Analysis of TaHDZ Genes
Different members of gene families generally exhibit disparities in abundance in different tissues or with different stressors [72]. To gain insight about the putative functions of the TaHDZ genes, the temporal and spatial expression profile of these identified TaHDZ genes were examined using the publicly available RNA-Seq data ( Figure 5). Results showed that some tissue-specific genes, such asTaHDZ7-A/B specifically expressed in the wheat spike, were found, although most members of group I had no significant expression differences in the five organs. Furthermore, the stress-responsive group I members were also analyzed. Result found that TaHDZ15-A/B/D were expressed in normal conditions, whereas higher expression was found after 6h of cold and drought stress. Additionally, the TaHDZ8-A/B transcript level increased for up to 6h after exposure to cold, heat, drought and salt stress (Figure 5b), suggesting they play a role in response to stress.
In group II, five genes, TaHDZ10-B/D, TaHDZ25-A/D, TaHDZ28-A/D, TaHDZ3-B andTaHDZ9-A/B/D, had significant expression differentiation in different tissues, whereas TaHDZ25-A/D, TaHDZ26-D, TaHDZ27-D and TaHDZ28-A/D showed significant high expression under stresses. Among them, TaHDZ26-D and TaHDZ27-D were markedly induced by cold and heat stress, respectively. The expression patterns of group I and II genes revealed that wheat HD-Zip I and II genes are relevant for a variety of stresses. Notably, the cold-responsive genes mainly belonged to group I, whereas the salt-responsive genes belonged to group II, consistent with previous results that indicated that group I and II genes play important roles in response to various stresses in plants [73]. For groups III and IV, most of the members showed no significant expression differences in different tissues or with different stresses (Figure 5b). However, TaHDZ13-B/D, belonging to group III, had a relatively high expression in the spike and grain but weak expression in the root and leaf, suggesting involvement in the reproduction process. TaHDZ13-B/D was the one

Conserved Motifs and Expression Profile Analysis of TaHDZ Genes
Different members of gene families generally exhibit disparities in abundance in different tissues or with different stressors [72]. To gain insight about the putative functions of the TaHDZ genes, the temporal and spatial expression profile of these identified TaHDZ genes were examined using the publicly available RNA-Seq data ( Figure 5). Results showed that some tissue-specific genes, such as TaHDZ7-A/B specifically expressed in the wheat spike, were found, although most members of group I had no significant expression differences in the five organs. Furthermore, the stress-responsive group I members were also analyzed. Result found that TaHDZ15-A/B/D were expressed in normal conditions, whereas higher expression was found after 6h of cold and drought stress. Additionally, the TaHDZ8-A/B transcript level increased for up to 6h after exposure to cold, heat, drought and salt stress (Figure 5b), suggesting they play a role in response to stress.
In group II, five genes, TaHDZ10-B/D, TaHDZ25-A/D, TaHDZ28-A/D, TaHDZ3-B and TaHDZ9-A/B/D, had significant expression differentiation in different tissues, whereas TaHDZ25-A/D, TaHDZ26-D, TaHDZ27-D and TaHDZ28-A/D showed significant high expression under stresses. Among them, TaHDZ26-D and TaHDZ27-D were markedly induced by cold and heat stress, respectively. The expression patterns of group I and II genes revealed that wheat HD-Zip I and II genes are relevant for a variety of stresses. Notably, the cold-responsive genes mainly belonged to group I, whereas the salt-responsive genes belonged to group II, consistent with previous results that indicated that group I and II genes play important roles in response to various stresses in plants [73]. For groups III and IV, most of the members showed no significant expression differences in different tissues or with different stresses (Figure 5b). However, TaHDZ13-B/D, belonging to group III, had a relatively high expression in the spike and grain but weak expression in the root and leaf, suggesting involvement in the reproduction process. TaHDZ13-B/D was the one member of group IV that showed significantly up-regulation under heat stress compared to the control, meaning it could be considered heat-responsive gene (Figure 5b). To obtain insight into the relationship between gene structure and expression of TaHDZs, the conserved motifs in these protein sequences were further predicted using the MEME program. A total of 20 conserved motifs were found (Figure 5c). The identified TaHDZs motifs ranged from 8 to 50 amino acids in length. The details of the sequence of all conserved motifs are shown in Figure S1. The motifs were unevenly distributed in those proteins, with the number of motifs ranging from 3 to 13. It showed that motif one and two, corresponding to the HD domain and motif three, corresponding to the LZ domain, were distributed in all the TaHDZ proteins. Notably, proteins in the same group seemed to share similar motif compositions. For examples, three motifs, including motif six, eight and nine were found to be related to the START domain, which was present in groups III and IV subfamilies, with the exception of TaHDZ5-D. Motif four, encoding the CPSCE domain, was present in each group II member, except for TaHDZ25, -26, -27 and -28. Additionally, the gene structure might also be involved in the control of gene expression patterns in various tissues or abiotic stresses (Figure 5b,c). For examples, motif 11, 17 and 20 only existed in TaHDZ23B/D and TaHDZ1A and these genes had relatively high expression in various tissues or under abiotic stresses. However, group I-б and IV, containing motif five, had relatively low expression in various tissues or under abiotic stresses (Figures 5b,c and S1).

Expression Profiles of TaHDZ Genes under Abiotic Stress in Wheat by qRT-PCR Analysis
To validate the stress-responsive candidates, 28 TaHDZs showing differential expression underabiotic stresses, based on the RNA-Seq data, were selected to conduct quantitative polymerase chain reaction (qPCR) analysis. Results showed that these genes had differential expression under different stresses and between tolerant and susceptible genotypes. In CS, which is susceptible to salt and drought stress, a total of 21 and 20 TaHDZ genes were induced by salt and drought stress, respectively, whereas 18 were induced by salt stress in the DK genotype (salt-tolerant) and 20 were induced by drought stress in the HX genotype (drought-tolerant). In total, 13 TaHDZ genes were induced by salt stress in both DK and CS. Among them, TaHDZ11 and TaHDZ19 were down-regulated and the others were up-regulated. Sixteen TaHDZ genes were induced by drought in both HX and CS, of which TaHDZ11, TaHDZ17 andTaHDZ18 were rapidly reduced 0.06-fold, 0.39-fold and 0.26-fold under drought stress in CS, whereas these genes were strongly induced to 50-fold, 100-fold and 2.64-fold in HX under drought treatment, suggesting these genes might play a vital role in the response to drought stress in wheat ( Figure 6). The control was an untreated seedling. Three biological replicates for each sample were performed and bars represented the standard deviations of the mean. Asterisks on top of the bars indicating statistically significant differences between the stress and counterpart controls (** p< 0.01, Student's t-test). Gene expression profiles were evaluated using the 2 −∆∆C method.
A large number of HD-Zip genes have been demonstrated to regulate the abiotic stresses response in model plants [12,25,26,71], providing indications of the biological function of wheat HD-Zip genes using orthology-based predictions. Oshox22 was found to be participate in ABA-mediated signal pathways, regulating drought and salt responses in rice [74]. Through phylogenetic analysis, TaHDZ8 was found as the orthologous gene of Oshox22 in this study. RNA-Seq data showed that TaHDZ8 was only lightly expressed 6h after exposure to abiotic stress (Figure 5c). Using qRT-PCR, the results showed that the expression level of TaHDZ8 in CS under salt or drought stress was 9.50-fold and 18.61-fold higher than those of the control, respectively. The expression level of TaHDZ8 was 33.25-fold higher in DK and 8.13-fold higher in HX than in CS as the control (Figure 6). In addition, AtHB7 and AtHB12 were revealed to be involved in the fine-tuning processes associated with growth and drought stress [71]. Their orthologue counterpart in wheat-TaHDZ20 and TaHDZ28-had significant differential expressions between control and drought stress in susceptible genotypes while no significant difference in tolerant genotype (Figure 6), suggesting these two genes may play the essential role in the regulatory network of drought response in wheat.

Conclusions
This study systematically identifies and characterizes the wheat HD-Zip gene family at the whole genome level. A total of 46 wheat HD-Zip genes, belonging to groups I to IV, were identified, and they were unevenly distributed on wheat chromosomes. The gene structure, conserved motif and phylogenetic relationship analysis further supported the classification. Gene duplication analysis found that 13, 12 and 3 TaHDZs were found to have one, two and three homoeologous copies, respectively, suggesting homoeologous gene loss events occurred in this family. Furthermore, the co-expression network between TaHDZs and other wheat genes was constructed. Up to 191 interactions could be found, indicating that TaHDZ proteins are widely involved in the wheat metabolic network and that they regulate diverse biological processes and pathways. Finally, the expression profiles of these TaHDZ genes in different tissues and under various stress conditions were identified by RNA-Seq mapping and further validated by qRT-PCR analysis. In total, thirteen salt-responsive and sixteen drought-responsive genes were obtained, which should be considered as the candidates for future abiotic functional studies. Overall, this study provides the genetic background knowledge for genetic improvement of salt and drought tolerance in wheat, and contributes as well to reveal the molecular mechanism of stress response.
Supplementary Materials: The following are available online at www.mdpi.com/2073-4425/9/2/70/s1, Figure  S1: Sequence logos of TaHDZ proteins conserved motifs. A total of 20 motifs were identified using the MEME tool, Table S1: Primer sequences of 28 TaHDZ genes used for qRT-PCR analysis, Table S2: Details of 9 TaHDZ orthologous genes between wheat and Arabidopsis thaliana, Supplementary file 1: The multiple sequence alignment of TaHDZ proteins.