Evolutionary Divergence of Duplicated Hsf Genes in Populus

Heat shock transcription factors (Hsfs), which function as the activator of heat shock proteins (Hsps), play multiple roles in response to environmental stress and the development of plants. The Hsf family had experienced gene expansion via whole-genome duplication from a single cell algae to higher plants. However, how the Hsf gene family went through evolutionary divergence after genome duplication is unknown. As a model wood species, Populus trichocarpa is widely distributed in North America with various ecological and climatic environments. In this study, we used P. trichocarpa as materials and identified the expression divergence of the PtHsf gene family in developmental processes, such as dormant bud formation and opening, catkins development, and in response to environments. Through the co-expression network, we further discovered the divergent co-expressed genes that related to the functional divergence of PtHsfs. Then, we studied the alternative splicing events, single nucleotide polymorphism distribution and tertiary structures of members of the PtHsf gene family. In addition to expression divergence, we uncovered the evolutionary divergence in the protein level which may be important to new function formations and for survival in changing environments. This study comprehensively analyzed the evolutionary divergence of a member of the PtHsf gene family after genome duplication, paving the way for further gene function analysis and genetic engineering.


Introduction
Heat shock proteins (Hsps), which function as molecular chaperones to maintain proteostasis, are critical for almost all organisms in response to environmental stresses, such as heat, cold, drought, salt, biotic stresses and so on [1][2][3][4]. Heat shock transcription factors (Hsfs) play essential roles from fungi to plants and animals in response to environmental stress to induce the transcription of Hsps via binding to the heat shock elements (HSEs) that are located in their upstream promoter regions [1,2,5]. Except for the stress response, Hsf is also involved in developmental processes in animals and plants [1,2], such as oogenesis and larvae development in Drosophila Melanogaster [6] and seed development in Arabidopsis [7]. Therefore, Hsf is a fundamental transcription factor (TF) family in organism development and surviving in a fluctuating environment, especially for plants because of their sessile lifestyle.
All proteins with an expectation value (E) < 1 × e −10 and harbored the Hsf domain confirmed in the Conserved Domain Database (https://www.ncbi.nlm.nih.gov/cdd/) were used for further analysis. The phylogenetic tree was constructed using IQ-TREE 1.6.9 [27] with the JTT + F + I + G4 model that was the best-fit model selected by ModelFinder [28] and ultrafast bootstrap (1000 replicates) for each alignment [29]. The interactive Tree Of Life [30] was used to display and edit the phylogenetic tree.

Promoter Analysis
The promoter sequences (2 kb upstream of translation initiation site) of PtHsfs were downloaded from Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html#). The similarity of the promoters between the PtHsf paralogous pair was analyzed using the default parameter of PlantPAN 3.0 [31].

Natural Variation in PtHsfs
The single nucleotide polymorphisms (SNPs) in the PtHsf genes were obtained from Phytozome, which was based on the whole genome re-sequencing data of 549 P. trichocarpa natural individuals in North America [22]. Based on the SNPs location and their consequence, the SNPs were classified into UTR 5 prime, UTR 3 deleted, UTR 3 prime, intron, start gained, start lost, stop gained, stop lost, frame shift, codon insertion, codon deletion, codon change plus codon deletion, codon change plus codon insertion, splice site donor, splice site acceptor, synonymous stop, synonymous coding and non-synonymous coding.

Protein Structural Modeling
The 3D structures of PtHsfs were built using the Iterative Threading ASSEmbly Refinement (I-TASSER, v.5.1, Lansing, MI, USA) protein structure modeling toolkit [33] with the default parameter.

Genome-Wide Duplication of Hsf Genes in Populus
To study the Hsf gene number variation during the evolution from algae to land vascular plants, we identified Hsf genes from 14 plants that include plant species from lower algae to higher vascular plants ( Figure 1A). Only 1 and 2 Hsf genes were in the Ostreococcus lucimarinus and Volvox carteri genome, respectively ( Figure 1A). The Hsf gene number increased in the earlier land plants, having 3 and 7 Hsfs in Marchantia polymorpha and Physcomitrella patens, respectively. When plants evolved to vascular plants and higher seed plants, the Hsf genes number were doubled ( Figure 1A). This means the Hsf genes were multiplied on the way to terrestrial plants and seed plants. In order to study the evolutionary relationships of Hsf genes from aquatic to higher plant, we construct a phylogenetic tree using the full length of the Hsf protein sequences from the 8 plant species including Ostreococcus lucimarinus, Volvox carteri, Marchantia polymorpha, Physcomitrella patens, Selaginella moellendorffii, Amborella trichopoda, Arabidopsis thaliana and Populus trichocarpa ( Figure 1B). The Hsf gene originated in Ostreococcus lucimarinus, a unicell algae and divided into 3 main subclasses that were subclass A, B and C in the evolution history ( Figure 1B). Both subclass A and B contained a branch consisting of Hsf genes from Marchantia polymorpha, Physcomitrella patens, Selaginella moellendorffii, and Hsf showed expansion in the Physcomitrella patens and Selaginella moellendorffii genome. Subclass A and B were further divided into different subfamilies (A1-A9 in the subclass A, B1-B5 in the subclass) in Amborella trichopoda that was considered as the basal angiosperms [34]. Hsf genes from higher plants exhibited expansion in every subfamily when evolved from Amborella trichopoda. This result suggested that there were Hsf gene expansion processes when the higher plant evolved from aquatic life.
To investigate how the PtHsf gene family expanded during whole-genome duplication, we found that 23 out of 28 PtHsf were located on duplicated fragments and ten paralogous pairs were identified via phylogenetic analysis [12]. Here we compared the syntenic relationships of genes nearby the duplicated PtHsfs. As shown in Figure 1C, 1 Mb chromosome blocks containing duplicated PtHsfs were used to compare the syntenic relationship. Among the ten PtHsf paralogous pairs, four PtHsf pairs (A6a/A6b, A7a/A7b, A8a/A8b, B4a/B4c, and B4b/B4d) were located in high density duplicated blocks, while two PtHsf pairs (A4a/A4c and A5a/A5b) were located in relative low-density duplicated blocks. This result indicates that the PtHsf gene family underwent expansion and evolution via whole genome duplication.

Expression Similarity and Divergence of PtHsf Genes
To explore whether the gene function is divergent with the expansion of the PtHsf gene family, we analyzed the expression patterns of PtHsfs. The gene expression level in different tissues or under different growth conditions reflects its potential function in the organism. Based on the Populus gene atlas database, we compared the expression patterns of PtHsfs in different developmental processes such as bud set and bud flush, male/female catkin development, leaf expansion, root/stem response to different nitrogen nutrition ( Figure 2). In dormant buds, most of the PtHsf genes were highly expressed except PtHsf-A4c, -B4b and -B4d (Figure 2A), which means most of the PtHsfs genes were involved in dormant bud formation. This might be a strategy to protect the meristem to survive in the winter [35,36]. In fully opened buds, most of PtHsfs were down-regulated obviously except PtHsf-A7a, -B2b, -B4a, -B4b, -B4c and -B4d ( Figure 2A). The down-regulation of PtHsfs in fully opened buds might be caused by seasonal variation. From dominant bud formation to release processes, we found that almost all PtHsfs play important roles in response to dynamic environmental change ( Figure 2A).
In addition, we analyzed the nitrogen response of PtHsfs in the root and stem, the paralogous pair B3a/B3b were induced by nitrogen treatments in the root but were repressed in the stem (Figure 2A). These results indicate that most paralogous pairs in the PtHsf gene family showed divergent gene expression patterns either in various developmental stages or under different nutrition conditions.

Promoter Similarity between the PtHsf Paralogous Pairs
The gene expression pattern depends on the cis-acting elements that are located in the gene promoter region. We then analyzed the promoter similarity between PtHsf paralogous pairs. Among the ten PtHsf paralogous pairs, B4b/B4d showed the highest sequence similarity in the 2 kb promoter region ( Figure 3A,B). To test if the conserved promoter sequence is associated with the gene expression similarity or duplicated data, we performed a correlation analysis. Noticeably, the gene expression correlation coefficient between paralogous pairs was positively correlated with the length of the conserved promoter regions; but was negatively correlated with the duplication date (Figure In order to better understand the expression divergence of PtHsfs, especially between paralogous pairs, the Pearson correlation coefficient (PCC, r) was employed to establish relationships among the PtHsf gene family ( Figure 2B). Noticeably, only two of the ten PtHsf paralogous pairs showed a strong positive correlation (A7a/A7b and B4b/B4d) with r > 0.8 ( Figure 2B). This further suggests that the duplicated Hsf genes underwent subfunctionalization or neofunctionalization during the evolution.

Promoter Similarity between the PtHsf Paralogous Pairs
The gene expression pattern depends on the cis-acting elements that are located in the gene promoter region. We then analyzed the promoter similarity between PtHsf paralogous pairs. Among the ten PtHsf paralogous pairs, B4b/B4d showed the highest sequence similarity in the 2 kb promoter region ( Figure 3A,B). To test if the conserved promoter sequence is associated with the gene expression similarity or duplicated data, we performed a correlation analysis. Noticeably, the gene expression correlation coefficient between paralogous pairs was positively correlated with the length of the conserved promoter regions; but was negatively correlated with the duplication date ( Figure 3C,D). This indicates that the promoter similarity directly affects the gene expression similarity and the differences of promoter similarity might be caused by the evolution process. 3C,D). This indicates that the promoter similarity directly affects the gene expression similarity and the differences of promoter similarity might be caused by the evolution process.

Co-Expression Network of PtHsfs
Generally, functional-associated genes share similar expression patterns since they are regulated by the same upstream regulators or they are in the same regulatory pathway. Co-expression network provides clues to discover genes' potential function and their evolutionary divergence [37]. Based on the genome-wide Populus expression atlas database, we constructed a PtHsfs co-expression network ( Figure 4A). A total of 4468 genes were co-expressed with the PtHsf genes except for four members (PtHsf-A4c, PtHsf-A6b, PtHsf-B2b and PtHsf-B5a), and the co-expressed gene number varied from 2 to 626 for different PtHsf genes (Table 1 and Figure 4A). PtHsf-A9, PtHsf-A6a and PtHsf-C1 were co-expressed with 15, 6 and 2 genes and formed three independent sub-networks, respectively ( Figure 4A). The remained 23 PtHsfs co-expressed with 4445 genes and consisted of a complex co-expression network ( Figure 4A). In the co-expression network, PtHsf-A1c, PtHsf-A2, PtHsf-A1b, PtHsf-A1a, PtHsf-A5a, PtHsf-B3b and PtHsf-A5b were located in a largest sub-network with their co-expressed genes, but PtHsf-B4a/4c, PtHsf-B4b/4d, PtHsf-A3a/4b, PtHsf-A4a and PtHsf-B5b were located in an independent sub-network, respectively ( Figure 4A); the other 11 PtHsfs were scattered in the co-expression network ( Figure 4A). Only 2 out of 10 paralogous pairs were in the same sub-network ( Figure 4A). This result indicates that although most of PtHsf genes were in a co-expression network, their co-expression relationships were relatively independent. This implies that the PtHsf genes generated by whole-genome duplication events underwent functional divergence.   Table S1.  Table S1. In order to further reveal the potential functions of the PtHsf genes, we performed multiple functional analyses including gene ontology (GO) enrichment, protein domain enrichment, pathway enrichment and enriched TF analysis based on the co-expressed genes ( Table 1, Table S1 and Figure 4B). At the threshold of the false discovery rate corrected P-value < 0.05, only 10 out of 23 PtHsf sub-networks were enriched with a different number of GO terms which varied from 6 to 59 ( Figure 4B and Table S1). The partial of the GO term is consistent with the potential function based on the expression pattern. The enriched GO term of the PtHsf-A1b sub-network represented in the function of "chromosome organization", "chromatin organization", "chromatin modification", "covalent chromatin modification" and "histone modification" (Table S1), might be involved in the meiotic cell cycle in catkins development [38]. PtHsfB4s were highly expressed in the leaf expansion process (Figure 2A). This process included cell enlarge, cell wall thicken and photosynthesis enhancement [39]. We found that the PtHsf-B4a/B4c sub-network was enriched in GO terms of "cellular aromatic compound metabolic process", "cellular macromolecule metabolic process", "macromolecule metabolic process" and "primary metabolic process" (Table S1), implying that PtHsf-B4a/B4c were involved in cell construction. In contrast, the PtHsf-B4b/B4d sub-network was enriched in "electron transport chain", "respiratory electron transport chain", "photosynthetic electron transport chain", and "photosynthetic electron transport in photosystem II" (Table S1), indicating that PtHsf-B4b/B4d play roles in photosynthesis. This result suggested that the two paralogous pairs (PtHsf-B4a/B4c and PtHsf-B4b/B4d) in the subfamily B4 play totally different roles in leaf development. In response to different nitrogen treatments, "nitrogen compound metabolic and biosynthetic processes" related GO terms were enriched in the sub-networks of PtHsf-A1b, PtHsf-A1c, PtHsf-A2, PtHsf-A5b, PtHsf-B4a and PtHsf-B4c, which were dynamic responses to nitrogen treatments (Figure 2A, Table S1). In addition, the sub-networks of PtHsf-A1b, PtHsf-A1c, PtHsf-A2, PtHsf-A5b, PtHsf-B1, PtHsf-B4a, PtHsf-B4c were mainly enriched in the "primary, macromolecule, heterocycle and cellular metabolic process" and "gene expression" (Figure 4B), which might be caused by a function redundant or functional diversity.
In the protein domain enrichment analysis, we found that the sub-networks of 15 PtHsfs were enriched in 1-41 protein domains, of which the Hsp and Hsf protein domains were enriched in the sub-networks of PtHsf-A2, PtHsf-A7a and PtHsf-B2a (Table S1), which are consistent with their dominant roles in the heat shock response [12]. Different domain types were enriched in the sub-networks of PtHsf genes, except PtHsf-B4b/B4d (Table 1 and Table S1). Noticeably, the distinct domain enrichment divergences were exhibited among the sub-networks of PtHsf-A1b, PtHsf-A1c, PtHsf-A2, PtHsf-A5b, PtHsf-B1, PtHsf-B4a and PtHsf-B4c, which showed conserved GO enrichment (Table 1 and Table S1). We further found that the co-expressed TFs in the sub-networks of PtHsfs also displayed divergence in the PtHsf gene family. For example, the sub-networks of PtHsf-B4b and PtHsf-B4d were enriched in similar GO terms and functional domains, but their co-expressed TFs were significantly different. Only two TFs in the HD-ZIP family (Potri.001G372300 (homolog of Arabidopsis HB14, PHB) and Potri.001G188800 (homolog of Arabidopsis HB15, CNA)) were co-expressed with both PtHsf-B4b and PtHsf-B4d. For sub-network-specific TFs, one bHLH and two GRAS TFs were specifically co-expressed with PtHsf-B4b, whereas one NAC, one bHLH, one WOX and two TCP TFs were specifically co-expressed with PtHsf-B4d (Table 1 and Table S2). The pathway enrichment analysis showed that the sub-networks of 11 PtHsfs were enriched in 1-3 pathway(s), of which the "spliceosome pathway" was enriched in multiple sub-networks such as sub-networks of PtHsf-A1b, PtHsf-A1c, PtHsf-A2, PtHsf-A5b and PtHsf-A7a (Table 1 and Table S1), this implying PtHsfs might be involved in the process of alternative splicing. Altogether, the co-expressed network analysis of PtHsfs reveals that the members in the PtHsf gene family underwent functional divergence.

The SNP Differences between PtHsf Paralogous Pairs
Single nucleotide polymorphism (SNP) is a way to produce gene variation that is also responsible for gene function, natural selection and genome evolution [42,43]. To investigate the natural variation in the PtHsf gene family, we analyzed the SNP dataset from a natural population that includes 549 P. trichocarpa individuals [22] (Figure 5, Table 3 and Table S3). The SNP density was significantly different in PtHsf genes, for example, a total of 342 SNPs were detected in PtHsfA1a, but only 50 SNPs were detected in its paralogs PtHsf1c ( Figure 5A and Table S3). As shown in Figure 5A, SNPs can be divided into different types based on the location and consequence. SNPs in the PtHsf gene family were distributed in the intron, UTR region, and coding sequence that produce synonymous coding or non-synonymous coding ( Figure 5, Table S3). In the SNPs types that produce protein variation, the frequency of non-synonymous coding and the start gained type of SNPs is much higher than other types such as the start lost, stop gained, stop lost and so on ( Figure 5A, Table S3). Except for the protein coding region variation, the SNP frequency was higher in the regulatory regions such as 5 -UTR and 3 -UTR ( Figure 5A), which adjusts the gene expression level under different environmental stresses. The SNP frequency of SNP types affecting the coding sequences of PtHsf proteins (such as the start gained and non-synonymous coding) varied greatly ( Figure 5A, Table 3), even in PtHsf-A4a/c and PtHsf-A4b/d-the paralogous pairs with conserved expression and co-expressed genes ( Figure 5A, Table 3). The SNP variation in PtHsf genes will increase the functional diversity and adaption to the environment at the population level.  Table S3.  Table S3.

Divergence of Protein 3D Structure
PtHsfs were divided into three subclades based on the conserved domains. All the PtHsf proteins contained the DBD located in the N terminal and followed HR-A/B ( Figure 6A). In the C terminal, the PtHsf-A subclade had the transcriptional activation motif (AHA motifs) which was lost in subclass B and C (Figure 6A), and subclass B PtHsf harbored the regulatory domain (RD). NLSs were located in subclass B and C PtHsf proteins and nuclear export signal (NES) were only found in the PtHsf-A subclass except for PtHsf-A8a/8b ( Figure 6A). The composition and distribution of the conserved domains were similar to those previously reported in animals and Arabidopsis [25]. As a transcription factor, Hsf proteins played their roles via forming homotrimerization or heterotrimerization to interact with the DNA binding site and initiate the transaction of downstream target genes. So, it was very important for Hsf to maintain specific spatial configuration [44]. To predict the 3D structures, we selected 16 PtHsf proteins that covered each subfamily and each subclade for structural prediction ( Figure 6B). The helix-turn-helix structure of DBD can be found in the N terminal except for PtHsf-A3, PtHsf-B4a and PtHsf-C1 ( Figure 6B). The HR-A/B adjacent to DBD consisted of the α-helix, except for PtHsf-A3 and PtHsf-B4a ( Figure 6B). This result means that the secondary structure of the N terminal of PtHsfs was conserved. In contrast, the C terminal showed secondary structure diversity ( Figure 6B). For example, the NLS that was on the flanked of HR-A/B in subclade A formed an α-helix except for PtHsf-A3 and PtHsfA8b ( Figure 6B); it formed a random coil in subclade B and C ( Figure 6B). It is interesting that only PtHsf-A1a and PtHsf-A5a shared a similar geometry of packing in the PtHsf-A subclade ( Figure 6A), which was similar to human Hsf1 and Hsf2 proteins [44]. In the PtHsf-B subclade, PtHsf-B2a, PtHsf-B3a, PtHsf-B4b and PtHsf-B5a had similar geometries of packing, but the length and position of the α-coil in the HR-A/B domain were different in the four proteins ( Figure 6B). We then compared the 3D structure of paralogous pairs ( Figure S1) and found that there were divergences of the tertiary structure between the paralogous pairs of PtHsf-A5a/A5b, PtHsf-Aa6a/A6b, PtHsf-A7a/A7b, PtHsf-A8a/A8b, PtHsf-B2a/B2c, and PtHsf-B4a/B4c ( Figure S1). Altogether, PtHsf members retained conserved functional motifs but had a tertiary structure divergence.

Discussion
Hsf transcription factors are direct upstream regulators of Hsp genes by interacting with their cis-acting elements HSE, it plays the diversity function in plant development and responds to various environmental stresses [1]. The plant evolved Hsf genes from unicellular algae and underwent gene expansion via whole genome duplication ( Figure 1) [14]. However, how the PtHsf gene family evolved after whole-genome duplication events is unclear. In this study, we found that the expression pattern of PtHsf paralogous pairs was positively correlated with promoter region conservation. Furthermore, we found that the alternative splicing, SNP distribution and frequency, and protein 3D structure were divergent in PtHsfs paralogous pairs, implying that the Hsf gene family acquired multiple ways to increase their protein diversity to adapt to the environment. This study comprehensively analyzed the potential functional divergence of PtHsf genes, uncovering the possible evolutionary history of the PtHsf gene family in Populus.
Gene functional diversification accelerated by gene expansion promotes the environmental adaption ability of an organism [45]. In Populus, the PtHsf gene family underwent gene expansion during evolution (Figure 1) [12]. As key regulators of the heat shock response, seven PtHsfs were validated to respond to heat shock and activate Hsps expression in poplar [12]. Moreover, the PtHsf gene family had a diversity expression pattern in different organs [12], different biological processes and nitrogen response (Figure 2A), which suggests that the PtHsf gene family might acquire new functions in the development and nitrogen response from gene expansion process. The expression pattern diversity of the Hsf gene family was widely reported in Arabidopsis, rice, and so on [46]. In our co-expression network, only PtHsf-B4a/4c, PtHsf-B4b/4d, and PtHsf-A3a/3b were located in the

Discussion
Hsf transcription factors are direct upstream regulators of Hsp genes by interacting with their cis-acting elements HSE, it plays the diversity function in plant development and responds to various environmental stresses [1]. The plant evolved Hsf genes from unicellular algae and underwent gene expansion via whole genome duplication ( Figure 1) [14]. However, how the PtHsf gene family evolved after whole-genome duplication events is unclear. In this study, we found that the expression pattern of PtHsf paralogous pairs was positively correlated with promoter region conservation. Furthermore, we found that the alternative splicing, SNP distribution and frequency, and protein 3D structure were divergent in PtHsfs paralogous pairs, implying that the Hsf gene family acquired multiple ways to increase their protein diversity to adapt to the environment. This study comprehensively analyzed the potential functional divergence of PtHsf genes, uncovering the possible evolutionary history of the PtHsf gene family in Populus.
Gene functional diversification accelerated by gene expansion promotes the environmental adaption ability of an organism [45]. In Populus, the PtHsf gene family underwent gene expansion during evolution (Figure 1) [12]. As key regulators of the heat shock response, seven PtHsfs were validated to respond to heat shock and activate Hsps expression in poplar [12]. Moreover, the PtHsf gene family had a diversity expression pattern in different organs [12], different biological processes and nitrogen response (Figure 2A), which suggests that the PtHsf gene family might acquire new functions in the development and nitrogen response from gene expansion process. The expression pattern diversity of the Hsf gene family was widely reported in Arabidopsis, rice, and so on [46]. In our co-expression network, only PtHsf-B4a/4c, PtHsf-B4b/4d, and PtHsf-A3a/3b were located in the same sub-network, respectively, others were in relatively independent sub-networks ( Figure 4). The GO enrichment analysis showed that the PtHsf gene expression diversity was consistent with their potential function (Table S1). For example, the PtHsf-A1b sub-network was enriched in catkins development related processes (Table S1) [38]. The sub-networks of PtHsf-B4a/B4c and PtHsf-B4b/B4d dynamically regulated in leaf development were enriched in terms of cell construction and the photosynthetic electron transport chain, respectively (Table S1). Sub-networks of PtHsfs that responded to different types of nitrogen treatment were also enriched in nitrogen compound metabolic and biosynthetic processes (Table S1). Because the mRNA abundance of functionally related genes are coordinated regulated [37], PtHsf co-expression network analysis and GO enrichment further indicated that neofunctionalization occurred in the PtHsf gene family. A similar function of GO terms was the enrichment in the sub-networks of PtHsf-A1b, PtHsf-A1c, PtHsf-A2, PtHsf-A5b, PtHsf-B4a and PtHsf-B4c ( Figure 4B), which might be caused by functional redundancy. However, they showed differences in the enriched domain and co-expressed TFs (Table S1). PtHsf-B4b and PtHsf-B4d were the most conserved paralogous genes in the PtHsf gene family because of the similar expression pattern, GO enrichment and domain enrichment ( Figure 4B, Table S1). In the co-expressed TFs, two members from the HD-ZIP family were co-expressed with both PtHsf-B4b and PtHsf-B4d (Table 1 and Table S2), one of which is a homolog of PHB which is involved in the polarity establishment in leaf primordium [47]. Noticeably, PtHsf-B4d was specifically co-expressed with two TCP TFs, which were reported to be affecting the leaf shape in Arabidopsis [48,49]. This specific co-expression might imply the specific role of PtHsf-B4d in a specific stage/process of leaf development. Altogether, the members of the PtHsf gene family showed functional diversity.
The gene promoter determines its expression pattern, we found that the expression similarity of PtHsf paralogous pairs was positively correlated with the length of the conserved promoter region (Figure 3). In addition, the promoter choice can influence the splice site selection [50,51]. In our study, we found that the PtHsf genes had various alternative splicing isoforms (Table 1), even in the paralogous pair PtHsf-B4a/4c with a similar expression pattern ( Figure 2 and Table 2). Noticeably, PtHsf-B4b/4d and PtHsf-A3a/3b were paralogous pairs with more than 400 bp conserved blocks close to the translation initiation site that only produced one transcript from each gene; other paralogous pairs with a relatively low conservation in the promoter region produced a diverse splicing isoform except for PtHsf-A5a/5c ( Figure 3, Table 1). The different alternative splicing might be caused by the difference in 5 -UTR. It has been reported that the sequence and secondary structure of the 5 -flanking region plays a key role in determining the alternative splicing site [50,51]. In human hepatoma Hep3B cells, a five-point mutation out from a 220 bp promoter can change the splicing pattern [50]. So, the promoter modification might be one of the reasons for the alternative splicing diversity in the PtHsf gene family. Another reason is that the unbalanced alternative splicing isoform might be related to the co-expressed genes in the "spliceosome pathway" (Table S1). Sub-networks of PtHsf-A1c, PtHsf-A2, and PtHsf-A7a was enriched in the "spliceosome pathway"; these genes had at least two transcripts (Table 2). Furthermore, the unevenly distributed SNPs in PtHsf genes may also mediate the difference of alternative splicing ( Figure 5). In humans, the MYLK gene with two SNPs in intron eleven generated a splicing variant because the presence of two SNPs at the acceptor site affected the recognition of the spliceosome [52]. In the Arabidopsis Wassilewskia accession, RPT5b harbors an SNP in intron seven, producing a mis-splicing transcript [53]. Based on the P. trichocarpa population, we identified one SNP at the splice donor site of PtHsf-A6a and PtHsf-A9 and one SNP at the splice acceptor site of PtHsf-A6b ( Figure 5), which may influence their alternative splicing process in specific genotypes ( Figure 5). Altogether, the promoter divergence, co-expressed genes and SNPs might contribute to the alternative splicing of PtHsf genes in different levels, these regulatory mechanisms increased the functional diversity of PtHsfs.
The protein 3D structure is pivotal for the functional maintenance of proteins. We found that the N-terminal of PtHsfs were conserved in the sequence identity and secondary structure ( Figure 6). The winged helix-turn-helix were formed in DBD which will interact with a major groove of the DNA ( Figure 6B) [44,54], and the α-helix also shaped in the HR-A/B were responses for trimerization when Hsf was activated ( Figure 6B) [54]. The secondary structures of DBD and HR-A/B of PtHsfs were similar to animal Hsfs [44,54]. We also found that the structure of the C-terminal of PtHsf was more diverse than the N-terminal ( Figure 6B). This is consistent with the sequence identified among the 16 PtHsfs was below 35% (Table S4), which is a marginal value for the protein structure similarity [55]. The paralogous pairs were generated from genome duplication events, the protein 3D structure should be conserved; but we noticed that the tertiary structures were divergent in PtHsf paralogous pairs ( Figure S1). It might be affected by specific amino-acid residue mutations [56,57]. There were 11 and 6 amino-acid residue mutants in the Pro or Gly which were helix-breaking residues [57] when comparing the protein sequence of PtHsf-B2a/B2c and PtHsf-A8a/A8b, respectively ( Figure S2). This is consistent with the protein 3D structures, PtHsf-B2c and PtHsf-A8a, which have a long α-helix while PtHsf-B2a and PtHsf-A8b consisted of a short α-helix and random coil ( Figure S1). Above all, both sequence identities and amino-acid residue mutations may affect the protein 3D structure divergence in the PtHsf gene family.

Conclusions
In this study, we found that the Hsf gene family was a conserved family across the world and expanded in plants via genome duplication. Furthermore, the functional diversity of PtHsfs was evaluated by expression profiles and co-expressed gene enrichment. In addition, the modification of the promoter and coding sequence region led to the divergence of alternative splicing events and the 3D structure. This study comprehensively analyzed the potential functional diversity of PtHsf genes and uncovered the evolutionary history of the PtHsf gene family. Further investigations of the function of the PtHsf gene family in poplar will uncover the molecular mechanism of PtHsfs in poplar development and adaption to dynamic environments.