Insight into the Phylogeny and Binding Ability of WRKY Transcription Factors

WRKY transcription factors (TFs), which make up one of the largest families of TFs in the plant kingdom, are key players in modulating gene expression relating to embryogenesis, senescence, pathogen resistance, and abiotic stress responses. However, the phylogeny and grouping of WRKY TFs and how their binding ability is affected by the flanking regions of W-box sequences remain unclear. In this study, we reconstructed the phylogeny of WRKY across the plant kingdom and characterized the DNA-binding profile of Arabidopsis thaliana WRKY (WRKY54) based on its W-box recognition sequence. We found that WRKY TFs could be separated into five clades, and that the functional zinc-finger motif at the C-terminal of WRKY appeared after several nucleotide substitutions had occurred at the 3′-end of the zinc-finger region in chlorophytes. In addition, we found that W-box flanking regions affect the binding ability of WRKY54 based on the results of a fluorescence-based electrophoretic mobility shift assay (fEMSA) and quartz crystal microbalance (QCM) analysis. The great abundance of WRKY TFs in plants implicates their involvement in diverse molecular regulatory networks, and the flanking regions of W-box sequences may contribute to their molecular recognition mechanism. This phylogeny and our findings on the molecular recognition mechanism of WRKY TFs should be helpful for further research in this area.


Introduction
WRKY transcription factors (TFs) are named based on the conserved residue WRKY, which constitutes an integral part of their DNA-binding domain (DBD) and is approximately 60 residues in length. The key structural features of this domain are the conserved DNA-binding motif WRKYGQK and a zinc-finger motif at the C-terminus [1]. Previous research has indicated that WRKY TFs should be classified into three groups based on the number of WRKY domains and the type of zinc-finger motif they contain [2]. The group 1 WRKY TFs contain two WRKY domains, and each of these has a C2H2 motif. The group 2 TFs contain one WRKY domain with a C2H2 zinc-finger motif, whereas the group 3 TFs contain one WRKY domain with a C2HC zinc-finger motif. This grouping of WRKY TFs has been widely used, but molecular evidence has now shown that it is inconsistent with phylogeny [2][3][4]. For example, the WRKY genes from Arabidopsis thaliana can be grouped into group Ia, comprising AtWRKY1, -2, -3, -4, -10, -25, -26, -32, -33, -34, -44, and -58, and group Ib, comprising AtWRKY8, -12, -13, -23, -24, -28, -43, -45, -48, -56, -57, -68, -71, and -75; these two groups are sister groups. Group IIa, comprising AtWRKY6, -9, -18, -31, -36, -40, -42, -47, -60, -61, and -72, is followed [4]. However, group Ia and IIa members were treated as a monophyletic group named group IIa by Mohanta et al. [3]. In addition, some analysis. In summary, a novel perspective on the evolutionary origin of WRKY TFs is provided through our reconstructed phylogenetic tree. In addition, the binding profile we describe for the WRKY54 DBD sheds light on the long-standing question of the protein-DNA recognition mechanism and the sequence-selective binding of the highly conserved WRKY TFs.

Retrieval and Identification of WRKY Transcription Factor Genes
WRKY TF gene family was identified by using 16 species from across the plant kingdom (Table 1). These included a rhodophyte, a chlorophyte, embryophytes (two), a tracheophyte, monocots (two), and dicots (eight). In total, we obtained 528 WRKY TF genes. In the selected species, we identified no WRKY TFs in P. umbilicalis (rhodophyte), and only two in Micromonas pusilla (chlorophyte). Large numbers of WRKY TFs were identified in the monocots (Oryza sativa: 75; Zea mays: 62) and dicots (A. thaliana: 70). We only found WRKY TFs with two WRKY domains in the embryophytes. Based on multiple sequence alignment, there were significant differences in the sequence characteristics of the chlorophyte (M. pusilla) WRKY TF and those of most of the other species. Almost all of the WRKY TFs were characterized by a conserved WRKY motif (xWRKYGQK or xWRKYGEK) with a zinc-finger motif (CxCxHTC or CxCxHxH), where x denotes any amino acid. However, the WRKY TF from M. pusilla (Mpu50253) contained a conserved WRKY motif (RWRKYGQK) with only part of the zinc-finger motif (C), suggesting that three or four conserved amino acid substitutions (C → CxCxHxH or C → CxCxHTC) occurred after plants had colonized land.

Phylogenetic Reconstruction for WRKY Transcription Factors
To trace the evolution of WRKY TFs, the WRKY TF obtained from M. pusilla (Mpu50253, no zinc-finger motif) was chosen as the root of the reconstructed WRKY phylogeny (Figure 1). In the resulting phylogeny, the TFs were grouped into five clades:

The Single WRKY Domain of WRKY54 Exists as Both a Monomer and in an Aggregated Form In Vitro
To study the binding ability of the single WRKY domain in Clade 3, the recombinant WRKY54 protein was expressed and purified. In the first trial, the full length of the WRKY54 recombinant protein exhibited an aggregated form with no DNA binding ability in solution after cell lysis. The DNA binding domain of WRKY54 from residues 133-224 was constructed. After protein expression and purification, the recombinant WRKY54  Clade 2 contained WRKY TFs from across the plant kingdom, from Physcomitrium patens (e.g., Pp3000) to Prunus persica (e.g., Ppe74300). There were two main zinc-finger structure types identified in Clade 2: CxCxHNH (e.g., WRKY71, Mp6s0129, Spol02923, and Aco99600) and CxCxHTH (e.g., WRKY48, Zm07329, OsWRKY11, and Spol04568). A WRKY TF lacking a zinc-finger motif, OsWRKY52, was also located in Clade 2.
Finally, the long branches observed in the reconstructed WRKY phylogeny suggest a high degree of nucleotide variation across the plant kingdom.

The Single WRKY Domain of WRKY54 Exists as Both a Monomer and in an Aggregated Form In Vitro
To study the binding ability of the single WRKY domain in Clade 3, the recombinant WRKY54 protein was expressed and purified. In the first trial, the full length of the WRKY54 recombinant protein exhibited an aggregated form with no DNA binding ability in solution after cell lysis. The DNA binding domain of WRKY54 from residues 133-224 was constructed. After protein expression and purification, the recombinant WRKY54 DBD protein exhibited two forms-an aggregated form and a monomeric form-that corresponded to two major peaks in the SEC elution profile (Figure 2A). SDS-PAGE showed that main protein product size of the eluted solution collected at 43 mL, 78 mL, and 81 mL was 50-75 kDa ( Figure 2B). To verify the oligomerization state of the WRKY54 DBD, protein solutions collected at 43, 78 and 81 mL were subjected to DLS. The protein solutions collected at 78 and 81 mL may have contained the same protein product, so these solutions were pooled for the DLS analysis. The DLS results showed that the molecular mass of the WRKY54 DBD fraction collected at 43 mL was 1667 kDa, suggesting that the WRKY54 DBD in this fraction was in the aggregate form ( Figure 2C). In contrast, the protein size measured in the pooled solutions collected at 78 and 81 mL was 59 kDa, suggesting that the WRKY54 DBD in these fractions was in the monomer form ( Figure 2D).

WRKY54 DNA-Binding Domain Can Bind to W4 Box from SAG12 Upstream Sequence
It is known that WRKY TFs bind to the DNA sequence named W-box (5 -TTGAC-C/T-3 ). However, several sequences upstream of the promoters triggered by WRKY TFs contain W-box-centered sequences with TTGACC or TTGACT. Therefore, we first used a fEMSA to examine whether the WRKY54 DBD can bind to W-box sequences with TTGACC or TTGACT from the SAG12 upstream region. Four types of W-box (W1-W4) obtained from SAG12 upstream regions were used as probes ( Figure 3A). Those four types of W-box were labeled with fluorescein and incubated with purified WRKY54 DBD protein. The results confirmed binding of the WRKY54 DBD protein to W4, but no binding to W1, W2, or W3 ( Figure 3B). These four W-box sequences differ in the nucleotides at either the 5 or the 3 flanking region ( Figure 3A). To examine whether the nucleotides located adjacent to W4 box affect the binding ability of WRKY54, we designed a series of W4 probes. collected at 78 and 81 mL may have contained the same protein product, so these solutions were pooled for the DLS analysis. The DLS results showed that the molecular mass of the WRKY54 DBD fraction collected at 43 mL was 1667 kDa, suggesting that the WRKY54 DBD in this fraction was in the aggregate form ( Figure 2C). In contrast, the protein size measured in the pooled solutions collected at 78 and 81 mL was 59 kDa, suggesting that the WRKY54 DBD in these fractions was in the monomer form ( Figure 2D).

WRKY54 DNA-Binding Domain Can Bind to W4 Box from SAG12 Upstream Sequence
It is known that WRKY TFs bind to the DNA sequence named W-box (5′-TTGAC-C/T-3′). However, several sequences upstream of the promoters triggered by WRKY TFs contain W-box-centered sequences with TTGACC or TTGACT. Therefore, we first used a fEMSA to examine whether the WRKY54 DBD can bind to W-box sequences with TTGACC or TTGACT from the SAG12 upstream region. Four types of W-box (W1-W4) obtained from SAG12 upstream regions were used as probes ( Figure 3A). Those four types

WRKY54 Can Bind to the W4 of SAG12 In Vivo
The ChIP-PCR technique was used to confirm whether WRKY54 binds specifically to W4 in vivo. Consistent with our fEMSA results, WRKY54 failed to bind to W1-W3 in either wild type Arabidopsis or a WRKY54-overexpression line ( Figure 4A-C). However, for W4, a clear banding pattern could be seen for the WRKY54-overexpression line, although it was weaker for the wild type ( Figure 4D).

Length of the Flanking Region Adjacent to W-Box Affects the Binding Ability of WRKY54 DBD
We then examined whether the composition of the flanking region adjacent to W4 box affected the specific binding ability of the WRKY54 DBD. Six artificial W4 variants were synthesized for this purpose ( Figure 5A). Clear banding shift patterns were observed for the WRKY54 DBD-W4, WRKY54 DBD-T13, and WRKY54 DBD-T12a pairs ( Figure 5B). In contrast, the WRKY54 DBD-T12 pair displayed a relatively weak banding shift. For the pairs involving T6, T8, and T10, no banding shift was observed, suggesting that the WRKY54 DBD was unable to bind to those artificial W4-like nucleotides ( Figure 5B). of W-box were labeled with fluorescein and incubated with purified WRKY54 DBD protein. The results confirmed binding of the WRKY54 DBD protein to W4, but no binding to W1, W2, or W3 ( Figure 3B). These four W-box sequences differ in the nucleotides at either the 5′ or the 3′ flanking region ( Figure 3A). To examine whether the nucleotides located adjacent to W4 box affect the binding ability of WRKY54, we designed a series of W4 probes.

WRKY54 Can Bind to the W4 of SAG12 In Vivo
The ChIP-PCR technique was used to confirm whether WRKY54 binds specifically to W4 in vivo. Consistent with our fEMSA results, WRKY54 failed to bind to W1-W3 in either wild type Arabidopsis or a WRKY54-overexpression line ( Figure 4A-C). However, for W4, a clear banding pattern could be seen for the WRKY54-overexpression line, although it was weaker for the wild type ( Figure 4D). The QCM was then used to examine the binding constants of the WRKY54 DBD to W4 and the W4-like nucleotides ( Table 2). The K d value obtained from the W4-WRKY54 DBD pair was 163 ± 10.06 pM. The K d values were 192 ± 22.5 pM and 2195 ± 442.3 pM for the T13-WRKY54 DBD and T12-WRKY54 DBD pairs, respectively. When the 5 -end nucleotides were removed, the K d values were 47 ± 6.2 pM (T12a-WRKY54 DBD pair) and 68 ± 11.73 pM (T13-WRKY54 DBD pair; Table 1, Figure 5C). There were no signals detected in the T6, T8, or T10 pairs. These results showed that a flanking region with at least three nucleotides at the 5 end of TTGACT is required for the binding of the WRKY54 DBD.

Length of the Flanking Region Adjacent to W-Box Affects the Binding Ability of WRKY54 DBD
We then examined whether the composition of the flanking region adjacent to W4 box affected the specific binding ability of the WRKY54 DBD. Six artificial W4 variants were synthesized for this purpose ( Figure 5A). Clear banding shift patterns were observed for the WRKY54 DBD-W4, WRKY54 DBD-T13, and WRKY54 DBD-T12a pairs ( Figure  5B). In contrast, the WRKY54 DBD-T12 pair displayed a relatively weak banding shift. For the pairs involving T6, T8, and T10, no banding shift was observed, suggesting that the WRKY54 DBD was unable to bind to those artificial W4-like nucleotides ( Figure 5B).

Structural Insights into the DNA Binding of WRKY54 to W-Box
The three-dimensional model of the docked structure of the WRKY54 DBD revealed a four-stranded antiparallel β-sheet with the conserved WRKYGQK (W157, R158, K159, Y160, G161, Q162, and K163) motif located in the β1 strand. Insertion of the four-stranded antiparallel β-sheet into the major groove of W-box DNA permits extensive interactions between the residues of the WRKY54 DBD and the DNA nucleotides (5 -ATTTGTTGACTAGG-3 , W-box sequence underlined), as illustrated in Figure 6A. At the protein-DNA interface, intermolecular contacts between the WRKY54 DBD and the W4 nucleotides are primarily between the conserved residues W157, R158, K159, Y160, G161, and K163 on the β1 strand and dT4-dT7 and dG19-dA23 on the DNA. In this context, molecular interactions are mediated mainly by the formation of apolar contacts and H-bonds between the conserved residues of the WRKYRQK binding motif (W157, R158, K159, and Y160) and the DNA nucleotides dG5, dT6, dG19, dT20, dC21 and dA22 ( Figure 6B). In this postulated scenario, intermolecular hydrogen-bonding interactions formed between W157 and dT4 and/or dG5; K159 and dT6; and K163 and dG19 and/or dT20 are involved in the specific binding of the WRKY54 DBD to the W-box DNA sequence ( Figure 6C,D). Based on the binding mode of the WRKY54 DBD to W4, a 5 flanking region of W-box with at least three additional nucleotides is thus required for WRKY54 binding. The QCM was then used to examine the binding constants of the WRKY54 DBD to W4 and the W4-like nucleotides ( Table 2). The Kd value obtained from the W4-WRKY54 DBD pair was 163 ± 10.06 pM. The Kd values were 192 ± 22.5 pM and 2195 ± 442.3 pM for the T13-WRKY54 DBD and T12-WRKY54 DBD pairs, respectively. When the 5′-end nucleotides were removed, the Kd values were 47 ± 6.2 pM (T12a-WRKY54 DBD pair) and 68 ± 11.73 pM (T13-WRKY54 DBD pair; Table 1, Figure 5C). There were no signals detected in the T6, T8, or T10 pairs. These results showed that a flanking region with at least three nucleotides at the 5′ end of TTGACT is required for the binding of the WRKY54 DBD.

Structural Insights into the DNA Binding of WRKY54 to W-Box
The three-dimensional model of the docked structure of the WRKY54 DBD revealed a four-stranded antiparallel β-sheet with the conserved WRKYGQK (W157, R158, K159, Y160, G161, Q162, and K163) motif located in the β1 strand. Insertion of the four-stranded antiparallel β-sheet into the major groove of W-box DNA permits extensive interactions between the residues of the WRKY54 DBD and the DNA nucleotides (5′-ATTTGTT-GACTAGG-3′, W-box sequence underlined), as illustrated in Figure 6A. At the protein-DNA interface, intermolecular contacts between the WRKY54 DBD and the W4 nucleotides are primarily between the conserved residues W157, R158, K159, Y160, G161, and K163 on the β1 strand and dT4-dT7 and dG19-dA23 on the DNA. In this context, molecular interactions are mediated mainly by the formation of apolar contacts and H-bonds between the conserved residues of the WRKYRQK binding motif (W157, R158, K159, and Y160) and the DNA nucleotides dG5, dT6, dG19, dT20, dC21 and dA22 ( Figure 6B). In this postulated scenario, intermolecular hydrogen-bonding interactions formed between W157 and dT4 and/or dG5; K159 and dT6; and K163 and dG19 and/or dT20 are involved in the specific binding of the WRKY54 DBD to the W-box DNA sequence ( Figures 6C,D). Based on the binding mode of the WRKY54 DBD to W4, a 5′ flanking region of W-box with at least three additional nucleotides is thus required for WRKY54 binding.

WRKY TFs Can Be Classified into Five Clades
WRKY TFs are involved in metabolism, growth, pathogen resistance, and abiotic stress responses [5]. The phylogeny of WRKY TFs has therefore been reconstructed several times to identify the WRKY groups and their evolutionary process [2][3][4]. Surprisingly, these previous studies were unable to create a consensus on the evolution of WRKY TFs [2][3][4]. These TFs contain two types of factors with highly diverse sequences and gene structures, which may explain the inconsistent topologies that were derived in these previous studies [3]. Here, we conducted a new phylogenetic analysis of WRKY TFs using data from an up-to-date dataset (16 representative species selected from across the plant kingdom) to elucidate the relationships between these WRKY TFs. By setting the WRKY obtained from M. pusilla (Mpu50253) as the root, we obtained five distinct clades with moderate (BI: 0.88; ML: 0.8) to strong (BI: 1; ML: 1) branch support (Figure 1). An evolutionary history of the WRKY domain can therefore be proposed based on the amino acid sequences next to the schematic WRKY phylogeny in Figure 7.

Molecular Binding Mechanism for Specific WRKY-W-Box Interaction
Based on the binding profile of the WRKY54 DBD to W1-W4 sequences, the DNA binding ability is abolished when probed with W-box (TTGACT) alone (Figures 3 and 4) in fEMSA assay. In contrast, the WRKY54 DBD could be bound well to W4 with 5′-TTG and 3′-A residues immediately adjacent to the W-box (W4) nucleotides ( Figure 5, Table 1). Based on these results, we identified the W-box flanking region as an essential element in WRKY54 DBD binding interactions [8,10]. The relative importance of the W-box flanking region has been implied in previous research on the DNA binding ability of other WRKY TFs (e.g., WRKY11, WRKY6, WRKY26, and WRKY38) [10]. For instance, it is known that WRKY11 and WRKY6 bind to W-box sequences that have a G residue directly adjacent to the 5′ end of the conserved W-box nucleotides, whereas WRKY26 and WRKY38 bind to conserved W-box nucleotides with a T, C, or A residue in the same place [10]. Combined, these findings suggest that the composition of the W-box flanking regions affect the ability of these W-box nucleotides and WRKY TFs to bind [10].
In addition, the present study provides a novel clue to the DNA-binding mechanism of WRKY TFs. As suggested by the fEMSA assay and QCM analysis, the minimum length In this study, WRKY TFs with the sequence RWRKYGK . . . RSYYKC in the Chlorophyta were found, but not in the Rhodophyta ( Figure 7A, Table 1). This suggests that these WRKY TFs may have originated in Chlorophyta and then evolved in land plants. The WRKY domain is currently regarded as a sequence that exists in all land plants, is approximately 60 amino acid residues long, and contains a WRKY DBD and a zinc-finger motif (CxCxHNH or CxCxHTC) [2,5,7]. However, the incomplete zinc-finger motif identified in M. pusilla (Mpu50253) suggests that the currently known WRKY domain with a zinc-finger motif only appeared in land plants after the development of chlorophytes ( Figure 7A). After the zinc-finger motif had developed, a series of amino-acid substitutions and duplications occurred that follow WRKY TF divergence.
By mapping main WRKY motif and zinc-finger motif next to simplified WRKY topology, a potential WRKY evolution pattern was revealed in plants (Figure 7). After Chloro-phytes, land plants gained three types of zinc-finger structures, including CxCxHNH, and CxCxHTC ( Figure 7B). We regarded these as the basal forms for WRKY TFs. Following divergence, a basal form, CxCxHNH, was dominant within Clade2. In addition, amino acid residue substitutions at 3 end of zinc-finger leading to a second zinc-finger type -CxCxHxH in Clade2. For example, a threonine replacement at zinc-finger 3 end formed a CxCxHTH zinc-finger structure in A. thaliana WRKY48 or a valine replacement at zincfinger 3 end formed a CxCxHVH in M. acuminata Mu7T13310. Lastly, loss of zinc-finger structure was found in Clade 2, like OsWRKY52. This could due to amino acid similarity between conserved WRKY region of OsWRKY52 and other Clade 2 WRKY sequences. Next, CxCxHNH, CxCxHTH and CxCxHTC zinc-finger structure were found in Clade 3. The WRKY TFs containing two WRKY domains were assigned to Clade 4, suggesting a post-species divergence WRKY-domain-duplication event. In Clade5, two main zinc-finger structures were found, including CxCxHNH and CxCxHTC. One of the common features of Clade 5 is the CxCxHTC zinc-finger structure ( Figure 7A), suggesting zinc-finger motif ancestral polymorphism maintenance or an amino-acid substitution occurring in their common ancestor. Substitutions in zinc-finger structure may associate with diverse functions of WRKY TFs in plants. That is because zinc-finger structure of WRKY TFs plays a key role in mediating WRKY protein binding to W-box nucleotide ability and also involving in oligomerization state transition of WRKY TFs [11,12]. Based on these phylogenetic relationships and the zinc-finger structures, the WRKY domain obtained the full zinc-finger motif after the emergence of chlorophytes, after which amino-acid substitutions occurred in the zinc-finger motif, followed by divergence that led to WRKY domain duplication in the common ancestor of Clade 4 and a retained ancestral zinc-finger motif polymorphism in Clade 5 ( Figure 7B). Previous models have inferred WRKY evolution based mainly on nucleotide similarity, suggesting that the current WRKY TFs evolved from TFs containing a double WRKY domain [5,13]. Unlike these previous models, our evolutionary scenario for the current WRKY TFs seen across the plant kingdom was based on development from a TF containing a single WRKY domain.

Molecular Binding Mechanism for Specific WRKY-W-Box Interaction
Based on the binding profile of the WRKY54 DBD to W1-W4 sequences, the DNA binding ability is abolished when probed with W-box (TTGACT) alone (Figures 3 and 4) in fEMSA assay. In contrast, the WRKY54 DBD could be bound well to W4 with 5 -TTG and 3 -A residues immediately adjacent to the W-box (W4) nucleotides ( Figure 5, Table 1). Based on these results, we identified the W-box flanking region as an essential element in WRKY54 DBD binding interactions [8,10]. The relative importance of the W-box flanking region has been implied in previous research on the DNA binding ability of other WRKY TFs (e.g., WRKY11, WRKY6, WRKY26, and WRKY38) [10]. For instance, it is known that WRKY11 and WRKY6 bind to W-box sequences that have a G residue directly adjacent to the 5 end of the conserved W-box nucleotides, whereas WRKY26 and WRKY38 bind to conserved W-box nucleotides with a T, C, or A residue in the same place [10]. Combined, these findings suggest that the composition of the W-box flanking regions affect the ability of these W-box nucleotides and WRKY TFs to bind [10].
In addition, the present study provides a novel clue to the DNA-binding mechanism of WRKY TFs. As suggested by the fEMSA assay and QCM analysis, the minimum length necessary to maintain the specific interaction between W4 and the WRKY54 DBD is three retained nucleotides at either end of the conserved region of W4, suggesting that the length of W-box flanking regions could be a factor in the DNA binding of the WRKY54 DBD ( Figure 5, Table 1).
The DNA binding mode of the WRKY54 DBD was proposed based on our structural model with W4 DNA. Consistent with the binding mode of the WRKY1 domain (PDB ID: 6J4E) [14], which was the template structure we used for our homology modeling of the WRKY54 DBD, the protein-DNA binding interaction was primarily mediated by the conserved WRKYGQK binding motif. However, we observed a slight difference in the structural arrangement of the loop between β1 and β2, and of another loop connecting β3 and β4 at the C-terminus. As illustrated in a previous study [15], the dynamic process of WRKY TF binding to W-box DNA involves structural rearrangements of the β1 strand harboring the conserved DNA-interacting residues. Accordingly, it has been proposed that this structural change may alter the loop structure and the length of the strands, enabling extensive contact with the DNA [15]. On the other hand, because of the highly similar DNA-binding mode used by WRKY TFs from different groups [14,15], the mechanism underlying the DNA-binding preferences of the WRKY TFs remains elusive. Despite the limited evidence, it has been noted that differences in the DNA-contacting residues among WRKY TFs may be the key factor determining their preferences for different W-box flanking regions. For example, Yamasaki et al. [15] suggested that the hydrogen-bonding interaction between R415 and dG5 in the modeled structure of WRKY4 binding to W-box (5 -CGCCTTTGACCAGCGC-3 ) may be responsible for recognizing the nucleotides in the W-box flanking region, which could explain the binding preferences of WRKY TFs. Interestingly, in structure of WRKY54 DBD binding to W4 DNA, the dT4 located adjacent to the 5 end of the W-box core region could form a hydrogen bond with the conserved residue W157. In accordance with our fEMSA results, the 5 -TTG located adjacent to the W4 box nucleotides was shown to be indispensable for the W4-WRKY54 DBD interaction. Based on these results, we propose a "recognition-then-binding" mechanism for the formation of the W4-WRKY54 DBD protein-DNA complex.
Based on our results and previous studies, the interaction between WRKY TFs and Wbox nucleotides seems to occur in a clade-specific manner [10,15]. For example, the 5 -TTG of W4 is essential for the protein-DNA interaction between WRKY54 and W4 ( Figure 5). However, the 5 -AG nucleotides of W2 of PR1-1 (pathogenesis-related protein 1-1) and the 5 -T of the conserved PR1-1 W2 are necessary for the protein-DNA interaction between WRKY11 and W2 of PR1-1 [10]. Further, the protein-DNA interaction between WRKY26 and W2 of PR1-1 is mainly affected by mutations of both the 5 -and 3 -end nucleotides of the conserved W2 region [10]. In reconstructed phylogeny, WRKY11, WRKY54, and WRKY26 belong to Clade 1, Clade 3, and Clade 4, respectively (Figure 1), suggesting the existence of clade-specific interactions between WRKY TFs and their corresponding W-box sequences.
Here, an association between the recognition specificity of WRKY TFs with their corresponding W-box sequences and their phylogeny was established. However, one other interesting issue emerged: how a particular WRKY TF is involved in various molecular regulatory networks. WRKY TFs are known to regulate complicated developmental processes (e.g., leaf senescence) and responses to environmental cues (e.g., responses to pathogen infection) [5,6,[15][16][17][18][19][20][21][22]. For example, WRKY26 is known to regulate PR1 (a pathogenesis-related protein in Arabidopsis) and SIRK (a senescence-associated expression gene in Arabidopsis) by identifying different W-box nucleotides (PR1 W2: AGTTGACCAA; SIRKW11/W12: TTGGTTGACTATCAACATCTTATTGACCAAAT) [10,21]. To be able to interact with such different W-box sequences, a conformational change in WRKY26 at the protein level might be expected. Further analyses of the protein-DNA interactions of specific WRKY domains involved in different molecular regulatory networks may provide clues that can lead to a full understanding of the mechanisms of plant-specific WRKY TFs. In summary, WRKY TFs can recognize the flanking regions of W-box sequences to enhance their binding ability and ensure transcription regulation.

WRKY Phylogeny Reconstruction
First, the core residues of WRKY domains (53-55 residues in length, on average) of green plants were labeled using Bioedit to identify conserved WRKY genes [23]. Then, to investigate WRKY gene evolution across the plant kingdom, WRKY genes were retrieved from the online Phytozome database (https://phytozome.jgi.doe.gov/pz/portal.html# accessed on 23 January 2022) [24] and the database of the National Center for Biotechnology Information (NCBI). The 73 WRKY genes of Arabidopsis thaliana (AtWRKY) were used as templates for identifying the WRKY genes of the following selected species: Micromonas pusilla, Marchantia polymorpha, Selaginella moellendorffii, Amborella trichopoda, basal angiosperms (such as Amborella trichopoda), monocots (including Zea mays and Oryza sativa), and eudicots (including Aquilegia coerulea, Solanum lycopersicum and Arabidopsis thaliana) (for details see Table 1 and Table S1). In addition, two polyploidy species are also included in this study, such as Musa acuminate (triploid) and Fragaria x ananassa (octoploid). These species are chosen to represent major plant lineages. Although we searched the Porphyra umbilicalis (Rhodophyta) database, implemented in Phytozome, we found no WRKY-containing genes in that database. Also, a WRKY gene found in M. pusilla, Mpu62993, caused long-branch attraction in preliminary phylogenetic reconstruction. Thus, Mpu62993 was excluded from the reconstruction. To conduct a comprehensive search for the WRKY genes of the selected species, the default threshold value and tblastx were used for each search. Only sequences containing a WRKY domain were used for the phylogenetic reconstruction.
First, all WRKY nucleotide sequences were aligned using the MUSCLE algorithm [25] in MEGA v.6 [26]. Next, the resulting alignment matrix was visually refined based on amino acid translations using Bioedit [23]. The best-fitting model of the aligned WRKY matrix was GTR + G + I, as evaluated using the Bayesian information criterion (BIC) [27]. Both Bayesian inference (BI) and maximum likelihood (ML) algorithms were applied to infer the relationships among the identified WRKY genes using the PhyML 3.0 online interface [28]. Branch support for nodes was assessed using an approximate likelihood ratio test (aLRT) [29] and a Bayesian-like transformation of the aLRT (aBayes) [30] for the ML and BI findings, respectively (see Supplementary Figure S1 for details). A conventional bootstrap algorithm was conducted with 1000 replicates. The reconstructed phylogeny was visualized using the online interface Interactive Tree of Life (iTOL) v5 [31].

Vector Construction for the WRKY54 DNA-Binding Domain
For protein expression and purification, the wild type WRKY54 gene (AT2G40750) was amplified via polymerase chain reaction (PCR) from the plasmid pET21a-WRKY54, which was provided by Dr. Shih-Tong Jeng (Institute of Plant Biology, National Taiwan University, Taipei, Taiwan). The DBD of WRKY54 (WRKY54 DBD) was amplified with the forward BamH1 primer (5 -CCCGGATCCGGATGCTACACTAGAA-3 ) and the reverse Pst1 primer (5 -CCCCTGCAGGAAAAGGCTCGGTCTT-3 ). It was then subcloned into the expression vector pMALH12-c5v, such that the recombinant protein WRKY54 contained a maltose binding protein tag, a thrombin cleavage site at the N-terminus, and a twelve-histidine tag at the C-terminus.

Protein Expression and Purification
The WRKY54 DBD recombinant protein was expressed using Escherichia coli Rosetta-gamiB (DE3). Cultures were incubated at 37 • C to an OD 600 of 0.4-0. 6

Determination of Molecular Size via Dynamic Light Scattering
The protein samples were subjected to dynamic light scattering (DLS) after advanced purification by SEC. Data were collected using a Zetasizer Nano ZS DLS instrument (Malvern Instruments, Malvern, UK) equipped with 50 mW laser fiber. An appropriate refractive index, viscosity (10% glycerol), and temperature (25 • C) was set for each sample.

DNA Binding Assays via fEMSA
To visualize whether the WRKY54 DBD interacted with specific W-box regions of the SAG12 gene, a series of fEMSA assays were conducted. All W-box nucleotides (including W1 to W4, which were identified from the SAG12 promoter region, and artificial synthesized W-box-like nucleotides) were labeled with fluorescein at the 5 end and used as DNA probes in the fEMSA. Prior to native polyacrylamide gel electrophoresis, the DNA probe-protein mix was incubated in the dark for 30 min. The DNA-protein binding reaction was performed by incubating purified WRKY54 DBD recombinant protein with the aforementioned double-stranded W-box and W-box-like nucleotides at 4 • C for 90 min at 120 V. After electrophoresis, the DNA probe-protein binding pattern was observed using a fluorescent luminescence image analyzer (FluorChem M, San Jose, CA, USA) at National Taiwan University.

Determination of Binding Constant between WRKY54 DBD Protein and W4 or W4-like DNA Using Quartz Crystal Microbalance Technique
The QCM technique was used to determine the dissociation constant (K d ) values and examine the binding ability of the WRKY54 DBD protein with W4 DNA or W4-like DNA. The protein-nucleotide interactions in WRKY54 DBD-W4 DNA and -W4-like DNA pairs were analyzed by using an AffinixQN QCM biosensor (Initium, Tokyo, Japan). To measure K d , one has to describe the relationship between pressure and the number of active sites on the surface undergoing adsorption by applying the Langmuir equation ( [Guest] have formed an adsorbed complex. K on and K off represent the association and dissociation rate constants, respectively. The ratio of K off to K on is K d , the dissociation constant. Prior to usage, the QCM biosensor was washed twice with 3 µL of piranha solution (H 2 SO 4 and H 2 O 2 in a 3:1 ratio) and incubated with 1% SDS for 5 min. Then, 400 µL of reaction buffer (30 mM HEPES, pH 7.5, 0.5 M NaCl, and 2 mM tris(2-carboxyethyl) phosphine [TCEP]) was applied to the dried sensor to balance and set up the magnetic stir at 1000 rpm at 25 • C. Avidin (1 mg/mL) was injected into the reaction buffer and coated on the Au electrode plate until saturation. Excess proteins were washed out with the reaction buffer until the oscillation frequency became a static horizontal line. Next, 5 µL of a double-stranded DNA probe with a biotin-labeled 5 -end (50 µM) was added. When the frequency had stabilized at ca. 0.3 Hz/s, 1 mg/mL of WRKY54 DBD recombinant protein was injected every 10 min, 10 times. The continuous titration method was used to determine the binding constant of the protein-DNA pairs, and the values were recorded as multiple binding curves using the AffinixQN v2 software (Initium, Tokyo, Japan). Data from three independent repeats were processed using AQUA v2 software (Initium, Tokyo, Japan).

Examination of WRKY54 Protein and W-Box through Chromatin Immunoprecipitation In Vivo
The chromatin immunoprecipitation (ChIP) technique was used to examine whether WRKY54 interacts with the W-box identified from SAG12 gene in vivo. Chromatin was extracted from 5.5-week-old Arabidopsis plants. After fixation with 0.37% formaldehyde, the chromatin was sheared to an average length of 500-1000 bp by sonication and then immunoprecipitated with rabbit-anti-myc antibodies (Abcam, Cambridge, UK, ab9106). The cross-linking was then reversed, and the quantity of each precipitated DNA fragment was determined via PCR using specific primers (Supplementary Table S2). Three biological replicates were performed.

Homology Modeling and Protein-DNA Complex Docking
To gain insight into the probable binding mode of WRKY54, homology modeling of the WRKY54 DBD was performed in PyMod 3 [33] with the crystal structure of the WRKY1 domain (PDB ID: 6J4E) as a template model. Next, protein-DNA docking was carried out using the ZDOCK server [34]. From the 10 top-scoring docking poses generated by the rigid-body docking algorithm, a final model was selected based on information provided in previous protein-DNA binding studies [14].

Conclusions
WRKY transcription factors (TFs) participate various molecular regulatory networks in plants. In this study, a reconstructed WRKY TFs phylogeny shows that WRKY TFs may have originated in Chlorophyta and then diversified with amino acid substitutions at zinc finger motif in land plants. In addition, minimum length and specific nucleotide of W-box flanking region are two key factors in affecting binding and recognition ability of WRKY TFs based on the results obtaining from studying binding and recognition ability of WRKY54 DBD and its corresponding W-box nucleotides.