Understanding the Effect of Structural Diversity in WRKY Transcription Factors on DNA Binding Efficiency through Molecular Dynamics Simulation

A precise understanding of the molecular mechanism involved in stress conditions has great importance for crop improvement. Biomolecules, such as WRKY proteins, which are the largest transcription factor family that is widely distributed in higher plants, plays a significant role in plant defense response against various biotic and abiotic stressors. In the present study, an extensive homology-based three-dimensional model construction and subsequent interaction study of WRKY DNA-binding domain (DBD) in CcWRKY1 (Type I), CcWRKY51 (Type II), and CcWRKY70 (Type III) belonging to pigeonpea, a highly tolerant crop species, was performed. Evaluation of the generated protein models was done to check their reliability and accuracy based on the quantitative and qualitative parameters. The final model was subjected to investigate the comparative binding analysis of different types of WRKY–DBD with DNA-W-box (a cis-acting element) by protein–DNA docking and molecular dynamics (MD) simulation. The DNA binding specificity with WRKY variants was scrutinized through protein–DNA interaction using the HADDOCK server. The stability, as well as conformational changes of protein–DNA complex, was investigated through molecular dynamics (MD) simulations for 100 ns using GROMACS. Additionally, the comparative stability and dynamic behavior of each residue of the WRKY–DBD type were analyzed in terms of root mean square deviation (RMSD), root mean square fluctuation (RMSF)values of the backbone atoms for each frame taking the minimized structure as a reference. The details of DNA binding activity of three different types of WRKY–DBD provided here will be helpful to better understand the regulation of WRKY gene family members in plants.

in-depth understanding of the molecular mechanism involved in stress tolerance is crucial for the development of stress-tolerant pigeonpea cultivars. Similarly, the basic understanding of the biological phenomenon will help to expand the knowledge for the improvement of other crop species.
In our present study, we predicted the homology-based 3D model of the WRKY protein using computational methods. In addition, comparative molecular dynamics simulations of different WRKY variants in pigeonpea to facilitate the molecular mechanisms of WRKY transcription factors and the interaction of these WRKY transcription factors with DNA molecules. The present study provides a basis for the understanding of the structural and functional dynamics of WRKY transcription factors in response to stress in pigeonpea.

Sequence Analysis and Comparative Phylogeny
In the present study, we selected three WRKY-DBDs representing three different variants that belong to different groups [47]. The CcWRKY1belongedto group I, having two WRKY DBDs, one at N-terminal and another at C-terminal, while only the C-terminal WRKY domain took part in the sequence-specific DNA-binding process, thus the C-terminal domain of CcWRKY1 was considered Type I WRKY-DBD. The Type II (CcWRKY51), member of the group II, had a WRKYGKK motif, (a replacement from amino acid glutamine in group I to lysine; Q14K). Similarly, the Type III (CcWRKY70), member of group III, had a WRKYGEK motif (a replacement from amino acid glutamine in group I to glutamic acid; Q14E) was considered. The nucleotide distribution and A-T-G-C density examination by MATLAB Bioinformatics Toolbox revealed the A-T and G-C rich regions throughout the sequences. Subsequently, the percentage of nucleotide distribution was depicted using a pie chart ( Figure S1) and dimmers content as a bar diagram. Evaluation of amino acid composition of the selected CcWRKY protein sequence revealed the frequent occurrence of serine, lysine, threonine, glutamic acid, and valine, as compared to other amino acids ( Figure S2).
The physicochemical properties such as molecular weight (mol. wt.) of the selected CcWRKY proteins ranged from 7017.84 to 7528.51 kDa, and the instability index (II) ranged from 17.71 to 39.92 which showed Type I was relatively more stable than other proteins. A protein with an instability index ranging less than 40 usually shows higher stability at in vitro conditions, whereas a predicted instability index above 40 suggests the unstable nature of the protein. At the isoelectric point (pI), proteins were compact and stable. The pI of WRKY-DBDs ranged from 8.75-9.47, showing its basic nature (pI>7.0). For improvement of the thermal constancy of globular proteins, aliphatic index (AI) is an important factor, that indicates the stability for a wide range of temperatures. Range of extinction coefficient values 13,200-14,565 reveals the existence of Cys, Trp, and Tyr amino acid residues. The grand average of hydropathicity (GRAVY) ranges of WRKY-DBDs was very lower (−0.933 to −1.523), which pointed to its higher affinity with water molecules (Table S1). The sub-cellular localization of selected WRKYs using a support vector machine-based prediction tool WoLFPSORT server [48] showed their presence in the nuclear region. The zinc finger signature for Type I, Type II, and Type III WRKY domains were different, Type I contained both a CTD zinc-finger structure C-X4-C-X23-H-X-H and a NTD structure C-X4-C-X22-H-X-H, whereas Type II contained C-X4-C-X23-H-X-H and a Type III had C-X7-C-X23-H-X-C-type zinc finger signature.
The knowledge about the secondary structure of the target proteins is critical to explain the protein folding in 3D structure conformations. The comparative secondary structure analysis amongst the target and template sequences showed very strong homology over the entire sequence (Table S2). The secondary structure analysis also supported the primary sequence level analysis of target and templates. The secondary structure conservation also disclosed the reliability of our proposed model based on target-template alignment predicted by the Modeller software. Secondary structure and the disulfide bridges (S-S) in the target proteins were predicted by using PDBsum and CYS_REC programs ( Figure S3). The sequence alignment of Type I, Type II, and Type III WRKY DBD within pigeonpea, and with their sequential homologs in Glycine max, Phaseolus vulgaris, Medicago truncatula, and Cicer arietinum showing the strong conservation of amino acid residues across the different species ( Figure S4). A phylogenetic tree was built using complete peptide sequences of Type I, Type II, and Type III from C. cajan, and their sequence homolog from G. max, C. arietinum, P. vulgaris, and M. truncatula (Figure 1). The high bootstrap values indicated the conservation of amino acid residues across plant species. arietinum showing the strong conservation of amino acid residues across the different species ( Figure S4). A phylogenetic tree was built using complete peptide sequences of Type I, Type II, and Type III from C. cajan, and their sequence homolog from G. max, C. arietinum, P. vulgaris, and M. truncatula ( Figure 1). The high bootstrap values indicated the conservation of amino acid residues across plant species. For better understanding of the diversity of protein motifs throughout the selected CcWRKYs, the motif distributions were analyzed as a phylogenetic tree for Type I, Type II, and Type III WRKY gene clusters. The statistical significance of the predicted motifs was calculated in terms of their pvalue. In our results, motif 1 (DGYRWRKYGQKMVKGNTNPRNYYRCSSPGC) represents Cterminal WRKY domain (CTD) which is present in all the fifteen WRKY members, while motif 3 (DGYNWRKYGQKHVKGNEFIRSYYKCTHPNC) represents an N-terminal WRKY domain (NTD) which present only in group I members.
Similarly, motif 2 (PVKKHVERASHDSKIVITTYEGQHDHEIPP), which is also a type of WRKY domain, was present in all the members except the CcWRKY50 protein ( Figure 2). Furthermore, the occurrence of additional motifs or un-conserved domains explains their divergence within the group. In group I, four additional protein motifs were present, including motif 4, motif 9, along with motif 12 and motif 14. Motif 5 and motif 7 are specific to group II and III, which leads into the formation of the separate cluster in the phylogenetic tree. Motif 6 and motif eight were found to be present in both groups I and II. Some additional motifs like motif 10, motif 11, motif 13, and motif 19 were present in the entire group I members, while absent in M. truncatula ( Figure 2). For better understanding of the diversity of protein motifs throughout the selected CcWRKYs, the motif distributions were analyzed as a phylogenetic tree for Type I, Type II, and Type III WRKY gene clusters. The statistical significance of the predicted motifs was calculated in terms of their p-value. In our results, motif 1 (DGYRWRKYGQKMVKGNTNPRNYYRCSSPGC) represents C-terminal WRKY domain (CTD) which is present in all the fifteen WRKY members, while motif 3 (DGYNWRKYGQKHVKGNEFIRSYYKCTHPNC) represents an N-terminal WRKY domain (NTD) which present only in group I members. Similarly, motif 2 (PVKKHVERASHDSKIVITTYEGQHDHEIPP), which is also a type of WRKY domain, was present in all the members except the CcWRKY50 protein ( Figure 2). Furthermore, the occurrence of additional motifs or un-conserved domains explains their divergence within the group. In group I, four additional protein motifs were present, including motif 4, motif 9, along with motif 12 and motif 14. Motif 5 and motif 7 are specific to group II and III, which leads into the formation of the separate cluster in the phylogenetic tree. Motif 6 and motif eight were found to be present in both groups I and II. Some additional motifs like motif 10, motif 11, motif 13, and motif 19 were present in the entire group I members, while absent in M. truncatula ( Figure 2).

Construction and Validation of WRKY-DBD Model Variants
The WRKY-DBDs were 60-70 amino acid residues long, extracted from full-length WRKY proteins. The domain characterized by the presence of highly conserved WRKYGQK motif and BLASTP search was considered to retrieve the suitable templates from the Protein Data Bank [49] for modeling the three dimensional (3D) structure of target proteins. The BLASTP results revealed that, three templates (PDB id: 2AYD (AtWRKY1-C-terminal domain), PDB id: 1WJ2 (AtWRKY4-Cterminal domain), and PDB id: 2LEX (AtWRKY4 C-terminal domain and DNA W-box complex) showed highest sequence identity with the targets. Subsequently, the templates were selected to build protein 3D models using Modeller 9.19 program by Ben Webb (University of California, San Francisco, CA, USA).

Construction and Validation of WRKY-DBD Model Variants
The WRKY-DBDs were 60-70 amino acid residues long, extracted from full-length WRKY proteins. The domain characterized by the presence of highly conserved WRKYGQK motif and BLASTP search was considered to retrieve the suitable templates from the Protein Data Bank [49] for modeling the three dimensional (3D) structure of target proteins. The BLASTP results revealed that, three templates (PDB id: 2AYD (AtWRKY1-C-terminal domain), PDB id: 1WJ2 (AtWRKY4-C-terminal domain), and PDB id: 2LEX (AtWRKY4 C-terminal domain and DNA W-box complex) showed highest sequence identity with the targets. Subsequently, the templates were selected to build protein 3D models using Modeller 9.19 program by Ben Webb (University of California, San Francisco, CA, USA).
A total of five protein models were selected out of the 30 predicted models for each Type I (CcWRKY1), Type II (CcWRKY51), and Type III (CcWRKY70) proteins. Out of five putative models, the model with the least discrete optimized protein energy (DOPE) score and highest GA341 score was considered as a most thermodynamically stable model and used for further refinement and validation ( Figure 3).
for modeling the three dimensional (3D) structure of target proteins. The BLASTP results revealed that, three templates (PDB id: 2AYD (AtWRKY1-C-terminal domain), PDB id: 1WJ2 (AtWRKY4-Cterminal domain), and PDB id: 2LEX (AtWRKY4 C-terminal domain and DNA W-box complex) showed highest sequence identity with the targets. Subsequently, the templates were selected to build protein 3D models using Modeller 9.19 program by Ben Webb (University of California, San Francisco, CA, USA).  To stabilize the stereo chemical properties of the modelled 3D protein model, MD simulation was done for 100 ns using the GROMACS 5.0. From the RMSD analysis, it seems that the Type I WRKY showed a relatively lesser backbone RMSD value 0.1 nm than Type II and Type III which were between 0.1 and 0.2 ( Figure 4a). The last 10ns trajectory was extracted from the simulated modes and considered for further docking analysis. Root mean square fluctuation (RMSF) showed the action behavior of each amino acid residue for Type I, Type II, and Type III. The RMSF plot indicated a very similar type of residue fluctuations for Type I, Type II, and Type III, with an average of 1nm to 1.3 nm (Figure 4b). The compactness of protein was measured in terms of the radius of gyration (Rg) score. The Rg plot of protein versus time (300 K) shows the average Rg scores 1.28 nm, 1.3 nm, and 1.38 nm for Type I, Type II, and Type III, respectively ( Figure 4c). The Rg score indicated that Type II and Type III show lesser compactness with a high Rg score throughout the simulation process than the Type I WRKY. We also observed the significant differences in the solvent-accessible surface area (SASA) during the simulation run. The SASA calculated in respect to time showed that Type I showed 51.8 nm 2 while Type II and Type III showed relatively higher SASA values of 52.12 nm 2 and 56.8 nm 2 , respectively ( Figure 4d).
Further, the reliability of the protein models was evaluated by PROCHECK server [50]. Ramachandran plots generated through the PROCHECK server gives the full details about the backbone of torsion angels Phi (Φ) and Psi (ψ) dispersal of the amino acids in the protein model. The Ramachandran plot analysis for Type I, Type II, and Type III WRKY-DBDs showed that 98.1%, 98.0%, and 96.7% of the total residues lies in the most favored region. Whereas, 1.9%, 2.0%, and 3.3% were present in the additionally allowed regions, respectively, and no residues were located in generously and disallowed regions ( Figure S6). Further, the model quality was promoted by a high ERRAT score. The overall quality factor for Type I, Type II, and Type III protein models were 75.00, 65.38, and 67.27, respectively, confirming the acceptance of the protein models ( Figure S7). The VERIFY3D results showed that greater than 80.0% of the residues averaged a 3D-1D score of ≥0.2, representing the consistency of the predicted models ( Figure S8). The qualitative model energy analysis (QMEAN) provides an estimate of the absolute quality of the model by relating it to already-known X-ray crystallography-based solved protein structures ( Figure S9). Further, ProSA server [51] (protein structure analysis) was considered for the detection of errors in theoretical and experimental protein models. The z-score showed complete model quality as well as calculated the differences in the total energy of the protein structure with respect to an energy distribution derived from all possible random conformations. The ProSA analysis revealed that the z-scores were very similar for target and templates with a minimum structural error difference, which indicates that the stereo-chemical properties of the protein model coordinates were reliable ( Figure S10). Thus, on the basis of qualitative as well as quantitative parameters, the predicted models were found to be appropriate to proceed with further analysis. The RMSD estimation by the superimposition of the predicted protein model over the alreadyknown experimentally determined structure using the Superpose online server [52] also re-confirms the quality of predicted models. The superimposition of Type I, Type II, and Type III with each other showed that the overall sequence similarity between Type I and Type II: 46/64 (71.9%); Type I and Type III: 33/67 (49.3%), and Type II and Type III: 37/67 (55.2%). A similar type of results was observed when the Type I (CcWRKY1) protein model was superimposed with a template 2AYD (80.3%), the local and global RMSD values were 0.46 Å around backbone and 0.35 Å alpha carbon atoms while the similar template was aligned with Type II (CcWRKY51), the RMSD values for alpha carbons was 0.47 Å and 0.52 Å about backbone atoms and when aligned with Type III (CcWRKY70), the RMSD-values for alpha carbon atoms were1.55 Å and 1.46 Å around for backbone atoms. In contrary, when Type I (CcWRKY1), Type II (CcWRKY51), and Type III (CcWRKY70) protein model were superimposed with the template 1WJ2, we found relatively higher RMSD values for the backbone and alpha carbon atoms (Table S3). The superposition results concluded that the conservancy of WRKY DBD structures along with sequence resemblance across the different WRKY group members. The RMSD estimation by the superimposition of the predicted protein model over the already-known experimentally determined structure using the Superpose online server [52] also re-confirms the quality of predicted models. The superimposition of Type I, Type II, and Type III with each other showed that the overall sequence similarity between Type I and Type II: 46 (Table S3). The superposition results concluded that the conservancy of WRKY DBD structures along with sequence resemblance across the different WRKY group members.

Molecular Docking Analysis of WRKY-DBD Variants with DNA-W-box
Protein docking is an essential tool in structural molecular biology, which is used to identify the residues mainly responsible for the interaction among protein-protein or protein-ligand molecules. Docking analysis of WRKY-DBD with the targeted cis-regulatory motifs was studied in several plant species such as rice and A. thaliana. The representative protein structure docked with DNA W-box using HADDOCK webserver ver.2.2. server [53] generated seven, twelve, and fourteen clusters for Type I, Type II, and Type III WRKYs, respectively. For Type I, the size of cluster 1 was most significant (60%) and for Type II (44%), whereas the largest cluster in Type III was cluster 1 (22%) ( Figure 5). Among all the predicted clusters, the selection of the best cluster was made on the basis of the HADDOCK score. The best scoring complex with lowest HADDOCK score was selected for further analysis.

Molecular Docking Analysis of WRKY-DBD Variants with DNA-W-box
Protein docking is an essential tool in structural molecular biology, which is used to identify the residues mainly responsible for the interaction among protein-protein or protein-ligand molecules. Docking analysis of WRKY-DBD with the targeted cis-regulatory motifs was studied in several plant species such as rice and A. thaliana. The representative protein structure docked with DNA W-box using HADDOCK webserver ver.2.2. server [53] generated seven, twelve, and fourteen clusters for Type I, Type II, and Type III WRKYs, respectively. For Type I, the size of cluster 1 was most significant (60%) and for Type II (44%), whereas the largest cluster in Type III was cluster 1 (22%) ( Figure 5). Among all the predicted clusters, the selection of the best cluster was made on the basis of the HADDOCK score. The best scoring complex with lowest HADDOCK score was selected for further analysis.  The HADDOCK score is the sum of van der waals energy, electrostatic energy, desolvation energy as well as restraint violation energies while the z-score shows the dependability of the particular docked complexes from the cluster. For all the docked complexes, HADDOCK score versus i-RMSD and HADDOCK score versus l-RMSD plots were generated. From the predicted clusters of Type I WRKY domain, cluster 1 was considered as the best docked complex having the lowest HADDOCK score of −93.6 +/− 5.6, and z-score of −1.6 (Table S4) (Tables S5 and S6).

Comparative Interaction Analysis of Unbound and Bound Complexes Using MD Simulations
To obtain more details and accurate information about the WRKY protein structure as well as WRKY-DNA complex, the WRKY protein-DNA complex was subjected to simulations of 100 nanoseconds (ns). To calculate the stability of Type I, Type II, and III WRKY WRKY-DNA complex, backbone root mean square deviation (RMSD), root mean square fluctuation (RMSF), the radius of gyration (Rg), and solvent accessible surface area (SASA) were examined. Generally, the stability of every WRKY-DNA complex was calculated by analyzing the RMSD profiles for three replications (Supplementary dataset 1).
In the case of Type I and Type II WRKY-DNA complexes, it remains stable throughout the entire simulation run, while in Type III, higher variation in RMSD was observed. Furthermore, the Type III WRKY-DNA complex attained overall stability after 85,000 ps ( Figure 6a). The RMSF versus time plot showed that heptapeptide amino acid residues of Type III showed slightly higher fluctuation than the Type I and Type II, which enables the stable interaction of Type I WRKY DBD with DNA W-box during whole simulation process as compared to Type II and III (Figure 6b). Subsequently, Type I and Type II complexes were more compact (lower Rg value) than Type III, producing relatively more stable WRKY protein-DNA complexes (Figure 6c). Additionally, RMSF as well as Rg profiles for the Type I and Type II were constant as their RMSD profiles. This significant difference was observed in the solvent accessible surface area (SASA) values during the MD simulation run. The Type I and Type II exhibited SASA value of 50.12 nm 2 and 51.8 nm 2 , whereas Type III showed relatively higher SASA values of 55.6 nm 2 (Figure 6d). Significant alterations were noticed in the number of H-bonds in Type I, Type II, and Type III WRKY DBD and DNA during the entire simulation process (Figure 6e). The Type I, Type II, and Type III WRKYs exhibited H-bonds in a range of four to ten throughout the entire simulation process. The average total energy of Type I, Type II, and Type III WRKY WRKY-DNA complexes were −446,044 kJ/mol, −386,285 kJ/ mol, and −394,171 kJ/mol, respectively. The higher energy resulted in greater stability for the Type I WRKY WRKY-DNA complex. The stable Type I, Type II, and Type III complexes were extracted from the last 10 ns for further analysis.
The stabilization of Type I WRKY-DNA complex was due to the establishment of four hydrogen bonds (H-bond) formed with the residues Gln14, Arg28, Gly33, and Lys37 along with three hydrophobic interactions (Tyr12, Gly13, Lys15, Tyr26, and Arg28) ( Table 1 and Figure 7). Similarly, the complex of Type II was stabilized by three H-bonds with Arg10, Lys15, and Lys32 residues and four Arg12, Gly14, Lys24, and Arg28 hydrophobic interactions (Table 1 and Figure 8). The complete binding study of the Type III cluster revealed the construction of four H-bonds with Lys16, Arg22, Tyr24, and Lys35 and four hydrophobic interactions Lys11, Tyr12, Lys15, Arg33, and Tyr36 respectively (Table 1 and Figure 9). Generally, the structural feature of the WRKY-DNA complex explains that hydrogen bonds play a significant role to stabilize the protein-DNA complex along with the hydrophobic interactions.

Binding Free Energy Analysis
Comparative binding free energy estimation was carried using the MM-PBSA method to determine the binding affinity of DNA for Type I, Type II, and Type III WRKY DBD. Our results clearly showed that Type I WRKY exhibited relatively lower binding free energy of −443.21 kJ/mol as compared to Type II and Type III with free energy values of −396.86 kJ/mol and −382.12 kJ/mol respectively. Therefore, the Type I had more affinity towards DNA as compared to the Type II and III. It might be due to the loss of contacts between the amino acid residues and nucleic acid.

Binding Free Energy Analysis
Comparative binding free energy estimation was carried using the MM-PBSA method to determine the binding affinity of DNA for Type I, Type II, and Type III WRKY DBD. Our results clearly showed that Type I WRKY exhibited relatively lower binding free energy of −443.21 kJ/mol as compared to Type II and Type III with free energy values of −396.86 kJ/mol and −382.12 kJ/mol respectively. Therefore, the Type I had more affinity towards DNA as compared to the Type II and III. It might be due to the loss of contacts between the amino acid residues and nucleic acid.

Binding Free Energy Analysis
Comparative binding free energy estimation was carried using the MM-PBSA method to determine the binding affinity of DNA for Type I, Type II, and Type III WRKY DBD. Our results clearly showed that Type I WRKY exhibited relatively lower binding free energy of −443.21 kJ/mol as compared to Type II and Type III with free energy values of −396.86 kJ/mol and −382.12 kJ/mol respectively. Therefore, the Type I had more affinity towards DNA as compared to the Type II and III. It might be due to the loss of contacts between the amino acid residues and nucleic acid.

Discussion
The WRKY protein plays a diverse role in regulating different physiological processes, such as plant growth, development, plant senescence, signal molecule delivery, biotic, or abiotic stress responses [54]. Several studies on gene expression studies resulting in the revelation of biotic and abiotic stressors in different plant species have confirmed the desired role of WRKY transcription factors in defense response against pathogens [55]. The WRKY transcription factors recognize the W-box(core TTGACC/T motif), which is present at the promoter region of the genes for a preferential binding that regulates the dynamic signaling cascade [56]. The 3D structure and molecular dynamics of WRKY protein and its interaction pattern with DNA W-box in pigeonpea have not been studied yet. The analysis of targeted CcWRKYs at nucleotide as well as protein level describes its nucleotide composition, A-T, C-G loaded region as well as physicochemical properties including mol. weight, distribution of amino acids, theoretical pI, instability index, aliphatic index, and GRAVY index were found in a suitable range for controlling the stability of WRKY protein. The knowledge about the secondary structure of the target proteins is critical to understand the protein folding in 3D-structure conformations. The comparative secondary structure assessment between the target and templates showed strong homology across the entire sequence gives a valid reason for selecting particular templates [57]. The phylogenetic analysis of Type I (WRKY1), Type II (WRKY51), and Type III (WRKY70) of C. cajan, and its sequential homolog in G. max, P. vulgaris, M. truncatula, and C. arietinum showed strong conservation of core amino acid residues around the WRKY domain disclose their evolutionary significance during their phylogenetic origin. This study revealed the CcWRKYs showing closed evolutionary relatedness with G. max followed by P. vulgaris, M. truncatula, and C. arietinum. Conserved motif analysis reveals the presence of a group-specific WRKY domain within the similar groups and the presence of additional motifs or un-conserved domains explains their divergence within the group.
To empathize the molecular mechanisms essentially underlying connections that alter the WRKY transcription factor binding activity, information about the protein sequence and structure is requisite. Comparative protein modeling is considered as one of the most accurate methods for prediction of three-dimensional protein structures, producing appropriate protein models for a broad range of applications in the area of molecular sciences [57]. The greater sequence similarity assures a consistent alignment between the target sequence and the templates. Hence, 3D models for Type I, Type II, and Type III was developed and analyzed through various computational tools to analyze the stereo-chemical quality of our predicted protein model to understand its reliability. Accuracy of the protein model is essential for further analysis like docking, screening, or identification of potential lead compound because the accurateness of the protein model determines the scope of its possible applications. The quality and reliability of the predicted protein models were checked using the Ramachandran plot generated via the PROCHECK server [50]. ThePhi (Φ) and Psi (ψ) distribution of the amino acids showed 98.1% residues in the most favored region for Type I and 98.0%, 96.7% for Type II, and Type III, suggesting that the predicted protein models fall within the range of known proteins. The VERIFY3D results showed greater than 80.0% of the residues had an average 3D-1D score >0.2, supporting the reliability of the protein models [58], then further ProSA analysis was performed to confirm the quality of models [51], followed by superimposition with experimentally determined structure results. The MD simulation was run for 100 ns to stabilize the Type I, Type II, and Type III WRKY DBD structures.
The docking of protein models with the DNA-W-box using the HADDOCK web server generated seven, twelve, and fourteen clusters for Type I, Type II, and Type III, respectively. The largest cluster size of (60%) was for Type I and for Type II (44%), whereas the largest cluster in Type III is Cluster 1 (22%) ( Figure 5). Selection of the best cluster was made on the basis of HADDOCK score, and the complex with the lowest HADDOCK score was selected for further analysis [53]. The cluster 1 of Type I with HADDOCK score of −93.6 +/− 5.6, and Z-score of −1.6, was selected as best docked complex and for the Type II and Type III, cluster 2 and cluster 4 having lowest HADDOCK scores of −78.9 +/− 4.7 and −89.8 +/− 10.0and Z-scores of−1.2 and −2.3 respectively. The stabilization of Type I WRKY-DNA complex is due to the arrangement of four H-bond formed with the Gly13, Gln14, Gly33, and Lys37 residues along with three hydrophobic interactions Gly13, Lys15, and Arg28 (Table 1 and Figure 7). Similarly, the Type II WRKY-DNA complex was stabilized by three H-bonds with Arg10, Lys15, and Lys32 residues and four Arg12, Gly14, Lys24, and Arg28 hydrophobic interactions (Table 1 and Figure 8). The binding study of the Type III cluster revealed the construction of four H-bonds with Lys16, Arg22, Tyr24, and Lys35 and four hydrophobic interactions Lys11, Tyr12, Lys15, Arg33, and Tyr36, respectively (Table 1 and Figure 9). The structural knowledge of the WRKY-DNA complex revealed that hydrogen bonds play a very significant role in stabilizing the WRKY protein-DNA complex as well as hydrophobic interactions.
After the MD simulation of WRKY-DNA complex for 100 ns, the backbone RMSD profiles of Type I, Type II, and Type III were computed to check the stability of the complexes. The Type I and Type II WRKY-DNA complex showing stable throughout the simulation run, while in Type III high variation in RMSD was observed as compared to the Type I and Type II WRKY-DNA complexes, and attains the overall stability after 85,000 ps. The RMSF analysis indicating the higher degree of flexibility of interacting residues for Type II and Type II as compared with the Type I. Subsequently, Type I and Type II WRKY-DNA complexes were more compact (lower Rg value) than Type III, resulted in the stable protein-DNA complex. From the RMSD, RMSF, Rg, and hydrogen bond analysis for WRKY-DNA complexes, it was observed that the Type I complex is more stable than Type II and Type III. The binding energy results indicated that point substitution of glutamine to lysine in Type II and glutamine to glutamic acid substitution in Type III declines the binding affinity of WRKY DBD with DNA, leading to the instability of the complex. Therefore, our results showed that Type I WRKY DBD has more affinity towards DNA as compared to the Type II and Type III variants. It may be due to the loss of interaction between the protein and DNA molecules.

Sequence Retrieval and Analysis
Based on the present analysis, the most commonly occurring variants of WRKY-DBDs in pigeonpea was considered for this study. From the UniProt database [59], the Type I WRKY (WRKY1), Type II, and Type III WRKY-DBD in pigeonpea are named WRKY51, and WRKY70, which are the sequence homologs of AtWRKY1, AtWRKY51, and AtWRKY70, belonging to the WRKY group I, II, and III, and which were selected for this study. The amino acid (Accession no., XP_020234360.1, XP_020213367.1, XP_020210665.1) and nucleotide (Accession no., XM_020378771.1, XM_020357778.1, XM_020355076.1) sequences of the pigeonpea (CcWRKY1, CcWRKY51, and CcWRKY70) were retrieved from the National Centre for Biotechnology Information (NCBI) database [60]. The occurrence of the functional WRKY (DBD) region inCcWRKY1, CcWRKY51, and CcWRKY70 was identified using the NCBI-Conserved Domain Database [61] and the ExPASy-ScanProsite tool [62]. The MATLAB Bioinformatics toolbox [63] was used for sequence analysis to determine the density of nucleotides, AT-GC density, dimercount, base count, and peptide proportion.

Primary and Secondary Structure Analysis
The primary structural properties of the selected peptides were studied using the Expasy-ProtParam tool [64]. Physicochemical properties such as mol. weight, atomic composition, isoelectric point (pI), instability index (II), aliphatic index (AI), and Ggrand average of hydropathicity (GRAVY) were measured. Secondary structure prediction of CcWRKY1, CcWRKY51, and CcWRKY70 was made using self-optimized prediction from multiple alignments (SOPMA) tool [65]. The SOPMA is based on self-optimized prediction approach which has been described to increase the success rate in the secondary structure prediction of target proteins and visualization of the secondary structure by using the PDBsum web server [66]. The CYS_REC program of Soft Berry was used to calculate disulfide bridges (S-S) present in the peptides [67].

Sequence Alignment, Phylogeny, and Potential Motif Analysis
Sequence alignment of Type I, Type II, and Type III WRKY DBDs (CcWRKY1, CcWRKY51, and CcWRKY70), along with their sequential homologs in G. max, P. vulgaris, M. truncatula, and C. arietinum were performed. A phylogenetic tree was constructed using the ML method, which is based on the JTT+G+I substitution model having lowest a BIC scores (Bayesian information criterion) with 1000 replicates of bootstrap by MEGA ver.6.0 [68]. The distribution of potential motifs in selected WRKYs were investigated using multiple expectation maximization for motif elicitation (MEME) Suite 5.0.3.

Generation of Protein 3D Models
The selected CcWRKYs with >50% sequence identity with the templates in the BLASTP [70] analysis against RCSB Protein Data Bank was employed for comparative 3D protein structure modeling. Templates PDB ID: 2AYD (C-terminal WRKY domain of AtWRKY1), 1WJ2 (C-terminal WRKY domain of AtWRKY4), and 2LEX (Complex of the C-terminal WRKY domain of AtWRKY4 and W-box DNA), showing the higher sequence identity, query coverage as well as less E-value with the targets. The comparative protein modeling starts with a sequence alignment of a target sequence, and the template whose structure has been already known, the three-dimensional structure of target protein was generated by Modeller 9.19 program [57].

Evaluation of Structural Models
The stability and reliability of the generated protein models were verified using SAVES server version 4 [50]. The server has several assembled programs such as PROCHECK, which investigate the residues in available zones of Ramachandran plots, used to assess the stereochemical quality of the protein model. The ERRAT gives the overall quality factor of the protein, used to check the statistics of the non-bonded interactions among different atoms [71]. Similarly, VARIFY_3D was to determine the compatibility of the atomic model with its own amino acid sequence [58]. Energy minimization of the predicted WRKY proteins was done by using Swiss PDB Viewer 4.1.0. The ProSA Web Server was used in the refinement and validation of the minimized protein structure by measuring the deviation of the total energy of the structure with respect to an energy distribution derived from random conformations [51]. In addition, to determine the structural topology, the modeled C-terminal domain of each Type I, Type II, and Type III WRKY protein models were made to superimpose over the template using SuperPose version 1.0 [52].

Binding Site Prediction
The functionality of a protein sequence was determined by the presence of a well-conserved group of amino acids recognized as a binding site. The binding sites of Type I, Type II, and Type III CcWRKYs were predicted by the COACH server [72], which is a meta-server approach based protein-ligand binding site prediction server. It investigates the ligand binding sites in a target protein by comparing with corresponding binding-specific substructure and sequence profile. The model with the lowest DOPE score and highest GA341 score were chosen for further analysis.

Protein-DNAInteraction
To investigate the molecular connections between WRKY-DBD and DNA W-box, DNA (W-box) was docked to the specific binding sites of WRKY DBDs using HADDOCK web server version 2.2 [53]. The HADDOCK is known for high ambiguity-driven protein-protein docking that calculates docking interfaces for protein-nucleic acid complexes based on experimental knowledge in the form of ambiguous interaction restraints. Ambiguous interaction restraints (AIR) were generated on the basis of active and passive residues in WRKY-DBD. The possible binding sites predicted from the COACH server was designated as active amino acid residues while passive amino acid residues were automatically defined by the program. Ambiguous interaction restraints (AIR) were generated on the basis of lists of selected active and passive residues for both protein and DNA. Further, the illustration and visualization of the docked complexes were analyzed by UCSF Chimera ver.1.11.2 [73] for their interaction studies. The W-box DNA-protein complex was inspected using LigPlot + v.2.1 [74] and PyMol ver.3.2 [75].

Molecular Dynamics (MD) Simulations Analysis
The MD simulation of modeled WRKY-DBD variants and its complexes with DNA-W-box were carried out by using the GROMACS 5.0 software package [76]. For the simulation of bound and unbound WRKY-DBDs AMBER99SB-ILDN proteins, the nucleic AMBER94 force field was applied [77]. A force field is basically like a mathematical equation used to calculate, simulate the binding energy of the atoms. The acronym AMBER stands for assisted model building and energy refinement. For the stability analysis, protein and protein-DNA complexes were solvated into a cubic water box, with a temperature of 300 K which was retain computationally. The systems were solvated using the simple point charge (SPC) water model [78] and sodium and chloride ions were added for neutralization of the entire system. At first, the protein models were simulated to investigate its dynamic behavior and stability without the DNA molecule. The protein-DNA complex was generated using the HADDOCK server. Energy minimization was initially performed when the maximum force was below 1000 kJ/mol/nm, using the steepest descent method for 50,000 cycles, to release steric clashes or conflicting contacts. After the energy minimization step, all systems were employed for a 10,000 ps position restraint dynamics simulation (equilibration phase) under NVT-fixed volume and temperature and NPT-fixed pressure and temperature phase conditions. Finally, the well-equilibrated system was subjected to an MD production run for 100 nanoseconds at 300 K and 1 bar pressure. The stability and dynamic behavior of each residue of the WRKY-DBD variants were also analyzed by plotting backbone root mean square deviation (RMSD) versus time graph. Similarly, the radius of gyration (Rg), root mean square fluctuation (RMSF), solvent accessible surface area (SASA) and principal component analysis were analyzed using GROMACS' inbuilt utilities g_rmsf, g_gyrate, g_sasa, and g_cover, respectively.

Conclusions
The 3D structure of the WRKY domain in pigeonpea was predicted through a comparative modeling approach using already-known crystal structure as templates. On the basis of MD simulation, analysis of Type I and Type III showed higher deviation and fluctuations as compared to Type II. From the 100 ns MD simulation of Type I, Type II and Type III WRKY-DNA complexes, it was concluded that the H-bonds and hydrophobic interactions played a significant role in stabilizing the protein-DNA complex. The substitution of amino acid in the conserved domain may alter the interaction behavior of protein DBD with the DNA, which leads to the changed protein function. We conclude that the analysis of WRKY transcription factors and target cis-regulatory elements will be helpful to understand better the effect of different WRKY variants and overall molecular mechanisms regulated by the WRKY family in plants.   Figure S3. Topology diagrams showing the secondary structure of Type I, Type II, and Type III WRKY generated by PDBsum web server. Four beta strands are labeled by 'β' and represented as arrows. Figure S4. Multiple sequence alignment showing strong conservation of amino acid residues across the different species. Figure S5. Distribution of β-sheets in (a) Type I; (b) Type II, and (c) Type III WRKYs of pigeonpea. The C and N represent the C-terminal and N-terminals of the WRKY domain. Figure S6. Ramachandran plot of (a) Type I WRKY; (b) Type II and (c) Type III CcWRKY proteins. The most favored region indicated by red color, additionally allowed, generously allowed, and disallowed regions are indicated as yellow, light yellow, and white colors, respectively. Figure S7. ERRAT overall quality score analysis of the protein models for Type I, Type II, and Type III WRKY DBDs. Figure  S8. Varify3D amino acid level quality score of Type I, Type II, and Type III WRKY DBD protein models. Figure S9. Qualitative Model Energy Analysis (QMEAN) measurements for protein model quality of Type I, Type II, and Type III WRKY DBDs. Figure S10. Comparative qualitative evaluation of Type I, Type II and Type III CcWRKYs performed using ProSA web server, the plot showing the structural error at each amino acid residues in the protein and estimates the overall quality score. The ProSA score for Type I, Type II and Type III were found close to the templates (2AYB and 1WJ2). Table S1. Physicochemical characteristics of selected accessions of Type I, Type II, and III WRKYs from pigeonpea. Table S2. Comparative secondary structure analysis of target and templates predicted from Self-Optimized Prediction Method with Alignment (SOPMA) Server. Table S3. Super imposition of Type-I, Type-II and Type-III WRKY members and also with the templates to calculate the structural closeness in terms of global as well as local RMSD-values that showed the structural conservation of WRKY DBDs. Table S4. WRKY-DNA complexes predicted for Type I (CcWRKY1) WRKY-DBD using HADDOCK Server. Table S5. WRKY-DNA complexes predicted for Type II (CcWRKY51) WRKY-DBD using HADDOCK Server. Table S6. WRKY-DNA complexes predicted for Type III (CcWRKY70) WRKY-DBD using HADDOCK Server. Supplementary dataset 1. Comparative analysis of the RMSD profiles of WRKY-DNA complexes for three replications. Funding: Financial support for this research was funded by the National Agri-Food Biotechnology Institute (NABI) Core grant and SERB, Govt. of India for JC Bose National Fellowship to TRS.