The Sequence and a Three-Dimensional Structural Analysis Reveal Substrate Specificity among Snake Venom Phosphodiesterases

(1) Background. Snake venom phosphodiesterases (SVPDEs) are among the least studied venom enzymes. In envenomation, they display various pathological effects, including induction of hypotension, inhibition of platelet aggregation, edema, and paralysis. Until now, there have been no 3D structural studies of these enzymes, thereby preventing structure–function analysis. To enable such investigations, the present work describes the model-based structural and functional characterization of a phosphodiesterase from Crotalus adamanteus venom, named PDE_Ca. (2) Methods. The PDE_Ca structure model was produced and validated using various software (model building: I-TESSER, MODELLER 9v19, Swiss-Model, and validation tools: PROCHECK, ERRAT, Molecular Dynamic Simulation, and Verif3D). (3) Results. The proposed model of the enzyme indicates that the 3D structure of PDE_Ca comprises four domains, a somatomedin B domain, a somatomedin B-like domain, an ectonucleotide pyrophosphatase domain, and a DNA/RNA non-specific domain. Sequence and structural analyses suggest that differences in length and composition among homologous snake venom sequences may account for their differences in substrate specificity. Other properties that may influence substrate specificity are the average volume and depth of the active site cavity. (4) Conclusion. Sequence comparisons indicate that SVPDEs exhibit high sequence identity but comparatively low identity with mammalian and bacterial PDEs.

The generated model and DiANNA web server [38] also confirmed the presence of sixteen disulfide bridges in PDE_Ca.   PDE_Pm, phosphodiesterase Protobothrops mucrosquamatus (Gene Bank ID: XP_015675293.1), PDE_Pe, phosphodiesterase Protobothrops elegans (Gene Bank ID: BAP39928.1). Residues involved in catalysis and metal ion binding are underlined with blue and black, respectively. Cysteine residues that form disulfide bridges are linked (yellow lines). Putatively N-glycosylated amino acid residues are underlined in brown. Amino acid residues in the somatomedin B domain, the somatomedin B-like domain, the ectonucleotide pyrophosphatase/phosphodiesterase domain (also called autotaxin), and the DNA/RNA non-specific domain, are colored green, light green, blue, and red, respectively. Secondary structural elements (alpha helices and beta strands) are shown above the sequence. Sequence numbering corresponds to the PDE_Ca precursor protein.
The ThreaDom (Threading-based Protein Domain Prediction) online web server [41] for domain conservation indicated that the domain architecture of PDE_Ca is fully conserved, with other proteins in the Protein Data Bank (PDB) containing a similar structure fold ( Figure 2). The molecular weights of PDE_Ca in zymogen and the mature forms were 96.37 and 93.10 kDa, with corresponding pIs of 8.13 and 8.05, respectively [42]. The theoretically calculated molecular weights and pIs accord with the experimentally measured values for other SVPDEs [43][44][45][46][47].
The PROCHECK analysis of the best model shows that >95% of the amino acid residues are in the most favored region of the graph (Figure 3). The remaining 5% are in the allowed region with no residues in the forbidden/disallowed region. ERRAT analysis shows an overall quality factor of 87.88 for the PDE_Ca model, which lies in the average quality range for 3D protein structures [55]. In total, >95% of the amino acid residues are in the most favored region, while the remaining 5% are in the allowed region with no residues in the forbidden/disallowed region. Quadrant I displays a region where multiple conformations are allowed. Quadrant II shows the biggest region in the graph, with the most favorable conformations of atoms. Quadrant III shows the next biggest region in the graph, where the right-handed alpha helices lie. Quadrant IV has almost no outlined region. This conformation is disfavored due to steric clash.

Molecular Dynamics Simulation
GROMACS, AMBER16, MDWeb, and MDMobby [58][59][60] all produced the same results. MD simulation analysis indicated that all structural parameters, such as chirality, disulfide bonds, and the absence of steric clashes, were correct ( Figure S1A). The two basic assessments (root mean square deviation (RMSD) and radius of gyration (RG)) used to validate the structures through MD simulation were analyzed. The RMSD deviation indicated that the PDE_Ca structure did not deviate more than 1 Å from the initial structure, and the RG was also maintained at around 31.7 Å ( Figure S1B,C). Flexibility analysis (Bfactor and RMSD per residue), identified some flexible regions, located mostly in loop regions ( Figure S1C,D). All these analyses indicate that the simulated structure does not exhibit critical structural deformations.

Overall Structure of PDE_Ca
The 3D structure of PDE_Ca is similar to that of the other members of the alkaline phosphatase-like superfamily (ALP-like superfamily) [25,[61][62][63]. PDE_Ca has a complex structure. It is a multi-domain protein that consists of four domains-a somatomedin B domain, a somatomedin B-like domain, an ectonucleotide pyrophosphatase/phosphodiesterase domain (also called autotaxin), and a DNA/RNA non-specific domain ( Figure 4). These domains are briefly described below.

Somatomedin B Domain
The somatomedin B domain (SMB) is located at the N-terminus of the protein and comprises amino acid residues 33-79 ( Figures 1, 4 and 5). It has two alpha helices (Figures 4 and 5). The SMB domain is stabilized by four intrachain disulfide bridges, one salt bridge (between Asp52-Arg58) ( Table 3), and extensive hydrogen bonding (14 intrachain H-bonds).

Somatomedin B-like Domain
The somatomedin B domain connects to another domain called the Somatomedin B-like domain (SMB-like). This domain consists of residues 81 to 124 ( Figure 1). Like the somatomedin B domain, it also contains eight cysteine residues in four disulfide bridges ( Figure 1). The secondary structure of this domain contains two alpha helices (Figures 4 and 5). Beside its disulfides, the SMB-like domain is stabilized by four salt bridges (Arg82-Glu85, Arg87-Asp104, Arg87-Asp98, Lys102-Asp98), and thirteen interchain H-bonds (four with the SMB domain and nine with the PDE domain), as well as 24 intrachain H-bonds.
Both the SMB and SMB-like domains are highly compact. Disulfide bridges are arranged in the centers of both domains, forming covalently bonded cores (Figure 4).
The ENPP/PDE domain consists of amino acid residues 147-479 ( Figure 1). It contains five cysteine residues, two of which make an intrachain disulfide bridge and three of which make interchain disulfide bridges (two with the connecting segment and one with the DNA/RNA non-specific domain) (Figures 1 and 4) ( Table 2). This domain also contains the active site residues, together with the two zinc ions (Figure 4). One of the zinc ions (Zn +2 1) is coordinated by four amino acids (Asp153, Thr191, Asp358, and His59), and the other one (Zn +2 2) is coordinated by three residues (Asp311, His315, and His476). All these catalytic amino acid residues are fully conserved in mouse NPP1 [62], human NPP3 (61), human autotaxin ENPP2 [64], and bacterial PDE [29] (Figure 6).
The secondary structure of this domain contains 14 beta strands and 14 alpha helices ( Figure 5). Of the fourteen alpha helices and beta strands, seven and five are short alpha helices and beta strands, respectively. This domain is stabilized by interchain disulfide bridges (one with the SMB-like domain and seven with the DNA/RNA non-specific domain), 13 salt bridges (Table 3), and numerous interchain H-bonds.

DNA/RNA Non-Specific Domain
This domain comprises residues 603 to 867 ( Figure 1). It contains nine cysteines that form one intrachain and three interchain disulfides (one with the PDE domain and two with the connecting segment) (Figure 4). This domain also contains the Ca 2+ -binding loop ( Figure 4). The secondary structure of this domain contains seven beta strands and ten alpha helices (Figures 4 and 5). This domain is connected to the PDE domains through a long loop, called the lasso loop [65]. This domain is stabilized by four disulfides, eight salt bridges, and numerous H-bonds.  (1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14)(15)(16)(17)(18)(19)(20) and beta strands (A-L) are represented as cylinders and arrows, respectively. Short alpha helices and beta strands are shown as primes (e.g., 17'). Secondary structures and amino acid residues in alpha helices and beta strands were assigned from the primary sequence using the program DSSP [66] and were confirmed with PyMOL from the tertiary structure. Parts of the secondary structure belonging to the somatomedin B, somatomedin B-like, ectonucleotide pyrophosphatase/phosphodiesterase, and DNA/RNA non-specific domains, as well as the connecting segments (UN), are colored in green, light green, blue, red, and black, respectively.
To explain these broad specificities of SVPDEs, the structures of two other PDEs from Vipera lebetina (PDE_Vl) and Bothrops atrox (PDE_Ba) were modeled, using the same modeling and validation programs used for the PDE_Ca structural model. The atomic coordinate of human Ectonucleotide pyrophosphatase/phosphodiesterase 3 (PDB ID: 6C01, amino acid sequence identity 64.09% and 62.91% with PDE_Vl and PDE_Ba, respectively) was used as a template. The Ramachandran plot analysis indicates that in both the modeled structures, 98% of amino acid residues were in the favored region of the plot, and 2% were in the allowed region ( Figures S2 and S3). The ERRAT analysis shows an overall quality factor of 90 for the modeled structure, which lies in the average quality range for the protein 3D structures (Figures S4 and S5) [55]. The modeled structures of the PDE_Ca, PDE of Vipera lebetina (PDE_Vl), and Bothrops atrox (PDE_Ba) were compared, taking into account the active site residue composition, active site cavity volume and average depth, and the surface charge distribution (Figure 7, Table 4). The active site's amino acid residues are the same for these enzymes. However, the average active site's cavity volume, the average depth of the active site, and the surface charge distribution vary considerably (Figure 7, Table 4).
The average active site cavity volumes of PDE_Ca, PDE_Vl, and PDE_Ba are 6608.25, 3985.03, and 2243.11 Å 3 , respectively, with corresponding average depths of 21.39, 12.54, and 9.86 Å, respectively (Table 4). These values indicate that the average active site cavity volume and depth of PDE_Ca is much larger than that of either PDE_Vl or PDE_Ba. These characteristics permit larger substrates (ATP) to access the active site of PDE_Ca, while preventing it for PDE_Vl.
Another factor that affects the substrate specificity of these enzymes is the surface charge distribution (Figure 7). The surface charge of PDE_Ca (overall and around the active site) is highly positive (Figure 7A), for PDE_Vl it is partially positive and negative ( Figure 7B). Analysis of the active site cavity volume and its average depth indicate that SVPDEs with small active site cavity volumes and average depths (like Vipera lebetina, Daboia russelli russelli, and Cerastes cerastes ) ( Table 4) show a high preference for ADP, while other SVPDEs with large active site volumes and average depths (like PDEs from Crotalus adamanteus, Trimeresurus stejnegeri, and Bothrops jararaca) show a high preference for ATP.  2.8. Structural Alignment between PDE_Ca, Human ENPP3, Mouse NPP1, Human Autotaxin, Xa NPP1, PDE_Vl, PDE_Ba and Naja atra atra PDE.
The structural alignment between PDE_Ca, human ENPP3 [53], mouse NPP1 [61], human autotaxin [63], Xa NPP1 [62], phosphodiesterase from Vipera lebetina (PDE_Vl), Bothrops atrox (PDE_Ba), and the PDE from Taiwan cobra (Naja atra atra; PDB ID: 5GZ4 and 5GZ5) shows that the three-dimensional structural folds of these enzymes are similar and that all of them align well, with an RMSD value range between 0.21 and 0.92 (average RMSD value of 0.61) ( Table 5) (Figure 8). They have the same active site residues (Figures 1 and 6) and disulfide bridges (Figure 1). However, the amino acid residues in the loop regions vary considerably, both in composition and length (Figure 8). For this reason, the surface charge distribution also varies ( Figure 7A-G), which may impart variable substrate specificity to these enzymes [45,68,69]. The overall surface charge for the SVPDEs is positive, while it is negative for human ENPP3, mouse NPP1, human autotaxin, and Xa NPP1 ( Figure 7C-F). The average active site cavity volume and average depth also vary among these enzymes ( Table 4).

Maturation Mechanism for SVPDEs
PDE_Ca, like other SVPDEs, is synthesized as a precursor protein (zymogen) [43,45]. The immature PDE_Ca contains 851 amino acid residues [64] (Figure 1), in which the first 23 amino acid residues belong to a signal peptide (confirmed with SignalP 3.0 [70], Figure 9A), eight amino acid residues to the activation peptide, and the remaining 820 amino acid residues to the mature protein [48]. The signal peptide prevents the protein from proper folding and is removed cotranslationally or by signal peptidases [71,72]. The Kyte and Doolittle hydropathy plot [42] indicates that this region is located in the hydrophilic part ( Figure 9B). The function of the activation peptide is unknown. However, this is considered important for proper folding of the protein, as described for spider venom and plant proteins [2,73]. This part is removed by endopeptidases [74] ( Figure 9D). It is also located in the hydrophilic region of the Kyte and Doolittle hydropathy plot [42], and it is exposed on the surface of the protein (and thereby accessible to peptidases). The remaining peptide does not undergo further processing.

Sequence Retrieval and Multiple Sequence Alignment
The amino acid sequence of PDE_Ca (851 amino acid residues) (gene bank accession no. JAS04699.1; UniProt ID: A0A0F7Z2Q3) [64] was obtained from the NCBI (National Centre for Biotechnology Information) protein database (http://www.ncbi.nlm.nih.gov/protein). The signal peptide was identified using the SignalP 3.0 server [70] with default parameters. The amino acid sequence of PD_Ca was used as a query for searching homologous proteins from the non-redundant database by searching with the NCBI Protein BLAST using default parameters. The multiple sequence alignment of the selected homologous protein sequences, including the amino acid sequence of PD_Ca, was generated using MUSCLE [75]. The aligned sequences were edited and colored in Box-shade V3.21 [76].

Sequence Logo Generated from Multiple Sequence Alignment
The Weblogo 3.2 [77,78] was used to illustrate the conservation patterns of amino acids in the protein sequence, and graphically represent the multiple sequence alignment, using default parameters, except for the composition, which was done without adjustment.

In Silico Analysis of the Domain and Biochemical Properties of the PDE_Ca
The PDE_Ca primary sequence was analyzed for the presence of domains/motifs using the conserved domain search tool [79], available at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. The protparam and Compute pI/MW tools from the ExPASy Proteomics server (http://web.expasy.org/ compute_pi/) [42] were used to compute the isoelectric point (pI) and molecular weight of the protein.

Prediction of Ligand Binding and Glycosylation Sites
The 3DLigandSite-Ligand binding site prediction Server [80] was used for ligand binding amino acid residues in PDE_Ca, while putative glycosylation sites were predicted using the NetNGlyc 1.0 Server [48] and ScanProsite tool [81], with default parameters.

Homology Model Building of PDE_Ca
The 3D model of PDE_Ca was generated using various online proteins modeling programs, such as I-TESSER [87], the MODELLER 9v19 program [51], and SWISS Model [52], using the atomic coordinates of human Ectonucleotide pyrophosphatase/phosphodiesterase 3 (PDB ID: 6C01, amino acid sequence identity 63.39%) as a template [53]. The final model was selected based on the quality and validation reports generated by PROCHECK [50].

Molecular Dynamics Simulation
The modeled structure of PDE_Ca was validated through MD simulation using various programs, like AMBER16 [58], GROMACS [59], MDweb, and MDMoby [60]. The all-atom protein interaction was determined using the FF14SB force field [85]. The web-server H ++ [84] was used to determine the protonation states of the amino acid side chain at pH 7.0. Chloride ions were used for system neutralization and were placed in a rectangular box of TIP3P water and extended to at least 15 Å from any protein atom. For the removal of bad contacts from the structure, the system was energy minimized for 500 conjugate gradients steps by applying a constant force constraint of 15 kcal/mol. Å 2 . The system was then heated gradually from 0 to 300 K for 250 ps with a constant atom number, volume, and temperature (NVT) ensemble, at the same time that the protein was restrained with a constant force of 10 kcal/mol. Å 2 . The equilibration step was carried out using the constant atom number, pressure, and temperature (NPT) ensemble for 500 ps, and the simulation was done for 100 ns with a 4 fs time step. The temperature and pressure were kept constant at 300 K and 1 atm, respectively, by Langevin coupling. The long-range electrostatic interactions were computed with the Particle-Mesh Ewald method (PME) [85], keeping the cut-off distance of 10 Å to Van der Waals interactions.

Structure Superimposition
The build PDE_Ca protein model was aligned to homologous proteins using the PyMOL molecular graphics visualization program [86].

Surface Charge Analysis
Charge and radius calculations were carried out using the PDB2PQR server program [88]. The surface and charge were then visualized in ABPS Tools from the PyMOL molecular graphics visualization program [86].

Conclusions
In conclusion, a sequence and structural analysis of PDE_Ca was carried out using various computational biology programs. The amino acid sequence comparison analysis indicated that SVPDEs display high sequence identity (90.6%) with one another and comparatively low sequence identity (58.33%) with mammalian and bacterial PDEs. The three-dimensional model of PDE_Ca, produced using various modeling programs, was of good quality, as shown by the PROCHECK and ERRAT analysis. The modeled structure was further analyzed by molecular dynamic simulation, and the analysis indicated that all important structural parameters, such as chirality, disulfide bonds, and the absence of steric clashes, were correct. The root mean square deviation and radius of the gyration did not suffer significantly during model building and were maintained at 1 Å and 31.7 Å, respectively. The structural analysis indicated that the complex structure of PDE_Ca is folded into a multi-domain protein that comprises four domains-a somatomedin B domain, a somatomedin B-like domain, an Ectonucleotide pyrophosphatase/Phosphodiesterase domain (also called autotaxin), and a DNA/RNA non-specific domain. Structural comparisons with PDEs from other snake venoms, and mammalian and bacterial counterparts indicated that the surface charge distribution and the average active site cavity volume and depth vary considerably, which may contribute to their variable substrate specificity. Finally, during the maturation process, venom PDEs lose their signal and activation peptides to convert into the fully active mature forms. The structure of PDE_Cα presented in this paper is only a predicted structure. These conclusions need to be confirmed with experimental evidence [89].
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6651/11/11/625/s1. Figure S1. Molecular dynamic simulation analysis of PDE_Ca; Figure S2. Ramachandran plot of the modeled structures of Vipera lebentia phosphodiesterase model; Figure S3. Ramachandran plot of the modeled structures of Bothrops atrox phosphodiesterase model; Figure S4. Errors plot for the modeled structure of Vipera lebentia. The plot was generated by ERRAT2. The amino acid residues showing errors were shown by black lines; Figure S5. Errors plot for the modeled structure of Bothrops atrox model. The plot was generated by ERRAT2. The amino acid residues showing errors were shown by black lines; Figure S6. Phylogenetic relationships of PDEs based on protein sequences according to the neighbor-joining method without distance corrections.
Author Contributions: A.U. design of the work, conceptualization, supervision, and methodology; C.B. Funding Acquisition, Validation, analysis, and interpretation of data; K.U., writing-review; H.A., manuscript drafting and S.u.R. substantial revision and editing of the manuscript.
Funding: This research received no external funding.