Structure-Function Mutational Analysis and Prediction of the Potential Impact of High Risk Non-Synonymous Single-Nucleotide Polymorphism on Poliovirus 2A Protease Stability Using Comprehensive Informatics Approaches

Polio viral proteinase 2A performs several essential functions in genome replication. Its inhibition prevents viral replication, thus making it an excellent substrate for drug development. In this study, the three-dimensional structure of 2A protease was determined and optimized by homology modelling. To predict the molecular basis of the interaction of small molecular agonists, docking simulations were performed on a structurally diverse dataset of poliovirus 2A protease (PV2Apr°) inhibitors. Docking results were employed to identify high risk missense mutations that are highly damaging to the structure, as well as the function, of the protease. Intrinsic disorder regions (IDRs), drug binding sites (DBS), and protein stability changes upon mutations were also identified among them. Our results demonstrated dominant roles for Lys 15, His 20, Cys 55, Cys 57, Cys 64, Asp 108, Cys 109 and Gly 110, indicating the presence of various important drug binding sites of the protein. Upon subjecting these sites to single-nucleotide polymorphism (SNP) analysis, we observed that out of 155 high risk SNPs, 139 residues decrease the protein stability. We conclude that these missense mutations can affect the functionality of the 2A protease, and that identified protein binding sites can be directed for the attachment and inhibition of the target proteins.


Introduction
Pakistan is making significant progress in interrupting and eradicating poliovirus (PV) from the country, which is still currently categorized as endemic for poliovirus transmission. No drugs are yet available to treat infected polio sufferers. Polio antiviral drug development will depend on ongoing The present study aimed to explore PV2A pr° as probable drug targets, and to identify the functional and structural impact of nsSNPs of PV2A pr°, along with mapping of active sites on PV2A pr° using molecular docking protocols, where antiviral agents might potentially bind ( Figure 1). Computational analysis of the functional and structural effect of highly damaging SNPs will be helpful in facilitating in finding true disease associations. We propose that this study will help in understanding and predicting highly damaging missense SNPs and potential binding sites to discover the mutation structure-function relationship in PV2A pr°. Following such structure-function investigations, it is predicted that novel agents that inhibit PV through interference with proteolytic processing in virus replication might be identified.

Protein Sequence Dataset
The protein sequences of PV2A pr° were retrieved from the NCBI GenBank (https://www.ncbi.nlm.nih.gov/genbank/). We retrieved all PV2A pr° sequences reported to date, from NCBI. A total of 553 sequences were retrieved from the database. The selected sequences representing all three serotypes (Type 1-3) were selected from various geographical regions, including Africa, the United States of America, Belarus, Brazil, Cameroon, Chad, China, Congo, Egypt, Estonia, Finland, Greece, Guinea, India, Iran, Israel, Kazakhstan, Madagascar, Nigeria, Pakistan, Peru, the Philippines, Russia, Spain, Switzerland, Taiwan, Tajikistan, Turkmenistan, and the United Kingdom. Out of 553 sequences, 250, 172, and 131 were from serotype 1, 2, and 3, respectively. These sequences include all vaccine-derived and wild type genotypes of poliovirus.

Protein Sequence Dataset
The protein sequences of PV2A pr• were retrieved from the NCBI GenBank (https://www.ncbi. nlm.nih.gov/genbank/). We retrieved all PV2A pr• sequences reported to date, from NCBI. A total of 553 sequences were retrieved from the database. The selected sequences representing all three serotypes (Type 1-3) were selected from various geographical regions, including Africa, the United States of America, Belarus, Brazil, Cameroon, Chad, China, Congo, Egypt, Estonia, Finland, Greece, Guinea, India, Iran, Israel, Kazakhstan, Madagascar, Nigeria, Pakistan, Peru, the Philippines, Russia, Spain, Switzerland, Taiwan, Tajikistan, Turkmenistan, and the United Kingdom. Out of 553 sequences, 250, 172, and 131 were from serotype 1, 2, and 3, respectively. These sequences include all vaccine-derived and wild type genotypes of poliovirus.

Protein Conservation Analysis
For sequence conservation analysis, sequences were aligned to multiple sequence alignment (MSA) and position-wise diversity using the ClustalW [24] CLC sequence viewer v3.6 Workbench [25], BioEdit v. 7.2.3 [26], and Datamonkey [27]. The sequences from each serotype were first aligned to obtain a consensus sequence, and subsequently these consensus sequences were aligned with each other.

Prediction of Intrinsic Disorder Regions
Intrinsic disorder regions are misfolded regions of the viral proteins, with most viruses using these regions to proceed with their viral processes. In our analysis, we included four software packages to identify disordered positions in PV2A pr• based on the majority representation by these software tools. In this context, a residue is considered as disordered if its score is above 0.5. Based on the real valued disorder propensities generated by MetaPrDOS [28], PSIPRED [29], and IUPred [30], disordered regions are defined as regions with scores above the grey line (threshold level generated by the software).

Drug Binding Site Predictions
Possible drug binding sites were predicted against polio 2A pr• , in order to identify the possible drug target within this protein. Protein Data Bank (PDB) structure was generated by I-Tasser/Modeller [17,31] by the software COACH [17] and FTSite [32]. A meta-server approach and energy-based method for binding ligand site prediction involved the use of COACH [17] and FTSite [32], respectively. The overall results were given in the form of position of potential drug binding sites. Only consensus binding sites predicted by these two software packages were taken.

Structure Prediction and Ligand-Protein Interaction Profiling
The three-dimensional structure of PV2A pr• was predicted by homology modeling using Coxsackievirus B4 2A proteinase (pdb 1Z8R) as a template. The homology model was generated with the modeling program Modeller v.9.14 [31] using automated sequence alignment by Clustal Omega [33]. Briefly, 100 independent models were generated, and energy minimized using MMFF94x force field. The final model was selected on the basis of the smallest number of outliers in a Ramachandran plot [34].
A total of 100 poses per ligand were generated using scoring function London scoring function (dG) and Alpha triangle matcher implemented in software MOE v.2013 [35]. All ligand-protein complexes were refined, and the energy minimized using MMFF94x force field parameters [36]. Final pose selection was performed using consensus scoring using GBVI/WSA dG, London dG scoring functions [37] of already-docked poses. The top ten best scored poses by each scoring function were considered for consensus; however, only those poses that showed maximum overlap of all scoring functions were selected for final ligand-protein interaction profiling.

Retrieval of Missense Single-Nucleotide Polymorphisms Datasets
After docking analysis, certain amino acid residues showed potential binding activities, and all such amino acid residues were subjected to missense SNP analysis. These amino acids include Gly 1, Lys 15, His 20, Cys 55, Cys 57, Cys 64, Asp 108, Cys 109, and Gly 110.

Analysis of Functional Consequences of Missense Single-Nucleotide Polymorphisms
To assess the potential functional effect of the missense SNPs, 15 prediction tools were utilized, which were further distributed into four different methods. All filtered missense SNPs were classified as deleterious by at least three tools in each of the four groups, and these were denominated as convergent deleterious predicted SNPs.
A missense SNP algorithm was achieved by means of the use of four tools: PROVEAN (Protein Variation Effect Analyzer) [38], SNPs & Gene Ontology (GO) [39], Meta-SNP [40] and PredictSNP [41]. SNPs predicted as deleterious by at least three servers were considered damaging.
To predict the functional aspects of SNPs, an alignment-based scoring method was conducted through PROVEAN. In PROVEAN, a threshold of −2.5 was used (a score of −2.5 was considered "deleterious", while a score > −2.5 was considered "neutral").
In order to predict whether a certain mutation is disease-related or not, an SNPs&GO algorithm was used [39]; this tool accurately uses protein GO annotation information to predict the relationship of mutation with the disease.
To reduce the bias of a single predictor, Meta-SNP [40] was used since it incorporates four existing methods: PANTHER (Protein ANalysis THrough Evolutionary Relationships) [42], PhD-SNP (Predictor of Human Deleterious Single Nucleotide Polymorphisms) [43], SIFT (Sorting Intolerant From Tolerant) [44], and SNAP (screening for non-acceptable polymorphisms) [45] for the detection of disease-associated nsSNVs. PANTHER, SIFT, and SNAP help in variant annotation in terms of their being disruptive to protein function or equivalent to wild-type, whereas PhD-SNP mainly identifies disease-associated substitutions [43].

Analysis of Protein Stability Changes Due to High Risk Missense Single-Nucleotide Polymorphisms
To quantitatively predict the change in protein stability due to high risk missense mutations, I-Mutant v. 2.0 [50], STRUM [51], and EASE-MM (sequence-based prediction of mutation-induced stability changes with feature-based multiple models) [52] web servers were used.

Protein Conservation Analysis
PV2A pr• sequences were selected from diverse geographical regions, including Africa, the United States of America, Belarus, Brazil, Cameroon, Chad, China, Congo, Egypt, Estonia, Finland, Greece, Guinea, India, Iran, Israel, Kazakhstan, Madagascar, Nigeria, Pakistan, Peru, the Philippines, Russia, Spain, Switzerland, Taiwan, Tajikistan, Turkmenistan, and the United Kingdom. These sequences represent all serotypes of poliovirus, and show many conserved patterns among all the serotypes (File S1).

Intrinsic Disorder Regions of Polio 2A pr•
For identifying intrinsic disorder regions in polio 2A pr• , three software packages were employed. Results were compared with all three software packages. This was performed to ensure that drug binding sites did not lie within this region. In our analysis, it was observed that polio 2A pr• possesses only a few intrinsic disorder regions, proposing it as a potential drug target protein ( Figure 2). Amino acid residues GFGH (1-4) and EAMEQ (145-149) were found in Intrinsic Disordered regions (IDR).

Drug Binding Site Predictions of Polio 2A pr°
To predict the drug binding sites of polio 2Apro, COACH and FTSite algorithms were employed using PDB file as an input. The consensus drug targets predicted by these two algorithms were taken for further evaluation. In this protein, three potential binding sites were predicted using FTSite.

Drug Binding Site Predictions of Polio 2A pr•
To predict the drug binding sites of polio 2Apro, COACH and FTSite algorithms were employed using PDB file as an input. The consensus drug targets predicted by these two algorithms were taken for further evaluation. In this protein, three potential binding sites were predicted using FTSite.

Drug Binding Site Predictions of Polio 2A pr°
To predict the drug binding sites of polio 2Apro, COACH and FTSite algorithms were employed using PDB file as an input. The consensus drug targets predicted by these two algorithms were taken for further evaluation. In this protein, three potential binding sites were predicted using FTSite.    [32]. First binding site is in green color, while second and third are in orange and blue, respectively.

Structure Prediction and Ligand-Protein Interaction Profiling
An understanding of the molecular basis of ligand-protein interactions in virus-encoded proteins could pave the way towards development of potential therapeutic agents against PV infection. In the absence of any crystallographic data for PV2A, we used homology modeling to compare it with an equivalent protease from coxsackievirus B4, which has been crystallized and is 56% identical in sequence to PV2A pr• . Coxsackievirus B4 belongs to the family Picornaviridae, and is a target for natural killer cells and pancreatic islet cells, resulting in cell apoptosis [53]. Following the homology modeling protocol, described similarities between a folding structure and active site catalytic residues were discovered in both template and target proteins (Figure 4). Briefly, following the generation of 100 independent models using Modeller v.9.14, Ramachandran plots of the final model revealed that 90.5% of the residues lie in the most favored position with 6.1% in additional allowed positions, and 3.4% in disallowed regions ( Figure S1). The small number of amino acid residues (Pro 90, His 101, Ser 105, Ile 112, Cys 115) in the disallowed region of the Ramachandran plot was selected for further molecular docking and for Root Mean Square Deviation (RMSD)-based comparison with the crystal structure of Coxsackievirus B4.

Structure Prediction and Ligand-Protein Interaction Profiling
An understanding of the molecular basis of ligand-protein interactions in virus-encoded proteins could pave the way towards development of potential therapeutic agents against PV infection. In the absence of any crystallographic data for PV2A, we used homology modeling to compare it with an equivalent protease from coxsackievirus B4, which has been crystallized and is 56% identical in sequence to PV2A pr°. Coxsackievirus B4 belongs to the family Picornaviridae, and is a target for natural killer cells and pancreatic islet cells, resulting in cell apoptosis [53]. Following the homology modeling protocol, described similarities between a folding structure and active site catalytic residues were discovered in both template and target proteins (Figure 4). Briefly, following the generation of 100 independent models using Modeller v.9.14, Ramachandran plots of the final model revealed that 90.5% of the residues lie in the most favored position with 6.1% in additional allowed positions, and 3.4% in disallowed regions ( Figure S1). The small number of amino acid residues (Pro 90, His 101, Ser 105, Ile 112, Cys 115) in the disallowed region of the Ramachandran plot was selected for further molecular docking and for Root Mean Square Deviation (RMSD)-based comparison with the crystal structure of Coxsackievirus B4. In order to estimate the overall accuracy of the model, a 3D structural comparison of the PV2A pr° and the equivalent Coxsackievirus B4 protein was carried out by measuring RMSD between Cα-atom positions between two proteins using the MOE version 2013 program. After flexible alignment of both structures, an RMSD of 0.08 Å was obtained, that indicated the consistency of the homology model ( Figure 5). In PV2A pr° the predicted secondary structure consists of 27.52% alpha helix, 26.17% extended strand, and 29.53% random coils ( Figure S2), while the predicted structure from residues Ser 95 to Ser 105 comprises a long loop of the protein. In order to gain deeper insights into the ligandprotein interaction pattern, four structurally diverse ligands were docked into the binding cavity of the modelled structure of PV2A pr°. In order to estimate the overall accuracy of the model, a 3D structural comparison of the PV2A pr• and the equivalent Coxsackievirus B4 protein was carried out by measuring RMSD between Cα-atom positions between two proteins using the MOE version 2013 program. After flexible alignment of both structures, an RMSD of 0.08 Å was obtained, that indicated the consistency of the homology model ( Figure 5). In PV2A pr• the predicted secondary structure consists of 27.52% alpha helix, 26.17% extended strand, and 29.53% random coils ( Figure S2), while the predicted structure from residues Ser 95 to Ser 105 comprises a long loop of the protein. In order to gain deeper insights into the ligand-protein interaction pattern, four structurally diverse ligands were docked into the binding cavity of the modelled structure of PV2A pr• .
Ligands were prepared as a dataset for docking studies against PV2A pr• using the ChemDraw program ( Figure S3). In order to remove any bias, 100 poses per ligand were generated at 5 Å. All poses were subjected to ranking based on the scoring functions to obtain best score poses. Only those binding solutions that gave maximum overlap of the GBVI/WSA dG, London dG scoring functions were selected for ligand-protein interaction analysis.  Ligands were prepared as a dataset for docking studies against PV2A pr° using the ChemDraw program ( Figure S3). In order to remove any bias, 100 poses per ligand were generated at 5 Å . All poses were subjected to ranking based on the scoring functions to obtain best score poses. Only those binding solutions that gave maximum overlap of the GBVI/WSA dG, London dG scoring functions were selected for ligand-protein interaction analysis.

Missense Single-Nucleotide Polymorphism Dataset
After docking analysis, a total of ten amino acids were considered to be involved in binding sites. To investigate the effects of these binding site residues on the protein, we performed missense analysis. These ten amino acids (Gly 1, Lys 15, His 20, Cys 55, Cys 57, Cys 64, Asp 108, Cys 109, and Gly 110) were then subjected to missense analysis. All amino acids were mutated to every possible mutation. As a result, a total of 190 mutations were subjected to further analysis. Missense mutations

Missense Single-Nucleotide Polymorphism Dataset
After docking analysis, a total of ten amino acids were considered to be involved in binding sites. To investigate the effects of these binding site residues on the protein, we performed missense analysis. These ten amino acids (Gly 1, Lys 15, His 20, Cys 55, Cys 57, Cys 64, Asp 108, Cys 109, and Gly 110) were then subjected to missense analysis. All amino acids were mutated to every possible mutation. As a result, a total of 190 mutations were subjected to further analysis. Missense mutations totaled ten, and were employed in a variety of in silico SNP prediction tools, in order to determine the effect of a given missense mutation on the respective gene function.

Missense Single-Nucleotide Polymorphisms Analysis
To obtain higher accuracy results, four in silico SNP prediction tools (PROVEAN, SNPs&GO, Meta-SNP, and PredictSNP) were employed in our study to predict the high risk missense SNPs. A total of 190 SNPs were subjected to analysis using theses algorithms.  Table S2).
We categorized SNPs as damaging if they were predicted to be damaging by three or more SNP prediction algorithms in the case of Meta-SNP, while criteria of five or more for PredictSNP, and two or more for SNPs&GO, were used. After this classification, we further concentrated our analysis to select those SNPs of high risk from these four algorithms (PROVEAN, SNPs&GO, Meta-SNP, and PredictSNP), and they were then observed for their ability to confer damaging effects when using three or more prediction tools (Table S2). Out of a total of 192 SNPs, 155 were considered high risk and were subjected to further stability studies. To predict highly deleterious SNPs, these selected algorithms covered a maximum number of methods, including alignment scores, neural networks, hidden Markov models, support vector machine, and Bayesian classification.

Protein Stability Changes
The protein stability change upon point mutation was estimated using I-Mutant2.0, STRUM ad EASE-MM web-servers. Only highly deleterious missense SNPs were considered for the analysis. SNPs were considered as destabilizing in nature if two or more algorithms showed a decrease in stability upon mutation. Out of 155 high risk SNPs, 17 SNPs (K15L, K15P, H20E, H20L, D108C, D108E,  D108F, D108I, D108L, D108M, D108P, D108Q, D108S, D108T, D108V, D108W, and D108Y) were found to be more stable than the wild type residue. The remaining SNPs resulted in decreased protein stability compared to wild type.

Discussion
PV2A pr• is a member of the cysteine protease superfamily of proteins and is involved in various physiological processes in virus infected cells, and is a significant target for therapeutics. However, the lack of PV2A pr• crystal structure is a restraint in understanding atomic details. In this study, we took advantage of PV2A pr• primary structural similarity to other viral proteases, specifically that from Coxsackievirus B4, and performed an in silico analysis of its structure.
Computational techniques have enabled preliminary assessments of the binding capabilities of a range of potential compounds for use in viral therapy, prior to scale-up of their synthesis and further testing. To assess the binding affinity of any potentially useful compounds against their target proteins, molecular docking has been used extensively in both drug design and for screening of novel compounds. Four protease inhibitors, which may be potential therapeutic agents, were docked against PV2A pr• to determine any binding interactions. These inhibitors have shown promising in vitro results previously. Rupintrivir is a protease inhibitor which has successfully been used previously to inhibit replication of viruses with 3C proteases, including enterovirus 93 [54,55], and has therapeutic potential for treating PV infections, because its docking behavior with PV2A pr• is of a similar profile. Rupintrivir also has potent antiviral activity against all HRV serotypes tested, and has low toxicity [56]. A cation-π interaction observed between PV2A pr• Gly 111 and z-VAD, which is a known inhibitor of block eIF4Gase activity in vivo [57], suggests the therapeutic usefulness of this inhibitor. Similarly, MCPK which has hydrogen bonds with residues Gly 1, Cys 57, and Cys 64 ( Figure 6) in PV2A pr• , may also be a potential therapeutic agent. MCPK is an elastase-specific inhibitor which inhibits PV2A pr• both in vivo and in cell-free systems [8]. The presence of fluorine in rupintrivir, which is absent from elastatinal, is thought to increase its binding capacity and its potential as an inhibitor of virus replication. Yu and Lloyd [13] (Figure 7 [13], while study III comprises the Sommergruber et al. results [58].
The docking results indicated that residues Gly 1, His 20, Cys 109, and Cys 57 contribute significantly to strong electrostatic interactions in the active site, particularly His 20 and Cys 109. Apart from these residues, Lys 15, Cys17, Cys 55, Cys 64, Asp 108, and Gly 110 were found to be potentially interacting with the compounds. All these residues were then subjected to SNP analysis to predict the sites which could impair the docking, as well as to diversify protein function.
SNPs may affect protein dynamics by changing protein stability, disrupting salt bridges or hydrogen bonding. They can affect the binding affinity and protein interaction by altering the specificity of the protein, disrupting and blocking the active site. Highly deleterious or damaging SNP identification can be a complicated task in large-scale analyses, and involves testing thousands of SNPs in the respective genes. Using in silico approaches, isolation of disease-associated SNPs from  [13], while study III comprises the Sommergruber et al. results [58].
The docking results indicated that residues Gly 1, His 20, Cys 109, and Cys 57 contribute significantly to strong electrostatic interactions in the active site, particularly His 20 and Cys 109. Apart from these residues, Lys 15, Cys17, Cys 55, Cys 64, Asp 108, and Gly 110 were found to be potentially interacting with the compounds. All these residues were then subjected to SNP analysis to predict the sites which could impair the docking, as well as to diversify protein function.
SNPs may affect protein dynamics by changing protein stability, disrupting salt bridges or hydrogen bonding. They can affect the binding affinity and protein interaction by altering the specificity of the protein, disrupting and blocking the active site. Highly deleterious or damaging SNP identification can be a complicated task in large-scale analyses, and involves testing thousands of SNPs in the respective genes. Using in silico approaches, isolation of disease-associated SNPs from neutral ones could significantly help in identifying disease-related mutations. In the current study, we selected highly damaging mutations in PV2A pr• which require further experimental testing to confirm their damaging effects.
Ten potential amino acids (Gly 1, Lys 15, Cys 17, His 20, Cys 55, Cys 57, Cys 64, Asp 108, Cys109, and Gly 110) involved in binding interactions in PV2A pr• were subjected to SNP analysis to obtain highly damaging missense SNPs. Except for Gly 1, all SNPs proved to be highly damaging, as Gly 1 resides in the disordered region.
The catalytic triad of PV2A pr• comprises His 20, Asp 38, and Cys 109, which are essential for autocatalytic activity; thus, single amino acid substitutions at these sites result in reduced catalytic activity of the protein [59]. Any mutation in the Cys 109 residue would result in the decrease of the protein stability (Table S2). Similar is the case involving His 20, except H20E and H20L SNPs, which increase the protein stability. Previously, it was reported that single amino acid mutations at the His 20 and Cys l09 residues completely lost the proteolytic activity of 2A pr• [60].
Complete loss of in cis and in trans activity were observed when single amino acid mutations at Cys 55 and Cys 57 were generated using site-directed mutagenesis followed by analysis of these mutant 2A pr• biochemical properties. Substitution of cysteine at residue 64 with serine was found to only slightly alter cis and trans activity. Also, the mutant C64N was suggested to be used to distinguish the in cis and in trans activities of 2A pr• [13]. Results of the current study determined that Cys 55, Cys 57, and Cys 64 are highly deleterious missense SNPs, unveiling their significance.
Mutational studies also showed that Gly 110 was hydrolyzed, and blocked eIF-4G [14]. Studies have also revealed that the mutant G110S, along with other mutants, are involved in blocking protein synthesis, as compared to wild type residues [61]. The current work also confirms the G110 residue as a highly missense SNP, as when mutated to any residue, decreases protein stability.
In light of these results, it can be concluded that along with other reported residues (His 20, Cys 55, Cys 57, Cys 64, and Cys 109), three of the residues, Lys 15, Cys 17, and Asp 108, show potential binding activity along with stability functions, unveiling their significance. Computational analysis of the protein proved it to be a better and more suitable drug target against polio. Results are very much in accordance with the author's previous work using blood mutational samples of 2A pr• [62].
Amino acid variations causing changes with respect to protein stability is one of the important setbacks in proteomics, along with the computational techniques used to isolate high risk missense SNPs from significantly large SNP datasets. Important ligand-protein interactions have been identified following docking investigations in these studies to determine candidates for drug targeting, the potency of which may be increased following relatively simple structural alterations.

Conclusions
In our investigation, we have identified the most deleterious mutation in PV2A pr• using various bioinformatics tools. High risk missense SNPs were separated for their deleterious effect on protein function based on these tools. Hence, the in silico analysis we performed proved to be both practical and valuable for developing a thorough understanding of various diseases, thereby providing a valuable resource for pharmacogenomic approaches. Antiviral activity of these inhibitors against polio and non-polio enteroviruses using in vitro assays would help us in better understanding of protein-inhibitor interactions.
The global effort has expanded capacities to tackle other infectious diseases by building effective surveillance and immunization systems. An antiviral drug targeting viral replication, rather than the symptoms of the infection, could also be of great advantage.

Conflicts of Interest:
We declare no conflict of interest.