Identification of Potential HCV Inhibitors Based on the Interaction of Epigallocatechin-3-Gallate with Viral Envelope Proteins

Hepatitis C affects millions of people around the globe every year and causes a very high number of deaths. After many years of research, hepatitis C virus (HCV) remains a serious threat to the human population and needs proper management. The in silico approach is an efficient way to identify inhibitors for various diseases in the drug discovery process. In our study, the interaction between Epigallocatechin-3-gallate (EGCG), a component of green tea, and envelope glycoprotein E2 of HCV is evaluated. EGCG is the most promising polyphenol shown through cell culture analysis to inhibit the entry of HCV. Therefore, various in silico techniques were employed to find other potential inhibitors that may behave like EGCG. First, homology modelling of the E2 protein was performed. Potential lead molecules were then predicted using ligand-based as well as structure-based virtual screening methods. The compounds obtained were screened through PyRx and ranked based on their binding affinities. Furthermore, docking of the top-ranked compounds was performed with AutoDock Vina, and their 2D interactions were plotted in LigPlot+. The lead compound mms02387687 (2-[[5-[(4-ethylphenoxy)methyl]-4-prop-2-enyl-1,2,4-triazol-3-yl]sulfanyl]-N-[3-(trifluoromethyl)phenyl]acetamide) was ranked on top, and we believe it can serve as a drug against HCV in the future, subject to experimental validation.


Introduction
The E2 protein is usually considered the site of HCV entry because it contains highly conserved regions [1]. Generally, it has a role in recognition of the target cell and attachment of the virus to it. Most of the variation in E2 is concentrated in hypervariable regions, three of which have been reported recently. Hypervariable region 1 has a role in target cell recognition and attachment. Hypervariable region 2 usually helps in binding to cell-surface receptors [2,3]. The diverse nature of the virus and certain drawbacks of the available treatments have therefore compelled scientists to look for a drug that is cost-effective and pan-genotypic.
Epigallocatechin-3-gallate (EGCG) is a component of green tea. Green tea contains several catechins, such as epigallocatechin gallate (46.8%), epicatechin gallate (13.54%), epigallocatechin (2.28%), epicatechin (8.07%), and gallocatechin (7.24%). Certain flavanols are also present in small amounts [4]. EGCG is capable of inhibiting HCV, as shown by cell culture analysis [5][6][7]. These studies also showed that it specifically targets virus entry into the cell, as well as viral attachment and cell-to-cell transmission. Therefore, the procedure of de novo drug synthesis was used to evaluate its effects on HCV envelope proteins.
The application of computational techniques in the biological sciences has provided new approaches to drug development and design. Computer-aided drug design can accelerate therapeutic drug development, which otherwise requires wet-lab work and screening processes that are costly and time-consuming. The advent of revolutionary approaches, such as virtual screening, homology modeling, genomics, proteomics, and de novo synthesis, has drastically accelerated the process of drug development [8][9][10]. The ZINC and PubChem databases contain millions of purchasable "drug-like" compounds, effectively all organic molecules that are for sale, a quarter of which are available for immediate delivery. They connect purchasable compounds to high-value ones, such as metabolites, drugs, natural products, and annotated compounds from the literature. They also offer new analysis tools that are easy for non-specialists to use yet impose few limitations on experts. These databases retain their original 3D roots, and all molecules are available in biologically relevant, ready-to-dock formats. Thus, these databases are useful sources for ligand screening [11,12]. LogP, polar surface area (PSA), molecular weight, the number of hydrogen-bond donors and acceptors, and the number of rotatable bonds are the criteria used to select drug-like molecules from these databases [13]. The molecules in these databases are used in virtual screening to identify inhibitory action against target structures [14].
Globally, 170 million people are infected with hepatitis C virus. Approximately 15-20% of the infected population progress to chronic liver infection in 15 to 20 years [15]. Hepatitis C virus is an enveloped RNA virus that belongs to the genus Hepacivirus of the family Flaviviridae. Its positive-sense genome comprises about 9600 nucleotide bases and encodes 3010 amino acids. The HCV genome contains an open reading frame (ORF), a 5′ non-coding region, and a 3′ untranslated region. The ORF encodes 11 proteins, commonly grouped into structural and non-structural proteins. Structural proteins are E1, E2, and p7, while non-structural proteins are NS2, NS3, NS4A, NS4B, NS5A, and NS5B [16].
The aim of the present study is to identify potential HCV inhibitors based on the interaction of Epigallocatechin-3-gallate with the viral envelope protein, using ligand-based as well as target-based virtual screening. Virtual screening helps in evaluating various properties of candidate molecules, such as their interaction energies and binding energies. Conventional drug development can cause toxicity in the host, whereas the in silico approach avoids exposing host cells to toxic candidates during early screening.

Homology Modeling
The E2 model of HCV was built by homology modelling. A representative sequence for each genotype was retrieved from the UniProt database [17]. Homology modelling of the retrieved sequences produced 164 models, which were built via online homology modelling servers, i.e., SWISS-MODEL [18], I-TASSER [19], LOMETS [20], and CPHmodels [21], as well as MODELLER [22]. The template used for this purpose was 4MWF. The models obtained were further evaluated with ProCheck [23] and ProSA [24] to analyze the stereochemical properties of the protein structures. The selected models were energy-minimized and refined with ModRefiner [25] to ensure that the conformations obtained were stable. The best model obtained was selected for further analysis. The models were examined in Discovery Studio Visualizer [26].

Binding Site Prediction
The binding sites of the model were predicted with different tools, namely COACH [27], TM-SITE, S-SITE [28], COFACTOR [29], FINDSITE, and ConCavity [30]. The pockets having the highest C-score were then compared with binding sites reported in the literature. The pockets that were supported by both the literature and the online tools were selected for virtual screening.

Ligand-Based Virtual Screening
Ligands were screened in three online databases: ZINC [11], PubChem [12], and DrugBank [31]. Ligands structurally similar to the known inhibitor, EGCG, were retrieved using a 70% similarity index for ZINC, a 3D similarity search for PubChem, and a 50% cutoff value for DrugBank. Ligands identified more than once were counted only once. The drug-like properties of the ligands were evaluated using Lipinski's Rule of Five, while toxicity filters were applied with the help of the SwissADME server [32].
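The similarity cutoffs and the removal of repeated hits described above can be illustrated with a small sketch. It assumes that similarity is measured as a Tanimoto coefficient over fingerprint bit sets, which the text does not state; the fingerprints, compound names, and helper functions below are hypothetical and serve only to show the logic.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similar_hits(query: frozenset, library: dict, cutoff: float) -> set:
    """Return the IDs of library compounds whose similarity meets the cutoff."""
    return {cid for cid, fp in library.items() if tanimoto(query, fp) >= cutoff}


# Hypothetical toy fingerprints, not real EGCG or database data.
query_fp = frozenset(range(1, 11))
database_one = {
    "cmpd_A": frozenset({1, 2, 3, 4, 5, 6, 7, 8, 9, 11}),
    "cmpd_B": frozenset({1, 2, 3, 20, 21, 22}),
}
database_two = {
    "cmpd_A": frozenset({1, 2, 3, 4, 5, 6, 7, 8, 9, 11}),
    "cmpd_C": frozenset({1, 2, 3, 4, 5, 6, 30, 31}),
}

# Apply a 70% cutoff to the first source and a 50% cutoff to the second;
# the set union keeps a compound found in both sources only once.
hits = similar_hits(query_fp, database_one, 0.70) | similar_hits(query_fp, database_two, 0.50)
print(sorted(hits))
```

Using a set for the merged hits is one simple way to make sure that a ligand returned by several databases enters the library a single time.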

Structure-Based Virtual Screening
DockBlaster [33], pepMMsMIMIC [34], and MTiOpenScreen [35] were used for drug mining against the E2 protein of HCV to find its potential inhibitors. The ligands were screened for binding within the specific binding cavities. The drug-likeness of each ligand was evaluated and toxicity filters were applied. The ligands fulfilling all the criteria were selected for docking.
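The drug-likeness check can be sketched as a Rule of Five filter. This is a minimal illustration and not the SwissADME implementation: the descriptor values are hypothetical, and tolerating one violation is a common convention that is assumed here rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mol_weight: float      # molecular weight in Da
    logp: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five limits are exceeded."""
    rules = (
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    )
    return sum(rules)


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: a ligand passes with at most one violation."""
    return lipinski_violations(d) <= max_violations


# Hypothetical descriptor values for illustration only.
candidates = {
    "ligand_x": Descriptors(350.0, 3.2, 2, 5),
    "ligand_y": Descriptors(620.0, 6.1, 3, 8),
}
passing = [name for name, d in candidates.items() if is_drug_like(d)]
print(passing)
```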

Library Designing
The library of the lead molecule was designed with CLEVER [36], which helps in analyzing chemical compounds as well as the conversion of the lead molecule's chemical format.

Virtual Screening and Docking on PyRx
PyRx [37] is a graphical user interface for running virtual screening. It reports the binding affinity as well as the RMSD scores of each ligand. The ligand library was subjected to virtual screening against the E2 protein of HCV. Docking of the selected ligands was carried out with the automated docking tool AutoDock Vina [38], which docks each ligand to the protein within a specified grid. The tool helped in protein and ligand preparation, optimization, grid generation near the active sites, and docking.

Docking
The interaction of the obtained ligands with the protein was analyzed with AutoDock Vina using PyRx. Docking was performed on the specific pockets that were predicted by both the literature and the online tools.
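Restricting docking to a predicted pocket amounts to defining a search box around the pocket residues. The sketch below shows one way to derive a box center and size from atom coordinates and to write them as AutoDock Vina configuration entries (center_x, size_x, and so on). The coordinates, the 4 Å padding, and the helper names are assumptions for illustration; the values actually used in the study are those reported in Table 5.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def grid_box(coords: Iterable[Point], padding: float = 4.0):
    """Return (center, size) of an axis-aligned box enclosing the coordinates."""
    xs, ys, zs = zip(*coords)
    center = tuple((min(axis) + max(axis)) / 2 for axis in (xs, ys, zs))
    size = tuple((max(axis) - min(axis)) + 2 * padding for axis in (xs, ys, zs))
    return center, size


def vina_config_lines(center, size):
    """Format the box as lines for an AutoDock Vina configuration file."""
    names = ("x", "y", "z")
    lines = [f"center_{n} = {c:.3f}" for n, c in zip(names, center)]
    lines += [f"size_{n} = {s:.3f}" for n, s in zip(names, size)]
    return lines


# Hypothetical pocket atom coordinates in angstroms, not taken from the model.
pocket_atoms = [(0.0, 0.0, 0.0), (10.0, 4.0, 2.0), (5.0, 1.0, 1.0)]
center, size = grid_box(pocket_atoms)
config = vina_config_lines(center, size)
print("\n".join(config))
```

The padding leaves room for the ligand to move around the pocket residues; a larger value widens the search at the cost of longer docking runs.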

Analysis of Interaction
To analyze the interaction between docked ligands and protein, LigPlot+ [39] was used. This helped us to clearly observe the type of bonding between the ligands and protein.

Sequences Obtained after Alignment
The template structure used for the multiple alignment is 4MWF, and ClustalW was used to perform the alignment. The alignment obtained is shown in Figure 1. It shows close similarity between template 4MWF and the query, with residues in comparable positions. The query can therefore be expected to have the same function as the template structure, and the template can be used for homology modelling.

Model Selection
Different models were obtained after ProCheck analysis. The selected models were listed based on their quality and stereochemical properties, as shown in Table 1. The tool gave us insight into the structure and can also highlight the portions of the protein that need attention. After ProCheck analysis, 23 models were selected on the basis of the Ramachandran plot. In previous studies [2,40], computational models were developed before the structures of the E2 glycoprotein had been determined experimentally. Investigators used the structures of flavivirus and alphavirus class II fusion proteins as modeling templates. A crystal structure of the envelope glycoprotein E of tick-borne encephalitis virus (PDB code 1SVB) [41] served as the key template for the first of these modeling studies. In the present work, the model having a core value of 91.2%, a disallowed region of 0.0%, a maximum deviation value of 8.2, 6 bad contacts, and a generously allowed region of 1.0% was selected for further analysis.
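The selection on Ramachandran statistics can be expressed as a simple filter followed by a ranking. In the sketch below, the entry named selected_model uses the values quoted above, while the other two entries, the 90% core threshold, and the tie-breaking on bad contacts are assumptions added for illustration.

```python
def select_models(models, min_core=90.0, max_disallowed=0.0):
    """Keep models that meet the thresholds, best core percentage first."""
    passing = [
        m for m in models
        if m["core"] >= min_core and m["disallowed"] <= max_disallowed
    ]
    # Rank by core percentage (descending), then by fewer bad contacts.
    return sorted(passing, key=lambda m: (-m["core"], m["bad_contacts"]))


models = [
    # Values reported in the text for the model chosen for further analysis.
    {"name": "selected_model", "core": 91.2, "disallowed": 0.0, "bad_contacts": 6},
    # Hypothetical models, included only to show the filter rejecting entries.
    {"name": "model_x", "core": 88.5, "disallowed": 0.0, "bad_contacts": 4},
    {"name": "model_y", "core": 92.0, "disallowed": 1.1, "bad_contacts": 9},
]

ranked = select_models(models)
print([m["name"] for m in ranked])
```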

Further Analysis by ProSa
The selected models were then subjected to further evaluation by ProSA [24]. The tool helped in determining the most suitable model on the basis of energy; therefore, the models having the lowest preference line, as shown in Figure 2, were selected. In a previous investigation [42], conserved sequences of the HCV E2 protein were compared and predicted as epitopes for vaccine development using an in silico approach.
In this study, ProSA was used to identify the best model for evaluation, which illustrates the importance of such computational tools for choosing the best structure. The models obtained through LOMETS [43] were rejected, as they gave no results when analyzed with ProSA. Different colors were assigned to the models in the graphical representation. The overall comparison suggests that MOD29 has the lowest preference line and the lowest energy level and is, hence, the most stable conformation among the selected models. Therefore, it can be used further as the homology model.
The ProCheck-generated Ramachandran plot of the model is shown in Figure 3. Almost 91.2% of residues lie in the most favored (core) region, while no residue was observed in the disallowed region. The model contains 28 proline residues, which have a specific role in protein splicing, and 32 glycine residues. In an earlier study, a 3D model of the HCV core protein was designed; its Ramachandran plot revealed only 87.1% of amino acids in the favorable region, with 12.6% and 0% in the allowed and disallowed regions, respectively [44]. The plot of the present model also indicates that a high density of amino acids adopts anti-parallel beta-sheet conformations, while some adopt collagen triple-helix conformations. Some density is also observed in the negative psi region; therefore, it can be concluded that some of the amino acids form right-handed alpha helices.

Model Topology
A model of the E2 protein of HCV is shown in Figure 4. The sequence contains 240 amino acids. The 3D model reveals 11 beta-pleated sheets and 9 alpha helices. Disulfide bridges were analyzed with the DiANNA web server (clavius.bc.edu/~clotelab/DiANNA/), which shows that the model has eight disulfide bonds. The disulfide bonds in the model were built from the template (4MWF) using patch_ss_template. In a study [45], the topology adopted by the specific transmembrane region of HCV envelope proteins gave rise to major controversy, as the model covered fewer than 30 amino acids. The current investigation therefore represents an advance in the modelling of HCV envelope proteins.

Binding Site Analysis
The binding site was analyzed with COACH [27]. The web server predicts binding sites with the help of five different tools, and results are reported as C-score values. The residues having the highest C-scores were selected for further analysis. The binding site of the HCV envelope protein is also described in a study [46] and can be considered a target site for drug design. Therefore, the residues identified in the E2 protein by COACH were compared with those reported for the whole HCV genome in the literature, and only those found by both COACH and the literature were selected. The residues are shown in Table 2. Thus, the current findings indicate the most suitable binding pocket for drug design.

Virtual Screening on the Basis of Ligands
Three databases were screened using EGCG as the reference structure, so the ligands obtained were similar to EGCG. Table 3 shows the total number of ligands obtained during ligand-based virtual screening. Several studies [47][48][49][50] reported virtual screening methods for identifying potential inhibitors that target envelope proteins of the Flaviviridae family. Thus, the current study advances the virtual screening approach for drug design against HCV envelope proteins.

Virtual Screening on the Basis of Macromolecule
The protein obtained via homology modeling was used as the target in structure-based virtual screening. Table 4 shows the total number of ligands obtained during structure-based virtual screening. In a study [51], the HCV envelope protein was targeted to inhibit HCV infection and cell-to-cell transmission by identifying novel drugs through virtual screening.

Table 4. Number of ligands obtained in structure-based virtual screening.
Tool             No. of Ligands Obtained
DockBlaster      500
MTiOpenScreen    3000
pepMMsMIMIC      200

PyRx Based Virtual Screening
Ligands obtained from structure-based or ligand-based virtual screening were further assessed via docking with PyRx [37]. The docked ligand-macromolecule complexes were ranked by binding affinity, with the lowest energy at the top. In a recent study, computational docking revealed a set of 23 drugs that block viral infection at the CD81 binding site, but after experimental analysis only one ligand was capable of binding and inhibiting the infection of Huh-7 cells. While the binding energy of those drugs ranges from −8.64 to −6.36 [52], the binding energy of the drugs obtained from our virtual screening ranges from −13.2 to −11.
A grid was generated near the binding pockets predicted by the COACH server. The X, Y, and Z coordinates are listed in Table 5. The results obtained through PyRx were further evaluated in LigPlot+, which was used to generate 2D plots of the interactions between the ligand molecules and the macromolecule.
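The ranking step can be sketched in a few lines. The ligand identifiers and binding affinities below are the ones reported for the top five complexes in the interaction analysis of this paper; the variable names and the input order are arbitrary. Because a more negative value indicates stronger predicted binding, an ascending sort places the best ligand first.

```python
# Binding affinities reported for the top five docked ligands.
affinities = {
    "zinc000230090738": -11.8,
    "mms02962350": -12.0,
    "mms02387687": -13.2,
    "zinc000150338804": -12.0,
    "mms02384293": -12.4,
}

# Sort ascending: the most negative (strongest predicted) affinity comes first.
ranking = sorted(affinities.items(), key=lambda item: item[1])

for rank, (ligand, score) in enumerate(ranking, start=1):
    print(f"{rank}. {ligand}: {score}")
```

Python's sort is stable, so ligands with equal affinity, such as the two at −12, keep their input order; a secondary key would be needed if a specific tie-break is wanted.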

Analysis of 2D and 3D Interactions of Docked Complexes
The 3D interactions of the top five docked complexes were analyzed in Discovery Studio Visualizer, and the 2D plots for the complexes were obtained from LigPlot+; they are shown in Figures 5-14, respectively.
The 2D plot for the mms02387687 complex shows no hydrogen bonding and no non-ligand bond interaction; the ligand makes only hydrophobic interactions with some non-ligand residues. Its docking score is 6312. The PyRx results suggest that it has the highest binding affinity for the macromolecule, i.e., −13.2.
Similarly, the 2D plot shows that mms02384293 forms non-ligand bond interactions with macromolecule residues Pro102 and Ala222 but lacks hydrogen bonding. Its docking score is 6332. It has the second-highest binding affinity for the macromolecule after mms02387687, i.e., −12.4.
The 2D plot of mms02962350 suggests that it forms a non-ligand bond interaction with residue Phe64. Its docking score is 6676, and the PyRx results give a binding affinity of −12.
Ligand zinc000150338804, also known as theaflavin digallate, shows two types of interaction: hydrogen bonding and non-ligand bond interaction. Hydrogen bonds are present at two sites, Tyr241 at a distance of 3.30 Å and Thr52 at a distance of 3.04 Å. Non-ligand bond interactions are also present at two sites, Gly177 and Ile94. Its docking score is 7982, and the PyRx results show a binding affinity of −12.
Ligand zinc000230090738 displays non-ligand bond interactions with the macromolecule at two residues, Leu89 and Asp100, each of which forms a hydrogen bond at only one site. The docking score of the ligand is 7342, while its binding affinity for the macromolecule is −11.8.
Some previous studies [51] show hydrogen bonds and other interactions of sofosbuvir and ribavirin with the HCV envelope protein in docking studies. Similarly, in other studies [52], the envelope protein was targeted for the identification of various inhibitory molecules; in total, ZINC11882026, ZINC19741044, ZINC00653293, and ZINC15000762 were identified as potential candidates and recognized as promising drugs against the viral envelope protein. The current findings therefore suggest progress in docking methods for the further identification of drugs against the HCV envelope protein.

Conclusions
Virtual screening methods are widely used to reduce the cost of drugs and the time it takes to discover them. In this project, the E2 protein was modeled through homology modeling. Cell culture analysis of EGCG reveals its effectiveness in blocking the path of the virus during the inoculation stage. As such, EGCG was used as a standard for screening potential inhibitors. The structure-based screening approach was successful in obtaining inhibitors that may act potently against HCV. Structure-based virtual screening yielded 3700 compounds, while ligand-based virtual screening yielded only 214. The ligands obtained were finally screened through PyRx and selected on the basis of their binding affinities. Among these ligands, mms02387687 (2-[[5-[(4-ethylphenoxy)methyl]-4-prop-2-enyl-1,2,4-triazol-3-yl]sulfanyl]-N-[3-(trifluoromethyl)phenyl]acetamide) was ranked top because of its high binding affinity. The top five ligands were further docked with the E2 protein.
The interactions between the ligands and the protein were analyzed in LigPlot+. We provide valuable information on possible drug-like compounds against HCV and conclude that this in silico dataset might help guide the scientific community toward a better understanding of ligand interactions with the HCV E2 protein.