Discovery of Novel HIV Protease Inhibitors Using Modern Computational Techniques

The human immunodeficiency virus type 1 (HIV-1) has continued to be a global concern. With the new HIV incidence, the emergence of multi-drug resistance and the untoward side effects of currently used anti-HIV drugs, there is an urgent need to discover more efficient anti-HIV drugs. Modern computational tools have played vital roles in facilitating the drug discovery process. This research focuses on a pharmacophore-based similarity search to screen 111,566,735 unique compounds in the PubChem database to discover novel HIV-1 protease inhibitors (PIs). We used an in silico approach involving a 3D-similarity search, physicochemical and ADMET evaluations, HIV protease-inhibitor prediction (IC50/percent inhibition), rigid receptor–molecular docking studies, binding free energy calculations and molecular dynamics (MD) simulations. The 10 FDA-approved HIV PIs (saquinavir, lopinavir, ritonavir, amprenavir, fosamprenavir, atazanavir, nelfinavir, darunavir, tipranavir and indinavir) were used as reference. The in silico analysis revealed that fourteen out of the twenty-eight selected optimized hit molecules were within the acceptable range of all the parameters investigated. The hit molecules demonstrated significant binding affinity to the HIV protease (PR) when compared to the reference drugs. The important amino acid residues involved in hydrogen bonding and п-п stacked interactions include ASP25, GLY27, ASP29, ASP30 and ILE50. These interactions help to stabilize the optimized hit molecules in the active binding site of the HIV-1 PR (PDB ID: 2Q5K). HPS/002 and HPS/004 have been found to be most promising in terms of IC50/percent inhibition (90.15%) of HIV-1 PR, in addition to their drug metabolism and safety profile. These hit candidates should be investigated further as possible HIV-1 PIs with improved efficacy and low toxicity through in vitro experiments and clinical trial investigations.


Introduction
HIV protease (PR) is one of the three enzymes essential in the life cycle of the human immunodeficiency virus (HIV) survival and replication. At some point in the life cycle of HIV, immature viral particles are produced. These budded immature viral particles that contain catalytically inactive protease cannot undergo maturation to an infective form [1]. The role of PR is basically to catalyze the hydrolysis of Gag and Gag-Pol polyproteins thereby generating mature infectious virions [2]. HIV protease inhibitors work by antagonizing the process that leads to the generation of mature infectious virions. When there is a chemical inhibition of this enzyme or there is a mutation in the catalytic residues of the PR, there is a production of immature noninfectious viral particles which lacks the capacity for viral infectivity [1,[3][4][5]. Therefore, HIV PR has played a key role as an attractive target for the design of drugs against acquired immunodeficiency syndrome (AIDS) [6]. HIV protease inhibitors are a class of anti-HIV drugs that inhibit the hydrolysis of Gag and Gag-Pol polyproteins, which ultimately prevents the production of mature infectious virions [7]. Currently, the FDA has approved ten protease inhibitors for clinical use. They are saquinavir, indinavir, ritonavir, nelfinavir, amprenavir, fosamprenavir, lopinavir, atazanavir, tipranavir and darunavir.
Over the decades, computer-aided drug design (CADD) has brought tremendous breakthroughs in the field of drug discovery [8,9]. Basically, CADD is used to catalyze and rationalize the drug design process while reducing costs and time [10,11]. DiMasi et al. [12] and Song et al. [13] have reported that an average of 10-15 years and an average of USD 800 million will be required to bring a new drug into the market. Paul and co-workers, in 2010, approximated the cost of drug discovery to USD 1.8 billion [14]. CADD explores computational techniques to study the drug-receptor interactions and to determine the binding potential and affinity of a given molecule to a drug target [15]. Different chemical interactions such as hydrophobic, electrostatic and hydrogen bonding are seen in the binding of ligands to the receptor. It has been applied in the earliest stage in drug discovery to identify hit compounds by high throughput screening (HTS) [11]. Virtual screening focuses on filtering libraries of compounds using in silico methods to prioritize those compounds that are most likely going to be active against a selected target. Subsequently, these hits and lead molecules are tested with a suitable activity assay for potency and ADMET properties. Notable among the modern techniques in CADD is pharmacophore modeling.
Pharmacophore is a molecular framework that carries the essential features (phoros) responsible for a drug's biological activity (pharmacon) [16]. Similarly, the International Union of Pure and Applied Chemistry (IUPAC) has defined pharmacophore as "the ensemble of steric and electronic features that is necessary to ensure the optimal supra-molecular interactions with a specific biological target structure and to trigger (or to block) its biological response" [17,18]. There are so many applications of pharmacophore modeling in the drug discovery process ranging from virtual screening to target identification, scaffold hopping, ligand profiling, lead optimization and de novo drug design. Therefore, this method is a widely used tool in CADD and chemoinformatics fields [19].
Pharmacophoric modeling is based on the theory that two or more compounds that have common chemical functionalities, and maintain a similar spatial arrangement, will lead to similar biological activity on the same target [20]. The key important pharmacophoric feature types are hydrogen bond donors (HBDs); hydrogen bond acceptors (HBAs); hydrophobic areas (H); positively and negatively ionizable groups (PI/NI); aromatic groups (AR); and metal coordinating areas ( Figure 1). Additional size restrictions in the form of shape or exclusion volumes (XVOL)-forbidden areas-can be added to represent the size and shape of the binding pocket [21]. Pharmacophore modeling can be structure-based or ligand-based. In this study, we have used ligand-based pharmacophore modeling to screen the PubChem database for compounds with bioactivity against HIV-1 protease.
The result of the pharmacophoric similarity search is summarized in Figure 2. Preliminary filtration was performed, where compounds that had more than one Lipinski violation were filtered off. The search result returned a total of 6759 similar compounds from different HIV protease inhibitors used as reference (Figure 2). At the end of further screening and optimization of these compounds, we obtained 46 hits ( Figure 2). We noted that there were no hits from saquinavir, tipranavir and indinavir based on our screening criteria. The screening criteria used to exclude molecules include violation of more than 2 Lipinski's parameters, binding affinity less than the reference drug (from the docking scores), low anti-HIV protease inhibition activity, poor ADME properties and potential toxicity. The molecules from saquinavir, tipranavir and indinavir failed in one or more of the above criteria.
The result of the pharmacophoric similarity search is summarized in Figure 2. Preliminary filtration was performed, where compounds that had more than one Lipinski violation were filtered off. The search result returned a total of 6759 similar compounds from different HIV protease inhibitors used as reference (Figure 2). At the end of further screening and optimization of these compounds, we obtained 46 hits ( Figure 2). We noted that there were no hits from saquinavir, tipranavir and indinavir based on our screening criteria. The screening criteria used to exclude molecules include violation of more than 2 Lipinski's parameters, binding affinity less than the reference drug (from the docking scores), low anti-HIV protease inhibition activity, poor ADME properties and potential toxicity. The molecules from saquinavir, tipranavir and indinavir failed in one or more of the above criteria.   Table 1 shows the physicochemical properties of the compounds which are important factors in the determination of the solubility, permeability, and bioavailability of an orally delivered drug. With the exceptions of HPS/013, HPS/014, HPS/015, HPS/016, and HPS/017, we noted that the molecular weights of most of the compounds are greater than 500 Da. This is not so surprising as the molecular weights of all the reference drugs used were more than 500 Da. However, other parameters for drug-likeness according to Lipinski's rule of five (Ro5) were obeyed, ensuring that the compounds did not violate more than one of Lipinski's rules. Traditionally, therapeutics have been small molecules that fall within Lipinski's rule of five (i.e., a molecule with a molecular mass less than 500 Da, no more than 5 hydrogen bond donors, no more than 10 hydrogen bond acceptors, and an octanol-water partition coefficient log P not greater than 5) [22]. Violation of not more than one rule is a good indication of molecules that will have good oral bioavailability. It should be noted that compounds that are substrates for biological transporters do not obey this rule, they can have violations up to 4 [23].  Table 1 shows the physicochemical properties of the compounds which are important factors in the determination of the solubility, permeability, and bioavailability of an orally delivered drug. With the exceptions of HPS/013, HPS/014, HPS/015, HPS/016, and HPS/017, we noted that the molecular weights of most of the compounds are greater than 500 Da. This is not so surprising as the molecular weights of all the reference drugs used were more than 500 Da. However, other parameters for drug-likeness according to Lipinski's rule of five (Ro5) were obeyed, ensuring that the compounds did not violate more than one of Lipinski's rules. Traditionally, therapeutics have been small molecules that fall within Lipinski's rule of five (i.e., a molecule with a molecular mass less than 500 Da, no more than 5 hydrogen bond donors, no more than 10 hydrogen bond acceptors, and an octanol-water partition coefficient log P not greater than 5) [22]. Violation of not more than one rule is a good indication of molecules that will have good oral bioavailability. It should be noted that compounds that are substrates for biological transporters do not obey this rule, they can have violations up to 4 [23].

Physicochemical Properties Evaluation
The partition coefficient (logP) is the measure of the lipophilicity of a drug and an indication of its ability to cross the cell membrane. It has served as one of the fundamental principles for drug discovery and design [24]. Maintaining a balance in octanol-water partition coefficient of a molecule is very important. All the hit molecules are within the range of Lipinski's Ro5. When there is high lipophilicity, there will be a high rapid metabolic turnover of the [25], low solubility and poor absorption of the compound. Furthermore, an increase in lipophilicity increases the probability of binding to hydrophobic protein targets that are not the desired ones, leading to an increase in the potential for toxicity. Log P and other physicochemical properties are of potential importance in the prediction and optimization of drug-likeness of molecules.
Topological polar surface area (TPSA) is a molecular descriptor widely used in the study of drug transport properties such as intestinal absorption [26] and blood-brain barrier (BBB) penetration [27]. In addition, the polar surface area (PSA), which reflects the ligand hydrophilicity, is very vital in protein-ligand interaction [28]. The major application of TPSA in medicinal chemistry has been to evaluate how molecules can cross different membrane barriers [29]. Kelder and colleagues showed drug intestinal permeation predominated by passive diffusion and paracellular route for drugs with TPSA of less than 120 Å [30]. Compounds with a TPSA >140 Å have been identified as poorly absorbed. Most of our hit molecules have TPSA < 140 Å. The percentage solubility calculated from % ABS = 109-0.345 × TPSA [31], ranges from 60.56-72.75%. Both the TPSA and the % ABS showed that most of our hit molecules had good solubility and are well absorbed, a designation of good bioavailability upon oral administration.

Molecular Docking Studies
Molecular docking accurately predicts the experimental interaction mode and ligands affinity within the active binding site of the drug target, making it very significant in the discovery of new drugs [32]. Using the SiteFinder in MOE, the following amino acid residues were found to be part of the active binding site of the 2Q5K: ASP25, GLY27, ALA28, ASP29, ASP30, VAL32, ILE47, GLY48, GLY49, ILE50, GLY52, PHE53, ILE54, THR80, PRO81, VAL82 and ILE84. The docking protocol was further validated by docking the retrieved co-crystallized ligand (lopinavir) into the binding cavity of the 2Q5K-liponivir complex ( Figure 3) and measuring the root mean square deviation (RMSD). The RMSD gave 0.9 Å. Usually, RMSD ≤ 2 Å for a pose is considered a docking success. Our docking protocol was, therefore, fully validated. Figure 3 shows the docked ligand (cyan) almost perfectly overlapped the co-crystallized ligand (yellow). Figure S1 of the supplementary material also gives more insight into the validation, where both the docked and the co-crystallized lopinavir shared the same amino acid residues of the catalytic triads (ASP 25, GLY 27 and ASP 29) as shown in Figure 4. Table 2 shows the binding free energy, ∆G (kcal/mol) from the molecular docking calculations using MOE. The ∆G of the hit molecules is lower than the reference drug used. The low ∆G (high negative values) is an indication of the high binding affinity of these molecules with HIV protease. All the hit molecules have higher binding affinity than their respective reference drugs. These docking scores suggest that the hit molecules might have better HIV-1 protease inhibitory activities. HPS/005 and HPS/006 showed the most favorable docking scores of −8.98 and −8.63 kcal/mol, respectively. Due to toxicity concerns of HPS/005 in Tables 3 and 4, it was not further studied. Hit molecules HPS/002, HPS/004, HPS/006, HPS/007, HPS/008 and HPS/009 were selected for further studies in order to gain more insight into the nature of the chemical interactions and conformations of docked compounds. The criteria for their selection include high binding affinity when compared to the reference, significant anti-HIV activity (low IC 50 and high % inhibition), good ADME properties and low toxicity potential. The docking poses were visualized, and the detailed interactions were recorded as shown in Figure 5 and Table 3, respectively.      Caco2 perm.-Caco2 permeability (log Papp in 10 −6 cm/s); IA(H)-Intestinal absorption (human); Skin perm-Skin Permeability (log Kp); P-GPS-P-glycoprotein substrate; P-GP1I-P-glycoprotein I inhibitor; P-GP2I-P-glycoprotein II inhibitor; VDss-Volume of distribution at steady state (human) (Log L/kg); FU-Fraction unbound (human) (Fu); BBB perm.-BBB permeability (log BB); CNS perm.-CNS permeability (log PS). Molecular docking simulation clearly demonstrated significant chemical interactions of the atoms of the ligands and the amino acid residues of the HIV protease ( Table 3). The amino acid residues of the HIV-1 protease that interacted with the atoms of the optimized Molecular docking simulation clearly demonstrated significant chemical interactions of the atoms of the ligands and the amino acid residues of the HIV protease (Table 3). The amino acid residues of the HIV-1 protease that interacted with the atoms of the optimized hit compounds (Table 3)  It was observed that HPS/007 has the highest binding affinity (−8.57 kcal/mol). In addition to the chemical interactions with the residues in the catalytic active site, the two thiazole rings contributed significantly to this high binding affinity of HPS/007 with the HIV protease. Again, the 6-membered aromatic rings, which are common in all the hit molecules played a key role in the chemical interactions, which resulted in an improved binding affinity of these molecules to the HIV protease. Figure 5 illustrates the CPK representation of the binding poses of (a) HPS/002 (b) HPS/004 (c) HPS/006 (d) HPS/007 (e) HPS/008 and (f) HPS/027 in the binding cavity of HIV-1 protease.

Anti-HIV Prediction Analysis
The results of HIV protease inhibitory activity-IC 50 (µM) and % inhibition of the selected hits from different reference drugs using HIVprotI are shown in Table 2. The fourteen selected hit molecules showed significant activity against HIV-1 protease with IC 50 range of −2.52-48.90 µM and 52.71-90.13% inhibition. Generally, each selected hit molecule has a better IC 50 and/or % inhibition when compared to its reference drug. HPS/002 has a comparable % inhibition of 90.13% to its reference drug-lopinavir (90.12%). However, it showed lower and better IC 50 of 48.90 µM than lopinavir (203.69 µM). A low IC 50 value implies that the drug candidate is potent at a low concentration and ultimately may show lower systemic toxicity when administered to the patient.

ADMET Analysis
The pharmacokinetic and pharmacodynamic profiles are usually used to assess the safety and efficacy of a drug during the drug development process. Despite how promising a drug candidate may be, the ADMET drug properties determine the extent to which it can be useful as a drug. We have extensively evaluated the ADMET properties that can limit the use of drug candidates. Various ADMET properties of the optimized hit molecules were evaluated. The information can be used to predict various pharmacokinetic phenomena of these compounds, which ultimately guide the further development of new drug compounds [33]. A compound that does not meet certain ADME standards may result in poor bioavailability. In like manner, compounds which are very toxic cannot be useful as drug candidates as the risks will outweigh the potential benefits of such compounds.

Absorption and Distribution Evaluations
Water solubility is an essential aspect of a drug's pharmacological reaction following oral delivery. Drugs with strong water solubility will have good absorption and bioavailability qualities. Drug absorption and bioavailability can boost plasma drug concentrations at the target location, allowing it to fulfill therapeutic actions [34]. The compounds showed good water solubility ( Table 4). The absorption parameters shown in Table 4 indicate that the compounds could be absorbed from the intestinal tract upon oral administration. All the hit molecules have high GI absorption (poorly absorbed <30%), denoting an increase in permeability. High Caco-2 cell permeability is associated with a permeability value > 0.9 [35] The hit molecules and all the FDA-reference drugs except Lopinavir did not comply with threshold values. Table 4 also shows the distribution parameters analysis. For a given compound, a logBB > 0.3 is considered to readily cross the blood-brain barrier while molecules with logBB < −1 are poorly distributed to the brain. Similarly, compounds with a logPS > −2 are considered to penetrate the central nervous system (CNS) while those with logPS < −3 are considered unable to penetrate the CNS. From Table 4, all the hit molecules, including the reference drugs are not readily able to cross the blood-brain barrier (BBB) or penetrate the central nervous system (CNS). The only exception to this is lopinavir, which can penetrate the CNS but was not readily able to cross the BBB. The volume of distribution, VDss is the theoretical volume that the total dose of a drug would need to be uniformly distributed to give the same concentration as in blood plasma. The higher VD is the more of a drug is distributed in the tissue rather than plasma. VDss is considered low if below 0.71 L/kg (log VDss < −0.15) and high if above 2.81 L/kg (log VDSS > 0.45) [35]. While some of the hit molecules have a high volume of distribution at a steady state (VDss), others have a low to moderate VDss. Table 5 shows the metabolism and excretion parameters analysis. CYP1A2, CYP2C19, CYP2C9, CYP2D6, CYP2E1 and CYP3A4 play key roles in drug metabolism. The results suggest that except for lopinavir, HPS/015, HPS/019, HPS/024 and HPS/028, all the other hit molecules and reference drugs inhibit the CYP3A4 sub-enzymes of cytochrome P450 (CYP). CYP3A4 is responsible for metabolizing ∼50% of all drugs by itself [36]. All the hit molecules and the reference drugs do not inhibit CYP2D6 except HPS/017, HPS/028 and nelfinavir. Most of the compounds and drugs evaluated are not CYP2C9 inhibitors while others inhibit it. It has been demonstrated that CYP2C9 can metabolize some marketed drugs [36].

Metabolism and Excretion
All except HPS/002 are organic cation transporter 2 (OCT2) non-substrates. OCT2 is a renal uptake transporter that plays a critical role in the disposal and renal clearance of drugs and endogenous compounds [36]. OCT2 substrates can adversely interact with co-administered OCT2 inhibitors. This evaluation has provided insightful information on drug clearance and potential contraindications of OCT2 transporter drug candidates.

Toxicity Analysis
The LD 50 (mg/kg body weight) is the median lethal dose at which 50% of test subjects die upon exposure to a compound. The Globally Harmonized System of classification of labeling of chemicals (GHS) has categorized LD 50 -mg/kg into the following classes: Class I: fatal if swallowed (LD 50 ≤ 5); Class II: fatal if swallowed (5 < LD 50 ≤ 50); Class III: toxic if swallowed (50 < LD 50 ≤ 300); Class IV: harmful if swallowed (300 < LD 50 ≤ 2000); Class V: may be harmful if swallowed (2000 < LD 50 ≤ 5000) and Class VI: non-toxic (LD 50 > 5000). None of the hits is fatal when swallowed as shown in Table 6. Some are toxic when swallowed while others may be harmful when swallowed. This is an affirmation that drugs are poisons and not food that is non-toxic when swallowed (Class VI). This harmful nature may have to do with some side effects that may be associated with drugs when taken. In Table 6, almost all the hits are found to be inactive, with a high probability, to hepatotoxicity, carcinogenicity, immunotoxicity, cytotoxicity, aromatase, estrogen receptorα, androgen receptor and peroxisome proliferator-activated receptor gamma. This finding gives more credence to our hit compounds as potential HIV protease inhibitors. The androgen receptor (AR) is the main biomolecular target involved in the development and progression of hormone-dependent prostate cancer [37]. In silico evaluation of smallmolecule binding at the AR is very important for the screening of potential androgendisrupting chemicals [38] causing endocrine disorders [39,40]. Additionally, disruption of the androgen system is associated with decreased sperm count, increased infertility [41] and diabetes mellitus [42]. We have predicted from this study that ritonavir, atazanavir and darunavir have hepatotoxicity potentials (Table 6). The AMES (the bacterial reverse mutation test) toxicity profile (Table 7) indicates that the hit molecules do not have potential mutagenic tendencies. It is only PC_HPS/017 that tests positive and therefore may act as a carcinogen since HIV-1 is often linked to mutation. The maximum recommended tolerated dose (MRTD) provides an estimate of the toxic dose threshold of chemicals in humans. For a given compound, an MRTD of ≤20.477 log(mg/kg/day) is considered low, and higher >0.477 log log(mg/kg/day). All the hit molecules showed values less than the maximum human-tolerated dose (0.477 log mg/kg/day), indicating no possible dose-related toxicity. None of the hit molecules, including the reference drugs, are considered a likely inhibitor of hERGI. HPS/002, HPS/004, HPS/007, HPS/023, HPS/027 and some other hit molecules are considered possible hERGII inhibitors while HPS/015 and some other hits will potentially not inhibit hERGII. Human ether-a-go-go-related gene (HERG) potassium channels are expressed in multiple tissues including the heart and adenocarcinomas [43]. The pharmacological reduction of HERG currents may cause acquired long QT syndrome and life-threatening "torsade de pointes" arrhythmias [44]. HERG expression in tumor cells accelerates cell proliferation [45], and inhibition of HERG currents has been shown to reduce cell proliferation [46]. The hepatotoxicity result shows that the hit molecules and all the FDA-approved drugs used as reference could be related to at least one physiological or pathological event, which could be associated with disruption of normal liver function. T. pyriformis is a protozoa bacterium with its toxicity measured in log µg/L often used as a toxic endpoint. If the value is <−0.5 log µg/L for any compound, it is considered to be toxic [47]. None of the hit molecules and the reference drugs showed toxicity against T. pyriformis. Similarly, Minnow toxicity was assessed. It is an indication of the lethal concentration (LC 50 ) that represents the concentration of a molecule required to cause the death of 50% of the flathead minnows. A compound with log LC 50 < −0.3 is regarded as having high acute toxicity [36]. While some hit molecules may be associated with minnow toxicity, others are not. In the light of the foregoing discussions, we observe that the majority of the hit molecules' predicted toxicities (Tables 6 and 7) maintain a relatively lower acute toxicity risk compared to reference drugs.

Physicochemical Evaluation for Drug-Likeness
The Molecular Descriptors Algorithm in MOE was used to evaluate the physicochemical properties of the compounds. These properties include molecular weight (MW), log octanol/water partition coefficient (log P), number of H-bond donor atoms (a_don), number of H-bond acceptors (a_acc), topological polar surface area (TPSA), number of rotatable bonds (b_rotN). We applied Lipinski's rule of five to evaluate the drug-likeness. The result is shown in Table 1.

Pharmacophore Generation and Similarity Search
Pharmacophore Query Editor in MOE was used to annotate the 2Q5K-lopinavir complex loaded in MOE as well as rendering the resulting annotations graphically in the MOE window ( Figure 1). Annotation is the process of identifying regions of pharmacophoric importance in space around a molecular conformation and associating with those regions of pharmacophore types. The unified annotation scheme was used. In addition to the search mechanism in the PubChem database, this query was also used to search against the database to identify the similar pharmacophore model among them and separate the active compounds from the rest of the database.
3.5. Molecular Docking (MOE, 2020) [49] The crystal structure of the wild type HIV-1 protease co-crystallized with lopinavir at a resolution of 1.95 Å (Figure 6) was obtained from the Protein Data Bank (PDB ID: 2Q5K) [50], and loaded into Discovery Studio. The preliminary preparation of the protein was performed in Discovery Studio where the water molecules and heteroatoms were eliminated. The 2Q5K was then loaded into the MOE. The Quickprep algorithm in MOE comprising structure preparation and protonate 3D was used to complete protein preparation. The polar hydrogens were added. A temperature of 300 K, a salt concentration of 0.1 and pH 7 was specified in the implicit solvated environment to carry out the protonation process. Then, the structure was energy-minimized in the MMFF94x force field to an RMS gradient of 0.1 kcalmol −1 /Å 2 . The Finder Site in MOE was used to calculate the possible active sites in a receptor from the 3D atomic coordinates of the receptor-2Q5K, which will help to determine potential sites for ligand binding docking calculations. The Site Finder methodology is based upon Alpha Shapes. The generated sites are ranked according to their Propensity for Ligand Binding (PLB) score, which is based on the amino acid composition of the pocket [51].
The dataset from the outputs of the pharmacophore-based similarity search was prepared and loaded into the MOE where they were further prepared and energy-minimized. All the hit molecules were docked as a database into the predicted binding domain, and 30 conformations were generated for each molecule in the 2Q5K binding site. The triangular matcher placement and rigid receptor refinement were employed. Among them, the conformation with the lowest docking score was chosen to study the binding orientations of the ligands, and each complex was assessed and ranked by the London ΔG energy scoring function.

Prediction of HIV Activity
In this study, we used HIVprotI to predict the anti-HIV activity of the selected hits. The half maximal inhibitory concentration. IC50 (µM) and the percentage inhibition (%values) of these compounds were evaluated in silico. HIVprotI is a web-based algorithm for the prediction and design of protein-specific anti-HIV compounds (http://bioinfo.imtech.res.in/manojk/hivproti; accessed on 30 March 2022). This web tool was developed by experimentally testing the IC50 and percentage inhibition activity of various inhibitors datasets against all three HIV proteins (reverse transcriptase RT, protease, PR, and integrase, IN). These inhibitors from the ChEMBL database were further used to develop support vector machine (SVM)-based quantitative structure-activity relationship (QSAR) models using the inhibitor features, descriptors and fingerprints [52].

Conclusions
Pharmacophore-guided 3D-similarity search, ADMET profiling, molecular docking studies, and in silico evaluation of anti-HIV activity were carried out on the PubChem database containing 111,566,735 compounds to evaluate potential new antiviral agents against HIV-1 protease. The in silico analysis revealed that fourteen (HPS/002, HPS/004, HPS/006. HPS/007, HPS/008, HPS/009, HPS/010, HPS/011, HPS/012, HPS/013, HPS/014, HPS/018, HPS/020 and HPS/024) out of the twenty-eight selected optimized hit molecules were within the acceptable range of all the parameters investigated, such as physicochem- The Finder Site in MOE was used to calculate the possible active sites in a receptor from the 3D atomic coordinates of the receptor-2Q5K, which will help to determine potential sites for ligand binding docking calculations. The Site Finder methodology is based upon Alpha Shapes. The generated sites are ranked according to their Propensity for Ligand Binding (PLB) score, which is based on the amino acid composition of the pocket [51].
The dataset from the outputs of the pharmacophore-based similarity search was prepared and loaded into the MOE where they were further prepared and energy-minimized. All the hit molecules were docked as a database into the predicted binding domain, and 30 conformations were generated for each molecule in the 2Q5K binding site. The triangular matcher placement and rigid receptor refinement were employed. Among them, the conformation with the lowest docking score was chosen to study the binding orientations of the ligands, and each complex was assessed and ranked by the London ∆G energy scoring function.

Prediction of HIV Activity
In this study, we used HIVprotI to predict the anti-HIV activity of the selected hits. The half maximal inhibitory concentration. IC 50 (µM) and the percentage inhibition (%values) of these compounds were evaluated in silico. HIVprotI is a web-based algorithm for the prediction and design of protein-specific anti-HIV compounds (http://bioinfo.imtech. res.in/manojk/hivproti; accessed on 30 March 2022). This web tool was developed by experimentally testing the IC 50 and percentage inhibition activity of various inhibitors datasets against all three HIV proteins (reverse transcriptase RT, protease, PR, and integrase, IN). These inhibitors from the ChEMBL database were further used to develop support vector machine (SVM)-based quantitative structure-activity relationship (QSAR) models using the inhibitor features, descriptors and fingerprints [52].

Conclusions
Pharmacophore-guided 3D-similarity search, ADMET profiling, molecular docking studies, and in silico evaluation of anti-HIV activity were carried out on the PubChem database containing 111,566,735 compounds to evaluate potential new antiviral agents against HIV-1 protease. The in silico analysis revealed that fourteen (HPS/002, HPS/004, HPS/006. HPS/007, HPS/008, HPS/009, HPS/010, HPS/011, HPS/012, HPS/013, HPS/014, HPS/018, HPS/020 and HPS/024) out of the twenty-eight selected optimized hit molecules were within the acceptable range of all the parameters investigated, such as physicochemical and ADMET parameters, the predicted IC 50 /percent inhibition of HIV PR protein, docking scores and free binding energies.
There are clear indications from the docking results that residues ASP25, GLY27, ASP29, ASP30, and ILE50 are involved in essential hydrogen bonding and п-п stacked interactions stabilized the optimized hit molecules from PubChem in the active binding site of the HIV-1 PR (2Q5K), thereby playing vital roles for the observed anti-HIV activity. The MD simulation showed that the molecules are stable in the active binding site of the HIV-1 protease. Out of the fourteen hit candidates, HPS/002 and HPS/004 have been found to be most promising in terms of IC 50 /percent inhibition of HIV-1 PR and MD stability, in addition to their drug metabolism and safety profiles. We, therefore, propose that these fourteen hit molecules with non-toxic and good bioavailability predicted qualities, with emphasis on HPS/002 and HPS/004, should be investigated further as possible PR inhibitors through wet lab experiments (in vitro/in vivo and pre-clinical) and clinical trial investigations. This is on the premise that computational studies alone are not sufficient by themselves in the drug discovery process.