The Discovery of Potential SARS-CoV-2 Natural Inhibitors among 4924 African Metabolites Targeting the Papain-like Protease: A Multi-Phase In Silico Approach

Four compounds, hippacine, 4,2′-dihydroxy-4′-methoxychalcone, 2′,5′-dihydroxy-4-methoxychalcone, and wighteone, were selected from 4924 African natural metabolites as potential inhibitors against SARS-CoV-2 papain-like protease (PLpro, PDB ID: 3E9S). A multi-phased in silico approach was employed to select the most similar metabolites to the co-crystallized ligand (TTT) of the PLpro through molecular fingerprints and structural similarity studies. Followingly, to examine the binding of the selected metabolites with the PLpro (molecular docking. Further, to confirm this binding through molecular dynamics simulations. Finally, in silico ADMET and toxicity studies were carried out to prefer the most convenient compounds and their drug-likeness. The obtained results could be a weapon in the battle against COVID-19 via more in vitro and in vivo studies.


Introduction
On 11 November 2022, the WHO stated that the confirmed global infections of COVID-19 were 630,832,131, with 6,584,104 people dead [1]. Conforming to these gigantic numbers, massive work is demanded from scientists worldwide to find a cure.
The evolution of computational chemistry methods as a successful tool to conclude the physical and chemical properties of a molecule and the molecular reactions allowed deep identification of the molecular properties of compounds in addition to their interactions with different proteins [2]. Consequently, the in silico prediction of the activity of large libraries of compounds against a specific molecular target became available [3]. The computational (in silico) chemistry methods have been employed in drug discovery [4][5][6], molecular modeling [7], and design [8,9]. Additionally, it has been used to predict ADMET [10][11][12], toxicity [13][14][15] as well as DFT [16] properties.
The interest of humans in the use of natural products can be traced back hundreds of years and continues to the present [17,18].
The papain-like protease, PLpro, is a vital enzyme in the coronavirus. PLpro has an important role in the processing mechanism of viral polyproteins. This process leads to the ligand to bind with the targeted protein was our main motive in this work. We utilized some ligand-based computational techniques such as structure fingerprints and similarity to select the natural compounds (through the examined library) that have high degrees of similarities and hence could bind with PLpro effectively. The fingerprint study is a molecular descriptor technique widely used to figure out the similarity or dissimilarity between the chemical structures of two molecules or more [37,38]. In fingerprint study, the software converts the basic chemical molecular descriptors into mathematical symbols. The resulting data is displayed as bit strings that identify the presence (1) or absence (0) of a specific 2D atomic or fragment descriptor in both test and reference compounds [39,40]. In this study, Discovery Studio software examined the molecular fingerprints of 4924 compounds against TTT. This study aims to extract the most similar natural compounds to the ligand. The employed descriptors are H-bond acceptor [41] and donor [42], charge [43], hybridization [44], positive [45] and negative ionizable atoms [46], halogens [47], aromatic [48], or none of the above besides the ALogP [49] category of fragments and atoms. The study was adjusted to choose the most structurally similar 200 compounds to TTT (Table 1).

Molecular Similarity
The molecular similarity study differs from the fingerprints study in that the first computes certain descriptors regarding the whole chemical structure of a molecule. The computed descriptors are topological, steric, electronic, and/or physical properties [50]. On the other hand, the fingerprints study compares the absence or presence of certain 2D atom paths, fragments, or substructures in the chemical structures of reference and test molecules [51].

Docking Studies
The forty most structurally similar compounds were subjected to a molecular docking study in the hope of getting an insight into the way they interact with their biomolecular target. The papain-like protease (PLpro) crystal structure PDB ID: 3E9S in complex with the co-crystallized ligand, TTT, was adopted for the present study. A docking study was performed using MOE 14.0 software, Montreal, Canada. The calculated ∆G of the tested compounds is cited in Table 3. The docking protocol was first validated via the redocking of the co-crystallized ligand (TTT) against the active pocket of SARS papain-like protease (PLpro) active pocket. However, the validation step proved the suitability of the performed protocol for the intended docking study, as demonstrated by the small RMSD (0.51 Å) between the docked pose and the co-crystallized ligand ( Figure 3).
TTT, the well-known PLpro inhibitor, was used as a reference in the current study. The binding affinity value of TTT was −9.30 kcal/mol. TTT interacted with the active pocket through the formation of two H-bonds. The amidic NH group of TTT formed an H-bond with the carboxylic acid side chain of Asp165, while the amidic carbonyl bound to the nitrogen backbone of Gln270. Additionally, the naphthyl moiety was involved in a hydrophobic interaction with the Pro249 side chain ( Figure 4).  TTT, the well-known PLpro inhibitor, was used as a reference in the current study. The binding affinity value of TTT was −9.30 kcal/mol. TTT interacted with the active pocket through the formation of two H-bonds. The amidic NH group of TTT formed an H-bond with the carboxylic acid side chain of Asp165, while the amidic carbonyl bound to the nitrogen backbone of Gln270. Additionally, the naphthyl moiety was involved in a hydrophobic interaction with the Pro249 side chain (Figure 4).   TTT, the well-known PLpro inhibitor, was used as a reference in the current study. The binding affinity value of TTT was −9.30 kcal/mol. TTT interacted with the active pocket through the formation of two H-bonds. The amidic NH group of TTT formed an H-bond with the carboxylic acid side chain of Asp165, while the amidic carbonyl bound to the nitrogen backbone of Gln270. Additionally, the naphthyl moiety was involved in a hydrophobic interaction with the Pro249 side chain ( Figure 4).  Results of the docking study showed that most of the tested compounds have a similar position and orientation inside the SARS PLpro active site. Among them, members 2195, 1952, 2982, and 1330 revealed the greatest binding free energies of docking, which were almost close to the redocked ligand.
The docking simulation of compound 2195 revealed that it has the highest fitting into the enzyme active site with a docking score of −24.88 kcal/mol. It was stabilized in the active site through the formation of six H-bond interactions and many hydrophobic interactions. The chromenone moiety, via its carbonyl and hydroxyl groups, formed five H-bonds with Tyr269, Gln270, Leu263, and Lys253. On the other side, the p-hydroxyphenyl moiety was involved in an H-bond with Arg167 Figure 5. The chromenone moiety and the phenyl ring also formed hydrophobic interactions with Tyr269 and Lys158, respectively.
Compound 1952 exhibited a binding mode similar to that of the co-crystallized ligand with the formation of two H-bonds. One H-bond was formed between a hydroxyl group side chain with Asp165. The other H-bond was formed between the oxygen bridge and Gln270 ( Figure 6). Additionally, a hydrophobic interaction was formed between one phenyl moiety and Tyr265 of the active site.
A study of the top docking poses of member 2982 (Figure 7) showed that it interacted with the PLpro active site through the formation of three H-bond interactions. The hydroxyphenyl moiety was involved in an H-bonding with Leu163, while the aliphatic hydroxyl group formed another H-bond with Asp165. In addition, the carbonyl group formed an H-bond with Gln270. The docking simulation of compound 2195 revealed that it has the highest fitting into the enzyme active site with a docking score of −24.88 kcal/mol. It was stabilized in the active site through the formation of six H-bond interactions and many hydrophobic interactions. The chromenone moiety, via its carbonyl and hydroxyl groups, formed five Hbonds with Tyr269, Gln270, Leu263, and Lys253. On the other side, the p-hydroxyphenyl moiety was involved in an H-bond with Arg167 Figure 5. The chromenone moiety and the phenyl ring also formed hydrophobic interactions with Tyr269 and Lys158, respectively. Compound 1952 exhibited a binding mode similar to that of the co-crystallized ligand with the formation of two H-bonds. One H-bond was formed between a hydroxyl group side chain with Asp165. The other H-bond was formed between the oxygen bridge and Gln270 ( Figure 6). Additionally, a hydrophobic interaction was formed between one phenyl moiety and Tyr265 of the active site. the enzyme active site with a docking score of −24.88 kcal/mol. It was stabilized in the active site through the formation of six H-bond interactions and many hydrophobic interactions. The chromenone moiety, via its carbonyl and hydroxyl groups, formed five Hbonds with Tyr269, Gln270, Leu263, and Lys253. On the other side, the p-hydroxyphenyl moiety was involved in an H-bond with Arg167 Figure 5. The chromenone moiety and the phenyl ring also formed hydrophobic interactions with Tyr269 and Lys158, respectively. Compound 1952 exhibited a binding mode similar to that of the co-crystallized ligand with the formation of two H-bonds. One H-bond was formed between a hydroxyl group side chain with Asp165. The other H-bond was formed between the oxygen bridge and Gln270 ( Figure 6). Additionally, a hydrophobic interaction was formed between one phenyl moiety and Tyr265 of the active site.

ADMET Studies
The ability of a molecule to be a drug is decided not only by activity but also by acceptable pharmacokinetic properties. ADMET profile describes absorption, distribution, metabolism, excretion, and toxicity. Although the determination of the ADMET profile is available via several medium-and high-throughput in vitro methods, the ability to predict it depending on in silico is available with the advantage of saving time, money, effort, and animal lives [60]. ADMET prediction is an essential step in drug discovery [61].
The computed ADMET descriptors for 17 compounds that displayed correct binding mode and energy, as well as remdesivir as a reference drug, are listed in (Table 4 and Figure 9) . Compounds 181, 182, 204, 212, 213, 215, 1952, 2981, 3040, and 3396 were expected to have a high ability to pass BBB and, accordingly, were eliminated. Fortunately, the absorption levels of all compounds were computed as good. Similarly, all of them showed low to good aqueous solubility levels. All compounds were expected to bind to plasma protein with a ratio of more than 90%. Finally, according to these results, compounds Hippacine (164), Naamine D (2197), (±)-Enterofuran (3412), Daphnelone (2982), 4,2 -dihydroxy-4 -methoxychalcone (1330), 2 ,5 -dihydroxy-4-methoxychalcone (1331), and wighteone (2195) were favored and subjected to the next toxicity examination.   f f a BBB, Ability to pass the blood-brain barrier, 1 is high, 2 is medium, 3 is low, 4 is very low. b HIA, human intestinal absorption level, 0 is good, 1 is moderate, 2 is poor, and 3 is very poor. c Aq, Aqueous solubility level, 0 is extremely low, 1 is very low, 2 is low, 3 is good, and 4 is optimal. d CYP2D6, inhibition of CYP2D6 enzyme, t is an inhibitor, f is a non-inhibitor. e PPB, f means less than 90%, t means more than 90%.
Metabolites 2022, 12, x FOR PEER REVIEW 14 of 24 Figure 9. ADMET profile of the African metabolites and the reference.

Toxicity Studies
The prediction of toxicity of a molecule depending on computer software (in silico) has been employed effectively to select drug leads in the field of drug design, as in vitro and in vivo methods are usually limited by lack of time, budget, and ethical restrictions [62,63].

Toxicity Studies
The prediction of toxicity of a molecule depending on computer software (in silico) has been employed effectively to select drug leads in the field of drug design, as in vitro and in vivo methods are usually limited by lack of time, budget, and ethical restrictions [62,63].
The toxicity of 7 compounds that displayed good ADMET profiles was predicted in silico using Discovery Studio software concerning 7 different models. The employed models are FDA rat carcinogenicity [64,65], carcinogenic potency median toxic dose, (TD 50 ) [66], rat maximum tolerated dose (MTD) [67,68], rat oral LD 50 [69], rat chronic lowest-observedadverse-effect level (LOAEL) [70,71], and ocular and skin irritancy [72]. As shown in Table 5, compounds 2982 and 3412 were proposed as carcinogenic. In consequence, both were refused. Also, all compounds, excluding 2197, are expected to have TD 50 and TD 50 values more than the reference. Thus, 2197 was excluded too. All compounds were computed to have LOAEL values more than the reference and to be non-irritant in the skin model. On the other hand, all compounds except 1330 showed different degrees of ocular irritancy. The acquired results privilege compounds Hippacine (164), 4,2 -dihydroxy-4methoxychalcone (1330), 2 ,5 -dihydroxy-4-methoxychalcone (1331), and wighteone (2195) as the most convenient inhibitors against the target enzyme. Among the selected compounds, wighteone displayed the most favorable docking score and energy. Wighteone is an isoflavonoid that has been isolated from the bark of a South African Erythrina species showing promising antibacterial effects [73]. It was also isolated from several Maclura species before [74,75]. The antiviral activity of wighteone against HIV has been reported in vitro [76], and its in silico potentiality against HIV-1 protease enzyme with a binding affinity of −8.7 Kcal/mol [77].

Molecular Dynamics (MD) Simulations
Although molecular docking can predict the correct binding poses of a molecule inside the active site of a certain protein, it has a major drawback in that it considers the proteins rigid, and thus doesn't allow the protein to adjust its conformation during the docking process [78]. On the other hand, MD simulations can efficiently predict how every single atom in a specific protein will move over a specific time, depending on a physical model of the interatomic interactions [79]. Correspondingly, MD simulations have been successfully utilized to examine the conformation changes in protein-ligand interactions and protein dynamics and folding [80]. MD simulation is an effective and accurate in silico technique that can describe the binding mode, stability, and flexibility of a certain receptor and a specific ligand for a determined time [81].
Molecular dynamics (MD) simulations were carried out to mimic the dynamic nature of PLpro-wighteone interaction under physiological conditions and to investigate the stability of binding complex simulation for 100 ns.

RMSD and RMSF Analysis
The binding of a certain ligand in a specific protein causes notable changes in the structure [82]. Consequently, the root mean square deviation (RMSD) parameter was investigated to explore whether the structure of the PLpro-wighteone complex is stable and near the experimental structure. Figure 10 shows that the PLpro-wighteone complex exhibited a good RMSD value along with 100 ns MD; the PLpro showed an RMSD value of 2.5 Å too, while the complex exhibited an average RMSD value of 3.5 Å, below the acceptable range of 4 Å. After 60 ns, no dramatic increment in the RMSD values was noticed and the complex system reached equilibrium. Root mean square fluctuation (RMSF) was utilized to describe the flexibility differences among wighteone, PLpro, and their complex during the MD simulation for 100 ns. Increasing RMSF value denotes a higher degree of flexibility, while the low value is related to limited movement during the MD simulation. To investigate the average fluctuation of PLpro during the MD study, the RMSF of PLpro upon the binding of wighteone was plotted as a function of residue number (Figure 11). RMSF plot indicated that the residual fluctuation of PLpro was minimized upon binding of wighteone. This result indicates that PLpro residues were more rigid in the presence of wighteone because of binding to wighteone. Root mean square fluctuation (RMSF) was utilized to describe the flexibility differences among wighteone, PLpro, and their complex during the MD simulation for 100 ns. Increasing RMSF value denotes a higher degree of flexibility, while the low value is related to limited movement during the MD simulation. To investigate the average fluctuation of PLpro during the MD study, the RMSF of PLpro upon the binding of wighteone was plotted as a function of residue number (Figure 11). RMSF plot indicated that the residual fluctuation of PLpro was minimized upon binding of wighteone. This result indicates that PLpro residues were more rigid in the presence of wighteone because of binding to wighteone. The radius of gyration (Rg) is an essential parameter that gives a clear insight into the protein stability in terms of volume change. Rg is defined as the RMSD of the massweighted of a group of atoms from their common mass center [83,84]. Accordingly, the analysis of Rg of PLpro during the MD simulation will describe its overall dimensions. The average Rg values were found to suggest the tight packing of PLpro in its native state and when bound to wighteone. PLpro-wighteone complex reached a stable conformation with the radius of gyration fluctuating around 24.4 Å (Figure 12). The radius of gyration (R g ) is an essential parameter that gives a clear insight into the protein stability in terms of volume change. R g is defined as the RMSD of the massweighted of a group of atoms from their common mass center [83,84]. Accordingly, the analysis of R g of PLpro during the MD simulation will describe its overall dimensions. The average R g values were found to suggest the tight packing of PLpro in its native state and when bound to wighteone. PLpro-wighteone complex reached a stable conformation with the radius of gyration fluctuating around 24.4 Å (Figure 12). The solvent-accessible surface area (SASA) is the surface area of the protein which can be accessible to a solvent [85]. The evaluation of SASA provides information about the conformational changes that happen in a protein because of ligand binding. The average SASA values for PLpro were monitored during 100 ns MD simulations. As shown in Figure 13, there were no major changes in the values of SASA of PLpro due to wighteone binding. The solvent-accessible surface area (SASA) is the surface area of the protein which can be accessible to a solvent [85]. The evaluation of SASA provides information about the conformational changes that happen in a protein because of ligand binding. The average SASA values for PLpro were monitored during 100 ns MD simulations. As shown in Figure 13, there were no major changes in the values of SASA of PLpro due to wighteone binding. The solvent-accessible surface area (SASA) is the surface area of the protein which can be accessible to a solvent [85]. The evaluation of SASA provides information about the conformational changes that happen in a protein because of ligand binding. The average SASA values for PLpro were monitored during 100 ns MD simulations. As shown in Figure 13, there were no major changes in the values of SASA of PLpro due to wighteone binding. Figure 13. SASA of PLpro in the MD run. Figure 13. SASA of PLpro in the MD run.

Discussion
The recent advancement in software enabled computational chemistry to perfectly describe the physical and chemical properties of a compound in addition to its potential to interact with a particular protein.
Accordingly, several researchers utilized computational chemistry to identify potential inhibitors against SARS-CoV-2 using different approaches. Exploring the potentialities of FDA-approved antivirus drugs against SARS-CoV-2 was one of the first computational approaches. For instance, the computational potentialities of remdesivir, the FDA-approved anti-ebola, and respiratory syncytial viruses against SARS-CoV-2 main protease were investigated [86]. The same approach was applied to lopinavir/ritonavir [87] and ribavirin [88] targeting SARS-CoV-2 3-chymotrypsin-like protease. One of the employed approaches was the computational-based drug repurposing of non-antiviral FDA-approved drugs such as lurasidone (anti-schizophrenia) against SARS-CoV-2 3CL hydrolase and protease enzymes [89], aclarubicin [90], and selinexor [91], FDA-approved anti-cancers that exhibited computational activities against SARS-CoV-2 main protease.
Our team employed computational chemistry to develop a multiphase in silico technique to discover the most appropriate natural inhibitor via large sets of molecules against a specific enzyme of COVID-19. Within 310 natural antiviral metabolites, the most effective inhibitor against SARS-CoV-2 nsp10 [92], main protease [93,94], and papain-like protease [95] were predicted. Also, within 3009 FDA approved drugs, the most potent inhibitors against SARS-CoV-2 nsp16-nsp10 2 -o-Methyltransferase Complex [96] and SARS-CoV-2 RNA-Dependent RNA Polymerase [97] were anticipated. The SARS-CoV-2 Helicase potential natural inhibitors were expected among 5956 compounds of traditional Chinese medicine also [98]. Further, the most active semisynthetic COVID-19 papain-like protease inhibitor was discovered amidst 69 molecules [99].
Unfortunately, at the current time, we don't have access to investigate the experimental inhibitory effects of the pointed 4 metabolites (Hippacine, 4,2 -dihydroxy-4 -methoxychalcone, 2 ,5 -dihydroxy-4-methoxychalcone, and wighteone) among 4924 African natural metabolites against SARS-CoV-2. However, we presented those 4 metabolites for all scientists worldwide to conduct further in vitro and in vivo studies. The binding potentialities of those metabolites against the SARS-CoV-2 papain-like protease were confirmed through 4 stages of in silico experiments: Stage I: Selection of the most similar metabolites to the co-crystallized ligand (TTT) of SARS-CoV-2 papain-like protease (PDB ID: 3E9S) (fingerprints and molecular similarity studies). This stage selected the most similar 40 metabolites to the co-crystallized ligand; Stage II: Evaluation and filtration according to the binding against papain-like protease by molecular docking to select 17 metabolites that showed correct binding; Stage III: Evaluation and the filtration according to drug-likeness by ADMET and toxicity studies to point out the safest and most drug-like 4 metabolites; Stage IV: Confirmation of the binding against papain-like protease by MD simulations to confirm the binding, conformational and energetic changes that combine the binding process. SARS-CoV-2 papain-like protease (PLpro, PDB ID: 3E9S). A multi-phased in silico approach was employed to select the most similar metabolites to the co-crystallized ligand (TTT).

Conclusions
Four metabolites, Hippacine (164), 4,2 -dihydroxy-4 -methoxychalcone (1330), 2 ,5dihydroxy-4-methoxychalcone (1331), and Wighteone (2195), were selected through 4924 African natural products as the most potent inhibitor against Sars-Cov-2 papainlike protease. The selection is based on multiphase (six experiments) in silico studies. The structural fingerprint study against the co-crystallized ligand (TTT) of SARS-CoV-2 papainlike protease (PDB ID: 3E9S), chemical structural similarity study, molecular docking studies against SARS-CoV-2 papain-like protease (PDB ID: 3E9S), ADMET, and toxicity profiles. Wighteone (2195), the metabolite with the best docking score, was subjected to the molecular dynamics simulation (MD) at 100 ns confirming the binding of wighteone against the target enzyme. We present these interesting results for all scientists worldwide to conduct further in vitro and in vivo studies concerning these promising natural metabolites.