In Silico Screening of the DrugBank Database to Search for Possible Drugs against SARS-CoV-2

Coronavirus desease 2019 (COVID-19) is responsible for more than 1.80 M deaths worldwide. A Quantitative Structure-Activity Relationships (QSAR) model is developed based on experimental pIC50 values reported for a structurally diverse dataset. A robust model with only five descriptors is found, with values of R2 = 0.897, Q2LOO = 0.854, and Q2ext = 0.876 and complying with all the parameters established in the validation Tropsha’s test. The analysis of the applicability domain (AD) reveals coverage of about 90% for the external test set. Docking and molecular dynamic analysis are performed on the three most relevant biological targets for SARS-CoV-2: main protease, papain-like protease, and RNA-dependent RNA polymerase. A screening of the DrugBank database is executed, predicting the pIC50 value of 6664 drugs, which are IN the AD of the model (coverage = 79%). Fifty-seven possible potent anti-COVID-19 candidates with pIC50 values > 6.6 are identified, and based on a pharmacophore modelling analysis, four compounds of this set can be suggested as potent candidates to be potential inhibitors of SARS-CoV-2. Finally, the biological activity of the compounds was related to the frontier molecular orbitals shapes.


Introduction
At the end of December 2019, China reported to the WHO several cases of human respiratory disease in Wuhan, Hubei Province. A novel, highly pathogenic coronavirus strain (CoV) was linked to this fatal pneumonia called COVID-19 [1][2][3]. By January 30, The Public Health Emergency of International Concern (PHEIC) was announced for this new CoV outbreak [2]. The disease expanded dramatically worldwide, being recognized by the WHO as a pandemic in mid-March [1,2,4]. This virus, related to the severe acute respiratory syndrome (SARS) that appeared in 2002 and the Middle East respiratory syndrome-related coronavirus (MERS) of 2012, was named SARS-CoV-2 [1,5].
Biochemically, CoVs are enveloped, positive-sense, single-stranded RNA viruses classified as α, β, γ, and δ [1][2][3]6]. CoV infects humans as well as other animals such as bats, camels, pigs, and pangolins [7]. It was discovered in the 1960s and was thought to cause only a mild disease [5], but SARS-CoV-2 is the third emergence of a coronavirus in less than 20 years with important mortality rates. In this sense, SARS, MERS, and SARS-CoV-2 have death rates of 9.6%, 35.5% and 6.76%, respectively [8,9]. Early COVID-19 symptoms include fatigue, dry cough, muscle soreness, shortness of breath, and fever [10]. In severe cases of infections with SARS-CoV-2, patients require endotracheal intubation and mechanical ventilation, as they develop acute respiratory distress syndrome, pulmonary oedema, and severe pneumonia, causing death from sepsis shock, respiratory, or multiple organ failures [8,11]. For the QSAR analysis, pIC50 = −log(IC50,M) experimental values were employed as the response variable, and three different modelling processes were performed ( Figure  2a). In the first one, 2D molecular descriptors were obtained by employing QuBiLS-MAS indexes; called "2D modelling" from now on. The second simulation was performed us- For the QSAR analysis, pIC 50 = −log(IC 50 ,M) experimental values were employed as the response variable, and three different modelling processes were performed (Figure 2a). In the first one, 2D molecular descriptors were obtained by employing QuBiLS-MAS indexes; called "2D modelling" from now on. The second simulation was performed using the 3D optimized structures to produce 3D molecular descriptors employing the QuBiLS-MIDAS indexes; this is further cited as "3D modelling". The third modelling process was a combination of 2D and 3D descriptors-"2D-3D modelling".

Dataset Splitting (Training/Test)
After selecting the attributes, a rational separation of the dataset into training (35 compounds) and test set (9 compounds) was performed, employing a K-means cluster analysis, taking into account the separation of the dataset in nine clusters, and selecting ~20% of each cluster as a test set ( Figure 3).

Dataset Splitting (Training/Test)
After selecting the attributes, a rational separation of the dataset into training (35 compounds) and test set (9 compounds) was performed, employing a K-means cluster analysis, taking into account the separation of the dataset in nine clusters, and selecting~20% of each cluster as a test set ( Figure 3).
The Euclidean distances were employed as the similarity metric in different instances. The applicability domain (AD) of the training set was evaluated by employing the four different methods implemented in AMBIT discovery [35], i.e., principal component analysis (PCA) range, Euclidean distance, citi-block distance, and probability density. The consensus approach was applied as implemented in Ambit discovery, i.e., if a molecule is labelled as out by two or more methods, then the molecule is considered as OUT. Only one molecule of the test set (ebastine) was found to be OUT the AD for the four employed methods, resulting in 90% coverage.

Dataset Splitting (Training/Test)
After selecting the attributes, a rational separation of the dataset into training (35 compounds) and test set (9 compounds) was performed, employing a K-means cluster analysis, taking into account the separation of the dataset in nine clusters, and selecting ~20% of each cluster as a test set ( Figure 3).

QSAR Modelling
In the first stage of model construction, it is mandatory to evaluate the co-linearity between the variables, which is normally achieved by obtaining the correlation matrix for the models. In this study, all models containing some attributes between 4 and 7 were selected, reaching 13 models. From these models, the most robust for the 2D, 3D, and 2D-3D modellings were chosen (M1, M5 and M13, respectively). Pearson's correlation coefficients between an attribute and the rest are shown graphically in Figure S1 (SM3). As it can be observed, the bar graph for each descriptor shows the minimum (on the left) and the maximum (on the right) value for the Pearson's coefficients correlation of the descriptor with the others. Therefore, the most negative value is −0.45, and the most positive value is 0.60, showing no co-linearity between the attributes in the three models.
Regarding the molecular attributes, the names are related to the mathematical approaches employed for the calculation of these topographic indexes, which are weighed by different physicochemical properties, denoted by the lowercase letters at the end of each descriptor. The physicochemical properties involved in model M1, M5, and M13 are electronegativity (e), softness (s), hardness (h), Van der Waals volume (v), charges (c), polarizability (p), polar surface area (PSA), refractivity index (r), and AlogP (a). Hardness (h) and softness (s) give information about the donor-acceptor properties of a molecule [36]. The PSA is related to the hydrophilic interaction and the formation of hydrogen bonds [37]. The electronegativity (e) is related to the existence of heteroatoms, involving electrostatic protein-ligand interactions [38][39][40]. The polarizability and the refractive index are associated with short-range interaction between the drug and the active site of the protein [41]. The AlogP is the partition coefficient octanol/water, which provides information about the solubility of the molecule in polar and non-polar systems.
Model M13, with only five descriptors, was selected as the most robust. It was constructed by employing multiple linear regression techniques, and the standardized coefficients by multiplying each standardized variable involved in the mathematical representation of each case, as shown in Equation (1). Models 1-12 are given in the Supplementary Material (Table S1, SM3). In Figure 4, a graphical representation of experimental data vs. predictions of model M13 is presented. It can be seen that there is a good fitting between calculated and experimental pIC 50 values in either the training or test sets (Equation (1)).
(1)   Table 1 shows all the statistical parameters used for M13 validation. The coefficient of determination (R 2 ) close to one, Fisher coefficient (F) values greater than 30 and small residual standard deviation (s) values suggest a good fitting. The values of the cross-validation coefficients for the leave-one-out (Q2LOO) and the external validation (Q2ext) are bigger than 0.8, indicating that these models are robust enough for the prediction of pIC50. The Y-scrambling analysis shows that there is no correlation by chance, as demonstrated by the small values obtained for a(R2) and a(Q2). The Organization for Economic Cooperation and Development (OECD) principles [42] establish Tropsha's test as an exhaustive validation procedure determining if a model is robust for the predictability of a given response variable. In this sense, Tropsha's test was applied to the three models. PASS values for all the models prove the good performance in the predictability of the external test set as well as in the leave-one-out cross-validation.

DrugBank 5.1.7 Screening
The screening procedure is depicted in Figure 2b. Model M13 was employed for the screening of four important datasets reported in DrugBank: approved, experimental,  Table 1 shows all the statistical parameters used for M13 validation. The coefficient of determination (R 2 ) close to one, Fisher coefficient (F) values greater than 30 and small residual standard deviation (s) values suggest a good fitting. The values of the cross-validation coefficients for the leave-one-out (Q2LOO) and the external validation (Q2ext) are bigger than 0.8, indicating that these models are robust enough for the prediction of pIC 50 . The Yscrambling analysis shows that there is no correlation by chance, as demonstrated by the small values obtained for a(R2) and a(Q2). The Organization for Economic Cooperation and Development (OECD) principles [42] establish Tropsha's test as an exhaustive validation procedure determining if a model is robust for the predictability of a given response variable. In this sense, Tropsha's test was applied to the three models. PASS values for all the models prove the good performance in the predictability of the external test set as well as in the leave-one-out cross-validation. The screening procedure is depicted in Figure 2b. Model M13 was employed for the screening of four important datasets reported in DrugBank: approved, experimental, nutraceutical, and withdrawn. Then, the datasets were cleaned by removing inorganic, organometallic, salts, and mixture compounds, obtaining 2265 molecules in the approved dataset, 5858 in the experimental, 102 in the nutraceutical, and 228 in the withdrawn drugs dataset ( Table 2). After that, the AD of these datasets was evaluated into the model M13 training set, finding that 79% of the drugs are IN the AD. This good coverage demonstrates the robustness of the model and the reliability of the prediction, fact related to the molecular diversity found in the training dataset. The pIC 50 values of these compounds were estimated and are presented in the Supplementary Material (SM4). The intervals of the predicted pIC 50 values for each dataset are presented in Table 2. From the total dataset evaluated in this screening, it is possible to identify other possible potent antiviral candidates, which could be used for the treatment of COVID-19. In this respect, 57 compounds were found to have predicted pIC 50 values > 6.6. The two best compounds (DB08476 and DB07287) present much higher pIC 50 values (7.5) compared to the other ones (≤7.1). These compounds, 3-amino-azacyclotridecan-2-one (DB08476) and 2-(2,4-dichlorophenoxy)-5-(pyridin-2-ylmethyl)phenol (DB07287), are experimental drugs belonging to the macrolactam and diphenylethers classes, respectively [29]. Interestingly, these types of compounds are used as antibiotic drugs [43][44][45][46].
Regarding the 57 compounds with pIC 50 values > 6.6, miconazole (DB01110) is an azole class antifungal used to treat pityriasis Versicolor, ringworm, and infections caused by Candida sp. [47,48]. Nitenpyram (DB11438) is a nicotinic acetylcholine receptor inhibitor. It is used to treat Ctenocephalides spp. in dogs and cats and is rapidly eliminated in urine. Furthermore, nitenpyram is also considered a second-generation pesticide of the neonicotinoid family [49]. Metildigoxin (DB13401) is a semi-synthetic cardiac glycoside prodrug prescribed to treat arrhythmia and heart failure [50]. After oral administration, it is completely absorbed and rapidly transformed into digoxin [51]. Chemically, it is closely related to digoxin, changing a hydroxyl group in the latter for a methoxy one on the terminal monosaccharide [52]. Furthermore, 2 ,4 -Dinitrophenyl-2deoxy-2-Fluro-B-D-Cellobioside (DB04086) is an experimental drug belonging to the class of o-glycosyl organic compounds [53].
Interestingly, some of the commonly used drugs to treat respiratory problems, bronchitis, asthma, and allergic rhinitis were also identified as possible good candidates against SARS-CoV-2. These include dirithromycin (DB00954), a macrolide glycopeptide antibiotic used to treat upper and lower respiratory infections [54][55][56], monensin (DB11430), flunisolide (DB00180), fluticasone propionate (DB00588), and tixocortol (DB09091). In addition, some antibiotics, widely used in many infections, were identified as potent SARS-CoV-2 inhibitors, such as amikacin, streptomycin, lincomycin, and spiramycin. The latter is used for the treatment of toxoplasmosis in pregnant women.
Recently, a drug database against SARS-CoV-2 called DockCov2 was published [57]. The results of the experimental pIC 50 obtained from the Jeon et al. [28] database were compared to DockCov2 scores for RdRp, Mpro, and the highest score obtained for each compound independently of the target (Table S2,  were present in the DockCov2 database. The results showed a poor correlation (R 2 < 0.11) between experimental and DockCoV2 values, failing to predict the affinity of the Jeon et al. [28] database.
The strength that makes M13 a robust pIC 50 prediction model is the use of experimental data and that a biological target does not need to be identified. The methods reported, such as the ones using a double (Autodock Vina and MM-GBSA) scoring approach [58] and virtual screening [59], rely only on in silico methodology, with the flaws these methods have including the requirement of a specific biological target.
This screening was complemented with pharmacophore modelling. For the model construction, the four most active compounds extracted from the Jeon et al. [28] database were used (digoxin, digitoxin, salinomycin, and niclosamide). The results obtained from the Pharmagist web server (https://bioinfo3d.cs.tau.ac.il/PharmaGist/ (accessed on 18 February 2021)) show that the pharmacophore is composed of three hydrogen bond acceptors (HBA) and one hydrophobic (HPH) interaction ( Figure 5). The four features and a combination of three features were scanned against the experimental database and tabulated in Table 3. between experimental and DockCoV2 values, failing to predict the affinity of the Jeon et al. [28] database. The strength that makes M13 a robust pIC50 prediction model is the use of experimental data and that a biological target does not need to be identified. The methods reported, such as the ones using a double (Autodock Vina and MM-GBSA) scoring approach [58] and virtual screening [59], rely only on in silico methodology, with the flaws these methods have including the requirement of a specific biological target.
This screening was complemented with pharmacophore modelling. For the model construction, the four most active compounds extracted from the Jeon et al. [28] database were used (digoxin, digitoxin, salinomycin, and niclosamide). The results obtained from the Pharmagist web server (https://bioinfo3d.cs.tau.ac.il/PharmaGist/ (accessed on 18 February 2021)) show that the pharmacophore is composed of three hydrogen bond acceptors (HBA) and one hydrophobic (HPH) interaction ( Figure 5). The four features and a combination of three features were scanned against the experimental database and tabulated in Table 3.  Looking at the number of hits obtained, models 4 and 5 (Table 3) were chosen as the best ones and used for DrugBank screening. Model 4 is composed of two HBAs and one HPH interaction. The distance between the two HBAs is 2.8 Å, while the distance from HBA1/HBA2 to HPH4 is 12.0 Å and 13.6 Å, respectively. For model 5, the pharmacophore is formed by three HBAs with a distance of 2.8 Å between HBA1 and HBA2, 6.0 Å between HBA1 and HBA3, and 7.6 Å between HBA2 and HBA3. Model 4 and model 5 were tested against the Jeon et al. [28] database (Table S3,    Looking at the number of hits obtained, models 4 and 5 (Table 3) were chosen as the best ones and used for DrugBank screening. Model 4 is composed of two HBAs and one HPH interaction. The distance between the two HBAs is 2.8 Å, while the distance from HBA1/HBA2 to HPH4 is 12.0 Å and 13.6 Å, respectively. For model 5, the pharmacophore is formed by three HBAs with a distance of 2.8 Å between HBA1 and HBA2, 6.0 Å between HBA1 and HBA3, and 7.6 Å between HBA2 and HBA3. Model 4 and model 5 were tested against the Jeon et al. [28] database (Table S3, SM3), where 15 compounds matched model 4 with RMSD values lower than 0.54. In model 5, 14 compounds matched the pharmacophore with RMSD values lower than 0.32. This is not surprising as QSAR model M13 is based on experimental data and is not based on a single biological target.
DrugBank screening produced 48 hits for model 4 and 45 hits for model 5. As the same compound can produce several hits by changing its pose, the lists were cleaned, leaving the pose for each compound with the lowest RMSD. Then, the pIC 50 value of each hit was predicted and filtered (pIC 50 > 6.6). Six compounds were obtained for model 4 and seven for model 5 (Table S4, SM3). Three compounds, DB01980, DB03259, and DB02438, fit both models. DB03259 was compared to the pharmacophore model 4, finding a distance between both HBAs of 3.1 Å, between HBA1 and HPH4 of 5.6 Å, and between HBA2 and HPH4 of 7.1 Å ( Figure S2a, SM3). The same was performed with DB02438 and pharmacophore model 5, finding a better fit to the model with a difference in the HBA1-HBA2 distance of 0.0 Å, HBA1-HBA3 of 0.5 Å, and HBA2-HBA3 of 0.1 Å ( Figure S2b, SM3).

Docking Study
As described in the methodology section, molecules taken from the Jeon et al. [28] database were docked against Mpro, PLpro, RdRp, and tabulated in Table S5. Furthermore, the two molecules with the biggest predicted pIC 50 value, plus the best two compounds obtained from the pharmacophore model 4 and the best two from the pharmacophore model 5, were also considered, and their results are presented in Table S6 (six compounds total). The calculation area was reduced to the active site spotted experimentally. When knowing the active site, reducing the calculation area to it is an efficient approach, as all the computational cost is used in finding the right pose, rather than looking for the active site in the entire enzyme. Ligands extracted from the experimental structure were also docked as a validation procedure, finding that, for Mpro and PLpro, the docked structure superimposed the experimental one. For RdRp, experimental and docked structures are in different places because its Cryo-EM structure was elucidated when remdesivir was already incorporated to the RNA strand and outside the active site.
Docking scores range from −4.7 to −9.2 kcal/mol in Mpro, from −4.7 to −8.5 kcal/mol in PLpro, and from −4.5 to −9.4 kcal/mol in RdRp, showing that all molecules in the dataset might be able to fit in any of the three enzymes active sites inhibiting them, and affecting the SARS-CoV-2 life cycle. Interestingly, the highest docking scores were found for digoxin, which is the molecule with the biggest pIC 50 (6.72). Digitoxin and Salinomycin, which present the second and third highest pIC 50 values, also present higher docking scores, being −7.8 and −8.0 kcal/mol for MP, −8.9 and −7.2 kcal/mol for PL, and −9.3 and −8.6 kcal/mol for RdRp, respectively. Docking affinity scores were compared with experimental pIC 50 values, finding a poor correlation with r 2 values below 0.17. It is well-known that docking scores are not good for predicting binding affinities [60,61]. Most docking scoring functions depend on the size of the ligand, as large molecules will have more functional groups and make more interactions with the active site. Still, docking scores are very useful to separate which molecules can bind to a biological target and which ones are less likely or definitely will not fit in an enzymatic pocket. Furthermore, docking is key to study the different interactions and see how changes in the molecule can affect how it fits the active site. A deep analysis of the different poses and interactions with the enzyme allows us to filter the best candidates. In this sense, seven drugs were chosen from the dataset-four from the top of the list and three from the bottom. The dataset was ordered from highest to lowest pIC 50 values, and digoxin, hexachlorophene, bazedoxifene, dronedarone, thioridazine, chloroquine, and remdesivir were selected. Moreover, the six molecules chosen from the DrugBank screening were also considered. Table 4 shows the different interactions resulting from the docking calculation of these thirteen drugs against Mpro. From the Jeon et al. [28] database, the result shows that remdesivir is the drug that interacts with more residues (18). Digoxin, bazedoxifene, and dronedarone interact with 15 residues, while thioridazine and chloroquine interact with 12. Hexachlorophene is the drug interacting with fewer residues (five). From those interactions, remdesivir forms four hydrogen bonds (HBs), digoxin and dronedarone form two HBs, and hexachlorophene and dronedarone form one HB. From the molecules taken from the screening, DB07287 and DB02213 are the ones interacting with more residues (11); DB02438 interacts with 10, DB03259 interacts with 9, while DB01980 and DB08476 interact with 8 residues. DB02438 is the one presenting the most HBs with eight, followed by DB03259 and DB02213 with three and DB01980 and DB08476 with one. DB07287 does not form any HBs. In addition, the molecular docking analysis of the known Mpro inhibitor (N3) was performed and the result shows that it interacts with 21 residues, 8 of them being HBs ( Figure 6).
The active site of Mpro is a long pocket with a narrow chamber in the middle of it. During the interaction with inhibitor N3, the backbone of its peptide-like structure fits along with the active site where the side chain of a Leu residue fits in the side chamber, making the interaction stronger ( Figure 6). Comparing with the studied drugs, digoxin fits in the entire pocket, but it is not able to enter the chamber due to the lack of side chains. Hexachlorophene's most stable pose fits in the middle of the pocket, but it does not enter the side chamber either. Bazedoxifene fits in the pocket, but due to the size of the molecule, it only interacts with one end of the pocket and it is not able to reach the chamber. Dronedarone interacts with half of the pocket but it is unable to enter the chamber, the same as chloroquine. Thioridazine fits in half of the active pocket, entering the side chamber, although it is not able to make any HBs, which may explain its low affinity. Remdesivir can occupy almost all the active site, including the chamber. From the DrugBank screening, DB01980 and DB02438 fit in the pocket but are not able to enter the side chamber, while DB03259, DB02213, DB08476, and DB07287 do manage to fit in the side chamber. Although some of these drugs can enter and have a good fit in the active site of Mpro, the lack of hydrogen bonding and the fewer interactions, compared to the N3 inhibitor, may be a drawback in achieving good binding energies that can compete with natural substrates inhibiting the enzyme. The active site of Mpro is a long pocket with a narrow chamber in the middle of it. During the interaction with inhibitor N3, the backbone of its peptide-like structure fits along with the active site where the side chain of a Leu residue fits in the side chamber, making the interaction stronger ( Figure 6). Comparing with the studied drugs, digoxin fits in the entire pocket, but it is not able to enter the chamber due to the lack of side chains. Hexachlorophene's most stable pose fits in the middle of the pocket, but it does not enter the side chamber either. Bazedoxifene fits in the pocket, but due to the size of the molecule, it only interacts with one end of the pocket and it is not able to reach the chamber. Dronedarone interacts with half of the pocket but it is unable to enter the chamber, the same as chloroquine. Thioridazine fits in half of the active pocket, entering the side chamber, although it is not able to make any HBs, which may explain its low affinity. Remdesivir can occupy almost all the active site, including the chamber. From the DrugBank screening, DB01980 and DB02438 fit in the pocket but are not able to enter the side chamber, while DB03259, DB02213, DB08476, and DB07287 do manage to fit in the side chamber. Although some of these drugs can enter and have a good fit in the active site of Mpro, the lack of hydrogen bonding and the fewer interactions, compared to the N3 inhibitor, may be a drawback in achieving good binding energies that can compete with natural substrates inhibiting the enzyme.
Next, PLpro was studied and the interactions were found for each drug with the active site presented in Table 5. For Mpro, Asn142 and Gly143 seem to be critical residues for binding as N3, Digoxin, DB02438, and DB02213 interact with them ( Figure S3 and Figure 6b). Furthermore, interactions of DB02438 and DB02213 with Asn142, Gly143, and Glu166 match pharmacophore model 5, where HBA1 interacts with Asn142, HBA2 with Glu166, and HBA3 with Gly143. In DB02438, a distance of 6.6 Å and 3.4 Å was found for HBA1-HBA3 and HBA1-HBA2, respectively. For DB02213, HBA1-HBA3 presents a distance of 5.1 Å, while HBA1-HBA2 presents a distance of 2.9 Å.
Next, PLpro was studied and the interactions were found for each drug with the active site presented in Table 5.  Plpro has a long pocket, similar to Mpro, although it is more superficial and does not have any chambers. The main characteristic of the Plpro active site is that it closes, forming a tunnel-like structure at one end, allowing only long, thin structures to pass through it. Analyzing the poses resulting from the docking calculation, none of the drugs can pass through this tunnel. In the case of digoxin, the longest structure, it goes above this tunnel (Figure 7). The other molecules interact and fit only with the bigger end of the active site of PLpro. Looking at the interactions, dronedarone interacts with more residues (14), followed by digoxin (12), thioridazine (11), remdesivir (11), bazedoxifene (10), hexachlorophene (9), chloroquine (9), DB01980 (9), DB02213 (9), DB07287 (9), DB03259 (8), DB02438 (8), and DB08476 (8). From those, DB02438 forms five HBs; digoxin and DB02213 form four HBs; hexachlorophene and remdesivir form three HBs; bazedoxifene and DB03259 form two HBs; and dronedarone, DB01980, DB08476, and DB07287 form one HB. Thioridazine and chloroquine do not form any HBs. Comparing these results with the interaction between PLpro and experimentally found VIR250, this inhibitor presents 18 interactions, from which 6 are HBs. It seems that being able to go through the tunnel-like structure of the active site is key to have a binding affinity good enough to compete with natural peptides.  Plpro has a long pocket, similar to Mpro, although it is more superficial and does not have any chambers. The main characteristic of the Plpro active site is that it closes, forming a tunnel-like structure at one end, allowing only long, thin structures to pass through it. Analyzing the poses resulting from the docking calculation, none of the drugs can pass through this tunnel. In the case of digoxin, the longest structure, it goes above this tunnel (Figure 7). The other molecules interact and fit only with the bigger end of the active site of PLpro. Looking at the interactions, dronedarone interacts with more residues (14), followed by digoxin (12), thioridazine (11), remdesivir (11), bazedoxifene (10), hexachlorophene (9), chloroquine (9), DB01980 (9), DB02213 (9), DB07287 (9), DB03259 (8), DB02438 (8), and DB08476 (8). From those, DB02438 forms five HBs; digoxin and DB02213 form four HBs; hexachlorophene and remdesivir form three HBs; bazedoxifene and DB03259 form two HBs; and dronedarone, DB01980, DB08476, and DB07287 form one HB. Thioridazine and chloroquine do not form any HBs. Comparing these results with the interaction between PLpro and experimentally found VIR250, this inhibitor presents 18 interactions, from which 6 are HBs. It seems that being able to go through the tunnel-like structure of the active site is key to have a binding affinity good enough to compete with natural peptides.  Finally, the interaction of different drugs with RdRp was evaluated, and the results are shown in Table 6. Hydrogen bonds in bold.
The main function of RdRp is to catalyze the replication of RNA, using nucleotides and an RNA template. Therefore, nucleotide analogues are considered good RdRp inhibitors. The active site of RdRp is very accessible, allowing for RNA synthesis (Figure 8a). Analyzing the thirteen drugs, all of them fit perfectly in the active site (Figure 8b). Digoxin was found to form 19 interactions with the active site, followed by dronedarone (15); bazedoxifene (14); remdesivir (13); thioridazine (13); chloroquine, DB02213, and DB07287 (11); DB08476 (9); DB03259 and DB02438 (8); DB01980 (7); and hexachlorophene (6). Seven HBs were found in digoxin interaction; four HBs in remdesivir, DB02438, and DB02213; two HBs in dronedarone and bazedoxifene; and one HB in hexachlorophene, DB03259, and DB01980. The results of digoxin agree with experimental data, as it is the one with the most interactions and HBs being able to bind tightly to the active site, competing with the natural substrate. Furthermore, digoxin matches pharmacophore model 5, where HBA1 interacts with Arg555, HB2 with Ser682, and HBA3 with Thr687 ( Figure S4, SM3). In this sense, there is an HBA1-HBA2 distance of 2.8 Å, HBA1-HBA3 of 7.9 Å, and HBA2-HBA3 of 8.0 Å. These results suggest that the inhibition of SARS-CoV-2 can precede by one of these three targets, with RdRp being the most likely action mechanism for the most active compound of the series (digoxin).

Molecular Dynamics Simulation
The results obtained from the docking analysis were further studied using molecular dynamics techniques. The quality of the simulations was assessed, observing no weird behavior on the system by graphing the box size, density, potential energy, pressure, temperature, and volume of the system against the simulation time. Furthermore, the target-ligand complex shows a stable behavior with RMSD values lower than 0.4 nm throughout the simulation in all the models.
For Mpro, digoxin, chloroquine, dronedarone, and remdesivir were compared. The RMSD for the complex and the ligand is shown in Figure S5a. For the complex, the RMSD in all cases remained stable, with a value around 0.2 nm, while for the ligand, remdesivir presents more fluctuations throughout the simulation, followed by dronedarone. Coulomb interaction energies are more negative for digoxin, while Lennard-Jones present the lowest values in dronedarone. This means that there might be a tighter interaction between digoxin and dronedarone with the active site ( Figure S5b, SM3). This can be further corroborated by analyzing the HBs. For digoxin, on average, there are five HBs between the drug and the enzyme. In some parts of the simulation, HBs increase up to seven, while the lowest HB found was one. On the other hand, chloroquine is the model forming the fewest HBs with an average of one HB, two HBs being the highest and zero being the lowest ( Figure S5c). Finally, the lifetimes of the main HBs extracted from their occupancy during the simulation were estimated and plotted in Figure S6 (SM3).
The results show that there is a relation between the occupancy and the lifetime. Lifetime values range from 0.37 ps to 0.91 ps, and occupancies between 2% and 48%. Dronedarone interaction with Thr25 produces the highest HB lifetime, also having the highest occupancy (48%).
The structure resulting from the docking calculation was compared with the final structure of the MD simulation. For digoxin, the structure after the simulation fits better in the pocket, with a methyl group pointing into the side chamber ( Figure S7a). Chloroquine, on the other side, due to the lack of HB and strong interaction, moves out of the pocket, with only the quinolone ring having some interaction with the active site ( Figure  S7b).
The results of the MD simulation of PLpro showed that digoxin and chloroquine leave the active site at the end of the simulation. This may be attributed to the fact that none of the molecules entered the tunnel-like structure, so a strong interaction could not be achieved for these molecules. The behavior of bazedoxifene was studied in more detail because it is the molecule with the highest pIC50 that after the simulation time stays in the active site of the enzyme. Figure S8 shows the RMSD, Coulomb, Lennard-Jones energy, and HBs.
The results show that Coulomb energy is just below zero, which indicates that although the molecule stayed in the active site, the interaction is not very strong. Again, this

Molecular Dynamics Simulation
The results obtained from the docking analysis were further studied using molecular dynamics techniques. The quality of the simulations was assessed, observing no weird behavior on the system by graphing the box size, density, potential energy, pressure, temperature, and volume of the system against the simulation time. Furthermore, the targetligand complex shows a stable behavior with RMSD values lower than 0.4 nm throughout the simulation in all the models.
For Mpro, digoxin, chloroquine, dronedarone, and remdesivir were compared. The RMSD for the complex and the ligand is shown in Figure S5a. For the complex, the RMSD in all cases remained stable, with a value around 0.2 nm, while for the ligand, remdesivir presents more fluctuations throughout the simulation, followed by dronedarone. Coulomb interaction energies are more negative for digoxin, while Lennard-Jones present the lowest values in dronedarone. This means that there might be a tighter interaction between digoxin and dronedarone with the active site ( Figure S5b, SM3). This can be further corroborated by analyzing the HBs. For digoxin, on average, there are five HBs between the drug and the enzyme. In some parts of the simulation, HBs increase up to seven, while the lowest HB found was one. On the other hand, chloroquine is the model forming the fewest HBs with an average of one HB, two HBs being the highest and zero being the lowest ( Figure S5c). Finally, the lifetimes of the main HBs extracted from their occupancy during the simulation were estimated and plotted in Figure S6 (SM3).
The results show that there is a relation between the occupancy and the lifetime. Lifetime values range from 0.37 ps to 0.91 ps, and occupancies between 2% and 48%. Dronedarone interaction with Thr25 produces the highest HB lifetime, also having the highest occupancy (48%).
The structure resulting from the docking calculation was compared with the final structure of the MD simulation. For digoxin, the structure after the simulation fits better in the pocket, with a methyl group pointing into the side chamber ( Figure S7a). Chloroquine, on the other side, due to the lack of HB and strong interaction, moves out of the pocket, with only the quinolone ring having some interaction with the active site ( Figure S7b).
The results of the MD simulation of PLpro showed that digoxin and chloroquine leave the active site at the end of the simulation. This may be attributed to the fact that none of the molecules entered the tunnel-like structure, so a strong interaction could not be achieved for these molecules. The behavior of bazedoxifene was studied in more detail because it is the molecule with the highest pIC 50 that after the simulation time stays in the active site of the enzyme. Figure S8 shows the RMSD, Coulomb, Lennard-Jones energy, and HBs.
The results show that Coulomb energy is just below zero, which indicates that although the molecule stayed in the active site, the interaction is not very strong. Again, this agrees with the HBs, where in almost all the simulations there is only one. Tyr273 is the residue making the strongest HB, with a lifetime of 0.60 ps and an occupancy of 8%. The increase in the RMSD of the ligand in the last 4 ns of simulation means more fluctuation in the drug, which may be caused by the loss of interactions with the active site. The results of the starting and the final point of the MD simulation were analyzed to see how the ligand changed throughout the simulation (Figure 9). The first important change is the loss of the HB between the drug and Gly163. Still, the HB with Tyr273 is kept along with the simulation, which is the main reason for the molecule not to leave the active site. The side chain of bazedoxifene is very flexible, and because it cannot form strong bonds with the active site, it lost interaction with Lau162 and Gln269 going outside the active site. agrees with the HBs, where in almost all the simulations there is only one. Tyr273 is the residue making the strongest HB, with a lifetime of 0.60 ps and an occupancy of 8%. The increase in the RMSD of the ligand in the last 4 ns of simulation means more fluctuation in the drug, which may be caused by the loss of interactions with the active site. The results of the starting and the final point of the MD simulation were analyzed to see how the ligand changed throughout the simulation (Figure 9). The first important change is the loss of the HB between the drug and Gly163. Still, the HB with Tyr273 is kept along with the simulation, which is the main reason for the molecule not to leave the active site. The side chain of bazedoxifene is very flexible, and because it cannot form strong bonds with the active site, it lost interaction with Lau162 and Gln269 going outside the active site. For the MD simulation against RdRp, again digoxin, chloroquine, dronedarone, and remedesivir were studied. Special attention is put on remdesivir as it is a known ligand for this polymerase. The results for the interaction of the three drugs with RdRp showed that all of them stayed in the active site. Analyzing remdesivir in detail, the RMSD plot shows an average value of 0.5 nm, rising from 0.2 to 0.7 nm in the first 10 ns and stabilizing at around 0.3 nm after that. This result was compared with digoxin and dronedarone, whose RMSD values are constant throughout the simulation with a RMSD around 0.5 nm ( Figure 10). This agrees with the HBs (between three and four) formed in almost the whole simulation. For the MD simulation against RdRp, again digoxin, chloroquine, dronedarone, and remedesivir were studied. Special attention is put on remdesivir as it is a known ligand for this polymerase. The results for the interaction of the three drugs with RdRp showed that all of them stayed in the active site. Analyzing remdesivir in detail, the RMSD plot shows an average value of 0.5 nm, rising from 0.2 to 0.7 nm in the first 10 ns and stabilizing at around 0.3 nm after that. This result was compared with digoxin and dronedarone, whose RMSD values are constant throughout the simulation with a RMSD around 0.5 nm ( Figure 10). This agrees with the HBs (between three and four) formed in almost the whole simulation.
For chloroquine, RMSD values almost double compared to digoxin and remdesivir, with values around 1 nm. Coulomb and Lennard-Jones energies for digoxin are lower in all the simulations compared to chloroquine and remdesivir. This suggests that digoxin forms a tighter interaction with RdRp. Dronedarone presents a low Lennard-Jones interaction energy (comparable to digoxin), although its Coulomb energy is higher (~5 kcal/mol). As the simulation proceeds, the energy in remdesivir presents a greater fluctuation, which agrees with the RMSD plot. Analyzing remdesivir more in detail, it can be seen that in the first 10 ns, the molecule moves around 0.2 nm away from the active site. As remdesivir is moving out, HBs are less likely to be formed, and that is why between 2 and 10 ns the average HBs is two. Interestingly, although the remdesivir molecule is outside the active site, it matches with the entrance of the RNA strand (Figure 11a pale cyan structure). After this time, remdesivir is stabilized in the active site (Figure 11a, cyan structure). This might suggest that remdesivir, as a nucleotide analogue, interacts with other sites of the molecule waiting like any other nucleotide to enter the active and be incorporated to the RNA during the replication process. On the other hand, digoxin tight interactions make the molecule pose similar at the beginning and the end of the MD simulation (Figure 11b). Chloroquine interaction is weaker, as determined by the one HB that is formed in most of the simulation. Therefore, it can be seen that chloroquine moves about 10 Å inside the active site during the simulation (Figure 11d). The mechanism of digoxin may be different from remdesivir. While remdesivir substitutes a nucleotide in the RNA sequence, making further RNA transcription fail, digoxin tightly binds itself to the active site, preventing the RNA strand being replicated. Figure 11c shows how digoxin is in the same place as an RNA strand would be when interacting with RdRp. For chloroquine, there is an interaction between this drug and the active site, although the movement throughout the simulation may indicate that this is not strong enough to inhibit RNA replication. Occupancies and HB lifetimes support the aforementioned, as digoxin presents the highest lifetimes (0.95 ps) and the highest occupancy (81%), while chloroquine presents the lowest occupancy (2%) (Figure 12).
Based on the highest occupancy and HB lifetimes, two key residues were chosen for the interaction of RdRp with each of the four ligands. Then, the distances between the ligand and those key residues were measured during the simulation time and plotted in Figure 13. For chloroquine, RMSD values almost double compared to digoxin and remdesivir, with values around 1 nm. Coulomb and Lennard-Jones energies for digoxin are lower in all the simulations compared to chloroquine and remdesivir. This suggests that digoxin forms a tighter interaction with RdRp. Dronedarone presents a low Lennard-Jones interaction energy (comparable to digoxin), although its Coulomb energy is higher (~5 kcal/mol). As the simulation proceeds, the energy in remdesivir presents a greater fluctuation, which agrees with the RMSD plot. Analyzing remdesivir more in detail, it can be seen that in the first 10 ns, the molecule moves around 0.2 nm away from the active site. As remdesivir is moving out, HBs are less likely to be formed, and that is why between 2 and 10 ns the average HBs is two. Interestingly, although the remdesivir molecule is outside the active site, it matches with the entrance of the RNA strand (Figure 11a pale cyan structure). After this time, remdesivir is stabilized in the active site (Figure 11a, cyan structure). This might suggest that remdesivir, as a nucleotide analogue, interacts with other sites of the molecule waiting like any other nucleotide to enter the active and be incorporated to the RNA during the replication process. On the other hand, digoxin tight chloroquine, there is an interaction between this drug and the active site, although the movement throughout the simulation may indicate that this is not strong enough to inhibit RNA replication. Occupancies and HB lifetimes support the aforementioned, as digoxin presents the highest lifetimes (0.95 ps) and the highest occupancy (81%), while chloroquine presents the lowest occupancy (2%) (Figure 12).  Based on the highest occupancy and HB lifetimes, two key residues were chosen for the interaction of RdRp with each of the four ligands. Then, the distances between the chloroquine, there is an interaction between this drug and the active site, although the movement throughout the simulation may indicate that this is not strong enough to inhibit RNA replication. Occupancies and HB lifetimes support the aforementioned, as digoxin presents the highest lifetimes (0.95 ps) and the highest occupancy (81%), while chloroquine presents the lowest occupancy (2%) (Figure 12).  Based on the highest occupancy and HB lifetimes, two key residues were chosen for the interaction of RdRp with each of the four ligands. Then, the distances between the  ligand and those key residues were measured during the simulation time and plotted in Figure 13. For the four ligands, digoxin (Figure 13b) is the one forming tighter interactions with both key residues, agreeing with its high pIC50 value. The average length found for digoxin with ASP684 was 1.88 Å, while for ASN543 it was 2.20 Å. The distance between ASP684 and the ligand takes around 2.5 ns to become stabilized, where a value of 4.34 Å is the highest one found. After stabilization, 2.97 Å is the longest distance between the ligand and this residue. Throughout the simulation, the distance between digoxin and ASN543 does not go above 3.18 Å, meaning that the HBs formed are very tight and stable. Analyzing RdRp's natural substrate remdesivir, strong interactions are also formed in both residues as expected. The average distance found was 2.44 Å for ASN496 and 1.97 Å for ARG569. The HB formed between remdesivir and ARG569 is stable throughout all the simulation, while ASN496 (Figure 13d) stabilized after 10 ns. For chloroquine and dronedarone, both ligands formed interactions with GLN573 and TYR689. The average distance estimated for GLN573 was 4.85 Å for chloroquine and 5.08 Å for dronedarone. A shorter average distance was found for the interaction with TYR689 (3.35 Å for chloroquine and 4.10 Å for dronedarone). These interactions are not so strong, as seen by the greater fluctuation observed in the graph vs. the digoxin and remdesivir ones. This fact supports the lower pIC50 values found for both molecules. Although the distance is stabilized after 20 ns of simulation, values above the HB cutoff distance of 3.6 Å can be found in this portion of the simulation.

Total Binding Energy Calculations
A final step in a protein-ligand complex interaction study is to calculate the free binding energy and its van der Waals, electrostatic, and solvent-accessible surface area (SASA) contributions. MMPBSA calculations were performed on the same molecules chosen for the MD simulations. The binding energy for the PLpro, complexed with chloroquine, dronedarone, and digoxin, was not computed because those drugs did not stay in the active site of the enzyme during the MD simulation, as discussed above. For all the other complexes, binding energy and their contributions are given in Table 7. For the four ligands, digoxin (Figure 13b) is the one forming tighter interactions with both key residues, agreeing with its high pIC 50 value. The average length found for digoxin with ASP684 was 1.88 Å, while for ASN543 it was 2.20 Å. The distance between ASP684 and the ligand takes around 2.5 ns to become stabilized, where a value of 4.34 Å is the highest one found. After stabilization, 2.97 Å is the longest distance between the ligand and this residue. Throughout the simulation, the distance between digoxin and ASN543 does not go above 3.18 Å, meaning that the HBs formed are very tight and stable. Analyzing RdRp's natural substrate remdesivir, strong interactions are also formed in both residues as expected. The average distance found was 2.44 Å for ASN496 and 1.97 Å for ARG569. The HB formed between remdesivir and ARG569 is stable throughout all the simulation, while ASN496 (Figure 13d) stabilized after 10 ns. For chloroquine and dronedarone, both ligands formed interactions with GLN573 and TYR689. The average distance estimated for GLN573 was 4.85 Å for chloroquine and 5.08 Å for dronedarone. A shorter average distance was found for the interaction with TYR689 (3.35 Å for chloroquine and 4.10 Å for dronedarone). These interactions are not so strong, as seen by the greater fluctuation observed in the graph vs. the digoxin and remdesivir ones. This fact supports the lower pIC 50 values found for both molecules. Although the distance is stabilized after 20 ns of simulation, values above the HB cutoff distance of 3.6 Å can be found in this portion of the simulation.

Total Binding Energy Calculations
A final step in a protein-ligand complex interaction study is to calculate the free binding energy and its van der Waals, electrostatic, and solvent-accessible surface area (SASA) contributions. MMPBSA calculations were performed on the same molecules chosen for the MD simulations. The binding energy for the PLpro, complexed with chloroquine, dronedarone, and digoxin, was not computed because those drugs did not stay in the active site of the enzyme during the MD simulation, as discussed above. For all the other complexes, binding energy and their contributions are given in Table 7.  Table 7 shows that all binding energies are negative, which suggests a favourable interaction between the different ligands and the targets. Total binding energies go from −2.77 kcal/mol to −17.55 kcal/mol in Mpro, and from −15.79 kcal/mol to −23.79 kcal/mol in RdRp. For bazedoxifene, a total binding energy value of −10.96 kcal/mol was found for PLpro, suggesting that the mechanism of this drug may be mediated by the inhibition of this protease. Dronedarone differences in its binding energy suggest a more likely polymerase mechanism than a protease one. A total energy difference of 0.15 kcal/mol was found for the interaction of chloroquine with Mpro and RdRp, which may indicate that this drug can act as a protease or polymerase inhibitor. For digoxin, it presents a 7.15 kcal/mol better total binding energy for RdRp than for Mpro. Still, its binding energy to Mpro is the second-highest, which may suggest an RdRp inhibition, but also a possible Mpro mechanism. Comparing the binding energy of Mpro known substrate N3, it is observed that chloroquine and digoxin have a stronger affinity. For RdRp, all the molecules present better binding energies than the known substrate remdesivir. Binding energies were compared to experimental pIC 50 values, finding a very good correlation in the RdRp model with an R 2 of 0.843. Furthermore, experimental and predicted pIC 50 values for the four compounds studied for Mpro and RdRp were compared, and an R 2 higher than 0.99 was found.
In this line, the shape and energy of HOMO and LUMO orbitals have been calculated with the aim of finding any hint that allows us to discriminate between the more active and the least active compounds against COVID-19. In this sense, more and less active compounds from the data in Table S3 were selected to carry out the frontier orbital analysis. According to Figure 14, compounds with the least activity (Figure 14a) have both HOMO and LUMO scattered under almost the entire molecular regions, whereas dihydrogambomic acid ( Figure 14b) and remdisivir (Figure 14c) present their frontier orbital located in the same molecular region. On the other hand, the most active compounds have the frontier orbitals located in a specific, but different, molecular region (Figure 14d-g). Therefore, these results suggest that the acceptor-donor characteristic of these compounds plays a fundamental role in the inhibition of SARS-CoV-2. The results contrast with some reported before, where the antiviral compound showed frontier orbital located in the same region of the molecule, as explained above for remdesivir [83,84]. It is important to remark that for compounds with pIC 50 values in the 5-6 interval, a clear tendency concerning the molecular orbital shapes is not observed; furthermore, they are not shown herein. bital located in the same region of the molecule, as explained above for remdesivir [83,84]. It is important to remark that for compounds with pIC50 values in the 5-6 interval, a clear tendency concerning the molecular orbital shapes is not observed; furthermore, they are not shown herein. Niclosamide; (e). Digitoxin; (f). digoxin; (g). salinomycin. HOMO: highest occupied molecular orbital. LUMO: lowest unoccupied molecular orbital.
In order to verify if the aforementioned pattern is met for the most prominent compounds with a potential anti-COVID-19 activity from the screening, the molecular orbitals for the six compounds chosen for the docking were computed, and the results are presented in Figure 15. According to this, all compounds met this characteristic except for DB08476 and DB02213. For DB08476, the HOMO and LUMO are in the same region of the molecule. In this case, that makes sense because all the functional groups that will be responsible for the main interactions are located in the same region. For DB02213, while the HOMO occupied one molecular region as seen for other active molecules, the LUMO is present in the central region of the molecule sharing some regions with the HOMO. Still, our results suggest that the structural requirement for COVID 19-active compounds may lie not only in the molecular descriptors described in M13, but also in the frontier molecular shape. In order to verify if the aforementioned pattern is met for the most prominent compounds with a potential anti-COVID-19 activity from the screening, the molecular orbitals for the six compounds chosen for the docking were computed, and the results are presented in Figure 15. According to this, all compounds met this characteristic except for DB08476 and DB02213. For DB08476, the HOMO and LUMO are in the same region of the molecule. In this case, that makes sense because all the functional groups that will be responsible for the main interactions are located in the same region. For DB02213, while the HOMO occupied one molecular region as seen for other active molecules, the LUMO is present in the central region of the molecule sharing some regions with the HOMO. Still, our results suggest that the structural requirement for COVID 19-active compounds may lie not only in the molecular descriptors described in M13, but also in the frontier molecular shape.

Datasets Preparation
The dataset used in this study is composed of 44 compounds extracted from the FDA-approved drugs in Joan et al.'s study, which were experimentally tested as possible antiviral candidates in the treatment of COVID-19 [28]. This dataset is diverse in structure and adequate for the construction of a QSAR model. The 2D structures of all compounds were downloaded from PubChem (https://pubchem.ncbi.nlm.nih.gov/ (accessed on 18 February 2021)). Three-dimensional structures were optimized at the Universal Force Field (UFF) level of theory using the RDKit (https://www.rdkit.org/ (accessed on 18 February 2021)), implemented in the QuBiLS-MAS software [85]. The dependent variable used in this study is the experimental value of the half-maximal inhibitory concentration [pIC50 = −log (IC50, M)]. A rational division of the dataset into training (80%) and test sets (20%) was performed by employing a K-means clustering analysis, which has been demonstrated to be more acceptable for an appropriate assessment of the predictability of the models [86]. The clustering analysis was performed on the standardized variables, the Euclidean distance as a similarity metric, and 500 iterations. Then, around 20% of the molecules of each cluster were selected as a test set.

Datasets Preparation
The dataset used in this study is composed of 44 compounds extracted from the FDA-approved drugs in Joan et al.'s study, which were experimentally tested as possible antiviral candidates in the treatment of COVID-19 [28]. This dataset is diverse in structure and adequate for the construction of a QSAR model. The 2D structures of all compounds were downloaded from PubChem (https://pubchem.ncbi.nlm.nih.gov/ (accessed on 18 February 2021)). Three-dimensional structures were optimized at the Universal Force Field (UFF) level of theory using the RDKit (https://www.rdkit.org/ (accessed on 18 February 2021)), implemented in the QuBiLS-MAS software [85]. The dependent variable used in this study is the experimental value of the half-maximal inhibitory concentration [pIC 50 = −log (IC 50 , M)]. A rational division of the dataset into training (80%) and test sets (20%) was performed by employing a K-means clustering analysis, which has been demonstrated to be more acceptable for an appropriate assessment of the predictability of the models [86]. The clustering analysis was performed on the standardized variables, the Euclidean distance as a similarity metric, and 500 iterations. Then, around 20% of the molecules of each cluster were selected as a test set.
In addition, the drugs included in the recent version of the DrugBank database [29] (Version 5.1.7) were downloaded from the webpage (https://www.drugbank.ca/ (accessed on 18 February 2021)) and used in the present study for a virtual screening prospective analysis by taking into account the datasets labelled as Approved (2636 compounds), Experimental (6127 compounds), Nutraceutical (118 compounds), and Withdrawn (245 compounds). All these datasets were cleaned by the removal of inorganic, organometallic, and mixture compounds, while salts were neutralized. The AD was evaluated in these compounds and the ones classified as IN were considered for the prediction of the pIC 50 value.

Descriptors Calculations and Modelling Process
Two types of descriptors were calculated, topographical 2D and 3D. For the calculation of the 2D descriptors, QuBiLS-MAS software [85] was employed, while for the estimation of the 3D descriptors, QuBiLS-MIDAS software [87] was used. Then, 945 2D and 114 3D descriptors were calculated and used for the modelling process. These types of topographical descriptors have demonstrated to be adequate for QSAR model construction in different applications, such as toxicology [88], hepatotoxicity [89], antimalarial [90], and others [91][92][93].
Modelling considering only 2D descriptors, only 3D descriptors, and the combination 2D-3D descriptors was built. In a first stage, a supervised attribute selection using the Genetic Algorithms was applied to determine small subsets of attributes, which contain the major information content about the pIC 50 , taking into account the ratio molecules: descriptors 5:1 [94]. This analysis was performed using MATLAB version R2014b (Mathworks Inc. software, Natick, MA, USA) [95]. The selected subsets were used for the model construction, employing the multiple linear regression approach.

Domain of Applicability
The analysis of the AD (applicability domain) of the model is an important way to measure how reliable the prediction of the pIC 50 value is. The applicability domain of a model consists of the evaluation of the theoretical spatial region, as defined by the descriptors involved in a model. Although there is not a method universally considered the best for determining the AD [96], there are some recognized methods for this analysis, such as the one based on the principal component analysis (PCA) range, the Euclidean distance, citi-block distance, and probability density, which are implemented in AMBIT discovery and used in the present study [35]. The PCA method is based on the analysis of the range of each descriptor, and if at least one of the descriptors of a new molecule is not in the range, then it is considered as OUT of the AD. In the Euclidean and City block methods, distances are determined as a cut-off value of the corresponding distance from the centroid to determine if the compound is OUT or IN. In the probability density method, a standard distribution is constructed using parametric methods, being able to identify interior empty regions in the dataset distribution. The consensus approach was applied to take a decision by taking into account that if a molecule is catalogued as OUT by two or more AD methods, then it can be finally considered OUT [89].

Model Performance
The model performance was evaluated by the traditional statistical analysis parameters: i.
The collinearity between the descriptors was evaluated by taking into account a Pearson's correlation coefficient of <0.7. ii.
The coefficient of determination (R 2 ) is used as a global evaluator of the model. iii.
The predictability of the model is determined by the leave one out cross-validation coefficient (Q 2 LOO ), the external validation coefficient (Q 2 ext ), and the bootstrapping coefficient (Q 2 boot ). iv.
The robustness of the model is evaluated in terms of the Y-scrambling analysis (a(R 2 ) and a(Q 2 )), based on a random perturbation of the Y values. v.
Additionally, the linear fitting of the model was evaluated in terms of the Fisher coefficient value (F) and the residual standard deviation (s).

vi.
Tropsha's test analysis is performed for the leave-one-out and external validation analysis, as suggested by the Organization for Economic Cooperation and Development (OECD) principles [42].

Pharmacophore Modelling
A ligand-based pharmacophore modelling was performed using the four most active compounds extracted from the Jeon et al. [28] database and PharmaGist web server [97,98]. The molecular and electronic features derived from the calculation were used to build pharmacophore models. Then, each model was screened against the previously cleaned approved, experimental, nutraceutical, and withdrawn DrugBank databases using Pharmit [99]. Finally, the molecules that fit the pharmacophore were compared to their predicted pIC 50 values.

Molecular Docking
For the docking studies, 3 enzymes were chosen as biological targets based on their importance in COVID-19, as described in a recent publication [3]. The targets chosen were SARS-CoV-2 Mpro, PLpro, and RdRp, which are key for the virus life cycle. Ligands were obtained from the DrugBank database [29], based on the Jeon et al. dataset [28], and the DrugBank screening, while the enzymes were taken from the Protein Data Bank [100]. For Mpro, the X-ray crystal structure of the enzyme in complex with an inhibitor N3 [101] was chosen (PDB: 6LU7). For PLpro, the crystal structure of the enzyme in complex with a peptide inhibitor VIR250 (PDB: 6WUU) was selected. Finally, for RdRp, the electron microscopy structure of the nsp12-nsp7-nsp8 complex bound to the template primer RNA and the triphosphate form of Remdesivir (PDB: 7BV2) was used. To prepare the structure, Pymol [102] was employed to eliminate water molecules, dimers, natural ligands, and other molecules such as ions. Then, Autodock Tools [103] was used to eliminate nonpolar hydrogen atoms, to identify rotating bonds, and to transform to the correct format (.PDBQT). Calculations were performed using autodock-VINA [104]. For the calculation parameters, spacing was set to 1 A; default exhaustiveness and full flexibility of the ligands were chosen. The grid box for each enzyme is presented in the supplementary information (SM3 , Table S7). Docking assessment was performed as a validation procedure using the docking accuracy strategy. The results were analyzed using Pymol [102], Ligplus+ [105], and Discovery studio visualizer [106].

Molecular Dynamic Simulations
Further classical molecular dynamic (MD) simulations were performed based on experimental pIC 50 . This type of calculation allows us to study the relative stability of the ligand of interest in the active site of the enzyme, gaining insights on different parameters such as hydrogen bond formation. For the calculation, the already cleaned enzyme plus the ligand obtained from the docking calculations were used. The system was solvated using the three-point water model (TIP3), in a cube shape. Sodium or chlorine atoms were used as appropriate to neutralize the charge of the system. Enzyme topology was built using Gromacs 2019 [107], and ligand topology with CHARMM General Force Field (CGenFF) [108]. Calculations were performed with Gromacs 2019 [107] using the Chemistry at Harvard Macromolecular Mechanics (CHARMM) forcefield. First, the system was relaxed and then equilibrated using a constant number of particles, pressure, and temperature (NVT) for 100 ps at 300 K, using Berendsen thermostat temperature coupling [109] before starting the dynamic. For the simulation parameters, thermostat coupling was set to 300 K, pressure coupling at 1 Barr and the simulation ran for 40 ns (40,000 steps of 1ps each) [58].

Free Energy Calculations
Binding free energy calculations were performed using the Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) approach with the g_mmpbsa tool [110]. Binding free energy is obtained according to Equation (1), taking into account the vacuum Molecular Mechanics (MM) potential energy for non-bonded and bonded interactions (E MM ), plus the polar (G polar ) and non-polar (G nonpolar ) solvation energy (Equation (2)). G X = E MM + G polar + G nonpolar (2) X = protein, ligand, complex. MM forcefield parameters are used to calculate E MM . G polar is obtained by solving the Poisson-Boltzmann equation, while G nonpolar is based on the Solvent Accessible Surface Area (SASA) model. The three parameters were extracted between 5 and 40 ns of the MD simulation trajectory by taking snapshots every 2 ns.

Frontier Orbital Analysis
The main purpose of this section was to investigate whether or not the frontier orbitals shapes could allow qualitatively distinguishing structures with high vs. low anti-COVID-19 activity. For this purpose, the selected structures were optimized at the DFT theory level with the WB97XD/6-311G(d,p) [82,93,111,112] method, implemented in Gaussian 16 [113] for Linux. Following optimization, the .chks files were retrieved and used to generate the frontier molecular orbitals with a 0.02 au isosurface.

Conclusions
For the QSAR analysis, 2D structures of the compounds were used as downloaded from the PubChem website, and 3D structures were satisfactorily optimized at the UFF level of theory. A robust model for pIC 50 prediction with only five attributes was obtained with good statistical parameters (R 2 = 0.897, Q 2 LOO = 0.854 and Q 2 ext = 0.876, among others). The model has been demonstrated to be robust enough for the prediction of biological activity with a wide applicability domain, making it possible to predict pIC 50 values for more than 4000 compounds extracted from the Drugbank database. Based on their pIC 50 values, it was possible to identify 57 possible potent drugs, which can be used for the treatment of COVID-19. From these, six compounds, i.e., Para-iodo-D-phenylalanine hydroxamic acid (DB01980), 2 ,6 -Dichloro-Biphenyl-2,6-Diol (DB03259), 3-O-Methylfructose (DB02438), Metanitrophenyl-Alpha-D-Galactoside (DB02213), 3-amino-azacyclotridecan-2-one (DB08476), and 2-(2,4-dichlorophenoxy)-5-(pyridin-2-ylmethyl)phenol (DB07287), were found as potential candidates due to their activity (pIC 50 > 6.6) and are suggested for further experimental studies. Interestingly, several antifungal and antibiotic compounds such as miconazole, amikacin, streptomycin, lincomycin, and spiramycin were found to be potent SARS-CoV-2 inhibitors, the same as monensin, flunisolide, fluticasone propionate, and tixocortol, which are used to treat bronchitis, asthma, and allergic rhinitis. From the docking analysis, the affinity scores found for the interaction of the studied drugs with the three enzymes were all negative, with values from −4.5 to −9.4 kcal/mol, suggesting, at first sight, that all the molecules could inhibit the biological targets. Further analysis showed that the Mpro binding pocket has a side chamber where a Leu residue of inhibitor N3 enters. Several of the studied compounds manage to fit in this chamber, although none of them can obtain the same number of interactions and HBs as N3. On PLpro, a tunnellike structure on the active site where none of the studied molecules entered made drug interaction with the active site not very strong. The best results were obtained for RdRp, where digoxin has 19 interactions, including 7 HBs. The molecular dynamic simulation for the selected compounds contributed substantially to the understanding of the proteinligand interaction for the three considered targets. MD simulation showed that in Mpro, digoxin accommodates better in the pocket with a methyl group pointing to the chamber. For PLpro, digoxin and chloroquine failed to stay in the active site after the simulation. Again, RdRp-drug complexes were found to be the better option after the MD simulation. Good fit with the active site and an overlay when adding the experimental RNA structure suggest RdRp as a plausible mechanism of several studied drugs. MMPBSA calculation showed that all values are negative, meaning that there is a favourable interaction be-tween the studied drugs and the chosen targets. For RdRp, digoxin was found to have the highest affinity, matching experimental pIC 50 values. The frontier molecular orbitals analysis suggests that it is possible to separate the compounds based on the molecular orbital shapes.
Supplementary Materials: The following are available online, Figure S1. Pearson's coefficient range for each descriptor concerning the other descriptors; Figure S2. a. Pharmacophore model 4 fitting DB03259. b. Pharmacophore model 5 fitting DB02438; Figure S3. 2D representation of the interaction of Digoxin (a), DB02438 (b), and DB02213 (c) with the active site of Mpro; Figure S4. 2D representation of the interaction of Digoxin with the active site of RdRp; Figure S5. MD results for Mpro. a. RMSD of the protein-ligand complex (red) and the ligand (black). b. Coulomb (blue) and Lennard-Jones (green) interaction energy. c. the number of hydrogen bonds; Figure S6. Occupancy (yellow line) and HB lifetimes (Bars) for the MD results of Mpro. Chloroquine in blue, digoxin in red, dronedarone in purple, and remdesivir in green; Figure S7. Interaction of MPro (pale cyan) with a. Docking (yellow) and MD (green) results for Digoxin. b. Docking (pink) and MD (orange) result for Chloroquine; Figure S8. MD results for PLpro interaction with bazedoxifene. a. RMSD of the protein-ligand complex (red) and the ligand (black). b. Coulomb (blue) and Lennard-Jones (green) interaction energy. c. the number of hydrogen bonds; Table S1. Models obtained for the 2D, 3D, and combined modeling; Table S2. DockCov2 scores of compounds in the Jeon et al. database; Table S3. Jeon et al. database screening using pharmacophore models 4 and 5; Table S4. Best compounds for the DrugBank screening using pharmacophore models 4 and 5; Table S5. Docking results of the Dataset against Mpro, PLpro, and RdRp, ordered from largest to lowest pIC 50; Table S6. Docking results of the Best Molecules of the DrugBank screening, ordered from largest to loweest predicted pIC 50 ; Table S7. Gris box used for the docking study.