Design of Potent Inhibitors Targeting the Main Protease of SARS-CoV-2 Using QSAR Modeling, Molecular Docking, and Molecular Dynamics Simulations

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection is a serious global public health threat. The evolving strains of SARS-CoV-2 have reduced the effectiveness of vaccines. Therefore, antiviral drugs against SARS-CoV-2 are urgently needed. The main protease (Mpro) of SARS-CoV-2 is an extremely potent target due to its pivotal role in virus replication and low susceptibility to mutation. In the present study, a quantitative structure–activity relationship (QSAR) study was performed to design new molecules that might have higher inhibitory activity against SARS-CoV-2 Mpro. In this context, a set of 55 dihydrophenanthrene derivatives was used to build two 2D-QSAR models using the Monte Carlo optimization method and the Genetic Algorithm Multi-Linear Regression (GA-MLR) method. From the CORAL QSAR model outputs, the promoters responsible for the increase/decrease in inhibitory activity were extracted and interpreted. The promoters responsible for an increase in activity were added to the lead compound to design new molecules. The GA-MLR QSAR model was used to ensure the inhibitory activity of the designed molecules. For further validation, the designed molecules were subjected to molecular docking analysis and molecular dynamics simulations along with an absorption, distribution, metabolism, excretion, and toxicity (ADMET) analysis. The results of this study suggest that the newly designed molecules have the potential to be developed as effective drugs against SARS-CoV-2.


Introduction
Three years after the emergence of the COVID-19 disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the entire world is still suffering from the major health and socioeconomic consequences of the disease [1]. Accordingly, significant research efforts have been made to develop vaccines and drugs to effectively stop COVID-19. However, SARS-CoV-2 continues to claim human lives due to frequent mutations in its viral genome that make the new variants more transmissible and infectious, such as B.1.1.7 (alpha), B.1.351 (beta), B.1.617.2 (delta), and B.1.1.529 (omicron) [2][3][4][5]. Therefore, the discovery of new inhibitors that work better at different stages of SARS-CoV-2 propagation has become a priority.
SARS-CoV-2 is an enveloped positive-stranded RNA virus that, once it enters the cell, translates the single-stranded RNA into two large polyproteins known as pp1a and pp1ab that mediate viral replication and propagation [6]. During viral maturation, most cleavage events are controlled by a nonstructural protein 5 (nsp5), also known as the main protease (Mpro or 3CLpro), which is in charge of 11 different cleavages in the early stages of viral replication [7]. Mpro is an attractive target due to: (i) its pivotal role in viral replication and maturation [8]; (ii) minor mutations that have occurred in its genomic sequence in the new variants of SARS-CoV-2 compared to other major proteins, [9] and (iii) its inhibitors that are unlikely to be dangerous as no human proteases with similar cleavage specificity have been identified [10]. The Mpro monomer consists of three domains: domain I, domain II, and domain III, which include amino acid residues 8-101, 102-184, and 201-306, respectively [11]. Despite the fact that each individual monomeric component has its own active site, the Mpro of SARS-CoV-2 is enzymatically active only as a homodimer [11][12][13]. Therefore, it might be logical to block the catalytic site (CS) and dimer interface site (DIS) to inhibit the main function of the protease and its dimerization.
In continuation of our efforts to develop new potent Mpro inhibitors [14], in this study, two quantitative structure-activity relationships (QSAR) were developed based on the structural properties of dihydrophenanthrene derivatives and their inhibitory activities against Mpro. The first model, based on Monte Carlo optimization, was used to build SMILES-based QSAR models that provide insights into the design of new Mpro inhibitors. The second model used the genetic algorithm multi-linear regression approach (GA-MLR) to confirm the prediction of the inhibitory activity of the designed molecules. Since the dihydrophenanthrene derivative could inhibit SARS-CoV-2 by targeting the CS and DIS of the Mpro, a molecular docking study was carried out to investigate the interaction of the designed molecules within the CS and DIS of Mpro. In addition, the most potent designed molecules were subjected to molecular dynamics simulations in order to investigate their stability and behavior within the CS and DIS sites of Mpro. An absorption, distribution, metabolism, excretion, and toxicity (ADMET) evaluation of these molecules was also performed.

CORAL QSAR Model
Six QSAR models were built based on three random splits and two target functions: TF1 with W IIC = 0 and TF2 with W IIC = 0.2. Three splits were created from the entire dataset and each split was divided into four sets, as described below (for definitions of the parameters, refer to the Materials and Methods section (Section 3) of this article). The obtained statistical parameters for the developed SMILES-based QSAR models show that W IIC = 0.2 increases the effect of IIC on Monte Carlo optimization. Table 1 displays the computed statistical parameters for all splits. Figure 1 depicts the experimental pIC 50 values compared to the estimated values for the three splits. Table 1 clearly illustrates that all models are statistically reliable and fulfill the criteria set by Tropsha et al. [15] and Ojha et al. [16]. All data points of the three splits are in the applicability domain as shown in the Supplementary Materials in Table S1. The QSAR model with a higher R 2 value for the validation set and a higher IIC value for the calibration set was selected as the leading model. Therefore, the QSAR model of split 2 had the highest R 2 value of 0.9203 for the validation set and a higher IIC value of 0.9277 for the calibration set. The numerical values of the various parameters for the validation set of split 2 were Q 2 = 0.8508, CCC = 0.9157, r 2 m = 0.8825, and ∆r 2 m = 0.0647. The split 2 model is as follows in Equation (1): pIC 50 = 1.7915330 (±0.04504500) + 0.0672532 (±0.0010114) × DCW (3,17) (1)

GA-MLR QSAR Model
The GA-MLR method was applied to the training set and then evaluated to the test set using the six selected descriptors that contribute to the inhibitory activity ( Table 2). The GA-MLR model Equation (2) and its statistical parameters are presented below:

GA-MLR QSAR Model
The GA-MLR method was applied to the training set and then evaluated to the test set using the six selected descriptors that contribute to the inhibitory activity ( Table 2). The GA-MLR model Equation (2) and its statistical parameters are presented below: The performance of the aforementioned parameters of the established GA-MLR model passes the OECD's standard validation criteria. Furthermore, Figure 2a depicts the experimental pIC 50 endpoints and the predicted endpoints by the constructed GA-MLR model, which demonstrates a good correlation between the studied activity and the six selected descriptors. The AD is used to analyze the space of the leading model in order to further validate the generated model. The AD was carried out using the leverage approach, as seen in the Williams plot in Figure 2b. The dashed lines represent the cutoff value of ±3 standard deviations, while the caution line for the X outlier (h*) is 0.538. All molecules in William's plot fall within the AD, with the exception of one molecule.

Molecular Descriptor Description
Eig09_AEA(dm) Eigenvalue n. 9 from augmented edge adjacency matrix weighted by dipole moment P_VSA_MR_5 P_VSA-like on Molar Refractivity, bin 5 s3_numSharedNeighbors Number of shared neighbours in substituent 3 with other substituents s3_relPathLength_2 Maximum path length of the substituent 3 normed by s3_size SpMin2_Bh(m) Smallest eigenvalue n. 2 of Burden matrix weighted by mass GATS5m Geary autocorrelation of lag 5 weighted by mass Eig09_AEA(dm) Eigenvalue n. 9 from augmented edge adjacency matrix weighted by dipole moment P_VSA_MR_5 P_VSA-like on Molar Refractivity, bin 5 s3_numSharedNeighbors Number of shared neighbours in substituent 3 with other substituents s3_relPathLength_2 Maximum path length of the substituent 3 normed by s3_size SpMin2_Bh(m) Smallest eigenvalue n. 2 of Burden matrix weighted by mass GATS5m Geary autocorrelation of lag 5 weighted by mass The performance of the aforementioned parameters of the established GA-MLR model passes the OECD's standard validation criteria. Furthermore, Figure 2a depicts the experimental pIC50 endpoints and the predicted endpoints by the constructed GA-MLR model, which demonstrates a good correlation between the studied activity and the six selected descriptors. The AD is used to analyze the space of the leading model in order to further validate the generated model. The AD was carried out using the leverage approach, as seen in the Williams plot in Figure 2b. The dashed lines represent the cutoff value of ±3 standard deviations, while the caution line for the X outlier (h*) is 0.538. All molecules in William's plot fall within the AD, with the exception of one molecule.

Mechanistic Interpretation
A mechanistic interpretation is an essential component of the OECD. These models can be used to identify and evaluate the molecular properties that are responsible for increasing and decreasing an endpoint value. Multiple Monte Carlo optimization runs can be used to determine the mechanistic interpretation of the CORAL model. The chemical characteristics derived from the SMILES attributes with positive CWs are found to be promoters of an increase in the pIC50 value, whereas the SMILES attributes with negative CWs are shown to be promoters of a decrease in the pIC50 value in three independent Monte Carlo optimization runs. SMILES attributes containing both positive and negative CWs are undefined. Table 3 shows the primary promoters causing an increase or decrease in pIC50 values, along with the associated CWs, during three separate runs of the developed QSAR model for split 2.

Mechanistic Interpretation
A mechanistic interpretation is an essential component of the OECD. These models can be used to identify and evaluate the molecular properties that are responsible for increasing and decreasing an endpoint value. Multiple Monte Carlo optimization runs can be used to determine the mechanistic interpretation of the CORAL model. The chemical characteristics derived from the SMILES attributes with positive CWs are found to be promoters of an increase in the pIC 50 value, whereas the SMILES attributes with negative CWs are shown to be promoters of a decrease in the pIC 50 value in three independent Monte Carlo optimization runs. SMILES attributes containing both positive and negative CWs are undefined. Table 3 shows the primary promoters causing an increase or decrease in pIC 50 values, along with the associated CWs, during three separate runs of the developed QSAR model for split 2.
Based on the findings listed in Table 3, the increase promoters were used and introduced into the lead compound (49), which has the highest pIC 50 value, while the decrease promoters were avoided. The propagation promoters were examined at three distinct sites in the lead compound to design novel Mpro inhibitors ( Figure 3). All the newly developed compounds, along with their chemical structures and the calculated pIC 50 values by the two models, are listed in Table S2. The CORAL model predicted a pIC 50 range of 5.82-6.44 for all the newly developed compounds. The GA-MLR QSAR model confirmed the inhibitory activity of the newly developed compounds, with the exception of 49v, which exhibited a pIC 50 value slightly lower than the one of the lead compound 49.

Molecular Docking Analysis
To further validate the design strategy, a molecular docking approach was performed to investigate the ligand-protein interactions between the newly designed compounds with the reference lead compound (49) and the SARS-CoV-2 Mpro. The lead compound inhibits SARS-CoV-2 Mpro via mixed inhibition, which means that it could bind to the CS and DIS of the Mpro. All designed molecules were first docked to the catalytic site of Mpro. Then, the best three molecules bound to the CS were further docked to the DIS of Mpro.

Molecular Docking within the CS of Mpro
To validate our docking protocol, a re-docking analysis of the co-crystallized ligand (3WL) with 6M2N was performed. The RMSD between the experimental and predicted poses of 3WL was determined to be 0.79 Å ( Figure S1). This result shows that the docking protocol is appropriate for reproducing native poses (<2 Å). After validation of the parameters, a docking study was performed for all compounds against the Mpro protein.
The binding affinity estimations are shown in Table S2. It was found that the binding affinity values of the designed molecules (except 49c and 49k) were higher compared to that of the compound 49 (−11.47 Kcal/mol) within the CS. The 2D representation of the

Molecular Docking Analysis
To further validate the design strategy, a molecular docking approach was performed to investigate the ligand-protein interactions between the newly designed compounds with the reference lead compound (49) and the SARS-CoV-2 Mpro. The lead compound inhibits SARS-CoV-2 Mpro via mixed inhibition, which means that it could bind to the CS and DIS of the Mpro. All designed molecules were first docked to the catalytic site of Mpro. Then, the best three molecules bound to the CS were further docked to the DIS of Mpro.

Molecular Docking within the CS of Mpro
To validate our docking protocol, a re-docking analysis of the co-crystallized ligand (3WL) with 6M2N was performed. The RMSD between the experimental and predicted poses of 3WL was determined to be 0.79 Å ( Figure S1). This result shows that the docking protocol is appropriate for reproducing native poses (<2 Å). After validation of the parameters, a docking study was performed for all compounds against the Mpro protein. The binding affinity estimations are shown in Table S2. It was found that the binding affinity values of the designed molecules (except 49c and 49k) were higher compared to that of the compound 49 (−11.47 Kcal/mol) within the CS. The 2D representation of the best compounds (49n, 49p, 49x) bound with the CS of Mpro is shown in Figure 4. Their binding affinity values and interaction details are given in Table 4 and Table S3, respectively. The analysis of docking results shows that the lead compound 49 and the designed compound 49x interacted with some important residues of the CS via five hydrogen bonds (one hydrogen bond with CYS145), nine hydrophobic (two of them with HIS41) and two pi-sulfur interactions. Compound 49n was stabilized by the formation of six hydrogen bonds in which two hydrogen bonds were observed with CYS145, as well as two pi-sulfur interactions and seven hydrophobic interactions, where one of them interacted with HIS41. Compound 49p interacted by four hydrogen bonds (two of them with CYS145), and eight hydrophobic (two of them formed with HIS41), along with two pi-sulfur and one pi-lone pair contacts. Analysis of the non-covalent interactions between the most potent designed molecules within the CS shows that they interact with important key binding residues, specifically the catalytic dyad (HIS41 and CYS145); thus, they can serve as important protease inhibitors.  After selecting the three best compounds for catalytic site inhibition, we performed a docking study of these compounds within the DIS of Mpro. The monomeric structure (6M2Q) was used as a receptor to avoid the structural changes that may occur in the dimeric structure of Mpro. The binding affinity and 2D representation of the ligand-  After selecting the three best compounds for catalytic site inhibition, we performed a docking study of these compounds within the DIS of Mpro. The monomeric structure (6M2Q) was used as a receptor to avoid the structural changes that may occur in the dimeric structure of Mpro. The binding affinity and 2D representation of the ligand-     After selecting the three best compounds for catalytic site inhibition, we performed a docking study of these compounds within the DIS of Mpro. The monomeric structure (6M2Q) was used as a receptor to avoid the structural changes that may occur in the dimeric structure of Mpro. The binding affinity and 2D representation of the ligand-protein interactions of the three compounds with the DIS of SARS-CoV-2 Mpro are shown in Table 4 and Figure 5, respectively. The interaction details are represented in Table S4. Compound 49 was stabilized within the DIS by two hydrogen bonds with Val125 and ARG4, one pi-sulfur interaction, one electrostatic interaction with LYS5, and seven hydrophobic interactions where five of them interacted with ARG4 and LYS5. The binding affinity values of −9.18, −9.95, and −9.06 Kcal/mol for compounds 49n, 49p, and 49x, respectively, show that they bind better in the DIS than the lead compound 49 (−8.65 Kcal/mol). Compound 49n was stabilized using three hydrogen bonds with VAL125, LYS5, and ARG4, one electrostatic interaction with LYS5, and one pi-sulfur interaction, along with five of nine hydrophobic interactions seen to be formed with ARG4 and LYS5. The interaction with compound 49p was found to be stabilized by forming two hydrogen bonds with VAL125 and ARG4, one pi-sulfur interaction, one electrostatic interaction with LYS5, and eight hydrophobic interactions (five of them interacting with ARG4 and LYS5). Furthermore, compound 49x was stabilized by six hydrogen binding interactions where five of them interacted with LYS5, ARG4, and GLU290, and four out of eight hydrophobic interactions were seen to be formed with ARG4 and LYS5, plus one pi-lone pair interaction with LYS5 and one electrostatic interaction with GLU290. Overall, the non-covalent interactions of the three compounds within the DIS of Mpro demonstrate that the compounds interacted with one of the important residues (ARG4, LYS5, and GLU290), which have been proposed to be involved in the dimerization of Mpro. This could be a potential strategy to disrupt the stability of the dimer interface and to prevent the formation of the dimeric structure of Mpro. five of them interacted with LYS5, ARG4, and GLU290, and four out of eight hydrophobic interactions were seen to be formed with ARG4 and LYS5, plus one pi-lone pair interaction with LYS5 and one electrostatic interaction with GLU290. Overall, the noncovalent interactions of the three compounds within the DIS of Mpro demonstrate that the compounds interacted with one of the important residues (ARG4, LYS5, and GLU290), which have been proposed to be involved in the dimerization of Mpro. This could be a potential strategy to disrupt the stability of the dimer interface and to prevent the formation of the dimeric structure of Mpro. (a)

Molecular Dynamics Simulations (MDs) of Top-Three Designed Molecules
Molecular dynamics simulations (MDs) were performed to evaluate the conformational stability of ligand-protein complexes and the binding ability of ligands. In this study, the metrics RMSD, RMSF, and Rg were calculated. RMSD is a metric used to measure the deviation of a protein's backbone from its initial conformation to its final conformation. The amount of deviation observed during a simulation is used to determine the stability of the protein with respect to its structural conformation. A protein that maintains a stable structure will have minimal deviation in its backbone, while a protein with a less stable structure will have greater deviation. RMSF analysis is used to identify the flexible regions of protein-ligand complexes. In proteins, regions such as loops, turns, and coils that are less organized have higher RMSF values, while more structured regions such as alpha helices and beta sheets have lower RMS fluctuations. The Rg is a measure of the compactness of the protein structure, which is determined by comparing the Rg values of the protein before and after the ligand binding.  6M2N-49, 6M2N-49n, 6M2N-49p, and 6M2N-49x was found to be 0.115, 0.113, 0.131, 0.092, and 0.103 nm, respectively. The 6M2N-49p complex confirms the RMSD results as it has the lowest RMSF compared to all systems. The 6M2N-49x complex showed a lower RMSF than the 6M2N-49 complex, suggesting that 49x, like 49p, could act as a potential inhibitor of Mpro at its catalytic site. In addition, the Rg plot for 6M2N-apo, 6M2N-49, 6M2N- results as it has the lowest RMSF compared to all systems. The 6M2N-49x complex showed a lower RMSF than the 6M2N-49 complex, suggesting that 49x, like 49p, could act as a potential inhibitor of Mpro at its catalytic site. In addition, the Rg plot for 6M2N-apo ,  6M2N-49, 6M2N-49n, 6M2N-49p, and 6M2N-    Finally, we performed the MM/GBSA analysis to calculate the interaction energies of the complexes for the entire MD trajectory. The results are shown in Table 6. The binding free energies for 6M2N-49n, 6M2N-49p, and 6M2N-49x are −19.63, −19.77, and −30.3 kcal mol-1, respectively, indicating higher binding affinity compared to the reference compound.

MDs of the Top-Three Designed Molecules within the DIS of Mpro
MDs were performed to investigate the stability and conformational changes of the monomeric form of Mpro (6M2Q-apo) and the ligand-protein complexes 6M2Q-49, 6M2Q-49n, 6M2Q-49p, and 6M2Q-49x when the compounds were bound to the DIS of Mpro. After visualizing the dynamics of the compounds, we found that the reference compound 49 and 49x left their binding site after 50 ns and 20 ns, respectively. In contrast, 49n and 49p remain bound throughout the simulation. Following these results, we present the RMSD, RMSF, and Rg plots for 6M2Q-apo, 6M2Q-49n, and 6M2Q-49p in Figure 7 and their overall average values in Table 7. The overall average RMSD values of 6M2Q-apo, 6M2Q-49n, and 6M2Q-49p were 0.198, 0.176, and 0.178 nm, respectively. Both 6M2Q-49n, and 6M2Q-49p showed lower RMSD values compared to 6M2Q-apo, suggesting greater stability of the ligand-Mpro complexes than Mpro-apo. In addition, RMSF analysis shows that the fluctuations were smaller than for the apo form of Mpro (RMSFavg = 0.116 nm) with average RMSF values of 0.087 nm and 0.102 nm for 6M2Q-49n and 6M2Q-49p, respectively. Furthermore, the Rg values for the ligand-protein complexes are lower than the apo form of Mpro, indicating that the ligands are more closely packed. These findings support the RMSD and RMSF results.

Pharmacokinetic and Toxicity Predictions
The results of the ADMET analysis for the reference and three designed compounds were presented in Table 8. The human intestinal absorption (HIA) values reveal that these compounds have high absorption in the intestine, which makes them suitable for oral administration. With moderate predicted permeability through the blood-brain barrier (BBB), these compounds are expected to have acceptable bioavailability in the central nervous system. In addition, these compounds do not interfere with the normal functioning of cytochrome P450, which reduces the risk of unexpected or undesirable drug interactions. Moreover, the designed compounds do not exhibit toxic behavior, indicating that they may be suitable therapeutic candidates in pre-clinical trials.   Considering all MD simulation results for the complexes with Mpro, 49n and 49p could lead to potentially higher activity because they have strong binding to both sites (CS and DIS) of Mpro, whereas 49x is more stable when bounded to the CS.

Pharmacokinetic and Toxicity Predictions
The results of the ADMET analysis for the reference and three designed compounds were presented in Table 8. The human intestinal absorption (HIA) values reveal that these compounds have high absorption in the intestine, which makes them suitable for oral administration. With moderate predicted permeability through the blood-brain barrier (BBB), these compounds are expected to have acceptable bioavailability in the central nervous system. In addition, these compounds do not interfere with the normal functioning of cytochrome P450, which reduces the risk of unexpected or undesirable drug interactions. Moreover, the designed compounds do not exhibit toxic behavior, indicating that they may be suitable therapeutic candidates in pre-clinical trials.

Data Collection, Molecular Structures Preparation and Molecular Descriptor Calculation
For this study, the 2D structures of all 55 dihydrophenanthrene derivatives, collected from the published work of Jian-Wei Zhang & al [17], were sketched using ACD/ChemSketch software and converted to 3D structures using ChemDraw 16.0 software. For geometry optimization and partial charge assignment, the MMFF94 force field was employed with the steepest descent as an algorithm and with 1000 as the number of steps used for optimization. Then, the geometries were optimized using the AM1 method in the gas phase implemented in the Gaussian 09 software [18]. Frequency analysis was checked to investigate the energy minima of the optimized derivatives. The optimized molecular structures were converted into Simplified Molecular Input Line Entry System (SMILES) codes for the modeling process of the CORAL model. To build the GA-MLR model, alvaDesc software (version 1.0.8) was used to calculate the molecular descriptors using the optimized 3D structures [19]. To avoid multicollinear variables in the QSAR model, the number of generated variables was reduced by eliminating descriptors that had a constant value of over 95%, and by keeping only one of the descriptor pairs that possessed a correlation coefficient higher than 0.9. The experimental data value of each molecule (half-maximal inhibitory concentration, IC 50 ) was converted into the form of a logarithm (−log IC 50 ). The molecular structures and their corresponding IC 50 data are listed in Table S5, and their SMILES notation and converted pIC 50 are available in Table S6.

SMILES-Based QSAR Model Building
For QSAR modeling using CORAL 2019 software [20], the entire dataset (55 derivatives) was randomly divided into three splits. Each split consisted of 4 sets: active training (AT, 35%), passive training (PT, 35%), calibration (Cal, 15%), and validation (Val, 15%). Each set has a specific role in the development of the QSAR model, which is well explained in the literature [21][22][23]. The AT set is used to develop the model and generate correlation weights. These weights are then used to compute the descriptors for all the compounds involved in the modeling process. The PT set is used to evaluate the robustness of the model for compounds not belonging to those used to construct the model. The Cal set is used to detect the onset of overfitting. The Val set provides an independent evaluation of the statistical quality of the model using data for compounds that were not included in the model development and optimization. The distribution and identity level of compounds in three splits are shown in Table 9. Table 9. Percentage of the identity of the three splits. The development of QSAR models based on SMILES notation uses the optimal descriptors Equation ( [24]. The comprehensive explanation of the SMILES attributes is provided in Table 10. Following the calculation of all CWs, the linear regression technique was utilized to establish QSAR models, as indicated in Equation (4).

Set
where C 0 is the intercept, while C 1 is the slope of the regression equation. The target functions TF1 and TF2 were used to optimize the Monte Carlo method for QSAR modeling. The equilibrium of the correlation method was used to calculate TF1 (Equation (5)), while the index of ideality of correlation (IIC) [25] was added to TF1 to obtain the modified target function TF2 (Equation (6)).
Pharmaceuticals 2023, 16, 608 15 of 20 R AT , R PT , and R set are the correlation coefficients between the observed pIC 50 and the predicted pIC 50 for the active training set, the passive training set, and for a given set, respectively. The mean absolute error (MAE) is determined as follows: To build robust QSAR models, the optimal threshold (T*) and number of epochs (N*) were calculated by analyzing the best statistical parameters for the calibration set. In the search for the best T* and N*, the ranges from 1 to 10 for the threshold and from 1 to 30 for the N epoch were used for the optimization. Three optimization probes were used. The weight of IIC (W IIC ) is an empirical coefficient; in this study, the W IIC = 0 for TF1 and W IIC = 0.2 for TF2.

GA-MLR QSAR Model Building
To initiate QSAR analysis, the initial stage is to select the most appropriate descriptors from the complete set of computed ones. For this purpose, alvaDesc software was used to compute 5666 molecular descriptors, and only 643 descriptors were filtered out based on the criteria mentioned above (Section 2.1). Then, a stepwise MLR method was performed using SPSS software to select the most relevant descriptors [26]. Finally, 6 molecular descriptors were kept. Using these selected descriptors, the MLR method established a linear relationship between the pIC 50 endpoints of the molecules and their molecular descriptors by applying the ordinary least squares (OLS) algorithm implemented in the QSARINS software [27,28]. The data set was randomly divided into a training set (70%, 39 molecules) and a test set (30%, 16 molecules). The GA-MLR models were created with standard parameters except for subsets from 1 to 5, maximum generation of 10,000, and mutation probability of 0.05.

QSAR Models Validation
The validation step in QSAR is critical for evaluating the accuracy of the model in predicting the activity of new compounds. This step is critical for determining the robustness, reliability, and predictability of the QSAR model. There are four steps to validate the constructed model, including internal validation or cross-validation using the training data set, Y-randomization, external validation using the test data set, and evaluation of the applicability domain (AD) [29]. The validation steps and criteria of the GA-MLR and CORAL QSAR models are well explained in our previous articles [14,23,30,31].

Applicability Domain
The applicability domain (AD) resulting from the training data of a given in silico model was proposed by the Organization for Economic Co-operation and Development (OECD) guidelines [32]. AD can be used to assess the predictability of a created QSAR model for molecules that were not considered in the development of the model. AD is important to determine whether the created QSAR model interpolates (makes correct predictions) or extrapolates (makes less reliable predictions). Outliers are molecules that exist outside of the AD.
In the case of the SMILES-based QSAR, the statistical defects of SMILES are the guiding criteria for the definition of AD. The statistical defect (D) for a given molecule is the sum of the statistical defects, d(A), of all the attributes accessible in the SMILES notation, Equation (10).
Here, P(A) and P (A) represent the probabilities of the attributes (A) in the training set and calibration set, correspondingly, while N(A) and N'(A) indicate the frequencies of the attribute (A) occurring in the training set and calibration set, respectively.
A molecule is classified as an outlier if D > 2* D, while a molecule is in the AD if D < 2*D, where D is the estimated average for the AT, PT, and Cal sets [33].
The William plot of standardized residuals versus leverage was used to represent the AD in the GA-MLR model. Reliable model predictions have leverage values that are within ±3 standard deviations of the critical leverage and less than the warning leverage value h*. The molecules that fall outside the horizontal reference lines of the graph are outliers, while the influential molecules are those with h > h* [34].

Molecular Docking Study
In the current study, molecular docking was used to investigate the conformational patterns and ligand-protein interactions of the designed molecules within the CS and DIS sites of the SARS-CoV-2 Mpro. Two 3D crystallographic structures of the Mpro protein were downloaded from the Protein Data Bank (PDB) of the Research Collaboratory for Structural Biology (RCSB; https://www.rcsb.org/, accessed on 10 March 2023) with the PDB ID codes: 6M2N and 6M2Q [35]. The dimeric structure (6M2N) was used as a receptor molecule to explore the preferred conformation pose of the designed molecules in its active binding site, while the monomeric structure (6M2Q) was used to investigate the best conformational pose of the designed molecules in its DIS. Prior to the docking process, the two protein structures were prepared by removing water and heteroatom molecules, adding polar hydrogens and Gasteiger charges, and saving them in the pdbqt format with the assistance of AutodockTools. Using the same program, the 3WL, the lead compound 49, and the designed molecules were produced and stored in the pdbqt format. The important residues within the CS and DIS sites were defined according to the literature in Table 11 [36,37]. Molecular docking simulations were performed using AutoDock 4.2 with the following parameters: (i) xyz coordinates of −33.162, −65.074, 41.434 and grid box dimensions of 40 × 0 × 40 for the CS, and (ii) xyz coordinates of 112.000, −17.078, 46.750 and grid box dimensions of 80×80×100 for the DIS. A grid spacing of 0.375 was set for both of the grid boxes. The Lamarckian Genetic Algorithm was used to generate 100 protein-ligand binding conformations for each molecule, with a maximum of 2,500,000 energy evaluations. The most reasonable binding configurations were then subjected to an evaluation that considered both ligand-receptor interactions and binding affinity. Discovery studio visualizer [38] and Pymol software [39] were used to analyze the docking outputs. Table 11. List of key residues of SARS-CoV-2 Mpro.

Molecular Dynamics Simulation Details
The newly designed compounds which showed stronger binding to the CS and DIS were subjected to all-atom molecular dynamics using the GROMACS 2020.6 (Groningen Machine for Chemical Simulation) software [40,41]. Before running the MD simulation, the CHARMM-GUI web server [42] was used to generate the initial input parameters implementing the CHARMM36 force field [43]. Based on a rectangular grid box, each complex was solvated within TIP3P water and the necessary counterions (Na + ,Cl -) were added to maintain a salt concentration of 0.15 M through the Monte Carlo ion displacement. Energy minimization of each system was performed using the steepest descent algorithm with a maximum of 50,000 steps and a maximum force of 10.0 KJ/mol. The temperature and atmospheric pressure were set to 300 K and 1.01325 bar, respectively. Each system was equilibrated using canonical (NVT) and isothermal-isobaric (NPT) ensembles. After that, the MD simulation was performed for 100 ns. Based on the dynamics trajectory results, the root mean square deviation (RMSD), the radius of gyration (Rg), and root mean square flexibility (RMSF) were used to evaluate the structural stability of the designed molecules within the CS and DIS of Mpro. The obtained data were plotted using Xmgrace software [44].

MM-GBSA
The calculation of the binding free energies of protein-ligand complexes was estimated by the Molecular Mechanics Generalized Born Surface Area (MMGBSA) method using the gmx-mmgbsa package [45]. The binding free energy is calculated according to the following equations: where the total free energy of the complex is represented by ∆G complex , while the energies of the isolated protein and ligand are represented by ∆G protein and ∆G ligand , respectively; ∆G bind = ∆E gas + ∆G sol = ∆E vdw + ∆E ele + ∆G polar + ∆G nonpolar (12) where ∆E gas is the average potential energy of molecular mechanics in a vacuum, which includes both van der Waals interactions (∆E vdw ) and electrostatic interactions (∆E ele ). ∆G solv is the contribution to the solvation-free energy that is made up of the polar solvation energy (∆G polar ) and the nonpolar solvation energy (∆G nonpolar ).

ADMET Study
The absorption, distribution, metabolism, excretion, and toxicity (ADMET) analysis plays a critical role in drug discovery and development. Based on the ADMET results, we could predict the efficacy and safety of the potential hit compound in the early stages of drug development. In this regard, the ADMET predictions for the compounds were performed using the pkCSM server [46] for pharmacokinetic properties and the Osiris property explorer software [47] for toxicological properties.

Conclusions
The aim of this study was to find new inhibitors of SARS-CoV-2 replication by targeting the Mpro, a crucial protein involved in replicase polyprotein processing. To achieve this goal, we developed two QSAR models based on the structural properties of a series of dihydrophenanthrene derivatives that have shown potential inhibitory activity against Mpro. SMILES-based and GA-MLR QSAR models were built to provide insights in the design of novel potent Mpro inhibitors. Molecular docking and molecular dynamics simulations were further used as computational validation of the designed compounds. Overall, it has been shown that three compounds (49n, 49p, and 49x) exhibited a good inhibitory activity against Mpro. Also, they showed an important conformational and structural stability and a favorable binding profile. In addition, these drug candidates were found to be non-toxic and had acceptable pharmacological properties. By targeting the catalytic and dimer interface sites of Mpro, this suggests that these compounds have the potential to be developed as effective drugs against SARS-CoV-2. The combination of computational methods used in this study enabled the identification of new drug candidates, which can now be further evaluated for their efficacy and safety. Future studies could explore the feasibility of in vitro experiments to confirm and strengthen our computational modeling results. Nonetheless, the computational methods used in this study demonstrate their potential utility in accelerating drug discovery and directing future research directions against SARS-CoV-2.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/ph16040608/s1. Figure S1: Superimposed view of native pose of co-crystallized ligand (black) with its X-ray pose (red) inside the CS of Mpro; Table S1: Split distribution, experimental pIC50 and calculated pIC50 for the three splits using TF2 (with W IIC = 0.2); Table S2: Chemical structures of designed Mpro inhibitors; Table S3: Details of ligand-protein interactions of designed compounds within the CS of Mpro; Table S4: Details of ligand-protein interactions of designed compounds within the DIS of Mpro; Table S5: Chemical structure of the 55 dihydrophenanthrene derivatives and their corresponding IC50; Table S6: SMILES notation of the 55 dihydrophenanthrene derivatives and their pIC50.