Anti-HIV Potential of Beesioside I Derivatives as Maturation Inhibitors: Synthesis, 3D-QSAR, Molecular Docking and Molecular Dynamics Simulations

HIV-1 maturation is the final step in the retroviral lifecycle that is regulated by the proteolytic cleavage of the Gag precursor protein. As a first-in-class HIV-1 maturation inhibitor (MI), bevirimat blocks virion maturation by disrupting capsid-spacer peptide 1 (CA-SP1) cleavage, which acts as the target of MIs. Previous alterations of beesioside I (1) produced (20S,24S)-15ꞵ,16ꞵ-diacetoxy-18,24; 20,24-diepoxy-9,19-cyclolanostane-3ꞵ,25-diol 3-O-3′,3′-dimethylsuccinate (3, DSC), showing similar anti-HIV potency compared to bevirimat. To ascertain the binding modes of this derivative, further modification of compound 1 was conducted. Three-dimensional quantitative structure–activity relationship (3D-QSAR) analysis combined with docking simulations and molecular dynamics (MD) were conducted. Five new derivatives were synthesized, among which compound 3b showed significant activity against HIV-1NL4-3 with an EC50 value of 0.28 µM. The developed 3D-QSAR model resulted in great predictive ability with training set (r2 = 0.99, q2 = 0.55). Molecular docking studies were complementary to the 3D-QSAR analysis, showing that DSC was differently bound to CA-SP1 with higher affinity than that of bevirimat. MD studies revealed that the complex of the ligand and the protein was stable, with root mean square deviation (RMSD) values <2.5 Å. The above results provided valuable insights into the potential of DSC as a prototype to develop new antiviral agents.


Introduction
Despite the great success of highly active antiretroviral therapy (HAART) at reducing mortality from HIV infection, currently available antiretroviral medications frequently fail to act as an eradicative cure, which are usually accompanied by different degrees of toxic side effects and drug resistance. Therefore, it is urgent to develop anti-HIV drugs with novel mechanisms of action. Bevirimat (BVM), derived from betulinic acid, is a first-in-class HIV-1 maturation inhibitor (MI), which exhibits potent effects against viruses resistant to nucleoside and non-nucleoside reverse transcriptase inhibitors (NRTIs and NNRTIs), protease inhibitors (PIs), and integrase inhibitors (INs), and even displays synergistic effects with other anti-HIV drugs [1,2]. Because bevirimat possesses a favorable safety profile and mode of action distinct from other antiretroviral agents as well as prominent antiretroviral activity, BVM and its analogues were subjected to extensive studies and entered phase viruses resistant to nucleoside and non-nucleoside reverse transcriptase inhibitors (NRTIs and NNRTIs), protease inhibitors (PIs), and integrase inhibitors (INs), and even displays synergistic effects with other anti-HIV drugs [1,2]. Because bevirimat possesses a favorable safety profile and mode of action distinct from other antiretroviral agents as well as prominent antiretroviral activity, BVM and its analogues were subjected to extensive studies and entered phase IIb clinical trials, successively [3,4]. In this situation, natural products and their structural analogues could be seen as a necessary complement to conventional HAART and thus have attracted much more attention [5,6].
HIV-1 maturation is the final step in the viral replication cycle that is governed by the proteolytic cleavage of the Gag precursor protein [7]. The Gag polyprotein is stepwise split by viral proteases into matrix (MA), capsid (CA), spacer peptide 1 (SP1), nucleocapsid (NC), spacer peptide 2 (SP2) and p6, which subsequently rearrange to form mature and infectious virion. The cleavage between CA and SP1 is a crucial rate-limiting step, which acts as the target of HIV-1 MIs [8]. The structural studies of CA-SP1 have shown that the C-terminal domain (CTD) of CA and SP1 forms a six-helix bundle (6HB), which holds together the Gag hexamer necessary for HIV particle assembly [9][10][11]. The 6HB actually works as molecular switch to control the maturation process [12]. Bevirimat works by binding to the CA-SP1 peptide and stabilizing 6HB and thus keeping the enzyme from cutting the Gag polyprotein [13]. The accumulation of CA-SP1 intermediate with the absence of assembled CA proteins led to loss of infectivity [14]. Further investigation revealed that BVM binds at the center of 6HB with the axial orientation via electrostatic and hydrophobic interactions [15]. Figure 1 depicts the structure of the HIV-1 Gag precursor protein and BVM binding to CACTD-SP1. In silico molecular modeling has played an increasingly pivotal role in contemporary drug discovery [16]. Computer experiments using various computational techniques and software programs have proven to be essential for the interpretation of common experimental assays [17]. Nowadays, 3D-quantitative structure-activity relationship (3D-QSAR) models have been intensely applied in drug design [18]. Virtual screening and/or molecular docking software program as well as molecular dynamics (MD) have been In silico molecular modeling has played an increasingly pivotal role in contemporary drug discovery [16]. Computer experiments using various computational techniques and software programs have proven to be essential for the interpretation of common experimental assays [17]. Nowadays, 3D-quantitative structure-activity relationship (3D-QSAR) models have been intensely applied in drug design [18]. Virtual screening and/or molecular docking software program as well as molecular dynamics (MD) have been extensively used to guide the drug discovery [19,20]. In particular, MD simulations have become increasingly useful for understanding the structure-function relationship of the target and the essence of protein-ligand interactions [21,22]. With the advent of high-performance computing, graphics processing units (GPUs)-based MD simulations are capable of unveiling biological phenomena that were previously not feasible using traditional hardware architectures [23].
A growing trend in the application of 3D-QSAR combined with molecular docking and MD simulation study in drug design has been witnessed in recent years [24,25].
Triterpenoids and their analogs have been extensively investigated for their diverse structures and antiviral activities [26][27][28]. In our previous efforts to search for new anti-HIV agents from natural products, beesioside I (1) was found to show significant antiviral effects. The following modification on aglycone (2) of compound 1 led to the discovery of (20S,24S)-15β,16β-diacetoxy-18,24; 20,24-diepoxy-9,19-cyclolanostane-3β,25-diol 3-O-3 ,3 -dimethylsuccinate (3, DSC) with EC 50 value of 0.025 µM, displaying similar anti-HIV potency compared to bevirimat [29]. However, the plausible binding mode of the promising derivative to CA-SP1 remains unknown. In this study, compound 1 was further modified to produce five new derivatives 2a-b and 3a-c among which compound 3b showed the strongest activity against HIV-1 NL4-3 with an EC 50 value of 0.28 µM (CC 50 > 20 µM). To further explore the potential of beesioside I derivatives as anti-HIV MIs, 3D-QSAR modeling was developed using the dataset consisting of 14 anti-HIV cycloartane triterpenoids and analogues, including DSC. Molecular docking techniques were utilized to further explain the results of the 3D-QSAR study and explore the possible binding modes. In addition, to further prove the reliability of the docking results and confirm the stability of the complex of ligand and protein, a post-docking MD simulations study was also performed. We believe that the results presented in this study can provide in-depth information for the development of new antiviral agents from triterpenes.

Chemistry
Scheme 1 depicts the syntheses of the derivatives from 1 (2a-b and 3a-c). Acylation of the 3-OH of compound 2 with benzoyl chloride and croton anhydride produced compounds 2a-b. Deacetylation and benzylation of compound 3 gave 3a-b. Furthermore, deacetylation of 3b with potassium hydroxide in ethanol yielded 3c. extensively used to guide the drug discovery [19,20]. In particular, MD simulations have become increasingly useful for understanding the structure-function relationship of the target and the essence of protein-ligand interactions [21,22]. With the advent of high-performance computing, graphics processing units (GPUs)-based MD simulations are capable of unveiling biological phenomena that were previously not feasible using traditional hardware architectures [23]. A growing trend in the application of 3D-QSAR combined with molecular docking and MD simulation study in drug design has been witnessed in recent years [24,25].
Triterpenoids and their analogs have been extensively investigated for their diverse structures and antiviral activities [26][27][28]. In our previous efforts to search for new anti-HIV agents from natural products, beesioside I (1) was found to show significant antiviral effects. The following modification on aglycone (2) of compound 1 led to the discovery of (20S,24S)-15ꞵ,16ꞵ-diacetoxy-18,24; 20,24-diepoxy-9,19-cyclolanostane-3ꞵ,25-diol 3-O-3′,3′dimethylsuccinate (3, DSC) with EC50 value of 0.025 µM, displaying similar anti-HIV potency compared to bevirimat [29]. However, the plausible binding mode of the promising derivative to CA-SP1 remains unknown. In this study, compound 1 was further modified to produce five new derivatives 2a-b and 3a-c among which compound 3b showed the strongest activity against HIV-1NL4-3 with an EC50 value of 0.28 µM (CC50 > 20 µM). To further explore the potential of beesioside I derivatives as anti-HIV MIs, 3D-QSAR modeling was developed using the dataset consisting of 14 anti-HIV cycloartane triterpenoids and analogues, including DSC. Molecular docking techniques were utilized to further explain the results of the 3D-QSAR study and explore the possible binding modes. In addition, to further prove the reliability of the docking results and confirm the stability of the complex of ligand and protein, a post-docking MD simulations study was also performed. We believe that the results presented in this study can provide in-depth information for the development of new antiviral agents from triterpenes.

Chemistry
Scheme 1 depicts the syntheses of the derivatives from 1 (2a-b and 3a-c). Acylation of the 3-OH of compound 2 with benzoyl chloride and croton anhydride produced compounds 2a-b. Deacetylation and benzylation of compound 3 gave 3a-b. Furthermore, deacetylation of 3b with potassium hydroxide in ethanol yielded 3c.

Evaluation of Anti-HIV Activity
All of the synthetic derivatives were evaluated for anti-HIV activity by determining the inhibitory effects on virus replication in MT4 lymphocytes infected by HIV-1 NL4-3 Nanoluc-sec virus. Cytotoxic activity tests were also carried out on MT4 cells. Table 1 shows the anti-HIV and cytotoxic results for the tested compounds with AZT as the positive control. Compound 3b showed the best potency among the derivatives.

3D-QSAR Study
To obtain in-depth understanding of the positive and negative effects of modifications at different sites on antiviral activity, a position-based 3D-QSAR analysis was conducted using the Schrödinger Suites 2018. The 3D-QSAR models were constructed and validated using the dataset composed of 14 cycloartane triterpenoids and analogues with their corresponding activity data reported from our laboratory. As shown in Table 2, 14 small molecules were randomly divided into a training set with 11 molecules and a test set with 3 molecules. The assembled 3D-QSAR model has the following Gaussian field fractions: 0.28 (steric), 0.11 (electrostatic), 0.20 (hydrophobic), 0.22 (hydrogen bond acceptor) and 0.19 (hydrogen bond acceptor). It exhibits a good correlation coefficient (r 2 ) of 0.99 and reliable predictivity with a cross-validation correlation coefficient (q 2 ) of 0.55. The linear correlation between the experimental results and the predicted values are shown in Figure 2 and listed in Table 2.

3D-QSAR Study
To obtain in-depth understanding of the positive and negative effects tions at different sites on antiviral activity, a position-based 3D-QSAR analy ducted using the Schrödinger Suites 2018. The 3D-QSAR models were con validated using the dataset composed of 14 cycloartane triterpenoids and an their corresponding activity data reported from our laboratory. As shown i small molecules were randomly divided into a training set with 11 molecul set with 3 molecules. The assembled 3D-QSAR model has the following G fractions: 0.28 (steric), 0.11 (electrostatic), 0.20 (hydrophobic), 0.22 (hydrogen tor) and 0.19 (hydrogen bond acceptor). It exhibits a good correlation coef 0.99 and reliable predictivity with a cross-validation correlation coefficient (q linear correlation between the experimental results and the predicted values Figure 2 and listed in Table 2.    The QSAR result was further visualized by 3D contour maps to determine the influence of steric and electrostatic fields on the activity value of the compounds and DSC was used as the template molecule in the map. The model of the steric field is shown in Figure  3a, where the favorable region of the larger steric field (in green) is located near the carboxyl group of the side chain and the D-ring linked acetyl group, suggesting that the introduction of large groups in this region could increase the activity. The yellow part suggests that the introduction of small groups could improve the activity. Figure 3b is a model   The QSAR result was further visualized by 3D contour maps to determine the influence of steric and electrostatic fields on the activity value of the compounds and DSC was used as the template molecule in the map. The model of the steric field is shown in Figure  3a, where the favorable region of the larger steric field (in green) is located near the carboxyl group of the side chain and the D-ring linked acetyl group, suggesting that the introduction of large groups in this region could increase the activity. The yellow part suggests that the introduction of small groups could improve the activity. Figure 3b is a model   The QSAR result was further visualized by 3D contour maps to determine the influence of steric and electrostatic fields on the activity value of the compounds and DSC was used as the template molecule in the map. The model of the steric field is shown in Figure  3a, where the favorable region of the larger steric field (in green) is located near the carboxyl group of the side chain and the D-ring linked acetyl group, suggesting that the introduction of large groups in this region could increase the activity. The yellow part suggests that the introduction of small groups could improve the activity. Figure 3b   The QSAR result was further visualized by 3D contour maps to determine the influence of steric and electrostatic fields on the activity value of the compounds and DSC was used as the template molecule in the map. The model of the steric field is shown in Figure  3a, where the favorable region of the larger steric field (in green) is located near the carboxyl group of the side chain and the D-ring linked acetyl group, suggesting that the introduction of large groups in this region could increase the activity. The yellow part suggests that the introduction of small groups could improve the activity.  The QSAR result was further visualized by 3D contour maps to determine the influence of steric and electrostatic fields on the activity value of the compounds and DSC was used as the template molecule in the map. The model of the steric field is shown in Figure  3a, where the favorable region of the larger steric field (in green) is located near the carboxyl group of the side chain and the D-ring linked acetyl group, suggesting that the introduction of large groups in this region could increase the activity. The yellow part suggests that the introduction of small groups could improve the activity. Figure 3b is a model   The QSAR result was further visualized by 3D contour maps to determine the influence of steric and electrostatic fields on the activity value of the compounds and DSC was used as the template molecule in the map. The model of the steric field is shown in Figure  3a, where the favorable region of the larger steric field (in green) is located near the carboxyl group of the side chain and the D-ring linked acetyl group, suggesting that the introduction of large groups in this region could increase the activity. The yellow part suggests that the introduction of small groups could improve the activity. Figure 3b   The QSAR result was further visualized by 3D contour maps to determine the influence of steric and electrostatic fields on the activity value of the compounds and DSC was used as the template molecule in the map. The model of the steric field is shown in Figure  3a, where the favorable region of the larger steric field (in green) is located near the carboxyl group of the side chain and the D-ring linked acetyl group, suggesting that the introduction of large groups in this region could increase the activity. The yellow part suggests that the introduction of small groups could improve the activity. Figure 3b   The QSAR result was further visualized by 3D contour maps to determine the influence of steric and electrostatic fields on the activity value of the compounds and DSC was used as the template molecule in the map. The model of the steric field is shown in Figure  3a, where the favorable region of the larger steric field (in green) is located near the carboxyl group of the side chain and the D-ring linked acetyl group, suggesting that the introduction of large groups in this region could increase the activity. The yellow part suggests that the introduction of small groups could improve the activity. Figure 3b   The QSAR result was further visualized by 3D contour maps to determine the influence of steric and electrostatic fields on the activity value of the compounds and DSC was used as the template molecule in the map. The model of the steric field is shown in Figure  3a, where the favorable region of the larger steric field (in green) is located near the carboxyl group of the side chain and the D-ring linked acetyl group, suggesting that the introduction of large groups in this region could increase the activity. The yellow part suggests that the introduction of small groups could improve the activity. Figure 3b is  The QSAR result was further visualized by 3D contour maps to determine the influence of steric and electrostatic fields on the activity value of the compounds and DSC was used as the template molecule in the map. The model of the steric field is shown in Figure 3a, where the favorable region of the larger steric field (in green) is located near the carboxyl group of the side chain and the D-ring linked acetyl group, suggesting that the introduction of large groups in this region could increase the activity. The yellow part suggests that the introduction of small groups could improve the activity. Figure 3b is a model of the electrostatic field, with positive coefficients shown in blue and negative coefficients in red. The dimethyl succinyl group of the C-3 side chain and the D-ring linked acetyl group as well as the hydroxyl group at position 25 are covered in red, implying that the negative charge in this region is favorable for increased activity. The blue outline shows that increasing the positive charge in this region will affect the activity positively. group as well as the hydroxyl group at position 25 are cover negative charge in this region is favorable for increased activ that increasing the positive charge in this region will affect th

Molecular Docking Study
The molecular docking studies were conducted to elabor clarify the possible binding modes of target compounds with DSC was selected as the ligand example and BVM as the pos better docking score of -8.096 kcal/mol (RMSD 2.73 Å) as c BVM (docking score = -6.574 kcal/mol, RMSD = 1.56 Å), sugg affinity for the CA-SP1 peptide than BVM. The docking diag SP1 was analyzed by the overall docking graph and three-dim detail. The results showed that BVM was vertically bound t linear extended conformation with the C-3 side chain orient the goblet-shaped hexamer as presented in Figure 4a,c, consi [30]. The binding of BVM and CA-SP1 was mainly mediated tions and hydrophobic interactions. The terminal carboxylic g BVM formed two ionic bonds with two amino acids (LYS290 a into contact with LEU363 and MET367 via hydrophobic inte

Molecular Docking Study
The molecular docking studies were conducted to elaborate the 3D-QSAR results and clarify the possible binding modes of target compounds with the active site of CA CTD -SP1. DSC was selected as the ligand example and BVM as the positive control. DSC showed a better docking score of -8.096 kcal/mol (RMSD 2.73 Å) as compared with the standard BVM (docking score = -6.574 kcal/mol, RMSD = 1.56 Å), suggesting that DSC has a higher affinity for the CA-SP1 peptide than BVM. The docking diagram of BVM/DSC with CA-SP1 was analyzed by the overall docking graph and three-dimensional interaction map in detail. The results showed that BVM was vertically bound to the center of the 6HB in a linear extended conformation with the C-3 side chain oriented toward the upper part of the goblet-shaped hexamer as presented in Figure 4a,c, consistent with the literature data [30]. The binding of BVM and CA-SP1 was mainly mediated through electrostatic interactions and hydrophobic interactions. The terminal carboxylic group of the C-3 side chain of BVM formed two ionic bonds with two amino acids (LYS290 and LYS359). BVM also came into contact with LEU363 and MET367 via hydrophobic interactions, as shown in Figure 5a,b. In contrast, DSC was slightly inclined to the center of the 6HB as depicted in Figure 4b,d. In the binding mode (Figure 5c,d), DSC was bound to CA-SP1 via two ionic bonds and hydrophobic interactions. Moreover, the acetyl group at 16-position of DSC formed a hydrogen bond with LYS359. The above results suggested that DSC could interact in a different way with the CA CTD -SP1 junction helix compared to BVM.

Molecular Dynamics
MD studies were carried out to integrate the results of the docking simulation with the analysis of molecular movements of CA-SP1 upon ligands binding. The docking poses of DSC and BVM with CA-SP1 underwent 50 ns. The kinetic energy and potential energy tended to be stable during the MDs process, as depicted in Figure S1. The total energy change in the simulation process is represented by a frequency distribution diagram. It can be observed that the energy distributions of both DSC and BVM are normal ( Figure  S2). In addition, the Awk script is used to search for the frame number corresponding to the lowest potential energy in the MDs process. The lowest potential energy structures are shown in Figure S3. Figure 6 shows that the RMSD of the reference BVM fluctuated between 0.75 and 1.75 Å, accompanied by the RMSD of DSC fluctuated between 1.00 and 2.25 Å, which was slightly larger than that of BVM. Overall, the RMSD value fluctuated less during the MD simulation, indicating that the docking poses were reliable. Visualization of the MD simulation for DSC was performed using the VMD analysis tool, seen in Figure 7.
In order to further analyze the free energy of DSC and CA-SP1, molecular mechanics/generalized born surface area (MM/GBSA) and molecular mechanics/Poisson-Boltzmann surface area (MM/PBSA) methods were used for the calculation of the total binding energy. The energy data of BVM and DSC were carried out as listed in Table 3

Molecular Dynamics
MD studies were carried out to integrate the results of the docking simulation with the analysis of molecular movements of CA-SP1 upon ligands binding. The docking poses of DSC and BVM with CA-SP1 underwent 50 ns. The kinetic energy and potential energy tended to be stable during the MDs process, as depicted in Figure S1. The total energy change in the simulation process is represented by a frequency distribution diagram. It can be observed that the energy distributions of both DSC and BVM are normal ( Figure S2). In addition, the Awk script is used to search for the frame number corresponding to the lowest potential energy in the MDs process. The lowest potential energy structures are shown in Figure S3. Figure 6 shows that the RMSD of the reference BVM fluctuated between 0.75 and 1.75 Å, accompanied by the RMSD of DSC fluctuated between 1.00 and 2.25 Å, which was slightly larger than that of BVM. Overall, the RMSD value fluctuated less during the MD simulation, indicating that the docking poses were reliable. Visualization of the MD simulation for DSC was performed using the VMD analysis tool, seen in Figure 7.    In order to further analyze the free energy of DSC and CA-SP1, molecular mechanics/generalized born surface area (MM/GBSA) and molecular mechanics/Poisson-Boltzmann surface area (MM/PBSA) methods were used for the calculation of the total binding energy. The energy data of BVM and DSC were carried out as listed in Table 3. The absolute values of total binding free energy of the complex of DSC-CA-SP1 obtained by the two calculation methods (-48.27 and -48.18 kcal/mol) were accordingly higher than that of BVM (-38.13 and -42.79 kcal/mol), suggesting that DSC had stronger binding ability with CA-SP1 peptide. For both compounds, all of the VDW, EE, polar and apolar energy made necessary contributions for the binding.   With the purpose of exploring the key amino acid residues around ligands in CA-SP1 protein binding, the energetic decomposition of amino acid residues was performed. Figure 8 shows amino acid residues with energy contribution values lower than −2 kcal/mol, indicating that LYS359, LYS290 and LEU363 played a major role in binding free energy. Furthermore, the alanine binding site-scan (ABS-scan) was conducted to quantitatively characterize the binding free energy of specific residues in protein-ligand binding. The results showed that the strongest interaction was brought about by the residue LYS359 with a total fraction of −7.96 kcal/mol followed by LYS290 with a calculated value of −4.45 kcal/mol. The effect of LEU363 was mainly composed of van der Waals forces with a calculated value of −2.08 kcal/mol. All of these results are consistent with the molecular docking experiments. Because all the binding free energy changes caused by the three amino acid mutations are less than −2 kcal/mol, residues LYS359, LYS290 and LEU363 can be defined as hot-spot amino acids. The contributions of each energy component are shown in Figure 9. With the purpose of exploring the key amino acid residues around ligands in CA-SP1 protein binding, the energetic decomposition of amino acid residues was performed. Figure  8 shows amino acid residues with energy contribution values lower than −2 kcal/mol, indicating that LYS359, LYS290 and LEU363 played a major role in binding free energy. Furthermore, the alanine binding site-scan (ABS-scan) was conducted to quantitatively characterize the binding free energy of specific residues in protein-ligand binding. The results showed that the strongest interaction was brought about by the residue LYS359 with a total fraction of −7.96 kcal/mol followed by LYS290 with a calculated value of −4.45 kcal/mol. The effect of LEU363 was mainly composed of van der Waals forces with a calculated value of −2.08 kcal/mol. All of these results are consistent with the molecular docking experiments. Because all the binding free energy changes caused by the three amino acid mutations are less than −2 kcal/mol, residues LYS359, LYS290 and LEU363 can be defined as hot-spot amino acids. The contributions of each energy component are shown in Figure 9.

Chemistry
All chemical reagents and solvents were obtained from Sigma Aldrich or other commercial source and used without further purification. 1 H and 13 C NMR spectra were

Chemistry
All chemical reagents and solvents were obtained from Sigma Aldrich or other commercial source and used without further purification. 1 H and 13 C NMR spectra were measured in pyridine-d 5 on Bruker AV III 600 NMR spectrometer (Bruker, Billerica, German) with TMS as internal standard. HRESIMS spectra were performed on an LTQ Obitrap XL spectrometer (Thermo Fisher Scientific, Boston, MA, USA). Melting points were determined on an X-4B apparatus (Shanghai Precision Instruments Co., Ltd., Shanghai, China) and are uncorrected. Optical rotations were obtained on an Anton Paar MCP200 polarimeter (Anton Paar, Graz, Austria). Semi-preparative HPLC on a CXTH LC3050N equipped with two pumps of P3000 and a UV3000 detector (Beijing ChuangXinTongHeng Science and Technology Co., Ltd., Beijing, China), and a ZORBAX Eclipse XDB-C18 (9.4 × 250 mm, 5 µm, Agilent, Santa Clara, CA, USA) column. Precoated silica gel GF 254 plates (Zhi Fu Huang Wu Pilot Plant of Silica Gel Development, Yantai, China) and Merck precoated silica gel 60 F254 plates were used for TLC and spots visualized by heating plates sprayed with 10% H 2 SO 4 in EtOH. Flash chromatograph was carried out on a Biotage Isolera Four system with a Buchi FlashPure Select Silica gel cartridge (15 µm, spherical, 4 g).

General Procedure for Synthesis of Compounds 2a-b
Compound 2 (0.017 mmol) in anhydrous pyridine (2.0 mL) was acylated separately with benzoyl chloride (0.02 mL) and crotonic anhydride (0.02 mL) with DMAP followed by chromatography to yield 2a and 2b, respectively. NMR and MS spectra of 2a-b were provided in the Supplementary Materials (see Figures S4-S9).

General Procedure for Synthesis of Compounds 3a-c
Compound 3 (0.042 mmol) was treated separately with 2.5% KOH in EtOH (5 mL) and bromobenzene (0.01 mL) and potassium carbonate (12 mg) in DMF (5 mL) to give 3a (36.8% yield) and 3b (45.2% yield), respectively. Compound 3b (0.012 mmol) was treated with 2.5% KOH in EtOH (3 mL) to give 3c (53.5% yield). The solution was monitored by TLC until completed. The crude product was subjected to silica gel column chromatography followed by purification with semi-preparative HPLC using CH 3 CN/H 2 O as a mobile phase to give a colorless solid. NMR and MS spectra of 3a-c were given in the Supplementary Materials (see Figures S10-S18). In vitro activity was determined by a previously described HIV-1 infection assay [31]. Briefly, a 96-well microtiter plate was used to set up the screening assay using MT4 cells infected by HIV-1 NL4-3 Nanoluc-sec virus in the presence of various concentrations of derivatives. The viral replication thus can be monitored by measuring the luciferase activity using the Promega Nano-Glo Luciferase Assay System.

Cytotoxicity Assay
The cytotoxicity of the synthesized derivatives was determined using the CellTiter-Glo ® Luminescent Cytotoxicity Assay (Promega, Madison, WI, USA). MT-4 cells were cultured in the presence of various concentrations of the compounds for 3 days. The percentage of viable cells was measured by following the protocol provided by the manufacturer. The CC 50 was derived using the Quest Graph™ IC 50 Calculator (https://www. aatbio.com/tools/ic50-calculator, accessed on 7 October 2021).

3D-QSAR Study
A total of 14 compounds of cycloartane triterpenoid type mother nuclear structure with anti-HIV activity ranging from 0.025 to 7.62 µM were selected as the dataset in our experiments. Small molecules were preprocessed using the LigPrep module of the Schrödinger Suite (Schrödinger 2018, LLC, New York, NY, USA). The whole dataset was randomly divided into the 80% training set and 20% test set. The 3D-QSAR model was constructed in the 3D field-based module of the Schrödinger software. The IC 50 values were converted into the PIC 50 values and used as the dependent variables for QSAR analyses. A Gaussian field style was employed and partial least-square (PLS) was used for the regression analysis. The van der Waals potential and the electrostatic potential were considered as separate terms.

Molecular Docking
Molecular docking was performed using the Molecular Operating Environment software (MOE 2019) (Chemical Computing Group, Montreal, Canada). The crystal structure of the CA CTD -SP1 Gag fragment (PDB ID: 5I4T) was obtained from the RCSB Protein Data Bank. The structure file was subjected to the QuickPrep procedure of MOE. The BVM structure was taken from the PubChem database. The 3D structure of DSC was generated using Chem3D software 2018. Multiple conformations of small molecules in the format of mol2 were generated as ligand libraries using Frog2.14 program (https://bioserv.rpbs.univ-paris-diderot.fr/services/Frog2/, accessed on 7 October 2021). The partial charges of all protein and ligand atoms were calculated using the implemented Amber10: EHT force field. For both ligands, all poses were located inside the cavity formed by the 6HB with different positions or orientations. Docking simulation was carried out choosing the triangle matcher for placement of the ligand in the binding site and ranked with the London dG scoring function. The best 30 poses were passed to refinement and energy minimization using the rigid receptor method and then rescored with the GBVI/WSA dG scoring function.

Molecular Dynamics Simulations
MDs were performed using the AMBER package version 22 within the Ubuntu 20.04 Linux operating system which was installed on high-performance computing cluster. All calculations were implemented using AMBER s PMEMD engine running on CUDAenabled GPUs. The CA CTD -SP1 protein adopts the ff14SB force field with the TIP3P water model and small molecules select the GAFF force field [32]. The best scored binding poses calculated by molecular docking were used as input for MD. The antechamber module (implemented in AmberTools22) was utilized for the preparation of the ligands. The protein was pretreated with the leap module. The water box and the complex system were separately optimized. The energy minimizations were performed on the solvated systems for 1000 steps using a combination of steepest descent and conjugate gradient methods. After the energy of the system converges, the optimization process will automatically end. The system was then gradually heated to 300 K and equilibrated under the NVT ensemble. MD productions were performed for 50 ns timescale in two steps for each complex, and the trajectories were gathered every 1 ps. The first step was 40 ns to ensure that the conformation had stabilized, and the second step was 10 ns for energy calculation and interaction analysis. The energy and temperature data in the simulation process were generated by Perl script, and the change trend of data was visualized by Xmgrace program. The minimum value of potential energy data was found by Awk script, then the structure was extracted by CPPTRAJ to obtain the lowest potential energy structure [33]. Visualization platform Visual Molecular Dynamics (VMD, version 1.9.4a50) was used for analysis of the MDs results [34].
The binding free energies (ignoring entropy changes) of CA CTD -SP1 to DSC and BVM were calculated using Amber22 by the MM/PBSA and MM/GBSA methods, which are common for the calculation of affinity energy [35]. The binding free energy was decomposed into molecular mechanics terms and solvation energies, respectively. The van der Waals forces, electrostatic interactions, polar solvation energies and apolar solvation energies were computed [36,37].
Understanding the process of ligand recognition by proteins is vital for drug design. The protein residues present at the binding site form pockets that provide a conducive environment for ligand recognition. A particular residue or cluster of residues near the pocket are called hotspot residues that will result in a large change in the binding free energy when induced to mutate to alanine [38][39][40]. Change in the binding energy caused by alanine mutation of the active site can be used to identify what residues may be important for interaction. The alanine mutation scanning experiments were performed on the amino acids with the highest energy contribution in the residue decomposition results. The amino acids with an energy difference of less than -2 kcal/mol were generally considered to be hotspot amino acids [41].

Conclusions
In summary, five new beesioside I derivatives were designed, synthesized and evaluated for the anti-HIV efficiency in vitro, among which compound 3b showed significant antiviral activity against HIV-1 NL4-3 with an EC 50 value of 0.28 µM. The molecular modeling including 3D-QSAR, molecular docking and MDs techniques was employed to investigate the potential of beesioside I derivatives as MIs. Based on the 3D-QSAR model using the dataset consisting of 14 cycloartane triterpenoids and analogues, the important structural factors affecting anti-HIV activity of beesioside I derivatives were revealed. Molecular docking studies are complementary to the 3D-QSAR results, showing that DSC was bound to CA-SP1 protein through electrostatic, hydrogen bond and hydrophobic interactions with stronger binding affinity than that of the standard BVM. The results of MDs were consistent with the docking simulation, revealing that the complex of the ligand and the protein was stable within 50 ns and that residue LYS359, LYS290 and LEU363 played a major role in binding of CA-SP1 to DSC. All the results showed that DSC acted in a different way from BVM as the potential MI. Overall, the beesioside I derivative could be considered as a promising antiviral agent, worthy of further investigation.