Screening and Analysis of Potential Inhibitors of SHMT2

: Serine hydroxymethyltransferase 2 (SHMT2) has garnered signiﬁcant attention as a critical catalytic regulator of the serine/glycine pathway in the one-carbon metabolism of cancer cells. Despite its potential as an anti-cancer target, only a limited number of inhibitors have been identiﬁed so far. In this study, we employed seven different scoring functions and skeleton clustering to screen the ChemDiv database for 38 compounds, 20 of which originate from the same skeleton structure. The most signiﬁcant residues from SHMT2 and chemical groups from the inhibitors were identiﬁed using ASGBIE (Alanine Scanning with Generalized Born model and Interaction Entropy), and the binding energy of each residue was quantitatively determined, revealing the essential features of the protein–inhibitor interaction. The two most important contributing residues are TYR105 and TYR106 of the B chain followed by LEU166 and ARG425 of the A chain. The ﬁndings will be greatly helpful in developing a thorough comprehension of the binding mechanisms involved in drug–SHMT2 interactions and offer valuable direction for designing more potent inhibitors.


Introduction
The metabolic process of cancer cells is very different compared to that of normal cells, and this abnormal cellular metabolic process is termed reprogrammed metabolism, in which one-carbon unit (1C) metabolism [1][2][3] is necessary to maintain the proliferation and growth of cancer cells.One-carbon unit (1C) is mainly derived from the conversion of serine and glycine in mitochondria, and is transported intracellularly as tetrahydrofolate (THF), which is a carrier of the substances that are essential for cell proliferation [1,[3][4][5][6][7][8][9][10][11][12][13].Hence, predicting tumor behavior and inhibiting the essential pathways involved in one-carbon metabolism presents an opportunity to restrain tumor progression.
Serine hydroxymethyltransferase is a critical enzyme involved in the synthesis of serine and glycine.In human cells, there are two subtypes of this enzyme present in the cytoplasm and mitochondria, named SHMT1 (cytoplasmic) and SHMT2 (mitochondrial) according to their location.Among them, SHMT2 is related to the reprogramming metabolic process of many cancers, such as liver cancer, breast cancer, B-cell lymphoma, lung cancer, tongue squamous cell carcinoma, and glioma, and other cancers and tumors have an obvious high expression [14][15][16].Given its role in cancer metabolism, SHMT2 has garnered as an attractive therapeutic target, receiving widespread attention in recent years.
Research [17] indicates that both antibodies and small molecules can exert inhibitory effects on proteins.Currently, there are two major classes of molecules known for their inhibitory effects on SHMT2.These include pyrano-and pyrazole-like molecules as well as structural analogues of folic acid.The former class has been shown to inhibit the activity of plant SHMT2 proteins and has even been published as an herbicide [18].Researchers have also tested compounds with this backbone on human-derived SHMT2 and observed the same inhibitory effect [19].The binding strength of pyrano-and pyrazole-like molecules to SHMTs has been determined with IC50 values, ranging from 10 nM to 5 µM [14].In contrast, folic acid structural analogue molecules such as pemetrexed and lomitrexed exhibit weak binding to SHMT2, with IC50 values ranging from 100 µM [20,21].
The drug development process is lengthy and costly, and in order to address these two major issues, computer-aided drug design (CADD) [22] has emerged.Currently, there are two types of computer-aided drug design methods: ligand-based virtual screening (LBVS) and structure-based virtual screening (SBVS) [23,24].After obtaining the selected potential inhibitor molecules, it is necessary to quantitatively analyze the protein-ligand interactions and accurately predict the binding affinity between the protein and the molecules.There are several computational methods available for predicting binding energy, among which MM/PB(GB)SA [25] is widely used.However, this method does not account for the influence of entropy on binding free energy.In order to predict binding energy more accurately between the protein and the ligand molecules, our research group has developed a novel method named ASGBIE [26][27][28].It involves using alanine scanning [29], MM/GBSA, and interaction entropy [30].
He et al. calculated the binding affinities of 28 ligands to SHMT2 [28] and also performed structure-based virtual screening (SBVS) on the Specs database, resulting in the identification of three potential inhibitors that are verified by experiment [31].The scaffold of the inhibitor molecules identified by Liping He et al. consists of the pyranopyrazole scaffold.In contrast, among the small molecules that have screened, only a fraction of compounds exhibits the pyranopyrazole scaffold, while the remaining small molecules possess distinct scaffolds.
In this study, a screening was performed on 1.6 million molecules from the ChemDiv database, resulting in the identification of 38 potential SHMT2 inhibitor molecules.To quantitatively predict the binding energy between the protein SHMT2 and its inhibitor molecules, the ASGBIE method was employed to analyze the protein-ligand interactions.A comparison was made with the binding mode of the protein-ligand molecule in the crystal structure 5V7I, revealing the involvement of the same key residues in the binding energy, consistent with the crystal structure.

Virtual Screening Results
The results of skeleton clustering are shown in Figure 1.During the skeleton clustering process, a classification criterion cutoff of 0.4 was chosen.As a result, 5000 molecules were divided into 435 clusters based on their skeletons.The largest cluster consisted of 41 compounds, while 264 clusters contained only a single compound each.Moreover, the distribution of compounds within the clusters provided valuable insights into the structural diversity present in the dataset.
Twenty molecules were selected based on the backbone clustering for their potential inhibitory activity.All 20 of these compounds share a common backbone structure of 6-amino-1,4-dihydropyrano[2,3-c] pyrazole-5-carbonitrile, as illustrated in Figure 2, which is the molecular scaffold structure of the existing inhibitor.In addition, the other 18 compounds with the highest GlideScore in the clusters containing molecules with different structures were selected.Furthermore, 38 molecules were selected that might be associated with molecules with SHMT2 inhibitory activity.Table S1 displays the structures for each of these 38 molecules.2, wh is the molecular scaffold structure of the existing inhibitor.In addition, the other 18 co pounds with the highest GlideScore in the clusters containing molecules with differ structures were selected.Furthermore, 38 molecules were selected that might be asso ated with molecules with SHMT2 inhibitory activity.Table S1 displays the structures each of these 38 molecules.The ranking of all molecules in the seven scoring functions is presented in Table Each molecule number starts with the letter C, where the first number denotes the ord of the 19 selected clusters and the second number represents the Glide scoring ranking the molecule in the cluster.

Result from Molecular Dynamics Simulation
The compounds C12_14 and C12_17 possess the same molecular backbone but hibit a significant difference in their comprehensive ranking.To assess the influence simulation time on the simulation results, molecular dynamics simulations were p formed, using a 100 ns MD simulation, on the complexes formed by these two compoun with SHMT2. Figure 3a,b show the RMSD values of the these two compounds over a 1 ns simulation.Specifically, Figure 3a represents the simulation trajectory of the C12_1 SHMT2 complex system, while Figure 3b represents the C12_17-SHMT2 complex syste As shown in the figure, the RMSD of both complex systems remains below 2 Å through the entire simulation process, and the fluctuation of these two systems are quite smal scale (less than 1 angstrom).We also plotted RMSD values of both ligand-bound and a proteins in Figure 3c.Also, the RMSD of the ligand-bound protein is slightly larger th that of the apo-protein, but the difference is quite small and is largely believed to be  Twenty molecules were selected based on the backbone clustering for their potential inhibitory activity.All 20 of these compounds share a common backbone structure of 6amino-1,4-dihydropyrano[2,3-c] pyrazole-5-carbonitrile, as illustrated in Figure 2, which is the molecular scaffold structure of the existing inhibitor.In addition, the other 18 compounds with the highest GlideScore in the clusters containing molecules with different structures were selected.Furthermore, 38 molecules were selected that might be associated with molecules with SHMT2 inhibitory activity.Table S1 displays the structures for each of these 38 molecules.The ranking of all molecules in the seven scoring functions is presented in Table S2.Each molecule number starts with the letter C, where the first number denotes the order of the 19 selected clusters and the second number represents the Glide scoring ranking of the molecule in the cluster.

Result from Molecular Dynamics Simulation
The compounds C12_14 and C12_17 possess the same molecular backbone but exhibit a significant difference in their comprehensive ranking.To assess the influence of simulation time on the simulation results, molecular dynamics simulations were performed, using a 100 ns MD simulation, on the complexes formed by these two compounds with SHMT2. Figure 3a,b show the RMSD values of the these two compounds over a 100 ns simulation.Specifically, Figure 3a represents the simulation trajectory of the C12_14-SHMT2 complex system, while Figure 3b represents the C12_17-SHMT2 complex system.As shown in the figure, the RMSD of both complex systems remains below 2 Å throughout the entire simulation process, and the fluctuation of these two systems are quite small in scale (less than 1 angstrom).We also plotted RMSD values of both ligand-bound and apo proteins in Figure 3c.Also, the RMSD of the ligand-bound protein is slightly larger than that of the apo-protein, but the difference is quite small and is largely believed to be the The ranking of all molecules in the seven scoring functions is presented in Table S2.Each molecule number starts with the letter C, where the first number denotes the order of the 19 selected clusters and the second number represents the Glide scoring ranking of the molecule in the cluster.

Result from Molecular Dynamics Simulation
The compounds C12_14 and C12_17 possess the same molecular backbone but exhibit a significant difference in their comprehensive ranking.To assess the influence of simulation time on the simulation results, molecular dynamics simulations were performed, using a 100 ns MD simulation, on the complexes formed by these two compounds with SHMT2. Figure 3a,b show the RMSD values of the these two compounds over a 100 ns simulation.Specifically, Figure 3a represents the simulation trajectory of the C12_14-SHMT2 complex system, while Figure 3b represents the C12_17-SHMT2 complex system.As shown in the figure, the RMSD of both complex systems remains below 2 Å throughout the entire simulation process, and the fluctuation of these two systems are quite small in scale (less than 1 angstrom).We also plotted RMSD values of both ligand-bound and apo proteins in Figure 3c.Also, the RMSD of the ligand-bound protein is slightly larger than that of the apo-protein, but the difference is quite small and is largely believed to be the result of random fluctuations of individual trajectories.Thus, the conformational change in the protein upon ligand binding should be quite small.result of random fluctuations of individual trajectories.Thus, the conformational change in the protein upon ligand binding should be quite small. .RMSD of two SHMT2-ligand complex systems and that of the apo-SHMT2 system: (a) the C12_14-SHMT2 complex system, (b) the C12_17-SHMT2 complex system, and (c) the apo-SHMT2 system in comparison with the C12_14-SHMT2 system.

Binding Free Energies
Data for the binding free energy calculations for the 38 ligands and 8Z1 in the 5V7I complex are provided in Figure 4 and Table S3.From Figure 4 it is evident that among the 38 compounds studied, C14_1 and C12_14 are two compounds with different skeletons, and they exhibit superior binding ability towards the receptor protein, with binding free energies of −32.27 kcal/mol and −25.22 kcal/mol, respectively.It is worth noting that the binding free energies of the top two compounds with SHMT2 exceed −25 kcal/mol.The binding free energy of 8Z1 is −24.61kcal/mol.These findings indicate that C14_1 and C12_14 compounds exhibit exceptional binding affinity towards SHMT2 and hold significant promise as drug candidates for inhibiting the biological activity of the SHMT2 target protein.Our focus will be on investigating the binding mechanism of 8Z1, C14_1, and C12_14 with the receptor proteins.
Figure 3. RMSD of two SHMT2-ligand complex systems and that of the apo-SHMT2 system: (a) the C12_14-SHMT2 complex system, (b) the C12_17-SHMT2 complex system, and (c) the apo-SHMT2 system in comparison with the C12_14-SHMT2 system.

Binding Free Energies
Data for the binding free energy calculations for the 38 ligands and 8Z1 in the 5V7I complex are provided in Figure 4 and Table S3.From Figure 4 it is evident that among the 38 compounds studied, C14_1 and C12_14 are two compounds with different skeletons, and they exhibit superior binding ability towards the receptor protein, with binding free energies of −32.27 kcal/mol and −25.22 kcal/mol, respectively.It is worth noting that the binding free energies of the top two compounds with SHMT2 exceed −25 kcal/mol.The binding free energy of 8Z1 is −24.61kcal/mol.These findings indicate that C14_1 and C12_14 compounds exhibit exceptional binding affinity towards SHMT2 and hold significant promise as drug candidates for inhibiting the biological activity of the SHMT2 target protein.Our focus will be on investigating the binding mechanism of 8Z1, C14_1, and C12_14 with the receptor proteins.

Identification of Hot Spots
The ASGBIE method was used to calculate the detailed results of each part for the complex system C14_1-SHMT2, as shown in Table 1.Only the energy of each part of the amino acid residue with ∆∆G < −1.0 kcal/mol is displayed in the table.The binding free energy of the composite system is composed of enthalpy and entropy, with enthalpy being composed of four components: van der Waals energy, electrostatic energy, polar solvation energy, and non-polar solvation energy.The data in the table indicate that the van der Waals energies are significantly higher than those of the other components, and thus dominate the binding mechanism in the complex system.

Identification of Hot Spots
The ASGBIE method was used to calculate the detailed results of each part for the complex system C14_1-SHMT2, as shown in Table 1.Only the energy of each part of the amino acid residue with ∆∆G < −1.0 kcal/mol is displayed in the table.The binding free energy of the composite system is composed of enthalpy and entropy, with enthalpy being composed of four components: van der Waals energy, electrostatic energy, polar solvation energy, and non-polar solvation energy.The data in the table indicate that the van der Waals energies are significantly higher than those of the other components, and thus dominate the binding mechanism in the complex system.In addition, to verify the reliability of the calculation results, a comparison was made between the residue contributions of C14_1-SHMT2 and 5V7I, as shown in Figure 5.It illustrates that the residue TYR105, located on the B chain near the binding pocket, is the residue that contributes most to the binding of the ligand and receptor proteins.The residues LEU166 on the A chain and TYR106 on the B chain also significantly contribute to the binding energy.

ASN408_A
− In addition, to verify the reliability of the calculation results, a comparison was made between the residue contributions of C14_1-SHMT2 and 5V7I, as shown in Figure 5.It illustrates that the residue TYR105, located on the B chain near the binding pocket, is the residue that contributes most to the binding of the ligand and receptor proteins.The residues LEU166 on the A chain and TYR106 on the B chain also significantly contribute to the binding energy.Not all amino acid residues in proteins significantly contribute to the binding of small drug molecules to target proteins.Only a few key residues located near the binding pocket play a crucial role in binding molecules to receptor proteins.These amino acids are commonly referred to as hot spots.To uncover the binding mechanism of receptor proteins and molecules and provide ideas and schemes for designing drugs that target SHMT2, it is essential to analyze the binding mechanism of hot spots to molecules and study their contribution to the binding free energy.In this study, hot spots were defined as those that contributed less than −2 kcal/mol to the binding free energy.
Compounds C14_1 and C12_14 have different scaffolds, with C12_14 featuring a furan-pyrazole scaffold structure.These two compounds exhibit the highest binding free energies when interacting with the protein SHMT2; thus, we selected these two complex systems and 5V7I for binding mechanism analysis.Figure 6 ranks the hot spot residues in each complex system according to their energy contribution.Not all amino acid residues in proteins significantly contribute to the binding of small drug molecules to target proteins.Only a few key residues located near the binding pocket play a crucial role in binding molecules to receptor proteins.These amino acids are commonly referred to as hot spots.To uncover the binding mechanism of receptor proteins and molecules and provide ideas and schemes for designing drugs that target SHMT2, it is essential to analyze the binding mechanism of hot spots to molecules and study their contribution to the binding free energy.In this study, hot spots were defined as those that contributed less than −2 kcal/mol to the binding free energy.
Compounds C14_1 and C12_14 have different scaffolds, with C12_14 featuring a furan-pyrazole scaffold structure.These two compounds exhibit the highest binding free energies when interacting with the protein SHMT2; thus, we selected these two complex systems and 5V7I for binding mechanism analysis.Figure 6 ranks the hot spot residues in each complex system according to their energy contribution.
Figure 6a shows that among the complex systems of compound C14_1 and SHMT2, residue TYR105_B contributes the most to the binding free energy, with an energy of −4.18 kcal/mol, followed by LEU166_A residue with −3.92 kcal/mol.Furthermore, residue LYS409 contributes −1.80 kcal/mol of binding energy, which is very close to the energy contribution of hot spot residues.
The total binding energy of the complex system C12_14-SHMT2 is second only to that of C14_1-SHMT2.From Figure 6b, it can be found that six residues in this system are predicted to be hot spots, with the highest incidence.The A-chain residue LEU166 contributes the most energy of −4.39 kcal/mol, followed by the B-chain residue TYR105 and the A-chain residue ARG425, which generate energy equal to −3.67 kcal/mol and −3.27 kcal/mol, respectively.These three hot spot residues with the highest binding energy contributions were the same as those in the C14_1-SHMT2 system.
As can be observed from Figure 6c, the hot spot residues in 5V7I are identical to those in the two aforementioned systems, and the binding energy contribution of residue TYR105 in chain B is the highest.Figure 6a shows that among the complex systems of compound C14_1 and SHMT2, residue TYR105_B contributes the most to the binding free energy, with an energy of −4.18 kcal/mol, followed by LEU166_A residue with −3.92 kcal/mol.Furthermore, residue LYS409 contributes −1.80 kcal/mol of binding energy, which is very close to the energy contribution of hot spot residues.
The total binding energy of the complex system C12_14-SHMT2 is second only to that of C14_1-SHMT2.From Figure 6b, it can be found that six residues in this system are predicted to be hot spots, with the highest incidence.The A-chain residue LEU166 contributes the most energy of −4.39 kcal/mol, followed by the B-chain residue TYR105 and the A-chain residue ARG425, which generate energy equal to −3.67 kcal/mol and −3.27 kcal/mol, respectively.These three hot spot residues with the highest binding energy contributions were the same as those in the C14_1-SHMT2 system.Furthermore, to further analyze the binding mechanism of the ligand-protein complex systems and the role of hot spot residues, we calculated the average energy contribution of all hot spot residues in the 38 systems, as shown in Figure 7.
Figure 7 shows that the B-chain residue TYR105 and the A-chain residue LEU166 were consistently predicted to be hot spot residues in all 38 complex systems, with an average energy contribution of −3.56 kcal/mol and −3.10 kcal/mol, respectively.This suggests that these two residues have the most significant impact on binding affinity and are the most critical amino acids in the binding process.Additionally, the A-chain residue ARG425 and the B-chain residue TYR106 also play important roles in binding energy, with an average energy contribution of −3.09 kcal/mol and −2.72 kcal/mol.Moreover, there was a slightly less predictable, but still significant hot spot on the A-chain residue TYR176, with an average energy contribution of −2.46 kcal/mol.Additionally, residue 98GLU_B was identified as a hot spot only once and contributed a binding energy of −2.06 kcal/mol.
Based on the above study, of the complex systems formed by the binding of 38 compounds to SHMT2 proteins, four hot spot residues made the most significant contribution to the binding mechanism.These residues were LEU166 and ARG425 from the A chain and TYR105 and TYR106 from the B chain.These residues were found to have the highest frequency of occurrence and the greatest influence on the binding mechanism.
Biophysica 2023, 3, FOR PEER REVIEW As can be observed from Figure 6c, the hot spot residues in 5V7I are identical to in the two aforementioned systems, and the binding energy contribution of re TYR105 in chain B is the highest.
Furthermore, to further analyze the binding mechanism of the ligand-protein plex systems and the role of hot spot residues, we calculated the average energy con tion of all hot spot residues in the 38 systems, as shown in Figure 7. Figure 7 shows that the B-chain residue TYR105 and the A-chain residue LE were consistently predicted to be hot spot residues in all 38 complex systems, wi average energy contribution of −3.56 kcal/mol and −3.10 kcal/mol, respectively.This gests that these two residues have the most significant impact on binding affinity an the most critical amino acids in the binding process.Additionally, the A-chain re ARG425 and the B-chain residue TYR106 also play important roles in binding energy an average energy contribution of −3.09 kcal/mol and −2.72 kcal/mol.Moreover, ther a slightly less predictable, but still significant hot spot on the A-chain residue TY with an average energy contribution of −2.46 kcal/mol.Additionally, residue 98G was identified as a hot spot only once and contributed a binding energy of −2.06 kca Based on the above study, of the complex systems formed by the binding of 38 pounds to SHMT2 proteins, four hot spot residues made the most significant contrib to the binding mechanism.These residues were LEU166 and ARG425 from the A and TYR105 and TYR106 from the B chain.These residues were found to have the hi frequency of occurrence and the greatest influence on the binding mechanism.

Analysis of Binding Mechanism
Exploring the way in which hot spot residues bind to molecules and their con tion to the binding of both is essential to understand the binding mechanism bet molecules and receptor proteins.This knowledge can even provide useful guideline suggestions for the design of targeted SHMT2 inhibitor molecules.To achieve this we decomposed the binding free energy on the hot spot residues of each system in contributions of van der Waals energy, nonpolar solvation energy, electrostatic en and polar solvation energy.Figure 8 displays the different types of energy contribu for all hot spot residues predicted in the three systems.

Analysis of Binding Mechanism
Exploring the way in which hot spot residues bind to molecules and their contribution to the binding of both is essential to understand the binding mechanism between molecules and receptor proteins.This knowledge can even provide useful guidelines and suggestions for the design of targeted SHMT2 inhibitor molecules.To achieve this goal, we decomposed the binding free energy on the hot spot residues of each system into the contributions of van der Waals energy, nonpolar solvation energy, electrostatic energy, and polar solvation energy.Figure 8 displays the different types of energy contributions for all hot spot residues predicted in the three systems.
A comprehensive analysis of the energy terms for each residue in the binding energy contribution clearly shows that van der Waals interactions of hot spot residues dominate the entire binding process.For example, in the C14_1-SHMT2 complex system, the van der Waals energies of the four hot spot residues were −5.05 kcal/mol, −4.50 kcal/mol, −3.18 kcal/mol, and −2.40 kcal/mol, while the electrostatic energies were only 0.05 kcal/mol, −0.07 kcal/mol, −0.08 kcal/mol, and 0.13 kcal/mol, which were much less than the energy contribution of the van der Waals part.The figure also indicates that the main contribution to the binding energy in other systems is also the van der Waals energy part, while nonpolar solvation energy, electrostatic energy, and polar solvation energy have little effect on the overall binding effect.
Through an analysis of the binding free energy of the three complex systems and the specific energy contribution of individual residues near the binding pocket, we were able to identify the four hot spot residues, TYR105_B, LEU166_A, ARG425_A, and TYR106_B, which play a major role in the binding mechanism.To deeply investigate the interaction pattern between the compound molecules and the hot spot residues, we used the spatial conformation of the C14_1-SHMT2 system as a representative, and compared it with the interaction pattern between the protein and ligand in 5V7I, which is named 8Z1.The interaction pattern of the compound molecules with the hot spot residues was investigated in depth, as shown in Figure 9.
In the C14_1-SHMT2 system, the main contribution of the four hot spot residues is from the van der Waals energy part, with the B-chain residue TYR105 having the largest contribution.The figure shows that TYR105_B has a strong π-π interaction with the benzene ring of the ligand molecule, while the A-chain residue LEU166 has a significant π-alkyl interaction with the benzene ring of the ligand molecule, but the distance to the ligand molecule is greater compared to residue TYR105_B, and the π-alkyl interaction is also weaker than the π-π interaction.These findings are consistent with the calculated energy results.Additionally, van der Waals interactions exist between the hot spot residues ARG425_A and TYR106_B and the ligand molecule, with the A-chain residue ARG425 being closer to the ligand and having a stronger effect.This is why the contribution of ARG425_A in the calculated results is larger than that of TYR106_B, but less than the energy contribution of TYR105_B and LEU166_A.A comprehensive analysis of the energy terms for each residue in the binding energy contribution clearly shows that van der Waals interactions of hot spot residues dominate the entire binding process.For example, in the C14_1-SHMT2 complex system, the van der Waals energies of the four hot spot residues were −5.05 kcal/mol, −4.50 kcal/mol, −3.18 kcal/mol, and −2.40 kcal/mol, while the electrostatic energies were only 0.05 kcal/mol, −0.07 kcal/mol, −0.08 kcal/mol, and 0.13 kcal/mol, which were much less than the energy contribution of the van der Waals part.The figure also indicates that the main contribution to the binding energy in other systems is also the van der Waals energy part, while nonpolar solvation energy, electrostatic energy, and polar solvation energy have little effect on the overall binding effect.
Through an analysis of the binding free energy of the three complex systems and the specific energy contribution of individual residues near the binding pocket, we were able to identify the four hot spot residues, TYR105_B, LEU166_A, ARG425_A, and TYR106_B, which play a major role in the binding mechanism.To deeply investigate the interaction pattern between the compound molecules and the hot spot residues, we used the spatial conformation of the C14_1-SHMT2 system as a representative, and compared it with the interaction pattern between the protein and ligand in 5V7I, which is named 8Z1.The interaction pattern of the compound molecules with the hot spot residues was investigated in depth, as shown in Figure 9.In the C14_1-SHMT2 system, the main contribution of the four hot spot residues is from the van der Waals energy part, with the B-chain residue TYR105 having the largest contribution.The figure shows that TYR105_B has a strong π-π interaction with the benzene ring of the ligand molecule, while the A-chain residue LEU166 has a significant πalkyl interaction with the benzene ring of the ligand molecule, but the distance to the ligand molecule is greater compared to residue TYR105_B, and the π-alkyl interaction is also To verify the accuracy and reliability of the binding mechanism analysis, the interaction between the ligand molecule and the protein in 5V7I was analyzed.In the crystal structure of SHMT2, there is a strong π-π interaction between the B-chain residue TYR105 and the ligand molecule, and the other three residues near the binding pocket form a strong π-alkyl interaction with the ligand molecule, which shows that the van der Waals energy is the part that makes the main contribution to the binding interaction, which is consistent with the conclusion based on the calculation results.
Based on the predictions of hot spot residues and the analysis of spatial conformation, we found that four hot spot residues, namely, TYR105_B, LEU166_A, ARG425_A, and TYR106_B, are crucial for the formation of a stable complex between the compound and SHMT2 protein.During the ligand-acceptor binding process, the dominant contribution to the binding comes from the van der Waals energy fraction, primarily due to the formation of a large number of π-π interactions and π-alkyl interactions between the benzene ring of the molecule and the surrounding residues.

Method
The flow chart is shown in Figure 10.In this study, we aimed to screen a large number of molecules in the ChemDiv database, which initially contained 1.6 million molecules.To reduce the number of molecules and increase the likelihood of finding drug-like compounds, we conducted a preliminary screening based on the "Five Principles of Drug-like Drugs".In order to refine the selection of potential inhibitor molecules, we utilized seven different scoring functions, including GlideScore [32][33][34][35][36][37], ad4_scoring, dkoes_scoring, vina, vinardo (these four functions are in the Smina molecular docking program [38], Delta-Vina [39,40], and AA-Score [41], to perform consensus scoring on the top 10,000 compounds obtained using GlideScore.From these results, we selected 1254 molecules that were ranked in top 5000 across all seven scoring methods.
Biophysica 2023, 3, FOR PEER REVIEW 11 Delta-Vina [39,40], and AA-Score [41], to perform consensus scoring on the top 10,000 compounds obtained using GlideScore.From these results, we selected 1254 molecules that were ranked in top 5000 across all seven scoring methods.The selection of potential drugs is a critical step in drug discovery.Given that the properties of molecules are closely related to their structures, we utilized the Butina algorithm to cluster 1254 molecules based on their structural similarities.By doing so, we were able to efficiently and accurately screen and ultimately identify 38 potential drug candidates.To accurately determine the binding strength of 38 potential inhibitor molecules to SHMT2, we conducted molecular dynamics simulations [42,43] of 38 protein-ligand complex systems.Using the ASGBIE method, we calculated the binding free energy of each system and analyzed the binding mechanism of each molecule.The selection of potential drugs is a critical step in drug discovery.Given that the properties of molecules are closely related to their structures, we utilized the Butina algorithm to cluster 1254 molecules based on their structural similarities.By doing so, we were able to efficiently and accurately screen and ultimately identify 38 potential drug candidates.To accurately determine the binding strength of 38 potential inhibitor molecules to SHMT2, we conducted molecular dynamics simulations [42,43] of 38 protein-ligand complex systems.Using the ASGBIE method, we calculated the binding free energy of each system and analyzed the binding mechanism of each molecule.

Molecular Docking
We obtained 1.6 million compounds from ChemDiv (http://www.chemdiv.com(accessed on 24 May 2022)), which were then screened using the Lipinski's rule of five, which is a set of guidelines used in drug discovery and designed to evaluate the drug-likeness of chemical compounds.For the primary screening, we established the following criteria 1. molecular weight greater than 250 and less than 500; 2. hydrogen bond donor count less than 5; 3. hydrogen bond acceptor count less than 10; 4. lipophilicity coefficient less than 5; and 5. number of rotatable bonds not exceeding 10.As a result of this primary screening, we obtained 1,071,548 molecules that met the necessary criteria.Next, we optimized the compounds according to various pH environments using the LigPrep function of the Glide module.For the docking process, we utilized the crystal structure of the SHMT2 obtained from the PDB database (PDB ID: 5V7I [14]).The SHMT2 comprises two chains, A and B, as illustrated in Figure 11.To complete the missing hydrogen atoms of the protein, we employed Schrödinger.The force field of OPLS-2005 was set to a pH parameter of 7.4 to optimize the structure and minimize the energy of the protein.A protein docking box was constructed with a central coordinate based on the location of the small molecule near the 5V7I binding pocket, and the box radius was set to 30 Å. Next, molecular docking was performed using the ligand docking operation of the Glide module, and the docking of compounds from the ChemDiv database to SHMT2 was performed with SP precision.

Consensus Scoring
The Glide module was used to dock proteins with the ligands of the compound library, and the GlideScore [44] ranking was obtained.The top 10,000 scores were used to select the best conformations of the molecules, and the structures of the complexes after Glide docking were scored by consensus using other six different scoring functions, namely, ad4_scoring [45], dkoes_scoring [38], vina [46], vinardo [47], ΔvinaXGB [39], and AA-Score [41] (developed by our research group), and the molecules were ranked according to their scores.

Molecular Clustering
To begin with, the top 5000 molecules are selected from the consensus scoring ranking.These 1254 compounds are then skeleton-clustered using the Butina algorithm [48], which employs molecular fingerprints and Tanimoto coefficients to rapidly group the dataset into clusters based on structural similarities among the compound molecules.When To complete the missing hydrogen atoms of the protein, we employed Schrödinger.The force field of OPLS-2005 was set to a pH parameter of 7.4 to optimize the structure and minimize the energy of the protein.A protein docking box was constructed with a central coordinate based on the location of the small molecule near the 5V7I binding pocket, and the box radius was set to 30 Å. Next, molecular docking was performed using the ligand docking operation of the Glide module, and the docking of compounds from the ChemDiv database to SHMT2 was performed with SP precision.

Consensus Scoring
The Glide module was used to dock proteins with the ligands of the compound library, and the GlideScore [44] ranking was obtained.The top 10,000 scores were used to select the best conformations of the molecules, and the structures of the complexes after Glide docking were scored by consensus using other six different scoring functions, namely, ad4_scoring [45], dkoes_scoring [38], vina [46], vinardo [47], ∆vinaXGB [39], and AA-Score [41] (developed by our research group), and the molecules were ranked according to their scores.

Molecular Clustering
To begin with, the top 5000 molecules are selected from the consensus scoring ranking.These 1254 compounds are then skeleton-clustered using the Butina algorithm [48], which employs molecular fingerprints and Tanimoto coefficients to rapidly group the dataset into clusters based on structural similarities among the compound molecules.When the number of molecules in the compound library is less than 100,000, the Butina clustering method can be implemented with the aid of Rdkit.
The clustering process involves the following steps: 1. defining the function for calculating the molecular fingerprint, 2. reading the compound molecules from the library, 3. computing the molecular fingerprint and clustering based on the cutoff value, and 4. obtaining the resulting clusters.The cutoff represents the threshold for Tanimoto similarity, where compounds with a Tanimoto distance greater than the cutoff are classified into different clusters.Tanimoto similarity is expressed using the equation: In the formula, N A denotes the number of features inherent to molecule A, N B the number of features inherent to molecule B, and N C represents the tally of features shared between molecules A and B. The selected cutoff should result in a reduced number of clusters containing individual elements after clustering, with a cluster composition characterized by a modest distribution of molecular quantities and a relatively smooth trend in variation.In this study, a cutoff value of 0.4 was set during molecular clustering.As a result, the clustering analysis yielded clusters where molecules within each cluster shared the same skeletal structure.However, there were significant differences in the skeletal structures between molecules that were assigned to different clusters.

Molecular Dynamics Simulation
The GAFF force field [49] was utilized to obtain force field parameters for the 22 compounds.Gaussian16 and Antechamber in Amber18 were used to calculate RESP charges [50][51][52][53][54] for 22 ligands.The protonation of the protein-ligand complex was conducted at pH 7.4 to maintain consistency with experimental conditions.The nonstandard protein residue LLP parameters were prepared in accordance with the Amber tutorial (http://ambermd.org/tutorials/basic/tutorial5/(accessed on 24 May 2022)).Finally, molecular dynamics (MD) simulations were performed using the ff14SB force field [55] and Amber18 [56,57].To solvate the protein-ligand complex, TIP3P water was used to fill a cubic box extending 10 Å away from the protein, and the cutoff for nonbonded interactions was set to 10 Å.To neutralize the systems, counterions were added.Each system underwent a minimization step of 4000 steps and was heated from 0 to 300 K in 1 ns with Langevin dynamics, followed by a 10 ns molecular dynamics (MD) simulation in the NPT ensemble using a time step of 2 fs.To constrain bonds involving hydrogen atoms, the SHAKE algorithm [58] was employed.To compare the impact of time duration on simulation results, we chose two complex systems and performed molecular dynamics simulations of 100 ns duration on each.
To determine whether the trajectory has achieved stability, it is necessary to calculate the RMSD.The calculation of the RMSD values for the complex is performed by aligning the protein's backbone atoms and all atoms of the small molecule, with the initial docked conformation as the reference.A total of 10,000 frames from the trajectory are sampled, with every tenth frame selected for comparison against the initial conformation.For calculating the ligand RMSD values, all atoms of the ligand are aligned with the initial structure, and the difference between the two structures is quantified.

Binding Free Energy Calculation
The computational alanine scanning approach assumes that the mutated alanine residue contributes insignificantly to the binding free energy.Therefore, the difference in free energy before and after alanine mutation provides a quantitative measure of the contribution of the specific residue to the total binding free energy.Thus, the difference in binding free energy resulting from the mutation of residue x to a (alanine) can be defined as follows: ∆∆G The binding free energy for both the gas phase and solvation components was evaluated using the following scheme: To determine the gas phase components of the binding free energy between and residues a and x, we used the notation ∆G a gas and ∆G x gas .We applied the alanine scanning approach to all pocket residues located within 6 Å of any ligand atoms to obtain residuespecific contributions to the total binding energy.
We evaluated the gas phase free energy between the residue and the ligand by summing the enthalpy and entropy.The enthalpy was calculated using the standard molecular mechanics method, while the entropy was calculated using the IE (interaction energy) method [30,[59][60][61][62].
and similarly, for the mutant, Here, E x int and E a int represent the interaction energies, including van der Waals and electrostatic energies, between the ligand and residue x and a, respectively.The exponential argument β is equal to 1/KT, and ∆E x/a int is the deviation from the average E x/a int .The exponential average was evaluated using discrete time averaging: where N is the number of MD snapshots.Finally, Equation (3) becomes the following: In Equation ( 4), the solvation energy was calculated using the OBC GBSA model (igb = 2 in Amber) [63][64][65], with a dielectric constant of 1, 3, and 5 for nonpolar, polar, and charged residues, respectively [65,66].By combining the alanine scanning and IE methods described above, the total protein-ligand binding free energy can be approximated by the summation [26,67]: where the summation is performed with all contributing residues (6 Å around the ligand).Although Equation ( 9) is not theoretically rigorous, it provides a good estimate of the total binding free energy, especially when relative binding free energies are important for a series of compounds targeting the same receptor.In this study, a total of 1000 snapshots from the last 5 ns of the MD trajectory were used to calculate the binding energy using the ASGBIE method [26][27][28]60].
For comparison, we also calculated the binding free energy using the conventional MM/GBSA method with the MMPBSA.pyscript [68], and the igb parameter was set to twp.We set the dielectric constant to one in the MM/GBSA calculation, as this is typical in standard MM/GBSA calculations.A total of 100 snapshots were used to calculate the MM/GBSA energy.Considering the computational cost, we used 10 snapshots to perform the normal mode calculation for each complex.

Conclusions and Discussion
To identify the potential inhibitor molecules of SHMT2, we employed seven scoring functions for consensus scoring in the ChemDiv database.Subsequently, we performed skeleton clustering on the compounds obtained by consensus scoring, resulting in a list of 38 drugs with potential inhibitory activity.Following this, we conducted molecular dynamics simulations on these 38 complex systems and calculated the binding free energy using the alanine scanning binding interaction entropy method.Based on the strength of their binding affinity, we ranked the systems and selected the top two systems for the further analysis of their hot spots.
After identifying the hot spots of each system, we studied the similarities and differences between them, as well as their mechanisms of action with molecules.Based on our research, we have arrived at the following conclusions.
We used the ASGBIE method to calculate the binding free energy of the 38 screened compounds.The results showed that the binding strength of C14_1 and C12_14 to SHMT2 ranked among the top two, and the complex system formed by these two compound molecules and SHMT2 remained stable during kinetic simulation.By analyzing and comparing the binding affinity of each system, we found that these two compounds exhibited the highest binding affinity to SHMT2.
Our analysis revealed that the amino acid residues TYR105_B, LEU166_A, ARG425_A, and TYR106_B, located near the SHMT2 binding pocket, were consistently identified as hot spots in both complex systems.These hot spots were found to contribute less than −2 kcal/mol to the binding free energy in each of the different systems.
We conducted an analysis of the binding mechanism of three complex systems and found a large number of van der Waals interactions between the compounds and hot spot residues, including π-π interactions and π-alkyl interactions.These findings suggest that the presence of benzene rings is favorable for the binding of molecules to proteins, with the van der Waals energy contribution having the most significant impact on the binding free energy.
When comparing the interaction of molecules and residues near the binding pocket in the crystal structure of SHMT2, we found that the hot spots that played the main role were TYR105_B, TYR106_B, LEU166_A, and ARG425_A.This indicates that these four hot spots played a crucial role in the binding process and were the key residues that enabled the compound to bind to SHMT2.
In conclusion, this study provides valuable insights into the binding mechanism of SHMT2 and its inhibitor molecules.By analyzing the hot spots and binding free energy contributions of key residues, we have identified the crucial role of TYR105_B, TYR106_B, LEU166_A, and ARG425_A in the binding process.The study also highlights the importance of van der Waals interactions in ligand-receptor protein binding and suggests that clustering and consensus scoring methods can be employed in screening SHMT2 inhibitor drugs.Overall, these findings can guide the modification and design of SHMT2 inhibitor molecules and potentially aid in the development of cancer treatments.
However, the present study does lack the experimental validation of the computational predictions and has not addressed the mutational effect as well as the selectivity issue of the compounds between SHMT1 and SHMT2.We hope to address these issues in our future study.

Supplementary Materials:
The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/biophysica3040044/s1,Table S1.The structures of 38 compounds screened.Table S2.Compound rankings among the seven scoring functions.Table S3.Specific values of binding free energy for 38 compounds and 8Z1 to SHMT2.

Biophysica 2023, 3 ,Figure 1 .
Figure 1.The result of clustering (similarity cutoff is 0.4).The horizontal axis represents the numbe clusters obtained, while the vertical axis represents the number of molecules contained in each clus Twenty molecules were selected based on the backbone clustering for their poten inhibitory activity.All 20 of these compounds share a common backbone structure o amino-1,4-dihydropyrano[2,3-c] pyrazole-5-carbonitrile, as illustrated in Figure 2, wh is the molecular scaffold structure of the existing inhibitor.In addition, the other 18 co pounds with the highest GlideScore in the clusters containing molecules with differ

Figure 2 .
Figure 2. The common scaffold of all compounds in the selected 12th cluster, the pyranopyraz structure.

Figure 1 .
Figure 1.The result of clustering (similarity cutoff is 0.4).The horizontal axis represents the number of clusters obtained, while the vertical axis represents the number of molecules contained in each cluster.

Figure 1 .
Figure 1.The result of clustering (similarity cutoff is 0.4).The horizontal axis represents the number of clusters obtained, while the vertical axis represents the number of molecules contained in each cluster.

Figure 2 .
Figure 2. The common scaffold of all compounds in the selected 12th cluster, the pyranopyrazole structure.

Figure 2 .
Figure 2. The common scaffold of all compounds in the selected 12th cluster, the pyranopyrazole structure.

Figure 3
Figure3.RMSD of two SHMT2-ligand complex systems and that of the apo-SHMT2 system: (a) the C12_14-SHMT2 complex system, (b) the C12_17-SHMT2 complex system, and (c) the apo-SHMT2 system in comparison with the C12_14-SHMT2 system.

Figure 4 .
Figure 4. Binding free energy ranking of 38 complex systems and that of 5V7I ( in blue).

Figure 4 .
Figure 4. Binding free energy ranking of 38 complex systems and that of 5V7I (in blue).

Figure 5 .
Figure 5.The contribution of residues to the complex C14_1-SHMT2 and 5V7I within 5 Å of the pocket (only residues with ∆∆G < −1.0 kcal/mol are listed).

Figure 5 .
Figure 5.The contribution of residues to the complex C14_1-SHMT2 and 5V7I within 5 Å of the pocket (only residues with ∆∆G < −1.0 kcal/mol are listed).

Figure 7 .
Figure 7. Average energy contribution of hot spot residues across the 38 complex systems.

Figure 7 .
Figure 7. Average energy contribution of hot spot residues across the 38 complex systems.

Figure 9 .
Figure 9. Ligand-residues interactions.The left panel represents the interactions between the ligand and surrounding residues in the 5V7I.The right panel represents the interactions between compound C14_1 and SHMT2.

Figure 11 .
Figure 11.Cocrystal structure of 5V7I.The SHMT2 consists of two chains, A (green) and B (blue), and the ligand molecule (pink) is bound to SHMT2.

Table 1 .
Average energy values for each residue (∆∆G < −1.0 kcal/mol) in the C14_1-SHMT2 complex system.(∆∆E vdw is van der Waals energy, ∆∆E ele is electrostatic energy, ∆∆GB is polar solvation energy, and ∆∆NP is binding enthalpy, IE is binding entropy, and ∆∆G x→a bind s binding free energy).