QSAR, Docking, and Molecular Dynamics Simulation Studies of Sigmacidins as Antimicrobials against Streptococci

Streptococci are a family of bacterial species significantly affecting human health. In addition, environmental Streptococci represent one of the major causes of diverse livestock diseases. Due to antimicrobial resistance, there is an urgent need for novel antimicrobial agent discovery against Streptococci. We discovered a class of benzoic acid derivatives named sigmacidins inhibiting the bacterial RNA polymerase-σ factor interaction and demonstrating excellent antimicrobial activity against Streptococci. In this work, a combinational computer approach was applied to gain insight into the structural basis and mechanism of action of sigmacidins as antimicrobials against Streptococcus pneumoniae. Both two- and three-dimensional quantitative structure-active relationships (2D and 3D QSAR) of sigmacidins displayed good predictive ability. Moreover, molecular docking and molecular dynamics simulation studies disclosed possible contacts between the inhibitors and the protein. The results obtained in this study provided understanding and new directions to the further optimizations of sigmacidins as novel antimicrobials.


Introduction
Streptococci are a large family of Streptococcus species widely present in the environment and as microbiota of mammals such as humans, wild animals and livestock. Streptococcus pneumoniae, belonging to the alpha-hemolytic streptococcal species, is one of the most common human pathogens leading to a range of pneumococcal diseases, including otitis media, sinusitis, pneumonia, septicemia, and meningitis [1]. The beta-hemolytic streptococcal species such as Streptococcus pyogenes (Group A Streptococcus, GAS) and Streptococcus agalactiae (Group B Streptococcus, GBS) represent the other two pathogens frequently triggering human diseases such as streptococcal pharyngitis (strep throat), impetigo, pneumonia, and meningitis [2,3]. In addition, more than one third of herd mastitis incidences are caused by "Environmental Streptococci" [4,5]. This name was coined after Streptococci which leads to animal diseases [6], such as Streptococcus dysgalactiae, Streptococcus uberis (Groups C and G Streptococci, GCS and GGS), and Enterococcus spp., which used to be classed in the genus Streptococcus (Group D Streptococcus) prior to 1984 [7]. The medical application to treat streptococcal infections by antibiotics is sometimes ineffective due to antimicrobial resistance [5,8]. As a result, S. pneumoniae has been listed in the World Health Organization Global Priority Pathogens List for which new antibiotics are urgently needed [9].
The emergence of multidrug resistance to current antibiotics among pathogens highlights the importance of the discovery of novel antimicrobials with minimized antimicrobial resistance. Protein-protein interactions (PPI) are appropriate targets for reducing antimicrobial resistance [10]. We focused on the specific and conserved bacterial PPIs for antimicrobial discovery [11]. Bacterial RNA polymerase (RNAP) comprises several subunits: 2α, β,  [17]) and the detailed interactions between β'CH and σ2.2, hydrogen bonds: green, hydrophobic: magenta; (B) Chemical structures of reported sigmacidins derivatives targeting the RNAP-σ interaction and their activities against S. pneumoniae.
With a new class of antimicrobial agents in hand, we intend to make use of the quantitative structure-activity relationship (QSAR) analysis to explore the relationship between the observed antimicrobial activity and numerical descriptors in order to predict the biological properties of perspective compounds and guide future syntheses [22]. The 2D QSAR study considers physicochemical properties of signal atoms and functional groups and their contribution to biological activity, while 3D QSAR could foresee the potential three-dimensional structure of the ligand molecules [23]. In this study, statistical methods including multiple linear regression (MLR) and partial least square analysis (PLS) were applied to analyze the correlation between properties or descriptors of the molecules and molecular properties [24].
In an attempt to reveal the relationships between the chemical structures and their activity against the representative S. pneumoniae, we took fiftysix molecules reported previously [20,21] to generate a set of quantitative rules and construct both 2D and 3D QSAR  [17]) and the detailed interactions between β CH and σ 2.2 , hydrogen bonds: green, hydrophobic: magenta; (B) Chemical structures of reported sigmacidins derivatives targeting the RNAP-σ interaction and their activities against S. pneumoniae.
With a new class of antimicrobial agents in hand, we intend to make use of the quantitative structure-activity relationship (QSAR) analysis to explore the relationship between the observed antimicrobial activity and numerical descriptors in order to predict the biological properties of perspective compounds and guide future syntheses [22]. The 2D QSAR study considers physicochemical properties of signal atoms and functional groups and their contribution to biological activity, while 3D QSAR could foresee the potential three-dimensional structure of the ligand molecules [23]. In this study, statistical methods including multiple linear regression (MLR) and partial least square analysis (PLS) were applied to analyze the correlation between properties or descriptors of the molecules and molecular properties [24].
In an attempt to reveal the relationships between the chemical structures and their activity against the representative S. pneumoniae, we took fiftysix molecules reported previously [20,21] to generate a set of quantitative rules and construct both 2D and 3D QSAR models for the design of novel derivatives. Their structures are given in the supporting information (Table S1) and their activities against S. pneumoniae (MIC) are shown in Table 1. Moreover, molecular docking and molecular dynamics (MD) simulations were performed to gain insight into the structural basis and the inhibitory mechanism of the inhibitors. A 2D QSAR study for compounds against S. pneumoniae was performed to determine the factors/descriptors related to the antibacterial activities of compounds 1-56, and to disclose the structural features contributing towards the bacterial inhibitory activities. In this study, the molecular properties for the compounds in the training set were calculated using the "Calculate Molecular Properties" protocol in Discovery Studio 2016 (DS 2016).
Descriptors used for the building of the model were selected based on the results of the intercorrelation matrix between the calculated descriptors. In the present research, the selected descriptors have intercorrelation values lower than 0.5 (Table 2) to avoid model overfitting, and the result of the matrix analysis revealed the independence of these descriptors. A Multiple Linear Regression (MLR) analysis method was used to construct the model. The statistical quality of the MLR model was judged by the calculation of the squared correlation coefficient (r 2 ) for internal validation and the predictive squared correlation coefficient (r 2 pred ) for external validation [25]. Moreover, the predictive power of the QSAR model was verified using LOO internal validation or cross validation (q 2 ). Usually, a value of q 2 > 0.5 is considered acceptable [26]. In this model, the r 2 was 0.732, r 2 pred was 0.613, and q 2 equaled 0.562, which indicated the true predictive ability of the model (Table 3) The predicted activities for the inhibitors versus their experimental activities and the residues between them are listed in Table 1. The correlation between the predicted activities and the experimental activities are depicted in Figure 2. These results demonstrated that the predicted activities by the constructed MLR model were in good agreement with the experiment data, suggesting that the 2D QSAR model was reliable for structure-activity prediction. The predicted activities for the inhibitors versus their experimental activities and the residues between them are listed in Table 1. The correlation between the predicted activities and the experimental activities are depicted in Figure 2. These results demonstrated that the predicted activities by the constructed MLR model were in good agreement with the experiment data, suggesting that the 2D QSAR model was reliable for structure-activity prediction.  [27], (b) hydrogen bond acceptor (HBA) which is critical to potency, selectivity, permeability, and solubility [28], (c) molecular polar surface area, a guideline towards the improvement of oral absorption and permeability [29], and (d) LUMO eigenvalue that is related to electrostatic properties [30], were used to describe the relationship between chemical properties and the antimicrobial activity. Compared with the molecular polar surface area, AlogP, HBA count, and LUMO Eigenvalue showed higher correlations, and slight variations of these descriptors significantly affected the activity.
Equation ( To further improve the predictive ability of the above model, an outlier analysis was conducted using the Find Outlier Molecules module of DS 2016 to identify the outliers in the dataset, and the acceptable level was set to 95 (95% confidence interval). Results showed that four compounds were returned as outliers, including compounds 10,for which the Molecular PSA was too high, 33 for which the LUMO Eigenvalue VAMP was too low, and 39 and 43, for which the Molecular PSA was too low. These four compounds were removed and a new 2D QSAR model was constructed. In the revised model, r 2 was 0.777, r 2 pred was 0.721, and q 2 equaled to 0.690, which indicated the improved predictive ability of the model (Table 4).  [27], (b) hydrogen bond acceptor (HBA) which is critical to potency, selectivity, permeability, and solubility [28], (c) molecular polar surface area, a guideline towards the improvement of oral absorption and permeability [29], and (d) LUMO eigenvalue that is related to electrostatic properties [30], were used to describe the relationship between chemical properties and the antimicrobial activity. Compared with the molecular polar surface area, AlogP, HBA count, and LUMO Eigenvalue showed higher correlations, and slight variations of these descriptors significantly affected the activity.
Equation ( To further improve the predictive ability of the above model, an outlier analysis was conducted using the Find Outlier Molecules module of DS 2016 to identify the outliers in the dataset, and the acceptable level was set to 95 (95% confidence interval). Results showed that four compounds were returned as outliers, including compounds 10,for which the Molecular PSA was too high, 33 for which the LUMO Eigenvalue VAMP was too low, and 39 and 43, for which the Molecular PSA was too low. These four compounds were removed and a new 2D QSAR model was constructed. In the revised model, r 2 was 0.777, r 2 pred was 0.721, and q 2 equaled to 0.690, which indicated the improved predictive ability of the model (Table 4). Equation (2) represents the revised MLR model. As shown in the equation, changes in the ALogP and LUMO Eigenvalue VAMP may affect the antimicrobial active significantly as they have higher correlations in comparison with HBA count and molecular polar surface area.
Equation (2) representing the upgraded 2D QSAR model: Structural alignment of the molecules is critical to both the predictive accuracy of a 3D QSAR model and reliability of contour models. Therefore, we applied flexible alignment to align all the molecules in this study. The most active compound, 40, was selected as the alignment template and the rest of the compounds were aligned to it by using the common substructure as displayed in Figure 3. Equation (2) represents the revised MLR model. As shown in the equation, changes in the ALogP and LUMO Eigenvalue VAMP may affect the antimicrobial active significantly as they have higher correlations in comparison with HBA count and molecular polar surface area.
Equation (2)  Structural alignment of the molecules is critical to both the predictive accuracy of a 3D QSAR model and reliability of contour models. Therefore, we applied flexible alignment to align all the molecules in this study. The most active compound, 40, was selected as the alignment template and the rest of the compounds were aligned to it by using the common substructure as displayed in Figure 3.

Three-Dimensional QSAR Study
The 3D QSAR model in this study was built by the Field-Based model module in Maestro 10.2. The statistical parameters are presented in Table 5. Here, r 2 is the non-crossvalidated value for the regression, r 2 CV is the LOO cross-validated correlation value, r 2 scramble represents the average value of r 2 from a series of models built using scrambled activities, and "Stability" reflects the sensitivity of the model to omissions from the training set. Q 2 is directly analogous to r 2 , but is based on the test set predictions. When the r 2 value is larger than the stability value, this is an indication that the dataset is over-fit. As we set PLS factor to six considering the statistical results, the three-factor model was selected with r 2 and r 2 CV values of 0.805 and 0.568, respectively, and a stability value of 0.883 ( Table 5).
The model was built using four fields, including steric, electrostatic, hydrogen bond (H-bond) donor, and H-bond acceptor. As shown in Table 6, the steric field and the hydrophobic field contributed significantly to the antibacterial activity with percentages of 36.1% and 29.8%, respectively.

Three-Dimensional QSAR Study
The 3D QSAR model in this study was built by the Field-Based model module in Maestro 10.2. The statistical parameters are presented in Table 5. Here, r 2 is the non-crossvalidated value for the regression, r 2 CV is the LOO cross-validated correlation value, r 2 scramble represents the average value of r 2 from a series of models built using scrambled activities, and "Stability" reflects the sensitivity of the model to omissions from the training set. Q 2 is directly analogous to r 2 , but is based on the test set predictions. When the r 2 value is larger than the stability value, this is an indication that the dataset is over-fit. As we set PLS factor to six considering the statistical results, the three-factor model was selected with r 2 and r 2 CV values of 0.805 and 0.568, respectively, and a stability value of 0.883 (Table 5). The model was built using four fields, including steric, electrostatic, hydrogen bond (H-bond) donor, and H-bond acceptor. As shown in Table 6, the steric field and the hydrophobic field contributed significantly to the antibacterial activity with percentages of 36.1% and 29.8%, respectively. The eleven compounds randomly selected by Maestro 10.2 were used as the test set to validate the predictive ability of the 3D QSAR model. As a result, the predicted pMIC values shown in Table 1 were in good alignment with the experimental data, with a Pearson-r (the correlation between the predicted and observed activity for the test set) value of 0.835 and a Q 2 value of 0.528. The correlation plots between the experimental and predicted pMIC values for both the training and test sets were shown in Figure 4. Though some outliers may possibly be generated, the results demonstrated the potential of the 3D QSAR model to be used for drug design with a good predictive power. The eleven compounds randomly selected by Maestro 10.2 were used as the test set to validate the predictive ability of the 3D QSAR model. As a result, the predicted pMIC values shown in Table 1 were in good alignment with the experimental data, with a Pearson-r (the correlation between the predicted and observed activity for the test set) value of 0.835 and a Q 2 value of 0.528. The correlation plots between the experimental and predicted pMIC values for both the training and test sets were shown in Figure 4. Though some outliers may possibly be generated, the results demonstrated the potential of the 3D QSAR model to be used for drug design with a good predictive power.

Interpretation of the 3D QSAR Contour Maps
To visualize the structure-activity relationship of these inhibitors, the steric, electrostatic, hydrophobic, H-bond donor, and H-bond acceptor contour maps of the models are displayed in Figure 5. The most active compound, 40, was used for further analysis.
In the steric contour ( Figure 5A), the green regions represented that the introduction of bulky substituents might increase activity, while steric hindrance should be avoided in the yellow regions. As shown in Figure 5A, a relatively large green contour was found around 3,4-diCl groups of compound 40, indicating that bulky substituents might be preferred in this region, while at 5-and 6-positions of the right benzene ring, steric hindrance was unfavorable.
The electrostatic contour ( Figure 5B) for compound 40 showed that a relatively large blue contour was located around the para-position to the -COOH group, suggesting that electron-deficient substituents may increase the activity. In addition, a small blue region was found to surround the -NO2 group, this can explain why reduction or removal of -NO2 resulted in reduction of the antimicrobial activity. In contrast, the red contour was mainly located around the linker, which meant electron-rich linkers may improve the activity.

Interpretation of the 3D QSAR Contour Maps
To visualize the structure-activity relationship of these inhibitors, the steric, electrostatic, hydrophobic, H-bond donor, and H-bond acceptor contour maps of the models are displayed in Figure 5. The most active compound, 40, was used for further analysis.
In the steric contour ( Figure 5A), the green regions represented that the introduction of bulky substituents might increase activity, while steric hindrance should be avoided in the yellow regions. As shown in Figure 5A, a relatively large green contour was found around 3,4-diCl groups of compound 40, indicating that bulky substituents might be preferred in this region, while at 5-and 6-positions of the right benzene ring, steric hindrance was unfavorable.
The electrostatic contour ( Figure 5B) for compound 40 showed that a relatively large blue contour was located around the para-position to the -COOH group, suggesting that electron-deficient substituents may increase the activity. In addition, a small blue region was found to surround the -NO 2 group, this can explain why reduction or removal of -NO 2 resulted in reduction of the antimicrobial activity. In contrast, the red contour was mainly located around the linker, which meant electron-rich linkers may improve the activity.
like the yellow regions.
H-bond acceptor and donor contour maps are displayed in Figure 5D. H-bond acceptors were favored as red regions and unfavored in the magenta regions. Moreover, Hbond donors were preferred as blackish green and pale green as the unfavorable regions. The magenta and blackish green contours covered the middle benzene ring, while the red and pale green regions encircled the -COOH group. Briefly, the H-bond acceptor and donor groups contributed less to the activity compared to the steric and hydrophobic groups.

Docking Studies
To compare the differences between the most bioactive compound 40 and the hit compound 1 in the binding processes, their potential binding modes and key interactions were analyzed using the LibDock module of DS 2016. As shown in Figure 6A,B, the -NH2 and -NO2 groups of compound 1 formed hydrogen bonds with Asp542 (H···O 2.15 Å) and Arg550 (O···H 1.85 Å) of β'CH region, respectively. Additionally, some weak π-cation interactions existed between the aromatic ring of compound 1 and Arg553 and Arg546.
In Figure 6C,D, it is suggested that compound 40 formed three classical hydrogen bonds with β'-CH region, including two H-bonds between the carboxylic group and On the hydrophobic contour map ( Figure 5C), the yellow regions indicated that the hydrophobic groups were preferred, while the white regions favored hydrophilic groups. It was shown that a relatively large blue region appeared around the 3,4-diCl groups; together with the prediction of the steric contour, the models indicated that replacing the -Cl with bulky and hydrophobic substituents might be beneficial to the antibacterial activity. In the contrast, the white regions were adjacent to the molecule but did not wrap it up like the yellow regions.
H-bond acceptor and donor contour maps are displayed in Figure 5D. H-bond acceptors were favored as red regions and unfavored in the magenta regions. Moreover, H-bond donors were preferred as blackish green and pale green as the unfavorable regions. The magenta and blackish green contours covered the middle benzene ring, while the red and pale green regions encircled the -COOH group. Briefly, the H-bond acceptor and donor groups contributed less to the activity compared to the steric and hydrophobic groups.

Docking and MD Simulations Studies of Compound 1 and 40 2.3.1. Docking Studies
To compare the differences between the most bioactive compound 40 and the hit compound 1 in the binding processes, their potential binding modes and key interactions were analyzed using the LibDock module of DS 2016. As shown in Figure 6A,B, the -NH 2 and -NO 2 groups of compound 1 formed hydrogen bonds with Asp542 (H···O 2.15 Å) and Arg550 (O···H 1.85 Å) of β CH region, respectively. Additionally, some weak π-cation interactions existed between the aromatic ring of compound 1 and Arg553 and Arg546.
The docking results indicated both compounds made extensive contacts with β'CH. In comparison to the hit compound 1, compound 40 made more interactions with the "hotspot" residues, including Arg546, Arg550, and Arg553 through hydrogen bonds.

MD Simulation Studies
To further understand the difference on the binding processes of hit compound 1 and compound 40, 10 ns MD simulations based on the above-mentioned binding modes were performed. To study the dynamic stability of both systems, root-mean-square deviations (RMSD) from the starting structures were analyzed (Figure 7). The plots showed that both the two systems reached equilibrium within 6 ns, and the proteins and ligands in both systems were stable after equilibrium. Average RMSD values for the protein and ligand in 1-β'CH bound system were 2.0 Å and 4.1 Å, respectively, while the corresponding values for the 40-β'CH bound system were 2.3 Å and 4.0 Å, respectively. Moreover, it was observed that compound 40 fluctuated more violently which might be due to the distances of the connected bonds between 40 and β'CH in the docking model. They were slightly longer than the distances between compound 1 and β'CH. In addition, the protein in the 40-β'CH system encountered more sizable rearrangement. This may be due to compound 40 having a more flexible structure. It could generate more conformations which require the protein to make more changes to adapt.
The energy of both complexes through MD simulation is shown in Figure 8. During the 10 ns production run, due to the existence of counter ions, the potential energy will often not decrease [31]. Results demonstrated that compound 40 in complex with the β'CH region had a much lower total energy compared to that of hit compound 1 (Figure 8A), especially the electrostatic energy ( Figure 8B), while the van der Waals energies of the two systems were similar ( Figure 8C). These results indicated that more attention should be given on the electrostatic energy when developing high-affinity inhibitors of the β'CH-σ interaction. In Figure 6C,D, it is suggested that compound 40 formed three classical hydrogen bonds with β -CH region, including two H-bonds between the carboxylic group and Arg553 (O···H 2.68 Å) and Lys556 (O···H 2.32 Å), one H-bond between -NO 2 and Arg550 (O···H 2.93 Å), while the S atom formed a nonclassical H-bond with Arg546 (S···H 2.48 Å).
The docking results indicated both compounds made extensive contacts with β CH. In comparison to the hit compound 1, compound 40 made more interactions with the "hotspot" residues, including Arg546, Arg550, and Arg553 through hydrogen bonds.

MD Simulation Studies
To further understand the difference on the binding processes of hit compound 1 and compound 40, 10 ns MD simulations based on the above-mentioned binding modes were performed. To study the dynamic stability of both systems, root-mean-square deviations (RMSD) from the starting structures were analyzed (Figure 7). The plots showed that both the two systems reached equilibrium within 6 ns, and the proteins and ligands in both systems were stable after equilibrium. Average RMSD values for the protein and ligand in 1-β CH bound system were 2.0 Å and 4.1 Å, respectively, while the corresponding values for the 40-β CH bound system were 2.3 Å and 4.0 Å, respectively. Moreover, it was observed that compound 40 fluctuated more violently which might be due to the distances of the connected bonds between 40 and β CH in the docking model. They were slightly longer than the distances between compound 1 and β CH. In addition, the protein in the 40-β CH system encountered more sizable rearrangement. This may be due to compound 40 having a more flexible structure. It could generate more conformations which require the protein to make more changes to adapt.  The binding energy of both inhibitors with β'CH were calculated using the Calculate Binding Energy module of DS 2016. For each system, binding energy calculation was performed for snapshots extracted every 100 ps from the last 2 ns of the whole 10 ns MD trajectory. For each snapshot, the free energy was calculated for each molecular species (complex, protein, and ligand), and the binding free energy was defined as: ΔEbinding = ΔEComplex − ΔEReceptor − ΔELigand [32]. Results showed that the average binding energy for compounds 1 and 40 were −12.1359 kcal/mol and -9.8806 kcal/mol, respectively. These results showed similar binding interactions of compounds, while compound 40 is a more flexible molecule, which may lead to higher binding energy as demonstrated.
To further understand the mechanic of action of the inhibitors, the final snapshots of the 10 ns trajectory were used to analyze the interactions between β'CH and compound 40 and 1, respectively. For compound 1, in comparison with the starting conformation, the most outstanding difference was that the benzoic acid moiety was turned over to form two hydrogen bonds between the carboxylic group and Arg549 and Arg553 ( Figure 9A,B). Moreover, the nitro group not only retained the H-Bonding interaction with Arg550, but also formed a new hydrogen bond with Arg546.
For compound 40, the three benzene rings of compound 40 positioned much closer The energy of both complexes through MD simulation is shown in Figure 8. During the 10 ns production run, due to the existence of counter ions, the potential energy will often not decrease [31]. Results demonstrated that compound 40 in complex with the β CH region had a much lower total energy compared to that of hit compound 1 ( Figure 8A), especially the electrostatic energy ( Figure 8B), while the van der Waals energies of the two systems were similar ( Figure 8C). These results indicated that more attention should be given on the electrostatic energy when developing high-affinity inhibitors of the β CH-σ interaction.  The binding energy of both inhibitors with β'CH were calculated using the Calculate Binding Energy module of DS 2016. For each system, binding energy calculation was performed for snapshots extracted every 100 ps from the last 2 ns of the whole 10 ns MD trajectory. For each snapshot, the free energy was calculated for each molecular species (complex, protein, and ligand), and the binding free energy was defined as: ΔEbinding = ΔEComplex − ΔEReceptor − ΔELigand [32]. Results showed that the average binding energy for compounds 1 and 40 were −12.1359 kcal/mol and -9.8806 kcal/mol, respectively. These results showed similar binding interactions of compounds, while compound 40 is a more flexible molecule, which may lead to higher binding energy as demonstrated.
To further understand the mechanic of action of the inhibitors, the final snapshots of The binding energy of both inhibitors with β CH were calculated using the Calculate Binding Energy module of DS 2016. For each system, binding energy calculation was performed for snapshots extracted every 100 ps from the last 2 ns of the whole 10 ns MD trajectory. For each snapshot, the free energy was calculated for each molecular species (complex, protein, and ligand), and the binding free energy was defined as: ∆E binding = ∆E Complex − ∆E Receptor − ∆E Ligand [32]. Results showed that the average binding energy for compounds 1 and 40 were −12.1359 kcal/mol and -9.8806 kcal/mol, respectively. These results showed similar binding interactions of compounds, while compound 40 is a more flexible molecule, which may lead to higher binding energy as demonstrated.
To further understand the mechanic of action of the inhibitors, the final snapshots of the 10 ns trajectory were used to analyze the interactions between β CH and compound 40 and 1, respectively. For compound 1, in comparison with the starting conformation, the most outstanding difference was that the benzoic acid moiety was turned over to form two hydrogen bonds between the carboxylic group and Arg549 and Arg553 ( Figure 9A,B). Moreover, the nitro group not only retained the H-Bonding interaction with Arg550, but also formed a new hydrogen bond with Arg546. key residues of β'CH region ( Figure 9C). As shown in Figure 9D, the carboxylic acid group of 40 formed a hydrogen bond and a salt-bridge with residues Arg553 and Lys556, respectively. Moreover, the nitro group formed two hydrogen bonds with Arg546 and Arg550, while a salt bridge was also formed between the nitro group and Arg550. In addition, the 3,4-diCl group of the left benzene ring formed hydrophobic interactions with Arg545.
Overall, these interactions may play key roles for the bioactivity of compound 40.

Dataset
All small molecule RNAP-σ inhibitors and their antimicrobial activities (MIC, μg/mL) were adopted from previous studies [20,21]. The MIC values in units of microgram per milliliter (μg/mL) were transformed in molarity (M) and subsequently transformed to pMIC (−logMIC). The dataset was divided into a training set for model generation and a test set (Table 1) for model validation, containing 45 and 11 compounds, respectively. The test set was chosen randomly by Maestro 10.2.

Preparation of the Small Molecules
The 3D structures of compounds were generated using Maestro 10.2, geometrically minimized with Macromodel (Maestro 10.2) based on the OPLS-2005 force field and all other parameters were set to the default settings [33].

Two-Dimensional QSAR Model Construction
Two-dimensional molecular properties of the training set compounds were calculated by module "Calculate Molecular Properties" in DS 2016. Two-dimensional descriptors including PKa, AlogP, molecular weight, molecular property counts (Num_aromatic Rings, Num_H_Acceptors, Num_H_Donors, Num_Rings, Num_RotatableBonds), For compound 40, the three benzene rings of compound 40 positioned much closer to the surface of the helix. This led to the molecule that made more interactions with the key residues of β CH region ( Figure 9C). As shown in Figure 9D, the carboxylic acid group of 40 formed a hydrogen bond and a salt-bridge with residues Arg553 and Lys556, respectively. Moreover, the nitro group formed two hydrogen bonds with Arg546 and Arg550, while a salt bridge was also formed between the nitro group and Arg550. In addition, the 3,4-diCl group of the left benzene ring formed hydrophobic interactions with Arg545. Overall, these interactions may play key roles for the bioactivity of compound 40.

Dataset
All small molecule RNAP-σ inhibitors and their antimicrobial activities (MIC, µg/mL) were adopted from previous studies [20,21]. The MIC values in units of microgram per milliliter (µg/mL) were transformed in molarity (M) and subsequently transformed to pMIC (−logMIC). The dataset was divided into a training set for model generation and a test set (Table 1) for model validation, containing 45 and 11 compounds, respectively. The test set was chosen randomly by Maestro 10.2.

Preparation of the Small Molecules
The 3D structures of compounds were generated using Maestro 10.2, geometrically minimized with Macromodel (Maestro 10.2) based on the OPLS-2005 force field and all other parameters were set to the default settings [33].

Two-Dimensional QSAR Model Construction
Two-dimensional molecular properties of the training set compounds were calculated by module "Calculate Molecular Properties" in DS 2016. Two-dimensional descriptors including PKa, AlogP, molecular weight, molecular property counts (Num_aromatic Rings, Num_H_Acceptors, Num_H_Donors, Num_Rings, Num_RotatableBonds), Molecualr surface Area, Molecular_Fractional Polar Surface Area, HOMO Eigenvalue VAMP, and LUMO Eigenvalue VAMP were adopted. The model was validated using the test set correlation and Leave-one-out (LOO) cross validation.

Three-Dimensional QSAR Model Construction
A 3D QSAR model was developed by Maestro 10.2. The alignment was achieved by using the Flexible Ligand Alignment module. The 3D QSAR model was generated by Field-based QSAR module with default parameters. The field style was set to Gaussian field, including steric, electrostatic, hydrophobic, H-bond acceptor, and donor field. The maximum PLS factor was set to six and in the PLS regression analysis, a leave-one-out (LOO) cross validation was performed to find the optimal number of components. The descriptors were generated in a 3D cubic lattice with grid spacing of 1 Å and extending to 3 Å units beyond the aligned molecules in all directions. In addition, the cutoff values for truncating steric force and electrostatic force fields were both set to 30 kcal/mol.

Docking and Molecular Dynamic Simulations
The crystal structure of the β CH region was extracted from the crystal structure of bacterial RNAP (PDB: 1IW7, [17]) which was downloaded from Protein Data Bank. Structures of the compounds and the protein for docking were imported to DS 2016 and the conformations were generated with the protocol "Prepare Protein" and "Prepare Ligands", respectively. Molecular Docking was performed using the LibDock tool and the identified critical residues for the σ 2.2 -β CH region PPI (including Arg546, Arg550, Arg553, Leu566) were defined as the binding sites. The docking process was conducted with the default parameters unless otherwise mentioned. MD simulation was conducted in a similar manner as described [34]. Binding free energy was calculated according to the literature [32].

Conclusions
Streptococci are an important bacterial family closely related to human wellbeing, while environmental Streptococci significantly affect herd health. Antimicrobial resistance to conventional antibiotics is emerging due to natural resistance mechanisms and antibiotic misuse. Therefore, novel antimicrobial agents are urgently required. We focused on bacterial transcription [35,36] and discovered a series of benzoic acid derivatives and named them sigmacidins. They were capable of mimicking bacterial transcription factor σ at the region 2.2 to disrupt its binding to RNAP and to exhibit excellent antimicrobial activity against Streptococci including E. faecalis.
In this study, a combined computational approach was applied to investigate the relationship of the structural basis and antimicrobial activities of sigmacidins. Both 2D and 3D QSAR models were constructed, and the binding poses of the inhibitors to the protein were obtained. The 2D QSAR model constructed revealed close structure-activity correlation and contribution of various properties/descriptors in the activity. In the model identified in this study, ALogP, hydrogen bond acceptor (HBA), molecular polar surface area, and LUMO eigenvalue were taken to describe the SAR. The 2D QSAR equation implied that the activity of the compounds was related to and can be improved by increasing AlogP, molecular polar surface area, and the LUMO eigenvalue. This equation will be useful to estimate antimicrobial activity of newly designed compounds. Two-dimensional QSAR for bioactivity prediction is simple and efficient; however, it was obtained based on limited structures of substituents and may have accurate correlations to a relatively small range of substitutions for further structural optimizations. The 3D QSAR model gained further insight into the 3D structure information for the understanding of the SAR of these antimicrobials. The importance of the steric and hydrophobic properties of the 3,4-substitution of the left benzene ring was highlighted, while the substituents at para-position of the -COOH group could be further explored for novel derivative synthesis. Besides guiding the modifications of existing molecules, the constructed model can also be used directly for virtual screening to identified novel hits. Finally, molecular docking indicated possible binding poses of the inhibitors in complex with β CH and MD simulations used to rationalize the docked poses. In this study, the docking model of the 40-β CH system showed similar stability to 1-β CH and the binding free energy of 40-β CH was slightly higher than that of 1-β CH, probably due to its flexible structure. Fortunately, the surface of the protein fragment is enriched in arginine which is elastic and able to accommodate more conformational changes of compound 40. In addition, compared to the starting docking models, after 10 ns simulation, both compounds formed more H-bonding contacts with β CH region, especially with "hotspot" residues, including Arg546, Arg550, and Arg553. This indicated that the binding affinity might be increased by enhancing acidity of the inhibitors. For example, the nitro group can be replaced by acidic substituents. While compound 40 demonstrated significantly superior antibacterial activity to compound 1, the possible reasons may include the greater bacterial cell membrane permeability of compound 40 which was optimized from hit compound 1. Nevertheless, the docking and MD simulations showed some difference, probably due to the challenging PPI target with a relatively flat binding site. Here, we need to combine the two methods which put forward possible contacts between ligands and the β CH region. This combination is useful for future structure-based drug design. The newly designed compounds that fulfill the requirement by these 3D features may be able to bind to the same target protein and possess significant antimicrobial activity, which remain to be experimentally evaluated.
Overall, the models established in this study provided useful indications for the design of novel sigmacidins derivatives against pathogenic Streptococci. Note that sigmacidins also demonstrated excellent antimicrobial activity against Staphylococci such as Staphylococcus aureus, Staphylococcus epidermidis, and Staphylococcus Saprophyticus [20,21]. We believe that the further development of sigmacidins via ligand-based and structure-based drug design will contribute to novel antimicrobial agent discovery in the post-antibiotic era.

Informed Consent Statement: Not applicable.
Data Availability Statement: Supporting information for this article is available upon request from corresponding authors.