Computational Analysis of Chemical Space of Natural Compounds Interacting with Sulfotransferases

The aim of this study was to investigate the chemical space and interactions of natural compounds with sulfotransferases (SULTs) using ligand- and structure-based in silico methods. An in-house library of natural ligands (hormones, neurotransmitters, plant-derived compounds and their metabolites) reported to interact with SULTs was created. Their chemical structures and properties were compared to those of compounds of non-natural (synthetic) origin, known to interact with SULTs. The natural ligands interacting with SULTs were further compared to other natural products for which interactions with SULTs were not known. Various descriptors of the molecular structures were calculated and analyzed. Statistical methods (ANOVA, PCA, and clustering) were used to explore the chemical space of the studied compounds. Similarity search between the compounds in the different groups was performed with the ROCS software. The interactions with SULTs were additionally analyzed by docking into different experimental and modeled conformations of SULT1A1. Natural products with potentially strong interactions with SULTs were outlined. Our results contribute to a better understanding of chemical space and interactions of natural compounds with SULT enzymes and help to outline new potential ligands of these enzymes.

In this study, we investigated human SULT1A1 [26], which is the most abundant SULT in human liver, and is also distributed in lung, platelets, kidney, and gastrointestinal tissues [27]. Human SULT1A1 exhibits a broad substrate range with specificity for phenolic compounds, including drugs and pro-carcinogens such as N-hydroxy-aromatic and heterocyclic arylamines [8]; indeed, the SULTs play a key role in the metabolism of a number of natural compounds [12,[28][29][30][31]. Contrary to many exogenous compounds, some natural compounds (e.g., flavonolignans) are sufficiently functionalized by Phase II conjugations without the need to pass through Phase I metabolism [12]. In human hepatocytes, flavonolignans were found to be metabolized by glucuronidation or sulfation, while other types of conjugations, such as methylation or glutathionylation, or formation of Phase I products, were negligible [32,33]; the flavonoid taxifolin can also be metabolized directly by Phase II conjugations [34,35].
Here, we focused on the chemical space and interactions of natural compounds with SULTs using ligand-and structure-based in silico methods. We created an in-house library of natural ligands reported to interact with SULTs, and the chemical structure and properties of the compounds were compared to the properties of synthetic compounds known to interact with SULTs. The natural ligands interacting with SULTs were further compared to other natural products for which interactions with SULTs were not known. Cluster and similarity analyses were performed between the compounds, and their interactions with SULTs were analyzed by docking into experimental and modeled conformations of SULT1A1. Natural products with potentially strong interactions with SULTs were proposed, thus outlining them as potential SULT ligands.

Results and Discussion
The groups of the studied compounds are presented in Table 1. The descriptors of the molecular structures available in Molecular Operating Environment (MOE), v.2019.0102 (Chemical Computing Group, https://www.chemcomp.com/) were calculated and analyzed. Differences in the descriptor means among the three groups of compounds were evaluated with ANOVA. Principal component analysis (PCA) was used to extract information from the calculated descriptors; cluster analysis was applied to the chemical compounds. ROCS similarity search between the compound groups was performed.
The interactions of the three groups of compounds with SULTs were further analyzed by docking to three alloforms of SULT1A1, the crystallographic structures of which were retrieved from the Protein Data Bank (PDB, https://www.rcsb.org).

Compounds Collection for the In-House Library of Natural SULT Ligands
The search in the scientific literature [15, for natural compounds reported as ligands of SULTs led to the creation of an in-house library of 118 structures. It is freely available at http://biomed.bas.bg/qsarmm/.
Sixty-two compounds were reported as substrates, 53 compounds were inhibitors of SULT1A1, and for 3 compounds, both substrate and inhibitory kinetic parameters were available. Most of the compounds were also reported to be substrates or inhibitors of other SULT isoforms, as listed in the database.
The library of collected natural SULT ligands contains the following information: The statistics of the above listed chemical descriptors in the group of the investigated natural SULT1A1 ligands are reported in Table 2.
Most of the natural SULT ligands are of medium size (average MW of 272 Da) and contain at least one ring (3 rings on average), but there are also some large molecules with a high number of rings (up to 11). On average, the number of H-bond donor or acceptor atoms is 3 and 4, respectively, but some ligands do not have H-bond donors (flavone, bergamottin, nobiletin, rotenone, and tangeretin) or have only one H-bond acceptor, while others have as much as 17 H-bond donors and 24 H-bond acceptors (punicalagin). The ASA+ of the molecules is slightly larger than the ASA−. The hydrophobic/hydrophilic properties vary from very hydrophilic molecules with negative logP(o/w), reaching −2.1, to hydrophobic with high positive logP(o/w) (up to 5.3); the average logP(o/w) is 2.2. In general, there are more hydrophobic molecules with positive logP(o/w) and larger hydrophobic surface areas than hydrophilic molecules. The observed diversity in the structural characteristics of the natural SULT ligands in relation to the size, H-bonding properties, and hydrophobic/hydrophilic properties is in agreement with the well-known promiscuity of SULTs but with preferential selectivity towards a specific class of compounds for each isoenzyme [6]. Principal Component Analysis of the calculated structural descriptors PCA was applied in order to analyze the set of 313 calculated descriptors for natural SULT ligands. The eigenvalues of the principal components (PC) and the percentage of the explained variance of the descriptor data set are presented in Figure 1: Most of the natural SULT ligands are of medium size (average MW of 272 Da) and contain at least one ring (3 rings on average), but there are also some large molecules with a high number of rings (up to 11). On average, the number of H-bond donor or acceptor atoms is 3 and 4, respectively, but some ligands do not have H-bond donors (flavone, bergamottin, nobiletin, rotenone, and tangeretin) or have only one H-bond acceptor, while others have as much as 17 H-bond donors and 24 H-bond acceptors (punicalagin). The ASA+ of the molecules is slightly larger than the ASA−. The hydrophobic/hydrophilic properties vary from very hydrophilic molecules with negative logP(o/w), reaching −2.1, to hydrophobic with high positive logP(o/w) (up to 5.3); the average logP(o/w) is 2.2. In general, there are more hydrophobic molecules with positive logP(o/w) and larger hydrophobic surface areas than hydrophilic molecules. The observed diversity in the structural characteristics of the natural SULT ligands in relation to the size, H-bonding properties, and hydrophobic/hydrophilic properties is in agreement with the well-known promiscuity of SULTs but with preferential selectivity towards a specific class of compounds for each isoenzyme [6].

Principal Component Analysis of the calculated structural descriptors
PCA was applied in order to analyze the set of 313 calculated descriptors for natural SULT ligands. The eigenvalues of the principal components (PC) and the percentage of the explained variance of the descriptor data set are presented in Figure 1: The first two components covered 55.5% of the descriptor variance. The analysis of the projections of the original variables onto the component plane (PCA loadings) revealed that the first PC (PC1) accounted mostly for the steric characteristics/size (molecular weight, volume, surface, and shape), while logP(o/w), ASA_H, ASA_P, and hydrophobic volume contributed to the second PC (PC2), thus accounting for the hydrophobic/hydrophilic properties.
A plot of projection of the compounds onto the PC plane (the score plot of PC1 vs. PC2) is presented in Figure 2: The first two components covered 55.5% of the descriptor variance. The analysis of the projections of the original variables onto the component plane (PCA loadings) revealed that the first PC (PC1) accounted mostly for the steric characteristics/size (molecular weight, volume, surface, and shape), while logP(o/w), ASA_H, ASA_P, and hydrophobic volume contributed to the second PC (PC2), thus accounting for the hydrophobic/hydrophilic properties.
A plot of projection of the compounds onto the PC plane (the score plot of PC1 vs. PC2) is presented in Figure 2: Compounds 100 and 109 (punicalagin and thearubigin), which fall outside of the rest compounds, are very bulky and have the highest molecular weights among the natural SULT ligands. No similarity groups could be identified within the investigated compounds, confirming the diversity in their chemical structures.

Cluster analysis
To better characterize the chemical space of the studied compounds, five principal components, describing 72% of the variance in the compound structures, were utilized in the cluster analysis ( Figure 1). The obtained clusters are presented in Figure 3; the clusters were identified by applying a cutoff value of 33% of the maximal distance based on the Sneath's index of cluster significance (shown as a line in Figure 3) [65]. Seven clusters at the 33% level were outlined.  Compounds 100 and 109 (punicalagin and thearubigin), which fall outside of the rest compounds, are very bulky and have the highest molecular weights among the natural SULT ligands. No similarity groups could be identified within the investigated compounds, confirming the diversity in their chemical structures.
Cluster analysis To better characterize the chemical space of the studied compounds, five principal components, describing 72% of the variance in the compound structures, were utilized in the cluster analysis ( Figure 1). The obtained clusters are presented in Figure 3; the clusters were identified by applying a cutoff value of 33% of the maximal distance based on the Sneath's index of cluster significance (shown as a line in Figure 3) [65]. Seven clusters at the 33% level were outlined. Compounds 100 and 109 (punicalagin and thearubigin), which fall outside of the rest compounds, are very bulky and have the highest molecular weights among the natural SULT ligands. No similarity groups could be identified within the investigated compounds, confirming the diversity in their chemical structures.

Cluster analysis
To better characterize the chemical space of the studied compounds, five principal components, describing 72% of the variance in the compound structures, were utilized in the cluster analysis ( Figure 1). The obtained clusters are presented in Figure 3; the clusters were identified by applying a cutoff value of 33% of the maximal distance based on the Sneath's index of cluster significance (shown as a line in Figure 3) [65]. Seven clusters at the 33% level were outlined.  The chemical structures of the compounds as classified in the clusters are presented in Figure 4.    The first compound classified in a single cluster 1 (100, punicalagin, Figure 3) is very bulky and with the highest molecular weight (1085 Da). As seen in Figure 4, the compounds are reasonably grouped in clusters, according to the molecular descriptors, in agreement with their structures. Cluster 2 includes the three compounds from the group of natural SULT ligands, which contain iodine. Cluster 3 contains mainly bulky 2-phenylchroman (flavan) and other 2-arylchroman derivatives (including 1 flavanone derivative). Overall, clusters 4 and 5 include compounds with one or two fused rings, but cluster 4 has compounds with more oxygen or nitrogen atoms, compared to the compounds in cluster 5. Cluster 6 comprises compounds with more than two rings, including also some steroid compounds-derivatives of estrone and estradiol. Cluster 7 contains mainly hydroxy flavones; there are also some polyphenol compounds, such as curcuminoids, stilbenes, and benzophenones, with similar structural fragments. Cluster 7 may be divided into two smaller clusters (denoted as 7.1 and 7.2), close to the 33% distance level (Figure 3). In general, the compounds in cluster 7.2 contain lower number of oxygen atoms than the compounds in cluster 7.1 and have an unsubstituted phenyl ring.

Docking Results
Next, we performed docking of all 118 compounds into different SULT1A1 conformations, using MOE software. We applied two scoring functions, Alpha HB and London dG with different terms of the binding energies calculations (see Section 3 for details). Figure 5 illustrates the docking scores (approximating the binding energies) obtained for the natural SULT ligands from their docking in eight SULT1A1 structures: four crystallographic (with PDB IDs: 4GRA for SULT1A1*1, 1LS6 and 2D06 for SULT1A1*2, and 1Z28 for SULT1A1*3) and four conformations generated from Molecular Dynamics Simulations (MD) (see Section 3 for details). Alpha HB and London dG scoring functions are applied, and the lowest scores of 30 docking poses for each compound (indicating better binding affinity and stronger interactions with the enzymes) are plotted. The compounds are ordered in the figure, according to the results from the cluster analysis. The cluster numbers are also presented. The first compound classified in a single cluster 1 (100, punicalagin, Figure 3) is very bulky and with the highest molecular weight (1085 Da). As seen in Figure 4, the compounds are reasonably grouped in clusters, according to the molecular descriptors, in agreement with their structures. Cluster 2 includes the three compounds from the group of natural SULT ligands, which contain iodine. Cluster 3 contains mainly bulky 2-phenylchroman (flavan) and other 2-arylchroman derivatives (including 1 flavanone derivative). Overall, clusters 4 and 5 include compounds with one or two fused rings, but cluster 4 has compounds with more oxygen or nitrogen atoms, compared to the compounds in cluster 5. Cluster 6 comprises compounds with more than two rings, including also some steroid compounds-derivatives of estrone and estradiol. Cluster 7 contains mainly hydroxy flavones; there are also some polyphenol compounds, such as curcuminoids, stilbenes, and benzophenones, with similar structural fragments. Cluster 7 may be divided into two smaller clusters (denoted as 7.1 and 7.2), close to the 33% distance level ( Figure  3). In general, the compounds in cluster 7.2 contain lower number of oxygen atoms than the compounds in cluster 7.1 and have an unsubstituted phenyl ring.

Docking Results
Next, we performed docking of all 118 compounds into different SULT1A1 conformations, using MOE software. We applied two scoring functions, Alpha HB and London dG with different terms of the binding energies calculations (see Section 3 for details). Figure 5 illustrates the docking scores (approximating the binding energies) obtained for the natural SULT ligands from their docking in eight SULT1A1 structures: four crystallographic (with PDB IDs: 4GRA for SULT1A1*1, 1LS6 and 2D06 for SULT1A1*2, and 1Z28 for SULT1A1*3) and four conformations generated from Molecular Dynamics Simulations (MD) (see Section 3 for details). Alpha HB and London dG scoring functions are applied, and the lowest scores of 30 docking poses for each compound (indicating better binding affinity and stronger interactions with the enzymes) are plotted. The compounds are ordered in the figure, according to the results from the cluster analysis. The cluster numbers are also presented.
(a)  The docking scores calculated by different scoring functions could not be directly compared, as they approximate the binding energies, using different terms. Therefore, the comparison between the compounds is based on the trends outlined by the particular scoring values. The docking score range for Alpha HB was between −135 and −50, while for London dG the range was between −21 and −7. Overall, the docking scores followed the same trends for the two scoring functions, and they were similar for the four crystallographic and the 4 MD structures, except for the SULT1A1 conformations generated by the MD simulations starting from the crystal structure 1LS6. The Alpha HB and London dG scores obtained on the crystal structure 1LS6 outperformed those obtained on the corresponding 1LS6 MD structures. Previously, we had obtained similar results in terms of scores of drug-like molecules docked into 1LS6 [14]. In that previous study, MD simulations starting from the 1LS6 crystal structure did not permit significant improvement of the scores and the ranks of those compounds [14]. In fact, the 1LS6 structure corresponds to SULT1A1*2 (R213H), which is known to be less stable [66] among the three alloforms studied here, *1, *2 and *3 (see Section 3 for details), and thus the MD simulations without a bound ligand cannot allow the stabilization of the SULT1A1*2 structure.
The lowest binding energies for both Alpha HB and London dG occurred in compound cluster 3 and sub-cluster 7.1. This is expected, taking into account that these clusters mainly consist of phenol-containing compounds (e.g., flavonoids), which are typical substrates of SULT1A1 (also known as phenol sulfotransferase) [67]. Cluster 7.2 also includes flavonoids, but with a smaller number of OH groups, which could explain the worse docking scores. Interestingly, the docking into the MD conformations allowed for the improvement of the Alpha HB and LondondG docking scores of sub-cluster 7.2, underlying the importance of taking into account the conformational changes of SULTs when exploring interactions with their ligands. Our recent study [25], using MD simulations with excited Normal Modes (MDeNM) [68], demonstrated that the natural flexibility of SULT1A1 provides a large opening of the key loops 1, 2, and 3, thus ensuring the recognition of diverse substrates and inhibitors by SULT1A1, the large inhibitor epigallocatechin gallate, in particular [20].
Compound 43 (in cluster 4, Figure 4) had also low binding energies, possibly due to its high number of oxygen atoms, which favors SULT interactions. Some compounds in cluster 6 also had low binding energies in the crystallographic enzyme structures and when applying Alpha HB scoring function.
For the four cases shown in Figure 5a-d, cluster 5 showed very different docking scores due to the different size of its compounds. In Figure 6, the distribution of several descriptors important for protein-ligand interactions and compound behavior in the living systems is compared in the three studied groups. The minimum-maximum ranges, median values (the dots) and the 25-75% percentiles of the descriptors are presented.
The ranges for most of the descriptors in the synthetic SULT ligands group were narrower and were included in the corresponding descriptor ranges of the natural SULT ligands group. The value ranges of ASA_H and logP(o/w) for the two groups were comparable. The range of logP(o/w) values of the synthetic ligands was slightly higher than the logP(o/w) range of the natural ligands; thus, some synthetic ligands were more lipophilic than the natural ligands. The ranges for most of the descriptors in the synthetic SULT ligands group were narrower and were included in the corresponding descriptor ranges of the natural SULT ligands group. The value ranges of ASA_H and logP(o/w) for the two groups were comparable. The range of logP(o/w) values of the synthetic ligands was slightly higher than the logP(o/w) range of the natural ligands; thus, some synthetic ligands were more lipophilic than the natural ligands.
The range values of the structural descriptors for the natural SULT ligands were within the range of the corresponding descriptor values for the other natural products. For all descriptors, except logP(o/w), the ranges for the natural SULT ligands were in the lowest part of the ranges for the other natural products. The logP(o/w) values for natural SULT ligands were in the middle range of logP(o/w) of the other natural products, showing that there were no extremely hydrophilic or extremely lipophilic natural SULT ligands.

ANOVA
ANOVA was used to outline structural descriptors with statistically significant differences between natural and synthetic SULT ligands, and between natural SULT ligands and other natural products. Different descriptor means would suggest different molecular properties between these three compound groups. The range values of the structural descriptors for the natural SULT ligands were within the range of the corresponding descriptor values for the other natural products. For all descriptors, except logP(o/w), the ranges for the natural SULT ligands were in the lowest part of the ranges for the other natural products. The logP(o/w) values for natural SULT ligands were in the middle range of logP(o/w) of the other natural products, showing that there were no extremely hydrophilic or extremely lipophilic natural SULT ligands.
ANOVA ANOVA was used to outline structural descriptors with statistically significant differences between natural and synthetic SULT ligands, and between natural SULT ligands and other natural products. Different descriptor means would suggest different molecular properties between these three compound groups. Table 3 reports the ANOVA Fisher statistics (F) and the probability values (p). According to the ANOVA results, the H-bond donor and acceptor properties, ASA+, ASA_P, ASA_H, and logP(o/w) are significantly different for the two groups of natural and synthetic SULT ligands at the 99.5% significance level (p < 0.005), despite the close descriptor ranges shown above ( Figure 6). The molecular size (weight, volume, surface), number of rings, and negatively charged areas are not statistically different in the two groups at the 99.5% significance level, and the compounds may be random samples from the same compound population in relation to these properties. Table 3. Comparison of the descriptors between the groups of natural SULT ligands (118 compounds), synthetic SULT ligands (102 compounds), and other natural products (1220 compounds) -ANOVA Fisher statistics (F) and probability values (p). According to ANOVA, the natural SULT ligands and the other natural products differ in their size and number of rings, H-bond acceptor properties, negatively charged areas, and hydrophobic areas of the molecules. The two groups of compounds have similar means in relation to H-bond donor atoms, logP(o/w), the ASA_P and ASA+. The ANOVA results showed that although the descriptor ranges for natural SULT ligands were within the descriptor ranges for the other natural products, some of the properties for the natural SULT ligands (size, number of rings, H-bond acceptor properties, ASA−, and ASA_H) were significantly different from the other natural products; therefore, a small proportion of natural compound could be SULT ligands. In fact, logP(o/w) of natural ligands interacting with SULTs is within the range of −2.1 to 5.3, which means that such compounds are relatively soluble. However, many of them are with low permeability [29]. The series of metabolic reactions, including Phase I and Phase II reactions, catalyzed by intracellular metabolic enzymes (such as CYP, esterase, SULT, and UGT enzymes) showed that natural medicines with low permeability have distinctive metabolisms and pharmacokinetics [29]. For example, the flavonoid rutin, a natural medicine with antiviral, anti-inflammatory, and vasodilator effects, passes through Phase II metabolism by methylation, sulfation, and glucuronidation reactions [69].

Comparison of the Descriptors between the Groups of Natural SULT Ligands and other Natural
Cluster analysis Natural and synthetic SULT ligands (220 compounds altogether) were combined for the cluster analysis performed further. After removing descriptors without variance in the data set, 311 structural descriptors remained. The PCA extracted six PCs that explained 71.2% of the total variance of the structural descriptors; these PCs were used in the cluster analysis.
The obtained compound clusters are presented in Figure 7. Seven clusters at the 33% level were outlined. Some of the clusters contained entirely natural SULT ligands (clusters 2 and 6) or predominantly synthetic SULT ligands (cluster 3); in other clusters, the synthetic SULT ligands were placed together with the natural SULT ligands, although again some sub-clusters of natural or synthetic ligands could be identified, as shown in Figure 7.
Based on the statistical analysis of the structural descriptors, the natural SULT ligands were estimated as slightly more diverse than the synthetic ligands, in agreement with literature reports [70]. The ANOVA analysis showed that the two groups of compounds differ in their means of some molecular properties related to compound polarity (H-bond donor and acceptor properties, ASA+, ASA_P, ASA_H, and logP(o/w)). This was also observed in the cluster analysis, where some natural and synthetic ligands were grouped into different clusters.
When combining the three groups-natural SULT ligands, synthetic SULT ligands, and the other natural products (1440 compounds altogether)-318 structural descriptors remained after removing descriptors without variance, and these descriptors were used in the PCA. Seven principal components covered 71.2% of the total variance of the structural descriptors. The obtained compound clusters based on these PCs are presented in Figure 8. The figure also shows the number of natural and synthetic SULT ligands within the clusters by columns. Figure 7. Clusters obtained by combining natural and synthetic SULT ligands. The 33% cutoff line is shown, and the clusters are numbered. Clusters, which contain entirely or predominantly natural or synthetic SULT ligands are labeled with N or S, respectively. Seven clusters at the 33% level were outlined. Some of the clusters contained entirely natural SULT ligands (clusters 2 and 6) or predominantly synthetic SULT ligands (cluster 3); in other clusters, the synthetic SULT ligands were placed together with the natural SULT ligands, although again some sub-clusters of natural or synthetic ligands could be identified, as shown in Figure 7.
Based on the statistical analysis of the structural descriptors, the natural SULT ligands were estimated as slightly more diverse than the synthetic ligands, in agreement with literature reports [70]. The ANOVA analysis showed that the two groups of compounds differ in their means of some molecular properties related to compound polarity (H-bond donor and acceptor properties, ASA+, ASA_P, ASA_H, and logP(o/w)). This was also observed in the cluster analysis, where some natural and synthetic ligands were grouped into different clusters.
When combining the three groups-natural SULT ligands, synthetic SULT ligands, and the other natural products (1440 compounds altogether)-318 structural descriptors remained after removing descriptors without variance, and these descriptors were used in the PCA. Seven principal components covered 71.2% of the total variance of the structural descriptors. The obtained compound clusters based on these PCs are presented in Figure 8. The figure also shows the number of natural and synthetic SULT ligands within the clusters by columns.
Nine clusters were outlined. The SULT ligands were distributed among the clusters with the other natural products, suggesting structural similarity, but the ligands were placed mainly in clusters 7, 8 and 9 (these clusters contained 664 compounds). Thus, some structural grouping of the SULT ligands (natural and synthetic) compared to the other natural products was observed.

ROCS similarity search
The chemical similarity between SULT ligands and other natural products was evaluated by the ROCS software. Compounds from the group of other natural products, which are similar to the natural SULT ligands, were sought. We assessed 53 natural compounds as similar to 48 natural SULT ligands, with a Tanimoto-Combo score of >1.5. Nine clusters were outlined. The SULT ligands were distributed among the clusters with the other natural products, suggesting structural similarity, but the ligands were placed mainly in clusters 7, 8 and 9 (these clusters contained 664 compounds). Thus, some structural grouping of the SULT ligands (natural and synthetic) compared to the other natural products was observed.

ROCS similarity search
The chemical similarity between SULT ligands and other natural products was evaluated by the ROCS software. Compounds from the group of other natural products, which are similar to the natural SULT ligands, were sought. We assessed 53 natural compounds as similar to 48 natural SULT ligands, with a Tanimoto-Combo score of >1.5.

Comparison of the Docking Results in the Three Groups of Compounds
The ranges of the best docking scores for each compound in the SULT1A1 structures (four crystallographic structures-GRA, 1LS6, 2D06, and 1Z28-and one MD conformation selected from MD simulations starting from each of the four crystallographic structures), are presented in Figure 9.

Comparison of the Docking Results in the Three Groups of Compounds
The ranges of the best docking scores for each compound in the SULT1A1 structures (four crystallographic structures-GRA, 1LS6, 2D06, and 1Z28-and one MD conformation selected from MD simulations starting from each of the four crystallographic structures), are presented in Figure 9.
As expected, comparable binding energies for the natural SULT ligands and the synthetic SULT ligands were observed, with the natural ligands having a slightly wider range of docking scores. The other natural products had a much broader range of docking scores. The similarity trends obtained for Alpha HB and London dG scoring functions in the group of natural SULT ligands ( Figure 5) were also observed in the other groups ( Figure 9).

Search for Potential SULT Ligands among the Group of other Natural Products
Further, natural products similar to the natural SULT ligands and natural products showing the lowest binding energies were investigated in order to search for potential active SULT ligands.
For this purpose, we selected the natural products with an ROCS similarity above 1.5 Tanimoto-Combo score to natural SULT ligands and the natural products showing lower docking energy than the minimal energy obtained for the natural SULT ligands in at least one SULT1A1 structure (crystallographic or MD). As expected, comparable binding energies for the natural SULT ligands and the synthetic SULT ligands were observed, with the natural ligands having a slightly wider range of docking scores. The other natural products had a much broader range of docking scores. The similarity trends obtained for Alpha HB and London dG scoring functions in the group of natural SULT ligands ( Figure 5) were also observed in the other groups (Figure 9).

Search for Potential SULT Ligands among the Group of other Natural Products
Further, natural products similar to the natural SULT ligands and natural products showing the lowest binding energies were investigated in order to search for potential active SULT ligands.
For this purpose, we selected the natural products with an ROCS similarity above 1.5 Tanimoto-Combo score to natural SULT ligands and the natural products showing lower docking energy than the minimal energy obtained for the natural SULT ligands in at least one SULT1A1 structure (crystallographic or MD). This selection resulted in a group of 159 compounds, where 40 compounds had a ROCS similarity above 1.5 to natural SULT ligands, 106 compounds had lower binding energy than the minimal energy obtained for the natural SULT ligands, and 13 compounds fulfilled both criteria. All these 13 compounds were derivatives of flavone or isoflavone. Their structures are presented in Figure 10.
The docking scores of the selected natural products in the crystallographic enzyme structures are presented in Figure 11. The levels of the minimal and maximal scores obtained for the group of natural SULT ligands are shown with lines. This selection resulted in a group of 159 compounds, where 40 compounds had a ROCS similarity above 1.5 to natural SULT ligands, 106 compounds had lower binding energy than the minimal energy obtained for the natural SULT ligands, and 13 compounds fulfilled both criteria. All these 13 compounds were derivatives of flavone or isoflavone. Their structures are presented in Figure 10. pounds fulfilled both criteria. All these 13 compounds were derivatives of flavone or isoflavone. Their structures are presented in Figure 10. The docking scores of the selected natural products in the crystallographic enzyme structures are presented in Figure 11. The levels of the minimal and maximal scores obtained for the group of natural SULT ligands are shown with lines. Generally, the docking scores of the selected natural products were located in the lower part of the score ranges for the natural SULT ligands (with some exceptions). Some of the compounds were again flavonoids, others had 3 or 4 fused aromatic rings or 2 phenyl rings with a bridge between them, similar to the compounds in cluster 6 of the natural SULT ligands, which also showed favorable docking scores. These compounds may have Generally, the docking scores of the selected natural products were located in the lower part of the score ranges for the natural SULT ligands (with some exceptions). Some of the compounds were again flavonoids, others had 3 or 4 fused aromatic rings or 2 phenyl rings with a bridge between them, similar to the compounds in cluster 6 of the natural SULT ligands, which also showed favorable docking scores. These compounds may have strong interactions with SULT1A1.

Protein Targets
Eleven crystallographic structures of SULT1A1 available in Protein Data Bank (PDB, https://www.rcsb.org, accessed on 16 August 2021) were downloaded and analyzed. Three alloforms, SULT1A1*1, SULT1A1*2, and SULT1A1*3 were considered. The crystallographic structures with PDB ID 4GRA for SULT1A1*1 and PDB ID 1Z28 for SULT1A1*3, crystallized in their apoforms, were selected as the only representatives of these alloforms in PDB. Among the nine structures available for SULT1A1*2, three are co-crystallized with Pnitrophenol (1LS6 is co-crystallized with two molecules of P-nitrophenol), another three with 7-hydroxy-2-oxo-2H-chromene-3-carbonitrile (3QV), one with estradiol (EST), one with naphthalen-2-ol (O3V), and one is in its apoform. Based on the analysis of proteinligand interactions in crystallographic structures, as well as the placement of loop Phe81-Ser91, which is important for the molecular clamp mechanism in SULT1A1, the PDB IDs 1LS6 co-crystallized with two molecules of P-Nitrophenol (NPO) and 2D06 co-crystallized with estradiol (EST) were selected for further consideration.
The binding site of SULT1A1 for docking was defined based on the amino acid residues, involved in the binding pockets of 1LS6 and 2D06 with their ligands.

Datasets Preparation
A library of natural compounds known to be ligands of SULTs was assembled. The library consists of 118 structures collected from the scientific literature [15, and is described in detail in the Results section.
A library with 1220 other natural products was taken from Ref. [72]. The 118 natural SULT ligands were not present in this library.

Chemical Structures Preparation and Calculation of Molecular Descriptors
In this study, the 3D structures of chemical compounds were built using MOE software v.2019.0102. The molecules were minimized with the MMFF94x force field. The molecular descriptors available in MOE were calculated (322 descriptors). They include number of atoms or chemical groups of a given type, electronic, steric, hydrophobic descriptors, shape connectivity indices, and semi-empirical quantum mechanical electronic descriptors. The semi-empirical PM3 Hamiltonian was used for calculation of the quantum mechanical descriptors. Descriptors without variance in the compound sets were removed from the subsequent analysis.

Molecular Dynamics
Molecular dynamics (MD) simulations were performed on three alloforms of SULT1A1 in presence of PAP, starting from PDB IDs: 4GRA (apo), 2D06 (holo with bound EST), 1LS6 (apo), and 1Z28 (apo) to ensure a broader conformational exploration of the SULT1A1 structure for subsequent docking simulations. MD simulations were performed with NAMD 2.11 [73] with the Charmm36 force field [74]. Each system underwent a 10,000-step minimization. The system was then gradually heated from 0 K to 310 K with 1000-step/K in an NVT simulation. Then, equilibration of 2 ns was performed. Finally, a 40 ns production run was performed with an NPT simulation. Three independent MD simulations at three different initial velocity conditions were executed for each system. Finally, structural clustering was done over the conformational space generated for each of the studied systems including the binding pocket with an RMSD cutoff of 1.5 Å. Six clusters for 4GRA (the representative of SULT1A1*1), 14 clusters for 2D06 (SULT1A1*2), 10 clusters for 1LS6 (SULT1A1*2), and 9 clusters for 1Z28 (SULT1A1*3) were generated. These conformations were thoroughly analyzed and compared, focusing on the orientation of amino acid residues for the binding and the placement of structural loops responsible for the flexibility of the binding pocket part. More attention was paid to the residues Phe81 and Phe84 and the loop Phe81-Ser91, which are important for the molecular clamp mechanism in SULT1A1. Thus, four conformations-one per each of the studied systems-were finally selected for further docking.

Docking
Docking studies of the three groups of compounds into the active site of SULT1A1 were done, applying the docking tool in MOE. Docking was performed in protein crystallographic structures and in MD-obtained structures (as described in the previous section). The PAP co-factor was kept during docking as a part of the protein receptor. Virtual screening docking protocol was applied, the default settings were used, and the best 30 docking poses of each ligand were kept. Two scoring functions, Alpha HB and London dG, were used. London dG is the default scoring function in MOE and it accounts for the average gain/loss of rotational and translational entropy, entropy loss due to conformational flexibility, geometric imperfections of hydrogen bonds, geometric perfections of metal ligation, and desolvation energy of each atom. Alpha HB was selected as focused on compounds containing H-bond donors and acceptors. This scoring function is a linear combination of two terms: the first term estimates the steric fit of the ligand to the binding site, and the second term estimates hydrogen bond effects.

Statistical Analysis
Statistical analysis, including one-way ANOVA, Principal Component Analysis (PCA), and cluster analysis, was performed with Statistica 12.7 (StatSoft. Inc., Tulsa, OK, USA, www.statistica.com/). Default settings were used in the PCA, and mean substitution was used for missing descriptor data. Cluster analysis was done with Ward's linkage rule and squared Euclidean distance as a similarity measure.

ROCS Software
ROCS software v.3.3.0.3 (OpenEye Scientific Software, Inc., Santa Fe, NM, USA https://www.eyesopen.com) was used to identify chemically similar compounds. The procedure included aligning and scoring the dataset of molecular structures of the other natural products to template molecular structures of the group of SULT ligands. The Tanimoto-Combo score (shape + "color"), which combines the shape and chemical features of the aligned datasets, was used to score the similarity. The score varies from 0 (non-similar compounds) to 2 (very similar compounds).

Conclusions
Different in silico approaches were used to analyze the chemical space of natural SULT ligands and their interactions with SULT1A1. These were compared to synthetic SULT ligands and other natural products, for which interactions with SULT have not been reported yet.
We outlined here the natural SULT ligands diversity in their structural and physicochemical properties. The ranges of the physico-chemical descriptors for the synthetic SULT ligands were found to be comparable or narrower than the corresponding descriptor ranges for the natural SULT ligands, but some properties in terms of polarity differed between the two groups. These differences do not impact their interactions with SULTs.
The selected descriptors of the natural SULT ligands were found to be in the lower range of descriptor values of the other natural products, except logP(o/w). The two groups of compounds were similar in relation to the number of H-bond donor atoms, logP(o/w), ASA+, and the ASA_P, but some properties differed (size, number of rings, H-bond acceptor properties, ASA−, and ASA_H), also confirmed by the grouping observed in the cluster analysis.
Combining chemical space analysis and docking of natural SULT ligands, flavonoids and similar compounds were outlined as the most favorable ligands for SULT1A1 in agreement with published data (e.g., Ref. [75]). Interestingly, the search for other natural products with similarities to the natural SULT ligands and with good predicted binding energies resulted in identification of new putative natural products interacting with SULT1A1. Further elucidation of these compounds would be necessary in order to validate them as inhibitors or substrates of SULT1A1.