Structural and Dynamic Characterizations Highlight the Deleterious Role of SULT1A1 R213H Polymorphism in Substrate Binding

Sulfotransferase 1A1 (SULT1A1) is responsible for catalyzing various types of endogenous and exogenous compounds. Accumulating data indicates that the polymorphism rs9282861 (R213H) is responsible for inefficient enzymatic activity and associated with cancer progression. To characterize the detailed functional consequences of this mutation behind the loss-of-function of SULT1A1, the present study deployed molecular dynamics simulation to get insights into changes in the conformation and binding energy. The dynamics scenario of SULT1A1 in both wild and mutated types as well as with and without ligand showed that R213H induced local conformational changes, especially in the substrate-binding loop rather than impairing overall stability of the protein structure. The higher conformational changes were observed in the loop3 (residues, 235–263), turning loop conformation to A-helix and B-bridge, which ultimately disrupted the plasticity of the active site. This alteration reduced the binding site volume and hydrophobicity to decrease the binding affinity of the enzyme to substrates, which was highlighted by the MM-PBSA binding energy analysis. These findings highlight the key insights of structural consequences caused by R213H mutation, which would enrich the understanding regarding the role of SULT1A1 mutation in cancer development and also xenobiotics management to individuals in the different treatment stages.


Architecture of SULT1A1 and Stability of Simulation Systems
The SULT1A1 is a sphere-shaped protein, which has a single α/β domain and α-helix surrounded five β-sheets at the central region ( Figure 1a). The β-sheets are responsible for the substrate binding and are conserved in cytosolic sulfotransferase [32]. The dimerization site relies on the N terminal end of sulfotransferase and hypothesized as the source of substrate inhibition [33]. The catalytic activity of SULT1A1 depends on the Glu 83 , Asp 134 , Glu 246 , and Asp 263 residues [2,34]. The mentioned structure contains a connected molecule named p-nitrophenol (PNP). The motif, 257 RKGMAGDWKXXFT 269 , which is similar to the P-loop observed in ATP and GTP binding proteins, is supposedly essential for the PAPS binding. The 5 -phosphate positioning is important for the organizing co-factor, thus transferring sulfonate group to the substrate [5,17]. The 45 TYPKSGTT 52 loop is termed as phosphosulfate binding (PSB) loops of SULT1A1 and responsible for binding 5 -phosphate group of PAP. There are three substrate binding loops of SULT1A1 named as Loop1 to Loop3 with residues of 81-90, 145-154, and 235-263, respectively. The substrate binding site of SULT1A1 is hydrophobic and flexible, permitting this enzyme to embrace various structures, thus it can interact with a variety of L-shaped uncharged substrates including, aromatics (diiodothyronine), tiny aromatics (PNP), and fused ring compounds [2,31,35,36]. The mutation of our interest, R213H is located between Loop2 and 3, and its contribution to SULT1A1 dysfunction was systematically analyzed by 100 ns molecular dynamics simulation by considering both wild and mutant type structures in the presence and absence of substrate (Figure 1b,c). from the RMSD calculation (Figure 1d). Although it has been ascertained that many proteins undergo side-chain flexibilities upon the binding of ligands [41], the findings derived from RMSD calculation are consistent with the previous experimental results that ligand binding decreases backbone flexibility of SULT1A1 [42][43][44][45][46]. Furthermore, the RMSD patterns were changed due to the mutation in both apo and holo form, indicating that mutation induced the changes in overall structure, in terms of flexibility and ligand stability. and mutated (R213H) (c) type structures. Protein thermodynamics stability during the simulation was evaluated through rootmean-square deviations (RMSDs) for wild, R213H, PNP-Wild complex, and PNP-R213H complex, by considering backbone atoms (C, Cα, and N) of protein (d). Here, dark green, light green, orange, and dark red colors denote wild, PNP-Wild, R213H, and PNP-R213H, respectively.

Effects of Mutation on Conformation Stability
To observe the outcome of the mutation on inclusive protein stability, the radius of gyration (Rg) was calculated, which indicates the mean-square mass-weighted root range of a set of atoms that shared the mass centre [47]. The higher Rg values indicate loose packing of the protein structure, which means conformation that is more flexible. Additionally, the solvent accessible surface area (SASA) was calculated, where the lower values of SASA indicates the contracted nature of protein [48]. According to the Rg analysis, no significant difference was found between the wild and mutated protein, and a similar effect was also observed in case of SASA profile ( Figure 2). However, binding of the ligand showed the effect in protein conformation, where the mutated PNP-R213H complex Figure 1. Substrate binding motifs in SULT1A1 (a), along with wild (b) and mutated (R213H) (c) type structures. Protein thermodynamics stability during the simulation was evaluated through root-mean-square deviations (RMSDs) for wild, R213H, PNP-Wild complex, and PNP-R213H complex, by considering backbone atoms (C, Cα, and N) of protein (d). Here, dark green, light green, orange, and dark red colors denote wild, PNP-Wild, R213H, and PNP-R213H, respectively.
The root-mean-square deviation (RMSD) of the protein was analyzed in all systems for the original description of simulation performance and stability, reflecting the general thermodynamic stability of the system (Figure 1d) [37]. The RMSD represents the conformational stability of the protein structure during the course of the simulation, where the larger deviations indicate more flexibility of the protein structure [38]. According to the RMSD plot, all wild and mutant type structures with and without ligand achieved equilibrium after 10 ns and remained stable afterward, which allows an appropriate framework for further analyses [39,40]. In both wild and mutant types, conformational flexibility of the SULT1A1 was seen to reduce by the binding of ligand, as revealed from the RMSD calculation ( Figure 1d). Although it has been ascertained that many proteins undergo side-chain flexibilities upon the binding of ligands [41], the findings derived from RMSD calculation are consistent with the previous experimental results that ligand binding decreases backbone flexibility of SULT1A1 [42][43][44][45][46]. Furthermore, the RMSD patterns were changed due to the mutation in both apo and holo form, indicating that mutation induced the changes in overall structure, in terms of flexibility and ligand stability.

Effects of Mutation on Conformation Stability
To observe the outcome of the mutation on inclusive protein stability, the radius of gyration (Rg) was calculated, which indicates the mean-square mass-weighted root range of a set of atoms that shared the mass centre [47]. The higher Rg values indicate loose packing of the protein structure, which means conformation that is more flexible. Additionally, the solvent accessible surface area (SASA) was calculated, where the lower values of SASA indicates the contracted nature of protein [48]. According to the Rg analysis, no significant difference was found between the wild and mutated protein, and a similar effect was also observed in case of SASA profile ( Figure 2). However, binding of the ligand showed the effect in protein conformation, where the mutated PNP-R213H complex showed the higher Rg value for around 18.13-18.5 Å and the lower Rg value was observed for the PNP-Wild complex around 18.1-18.48 Å (Figure 2a). In the case of the wild type, it was clearly seen that the flexibility of the protein was reduced after binding of the ligand, thus increasing the compactness of protein folding. In contrast, the mutant type retained similar flexibility patterns to its apo form. The similar fashion of structural deviation in wild and mutant types was also observed in the SASA calculation (Figure 2b), where binding of ligand reduced the total SASA values. The PNP-R213H complex showed a higher SASA value than the wild type complex. At the initial stage, R213H complex was seen to fluctuate but after 7 ns, it remained stable and continued to 50 ns. Afterward, the SASA values decreased rapidly at 60 ns and again remained stable at 70 ns to the rest of the simulation period.  (Figure 2a). In the case of the wild type, it was clearly seen that the flexibility of the protein was reduced after binding of the ligand, thus increasing the compactness of protein folding. In contrast, the mutant type retained similar flexibility patterns to its apo form. The similar fashion of structural deviation in wild and mutant types was also observed in the SASA calculation (Figure 2b), where binding of ligand reduced the total SASA values. The PNP-R213H complex showed a higher SASA value than the wild type complex. At the initial stage, R213H complex was seen to fluctuate but after 7 ns, it remained stable and continued to 50 ns. Afterward, the SASA values decreased rapidly at 60 ns and again remained stable at 70 ns to the rest of the simulation period. The PNP-Wild complex, on the other side, showed less SASA values about 126.5 nm 2 at 31 ns and its retained stability at 40 ns, and also persisted stable through the simulations. The overall flexibility of the protein can be modulated by the number of the internal hydrogen bond, which was calculated and shown in Figure 2c. However, no significant difference was observed in the total hydrogen bonding (Table S1). Overall, Rg and SASA calculation denoted the mutation-induced flexibility in the protein-ligand complex, which was consistent with the RMSD analysis. Figure 2. Stability of the protein during the simulation by means of radius of gyration (a), solvent accessible surface area (SASA; (b)), and number of intra-residue hydrogen bond (c) for both wild and mutated type structures. Here, dark green, light green, orange, and dark red colors denotes wild, PNP-Wild, R213H, and PNP-R213H, respectively.

Effects of Mutation in Protein Dynamics
Correlative motions play a crucial part in recognizing and interacting bio-macromolecular system, which can be achieved from the covariance equation of molecular fluctuation generated through the molecular dynamics (MD) simulation trajectory. The correlative motions of different simulation systems were analyzed by the dynamic cross correlation matrix (DCCM). The DCCM fluctuations are depicted in Figure 3, which graphically displays time-correlated information among The PNP-Wild complex, on the other side, showed less SASA values about 126.5 nm 2 at 31 ns and its retained stability at 40 ns, and also persisted stable through the simulations. The overall flexibility of the protein can be modulated by the number of the internal hydrogen bond, which was calculated and shown in Figure 2c. However, no significant difference was observed in the total hydrogen bonding (Table S1). Overall, Rg and SASA calculation denoted the mutation-induced flexibility in the protein-ligand complex, which was consistent with the RMSD analysis.

Effects of Mutation in Protein Dynamics
Correlative motions play a crucial part in recognizing and interacting bio-macromolecular system, which can be achieved from the covariance equation of molecular fluctuation generated through the molecular dynamics (MD) simulation trajectory. The correlative motions of different simulation systems were analyzed by the dynamic cross correlation matrix (DCCM). The DCCM fluctuations are depicted in Figure 3, which graphically displays time-correlated information among the residues of the protein. The deeper color intensity highlights more positive or negative correlated motions between the structures. The red color indicates strong positive correlation whereas; the blue color indicates a negative correlation. The R213H protein and PNP-R213H complex exhibited different movements compared to wild protein and PNP-Wild complex, respectively. Higher positive and negative correlations were observed in wild type, especially within the inter loop residues of the active site, that is, between loop3 and loop2 ( Figure 3a). Loop3 also showed negative correlation with the residues located in loop1, however, residues contained in the loop1 were further anti-correlated with the PSB loops containing residues. Besides, the mutation showed only the negative, but insignificant, correlations between the residues of PSB loop and loop1 ( Figure 3b). Interestingly, these correlations were lost in the case of ligand binding conferring the stable protein-ligand complex; instead, the complex generated random motions within the loops (Figure 3c). The ligand bounded R213H protein exhibited the same correlations as in unbounded forms. Moreover, a few but new types of correlations were observed between loop1 and loop3 (Figure 3d), which was different from the wild complex.
In addition, principle component analysis (PCA) was performed to observe whether these correlations resulted in any significant flexibility of the protein. The PCA can be applied to any system and permits to study the influence of any varying parameters, by lessening the collective motions complexity [49][50][51][52], which is associated with the phase space behavior related to protein functions and stability. Therefore, it is frequently used to characterize the conformational variances involved in the protein folding [53,54], opening and closing mechanisms of ion channels and other proteins [55][56][57][58], as well as the conformational dynamics induced by the mutations [46,50,59,60] and the ligand binding [61]. The motion of the C α atoms was described by the eigenvectors of the covariance matrix, which were calculated in PCA analysis and resulted in first PCA of 22.72%, 23.03%, 16.48%, and 21.05% for wild, R213H, wild-PNP, and R213H-PNP systems, respectively ( Figure S1). Compared to the wild and wild-PNP systems, the apo and holo structure in the mutated form showed unusual patterns on to the phase space of the foremost two principal components (PC1 and PC2), describing the significant alternation in the protein conformation. Furthermore, the highest PC1 in the wild apo structure described that wild apo structure was undergone conformational changes, while its ligand binding state, i.e., binding with ligand reduced the conformational stability ( Figure 4a). A closure looks to the local residue mobility, in terms of PC1, ensured that the conformational changes were induced in the active site loops, which was decreased binding of ligand and in mutated forms ( Figure 4b).
In mutated form, the holo structure increased the movement in loop3, while reduced in PSB loop and loop1 and 2, compared to both wild type structures ( Figure 4b). These results are correlated with the finding from DCCM analysis and confirmed that the residues in the active site endured conformational changes due to the mutation. Here, strong correlative motions (±4 to ±1) were depicted in three dimensional cartoon model on the right side of each panel. In all cases, correlations were represented in the color spectrum, from red to blue. The positive correlation between two residues, that is, both residues move the same direction, is represented as red color. On the other hand, residues having opposite motion are denoted as the anticorrelative motion, marked as blue color. Here, strong correlative motions (±4 to ±1) were depicted in three dimensional cartoon model on the right side of each panel. In all cases, correlations were represented in the color spectrum, from red to blue. The positive correlation between two residues, that is, both residues move the same direction, is represented as red color. On the other hand, residues having opposite motion are denoted as the anti-correlative motion, marked as blue color. With the purpose of observing the dynamics changes in the active site, Define Secondary Structure of Proteins (DSSP) analysis was performed to analyze the structural stability in the loops, shown in Figure 5, which provides detailed information about the loss and gain of the secondary structure all through the simulation. As shown in the results, the secondary structure content was stably maintained by all residues in both wild and mutant types, except the residues located in the loop3. The residues located 235-263, showed diverse conformations throughout the simulation in different systems. In wild type, these residues, including 255-263, showed 3-helix conformation over time ( Figure S2). However, this conformation was changed to A-helix after 20 ns of simulation due to the mutation ( Figure S2c). Furthermore, mutation reduced the bend conformation of the loops, which seemed to increase in case of wild complex structure ( Figure S2c). In both apo and holo structures, mutation induced the persistency of B-bridge conformation in the loop3 ( Figure S2) denoting that mutation increased the rigidity of the loops that may damage the plasticity of the active site. With the purpose of observing the dynamics changes in the active site, Define Secondary Structure of Proteins (DSSP) analysis was performed to analyze the structural stability in the loops, shown in Figure 5, which provides detailed information about the loss and gain of the secondary structure all through the simulation. As shown in the results, the secondary structure content was stably maintained by all residues in both wild and mutant types, except the residues located in the loop3. The residues located 235-263, showed diverse conformations throughout the simulation in different systems. In wild type, these residues, including 255-263, showed 3-helix conformation over time ( Figure S2). However, this conformation was changed to A-helix after 20 ns of simulation due to the mutation ( Figure S2c). Furthermore, mutation reduced the bend conformation of the loops, which seemed to increase in case of wild complex structure ( Figure S2c). In both apo and holo structures, mutation induced the persistency of B-bridge conformation in the loop3 ( Figure S2) denoting that mutation increased the rigidity of the loops that may damage the plasticity of the active site.

Effects of Mutation in Active Site and Ligand Binding
Before the detailed investigation, the residue fluctuation profile due to the mutation was considered, thus RMSF analysis was performed on simulation trajectories ( Figure 6). The RMSF calculation represents the dynamics of the backbone atoms, where higher values designate greater flexibility through the MD simulation. The highest peaks on this graph show the protein regions, which were mostly fluctuated throughout the simulation period. The protein in all simulation systems showed RMSF within a range of 0.2-9.8 Å. As can be seen in Figure 6, the overall RMSF value of the residues of wild and mutated proteins was almost the same in the PSB loop, loop1, and loop2, however, the difference was observed in loop3, indicating a major binding site of the ligand. Evidently, R213H mutation induced higher fluctuation in the loop3 segment compared to the wild type, which was reduced by ligand binding. For instant, binding of ligand showed less fluctuation compared to wild type, while the mutation reduced more than that of wild type.

Effects of Mutation in Active Site and Ligand Binding
Before the detailed investigation, the residue fluctuation profile due to the mutation was considered, thus RMSF analysis was performed on simulation trajectories ( Figure 6). The RMSF calculation represents the dynamics of the backbone atoms, where higher values designate greater flexibility through the MD simulation. The highest peaks on this graph show the protein regions, which were mostly fluctuated throughout the simulation period. The protein in all simulation systems showed RMSF within a range of 0.2-9.8 Å. As can be seen in Figure 6, the overall RMSF value of the residues of wild and mutated proteins was almost the same in the PSB loop, loop1, and loop2, however, the difference was observed in loop3, indicating a major binding site of the ligand. Evidently, R213H mutation induced higher fluctuation in the loop3 segment compared to the wild type, which was reduced by ligand binding. For instant, binding of ligand showed less fluctuation compared to wild type, while the mutation reduced more than that of wild type. Accordingly, the ligand RMSD and intermolecular interaction profiling was considered to analyze the effect of rigid behavior of loop3 on ligand binding, where the plot showed that mutation reduced the ligand flexibility in the active site, while ligand was seemed to be very flexible in the pocket of native structure, as revealed from the wild type complex (Figure 7a). Likewise, the intermolecular hydrogen bonding analysis showed that the number of hydrogen bonding partners, specifically, residues from the active site was decreased in the mutated structure; the compound only showed maximum interaction with His108 ( Figure 7c). Surprisingly, mutation also reduced the total contact with the active site residue (Figure 7e), and induced a notable change in protein-ligand interaction network. Specifically, the total number of ligand-mediated contact was reduced in the mutant complex in the region of loop1 and loop3 (Figure 7e), however, mutation induced the total contact with the residues in loop2, including Ala146, Val148, and His149. Besides the ligand also maintained the contact with the residues from PSB loop such as Pro47 and Lys48, whether these inter molecular contacts were favorable to binding affinity or not, additionally binding energy calculated by MM-PBSA method was conducted on resulted 4000 snapshots from 100 ns simulation [62]. MM-PBSA binding energy results were depicted against time for wild-PNP and PNP-R213H, which is shown in Figure 7b. The average binding energy for wild-PNP complex was 110 kJ/mol, while 60 kJ/mol for mutant types (Table 1). Previous studies based on biochemical pharmacogenetic assays presented that, R213H caused very low thermostable sulfonation activity to 4 mM 4-nitrophenol, while the wild type exerted a higher binding affinity [63,64]. Consistent with previous findings, the Accordingly, the ligand RMSD and intermolecular interaction profiling was considered to analyze the effect of rigid behavior of loop3 on ligand binding, where the plot showed that mutation reduced the ligand flexibility in the active site, while ligand was seemed to be very flexible in the pocket of native structure, as revealed from the wild type complex (Figure 7a). Likewise, the intermolecular hydrogen bonding analysis showed that the number of hydrogen bonding partners, specifically, residues from the active site was decreased in the mutated structure; the compound only showed maximum interaction with His108 ( Figure 7c). Surprisingly, mutation also reduced the total contact with the active site residue (Figure 7e), and induced a notable change in protein-ligand interaction network. Specifically, the total number of ligand-mediated contact was reduced in the mutant complex in the region of loop1 and loop3 (Figure 7e), however, mutation induced the total contact with the residues in loop2, including Ala146, Val148, and His149. Besides the ligand also maintained the contact with the residues from PSB loop such as Pro47 and Lys48, whether these inter molecular contacts were favorable to binding affinity or not, additionally binding energy calculated by MM-PBSA method was conducted on resulted 4000 snapshots from 100 ns simulation [62]. MM-PBSA binding energy results were depicted against time for wild-PNP and PNP-R213H, which is shown in Figure 7b. The average binding energy for wild-PNP complex was 110 kJ/mol, while 60 kJ/mol for mutant types (Table 1). Previous studies based on biochemical pharmacogenetic assays presented that, R213H caused very low thermostable sulfonation activity to 4 mM 4-nitrophenol, while the wild type exerted a higher binding affinity [63,64]. Consistent with previous findings, the MM-PBSA analysis thus revealed that the mutation reduced the binding affinity to the PNP, as a result of changing intermolecular contacts, which might contribute to low sulfonation activity. MM-PBSA analysis thus revealed that the mutation reduced the binding affinity to the PNP, as a result of changing intermolecular contacts, which might contribute to low sulfonation activity.

Insights Into Substrate Binding
Since the mutation disrupted significant intermolecular interactions between the loop residues and the ligand, the conformational variability of these loops was mapped onto the free energy landscape by considering the radius of gyration of loop1 and loop3 as the reaction coordinates. The disperse area with color depth from light to dark red color indicates the conformational variability of the loops, where the darker area represents the conformation with the lowest energy minima. The degree of dispersion describes the conformational flexibility, where the larger area denotes more flexibility. As can be seen in the Figure 8, the conformation of the loops with the lowest energy was dispersed through the projection, where the two largest resulting clusters were seen to shift towards the larger Rg value, describing the loops flexibilities during the simulation (Figure 8a). On the other hand, the energy minima of R213H was seen to shift towards the lower Rg values, in the case of both loops, demonstrating the loss of loop flexibility (Figure 8b). In the case of ligand binding, wild type showed normal distribution in the phase behavior, moreover concentrated, which explains loop contribution to the ligand binding (Figure 8c). Consequently, mutant reduced the conformation flexibility more than the wild type in case of ligand binding, where the flexibility of loop3 was seen to be reduced, while the distribution energy minima were shifted toward the larger value of loop1 (Figure 8d). Since the mutation disrupted significant intermolecular interactions between the loop residues and the ligand, the conformational variability of these loops was mapped onto the free energy landscape by considering the radius of gyration of loop1 and loop3 as the reaction coordinates. The disperse area with color depth from light to dark red color indicates the conformational variability of the loops, where the darker area represents the conformation with the lowest energy minima. The degree of dispersion describes the conformational flexibility, where the larger area denotes more flexibility. As can be seen in the Figure 8, the conformation of the loops with the lowest energy was dispersed through the projection, where the two largest resulting clusters were seen to shift towards the larger Rg value, describing the loops flexibilities during the simulation (Figure 8a). On the other hand, the energy minima of R213H was seen to shift towards the lower Rg values, in the case of both loops, demonstrating the loss of loop flexibility (Figure 8b). In the case of ligand binding, wild type showed normal distribution in the phase behavior, moreover concentrated, which explains loop contribution to the ligand binding (Figure 8c). Consequently, mutant reduced the conformation flexibility more than the wild type in case of ligand binding, where the flexibility of loop3 was seen to be reduced, while the distribution energy minima were shifted toward the larger value of loop1 (Figure 8d). In order to understand the effect of mutation in the substrate binding, the conformer having the lowest energy was further isolated from the simulation trajectories of holo protein based on the additional free energy landscape analysis. The free energy landscape was thus constructed based on the RMSD and Rg time-dependent changes to the original structure, which demonstrates the effect of mutation on global conformational space of SULT1A1 ( Figure S3). The energy minima in the figure were represented from red to blue color, where the more concentrated blue zones described the more stable conformation with global minimum energy. On the other hand, the shape and size of the area display the conformational stability of the protein. As can be seen in Figure S3, the distribution of global energy minima was changed due to the mutation, although both wild and mutant types exhibited Gibbs free energy varying from 0 to 11 kcal/mol. Since SULT1A1 in both wild and mutant In order to understand the effect of mutation in the substrate binding, the conformer having the lowest energy was further isolated from the simulation trajectories of holo protein based on the additional free energy landscape analysis. The free energy landscape was thus constructed based on the RMSD and Rg time-dependent changes to the original structure, which demonstrates the effect of mutation on global conformational space of SULT1A1 ( Figure S3). The energy minima in the figure were represented from red to blue color, where the more concentrated blue zones described the more stable conformation with global minimum energy. On the other hand, the shape and size of the area display the conformational stability of the protein. As can be seen in Figure S3, the distribution of global energy minima was changed due to the mutation, although both wild and mutant types exhibited Gibbs free energy varying from 0 to 11 kcal/mol. Since SULT1A1 in both wild and mutant types exhibited stable conformation in the lowest energy level of 0 kcal/mol, the respective conformation was visualized and analyzed for intermolecular interaction analysis ( Figure S3).
Interestingly, significant fluctuations were noted between wild and mutant forms in the protein-ligand interactions (Figure 9). In wild complex, the ligand formed multiple hydrophobic interactions with the residues from loop1 and loop3, including Phe81, Phe84, Val243, and Met248 (Figure 9a). On the other hand, it showed only hydrophobic interaction with Phe24 residue and hydrogen bond with His108 in case of mutant type (Figure 9b), which indicates that mutation might alter the active site of the protein, therefore the volume and total SASA of the ligand binding pocket was calculated with the help of CASTp server, where the higher SASA is more pronounced for hydrophilicity, shown in Table 1. The analysis displayed that mutation increased the active site volume and SASA of apo SULT1A1, however, the volume was reduced in complex with ligand, in both cases, SASA was still increased (Table 1). types exhibited stable conformation in the lowest energy level of 0 kcal/mol, the respective conformation was visualized and analyzed for intermolecular interaction analysis ( Figure S3). Interestingly, significant fluctuations were noted between wild and mutant forms in the proteinligand interactions (Figure 9). In wild complex, the ligand formed multiple hydrophobic interactions with the residues from loop1 and loop3, including Phe81, Phe84, Val243, and Met248 (Figure 9a). On the other hand, it showed only hydrophobic interaction with Phe24 residue and hydrogen bond with His108 in case of mutant type (Figure 9b), which indicates that mutation might alter the active site of the protein, therefore the volume and total SASA of the ligand binding pocket was calculated with the help of CASTp server, where the higher SASA is more pronounced for hydrophilicity, shown in Table 1. The analysis displayed that mutation increased the active site volume and SASA of apo SULT1A1, however, the volume was reduced in complex with ligand, in both cases, SASA was still increased (Table 1). Figure 9. The intermolecular non bonded interaction pattern is rendered for wild (a) and mutated (b) complex derived from the free energy landscape. Here, pink color denotes Pi-alkyl bond, yellow for pi-cation, and green for hydrogen bond, respectively. Per-residue decomposition analysis (c). Here, green color denotes PNP-Wild protein whereas, magenta color represents PNP-R213H protein.
Moreover, per-residue decomposition analysis, which describes the contribution of each resides in total binding energy, was conducted to elucidate the mutation-induced effect on ligand binding, based on the supplementary 25 ns MD simulation by GROMACS software. Noticeably, the mutation caused significant changes in the binding energy contribution, where it was represented that, mutation reduced the favorable contribution of the residues from loop1 and loop3, especially Phe247, Phe255 and Phe81, and Phe84 respectively, while it increased the unfavorable contributions of Leu82, Glue83, Ala86, and Tyr240 (Figure 8c). Furthermore, the mutation also showed a reduction in the total MM-PBSA binding energy than the wild type, supporting the previous MM-PBSA analysis by Figure 9. The intermolecular non bonded interaction pattern is rendered for wild (a) and mutated (b) complex derived from the free energy landscape. Here, pink color denotes Pi-alkyl bond, yellow for pi-cation, and green for hydrogen bond, respectively. Per-residue decomposition analysis (c). Here, green color denotes PNP-Wild protein whereas, magenta color represents PNP-R213H protein.
Moreover, per-residue decomposition analysis, which describes the contribution of each resides in total binding energy, was conducted to elucidate the mutation-induced effect on ligand binding, based on the supplementary 25 ns MD simulation by GROMACS software. Noticeably, the mutation caused significant changes in the binding energy contribution, where it was represented that, mutation reduced the favorable contribution of the residues from loop1 and loop3, especially Phe247, Phe255 and Phe81, and Phe84 respectively, while it increased the unfavorable contributions of Leu82, Glue83, Ala86, and Tyr240 (Figure 8c). Furthermore, the mutation also showed a reduction in the total MM-PBSA binding energy than the wild type, supporting the previous MM-PBSA analysis by YASARA dynamics calculation. These interpretations were consistent with fluctuated notion that conformational flexibility of loops in the active site plays a vital role in the ligand binding, which eventually disrupted by R213H.

Discussion
Sulfonation has been recognized as the vital metabolic reaction for metabolizing not only various endogenous compounds but also bioinactivation of a variety of xenobiotics [10,[65][66][67] through the action of cytosolic sulfotransferase (SULT). The genetic polymorphisms of SULT1A1 [26,[68][69][70] caused the theatrical changes in the protein function. In this study, we considered SULT1A1 R213H, which is associated with various cancers. To look for key structural consequences offered by the mutation for this deleterious behavior, a series of comparative analyses were accumulated in this study based on computational analysis.
The molecular dynamics simulation was thus conducted to reveal the dynamic changes of the protein due to the mutation in both apo and holo form, where the initial characterization by RMSD showed that mutations induced different transitional pathways from the initial to the final stage of simulation. Consistent with the RMSD profile, the Rg and SASA calculations also confirmed the mutation induced conformational flexibility in both apo and holo form of SULTA1, where mutated types increase the dimension and solvent accessibility of the protein. In dynamics condition, protein undergoes various conformational changes for particular function, where residual communication in terms of correlative motions serves a vital role in recognizing and binding of various biological macromolecules, and this communication is usually disrupted by the mutation [71]. The correlative motion in SULTA1 based on the DCCM analysis showed that mutation reduced the correlative motions particularly in the loop1 and loop3 of active site as a results structure flexibility is lost in those positions, which was clearly reflected through collective motion analysis by PCA. Furthermore, mutation also increased the correlated motions in mutated-PNP complex compared to the other systems, which indicates the functional disruption of SULTA1. These consequences were further supported by RMSF analysis, where the effect was revealed in the loop1 and loop3 regions.
The active site of SULT1A1 is plastic, which is maintained by the aforementioned three loops, can process various conformational changes to adopt a verity of aromatic compounds [2,31,35,36]. Furthermore, the binding of these aromatic substances to the active site is highly modulated by Phe142 and Phe81, which creates an entry portal that allows only the catalytic site to bind linear substrates [72]. On the other hand, Val148, Phe247, and Met248 in the loop3 maintained interactions with nitrogen containing groups of the substrate followed by water bridge and van der Waals bonds [31].
Previous study showed that substitution of His213 makes the protein more thermolabile than the wild type [73], where the crystal studies presented that the mutation influence both structural stability and substrate regulation of SULT1A1 [31]. In this regard, Lu and colleagues identified that the substrate binding affinity and kinetics are susceptible to the loss of secondary structures stability [74], which is achieved by the mutation. Perhaps, it has been presumed that the mutation might influence the PAPS binding to PSB loop by changing the interaction of Tyr62-Arg213 [30], thus contribute to loss of enzymatic activity. Interestingly, total contact analysis showed that mutation reduced total number of interactions with the ligand during the simulation, which was also revealed in the analysis of the most native structure with lowest energy minima, calculated from the free energy landscape. Furthermore, it appeared that the active site volume and total hydrophobicity were reduced (as SASA is increased) due to the mutation; thus, the ligand flexibility in the active site was hindered, which was necessary for maintaining maximal interactions with the pocket residues. Besides, the increase of hydrophilicity in the active site resulted in unfavorable binding of the ligand, which might affect ligand binding properties as well as selectivity.
Overall, the PAPS site of cytosolic SULTs is characterized by the conserved residues, however, the core hydrophobic substrate binding site exhibits wide range of substrate specificity, which allows sulfonation for a specific substrate [2]. Accordingly, the uncharged phenolic compounds are preferred by SULT1A1, where the mutation R213H disorders the plasticity of substrates binding loops and increased the hydrophilicity of the pocket, affecting the sulfonation process, and thus contributes in cancer progression.

Preparation of the Simulation System
The structure of SULT1A1 has been extracted from the protein database (http://www.rcsb.org/ pdb) [75] in three-dimensional crystal frameworks with a protein data bank (PDB) ID of 1LS6 [31], having the molecular mass of 34.165 kDa. The structure was initially prepared by adding bond orders, hydrogen and charges; and also refined by removing water molecules and optimizing it at neutral pH. The structure was further fixed by correcting some amine derivatives, thiol groups, hydroxyl groups, protonated glutamic acids, aspartic acids, and histidines. In order to adjust the heavy atom of the structure, energy minimization has been applied by using optimized potentials for liquid simulation (OLPS3) force field until the RMSD reached to 0.30 Å. Since the structure already contained the histidine residue at the position of 213, the wild type structure (H213R) was further constructed by computational mutagenesis methodology, using Schrödinger Mutate Residues script 2017-1 suite (LLC, New York, NY, USA) [76]. Additional refinement by short molecular dynamics simulation was performed to improve the structure quality. For that, 500 ps of molecular dynamics simulation was conducted by using Yet Another Model Building and Energy Refinement force field (YAMBER3) force field [77], followed by maintaining 298 K temperature at pH 7.4. The simulation was performed in the transferable intermolecular potential3 points (TIP3P) water model with a density of 0.997. Based on the lowest energy, the final representative structure was considered for further study. For ligand binding characterization, the native co-crystal ligand, p-nitrophenol (PNP) was only considered and kept in protein structure. Accordingly, four individual simulation systems were developed, including wild, wild-PNP, R213H, and R213H-PNP.

Molecular Dynamics Simulation
The consequence of mutation on SULT1A1 protein structure and stability was analyzed by molecular dynamics simulation. The YASARA dynamic software (YASARA Biosciences GmBH, Vienna, Austria) [78] was used to analyze the dynamics of the wild and R213H type structure in the presence and absence of ligand, according to the previous protocols [59,62,79,80]. The process was carried out by cleaning each structure, followed by optimizing the hydrogen bond network. A cubic simulation cell was then constructed with a periodic boundary condition. The assisted model building with energy refinement14 (AMBER14) force field [78,81,82] was used to input the atoms of every single complex. This force field was preferred because, in regards to knowledge-based interactive potentials of the homology models, it was optimized for structure prediction, refinement, and energy minimization. The ligand was parameterized by AutoSMILES [83] algorithm, which used combined, AM1BCC [84] and General AMBER Force Field (GAFF) [81] for typing atomic charges and bond orders to unknown organic molecules. The TIP3P solvation system was used with 0.997 g/L density for solvating the simulation cell, and acid dissociation constant values (pKa) were calculated for titratable amino acids in the protein [85]. The TIP3P solvent system was known to give the best experimental output with a combination of AMBER14 force field [86][87][88][89]. The pH of the system was maintained at 7.4 to mimic the physiological conditions, and accordingly, the protonation states of each amino acid residue were determined in combination with H-bonding network and SCWRL algorithm, which employs dead-end elimination and graph theory [90]. Furthermore, the solvation system was supplemented with Na + and Cl − ions [91]. Followed by energy minimization through the steepest descent approach, the conformational stress of the system was reduced by applying simulated annealing protocol. In order to explain the long-range electrostatic interactions, the Ewald particle mesh (PME) was used, where the distance cut-off was set to 8 Å. Finally, 100 ns molecular dynamics simulation was performed with Berendsen thermostat at a time step interval of 2.00 fs, together with a multiple time step algorithm [92,93]. The pressure was set constant and trajectories were accumulated with an interval of 25 ps. A preinstalled macro (md_run.mcr) within the YASARA suite was used for all simulation phases, while the results was characterized using YASARA suite along with VMD (Version 1.9.3, Theoretical and Computational Biophysics Group, Urbana, IL, USA, 2016) [94] and DSSP (EMBL, Heidelberg, Germany) [95,96] tools. The corresponding trajectories were used in different evaluative procedures to analyze structural insights and stability through root mean square deviation (RMSD), radius of gyration (Rg), solvent accessible surface area (SASA), number of hydrogen bonds, dynamic cross correlation matrix (DCCM), root mean square fluctuation (RMSF), and secondary structure element (SSE) analysis.
Initially, the resultant trajectories were subjected to evaluate binding energy by MM-PBSA calculation, for that integral YASARA binging energy Macro was used. The method of calculation follows the theory of nuclear physics [97]; that is, the better binding is indicated by positive energy [62,98]. The equation is as follows, (1) where, YASARA built in Macros was used to calculate MM-PBSA binding energy, using AMBER 14 as a force field, where the more positive energies indicate the better binding [99]. The approach is similar to MM/PBSA, reported by [100].

Dynamic Cross-Correlation Map (DCCM) Analysis
The Bio3D [101] package integrated with R program was used to calculate residue-residue dynamic cross-correlations to determine how mutations influence the inner dynamics of protein conformations. The Cα-coefficient of cross-correlation was calculated by the average structure. Bio3D "DCCM" derives co-variation of the matrices internally from the coordinates provided and calculates Pearson's co-variance matrices correlation coefficients, calling on "cov2dccm". Based on the following equation, a cross-correlation ratio, C ij , has been considered for the Cα electrons [102]: where, the average location of i th and j th residues were represented by ∆r i and ∆r j . The time average was represented by the angular bracket. The DCCM values were between − 1 and + 1, where the positive correlation was represented by the positive value and negative correlation represented by negative values.

Principal Component Analysis (PCA)
Generally, PCA is used to understand the dynamics of any biological system. This technique has been used to achieve diagonal eigenvectors and the equivalent eigenvalues based on the calculation and diagonalization of the covariant atomic fluctuation equation [103,104]. This is also called by their eigenvectors are principal components (PCs), which supply data with their corresponding vectors on movement and displacement of the atoms. The mathematical details have been described previously [105,106].

Free Energy Landscape Analysis
The free energy landscape maps all the probable macromolecular conformational modifications, by emphasizing their respective energy levels to detect the location of the interaction molecules in the system [107,108]. In the evaluation of free energy landscape, the stability of protein was defined by Gibb's free energy calculation. It also represents the various structural nature of protein's structure-function correlation. In this study, the wild, and mutant protein free energy landscapes were analyzed by the following equations: where, Boltzmann's constant is signified through K B , T stands for temperature (300 K), population bin I is presented by N i and the most inhabited population bin was depicted by N max . As the smallest provability, an unnatural barrier scale was set for the bin without any populations. A color code model was used to display different energy levels.

Per Residue Energy Decomposition Analysis
In order to reveal the contribution of active site residue in the binding energy in both wild and mutated system, per residue energy decomposition analysis was performed by using the g_mmpbsa tool, as a function of APBS and GROMACS, thus independent MD simulations were individually performed for each of the complexes [109]. The lowest energy structures of protein-ligand complex derived from the free energy landscape analysis in both wild and mutated type were subjected for additional 25 ns molecular dynamics simulation by GROMACS software, according to the previously described method by Kumari et al. [109], where the protein was parameterized by the AMBER ff99SB force field [110], and the AM1-BCC and GAFF were considered for the van der Waals and bonded parameters for the ligand (See Supplementary file 1 for details). For the calculation, a series of 500 snapshots taken from 25 ns of the total simulation, with intervals of 50 ps [111]. The following equation has been used to calculate the binding free energy [112][113][114]: where, the total binding free energy is represented by ∆G bind , the energies of protein-ligand complex, protein (SULT1A1) and the ligand, PNP were depicted by G complex , G protein , and G ligand , respectively. Binding free energy can also be calculated using the equations e2-e5, where ∆G sol denoted as the solvation energy, ∆G pol and ∆G npol represents polar and nonpolar solvation energy components, respectively. The electrostatic interaction energy was represented by ∆E elec and van der Waals interaction energy represented by ∆E vdW . In equation (e5) γ and b are the experimental parameters, SASA stands for the solvent accessible surface area. Polar solvation effects have been computed by standard Poisson-Boltzmann equation, whereas SASA was used to calculate non-polar solvation energies through the following values γ (0.0226778 kJ.mol −1 ·Å −2 ) and b (3.84928 kJ.mol −1 ) [115].

Conclusions
The present study addressed the alteration of structural properties of SULT1A1 due to the R213H mutation, and also its effect on the substrates binding. A variety of structural characterization methods were applied followed by molecular dynamics simulations in this study, where the results highlighted significant structural differences in wild and mutant SULT1A1 proteins in terms of their native and ligand bounded states. Essentially, this mutation caused the actual conformational alteration in substrate binding loops, where the loop3 was mostly influenced. This manipulation changed the plasticity and intrinsic properties of the active site, which reduced not only the ligand flexibility in the pocket but also decreased the binding affinity, as well as the substrates recognition capabilities. Together with previous findings on SULT1A1 R213H, this study hereby contributes to a better understanding regarding the role of SULT1A1 mutation in cancer development.