Modified Protein-Water Interactions in CHARMM36m for Thermodynamics and Kinetics of Proteins in Dilute and Crowded Solutions

Proper balance between protein-protein and protein-water interactions is vital for atomistic molecular dynamics (MD) simulations of globular proteins as well as intrinsically disordered proteins (IDPs). The overestimation of protein-protein interactions tends to make IDPs more compact than those in experiments. Likewise, multiple proteins in crowded solutions are aggregated with each other too strongly. To optimize the balance, Lennard-Jones (LJ) interactions between protein and water are often increased about 10% (with a scaling parameter, λ = 1.1) from the existing force fields. Here, we explore the optimal scaling parameter of protein-water LJ interactions for CHARMM36m in conjunction with the modified TIP3P water model, by performing enhanced sampling MD simulations of several peptides in dilute solutions and conventional MD simulations of globular proteins in dilute and crowded solutions. In our simulations, 10% increase of protein-water LJ interaction for the CHARMM36m cannot maintain stability of a small helical peptide, (AAQAA)3 in a dilute solution and only a small modification of protein-water LJ interaction up to the 3% increase (λ = 1.03) is allowed. The modified protein-water interactions are applicable to other peptides and globular proteins in dilute solutions without changing thermodynamic properties from the original CHARMM36m. However, it has a great impact on the diffusive properties of proteins in crowded solutions, avoiding the formation of too sticky protein-protein interactions.


Introduction
Atomistic descriptions of biomolecules using molecular dynamics (MD) simulation are important to investigate structure-dynamics-function relationships [1,2]. Long-time MD simulations provide us reliable thermodynamic and kinetic properties of proteins and other biomolecules in solution, membrane, and other cellular environments, if accurate molecular force fields are available [3,4]. Atomistic MD simulations used to focus on conformational dynamics of small globular proteins, while large membrane proteins, nucleic acids, and protein-nucleic acid complexes like ribosome have become target systems of the simulations in these days [5][6][7][8]. Intrinsically disordered regions/proteins (IDRs/IDPs) and their assemblies including aggregations and liquid droplets in cells have also been examined using atomistic [9][10][11][12] and coarse-grained (CG) MD simulations [13][14][15][16][17][18]. All of these molecules are necessary for modeling highly heterogeneous and crowded cellular environments [19,20].
Molecules 2022, 27, 5726 3 of 18 examine various scaling parameters for LJ interactions between protein and water in MD simulations of (AAQAA) 3 in solution. To confirm the most reasonable scaling parameter, λ, several proteins including TDP-43 in the C-terminal domain (CTD) [31], chignolin [56], c-Src kinase [57], and villin [58] are simulated in dilute or crowded solution. Considering the slow folding/unfolding processes of these peptides/proteins, enhanced sampling algorithms like replica-exchange molecular dynamics (REMD) [59] for (AAQAA) 3 , Gaussian accelerated MD (GaMD) [60] for TDP-43, and Gaussian accelerated replica-exchange umbrella sampling (GaREUS) [61] for chignolin were employed. We employed different protocols based on the features of each peptide and analysis. In the case of (AAQAA) 3 , the calculation of the temperature dependence of helicity is required for comparison with the experiments. The REMD method is suitable for evaluating the temperature dependence while enforcing the conformational sampling. Since chignolin is a β-hairpin forming protein and its timescale for folding/unfolding process is longer than that for α-helix, enhanced sampling methods using the collective variables (CVs), such as GaREUS, are needed. TDP-43 is intrinsically disordered, and any CVs are not needed. Thus, GaMD can be used for accelerating the conformational changes without increasing the computational cost from the cMD. Following the work by Nawrocki et al. [27], we test the scaling parameters of protein-water LJ interactions in the range between 1.00 and 1.09. The most reasonable value that we obtain for CHARMM36m in conjunction with the modified TIP3P is largely different from the previously proposed values for AMBER ff99SB/ff03 and CHARMM36, suggesting that a better balance between protein-protein and protein-water interactions is realized in CHARMM36m [44]. However, the small scaling value that we suggest in this study could change the diffusive properties of proteins in crowded solutions.

Folding and Stability of Small Peptides in Solution
2.1.1. (AAQAA) 3 We performed REMD simulations [59] of (AAQAA) 3 using 58 replicas covering the temperature range between 275.0 and 382.0 K. In the simulations, we aim to examine the temperature dependence of the helicity with different scaling parameters, λ, for protein-water LJ interactions. Since the scaling is introduced in CHARMM using the NBFIX function, the scaling parameter, λ, and the term, NBFIX, are used as the same meaning in this paper. Following to the previous studies [44], we regard that the corresponding residues form an α-helix, when the backbone dihedral angles (φ and ψ) for three or more consecutive residues satisfy −100 • < φ < −30 • and −67 • < ψ < −7 • , respectively. The fraction of helix is computed from the number of residues forming α-helix, divided by the total number of residues. In Figure 1a, five curves obtained in the REMD simulations with λ = 1.00 (CHARMM36m), 1.03, 1.04, 1.06, and 1.09, are compared to the experimental results [62]. Although none of the simulation curves fit to the temperature dependence observed in the experiment, the fraction of helix at 300 K for λ = 1.00 (CHARMM36m) (0.17 ± 0.01) and 1.03 (0.15 ± 0.01) are close to the experimental one (~0. 2). Note that at λ = 1.00 in our simulation shows similar value to the previously reported in the CHARMM36m paper [44], despite the difference in water box size. In contrast, using larger scaling values, λ = 1.06 and 1.09, the fraction of helix is drastically reduced to less than 0.1 at all the temperatures. Even at 300 K, it is around 0.05, which is much smaller than the experimental value and those with λ = 1.00 and 1.03. Interestingly, the curve with λ = 1.04 shows medium fractions of helicity between λ = 1.03 and 1.06. In Figure 1b, the faction helix at 300 K versus the scaling parameter, λ, is well fitted with the sigmoidal curve, and the fitted parameters are (A,B,C,D) = (0.11, 1.04, −0.0049, 0.0055). Parameter B corresponds to the scaling factor for the middle point of the helicity curve.
and the fitted parameters are ( , , , ) = (0.11, 1.04, −0.0049, 0.0055). Parameter B corresponds to the scaling factor for the middle point of the helicity curve. In Figure 1c, the fraction of helix in each residue at 300 K is compared between the REMD simulations with the scaling parameter, λ, and the experimental results. The residue profiles with λ = 1.00 and 1.03 are similar to the experimental one, although that in residue 3 (Q3) is largely underestimated in the simulations. Interestingly, these profiles are also comparable to those predicted in recent AMBER force fields for IDPs (ff99SBws-STQ [35]). In other profiles with λ = 1.04, 1.06, and 1.09, the fraction of helix in each residue is scaled down almost uniformly from those with λ = 1.00 and 1.03.
We next examine the conformational spaces, which were explored in the REMD simulations with different scaling parameters. For this purpose, the two-dimensional potential of mean forces (2D-PMFs) at 300 K are shown along with the end-to-end distance, d, and the C root mean square deviation (RMSD) from the ideal -helix conformation (Figure 2). The folded structures are localized at d~24 Å and RMSD~0.5 Å, while the unfolded structures have a broad region with RMSD > 4 Å. Except for λ = 1.09, these two basins (for the folded and unfolded ones) clearly exist in all the PMFs. The PMFs with λ = 1.03 is close to that with the original CHARMM36m ( = 1.00), while the unfolded basin is slightly emphasized due to stronger protein-water interactions. At λ = 1.04, 1.06, and 1.09, the unfolded basins show deeper and wider free-energy minima as the protein-water interaction increases via the scaling values. These analyses suggest that the shape of the free-energy landscapes of (AAQAA)3 does not change significantly with different scaling parameters, λ, while the populations of folded and unfolded states are drastically altered. In Figure 1c, the fraction of helix in each residue at 300 K is compared between the REMD simulations with the scaling parameter, λ, and the experimental results. The residue profiles with λ = 1.00 and 1.03 are similar to the experimental one, although that in residue 3 (Q 3 ) is largely underestimated in the simulations. Interestingly, these profiles are also comparable to those predicted in recent AMBER force fields for IDPs (ff99SBws-STQ [35]). In other profiles with λ = 1.04, 1.06, and 1.09, the fraction of helix in each residue is scaled down almost uniformly from those with λ = 1.00 and 1.03.
We next examine the conformational spaces, which were explored in the REMD simulations with different scaling parameters. For this purpose, the two-dimensional potential of mean forces (2D-PMFs) at 300 K are shown along with the end-to-end distance, d, and the Cα root mean square deviation (RMSD) from the ideal α-helix conformation ( Figure 2). The folded structures are localized at d~24 Å and RMSD~0.5 Å, while the unfolded structures have a broad region with RMSD > 4 Å. Except for λ = 1.09, these two basins (for the folded and unfolded ones) clearly exist in all the PMFs. The PMFs with λ = 1.03 is close to that with the original CHARMM36m (λ = 1.00), while the unfolded basin is slightly emphasized due to stronger protein-water interactions. At λ = 1.04, 1.06, and 1.09, the unfolded basins show deeper and wider free-energy minima as the protein-water interaction increases via the scaling values. These analyses suggest that the shape of the free-energy landscapes of (AAQAA) 3 does not change significantly with different scaling parameters, λ, while the populations of folded and unfolded states are drastically altered.
The REMD simulation results of (AAQAA) 3 suggest that the scaling parameter, λ = 1.04, or larger values could underestimate the stability of α-helix, when it is applied to CHARMM36m in conjunction with the modified TIP3P. Unexpectedly, the previously proposed scaling parameter for CHARMM36, λ = 1.09, does not work well for (AAQAA) 3 . Since the peptide contains only two types of amino acids (Ala and Gln), more through tests are required to examine the applicability of λ = 1.03 in many other protein systems. Hereafter, we mainly discuss two scaling parameters, λ = 1.00 and 1.03 and examine several protein systems in dilute or crowded solution, whereas the results with λ = 1.09 are presented in Supplementary Information. The REMD simulation results of (AAQAA)3 suggest that the scaling parameter, λ = 1.04, or larger values could underestimate the stability of -helix, when it is applied to CHARMM36m in conjunction with the modified TIP3P. Unexpectedly, the previously proposed scaling parameter for CHARMM36, λ = 1.09, does not work well for (AAQAA)3. Since the peptide contains only two types of amino acids (Ala and Gln), more through tests are required to examine the applicability of λ = 1.03 in many other protein systems. Hereafter, we mainly discuss two scaling parameters, λ = 1.00 and 1.03 and examine several protein systems in dilute or crowded solution, whereas the results with λ = 1.09 are presented in Supplementary Information.

Chignolin
The folding of chignolin (PDB ID: 1UAO [56]), a -hairpin forming protein with 10 amino-acid residues (sequence: GYDPETGTWG), is simulated with GaREUS [61], where replica-exchange umbrella sampling (REUS) [63] and Gaussian accelerated MD (GaMD) [60] are combined. Since a folding process of a -hairpin takes longer time than -helical peptide, a powerful sampling method, such as GaREUS, is quite useful to get the converged thermodynamics within reasonable computational times. Experimentally, chignolin is reported to form both a stable -hairpin and misfolded structures, where the folded population from the NMR measurement is ~60%. We first test two scaling parameters, λ = 1.00, and 1.03 for the GaREUS simulations of chignolin in water and compare them with λ = 1.09.

Chignolin
The folding of chignolin (PDB ID: 1UAO [56]), a β-hairpin forming protein with 10 amino-acid residues (sequence: GYDPETGTWG), is simulated with GaREUS [61], where replica-exchange umbrella sampling (REUS) [63] and Gaussian accelerated MD (GaMD) [60] are combined. Since a folding process of a β-hairpin takes longer time than α-helical peptide, a powerful sampling method, such as GaREUS, is quite useful to get the converged thermodynamics within reasonable computational times. Experimentally, chignolin is reported to form both a stable β-hairpin and misfolded structures, where the folded population from the NMR measurement is~60%. We first test two scaling parameters, λ = 1.00, and 1.03 for the GaREUS simulations of chignolin in water and compare them with λ = 1.09. Figure 3a shows the probability densities along the Cα-RMSD with respect to the NMR structure are shown. In the CHARMM36m paper [44], chignolin, and its double mutant, CLN025 were simulated using REMD, showing the native populations of 2.6% and 41%, respectively. Interestingly, the folded population in the simulation with λ = 1.00 (CHARMM36m) has increased to 27%, suggesting the superiority of sampling efficiency in GaREUS compared to REMD for β-hairpin peptides. As λ increases, the folded conformations (RMSD < 2.2 Å) increases up to 33% for λ = 1.03. Note that the convergence check of the folded population using different trajectory lengths indicates that folded populations are well converged in the present simulation (Supplementary Figure S1). The decrease of the folded population is observed for λ = 1.09 (22%) (Supplementary Figure S2). In comparison, the folded population for λ = 1.09 is lower than those for λ = 1.00 and 1.03. Although the populations for λ = 1.00 and 1.03 are still lower than the NMR measurement [56], the performance of CHARMM36m for the structure prediction of β-hairpin peptides is not too bad as discussed in the original paper. Figure 3b shows the 2D-PMFs spanned with the ASP3-GLY7 and ASP3-THR8 distances, d(ASP3 − GLY7) and d(ASP3 − THR8) which are often used to describe the folded, misfolded, and unfolded states of chignolin [64]. As revealed in the previous study, the regions around (d(ASP3 − GLY7), d(ASP3 − THR8))~(3 Å, 6 Å) and~(6.5 Å, 3 Å) correspond to the folded and misfolded states. The overall shapes of the 2D-PMFs for λ = 1.00 and λ = 1.03 are similar to each other. However, the misfolded population for λ = 1.03 is slightly increased and the unfolded population is instead reduced compared to that for λ = 1.00. It is interesting that the folded population of the β-hairpin increases as the protein-water LJ interaction is slightly stronger, while the detailed mechanism is still unknown.
in GaREUS compared to REMD for -hairpin peptides. As λ increases, the folded conformations (RMSD < 2.2 Å) increases up to 33% for λ = 1.03. Note that the convergence check of the folded population using different trajectory lengths indicates that folded populations are well converged in the present simulation (Supplementary Figure S1). The decrease of the folded population is observed for λ = 1.09 (22%) (Supplementary Figure S2). In comparison, the folded population for λ = 1.09 is lower than those for λ = 1.00 and 1.03. Although the populations for λ = 1.00 and 1.03 are still lower than the NMR measurement [56], the performance of CHARMM36m for the structure prediction of -hairpin peptides is not too bad as discussed in the original paper. Figure 3b shows the 2D-PMFs spanned with the ASP3-GLY7 and ASP3-THR8 distances, (ASP3 − GLY7) and (ASP3 − THR8), which are often used to describe the folded, misfolded, and unfolded states of chignolin [64]. As revealed in the previous study, the regions around (ASP3 − GLY7), (ASP3 − THR8) ~(3 Å, 6 Å) and ~ 6.5 Å, 3 Å correspond to the folded and misfolded states. The overall shapes of the 2D-PMFs for λ = 1.00 and λ = 1.03 are similar to each other. However, the misfolded population for λ = 1.03 is slightly increased and the unfolded population is instead reduced compared to that for λ = 1.00. It is interesting that the folded population of the -hairpin increases as the protein-water LJ interaction is slightly stronger, while the detailed mechanism is still unknown.

TDP-43 in the CTD
Transactive response DNA binding protein 43 (TDP-43) is a versatile nucleic-acid binding protein, playing a central role in amyotrophic lateral sclerosis (ALS) pathogenesis. TDP-43 consists of a well-folded N-terminal domain (NTD), two-highly conserved RNArecognition motifs (RPMs), and an unstructured prion-like C-terminal domain (CTD). Residues 320-334 in the CTD are found to form a transient -helix in the previous studies with MD simulations and NMR spectroscopy [31]. We therefore select the residues 310-340 of TDP-43 as the second target system. In the simulations, we focus on the secondary structure formation of TDP-43 in the CTD and use the GaMD method [60] as an enhanced sampling method, which can reduce energy barriers between minimum energy states using a GaMD boost potential. We performed ten replicas of GaMD simulations (each for 1 μs) and took the averages of the fraction of helix in each residue. The effect of GaMD boost

TDP-43 in the CTD
Transactive response DNA binding protein 43 (TDP-43) is a versatile nucleic-acid binding protein, playing a central role in amyotrophic lateral sclerosis (ALS) pathogenesis. TDP-43 consists of a well-folded N-terminal domain (NTD), two-highly conserved RNA-recognition motifs (RPMs), and an unstructured prion-like C-terminal domain (CTD). Residues 320-334 in the CTD are found to form a transient α-helix in the previous studies with MD simulations and NMR spectroscopy [31]. We therefore select the residues 310-340 of TDP-43 as the second target system. In the simulations, we focus on the secondary structure formation of TDP-43 in the CTD and use the GaMD method [60] as an enhanced sampling method, which can reduce energy barriers between minimum energy states using a GaMD boost potential. We performed ten replicas of GaMD simulations (each for 1 µs) and took the averages of the fraction of helix in each residue. The effect of GaMD boost potential was removed by the reweighting scheme with the cumulant expansion proposed by Miao et al. [60].
The GaMD simulations were conducted with the scaling parameters of λ = 1.00 and λ = 1.03 to make a comparison with the experiment (Figure 4). The fractions of helices show large statistical errors, as the averaged values are just taken from ten independent GaMD runs. Comparison of λ = 1.00 and 1.03 show a small reduction of helicity in the CTD of TDP-43, as protein-water LJ interaction is stronger. For λ = 1.00, the computed helicity in residues 320-330 (region I, sequence: PAMMAAQAA) is in accord with the experimental values, while the helix formed around 335-345 (region II, sequence: GMMGMLASQQ) is found to be too stabilized. The stability of helix of region II is reduced with λ = 1.03, while the reduction of helicity in region I is also observed. In the case of λ = 1.09, the helicity of region II becomes close to the experimental observation, but the difference from the experiment in region I is further emphasized (Supplementary Figure S3).
show large statistical errors, as the averaged values are just taken from ten independe GaMD runs. Comparison of λ = 1.00 and 1.03 show a small reduction of helicity in th CTD of TDP-43, as protein-water LJ interaction is stronger. For λ = 1.00, the compute helicity in residues 320-330 (region I, sequence: PAMMAAQAA) is in accord with th experimental values, while the helix formed around 335-345 (region II, sequenc GMMGMLASQQ) is found to be too stabilized. The stability of helix of region II is reduce with λ = 1.03, while the reduction of helicity in region I is also observed. In the case of λ 1.09, the helicity of region II becomes close to the experimental observation, but the d ference from the experiment in region I is further emphasized (Supplementary Figure S3 The results suggest that the 1.03 scaling of protein-water LJ interaction keeps maj structural features of TDP-43 in the CTD, with some differences from the NMR exper mental data. Rg of TDP-43 in the CTD with λ = 1.03 does not change from that with λ 1.00 (Supplementary Figure S4). It suggests that the scattering function of TDP-43 corr sponding to the SAXS profile, which reflects the averaged feature of the protein structure would be unaltered using the present modification of protein-water LJ interactions. T reduce helicity in the region II, fine tuning of the backbone torsional angles might be ne essary like the previous studies of AMBER ff99SBws-STQ [35].

Dynamics and Stability of Globular Proteins in Solution
The c-Src kinase (PDB ID: 1Y57) [57] is one of the essential protein kinases, whic consists of the kinase domain (276 residues), SH2, and SH3 domains. Although these tw domains are necessary for the activation, we simulate only the kinase domain in water the previous computational works [65,66]. Five independent MD trajectories for eac value of λ (1.00 and 1.03) are analyzed to examine the effect of the scaling parameter, λ 1.03, on globular protein structures ( Figure 5). For comparison, we also performed on MD simulation for λ = 1.09, The distribution of the C -RMSDs excluding the residues ne the N-and C-terminals have a peak around ~2.5 Å for λ = 1.00 and 1.03 (Figure 5a), su gesting that the kinase domain stability is kept in both cases. Interestingly, the populatio The results suggest that the 1.03 scaling of protein-water LJ interaction keeps major structural features of TDP-43 in the CTD, with some differences from the NMR experimental data. Rg of TDP-43 in the CTD with λ = 1.03 does not change from that with λ = 1.00 (Supplementary Figure S4). It suggests that the scattering function of TDP-43 corresponding to the SAXS profile, which reflects the averaged feature of the protein structures, would be unaltered using the present modification of protein-water LJ interactions. To reduce helicity in the region II, fine tuning of the backbone torsional angles might be necessary like the previous studies of AMBER ff99SBws-STQ [35].

Dynamics and Stability of Globular Proteins in Solution c-Src Kinase
The c-Src kinase (PDB ID: 1Y57) [57] is one of the essential protein kinases, which consists of the kinase domain (276 residues), SH2, and SH3 domains. Although these two domains are necessary for the activation, we simulate only the kinase domain in water as the previous computational works [65,66]. Five independent MD trajectories for each value of λ (1.00 and 1.03) are analyzed to examine the effect of the scaling parameter, λ = 1.03, on globular protein structures ( Figure 5). For comparison, we also performed one MD simulation for λ = 1.09, The distribution of the Cα-RMSDs excluding the residues near the N-and C-terminals have a peak around~2.5 Å for λ = 1.00 and 1.03 (Figure 5a), suggesting that the kinase domain stability is kept in both cases. Interestingly, the population of the native state (<3.0 Å) with λ = 1.03 slightly increases compared to that with λ = 1.00. The distribution for λ = 1.09 shows a peak spreading around the RMSD of 4.5 Å, which is absent in the cases of λ = 1.00 and 1.03 (Supplementary Figure S5). It indicates the distortion of the kinase domain structure. Referring to Figure 5b, the Cα root mean square fluctuation (Cα-RMSF) analysis shows that the difference is hardly discernible between λ = 1.00 and λ = 1.03, except for the first 40 residues and the residues around 210-220. of the native state (<3.0 Å) with λ = 1.03 slightly increases compared to that with λ = 1.00. The distribution for λ = 1.09 shows a peak spreading around the RMSD of 4.5 Å, which is absent in the cases of λ = 1.00 and 1.03 (Supplementary Figure S5). It indicates the distortion of the kinase domain structure. Referring to Figure 5b, the C root mean square fluctuation (C -RMSF) analysis shows that the difference is hardly discernible between λ = 1.00 and λ = 1.03, except for the first 40 residues and the residues around 210-220. The time-series of the C -RMSDs with all the 276 residues in Supplementary Figure  S6 reveal larger deviations for λ = 1.03. This suggests that the conformational fluctuations near the N-and C-terminals are enhanced as protein-water LJ interaction increases (with λ = 1.03). The results of the C -RMSDs with and without N-or C-terminal residues (10 residues for each terminal) are reasonable, since we do not change intra-protein interactions of the CHARMM36m, but slightly emphasize protein-water interactions using the scaling parameter of λ = 1.03. The latter seems to affect the motions of N-and C-terminal residues in water, primarily, while the globular domain stability is almost unaltered. Note that the C -RMSD and C -RMSF obtained from the MD simulations with the LJ scaling parameters, λ = 1.00 and 1.03, are similar to those in a previous study using the AMBER ff99SB-ILDN [48]. Since the AMBER force field was tuned to reproduce the conformational dynamics and stability of folded native structures, the results obtained here seem to be promising for simulating the folded native protein structures in solution.

Structural Stability and Diffusivity of Globular Proteins in the Crowded Solutions
Finally, a small globular protein, villin headpiece subdomain (villin), was simulated with the conventional MD simulations in dilute solution as well as in crowded solution. Villin contains 35 amino-acid residues, composing of three -helices [PDB ID: 1VII [58]]. Due to the small size and conformational stability, it was often used to test simulation protocols, folding mechanisms [46], and the effect of macromolecular crowding and weak non-specific interactions [25][26][27]. Here, we prepared two systems: in one system, a single villin is simulated in water (dilute solution) and in the other, eight villins are simulated in solution (crowded solution). Note that in the crowded solution, both target and crowder proteins are villins. The concentration of the crowded solution is about 32 mM, which is the same as those used in our previous study [27].
The time-series of C -RMSDs with respect to the crystal structure in the dilute solution (Supplementary Figure S7a) reveals that villin keeps the native structure (2-3 Å) during the simulations with λ = 1.00. As for λ = 1.03, the RMSDs fluctuate around 2-3 Å for The results of the Cα-RMSDs with and without N-or C-terminal residues (10 residues for each terminal) are reasonable, since we do not change intra-protein interactions of the CHARMM36m, but slightly emphasize protein-water interactions using the scaling parameter of λ = 1.03. The latter seems to affect the motions of N-and C-terminal residues in water, primarily, while the globular domain stability is almost unaltered. Note that the Cα-RMSD and Cα-RMSF obtained from the MD simulations with the LJ scaling parameters, λ = 1.00 and 1.03, are similar to those in a previous study using the AMBER ff99SB-ILDN [48]. Since the AMBER force field was tuned to reproduce the conformational dynamics and stability of folded native structures, the results obtained here seem to be promising for simulating the folded native protein structures in solution.

Structural Stability and Diffusivity of Globular Proteins in the Crowded Solutions
Finally, a small globular protein, villin headpiece subdomain (villin), was simulated with the conventional MD simulations in dilute solution as well as in crowded solution. Villin contains 35 amino-acid residues, composing of three α-helices [PDB ID: 1VII [58]]. Due to the small size and conformational stability, it was often used to test simulation protocols, folding mechanisms [46], and the effect of macromolecular crowding and weak non-specific interactions [25][26][27]. Here, we prepared two systems: in one system, a single villin is simulated in water (dilute solution) and in the other, eight villins are simulated in solution (crowded solution). Note that in the crowded solution, both target and crowder proteins are villins. The concentration of the crowded solution is about 32 mM, which is the same as those used in our previous study [27].
The time-series of Cα-RMSDs with respect to the crystal structure in the dilute solution (Supplementary Figure S7a) reveals that villin keeps the native structure (2-3 Å) during the simulations with λ = 1.00. As for λ = 1.03, the RMSDs fluctuate around 2-3 Å for five trajectories, while one trajectory shows larger values of RMSD (~4 Å) as compared to the others due to the orientational change of the N-terminal α-helix (Supplementary Figure S7b). On the other hand, the structural characterization with the DSSP algorithm [67,68] reveals that the native secondary structures in the native structure are well conserved for all the trajectories (Figure 6a). In the crowded solution, most villins keep the structures close to the native state (2-3 Å) with λ = 1.00 (Supplementary Figure S8a), while a partial unfolding is observed in one of eight villins. This is rather consistent with our previous studies, which suggests the partial unfolding due to the weak and non-specific interactions. At λ = 1.03, the number of villins showing the large fluctuation compared with the dilute solution increases, while there was no unfolding trajectory in the eight villins (Supplementary Figure S8b. The secondary structures are conserved in the crowded solution (Figure 6b), which suggests that tertiary structures might be broken due to protein-protein interactions. By further increasing the scaling factor up to λ = 1.09, the breakdown of the native structure is observed both for the dilute and crowded solutions ( Supplementary Figures S9 and S10).
the others due to the orientational change of the N-terminal -helix ( Supplementary Figure S7b). On the other hand, the structural characterization with the DSSP algorithm [67,68] reveals that the native secondary structures in the native structure are well conserved for all the trajectories (Figure 6a). In the crowded solution, most villins keep the structures close to the native state (2-3 Å) with λ = 1.00 (Supplementary Figure S8a), while a partial unfolding is observed in one of eight villins. This is rather consistent with our previous studies, which suggests the partial unfolding due to the weak and non-specific interactions. At λ = 1.03, the number of villins showing the large fluctuation compared with the dilute solution increases, while there was no unfolding trajectory in the eight villins (Supplementary Figure S8b. The secondary structures are conserved in the crowded solution (Figure 6b), which suggests that tertiary structures might be broken due to protein-protein interactions. By further increasing the scaling factor up to λ = 1.09, the breakdown of the native structure is observed both for the dilute and crowded solutions ( Supplementary Figures S9 and S10). The translational diffusions of villin in the dilute and crowder solutions are examined. Note that the diffusion coefficient depends on the system size due to the periodic boundary condition (PBC). In the present study, we employ the correction proposed by Yeh and Hummer [69], represented as where and are the diffusion coefficient obtained from the slope of the mean square displacement (MSD), and correction term for PBC defined from the hydrodynamic radius of villin ( ), solvent viscosity ( ), and box length of the system ( ). The value of is taken from HYDROPRO [70] estimation as 13.86 Å. The viscosity of the TIP3P water, = 0.35 cP [71], is used for the dilute solution. As for the crowder solution, the Einstein's relationship between the crowder volume fraction ( ) and viscosity of the crowder solution ( ) for suspension is employed. In addition, we utilize the viscosity correction to the diffusion coefficient with the experimental water viscosity, = 0.89 cP [72].
The corrected coefficient is defined as . Further description is available in the Supplementary Information. In dilute solution, the values of for λ = 1.00 and 1.03 obtained from the linear fitting of the MSD at 30 /ns 50 are also close to each other ( Table 1). The corrected coefficients ( ) are 0.19 and 0.20 nm /ns for λ = 1.00 and 1.03, respectively. These values after the PBC and viscosity corrections are in excellent agreement with that predicted from HYDROPRO, 0.18 nm /ns. Note that the HYDROPRO predictions are generally The translational diffusions of villin in the dilute and crowder solutions are examined. Note that the diffusion coefficient depends on the system size due to the periodic boundary condition (PBC). In the present study, we employ the correction proposed by Yeh and Hummer [69], represented as where D MSD and D PBC are the diffusion coefficient obtained from the slope of the mean square displacement (MSD), and correction term for PBC defined from the hydrodynamic radius of villin (R h ), solvent viscosity (η), and box length of the system (L). The value of R h is taken from HYDROPRO [70] estimation as 13.86 Å. The viscosity of the TIP3P water, η TIP3P = 0.35 cP [71], is used for the dilute solution. As for the crowder solution, the Einstein's relationship between the crowder volume fraction (φ) and viscosity of the crowder solution (η c ) for suspension is employed. In addition, we utilize the viscosity correction to the diffusion coefficient with the experimental water viscosity, η expt = 0.89 cP [72]. The corrected coefficient is defined as D . Further description is available in the Supplementary Information. In dilute solution, the values of D MSD for λ = 1.00 and 1.03 obtained from the linear fitting of the MSD at 30 ≤ t/ns ≤ 50 are also close to each other ( Table 1). The corrected coefficients (D ) are 0.19 and 0.20 nm 2 /ns for λ = 1.00 and 1.03, respectively. These values after the PBC and viscosity corrections are in excellent agreement with that predicted from HYDROPRO, 0.18 nm 2 /ns. Note that the HYDROPRO predictions are generally close to the experimental values under the dilute condition, and hence this agreement indicates the reliability of CHARMM36m with λ = 1.03 about the description of the diffusive properties as well as the original CHARMM36m for the dilute solutions. As for the crowder solution, the diffusion coefficients from the MSDs (D MSD ) are 0.031 nm 2 /ns for λ = 1.00 and 0.048 nm 2 /ns for λ = 1.03, respectively (Figure 7b and Table 1). Also, the diffusion coefficient with the PBC correction (D PBC is 0.20 nm 2 /ns) is for λ = 1.03, which is close to that from CHARMM36 with λ = 1.09, 0.22 nm 2 /ns [27]. The small modification of protein-water LJ interaction leads to~1.5 times acceleration of translational motions, judging from MSD results (Figure 7b). These results suggest that the optimal scaling parameter, λ = 1.03, does not change conformational stability of globular proteins significantly, while it avoids from slowdown of translational diffusion for proteins in crowded solution, probably reducing too sticky protein-protein interactions in the MD simulations. As for the crowder solution, the diffusion coefficients from the MSDs ( ) are 0.031 nm /ns for λ = 1.00 and 0.048 nm /ns for λ = 1.03, respectively (Figure 7b and Table 1). Also, the diffusion coefficient with the PBC correction ( ) is 0.20 nm /ns for = 1.03, which is close to that from CHARMM36 with λ = 1.09, 0.22 nm /ns [27]. The small modification of protein-water LJ interaction leads to ~1.5 times acceleration of translational motions, judging from MSD results (Figure 7b). These results suggest that the optimal scaling parameter, λ = 1.03, does not change conformational stability of globular proteins significantly, while it avoids from slowdown of translational diffusion for proteins in crowded solution, probably reducing too sticky protein-protein interactions in the MD simulations.

Solvation Free Energies of Amino Acid Analogues
So far, we used the single scaling parameter, λ, to modify protein-water LJ interactions in MD simulations of peptides or proteins. To understand how the modification of the CHARMM36m affects molecular interactions between water and each amino acid, we compute the solvation free energies of the 14 amino acid homologs, associated with the solubility of molecules, through free energy perturbation (FEP) method ( Figure 8). For all the species, CHARMM36, CHARMM36m (λ = 1.00), and the modified CHARMM36m (λ = 1.03) are compared to the experimental values of Δ [73]. Figure 8 shows that all the calculations with λ = 1.03 become closer to the experimental results compared with others, suggesting that the modification seems to be valid for each amino acid. However, to achieve the quantitative agreement with the experiments, in particular, for the analogues of Asn, Gln, Met, Trp, and Tyr, a larger scaling factor is required. For instance, to achieve the quantitative agreement of solvation free-energies between FEP and experiments,

Solvation Free Energies of Amino Acid Analogues
So far, we used the single scaling parameter, λ, to modify protein-water LJ interactions in MD simulations of peptides or proteins. To understand how the modification of the CHARMM36m affects molecular interactions between water and each amino acid, we compute the solvation free energies of the 14 amino acid homologs, associated with the solubility of molecules, through free energy perturbation (FEP) method ( Figure 8). For all the species, CHARMM36, CHARMM36m (λ = 1.00), and the modified CHARMM36m (λ = 1.03) are compared to the experimental values of ∆G solv [73]. Figure 8 shows that all the calculations with λ = 1.03 become closer to the experimental results compared with others, suggesting that the modification seems to be valid for each amino acid. However, to achieve the quantitative agreement with the experiments, in particular, for the analogues of Asn, Gln, Met, Trp, and Tyr, a larger scaling factor is required. For instance, to achieve the quantitative agreement of solvation free-energies between FEP and experiments, about a 40% increase of protein-water LJ interactions (λ = 1.40) is necessary, which may not be applicable to MD simulations of peptides and proteins in solution. To resolve this inconsistency, furthermore careful tunings in molecular force fields are necessary not only for proteins but also for water molecules. about a 40% increase of protein-water LJ interactions (λ = 1.40) is necessary, which may not be applicable to MD simulations of peptides and proteins in solution. To resolve this inconsistency, furthermore careful tunings in molecular force fields are necessary not only for proteins but also for water molecules.

Discussion and Conclusions
In this study, we performed enhanced sampling MD simulations of small peptides in dilute solution and conventional MD simulations of globular proteins in dilute and crowded solutions using various scaling parameters of protein-water LJ interactions. For CHARMM36m in conjunction with the modified TIP3P water models, about 10% increase of the protein-water interaction affects peptide conformational stability significantly and only small increases (up to 3%) is allowed to keep a good balance between protein-protein and protein-water interactions. The modified force field is applicable not only to -helical but also -hairpin peptides and globular proteins in dilute solution. As enhanced sampling methods, we used REMD for (AAQAA)3, GaREUS for chignolin, and GaMD for TDP-43 at the CTD in dilute solution. The GaREUS simulation for chignolin shows higher populations of the folded conformation both for the original (λ = 1.00) and modified (λ = 1.03) CHARMM36m. Note that the original CHARMM36m paper suggested much lower population of the folded state using T-REMD simulation. The GaREUS can enhance conformational sampling with the boost potential, replica-exchange, and umbrella potential for collective variables, which is more suitable to study slow folding/unfolding simulations of a -hairpin peptide than T-REMD. It suggests the importance of enhanced conformational sampling methods for evaluating the quality of molecular force fields for different peptides or proteins.
In crowded villin simulations, translational and rotational diffusive motions are significantly affected by the small modification in the force field, keeping the conformational stability of villin compared to the simulation with the original CHARMM36m force field. This suggests that 3% increase of protein-water LJ interactions reduces too much sticky protein-protein interaction of the original force fields to avoid aggregation in the highconcentration protein solution. It should be noted that the over sticky protein association could lead to the underestimation of the protein diffusivity, and hence the modification of the protein-water interactions proposed by the present study is useful for elucidating the kinetics in the crowded environments.

Discussion and Conclusions
In this study, we performed enhanced sampling MD simulations of small peptides in dilute solution and conventional MD simulations of globular proteins in dilute and crowded solutions using various scaling parameters of protein-water LJ interactions. For CHARMM36m in conjunction with the modified TIP3P water models, about 10% increase of the protein-water interaction affects peptide conformational stability significantly and only small increases (up to 3%) is allowed to keep a good balance between protein-protein and protein-water interactions. The modified force field is applicable not only to α-helical but also β-hairpin peptides and globular proteins in dilute solution. As enhanced sampling methods, we used REMD for (AAQAA) 3 , GaREUS for chignolin, and GaMD for TDP-43 at the CTD in dilute solution. The GaREUS simulation for chignolin shows higher populations of the folded conformation both for the original (λ = 1.00) and modified (λ = 1.03) CHARMM36m. Note that the original CHARMM36m paper suggested much lower population of the folded state using T-REMD simulation. The GaREUS can enhance conformational sampling with the boost potential, replica-exchange, and umbrella potential for collective variables, which is more suitable to study slow folding/unfolding simulations of a β-hairpin peptide than T-REMD. It suggests the importance of enhanced conformational sampling methods for evaluating the quality of molecular force fields for different peptides or proteins.
In crowded villin simulations, translational and rotational diffusive motions are significantly affected by the small modification in the force field, keeping the conformational stability of villin compared to the simulation with the original CHARMM36m force field. This suggests that 3% increase of protein-water LJ interactions reduces too much sticky protein-protein interaction of the original force fields to avoid aggregation in the high-concentration protein solution. It should be noted that the over sticky protein association could lead to the underestimation of the protein diffusivity, and hence the modification of the protein-water interactions proposed by the present study is useful for elucidating the kinetics in the crowded environments.
For IDRs/IDPs simulations, the scaling of protein-water LJ interactions might not be sufficient to explore disordered conformations in dilute and crowded solutions. Although too much sticky protein-protein or intra-protein interactions are avoided, local conformational properties, such as dihedral angle distributions, should be simulated more carefully. As introduced in AMBER ff99SBws-STQ, a careful high-tuning in the backbone dihedral angle potential energy is useful in the simulations of IDRs/IDPs.
The test systems in this study include an α-helical peptide, (AAQAA) 3 , a β-hairpin peptide, chignolin, a disordered region with some helical fraction, C-terminal TDP-43, globular proteins, c-Src kinase and villin in dilute and crowded solutions. Ideally, a single molecular force field is applicable to such a variety of peptides and proteins in different conditions, predicting their intrinsic structures, dynamics, and interactions at high precisions. This study provides a minimal set for testing the quality of the existing force fields.

Materials and Methods
All the MD simulations were performed using the GENESIS software [74,75]. CHARMM36m [44] with or without NBFIX corrections are used together with TIP3P water model. After modeling each protein/peptide and adding water molecules and ions, energy minimization was carried out and then short MD simulations were performed for equilibration in NPT and NVT ensembles. The system temperature and pressure were controlled using the Bussi thermostat/barostat [76] with a time step of 2 fs. Water molecules and protein/peptide bonds involving hydrogens were constrained with SETTLE and SHAKE [77], respectively. Long-range electrostatic interaction was calculated using particle mesh Ewald (PME) method [78], while LJ interaction was smoothly reduced to zero from 10-12 Å using a switch function.
In production runs, each replica was simulated for 525 ns with NVT ensemble. Replica exchanges were attempted every 1500-time step. The equations of motion were integrated using the r-RESPA multiple time step method [80] with a 3.5 fs time step for fast motions and a 7.0 fs time step for slow motions. To make the simulation trajectories stable, optimized hydrogen repartitioning (HMR) and group-based temperature/pressure approach (group T/P) [81] were employed. Structures and energies were written every 1500 steps. Trajectories from 105 to 525 ns were used for analysis.
In production runs, we performed 1-µs GaREUS simulations with NVT ensemble in each replica for λ = 1.00 and 1.03, and 0.425-µs GaREUS simulation for λ = 1.09. To obtain unbiased free-energy landscapes, the two-step reweighting scheme, which consists of the second-order cumulant expansion for removing the GaMD biases and multistate Bennett acceptance ratio (MBAR) [82] for harmonic restraint potentials in REUS, were applied to the simulation trajectories. A 5-fs time step was used to integrate the equation of motion using velocity Verlet integrator with HMR and group T/P. Structures and energies were output every 1000 steps and every 500 steps, respectively. The first 0.1-µs in the GaREUS simulation for each condition were omitted as equilibration.

TDP-43
The C-terminal domain of TDP-43 (residues 310-340) in water was simulated using dual-boost GaMD [60]. The N-and C-terminal residues were capped with the acetyl and amide groups, respectively. The peptide was solvated in a periodic box (65 × 65 × 65 Å 3 ), which contain 8395 water molecules, 24 Na + , and 24 Cl -. Three NBFIX scaling values, λ = 1.00, 1.03, and 1.09, were tested. Initial structure of TDP-43 was first built as an extended conformation. The structure was collapsed using the short MD simulation in vacuum. In dual-boost GaMD, one boost potential is applied to the dihedral and CMAP energy terms and another to the total potential excluding the dihedral and CMAP energies. To obtain the initial guesses of the both boosting potentials, the thresholds and force constants were calculated from an initial 5-ns simulation without boosting. After the initial determination of the GaMD parameters, the boost potentials were added to the system while continuing the update of parameters every 100 ps in a 15-ns NVT simulation. The finally determined parameters were used for production runs. The thresholds in the boosting potentials are set to the lower bound. σ 0 was set to 6 kcal/mol for both potentials.
In production runs, ten independent GaMD simulations for 1 µs each were performed with NVT ensemble, and the results were averaged. A 5-fs time step was used to integrate the equation of motion using velocity Verlet integrator with HMR and group T/P. To analyze the GaMD simulation trajectories, we avoid the effect of GaMD biasing potential using the reweighting scheme based on the cumulant expansions [60]. Structures and energies were output every 20,000 steps and every 2000 steps, respectively. Trajectories from 0.1 to 1 µs were used for analysis.

c-Src Kinase
Initial structure of c-Src kinase was taken from PDB (PDB ID: 1Y57 [57]) and was solvated in a periodic box (102 × 102 × 102 Å 3 ). The system consists of 103,592 atoms, where c-Src kinase, 94 Na + , 89 Cl − , and 33,000 water molecules exist. Three NBFIX values (1.00, 1.03 and 1.09) were used. Five independent conventional MD (cMD) simulations for 1 µs for λ = 1.00 and 1.03 were performed with NVT ensemble, while one simulation for 1 µs was done for λ = 1.09. The r-RESPA integrator with a 3.5 fs were employed with HMR and group T/P. Structures and energies were written every 6000 steps and every 3000 steps, respectively. Trajectories from 0.2 to 1 µs were used for analysis.

Villin Crowding Solution
For the dilute system, one villin was solvated in a periodic box (102 × 102 × 102 Å 3 ). The system consists of 99,776 atoms, where villin, 89 Na + , 91 Cl − , and 33,000 water molecules exist. Three NBFIX values (1.00, 1.03 and 1.09) were used. Five independent cMD simulations for 1 µs each were performed with NVT ensemble for λ = 1.00 and 1.03. As for λ = 1.09, one cMD simulation for 1 µs was performed. The r-RESPA integrator with a 3.5 fs were employed with HMR and group T/P. Structures and energies were output every 6000 steps and every 3000 steps, respectively. The first 0.2 µs trajectory for each was omitted as equilibration.
For the crowded solution, eight villins were solvated in a periodic box (78 × 78 × 78 Å 3 ). Initial structure of villin was taken from PDB (PDB ID: 1VII [58]). The system consists of 42,262 atoms, where 8 villins, 34 Na + , 50 Cl − , 12,466 water molecules exist. Three NBFIX values (1.00, 1.03, and 1.09) were used. The trajectory lengths of cMD simulations with NVT ensemble are 1.5 µs for λ = 1.00 and 1.03, and 1.0 µs for λ = 1.09. The r-RESPA integrator with a 3.5 fs were employed with HMR and group T/P. Structures and energies were output every 15,000 steps and every 5000 steps, respectively. Trajectories from 0.2 to 1.5 µs were used for analysis.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/molecules27175726/s1, Scheme of computing diffusion coefficient of villin; Figure S1: Convergence check of folded population of chignolin; Figure S2: Probability densities along the Cα-RMSD of chignolin; Figure S3: Fraction helix of TDP-43 for λ = 1.09; Figure S4: The radius of gyration, R g , of TDP-43; Figure S5: Distribution of Cα-RMSD of c-Src kinase; Figure  S6: The Cα-RMSD of c-Src kinase in dilute solution; Figure S7: The Cα-RMSD of villin in dilute solution; Figure S8: The Cα-RMSD of villin in crowded solution; Figure S9: The Cα-RMSD of villin in the dilute and crowded solutions for λ = 1.09; Figure S10: The helicity of villin in the dilute and crowded solutions.