Owing to significant advances in molecular structural analysis techniques such as electron cryo-microscopy (Cryo-EM) and X-ray free-electron lasers (XFEL), thousands of three-dimensional biomolecular structures, including macromolecules and complex systems, have been revealed. Furthermore, with the progress of molecular dynamics (MD) simulations using GPUs and supercomputers, it has become possible to understand the dynamics and behavior of biomolecular systems. Although all-atom molecular dynamics (AA-MD) simulation is a powerful tool for studying the dynamics of a biomolecule, it remains difficult to simulate the entire functional processes of large molecules such as membrane proteins and motor proteins, whose typical time scales are around milliseconds or longer. For example, even with a special-purpose supercomputer dedicated to AA-MD simulations, the folding process can be simulated only for proteins that are small and fold relatively fast [1]. To overcome this problem, coarse-grained (CG) MD simulations have been attracting attention because of their lower computational cost compared with AA-MD. In fact, CG-MD simulations have succeeded in qualitatively reproducing numerous biological processes for various biomolecules [2]. In particular, the Cα switching Go model [3] and the multiple-basin Go model [13] can readily realize large conformational changes of proteins by using multiple available native structures as references to construct the model interactions.
However, CG-MD simulations depend strongly on various model and environmental parameters, which must be tuned to reproduce the desired biological processes. In general, many CG parameters are directly or indirectly influenced by environmental factors such as temperature and ionic strength, and by mutations. In particular, determining CG parameters for a larger system with drastic conformational changes and interactions between multiple chains is quite difficult and generally leaves some uncertainty in the model parameters: for example, the coupling parameter and the relative stability in the multiple-basin model [13], the friction constant in Langevin simulations of a CG model, the inter-molecular interaction strength, parameters related to the ionic-strength dependence, and so on. Various methods for determining these parameters, such as force matching [14] and fluctuation matching [15], have been proposed. However, in many cases, these methods are computationally very expensive and cannot always determine valid parameters. As a result, such tuning is often performed manually.
Furthermore, to understand molecular mechanics through simulations, it is important to investigate the region of successful parameters (in other words, a phase diagram) that reproduce a targeted process. For example, a phase diagram of environmental parameters such as temperature, ionic strength, and mutation is expected to provide information on the sensitivity or robustness of the target molecule to environmental changes and mutations. In particular, the range of CG parameters related to mutation may provide insights into the design of new molecules with better functions. A few studies [4] have systematically investigated and validated the dependence on CG-model parameters by drawing a phase diagram, and the limitations of CG models, including some parameter uncertainty, have been discussed.
However, exhaustively examining all candidate parameters is inefficient. Because the simulations are stochastic, their results vary with the random seed and the initial conditions of the MD run. To determine whether a process occurs stably (i.e., with more than a certain probability) for a given parameter, it is necessary to repeat the calculation while changing these conditions. As a result, the computational cost of exhaustively examining parameters with MD simulations is extremely high.
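The repeated-run criterion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_simulation` is a hypothetical stand-in for a single stochastic CG-MD run, and its success rule, the number of repeats, and the threshold value are all assumptions made only for the example.

```python
import random

def toy_simulation(temperature, evi, seed):
    """Hypothetical stand-in for one stochastic CG-MD run; returns True on success."""
    rng = random.Random(seed)
    # Assumed toy rule: success is more likely at low temperature and EVI near 0.6.
    p_success = max(0.0, min(1.0, 1.2 - temperature / 400.0 - abs(evi - 0.6)))
    return rng.random() < p_success

def success_rate(run, params, n_repeats=10, base_seed=0):
    """Fraction of successful runs over n_repeats runs that differ only in their seed."""
    successes = sum(run(*params, seed=base_seed + i) for i in range(n_repeats))
    return successes / n_repeats

def is_successful(run, params, threshold=0.5, n_repeats=10):
    """A parameter counts as successful if its success rate reaches the threshold."""
    return success_rate(run, params, n_repeats) >= threshold
```

Each call to `success_rate` costs `n_repeats` simulations, so an exhaustive scan of a grid multiplies the grid size by the number of repeats. This is the cost that motivates the sampling strategies discussed next.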
In recent years, various parameter optimization methods [17], such as Bayesian optimization (BO) and evolutionary algorithms, have been proposed in the field of machine learning and applied to a wide range of practical problems, such as parameter optimization of deep neural networks [18], combinations of materials [20], and protein design [21]. Most parameter optimization techniques find the optimal parameter effectively. However, they are not necessarily suited to efficiently searching for all parameters that exceed a certain criterion. In contrast, one of the authors recently proposed an effective sampling method [22] for constructing phase diagrams based on uncertainty sampling (US), a type of active learning technique. Once two or more phases have been sampled, the US-based method efficiently determines the phases by preferentially examining the phase boundary. By regarding successful parameters and failed parameters as two phases, it is possible to search for successful parameters efficiently. However, if the number of successful parameters is small, the method is expected to be inefficient, because boundary sampling cannot begin until a successful parameter has been detected.
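The idea of uncertainty sampling can be illustrated with a short sketch. The label estimator below is a simple kernel-weighted vote chosen for brevity; it is an assumption of this example and not necessarily the estimator used in [22]. The function names and the `length_scale` value are likewise hypothetical.

```python
import math

def label_probabilities(point, labeled, n_phases, length_scale=0.2):
    """Kernel-weighted estimate of the phase probabilities at `point` (assumed estimator)."""
    weights = [1e-9] * n_phases  # small prior so the probabilities are always defined
    for x, phase in labeled:
        d2 = sum((a - b) ** 2 for a, b in zip(point, x))
        weights[phase] += math.exp(-d2 / (2 * length_scale ** 2))
    total = sum(weights)
    return [w / total for w in weights]

def next_point_by_uncertainty(candidates, labeled, n_phases=2):
    """Least-confidence sampling: choose the unsampled candidate whose most
    probable phase has the lowest estimated probability."""
    sampled = {x for x, _ in labeled}
    unlabeled = [c for c in candidates if c not in sampled]
    return min(unlabeled, key=lambda c: max(label_probabilities(c, labeled, n_phases)))

# Toy 1D example: phase 0 was observed at 0.0 and phase 1 at 1.0.
grid = [(0.0,), (0.25,), (0.5,), (0.75,), (1.0,)]
observed = [((0.0,), 0), ((1.0,), 1)]
chosen = next_point_by_uncertainty(grid, observed)
```

In this toy case the midpoint `(0.5,)` is chosen, because both phases are equally likely there. The sketch also shows the limitation noted above: with only one phase observed, every candidate is assigned to that phase with near certainty, so there is no boundary to refine.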
In this study, we propose a method named BOUS that efficiently samples successful regions by combining the BO and US methods described above, in order to reduce the computational cost of parameter search in MD simulations. BOUS first searches for a successful parameter with BO, based on the success rate of the targeted process, and then switches to US to efficiently explore the successful parameter regions. To evaluate the performance of BOUS, we applied BOUS and other sampling methods to parameter search problems for the rotational motion of F1-ATPase based on CG-MD simulations. We performed the CG-MD simulations with two types of dynamics, Newtonian and Langevin. The other sampling methods evaluated were exhaustive search, random sampling (RS), US, and BO. The results showed that BOUS, BO, and US identified successful regions and constructed a phase diagram with drastically less computation than exhaustive search and RS. In addition, BOUS performed better than BO and US.
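The two-stage strategy can be sketched as a single loop that changes its selection rule once a success has been observed. This is only an illustration under stated assumptions: `pick_bo` is a crude stand-in for a BO acquisition function (a kernel-smoothed success rate plus an exploration bonus, not a Gaussian-process model), `pick_us` is a simple kernel vote, and all names, the kernel scale, and the threshold are hypothetical rather than taken from the authors' implementation.

```python
import math

def _kernel(a, b, scale=0.2):
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / (2 * scale ** 2))

def pick_bo(unlabeled, observed):
    """Stand-in for a BO acquisition: smoothed success rate plus an exploration bonus."""
    def score(c):
        w = [_kernel(c, x) for x in observed]
        mean = sum(wi * r for wi, r in zip(w, observed.values())) / (sum(w) + 1e-9)
        return mean + (1.0 - max(w))  # the bonus is large far from every sampled point
    return max(unlabeled, key=score)

def pick_us(unlabeled, observed, threshold):
    """Uncertainty sampling: choose the point whose success/failure vote is closest to a tie."""
    def margin(c):
        ok = sum(_kernel(c, x) for x, r in observed.items() if r >= threshold)
        ng = sum(_kernel(c, x) for x, r in observed.items() if r < threshold)
        return abs(ok - ng) / (ok + ng + 1e-9)
    return min(unlabeled, key=margin)

def bous(candidates, evaluate, threshold=0.5, budget=10):
    """Use the BO-like rule until one success is found, then switch to US."""
    observed = {candidates[0]: evaluate(candidates[0])}  # arbitrary starting point
    while len(observed) < min(budget, len(candidates)):
        unlabeled = [c for c in candidates if c not in observed]
        if any(r >= threshold for r in observed.values()):
            nxt = pick_us(unlabeled, observed, threshold)
        else:
            nxt = pick_bo(unlabeled, observed)
        observed[nxt] = evaluate(nxt)  # evaluate() returns a success rate in [0, 1]
    return observed

# Toy usage: a 1D grid on which only parameters >= 0.7 succeed.
grid = [(i / 10,) for i in range(11)]
result = bous(grid, lambda x: 1.0 if x[0] >= 0.7 else 0.0, budget=6)
```

In the toy run the BO-like rule first jumps to the far end of the grid, finds a success there, and the loop then spends the remaining budget near the success/failure boundary. In practice `evaluate` would wrap repeated CG-MD runs, so each iteration is far more expensive than the selection step itself.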
Moreover, we confirmed that the rotational motion of the F1-motor was reproduced over a wide range containing parameters that were not reported in existing studies. We also discussed the stability information against parameter perturbation based on the constructed phase diagrams of successful parameters. These results suggest that deeper mechanical and biological discussions can be accelerated by efficiently drawing phase diagrams. Our implementation is available at https://github.com/tsudalab/SPEMD
By conducting Newtonian dynamics and under-damped Langevin dynamics simulations of a CG model with a few specific parameters, previous works [3] (Koga’s work and the CafeMol manual) showed that the F1 motor can undergo rotational motion, and one specific combination of temperature and EVI was presented as a representative successful parameter set for each type of dynamics. In contrast, in this study, by drawing phase diagrams as shown in Figure 3a,b for Newtonian dynamics and Figure 5a,b for Langevin dynamics, we showed that the F1 motor can reproduce the rotational motion with a high success rate over wider parameter regions than the (localized) specific parameters used in previous studies [3] (and an example in the CafeMol manual). The time trajectories displayed in Supplemental Figure S3a,b for Newtonian dynamics show that even at a higher temperature and lower EVI, the success rate for rotational motion of the F1 motor can be significantly high. Similarly, the time trajectories for Langevin dynamics in Supplemental Figure S4b,c show that even at a higher temperature and lower EVI than those in past work [32] (Figure S4a), a high success rate can be realized. Obtaining this kind of knowledge is one of the advantages of drawing a phase diagram over a wide parameter space of a CG model.
Clarifying the region of successful parameters will enable a detailed analysis of the dynamics and mechanisms that realize important functions of target biomolecules. For example, the phase diagram in Figure 5a provides insight into the effect of friction on the success rate: in the higher-temperature region, the difference in success rates between low friction and high friction is not significant, whereas at the lower temperature, the success rate appears to decrease with higher friction. These tendencies can be understood from the simulated trajectories of the rotational angle for each parameter: at the lower temperature, compared with the rapid response for the smaller friction, the higher friction caused a slower response and tended to fail to produce rotational motion, as shown in Supplemental Figure S5. At the higher temperature, however, the fluctuation amplitude of rotation with lower friction frequently exceeded the tolerance range of the angle, resulting in a lower success rate comparable to that obtained with high friction, as shown in Supplemental Figure S6.
In our study, the temperatures at which a significant success rate was obtained appear to be lower than room temperature, at which the F1 motor rotates in in-vitro experiments. This probably stems from the switching Go model: the abrupt, high activation energy of the whole system that accompanies the switching of the potential may cause unstable rotation of gamma. We speculate that the success rate at higher temperatures could be improved by applying a multiple-basin model, which can suppress the activation energy.
Identifying the region of successful parameters for the CG model will also provide information on the robustness of the biological function of the target molecule to environmental changes (temperature, ionic strength, and so on) and mutations. In the phase diagram, the range of successful EVI values (Figure 5 for Langevin dynamics, and the corresponding range for Newtonian dynamics at the chosen threshold) is assumed to be related to the robustness of the rotational motion of F1 against mutations in residues between subunits (as the EVI parameter should depend on the residue types in the corresponding region). Although in this study the EVI parameter was set to a uniform value for all residues between these subunits, a phase diagram with residue-dependent EVI parameters will be developed in our future studies.
Selecting the next parameter with a machine learning algorithm incurs a computational cost (search time) for learning and selection. Supplemental Figure S7 shows the averaged search time of each algorithm at each sampling step in the Langevin dynamics simulations. These results show that the search time of BO was very short (approximately 10 s) compared with the simulation times. Furthermore, in US, the search time was less than 1 s and decreased as the number of sampled points increased. This is probably because label estimation, which is the most time-consuming step in US, converges quickly when the number of sampled points is large. Because BOUS initially uses BO, a relatively long search time is needed at first; however, after finding a successful parameter, the method switches to US and the search time becomes very short. These search times are sufficiently short compared with the time-consuming MD simulations.
In this study, we examined phase diagrams of parameter spaces with two phases, successful and failed. The phase diagram construction method based on US is also applicable when there are more than two types of phases, as described in [22]. For example, in the case of the F1-motor system, because substeps (90 + 30 degrees) within the 120-degree rotation were frequently observed at lower ATP concentrations in in-vitro experiments [35], there are three possibilities: 120-degree rotation at once (without a substep), substep rotation (90 + 30 degrees), and others (failure to rotate). When there are more than two types of phases, it should be effective either to apply US directly or, as in BOUS, to first search for all phases using BO and then switch to US. As future work, we will apply our new methodology to other biological systems to evaluate the potential of our machine learning-based approach.