A Free-Energy Landscape Analysis of Calmodulin Obtained from an NMR Data-Utilized Multi-Scale Divide-and-Conquer Molecular Dynamics Simulation

Calmodulin (CaM) is a multifunctional calcium-binding protein, which regulates a variety of biochemical processes. CaM acts through its conformational changes and complex formation with its target enzymes. CaM consists of two globular domains (N-lobe and C-lobe) linked by an extended linker region. Upon calcium binding, the N-lobe and C-lobe undergo local conformational changes, followed by a major conformational change of the entire CaM to wrap the target enzyme. However, the regulation mechanisms, such as allosteric interactions, which regulate the large structural changes, are still unclear. In order to investigate the series of structural changes, the free-energy landscape of CaM was obtained by multi-scale divide-and-conquer molecular dynamics (MSDC-MD). The resultant free-energy landscape (FEL) shows that the Ca2+ bound CaM (holo-CaM) would take an experimentally famous elongated structure, which can be formed in the early stage of structural change, by breaking the inter-domain interactions. The FEL also shows that important interactions complete the structural change from the elongated structure to the ring-like structure. In addition, the FEL might give a guiding principle to predict mutational sites in CaM. In this study, it was demonstrated that the movement process of macroscopic variables on the FEL may be diffusive to some extent, and then, the MSDC-MD is suitable to the parallel computation.

CaM consists of 148 residues and it has four special motifs called 'EF-hand'; the EFhand specifically interact with Ca 2+ . The reason is that the EF-hand sequence contains a highly conserved motif at the binding sites; the motif is the most common motif found in Ca 2+ -binding proteins [4,[19][20][21][22]. The EF-hand motifs have been found in other Ca 2+binding proteins, such as Troponin C and Calbindin, and these proteins comprise a "EFhand protein family" [19][20][21]23,24]. In the solution state, The EF-hand motif forms a helixloop-helix structure [19,20,[25][26][27]. The canonical EF-hand has the central loop consisting of Since the molecular dynamics (MD) simulation of CaM was performed by Komeiji et al. in 2002, many works employing the MD simulations have been made on CaM [65]. However, the SAXS experiment conducted in 2012 reported that the structural change of CaM takes from a millisecond to a second in a timescale [66]. The timescale currently affordable by ordinary MD simulation is a few hundred nanoseconds to microseconds, while the timescale of CaM is longer by more than 10 3 times. Useful but time-consuming calculations, such as the calculation of the free-energy landscape (FEL), for example, are not possible with the conventional MD simulations.
To overcome the time limitation, many simulation methods that facilitate effective sampling, have been proposed. For example, "umbrella sampling" that uses a harmonic potential or modified "flat-bottom" potential to facilitate an extensive sampling, is widely used [67][68][69]. Since the dimension of the conformation space is too large to sweep, collective variables, such as center-of-mass and RMSD, have been used to sweep multidimensional conformation space by lower dimensional sampling [70,71]. In our sampling, variables from principal component analysis (PCA) were used; the variables were chosen so that they were not affected by rotation and translation.
In a previous work, we proposed a method which combines an efficiency of a coarsegrained model (CGM) for conformational search and accuracy of an all-atom model (AAM) named multi-scale divide-and-conquer MD (MSDC-MD) [72] and demonstrated that the FEL of an intrinsically disordered protein, which is one of CaMBDs, could be evaluated correctly. In the present work, MSDC-MD was applied to the large conformational change of CaM by using the NMR experimental structures instead of CGM-MD-derived structures adopted before, and analyzed the FEL to obtain useful information for conformational change of CaM.

Materials and Methods
In our previous work [64], the monomeric CaM structural ensemble (PDB ID 2k0e) was analyzed by PCA. The first and second principal components corresponded to the relative domain motion (Figure 1a,b), and the cumulative proportion of the variance was about 80%. The distribution of monomeric structures on the plane by the first and second principal components had a horseshoe shape, as shown in Figure 1c (such distribution was also reported by Kurkcuoglu et al. in 2016 [73]). The NMR structures (PDB ID 2k0e) were analyzed by the PCA to determine the reaction coordinate. Before PCA, to remove the effects of translation and rotation, rootmean-square-distance (RMSD) fitting with respect to the "reference coordinate" (first model's N-lobe, Cα positions 5-70; s ν = (x ν , y ν , z ν ) for ν = 5· · · 70) was performed on the original Cα position r i = (x i , y i , z i ), where the vector Q is the centroid of the fitting Cα atoms and the matrix R is a 3 × 3 rotational matrix. The rotational matrix R depends on r ν and s ν through the cross-covariance matrix A, defined as follows [74], ν=5 s ν where the vector P is the centroid of the reference coordinate. Then, the fitted coordinate X = r 1 , · · · , r N was analyzed by PCA to obtain the eigenvalues and eigenvectors (N = 148). The projection of the structure to nth principal component (v n ) can be written of X ( · · · indicates an ensemble average, k, l = 1 · · · 3N), which satisfies the following equation.
In order to perform MSDC-MD, the monomeric structures on the v 1 , v 2 -plane were divided into 56 small areas (Figures 1c and 2 dashed line), where the sizes of the small areas were chosen so that the following all-atom model (AAM) MD simulations can sweep the small area. Note here that this process is based on trial and error to date, because the way to know the optimum size of the area is unknown a priori. When the size is too large, individual AAM-MD cannot sweep the area, nevertheless, when the size is too small, the ergodicity might break down. Multiple AAM-MD simulations were initiated at a randomly selected AAM structure using a modified version of GROMACS [75] (in a small area without the monomeric structure, AAM-MD simulation was not performed). To estimate the statistical error, this process was performed three times, changing the initial structure and the initial velocity. For the simulations, AMBER03 [76] force field was used for the protein and ions. Ca 2+ parameter was σ = 3.0524 Å and = 0.45962 kcal/mol [77]. The periodic boundary condition in that 20 Å buffer was taken from the most outer atom, was used and CaM was solvated in the TIP3P water models [78]. The numbers of water molecules were different for different initial structures, and then the total number of atoms varied from 52,754 to 68,597. The particle mesh Ewald method [79][80][81], with cutoff = 10 Å, was used for Coulomb interaction calculations. The total charge of the system was neutralized by adding Na+ ions. The Lennard-Jones potential, with cutoff = 14 Å, was used for van der Waals interactions. After energy minimizations and 1 ns of equilibration simulations with position restraints, 200 ns of multiple production simulations were performed independently, and for all AAM-MD simulations, the time step was 2 fs. The temperature and pressure of these simulations were maintained to be T = 300 K and P = 1 bar by the Berendsen thermostat [82] and Berendsen barostat [83], respectively. In the production runs, the following umbrella-like potential was imposed.
are the lower and upper limits of the sampling region, respectively. The difference between "small area" and "sampling region" is illustrated in Figure 2. The sampling regions overlap with the neighboring small areas by 5 Å. For example, the parameters of the sampling region 1 and 2 are taken to be v Then, the relative weight of between the small area 1 and 2 can be written as ω 12 = log(N 2 ) − log(N 1 ), where N 1 and N 2 are the number of the samples from the sampling region 1 and 2 in v . In general, the relative weight between the small areas i and j can be written as and N j are the number of samples from the sampling region i and j in v < v m were excluded from the FEL calculation. The entire weight over all sampling regions (W i ) is then obtained as follows.
where i,j are indexes of the sampling region, c is the iteration number and W (0) ν = 0; the initial sampling region ν is determined so that the number of overlapping regions is maximized, as, even if there is no overlap with the next region, the relative weight might be determined through the diagonally across region. Then, the above iteration was continued until all regions were linked. The entire FEL was obtained by summing the samples in the sampling region i with the final weight W i as follows.
where k B is the Boltzmann constant, the index t and L i are, respectively, the time step and the number of samples obtained from each sampling region i (please remember that samples out of the sampling region is excluded) and is included in a region that is swept twice, M = 2. For this study, the difference of the entire FEL caused by the different choice of ν was satisfactory smaller than the statistical error.
The force of φ(v m ) can be written as follows.
Since the vector Q and the rotation matrix R depend on original Cartesian coordinates X i = (r 1 , · · · , r N ), ∇v m can be written as follows.
In order to show the efficiency of MSDC-MD's sampling, the non-arbitrary structure of a given area on FEL is shown as a "representative" structure. The representative structure was defined as a centroid in RMSD. Let ∆ ij be the RMSD of structure i with respect to structure j, then the total distance between an i and others are measured as ∆ i = ∑ j =i ∆ ij . Then, the representative structure was defined as that with the smallest ∆ i .
In our previous study [64], the allosteric interactions were identified as a contact probability. These interactions are also used to analyze the FEL. Suppose two residues, i and j, where a distance between any atom in residue i and another atom in residue j is less than 4.5 Å, the residues are identified as a contacting residue pair.

Results and Discussion
The entire FEL (F(v 1 , v 2 ), T = 300 K, k B T = 0.6 kcal/mol) obtained by MSDC-MD is shown in Figure 3, which shows that the FEL has a complex multi-valley structure. The red points correspond to the NMR structures ( Figure 3 red; PDB ID 2k0e), and it can be seen from Figure 3 that the distribution of the NMR structures roughly coincided with the basins of the FEL. The monomeric elongated structure ( Figure 3 purple; PDB ID 1cll) is located at the edge of a broad basin around (v 1 , v 2 ) = (50, 100). Notably, the ring-like structure in the complex ( Figure 3 green; PDB ID 4q5u) was not located in a stable FEL's basin, but in a slightly higher free-energy (structurally unstable) region.
In order to confirm whether statistically sufficient samples were obtained or not, the RMSD, with respect to the elongated structure and the ring-like structure, were calculated ( Figure 4a). It can be seen from the figure that the smallest Cα-Cα RMSD, with respect to the elongated structure and the ring-like structure, are, respectively, 2.7 Å and 2.8 Å.
Considering that a Cα distance between adjacent amino acids is about 3.8 Å, the MSDC-MD has succeeded in sampling structures similar enough to the elongated structure or the ring-like structure. Indeed, when the structures of the minimum RMSD and representative structures obtained from MSDC-MD are superimposed on the elongated structure and the ring-like structure, it can be seen that they coincided well with each other (Figure 4b-e); the RMSD of the representative structure around the elongated and ring-like structure, with respect to the elongated and ring-like structure, was, respectively, 3.2 Å and 2.8 Å. Then, it can be said that the shape of the FEL around the elongated structure and the ring-like structure in Figure 3 is not due to a sampling error. Since the elongated structure was obtained from X-ray crystallography, the location of the elongated structure is away from the region where many NMR structures are located, which would reflect differences in experimental conditions. On the other hand, the ring-like structure is probably intrinsically unstable, and it is one that can only be stabilized by forming a complex.
To validate the correctness of the FEL, the FEL obtained by 5 µs conventional AAM-MD starting from an elongated structure is shown in Figure 5a. Further, the comparisons of one-dimensional FELs are shown in Figure 5b,c. The FEL shown in Figure 5b was obtained from averaging the two-dimensional FEL on the rectangle region from (−210 Å, 110 Å) to (190 Å, 140 Å) along the v 2 -axis (red rectangle in Figure 5a). Similarly, the FEL shown in Figure 5c was obtained from averaging the two-dimensional FEL on the rectangle region from (170 Å, 0 Å) to (140 Å, 160 Å) along the v 1 -axis (orange rectangle in Figure 5a). The position of the averaging area was shifted by ±10 Å, but the qualitative trend of the graph did not change.    Figure 5a. Therefore the FEL of AAM-MD at the right basin is shallower than that of MSDC-MD. AAM-MD, as shown in Figure 5c, could only sample a range of 40 Å < v 2 <100 Å, while MSDC-MD could sample a much wider range of 0 Å < v 2 < 160 Å.
In our previous study [64], in order to identify the allosteric interactions among holo-CaM, the holo-CaM NMR structures (PDB ID: 1dmo) were compared to the "chimera" structure; the chimera structure was made by combining the linker region of the apo-CaM structures and the N-and C-lobe of holo-elongated CaM (PDB ID: 1cll). From the study, the interactions that are included in the holo-CaM NMR structures, but not in the chimera structures, had been identified as the contact probability (Figure 6a). Presumably, these interactions include allosteric interactions.
In order to investigate the relationship between the inter-domain interactions and the positions on the FEL, the inter-domain interactions are divided into five groups (C1-C5; Figure 6a), and the structures that have C1-C5 interactions were marked on the FEL ( Figure 6b). As shown in Figure 6b, the structures forming C1-5 are projected following their own patterns instead of being projected randomly, indicating that the C1-5 interactions play essential roles in regulating the inter-domain motion. From Figure 6b, most NMR data are distributed (i) in the region around (v 1 , v 2 ) = (−140, 50), (ii) the region around (50, 90) and (iii) the region around (140, −30). Similarly, the FEL basins are in (i), (ii) and (iii), where the lowest point of the FEL is included in (ii), and the C3, C4 forming structures are populated. On the other hand, (iv) region (20 < v 1 < 70, −200 < v 2 < 50) is a broad and stable region, and the elongated structures are included here.
Since the distribution of NMR data of apo-CaM is basically distributed on the lower part of the FEL, the process of the initial structural change from apo-CaM to holo-CaM would correspond to the path on the FEL from the lower to the most stable point in (ii). The most probable path, in this regard, is the one passing through the region (iv). The reason why the experimentally famous elongated structure is actually included in this region may be the result of X-ray crystallography capturing the structural change from apo-CaM to holo-CaM upon Ca 2+ binding. Since the lowest point of the FEL is located at point (ii), the ring-like structure is not the most stable state for monomeric holo-CaM. However, the FEL basin around the ring-like structure indicates that even the monomeric holo-CaM has interactions (C4, C5) to form the ring-like structure. Around the (iii) region, inter-domain interactions (C1, C2) are formed, but as the FEL barrier exits between region (ii) and (iii), as shown in the subset of Figure 6b, the difference of the relative free-energy to region (iii) (∆∆F), between region (iii) and the saddle point of the barrier, is 1.5 kcal/mol, and ∆∆F between the saddle point and region (ii) is 4.5 kcal/mol. On the other hand, no clear barrier was found between region (iii) and (iv). Therefore, the structures that once went to region (iii) would back region (iv) against the barrier, while the inter-domain interactions formed in region (iii) are dissociated, in order to take an elongated structure. Then, if mutations that decrease the FEL barrier are generated, the mutated CaM may change its structure in a shorter time. Figure 6. (a) The inter-domain contact probabilities whose contact ratio in holo-CaM is larger than that of chimera structures. The interactions can be divided into five groups (C1-C5). (b) The entire free-energy landscape obtained from MSDC-MD. The x-and y-axis are, respectively, v 1 and v 2 . The red points show the NMR structures (PDB ID 2k0e), the orange points show the apo-NMR structure (1dmo), and the green and purple points, respectively, show the elongated structure (PDB ID 1cll) and the ring-like structure (PDB ID 4q5u). C1-C5 forming structures are, respectively, indicated as the pink point, orange cross, green cross, cyan square, and black circle. In the subset, the free-energy values relative to the region (iii) are shown in kilocalories per mole.
However, based on the amount of NMR data and the distribution of FELs, it is reasonable to assume that the structural changes occur in the order (iv), (ii), and (i) at this stage. Although the structural change of the individual AAM-MDs in the MSDC-MD is virtual behavior under fictitious confinement potentials, the overall FEL can be quickly calculated by reweighting the individual free energies. In this case, we compared the FEL of 200 ns MSDC-MD with that of the 5 µs, and the 200 ns MSDC-MD was able to calculate the FEL that is clearly broader and consistent with the NMR structure distribution. Furthermore, since the AAM-MDs in the MSDC-MD do not need to communicate with each other, it is very compatible with parallel computing. We can then conclude that the MSDC-MD is an excellent method for FEL analyses in recent situations where parallel computing is commonly used.

Conclusions
In our study, we applied a multi-scale simulation method that divides the conformation space into small areas to enable us to perform a well-equilibrated MD simulation, named 'MSDC-MD', to the free-energy analysis of CaM. In order to investigate the efficiency and accuracy, MSDC-MD simulation of CaM was performed. Since Komeiji et al. in 2002, CaM has been a good example to demonstrate the computational method, because it exhibits large conformational changes including domain rotation and translation in the Ca 2+ -signal transduction process. As the process takes nearly 1 ms, conventional MD simulations have failed to sufficiently sample wide conformational regions of CaM.
MSDC-MD succeeded in providing the overall conformational change of the CaM on the FEL yield by v 1 and v 2 ; the conformational samples obtained from MSDC-MD sufficiently covered the NMR and X-ray crystal structures indicating the efficiency of MSDC-MD. Furthermore, the MSDC-MD structures include samples similar to the famous elongated structure (PDB ID 1cll, RMSD = 2.8 Å at minimum) and ring-like structure (PDB ID 4q5u, RMSD = 2.7 Å at minimum), as shown in Figure 2. As the Cα-Cα distance connected by covalent bonds is about 3.8 Å, we conclude that the MSDC-MD succeeded in sampling the important structures, even though we started the simulations at different conformations.
The accuracy of the MSDC-MD was also investigated by comparing the FELs obtained by MSDC-MD and the conventional MD. Although a long ordinary MD simulation of 5 µs was performed, the conventional MD only gave left-upper region of FEL on the v 1 , v 2 space. Then, one-dimensional FEL was calculated to compare the shape of FELs. The FEL obtained by the conventional MD on v 1 detected only the left basin, and it was trapped there until the end of the MD simulation. On the other hand, MSDC-MD can detect not only the left basin, but also another basin that is much deeper than the left one. Compared with the shape of the left basin, the shapes coincided with each other within the statistical error, which indicates that the MSDC-MD sufficiently reproduces the same entire FEL as the ordinary MD simulation, if the ordinary MD simulations of the millisecond time scale could be performed. The comparison of FEL on v 2 also demonstrated that the shape of FELs coincided with each other within the statistical error at the bottom of the ordinary MD's FEL.
The FEL analysis provided the picture of how conformational changes occur on the FEL, and the important interactions on each stage of conformational change (C1-5) were linked to the FEL. The interactions C1-5 were defined to analyze the NMR structures in our previous study [64]. From the distribution of the NMR structures only, one might think that conformational change would occur along a path (iii) → (ii) → (i) on the upper region. However, the MSDC-MD's FEL shows that the free-energy barrier between region (ii) and (iii), leads to the structures in region (iii) going back to region (iv). From the viewpoint of interactions, interactions C1-2 formed in region (iii) would be dissociated to form the elongated structure. Then, the process of conformational change on the FEL would occur along a path (iv) → (ii) → (i) when CaMBD is present. Such detailed information could not only be obtained from the experimental data, and the FEL analysis is necessary to complement the experimental data and investigate the biomolecular system more deeply.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Acknowledgments:
In this study, all structural figures were created by PyMOL [84], and some structural analyses were performed by statistical computing and graphic package 'R' [85]. This research, in part, used computational resources of Oakforest-PACS and Cygnus provided by the Multidisciplinary Cooperative Research Program in the Center for Computational Sciences, University of Tsukuba.

Conflicts of Interest:
The authors declare no conflict of interest.