In a series of recent studies, the authors and collaborators have developed novel multiscale techniques that probe the cross-talk among the multiple scales in space and time inherent within such systems, yet preserve all-atom detail within the macromolecular assemblies [21–32]. Multiscale perturbation methods can be described in a general framework. First, intermediary subsystem centers of mass that characterize the mesoscale deformation of the system and/or slowly-evolving order parameters (OPs) for tracking the long-scale migration of individual molecules are introduced. Broadly speaking, these OPs filter the high-frequency atomistic fluctuations out of the low-frequency coherent modes and describe coherent, overall structural changes. They have been utilized to capture a variety of effects in multiscale physical systems, including Ostwald ripening in nanocomposites [24], nucleation and front propagation pathways during a virus capsid structural transition [30] and counter-ion- and temperature-induced transitions in viral RNA [21,32]. As they evolve on a much longer time scale than that of atomistic processes, the OPs serve as the basis of a multiscale analysis. Multiscale techniques are then used to provide evolution equations for the OPs and/or the subsystem center of mass variables, which are equivalent to a set of stochastic Langevin equations for their coupled dynamics. The resulting perturbative theory is the natural consequence of a long history of multiscale analysis in classical many-particle physics [33–39]. Finally, a computational, force-field-based algorithm suggested by the multiscale development can be implemented. In the current study, we validate the theory by comparison with MD simulations of different structural components of satellite tobacco mosaic virus (STMV).
2.1. Coarse-Grained Variables
To describe the multiscale development, natural OPs must first be introduced. In the current context, they describe the global organization of many-particle systems and probe complex motions, such as macromolecular twisting or bending. Classic examples of OPs include the degree of local preferred spin or molecular orientation and the mass density, whose profiles vary across a many-particle system. For a solid, profiles of particle deviation from rest lattice positions have traditionally been used. However, for a number of macromolecular assemblies, the timescale of many phenomena is comparable to that of migration away from lattice positions, making the latter a less sensitive OP. Furthermore, classical phase transition theories, like that for magnetization, are built on the properties of infinite systems, e.g., renormalization group concepts [40]. In contrast, macromolecular assemblies are finite and, hence, do not fully conform to the theory of macroscopic phase transitions. Moreover, they can reside in conformational states without a simple, readily-identifiable symmetry, e.g., ribosomes. Nonetheless, as pH and other conditions in the host medium change, the system can switch to a different conformation [41]. Such a system experiences a structural transition between two states, neither of which has a readily-identifiable symmetry. This suggests that macromolecular OPs cannot be readily associated with the breaking of symmetry, even if they signify a dramatic change of order. Thus, an OP description is needed that signals the emergence of a new order in such systems when no readily-identifiable symmetries are involved.
For nanoscale assemblies, OPs have been introduced as generalized mode amplitudes [42]. More precisely, vector OPs Φk were constructed that characterize system dynamics as a deformation from a reference configuration of N atoms. The set of time-dependent atomic positions ri was previously expressed in terms of a collection of basis functions Uk and these OPs [26]. Thus, variations in the OPs generate the structural transformations. Since the OPs characterize overall deformation, the Uk functions vary smoothly across the system, i.e., on the nanometer scale or greater. As one seeks only a few OPs (≪ N), this relationship between the atomic positions and OPs cannot completely describe individual atomic motion. Previously, this was addressed by introducing residuals to capture the short-scale atomic dynamics and deriving equations for the co-evolution of the OPs and the probability distribution of atomic configurations [23,24,29,43]. However, as many systems are easily deformed by thermal stress and fluctuation, large deformations cannot be considered as coherent changes determined by merely a few OPs. To deal with this, a slowly-evolving hierarchical structure was introduced [19] about which the construction of OPs is formulated. Instead of OPs depending explicitly on the N-atom configuration, intermediary variables representing the centers of mass (CMs) of subsystems within the structure are utilized, and OPs that depend only on these CMs are constructed. Additionally, rather than being constrained by an initial reference configuration, as in [24], the basis functions depend on quantities that vary slowly with system-wide deformations. This allows the methodology to accurately describe systems, such as macromolecular assemblies, which may undergo drastic changes and, hence, cannot be modeled as continuous transformations from a fixed reference configuration. Ultimately, the multiscale methodology based on these variables couples the atomistic and CG evolution to facilitate all-atom simulations of complex assemblies.
Of course, many biological structures are organized in a hierarchical fashion. For example, a non-enveloped virus may consist of about N = 10^6 atoms, organized into about Nsys = 100 macromolecules (e.g., protein and RNA or DNA). When the system is spheroidal, as for an icosahedral virus, it has a total diameter of about 100 typical atomic diameters (i.e., around N^1/3), while the total mass of the system is N·m, where m represents the average atomic mass, and that of a typical macromolecule is about mN/Nsys. To accurately represent this multiple mass and length scale structure, a hierarchical OP formulation is incorporated into the description of the system. On the finest scale, the dynamics are described by the 6N atomic positions and momenta, denoted collectively by Γ. Since the matter of interest is hierarchical, the overall structure is divided into Nsys non-overlapping subsystems indexed by S = 1, 2, …, Nsys. The center of mass of each subsystem, given by:

RS = (1/MS) Σi mi ri ΘiS,

serves as an intermediate-scale description. Here, mi is the mass of atom i; MS is the mass of subsystem S; and ΘiS is one if atom i is in subsystem S, and zero otherwise. Effectively, the RS variables denote subsystem OPs that characterize the organization and dynamics of the S-th subsystem.
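To make the intermediate-scale description concrete, the following minimal sketch (Python/NumPy; the array names and shapes are illustrative assumptions, not part of the original formulation) computes the subsystem masses MS and centers of mass RS from atomic positions, masses and a per-atom subsystem label playing the role of the indicator ΘiS.

```python
import numpy as np

def subsystem_centers_of_mass(positions, masses, labels, n_sub):
    """Compute M_S and R_S = (1/M_S) sum_i m_i r_i Theta_iS for each subsystem S.

    positions : (N, 3) atomic coordinates
    masses    : (N,) atomic masses
    labels    : (N,) integers; labels[i] = S if atom i belongs to subsystem S
    n_sub     : number of subsystems, N_sys
    """
    cms = np.zeros((n_sub, 3))
    M_S = np.zeros(n_sub)
    for S in range(n_sub):
        in_S = labels == S                       # indicator Theta_iS as a boolean mask
        M_S[S] = masses[in_S].sum()              # subsystem mass M_S
        cms[S] = (masses[in_S][:, None] * positions[in_S]).sum(axis=0) / M_S[S]
    return cms, M_S
```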
While the centers of mass describe subsystem-wide motion, the largest scale of interest must also be described to capture changes in the overall structure of the system. Thus, a set of hierarchical OPs Φk is introduced to further characterize the collective behaviors. This is performed using a space-warping transformation [26] that is modified to accommodate the present dynamically hierarchical structure. First, the relationship between the OPs Φk and the CMs RS is defined by:

RS = Σk Uk(RS) Φk,  (1)

where R = {R1, …, RNsys} is the set of all subsystem centers of mass and Uk(RS) is a pre-chosen basis function depending on the CM of subsystem S. The basis function is constructed as Uk(RS) = Uk1(XS) Uk2(YS) Uk3(ZS), where k is a set of three integers k1, k2, k3 giving the order of the Legendre polynomial U for the X, Y, Z components of RS, respectively. As in [19,44], OPs labeled by the indices k = {000, 100, 010, 001} are denoted lower-order, while those with larger k are higher-order. Notice that the basis functions do not depend on each atomic position ri, but rather on the intermediate-scale variables RS, thereby ensuring a hierarchical foundation. Additionally, since the basis functions Uk depend on dynamic variables and not on the CMs of a fixed reference configuration, the collection of expressions in Equation (1) constitutes an implicit system of equations for the CMs. By choosing the set of Uk basis functions to be smoothly varying, the set of Φk tracks the overall coherent deformation of the system. As such deformation implies slow motion, one expects that the Φk variables will be slowly varying in comparison to CM migration and atomic fluctuation.
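As an illustration of the product-Legendre construction just described, the sketch below evaluates Uk(RS) = Uk1(XS) Uk2(YS) Uk3(ZS) at the subsystem CMs. Rescaling each Cartesian component of the CMs onto the interval [−1, 1] before evaluating the polynomials is an assumption made here for definiteness, not a detail taken from the text.

```python
import numpy as np
from numpy.polynomial import legendre

def basis_function(k, cms, box_lo, box_hi):
    """Evaluate U_k(R_S) = U_k1(X_S) U_k2(Y_S) U_k3(Z_S) for every subsystem.

    k              : triple of non-negative integers (k1, k2, k3)
    cms            : (N_sys, 3) subsystem centers of mass
    box_lo, box_hi : (3,) bounds used to map each coordinate onto [-1, 1]
    """
    scaled = 2.0 * (cms - box_lo) / (box_hi - box_lo) - 1.0   # map onto the Legendre domain
    U = np.ones(cms.shape[0])
    for axis, order in enumerate(k):
        coeffs = np.zeros(order + 1)
        coeffs[order] = 1.0                    # select the Legendre polynomial of this order
        U *= legendre.legval(scaled[:, axis], coeffs)
    return U
```

For example, k = (1, 0, 0) yields a basis function linear in the X component of the CMs, one of the lower-order indices listed above.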
For a finite truncation of the sum in Equation (1), there will be some residual displacement. Hence, Equation (1) becomes:

RS = Σk Uk(RS) Φk + σS,  (2)

where σS is the residual for the S-th subsystem. The OPs are then expressed precisely in terms of the RS variables by minimizing the mass-weighted square residual:

Σ = ΣS MS σS²

with respect to Φ at constant R. With Equation (2), the expression for Σ becomes:

Σ = ΣS MS [RS − Σk Uk(RS) Φk]².  (3)

The optimal OPs are those that minimize Σ, i.e., those containing the maximum amount of information, so that the σS terms are, on average, the smallest. Minimizing the sum in Equation (3) as in [25], the relationship between the OPs and CMs becomes the linear system:

Σk′ [ΣS MS Uk(RS) Uk′(RS)] Φk′ = ΣS MS Uk(RS) RS.

If one chooses a preliminary set of basis functions Uk to be, for instance, Legendre polynomials [24,31,43], then the Gram–Schmidt procedure can be used to generate an orthonormal basis. In particular, the formulation is simplified within the current context when the basis functions are normalized so that the mass-weighted overlaps ΣS MS Uk(RS) Uk′(RS) vanish for k ≠ k′. With this choice, a clear representation of the OPs emerges:

Φk = (1/μk) ΣS MS Uk(RS) RS  (4)

in terms of basis functions and subsystem CMs. Here, μk = ΣS MS Uk²(RS) serves as an effective mass associated with Φk and is proportional to the square of the basis vector's length. These masses primarily decrease with increasing complexity of Uk [32,44]. Thus, the OPs with higher k probe smaller regions in space. Specific sets of OPs can capture deformations including extension, compression, rotation, tapering, twisting and bending. As the basis functions depend on the collection of CMs, Equation (4) is an explicit equation for Φk in terms of the RS. Hence, three differing levels of description are utilized: the finest scale of atomic vibration captured by Γ, the set of intermediate-scale CM variables RS for each subsystem and a global set of slowly-evolving OPs Φk.
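A minimal sketch of the explicit OP construction of Equation (4) follows, assuming the mass-weighted orthogonality of the basis functions discussed above; computing the effective mass μk as the mass-weighted sum of squared basis values is an assumption consistent with the statement that μk is proportional to the square of the basis vector's length.

```python
import numpy as np

def order_parameters(cms, M_S, U_values):
    """Compute Phi_k = (1/mu_k) sum_S M_S U_k(R_S) R_S for each basis function.

    cms      : (N_sys, 3) subsystem centers of mass
    M_S      : (N_sys,) subsystem masses
    U_values : dict mapping an index triple k to a (N_sys,) array of U_k(R_S) values
    Returns dicts of OPs Phi_k (3-vectors) and effective masses mu_k.
    """
    phis, mus = {}, {}
    for k, U in U_values.items():
        mu = np.sum(M_S * U**2)                                # effective mass mu_k
        phis[k] = (M_S[:, None] * U[:, None] * cms).sum(axis=0) / mu
        mus[k] = mu
    return phis, mus
```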
To conclude the hierarchical construction of variables describing the system, note that both the scaled subsystem CMs and the global OPs vary slowly relative to individual atomic fluctuations and, thus, can serve as the basis for a multiscale analysis. To reveal the respective time scales on which RS and Φk evolve, it is convenient to define smallness parameters, in this case ϵ1 and ϵ2. Within the current context, these are ratios of masses that characterize the significant difference in motion throughout the system. Since the subsystem mass is significantly larger than that of the average atom, the parameter ϵ1 = m/MS, where m is a typical atomic mass, accurately describes this separation of scales. In a similar manner, as μk represents a sum over subsystem masses, this quantity is large in comparison to MS. Hence, another scaling parameter is introduced, given by ϵ2 = MS/MTOT, where MTOT is the mass of the entire system. Finally, μk ≈ MTOT = MS/ϵ2 = m/(ϵ1ϵ2) is the effective mass related to the k-th OP. There are a number of different scalings one may consider regarding the relative size of ϵ1 and ϵ2. However, only the situation in which the total system consists of a relatively small number of subsystems (e.g., a few pentamers) is considered within the current study; hence, the second smallness parameter remains large relative to the first, i.e., ϵ2 = O(1). Additionally, the first parameter is rewritten in terms of a single small parameter ϵ > 0 that organizes the perturbation expansion below.
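For orientation only, the following back-of-the-envelope sketch evaluates the two mass ratios for numbers of the order quoted in this section (N ≈ 10^6 atoms and a dozen pentamer-scale subsystems); the average atomic mass used is a placeholder value.

```python
m = 10.0             # average atomic mass in amu (placeholder value)
N = 1.0e6            # total number of atoms, order of magnitude from the text
n_sub = 12           # a small number of subsystems, e.g., a few pentamers
M_S = m * N / n_sub  # typical subsystem mass
M_TOT = m * N        # total system mass

eps1 = m / M_S       # atom-to-subsystem mass ratio, ~1e-5
eps2 = M_S / M_TOT   # subsystem-to-system mass ratio, ~0.08, treated as O(1)
print(f"eps1 = {eps1:.1e}, eps2 = {eps2:.2f}")
```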
To investigate the time rate of change of Φk and RS, the Liouville operator:

L = Σi [(pi/mi)·(∂/∂ri) + Fi·(∂/∂pi)],  (5)

is utilized, where pi and Fi are the momentum of and the net force acting on atom i, respectively. Using Equation (4), it follows that LΦk can be written in terms of the subsystem momenta, and thus takes the form of Equation (6), where PS = Σi pi ΘiS is the total momentum of the S-th subsystem. Additionally, the terms appearing in Equation (6) are given in Equations (7)–(9). Here, Πk = ΣS Uk(RS) PS is the conjugate momentum associated with the k-th OP, while the remaining contribution appears due to the dependence of the basis functions Uk on the intermediate-scale CMs. Using the definition of ϵ, Equations (6) and (7) yield the scaled rates of change, Equations (10) and (11), for any of the OPs or CMs. Hence, Equations (10) and (11) demonstrate that the CMs and OPs evolve slowly, at a rate O(ϵ), in relation to the atomistic variables, and this formulation is consistent with the quasi-equilibrium distribution of all-atom configurations Γ at fixed values of Φ and R. Therefore, the set of Φk and RS describes the slow dynamics of the macromolecular structure. Other variables, such as the preferred orientation of the macromolecules, could also be included [24]. Next, the pair (Φ, R) will be shown to satisfy Langevin dynamics (in contrast with the atomistic variables Γ), due to the key role of inertial effects underlying the motion of individual atoms and the long-scale nature of Φ and R.
2.2. Multiscale Theory and Analysis
To begin the multiscale analysis, the Liouville equation is used to derive a conservation law for the slow dynamics of the system. Define W to be the joint probability density for Φ and R:

W(Φ, R, t) = ∫ dΓ Δ[Φ − Φ(Γ)] Δ[R − R(Γ)] ρ(t, Γ),

where R(Γ) and Φ(Γ) represent the sets of subsystem CMs and OPs evaluated at the atomic configuration Γ, and Δ[·] denotes a product of Dirac delta functions over the components. With this, the Liouville equation for the N-atom probability density ρ(t, Γ),

∂ρ/∂t + Lρ = 0,

with the Liouville operator given by Equation (5), is used to arrive at a conservation law [23] for W via the chain rule. Namely, taking a time derivative and integrating by parts yields Equation (12), whose flux terms involve ρ-weighted averages of the rates LΦk and LRS. This equation therefore involves ρ and is thus not closed with respect to W. However, one finds [19,23] that this formulation enables a novel procedure for constructing a closed equation for W when ϵ is small. Note that the expressions for LΦk and LRS can be determined explicitly from Equations (8) and (9).
Throughout, the hypothesis that the N-atom probability density ρ has a multiple-scale character will be crucial to the analysis. Thus, ρ is represented so as to express its dependence on the atomic positions and momenta (denoted collectively by Γ) both directly and, via the set of OPs Φ and centers of mass R, indirectly:

ρ = ρ(Γ, Φ, R; t0, t1, t2, …).

The time variables tn = ϵ^n t are introduced to track processes on time scales O(ϵ^−n) for n = 0, 1, 2, 3, …. The set {t1, t2, …} tracks time for the slow processes, i.e., those much slower than the 10^−14 s scale of atomic vibrations, whereas t0 tracks the fast atomistic processes. Note that this ansatz on the dependence of ρ is not a violation of the 6N degrees of freedom, but rather a way to express the multiple ways in which ρ depends on Γ and t.
With this, the ansatz on ρ and the chain rule imply that the Liouville equation takes the form:

(∂/∂t0 + Σn≥1 ϵ^n ∂/∂tn) ρ + (L0 + ϵL1) ρ = 0.

The operator L1 involves partial derivatives with respect to Φ and R computed at constant Γ, whereas the converse is true for L0, which involves partial derivatives with respect to the Γ argument of ρ at constant values of Φ and R. By mapping the Liouville problem to a higher-dimensional description, i.e., from Γ to (Γ, Φ, R), the equation can be solved perturbatively in this representation in the small-ϵ limit. Using this approximation for ρ and the conservation law of Equation (12), a closed equation for W is ultimately obtained. Since ϵ is small, the development can be advanced with the expansion ρ = ρ0 + ϵρ1 + ϵ²ρ2 + …. Next, the multiscale Liouville equation is examined at each order in ϵ. To the lowest order, one obtains the equation L0ρ0 = 0 under the assumption that ρ0 is at quasi-equilibrium, i.e., independent of t0. Using an entropy maximization procedure [25] with the canonical constraint of fixed average energy, the lowest-order solution is determined to be:

ρ0 = e^−βH W(Φ, R, t1, t2, …) / Q(Φ, R, β),

where β is the inverse temperature and H is the Hamiltonian:

H = Σi pi²/(2mi) + V(r1, …, rN),

for the N-atom potential V, and the (Φ, R)-constrained partition function is given by:

Q(Φ, R, β) = ∫ dΓ e^−βH Δ[Φ − Φ(Γ)] Δ[R − R(Γ)].
To O(ϵ), one obtains an equation determining ρ1 in terms of ρ0. Using Equations (14) and (15), this yields Equation (16), with an accompanying expression, Equation (17), involving the (Φ, R)-constrained Helmholtz free energy F, which satisfies Q = e^−βF. Using the Gibbs hypothesis, which states the equivalence of long-time and thermal (i.e., (Φ, R)-constrained) averages, we find:

lim t0→∞ (1/t0) ∫0 to t0 A dt′ = ⟨A⟩th

for any variable A. As the Πk and PS involve sums of momenta, which tend to cancel, their thermal averages vanish. Using this thermal average, dividing by t0 in Equation (17) and taking the limit as t0 approaches infinity, one finds that the long-time behavior is dominated by a term proportional to t0 ∂W/∂t1. Thus, removing divergent behavior as t0 → ∞ implies ∂W/∂t1 = 0; hence, W is independent of t1, and this term within Equation (16) vanishes.
As one seeks a kinetic theory correct to O(ϵ²), the approximation ρ ≈ ρ0 + ϵρ1 can be made. Using the conservation law of Equation (12) with Equations (15) and (16), a closed differential equation for W is finally obtained, written in terms of the slow time τ = ϵ²t. The diffusion coefficients (D) are given by the correlation-function expressions of Equation (18), where ⟨⋯⟩ represents a thermal average over the (Φ, R)-constrained ensemble. The resulting equation for W is of the Smoluchowski form and describes the evolution of the reduced probability density for a set of CMs R evolving and interacting with a set of collective variables Φ. Further analytic details can be found in [19]. On the timescale on which the correlation functions decay for the present problem, the OPs are essentially unchanged. Therefore, to a very good approximation, the evolution in the correlation function occurs at constant OP values. This is simple to implement, as the correlation functions can then be computed via standard MD codes. Of course, the Smoluchowski equation possesses associated Langevin equations that describe the stochastic dynamics of R and Φ, and this provides a computational foundation from which the behavior of the intermediate and collective variables can be simulated.
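To indicate how a diffusion factor of Equation (18) might be estimated in practice, the sketch below integrates the cross-correlation of two CG-variable velocities sampled from a short, OP-constrained MD segment. The finite upper limit of the time integral and the simple lag-average estimator are assumptions, since the exact form of Equation (18) is not reproduced in this excerpt.

```python
import numpy as np

def diffusion_factor(v_k, v_kp, dt, t_corr):
    """Estimate D_kk' as the time integral of <v_k(t) v_k'(0)> over [0, t_corr].

    v_k, v_kp : (n_steps,) velocity time series of two coarse-grained variables,
                sampled every dt from constant-OP MD
    t_corr    : integration window, assumed to exceed the correlation decay time
    """
    n_lag = int(t_corr / dt)
    corr = np.empty(n_lag)
    corr[0] = np.mean(v_k * v_kp)
    for lag in range(1, n_lag):
        corr[lag] = np.mean(v_k[lag:] * v_kp[:-lag])
    return np.trapz(corr, dx=dt)
```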
2.3. Langevin Equations and Multiscale Algorithm
The Smoluchowski equation provides a sound theoretical framework for stochastic OP dynamics. For practical computer simulation of viral systems, rigorous Langevin equations for the OPs, equivalent to the above Smoluchowski equation, can be derived [19,24,25]. First, all centers of mass and order parameters are grouped into a single collection of CG variables, denoted ζ = {ζ1, …, ζM}, where M = kmax + Nsys is the total number of coarse variables. Rewritten in this single coarse-grained representation, the Smoluchowski equation for W becomes Equation (19), in which the diffusion factors and thermal-average forces have been consolidated to match the consolidated coarse-grained variables. With this, the associated Langevin equations, Equation (20), follow for k = 1, …, M, where f̄k is the thermal-averaged force and ξk is a random force acting on ζk. Here, the stochastic process ξk is stationary, and all average random forces vanish. More specifically, the solution of Equation (19) must be the probability density for the collection of stochastic processes ζ that satisfies the Langevin Equations (20). As the latter equations completely describe the evolution of the coarse variables, they can be used to simulate the dynamics of the pertinent modes within the N-atom system.
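Since the explicit form of Equation (20) is not reproduced in this excerpt, the sketch below uses the standard overdamped Langevin update consistent with a Smoluchowski equation, under the simplifying assumption of a diagonal and slowly varying diffusion matrix; it is meant only to indicate how the consolidated variables ζ could be advanced by one time step.

```python
import numpy as np

def langevin_step(zeta, f_bar, D, beta, dt, rng):
    """One overdamped Langevin step for the consolidated CG variables.

    zeta  : (M,) current values of the coarse-grained variables
    f_bar : (M,) thermal-averaged forces at the current state
    D     : (M,) diagonal diffusion factors (assumed diagonal, slowly varying)
    beta  : inverse temperature
    dt    : Langevin time step (e.g., tens of picoseconds, per the text)
    rng   : numpy.random.Generator instance
    """
    drift = beta * D * f_bar * dt
    noise = rng.normal(size=zeta.shape) * np.sqrt(2.0 * D * dt)
    return zeta + drift + noise

# Illustrative call with placeholder numbers:
rng = np.random.default_rng(0)
zeta = langevin_step(np.zeros(5), f_bar=np.ones(5), D=np.full(5, 0.1), beta=0.4, dt=50.0, rng=rng)
```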
In particular, the OP and CM velocity autocorrelation functions provide a criterion for the applicability of the present multiscale approach. If the reduced description is complete, i.e., the set of OPs and CMs considered does not couple strongly with other slow variables, then the correlation functions decay on a time scale much shorter than the characteristic time(s) of OP evolution [44]. However, if some slow modes are not included in the set of OPs, then these correlation functions can decay on timescales comparable to those of OP dynamics [44,45]. This is because the missing slow modes, now expressed through the all-atom dynamics, couple with the adopted set of OPs, and the present approach fails under such conditions. For example, setting the lower limit of the integrals in Equation (18) to −∞ may fail to be a good approximation, and the decay might not be exponential; rather, it may be extremely slow, so that the diffusion factor diverges. Consequently, the atomistic ensembles required to capture such long-time-tail behavior in the correlation functions are much larger than those needed to capture a rapid decay. Here, such situations are avoided via an automated procedure for assessing the completeness of the reduced description and adding OPs when needed (as discussed in [21,32]). Adopting this strategy ensures that the OP velocity autocorrelation functions decay on timescales that are orders of magnitude shorter than those characterizing coherent OP dynamics, and thus, the present multiscale approach applies. Next, a simulation algorithm is developed in order to utilize this formulation of the problem and its description in terms of the Langevin equations.
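A crude version of the completeness check described above can be written as follows: the normalized OP velocity autocorrelation is evaluated at a chosen lag, and a value that has not decayed below a tolerance flags a possible missing slow mode. The lag time and tolerance are illustrative choices, not values from the original study.

```python
import numpy as np

def has_long_time_tail(v, dt, t_check, tol=0.1):
    """Return True if the normalized autocorrelation of the OP velocity `v`
    (sampled every dt) has not decayed below `tol` by lag time t_check."""
    lag = max(1, int(t_check / dt))
    c0 = np.mean(v * v)
    c_lag = np.mean(v[lag:] * v[:-lag]) / c0
    return abs(c_lag) > tol
```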
The starting point for the multiscale computational algorithm is the deformation of the initial reference configuration and the Langevin Equations (20). Given the all-atom structure of a macromolecular assembly at time t = 0, the number of subsystems Nsys is identified, and their CMs RS are calculated. These subsystem CMs are then used to construct the global OPs via Equation (4), thereby capturing the structural hierarchy of the assembly. Then, multiple short MD simulations are used to construct a quasi-equilibrium ensemble of atomic configurations consistent with the instantaneous Φ and R description [21,45]. This ensemble is employed to construct the diffusion factors D and the thermal-averaged forces f̄. Further details regarding the ensemble generation procedure and the construction of these factors are provided in [44]. Using the forces and diffusions, the OPs and subsystem CMs are evolved via the Langevin equations to capture overall assembly deformation. As these equations form a coupled system of M = kmax + Nsys stochastic differential equations, the dimension of the problem is reduced, since the number of scaled molecular CM positions and OPs is much less than N. Updating the set of CMs every Langevin time step enables the reference configuration to vary slowly with the system over long times. The updated reference configuration is used to compute new basis polynomials Uk. With these and the Langevin-evolved Φ, an ensemble of CM configurations, each consistent with the instantaneous state of Φ, is constructed via Equation (2). Next, the reference configuration, the OPs and the OP-constrained ensemble of CMs are simultaneously evolved. Since the evolution of the OPs is inherently connected to that of the scaled positions, and these are, in turn, dependent on atomic trajectories, the algorithm is completed by a procedure for repositioning the atoms consistent with the overall structure provided by Φ and R. With the new set of atomic positions, both the forces and diffusions are recalculated to enable further Langevin evolution. Thus, the OPs constrain the ensemble of subsystem and atomic states given by Equations (1) and (2), while the latter determine the diffusion factors within Equation (18) and the thermal-average force of Equation (17) that control OP evolution within Equation (20). In this way, the ensemble of atomic configurations is co-evolved with the global OPs and subsystem CMs.
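The co-evolution cycle just described can be summarized by the following skeleton. Every step that requires an external MD engine (the constrained-ensemble generation and the atomic reconstruction) or details not given in this excerpt is passed in as a user-supplied callable, so all names inside `helpers` are placeholders rather than parts of the original DMS implementation.

```python
def multiscale_cycle(atoms, helpers, n_steps, dt_langevin):
    """Skeleton of the OP/CM/atomistic co-evolution loop (names are placeholders).

    `helpers` maps step names to callables with the assumed signatures:
      compute_cms(atoms)                       -> (cms, M_S)
      build_basis(cms)                         -> basis        # U_k from the dynamical reference CMs
      order_parameters(cms, M_S, basis)        -> phis         # Equation (4)
      sample_ensemble(atoms, phis, cms)        -> ensemble     # short constrained MD runs
      thermal_forces(ensemble)                 -> f_bar        # Equation (17)
      diffusion_factors(ensemble)              -> D            # Equation (18)
      langevin_update(phis, cms, f_bar, D, dt) -> (phis, cms)  # Equation (20)
      reconstruct_atoms(ensemble, phis, cms)   -> atoms        # atoms consistent with the new CG state
    """
    cms, M_S = helpers["compute_cms"](atoms)
    for _ in range(n_steps):
        basis = helpers["build_basis"](cms)
        phis = helpers["order_parameters"](cms, M_S, basis)
        ensemble = helpers["sample_ensemble"](atoms, phis, cms)
        f_bar = helpers["thermal_forces"](ensemble)
        D = helpers["diffusion_factors"](ensemble)
        phis, cms = helpers["langevin_update"](phis, cms, f_bar, D, dt_langevin)
        atoms = helpers["reconstruct_atoms"](ensemble, phis, cms)
    return atoms, phis, cms
```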
Within the simulations, water and ions are accounted for via the quasi-equilibrium ensemble (i.e., the configuration of the water and ions rapidly explores a quasi-equilibrium ensemble at each stage of the OP dynamics). This assumption holds only when the water and ions equilibrate on a timescale much shorter than that of the OPs. Fluctuations from the solvent therefore modulate the residuals generated within the MD part of the constant-OP sampling and, hence, affect the thermal-averaged force. If slow hydrodynamic modes are found to be of interest, these atoms can be included in the definition of the OPs. The emergence of such coupled slow modes is also indicated by the appearance of long-time tails in the OP velocity autocorrelation functions. However, such tails are not observed in the simulation study of the next section, as is also confirmed via agreement with MD. When ions are tightly bound to the macromolecule, they are considered part of the OPs. After every Langevin time step, an ion-accessible surface is constructed via visual molecular dynamics (VMD), and ions close to the surface are tracked during the MD ensemble enrichment calculation. Those with appreciable residence time within the surface are included in the definition of the OPs henceforth. A similar solvation scheme has already been utilized with OPs in simulating virus capsid expansion in Na+ and Ca2+ solutions [30].
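The sketch below mimics the residence-time criterion for tightly bound ions with a plain distance cutoff instead of the VMD ion-accessible surface used in the study; the cutoff and residence-fraction threshold are illustrative assumptions.

```python
import numpy as np

def bound_ion_indices(ion_traj, macro_traj, cutoff=5.0, min_residence=0.8):
    """Indices of ions whose residence fraction near the macromolecule exceeds a threshold.

    ion_traj   : (n_frames, n_ions, 3) ion coordinates over the sampled ensemble
    macro_traj : (n_frames, n_atoms, 3) macromolecular coordinates
    cutoff     : distance (in Angstroms) standing in for the ion-accessible surface
    """
    n_frames, n_ions, _ = ion_traj.shape
    near = np.zeros((n_frames, n_ions), dtype=bool)
    for f in range(n_frames):
        # distance from every ion to its nearest macromolecular atom in this frame
        d = np.linalg.norm(ion_traj[f][:, None, :] - macro_traj[f][None, :, :], axis=-1)
        near[f] = d.min(axis=1) < cutoff
    residence = near.mean(axis=0)                # fraction of frames spent near the surface
    return np.where(residence >= min_residence)[0]
```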
Constructing atomic structures with modest to high Boltzmann probability that are consistent with the free-energy-minimizing pathway of the assembly is often not possible if only subsystem CMs and coarser-grained variables are known. This is because there are too many structures consistent with the same overall description, only a few of which contribute to the free-energy-minimizing pathway. Thus, though the above multiscale methodology formally derives Langevin equations from the N-atom Liouville equation, it would be impractical as a simulation tool in that form. To overcome this issue, each subsystem is described by a set of subsystem-centered variables that characterize not only its position, but also its orientation and overall deformation. The number of all-atom structures consistent with this information is much smaller than that constrained only by the CM information. Thus, limited (though still quite large) ensemble sizes suffice for average calculations. In the next subsection, conventional MD simulations are used along with the above-mentioned procedures for calculating OPs, thermal forces and diffusions to elucidate scenarios where a dynamical reference configuration is required for capturing assembly dynamics. Finally, the issue of atomic reconstruction is addressed, and a computationally-feasible workflow is derived for implementing the perturbation method.
2.4. Simulation Results and Discussion
The multiscale analysis developed in the previous section yields a Smoluchowski equation for evolving the reduced probability of the OPs and CMs. For practical simulations, Langevin equations were derived from this Smoluchowski equation, wherein the forces and friction/diffusion coefficients can be obtained via ensemble methods and short MD simulations. An OP-based Langevin simulation algorithm has been developed and implemented within the deductive multiscale simulator (DMS) [21,32]. However, in those studies, OPs are defined in terms of a fixed reference configuration (not a dynamical one), and any structural change is considered a deformation of this reference structure. In contrast, a dynamical reference configuration is now introduced to construct the OPs, and these OPs are subsequently used to probe the structure and dynamics of a macromolecular assembly. Then, using all-atom data (positions, velocities and forces) from MD trajectories (namely, the NAMD parallel code [46]) with classical force fields (CHARMM27 [47]), the behavior of the thermal-average forces and diffusion factors in the Langevin equations is analyzed and compared between contrasting simulations of connected versus disconnected systems. Finally, the effect of simultaneous Langevin evolution on the accuracy of multiscale macromolecular assembly simulations is deduced via direct comparison with MD predictions. These ideas are demonstrated using MD and multiscale simulations of the RNA-mediated assembly of STMV capsid proteins and the expansion of its capsid-free RNA. This choice of demonstration system is made according to the criteria that the system must be large enough that the timescale separation between individual atomistic and overall structural dynamics warrants a multiscale approach, yet small enough that complex dynamical behaviors are observed within 10 ns of MD simulation. All simulation parameters are provided in Table 1.
Consider a 10-ns MD simulation of STMV protein monomers assembling with RNA in 0.25 M NaCl (Figure 1). The initial configuration of the system is a random-coil state of the 949-nucleotide RNA surrounded by 60 randomly-placed capsid monomers (accounting for 12 pentamers). During the simulation, the proteins are electrostatically attracted to the RNA, and the system begins to organize into an RNA core with an external protein shell. With this, the protein monomers gradually transition from a disconnected to a non-covalently-bonded state. There exist multiple pathways that lead to such self-assembly in STMV [48]. However, the aim of this study is not to analyze these mechanisms (Figure 2). Rather, the aim here is to understand how such a structural transition can be captured by the Langevin Equations (20). To probe the contributions from different terms in the equations as they vary with the nature of the system dynamics, the OPs, CMs, thermal forces and diffusions from the MD assembly simulation are compared to those from an RNA expansion in 0.25 M NaCl solution. The RNA of STMV is tightly encased within the capsid core in an icosahedral structure via strong electrostatic interactions [48]. As the capsid is removed, electrostatic repulsion among neighboring negatively-charged nucleotides causes the system to expand, so that the repulsive forces subside [21]. This simulation provides a contrasting example to that of the assembly as, now, the subunits (the pentamers of nucleotide helices (Figure 2)) are moving further apart and not towards one another. Furthermore, the connectivity of the system is maintained throughout the simulation.
In this vein, all-atom configurations sampled every 100 ps from the MD simulation of the monomer assembly are used in Equations (2) and (4) to reconstruct the evolution of the CMs and global OPs. The OPs considered (with k = {100, 010, 001}) capture the overall dilation/compression of the STMV-RNA assembly along the three Cartesian directions, and R incorporates the CMs of the protein monomers. The results of the MD simulations imply that the rate of change of the polynomials U is slower than that of the OPs Φ, which, in turn, is slower than that of the subsystem CMs R. This is because, while U and Φ characterize the motion of the entire system, R does so for only a subunit. In particular, the change in the basis functions U is the slowest, as they vary smoothly across the system. Though slow, such changes motivate the use of a dynamical reference configuration for constructing the OPs. All three variables, in turn, change on a timescale several orders of magnitude longer than that of atomic fluctuations. Thus, even though there exists a spatio-temporal scale separation between the three types of coarse-grained variables, it is much smaller than their separation from the atomic scale. As a result, it is assumed that the three variables change on a similar timescale relative to all-atom fluctuations. Finally, in order to gauge the accuracy of the multiscale simulations, their results are benchmarked against those from conventional MD. The RNA-mediated protein assembly and the expansion of the capsid-free RNA are each simulated for 10 ns. The multiscale and MD simulations are implemented with identical initial structures and conditions from the previous section.
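The relative rates of change quoted above can be quantified, for example, by a normalized root-mean-square rate computed from trajectories of U, Φ and R sampled every 100 ps; the specific normalization below is an illustrative choice rather than the measure used in the original analysis.

```python
import numpy as np

def normalized_rate(traj, dt):
    """RMS rate of change of a coarse-grained trajectory, normalized by its RMS amplitude.

    traj : (n_frames, ...) array of U_k values, OPs or CMs sampled every dt (e.g., 100 ps)
    """
    flat = traj.reshape(traj.shape[0], -1)
    rate = np.diff(flat, axis=0) / dt
    return np.sqrt((rate**2).mean()) / np.sqrt((flat**2).mean())

# Comparing normalized_rate for U, Phi and R trajectories is expected to give
# rate(U) < rate(Phi) < rate(R), the ordering reported from the MD analysis above.
```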
First, consider the results from the assembly simulation. The system is described using 3^3 global OPs along with 2^3 subsystem OPs for each of the 60 monomers and 3^3 variables elucidating the coarse-grained structure of the RNA. The number of these OPs is a natural consequence of the residual minimization of Equation (3) and, therefore, implies maximum structural information at the CG level. This set of OPs can be systematically enriched if found to be incomplete (i.e., when the OP velocity autocorrelation functions possess long-time tails). The simulation results show that the global as well as subsystem thermal-averaged force distributions and the diffusion coefficients are in excellent agreement with those from MD (Figure 2a,b). As above, the forces on the monomers are primarily negative, implying attractive interaction with the RNA. Such forces facilitate the observed aggregation. The evolution of large-scale structural variables, including the global and subsystem OPs, the radius of gyration and the root-mean-square deviation (RMSD) from the initial configuration, is presented in Figure 2c–f. Figure 2g shows the potential energy for the multiscale and MD simulations. These structural variables and energy profiles show excellent agreement in trend, as well as in magnitude. As the protein monomers and the RNA aggregate, the potential energy gradually decreases, indicating stabilization of the system. This trend is consistent with an increase in the number of inter-nucleic-acid hydrogen bonds and suggests that the RNA gains secondary structure during assembly. The observed difference is within the limits of those from multiple MD runs beginning from the same initial structure with different initial velocities. The agreement in simulated trends, as also visually confirmed in Figure 2h, suggests that the multiscale procedure generates configurations consistent with the overall structural changes that arise in MD. However, care should be taken in comparing atomic-scale details, such as dihedral angle or bond length distributions, between the conventional and multiscale simulation procedures. The latter evolves an ensemble of all-atom configurations, e.g., ensembles of size 2 × 10^3 generated during every Langevin time step of 50 ps, to compute forces and diffusions. The thermal-averaged forces remain practically unchanged with a further increase in ensemble size. Thus, such trajectories should be compared to an average over multiple MD runs (or a single very long MD simulation). Such MD simulations are infeasible for this system, due to the large number of atoms involved and the long times needed to study the phenomena of interest. Therefore, a strict computational time comparison is avoided, though the equivalence of our multiscale simulations with ensemble MD methods at the atomic scale has been investigated for smaller systems, such as lactoferrin [19,45].
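For completeness, minimal versions of the two structural measures used in the comparison (radius of gyration and RMSD from the initial configuration) are sketched below; any alignment of configurations before the RMSD calculation is assumed to have been performed beforehand.

```python
import numpy as np

def radius_of_gyration(positions, masses):
    """Mass-weighted radius of gyration of a configuration (N, 3)."""
    com = np.average(positions, axis=0, weights=masses)
    return np.sqrt(np.average(np.sum((positions - com)**2, axis=1), weights=masses))

def rmsd(positions, reference):
    """Root-mean-square deviation from a reference configuration (no superposition)."""
    return np.sqrt(np.mean(np.sum((positions - reference)**2, axis=1)))
```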
The MD and multiscale trajectories of the RNA expansion also show comparable trends. These data are not presented here for the sake of brevity (see [21]). Multiscale trajectories for both of the demonstration systems are repeated using a fixed versus a dynamical reference structure. In Figure 2f, the resulting RMSDs from identical initial structures are presented. For the RNA, the multiscale trajectories with and without re-referencing show considerable agreement with those from MD. Contrastingly, for the protein assembly, only the trajectory with the dynamical reference structure agrees with MD. From this result, independent multiscale simulations confirm the need for the coupled evolution of reference-configuration CMs, OPs and atomic ensembles to account for complex deformations in macromolecular assemblies. With these results, it is apparent that the multiscale methodology induces an algorithm that can be implemented to simulate the behavior of slowly-evolving phenomena within macromolecular systems at a fraction of the computational expense incurred by long-time MD simulation.