Multiscale Models for Fibril Formation: Rare Events Methods, Microkinetic Models, and Population Balances

Amyloid fibrils are thought to grow by a two-step dock-lock mechanism. However, previous simulations of fibril formation (i) overlook the bi-molecular nature of the docking step and obtain rates with first-order units, or (ii) superimpose the docked and locked states when computing the potential of mean force for association and thereby muddle the docking and locking steps. Here, we developed a simple microkinetic model with separate locking and docking steps and with the appropriate concentration dependences for each step. We constructed a simple model comprised of chiral dumbbells that retains qualitative aspects of fibril formation. We used rare events methods to predict separate docking and locking rate constants for the model. The rate constants were embedded in the microkinetic model, with the microkinetic model embedded in a population balance model for “bottom-up” multiscale fibril growth rate predictions. These were compared to “top-down” results using simulation data with the same model and multiscale framework to obtain maximum likelihood estimates of the separate lock and dock rate constants. We used the same procedures to extract separate docking and locking rate constants from experimental fibril growth data. Our multiscale strategy, embedding rate theories, and kinetic models in conservation laws should help to extract docking and locking rate constants from experimental data or long molecular simulations with correct units and without compromising the molecular description.


Introduction
Proteins and peptides interact with water and with each other via numerous hydrophobic residues, charged residues, and hydrogen bond donors and acceptors [1][2][3]. Their diverse functional groups allow them to engage in a huge variety of intramolecular folds and intermolecular associations. The enormous number of conformational states and intramolecular interactions make the protein folding problem extremely challenging [4][5][6]. Likewise, protein-protein complex formation and peptide fibril assembly are complicated by the enormous number of intra-and intermolecular interactions.
The mechanism by which protein fibrils form has been studied extensively [27][28][29][30][31][32]. At a coarse level of detail, fibril formation begins with one or two-step [27,28] versions of one-dimensional nucleation [33][34][35][36] followed by non-equilibrium growth. Processes such as secondary nucleation, i.e., breakage of fibrils to create new nuclei, and merging of as secondary nucleation, i.e., breakage of fibrils to create new nuclei, and merging of fibrils can also contribute to the fibril formation kinetics [27]. The growth mechanism and kinetics are the primary focus of this paper. Growth is thought to proceed by the addition of monomers in a two-step dock-lock mechanism [37][38][39]. The monomer first "docks" on the end of the fibril and forms initial interactions. It then "locks" into the morphology of the stable fibril structure while expelling any intervening water molecules. A single peptide can explore multiple docked states on a preformed fibril before it finally locks into the fibril's native structure. As such, the energy surface associated with these systems is rugged, with numerous local minima. Moreover, many of the docked states are off-path and meta-stable complexes [40]. These aspects of the docked state(s) make simulation of fibril growth extremely difficult.
Past simulations of fibril growth can be grouped into two main categories: (1) studies that compute the association constant for the combined docked and locked states [41][42][43][44][45], and (2) studies that focus on the time required to lock from the docked states, e.g., with Markov state models [46][47][48][49][50][51][52][53][54]. However, the docking and locking steps, with different concentration dependences, are rarely considered together as required for an accurate prediction of the overall growth rate. The schematics in Figure 1 depict these two incomplete modeling strategies.  [55] to describe the association of a Barnase and Barstar complex [56]. Their atomistic simulations give rate constants with correct units: events/time/concentration for association and events/time for transitions between bound states.
Fawzi et al. studied the critical nucleus size and the elongation mechanism of two different quaternary forms of Aβ   [57]. Using a coarse-grained model of Aβ , they computed the fraction of trajectories that docked at the fibril end when started at a specific distance from the fibril. This is a key step in the NAM framework [55], but they did not complete the calculations to obtain docking and locking rate constants.
In this article, we treated each step in the lock-dock mechanism separately, using a microkinetic model to combine both steps into an overall fibril growth rate expression. The rate expression allowed us to extract rate constants from previously obtained experimental data. Using a simple fibril assembly model, we also parametrized the microkinetic  [55] to describe the association of a Barnase and Barstar complex [56]. Their atomistic simulations give rate constants with correct units: events/time/concentration for association and events/time for transitions between bound states.
Fawzi et al. studied the critical nucleus size and the elongation mechanism of two different quaternary forms of Aβ (1-40) [57]. Using a coarse-grained model of Aβ , they computed the fraction of trajectories that docked at the fibril end when started at a specific distance from the fibril. This is a key step in the NAM framework [55], but they did not complete the calculations to obtain docking and locking rate constants.
In this article, we treated each step in the lock-dock mechanism separately, using a microkinetic model to combine both steps into an overall fibril growth rate expression. The rate expression allowed us to extract rate constants from previously obtained experimental data. Using a simple fibril assembly model, we also parametrized the microkinetic model with docking and locking rates from separate rare events calculations and compared predictions to direct simulations of fibril growth. Finally, we integrated the growth rate Life 2021, 11, 570 3 of 18 expression in a population balance model to describe an ensemble of growing fibrils. This multiscale treatment (Figure 2) of the growth process allowed us to directly use simulation or experimental data of fibril length vs. time to extract the rate constants for the proposed growth rate model. We demonstrate this in the last section by using our model together with experimental and simulation growth data for Aβ .
Life 2021, 11, x FOR PEER REVIEW 3 of 19 model with docking and locking rates from separate rare events calculations and compared predictions to direct simulations of fibril growth. Finally, we integrated the growth rate expression in a population balance model to describe an ensemble of growing fibrils. This multiscale treatment (Figure 2) of the growth process allowed us to directly use simulation or experimental data of fibril length vs. time to extract the rate constants for the proposed growth rate model. We demonstrate this in the last section by using our model together with experimental and simulation growth data for Aβ .

Docking and Locking Constants
For bimolecular reactions between spheres or charged spheres in solution, one can obtain rate constants from the steady-state Smoluchowski [58] or Debye [59] theories. For irregular molecules, one can monitor many reactive Brownian dynamics (BD) trajectories to calculate the rate constants. However, this is in practice impossible in an infinite domain because it requires us to simulate a huge box and wait for long times as the reactants diffuse from large distances to react (which may never happen). An alternative route to the docking rate constant D k is that developed by Northrup, Allison, and McCammon (NAM) [55]. NAM connected the probabilities calculated from finite domain BD trajectories to the corresponding probabilities in the infinite domain. In their method, they place the binding site of one molecule at the origin. A second molecule is placed on the surface of a sphere of radius 1 b from the origin. The distance 1 b is selected so that the two molecules interact only through an isotropic coulomb potential ( ) U r for separation A second sphere with radius 2 1 b b > is also centered at the origin as a cutoff distance for terminating Brownian trajectories that wander too far from the target. Multiple trajectories are then initiated from distance 1 b to estimate the probability that the second molecule docks at the fibril end (origin) before reaching the outer sphere at radius 2 b 2 trajectories that reach docked state before . total no. trajectories

Docking and Locking Constants
For bimolecular reactions between spheres or charged spheres in solution, one can obtain rate constants from the steady-state Smoluchowski [58] or Debye [59] theories. For irregular molecules, one can monitor many reactive Brownian dynamics (BD) trajectories to calculate the rate constants. However, this is in practice impossible in an infinite domain because it requires us to simulate a huge box and wait for long times as the reactants diffuse from large distances to react (which may never happen). An alternative route to the docking rate constant k D is that developed by Northrup, Allison, and McCammon (NAM) [55]. NAM connected the probabilities calculated from finite domain BD trajectories to the corresponding probabilities in the infinite domain. In their method, they place the binding site of one molecule at the origin. A second molecule is placed on the surface of a sphere of radius b 1 from the origin. The distance b 1 is selected so that the two molecules interact only through an isotropic coulomb potential U(r) for separation r > b 1 . A second sphere with radius b 2 > b 1 is also centered at the origin as a cutoff distance for terminating Brownian trajectories that wander too far from the target. Multiple trajectories are then initiated from distance b 1 to estimate the probability that the second molecule docks at the fibril end (origin) before reaching the outer sphere at radius b 2 p = trajectories that reach docked state before b 2 total no. trajectories .
(1) Figure 3 shows the set-up for the BD trajectories in the NAM method. Note that p is smaller than probability p ∞ that the second molecule will dock rather than escape by diffusion to an infinite distance from the fibril end. NAM showed that where Ω is the probability that a particle at r = b 2 will eventually return to r = b 1 . This is obtained analytically from the Smoluchowski equation [58] as with radius 1 b , Equation (3) simplifies to The overall rate constant can be defined as the rate of diffusion to the surface of a sphere with radius 1 [58] multiplied by the probability of reacting rather than diffusing to infinite separation, starting from separation 1 b ( p ∞ ) [55]: Here, 0 D is the diffusion constant for relative motion of the particles, equal to the sum of their self-diffusion constants when no forces or hydrodynamic interactions exist For complex systems where p might be extremely small, e.g., because of large peptides that interact with the fibril over long distances and therefore require a large value of When the potential of mean forces U(r) is zero and D is constant outside sphere with radius b 1 , Equation (3) simplifies to Ω = b 2 /b 1 . The overall rate constant can be defined as the rate of diffusion to the surface of a sphere with radius b 1 (Smoluchowski's 4πD 0 b 1 ) [58] multiplied by the probability of reacting rather than diffusing to infinite separation, starting from separation b 1 (p ∞ ) [55]: Here, D 0 is the diffusion constant for relative motion of the particles, equal to the sum of their self-diffusion constants when no forces or hydrodynamic interactions exist for r > b 1 [55].
For complex systems where p might be extremely small, e.g., because of large peptides that interact with the fibril over long distances and therefore require a large value of b 1 . These systems may require more sophisticated sampling methods such as transition interface sampling (TIS) [60] or forward flux sampling (FFS) [61] to estimate the committor p.
In atomistic simulations, the dock to lock transition is extremely complicated. Much like the protein folding problem, there are many docked states and only one (or perhaps two) locked states. Advanced sampling techniques for rare events are useful in studying the complicated dock-to-lock transition but most of these techniques require reaction coordinates that can describe the transition well. Finding a suitable reaction coordinate can be difficult [62], but there have been successful examples using native contact maps [47,48,50]. In this work, we used umbrella sampling and Langer's method [63] to study the dock to lock transition. Starting from a set of reaction coordinates q and the free energy surface (FES) from umbrella sampling, we could expand the FES around the saddle point (q ‡ ) as Matrix A should have n − 1 positive eigenvalues and a single negative eigenvalue where n is the number of dimensions in q. The same free energy expansion can be used for the reactant basin where q 0 is the coordinate of the bottom of the well of the reactant basin. The diffusion tensor at the saddle point is also needed for Langer's theory Here, δq i (t) = q i (t) − q i (t) ‡ and the double dagger sign indicates that the average is calculated using trajectories started at the saddle point. Having calculated D ij , we could obtain the locking rate constant from where −λ + is the single negative eigenvalue of the matrix DA,

Microkinetic Model
Although many studies refer to one step as fast [39,64], the docking and locking processes happen at the same net rate when fibril is growing at a steady state. The forward and backward docking and locking rates also depend differently on the monomer concentration, and thus k D and k L have different units. The docking process is a diffusioncontrolled, bi-molecular association process, and therefore k D has units of L/mol/s [44,65]. The locking process, however, is a conformational rearrangement with a first-order rate constant similar to a protein folding process, with units of events/s [47].
An overall rate law for the dock-lock growth mechanism can be developed using microkinetic modeling techniques that are familiar in catalysis [66]. For example, Massi and Straub developed a microkinetic model for the elongation rate of a fibril [65]. They included two different states for both the fibril and the monomer and derived a rate expression with six different rate constants. Massi and Straub's work is too elaborate to be fitted with most datasets, but they extracted each constant using multiple fitting techniques and data from multiple experimental methods.
Here, we try to simplify the growth rate expression by proposing a model where the fibril end is always in either a docked state or a locked state. The underlying assumption is that the docked states of the system inter-convert on a faster timescale than the locking process and the overall growth process, and therefore all the docked states can be lumped into one. This means that the system has no on-or off-path meta-stable states other than the docked state mentioned above. We further assumed that the monomers can only bind to a locked end. Figure 4 shows the overall schematic of the growth process.
Here, we try to simplify the growth rate expression by proposing a model where the fibril end is always in either a docked state or a locked state. The underlying assumption is that the docked states of the system inter-convert on a faster timescale than the locking process and the overall growth process, and therefore all the docked states can be lumped into one. This means that the system has no on-or off-path meta-stable states other than the docked state mentioned above. We further assumed that the monomers can only bind to a locked end. Figure 4 shows the overall schematic of the growth process. By using pseudo-steady state approximation and computing the net flux between any docked and locked state, we obtained the net growth rate as [66]

Population Balance Model
Here, we assume that fibrils grow irreversibly, e.g., because the monomers do not unlock once they are locked into a fibril (  By using pseudo-steady state approximation and computing the net flux between any docked and locked state, we obtained the net growth rate as [66] Here, k D is the docking rate constant, k L is the locking rate constant, [M] is the monomer(peptide) concentration, k UL is the reverse rate for the locking step, and K D is the equilibrium constant for the docking step.

Population Balance Model
Here, we assume that fibrils grow irreversibly, e.g., because the monomers do not unlock once they are locked into a fibril (k UL = 0) or because the docking equilibrium constant K D is large. If we take the case where k UL = 0, then the growth rate equation will be and the following master equation describes the fibril population The equation for dρ/dt would also contain ρ(L + 1, t) terms if we considered fibril dissolution. Expanding ρ(L − 1, t) around L and keeping terms up to the second order gives Putting Equation (13) back into Equation (12) gives This is now a Fokker-Planck equation for variable L. We note that Equation (14) only describes growth. Nucleation, aggregation, and breakage of fibrils can be included in more elaborate PBEs [67], but we do not attempt that here. Moreover, note that there are no nucleation or fragmentation events in the PDE, and thus the total number of fibrils is constant, In a finite closed system with a fixed number of monomers, the concentration of free dissolved monomers [M] decreases as they attach to the fibrils: Equations (14) and (16) describe the case where the fibril grows from only one end. In cases where the fibril can grow from both ends, the coefficients for length derivatives in Equation (14) and the right side of Equation (16) should be multiplied by a factor of 2.
The monomer concentration might be held constant at [M] = [M] 0 throughout the experiment, e.g., by a continuous flow of solution over immobilized fibrils. In this case, Equation (14) can be simplified by the substitution (17) to give ∂x(L, t) ∂t where x(L, t) = ρ(L, t)/ρ tot . Full derivation steps are available in the accompanying Supplementary Information. The solution to Equation (18) with the initial distribution Is Here, g(L, t|L 0 ) is the Green's function given by Figure 5 shows the solution to the population balance model at various dimensionless times t vs. fibril length L. Figure 5 shows the solution to the population balance model at various dimensionless times t vs. fibril length L .  When the system has a fixed total number of peptides so that growth depletes monomer concentration, Equation (16) can be written as and ρ tot is defined as in Equation (15). The solution to Equation (22) is In Equation (23), W(z) is the Lambert W function, which is the solution w to z = we w . Using Equation (23), we can rewrite the population balance model in Equation (14) for a system with a fixed number of peptides. Division through by r g ([M] 0 ) and ρ tot gives Dividing Equation (24) where The solution to Equation (25) will have the same format as Equation (18), i.e., with τ defined as Note that all the variables and kinetic parameters of the system (i.e., [M] 0 , ρ tot , k D , k L , and K D ) are now absorbed into α and ε. Figure 6 illustrates the relation between τ and t for different values of α. The relation between τ and t changes from equality in the constant monomer concentration case to a Lambert W function in the monomer depletion case. Equation (28) shows that the time scale of the growth process "stretches" as monomers are consumed. This is consistent with our intuition-as the monomer concentration decreases, the driving force for growth reduces, it takes longer for the monomers to find fibril ends, and the time needed for the addition of monomers to the fibril increases.
Life 2021, 11, 570 9 of 18 tween τ and t for different values of α . The relation between τ and t changes from equality in the constant monomer concentration case to a Lambert W function in the monomer depletion case. Equation (28) shows that the time scale of the growth process "stretches" as monomers are consumed. This is consistent with our intuition-as the monomer concentration decreases, the driving force for growth reduces, it takes longer for the monomers to find fibril ends, and the time needed for the addition of monomers to the fibril increases. We can also use Equation (28) to create a plot like Figure 5 for the varying monomer concentration case. Figure 7 shows how the fibril populations change with t when the monomer concentration is not held constant. As expected, the population of fibrils with 10 L = (initial length) grows towards longer fibrils until the system runs out of monomers. The growth process is also slowed down by the fact that the driving force for growth (i.e., monomer concentration) is constantly decreasing. This means that at the same t, the We can also use Equation (28) to create a plot like Figure 5 for the varying monomer concentration case. Figure 7 shows how the fibril populations change with t when the monomer concentration is not held constant. As expected, the population of fibrils with L = 10 (initial length) grows towards longer fibrils until the system runs out of monomers. The growth process is also slowed down by the fact that the driving force for growth (i.e., monomer concentration) is constantly decreasing. This means that at the same t, the peaks in Figure 7 are higher and at lower L values compared to their counterparts in Figure 5.  The analysis laid out above allows us to use time-series data of monomer concentration and fibril length from simulation data or experimental results together with Equation (21) or Equation (27) to extract values for D k , D K , and L k .

Simple Model and Simulations
To test our rate equation with simulation data, we developed a simple model system with two types of diatomic molecules that associate in "docked" or "locked" states. The docked and locked states in our model correspond to fibril ends with a specific chirality as shown in Figure 8. The interaction potential is made up of several components:  (27) with τ defined by Equation (28). Initial fibril length distribution is x 0 (L) = δ(L − 10). The same values of α and ε are used so that Figures 5 and 7 are comparable. The inset shows the change in dimensionless monomer concentration m with dimensionless time (Equation (23)).
The analysis laid out above allows us to use time-series data of monomer concentration and fibril length from simulation data or experimental results together with Equation (21) or Equation (27) to extract values for k D , K D , and k L .

Simple Model and Simulations
To test our rate equation with simulation data, we developed a simple model system with two types of diatomic molecules that associate in "docked" or "locked" states. The docked and locked states in our model correspond to fibril ends with a specific chirality as shown in Figure 8. The interaction potential is made up of several components:  (23)).
The analysis laid out above allows us to use time-series data of monomer concentration and fibril length from simulation data or experimental results together with Equation (21) or Equation (27) to extract values for D k , D K , and L k .

Simple Model and Simulations
To test our rate equation with simulation data, we developed a simple model system with two types of diatomic molecules that associate in "docked" or "locked" states. The docked and locked states in our model correspond to fibril ends with a specific chirality as shown in Figure 8. The interaction potential is made up of several components:  Particles of each dumbbell are bonded together with a harmonic potential (1-2 and 3-4 interactions in Figure 8).

2.
Particles of types 1 and 2 interact with particles of types 3 and 4 through a Lenard-Jones potential. This potential contributes equal stability to both the docked and locked states.

3.
The centers of mass (COM) of molecules of type 1-2 and 3-4 interact through a shortranged Lenard-Jones potential. This potential stabilizes the fibril and prevents it from dissociating. Inclusion of these LJ interactions also allows us to reduce the strength of LJ interactions between edges of the fibril and molecules in solution, which might otherwise promote branching and secondary nucleation.

4.
A Weeks-Chandler-Andersen potential between the particles in the same type of molecule, i.e., between 1 and 1, 1 and 2, 2 and 2, 3 and 3, 3 and 4, and 4 and 4, prevents the fibril from forming unstructured oligomers.

5.
A channel that guides incoming dumbbells to the docked state is introduced by using a combination of three 2D Gaussian functions (lines 5, 6, and 7 in Equation (29)): • The wall: This part of the force field prevents the dumbbells from directly locking into the fibril (vertical wall near r COM = 2 in Figure 9).

•
The channel: This guides the trajectories into the docked basin after they pass the wall. • Locked state tilting: This Gaussian function is included to make the locked state more favorable than the docked state.
Values of the energy and geometry parameters of the interaction potential are shown in Table 1.
The reaction coordinates that describe the dumbbells and their relative positions are as follows: r COM is the center-of-mass to center-of-mass coordinate distance between two dumbbells, and q is a chirality coordinate that describes the structure of each dimer. These are defined for each pair of molecules using the positions of the particles 1 through 4 via the following definitions: Here, r 0 is the equilibrium bond length. The chirality coordinate is defined to differentiate between the two enantiomers. Values of q range from −1 to 1, where −1 and 1 (at a known separation r) are the two possible enantiomers. The extrema of q are not exactly equal to −1 and 1, due to the vibrations of the bonds during the simulation.
All the trajectories are generated following the overdamped dynamics and using the Euler-Maruyama integrator [68] x where D is diffusivity (identical for all atoms), β = 1/k B T, and ξ is a Gaussian random number with zero mean and unit variance. NAM analysis is performed following the protocol described in Section 2 by putting a fibril made of four dumbbells in the center of the simulation box and a dumbbell of type 3-4 at a given distance b 1 . The trajectories will run until the distance between the dumbbells becomes either smaller than the average center-to-center distance in the docked (or locked) state or greater than the predefined cut-off distance b 2 . Depending on the starting separation, 10,000 or 20,000 trajectories are launched from distance b 1 = 12, 13, 14, 15, and 16 Å. The number of successful trajectories are counted, and success probability is then calculated using Equation (1). The number of trajectories that go directly to the locked state is also tracked to make sure that nearly all dumbbells initially attach in the docked state. locked) state or greater than the predefined cut-off distance 2 b . Depending on the starting separation, 10,000 or 20,000 trajectories are launched from distance 1 b = 12, 13, 14, 15, and 16 Å. The number of successful trajectories are counted, and success probability is then calculated using Equation (1). The number of trajectories that go directly to the locked state is also tracked to make sure that nearly all dumbbells initially attach in the docked state. Figure 9. Free energy surface obtained from umbrella sampling and weighted histogram analysis [69]. The solid lines show the path of a typical reactive trajectory. The free energy profile between a dumbbell and a preformed fibril of length L = 6 as a function of r COM and q is calculated using umbrella sampling. The free energy profile is then used in Langer's theory as described in Section 2 to calculate the locking rate constant. The docking equilibrium constant (K D ) is also calculated using the free energy profile.

Results and Discussion
The docking rate was computed through the NAM method described in Section 2. Table 2 shows results for different starting separations b 1 and cutoff distances b 2 . The success probability p was calculated using Equation (1). The results presented in Table 2 show that a very low percentage of trajectories directly went to the locked state, and thus the dynamics of the model system were consistent with the assumed sequence of events in the dock-lock mechanism. The docking rates presented in the table were calculated using Equation (4). As expected, the computed value k D = (1.21 ± 0.01) × 10 9 L/mol/s, did not depend on b 1 and b 2 .
The trajectories that fell directly into the locked state were omitted from the results above and were assumed to introduce negligible error in the rate calculation procedure (less than ≈6% of the trajectories).
Bottom-up rate constants were obtained using free energy surface and short trajectories with methods in Section 2. Specifically, we calculated the diffusivity tensor used in Langer's theory using pico-second long trajectories started near the saddle point. Using Equation (7), we calculated diffusivities as D rr = 1.696 × 10 10 Å 2 /s, D qq = 1.600 × 10 10 /s, and D rq = D qr = 7.808 × 10 8 Å/s. The free energy surface obtained from umbrella sampling (shown in Figure 9) was used to expand the free energy around the minimum of the docked basin and the saddle point. Finally, we used Equation (8) to calculate k L = 9.852 × 10 7 /s. The docking equilibrium constant was also calculated from the free energy surface to be K D = 8.440 × 10 2 mol/L.
To demonstrate how the population balance model can be used in top-down data analysis to extract rate parameters, we filled a periodic simulation box of size (60 Å) 3 with individual dumbbells and a preformed fibril of initial length 10 dumbbells. We prevented the nucleation of new fibrils by constantly checking for dimers, separating them, and (randomly) throwing the dumbbells back in the box. The dumbbell concentration was held constant by adding a dumbbell to the box in a random position every time the fibril grew. Results from this simulation were used with maximum likelihood estimation (MLE) to extract the rate constants from brute-force simulations. Results were compared to the values calculated from Langer's theory.
Top-down estimates of the same rate constants were obtained from simulations of fibril growth in 50 identical boxes at five different concentrations, each one held at fixed dumbbell concentration with seeded fibrils of length L 0 = 6. Figure 10 shows a snapshot of the simulation box. The data from these simulations were then used to extract rate constants with MLE. To this end, the growth rate expression in Equation (11) had to be re-casted in a form that enabled the identification of free parameters for optimization. Division through by k D gave where K = k L /k D + K D −1 . The code used for the top-down analysis is available in the Supplementary Information. Table 3 shows the results from top-down analysis and bottomup predictions side-by-side. dumbbell concentration with seeded fibrils of length 0 6 L = . Figure 10 shows a snapshot of the simulation box. The data from these simulations were then used to extract rate constants with MLE. To this end, the growth rate expression in Equation (11) Table 3 shows the results from top-down analysis and bottomup predictions side-by-side.    There was reasonable (order-of-magnitude) agreement between the values for k L and K as calculated from rare events methods and MLE. The difference between the bottom-up and the top-down k L values could arise for two reasons. First, errors in the computed free energy surface were exponentially magnified in Langer theory for k L . Second, the data provided to MLE for fitting was limited to ≈100 ns at five different monomer concentrations. The predictions would be more accurate with longer trajectories and a wider range of concentrations. MLE code and the contour plot showing the log-likelihood as a function of k L and K are available in the Supplementary Information.
To further demonstrate the top-down extraction of rate constants from MLE with our population balance model and growth rate expression, we used previously published experimental data from Young et al. [71] for the growth of Aβ . In their analysis, Young et al. used a two-step Michaelis-Menten kinetic model to describe the results. In this work, we derived our rate law for fibril growth on the basis of the two-step dock-lock mechanism and by using microkinetic model techniques. This provided a mechanistic basis for the Michaelis-Menten expression and a molecular interpretation for its fit parameters. We calculated k L = 21.10 /s and K = 2.71 × 10 −5 mol/L. Young et al. reported a value of K = (7.2 ± 2.4) × 10 −5 mol/L. Note that Young et al. invoked an additional "pause state" beyond the dock and lock states to explain intermittent plateaus in their fibril length vs. time data. They did not provide details on how they treated the pauses in their data analysis, e.g., whether they truncated or abridged the data to remove pauses in computing K. We obtained the value K = 2.71 × 10 −5 mol/L by fitting their entire dataset with no adjustments. Potential differences in our analysis strategies may therefore have been responsible for the threefold difference in estimates of K.

Conclusions
This study presented a multiscale rare events strategy for the dock-lock mechanism of fibril growth. Our strategy separately computes dock and lock rate constants with correct units and incorporates them in a microkinetic model for the growth (or dissolution) rate as a function of monomer concentration. The microkinetic model is further embedded in a population balance model to predict the evolving length distribution in an ensemble of growing fibrils. The population balance model provides a direct connection to experimental data and/or to large-scale molecular simulation data of fibril growth. We demonstrated how this framework can be used for top-down extraction of kinetic parameters from experimental data and molecular simulation data. For the simulation data, we demonstrated that the top-down analysis yielded rate constants in reasonable agreement with those from rare events calculations for the dock and lock steps.
Apart from the multiscale framework, a central contribution of this work is a blueprint for computing second-order docking rate constants and constructing proper concentrationdependent fibril growth rate laws from molecular level rare events calculations. Although fibril growth mechanisms will vary by sequence and solvent composition, possibly introducing additional intermediates and kinetic traps, the multiscale analysis strategy here should help to connect simulations and experiments. We note that it should also be possible to obtain second-order docking rate constants from MSM results, but to our knowledge, the calculation has not been done. As outlined in Joswiak et al. [72], one must first assume that fibril-solute encounters are rare, i.e., that finding a solute in the simulation box is already a rare event. Then, Poisson statistics give the probability to find a solute in the simulation box as V box [M], where V box [M] << 1. Now suppose the MSM approach finds a docking rate constant k MSM with units/s. The MSM result is a conditional transition probability per time given that a solute is in the box. This means that the true rate, using conditional probability rules, is From this, we identify k D = k MSM V box , which should make it possible to revisit many MSM predictions, convert them to second-order k D values, and compare them to experiments.