Martini 3 Model of Cellulose Microfibrils: On the Route to Capture Large Conformational Changes of Polysaccharides

High resolution data from all-atom molecular simulations is used to parameterize a Martini 3 coarse-grained (CG) model of cellulose I allomorphs and cellulose type-II fibrils. In this case, elementary molecules are represented by four effective beads centred in the positions of O2, O3, C6, and O6 atoms in the D-glucose cellulose subunit. Non-bonded interactions between CG beads are tuned according to a low statistical criterion of structural deviation using the Martini 3 type of interactions and are capable of being indistinguishable for all studied cases. To maintain the crystalline structure of each single cellulose chain in the microfibrils, elastic potentials are employed to retain the ribbon-like structure in each chain. We find that our model is capable of describing different fibril-twist angles associated with each type of cellulose fibril in close agreement with atomistic simulation. Furthermore, our CG model poses a very small deviation from the native-like structure, making it appropriate to capture large conformational changes such as those that occur during the self-assembly process. We expect to provide a computational model suitable for several new applications such as cellulose self-assembly in different aqueous solutions and the thermal treatment of fibrils of great importance in bioindustrial applications.


Introduction
In nature, biomass is a renewable resource, and today it is considered a key material in the circular economy plan. This natural resource is chemically composed of insoluble carbohydrates (e.g., lignocellulose and cellulose) which under the action of several enzymes can be converted into small monomeric subunits and then easily undergo degradation [1]. The extraction and purification at high quality of polysaccharides for industrial applications (mostly due to its inherent biodegradability) [2] are typically carried out by chemical and physical processes, such as contact with ionic liquids [3] or TEMPO-oxidation [4] and ultrasonic separation [5], respectively. These processes aim to destabilize the polar and electrostatic forces between cellulose chains in fibrils.
Molecular Dynamics (MD) simulation is the computational tool capable of investigating the underlying atomistic mechanisms in macromolecular systems at very short length and time scales (e.g., nm and µs). In this regard, the inner hydrophobic structure of cellulose I allomorphs and type-II fibrils has been characterized by MD methods [6][7][8]. As a result, the hydrogen bond (HB) network necessary to build an elementary microfibril has been quantitatively described in the ground state of the microfibrils in solution [7]. Note that a relatively strong presence of the O3-H· · · O5 intrachain and O6-H· · · O3 interchain HBs are relatively established, and it is believed they are responsible for the stability of the layered structures. This result was validated by X-ray and neutron fiber diffraction [9,10] and demonstrated the absence of O-H· · · O interactions between layers and a larger presence of C-H· · · O HBs between cellulose sheets. Thus, it is expected that some C-H· · · O hydrogen bonds and van der Waals forces may contribute to the stability of cellulose I.
In the age of fast computational processing and machine learning (ML), larger and more complex systems in equilibrium can be modeled by advanced all-atom MD and ML tools, such as viral diseases [11,12], material design [13], etc. However, out of equilibrium processes such as large conformational changes of biomolecules, e.g., the unfolding of a protein, self-assembly of polysaccharides in plant cell walls, enzymatic degradation of plastic, etc., are still beyond the state-of-the-art all-atom MD implementations. For such cases, coarse-grained (CG) methodologies were introduced to reduce the number of degreesof-freedom, which has a direct impact on the number of simulated particles. In addition, they employ soft potentials which allow the use of larger time steps. In general, they are developed on the basis of rigorous statistical mechanics. Thus, they can deal with larger systems and longer time scales than all-atom MD. Today, one can find several CG models that have been employed to model large fluctuations of cellulose microfibrils. For example, Fan et al. [14] developed a one-site CG model for cellulose Iβ which described fibrils on a 10-500nm length scale. Srinivas et al. [15] presented a one-site CG model derived by the force-matching approach, which captures the amorphous state of cellulose Iβ and calculated the free energy of transition between the crystalline to amorphous state. Similarly, our [8] CG model with one-bead per D-glucose centered on C4 atoms was able to model cellulose I allomorphs. In addition, the popular Martini force field [16] has been employed to model crystalline native cellulose (i.e., Iβ). Other CG models for sugars [17][18][19] are capable of describing accurately complex sugars (e.g., DNA, RNA, etc.) and capture mechanical properties and dynamics, but they can not reach very large length scales, as required by cellulose fibrils (>40 nm). Some of these models rely on implicit water and thus they can not be used in aqueous conditions, therefore, failing to reproduce the experimental conditions necessary for industrial applications. However, a Martini description could be used in combination with other biomolecules in water such as proteins, lipids, nucleic acids, and other biopolymers. In this regard, a generic framework can be derived in terms of the Martini approach.
Here, we present a Martini 3 model for cellulose fibrils, which employs one set of potential parameters, able to differentiate three structures of cellulose, namely Iα, Iβ, and II. In order to describe the inner structure of the flat ribbon of a cellulose chain, we employed a set of harmonic constraints between nearest neighbours.

All-Atom MD Simulation
The initial structures of cellulose I allomorphs (Iα and Iβ) and cellulose II fibrils were generated by the Cellulose-Builder toolkit [20]. In practice, we prepared systems with 36 cellulose chains, each one composed of 100 D-glucose units. The final structures were visualized by VMD [21], and data were postprocessed using its internal protocols. Allatom MD simulations were performed using NAMD [22] version 2.14. The fibrils were modeled using the CHARMM36 force field [23], where the D-glucose denoted as the "BLGC" residue in the force field was used to describe each residue in a given cellulose chain. A triclinic box was used to represent the simulation box, and periodic boundary conditions were implemented in all directions. Solvation of the simulation box by TIP3P water molecules [24] included a buffer distance of 15 Å from the fibril. A total of 71,286, 67,403, and 59,383 water molecules were necessary for each fibrils system, namely Iα, Iβ, and type-II. The system had an average density of 0.1053 atoms.
The equilibration protocol was: (i) 10,000 of energy minimization via conjugate gradient protocol for the solvent, while the solute remained restrained; (ii) unrestrained MD simulation using a time step of 2 fs, and a reference temperature and pressure of 300 K and 1 atm, respectively. For the second step the Langevin thermostat and Piston, which are implemented in NAMD were employed. Long-range electrostatic interactions were computed using the PME [25] methodology; (iii) finally, a production run of 100 ns for each system was started after the target temperature and pressure were achieved in the NVT and NPT ensembles. Cross sections of the fibril structures are represented in Figure 1.

Coarse-Grained Model: Martini 3
In this part, we employed a versatile CG force field denoted as Martini 3 [27]. This new methodology has been successfully validated in several applications of large conformational changes in complex systems [28][29][30]. Hence, our aim was to build a robust CG model that not only reproduced the basic structural parameters of cellulose Iα, Iβ, and type-II fibrils but also a CG model, which can be transferable between different fibrils. In order to achieve this goal, we searched in the parameter space of all CG Martini bead types. Since each cellulose fibril has different interchain long-range interaction networks (i.e., different packing structures due to HBs) we had to select an elementary fragment of the cellulose chain that represents all the structures. This study used the whole cellulose chain as the basic building block (BB) for all different fibrils.
For this purpose, we modeled each cellulose fibril as an aggregate of the same BB. Each BB consisted of several planes, with each one defined by 4 CG beads denoted as CG1, CG2, CG3, and CG4 (see Figure 2). The orientation between planes changed along the fibril axis as a consequence of our parameterization. Validation of this minimal model was supported by all-atom MD data. There are several ways to parameterize a plane, and here, we used an elementary parametrization known as the simplicial complex homeomorphic (SCH) to a plane [31]. This mathematical construction is defined by triangular patches that optimally cover an entire surface, in our case the cellulose chains. CG simulations showed an average density of 0.0105 atoms/Å 3 , with an average of 51,291 atoms per box.   Figure 3 shows atomistic distributions of distances between chosen CG sites after mapping via the SCH protocol. For all cellulose fibrils, we observe almost a well-defined bell-shape distribution with an exception for O2-O6 and O3-O6, where a bimodal distribution was sampled. In order to maintain a certain degree of parameter transferability between CG models of cellulose fibrils, we considered a distance value between two CG beads consistent with the average of the means of distributions. For those two cases where distributions diverged, we tuned the distance value to reproduce structural consistency in all CG models. The selected parameters are shown in Table 1  The labels assigned to distances are described in the method section, as well as the relationship between first and second D-glucose neighbours denoted by "+" and "++" respectively. Data for cellulose Iα, Iβ and type-II are represented by purple, green, and yellow solid lines, respectively. The horizontal black line indicates the value of the optimal distance parameter in the CG model. The distributions are bell-shaped with the exceptions of the intra-glucose distances O2-O6 and O3-O6, which follow a bimodal distribution.

Parameterization of the CG Model for Several Cellulose Fibrils
CG systems were simulated using GROMACS 2020.2 [32]. A similar equilibration procedure as described in the all-atom MD section was considered. In particular, long-range interactions were computed using Martini 3 standard methodology [27], namely, the reaction-field method [33] using a cutoff of 1.1 nm. Temperature and pressure control were achieved using the V-rescale thermostat [34] and Parrinello-Rahman barostat [35], respectively. Temperature and pressure were set to 300 K and 1 bar for all CG simulations. Equilibration in the NVT and NPT ensembles lasted 0.1 ns and 1 ns, respectively. The positions of two CG beads, namely CG1 and CG2, were constrained and fully relaxed during 1 ns by the end of the equilibration.
The time step for production MD runs was 0.02 ps and, CG-MD trajectories were set to 50 ns for each of the 18 different Martini 3 interaction types (i.e., P1, P6, SP1, SP6, TP1, TP6, N1, N6, SN1, SN6, TN1, TN6, C1, C6, SC1, SC6, TC1, and TC6). In the Martini 3 library, each interaction type is defined by at least three parameters, i.e., σ ij and ij correspond to the Lennard-Jones (12-6) potential and q for particle charges [16]. In our case, we decided to parametrize the CG3 with a TC1 Martini 3 bead type as it has the smallest interaction with CG water, generally given by W bead type, and other CG beads. According to the all-atom MD distribution of distances between CG centers (see Figure 3), we defined the same elastic constant equal to 30,000 kJ/mol for well-behaved bell-shape cases. Only those flexible bond distances, "O2-O6" and "O3-O6", required special attention. In this case, we tuned the elastic strength to a lower range of values using 25 kJ/mol, 250 kJ/mol, and 2500 kJ/mol. A cumulative time of about 3 µs was necessary to determine the best model amongst all parameters. Figure 4 shows the results of the CG parametrization for several bead types in Martini 3 and the elastic bond strength for O2-O6 and O3-O6. The final set of parameters satisfy the criterion of a small deviation of the RMSD. This quantity was averaged over all the CG trajectory, using as a reference the last frame of AA-MD simulation. As a result we selected the bead type SP6 and 2500 kJ/mol for the elastic constant as they follow our structural criterion, and the RMSD remained smaller for all fibrils simultaneously (see Table 2).    Table 3 shows the comparison of distance in CG and MD simulations. The Martini 3 model with parameters SP6 bead type and 2500 kJ/mol for O2-O6 and O3-O6 intramolecular distances maintains accurately the internal structure of the cellulose chain in each cellulose fibril type, namely Iα, Iβ, and type-II. Hence, it highlights a transferable model between those three systems. Table 3. Comparison of the distances between CG centres between AA-MD and CG-MD models for Iα, Iβ, and type II fibrils. Mean value and standard deviation are shown for each entry. The first column indicates whether the distance is found inside the D-glucose (none) or between the two subsequent (1st and 2nd neighbours) D-glucose molecules.

D-glucose Neighbours
Pairs Fibril Our CG model preserves the internal structure of the D-glucose molecules and shows thermal stability in a range of temperature from 300 K-400 K (see Supplementary Material). These results indicate the possibility to study new applications in Martini 3 such as the structural relaxation of fibrils under different temperatures.
We also carried out a RMSD cross-correlation analysis. We computed an RMSD for each pair of frames from AA and CG trajectories and, subsequently, the RMSD average and respective standard deviations (SD) for each pair of trajectories. This combinatorial problem required the computation of 663,522 values of RMSD from a pair of structures. Note that this procedure captures the relative structural changes from two structures at different time points. Table 4 shows the RMSD cross-correlation for AA and CG trajectories. This result validates our Martini 3 model of several cellulose fibrils, as it shows the correct structural description for each crystalline cellulose fibril due to a small deviation in RMSD in the range of 5 Å. In fact it shows a well-equilibrated CG structure respect to its analogous AA system. Furthermore, the larger deviation with respect to other crystals (RMSD > 15 Å) shows that a given CG fibril model retains a structural distance from other cellulose crystals, as expected. Thus no transition or relaxation from one fibril to to another is observed in our simulations.  Figure 5 shows the analysis of one feature in cellulose fibrils: the twist angle. In our case, we define it by the dihedral angle formed by O2 and C6 centers from two subsequent D-glucose molecules. As a result, we observe qualitative agreement between AA-MD and CG-MD representations for cellulose I allomorphs (Iα and Iβ). However, our CG model is indeed more accurate for the case of cellulose type-II. A detailed analysis of the distribution of twist angles shows consistency between AA and CG for type-II cellulose with a mean value of the twist angle about −0.42 degrees. In the case of cellulose I allomorphs, the dihedral angle in CG simulations reported about half of the value in AA-MD (see Table 5).

Conclusions
In this study, we employed a mathematical construction called geometrical simplicial complex for the determination of an optimal CG mapping scheme. In this context, this approach offers the advantage that a CG model does not require angular (three or four centers) potentials to be represented. However, a clear limitation such as the case of heterogeneity in complex polysaccharides may require a generalization of our approach.
We have tested our approach in two cellulose I allomorphs (i.e., Iα and Iβ), which are almost indistinguishable at large length scales. Although the local structure was mostly dominated by HBs, we managed to account for a versatile Martini 3 CG model capable of reproducing a single cellulose chain structure using only three uncharged Martini bead types. Furthermore, we also modelled the structure of cellulose type-II, which is substantially different from cellulose I allomorphs. Furthermore, our model captured the atomistic twist angle inherited by the fibril structure in solution for all three structures. Future experiments can address the relevance of the twist as a function of the temperature.
Our model opens the possibility to study large conformational changes of polysaccharides, as it is fully compatible with our earliest implementation of the Gō-Martini approach for the modelling of large conformational changes of proteins [36]. In this regard, a future improvement of this model for larger conformational changes inside a single CG cellulose chain can be achieved by switching some of the interactions between nearest neighbours by a consistent set of Gō-like potentials following standard rules for contact map creation [7,37]. Similar ideas, such as the dual-basin Gō-like model to capture transitions between different fibrils, can be suitable to speed up transitions [38,39].
Supplementary Materials: The following are available online, Figure S1: Histogram of intra and inter residues angles from CG dynamics, Figure S2: Annealing results for Iα, Iβ, type-II cellulose fibrils.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. Source code is available in https://github.com/rams-research/cgcellulosefibril, accessed on 27 January 2022.