Neural Upscaling from Residue-Level Protein Structure Networks to Atomistic Structures

Coarse-graining is a powerful tool for extending the reach of dynamic models of proteins and other biological macromolecules. Topological coarse-graining, in which biomolecules or sets thereof are represented via graph structures, is a particularly useful way of obtaining highly compressed representations of molecular structures, and simulations operating via such representations can achieve substantial computational savings. A drawback of coarse-graining, however, is the loss of atomistic detail—an effect that is especially acute for topological representations such as protein structure networks (PSNs). Here, we introduce an approach based on a combination of machine learning and physically-guided refinement for inferring atomic coordinates from PSNs. This “neural upscaling” procedure exploits the constraints implied by PSNs on possible configurations, as well as differences in the likelihood of observing different configurations with the same PSN. Using a 1 μs atomistic molecular dynamics trajectory of Aβ1–40, we show that neural upscaling is able to effectively recapitulate detailed structural information for intrinsically disordered proteins, being particularly successful in recovering features such as transient secondary structure. These results suggest that scalable network-based models for protein structure and dynamics may be used in settings where atomistic detail is desired, with upscaling employed to impute atomic coordinates from PSNs.


Background
Proteins and other biological macromolecules exhibit a wide variety of complex dynamics and interactions at varying size and time scales. While atomistic molecular dynamics (MD) models currently serve as the gold standard tools for simulating dynamics at high resolution (with some inroads by quantum mechanical methods in small-scale or specialized applications), the cost of large-scale MD simulations limits their use to relatively small systems on time scales of microseconds or less. Coarse-grained (CG) models offer a means of accessing larger system sizes and longer time scales, sacrificing atomistic detail in exchange for reduced computational cost. Many "flavors" of coarse-grained simulation exist, with the most common being aggregate particle models that represent collections of atoms by single particles whose positions evolve under a suitably modified forcefield. The highly successful MARTINI model [1], for instance, represents biomolecules by "beads" corresponding roughly to one bead per four heavy atoms, with hydrogens left implicit; MARTINI and other CG MD models have proven useful in studying the structure and dynamics of large complexes, lipid phases, and other systems that are too large to be treated with atomistic MD methods [2]. An even more radical approach to coarse-graining employs topological representations, representing molecules or molecular aggregates by network structures that encode the interactions between atoms or groups thereof, but not their positions in three-dimensional space [3,4]. Often employed for descriptive analysis of trajectories produced by MD or other methods (see, e.g., [5][6][7]), network representations have the advantage of retaining complex topological information involving protein structure while being highly compressive (greatly reducing the computational cost needed for, e.g., comparative analysis of long trajectories), and facilitating application of a large body of graph-theoretic measures for capturing structural properties ranging from cohesion and constraint to differences in centrality and contact rates. Recent work has also considered the generation of trajectories directly within the topological representation ("coordinate-free" simulation), allowing considerable computational savings [8,9].
While many questions can be posed directly within a CG representation, an obvious limitation of coarse-graining is that some observables of interest cannot be obtained without an additional step of "backmapping" or "upscaling" the CG trajectory to atomistic resolution. At first blush, this may seem impossible: by definition, a CG model does not resolve individual atoms. In practice, however, CG structures are often sufficiently constraining that a well-designed algorithm can infer atomic positions from them with considerable accuracy [10]. For instance, a number of upscaling methods for particle-based CG models work via a two-stage process in which initial guesses for atomic placement are made based on, e.g., random positioning [11], fragment-based [12,13], or geometrybased [14][15][16][17] initialization, followed by an energy minimization step to ensure physically realistic coordinates. This is not unrelated to protein structure prediction methods like those of [18,19], which begin with approximate structures based on local homology and subsequently refine them via minimization in a simplified force field. Such techniques have proven extremely successful in predicting the structure of globular proteins [20,21], and are widely used in enzyme discovery and engineering applications [22,23].
In the context of topological coarse-graining, the historical focus has been on mapping from atomistic to coarse-grained networks for purposes of analysis (e.g., [3,[24][25][26][27]), with correspondingly less emphasis on the upscaling problem. Recent work, however, has suggested the potential of graph-theoretic models for molecular structure and dynamics. For instance, Grazioli et al. [9], Yu et al. [28] use Hamiltonians defined on graphs representing the structures of protein aggregates to model the equilibrium structures and kinetics of amyloid fibrils and associated aggregation states (with vertices representing individual proteins, and edges indicating bound interactions). On a smaller scale, Grazioli et al. [8] used a closely related approach to model transient structure in intrinsically disordered proteins (IDPs), using residue-level protein structure networks (PSNs) in which each vertex represents a residue and edges represent inter-residue contacts. Although we are unaware of any existing methods for upscaling such graph structures to atomic resolution, effective methods for this purpose would greatly extend the practical reach of network-based simulation models.
Our focus in this paper is this last problem: the upscaling of topological representations of macromolecular structure (and by extension, dynamics) to atomic resolution. We specifically consider the upscaling of residue-level PSNs, as this is a widely used level of network coarse-graining for proteins and poses a non-trivial challenge for atomistic refinement. To perform the mapping from network structure to atomistic structure, we exploit advances in machine learning (ML) methods, predicting atomic coordinates from topological inputs using deep neural networks. Machine learning strategies (particularly including neural networks) have become widely used in CG modeling, with past efforts focused on ML-based methods for learning or refining CG forcefields (see e.g., [24,25,[29][30][31][32][33]). Here, we use multilayer perceptron-based (MLP) neural networks to learn pairwise interatomic distances from residue-level PSNs, allowing us to recover atomistic detail from input network structures.
In this work, we demonstrate the utility of MLP neural network models to translate coarse-grained protein structure network representations to their more finely detailed 3D coordinate structures. We apply this to the case of IDPs, showing that the trained neural network is able to reproduce equilibrium conformations of amyloid-β protein obtained from MD simulations at atomic-level detail, also capturing its diverse transient secondary structure behavior. We additionally consider the use of further refinements (such as chirality corrections and energy minimization) to improve predictive performance. We show that this scheme can obtain a high level of accuracy, with median RMSE for predicted versus true 3D structures of approximately 2.13 Å and a high degree of correspondence for relatively folded regions of the protein. The resulting scheme provides a practical mechanism for mapping PSNs produced by generative network models to predicted atomistic structures (Figure 1), for using PSNs as an efficient tool for lossy compression of long trajectories, or other applications in which it is useful to infer atomistic information from coarse-grained topological representations.  The remainder of the paper is organized as follows. We introduce our approach in Section 2, including both our ML pipeline and subsequent refinement methods. Section 3 reports the results of our simulation study, Section 4 discusses further directions, and Section 5 concludes the paper.

Methods
Data Generation Our data come from atomistic MD trajectories of Aβ 1-40 , a widely studied IDP implicated in the etiology of Alzheimer's disease; the atomistic trajectories and associated PSN coarsenings, respectively, serve as ground truth and inputs for the upscaling model ( Figure 2). Beginning with the lowest energy monomer of the PDB structure, 2LFM [34], one Aβ 1-40 monomer was simulated in explicit solvent for 1 µs using NAMD [35] via the following protocol: the initial monomer structure was solvated in a cubic TIP3P [36] water box of minimum margin 25 Å, and neutralized with NaCl counter-ions. This assembly was minimized for 10,000 iterations, followed by velocity initialization and 250 simulation iterations before final adjustment of the water box. A trajectory of approximately 1.1 µs was then simulated. Simulation was performed under periodic boundary conditions in NAMD with the CHARMM36m forcefield [37], using an NPT ensemble at 300K and 1 atm pressure. Temperature control was maintained by Langevin dynamics with a period of 1/ps, with Nosé-Hoover Langevin piston pressure control [38,39]. Creation of initial conditions and related data processing were performed using VMD [40].
The simulation contains 11,926 total frames/conformations, of which 72% was allocated for training, 20% for testing, and 8% for validation. When working with highly flexible, nonparametric learning methods (including approaches such as the deep learning techniques used here), it is common to employ multiple data splits for cross-validation purposes. Here, the training data is used for parameter estimation, the validation group is used for hyperparameter tuning and other optimizations to the training process, and the test data is used to provide a held-out evaluation of the final model. Five-fold cross validation was also performed to ensure that bias was not introduced during the initial train-test division. k-fold cross validation (a standard technique in which the data is split at random into k segments, each of which is then used to produce one test-train split) guards against the risk of obtaining anomalous performance estimates due to selection of an unusual data division, and can provide additional insights into performance sensitivity by comparing results across divisions of the data. For each frame in the Aβ 1-40 simulation, a protein structure network (PSN) was calculated using a combination of VMD [40] and the statnet [41,42] and bio3d [43] libraries for R [44]). Monomer states were sampled from the trajectory every 100 ps, from which residuelevel protein structure networks were constructed. Vertices correspond to individual residues, with two vertices being considered adjacent if they contain respective atoms whose distance is less than or equal to 1.1 times the sum of their van der Waals radii (based on radius data from [45]). The input data used to train the neural network model consists of the flattened upper triangular matrix data extracted from the residue-level contact adjacency matrix for each conformation in the Aβ 1-40 trajectory. A contact adjacency matrix, x, is a binary square representation of the edges existing between two residues, with x ij = 0 where there is no edge between the vertices associated with respective residues i and j, and x ij = 1 where an edge is present. As contact is an undirected relation, x is symmetric, and only one triangle of the matrix is required for learning. Here, the upper triangle is flattened into a one-dimensional array for processing. The output data used to train the model is the corresponding set of flattened upper triangles of the pairwise interatomic distance matrices (PIDs) calculated on all non-hydrogen atoms (across all frames in the MD simulation) ( Figure 2). For purposes of evaluation (as discussed below) it should be noted that when comparing predicted to observed PIDs, we define errors in terms of the pairwise distances themselves, not, e.g., the distances between equivalent atoms in the observed versus predicted structures post-alignment. For instance, let y ij be the observed PID for heavy atoms i and j (i, j ∈ 1, . . . , N), with predicted valueŷ ij . Then, the squared error in PIDs is given by

MD Simulation
with RMSDs and other quantities defined accordingly.
Neural network architecture and hyperparameters After generation of input and output data, a multi-layer perceptron (MLP) neural network was utilized for training as indicated in the pipeline (Figure 3). This neural network contains four hidden layers (structured as follows), and was implemented using the machine-learning libraries Keras [46] and tensorflow [47]. The first three hidden layers consist of 2000 neurons, the fourth layer contains 8000 neurons, and the last output layer predicts the flattened upper triangle of the pairwise interatomic distance matrix for a given frame from the MD simulation (46,665 neurons) ( Figure 6). Hyperparameters were optimized using the Talos Keras tuning module [48]. A Nvidia P6000 Quadro GPU card was used to train the model with the following hyperparameters: nonlinearity = relu, dropout rate = 0.2, optimization = AMSGrad, loss = mean squared error, batch size = 50, epochs = 100. Predicted output data were initially assessed using three metrics: root-mean squared deviation/error (RMSD/RMSE), mean squared error (MSE), and mean absolute percentage error (MAPE).  Post-prediction processing The predicted output data (the flattened upper triangles of the pairwise interatomic distance matrices) were first transformed into symmetric pairwise interatomic distance matrices. These were then transformed into 3D coordinate data using the multi-dimensional scaling (MDS) function from the scikit-learn python module and MDtraj [49] to generate PDB structures (Figure 3). Chimera [50] was then used to add hydrogens to predicted PDB structures, which were then further processed to remove inaccurate chiral predictions. If more than half of Cα centers were inaccurately predicted as R chiral centers (D-amino acids instead of L-amino acids), this indicated that the MDS procedure (which is reflection-invariant) predicted a reflection of the true coordinates. This was mitigated by reflecting all coordinates over the y-axis for predictions exhibiting an R S ratio greater than 1. If fewer than half of α-carbons exhibited R chiral centers, reflecting coordinates was unnecessary. Instead, Chimera was used to switch side chain coordinates and the α-hydrogen for all inaccurately predicted Cα chiral centers. After checking for correct chirality for each residue, all conformations were further minimized for 75 conjugate gradient steps.
The number of conjugate gradient steps was chosen by evaluating structures every subsequent 20 conjugate gradient steps for a cumulative 520 steps total. The maximum 520 conjugate gradient steps was chosen based on qualitative determination of average potential energy trends of all predicted conformations with increasing conjugate gradient minimization ( Figure 4). Three superposition-based metrics (RMSD, global distance test, total score (GDT_TS), template modeling (TM) score) and one superposition-free metric (local distance difference test (LDDT)) were used to analyze any potential improvements in additional conjugate gradient steps between predicted 3D structure and the original, MD-generated 3D conformation. The RMSD metric analyzes all heavy atoms, TM score focuses primarily on Cα atoms, and GDT_TS also focuses primarily on backbone atoms. The LDDT score calculates a comparison using all-atom pairwise interatomic distances. Average values of 500 randomly chosen structures (RMSD, TM Scores, GDT_TS, and LDDT) suggest a minimization range between 50-100 conjugate gradient steps. Thus, 75 steps was chosen as the total number of conjugate gradient steps to minimize all 11,926 predicted conformations. Overall, minimization yields little improvement relative to no minimization with respect to most metrics; however, it is a necessary step to remove steric clashes and slight stereochemical errors (Figure 3, last panel). . Distribution of the optimal number of CG steps for structure refinement, by metric, with green vertical lines representing means, and notches representing medians. Although exact optima vary by structure and metric, 50-100 steps are sufficient to provide good performance on most structures; extensive refinement beyond this point is rarely beneficial.

Multilayer Perceptron (MLP) Neural Network Reconstructs Aβ Conformations with Atomistic Detail
Pairwise interatomic distance (PID) predictions were made for all sets of data (train, validation, test). Predictions were evaluated against the ground-truth PIDs from the MD simulation using root-mean square error/deviation (RMSE/RMSD), mean absolute error (MAE), mean absolute percentage error (MAPE). (As described above, we evaluate PID error in terms of the pairwise distances themselves, and not, e.g., imputed coordinates.) The average metrics for the test set exhibit a favorable RMSE (1.7 Å), MAE (1.17 Å), and MAPE (7.35%) (Figures 5 and 6). Five-fold cross-validation suggests bias was not arbitrarily introduced during the initial train-test split (Figure 7). Overall, average PID metrics for the validation and test set suggest the neural network was able to devise quality predictions.    To illustrate model performance, we show examples of both good and bad predictions from the test set, beginning with the positive example of frame 1133. Original and predicted pairwise interatomic distances for frame 1133 upon initial visualization, have highly comparable values (Figure 8a,b). A grayscale depiction of absolute value differences between original and predicted PIDs reveals white and light grey data points, denoting mostly low values (Figure 8d). The distribution of these data shows that approximately 98% of difference values are less than 2 Å and 88% are less than 1 Å (Figure 8c). This result from the test set highlights one of the most accurate predictions of atomistic structures made by the neural network model. Using RMSEs of PIDs as a basis for selection, we show processed 3D predictions of the lowest RMSE score representation (frame 1133, Figure 9a), the median representation (frame 7431, Figure 9b), and the highest RMSE score structure (frame 7560, Figure 9b). The prediction with the lowest RMSE (0.67 Å) exhibits more helical secondary structure compared to the median and worst predictions, which exhibit more random coil-like dynamics. RMSE of all heavy atoms for the median representation exhibits a fairly reasonable value of 1.46 Å whereas the worst PID prediction has a RMSE of 10.4 Å. Notably, the prediction for Figure 9c aligns reasonably well for the first 20 residues and the remaining residues are more poorly predicted by the neural network model. Since folded regions are inherently more data-rich for binary contact adjacency matrix representations (e.g., a 5 Å PID and a 500 Å PID both produce the same zero matrix element), it is not surprising the neural network model struggles to predict this specific overly extended conformation; however, we note that the prediction still preserves the qualitative aspects of the extended structure, and is quite accurate for the N-terminal region. The RMSEs according to 3D structure alignment between original and processed 3D structure (and not on the basis of PIDs) also show similar values: best (0.77 Å), median (2.13 Å), and worst (12.01 Å). These values are slightly higher compared to PID-based RMSEs, most likely due to introduced 3D alignment, whereas PIDs report RMSEs between all heavy atoms.

Generation of 3D Structures and Subsequent Minimization
When multidimensional scaling maps PIDs into 3D coordinates, it does so without regard to chirality. There are thus instances in which entire conformations are D-instead of L-amino acids, a correction that can be easily identified and fixed by reflecting coordinates across the y-axis. We also corrected conformations that contained only a few instances of D-amino acids, a result of the neural network predicting slightly incorrect side chain PIDs. These chirality checks followed by minimization are necessary, computationally inexpensive processing steps required to transform PIDs into sterically reasonable 3D structures. Once corrections where fixed using Chimera, we then minimized all proteins for 75 conjugate gradient steps (a determination detailed in Methods), with a few conformations (23) requiring an additional 5 steps. Figure 10 depicts a pre-and post-minimization of the best predicted conformation (frame 1133) in the test set. Here, we focus particularly on residues histidine 13 (His13) and phenylalanine 4 (Phe4). Both residues in the pre-minimized conformation are sterically incorrect and misplaced, whereas in the post-minimized conformation, both residues have expected canonical sterics, devoid of incorrectly positioned atoms. When these optimization techniques (stereochemical corrections and minimization) are combined with the predictive power of the MLP neural network, this method yields highly effective predictive capabilities.
After minimization, it was also imperative to compare 3D minimized predictions to their original MD simulation counterparts. Three superposition-based metrics (RMSD, TM score, GDT_TS) and one superposition-free metric (local distance difference test (LDDT)) were utilized for this evaluation. The template modeling (TM) score measures the backbone similarity between a reference protein and target protein with a range from 0 (dissimilar) to 1 (identical) [51]. RMSD is a canonical protein comparison metric, and here we parameterize it to compare all heavy atoms between native and predicted structures. LDDT utilizes pairwise interatomic distances in its methodology, focusing on local intramolecular interactions and the degree (range 0-1) of their retention in the target conformation in comparison to the native reference structure [52]. The global distance test, total score (GDT_TS) is an improvement compared to RMSD designed to assess structures with the same sequence but different tertiary structure, with a higher score denoting better agreement (range 0-1) [53]. All four metrics are commonly used during the biennial Critical Assessment of Structure Prediction (CASP) structure prediction and assessment competition [54], and here we use these metrics to assess the predictive performance of the model.

Minimization
No minimization Figure 10. Comparison of pre-and post-minimized structures of the best prediction in the test set, frame 1133. Figure 11 illustrates these metrics for the combined validation-test set. There exists a positive correlation between LDDT vs. TM scores and GDT_TS (Figure 11a,b). Between RMSDs vs. TM scores and GDT_TS, predictions exhibit a negative correlation (Figure 11c,d). Included are also the aforementioned best (yellow diamond), median (purple diamond), and worst (red diamond) PID predictions from Figure 9. Since their designation as best, median and worst were on the basis of RMSEs of PIDs and not 3D structure, it is interesting to observe the surprisingly high LDDT value of frame 7560 (the worst prediction). This suggests the neural network was able to preserve more local residue interactions despite struggling with larger more regional intramolecular interactions. TM scores exhibit values in the lower range of <0.5, whereas most GDT_TS and LDDT values occupy a range >0.5, suggesting TM scores may not be as reliable of an assessment metric for Aβ 1−40 . The average and 95% confidence intervals suggest predicted 3D models are predicted relatively well considering the high GDT_TS average and narrow 95% confidence interval ( Figure 12). The best and median test cases occupy expected 3D metrics ( Figure 11). In combination with PID metrics (Figure 5), the 3D metrics demonstrate the model's ability to reasonably reconstruct the complex protein conformation of Aβ 1-40 from coarse contact adjacency matrices.

Discussion
In this work, we have implemented a custom MLP neural network modeling approach to reconstruct atom-level representations of Aβ 1-40 from residue-level PSNs. Although this particular neural upscaling model is specific to amyloid-β, the MLP neural network model can be retrained to other biomolecular systems using inputs derived from a variety of different sources (e.g., MD simulations, NMR ensembles, etc.). Given training data in the form of PIDs, the approach used here can be used to obtain comparable predictive models for PSNs associated with any protein system (and, assuming appropriate modification of the coarsening level, alternative PSN definitions such as that of [3]). Although further work is needed to investigate the range of conditions under which these models will work well, the success of the Aβ 1-40 model (a relatively non-trivial case, due to the presence of highly variable and often unfolded conformations) bodes well for performance in other systems. More broadly, an obvious extension of this approach is the creation of more general models for broader classes of protein systems, and for multiple levels of PSN coarse-graining. The success of deep learning in producing predictors with good generalization performance over large ranges of inputs (e.g., images with widely varying content) suggests that such general-purpose PSN upscaling tools are an achievable goal.
Although previous reverse mapping methods (e.g., random placement, geometricbased, etc.) are able to reconstruct atomistic models, they do so typically from coarsegrained force field models based on particle representations (e.g., MARTINI [1]), or from partially observed atomic coordinates (e.g., [55]). The advantage of a MLP neural network is the ability to learn and fine-tune parameters specific to the system under investigation from minimal information (here, PSN adjacency matrices) and without the requirement that the available information be geometric in character. This opens the door to the use of "coordinate-free", network based simulation methods [9] to explore the behavior of complex biomolecules while still retaining the ability to map results back to a conventional, spatial representation. Such network simulations can be produced using a network Hamiltonian based on connectivity patterns rather than atomistically detailed spatial interactions, allowing the prediction and simulation of large ensembles and/or long timescale trajectories that might otherwise be computationally expensive to model.
In the literature, another class of neural networks, specifically variational autoencoders (VAE), has been used primarily on single small molecules and bulk-phase simulations as test cases for reverse mapping [56]. This VAE methodology, although not tested on proteins, could possibly be adapted for such systems; however, we are able to demonstrate successful backmapping with a non-variational MLP neural network architecture, indicating that variational structure is not essential. To better generalize our neural upscaling technique to protein systems of different sizes, convolutional neural network architectures similar to AlphaFold [57] could be also be incorporated and trained to predict regions (e.g., N × N residue regions). With an ever-growing body of architectures whence to choose, there would seem to be considerable room for experimentation with alternative approaches.
As noted at the outset, there is considerable work on the problem of imputing atomistic structures from either coarse-grained or incomplete spatial information. Although our focus here is on the extreme setting where such information is unavailable, further enhancements may be possible in settings where both types of information can be employed. While this is not possible for, e.g., predicted PSNs arising from network models, it may be of use in cases where a combination of partial spatial information and incomplete contact maps are available, as from, e.g., incomplete NMR data or crosslinking mass spectrometry experiments. Models for such cases are an interesting direction for further work.
Finally, we note that non-neural network methods can also be applied to the upscaling problem. In preliminary experiments (not shown), we found that a kernelized ordinary least squares predictor [58] was able to obtain relatively good results (mean RMSD of approximately 2.4 Å, mean median ARE approximately 8% on interatomic distances (PIDs) under 10-fold cross-validation). Though the model was outperformed by the neural network architecture described here, and we did not therefore pursue it further, there may be situations in which non-neural network classes of predictors will prove useful. This would also seem to be a promising area for further investigation.

Conclusions
Direct predictions of PID metrics demonstrate the predictive capabilities of the MLP neural network to reconstruct all-atom representations of proteins from binary contact adjacency matrices. Example conformations of the best, median and worst PID-based predictions in the test set illustrate the MLP performance. In the worst prediction (frame 7560), the RMSD between the N-terminal halves of the original vs. predicted is still quite favorable (0.98 Å). Chirality corrections and conjugate gradient minimization were vital post-prediction processing steps in generating stereochemically reasonable 3D structures. Three-dimensional accuracy metrics, in particular GDT_TS-the main assessment metric in the CASP competition-suggest the neural network performed well given the average values and 95% confidence intervals. In totality, we are able to illustrate the viability of the MLP neural network architecture in this transformation experiment. This work exemplifies neural network-based techniques capable of extracting useful, meaningful data from coarse-grained models.