Energy Landscapes and Heat Capacity Signatures for Monomers and Dimers of Amyloid-Forming Hexapeptides

Amyloid formation is a hallmark of various neurodegenerative disorders. In this contribution, energy landscapes are explored for various hexapeptides that are known to form amyloids. Heat capacity (CV) analysis at low temperature for these hexapeptides reveals that the low energy structures contributing to the first heat capacity feature above a threshold temperature exhibit a variety of backbone conformations for amyloid-forming monomers. The corresponding control sequences do not exhibit such structural polymorphism, as diagnosed via end-to-end distance and a dihedral angle defined for the monomer. A similar heat capacity analysis for dimer conformations obtained using basin-hopping global optimisation shows clear features in end-to-end distance versus dihedral correlation plots, where amyloid-forming sequences exhibit a preference for larger end-to-end distances and larger positive dihedrals. These results hold true for sequences taken from tau, amylin, insulin A chain, a de novo designed peptide, and various control sequences. While there is a little overall correlation between the aggregation propensity and the temperature at which the low-temperature CV feature occurs, further analysis suggests that the amyloid-forming sequences exhibit the key CV feature at a lower temperature compared to control sequences derived from the same protein.


Introduction
Amyloid fibrils are associated with serious disorders, such as Alzheimer's, Parkinson's, type II diabetes, and dialysis-related amyloidosis. These amyloids are formed from specific proteins, and the oligomers are toxic [1]. Amyloid fibrils have a cross-beta structure that is formed by H-bonding between the NH and CO groups of the main chain of partially folded or misfolded proteins [2,3]. Since it is generally possible for proteins to establish such interactions between main chain atoms, amyloid formation has been suggested to be a generic property [4]. However, amino acid sequences also play an important role, and amyloid formation propensity may be a sequence-specific property [5,6]. Both amyloidforming proteins and short segments of these proteins can form steric zippers [7][8][9][10]. All the peptides that form fibrils also form stable dimers [11]. The aim of the present work is to investigate whether the propensity for amyloid formation is encoded in the energy landscape of monomers and dimers, and to provide a thermodynamic diagnostic involving the heat capacity (C V ), which would complement models of amyloid formation based on the interactions within side chains.
The side chains of amino acid residues play an important role in amyloid formation [12][13][14], as they participate in aromatic-aromatic [15], electrostatic [16], and van der Waals interactions [17,18]. The hydrophobicity, secondary structure propensity, overall charge [19], exposed surface, dipole moment, and cooperativity in peptides correlate with amyloid formation ability [20]. The deposition of peptides on amyloid templates is stereospecific and it involves interactions between peptide backbone and/or side chains [21]. Several predictor algorithms have been developed to identify amyloidogenic proteins based on the insights obtained from amyloid structure and properties of amino acid residues [22][23][24][25]. Different conformations of amyloid proteins can also occur at different temperatures and in different regions of the brain [26]. These alternative conformations result in amyloid polymorphism [27,28].
However, the thermodynamic driving force for amyloid formation is still not well understood. Amyloid formation is studied mainly from the kinetic perspective [29,30]. Several attempts have been made to study the heat capacity of proteins [31][32][33] using isothermal titration calorimetry and differential scanning calorimetry, and investigate the thermodynamics of amyloid formation [34][35][36]. Our recent heat capacity calculations have provided some insight into how the presence of tyrosine and arginine increases the phase separation propensity of proteins [37]. This approach is also useful to rationalise the context-dependent properties of amino acid residues in different sequences. Here, we explore the energy landscapes and calculate the heat capacity for monomers (Tables 1 and 2) and dimers ( Table 1) of hexapeptides that are experimentally known to form amyloids, along with mutations that result in loss in amyloid formation ability (Figures 1 and 2). We find that the low-energy structures contributing to the first feature (peak or inflection point) above a threshold temperature in C V exhibit a variety of structures for the amyloid-forming monomers ( Figure 3). Selected atoms in the hexapeptides are used to define end-to-end distance and dihedral value parameters. These parameters are useful to classify the variety of conformations that occur for the amyloid monomers ( Figure 4a). The low-temperature feature in C V for amyloid monomers usually corresponds to a transition between structures with different backbone conformations with different main chain or side chain interactions. A similar heat capacity analysis for dimers shows another pattern in the end-to-end distance versus dihedral correlation plots. The low-energy structures that contribute to the lowtemperature C V feature exhibit larger end-to-end distances and larger positive dihedrals for the amyloid-forming sequences compared to the controls (Figure 4b).  We also investigated the correlation between the aggregation propensity (calculated using Aggrescan [47] software (http://bioinf.uab.es/aggrescan/)) and the temperature at which the first feature of interest occurs in the C V plot ( Figure 5). While there is little correlation between these two quantities, further analysis suggests that a higher aggregation propensity correlates with a lower temperature for the C V feature in sequences from the same protein. Extrinsic factors, such as pH, buffer conditions, and protein concentration [48], are known to affect amyloid formation. The present study aims to complement these results by probing the thermodynamics of sequence-specific properties for amyloid-forming peptides. Water also contributes to interactions, but solvent dynamics are difficult to visualise and quantify [49], and here, water is modelled using an implicit solvent.

Results and Discussion
Various hexapeptides from naturally occurring proteins have been found to be important contributors of protein aggregation, and these segments can themselves also form amyloids. For example, NFGAIL [42] and VQIVYK [44], in human islet amyloid polypeptide (hIAPP or amylin) and tau protein, respectively, form amyloids, and the aggregates of these proteins are found in type II diabetes and Alzheimer's, respectively. Here, we explore the potential energy landscapes of amyloid-forming hexapeptides and control sequences to investigate if there is any incipient signature for this behaviour in the heat capacity, which tells us about the competition between alternative low-lying conformations that differ significantly in their enthalpy and entropy. The hexapeptides NFGAIL, VQIVYK, and STVIIE were chosen because several control studies for these peptides exist in the literature (Tables 1 and 2). The amyloid-forming peptides that contain leucine as the first residue, such as LYQLEN and LLYYTE, were chosen and compared with similar control sequences YQLENY and YYTEFT to understand the importance of leucine as the initial residue. While amyloid fibrils are hallmarks of various diseases, a previous study showed that the amyloid fibrils of hexapeptides can attenuate neuroinflammation [38]. The hexapeptides that form fibrils at a neutral pH were taken from taken from that report. These species include the cationic peptides, VQIVYK, GYVIIK, and KLVFFA; the nonionisable polar peptides SNQNNF, SSQVTQ, SSTNVG, and SVSSSY; and the nonionisable hydrophobic peptides GAIIGL, MVGGVV, GGVVIA, and GAILSS. For further validation of our results, we chose a few more control hexapeptides from β 2 -microglobulin, including EVDLLK, LSFSKD, and NGERIE, while the lysine and phenylalanine containing control peptides KAFIIQ and KAILFL were chosen from Waltz-DB [46], for comparison with the amyloid-forming KLVFFA peptide.
The heat capacity analysis was performed on the converged landscapes ( Figure 1). Both the control and amyloid-forming hexapeptides were found to exhibit features at low temperatures. The temperatures for all the features are listed in Tables S2-S4 in the Supplementary Materials. Interestingly, the structures with varying backbone conformations were found to contribute significantly to the low-temperature feature in amyloidogenic peptides. Note that the feature (peak/inflection point) of interest was taken as the first feature that occurred above k B T = 0.086 kcal mol −1 in C V plot. This threshold was employed because the presence of specific residues (isoleucine, valine, leucine and tyrosine) leads to low-lying structures separated by relatively small barriers. These barriers occur due to the presence of different conformations for the side chains and different rotamers for tyrosine. The N-terminal and C-terminal caps and the residues at the ends of the peptides are relatively free to orient in different directions and establish H-bonding within the peptide in different ways. Such structures are also separated by relatively small barriers. Although these structures do not differ in the main chain conformations, they may give rise to features (peaks or inflection points) below k B T = 0.086 kcal mol −1 in C V . Hence, the lowtemperature peak of interest that represents a transition between structures with different main chain conformations is the first peak that occurs above the threshold temperature.  The energy landscapes for monomers, visualised using disconnectivity graphs [50,51], can be multifunneled for both amyloid-forming and control hexapeptides ( Figure 2). However, the structures lying at the bottom of funnels separated by significant barriers differ significantly for the amyloids and controls ( Figures S2-S31 in Supplementary Materials). The low-energy minima separated by large barriers interconvert via the breaking of Hbonds between main chain and side chain atoms, opening of the peptide backbone, and refolding, which results in a different backbone conformation. Local minima contributing to C V features (peaks/inflection points from low to high temperature are presented in the Supplementary Materials, Figures S2-S31) are represented by red to blue (feature 1), green to orange (feature 2), pink to purple (feature 3), and grey to yellow (feature 4). The scalebar represents 1 kcal mol −1 . Various conformations observed for hexapeptides appear similar to beta-hairpin (Ushaped), S-shaped (partially helical) [52], question-mark-shaped, W-shaped (almost helical), and extended Z-shaped structures, which are listed in the order of increasing end-to-end distance ( Figure 3). The end-to-end distance is the distance between the N atom of the first residue and the C atom of the sixth residue in a hexapeptide. The dihedral angle is defined as the angle between the C α atoms of the first, third, fourth, and sixth residues in the hexapeptide. Small distance and small dihedral angle parameters correspond to hairpin-like structures, and large distance and large dihedral angle correspond to helical structures. Even larger distances signify extended monomer structures. These distance and dihedral angle parameters were calculated for the structures contributing to the low-temperature C V feature, and we examined the correlations with the amyloid formation propensity. The quantitative correlation between the exact temperature at which the C V feature occurs and the aggregation propensity of the peptide was also investigated using a different threshold temperature (k B T = 0.076 kcal mol −1 ).

Monomer Heat Capacity
Interesting patterns were found on analysing the funnels that contained structures contributing to the first C V feature (peak or inflection point) above k B T = 0.086 kcal mol −1 (Figure 1a). In general, for amyloid-forming hexapeptide sequences, such as NFGAIL, VQIVYK, STVIIE, LYQLEN, and KLVFFA, several significantly different main chain conformations occur in different funnels (Figure 3). Both the helical and hairpin structures consistently appear together in the landscapes for amyloid hexapeptides. In contrast, for the controls (NAGAIL, VQIVEK, NAEVYK, YYTEFT, EVDLLK, LSFSKD, NGERIE, STVIIP, STVVIE, YQLENY, KAFIIQ, and KAILFL), only one type of (or a set of states with a similar secondary structure) main chain conformation occurs at the funnel bottom. This result is also evident from the greater spread of points for amyloid-forming sequences compared to the respective control sequences in the end-to-end distance versus dihedral correlation plot (Figure 4a). Different side chain interactions may be present in the helical and hairpin structures of amyloids. Helical structures generally have a H-bond between the side chains of two residues, while beta-hairpins have a H-bond between the side chains of another pair of residues, and one participating residue is common in establishing two different kinds of H-bond patterns in the two different main chain conformations. This situation occurs for VQIVYK, STVIIE, and LYQLEN amyloid hexapeptides. The control sequences lack such interactions between side chains. It is the residues containing hydroxyl or amide groups in their side chains that usually participate in these key interactions.
However, there are some amyloid-forming sequences that do not show a variety of low-energy backbone conformations contributing to the first C V peak of interest. This situation occurs for GAILSS, SNQNNF (almost helical), MVGGVV (hairpin), SVSSSY, and GYVIIK (partial helical). GAIIGL and GGVVIA exhibit partial helical structures. LLYYTE does not form a proper hairpin that can be classified in terms of a small end-to-end distance. These observations are also evident from the narrower range of points for these peptides in the distance versus dihedral correlation plot (Figure 4a). Similarly, the control sequences NAEVYK and SPVIIE, which at first do not appear to follow the trend shown by other control sequences, with a wider spread of points in the correlation plot, are found to follow the trend when their disconnectivity graphs are analysed. Both sequences lack contributions from helical structures. However, the occurrence of extended structures leads to points with larger distance in the correlation plot. For SPVIIE, the peak of interest is the same as the melting peak, and hence extended conformations are bound to occur at this temperature. In other words, SPVIIE does not show a low-temperature peak; it shows just a melting peak.
Various intramolecular interactions occur between amino acid side chains and the backbone in the low-energy structures that contribute to features in C V . The residues with an -OH group, such as serine, threonine, and tyrosine, can establish H-bonds and interact via H atom with the -CO group (main chain) or -COO − group (aspartic or glutamic acid), and via the O atom with the -NH group (main chain or lysine). Similarly, the presence of an amide group in asparagine and glutamine allows the same residue to establish two different types of H-bonds, i.e., one in which the -CO group interacts and another in which the -NH group interacts. The specific interactions found between amino acid pairs in some of the low-energy structures for the hexapeptide monomers are summarised in Table 3. Table 3. Intramolecular interactions between pairs of amino acid residues in some of the low-energy structures of hexapeptide monomers as visualised from the disconnectivity graphs ( Figures S2-S31 in Supplementary Materials). The symbols X and U in parentheses represent the interactions present in helical/partial helical and hairpin structures, respectively. The aromatic rings (F/Y) may interact via a T-shaped or an offset stacked geometry. In SNQNNF, we also found three residues interacting simultaneously in the low-energy structures for the monomer.

Dimer Heat Capacity
Basin-hopping [53] with rigid body moves and subsequent all-atom relaxation was used to sample the energy landscape for dimers. The low-temperature heat capacity feature was taken as the first peak or inflection point above 0.086 kcal mol −1 , as for the monomers (Figure 1b). The structures within 2.6 kcal mol −1 of the global minimum were used in the end-to-end distance versus dihedral correlation plot. Interestingly, for amyloid hexapeptides, the structures with relatively large end-to-end distances and large positive dihedrals contribute to the C V feature of interest (Figure 4b). For VQIVYK, very large negative dihedrals are also observed along with very large positive dihedrals. When the dihedral angles are close to 180 degrees (positive or negative), the corresponding structures are similar. However, for LYQLEN and YQLENY, the points in the correlation plot are close to each other. We note that LYQLEN forms amyloid at a low pH [38], whereas the peptides were studied at a neutral pH in the current study. NLGPVL is a control peptide that exhibited extended structures contributing to the C V peak of interest. However, further structural analysis revealed that the interstrand separation between monomers was larger compared to other amyloid-forming dimers exhibiting extended conformations (Figure 3). This result may be rationalised by taking into account the contribution from the L residue in the second position and the P residue in NLGPVL. We suggest that these residues may not allow the strands to interact very closely and hence play a role in preventing the aggregation of this peptide.

Correlation between Heat Capacity and Amyloid Formation Predictors
The correlation plot ( Figure 5) between the temperature at which the low-temperature feature occurs in the C V plot and the propensity for amyloid formation, as predicted using Aggrescan [47], shows that there is little overall correlation between the two quantities. However, further analysis reveals that the amyloid and control sequences derived from the same protein can probably be distinguished by comparing the temperature at which the first feature (excluding melting) occurs in the heat capacity plot between 0.076 and 0.300 kcal mol −1 . The amyloid-forming sequences exhibit features at a lower temperature compared to sequences with a lower propensity for amyloid formation and the respective control sequences. This trend is evident when we compare the sequences within the sets derived from the same protein, such as tau (VQIVYK, VQIVEK, and NAEVYK), amylin (NFGAIL and NAGAIL), β 2 -microglobulin (LLYYTE, YYTEFT and EVDLLK), and A-β (KLVFFA and MVGGVV). The control sequences KAFIIQ, NLGPVL, and YQLENY lacked such low-temperature features. Another predictor that can be useful to estimate amyloid formation propensity is CamSol [54]. CamSol is designed to predict the intrinsic solubility of proteins. Aggregation-prone sequences exhibit lower solubility. Heptapeptides are the minimal sequences for which solubility can be obtained using CamSol. To obtain an approximate idea of solubility for the capped hexapeptides, we added alanine residues at the termini of the peptide sequences. As expected, the solubility exhibited a strong negative correlation with the aggregation propensity predicted using Aggrescan ( Figure S1 in Supplementary Materials). Hence, the low-temperature heat capacity feature is positively correlated with intrinsic solubility for peptide sequences occurring in the same context, i.e., within the same protein ( Figure S1 in Supplementary Materials). As for Aggrescan, there is little overall correlation between the two quantities, intrinsic solubility and low-temperature C V feature. We note that the CamSol and Aggrescan predictors may have some intrinsic limitations for such small sequences. Both of them associate the control sequence KAILFL with high aggregation propensity, and the amyloid sequences with two or more serine residues are predicted to have high solubility and lower aggregation propensity. Overall, low-temperature heat capacity features for proteins do not exhibit a strong correlation with aggregation propensity. However, they may be useful to compare the properties of short sequences occurring within the same protein, i.e., within the same context.  Figure 4. Correlation plots between end-to-end distance and dihedral angles for low-energy structures contributing to first heat capacity peak of interest.  Figure 5. Correlation plot between the temperature (k B T) at which the first low-temperature feature is observed (between 0.076 and 0.300 kcal mol −1 ) in the C V plot for the monomer and the propensity for amyloid formation predicted using the Aggrescan software [47]. The colours correspond to the proteins from which the sequences are taken. Orange, green, magenta, and grey correspond to tau, amylin, A-β, and β 2 -microglobulin, respectively. Blue represents miscellaneous peptides taken from various proteins. Black represents a peptide which is amyloidogenic at a different pH. Several peptides are excluded from the above plot. The excluded peptides include GYVIIK, KAFIIQ, NLGPVL, and YQLENY, where the first C V feature is simply the melting peak; GGVVIA, which has such a feature occurring above 0.3 kcal mol −1 ; the LIAGFN control sequence, for which the existing predictor fail to classify it differently from its reverse (NFGAIL) amyloid-forming sequence; and the de novo designed peptides STVIIE, STVIIP, SPVIIE, and STVVIE, which do not derive from naturally occurring amyloid-forming proteins.

Materials and Methods
The hexapeptides were modelled using the FF99IDPs [55] force field, which is a modified version of FF99SBILDN [56] within AMBER [57,58]. This choice was made to gain better insight into the energy landscapes of various sequences using the same potential and compare them with our earlier study [37] using the same force field. Our previous research [37] showed that the structures that contribute to low-temperature C V peaks for different AMBER force fields are similar. The N-and C-terminals of the peptide were methylated and methylamidated, respectively. The topology file was symmetrised [59] to account correctly for permutational isomers. Water was modelled using implicit solvation (igb = 8) along with a 0.1 M monovalent ion concentration [57,58].
The initial landscape exploration was performed using basin-hopping parallel tempering (BHPT) [53,60,61], implemented within the global optimisation program GMIN [62] interfaced with AMBER. Similar to our previous study [37] on monomers, 16 replicas were used for BHPT, with temperatures exponentially distributed between 300 and 575 K. For dimers, the potential energy landscapes were explored using a combination of rigid body, Cartesian, and group rotation moves [63][64][65]. Each monomer was used to define a rigid body. The two monomers were expanded radially from the centre of the coordinates, rotated within the angle-axis framework [66], and translated, with rigid body moves performed after every 111 Cartesian moves. The thousand lowest energy structures were saved and converged tighter to a root-mean-square convergence criterion for the gradient of 10 −7 kcal mol −1 A −1 . A total of 600,000-800,000 basin-hopping steps were performed for each peptide dimer. This total involves different runs with different starting structures, step size, and frequency of rigid body moves. For dimers, the sampling was monitored using the convergence of low-temperature heat capacity peaks.
Disconnectivity graphs [50,51] provide an overview of the landscape organisation. These graphs preserve the information about the highest-energy barrier that needs to be overcome to interconvert a pair of connected local minima. The vertical axis represents the potential or free energy, and the nodes along the vertical axis define superbasins. The minima within a superbasin can interconvert by overcoming a barrier that is less than or equal to the threshold energy associated with the corresponding node. The branches terminate at the potential or free energy of the individual local minimum.
The heat capacity of the peptides was estimated using the harmonic superposition approximation [87][88][89][90][91][92]. The partition function of each local minima was obtained using normal mode analysis and the total partition function was the sum of the partition functions of all the local minima. Each peak in the heat capacity involves contributions from minima with negative and positive occupation probability derivatives with respect to temperature [93]. The principal contributions (98%) for each heat capacity peak were visualised in disconnectivity graphs using different-coloured branches for each set of minima ( Figures S2-S31 in Supplementary Materials). The geometric parameters (end-to-end distance and dihedral angle) used to distinguish different structures were calculated using the CPPTRAJ program within AMBER [94].

Conclusions
The heat capacity analysis of peptide monomers and dimers may provide insight into collective phenomena, such as the aggregation and phase separation of proteins. We found that a variety of low-energy structures with different backbone conformations contributed to the low-temperature heat capacity feature for amyloid-forming hexapeptide monomers. For control sequences, the C V peak of interest did not correspond to such a diverse set of structures. The structural competition can be diagnosed via a combination of geometric parameters, such as end-to-end distance and dihedral angle. The heat capacity analysis for dimer conformations revealed that the amyloid-forming hexapeptides preferentially contributed to the peak of interest via low-energy conformations with large end-to-end distance and large positive dihedrals. The exact temperature at which the C V feature occurred was lower for the peptides with higher propensity for amyloid formation compared to control sequences derived from the same protein. However, this trend does not hold if we compare amyloidogenic hexapeptides from one protein and control sequences from a different protein.
We do not expect to extract a universal aggregating propensity from monomer and dimer properties. However, the analysis of low-temperature heat capacity features revealed new opportunities for investigating and predicting collective behaviour. The distinctions we have identified for amyloid-forming hexapeptides, compared to controls with very similar sequences, suggest that further investigation could be fruitful. For example, the species responsible for aggregation could involve higher free energy minima (excitations [52]) in the monomer landscape, which may contribute to heat capacity features. The fact that the current analysis could distinguish amyloid and control sequences with the same amino acid residues in reverse order, as for NFGAIL and LIAGFN, further illustrates how the energy landscape framework can be used to understand the context-dependent properties of amino acid residues in proteins. In the future, we will investigate the heat capacities of oligomers using coarse-grained potentials.   [95]. The computational protocol for creating the database is available as a tutorial on https://github.com/nicy-nicy/peptide-energy-landscape-exploration (accessed on 18 May 2023) and the scripts to analyse the database are available on https://github.com/ nicy-nicy/energy-landscape-cv-analysis (accessed on 18 May 2023).

Conflicts of Interest:
The authors declare no conflict of interest.