Heat-Stable Hazelnut Profilin: Molecular Dynamics Simulations and Immunoinformatics Analysis

Heat treatment can modify the allergenic potential, reducing allergenicity in specific proteins. Profilins are one of the important hazelnut allergens; these proteins are considered panallergens due to their high capacity for cross-reactivity with other allergens. In the present work, we evaluated the thermostability of hazelnut profilin, combining molecular dynamics simulation and immunoinformatic techniques. This approach helped us to have reliable results in immunogenicity studies. We modeled Cor a 2 profilin and applied annealing simulation, equilibrium, and production simulation at constant temperatures ranging from 300 to 500 K using Gromacs software. Despite the hazelnut profilins being able to withstand temperatures of up to 400 K, this does not seem to reduce its allergenicity. We have found that profilin subjected to temperatures of 450 and 500 K could generate cross-reactivity with other food allergens. In conclusion, we note a remarkable thermostability of Cor a 2 at 400 K which avoids its structural unfolding.


Introduction
Hazelnut (Corylus avellana) is one of the most consumed foods in the world; it is an important supply of processed food and can be consumed raw or cooked [1]. Likewise, this tree nut is known for its health benefits [2][3][4][5][6][7].
The influence of heat treatment on food processing has an impact on the allergenic and antigenic integrity of tree nut extracts [17]. The heat-resistant allergenic potential of hazelnut above 413.15 K by in vitro assays has been analyzed by many researchers, finding the allergenic potential of hazelnut at high temperatures is affected, demonstrated by decreased IgE binding [18,19]. Additionally, M. Wigotzki et al. found that proteins with low weight are stable at even higher temperatures [19].
The thermostability of proteins has significant repercussions in the properties of their native folded structure. Furthermore, these proteins have many individual factors achieving stability at high temperatures [20]. This behavior can be attributed to hydrogen bonding, salt bridges, van der Waals' interactions, hydrophobic internal packing, improved electrostatic interactions, and others [21,22]. Currently, molecular dynamics simulation techniques allow us to understand the behavior of many proteins submitted to high temperatures, and to enable us to predict their behavior [23][24][25][26]. For example, the soy allergen has been analyzed by molecular dynamics simulations methods to understand the structural conformation for the development of antiallergenic drugs [27,28]. Another interesting case is the study of the peanut conglutin, to understand the overall conformational stability of the protein and their IgE binding epitopes [29].
The present research focused on analyzing the impact of temperature on the allergenicity of hazelnut profilin through molecular dynamics simulations. Additionally, immunoinformatic tools were used to predict linear epitopes and cross-reactivity, among other food allergens.

Molecular Dynamics Simulation
The molecular dynamics (MD) simulations were performed in Gromacs 2019, ref [32] and we considered OPLS-AA force field. OPLS-AA is an interesting force field that considers all atoms of a protein explicitly and is useful in the study of the organic and biomolecular systems [33]. The protein was situated inside a cubic box with a distance of 1.5 nm to the edges of the box and the explicit TIP4P [34] water model was considered. Na + ions were added to neutralize the system. The energy minimization using steepest descent method of 200,000 steps with 0.001 nm initial step-size was employed.
Leap-frog algorithm for integrating Newton's equations was employed to MD simulations. The bond lengths were constrained by the LINCS (LINear Constraint Solver) algorithm, ref [35] and periodic boundary conditions (PBC) in all directions, were used. Cut-off of 0.9 nm distance for the short-range interactions and Particle-Mesh Ewald (PME) [36] method for the long-range interactions were considered. V-rescale thermostat and Parrinello-Rahman barostat regulated the temperature (300 K) and pressure (1 bar).
We applied three MD steps: annealing, equilibrium, and production simulation. The first, simulated annealing procedure consisted of 20 annealing cycles in 500 ps with a high temperature of 400 K and a cooling temperature until 300 K by 10 ns. In the second step, the equilibration was performed in the NVT (number of molecules, volume, and temperature constant) canonical ensemble by a short time of 10 ns for integration, considering position restraint. They were restrained with harmonic forces and the water molecules were allowed to equilibrate around the protein. In the third step, we considered the MD in the isobaric-isothermal ensemble without position restraint during 200 ns was calculated. Furthermore, five systems with these temperatures: 300 K, 350 K, 400 K, 450 K, and 500 K, were considered at this point.
The analysis of Root-mean-square deviation (RMSD), Root-mean-square fluctuation (RMSF), radii of gyration (Rg), Solvent Accessibility Surface Area (SASA), and hydrogen bond (Hb), were obtained using the GROMACS tools and plotted in Gnuplot 5.0 software. We considered the average values of the last 20 ns of the MD simulation. Salt bridges of 5 frames of the trajectory (180 nm, 185 nm, 190 nm, 195 nm, and 200 nm) were analyzed by VMD (Visual Molecular Dynamics) [37] software. All graphics representation was recreated in Chimera UCSF 11.1.2 software [38].

Immunoinformatic Details
The last frame of MD simulations was analyzed through Ellipro server (http://tools.iedb.org/ ellipro/). The server predicts linear and discontinuous antibody epitopes based on a three-dimensional structure protein antigen [39]. We have considered the minimum score of 0.5 and the maximum distance of 6 Angstroms.
Identification of potential cross-reactive Cor a 2, was predicted by Cross-React server (http: //curie.utmb.edu/Cross-React.html), a tool that uses the epitope location and the three-dimensional structure of protein allergens [40]. A patch size (A) of 10, an area cutoff (A 2 ) of 10, and the correlation cutoff of 0.10, were applied.

Results and Discussion
Currently, the three-dimensional structures of all profilins have not been entirely reconstructed; consequently, we have to use alternative methods to study non-crystallized proteins. This study was based on the clinical relevance of hazelnut allergenicity, particularly focused on profilin (Cor a 2). We have identified Hev b 8 as an adequate template (PDB ID: 5FDS), ref [41] which has a 1.9 Å resolution (resolved by X-ray diffraction) and a sequence identity percentage of 88.46% with Cor a 2 (see Figure 1).

MD Simulation Analysis
We introduced simulated annealing. This analysis refined and fitted the three-dimensional structure of protein after the prediction of homology modeling [42]. We used the periodic cycles of 400 K, 280 K, 270 K, 298 K, and 300 K for 0 ps, 30 ps, 60 ps, 300 ps, and 500 ps, respectively. Figure 2A presents the RMSD plot with an average of 0.14 ± 0.02 nm, and Figure 2B shows the temperature cycles during the simulation. For the MD calculus, we have used a OPLS-AA force field characterized by calculating the conformational energy of organic and biomolecular systems. This force field allows us to reproduce the experimental properties of the condensed structural and thermodynamic phases [43,44].
Furthermore, the selection of the water model plays an important role in these types of simulations, in which we look at the behavior at temperatures above 300 K. The best water model chosen for the study was the four-site water model TIP4P [45]; this water model is capable of inducing the values of physical properties and achieves a better approximation [46][47][48]. Table 1 and Figure 3 show the average of RMSD, RMSF, Rg, SASA, and Hb of the last 20 ns. The RMSD average shows an increase from 300 to 500 K, the lowest and highest values were indicated as high and low stability, respectively ( Figure 3A). While the average RMSF values show an abrupt increase at 450 K and 500 K, the high flexibility values were found in the middle of the structure, with the loss of the beta-strands, provoking the unfolding process ( Figure 3B). While the Rg value at 500 K was higher than 300 K, there is less structural compaction at 450 K and 500 K ( Figure 3C).
An interesting result of the molecular dynamics simulation was the analysis of the solvent-accessible surface area (SASA). Generally, SASA defines the surface around a protein model calculated based on the study of the hypothetical center of a water sphere (approximately one molecule of water) [49]. The total SASA values increased from 69.96 to 89.40 nm 2 with the temperature change, suggesting that the total hydrophobicity of Cor a 2 was reduced generating conformational changes ( Figure 3D). The low variation in Cor a 2 at 300 K, 350 K, and 400 K denotes higher thermodynamic stability than 450 K and 500 K. In Table 1 the Hb analysis showed us that the highest temperatures (450 K and 500 K) led to the loss of secondary structure conformation due to the reduction of their hydrogen bonds.  The analysis of salt bridges in macromolecules allows describing of the structure and function of proteins. Usually, these occur between negative (ASP and GLU) and positive (LYS and ARG) amino acids. In this study, the analysis of five frames of the last 20 ns, was performed. We analyzed the salt bridges of proteins, where the 450 K and 500 K systems had the highest salt bridges (See Table 2). Adrian Helcock in 1998 proposed that salt bridges play a crucial role in promoting hyperstability in proteins; however, they seem to contribute low stability of the protein at room temperature [50]. Therefore, the increase in salt bridges at high temperatures may imply a greater resistance to Cor a 2 denaturation.
3.3 GLU120-LYS95 3.4 d: distance in Å between atoms of basic and acidic residues.
The presence of β-sheets in the conformational structure of profilins define its high stability. Indeed, the β-sheets seem to be the most important region of the thermostability in these proteins. Figure 4 shows us in red the structure conserved concerning the protein at 300 K. Profilin at 350 K maintained the major regions of 300 K, unlike 400 K and 450 K in which only the motif of β-sheets is conserved. The secondary structure of the profilin at different temperatures showed us slight, but relevant, variations in their conformation. We observed quantitative variations of β-sheets conformation at each temperature; for 300 K, 350 K, 400 K, 450 K, and 500 K the percentage were 26.2%, 25.4%, 26.2%, 21.5%, and 15.4% respectively. A tendency to decrease the number of residues that form the β-sheets was observed; it is directly linked to the protein denaturation due to the increased temperature (For additional information see Table S6).
Another point of view is the sequence conservation, Figure 5 shows the sequence alignment between 300 K and the other temperatures analyzed. We identify in the box the structural conservation for each system.  Table S6 of supplementary material. Figure 5. The alignment sequence of molecular structure was analyzed at a different temperature, with 300 K as a reference temperature. The box shows us the sequence conservation at different temperature as indicated by the letters (A-D).

Immunoinformatics
The antigenic B cell epitopes prediction has clinical importance because it helps to identify areas of an antigen recognized by antibodies [51]. In this work, we used the Ellipro server, which is a handy immunoinformatic tool based on three algorithms. The first is the approximation of the protein shape as an ellipsoid. The second approximation is the residue protrusion index (PI). The third is the clustering of neighboring residues based on their PI values. These criteria allow analyzing linear and discontinuous epitopes of a protein [52,53].
For this analysis, we have used the last frame at each temperature. The Table 3 describes the score and the number of residues for each linear epitope. These results show that epitopes are reduced when the system is studied at 500 K. Additional information is shown in Tables S1-S5, in which we show the analysis of the epitopes of five frames of the last 20 ns. However, despite a reduction of epitopes, several allergenic regions persist in all cases. In order to complement the allergenicity study, we performed the cross-reactivity analysis, based on the epitopes of the last frame. For this case, we have considered a Pearson correlation coefficient (PCC) with a cut-off of 0.8. The results show us cross-reactivity of Cor a 2 at 300 K with 24 allergens from different plants and a decrease in cross-reactivity at 500 K.
We analyzed the percentage of intensity following the Equation (1), where; E C−R is the number of epitopes with a PPC greater than 0.8, a product of cross-reactivity analysis (cross-reactivity epitopes), and E n is the number of epitopes (linear epitopes) by Cor a 2 at different temperatures.
Linear epitopes and cross-reactivity of Cor a 2 were evaluated through temperature changes, showing significant variation. The heat map ( Figure 6) exhibits the intensity percentage of these variations where the green color means hight cross-reactivity, and the red color implies the lost cross-reactivity. In Supplementary Information, Figures S1-S5 show in detail the cross-reactivity prediction for each system. Moreover, the major cross-reactivity occurs between 350 K, 400 K, and 450 K, and the effect is reduced at 500 K. However, Cor a 2 at 500 K has a 50% prevalence that generates cross-reactivity with other allergens. With this last prediction, we show that hazelnut profilin at temperatures among 300 K and 500 K generates cross-reactivity with other food allergens. Figure 6. Heat plot. The cross-reactivity of Cor a 2 and 24 allergens are representants by the percentage of predominance; where 100% (green color) indicates that all epitopes are shown for a particular allergen and a given temperature.

Conclusions
In this research, we have performed molecular dynamics simulations at constant temperatures of 300 K, 350 K, 400 K, 450 K, and 500 K by 200 ns. These methods allowed us to understand the remarkable thermostability of Cor a 2 at temperatures up to 400 K. The salt bridges played an important role in the thermoresistance analysis of profilin. We saw that these salt bridges might avoid the complete denaturation of Cor a 2 at high temperatures as well as the case of thermophilic proteins.
Moreover, the analysis of the number of linear epitopes based on 3D structures, demonstrated that there is a reduction of epitopes at temperatures of 500 K. However, the decrease in epitopes of Cor a 2 at 500 K does not show the complete loss of its allergenicity. Finally, the cross-reactivity prediction identified many potential allergens that could generate cross-reactivity with Cor a 2 at high temperatures.