Structural Achievability of an NH–π Interaction between Gln and Phe in a Crystal Structure of a Collagen-like Peptide

NH–π interactions between polar and aromatic residues are well distributed in proteins whose stabilizing effects have been investigated in globular and fibrous proteins. In order to gain structural insights into side chain NH–π interactions, we solved a crystal structure of a collagen-like peptide containing Gln-Phe pairs. The Gln-Phe NH–π interactions were further characterized by quantum calculations, molecular simulations, and structural bioinformatics. The analyses indicated that the NH–π interactions are robust under various solvent conditions, can be distributed either on the protein surface or in its hydrophobic core and can form at a wide range of distances between residues. This study suggested that NH–π interactions can play a versatile role in protein design, including engineering hydrophobic cores, solvent accessible surfaces, and protein–protein interfaces.


Introduction
NH groups are prevalent in proteins given their location in both the backbone and some side chains (e.g., Lys, Arg, Asn, and Gln). A previous study showed that approximately 1 in 10.8 aromatic side chains, not including His, make close contact with an NH group [1]. The nature of NH-π interactions is largely dependent on the polarity of the donor group. When N is from ammonium/guanidinium of Lys/Arg, the interaction energy is dominated by an electrostatic term, so-called cation-π interactions. The stabilizing effects of side chain cation-π interactions have been characterized and subsequently used to enhance protein stability [2][3][4][5]. However, when the N from either the backbone or uncharged Asn/Gln side chains is best classified between the conventional hydrogen bonding and a weaker interaction dominated by dispersion. Burley et al. initially noted that aromatic residues could make close contact with both Arg/Lys or Asn/Gln with similar preference [6], suggesting that side chain NH-π interactions could also play an important role in protein stability and/or function. The quantification of NH-π interactions in small molecule systems had been determined experimentally, which was also supported by quantum mechanical theory. In NH-π interactions in the N-methylformamide-benzene complex, the minimum energy distance was 3.2 Å and the binding energy was −4.37 kcal/mol [7]. In contrast, the stabilizing effects of NH-π interactions in Acetyl-Phe-Gly-Gly-N-Methyl amide peptide model ranged from −1.98 to −9.24 kcal/mol [8]. More detailed studies are needed to understand the NH-π interactions in proteins.

Peptide Synthesis and Purification
The peptides were synthesized with FMOC solid-phase chemistry method by GL Biochem Ltd (Shanghai, China), and purified to more than 95% purity. The purity was tested with reversed-phase high-performance liquid chromatography (RP-HPLC) on an Agilent 1260 system using an AdvanceBio peptide map column (C18), 2.1 × 150 mm, pore size 2.7-micron, with a linear gradient of water and acetonitrile gradient each containing 0.5% TFA. Molecular weights were validated with matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) on a Bruker autoflex II using a prespotted anchor chip with 10 mg/ml 2,5-dihydroxybenzoic acid in water, 50% acetonitrile as the matrix, and a dual-stage reflection electron microscope obtain the spectrum ( Figure S5). The sequence of the peptide was as below: GPQGFO: GPOGPOGPOGPQGFOGPOGPOGPO where O is (4R)-hydroxyproline. The N-and C-termini are acetylated and amidated, respectively.

Crystallization and Data Collection
The pure and lyophilized GPQGFO peptide powder was dissolved in a 20 µL buffer at pH 7.0 with 10 mM Tris·HCl making the solutions of 5 mM concentration. Incubate at 4 • C overnight and then set for crystallization by using the hanging drop vapor diffusion method at 4 • C. After~one month, the best crystals were obtained under 0.2 M potassium nitrate and 20% PEG 3350. X-ray data were collected at 100 K, and the best one was diffracted to 1.385 Å. The space group was P2 1 22 1 , with dimensions: a = 24.09 Å, b = 28.01 Å, c = 73.37 Å, α = β = γ = 90 • indexed by the HKL2000 software (HKL Research, Charlottesville, VA, USA). Data collection and processing statistics are summarized in Table S2.

Structure Determination and Refinement
The structures of GPQGFO were solved by molecular replacement using a Phaser from the Phenix software suite [24] with a collagen structure (PDB ID: 5YAN [25]) as the search models. Initial phases were improved by rigid body refinement followed by rounds of simulated annealing and anisotropic B-factor refinement using the Phenix suite. Manual model rebuilding was performed using COOT [26]. Further rounds of refinement were performed using the phenix.refine program implemented in the PHENIX package with coordinate refinement, isotropic ADP refinement, and bulk solvent modeling. The final GPQGFO model contains 598 peptide atoms and 115 water molecules. The final GPQGFO structure has an R work /R free value of 16.7/18.0%.

Circular Dichroism (CD) Spectra
CD was recorded in 1-mm quartz cuvettes on a Chirascan instrument with a six-cell thermostated cell holder and a temperature controller (Applied Photophysics Ltd, London UK). Peptide solutions of 0.2 mM were characterized in phosphate-buffer (at pH 7.0) or in pH 3, 9 or in 100, 150, 300, 500 mM NaCl. Wavelength scans were conducted from 190 to 260 nm at 4 • C. For temperature-induced denaturation, ellipticity was measured from 4 to 80 • C with a heating rate of 1 • C for 6 min. Observed ellipticity was converted to molar ellipticity by dividing raw values by the peptide concentration, number of residues, and cell path length. The melting temperature, T m , was estimated as below: where θ(T) is the observed ellipticity, and θ F (T) and θ U (T) are predicted ellipticities derived from linear fits to the folded and unfolded baselines. The melting temperature is estimated as the T where F(T) = 0.5.

Quantum Mechanical (QM) Analysis
To probe the global energy landscape of Q-F interaction, a three-dimensional potential energy scan was performed between acetamide and toluene. A set of 3-D Cartesian axes was created where the x-y plane was mapped on the toluene ring with atom C1 defined as the origin and vector CM-C1 as the x-axis. Because toluene is symmetric about the x-axis and also the x-y plane, the whole interaction space between two molecules can be represented by such a quadrant that on only one side of the x-axis and x-y plane. The space was gridded at every 0.4 Å from 0 to 4.8 Å along x-and y-axes and every 0.2 Å from 2.0 Å to 5.0 Å along the z-axis. In each grid, the coordinate of the N atom in acetamide was constrained while the other atoms were free to move. In total, 2704 pairs of molecule models were optimized and calculated and the energy surface of the amide position related to the toluene ring was scanned. The M06-2X functional with cc-pVTZ basis set was employed in geometric optimization using Gaussian. Then the single point energy was calculated with the aug-cc-pVTZ basis set and counterpoise correction was employed to calculate the interaction energy. ORCA [27] was used to perform all single-point energy calculations.
Additional QM calculations were carried out to understand the NH-π interaction between acetamide and toluene. Geometry optimization was carried out at the MP2/aug-cc-pVTZ level using psi4 [28] and interaction energy was obtained at the DLNPO-CCSD(T)/augcc-pVQZ level with counterpoise correction using ORCA. SAPT calculations were carried out with the aug-cc-pVQZ basis set using psi4.

Calculation of Geometric Parameters
To determine the geometric relationship between the pairwise, three bond conformations (d NM , θ, ω,) and three side-chain-positions (d Cγ-π , τ, χ) were defined. d NM was denoted as the distance between the N of the amide NH groups of Gln or acetamide and the center (M) of the phenyl ring. At this time, the angle between NH and M was defined as ω (0 < ω < 180 • ), and the acute angle between the MN vector and the normal vector of the phenyl ring was described as θ (0 < θ < 90 • ). The d Cγ-π and τ were defined as the shortest distance and angle between Cγ of the Gln or acetamide and the plane formed by the phenyl ring. If Cγ is located in the same direction as the normal vector, τ is positive (0 < τ < 90 • ), otherwise, it is negative (-90 • < τ < 0 • ). Whereas the χ was described as the angle between the vector formed by the projection of the Cγ-N vector onto the plane and the Cβ-Cγ vector formed by the phenyl ring. See SI method for calculation details. All atomic cartesian coordinates were retrieved from the pdb files. Since hydrogen atoms are missing from most pdb files, theoretical positions of hydrogen atoms (X-H = 1.0 Å) were then added with HGEN [29] software, but only those at positions allowing this calculation unambiguously were used further. The following cutoffs are used to screen the Q-F pairs with the NH-π interactions, d N-M < 4.3 Å, θ < 25 • , ω > 120 • [1].

Molecular Dynamics (MD) Simulation
The molecular triplex model with 24 residues in each strand was built from the crystal structure using CHARMM [30]. After energy minimization in a vacuum, the system was solvated with TIP3P water [31] in a cubic box that was 89 Å in length. No counterions or salt concentration was included. Periodic boundary conditions were used and the particle mesh Ewald was applied on both electrostatic [32] and Lennard-Jones (LJ-PME) [33,34] interactions, with 12 Å as the real space cutoff. Applying CHARMM36m [35,36] force field, the simulation was performed in an isothermal-isobaric ensemble (NPT) at 298 K and 1 atm by employing the Andersen thermostat and Monte Carlo barostat, respectively. With the constraints on hydrogen-involved bonds, a time step of 2 fs was used in the velocity Verlet integrator. After 0.64 ns equilibrium simulation, the system was further run for 200 ns using OpenMM [37] with snapshots saved every 100 ps.

Database Analysis of Q-F Pair Interaction
A non-redundant set was curated containing 9658 protein structures all determined by X-ray diffraction to a resolution of 2.0 Å or higher, refined to R ≤ 0.25 and less than 30% sequence homology by PIESCES suits [38,39].
The following criteria were employed to assemble the set: (1) no theoretical model structures and no incomplete entries, i.e., Cα coordinates only, were accepted; (2) only crystal structures with a resolution of 2.0 Å or better and a crystallographic R-factor of 0.25 or lower were accepted; (3) the minimum chain length was 40 amino acid residues; and (4) the maximum acceptable amino acid identity between any two protein chains of the set was 30%.

Triple Helices Remained Stable under Various Ionic Strengths and pH
In a previous study of GPQGFO peptides, it was indicated that a favorable Q-F interaction could be generated to stabilize the triple helix in a neutral environment [23]. To experimentally assess the tolerance of the Q-F pairs in different environments, the thermal stability of GPQGFO under various NaCl (100-500 mM) concentrations and pH (3,7,9) was characterized by circular dichroism spectra. The molar residual ellipticity (MRE) at 220 nm and T m (~40 • C) were apparently unaffected (Figure 1b,c). These results indicated that the stability contribution of the Q-F pairs was maintained under various environmental conditions, which distinguished the GPQGFO motif from that of less tolerant motifs such as GPKGEO where the stabilizing electrostatic interactions are more affected [18]. Notably, the T m values of peptides recorded here were higher than those reported by Persikov et al. [12] and Walker et al. [23], potentially due to differences in melting schedule, host sequence, or blocking of peptide termini.

Triple Helices Remained Stable under Various Ionic Strengths and pH
In a previous study of GPQGFO peptides, it was indicated that a favorable Q-F interaction could be generated to stabilize the triple helix in a neutral environment [23]. To experimentally assess the tolerance of the Q-F pairs in different environments, the thermal stability of GPQGFO under various NaCl (100-500 mM) concentrations and pH (3,7,9) was characterized by circular dichroism spectra. The molar residual ellipticity (MRE) at 220 nm and Tm (~40°C) were apparently unaffected (Figure 1b, c). These results indicated that the stability contribution of the Q-F pairs was maintained under various environmental conditions, which distinguished the GPQGFO motif from that of less tolerant motifs such as GPKGEO where the stabilizing electrostatic interactions are more affected [18]. Notably, the Tm values of peptides recorded here were higher than those reported by Persikov et al. [12] and Walker et al. [23], potentially due to differences in melting schedule, host sequence, or blocking of peptide termini.

Side Chain Conformations in Crystal Structure
In order to gain a better understanding of the Q-F pairs in the context of a triple helix, and especially to help understand the difference between the axial and lateral sequential geometry [23]. The GPQGFO peptide crystallizes with the triple helical structure characteristic of collagen molecules with 1.385 Å resolution. More data acquisition and refinement statistics are available in Table S2. Two axial pairs and one lateral were observed in the structure depending on the relative positions of the interacting amino acids (Figure 1e). Two axial pairs were, respectively, composed of Gln in the leading and Phe in the middle chains of the triple helices, the other was between the middle and trailing chains. Three bondconformation parameters, d NM , θ, and ω, describe whether the Gln Nε-H perpendicularly points to the phenyl ring center, forming a hydrogen bond-like conformation (Figure 1d). The conformations were located at the region with d NM~3 .3 Å, θ~10 • and ω~150 • , well within the empirical cutoffs for NH-π interactions (d NM < 4.3 Å, −25 • < θ < 25 • , and ω > 120 • ) extracted from the previous data-mining of protein structures [1].
Except for the bond-conformation parameters, side-chain-position parameters, d Cγ-π , τ and χ, describe relative positions between the two molecules or side chains (Figure 1d). The separation distances were located at the regions with d Cγ-π~5 Å, τ~150 • and χ~93 • . There was a third pair between Gln in the trailing and Phe in the leading chains. The side chains were pointed in opposite directions (d NM = 7.4 Å), which did not form any interactions.

Energetic Favorability of Amide-π Interactions Characterized with Quantum Methanics
To explore whether the Q-F pairs captured in the crystal structures are energetically favored, ab initio QM calculations were performed on an acetamide-toluene complex as a model system of Gln-Phe interactions. A global minimum of interaction energy was obtained at the M06-2X/aug-cc-pVTZ level by scanning the relative positions of the acetamide atom N, representing Gln Nε, with respect to the coordinate set on the phenyl ring ( Figure S1). The global minimum of interaction energy (∆E) was −5.81 kcal/mol similar to hydrogen bonding (approximately −5 kcal/mol) [40,41] and smaller than cationπ interactions (about −19 kcal/mol) [42].
In order to better understand the relationship between the conformation and interaction energy (∆E), geometric parameters were plotted against ∆E (Figure 2a). The optimal (∆E = −5.81 kcal/mol) and sub-optimal (−5.75 < ∆E < −5.48 kcal/mol) conformations were located at the region with d NM~3 .2 Å, θ~10 • and ω~150 • , were highly similar to those of axial Q-F pairs in the crystal structure, which suggested the latter should also be energetically favored (Table 1). At 0 < θ < 20 • , ∆E was plotted with respect to d NM and the curve was similar to the Leonard-Jones potential. The minimal ∆E occurs at d NM~3 .0-3.5 Å. Beyond d NM~5 Å, there was little interaction, while within d NM~3 Å, ∆E increased dramatically (Figure 2c). This result indicated that the NH-π interaction was favorable in specific geometry of distance and orientation. The side-chain-position parameters of optimal and sub-optimal states were located at the regions with dCγ-π ~5 Å, τ ~50 and χ ~66, indicating that acetamide (Gln Cγ-Nε) was tilted against the phenyl ring (Figure 2b). Notably, the sub-optimal states were clustered near the toluene C atom (Phe Cβ) with χ ~65, which might enhance interaction between acetamide and toluene methyl group. With 40 < τ < 50, the data of dCγ-π vs. E were projected on a plane (Figure 2d). E decreases as dCγ-π decreases. When 4.5 Å < dCγ-π < 5 Å, E decreases as τ increases ( Figure S2). This suggests that, although the side chain proximity and orientation could affect the interaction energy, an NH-π interaction could be formed under various side chain conformations in protein.

Balanced Contributions from Various Types of Chemical Forces
To further understand the chemical property of NH-π interactions, the single point energy of the global minimum of acetamide-toluene was calculated at a higher level of DLNPO-CCSD(T)/aug-cc-pVQZ. The interaction energy was 5.95 kcal/mol. Symmetryadapted perturbation theory (SAPT) was then used to separately examine the electrostatic (Eele), exchange (Eee), induction (Eind), and dispersion (Edisp) terms that contribute to the Q-F interaction energy (Table S1). The SAPT2+3/aug-cc-pVQZ calculation revealed a total interaction energy of 7.03 kcal/mol, with 1 kcal/mol overestimated from the CCSD(T) value. Both Eele (6.6 kcal/mol) and Edisp (8.5 kcal/mol) made comparably large contributions (Table S1). The Edisp/Eele ratio (1.29) indicated a relatively balanced interaction between dispersion and electrostatic energies that indeed affected characteristics of both the protein interior and exterior [43].   Table 1. Geometric parameters of Q-F pairs extracted from QM calculation of the acetamide-toluene complex, crystal structure and MD trajectories of peptide GPQGFO, and a non-redundant dataset of protein crystal structures.

Methods
Pairs The side-chain-position parameters of optimal and sub-optimal states were located at the regions with d Cγ-π~5 Å, τ~50 • and χ~66 • , indicating that acetamide (Gln Cγ-Nε) was tilted against the phenyl ring ( Figure 2b). Notably, the sub-optimal states were clustered near the toluene C atom (Phe Cβ) with χ~65 • , which might enhance interaction between acetamide and toluene methyl group. With 40 • < τ < 50 • , the data of d Cγ-π vs. ∆E were projected on a plane (Figure 2d). ∆E decreases as d Cγ-π decreases. When 4.5 Å < d Cγ-π < 5 Å, ∆E decreases as τ increases ( Figure S2). This suggests that, although the side chain proximity and orientation could affect the interaction energy, an NH-π interaction could be formed under various side chain conformations in protein.

Balanced Contributions from Various Types of Chemical Forces
To further understand the chemical property of NH-π interactions, the single point energy of the global minimum of acetamide-toluene was calculated at a higher level of DLNPO-CCSD(T)/aug-cc-pVQZ. The interaction energy was −5.95 kcal/mol. Symmetryadapted perturbation theory (SAPT) was then used to separately examine the electrostatic (E ele ), exchange (E ee ), induction (E ind ), and dispersion (E disp ) terms that contribute to the Q-F interaction energy (Table S1). The SAPT2+3/aug-cc-pVQZ calculation revealed a total interaction energy of −7.03 kcal/mol, with 1 kcal/mol overestimated from the CCSD(T) value. Both Eele (−6.6 kcal/mol) and E disp (−8.5 kcal/mol) made comparably large contributions (Table S1). The E disp /E ele ratio (1.29) indicated a relatively balanced interaction between dispersion and electrostatic energies that indeed affected characteristics of both the protein interior and exterior [43].

Environmental Dielectric Constants
To explore how the NH-π interaction is influenced under a wide range of environments, such as protein interior and aqueous solution, the DLNPO-CCSD(T) calculations with implicit solvent model CPCM were performed. The interaction energy moderately decreased to −4.54 kcal/mol in protein interior (dielectric constant ε = 4) and −4.03 kcal/mol in water (ε = 78.4), respectively, compared to −5.95 kcal/mol in a vacuum. In contrast, cation-π interactions, such as NH4 + -benzene, were much more favorable in vacuum (approximately −19 kcal/mol) but significantly weakened to −3.6 kcal/mol in protein interior and −1.3 kcal/mol in water, which displays a high dependency on the polarity of the environment [42].

Conformational Distribution Sampled by Molecular Dynamics Simulations
In addition to the energetic favorability revealed by QM, atomistic molecular dynamics (MD) simulations were performed with the crystal structure of the collagen triple helix to investigate the dynamics of NH-π interactions in solution. To this end, the geometric parameters were extracted from the 200 ns MD trajectory and plotted as color-coded density maps (Figure 3). For the two closely contacted Q-F pairs,~22% of the bond conformations, represented by d NM , θ, and ω, fell within the empirical cutoffs. Due to the flipping of the phenyl ring, the τ values could be either positive or negative and there were two symmetrical peaks around the line τ=0 • on the d Cγ-π-τ plane (Figure 3b). The most frequently occurring conformations in the trajectory had the similar geometric values to those in the crystal structure and QM models ( Table 1). The MD data suggested that the Q-F pairs in the crystal structure should be maintained in solution to stabilize the protein, consistent with the observations in the CD experiments. For the third pair of Gln in the trailing and Phe in the leading chain, no interaction was detected in the MD trajectories ( Figure S3).
contrast, cation-π interactions, such as NH4 + -benzene, were much more favorable in vacuum (approximately 19 kcal/mol) but significantly weakened to 3.6 kcal/mol in protein interior and 1.3 kcal/mol in water, which displays a high dependency on the polarity of the environment [42].

Conformational Distribution Sampled by Molecular Dynamics Simulations
In addition to the energetic favorability revealed by QM, atomistic molecular dynamics (MD) simulations were performed with the crystal structure of the collagen triple helix to investigate the dynamics of NH-π interactions in solution. To this end, the geometric parameters were extracted from the 200 ns MD trajectory and plotted as color-coded density maps (Figure 3). For the two closely contacted Q-F pairs, ~22% of the bond conformations, represented by dNM, θ, and ω, fell within the empirical cutoffs. Due to the flipping of the phenyl ring, the τ values could be either positive or negative and there were two symmetrical peaks around the line τ=0 on the dCγ-π-τ plane (Figure 3b). The most frequently occurring conformations in the trajectory had the similar geometric values to those in the crystal structure and QM models ( Table 1). The MD data suggested that the Q-F pairs in the crystal structure should be maintained in solution to stabilize the protein, consistent with the observations in the CD experiments. For the third pair of Gln in the trailing and Phe in the leading chain, no interaction was detected in the MD trajectories ( Figure S3).

Data Mining of Protein Structures
In order to survey the situation of NH-π interactions between Gln and Phe in proteins, a non-redundant protein structure dataset was built consisting of 9658 well-resolved (resolution  2 Å) crystal structures. Among the 1747 Q-F pairs (dNM  4.5 Å) in the dataset, ~5% of the pairs (94 out of 1747) fell within the cutoffs that defined NH-π interactions (dNM < 4.3 Å, 25< θ < 25, and ω > 120). The phenyl rings of the 94 Q-F pairs were superimposed and formed two umbrella-shaped distributions of Gln above and beneath the ring, where all of the Nε-H groups were positioned approximately perpendicular to the

Data Mining of Protein Structures
In order to survey the situation of NH-π interactions between Gln and Phe in proteins, a non-redundant protein structure dataset was built consisting of 9658 well-resolved (resolution ≤ 2 Å) crystal structures. Among the 1747 Q-F pairs (d NM ≤ 4.5 Å) in the dataset, 5% of the pairs (94 out of 1747) fell within the cutoffs that defined NH-π interactions (d NM < 4.3 Å, −25 • < θ < 25 • , and ω > 120 • ). The phenyl rings of the 94 Q-F pairs were superimposed and formed two umbrella-shaped distributions of Gln above and beneath the ring, where all of the Nε-H groups were positioned approximately perpendicular to the ring center with similar d NM , θ and ω values (Figure 4). The values of d Cγ-π and τ range from 2.9 Å to 5.9 Å and −74.7 • to 56.2 • , respectively. The Gln side chains in the axial Q-F pairs were located in the center of the umbrella and had extremely high d Cγ-π (~5.3 Å) and τ (~61.1 • ). Notably, rather than uniformly distributed around the phenyl ring, the Gln side chains are less frequently projected toward the Phe Cα with χ ranging from 4.6 • to 175.7 • , potentially preventing steric clashes. This finding suggests that NH-π interactions could occur in proteins through diverse side chain conformations within a wide range of distances between Gln and Phe. SAS than Phe (Figure 4e). Except for the Q-F pairs in peptide GPQGFO, there were three fully exposed pairs ( Figure S4). In one case, Gln and Phe were located in the i and i+4 positions of α-helices, which may stabilize the α-helices. SAS analysis indicated that the Gln-Phe NH-π interactions could occur in both the core and the surface of proteins. Consistently, when the interaction was transferred from a vacuum to either low or high dielectric constant, simulating the protein core or surface, energy losses of only ~33% and ~25% were detected, respectively.

Conclusions
From thermal unfolding experiments, we have demonstrated that NH-π interactions allow the side chain of Gln to interact with the π-system of Phe in an axial geometry that It is noted that Phe is hydrophobic while Gln is hydrophilic, which is relevant to their functions in the protein. To explore locations of the NH-π interactions between them in proteins, i.e., the hydrophobic cores or the protein surfaces, the solvent accessible surface (SAS) area of the Q-F pairs were calculated with DSSP [44,45]. The majority (56 out of 94) of Q-F pairs were buried relatively deep inside the protein core (SAS < 30 Å 2 ) (Figure 4d). Eighteen Q-F pairs were partially exposed to solvent (60 Å 2 < SAS < 120 Å 2 ). Gln had higher SAS than Phe (Figure 4e). Except for the Q-F pairs in peptide GPQGFO, there were three fully exposed pairs ( Figure S4). In one case, Gln and Phe were located in the i and i+4 positions of α-helices, which may stabilize the α-helices. SAS analysis indicated that the Gln-Phe NH-π interactions could occur in both the core and the surface of proteins. Consistently, when the interaction was transferred from a vacuum to either low or high dielectric constant, simulating the protein core or surface, energy losses of only~33% and 25% were detected, respectively.

Conclusions
From thermal unfolding experiments, we have demonstrated that NH-π interactions allow the side chain of Gln to interact with the π-system of Phe in an axial geometry that is less influenced by the environment. The interaction is energetically highly favorable with its magnitude similar to hydrogen bonding [40,41]. Computational and experimental data from structural biology, QM, and structural bioinformatics demonstrate that the polar Nε-H group is typically positioned perpendicular to the phenyl ring and aligned with the phenyl ring center, which is well consistent with a previous study [1].
Two major features of the NH-π interactions have been revealed. First, the NH-π interactions are stable in various environments, i.e., the interaction energy is insensitive to dielectric constants. The stabilizing effect also remains almost constant under a wide range of pH and ionic strengths. When transferred from a vacuum to either low or high dielectric constant, the NH-π interaction loses only~33% and~25% energy, respectively, while cation-π interaction decays by~70% and~90% [42]. The Q-F pairs, both buried in the hydrophobic core or exposed on the protein surface, are identified in the data mining of the protein structures.
Secondly, the NH-π interactions can be formed by diverse side chain conformations under a range of distances between Gln and Phe. In the GPQGFO structure, Gln and Phe, with Cα atoms separated by 7.7 Å, made a strong NH-π interaction. Multiple sub-optimal interacting conformations were observed in the QM models.
Similar interactions between side chain amino and aromatic groups, such as between Asn and Phe, can be further explored, which will enable the use of NH-π interactions in the design of hydrophobic protein cores, solvent accessible protein surfaces, and proteinprotein binding interfaces.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biom12101433/s1, Figure S1: Energy surface of an acetamidetoluene complex with quantum chemical calculation; Figure S2: Scatter plot of the data points with 40 • < τ < 50 • are projected on the ∆Ed Cγ-π lane and d NM ≤ 4 Å are projected on the ∆E-θ plane; Table S1: Terms of interaction energy from SAPT2+3/aug-cc-pVQZ and MP2 methods; Table S2. Crystal statistics of crystal data collection and structure refinement one crystal was used for each structure; Figure S3: Molecular dynamics simulation of the collage-like peptide GPQGFO; Figure  S4: Side chain interactions of Q-F pairs exposed on protein surface in the crystal structure of PDB ID:4OV4, 7ODC, and 3CK6; Figure S5: HPLC Chromatograms and Mass Spectrometry analysis of peptides; Figure S6: The electron density map of the Q-F pairwise.
Author Contributions: F.X., J.H. and R.Z. designed research; R.Z., Y.X. and J.L. collected and analyzed data; F.X., R.Z., Y.X., J.H. and S.F. composed the manuscript; Y.X and J.H provided quantum chemical calculation resources; J.L. and S.F. dissecting the crystal structure. All authors have read and agreed to the published version of the manuscript.