Computational Prediction of the Protonation Sites of Ac-Lys-(Ala)n-Lys-NH2 Peptides through Conceptual DFT Descriptors

Six density functionals (M11, M11L, MN12L, MN12SX, N12, and N12SX) in connection with the Def2TZVP basis set and the SMD solvation model (water as a solvent) have been assessed for the calculation of the molecular structure and properties of several peptides with the general formula Ac-Lys-(Ala)n-Lys-NH2, with n = 0 to 5. The chemical reactivity descriptors for these systems have been calculated through Conceptual density functional theory (DFT). The active sites for nucleophilic and electrophilic attacks have been chosen by relating them to the Fukui function indices, the condensed dual descriptor Δf(r), and the electrophilic Parr functions. The results allowed the prediction of the protonation sites of the peptides and rendered a qualitative explanation of the difference in pKa of the two Lys groups.


Introduction
Conceptual density functional theory (DFT, also known as chemical reactivity theory) is a powerful tool for the prediction, analysis, and interpretation of the outcome of chemical reactions [1][2][3][4].
Recently, Luis R. Domingo [5] proposed a new theory for the study of the reactivity in organic chemistry, which he named molecular electron density theory (MEDT). In this new theory, the capability for changes in electron density is responsible for the molecular reactivity [5], while the electron density distribution at the ground state is responsible for physical and chemical molecular properties.
Within MEDT (which encompasses conceptual DFT), several new concepts and reactivity descriptors have been defined, such as the global electron density transfer (GEDT), the nucleophilicity N index, and local condensed descriptors like the electrophilic P − k and nucleophilic P + k Parr functions [6]. The use of the Parr functions is becoming widespread for the understanding of organic chemistry processes, as evidenced by some recent publications related to Diels-Alder and cycloadditions reactions [7][8][9].
An interesting chemical reaction amenable to being studied through conceptual DFT is protein glycation. This reaction is initiated by a nucleophilic addition (nonenzymatic glycation or Maillard reaction), a reaction between a free amino group from a protein and a carbonyl group from a reducing sugar to form a freely reversible Schiff base. Glycated proteins can undergo further reactions, giving rise to some structures called advanced glycation endproducts (AGEs) [10].
Indeed, the glycation sites of an amino acid, peptide, or protein may also be the preferred sites for protonation. The reason for this assertion is because from a chemical point of view, the interaction of the carbon atom of a reducing carbonyl compound with an amino nitrogen atom is the same kind of reaction that occurs when a proton interacts with that nitrogen atom of the amino group. For a peptide or protein with many glycation or protonation sites, there is a rank of preferred reaction sites that depends on the pH, the solvent, and the conformational structure of the molecular system. This means that that if we know the ordering of the protonation of the peptide or protein, it will be possible to discern (at least qualitatively) the pK a of the Lys residues.
In some recent interesting works [11,12], Makowska et al. studied the acidic-basic properties of alanine-based peptides containing acidic and basic side chains. In particular, they experimentally estimated the influence of the length of the alanine spacer on the pK a of the Lys residues of Ac-Lys-(Ala)n-Lys-NH2 peptides (wth n = 0, 1, 2, . . ., 5) [11].
Following the pioneering work of Parr et al. [1], a number of useful concepts have been derived from the analysis of the density of any molecular system through DFT. These concepts that allow a researcher to make qualitative predictions about the chemical reactivity of a given system can also be quantified, and are collectively known as conceptual DFT Descriptors.
Therefore, we believe that it could be of interest to apply the concepts of density functional theory to the study of those peptides in order to determine if it is possible to discern between the potential protonation sites and to obtain a qualitative estimation of the pK a differences between them.
In order to obtain quantitative values of the conceptual DFT descriptors, it is necessary to resort to the Kohn-Sham theory through calculations of the molecular density, the energy of the system, and the orbital energies-specifically those related to the frontier orbitals; that is, HOMO and LUMO [13][14][15][16][17][18].
Although the foundations of DFT have established that a universal density functional must exist and that all of the properties of the system can be obtained through calculations with this functional, in practice one must resort to some of the approximate density functionals that have been developed in the last thirty years. Because these are approximate functionals (i.e., not universal functionals), many of them are good for predicting some properties and others are good for other properties. Sometimes you can find density functionals that are excellent for describing the properties of a given molecular system with a particular functional group, but it is necessary to resort to other density functionals for a different functional group that you want to include in the molecular system under study.
When one is dealing with the study of chemical reactivity (i.e., a process that involves the transference of electrons), it is normal to perform calculations not only of the ground state, but also for open systems like the radical cation and radical anion. These systems are often difficult to converge giving trustworthy results-specially if diffuse functions must be included in the basis set [13][14][15][16][17][18]. For this reason, it is convenient to have a method that can give all information that one needs directly from the results of the calculation of the ground state of the molecular system under study. In particular, one may want to obtain the ionization potential (I) and electron affinity (A) of the system avoiding the calculation of the anion and cation radicals. Indeed, the link for this is given by the so-called Koopmans' theorem [15][16][17][18].
Koopmans' theorem is not valid within DFT. However, from an empirical and practical point of view, it is meaningful to follow the procedure of assigning KS HOMO as equal to and opposite of the vertical ionization potential, H = -I, and the KS LUMO as equal to and opposite of the vertical electron affinity, L = -A. We have coined the acronym KID ("Koopmans in DFT") for this empirical procedure.
This means that the goodness of a given density functional for the purpose of predicting the Conceptual DFT descriptors directly from the properties of the neutral molecule can be estimated by checking how well it follows the KID procedure. Thus, it will be interesting to consider several recent density functionals that have shown great accuracy across a broad spectrum of databases in chemistry and physics [19] to evaluate their performance in fulfilling this practical technique.
The objective of this work is twofold: (i) to conduct a comparative study of the performance of some density functionals from the Minnesota family for reproducing chemical reactivity descriptors of the Ala-spaced Lys-containing peptides proposed by Makowska et al. [12] within the KID formalism; and (ii) to predict the preferred protonation sites that could be give a qualitative estimation of the pK a of the Lys residues using condensed dual descriptors.

Theoretical Background
As this work is part of an ongoing project, the theoretical background is similar to that presented in previous research [20][21][22][23][24][25][26][27] and will be shown here for the sake of completeness. Within the conceptual framework of DFT [2,28], the chemical potential µ is defined as: where χ is the electronegativity, while the global hardness η is Using a finite difference approximation and the KID procedure, the above expressions can be written as: where H and L are the energies of the highest occupied and the lowest unoccupied molecular orbitals (HOMO and LUMO), respectively. The electrophilicity index ω has been defined as [29]: The Fukui function is defined in terms of the derivative of ρ(r) with respect to N [2]: The function f (r) reflects the ability of a molecular site to accept or donate electrons. High values of f (r) are related to a high reactivity at point r [2]. By applying a finite difference approximation to Equation (6), two definitions of Fukui functions depending on total electronic densities are obtained: where ρ N+1 (r), ρ N (r), and ρ N−1 (r) are the electronic densities at point r for the system with N + 1, N, and N − 1 electrons, respectively. The first one, f + (r), has been associated to reactivity for a nucleophilic attack, so it measures the intramolecular reactivity at the site r towards a nucleophilic reagent. The second one, f − (r), has been associated to reactivity for an electrophilic attack, so this function measures the intramolecular reactivity at the site r toward an electrophilic reagent [28].
Morell et al. [3,[30][31][32][33][34][35] proposed a local reactivity descriptor (LRD) which is called the dual descriptor (DD) f (2) (r) ≡ ∆ f (r). The definition of ∆ f (r) is shown as indicated by Morell et al. [30,31]: The dual descriptor can be condensed over the atomic sites: when ∆ f k > 0, the process is driven by a nucleophilic attack on atom k, and then that atom acts as an electrophilic species; conversely, when ∆ f k < 0, the process is driven by an electrophilic attack over atom k, and therefore atom k acts as a nucleophilic species.
In an interesting and very enlightening article [36], Jorge Ignacio Martínez-Araya has shown why the dual descriptor is a more accurate local reactivity descriptor than the Fukui functions. He has shown that although the Fukui function has the capability of revealing nucleophilic and electrophilic regions on a molecule, the dual descriptor is able to unambiguously expose truly nucleophilic and electrophilic regions, but along with the latter, dual descriptor is less affected by the lack of relaxation terms than the Fukui function when the frontier molecular orbital approximation is applied, and that this implies that the dual descriptor can be considered a more reliable descriptor to measure local reactivity than Fukui function. For this reason, we have chosen the DD to fulfil the objectives of our study instead of the Fukui functions individually.
In 2013, Domingo proposed the Parr functions P(r) [37,38], which are given by the following equations: P − (r) = ρ rc s (r) (for electrophilic attacks) , P + (r) = ρ ra s (r) (for nucleophilic attacks) , which are related to the atomic spin density (ASD) at the r atom of the radical cation or anion of a given molecule, respectively. The ASD over each atom of the radical cation and radical anion of the molecule gives the local nucleophilic P − k and electrophilic P + k Parr functions of the neutral molecule [6].

Settings and Computational Methods
Following the lines of our previous work [20][21][22][23][24][25][26][27], the computational studies were performed with the Gaussian 09 [39] series of programs with density functional methods as implemented in the computational package. The equilibrium geometries of the molecules were determined by means of the gradient technique. The force constants and vibrational frequencies were determined by computing analytical frequencies on the stationary points obtained after the optimization to check if there were true minima. The basis set used in this work was Def2SVP for geometry optimization and frequencies, while Def2TZVP was considered for the calculation of the electronic properties [40,41].
For the calculation of the molecular structure and properties of the studied systems, we chose six density functionals from the latest Minnesota density functionals family which consistently provide satisfactory results for several structural and thermodynamic properties [19]: M11, which is a is a range-separated hybrid meta-GGA (generalized gradient approximation) [42]; M11L, which is a dual-range local meta-GGA [43]; MN12L, which is a nonseparable local meta-NGA (nonseparable gradient approximation) [44]; MN12SX, which is a range-separated hybrid nonseparable meta-NGA [45]; N12, which is a nonseparable gradient approximation [46]; and N12SX, which is a range-separated hybrid nonseparable gradient approximation [45]. In these functionals, GGA stands for generalized gradient approximation (in which the density functional depends on the up and down spin densities and their reduced gradient) and NGA stands for nonseparable gradient approximation (in which the density functional depends on the up/down spin densities and their reduced gradient, and also adopts a nonseparable form). All the calculations were performed in the presence of water as a solvent by doing Integral Equation Formalism-Polarized Continuum Model (IEF-PCM) computations according to the solvation model density (SMD) solvation model [47].

Results and Discussion
The molecular structures of the six peptides with the general formula Ac-Lys-(Ala) n -Lys-NH2, with n = 0 to 5 were built by means of the Avogadro 1.2.0 program [48,49]. The resulting structures were pre-optimized by finding the most stable conformers through a random sampling with molecular mechanics techniques and a consideration of all the torsional angles through a random sampling with molecular mechanics techniques and a consideration of all the torsional angles through the general AMBER force field [50]. The structures of the resulting conformers were then reoptimized with the six density functionals mentioned in the previous section in conjunction with the Def2SVP basis set and the SMD solvation model, using water as a solvent. As an illustration of the optimized peptides, the geometrical conformation of Ac-Lys-(Ala) n -Lys-NH2 with n = 1 (KAK) is presented in Figure 1. The HOMO and LUMO orbital energies (in eV), vertical ionization potentials I, and electron affinities A (in eV), global electronegativity χ, total chemical hardness η, and global electrophilicity ω of the peptides calculated with the six density functionals and the Def2TZVP basis set using water as as solvent simulated with the SMD parametrization of the IEF-PCM model are presented in Tables S1A to S6A of the Electronic Supplementary Information (ESI). The upper part of the tables shows the results derived assuming the KID procedure (hence the subscript K) and the lower part shows the results derived from the calculated ∆SCF vertical energies.
With the objective of analyzing our results in order to verify the fulfillment of the KID procedure, we previously designed several descriptors that relate the results obtained through the HOMO and LUMO calculations with those obtained by means of the vertical I and A with a ∆SCF procedure. The first three descriptors are related to the simplest fulfillment of Koopmans' theorem by relating H with -I, L with -A, and the behavior of them in the description of the band gap: Next, we consider four other descriptors that analyze how well the studied density functionals are useful for the prediction of the electronegativity χ, the global hardness η, and the global electrophilicity ω, and for a combination of these conceptual DFT descriptors, considering only the energies of the HOMO and LUMO or the vertical I and A: where D1 stands for the first group of conceptual DFT descriptors. The results of the calculations of J I , J A , J HL , J χ , J η , J ω , and J D1 for the six peptides considered in this work are displayed in Tables S1B to S6B of the Electronic Supplementary Information (ESI).
Based on the results for the descriptors presented on Tables S1B to S6B of the ESI, we have compiled the average values for each density functional on the whole group of peptides, and the calculated results are displayed in Table 1. As can be seen from Table 1, the KID procedure holds with great accuracy for the MN12SX and N12SX density functionals, which are range-separated hybrid meta-NGA and range-separated hybrid NGA density functionals, respectively. Indeed, the values of J I , J A , and J HL are not exactly zero. However, the results are impressive, particularly for the N12SX density functional.
It is interesting to see that the same density functionals also fulfill the KID procedure for the other descriptors-namely J χ , J η , J ω , and J D1 . These results are very important, because they show that it is not enough to rely only on J I , J A , and J HL . For example, if we consider only J χ , for all of the density functionals studied in this work, the values are very close to zero. As for the other descriptors, only the MN12SX and N12SX density functionals show this behavior. This means that the results for J χ could be due to a fortuitous cancellation of errors.
An important fact is that although the range-separated hybrid NGA and range-separated hybrid meta-NGA density functionals can be useful for the calculation of the conceptual DFT descriptors, it is not the same for the range-separated hybrid GGA (M11) density functional. An inspection of Table S1A of the ESI shows that this is due to the fact that this functional inadequately describes the energy of the LUMO, leading to negative values of A, which are in contradiction with the ∆SCF results.
The condensed Fukui functions and condensed dual descriptor have been calculated using the AOMix molecular analysis program [51,52] starting from single-point energy calculations.
The condensed dual descriptors ∆ f k and the electrophilic Parr functions P − k (MPA) and P − k (MPA) over the N atoms of the Lys residues of the KK, KAK, KA2K, KA3K, KA4K, and KA5K peptides of the Lys residues of the KK, KAK, KA2K, KA3K, KA4K, and KA5K peptides calculated with the MN12SX and N12SX density functionals and the Def2TZVP basis set using water as as solvent simulated with the SMD parametrization of the IEF-PCM model are shown in Table 2.
As the studied peptides are end-capped, it would be expected that the chemical reactivities of the Lys nitrogen atoms will be different. In fact, the values presented in Table 2 correspond to the condensed dual descriptors and electrophilic Parr functions over the nitrogen atoms of the Lys closer to the end-terminal Ac-group (Lys1). The values of the descriptors over the nitrogen atom of the other Lys residue (Lys2) are not shown because they are almost zero in all cases. This means that the preferred glycation site will be Lys1, but it will also be the preferred protonation site. Therefore, it can be predicted that the pK a of Lys 1 will be greater than the pK a of Lys2, although it is not possible to assign a particular value to both pK a s. Moreover, although there is not a clear trend, it can be observed from the results in Table 2 for the condensed dual descriptor and electrophilic Parr functions (either from a Mulliken population analysis (MPA) or Hirshfeld population analysis (HPA)) that the value of the descriptors are larger for KA4K and KA5K than for the other peptides. Thus, it can be concluded that the effect of the alanine spacers will be that of increasing the difference between the pK a of Lys1 and and that of Lys2. This can probably be explained by considering that the increase of the number of Ala units leads to geometrical conformations of the peptides, making the terminal Lys residues be apart from each other and avoiding the competition between them for the reacting proton H + .

Conclusions
The Minnesota family of density functionals (M11, M11L, MN12L, MN12SX, N12, N12SX) has been tested for the fulfillment of the empirical KID procedure by comparison of the HOMO-and LUMO-derived values with those obtained through a ∆SCF technique. It has been shown that the range-separated hybrid meta-NGA density functional (MN12SX) and the range-separated hybrid NGA density functional (N12SX) are the best for the accomplishment of this objective. As such, they are a good alternative to those density functionals whose behavior have been tuned through a gap-fitting procedure and a good prospect for their usefulness in the description of the chemical reactivity of molecular systems of large size.
From the whole of the results presented in this work, it can be seen that the sites of interaction of the studied peptides can be predicted by using DFT-based reactivity descriptors such as the condensed dual descriptor and Parr functions calculations. These descriptors were used in the characterization and successful description of the preferred reactive sites and provide a firm explanation for the reactivity of those molecules.
Moreover, the preferred glycation and protonation sites could be identified, and helped to understand the effect of the alanine spacer on such properties as the pK a of the different Lys residues. As a result, the difference between the pK a of the Lys residue closer to the Ac-end-capped site and the pK a of the other Lys residue will be always greater than zero, and will be increased as the number of alanine spacers grows. Author Contributions: Daniel Glossman-Mitnik conceived and designed the research and headed, wrote and revised the manuscript, while Sebastián Sastre and Juan Frau contributed to the calculations and to the writing and the revision of the article. All authors read and approved the final manuscript.

Conflicts of Interest:
The authors declare no conflict of interest regarding the publication of this manuscript.

Abbreviations
The following abbreviations are used in this manuscript: