Molecular Sciences Vertical Ionization Energies of α-L-Amino Acids as a Function of Their Conformation: an Ab Initio Study

Vertical ionization energies (IE) are determined as a function of conformation at the quantum chemistry level for eighteen α-L-amino acids. Geometry optimizations of the neutrals are performed within the Density Functional Theory (DFT) framework using the hybrid B3LYP method and the 6-31G**(5d) basis set. A few comparisons are made with wave-function-based ab initio correlated methods such as MP2, QCISD or CCSD. For each amino acid, several conformations are considered that lie within 10-15 kJ/mol of the most stable one. Their IEs are calculated using the Outer-Valence Green's Function (OVGF) method at the geometry of the neutral. A few comparisons are made with MP2 and QCISD IEs. It turns out that the OVGF results are satisfactory, but an uncertainty persists as to which conformer is the most stable at the B3LYP level. Moreover, the value of the IE can depend strongly on the conformation because the ionized molecular orbitals (MO) can change considerably with the nuclear structure.


Introduction
Electron transfer (ET) processes have been a matter of great interest for several decades [1]. ET is a key process in photosynthesis [2][3][4], and peptides and proteins play an important role in the electron transport between the donor and the acceptor [4][5][6][7][8][9][10][11], with a recognized importance of the coupling between proton and electron transfer in some cases [12,13].
Small peptide cations in the gas phase have been recognized as interesting candidates for studying intramolecular ET [14][15][16][17][18][19][20] free of any solvent effect. The theoretical model presented by Weinkauf et al. [16] relies on the ionization energies of the building blocks of the peptides, i.e. the amino acids. Apparently, the ionization energies have not been determined for all of them [21][22][23][24][25][26][27] and, where they have, the influence of the conformation was not addressed, except in the recent article by Powis et al. [27] on alanine and threonine.
The principal aim of this work is to study systematically the variation of the vertical ionization energies of eighteen of the twenty natural α-L-amino acids as a function of their gas-phase conformations. Glutamine and glutamic acid were not considered because of their close resemblance to asparagine and aspartic acid. Several ionized states were considered because a further investigation of charge transfer in small peptide cations requires knowledge of the electronic states likely to be involved. An analysis of the molecular orbitals (MO) involved in each primary ionization event is presented. The influence of the calculation level used for the geometry optimization and for the determination of the ionization energies (IE) is addressed. For the one-particle propagator technique used to calculate the IEs, i.e. the outer-valence Green's function technique (OVGF) [28][29][30][31][32], the effect of using a reduced number of MOs in the expansion series was investigated.

Computational tools and conformation choice
All the calculations were done with the Gaussian 98 [33] program on an SGI Origin 3800. All the chosen conformations were fully optimized within the density functional theory (DFT) [34] and Kohn-Sham molecular orbital [35] formalisms, using the hybrid B3LYP method [36] with the 6-31G**(5d) [37,38] basis set. As far as the backbone is concerned, three conformations were considered, which differ in the relative orientation of the amine and carboxyl heads. They are presented schematically in Scheme 1.
Scheme 1. The three backbone conformations studied.
CF1 and CF2 were found in the literature to be the lowest-energy conformations, and CF3 was added to allow for a possible interaction with the side chain. Moreover, several orientations of the side chain can lead to low-energy conformations, only a few of which are considered in this study; they are labeled CF1(i), CF2(i) or CF3(i) with i = 1 or 2. They were chosen to lie within 10-15 kJ/mol (836-1254 cm−1) of the lowest one obtained. A few geometry optimizations of the neutrals were also performed at the frozen-core (FC) MP2 [39,40] or QCISD [41] levels. Finally, single-point calculations at the CCSD and CCSD(T) levels [42] were performed at the QCISD-optimized geometries of the conformations of asparagine (Asn).
The vertical ionization energies were determined with the outer-valence Green's function (OVGF) method. Its starting point is Koopmans' theorem [43], which states that the ionization energy corresponding to the removal of one electron from the i-th MO is approximately equal to the negative of the MO energy obtained in the Hartree-Fock (HF) [44] framework. OVGF improves on this description by taking into account both the MO reorganization energy and the electronic correlation, through an expansion series associated with each MO energy, i.e., with each primary ionization event. The method has been recognized as providing very satisfactory results provided that the ionized band is not related to a shake-up ionized state [45][46][47]. This was checked to be the case for the amino acid cationic states mentioned here: the OVGF pole strengths were all about 0.85 or higher. In fact, the pole strengths were all found to be above 0.90, except for Phe, Tyr and Trp, where one pole strength was about 0.85 and corresponded to the ionization of the π3 (Phe, Tyr) or π4 (Trp) MO. As was already emphasized for highly correlated unsaturated systems [46,47], the simple MO ionization picture soon becomes a poor approximation, even for low-lying excited states of the cation.
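As a minimal illustration of the two checks described above, the following sketch converts a Hartree-Fock orbital energy into a Koopmans estimate of the IE (IE_i ≈ −ε_i) and flags ionizations whose OVGF pole strength falls below the 0.85 limit quoted in the text. The function names, the orbital energy and the pole-strength values are hypothetical placeholders chosen for illustration; they are not results of this work.

```python
HARTREE_TO_EV = 27.211386  # conversion factor, hartree -> eV


def koopmans_ie_ev(orbital_energy_hartree):
    """Koopmans estimate: IE_i is approximately minus the HF orbital energy."""
    return -orbital_energy_hartree * HARTREE_TO_EV


def weak_poles(pole_strengths, threshold=0.85):
    """Return the labels of ionizations whose pole strength is below threshold."""
    return [label for label, ps in pole_strengths.items() if ps < threshold]


# Hypothetical numbers for illustration only (not taken from the tables).
epsilon_homo = -0.40  # hartree
print(f"Koopmans IE = {koopmans_ie_ev(epsilon_homo):.2f} eV")

example_poles = {"n_N": 0.91, "pi3": 0.85, "shake-up-like": 0.70}
print(weak_poles(example_poles))
```

A pole strength well below the threshold would indicate that the one-electron ionization picture is breaking down for that state, in which case the OVGF value should not be trusted.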

Influence of the electronic correlation description level
It has long been recognized [48] that the HF level is not satisfactory for a quantitative account of the relative stability of the different conformations of the amino acids. This is because the electronic correlation is not the same for all the conformations, as was also emphasized for dipeptides [49]. In the literature, the quality of B3LYP has often been assessed against other correlated levels [50][51][52][53][54][55][56][57]. The B3LYP level was found unsuited to the study of transition metal complexes [50,51] or diradical species [52] but gave very satisfactory bond distances in metallocenes [55]. In some cases [57], B3LYP and MP2 gave very similar results that were nevertheless rather different from those obtained at the CCSD level or experimentally. In other cases [53][54][55][56], B3LYP provided satisfactory results compared with the MP2, QCISD and CCSD ones. Thus, because of the probable influence of the conformation on the vertical ionization potentials, it was decided to check the nature and number of "low" energy conformations as a function of the calculation level.
Two and four conformations (see Figure 1) were considered for alanine (Ala) and asparagine (Asn), respectively, and their geometries were also optimized at the MP2(FC) and QCISD(FC) levels. For Asn, coupled cluster (CC) calculations at the CCSD and CCSD(T) levels were also performed at the QCISD geometries. Moreover, for arginine (Arg), lysine (Lys), isoleucine (Ile), tyrosine (Tyr) and tryptophan (Trp), three to five conformations were also optimized at the MP2(FC) level. The relative energies are presented in Table 1. The MP2, QCISD and CC results are very similar, agreeing to within about 2 kJ/mol or less, but the B3LYP values can differ from the MP2 ones by as much as 10-15 kJ/mol, in some cases even inverting the relative stabilities. Thus, in the following, the chosen conformations, which were optimized at the B3LYP level, lie within 10-15 kJ/mol of the lowest-energy one, so as to be sure of including the main conformations.
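The selection criterion described above can be sketched as follows: total energies are converted into energies relative to the lowest conformer, and only the conformers inside the chosen window are kept. This is an illustrative sketch only; the function names and the total energies are hypothetical and do not come from Table 1.

```python
HARTREE_TO_KJ_MOL = 2625.4996  # hartree -> kJ/mol
KJ_MOL_TO_CM1 = 83.5935        # kJ/mol -> cm^-1


def relative_energies_kj(total_energies_hartree):
    """Energies relative to the lowest conformer, in kJ/mol."""
    e_min = min(total_energies_hartree.values())
    return {name: (e - e_min) * HARTREE_TO_KJ_MOL
            for name, e in total_energies_hartree.items()}


def within_window(rel_energies_kj, window_kj=15.0):
    """Names of the conformers lying at most window_kj above the lowest one."""
    return sorted(n for n, de in rel_energies_kj.items() if de <= window_kj)


# Hypothetical total energies (hartree), for illustration only.
energies = {"CF1": -323.7000, "CF2": -323.6985, "CF3": -323.6900}
rel = relative_energies_kj(energies)
for name in within_window(rel):
    print(f"{name}: {rel[name]:.1f} kJ/mol = {rel[name] * KJ_MOL_TO_CM1:.0f} cm^-1")
```

Because B3LYP and MP2 relative energies can differ by 10-15 kJ/mol, a window of that width is the natural choice for `window_kj` when the selection is made at the B3LYP level.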

Table 1:
Relative energies (kJ/mol) of a few amino acid neutral conformations at three calculation levels: B3LYP, QCISD(FC) and MP2(FC). All the geometries are optimized at these respective levels. The basis set used throughout is 6-31G**. Only four results concern the CCSD//QCISD and CCSD(T)//QCISD levels, for Asn. The lowest energy conformation is the reference (∆E = 0.0) in each case. ND = not determined.

Choice of the virtual orbital space in the OVGF calculation
Since the ionization potential determination in the OVGF framework is based on an expansion involving the occupied and virtual orbitals, the size of the calculation rapidly becomes prohibitive as the system grows. The virtual orbital space used to generate the expansion elements was therefore reduced. In order to validate the final choice of the orbital range, a few calculations were performed for the two conformations of glycine (Gly) and Ala as well as for phenol and formamide, which were considered for comparison with Tyr and Asn, respectively. All the valence occupied orbitals were always taken into account. The results are presented in Table 2, compared with the IEs obtained from Koopmans' theorem. The smallest orbital range corresponds to a number of virtual orbitals equal to the number of occupied valence orbitals. It appears that the results vary by about 0.3 eV at most between the smallest range and the FC level, which is smaller than the difference between the MP2 and QCISD results. Compared with the QCISD values, the MP2 IEs are often overestimated by 0.2-0.7 eV, while the OVGF values are either overestimated or underestimated. Moreover, the OVGF IE values calculated with the smaller MO range compare reasonably well with the QCISD results, i.e. to within 0.2-0.3 eV, except for IE2(Ala-CF1), IE3(phenol) and IE2(formamide), which correspond respectively to the ionization of the (n O -n N +..), π3 and (n O +... ) MOs. The discrepancy in these latter cases seems to be related to the reorganization energy of the cation MO, but the reason is not apparent.
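To make the smallest orbital range concrete, the sketch below counts the orbitals of a closed-shell neutral molecule and returns the first and last MO indices of a window containing all valence occupied MOs plus an equal number of virtual MOs. The function name and the core-orbital convention (1s frozen for C, N and O; 1s, 2s and 2p frozen for S) are assumptions made for this illustration, not a description of the program input actually used.

```python
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "S": 16}
# Frozen-core orbitals per atom (assumed convention): 1s for C, N, O;
# 1s, 2s and the three 2p orbitals for S.
CORE_ORBITALS = {"H": 0, "C": 1, "N": 1, "O": 1, "S": 5}


def ovgf_window(formula):
    """Return (first_mo, last_mo), 1-based, for a closed-shell neutral molecule.

    The window holds all valence occupied MOs plus an equal number of
    virtual MOs, i.e. the smallest orbital range discussed in the text.
    """
    n_electrons = sum(ATOMIC_NUMBER[el] * n for el, n in formula.items())
    if n_electrons % 2:
        raise ValueError("closed-shell molecule expected")
    n_occupied = n_electrons // 2
    n_core = sum(CORE_ORBITALS[el] * n for el, n in formula.items())
    n_valence = n_occupied - n_core
    return n_core + 1, n_occupied + n_valence


glycine = {"C": 2, "H": 5, "N": 1, "O": 2}  # C2H5NO2, 40 electrons
print(ovgf_window(glycine))  # (6, 35)
```

For glycine this gives 20 occupied MOs, 5 of them core, so the window runs from MO 6 to MO 35 (15 valence occupied plus 15 virtual orbitals).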

Labeling and description of some MOs involved in the ionization
The atom suffixes follow the usual rules for amino acid atoms in peptides, as illustrated in Scheme 2.
Scheme 2: Amino acid atom labeling.
Usually, the MOs are delocalized over a large part of the molecule. Nevertheless, they can become very localized, with a very specific lone-pair, π or σ character. In the lone-pair case, the lone-pair part of the MOs will be labeled n X . In the following, some specific denominations are illustrated.

The carboxylic head
In order to distinguish the two oxygen atoms of a carboxylic head, the oxygen of C=O will simply be labeled O, with its suffix where it belongs to the side chain, and the oxygen of the O-H group will be labeled O h . The usual contributions of these oxygens to the MOs are drawn schematically in Figure 2. Taking the plane defined by COOH as reference, the perpendicular contributions will be suffixed by a p (e.g. n O,p ) and the in-plane contributions will not be suffixed. As far as the π-like MOs are concerned, the highest-energy one is shown schematically in Figure 3 and consists of an antibonding combination of a π density on C=O (π(C=O)) and an n Oh,p lone pair, labeled [π(C=O) -n Oh,p ].

The amide or peptide link
The two π-like MOs that will appear in the following are presented in Figure 4.

The arginine side chain
The two highest π-like MOs for the arginine side chain are shown in Figure 5 and have some lone-pair character.
The largest differences between our results and those in the literature concern Ser [66], Cys [66], Phe [74], and Trp [75], for which the literature values were obtained at the MP2 level. The same trends as with our own MP2, QCISD or CCSD results appear, i.e. B3LYP seems to overestimate the relative stability of CF2. Two more observations from Table 3 should be emphasized. The first concerns His "N ε2", for which no CF1-like low-energy conformation was found. The second is related to the CF3-like conformations. A low-lying CF3 conformation was found only for amino acids with a long polar chain, i.e., Asn, Asp, His, Arg and Lys. By extrapolation, one should also be obtained for Glu and Gln. These two observations highlight the importance of intramolecular interactions in the relative stability of the amino acid conformers.

Variation of the IE as a function of the conformations
For these low-energy conformations, the OVGF-derived IEs are gathered in Table 4, with a brief description of the related ionized MOs obtained at the RHF level. This description is based on the visualization of the MOs at a 0.05 a.u. contour level and lists their features, i.e. lone pairs and σ or π bond densities, in order of decreasing importance. It appears that the MOs can be either largely delocalized or more or less localized. Localization is nearly systematic for the [π(C=O)−n Oh,p ] or P O-N (Asn) MOs as well as for the π MOs (Phe, Tyr, Trp, His, Arg). Moreover, in some specific cases, the MO can localize on a few σ contributions and only one or two heteroatom lone-pair contributions, as found for MO27(Ser-CF1), MO24(Ser-CF2(2)), MO40(Lys-CF1 and CF2(2)), MO35(Asn-CF1(2)), MO31(Thr-CF1(2)), MO32(Cys), MO40(Met), MO46(Tyr-CF1) and MO51(Trp-CF1). There are also cases where the only atoms that bear the MO density are heteroatoms, meaning that their lone pairs all contribute. This is found in MO18(Gly-CF2), MO34(Asn-CF3), MO25(Ser-CF2(2)), MO29(Thr-CF2(1)), MO29(S-Thr-CF2) and MO45(Tyr-CF1). In most other cases, the MO densities tend to be shared by many atoms of the molecule. The conformation can have a dramatic effect on the order of the ionized MOs, and thus on the IEs, as well as on their shapes. The range of the IEs for the [π(C=O)−n Oh,p ] MO is [11.5-12.7 eV] for the CF1 conformations and [10.5-11.6 eV] for the CF2 or CF3 ones, i.e. the IEs are about 1 eV higher for CF1. Similarly large variations of the IEs as a function of the conformation are found for P Oδ-Nδ (Asn), [π(C γ =O δ )−n Ohδ,p ](Asp), n Sγ (Cys), n Sδ,p (Met) and n Nζ (Lys), for instance. As to the shape of the ionized MO, it is obvious from Table 4 that it varies considerably with the conformation.

The relation between the MO shapes at the RHF level and the atomic spin densities (ASD) resulting from their ionization, calculated at the UHF level, is shown in Table 5 for a few species. Because a further study of charge transfer in small peptide cations requires knowledge of the electronic states likely to be involved, it is necessary to have a good idea of the energetic position of the different electronic states of the building blocks, i.e. the amino acid cations. Several attempts to determine the wave function and energy of the low-lying electronic states of the cations were made at the UHF level. In principle, the variational principle, which underlies the HF framework, yields the lowest-energy wave function and so prevents excited-state wave functions from being obtained: the calculation collapses to the lowest state, which is referred to as variational collapse. Nevertheless, if the excited-state wave function is sufficiently different from the ground-state one, it is possible to obtain it provided that the convergence threshold on the density is not too severe. In the present calculations, this threshold (Th(SCF)) is fixed at 10^-6 au, except for Phe, where it was loosened to 10^-5 au. For several other cases, it was not possible to obtain all the excited states sought. In Gly-CF1, only two of the three attempts were successful: the ionization of the [π(C=O)-n Oh,p ] MO did not lead to a stable UHF π state. Variational collapses also occur for His "N δ1", where only four or three stable UHF cations were obtained out of six attempts. A good correlation exists between the MO shapes and the ASD, with the restriction that the ASD are often less delocalized than the RHF MOs. Let us point out the specific patterns of alternating ASD on ring systems for the π ionizations (π1 and π2 for His and for Phe) and on O=C-O h in general.
Table 5: Relation between the shape of the ionized MOs at the RHF level and the atomic spin densities (ASD) of the corresponding cations at the UHF level, when obtained. Only nuclei for which the absolute value of the ASD is larger than 0.1 are listed. For Ala, the ASD are calculated at the UQCISD level. The SCF convergence threshold (Th(SCF)) was 10

Thermodynamical relative stability
The actual number of conformations observed experimentally has been discussed for only a few amino acids. In the case of Gly and Ala, though the two lowest calculated conformations lie in an energy range that should make them both observable, only one was initially observed [24][25][26]. Godfrey et al. [64] emphasized that a rapid conversion from CF2 to CF1 was very probable owing to the large vibrational amplitudes of the low-frequency modes. More recently, however, Powis et al. [27] showed, both experimentally and theoretically, that Ala could be observed in the two low-lying conformations. Moreover, the zero-point energy and the thermal corrections to the energetic and entropic terms can also increase or decrease the separation between the conformers, as can be seen from the results in Table 6. For some conformers, the zero-point energy (ZPE) and the thermal corrections to the energy, δE(T), and to the entropy, ∆S(T), were calculated from the computed frequencies according to the formulae of statistical thermodynamics [77]. This makes it possible to determine the relative Gibbs free energy of the conformers, ∆G(T) = ∆E + δZPE + δE(T) − T ∆S(T). The ∆E considered were those at the QCISD level for Asn and at the MP2 level for the other systems. However, the δZPE and thermal contributions were calculated with the frequencies obtained at the B3LYP level. For Asn, taking the ZPE or the thermal corrections into account increases the relative stability of CF1(2), while the separation between CF2 and CF3 is decreased in the case of Arg. Now let us discuss the point of this article on the basis of one specific example: Arg with its two conformations CF2 and CF3, which are nearly equiprobable according to the Table 6 results. From Table 4, CF2 would present four vertical ionization bands under 11 eV because the three highest IEs are nearly degenerate, though corresponding to very different ionized MOs. In contrast, CF3 would show five bands under 11 eV, the first two being also degenerate and corresponding to either a π-type MO or a combination of heteroatomic lone pairs. Such degeneracies (∆IE < or ≈ 0.05 eV) between very different states also occur for Ala-CF2, Leu-CF2, Cys-CF2, Met-CF2(2), His "N ε2"-CF2(2), Phe-CF1 and Tyr-CF2(1).
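The free-energy expression ∆G(T) = ∆E + δZPE + δE(T) − T ∆S(T) and its consequence for the relative abundance of two conformers can be sketched as follows. The Boltzmann weighting is a standard assumption added here for illustration, and the function names and all numerical contributions are hypothetical; they are not the values of Table 6.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol K)


def delta_g(delta_e, delta_zpe, delta_e_thermal, delta_s, temperature=298.0):
    """Relative Gibbs free energy in kJ/mol; delta_s is in kJ/(mol K)."""
    return delta_e + delta_zpe + delta_e_thermal - temperature * delta_s


def boltzmann_populations(delta_g_kj, temperature=298.0):
    """Normalized Boltzmann weights from relative free energies (kJ/mol)."""
    weights = {n: math.exp(-g / (R_KJ * temperature))
               for n, g in delta_g_kj.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


# Hypothetical contributions, in kJ/mol and kJ/(mol K); not values from Table 6.
dg = {
    "CF2": delta_g(0.0, 0.0, 0.0, 0.0),
    "CF3": delta_g(1.0, -0.5, 0.2, 0.001),
}
populations = boltzmann_populations(dg)
for name, p in populations.items():
    print(f"{name}: dG = {dg[name]:.2f} kJ/mol, population = {p:.2f}")
```

With a free-energy difference of a few tenths of a kJ/mol, as in this made-up example, the two conformers come out nearly equiprobable at 298 K, which is the situation described for Arg.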

Conclusions
The geometry optimization results for the neutral conformations at different calculation levels cast some doubt on the adequacy of B3LYP when a quantitative analysis of the relative stabilities is needed. While the MP2 and QCISD or CC results lie within a 2 kJ/mol range, the B3LYP ones can differ by as much as 10 kJ/mol and even lead to a reversal of the relative stabilities. Nevertheless, the MP2 and QCISD methods require much more computing time and disk space than B3LYP, which will therefore be preferred for qualitative analysis.
The determination of the ionization energies with the OVGF method truncated in the virtual MO space seems to provide reasonably good results compared with the QCISD ones, with the restriction that, for reasons that remain unclear, a larger error can occur owing to an underestimation of the reorganization energy.
The present work emphasizes that the nature and the numbering of the ionized MOs, as well as the values of the related ionization energies, can vary significantly as a function of the conformation. Knowledge of the vertical ionization energies could provide patterns for the ionization spectra and help to identify which conformations of the amino acids are present in measurable amounts in the gas phase. In the case of the aromatic amino acids (Phe, Tyr, Trp, His, Arg), the lowest IE is obviously related to a π MO ionization. In the other cases, the lowest IE is most often related to the ionization of the NH 2 nitrogen lone pair, except in the case of Lys(CF1,CF2), Asn(CF1(1)), Cys, Met, Gly(CF2) and Ala(CF2). In the last two cases, this is clearly due to a conformational influence on the cationic states, since the side chain, where there is one, does not contain any easily ionizable group.

Figure 2: Labeling of the MO lone pair types on the carboxylic oxygens.

Table 2 :
Table 2 also includes vertical IEs obtained at the QCISD level for Ala, phenol and formamide and at the MP2 level for Ala, Asn, Tyr, Trp, phenol and formamide. The improvement brought by OVGF over the Koopmans' theorem IEs is obvious.

Table 3 :
Relative energies (kJ/mol) for the chosen conformers optimized at the B3LYP level. The corresponding values taken from the most recent and/or highest-level calculations in the literature are given in {}.

Table 4 :
Vertical ionization energies IE (eV) calculated with the OVGF(w) approximation for eighteen α-L-amino acids in several low-lying conformations. The MO window used in the calculation contains all the occupied valence MOs and an equal number of virtual ones (see text). The description of the ionized MOs relies on an RHF calculation and lists the main features of each MO in decreasing order. Abbr.: σ's(sdc) = σ bonds on several C-C and C-H of the side chain
n O , n Oh .

Table 6 :
Relative stabilities of some low-lying conformations for Asn, Arg, Lys and Tyr, expressed by the internal energy difference ∆E at the QCISD or MP2 level, by ∆E incremented with the zero-point energy difference, ∆E+δZPE, and by the Gibbs free energy ∆G calculated at 298 K. All energies are in kJ/mol.