The Topological Analysis of the ELFx Localization Function: Quantitative Prediction of Hydrogen Bonds in the Guanine–Cytosine Pair

In this contribution, we recall and test a new methodology designed to identify the favorable reaction pathway between two reactants. Applied to the formation of the DNA guanine (G) –cytosine (C) pair, we successfully predict the best orientation between the base pairs held together by hydrogen bonds and leading to the formation of the typical Watson Crick structure of the GC pair. Beyond the global minimum, some local stationary points of the targeted pair are also clearly identified.


Introduction
Among numerous ideas published by Linus Pauling, he proposed with Robert B. Corey in 1953 a pioneering triple DNA helix structure with the bases on the outside [1].
Although this Pauling's structure soon turned out to be false, this work has paved the way for the discovery of DNA's double-helix structure [2]. In recent decades, DFT quantum chemical studies of the Watson Crick base pairs investigated the geometry, the energy and other typical properties of the hydrogen bonds (HB) that hold together adeninethymine (AT) and guanine-cytosine (GC) pairs [3][4][5][6][7][8]. Note that L. Pauling and R. B. Corey have already highlighted the role of hydrogen bonding in proteins [9]. The main concern assessed in this work is related to the validity of the molecular orbital point of view regarding the geometries of these base pair systems where hydrogen bonds play a crucial role. Indeed, it has been shown that the stability of the Watson Crick base pairs is related to a charge-transfer due to donor/acceptor orbital interactions (oxygen and nitrogen lone pairs, N-H σ* character) [10]. For example, consider the well-known most stable pair structure guanine (G)-cytosine (C), as depicted in Figure 1 [7,[10][11][12].
In this article, we tackle the possibility to find the guanine (G)-cytosine (C) pair geometry simply by looking at the orientation of the donor and acceptor domains of the bases. In this article, we tackle the possibility to find the guanine (G)-cytosine (C) pair geometry simply by looking at the orientation of the donor and acceptor domains of the bases.

Electron Localization Function for Chemical Reactivity
Nowadays, the topological analysis of the electron localization function (ELF) is a well-established tool to describe both covalent and non-covalent interactions [13][14][15][16][17][18][19][20]. However, the tricky question is to determine the most favorable relative orientations between the ELF topological domains of two reactants and, thus, to identify the preferred pathways when both molecules approach each other remains a tremendous challenge. Intuitively, it is established that favorable chemical reactions happen when electron acceptor and electron donor domains are adequately oriented. Recently, we have proposed a methodology designed to identify the favorable orientations between two reactants [21]. In this work, the topological domains are the ones of the modified ELF, termed ELFx [22] defined from ELF as follows: and ELF x (r) = 1 The kernel χ(r) of ELF being defined as: where c F = 3 is the Fermi constant, (r) is the positive definite kinetic energy density and (r) N is the total electron density of a molecular system with N electrons. x(r) is a normalized dimensionless quantity that can be expressed from the field of the frontier molecular orbitals [23].
(r) N+1 is the total electron density of the molecular system with N + 1 electrons with the same geometry and the same orbitals that are obtained for the system with N elec-

Electron Localization Function for Chemical Reactivity
Nowadays, the topological analysis of the electron localization function (ELF) is a wellestablished tool to describe both covalent and non-covalent interactions [13][14][15][16][17][18][19][20]. However, the tricky question is to determine the most favorable relative orientations between the ELF topological domains of two reactants and, thus, to identify the preferred pathways when both molecules approach each other remains a tremendous challenge. Intuitively, it is established that favorable chemical reactions happen when electron acceptor and electron donor domains are adequately oriented. Recently, we have proposed a methodology designed to identify the favorable orientations between two reactants [21]. In this work, the topological domains are the ones of the modified ELF, termed ELF x [22] defined from ELF as follows: and ELF x (r) = 1 The kernel χ(r) of ELF being defined as: where c F = 3 10 3π 2 2 3 is the Fermi constant, τ N (r) is the positive definite kinetic energy density and ρ(r) N is the total electron density of a molecular system with N electrons. x(r) is a normalized dimensionless quantity that can be expressed from the field of the frontier molecular orbitals [23].
ρ(r) N+1 is the total electron density of the molecular system with N + 1 electrons with the same geometry and the same orbitals that are obtained for the system with N electrons.
The ELF x localization domains are well suited for describing the chemical reactivity between donors and acceptors because they match with the electrophilic and the nucleophilic regions which are spread out over the molecular space. This is illustrated in Figure 2, which represents the ELF x domains of guanine and cytosine.
trons. In this latter case where x(r) = , (r) N+1 (and (r)) are also consistently used for the calculation of the ELF kernel. The ELFx localization domains are well suited for describing the chemical reactivity between donors and acceptors because they match with the electrophilic and the nucleophilic regions which are spread out over the molecular space. This is illustrated in Figure 2, which represents the ELFx domains of guanine and cytosine. The ELFx topological analysis of the guanine and cytosine molecules obtained in their isolated states yields, respectively, to valence basins accounting for electrophilic basins (red domains) and several nucleophilic basins (blue domains). The outside domains around hydrogen atoms appear as electrophilic while domains around nitrogen atoms as well as the oxygen lone-pairs clearly have a nucleophilic character.

Coulomb Intermolecular Interaction Energy
The total energy of a molecule or a complex can be split within the framework of the interacting quantum atoms (IQA) [24,25]. The IQA coulomb contribution between two molecules MA et MB, here termed E MA-MB Coul , reads: | being the distance between an electron in the domain ΩA and an electron in the domain ΩB, respectively. RA and RB are the nuclear locations of atoms A and B belonging to Ω A and Ω B domains with charges ZA and ZB. When MA and MB are located far from each other, we assume that E MA-MB Coul accounts for a large fraction of the total interaction energy [21]. The ELF x topological analysis of the guanine and cytosine molecules obtained in their isolated states yields, respectively, to valence basins accounting for electrophilic basins (red domains) and several nucleophilic basins (blue domains). The outside domains around hydrogen atoms appear as electrophilic while domains around nitrogen atoms as well as the oxygen lone-pairs clearly have a nucleophilic character.

Coulomb Intermolecular Interaction Energy
The total energy of a molecule or a complex can be split within the framework of the interacting quantum atoms (IQA) [24,25]. The IQA coulomb contribution between two molecules MA et MB, here termed E Coul MA−MB , reads: |r A − r B | being the distance between an electron in the domain Ω A and an electron in the domain Ω B , respectively. R A and R B are the nuclear locations of atoms A and B belonging to Ω A and Ω B domains with charges Z A and Z B . When MA and MB are located far from each other, we assume that E Coul MA−MB accounts for a large fraction of the total interaction energy [21].

Electron Transfer
The coulomb energy stabilization between an electron donor (MA) and an electron acceptor (MB) can be evaluated by the first-order variation of E Coul MA−MB expressed in terms of the response to changes in the number of electrons ∆N A or ∆N B where the external potential remains unchanged: The total variation ∆N = ∆N A + ∆N B = 0 because the total system is isolated. E MA/MB dual is negative when the electron transfer goes spontaneously from MA (nucleophile) to MB (electrophile). After some developments previously detailed elsewhere [21], we obtain: where f(r A ) and f(r B ) are the Fukui functions [23] typically associated with reactive nucleophilic or electrophilic sites of the reactants. The choice of the condensation scheme remains arbitrary as far as that of an electron domain or the definition of an atom in a molecule remains arbitrary. Here, we can clearly dissociate the MA and MB domains where electrophilic and the nucleophilic regions are spread out over their respective molecular space. This typically matches with the topological partition of the electron localization function ELF x .

Practical Interactions Model
Equation (4) is exact but can be computationally expensive. In practice, it can be numerically evaluated by means of a multipole expansion (ME) [26]. We use only the first terms of the ME (that is only the monopoles). For the general case in which both molecules MA and MB exhibit some donor and acceptor sites, the monopoles' development leads to the compact equation: in which N ΩNu/El are the populations of nucleophile/electrophile domains, respectively, obtained from the usual condensation of the HOMO/LUMO density computed over the ELF x basin volumes [27].
r Ω A and r Ω B are the locations of basin attractors belonging to Ω A and Ω B domains, respectively.

Results and Discussion
We explored the conformational space of the base pair GC WC formation using the Equation (7) with the algorithm previously outlined elsewhere [21]. All relative rotation angles (θ, ϕ) of the center of mass of C around the center of mass of G have been tested, with the distance between the centers of mass of C and G being frozen to 8 Å. Note that the corresponding optimized distance between the centers of mass was found close to 6 Å at the M06-2X/6-311++G(3df, 2pd) level of theory. For a given (θ, ϕ) couple, the process selects the best orientation of C associated to the lowest value of E dual . Figure 3 displays the obtained map E dual (θ, ϕ) together with the corresponding map of the DFT intermolecular interaction energy E 0 int (θ, ϕ) computed from the relevant isolated cytosine and guanine.
It is worth noting the good mapping of E dual and the DFT intermolecular interaction energy E 0 int (θ, ϕ). Indeed, the locations of critical points of E dual , notably the location of the global minimum (termed (A) on Figure 3), are in agreement with the DFT intermolecular interaction energy surface. We noted that the structure associated to the global minimum corresponds to the well-known orientation between C and G leading to the natural Watson Crick base pair GC WC structure where three typical HNH···O=C/NH···N/C=O···HNH intermolecular hydrogen bonds are observed. Two other stationary points (denoted by (B) and (C) on Figure 3) corresponding to already identified pair structures are also found on the E dual map [7]. These latter structures highlight typical HNH···O and CH···N donor/acceptor interactions. The presence of structures (A), (B) and (C) are confirmed on the DFT interaction energy surface.
Further analysis of the E dual conformational space obtained for each given (θ, ϕ) couple (not only for the best orientation of C in front of G) has led us to find other local stationary points. Some of them have been previously identified in the literature [7,8]. For example, the geometry of the second lowest minimum is displayed in Figure 4b: this pair appears clearly stabilized by two symmetrically NH···O=C hydrogen bonds.
Thus, in spite of numerous approximations used in this work, we show that the DFT energetic properties as well as the structural parameters of some identified pairs can be reasonably reproduced from our methodology. Further analysis of the Edual conformational space obtained for each given (θ, φ) couple (not only for the best orientation of C in front of G) has led us to find other local stationary points. Some of them have been previously identified in the literature [7,8]. For example, the geometry of the second lowest minimum is displayed in Figure 4b: this pair appears clearly stabilized by two symmetrically NH•••O=C hydrogen bonds. Thus, in spite of numerous approximations used in this work, we show that the DFT energetic properties as well as the structural parameters of some identified pairs can be reasonably reproduced from our methodology.

Conclusions
The information obtained from the domains of ELF x function and their populations has been used to propose an empirical model coulomb stabilization energy between electron donor and electron acceptor domains. Our methodology was able to predict the best orientations between the cytosine and the guanine leading to the formation of base pair structures. We unveil a noticeable mimicking of E dual onto the DFT intermolecular interaction energy E 0 int . In particular, we show that the global minimum, easily identified on the E dual energy surface, corresponds to the well-known Watson Crick structure for the base pair GC WC in which the guanine and the cytosine molecules are held together by three hydrogen bonds (see Figures 1 and 3). Some local stationary points of the GC pairs have been also identified.