1. Introduction
Despite its undeniable success in the description of chemical structure and reactivity, the language of chemistry is largely grounded in concepts with loose theoretical foundations. Concepts such as bonds, atomic charges, orbitals, hybridization, resonance, delocalization, hyperconjugation and many others cannot be directly measured in physical experiments or, to put it more precisely in the jargon of quantum mechanics, they are not observables.
The proposal of G. N. Lewis [1] in the early 1920s to explain all the previous work on molecular structure was recognized from the very beginning as an outstanding contribution to the field. Because of the weak arguments that supported his model, his work was promptly followed by attempts to overcome its theoretical shortcomings by providing a more rigorous foundation based on the recently discovered quantum mechanics. In these attempts, many concepts were proposed to underpin Lewis's ideas by pioneers such as Coulson, Pauling, Hückel and Mulliken, to mention just some of the most conspicuous figures. Their early efforts to accommodate the complex and sometimes elusive chemical facts within the framework provided by the newly emerging physics were driven in part by the computational capabilities available at the time, which were very limited.
Thus, what began as an admirable exercise of imagination has led to a burden of new concepts in an attempt to grasp the growing knowledge of the complexity of the chemical world. The effect of this multiplication of concepts has been keenly illustrated by P. Politzer for a concept as well established in chemistry as hydrogen bonding, for which numerous interpretations have been given depending on the circumstances and the authors’ preferences. As Politzer humorously puts it, hydrogen bonding “has been dissected into classical and nonclassical, proper and improper (immoral?), blue-shifted and red-shifted, dihydrogen, anti-hydrogen (rebellious?), resonance-assisted, polarization-assisted, and more” [2].
Without wishing to understate the usefulness and achievements of the somewhat imprecise concepts on which chemical language is currently built, it is worth recalling that there are two physically well grounded observables that are most useful in describing the realm of chemistry: the electron density and the electrostatic potential.
In the Born–Oppenheimer picture of molecular structure [3], the molecular electron density (MED) can be regarded as a mathematical description of the electron cloud surrounding the nuclei in the molecule. As it is a function defined in 3D space, it can be displayed and easily connected with intuition, and faithful approximations can be obtained with current computational methods at a low or moderate cost, depending on the system’s size. Although the importance of MED was realized very early, it long remained an auxiliary quantity for computing other properties, such as atom and bond charges [4], or for reinforcing the analysis based on the wave function and its related concepts in the Linear Combination of Atomic Orbitals (LCAO) framework, such as orbitals, hybridization and the like. The recognition of its usefulness as a powerful tool for the analysis of molecular structure had to wait for R. Bader’s work [5,6], which shifted the focus by treating MED as a central property for chemical interpretation, as an alternative to the well established orbital analysis. Since then, the topological analysis of MED has steadily gained popularity among theoretical chemists, and it has become a well established tool that is customarily used in the interpretation of chemical structure and reactivity.
The case of the molecular electrostatic potential (MESP) is somewhat comparable to that of MED, with the added difficulty of the cost of its faithful computation in large systems. Early works on the application of MESP to chemical problems [7,8,9] highlighted its possibilities for the interpretation of reactivity. The analysis of MESP on molecular surfaces and the exploration of the relationships between its values and thermochemical properties by Politzer and Murray [10,11,12] have proved to be useful tools for the systematization and prediction of such properties. Besides, the introduction of the concepts of σ-holes and π-holes by these authors has been very valuable in the interpretation of molecular interactions between closed-shell systems [13,14]. Furthermore, the outstanding contributions of S. Gadre et al. [15,16,17,18,19,20,21] on the topological analysis of MESP have added new insight into the structure and properties of molecules and molecular aggregates. However, although the analysis of MESP topology is making inroads in current research, its application to large systems is hindered by the cost of computing accurate values of MESP and its derivatives. Thus, the introduction of more efficient procedures, such as the one proposed herein, has greatly facilitated this type of study, and several works using the new algorithms have appeared in the recent literature [22,23,24,25].
In this work, we report details of the algorithms used for the efficient calculation of MESP in large systems, based on the Deformed Atoms in Molecules (DAM) partition/expansion [26] of MED. This procedure allows us to split the problem into two separate steps, one dealing with the MED partition/expansion itself and the other with the actual computation of MESP [27]. As will be shown, the cost of the first step is independent of the number of MESP computation points, and the cost of the second is independent of the number of basis functions in the calculation. This separation is crucial for the high performance of the method.
The article is organized as follows. In the next section, the theoretical foundations of the method are outlined; only essential equations are given, and the remaining ones are collected in the supplementary file (SF) accompanying this work. The third section is devoted to the description of the algorithms’ implementation and is split into two subsections collecting the algorithms for the two steps of the procedure. Again, subordinate equations and technical details are addressed in SF. Several results on performance and accuracy are presented in the fourth section, and, finally, the main conclusions are drawn from these results.
2. Method
The algorithm for the efficient evaluation of MESP reported here is based on the DAM [26] partition of the electron density. In this partition, the molecular density is expressed as a sum of atomic contributions:

$$\rho(\mathbf{r}) = \sum_{A} \rho_A(\mathbf{r}) \qquad (1)$$
where the densities of the (pseudo)atomic fragments are obtained with a least-deformation criterion based on the fast convergence of the long-range multipole expansion of the electrostatic potential [28]. In practice, in the LCAO framework in which DAM is formulated, this fast convergence is achieved by assigning the one-center distributions, i.e., products of pairs of basis functions centered at the same point, to the atom A on which they are centered, and by partitioning the two-center distributions, i.e., products of basis functions centered at two different points, between their respective centers:
In this work, we deal with basis functions consisting of contracted Gaussian-type orbitals (CGTO), which are the most widely used in molecular calculations.
Contracted Gaussian functions are linear combinations of primitive Gaussian functions:
where
is the number of primitive functions (length of the contraction), index
i runs over the primitive functions in the contraction, and the expansion coefficients,
, are chosen so that the radial part of the contraction remains normalized. Primitive Gaussians, in turn, are defined as:
and the same angular part is taken for all the primitives in a given contraction. Thus, the contracted functions can be written as:
where
and
are radial and angular normalization constants defined by:
and
are unnormalized real spherical harmonics given by:
where
are the corresponding associated Legendre functions (see Equation 8.751.1 in [29]).
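The normalization condition on the contraction coefficients can be illustrated with a short Python sketch. It assumes that the radial part of a contracted function has the form r^l Σ_i c_i exp(−ζ_i r²) and that "normalized" means ∫_0^∞ r² R(r)² dr = 1; the function names and sample exponents are hypothetical, and the exact normalization constants used in this work may be defined differently.

```python
import math

def primitive_overlap(l: int, zeta_i: float, zeta_j: float) -> float:
    """Radial overlap int_0^inf r^(2l+2) exp(-(zeta_i + zeta_j) r^2) dr (closed form)."""
    return math.gamma(l + 1.5) / (2.0 * (zeta_i + zeta_j) ** (l + 1.5))

def normalize_contraction(l, exponents, coefficients):
    """Rescale the coefficients so that r^l * sum_i c_i exp(-zeta_i r^2) is normalized."""
    norm2 = 0.0
    for zi, ci in zip(exponents, coefficients):
        for zj, cj in zip(exponents, coefficients):
            norm2 += ci * cj * primitive_overlap(l, zi, zj)
    scale = 1.0 / math.sqrt(norm2)
    return [c * scale for c in coefficients]

def radial_value(r, l, exponents, coefficients):
    """Evaluate the radial part r^l * sum_i c_i exp(-zeta_i r^2) at distance r."""
    return r**l * sum(c * math.exp(-z * r * r) for z, c in zip(exponents, coefficients))
```

The closed form of the primitive overlap, Γ(l + 3/2)/[2(ζ_i + ζ_j)^(l+3/2)], makes the normalization exact without any numerical quadrature.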
For two-center densities consisting of pairs of spherical (s-type) primitive Gaussians centered at two different points, A and B, it has been proved [28] that the best convergence of the long-range potential is achieved by assigning the whole density to the center with the higher exponent or, in the case of equal exponents, by assigning one half to each center. This result is reminiscent of the nearest-site algorithm reported by A. Stone and M. Alderton [30], but without considering expansions at the bond center.
When this criterion is applied to densities made of pairs of spherical contracted functions, the products of primitives are assigned to one center or the other depending on their exponents, yielding a partition that can be regarded as a more realistic version of the Mulliken partition (see Figure 1).
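The exponent-based criterion for s-type primitives, and the way it replaces the fixed one-half split of the Mulliken partition by an exponent-dependent one, can be sketched in Python as follows. The helper names, the use of unnormalized s-type Gaussians and the sample exponents are illustrative assumptions; only the assignment rule itself comes from the text.

```python
import math

def dam_weights(alpha_a: float, alpha_b: float) -> tuple[float, float]:
    """Fractions of exp(-alpha_a|r-A|^2) * exp(-alpha_b|r-B|^2) assigned to (A, B)."""
    if alpha_a > alpha_b:
        return 1.0, 0.0      # whole product goes to the center with the higher exponent
    if alpha_b > alpha_a:
        return 0.0, 1.0
    return 0.5, 0.5          # equal exponents: one half to each center

def s_overlap(alpha: float, beta: float, dist: float) -> float:
    """Overlap integral of two unnormalized s-type Gaussians separated by dist."""
    p = alpha + beta
    return (math.pi / p) ** 1.5 * math.exp(-alpha * beta * dist * dist / p)

def fraction_on_a(exps_a, coefs_a, exps_b, coefs_b, dist):
    """Share of the A-B overlap charge of two contracted s functions assigned to A
    (a Mulliken partition would always return 0.5)."""
    total = assigned = 0.0
    for alpha, ca in zip(exps_a, coefs_a):
        for beta, cb in zip(exps_b, coefs_b):
            q = ca * cb * s_overlap(alpha, beta, dist)
            total += q
            assigned += q * dam_weights(alpha, beta)[0]
    return assigned / total
```

With identical contractions on both centers, tight-tight and diffuse-diffuse products are split evenly, while each mixed product goes entirely to the center carrying the tighter primitive, so the overall share is still one half; with a tighter contraction on A, the share moves toward A.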
For distributions consisting of nonspherical functions, the partition is applied to the products of the bracketed factors of Equation (5), and the remaining factors are translated to the centers as described below.
The next step in the procedure is to expand the densities of the atomic fragments thus obtained as a series of radial factors times unnormalized real regular solid harmonics (hereafter regular harmonics, for short), namely:
where the regular harmonics,
, are related to unnormalized real spherical harmonics by:
The partition of the density given in Equation (1), combined with the expansion (9), allows us to write the electrostatic potential as [27]:
in terms of the atomic nuclear charges, the effective multipoles, and the inverse multipoles:
In this way, the molecular electrostatic potential becomes a sum of atomic contributions, and the short/long-range separation can be carried out at the atomic level. Thus, in the long-range region, the effective multipoles can be accurately replaced by point multipoles:
and the inverse multipoles vanish.
As will be shown below, this feature is most useful when dealing with large systems, because a large fraction of the atomic contributions to MESP can be computed from the long-range terms alone, even in regions where the molecular short-range potential is required. This apparent paradox arises because, in such systems, the molecular short-range potential at a given point usually involves only a few atomic short-range contributions, coming from atoms in the neighborhood of the point, the remaining contributions being of long-range type.
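The atom-level short/long-range separation can be illustrated with a small NumPy sketch that, given a long-range radius for each atom (its determination is described in Section 3.2), flags which atomic contributions need short-range terms at each point. The function name and the toy geometry are hypothetical, and the sketch only counts contributions; it does not evaluate any potential.

```python
import numpy as np

def short_range_mask(points, centers, long_range_radii):
    """mask[p, a] is True when point p lies inside the long-range radius of atom a,
    i.e. when atom a needs its short-range (effective + inverse multipole) terms there;
    elsewhere the point multipoles of atom a are sufficient."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    return dist < np.asarray(long_range_radii, dtype=float)[None, :]
```

The mean of the mask is the fraction of short-range atomic contributions, and its row sums give how many atoms need short-range terms at each point; for a large molecule most entries are False even at points inside the molecular volume. For large grids the points would have to be processed in chunks, since the distance array has n_points × n_atoms entries.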
3. The Algorithm
The algorithm for the evaluation of the electrostatic potential in large molecular systems, according to the method described in the previous section, consists of two main steps executed in sequence. First, the molecular density is partitioned with the DAM procedure, after which the atomic fragments are expanded and their effective and inverse multipoles are computed. Second, once the partition/expansion has been made, the electrostatic potential is computed at the desired points with the aid of Equation (11).
It is worth noting that the first step depends on the size of the basis set used to compute the molecular density, but it is independent of the number of points at which the MESP has to be computed. Conversely, the second step depends on the number of computation points, but it is independent of the basis set size.
This decoupling of the processes that depend on the system size (number of basis functions) from those that depend on the number of MESP computation points, combined with the short/long-range separation at the atomic level, makes the procedure reported here highly efficient for large systems, provided that the MESP is to be evaluated at a reasonably large number of points.
3.1. Algorithm for Molecular Density Partition/Expansion
According to the DAM partition criterion, the density of the atomic fragment at A is given by:
where, for two-center CGTO distributions:
with the step function defined as:
Since our purpose is to express the fragments in coordinates referred to center A, and the second primitive in each term of Equation (16) is centered at B, the functions centered at B must be translated to center A. The translation of their exponential factor can be carried out in terms of Bessel I functions, as proposed by Kaufmann and Baumeister [31]. Working in an aligned frame, with A placed at the origin and B lying on the z axis, the translation formula reads:
where the corresponding Bessel functions are those given in [29] (8.467) and the Legendre polynomials are those given in [29] (8.91).
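The translation of the Gaussian exponential can be checked numerically. The sketch below assumes the usual form of this expansion, exp(−α|r − R ẑ|²) = exp(−α(r² + R²)) Σ_l (2l + 1) i_l(2αrR) P_l(cos θ), where i_l are the modified spherical Bessel functions of the first kind, related to the Bessel I functions of half-integer order by i_l(x) = [π/(2x)]^(1/2) I_(l+1/2)(x), and θ is the polar angle of r in the aligned frame. It uses SciPy's `spherical_in` and `eval_legendre`; the function names and the truncation order `lmax` are illustrative choices.

```python
import numpy as np
from scipy.special import eval_legendre, spherical_in

def gaussian_translated(alpha, r, cos_theta, R, lmax=40):
    """exp(-alpha |r - R z_hat|^2) expanded about A (the origin) in Legendre polynomials,
    for a point at distance r from A with polar angle theta in the aligned frame."""
    x = 2.0 * alpha * r * R
    series = sum((2 * l + 1) * spherical_in(l, x) * eval_legendre(l, cos_theta)
                 for l in range(lmax + 1))
    return np.exp(-alpha * (r * r + R * R)) * series

def gaussian_direct(alpha, r, cos_theta, R):
    """Reference value: the same Gaussian evaluated from the distance to B = (0, 0, R)."""
    return np.exp(-alpha * (r * r + R * R - 2.0 * r * R * cos_theta))
```

Because i_l(x) decays rapidly with l for fixed x, the series converges quickly for moderate values of 2αrR, which is what makes a tabulation of the resulting radial factors practical.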
The remaining factor, which is essentially a regular harmonic (apart from a constant factor), can be translated to center A using well-known formulas [32]. In the aligned frame, the formula reads:
Once the functions are referred to center A, the one-center products of regular harmonics, appearing both in the one-center and in the translated two-center distributions, are decomposed in terms of regular harmonics as described in the section "Decomposition of products of regular harmonics" of SF. The final radial factors are identified as the quantities that multiply the corresponding regular harmonics resulting from the decomposition.
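For m = 0, the translation of a regular harmonic along the z axis reduces to a binomial sum, which the following sketch verifies numerically. It assumes the unnormalized form r^l P_l(cos θ); for m ≠ 0 the coefficients depend on the normalization of the harmonics (see [32]), so the sketch should not be read as the general formula used in this work. The function names are hypothetical.

```python
from math import comb

import numpy as np
from scipy.special import eval_legendre

def regular_m0(l, x, y, z):
    """Unnormalized regular solid harmonic r^l P_l(cos theta) (m = 0)."""
    r = float(np.sqrt(x * x + y * y + z * z))
    if r == 0.0:
        return 1.0 if l == 0 else 0.0
    return r**l * float(eval_legendre(l, z / r))

def regular_m0_from_b(l, x, y, z, R):
    """Harmonic centred at B = (0, 0, R) rewritten with harmonics centred at A (origin):
    r_B^l P_l(cos theta_B) = sum_k C(l, k) (-R)^(l - k) r^k P_k(cos theta)."""
    return sum(comb(l, k) * (-R) ** (l - k) * regular_m0(k, x, y, z) for k in range(l + 1))
```

The translated harmonic involves only harmonics of the same m and equal or lower l at A, which is why, in the aligned frame, the translation does not mix different m components.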
In practice, the algorithm proceeds as follows:
The radial interval (in bohr) is partitioned into subintervals whose boundaries correspond to previously selected values of r (see SF for the current choice and further details).
For each interval, the variable r is mapped onto the interval [−1, 1] according to:
and a set of values of
t is chosen as the zeroes of the Chebyshev T polynomial of order
n (currently
) given by (see [
33] 22.16.4):
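The tabulation grid of this step can be sketched as follows. The zeros of T_n follow the standard formula t_k = cos[(2k − 1)π/(2n)], k = 1, …, n; the linear form assumed here for the mapping between [r_(i−1), r_i] and [−1, 1], the function names, and the sample boundaries are illustrative assumptions, since the actual mapping is that of Equation (20) and the current boundaries are given in SF.

```python
import numpy as np

def chebyshev_nodes(n):
    """Zeros of T_n: t_k = cos((2k - 1) pi / (2n)), k = 1..n."""
    k = np.arange(1, n + 1)
    return np.cos((2 * k - 1) * np.pi / (2 * n))

def r_to_t(r, r_lo, r_hi):
    """Assumed linear map of [r_lo, r_hi] onto [-1, 1]."""
    return (2.0 * r - (r_lo + r_hi)) / (r_hi - r_lo)

def t_to_r(t, r_lo, r_hi):
    """Inverse of r_to_t."""
    return 0.5 * ((r_hi - r_lo) * t + (r_hi + r_lo))

def tabulation_points(boundaries, n):
    """Radial tabulation points (one array per subinterval) at the Chebyshev zeros."""
    t = chebyshev_nodes(n)
    return [t_to_r(t, lo, hi) for lo, hi in zip(boundaries[:-1], boundaries[1:])]
```

Since the zeros of T_n lie strictly inside (−1, 1), no tabulation point falls on a boundary, so each point belongs to exactly one subinterval.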
For each center,
A, of the system, one-center distributions are expanded as:
as described in the section "Expansion of one-center distributions" of SF. The radial factors are evaluated at the tabulation points, multiplied by the corresponding element of the density matrix, and accumulated.
Likewise, for each center A, a loop over all the remaining centers is performed. In this loop, for each center B, all the fragments coming from two-center distributions with one function at A and the other at B, and attributed to center A, are expanded in an aligned frame as a series of regular solid harmonics times radial factors, as described in the section "One-center expansion of two-center fragments" of SF. The radial factors are evaluated at the tabulation points, multiplied by the corresponding element of the density matrix, previously rotated to the aligned frame, and accumulated in the aligned frame. Next, the locally accumulated radial factors (i.e., for fixed B) are rotated back to the molecular frame, and the resulting radial factors are further accumulated together with those coming from the one-center distributions and with the radial factors of other pairs of centers, to yield the full radial factors of Equation (9). Details on the rotations of both the density matrix and the radial factors are given in the section "Rotations" of SF.
The tabulations of the radial factors are used to decide whether they are negligible and, for non-negligible factors, to carry out a numerical projection onto Chebyshev T polynomials of the variable t in each interval. This projection yields the corresponding piecewise expansion of the radial factors. Details of this expansion are given in the section "Expansion of atomic radial factors" of SF. Thus, the final expansion in the i-th interval takes the form:
where t is a function of r, as defined in (20), and the exponential factor is introduced when the leading term of expansion (9) decays steeply in the interval (see SF for details); otherwise, no exponential factor is used. The number of polynomials used in the expansion at the i-th interval is determined on the fly by analyzing the convergence of the projections. The expansion coefficients of non-negligible factors are stored in a buffer, and an array of pointers to address these coefficients is also generated and stored.
Once the radial factors have been piecewise expanded, they are used to compute the auxiliary partial integrals:
at the same tabulation points as those used for the density, as well as the auxiliary constants:
and
Details are given in the section "Effective multipoles from density expansion" of SF.
The tabulations of these partial integrals are used to project them onto Chebyshev T polynomials in the same intervals as those used for the radial factors of the density. In this case, no exponential factor is necessary:
The number of polynomials in each interval is the same as in the corresponding radial factor. In this way, the pointers defined for addressing the density expansion coefficients can also be used for the coefficients of Equations (29) and (30).
Atomic point multipoles of Equation (
14) are obtained by:
where the sum runs over the intervals.
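The way the partial integrals, the interval constants, the accumulated sums and the point multipoles fit together can be sketched for a single radial factor. This NumPy sketch assumes the Laplace-expansion forms Q_l(r) = ∫_0^r t^(2l+2) f_l(t) dt and q_l(r) = ∫_r^∞ t f_l(t) dt for the effective and inverse multipoles (the normalization used in this work may differ), fits each integrand on each interval with `numpy.polynomial.Chebyshev.interpolate` rather than with the projection described above, and uses a class name and polynomial degree chosen for illustration. Beyond the last boundary it returns the point multipole and a vanishing inverse multipole.

```python
import bisect
import math

import numpy as np
from numpy.polynomial import Chebyshev

class RadialMultipoles:
    """Piecewise Q_l(r) = int_0^r t^(2l+2) f(t) dt and q_l(r) = int_r^inf t f(t) dt
    for one radial factor f (assumed forms)."""

    def __init__(self, f, boundaries, l, deg=24):
        self.bounds = list(boundaries)
        self.eff, self.inv = [], []
        for lo, hi in zip(self.bounds[:-1], self.bounds[1:]):
            eff = Chebyshev.interpolate(lambda t: t ** (2 * l + 2) * f(t), deg, domain=[lo, hi])
            inv = Chebyshev.interpolate(lambda t: t * f(t), deg, domain=[lo, hi])
            self.eff.append(eff.integ(lbnd=lo))  # partial integral from r_{i-1} to r
            self.inv.append(inv.integ(lbnd=lo))
        # Interval constants: full integrals over each [r_{i-1}, r_i].
        eff_c = np.array([p(hi) for p, hi in zip(self.eff, self.bounds[1:])])
        inv_c = np.array([p(hi) for p, hi in zip(self.inv, self.bounds[1:])])
        # Accumulated sums: effective part below interval i, inverse part from interval i up.
        self.eff_below = np.concatenate(([0.0], np.cumsum(eff_c)))
        self.inv_from = np.concatenate((np.cumsum(inv_c[::-1])[::-1], [0.0]))
        self.point_multipole = float(self.eff_below[-1])

    def __call__(self, r):
        """Return (Q_l(r), q_l(r))."""
        i = bisect.bisect_right(self.bounds, r) - 1
        if i >= len(self.eff):  # long-range region: point multipole, inverse part vanishes
            return self.point_multipole, 0.0
        return (float(self.eff_below[i] + self.eff[i](r)),
                float(self.inv_from[i] - self.inv[i](r)))

def potential_l0(rm, r):
    """4*pi*[Q_0(r)/r + q_0(r)]: potential of the spherical (l = 0) part of a positive
    fragment density; electrons enter the MESP with the opposite sign."""
    Q, q = rm(r)
    return 4.0 * math.pi * (Q / r + q)
```

For a spherical Gaussian fragment, f(t) = exp(−t²) with l = 0, `potential_l0` can be compared with the analytic potential π^(3/2) erf(r)/r of that charge distribution, and 4π times the point multipole with its total charge π^(3/2).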
Molecular geometry and data corresponding to the tabulation of radial factors are stored in an external binary file with extension damqt, ready to be used for computing the DAM expansion of the density. In particular, the following information is stored: number of atoms, number of basis functions and number of shell functions, atomic numbers and Cartesian coordinates of the nuclei, basis set, length of expansion (9), and, for each center A, pointers to the expansion coefficients of the radial factors, fitting exponents, and expansion coefficients.
The atomic multipole moments, the auxiliary quantities, and the expansion coefficients of Equations (29) and (30) are stored in another external binary file with extension dmqtv. Since, by construction, the pointers to the coefficients of Equations (29) and (30) are the same as those used for the density expansion coefficients, they do not need to be stored again.
3.2. Algorithm for Electrostatic Potential Expansion
Once the files containing the information on MED partition/expansion and the auxiliary quantities for MESP have been generated, MESP can be computed at any desired points using this information. This step is completely independent of the first one, so that the computation can be made as many times as necessary and in different sets of points without requiring repetition of the partition/expansion process.
To compute the MESP, the following algorithm is employed:
The MED partition/expansion data stored in the damqt file are read and stored in memory.
MESP auxiliary data are read from the dmqtv file, stored in memory, and used to compute further auxiliary quantities, in particular the partial accumulated sums:
and
are also computed and stored.
A double loop over atoms (outer) and tabulation intervals (inner) is performed to determine the length of expansion (
11) in each interval and the long-range radius for the atom. This radius is chosen as the lower limit of the interval
i,
, for which
is lower than a user-defined long-range threshold.
Next, MESP is computed with Equation (11) by running over the atoms. For points in the long-range region of atom A, the contribution of this atom to MESP is computed in terms of its atomic point multipoles as:
For points in the short-range region, the contribution is computed by means of:
and the effective and inverse multipoles are obtained from the partial accumulated sums of Equations (32) and (33) for the interval i that contains the point, plus the expansions (29) and (30) for the integrals over the parts of that interval lying below and above the point, respectively.
In all cases, the regular solid harmonics are computed quickly and accurately by recursion, as described in the section "Recurrence relations of regular solid harmonics" of SF. In the short-range case, Equations (29) and (30) are evaluated with the coefficients previously retrieved from the dmqtv file and stored in memory, and with the Chebyshev polynomials computed by recursion, as shown in the section "Recurrence relations of Chebyshev polynomials" of SF.
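For m = 0, a recursion for regular harmonics follows from the Legendre three-term recurrence multiplied by r^(l+1), namely (l + 1) R_(l+1) = (2l + 1) z R_l − l r² R_(l−1) with R_l = r^l P_l(cos θ); it needs only products of Cartesian coordinates, with no square roots or trigonometric functions. The sketch below is limited to m = 0 and assumes unnormalized harmonics; the full set of recurrences for all m used in this work is given in SF, and the function name is hypothetical.

```python
import numpy as np

def regular_harmonics_m0(lmax, x, y, z):
    """Return [r^l P_l(cos theta) for l = 0..lmax] using the recurrence
    (l + 1) R_{l+1} = (2l + 1) z R_l - l r^2 R_{l-1}; accepts scalars or arrays."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    r2 = x * x + y * y + z * z
    vals = [np.ones_like(z), z.copy()]       # R_0 = 1, R_1 = z
    for l in range(1, lmax):
        vals.append(((2 * l + 1) * z * vals[l] - l * r2 * vals[l - 1]) / (l + 1))
    return vals[: lmax + 1]
```

Since the inputs may be arrays, all harmonics for a batch of grid points are obtained in a single pass over l.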
If MESP derivatives are required, they can be computed together with MESP using the same auxiliary quantities [34]. The procedure is described in the section "Computing MESP derivatives" of SF.
Basis set and density matrix data are only needed if MESP is to be computed from nuclear attraction integrals and the density matrix, without the DAM partition/expansion. As this is an expensive procedure, it should be restricted to cases in which a reference is needed for testing the accuracy of the algorithm reported here.
4. Results
To test the performance of the method, we first analyzed the accuracy of the results obtained with the current algorithm. For this purpose, we computed MESP values for the benzene molecule at a set of equally spaced points corresponding to a 129 × 129 × 129 grid in the octant defined by:
(length in bohr). These results have been compared with the MESP
exact values,
, computed using the electron density matrix and the integrals involving basis set functions:
where
are the elements of the density matrix, and
stands for the number of basis functions,
, in the LCAO calculation.
The results of this comparison are reported in
Table 1 for four different lengths in expansion (
9), namely:
. The maximum absolute error and the root-mean-square error over the grid points decrease significantly with the length of the multipole expansion, and they suggest that an expansion with
is sufficiently precise for most practical applications. Nevertheless, higher precision can be attained at a very moderate cost for more demanding applications, such as topological analysis.
Remarkably, for expansions with
or higher, a large number of points are computed with many accurate decimal figures (twelve or more), as shown in the last row of Table 1. This is because the short-range contributions at most grid points (those lying outside the molecular volume) are very small or even negligible, so the precision is largely determined by the convergence of the long-range expansion, i.e., the number of terms in (9), while the accuracy of the radial factors plays a minor role.
In this regard, it is interesting to check how the algorithm performs at points where the short-range terms are important. For this purpose, the number of points with a given number of accurate decimal figures is plotted in Figure 2 for the four expansions. As can be seen, the number of points in the curve corresponding to
is significantly greater than in the remaining curves. This is due to the insufficient convergence of the long-range MESP in this case, as mentioned before. In the remaining curves, the convergence of the long-range MESP has largely been achieved, and the precision is mainly determined by the short-range terms. As the curves show, in these cases too the precision readily increases with the length of the expansion in two respects: the number of correct decimal figures at the least precise points increases steadily, and the number of points for a given precision decreases significantly. The raw data used for the figure can be found in the section "Precision of MESP calculation" of SF.
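The accuracy measures used in Table 1 and Figure 2 can be computed as follows. The sketch assumes that the number of accurate decimal figures at a point is ⌊−log10 |V_DAM − V_exact|⌋, a common convention that may not coincide exactly with the one used for Figure 2; the function names and the cap of 16 figures are hypothetical.

```python
import numpy as np

def error_statistics(approx, exact):
    """Maximum absolute error and root-mean-square error over all grid points."""
    err = np.abs(np.asarray(approx, dtype=float) - np.asarray(exact, dtype=float))
    return err.max(), np.sqrt(np.mean(err**2))

def decimal_figures_histogram(approx, exact, max_figures=16):
    """hist[k] = number of points with k accurate decimal figures, k = floor(-log10|err|)."""
    err = np.abs(np.asarray(approx, dtype=float) - np.asarray(exact, dtype=float))
    figs = np.floor(-np.log10(np.maximum(err, 10.0**-max_figures))).astype(int)
    figs = np.clip(figs, 0, max_figures)   # errors >= 1 count as zero figures
    return np.bincount(figs, minlength=max_figures + 1)
```

Plotting the histogram for each expansion length reproduces the kind of curves shown in Figure 2, where the low-figure tail isolates the points dominated by short-range errors.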
The wall-clock time for the computation with Equation (36) was 5600 s, compared with 2.9, 6.6, 16.2 and 28.0 s for the respective expansions. In contrast, the DAM partition/expansion step with
took only 2.2 s for the benzene density computed with Dunning’s cc-pVDZ basis set [35]. These times were measured on a laptop with an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60 GHz, running an MPI-parallel version of the codes, compiled with gfortran, on 4 processors. Although the calculation was performed at the HF level, it is important to stress that the computational cost of the full algorithm is independent of the level of theory used to obtain the molecular density. This is because the partition/expansion step depends only on the number of elements of the density matrix (i.e., the square of the basis set size), and the MESP computation depends only on the length of expansion (9) and the number of atoms in the system.
It is evident that the DAM-based algorithm yields results sufficient for most practical purposes in a computational time that, in this case, is two to three orders of magnitude lower than that required for the computation from the density matrix and integrals. This is the case for a rather small system, in which the short-range atomic contributions in the selected grid amount to about 20% of the total contributions for a long-range threshold equal to
taken in cases of
, and about 40% for a threshold of
taken in cases of
. As will be shown below, for really large systems the fraction of short-range contributions in equivalent grids is much smaller, and the gain in computational cost with respect to the computation by means of Equation (36) increases with the system size. A further test on MESP extrema on a density isosurface has been included in the supplementary files.
Once the validity of the results had been established from the point of view of accuracy, we analyzed the performance of the algorithm in large systems. Table 2 collects the results of MESP calculations on a set of molecules ranging from a small system such as benzene (12 atoms, 222 basis functions) to two large ones: CC-MMIM BF4, consisting of three circumcoronene slices with two pairs of MMIM BF4 ionic liquid molecules embedded between the circumcoronene sheets (617 atoms), and a DNA fragment with 750 atoms. For the last two, the electron densities correspond to PM7 calculations [36] (valence basis functions only).
With these results, we analyzed how the computational cost of each of the two steps depends on the system size and on the length of the expansion. Three different expansions were used, except for CC-MMIM BF4 and the DNA fragment. In these cases, owing to the ZDO approximation involved in the PM7 method, only valence one-center distributions contribute to the densities. Consequently, the partition/expansion is a very fast process, and the terms in the MED and MESP expansions with l higher than twice the highest l value in the basis set vanish.
As expected, the computational time of the partition/expansion step increases with the square of the number of basis functions in small systems, and at a lower rate in large systems, for which the dependence tends to be linear. The increase of the cost with the length of expansion (11) is also smaller than the predicted
dependence. In the
supplementary files, a short movie showing the MESP of the DNA fragment mapped onto its molecular surface, defined as the MED isosurface with
au, is included. Red indicates positive MESP values and blue indicates negative values.
Finally, in Table 3 we analyze the performance of the MPI parallelization of the codes for both steps of the algorithm. Calculations were carried out on a polyhydroxylated circumcoronene system with 360 centers and a basis set of 7560 contracted GTOs. The expansion of the atomic fragments was carried out up to
, and the MESP has been computed with this expansion on a 129 × 129 × 129 cubic grid (2,146,689 points) within an interval
. The wall-clock time, the average time per processor and its standard deviation are given for the DAM partition/expansion and for the MESP tabulation.
In both steps, the computational time scales very well with the number of processors. In the partition/expansion step, a linear fit of the wall-clock time vs. the reciprocal of the number of processors gives a regression parameter of , and the same value is obtained for the fit of the average time. For the MESP tabulation, the scaling is similar, with and , respectively.
The standard deviations show that the time is more evenly shared among processors in the partition/expansion than in the tabulation, but in both cases the time distribution is satisfactory. Furthermore, in the second step the standard deviation is affected by the fact that the main processor spends more time than the others, owing to the workload of gathering the tabulations computed by the ancillary processors, which are stored in external files, and to housekeeping tasks.
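The scaling analysis amounts to a least-squares fit of the measured times against 1/p, with p the number of processors. The sketch below assumes that the quoted regression parameter is the coefficient of determination R² of the fit t(p) = a + b/p; with this model, a estimates the part of the work that does not parallelize, such as the gathering and storage performed by the main processor. The function name is hypothetical, and the timings in the usage line are synthetic, not the data of Table 3.

```python
import numpy as np

def fit_time_vs_inverse_procs(procs, times):
    """Least-squares fit t(p) = a + b / p; returns (a, b, R^2)."""
    x = 1.0 / np.asarray(procs, dtype=float)
    y = np.asarray(times, dtype=float)
    b, a = np.polyfit(x, y, 1)               # slope first, then intercept
    y_fit = a + b * x
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return a, b, 1.0 - ss_res / ss_tot

# Synthetic timings following t = 2 + 100/p exactly (not the data of Table 3):
a, b, r2 = fit_time_vs_inverse_procs([1, 2, 4, 8], [102.0, 52.0, 27.0, 14.5])
# a is close to 2, b close to 100, and r2 close to 1
```

A value of R² close to one means that the time decreases almost exactly as 1/p, and a small intercept a relative to b means that the serial overhead is minor.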