1. Introduction
Electronic structure calculations have become a cornerstone of modern-day research in chemistry and materials physics, allowing
in silico modeling of chemical reactions and the first principles design of novel catalysts [
1]. Electronic structure calculations on molecular systems most often employ the linear combination of atomic orbitals (LCAO) approach, where the molecular orbitals (MOs) are expanded in terms of atomic orbitals (AOs). Several possible alternatives for the form of the AOs are commonly used—Gaussian-type orbitals (GTOs), Slater-type orbitals (STOs), as well as numerical atomic orbitals (NAOs); see [
2] for details. LCAO electronic structure calculations involve a variational minimization of the total energy with respect to the AO expansion coefficients of the MOs. Importantly, the formalism used in the LCAO approach is not restricted to AOs which are atom-centered basis functions; it can also be used, e.g., in combination with fully numerical basis functions as in the finite element approach, as has been recently demonstrated in [
3,
4]. Once the energy has been minimized and the corresponding wave function has been obtained, it is possible to compute a number of properties either directly from the electronic wave function (e.g., electron densities, orbital energies, molecular dipole moment), or from its response to external perturbations (nuclear magnetic shieldings, vibrational frequencies, etc.).
The mathematical foundations for spin-restricted Hartree–Fock (HF) theory within the LCAO approach were laid out independently by Roothaan and Hall [
5,
6]. In their seminal papers, Roothaan and Hall derived matrix equations that can be conveniently implemented on a computer as an iterative procedure. As will be seen later in
Section 6, the Roothaan–Hall equations turn out to yield a generalized eigenvalue problem
in the non-orthogonal AO basis set, which had been solved some years before by Löwdin in the context of Heitler–London theory [
7].
Subsequent to the work by Roothaan and Hall, Pople and Nesbet [8] and Berthier [9] independently published the corresponding equations for an unrestricted (open-shell) HF description by an analogous scheme, without providing an explicit derivation. The Pople–Nesbet–Berthier equations assume a form similar to the Roothaan–Hall equations, constituting a coupled set of generalized eigenvalue equations, as will also be seen later on in the manuscript (Section 6). Restricted open-shell HF was then described by Roothaan [10]; restricted open-shell calculations will not be considered in the present work, as they have been extensively reviewed by Krebs in [11], to which we refer for further details.
Density functional theory [
12,
13] (DFT; see also [
14,
15]) became popular in chemistry through the efforts of Pople and coworkers in making the method generally available to quantum chemists [
16] and showing that atomization energies from DFT may agree well with experiment [
17,
18]. DFT also turns out to yield self-consistent field (SCF) equations that assume the same form as in HF, but with a different expression for the Fock matrix $\mathbf{F}$. Pople and coworkers reported the equations necessary for solving SCF for DFT in the LCAO context up to generalized gradient approximation (GGA) functionals in [16]; an analogous derivation was also presented by Kobayashi et al. in [19]. The self-consistent implementation of meta-GGA functionals was later described by Neumann, Nobes and Handy in [20]. Density functional calculations sometimes also include non-local correlation contributions; self-consistent LCAO implementations thereof have been reported by Vydrov and coworkers [21,22,23,24].
Despite the progress in and widespread success of DFT, to our knowledge, a uniform derivation of the SCF equations for HF and DFT including all the necessary expressions for the elements of the Kohn–Sham–Fock matrix up to the level of meta-GGA functionals has, up to now, not been explicitly published in the literature. This has likely contributed to the lack of complete support for meta-GGA functionals in popular quantum chemistry programs; for instance, PSI4 [
25] and PySCF [
26] lack support for meta-GGAs that depend on the Laplacian of the density, such as the Becke–Roussel exchange functional [27]. This paper, therefore, presents such a derivation, yielding expressions of the DFT contributions to the Kohn–Sham–Fock matrix up to the level of meta-GGA functionals in a consistent way, facilitating the implementation of DFT in new programs.
The present derivation also has an obvious educational value. Indeed, in what follows, HF and various flavors of DFT belonging to different rungs of Jacob’s Ladder [
28]—the local spin density approximation (LDA), the GGA and meta-GGA approximations—will be explicitly described in a uniform notation, making the similarities and dissimilarities between the approaches crystal clear. Facilitated by the uniform derivation, we will discuss key issues and features in the HF and DFT methodologies that arise from the mathematical formulation.
First, the basis set expansion of the molecular orbitals and the electron density is written out in
Section 2. Then, the energy expression for HF and DFT is presented in
Section 3, with a brief explanation of their physical content. The HF and DFT energy is shown to be invariant to rotations of the occupied and of the virtual orbitals in
Section 4, allowing the construction of localized orbitals. The possibilities and drawbacks of spin-restricted calculations are discussed in
Section 5. The finite-basis SCF equations are derived as generalized eigenvalue equations in
Section 6. Section 7 shows that the generalized eigenvalue equations can be reduced to normal eigenvalue equations by a transformation to an orthonormal basis, and that linear dependencies in the basis can be eliminated along the way. The reason why the solution of the SCF equations amounts to a minimization of the total energy is rationalized in
Section 8. Direct minimization methods are briefly introduced and stability analysis discussed in
Section 9. The SCF method and direct minimization are contrasted in
Section 10. Finally, the contributions to the Kohn–Sham–Fock matrix arising from various-rung DFT functionals are listed in
Section 11. The article concludes with a brief summary and discussion in
Section 12. Atomic units are used throughout the text.
2. Basis Set Expansion
In the HF and DFT approaches, the electronic wave function is written as a Slater determinant, in which the electrons occupy a set of MOs $\psi_i(\mathbf{r})$. The MOs are expanded in terms of normalized expansion functions $\chi_\mu(\mathbf{r})$, which are typically AOs. Both $\psi_i$ and $\chi_\mu$, as well as the LCAO coefficients $C_{\mu i}$, are typically chosen to be real in the absence of magnetic interactions that would generally make the Hamiltonian operator complex. Note, however, that the use of complex coefficients has been shown to be sometimes beneficial for describing challenging systems even in the absence of magnetic fields in Hartree–Fock (as reviewed in [29]) or DFT (as shown in [30]); complex instabilities may also arise in specialized methods beyond the SCF level, see [31] for instance. The expansion functions are generally not orthonormal,
$$\int \chi_\mu(\mathbf{r}) \chi_\nu(\mathbf{r}) \, \mathrm{d}^3 r = S_{\mu\nu} \neq \delta_{\mu\nu},$$
where $\delta_{\mu\nu}$ is the Kronecker delta: $\delta_{\mu\nu} = 1$ if $\mu = \nu$ and $\delta_{\mu\nu} = 0$ otherwise. Greek letters, $\mu$, $\nu$ and $\lambda$ will be used to identify the expansion functions $\chi_\mu$, whereas Roman letters, $i$, $j$ and $k$ will be used to identify the MOs $\psi_i$. The $\alpha$ (spin-up) and $\beta$ (spin-down) MOs are expanded separately as
$$\psi_i^\sigma(\mathbf{r}) = \sum_\mu C^\sigma_{\mu i} \chi_\mu(\mathbf{r}), \qquad \sigma \in \{\alpha, \beta\}.$$
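Both the $\alpha$ and $\beta$ MOs are orthonormal among themselves,
$$\int \psi_i^\sigma(\mathbf{r}) \psi_j^\sigma(\mathbf{r}) \, \mathrm{d}^3 r = \delta_{ij},$$
or equivalently within the basis set
$$(\mathbf{C}^\sigma)^\mathrm{T} \mathbf{S} \mathbf{C}^\sigma = \mathbf{1}.$$
However, the $\alpha$ orbitals are generally not orthonormal to the $\beta$ orbitals:
$$\int \psi_i^\alpha(\mathbf{r}) \psi_j^\beta(\mathbf{r}) \, \mathrm{d}^3 r \neq \delta_{ij},$$
or
$$(\mathbf{C}^\alpha)^\mathrm{T} \mathbf{S} \mathbf{C}^\beta \neq \mathbf{1}.$$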
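The electron density plays a pivotal role in quantum chemistry. In line with chemistry literature, $\rho(\mathbf{r})$ will be used to denote the electron density at the point $\mathbf{r}$, in contrast to the physics notation $n(\mathbf{r})$ which is customary in the DFT literature. The total electron density is formed from the $\alpha$ and $\beta$ densities, $\rho^\alpha$ and $\rho^\beta$, as $\rho = \rho^\alpha + \rho^\beta$. The spin-$\sigma$ electron density can be evaluated as
$$\rho^\sigma(\mathbf{r}) = \sum_{i=1}^{N_\sigma} \left| \psi_i^\sigma(\mathbf{r}) \right|^2 = \sum_{\mu\nu} P^\sigma_{\mu\nu} \chi_\mu(\mathbf{r}) \chi_\nu(\mathbf{r}), \tag{4}$$
in which $N_\sigma$ is the number of spin-$\sigma$ electrons in the system, and the sums over basis functions $\sum_\mu$ and $\sum_{\mu\nu}$ indicate $\sum_{\mu=1}^{K}$ and $\sum_{\mu=1}^{K} \sum_{\nu=1}^{K}$, respectively, with $K$ the number of basis functions; this convention for easier readability of sums over basis functions is used throughout the rest of this work.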
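The density matrix $\mathbf{P}^\sigma$ has been defined in Equation (4) as
$$P^\sigma_{\mu\nu} = \sum_{i=1}^{N_\sigma} C^\sigma_{\mu i} C^\sigma_{\nu i}. \tag{5}$$
As is evident from the form of Equation (5), the density matrices are symmetric, $P^\sigma_{\mu\nu} = P^\sigma_{\nu\mu}$. As was already mentioned above, the total electron density is obtained from the sum of the $\alpha$ and $\beta$ densities. Correspondingly, a total density matrix is given by
$$\mathbf{P} = \mathbf{P}^\alpha + \mathbf{P}^\beta,$$
from which the total density can be evaluated using a relation analogous to Equation (4).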
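The construction of the spin density matrices from the occupied orbital coefficients can be illustrated with a short NumPy sketch. The overlap matrix and the orbital coefficients below are random, hypothetical stand-ins rather than the output of a real calculation, and the helper names are chosen for this illustration only. The sketch checks that the density matrices are symmetric and that the trace of the product of the total density matrix and the overlap matrix recovers the number of electrons.

```python
import numpy as np

def model_overlap(n_bf, rng):
    """Hypothetical overlap matrix: positive definite with unit diagonal."""
    B = np.eye(n_bf) + 0.1 * rng.standard_normal((n_bf, n_bf))
    G = B.T @ B
    d = np.sqrt(np.diag(G))
    return G / np.outer(d, d)

def random_orthonormal_orbitals(S, rng):
    """Random coefficients C that satisfy C^T S C = 1 (illustration only)."""
    s, V = np.linalg.eigh(S)
    X = V @ np.diag(s ** -0.5) @ V.T          # S^(-1/2)
    Q, _ = np.linalg.qr(rng.standard_normal(S.shape))
    return X @ Q

def density_matrix(C, n_occ):
    """P = C_occ C_occ^T, taking the first n_occ columns of C as occupied."""
    C_occ = C[:, :n_occ]
    return C_occ @ C_occ.T

rng = np.random.default_rng(0)
n_bf, n_alpha, n_beta = 6, 3, 2
S = model_overlap(n_bf, rng)
P_alpha = density_matrix(random_orthonormal_orbitals(S, rng), n_alpha)
P_beta = density_matrix(random_orthonormal_orbitals(S, rng), n_beta)
P_total = P_alpha + P_beta

# tr(P S) is the integral of the electron density over all space
n_electrons = np.trace(P_total @ S)
print(f"electron count from tr(PS): {n_electrons:.6f}")
```

The trace identity follows from the orthonormality of the orbitals, so it holds for any valid set of coefficients and not only for the random ones used here.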
3. Energy Expression
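The starting point for the derivation is the non-relativistic energy expression [5,8,13,16],
$$E = \sum_{\mu\nu} P_{\mu\nu} h_{\mu\nu} + \frac{1}{2} \sum_{\mu\nu\lambda\kappa} P_{\mu\nu} P_{\lambda\kappa} (\mu\nu|\lambda\kappa) - \frac{a}{2} \sum_{\mu\nu\lambda\kappa} \left( P^\alpha_{\mu\nu} P^\alpha_{\lambda\kappa} + P^\beta_{\mu\nu} P^\beta_{\lambda\kappa} \right) (\mu\lambda|\nu\kappa) + b E_\mathrm{xc}, \tag{7}$$
where the electron repulsion integral $(\mu\nu|\lambda\kappa)$ is defined as
$$(\mu\nu|\lambda\kappa) = \iint \frac{\chi_\mu(\mathbf{r}) \chi_\nu(\mathbf{r}) \chi_\lambda(\mathbf{r}') \chi_\kappa(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, \mathrm{d}^3 r \, \mathrm{d}^3 r',$$
and $a$ and $b$ are constants that define the fraction of HF exchange and the weight of the density functional approximation, respectively. The choice $a = 1$ and $b = 0$ corresponds to HF, whereas $a = 0$ and $b = 1$ yields a “pure” density functional without exact exchange such as the Perdew–Burke–Ernzerhof functional [32]. The choice $a \neq 0$ and $b \neq 0$ is the most general one, and corresponds to hybrid functionals [33], which are popular in quantum chemistry; perhaps the most famous example is the historical B3LYP functional [34].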
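The first term in Equation (7), which will be referred to as $E_\mathrm{core}$, describes the kinetic energy of the electrons and the Coulombic attraction of the $N$ nuclei in the system, with the matrix elements
$$h_{\mu\nu} = \int \chi_\mu(\mathbf{r}) \left[ -\frac{1}{2} \nabla^2 - \sum_{A=1}^{N} \frac{Z_A}{|\mathbf{r} - \mathbf{R}_A|} \right] \chi_\nu(\mathbf{r}) \, \mathrm{d}^3 r, \tag{9}$$
where $Z_A$ and $\mathbf{R}_A$ are the charge and position of nucleus $A$. The one-electron operator in Equation (9) is commonly known as the core Hamiltonian, and the resulting $E_\mathrm{core}$ is the dominant contribution to the total energy.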
However, the core Hamiltonian lacks electronic interactions. These are described by the second and third terms in Equation (7), which describe the classical Coulomb and the quantum mechanical “exchange” energy, and are referred to as $E_J$ and $E_K$, respectively. The $E_J$ contribution to the total energy can be straightforwardly derived from the expression for the Coulomb repulsion between the electrons described by the electron density $\rho(\mathbf{r})$, whereas the expression for the exchange energy contribution $E_K$
can be obtained, for instance, using Slater’s rules for a HF wave function (
).
The final term in Equation (7), referred to as $E_\mathrm{xc}$, describes the DFT exchange-correlation contributions which, like $E_J$ and $E_K$, arise from electronic interactions. The exchange-correlation term is commonly written as
$$E_\mathrm{xc} = \int \rho(\mathbf{r}) \, \epsilon_\mathrm{xc}(\mathbf{r}) \, \mathrm{d}^3 r,$$
where $\epsilon_\mathrm{xc}$ is the exchange-correlation energy density per electron. Usually $\epsilon_\mathrm{xc}$ is a function of the electron density $\rho$; it may also depend on the derivatives of $\rho$ and the kinetic energy density $\tau$, depending on which rung of Jacob’s Ladder [28] is used to describe the exchange-correlation effects. The various rungs are discussed in
Section 11.
4. Unitary Invariance
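The $\mathbf{P}^\alpha$ and $\mathbf{P}^\beta$ matrices turn out to be invariant to rotations of the occupied orbitals among themselves. Rotating the occupied subset of the molecular orbitals $\psi_i^\sigma$ by an orthogonal matrix $\mathbf{U}^\sigma$ defines a new set of occupied orbitals
$$\psi'^{\sigma}_i = \sum_j \psi^\sigma_j U^\sigma_{ji},$$
the MO coefficients of which can be obtained as
$$C'^{\sigma}_{\mu i} = \sum_j C^\sigma_{\mu j} U^\sigma_{ji}.$$
This can also be written in matrix notation as
$$\mathbf{C}'^{\sigma} = \mathbf{C}^\sigma \mathbf{U}^\sigma. \tag{10}$$
The invariance to rotations in the occupied-occupied block is easy to prove, as
$$\mathbf{P}'^{\sigma} = \mathbf{C}'^{\sigma} (\mathbf{C}'^{\sigma})^\mathrm{T} = \mathbf{C}^\sigma \mathbf{U}^\sigma (\mathbf{U}^\sigma)^\mathrm{T} (\mathbf{C}^\sigma)^\mathrm{T} = \mathbf{C}^\sigma (\mathbf{C}^\sigma)^\mathrm{T} = \mathbf{P}^\sigma, \tag{11}$$
where $\mathbf{C}^\sigma$ contains only the occupied orbitals, and we have used the orthogonality of $\mathbf{U}^\sigma$, $\mathbf{U}^\sigma (\mathbf{U}^\sigma)^\mathrm{T} = \mathbf{1}$.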
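The invariance of the density matrix to rotations among the occupied orbitals is easy to confirm numerically. The following sketch assumes an orthonormal basis (unit overlap matrix) for brevity and uses random, hypothetical orbital coefficients; it is an illustration of the argument above, not part of any particular program.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bf, n_occ = 5, 3

# Occupied orbital coefficients with orthonormal columns (S = 1 assumed)
C_occ, _ = np.linalg.qr(rng.standard_normal((n_bf, n_occ)))

# Random orthogonal matrix that mixes the occupied orbitals among themselves
U, _ = np.linalg.qr(rng.standard_normal((n_occ, n_occ)))
C_rot = C_occ @ U

P = C_occ @ C_occ.T
P_rot = C_rot @ C_rot.T

# The orbitals change, but the density matrix does not
max_orbital_change = np.max(np.abs(C_rot - C_occ))
max_density_change = np.max(np.abs(P_rot - P))
print(max_orbital_change, max_density_change)
```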
The invariance to rotations in the occupied-occupied block can be used to fashion localized orbitals, for instance using a unitary optimization procedure [35]. Although localized orbitals are not, strictly speaking, observables (which is why several localization criteria have been suggested in the literature [36,37,38,39,40]), they have been shown to offer an effective way to study chemical reactions with ab initio calculations [41,42,43].
In addition to the occupied orbitals, in general there are also a number of unoccupied orbitals, which are commonly known as virtual orbitals. The number of virtual orbitals in any given calculation depends on the size of the basis set: the bigger the basis is, the more virtual orbitals there are. Because the virtual orbitals do not enter into the density matrix, the HF and DFT energy expression, Equation (
7), is also invariant to rotations in the virtual-virtual block, similarly allowing their localization. However, as will be seen below in
Section 9, the energy can be changed by mixing virtual orbitals into the occupied orbitals [
44]. This approach provides another way to optimize the orbitals directly with, e.g., a gradient descent method, such as the geometric direct minimization method described in [
45].
5. Spin-Restriction vs. Unrestriction
The molecular orbitals are obtained from the requirement that they minimize the total energy according to Equation (7). However, one must first choose the spin formalism to be used. The general choice is to use different spatial orbitals for the $\alpha$ and $\beta$ electrons, in which case a spin-unrestricted approach is obtained. The unrestricted approach is often used even in systems in which there are an equal number of alpha and beta electrons, $N_\alpha = N_\beta$: although the spin-restricted and unrestricted descriptions often reproduce matching results for such systems near the equilibrium geometry, only the unrestricted formalism is able to break bonds in general. The reason for this is that when molecules are stretched past the Coulson–Fischer point [46], the optimal orbitals spontaneously break spin symmetry, which can only be described in the unrestricted formalism. In contrast, in the spin-restricted case the electrons occupy a common set of $N_\alpha = N_\beta$ spatial orbitals. The limitation of the spatial orbitals to be the same for both spins, $\mathbf{C}^\alpha = \mathbf{C}^\beta$, yields less variational freedom, and prevents the correct dissociation of, e.g., the H$_2$ molecule. On the flip side, the spin-restricted formalism affords computational savings over the unrestricted approach. The spin-restricted density matrices, Equation (5), reduce to
$$\mathbf{P}^\alpha = \mathbf{P}^\beta = \frac{1}{2} \mathbf{P},$$
meaning, e.g., that the $\alpha$ and $\beta$ exchange terms in Equation (7) coincide and can be simplified.
Spin-restriction is also possible in the case in which $N_\alpha \neq N_\beta$. In this case, a restricted open-shell method is obtained. Restricted open-shell methods are more involved than the spin-restricted and spin-unrestricted methods discussed in the present work. Restricted open-shell methods have been extensively discussed in [11], to which we refer for further discussion.
6. Self-Consistent Field Equations
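Having chosen to use either spin-restricted or spin-unrestricted orbitals, one can proceed to the minimization of the energy expression in Equation (7). The energy expression depends only on the $\alpha$ and $\beta$ density matrices $\mathbf{P}^\alpha$ and $\mathbf{P}^\beta$ and their sum $\mathbf{P}$. The density matrices, in turn, are determined by the lowest $N_\alpha$ and $N_\beta$ molecular orbitals according to Equation (5). Because the energy expression in Equation (7) thus only depends on the density matrices $\mathbf{P}^\alpha$ and $\mathbf{P}^\beta$, it is expedient to use the chain rule to write, e.g.,
$$\frac{\partial E}{\partial C^\alpha_{\mu i}} = \sum_{\lambda\kappa} \frac{\partial E}{\partial P^\alpha_{\lambda\kappa}} \frac{\partial P^\alpha_{\lambda\kappa}}{\partial C^\alpha_{\mu i}}, \tag{13}$$
where the partial derivative of the density matrix element $P^\alpha_{\lambda\kappa}$ is
$$\frac{\partial P^\alpha_{\lambda\kappa}}{\partial C^\alpha_{\mu i}} = \delta_{\lambda\mu} C^\alpha_{\kappa i} + C^\alpha_{\lambda i} \delta_{\kappa\mu}. \tag{14}$$
The $\alpha$ orbital derivative of the total density matrix has the same form as Equation (14), where all $P^\alpha$ are replaced with $P$. Note that these findings hold even when the same orbitals are used for both $\alpha$ and $\beta$ in a spin-restricted formalism, since the $\alpha$ and $\beta$ orbitals are formally independent.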
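Due to the chain rule, Equation (13), all we need are the density matrix derivatives of the energy expression. We only have to calculate the derivatives of the energy expression of Equation (7) for one spin, as the energy expression is symmetric with respect to the $\alpha$ and $\beta$ densities. It does not matter which spin we choose to be “up”; the expressions for the other spin will follow by symmetry by interchanging $\alpha \leftrightarrow \beta$. The first term of Equation (7) yields simply
$$\frac{\partial E_\mathrm{core}}{\partial P^\alpha_{\mu\nu}} = h_{\mu\nu}. \tag{15}$$
Next, taking the partial derivative with respect to $P^\alpha_{\mu\nu}$ of the Coulomb and exchange terms in Equation (7) results in
$$\frac{\partial E_J}{\partial P^\alpha_{\mu\nu}} = \sum_{\lambda\kappa} (\mu\nu|\lambda\kappa) P_{\lambda\kappa} \equiv J_{\mu\nu}, \tag{16}$$
where $\mathbf{J}$ is known as the Coulomb matrix, and
$$\frac{\partial E_K}{\partial P^\alpha_{\mu\nu}} = -a \sum_{\lambda\kappa} (\mu\lambda|\nu\kappa) P^\alpha_{\lambda\kappa} \equiv -a K^\alpha_{\mu\nu}, \tag{17}$$
where $\mathbf{K}^\sigma$ is the spin-$\sigma$ exchange matrix, respectively. The Coulomb and exchange matrices can be used to rewrite the energy expression in Equation (7) as
$$E = \sum_{\mu\nu} P_{\mu\nu} h_{\mu\nu} + \frac{1}{2} \sum_{\mu\nu} P_{\mu\nu} J_{\mu\nu} - \frac{a}{2} \sum_{\mu\nu} \left( P^\alpha_{\mu\nu} K^\alpha_{\mu\nu} + P^\beta_{\mu\nu} K^\beta_{\mu\nu} \right) + b E_\mathrm{xc}. \tag{18}$$
Note that in contrast to the Coulomb and exact exchange terms, the exchange-correlation term does not undergo simplifications, because the exchange-correlation term is not quadratic in the density matrix, as will be seen later in Section 11. For the time being, we will denote the partial derivative of $E_\mathrm{xc}$ with respect to $P^\sigma_{\mu\nu}$ as
$$\frac{\partial E_\mathrm{xc}}{\partial P^\sigma_{\mu\nu}} \equiv K^{\mathrm{xc};\sigma}_{\mu\nu}, \tag{19}$$
as the full expressions for $\mathbf{K}^{\mathrm{xc};\sigma}$ will be presented in Section 11. Now, collecting the partial derivatives in Equations (15)–(19) gives us the density matrix derivatives of the energy expression as
$$\frac{\partial E}{\partial P^\sigma_{\mu\nu}} = h_{\mu\nu} + J_{\mu\nu} - a K^\sigma_{\mu\nu} + b K^{\mathrm{xc};\sigma}_{\mu\nu} \equiv F^\sigma_{\mu\nu}, \tag{20}$$
where we have identified the Kohn–Sham–Fock matrices $\mathbf{F}^\sigma$, where $\sigma$ denotes $\alpha$ or $\beta$. Because the density matrices defined by Equation (5) are symmetric, the Fock matrices are also symmetric, $F^\sigma_{\mu\nu} = F^\sigma_{\nu\mu}$. Note that since the Fock matrices only depend on the density matrices, they too are invariant to occupied-occupied and virtual-virtual rotations, $\mathbf{F}'^{\sigma} = \mathbf{F}^\sigma$.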
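The statement that the Fock matrix is the density matrix derivative of the energy can be checked numerically. The sketch below is limited to the core, Coulomb and exact-exchange terms (no exchange-correlation functional), uses random model integrals that only mimic the permutational symmetry of real two-electron integrals, and assumes one possible index convention for the Coulomb and exchange contractions; all function names are hypothetical. It compares the analytic Fock matrix with a finite-difference derivative of the energy with respect to individual density matrix elements.

```python
import numpy as np

def coulomb(eri, P):
    """J_mn = sum_ls (mn|ls) P_ls, built from the total density matrix."""
    return np.einsum('mnls,ls->mn', eri, P)

def exchange(eri, P_spin):
    """K_mn = sum_ls (ml|ns) P_ls for a single spin channel."""
    return np.einsum('mlns,ls->mn', eri, P_spin)

def energy(h, eri, P_a, P_b, a=1.0):
    """Core + Coulomb + exact-exchange energy; a is the exchange fraction."""
    P = P_a + P_b
    e_core = np.sum(P * h)
    e_coul = 0.5 * np.sum(P * coulomb(eri, P))
    e_exch = -0.5 * a * (np.sum(P_a * exchange(eri, P_a))
                         + np.sum(P_b * exchange(eri, P_b)))
    return e_core + e_coul + e_exch

def fock(h, eri, P_a, P_b, a=1.0):
    """Spin-up and spin-down Fock matrices, F = h + J - a K."""
    J = coulomb(eri, P_a + P_b)
    return h + J - a * exchange(eri, P_a), h + J - a * exchange(eri, P_b)

def random_symmetric(n, rng):
    A = rng.standard_normal((n, n))
    return 0.5 * (A + A.T)

rng = np.random.default_rng(2)
n = 4
h = random_symmetric(n, rng)
B = np.stack([random_symmetric(n, rng) for _ in range(3)])
eri = np.einsum('qmn,qls->mnls', B, B)     # model integrals, not real ones
P_a, P_b = random_symmetric(n, rng), random_symmetric(n, rng)

F_a, F_b = fock(h, eri, P_a, P_b)

# Central finite difference of E with respect to each element of P_a
step = 1e-4
F_fd = np.zeros((n, n))
for m in range(n):
    for k in range(n):
        dP = np.zeros((n, n))
        dP[m, k] = step
        F_fd[m, k] = (energy(h, eri, P_a + dP, P_b)
                      - energy(h, eri, P_a - dP, P_b)) / (2 * step)

max_error = np.max(np.abs(F_fd - F_a))
print(f"max |F_fd - F_analytic| = {max_error:.2e}")
```

Because this model energy is exactly quadratic in the density matrices, the central difference agrees with the analytic result up to rounding error; with an exchange-correlation term the agreement would only be approximate.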
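Naïvely, one would obtain the orbital derivative of the full energy expression in Equation (7) with Equations (13), (14) and (20) and then set it to zero to yield an equation for the unknown expansion coefficients $\mathbf{C}^\sigma$. However, the molecular orbitals cannot be varied freely: one must make sure that the orbitals stay orthonormal during the variation, as otherwise the Pauli exclusion principle would be violated. For instance, the orthonormality condition for the $\alpha$ electrons is
$$\sum_{\mu\nu} C^\alpha_{\mu i} S_{\mu\nu} C^\alpha_{\nu j} = \delta_{ij}.$$
The way to enforce these conditions is to use Lagrangian multipliers $\epsilon^\sigma_{ij}$. That is, instead of the bare energy expression $E$, we will optimize the Lagrangian
$$L = E - \sum_{ij} \epsilon^\alpha_{ij} \Big( \sum_{\mu\nu} C^\alpha_{\mu i} S_{\mu\nu} C^\alpha_{\nu j} - \delta_{ij} \Big) - \sum_{ij} \epsilon^\beta_{ij} \Big( \sum_{\mu\nu} C^\beta_{\mu i} S_{\mu\nu} C^\beta_{\nu j} - \delta_{ij} \Big), \tag{22}$$
where the sums over $i$ and $j$ run over all orbitals; that is, both the occupied and the virtual ones. We can see from Equation (22) that the matrices of Lagrangian multipliers $\boldsymbol{\epsilon}^\alpha$ and $\boldsymbol{\epsilon}^\beta$ can be chosen to be symmetric. For instance, if $\boldsymbol{\epsilon}^\alpha$ contained a symmetric part $\boldsymbol{\epsilon}^\alpha_\mathrm{s}$ and an antisymmetric part $\boldsymbol{\epsilon}^\alpha_\mathrm{a}$, $\boldsymbol{\epsilon}^\alpha = \boldsymbol{\epsilon}^\alpha_\mathrm{s} + \boldsymbol{\epsilon}^\alpha_\mathrm{a}$, the contribution from the antisymmetric part would vanish because it is multiplied with the orbital overlap that is symmetric.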
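Next, we can calculate $\partial L / \partial C^\alpha_{\mu k}$, where $\partial E / \partial C^\alpha_{\mu k}$ is given by Equations (13), (14), (20) and the derivative of the constraint term is given by
$$\begin{aligned}
\frac{\partial}{\partial C^\alpha_{\mu k}} \sum_{ij} \epsilon^\alpha_{ij} \Big( \sum_{\lambda\kappa} C^\alpha_{\lambda i} S_{\lambda\kappa} C^\alpha_{\kappa j} - \delta_{ij} \Big)
&= \sum_{ij} \epsilon^\alpha_{ij} \sum_{\lambda\kappa} \left( \delta_{\lambda\mu} \delta_{ik} S_{\lambda\kappa} C^\alpha_{\kappa j} + C^\alpha_{\lambda i} S_{\lambda\kappa} \delta_{\kappa\mu} \delta_{jk} \right) \\
&= \sum_{j} \epsilon^\alpha_{kj} \sum_{\kappa} S_{\mu\kappa} C^\alpha_{\kappa j} + \sum_{i} \epsilon^\alpha_{ik} \sum_{\lambda} C^\alpha_{\lambda i} S_{\lambda\mu} \\
&= \sum_{i} \epsilon^\alpha_{ki} \sum_{\lambda} S_{\mu\lambda} C^\alpha_{\lambda i} + \sum_{i} \epsilon^\alpha_{ik} \sum_{\lambda} C^\alpha_{\lambda i} S_{\lambda\mu},
\end{aligned}$$
where on the third line dummy summation indices have been renamed from $j$ to $i$ and $\kappa$ to $\lambda$. The derivative can be evaluated as
$$\frac{\partial L}{\partial C^\alpha_{\mu k}} = 2 \sum_{\lambda} F^\alpha_{\mu\lambda} C^\alpha_{\lambda k} - 2 \sum_{\lambda i} S_{\mu\lambda} C^\alpha_{\lambda i} \epsilon^\alpha_{ik},$$
because $\mathbf{F}^\alpha$, $\mathbf{S}$ and $\boldsymbol{\epsilon}^\alpha$ are symmetric, and dummy summation indices can be renamed.
The optimal orbitals satisfy the stationary condition
$$\frac{\partial L}{\partial C^\sigma_{\mu k}} = 0,$$
from which
$$\sum_{\lambda} F^\sigma_{\mu\lambda} C^\sigma_{\lambda k} = \sum_{\lambda i} S_{\mu\lambda} C^\sigma_{\lambda i} \epsilon^\sigma_{ik}. \tag{25}$$
Equation (25) can thus be written in matrix form as
$$\mathbf{F}^\sigma \mathbf{C}^\sigma = \mathbf{S} \mathbf{C}^\sigma \boldsymbol{\epsilon}^\sigma, \tag{26}$$
where $\boldsymbol{\epsilon}^\sigma$ is the (symmetric) matrix of Lagrangian multipliers.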
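Because $\boldsymbol{\epsilon}^\sigma$ is symmetric, it can be diagonalized and it has real eigenvalues. Let us now assume that $\mathbf{U}^\sigma$ is an orthogonal matrix that diagonalizes $\boldsymbol{\epsilon}^\sigma$,
$$(\mathbf{U}^\sigma)^\mathrm{T} \boldsymbol{\epsilon}^\sigma \mathbf{U}^\sigma = \mathbf{E}^\sigma, \qquad E^\sigma_{ij} = \epsilon^\sigma_i \delta_{ij}, \tag{27}$$
where $\epsilon^\sigma_i$ are the eigenvalues. Re-expressing the orbital coefficients $\mathbf{C}^\sigma$ in terms of a new set of orbitals rotated by $\mathbf{U}^\sigma$ with Equation (10), $\mathbf{C}^\sigma = \mathbf{C}'^{\sigma} (\mathbf{U}^\sigma)^\mathrm{T}$, rewrites Equation (26) in the form
$$\mathbf{F}^\sigma \mathbf{C}'^{\sigma} (\mathbf{U}^\sigma)^\mathrm{T} = \mathbf{S} \mathbf{C}'^{\sigma} (\mathbf{U}^\sigma)^\mathrm{T} \boldsymbol{\epsilon}^\sigma,$$
that can be multiplied from the right by $\mathbf{U}^\sigma$ producing
$$\mathbf{F}^\sigma \mathbf{C}'^{\sigma} = \mathbf{S} \mathbf{C}'^{\sigma} (\mathbf{U}^\sigma)^\mathrm{T} \boldsymbol{\epsilon}^\sigma \mathbf{U}^\sigma = \mathbf{S} \mathbf{C}'^{\sigma} \mathbf{E}^\sigma, \tag{28}$$
where, according to Equation (27), $\mathbf{E}^\sigma$ is a diagonal matrix with elements $\epsilon^\sigma_i$.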
Equation (28) is almost what we want (an equation in the rotated basis that looks like Equation (26) with a diagonal $\boldsymbol{\epsilon}^\sigma$), but one problem remains: the Kohn–Sham–Fock matrix is still the one corresponding to the original orbitals $\mathbf{C}^\sigma$ instead of the transformed orbitals $\mathbf{C}'^{\sigma}$, while the orbital rotation by $\mathbf{U}^\sigma$ that takes us from $\mathbf{C}^\sigma$ to $\mathbf{C}'^{\sigma}$ may lead to a different Kohn–Sham–Fock matrix $\mathbf{F}'^{\sigma}$. However, if we choose the form of $\mathbf{U}^\sigma$ such that the occupied-virtual (ov) and virtual-occupied (vo) blocks vanish,
$$\mathbf{U}^\sigma = \begin{pmatrix} \mathbf{U}^\sigma_\mathrm{oo} & \mathbf{0} \\ \mathbf{0} & \mathbf{U}^\sigma_\mathrm{vv} \end{pmatrix},$$
then $\mathbf{U}^\sigma$ only rotates occupied orbitals with occupied orbitals and virtual orbitals with virtual orbitals, meaning that the orbital rotation does not change the density matrix given in Equation (11). Then, the Fock matrix corresponding to the rotated orbitals coincides with that of the original orbitals, $\mathbf{F}'^{\sigma} = \mathbf{F}^\sigma$, completing the proof that $\boldsymbol{\epsilon}^\sigma$ can be chosen to be diagonal. (Occupied-virtual rotations $\mathbf{U}^\sigma_\mathrm{ov} \neq \mathbf{0}$ or $\mathbf{U}^\sigma_\mathrm{vo} \neq \mathbf{0}$, discussed in more detail in Section 9, are in fact here forbidden: the SCF equations were derived with the assumption that the energy is stationary, but this condition would instantly be violated by such rotations.)
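We have thus obtained the Berthier–Pople–Nesbet [8,9,13,16] equations for the orbital coefficients
$$\mathbf{F}^\alpha \mathbf{C}^\alpha = \mathbf{S} \mathbf{C}^\alpha \mathbf{E}^\alpha, \qquad \mathbf{F}^\beta \mathbf{C}^\beta = \mathbf{S} \mathbf{C}^\beta \mathbf{E}^\beta, \tag{30}$$
where the primes have become unnecessary and have been omitted for simplicity. The elements of the Kohn–Sham–Fock matrices $\mathbf{F}^\alpha$ and $\mathbf{F}^\beta$ are given by Equation (20), and the orbital energy matrices $\mathbf{E}^\alpha$ and $\mathbf{E}^\beta$ are diagonal. In the spin-restricted case [5,16] the $\alpha$ and $\beta$ molecular orbitals coincide, leading to identical density matrices $\mathbf{P}^\alpha$ and $\mathbf{P}^\beta$, and identical Fock matrices $\mathbf{F}^\alpha$ and $\mathbf{F}^\beta$. In this case, the SCF equations simplify to the Roothaan–Hall form
$$\mathbf{F} \mathbf{C} = \mathbf{S} \mathbf{C} \mathbf{E}, \tag{31}$$
which was already mentioned in the Introduction.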
7. Solution of Self-Consistent Field Equations
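The Roothaan–Hall and Berthier–Pople–Nesbet expressions take the form of a generalized eigenvalue equation. The conventional way to solve these equations is to re-express the (unknown) orbital coefficients in terms of a matrix $\mathbf{X}$ as
$$\mathbf{C} = \mathbf{X} \tilde{\mathbf{C}}. \tag{32}$$
Inserting Equation (32) into the Roothaan–Hall equation, Equation (31), yields
$$\mathbf{F} \mathbf{X} \tilde{\mathbf{C}} = \mathbf{S} \mathbf{X} \tilde{\mathbf{C}} \mathbf{E},$$
which can be multiplied from the left with $\mathbf{X}^\mathrm{T}$ to yield
$$\mathbf{X}^\mathrm{T} \mathbf{F} \mathbf{X} \tilde{\mathbf{C}} = \mathbf{X}^\mathrm{T} \mathbf{S} \mathbf{X} \tilde{\mathbf{C}} \mathbf{E}.$$
This means that the orbital transform of Equation (32) yields a new generalized eigenvalue equation
$$\tilde{\mathbf{F}} \tilde{\mathbf{C}} = \tilde{\mathbf{S}} \tilde{\mathbf{C}} \mathbf{E}, \tag{33}$$
where $\tilde{\mathbf{F}} = \mathbf{X}^\mathrm{T} \mathbf{F} \mathbf{X}$ and $\tilde{\mathbf{S}} = \mathbf{X}^\mathrm{T} \mathbf{S} \mathbf{X}$. Now, if we choose $\mathbf{X}$ in such a way that $\tilde{\mathbf{S}} = \mathbf{1}$, Equation (33) reduces to a normal eigenvalue equation
$$\tilde{\mathbf{F}} \tilde{\mathbf{C}} = \tilde{\mathbf{C}} \mathbf{E}, \tag{34}$$
which can be solved with standard techniques. Then, the wanted orbital coefficients $\mathbf{C}$ can be calculated from $\tilde{\mathbf{C}}$ using Equation (32).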
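If the basis set is well-conditioned, the matrix $\mathbf{X}$ can be chosen as
$$\mathbf{X} = \mathbf{S}^{-1/2} = \mathbf{V} \mathbf{s}^{-1/2} \mathbf{V}^\mathrm{T}, \tag{35}$$
where $\mathbf{V}$ and $\mathbf{s}$ are the eigenvectors and eigenvalues of
$$\mathbf{S} \mathbf{V} = \mathbf{V} \mathbf{s}.$$
This procedure is known as symmetric orthogonalization [7].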
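The reduction of the generalized eigenvalue problem to a normal one via symmetric orthogonalization can be sketched in a few lines of NumPy. The Fock and overlap matrices below are random, hypothetical stand-ins, and the function name is chosen for this illustration only; production codes typically call a dedicated generalized eigensolver or reuse the orthogonalizer across iterations.

```python
import numpy as np

def solve_generalized(F, S):
    """Solve F C = S C E with the symmetric orthogonalizer X = S^(-1/2)."""
    s, V = np.linalg.eigh(S)
    X = V @ np.diag(s ** -0.5) @ V.T
    F_tilde = X.T @ F @ X                     # Fock matrix in the orthonormal basis
    eps, C_tilde = np.linalg.eigh(F_tilde)    # normal eigenvalue problem
    return eps, X @ C_tilde                   # back-transform the eigenvectors

rng = np.random.default_rng(4)
n = 5
B = np.eye(n) + 0.1 * rng.standard_normal((n, n))
S = B.T @ B                                   # model overlap, positive definite
A = rng.standard_normal((n, n))
F = 0.5 * (A + A.T)                           # model symmetric Fock matrix

eps, C = solve_generalized(F, S)
residual = np.max(np.abs(F @ C - S @ C @ np.diag(eps)))
orthonormality_error = np.max(np.abs(C.T @ S @ C - np.eye(n)))
print(residual, orthonormality_error)
```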
However, if a large LCAO basis is used, the atomic orbital basis functions centered on different atoms may generate linear dependencies in the basis, making the basis set expansion ambiguous. These linear dependencies can be removed with the “canonical” orthonormalization procedure [47], in which
$$X_{\mu i} = \frac{V_{\mu i}}{\sqrt{s_i}}, \tag{37}$$
where only those eigenvectors $i$ with large enough eigenvalues $s_i \geq s_\mathrm{thr}$ are included. The threshold $s_\mathrm{thr}$
is typically of the order of
, and its value may have a noticeable effect on, e.g., the absolute energies that result from an SCF calculation; relative energies, however, should be less sensitive to $s_\mathrm{thr}$. If no eigenvalues fall under the threshold $s_\mathrm{thr}$, as is the case for a well-conditioned basis set, the symmetric and canonical orthogonalization approaches become equivalent for the purposes of SCF calculations: both yield an orthonormal basis of the same size, which will yield the same variational ground state energy.
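The removal of linear dependencies by canonical orthogonalization can be illustrated with a deliberately ill-conditioned model basis. In the sketch below, four unit vectors stand in for basis functions, with the fourth almost parallel to the first; the default threshold of 1e-6 is an assumption made for this example only and is not a recommendation taken from the text.

```python
import numpy as np

def canonical_orthogonalizer(S, threshold=1e-6):
    """Keep eigenvectors of S whose eigenvalue exceeds the (assumed) threshold."""
    s, V = np.linalg.eigh(S)
    keep = s > threshold
    return V[:, keep] / np.sqrt(s[keep])

# Model "basis functions": the last one nearly duplicates the first
e = np.eye(4)
tilted = e[0] + 1e-4 * e[3]
functions = np.column_stack([e[0], e[1], e[2], tilted / np.linalg.norm(tilted)])
S = functions.T @ functions                   # overlap (Gram) matrix

X = canonical_orthogonalizer(S)
n_kept = X.shape[1]
smallest_eigenvalue = np.linalg.eigvalsh(S)[0]
print(f"smallest overlap eigenvalue: {smallest_eigenvalue:.2e}, kept {n_kept} of 4")
```

The resulting orthogonalizer is rectangular, so the transformed Fock matrix is smaller than the original one, which is how the near-redundant direction is dropped from the variational space.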
Unnormalized basis sets can also be handled easily by the orthogonalization procedure. Although in principle it is not necessary to normalize the individual basis functions before obtaining an orthonormal basis by Equations (35) and (37), computer linear algebra packages may fail to find the eigenvalues and eigenvectors in a reliable fashion if the basis functions have pronouncedly different norms. Moreover, missing normalization of the basis set affects the eigenvalues, which has repercussions for canonical orthogonalization. These issues can be circumvented by normalizing the overlap matrix
$$\bar{S}_{\mu\nu} = d_\mu S_{\mu\nu} d_\nu,$$
where $d_\mu = 1/\sqrt{S_{\mu\mu}}$, before using Equations (35) and (37) [3,4]. The orthogonalizer for the unnormalized basis set is obtained as $X_{\mu i} = d_\mu \bar{X}_{\mu i}$, where $\bar{\mathbf{X}}$ is the orthogonalizer computed from $\bar{\mathbf{S}}$; it is easy to see that this satisfies the necessary condition $\mathbf{X}^\mathrm{T} \mathbf{S} \mathbf{X} = \mathbf{1}$, even though the symmetry of $\mathbf{X}$ for the case of Equation (35) will be lost.
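The normalization step can be sketched as follows. The model overlap matrix is built from hypothetical basis functions whose norms span several orders of magnitude; the variable names are illustrative assumptions. The sketch rescales the overlap matrix to unit diagonal, builds the symmetric orthogonalizer for the rescaled matrix, and then scales its rows to obtain an orthogonalizer for the original, unnormalized basis.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
B = np.eye(n) + 0.1 * rng.standard_normal((n, n))
norms = 10.0 ** np.arange(-2, 3)              # hypothetical, badly scaled norms
S = (B.T @ B) * np.outer(norms, norms)        # overlap of unnormalized functions

# Rescale to unit diagonal
d = 1.0 / np.sqrt(np.diag(S))
S_bar = S * np.outer(d, d)

# Symmetric orthogonalizer of the normalized overlap matrix
s, V = np.linalg.eigh(S_bar)
X_bar = V @ np.diag(s ** -0.5) @ V.T

# Orthogonalizer for the original basis: scale the rows by d
X = d[:, None] * X_bar

orthonormality_error = np.max(np.abs(X.T @ S @ X - np.eye(n)))
print(f"max |X^T S X - 1| = {orthonormality_error:.2e}")
```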
Even if $\mathbf{S}$ has been properly normalized, the use of the symmetric or canonical orthogonalization procedures still requires that the diagonalization of $\mathbf{S}$ is numerically stable. However, whenever a large number of linear dependencies exists in the basis set (e.g., a large number of diffuse functions are used or two nuclei are close together), $\mathbf{S}$ may become so ill-conditioned that it cannot be accurately diagonalized. In such cases it is possible to reduce the size of the basis set without losing a significant amount of accuracy by an automatic procedure, see [48,49] for details.
8. Why Does the Self-Consistent Field Method Minimize the Energy?
The SCF equations, Equation (30) or Equation (31), offer a way to solve for the molecular orbitals described by $\mathbf{C}$ from a Kohn–Sham–Fock matrix $\mathbf{F}$ by finding its eigenvectors from Equation (34). However, the Kohn–Sham–Fock matrix depends on the density matrices, which are built from the molecular orbitals according to the Aufbau principle. In the SCF procedure, one tries to find a self-consistent solution: $\mathbf{C}$ yields $\mathbf{F}$, whose eigenvectors are $\mathbf{C}$. The procedure starts from an initial guess for the orbitals $\mathbf{C}$ or the density matrices $\mathbf{P}^\sigma$; such guesses have been recently reviewed and benchmarked in [50], to which we refer for further details.
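Why does the self-consistent field procedure (diagonalizing $\mathbf{F}$ to update the orbital coefficients $\mathbf{C}$) correspond to minimization of the Hartree–Fock/Kohn–Sham energy? For simplicity, let us examine the case of HF theory. The energy expression, Equation (18), can be written in this case ($a = 1$, $b = 0$) as
$$E = \sum_{\mu\nu} P_{\mu\nu} h_{\mu\nu} + \frac{1}{2} \sum_{\mu\nu} P_{\mu\nu} J_{\mu\nu} - \frac{1}{2} \sum_{\mu\nu} \left( P^\alpha_{\mu\nu} K^\alpha_{\mu\nu} + P^\beta_{\mu\nu} K^\beta_{\mu\nu} \right). \tag{38}$$
The Fock matrix elements, Equation (20), are given by
$$F^\alpha_{\mu\nu} = h_{\mu\nu} + J_{\mu\nu} - K^\alpha_{\mu\nu}, \tag{39}$$
$$F^\beta_{\mu\nu} = h_{\mu\nu} + J_{\mu\nu} - K^\beta_{\mu\nu}. \tag{40}$$
Equation (38) can be rewritten with Equations (39) and (40) as
$$E = \frac{1}{2} \sum_{\mu\nu} P^\alpha_{\mu\nu} \left( h_{\mu\nu} + F^\alpha_{\mu\nu} \right) + \frac{1}{2} \sum_{\mu\nu} P^\beta_{\mu\nu} \left( h_{\mu\nu} + F^\beta_{\mu\nu} \right). \tag{41}$$
Expanding the density matrices using Equation (5) we see that Equation (41) can be written as
$$E = \frac{1}{2} \sum_{i=1}^{N_\alpha} \left( h^\alpha_{ii} + F^\alpha_{ii} \right) + \frac{1}{2} \sum_{i=1}^{N_\beta} \left( h^\beta_{ii} + F^\beta_{ii} \right), \tag{42}$$
where the core Hamiltonian and Fock matrices have been written in the molecular orbital basis, $h^\sigma_{ij} = \sum_{\mu\nu} C^\sigma_{\mu i} h_{\mu\nu} C^\sigma_{\nu j}$ and $F^\sigma_{ij} = \sum_{\mu\nu} C^\sigma_{\mu i} F^\sigma_{\mu\nu} C^\sigma_{\nu j}$.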
If one were to start the calculation from the core guess, then $\sum_i h^\alpha_{ii}$ and $\sum_i h^\beta_{ii}$ would be minimized. However, as discussed in [50], this is a very poor choice as it completely disregards electronic repulsion effects, meaning that the $\sum_i F^\alpha_{ii}$ and $\sum_i F^\beta_{ii}$ terms are far from optimal. The Roothaan step (obtaining new molecular orbitals by diagonalization of the Fock matrix) results in a minimization of the $\sum_i F^\alpha_{ii}$ and $\sum_i F^\beta_{ii}$ terms, as after diagonalization only the lowest orbitals become populated and the sum thus runs only over the lowest eigenvalues $\epsilon^\sigma_i$. After the update, $\sum_i h^\alpha_{ii}$ and $\sum_i h^\beta_{ii}$ no longer yield their lowest possible values. However, the increase in the value of $\sum_i h^\sigma_{ii}$ should be much smaller than the decrease in the value of $\sum_i F^\sigma_{ii}$, as the Fock matrices $\mathbf{F}^\alpha$ and $\mathbf{F}^\beta$ also contain the core Hamiltonian. It is thus seen that Roothaan’s self-consistent field method, that is, the iterative diagonalization of the Fock matrix, minimizes the energy.
However, the minimization is only valid for a fixed potential
in which the electrons are moving. When the orbitals are changed—as happens when
is made diagonal and its lowest eigenvectors occupied—a new Fock matrix
must be built and a new
constructed: the potential also changes with the electron density. If the orbitals were far from their optimal values,
and therefore
may change quite radically by the orbital update. This means that even though
was made diagonal in the previous iteration, it is no longer diagonally dominant after it has been updated. Indeed, the straightforward iterative diagonalization procedure often fails to converge for all but the simplest systems, because the density tends to undergo large oscillations in the naïve self-consistency cycle. To make the method usable, the convergence of the fixed-point problem of finding a $\mathbf{C}$ that generates an $\mathbf{F}$ that in turn generates the same $\mathbf{C}$ must be stabilized or accelerated in some way. This can be achieved, e.g., by damping [
51,
52], level shifts [
53,
54,
55], or extrapolation [
56,
57,
58,
59]. Fractional occupations can also be used in the initial iterations to aid convergence [
60].
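The self-consistency cycle with damping can be sketched for a closed-shell (spin-restricted) HF calculation as follows. This is a toy illustration under several assumptions: the one- and two-electron integrals are small random model quantities rather than molecular integrals, the damping factor of 0.5 is an arbitrary choice, the core guess is used only because electron repulsion is weak in this model, and the function and variable names are hypothetical. Convergence is monitored with the commutator of the Fock and density matrices, which vanishes at self-consistency.

```python
import numpy as np

def rhf_scf(h, eri, S, n_occ, damping=0.5, tol=1e-8, max_iter=500):
    """Closed-shell HF by repeated diagonalization with density damping."""
    s, V = np.linalg.eigh(S)
    X = V @ np.diag(s ** -0.5) @ V.T          # symmetric orthogonalizer

    def aufbau_density(F):
        _, C_tilde = np.linalg.eigh(X.T @ F @ X)
        C_occ = (X @ C_tilde)[:, :n_occ]      # occupy the lowest n_occ orbitals
        return 2.0 * C_occ @ C_occ.T          # two electrons per spatial orbital

    P = aufbau_density(h)                     # core guess
    for iteration in range(1, max_iter + 1):
        J = np.einsum('mnls,ls->mn', eri, P)
        K = np.einsum('mlns,ls->mn', eri, P)
        F = h + J - 0.5 * K                   # restricted Fock matrix
        E = 0.5 * np.sum(P * (h + F))
        residual = np.max(np.abs(F @ P @ S - S @ P @ F))
        if residual < tol:
            return E, P, F, iteration
        # Damping: mix the new density with the previous one
        P = (1.0 - damping) * aufbau_density(F) + damping * P
    raise RuntimeError("SCF did not converge")

rng = np.random.default_rng(5)
n_bf, n_occ = 4, 2
h = np.diag([-2.0, -1.0, 0.5, 1.5])           # model core Hamiltonian
B = 0.1 * rng.standard_normal((3, n_bf, n_bf))
B = 0.5 * (B + B.transpose(0, 2, 1))
eri = np.einsum('qmn,qls->mnls', B, B)        # weak model electron repulsion
A = 0.02 * rng.standard_normal((n_bf, n_bf))
S = np.eye(n_bf) + 0.5 * (A + A.T)            # nearly orthonormal model basis

E, P, F, n_iter = rhf_scf(h, eri, S, n_occ)
commutator = np.max(np.abs(F @ P @ S - S @ P @ F))
print(f"E = {E:.8f} after {n_iter} iterations")
```

For real molecules the repulsion is much stronger than in this model, which is why better initial guesses and the acceleration techniques cited above are needed in practice.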
The argument for why density functional calculations converge similarly to HF with the iterative Roothaan procedure is somewhat less obvious because, unlike in HF, the exchange-correlation functional is not generally quadratic in the density. However, the total energy expression $E$ is also approximately quadratic in DFT when one is sufficiently close to an extremal point, as is easily seen from a Taylor expansion of Equation (7). In practice, the iterative procedure also works well for DFT, whose contributions to the Kohn–Sham–Fock matrix we will discuss in
Section 11.
9. Direct Minimization of the Energy
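Instead of solving the orbitals from the SCF equations, which were obtained in Section 6 from the stationary condition for the energy under the constraint of orthonormal orbitals, the orbitals can also be optimized by a direct minimization of the energy. As was discussed in Section 4, the energy expression of Equation (7) is invariant to occupied-occupied and virtual-virtual rotations. This means that if we have $N^\sigma_\mathrm{o}$ occupied orbitals and $N^\sigma_\mathrm{v}$ virtual orbitals from some initial guess (see possible choices in [50]) for spin $\sigma$, we can consider the energy as a function of a set of $N^\sigma_\mathrm{o} N^\sigma_\mathrm{v}$ rotation angles [44] by examining a rotation of the orbitals via Equation (10) by an orthogonal matrix
$$\mathbf{U}^\sigma = \exp \begin{pmatrix} \mathbf{0} & -(\boldsymbol{\theta}^\sigma)^\mathrm{T} \\ \boldsymbol{\theta}^\sigma & \mathbf{0} \end{pmatrix}, \tag{43}$$
where $\boldsymbol{\theta}^\sigma$ is an $N^\sigma_\mathrm{v} \times N^\sigma_\mathrm{o}$ matrix containing the rotation angles. The rotation matrix determined by Equation (43) reduces to an identity matrix for vanishing rotation parameters, $\boldsymbol{\theta}^\sigma = \mathbf{0}$. Because the rotation matrix of Equation (43) is orthogonal, it automatically preserves the orthonormality of the orbitals, and special tricks such as Lagrangian multipliers are not needed to enforce this behavior.
The change in the density matrix is given by
$$\frac{\partial P^\sigma_{\mu\nu}}{\partial \theta^\sigma_{ai}} = \sum_{\lambda j} \frac{\partial P^\sigma_{\mu\nu}}{\partial C^\sigma_{\lambda j}} \frac{\partial C^\sigma_{\lambda j}}{\partial \theta^\sigma_{ai}},$$
where $a$ labels a virtual orbital and $i$ an occupied orbital. How do the orbital coefficients change? Remembering that the first $N^\sigma_\mathrm{o}$ orbitals are occupied, and the rest are virtual, we can write $\mathbf{C}^\sigma = (\mathbf{C}^\sigma_\mathrm{o} \; \mathbf{C}^\sigma_\mathrm{v})$. After an infinitesimal rotation $\boldsymbol{\theta}^\sigma$, the occupied orbitals change into $\mathbf{C}^\sigma_\mathrm{o} + \mathbf{C}^\sigma_\mathrm{v} \boldsymbol{\theta}^\sigma$, that is
$$C'^{\sigma}_{\mu i} = C^\sigma_{\mu i} + \sum_a C^\sigma_{\mu a} \theta^\sigma_{ai},$$
from which $\partial C^\sigma_{\mu j} / \partial \theta^\sigma_{ai} = C^\sigma_{\mu a} \delta_{ij}$. Now the gradient of the energy with respect to rotation of the current set of orbitals can be obtained as
$$\frac{\partial E}{\partial \theta^\sigma_{ai}} = \sum_{\mu\nu} \frac{\partial E}{\partial P^\sigma_{\mu\nu}} \frac{\partial P^\sigma_{\mu\nu}}{\partial \theta^\sigma_{ai}} = 2 \sum_{\mu\nu} C^\sigma_{\mu a} F^\sigma_{\mu\nu} C^\sigma_{\nu i} = 2 F^\sigma_{ai}, \tag{46}$$
where $F^\sigma_{ai}$ is the Fock matrix in the MO basis. Direct minimization of Equation (7) can then be pursued using Equation (46) with, e.g., gradient descent methods. However, a proper preconditioning of the search direction is essential in order for the algorithm to be usable; see, e.g., the geometric direct minimization method described in [45]. Many other direct minimization methods for the HF or DFT energy have also been proposed, and we refer the interested reader to the vast existing literature that cannot be comprehensively cited here.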
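The relation between the orbital rotation gradient and the occupied-virtual block of the Fock matrix can be verified numerically. The sketch below makes several assumptions that are not fixed by the text: an orthonormal basis, HF without an exchange-correlation term, random model integrals, rotation of the spin-up orbitals only, and a sign convention in which the rotation angles are stored as a (virtual x occupied) block in the lower left corner of the antisymmetric generator. With these choices the gradient is twice the virtual-occupied block of the spin-up Fock matrix in the MO basis, which is compared against central finite differences.

```python
import numpy as np

def energy(h, eri, P_a, P_b):
    """HF energy as a function of the two spin density matrices."""
    P = P_a + P_b
    J = np.einsum('mnls,ls->mn', eri, P)
    K_a = np.einsum('mlns,ls->mn', eri, P_a)
    K_b = np.einsum('mlns,ls->mn', eri, P_b)
    return (np.sum(P * h) + 0.5 * np.sum(P * J)
            - 0.5 * (np.sum(P_a * K_a) + np.sum(P_b * K_b)))

def rotation(theta):
    """Orthogonal matrix exp(K) for angles theta of shape (n_virt, n_occ)."""
    n_virt, n_occ = theta.shape
    K = np.zeros((n_virt + n_occ, n_virt + n_occ))
    K[n_occ:, :n_occ] = theta
    K[:n_occ, n_occ:] = -theta.T
    w, V = np.linalg.eigh(1j * K)             # 1j*K is Hermitian
    return (V @ np.diag(np.exp(-1j * w)) @ V.conj().T).real

rng = np.random.default_rng(6)
n, n_occ = 5, 2
A = rng.standard_normal((n, n))
h = 0.5 * (A + A.T)
B = 0.3 * rng.standard_normal((3, n, n))
B = 0.5 * (B + B.transpose(0, 2, 1))
eri = np.einsum('qmn,qls->mnls', B, B)        # model integrals, not real ones
C_a, _ = np.linalg.qr(rng.standard_normal((n, n)))
C_b, _ = np.linalg.qr(rng.standard_normal((n, n)))
P_b = C_b[:, :n_occ] @ C_b[:, :n_occ].T

def energy_of_theta(theta):
    C_occ = (C_a @ rotation(theta))[:, :n_occ]
    return energy(h, eri, C_occ @ C_occ.T, P_b)

# Analytic gradient: twice the virtual-occupied Fock block in the MO basis
P_a = C_a[:, :n_occ] @ C_a[:, :n_occ].T
F_a = (h + np.einsum('mnls,ls->mn', eri, P_a + P_b)
       - np.einsum('mlns,ls->mn', eri, P_a))
grad_analytic = 2.0 * C_a[:, n_occ:].T @ F_a @ C_a[:, :n_occ]

# Numerical gradient by central differences
step = 1e-5
grad_numeric = np.zeros_like(grad_analytic)
for a in range(n - n_occ):
    for i in range(n_occ):
        theta = np.zeros((n - n_occ, n_occ))
        theta[a, i] = step
        grad_numeric[a, i] = (energy_of_theta(theta)
                              - energy_of_theta(-theta)) / (2 * step)

max_error = np.max(np.abs(grad_numeric - grad_analytic))
print(f"max gradient error: {max_error:.2e}")
```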
10. SCF vs. Direct Minimization
Having described two alternative ways for solving the orbitals, we can discuss their advantages and disadvantages. The self-consistent field method is hard to beat for systems where convergence is straightforward: a suitably stabilized and accelerated SCF procedure often converges within 10 to 20 iterations when a suitable initial guess (see [
50]) has been provided. However, when the gap between the highest occupied and lowest unoccupied orbital is small, which commonly occurs in, e.g., first-row transition metal complexes, the SCF procedure may become extremely slow to converge, oscillate between two or more solutions, converge to a higher-lying solution, or even to a saddle-point solution. Namely, it is critically important to realize that even if the orbital gradient vanishes or, equivalently, the SCF equations are fulfilled, this does not mean that the energy expression Equation (
7) truly has been minimized. Because there are typically several occupied as well as virtual orbitals, the minimization problem involves a large number of degrees of freedom. In multivariate calculus, a vanishing gradient only means that the orbitals correspond to some kind of stationary point of the energy: a local minimum, a saddle point, or even a local maximum, although the last is highly improbable in SCF calculations.
In contrast to the sometimes erratic behavior of the SCF method, direct minimization based on orbital rotations is guaranteed to converge onto a stationary point per the theory of numerical analysis; this is of great value when studying systems with complicated electronic structures for which conventional SCF algorithms fail. However, more predictable convergence does not come for free: direct minimization methods carry a higher computational cost due to, e.g., the use of line searches in the orbital optimization. Direct minimization methods can also be formulated at second order, yielding more robust convergence to a local minimum at the cost of more computational resources per iteration; see, e.g., [61,62,63]. Because direct minimization methods are based on an explicit rotation of the orbitals, they always follow the same solution, in contrast to SCF methods, where the orbital occupations are typically reset at every iteration according to the Aufbau principle. Because of this, direct minimization can lead to a solution where the Aufbau rule is violated, that is, where the highest occupied orbital lies higher in energy than the lowest unoccupied orbital. Direct minimization methods can also be straightforwardly applied in electronic structure theories that are more complicated than self-consistent field theory. Such theories may, in particular, depend explicitly on the molecular orbitals, as discussed by one of the present authors in [31,64,65] for the Perdew–Zunger self-interaction correction [66], which depends explicitly on the occupied orbitals, and in [67] for the perfect quadruples [68] and perfect hextuples [69] models, which also depend on the corresponding virtual orbitals.
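The idea of direct minimization by orbital rotations with a line search can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the energy is a toy sum of expectation values of a fixed symmetric matrix `H` rather than a real SCF energy, the function names are hypothetical, the search direction is plain steepest descent, and the rotation is built with a Cayley transform, which is swapped in here for the matrix exponential because it is also exactly orthogonal and agrees with it to second order.

```python
import numpy as np

def occupied_energy(C, H, n_occ):
    """Toy energy: sum of <i|H|i> over the occupied orbitals (first columns of C)."""
    C_occ = C[:, :n_occ]
    return np.trace(C_occ.T @ H @ C_occ)

def rotation_gradient(C, H, n_occ):
    """Gradient with respect to virtual-occupied rotation parameters at zero rotation."""
    h_mo = C.T @ H @ C
    return 2.0 * h_mo[n_occ:, :n_occ]

def rotate(C, step, n_occ):
    """Rotate the orbitals with an orthogonal matrix built from the block `step`."""
    n = C.shape[0]
    kappa = np.zeros((n, n))
    kappa[n_occ:, :n_occ] = step       # virtual-occupied block
    kappa[:n_occ, n_occ:] = -step.T    # antisymmetric counterpart
    eye = np.eye(n)
    # Cayley transform: orthogonal for any antisymmetric kappa
    U = np.linalg.solve(eye - 0.5 * kappa, eye + 0.5 * kappa)
    return C @ U

def direct_minimize(H, C0, n_occ, tol=1e-6, max_iter=2000):
    """Steepest descent on the rotation parameters with a backtracking line search."""
    C = C0.copy()
    E = occupied_energy(C, H, n_occ)
    for _ in range(max_iter):
        G = rotation_gradient(C, H, n_occ)
        g2 = np.sum(G * G)
        if np.sqrt(g2) < tol:
            break
        s = 1.0
        while True:
            C_trial = rotate(C, -s * G, n_occ)
            E_trial = occupied_energy(C_trial, H, n_occ)
            # Armijo sufficient-decrease condition; give up shrinking at tiny steps
            if E_trial <= E - 1e-4 * s * g2 or s < 1e-12:
                break
            s *= 0.5
        C, E = C_trial, E_trial
    return C, E

def toy_problem(n=6, seed=0):
    """Hypothetical test matrix and a random orthonormal starting guess."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    H = np.diag(np.arange(n, dtype=float)) + 0.15 * (A + A.T)
    C0, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return H, C0
```

Because the same set of orbitals is rotated continuously, the occupations are never reassigned during the optimization, which is the property contrasted with Aufbau-based SCF above. The extra energy evaluations inside the line search are where the additional cost comes from.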
In order to check the character of the stationary point found by the SCF procedure or a direct minimization method, it is necessary to continue the analysis to second-order changes in the energy with respect to the orbital rotations by finding the lowest eigenvalue of the Hessian matrix: if it is negative, rotating the orbitals in the direction of the corresponding eigenvector will result in a further decrease of the energy. Whenever post-HF calculations are performed, or benchmark-quality values are sought at the SCF level, stability analysis [70,71] should be used to verify that the wave function indeed corresponds to a local minimum. Alternatively, trust-region methods [72,73,74] can be employed to ensure that the orbitals converge onto a true local minimum.
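The logic of such a second-order check can be sketched on a small model problem. The two-parameter `model_energy` below is a hypothetical stand-in for the energy as a function of orbital rotation parameters, and the finite-difference Hessian is an assumption made to keep the sketch short; production codes work with analytic Hessians or Hessian-vector products instead.

```python
import numpy as np

def numerical_hessian(energy, x, h=1e-3):
    """Central finite-difference Hessian of `energy` at the point `x`."""
    n = x.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            hess[i, j] = (energy(x + ei + ej) - energy(x + ei - ej)
                          - energy(x - ei + ej) + energy(x - ei - ej)) / (4.0 * h * h)
    return 0.5 * (hess + hess.T)  # remove round-off asymmetry

def stability_check(energy, x, h=1e-3):
    """Return the lowest Hessian eigenvalue and the corresponding eigenvector."""
    eigvals, eigvecs = np.linalg.eigh(numerical_hessian(energy, x, h))
    return eigvals[0], eigvecs[:, 0]

def model_energy(x):
    """Hypothetical model with a saddle point at the origin (zero gradient there)."""
    return x[0] ** 2 - x[1] ** 2 + 0.25 * x[1] ** 4

x_saddle = np.zeros(2)
lowest, mode = stability_check(model_energy, x_saddle)
if lowest < 0.0:
    # Unstable: a small step along the eigenvector lowers the energy
    x_lower = x_saddle + 0.1 * mode
```

In this model the gradient vanishes at the origin, yet the negative eigenvalue reveals that the point is a saddle. Restarting the optimization from the displaced point is the usual follow-up.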
As always in the minimization of multivariate functions, locating the global minimum is difficult, and typically the best one can hope for is to find a local minimum. Some systems permit several local electronic minima: for instance, charge transfer complexes may allow both a neutral (X ⋯ Y) as well as an ionic (X⁺ ⋯ Y⁻) solution. Finding such physically motivated solutions is often straightforward by suitable manipulations of the initial guess, for instance, by constructing guesses via the superposition of atomic potentials [75,76] with the correct atomic charges. Sometimes it may also be interesting to locate saddle-point solutions, which have physical interpretations as excited states. Specific excited states can be explored within the SCF approach by replacing the Aufbau population of the orbitals with overlap criteria [77,78], or with direct minimization by replacing the energy with the square of the gradient [79]; for instance, such an approach has been recently shown to predict highly accurate core spectra [80]. The full space of SCF solutions can be explored via, e.g., metadynamics [81].
11. Density Functional Contributions to Kohn–Sham–Fock Matrix
In Section 6 we derived expressions for the Kohn–Sham–Fock matrix elements for all but the density functional contribution, which we will consider next. Hundreds of density functionals of various forms have been published in the literature in recent decades [82], and offering a comprehensive selection thereof poses a considerable challenge to quantum chemistry software developers. This problem is further exacerbated by the need to keep up with the several new functionals still being published every year. Moreover, density functionals typically have extremely complicated functional forms, making their correct implementation painstaking work. The implementation is made even more difficult by the need to compute the first derivatives of the functional for the SCF procedure, as well as several higher-order ones for, e.g., the calculation of various properties.
Fortunately, these challenges have been obviated by freely available, portable standard implementations such as LibXC [83] and XCFun [84]. The LibXC software package strives to implement all DFT functionals published in the literature, and provides a uniform interface to ∼500 functionals of various forms. At present, LibXC is used by ∼30 electronic structure programs based on various numerical representations that range from basis set approaches (Gaussian-type orbitals, Slater-type orbitals, numerical atomic orbitals, finite elements, plane waves) to finite difference procedures. New functionals only have to be added once to LibXC, which makes the library easy to keep up to date; they then become available to all programs that support the corresponding rung of functionals on Jacob’s ladder [28]. Next, we will derive the equations necessary to implement functionals of the various rungs in the variational basis set approach.
11.1. LDA Functionals
The simplest density functional approximations (DFAs), belonging to the first rung of Jacob’s Ladder [28], are generally referred to as local (spin) density approximations (LDAs). These are functionals of only the electron density [13], such as the LDA exchange functional [85,86]. Assuming f has the form of Equation (48), the resulting contribution to the Kohn–Sham–Fock matrix can be evaluated using Equation (5) for the densities at point r as in [16], with the β-spin matrix having an analogous expression. Note that if the integral is evaluated using numerical quadrature, for which Becke’s multigrid approach [87] and further developments thereof are the standard in LCAO programs, the expression of Equation (51) can be most efficiently formulated with matrix products. Storing the values of the basis functions at the quadrature points as a matrix and defining a scaled version thereof, the Fock matrix contribution can be evaluated as a single matrix product, which is orders of magnitude faster than a simple for loop based algorithm.
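The matrix-product formulation can be sketched as follows. In a commonly used notation, assumed here and not necessarily matching the symbols of the equations elsewhere in this text, the quadrature form of the LDA contribution is K_μν = Σ_g w_g v(r_g) χ_μ(r_g) χ_ν(r_g), where w_g are quadrature weights and v is the derivative of f with respect to the spin density. The one-dimensional grid, the Gaussian basis functions, and all function names below are hypothetical choices made for illustration; the potential is that of the spin-resolved LDA exchange functional.

```python
import numpy as np

def lda_exchange_potential(n_sigma):
    """Derivative of f = -(3/4)(6/pi)^(1/3) n_sigma^(4/3) with respect to n_sigma."""
    return -np.cbrt(6.0 * n_sigma / np.pi)

def xc_matrix_matmul(chi, w, v):
    """K = chi^T (w * v * chi): one scaling of the basis values, one matrix product."""
    scaled = chi * (w * v)[:, None]  # scaled copy of the basis function values
    return chi.T @ scaled

def xc_matrix_loops(chi, w, v):
    """Reference implementation with explicit loops over points and basis functions."""
    n_grid, n_bas = chi.shape
    K = np.zeros((n_bas, n_bas))
    for g in range(n_grid):
        for mu in range(n_bas):
            for nu in range(n_bas):
                K[mu, nu] += w[g] * v[g] * chi[g, mu] * chi[g, nu]
    return K

def toy_grid_and_basis():
    """Hypothetical 1D quadrature grid and three Gaussian basis functions."""
    x = np.linspace(-8.0, 8.0, 401)
    w = np.full(x.shape, x[1] - x[0])
    centers = np.array([-1.0, 0.0, 1.5])
    exponents = np.array([0.6, 1.0, 0.8])
    chi = np.exp(-exponents * (x[:, None] - centers) ** 2)  # shape (n_grid, n_bas)
    return w, chi

w, chi = toy_grid_and_basis()
c = np.array([0.5, 0.7, 0.3])
P = np.outer(c, c)                                # toy spin-density matrix
n_sigma = np.einsum("gm,mn,gn->g", chi, P, chi)   # density on the grid
K = xc_matrix_matmul(chi, w, lda_exchange_potential(n_sigma))
```

Both routines compute the same matrix. The matrix-product version delegates the triple loop to optimized linear algebra, which is the source of the speedup noted above.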
11.2. GGA Functionals
The second rung of Jacob’s Ladder [28] is referred to as the generalized gradient approximation (GGA) [88]. Density functional approximations on this rung also depend on the derivatives of the density via the three reduced gradients (the αα, αβ, and ββ dot products of the spin-density gradients), with the gradient of the density being determined by differentiating Equation (5). The GGA contribution to the Fock matrix is given in [16]. The β-spin expression can be obtained by switching α and β in Equation (52). In the restricted case, the α and β spin densities coincide, which leads to the DFT contributions to the Fock matrix given by the simpler expression of Equation (53). Practical implementations of Equations (52) and (53) can again be formulated using matrix products.
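A closed-shell GGA contribution in matrix-product form can be sketched and checked numerically. The sketch assumes the standard restricted expression K_μν = ∫ [ (∂f/∂n) χ_μ χ_ν + 2 (∂f/∂γ) ∇n · ∇(χ_μ χ_ν) ] d³r with γ = ∇n · ∇n; this notation is an assumption and may differ from that of Equation (53). The energy density `toy_gga` is a made-up smooth function rather than a published functional, and the grid is one-dimensional for brevity.

```python
import numpy as np

def toy_gga(n, gamma):
    """Made-up energy density f(n, gamma) with its partial derivatives."""
    f = -n ** (4.0 / 3.0) - 0.05 * gamma / (1.0 + n)
    df_dn = -(4.0 / 3.0) * np.cbrt(n) + 0.05 * gamma / (1.0 + n) ** 2
    df_dgamma = -0.05 / (1.0 + n)
    return f, df_dn, df_dgamma

def density_and_gradient(P, chi, dchi):
    """Density and its derivative on the grid from the density matrix P."""
    n = np.einsum("gm,mn,gn->g", chi, P, chi)
    grad_n = (np.einsum("gm,mn,gn->g", dchi, P, chi)
              + np.einsum("gm,mn,gn->g", chi, P, dchi))
    return n, grad_n

def gga_energy(P, chi, dchi, w):
    """Quadrature estimate of the toy GGA energy."""
    n, grad_n = density_and_gradient(P, chi, dchi)
    f, _, _ = toy_gga(n, grad_n ** 2)
    return np.dot(w, f)

def gga_fock(P, chi, dchi, w):
    """GGA matrix contribution assembled from matrix products."""
    n, grad_n = density_and_gradient(P, chi, dchi)
    _, df_dn, df_dgamma = toy_gga(n, grad_n ** 2)
    local = chi.T @ (chi * (w * df_dn)[:, None])
    # chi_mu * (2 w df/dgamma n') * chi_nu'; adding the transpose completes the product rule
    A = chi.T @ (dchi * (2.0 * w * df_dgamma * grad_n)[:, None])
    return local + A + A.T

def toy_setup():
    """Hypothetical 1D grid, Gaussian basis with derivatives, and density matrix."""
    x = np.linspace(-6.0, 6.0, 601)
    w = np.full(x.shape, x[1] - x[0])
    centers = np.array([-1.0, 0.0, 1.0])
    exponents = np.array([0.8, 1.2, 0.5])
    shift = x[:, None] - centers
    chi = np.exp(-exponents * shift ** 2)
    dchi = -2.0 * exponents * shift * chi
    c = np.array([0.6, 0.5, 0.4])
    P = np.outer(c, c) + 0.1 * np.eye(3)  # positive definite, so the density stays positive
    return w, chi, dchi, P
```

A useful sanity check for any such implementation is to compare the assembled matrix against finite differences of the quadrature energy with respect to the density matrix elements. Agreement confirms that the gradient terms have been symmetrized correctly.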
11.3. Meta-GGA Functionals
On the third rung of Jacob’s Ladder [28] are the meta-GGA (mGGA) approximations, in which the additional ingredients, namely the kinetic energy density τ and the Laplacian of the density ∇²n, are obtained from the density matrix and the derivatives of the basis functions. The meta-GGA contributions to the Kohn–Sham–Fock matrix are straightforwardly obtained following [20], and can again be expressed in terms of matrix products to achieve faster quadrature. The expressions remain formally the same in the restricted case, but the quantities correspond to the total electron density.
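For orientation, the additional meta-GGA ingredients and matrix terms can be written out in a commonly used notation. The symbols below (density matrix P, real basis functions χ, energy density f, matrix K) and the factor of 1/2 in the kinetic energy density, which is the convention used by LibXC, are assumptions and are not guaranteed to match the equations of this text.

```latex
\begin{align}
\tau_\sigma(\mathbf{r}) &= \frac{1}{2} \sum_{\mu\nu} P^{\sigma}_{\mu\nu}\,
  \nabla\chi_\mu(\mathbf{r}) \cdot \nabla\chi_\nu(\mathbf{r}), \\
\nabla^2 n_\sigma(\mathbf{r}) &= \sum_{\mu\nu} P^{\sigma}_{\mu\nu}\,
  \nabla^2\left[\chi_\mu(\mathbf{r})\chi_\nu(\mathbf{r})\right], \\
\Delta K^{\sigma}_{\mu\nu} &= \int \left[
  \frac{1}{2}\frac{\partial f}{\partial \tau_\sigma}\,
  \nabla\chi_\mu \cdot \nabla\chi_\nu
  + \frac{\partial f}{\partial (\nabla^2 n_\sigma)}\,
  \nabla^2\left(\chi_\mu\chi_\nu\right)
  \right] \mathrm{d}^3 r .
\end{align}
```

Here ΔK denotes only the terms that appear in addition to the LDA- and GGA-type contributions. Each term follows from the chain rule, since τ and ∇²n are linear in the density matrix.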
11.4. Range-Separated Hybrid Functionals
As was mentioned before, the use of non-zero values for the constants a and b in Equation (7) allows the inclusion of exact exchange effects in a DFT calculation. Such functionals represent the fourth rung of Jacob’s Ladder [28], and are generally referred to as hybrids. A further development of hybrid functionals is the range-separated hybrids [89,90], in which the interelectronic interaction is divided into a short-range (sr) and a long-range (lr) part with a resolution of the identity. The rationale for range separation is that since density functional approximations for the exchange are based only on local information about the density, they fail to reproduce accurate estimates for, e.g., charge transfer processes. Separating the interaction by range per Equation (57) leads to a hybrid exchange functional that has four contributions: a short-range and a long-range part from both DFT and HF exchange. Since the DFT contributions are evaluated based only on the density (and possibly its derivatives), their sum is nothing but a definition of a new density functional. In contrast, the HF contributions to the energy and the Kohn–Sham–Fock matrix have to be evaluated separately with range-separated ERIs. Several kinds of range-separation kernels have been proposed; however, the error function based kernel, which weights the short-range part by erfc(ωr) and the long-range part by erf(ωr), where ω is the range-separation parameter, is by far the most commonly used one because it is exceedingly simple to implement in codes employing a plane-wave or Gaussian basis set [91,92]. The error function kernel is used, for instance, in the Heyd–Scuseria–Ernzerhof (HSE) functionals for solid-state calculations [91,93], as well as in the ωB97M-V [94] functional that is discussed below in Section 11.5. Some functionals based on Yukawa kernels, which weight the short-range part by exp(−ωr) and the long-range part by 1 − exp(−ωr), have also been published and are available in LibXC, for instance. It is important to check that the range-separation kernel used in the density functional implementation matches the one used in the computation of the range-separated ERIs in Equation (59), as the results will be incorrect otherwise.
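The splitting of the Coulomb interaction can be made concrete with a short sketch. The function names and the weight-function picture (a short-range weight and a long-range weight that sum to one, each multiplying 1/r) are assumptions chosen for illustration; ω denotes the range-separation parameter.

```python
import math

def erf_kernel(r, omega):
    """Short- and long-range weights of the error function kernel."""
    return math.erfc(omega * r), math.erf(omega * r)

def yukawa_kernel(r, omega):
    """Short- and long-range weights of the Yukawa kernel."""
    # -expm1(-x) equals 1 - exp(-x) but avoids cancellation for small x
    return math.exp(-omega * r), -math.expm1(-omega * r)

def split_coulomb(r, omega, kernel):
    """Return the short- and long-range parts of 1/r for the given kernel."""
    sr_weight, lr_weight = kernel(r, omega)
    return sr_weight / r, lr_weight / r

omega = 0.3
sr_erf, lr_erf = split_coulomb(1.0, omega, erf_kernel)
sr_yuk, lr_yuk = split_coulomb(1.0, omega, yukawa_kernel)
# Same omega, different kernels: the short-range parts do not agree
mismatch = abs(sr_erf - sr_yuk)
```

For both kernels the two parts add up to 1/r exactly, and the long-range part stays finite as r approaches zero. The nonzero `mismatch` illustrates why the kernel used in the density functional and the one used for the range-separated ERIs must be the same.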
11.5. Non-Local Correlation
Dispersion effects, i.e., van der Waals interactions, can be modeled in an ab initio DFT setting with non-local correlation functionals [95]. Because the non-local correlation energy term depends explicitly on the electron density, it also needs to be included in the SCF procedure, in principle. In contrast, empirical dispersion corrections such as Grimme’s various DFT-D approaches [96,97,98] do not depend on the electron density, and are added only as an ad hoc correction onto the electronic energy.
Perhaps the most accurate rung-3 and rung-4 functionals currently available [99,100,101], the pure B97M-V [102] mGGA as well as the range-separated ωB97M-V [94] hybrid mGGA, respectively, are built on top of [103] the VV10 non-local correlation functional [23], which is controlled by two adjustable parameters, b and C, that have been trained alongside the density functional in B97M-V and ωB97M-V. The results of a recent benchmark study suggest that the VV10 contributions to densities and orbital energies are negligible, and that sufficiently accurate energetics may be obtained by a one-shot evaluation of the non-local correlation energy in a post-SCF fashion [99]. Still, a rigorous minimization of the energy requires considering the effects of the non-local correlation on the wave function. Although Equation (60) does not appear to fit on the rungs of functionals discussed above, the VV10 kernel turns out to yield a GGA-type contribution to the Kohn–Sham–Fock matrix as discussed in [23], to which we refer for further details.