A Diagonally Updated Limited-Memory Quasi-Newton Method for the Weighted Density Approximation

We propose a limited-memory quasi-Newton method using the bad Broyden update and apply it to the nonlinear equations that must be solved to determine the effective Fermi momentum in the weighted density approximation for the exchange energy density functional. This algorithm has advantages for nonlinear systems of equations with diagonally dominant Jacobians, because it is easy to generalize the method to allow for periodic updates of the diagonal of the Jacobian. Systematic tests of the method for atoms show that one can determine the effective Fermi momentum at thousands of points in less than fifteen iterations.


Introduction
The accuracy and scope of density functional theory calculations is limited by the need to approximate the density functionals for the exchange-correlation energy [1][2][3][4].This motivates the robust stream of research into exchange-correlation functionals and the development of new strategies for approximating the exchange-correlation energy.Based on the realization that local and semilocal forms for the exchange-correlation energy functional will always give qualitatively incorrect results for some systems [5][6][7][8][9][10][11][12][13], there has been significant recent work on developing nonlocal density functionals.Some of this work reexamines the very first strategy for nonlocal density functional: the weighted density approximation [14][15][16][17].The goal of this work is to present a new diagonally updated limited-memory quasi-Newton method we developed for solving a system of nonlinear equations associated with the weighted density approximation.In the remainder of this section, we present background material on the weighted density approximation, so that the nature of the system of nonlinear equations we are solving is clear.(In particular, the system of nonlinear equations is very large, but the Jacobian is dominated by contributions from its diagonal.)Section 2 presents the new method we developed.
For simplicity, in this paper we will consider only the weighted density approximation for the exchange energy, although weighted density approximations for the exchange-correlation and kinetic energy are also available [11,[18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][33][34][35][36].The basic strategy of the weighted density approximation for the exchange energy is to approximate the exchange hole for the electrons, where γ σ (r, r ) is the one-electron reduced density matrix of the noninteracting Kohn-Sham reference system with electron spin-density ρ σ (r).The exchange energy of σ-spin electrons can be computed from electron spin-density and the exchange hole, Many of the failures of modern density functionals are associated with the self-interaction errors that afflict traditional approaches to designing approximate functionals [37][38][39][40].One advantage of the weighted density approximation is that the self-interaction error is eliminated by explicitly imposing the normalization of the exchange hole, When Equation (3) holds, the exchange(-correlation) potential has the correct asymptotic decay [38,[41][42][43][44][45].This helps to explain why weighted density approximations tend to give more accurate excitation energies, band gaps, and frontier molecular orbital energies than traditional semilocal density functional approximations [36,[46][47][48][49][50][51][52][53][54][55][56][57].The relatively simple version of the weighted density approximation we consider here is not especially accurate (because the exchange hole is more localized than it should be), but its biggest disadvantage is its high computational cost.A similar normalization requirement to Equation (3), when applied to the direct correlation function, allows one to build nonlocal exchange-correlation holes, which is a promising strategy for increasing the accuracy of the weighted density approximation [19,20,58].In this contribution, we shall focus instead on reducing the computational cost of weighted density approximation calculations.
Inspired by the mathematical treatment of the nearly uniform electron gas, it is traditional (albeit not ubiquitous [19,20,36,[58][59][60][61][62]) to approximate the exchange hole as a function of the form where k σ F (r, r ) is interpreted as the effective Fermi momentum associated with an electron configuration with σ-spin electrons at positions r and r .We shall choose η k σ F |r − r | to have the form appropriate for the uniform electron gas, namely This ensures that the uniform-electron gas limit is retained.Garcia-Gonzalez et al. proposed modelling the two-point effective Fermi momentum, k σ F (r, r ) , as the power-mean of the effective Fermi momentum at the points separately [27,63], In this study, we use p = 0.001, which is very close to the geometric mean.(We confirmed that our qualitative conclusions are valid for all 0 < p ≤ 1 2 .)The multidimensional integrals in Equations ( 2) and (3) were evaluated numerically, using Becke-Lebedev quadrature [64].
The impediment to the widespread adoption of the weighted density approximation is its computational cost.The normalization constraint for the exchange hole, Equation (3), must be imposed at every point on the numerical integration grid, and evaluating the normalization integral at each point requires a three-dimensional numerical integration.For typical atoms and molecules, determining the effective Fermi momentum requires solving a nonlinear system of ~10 4 -10 6 equations (Equation (3) at each grid point) with respect to the same number of unknowns (k σ F (r) at each grid point), subject to an inequality constraint at each grid point (k σ F (r) > 0, because it is physically absurd for the Fermi momentum to be nonpositive).Moreover, each of the nonlinear equations is relatively expensive to evaluate, since it requires a numerical integration.
The traditional approach to density-functional theory avoids this problem by performing the integral over r in Equation (2) analytically, defining the exchange-correlation energy density, An alternative, but more convenient, intermediate quantity is the system-averaged spherically-averaged exchange charge, where dΩ u denotes integration over the solid angle associated with u = r − r. ( n x (u) is often called the system-averaged spherically-averaged exchange hole, but we prefer to distinguish between the exchange hole, h x (r, r ), and the exchange charge, ρ(r )h x (r, r ), which is the charge that is depleted due to the presence of an electron at r .)The exchange-correlation energy can then be evaluated using The normalization constraint is applied at the level of the functional by requiring that the system-averaged spherically-averaged exchange charge is normalized, Equation ( 10) is a far weaker constraint than Equation (3), as it only requires normalization in an average sense.When the electron density is (nearly) uniform, its behavior at one point in space is representative of its system average, imposing Equation (10) (nearly) removes the self-interaction error.This is not true for systems like atoms and molecules, where the chemically relevant regions of the electron density vary over many orders of magnitude.This is clear from the observation that the weighted density approximation using Equation (5) gives results that are quite different from the uniform density approximation, which is obtained if one assumes a homogeneous electron density and performs the integral in Equation (7) analytically to obtain an approximate formula for the exchange energy [18,65].
Motivated by our desire to explore the weighted density approximation in more detail, in the next section, we will present a diagonally-updated limited memory bad Broyden method that allows us to solve the large system of nonlinear equations associated with the weighted density approximation in a computationally efficient manner.Then we present the results of a numerical test of this method (Section 3) and discuss its implications and the possibilities for further improvement (Section 4).

Methods
The methods we shall use for solving the nonlinear equations for the weighted density approximation are generally applicable, so we will first rewrite the problem (cf.Equations ( 3)-( 6)) in a general form.Suppose there are G points in the numerical integration grid, r g G g=1 .The unknown is the value of the Fermi momentum at each of these points, which we denote Because the equations for the different spins are decoupled, we shall simplify our notation by omitting the spin-index, σ.There is a nonlinear equation for each grid point also, which has the form, where Here {w h } G h=1 denote the numerical integration weights.The Jacobian for this system of equations is defined as Using Newton's method, one could then define an update for the variables, where j −1 gh are the elements of the inverse Jacobian matrix, J −1 .In matrix-vector notation, Equation ( 14) can be written as The method is then repeated using the updated variables k new to evaluate the equations, (11), and the Jacobian thereof, until convergence is achieved.
In practice, this technique does not always converge.Convergence can be enhanced by scaling the step, using either a trust radius or a line search.Specifically, the update is rewritten as where ζ is chosen so that the step-size is decreased whenever the Newton update (Equation (15), which is derived by truncating the Taylor series for f(k) at first order) is unreliable.
For large systems of ∼ 10 4 or more equations, evaluating and inverting the Jacobian for each Newton iteration, Equation (16), is prohibitively expensive.This motivates quasi-Newton methods, where the Jacobian or its inverse is approximated.One of the simplest quasi-Newton methods is to evaluate only the diagonal elements of the Jacobian, The inverse of a diagonal matrix is simply the inverse of its diagonal elements, so this gives a very simple update formula, The system of equations associated with the weighted density approximation, Equation ( 12), gives a Jacobian that is strongly diagonally dominant, so this formula is appropriate for our application, and we shall test it in Section 3.
One can often improve the simple diagonal update formula if one uses information from previous iterations.As an example, suppose one has the function and argument values from several previous iterations, f 1 , f 2 , . . ., f m and k 1 , k 2 , . . ., k m , respectively.(Here and in the following, we use f m to denote the equations evaluated at the argument k m , f(k m ).)If the vector k m − k n is short enough, then These secant conditions can be used to improve an initial estimate for the Jacobian (Equation ( 19)) or inverse Jacobian (Equation ( 20)).The best-known methods for doing this are the methods proposed by Broyden [66,67], where and, more generally, we define The Jacobian in Equation ( 21) is usually called the Broyden update for the Jacobian.The method based on the inverse Jacobian, Equation (22), is usually called the "bad" Broyden method because in its original applications, it did not perform as well as the "good" Broyden update, Equation (21) [66,[68][69][70][71][72].It is more convenient to have an update to the inverse Jacobian than the Jacobian itself since (cf.Equation ( 16)) otherwise one needs to solve a (large) system of linear equations to determine the step.Fortunately, the "good" Broyden update can also be formulated as an update to the inverse Jacobian, Storing the entire (inverse) quasi-Newton Jacobian matrix is impractical for large systems of nonlinear equations, but fortunately one only needs to know the action of the inverse-Jacobian matrix on the vector f m to compute the updated step.Furthermore, we can decide to store only the results of the last few updates, Equations ( 23) and (24).With such a procedure, one can control the computational cost and amount of memory required to compute the quasi-Newton update.
The remainder of this section presents the limited-memory "bad" Broyden method we developed for the weighted density approximation.Specifically, we noticed that the "bad" Broyden method has an especially simple quasi-Newton update.Suppose one has an initial guess for the inverse Jacobian, J −1 .The action of the inverse quasi-Newton Jacobian on f m can then be computed recursively.One starts with the initialization, and then refines the Jacobian according to the recursion, One can include as many previous steps from Equations ( 23) and ( 24) as one wishes, but because it is most effective to update the inverse Jacobian using local information; including too much previous information not only increases the computational cost, but often degrades the accuracy of the approximation.If the data from s previous steps are included, then the action of the quasi-Newton inverse Hessian is computed by This is the limited-memory bad Broyden procedure we developed to solve Equations ( 11) and (12).
To make this procedure practical, we introduce two additional refinements.First, we can periodically update the approximate inverse-Jacobian that is used in Equation (30) using, for example, the diagonal approximation to the Jacobian (which is relatively easy to compute for the weighted density approximation).Second, we introduce a global trust radius to control the stepsize.We also tried using a local trust radius, where each component of the step could be scaled separately, but this did not improve our results.Suppose the trust radius for step m is τ m .Then we will take the full quasi-Newton step if the 1 norm of the step is shorter than the trust radius, (J m ) −1 f m 1 ≤ τ m .Otherwise, we (1) scale back the step so that it is the same size as the trust radius, and ( 2) because it is physically absurd to have a negative effective Fermi momentum; we never let the Fermi momentum at any grid point decrease by a factor of more than two in any iteration, Here k m=1,g G g=1 denote the elements of the updated Fermi momentum from Equation (31).Next we then check to see if the total absolute error in the nonlinear system decreased; otherwise we reject the step and try again with a step size equal to half the previous stepsize.That is, we evaluate f m+1 = f(k m+1 ) and if f m+1 1 > f m 1 , then we reject the step and try again with If the step is accepted, then we need to update the trust radius.The basic idea is that if the sign of the objective function changes, f m+1,g / f m,g < 0, then we overshot the solution of the gth nonlinear equation.We therefore let each grid point "vote" on whether it thought the last step was too long ( f m+1,g / f m,g > −0.1), too short ( f m+1,g / f m,g > 0.1), or about right.Let n long denote the number of too-long steps, n short denote the number of too-short steps, and ζ m = k m+1 − k m 1 denote the length of the previous step.We then update the trust radius based on whether the consensus seems to be that the trust radius was too short or too long.Specifically, The three cases in this formula correspond to cases where the step was too short for most grid points, too long for most grid points, or "about right".Equation ( 34) can be thought of as a trust-radius update based on the 0 norm.

Results
We tested the limited-memory bad Broyden method from Section 2 by solving the weighted density approximation for the neutral atoms, H-Kr, with p = 0.001 (cf.Equation ( 12)).We tested the method storing up to 10 previous steps (s = 1, 2, . . ., 10 in Equation ( 30)), using the diagonal approximation to the Jacobian (cf.Equation ( 17)) to estimate J −1 (cf.Equation ( 30)).We also considered updating the diagonal inverse-Jacobian approximation every t iterations (t = 1, 2, . . ., 5).For our tests, it was fastest to use 8 previous steps (s = 8) and to update the approximate inverse Jacobian every other iteration (t = 2).The nonlinear equations were considered converged when the error in every equation was less than 10 −3 .
In Figure 1, we report key results.The bad Broyden method by itself is quite poor, converging less than 15% of the calculations in 20 iterations, and only 60% of the calculations within 200 iterations.Bycontrast, because the Jacobian is diagonally dominant, using the diagonal Jacobian (and updating it in every other iteration) converges all the atoms within 16 iterations.The best method of all starts with the diagonal update, but then improves the model using the limited memory bad Broyden update retaining 8 vectors.With that method, all the atoms are converged in just 11 iterations.We also tested the method for different hole correlation models (the Gaussian model η [65] and the exchange-correlation hole of the uniform electron gas [11,73]) and for various values of p.The results are very similar, with the bad Broyden method with periodic diagonal updates performing the best.As the p value in Equation ( 12) increases, the equations become increasingly difficult to converge, though up until p ≈ 0.5 convergence is achieved in less than ~20 iterations.When p > 1, the algorithm seems to do an acceptable job of finding a least-squares solution that almost satisfies the nonlinear equations, which do not seem to have any solution [11,18].

Discussion
Despite its theoretical promise, applications of the weighted density approximation have been limited by the difficulty of solving the nonlinear equations for the local effective Fermi momentum in the weighted density approximation (cf.Equation ( 12)).Because there is one nonlinear equation for each grid point, and evaluating each equation requires an integration over all space, this is an The percent of the atoms H-Kr, for which the nonlinear equations in Equation ( 11) have converged to with 10 −3 (vertical axis) within a given number of iterations (horizonal axis).Eight vectors are retained in the bad Broyden update and the diagonal Jacobian is updated every other iteration.The curves show the results for the simplest limited-memory bad Broyden update (with no diagonal update except in the first iteration; dotted line), the limited-memory bad Broyden update (with diagonal updates every other iteration; solid line), and the diagonal approximation to the (inverse) Jacobian updated every other iteration (with no bad Broyden updates; dashed line).
The computational cost of the method is dominated by the evaluation of the objective function (Equation ( 12)) at each grid point and the computation of the diagonal elements of the Jacobian, which has almost exactly the same cost as evaluating the objective function.This is why we compute the diagonal elements of the Jacobian only in every other iteration instead of every iteration; the incremental increase in convergence rate obtained by computing the Jacobian's diagonal each iteration is not worth the additional computational expense.The actual computation of the bad Broyden update, Equations ( 28)- (30), is comparatively insignificant, partly because it can be efficiently evaluated using the (precomputed) quantities Based on these results, it seems the diagonally-updated limited-memory bad Broyden method is ideally suited for the problem of solving the weighted density approximation equations, (12), and is presumably useful for other similarly large, diagonally dominant systems of nonlinear equations as well.

Discussion
Despite its theoretical promise, applications of the weighted density approximation have been limited by the difficulty of solving the nonlinear equations for the local effective Fermi momentum in the weighted density approximation (cf.Equation ( 12)).Because there is one nonlinear equation for each grid point, and evaluating each equation requires an integration over all space, this is an extremely challenging numerical problem.It is not uncommon to have ~10 4 -10 6 grid points, and having fewer than ~10 3 grid points is extremely rare.We developed a limited-memory quasi-Newton method, which we believe will be useful also in other contexts, based on a recursive formulation of the bad Broyden update, built upon an approximate inverse-Jacobian (which we update every other iteration using the diagonal approximation for the Jacobian).For the atoms H-Kr, this method always converged in 11 or fewer iterations.
There are two ways to improve this model.First, we can reduce the number of nonlinear equations to be solved.It is reasonable to expect, for example, that after solving Equations (11) for enough grid points, accurate results for the effective Fermi momentum at the other grid points could be obtained by interpolation.Second, we could improve the quasi-Newton method itself.One possibility would be to notice that the same recursive method we developed for the bad Broyden method could be used for the good Broyden method, (21), simply by interchanging the roles of ∆k n and ∆f n .That method, however, still requires solving the equations for the step-size.Another possibility is that instead of sequentially updating the Jacobian, we could use a multisecant update [74].In particular, defining the matrices whose columns are determined from the previous steps, we can write a multisecant update for the bad Broyden method as: and for the good Broyden method as The matrix inversions in Equations ( 37) and ( 38) should be viewed as generalized (Moore-Penrose) inverses.We believe that Equation (38) is the most reasonable way to implement a limited-memory good Broyden method, but we have not implemented or tested this method because we doubt it is possible to greatly improve upon the results we obtained for the limited-memory bad Broyden method with periodic diagonal updates (as shown in Figure 1).

Figure 1 .
Figure 1.Convergence of quasi-Newton approaches for the weighted density approximation of atoms.The percent of the atoms H-Kr, for which the nonlinear equations in Equation (11) have converged to with 10 −3 (vertical axis) within a given number of iterations (horizonal axis).Eight vectors are retained in the bad Broyden update and the diagonal Jacobian is updated every other iteration.The curves show the results for the simplest limited-memory bad Broyden update (with no diagonal update except in the first iteration; dotted line), the limited-memory bad Broyden update (with diagonal updates every other iteration; solid line), and the diagonal approximation to the (inverse) Jacobian updated every other iteration (with no bad Broyden updates; dashed line).

Figure 1 .
Figure 1.Convergence of quasi-Newton approaches for the weighted density approximation of atoms.The percent of the atoms H-Kr, for which the nonlinear equations in Equation (11) have converged to with 10 −3 (vertical axis) within a given number of iterations (horizonal axis).Eight vectors are retained in the bad Broyden update and the diagonal Jacobian is updated every other iteration.The curves show the results for the simplest limited-memory bad Broyden update (with no diagonal update except in the first iteration; dotted line), the limited-memory bad Broyden update (with diagonal updates every other iteration; solid line), and the diagonal approximation to the (inverse) Jacobian updated every other iteration (with no bad Broyden updates; dashed line).