Linear Scaling Solution of the Time-Dependent Self-Consistent-Field Equations

A new approach to solving the Time-Dependent Self-Consistent-Field equations is developed based on the double quotient formulation of Tsiper [J. Phys. B, 34 L401 (2001)]. Dual channel, quasi-independent non-linear optimization of these quotients is found to yield convergence rates approaching those of the best case (single channel) Tamm-Dancoff approximation. This formulation is variational with respect to matrix truncation, admitting linear scaling solution of the matrix-eigenvalue problem, which is demonstrated for bulk excitons in the polyphenylene vinylene oligimer and the (4,3) carbon nanotube segment.

where A and B are Hermitian blocks corresponding to 4 th order tensors spanning transitions between occupied and virtual sub-spaces, ω is the real excitation energy and v = X Y is the corresponding transition density. By construction, the MO representation allows strict separation between the dyadic particle-hole (ph) and holeparticle (hp) solutions, X and Y , for which specialized algorithms exist. Nevertheless, convergence of the naive J-symmetric problem is typically much slower than the corresponding Hermitian Tamm-Dancoff approximation (TDA), A X = ω X, which is of reduced dimensionality in the MO representation.
Several TD-SCF eigensolvers are based on the oscillator picture 0 K T 0 p q = ω p q , with K = A + B and T = A − B the Hermitian potential and kinetic matrices, and the dual { p, q } = X − Y , X + Y corresponding to position and momentum. This picture avoids the imbalance X ≫ Y whilst admitting conventional solutions based on the Hermitian matrix G = K · T, as shown by Tamara and Udagawa [17] and extended by Narita and Shibuya with second order optimization of the quotient ω 2 [ p, q] = q · G · p/ | p · q| [10]. More recently, Tsiper considered the quotients ω [ p, q] = p · K · p 2 | p· q| + q · T · q 2 | p· q| , and developed a corresponding dual channel Lanczos solver. Subspace solvers in this dual representation have recently been surveyed by Tretiak, Isborne, Niklasson and Challacombe (TINC) [19], with comparative results for semi-empirical models.
Another challenge is dimensionality and scaling. Writing Eq. (1) in the general form L · v = ω v, admitting arbitrary representation, the superoperator matrix L is a ∼ N 2 × N 2 tetradic, with N the number of basis functions, assumed proportional to system size. In practice the action of L onto v is carried out implicitly as , P ] , using an existing framework for construction of the effective Hamiltonian (Fockian) F , where P is the one-particle reduced density matrix, G is a screening operator involving Coulomb, exchange and/or exchange-correlation terms and the correspondence between superoperator and functional notation is given by a tensorial mapping between diadic and matrix, v N 2 ×1 ⇔ v N ×N .
Recent efforts have focused on addressing the problem of dimensionality by employing linear scaling methods that reduce the cost of L[·] within Density Functional Theory (DFT) to O(N ). However, this remains an open problem for the Hartree-Fock (HF) exchange, an ingredient in models that account for charge transfer in the dynamic and static response, including the Random Phase Approximation (RPA) at the pure HF level of theory. Likewise, scaling of the TD-SCF eigenproblem remains formidable due to associated costs of linear algebra, even when using powerful Krylov subspace methods. Underscoring this challenge, one of the most successful approaches to linear scaling TD-DFT avoids the matrix eigenproblem entirely through explicit timeevolution [21,22].
Linear scaling matrix methods exploit quantum locality, manifest in approximate exponential decay of matrix elements expressed in a well posed, local basis; with the dropping of small elements below a threshold, τ mtx , this decay leads to sparse matrices and O(N ) complexity at the forfeit of full precision [3,4,11]. Likewise, linear scaling methods for computing the HF exchange employ an advanced form of direct SCF, exploiting this decay in the rigorous screening of small exchange interactions bellow the two-electron integral threshold τ 2e [13]. The consequence of these linear scaling approximations is an inexact linear algebra that challenges Krylov solvers due to nested error accumulation, a subject of recent formal interest [14,15]. Consistent with this view, TINC found that matrix perturbation (a truncation proxy) disrupts convergence of Krylov solvers with slow convergence, i.e. Lanczos and Arnoldi for the RPA, but has less impact on solvers with rapid convergence, i.e. generically for the TDA or Davidson for the RPA. Relative to semiempirical Hamiltonians, the impact of incompleteness on subspace iteration may be amplified with first principles models and large basis sets (ill-conditioning).
An alternative is Rayleigh Quotient Iteration (RQI), which poses the eigenproblem as non-linear optimization and is variational with respect to matrix perturbation. Narita and Shibuya [10] considered optimization of the quotient ω 2 [ p, q] with second order methods, but these are beyond the capabilities of current linear scaling technologies and also, convergence is disadvantaged by a power of 1 /2. For semi-empirical Hamiltonians, TINC found that optimization of the Thouless functional ω[ v] = v · L · v/ | v· v| , corresponding to the solution of Eq. (1), was significantly slower for the RPA relative to the TDA, and also compared to subspace solvers. For first principles models and non-trivial basis sets, this naive RQI can become pathologically slow as shown in Fig. 1. On the other hand, the Tsiper formulation exposes the underlying pseudo-Hermitian structure of the TD-SCF equations. Here, this structure is exploited with QUasi-Independent Rayleigh Quotient Iteration (QUIRQI), involving dual channel optimization of the Tsiper quotients coupled only weakly through line search.
Our development begins with a brief review of the representation independent formulation developed by TINC, which avoids the O(N 3 ) cost of rotating into an explicit p-h, h-p symmetry. Instead, this symmetry is maintained implicitly via annihilation, x ← f a (x) = P · x· Q + Q· x· P , with P the first order reduced density matrix and Q = I − P its compliment. Likewise, the indefinite metric associated with the J-symmetry of Eq. (1) is carried through the generalized norm x, y = tr{x T · [y, P ]}. Introducing the operator equivalents, 2| p,q | . Transformations between the transition density and the dual space involves simple manipulations and minimal cost, allowing Fock builds with the transition density and optimization in the dual space. The splitting operation is given by Calculations were started from the same random guess, and tight numerical thresholds were used throughout. In the representation independent scheme, the cost per iteration is the same for TDA and RPA.
This framework provides the freedom to work in any orthogonal representation, and to switch between transition density and oscillator duals with minimal cost. QUIRQI is given in Algorithm 1. It begins with a guess for the transition density, which is then split into its dual (lines 2-3). The choice of initial guess is discussed later. Lines 4-24 consist of the non-linear conjugate gradient optimization of the nearly independent channels: In each step, the flow of information proceeds from optimization of the duals to builds involving the density and back to the duals in a merge-annihilate-truncate-buildsplit-truncate (MATBST) sequence. For the variables v, p and q this sequence is comprised by lines 22-23 and 5-7, and lines 15-19 for the corresponding conjugate gradients h v , h p and h q . Truncation is carried out with the filter operation as described in Ref. [11], with cost and error determined by the matrix threshold τ mtx .
The Tsiper functional is the sum of dual quotients ω p and ω q , determined at line 8, followed by the gradients g p and g q computed at line 9. After the first cycle, the corresponding relative error e rel (10) and maximum matrix element of the gradient g max (11) are computed and used as an exit criterion at line 4, along with non-variational behavior ω > ω old .
Next, the Polak-Ribiere variant of non-linear conjugate gradients yields the search direction in each channel, h p and h q (12)(13)(14). The action of L[·] on to h p and h q is then computed, again with a MATBST sequence (15)(16)(17)(18)(19), followed by a self-consistent dual channel line search at line 20, as described below. With steps λ p and λ q in hand, minimizing updates are taken along each conjugate direction (22), and the cycle repeats with the MATBST sequence spanning lines 21-23 and 5-7.
Optimization of the Tsipper functional ω [λ p , λ q ] ≡ guess v 3: while e rel > ǫ and gmax > γ and ω < ω old do 5: 2| p,q | , ω = ωp + ωq 9: 10: : hp ← g p + βphp, hq ← g q + βqh q 15: hv = F (hp, hq), hv ← fa (hv) 16: filter ( with coupling entering through terms in the denominator such as U pq = h p , h q . A minimum in Eq. (3) can be found quickly to high precision by alternately substituting one-dimensional solutions one into the other until self-consistency is reached. This semi-analytic approach starts with a rough guess at the pair {λ p , λ q } (eg. found by a coarse scan) followed by iterative substitution, where for example the p-channel update is with an analogous update for the q-channel obtained by swapping subscripts. As the solution decouples (S pq , T pq and U pq become small) the steps are found independently. QUIRQI has been implemented in FreeON [2], which employs the linear scaling Coulomb and Hartree-Fock exchange kernels QCTC and ONX with cost and accuracy controlled by the two-electron screening threshold τ 2e [13]. N -scaling solution of the QUIRQI matrix equations is achieved with the sparse approximate matrix-matrix multiply (SpAMM), with cost and accuracy determined by the drop tolerance τ mtx [3,4,11]. All calculations were carried out with version 4.3 of the gcc/gfortran compiler under version 8.04 of the Ubuntu Linux distribution and run on a fully loaded 2GHz AMD Quad Opteron 8350.
For systems studied to date, QUIRQI is found to converge monotonically with rates comparable to the TDA as shown in Fig. 1. Based on the comparative performance presented by TINC, the TDA rate of convergence appears to be a lower bound for RPA solvers. In addition to the convergence rate, performance is strongly determined by the initial guess. The following results have been obtained using the polarization response density along the polymer axis [19], which can be computed in O(N ) by Perturbed Projection [20]. Also, a relative precision of 4 digits in the excitation energy is targeted with the convergence parameters ǫ = 10 −4 and γ = 10 −3 , with exit from the optimization loop on violation of monotonic convergence (ω > ω old due to precision limitations associated with linear scaling approximations).
In Fig. 2, linear scaling and convergence to the bulk limit are demonstrated for a series of polyphenylene vinylene (PPV) oligomers at the RHF/6-31G** level of theory for the threshold combinations {τ mtx , τ 2e } = 10 −4 , 10 −5 and 10 −5 , 10 −6 . Significantly more conservative thresholds have been used for the Coulomb sums, which incur only minor cost.
Convergence is reached in 24 − 25 iterations, with the cost of Coulomb summation via QCTC comparable to the cost of SpAMM(τ mtx = 10 −4 ). In Fig. 3, linear scaling and convergence to the bulk limit are demonstrated for a series of (4,3) carbon nanotube segments at the RHF/3-21G level of theory for the same threshold combinations, again with convergence achieved in about 24-25 cycles. In both cases, tightening the pair {τ mtx , τ 2e } leads to a systematically improved result. While the 10 −4 , 10 −5 thresholds that work well for PPV lead to a non-monotone behavior with respect to extent for the nanotube series, dropping one more decade to 10 −5 , 10 −6 leads to a sharply improved behavior. Dropping thresholds further to 10 −6 , 10 −7 yields identical results to within the convergence criteria (∼ four digits) across the series, also scaling with N but at roughly twice the cost.
These results demonstrate that QUIRQI can achieve both systematic error control and linear scaling in solution of the RPA eigenproblem for systems with extended conjugation. Relative to PPV, the greater numerical sensitivity encountered with the nanotube series is consistent with the ground state problem, where a smaller band gap and greater atomic connectivity typically demand tighter thresholds.
QUIRQI exploits decoupling of the Tsipper functional into nearly independent, pseudo-Hermitian quotients leading to aggressive convergence rates equivalent to the fully Hermitian TDA, while remaining variational with respect to matrix truncation (τ mtx ). However, QUIRQI is not variational with respect to the screening parameter τ 2e . It can be systematically improved by tightening τ 2e though, due to rigorous error bounds based on the Schwartz inequality [13]. These properties present opportunities for more precise error control as suggested by Rubensson, Rudberg and Salek [12]. Further, these properties are expected to hold even for the most general SCF models, with the only difference being an increasingly localized transition density matrix and larger cost prefactor with an increasing DFT component. Finally, a variational approach allows considerable flexibility in the path to solution, as errors due to approximation can be overcome by optimization, offering opportunities for single precision GPU acceleration, variable thresholding, incremental Fock builds as well as extrapolation techniques.