1. Introduction
The Time-Dependent Self-Consistent-Field (TDSCF) equations, together with models that include some portion of the Hartree-Fock (HF) exchange, admit control over the range of self-interaction in the optical response [1,2,3,4], and are related to new models of electron correlation based on the Random Phase Approximation (RPA) [5,6,7,8]. Solving the TDSCF equations is challenging due to an unconventional J-symmetric structure of the naive molecular orbital (MO) representation,
$\begin{pmatrix}\mathbb{A}& \mathbb{B}\\ -\mathbb{B}& -\mathbb{A}\end{pmatrix}\begin{pmatrix}\overrightarrow{X}\\ \overrightarrow{Y}\end{pmatrix}=\omega \begin{pmatrix}\overrightarrow{X}\\ \overrightarrow{Y}\end{pmatrix},$ (1)
where $\mathbb{A}$ and $\mathbb{B}$ are Hermitian blocks corresponding to 4th order tensors spanning transitions between occupied and virtual subspaces, ω is the real excitation energy and $\overrightarrow{v}=\begin{pmatrix}\overrightarrow{X}\\ \overrightarrow{Y}\end{pmatrix}$ is the corresponding transition density. By construction, the MO representation allows strict separation between the dyadic particle-hole (ph) and hole-particle (hp) solutions, $\overrightarrow{X}$ and $\overrightarrow{Y}$, for which specialized algorithms exist. Nevertheless, convergence of the naive J-symmetric problem is typically much slower than that of the corresponding Hermitian Tamm-Dancoff approximation (TDA), $\mathbb{A}\overrightarrow{X}=\omega \overrightarrow{X}$, which is of reduced dimensionality in the MO representation.
Several TDSCF eigensolvers are based on the oscillator picture $\begin{pmatrix}0& \mathbb{K}\\ \mathbb{T}& 0\end{pmatrix}\begin{pmatrix}\overrightarrow{p}\\ \overrightarrow{q}\end{pmatrix}=\omega \begin{pmatrix}\overrightarrow{p}\\ \overrightarrow{q}\end{pmatrix},$ with $\mathbb{K}=\mathbb{A}+\mathbb{B}$ and $\mathbb{T}=\mathbb{A}-\mathbb{B}$ the Hermitian potential and kinetic matrices, and the dual $\left\{\overrightarrow{p},\overrightarrow{q}\right\}=\left\{\overrightarrow{X}-\overrightarrow{Y},\overrightarrow{X}+\overrightarrow{Y}\right\}$ corresponding to position and momentum. This picture avoids the imbalance $\|\overrightarrow{X}\|\gg \|\overrightarrow{Y}\|$ whilst admitting conventional solutions based on the Hermitian matrix $\mathbb{G}=\mathbb{K}\cdot \mathbb{T},$ as shown by Tamura and Udagawa [9] and extended by Narita and Shibuya with second order optimization of the quotient ${\omega}^{2}\left[\overrightarrow{p},\overrightarrow{q}\right]=\overrightarrow{q}\cdot \mathbb{G}\cdot \overrightarrow{p}/\left|\overrightarrow{p}\cdot \overrightarrow{q}\right|$ [10]. More recently, Tsiper [11] considered the quotients
$\omega \left[\overrightarrow{p},\overrightarrow{q}\right]=\frac{\overrightarrow{p}\cdot \mathbb{K}\cdot \overrightarrow{p}}{2\left|\overrightarrow{p}\cdot \overrightarrow{q}\right|}+\frac{\overrightarrow{q}\cdot \mathbb{T}\cdot \overrightarrow{q}}{2\left|\overrightarrow{p}\cdot \overrightarrow{q}\right|}$ (2)
and developed a corresponding dual channel Lanczos solver. Subspace solvers in this dual representation have recently been surveyed by Tretiak, Isborn, Niklasson and Challacombe (TINC) [12], with comparative results for semiempirical models.
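As a short worked step, the oscillator picture can be recovered from the J-symmetric problem by adding and subtracting its two block rows. The sketch below assumes that Equation (1) has the standard form with rows $\mathbb{A}\overrightarrow{X}+\mathbb{B}\overrightarrow{Y}=\omega\overrightarrow{X}$ and $\mathbb{B}\overrightarrow{X}+\mathbb{A}\overrightarrow{Y}=-\omega\overrightarrow{Y}$, and uses the dual $\overrightarrow{p}=\overrightarrow{X}-\overrightarrow{Y}$, $\overrightarrow{q}=\overrightarrow{X}+\overrightarrow{Y}$ as defined with the oscillator picture:

```latex
\begin{aligned}
\text{sum of rows:}\quad
 & (\mathbb{A}+\mathbb{B})(\vec{X}+\vec{Y}) = \omega\,(\vec{X}-\vec{Y})
 \;\Longrightarrow\; \mathbb{K}\,\vec{q} = \omega\,\vec{p}, \\
\text{difference of rows:}\quad
 & (\mathbb{A}-\mathbb{B})(\vec{X}-\vec{Y}) = \omega\,(\vec{X}+\vec{Y})
 \;\Longrightarrow\; \mathbb{T}\,\vec{p} = \omega\,\vec{q}, \\
\text{eliminating } \vec{q}:\quad
 & \mathbb{K}\,\mathbb{T}\,\vec{p} = \omega\,\mathbb{K}\,\vec{q} = \omega^{2}\,\vec{p}
 \;\Longrightarrow\; \mathbb{G}\,\vec{p} = \omega^{2}\,\vec{p}.
\end{aligned}
```

The last line is why conventional solvers applied to $\mathbb{G}=\mathbb{K}\cdot\mathbb{T}$ return the squared excitation energy.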
Another challenge is dimensionality and scaling. Writing Equation (1) in the general form $\mathbb{L}\cdot \overrightarrow{v}=\omega \overrightarrow{v}$, admitting arbitrary representation, the superoperator matrix $\mathbb{L}$ is a ∼${N}^{2}\times {N}^{2}$ tetradic, with N the number of basis functions, assumed proportional to system size. In practice the action of $\mathbb{L}$ onto $\overrightarrow{v}$ is carried out implicitly as $\mathit{L}\left[\mathit{v}\right]=\left[\mathit{F},\mathit{v}\right]+\left[\mathit{G}\left[\mathit{v}\right],\mathit{P}\right]$, using an existing framework for construction of the effective Hamiltonian (Fockian) $\mathit{F}$, where $\mathit{P}$ is the one-particle reduced density matrix, $\mathit{G}$ is a screening operator involving Coulomb, exchange and/or exchange-correlation terms, and the correspondence between superoperator and functional notation is given by a tensorial mapping between dyadic and matrix, ${\overrightarrow{v}}_{{N}^{2}\times 1}\iff {\mathit{v}}_{N\times N}$.
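The implicit action of the superoperator can be illustrated with a minimal sketch that works on $N\times N$ matrices only. Everything below is illustrative: the helper names are hypothetical, the toy Fockian is diagonal, and the screening term $\mathit{G}[\mathit{v}]$ is replaced by a zero placeholder (an assumption that reduces the model to non-interacting transitions).

```python
def matmul(a, b):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """Matrix commutator [a, b] = a.b - b.a."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]

def zero_screening(v):
    """Hypothetical stand-in for G[v]; a real code builds Coulomb/exchange terms."""
    return [[0.0] * len(v) for _ in v]

def apply_L(F, P, v, G=zero_screening):
    """Implicit action L[v] = [F, v] + [G[v], P]; no N^2 x N^2 matrix is formed."""
    fv = commutator(F, v)
    gp = commutator(G(v), P)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(fv, gp)]

# Toy MO-basis model (assumed): orbital 0 occupied, orbitals 1 and 2 virtual.
eps = [-0.5, 0.2, 0.7]
F = [[eps[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
P = [[1.0 if i == j == 0 else 0.0 for j in range(3)] for i in range(3)]

v = [[0.0] * 3 for _ in range(3)]
v[1][0] = 1.0                 # single transition coupling orbitals 0 and 1
Lv = apply_L(F, P, v)
print(Lv[1][0])               # orbital energy difference eps[1] - eps[0]
```

With the screening switched off, the transposed element picks up the opposite sign, a small reminder of the paired ±ω structure of the J-symmetric problem.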
Recent efforts have focused on addressing the problem of dimensionality by employing linear scaling methods that reduce the cost of $\mathit{L}[\cdot ]$ within Density Functional Theory (DFT) to $\mathcal{O}\left(N\right)$. However, this remains an open problem for the Hartree-Fock (HF) exchange, an ingredient in models that account for charge transfer in the dynamic and static response, including the Random Phase Approximation (RPA) at the pure HF level of theory. Likewise, scaling of the TDSCF eigenproblem remains formidable due to associated costs of linear algebra, even when using powerful Krylov subspace methods. Underscoring this challenge, one of the most successful approaches to linear scaling TDDFT avoids the matrix eigenproblem entirely through explicit time evolution [13,14].
Linear scaling matrix methods exploit quantum locality, manifest in approximate exponential decay of matrix elements expressed in a well-posed, local basis; with the dropping of small elements below a threshold, ${\tau}_{\mathrm{mtx}}$, this decay leads to sparse matrices and $\mathcal{O}\left(N\right)$ complexity at the forfeit of full precision [15,16,17]. Likewise, linear scaling methods for computing the HF exchange employ an advanced form of direct SCF, exploiting this decay in the rigorous screening of small exchange interactions below the two-electron integral threshold ${\tau}_{2\mathrm{e}}$ [18]. The consequence of these linear scaling approximations is an inexact linear algebra that challenges Krylov solvers due to nested error accumulation, a subject of recent formal interest [19,20]. Consistent with this view, TINC found that matrix perturbation (a truncation proxy) disrupts convergence of Krylov solvers with slow convergence, i.e., Lanczos and Arnoldi for the RPA, but has less impact on solvers with rapid convergence, i.e., generically for the TDA or Davidson for the RPA. Relative to semiempirical Hamiltonians, the impact of incompleteness on subspace iteration may be amplified with first principles models and large basis sets (ill-conditioning).
An alternative is Rayleigh Quotient Iteration (RQI), which poses the eigenproblem as nonlinear optimization and is variational with respect to matrix perturbation. Narita and Shibuya [10] considered optimization of the quotient ${\omega}^{2}\left[\overrightarrow{p},\overrightarrow{q}\right]$ with second order methods, but these are beyond the capabilities of current linear scaling technologies, and convergence may also be slower by a power of ½. For semiempirical Hamiltonians, TINC found that optimization of the Thouless functional $\omega \left[\overrightarrow{v}\right]=\overrightarrow{v}\cdot \mathbb{L}\cdot \overrightarrow{v}/\left|\overrightarrow{v}\cdot \overrightarrow{v}\right|,$ corresponding to the solution of Equation (1), was significantly slower for the RPA relative to the TDA, and also compared to subspace solvers. For first principles models and non-trivial basis sets, this naive RQI can become pathologically slow, as shown in Figure 1. On the other hand, the Tsiper formulation exposes the underlying pseudo-Hermitian structure of the TDSCF equations. Here, this structure is exploited with QUasi-Independent Rayleigh Quotient Iteration (QUIRQI), involving dual channel optimization of the Tsiper quotients coupled only weakly through line search. Although this work was first posted to the arXiv some time ago [21], it is offered here after review and revision, with changes primarily in the concluding remarks.
2. Theoretical Development
Our development begins with a brief review of the representation independent formulation developed by TINC, which avoids the $\mathcal{O}\left({N}^{3}\right)$ cost of rotating into an explicit ph, hp symmetry. Instead, this symmetry is maintained implicitly via annihilation, $\mathit{x}\leftarrow {f}_{a}\left(\mathit{x}\right)=\mathit{P}\cdot \mathit{x}\cdot \mathit{Q}+\mathit{Q}\cdot \mathit{x}\cdot \mathit{P}$, with $\mathit{P}$ the first order reduced density matrix and $\mathit{Q}=\mathit{I}-\mathit{P}$ its complement. Likewise, the indefinite metric associated with the J-symmetry of Equation (1) is carried through the generalized norm $\langle \mathit{x},\mathit{y}\rangle =\mathrm{tr}\left\{{\mathit{x}}^{T}\cdot \left[\mathit{y},\mathit{P}\right]\right\}$. Introducing the operator equivalents, $\mathit{L}\left[\mathit{p}\right]\iff \mathbb{K}\cdot \overrightarrow{p}$ and $\mathit{L}\left[\mathit{q}\right]\iff \mathbb{T}\cdot \overrightarrow{q}$, the Tsiper functional becomes $\omega \left[\mathit{p},\mathit{q}\right]=\frac{\langle \mathit{p},\mathit{L}\left[\mathit{p}\right]\rangle }{2\left|\langle \mathit{p},\mathit{q}\rangle \right|}+\frac{\langle \mathit{q},\mathit{L}\left[\mathit{q}\right]\rangle }{2\left|\langle \mathit{p},\mathit{q}\rangle \right|}.$ Transformations between the transition density and the dual space involve simple manipulations and minimal cost, allowing Fock builds with the transition density and optimization in the dual space. The splitting operation is given by $\mathit{p}={f}_{+}\left(\mathit{v}\right)=\mathit{P}\cdot \mathit{v}\cdot \mathit{Q}+{\left[\mathit{Q}\cdot \mathit{v}\cdot \mathit{P}\right]}^{T}$ and $\mathit{q}={f}_{-}\left(\mathit{v}\right)=\mathit{P}\cdot \mathit{v}\cdot \mathit{Q}-{\left[\mathit{Q}\cdot \mathit{v}\cdot \mathit{P}\right]}^{T}$. Likewise, $\mathit{L}\left[\mathit{p}\right]={f}_{-}\left(\mathit{L}\left[\mathit{v}\right]\right)$ and $\mathit{L}\left[\mathit{q}\right]={f}_{+}\left(\mathit{L}\left[\mathit{v}\right]\right)$. The back transformation (merge) from dual to density is $\mathit{v}=F(\mathit{p},\mathit{q})=\left(\mathit{p}+\mathit{q}+{\left[\mathit{p}-\mathit{q}\right]}^{T}\right)/2$. This framework provides the freedom to work in any orthogonal representation, and to switch between transition density and oscillator duals with minimal cost.
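The split, merge and annihilation operations can be checked on a small example. The sketch below is a dense, pure-Python illustration with hypothetical function names, assuming the forms $f_{\pm}(\mathit{v})=\mathit{P}\cdot\mathit{v}\cdot\mathit{Q}\pm[\mathit{Q}\cdot\mathit{v}\cdot\mathit{P}]^{T}$ and $F(\mathit{p},\mathit{q})=(\mathit{p}+\mathit{q}+[\mathit{p}-\mathit{q}]^{T})/2$ and a toy projector with a single occupied orbital. It shows that merging the duals returns the annihilated transition density $f_a(\mathit{v})$.

```python
def matmul(a, b):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def axpy(a, b, s=1.0):
    """Elementwise a + s*b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def annihilate(P, Q, x):
    """f_a(x) = P.x.Q + Q.x.P keeps only the ph and hp blocks."""
    return axpy(matmul(matmul(P, x), Q), matmul(matmul(Q, x), P))

def split(P, Q, v):
    """Return the duals (p, q) = (f_+(v), f_-(v))."""
    pvq = matmul(matmul(P, v), Q)
    qvp_t = transpose(matmul(matmul(Q, v), P))
    return axpy(pvq, qvp_t), axpy(pvq, qvp_t, -1.0)

def merge(p, q):
    """F(p, q) = (p + q + [p - q]^T) / 2."""
    total = axpy(axpy(p, q), transpose(axpy(p, q, -1.0)))
    return [[0.5 * x for x in row] for row in total]

# Toy projector (assumed): orbital 0 occupied, orbitals 1 and 2 virtual.
P = [[1.0 if i == j == 0 else 0.0 for j in range(3)] for i in range(3)]
Q = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(3)] for i in range(3)]
v = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

p, q = split(P, Q, v)
v_back = merge(p, q)          # equals annihilate(P, Q, v)
```

Both duals live entirely in the occupied-virtual block, so each channel carries only half of the nonzero structure of the transition density.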
QUIRQI is given in Algorithm 1. It begins with a guess for the transition density, which is then split into its dual (lines 2–3). The choice of initial guess is discussed later. Lines 4–24 consist of the nonlinear conjugate gradient optimization of the nearly independent channels: in each step, the flow of information proceeds from optimization of the duals to builds involving the density and back to the duals in a merge-annihilate-truncate-build-split-truncate (MATBST) sequence. For the variables $\mathit{v}$, $\mathit{p}$ and $\mathit{q}$ this sequence comprises lines 22–23 and 5–7, and for the corresponding conjugate gradients ${\mathit{h}}_{v}$, ${\mathit{h}}_{p}$ and ${\mathit{h}}_{q}$ it comprises lines 15–19. Truncation is carried out with the filter operation as described in Reference [17] and also below, with cost and error determined by the matrix threshold ${\tau}_{\mathrm{mtx}}$.
Algorithm 1 QUIRQI
 1: procedure QUIRQI($\omega ,\mathit{v}$)
 2:   guess $\mathit{v}$
 3:   $\mathit{p}={f}_{+}\left(\mathit{v}\right)$, $\mathit{q}={f}_{-}\left(\mathit{v}\right)$
 4:   while ${e}_{\mathrm{rel}}>\epsilon $ and ${g}_{\mathrm{max}}>\gamma $ and $\omega <{\omega}^{\mathrm{old}}$ do
 5:     $\mathit{L}\left[\mathit{v}\right]=\left[\mathit{F},\mathit{v}\right]+\left[\mathit{G}\left[\mathit{v}\right],\mathit{P}\right]$
 6:     $\mathit{L}\left[\mathit{p}\right]={f}_{-}\left(\mathit{L}\left[\mathit{v}\right]\right)$, $\mathit{L}\left[\mathit{q}\right]={f}_{+}\left(\mathit{L}\left[\mathit{v}\right]\right)$
 7:     $\mathtt{filter}\left(\mathit{L}\left[\mathit{p}\right],\mathit{L}\left[\mathit{q}\right],{\tau}_{\mathrm{mtx}}\right)$
 8:     ${\omega}_{p}=\frac{\langle \mathit{p},\mathit{L}\left[\mathit{p}\right]\rangle }{2\left|\langle \mathit{p},\mathit{q}\rangle \right|}$, ${\omega}_{q}=\frac{\langle \mathit{q},\mathit{L}\left[\mathit{q}\right]\rangle }{2\left|\langle \mathit{p},\mathit{q}\rangle \right|}$, $\omega ={\omega}_{p}+{\omega}_{q}$
 9:     ${\mathit{g}}_{p}=\mathit{q}\,{\omega}_{q}-\mathit{L}\left[\mathit{p}\right]$, ${\mathit{g}}_{q}=\mathit{p}\,{\omega}_{p}-\mathit{L}\left[\mathit{q}\right]$
10:     ${e}_{\mathrm{rel}}=\left({\omega}^{\mathrm{old}}-\omega \right)/\omega $
11:     ${g}_{\mathrm{max}}=\underset{i,j}{\mathrm{max}}\left\{{\left[{\mathit{g}}_{p}\right]}_{ij},{\left[{\mathit{g}}_{q}\right]}_{ij}\right\}$
12:     ${\beta}_{p}=\frac{\langle {\mathit{g}}_{p},{\mathit{g}}_{p}-{\mathit{g}}_{p}^{\mathrm{old}}\rangle }{\langle {\mathit{g}}_{p}^{\mathrm{old}},{\mathit{g}}_{p}^{\mathrm{old}}\rangle }$, ${\beta}_{q}=\frac{\langle {\mathit{g}}_{q},{\mathit{g}}_{q}-{\mathit{g}}_{q}^{\mathrm{old}}\rangle }{\langle {\mathit{g}}_{q}^{\mathrm{old}},{\mathit{g}}_{q}^{\mathrm{old}}\rangle }$
13:     ${\omega}^{\mathrm{old}}\leftarrow \omega $, ${\mathit{g}}_{p}^{\mathrm{old}}\leftarrow {\mathit{g}}_{p}$, ${\mathit{g}}_{q}^{\mathrm{old}}\leftarrow {\mathit{g}}_{q}$
14:     ${\mathit{h}}_{p}\leftarrow {\mathit{g}}_{p}+{\beta}_{p}{\mathit{h}}_{p}$, ${\mathit{h}}_{q}\leftarrow {\mathit{g}}_{q}+{\beta}_{q}{\mathit{h}}_{q}$
15:     ${\mathit{h}}_{v}=F({\mathit{h}}_{p},{\mathit{h}}_{q})$, ${\mathit{h}}_{v}\leftarrow {f}_{a}\left({\mathit{h}}_{v}\right)$
16:     $\mathtt{filter}\left({\mathit{h}}_{p},{\mathit{h}}_{q},{\mathit{h}}_{v},{\tau}_{\mathrm{mtx}}\right)$
17:     $\mathit{L}\left[{\mathit{h}}_{v}\right]=\left[\mathit{F},{\mathit{h}}_{v}\right]+\left[\mathit{G}\left[{\mathit{h}}_{v}\right],\mathit{P}\right]$
18:     $\mathit{L}\left[{\mathit{h}}_{p}\right]={f}_{-}\left(\mathit{L}\left[{\mathit{h}}_{v}\right]\right)$, $\mathit{L}\left[{\mathit{h}}_{q}\right]={f}_{+}\left(\mathit{L}\left[{\mathit{h}}_{v}\right]\right)$
19:     $\mathtt{filter}\left(\mathit{L}\left[{\mathit{h}}_{p}\right],\mathit{L}\left[{\mathit{h}}_{q}\right],{\tau}_{\mathrm{mtx}}\right)$
20:     $\left\{{\lambda}_{p},{\lambda}_{q}\right\}=\underset{\left\{{\lambda}_{p},{\lambda}_{q}\right\}}{\mathrm{argmin}}\;\omega \left[\mathit{p}+{\lambda}_{p}{\mathit{h}}_{p},\mathit{q}+{\lambda}_{q}{\mathit{h}}_{q}\right]$
21:     $\mathit{p}\leftarrow \mathit{p}+{\lambda}_{p}{\mathit{h}}_{p}$, $\mathit{q}\leftarrow \mathit{q}+{\lambda}_{q}{\mathit{h}}_{q}$
22:     $\mathit{v}\leftarrow F\left(\mathit{p},\mathit{q}\right)$, $\mathit{v}\leftarrow {f}_{a}\left(\mathit{v}\right)$
23:     $\mathtt{filter}\left(\mathit{p},\mathit{q},\mathit{v},{\tau}_{\mathrm{mtx}}\right)$
24:   end while
25: end procedure

The Tsiper functional is the sum of dual quotients ${\omega}_{p}$ and ${\omega}_{q}$, determined at line 8, followed by the gradients ${\mathit{g}}_{p}$ and ${\mathit{g}}_{q}$ computed at line 9. After the first cycle, the corresponding relative error ${e}_{\mathrm{rel}}$ (line 10) and maximum matrix element of the gradient ${g}_{\mathrm{max}}$ (line 11) are computed and used as an exit criterion at line 4, along with non-variational behavior $\omega >{\omega}^{\mathrm{old}}.$
Next, the Polak-Ribière variant of nonlinear conjugate gradients yields the search direction in each channel, ${\mathit{h}}_{p}$ and ${\mathit{h}}_{q}$ (lines 12–14). The action of $\mathit{L}[\cdot ]$ onto ${\mathit{h}}_{p}$ and ${\mathit{h}}_{q}$ is then computed, again with a MATBST sequence (lines 15–19), followed by a self-consistent dual channel line search at line 20, as described below. With steps ${\lambda}_{p}$ and ${\lambda}_{q}$ in hand, minimizing updates are taken along each conjugate direction (line 21), and the cycle repeats with the MATBST sequence spanning lines 21–23 and 5–7.
Optimization of the Tsiper functional
$\omega \left[{\lambda}_{p},{\lambda}_{q}\right]\equiv \omega \left[\mathit{p}+{\lambda}_{p}{\mathit{h}}_{p},\mathit{q}+{\lambda}_{q}{\mathit{h}}_{q}\right]$ involves a two-dimensional line search (line 20) corresponding to minimization of
with coupling entering through terms in the denominator such as
${U}_{pq}=\langle {\mathit{h}}_{p},{\mathit{h}}_{q}\rangle $. A minimum in Equation (3) can be found quickly to high precision by alternately substituting one-dimensional solutions one into the other until self-consistency is reached. This semi-analytic approach starts with a rough guess at the pair $\left\{{\lambda}_{p},{\lambda}_{q}\right\}$ (e.g., found by a coarse scan) followed by iterative substitution, where for example the p-channel update is
with an analogous update for the
q-channel obtained by swapping subscripts. As the solution decouples (${S}_{pq}$, ${T}_{pq}$ and ${U}_{pq}$ become small) the steps are found independently.
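The alternating substitution described above can be sketched in a few lines. This is an illustration under stated assumptions, not the FreeON implementation: each channel numerator is taken to be a quadratic $a+2b\lambda+c\lambda^{2}$ built from inner products along the search direction, the shared overlap is taken to be $R+S{\lambda}_{p}+T{\lambda}_{q}+U{\lambda}_{p}{\lambda}_{q}$ (the assignment of $S$ and $T$ to the cross terms is assumed), and $c>0$ in both channels. All function and parameter names are hypothetical.

```python
import math

def omega(lp, lq, cp, cq, den):
    """Dual quotient along the two search directions.
    cp = (a, b, c): p-channel numerator a + 2*b*lp + c*lp**2; cq likewise.
    den = (R, S, T, U): overlap R + S*lp + T*lq + U*lp*lq (assumed form)."""
    num = (cp[0] + 2 * cp[1] * lp + cp[2] * lp ** 2
           + cq[0] + 2 * cq[1] * lq + cq[2] * lq ** 2)
    R, S, T, U = den
    return num / (2.0 * abs(R + S * lp + T * lq + U * lp * lq))

def stationary_steps(n0, b, c, d, e):
    """Stationary points of (n0 + 2*b*l + c*l**2) / (2*|d + e*l|),
    i.e. roots of c*e*l**2 + 2*c*d*l + (2*b*d - e*n0) = 0."""
    if abs(c * e) < 1e-14:        # decoupled limit: plain quadratic minimum
        return [-b / c]
    disc = max(c * c * d * d - c * e * (2 * b * d - e * n0), 0.0)
    root = math.sqrt(disc)
    return [(-c * d + root) / (c * e), (-c * d - root) / (c * e)]

def dual_line_search(cp, cq, den, lp=0.0, lq=0.0, tol=1e-12, max_iter=100):
    """Alternate one-dimensional solutions until the pair is self-consistent."""
    R, S, T, U = den
    for _ in range(max_iter):
        lp_old, lq_old = lp, lq
        # p-channel update with the q-channel frozen
        nq = cq[0] + 2 * cq[1] * lq + cq[2] * lq ** 2
        cands = stationary_steps(cp[0] + nq, cp[1], cp[2], R + T * lq, S + U * lq)
        lp = min(cands, key=lambda l: omega(l, lq, cp, cq, den))
        # q-channel update with the p-channel frozen (subscripts swapped)
        n_p = cp[0] + 2 * cp[1] * lp + cp[2] * lp ** 2
        cands = stationary_steps(cq[0] + n_p, cq[1], cq[2], R + S * lp, T + U * lp)
        lq = min(cands, key=lambda l: omega(lp, l, cp, cq, den))
        if abs(lp - lp_old) < tol and abs(lq - lq_old) < tol:
            break
    return lp, lq

# Weakly coupled toy coefficients (made up for illustration only).
cp, cq, den = (1.0, -0.2, 2.0), (1.0, -0.1, 3.0), (1.0, 0.05, 0.02, 0.01)
lp, lq = dual_line_search(cp, cq, den)
```

When the cross terms vanish, each update collapses to the independent minimum $\lambda=-b/c$, which is the decoupled limit described in the text.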
3. Results and Discussion
QUIRQI has been implemented in FreeON [22], which employs the linear scaling Coulomb and Hartree-Fock exchange kernels QCTC and ONX, with cost and accuracy controlled by the two-electron screening threshold ${\tau}_{2\mathrm{e}}$ [18]. N-scaling solution of the QUIRQI matrix equations is achieved through “sparsification” of the underlying vector space, where the filter operation is applied to drop atom-blocks with norm smaller than a drop tolerance ${\tau}_{\mathrm{mtx}}$ in a block-CSR (BCSR) data structure [15,16,17]. (In previous works [15], this process has been loosely referred to as SpAMM, involving both truncation and dynamic dropping of small row-column contributions from the sparse matrix multiply based on the BCSR data structure; see Section IVA3 of Reference [15]. In this work, only truncation in the BCSR data structure has been used, although it was incorrectly referred to as SpAMM in a previous instance [21]. In more recent developments, SpAMM refers to recursive, hierarchical truncation in the product space [23,24], rather than the row-column approach outlined in Reference [15].) All calculations were carried out with version 4.3 of the gcc/gfortran compiler under version 8.04 of the Ubuntu Linux distribution and run on a 2 GHz AMD Quad Opteron 8350.
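The filter operation described above can be sketched as a block-wise drop. This is a minimal illustration, not the FreeON routine: a Python dictionary keyed by atom pairs stands in for the BCSR structure, the Frobenius norm is assumed (the text does not specify which block norm is used), and the function names are hypothetical.

```python
import math

def block_norm(block):
    """Frobenius norm of a dense atom-block stored as a list of rows."""
    return math.sqrt(sum(x * x for row in block for x in row))

def filter_blocks(matrix, tau_mtx):
    """Drop atom-blocks whose norm falls below the tolerance tau_mtx.
    `matrix` maps (row_atom, col_atom) -> dense block."""
    return {ij: blk for ij, blk in matrix.items() if block_norm(blk) >= tau_mtx}

# Toy matrix row with block norms decaying away from the diagonal.
m = {
    (0, 0): [[1.0, 0.1], [0.1, 1.0]],
    (0, 1): [[1e-3, 0.0], [0.0, 1e-3]],
    (0, 2): [[1e-6, 0.0], [0.0, 1e-6]],
}
kept = filter_blocks(m, tau_mtx=1e-4)
print(sorted(kept))           # block (0, 2) falls below the tolerance
```

Because the test is on the block norm rather than on the distance between atoms, a block that is numerically significant survives however far apart its atoms are.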
For systems studied to date, QUIRQI is found to converge monotonically with rates comparable to the TDA, as shown in Figure 1. Based on the comparative performance presented by TINC, the TDA rate of convergence appears to be a lower bound for RPA solvers. In addition to the convergence rate, performance is strongly determined by the initial guess. The following results have been obtained using the polarization response density along the polymer axis [12], which can be computed in $\mathcal{O}\left(N\right)$ by Perturbed Projection [25]. Also, a relative precision of four digits in the excitation energy is targeted with the convergence parameters $\epsilon ={10}^{-4}$ and $\gamma ={10}^{-3}$, with exit from the optimization loop on violation of monotonic convergence ($\omega >{\omega}^{\mathrm{old}}$ due to precision limitations associated with linear scaling approximations).
Figure 1.
Convergence of RHF/3-21G Tamm-Dancoff approximation (TDA) and Random Phase Approximation (RPA) with the Rayleigh Quotient Iteration (RQI) and QUasi-Independent RQI (QUIRQI) algorithms for linear decaene (C${}_{10}$H${}_{2}$). Calculations were started from the same random guess, and tight numerical thresholds were used throughout. In the representation independent scheme, the cost per iteration is the same for TDA and RPA.
In Figure 2, linear scaling and convergence to the bulk limit are demonstrated for a series of polyphenylene vinylene (PPV) oligomers at the RHF/6-31G** level of theory for the threshold combinations $\left\{{\tau}_{\mathrm{mtx}},{\tau}_{2\mathrm{e}}\right\}=\left\{{10}^{-4},{10}^{-5}\right\}$ and $\left\{{10}^{-5},{10}^{-6}\right\}$. Significantly more conservative thresholds have been used for the Coulomb sums, which incur only minor cost. Convergence is reached in 24–25 iterations, with the cost of Coulomb summation via QCTC comparable to the cost of BCSR(${\tau}_{\mathrm{mtx}}={10}^{-4}$). In Figure 3, linear scaling and convergence to the bulk limit are demonstrated for a series of (4,3) carbon nanotube segments at the RHF/3-21G level of theory for the same threshold combinations, again with convergence achieved in about 24–25 cycles. In both cases, tightening the pair $\left\{{\tau}_{\mathrm{mtx}},{\tau}_{2\mathrm{e}}\right\}$ leads to a systematically improved result. While the $\left\{{10}^{-4},{10}^{-5}\right\}$ thresholds that work well for PPV lead to non-monotone behavior with respect to extent for the nanotube series, dropping one more decade to $\left\{{10}^{-5},{10}^{-6}\right\}$ leads to sharply improved behavior. Dropping thresholds further to $\left\{{10}^{-6},{10}^{-7}\right\}$ yields identical results to within the convergence criteria (∼four digits) across the series, also scaling with N but at roughly twice the cost.
These results demonstrate that QUIRQI can achieve both systematic error control and linear scaling in solution of the RPA eigenproblem for systems with extended conjugation. Relative to PPV, the greater numerical sensitivity encountered with the nanotube series is consistent with the ground state problem, where a smaller band gap and greater atomic connectivity typically demand tighter thresholds.
Figure 2.
Approach to the bulk limit of the polyphenylene vinylene (PPV) first excited state at the 6-31G**/RPA level of theory, with inset showing linear scaling cost for Hartree-Fock (HF) exchange (ONX) and sparse linear algebra (BCSR). The cost of Coulomb sums with much tighter thresholds is comparable to that of the BCSR.
Figure 3.
Approach to the bulk limit of the first excited state of the (4,3) carbon nanotube segment at the 3-21G/RPA level of theory, with inset showing linear scaling cost for HF exchange (ONX), sparse linear algebra (BCSR) and Coulomb sums (QCTC).
4. Conclusions
Since this note appeared some time ago in arXiv [21], several related efforts have appeared that deserve comment: (A) single channel optimization with radial cutoffs [26] and (B) conventional algebra with a four channel line search [27,28]. In the first instance, the ONETEP group have implemented a single channel quotient scheme for the TDA and demonstrated linear scaling for a number of systems using the radial cutoff approach to achieve reduced complexity. In the radial cutoff approach, portions of the vector space are eliminated from the linear algebra when the Cartesian distance between associated atoms becomes greater than some cutoff radius. (Going a step further, new technologies are emerging that achieve reduced complexity without truncation in the vector space [23,24].) The ONETEP paper is recommended for its careful discussion of radial cutoffs leading to artificial truncation in cases of long range charge transfer, e.g., Reference [26], Figure 5. In the current implementation, the filter operation eliminates only elements of the vector space that are numerically small, so that an unphysical truncation does not occur in cases of long range charge transfer, extended conjugation, etc. For problems without long range charge transfer or extended conjugation, for example large problems with well localized chromophores as in References [29,30], the complexity of QUIRQI with respect to system size becomes $\mathcal{O}\left(1\right)$.
In the second instance, the QUIRQI method has been extended to include two additional channels in the line search [27,28]: $\underset{\left\{\alpha ,\beta ,{\lambda}_{p},{\lambda}_{q}\right\}}{\mathrm{argmin}}\;\omega \left[\alpha \mathit{p}+{\lambda}_{p}{\mathit{h}}_{p},\beta \mathit{q}+{\lambda}_{q}{\mathit{h}}_{q}\right]$. The authors claim without elaboration that “the solution by our 4D search is and can be much better (than the dual channel approach)” [28]. As shown in Figure 1, QUIRQI decouples in the first few steps, achieving convergence equivalent to TDA[RQI], with ${R}_{pq}\to 1,{S}_{pq}\to 0,{T}_{pq}\to 0,{U}_{pq}\to 0$ in Equation (3), so it is hard to understand this unsupported claim, considering also the imperative that $\alpha \beta =1$ to maintain normalization. These claims are also undercut by apparently slow rates of convergence; compare for example Figure 1 of this work with Reference [27], especially Figure 3. These authors further claim without explanation that “dual channel optimization are not readily extensible to the subspace search” [28]. Again, it is hard to understand how the dual channel case is not extensible to subspace schemes for finding multiple eigenvalues; the equivalent of (single channel) block RQI has been demonstrated in the ONETEP paper [26], and no obvious problems are foreseen with more sophisticated methods such as LOBPCG [31] for single or dual channel approaches.
To summarize, the QUIRQI method is characterized by two innovations. First, dual channel optimization separates the Tsiper functional into two nearly independent quotients that cannot be further improved by additional channels in the line search. Reflecting this separation, convergence of the TDSCF matrix eigenproblem with QUIRQI is found to be equivalent to that of the single quotient matrix eigenproblem in the Tamm-Dancoff approximation, as shown in Figure 1. Second, the method is variational with respect to an incomplete linear algebra, controlled in this work through the filter threshold ${\tau}_{\mathrm{mtx}}$ [15,16,17], as shown in Figure 2 and Figure 3. While QUIRQI is not variational with respect to the screening parameter ${\tau}_{2\mathrm{e}}$, the solution can be systematically improved by tightening ${\tau}_{2\mathrm{e}}$ [18], in contrast to nested subspace methods that encounter an iterative accumulation of errors [19,20]. Indeed, eigensolution posed as optimization provides considerable flexibility in choosing a path to solution, offering opportunities for mixed precision GPU acceleration [32] and variable thresholding (tightening the parameter ${\tau}_{2\mathrm{e}}$ during convergence).