Multi-state Multireference Rayleigh – Schrödinger Perturbation Theory for Mixed Electronic States : Second and Third Order

The formalism for multi-state multireference configuration-based RayleighSchrödinger perturbation theory and procedures for its implementation for the second-order and third-order energy within a multireference configuration interaction computer program are reviewed. This formalism is designed for calculations on electronic states that involve strong mixing between different zero-order contributions, such as avoided crossings or mixed valence-Rydberg states. Such mixed states typically display very large differences in reference-configuration mixing coefficients between the reference MCSCF wave function and an accurate correlated wave function, differences that cannot be reflected in state-specific (diagonalize-then-perturb) multireference perturbation theory through third order. A procedure described in detail applies quasidegenerate perturbation theory based on a model space of a few state-averaged MCSCF functions for the states expected to participate strongly in the mixing, and can be characterized as a “diagonalize-then-perturb-thendiagonalize” approach. It is similar in various respects to several published methods, including an implementation by Finley, Malmqvist, Roos, and Serrano-Andrés [Chem. Phys. Lett. 1998, 288, 299–306].


Introduction
The term "mixed electronic states" refers to states in which electronic structures of different types, such as valence and Rydberg, or covalent and ionic, contribute strongly to the wave function.Moreover, the relative contributions of the different types tend to vary strongly with variations in the molecular geometry, as in the case of avoided curve crossings.In multireference electronic structure calculations for such states, the relative contributions of different reference configurations tend to differ substantially between the reference (MCSCF) wave function and the final correlated wave function.
For example, the crossing point between the potential energy curves of the covalent and ionic configuration in alkali halides can vary by several Bohr between an MCSCF and a multireference CI calculation [1] (primarily because of the great difficulty in reproducing the electron affinity of the halogen atom).Obviously, in the region between the MCSCF and correlated crossing points, the MCSCF solution provides the wrong zero-order functions for the multireference treatment.If the computational model does not allow relaxation of the coefficients of the reference configurations in the correlation treatment, a correct description of the mixing and of the potential energy surfaces cannot be expected.
Two examples of correlation treatments that are affected by the problem of incorrect mixing are internally-contracted CI [2][3][4][5] and state-specific multireference perturbation expansions (of the diagonalize-then-perturb variety) at second and third order, such as CASPT2 [6] and CASPT3 [7].The reference (MCSCF) function does not interact directly with its orthogonal complement in the reference space, and therefore the other eigenvectors of the MCSCF Hamiltonian do not contribute to a perturbation expansion before the fourth order in the energy.Solutions for this problem in the case of contracted CI have been introduced [5,8], in which contracted CI excitations based on more than one MCSCF reference-space eigenvector are included.Similar ideas have been applied in multireference perturbation theory, as discussed below, and one form of such an approach for second-and third-order multireference Rayleigh-Schrödinger perturbation theory is described here.
Multireference perturbation approaches fall into two main classes: In quasidegenerate perturbation theory [9][10][11], also referred to as "perturb then diagonalize," a low-dimensional effective Hamiltonian is constructed using a perturbation expansion for each of its matrix elements, and this effective Hamiltonian is diagonalized to obtain the desired solutions for one or more states.The principal difficulty with this approach is the problem of intruder states, which can cause the perturbation series to diverge or to converge extremely slowly.On the other hand, in state-specific perturbation theory, also called "diagonalize then perturb," a zero-order function is first constructed by diagonalizing the Hamiltonian over the reference space, usually by an MCSCF calculation, and a single perturbation expansion is then constructed over this zero-order function.The principal problem with this approach is the previously mentioned lack of relaxation of the zero-order function before reaching the computationally demanding fourth order in the energy expansion.
The present paper discusses a form of multireference perturbation theory that can be referred to as a "diagonalize-then-perturb-then-diagonalize" approach, which is designed to facilitate the relaxation of the reference function and thus deal with the mixed-states problem.It begins with a state-averaged MCSCF [12,13] calculation to provide a small number of model-state zero-order functions, and applies quasidegenerate perturbation theory to obtain an effective Hamiltonian in that small model space, followed by diagonalization of the effective Hamiltonian to obtain properly-mixed wave functions and energies.The model states used to construct the effective Hamiltonian are a small subset of the eigenstates of the state-averaged MCSCF Hamiltonian, and if they can be chosen to be well-separated in energy from any other zero-order states the intruder-state problem can be reduced.This approach has been proposed and implemented, in one form or another, by a number of researchers, including Malrieu and co-workers [14], Sheppard et al. [15], Lisini and Decleva [16], Nakano [17], and Roos and co-workers [18].The current presentation includes the following specific features: 1.The choices of orbitals and zero-order Hamiltonian try to mimic the Møller-Plesset procedures that have been found to be very effective in single-reference perturbation expansions.
2. The zero-order Hamiltonians need not be diagonal.

A non-Hermitian effective
Hamiltonian is generated, based on the Bloch equation.
4. The method is formulated in configuration space, allowing flexibility in the choice of the reference space (including incomplete active spaces), and enabling easy implementation in a CI program.
5. Uncontracted configuration state functions are used as a basis for the perturbation expansions.
6. Procedures for both second and third order in the energy are included.
The other presentations share many of these features and differ in various respects, such as in the use of natural orbitals, Epstein-Nesbet partitioning, complete active spaces, diagonal zero-order Hamiltonians, Hermitian effective Hamiltonians, many-body methods, or limitation to second-order energies.
In order to introduce the notation and the overall approach, the fundamentals of Rayleigh-Schrödinger perturbation theory are reviewed very briefly in a general form in Section 2, and the application to the state-specific multireference treatment is described in Section 3. The generalization for mixed states is described in Section 4, followed by discussion in Section 5.

Rayleigh-Schrödinger perturbation theory for arbitrary zero-order functions
Conventionally, a perturbation treatment begins with the partitioning of the Hamiltonian, into a zero-order part , ˆ0 H for which the solutions are known, and a perturbation .
V Then the desired solutions are expanded, order by order, in the eigenfunctions of 0 Ĥ .Here we shall use the alternative, more general approach, in which the process is reversed, beginning with a given set of expansion functions and defining 0 Ĥ in terms of these functions.This approach clearly demonstrates the various op-tions available in the method, and is often followed in electronic structure presentations.
Given an orthonormal set of zero-order functions , , , , of which one, labeled , ) 0 ( α Φ is an approximation for the state of interest, we define a zero-order Hamiltonian with the Hermitian matrix . We then have In principle, the choice of the ) 0 ( ij E is arbitrary; however, an appropriate choice is essential, since it determines the convergence rate of the perturbation series.The perturbation is given by and the perturbation series for the state approximated by We also define The first-order energy is given in terms of the zero-order wave function, The equation for the first-order correction to the wave function is Expanding the first-order wave function in the zero-order functions, and applying ) ( resulting in a linear system of equations for the where . Intermediate normalization is imposed by the choice 0 ) 1 In many cases it is convenient to choose an 0 Ĥ that is diagonal in the zero-order functions, putting in which case the linear system has the explicit solution In either case, the second-and third-order energy corrections are given by

A. The zero-order functions
An optimized MCSCF wave function in a reference-space of m configuration state functions The 1 − m orthogonal-complement functions in the reference space (the other eigenvectors in the MCSCF Hamiltonian diagonalization) will be designated , while all other ("external") configuration state functions will be ) ( As a consequence of the diagonalization of the Hamiltonian in the MCSCF space, it is reasonable to choose the zero-order Hamiltonian in the form so that the reference-space block of the and the orthogonal complement functions ) , ( do not contribute to the first-order wave function and to the second-and third-order energies and need not be obtained explicitly.The same is true for all external functions that are higher than doubly excited relative to all the reference configurations.Note that the indices are used to refer to the multiconfigurational functions generated by diagonalization of the MCSCF Hamiltonian, while i, j usually refer to individual configu- ).We also have so that the calculations can be carried out in terms of the usual CI matrix elements ij H .

B. The zero-order Hamiltonian
Since Møller-Plesset partitioning is very effective in the single-reference case, it is reasonable to make 0 Ĥ as close as possible to a suitably-chosen generalized Fock operator, where pq E ˆ is a unitary group generator (excitation operator), and the summations are over the orbitals.
Therefore the energy parameters matrix ) 0 ( E is defined as Elements of  [19], it should be possible to choose the Fock operator so that all or most such matrix elements vanish.
For the MCSCF reference function the matrix elements of the pq E ˆ generators are just density matrix elements, and the zero-order energy is For diagonal elements ), ( where ) ( ) is the occupation number of the p-th orbital in If the Fock matrix f is diagonal, so is the It has been argued [20][21][22] that for size consistency (see Section 5) in multireference PT the zeroorder Hamiltonian should include certain two-electron terms.The 0 Ĥ proposed by Dyall [20] includes the two-electron interactions between active orbitals in matrix elements connecting configuration state functions that differ only in their active-orbital part.Such terms do not add an unduly large number of contributions to the matrix of 0 Ĥ , and do not substantially increase the computational effort for the solution of the linear equations for the perturbed wave function.

C. Choice of the generalized Fock operator
The simplest choice for the generalized Fock operator is a generic (spin-averaged) form, with where A more elaborate choice involves the true MCSCF Fock matrix [23] (which becomes symmetric upon convergence), Any linear transformation flexibility in the choice of the individual MCSCF orbitals can be resolved so as to make the Fock matrix f as diagonal as possible.But even if f is not diagonal, the matrix ) 0 ( E is very sparse, because F ˆ is a one-electron operator and can have nonvanishing matrix elements only between configuration state functions differing by at most one orbital.This property makes the solution of the linear system (12) relatively easy.A limited use of two-body terms in 0 Ĥ , as in Dyall's Hamiltonian [20], does not greatly increase the difficulty of this procedure.

D. Evaluation of the perturbation series in a CI program
The principal computational step in a direct CI program is the application of the Hamiltonian operator to a trial vector.This step can easily be used to calculate the first-order wave function and secondand third-order energies in the perturbation expansion.
Beginning with a trial vector are the coefficients of the reference configurations in the MCSCF wave function, Eq. ( 17), and all other components are zero, and noting that (where the matrix H is defined in terms of the original configuration state functions j Θ , with ), we obtain the vector of first-order coefficients by solving the linear equations (12) or, in the diagonal case, from Eq. ( 14).The computational effort of this step is quite small, because the nonzero part of ) 0 ( α c is very short.The first-order wave function is then has been obtained, the second-order energy is easily computed from Eq. ( 15).An application of the Hamiltonian to allows the evaluation of the third-order energy correction from Eq. ( 16).This last step is equivalent in computational effort to one CI iteration.The multireference perturbation theory calculation can in fact serve as an initial step in a CI treatment.

The treatment of mixed electronic states
Mixed electronic states are characterized by large changes in the relative contributions of the reference configurations in a well-correlated final wave function compared to the zero-order MCSCF function.Typically the reason for this behavior is that two or more zero-order wave functions are close in energy and can mix strongly in the wave functions for the corresponding correlated electronic states.
Examples are provided by avoided crossing regions in potential energy curves and surfaces and by states that involve mixing of valence and Rydberg character.The number of zero-order functions involved in the mixing is usually quite small, typically just a few multiconfigurational functions.
This problem can be dealt with by the use of quasidegenerate perturbation theory (QDPT) based on a model space consisting of the few zero-order functions expected to mix strongly in the states of interest.These zero-order functions ("model states") are a small subset of the multiconfigurational expansions obtained in the diagonalization of an MCSCF Hamiltonian over the reference space.An effective Hamiltonian is then constructed in second order and, if desired, third order and diagonalized to provide improved energies and wave functions.While the reference space may be quite large, only a few model states (each of which is an expansion over all the reference configurations) are used, and if these model states can be chosen to be well separated in energy from other states, the intruder-state problem common in QDPT treatments is not likely to be significant in this approach.However, it may be difficult to maintain such a separation while avoiding discontinuities in the model space over a wide range of geometries on a potential energy surface.
The procedure begins with a state-averaged MCSCF calculation in order to determine orbitals that represent a compromise between those most suitable for each of the individual zero-order functions likely to be involved in the mixed states.The coefficients of the reference configurations obtained in the MCSCF diagonalization for the contributing states provide a number of zero-order vectors ) , , , The zero-order Hamiltonian is chosen as in Eq. ( 19), with the whole reference block of ) 0 ( E taken to be diagonal, so that we have and elements of , do not enter into the calculation of the first-order wave function and the second-and third-order energies.
The effective Hamiltonian EFF Ĥ is expanded as where W ˆ is known as the shift operator.Only the model-space part of these operators is needed, and can be represented by l l × matrices, where the zero-order part is simply the diagonal model-space block of We also define The first-order effective Hamiltonian is simply the corresponding portion of the diagonalized MCSCF Hamiltonian, The effective Hamiltonian is related to the original Hamiltonian by a similarity transformation that decouples the model-space part from the rest of the Hilbert space, where the l-column decoupling matrix C, which is a representation of the wave operator, has the order-by-order expansion in terms of the coefficient matrices defining the order-by-order expansions of the wave functions.As before, the matrix H (unlike 0 H in Eq. ( 38)) is defined in terms of the individual configuration state functions i Θ .Equation ( 41) is a representation of the Bloch equation [9] in the configuration-statefunction basis.Note that the first m rows of The first-order wave operator is represented by the l-column matrix calculated for each of the zero-order functions by Eq. ( 12) or ( 14), analogously to the single-state calculations described in the previous section.The second-order non-Hermitian shift operator is then obtained as ) , , Solution of the l l × non-Hermitian eigenvalue problem ) provides the second-order energies in the diagonal matrix of eigenvalues ) 2 ( E and the expansion coefficients of the properly-mixed first-order wave functions For the calculation of the third-order shift operator we note that Wigner's ) 1 2 ( + n rule does not apply to the off-diagonal elements of the non-Hermitian shift operator, and therefore )

W cannot be
fully obtained from the first-order wave functions.Instead it is computed from the second-order wave functions, ) , , The complete second-order wave functions include contributions from triple and quadruple excitations and from the orthogonal complement space { } . However, as seen from Eq. (47), only those terms that interact (across V ˆ) with the zero-order functions, i.e., the single-and double-excitation terms, are required.If we choose ) 0 ( E to have no off-diagonal elements coupling the higher excitations with the single and double excitations, we can decouple the equations for the single-and double-excitation coefficients from the others and solve for these from the linear equations system In fact, in the diagonal case these coefficients are obtained directly from In either case, the principal computational step is the calculation of the matrix-vector products so that the computational effort is not very different from l times the work required for the calculation of the third-order energy for the single state case, Eq. ( 34).
The third-order shift-operator matrix elements are easily obtained by the contraction of vector pairs, ) , , This step is followed by the solution of the non-Hermitian eigenvalue problem for the third-order effective Hamiltonian matrix ( )

3
H to obtain the third-order energies C for the relevant part of the second-order wave functions ) 2 ( α Ψ , it does not provide the complete second-order wave function, because of the omission of the higher excitations and the orthogonal complement functions.Once the truncated second-order wave function has been obtained, the construction of the thirdorder effective Hamiltonian matrix and the diagonalization require little computational effort, and thus the total effort is still about l times the effort in the single-state case. An important aspect in which this treatment differs from the single-state case is the fact that it is no longer possible to tailor the zero-order Hamiltonian to resemble closely an appropriate Fock operator for each of the states in question, because of the use of state-averaged MCSCF.This aspect is the most serious shortcoming of the approach, especially in cases in which the model states differ greatly in the character of their orbitals.Furthermore, the most reasonable choices for the generalized Fock operator, and thus for 0 Ĥ , are likely to involve more nonzero off-diagonal elements in ) 0 ( E and V, and to be less analogous to the Møller-Plesset partitioning of single-reference perturbation theory.Nevertheless, the ) 0 ( E matrix should remain very sparse, because it is the matrix representation of a one-electron operator (the state-averaged Fock operator).Another potential problem is that it may be very difficult to choose a model space that does not vary discontinuously over a potential energy surface.

Discussion
The principal advantage of Rayleigh-Schrödinger perturbation theory over configuration interaction is the extensivity of the energy in each order of perturbation, i.e., the proper scaling of the energy with the extent of the system [28].This property is a consequence of the true order-by-order nature of the Rayleigh-Schrödinger perturbation expansion (unlike Brillouin-Wigner perturbation theory, in which the infinite-order energy appears in each order of the energy expression).In a many-body formulation extensivity can be determined by demonstrating the absence of unlinked-diagram terms in the energy expressions.However, multireference many-body treatments using incomplete model spaces generally contain some unlinked diagrams, and thus are not exactly extensive [29,30].In the present, configuration-based treatment the diagrammatic analysis is not easily applied.As in the many-body approach, it is to be expected that extensivity of the proposed method would depend on the choice of the model space, but the true order-by-order nature of the expansion should help in providing approximate extensivity .
A related property is size consistency [31] (or strict separability [26]), which requires that the energy calculated for a system consisting of noninteracting fragments be equal to the sum of the energies, calculated by the same model, for the separate fragments.This property ensures proper description of bond breaking.However, a perturbation expansion cannot be size consistent, in general, unless the reference function is size consistent.In a Hartree-Fock-based single-reference perturbation expansion (Møller-Plesset perturbation theory) size consistency usually requires the use of an unrestricted Hartree-Fock (UHF) reference function, which has undesirable consequences [32,33].In multireference methods size consistency is not easily defined because of the difficulty in specifying equivalent reference spaces for the total system and its fragments.Therefore, the approach normally used in describing bond breaking and other processes in multireference calculations is to treat the system as a single unit in all its configurations, including its dissociation asymptotes.This procedure, called the supermolecule approach, usually produces satisfactory dissociation energies even in truncated CI calculations [34] (which are not size consistent by the usual definition), provided the reference function dissociates properly.In a single-state expansion based on an MCSCF zero-order function the reference space can be chosen to ensure proper dissociation of that function and, using the supermolecule approach, of the order-by-order energy results (assuming there are no other approximations, such as omission of terms from the summations).It is likely that the same statement can be applied to the procedure discussed here for mixed states.
There have been many formulations of multireference perturbation theory, both state-specific and quasidegenerate [6,7,21,22,[35][36][37][38][39][40][41][42][43][44][45][46][47][48][49][50][51][52].Several (e.g., [21,22,52]) have focused specifically on achieving extensivity and/or size consistency (see also [53,54]).The single-state treatment described in Section 3 is similar to the approach of Hirao [48], differing from it primarily in how the flexibility in the choice of the MCSCF orbitals is used and, therefore, in the choice of the Fock matrix elements.Unlike CASPT2 [6] and CASPT3 [7], the present treatment does not assume a complete-active-space reference space.On the other hand, the CASPT methods expand the perturbed wave functions in contracted excited functions, rather than in the individual configuration state functions used in the present approach, resulting in much shorter expansions (at the cost of a more complicated procedure).The multi-state generalization shares some ideas with extensions to the contracted CI approach for dealing with mixed stated [8].The key elements in the approach described here are the attempt to mimic Møller-Plesset partitioning and the use of just a few state-averaged MCSCF wave functions to define the model space for a QDPT calculation of the second-and third-order energies.The method treats the several mixing states symmetrically, and since it is configuration based rather than a many-body approach, it avoids the difficulties that the use of such model states would present for a diagrammatic formalism.However, this gain is obtained at the probable cost of a less efficient procedure than may be possible in a diagrammatic method.
A diagonal version of the single-state approach described in Section 3 was reported by Shavitt and Stahlberg in 1991 [55,56] after implementation and testing in the COLUMBUS program system [57].
This implementation was later generalized to the nondiagonal case [57(b)].The multistate formalism has been implemented in a development version of the COLUMBUS program system.It differs from the method of Roos and co-workers [18] in several respects: It is not limited to complete-active-space reference spaces; it expands the perturbed wave functions in uncontracted excited configuration state functions instead of contracted excitation functions; and it includes the third-order capability.Test applications of this formalism have reproduced the second-order results of Roos and co-workers [18(b)] for the avoided crossing in LiF and for the V-state of ethylene, and produced moderate improvements in third order [58].
The idea of a model space focusing on just a few model states, much smaller in number than the size of the reference space in which the model states are constructed, is also a feature of the intermediate Hamiltonian method of Malrieu and co-workers [41,59].Among reviews of multireference perturbation methods are those by Malrieu and co-workers [59] and Hirao and co-workers [60].
, the first-order equation becomes

Φ
into the calculation of the first-order wave function and the secondand third-order energy, and need not be specified in the present treatment (except that 0 .The principal difference between 0 Ĥ and F ˆ is the omission from 0 Ĥ is an optimized MCSCF wave function that satisfies a generalized Brillouin theorem are one-electron and two-electron integrals over the MCSCF orbitals.This choice reduces to ordinary Møller-Plesset partitioning in the single-reference closed-shell RHF case.

C
the previous section, and like them, are orthonormal and noninteracting over H ˆ (and over V ˆ).The ) of l columns, in which at most the first m rows (corresponding to the reference configurations) are nonzero.

X
is also obtained, and can be used to obtain the transformed coefficients matrix ) 2 (