Excited-State Polarizabilities: A Combined Density Functional Theory and Information-Theoretic Approach Study

Accurate and efficient determination of excited-state polarizabilities (α) is an open problem both experimentally and computationally. Following our previous work, (Phys. Chem. Chem. Phys. 2023, 25, 2131−2141), in which we employed simple ground-state (S0) density-related functions from the information-theoretic approach (ITA) to accurately and efficiently evaluate the macromolecular polarizabilities, in this work we aimed to predict the lowest excited-state (S1) polarizabilities. The philosophy is to use density-based functions to depict excited-state polarizabilities. As a proof-of-principle application, employing 2-(2′-hydroxyphenyl)benzimidazole (HBI), its substituents, and some other commonly used ESIPT (excited-state intramolecular proton transfer) fluorophores as model systems, we verified that either with S0 or S1 densities as an input, ITA quantities can be strongly correlated with the excited-state polarizabilities. When transition densities are considered, both S0 and S1 polarizabilities are in good relationships with some ITA quantities. The transferability of the linear regression model is further verified for a series of molecules with little or no similarity to those molecules in the training set. Furthermore, the excitation energies can be predicted based on multivariant linear regression equations of ITA quantities. This study also found that the nature of both the ground-state and excited-state polarizabilities of these species are due to the spatial delocalization of the electron density.


Introduction
Molecular polarizability, especially static electric dipole polarizability (α), is a fundamental physicochemical property. It reflects the change of a molecule's dipole moment in a linear-response manner resulting from an external electric field perturbation [1]. It is a fundamental dielectric property important in many disciplines, especially in materials science [2,3]. Experimental determination of excited-state electrostatic properties is mainly determined using Stark spectroscopy, the electronic absorption/emission [4,5] method, and the flash photolysis time-resolved microwave-conductivity (FP-TMRC) [6,7] technique.
In classical physics, polarizability can be approximately obtained in terms of the volume of a system [8,9]. For example, many strong correlations have been observed for both atoms and molecules [10][11][12][13][14][15][16][17][18]. Tkatchenko and Scheffler (TS) [19] proposed use of atomic volumes and atomic polarizabilities to predict the ground-state polarizabilities for small molecules. Recent progress can be found in [20]. However, performance for excited-state systems has not been reported.
In quantum mechanics, polarizability can be obtained by iteratively solving the coupled-perturbed Hartree-Fock (CPHF) equation [21,22] or its Kohn-Sham DFT (density functional theory) counterpart [23]. This protocol requires a sufficiently large basis set with In quantum mechanics, polarizability can be obtained by iteratively solving the coupled-perturbed Hartree-Fock (CPHF) equation [21,22] or its Kohn-Sham DFT (density functional theory) counterpart [23]. This protocol requires a sufficiently large basis set with polarization and diffuse functions and huge computational costs, but the computational barriers can be partly overcome by using some linear-scaling methods [24][25][26]. In addition, machine learning (ML)-based [27][28][29] methods and a regression-based [30] model have been applied to predict the S0 polarizabilities. Polarizability can be related to the band gap of HOMO (highest occupied molecular orbital) and LUMO (lowest unoccupied molecular orbital) in an inverse manner [31][32][33].
In the literature, only a few studies [34][35][36][37][38][39][40] have reported excited-state polarizabilities. This is likely because accurate predictions of excited-state geometries and molecular properties of large molecules are difficult, especially when there are perturbations such as external fields.
Following our previous work, used information-theoretic approach (ITA) quantities to predict the S0 polarizabilities of both small and large molecules [41,42]. Here, we aim to predict the S1 polarizabilities of 2-(2′-hydroxyphenyl)benzimidazole (HBI, 1) and its derivatives as shown in Scheme 1. For 1, it is well-documented [43] that the S0 (Scheme 1a) intramolecular proton transfer (IPT) reaction takes place with difficulty, whereas the S1 (Scheme 1b) or T1 (triplet, not shown) intramolecular proton transfer (ESIPT) process can easily happen. Thus, in this work, only S0 and S1 were considered to reduce computational cost without compromising the results. We have found that with S0 or S1 electron densities as inputs, ITA quantities had good correlations with the excited-state polarizabilities. When transition densities were considered, both S0 and S1 polarizabilities had good relationships with ITA quantities. Furthermore, excitation energies could be predicted based on multiple linear regression equations of ITA quantities. For the first time, we applied the ITA quantities to predict excited-state polarizabilities. It is anticipated that this protocol can be readily applied to condensed-phase systems.  Table 1 are the correlation coefficients (R 2 ) between the S 0 polarizabilities (α iso ) and ITA quantities, molecular volumes, and quadrupole moments, which were obtained at the CAM-B3LYP/6-311+G(d) level for all the molecular systems shown in Schemes 1 and 2. The linear regression equations can be found in Table S1. It is clear from Table 1 that the Ghosh−Berkowitz−Parr (GBP) entropy (S GBP ), 2nd and 3rd relative Rényi entropy ( r R 2 and r R 3 ), information gain (I G ), G 1 , G 2 , and G 3 , and quadrupole moments (Θ iso ) have strong linear relationships with α iso , with R 2 > 0.8. Of note, the G 3 data were shown to be strongly correlated with α iso for various systems, among which were 30 planar or quasi-planar bases [41], and 20/40/8000 amino acids, dipeptides and tripeptides [41,42]. The Θ iso values had a good relationship with α iso , the theoretical rationale for which can be found in [42]. However, a solid and sound theoretical verification between G 3 and α iso was lacking. Additionally, molecular volumes had only reasonably good correlation with α iso , with R 2 = 0.705, indicating that it is not a good descriptor of α iso . For Shannon entropy and Fisher information, there exist three exceptions for 2, 8, and 13. It was found that these three molecules are Br-containing, indicating that there may be some regions for heavy atoms where the electron density and/or its gradients are numerically inconsistent. Taking the three outliers away, much stronger correlations were found for Shannon entropy (better than Fisher information). Here, the nature of ground-state polarizabilities was due to the spatial delocalization of the electron density. Overall, the strong correlations indicate that our computational results are sound.

Results
Shown in Table 1 are the correlation coefficients (R 2 ) between the S0 polarizabilities (αiso) and ITA quantities, molecular volumes, and quadrupole moments, which were obtained at the CAM-B3LYP/6-311+G(d) level for all the molecular systems shown in Schemes 1 and 2. The linear regression equations can be found in Table S1. It is clear from Table 1 that the Ghosh−Berkowitz−Parr (GBP) entropy (SGBP), 2nd and 3rd relative Rényi entropy ( r R2 and r R3), information gain (IG), G1, G2, and G3, and quadrupole moments (Θiso) have strong linear relationships with αiso, with R 2 > 0.8. Of note, the G3 data were shown to be strongly correlated with αiso for various systems, among which were 30 planar or quasi-planar bases [41], and 20/40/8000 amino acids, dipeptides and tripeptides [41,42].
The Θiso values had a good relationship with αiso, the theoretical rationale for which can be found in [42]. However, a solid and sound theoretical verification between G3 and αiso was lacking. Additionally, molecular volumes had only reasonably good correlation with αiso, with R 2 = 0.705, indicating that it is not a good descriptor of αiso. For Shannon entropy and Fisher information, there exist three exceptions for 2, 8, and 13. It was found that these three molecules are Br-containing, indicating that there may be some regions for heavy atoms where the electron density and/or its gradients are numerically inconsistent. Taking the three outliers away, much stronger correlations were found for Shannon entropy (better than Fisher information). Here, the nature of ground-state polarizabilities was due to the spatial delocalization of the electron density. Overall, the strong correlations indicate that our computational results are sound.   Table 2 shows S 1 polarizabilities, S 1 ITA quantities, including Shannon entropy (S S ), Fisher information (I F ), 2nd and 3rd relative Rényi entropy ( r R 2 and r R 3 ), G 2 and G 3 , molecular volumes (Vol), and quadrupole moments (Θ iso ), obtained at the TD-CAM-B3LYP/6-311+G(d) level. Table 2 also shows the correlation coefficients (R 2 ) between the S 1 polarizabilities and other quantities at S 1 . The linear regression equations can be found in Table S2. Note that some ITA quantities, well-defined at S 0 , are numerically inconsistent (for example, negative occupancy) at S 1, and are thus missing. One can see that G 2 , G 3 , and Vol have good correlations with α iso at S 1 , with R 2 > 0.8. It is intriguing that at S 1 and G 3 still have good correlations with α iso . This is the first time such a phenomenon has been observed. However, admittedly, the theoretical foundation lags behind the numerical evidence found in this work. Moreover, we have found that for Vol, the R 2 value was much larger at S 1 (0.818) than that at S 0 (0.705). One possible reason is that the excited-state relaxation expands the volume and polarizability space. Finally, in Columns 3 and 4 of S S and I F , 2, 8, and 13 seem to have abnormal values compared with the others. They are all Br-containing, indicating that there may be some regions for heavy atoms where density gradients are numerically inconsistent at S 1 . Overall, we showed that excited-state densities and molecular polarizabilities α iso are mutually entangled. After showing that ITA quantities can be correlated with α iso either at S 0 or S 1 , it is important to ask if one can use ITA quantities at S 0 to predict α iso at S 1 . The answer is definitely yes. Table 3 shows the correlation coefficients (R 2 ) between the α iso values at S 1 and ITA quantities, molecular volumes, and quadrupole moments at S 0 introduced in Table 1. More details can be found in Table S3. The linear regression equations can be found in Table S4. One can see that the α iso @S 0 , I G , G 1 , G 2 , and G 3 have good relationships with α iso @S 1 , with R 2 > 0.8. Moving forward, we ask if one can use the transition density (matrix) as an input for ITA quantities to correlate with α iso either at S 0 or S 1 . The answer again is yes. Table 4 shows strong correlations between α iso either at S 0 or S 1 and ITA quantities with the transition density matrix as an input. The linear regression equations can be found in Table S5. Except for Shannon entropy (S S ) and Fisher information (I F ), the other ITA quantities have reasonably good or strong correlations with α iso , either at S 0 or S 1 , with R 2 ranging from 0.72 to 0.94. The implication of this is that electron-density-based quantities can be used to predict excited-state properties, such as molecular polarizabilities. Table 3. Correlation coefficients (R 2 ) between the α iso @S 1 (in Bohr 3 ), and α iso @S 0 (in Bohr 3 ), ITA quantities@S 0 (in a.u.), molecular volume Vol@S 0 (in Bohr 3 /mol), and the isotropic quadrupole moment Θ iso @S 0 (in a.u.) for all the systems in Schemes 1 and 2. To further showcase the transferability of the linear regression models, we compared both the S 0 and S 1 polarizabilities for all systems in Scheme 3, as shown in Tables 5 and 6, respectively. Since Shannon entropy and Fisher information have exceptions for the linear regressions, they were not used for predicting the polarizabilities of all systems in Scheme 3. From Table 5, one can see that, S GBP , r R 2 , r R 3 , and G 3 perform well in reproducing the conventional S 0 polarizabilities, as evidenced by the mean unsigned errors (MUEs) and mean signed errors (MSEs). The other five quantities show reasonably good predictions. From Table 6, one can observe that the predicted S 1 polarizabilities are in reasonably good agreement with the conventional data. All the quantities show comparable performance. Taken together, we have shown that the transferability of the linear regression models is convincing, as the systems in Scheme 3 show little or low similarity to those in Schemes 1 and 2.     Next, we compared the α iso data (either at S 0 or S 1 ) predicted by the TS formulas with conventional results as a reference, as shown in Table 7. The predicted α iso results are shown in Tables S6 and S7. Employing the original Tkatchenko-Scheffler (TS) formula [19] with Becke [44] or Hirshfeld [45] partitions, the α iso data (either at S 0 /S 1 ) were either strongly underestimated or overestimated, with MUE(%) up to −26.44/−24.94 and 7.92/8.93, respectively. It was found that a mean value could reduce the MSE(%) to 14.40/15.53. Moreover, with the new TS formula [20], the results were not improved but worsened, as shown in Table 7. Taken together, we found that the TS formulas have room to improve in predicting S 1 polarizabilities. Finally, we found that excitation energies can be predicted in addition to multivariant linear regression equations of ITA quantities. For example, one can use the S 0 density matrix as an input for ITA quantities to correlate with excitation energies. Similarly, if the transition density matrix is used for ITA quantities, the excitation energies can also be predicted. Based on the two regression equations constructed from all the systems in Schemes 1 and 2, we collected predicted excitation energies, and those from conventional calculations, for all the systems in Scheme 3, as shown in Table 8. The MUEs and MSEs were −0.84/1.08 eV and −0.76/1.21 eV, respectively, for the excitation energies. This indicates that the inaccuracy of this protocol is comparable to that of underlying approximations of DFT [46,47].

Discussion
To accurately and efficiently predict excited-state polarizabilities is an ongoing issue. Solving standard CPHF/CPKS equations is computationally intensive, and the computational costs can be excessive for macromolecular systems. Other algorithms and models available in the literature are normally concerned with ground-state polarizabilities. Within this context, we applied density-based ITA quantities to correlate with α iso at S 0 /S 1 . This was inspired by our previous work on predicting the ground-state polarizabilities for small and macromolecular systems. Our tentative results show that the protocol is a promising theoretical tool. More systems along this line need to be considered to make this protocol more robust and applicable. We have to point out that when the system under study increases in size, the molecular wavefunctions (thus electron density) become difficult to assess and sometimes computationally unfeasible. Under these circumstances, we have to resort to linear-scaling electronic structure methods, such as GEBF (generalized energybased fragmentation method) [48][49][50][51], where only small subsystems of a few atoms or groups are treated.
Next, we adopted the TS method, as mentioned previously. We had found previously that, based upon the Hirshfeld or Becke partition scheme, the original TS formula had unsatisfactory performance either by overestimating or underestimating the S 1 polarizabilities. An apparent reduction of the deviations can be obtained by averaging the two sets of results. The reason behind this is unclear at the moment. From the original formula,  [52]. In the same way, a revised TS formula, improved performance of predicting S 0 polarizabilities. However, its predicting power was shown to be far from satisfactory for macromolecules. In this work, we further found that both the original and new TS formulas fail to give a satisfactory description of the S 1 polarizability. This indicates that the two formulas are oversimplified and may be systemdependent. Overall, the volume-based TS equations are inferior to the density-based ITA regression equations.

Information-Theoretic Approach Quantities
Shannon entropy S S [53] and Fisher information I F [54] are two cornerstone quantities in information theory. They are defined as Equations (1) and (2), respectively: (1) where ρ(r) is the electron density and ∇ρ(r) is the density gradient. The physical picture of S S and I F is clear; the former measures the spatial delocalization of the electron density and the latter gauges the sharpness or localization of the same. Apart from total density, components such as kinetic-energy density can be used to define an ITA quantity. With both electron density and the kinetic energy density, Ghosh, Berkowitz, and Parr developed a formula for entropy (S GBP ) [55]: where t(r; ρ) and t TF (r; ρ) represent the non-interacting and Thomas-Fermi (TF) kinetic energy density, respectively. Here k, c, and c K are three constants (k, the Boltzmann constant, c = (5/3) + ln(4πc K /3), and c K = (3/10)(3π 2 ) 2/3 ). Full integration of kinetic energy density t(r; ρ) leads to the total kinetic energy T S via while t(r; ρ) can be obtained from the canonical orbital densities, and t TF (r; ρ) is simply cast in terms of ρ(r), Of note, the kinetic-energy density can differ in its form, and can be used in different contexts [56][57][58][59][60][61][62][63]. However, S GBP satisfies the maximum-entropy requirement from a mathematical viewpoint [55].
Recently, in [70] one of the present authors proposed three functions G 1 , G 2 , and G 3 , at both atomic and molecular levels. These are defined as below: Equations (9)- (11) have been theoretically derived, numerically verified, and used in many applications, as in refs [41,42,70,71]. As one of our major achievements over the past decade, we aimed to combine density functional theory and information theory in a seamless manner, since the two theories can have electron density as an input. Our recent progress along this line can be found in two reviews [72,73]. As another prominent example, we have applied the ITA to assess homochirality [74,75].
Finally, the Hirshfeld's stockholder approach [45,[76][77][78][79] is often introduced to partition atoms in a molecule in the literature, as defined in Equation (12), Here, ρ A (r) is the atomic Hirshfeld density, ω A (r) is a sharing function, and ρ 0 B (r − R A ) is the atomic density of B centered at R A . The sum over all the free atom densities, typically spherically averaged S 0 atomic densities, is usually termed the promolecular density. The Stockholder approach is used in the context of ITA because it is also based on information-theoretic arguments. Alternative partitioning schemes include Becke's fuzzy atom approach [44] and Bader's zero-flux atoms-in-molecules (AIM) method [80].

Computational Details
All density functional theory (DFT) calculations were performed with the Gaussian 16 [81] package. Default options included ultrafine integration grids and tight selfconsistent field convergence, which were adopted to eliminate numerical noises. The ground-and excited-state structural relaxation was carried out at the CAM-B3LYP/6-311+G(d) [82,83] and TD-CAM-B3LYP/6-311+G(d) [82][83][84][85][86] level, respectively, for 36 molecular systems, as shown in Schemes 1-3. Optimized atomic Cartesian coordinates are supplied in the Supplementary Materials. Harmonic vibrational frequency calculations were executed at the same level, and no imaginary frequencies were observed by direct visual inspection. Isotropic polarizabilities (α iso = (α xx + α yy + α zz )/3) and isotropic quadrupole moments (Θ iso = (Θ xx + Θ yy + Θ zz )/3), molecular volumes (at 0.001 e/Bohr 3 contour surface of electronic density), and molecular wavefunctions, were obtained at the CAM-B3LYP/6-311+G(d) level. The Multiwfn 3.8 [87] program was utilized to calculate all ITA quantities at S 0 and S 1 using the molecular wavefunction file as an input. Of note, except for the S 0 state, the keyword "density=current" in Gaussian 16 was used to dump the density matrix in a wavefunction file. The stockholder Hirshfeld partition scheme of atoms in molecules was employed when atomic contributions were concerned. The reference-state density was the neutral atom calculated at the same level of theory as molecules.

Conclusions
In this work, we extended the information-theoretic approach (ITA) to treat excitedstate polarizabilities. With the ground-state, the first excited-state, or the transition density matrix as an input for ITA quantities, strong linear correlations were observed between molecular polarizabilities and ITA quantities. Based upon these regression equations for molecules in the training set, one can further predict the molecular polarizabilities for other systems in the test set. We verified that good accuracy and transferability of the linear regression models can be obtained. In addition, we found that the nature of both ground-state and excited-state polarizabilities is due to the electron delocalization of the electron density. Finally, we revealed that excitation energies can be predicted on top of a multivariant linear regression equation of ITA quantities. We mention in passing that the ITA quantities were applied to ground-state macromolecules in our previous work and excited-state systems in this work. More work along this line is underway for condensed-phase systems, and the results will be presented elsewhere.