Article

Predicting the Post-Hartree-Fock Electron Correlation Energy of Complex Systems with the Information-Theoretic Approach

1 Key Laboratory of High Performance Scientific Computation, School of Science, Xihua University, Chengdu 610039, China
2 School of Basic Medical Sciences, Yunnan University of Chinese Medicine, Kunming 650500, China
3 Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education, Institute of Biomedical Research, School of Chemical Science and Technology, Yunnan University, Kunming 650500, China
4 Yunnan Key Laboratory of Research Development for Natural Products, School of Pharmacy, Yunnan University, Kunming 650500, China
5 Department of Chemistry and Chemical Biology, McMaster University, Hamilton, ON L8S 4M1, Canada
6 Research Computing Center, University of North Carolina, Chapel Hill, NC 27599, USA
7 Department of Chemistry, University of North Carolina, Chapel Hill, NC 27599, USA
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Molecules 2025, 30(17), 3500; https://doi.org/10.3390/molecules30173500
Submission received: 27 July 2025 / Revised: 19 August 2025 / Accepted: 25 August 2025 / Published: 26 August 2025

Abstract

Employing simple, physics-inspired, density-based information-theoretic approach (ITA) quantities to predict electron correlation energies remains an open challenge. In this work, we expand the scope of the LR(ITA) protocol (LR stands for linear regression) to more complex systems, including (i) 24 octane isomers; (ii) polymeric structures, namely polyyne, polyene, all-trans-polymethineimine, and acene; and (iii) molecular clusters, such as metallic Be_n and Mg_n, covalent S_n, hydrogen-bonded protonated water clusters H+(H2O)n, and dispersion-bound carbon dioxide (CO2)n and benzene (C6H6)n clusters. With LR(ITA), one can predict post-Hartree-Fock (such as MP2 and coupled-cluster) electron correlation energies at the cost of a Hartree-Fock calculation, in some cases with chemical accuracy. For large molecular clusters, we employ the linear-scaling generalized energy-based fragmentation (GEBF) method to gauge the accuracy of LR(ITA). Taking benzene clusters as an illustration, the LR(ITA) method shows accuracy similar to that of GEBF. Overall, we have verified that ITA quantities can be used to predict the post-Hartree-Fock electron correlation energies of various complex systems.

1. Introduction

Electron correlation energy lies at the heart of quantum chemistry [1,2]. However, the computational cost of high-level post-Hartree–Fock methods skyrockets with system size. In this context, there is a pressing need for alternative lower-scaling cost-efficient methods across broad classes of systems. In recent years, the information-theoretic approach (ITA) [3,4,5,6] has emerged as a promising framework for understanding and predicting the electron correlation energy from the perspective of information theory. By treating the electron density as a continuous probability distribution, ITA introduces a set of descriptors, such as Shannon entropy [7] and Fisher information [8], that encode global and local features of the electron density distribution. These quantities are inherently basis-agnostic and physically interpretable, providing a new lens through which quantum chemical problems can be approached.
Continuing our previous work, in which simple physics-inspired density-based ITA quantities were used to describe response properties [9,10,11,12,13] (such as molecular polarizability and NMR chemical shielding constants) and the energetics of elongated hydrogen chains [14], we aim here to predict the post-Hartree–Fock (see Figure 1) electron correlation energies of various molecular clusters and linear or quasi-linear organic polymers with increasing cluster size and polymer length. The shared set of physically motivated ITA quantities includes the Shannon entropy (S_S) [7], Fisher information (I_F) [8], Ghosh–Berkowitz–Parr entropy (S_GBP) [15], Onicescu information energy (E_2 and E_3) [16], relative Rényi entropy (R_2^r and R_3^r) [16], relative Shannon entropy (I_G) [17], and relative Fisher information (G_1, G_2, and G_3) [18]. The definitions of these 11 quantities can be found in Section 4. The Shannon entropy characterizes the global delocalization of the electron density, reflecting how uniformly electrons are distributed throughout space. The Fisher information quantifies local inhomogeneity, serving as a measure of the sharpness or localization of density features such as bonding regions or lone pairs. The Kullback–Leibler divergence (relative entropy) measures the distinguishability between two densities, quantifying the difference in electronic structure between two systems or states. The systems studied include (i) 24 octane isomers (see Figure 2) [11]; (ii) polymeric structures (see Figure 3), namely polyyne, polyene, all-trans-polymethineimine, and acene [11]; and (iii) molecular clusters (see Figure 4), such as metallic Be_n and Mg_n [19,20], covalent S_n [21,22], hydrogen-bonded protonated water clusters H+(H2O)n [23], dispersion-bound carbon dioxide clusters (CO2)n [24], and benzene clusters (C6H6)n [25].
We construct strong linear relationships between the ITA quantities computed at the low-cost Hartree–Fock level [26] and the electron correlation energies from post-Hartree–Fock methods, such as MP2 or RI-MP2 [27,28,29], CCSD [30,31], and CCSD(T) [32]. Note that MP2 is used here mainly as a proof of concept, and that Hartree–Fock can simply be replaced with any approximate functional of density functional theory (DFT) [33,34].
By examining trends across increasing cluster size and polymer length, we assess the transferability, scalability, and physical insights provided by ITA features in capturing electron correlation. Our findings highlight not only the feasibility of ITA-driven correlation energy prediction but also reveal key descriptors that most strongly govern correlation effects in extended systems. These results suggest that ITA may serve as a promising direction for developing efficient, interpretable, and physically grounded models in quantum machine learning and electronic structure theory.

2. Results

To validate the accuracy of the LR(ITA) method, we chose the 24 octane isomers shown in Figure 2. MP2, CCSD, and CCSD(T) are used to generate the electron correlation energies, and the ITA quantities are obtained at the Hartree-Fock level with the same 6-311++G(d,p) basis set. More details can be found in the Supplementary Materials (Table S1). Table 1 shows the linear relationships and RMSDs between the LR(ITA)-predicted and calculated electron correlation energies. There are no substantial differences among the R² (and RMSD) values for MP2, CCSD, and CCSD(T). I_F performs slightly better than S_GBP and substantially better than S_S, which reflects the highly localized nature of the density in alkanes. For S_S, I_F, and S_GBP, the RMSDs are <2.0 mH, indicating that LR(ITA) is accurate enough to predict the electron correlation energies of these isomers. Because CCSD and CCSD(T) are too computationally intensive to remain tractable, only MP2 is used hereafter as a proof of concept.
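The regression step behind each LR(ITA) entry can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the descriptor values and correlation energies are synthetic placeholders generated on the spot (they are not data from Table 1 or Table S1), and the function name `fit_lr_ita` is hypothetical. It fits a straight line, then reports R² and the RMSD between fitted and reference energies.

```python
import numpy as np

def fit_lr_ita(ita_values, e_corr):
    """Least-squares fit E_corr ~ slope * ITA + intercept, with fit statistics."""
    x = np.asarray(ita_values, dtype=float)
    y = np.asarray(e_corr, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    predicted = slope * x + intercept
    residuals = y - predicted
    ss_res = float(np.sum(residuals**2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r2": 1.0 - ss_res / ss_tot,
        "rmsd": float(np.sqrt(np.mean(residuals**2))),
    }

# Synthetic placeholder data (assumption): 24 "molecules" whose correlation
# energy depends linearly on one descriptor, plus ~1 mH of random scatter.
rng = np.random.default_rng(0)
ita = np.linspace(100.0, 200.0, 24)
e_corr = -0.004 * ita - 0.5 + rng.normal(0.0, 1e-3, size=ita.size)  # hartree

fit = fit_lr_ita(ita, e_corr)
print(f"R^2 = {fit['r2']:.4f}, RMSD = {1000 * fit['rmsd']:.2f} mH")
```

With real data, `ita` would hold one Hartree-Fock ITA quantity per system and `e_corr` the matching post-Hartree-Fock correlation energies.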
Tables 2–5 collect the linear correlation coefficients (R² = 1.000) and RMSDs (root-mean-square deviations) between the correlation energies calculated at the MP2/6-311++G(d,p) level and those predicted from the ITA quantities at the HF/6-311++G(d,p) level for polyyne, polyene, all-trans-polymethineimine, and acene, respectively. More details can be found in Tables S2–S5. Some ITA quantities are not tabulated in the text because of their inferior accuracy, for example, G_2 in Table 2, G_2 and I_G in Table 3, and G_1, G_2, and I_G in Table 4. R² is close to 1 for most ITA quantities. More strikingly, with the linear regression (LR) equations of the ITA quantities, the predicted electron correlation energies deviate from the calculated ones by only ~1.5 mH for polyyne, ~3.0 mH for polyene, and <4.0 mH for all-trans-polymethineimine. For acene, the RMSDs remain reasonably satisfactory at ~10–11 mH. These results collectively reveal that ITA quantities are good descriptors of the electron correlation energy for linear or quasi-linear polymeric systems with delocalized electronic structures. For the more challenging acenes, a single ITA quantity does not capture enough information about their more delocalized electronic structures.
Tables 6–8 give the linear correlation coefficients (R²) and RMSDs between the correlation energies calculated at the MP2/6-311++G(d,p) level and those predicted from the ITA quantities at the HF/6-311++G(d,p) level for neutral metallic Be_n and Mg_n and covalent S_n systems, respectively. More details can be found in Tables S6–S11. Strong correlations exist (R² > 0.990) between the ITA quantities and the MP2 correlation energies, indicating that they are extensive in nature. However, the predicted electron correlation energies deviate substantially from the calculated ones, by ~28–37 mH for Be_n, ~17–33 mH for Mg_n, and ~26–42 mH for S_n. These results collectively show that for the three-dimensional metallic clusters Be_n and Mg_n and for covalent S_n, a single ITA quantity fails to capture quantitatively enough information about the electron correlation energies of complex systems.
Table 9 gives the linear correlation coefficients (R²) and RMSDs between the correlation energies calculated at the MP2/6-311++G(d,p) level and those predicted from the ITA quantities at the HF/6-311++G(d,p) level for hydrogen-bonded protonated water clusters. The corresponding regression slopes and intercepts are provided in Table S12. Of note, the individual ITA values and MP2 correlation energies are not listed because the dataset contains a total of 1480 structures. Strong correlations exist (R² = 1.000) between 8 of the 11 ITA quantities and the MP2 correlation energies, indicating that they are extensive in nature. The RMSDs range from 2.1 (E_2 and E_3) to 9.3 (G_3) mH, indicating that ITA quantities are good descriptors of the post-Hartree-Fock electron correlation energies of hydrogen-bonded systems.
Finally, we turn to two dispersion-bound clusters, (CO2)n and (C6H6)n. Table 10 gives the linear correlations (R² = 1.000) and RMSDs between the RI-MP2 correlation energies and the Hartree–Fock ITA quantities, both obtained with the 6-311++G(d,p) basis set, for (CO2)n (n = 4−40). More details can be found in Table S13. The RMSDs vary from 6.3 (E_2 and E_3) through 10.8 (G_3) to 14.6 (S_S) mH. For (C6H6)n (n = 4−14) clusters, we have calculated the linear correlations (R² = 1.000) and RMSDs between the MP2/6-311++G(d,p) electron correlation energies and the HF/6-311++G(d,p) ITA quantities, as collected in Table 11. More details can be found in Tables S14 and S15. The RMSDs range from 2.8 (G_3) through 6.9 (E_3) to 10.7 (S_S) mH. The RMSD results collectively suggest that 8 of the 11 ITA quantities are reasonably good descriptors of the post-Hartree-Fock electron correlation energies of dispersion-bound clusters.
To further illustrate the extrapolative capability of the LR(ITA) method, we consider larger (C6H6)n (n = 15−30) clusters. As conventional MP2/6-311++G(d,p) calculations are too computationally intensive for these systems, we employ GEBF [35,36,37,38] to obtain the MP2-level electron correlation energies as a reference. Because the linear regression based on the ITA quantity G_3 has the smallest RMSD value, we choose LR(G_3) to predict the electron correlation energies of these benzene clusters. More details can be found in Tables S15 and S16. Figure 5 compares the LR(G_3)-predicted and GEBF-calculated MP2 electron correlation energies for benzene clusters (C6H6)n (n = 15−30). The RMSD between the LR(G_3)-predicted and GEBF-calculated data is 8.6 mH, indicating that the LR(ITA) method performs comparably to the linear-scaling GEBF method. Of note, the R² and RMSD values in Figure 5 characterize the prediction quality for an extrapolated set, which differs from the regression statistics in the previous tables that summarize fits within the training set. In addition, we have found that when subsystem wavefunctions (and thus subsystem electron densities and ITA quantities) are used to obtain the subsystem electron correlation energies, the final total electron correlation energies of GEBF-LR(G_3) deviate from those of GEBF by 40.0 mH in terms of RMSD, as shown in Figure 5 and Table S17. This indicates that combining the ITA quantities with a fragment-based method (GEBF in our case) is not a good choice for predicting the electron correlation energy. One possible reason is that the errors accumulate rather than cancel, whereas the great success of GEBF relies on error cancellation.
To further verify this point, we plot in Figure S1 the deviations of LR(G_3) and GEBF-LR(G_3) from the GEBF reference as a function of cluster size. The deviation of LR(G_3) only fluctuates to some degree, whereas that of GEBF-LR(G_3) grows with the cluster size.
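The distinction between in-sample regression statistics and the quality of an extrapolated prediction can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the descriptor and energy values are synthetic and extensive by construction (they are not the benzene-cluster data of Tables S15 and S16), and the split at n = 14 simply mirrors the training and extrapolation ranges described above.

```python
import numpy as np

def rmsd(reference, predicted):
    """Root-mean-square deviation between two equally sized arrays."""
    diff = np.asarray(reference, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))

# Synthetic, size-extensive placeholder data (assumption): both the descriptor
# and the correlation energy grow linearly with the cluster size n.
rng = np.random.default_rng(1)
n = np.arange(4, 31)
descriptor = 2.0 * n + rng.normal(0.0, 1e-3, size=n.size)
energy = -1.5 * n + rng.normal(0.0, 1e-3, size=n.size)  # hartree

train = n <= 14   # clusters used to fit the regression line
test = ~train     # larger clusters, used only for extrapolation

slope, intercept = np.polyfit(descriptor[train], energy[train], deg=1)
predicted = slope * descriptor + intercept

rmsd_train = rmsd(energy[train], predicted[train])
rmsd_test = rmsd(energy[test], predicted[test])
print(f"in-sample RMSD:     {1000 * rmsd_train:.2f} mH")
print(f"extrapolation RMSD: {1000 * rmsd_test:.2f} mH")
```

Only the second number measures predictive quality outside the fitted range, which is the kind of statistic reported in Figure 5.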

3. Discussion

Predicting the post-Hartree-Fock electron correlation energy accurately and efficiently at relatively low cost is an active area of quantum chemistry. Starting from Hartree-Fock molecular orbitals, there are two typical approaches. One is to calculate the local electron correlation energy, whose early development is due to Pulay and Sæbø [39,40,41]; the other is to predict the correlation energy with the aid of deep learning (DL) [42,43,44,45,46,47,48,49,50,51]. Our proposed LR(ITA) method can be regarded as a special flavor of DL. An inherent drawback of local correlation methods is that they require orbital localization [52,53], and the DL-driven methods encounter the same problem. Our LR(ITA) method requires only the molecular orbitals (and thus the electron density), without any further manipulation. Very recently, we have showcased the good accuracy of LR(ITA) and its variant DL(ITA). With LR(ITA), one can even predict the FCI-level electron correlation energy of the elongated hydrogen chain [14], obtained with the DMRG (density matrix renormalization group) [54,55] algorithm as a solver, with an RMSD of only a few mH. Moreover, with DL(ITA), where a total of 11 ITA quantities are used as input [13], we have predicted the DLPNO-MP2 (Domain-Based Local Pair Natural Orbital MP2) [56] electron correlation energy for a database of >90 K real organic molecules, with an RMSD of about 6.8 mH. In addition, LR(ITA) is not limited to any particular post-Hartree-Fock electronic structure method; MP2 is used here as a proof of concept. Thus, LR(ITA) is architecturally and conceptually simple and is shown numerically to be a good protocol for predicting the electron correlation energies of various systems. Of note, the predictive power of LR(ITA) is best for chemically similar systems, whereas extrapolation across chemically distinct sets should be performed with caution.
In addition, while the LR(ITA) model generally maintains a strong linear correlation for geometries close to equilibrium, its predictive accuracy can decrease for significantly distorted geometries. This is because the ITA descriptors are computed from the Hartree-Fock electron density, which changes with geometry, whereas the linear regression coefficients are fitted to equilibrium structures.
Up to now, we have mainly focused on MP2. It would be valuable to carry out more extensive benchmarking against (i) CCSD(T) for larger or more complex systems and (ii) more challenging cases where both dynamic and static correlation effects may be significant, such as polyyne, polyene, and acene with large n.
Admittedly, using LR(ITA) to accurately and efficiently predict the electron correlation energy is still in its infancy. On the one hand, for three-dimensional systems, the RMSD values between the predicted and computed MP2 correlation energies are unacceptably large, even though there is still a strong linearity between the ITA quantities and the MP2 correlation energy. Would it be possible that more sophisticated, higher-order ITA quantities could capture additional electronic structure information, analogous to the “rungs” of Jacob’s ladder in DFT? If so, developing and testing a hierarchy of ITA quantities could potentially improve the predictive power of LR(ITA) for complex three-dimensional systems.
On the other hand, we will implement a new concept of “ITL-DL Loop”. The physics behind it is simple: low-tier (such as semiempirical PM7 [57] or even promolecular [58,59]) electron densities are used as input for ITA quantities, and DL is introduced to obtain high-tier (such as DFT) electron densities. Based on the newly generated electron densities, ITA quantities are obtained and used as input for another DL model, either classical or quantum, to predict the correlation energy of electrons and the physicochemical properties of molecules. Moreover, extending the ITA-based method to quantities that reflect the response of the electronic energy to nuclear displacements is another potential direction. Work along these lines is in progress, and the results will be presented elsewhere.

4. Materials and Methods

4.1. Information-Theoretic Approach Quantities

Though density functional theory (DFT) [33,34] and information theory (IT) [3,4] are two totally different areas, they have been combined, with the electron density distribution as a seamless link, and this community has seen many successes for more than 40 years [15,60,61,62,63,64,65,66,67,68,69,70,71]. In this work, we will outline some well-established ITA quantities. First and foremost, Shannon entropy S_S [7] and Fisher information I_F [8] are two foundational quantities in information theory. They are defined as Equations (1) and (2), respectively.
$$S_S = -\int \rho(\mathbf{r}) \ln \rho(\mathbf{r}) \, d\mathbf{r} \tag{1}$$
$$I_F = \int \frac{|\nabla \rho(\mathbf{r})|^2}{\rho(\mathbf{r})} \, d\mathbf{r} \tag{2}$$
where ρ(r) is the electron density and ∇ρ(r) is the density gradient. Physically, S_S characterizes the spatial delocalization of the electron density, while I_F reflects its sharpness or localization. Of note, S_S and I_F are not mutually exclusive but are always intercorrelated [72,73].
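As a numerical illustration of Equations (1) and (2), the sketch below evaluates both quantities for a hydrogen-like 1s model density, ρ(r) = exp(−2r)/π in atomic units, for which the closed-form results are S_S = 3 + ln π and I_F = 4. The model density, the radial grid, and the helper name `radial_integrate` are assumptions chosen for illustration; the production values in this work come from Multiwfn on Hartree-Fock densities, not from a script like this.

```python
import numpy as np

def radial_integrate(values, r):
    """Integrate a spherically symmetric function over all space (trapezoidal rule)."""
    integrand = 4.0 * np.pi * r**2 * values
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)))

# Radial grid in bohr; range and spacing are illustrative choices.
r = np.linspace(1e-6, 30.0, 200001)

# Hydrogen-like 1s model density, normalized to one electron.
rho = np.exp(-2.0 * r) / np.pi

# For a spherical density, |grad rho| equals |d rho / d r|.
drho_dr = np.gradient(rho, r)

shannon_entropy = radial_integrate(-rho * np.log(rho), r)    # Equation (1)
fisher_information = radial_integrate(drho_dr**2 / rho, r)   # Equation (2)

print(f"S_S = {shannon_entropy:.4f} (closed form: {3.0 + np.log(np.pi):.4f})")
print(f"I_F = {fisher_information:.4f} (closed form: 4.0000)")
```

A more compact density (larger exponent) would lower S_S and raise I_F, which is the delocalization versus localization picture described above.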
Beyond the total electron density, additional quantities such as the kinetic-energy density can be incorporated into the formulation of information-theoretic approaches (ITA). Utilizing both the electron density and the kinetic-energy density, Ghosh, Berkowitz, and Parr introduced an entropy functional known as S_GBP [15],
$$S_{GBP} = \int \frac{3}{2} k \rho(\mathbf{r}) \left[ c + \ln \frac{t(\mathbf{r}; \rho)}{t_{TF}(\mathbf{r}; \rho)} \right] d\mathbf{r} \tag{3}$$
where t(r; ρ) and t_TF(r; ρ) represent the non-interacting and Thomas–Fermi (TF) kinetic energy densities, respectively. The constants are defined as follows: k is the Boltzmann constant, c = (5/3) + ln(4πc_K/3), and c_K = (3/10)(3π²)^(2/3). The non-interacting kinetic energy density t(r; ρ) integrates to give the total kinetic energy T_S,
$$\int t(\mathbf{r}; \rho) \, d\mathbf{r} = T_S \tag{4}$$
It can be computed from the canonical orbital densities as
$$t(\mathbf{r}; \rho) = \sum_i \frac{1}{8} \frac{\nabla \rho_i \cdot \nabla \rho_i}{\rho_i} - \frac{1}{8} \nabla^2 \rho \tag{5}$$
while the Thomas–Fermi expression is given by
$$t_{TF}(\mathbf{r}; \rho) = c_K \rho^{5/3}(\mathbf{r}) \tag{6}$$
It is important to note that the kinetic-energy density may take different forms depending on context [74,75,76,77,78,79,80,81]. Nonetheless, S_GBP satisfies the maximum-entropy principle from a rigorous mathematical viewpoint [15].
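The ingredients of the Ghosh–Berkowitz–Parr entropy can be checked on the same one-electron 1s model density, ρ(r) = exp(−2r)/π, where the derivatives are known in closed form. This is a sketch under explicit assumptions: a single occupied orbital (so ρ_i = ρ), analytic derivatives specific to this model density, and k set to 1 for convenience. It verifies that the non-interacting kinetic energy density integrates to T_S = 0.5 hartree and then evaluates S_GBP for the model.

```python
import numpy as np

def radial_integrate(values, r):
    """Integrate a spherically symmetric function over all space (trapezoidal rule)."""
    integrand = 4.0 * np.pi * r**2 * values
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)))

r = np.linspace(1e-6, 30.0, 200001)    # radial grid in bohr (illustrative)
rho = np.exp(-2.0 * r) / np.pi         # 1s model density, one electron

# Closed-form derivatives of this particular model density.
grad_rho = -2.0 * rho                  # d rho / d r
lap_rho = (4.0 - 4.0 / r) * rho        # rho'' + (2 / r) rho'

c_k = 0.3 * (3.0 * np.pi**2) ** (2.0 / 3.0)
c = 5.0 / 3.0 + np.log(4.0 * np.pi * c_k / 3.0)
k_boltzmann = 1.0                      # assumption: k = 1 (entropy in units of k)

# Non-interacting kinetic energy density for a single orbital, and the
# Thomas-Fermi kinetic energy density.
t_s = 0.125 * grad_rho**2 / rho - 0.125 * lap_rho
t_tf = c_k * rho ** (5.0 / 3.0)

kinetic_energy = radial_integrate(t_s, r)        # should be close to 0.5 hartree
tf_kinetic_energy = radial_integrate(t_tf, r)
s_gbp = radial_integrate(1.5 * k_boltzmann * rho * (c + np.log(t_s / t_tf)), r)

print(f"T_S   = {kinetic_energy:.4f} hartree")
print(f"T_TF  = {tf_kinetic_energy:.4f} hartree")
print(f"S_GBP = {s_gbp:.4f} (k = 1)")
```

For a many-electron system the first term of `t_s` would be summed over the occupied orbital densities, and the derivatives would come from the wavefunction file rather than from closed-form expressions.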
Expanding further, several ITA descriptors have been proposed to characterize chemical reactivity. Within the framework of conceptual density functional theory (CDFT) [82,83,84,85], other well-established ITA quantities have been proposed, including the Onicescu information energy of order n [16],
$$E_n = \frac{1}{n-1} \int \rho^n(\mathbf{r}) \, d\mathbf{r} \tag{7}$$
relative Rényi entropy of order n [16],
$$R_n^r = \frac{1}{1-n} \log_{10} \left[ \int \frac{\rho^n(\mathbf{r})}{\rho_0^{n-1}(\mathbf{r})} \, d\mathbf{r} \right] \tag{8}$$
and relative Shannon entropy, or information gain (I_G) [17], also called Kullback−Leibler divergence,
$$I_G = \int \rho(\mathbf{r}) \ln \frac{\rho(\mathbf{r})}{\rho_0(\mathbf{r})} \, d\mathbf{r} \tag{9}$$
E_2 and E_3 (from Equation (7)) were introduced to provide a finer measure of the dispersion of the distribution than S_S. In Equations (8) and (9), ρ_0(r) is a reference-state density, and both ρ_0(r) and ρ(r) are normalized to the total number of electrons of a molecule.
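Equations (7) and (9) can be illustrated with two exponential model densities that share the same normalization, as the text requires. The sketch below is an assumption-laden toy: both densities are one-electron functions ρ(r) = ζ³/π · exp(−2ζr), with ζ = 1.0 playing the role of the reference state and ζ = 1.2 the role of the contracted "molecular" density. The function names are hypothetical. For this model, E_2 of the reference equals 1/(8π) and I_G equals 3 ln ζ − 3(ζ − 1)/ζ.

```python
import numpy as np

def radial_integrate(values, r):
    """Integrate a spherically symmetric function over all space (trapezoidal rule)."""
    integrand = 4.0 * np.pi * r**2 * values
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)))

def slater_density(r, zeta):
    """One-electron exponential model density, normalized to 1 for any zeta."""
    return zeta**3 / np.pi * np.exp(-2.0 * zeta * r)

def onicescu_energy(rho, r, order):
    """Onicescu information energy of the given order, Equation (7)."""
    return radial_integrate(rho**order, r) / (order - 1)

def information_gain(rho, rho_ref, r):
    """Relative Shannon entropy (Kullback-Leibler divergence), Equation (9)."""
    return radial_integrate(rho * np.log(rho / rho_ref), r)

r = np.linspace(1e-6, 30.0, 200001)   # radial grid in bohr (illustrative)
rho_ref = slater_density(r, 1.0)      # reference-state density (assumption)
rho = slater_density(r, 1.2)          # contracted density (assumption)

e2 = onicescu_energy(rho_ref, r, 2)
e3 = onicescu_energy(rho_ref, r, 3)
ig = information_gain(rho, rho_ref, r)

print(f"E_2 = {e2:.6f}, E_3 = {e3:.6f}, I_G = {ig:.6f}")
```

Because both densities integrate to the same electron number, I_G is non-negative and vanishes only when the two densities coincide.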
More recently [18], one of the present authors introduced three ITA descriptors, G_1, G_2, and G_3, applicable at both atomic and molecular levels, as follows:
$$G_1 = \sum_A \int \nabla^2 \rho_A(\mathbf{r}) \, \frac{\rho_A(\mathbf{r})}{\rho_A^0(\mathbf{r})} \, d\mathbf{r} \tag{10}$$
$$G_2 = \sum_A \int \rho_A(\mathbf{r}) \left[ \frac{\nabla^2 \rho_A(\mathbf{r})}{\rho_A(\mathbf{r})} - \frac{\nabla^2 \rho_A^0(\mathbf{r})}{\rho_A^0(\mathbf{r})} \right] d\mathbf{r} \tag{11}$$
$$G_3 = \sum_A \int \rho_A(\mathbf{r}) \left[ \nabla \ln \frac{\rho_A(\mathbf{r})}{\rho_A^0(\mathbf{r})} \right]^2 d\mathbf{r} \tag{12}$$
Finally, to partition the electron density into atomic contributions within a molecule, the Hirshfeld stockholder approach [86,87] is frequently adopted. It is defined as follows:
$$\rho_A(\mathbf{r}) = \omega_A(\mathbf{r}) \rho(\mathbf{r}) = \frac{\rho_A^0(\mathbf{r} - \mathbf{R}_A)}{\sum_B \rho_B^0(\mathbf{r} - \mathbf{R}_B)} \, \rho(\mathbf{r}) \tag{13}$$
Here, ρ_A(r) is the atomic (Hirshfeld) density, ω_A(r) is the weight or “sharing” function, and ρ_B^0(r − R_B) represents the reference (typically spherically averaged) atomic density centered at R_B. The denominator is known as the promolecular density. The stockholder method naturally aligns with ITA due to its information-theoretic foundation. Alternative partitioning schemes include Becke’s fuzzy atom method [88] and Bader’s atoms-in-molecules (AIM) approach based on zero-flux surfaces [58]. A summary of our recent work in this direction is available in Ref. [89].
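The stockholder partitioning can be sketched on a small Cartesian grid. All inputs below are assumptions made for illustration: the two "free atoms" are spherical exponential model densities, the "molecular" density is a hypothetical mixture in which some charge has moved from atom A to atom B, and a uniform cubic grid replaces the atom-centered quadrature a real program would use. The sketch shows the two defining properties of the scheme: the weights sum to one at every point, and the atomic densities add up to the molecular density.

```python
import numpy as np

def atomic_reference_density(points, center, zeta):
    """Spherical exponential model density for a free atom (illustrative only)."""
    distance = np.linalg.norm(points - center, axis=-1)
    return zeta**3 / np.pi * np.exp(-2.0 * zeta * distance)

# Uniform cubic grid in bohr (a crude quadrature, chosen for simplicity).
axis = np.linspace(-6.0, 6.0, 61)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
points = np.stack([x, y, z], axis=-1)
voxel_volume = (axis[1] - axis[0]) ** 3

centers = {"A": np.array([0.0, 0.0, -0.7]), "B": np.array([0.0, 0.0, 0.7])}
reference = {name: atomic_reference_density(points, c, 1.0) for name, c in centers.items()}
promolecule = sum(reference.values())   # denominator of the stockholder weight

# Hypothetical molecular density: charge shifted from atom A to atom B.
rho = 0.9 * reference["A"] + 1.1 * reference["B"]

weights = {name: reference[name] / promolecule for name in centers}
atomic_density = {name: weights[name] * rho for name in centers}
populations = {name: float(atomic_density[name].sum() * voxel_volume) for name in centers}

print({name: round(value, 3) for name, value in populations.items()})
```

The atomic densities obtained this way are the ρ_A(r) that enter the atom-resolved descriptors G_1, G_2, and G_3, with the free-atom densities serving as ρ_A^0(r).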

4.2. An Outline of GEBF

In the generalized energy-based fragmentation (GEBF) method [35,36,37,38], the total energy of a large system, such as a macromolecule or molecular aggregate, is expressed as a linear combination of the energies of smaller embedded subsystems, as given in Equation (14).
$$E_{tot} = \sum_m C_m \tilde{E}_m - \left( \sum_m C_m - 1 \right) \sum_A \sum_{B>A} \frac{Q_A Q_B}{R_{AB}} \tag{14}$$
Here, Ẽ_m and C_m stand for the total energy and the coefficient of the mth subsystem, respectively, Q_A is the atomic charge on atom A, and R_AB is the interatomic distance between atoms A and B.
The general procedure for performing GEBF calculations involves several steps. Taking a benzene (C6H6) cluster as an example (Figure 4f), each benzene molecule is treated as a fragment. Primitive subsystems are then constructed around each fragment, defined by a distance threshold (ζ). These primitive subsystems are assigned coefficients C_m = +1. Because the primitive subsystems overlap spatially, smaller derivative subsystems are generated. The coefficients of these derivative subsystems are determined automatically using the inclusion–exclusion principle, ensuring proper energy accounting. Another parameter, γ_max, the maximum number of fragments allowed in a subsystem, is introduced to control subsystem size.
All quantum chemical calculations for the subsystems are carried out using the GEBF method as implemented in the LSQC 3.0 (low scaling quantum chemistry) package [90]. In this work, the two key GEBF parameters (ζ, γ_max) are set to (4.0, 6).
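The inclusion–exclusion bookkeeping and the energy assembly of Equation (14) can be sketched as follows. This is a toy illustration and not the algorithm implemented in LSQC: the primitive subsystems are given by hand for a hypothetical five-fragment chain, every intersection is enumerated by brute force (which scales exponentially and is only viable for tiny examples), the γ_max cap is ignored, and the function names are invented for this sketch.

```python
from collections import defaultdict
from itertools import combinations

def inclusion_exclusion_coefficients(primitives):
    """Coefficients of primitive (+1) and derivative subsystems.

    Each intersection of k primitive subsystems contributes (-1)**(k + 1);
    contributions landing on the same fragment set are summed.
    """
    unique = list({frozenset(p) for p in primitives})
    coefficients = defaultdict(int)
    for k in range(1, len(unique) + 1):
        sign = 1 if k % 2 == 1 else -1
        for combo in combinations(unique, k):
            common = frozenset.intersection(*combo)
            if common:
                coefficients[common] += sign
    return {s: c for s, c in coefficients.items() if c != 0}

def gebf_total_energy(coefficients, subsystem_energy, charge_interaction=0.0):
    """Assemble Equation (14); charge_interaction is the sum of Q_A Q_B / R_AB."""
    weighted = sum(c * subsystem_energy(s) for s, c in coefficients.items())
    return weighted - (sum(coefficients.values()) - 1) * charge_interaction

# Hypothetical chain of five fragments; each primitive subsystem holds a
# fragment and its nearest neighbours.
primitives = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]
coefficients = inclusion_exclusion_coefficients(primitives)

# Toy energy model (assumption): non-interacting fragments, so a subsystem's
# energy is the sum of its fragment energies and the charge term is zero.
fragment_energy = {0: -1.0, 1: -1.1, 2: -1.2, 3: -1.3, 4: -1.4}
total = gebf_total_energy(coefficients, lambda s: sum(fragment_energy[f] for f in s))

for subsystem, coefficient in sorted(coefficients.items(), key=lambda item: sorted(item[0])):
    print(sorted(subsystem), coefficient)
print("E_tot =", round(total, 6))
```

In this toy model each fragment is counted exactly once after the signed sum, so the assembled energy reproduces the sum of the fragment energies; that is the "proper energy accounting" the coefficients are meant to guarantee.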

4.3. Computational Details

The 24 octane isomers; the metallic clusters Be_n (n = 3 to 25) and Mg_n (n = 3 to 20 and 28); the (CO2)n clusters (n = 4 to 40); the organic clusters (C6H6)n (n = 4 to 30); the covalent S_n clusters (n = 2 to 18); and the polymeric structures (see Figure 3) of polyyne, polyene, all-trans-polymethineimine, and acene were taken from our previous publication. The protonated water clusters [(H2O)n(H3O)]+ were taken from Ref. [23]. For cluster sizes n = 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, and 20, there are 74, 79, 113, 119, 108, 140, 121, 138, 114, 125, and 143 structures, respectively.
Molecular wavefunctions for all the systems were obtained at the HF/6-311++G(d,p) level. The Multiwfn 3.8 [91,92] program was used to calculate all ITA quantities, with the Gaussian 16 checkpoint or wavefunction file as the input. The stockholder Hirshfeld partition scheme of atoms in molecules was employed when atomic contributions were concerned. The reference-state densities were those of the neutral atoms, calculated at the restricted open-shell Hartree-Fock (ROHF)/6-311++G(d,p) level. CCSD and CCSD(T) calculations for octane isomers were performed with the Gaussian 16 [93] package. For RI-MP2 calculations, Hartree-Fock (HF) orbitals from the Gaussian 16 calculations were transformed into the ORCA [94] format by using the MOKIT [95] program (version 1.2.7rc9). The frozen core formalism [96,97] was used throughout this work, unless otherwise stated.

5. Conclusions

To summarize, in this work, we have applied information-theoretic approach (ITA) quantities to predict the post-Hartree-Fock (such as MP2 or RI-MP2) correlation energies of various molecular clusters and polymeric systems with both localized and delocalized electronic structures. We have found that for linear or quasi-linear polymeric systems, such as polyyne and polyene, the results predicted from the Hartree-Fock ITA quantities are in excellent agreement with the calculated MP2 correlation energies. For other systems, such as hydrogen-bonded protonated water clusters and dispersion-bound carbon dioxide and benzene clusters, satisfactory results can be obtained with the LR(ITA) protocol. For metallic Be_n and Mg_n, as well as covalent S_n, strong linear correlations are still obtained, although the RMSDs are considerably larger. In addition, for relatively large benzene clusters, we compare the LR(ITA) results with those from the GEBF method, and similar accuracy is observed. Our results collectively show that LR(ITA) is a promising, cost-efficient tool for predicting the electron correlation energy.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/molecules30173500/s1, Tables S1–S17: Hartree-Fock ITA quantities, the electron correlation energies and the total energies, and linear regression coefficients and correlation coefficients; Figure S1: The correlation energy differences between LR(G_3)- and GEBF-LR(G_3)-predicted values as referenced to those of GEBF versus the cluster size.

Author Contributions

Conceptualization, J.C., S.L., P.W.A., and D.Z.; data curation, P.W.,D.H., L.L., Y.Z., and D.Z.; formal analysis, P.W., D.H., Y.Z., and D.Z.; funding acquisition, P.W.A. and D.Z.; project administration, S.L., P.W.A., and D.Z.; supervision, S.L., P.W.A., and D.Z.; writing—original draft, J.C., S.L., P.W.A., and D.Z.; writing—reviewing and editing, J.C., S.L., P.W.A., and D.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (grant No. 22203071 and 22361051), the High-Level Talent Special Support Plan, the China Scholarship Council, NSERC, Canada Research Chairs, and the Digital Research Alliance of Canada.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

Part of the computations were performed on the high-performance computers of the Advanced Computing Center of Yunnan University.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Szabo, A.; Ostlund, N.S. Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory; Dover Publications: Garden City, NY, USA, 1996.
  2. Tew, D.P.; Klopper, W.; Helgaker, T. Electron correlation: The many-body problem at the heart of chemistry. J. Comput. Chem. 2007, 28, 1307–1320.
  3. Nalewajski, R.F.; Parr, R.G. Information theory, atoms in molecules, and molecular similarity. Proc. Natl. Acad. Sci. USA 2000, 97, 8879–8882.
  4. Ayers, P.W. Information Theory, the Shape Function, and the Hirshfeld Atom. Theor. Chem. Acc. 2006, 115, 370–378.
  5. Zhao, Y.; Zhao, D.; Rong, C.; Liu, S.; Ayers, P.W. Information Theory Meets Quantum Chemistry: A Review and Perspective. Entropy 2025, 27, 644.
  6. Zhao, Y.; Zhao, D.; Rong, C.; Liu, S.; Ayers, P.W. Extending the information-theoretic approach from the (one) electron density to the pair density. J. Chem. Phys. 2025, 162, 244108.
  7. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
  8. Fisher, R.A. Theory of statistical estimation. Math. Proc. Camb. Philos. Soc. 1925, 22, 700–725.
  9. Zhao, D.; Liu, S.; Chen, D. A Density Functional Theory and Information-Theoretic Approach Study of Interaction Energy and Polarizability for Base Pairs and Peptides. Pharmaceuticals 2022, 15, 938.
  10. Zhao, D.; He, X.; Ayers, P.W.; Liu, S. Excited-State Polarizabilities: A Combined Density Functional Theory and Information-Theoretic Approach Study. Molecules 2023, 28, 2576.
  11. Zhao, D.; Zhao, Y.; He, X.; Ayers, P.W.; Liu, S. Efficient and accurate density-based prediction of macromolecular polarizabilities. Phys. Chem. Chem. Phys. 2023, 25, 2131–2141.
  12. Zhao, D.; Zhao, Y.; Xu, E.; Liu, W.; Ayers, P.W.; Liu, S.; Chen, D. Fragment-Based Deep Learning for Simultaneous Prediction of Polarizabilities and NMR Shieldings of Macromolecules and Their Aggregates. J. Chem. Theory Comput. 2024, 20, 2655–2665.
  13. Yuan, Y.; Zhao, Y.; Lu, L.; Wang, J.; Chen, J.; Ayers, P.W.; Liu, S.; Zhao, D. Multiproperty Deep Learning of the Correlation Energy of Electrons and the Physicochemical Properties of Molecules. J. Chem. Theory Comput. 2025, 21, 5997–6006.
  14. Zhao, Y.; Richer, M.; Ayers, P.W.; Liu, S.; Zhao, D. Can the FCI Energies/Properties be Predicted with HF/DFT Densities? J. Chem. Sci. 2025, accepted.
  15. Ghosh, S.K.; Berkowitz, M.; Parr, R.G. Transcription of ground-state density-functional theory into a local thermodynamics. Proc. Natl. Acad. Sci. USA 1984, 81, 8028–8031.
  16. Liu, S.; Rong, C.; Wu, Z.; Lu, T. Rényi entropy, Tsallis entropy and Onicescu information energy in density functional reactivity theory. Acta Phys.-Chim. Sin. 2015, 31, 2057–2063.
  17. Kullback, S. Information Theory and Statistics; Dover Publications: Mineola, NY, USA, 1997.
  18. Liu, S. Identity for Kullback-Leibler divergence in density functional reactivity theory. J. Chem. Phys. 2019, 151, 141103.
  19. Abyaz, B.; Mahdavifar, Z.; Schreckenbach, G.; Gao, Y. Prediction of beryllium clusters (Be_n; n = 3–25) from first principles. Phys. Chem. Chem. Phys. 2021, 23, 19716–19728.
  20. Duanmu, K.; Friedrich, J.; Truhlar, D.G. Thermodynamics of Metal Nanoparticles: Energies and Enthalpies of Formation of Magnesium Clusters and Nanoparticles as Large as 1.3 nm. J. Phys. Chem. C 2016, 120, 26110–26118.
  21. Raghavachari, K. Structures and stabilities of sulfur clusters. J. Chem. Phys. 1990, 93, 5862–5874.
  22. Jones, R.O.; Ballone, P. Density functional and Monte Carlo studies of sulfur. I. Structure and bonding in S_n rings and chains (n = 2–18). J. Chem. Phys. 2003, 118, 9257–9265.
  23. Ng, W.-P.; Zhang, Z.; Yang, J. Accurate Neural Network Fine-Tuning Approach for Transferable Ab Initio Energy Prediction across Varying Molecular and Crystalline Scales. J. Chem. Theory Comput. 2025, 21, 1602–1614.
  24. Takeuchi, H. Geometry Optimization of Carbon Dioxide Clusters (CO2)n for 4 ≤ n ≤ 40. J. Phys. Chem. A 2008, 112, 7492–7497.
  25. Takeuchi, H. Structural Features of Small Benzene Clusters (C6H6)n (n ≤ 30) As Investigated with the All-Atom OPLS Potential. J. Phys. Chem. A 2012, 116, 10172–10181.
  26. Roothaan, C.C.J. New Developments in Molecular Orbital Theory. Rev. Mod. Phys. 1951, 23, 69–89.
  27. Møller, C.; Plesset, M.S. Note on an Approximation Treatment for Many-Electron Systems. Phys. Rev. 1934, 46, 618–622.
  28. Weigend, F.; Ahlrichs, R. Efficient use of the correlation consistent basis sets in resolution of the identity MP2 calculations. Chem. Phys. Lett. 1997, 294, 143–152.
  29. Weigend, F.; Häser, M.; Patzelt, H.; Ahlrichs, R. RI-MP2: Optimized auxiliary basis sets and demonstration of efficiency. Chem. Phys. Lett. 1998, 294, 143–152.
  30. Bartlett, R.J.; Watts, J.D. The coupled-cluster single and double excitation model for the ground-state correlation energy. Chem. Phys. Lett. 1989, 155, 133–140.
  31. Čížek, J.; Paldus, J. Coupled-cluster method with singles and doubles for closed-shell systems. Int. J. Quantum Chem. 1971, 5, 359–379.
  32. Purvis, G.D.; Bartlett, R.J. A full coupled-cluster singles and doubles model: The inclusion of disconnected triples. J. Chem. Phys. 1982, 76, 1910–1918.
  33. Parr, R.G.; Yang, W. Density Functional Theory of Atoms and Molecules; Oxford University Press: Oxford, UK, 1989.
  34. Teale, A.M.; Helgaker, T.; Savin, A.; Adamo, C.; Aradi, B.; Arbuznikov, A.V.; Ayers, P.W.; Baerends, E.J.; Barone, V.; Calaminici, P.; et al. DFT exchange: Sharing perspectives on the workhorse of quantum chemistry and materials science. Phys. Chem. Chem. Phys. 2022, 24, 28700–28781.
  35. Li, S.; Li, W.; Fang, T. An Efficient Fragment-Based Approach for Predicting the Ground-State Energies and Structures of Large Molecules. J. Am. Chem. Soc. 2005, 127, 7215–7226. [Google Scholar] [CrossRef]
  36. Li, W.; Li, S.; Jiang, Y. Generalized Energy-Based Fragmentation Approach for Computing the Ground-State Energies and Properties of Large Molecules. J. Phys. Chem. A 2007, 111, 2193–2199. [Google Scholar] [CrossRef]
  37. Li, S.; Li, W.; Ma, J. Generalized Energy-Based Fragmentation Approach and Its Applications to Macromolecules and Molecular Aggregates. Acc. Chem. Res. 2014, 47, 2712–2720. [Google Scholar] [CrossRef] [PubMed]
  38. Li, W.; Dong, H.; Ma, J.; Li, S. Structures and Spectroscopic Properties of Large Molecules and Condensed-Phase Systems Predicted by Generalized Energy-Based Fragmentation Approach. Acc. Chem. Res. 2021, 54, 169–181. [Google Scholar] [CrossRef] [PubMed]
  39. Pulay, P. Localizability of dynamic electron correlation. Chem. Phys. Lett. 1983, 100, 151–154. [Google Scholar] [CrossRef]
  40. Sæbø, S.; Pulay, P. Local configuration interaction: An efficient approach for larger molecules. Chem. Phys. Lett. 1985, 113, 13–18. [Google Scholar] [CrossRef]
  41. Sæbø, S.; Pulay, P. Local Treatment of Electron Correlation. Annu. Rev. Phys. Chem. 1993, 44, 213–236. [Google Scholar] [CrossRef]
  42. Welborn, M.; Cheng, L.; Miller, T.F., III. Transferability in Machine Learning for Electronic Structure via the Molecular Orbital Basis. J. Chem. Theory Comput. 2018, 14, 4772–4779. [Google Scholar] [CrossRef]
  43. Cheng, L.; Welborn, M.; Christensen, A.S.; Miller, T.F., III. A universal density matrix functional from molecular orbital-based machine learning: Transferability across organic molecules. J. Chem. Phys. 2019, 150, 131103. [Google Scholar] [CrossRef]
  44. Cheng, L.; Kovachki, N.B.; Welborn, M.; Miller, T.F., III. Regression Clustering for Improved Accuracy and Training Costs with Molecular-Orbital-Based Machine Learning. J. Chem. Theory Comput. 2019, 15, 6668–6677. [Google Scholar] [CrossRef]
  45. Imamura, Y.; Takahashi, A.; Nakai, H. Grid-based energy density analysis: Implementation and assessment. J. Chem. Phys. 2007, 126, 034103. [Google Scholar] [CrossRef]
  46. Nudejima, T.; Ikabata, Y.; Seino, J.; Yoshikawa, T.; Nakai, H. Machine-learned electron correlation model based on correlation energy density at complete basis set limit. J. Chem. Phys. 2019, 151, 024104. [Google Scholar] [CrossRef] [PubMed]
  47. Han, R.; Luber, S. Fast Estimation of Møller–Plesset Correlation Energies Based on Atomic Contributions. J. Phys. Chem. Lett. 2021, 12, 5324–5331. [Google Scholar] [CrossRef] [PubMed]
  48. Han, R.; Rodríguez-Mayorga, M.; Luber, S. A Machine Learning Approach for MP2 Correlation Energies and Its Application to Organic Compounds. J. Chem. Theory Comput. 2021, 17, 777–790. [Google Scholar] [CrossRef] [PubMed]
  49. Ng, W.-P.; Liang, Q.; Yang, J. Low-Data Deep Quantum Chemical Learning for Accurate MP2 and Coupled-Cluster Correlations. J. Chem. Theory Comput. 2023, 19, 5439–5449. [Google Scholar] [CrossRef]
  50. Townsend, J.; Vogiatzis, K.D. Transferable MP2-Based Machine Learning for Accurate Coupled-Cluster Energies. J. Chem. Theory Comput. 2020, 16, 7453–7461. [Google Scholar] [CrossRef]
  51. McGibbon, R.T.; Taube, A.G.; Donchev, A.G.; Siva, K.; Hernández, F.; Hargus, C.; Law, K.-H.; Klepeis, J.L.; Shaw, D.E. Improving the accuracy of Møller-Plesset perturbation theory with neural networks. J. Chem. Phys. 2017, 147, 161725. [Google Scholar] [CrossRef]
  52. Boys, S.F. Construction of Some Molecular Orbitals to Be Approximately Invariant for Changes from One Molecule to Another. Rev. Mod. Phys. 1960, 32, 296–299. [Google Scholar] [CrossRef]
  53. Edmiston, C.; Ruedenberg, K. Localized Atomic and Molecular Orbitals. Rev. Mod. Phys. 1963, 35, 457–464. [Google Scholar] [CrossRef]
  54. White, S.R. Density matrix formulation for quantum renormalization groups. Phys. Rev. Lett. 1992, 69, 2863–2866. [Google Scholar] [CrossRef]
  55. White, S.R. Density-matrix algorithms for quantum renormalization groups. Phys. Rev. B 1993, 48, 10345–10356. [Google Scholar] [CrossRef]
  56. Riplinger, T.; Neese, F. An efficient and near linear scaling pair natural orbital based local coupled cluster method. J. Chem. Phys. 2013, 138, 034106. [Google Scholar] [CrossRef]
  57. Stewart, J.J.P. Optimization of parameters for semiempirical methods V: Modification of NDDO approximations and application to 70 elements. J. Mol. Model. 2013, 19, 1173–1213. [Google Scholar] [CrossRef] [PubMed]
  58. Bader, R.F.W. Atoms in Molecules: A Quantum Theory; Clarendon Press: Oxford, UK, 1990. [Google Scholar]
  59. Clementi, E.; Raimondi, D.L. Atomic Screening Constants from SCF Functions. J. Chem. Phys. 1963, 38, 2686–2689. [Google Scholar] [CrossRef]
  60. Gadre, S.R.; Sears, S.B.; Chakravorty, S.J.; Bendale, R.D. Some novel characteristics of atomic information entropies. Phys. Rev. A 1985, 32, 2602–2606. [Google Scholar] [CrossRef] [PubMed]
  61. Sears, S.B.; Gadre, S.R. An information theoretic synthesis and analysis of Compton profiles. J. Chem. Phys. 1981, 75, 4626–4635. [Google Scholar] [CrossRef]
  62. Sears, S.B.; Parr, R.G.; Dinur, U. On the Quantum-Mechanical Kinetic Energy as a Measure of the Information in a Distribution. Isr. J. Chem. 1980, 19, 165–173. [Google Scholar] [CrossRef]
  63. Nagy, Á. Fisher information in density functional theory. J. Chem. Phys. 2003, 119, 9401–9405. [Google Scholar] [CrossRef]
  64. Nagy, Á.; Parr, R.G. Information entropy as a measure of the quality of an approximate electronic wave function. Int. J. Quantum Chem. 1996, 58, 323–327. [Google Scholar] [CrossRef]
  65. Morrison, R.C.; Parr, R.G. Approximate density matrices and Husimi functions using the maximum entropy formulation with constraints. Int. J. Quantum Chem. 1991, 39, 823–837. [Google Scholar] [CrossRef]
  66. Ramírez, J.C.; Pérez, J.M.H.; Sagar, R.P.; Esquivel, R.O.; Hô, M.; Smith, V.H., Jr. Amount of information present in the one-particle density matrix and the charge density. Phys. Rev. A 1998, 58, 3507–3515. [Google Scholar] [CrossRef]
  67. Hô, M.; Weaver, D.F.; Smith, V.H., Jr.; Sagar, R.P.; Esquivel, R.O. Calculating the logarithmic mean excitation energy from the Shannon information entropy of the electronic charge density. Phys. Rev. A 1998, 57, 4512–4517. [Google Scholar] [CrossRef]
  68. Hô, M.; Smith, V.H., Jr.; Weaver, D.F.; Gatti, C.; Sagar, R.P.; Esquivel, R.O. Molecular similarity based on information entropies and distances. J. Chem. Phys. 1998, 108, 5469–5475. [Google Scholar] [CrossRef]
  69. Hô, M.; Sagar, R.P.; Weaver, D.F.; Smith, V.H., Jr. An investigation of the dependence of Shannon-information entropies and distance measures on molecular-geometry. Int. J. Quantum Chem. 1995, S29, 109–115. [Google Scholar] [CrossRef]
  70. Hô, M.; Sagar, R.P.; Smith, V.H., Jr.; Esquivel, R.O. Atomic information entropies beyond the Hartree-Fock limit. J. Phys. B At. Mol. Opt. Phys. 1994, 27, 5149–5157. [Google Scholar] [CrossRef]
  71. Hô, M.; Sagar, R.P.; Pérez-Jordá, J.M.; Smith, V.H., Jr.; Esquivel, R.O. A numerical study of molecular information entropies. Chem. Phys. Lett. 1994, 219, 15–20. [Google Scholar] [CrossRef]
  72. Nagy, Á.; Liu, S. Local wave-vector, Shannon and Fisher information. Phys. Lett. A 2008, 372, 1654–1656. [Google Scholar] [CrossRef]
  73. Liu, S. On the relationship between densities of Shannon entropy and Fisher information for atoms and molecules. J. Chem. Phys. 2007, 126, 191107. [Google Scholar] [CrossRef]
  74. Bader, R.F.W.; Preston, H.J.T. The kinetic energy of molecular charge distributions and molecular stability. Int. J. Quantum Chem. 1969, 3, 327–347. [Google Scholar] [CrossRef]
  75. Tal, Y.; Bader, R.F.W. Studies of the energy density functional approach. I. Kinetic energy. Int. J. Quantum Chem. 1978, 14, 153–168. [Google Scholar] [CrossRef]
  76. Cohen, L. Local kinetic energy in quantum mechanics. J. Chem. Phys. 1979, 70, 788–789. [Google Scholar] [CrossRef]
  77. Cohen, L. Representable local kinetic energy. J. Chem. Phys. 1984, 80, 4277–4279. [Google Scholar] [CrossRef]
  78. Yang, Z.; Liu, S.; Wang, Y.A. Uniqueness and Asymptotic Behavior of the Local Kinetic Energy. Chem. Phys. Lett. 1996, 258, 30–36. [Google Scholar] [CrossRef]
  79. Ayers, P.W.; Parr, R.G.; Nagy, Á. Local kinetic energy and local temperature in the density-functional theory of electronic structure. Int. J. Quantum Chem. 2002, 90, 309–326. [Google Scholar] [CrossRef]
  80. Anderson, J.S.M.; Ayers, P.W.; Hernandez, J.I.R. How Ambiguous Is the Local Kinetic Energy? J. Phys. Chem. A 2010, 114, 8884–8895. [Google Scholar] [CrossRef]
  81. Berkowitz, M. Exponential approximation for the density matrix and the Wigner distribution. Chem. Phys. Lett. 1986, 129, 486–488. [Google Scholar] [CrossRef]
  82. Geerlings, P.; De Proft, F.; Langenaeker, W. Conceptual Density Functional Theory. Chem. Rev. 2003, 103, 1793–1874. [Google Scholar] [CrossRef]
  83. Johnson, P.A.; Bartolotti, L.J.; Ayers, P.W.; Fievez, T.; Geerlings, P. Charge Density and Chemical Reactivity: A Unified View from Conceptual DFT. In Modern Charge Density Analysis; Gatti, C., Macchi, P., Eds.; Springer: New York, NY, USA, 2012. [Google Scholar]
  84. Liu, S. Conceptual Density Functional Theory and Some Recent Developments. Acta Phys. -Chim. Sin. 2009, 25, 590–600. [Google Scholar]
  85. Geerlings, P.; Chamorro, E.; Chattaraj, P.K.; De Proft, F.; Gázquez, J.L.; Liu, S.; Morell, C.; Toro-Labbé, A.; Vela, A.; Ayers, P.W. Conceptual density functional theory: Status, prospects, issues. Theor. Chem. Acc. 2020, 139, 36. [Google Scholar] [CrossRef]
  86. Hirshfeld, F.L. Bonded-atom fragments for describing molecular charge densities. Theor. Chim. Acta 1977, 44, 129–138. [Google Scholar] [CrossRef]
  87. Heidar-Zadeh, F.; Ayers, P.W.; Verstraelen, T.; Vinogradov, I.; Vohringer-Martinez, E.; Bultinck, P. Information-Theoretic Ap-proaches to Atoms-in-Molecules: Hirshfeld Family of Partitioning Schemes. J. Phys. Chem. A 2018, 122, 4219–4245. [Google Scholar] [CrossRef] [PubMed]
  88. Becke, A.D. A multicenter numerical integration scheme for polyatomic molecules. J. Chem. Phys. 1988, 88, 2547–2553. [Google Scholar] [CrossRef]
  89. Rong, C.; Wang, B.; Zhao, D.; Liu, S. Information-Theoretic approach in density functional theory and its recent applications to chemical problems. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2020, 10, e1461. [Google Scholar] [CrossRef]
  90. Li, W.; Chen, C.; Zhao, D.; Li, S. LSQC: Low scaling quantum chemistry program. Int. J. Quantum Chem. 2015, 115, 641–646. [Google Scholar] [CrossRef]
  91. Lu, T.; Chen, F. Multiwfn: A multifunctional wavefunction analyzer. J. Comput. Chem. 2012, 33, 580–592. [Google Scholar] [CrossRef]
  92. Lu, T. A comprehensive electron wavefunction analysis toolbox for chemists, Multiwfn. J. Chem. Phys. 2024, 161, 082503. [Google Scholar] [CrossRef]
  93. Frisch, M.J.; Trucks, G.W.; Schlegel, H.B.; Scuseria, G.E.; Robb, M.A.; Cheeseman, J.R.; Scalmani, G.; Barone, V.; Petersson, G.A.; Nakatsuji, H.; et al. Gaussian 16 Rev. C.01; Gaussian, Inc.: Wallingford, CT, USA, 2016. [Google Scholar]
  94. Zou, J. Molecular Orbital Kit (MOKIT). Available online: https://gitlab.com/jxzou/mokit (accessed on 20 May 2025).
  95. Neese, F. Software update: The ORCA program system—Version 5.0. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2022, 12, e1606. [Google Scholar] [CrossRef]
  96. Pulay, P.; Saebø, S. Orbital-invariant formulation and second-order gradient evaluation in Møller–Plesset perturbation theory. Theor. Chim. Acta 1986, 69, 357–368. [Google Scholar] [CrossRef]
  97. Hampel, C.; Peterson, K.; Werner, H. A comparison of the efficiency and accuracy of different electron correlation methods for large molecules: Quasi-Newton, MP2, CCSD, and CCSD(T). Chem. Phys. Lett. 1992, 190, 1–12. [Google Scholar] [CrossRef]
Figure 1. Comparison of (a) conventional MP2 method (with Hartree-Fock orbitals as input) and (b) linear regression LR(ITA) models used in this work, where the density-based information-theoretic approach (ITA) quantities are used as input. Here, MP2 is used only as a proof-of-concept.
Figure 2. Shown here are a total of 24 isomers of both branched and linear octane studied in this work.
Figure 3. Some representative polymeric structures used in this work, including (a) polyyne, (b) polyene, (c) all-trans-polymethineimine, and (d) acene.
Figure 4. Some representative molecular structures used in this work, including (a) Beₙ, (b) Mgₙ, (c) Sₙ, (d) H⁺(H₂O)ₙ, (e) (CO₂)ₙ, and (f) (C₆H₆)ₙ clusters.
Figure 5. Comparison of the LR(G_3)-predicted, GEBF-LR(G_3)-predicted, and GEBF-calculated MP2-level electron correlation energies for benzene clusters (C₆H₆)ₙ (n = 15–30). The regression equation is trained on smaller benzene clusters (C₆H₆)ₙ (n = 4–14). RMSD: root mean squared deviation. Note that the R² and RMSD values here gauge the prediction quality for an extrapolated set, unlike the regression statistics in the preceding tables, which summarize fits within the training set.
Table 1. Strong linear correlations (R²) and RMSDᵃ (in mH) between the calculatedᵇ and predicted correlation energies based on the ITA quantitiesᶜ for octane isomers.

| ITA | Method | Slope | Intercept | R² | RMSD |
|---|---|---|---|---|---|
| S_S | MP2 | 0.03673221 | −4.47037893 | 0.878 | 1.9 |
| S_S | CCSD | 0.02760739 | −3.77240773 | 0.897 | 1.3 |
| S_S | CCSD(T) | 0.03224137 | −4.22658251 | 0.893 | 1.5 |
| I_F | MP2 | 0.01016369 | −21.9076991 | 0.987 | 0.6 |
| I_F | CCSD | 0.00756499 | −16.7278042 | 0.989 | 0.4 |
| I_F | CCSD(T) | 0.00885171 | −19.3909815 | 0.988 | 0.5 |
| S_GBP | MP2 | 0.03958034 | −18.81389475 | 0.964 | 1.0 |
| S_GBP | CCSD | 0.02958941 | −14.48237993 | 0.974 | 0.6 |
| S_GBP | CCSD(T) | 0.03459737 | −16.75258592 | 0.972 | 0.8 |

ᵃ RMSD: root mean squared deviation. ᵇ The basis set 6-311++G(d,p) was used. ᶜ HF/6-311++G(d,p).
Table 2. Strong linear relationships (R²) and RMSDᵃ between the calculatedᵇ (in the last column) and predicted correlation energies based on the ITA quantitiesᶜ for polyyne. RMSD is in mH, and others are in a.u.

| n | S_S | I_F/10³ | S_GBP/10³ | E_2 | E_3/10³ | R_2^r | R_3^r | G_1 | G_3 | I_G | ε_MP2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 17.116 | 0.503 | 0.096 | 63.341 | 2.251 | 14.478 | 15.411 | −6.702 | 13.889 | 0.253 | −0.2718 |
| 2 | 27.503 | 0.996 | 0.178 | 126.454 | 4.498 | 26.687 | 28.049 | −11.822 | 26.724 | 0.357 | −0.5284 |
| 3 | 37.877 | 1.489 | 0.260 | 189.565 | 6.744 | 38.891 | 40.680 | −16.946 | 39.589 | 0.458 | −0.7879 |
| 4 | 48.238 | 1.982 | 0.342 | 252.682 | 8.991 | 51.093 | 53.301 | −22.064 | 52.468 | 0.556 | −1.0485 |
| 5 | 58.604 | 2.475 | 0.425 | 315.797 | 11.238 | 63.292 | 65.918 | −27.186 | 65.335 | 0.654 | −1.3100 |
| 6 | 68.968 | 2.968 | 0.507 | 378.914 | 13.485 | 75.491 | 78.532 | −32.303 | 78.206 | 0.751 | −1.5715 |
| 7 | 79.331 | 3.461 | 0.589 | 442.032 | 15.731 | 87.690 | 91.146 | −37.422 | 91.079 | 0.849 | −1.8332 |
| 8 | 89.696 | 3.954 | 0.671 | 505.147 | 17.978 | 99.888 | 103.759 | −42.541 | 103.952 | 0.946 | −2.0952 |
| 9 | 100.063 | 4.447 | 0.753 | 568.264 | 20.225 | 112.086 | 116.372 | −47.659 | 116.821 | 1.043 | −2.3570 |
| 10 | 110.435 | 4.940 | 0.835 | 631.378 | 22.472 | 124.284 | 128.984 | −52.780 | 129.686 | 1.139 | −2.6192 |
| 30 | 317.730 | 14.800 | 2.478 | 1893.708 | 67.408 | 368.246 | 381.243 | −155.141 | 387.180 | 3.076 | −7.8579 |
| R² | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| RMSD | 1.5 | 1.3 | 1.3 | 1.2 | 1.2 | 1.3 | 1.5 | 1.4 | 0.9 | 2.9 | |

ᵃ RMSD: root mean squared deviation. ᵇ MP2/6-311++G(d,p). ᶜ HF/6-311++G(d,p).
Table 3. Strong linear relationships (R²) and RMSDᵃ between the calculatedᵇ (in the last column) and predicted correlation energies based on the ITA quantitiesᶜ for polyene. RMSD is in mH, and others are in a.u.

| n | S_S | I_F/10³ | S_GBP/10³ | E_2 | E_3/10³ | R_2^r | R_3^r | G_1 | G_3 | ε_MP2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 22.069 | 0.510 | 0.109 | 63.427 | 2.243 | 16.638 | 17.935 | −8.846 | 18.948 | −0.2910 |
| 2 | 37.486 | 1.010 | 0.204 | 126.732 | 4.489 | 31.067 | 33.236 | −16.196 | 37.205 | −0.5659 |
| 3 | 52.876 | 1.510 | 0.298 | 189.930 | 6.726 | 45.493 | 48.534 | −23.495 | 55.289 | −0.8423 |
| 4 | 68.260 | 2.009 | 0.393 | 253.162 | 8.967 | 59.918 | 63.824 | −30.808 | 73.409 | −1.1192 |
| 5 | 83.643 | 2.509 | 0.488 | 316.406 | 11.209 | 74.342 | 79.111 | −38.125 | 91.575 | −1.3964 |
| 6 | 99.023 | 3.009 | 0.583 | 379.653 | 13.451 | 88.766 | 94.397 | −45.438 | 109.749 | −1.6737 |
| 7 | 114.403 | 3.509 | 0.677 | 442.902 | 15.693 | 103.190 | 109.682 | −52.756 | 127.925 | −1.9510 |
| 8 | 129.783 | 4.008 | 0.772 | 506.150 | 17.934 | 117.613 | 124.967 | −60.070 | 146.103 | −2.2285 |
| 9 | 145.163 | 4.508 | 0.867 | 569.399 | 20.176 | 132.037 | 140.251 | −67.385 | 164.282 | −2.5059 |
| 10 | 160.542 | 5.008 | 0.962 | 632.647 | 22.418 | 146.460 | 155.536 | −74.701 | 182.461 | −2.7834 |
| 30 | 468.132 | 15.003 | 2.856 | 1897.616 | 67.253 | 434.930 | 461.224 | −221.004 | 546.043 | −8.3329 |
| R² | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| RMSD | 2.9 | 2.7 | 2.7 | 2.7 | 2.7 | 2.8 | 3.0 | 2.9 | 2.4 | |

ᵃ RMSD: root mean squared deviation. ᵇ MP2/6-311++G(d,p). ᶜ HF/6-311++G(d,p).
Table 4. Strong linear relationships (R²) and RMSDᵃ between the calculatedᵇ (in the last column) and predicted correlation energies based on the ITA quantitiesᶜ for all-trans-polymethineimine. RMSD is in mH, and others are in a.u.

| n | S_S | I_F | S_GBP/10³ | E_2 | E_3/10³ | R_2^r | R_3^r | G_3 | ε_MP2 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 17.891 | 0.602 | 0.109 | 84.234 | 4.138 | 16.585 | 17.767 | 17.765 | −0.3219 |
| 2 | 29.226 | 1.194 | 0.204 | 168.281 | 8.272 | 30.918 | 32.784 | 35.058 | −0.6255 |
| 3 | 40.534 | 1.786 | 0.300 | 252.322 | 12.406 | 45.247 | 47.797 | 52.420 | −0.9295 |
| 4 | 51.834 | 2.377 | 0.395 | 336.418 | 16.546 | 59.576 | 62.805 | 69.772 | −1.2337 |
| 5 | 63.128 | 2.969 | 0.490 | 420.432 | 20.675 | 73.905 | 77.814 | 87.181 | −1.5377 |
| 6 | 74.418 | 3.561 | 0.585 | 504.457 | 24.806 | 88.234 | 92.823 | 104.601 | −1.8416 |
| 7 | 85.706 | 4.152 | 0.680 | 588.488 | 28.940 | 102.564 | 107.833 | 121.973 | −2.1454 |
| 8 | 96.990 | 4.744 | 0.775 | 672.535 | 33.072 | 116.894 | 122.845 | 139.422 | −2.4491 |
| 9 | 108.273 | 5.336 | 0.871 | 756.623 | 37.210 | 131.224 | 137.857 | 156.850 | −2.7527 |
| 10 | 119.552 | 5.927 | 0.966 | 840.677 | 41.345 | 145.555 | 152.870 | 174.241 | −3.0563 |
| 20 | 232.308 | 11.844 | 1.917 | 1681.135 | 82.670 | 288.867 | 303.008 | 348.833 | −6.0907 |
| 30 | 345.014 | 17.761 | 2.869 | 2521.373 | 123.976 | 432.195 | 453.192 | 523.649 | −9.1245 |
| R² | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| RMSD | 0.4 | 1.0 | 0.9 | 0.9 | 0.7 | 1.1 | 1.2 | 3.9 | |

ᵃ RMSD: root mean squared deviation. ᵇ MP2/6-311++G(d,p). ᶜ HF/6-311++G(d,p).
Table 5. Strong linear relationships (R²) and RMSDᵃ between the calculatedᵇ (in the last column) and predicted correlation energies based on the ITA quantitiesᶜ for acene. RMSD is in mH, and others are in a.u.

| n | S_S | I_F/10³ | S_GBP/10³ | E_2/10³ | E_3/10³ | R_2^r | R_3^r | G_1 | G_2 | G_3 | ε_MP2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 70.395 | 2.489 | 0.460 | 0.316 | 11.207 | 69.910 | 73.784 | −34.722 | 25.645 | 88.981 | −1.3706 |
| 3 | 94.598 | 3.478 | 0.636 | 0.442 | 15.688 | 96.553 | 101.740 | −47.666 | 35.408 | 123.979 | −1.9146 |
| 4 | 118.807 | 4.468 | 0.811 | 0.569 | 20.169 | 123.195 | 129.691 | −60.602 | 45.077 | 158.965 | −2.4603 |
| 5 | 143.022 | 5.457 | 0.987 | 0.695 | 24.651 | 149.835 | 157.637 | −73.547 | 54.729 | 193.946 | −3.0070 |
| 6 | 167.241 | 6.447 | 1.162 | 0.821 | 29.133 | 176.474 | 185.576 | −86.480 | 64.373 | 228.921 | −3.5550 |
| 7 | 191.461 | 7.436 | 1.338 | 0.948 | 33.614 | 203.111 | 213.512 | −99.419 | 74.020 | 263.894 | −4.1036 |
| 8 | 215.675 | 8.426 | 1.513 | 1.074 | 38.096 | 229.747 | 241.444 | −112.348 | 83.657 | 298.878 | −4.6531 |
| 9 | 239.894 | 9.415 | 1.689 | 1.200 | 42.578 | 256.382 | 269.372 | −125.278 | 93.298 | 333.853 | −5.2030 |
| 10 | 264.114 | 10.405 | 1.865 | 1.326 | 47.059 | 283.016 | 297.298 | −138.209 | 102.944 | 368.828 | −5.7535 |
| 11 | 288.484 | 11.394 | 2.040 | 1.453 | 51.543 | 309.627 | 325.167 | −151.260 | 112.708 | 404.117 | −6.3415 |
| R² | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| RMSD | 10.5 | 11.5 | 11.4 | 11.4 | 11.4 | 11.6 | 11.9 | 10.4 | 10.9 | 10.3 | |

ᵃ RMSD: root mean squared deviation. ᵇ MP2/6-311++G(d,p). ᶜ HF/6-311++G(d,p).
Table 6. Strong linear relationships (R²) and RMSDᵃ between the calculatedᵇ and predicted correlation energies based on the ITA quantitiesᶜ for neutral Beₙ (n = 3–25) clusters.

| | S_S | I_F | S_GBP | E_2 | E_3 | R_2^r | G_3 |
|---|---|---|---|---|---|---|---|
| R² | 0.996 | 0.996 | 0.996 | 0.996 | 0.996 | 0.994 | 0.993 |
| RMSD | 28.5 | 28.6 | 27.9 | 28.0 | 27.9 | 35.9 | 37.1 |

ᵃ RMSD: root mean squared deviation. ᵇ MP2/6-311++G(d,p). ᶜ HF/6-311++G(d,p).
Table 7. Strong linear relationships (R²) and RMSDᵃ between the calculatedᵇ and predicted correlation energies based on the ITA quantitiesᶜ for Mgₙ (n = 3–20 and 28) clusters.

| | S_S | I_F/10³ | S_GBP/10³ | E_2/10³ | E_3/10⁵ | R_2^r | R_3^r | G_3 |
|---|---|---|---|---|---|---|---|---|
| R² | 0.998 | 0.996 | 0.996 | 0.996 | 0.996 | 0.995 | 0.993 | 0.995 |
| RMSD | 17.7 | 24.8 | 25.2 | 24.8 | 24.8 | 26.7 | 33.0 | 27.2 |

ᵃ RMSD: root mean squared deviation. ᵇ MP2/6-311++G(d,p). ᶜ HF/6-311++G(d,p).
Table 8. Strong linear relationships (R²) and RMSDᵃ between the calculatedᵇ and predicted correlation energies based on the ITA quantitiesᶜ for covalent Sₙ (n = 2–18) clusters.

| | S_S | I_F/10³ | S_GBP/10³ | E_2/10³ | E_3/10⁶ | R_2^r | R_3^r | G_3 |
|---|---|---|---|---|---|---|---|---|
| R² | 0.998 | 0.998 | 0.998 | 0.998 | 0.998 | 0.998 | 0.998 | 0.995 |
| RMSD | 29.5 | 26.9 | 26.7 | 26.9 | 26.9 | 27.7 | 29.5 | 42.2 |

ᵃ RMSD: root mean squared deviation. ᵇ MP2/6-311++G(d,p). ᶜ HF/6-311++G(d,p).
Table 9. Strong linear correlations and RMSDᵃ between the calculatedᵇ and predicted correlation energies based on the ITA quantitiesᶜ for protonated water clusters.

| ITA | R² | RMSD (mH) |
|---|---|---|
| S_S | 1.000 | 4.2 |
| I_F | 1.000 | 2.2 |
| S_GBP | 1.000 | 2.2 |
| E_2 | 1.000 | 2.1 |
| E_3 | 1.000 | 2.1 |
| R_2^r | 1.000 | 3.0 |
| R_3^r | 1.000 | 6.8 |
| G_3 | 1.000 | 9.3 |

ᵃ RMSD: root mean squared deviation. ᵇ MP2/6-311++G(d,p). ᶜ HF/6-311++G(d,p).
Table 10. Strong linear relationships (R²) and RMSDᵃ between the calculatedᵇ (in the last column) and predicted correlation energies based on the ITA quantitiesᶜ for CO₂ clusters. RMSD is in mH, and others are in a.u.

| n | S_S | I_F/10³ | S_GBP/10³ | E_2/10³ | E_3/10⁵ | R_2^r | R_3^r | G_3 | ε_MP2 |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 35.676 | 4.618 | 0.604 | 0.777 | 0.608 | 90.119 | 94.242 | 87.199 | −2.0780 |
| 5 | 44.343 | 5.772 | 0.755 | 0.972 | 0.760 | 112.597 | 117.629 | 110.311 | −2.6020 |
| 6 | 52.975 | 6.925 | 0.905 | 1.166 | 0.911 | 135.124 | 141.177 | 133.364 | −3.1251 |
| 7 | 61.551 | 8.078 | 1.056 | 1.360 | 1.063 | 157.664 | 164.803 | 156.231 | −3.6531 |
| 8 | 70.225 | 9.232 | 1.207 | 1.555 | 1.215 | 180.182 | 188.320 | 179.164 | −4.1772 |
| 9 | 78.890 | 10.385 | 1.357 | 1.749 | 1.367 | 202.688 | 211.805 | 202.144 | −4.7008 |
| 10 | 87.459 | 11.538 | 1.508 | 1.943 | 1.519 | 225.201 | 235.323 | 225.314 | −5.2286 |
| 11 | 96.066 | 12.691 | 1.659 | 2.138 | 1.671 | 247.744 | 258.928 | 248.319 | −5.7533 |
| 12 | 104.630 | 13.845 | 1.810 | 2.332 | 1.823 | 270.253 | 282.434 | 271.861 | −6.2824 |
| 13 | 113.096 | 14.997 | 1.960 | 2.526 | 1.975 | 292.762 | 305.941 | 295.591 | −6.8152 |
| 14 | 121.760 | 16.151 | 2.111 | 2.721 | 2.127 | 315.271 | 329.437 | 318.380 | −7.3397 |
| 15 | 130.261 | 17.303 | 2.262 | 2.915 | 2.279 | 337.783 | 352.939 | 342.101 | −7.8683 |
| 16 | 138.809 | 18.456 | 2.412 | 3.110 | 2.431 | 360.299 | 376.486 | 365.340 | −8.3948 |
| 17 | 147.426 | 19.610 | 2.563 | 3.304 | 2.582 | 382.823 | 400.036 | 388.562 | −8.9212 |
| 18 | 155.935 | 20.763 | 2.714 | 3.498 | 2.734 | 405.331 | 423.523 | 411.987 | −9.4509 |
| 19 | 164.464 | 21.916 | 2.864 | 3.692 | 2.886 | 427.851 | 447.048 | 435.461 | −9.9779 |
| 20 | 173.049 | 23.069 | 3.015 | 3.887 | 3.039 | 450.351 | 470.533 | 458.492 | −10.5090 |
| 21 | 181.681 | 24.222 | 3.166 | 4.081 | 3.190 | 472.899 | 494.173 | 481.566 | −11.0351 |
| 22 | 190.085 | 25.375 | 3.316 | 4.275 | 3.342 | 495.391 | 517.595 | 505.485 | −11.5638 |
| 23 | 198.669 | 26.528 | 3.467 | 4.275 | 3.342 | 517.900 | 541.108 | 528.669 | −12.0920 |
| 24 | 207.333 | 27.681 | 3.618 | 4.470 | 3.494 | 540.447 | 564.742 | 551.542 | −12.6201 |
| 25 | 215.912 | 28.834 | 3.768 | 4.664 | 3.645 | 562.977 | 588.305 | 575.132 | −13.1445 |
| 26 | 224.348 | 29.987 | 3.919 | 4.858 | 3.797 | 585.450 | 611.697 | 598.457 | −13.6717 |
| 27 | 232.942 | 31.140 | 4.069 | 5.053 | 3.950 | 607.998 | 635.332 | 621.629 | −14.2075 |
| 28 | 241.311 | 32.292 | 4.220 | 5.247 | 4.102 | 630.486 | 658.742 | 646.216 | −14.7384 |
| 29 | 249.849 | 33.445 | 4.371 | 5.441 | 4.253 | 653.028 | 682.370 | 669.245 | −15.2667 |
| 30 | 258.485 | 34.598 | 4.521 | 5.636 | 4.405 | 675.542 | 705.876 | 692.513 | −15.7929 |
| 31 | 266.924 | 35.751 | 4.672 | 5.830 | 4.557 | 698.031 | 729.325 | 716.064 | −16.3268 |
| 32 | 275.455 | 36.904 | 4.823 | 6.025 | 4.709 | 720.528 | 752.779 | 739.801 | −16.8616 |
| 33 | 283.987 | 38.057 | 4.973 | 6.219 | 4.861 | 743.042 | 776.303 | 763.194 | −17.3899 |
| 34 | 292.460 | 39.209 | 5.124 | 6.413 | 5.013 | 765.584 | 799.882 | 786.656 | −17.9154 |
| 35 | 301.250 | 40.363 | 5.275 | 6.608 | 5.165 | 788.149 | 823.593 | 809.202 | −18.4287 |
| 36 | 309.838 | 41.516 | 5.425 | 6.802 | 5.316 | 810.635 | 847.024 | 832.618 | −18.9590 |
| 37 | 318.350 | 42.669 | 5.576 | 7.191 | 5.620 | 833.121 | 870.410 | 856.497 | −19.4844 |
| 38 | 326.874 | 43.822 | 5.727 | 7.385 | 5.772 | 855.667 | 894.049 | 879.546 | −20.0129 |
| 39 | 335.361 | 44.974 | 5.877 | 7.579 | 5.924 | 878.154 | 917.451 | 903.378 | −20.5439 |
| 40 | 343.794 | 46.127 | 6.028 | 7.774 | 6.076 | 900.680 | 941.037 | 927.399 | −21.0765 |
| R² | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| RMSD | 14.6 | 6.5 | 6.6 | 6.3 | 6.3 | 6.4 | 6.8 | 10.8 | |

ᵃ RMSD: root mean squared deviation. ᵇ MP2/6-311++G(d,p). ᶜ HF/6-311++G(d,p).
Table 11. Strong linear relationships (R²) and RMSDᵃ between the calculatedᵇ (in the last column) and predicted correlation energies based on the ITA quantitiesᶜ for benzene (C₆H₆)ₙ clusters. RMSD is in mH, and others are in a.u.

| n | S_S | I_F/10³ | S_GBP/10³ | E_2 | E_3/10³ | R_2^r | R_3^r | G_1 | G_3 | ε_MP2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 182.943 | 5.993 | 1.136 | 759.350 | 26.923 | 172.970 | 183.096 | −87.149 | 221.627 | −3.3760 |
| 5 | 228.316 | 7.490 | 1.420 | 948.919 | 33.629 | 216.208 | 228.869 | −108.820 | 277.997 | −4.2330 |
| 6 | 273.691 | 8.987 | 1.703 | 1138.819 | 40.367 | 259.454 | 274.657 | −130.621 | 334.386 | −5.0898 |
| 7 | 318.886 | 10.483 | 1.987 | 1328.458 | 47.078 | 302.685 | 320.404 | −152.252 | 391.102 | −5.9553 |
| 8 | 364.310 | 11.980 | 2.270 | 1518.321 | 53.807 | 345.919 | 366.163 | −174.079 | 447.714 | −6.8098 |
| 9 | 409.374 | 13.477 | 2.554 | 1708.000 | 60.526 | 389.160 | 411.955 | −195.780 | 504.763 | −7.6758 |
| 10 | 454.744 | 14.974 | 2.838 | 1897.903 | 67.261 | 432.383 | 457.676 | −217.571 | 561.267 | −8.5359 |
| 11 | 500.069 | 16.471 | 3.121 | 2087.468 | 73.973 | 475.630 | 503.477 | −239.230 | 617.793 | −9.3921 |
| 12 | 545.054 | 17.967 | 3.404 | 2277.421 | 80.708 | 518.879 | 549.286 | −261.020 | 675.525 | −10.2656 |
| 13 | 589.963 | 19.462 | 3.688 | 2467.339 | 87.442 | 562.104 | 595.025 | −282.767 | 733.570 | −11.1408 |
| 14 | 635.264 | 20.959 | 3.971 | 2656.842 | 94.148 | 605.328 | 640.753 | −304.418 | 789.848 | −12.0021 |
| R² | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| RMSD | 10.7 | 7.6 | 7.7 | 7.1 | 6.9 | 7.3 | 7.3 | 7.5 | 2.8 | |

ᵃ RMSD: root mean squared deviation. ᵇ MP2/6-311++G(d,p). ᶜ HF/6-311++G(d,p).
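
The regression statistics for the benzene clusters can be checked approximately from the tabulated values alone. The following is a minimal sketch, assuming an ordinary least-squares line of the MP2 correlation energy against a single ITA quantity (here G_3) and an RMSD defined as the square root of the mean squared residual over the fitted points; the helper name `linear_fit_stats` is introduced here for illustration and is not part of the paper's workflow. Because the inputs are rounded to the precision shown in the table, the fitted coefficients are only approximate.

```python
import numpy as np

# Benzene clusters (C6H6)n, n = 4..14: G_3 (HF/6-311++G(d,p)) and the
# MP2/6-311++G(d,p) correlation energy in Hartree, copied from the table.
g3 = np.array([221.627, 277.997, 334.386, 391.102, 447.714, 504.763,
               561.267, 617.793, 675.525, 733.570, 789.848])
e_mp2 = np.array([-3.3760, -4.2330, -5.0898, -5.9553, -6.8098, -7.6758,
                  -8.5359, -9.3921, -10.2656, -11.1408, -12.0021])


def linear_fit_stats(x, y):
    """Fit y = slope * x + intercept by least squares; return fit, R^2, RMSD."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))          # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmsd = float(np.sqrt(np.mean(residuals ** 2)))  # assumed RMSD definition
    return float(slope), float(intercept), r2, rmsd


slope, intercept, r2, rmsd = linear_fit_stats(g3, e_mp2)
rmsd_mh = 1000.0 * rmsd  # Hartree -> millihartree

print(f"slope = {slope:.6f}, intercept = {intercept:.4f}")
print(f"R^2 = {r2:.3f}, RMSD = {rmsd_mh:.1f} mH")
```

Under these assumptions the fit gives R² that rounds to 1.000 and an RMSD close to the 2.8 mH listed in the G_3 column. These are in-sample statistics; the extrapolation to n = 15–30 shown in Figure 5 would apply the same slope and intercept to G_3 values of the larger clusters, which are not tabulated here.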
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Wang, P.; Hu, D.; Lu, L.; Zhao, Y.; Chen, J.; Ayers, P.W.; Liu, S.; Zhao, D. Predicting the Post-Hartree-Fock Electron Correlation Energy of Complex Systems with the Information-Theoretic Approach. Molecules 2025, 30, 3500. https://doi.org/10.3390/molecules30173500
