Entropy Pair Functional Theory: Direct Entropy Evaluation Spanning Phase Transitions

We prove that, within the class of pair potential Hamiltonians, the excess entropy is a universal, temperature-independent functional of the density and pair correlation function. This result extends Henderson’s theorem, which states that the free energy is a temperature dependent functional of the density and pair correlation. The stationarity and concavity of the excess entropy functional are discussed and related to the Gibbs–Bugoliubov inequality and to the free energy. We apply the Kirkwood approximation, which is commonly used for fluids, to both fluids and solids. Approximate excess entropy functionals are developed and compared to results from thermodynamic integration. The pair functional approach gives the absolute entropy and free energy based on simulation output at a single temperature without thermodynamic integration. We argue that a functional of the type, which is strictly applicable to pair potentials, is also suitable for first principles calculation of free energies from Born–Oppenheimer molecular dynamics performed at a single temperature. This advancement has the potential to reduce the evaluation the free energy to a simple modification to any procedure that evaluates the energy and the pair correlation function.


Introduction
The Helmholtz free energy, A(N, V, T), is central to the description of stable and metastable equilibrium (number of atoms, N, volume, V, and temperature, T). It is also the starting point for the treatment of relaxation toward equilibrium and the responses of systems near equilibrium to perturbations. Furthermore, local representations of A(N, V, T) are fundamental to spatial and temporal coarse graining of materials dynamics. We propose a functional approach to A(N, V, T) that will be useful for a wide range of materials, chemistry, and molecular biology problems.
The evaluation of A(N, V, T) is usually achieved by thermodynamic integration over temperature from the ideal gas or the harmonic crystal, or by integration over a parameter, λ, that continuously transforms one Hamiltonian with known free energy, A λ=0 , into the Hamiltonian of interest with free energy, A λ=1 [1]. Methods for determining the free energy, such as thermodynamic integration, can be cumbersome [2], particularly for first principles Hamiltonians [3][4][5][6][7][8][9][10][11][12] and for experiments. In many studies, where knowledge of the free energy would be valuable, it is not calculated because of the complexity or computational expense involved in evaluation of the entropy. Alternative approaches should be explored. One approach employed for the treatment of fluids, referred to as "direct evaluation of the entropy" [13][14][15][16], is based on expansions of the probability density to increasing orders in the correlation functions and powers of the density, in which a limited set of terms is retained. Stopping at the pair level, the pair correlation functions (see typical pair correlation functions in Figures 1-3) from simulations give an easily evaluated first approximation of the fluid entropy; if greater accuracy is sought, triplet correlations can be calculated and the triplet terms that contribute to the entropy can be evaluated [16,17]. Similarly, in the study of substitutional alloys, a cluster expansion of the configurational entropy is often evaluated by the cluster variation method (CVM) [18] in a process that includes increasingly higher levels of correlation to achieve the desired accuracy.
Here, we take a different approach. We prove the existence an excess entropy functional of the pair correlation, a universal functional that is valid for all systems. Our work falls within a long tradition of universal functional approaches (see Appendix A). Functional methods generally exploit stationary properties (see Appendix B).
Systems in equilibrium are characterized by time independent correlation functions as defined by Hansen and McDonald [19,20], i.e., the density, ρ(r), the pair correlation function (PCF), g(r 1 , r 2 ), and higher order correlation functions, g (n) (r 1 , r 2 . . . r n ). The correlation functions are defined in terms of the probability density, P r (r 1 ...r N ). P r (r 1 ...r N )dr 1 ...dr N is the probability that particle-n is in dr n for all n = 1, n = 2, ..., n = N. In Equation (1) the atom coordinates appear in an ordered list from 1 to N. P is symmetric under interchange of r 1 ...r N ; however, it is useful to assign specific ordering (see Appendix C) to the atomic positions. The correlation functions are: where For homogeneous systems ρ 1 = ρ = N/V. When n = 2 Equation (2) the pair correlation function is obtained; it is closely related to scattering experiments on liquids, glasses, and crystals [21].
(A) (B) Figure 1. The pair correlation function,ḡ(r), as defined by Equation (1) for BCC iron at 600 K as simulated with the Johnson potential [22]: (A) on a plane that passes through an atom at the origin and the four second nearest neighbors and (B) on a plane that passes through the four nearest neighbors of the atom at the origin. Sigma = 2.2143Å is the Sigma associated with the Lennard-Jones potential that very nearly matches the Johnson potential.
According to generalizations of Henderson's theorem [23,24], systems in equilibrium at T with interaction potentials of order n are completely characterized by the correlation functions up to g (n) . This follows from the fact that the set of n-body interac-tionsV = {v (1) , v (2) , . . . v (n) } are determined uniquely by the set of correlation functions g (1) , g (2) . . . g (n) . If the pair and all higher order interactions are specified and fixed, e.g., v (2) = 1/r and v (n) = 0 for n > 2, then the density, g (1) (r) ≡ ρ(r)/ρ determines the point potential, v (1) (r), and hence all other properties. The implication that A(N, VT) is a functional of the density is the basis of classical density functional theory [25]. The reader familiar with classical density functional theory or with electron density functional theory will be comfortable with this statement and may feel that because in nature the interactions are fixed there is no need to go beyond functionals of the density. However, expressions for the functional dependence of A(N, V, T) can be improved and the process for their development for different pair interactions can be streamlined. Currently, a new functional of the density must be introduced each time a system with a new pair potential is explored [26][27][28][29]. Functionals of pair correlations may be useful for these objectives.  Figure 1A) of the pair correlation functionḡ(r) at 600 K. Furthermore, functionals of the pair correlation may be useful in a wider context. For example, consider that the pair correlation can be defined and measured as a function of time for non-equilibrium ensembles that may have dynamics whose governing equations can be derived from functional derivatives of the excess entropy, S x , and A(N, V, T) [30]. The formalism presented here may thus have a wider range of application, beyond equilibrium. In this context the correlation functions provide a set of coordinates for coarse-grained dynamics [31] that may extend the time and spatial extent of simulation. Rather than elaborate on our vision for the use of this functional approach we simply make the surprising observation that the determination of the excess entropy requires only one quantity, the pair correlation function. For example, if the pair correlation function is known the excess entropy can be determined without knowledge of the temperature. This observation is sufficient to motivate our discussion. We also propose preliminary approximations to the universal excess-entropy functional of the density and pair correlation that require no knowledge of the interaction potentials or the temperature. We add the caveat that for the functional to be strictly applicable the Hamiltonian must be of pair potential form. However, we argue that higher order interactions are often small enough that they can be ignored because they affect the entropy only in second order (Appendix D). In this manuscript N, V, and T are assumed fixed and will be dropped from arguments henceforth.
The paper proceeds as follows. Section 2 describes properties of a variational form of the free energy. This form is combined with a constrained search to achieve a free energy functional of the density and pair correlation function. In Section 3 that part of the free energy that is independent of potentials and has an explicit linear dependence on T is identified with the excess-entropy, pair-density functional, S x [g]; its properties are discussed for the homogeneous pair potential case. In Section 4 approximations to S x [g] that are suitable for all phases are developed. In developing these approximations we begin with the conventional Kirkwood approximation for fluid probability densities [1]. We then propose a modified Kirkwood approximation that is suitable for crystals and fluids. For the crystal a connection is made between the modified Kirkwood approximation, conditional pair probabilities, and harmonic approximations. Concluding remarks in Section 5 are followed by appendices. Appendix A gives a brief survey of universal functionals that will remind the reader of the similarities and differences between several uses of this type of construct. The relationship between the variation of the free energy and the pair interaction is discussed in Appendix B. Appendix C covers organizing the list of atomic coordinates. Appendix D shows that the effect on the excess entropy of higher order interactions is second order in their strength. Appendix E describes simulation that provide PCFs and excess entropy that are compared to the output of excess entropy functionals. The results are for the Johnson potential [22]; they include the construction of an essentially exact target excess entropy as a function of temperature. Finally, the variance of the crystalline separation vectors within the classical Debye model is given in Appendix F.

Free Energy Properties
A is usually expressed in terms of the partition function. In order to arrive at a form that is stationary with respect to variations of the probability density about its equilibrium value [25,32], we rewrite A in terms of the Hamiltonian and the probability density, P(p N , r N ), where r N and p N are the N positions and momenta respectively [33].
The stationarity of this functional with respect to perturbation of the probability density, δP, about the equilibrium probability density, P H 0 , for Hamiltonian, H, follows from where Tr{δP} = 0, P H 0 = exp(− H k B T )/Z H , and Z H = Tr{exp(− H k B T )} have been used to obtain the second identity. It is easily verified that substituting P H 0 into Equation (3) gives the usual expression for free energy in terms of the partition function, When evaluating A H at a probability that differs from P H 0 by an amount ∆P that preserves the normalization of P 0 , i.e., Tr{∆P} = 0, we observed: By the Gibbs inequality, the second trace in Equation (6) is positive; therefore, A is minimized by P H 0 .
To summarize, evaluating the free energy functional, Equation (3), at a probability that differs from the exact probability, P H 0 , always increases the result. From Equation (7) we can derive upper and lower bounds for the free energy associated with Hamiltonian, H, that are based on the free energies of approximate Hamiltonians, H a that have know free energies (possibly analytic free energies). The Gibbs-Bogoliubov inequality [34], provides an upper bound which relates the change in free energy to the difference between the Hamiltonian of interest, H = H a + ∆V and the approximate Hamiltonian, H a . This follows immediately from Equation (7) by setting H = H a + ∆V and ∆P = P H a 0 − P H 0 : A lower bound may be found by interchanging the perturbed and unperturbed quantities in Equation (8) An interesting situation emerges when all terms dependent onV are moved to the righthand side: If the trial Hamiltonian, H a,{c i } , is specified by adjustable interactions,V a,{c i } , with parameters, {c i }, e.g.,V a,c (r) = c r , then: Looking back at Equation (7), the minimum property of the free energy functional, we see that it also allows the following exact, explicit expression for the free energy that invokes two levels of minimization. First, for given momentum density, m(r), number density, ρ(r), and the pair correlation function, g(r, r ), minimization is performed over all P that integrate to m(r), ρ(r), and g(r, r ). Second, A is minimized over m(r), ρ(r), and g(r, r ); Equation (12) summarizes the process of searching over all probability densities that give the specific m, ρ, and g and selecting the one that minimizes the trace. This is similar to the constrained search treatment of electron density functional theory [36]. Minimization over the momentum density (Equation (13)) can be easily performed because it is set by the temperature and can be removed from consideration because the probability density is separable, P(p N , Of all P p (p N ) that yield a specific m we choose P p (p N ) that gives the lowest A and further choose the m = m min that gives the lowest A. As a result, the free energy of the ideal gas, A id , appears as a separate term in the total free energy .
where Λ = 2πh 2 mk B T and the trace operation is understood to act on p or r according to context. Specializing to pair potential Hamiltonians,V = {v (1) (r), v (2) (r, r )}: The "v (1) term" is the external potential energy; the "v (2) term" is the interaction energy. Subtracting these two potential energy terms from the excess free energy leaves the excess entropy term which has explicit dependence on T and implicit dependence through g. Dividing the "entropy term" by −T removes all explicit dependence on T and gives the excess entropy functional, S exact Specializing to v (1) = 0, fixed N, and fixed V (ρ = N/V) gives, This is our main result. Equation (18) gives an exact but unusable expression for the excess entropy; maximizing over all P r is not feasible. Equation (18) does, however, prove that in any simulation (or experiment) the excess entropy depends on parameters such as temperature, particle mass, and pair potential only through their impact on g. S x [g] is a universal functional of the PCF; this simply means that it works without modification when applied to any pair potential simulation. In the sections below we address questions about the properties of the exact universal functional: (1) Can any properly normalized g be inserted into the functional? (2) What is the curvature of the functional? (3) How much does the value returned by the functional change when small changes to δg(r), are made in g(r)? (4) Is every g obtainable from a potential; is g "v representable"? Our development exactly parallels the development of electron density functional theory (see Appendix A). The next steps are to build upon theoretical and simulation results for the purpose of constructing useful approximations to the unusable exact functional and to compare the approximate values of the entropy to accurate values from thermodynamic integration of simulation results.

Excess Entropy Pair Density Functional
The subject of this paper, entropy in terms of correlation, is most clearly addressed by treating homogeneous pair potentials, v (2) (r 1 , r 2 ) = v(r = r 2 − r 1 ) = v(r) of finite range in the homogeneous case, v (1) = 0. When v (1) = 0 it may be assumed, without loss of generality, that ρ(r) = ρ = N V ; crystallization takes place through spontaneous symmetry breaking in g(r, r ). The free energy functional for a particular v (2) = v(r) separates into the potential energy, which has a simple explicit dependence on the potential, and the remaining kinetic and entropy terms; these remaining entropy terms form a universal functional of the pair correlation function. The form of the excess entropy is remarkable in having no explicit temperature dependence; knowledge of g and only g determines the excess entropy!
We have introducedḡ, an auxiliary functional of g(r, r ) that will be used in development of entropy functionals. In most cases, v(r) = v(|r|) = v(r); therefore, it is often a useful simplification to use the functional, g s (|r|) ≡ 1 4π π sin(θ)dθ 2π dφḡ(r, θ, φ). In a homogeneous, isotropic fluid g =ḡ = g s .
S x is the exact, temperature independent excess-entropy functional of g: The maximum excess entropy, Equation (18), a quantity with roots in statistical mechanics and information theory [37][38][39], with a constraint specified by g, is the excess entropy functional of g. This functional is defined over the domain of all g that can be derived from an N particle probability density, P(r 1 , . . . r N ) by Equation (1) ρg(r 1 , r 2 )ρ = N(N − 1) dr 3 ...r N P(r 1 ...r N ) (22) or if the density is treated as inhomogeneous [40]: The relationship between g and g r is: We will often use simply g; if ambiguity is possible a specific subscript will be added, g r Equation (23) or g H for Equation (22). What are the properties of the excess entropy functional? Are there only special g that form its domain or is the domain extensive and continuous? Can variations with respect to g be defined on the domain for the purpose of maximizing with respect to g or for finding the change in entropy associated with modification of g. Is the concavity determined over the domain so that if a maximum is found it is known to be the absolute maximum? Are all pair correlation functions related to at least one P(r 1 ...r N ) by Equation (24)?
For any g 1 and g 2 that are in the domain of ]. This is shown by utilizing probability densities, P 1 and P 2 , that are associated with g 1 and g 2 through the maximization process in Equation (18). We also introducē P λ = (1 − λ)P 1 + λP 2 , which importantly is not guaranteed to maximize the trace in Equation (18); therefore: Furthermore, The traces on the right hand side of Equation (26) are positive definite; therefore, S x is concave.
We can construct at least one proper probability density, the Kirkwood probability density [1], P K . The Kirkwood product is symmetric and positive definite.
The factors appearing in the first and second forms of the Kirkwood probability density in Equation (27) differ by terms or of order 1 N . In the second form, P HT K (r N ), the factor (1 − 1 N ) assures the correct High T limit, lim T→∞ P HT (r N ) = ρ N N! N N through first order in 1 N [13,[41][42][43][44]. Integrating over "dr 3 ...dr N " gives g(1, 2) because g(r) = 1 for an amount of integration volume in r 3 ...r N that becomes infinite in the thermodynamic limit. The domain of the excess entropy functional is extensive; every g yields at at least one physically admissible probability density and therefore is within the domain of the entropy functional. Therefore, given any g a physically admissible probability can be constructed. The first of Equations (27) is written below in a more informative way; it applies to both fluids and crystals: If the atoms are ordered along a chain as described in Appendix C, we can think of the first line of Equation (28) as representing a path from nearest neighbor to nearest neighbor; the second line goes from second neighbor to second neighbor (second neighbors along the chain but not necessarily in the structure). Let us look carefully at the entropy of the crystal. An exact expression for the probability density based on conditional pair probabilities [45] serves as a good starting point. P crystal is given in terms of conditional probability densities, e.g., the probability that an atom is at r 2 given that an atom is at r 1 and with the condition that there is an atom at r 0 , g(1, 2|0)ρ(2): Removing the conditions on each factor results in a non-symmetric approximation of the crystal probability (compare to Equation (28)): Comparing Equations (29 and (30) to the Kirkwood probability in Equation (28), we can identify the first line of Equation (28) with removal of conditions as in Equation (30). Each column in Equation (28) can be thought of as an approximation of a conditional probability in Equation (29); e.g.,

Kirkwood Entropy
The fact that the excess entropy is a universal functional allows us to propose approximations to P in terms of g and then to set parameters to give accurate entropies over many different pair potential simulations. In this paper we follow this approach; however, our dataset is very limited; we used simulation results for only the Johnson potential [22,46] from T = 10 0 to T = 10 7 K. Simulations used to obtain g(r) and the target entropy are described in Appendix E.
Our choices for approximations are informed by three interrelated approaches: 1. The Kirkwood approximation for P r in terms of products of PCFs; Equation (28). 2. The exact expression for P r in terms of conditional probabilities (or gs); Equation (29).
3. The Morris-Ho method of entropy calculation in terms of simulated correlation [35].
Approximations to P(r N ) in terms of g can be directly inserted into Equation (18) to generate approximate functionals [47]. Substituting the Kirkwood probability density into Equation (18): Simplifying to a homogeneous fluid gives the Kirkwood entropy as proposed by Green and Wallace [1,[13][14][15][41][42][43][44]48]: Equation (33) has three problems: 1. Due to a normalization problem it does not go to the correct limit at high temperature [49]. 2. It underestimates the entropy just above melting. 3. It grossly underestimates the entropy below melting; it gives −∞ for the crystal.
These shortcomings will undermine accuracy when applied to most systems. We understand the source of these shortcomings. The normalization problem at high temperature is addressed by the replacement g → g exp (φ[g]); φ is a parameterized functional of g. The cause of the Kirkwood overestimation of the entropy drop around melting is clear from Equation (27); in the fluid, whereḡ(r) has broad peaks (Figure 3), multiple products are needed to maintain separation between atoms. For example, atoms that form a triangle could overlap if only two spherical gs were included in the product. However, in the low-temperature-crystalḡ(r) approaches a sum of delta functions ( Figure 1); in this limit Equation (27) should be replaced by an expression with fewer constraining factors. The N(N − 1)/2 factors in the Kirkwood approximation (count the factors in Equation (27)) should be continuously reduced as the liquid cools toward crystallization, they should tend toward (N − 1) factors. Raising g to a fractional power, g → g 1 γ[g] , effectively reduces the number of factors of g appearing in P r ; γ[g] is a parameterized functional of g. The choice γ = N 2 is equivalent to reducing the number of factors of g to exactly N − 1. Combining the effects of φ and γ gives g → g [g] Notice that in the thermodynamic limit it is immate- [g] is applied to all g occurring in Equation (28) or if those on the first row are excluded from the substitution. Keep in mind that γ[g] and φ [g] are independent of r and T; however, they can depend on whether g has a form associated with a particular underlying crystal or fluid.

Modified Kirkwood Entropy Applied to Fluid Pair Correlation Functions
, gives: Using the modified Kirkwood probability density, P M K , to approximate the maximizing P r in the argument of the ln in Equation (18) gives an entropy functional that we refer to as the modified Kirkwood entropy,s x K [g]. Specializing to a homogeneous phase and collapsing the effect of φ into a term, 1 2φ gives:s The term 1 2φ will be selected to give the correct result for the Johnson Potential when g is almost independent of r, as it is at extremely high temperature. The value ofφ should approach one at extremely high temperature where g s (|r|) = (1 − 1/N). Let us follow the behavior of g s for large r as the temperature is increased. Starting at melting and extending to a high temperature we observe that lim r→∞ g s (|r|) = 1, then at an extremely high temperature, T t , the large r value of g s undergoes a transition to, lim r→∞ g s (|r|) = (1 − 1/N) (this limit is easily reached because we use a potential that is finite at r = 0 [46]). Up to T t the excluded atom (ρ drḡ = N − 1) is compensated within the first few neighbor shells. Above T t the excluded atom is spread over the whole volume. At infinite T the excluded atom is spread evenly over the total volume. We characterizeφ by the amount of the excluded atom that is inside the sphere inscribed in the simulation box, R in . The fractions of the volume inside and outside the sphere are: f in = 4π 3V R 3 in and f out = 1 − f in . Define: where drg s . Q characterizes the large r behavior of g; it transitions from zero at low temperature (Q is zero for the crystal) to one in the extreme high temperature range. We selected a form forφ that is parameterized in terms of Q by q 1 and This form ofφ[g] guarantees thatφ = 0 for the crystal and gives the proper behavior for the entropy in the extreme high temperature limit. If the functionals γ[g] and φ[g] are constructed on physical grounds, they could provide a reasonable starting point for a universal excess entropy functional, even though they are fitted to limited data. We define some simple functionals upon which we can build parameterized forms for γ [g]. Observingḡ(r), it is straight forward to determine whether it corresponds to a crystal or to a fluid; if it corresponds to a crystal the particular crystal is also easy to determine (see Figure 1). We define the functional, I p [g] to be BCC, FCC, HCP, etc., or fluid.
The Kirkwood entropy overestimates the gradual drop in entropy above melting and overestimates the drop in entropy at melting. Therefore, we need a T-independent functional of g that indicates the approach to crystallization so that we can use it to construct a universal excess entropy functional with the correct entropy reduction near melting. Consider the functional: The functional κ[g] increases in the fluid as T is reduced and diverges at crystallization [50].
Note that κ provides a monitor of the level of long-range correlation and the approach to crystallization. It can be used to construct an entropy functional that changes as crystallization is approached. For the fluid we restrict the dependence of γ on g to be through κ, i.e.,

Kirkwood Entropy Applied to Crystal Pair Correlation Functions
A functional described by Equation (35) is intended to apply to fluid and crystalline phases. As the excess entropy functional has no explicit dependence on T it should apply as a system changes phase from fluid to crystal. At crystallization, entropy drops abruptly; the entropy change is referred to as the entropy of fusion. For crystals the PCFs are peaked at separation vectors that are difference vectors of the lattice. For the Johnson potential, these peaks are well represented by spherical Gaussians (Figures 1 and 2). The PCF, g H , is characterized by Gaussians of width, λ 0i at the separations equal to R i of the average lattice [52-54].
The Gaussian width converges to λ ∞ as the separation vector approaches R ∞ : Displacements of atoms from their average positions are uncorrelated when the site separation is large, denoted by R ∞ . Therefore, λ 2 ∞ = 2λ 2 00 where λ 2 00 /2 is the variance of the atomic displacement. The displacement of every atom about its lattice site is described by the Gaussian: The density that appears in the Kirkwood formula when applied to crystals is specified at each site by λ 00 ; ρ i (r − R i ) = ρG λ 00 (r − R i ).The entropy when g r = 1, in terms of λ 00 is: In s x K [1] it can be recognized that the term −1 represents the restriction to one atom per site, the term 3 2 is essentially the equipartition energy divided by T, and 3 2 ln λ 2 00 is the reduction in entropy because each atom is restricted to its site to within a range characterized by λ 00 . As λs are proportional to √ T at low temperature the g = 1 entropy and the improvements discussed below are closely related to the equipartition specific heat, C v = (degrees of freedom)/2. In our situation, the equipartition contribution to the excess entropy is 3 2T . The choice of a particular expression for the excess entropy in terms of λ's affects mainly the integration constant in the equipartition excess entropy. Note, the integration constant is important; it determines the entropy of fusion. The g = 1 entropy is a reasonable extrapolation of the liquid entropy to low temperature. It misses the drop in entropy at melting. This makes sense because the g = 1 entropy does not include any reduction in entropy associated with correlation between atoms on different sites. The Kirkwood approximation can guide us. One approach is to consider only the factors of g(r) in the first row of Equation (28), where . This row has N factors of g; thus, this approach achieves the mission assigned to γ in the discussion of fluids; it reduces the factors of g by N 2 . The first row gives: The i th row will contribute 3 2 ln ) to the entropy [55]. If only the first row (first nearest neighbors) is included the result is equivalent to neglect of the conditions on the conditional probabilities. If more rows are added, then contributions are included from neighbors beyond first neighbors; the contributions of these more distant neighbors will usually be smaller because their correlation diminishes with distance.
Perhaps the simplest approach is use the spherically averaged PCFs. If the PCFs are treated as spherical in Equation (32), the Kirkwood entropy, s K x [g s ], is given by: In all cases, if there is no correlation between motion on different sites the entropy reverts to the entropy of the g r = 1 entropy. We have found that the values of λ 0i can be determined by fitting g s (|r|) to the sum over R i of the integral over solid angles of G λ i (R i , r); see Equation (41). The values of λ 0i [g s ] obtained by fitting g s (|r|) are shown for T < T m in Figure 4. For crystals, the λs are just an alternative way of specifying g(r). At low temperature they are proportional to √ T, as is to be expected in the regime where a harmonic model is valid. As sites move farther apart the peak widths, λ 0i , grow.  Recapping, the Kirkwood approximation of P r was applied to fluids and crystals. In fluids we proposed modification through φ and γ to correct the normalization of g and the multiplicity of gs that appear in P r . Thus, for fluid PCFs we have discussed two possible approximations to the functional: (1) the original Kirkwood excess entropy and (2) the modified Kirkwood excess entropy. For Crystals, the excess entropy of the g = 1 system is a reasonable extrapolation of the modified Kirkwood result for fluid PCFs. The crystal does not have a normalization problem and the number of factors of g is not a problem for the g = 1 entropy. We explored several ways to include inter site correlation via g; the different treatments are specified by the different arguments for the functional s x K [g]: indicates that the sum in Equation (46) includes only terms from the first row, (III) s x K [g(r)] indicates the sum is over all neighbors of contributions from the full vector correlation, g(r), and (IV) s x K [g s (|r|)] indicates that the correlation is approximated through the spherical average of g. For crystals, Expressions I and II represent limited summations of the terms contributing to the Kirkwood entropy. The full Kirkwood entropy is expressed in III; it is the most complete expression of the Kirkwood approximation. Expression IIII is a spherical approximation of the full Kirkwood expression.
To emphasize that a single functional spans phase transitions by acting on any properly normalized functions, g(r) we write: In Equation (48) any g that is found to be "fluid like" is inserted into the modified Kirkwood approximation for the fluid; any g that is "crystal like" is inserted into the full Kirkwood approximation for the crystal.
In the next section, alternative treatments of the crystal are discussed from the perspective of a harmonic crystal.

Connection to the Harmonic Crystal
For pair correlation functions that are very sharply peaked at crystal lattice vectors there should be a connection between Equations (44)-(47) and the entropy of harmonic solids. Equation (11) relates entropy to a that of an approximate interactionV a,{c i } (r 1 , ...r N ) parameterized by {c i } and the actual probability density; could Equation (11) be a tool that allows such a connection to be built?
We point out that in Equation (49) the entropy is a functional of g because the trial potential is determined by minimization for each choice of g. We follow Morris and Ho [35,56]: who proposed taking the trial interaction to be harmonic. For harmonic interactions the variational choices are the "spring constants," {mω 2 1 , ...mω 2 N }, and the coordinate choice, {y 1 , ...y N }, which is a set of 3N linearly independent coordinates that specify the atomic positions, y k = Map k (r 1 , ...r N ). In our monoatomic homogeneous system, using trial harmonic interactions,V harmonic {ω k ,Map k } (r 1 , ...r N ) = ∑ 3N k 1 2 mω 2 k y k (r 1 , ...r N ) 2 , [57] gives a functional: The analytic free energy of harmonic systems is known and is independent of the choice of coordinates; let us work it out for our case. First consider the Einstein model, a quadratic point potential that attracts atoms to lattice sites, y k = x k − R k . Note that such a quadratic potential increases without bound as an atom moves away from a lattice site; some ground rules need to be specified when sites are brought together to form a solid. We could allow one specific atom in the potential at each site; this would result in, 3N 1-d harmonic factors in the partition function, exp(− H 1d k B T ) 3N . Another choice is to let atoms be spread without prejudice as to the number of atoms on a given site; this leads to N N 3N factors. Our choice is to disallow multiple occupation of sites; this is closest to the observed simulated behavior of g(r). Therefore, we obtain N!3N factors. Furthermore, we consider the potential to be zero at lattice sites and to extend to the Wigner-Seitz boundary where it joins the potentials centered at neighboring sites. We have mandated that sites be singly occupied, this can be thought of as the result of a repulsive potential acting between atoms that is strong enough to absolutely exclude two atoms from being at the same site. Therefore, the N atoms can be distributed among the sites in N! ways giving the N!3N harmonic factors appearing in the argument of ln below. If the exclusion is not present there would instead be N N 3N factors. All ω k will be chosen independent of k, ω k = ω; this is consistent with all choices for {y k } being equivalent to within a translation.
In the second equation we have assumed that the atoms are far enough apart that the repulsive interaction does not influence the free energy; this assumption will break down just before melting. In applying Equation (49) the kinetic energy has been grouped with the free energy of the ideal gas to form S id : We perform the minimization with respect to ω k by setting dS harmonic A completely general way to write a linear mapping to y k is y k = x k − R k − ∑ j =k a k,j (x j −R j ). This representation emphasizes our goal of having a local functional of g. We are not going to minimize with respect to {a i,j }; instead we begin by exploring the two cases motivated by: (I) Kirkwood g = 1 entropy of the crystal s K x [1], and (II) which includes the entropy reduction from first neighbor correlation. For case I, we pick the classical Einstein model, y I k = x k − R k ; each atom is harmonically attached to a lattice site. Case II is a chain with harmonic links, These results should look familiar; case I is the excess entropy when g r = 1, Equation (44) (R 1 , r)] the entropy depends only on correlation between atoms that are nearest neighbors. What is the influence of other neighbors? In the conditional probability expression the more distant neighbors enter through the conditions. In the Morris-Ho treatment a natural way to include correlation with other neighbors is to consider < y 2 k > to be the diagonal element of the correlation matrix, C k,k =< y k y k > (see [35]). If the correlation matrix is diagonal, i.e., C(k, k ) =< y 2 k > δ k,k , or if it is approximated as diagonal, C Diagonal k,k = C k,k δ k,k then ∑ 3N k ln(< y 2 k >) = ln |C Diagonal |. Other truncations that leave C symmetric can be used to evaluate ln |C|, e.g., block diagonal. These approximations merely correspond to making alternative choices for the set {y k }. Truncating C to a tridiagonal matrix, C Tridiagonal k,k = C k,k (δ k,k + δ k,k +1 + δ k,k −1 ) incorporates additional coupling between displacements on different sites when applied to case I and between different bonds when applied to case II.
Is truncation of C α,β i,j to a diagonal or tridiagonal matrix reasonable? In the tridiagonal matrix, how much smaller are the off-diagonal elements than the diagonal elements? The ratios are (see Figure 4): For the low temperature Johnson potential both | I | and | II | are about 0.4, consistent with the classical Debye value (see Appendix F). However | I | climbs to 0.6 while | I I | dives, crossing zero a few hundred degrees before melting when λ 2 02 2 = λ 2 01 . This "zero" is likely to occur whenever > 0 at T = 0 because the first nearest neighbor peak will maintain its shape as melting approaches but the second nearest neighbor peak will lose its integrity. As a result of the very different behavior of I and II the impact of off diagonal correlation is very different in the two cases.
In the homogeneous case where all sites are equivalent this tridiagonal correlation matrix becomes Toeplitz and has an analytic determinant [58]. Approximating the full correlation matrix by a Tridiagonal Toeplitz matrix, C TT : where: Therefore, for cases I and II, in lim N→∞ , the off-diagonal coupling simply adds a term, In case I, for the Johnson potential, the off-diagonal correlation ratio is large, ≈ 1 2 ; it reduces the entropy significantly. Correlation gives an entropy of fusion in line with the target value and with Richard's rule [59]. Note that, 3 2 ln 1 + 1 In case II, bond-correlation gives only a small reduction in the entropy; in fact,s x II h and s x II h−TT are so close that in the discussion in the following section onlys x II h−TT will be shown. Equation (63) are attractive in two ways: first, they give an upper bound to the entropy and second, they are simple. Clearly, there is room to improve on these functionals; near melting, even thoughs x I h−TT is formally an upper bound, it is observed to be slightly lower than the target entropy. This indicates a problem; a possible explanation is that the assumption that every atom stay tightly bound to a specific site is violated near melting.
For distant neighbors, correlation ratios designated by will be small; their effect on the entropy should be roughly 3 2 ln 1 + 1 This may suggests a form that can be used to incorporate the influence on entropy reduction from the weak correlation between more distant bonds. As the system can be specified by 3N coordinates there are three local coordinates available for each site. The coordinates having the smallest variances will dominate entropy reduction. Some possible choices are: the displacements from a lattice site (x 0 , y 0 , z 0 ), the coordinates of a specific nearest neighbor separation(x 01 , y 01 , z 01 ), or the distances to three different close neighbors, (r 1 , r 2 , r 3 ) . This is reminiscent of the equal partition theorem in which each of the 3N coordinates adds 1/2k B to the specific heat. Here, the the 3N coordinates with the smallest fluctuations contribute 1/2 ln λ 2 . Normally these low variance coordinates will involve close neighbors. The largest contributors (most negative) will be placed on the diagonal of the correlation matrix where each of the three variances will contribute terms like 1 2 ln λ 2 . Weaker contributors will enter through terms like 3 2 ln 1 + 1 2 1 − 4| | 2 − 1 that represent weak correlations that couple the more strongly correlating coordinates.
As the temperature of a crystal increases all variances grow. The variance λ 2 00 /2, increases, as does the variance of the vector separations between neighbors, e.g., λ 2 01 /2 and λ 2 0,2 /2 (see Figure 4). At all temperatures the variance at large separation is exactly twice, λ 2 00 /2. However, as the temperature increases the separation at which this factor of two is reached becomes shorter and shorter. Above melting λ 00 and λ ∞ go to ∞, while "λ 01 " in the radial direction goes to a width indicated by the nearest neighbor peak ofḡ(r) and in the two perpendicular directions it goes effectively to ∞ because neighboring atoms can be found anywhere in the spherical shell at |r| [60].

Comparison of Selected Functionals with the Target Entropy
Here we compare simulated values of the excess entropy for Jonhson-potential-iron to values from several approximate functionals. The functionals originate from several starting points as described above. We caution the reader not to discount approaches because of their poor performance in comparison to the simulated results. Further refinement of any of these approaches might mature into a very reliable functional. We have been reluctant to introduce fitting parameters. Agreement with simulated results can always be improved through more parameters. In our opinion it is best to wait until a broader array of approaches has been explored. Our only use of fitting parameters is for fluids approximated by the modified Kirkwood functional,s x K . It has three parameters, q 0 , q 1 , and q 2 [51], they were fit to the target excess entropy associated with the Johnson pair potential [22]. The target entropy was calculated by thermodynamic integration as described in Appendix E. The three parameters affect only the value of the functional for fluid pair correlation functions; furthermore, q 1 and q 2 affect only the fluid entropy at very high temperature. Agreement with the target in the fluid is not surprising; after all, we performed a fit. However, given the physical basis for the corrections, we anticipate that the values for q 0 , q 1 , and q 2 will perform reasonably for most systems. The modified Kirkwood entropy,s x K , is compared to the Kirkwood entropy, s x K , and the target in Figure 5.   For crystalline pair correlation functions our proposal to use the definition of PCF for inhomogeneous densities (Equation (24) gives, when g r = 1, a "reference" entropy, s x K [1]. This definition extends the Kirkwood approximation of crystals. The reference entropy, s x K [1], does not account for correlation between atoms on different sites when g = 1. In Figure 5 the reference entropy provides a smooth continuation of the modified Kirkwood entropy of the fluid; it misses the drop in entropy at solidification. Figure 5 also shows several treatments that approximately incorporate correlation between atoms on different sites. These approximations are motivated by the Kirkwood approximation, conditional probabilities, and the Morris and Ho application of harmonic entropy to crystals. Our relabeling scheme leads naturally to entropy reduction being associated with nearest neighbors along a chain, s x K [G λ 01 (R 1 , r)]. We found that for the Johnson potential the neighbors beyond first neighbors contribute very little to entropy reduction; therefore, in Figure 5 we show (see unfilled triangles) only s x K [g(r)], which includes the correlation between all sites along the chain. The unfilled diamonds show s x K [g s (|r|)], which is based on the spherically averaged g; it is very similar in implementation to the Kirkwood approximation in the fluid (see Figure 6). The two evaluations of the s x K functional at g(r) and g s (|r|), i.e., s x K [g(r)] and s x K [g s (|r|)] give similar values. The similarity of these results is due to a cancellation of effects; the spherical treatment accounts for a higher coordination but with a larger variation. Consider, along the chain each site shares two bonds with its two neighbors specified by three coordinates (three coordinates per site), while the spherical treatment shares bonds with eight neighbors specified by one coordinate (four coordinates per site).  Figure 6. The pair correlation function g H s (|r|) for BCC iron at 500 K, as simulated with the Johnson potential, is shown as a solid line. The function g H s|∞ that is associated with the reference entropy is shown as a dashed line. It is the spherical average of Gaussians of width, λ ∞ centered at each of the lattice separation vectors. The ratio, g r (|r|) is shown as a dash-dot line; it goes to 1 at large separation. The combination − 4πr 2 2 g H s (ln g r − (1 − 1/g r ) is the integrand that determines the entropy correction with the reference entropy from the spherical treatment of the Kirkwood approximation in the crystal (dash-dot-dot).  Figure 5. Both results include entropy reducing correlation between sites; as a result they give an entropy of fusion whereas the reference entropy did not. It is sensible that s x II h−TT is higher than s x K [g(r)] because s x II h−TT is restricted to be an upper bound. Finally, let us look at inclusion of correlation between nearest neighbors in the Einstein model. The Einstein model performs poorly, it missed the entropy of fusion; however, the correction due to neighbors is large. The correction actually over estimates the entropy of fusion and brings s x I h−TT slightly below the target entropy. In the lower plot of Figure 5 we compare our result to the entropy of a classical density functional theory functional [28] of density and pair potential; the functional was evaluated for the liquid density and the Johnson potential. The entropy related to this functional (Equation (40) of Lutsko and Lam [28]) is: The agreement with the target of the various approximations is good considering the simplicity of the functionals. In future work, based on a more diverse set of simulation data, any of the approaches presented here could prove to be the better starting point for accurate representations of the universal excess entropy functional.

Conclusions
The important message of this work is not the quality or lack of quality of our proposed excess entropy functionals, but rather, the existence of a universal excess entropy functional and the usefulness of an entropy functional approach. We foresee that further research will bring a broadly applicable, accurate functional not only for single component systems but also for multicomponent, molecular, macroscopic particle, and spin systems. More imaginative functional forms that are designed to satisfy a variety of formal restrictions and fitted to vastly greater amounts of simulated data will advance the development of the functionals. Here we have mapped out a few starting points for functional development in order to clarify the entropy functional machinery. We have also put forth physical interpretations relating features of the PCFs and their role in entropy reduction; these relationships are obscured when the entropy is obtained through thermodynamic integration. The pair entropy functional approach could accelerate the study of technologically interesting materials and materials processes. For both first principles (DFT) simulations and simulations with classical many-body interaction (see Appendix D), entropy functional evaluation will make determination of the free energy as routine as the evaluation of the energy. Data Availability Statement: This manuscript has been authored by UT-Battelle, LLC under contract number DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains, and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/ downloads/doe-public-access-plan (accessed on 11 February 2021)).

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Universal Functionals
To establish the position of this work within several approaches based on universal functionals, we give a brief survey of some relevant universal functional theories.
Density function theory (DFT) [61] for electrons addresses the quantum (usually Fermion) many-body problem where the interaction is unchanging; it is the pairwise, Coulomb interaction between electrons (many body interactions are zero). The underlying minimization principle is the Raleigh-Ritz variational principle of the ground state energy (expectation value of the Hamiltonian) with respect to the many-body wave-function. The problematic terms are the interaction energy (including exchange) and the kinetic energy; the straightforward term is the external energy (energy of interaction with the point potential that usually arises from Coulomb interactions with fixed protons). The external energy is easily determined from knowledge of the density, E external = drρ(r)v (1) (r). If the point potential is changed, i.e., protons are displaced, the energy changes; this contributes to forces between protons exerted via the electrons, forces that can be used in classical DFT-molecular-dynamics. The sum of electron kinetic and interaction energy is a universal functional of the density; the functional is independent of the point potential generated by the protons; the functional is system independent. The majority of proposed approximations to the universal functional are based on the kinetic and interaction energy for a particularly simple point potential, specifically v (1) (r) = 0. DFT was originally derived based on the uniqueness of the point potential that generates a given density [61]. More recently DFT has been derived from a constrained search procedure [36]. Typical calculations assume zero temperature; however, electron entropy at finite temperature is sometimes addressed [33]; furthermore, DFT-MD is often performed with the classical ions at a finite temperature set by a thermostat. Research in DFT strives to develop more accurate or simpler approximations to the universal functional, evaluate the functional for larger or more complex systems, or apply DFT to systems of physical, chemical, or technological interest. Electron DFT has proved to be a very successful computational approach. The quality of approximate functionals has improved through decades of research. It is instructive to note that a large number of approximate functionals that have been developed to the point of being incorporated into packaged DFT codes, [62]. Based on DFT it can be said that given a ground state density, and no other information, it is possible to determine the kinetic and interaction energy by inserting the density into the universal functional. Furthermore, the variation of the functional with respect to the density gives the external potential, thus determining the ground state energy of the electron system.
Classical density functional theory, CDFT [63,64], addresses the classical many-body problem ( in or near equilibrium) where the interaction between particles is typically different for each system studied and different for each of the types of particles within the system. The underlying minimization principle is of the free energy with respect to variation of the probability density. The most problematic term is the entropy. The kinetic energy is that of the ideal gas. The external energy is given trivially in terms of the density, E external = drρ(r)v (1) (r). The interaction energy is given straightforwardly in terms of the correlation functions (just pair correlation for pair interactions). For the interaction energy to be expressed as a functional of the density, an expression that relates the correlation to the density is required. Similar to the situation for electron DFT, this relationship is usually developed for the case when v (1) (r) = 0. Unlike electron DFT, which applies to a fixed interaction, in CDFT the relationship is reestablished through MD or Monte Carlo for the interaction and temperature of interest. Similar to electron DFT, the universality of the functional rests on the proof that, for a given set of fixed interactions, the point potential that generates a particular density is unique. In applications, the external potential is often the chemical potential of involved phases. The density changes through, for example, a solid liquid interface. Much of the recent work on CDFT has been in the context of Phase Field Crystal (PFC). Much of the work in PFC has been on developing an approximate functional that displays stability for relevant phases (liquid, BCC, FCC,...) as a function of temperature, [65]. Research in CDFT covers the same types of issues as encountered in electron DFT. In a special issue of Journal Physics: Condensed Matter, New developments in Classical Density Functional Theory [29], applications of CDFT are described along with recent improvements in functionals.
Pair-DFT for electrons has been proposed, [66]. In Pair-DFT the kinetic energy is a universal functional of the spin resolved pair density and the interaction energy is a simple functional of the pair density and pair interaction. Pair-DFT could utilize an effective two electron Schrödinger equation in a similar fashion to the Kohn-Sham approach [67] to DFT which is based on an effective one electron Schrödinger equation. A more pragmatic approach could combine selected results from full pair-DFT with improved relationships between the pair-density and the density to yield improved DFT approximations. In pair-DFT, given a density and the electron-electron pair correlation functions g α,β (r, r ), the electron kinetic energy could be obtained by simply plunging them into the pair-DFT universal functional (with no knowledge of the point potential and pair interaction).
As discussed in this paper, pair-CDFT boils down to obtaining the excess entropy from the density and pair correlation function (no other information required). For clarity we have focused on the homogeneous case, i.e., v (1) = 0. Furthermore, the aspects associated with a point potential are already developed in CDFT. CDFT can be recovered from pair-CDFT by relating the pair correlation function to the density; in forming this relationship to recover a CDFT it is permissible to use the pair interaction because, after all, CDFT is for a specific, fixed interaction.
If an accurate and easily evaluated universal S x [g] has been developed, then for situation (1) the free energy can be obtained by plugging v(r) and g(r) into Equation (20). For example, if a simulation with a known potential has been performed and g has been determined from the output; the free energy can be obtained without thermodynamic integration simply by substitution. Furthermore, the value obtained is expected to be reliable because of the stationarity of A[g]. Situation (2) applies when v(r) is known but no simulation or measurement of g has been performed; in this case g could be found, not from simulation, but from the requirements that it minimize A and that A be stationary at the true g. The free energy could then be obtained by substitution. Situation (3) could arise when only g is known, for example, a measured g. Stationarity allows the unique potential to be determined from A[g] at any T: As a simple illustration, consider the dilute limit where g ≈ exp(− v(|r|) k B T ) and the excess entropy is approximated by: S 0 x = −N/2 R dr(g ln g − (g − 1)) − N; Equation (A1) is satisfied exactly. In general, if the excess entropy functional expressions were found that were both accurate and had algebraic variations with respect to g then the pair potential would be given explicitly in terms of g. Therefore, an accurate approximation of the entropy functional eliminates the need for reverse Monte Carlo. Furthermore, according to Equation (A1) once the pair potential is known the potential energy and hence the free energy is specified.
Note that Equation (A2) specifies A in terms of g but it is not stationary in g, essentially, because stationarity was already used to eliminate v(r). If g were known from experiment or simulation Equation (A2) would give A provided that the interactions were pairwise. Equations (A1) and (A2) go beyond Henderson's theorem [23] by specifying the relationship between the entropy and the unique potential and between A, S x , and T. If present, the impact of the many-body interactions would manifest in both the pair (measured or simulated) and the higher order correlations (typically these are not measured and typically these are not calculated from simulations).

Appendix C. Mapping Atomic Coordinates to Atom List
Arranging atoms along a linear chain that travels from neighbor to neighbor helps clarify our discussion. For a given set of atomic coordinates this is not a unique operation. Furthermore, the atoms move as a simulation progresses. The results derived in this paper are not sensitive to the details of list construction. We do not foresee that list construction will be incorporated as part of simulation, it is a gedanken-construction. Our proposal is: (1) for the set of atomic positions perform a steepest descent to the inherent structure; (2) find the shortest chain or chains that links all sites with nearest neighbor links; (3) index atomic positions according to the position in the chain of the inherent structure site that it descended to in the steepest descent process; (4) if there are multiple chains of the same length calculate the entropy employing each chain and average. For a crystal simulation there would be multiple chains but each would give the same result so averaging is not needed. In most cases the differences between chains would go to zero in the thermodynamic limit so no averaging would be needed. However, there could be cases, e.g., defects, where special care would be needed to represent and average over bonds local to the defect.

Appendix D. Inclusion of Many-Body Effects
In order to obtain many body corrections to A from the Gibbs-Bugoliubov inequality the higher order correlation and the, presumably small, many body interactions would have to be known. Unfortunately neither the interactions nor the higher order correlations associated with an experimental g are ever exactly known. However, the excess-entropy functional does not suffer the same limitation because it is defined in Equation (18) by a constrained search. The linear change in S x due to the change in g under the influence of many-body interactions is included because it is the modified g that is known. In the presence of many-body interactions the entropy functional, S pair x , is modified by constraints on P, becoming S manybody x [g, ...g n ]. The P that maximizes Equation (18) will differ from the constrained true many-body probability density by an amount proportional to the many-body interactions but which will lower S x by an amount quadratic in the many-body interactions (S x [g] = max g (n>2) S x [g, g (3) , g (4) , . . . ] = S x [g (2) , g  (n) ) 2 ). The entropy functional of Equation (18) provides a stationary upper bound to the entropy with respect to many-body interactions. The same argument applies to g(r) from first principles MD; the entropy could be well reproduced by Equation (18) because the effect of higher order interaction is second order. The internal energy (total energy) from the first principles MD can be added to −TS id (T) − TS x [g] to obtain a first principles A that is accurate to second order in the difference between the first principles interactions and an, unspecified, effective pair potential. Knowledge of the functional, S x [g], provides a general method for obtaining the free energy from first principles without thermodynamic integration.  Figure A1. The potential energy, u, and the negative of the entropy multiplied by T, −Ts x target , in (eV/atom) are plotted as a function of T for iron as simulated with the Johnson potential. The dashed line is the fit to the simulated potential energy as described in the text. The dotted curve is obtained analytically from the fit to u(T). The symbols are the values of the simulated potential energy at the set of simulation temperatures used for the fit. Horizontal solid lines indicate the exact T = 0 potential energy, the zero of energy, and the exact T = ∞ potential energy. The vertical line is at T m .