Nested Sampling of Materials’ Potential Energy Surfaces: Case Study of Zirconium †

: The nested sampling (NS) method was originally proposed by John Skilling to calculate the evidence in Bayesian inference. The method has since been utilised in various research ﬁelds, and here we focus on how NS has been adapted to sample the Potential Energy Surface (PES) of atomistic systems, enabling the straightforward estimation of the partition function. Using two interatomic potential models of zirconium, we demonstrate the workﬂow and advantages of using nested sampling to calculate pressure-temperature phase diagrams. Without any prior knowledge of the stable phases or the phase transitions, we are able to identify the melting line, as well as the transition between the body-centred-cubic and hexagonal-close-packed structures.


Introduction
Computational materials science has seen rapid evolution in recent years, providing an increasing number of crucial tools for the prediction and discovery of material properties [1][2][3]. Gaining insight into the atomic-level behaviour of materials allows for the study of their characteristics under extreme conditions; enables the clarification of structural details; and offers specific atomic level data that is capable of guiding and augmenting experimental research. One of the most important challenges for such atomistic level simulations is how one can determine the particle configurations that are representative of a material's properties. Further, there is the question of how one may extract thermodynamically relevant information from the simulation data of these configurations. Central to this challenge is the unfathomable number of ways in which a group of particles may be arranged. We therefore have a high-dimensional, multi-modal surface representing our configurational phase space, with chemically relevant regions being exponentially localised due in part to the probability of microstates being proportional to the Boltzmann factor, as well as the number of potential configurations decreasing along with energy. This is the potential energy surface (PES), which describes the potential energy of a system of particles as a function of their spatial coordinates. Exploring the PES and determining the location of the highly-localised energy "basins" is therefore akin to finding a needle in a 3N-dimensional haystack.
Thankfully, there exists a plethora of computational techniques developed to target and sample specific parts of the PES. Among other outcomes, these methods allow for the detection of ground state and metastable structures at zero temperature [4][5][6]; the calculation of free energy differences between particular states [7]; or the locating of regions that correspond to phase transitions [8,9]. However, while these methods are very efficient at the tasks they were designed for, they are often limited as a result. For example, they may be applicable to study only specific phases and properties of a material, or require prior information about the system that may not be readily available (e.g., relevant crystalline structures). Acquiring a comprehensive and unbiased picture of the PES therefore presents a significant challenge, requiring considerable human and computational commitment, and likely the use of multiple techniques and/or software. The challenge is magnified even further if, as is often the case, thermodynamically significant regions of the phase space exist that have yet to be even considered.
The Nested Sampling (NS) method, introduced by John Skilling in the field of Bayesian statistics [10,11], offers a unique way of tackling this problem. NS is a cleverly designed, iterative approach to evaluate integrals that starts with a group of uniformly random samples drawn from the entire likelihood region. Each step of the algorithm generates new samples that are "nested" within the space bounded by the current point of lowest likelihood. As a result, the samples will progress towards regions of increasing likelihood, until the point of highest likelihood (or more precisely, the point beyond which an increase in the likelihood has no consequence on the results) is found (or other stopping criteria are satisfied). Uniquely, the process for choosing new samples means that NS conveniently provides an estimate for the volume of each likelihood level [10,12]. Considering the materials science problem, the above can be translated into a process for sampling the PES by starting with a set of uniformly random configurations, generated within the highenergy region (gas phase); and progressing towards the lowest energy configuration (the global minimum) through the NS process, all while providing an estimate for the phase space volume of each nested energy level. The integral we wish to evaluate is thus the elusive partition function, where β = 1/(k B T) is the thermodynamic temperature, P is the pressure, H is enthalpy and (q, p) represents all possible configurations of position and momenta that the atomic system may take. The importance of the partition function is clear from the number of useful thermodynamic response functions to which it provides access, such as the heat capacity and free energy [13], respectively, given by Nested sampling allows us to evaluate this integral as a discrete sum (with the momentum dependent part factored out) where H i is the enthalpy at the ith nested sampling iteration, and Γ i is its corresponding phase space volume, i.e., the volume of the phase space with enthalpy less than or equal to H i . In the following section we will explain in more detail how the NS algorithm may be used to evaluate a system's partition function, its corresponding macroscopic properties, as well as atomic-level thermodynamic behaviour.

Nested Sampling Algorithm
The NS algorithm is demonstrated in detail in Figure 1, showing the exploration of the PES at constant pressure, P. The sampling starts by generating K initial configurations, with particles positioned randomly inside simulation boxes that each have a random shape and volume V large enough to represent the gas phase (k B T PV). These configurations constitute the "live set", or the set of "walkers". Rather than calculate the internal energy U of each walker, we are instead concerned with their enthalpy, H = U + PV, due to the isobaric nature of the simulation. At the beginning of each iteration the walker of the highest enthalpy configuration is selected, corresponding to the lowest likelihood point within the context of Bayesian statistics. This sample is removed from the set and recorded along with the total phase space volume of the PES below the enthalpy of said sample, given by the simple expression Γ i = [K/(K + 1)] i , where i is the iteration number. From this it is clear that the difference in phase space volume between subsequent iterations, w i = Γ i−1 − Γ i (otherwise referred to as the weight factor), decreases as the number of walkers increases, and can be seen as roughly analogous to the increment size in typical numerical integration. Increasing the number of walkers therefore improves the resolution of the sampling and helps to capture basins of the PES with smaller phase space volume, at the cost of increased computation time. To keep the number of walkers constant throughout the sampling, a new sample is generated to replace the one previously selected, with the requirement that the new sample is to be randomly picked within the region of phase space encompassed below the enthalpy of the discarded (highest enthalpy) sample. A simple rejection sampling quickly fails to provide new acceptable configurations, due to the phase space volume decreasing dramatically with decreasing enthalpy. In one hand, all interatomic potentials have a highly repulsive core (reflecting the Pauli exclusion principle) and hence any two particles positioned too close to each other results in the entire configuration being rejected. On the other, hand one can see that the probability of randomly generating dozens of particle positions corresponding to a crystalline structure is infinitesimally small. Hence new configurations have to be generated in an alternative way. Practically, this is achieved by cloning a randomly selected configuration from the live set and performing a set of steps until the configuration can be assumed to be independent from its parent sample. There are various methods available to perform this walk, depending on the size of the system and potential model, including a series of simple Monte Carlo steps; constant total energy Hamiltonian Monte Carlo [14]; or Galilean Monte Carlo [14,15]. The above steps are repeated until the desired enthalpy is reached, providing a list of energy samples and their corresponding weight factors that can be used to approximate the partition function via Equation (3). Through this process we also gain access to thermodynamic averages of arbitrary observables, where A i are the calculated values for the given observable at each recorded enthalpy level H i . From the description of the algorithm it is also clear that the actual sampling does not include or depend on temperature in any way. Once the sampling of the PES is complete, any given temperature may be substituted into the Boltzmann factor of the above expression, and thus the partition function and other thermodynamic properties can be calculated at arbitrary temperatures as a simple post processing step. Not only that, but since the algorithm starts from the gas phase and generates consecutive configurations with a simple energy or enthalpy criteria, a further advantage of the method becomes obvious: no prior knowledge of the structures, e.g., crystalline phases are necessary. Hence the NS method not only provides crucial thermodynamic information about the system, but is also fully predictive.
To date, we have demonstrated how nested sampling can be applied to sample the constant pressure ensemble and to evaluate the isobaric partition function, as we use this to calculate the pressure-temperature phase diagram. Of course the NS technique can also be used at constant volume, in which case the sampling propagates through nested energy levels (instead of enthalpy levels) in order to determine the canonical partition function. This can be the preferred process to investigate the formation of clusters or molecular structures [13,16]. The power of the NS method and the insights it provides into the properties of materials has been demonstrated for several problems. One example is the determination of continuous and first-order-like melting transitions in the heat capacity of small water clusters, including how they stem from the relative phase space volume of competing local minima [17]. The method has also been applied to evaluate and compare the performance of interatomic potentials, highlighting that models designed to provide an accurate description of microscopic properties can be significantly limited in their prediction of macroscopic properties and phase stability; lithium [18], aluminium [19] and iron [20] being notable examples. Alloys such as CuAu [21] and AgPd have also been studied, the latter using sophisticated machine-learned potentials [22]. In several cases, nested sampling has revealed thermodynamically stable phases that have not been previously considered. One such example of this is the discovery of a new crystalline phase in the soft-core double ramp model (often referred to as the Jagla model), showing that the model's widely studied liquid-liquid transition is in fact metastable [23]. Similarly, NS calculations have provided evidence of new global minima in the bulk Lennard-Jones model that depend upon applied pressure and interatomic cut-off radius, showing the stability of structures with different close-packed stacking variants [24].
For an overview of materials applications of NS, we recommend the following review article [25]. Implementations of the NS algorithm that utilise parallel-processing are available in the pymatnest python software package [26], which uses the LAMMPS package [27] to perform molecular dynamics steps.

Calculating the Phase Diagram: Zirconium
In this section, we demonstrate the use of the NS method in sampling the PES of two embedded atom method (EAM) potentials of zirconium, in order to calculate their respective pressure-temperature phase diagrams. Zirconium has been the subject of several experimental and theoretical studies, due to its resistance to corrosion and favourable response to radiation damage. At lower pressures zirconium crystallizes in the body-centered cubic (bcc) structure, which transforms to the hexagonal close-packed (hcp) structure below 1136 K. With increasing pressure at room temperature, the hcp phase transforms into another hexagonal structure, often referred to as the ω phase, which is not close-packed and has three atoms per unit cell [28][29][30]. The first interatomic potential model we consider, henceforth referred to as Zr95, was developed by Ackland et al. in 1995 [31], and uses empirical values of several crystalline properties (e.g., elastic constants, binding energy, the vacancy formation energy) to accurately reproduce properties of hcp zirconium. A signifi-cantly improved model devised by Mendelev and Ackland was published in 2007 [32]. The model utilises an advanced fitting procedure that incorporates the fitting of pair correlation functions to experimental X-ray diffraction data, as well as the fitting of two-phase coexistence simulations with respect to experimental melting temperatures. We will refer to this potential as Zr07.
Nested sampling calculations were performed as presented in Ref. [19]. Simulations were run at constant pressure, using a simulation cell of variable shape and size that contains 64 atoms. The initial configurations were generated randomly, while new samples were generated from cloned configurations by performing walks with 1920 steps on average. Each of these steps either consisted of a total-energy Hamiltonian Monte Carlo [14,33,34] (all-atom) move; a change in the volume of the simulation cell; or a change in the shape of the cell via the application of shear or stretch moves. When the sampling process is finished, the heat capacity of the system is calculated as a function of temperature using the post processing method described in the previous section and Equation (2). Peaks in the heat capacity curve correspond to phase transitions, allowing one to quantify the corresponding transition temperature via Gaussian fitting. Independent nested sampling runs exhibit variations in the heat capacity, the size of which are inversely related to the number of walkers used (i.e., the method's resolution). The number of walkers is thus chosen such that the maximum difference in phase transition temperatures is no greater than 100 K. In the case of the zirconium potentials, between 640 and 960 walkers were required to achieve the desired accuracy. The error bars reported below correspond to the total width at half maximum of all heat capacity peaks. These calculations typically require 10 8 -10 9 energy evaluations per pressure, which scales linearly with the number of walkers. Figure 2 shows a calculated heat capacity curve for the Zr07 potential at 1 GPa, as well as the corresponding density as a function of temperature. It is immediately apparent from the peaks in heat capacity and sudden changes in density that there are two phase transitions at this pressure. The transition at around 2300 K is associated with an increase in density, while the smaller heat capacity peak at 1200 K coincides with a slight decrease in density. This negative change in density is an already-known feature of the bcc-hcp transition of zirconium. The identification of the different phases, especially that of the crystalline structures, can be aided by using a suitable order parameter, for example the bond order parameters [35] or the radial distribution function. Since NS provides us with the relative phase space volume of respective enthalpy levels, we can use Equation (4) to easily calculate the expected value of an observable at a given temperature [13,25]. We have therefore calculated the average radial distribution function of Zr at a range of temperatures, shown on the right panel of Figure 2. By comparing these RDF curves to the ideal, fully-ordered RDFs for bcc and hcp structures (shown in figure with dashed vertical lines), we are able to confirm that this solid-solid transition indeed corresponds to the desired bcc-hcp transition.
Calculating the free energy difference between the different phases can also provide insight into their relative stability. Traditionally the free energy difference is calculated using a reference state or along a collective coordinate. The NS method automatically assigns a weight coefficient to each configuration that is proportional to their relative thermodynamic relevance, hence, by using an appropriate order parameter we can associate each configuration with a given phase (or basin of the PES) and calculate the contribution of that phase to the partition function. Thus, the free energies of given structures may be calculated and compared using Equation (2). We demonstrate this in Figure 3, using the average Q 4 and W 4 bond order parameters [35] to sort configurations between different phases. These results accurately reflect the heat capacity curves shown in Figure 4, and again confirm the stability of the bcc-hcp transition.
Repeating the NS calculations and post processing analysis at a series of different pressures provides the entire pressure-temperature phase diagrams for each potential model of interest, which can be seen in Figure 4 alongside experimental data [28,29]. From these results it is evident that only the Zr07 potential is capable of capturing the bcc-hcp transition, presenting an obvious predictive advantage over Zr95. In addition, while the Zr95 potential's melting line shows reasonable agreement with experimental data at low pressure, it clearly diverges from experimental measurements at pressures above 10 GPa. In contrast, the Zr07 potential maintains reasonable agreement with the experimental melting line up to 20 GPa. However, neither of the two potentials exhibit thermodynamic stability of the ω phase, which has been shown to be stable at low temperatures and above ∼5 GPa experimentally, highlighting an area for future improvements of these models.   [28] and experimentally measured melting points [29], respectively. Blue symbols represent results of NS calculations with the Zr95 model [31] and red symbols correspond to NS results with the Zr07 model [32].

Summary and Outlook
In this manuscript we have utilised constant pressure NS to determine the phase diagram of two Zr EAM interatomic potentials. We found that while the simpler of the two potentials, Zr95, provides a reasonable prediction of Zr's melting line at low pressures, its advanced counterpart Zr07 provides a more accurate melting line at higher pressures and is capable of accurately capturing Zr's bcc-hcp solid transformation.
Nested sampling provides a new and unique way of sampling the potential energy surface of materials. It provides an exhaustive exploration of the PES, sampling thermodynamically relevant configurations automatically, without relying on advance knowledge of phases or chemical intuition of potential structures. Having access to the partition function means that phase transitions and thermodynamic properties can be simply evaluated in the post processing step. Hence, NS removes a number of methodological barriers associated with calculating a material's entire pressure-temperature phase diagram, providing a fully predictive approach that is independent of the type of transitions observed in the system. It can therefore be straightforwardly utilised to validate the macroscopic properties of interatomic potential models. Such easily attainable feedback on the performance of a model can potentially be exploited to develop potentials with greater reliability in reproducing experimentally observed behaviour. For example, in the context of the Zr system, to propose a further-modified form of the potential that could accurately model Zr's bcc-hcp-ω triple-point. Moreover, the thermodynamically relevant configurations can open novel routes towards developing advanced machine-learned potentials, that use highly-accurate electronic structure theory calculations as a data-set for functional parameterisation. Data Availability Statement: Generated thermodynamic data is available at https://data.mendeley. com/datasets/j7sygcyswb/1.