Density, Enthalpy of Vaporization and Local Structure of Neat N-Alkane Liquids

The properties of alkanes are consequential for understanding many chemical processes in nature and industry. We use molecular dynamics simulations with the Amber force field GAFF2 to examine the structure of pure liquids at each respective normal boiling point, spanning the 15 n-alkanes from methane to pentadecane. The densities predicted from the simulations are found to agree well with reported experimental values, with an average deviation of 1.9%. The enthalpies of vaporization have an average absolute deviation from experiment of 10.4%. Radial distribution functions show that short alkanes have distinct local structures that are found to converge with each other with increasing chain length. This provides a unique perspective on trends in the n-alkane series and will be useful for interpreting similarities and differences in the n-alkane series as well as the breakdown of ideal solution behavior in mixtures of these molecules.


Introduction
Alkanes are common in nature and are frequently used in various applications in chemical industries [1]. While it's not feasible to list all scientific and technological areas that involve alkanes, a few examples can provide a useful perspective on how the study of alkanes at the molecular scale can have a significant impact across many sectors of science and the economy. In particular, we wish to highlight that alkanes frequently occur in heterogeneous systems, composed of many molecules and phases at a wide range of pressures and temperatures. Petroleum is composed overwhelmingly of hydrocarbons and often a significant portion are alkanes [2]. The process of refining and cracking crude oil entails an examination of alkane behavior in numerous environments and conditions [2]. Methane and other small hydrocarbons occur in Earth's atmosphere and are consequential greenhouse gases [3,4]. Alkanes are much more reactive than CO 2 , however, so they have far shorter lifetimes in the atmosphere [3]. Their reactivity is important for their use in synthetic organic chemistry as well as linked to their hypothesized roles in the natural formation of complex organic molecules particularly in abiotic conditions. Venturing beyond Earth, these molecules are common in many extraterrestrial environments, comprising a significant portion of some planetary surfaces and atmospheres [5][6][7]. Many of these examples feature complex mixtures of molecules, so the use and evaluation of robust, transferable force fields for such systems is an opportunity to look at more realistic systems without the need for time consuming parameterization or slower simulation protocols. Additionally, they often serve as the first molecules introduced in organic chemistry courses. Beyond pedagogical convenience, they are also important models of the driving forces Liquids 2021, 1 48 underlying the behavior of hydrophobic molecules, particularly as they relate to biological questions, like protein folding [8][9][10], protein-protein interactions [8][9][10], lipid structure and dynamics [11][12][13], and other topics. As such, their behavior provides context for the properties of hydrophobic molecules and is relevant for understanding a diverse set of natural and technological processes.
Despite their apparent simplicity and ubiquitous prevalence, alkanes remain a frequent topic of inquiry. Here we focus on normal alkanes, n-alkanes, which are hydrocarbon molecules with a linear chain of fully saturated carbon atoms. These molecules have general molecular formulas of the form CH 3 -(C n-2 H 2n-4 )-CH 3 or more compactly C n H 2n+2 , where n is the number of carbon atoms in the molecule. The structure of n-alkanes was the topic of some of the first X-ray diffraction studies of aliphatic molecules in liquids [14]. These studies helped establish and inform our understanding of molecular structure, like bond lengths, angles, and dihedral angles. Many of the small n-alkane liquids have received significant attention. For example, Habenschuss et al. used X-ray diffraction to study the structure of liquid methane [15]. Despite methane itself appearing spherically symmetric in X-ray experiments, intermolecular structural correlations revealed distributions indicative of tetrahedral molecular symmetry [15]. Similar studies have been reported for ethane [16], propane [17], n-butane [18], n-pentane [19], and so on. A common thread through these studies is that n-alkanes have increasing molecular flexibility with length and present richly complicated intermolecular structures. These studies go beyond understanding just the structure of the molecule itself but reveal correlations between multiple molecules. This permits the elucidation of solvation structure when n-alkanes are used as the solvent. Additionally, it has permitted the exploration of processes like solvation and desolvation of solutes, wetting and dewetting of surfaces, nucleation, etc. Indeed, in this work, we are particularly interested in the intermolecular structure of alkane liquids as a path to understand the thermodynamics of these systems. This liquid structure provides a new perspective on n-alkanes as solvents, but also on features specific to liquid phase behavior.
There have been numerous molecular simulation studies of n-alkanes, both examining their specific properties and as model systems to evaluate force fields and other aspects of simulation protocols [9,12,[20][21][22][23][24][25][26]. Density is frequently presented, because it is straightforward to determine, a holistic measure of force field performance, and typically unambiguously comparable with experimental results. Many force fields, developed with different philosophies and parameterized using many different strategies, have been shown to accurately reproduce experimental densities for these systems [9]. Many force fields are specifically tuned to reproduce the density at some state point for the system of interest, but general force fields are constructed to on average provide adequate description of molecular interactions so that thermodynamic and dynamic behavior is suitably reproduced. As such, density serves as an important test of the performance of a molecular model.
In this work, we perform molecular dynamics simulations to examine the density and radial distribution functions of pure n-alkane liquids, from methane to pentadecane, to understand similarities and differences in the local structure of these liquids. We are specifically interested in the variation of liquid structure as the number of carbon atoms in the alkane increases. We find that there are several signatures of convergence with respect to chain length, depending on the property examined. This set of molecules is known to have trends in many thermodynamic properties with increasing n, which are often used to understand how intermolecular forces and molecular size affect thermodynamic properties (indeed they are often presented in general and organic chemistry to help new students understand relationships between molecule structure and emergent properties). Further, while this study does not directly consider mixtures, the convergence of properties for pure n-alkanes can be used to anticipate the onset, or conversely the breakdown, of ideal solution principles in mixtures.

Liquid Simulation Details and Protocol
The starting structures for each n-alkane were imported from the WebMO chemical structure database [27]. WebMO was used to set up and execute Gaussian calculations using Gaussian version 16 [28]. The molecular structure was optimized and then the electrostatic potential was calculated using Hartree-Fock theory with a 6-31G* basis. The atomic partial charges for each n-alkane were then assigned using the RESP method [29] as implemented in the AmberTools program Antechamber. AmberTools and Amber version 20 were used for all parameterization and simulation steps in this study, unless otherwise stated [30]. The second generation general Amber force field (GAFF2) was used to describe the van der Waals, bond, angle, and dihedral interaction terms in all molecules [31]. The starting parameter files for simulations were constructed using the AmberTools program tLEaP [30]. Starting configurations containing 1000 molecules of an n-alkane were prepared using the Packmol utility to randomly place the molecules in a cubic simulation cell [32]. The starting box size was selected to obtain starting densities of about 0.1 g/mL. This size is chosen to balance the desire to reduce so-called 'bad contacts' (arrangements of molecules in relatively high energy configurations resulting in large forces that are difficult to relax to equilibrium) in the starting configuration with the practical limitation of obtaining a configuration quickly.
Each system was first relaxed using 10,000 minimization steps, where the initial 10 steps use the steepest descent algorithm and the remaining use the conjugate gradient method. Second, to obtain the approximate liquid density, each n-alkane system was equilibrated in the NPT (isobaric, isothermal) ensemble at the experimental normal (1 bar) boiling point for the molecule for approximately 10 ns. The boiling points for each n-alkane at 1 bar are listed in Table 1. All simulations use a time step of 1 fs. The temperature is maintained with the Langevin thermostat with a collision frequency of 5 ps −1 . In all isobaric simulations, the pressure was controlled with the isotropic Berendsen barostat using a relaxation time of 1 ps. Third, each system was annealed for 20 ns in the NVT (isochoric, isothermal) ensemble 200 K above the normal boiling point of the molecule. This is intended to remove spurious effects that may have resulted from the randomly generated starting configuration. Fourth, the system was equilibrated for 20 ns in the NPT ensemble at the experimental normal boiling point of the molecule using the same parameters as step 2. An additional simulation was initiated from the end of this 20 ns simulation 10 K below the boiling point for each alkane for the numerical determination of the entropy of binding (discussed more below). Fifth and last, production for each system at the boiling point and 10 K below was performed by simulating for 100 ns in the NPT ensemble. These production simulations are analyzed for the properties reported here.

Gas Simulation Details and Protocol
We also performed gas phase simulations for the calculation of the enthalpy of vaporization, ∆H vap . We generally followed the method described by Wang and Tingjun (in Wang and Tingjun this is called protocol 4 or "P4") [24]. For each alkane, a single molecule was placed in a 100 Å box and simulated NVT at the experimental boiling point for 1.1 ns. The last 1 ns was used for the calculation of required average quantities. Otherwise, the simulation parameters were the same as previously described for liquids.

Density
The mass density is included in the standard output from Amber molecular dynamics simulations performed in the NPT ensemble. The values reported here are calculated by averaging the production simulations.

Enthalpy of Vaporization (∆H vap )
The enthalpy of vaporization, ∆H vap , is the enthalpy difference between the gas and liquid at a given state point. That is where the subscripts indicate the given phase. In this work, we followed a protocol to approximate Equation (1) developed by Wang and Tingjun [24]. More information and discussion can be found in their original paper [24], but briefly they showed that ∆H vap can be predicted from molecular dynamics with the expression where E potential gas is the average potential energy of the gas phase simulation, E potential liquid is the average potential energy of the liquid simulation per molecule, R is the gas constant, ∆T is the difference in average temperatures from the simulations in each phase (that is, ∆T = T simulation liquid − T simulation gas ), and N is the number of atoms in the molecule. Wang and Tingjun found that E potential gas can be better estimated from an ensemble of minimized energies obtained from gas phase dynamics, but in this work we used the energies directly from the dynamics.

Radial Distribution Functions, Free Energy of Binding, and Entropy of Binding
The radial distribution functions are calculated using the AmberTools simulation analysis tool cpptraj [34]. We consider two types of radial distribution functions: those between all carbon atoms in a system and those between a specific carbon in the chain and all other carbon atoms. Subsequent thermodynamic analysis uses the radial distribution function calculated between all carbon atoms in the system. The free energy of binding, ∆G bind , is determined from the maximum value in the radial distribution function as where k B is Boltzmann's constant, T is the temperature, and RDF max is the maximum value in the radial distribution function for that molecule. In this context, the term binding is used to describe the non-specific intermolecular association of pairs of molecules. The total average binding free energy is calculated as n ∆G bind , where n is the number of carbon atoms in the molecule and ∆G bind is obtained from Equation (3). The entropy of binding is estimated numerically from the relationship This derivative is evaluated numerically for each molecule with two simulations performed at the boiling point and 10 K below the boiling point.

Results
The density and ∆H vap are often viewed as important evaluations of the quality of a molecular model, especially the non-bonded terms. Figure 1 shows the density of each n-alkane at the normal (1 bar) boiling point. Figure 2 compares the simulations reported here with experimental densities via a parity plot. Table 2 compares ∆H vap from our simulations compared with experimental values.
function calculated between all carbon atoms in the system. The free energy of binding, ΔGbind, is determined from the maximum value in the radial distribution function as where kB is Boltzmann's constant, T is the temperature, and RDFmax is the maximum value in the radial distribution function for that molecule. In this context, the term binding is used to describe the non-specific intermolecular association of pairs of molecules. The total average binding free energy is calculated as n ΔGbind, where n is the number of carbon atoms in the molecule and ΔGbind is obtained from Equation (3). The entropy of binding is estimated numerically from the relationship This derivative is evaluated numerically for each molecule with two simulations performed at the boiling point and 10 K below the boiling point.

Results
The density and Δ are often viewed as important evaluations of the quality of a molecular model, especially the non-bonded terms. Figure 1 shows the density of each nalkane at the normal (1 bar) boiling point. Figure 2 compares the simulations reported here with experimental densities via a parity plot. Table 2 compares Δ from our simulations compared with experimental values.   [35]. The colors indicate the molecule and are the same as Figure 3. The largest deviation is observed for CH4 with the simulations overestimating the density by 8%. All simulations result in higher densities than the experimental results, with an average deviation of 1.9% across all liquids.   [35]. The colors indicate the molecule and are the same as Figure 3. The largest deviation is observed for CH 4 with the simulations overestimating the density by 8%. All simulations result in higher densities than the experimental results, with an average deviation of 1.9% across all liquids.
Structure of the liquids is evaluated using radial distribution functions. In Figure 3, we examine the radial distribution functions between all carbon atoms within a system. Further, in Figure 4, we examine the local structure around specific carbon atoms. These compare the structure around the terminal carbon with all other carbon atoms (Figure 4a), around the second carbon from the end (Figure 4b), around the third carbon from the end (Figure 4c), and so on. Both ends of each alkane are symmetric, so neither are shown. Only the longest alkane, pentadecane, has an interior carbon that is eight bonds from a terminal carbon, so it is the only molecule in Figure 4h.   Structure of the liquids is evaluated using radial distribution functions. In Figure 3, we examine the radial distribution functions between all carbon atoms within a system. Further, in Figure 4, we examine the local structure around specific carbon atoms. These compare the structure around the terminal carbon with all other carbon atoms (Figure 4a), around the second carbon from the end (Figure 4b), around the third carbon from the end (Figure 4c), and so on. Both ends of each alkane are symmetric, so neither are shown. Only the longest alkane, pentadecane, has an interior carbon that is eight bonds from a terminal carbon, so it is the only molecule in Figure 4h.  We examined the free energy and entropy of binding in Figure 5. The free energy of binding is calculated from the maximum value in the radial distribution functions shown in Figure 3. The entropy of binding is determined from the free energy of binding in Figure  5a as well as the analysis of another simulation 10 K below the boiling point of the molecule. Figure 5a is the average free energy binding a carbon from a molecule in the liquid, so we also examine n ΔGbind in Figure 6, as a measure of the total average binding free energy holding each molecule in solution. We examined the free energy and entropy of binding in Figure 5. The free energy of binding is calculated from the maximum value in the radial distribution functions shown in Figure 3. The entropy of binding is determined from the free energy of binding in Figure 5a as well as the analysis of another simulation 10 K below the boiling point of the molecule. Figure 5a is the average free energy binding a carbon from a molecule in the liquid, so we also examine n ∆G bind in Figure 6, as a measure of the total average binding free energy holding each molecule in solution.    Figure 2 shows that the GAFF2 force field with RESP charges faithfully reproduces nalkane liquid densities at the normal boiling point. The density emerges as a complex result of the interaction potential for a given molecule, so this is often regarded as an important measure of the quality of a force field. We find that all simulations overestimate the density with an average overestimation of 1.9% for all 15 molecules. The largest deviation between simulation and experiment is observed for methane, with the simulations overestimating the density by 8%. The GAFF2, and its first-generation predecessor GAFF, force field are developed with the goal of simulating organic molecules in biological contexts, so the deviation of methane can be rationalized. Methane is the shortest alkane and while it does appear in natural biological processes it is distinct from the relatively longer, greasy sp3 carbon chains where carbon more often appears in biologically relevant molecules. This is reflected in the training set of molecules used to parameterize GAFF, which includes many sp3 hybridized carbons, but not methane [31]. Indeed, we find that, without methane, the average deviation between experiment and simulation drops to 1.4%. Table 2 shows ∆H vap as predicted by our simulations compared with experimental values from the NIST database. The average absolute deviation between simulation and experimental value is 10.4%. The errors are not uniformly distributed, however, with progressive underestimation of the increase in ∆H vap for longer alkanes. For short hydrocarbons, the simulations overestimate ∆H vap and for longer alkanes the enthalpy increases less steeply than experiment. This presents a more complex picture of the fidelity of GAFF2 to reproduce alkane properties than the density. Like the density, shorter alkanes are seen to deviate most from experiment. There are entire studies dedicated to optimizing protocols for the calculation of ∆H vap , so there remains some opportunity to refine the protocol. For example, Wang and Tingjun showed that the result is sensitive to thermostat and many aspects of how configurations are collected and prepared [24].

Density, ∆H vap , and Evaluation of the GAFF2 Force Field for N-Alkanes
GAFF2 is the second-generation general Amber force field which is designed to model organic molecules, especially in biological environments [30]. It evolved in functional form and parameterization strategy from the OPLS force fields [20], which focuses on functional group parameterization. This means that the interaction parameters used here are not tuned specifically to these n-alkanes, but rather to all hydrocarbons with similar atom types. This grants credibility to the use of GAFF2 for systems and conditions like those considered in this work, which admittedly deviate far from the biological systems that motivated the original development of GAFF2.
When Jorgensen et al. introduced the OPLS/AA force field, they included ethane, propane, and butane in the training set of more than 50 organic molecules [20]. In their evaluation of the force field, they found that densities deviated by 2% for all molecules, or 1% for just the three n-alkanes tested, and ∆H vap deviated by 3% for all molecules or 1% for the alkanes. OPLS/AA differs from many other force fields because it specifically targeted parameterization for the liquid phase, which resulted in better performance. In a broader assessment of force fields spanning more types of molecules, GAFF was found to have an average error of 4.43% and 16.43% from experiments for density and ∆H vap , respectively [24]. Caleman et al. developed a protocol for testing force fields that included density and ∆H vap with many small molecules and applied it to GAFF, OPLS/AA, and CGenFF [36]. They found percent deviations in density of 4%, 2%, and 2% and in ∆H vap of 17%, 11%, and 7%, for each force field, respectively. Our results are closer on average to the experiment for many of these force fields, but n-alkanes are less of a challenge than the diverse collection of molecules included in many of these studies. Figure 3 shows the carbon-carbon radial distribution function of each n-alkane liquid at its respective normal boiling point. In these, the radial distribution function is calculated between all carbons. Methane has a prominent maximum at about 4 Å that corresponds to the most likely separation between nearest neighbor pairs of methane molecules. This is like other studies of liquid methane and other molecules well described as a spherical Lennard-Jones particle [15,[37][38][39]. The longer alkanes have peaks at about 4 and 5 Å. For ethane, thẽ 4 Å peak is larger in magnitude than the~5 Å peak, but from propane to larger alkanes the~5 Å peak is found to be larger in magnitude. For the longest alkanes considered, thẽ 4 Å peak becomes difficult to distinguish from the~5 Å feature. It is common for short entries in a homologous molecular series to have distinct fluid behavior [40].

Local Structure in N-Alkane Liquids
The process of averaging over all carbon atoms in a system can obscure features arising from molecular arrangements specific to certain regions of the molecule. In Figure 4, we show the radial distribution function of each carbon progressing from the terminal carbon (Figure 4a) to the interior carbons. The short alkanes are still seen to be distinct, but they become more similar as the alkane length increases. Interestingly, the terminal carbon atoms are seen to have a more distinct~4 Å peak for all alkanes, though this feature is less pronounced for the longer alkanes. The interior, non-terminal carbon atoms are generally observed to have similar local environments. At lower temperatures, this may provide insight into the nucleation process for n-alkanes as they crystallize, since the nucleation process in n-alkanes is known to entail elongation of the chains and then alignment to form the crystal lattice [41]. The distinct structure observed near the chain ends would likely be a useful metric for understanding phase transitions in these systems.

Thermodynamics of Binding
The radial distribution function reveals information about the spherically averaged intermolecular structure of the system, with the RDF maximum corresponding to the most likely separation of the pair of particles analyzed. 'Inversion' of the radial distribution function according to Equation (3) yields the free energy surface governing the behavior of that pair of particles. Therefore, the maximum in the radial distribution function corresponds to a minimum in the free energy of binding. This quantity is related to the work required to bring two particles together from infinite separation. Figure 5a shows that ∆G bind between carbon atoms becomes more positive with the increasing length of the alkane. Similar to the distinct shape of the radial distribution functions, CH 4 and C 2 H 6 , the short alkanes, are observed to have ∆G bind values distinct from the longer alkanes. After C 3 H 8 , ∆G bind is found to increase less steeply, following an apparently different trend. This change in slope corresponds to the shift in the most prominent peak from~4 to~5 Å in the radial distribution function seen in Figure 3, showing that not only does the local structure shift but also the strength with which neighbors bind each other.
In Figure 5b, we examine the entropy change associated with this process. Like the discussion of ∆G bind in the previous paragraph, ∆S bind is related to the change in entropy when a molecule is taken from infinite separation into the condensed phase. The ∆S bind is found to generally decrease with increasing molecule length. Interestingly, ∆S bind is positive for small alkanes, from methane to pentane, but then crosses over to become negative from hexane on. The trends in structure of n-alkanes are closely linked to their increasing flexibility with length [25,42,43]. Feng et al. showed that there is a smooth trend in structural properties that can be partially understood from the context of the flexibility of the n-alkane chain [25]. The shorter alkanes have minimal variation in their length, as measured for example by radius of gyration or end-to-end length [25], but this molecular flexibility plays an increasingly important role in the structural properties of longer n-alkanes. There is not a distinct transition point from inflexible to flexible, but rather increasing flexibility with chain length like the trend seen in Figure 5b.

Consideration of Longer N-Alkanes
The quantity ∆G bind is related to the free energy holding one carbon atom from a given n-alkane in the liquid. Each molecule, however, contains n carbon atoms, so it is natural to assume that n ∆G bind can be a measure of the average total free energy holding a molecule within the liquid. This quantity is shown in Figure 6 for each n-alkane. We observe that it is well fit by a quadratic function. The line in Figure 6 extrapolates the total average binding free energy to determine the chain length where the free energy is zero. This yields roots of 0 and 21, which are the closest integer values. The value 21 corresponds to C 21 H 44 and it is interesting to note that other studies have found dramatic changes in the dynamics and kinetics near this chain length. This is further evidenced by the fact that paraffin waxes are primarily composed of alkanes ranging from 20 to 40 carbon atoms [1,44,45] and this extrapolated point in our data near C 20 H 42 may be unique evidence of a transition in the n-alkane series from small molecule to polymer-like behavior.
It is tempting to assume that n-alkanes beyond pentadecane could be estimated by simply extrapolating from the trends outlined here to longer n-alkanes. This is expected to be true to some extent, but at some point, there is a crossover from small molecule-like behavior to polymer-like behavior. Such a transition is often characterized by slowed molecular transport (often quantified by viscosity or molecular diffusion constant) induced by entanglement of the polymer chains. This transition has been observed to occur in n-alkanes slightly longer than the ones examined here [41], which also fits with broader polymer theory [46]. However, since this study focuses on behavior at the boiling point, polymer-like behavior would be unlikely to manifest even if modestly longer n-alkanes are considered.

Conclusions
We have found that GAFF2 with charges parameterized with RESP reproduces liquid densities at the normal boiling point of n-alkanes from methane to pentadecane within an average deviation of 2%. All deviations in density are positive but deviations in ∆H vap are more complex. ∆H vap is overestimated for short alkanes but the increase with chain length is too small so it is underestimated for long alkanes. This suggests that revisiting the parameterization of aliphatic molecules could be a way to improve this force field for these types of molecules. We examine the local structure and resulting thermodynamics and show that the shortest n-alkanes have a distinct liquid structure, but based on radial distribution functions, the series is found to converge to similar distribution functions for longer alkane chains. Further, we found that the molecular structure around terminal carbon atoms depends sensitively on the chain length, but interior carbon atoms have a more uniform local solvation structure. Finally, we found that the binding free energy between carbon atoms becomes less favorable with increasing chain length. The longer alkanes have more carbon atoms "holding" onto each other, so the net free energy holding molecules in the condensed phase does not necessarily become less favorable.
This study supports using GAFF2 to model these systems, despite the disparity in some cases between the environments and physical conditions envisioned when the force field was developed. There are undeniably specialized force fields [22,47,48] or even less empirical methods like ab initio molecular dynamics that can be used for these systems, but there are advantages to GAFF2. The diverse range of molecules that can be described by GAFF2, or conversely the difficulty of parameterizing new molecules or functional groups within the framework of more sophisticated force fields, makes GAFF2 a useful tool for atypical molecules. Additionally, since GAFF2 employs relatively simple functional forms, it can feasibly consider larger systems, which are often beyond practical limits for more computationally intensive methods. Therefore, approaches like those that we have presented here are an important tool for studying complex, heterogeneous systems like those where alkanes often occur. Data Availability Statement: The information required to reproduce the simulations and all thermodynamic properties are provided within the article.