Influence of Ethanol Parametrization on Diffusion Coefficients Using OPLS-AA Force Field

Molecular dynamics simulations employing the all-atom optimized potential for liquid simulations (OPLS-AA) force field were performed for determining self-diffusion coefficients (D11) of ethanol and tracer diffusion coefficients (D12) of solutes in ethanol at several temperature and pressure conditions. For simulations employing the original OPLS-AA diameter of ethanol’s oxygen atom (σOH), calculated and experimental diffusivities of protic solutes differed by more than 25%. To correct this behavior, the σOH was reoptimized using the experimental D12 of quercetin and of gallic acid in liquid ethanol as benchmarks. A substantial improvement of the calculated diffusivities was found by changing σOH from its original value (0.312 nm) to 0.306 nm, with average absolute relative deviations (AARD) of 3.71% and 4.59% for quercetin and gallic acid, respectively. The new σOH value was further tested by computing D12 of ibuprofen and butan-1-ol in liquid ethanol with AARDs of 1.55% and 4.81%, respectively. A significant improvement was also obtained for the D11 of ethanol with AARD = 3.51%. It was also demonstrated that in the case of diffusion coefficients of non-polar solutes in ethanol, the original σOH=0.312 nm should be used for better agreement with experiment. If equilibrium properties such as enthalpy of vaporization and density are estimated, the original diameter should be once again adopted.


Introduction
Tracer binary diffusion coefficients (D 12 ) are essential when designing equipment or optimizing production, both for conventional and newly developed rate-controlled processes [1,2]. Even though large databases of diffusion coefficients have been published [3], a lack of data is still verified, especially for bioactive polar solutes in polar dense solvents such as ethanol.
The predominant method for experimental determination of D 12 values is the Taylor dispersion or Chromatographic Peak-Broadening method [4][5][6][7][8][9][10], which is time-consuming and requires specialized equipment and expensive solute standards. Alternatively, one may recur to phenomenological models such as the widely known Wilke-Chang equation [11,12]. However, for polar-solvent systems, the results achieved by such purely predictive equations are often moderate [3,13] since these models do not account for strong interactions between the molecules, for example hydrogen bonds. Data-correlative models, such as the 2-parameter equations of Dymond-Hildebrand-Batschinski (DHB) [14][15][16] and the Rice and Gray-based approach by Zêzere et al. [3], tend to yield better results, but their big disadvantage is the need of experimental data to determine the optimized parameters for each system.
An alternative approach for predicting D 12 considers Machine Learning (ML) algorithms as, for instance, the model proposed by Aniceto et al. [13] for polar-solvent systems. design and optimize industrial equipment and kinetic processes. Despite some correlations available in the literature [14][15][16], including some originating in our group [3], the fact is that there is no general theory to accurately estimate D 12 of polar systems without having experimental (or accurate computational) data. Therefore, new approaches for determining such values are expected to have an impact on the field.
Several modifications to OPLS-AA have been published to improve D 11 calculations of ethanol and other alcohols. For instance, Kulschewski and Pleiss [34] proposed the adjustment of the hydroxyl group partial charges to better describe D 11 of alcohols, but this parametrization still originated significant differences (e.g., for ethanol, the estimation of D 11 differs up to 22%, depending on the experimental value used for comparison). Another very recent work by Zhang et al. [35] focusing on C1 to C10 primary alcohols considered a combination of L-OPLS parameters for the hydrocarbon tail and OPLS-AA parameters for the hydroxyl group, as suggested by Zangi [33], with additional tiny adjustments (scale factors in the range of 1.00-1.03) of the partial charges. This approach, the so-called mixed-OPLS-AA model refinement of the OPLS-AA force field for liquid alcohols with scaled charges [34], decreased the deviations between experimental and estimated D 11 to values in the range of −8% for nonan-1-ol and 5% for ethanol [35]. For comparison, the corresponding deviations calculated with the original mixed-OPLS-AA model [33] (i.e., without the charge scaling) were 19% and 34%, respectively. Other force fields, such as OPLS4 [32], extensively tested by Baba et al. [22] for the prediction of D 11 over 152 diverse pure liquids at various temperatures, achieved maximum deviations roughly under 20% for ethanol. Petravic and Delhommelle [36] tested OPLS-UA (united atom version of OPLS-AA), which, similarly to OPLS-AA, overestimates D 11 of ethanol by roughly 25%, on average, due to density underestimation (around −4%). Enforcing the experimental density of ethanol decreased the D 11 value; the calculated value still slightly overestimates the experimental result, yet to a smaller extent [36]. Cardona et al. [37] tested both GAFF and TraPPE-UA for the estimation of D 11 for ethanol, reporting deviations of −2.08% and 7.47%, respectively. Finally, Schnabel et al. [38] proposed an ethanol rigid anisotropic united-atom model, based on Lennard-Jones and Coulombic interactions, which achieved average deviations of −6% for D 11 of liquid ethanol [39].
Not disregarding the previous proposed corrections and/or parameterizations, a different approach for refinement of OPLS-AA when computing either D 12 or D 11 is presented in this work. In particular, the influence of the diameter of ethanol's oxygen atom (σ OH ) on the diffusion coefficients of systems embodying hydrogen-bonding solvents is analyzed. This correction was triggered by Hirschfelder et al. [40], who claimed that the diameters of molecules tend to be smaller when computed from transport properties than when equilibrium properties are used. Here, we show that a reparameterization, i.e., smaller ethanol σ OH value, can be successfully introduced in the case of self-and binary diffusion coefficients (transport properties) in opposition to equilibrium properties (namely, density and enthalpy of vaporization) for which the original parametrization affords better results. A database of experimental D 11 and D 12 values, together with the above-mentioned equilibrium data, was compiled in order to validate the MD simulations and assumptions.  [20]. The deviations are also consistent with those from previous MD studies focusing on diffusivities of alcohols, as summarized in the Introduction section.

Results and Discussion
Hirschfelder et al. [40] stated that the diameters of molecules tend to be smaller when computed based on transport properties than when using equilibrium properties. Therefore, we decided to investigate the effects of changing the diameter of the oxygen atom from the ethanol hydroxyl group upon the D 12 values derived from classical MD simulations. In the OPLS-AA force field, the diameter of the aliphatic alcohol oxygen atom (σ OH ) is 0.312 nm. Thus, we performed simulations of ethanolic solutions of quercetin and gallic acid at the conditions reported in Table 1, using σ OH for ethanol in the range 0.304 to 0.312 nm and without changing the remaining simulation parameters (Figure 1 and Table S1). The best comparison of calculated and experimental D 12 of quercetin and gallic acid in ethanol ( Table 1) was obtained when using σ OH = 0.306 nm. Encouragingly, the optimum diameter found is close to the 0.307 nm utilized in the OPLS-UA force field [28]. In practice, this change implies that the molecules of ethanol become closer to each other than in the original parametrization; hence, density increases (as it will be shown further) and the free volume of the solvent decreases, lowering the D 12 values. In the case of quercetin, at the simulated conditions of 303.15 K and 1 bar, 303.15 K and 150 bar, 323.15 K and 1 bar, and 323.15 K and 150 bar, the newly found deviations ranged between −6.30% and 4.22% (individual results are given in Table 1), with global average relative deviations (ARD, defined in Section 3.1) and average absolute relative deviations (AARD, defined in Section 3.1) of −0.04% and 3.71%, respectively. This is a massive improvement over the results achieved with the initial value σ OH = 0.312 nm. A similar improvement was confirmed for gallic acid with the new σ OH value giving relative deviations (RD, defined in Section 3.1) between −5.31% and 6.08% and ARD and AARD values of 1.05% and 4.59%, respectively. Although the ARD value is close to zero, for both solutes, a strong dependence between RD and T is verified with the D 12 values being slightly underestimated at 303.15 K and overestimated at 323.15 K and 333.15 K, as can be seen in Table 1.  The expected dependency of D 12 with T, P, and Stokes−Einstein abscissae (T/µ 1 ) was generally conserved, as can be observed in Table 1 and in Figure 2. D 12 increases with rising temperature due to the higher internal energy and higher free volume of the system, which facilitate diffusion. Raising the pressure decreases the free volume of the solvent and increases the energy required for the solute to escape from the force field generated by the solvent, thus penalizing its diffusion [41][42][43]. As for the Stokes-Einstein abscissae, a linear relation between D 12 and T/µ 1 was found in both cases, with R 2 values of 0.9737 and 1.000 for quercetin and gallic acid, respectively. confirmed for gallic acid with the new value giving relative deviations (RD, def in section 3.1) between −5.31% and 6.08% and ARD and AARD values of 1.05% and 4.5 respectively. Although the ARD value is close to zero, for both solutes, a st dependence between RD and is verified with the values being slig underestimated at 303.15 K and overestimated at 323.15 K and 333.15 K, as can be see Table 1.  0.306 nm for ethanol, and the calculated diffusivities were evaluated in comparison experimental data in terms of the relative deviation (RD), average relative deviation (ARD) average absolute relative deviation (AARD).  The expected dependency of with , , and Stokes−Einstein abscissae ( was generally conserved, as can be observed in Table 1 and in Figure 2.
incr with rising temperature due to the higher internal energy and higher free volume o system, which facilitate diffusion. Raising the pressure decreases the free volume o solvent and increases the energy required for the solute to escape from the force generated by the solvent, thus penalizing its diffusion [41][42][43]. As for the Stokes-Ein abscissae, a linear relation between and ⁄ was found in both cases, wit values of 0.9737 and 1.000 for quercetin and gallic acid, respectively.

D12 of Organic Solutes in Liquid Ethanol: Oxygen's Radius Validation and Cases of Applicability
For validation of the new proposed parametrization, we selected two OH-be systems with polar ends, namely, ibuprofen, an organic acid, and butan-1-ol, a pri alcohol, both in liquid ethanol. Alternative compounds for testing would be pheno benzoic acid, but, unfortunately, the experimental diffusivities for these two compo

D 12 of Organic Solutes in Liquid Ethanol: Oxygen's Radius Validation and Cases of Applicability
For validation of the new proposed parametrization, we selected two OH-bearing systems with polar ends, namely, ibuprofen, an organic acid, and butan-1-ol, a primary alcohol, both in liquid ethanol. Alternative compounds for testing would be phenol and benzoic acid, but, unfortunately, the experimental diffusivities for these two compounds are scarce. As with the previous two systems, when employing ethanol's σ OH = 0.312 nm, the D 12 values deviate up to 32.24% from the experimental value for ibuprofen and up to 33.20% for butan-1-ol at the tested conditions (see Table S2 for detailed results). When using σ OH = 0.306 nm, the maximum deviation decreased to 4.26% and 8.86% for ibuprofen and butan-1-ol, respectively, with AARD = 1.55% and ARD = 0.70% for ibuprofen and AARD = 4.81% and ARD = 4.05% for butan-1-ol, confirming the improvements introduced by the proposed parameterization, as observed for the previous two systems. Furthermore, the dependency with T and P was also conserved, as can be evidenced by the results in Table 2. As for the Stokes-Einstein relation, depicted in Figure 3, once again, a linear relation was found between D MD 12 and T/µ 1 with R 2 = 0.9781 (experimental R 2 = 0.9891) for ibuprofen and for butan-1-ol, for which only two points were computed.  12 ) and computed (D MD 12 ) diffusion coefficients of ibuprofen and butan-1-ol in liquid ethanol at various temperatures and pressures. The computer simulations used σ OH = 0.306 nm for ethanol, and the calculated diffusivities were evaluated in comparison with experimental data in terms of the relative deviation (RD), average relative deviation (ARD), and average absolute relative deviation (AARD).   Table S2 for detailed results). W using = 0.306 nm, the maximum deviation decreased to 4.26% and 8.86% ibuprofen and butan-1-ol, respectively, with AARD = 1.55% and ARD = 0.70% ibuprofen and AARD = 4.81% and ARD = 4.05% for butan-1-ol, confirming improvements introduced by the proposed parameterization, as observed for previous two systems. Furthermore, the dependency with and was also conse as can be evidenced by the results in Table 2. As for the Stokes-Einstein relation, dep in Figure 3, once again, a linear relation was found between and ⁄ with 0.9781 (experimental = 0.9891) for ibuprofen and for butan-1-ol, for which only points were computed.    So far, the new parametrization improvements are independent of the actual functional group of the solute, i.e., alcohol (OH) or organic acid (COOH), confirming the results for quercetin (with five OH groups) and gallic acid (with three OH groups and one COOH group). To further study the validity of this hypothesis, additional simulations were performed for compounds without the OH moiety, namely, two hydrogen-bond-acceptor solutes (propanone and butanal) and two non-polar solutes (propane and benzene).
The D MD 12 values of propanone and butanal were computed in previous work [20], where the σ OH value of 0.312 nm yielded satisfactory results, with AARD values between 9.48% and 12.18% and ARD between 8.37% and 12.18% for the ketones studied. For aldehydes, the situation was similar, with AARD between 6.30% and 9.11% and ARD between 1.00% and 5.67%. All the simulations were carried out at temperatures between 303.15 K and 333.15 K and pressures up to 150 bar [20]. In this work, to test the new proposed σ OH value, two different temperatures (303.15 K and 333.15 K) and one value of pressure (1 bar) were simulated, for both solutes, using the computational procedure reported by Zêzere et al. [20], this time with 3 ns of equilibrium and 3 ns of production. Only one value of pressure was simulated since no strong dependence between P and RD has been found, either in Ref. [20] or in the present study. For propanone, at 303.15 K, the newly computed D MD 12 value shows higher absolute deviation (RD = −17.63%) than the one computed with σ OH = 0.312 nm, for which RD = 1.55%. However, at 333.15 K, the situation is quite the opposite, with the newly computed D MD 12 value achieving a lower absolute RD value (RD = 2.08%) than the one achieved with σ OH = 0.312 nm (RD = 23.53%). In this particular case, while the absolute RD value at higher T is lower, when setting σ OH = 0.306 nm, it comes at a cost of degraded performance at lower T, i.e., 303.15 K. The same behavior is verified for the aldehyde (butanal), for which the value of RD increases (from −5.15% to −24.48%) at 303.15 K and 1 bar and decreases (from 16.94% to −0.07%) at 333.15 K and 1 bar. Hence, for propanone and butanal, additional corrections are needed to translate the experimental dependence of the diffusivities with temperature, which affects also the data calculated with the ethanol's σ OH value of 0.312 nm [20].
To conclude this section, it is now safe to assume that the optimal σ OH value is tied to the solute containing OH, or not, independently of the functional group where the atoms are. Considering the studied cases, when the solute is protic (i.e., the solute can participate in hydrogen bonds as donor and as acceptor), best results are obtained when σ OH takes the value of 0.306 nm for the D 12 calculation. Conversely, when the solute is non-polar, the value for σ OH should be 0.312 nm. In the case of solutes with the ability to be hydrogen bond acceptors but not hydrogen bond donors, e.g., ketones and aldehydes, the preferred σ OH should be 0.312 nm since any lower value would cause RD to increase at lower T, despite decreasing RD at higher T values.

D 11 of Liquid Ethanol
The influence of the σ OH parameter was further tested on the computation of D 11 of liquid ethanol at 1 bar and at T between 298.15 K and 333.15 K. When parametrizing ethanol with σ OH = 0.312 nm, the achieved RD values were between 24.05% and 35.38%, with AARD = 30.62% and ARD = 30.62% (see Table S3 for more details). However, and similarly to the results for protic solutes, when parametrizing ethanol with σ OH = 0.306 nm, the achieved results drastically improve with the new-found RD between −5.91% and −0.77%, AARD = −3.51%, and ARD of the same value, as reported in Table 3. Furthermore, and similarly to what was verified for the D 12 calculation, the computed D 11 of ethanol follows the expected trends (i.e., increasing with T and the Stokes-Einstein coordinates). As depicted in Figure 4, there is a linear relation between D 11 and T/µ 1 with R 2 = 0.9997 (experimental value of 0.9958).  11 ) and computed (D MD 11 ) self-diffusion coefficient of ethanol. The computer simulations used σ OH = 0.306 nm for ethanol, and the calculated diffusivities were evaluated in comparison with experimental data in terms of the relative deviation (RD), average relative deviation (ARD), and average absolute relative deviation (AARD).  AARD = −3.51%, and ARD of the same value, as reported in Table 3. Furthermore, and similarly to what was verified for the calculation, the computed of ethanol follows the expected trends (i.e., increasing with and the Stokes-Einstein coordinates). As depicted in Figure 4, there is a linear relation between and ⁄ with = 0.9997 (experimental value of 0.9958).

Influence of the Oxygen's Energy Parameter
The influence of the Lennard-Jones energy parameter of ethanol's oxygen ( ) in the of ethanol and of quercetin and benzene in ethanol was also analyzed. It was found that increasing by 20% the value of (i.e., the value was made equal to the one in the TraPPE-UA force field [26]

Influence of the Oxygen's Energy Parameter
The influence of the Lennard-Jones energy parameter of ethanol's oxygen (ε OH ) in the D MD 11 of ethanol and D MD 12 of quercetin and benzene in ethanol was also analyzed. It was found that increasing by 20% the value of ε OH (i.e., the value was made equal to the one in the TraPPE-UA force field [26]) for calculations of D MD 11 of ethanol at 298.15 K and 1 bar, with σ OH = 0.306 nm, originated a minor improvement of the diffusivity (RD changed from −3.81% to 1.90%) without negatively affecting the density (RD was barely the same), suggesting that additional improvements can be made by fine tuning the value of ε OH . In the case of the original σ OH = 0.312 nm, the increase by 20% of ε OH led to an increase of RD from 33.33% to 38.10%, while a decrease by 20% of ε OH decreased RD from 33.33% to 25.71%, which suggests that fine tuning of the energy scale with the original radius value will hardly lead to a significant improvement in the D MD 11 of ethanol.
As it happened with the D MD 11 of ethanol, a systematic shift of the RD to more positive values was found for the protic compound quercetin when the value of ε OH was increased by 20%. For example, when employing σ OH = 0.306 nm, the RD for the calculated diffusivity, at 1 bar and 303.15 K, changed from −6.30% to 1.55%, while the value at 1 bar and 323.15 K changed from 3.13% to −7.10%. However, such systematic variation was not found in the case of the non-polar benzene molecule. In fact, when employing σ OH = 0.306 nm, the RD for the calculated diffusivity, at 1 bar and 313.15 K, changed from −18.86% to −19.30%.
The results above suggest that it may be possible to slightly decrease the RD values between calculated and experimental results upon tuning the ε OH parameter but also that two different σ OH values are needed for protic and non-polar solutes, i.e., it will be difficult to find a unique solution for all kinds of solutes. Hence, a new σ OH of ethanol's oxygen atom is proposed while fixing the energy parameter (ε OH ).

Equilibrium Properties of Ethanol
Typically, two properties used for the force-field calibration and validation are the enthalpy of vaporization (∆H vap ) and density (ρ). Contrary to what was verified in the case of diffusivities, when the original value σ OH = 0.312 nm is used for the calculation of ∆H vap , the computed value (42.0 kJ mol −1 ) compares well with the experimental result (42.3 ± 0.4 kJ mol −1 [45]), with RD = −0.71%. When the reparametrized value (σ OH = 0.306 nm) is used, the computed value of ∆H vap increases to 44.8 kJ mol −1 , which is 5.91% higher than the one computed with σ OH = 0.312 nm. This difference is mainly due to the computed potential energy of the liquid phase (U liquid ) being around 13% higher when using σ OH = 0.312 nm.
As for density, the values computed with σ OH = 0.306 nm are always higher than the ones computed with σ OH = 0.312 nm. This trend was already expected since, as stated before, a smaller σ OH value means the molecules may be closer to each other; hence, density increases. As for the density values computed with σ OH = 0.306 nm, RD values ranged between 1.59% and 2.58% for 298.15 K ≤ T ≤ 333.15 K at 1 bar. At a higher-pressure of 300 bar, the density values, computed at 308.15 K and 333.15 K, were also higher than the experimental ones with RD of 2.77% and 1.41%, respectively. Globally, this translates into an AARD = 2.12% and ARD of same value, which are both higher than the results obtained when using the original σ OH value (AARD = 0.46% and ARD = 0.34%). These results are summarized in Table 4, and comparison between these density values and the ones obtained with σ OH = 0.312 nm can be found in Table S4. Table 4. Ethanol's experimental (ρ exp ) and computed (ρ MD ) density and respective relative deviation (RD). The computer simulations used σ OH = 0.306 nm for ethanol.

Database
The database compiled for evaluation and optimization of the ethanol's oxygen diameter (σ OH ) is summarized in Table 5 for solute/ethanol or pure ethanol systems. Six polar-solute systems (namely, quercetin, gallic acid, ibuprofen, butan-1-ol, propanone, and butanal, with structural formulas in Figure 5) and two systems comprising non-polar solutes (benzene and propane, also shown in Figure 5) were considered. The first two polar-solutes (i.e., quercetin and gallic acid) were used for optimization of the σ OH value, while the second pair of polar solutes (i.e., ibuprofen and butan-1-ol) was selected for validation of the new proposed value. The remaining polar (i.e., propanone and butanal) alongside the non-polar solutes were chosen to investigate the initial hypothesis that OPLS-AA overestimates D 12 of protic solutes. Ethanol properties, such as D 11 , density (ρ), and enthalpy of vaporization (∆H vap ) were also included.
in which is the property under study, NDP is the number of data points, and the superscripts MD and exp represent the computed and experimental property, respectively.

Molecular Dynamics Simulation Procedure
The classical MD simulations were carried out with the GROMACS 2019 code [63- All the results calculated in this work were evaluated in terms of relative deviations (RD), average relative deviations (ARD), and average absolute relative deviations (AARD) calculated by: in which X is the property under study, NDP is the number of data points, and the superscripts MD and exp represent the computed and experimental property, respectively.

Molecular Dynamics Simulation Procedure
The classical MD simulations were carried out with the GROMACS 2019 code [63][64][65], using cubic boxes with 3500 molecules of ethanol and 4-20 molecules of the solute, corresponding to a mass fraction in the range between 0.5% and 1%. The precise number of solute molecules used for each system can be found in Supplementary Materials (Table S5).
The potential parameters used in this work for the different compounds are supplied in a separate zip file in a format (.itp) that is compatible with the GROMACS code. The equations used for calculation of the interactions and the topologies of the solute molecules can be found in Supplementary Materials (Equations (S1)-(S8)). For each temperature/pressure condition, the simulations were carried out using a published procedure by Zêzere et al. [20], with longer simulation times, based on the computational recipe proposed by Barrera and Jorge [23]. Each simulation was initialized by a steepest-descent minimization run, followed by a 100 ps simulation using the canonical ensemble (NVT) with initial velocities generated according to the Maxwell-Boltzmann distribution; next, a 100 ps run in the isothermal-isobaric ensemble (NPT) was carried out using the Berendsen coupling scheme [66]; finally, the simulation continued in the NPT ensemble up to a few nanoseconds (see below) with a time step of 0.001 ps. The last phase of the simulation was carried out using the leap-frog algorithm [67], with the box temperature and pressure being kept constant by using the V-rescale thermostat [68] and the Parrinello-Rahman barostat [69], respectively. The choice for the NPT ensemble was to certify that the simulations were performed at the same pressure and temperature conditions used to measure experimental data. Nevertheless, in previous work, it was found that diffusivities calculated from NVT and NPT simulations are very similar, with the main difference being the average pressure of the simulation [20]. Additionally, the LINCS algorithm was used to constrain the bond lengths, and a cut-off distance of 1.4 nm was adopted, as tested in a previous work [20], for both van der Walls and Coulomb interactions. The Particle-Mesh Ewald (PME) [70] summation was selected for the long-range electrostatic interactions. The simulation was carried out using the standard periodic boundary conditions, applying long-range dispersion corrections for energy and pressure. The compressibility values were taken from the literature [71] or estimated using published correlations [59]. As for the duration of the simulations, these were adjusted according to the desired property, as indicated in Sections 3.3 and 3.4.

Self-Diffusion and Binary Diffusion Coefficients
The diffusion coefficients (D 11 or D 12 ) were calculated by the Einstein relation of the mean square displacement (MSD) of the random motion of a molecule [72]: in which t 0 is the time origin, t is the time elapsed from t 0 , and r is the molecule/atom position. The average, represented by the angled brackets, was calculated using all molecules in the simulation and all time origins. Once the MSD as a function of time was known, a simple linear least-squares regression was performed between 50 and 100 ps, as shown previously, to yield accurate results [20]. For the D 12 calculations, the final NPT ensemble was carried out during 40 ns of which the first 20 ns were discarded to assure proper equilibration of the dynamics [20]. The final D 12 value was obtained from averaging the D 12 results of three independent simulation replicas.
As for D 11 calculations, the simulations were carried out with the setup previously described, with 10 ns of equilibration and 15 ns of production, and only one simulation was considered for the final D 11 value. Typically, the computed D 11 values are affected by the finite size of the simulation box. Hence, a correction should be introduced to overcome this limitation which, according to Yeh and Hummer [73], can be conducted by performing a linear regression between D MD,L 11 , the self-diffusion coefficient computed from a simulation box with finite length L, and 1/L. Accordingly: (5) where D MD,∞ 11 is the self-diffusion coefficient computed from the simulation box with infinite length (y-intercept at 1/L = 0), and m is the slope. Alternatively, a hydrodynamic correction can be directly applied [73]: in which D YH (T, µ 1 , L) is the Yeh and Hummer hydrodynamic correction of D MD,L

11
, k B is the Boltzmann constant (1.380649 × 10 −23 m 2 kg s −2 K −1 ), ξ is a dimensional constant of value 2.837297, and µ 1 the viscosity. This second approach was adopted in this work since it is computationally cheaper, and the D MD,∞ 11 results are equivalent to those obtained by the linear regression method, as depicted in Figure S1 (see Supplementary Materials). A total of 1500 ethanol molecules were used per simulation at each condition.

Enthalpy of Vaporization
The enthalpy of vaporization was computed by: in which U gas is the potential energy of the vapor phase, U liquid the potential energy of the liquid phase, and R g the ideal gas constant (R g = 8.314 J K −1 mol −1 ). To calculate U liquid , we used the last 15 ns of the simulation used to compute D 11 . The Barrera and Jorge [23] procedure was used to calculate U gas , with one molecule being placed inside a cubic box (15 nm × 15 nm × 15 nm), with no boundary conditions and all cutoff radii set to 0. The simulation was carried out in NVT using the leap-frog stochastic dynamics integrator, which adds a friction and a noise term to Newton's equation of motion [74]. The run was carried out for 50 ns, and the first 10 ns of the simulation were discarded.

Density
The densities of pure ethanol systems were calculated from the last 15 ns of the trajectories. For conditions for which no D 11 was computed, the simulations were carried out with the setup used for D 11 calculation.

Conclusions
The OPLS-AA force field was used in the calculation of tracer diffusion coefficients (D 12 ) of solutes in ethanol and of self-diffusion coefficients (D 11 ) of ethanol from molecular dynamics simulations carried out at different temperature and pressure conditions. It was found that when the oxygen atom of ethanol considers the original OPLS-AA diameter, it yields high deviations of D 11 of ethanol and of D 12 of protic solutes in ethanol.
In order to correct such deviations, the diameter of the oxygen atom (σ OH ) of ethanol was reoptimized by targeting D 12 of quercetin and of gallic acid (both protic solutes) in liquid ethanol, at 303.15 K ≤ T ≤ 323.15 K and P up to 150 bar, and 303.15 K ≤ T ≤ 333.15 K and P = 1 bar, respectively. With the new optimized value, i.e., σ OH = 0.306 nm, the comparison with the experiment was substantially improved, with an average absolute relative deviation (AARD) of 3.71% and average relative deviation (ARD) of −0.04% for quercetin and AARD = 4.59% and ARD = 1.05% for gallic acid.
Significant improvements were also found for the computed D 12 of ibuprofen in liquid ethanol at 298.15 K ≤ T ≤ 333.15 K and P up to 300 bar, with AARD = 1.55% and ARD = 0.70%, and for butan-1-ol at 298.15 K ≤ T ≤ 333.15 K and P = 1 bar, with AARD = 4.81% and ARD = 4.05%. A mixed behavior was observed for propanone and butanal for which the results improved at T = 333.15 K but significantly deteriorated at T = 303.15 K. However, in the case of non-polar solutes such as propane or benzene, the performance worsened with the new value at all the tested conditions. Regarding the D 11 of ethanol, the results vastly improved, having achieved AARD and ARD values of 3.51% and −3.51%, respectively, at the tested conditions.
In all the test systems, when computing either D 12 or D 11 , the expected trends with T, P, and Stokes-Einstein abscissae were always conserved.
Finally, the influence of the new value was evaluated in both enthalpy of vaporization (∆H vap ) and density of ethanol, achieving slightly worse results in both cases with RD = 5.91% for ∆H vap and AARD = 2.12% and ARD of same value for density. Hence, the value of σ OH = 0.306 nm is recommended only for the calculations of D 12 of protic solutes containing OH in pure liquid ethanol and D 11 estimation.  Data Availability Statement: The raw/processed data required to reproduce the above findings cannot be shared at this time due to technical/time limitations.

Conflicts of Interest:
The authors declare no conflict of interest.