Accuracy of Molar Solubility Prediction from Hansen Parameters. An Exemplified Treatment of the Bioantioxidant l-Ascorbic Acid

Estimating molar solubility from the Hildebrand-Scott relation employing Hansen solubility parameters (HSP) is widely presumed a valid semi-quantitative approach. To test this presumption and to determine quantitatively the inherent accuracy of such a solubility prognosis, l-ascorbic acid (LAA) was treated as an example of a commercially important solute. Analytical calculus and Monte Carlo (MC) simulation were performed for 20 common solvents with total HSP ranging from 14.5 to 33.0 (MPa)^0.5 utilizing validated material data. It was found that, due to the uncertainty of the material data used in the calculations, the solubility prediction had a large scattering and, thus, a low precision. Prediction power is most adversely affected by the uncertainty of the HSP estimates (solvent and solute), followed by the solute heat of fusion. The solute melting temperature and molar volume have minor effects. Computed and experimental solubilities show the same qualitative behavior, while quantitative discrepancies reach one to three orders of magnitude. Solubility estimates were found to provide, at best, rough guiding information but, with the quality of material data on LAA available, they cannot be rated semi-quantitative. It is assumed that these results generally apply at least to solute-solvent systems with a material data quality and solubility similar to LAA.


Introduction
The production and processing of chemicals normally involves solute-solvent systems at some stage. Thus, solubility plays a key role in all sorts of industrial applications such as synthesis, extraction, or purification, as it also does, for instance, in packaging, painting, food technology, or pharmacology. In commercial environments, attention is placed on rational product design and process intensification. The development of solid amorphous dispersions in the pharmaceutical industry (e.g., drug containing polymers), the functionalization of food packaging films with antioxidants, or the formulation of multicomponent solvents for membrane production, separation purposes, or other applications, are examples of practical problems that benefit clearly from the possibility of reliable solubility prediction. Consequently, it is of great industrial value to predict or, at the least, reasonably estimate the solubility from basic physicochemical solute and solvent property data.
A frequently adopted approach using the solubility parameter concept was proposed early on for nonpolar solvents by Hildebrand and Scott [1–4]. Despite its various assumptions, this approach is considered useful in many practical areas, not least because of its conceptual simplicity, even though its predictive accuracy has not been quantitatively assessed in the literature. This study evaluates the numerical uncertainty typically encountered when applying the Hildebrand-Scott approach to the solubility prognosis of the commercially important natural antioxidant l-ascorbic acid (LAA). The equation giving the mole fraction solubility x_1 of a solid solute reads in its basic form for a regular solution [5,6]:
$$\ln x_1 = -\frac{\Delta H_1}{RT}\left(1 - \frac{T}{T_{m,1}}\right) - \frac{V_1 \Phi_2^2}{RT}\left(\delta_1 - \delta_2\right)^2 \tag{1}$$
where, for the solute, ∆H_1 is the heat of fusion in units of J/mol, T_m,1 the melting temperature, and V_1 the molar volume; Φ_2 is the solvent volume fraction, and δ_1 (solute) and δ_2 (solvent) are the Hildebrand solubility parameters. R and T have their usual meaning. Throughout this paper, subscript "1" ("2") designates the solute (solvent). The derivation of Equation (1) rests on assuming a vanishing term containing the molar heat capacity difference of the solid and supercooled liquid state of the solute. Both regular solution theory and the Hildebrand solubility parameter concept originally treat the mixing of two liquids. To extend this approach to dissolving crystalline solids in liquids at T < T_m,1, the molar volume V_1 of the (hypothetical) supercooled liquid solute needs to enter Equation (1). Generally, knowledge of the value of V_1 necessitates approximations, for instance, by group contribution methods [7] or by the extrapolation of solute melt density data to the solution temperature. Applied to various solute-solvent systems, Equation (1) has been rated, rather vaguely, as providing correlations or predictions that are "quite good," "fair," or "unreliable" for nonpolar, polar aprotic, and protic solvents, respectively [5] (p. 344).
This rating stays unchanged if the Hildebrand solubility parameter δ is replaced by the total Hansen solubility parameter (HSP), which, extending Hildebrand's concept, comprises three contributions to the material cohesive energy accounting for nonpolar dispersive, polar, and hydrogen bonding interactions (subscripts 'd', 'p', and 'h'). The HSP concept has been explained in detail in the literature [8,9]. In principle, the theoretical basis for Equation (1) limits its use to nonpolar systems, but any remedy for this drawback requires a considerably more complex treatment leading to intricate equations. While genuine quantitative agreement with experimental solubility data is rather rare, the above relation is still widely believed to prove its practical worth.
Experimental values for the parameters V_1 and δ_1 commonly originate from solubility data in some mixtures and are available for comparatively few compounds only. Therefore, these values have to be estimated instead, with inevitable but usually unassessed error. Outlines of estimation methods as well as widely accepted δ_2 values for numerous solvents can be found in the literature [5,7–9]. While Equation (1) is still relatively simple, this approach has significant data requirements relating to thermal (∆H_1, T_m,1), volumetric (V_1), and cohesive properties (δ_1, δ_2). Besides its widespread use and economic importance, LAA was chosen in this study as a paradigmatic compound for its low solubility (x_1 ≤ 10^−3) in numerous solvents. Such low solubilities ensure that the solvent volume fraction Φ_2 can safely be approximated by unity. Thus, iterative numerical approximations (e.g., the method of successive approximations [10]) do not need to be performed, so that x_1 can be computed straightforwardly from Equation (1). Eliminating iterative optimization procedures from the calculation reduces numerical error and makes it possible to easily apply Monte Carlo (MC) methods, as described later, without the need for specific programming or dedicated software. The literature known to the authors does not provide quantitative information for LAA or other compounds on the error margins that occur when deriving solubility from HSP.
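The direct, non-iterative evaluation described above can be illustrated by a minimal sketch. It assumes that Equation (1) has the standard regular-solution form ln x_1 = −(∆H_1/RT)(1 − T/T_m,1) − (V_1 Φ_2²/RT)(δ_1 − δ_2)²; the function name and argument names are hypothetical, and the numerical inputs are the LAA mid-range values discussed later in this paper combined with a solvent at the upper end of the investigated δ_2 range.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def mole_fraction_solubility(dH_fus, T_m, V1, delta1, delta2, T=298.15, phi2=1.0):
    """Mole fraction solubility from the assumed form of Equation (1).

    dH_fus in J/mol, T_m and T in K, V1 in cm^3/mol, delta1/delta2 in MPa^0.5.
    cm^3 * MPa equals J, so both terms are dimensionless after division by R*T.
    """
    ideal_term = -dH_fus / (R * T) * (1.0 - T / T_m)
    mixing_term = -V1 * phi2 ** 2 * (delta1 - delta2) ** 2 / (R * T)
    return math.exp(ideal_term + mixing_term)


# Illustrative inputs (assumptions for this sketch): LAA mid-range values,
# solvent with delta2 = 33.0 MPa^0.5, solution temperature 25 degC.
x1 = mole_fraction_solubility(
    dH_fus=42000.0, T_m=463.75, V1=109.9, delta1=35.3, delta2=33.0
)
print(f"x1 = {x1:.2e}")  # roughly 1.9e-03
```

Because Φ_2 is fixed at unity, a single function call suffices; for higher solubilities Φ_2 would itself depend on x_1 and the equation would have to be solved iteratively.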
The present paper aims to deliver solubility predictions for LAA, including a comprehensive, exemplified treatment of the aforementioned accuracy issue. Another objective is to ascertain the fundamental uncertainty of estimating molar solubility following the concepts underlying the ISO Guide to the Expression of Uncertainty in Measurement (GUM) [11]. Available material data on LAA were scrutinized and thoroughly consolidated to provide values and their respective uncertainties for all relevant material quantities. From these, confidence intervals (CI) for the resulting solubility estimates of LAA are derived, the influences of the engaged quantities on the prediction error are investigated and, finally, the results are compared with available experimental data.

Determination of Molar Volume
Pure melt, solution, particle, and crystallographic literature data of LAA were assessed to derive an estimate for the molar volume V_1 of the (hypothetical) supercooled liquid at 25 °C, the usual reference temperature for HSP data. The LAA molecular weight was taken as 176.124 g/mol.
In addition, because the solubility parameter is a function of molar volume (δ ∝ V^−0.5), the molar volumes, V_2, of all solvents investigated here were calculated from experimental densities listed in reference [12] and compared with the values compiled in HSP standard works [5,8,9]. The purpose was to check to which temperatures the data in [5,8,9] actually refer. In all cases where the solvent molar volume did not relate to measurements at 25 °C or 20 °C, the literature was searched to obtain more appropriate data.

Determination of Solubility Parameter
In line with common practice, the total Hansen solubility parameter δ_t, the extension of the Hildebrand parameter, is taken to enter Equation (1) throughout. For the solute, δ_1 is calculated from additive group contributions according to the scheme by Hoftyzer and van Krevelen (HVK) using the molar volume as determined in this study and referencing the SMILES (simplified molecular-input line-entry system) representation [7,9]. For the solvents, δ_2 is computed from the Hansen parameters compiled in [9]. Where a significant modification of the molar volume applies, δ_2 is additionally calculated by the HVK method to verify plausibility. Solvent data used for the calculations are compiled in Table S1.

Determination of Solute Melting Parameters
Published data on the heat of fusion, ∆H_1, and the melting temperature, T_m,1, of LAA were evaluated to deduce the most probable values and minimum-maximum ranges for both quantities.

Solubility Calculations
Mole fraction solubilities x_1 were computed from Equation (1) using Φ_2 = 1. In addition, GUM-compliant simulations were carried out within the respective uncertainty ranges of the relevant material variables [11]. Following the usual pattern of the Monte Carlo method, four consecutive steps were performed: (1) an input domain of five input variables (∆H_1, T_m,1, V_1, δ_1, δ_2) was defined, (2) values for each variable were randomly generated from a uniform (boxcar) probability distribution over the respective variable range (mid-range value ± ½·(maximum − minimum)), (3) molar solubilities x_1 were deterministically calculated according to Equation (1) using the input values from step (2), and, finally, (4) the calculated solubility values were aggregated and analyzed. For each solvent, the quartiles and 95% confidence limits (LCL, UCL) of x_1 were numerically derived from 1000 random variable quintuples. The solution temperature was set to T = 25 °C for all calculations. The total solubility parameter and molar volume of LAA were both used as determined in this work. All computations and simulations were performed using a commercial spreadsheet (Excel, v. 2016, using the solver and analytical functions add-ins, Microsoft Corp., Redmond, WA, USA). Specific coding was not required. It is emphasized that no molecular dynamics calculations were performed and that the term "Monte Carlo simulation" refers to the determination of the uncertainty of the calculated molar solubilities within the GUM framework as described above.
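The four steps above can be sketched in a few lines of code. The following Python version is only an illustration of the procedure, not the spreadsheet implementation used in this work. It assumes the regular-solution form ln x_1 = −(∆H_1/RT)(1 − T/T_m,1) − (V_1/RT)(δ_1 − δ_2)² with Φ_2 = 1; all function names are hypothetical, the solvent value δ_2 = 26.5 (MPa)^0.5 is an arbitrary example, and the half-width chosen for ∆H_1 is a placeholder.

```python
import math
import random

R = 8.314462618  # molar gas constant, J/(mol K)
T = 298.15       # solution temperature, K (25 degC)


def solubility(dH, Tm, V1, d1, d2):
    """Step (3): deterministic x1 from the assumed Equation (1), phi2 = 1."""
    return math.exp(-(dH * (1.0 - T / Tm) + V1 * (d1 - d2) ** 2) / (R * T))


def percentile(sorted_values, p):
    """Percentile by linear interpolation between order statistics, p in [0, 100]."""
    k = (len(sorted_values) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def monte_carlo(ranges, n=1000, seed=0):
    """Steps (2) to (4): uniform sampling, evaluation, aggregation."""
    rng = random.Random(seed)
    draws = sorted(
        solubility(*(rng.uniform(mid - half, mid + half) for mid, half in ranges))
        for _ in range(n)
    )
    return {p: percentile(draws, p) for p in (2.5, 25, 50, 75, 97.5)}


# Step (1): input domain as (mid-range value, half-width); illustrative only.
ranges = [
    (42000.0, 5000.0),  # dH_1 in J/mol (half-width is a placeholder)
    (463.75, 5.0),      # T_m,1 in K
    (109.9, 4.4),       # V_1 in cm^3/mol (about +/-4%)
    (35.3, 1.0),        # delta_1 in MPa^0.5
    (26.5, 1.0),        # delta_2 in MPa^0.5 (hypothetical solvent)
]
result = monte_carlo(ranges)
print({p: f"{x:.2e}" for p, x in result.items()})
```

The ratio of the 97.5% to the 2.5% percentile then serves as the quotient of the 95% confidence limits, and the ratio of the 75% to the 25% percentile as the interquartile span ratio.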

Molar Volumes
Solute. Literature in the HSP context persistently quotes a molar volume of 124.85 cm³/mol for LAA. Tracing the source, it becomes evident that this value originates from melt density data that were interpolated to 192 °C [13,14]. Apparently, this volume refers to the amorphous-liquid state at a temperature far above 25 °C, suggesting substantial overestimation. To arrive at a less questionable figure, more reliable data were sought. From X-ray crystallographic unit cell dimensions by Hvoslef, a value of 103.67 cm³/mol at room temperature was derived [15]. Particle density at 25 °C gave 106.74 cm³/mol, with the particle purity and crystallinity both unknown [12,14]. As HSP values generally relate to the liquid state, it is instructive to look also at the LAA partial molar volume (apparent molar volume at infinite dilution) in solution. By linearly extrapolating aqueous LAA solution density to zero molality at 25 °C, one obtains partial molar volumes ranging from 102.38 to 106.49 cm³/mol (n = 9) with a median (mean) of 105.40 (105.04) cm³/mol [16–19]. All these data point to a molar volume of about 104 cm³/mol for highly ordered solid LAA at 25 °C.
In [13,14], the temperature dependence of melt density above 192 °C is approximated by a modified Rackett equation. Extrapolating this relation down to 25 °C yields an estimate for the molar volume of supercooled liquid LAA of 109.85 cm³/mol, which is about 6% larger than that of the crystalline state. Since the coefficients of the approximate formula lack any indication as to their accuracy, this value should be considered tentative, although analogous polymer data, for instance, suggest it to be reasonable. Plausibility is corroborated by the fact that, at 25 °C, amorphous (rubbery or glassy) polymers possess molar volumes that typically exceed those of their crystalline counterparts by up to 10% [7] (pp. 77-85). The foregoing results are summarized in Table 1, and it is proposed that 109.9 cm³/mol might be taken as the most probable value for V_1 (25 °C), while presuming an error of about ±4% seems reasonable.
Table 1. Estimates for the molar volume V_1 of l-ascorbic acid.

Solubility Parameters
Solute. The LAA solubility parameter of the supercooled liquid state was computed for two different molar volumes (temperatures) according to the HVK scheme using the commercial software HSPiP [9]. At 20 °C and 25 °C, the resulting HSP components, as well as δ_1, were the same within less than 0.3% error, so that δ_1 (20 °C) ≈ δ_1 (25 °C) = 35.3 (MPa)^0.5 will be used further. To derive an estimate for the uncertainty, the ±4% error of V_1 (25 °C) was accounted for, so that one arrives at an imprecision of ±0.5 (MPa)^0.5 for the individual HSP components and about ±1 (MPa)^0.5 for δ_1. These values are congruent with precision claims by Abbott [9] and Stefanis [22] on determining HSP from experimental data. Therefore, a margin of ±1 (MPa)^0.5 is viewed as a realistic lower error boundary for δ_1. Table 2 lists calculation results and literature data.
… 2) when using their molar volume (V_1 = 106.7 cm³/mol), suggesting some error with respect to δ_p in [24]. This seems likely in view of Park et al. citing work by Ravindra et al. that had been relevant to their calculus but was later shown to be substantially erroneous [25].
Solvents. The solubility parameters δ_2 of all solvents relate to 20 °C or 25 °C. For methyl hydroperoxide, the Hansen components were not taken from [9] but, instead, calculated by the HVK method, yielding at 20 °C (δ_d, δ_p, δ_h) = (14.9, 13.1, 21.7) and δ_2 = 29.4, each in units of (MPa)^0.5. Compared with δ_2 = 36.7 (MPa)^0.5 according to [9], the difference stays roughly within the δ ∝ V^−0.5 dependency, so that there appear to be no major discrepancies other than those arising from double the molar volume. As for δ_1, an error margin of ∆δ_2 = ±1 (MPa)^0.5 is assumed to be a reasonable figure for the solvents.
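The relation between the Hansen components and the total parameter quoted above can be checked numerically. The short sketch below uses the standard definition δ_t² = δ_d² + δ_p² + δ_h² and, as a rough orientation only, scales the literature value by the δ ∝ V^−0.5 dependency for a doubled molar volume; the function name is hypothetical.

```python
import math


def total_hsp(delta_d, delta_p, delta_h):
    """Total Hansen solubility parameter from its three components, in MPa^0.5."""
    return math.sqrt(delta_d ** 2 + delta_p ** 2 + delta_h ** 2)


# HVK components for methyl hydroperoxide at 20 degC as quoted in the text.
delta_hvk = total_hsp(14.9, 13.1, 21.7)
print(round(delta_hvk, 1))  # 29.4

# Literature value from [9] rescaled for a doubled molar volume (delta ~ V^-0.5).
delta_scaled = 36.7 / math.sqrt(2.0)
print(round(delta_scaled, 2))  # 25.95
```

The HVK result of 29.4 (MPa)^0.5 lies between the literature value and its volume-rescaled counterpart, which is the sense in which the difference is described as staying roughly within the δ ∝ V^−0.5 dependency.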
All solvent values are presented in the overall results table further below.

Solute Fusion Parameters
LAA is known to possess low thermal stability and to commence decomposing near the melting point [26–30]. Consequently, the assessment of fusion parameters depends not only on the measurement method but also on experimental conditions, such as heating rates, causing discrepancies in the literature data, which should be used with due care. Corvis et al. [30] reference eight Differential Scanning Calorimetry (DSC) investigations, reporting melting points from 190.4 to 197.0 °C with a median of 192.9 °C. Kofler quoted fusion temperatures of 191 °C (Kofler bench), 185-190 °C (hot-stage microscope), 189-194 °C (capillary tube), and 188 °C (Differential Thermal Analysis, DTA) [31]. Error estimates for the melting temperature were attempted by the Thermodynamic Research Centre [32] using the NIST Thermodata Engine [12] and by the Design Institute for Physical Property Data [14], all aiming to provide comprehensive measures of the overall data reliability. These evaluated data for T_m,1 are compiled in Table 3. Weighting these individual estimates with the reciprocal uncertainty yields a mean of 190.6 °C with a presumed most probable error of ±5 °C. The latter value is justified by allowing for a safety margin in view of an average compound error of ±4.25 °C, and an uncertainty of ±4 °C (95% CI) for the most comprehensive error estimation published [12] (pp. 3-1, 3-30).
Because LAA degradation and melting phenomena appear to superimpose, the exact effects that experimental conditions have on fusion enthalpy measurements are not yet fully understood. In consequence, there still is considerable uncertainty in the literature as regards ∆H_1. Corvis [30] and Ziderman [33] quote seven values from six studies ranging from 37.0 to 47.1 kJ/mol. While the 95% CI (42.7 ± 4.0) kJ/mol agrees well with the mid-range value (42.1 kJ/mol), the median of 45.3 kJ/mol hints at an asymmetric and thus non-Gaussian distribution, rendering the use of mean and standard deviation doubtful. For lack of more consistent data, the rounded mid-range value and half the (max − min) span are taken as the "true" heat of fusion and its experimental uncertainty, respectively.

Solubilities
Calculated values of x_1 are compiled in Table 4 together with the quartiles Q_i and the span ratio (Q_3/Q_1) as determined from the MC simulation. Because symmetric boxcar probability distributions were used and x_1 = f(∆H_1, T_m,1, V_1, δ_1, δ_2) is a monotonic function with regard to each variable, the median solubility from simulation is expected to meet Q_2 ≈ x_1 within statistical scatter, which was confirmed by the results. Also, as is expected from Equation (1), increasing solubility is matched by a decreasing absolute solubility parameter difference ∆δ = |δ_1 − δ_2|. The numerical effect the material parameter uncertainties have on solubility will be analyzed in the following.
… Table S1 and material data from Tables 2 and 3.
Because all calculated solubilities meet x_1 < 0.002, the solvent volume fraction is well approximated by Φ_2 = 1, so that Equation (1) is simplified and the partially factorized expression for x_1 reads:
$$x_1 = \exp\left[-\frac{\Delta H_1}{RT}\left(1 - \frac{T}{T_{m,1}}\right)\right]\cdot\exp\left[-\frac{V_1}{RT}\left(\delta_1 - \delta_2\right)^2\right] \tag{2}$$
Keeping T at 25 °C, both the sensitivity of x_1 to the five material parameters and their individual contributions to the uncertainty of x_1 can approximately be assessed by computing the solubility for the maximal and minimal value of any one variable while the other variables stay fixed at their respective mid-range values. Since x_1 is a continuous and monotonic function in all variables, the span ratio R = (maximum x_1)/(minimum x_1) rates the influence a material parameter has on the target quantity solubility. Values for this uncertainty measure or, in the wording of the GUM, sensitivity measure are listed in Table 5. The exact expressions that were used for the calculation of the span ratio R are compiled in Table S2.
Table 5. Influence of variable uncertainty on molar solubility x_1 of l-ascorbic acid at 25 °C.
Evidently, the solubility parameter errors ∆δ_i, of solute and solvent alike, cause the largest uncertainty in x_1. Their influence on the solubility prognosis becomes more dominant the larger the solvent-solute difference ∆δ. Since ∆δ_i = ±1 (MPa)^0.5 already is a rather optimistic error estimate, it is unlikely that solubility uncertainty can be improved in this respect. The second strongest influence is exerted by the melting enthalpy. Its effect is independent of the solvent, and it is the only prominent factor for small ∆δ. Reducing the relative error of the fusion enthalpy to ±4% would substantially lower the corresponding (max/min) ratio from 17.8 to 1.6, offering a promising route towards a more precise solubility prediction. In the following, the MC results and experimental literature data presented in Figures 1 and 2 are treated.
The solid lines in Figure 1 correspond, from top to bottom, to the 100%, 97.5%, 75%, 25%, 2.5%, and 0% percentiles from simulation. The dotted line represents the analytical computation results with nodes (filled squares) for each solvent.
Besides the solubility as calculated from Equation (1), Figure 1 also depicts the 100%, 95%, and 50% CI as derived from the MC simulation. The diagram reveals the expected parabolic behavior for all CIs, as well as decreasing solubility uncertainty (narrowing CIs) as δ_1 and δ_2 become more similar. The latter point is taken up again in more detail in Figure 2.
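How a single-variable span ratio follows from the factorized solubility expression can be sketched as follows. The closed forms below are derived here from the assumed relation x_1 = exp[−(∆H_1/RT)(1 − T/T_m,1)]·exp[−(V_1/RT)(δ_1 − δ_2)²] and are not necessarily the expressions compiled in Table S2; function names and default values (T_m,1 = 463.75 K, V_1 = 109.9 cm³/mol) are illustrative assumptions.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def span_ratio_fusion_enthalpy(half_width, T=298.15, T_m=463.75):
    """(max x1)/(min x1) when dH_1 varies by +/- half_width (J/mol).

    The mid-range value of dH_1 cancels out, so the ratio is the same for all solvents.
    """
    return math.exp(2.0 * half_width * (1.0 - T / T_m) / (R * T))


def span_ratio_delta(delta_diff, half_width=1.0, V1=109.9, T=298.15):
    """(max x1)/(min x1) when one solubility parameter varies by +/- half_width."""
    smallest = max(abs(delta_diff) - half_width, 0.0)  # closest approach of delta_1, delta_2
    largest = abs(delta_diff) + half_width
    return math.exp(V1 * (largest ** 2 - smallest ** 2) / (R * T))


# A +/-4% error on a fusion enthalpy of 42 kJ/mol:
print(round(span_ratio_fusion_enthalpy(0.04 * 42000.0), 1))  # 1.6

# The delta-related ratio grows with the solute-solvent difference:
for diff in (2.0, 5.0, 10.0):
    print(diff, round(span_ratio_delta(diff), 2))
```

For |δ_1 − δ_2| larger than the half-width, the second expression reduces to exp(4·V_1·∆δ·u/RT) with u the half-width, which makes the growing dominance of the solubility parameter error with increasing ∆δ explicit.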
The finite number of simulations makes extreme values improbable to occur and, in line with the GUM approach, the 95% CI or interquartile range should be considered to provide a meaningful measure of uncertainty. The quotients of conjugate confidence limits range from 2.1 to 11 (50% CI) and 2.8 to 483 (95% CI), respectively, indicating substantial imprecision. Additionally, in Figure 1, experimental solubilities at 20 °C and 25 °C from [34–36] are plotted (see Table S3 for numerical data). A difference of 5 °C in solution temperature results in an 8% to 30% change in solubility, depending on solvent and study. The inter-study variability of x_1 was found to be marked, reaching even 435% for some solvents for unresolved reasons. Here, despite their magnitude, these discrepancies can be disregarded when compared with the 'intrinsic' uncertainty as evaluated above or with the systematic offset of the measured data. Compared with the 100% (50%) percentile, the experimental values are larger by a factor of 5 to 10^3 (20 to 2·10^4). Generally, higher solubility can be caused by a decrease of ∆H_1, T_m,1, V_1, or ∆δ = |δ_1 − δ_2|, or any combination thereof. The first three causes are ruled out because the uncertainties of these material parameters are already bracketed by the 100% CI as displayed in Figure 1. Considering the solubility parameters, it is evident that shifting all δ_2 values equally by some −5 (MPa)^0.5 could make the estimated and experimental data match.
This would correspond to a systematic and equally large error of the same sign for all solvents, which is considered extremely improbable. On the other hand, ∆δ also decreases if δ_1 (solute) becomes smaller. A non-linear least-squares fitting of Equation (1) to the experimental values, optimizing δ_1 and leaving all other parameters fixed at their respective mid-range values, consistently arrives at δ_1 ≈ 27 (MPa)^0.5, some 24% below the value of 35.3 (MPa)^0.5 derived according to HVK. Apparently, a too high solute solubility parameter could, in principle, explain the observed differences, but other reasons, such as a general insufficiency of the underlying theory or oversimplification due to dropping the heat capacity term in deriving Equation (1), could also play a role. Normally, this term is positive and effectively increases solubility to some extent, making, in principle, predicted and experimental data match better. Because (a) there is no indication as to the actual value of this term, and (b) it is considered very unlikely that a difference of 3 to 4 log(x_1)-units can be consistently explained along this line, this point needs to be left unaddressed. Another explanation for systematically predicting too low solubilities could be a solute activity coefficient that is consistently overestimated by the model for all solvents. With due caution, this could possibly hint at some specific solute-solvent interactions playing a role in this context [37]. Exploring this issue further would be beyond the scope of this study.
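A one-parameter fit of the kind mentioned above can be sketched as follows. The sketch assumes the regular-solution form of Equation (1) with Φ_2 = 1, minimizes the squared residuals of ln x_1 (an assumption; the residual definition used in this work is not restated here), and replaces a dedicated optimizer by a plain grid search. Because the experimental data of Table S3 are not reproduced here, the "observations" are synthetic values generated from the model itself with δ_1 = 27 (MPa)^0.5, so the example only demonstrates that the procedure recovers a known parameter. All names and solvent values are hypothetical.

```python
import math

R, T = 8.314462618, 298.15            # J/(mol K), K
DH, TM, V1 = 42000.0, 463.75, 109.9   # fixed mid-range inputs (J/mol, K, cm^3/mol)


def ln_x1(delta1, delta2):
    """ln of the mole fraction solubility from the assumed Equation (1), phi2 = 1."""
    return -(DH * (1.0 - T / TM) + V1 * (delta1 - delta2) ** 2) / (R * T)


def fit_delta1(delta2_values, ln_x_observed, lo=20.0, hi=40.0, step=0.01):
    """Grid search for the delta_1 that minimizes the sum of squared ln(x1) residuals."""
    best, best_sse = lo, float("inf")
    for i in range(int(round((hi - lo) / step)) + 1):
        d1 = lo + i * step
        sse = sum(
            (ln_x1(d1, d2) - y) ** 2 for d2, y in zip(delta2_values, ln_x_observed)
        )
        if sse < best_sse:
            best, best_sse = d1, sse
    return best


# Synthetic demonstration data (NOT experimental values).
solvent_deltas = [15.0, 20.0, 25.0, 30.0]
synthetic_ln_x = [ln_x1(27.0, d2) for d2 in solvent_deltas]

fitted = fit_delta1(solvent_deltas, synthetic_ln_x)
print(round(fitted, 2))
```

With real data, the residuals at the optimum would additionally indicate whether a single shifted δ_1 can describe all solvents consistently.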
Overall, the results show that the deterministically calculated solubility values scatter strongly owing to the inherent inaccuracies of the material data. The quality of the material data of LAA is well within the usual range and can be classified as typical. Furthermore, the accuracy of the determination of the HSP values by group contribution methods is approximately the same for all compounds. Allowing for the possibility that the specific way of calculating the solubilities (purely deterministic versus numerical-iterative) has an influence, one can assume that the statements about LAA are also valid for compounds with similar material data uncertainty and solubility.

Conclusions
Solubility estimates for l-ascorbic acid utilizing the Hildebrand-Scott relation are subject to severe uncertainty. Quantifying uncertainty for solvents with solubility parameters between 14.5 and 33.0 (MPa)^0.5 yields upper and lower confidence limits that are up to a factor of 10 (50% CI) and 500 (95% CI) apart, respectively. With the physicochemical data on LAA available to date, solubility uncertainty originates mainly from the HSP of both solvent and solute, and from the solute heat of fusion. The most promising route towards achieving better accuracy is to obtain more precise values for the heat of fusion. Computed and experimental solubilities show the same qualitative behavior, although quantitative discrepancies are distinct, reaching one to three orders of magnitude for reasons still to be resolved. Further, it is assumed that the prediction uncertainty deduced for LAA in this work applies generally, and to the same extent, to HSP-based solubility estimates of other compounds that possess comparable material data quality and solubility. In consequence, it appears that the notion of a semi-quantitative solubility prediction using the HSP methodology should be abandoned.