Drug Solubility Correlation Using the Jouyban–Acree Model: Effects of Concentration Units and Error Criteria

An important factor affecting the model accuracy is the unit expression type for solute and solvent concentrations. One can report the solute and solvent concentration in various units and compare them with various error scales. In order to investigate the unit and error scale expression effects on the accuracy of the Jouyban–Acree model, in the current study, seventy-nine solubility data sets were collected randomly from the published articles and solute and solvent concentrations in the investigated systems were expressed in various units. Mass fraction, mole fraction, and volume fraction were the employed concentration units for the solvent compositions, and mole fraction, molar, and gram/liter were the investigated concentration units for the solutes. The solubility data, with various solute/solvent concentration units, were correlated using the Jouyban–Acree model, and the accuracy of each model for correlating the data was investigated by calculating different error scales and discussed.


Introduction
Solubility is an important physico-chemical property that is required for various applications in the pharmaceutical industry [1]. These applications include choosing a proper solvent for synthesis, extraction, and purification, and choosing dissolving media for assessing the biological activity of a drug or drug-like compound. The solubility of drugs in mono-solvents and in binary and ternary mixtures is commonly obtained by experimental measurement, which is a costly and time-consuming procedure. Another main limitation of experimental measurement arises in early drug discovery studies, where only small amounts of the drug powder are available and many experimental determinations need to be performed. As a possible solution, a number of mathematical models have been reported for the correlation/prediction of solubility data in mono-solvents or binary mixtures. These models for estimating the solubility of drugs were reviewed by various research groups [2-6]. In pharmaceutical applications, the accuracy, simplicity, and amount of required input data of these models are important factors in their acceptance by pharmaceutical investigators. An important factor affecting the model accuracy is the type of unit used to express the solute and solvent concentrations. One can report the solute and solvent concentrations in various units (e.g., mass fraction, mole fraction, and volume fraction for solvents and mole fraction, molar, and gram/liter for solutes). In the case of solute concentration, molarity and gram/liter are volumetric scales that relate the amount of the solute (in moles or grams, respectively) to the volume of the solution. In contrast, the mole fraction is a gravimetric scale and is related to the number of moles of both solute and solvents. In the case of solvent concentration, mass fraction and mole fraction are gravimetric scales, whereas volume fraction is a volumetric scale [7].
Gravimetric scales are relatively robust; however, a volumetric scale can be affected by temperature, due to the possible expansion of the solution, especially at higher temperatures. Different cosolvency models have been used for correlating the solubility of solutes in solvent mixtures [8,9]. The Jouyban-Acree model is one of the most accurate of these models and has recently attracted more attention. In order to investigate the effects of unit expression on the accuracy of the Jouyban-Acree model, the aims of this work were: (1) to collect solubility data for several drugs in different solvent mixtures and express them in various units; (2) to fit each data set to the Jouyban-Acree model with various units and compute the deviation of the back-calculated data; and (3) to compare the suitability of various accuracy criteria.
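The relation between the gravimetric and volumetric solute scales described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the example molar mass, and the density value are hypothetical and are not taken from the investigated data sets. The solution density is the temperature-dependent quantity that links the mole fraction to the molar and g/L scales.

```python
def mole_fraction_to_molar(x_solute, solvent_fractions, solvent_molar_masses,
                           solute_molar_mass, density_g_per_ml):
    """Convert a solute mole fraction to molarity (mol/L).

    solvent_fractions are solute-free solvent mole fractions (summing to 1);
    density_g_per_ml is the density of the saturated solution.
    """
    # Mean molar mass of the solute-free solvent mixture (g/mol).
    m_solvent = sum(f * m for f, m in zip(solvent_fractions, solvent_molar_masses))
    # Mass (g) of one mole of solution: x mol solute + (1 - x) mol solvent.
    mass_g = x_solute * solute_molar_mass + (1.0 - x_solute) * m_solvent
    # Volume (L) occupied by that one mole of solution.
    volume_l = mass_g / (density_g_per_ml * 1000.0)
    return x_solute / volume_l


def molar_to_gram_per_liter(c_molar, solute_molar_mass):
    """Convert molarity (mol/L) to mass concentration (g/L)."""
    return c_molar * solute_molar_mass


# Hypothetical example: solute of 200 g/mol in pure water, density 1.0 g/mL.
c_molar = mole_fraction_to_molar(0.01, [1.0], [18.015], 200.0, 1.0)
c_gl = molar_to_gram_per_liter(c_molar, 200.0)
```

Because the density enters only the volumetric scales, a change in temperature alters the molar and g/L values even when the mole fraction is unchanged.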

Experimental Data Sets and Computational Methods
The solubility data sets collected from the literature (a total of 79 data sets) were fitted to the Jouyban-Acree model, and each analysis is explained in detail. Drug concentrations were converted using the molecular masses of the drugs and/or the density values of the saturated solutions. The solvent compositions were converted employing the densities of the solvent mixtures taken from the literature. Various combinations of the solute/solvent concentration units were analyzed in this work. Table 1 lists the details of these expressions. All fraction concentration units, i.e., mole fraction, mass fraction, and volume fraction, varied in the range of 0.0-1.0. The minimum molar concentration of the investigated data points was $3.8 \times 10^{-6}$ mol/L, or $1.0 \times 10^{-3}$ g/L (for the sulfadiazine datum in water at 293.2 K, SN = 59), and the maximum value (16.2 mol/L, corresponding to 2786.6 g/L) belonged to sulfanilamide in 1,4-dioxane + water (SN = 70, 0.5 + 0.5 mole fractions) at 323.2 K. A major part of these wide variations was compensated by the first two terms of the Jouyban-Acree model, in the logarithmic scale, and the range of variation of the obtained excess values was much narrower. The Jouyban-Acree model, the most accurate cosolvency model [8], is described as follows:

$$\ln x_{m,T} = w_1 \ln x_{1,T} + w_2 \ln x_{2,T} + \frac{w_1 w_2}{T} \sum_{i=0}^{2} J_i (w_1 - w_2)^i \qquad (1)$$

where $x_{1,T}$, $x_{2,T}$, and $x_{m,T}$ represent the solubility of the solute in mono-solvents one and two and in the mixed solvents, respectively, in various concentration units (in this work, mole fraction, molar, and g/L) at a temperature $T$. The $w_1$ and $w_2$ terms stand for the concentrations of mono-solvents one and two in the absence of the solute. In this work, these parameters are expressed in various units (mole fraction, mass fraction, and volume fraction). The $J_i$ terms are the parameters of the model and are computed by regression analysis of $(\ln x_{m,T} - w_1 \ln x_{1,T} - w_2 \ln x_{2,T})$ against $\frac{w_1 w_2}{T}$, $\frac{w_1 w_2 (w_1 - w_2)}{T}$, and $\frac{w_1 w_2 (w_1 - w_2)^2}{T}$. The number of parameters (np) is usually two but, for some cases, up to three or even four can be used.
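The regression described above can be sketched with NumPy as an ordinary least-squares fit without an intercept, since Equation (1) has no constant term. The function names and the synthetic data below are assumptions for illustration only; they are not the authors' software or any of the 79 data sets.

```python
import numpy as np


def ja_design_matrix(w1, T, n_terms=3):
    """Regressors w1*w2*(w1 - w2)**i / T for i = 0 .. n_terms - 1."""
    w1 = np.asarray(w1, dtype=float)
    T = np.broadcast_to(np.asarray(T, dtype=float), w1.shape)
    w2 = 1.0 - w1
    return np.column_stack([w1 * w2 * (w1 - w2) ** i / T for i in range(n_terms)])


def fit_ja(w1, T, ln_xm, ln_x1, ln_x2, n_terms=3):
    """Estimate the J terms from the excess log-solubility (no intercept)."""
    w1 = np.asarray(w1, dtype=float)
    excess = np.asarray(ln_xm, dtype=float) - w1 * ln_x1 - (1.0 - w1) * ln_x2
    J, *_ = np.linalg.lstsq(ja_design_matrix(w1, T, n_terms), excess, rcond=None)
    return J


def predict_ln_xm(w1, T, ln_x1, ln_x2, J):
    """Back-calculate ln x_m,T from the fitted J terms."""
    w1 = np.asarray(w1, dtype=float)
    ideal = w1 * ln_x1 + (1.0 - w1) * ln_x2
    return ideal + ja_design_matrix(w1, T, len(J)) @ np.asarray(J, dtype=float)


# Synthetic, noise-free illustration (all values hypothetical).
w1 = np.linspace(0.0, 1.0, 11)            # solvent 1 fraction, solute-free
T = 298.2                                 # temperature in K
ln_x1, ln_x2 = -4.0, -8.0                 # ln solubility in the mono-solvents
J_true = np.array([500.0, -200.0, 100.0])
ln_xm = predict_ln_xm(w1, T, ln_x1, ln_x2, J_true)
J_fit = fit_ja(w1, T, ln_xm, ln_x1, ln_x2)
```

The same routine applies to any of the solute/solvent unit combinations: only the numerical values passed as `w1`, `ln_xm`, `ln_x1`, and `ln_x2` change.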
The experimental solubility data ($x_{Exp.}$), in the current work, were fitted to the model, and the back-calculated solubility data ($x_{Cal.}$) were used to compute several indices of error evaluation, including the percentage of mean relative deviation (MRD%), the root-mean-square deviation in the arithmetic scale (RMSD$_1$), the RMSD in the logarithmic scale (RMSD$_2$), the error in the arithmetic scale (E$_1$), and the error in the logarithmic scale (E$_2$), computed using Equations (2)-(6), where $N$ is the number of data points in each set.
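As Equations (2)-(6) are not reproduced in this text, the following sketch uses conventional definitions of these indices as an assumption. The normalization chosen here for E$_1$ and E$_2$ (mean absolute deviation) in particular may differ from the authors' definitions, and the function name is hypothetical.

```python
import numpy as np


def error_indices(x_exp, x_cal):
    """Error indices under ASSUMED conventional definitions (not the paper's own)."""
    x_exp = np.asarray(x_exp, dtype=float)
    x_cal = np.asarray(x_cal, dtype=float)
    diff = x_cal - x_exp                       # arithmetic-scale deviation
    log_diff = np.log(x_cal) - np.log(x_exp)   # logarithmic-scale deviation
    return {
        "MRD%": 100.0 * np.mean(np.abs(diff) / x_exp),
        "RMSD1": np.sqrt(np.mean(diff ** 2)),
        "RMSD2": np.sqrt(np.mean(log_diff ** 2)),
        "E1": np.mean(np.abs(diff)),           # assumed: mean absolute error
        "E2": np.mean(np.abs(log_diff)),       # assumed: mean absolute log error
    }


# Hypothetical data in which every point is off by 10% in relative terms.
indices = error_indices([1.0, 2.0, 4.0], [1.1, 1.8, 4.4])
```

Only MRD% divides each deviation by the observed value; the arithmetic-scale indices keep the units of the data.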

Results and Discussion
The solubility data of each drug, expressed in mole fraction, molar, and gram/liter units, in the binary solvent mixtures with solvent compositions expressed in mole fraction, mass fraction, and volume fraction, defined as codes 1-9 (see Table 1 for details), were fitted to Equation (1). More details of the collected data sets are listed in Supplementary Table S1. The back-calculated data were used to compute various error evaluation criteria. When the overall MRD% values (for codes 1-9) were classified according to the drug, the largest value was obtained for the ketoconazole data sets (overall MRD% = 25.7) and the smallest value was observed for the dapsone data sets (overall MRD% = 4.9). The error values obtained for each numerical method, expressed in MRD%, are listed in Supplementary Table S2. The largest MRD% values for codes 1-3 were observed for the solubility of ketoconazole in the carbitol + water system (SN = 12), and those for codes 4-9 were obtained for ketoconazole in the acetonitrile + water system (SN = 11). Figure 1 illustrates the overall MRD% values and their standard deviations (SDs) for the 79 data sets and the different numerical analysis codes. As can be seen from the results, there was no significant difference in overall MRD% values within codes 1-3 or within codes 4-9; however, there was a significant difference between these two subgroups. These results mean that the drug concentration unit did not affect the fitting capability of the Jouyban-Acree model when MRD% was considered as the error criterion; however, the unit used for the concentration of the solvents in the absence of the drug might affect the fit of the model to the experimental data. Careful examination of the distributions of the various solvent compositions revealed that the mean value of the mole fractions was 0.36, whereas those of the mass and volume fractions were both 0.52.
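The skew of the mole fractions noted above can be illustrated by converting an evenly spaced grid of mass fractions to mole fractions. The sketch below assumes a cosolvent with a nominal molar mass of 400 g/mol (e.g., PEG 400) mixed with water and an 11-point grid; both are illustrative assumptions, not the compositions of any specific data set.

```python
def mass_to_mole_fraction(w1, m1, m2):
    """Mole fraction of solvent 1 from its mass fraction w1 (solute-free basis)."""
    n1 = w1 / m1           # moles of solvent 1 per gram of mixture
    n2 = (1.0 - w1) / m2   # moles of solvent 2 per gram of mixture
    return n1 / (n1 + n2)


M_COSOLVENT = 400.0   # g/mol, assumed nominal value
M_WATER = 18.015      # g/mol

mass_fractions = [i / 10 for i in range(11)]   # 0.0, 0.1, ..., 1.0
mole_fractions = [mass_to_mole_fraction(w, M_COSOLVENT, M_WATER)
                  for w in mass_fractions]

mean_mass = sum(mass_fractions) / len(mass_fractions)
mean_mole = sum(mole_fractions) / len(mole_fractions)
```

Under these assumptions the mean mass fraction is 0.50, whereas the mean mole fraction falls well below 0.50, because most of the grid points collapse toward the water-rich end of the mole fraction scale.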
Our earlier observations showed that the Jouyban-Acree model provided the most accurate correlations when the fractions were equally spaced (i.e., a mean fraction of 0.50), and the observed differences between codes 1-3 (solvent composition expressed as mole fraction) and codes 4-9 (expressed as volume or mass fraction) could be explained by the skewness of the mole fractions. Another difference in these analyses was the variation in the numerical values of the model constants of Equation (1) and in the number of significant J terms of the Jouyban-Acree model. As an example, the $J_0$, $J_1$, $J_2$, and MRD% values obtained for the solubility data of sulfadiazine in acetonitrile + methanol mixtures (SN = 60) are listed in Table 2. The means of the mole (0.46), mass (0.50), and volume (0.50) fractions of the solvent composition in this set were nearly equal. Similar investigations were carried out on the solubility data of paracetamol in PEG 400 + water (SN = 55), with the means of the mole (0.14), mass (0.50), and volume (0.48) fractions of the solvent composition. The largest deviation from 0.50 was observed for the mole fraction data, and the obtained MRD% for code 1 was 14.2%. Meanwhile, the corresponding values for the mass (code 4) and volume (code 7) fractions were 3.3 and 3.0%, respectively. In another data set, i.e., meloxicam in ethanol + ethyl acetate (SN = 55), with the means of the mole (0.50), mass (0.58), and volume (0.60) fractions of the solvent composition, the MRD% values for codes 1, 4, and 7 were 14.2, 3.3, and 3.0%, respectively. Table 3 lists the effects of different numbers of J terms on the MRD% values for SN = 60. As expected, more accurate correlations were obtained when more curve-fitting parameters, i.e., J terms, were employed. According to the theoretical justification of the Jouyban-Acree model [8,9], the J terms represent the non-ideal mixing behavior of the solution.
For ideal mixing behavior, all J terms are non-significant constants and the Jouyban-Acree model reduces to the Yalkowsky model [10]. The Yalkowsky model is an algebraic linear model which considers ideal mixing for solvent mixtures without any energy exchanges. This model is expressed as:

$$\ln x_{m,T} = w_1 \ln x_{1,T} + w_2 \ln x_{2,T}$$

Supplementary Table S3 reports the RMSD$_1$ accuracy criterion, and Supplementary Table S4 reports the RMSD$_2$ accuracy criterion, for the investigated systems. The overall 100 RMSD$_2$ values varied from 10.5 (for codes 8 and 9) to 19.8 (for code 1). Concerning the solvent compositions, the order of the 100 RMSD$_2$ values for the mole fraction solubility of the drugs was volume fraction (10.5) < mass fraction (10.9) < mole fraction (19.8). The corresponding orders for the molar and g/L drug concentrations were volume fraction (10.5) < mass fraction (11.0) < mole fraction (19.4) and volume fraction (10.5) < mass fraction (11.1) < mole fraction (19.4), respectively. Similar to the RMSD$_1$ case, the numerical values of drug solubility in the saturated solutions were the governing parameters in the RMSD$_2$ calculations, in which the overall 100 RMSD$_2$ values were 13.7, 13.6, and 13.7, respectively, for the mole fraction, molar, and g/L solubilities of the drugs. The largest 100 RMSD$_2$ value was observed for the solubility of ketoconazole in the carbitol + water system (SN = 12) for codes 1-3 and for ketoconazole in NMP + ethanol (SN = 14) for codes 4-9.
Supplementary Table S5 lists the details of E$_1$ for the different codes; the largest E$_1$ values were observed for the solubility of sulfanilamide in 1,4-dioxane + water (SN = 70) for codes 1-9. Table 4 lists the overall E$_1$ values obtained for the various drugs according to the investigated codes. Supplementary Table S6 reports the details of the E$_2$ values. E$_1$ and E$_2$ are the absolute errors, or variances, in the arithmetic and logarithmic scales, respectively. The absolute error uses the same scale as the data being measured. Therefore, for solubility in g/L units (especially in the arithmetic scale), high error values can be recorded, which makes comparison of the data difficult. RMSD$_1$ and RMSD$_2$ are root-mean-square deviations in the arithmetic and logarithmic scales, respectively. These error criteria are the square root of the mean squared deviation and, similar to the absolute error, are related to the units of measurement. However, MRD%, as a mean relative deviation, facilitates the comparison between data sets or models with different scales because it normalizes the data by dividing the deviation by the observed values. From this point of view, the MRD% definition is similar to %RSD (relative standard deviation, which is used as a repeatability and reproducibility index for repeated measurements) and may be the best error criterion.
Furthermore, the solubility data for the investigated systems (different solutes and different solvents) lie in different value ranges, and highly soluble compounds show high absolute errors in both the arithmetic and logarithmic scales; therefore, absolute errors cannot be used to compare different systems and to find the system with the highest error. Here, MRD% can be a helpful error metric for the comparison of different systems, because its normalizing property puts the data in a similar and comparable range.
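The scale dependence discussed above can be illustrated with two hypothetical systems that share the same relative deviation but differ by four orders of magnitude in solubility. The data, the helper names, and the use of a mean absolute error as the arithmetic-scale criterion are assumptions made for this sketch.

```python
import numpy as np


def mrd_percent(x_exp, x_cal):
    """Mean relative deviation in percent."""
    x_exp = np.asarray(x_exp, dtype=float)
    x_cal = np.asarray(x_cal, dtype=float)
    return 100.0 * np.mean(np.abs(x_cal - x_exp) / x_exp)


def mean_abs_error(x_exp, x_cal):
    """Arithmetic-scale absolute error (assumed form, same units as the data)."""
    x_exp = np.asarray(x_exp, dtype=float)
    x_cal = np.asarray(x_cal, dtype=float)
    return np.mean(np.abs(x_cal - x_exp))


# Hypothetical poorly soluble and highly soluble systems, both 10% overestimated.
x_low = np.array([1e-5, 2e-5, 4e-5])
x_high = np.array([0.1, 0.2, 0.4])

mrd_low, mrd_high = mrd_percent(x_low, 1.1 * x_low), mrd_percent(x_high, 1.1 * x_high)
mae_low, mae_high = mean_abs_error(x_low, 1.1 * x_low), mean_abs_error(x_high, 1.1 * x_high)

# Re-expressing the data with a constant factor (hypothetical molar mass,
# as in a mol/L to g/L conversion) leaves MRD% unchanged.
M = 250.0
mrd_rescaled = mrd_percent(M * x_low, M * 1.1 * x_low)
```

Both systems give the same MRD%, whereas the arithmetic-scale error of the highly soluble system is about four orders of magnitude larger, although the quality of the fit is identical.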
Correlations between the various error criteria and MRD% are shown in Figure 2. Code 4 values were chosen as the representative ones for showing the correlations. As presented in Figure 2, good correlations were observed between the E$_2$ and RMSD$_2$ error criteria and MRD%, with the data scattered around the line. However, in the case of RMSD$_1$ and E$_1$, some significant deviations were observed when the models were assessed using the MRD% criterion.
In another effort, the effect of outlier data points on the behavior of the error indices was also investigated. For this purpose, we intentionally changed a datum in several reported data sets and studied the trend of each error metric. Code 4 values were, again, chosen as the representative ones for showing the correlation. For example, the solubility value for dapsone in ethanol + water (SN = 5) at 298.2 K at an ethanol mass fraction of 0.5 (i.e., 0.000930) was changed to 0.00930. The error increased from 8.9 to 10.5% for MRD%, from 24.74 to 83.28 for $10^5$ RMSD$_1$, from 1.37 to 2.24 for E$_1$, from 11.52 to 24.46 for 100 RMSD$_2$, and from 0.089 to 0.11 for E$_2$. In another investigation, the solubility value for naproxen in ethylene glycol + ethanol at 298.2 K at an ethylene glycol mass fraction of 0.5 (i.e., 0.0135) was changed to 1.35. The error increased from 1.7 to 15.46% for MRD%, from 27.36 to 17959.59 for $10^5$ RMSD$_1$, from 1.97 to 260.33 for E$_1$, from 2.4 to 60.30 for 100 RMSD$_2$, and from 0.02 to 0.2 for E$_2$. Large deviations were observed for the overestimated/underestimated data points, and all error criteria could be employed to detect outliers.

Conclusions
In the current study, the Jouyban-Acree model was used to correlate solubility data sets in various binary solvent mixtures with different solute/solvent concentration units and to compare the suitability of the various units by computing several accuracy criteria. The obtained results show that MRD% can be the best error metric, as it facilitates the comparison between data sets, or models, with different scales by normalizing the data. Considering this error criterion for the back-calculated data with various solute/solvent concentrations, the results show that the solute concentration unit does not affect the fitting capability of the Jouyban-Acree model. Meanwhile, the unit used for the concentration of the solvents in the absence of the drug might affect the fit of the model to the experimental data. The number of curve-fit parameters of the Jouyban-Acree model was also affected by the solute/solvent concentration expressions. However, this incompatibility can be compensated for by using equally spaced fractions for each selected solvent composition unit.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/molecules27061998/s1, Table S1: Drug Concentration; Table S2: MRD%s of back-calculated data using Equation (1) for solubility; Table S3: 100000 RMSD$_1$ of back-calculated data using Equation (1) for solubility of investigated drugs in the studied solvent mixtures obtained for different concentration units; Table S4: 100 RMSD$_2$ of back-calculated data using Equation (1) for solubility of investigated drugs in the studied solvent mixtures obtained for different concentration units; Table S5: E$_1$ of back-calculated data using Equation (1) for solubility of investigated drugs in the studied solvent mixtures obtained for different concentration units; Table S6: E$_2$ of back-calculated data using Equation (1)