Predicting Vapour Pressures of Organic Compounds from Their Chemical Structure for Classification According to the VOC-Directive and Risk Assessment in General

The use of organic compounds in the European Union will in the future be regulated in accordance with Council Directive 1999/13/EC of 11 March 1999 [1]. In this directive, any organic compound is considered to be a volatile organic compound (VOC) if it has a vapour pressure of 10 Pa or more at 20 °C, or has a corresponding volatility under the particular condition of use. Introducing such a limit will sometimes create problems, because vapour pressures cannot be determined with infinite accuracy. Published vapour pressures for a true VOC will sometimes be found to be below 10 Pa, and vice versa. When the same limit was introduced in the USA, a considerable amount of time and money was spent in vain on comparing incommensurable data [2]. In this paper, a model is presented for predicting the vapour pressures of VOCs at 20 °C from their chemical (UNIFAC) structure. The model is implemented in a computer program, named P_PREDICT, which has larger prediction power close to 10 Pa at 20 °C than the other models tested. The main advantage of the model, however, is that no experimental data, which would introduce uncertainty in the predictions, are needed. Classification using P_PREDICT, which predicts only one value for a given UNIFAC structure, is proposed. Organic compounds that can be described by the UNIFAC groups in the present version of P_PREDICT can therefore be classified unambiguously as either VOCs or non-VOCs. Most people, including the present authors, feel uneasy about prioritising precision above accuracy. Modelling vapour pressures, however, could save a lot of money, and the errors introduced are not large enough to have any substantial adverse effects on either human beings or the environment. A method for calculating vapour pressures at temperatures other than 20 °C is tested with a dubious result. This method is used for EU risk assessment of new and existing chemicals.


Introduction
The Council Directive 1999/13/EC of 11 March 1999 on the limitation of emissions of volatile organic compounds due to the use of organic solvents in certain activities and installations was published in the Official Journal of the European Communities. It is hereafter referred to as the VOC-directive. The VOC-directive defines a volatile organic compound as an organic compound having at 293.15 K a vapour pressure of 0.01 kPa or more, or having a corresponding volatility under the particular condition of use [1].
Data on vapour pressures for low boiling compounds are relatively abundant in the literature, while data for high boiling compounds are often scarce. This is especially the case for compounds having vapour pressures close to or below 10 Pa at 20 °C.
For this study, two commercial databases have been examined: the DIPPR database from the Design Institute of Physical Properties (DIPPR) [3] and the TRC database from the University of Texas [4]. In both databases, vapour pressures are given as a function of temperature. For compounds which are solids at 20 °C, the vapour pressures are given for the sub-cooled liquid.
The prices for these databases are quite high for small and medium sized enterprises, which might only need to search for vapour pressures for a few compounds each year. A cheap alternative would be to consult the International Uniform Chemical Information Database (IUCLID) issued on CD-ROM by the European Commission Joint Research Centre [5]. In IUCLID, only vapour pressures at specific temperatures are given, but for most compounds, data are given at 20 °C.
Examining these three databases, conflicting data were encountered in several cases, even though the DIPPR and TRC data have been evaluated by experts. For instance, in the TRC database the vapour pressure of dihexyl phthalate (molar mass 334.45 Dalton) is given as 0.00095 Pa at 20 °C, whereas that of didodecyl phthalate (molar mass 502.78 Dalton) at the same temperature is given as 0.00096 Pa. Since the molar mass of the latter is 1.5 times that of the former, one or both of these values must be erroneous. Table 1 shows that the vapour pressure of 1-hexanol is 50% higher in the DIPPR database than in the TRC database and that the DIPPR value is very close to the value found by N'Guimbi et al. [6]. Further, Table 1 shows that the value found in the TRC database for 1-decanol is more than a factor of eight higher than the value found for the same compound in the DIPPR database. Again, the value in the DIPPR database is very close to the value of N'Guimbi et al. [6].
Table 1. Vapour pressures [Pa] from DIPPR [3], TRC [4], IUCLID [5], and N'Guimbi et al. [6]. NA: no data are available. 1) Reported to be out of range. 2) Vapour pressure at 25 °C.
Table 1 also shows data for methoxyacetic acid and triethanolamine. Without additional data or a validated model, it is impossible to decide which values are the correct ones, even when, as in these cases, the data differ by orders of magnitude.
Most data on vapour pressures originate from industry or industry related research. Vapour pressures, therefore, are often measured at process temperatures, i.e. in the range of 150-300 °C. Since vapour pressure is exponentially related to temperature, extrapolation from 150-300 °C to 20 °C is rather uncertain, unless accurate data measured at ambient temperatures are available.
The introduction of the VOC-directive [1] causes a special problem, because vapour pressures cannot be measured with infinite accuracy. As money is at stake, fruitless discussions may be expected when doubt can be raised whether a compound has a vapour pressure above or below the 10 Pa limit. Sometimes those who argue that a given compound should be classified as a VOC may be able to find "hard" experimental evidence which supports this view. At the same time, their opponents may be able to present data which show the opposite. This problem is illustrated in Table 2. As the data in Table 2 show, various values for the vapour pressure of n-dodecane can be obtained, using either experimental data or models. No means exist for choosing the value closest to the (unknown) true vapour pressure of n-dodecane.

Existing Models
For risk assessment in the occupational and ambient environments, the demand for accuracy in vapour pressure data is not crucial, because other data used in the risk assessment process, e.g. toxicological data, commonly have large uncertainties. Models, therefore, may be used with confidence for risk assessment in general or for classification in particular, even if they are not perfectly accurate.
The vapour pressure of a pure substance can be calculated using the Clausius-Clapeyron equation, which can be derived from fundamental thermodynamic relations [12]:
$$\ln p = -\frac{\Delta h_{Vap}}{R\,T} + C \qquad (1)$$
where $p$ is the pure compound vapour pressure, $\Delta h_{Vap}$ is the molar evaporation enthalpy, $R$ is the gas constant, $T$ is the absolute temperature, and $C$ is an integration constant. The Clausius-Clapeyron equation (1) has been derived under the following three assumptions:
1. The molar gas volume is assumed to be much larger than the molar liquid volume.
2. The gas phase is assumed to be ideal.
3. $\Delta h_{Vap}$ is assumed to be constant.
The molar volume of an ideal gas at 20 °C is 24054.7 cm³ mol⁻¹. The liquid molar volume of e.g. toluene is 106.0 cm³ mol⁻¹ [13], i.e. below 0.5 percent of the molar volume of an ideal gas. The first two assumptions, therefore, are not likely to introduce large errors. However, $\Delta h_{Vap}$ is not constant, but strongly dependent on temperature. For instance, the evaporation enthalpy of n-dodecane at 20 °C is 60.7 kJ mol⁻¹, whereas it is 44.4 kJ mol⁻¹ at 216.3 °C, which is the normal boiling point of n-dodecane [13]. Disregarding the temperature dependence of the molar evaporation enthalpy, therefore, can be expected to cause poor predictions. This is investigated below for two models based directly on the Clausius-Clapeyron equation and one model which has two correction terms added. These models all need different experimental data as input.
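The volume comparison behind the first assumption can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a reference pressure of one standard atmosphere (101325 Pa) and the current CODATA value of the gas constant, neither of which is stated in the text; the function name is hypothetical. A slightly different value of the gas constant shifts the last digits of the result.

```python
# Check of assumption 1: molar volume of an ideal gas versus a liquid.
R = 8.314462618        # gas constant, J mol^-1 K^-1 (assumed value)
P_ATM = 101325.0       # Pa, assumed reference pressure (one standard atmosphere)
T_20C = 293.15         # 20 degrees Celsius expressed in K


def ideal_gas_molar_volume_cm3(temperature_k, pressure_pa=P_ATM):
    """Molar volume of an ideal gas, V = R*T/p, converted from m^3 to cm^3."""
    return R * temperature_k / pressure_pa * 1.0e6


v_gas = ideal_gas_molar_volume_cm3(T_20C)
v_liquid_toluene = 106.0   # cm^3 mol^-1, liquid molar volume quoted in the text [13]
ratio_percent = 100.0 * v_liquid_toluene / v_gas

print(f"ideal gas: {v_gas:.1f} cm3/mol, liquid/gas ratio: {ratio_percent:.2f} %")
```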

Knowing the Normal Boiling Temperature
The vapour pressure of a pure compound can be calculated from the Clausius-Clapeyron equation applying Trouton's rule. Trouton's rule says that $\Delta h_{Vap} = \Delta h_{Vap,T_b} = 88\,T_b$ [7], where $\Delta h_{Vap,T_b}$ is the molar evaporation enthalpy at the normal boiling point (in J mol⁻¹) and $T_b$ is the normal boiling point (in K). Figure 1 shows an experimental data set on log-vapour pressures at 20 °C from DIPPR [3] versus calculated values using Trouton's rule in combination with the Clausius-Clapeyron equation (1). Figure 1 shows that the Trouton model works fine for low boiling compounds, but for high boiling compounds the vapour pressures are overestimated.
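How Trouton's rule turns a normal boiling point into a vapour pressure at 20 °C can be sketched as follows. This is an illustration under stated assumptions, not the authors' calculation: it takes the integrated Clausius-Clapeyron form ln(p/p_atm) = (Δh_Vap/R)(1/T_b - 1/T), a reference pressure of 101325 Pa at the normal boiling point, and a hypothetical function name. The boiling point of n-dodecane (216.3 °C) is the value quoted earlier in the text.

```python
import math

R = 8.314            # gas constant, J mol^-1 K^-1
P_ATM = 101325.0     # Pa, vapour pressure at the normal boiling point (assumed reference)
TROUTON = 88.0       # J mol^-1 K^-1, Trouton's rule: dh_vap = 88 * T_b


def trouton_vapour_pressure(t_boil_k, temperature_k=293.15):
    """Vapour pressure in Pa from the normal boiling point only.

    Uses a constant evaporation enthalpy from Trouton's rule in the
    integrated Clausius-Clapeyron equation (an assumed form).
    """
    dh_vap = TROUTON * t_boil_k
    ln_ratio = (dh_vap / R) * (1.0 / t_boil_k - 1.0 / temperature_k)
    return P_ATM * math.exp(ln_ratio)


# n-dodecane, normal boiling point 216.3 degrees Celsius [13]
p_dodecane = trouton_vapour_pressure(216.3 + 273.15)
print(f"n-dodecane at 20 C (Trouton estimate): {p_dodecane:.1f} Pa")
```

Because the evaporation enthalpy at 20 °C is larger than at the boiling point, holding it at the Trouton value flattens the curve and pushes the low-temperature estimate upwards, which is the overestimation the figure shows for high boiling compounds.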

Knowing the Normal Boiling Temperature, Critical Pressure and Critical Temperature
Many models for prediction of evaporation enthalpies from critical properties, normal boiling temperature, and a series of model parameters have been collected in Reid et al. [8; 11]. They perform more or less the same.
Figure 2 shows predictions of ln p using the evaporation enthalpy, as predicted by the Giacalone model [8], in combination with the Clausius-Clapeyron equation versus DIPPR [3] data. Again, the model fits very well for low boiling compounds, but the vapour pressures for high boiling compounds are overestimated.

Polynomial in Absolute Temperature
The Riedel-Plank-Miller model is a polynomial in the absolute temperature (Equation (2)). The constants A-D can be calculated from critical pressure and critical temperature, normal boiling temperature, and a series of model parameters [11]. In Figure 3, data for calculation of the constants A-D have been taken from SUBTEC [13]. Figure 3 shows that the vapour pressures of low boiling compounds are predicted accurately, but prediction of vapour pressures for high boiling compounds is less accurate. The model appears to have only a modest systematic error (bias).

Accuracy of Model Predictions
Figures 1-3 show that for low boiling compounds the prediction power of the models tested is rather high, but it is low for high boiling compounds. That is not surprising, because these models were developed for low boiling compounds or high boiling compounds at high temperatures. For high boiling compounds, the models based on Clausius-Clapeyron and an estimated value for the evaporation enthalpy systematically overestimate the vapour pressures at 20 °C. The performance of the Riedel-Plank-Miller model is better, but it does not predict vapour pressures of high boiling compounds at 20 °C especially accurately. All the models need experimental data for the calculations, so different input data will result in different predicted values.

Needs
For risk assessment in the working and outdoor environments in general and particularly for classifying organic compounds in accordance with the VOC-directive, suitable models are needed. In particular, there is a need for models for predicting vapour pressures at 20 °C for high boiling compounds, for which data are scarce, uncertain, and sometimes conflicting. The most promising of the models tested, the Riedel-Plank-Miller model, could be the starting point. Data on critical properties and normal boiling temperatures could be modelled using the method of Lydersen [8,11]. Further, Constantinou and Gani [14] have successfully predicted critical properties and normal boiling temperatures directly from the chemical (UNIFAC) structure of organic compounds. In this paper, however, a simpler approach will be tried out.

Objectives
The objective is to develop a model for predicting vapour pressure at 20 °C of organic compounds from minimum, but unambiguous information, i.e. the compounds' chemical (UNIFAC) structures. No experimental data, which potentially may introduce uncertainty in the predictions, should be used.

Methods
The authors' earlier attempts to model vapour pressures from the chemical structure were unsuccessful. Apparently, one major cause was low data quality. The data selected from DIPPR [3] for this work, therefore, were data which complied with two objective rules:
1. Vapour pressure data should be available at ambient temperature, i.e. between -40 and 40 °C.
2. Data should be available from independent sources, which do not deviate by more than 50%.
In total, 294 compounds were selected and divided by random sampling into a training set and a test set of equal size. In the following, these data are considered as consisting of "true" values for vapour pressures at 20 °C.
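The two selection rules and the random split can be sketched in a few lines. This is a minimal illustration, not the procedure actually run on the DIPPR data: the record keys (`t_min_c`, `t_max_c`, `p_sources_pa`), the function names, and the seed are hypothetical, and "do not deviate by more than 50%" is interpreted here as the largest reported value being at most 1.5 times the smallest, which is an assumption.

```python
import random


def passes_selection(record):
    """Apply the two data-quality rules to one compound record.

    record is a dict with hypothetical keys:
      't_min_c', 't_max_c' : temperature range covered by the data, in C
      'p_sources_pa'       : vapour pressures from independent sources, in Pa
    """
    # Rule 1: data must exist somewhere between -40 and 40 degrees Celsius.
    has_ambient = record["t_min_c"] <= 40.0 and record["t_max_c"] >= -40.0
    # Rule 2: at least two independent sources that agree within 50 %
    # (assumed reading: max <= 1.5 * min).
    values = record["p_sources_pa"]
    agrees = len(values) >= 2 and max(values) <= 1.5 * min(values)
    return has_ambient and agrees


def split_half(items, seed=0):
    """Randomly divide items into a training set and a test set of equal size."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


compounds = [f"compound_{i}" for i in range(294)]   # placeholder names
training, test = split_half(compounds)
print(len(training), len(test))
```

Swapping the roles of `training` and `test` afterwards gives the second round of the validation described later in the paper.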

The Chemical (UNIFAC) Structure of Molecules
The UNIFAC method has been used for predicting vapour pressures, but only for volatile compounds having vapour pressures above 10 mmHg (1333 Pa). The model was not validated [15]. The UNIFAC group contribution method [16] considers molecules as consisting not of atoms, but of fractions of molecules, named functional groups. From a limited number of functional groups, a large number of molecules can be constructed. The functional groups used in this work are shown in the Appendix. Table 3 shows examples of how molecules are constructed from the functional groups. The table shows the ordinary UNIFAC groups (first order groups). Improvements, however, were obtained when an additional set of groups (second order groups) was introduced to compensate for certain weaknesses in the molecular description using only first order groups [14].
The UNIFAC method, for instance, presumes that UNIFAC groups act independently. Some groups in certain molecules, however, cannot be considered as being independent, e.g. the two hydroxy-groups in 1,2-propanediol will interact differently from the two hydroxy-groups in 2,5-hexanediol. The second order groups are also listed in the Appendix.
It is of note that UNIFAC does not discriminate between isomers. For instance, o-, m-, and p-xylene have the same UNIFAC structure. (Table 3, note 1: AC is aromatic carbon.)
The UNIFAC group contribution method offers a way to model physico-chemical properties of organic compounds, because one functional group may be a part of many different compounds.
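The idea that a molecule is reduced to counts of functional groups, and that isomers therefore become indistinguishable, can be made concrete with a small sketch. The group list below is a hypothetical subset chosen for illustration (the groups actually used are those in the Appendix), and the decomposition of xylene into ACH and ACCH3 groups follows the common UNIFAC convention rather than Table 3, which is not reproduced here.

```python
from collections import Counter

# Hypothetical subset of first order groups, in a fixed order so that every
# molecule maps to a vector of the same length (one model variable per group).
GROUP_ORDER = ["CH3", "CH2", "ACH", "ACCH3", "OH"]


def group_vector(groups):
    """Turn a {group name: count} description into a fixed-length count vector."""
    counts = Counter(groups)
    unknown = set(counts) - set(GROUP_ORDER)
    if unknown:
        # A molecule with groups outside the list cannot be described.
        raise ValueError(f"groups not available: {sorted(unknown)}")
    return [counts[name] for name in GROUP_ORDER]


# The three xylene isomers differ only in where the methyl groups sit on the
# ring, which a pure group count cannot express.
o_xylene = {"ACH": 4, "ACCH3": 2}
p_xylene = {"ACCH3": 2, "ACH": 4}
hexanol = {"CH3": 1, "CH2": 5, "OH": 1}

print(group_vector(o_xylene), group_vector(hexanol))
```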

PLS
Multiple linear regression (MLR) can be used when variables are independent. If they are not, errors will be propagated uncontrolled, when matrices are inverted. MLR, therefore, cannot be used in this work, because the first and second order UNIFAC groups are correlated.
Instead, vapour pressures have been modelled from first and second order groups using a multivariate data analytical method named PLS, which is described in detail in Martens and Naes [17]. The acronym stands for Partial Least Squares. PLS projects data on structures which are mutually orthogonal. Intercorrelation between variables, therefore, is not a problem. A commercially available computer programme, "The Unscrambler" [18], has been used for the modelling.
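Why PLS tolerates correlated variables can be seen in a small, self-contained sketch of PLS regression for a single response (often called PLS1), written here with the NIPALS deflation scheme. This is not the implementation inside "The Unscrambler"; the function names and the toy data are illustrative assumptions. The two columns of the toy matrix are identical, a case in which ordinary multiple linear regression would require inverting a singular matrix.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model by NIPALS; returns means and per-component (w, p, q)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]   # centred X
    f = [v - y_mean for v in y]                                 # centred y
    components = []
    for _ in range(n_components):
        w = [_dot([row[j] for row in E], f) for j in range(m)]  # covariance direction
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:            # nothing left in X that co-varies with y
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]                         # scores
        tt = _dot(t, t)
        p = [_dot([row[j] for row in E], t) / tt for j in range(m)]  # X loadings
        q = _dot(f, t) / tt                                     # y loading
        # Deflate: remove what this component explains, so the next score
        # vector is orthogonal to this one.
        E = [[row[j] - t[i] * p[j] for j in range(m)] for i, row in enumerate(E)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    x_mean, y_mean, components = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = _dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# Toy data: two perfectly correlated variables, y = x1 + x2.
X = [[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [5.0, 5.0]]
y = [2.0, 4.0, 8.0, 10.0]
model = pls1_fit(X, y, n_components=2)
print(len(model[2]), pls1_predict(model, [6.0, 6.0]))
```

Only one component is extracted from the toy data, because after the first deflation no variation remains; the requested second component is skipped rather than producing a division by zero.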
If the model predicts the vapour pressures for the compounds in the test set with a satisfactory accuracy, the model is said to be validated. It is implied that a validated model is a model whose prediction power is appropriate for the purpose for which it is going to be used, i.e. it can confidently be used for prediction of vapour pressures for other compounds, provided that the compounds in the training and test sets are representative of the new compounds. Representativeness in this context means that the functional groups of the new compounds are present in compounds in the training and test sets. The validation procedure using a test set of data, which is not used for the modelling, is the most rigorous test method available [19]. Further, it is the method which is the easiest to explain in layman's terms [17].
Several ways exist to judge a model's prediction power. The most informative method is to plot predicted versus measured values, calculate the linear regression line and observe: (1) how close the relationship between predicted and measured values is to being linear, (2) how close the slope of the regression line is to one, (3) how close the intercept with the ordinate axis is to zero, and (4) how close the correlation coefficient is to one. It is of note that all four items should be considered simultaneously. Table 4 shows three other measures proposed in [19].
Figure 4 shows the model validation procedure used in this work. The data set is divided into a training set, which is used to create the model, and a test set, which is used for testing the predictive power of the model. Afterwards the modelling is repeated with the test set as training set and the training set as test set.
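The regression line of predicted versus measured values and summary error measures of the kind used later in the paper (bias, SEP, RMSEP) can be computed as in the following sketch. The definitions are the common chemometric ones and are an assumption here, since Table 4 is not reproduced in this text; the function name and the example numbers are hypothetical.

```python
import math


def validation_summary(measured, predicted):
    """Regression of predicted on measured values plus bias, SEP and RMSEP."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(predicted) / n
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    slope = sxy / sxx                      # ideally close to one
    intercept = my - slope * mx            # ideally close to zero
    r = sxy / math.sqrt(sxx * syy)         # correlation coefficient
    residuals = [y - x for x, y in zip(measured, predicted)]
    bias = sum(residuals) / n              # average over- or under-prediction
    # SEP: standard deviation of the residuals around the bias (assumed n-1 form)
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    # RMSEP: root mean squared error of prediction
    rmsep = math.sqrt(sum(e * e for e in residuals) / n)
    return {"slope": slope, "intercept": intercept, "r": r,
            "bias": bias, "sep": sep, "rmsep": rmsep}


# Hypothetical log10 vapour pressures: every prediction is 0.1 log units high.
measured = [0.0, 1.0, 2.0, 3.0]
predicted = [0.1, 1.1, 2.1, 3.1]
summary = validation_summary(measured, predicted)
print(summary)
```

The example shows why the four regression criteria and the error measures have to be read together: slope and correlation are perfect, while the intercept and the bias reveal a constant offset.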

P_PREDICT
The developed models have been implemented in a computer program for easy calculation of vapour pressures at 20, 23, and 25 °C. The program has been named P_PREDICT. Molecules can be constructed in the program by selecting among the built-in first and second order UNIFAC groups, which are shown in the Appendix. To check that the molecule is constructed correctly, the program calculates the molecular mass from the selected first order groups. This calculation provides a necessary, but of course not a sufficient, check that the molecule is correctly constructed.
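A molecular-mass check of this kind can be sketched as follows. This is not P_PREDICT's code: the group names and their masses (computed from standard atomic masses of C, H and O) form a hypothetical subset, and the tolerance is an arbitrary choice for illustration.

```python
# Masses in g/mol for a hypothetical subset of first order groups,
# from atomic masses C = 12.011, H = 1.008, O = 15.999.
GROUP_MASS = {
    "CH3": 15.035,
    "CH2": 14.027,
    "OH": 17.007,
    "ACH": 13.019,
    "ACCH3": 27.046,
}


def molar_mass(groups):
    """Sum the group masses for a {group name: count} description."""
    return sum(GROUP_MASS[name] * count for name, count in groups.items())


def mass_matches(groups, expected_mass, tolerance=0.05):
    """Necessary (not sufficient) check that a molecule was entered correctly."""
    return abs(molar_mass(groups) - expected_mass) <= tolerance


hexanol = {"CH3": 1, "CH2": 5, "OH": 1}          # 1-hexanol, C6H14O
one_ch2_missing = {"CH3": 1, "CH2": 4, "OH": 1}  # a typical entry mistake

print(round(molar_mass(hexanol), 3))
print(mass_matches(hexanol, 102.17), mass_matches(one_ch2_missing, 102.17))
```

The check catches a forgotten or surplus group, but any two descriptions with the same total mass, such as isomers, pass it equally well, which is why it is only a necessary condition.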
It is possible to calculate vapour pressures in P_PREDICT using seven different models: a general model and a series of six special models to account for aliphatic hydrocarbons, aromatic hydrocarbons, n-alkanes, and oxygen-containing compounds (e.g. alcohols, aldehydes, ketones, ethers, and esters), as well as one model for mono-, di-, and triols, and another one for n-alcohols.
Figure 5 shows that the slope of the regression line is almost equal to one and the intercept with the y-axis is almost zero. The relationship between predicted and measured values is close to being linear. Predictions for single compounds, e.g. diiodomethane, nitrobenzene, and hexadecane, however, are less accurate than for other compounds. The reason may be limitations in the model's prediction power, or that the values for the vapour pressures of these compounds are erroneous, or both. The average bias is -0.02 and the standard error of performance (SEP) is 0.51 log units, corresponding to 1.2 Pa. The root mean squared error of prediction (RMSEP) is therefore $(\mathrm{Bias}^2 + \mathrm{SEP}^2)^{0.5} = (0.02^2 + 0.51^2)^{0.5} \cong 0.51$ log units. It is of note that the bias is the average bias in the long run. Large over-predictions and large under-predictions may balance each other out, resulting in a low average bias. A low bias, therefore, is a necessary but not a sufficient condition for a high prediction power. For practical reasons, the model cannot be given explicitly, as it consists of six equations with in total 336 coefficients. That is one of the reasons why the P_PREDICT software has been developed.
It is of note that the more specialised the model, the higher its prediction power. However, the more specialised the model, the narrower its use. Table 5 shows predictions of the P_PREDICT general model and "true" vapour pressures for selected compounds.
Table 5. p: vapour pressure of the pure compound at 20 °C from DIPPR [3] and IUCLID [5]. Δ is the difference between calculated and measured vapour pressure and Δ% is the deviation relative to the DIPPR value. NA: no data are available. 1) These values seem to be in error by a factor of 10, maybe due to misplaced decimal points. 2) At 25 °C.
Table 5 shows large absolute, but small relative deviations for low boiling compounds. The largest absolute deviation is for hexane, for which the prediction is 4432.9 Pa lower than the value found in DIPPR [3], corresponding to a relative deviation of 27.3%. For high boiling compounds the absolute deviations are more modest, but some of the relative deviations are large. The highest relative deviation is 73.4% for 1-octanol, where P_PREDICT predicts a value of 11.1 Pa, which should be compared with the DIPPR [3] value of 6.4 Pa. The four compounds from Table 1 have been predicted by P_PREDICT.
The absolute deviations are modest, but because the vapour pressures are small, the relative deviations are large. The n-alkanol model predicts the vapour pressure to be 69.2 and 0.8 Pa for 1-hexanol and 1-decanol, respectively. The vapour pressure of methoxyacetic acid is predicted to be 11.5 Pa using the model for oxygenated compounds. Further, Table 5 shows that the data found in IUCLID [5], when available, are in most cases comparable to the data found in DIPPR [3]. Some values for 1-butanol and o-xylene in IUCLID [5], however, are apparently mistakes. It is of note that the data in IUCLID, according to the disclaimer on the CD-cover, "...has not undergone any evaluation or validation by the European Commission". Table 6 shows predicted and "true" vapour pressures for a selection of high boiling hydrocarbons. The largest absolute deviation between predicted and measured values in Table 6 is for tert-butylbenzene, the vapour pressure of which is predicted to be 142.7 Pa. This value is 65.5 Pa lower than the value in DIPPR [3], corresponding to a relative deviation of 31.5%. The largest relative deviation is for hexadecane. The predicted vapour pressure is 0.3 Pa, which should be compared with the value in DIPPR [3] of 0.11 Pa, corresponding to a relative deviation of 145.5%. The P_PREDICT hydrocarbon model predicts a value of 0.12 Pa, which deviates 9.1% from the DIPPR value.
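The absolute and relative deviations quoted above, and their consequence for classification against the 10 Pa limit, follow from simple arithmetic. The sketch below reproduces two of the figures from the text; the function names are hypothetical, and the VOC test implements only the vapour-pressure part of the directive's definition (10 Pa or more at 20 °C).

```python
VOC_LIMIT_PA = 10.0   # limit in the VOC-directive: 0.01 kPa at 293.15 K


def deviation(predicted_pa, reference_pa):
    """Return (absolute deviation in Pa, relative deviation in % of the reference)."""
    delta = predicted_pa - reference_pa
    return delta, 100.0 * delta / reference_pa


def is_voc(vapour_pressure_pa):
    """Classify on vapour pressure at 20 C only (the 'condition of use' clause is ignored)."""
    return vapour_pressure_pa >= VOC_LIMIT_PA


# 1-octanol: general model 11.1 Pa versus DIPPR 6.4 Pa
delta_octanol, rel_octanol = deviation(11.1, 6.4)
# hexadecane: hydrocarbon model 0.12 Pa versus DIPPR 0.11 Pa
delta_hexadecane, rel_hexadecane = deviation(0.12, 0.11)

print(f"1-octanol: {delta_octanol:+.1f} Pa, {rel_octanol:.1f} %")
print(f"hexadecane: {delta_hexadecane:+.2f} Pa, {rel_hexadecane:.1f} %")
print(is_voc(11.1), is_voc(6.4))
```

For 1-octanol the two values fall on opposite sides of the limit, so the classification depends on which number is used. This is the situation a single agreed prediction per UNIFAC structure is meant to remove.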

"…, or having a corresponding volatility under the particular condition of use" [1]
In the draft versions of the VOC-directive, e.g. Draft VOC-directive, 1997 [20], the limit between VOCs and non-VOCs was defined as 10 Pa at 20 °C. In the final version of the EU VOC-directive [1], however, this was changed to: 10 Pa at 20 °C, or having a corresponding volatility under the particular condition of use. One possibility to cope with this demand is to use Equation (3) [7], in which $P_1$ is the vapour pressure in atmospheres, $T_b$ is the normal boiling point [K] and $T_1$ is the temperature [K] at which the vapour pressure $P_1$ was measured. First $T_b$ is calculated from Equation (3), then $P_2$ is calculated for another temperature $T_2$ using the $T_b$ just calculated. Equation (3) is a part of the EUSES software, which is used for risk assessment of new and existing chemicals in the EU [21; 22]. Table 7 shows calculated and measured values for some compounds. Generally, it seems that Equation (3) underestimates the vapour pressures, and the more so the higher the temperature. The absolute deviation becomes smaller for high boiling compounds, whereas the relative deviation becomes larger. Further, there is a tendency that the larger the temperature span, the higher both the absolute and relative deviations. Table 7 shows that 1-nonanol is likely to be misclassified at 40 °C by Equation (3), because the calculated value is 6.6 Pa, while the experimental value is 13.45 Pa.
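The two-step procedure (back-calculate a boiling point from one known vapour pressure, then use it to estimate the pressure at another temperature) can be sketched as below. Equation (3) itself is not reproduced in this text, so the sketch substitutes an assumed form: the Clausius-Clapeyron equation with Trouton's rule, ln P = k (1 - T_b/T) with P in atmospheres and k = 88/R, about 10.6. The function names are hypothetical and the numbers produced should not be read as those of Table 7.

```python
import math

K = 88.0 / 8.314        # Trouton constant over R, about 10.6 (assumed form)
PA_PER_ATM = 101325.0


def boiling_point_from_pressure(p1_atm, t1_k):
    """Step 1: solve ln P1 = K * (1 - Tb/T1) for the apparent boiling point Tb."""
    return t1_k * (1.0 - math.log(p1_atm) / K)


def pressure_at(t2_k, t_boil_k):
    """Step 2: vapour pressure in atm at T2 from the apparent boiling point."""
    return math.exp(K * (1.0 - t_boil_k / t2_k))


def extrapolate(p1_pa, t1_k, t2_k):
    """Vapour pressure in Pa at t2_k, given p1_pa known at t1_k."""
    t_boil = boiling_point_from_pressure(p1_pa / PA_PER_ATM, t1_k)
    return pressure_at(t2_k, t_boil) * PA_PER_ATM


# A compound exactly at the 10 Pa limit at 20 C, re-evaluated at 40 C.
p_40 = extrapolate(10.0, 293.15, 313.15)
print(f"estimated vapour pressure at 40 C: {p_40:.1f} Pa")
```

Because the whole temperature dependence rests on one fitted parameter and a fixed constant, any error in the assumed enthalpy grows with the temperature span, in line with the trend reported for Table 7.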

Discussion
Vapour pressure data for volatile organic compounds (VOCs) are used for technical purposes in the chemical industry. In the occupational and ambient environments, data on vapour pressures of organic compounds are used for risk assessments and risk management.
The demand for data quality depends on what the data are to be used for. If data are to be used for designing costly petrochemical installations, cost-benefit considerations may often show that it is worthwhile to do expensive experimental work in order to obtain data of sufficient quality. In contrast, for risk assessment in the working or ambient environments, the demand for accurate data on vapour pressures is not that imperative. The reason is that the uncertainty of risk assessment is predominantly determined by very uncertain toxicological data. Cost-benefit considerations commonly tell us that improving the physico-chemical database has little effect on the overall uncertainty.
For some compounds, however, data found in the literature deviate unacceptably, even compared with toxicological data. For other compounds, no data on vapour pressure can be found. This is often the case for high boiling compounds.
The recently published EU VOC-directive [1] defines a VOC as an organic compound which: 1. has a vapour pressure of 10 Pa or more at 20 °C, or 2. has a corresponding volatility under the particular condition of use.
The draft versions, e.g. Draft VOC-directive, 1997 [20], only included item 1. The rationale behind adding item 2 in the final version is likely to be that a substance evaporates at a rate proportional to its vapour pressure [13]. Including item 2 means that dotriacontane (C32 n-alkane), the melting point of which is 70 °C, is considered to be a VOC when it is used at 200 °C, because its vapour pressure at that temperature is 30.4 Pa [3]. It is difficult to envision how item 2 will work in practice, because having a temperature dependent criterion means that a certain organic compound, e.g. 2-nonanol, is a VOC during summer time, but not during winter. Further, it will be a VOC in Spain, but not in Scandinavia.

Organic compounds used at room temperature
In the context of risk assessment for occupational and environmental hygiene, it is not of great importance whether the limit between VOCs and non-VOCs is one, ten or fifty Pa at 20 °C. The differences are too small to have any measurable impact on man or the environment. But once a limit has been introduced, it may have far-reaching economic implications, and costly struggles may be anticipated between people of different interests. For compounds having vapour pressures close to 10 Pa at 20 °C, vapour pressure data may sometimes be found above as well as below the limit.
When the 10 Pa limit was introduced in the USA, a lot of time and money were wasted on heated disputes over incomparable vapour pressure values and measurement methods [2].
Measuring vapour pressures is costly. Modelling vapour pressures, therefore, is an interesting alternative. Using P_PREDICT in the context of the VOC-directive is economically attractive, but the greatest advantage of doing so will be that classification of organic compounds can be done unambiguously, because no experimental data are used for the predictions.
The bias of the models is small, but vapour pressures for single compounds may be wrong. Because the logarithm of the vapour pressure was modelled, compounds with high vapour pressures have large absolute deviations, but the relative deviations are modest. For compounds with low vapour pressures, the absolute deviations are modest, but some relative deviations are high. The deviations, however, are not as large as the discrepancies sometimes found in the literature between vapour pressure data for the same compound.
The first and second order groups in the present version of P_PREDICT are listed in the Appendix. P_PREDICT can only predict the vapour pressure for molecules which can be constructed using these functional groups.
Depending on regression models available for single compounds, vapour pressures for pure compounds and mixtures of known composition can be calculated at various temperatures using the SUBTEC software [13]. For most compounds, the temperature range is from below zero to several hundred degrees Celsius.

Organic Compounds Used at Elevated Temperatures
Vapour pressures at elevated temperatures could be calculated using Equation (3) from the values predicted by P_PREDICT at 20 °C. It is, however, a serious problem that Equation (3) seems to underestimate vapour pressures systematically at higher temperatures. The higher the temperature, the greater the underestimation. Further, the underestimations are higher for compounds with low vapour pressures, which are of particular interest in the context of the VOC-directive as they have vapour pressures at 20 °C close to 10 Pa.
Equation (3) is a part of the EUSES software, which is used in EU risk assessment of new and existing substances [21; 22]. The poor performance of this model may, therefore, have large implications.
Vapour pressures of pure compounds and mixtures can be calculated at elevated temperature using the SUBTEC software [13] for compounds included in this software.

Conclusions
The VOC-directive, which was recently introduced in the EU for regulating the use of organic compounds, defines the limit between VOCs and non-VOCs to be 10 Pa at 20 °C or at the temperature of use. The experience from the USA, when the 10 Pa limit was introduced some years back, indicates that fruitless and costly efforts can be anticipated, because it is often possible to find apparently high quality data above as well as below 10 Pa at 20 °C when a compound's true vapour pressure is close to 10 Pa.
The model presented in this paper offers a possibility for unambiguous classification of compounds as either VOCs or non-VOCs. Discussions over conflicting data, the differences between which are too small to have any measurable impact on man or the environment, can therefore be avoided. The vapour pressures calculated using Equation (3) are apparently too low for high boiling compounds at high temperatures. It is, therefore, unfortunate that this equation is used for risk assessment of new and existing chemicals in the EU.

Future work
More groups could be introduced into P_PREDICT when more high quality data become available for compounds which contain groups not in the present version of P_PREDICT. P_PREDICT could also be extended to predict vapour pressures at temperatures other than 20, 23, and 25 °C, which are the temperatures covered by the present version.
To avoid different vapour pressure predictions for the same compound when using P_PREDICT for classification, it must be decided by a competent authority how P_PREDICT should be used. That is, whether the general model should always be used or if it should be allowed to use the more accurate submodels. One possible way is to always use the most specialised model available, e.g. the vapour pressure for 1-butanol should be calculated using the n-alkanol model. The vapour pressure for 2-butanol should be calculated using the alkanol model, the vapour pressure of a butyl ester should be calculated using the model for oxygen containing compounds, etc. For hydrocarbons which consist of an aromatic and a non-aromatic moiety, it should be decided whether the aliphatic model, the aromatic model, or the hydrocarbon model should be used.