Article

QSPR Models for the Molar Refraction, Polarizability and Refractive Index of Aliphatic Carboxylic Acids Using the ZEP Topological Index

by
Zoiţa Mărioara Berinde
Department of Chemistry and Biology, North University Center at Baia Mare, Technical University of Cluj-Napoca, 430122 Baia Mare, Romania
Symmetry 2021, 13(12), 2359; https://doi.org/10.3390/sym13122359
Submission received: 7 November 2021 / Revised: 27 November 2021 / Accepted: 1 December 2021 / Published: 8 December 2021

Abstract

The molar refraction, polarizability, and refractive index of a series of monocarboxylic, dicarboxylic, and unsaturated monocarboxylic acids, with symmetric or asymmetric structures, were investigated using the quantitative structure–property relationship (QSPR) technique. We used linear regression with a single molecular descriptor, the ZEP topological index, which is calculated in a simple manner from weighted electronic distances that are themselves derived from the chemical structure of the molecules. The quality and predictive ability of the resulting QSPR models were assessed by means of specific validation techniques: the y-randomization test, the leave-one-out cross-validation procedure, and external validation. The investigated properties are well modeled (with r2 > 0.99) by the ZEP index, using regression analysis as a statistical tool for developing reliable QSPR models. Our approach provides an alternative to the existing additive methods for predicting the molar refraction and polarizability of carboxylic acids, which are essentially based on the summation of atom and/or functional group contributions or bond contributions, together with some correction increments.

1. Introduction

Carboxylic acids form a family of organic compounds that contain the characteristic carboxyl functional group (-COOH or -CO2H). They constitute an important class of industrial chemicals and also occur in many other processes. Among their most significant uses are the following: making soaps, detergents, and shampoos; the food industry; the pharmaceutical industry; the manufacturing of rubber; and making dyestuffs, perfumes, and rayon. Moreover, fatty carboxylic acids are considered beneficial for human health. In recent years, the use of the properties of carboxylic acids as independent variables in QSAR models has been steadily increasing [1,2,3,4,5]; however, QSPR models that involve the properties of carboxylic acids as dependent variables are very few. This lack of interest seems to be due to the specific structure of carboxylic acids, which strongly influences their properties. The carboxyl functional group is generally considered to be a highly polar organic functional group. Owing to the sp2 hybridization of the carbon atom and of the doubly bonded oxygen atom, the carboxyl functional group has a planar structure, which favors p-π conjugation and creates a strong permanent dipole. The dipoles present in carboxylic acids allow them to form strong hydrogen bonds between acid molecules, and also between acid molecules and water or other molecular solvents. These aspects influence the essential relationship between the structural attributes and the properties of carboxylic acids.
In our study, we considered three properties of carboxylic acids: molar refraction, molar polarizability, and refractive index. These properties are interrelated and are influenced by the electronic interactions and the polarity of carboxylic acids. The molar refraction, Rm (cm3 mol−1) is a constitutive-additive molecular property of substances [6]. The molar refraction is related to the polarizability of the molecules that make up the medium, by the Lorentz–Lorenz [7] equation:
$$R_m = \frac{n_D^2 - 1}{n_D^2 + 2}\, V_m = \frac{4\pi}{3}\, N_A P \tag{1}$$
where nD is the refractive index of the given substance at optical wavelengths, usually at 589 nm (sodium D-line), Vm is the molar volume, NA is Avogadro’s constant, and P is the mean polarizability of molecules. For a radiation of infinite wavelength, Rm = Vm and, therefore, the molar refractivity can be used as a measurement of the real volume of the molecules, a very important fact for chemists and biologists. The molar refraction can also be evaluated by means of refractive index, molecular weight, and density, by replacing the molar volume in Equation (1) with the ratio of molecular weight (MW) and density (d): Vm = MW/d. On the other hand, molar refraction is a measure of the total polarizability of a mole of substance, see [6] for more details.
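The evaluation of the molar refraction from the refractive index, molecular weight, and density can be sketched as follows. This is a minimal illustration of the Lorentz–Lorenz relation with Vm = MW/d; the function name `molar_refraction` and the input values for ethanoic acid are assumptions chosen for illustration (approximate handbook-style values), not data taken from this paper.

```python
def molar_refraction(n_d: float, mol_weight: float, density: float) -> float:
    """Lorentz-Lorenz molar refraction R_m in cm^3/mol.

    n_d        refractive index at the sodium D line (dimensionless)
    mol_weight molecular weight MW in g/mol
    density    density d in g/cm^3, so that V_m = MW / d is in cm^3/mol
    """
    molar_volume = mol_weight / density
    return (n_d**2 - 1.0) / (n_d**2 + 2.0) * molar_volume


# Illustrative inputs for ethanoic (acetic) acid near 20 degrees C.
# Approximate handbook-style values, NOT data from this paper.
r_m = molar_refraction(n_d=1.3716, mol_weight=60.05, density=1.049)
print(f"R_m = {r_m:.2f} cm^3/mol")
```

With these inputs the result is close to 13 cm3 mol−1, which is the order of magnitude expected for the smallest acids of the series.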
The refractive index (nD) characterizes the capacity of a substance to refract the light. Light traversing a substance has a velocity different from the case when light is traversing a vacuum. The ratio of the velocity of light in a vacuum to that in a substance is the refractive index or the index of refraction of the substance. The refractive index is often used to identify a particular substance, to confirm its purity, or to measure its concentration, see [8] for more details.
Molar refraction and molecular polarizability, being additive properties, can be calculated by summing the contributions of atoms and/or functional groups, bond contributions, and various correction factors. The most developed way to obtain molar refraction uses Crippen’s fragmentation methods [9,10]. Alternatively, attempts have been made by various QSAR researchers to model molar refractivity by using topological indices [11,12]. Verma, Kurup, and Hansch [2] studied the polarizability effects on ligand–substrate interactions, in terms of the number of valence electrons (NVE), and proposed various linear QSAR models. Verma and Hansch [13] compared the use of NVE and calculated molar refractivity (CMR) in QSARs for studying chemical–biological interactions, while Hansch and Kurup [14] found that the simple summation of the valence electrons (H = 1, C = 4, O = 6, etc.) in a molecule is a measure of its polarizability. They also showed that this parameter correlates with the nerve toxicity of a wide variety of chemicals acting on the nerves of frogs, rabbits, cockroaches, and humans. Fast empirical models to predict molecular polarizability were also developed by Wang, Xie, Hou, and Xu [15], using two different approaches. The refractive index, molar refractivities, and molar polarizability constant of heterocyclic compounds were studied by Sonar and Pawar [16], while Granados, Gracia-Fadrique, Amigo, and Bravo [17] studied the refractive index, surface tension, and density of aqueous mixtures of carboxylic acids.
Starting from this background, the main aim of the present study was to develop linear monovariable QSPR models that are able to predict molar refraction, polarizability, and refractive index in the class of carboxylic acids by using the ZEP topological index.

2. Materials and Methods

In order to develop predictive QSPR models for the molecular refraction, refractive index, and polarizability of carboxylic acids, we proceeded in the following steps:
(i) selection of the data set;
(ii) generation of the molecular ZEP index for the carboxylic acids used in this work;
(iii) building QSPR models within the selected data set;
(iv) validation of the obtained QSPR models using the y-randomization test and the internal and external validation strategies.

2.1. Data Set

The properties of aliphatic carboxylic acids selected in this study are molecular refractivity, denoted by Rm; refractive index, denoted by nD; and polarizability, denoted by P. The data set includes 80 acids: 50 saturated aliphatic monocarboxylic acids, 17 unsaturated acids, and 13 aliphatic dicarboxylic acids. Molecular refractivity values for these acids, as well as the refractive index values for 33 saturated aliphatic monocarboxylic acids and polarizability values for 21 saturated aliphatic monocarboxylic acids, were taken from the literature [1,18,19]. The polarizability values for 20 other monocarboxylic acids were calculated by means of the relation:
$$P = \frac{3 R_m}{4 \pi N_A} = 0.3964308 \times 10^{-24}\, R_m \tag{2}$$
where NA is Avogadro’s constant. These values are given in Table 1 and are indicated by the superscript b.
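The conversion from molar refraction to polarizability can be checked numerically. The sketch below simply evaluates 3/(4πNA) with the current SI value of Avogadro's constant; the helper name `polarizability_from_rm` is an assumption introduced for illustration. The factor obtained this way agrees with the constant quoted in the relation above to about four significant figures; the small difference in the later digits presumably reflects the values of π and NA used by the author.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1, exact SI value


def polarizability_from_rm(r_m: float) -> float:
    """Mean polarizability P in cm^3 from molar refraction R_m in cm^3/mol."""
    return 3.0 * r_m / (4.0 * math.pi * AVOGADRO)


# Conversion factor for R_m = 1 cm^3/mol, to compare with 0.3964308e-24.
factor = polarizability_from_rm(1.0)
print(f"3 / (4 pi N_A) = {factor:.6e} cm^3")
```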

2.2. The ZEP Index

The molecular topological index ZEP used in this QSPR study was calculated using hydrogen-suppressed graphs of the carboxylic acids. The molecular topological index ZEP introduced by Berinde [20] is defined as:
$$\mathrm{ZEP} = \sum_{i=1}^{n} \left[ \sum_{j=1}^{n} \mathrm{wed}(i,j) \right]^{1/2} \tag{3}$$
where wed (i,j) is the weighted electronic distance, also introduced by Berinde [20]:
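$$\mathrm{wed}(i,j) = \begin{cases} \dfrac{1}{b_{ij}} \cdot \dfrac{Z'_i + Z'_j}{v_i \cdot v_j}, & \text{if there is a bond between atom } i \text{ and atom } j \\[2ex] 0, & \text{if there is no bond between atom } i \text{ and atom } j \end{cases} \tag{4}$$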
In (4), $v_i$ and $v_j$ denote the degrees of the vertices i and j, respectively; $Z'_k$ denotes the formal degree of vertex k and is defined by $Z'_k = Z_k \cdot v_k$; and $Z_k$ denotes the atomic number of atom k in Mendeleev’s periodic system. The values of $b_{ij}$ are 1, 2, 3, and 1.5 for a single bond, a double bond, a triple bond, and an aromatic bond, respectively. Alternatively, the topological index ZEP can be calculated by using the connectivity matrix, CEP [21]. Table 2 gives the weighted electronic distances, the formal degrees of the vertices, and the degrees of the vertices for common bonds in carboxylic acids. In order to emphasize the number of bonds of the carbon atom and oxygen atom, respectively, we kept the hydrogen atoms visible.
In contrast to the usual topological distance, which is equal to 1 for any bond between two atoms, the weighted electronic distance, according to its definition, is able to differentiate between single and multiple bonds and between non-polar and polar covalent bonds, and it also differentiates between bonds depending on their branching degree and their neighboring bonds. It is also able to differentiate between the symmetric and asymmetric arrangements of atoms or groups of atoms with respect to a chemical bond. This discriminating property is illustrated in Figure 1 for four marked molecular graphs, which represent the structures of the following carboxylic acids: ethanoic, propanoic, 2-methylpropanoic, and 2,2-dimethylpropanoic. In each of the four cases, the carboxyl functional group is linked to the rest of the chain by a single bond, but with different branching and different neighboring bonds. Therefore, the weighted electronic distances for these bonds are different: 7.5, 4.5, 3.5, and 3.0, respectively, i.e., the smallest value of the weighted electronic distance corresponds to the greatest branching, see Figure 1.
We can illustrate the calculation technique of ZEP index for the hydrogen-suppressed graph of propanoic acid (G.2) by using Formula (3):
$$\mathrm{ZEP}(G.2) = 9^{1/2} + 13.5^{1/2} + 15^{1/2} + 2.5^{1/2} + 8^{1/2} = 14.9568$$
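The calculation above can be reproduced with a short script. This is a sketch under two stated assumptions that are inferred from the numbers quoted in the text rather than spelled out in it: the vertex degree v counts each bond with its multiplicity (so the carboxyl carbon has v = 4), and the formal degree is the product of the atomic number and the degree. With these choices the script reproduces the weighted electronic distances 7.5 and 4.5 quoted for ethanoic and propanoic acid and the five terms of ZEP(G.2). The function name `zep_index` and the data layout are illustrative.

```python
import math

# Atomic numbers for the elements that occur in carboxylic acids.
ATOMIC_NUMBER = {"C": 6, "O": 8}


def zep_index(atoms, bonds):
    """ZEP index of a hydrogen-suppressed molecular graph.

    atoms: list of element symbols, one per vertex
    bonds: list of (i, j, order) with 0-based vertex indices and
           bond order 1, 2, 3 or 1.5
    Returns (zep, wed) where wed maps each bond (i, j) to its
    weighted electronic distance.
    """
    n = len(atoms)
    # Assumption: the degree v_k counts each bond with its multiplicity.
    degree = [0.0] * n
    for i, j, order in bonds:
        degree[i] += order
        degree[j] += order
    # Assumption: formal degree Z'_k = Z_k * v_k.
    formal = [ATOMIC_NUMBER[a] * degree[k] for k, a in enumerate(atoms)]

    row_sum = [0.0] * n
    wed = {}
    for i, j, order in bonds:
        w = (formal[i] + formal[j]) / (order * degree[i] * degree[j])
        wed[(i, j)] = w
        row_sum[i] += w  # the wed matrix is symmetric, so each bond
        row_sum[j] += w  # contributes to the rows of both end atoms
    zep = sum(math.sqrt(s) for s in row_sum)
    return zep, wed


# Propanoic acid, CH3-CH2-C(=O)-OH, vertices numbered 0..4.
atoms = ["C", "C", "C", "O", "O"]
bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 2), (2, 4, 1)]
zep, wed = zep_index(atoms, bonds)
print(round(zep, 4))  # 14.9568
```

The row sums produced for the five vertices are 9, 13.5, 15, 2.5, and 8, which are exactly the terms under the square roots in the worked example.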
We note that the ZEP index has been studied by the author in various contexts, in order to check its correlation power with several properties, and it has provided good correlation parameters [20,21,22].
In this work, the values of the ZEP index were calculated for 84 carboxylic acids (the 80 acids of the data set and four other acids that are used in the validation of our QSPR models). All these values differ from each other, which indicates that the ZEP index also has good discrimination power, see also [22].

2.3. QSPR Model Building

In order to build a QSPR model, the data set was randomly divided into two subsets: the training set and the test set. The training set was used for developing the QSPR models, while the test set was used for validating their predictive power. On the training set, simple linear equations were developed by least-squares regression with a single variable, the ZEP index. The statistical parameters used to test the goodness of fit between the model-predicted and experimental values were the correlation coefficient (R), the coefficient of determination (R2), the standard deviation (s), and the Fisher statistic (F). A model with high values of R2 and F and a low value of s is usually preferred. For the coefficient of determination, the following condition is recommended [23]: R2 > 0.6. Meeting this condition indicates good fitting ability, but it says nothing about the predictive power of the model [24].
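A one-descriptor least-squares fit and the statistics named above can be sketched in a few lines. This is an illustrative implementation using textbook formulas, not the software used in the study; the function name `fit_line` and the data are placeholders (the data are synthetic, generated around a straight line, and are not values from Table 1).

```python
import math


def fit_line(x, y):
    """Least-squares fit y = a + b*x with R^2, s and F (1 and n-2 d.o.f.)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares of the fitted line.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - rss / syy
    s = math.sqrt(rss / (n - 2))       # standard error of the estimate
    f = (syy - rss) / (rss / (n - 2))  # Fisher statistic
    return {"intercept": intercept, "slope": slope, "r2": r2, "s": s, "F": f}


# Synthetic placeholder data (NOT the paper's data): y = -2 + 1.3*x
# with small alternating deviations.
zep_values = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
deviations = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
prop_values = [-2.0 + 1.3 * z + d for z, d in zip(zep_values, deviations)]

stats = fit_line(zep_values, prop_values)
print({k: round(v, 4) for k, v in stats.items()})
```

For a single descriptor, R2 is the square of the correlation coefficient R, so the two carry the same information about the fit.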

2.4. Model Validation

For evaluating the stability and the predictive ability of QSPR models developed in the present paper, we applied the following three validation strategies from the list of five basic validation procedures presented in [25,26,27]: y-randomization test, the internal validation, and external validation.
Y-randomization test. The main aim of the y-randomization test is to detect and quantify chance correlations between the dependent variable and descriptors [25]. This test is designed to ensure the robustness of a QSPR model [26]. When applying the y-randomization test, the dependent variable (in our case Rm, nD, or P) is randomly shuffled, and a new QSPR model is developed with the independent variable, the ZEP index, left in its original order. The process is repeated several times. All QSPR models obtained in this way are expected to have low R2 values; otherwise, the QSPR model developed cannot be accepted for the given data set. According to Kiralj et al. [27], if R2yi < 0.2, there is no risk of a chance correlation in the developed model.
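The shuffling procedure described above can be sketched as follows. This is a minimal illustration with synthetic placeholder data; the function names and the fixed random seed are assumptions, and the threshold of 0.2 from [27] is applied by the reader to the returned values rather than enforced by the code.

```python
import random


def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


def y_randomization(x, y, runs=10, seed=0):
    """R^2 values of models refitted after shuffling y while x stays fixed."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the pairing between x and y
        results.append(r_squared(x, shuffled))
    return results


# Synthetic placeholder data (NOT the paper's data): 26 descriptor values
# and a property that depends linearly on them.
x = [float(k) for k in range(10, 36)]
y = [-2.0 + 1.3 * xi for xi in x]

r2_original = r_squared(x, y)
r2_shuffled = y_randomization(x, y)
print(round(r2_original, 4), [round(r, 3) for r in r2_shuffled])
```

A model is regarded as free of chance correlation when every shuffled run stays well below the R2 of the original model.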
Internal validation. In our study, the validity of the model was tested on the training set using the cross-validation (CV) method with the ‘leave-one-out’ (LOO) procedure. As is well known, the leave-one-out cross-validated correlation coefficient, R2CV, describes the stability of a regression model. According to Kiralj et al. [27], the criterion of robustness and predictive ability of the model is R2CV > 0.5. It is accepted that the minimal requirements for a QSPR regression model are R2CV > 0.5 and R2 > 0.6, see [23]. It is also generally agreed that a large difference between R2 and R2CV (exceeding 0.2–0.3) is an indicator of the overfitting of the QSPR model.
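The leave-one-out procedure can be sketched as below. Each compound is left out in turn, the line is refitted on the remaining compounds, and the left-out value is predicted; R2CV is then computed as 1 − PRESS/SStot, which is one common definition and is assumed here. The function names and the synthetic data are placeholders.

```python
def fit(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope


def r2_fit(x, y):
    """Ordinary R^2 of the line fitted to all points."""
    a, b = fit(x, y)
    my = sum(y) / len(y)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - rss / sum((yi - my) ** 2 for yi in y)


def r2_loo(x, y):
    """Leave-one-out cross-validated R^2 (assumed form: 1 - PRESS / SS_tot)."""
    my = sum(y) / len(y)
    press = 0.0
    for k in range(len(x)):
        # Refit without compound k, then predict it.
        a, b = fit(x[:k] + x[k + 1:], y[:k] + y[k + 1:])
        press += (y[k] - (a + b * x[k])) ** 2
    return 1.0 - press / sum((yi - my) ** 2 for yi in y)


# Synthetic placeholder data (NOT the paper's data).
x = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
y = [-2.0 + 1.3 * xi + d for xi, d in zip(x, [0.1, -0.1, 0.1, -0.1, 0.1, -0.1])]

r2 = r2_fit(x, y)
r2cv = r2_loo(x, y)
print(round(r2, 4), round(r2cv, 4), round(r2 - r2cv, 4))
```

The printed difference R2 − R2CV is the quantity that the overfitting criterion above refers to.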
External validation. The purpose of the external validation is to test the true predictive ability of the QSPR model. For this purpose, we analyzed the test data set of compounds that were not included in the training set or used in the model development. We first applied the y-randomization test, then we calculated the statistical parameters R2ext and Q2ext, similarly to R2 and R2CV for the training set. The external validation performance is given by R2ext and Q2ext. R2ext is a measure of fitting for the external validation set and can be compared to R2 for the training data set [28].
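External validation can be sketched in the same spirit. Several definitions of the external statistics exist in the literature, and the text does not state which were used; the sketch below assumes two common ones: R2ext as the squared Pearson correlation between observed and predicted test-set values, and Q2ext as 1 − PRESS/SS with the sum of squares taken about the training-set mean. All names and data are placeholders.

```python
def fit(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope


def external_metrics(x_train, y_train, x_test, y_test):
    """R2_ext and Q2_ext for a model fitted on the training set only."""
    a, b = fit(x_train, y_train)
    pred = [a + b * xi for xi in x_test]

    # Assumed definition: Q2_ext = 1 - PRESS / sum (y_obs - mean_train)^2.
    mean_train = sum(y_train) / len(y_train)
    press = sum((yo - yp) ** 2 for yo, yp in zip(y_test, pred))
    q2_ext = 1.0 - press / sum((yo - mean_train) ** 2 for yo in y_test)

    # Assumed definition: R2_ext = squared Pearson r (observed vs predicted).
    mo, mp = sum(y_test) / len(y_test), sum(pred) / len(pred)
    cov = sum((yo - mo) * (yp - mp) for yo, yp in zip(y_test, pred))
    r2_ext = cov * cov / (sum((yo - mo) ** 2 for yo in y_test)
                          * sum((yp - mp) ** 2 for yp in pred))
    return r2_ext, q2_ext


# Synthetic placeholder data (NOT the paper's data).
x_train = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
y_train = [-2.0 + 1.3 * xi + d
           for xi, d in zip(x_train, [0.1, -0.1, 0.1, -0.1, 0.1, -0.1])]
x_test = [11.0, 13.0, 15.0, 17.0, 19.0]
y_test = [-2.0 + 1.3 * xi + d
          for xi, d in zip(x_test, [0.05, -0.05, 0.05, -0.05, 0.05])]

r2_ext, q2_ext = external_metrics(x_train, y_train, x_test, y_test)
print(round(r2_ext, 4), round(q2_ext, 4))
```

The key design point is that the test compounds never enter the fit; they are used only to score predictions.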

3. Results and Discussion

We calculated the ZEP index for the 84 acids used in this study: the values obtained for the 50 saturated aliphatic monocarboxylic acids are listed in Table 1, the ZEP index values of the 13 aliphatic dicarboxylic acids and 17 unsaturated acids are listed in Table 3 and Table 4, respectively, while the values of ZEP for the remaining four acids are listed in Table 5.

3.1. QSPR Model Building for Molecular Refraction

3.1.1. Saturated Aliphatic Monocarboxylic Acids

In order to build a QSPR model for the molecular refractivity, we applied the procedure described above. Table 1 displays the experimental molecular refraction for the 50 saturated aliphatic monocarboxylic acids having an asymmetric structure with respect to the carboxyl group or having an asymmetric carbon atom. They were divided into two subsets: one set of 26 acids that formed the training set used in the modelling process, and another set of 24 acids, marked with b, that formed the test set used for testing the model in external validation.
By correlating the molecular refractivity with ZEP index for the 26 monocarboxylic acids, which were used as a training set, we obtained the following linear QSPR model:
Rm = −2.152 (±0.151) + 1.331 (±0.005) ZEP (5)
N = 26; R = 0.9999; R2 = 0.9998; R2CV = 0.9998; Q2ext = 0.9998; s = 0.2069; F = 73,732.2; MAE = 0.147; MAD = 0.11
The QSPR model (5) has a very good statistical quality for fitting the calculated Rm values to the experimental ones. The robustness of the model (5) and its internal predictive ability were evaluated by R2CV–cross validation coefficient based on leave-one-out (LOO); its value of 0.9998 being very good. Model (5) was also checked for reliability, robustness, and chance correlation by applying the y-randomization test. The y-randomization test was performed 10 times. Results of the y-randomization test are presented in Table 6.
In each y-randomization run, R2yi < 0.2, which shows that the good results in our original model were not due to a chance correlation or structural dependency of the training set. The QSPR model (5) was thus internally validated, and this equation was used for calculating the molecular refractivity values for the training set and for predicting the Rm values of the monocarboxylic acids in the test set. The results are presented in Table 1. The analysis of the residuals of the predicted molecular refractivity against the experimental values in the training set shows that the residuals exceeded the standard deviation limits of ±2 s (in our case ±0.42) in only three cases. These three small excesses appeared for acids with similar structure: 2,2-dimethylpropanoic (−0.47 error), 3,3-dimethylpentanoic (+0.45 error), and 3,3-dimethylhexanoic (+0.47 error). The linear QSPR equation that results from eliminating these three values from the correlation process is the following:
Rm = −2.279 (±0.093) + 1.334 (±0.003) ZEP (6)
N = 23; R = 0.9999; R2 = 0.9998; R2CV = 0.9998; s = 0.123; F = 199,918.5; MAE = 0.106; MAD = 0.08
By eliminating those values, the goodness of fit, the reliability, and the robustness of the QSPR model (6) are not significantly improved.
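The ±2 s residual screen used above can be written as a small filter. The three residuals and the standard deviation s = 0.2069 are the values quoted in the text for model (5); the fourth entry is a hypothetical in-range residual added only to show that the filter leaves ordinary compounds alone. The function name `flag_outliers` is illustrative.

```python
def flag_outliers(residuals, s, k=2.0):
    """Return the compounds whose |residual| exceeds k standard deviations."""
    limit = k * s
    return {name: r for name, r in residuals.items() if abs(r) > limit}


S_MODEL_5 = 0.2069  # standard deviation reported for model (5)

residuals = {
    "2,2-dimethylpropanoic": -0.47,  # quoted in the text
    "3,3-dimethylpentanoic": 0.45,   # quoted in the text
    "3,3-dimethylhexanoic": 0.47,    # quoted in the text
    "hypothetical acid": 0.10,       # placeholder, not from the paper
}

flagged = flag_outliers(residuals, S_MODEL_5)
print(f"limit = +/-{2 * S_MODEL_5:.3f}", sorted(flagged))
```

The limit evaluates to about ±0.41, which the text rounds to ±0.42, and the three quoted acids are the ones flagged.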
The capability of the linear model (5) to predict Rm values for monocarboxylic acids with unknown Rm, was investigated in the test set. The predicted Rm values for a series of the 24 monocarboxylic acids included in the test set were calculated with Equation (5) and are given in Table 1, together with their deviations from the corresponding experimental Rm values.
Note that the number of acids in the test set (24 acids) is close to the number of compounds in the original training set (26 acids). The corresponding regression for the test set is:
Rm = −1.832 (±0.196) + 1.326 (±0.005) ZEP (7)
N = 24; Rext = 0.999; R2ext = 0.998; R20 ext = 0.998; R2CV ext = 0.998; s = 0.2356; F = 66,537.3; MAE = 0.1604; MAD = 0.105
The analysis of residuals shows a single compound, that is, 2,2,3,3-tetramethylbutanoic acid, falling outside the standard deviation limits of ±2 s. All the validation strategies show that the obtained model (5) is a valid QSPR model for the prediction of molecular refractivity of monocarboxylic acids. A general QSPR model for all the 50 monocarboxylic acids was also proposed:
Rm = −2.081 (±0.115) + 1.334 (±0.003) ZEP (8)
N = 50; R = 0.9999; R2 = 0.9998; R2CV = 0.9998; s = 0.2258; F = 158,423.74; MAE = 0.159; MAD = 0.105
The obtained result suggests that our QSPR model (5) is, indeed, very good.

3.1.2. Aliphatic Dicarboxylic Acids

Dicarboxylic acids contain two carboxyl functional groups in their structure. In our study, we considered dicarboxylic acids with a linear structure that is symmetric with respect to the two carboxyl groups. Because of the second carboxyl group, the polarizability of dicarboxylic acids and the electronic interactions are stronger than in the case of monocarboxylic acids. This is the reason why, in a first step, we developed a separate QSPR model for a set of 10 aliphatic dicarboxylic acids (as a training set). Table 3 presents the values of the ZEP index calculated for these acids and the experimental values of molecular refractivity. By linear regression with a single descriptor, we obtained the following equation:
Rm = −7.105 (±0.019) + 1.337 ZEP (9)
N = 10; R = 0.9999; R2 = 0.9998; s = 0.0178; F = 7,493,020.6; R2CV = 0.998; MAE = 0.011; MAD = 0.005
The coefficient of determination R2 = 0.9998 and the standard error s = 0.0178 show a very good correlation between the ZEP index and molecular refractivity for aliphatic dicarboxylic acids. The model was validated by leave-one-out cross-validation and y-randomization. The results of y-randomization are presented in Table 6. These data show, for each iteration, values of R2yi < 0.2, which proves the stability of the model. On the other hand, the cross-validation coefficient, R2CV = 0.998, illustrates the reliability of the model. The leave-one-out cross-validation predicted values are presented in Table 3. Therefore, the obtained model (9) is indeed suitable for calculating the values of molar refractions in this class of dicarboxylic acids.

3.1.3. Unsaturated Carboxylic Acids

Unsaturated carboxylic acids contain in their structure double and triple bonds, alongside the carboxylic functional group. The multiple bonds are arranged asymmetrically with respect to the carboxyl group. The multiple bonds influence the polarizability and the electronic interactions of unsaturated carboxylic acids, but this influence is less significant than in the case of dicarboxylic acids. In our study, we developed a QSPR model for molar refraction (Rm), corresponding to a set of 12 unsaturated acids (as a training set). Table 4 presents the values of ZEP index calculated for these acids and the experimental values of molecular refractivity. By applying the linear regression method and using a single descriptor we obtained the following equation:
Rm = 0.083 (±0.673) + 1.287 (±0.027) ZEP (10)
N = 12; R = 0.998; R2 = 0.995; s = 0.717; F = 2195.62; R2CV = 0.993; MAE = 0.5233; MAD = 0.3765
The coefficient of determination R2 = 0.995 and the standard error s = 0.717 show a very good correlation between ZEP index and molecular refractivity for aliphatic unsaturated acids. The model was validated by leave-one-out cross-validation and y-randomization techniques. The results of y-randomization are presented in Table 6. These results show, for each iteration, values R2yi < 0.2, which indicates the stability of the model. The value of cross-validated coefficient, R2CV = 0.995, which is very close to the coefficient of determination, illustrates the reliability of the model. The leave-one-out cross-validation predicted values are also presented in Table 4. In order to check the predictive ability of the model (10), we calculated, by using this equation, the values of the molar refraction for five unsaturated carboxylic acids. The obtained values were compared with the experimental values of molar refraction existing in the literature. The differences between the experimental and predicted values were not significant. Therefore, the QSPR model (10) was shown to be very good for the calculation of molar refraction for unsaturated carboxylic acids.
At the end of our study, we applied the linear regression method to the set of 80 acids obtained by the union of the set of 50 monocarboxylic acids, the set of 13 dicarboxylic acids, and the set of 17 unsaturated carboxylic acids. We thus obtained the following QSPR model for molar refraction:
Rm = −0.276 (±0.629) + 1.259 (±0.018) ZEP (11)
N = 80; R = 0.992; R2 = 0.984; s = 1.819; F = 4654.81; R2CV = 0.993; MAE = 0.175; MAD = 0.095
The obtained QSPR Equation (11), modelling the molar refraction of carboxylic acids, was used to compute the molar refraction for four other carboxylic acids, not previously considered in the QSPR study. The results obtained in this way are given in Table 5. As can be seen, the maximum standard error for Rm corresponds, as expected, to Equation (11), which comprises all carboxylic acids considered in the study.

3.1.4. Building the QSPR Model for Polarizability

Table 6 presents the experimental values of molecular polarizability for 41 saturated aliphatic monocarboxylic acids, divided into two subsets: one set of 21 acids that serves as the training set in the QSPR modelling process, and another set of 20 acids, marked with the superscript b, that forms the test set used for testing the model by external validation. The polarizability values of the 20 acids in the test set were obtained by conversion of the molecular refractivity, using Equation (2). By correlating the molecular polarizability with the ZEP index for the 21 monocarboxylic acids of the training set, the following linear QSPR model was obtained:
P = −0.792 (±0.055) + 0.525 (±0.002) ZEP (12)
N = 21; R = 0.9999; R2 = 0.9998; s = 0.0617; F = 81,178.039; R2CV = 0.9998; R2CV ext = 0.998; MAE = 0.438; MAD = 0.04
The QSPR model (12) has a very good statistical quality for fitting the calculated values of P to the experimental ones. The robustness of the model (12) and its internal predictive ability were evaluated using a R2CV–cross validation coefficient based on leave-one-out (LOO); its value of 0.9998 being very good. The model (12) was also checked for reliability, robustness, and chance correlation by applying the y-randomization test. The y-randomization test was performed 10 times. The results of the y-randomization test are presented in Table 6. In each y-randomization run, we obtained R2yi < 0.2, which shows that the good results in our original model were not due to a chance correlation or structural dependence of the training set.
The QSPR model (12) was thus internally validated, and this equation was then used for calculating the molecular polarizability values for the training set and for predicting the P values of the monocarboxylic acids in the test set. The results are presented in Table 1. The regression of the predicted polarizability against the observed molecular polarizability gave R20 = 0.998. The analysis of the residuals of the predicted molecular polarizability against the experimental values in the training set showed that the residuals exceeded the standard deviation limits of ±2 s (in our case ±0.123) only once. This corresponds to the compound 2,2-dimethylpropanoic (0.18 error). The capability of the linear model (12) to predict P values for monocarboxylic acids was investigated in the test set. The predicted values of P for the 20 monocarboxylic acids included in the test set, a number close to that of the acids in the training set, were calculated with Equation (12) and are given in Table 1, together with their deviations from the corresponding experimental values of P. The external predictive power was confirmed by R2CV ext = 0.998 and R2 ext = 0.9998. The analysis of residuals shows a single compound, 2,2,3,3-tetramethylbutanoic acid, falling outside the standard deviation limits of ±2 s. All the validation strategies show that the obtained model (12) is a valid QSPR model for the prediction of the polarizability of monocarboxylic acids. A general QSPR model for all the 41 monocarboxylic acids was also proposed:
P = −0.806 (±0.047) + 0.527 (±0.001) ZEP (13)
N = 41; R = 0.9999; R2 = 0.9998; s = 0.083; F = 146,719.36; R2CV = 0.998; MAE = 0.062; MAD = 0.04
The statistical results for Equation (13) suggest that our QSPR model (12) is very good. The obtained QSPR equations modelling the polarizability of carboxylic acids were used to compute the polarizability for four other carboxylic acids, not previously considered in the QSPR study. The results obtained in this way are given in Table 7.
Notably, the values obtained for polarizability and molar refraction increase with the size and molecular weight of the carboxylic acids. This is in agreement with the Lorentz–Lorenz formula, which relates the polarizability, the molar refractivity, and the molar volume [18].
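The consistency between the polarizability and molar refraction models can be illustrated numerically for one compound. The sketch below uses the coefficients reported for models (5) and (12), the conversion factor of Equation (2), and the ZEP value 14.9568 given in the text for propanoic acid. It assumes that P in model (12) is expressed in units of 10−24 cm3, which is what the factor in Equation (2) suggests; the function names are illustrative.

```python
ZEP_PROPANOIC = 14.9568  # ZEP(G.2) from the worked example in Section 2.2


def rm_model_5(zep: float) -> float:
    """Molar refraction (cm^3/mol) from the reported model (5)."""
    return -2.152 + 1.331 * zep


def p_model_12(zep: float) -> float:
    """Polarizability from the reported model (12); units assumed 1e-24 cm^3."""
    return -0.792 + 0.525 * zep


def p_from_rm(r_m: float) -> float:
    """Polarizability in 1e-24 cm^3 via the factor of Equation (2)."""
    return 0.3964308 * r_m


p_direct = p_model_12(ZEP_PROPANOIC)
p_via_rm = p_from_rm(rm_model_5(ZEP_PROPANOIC))
print(round(p_direct, 3), round(p_via_rm, 3), round(p_direct - p_via_rm, 3))
```

The two routes differ by about 0.02 in these units, which is below the standard deviation reported for model (12).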

3.1.5. Building QSPR Models for Refractive Index

The molecular set considered here comprises 33 aliphatic monocarboxylic acids, with the corresponding nD values (see Table 1), of which 22 acids were used as the training set in the modeling process and 11 acids were used as a test set for external validation, which are marked with the superscript b. The following QSPR model was obtained in this case:
nD = 1.396 (±0.001) + 0.001 ZEP (14)
N = 22; R = 0.992; R2 = 0.984; s = 0.001183; F = 1284.906; R2CV = 0.981; MAE = 0.062; MAD = 0.0015
The model was similarly validated by leave-one-out cross-validation and y-randomization techniques. The results of y-randomization are presented in Table 6. Using Equation (14), we calculated the values of the refractive index for 11 saturated aliphatic monocarboxylic acids. The obtained values were compared with the experimental values of the refractive index available in the literature. On this basis, the QSPR model (14) was shown to be very good for the calculation of the refractive index of saturated aliphatic monocarboxylic acids.

4. Conclusions

In this work, we presented various QSPR models, as an alternative to the existing additive methods, for predicting the molar refraction, polarizability, and refractive index of carboxylic acids. We used linear regression with a single molecular descriptor, the ZEP topological index. ZEP was calculated in a simple manner from weighted electronic distances (wed), which are themselves calculated on the basis of the chemical structure of the molecules. The QSPR models developed were validated by means of the leave-one-out cross-validation procedure, external validation, and y-randomization. The obtained results show that the proposed models are simple and have significant predictive potential. Moreover, all QSPR models thus developed, irrespective of the property for which they were constructed (Rm, P, or nD), can also be applied for predicting the other two properties of carboxylic acids, in agreement with the Lorentz–Lorenz formula. This intercorrelation also explains the fact that, for all QSPR models reported here, the correlation coefficients have close values.
The results reported in this paper could be used in QSAR (quantitative structure activity relationship) for the prediction of the biological or pharmaceutical activity of carboxylic acids. Thus, the molar refraction values could be used for the estimation and prediction of the lipophilicity of a homologous series of saturated fatty acids [3], while the refraction index could be used for the estimation and prediction of the toxicity of aliphatic carboxylic acids [1]. These aspects will be considered in a future work.

Funding

This research received no external funding.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Maguna, F.P.; Nunez, M.B.; Okulik, N.B.; Castro, E.A. Improved QSAR Analysis of the Toxicity of Aliphatic Carboxylic Acids. Russ. J. Gen. Chem. 2003, 73, 1792–1798.
  2. Verma, R.P.; Kurup, A.; Hansch, C. On the role of polarisability in QSAR. Bioorg. Med. Chem. 2005, 13, 237–255.
  3. Pyka, A.; Bober, K. Selected traditional structural descriptors and RM values for estimation and prediction of lipophilicity of homologous series of saturated fatty acids. J. Am. Oil Chem. Soc. 2006, 83, 747–752.
  4. Sakuratani, Y.; Kasai, K.; Noguchi, Y.; Yamada, Y. Comparison of predictivities of log P calculation models based on experimental data for 134 simple organic compounds. QSAR Comb. Sci. 2007, 26, 109–116.
  5. Shafiei, F. Relationship between Topological Indices and Thermodynamic Properties and of the Monocarboxylic Acids Applications in QSPR. J. Math. Chem. 2015, 6, 15–28.
  6. Atkins, P.W. Physical Chemistry, 6th ed.; Oxford University Press: Oxford, UK; Melbourne, Australia; Tokyo, Japan, 1998; p. 654.
  7. Glasstone, S. Textbook of Physical Chemistry; Macmillan: London, UK, 1948; p. 543.
  8. Kittel, C. Introduction to Solid State Physics, 8th ed.; John Wiley & Sons, Inc.: New York, NY, USA, 2005; p. 464.
  9. Ghose, A.K.; Crippen, G.M. Atomic physicochemical parameters for three-dimensional-structure-directed quantitative structure-activity relationships. 2. Modeling dispersive and hydrophobic interactions. J. Chem. Inf. Comput. Sci. 1987, 27, 21–35.
  10. Miller, K.J.; Savchik, J.A. A new empirical Method to calculate Average Molecular Polarizabilities. J. Am. Chem. Soc. 1979, 101, 7206–7213.
  11. Padrón, J.A.; Carrasco, R.; Pellón, R.F. Molecular descriptor based on a molar refractivity partition using Randic-type graph-theoretical invariant. J. Pharm. Pharmaceut. Sci. 2002, 5, 258–266.
  12. Naef, R.A. Generally Applicable Computer Algorithm Based on the Group Additivity Method for the Calculation of Seven Molecular Descriptors: Heat of Combustion, LogPO/W, LogS, Refractivity, Polarizability, Toxicity and LogBB of Organic Compounds; Scope and Limits of Applicability. Molecules 2015, 20, 18279–18351.
  13. Verma, R.P.; Hansch, A.C. A comparison between two polarisability parameters in chemical-biological interactions. Bioorg. Med. Chem. 2005, 13, 2355–2372.
  14. Hansch, C.; Kurup, A. QSAR of Chemical Polarizability and Nerve Toxicity. 2. J. Chem. Inf. Comput. Sci. 2003, 43, 1647–1651.
  15. Wang, J.; Xie, X.Q.; Hou, T.J.; Xu, X.J. Fast Approaches for Molecular Polarizability Calculations. J. Phys. Chem. A 2007, 111, 4443–4448.
  16. Sonar, A.N.; Pawar, N.S. Studies on viscosity, density and refractive index of substituted heterocyclic compounds in different media. Rasayan J. Chem. 2010, 3, 250–254.
  17. Granados, K.; Gracia-Fadrique, J.; Amigo, A.; Bravo, R. Refractive Index, Surface Tension, and Density of Aqueous Mixtures of Carboxylic Acids at 298.15 K. J. Chem. Eng. Data 2006, 51, 1356–1360.
  18. Golovanov, I.B.; Zhenodarova, S.M. Quantitative structure-property relationship: XXVI. Toxicity of aliphatic carboxylic acids. Russ. J. Gen. Chem. 2006, 76, 40–44.
  19. Weast, R.C. CRC Handbook of Chemistry and Physics, 68th ed.; CRC: Boca Raton, FL, USA, 1987.
  20. Berinde, Z. Applications of Molecular Topology in The Study of Physico-Chemical Properties of Organic Compounds; Cub Press 22: Baia Mare, Romania, 2001. (In Romanian)
  21. Berinde, Z.; Berinde, M. On a matrix representation of molecular structures. Carpathian J. Math. 2004, 20, 205–209.
  22. Berinde, Z.M. Comparing the molecular graph degeneracy of Wiener, Harary, Balaban, Randic and ZEP topological indices. Creat. Math. Inf. 2014, 23, 165–174.
  23. Tropsha, A.; Gramatica, P.; Gombar, V.K. The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models. QSAR Comb. Sci. 2003, 22, 69–77.
  24. Roy, P.P.; Roy, K. On some aspects of variable selection for partial least squares regression models. QSAR Comb. Sci. 2008, 27, 302–313.
  25. Rücker, C.; Rücker, G.; Meringer, M. y-Randomization and Its Variants in QSPR/QSAR. J. Chem. Inf. Model. 2007, 47, 2345–2357.
  26. Topliss, J.G.; Costello, R.J. Chance Correlations in Structure-Activity Studies Using Multiple Regression Analysis. J. Med. Chem. 1972, 15, 1066–1068.
  27. Kiralj, R.; Ferreira, M.M.C. Basic Validation Procedures for Regression Models in QSAR and QSPR Studies: Theory and Application. J. Braz. Chem. Soc. 2009, 20, 770–787.
  28. Consonni, V.; Ballabio, D.; Todeschini, R. Evaluation of model predictive ability by external validation techniques. J. Chemom. 2010, 24, 194–201.
Figure 1. The wed for molecular graphs representing the skeleton of carboxylic acids.
Table 1. The ZEP index of the training and test sets of saturated aliphatic monocarboxylic acids and their molecular refraction (Rm), refractive index (nD), and polarizability (P).

| Acid | ZEP | Rm Exp. | Rm Pred. | Rm Error | nD Exp. | nD Pred. | nD Error | P Exp. | P Pred. | P Error |
|---|---|---|---|---|---|---|---|---|---|---|
| propanoic | 14.9568 | 17.51 | 17.76 | −0.25 | | | | | | |
| butanoic | 18.3959 | 22.14 | 22.33 | −0.19 | 1.411 b | 1.414 | −0.003 | 8.77 | 8.86 | 0.01 |
| 2-methylpropanoic | 18.2240 | 22.10 | 22.10 | 0.00 | 1.410 b | 1.414 | −0.004 | 8.76 | 8.78 | −0.02 |
| pentanoic | 21.8600 | 26.77 | 26.94 | −0.17 | 1.420 | 1.419 | 0.001 | 10.61 | 10.68 | −0.07 |
| 2-methylbutanoic | 21.6042 | 26.73 | 26.60 | 0.13 | 1.417 b | 1.418 | −0.001 | 10.59 | 10.55 | 0.04 |
| 2,2-dimethylpropanoic | 21.3494 | 26.74 | 26.26 | −0.47 | 1.419 | 1.418 | 0.001 | 10.60 | 10.42 | 0.18 |
| hexanoic | 25.3241 | 31.41 | 31.55 | −0.14 | 1.427 | 1.420 | 0.007 | 12.45 | 12.50 | −0.05 |
| 2-methylpentanoic | 25.2312 | 31.36 b | 31.43 | −0.07 | 1.425 | 1.421 | 0.004 | 12.43 b | 12.46 | −0.03 |
| 3-methylpentanoic | 25.1775 | 31.36 | 31.36 | 0.00 | 1.425 | 1.421 | 0.004 | 12.43 | 12.43 | 0.00 |
| 4-methylpentanoic | 25.0790 | 31.36 | 31.23 | 0.13 | 1.425 | 1.421 | 0.004 | 12.43 | 12.38 | 0.05 |
| 2,2-dimethylbutanoic | 24.9786 | 31.37 b | 31.09 | 0.28 | | | | 12.44 b | 12.32 | −0.12 |
| 2,3-dimethylbutanoic | 25.0456 | 31.32 b | 31.18 | 0.14 | | | | 12.42 | 12.36 | 0.06 |
| 2-ethylbutanoic | 25.3088 | 31.36 | 31.53 | 0.17 | 1.425 | 1.421 | 0.004 | | | |
| heptanoic | 28.7882 | 36.04 | 36.16 | −0.12 | 1.429 | 1.425 | 0.003 | 14.28 | 14.32 | −0.04 |
| 2-methylhexanoic | 28.6954 | 35.93 b | 36.04 | −0.11 | | | | 14.24 b | 14.24 | 0.00 |
| 3-methylhexanoic | 28.6254 | 35.99 b | 35.95 | 0.04 | | | | 14.26 b | 14.24 | 0.02 |
| 2,2-dimethylpentanoic | 28.4177 | 36.00 b | 35.67 | 0.33 | 1.423 b | 1.424 | −0.001 | 14.27 b | 14.13 | 0.14 |
| 3,3-dimethylpentanoic | 28.3330 | 36.01 | 35.56 | 0.45 | 1.431 b | 1.424 | 0.007 | | | |
| 2,3-dimethylpentanoic | 28.6098 | 35.94 | 35.95 | −0.01 | | | | 14.24 | 14.23 | 0.01 |
| 2,4-dimethylpentanoic | 28.4434 | 35.94 | 35.71 | 0.23 | 1.429 | 1.426 | 0.003 | 14.19 | 14.14 | 0.05 |
| 2-ethylpentanoic | 28.7567 | 35.95 | 36.12 | −0.17 | | | | 14.25 | 14.31 | −0.06 |
| 2,2,3-trimethylbutanoic | 28.5568 | 35.96 b | 35.86 | 0.10 | | | | 14.26 b | 14.20 | 0.06 |
| octanoic | 32.2523 | 40.67 | 40.77 | −0.10 | 1.433 b | 1.428 | 0.005 | 16.12 | 16.14 | −0.02 |
| 2-methylheptanoic | 31.9740 | 40.62 b | 40.41 | 0.21 | 1.430 b | 1.428 | 0.002 | 16.10 b | 16.00 | 0.10 |
| 2,4-dimethylhexanoic | 32.0167 | 40.59 | 40.46 | 0.13 | 1.432 b | 1.428 | 0.004 | 16.07 | 16.02 | 0.05 |
| 3,3-dimethylhexanoic | 31.7721 | 40.61 | 40.14 | 0.47 | | | | | | |
| 2-ethylhexanoic | 32.2208 | 40.63 | 40.73 | −0.10 | 1.435 | 1.428 | 0.007 | 16.10 | 16.12 | −0.02 |
| 2,3,4-trimethylpentanoic | 31.8740 | 40.53 b | 40.27 | 0.26 | | | | 16.07 b | 15.94 | 0.13 |
| 2-propylpentanoic | 32.2047 | 40.63 | 40.71 | −0.08 | 1.435 | 1.428 | 0.007 | 16.10 | 16.12 | −0.02 |
| 2,2,3,3-tetramethylbutanoic | 31.4091 | 40.61 b | 39.65 | 0.96 | | | | 16.10 b | 15.70 | 0.40 |
| nonanoic | 35.7164 | 45.30 | 45.39 | −0.09 | 1.439 b | 1.432 | 0.007 | 17.96 | 17.96 | 0.00 |
| 2-methyloctanoic | 35.6236 | 45.25 b | 45.26 | −0.01 | 1.439 | 1.432 | 0.007 | 17.94 b | 17.91 | −0.03 |
| 2,3-dimethylheptanoic | 35.5218 | 45.21 | 45.13 | 0.08 | 1.438 b | 1.432 | 0.006 | | | |
| 2-ethylheptanoic | 35.6849 | 45.25 b | 45.34 | −0.09 | | | | 17.94 b | 17.94 | 0.00 |
| 2,3,4-trimethylhexanoic | 35.4382 | 45.16 b | 45.02 | 0.14 | | | | 17.90 b | 17.81 | 0.09 |
| decanoic | 39.1805 | 49.94 | 49.99 | −0.05 | 1.443 | 1.435 | 0.008 | 19.79 | 19.78 | 0.01 |
| 2-methylnonanoic | 39.0877 | 49.88 b | 49.87 | 0.01 | 1.441 | 1.435 | 0.006 | 19.77 b | 19.73 | 0.07 |
| 2,2-dimethyloctanoic | 38.8101 | 49.89 b | 49.50 | 0.39 | 1.442 | 1.435 | 0.007 | | | |
| 2-ethyloctanoic | 39.1490 | 49.88 b | 49.96 | −0.08 | 1.441 | 1.435 | 0.006 | 19.77 b | 19.76 | 0.01 |
| 2-propylheptanoic | 39.1329 | 49.88 b | 49.93 | −0.05 | 1.441 | 1.435 | 0.006 | | | |
| undecanoic | 42.6446 | 54.57 | 54.61 | −0.04 | 1.445 | 1.439 | 0.006 | 21.63 | 21.60 | 0.03 |
| 2-methyldecanoic | 42.5518 | 54.51 b | 54.48 | 0.03 | 1.444 | 1.439 | 0.005 | 21.61 b | 21.55 | 0.06 |
| 2,2-dimethylnonanoic | 42.2742 | 54.52 b | 54.12 | 0.40 | 1.445 | 1.438 | 0.007 | | | |
| 2-ethylnonanoic | 42.6131 | 54.51 b | 54.57 | −0.06 | 1.445 | 1.438 | 0.007 | 21.61 b | 21.58 | 0.03 |
| 2-propyloctanoic | 42.5970 | 54.51 b | 54.54 | −0.03 | 1.444 | 1.439 | 0.005 | 21.61 b | 21.57 | 0.04 |
| dodecanoic | 46.1087 | 59.18 | 59.22 | −0.04 | | | | 23.42 | 23.42 | 0.00 |
| tridecanoic | 49.5728 | 63.81 | 63.83 | −0.02 | | | | | | |
| tetradecanoic | 53.0369 | 68.44 b | 68.44 | 0.00 | | | | 27.13 b | 27.05 | 0.08 |
| pentadecanoic | 56.5010 | 73.07 b | 73.05 | 0.02 | 1.468 b | 1.453 | 0.015 | 28.97 b | 28.87 | 0.10 |
| hexadecanoic | 59.9651 | 77.70 b | 77.66 | 0.04 | | | | 30.80 b | 30.69 | 0.11 |
b The test set.
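The single-descriptor regression and the leave-one-out check can be illustrated on a few rows of Table 1. The sketch below fits Rm against ZEP for the eight unbranched acids from propanoic to decanoic (all training-set entries) and computes r² and a leave-one-out q². It uses only a subset of the data, so the coefficients are not expected to match the paper's equations exactly, and the function names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(x, y):
    """Coefficient of determination of the fitted line."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def loo_q2(x, y):
    """Leave-one-out q^2: refit without each point, then predict that point."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1.0 - press / sum((yi - my) ** 2 for yi in y)

# ZEP and experimental Rm from Table 1 (propanoic ... decanoic)
ZEP = [14.9568, 18.3959, 21.8600, 25.3241, 28.7882, 32.2523, 35.7164, 39.1805]
RM = [17.51, 22.14, 26.77, 31.41, 36.04, 40.67, 45.30, 49.94]

slope, intercept = fit_line(ZEP, RM)
print(slope, intercept, r_squared(ZEP, RM), loo_q2(ZEP, RM))
```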
Table 2. Values of wed, with Zk and vk given for the first and second atom of each bond.

| Bond | Zk (atom 1, atom 2) | vk (atom 1, atom 2) | wed |
|---|---|---|---|
| C=O | 24, 16 | 4, 2 | 2.5 |
| C–OH | 24, 8 | 4, 1 | 8 |
| CH3–CH2 | 6, 12 | 1, 2 | 9 |
| CH3–CH | 6, 18 | 1, 3 | 8 |
| CH3–C | 6, 24 | 1, 4 | 7.5 |
| CH2–CH2 | 12, 12 | 2, 2 | 6 |
| CH2–CH | 12, 18 | 2, 3 | 5 |
| CH2–C | 12, 24 | 2, 4 | 4.5 |
| CH–CH | 18, 18 | 3, 3 | 4 |
| CH–C | 18, 24 | 3, 4 | 3.5 |
| C–C | 24, 24 | 4, 4 | 3 |
| CH2=CH | 12, 18 | 2, 3 | 2.5 |
| CH2=C | 12, 24 | 2, 4 | 2.25 |
| CH=CH | 18, 18 | 3, 3 | 2 |
| CH=C | 18, 24 | 3, 4 | 1.75 |
| C=C | 24, 24 | 4, 4 | 1.5 |
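The values in Table 2 are all reproduced by the expression wed = (Zi + Zj) / (b · vi · vj), where b is the bond order. The sketch below treats that expression as an assumption inferred from the tabulated numbers rather than as a definition quoted from this section; the function name `wed` and the dictionary layout are illustrative.

```python
def wed(z_i, v_i, z_j, v_j, bond_order=1):
    """Weighted electronic distance of a bond (assumed form, see lead-in)."""
    return (z_i + z_j) / (bond_order * v_i * v_j)

# (Z, v) of atom 1, (Z, v) of atom 2, bond order, tabulated wed from Table 2
TABLE_2 = {
    "C=O":     ((24, 4), (16, 2), 2, 2.5),
    "C-OH":    ((24, 4), (8, 1),  1, 8.0),
    "CH3-CH2": ((6, 1),  (12, 2), 1, 9.0),
    "CH3-CH":  ((6, 1),  (18, 3), 1, 8.0),
    "CH3-C":   ((6, 1),  (24, 4), 1, 7.5),
    "CH2-CH2": ((12, 2), (12, 2), 1, 6.0),
    "CH2-CH":  ((12, 2), (18, 3), 1, 5.0),
    "CH2-C":   ((12, 2), (24, 4), 1, 4.5),
    "CH-CH":   ((18, 3), (18, 3), 1, 4.0),
    "CH-C":    ((18, 3), (24, 4), 1, 3.5),
    "C-C":     ((24, 4), (24, 4), 1, 3.0),
    "CH2=CH":  ((12, 2), (18, 3), 2, 2.5),
    "CH2=C":   ((12, 2), (24, 4), 2, 2.25),
    "CH=CH":   ((18, 3), (18, 3), 2, 2.0),
    "CH=C":    ((18, 3), (24, 4), 2, 1.75),
    "C=C":     ((24, 4), (24, 4), 2, 1.5),
}

for bond, ((z1, v1), (z2, v2), order, tabulated) in TABLE_2.items():
    print(bond, wed(z1, v1, z2, v2, order), tabulated)
```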
Table 3. The ZEP index values of aliphatic dicarboxylic acids and their molecular refraction (Rm).

| Set | Acid | ZEP | Rm Exp. | Rm Pred. | Error | Cross |
|---|---|---|---|---|---|---|
| Training | Propanedioic | 19.5651 | 19.07 | 19.05 | 0.02 | 19.04 |
| Training | Butanedioic | 23.0458 | 23.70 | 23.71 | −0.01 | 23.71 |
| Training | Pentanedioic | 26.5099 | 28.34 | 28.34 | 0.00 | 28.34 |
| Training | Hexanedioic | 29.9740 | 32.97 | 32.97 | 0.00 | 32.97 |
| Training | Heptanedioic | 33.4381 | 37.60 | 37.60 | 0.00 | 37.60 |
| Training | Octanedioic | 36.9022 | 42.24 | 42.23 | 0.01 | 42.23 |
| Training | Nonanedioic | 40.3663 | 46.82 | 46.86 | −0.04 | 46.86 |
| Training | Decanedioic | 43.8304 | 51.50 | 51.49 | 0.01 | 51.49 |
| Training | Dodecanedioic | 50.7587 | 60.77 | 60.76 | 0.01 | 60.76 |
| Training | Tetradecanedioic | 57.6868 | 70.03 | 70.02 | 0.01 | 70.02 |
| Test | Undecanedioic | 47.2945 | 56.13 b | 56.12 | 0.01 | |
| Test | Tridecanedioic | 54.2228 | 65.39 b | 65.39 | 0.00 | |
| Test | Pentadecanedioic | 61.1509 | 74.66 b | 74.65 | 0.01 | |
b The test set.
Table 4. The ZEP index values of aliphatic unsaturated acids and their molecular refraction (Rm).

| Set | Acid | ZEP | Rm Exp. | Rm Pred. | Error | Cross |
|---|---|---|---|---|---|---|
| Training | trans-2-Pentenoic | 20.9397 | 26.83 | 27.30 | −0.05 | 27.34 |
| Training | 4-Pentenoic | 20.2379 | 26.50 | 25.93 | 0.57 | 25.87 |
| Training | trans-2-Hexenoic | 24.3876 | 31.46 | 31.76 | −0.30 | 31.79 |
| Training | trans-3-Hexenoic | 24.1064 | 31.46 | 31.41 | 0.05 | 31.41 |
| Training | (E)-but-2-enoic | 17.1824 | 22.20 | 22.48 | −0.28 | 22.52 |
| Training | propenoic | 13.3028 | 17.23 | 16.92 | 0.31 | 16.82 |
| Training | 2-Octenoic | 31.0076 | 40.73 | 40.28 | 0.45 | 40.19 |
| Training | 2-Octynoic | 30.8640 | 38.68 | 39.26 | −0.58 | 39.36 |
| Training | 2-Propynoic | 12.5041 | 15.33 | 15.66 | −0.33 | 15.78 |
| Training | 2-Nonynoic | 34.3281 | 43.31 | 43.73 | −0.42 | 43.87 |
| Training | 2-Nonenoic | 34.4717 | 45.36 | 44.75 | 0.61 | 44.52 |
| Training | 3-Butenoic | 16.3394 | 21.87 | 21.44 | 0.43 | 21.37 |
| Test | 2-Methyl-propenoic | 16.8167 | 21.96 b | 21.73 | 0.23 | |
| Test | cis-2-Methyl-2-butenoic | 20.9397 | 27.26 b | 27.04 | 0.22 | |
| Test | trans-2-Heptenoic | 27.5435 | 36.05 b | 35.53 | 0.52 | |
| Test | trans-3-Heptenoic | 27.5544 | 36.05 b | 35.55 | 0.50 | |
| Test | 3-heptynoic | 26.8731 | 34.20 b | 34.67 | −0.47 | |
b The test set.
Table 5. Molar refraction of four external acids.

| Acid | ZEP | Rm Exp. | Rm Pred. | Equation | Error |
|---|---|---|---|---|---|
| Octadecanoic | 66.8933 | 86.96 | 86.88 | (6) | 0.08 |
| | | | 87.15 | (9) | −0.19 |
| | | | 83.94 | (12) | 3.02 |
| Hexadecanedioic | 64.6150 | 79.29 | 79.28 | (10) | 0.01 |
| | | | 81.07 | (12) | −1.78 |
| trans-5-octenoic | 31.0453 | 40.4 | 40.04 | (11) | 0.36 |
| | | | 38.81 | (12) | 1.59 |
| (Z)-9-Octadecenoic | 65.6702 | 86.12 | 84.60 | (11) | 1.52 |
| | | | 82.40 | (12) | 3.72 |
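To compare how the individual equations perform on these external acids, the errors in Table 5 can be summarised as a mean absolute error per equation. The sketch below only re-aggregates the tabulated experimental and predicted values; the grouping by equation number and the names used are illustrative choices, not part of the paper's validation protocol.

```python
from collections import defaultdict

# (acid, experimental Rm, predicted Rm, equation number) from Table 5
EXTERNAL = [
    ("octadecanoic",       86.96, 86.88, 6),
    ("octadecanoic",       86.96, 87.15, 9),
    ("octadecanoic",       86.96, 83.94, 12),
    ("hexadecanedioic",    79.29, 79.28, 10),
    ("hexadecanedioic",    79.29, 81.07, 12),
    ("trans-5-octenoic",   40.40, 40.04, 11),
    ("trans-5-octenoic",   40.40, 38.81, 12),
    ("(Z)-9-octadecenoic", 86.12, 84.60, 11),
    ("(Z)-9-octadecenoic", 86.12, 82.40, 12),
]

def mae_by_equation(rows):
    """Mean absolute error |exp - pred| grouped by equation number."""
    errors = defaultdict(list)
    for _acid, exp, pred, eq in rows:
        errors[eq].append(abs(exp - pred))
    return {eq: sum(v) / len(v) for eq, v in errors.items()}

for eq, mae in sorted(mae_by_equation(EXTERNAL).items()):
    print(f"Equation ({eq}): MAE = {mae:.2f}")
```

On these four acids, Equation (12) gives the largest mean absolute error of the equations listed.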
Table 6. Results of the y-randomization test (R²yi for each iteration).

| Iteration | Rm, monocarboxylic acids | nD, monocarboxylic acids | P, monocarboxylic acids | Rm, dicarboxylic acids | Rm, unsaturated acids |
|---|---|---|---|---|---|
| 1 | 0.001 | 0.065 | 0.059 | 0.087 | 0.143 |
| 2 | 0.061 | 0.091 | 0.000 | 0.052 | 0.015 |
| 3 | 0.010 | 0.019 | 0.111 | 0.216 | 0.008 |
| 4 | 0.202 | 0.097 | 0.050 | 0.098 | 0.023 |
| 5 | 0.035 | 0.008 | 0.044 | 0.146 | 0.010 |
| 6 | 0.044 | 0.102 | 0.058 | 0.108 | 0.097 |
| 7 | 0.017 | 0.046 | 0.073 | 0.016 | 0.057 |
| 8 | 0.004 | 0.028 | 0.023 | 0.024 | 0.013 |
| 9 | 0.021 | 0.033 | 0.083 | 0.139 | 0.197 |
| 10 | 0.179 | 0.001 | 0.055 | 0.097 | 0.013 |
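The idea behind the y-randomization test can be sketched as follows: the property values are shuffled against a fixed descriptor, the line is refitted, and the resulting r² is compared with that of the real model. This is a generic illustration on eight rows of Table 1, assuming 10 iterations as in Table 6; the seed, the function names, and the use of Python's `random` module are choices made here, and the printed values will not reproduce those of Table 6.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, equal to r^2 of a one-descriptor linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)

def y_randomization(x, y, n_iter=10, seed=0):
    """Return r^2 of models fitted to randomly permuted y values."""
    rng = random.Random(seed)          # fixed seed, for repeatability only
    scores = []
    for _ in range(n_iter):
        y_shuffled = list(y)
        rng.shuffle(y_shuffled)        # break the structure-property pairing
        scores.append(r_squared(x, y_shuffled))
    return scores

# ZEP and experimental Rm from Table 1 (propanoic ... decanoic)
ZEP = [14.9568, 18.3959, 21.8600, 25.3241, 28.7882, 32.2523, 35.7164, 39.1805]
RM = [17.51, 22.14, 26.77, 31.41, 36.04, 40.67, 45.30, 49.94]

print("real model r2:", r_squared(ZEP, RM))
print("randomized r2:", [round(s, 3) for s in y_randomization(ZEP, RM)])
```

A model that is not a chance correlation should show randomized r² values well below the r² of the real model, which is the pattern reported in Table 6.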
Table 7. Polarizability of four external acids.

| Acid | ZEP | P Exp. | P Pred. | Equation | Error |
|---|---|---|---|---|---|
| 2-ethylbutanoic | 25.3088 | 12.43 | 12.49 | (13) | −0.06 |
| | | | 12.53 | (14) | −0.10 |
| 2,3-dimethylheptanoic | 35.5218 | 17.92 | 17.86 | (13) | 0.06 |
| | | | 17.92 | (14) | 0.00 |
| 2-propylheptanoic | 39.1329 | 19.77 | 19.75 | (13) | 0.02 |
| | | | 19.82 | (14) | −0.05 |
| tridecanoic | 49.5728 | 25.30 | 25.23 | (13) | 0.07 |
| | | | 25.32 | (14) | −0.02 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Berinde, Z.M. QSPR Models for the Molar Refraction, Polarizability and Refractive Index of Aliphatic Carboxylic Acids Using the ZEP Topological Index. Symmetry 2021, 13, 2359. https://doi.org/10.3390/sym13122359
