Prediction of the n-Octanol/Water Partition Coefficients of Basic Compounds Using Multi-Parameter QSRR Models Based on IS-RPLC Retention Behavior in a Wide pH Range

The n-octanol–water partition coefficient (logP) is an important physicochemical parameter which describes the behavior of organic compounds. In this work, the apparent n-octanol/water partition coefficients (logD) of basic compounds were determined using ion-suppression reversed-phase liquid chromatography (IS-RPLC) on a silica-based C18 column. The quantitative structure–retention relationship (QSRR) models between logD and logkw (logarithm of retention factor corresponding to 100% aqueous fraction of mobile phase) were established at pH 7.0–10.0. It was found that logD had a poor linear correlation with logkw at pH 7.0 and pH 8.0 when strongly ionized compounds were included in the model compounds. However, the linearity of the QSRR model was significantly improved, especially at pH 7.0, when molecular structure parameters such as electrostatic charge ne and hydrogen bonding parameters A and B were introduced. External validation experiments further confirmed that the multi-parameter models could accurately predict the logD value of basic compounds not only under strong alkaline conditions, but also under weak alkaline and even neutral conditions. The logD values of basic sample compounds were predicted based on the multi-parameter QSRR models. Compared with previous work, the findings of this study extended the pH range for the determination of the logD values of basic compounds, providing an optional mild pH for IS-RPLC experiments.


Introduction
Anilines, pyridines, imidazoles, triazines, and other basic compounds are essential chemical raw materials for national economies [1][2][3]. They are important organic intermediates in the production of vitamins, enzymes, and sulfonamides in pharmaceutical research. However, their production and use, as well as accidents arising from their storage and transportation, often cause these compounds to enter the environment, resulting in the pollution of air, soil/sediment, and water systems. These environmental risks also negatively affect wildlife [4]. At present, the environmental behavior of basic compounds is attracting considerable attention. The physicochemical properties of organic compounds determine the distribution and the fate of these pollutants in environmental media. Therefore, determining the physicochemical properties of basic compounds will help in assessing their potential environmental and health risks. The n-octanol/water partition coefficient P (generally expressed in logarithmic form, logP) denotes the ratio of a chemical's concentrations in octanol and in water when the octanol-water system is at equilibrium [5]. logP is a widely used and crucial parameter in investigating the fate of organic pollutants in the environment [6], and it remains a fundamental parameter for estimating bioaccumulation. The classical methods for logP determination are the shake-flask method (SFM) and the slow-stirring method (SSM) [7][8][9]. However, because SFM and SSM are difficult, time-consuming, and costly, logP values of basic compounds obtained by these two methods are scarce. As a result, researchers have been committed to the development and improvement of calculation methods for determining logP, based on fragmental constants or an atomic contribution approach.
In recent years, many calculation models have been developed and reported in the literature, such as the extreme learning machine model [5], the density functional theory (DFT) method [10], the norm index-based model [11], etc. These calculation methods are efficient for neutral compounds, but there are problems and limitations when applied to ionizable molecules, for which complex considerations are necessary [12][13][14]. Large deviations have been found between the calculated values and experimental values of ionizable compounds, especially those with complex structures [15]. Therefore, a technique for determining the logP values of basic compounds based on experimental methods is necessary.
High performance liquid chromatography (HPLC) is one method recommended by the Organisation for Economic Co-operation and Development (OECD) for logP determination [16]. Compared with SFM and SSM, HPLC is fast, requires only small amounts of sample, and is highly automated. A significant advantage of HPLC is that the samples need not be of high purity [17]. Because of these distinct advantages, HPLC is widely used for the determination of the logP values or apparent n-octanol/water partition coefficients (logD) of organic compounds [17][18][19][20]. Among ionizable compounds, the logP or logD values of acidic compounds have been extensively studied, but those of basic compounds have been relatively neglected. In our previous work, Qi et al. [21] investigated the retention behavior of weakly ionized basic compounds on silica-based C18 columns using ion-suppression reversed-phase liquid chromatography (IS-RPLC) and established a linear relationship between logD and logkw (logarithm of the retention factor with 100% aqueous phase as the mobile phase). Notably, in Qi's work [21], a relatively high pH of 9.0-11.5 was used, at which the dissociation of basic compounds was almost completely suppressed. Although this is beneficial for the determination of the logD values of basic compounds, the silica gel of the stationary phase is easily destroyed under strongly alkaline conditions, which is detrimental to the continuity of experiments and increases experimental cost and time. Is a relatively mild pH, which helps to protect the silica-based column, also suitable for the determination of the logD values of basic compounds in a dissociated state? The answer is not clear. Therefore, the feasibility of predicting the logD values of basic compounds in a dissociated state is worth studying.
In this work, we systematically investigated the IS-RPLC retention behavior of 42 basic compounds, including anilines, pyridines, imidazoles, and triazines, on a silica-based C18 column at pH values from 7.0 to 10.0. Methanol was used as the organic modifier and phosphate buffer was selected as the ion-suppressor. The univariate linear logD-logkw models and multi-parameter QSRR models were established based on multiple linear regression (MLR). The applicability of these two kinds of models at each pH value was evaluated based on the squared linear regression correlation coefficient (R²). The results showed that the multi-parameter QSRR model exhibited advantages for determining the logD values of ionized basic compounds over a wide pH range. Based on the multi-parameter models, the logD values of 15 basic compounds were predicted at various pH values.

Establishment of logD-logk w Models and Comparison with Previous Work
The retention behavior of the 42 basic compounds (Table 1) was investigated at different volume fractions of methanol (ϕ = 0.7-0.1, at intervals of 0.1 or 0.05 depending on the retention of the investigated solutes). The logk values were plotted versus ϕ for each solute at pH 7.0, pH 8.0, pH 9.0, and pH 10.0. A logk-ϕ relationship diagram of some model compounds, verification compounds, and sample compounds is shown in Figure S1. The results showed that the logk values of the investigated solutes were all linearly related to ϕ at each pH value, with R² greater than 0.99 in each case. This confirmed that the retention behavior of ionizable compounds still satisfies the linear solvent strength (LSS) model (Equation (S1)) in IS-RPLC. The logkw value of each compound was then obtained using Equation (S1), and the values are summarized in Table 1. The logD values of the model compounds and verification compounds, calculated using Equation (S3) from the logP, pKa, and pH values, are also listed in Table 1.

Table 1 note: The literature logP and pKa values were obtained from the database module of ACD/Labs software; the logD values were calculated from logP, pKa, and pH using Equation (S3). NA: no logP value obtained from the literature.

Notably, the values of model compounds 1-14 were consistent with those reported in Qi's work [21]. For these 14 compounds, logD was plotted against logkw and linearly fitted at pH 7.0-10.0 (Figure 1). As Figure 1 shows, the data points were relatively dispersed at pH 7.0. As a result, the linearity of the logD-logkw model at this pH was poor, with an R² value of only 0.825, as shown in Table 2. When the pH was raised to 8.0-10.0, the points became increasingly concentrated and the linearity of the models improved significantly (R² = 0.943-0.969). Additionally, the R² values at pH 9.0 and pH 10.0 were almost the same, suggesting that the linearity of the models did not change much above pH 9.0. This may be attributed to the fact that the dissociation of most of the model compounds was well suppressed at pH 9.0 and above, leading to similar retention behavior of the solutes.

By comparing Figure 1A-D, it was found that the poor linearity at pH 7.0 was mainly caused by the deviation of benzylamine from the other points. Structurally, benzylamine differs from aniline and pyridine. The lone pair of the N atom in benzylamine is not coplanar with the delocalized π system (Π₆⁶) of the benzene ring, but lies at an angle of about 50° to it. The saturated methylene group prevents the lone pair from conjugating with the benzene ring, resulting in the strong dissociation of the amino group in benzylamine. It is known that dissociable basic compounds exist as cations (the corresponding conjugate acids) at pH < pKa, and that the dissociation can be completely suppressed only when the mobile phase pH ≥ pKa + 2.
As Table 1 shows, the pKa (9.35) of benzylamine was significantly higher than those of the other 13 model compounds. Therefore, benzylamine was almost completely dissociated at pH 7.0, leading to a relatively different retention behavior compared with the other 13 compounds. With the increased pH of the mobile phase, the dissociation of benzylamine was partially suppressed, and the difference in retention behavior compared with the other compounds gradually decreased. If benzylamine was removed from the model compounds, the linear correlation of the logD-logkw models at all pH values would improve, especially at pH 7.0 (Table S1).
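The effect of pH on the ionization of benzylamine can be illustrated with a short calculation. The sketch below assumes the usual relations for a monoprotic base in which only the neutral form partitions into octanol; Equation (S3) is not reproduced in this text, so the logD expression used here is an assumed standard form rather than the authors' exact equation, and the function names are hypothetical.

```python
import math


def ionized_fraction_base(pka: float, ph: float) -> float:
    """Fraction of a monoprotic base present as the protonated cation BH+."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def log_d_base(log_p: float, pka: float, ph: float) -> float:
    """logD of a monoprotic base (assumed form: only the neutral species partitions)."""
    return log_p - math.log10(1.0 + 10.0 ** (pka - ph))


PKA_BENZYLAMINE = 9.35  # pKa quoted in the text (Table 1)

for ph in (7.0, 8.0, 9.0, 10.0):
    frac = ionized_fraction_base(PKA_BENZYLAMINE, ph)
    print(f"pH {ph:4.1f}: ionized fraction = {frac:.3f}")
```

Under these assumptions, benzylamine is about 99.6% protonated at pH 7.0, and the ionized fraction falls below 1% only once the pH reaches roughly pKa + 2, in line with the ion-suppression rule stated above.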
The logD-logkw models in this work were compared with those in Qi's work [21]. It was found that the linear models obtained in this work were better than those obtained in Qi's work at pH 9.0 and pH 10.0. There are three possible reasons for this. First, different ion-suppressors were used in the two works: in Qi's work, ammonia and triethylamine (TEA) were used as ion-suppressors, whereas in this work, ammonium phosphate buffer was used. Second, column performance has improved. Current chromatographic columns are generally superior to those used in the past, both in terms of packing technology and packing composition, which is beneficial for obtaining good peak shapes and accurate retention of the solutes. Third, the dual-point retention time correction (DP-RTC) method [22] was introduced in this work. Research has shown that retention time correction makes the measured retention times more accurate [22]. To better understand which was the main reason for the different linearities of the models in these two works, model compounds 1-14 were further investigated on the Welch Xtimate C18 column using ammonia and TEA as ion-suppressors at pH 9.0 (the same ion-suppressors used in Qi's work). The resulting logD-logkw models and linear regression coefficients are shown in Table S2. Under the same mobile phase, the linearities of the logD-logkw models on the Welch Xtimate C18 column were clearly superior to those on the Phenomenex Gemini C18 column, especially with TEA as an ion-suppressor. Therefore, the better performance of the current column and the introduction of the DP-RTC method both contributed to the improved linearity of the logD-logkw models in this work.
It was further discovered that, on the Welch Xtimate C18 column at pH 9.0 with TEA as an ion-suppressor, the logD-logkw linear correlation (R² = 0.961, Table S2) was almost equal to that obtained with phosphate buffer as an ion-suppressor (R² = 0.968, Table 2). However, the logD-logkw linear correlation was relatively low when using ammonia as an ion-suppressor (R² = 0.931), though it was still higher than that obtained in Qi's work (R² = 0.925). It may be that the ionic strength of the ammonia solution was lower than that of the triethylamine solution and the phosphate buffer, and that this affected the retention behavior of the solutes. Since phosphate buffer can not only suppress the dissociation of solutes but also maintain the ionic strength of the mobile phase, it was adopted in the following experiments in this work.

Establishment of Multi-Parameter QSRR Models
As mentioned above, when compounds with high pKa values were used as model compounds, the logD-logkw linearity was not good at low pH, and this was unfavorable for accurately predicting the logD values of strongly ionized basic compounds. How can this problem be solved? In general, the more model compounds are used, the better the linearity of the models is expected to be, resulting in more accurate logD predictions. Therefore, in addition to the above 14 compounds, a further 9 compounds (No. 15-23, Table 1) with experimental logP and pKa values were introduced as model compounds. Altogether, these are the compounds 1-23 listed in Table 1. Similarly, logD values were plotted versus logkw values, and the corresponding logD-logkw models are listed in Table S3.
Contrary to our expectations, the linear correlation of the logD-logkw models was not obviously improved compared with that listed in Table 2. In the study of acidic ionized compounds, we proposed that the involvement of molecular structure parameters such as electrostatic charge ne, hydrogen bonding parameter A, and hydrogen bonding parameter B could effectively improve the correlation of the logD-logkw models [15,19,23]. Inspired by this, we speculated that this rule may also be applicable to ionized basic compounds. Thus, the parameters ne, A, and B were introduced to the models, and the values of the three parameters for all the investigated compounds are listed in Table S4.
Multi-parameter QSRR models including logD, logkw, ne, A, and B were established using the multiple linear regression (MLR) method. In the process of linear fitting, ne, A, B, and their different combinations were introduced to optimize the models. It was found that, at pH 7.0 and pH 8.0, the models which included ne exhibited the best linearity, suggesting that electrostatic interaction as well as hydrophobic interaction played an important role in solute retention under neutral and weakly alkaline conditions. This is because at pH 7.0 and pH 8.0 (especially at pH 7.0) some model compounds were completely dissociated and some were partially dissociated, so there was strong electrostatic interaction between the ionized solutes and the stationary and mobile phases. In contrast, at pH 9.0 and pH 10.0, the linearity was best when ne, A, and B were all introduced at the same time. This suggests that both electrostatic interaction and hydrogen bond interaction were the main secondary interactions affecting the retention of the basic compounds under relatively strongly alkaline conditions, where the dissociation was weak. The best multi-parameter QSRR models at each investigated pH are listed in Table 3. From Table 3, we can see that the linearity of the models improved significantly compared with the models that contained no molecular structure parameters, with an R² value of 0.946 achieved at pH 7.0. This showed that multi-parameter QSRR models for the determination of the logD values of basic compounds could be properly established under strongly alkaline, weakly alkaline, and even neutral conditions.

Table 3. Multi-parameter QSRR models derived from the 23 model compounds.
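The comparison between a univariate logD-logkw model and a multi-parameter model can be sketched as an ordinary least-squares fit. The following is a minimal illustration, not the authors' Origin workflow: the function `fit_mlr`, the variable names, and all descriptor values and coefficients are hypothetical synthetic data, generated only to show the mechanics of adding ne, A, and B as regressors.

```python
import numpy as np


def fit_mlr(descriptors, log_d):
    """Least-squares fit of logD = sum(c_i * x_i) + intercept.

    Returns (coefficients, R^2); the last coefficient is the intercept.
    """
    x = np.asarray(descriptors, dtype=float)
    y = np.asarray(log_d, dtype=float)
    if x.ndim == 1:  # a single descriptor such as logkw alone
        x = x[:, None]
    design = np.column_stack([x, np.ones(len(y))])  # append intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    r2 = 1.0 - np.sum(residual**2) / np.sum((y - y.mean()) ** 2)
    return coef, r2


# Hypothetical descriptors for 23 model compounds (NOT the values in Table S4).
rng = np.random.default_rng(0)
n = 23
log_kw = rng.uniform(0.5, 3.5, n)
n_e = rng.uniform(0.0, 1.0, n)   # electrostatic charge parameter
a_hb = rng.uniform(0.0, 0.5, n)  # hydrogen bonding parameter A
b_hb = rng.uniform(0.2, 0.8, n)  # hydrogen bonding parameter B
# Synthetic logD with an arbitrary dependence on all four descriptors plus noise.
log_d = (0.9 * log_kw - 1.5 * n_e - 0.4 * a_hb + 0.3 * b_hb + 0.2
         + rng.normal(0.0, 0.05, n))

_, r2_uni = fit_mlr(log_kw, log_d)
coef, r2_multi = fit_mlr(np.column_stack([log_kw, n_e, a_hb, b_hb]), log_d)
print(f"univariate R^2 = {r2_uni:.3f}, multi-parameter R^2 = {r2_multi:.3f}")
```

Because the univariate model is nested in the multi-parameter one, R² on the fitting set cannot decrease when descriptors are added, which is why an external validation set is still needed to judge predictive ability.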

External Validation of Multi-Parameter Models and Sample logD Determination
Furthermore, to evaluate the reliability and accuracy of the multi-parameter QSRR models, four compounds whose logP values have been reliably reported in the literature (4-bromoaniline, 2-ethylpyridine, 2-ethylaniline, and dibenzylamine) were chosen as validation compounds for the external verification experiment. The literature logD values of these four compounds were calculated using Equation (S3) with the corresponding pKa, logP, and pH values. The verification results are listed in Table 4. The relative errors between the determined logD values (the values determined by the models in Table 3) and the literature logD values were all within the acceptable limit of 20%, not only at high pH values of 9.0-10.0, but also at low pH values of 7.0-8.0. It was also found that the models were able to accurately predict the logD values of both weakly basic compounds and strongly ionized basic compounds. The validation results showed that the multi-parameter QSRR models established in this work have strong robustness, good predictability, and wide pH applicability.
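The acceptance check used in the external validation amounts to a simple relative-error calculation. The sketch below assumes the relative error is defined as |predicted − literature| / |literature| × 100%; the exact definition is not given in this text, and the function names and numerical values are hypothetical placeholders, not entries from Table 4.

```python
def relative_error_percent(predicted: float, reference: float) -> float:
    """Relative error of a predicted logD against a reference logD, in percent."""
    if reference == 0:
        raise ValueError("relative error is undefined for a reference value of 0")
    return abs(predicted - reference) / abs(reference) * 100.0


def within_limit(predicted: float, reference: float, limit: float = 20.0) -> bool:
    """True if the relative error does not exceed the acceptance limit (default 20%)."""
    return relative_error_percent(predicted, reference) <= limit


# Hypothetical (predicted, literature) logD pairs for illustration only.
pairs = [(1.80, 2.00), (0.95, 1.10), (2.60, 2.10)]
for predicted, reference in pairs:
    err = relative_error_percent(predicted, reference)
    print(f"{predicted:.2f} vs {reference:.2f}: {err:.1f}% ok={within_limit(predicted, reference)}")
```

One design caveat: a percentage criterion becomes very strict when the reference logD is close to zero, so an absolute error in log units may be a useful complement in such cases.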
Based on the multi-parameter models, the logD values of the 15 basic sample compounds, including aniline, imidazole, and triazine (No. 28-42 in Table 1), were predicted at pH 7.0, pH 8.0, pH 9.0, and pH 10.0. The determined logD results are given in Table 5. Notably, this is the first time that experimental logD values have been reported for triazines, as far as we know.
The investigated compounds, as well as their logP and pKa values as reported in the literature, are all listed in Table 1. All the compounds were obtained from commercial sources (AccuStandard, New Haven, CT, USA; TCI, Tokyo, Japan; Sinopharm Chemical Reagent Co., Ltd., Shanghai, China; J&K Scientific Co., Ltd., Beijing, China; Acros Organics, NJ, USA; Matrix Scientific, Columbia, SC, USA; Sigma-Aldrich, St. Louis, MO, USA), and all of them had a purity of 98% or higher. In the experiments, these compounds were divided into the following three groups: model compounds, verification compounds, and sample compounds. Stock solutions of each solute at concentrations of about 1.0 mg/mL were prepared in methanol and stored in a refrigerator at 4 °C before use.

Instruments and Equipment
A Waters 2695 Alliance separation module (Milford, MA, USA) consisting of a vacuum degasser, a quaternary pump, an auto-sampler, and a Waters 996 photodiode-array detector was employed to perform the HPLC experiments.
The pH was measured with a SevenMulti electrochemical analytical meter (Mettler-Toledo, Schwerzenbach, Switzerland). The electrode system was standardized with ordinary aqueous buffers of pH 4.01, pH 7.02, pH 9.26, and pH 11.00 at 25 °C. All pH readings were taken in buffer solution.

Chromatographic Condition
The chromatographic column used in the experiments was a Welch Xtimate® C18 (150 mm × 4.6 mm i.d., 5 µm, Welch Technology Co., Ltd., Shanghai, China) with an alkaline-resistant stationary phase and a usable pH range of 1.0-12.5. The flow rate was 1.0 mL/min and the column temperature was set at 30 °C. Methanol and 20 mmol/L ammonium phosphate buffer (pH = 7.0, 8.0, 9.0, or 10.0) were used as the mobile phase for isocratic elution. The injection volume was 10 µL and the detection wavelength for each eluted compound was set at its optimum absorption wavelength.

Experimental Methods
At each pH, the retention time (tR) of every analyte was determined for at least four different volume fractions (ϕ) of methanol. All the tR values of the solutes were obtained by averaging the results of at least three independent injections. The tR value was corrected using dual-point retention time correction (DP-RTC). For a detailed description of the process, refer to our previous work [23]. To cover the different hydrophobicities of the studied compounds, benzene, toluene, and benzyl alcohol were used as anchor compounds in the DP-RTC method. The dead time t0 was determined using uracil. For each solute, logk was plotted against ϕ, and logkw was obtained using Equation (S1).
The logD values of the model compounds were calculated using Equation (S3), and the logD-logkw models were established with Equation (S4). The statistical analysis for the regression model was accomplished using Origin 9.4.
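The extrapolation of logkw from isocratic measurements can be sketched as follows. This example assumes that Equation (S1) has the usual LSS form logk = logkw − S·ϕ and that the retention factor is k = (tR − t0)/t0; the DP-RTC correction is not reproduced here, and the function names, retention times, and dead time are hypothetical values chosen for illustration.

```python
import numpy as np


def retention_factor(t_r, t_0):
    """Retention factor k = (tR - t0) / t0."""
    return (np.asarray(t_r, dtype=float) - t_0) / t_0


def fit_lss(phi, log_k):
    """Fit logk = logkw - S * phi; return (logkw, S, R^2)."""
    phi = np.asarray(phi, dtype=float)
    log_k = np.asarray(log_k, dtype=float)
    slope, intercept = np.polyfit(phi, log_k, 1)  # highest power first
    predicted = slope * phi + intercept
    ss_res = np.sum((log_k - predicted) ** 2)
    ss_tot = np.sum((log_k - log_k.mean()) ** 2)
    return intercept, -slope, 1.0 - ss_res / ss_tot


# Hypothetical averaged retention times (min) at four methanol fractions.
t_0 = 1.50                            # dead time, e.g. from uracil
phi = [0.3, 0.4, 0.5, 0.6]
t_r = [20.38, 10.96, 6.24, 3.88]

log_k = np.log10(retention_factor(t_r, t_0))
log_kw, s, r2 = fit_lss(phi, log_k)
print(f"logkw = {log_kw:.2f}, S = {s:.2f}, R^2 = {r2:.4f}")
```

The intercept of the fit is the extrapolated logk at ϕ = 0, that is, logkw for a purely aqueous mobile phase, which is the quantity then correlated with logD.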

Conclusions
The retention behaviors of aniline, pyridine, imidazole, and triazine compounds were studied using IS-RPLC on a silica-based C18 column at pH 7.0-10.0. Multi-parameter QSRR models were established by introducing molecular structure parameters such as ne, A, and B for determining the logD values of basic compounds. The QSRR models showed strong robustness, good predictability, and wide pH applicability: with the established models, the logD values of ionized basic compounds could be accurately determined not only under strongly alkaline conditions, where dissociation is mostly suppressed, but also under weakly alkaline or even neutral conditions, where the solutes are in an ionized state. Moreover, we successfully predicted the logD values of 15 basic sample compounds using the developed multi-parameter QSRR models at pH 7.0, pH 8.0, pH 9.0, and pH 10.0. This work addressed a limitation of our previous work on the prediction of the logD values of basic compounds, providing an optional mild pH for experimental logD determination. Under relatively weakly alkaline conditions or at neutral pH, it is more convenient to adjust the pH of the mobile phase, and the silica gel matrix of the chromatographic column is better protected, which helps to extend the lifetime of silica-based columns and save money and time.