<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">ijms</journal-id>
<journal-title>International Journal of Molecular Sciences</journal-title>
<abbrev-journal-title>Int. J. Mol. Sci.</abbrev-journal-title>
<issn pub-type="epub">1422-0067</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/ijms10062558</article-id>
<article-id pub-id-type="publisher-id">ijms-10-02558</article-id>
<article-categories>
<subj-group>
<subject>Review</subject></subj-group></article-categories>
<title-group>
<article-title>QSPR Studies on Aqueous Solubilities of Drug-Like Compounds</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Duchowicz</surname><given-names>Pablo R.</given-names></name><xref ref-type="corresp" rid="c1-ijms-10-02558">*</xref></contrib>
<contrib contrib-type="author">
<name><surname>Castro</surname><given-names>Eduardo A.</given-names></name></contrib>
<aff id="af1-ijms-10-02558">Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas INIFTA (UNLP, CCT La Plata-CONICET), Diag. 113 y 64, C.C. 16, Suc.4, (1900) La Plata, Argentina; E-Mail:
<email>castro@quimica.unlp.edu.ar</email></aff></contrib-group>
<author-notes>
<corresp id="c1-ijms-10-02558">
<label>*</label>Author to whom correspondence should be addressed; E-Mail:
<email>pabloducho@gmail.com</email>; Tel. +54-221-425-7430; Fax: +54-221-425-4642</corresp></author-notes>
<pub-date pub-type="collection">
<month>6</month>
<year>2009</year></pub-date>
<pub-date pub-type="epub">
<day>3</day>
<month>6</month>
<year>2009</year></pub-date>
<volume>10</volume>
<issue>6</issue>
<fpage>2558</fpage>
<lpage>2577</lpage>
<history>
<date date-type="received">
<day>11</day>
<month>4</month>
<year>2009</year></date>
<date date-type="rev-recd">
<day>19</day>
<month>5</month>
<year>2009</year></date>
<date date-type="accepted">
<day>31</day>
<month>5</month>
<year>2009</year></date></history>
<permissions>
<copyright-statement>© 2009 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland.</copyright-statement>
<copyright-year>2009</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0">
<p>This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>A rapidly growing area of modern pharmaceutical research is the prediction of aqueous solubility of drug-sized compounds from their molecular structures. There exist many different reasons for considering this physico-chemical property as a key parameter: the design of novel entities with adequate aqueous solubility brings many advantages to preclinical and clinical research and development, allowing improvement of the Absorption, Distribution, Metabolization, and Elimination/Toxicity profile and “screenability” of drug candidates in High Throughput Screening techniques. This work compiles recent QSPR linear models established by our research group devoted to the quantification of aqueous solubilities and their comparison to previous research on the topic.</p></abstract>
<kwd-group>
<kwd>QSPR theory</kwd>
<kwd>aqueous solubility</kwd>
<kwd>ADME/Tox properties</kwd>
<kwd>Lipinski rules</kwd>
<kwd>molecular descriptors</kwd>
<kwd>replacement method</kwd>
<kwd>group contribution methods</kwd>
<kwd>high throughput screening techniques</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>Nowadays it is generally recognized that an ideal drug, besides being pharmacologically active, should additionally possess certain features regarding its bioavailability and its toxicological profile [<xref ref-type="bibr" rid="b1-ijms-10-02558">1</xref>–<xref ref-type="bibr" rid="b5-ijms-10-02558">5</xref>]. Absorption, Distribution, Metabolization, and Elimination/Toxicological (ADME/Tox) <italic>in silico</italic> filters constitute widely employed tools to determine whether it is probable or not for a drug candidate to reach its site of action or elicit toxic effects at its therapeutic dose. Moreover, modern approaches developed in the pharmaceutical industry for a rational molecular design have moved the ADME/Tox evaluations to the early stages of drug development, where an optimal activity of the compound is sought [<xref ref-type="bibr" rid="b6-ijms-10-02558">6</xref>].</p>
<p>The degree of absorption of a substance depends simultaneously on dose, solubility, and permeability, and the exploration of large databases containing orally bioavailable drugs led to the formulation of the widely-used Lipinski “rule of five” for compounds absorbed through the gastrointestinal barrier via passive diffusion [<xref ref-type="bibr" rid="b7-ijms-10-02558">7</xref>]. These simple rules state that oral bioavailability is likely if at least three of the following conditions are met: molecular weight below 500; no more than five hydrogen bond donors; no more than 10 hydrogen bond acceptors; and a calculated logarithm of the octanol/water partition coefficient of the compound (log P) below 5.</p>
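<p>The rule-of-five criterion described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class and function names are hypothetical, the acceptor limit is taken as “no more than 10” (as in Lipinski’s original formulation), and the property values are assumed to have been computed elsewhere.</p>
<preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Minimal property record; field names are illustrative assumptions."""
    mol_weight: float   # molecular weight, g/mol
    h_donors: int       # number of hydrogen bond donors
    h_acceptors: int    # number of hydrogen bond acceptors
    log_p: float        # calculated octanol/water log P


def lipinski_rules_obeyed(c: Compound) -> int:
    """Count how many of the four rule-of-five conditions are met."""
    checks = [
        c.mol_weight < 500,
        c.h_donors <= 5,
        c.h_acceptors <= 10,
        c.log_p < 5,
    ]
    return sum(checks)


def likely_orally_bioavailable(c: Compound) -> bool:
    """Apply the 'at least three conditions met' criterion."""
    return lipinski_rules_obeyed(c) >= 3
```
]]></preformat>
<p>A filter of this kind only flags compounds that are unlikely to be absorbed by passive diffusion; it does not estimate the solubility itself.</p>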
<p>The empirical conditions to satisfy Lipinski’s rule and display good oral bioavailability involve a balance between the aqueous solubility of a compound and its ability to diffuse passively through the different biological barriers. Aqueous solubility governs both the rate of dissolution of the compound and the maximum concentration reached in the gastrointestinal fluid. However, excessively polar compounds would be problematic at the stage of passing through the various biological barriers. Furthermore, it is known that aqueous solubility constitutes an important parameter in Medicinal Chemistry for the following reasons: soluble compounds are associated with shorter metabolization and elimination times, thus leading to a lower probability of adverse effects and bioaccumulation [<xref ref-type="bibr" rid="b1-ijms-10-02558">1</xref>,<xref ref-type="bibr" rid="b2-ijms-10-02558">2</xref>,<xref ref-type="bibr" rid="b8-ijms-10-02558">8</xref>], and most pre-clinical tests involve solubilization of the drug being tested in hydrophilic solvents [<xref ref-type="bibr" rid="b9-ijms-10-02558">9</xref>,<xref ref-type="bibr" rid="b10-ijms-10-02558">10</xref>]. Accurate activity measurements can be obtained only if the substance is sufficiently soluble (above the detection limit of the assay). Otherwise, an active compound may appear to be inactive due to insufficient solubility rather than inadequate potency [<xref ref-type="bibr" rid="b4-ijms-10-02558">4</xref>,<xref ref-type="bibr" rid="b5-ijms-10-02558">5</xref>].</p>
<p>The aqueous solubility of a given chemical entity can be obtained by experimental determination, although this usually presents some difficulties [<xref ref-type="bibr" rid="b2-ijms-10-02558">2</xref>,<xref ref-type="bibr" rid="b3-ijms-10-02558">3</xref>]. The traditional “shake flask” assay for measuring solubility is an equilibrium (thermodynamic) assay in which the solid is mixed vigorously with an aqueous buffer for a long period of time. This approach requires a fairly large amount of sample (1 – 2 mg) and is time-demanding (24 – 72 hours or more to do properly). Kinetic solubility measurements, in miniaturized methods such as Nephelometry [<xref ref-type="bibr" rid="b11-ijms-10-02558">11</xref>], require little starting material but depend on a reliable DMSO stock solution and multiple repeats to achieve accuracy. Furthermore, kinetic and thermodynamic solubility measurements are not interchangeable: they rely on fundamentally different physical properties to assess solid-state and solvation interactions and thus should be approached and interpreted with both caution and a detailed understanding of their strengths and limitations [<xref ref-type="bibr" rid="b12-ijms-10-02558">12</xref>]. Obviously, solubility cannot be measured when no sample of the compound is available, and the times required for these assays are not compatible with the new High Throughput Screening technologies.</p>
<p>This background explains the great interest of developing theoretical models to predict aqueous solubility directly from structure. Consequently, a high number of theoretical models have been proposed in the past to predict aqueous solubilities, ranging from the early studies of Amidon <italic>et al</italic>. in 1975 [<xref ref-type="bibr" rid="b9-ijms-10-02558">9</xref>] to several approaches including thermodynamic calculations, Group Contribution Methods and Quantitative Structure-Property Relationships (QSPR) [<xref ref-type="bibr" rid="b8-ijms-10-02558">8</xref>,<xref ref-type="bibr" rid="b13-ijms-10-02558">13</xref>–<xref ref-type="bibr" rid="b16-ijms-10-02558">16</xref>].</p></sec>
<sec sec-type="methods">
<label>2.</label>
<title>Some Different <italic>in silico</italic> Methods for Solubility Estimation</title>
<p>The simplest definition of solubility (<italic>S</italic>, mol·L<sup>−1</sup>) in a given solvent is the maximum amount of the most stable crystalline form of the compound that can remain in solution in a given volume of the solvent at a given temperature and pressure under thermodynamic equilibrium [<xref ref-type="bibr" rid="b12-ijms-10-02558">12</xref>]. This equilibrium balances the energy of the intermolecular interactions between solvent and solute molecules against the energy of the solvent-solvent and solute-solute interactions. For an ionizable compound, solubility without reference to pH and ionization constant pK<sub>a</sub> is meaningless, while for any compound under analysis the specific solid state (amorphous or crystalline) and the solvent(s) used are central for determining the solubility. It is also possible to distinguish different precise definitions of the term solubility [<xref ref-type="bibr" rid="b1-ijms-10-02558">1</xref>].</p>
<p>The interaction between water and drug has been intensively studied and is reviewed in ref. [<xref ref-type="bibr" rid="b2-ijms-10-02558">2</xref>]. A typically employed empirical method to estimate solubility is based on easily obtained measurements, combining log <italic>P</italic> and melting point (MP) data by using the “General Solubility Equation” (GSE) [<xref ref-type="bibr" rid="b17-ijms-10-02558">17</xref>–<xref ref-type="bibr" rid="b19-ijms-10-02558">19</xref>]. Surprisingly, despite its relative simplicity this equation has impressive accuracy as demonstrated in several studies [<xref ref-type="bibr" rid="b20-ijms-10-02558">20</xref>–<xref ref-type="bibr" rid="b22-ijms-10-02558">22</xref>], and this fact has led to the proposal of improved versions of the GSE model for fitting large data sets of compounds [<xref ref-type="bibr" rid="b23-ijms-10-02558">23</xref>–<xref ref-type="bibr" rid="b26-ijms-10-02558">26</xref>]. The log <italic>P</italic> parameter provides an estimate of the strength of the interaction of the compound with water; the most common log <italic>P</italic> estimation programs are fragment based and empirical, such as CLOGP (Daylight Chemical Information Systems) and ACD/logD (Advanced Chemistry Development, Inc.). The main drawback of this method appears for compounds having very high melting points (the sample decomposes before melting) or very low or very high log <italic>P</italic> values [<xref ref-type="bibr" rid="b15-ijms-10-02558">15</xref>,<xref ref-type="bibr" rid="b27-ijms-10-02558">27</xref>]. Other empirical methods have also been reported, although they share the common disadvantage of requiring the experimental measurement of some terms defined in the equation [<xref ref-type="bibr" rid="b28-ijms-10-02558">28</xref>,<xref ref-type="bibr" rid="b29-ijms-10-02558">29</xref>].</p>
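<p>As an illustration of how log <italic>P</italic> and melting point are combined, the sketch below implements the GSE in the form commonly attributed to Jain and Yalkowsky, log<sub>10</sub> <italic>S</italic> = 0.5 − 0.01 (MP − 25) − log <italic>P</italic>, with MP in °C and <italic>S</italic> in mol·L<sup>−1</sup>. This explicit form is not given in the text above and is an assumption here; the function name is hypothetical.</p>
<preformat><![CDATA[
```python
def gse_log_solubility(log_p: float, melting_point_c: float) -> float:
    """Estimate log10 of the molar aqueous solubility at 25 C with the GSE.

    Assumed form: log10 S = 0.5 - 0.01 * (MP - 25) - log P.
    """
    # Compounds that are liquid at 25 C receive no melting-point penalty.
    melting_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - melting_term - log_p
```
]]></preformat>
<p>Both inputs must be measured or estimated beforehand, which is the practical limitation of this family of methods.</p>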
<p>The energetics of a compound in water can be assessed through a model of solvation, by resorting to Molecular Simulation in a statistical thermodynamics approach. Jorgensen and Duffy [<xref ref-type="bibr" rid="b30-ijms-10-02558">30</xref>] employed Monte Carlo simulation with the solute embedded in a bath of rigid water molecules to derive cohesive properties that can be used to predict solubility. However, calculations of this sort are computationally demanding for each different solute. A completely different approach to simulation is Cellular Automata [<xref ref-type="bibr" rid="b31-ijms-10-02558">31</xref>], where solvent and solute are represented by cells on a grid while their movements are governed by their immediate neighbors and a set of transition rules. The occupancy patterns of the cells change at each step, and many steps are involved. This kind of simulation offers intriguing insights into the dissolution process, i.e. formation of mobile cavities within the solid solute, but is not as useful as Monte Carlo in quantitative work. An alternative to the simulation of a large ensemble of particles focuses on a single solute molecule that is modeled in more detail, based on electronic structure methods of Quantum Mechanics. Within this framework the solvent, which polarizes the solute and is itself polarized by it, can be approximated as a continuous dielectric (Cramer-Truhlar approach) [<xref ref-type="bibr" rid="b32-ijms-10-02558">32</xref>]. An alternative modeling of the solvent embeds both solute and solvent in a perfect conductor to calculate their polarization charge densities in the COSMO-RS (COSMOlogic GmbH and Co. KG) quantum chemical approach, leading to a chemical potential for the system from which the solubility can be estimated [<xref ref-type="bibr" rid="b33-ijms-10-02558">33</xref>].
However, Quantum Mechanics methods are much slower than Monte Carlo simulations and are unsuitable for the analysis of large datasets of compounds. <xref ref-type="table" rid="t1-ijms-10-02558">Table 1</xref> summarizes different classes of methods to predict aqueous solubility data [<xref ref-type="bibr" rid="b1-ijms-10-02558">1</xref>].</p>
<p>Among the different existing techniques for estimating different physical and thermodynamic data of interest, Group Contribution Methods (GCM) [<xref ref-type="bibr" rid="b34-ijms-10-02558">34</xref>–<xref ref-type="bibr" rid="b36-ijms-10-02558">36</xref>] are easy to apply, relying solely on the sum of contributions of each molecular structure fragment to the aqueous solubility. The basic assumption of this approach is the transferability concept for a group; if this hypothesis does not hold, then GCM can be corrected with experimental data when available to achieve better predictions. The methods proposed by Nirmalakhandan <italic>et al</italic>. [<xref ref-type="bibr" rid="b37-ijms-10-02558">37</xref>], Suzuki <italic>et al</italic>. [<xref ref-type="bibr" rid="b38-ijms-10-02558">38</xref>], Kuhne <italic>et al</italic>. [<xref ref-type="bibr" rid="b39-ijms-10-02558">39</xref>], Lee <italic>et al</italic>. [<xref ref-type="bibr" rid="b40-ijms-10-02558">40</xref>], and Klopman <italic>et al</italic>. [<xref ref-type="bibr" rid="b14-ijms-10-02558">14</xref>,<xref ref-type="bibr" rid="b41-ijms-10-02558">41</xref>] belong to this category. Among all these methods, only Klopman’s method is a pure and general group contribution model without using additional experimental parameters.</p>
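<p>The additive principle behind GCM can be sketched as follows. The fragment names and contribution values in this example are hypothetical placeholders, not fitted parameters from any of the cited methods; the only point illustrated is that the estimate is a count-weighted sum over fragments, and that a fragment without a tabulated contribution cannot be handled.</p>
<preformat><![CDATA[
```python
from typing import Dict

# Hypothetical fragment contributions to log10 S; NOT fitted values.
FRAGMENT_CONTRIBUTIONS: Dict[str, float] = {
    "CH3": -0.5,
    "CH2": -0.4,
    "OH": 1.2,
}


def gcm_log_solubility(fragment_counts: Dict[str, int]) -> float:
    """Sum count * contribution over all fragments of a molecule."""
    missing = [f for f in fragment_counts if f not in FRAGMENT_CONTRIBUTIONS]
    if missing:
        # No tabulated contribution: the additive estimate is undefined.
        raise KeyError(f"no contribution available for fragments: {missing}")
    return sum(
        count * FRAGMENT_CONTRIBUTIONS[fragment]
        for fragment, count in fragment_counts.items()
    )
```
]]></preformat>
<p>In practice the contribution table would be obtained by regression against a large set of measured solubilities.</p>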
<p>Although GCM have a simple and practical implementation, some common drawbacks of this methodology are the following: a) they require a large data set to obtain the contribution of each functional group; b) in their basic form (without corrections) they cannot distinguish isomeric structures; c) they may suffer from a “missing fragment” problem, which means that if a compound contains a fragment that is not defined in the group contribution model, its aqueous solubility cannot be precisely predicted; d) measured data are not always available to extend these methods to unusual compounds such as molecules containing fused aromatic rings or organometallic compounds. Since the aqueous solubility estimated by GCM involves the transfer of the compound from the solid phase to a new (liquid) one, it is harder to separate the contributions of individual parts of the molecule to the whole process. Nevertheless, GCM is a fast method for estimating aqueous solubility on large data sets of compounds and can produce reasonably accurate results.</p></sec>
<sec>
<label>3.</label>
<title>Predicting Solubility through Linear Regression Based QSPR-QSAR</title>
<p>Within the Quantitative Structure Property-Activity Relationships theory (QSPR-QSAR), a physicochemical or biological property of a compound is assumed to be a unique consequence of its molecular structure [<xref ref-type="bibr" rid="b42-ijms-10-02558">42</xref>–<xref ref-type="bibr" rid="b44-ijms-10-02558">44</xref>]. Therefore, a model is employed to predict the property by means of structural descriptors, numerical variables that capture different constitutional, topological, geometrical or electronic characteristics of the molecular structure under consideration. These molecular descriptors can be readily calculated through mathematical formulae obtained from several theories, such as Chemical Graph Theory, Information Theory, Quantum Mechanics, etc. [<xref ref-type="bibr" rid="b45-ijms-10-02558">45</xref>,<xref ref-type="bibr" rid="b46-ijms-10-02558">46</xref>]. The hypotheses involved in QSPR-QSAR analyses have proven in the past to work quite well for a wide spectrum of properties/activities of interest.</p>
<p>QSPR-QSAR models enable property estimation for substances that have not yet been tested for different reasons, such as instability, toxicity, or simply because their measurement requires too much time. In terms of economy, these studies allow the rational use of the resources available in the laboratory or even a plant, avoiding expensive and unnecessary experimental determinations. With respect to their ethical aspects, QSPR-QSAR analyses applied to Toxicology have achieved great importance in the virtual screening of the toxic potential of compounds before their synthesis [<xref ref-type="bibr" rid="b47-ijms-10-02558">47</xref>], and thus represent an effective alternative that reduces animal testing in biological assays. In drug discovery, both the prediction with QSAR-QSPR of ADMET properties [<xref ref-type="bibr" rid="b48-ijms-10-02558">48</xref>] and the oral bioavailability of compounds [<xref ref-type="bibr" rid="b49-ijms-10-02558">49</xref>,<xref ref-type="bibr" rid="b50-ijms-10-02558">50</xref>] have been addressed. Finally, from the theoretical point of view, the model can illuminate the mechanisms of physicochemical properties or biological activities of the compounds.</p>
<p>It is well known that a single descriptor is unable to carry all the structural information of a molecule, and one has to search, among the more than a thousand descriptors available in the literature, for those that are most representative of the particular modeled property [<xref ref-type="bibr" rid="b51-ijms-10-02558">51</xref>–<xref ref-type="bibr" rid="b53-ijms-10-02558">53</xref>]. Various standard statistical methods are common practice in QSPR-QSAR model design. Linear approaches include Multivariable Linear Regression (MLR) [<xref ref-type="bibr" rid="b54-ijms-10-02558">54</xref>], Principal Component Analysis (PCA) [<xref ref-type="bibr" rid="b55-ijms-10-02558">55</xref>], Genetic Algorithms [<xref ref-type="bibr" rid="b56-ijms-10-02558">56</xref>] and the Replacement Method [<xref ref-type="bibr" rid="b57-ijms-10-02558">57</xref>]; non-linear approaches include Artificial Neural Networks (ANN) [<xref ref-type="bibr" rid="b58-ijms-10-02558">58</xref>] and Support Vector Machines [<xref ref-type="bibr" rid="b59-ijms-10-02558">59</xref>]. The main advantage of linear models over non-linear ones is that the former suffer less from the over-fitting (over-training) problem [<xref ref-type="bibr" rid="b60-ijms-10-02558">60</xref>,<xref ref-type="bibr" rid="b61-ijms-10-02558">61</xref>]; they are also more general and transparently reveal the effect of the structural variables present in the model upon the property being modeled, thus making it possible to suggest cause/effect relationships.</p></sec>
<sec>
<label>4.</label>
<title>The Proposal of Descriptors Based on Lipinski Rules for Modeling Aqueous Solubilities</title>
<p>One of our recent QSPR studies analyzing aqueous solubilities employs MLR for establishing the connection between the solubility values of 148 heterogeneous organic chemicals and their molecular structure, represented through a new set of physically interpretable descriptors [<xref ref-type="bibr" rid="b62-ijms-10-02558">62</xref>]. The correct representation of the molecular structure of drug-like compounds through molecular descriptors is of crucial importance in every QSPR-QSAR study. The set of descriptors introduced here combines in a single number several of the parameters described by the Lipinski rules [<xref ref-type="bibr" rid="b7-ijms-10-02558">7</xref>]. The proposed Lipinski-based descriptors are combinations of the detour index (<italic>dd</italic>) from Chemical Graph Theory (derived as the half-sum of the elements of the Detour Matrix - <italic>DD</italic>) [<xref ref-type="bibr" rid="b63-ijms-10-02558">63</xref>] with molecular features such as the number of H donors (<italic>D</italic>), the number of H acceptors (<italic>A</italic>) and the number of heteroatoms (<italic>H</italic>) present in the structure:
<disp-formula id="FD1">
<label>(1)</label>
<mml:math display="block">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mi> </mml:mi>
<mml:mo>/</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>D</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>0.1</mml:mn></mml:mrow></mml:mfrac>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi>D</mml:mi>
<mml:mi> </mml:mi>
<mml:mo>/</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>A</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi></mml:mrow>
<mml:mi>A</mml:mi></mml:mfrac>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi>D</mml:mi>
<mml:mi> </mml:mi>
<mml:mo>/</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>B</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>D</mml:mi></mml:mrow></mml:mfrac>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi>D</mml:mi>
<mml:mi> </mml:mi>
<mml:mo>/</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>H</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi></mml:mrow>
<mml:mi>H</mml:mi></mml:mfrac></mml:mrow></mml:math></disp-formula>where the 0.1 term in the <italic>D/D</italic> definition is introduced only to prevent dividing by zero, considering that several of the studied compounds do not have any H donor functional group.</p>
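<p>The four definitions in Equation (1) can be written out as a short Python sketch. The function and key names are hypothetical, the detour index <italic>dd</italic> is assumed to have been computed beforehand, and mapping a zero denominator to infinity (rather than raising an error) is an assumption made only for this illustration.</p>
<preformat><![CDATA[
```python
import math
from typing import Dict


def lipinski_descriptors(dd: float, donors: int, acceptors: int,
                         heteroatoms: int) -> Dict[str, float]:
    """Return D/D, D/A, D/B and D/H as defined in Equation (1)."""

    def ratio(denominator: float) -> float:
        # Assumption: a zero count maps to +infinity instead of raising.
        return dd / denominator if denominator != 0 else math.inf

    return {
        "D/D": dd / (donors + 0.1),        # 0.1 prevents division by zero
        "D/A": ratio(acceptors),
        "D/B": ratio(acceptors + donors),  # B = A + D
        "D/H": ratio(heteroatoms),
    }
```
]]></preformat>
<p>For example, a structure with <italic>dd</italic> = 100, one H donor, two H acceptors and three heteroatoms gives <italic>D/A</italic> = 50 and <italic>D/D</italic> = 100/1.1.</p>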
<p>The above descriptor definitions take into consideration many literature reports which demonstrate linear, polynomial and exponential correlations between <italic>dd</italic> and the boiling point of alkanes, cycloalkanes and aromatic compounds [<xref ref-type="bibr" rid="b64-ijms-10-02558">64</xref>–<xref ref-type="bibr" rid="b68-ijms-10-02558">68</xref>]. Since the boiling point of compounds from homologous series usually correlates well with molecular weight (MW), we investigated the relationship between <italic>dd</italic> and the MWs of the 148 compounds used for the present study. Inspection of the correlation between <italic>dd</italic> and MW prompted us to explore possible relationships between the square and cubic roots of <italic>dd</italic> and the MW. The cubic root of <italic>dd</italic>, in the first place, and the square root of <italic>dd</italic>, in the second, display considerably better linear correlations with the molecular weight of the 148 structures (<italic>R</italic> = 0.932 and <italic>R</italic> = 0.918, in that order). This indicates a very good correlation, especially given the structural diversity of the dataset.</p>
<p>It is clear then that the Detour Index may be an appropriate descriptor to explain the differences in the aqueous solubility values that could be explained through the molecular weight of compounds. It can also characterize other molecular properties such as the degree of branching and cyclization. However, there are many examples of compounds that, although sharing the same graph and therefore the same <italic>dd</italic> value, have very different solubilities because of the other three parameters included in Lipinski’s rule (number of H donors and acceptors and log <italic>P</italic>). To address this issue we have included <italic>A</italic>, <italic>B</italic> (= <italic>A</italic> + <italic>D</italic>), and <italic>H</italic> in the new descriptor’s definition. We also considered the square and cubic roots of the four descriptors above (<italic>D</italic>/<italic>D</italic><sup>1/2</sup>, <italic>D</italic>/<italic>D</italic><sup>1/3</sup>, <italic>D</italic>/<italic>A</italic><sup>1/2</sup>, <italic>D</italic>/<italic>A</italic><sup>1/3</sup>, <italic>D</italic>/<italic>B</italic><sup>1/2</sup>, <italic>D</italic>/<italic>B</italic><sup>1/3</sup>, <italic>D</italic>/<italic>H</italic><sup>1/2</sup>, and <italic>D</italic>/<italic>H</italic><sup>1/3</sup>), based on the better correlation between the square and cubic roots of <italic>dd</italic> and MW compared to that between <italic>dd</italic> and MW. The physicochemical meaning of these descriptors is immediate. MW is directly correlated with <italic>dd</italic>, and the solubility tends to decrease, in homologous series, when MW increases. The more H donors and acceptors present in the molecule, the more water soluble the compound will be. If no H donor or acceptor is present in the molecule, the water solubility would be jeopardized or even non-existent (as is the case of alkanes).
Therefore, the defined descriptors will take high values in compounds with slight aqueous solubility, while they will tend to infinity in non-soluble compounds.</p>
<p>We proceeded to search for a QSPR solubility model that minimizes the <italic>S</italic> parameter subject to the condition of combining at least one of the proposed molecular descriptors reflecting the Lipinski rules with those calculated with the Dragon software [<xref ref-type="bibr" rid="b69-ijms-10-02558">69</xref>]. The application of the Replacement Method (RM) variable subset selection technique [<xref ref-type="bibr" rid="b57-ijms-10-02558">57</xref>,<xref ref-type="bibr" rid="b70-ijms-10-02558">70</xref>,<xref ref-type="bibr" rid="b71-ijms-10-02558">71</xref>] to the available pool of <italic>D</italic> = 1,367 descriptors leads to an optimal relationship for 100 training compounds that, in terms of the best predictive power of the equation measured via the calibration and the <italic>l-n%-o</italic> parameters [<xref ref-type="bibr" rid="b72-ijms-10-02558">72</xref>] and the least number of variables involved, contains six molecular descriptors of different types:
<disp-formula id="FD2">
<label>(2)</label>
<mml:math display="block">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mtext>log</mml:mtext>
<mml:mrow>
<mml:mn>10</mml:mn></mml:mrow></mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>2.786</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.3</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mo>+</mml:mo>
<mml:mi> </mml:mi>
<mml:mn>0.0479</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.02</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mi mathvariant="italic">RDF040e</mml:mi>
<mml:mi> </mml:mi>
<mml:mo>+</mml:mo>
<mml:mi> </mml:mi>
<mml:mn>0.285</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.07</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>C</mml:mi>
<mml:mo>-</mml:mo>
<mml:mn>006</mml:mn>
<mml:mo>−</mml:mo>
<mml:mn>5.639</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.7</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi>H</mml:mi>
<mml:mn>3</mml:mn>
<mml:mi>p</mml:mi>
<mml:mi> </mml:mi>
<mml:mo>+</mml:mo>
<mml:mi> </mml:mi>
<mml:mn>0.00389</mml:mn>
<mml:mi> </mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.001</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>D</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi>A</mml:mi>
<mml:mi> </mml:mi>
<mml:mo>−</mml:mo>
<mml:mi> </mml:mi>
<mml:mn>0.231</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.04</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>D</mml:mi>
<mml:mo>/</mml:mo>
<mml:msup>
<mml:mi>B</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn></mml:mrow></mml:msup>
<mml:mi> </mml:mi>
<mml:mo>+</mml:mo>
<mml:mi> </mml:mi>
<mml:mn>0.00988</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.002</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi>Q</mml:mi>
<mml:mi>X</mml:mi>
<mml:mi>X</mml:mi>
<mml:mi>e</mml:mi></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi>N</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>100</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>R</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0.880</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>S</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0.858</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>F</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>53.091</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>p</mml:mi>
<mml:mi> </mml:mi>
<mml:mo>&lt;</mml:mo>
<mml:mi> </mml:mi>
<mml:msup>
<mml:mn>10</mml:mn>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>4</mml:mn></mml:mrow></mml:msup>
<mml:mo>,</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.853</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.911</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>10</mml:mn>
<mml:mo>%</mml:mo>
<mml:mo>-</mml:mo>
<mml:mi>o</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.820</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>10</mml:mn>
<mml:mo>%</mml:mo>
<mml:mo>-</mml:mo>
<mml:mi>o</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1.006.</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>where the absolute errors of the regression coefficients are given in parentheses, <italic>N</italic> is the number of training compounds, <italic>R</italic> is the correlation coefficient, <italic>S</italic> is the standard deviation of the model, <italic>F</italic> is the Fisher ratio and <italic>p</italic> is the significance of the model; the subscripts <italic>loo</italic> and <italic>l-10%-o</italic> denote leave-one-out and leave-10%-out cross-validation, respectively. Quite good estimations can be achieved with this QSPR model in many cases, considering the heterogeneous nature of the training set of molecules extracted from the 13<sup>th</sup> edition of the Merck Index [<xref ref-type="bibr" rid="b73-ijms-10-02558">73</xref>]. About 99% of these compounds are “drug-like”, satisfying Lipinski’s rule.</p>
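<p>Applying Equation (2) amounts to a weighted sum of six descriptor values plus an intercept. The sketch below copies the coefficients from Equation (2); the function and dictionary names are hypothetical, and the descriptor values are assumed to be supplied by the user (for instance from Dragon and from the Lipinski-based definitions), with <italic>D</italic>/<italic>B</italic><sup>1/2</sup> passed under the key "D/B^1/2".</p>
<preformat><![CDATA[
```python
from typing import Dict

INTERCEPT = 2.786

# Regression coefficients of Equation (2), keyed by descriptor name.
COEFFICIENTS: Dict[str, float] = {
    "RDF040e": 0.0479,
    "C-006": 0.285,
    "H3p": -5.639,
    "D/A": 0.00389,
    "D/B^1/2": -0.231,
    "QXXe": 0.00988,
}


def predict_log_sol(descriptors: Dict[str, float]) -> float:
    """Evaluate Equation (2) for one compound.

    Raises KeyError if any of the six descriptors is missing.
    """
    return INTERCEPT + sum(
        coefficient * descriptors[name]
        for name, coefficient in COEFFICIENTS.items()
    )
```
]]></preformat>
<p>The uncertainties quoted in parentheses in Equation (2) are not propagated in this sketch.</p>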
<p>Equation (2) involves molecular descriptors that can be classified as follows: two of the proposed absorption-based descriptors: <italic>D</italic>/<italic>A</italic> and <italic>D</italic>/<italic>B</italic><sup>1/2</sup>; a Radial Distribution Function (RDF): <italic>RDF040e</italic>, RDF-4.0/weighted by atomic Sanderson electronegativities [<xref ref-type="bibr" rid="b74-ijms-10-02558">74</xref>]; a GETAWAY descriptor: <italic>H3p</italic>, H autocorrelation of lag 3/weighted by atomic polarizabilities [<xref ref-type="bibr" rid="b75-ijms-10-02558">75</xref>]; an Atom-Centred Fragment: <italic>C-006</italic>, the number of CH<sub>2</sub>RX functional groups [X: heteroatom (O, N, S, P, Se or halogens), R: any group linked through carbon] [<xref ref-type="bibr" rid="b76-ijms-10-02558">76</xref>]; and a geometrical descriptor: <italic>QXXe</italic>, Qxx COMMA2 value/weighted by atomic Sanderson electronegativities [<xref ref-type="bibr" rid="b77-ijms-10-02558">77</xref>]. The next step in the analysis was to further validate the predictive power of this QSPR solubility model by predicting the log(<italic>Sol</italic>) values of a test set of 48 organic compounds, which showed that good estimations can be achieved in many situations.</p></sec>
<sec sec-type="methods">
<label>5.</label>
<title>A QSPR Designed upon a Balanced Aqueous Solubility Data Set</title>
<p>It has been pointed out that solubility modeling efforts suffer from several basic problems, among them: training sets that are not drug-like, lack of structural diversity, unknown experimental error, incorrect tautomers or structures, neglect of ionization and crystal packing effects, over-sampling of compounds with low molecular weight, and a range of solubility data that is not pharmaceutically relevant [<xref ref-type="bibr" rid="b2-ijms-10-02558">2</xref>,<xref ref-type="bibr" rid="b4-ijms-10-02558">4</xref>]. Another study conducted by our research group [<xref ref-type="bibr" rid="b78-ijms-10-02558">78</xref>] addresses some of these issues, since it is developed from a structurally diverse training set composed of drug-like compounds, with more than half of the dataset presenting solubility values below 1 mg·mL<sup>−1</sup>. Low-solubility compounds are precisely the ones that need to be predicted accurately, since they are more likely to present difficulties in preclinical and clinical assays and in the formulation stages. The QSPR Theory was therefore employed to analyze the aqueous solubility exhibited at 298 K by 145 diverse drug-like organic compounds. The molecular set was split into a 97-compound training set (train) and a 48-compound test set (val), selecting the members of each set so that the two sets share similar structural characteristics. Additionally, an external molecular set (test set 21), which was not involved in the model design and is composed of 21 well-known compounds found in many solubility prediction papers [<xref ref-type="bibr" rid="b2-ijms-10-02558">2</xref>,<xref ref-type="bibr" rid="b14-ijms-10-02558">14</xref>], was employed to further validate the model.</p>
<p>In this work, most of the drugs that comprise the training and test sets meet several drug-likeness criteria. More than 99% of the data set satisfies the Lipinski rule criteria for estimating oral bioavailability [<xref ref-type="bibr" rid="b7-ijms-10-02558">7</xref>], while more than 93% fulfills the Veber <italic>et al</italic>. rule [<xref ref-type="bibr" rid="b79-ijms-10-02558">79</xref>]. More than 99% of the dataset also meets more general drug-likeness criteria extracted from several recent publications [<xref ref-type="bibr" rid="b80-ijms-10-02558">80</xref>–<xref ref-type="bibr" rid="b82-ijms-10-02558">82</xref>]: 100 ≤ molecular weight ≤ 800 g·mol<sup>−1</sup>; log <italic>P</italic> ≤ 7; number of H bond acceptors ≤ 10; number of H bond donors ≤ 5; rotatable bonds ≤ 15; halogen atoms ≤ 7; alkyl chains ≤ (CH<sub>2</sub>)<sub>6</sub>CH<sub>3</sub>; no perfluorinated chains: CF<sub>2</sub>CF<sub>2</sub>CF<sub>3</sub>; no large rings (i.e., with more than seven members); no atoms other than C, O, N, S, P, F, Cl, Br, I, Na, K, Mg, Ca or Li; and presence of at least one N or O atom. Moreover, low molecular weight compounds are not over-represented in this molecular set. All the molecular structures are drawn in <xref ref-type="fig" rid="f1-ijms-10-02558">Figure 1</xref>.</p>
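<p>The numeric part of the general drug-likeness criteria listed above can be written as a simple filter. The following Python sketch is only illustrative: the function name and the dictionary keys (mol_weight, log_p and so on) are hypothetical, the property values are assumed to have been computed beforehand with some cheminformatics package, and the sub-structural criteria (alkyl chains, perfluorinated chains, ring size) are not covered. Only the thresholds are taken from the text.</p>
<preformat>
```python
# Heavy (non-hydrogen) elements allowed by the criteria quoted in the text.
ALLOWED_ELEMENTS = {"C", "O", "N", "S", "P", "F", "Cl", "Br", "I",
                    "Na", "K", "Mg", "Ca", "Li"}


def is_drug_like(props):
    """Return True if a compound meets the numeric criteria quoted in the text.

    `props` is a hypothetical dict of precomputed properties; `elements`
    lists the non-hydrogen element symbols present in the molecule.
    """
    elements = set(props["elements"])
    checks = [
        800 >= props["mol_weight"] >= 100,        # g/mol
        7 >= props["log_p"],
        10 >= props["h_acceptors"],
        5 >= props["h_donors"],
        15 >= props["rotatable_bonds"],
        7 >= props["halogen_atoms"],
        elements.issubset(ALLOWED_ELEMENTS),      # no other atom types
        bool(elements.intersection({"N", "O"})),  # at least one N or O
    ]
    return all(checks)


# Made-up property values, used only to exercise the filter.
candidate = {
    "mol_weight": 180.2, "log_p": 1.2, "h_acceptors": 4, "h_donors": 1,
    "rotatable_bonds": 3, "halogen_atoms": 0, "elements": ["C", "O"],
}
print(is_drug_like(candidate))
```
</preformat>
<p>Applying such a filter to every member of a data set and counting the compounds that pass gives the kind of percentage reported above.</p>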
<p>The structural diversity of the training set is assessed by calculating the average Tanimoto intermolecular distance (based on atom pairs) over all possible pairs of structures in the training set. For this purpose, the PowerMV software provided by the National Institute of Statistical Sciences was used [<xref ref-type="bibr" rid="b83-ijms-10-02558">83</xref>]. According to the results, the average Tanimoto intermolecular distance for the training set is 0.781, with a standard deviation (<italic>S</italic>) of 0.412, which confirms the high structural diversity of the training set. <xref ref-type="fig" rid="f2-ijms-10-02558">Figure 2</xref> shows a histogram of the distribution of the 166 aqueous solubilities under study, which suggests that the experimental sample is normally distributed over more than four logarithmic units and can thus be employed in regression analysis.</p>
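<p>The average pairwise Tanimoto distance used above as a diversity measure can be sketched in a few lines of Python. This is not the PowerMV implementation: here each fingerprint is assumed to be a plain set of integer feature identifiers, the function names are hypothetical, and the three toy fingerprints are invented for illustration.</p>
<preformat>
```python
from itertools import combinations
from statistics import mean, stdev


def tanimoto_distance(fp_a, fp_b):
    """Tanimoto distance, 1 - |A and B| / |A or B|, for two feature sets."""
    union = fp_a.union(fp_b)
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return 1.0 - len(fp_a.intersection(fp_b)) / len(union)


def pairwise_distance_stats(fingerprints):
    """Mean and sample standard deviation over all unordered pairs."""
    distances = [tanimoto_distance(a, b)
                 for a, b in combinations(fingerprints, 2)]
    return mean(distances), stdev(distances)


# Three invented fingerprints: the first two share features, the third does not.
toy_fingerprints = [{1, 2, 3}, {2, 3, 4}, {5, 6}]
average, spread = pairwise_distance_stats(toy_fingerprints)
print(round(average, 3), round(spread, 3))
```
</preformat>
<p>A distance of 0 means identical fingerprints and 1 means no shared features, so an average close to 1 indicates a structurally diverse set.</p>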
<p>The initial conformations of the drug compounds are obtained by means of the “model build” module of the HyperChem package [<xref ref-type="bibr" rid="b84-ijms-10-02558">84</xref>]. The structures are first pre-optimized with the Molecular Mechanics Force Field (MM+) procedure included in HyperChem, and the resulting geometries are further refined by means of the semi-empirical method PM3. More than a thousand DRAGON [<xref ref-type="bibr" rid="b69-ijms-10-02558">69</xref>] theoretical descriptors of all classes are explored simultaneously by means of the linear variable subset selection approach known as the Replacement Method (RM) [<xref ref-type="bibr" rid="b57-ijms-10-02558">57</xref>, <xref ref-type="bibr" rid="b70-ijms-10-02558">70</xref>, <xref ref-type="bibr" rid="b71-ijms-10-02558">71</xref>]. The application of the RM to the training set of 97 heterogeneous drugs leads to the following satisfactory three-descriptor relationship:
<disp-formula id="FD3">
<label>(3)</label>
<mml:math display="block">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mtext>log</mml:mtext>
<mml:mrow>
<mml:mn>10</mml:mn></mml:mrow></mml:msub>
<mml:mi>S</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mn>0.435</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.03</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo> </mml:mo>
<mml:mo>·</mml:mo>
<mml:mo> </mml:mo>
<mml:mo>Ω</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>X</mml:mi>
<mml:mn>1</mml:mn>
<mml:mi>s</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:mn>0.503</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.06</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>·</mml:mo>
<mml:mo>Ω</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>M</mml:mi>
<mml:mi>L</mml:mi>
<mml:mi>O</mml:mi>
<mml:mi>G</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mi> </mml:mi>
<mml:mn>0.0767</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.01</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>⋅</mml:mo>
<mml:mo>Ω</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>R</mml:mi>
<mml:mi>D</mml:mi>
<mml:mi>F</mml:mi>
<mml:mn>060</mml:mn>
<mml:mi>u</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mo>+</mml:mo>
<mml:mi> </mml:mi>
<mml:mn>2.970</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.3</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>97</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi></mml:mrow></mml:msub>
<mml:mo>/</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>32.333</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>R</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0.871</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>S</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0.903</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.849</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.971</mml:mn>
<mml:mo>,</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>10</mml:mn>
<mml:mo>%</mml:mo>
<mml:mo>-</mml:mo>
<mml:mi>o</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.809</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>10</mml:mn>
<mml:mo>%</mml:mo>
<mml:mo>-</mml:mo>
<mml:mi>o</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1.090</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:mi>p</mml:mi>
<mml:mi> </mml:mi>
<mml:mo>&lt;</mml:mo>
<mml:mi> </mml:mi>
<mml:msup>
<mml:mn>10</mml:mn>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>4</mml:mn></mml:mrow></mml:msup>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>48</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.848</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi> </mml:mi>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mi>v</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.899</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
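<p>As an illustration of how Equation (3) is applied, the following Python sketch evaluates the linear relationship for one compound. The coefficients and the intercept are those of Equation (3); the function name and the descriptor values are hypothetical and are not taken from the data set. The inputs are assumed to be the already orthogonalized descriptors Ω(<italic>X1sol</italic>), Ω(<italic>MLOGP</italic>) and Ω(<italic>RDF060u</italic>).</p>
<preformat>
```python
# Regression coefficients and intercept of Eq. (3).
COEFFICIENTS = {"X1sol": -0.435, "MLOGP": -0.503, "RDF060u": 0.0767}
INTERCEPT = 2.970


def predict_log_sol(omega):
    """Evaluate Eq. (3) for a dict of orthogonalized descriptor values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * omega[name]
                           for name in COEFFICIENTS)


# Hypothetical orthogonalized descriptor values (not from the paper).
example = {"X1sol": 4.0, "MLOGP": 1.0, "RDF060u": 10.0}
print(round(predict_log_sol(example), 3))  # 1.494

# Ratio of observations to descriptors quoted with Eq. (3).
print(round(97 / 3, 3))  # 32.333
```
</preformat>
<p>The negative coefficients of <italic>X1sol</italic> and <italic>MLOGP</italic> mean that larger values of these two orthogonalized descriptors lower the predicted log<sub>10</sub><italic>Sol</italic>.</p>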
<p>The derived QSPR does not incorporate redundant structural information, as it involves orthogonal descriptors [<xref ref-type="bibr" rid="b85-ijms-10-02558">85</xref>]. This model has two calibration outliers with a residual exceeding 2<italic>S</italic> = 1.806: compounds <bold>15</bold> (acibenzolar-<italic>S</italic>-methyl, 1.902) and <bold>91</bold> (etofenprox, −2.545), while none of the training compounds exceeds 3<italic>S</italic> = 2.709. The presence of these outliers may be attributed to the limited number of structural descriptors participating in <xref ref-type="disp-formula" rid="FD3">Eq. (3)</xref>, since this model has a high ratio of number of observations to number of parameters (<italic>N</italic>/<italic>d</italic> = 32.333).</p>
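<p>The 2<italic>S</italic> and 3<italic>S</italic> bands used above to flag calibration outliers follow directly from <italic>S</italic> = 0.903. A short Python check, in which the function name and the labels are hypothetical and the two residuals are those quoted in the text, reproduces the classification:</p>
<preformat>
```python
S = 0.903  # calibration standard deviation of Eq. (3)


def classify_residual(residual, s=S):
    """Label a residual according to the 2S and 3S bands."""
    size = abs(residual)
    if size > 3 * s:
        return "beyond 3S"
    if size > 2 * s:
        return "outlier (beyond 2S)"
    return "within 2S"


print(round(2 * S, 3), round(3 * S, 3))  # 1.806 2.709

# Residuals of the two compounds named in the text.
residuals = {"15 (acibenzolar-S-methyl)": 1.902, "91 (etofenprox)": -2.545}
for compound, residual in residuals.items():
    print(compound, classify_residual(residual))
```
</preformat>
<p>Both residuals fall between 2<italic>S</italic> and 3<italic>S</italic>, which is why the two compounds are reported as outliers although neither exceeds the 3<italic>S</italic> limit.</p>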
<p>The predictive power of the QSPR is satisfactory, as revealed by its stability upon the inclusion or exclusion of compounds, measured by the leave-one-out (<italic>loo</italic>) parameters <italic>R<sub>loo</sub></italic> = 0.849 and <italic>S<sub>loo</sub></italic> = 0.971, and by the more severe test of excluding a higher percentage of compounds, <italic>R<sub>l-10%-o</sub></italic> = 0.809 and <italic>S<sub>l-10%-o</sub></italic> = 1.090. These results are in the range of a validated model: <italic>R<sub>l-n%-o</sub></italic> must be greater than 0.50, according to the specialized literature [<xref ref-type="bibr" rid="b86-ijms-10-02558">86</xref>]. Furthermore, the predictive capability of the equation is demonstrated by its performance on the test set val, leading to <italic>R<sub>val</sub></italic> = 0.848 and <italic>S<sub>val</sub></italic> = 0.899. Finally, after analyzing 5,000,000 cases of y-randomization [<xref ref-type="bibr" rid="b87-ijms-10-02558">87</xref>], the smallest <italic>S</italic> value obtained with this procedure was 1.650, poorer than the one found for the true calibration (<italic>S</italic> = 0.903). This supports the robustness of the model, showing that the calibration is not a fortuitous correlation and therefore reflects a genuine structure-property relationship.</p>
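<p>The two internal validation ideas mentioned above, leave-one-out cross-validation and y-randomization, can be sketched as follows. This is a deliberately simplified illustration and not the procedure used for Equation (3): it fits a single-descriptor straight line instead of a three-descriptor model, it uses a plain root mean square residual rather than the standard deviation <italic>S</italic>, and the data, function names and number of trials are invented for the example.</p>
<preformat>
```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rms_residual(x, y):
    """Root mean square residual of the line fitted to all points."""
    a, b = fit_line(x, y)
    return math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / len(x))


def loo_rms(x, y):
    """Leave-one-out: refit without point i, then predict point i."""
    errors = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append(y[i] - (a + b * x[i]))
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def y_randomization(x, y, trials=200, seed=0):
    """Smallest rms residual found after shuffling the property values."""
    rng = random.Random(seed)
    shuffled = list(y)
    best = float("inf")
    for _ in range(trials):
        rng.shuffle(shuffled)
        best = min(best, rms_residual(x, shuffled))
    return best


# Invented data: a linear trend plus small fixed perturbations.
noise = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03,
         -0.02, 0.04, -0.01, -0.05, 0.02, -0.02]
x = [float(i) for i in range(12)]
y = [2.0 - 0.5 * xi + ni for xi, ni in zip(x, noise)]

print("calibration rms:", rms_residual(x, y))
print("leave-one-out rms:", loo_rms(x, y))
print("best randomized rms:", y_randomization(x, y))
```
</preformat>
<p>For a genuine relationship the leave-one-out error stays close to the calibration error, while the best fit obtained on scrambled property values remains clearly worse, which is the pattern reported above for Equation (3).</p>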
<p>As can be appreciated from the derived QSPR, descriptors of different definitions are needed to represent correctly the structures of the heterogeneous drug-like compounds. After a proper standardization [<xref ref-type="bibr" rid="b88-ijms-10-02558">88</xref>] of the orthogonal descriptors present in <xref ref-type="disp-formula" rid="FD3">Equation (3)</xref>, greater importance can be assigned to those variables that exhibit larger absolute standardized coefficients. The most important structural factor of the model is the topological descriptor <italic>X1sol</italic>, the solvation connectivity index chi-1 proposed by Zefirov and Palyulin in 1991 [<xref ref-type="bibr" rid="b89-ijms-10-02558">89</xref>]. It has the following general formula when calculated for hydrogen- and fluorine-depleted molecular graphs:
<disp-formula id="FD4">
<label>(4)</label>
<mml:math display="block">
<mml:mi>X</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:msup>
<mml:mn>2</mml:mn>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msup>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>Z</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:msub>
<mml:mi>Z</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>Z</mml:mi>
<mml:mi>k</mml:mi></mml:msub></mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>δ</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:msub>
<mml:mi>δ</mml:mi>
<mml:mi>j</mml:mi></mml:msub>
<mml:mo>…</mml:mo>
<mml:msub>
<mml:mi>δ</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>where <italic>m</italic> is the order of the index; the summation runs over all sub-graphs of order <italic>m</italic>; <italic>δ<sub>i</sub>δ<sub>j</sub></italic>...<italic>δ<sub>k</sub></italic> are the connectivities of the vertices of the sub-graph; and <italic>Z<sub>i</sub>Z<sub>j</sub></italic>...<italic>Z<sub>k</sub></italic> are coefficients characterizing the atom size, which coincide with the number of the period in the Periodic Table. The second most important descriptor involved in <xref ref-type="disp-formula" rid="FD3">Eq. (3)</xref> is <italic>MLOGP</italic>, the Moriguchi octanol-water partition coefficient [<xref ref-type="bibr" rid="b90-ijms-10-02558">90</xref>]; this reveals that a compound’s hydrophobicity plays a crucial role in explaining the aqueous solubility data. Finally, the contribution of the 3D Radial Distribution Function [<xref ref-type="bibr" rid="b74-ijms-10-02558">74</xref>] descriptor <italic>RDF060u</italic> helps to improve the predictive power of the QSPR. This kind of molecular descriptor, defined for an ensemble of atoms, may be interpreted as the probability distribution of finding an atom in a spherical volume of a certain radius, and can incorporate different types of atomic properties in order to differentiate the nature and contribution of atoms to the property being modelled. For <italic>RDF060u</italic>, the sphere radius is 6.0 angstroms and no atomic property is employed, so that the descriptor characterizes the molecular size.</p>
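<p>For <italic>m</italic> = 1 the sub-graphs in Equation (4) are the individual bonds, so the index reduces to a sum over bonded atom pairs with the prefactor 1/2<sup>2</sup>. The Python sketch below evaluates this case for two small hydrogen-depleted graphs. It is an illustration and not the DRAGON implementation: the function name and input format are hypothetical, the connectivity <italic>δ</italic> is assumed to be the vertex degree in the hydrogen-depleted graph, and fluorine atoms are assumed to have been removed beforehand.</p>
<preformat>
```python
import math

# Period number used as the atom-size coefficient Z in Eq. (4).
PERIOD = {"C": 2, "N": 2, "O": 2, "P": 3, "S": 3, "Cl": 3, "Br": 4, "I": 5}


def x1sol(atoms, bonds):
    """First-order solvation connectivity index, Eq. (4) with m = 1.

    atoms: element symbols of the hydrogen-depleted graph
    bonds: pairs of atom indices (each bond is a sub-graph of order 1)
    """
    degree = [0] * len(atoms)
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    total = sum(
        PERIOD[atoms[i]] * PERIOD[atoms[j]] / math.sqrt(degree[i] * degree[j])
        for i, j in bonds
    )
    return total / 2 ** (1 + 1)  # prefactor 1 / 2^(m + 1) with m = 1


# Ethanol (C-C-O) and chloroethane (C-C-Cl) as three-atom chains.
print(round(x1sol(["C", "C", "O"], [(0, 1), (1, 2)]), 4))   # 1.4142
print(round(x1sol(["C", "C", "Cl"], [(0, 1), (1, 2)]), 4))  # 1.7678
```
</preformat>
<p>When every atom belongs to the second period, the factor <italic>Z<sub>i</sub>Z<sub>j</sub></italic>/4 equals one for each bond, so the index depends only on the vertex degrees; replacing oxygen by the larger chlorine atom raises the value, which is how the descriptor encodes atom size.</p>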
<p>The application of the developed structure-property relationship to the classical test set 21, whose data are considered “unknown” and do not participate in the model development (as is also the case for test set val), leads to a root mean square residual (<italic>rms</italic>) of 1.202. The statistical quality achieved on this test set is comparable to that obtained by the previously reported models for aqueous solubilities in <xref ref-type="table" rid="t2-ijms-10-02558">Table 2</xref>, and the main advantage here is that only three molecular descriptors are employed to model the physical property, leading to a favorable ratio <italic>N</italic>/<italic>d</italic> = 7. This equation yields a slightly better predictive quality than the GCM of Klopman (<italic>rms</italic> = 1.213), which involves 34 parameters [<xref ref-type="bibr" rid="b14-ijms-10-02558">14</xref>], and also outperforms the MLR of Yan (<italic>rms</italic> = 1.286), which uses 40 parameters [<xref ref-type="bibr" rid="b91-ijms-10-02558">91</xref>].</p>
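<p>The quantities compared in this paragraph are simple to compute. The Python sketch below defines the root mean square residual and the ratio <italic>N</italic>/<italic>d</italic>, and ranks the three <italic>rms</italic> values quoted in the text. The function name and the short observed and predicted lists are invented for illustration and are not solubility data from test set 21.</p>
<preformat>
```python
import math


def rms(observed, predicted):
    """Root mean square residual between observed and predicted values."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Invented values, used only to show the calculation.
observed = [-1.0, -2.5, -4.0]
predicted = [-1.5, -2.0, -3.0]
print(round(rms(observed, predicted), 3))  # 0.707

# Ratio of compounds to descriptors for test set 21 and Eq. (3).
print(21 / 3)  # 7.0

# rms values quoted in the text, sorted from best to worst.
reported = {"Eq. (3)": 1.202, "Klopman GCM": 1.213, "Yan MLR": 1.286}
for model, value in sorted(reported.items(), key=lambda item: item[1]):
    print(model, value)
```
</preformat>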
<p>To conclude the present analysis, the chemical information encoded by only three theoretical molecular descriptors of the one-, two-, and three-dimensional types participating in a linear QSPR model made it possible to explain the variation of the experimental aqueous solubilities to a satisfactory extent, and allowed a proper characterization of structurally heterogeneous drug-like organic compounds from both the training and test sets. The molecular descriptors involved in the QSPR have a fairly direct interpretation, and the relationship proved to have general applicability. The statistical parameters of the proposed model compare fairly well with those of previously published models based on the GCM methodology.</p></sec>
<sec sec-type="conclusions">
<label>6.</label>
<title>Conclusions</title>
<p>In this review we have analyzed the possibility of establishing quantitative structure-aqueous solubility relationships for drug-like compounds, and compared our recently developed linear QSPR method with others reported in the literature. Such linear equations are shown to work quite well in both the training and validation stages of the model, and can in principle be used for the <italic>in silico</italic> prediction of physicochemical properties. Two different strategies can be adopted for correlating the structure and the solubility of compounds: (a) the proposal of novel descriptors possessing some kind of physical interpretation, as is the case for Lipinski’s “rule of five” descriptors, which take into account the bioavailability of drugs, or (b) the use of any kind of constitutional, topological, geometrical, or electronic descriptors for fitting the experimental solubility data. In both cases, it is of considerable importance to select an appropriate, balanced set of chemical compounds that considers structural diversity, known experimental errors, correct tautomers or structures, ionization and crystal packing effects, and a pharmaceutically relevant range of solubility data, and that avoids the over-sampling of compounds with low molecular weight.</p></sec></body>
<back>
<ack>
<p>This research project was supported by the National Council of Scientific and Technological Research (CONICET) and by La Plata National University of Argentina.</p></ack>
<ref-list>
<title>References and Notes</title>
<ref id="b1-ijms-10-02558"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Balakin</surname><given-names>KV</given-names></name><name><surname>Savchuk</surname><given-names>NP</given-names></name><name><surname>Tetko</surname><given-names>IV</given-names></name></person-group><article-title>In Silico approaches to prediction of aqueous and DMSO Solubility of drug-like compounds: Trends, problems and solutions</article-title><source>Curr. Med. Chem</source><year>2006</year><volume>13</volume><fpage>226</fpage><lpage>241</lpage></citation></ref>
<ref id="b2-ijms-10-02558"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Delaney</surname><given-names>JS</given-names></name></person-group><article-title>Prediction of aqueous solubility from structure</article-title><source>Drug Disc. Today</source><year>2005</year><volume>10</volume><fpage>289</fpage><lpage>295</lpage><pub-id pub-id-type="doi">10.1016/S1359-6446(04)03365-3</pub-id></citation></ref>
<ref id="b3-ijms-10-02558"><label>3.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goodwin</surname><given-names>JJ</given-names></name></person-group><article-title>Rationale and benefit of using high throughput solubility screens in drug discovery</article-title><source>Drug Disc. Today Technol</source><year>2006</year><volume>3</volume><fpage>67</fpage><lpage>71</lpage><pub-id pub-id-type="doi">10.1016/j.ddtec.2005.03.001</pub-id></citation></ref>
<ref id="b4-ijms-10-02558"><label>4.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Johnson</surname><given-names>SR</given-names></name><name><surname>Zheng</surname><given-names>W</given-names></name></person-group><article-title>Recent progress in the computational prediction of aqueous solubility and absorption</article-title><source>AAPS J</source><year>2006</year><volume>8</volume><fpage>E27</fpage><lpage>E40</lpage><pub-id pub-id-type="doi">10.1208/aapsj080104</pub-id><pub-id pub-id-type="pmid">16584131</pub-id></citation></ref>
<ref id="b5-ijms-10-02558"><label>5.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Schneider</surname><given-names>G</given-names></name><name><surname>So</surname><given-names>S</given-names></name></person-group><source>Adaptive Systems in Drug Design</source><publisher-name>Landes Bioscience</publisher-name><publisher-loc>Austin, TX, USA</publisher-loc><year>2003</year></citation></ref>
<ref id="b6-ijms-10-02558"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname><given-names>H</given-names></name><name><surname>Adedoyin</surname><given-names>A</given-names></name></person-group><article-title>ADME-Tox in drug discovery: integration of experimental and computational technologies</article-title><source>Drug Disc. Today</source><year>2003</year><volume>8</volume><fpage>852</fpage><lpage>861</lpage><pub-id pub-id-type="doi">10.1016/S1359-6446(03)02828-9</pub-id></citation></ref>
<ref id="b7-ijms-10-02558"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lipinski</surname><given-names>CA</given-names></name><name><surname>Lombardo</surname><given-names>F</given-names></name><name><surname>Dominy</surname><given-names>BW</given-names></name><name><surname>Feeney</surname><given-names>PJ</given-names></name></person-group><article-title>Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings</article-title><source>Adv. Drug Deliv. Rev</source><year>2001</year><volume>46</volume><fpage>3</fpage><lpage>26</lpage><pub-id pub-id-type="doi">10.1016/S0169-409X(00)00129-0</pub-id><pub-id pub-id-type="pmid">11259830</pub-id></citation></ref>
<ref id="b8-ijms-10-02558"><label>8.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname><given-names>CJ</given-names></name><name><surname>Hansch</surname><given-names>C</given-names></name></person-group><article-title>The relative toxicity of compounds in mainstream cigarette smoke condensate</article-title><source>Food Chem. Toxicol</source><year>2000</year><volume>38</volume><fpage>637</fpage><lpage>646</lpage><pub-id pub-id-type="doi">10.1016/S0278-6915(00)00051-X</pub-id><pub-id pub-id-type="pmid">10942325</pub-id></citation></ref>
<ref id="b9-ijms-10-02558"><label>9.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Amidon</surname><given-names>GL</given-names></name><name><surname>Yalkowsky</surname><given-names>SH</given-names></name><name><surname>Anik</surname><given-names>ST</given-names></name><name><surname>Valvani</surname><given-names>SC</given-names></name></person-group><article-title>Solubility of nonelectrolytes in polar solvents. V. Estimation of the solubility of aliphatic monofunctional compounds in water using a molecular surface area approach</article-title><source>J. Phys. Chem. A</source><year>1975</year><volume>79</volume><fpage>2239</fpage><lpage>2246</lpage><pub-id pub-id-type="doi">10.1021/j100588a008</pub-id></citation></ref>
<ref id="b10-ijms-10-02558"><label>10.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hansch</surname><given-names>C</given-names></name><name><surname>Bjorkroth</surname><given-names>JP</given-names></name><name><surname>Leo</surname><given-names>A</given-names></name></person-group><article-title>Hydrophobicity and central nervous system agents: on the principle of minimal hydrophobicity in drug design</article-title><source>J. Pharm. Sci</source><year>1987</year><volume>76</volume><fpage>663</fpage><lpage>687</lpage><pub-id pub-id-type="doi">10.1002/jps.2600760902</pub-id><pub-id pub-id-type="pmid">11002801</pub-id></citation></ref>
<ref id="b11-ijms-10-02558"><label>11.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kariv</surname><given-names>I</given-names></name><name><surname>Rourick</surname><given-names>RA</given-names></name><name><surname>Kassel</surname><given-names>DB</given-names></name><name><surname>Chung</surname><given-names>TD</given-names></name></person-group><article-title>Improvement of “hit-to-lead” optimization by integration of <italic>in vitro</italic> HTS experimental models for early determination of pharmacokinetic properties</article-title><source>Comb. Chem. High Throughput Screen</source><year>2002</year><volume>5</volume><fpage>459</fpage><lpage>472</lpage><pub-id pub-id-type="doi">10.2174/1386207023330101</pub-id><pub-id pub-id-type="pmid">12470275</pub-id></citation></ref>
<ref id="b12-ijms-10-02558"><label>12.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bhattachar</surname><given-names>SN</given-names></name><name><surname>Deschenes</surname><given-names>LA</given-names></name><name><surname>Wesley</surname><given-names>JA</given-names></name></person-group><article-title>Solubility: it's not just for physical chemists</article-title><source>Drug Disc. Today</source><year>2006</year><volume>11</volume><fpage>1012</fpage><lpage>1018</lpage><pub-id pub-id-type="doi">10.1016/j.drudis.2006.09.002</pub-id></citation></ref>
<ref id="b13-ijms-10-02558"><label>13.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Katritzky</surname><given-names>AR</given-names></name><name><surname>Maran</surname><given-names>U</given-names></name><name><surname>Lobanov</surname><given-names>VS</given-names></name><name><surname>Karelson</surname><given-names>M</given-names></name></person-group><article-title>Structurally diverse quantitative structure-property relationship correlations of technologically relevant physical properties</article-title><source>J. Chem. Inf. Model</source><year>2000</year><volume>40</volume><fpage>1</fpage><lpage>18</lpage><pub-id pub-id-type="doi">10.1021/ci9903206</pub-id></citation></ref>
<ref id="b14-ijms-10-02558"><label>14.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klopman</surname><given-names>G</given-names></name><name><surname>Wang</surname><given-names>S</given-names></name><name><surname>Balthasar</surname><given-names>DM</given-names></name></person-group><article-title>Estimation of aqueous solubility of organic molecules by the group contribution approach. Application to the study of biodegradation</article-title><source>J. Chem. Inf. Model</source><year>1992</year><volume>32</volume><fpage>474</fpage><lpage>482</lpage><pub-id pub-id-type="doi">10.1021/ci00009a013</pub-id></citation></ref>
<ref id="b15-ijms-10-02558"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McFarland</surname><given-names>JW</given-names></name><name><surname>Avdeef</surname><given-names>A</given-names></name><name><surname>Berger</surname><given-names>CM</given-names></name><name><surname>Raevsky</surname><given-names>OA</given-names></name></person-group><article-title>Estimating the water solubilities of crystalline compounds from their chemical structure alone</article-title><source>J. Chem. Inf. Model</source><year>2001</year><volume>41</volume><fpage>1355</fpage><lpage>1359</lpage><pub-id pub-id-type="doi">10.1021/ci0102822</pub-id></citation></ref>
<ref id="b16-ijms-10-02558"><label>16.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pogliani</surname><given-names>L</given-names></name></person-group><article-title>Modeling purines and pyrimidines with the linear combination of connectivity indices–molecular connectivity “LCCI-MC” method</article-title><source>J. Chem. Inf. Model</source><year>1996</year><volume>36</volume><fpage>1082</fpage><lpage>1091</lpage><pub-id pub-id-type="doi">10.1021/ci960020d</pub-id></citation></ref>
<ref id="b17-ijms-10-02558"><label>17.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yalkowsky</surname><given-names>SH</given-names></name><name><surname>Valvani</surname><given-names>SC</given-names></name></person-group><article-title>Solubility and partitioning I: solubility of nonelectrolytes in water</article-title><source>J. Pharm. Sci</source><year>1980</year><volume>69</volume><fpage>912</fpage><lpage>922</lpage><pub-id pub-id-type="doi">10.1002/jps.2600690814</pub-id><pub-id pub-id-type="pmid">7400936</pub-id></citation></ref>
<ref id="b18-ijms-10-02558"><label>18.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yalkowsky</surname><given-names>SH</given-names></name><name><surname>Valvani</surname><given-names>SC</given-names></name><name><surname>Roseman</surname><given-names>TJ</given-names></name></person-group><article-title>Water solubility: A critique of the solvatochromic approach</article-title><source>J. Pharm. Sci</source><year>1983</year><volume>72</volume><fpage>866</fpage><lpage>870</lpage><pub-id pub-id-type="doi">10.1002/jps.2600720808</pub-id><pub-id pub-id-type="pmid">6620139</pub-id></citation></ref>
<ref id="b19-ijms-10-02558"><label>19.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname><given-names>G</given-names></name><name><surname>Ran</surname><given-names>Y</given-names></name><name><surname>Yalkowsky</surname><given-names>SH</given-names></name></person-group><article-title>Prediction of the aqueous solubility: comparison of the general solubility equation and the method using an amended solvation energy relationship</article-title><source>J. Pharm. Sci</source><year>2002</year><volume>91</volume><fpage>517</fpage><lpage>533</lpage><pub-id pub-id-type="doi">10.1002/jps.10022</pub-id><pub-id pub-id-type="pmid">11835210</pub-id></citation></ref>
<ref id="b20-ijms-10-02558"><label>20.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peterson</surname><given-names>DL</given-names></name><name><surname>Yalkowsky</surname><given-names>SH</given-names></name></person-group><article-title>Comparison of two methods for predicting aqueous solubility</article-title><source>J. Chem. Inf. Comput. Sci</source><year>2001</year><volume>41</volume><fpage>1531</fpage><lpage>1534</lpage><pub-id pub-id-type="doi">10.1021/ci010298s</pub-id><pub-id pub-id-type="pmid">11749579</pub-id></citation></ref>
<ref id="b21-ijms-10-02558"><label>21.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ran</surname><given-names>Y</given-names></name><name><surname>Yalkowsky</surname><given-names>SH</given-names></name></person-group><article-title>Prediction of drug solubility by the general solubility equation (GSE)</article-title><source>J. Chem. Inf. Comput. Sci</source><year>2001</year><volume>41</volume><fpage>354</fpage><lpage>357</lpage><pub-id pub-id-type="doi">10.1021/ci000338c</pub-id><pub-id pub-id-type="pmid">11277722</pub-id></citation></ref>
<ref id="b22-ijms-10-02558"><label>22.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ran</surname><given-names>Y</given-names></name><name><surname>Jain</surname><given-names>N</given-names></name><name><surname>Yalkowsky</surname><given-names>SH</given-names></name></person-group><article-title>Prediction of aqueous solubility of organic compounds by the general solubility equation (GSE)</article-title><source>J. Chem. Inf. Comput. Sci</source><year>2001</year><volume>41</volume><fpage>1208</fpage><lpage>1217</lpage><pub-id pub-id-type="doi">10.1021/ci010287z</pub-id><pub-id pub-id-type="pmid">11604020</pub-id></citation></ref>
<ref id="b23-ijms-10-02558"><label>23.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meylan</surname><given-names>WM</given-names></name><name><surname>Howard</surname><given-names>PH</given-names></name><name><surname>Boethling</surname><given-names>RS</given-names></name></person-group><article-title>Improved method for estimating water solubility from octanol/water partition coefficient</article-title><source>Environ. Toxicol. Chem</source><year>1996</year><volume>15</volume><fpage>100</fpage><lpage>106</lpage><pub-id pub-id-type="doi">10.1002/etc.5620150205</pub-id></citation></ref>
<ref id="b24-ijms-10-02558"><label>24.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Meylan</surname><given-names>WM</given-names></name><name><surname>Howard</surname><given-names>PH</given-names></name></person-group><article-title>Estimating log P with atom/fragments and water solubility with log P</article-title><source>Persp. Drug Disc. Design</source><year>2000</year><volume>19</volume><fpage>67</fpage><lpage>84</lpage><pub-id pub-id-type="doi">10.1023/A:1008715521862</pub-id></citation></ref>
<ref id="b25-ijms-10-02558"><label>25.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Myrdal</surname><given-names>P</given-names></name><name><surname>Ward</surname><given-names>GH</given-names></name><name><surname>Dannenfelser</surname><given-names>RM</given-names></name><name><surname>Mishra</surname><given-names>DS</given-names></name><name><surname>Yalkowsky</surname><given-names>SH</given-names></name></person-group><article-title>AQUAFAC 1: Aqueous Functional group activity coefficients: Application to hydrocarbons</article-title><source>Chemosphere</source><year>1992</year><volume>24</volume><fpage>1047</fpage><lpage>1061</lpage><pub-id pub-id-type="doi">10.1016/0045-6535(92)90196-X</pub-id></citation></ref>
<ref id="b26-ijms-10-02558"><label>26.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pinsuwan</surname><given-names>S</given-names></name><name><surname>Myrdal</surname><given-names>PB</given-names></name><name><surname>Lee</surname><given-names>YC</given-names></name><name><surname>Yalkowsky</surname><given-names>SH</given-names></name></person-group><article-title>AQUAFAC 5: Applications to alcohols and acids</article-title><source>Chemosphere</source><year>1997</year><volume>35</volume><fpage>2503</fpage><lpage>2513</lpage><pub-id pub-id-type="doi">10.1016/S0045-6535(97)00318-4</pub-id></citation></ref>
<ref id="b27-ijms-10-02558"><label>27.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Morris</surname><given-names>JJ</given-names></name><name><surname>Bruneau</surname><given-names>PP</given-names></name></person-group><article-title>Prediction of physicochemical properties</article-title><source>Virtual Screening for Bioactive Molecules</source><person-group person-group-type="editor"><name><surname>Bohm</surname><given-names>HG</given-names></name><name><surname>Schneider</surname><given-names>G</given-names></name></person-group><publisher-name>Wiley-VCH</publisher-name><publisher-loc>Weinheim, Germany</publisher-loc><year>2000</year><volume>10</volume><fpage>33</fpage><lpage>58</lpage></citation></ref>
<ref id="b28-ijms-10-02558"><label>28.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thompson</surname><given-names>JD</given-names></name><name><surname>Cramer</surname><given-names>CJ</given-names></name><name><surname>Truhlar</surname><given-names>DG</given-names></name></person-group><article-title>Predicting aqueous solubilities from aqueous free energies of solvation and experimental or calculated vapor pressures of pure substances</article-title><source>J. Chem. Phys</source><year>2003</year><volume>119</volume><fpage>1661</fpage><lpage>1670</lpage><pub-id pub-id-type="doi">10.1063/1.1579474</pub-id></citation></ref>
<ref id="b29-ijms-10-02558"><label>29.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yaws</surname><given-names>CL</given-names></name><name><surname>Xiang</surname><given-names>P</given-names></name><name><surname>Xiaoyin</surname><given-names>L</given-names></name></person-group><article-title>Water solubility data for 151 hydrocarbons</article-title><source>Chem. Eng</source><year>1993</year><volume>100</volume><fpage>108</fpage><lpage>111</lpage></citation></ref>
<ref id="b30-ijms-10-02558"><label>30.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jorgensen</surname><given-names>WL</given-names></name><name><surname>Duffy</surname><given-names>EM</given-names></name></person-group><article-title>Prediction of drug solubility from Monte Carlo simulations</article-title><source>Bioorg. Med. Chem. Lett</source><year>2000</year><volume>10</volume><fpage>1155</fpage><lpage>1158</lpage><pub-id pub-id-type="doi">10.1016/S0960-894X(00)00172-4</pub-id><pub-id pub-id-type="pmid">10866370</pub-id></citation></ref>
<ref id="b31-ijms-10-02558"><label>31.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Kier</surname><given-names>LB</given-names></name><name><surname>Cheng</surname><given-names>C-K</given-names></name><name><surname>Seybold</surname><given-names>PG</given-names></name></person-group><article-title>Cellular automata models of aqueous solution systems</article-title><source>Reviews in Computational Chemistry</source><person-group person-group-type="editor"><name><surname>Lipkowitz</surname><given-names>KB</given-names></name><name><surname>Boyd</surname><given-names>DB</given-names></name></person-group><publisher-name>Wiley-VCH</publisher-name><publisher-loc>Weinheim, Germany</publisher-loc><year>2001</year><volume>17</volume><fpage>205</fpage><lpage>254</lpage></citation></ref>
<ref id="b32-ijms-10-02558"><label>32.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Cramer</surname><given-names>CJ</given-names></name><name><surname>Truhlar</surname><given-names>DG</given-names></name></person-group><article-title>Continuum solvation models: Classical and quantum mechanical implementations</article-title><source>Reviews in Computational Chemistry</source><person-group person-group-type="editor"><name><surname>Lipkowitz</surname><given-names>KB</given-names></name><name><surname>Boyd</surname><given-names>DB</given-names></name></person-group><publisher-name>Wiley-VCH</publisher-name><publisher-loc>Weinheim, Germany</publisher-loc><year>1995</year><volume>6</volume><fpage>1</fpage><lpage>72</lpage></citation></ref>
<ref id="b33-ijms-10-02558"><label>33.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klamt</surname><given-names>A</given-names></name></person-group><article-title>Prediction of aqueous solubility of drugs and pesticides with COSMO-RS</article-title><source>J. Comput. Chem</source><year>2002</year><volume>23</volume><fpage>275</fpage><lpage>281</lpage><pub-id pub-id-type="doi">10.1002/jcc.1168</pub-id><pub-id pub-id-type="pmid">11924739</pub-id></citation></ref>
<ref id="b34-ijms-10-02558"><label>34.</label><citation citation-type="web"><source>Artist</source>Available online: <ext-link xlink:href="http://www.ddbst.de/new/Win_DDBSP/frame_Artist.htm" ext-link-type="uri">http://www.ddbst.de/new/Win_DDBSP/frame_Artist.htm</ext-link>, 2 June 2009.</citation></ref>
<ref id="b35-ijms-10-02558"><label>35.</label><citation citation-type="web"><source>ChemEng Software Design</source>Available online: <ext-link xlink:href="http://www.cesd.com/chempage.htm" ext-link-type="uri">http://www.cesd.com/chempage.htm</ext-link>, 2 June 2009.</citation></ref>
<ref id="b36-ijms-10-02558"><label>36.</label><citation citation-type="web"><source>Predict</source>Available online: <ext-link xlink:href="http://www.mwsoftware.com/dragon/desc.html" ext-link-type="uri">http://www.mwsoftware.com/dragon/desc.html</ext-link>, 2 June 2009.</citation></ref>
<ref id="b37-ijms-10-02558"><label>37.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nirmalakhandan</surname><given-names>NNP</given-names></name><name><surname>Speece</surname><given-names>RE</given-names></name></person-group><article-title>Prediction of aqueous solubility of organic chemicals based on molecular structure. 2. Application to PNAs, PCBs, PCDDs, etc</article-title><source>Environ. Sci. Technol</source><year>1989</year><volume>23</volume><fpage>708</fpage><lpage>713</lpage><pub-id pub-id-type="doi">10.1021/es00064a009</pub-id></citation></ref>
<ref id="b38-ijms-10-02558"><label>38.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Suzuki</surname><given-names>T</given-names></name></person-group><article-title>Development of an automatic estimation system for both the partition coefficient and aqueous solubility</article-title><source>J. Comput.-Aided Mol. Des</source><year>1991</year><volume>5</volume><fpage>149</fpage><lpage>166</lpage><pub-id pub-id-type="doi">10.1007/BF00129753</pub-id><pub-id pub-id-type="pmid">1869898</pub-id></citation></ref>
<ref id="b39-ijms-10-02558"><label>39.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kuhne</surname><given-names>R</given-names></name><name><surname>Ebert</surname><given-names>RU</given-names></name><name><surname>Kleint</surname><given-names>F</given-names></name><name><surname>Schmidt</surname><given-names>G</given-names></name><name><surname>Schuurmann</surname><given-names>G</given-names></name></person-group><article-title>Group contribution methods to estimate water solubility of organic chemicals</article-title><source>Chemosphere</source><year>1995</year><volume>30</volume><fpage>2061</fpage><lpage>2077</lpage><pub-id pub-id-type="doi">10.1016/0045-6535(95)00084-L</pub-id></citation></ref>
<ref id="b40-ijms-10-02558"><label>40.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>Y</given-names></name><name><surname>Myrdal</surname><given-names>PB</given-names></name><name><surname>Yalkowsky</surname><given-names>SH</given-names></name></person-group><article-title>Aqueous functional group activity coefficients (AQUAFAC) 4: Applications to complex organic compounds</article-title><source>Chemosphere</source><year>1996</year><volume>33</volume><fpage>2129</fpage><lpage>2144</lpage><pub-id pub-id-type="doi">10.1016/0045-6535(96)00311-6</pub-id></citation></ref>
<ref id="b41-ijms-10-02558"><label>41.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klopman</surname><given-names>G</given-names></name><name><surname>Zhu</surname><given-names>H</given-names></name></person-group><article-title>Estimation of aqueous solubility of organic molecules by the group contribution approach</article-title><source>J. Chem. Inf. Comput. Sci</source><year>2001</year><volume>41</volume><fpage>439</fpage><lpage>445</lpage><pub-id pub-id-type="doi">10.1021/ci000152d</pub-id></citation></ref>
<ref id="b42-ijms-10-02558"><label>42.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Free</surname><given-names>SM</given-names></name><name><surname>Wilson</surname><given-names>JW</given-names></name></person-group><article-title>A mathematical contribution to structure-activity studies</article-title><source>J. Med. Chem</source><year>1964</year><volume>7</volume><fpage>395</fpage><lpage>399</lpage><pub-id pub-id-type="doi">10.1021/jm00334a001</pub-id><pub-id pub-id-type="pmid">14221113</pub-id></citation></ref>
<ref id="b43-ijms-10-02558"><label>43.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hansch</surname><given-names>C</given-names></name></person-group><article-title>ρ-σ-π analysis. A method for the correlation of biological activity and chemical structure</article-title><source>J. Am. Chem. Soc</source><year>1964</year><volume>86</volume><fpage>1616</fpage><lpage>1626</lpage><pub-id pub-id-type="doi">10.1021/ja01062a035</pub-id></citation></ref>
<ref id="b44-ijms-10-02558"><label>44.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Hansch</surname><given-names>C</given-names></name><name><surname>Leo</surname><given-names>A</given-names></name></person-group><source>Exploring QSAR Fundamentals and Applications in Chemistry and Biology</source><publisher-name>American Chemical Society</publisher-name><publisher-loc>Washington, DC, USA</publisher-loc><year>1995</year></citation></ref>
<ref id="b45-ijms-10-02558"><label>45.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Katritzky</surname><given-names>AR</given-names></name><name><surname>Lobanov</surname><given-names>VS</given-names></name><name><surname>Karelson</surname><given-names>M</given-names></name></person-group><article-title>QSPR - the correlation and quantitative prediction of chemical and physical properties from structure</article-title><source>Chem. Soc. Rev</source><year>1995</year><volume>24</volume><fpage>279</fpage><lpage>287</lpage><pub-id pub-id-type="doi">10.1039/cs9952400279</pub-id></citation></ref>
<ref id="b46-ijms-10-02558"><label>46.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Trinajstić</surname><given-names>N</given-names></name></person-group><source>Chemical Graph Theory</source><publisher-name>CRC Press</publisher-name><publisher-loc>Boca Raton, FL, USA</publisher-loc><year>1992</year></citation></ref>
<ref id="b47-ijms-10-02558"><label>47.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Worth</surname><given-names>AP</given-names></name><name><surname>Bassan</surname><given-names>A</given-names></name><name><surname>De Bruijn</surname><given-names>J</given-names></name><name><surname>Saliner</surname><given-names>AG</given-names></name><name><surname>Netzeva</surname><given-names>T</given-names></name><name><surname>Patlewicz</surname><given-names>G</given-names></name><name><surname>Pavan</surname><given-names>M</given-names></name><name><surname>Tsakovska</surname><given-names>I</given-names></name><name><surname>Eisenreich</surname><given-names>S</given-names></name></person-group><article-title>The role of the European Chemicals Bureau in promoting the regulatory use of (Q)SAR methods</article-title><source>SAR QSAR Environ. Res</source><year>2007</year><volume>18</volume><fpage>111</fpage><lpage>125</lpage><pub-id pub-id-type="doi">10.1080/10629360601054255</pub-id><pub-id pub-id-type="pmid">17365963</pub-id></citation></ref>
<ref id="b48-ijms-10-02558"><label>48.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Norinder</surname><given-names>U</given-names></name></person-group><article-title>In silico modelling of ADMET-a minireview of work from 2000 to 2004</article-title><source>SAR QSAR Environ. Res</source><year>2005</year><volume>16</volume><fpage>1</fpage><lpage>11</lpage><pub-id pub-id-type="pmid">15912627</pub-id></citation></ref>
<ref id="b49-ijms-10-02558"><label>49.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Martin</surname><given-names>YC</given-names></name></person-group><article-title>A bioavailability score</article-title><source>J. Med. Chem</source><year>2005</year><volume>48</volume><fpage>3164</fpage><lpage>3170</lpage><pub-id pub-id-type="doi">10.1021/jm0492002</pub-id><pub-id pub-id-type="pmid">15857122</pub-id></citation></ref>
<ref id="b50-ijms-10-02558"><label>50.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yoshida</surname><given-names>F</given-names></name></person-group><article-title>QSAR model for drug human bioavailability</article-title><source>J. Med. Chem</source><year>2000</year><volume>43</volume><fpage>2575</fpage><lpage>2585</lpage><pub-id pub-id-type="doi">10.1021/jm0000564</pub-id><pub-id pub-id-type="pmid">10891117</pub-id></citation></ref>
<ref id="b51-ijms-10-02558"><label>51.</label><citation citation-type="web"><article-title>Molecular Descriptors Family Home page</article-title>Available online: <ext-link xlink:href="http://sorana.academicdirect.ro" ext-link-type="uri">http://sorana.academicdirect.ro</ext-link>, 2 June 2009.</citation></ref>
<ref id="b52-ijms-10-02558"><label>52.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Karelson</surname><given-names>M</given-names></name></person-group><source>Molecular Descriptors in QSAR/QSPR</source><publisher-name>Wiley-Interscience</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2000</year></citation></ref>
<ref id="b53-ijms-10-02558"><label>53.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Todeschini</surname><given-names>R</given-names></name><name><surname>Consonni</surname><given-names>V</given-names></name></person-group><source>Handbook of Molecular Descriptors</source><publisher-name>Wiley-VCH</publisher-name><publisher-loc>Weinheim, Germany</publisher-loc><year>2000</year></citation></ref>
<ref id="b54-ijms-10-02558"><label>54.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Apostol</surname><given-names>TM</given-names></name></person-group><source>Calculus</source><publisher-name>Blaisdell Publishing Co</publisher-name><publisher-loc>Waltham, MA, USA</publisher-loc><year>1969</year></citation></ref>
<ref id="b55-ijms-10-02558"><label>55.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Malinowski</surname><given-names>ER</given-names></name></person-group><source>Factor Analysis in Chemistry</source><publisher-name>Wiley</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1991</year></citation></ref>
<ref id="b56-ijms-10-02558"><label>56.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Leardi</surname><given-names>R</given-names></name></person-group><article-title>Genetic algorithms in feature selection</article-title><source>Genetic Algorithms in Molecular Modeling Principles of QSAR and Drug Design</source><person-group person-group-type="editor"><name><surname>Devillers</surname><given-names>J</given-names></name></person-group><publisher-name>Academic Press</publisher-name><publisher-loc>London, UK</publisher-loc><year>1996</year><volume>1</volume><fpage>67</fpage><lpage>86</lpage></citation></ref>
<ref id="b57-ijms-10-02558"><label>57.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Duchowicz</surname><given-names>PR</given-names></name><name><surname>Castro</surname><given-names>EA</given-names></name><name><surname>Fernández</surname><given-names>FM</given-names></name></person-group><article-title>Alternative algorithm for the search of an optimal set of descriptors in QSAR-QSPR studies</article-title><source>MATCH Commun. Math. Comput. Chem</source><year>2006</year><volume>55</volume><fpage>179</fpage><lpage>192</lpage></citation></ref>
<ref id="b58-ijms-10-02558"><label>58.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Zupan</surname><given-names>J</given-names></name></person-group><source>Encyclopedia of Computational Chemistry</source><publisher-name>Wiley</publisher-name><publisher-loc>Chichester, UK</publisher-loc><year>1998</year><year>2006</year></citation></ref>
<ref id="b59-ijms-10-02558"><label>59.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Vapnik</surname><given-names>V</given-names></name></person-group><source>The Nature of Statistical Learning Theory</source><publisher-name>Springer Verlag</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1995</year></citation></ref>
<ref id="b60-ijms-10-02558"><label>60.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Livingstone</surname><given-names>DJ</given-names></name><name><surname>Manallack</surname><given-names>DT</given-names></name></person-group><article-title>Statistics using neural networks: chance effects</article-title><source>J. Med. Chem</source><year>1993</year><volume>36</volume><fpage>1295</fpage><lpage>1297</lpage><pub-id pub-id-type="doi">10.1021/jm00061a023</pub-id><pub-id pub-id-type="pmid">8487267</pub-id></citation></ref>
<ref id="b61-ijms-10-02558"><label>61.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tetko</surname><given-names>IV</given-names></name><name><surname>Luik</surname><given-names>AI</given-names></name><name><surname>Poda</surname><given-names>GI</given-names></name></person-group><article-title>Applications of neural networks in structure-activity relationships of a small number of molecules</article-title><source>J. Med. Chem</source><year>1993</year><volume>36</volume><fpage>811</fpage><lpage>814</lpage><pub-id pub-id-type="doi">10.1021/jm00059a003</pub-id><pub-id pub-id-type="pmid">8464034</pub-id></citation></ref>
<ref id="b62-ijms-10-02558"><label>62.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Talevi</surname><given-names>A</given-names></name><name><surname>Castro</surname><given-names>EA</given-names></name><name><surname>Bruno-Blanch</surname><given-names>LE</given-names></name></person-group><article-title>New solubility models based on descriptors derived from the detour matrix</article-title><source>J. Arg. Chem. Soc</source><year>2006</year><volume>44</volume><fpage>129</fpage><lpage>141</lpage></citation></ref>
<ref id="b63-ijms-10-02558"><label>63.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Harary</surname><given-names>F</given-names></name></person-group><source>Graph Theory</source><publisher-name>Addison-Wesley</publisher-name><publisher-loc>Upper Saddle River, NJ, USA</publisher-loc><year>1969</year></citation></ref>
<ref id="b64-ijms-10-02558"><label>64.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Castro</surname><given-names>EA</given-names></name><name><surname>Tueros</surname><given-names>M</given-names></name><name><surname>Toropov</surname><given-names>AA</given-names></name></person-group><article-title>Maximum topological distances based indices as molecular descriptors for QSPR: 2--application to aromatic hydrocarbons</article-title><source>Comput. Chem</source><year>2000</year><volume>24</volume><fpage>571</fpage><lpage>576</lpage><pub-id pub-id-type="doi">10.1016/S0097-8485(99)00095-9</pub-id><pub-id pub-id-type="pmid">10890366</pub-id></citation></ref>
<ref id="b65-ijms-10-02558"><label>65.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Devillers</surname><given-names>J</given-names></name><name><surname>Balaban</surname><given-names>AT</given-names></name></person-group><source>Topological Indices and Related Descriptors in QSAR and QSPR</source><publisher-name>Gordon and Breach Science Publishers</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1999</year></citation></ref>
<ref id="b66-ijms-10-02558"><label>66.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Firpo</surname><given-names>M</given-names></name><name><surname>Gavernet</surname><given-names>L</given-names></name><name><surname>Castro</surname><given-names>EA</given-names></name><name><surname>Toropov</surname><given-names>AA</given-names></name></person-group><article-title>Maximum topological distances based indices as molecular descriptors for QSPR. Part 1. Application to alkyl benzenes boiling points</article-title><source>J. Mol. Struc-Theochem</source><year>2000</year><volume>501</volume><fpage>419</fpage><lpage>425</lpage><pub-id pub-id-type="doi">10.1016/S0166-1280(99)00453-4</pub-id></citation></ref>
<ref id="b67-ijms-10-02558"><label>67.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lukovits</surname><given-names>I</given-names></name></person-group><article-title>The detour index</article-title><source>Croat. Chem. Acta</source><year>1996</year><volume>69</volume><fpage>873</fpage><lpage>882</lpage></citation></ref>
<ref id="b68-ijms-10-02558"><label>68.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trinajstić</surname><given-names>N</given-names></name><name><surname>Nikolić</surname><given-names>S</given-names></name><name><surname>Lučić</surname><given-names>B</given-names></name></person-group><article-title>The detour matrix in chemistry</article-title><source>J. Chem. Inf. Comput. Sci</source><year>1997</year><volume>37</volume><fpage>631</fpage><lpage>638</lpage><pub-id pub-id-type="doi">10.1021/ci960149n</pub-id></citation></ref>
<ref id="b69-ijms-10-02558"><label>69.</label><citation citation-type="web"><source>Milano Chemometrics and QSAR Research Group Homepage</source>Available online: <ext-link xlink:href="http://www.disat.unimib.it/chm" ext-link-type="uri">http://www.disat.unimib.it/chm</ext-link>, 2 June 2009.</citation></ref>
<ref id="b70-ijms-10-02558"><label>70.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Duchowicz</surname><given-names>PR</given-names></name><name><surname>Castro</surname><given-names>EA</given-names></name><name><surname>Fernández</surname><given-names>FM</given-names></name><name><surname>González</surname><given-names>MP</given-names></name></person-group><article-title>A new search algorithm of QSPR/QSAR theories: Normal boiling points of some organic molecules</article-title><source>Chem. Phys. Lett</source><year>2005</year><volume>412</volume><fpage>376</fpage><lpage>380</lpage><pub-id pub-id-type="doi">10.1016/j.cplett.2005.07.016</pub-id></citation></ref>
<ref id="b71-ijms-10-02558"><label>71.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Duchowicz</surname><given-names>PR</given-names></name><name><surname>Fernández</surname><given-names>M</given-names></name><name><surname>Caballero</surname><given-names>J</given-names></name><name><surname>Castro</surname><given-names>EA</given-names></name><name><surname>Fernández</surname><given-names>FM</given-names></name></person-group><article-title>QSAR of non-nucleoside inhibitors of HIV-1 reverse transcriptase</article-title><source>Bioorg. Med. Chem</source><year>2006</year><volume>14</volume><fpage>5876</fpage><lpage>5889</lpage></citation></ref>
<ref id="b72-ijms-10-02558"><label>72.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hawkins</surname><given-names>DM</given-names></name><name><surname>Basak</surname><given-names>SC</given-names></name><name><surname>Mills</surname><given-names>D</given-names></name></person-group><article-title>Assessing model fit by cross validation</article-title><source>J. Chem. Inf. Comput. Sci</source><year>2003</year><volume>43</volume><fpage>579</fpage><lpage>586</lpage><pub-id pub-id-type="doi">10.1021/ci025626i</pub-id></citation></ref>
<ref id="b73-ijms-10-02558"><label>73.</label><citation citation-type="book"><source>The Merck Index An Encyclopedia of Chemicals, Drugs, and Biologicals</source><edition>13th Ed</edition><publisher-name>Merck &amp; Co</publisher-name><publisher-loc>Rahway, NJ, USA</publisher-loc><year>2001</year></citation></ref>
<ref id="b74-ijms-10-02558"><label>74.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Consonni</surname><given-names>V</given-names></name><name><surname>Todeschini</surname><given-names>R</given-names></name><name><surname>Pavan</surname><given-names>M</given-names></name></person-group><article-title>Structure/response correlations and similarity/diversity analysis by GETAWAY descriptors. 2. Application of the novel 3D molecular descriptors to QSAR/QSPR studies</article-title><source>J. Chem. Inf. Comput. Sci</source><year>2002</year><volume>42</volume><fpage>693</fpage><lpage>705</lpage><pub-id pub-id-type="doi">10.1021/ci0155053</pub-id></citation></ref>
<ref id="b75-ijms-10-02558"><label>75.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Consonni</surname><given-names>V</given-names></name><name><surname>Todeschini</surname><given-names>R</given-names></name></person-group><source>Rational Approaches to Drug Design</source><publisher-name>Prous Science</publisher-name><publisher-loc>Barcelona, Spain</publisher-loc><year>2001</year><fpage>235</fpage><lpage>240</lpage></citation></ref>
<ref id="b76-ijms-10-02558"><label>76.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Viswanadhan</surname><given-names>VN</given-names></name><name><surname>Ghose</surname><given-names>AK</given-names></name><name><surname>Revankar</surname><given-names>GR</given-names></name><name><surname>Robins</surname><given-names>RK</given-names></name></person-group><article-title>Atomic physicochemical parameters for three dimensional structure directed quantitative structure-activity relationships. 4. Additional parameters for hydrophobic and dispersive interactions and their application for an automated superposition of certain naturally occurring nucleoside antibiotics</article-title><source>J. Chem. Inf. Comput. Sci</source><year>1989</year><volume>29</volume><fpage>163</fpage><lpage>172</lpage><pub-id pub-id-type="doi">10.1021/ci00063a006</pub-id></citation></ref>
<ref id="b77-ijms-10-02558"><label>77.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Silverman</surname><given-names>DB</given-names></name></person-group><article-title>Three-dimensional moments of molecular property fields</article-title><source>J. Chem. Inf. Comput. Sci</source><year>2000</year><volume>40</volume><fpage>1470</fpage><lpage>1476</lpage><pub-id pub-id-type="doi">10.1021/ci000457s</pub-id></citation></ref>
<ref id="b78-ijms-10-02558"><label>78.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Duchowicz</surname><given-names>PR</given-names></name><name><surname>Talevi</surname><given-names>A</given-names></name><name><surname>Bruno-Blanch</surname><given-names>LE</given-names></name><name><surname>Castro</surname><given-names>EA</given-names></name></person-group><article-title>New QSPR study for the prediction of aqueous solubility of drug-like compounds</article-title><source>Bioorg. Med. Chem</source><year>2008</year><volume>16</volume><fpage>7944</fpage><lpage>7955</lpage><pub-id pub-id-type="doi">10.1016/j.bmc.2008.07.067</pub-id><pub-id pub-id-type="pmid">18701302</pub-id></citation></ref>
<ref id="b79-ijms-10-02558"><label>79.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Veber</surname><given-names>DF</given-names></name><name><surname>Johnson</surname><given-names>SR</given-names></name><name><surname>Cheng</surname><given-names>H</given-names></name><name><surname>Smith</surname><given-names>BR</given-names></name><name><surname>Ward</surname><given-names>KW</given-names></name><name><surname>Kopple</surname><given-names>KD</given-names></name></person-group><article-title>Molecular properties that influence the oral bioavailability of drug candidates</article-title><source>J. Med. Chem</source><year>2002</year><volume>45</volume><fpage>2615</fpage><lpage>2623</lpage><pub-id pub-id-type="doi">10.1021/jm020017n</pub-id><pub-id pub-id-type="pmid">12036371</pub-id></citation></ref>
<ref id="b80-ijms-10-02558"><label>80.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Charifson</surname><given-names>PS</given-names></name><name><surname>Walters</surname><given-names>WP</given-names></name></person-group><article-title>Filtering databases and chemical libraries</article-title><source>J. Comput. Aided Mol. Des</source><year>2002</year><volume>16</volume><fpage>311</fpage><lpage>323</lpage><pub-id pub-id-type="doi">10.1023/A:1020829519597</pub-id><pub-id pub-id-type="pmid">12489681</pub-id></citation></ref>
<ref id="b81-ijms-10-02558"><label>81.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Monge</surname><given-names>A</given-names></name><name><surname>Arrault</surname><given-names>A</given-names></name><name><surname>Marot</surname><given-names>C</given-names></name><name><surname>Morin-Allory</surname><given-names>L</given-names></name></person-group><article-title>Managing, profiling and analyzing a library of 2.6 million compounds gathered from 32 chemical providers</article-title><source>Mol. Divers</source><year>2006</year><volume>10</volume><fpage>389</fpage><lpage>403</lpage></citation></ref>
<ref id="b82-ijms-10-02558"><label>82.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Walters</surname><given-names>WP</given-names></name><name><surname>Murcko</surname><given-names>MA</given-names></name></person-group><article-title>Prediction of “drug-likeness”</article-title><source>Adv. Drug Deliv. Rev</source><year>2002</year><volume>54</volume><fpage>255</fpage><lpage>271</lpage><pub-id pub-id-type="doi">10.1016/S0169-409X(02)00003-0</pub-id><pub-id pub-id-type="pmid">11922947</pub-id></citation></ref>
<ref id="b83-ijms-10-02558"><label>83.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname><given-names>K</given-names></name><name><surname>Feng</surname><given-names>J</given-names></name><name><surname>Young</surname><given-names>SS</given-names></name></person-group><article-title>PowerMV: A software environment for molecular viewing, descriptor generation, data analysis and hit evaluation</article-title><source>J. Chem. Inf. Model</source><year>2005</year><volume>45</volume><fpage>515</fpage><lpage>522</lpage><pub-id pub-id-type="doi">10.1021/ci049847v</pub-id><pub-id pub-id-type="pmid">15807517</pub-id></citation></ref>
<ref id="b84-ijms-10-02558"><label>84.</label><citation citation-type="web"><source>HyperChem (Hypercube) Homepage</source>. Available online: <ext-link xlink:href="http://www.hyper.com" ext-link-type="uri">http://www.hyper.com</ext-link> (accessed 2 June 2009).</citation></ref>
<ref id="b85-ijms-10-02558"><label>85.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Randic</surname><given-names>M</given-names></name></person-group><article-title>Resolution of ambiguities in structure-property studies by use of orthogonal descriptors</article-title><source>J. Chem. Inf. Comput. Sci</source><year>1991</year><volume>31</volume><fpage>311</fpage><lpage>320</lpage><pub-id pub-id-type="doi">10.1021/ci00002a018</pub-id></citation></ref>
<ref id="b86-ijms-10-02558"><label>86.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Golbraikh</surname><given-names>A</given-names></name><name><surname>Tropsha</surname><given-names>A</given-names></name></person-group><article-title>Beware of q<sup>2</sup>!</article-title><source>J. Mol. Graphics Model</source><year>2002</year><volume>20</volume><fpage>269</fpage><lpage>276</lpage><pub-id pub-id-type="doi">10.1016/S1093-3263(01)00123-1</pub-id></citation></ref>
<ref id="b87-ijms-10-02558"><label>87.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Wold</surname><given-names>S</given-names></name><name><surname>Eriksson</surname><given-names>L</given-names></name></person-group><source>Chemometrics Methods in Molecular Design</source><publisher-name>VCH</publisher-name><publisher-loc>Weinheim, Germany</publisher-loc><year>1995</year></citation></ref>
<ref id="b88-ijms-10-02558"><label>88.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Draper</surname><given-names>NR</given-names></name><name><surname>Smith</surname><given-names>H</given-names></name></person-group><source>Applied Regression Analysis</source><publisher-name>John Wiley &amp; Sons</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1981</year></citation></ref>
<ref id="b89-ijms-10-02558"><label>89.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Antipin</surname><given-names>IS</given-names></name><name><surname>Arslanov</surname><given-names>NA</given-names></name><name><surname>Palyulin</surname><given-names>VA</given-names></name><name><surname>Konovalov</surname><given-names>AI</given-names></name><name><surname>Zefirov</surname><given-names>NS</given-names></name><collab>of Disperse Interactions</collab></person-group><source>Dokl Akad Nauk SSSR</source><year>1991</year><volume>316</volume><fpage>925</fpage><lpage>928</lpage><comment>(<italic>Chem. Abstr. 115</italic>, 91390).</comment></citation></ref>
<ref id="b90-ijms-10-02558"><label>90.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moriguchi</surname><given-names>I</given-names></name><name><surname>Hirono</surname><given-names>S</given-names></name><name><surname>Liu</surname><given-names>Q</given-names></name><name><surname>Nakagome</surname><given-names>I</given-names></name><name><surname>Matsushita</surname><given-names>Y</given-names></name></person-group><article-title>Simple method of calculating octanol/water partition coefficient</article-title><source>Chem. Pharm. Bull</source><year>1992</year><volume>40</volume><fpage>127</fpage><lpage>130</lpage><pub-id pub-id-type="doi">10.1248/cpb.40.127</pub-id></citation></ref>
<ref id="b91-ijms-10-02558"><label>91.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yan</surname><given-names>A</given-names></name><name><surname>Gasteiger</surname><given-names>J</given-names></name></person-group><article-title>Prediction of aqueous solubility of organic compounds based on a 3D structure representation</article-title><source>J. Chem. Inf. Comput. Sci</source><year>2003</year><volume>43</volume><fpage>429</fpage><lpage>434</lpage><pub-id pub-id-type="doi">10.1021/ci025590u</pub-id></citation></ref>
<ref id="b92-ijms-10-02558"><label>92.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hou</surname><given-names>TJ</given-names></name><name><surname>Xia</surname><given-names>K</given-names></name><name><surname>Zhang</surname><given-names>W</given-names></name><name><surname>Xu</surname><given-names>XJ</given-names></name></person-group><article-title>ADME evaluation in drug discovery. 4. Prediction of aqueous solubility based on atom contribution approach</article-title><source>J. Chem. Inf. Comput. Sci</source><year>2004</year><volume>44</volume><fpage>266</fpage><lpage>275</lpage><pub-id pub-id-type="doi">10.1021/ci034184n</pub-id></citation></ref>
<ref id="b93-ijms-10-02558"><label>93.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huuskonen</surname><given-names>J</given-names></name></person-group><article-title>Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology</article-title><source>J. Chem. Inf. Comput. Sci</source><year>2000</year><volume>40</volume><fpage>773</fpage><lpage>777</lpage><pub-id pub-id-type="doi">10.1021/ci9901338</pub-id></citation></ref></ref-list>
<sec sec-type="display-objects">
<title>Figures and Tables</title>
<fig id="f1-ijms-10-02558" position="float">
<label>Figure 1.</label>
<caption>
<p>Balanced data set of molecular structures under analysis. Training set: compounds 1–97; test set: compounds 98–145.</p></caption>
<graphic xlink:href="ijms-10-02558f1a.gif"/>
<graphic xlink:href="ijms-10-02558f1b.gif"/>
<graphic xlink:href="ijms-10-02558f1c.gif"/>
<graphic xlink:href="ijms-10-02558f1d.gif"/>
<graphic xlink:href="ijms-10-02558f1e.gif"/></fig>
<fig id="f2-ijms-10-02558" position="float">
<label>Figure 2.</label>
<caption>
<p>Normal distribution of the experimental log<sub>10</sub><italic>Sol</italic> values under analysis (<italic>N</italic> = 166).</p></caption>
<graphic xlink:href="ijms-10-02558f2.gif"/></fig>
<table-wrap id="t1-ijms-10-02558" position="float">
<label>Table 1.</label>
<caption>
<p>Methods for predicting aqueous solubilities.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="middle" align="center"><bold>Description</bold></th>
<th valign="middle" align="center"><bold>Requirements</bold></th>
<th valign="middle" align="center"><bold>Speed</bold></th></tr></thead>
<tbody>
<tr>
<td valign="middle" align="left">Methods based on other experimental physico-chemical properties</td>
<td valign="middle" align="left">log <italic>P</italic>, MP, etc.</td>
<td valign="middle" align="left">Tens to hundreds of compounds per day</td></tr>
<tr>
<td valign="middle" align="left">Methods using 3D parameters depending on molecular stereochemistry</td>
<td valign="middle" align="left">Optimized 3D structure, Monte Carlo, quantum chemical calculations</td>
<td valign="middle" align="left">Tens to tens of thousands of compounds per day</td></tr>
<tr>
<td valign="middle" align="left">Fragmental and atom-type based methods using 1D or 2D parameters</td>
<td valign="middle" align="left">Molecule as a SMILES string or 2D graph</td>
<td valign="middle" align="left">Millions of compounds per day</td></tr></tbody></table></table-wrap>
<table-wrap id="t2-ijms-10-02558" position="float">
<label>Table 2.</label>
<caption>
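<p>Performance of different linear methods applied to the same 21 test-set compounds. rms is the root-mean-square error of the predictions; N/d is the ratio of the number of test compounds (N = 21) to the number of model parameters (d).</p></caption>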
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="middle" align="center"><bold>Lead author</bold></th>
<th valign="middle" align="center"><bold>Method</bold></th>
<th valign="middle" align="center"><bold>Type of descriptors</bold></th>
<th valign="middle" align="center"><bold>Number of parameters</bold></th>
<th valign="middle" align="center"><bold>rms</bold></th>
<th valign="middle" align="center"><bold>N/d</bold></th>
<th valign="middle" align="center"><bold>Reference</bold></th>
<th valign="middle" align="center"><bold>Year</bold></th></tr></thead>
<tbody>
<tr>
<td valign="middle" align="center">Klopman</td>
<td valign="middle" align="center">GCM</td>
<td valign="middle" align="center">2D Substructures</td>
<td valign="middle" align="center">34</td>
<td valign="middle" align="center">1.213</td>
<td valign="middle" align="center">0.62</td>
<td valign="middle" align="center">[<xref ref-type="bibr" rid="b14-ijms-10-02558">14</xref>]</td>
<td valign="middle" align="center">1992</td></tr>
<tr>
<td valign="middle" align="center">Yan</td>
<td valign="middle" align="center">MLR</td>
<td valign="middle" align="center">3D Descriptors</td>
<td valign="middle" align="center">40</td>
<td valign="middle" align="center">1.286</td>
<td valign="middle" align="center">0.53</td>
<td valign="middle" align="center">[<xref ref-type="bibr" rid="b91-ijms-10-02558">91</xref>]</td>
<td valign="middle" align="center">2003</td></tr>
<tr>
<td valign="middle" align="center">Hou</td>
<td valign="middle" align="center">GCM</td>
<td valign="middle" align="center">Atomic</td>
<td valign="middle" align="center">78</td>
<td valign="middle" align="center">0.664</td>
<td valign="middle" align="center">0.27</td>
<td valign="middle" align="center">[<xref ref-type="bibr" rid="b92-ijms-10-02558">92</xref>]</td>
<td valign="middle" align="center">2004</td></tr>
<tr>
<td valign="middle" align="center">Huuskonen</td>
<td valign="middle" align="center">MLR</td>
<td valign="middle" align="center">Topological</td>
<td valign="middle" align="center">30</td>
<td valign="middle" align="center">0.810</td>
<td valign="middle" align="center">0.70</td>
<td valign="middle" align="center">[<xref ref-type="bibr" rid="b93-ijms-10-02558">93</xref>]</td>
<td valign="middle" align="center">2000</td></tr>
<tr>
<td valign="middle" align="center">Duchowicz</td>
<td valign="middle" align="center">MLR</td>
<td valign="middle" align="center">Dragon</td>
<td valign="middle" align="center">3</td>
<td valign="middle" align="center">1.202</td>
<td valign="middle" align="center">7.00</td>
<td valign="middle" align="center">this study</td>
<td valign="middle" align="center">2008</td></tr></tbody></table></table-wrap></sec></back></article>
