<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">ijms</journal-id>
<journal-title>International Journal of Molecular Sciences</journal-title>
<abbrev-journal-title>Int. J. Mol. Sci.</abbrev-journal-title>
<issn pub-type="epub">1422-0067</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/ijms131115387</article-id>
<article-id pub-id-type="publisher-id">ijms-13-15387</article-id>
<article-categories>
<subj-group>
<subject>Article</subject></subj-group></article-categories>
<title-group>
<article-title>Predicting Retention Times of Naturally Occurring Phenolic Compounds in Reversed-Phase Liquid Chromatography: A Quantitative Structure-Retention Relationship (QSRR) Approach</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Akbar</surname><given-names>Jamshed</given-names></name><xref ref-type="aff" rid="af1-ijms-13-15387">1</xref></contrib>
<contrib contrib-type="author">
<name><surname>Iqbal</surname><given-names>Shahid</given-names></name><xref ref-type="aff" rid="af1-ijms-13-15387">1</xref></contrib>
<contrib contrib-type="author">
<name><surname>Batool</surname><given-names>Fozia</given-names></name><xref ref-type="aff" rid="af1-ijms-13-15387">1</xref></contrib>
<contrib contrib-type="author">
<name><surname>Karim</surname><given-names>Abdul</given-names></name><xref ref-type="aff" rid="af1-ijms-13-15387">1</xref></contrib>
<contrib contrib-type="author">
<name><surname>Chan</surname><given-names>Kim Wei</given-names></name><xref ref-type="aff" rid="af2-ijms-13-15387">2</xref><xref ref-type="corresp" rid="c1-ijms-13-15387">*</xref></contrib></contrib-group>
<aff id="af1-ijms-13-15387">
<label>1</label>Department of Chemistry, University of Sargodha, Sargodha 40100, Pakistan; E-Mails: <email>jamshed.chemist@gmail.com</email> (J.A.); <email>ranashahid313@gmail.com</email> (S.I.); <email>foziaanalytical@yahoo.com</email> (F.B.); <email>drugrelease@yahoo.com</email> (A.K.)</aff>
<aff id="af2-ijms-13-15387">
<label>2</label>Laboratory of Molecular Biomedicine, Institute of Bioscience, Universiti Putra Malaysia, Serdang 43400, Malaysia</aff>
<author-notes>
<corresp id="c1-ijms-13-15387">
<label>*</label>Author to whom correspondence should be addressed; E-Mail: <email>chankw@ibs.upm.edu.my</email>; Tel.: +603-8947-2115; Fax: +603-8947-2116.</corresp></author-notes>
<pub-date pub-type="collection">
<year>2012</year></pub-date>
<pub-date pub-type="epub">
<day>20</day>
<month>11</month>
<year>2012</year></pub-date>
<volume>13</volume>
<issue>11</issue>
<fpage>15387</fpage>
<lpage>15400</lpage>
<history>
<date date-type="received">
<day>25</day>
<month>09</month>
<year>2012</year></date>
<date date-type="rev-recd">
<day>24</day>
<month>10</month>
<year>2012</year></date>
<date date-type="accepted">
<day>29</day>
<month>10</month>
<year>2012</year></date></history>
<permissions>
<copyright-statement>© 2012 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland.</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0">
<p>This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>Quantitative structure-retention relationships (QSRRs) have been developed for naturally occurring phenolic compounds in a reversed-phase liquid chromatographic (RPLC) system. A total of 1519 descriptors were calculated from the optimized structures of the molecules using the MOPAC2009 and DRAGON software packages. The data set of 39 molecules was divided into training and external validation sets. For feature selection and mapping we used step-wise multiple linear regression (SMLR), unsupervised forward selection followed by step-wise multiple linear regression (UFS-SMLR) and artificial neural networks (ANN). Stable and robust models with significant predictive abilities in terms of validation statistics were obtained, and chance correlation was ruled out. The ANN models performed better than the other two approaches. The HNar, IDM, Mp, GATS2v, DISP and 3D-MoRSE (signals 22, 28 and 32) descriptors, based on van der Waals volume, electronegativity, mass and polarizability at the atomic level, were found to have significant effects on the retention times. The possible implications of these descriptors in RPLC are discussed. All the models were able to predict the retention times of phenolic compounds and performed well in terms of validation, robustness, stability and predictive ability.</p></abstract>
<kwd-group>
<kwd>QSRR (quantitative structure-retention relationship)</kwd>
<kwd>naturally occurring phenolic compounds</kwd>
<kwd>artificial neural networks</kwd>
<kwd>unsupervised forward selection</kwd>
<kwd>reversed phase liquid chromatography</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<title>1. Introduction</title>
<p>Naturally occurring phenolic compounds are widespread among plants; they are synthesized through various metabolic pathways, and their concentrations vary over a wide range depending on the plant [<xref ref-type="bibr" rid="b1-ijms-13-15387">1</xref>–<xref ref-type="bibr" rid="b4-ijms-13-15387">4</xref>]. They have attracted considerable attention in recent years owing to their well-documented antioxidant, anti-aging, antimicrobial and immunomodulatory activities [<xref ref-type="bibr" rid="b5-ijms-13-15387">5</xref>,<xref ref-type="bibr" rid="b6-ijms-13-15387">6</xref>]. Phenolic compounds provide oxidative stability to foods and beverages, besides contributing health benefits [<xref ref-type="bibr" rid="b7-ijms-13-15387">7</xref>–<xref ref-type="bibr" rid="b9-ijms-13-15387">9</xref>]. The recent rise in interest in the determination of phenolic compounds is mainly due to their potential protective roles against a number of diseases associated with oxidative stress or initiated by free radicals, including coronary heart disease, stroke and cancer [<xref ref-type="bibr" rid="b10-ijms-13-15387">10</xref>,<xref ref-type="bibr" rid="b11-ijms-13-15387">11</xref>]. The beneficial attributes of phenolics therefore call for detailed study of their structures and their availability in different food items. For this purpose, both separation and identification of these compounds are necessary. Numerous analytical approaches have been described in the literature for the analysis of a variety of phenolics [<xref ref-type="bibr" rid="b12-ijms-13-15387">12</xref>–<xref ref-type="bibr" rid="b15-ijms-13-15387">15</xref>]. In this context, reversed-phase liquid chromatography-mass spectrometry is considered a state-of-the-art technique, as reversed-phase liquid chromatography (RPLC) provides good separation while mass spectrometry (MS) gives sensitive detection and confirms the structures of compounds [<xref ref-type="bibr" rid="b16-ijms-13-15387">16</xref>].</p>
<p>Quantitative structure-retention relationships (QSRRs) have recently gained wide attention in separation science. These models are based on the relationship between the structures and properties of compounds. Retention times of different compounds can be predicted from their formulae, and even unknown compounds can be identified by this method. In general, QSRR models attempt to predict the retention time of a molecule by characterizing it with a series of molecular descriptors. These models can be used to predict molecular structures, to determine the retention times of new analytes and to understand the separation mechanism of a chromatographic system [<xref ref-type="bibr" rid="b17-ijms-13-15387">17</xref>]. Several QSRRs have been developed to predict the retention times of different analytes on different systems [<xref ref-type="bibr" rid="b18-ijms-13-15387">18</xref>–<xref ref-type="bibr" rid="b24-ijms-13-15387">24</xref>]. Applications and implications of QSRR methodology in chromatography have recently been thoroughly reviewed [<xref ref-type="bibr" rid="b25-ijms-13-15387">25</xref>,<xref ref-type="bibr" rid="b26-ijms-13-15387">26</xref>]. No comprehensive report describing a QSRR study of phenolic compounds from natural sources has been presented so far. Naturally occurring phenolic compounds belong to varied classes and range from simple to complex structures, and therefore call for a compact statistical QSRR approach. The aim of this study is to develop statistically significant QSRR models, based on structural descriptors, for the prediction of retention times of naturally occurring phenolic compounds in RPLC. The approach consists of reducing a large descriptor pool to the most relevant descriptors with minimum multicollinearity and redundancy. Stepwise multiple linear regression (SMLR) and unsupervised forward selection followed by SMLR (UFS-SMLR) were used as supervised and unsupervised-supervised algorithms, respectively, to reduce the descriptor pool. The selected descriptors were then used to generate artificial neural network (ANN) models with enhanced statistical significance. The study has generated reasonably stable, robust and predictive models, which could provide an effective tool for predicting and analyzing the retention behavior of naturally occurring phenolic compounds in RPLC.</p></sec>
<sec sec-type="results|discussion">
<title>2. Results and Discussion</title>
<p>A total of 1519 descriptors were calculated from the optimized structures of the phenolics using the MOPAC2009 and DRAGON version 3 software packages (<xref ref-type="table" rid="t1-ijms-13-15387">Table 1</xref>). The descriptors were initially filtered by removing those with zero values, those with constant values for 50% of the compounds and those with a variance of less than 0.0005. This pretreatment left a total of 915 descriptors in the data, which were subsequently used for model generation.</p>
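<p>The pretreatment described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure actually run in this study: the function names, the toy table, the use of the population variance, and the reading of the 50% rule as "one value shared by at least half of the compounds" are all assumptions.</p>
<preformat preformat-type="code">
```python
from collections import Counter
from statistics import pvariance


def keep_descriptor(values, max_constant_fraction=0.5, min_variance=0.0005):
    """Return True if a descriptor column survives the three filters."""
    # Filter 1: drop columns that are zero for every compound.
    if all(v == 0 for v in values):
        return False
    # Filter 2 (assumed reading): drop columns in which a single value
    # is shared by at least half of the compounds.
    most_common_count = Counter(values).most_common(1)[0][1]
    if most_common_count >= max_constant_fraction * len(values):
        return False
    # Filter 3: keep only columns whose variance reaches the threshold
    # (population variance is an assumption).
    return pvariance(values) >= min_variance


def filter_descriptors(table):
    """Keep the columns of a {descriptor name: values} table that pass."""
    return {name: col for name, col in table.items() if keep_descriptor(col)}


# Hypothetical toy table: descriptor name -> values for four compounds.
toy = {
    "all_zero": [0.0, 0.0, 0.0, 0.0],
    "half_constant": [1.0, 1.0, 2.0, 3.0],
    "tiny_variance": [1.00, 1.01, 1.02, 1.03],
    "informative": [0.2, 1.4, 2.9, 4.1],
}
print(sorted(filter_descriptors(toy)))  # prints ['informative']
```
</preformat>
<p>In the toy table only one column passes all three filters; applied to the real descriptor matrix, the same logic would be run once per descriptor column.</p>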
<p>For QSRR development, the data set of 39 phenolic compounds [<xref ref-type="bibr" rid="b27-ijms-13-15387">27</xref>] was randomly split into a training set of 30 molecules and an external validation set of nine molecules. For model generation, retention times (RT) were used as the response variable.</p>
<sec>
<title>2.1. Stepwise Multiple Linear Regression Model (SMLR Model)</title>
<p>The 915 descriptors that survived the initial filtering were used to construct models by the SMLR method with a sufficiently stringent criterion (<italic>F</italic> = 6 to enter, <italic>F</italic> = 3 to remove), in order to keep the number of descriptors in the model small and thus avoid multi-collinearity. The five-descriptor model based on the training set for predicting the retention times of phenolics is</p>
<disp-formula id="FD1">
<label>(1)</label>
<mml:math id="mm1" display="block">
<mml:semantics id="sm1">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mtext>RT</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mn>5.527</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.584</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>HNar</mml:mtext>
<mml:mo>-</mml:mo>
<mml:mn>3.462</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.348</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>GATS</mml:mtext>
<mml:mn>2</mml:mn>
<mml:mtext>v</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mn>4.161</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.320</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>DISPe</mml:mtext>
<mml:mo>-</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>1.386</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.305</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>Mor</mml:mtext>
<mml:mn>32</mml:mn>
<mml:mtext>e</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mn>1.634</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.514</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>Ke</mml:mtext>
<mml:mo>-</mml:mo>
<mml:mn>4.451</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>1.012</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>30</mml:mn>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>=</mml:mo>
<mml:mn>0.962</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext>PRESS</mml:mtext></mml:mrow>
<mml:mrow>
<mml:mtext>int</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.062</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>Q</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mrow>
<mml:mtext>int</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.941</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext>PRESS</mml:mtext></mml:mrow>
<mml:mrow>
<mml:mtext>ext</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1.929</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>Q</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mrow>
<mml:mtext>ext</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.760</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:semantics></mml:math></disp-formula>
<p><xref rid="FD1" ref-type="disp-formula">Equation 1</xref> showed good stability, as indicated by the internal and external validation coefficients of determination. All five descriptors exhibited very weak or negligible correlations with one another (<xref ref-type="table" rid="t2-ijms-13-15387">Table 2</xref>). Of all the descriptors, Ke, which entered in step five of SMLR, showed somewhat stronger, though still not significant, correlations with the others. Dropping it from the equation therefore gave another equation with fewer descriptors and still of good statistical quality (<xref rid="FD2" ref-type="disp-formula">Equation 2</xref>).</p>
<p>Dropping the step-four descriptor Mor32e as well also resulted in a good model, but it was comparatively poor in terms of external validation (<xref rid="FD3" ref-type="disp-formula">Equation 3</xref>). The four-descriptor model (<xref rid="FD2" ref-type="disp-formula">Equation 2</xref>) was therefore selected as the optimal model. The relative significance of the descriptors in this model was ascertained by test statistics in Minitab 15. The corresponding <italic>T</italic>- and <italic>p</italic>-values for the individual terms in <xref rid="FD2" ref-type="disp-formula">Equation 2</xref> are: HNar, <italic>T</italic> = 11.19, <italic>p</italic> &lt; 0.001; GATS2v, <italic>T</italic> = −8.31, <italic>p</italic> &lt; 0.001; DISPe, <italic>T</italic> = 11.07, <italic>p</italic> &lt; 0.001; Mor32e, <italic>T</italic> = −3.71, <italic>p</italic> = 0.001. The low <italic>p</italic>-values indicate that these terms are significant in predicting retention times.</p>
<disp-formula id="FD2">
<label>(2)</label>
<mml:math id="mm2" display="block">
<mml:semantics id="sm2">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mtext>RT</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mn>0.771</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.581</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>HNar</mml:mtext>
<mml:mo>-</mml:mo>
<mml:mn>0.326</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.348</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>GATS</mml:mtext>
<mml:mn>2</mml:mn>
<mml:mtext>v</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mn>0.467</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.374</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>DISPe</mml:mtext>
<mml:mo>-</mml:mo>
<mml:mn>0.115</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.356</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>Mor</mml:mtext>
<mml:mn>32</mml:mn>
<mml:mtext>e</mml:mtext>
<mml:mo>-</mml:mo>
<mml:mn>0.686</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>1.092</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>30</mml:mn>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>=</mml:mo>
<mml:mn>0.946</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext>PRESS</mml:mtext></mml:mrow>
<mml:mrow>
<mml:mtext>int</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.080</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>Q</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mrow>
<mml:mtext>int</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.924</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext>PRESS</mml:mtext></mml:mrow>
<mml:mrow>
<mml:mtext>ext</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1.847</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>Q</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mrow>
<mml:mtext>ext</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.770</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD3">
<label>(3)</label>
<mml:math id="mm3" display="block">
<mml:semantics id="sm3">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mtext>RT</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mn>7.574</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.615</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>HNar</mml:mtext>
<mml:mo>-</mml:mo>
<mml:mn>2.242</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.367</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>GATS</mml:mtext>
<mml:mn>2</mml:mn>
<mml:mtext>v</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mn>3.792</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.442</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>DISPe</mml:mtext>
<mml:mo>-</mml:mo>
<mml:mn>0.740</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>1.149</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>30</mml:mn>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>=</mml:mo>
<mml:mn>0.917</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext>PRESS</mml:mtext></mml:mrow>
<mml:mrow>
<mml:mtext>int</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.116</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>Q</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mrow>
<mml:mtext>int</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.890</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext>PRESS</mml:mtext></mml:mrow>
<mml:mrow>
<mml:mtext>ext</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>2.683</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>Q</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mrow>
<mml:mtext>ext</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.666</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:semantics></mml:math></disp-formula>
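<p>As a reading aid, the <italic>F</italic>-to-enter and <italic>F</italic>-to-remove thresholds of the stepwise procedure can be expressed through the standard partial <italic>F</italic> statistic. The notation is introduced here for illustration and is an assumption about the form used by the statistical package: RSS<sub><italic>k</italic></sub> is the residual sum of squares of a model with <italic>k</italic> descriptors and an intercept, and <italic>n</italic> is the number of training compounds. For a candidate descriptor <italic>j</italic> added to a model that already contains <italic>k</italic> descriptors,</p>
<disp-formula>
<tex-math>F_j = \frac{\mathrm{RSS}_{k} - \mathrm{RSS}_{k+1}}{\mathrm{RSS}_{k+1} / (n - k - 2)}</tex-math></disp-formula>
<p>Under this convention a candidate enters when its <italic>F</italic> exceeds 6, and a descriptor already in the model is removed when its <italic>F</italic> falls below 3. Because the partial <italic>F</italic> of a single term in the current model equals the square of its <italic>T</italic> statistic, the Mor32e term of <xref rid="FD2" ref-type="disp-formula">Equation 2</xref> (<italic>T</italic> = −3.71) corresponds to <italic>F</italic> ≈ 13.8, well above the removal threshold.</p>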
<p>The <italic>y</italic>-scrambling result was also encouraging for <xref rid="FD2" ref-type="disp-formula">Equation 2</xref> (<xref ref-type="fig" rid="f1-ijms-13-15387">Figure 1</xref>): most of the scrambled models have statistical parameters clustered symmetrically around zero, indicating that the scrambled models are of very low quality. The intercept of the plot of the <italic>R</italic><sup>2</sup> values of the scrambled models against the correlation between the observed and permuted responses was very low (0.141). This establishes the stability of the model and rules out chance correlation.</p>
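<p>The idea behind <italic>y</italic>-scrambling can be illustrated with a minimal Python sketch. It is not the implementation used in this study: for brevity it refits a one-descriptor linear model (whose <italic>R</italic><sup>2</sup> is the squared Pearson correlation) instead of the four-descriptor model, and the function names, the number of permutations, the seed and the toy data are all hypothetical.</p>
<preformat preformat-type="code">
```python
import random


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def y_scramble(x, y, n_rounds=200, seed=0):
    """Return (|r(y, y_perm)|, R2 of the refitted model) per permutation."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)                  # break the x-y relationship
        r2_perm = pearson_r(x, y_perm) ** 2  # R2 of a one-descriptor fit
        points.append((abs(pearson_r(y, y_perm)), r2_perm))
    return points


def intercept(points):
    """Least-squares intercept of R2 regressed on |r(y, y_perm)|."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in points)
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return mean_y - (sxy / sxx) * mean_x


# Hypothetical toy data: one descriptor and a nearly linear response.
x_toy = [float(i) for i in range(1, 13)]
y_toy = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2, 18.0, 19.7, 22.1, 24.0]
scrambled = y_scramble(x_toy, y_toy)
print("R2 of the real model:", pearson_r(x_toy, y_toy) ** 2)
print("mean R2 of scrambled models:", sum(p[1] for p in scrambled) / len(scrambled))
print("intercept:", intercept(scrambled))
```
</preformat>
<p>A real model that is not the product of chance should show an <italic>R</italic><sup>2</sup> far above the scrambled ones, with a low intercept, which is the pattern reported above.</p>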
<p>Unsupervised Forward Selection-Stepwise Multiple Linear Regression Model (UFS-SMLR Model)</p>
<p>The 915 descriptors left after pretreatment were subjected to the UFS algorithm with <italic>R</italic><sup>2</sup><sub>max</sub> = 0.90, which reduced the data set to only 22 linearly independent descriptors with minimum multi-collinearity and redundancy (<xref ref-type="table" rid="t3-ijms-13-15387">Table 3</xref>).</p>
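<p>A compact Python sketch of the UFS idea, as it is commonly described, is given below. It is an assumption-laden illustration rather than the software used in this study: the selection starts from the least correlated pair of descriptors, repeatedly adds the descriptor with the smallest squared multiple correlation with those already selected, and discards any descriptor whose squared multiple correlation exceeds <italic>R</italic><sup>2</sup><sub>max</sub>. The function names and the toy columns are hypothetical, and constant columns are assumed to have been removed beforehand.</p>
<preformat preformat-type="code">
```python
def centre_and_scale(col):
    """Mean-centre a column and scale it to unit length."""
    mean = sum(col) / len(col)
    centred = [v - mean for v in col]
    norm = sum(v * v for v in centred) ** 0.5
    return [v / norm for v in centred]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residual_direction(col, basis):
    """Unit vector of the part of col not explained by the orthonormal basis."""
    resid = list(col)
    for q in basis:
        proj = dot(col, q)
        resid = [r - proj * qi for r, qi in zip(resid, q)]
    norm = dot(resid, resid) ** 0.5
    return [r / norm for r in resid]


def ufs(columns, r2_max=0.90):
    """Return descriptor names in the order in which they are selected."""
    unit = {name: centre_and_scale(col) for name, col in columns.items()}
    names = list(unit)
    # Seed with the pair having the smallest squared pairwise correlation.
    _, first, second = min(
        (dot(unit[a], unit[b]) ** 2, a, b)
        for i, a in enumerate(names) for b in names[i + 1:]
    )
    selected, basis = [], []
    for name in (first, second):
        basis.append(residual_direction(unit[name], basis))
        selected.append(name)
    remaining = [n for n in names if n not in selected]
    while remaining:
        # Squared multiple correlation of each candidate with the selected set.
        r2 = {n: sum(dot(unit[n], q) ** 2 for q in basis) for n in remaining}
        # Candidates explained too well by the selected set are rejected.
        remaining = [n for n in remaining if r2_max >= r2[n]]
        if not remaining:
            break
        best = min(remaining, key=lambda n: r2[n])
        basis.append(residual_direction(unit[best], basis))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy descriptors; "b" is almost collinear with "a".
toy_columns = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12.5],
    "c": [1, -1, 1, -1, 1, -1],
    "d": [3, 1, 4, 1, 5, 9],
}
print(ufs(toy_columns))
```
</preformat>
<p>Keeping an orthonormal basis of the selected columns means the squared multiple correlation of a candidate is simply the sum of its squared projections, so no regression has to be refitted at each step.</p>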
<p>The SMLR method applied to the UFS-selected descriptors produced a six-descriptor model (<xref rid="FD4" ref-type="disp-formula">Equation 4</xref>). This model is quite good in terms of all the applied statistical criteria, though less significant than the SMLR model, as indicated by the PRESS and coefficient of determination statistics.</p>
<disp-formula id="FD4">
<label>(4)</label>
<mml:math id="mm4" display="block">
<mml:semantics id="sm4">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mtext>RT</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mn>29.480</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>2.997</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>Mp</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mn>0.208</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.056</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>IDM</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mn>0.208</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.033</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>DISPm</mml:mtext>
<mml:mo>-</mml:mo>
<mml:mn>2.704</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>1.121</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>Mor</mml:mtext>
<mml:mn>22</mml:mn>
<mml:mtext>v</mml:mtext>
<mml:mo>-</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>1.650</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.445</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>Mor</mml:mtext>
<mml:mn>28</mml:mn>
<mml:mtext>e</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mn>4.819</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>1.610</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>HATS</mml:mtext>
<mml:mn>7</mml:mn>
<mml:mtext>p</mml:mtext>
<mml:mo>-</mml:mo>
<mml:mn>18.788</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>1.974</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>30</mml:mn>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>=</mml:mo>
<mml:mn>0.912</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext>PRESS</mml:mtext></mml:mrow>
<mml:mrow>
<mml:mtext>int</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.155</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>Q</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mrow>
<mml:mtext>int</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.852</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext>PRESS</mml:mtext></mml:mrow>
<mml:mrow>
<mml:mtext>ext</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>2.293</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>Q</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mrow>
<mml:mtext>ext</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.715</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:semantics></mml:math></disp-formula>
<p>A five-descriptor model (<xref rid="FD5" ref-type="disp-formula">Equation 5</xref>), obtained after removal of the step-six descriptor HATS7p, also showed good predictive ability. The <italic>T</italic>- and <italic>p</italic>-values for the individual terms in <xref rid="FD5" ref-type="disp-formula">Equation 5</xref> are: Mp, <italic>T</italic> = 9.30, <italic>p</italic> &lt; 0.001; IDM, <italic>T</italic> = 9.30, <italic>p</italic> &lt; 0.001; DISPm, <italic>T</italic> = 5.08, <italic>p</italic> = 0.003; Mor22v, <italic>T</italic> = −2.64, <italic>p</italic> = 0.014; Mor28e, <italic>T</italic> = −3.33, <italic>p</italic> = 0.003. The Mor22v descriptor has a slightly higher <italic>p</italic>-value; however, all terms appear to be significant in predicting retention times. The predictions made by <xref rid="FD2" ref-type="disp-formula">Equations 2</xref> and <xref rid="FD5" ref-type="disp-formula">5</xref> are given in <xref ref-type="fig" rid="f2-ijms-13-15387">Figure 2</xref>.</p>
<disp-formula id="FD5">
<label>(5)</label>
<mml:math id="mm5" display="block">
<mml:semantics id="sm5">
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mtext>RT</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mn>31.412</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>3.376</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>Mp</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mn>0.216</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.065</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>IDM</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mn>0.186</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.037</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>DISPm</mml:mtext>
<mml:mo>-</mml:mo>
<mml:mn>3.350</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>1.270</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>Mor</mml:mtext>
<mml:mn>22</mml:mn>
<mml:mtext>v</mml:mtext>
<mml:mo>-</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>1.707</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>0.513</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi> </mml:mi>
<mml:mtext>Mor</mml:mtext>
<mml:mn>28</mml:mn>
<mml:mtext>e</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mn>19.576</mml:mn>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>±</mml:mo>
<mml:mn>2.258</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>N</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>30</mml:mn>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>R</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>=</mml:mo>
<mml:mn>0.877</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext>PRESS</mml:mtext></mml:mrow>
<mml:mrow>
<mml:mtext>int</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.189</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>Q</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mrow>
<mml:mtext>int</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.820</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext>PRESS</mml:mtext></mml:mrow>
<mml:mrow>
<mml:mtext>ext</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1.906</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>Q</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow>
<mml:mrow>
<mml:mtext>ext</mml:mtext></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0.763</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:semantics></mml:math></disp-formula>
<p>The <italic>y</italic>-scrambling result for the UFS-SMLR model was similar to that for the SMLR model, though of slightly lower quality, with an <italic>R</italic><sup>2</sup> value of 0.172.</p></sec>
<sec>
<title>2.2. Artificial Neural Network</title>
<p>The network architecture and validation statistics are given in <xref ref-type="table" rid="t4-ijms-13-15387">Table 4</xref>.</p>
<p>In this study, the whole data set was divided into three sets: training, test and validation sets. The test set is used for early stopping of training in order to avoid overfitting. Sometimes the test data alone may not provide evidence of good generalization by an ANN; for example, good performance on it can be just a coincidence. To make sure that this is not the case, a separate validation set was used. This puts an extra check on the performance and generality of the ANN. For clarity, the training, test and validation sets are marked in <xref ref-type="table" rid="t5-ijms-13-15387">Table 5</xref>. The ANN models are better than both the SMLR and UFS-SMLR models. Although the SMLR model is comparable to the SMLR-ANN model, the real strength of the artificial neural network mapping technique was observed for the UFS-SMLR-ANN model, which showed considerably better prediction ability than the simple UFS-SMLR model, as shown by <italic>Q</italic><sup>2</sup><sub>ext</sub>. For the ANN models, a global sensitivity analysis was performed, which ranked the descriptors of the SMLR-ANN model as HNar &gt; DISPe &gt; GATS2v &gt; Mor32e and those of the UFS-SMLR-ANN model as Mp &gt; DISPm &gt; Mor22v &gt; IDM &gt; Mor28e. The predictions of SMLR-ANN and UFS-SMLR-ANN are presented in <xref ref-type="fig" rid="f3-ijms-13-15387">Figure 3</xref>.</p></sec>
<sec sec-type="discussion">
<title>2.3. Interpretation of the Models</title>
<p>In the selected SMLR model (<xref rid="FD2" ref-type="disp-formula">Equation 2</xref>), HNar and GATS2v are 2D descriptors derived from the molecular graph. HNar is the Narumi harmonic topological index, which is related to molecular branching and represents the number of non-hydrogen atoms divided by the sum of the reciprocal vertex degrees [<xref ref-type="bibr" rid="b28-ijms-13-15387">28</xref>]. Its positive coefficient suggests that an increase in HNar leads to an increase in RT. GATS2v is the Geary autocorrelation of lag 2 weighted by atomic van der Waals volumes. Autocorrelation descriptors show the distribution of a certain property in the topological structure [<xref ref-type="bibr" rid="b29-ijms-13-15387">29</xref>]. The GATS2v descriptor shows the distribution of atomic volume at a distance of two bonds in the topological structure of the molecule. The negative coefficient of GATS2v indicates a decrease in RT with an increase in the lag-2 autocorrelation of atomic volumes on the molecular graph. The descriptors DISPe and Mor32e are derived from the three-dimensional structures of the molecules. DISPe is the d COMMA2 value weighted by atomic Sanderson electronegativities, and it represents the displacement between the geometric and the electronegativity centers of the molecule [<xref ref-type="bibr" rid="b30-ijms-13-15387">30</xref>]. The positive coefficient of DISPe indicates that molecules with a larger displacement between the geometric and the electronegativity centers take more time to elute. Mor32e is the 3D-MoRSE signal 32, weighted by atomic Sanderson electronegativities. The 3D-MoRSE signals give a three-dimensional molecular representation of structure based on electron diffraction and contain information on mass distribution and branching within a molecule [<xref ref-type="bibr" rid="b31-ijms-13-15387">31</xref>]. The negative coefficient for Mor32e suggests an inverse relation with RT. It follows, therefore, that molecules with more branching, lower lag-2 autocorrelation of atomic volumes, a larger displacement between the geometric and the electronegativity centers and a low value of the Mor32e descriptor will have longer retention times in RPLC.</p>
<p>In the selected UFS-SMLR model (<xref rid="FD5" ref-type="disp-formula">Equation 5</xref>), Mp is a constitutional descriptor while IDM is a 2D topological descriptor. Mp is the mean atomic polarizability scaled on the carbon atom, and IDM is the mean information content on the distance magnitude. DISPm and the 3D-MoRSE signals are 3D descriptors. DISPm is the d COMMA2 value weighted by atomic masses; Mor22v and Mor28e are the 3D-MoRSE signals 22 and 28, weighted by atomic van der Waals volumes and atomic Sanderson electronegativities, respectively. The UFS-SMLR model also emphasized the importance of a topological descriptor (IDM) and of 3D descriptors based on atomic volume and atomic electronegativity for the retention behavior of phenolic compounds, as was observed for the SMLR-selected descriptors. Despite the different weighting schemes, the behavior of the three-dimensional descriptors was similar in both approaches: 3D-MoRSE descriptors related negatively and the 3D geometrical DISP descriptors related positively to the retention times in both types of models. This indicates that the 3D descriptors have similar effects in both developed QSRRs. The positive coefficient of Mp indicates that retention time increases with increasing mean atomic polarizability. In phenolics, oxygen is largely present either as a hydroxyl group (independent or as part of a carboxyl group) or as an ether linkage. Based on the relative nature of carbon, hydrogen and oxygen, a decrease in the number of hydroxyl groups is expected to increase the Mp value. This suggests that molecules with more hydroxyl groups have lower Mp values and, because of their greater number of polar hydroxyl groups, are eluted earlier by the polar mobile phase and hence have shorter retention times.
This behavior can be clearly observed for gallic acid, gentisic acid and salicylic acid (<xref ref-type="table" rid="t5-ijms-13-15387">Table 5</xref>, <xref ref-type="supplementary-material" rid="s1-ijms-13-15387">Table S1 supplementary data</xref>), which contain four, three and two hydroxyl groups and have Mp values of 0.64, 0.65 and 0.67, respectively. The descriptor IDM also relates directly to RT, suggesting an increase in RT with an increase in its value. This descriptor provides the mean information content on the distance magnitude and is expected to increase with the number of atoms in a molecule. Another descriptor, DISPm, is indicative of the conformational features of molecules. It is generally suggested that rigid molecules have low values of DISPm [<xref ref-type="bibr" rid="b29-ijms-13-15387">29</xref>]. This descriptor relates directly to RT, which suggests that rigid molecules will have shorter retention times. The foregoing discussion shows that, in general, molecules with more hydroxyl groups, fewer atoms, greater rigidity and high values of the 3D-MoRSE descriptors are eluted faster than others. Mathematical details of the molecular descriptors are available in the Handbook of Molecular Descriptors [<xref ref-type="bibr" rid="b32-ijms-13-15387">32</xref>].</p>
<p>Quantum mechanical descriptors made no contribution to the models. <xref rid="FD2" ref-type="disp-formula">Equations 2</xref> and <xref rid="FD5" ref-type="disp-formula">5</xref> and the optimal artificial neural networks (<xref ref-type="table" rid="t4-ijms-13-15387">Table 4</xref>) were used to predict the retention times of naturally occurring phenolic compounds. The predicted results are presented in <xref ref-type="table" rid="t5-ijms-13-15387">Table 5</xref> and <xref ref-type="fig" rid="f2-ijms-13-15387">Figures 2</xref> and <xref ref-type="fig" rid="f3-ijms-13-15387">3</xref>, and the residual plot for the developed models is presented in <xref ref-type="fig" rid="f4-ijms-13-15387">Figure 4</xref>.</p></sec></sec>
<sec>
<title>3. Experimental Section</title>
<sec sec-type="methods">
<title>3.1. Data for Retention Times of Phenolic Compounds</title>
<p>The data used to generate the structure-retention relationships of phenolic compounds were obtained from a recently developed method for their analysis on an RPLC-MS system [<xref ref-type="bibr" rid="b27-ijms-13-15387">27</xref>]. Briefly, the compounds were separated by gradient elution, using a reversed-phase C<sub>18</sub> analytical column (50 × 2 mm, 2.5 μm particle size; Phenomenex Synergi Fusion-RP100A) with a C<sub>18</sub> guard column (4 × 2 mm; Phenomenex Fusion-RP) maintained at 35 °C. The mobile phase consisted of deionized water (A) and acetonitrile (B), each containing 0.1% (v/v) formic acid, run as a linear gradient from 1% to 100% B over 9.5 min.</p></sec>
<sec>
<title>3.2. Descriptor Computation</title>
<p>Three-dimensional structures of the phenolic compounds, created using Chemsketch, were optimized with the semi-empirical PM6 Hamiltonian and the eigenvector following (EF) algorithm implemented in the MOPAC2009 software [<xref ref-type="bibr" rid="b33-ijms-13-15387">33</xref>]. Numerical descriptors were calculated from the optimized geometries using MOPAC2009 and DRAGON, version 3 [<xref ref-type="bibr" rid="b34-ijms-13-15387">34</xref>]. The total number of calculated descriptors was 1519. The molecular weight (MW) descriptor was computed by both programs; therefore, only the MW from MOPAC2009 was used in this study. DRAGON was used to compute 1497 descriptors divided into 18 logical blocks, and 23 descriptors were obtained from MOPAC2009 (<xref ref-type="table" rid="t1-ijms-13-15387">Table 1</xref>).</p></sec>
<sec>
<title>3.3. Feature Selection and Model Generation</title>
<p>Step-wise multiple linear regression (SMLR) and unsupervised forward selection followed by step-wise multiple linear regression (UFS-SMLR) were used for feature selection. UFS is a technique for removing redundant and multi-collinear descriptors from the data set [<xref ref-type="bibr" rid="b35-ijms-13-15387">35</xref>]. UFS was performed with ufs-1.8, obtained from the Centre for Molecular Design (CMD), University of Portsmouth, using <italic>R</italic><sup>2</sup><sub>max</sub> = 0.9. The subset of descriptors produced by UFS was then used to develop the model by the SMLR method. Before applying the regression method, all the data were standardized to zero mean and unit variance to avoid bias arising from the different scales of the calculated descriptors, which may lead to serious errors in the generation and application of the models. The standardized data were subjected to the SMLR method for model generation.</p>
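<p>The standardization and redundancy-removal steps described above can be sketched as follows. This is a minimal illustration and not the ufs-1.8 procedure: it assumes the sample standard deviation for scaling and replaces the multiple-correlation criterion of UFS with a simpler pairwise squared-correlation filter at the same threshold of 0.9. The function names and the descriptor values are hypothetical.</p>
<preformat>
```python
from statistics import mean, stdev


def standardize(column):
    """Scale one descriptor column to zero mean and unit (sample) variance."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]


def squared_correlation(a, b):
    """Squared Pearson correlation between two equally long columns."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab * sab / (saa * sbb)


def drop_redundant(columns, r2_max=0.9):
    """Keep a column only if its pairwise r2 with every kept column is at most r2_max.

    Simplified stand-in for UFS, which uses multiple (not pairwise) correlation.
    """
    kept = {}
    for name, values in columns.items():
        if all(r2_max >= squared_correlation(values, other) for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical descriptor table (values invented for illustration only).
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly collinear with d1
    "d3": [0.3, 0.9, 0.1, 0.7, 0.4],
}

scaled = {name: standardize(values) for name, values in descriptors.items()}
selected = drop_redundant(scaled, r2_max=0.9)
print(sorted(selected))
```
</preformat>
<p>In this sketch the second column is discarded because it is almost a linear function of the first, which is the kind of redundancy the filtering step is meant to remove before regression.</p>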
<p>ANN is a powerful multivariate data analysis technique, capable of both linear and non-linear modeling, and has been widely used in modeling structure-property relationships [<xref ref-type="bibr" rid="b22-ijms-13-15387">22</xref>,<xref ref-type="bibr" rid="b36-ijms-13-15387">36</xref>,<xref ref-type="bibr" rid="b37-ijms-13-15387">37</xref>]. An ANN is a mathematical model that mimics the human brain and consists of interconnected neurons organized sequentially into an input layer, one or more hidden layers and an output layer. Each interconnection between neurons has a numerical value (weight) associated with it. The signals are transmitted from the input layer to the output layer through the neurons. The network is first trained on some data by adjusting the interconnection weights and is subsequently used to make predictions for external data. In the present study, the optimal descriptors selected by the SMLR and UFS-SMLR techniques were entered as continuous input signals into the ANNs, and the output was the response variable RT. In each case, 500 ANNs were trained using the automated artificial neural network implementation in Statistica 8.0. Multilayer perceptron (MLP) networks with feed-forward topology were trained with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm and normal randomization, and the sum-of-squares error function was used to assess their performance. Identity, logistic, exponential and tanh activation functions were used for both the hidden and output layers, with the number of hidden units ranging from 3 to 8. The models exhibiting the lowest external validation errors were selected as optimal models. During ANN building, an early stopping technique was employed to avoid over-training of the ANN models. For this purpose, the training set was further sub-divided randomly into a subset of 25 molecules for training the ANNs and a subset of five molecules used as a test set to avoid over-fitting.
In the development of both the SMLR descriptor-based ANN (SMLR-ANN) and the UFS-SMLR descriptor-based ANN (UFS-SMLR-ANN), the same subsets of the training set were used. Further, the same external validation set of nine molecules was used for external validation of all the models.</p></sec>
<sec>
<title>3.4. Model Validation</title>
<p>Model validation is required to assess the applicability of the generated models, and several techniques are in use in chemometrics [<xref ref-type="bibr" rid="b38-ijms-13-15387">38</xref>–<xref ref-type="bibr" rid="b41-ijms-13-15387">41</xref>]. In the present study, the models were validated both internally and externally, and chance correlation was tested with the <italic>y</italic>-scrambling technique, a method frequently used for this purpose. Internal validation was performed by leave-one-out cross-validation and external validation by applying the model to an external validation set of nine molecules. The statistical quality of the model was judged by the sum of squares of prediction errors (PRESS; <xref rid="FD6" ref-type="disp-formula">Equation 6</xref>) and the validation correlation coefficients <italic>Q</italic><sup>2</sup><sub>int</sub> and <italic>Q</italic><sup>2</sup><sub>ext</sub> for internal and external validation, respectively (<xref rid="FD7" ref-type="disp-formula">Equation 7</xref>).</p>
<disp-formula id="FD6">
<label>(6)</label>
<mml:math id="mm6" display="block">
<mml:semantics id="sm6">
<mml:mrow>
<mml:mtext>PRESS</mml:mtext>
<mml:mo>=</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>n</mml:mi></mml:munderover>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>^</mml:mo></mml:mover></mml:mrow>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>-</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi></mml:mrow>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD7">
<label>(7)</label>
<mml:math id="mm7" display="block">
<mml:semantics id="sm7">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>Q</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>-</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>n</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>^</mml:mo></mml:mover></mml:mrow>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>-</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi></mml:mrow>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn></mml:mrow>
<mml:mi>n</mml:mi></mml:msubsup>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>y</mml:mi></mml:mrow>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>-</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>¯</mml:mo></mml:mover></mml:mrow>
<mml:mrow>
<mml:mtext>train</mml:mtext></mml:mrow></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow>
<mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:semantics></mml:math></disp-formula>
<p>where <italic>ŷ</italic><italic><sub>i</sub></italic> is the predicted value, <italic>y</italic><italic><sub>i</sub></italic> is the observed value for the <italic>i</italic>th case in the training or validation set, as the case may be, and <italic>ȳ</italic><sub>train</sub> is the mean of the training set. In the above expressions, the mean of the training set was used so that the internal and external validation statistics share the same reference. However, using the mean of the validation set made almost no difference in the present study. For example, for the SMLR model, <italic>Q</italic><sup>2</sup><sub>ext</sub> was 0.769 using the training set mean and 0.770 using the validation set mean. <italic>y</italic>-Scrambling was performed 500 times for the models in order to establish the stability of the models and to rule out chance correlation. The statistical quality parameters of the scrambled models were compared with those of the original models. The performance of the selected ANN models was judged by the <italic>Q</italic><sup>2</sup><sub>ext</sub> statistic. All the statistical calculations were performed using Statistica 8.0 and MS Excel<sup>®</sup> 2007.</p></sec></sec>
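<p>The validation statistics of Equations 6 and 7, together with leave-one-out prediction and <italic>y</italic>-scrambling, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Statistica or Excel workflow used in the study: a one-descriptor least-squares line stands in for the multi-descriptor models, the toy data are invented, and all function names are hypothetical.</p>
<preformat>
```python
import random
from statistics import mean


def press(y_obs, y_pred):
    """Sum of squared prediction errors (Equation 6)."""
    return sum((p - o) ** 2 for o, p in zip(y_obs, y_pred))


def q2(y_obs, y_pred, y_train_mean):
    """Validation coefficient (Equation 7), referenced to the training-set mean."""
    total = sum((o - y_train_mean) ** 2 for o in y_obs)
    return 1.0 - press(y_obs, y_pred) / total


def fit_line(x, y):
    """Ordinary least squares for one descriptor; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_predictions(x, y):
    """Leave-one-out: predict each case from a model fitted without it."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


def y_scramble_q2(x, y, n_runs=500, seed=0):
    """Internal Q2 values of models refitted on randomly permuted responses."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        y_perm = y[:]
        rng.shuffle(y_perm)
        results.append(q2(y_perm, loo_predictions(x, y_perm), mean(y_perm)))
    return results


# Hypothetical toy data (not from the study): one descriptor, eight compounds.
x_toy = [0.2, 0.5, 0.9, 1.1, 1.6, 2.0, 2.4, 3.0]
y_toy = [1.1, 1.4, 2.1, 2.2, 3.0, 3.3, 3.9, 4.6]

q2_int = q2(y_toy, loo_predictions(x_toy, y_toy), mean(y_toy))
scrambled = y_scramble_q2(x_toy, y_toy, n_runs=200)
print(f"Q2_int = {q2_int:.3f}, mean scrambled Q2 = {mean(scrambled):.3f}")
```
</preformat>
<p>A model that reflects a real structure-retention relationship should give a <italic>Q</italic><sup>2</sup> for the original responses that is clearly higher than the values obtained after scrambling.</p>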
<sec sec-type="conclusions">
<title>4. Conclusions</title>
<p>SMLR, UFS-SMLR and ANN-based QSRR models have been successfully developed for predicting the retention times of naturally occurring phenolic compounds in the RPLC system. The ANN models are more reliable in predicting the retention times of phenolics in RPLC than the other two approaches. The SMLR model is comparable to SMLR-ANN; however, the UFS-SMLR model was found to be less predictive than the others. The models identified Mp, IDM, HNar, DISP, GATS2v and 3D-MoRSE (signals 22, 28 and 32) as the descriptors responsible for the retention of phenolic compounds. These descriptors signify the importance of branching, size, hydroxyl groups and 3D geometric, electronegativity and mass distribution features within phenolics. The models were found to be predictive and robust.</p></sec>
<sec sec-type="supplementary-material">
<title>Supplementary Materials</title>
<supplementary-material id="s1-ijms-13-15387" content-type="local-data">
<media xlink:href="ijms-13-15387-s001.pdf" mimetype="application" mime-subtype="pdf"/></supplementary-material></sec></body>
<back>
<ref-list>
<title>References</title>
<ref id="b1-ijms-13-15387"><label>1</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Breinholt</surname><given-names>V.</given-names></name></person-group><source><italic>Desirable</italic> versus <italic>Harmful Levels of Intake of Flavonoids and Phenolic Acids</italic></source><publisher-name>The Royal Society of Chemistry</publisher-name><publisher-loc>Cambridge, UK</publisher-loc><year>1999</year><fpage>93</fpage><lpage>105</lpage></citation></ref>
<ref id="b2-ijms-13-15387"><label>2</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Shahidi</surname><given-names>F.</given-names></name><name><surname>Naczk</surname><given-names>M.</given-names></name></person-group><source>Food Phenolics</source><publisher-name>Technomic Publishing</publisher-name><publisher-loc>Lancaster, PA, USA</publisher-loc><year>1995</year></citation></ref>
<ref id="b3-ijms-13-15387"><label>3</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Naczk</surname><given-names>M.</given-names></name><name><surname>Shahidi</surname><given-names>F.</given-names></name></person-group><article-title>Phenolics in cereals, fruits and vegetables: Occurrence, extraction and analysis</article-title><source>J. Pharmaceut. Biomed</source><year>2006</year><volume>41</volume><fpage>1523</fpage><lpage>1542</lpage><pub-id pub-id-type="doi">10.1016/j.jpba.2006.04.002</pub-id></citation></ref>
<ref id="b4-ijms-13-15387"><label>4</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Iqbal</surname><given-names>S.</given-names></name><name><surname>Younas</surname><given-names>U.</given-names></name><name><surname>Sirajuddin</surname></name><name><surname>Chan</surname><given-names>K.W.</given-names></name><name><surname>Sarfraz</surname><given-names>R.A.</given-names></name><name><surname>Uddin</surname><given-names>M.K.</given-names></name></person-group><article-title>Proximate composition and antioxidant potential of leaves from three varieties of Mulberry (<italic>Morus</italic> sp.): A comparative study</article-title><source>Int. J. Mol. Sci</source><year>2012</year><volume>13</volume><fpage>6651</fpage><lpage>6664</lpage><pub-id pub-id-type="doi">10.3390/ijms13066651</pub-id><pub-id pub-id-type="pmid">22837655</pub-id></citation></ref>
<ref id="b5-ijms-13-15387"><label>5</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Herrmann</surname><given-names>K.</given-names></name></person-group><article-title>Occurrence and content of hydroxycinnamic and hydroxybenzoic acid compounds in foods</article-title><source>Crit. Rev. Food Sci. Nutr</source><year>1989</year><volume>28</volume><fpage>315</fpage><lpage>347</lpage><pub-id pub-id-type="doi">10.1080/10408398909527504</pub-id><pub-id pub-id-type="pmid">2690858</pub-id></citation></ref>
<ref id="b6-ijms-13-15387"><label>6</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Middleton</surname><given-names>E.</given-names></name><name><surname>Kandaswami</surname><given-names>C.</given-names></name></person-group><source>The Impact of Plant Flavonoids on Mammalian Biology: Implications for Immunity, Inflammation and Cancer</source><publisher-name>Chapman and Hall</publisher-name><publisher-loc>London, UK</publisher-loc><year>1994</year></citation></ref>
<ref id="b7-ijms-13-15387"><label>7</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hertog</surname><given-names>M.G.L.</given-names></name><name><surname>Hollman</surname><given-names>P.C.H.</given-names></name><name><surname>Venema</surname><given-names>D.P.</given-names></name></person-group><article-title>Optimization of a quantitative HPLC determination of potentially anticarcinogenic flavonoids in vegetables and fruits</article-title><source>J. Agric. Food Chem</source><year>1992</year><volume>40</volume><fpage>1591</fpage><lpage>1598</lpage><pub-id pub-id-type="doi">10.1021/jf00021a023</pub-id></citation></ref>
<ref id="b8-ijms-13-15387"><label>8</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kondratyuk</surname><given-names>T.P.</given-names></name><name><surname>Pezzuto</surname><given-names>J.M.</given-names></name></person-group><article-title>Natural product polyphenols of relevance to human health</article-title><source>Pharm. Biol</source><year>2004</year><volume>42</volume><fpage>46</fpage><lpage>63</lpage><pub-id pub-id-type="doi">10.3109/13880200490893519</pub-id></citation></ref>
<ref id="b9-ijms-13-15387"><label>9</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yao</surname><given-names>L.H.</given-names></name><name><surname>Jiang</surname><given-names>Y.M.</given-names></name><name><surname>Shi</surname><given-names>J.</given-names></name><name><surname>Tomás-Barberán</surname><given-names>F.A.</given-names></name><name><surname>Datta</surname><given-names>N.</given-names></name><name><surname>Singanusong</surname><given-names>R.</given-names></name><name><surname>Chen</surname><given-names>S.S.</given-names></name></person-group><article-title>Flavonoids in food and their health benefits</article-title><source>Plant Food Hum. Nutr</source><year>2004</year><volume>59</volume><fpage>113</fpage><lpage>122</lpage><pub-id pub-id-type="doi">10.1007/s11130-004-0049-7</pub-id></citation></ref>
<ref id="b10-ijms-13-15387"><label>10</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Valko</surname><given-names>M.</given-names></name><name><surname>Leibfritz</surname><given-names>D.</given-names></name><name><surname>Moncol</surname><given-names>J.</given-names></name><name><surname>Cronin</surname><given-names>M.T.D.</given-names></name><name><surname>Mazur</surname><given-names>M.</given-names></name><name><surname>Telser</surname><given-names>J.</given-names></name></person-group><article-title>Free radicals and antioxidants in normal physiological functions and human disease</article-title><source>Int. J. Biochem. Cell Biol</source><year>2007</year><volume>39</volume><fpage>44</fpage><lpage>84</lpage><pub-id pub-id-type="doi">10.1016/j.biocel.2006.07.001</pub-id><pub-id pub-id-type="pmid">16978905</pub-id></citation></ref>
<ref id="b11-ijms-13-15387"><label>11</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scalbert</surname><given-names>A.</given-names></name><name><surname>Manach</surname><given-names>C.</given-names></name><name><surname>Morand</surname><given-names>C.</given-names></name><name><surname>Remesy</surname><given-names>C.</given-names></name><name><surname>Jimenez</surname><given-names>L.</given-names></name></person-group><article-title>Dietary polyphenols and the prevention of diseases</article-title><source>Crit. Rev. Food Sci. Nutr</source><year>2005</year><volume>45</volume><fpage>287</fpage><lpage>306</lpage><pub-id pub-id-type="doi">10.1080/1040869059096</pub-id><pub-id pub-id-type="pmid">16047496</pub-id></citation></ref>
<ref id="b12-ijms-13-15387"><label>12</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bonoli</surname><given-names>M.</given-names></name><name><surname>Marconi</surname><given-names>E.</given-names></name><name><surname>Caboni</surname><given-names>M.F.</given-names></name></person-group><article-title>Free and bound phenolic compounds in barley (<italic>Hordeum vulgare</italic> L.) flours: Evaluation of the extraction capability of different solvent mixtures and pressurized liquid methods by micellar electrokinetic chromatography and spectrophotometry</article-title><source>J. Chromatogr. A</source><year>2004</year><volume>1057</volume><fpage>1</fpage><lpage>12</lpage><pub-id pub-id-type="doi">10.1016/j.chroma.2004.09.024</pub-id><pub-id pub-id-type="pmid">15584217</pub-id></citation></ref>
<ref id="b13-ijms-13-15387"><label>13</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhou</surname><given-names>Z.</given-names></name><name><surname>Robards</surname><given-names>K.</given-names></name><name><surname>Helliwell</surname><given-names>S.</given-names></name><name><surname>Blanchard</surname><given-names>C.</given-names></name></person-group><article-title>The distribution of phenolic acids in rice</article-title><source>Food Chem</source><year>2004</year><volume>87</volume><fpage>401</fpage><lpage>406</lpage><pub-id pub-id-type="doi">10.1016/j.foodchem.2003.12.015</pub-id></citation></ref>
<ref id="b14-ijms-13-15387"><label>14</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zadernowski</surname><given-names>R.</given-names></name><name><surname>Kozlowska</surname><given-names>H.</given-names></name></person-group><article-title>Phenolic acids in soybean and rapeseed flours</article-title><source>Lebensm. Wiss. Technol</source><year>1983</year><volume>16</volume><fpage>110</fpage><lpage>114</lpage></citation></ref>
<ref id="b15-ijms-13-15387"><label>15</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Parr</surname><given-names>A.</given-names></name><name><surname>Ng</surname><given-names>A.</given-names></name><name><surname>Waldron</surname><given-names>K.</given-names></name></person-group><article-title>Ester-linked phenolic components of carrot cell walls</article-title><source>J. Agric. Food Chem</source><year>1997</year><volume>45</volume><fpage>2468</fpage><lpage>2471</lpage><pub-id pub-id-type="doi">10.1021/jf960982k</pub-id></citation></ref>
<ref id="b16-ijms-13-15387"><label>16</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Amakura</surname><given-names>Y.</given-names></name><name><surname>Okada</surname><given-names>M.</given-names></name><name><surname>Tsuji</surname><given-names>S.</given-names></name><name><surname>Tonogai</surname><given-names>Y.</given-names></name></person-group><article-title>Determination of phenolic acids in fruit juices by isocratic column liquid chromatography</article-title><source>J. Chromatogr. A</source><year>2000</year><volume>891</volume><fpage>183</fpage><lpage>188</lpage><pub-id pub-id-type="doi">10.1016/S0021-9673(00)00625-7</pub-id><pub-id pub-id-type="pmid">10999637</pub-id></citation></ref>
<ref id="b17-ijms-13-15387"><label>17</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McKinney</surname><given-names>J.D.</given-names></name><name><surname>Richard</surname><given-names>A.</given-names></name><name><surname>Waller</surname><given-names>C.</given-names></name><name><surname>Newman</surname><given-names>M.C.</given-names></name><name><surname>Gerberick</surname><given-names>F.</given-names></name></person-group><article-title>The practice of structure activity relationships (SAR) in toxicology</article-title><source>Toxicol. Sci</source><year>2000</year><volume>56</volume><fpage>8</fpage><lpage>17</lpage><pub-id pub-id-type="doi">10.1093/toxsci/56.1.8</pub-id><pub-id pub-id-type="pmid">10869449</pub-id></citation></ref>
<ref id="b18-ijms-13-15387"><label>18</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Amiri</surname><given-names>A.A.</given-names></name><name><surname>Hemmateenejad</surname><given-names>B.</given-names></name><name><surname>Safavi</surname><given-names>A.</given-names></name><name><surname>Sharghi</surname><given-names>H.</given-names></name><name><surname>Salimi Beni</surname><given-names>A.R.</given-names></name><name><surname>Shamsipur</surname><given-names>M.</given-names></name></person-group><article-title>Structure-retention and mobile phase-retention relationships for reversed-phase high-performance liquid chromatography of several hydroxythioxanthone derivatives in binary acetonitrile-water mixtures</article-title><source>Anal. Chim. Acta</source><year>2007</year><volume>605</volume><fpage>11</fpage><lpage>19</lpage><pub-id pub-id-type="doi">10.1016/j.aca.2007.10.028</pub-id><pub-id pub-id-type="pmid">18022405</pub-id></citation></ref>
<ref id="b19-ijms-13-15387"><label>19</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carlucci</surname><given-names>G.</given-names></name><name><surname>D’Archivio</surname><given-names>A.A.</given-names></name><name><surname>Maggi</surname><given-names>M.A.</given-names></name><name><surname>Mazzeo</surname><given-names>P.</given-names></name><name><surname>Ruggieri</surname><given-names>F.</given-names></name></person-group><article-title>Investigation of retention behaviour of non-steroidal anti-inflammatory drugs in high-performance liquid chromatography by using quantitative structure-retention relationships</article-title><source>Anal. Chim. Acta</source><year>2007</year><volume>601</volume><fpage>68</fpage><lpage>76</lpage><pub-id pub-id-type="doi">10.1016/j.aca.2007.08.026</pub-id><pub-id pub-id-type="pmid">17904471</pub-id></citation></ref>
<ref id="b20-ijms-13-15387"><label>20</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname><given-names>W.</given-names></name><name><surname>Luan</surname><given-names>F.</given-names></name><name><surname>Zhang</surname><given-names>H.</given-names></name><name><surname>Zhang</surname><given-names>X.</given-names></name><name><surname>Liu</surname><given-names>M.</given-names></name><name><surname>Hu</surname><given-names>Z.</given-names></name><name><surname>Fan</surname><given-names>B.</given-names></name></person-group><article-title>Quantitative structure-property relationships for pesticides in biopartitioning micellar chromatography. Quantitative retention-structure and retention-activity relationships of barbiturates by micellar liquid chromatography</article-title><source>J. Chromatogr. A</source><year>2006</year><volume>1113</volume><fpage>140</fpage><lpage>147</lpage><pub-id pub-id-type="doi">10.1016/j.chroma.2006.01.136</pub-id><pub-id pub-id-type="pmid">16490199</pub-id></citation></ref>
<ref id="b21-ijms-13-15387"><label>21</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Michel</surname><given-names>M.</given-names></name><name><surname>Baczek</surname><given-names>T.</given-names></name><name><surname>Studzińska</surname><given-names>S.</given-names></name><name><surname>Bodzioch</surname><given-names>K.</given-names></name><name><surname>Jonsson</surname><given-names>T.</given-names></name><name><surname>Kaliszan</surname><given-names>R.</given-names></name><name><surname>Buszewski</surname><given-names>B.</given-names></name></person-group><article-title>Comparative evaluation of high-performance liquid chromatography stationary phases used for the separation of peptides in terms of quantitative structure-retention relationships</article-title><source>J. Chromatogr. A</source><year>2007</year><volume>1175</volume><fpage>49</fpage><lpage>54</lpage><pub-id pub-id-type="doi">10.1016/j.chroma.2007.10.002</pub-id><pub-id pub-id-type="pmid">17980378</pub-id></citation></ref>
<ref id="b22-ijms-13-15387"><label>22</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fatemi</surname><given-names>M.H.</given-names></name><name><surname>Ghorbanzad’e</surname><given-names>M.</given-names></name><name><surname>Baher</surname><given-names>E.</given-names></name></person-group><article-title>Quantitative structure retention relationship modeling of retention time for some organic pollutants</article-title><source>Anal. Lett</source><year>2010</year><volume>43</volume><fpage>823</fpage><lpage>835</lpage><pub-id pub-id-type="doi">10.1080/00032710903486294</pub-id></citation></ref>
<ref id="b23-ijms-13-15387"><label>23</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Garkani-Nejad</surname><given-names>Z.</given-names></name></person-group><article-title>Quantitative Structure-Retention Relationship Study of Some Phenol Derivatives in Gas Chromatography</article-title><source>J. Chromatogr. Sci</source><year>2010</year><volume>48</volume><fpage>317</fpage><lpage>323</lpage></citation></ref>
<ref id="b24-ijms-13-15387"><label>24</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wei</surname><given-names>Y.</given-names></name><name><surname>Xi</surname><given-names>L.</given-names></name><name><surname>Chen</surname><given-names>D.</given-names></name><name><surname>Wu</surname><given-names>X.A.</given-names></name><name><surname>Liu</surname><given-names>H.</given-names></name><name><surname>Yao</surname><given-names>X.</given-names></name></person-group><article-title>Extraction, separation and quantitative structure-retention relationship modeling of essential oils in three herbs</article-title><source>J. Sep. Sci</source><year>2010</year><volume>33</volume><fpage>1980</fpage><lpage>1990</lpage><pub-id pub-id-type="doi">10.1002/jssc.201000105</pub-id><pub-id pub-id-type="pmid">20506431</pub-id></citation></ref>
<ref id="b25-ijms-13-15387"><label>25</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kaliszan</surname><given-names>R.</given-names></name></person-group><article-title>QSRR: Quantitative Structure-(Chromatographic) Retention Relationships</article-title><source>Chem. Rev</source><year>2007</year><volume>107</volume><fpage>3212</fpage><lpage>3246</lpage><pub-id pub-id-type="doi">10.1021/cr068412z</pub-id><pub-id pub-id-type="pmid">17595149</pub-id></citation></ref>
<ref id="b26-ijms-13-15387"><label>26</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Kaliszan</surname><given-names>R.</given-names></name><name><surname>Bączek</surname><given-names>T.</given-names></name></person-group><article-title>QSAR in Chromatography: Quantitative Structure-Retention Relationships (QSRRs)</article-title><source>Recent Advances in QSAR Studies: Methods and Applications</source><person-group person-group-type="editor"><name><surname>Puzyn</surname><given-names>T.</given-names></name><name><surname>Leszczynski</surname><given-names>J.</given-names></name><name><surname>Cronin</surname><given-names>M.T.</given-names></name></person-group><publisher-name>Springer</publisher-name><publisher-loc>Dordrecht, The Netherlands</publisher-loc><year>2010</year><fpage>223</fpage><lpage>259</lpage></citation></ref>
<ref id="b27-ijms-13-15387"><label>27</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gómez-Romero</surname><given-names>M.</given-names></name><name><surname>Zurek</surname><given-names>G.</given-names></name><name><surname>Schneider</surname><given-names>B.</given-names></name><name><surname>Baessmann</surname><given-names>C.</given-names></name><name><surname>Segura-Carretero</surname><given-names>A.</given-names></name><name><surname>Fernández-Gutiérrez</surname><given-names>A.</given-names></name></person-group><article-title>Automated identification of phenolics in plant-derived foods by using library search approach</article-title><source>Food Chem</source><year>2011</year><volume>124</volume><fpage>379</fpage><lpage>386</lpage><pub-id pub-id-type="doi">10.1016/j.foodchem.2010.06.032</pub-id></citation></ref>
<ref id="b28-ijms-13-15387"><label>28</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Riahi</surname><given-names>S.</given-names></name><name><surname>Mousavi</surname><given-names>M.F.</given-names></name><name><surname>Shamsipur</surname><given-names>M.</given-names></name></person-group><article-title>Prediction of selectivity coefficients of a theophylline-selective electrode using MLR and ANN</article-title><source>Talanta</source><year>2006</year><volume>69</volume><fpage>736</fpage><lpage>740</lpage><pub-id pub-id-type="doi">10.1016/j.talanta.2005.11.010</pub-id><pub-id pub-id-type="pmid">18970631</pub-id></citation></ref>
<ref id="b29-ijms-13-15387"><label>29</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fatemi</surname><given-names>M.H.</given-names></name><name><surname>Ghorbannezhad</surname><given-names>Z.</given-names></name></person-group><article-title>Estimation of the volume of distribution of some pharmacologically important compounds from their structural descriptors</article-title><source>J. Serb. Chem. Soc</source><year>2011</year><volume>76</volume><fpage>1003</fpage><lpage>1014</lpage><pub-id pub-id-type="doi">10.2298/JSC101104091F</pub-id></citation></ref>
<ref id="b30-ijms-13-15387"><label>30</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Saquib</surname><given-names>M.</given-names></name><name><surname>Gupta</surname><given-names>M.K.</given-names></name><name><surname>Sagar</surname><given-names>R.</given-names></name><name><surname>Prabhakar</surname><given-names>Y.S.</given-names></name><name><surname>Shaw</surname><given-names>A.K.</given-names></name><name><surname>Kumar</surname><given-names>R.</given-names></name><name><surname>Maulik</surname><given-names>P.R.</given-names></name><name><surname>Gaikwad</surname><given-names>A.N.</given-names></name><name><surname>Sinha</surname><given-names>S.</given-names></name><name><surname>Srivastava</surname><given-names>A.K.</given-names></name><etal/></person-group><article-title>C-3 alkyl/arylalkyl-2,3-dideoxy hex-2-enopyranosides as antitubercular agents: Synthesis, biological evaluation, and QSAR study</article-title><source>J. Med. Chem</source><year>2007</year><volume>50</volume><fpage>2942</fpage><lpage>2950</lpage><pub-id pub-id-type="doi">10.1021/jm070110h</pub-id><pub-id pub-id-type="pmid">17542574</pub-id></citation></ref>
<ref id="b31-ijms-13-15387"><label>31</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schuur</surname><given-names>J.H.</given-names></name><name><surname>Selzer</surname><given-names>P.</given-names></name><name><surname>Gasteiger</surname><given-names>J.</given-names></name></person-group><article-title>The Coding of the Three-Dimensional Structure of Molecules by Molecular Transforms and Its Application to Structure-Spectra Correlations and Studies of Biological Activity</article-title><source>J. Chem. Inf. Comput. Sci</source><year>1996</year><volume>36</volume><fpage>334</fpage><lpage>344</lpage><pub-id pub-id-type="doi">10.1021/ci950164c</pub-id></citation></ref>
<ref id="b32-ijms-13-15387"><label>32</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Todeschini</surname><given-names>R.</given-names></name><name><surname>Consonni</surname><given-names>V.</given-names></name></person-group><article-title>Handbook of Molecular Descriptors</article-title><source>Methods and Principles in Medicinal Chemistry</source><person-group person-group-type="editor"><name><surname>Mannhold</surname><given-names>R.</given-names></name><name><surname>Kubinyi</surname><given-names>H.</given-names></name><name><surname>Timmerman</surname><given-names>H.</given-names></name></person-group><publisher-name>John Wiley and Sons</publisher-name><publisher-loc>Weinheim, Germany</publisher-loc><year>2000</year><volume>11</volume></citation></ref>
<ref id="b33-ijms-13-15387"><label>33</label><citation citation-type="book"><source>MOPAC2009</source><publisher-name>Stewart Computational Chemistry</publisher-name><publisher-loc>Colorado Springs, CO, USA</publisher-loc><year>2009</year></citation></ref>
<ref id="b34-ijms-13-15387"><label>34</label><citation citation-type="book"><source><italic>DRAGON Software</italic>, version 3</source><publisher-name>Talete srl</publisher-name><publisher-loc>Milano, Italy</publisher-loc><year>2003</year></citation></ref>
<ref id="b35-ijms-13-15387"><label>35</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Whitley</surname><given-names>D.C.</given-names></name><name><surname>Ford</surname><given-names>M.G.</given-names></name><name><surname>Livingstone</surname><given-names>D.J.</given-names></name></person-group><article-title>Unsupervised forward selection: A method for eliminating redundant variables</article-title><source>J. Chem. Inf. Comput. Sci</source><year>2000</year><volume>40</volume><fpage>1160</fpage><lpage>1168</lpage><pub-id pub-id-type="doi">10.1021/ci000384c</pub-id><pub-id pub-id-type="pmid">11045809</pub-id></citation></ref>
<ref id="b36-ijms-13-15387"><label>36</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mittermayr</surname><given-names>S.</given-names></name><name><surname>Olajos</surname><given-names>M.</given-names></name><name><surname>Chovan</surname><given-names>T.</given-names></name><name><surname>Bonn</surname><given-names>G.K.</given-names></name><name><surname>Guttman</surname><given-names>A.</given-names></name></person-group><article-title>Mobility modeling of peptides in capillary electrophoresis</article-title><source>Trends Anal. Chem</source><year>2008</year><volume>27</volume><fpage>407</fpage><lpage>417</lpage><pub-id pub-id-type="doi">10.1016/j.trac.2008.03.010</pub-id></citation></ref>
<ref id="b37-ijms-13-15387"><label>37</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yan</surname><given-names>A.X.</given-names></name><name><surname>Hu</surname><given-names>Z.D.</given-names></name></person-group><article-title>Linear and non-linear modeling for the investigation of gas chromatography retention indices of alkylbenzenes on Cit.A-4, SE-30 and Carbowax 20M</article-title><source>Anal. Chim. Acta</source><year>2001</year><volume>433</volume><fpage>145</fpage><lpage>154</lpage><pub-id pub-id-type="doi">10.1016/S0003-2670(00)01379-9</pub-id></citation></ref>
<ref id="b38-ijms-13-15387"><label>38</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wold</surname><given-names>S.</given-names></name></person-group><article-title>Validation of QSAR’s</article-title><source>Quant. Struct. Act. Relat</source><year>1991</year><volume>10</volume><fpage>191</fpage><lpage>193</lpage><pub-id pub-id-type="doi">10.1002/qsar.19910100302</pub-id></citation></ref>
<ref id="b39-ijms-13-15387"><label>39</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tropsha</surname><given-names>A.</given-names></name><name><surname>Gramatica</surname><given-names>P.</given-names></name><name><surname>Gombar</surname><given-names>V.K.</given-names></name></person-group><article-title>The Importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models</article-title><source>QSAR Comb. Sci</source><year>2003</year><volume>22</volume><fpage>69</fpage><lpage>77</lpage><pub-id pub-id-type="doi">10.1002/qsar.200390007</pub-id></citation></ref>
<ref id="b40-ijms-13-15387"><label>40</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gramatica</surname><given-names>P.</given-names></name></person-group><article-title>Principles of QSAR models validation: Internal and external</article-title><source>QSAR Comb. Sci</source><year>2007</year><volume>26</volume><fpage>694</fpage><lpage>701</lpage><pub-id pub-id-type="doi">10.1002/qsar.200610151</pub-id></citation></ref>
<ref id="b41-ijms-13-15387"><label>41</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hawkins</surname><given-names>D.M.</given-names></name><name><surname>Basak</surname><given-names>S.C.</given-names></name><name><surname>Mills</surname><given-names>D.</given-names></name></person-group><article-title>Assessing model fit by cross-validation</article-title><source>J. Chem. Inf. Comput. Sci</source><year>2003</year><volume>43</volume><fpage>579</fpage><lpage>586</lpage><pub-id pub-id-type="doi">10.1021/ci025626i</pub-id><pub-id pub-id-type="pmid">12653524</pub-id></citation></ref></ref-list>
<sec sec-type="display-objects">
<title>Figures and Tables</title>
<fig id="f1-ijms-13-15387" position="float">
<label>Figure 1</label>
<caption>
<p>Representative <italic>y</italic>-scrambling plot (SMLR model).</p></caption>
<graphic xlink:href="ijms-13-15387f1.gif"/></fig>
<fig id="f2-ijms-13-15387" position="float">
<label>Figure 2</label>
<caption>
<p>Experimental and predicted retention times (RT) for training and validation sets. (<bold>a</bold>) SMLR model (<bold>b</bold>) UFS-SMLR model.</p></caption>
<graphic xlink:href="ijms-13-15387f2.gif"/></fig>
<fig id="f3-ijms-13-15387" position="float">
<label>Figure 3</label>
<caption>
<p>Experimental and predicted retention times (RT) for training, test and validation sets. (<bold>a</bold>) SMLR-ANN model (<bold>b</bold>) UFS-SMLR-ANN model.</p></caption>
<graphic xlink:href="ijms-13-15387f3.gif"/></fig>
<fig id="f4-ijms-13-15387" position="float">
<label>Figure 4</label>
<caption>
<p>Residual plot for QSRR models.</p></caption>
<graphic xlink:href="ijms-13-15387f4.gif"/></fig>
<table-wrap id="t1-ijms-13-15387" position="float">
<label>Table 1</label>
<caption>
<p>Descriptors used in the study.</p></caption>
<table frame="hsides" rules="rows">
<thead>
<tr>
<th align="left" valign="bottom">Method/Type</th>
<th align="left" valign="bottom">Descriptors</th></tr></thead>
<tbody>
<tr>
<td align="left" valign="middle"><bold>MOPAC2009/Quantum mechanical</bold></td>
<td align="left" valign="middle">Total energy, electronic energy, core-core repulsion, dielectric energy, dipole moment, ionization energy, energies of the highest occupied molecular orbital (<italic>E</italic><sub>HOMO</sub>) and lowest unoccupied molecular orbital (<italic>E</italic><sub>LUMO</sub>), difference of <italic>E</italic><sub>LUMO</sub> and <italic>E</italic><sub>HOMO</sub>, hardness, softness, molecular mass, COSMO area, COSMO volume. Logarithmic transformations of dipole moment, ionization energy, <italic>E</italic><sub>LUMO</sub>, difference of <italic>E</italic><sub>LUMO</sub> and <italic>E</italic><sub>HOMO</sub>, hardness, softness, molecular mass, COSMO area and COSMO volume.</td></tr>
<tr>
<td align="left" valign="middle"><bold>DRAGON/18 blocks of descriptors</bold></td>
<td align="left" valign="middle">Constitutional, topological, molecular walk counts, BCUT, Galvez topological charge indices, 2D autocorrelations, charge descriptors, aromaticity indices, Randic molecular profiles, geometrical, RDF, 3D-MoRSE, WHIM, GETAWAY, functional groups, atom-centered fragments, empirical descriptors, and properties.</td></tr></tbody></table></table-wrap>
<table-wrap id="t2-ijms-13-15387" position="float">
<label>Table 2</label>
<caption>
<p>Correlations of the descriptors in SMLR model.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="bottom"/>
<th align="center" valign="bottom">HNar</th>
<th align="center" valign="bottom">GATS2v</th>
<th align="center" valign="bottom">DISPe</th>
<th align="center" valign="bottom">Mor32e</th>
<th align="center" valign="bottom">Ke</th></tr></thead>
<tbody>
<tr>
<td align="center" valign="top"><bold>HNar</bold></td>
<td align="center" valign="top">1.0000</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top"/></tr>
<tr>
<td align="center" valign="top"><bold>GATS2v</bold></td>
<td align="center" valign="top">−0.0482</td>
<td align="center" valign="top">1.0000</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/>
<td align="center" valign="top"/></tr>
<tr>
<td align="center" valign="top"><bold>DISPe</bold></td>
<td align="center" valign="top">0.1253</td>
<td align="center" valign="top">0.1566</td>
<td align="center" valign="top">1.0000</td>
<td align="center" valign="top"/>
<td align="center" valign="top"/></tr>
<tr>
<td align="center" valign="top"><bold>Mor32e</bold></td>
<td align="center" valign="top">−0.4053</td>
<td align="center" valign="top">−0.4069</td>
<td align="center" valign="top">0.0784</td>
<td align="center" valign="top">1.0000</td>
<td align="center" valign="top"/></tr>
<tr>
<td align="center" valign="top"><bold>Ke</bold></td>
<td align="center" valign="top">0.4727</td>
<td align="center" valign="top">0.4644</td>
<td align="center" valign="top">0.1360</td>
<td align="center" valign="top">−0.3608</td>
<td align="center" valign="top">1.0000</td></tr></tbody></table></table-wrap>
<table-wrap id="t3-ijms-13-15387" position="float">
<label>Table 3</label>
<caption>
<p>UFS selected descriptors with <italic>R</italic><sup>2</sup><sub>max</sub> = 0.90.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom">Descriptors</th>
<th align="left" valign="bottom">Name</th>
<th align="left" valign="bottom">Type</th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top">IDM</td>
<td align="left" valign="top">Mean information content on the distance magnitude</td>
<td align="left" valign="top">Topological</td></tr>
<tr>
<td align="left" valign="top">MATS6p</td>
<td align="left" valign="top">Moran autocorrelation-lag6/weighted by atomic polarizabilities</td>
<td align="left" valign="top">2D-autocorrelations</td></tr>
<tr>
<td align="left" valign="top">Mp</td>
<td align="left" valign="top">Mean atomic polarizability (scaled on carbon atom)</td>
<td align="left" valign="top">Constitutional</td></tr>
<tr>
<td align="left" valign="top">E1e</td>
<td align="left" valign="top">1st component accessibility directional WHIM index/weighted by atomic Sanderson electronegativities</td>
<td align="left" valign="top">WHIM</td></tr>
<tr>
<td align="left" valign="top">MATS6e</td>
<td align="left" valign="top">Moran autocorrelation-lag6/weighted by atomic Sanderson electronegativities</td>
<td align="left" valign="top">2D-autocorrelations</td></tr>
<tr>
<td align="left" valign="top">Mor30m</td>
<td align="left" valign="top">3D-MoRSE-signal 30/weighted by atomic masses</td>
<td align="left" valign="top">3D-MoRSE</td></tr>
<tr>
<td align="left" valign="top">AROM</td>
<td align="left" valign="top">Aromaticity</td>
<td align="left" valign="top">Aromaticity indices</td></tr>
<tr>
<td align="left" valign="top">E3u</td>
<td align="left" valign="top">3rd component accessibility directional WHIM index/unweighted</td>
<td align="left" valign="top">WHIM</td></tr>
<tr>
<td align="left" valign="top">Mor22v</td>
<td align="left" valign="top">3D-MoRSE-signal 22/weighted by atomic volume</td>
<td align="left" valign="top">3D-MoRSE</td></tr>
<tr>
<td align="left" valign="top">Mor28e</td>
<td align="left" valign="top">3D-MoRSE-signal 28/weighted by atomic Sanderson electronegativities</td>
<td align="left" valign="top">3D-MoRSE</td></tr>
<tr>
<td align="left" valign="top">Mor29m</td>
<td align="left" valign="top">3D-MoRSE-signal 29/weighted by atomic masses</td>
<td align="left" valign="top">3D-MoRSE</td></tr>
<tr>
<td align="left" valign="top">DISPm</td>
<td align="left" valign="top">d COMMA2 value/weighted by atomic masses</td>
<td align="left" valign="top">Geometrical</td></tr>
<tr>
<td align="left" valign="top">PJI3</td>
<td align="left" valign="top">3D Petitjean shape index</td>
<td align="left" valign="top">Geometrical</td></tr>
<tr>
<td align="left" valign="top">G3s</td>
<td align="left" valign="top">3rd component symmetry directional WHIM index/weighted by atomic electrotopological states</td>
<td align="left" valign="top">WHIM</td></tr>
<tr>
<td align="left" valign="top">MATS5e</td>
<td align="left" valign="top">Moran autocorrelation-lag5/weighted by atomic Sanderson electronegativities</td>
<td align="left" valign="top">2D-autocorrelations</td></tr>
<tr>
<td align="left" valign="top">PJI2</td>
<td align="left" valign="top">2D Petitjean shape index</td>
<td align="left" valign="top">Topological</td></tr>
<tr>
<td align="left" valign="top">SIC4</td>
<td align="left" valign="top">Structural information content (neighbourhood symmetry of 4-order)</td>
<td align="left" valign="top">Topological</td></tr>
<tr>
<td align="left" valign="top">E2p</td>
<td align="left" valign="top">2nd component accessibility directional WHIM index/weighted by atomic polarizabilities</td>
<td align="left" valign="top">WHIM</td></tr>
<tr>
<td align="left" valign="top">Mor12e</td>
<td align="left" valign="top">3D-MoRSE-signal 12/weighted by atomic Sanderson electronegativities</td>
<td align="left" valign="top">3D-MoRSE</td></tr>
<tr>
<td align="left" valign="top">IVDE</td>
<td align="left" valign="top">Mean information content vertex degree equality</td>
<td align="left" valign="top">Topological</td></tr>
<tr>
<td align="left" valign="top">SPI</td>
<td align="left" valign="top">Superpendentic index</td>
<td align="left" valign="top">Topological</td></tr>
<tr>
<td align="left" valign="top">HATS7p</td>
<td align="left" valign="top">Leverage-weighted autocorrelation of lag 7/weighted by atomic polarizabilities</td>
<td align="left" valign="top">GETAWAY</td></tr></tbody></table></table-wrap>
<table-wrap id="t4-ijms-13-15387" position="float">
<label>Table 4</label>
<caption>
<p>Architecture and validation statistics of the optimal ANNs.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="bottom"/>
<th align="center" valign="bottom">SMLR-ANN</th>
<th align="center" valign="bottom">UFS-SMLR-ANN</th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top">No. of neurons in the input layer</td>
<td align="center" valign="top">4</td>
<td align="center" valign="top">5</td></tr>
<tr>
<td align="left" valign="top">No. of neurons in the hidden layer</td>
<td align="center" valign="top">6</td>
<td align="center" valign="top">5</td></tr>
<tr>
<td align="left" valign="top">No. of neurons in the output layer</td>
<td align="center" valign="top">1</td>
<td align="center" valign="top">1</td></tr>
<tr>
<td align="left" valign="top">Hidden weight decay</td>
<td align="center" valign="top">0.01</td>
<td align="center" valign="top">0.01</td></tr>
<tr>
<td align="left" valign="top">Output weight decay</td>
<td align="center" valign="top">0.01</td>
<td align="center" valign="top">0.01</td></tr>
<tr>
<td align="left" valign="top">Hidden activation function</td>
<td align="center" valign="top">Tanh</td>
<td align="center" valign="top">Exponential</td></tr>
<tr>
<td align="left" valign="top">Output activation function</td>
<td align="center" valign="top">Tanh</td>
<td align="center" valign="top">Logistic</td></tr>
<tr>
<td align="left" valign="top">PRESS<sub>ext</sub></td>
<td align="center" valign="top">1.4841</td>
<td align="center" valign="top">1.1021</td></tr>
<tr>
<td align="left" valign="top"><italic>Q</italic><sup>2</sup><sub>ext</sub></td>
<td align="center" valign="top">0.8145</td>
<td align="center" valign="top">0.8622</td></tr>
<tr>
<td align="left" valign="top">Training error</td>
<td align="center" valign="top">0.0013</td>
<td align="center" valign="top">0.0047</td></tr>
<tr>
<td align="left" valign="top">Test error</td>
<td align="center" valign="top">0.0021</td>
<td align="center" valign="top">0.0009</td></tr>
<tr>
<td align="left" valign="top">Validation error</td>
<td align="center" valign="top">0.0042</td>
<td align="center" valign="top">0.0031</td></tr></tbody></table></table-wrap>
<table-wrap id="t5-ijms-13-15387" position="float">
<label>Table 5</label>
<caption>
<p>Experimental and predicted retention times (RT) of naturally occurring phenolic compounds.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" valign="middle" rowspan="3">Sr No.</th>
<th align="left" valign="middle" rowspan="3">Compound</th>
<th align="center" valign="middle" rowspan="3">Experimental RT (min)</th>
<th colspan="4" align="center" valign="bottom">Predicted RT (min)</th></tr>
<tr>
<th colspan="4" align="left" valign="bottom">
<hr/></th></tr>
<tr>
<th align="center" valign="bottom">SMLR</th>
<th align="center" valign="bottom">UFS-SMLR</th>
<th align="center" valign="bottom">SMLR-ANN</th>
<th align="center" valign="bottom">UFS-SMLR-ANN</th></tr></thead>
<tbody>
<tr>
<td align="left" valign="top">1</td>
<td align="left" valign="top">Gallic acid</td>
<td align="center" valign="top">1.63</td>
<td align="center" valign="top">1.82</td>
<td align="center" valign="top">2.12</td>
<td align="center" valign="top">1.94</td>
<td align="center" valign="top">2.54</td></tr>
<tr>
<td align="left" valign="top">2</td>
<td align="left" valign="top">Gentisic acid</td>
<td align="center" valign="top">3.02</td>
<td align="center" valign="top">3.36</td>
<td align="center" valign="top">3.65</td>
<td align="center" valign="top">3.28</td>
<td align="center" valign="top">3.49</td></tr>
<tr>
<td align="left" valign="top">3</td>
<td align="left" valign="top">Protocatechuic acid <xref ref-type="table-fn" rid="tfn3-ijms-13-15387">b</xref></td>
<td align="center" valign="top">2.43</td>
<td align="center" valign="top">2.61</td>
<td align="center" valign="top">3.04</td>
<td align="center" valign="top">2.67</td>
<td align="center" valign="top">2.94</td></tr>
<tr>
<td align="left" valign="top">4</td>
<td align="left" valign="top">Salicylic acid <xref ref-type="table-fn" rid="tfn2-ijms-13-15387">a</xref></td>
<td align="center" valign="top">3.96</td>
<td align="center" valign="top">3.93</td>
<td align="center" valign="top">4.23</td>
<td align="center" valign="top">3.89</td>
<td align="center" valign="top">4.04</td></tr>
<tr>
<td align="left" valign="top">5</td>
<td align="left" valign="top">Syringic acid</td>
<td align="center" valign="top">3.27</td>
<td align="center" valign="top">3.36</td>
<td align="center" valign="top">2.58</td>
<td align="center" valign="top">3.10</td>
<td align="center" valign="top">2.61</td></tr>
<tr>
<td align="left" valign="top">6</td>
<td align="left" valign="top">Vanillic acid</td>
<td align="center" valign="top">3.14</td>
<td align="center" valign="top">3.29</td>
<td align="center" valign="top">3.05</td>
<td align="center" valign="top">3.07</td>
<td align="center" valign="top">2.93</td></tr>
<tr>
<td align="left" valign="top">7</td>
<td align="left" valign="top">2,4-Dihydroxybenzoic acid <xref ref-type="table-fn" rid="tfn3-ijms-13-15387">b</xref></td>
<td align="center" valign="top">3.26</td>
<td align="center" valign="top">2.67</td>
<td align="center" valign="top">3.13</td>
<td align="center" valign="top">2.76</td>
<td align="center" valign="top">3.05</td></tr>
<tr>
<td align="left" valign="top">8</td>
<td align="left" valign="top">3-Methoxybenzoic acid</td>
<td align="center" valign="top">4.32</td>
<td align="center" valign="top">4.25</td>
<td align="center" valign="top">3.53</td>
<td align="center" valign="top">4.37</td>
<td align="center" valign="top">3.31</td></tr>
<tr>
<td align="left" valign="top">9</td>
<td align="left" valign="top">4-Hydroxybenzoic acid</td>
<td align="center" valign="top">2.94</td>
<td align="center" valign="top">2.88</td>
<td align="center" valign="top">3.60</td>
<td align="center" valign="top">2.90</td>
<td align="center" valign="top">3.45</td></tr>
<tr>
<td align="left" valign="top">10</td>
<td align="left" valign="top">Caffeic acid <xref ref-type="table-fn" rid="tfn2-ijms-13-15387">a</xref></td>
<td align="center" valign="top">3.24</td>
<td align="center" valign="top">2.69</td>
<td align="center" valign="top">3.31</td>
<td align="center" valign="top">2.74</td>
<td align="center" valign="top">3.08</td></tr>
<tr>
<td align="left" valign="top">11</td>
<td align="left" valign="top">Chlorogenic acid</td>
<td align="center" valign="top">3.07</td>
<td align="center" valign="top">3.26</td>
<td align="center" valign="top">3.13</td>
<td align="center" valign="top">3.16</td>
<td align="center" valign="top">2.78</td></tr>
<tr>
<td align="left" valign="top">12</td>
<td align="left" valign="top">Ferulic acid <xref ref-type="table-fn" rid="tfn3-ijms-13-15387">b</xref></td>
<td align="center" valign="top">3.80</td>
<td align="center" valign="top">3.84</td>
<td align="center" valign="top">4.11</td>
<td align="center" valign="top">3.84</td>
<td align="center" valign="top">3.89</td></tr>
<tr>
<td align="left" valign="top">13</td>
<td align="left" valign="top"><italic>m</italic>-Coumaric acid</td>
<td align="center" valign="top">3.88</td>
<td align="center" valign="top">3.69</td>
<td align="center" valign="top">3.94</td>
<td align="center" valign="top">3.67</td>
<td align="center" valign="top">3.71</td></tr>
<tr>
<td align="left" valign="top">14</td>
<td align="left" valign="top"><italic>o</italic>-Coumaric acid</td>
<td align="center" valign="top">4.07</td>
<td align="center" valign="top">4.39</td>
<td align="center" valign="top">4.42</td>
<td align="center" valign="top">4.31</td>
<td align="center" valign="top">4.37</td></tr>
<tr>
<td align="left" valign="top">15</td>
<td align="left" valign="top"><italic>p</italic>-Coumaric acid</td>
<td align="center" valign="top">3.63</td>
<td align="center" valign="top">3.47</td>
<td align="center" valign="top">3.70</td>
<td align="center" valign="top">3.45</td>
<td align="center" valign="top">3.54</td></tr>
<tr>
<td align="left" valign="top">16</td>
<td align="left" valign="top">Sinapic acid</td>
<td align="center" valign="top">3.85</td>
<td align="center" valign="top">3.86</td>
<td align="center" valign="top">3.80</td>
<td align="center" valign="top">3.89</td>
<td align="center" valign="top">3.59</td></tr>
<tr>
<td align="left" valign="top">17</td>
<td align="left" valign="top"><italic>trans-</italic>Cinnamic acid <xref ref-type="table-fn" rid="tfn3-ijms-13-15387">b</xref></td>
<td align="center" valign="top">4.69</td>
<td align="center" valign="top">4.80</td>
<td align="center" valign="top">4.38</td>
<td align="center" valign="top">4.69</td>
<td align="center" valign="top">4.14</td></tr>
<tr>
<td align="left" valign="top">18</td>
<td align="left" valign="top">Dihydrocaffeic acid</td>
<td align="center" valign="top">3.00</td>
<td align="center" valign="top">2.84</td>
<td align="center" valign="top">2.52</td>
<td align="center" valign="top">2.85</td>
<td align="center" valign="top">2.57</td></tr>
<tr>
<td align="left" valign="top">19</td>
<td align="left" valign="top">Homovanillic acid <xref ref-type="table-fn" rid="tfn2-ijms-13-15387">a</xref></td>
<td align="center" valign="top">3.22</td>
<td align="center" valign="top">3.29</td>
<td align="center" valign="top">3.08</td>
<td align="center" valign="top">3.14</td>
<td align="center" valign="top">3.00</td></tr>
<tr>
<td align="left" valign="top">20</td>
<td align="left" valign="top">DOPAC</td>
<td align="center" valign="top">2.34</td>
<td align="center" valign="top">2.11</td>
<td align="center" valign="top">2.27</td>
<td align="center" valign="top">2.19</td>
<td align="center" valign="top">2.59</td></tr>
<tr>
<td align="left" valign="top">21</td>
<td align="left" valign="top">4-Hydroxyphenylacetic acid <xref ref-type="table-fn" rid="tfn3-ijms-13-15387">b</xref></td>
<td align="center" valign="top">2.92</td>
<td align="center" valign="top">3.34</td>
<td align="center" valign="top">2.64</td>
<td align="center" valign="top">3.28</td>
<td align="center" valign="top">2.79</td></tr>
<tr>
<td align="left" valign="top">22</td>
<td align="left" valign="top">Ellagic acid</td>
<td align="center" valign="top">3.80</td>
<td align="center" valign="top">3.90</td>
<td align="center" valign="top">3.65</td>
<td align="center" valign="top">4.07</td>
<td align="center" valign="top">3.27</td></tr>
<tr>
<td align="left" valign="top">23</td>
<td align="left" valign="top">Vanillin</td>
<td align="center" valign="top">3.49</td>
<td align="center" valign="top">3.52</td>
<td align="center" valign="top">3.18</td>
<td align="center" valign="top">3.45</td>
<td align="center" valign="top">3.05</td></tr>
<tr>
<td align="left" valign="top">24</td>
<td align="left" valign="top">Tyrosol</td>
<td align="center" valign="top">2.73</td>
<td align="center" valign="top">3.00</td>
<td align="center" valign="top">2.80</td>
<td align="center" valign="top">3.05</td>
<td align="center" valign="top">2.77</td></tr>
<tr>
<td align="left" valign="top">25</td>
<td align="left" valign="top">Apigenin <xref ref-type="table-fn" rid="tfn3-ijms-13-15387">b</xref></td>
<td align="center" valign="top">5.14</td>
<td align="center" valign="top">5.01</td>
<td align="center" valign="top">4.88</td>
<td align="center" valign="top">5.16</td>
<td align="center" valign="top">4.99</td></tr>
<tr>
<td align="left" valign="top">26</td>
<td align="left" valign="top">Chrysin <xref ref-type="table-fn" rid="tfn2-ijms-13-15387">a</xref></td>
<td align="center" valign="top">5.92</td>
<td align="center" valign="top">6.18</td>
<td align="center" valign="top">5.78</td>
<td align="center" valign="top">5.77</td>
<td align="center" valign="top">5.62</td></tr>
<tr>
<td align="left" valign="top">27</td>
<td align="left" valign="top">Luteolin <xref ref-type="table-fn" rid="tfn3-ijms-13-15387">b</xref></td>
<td align="center" valign="top">4.76</td>
<td align="center" valign="top">4.33</td>
<td align="center" valign="top">4.82</td>
<td align="center" valign="top">4.45</td>
<td align="center" valign="top">4.90</td></tr>
<tr>
<td align="left" valign="top">28</td>
<td align="left" valign="top">Luteolin-7-<italic>O</italic>-glucoside</td>
<td align="center" valign="top">3.81</td>
<td align="center" valign="top">4.10</td>
<td align="center" valign="top">4.32</td>
<td align="center" valign="top">4.10</td>
<td align="center" valign="top">4.24</td></tr>
<tr>
<td align="left" valign="top">29</td>
<td align="left" valign="top">Kaempferide</td>
<td align="center" valign="top">6.06</td>
<td align="center" valign="top">5.65</td>
<td align="center" valign="top">5.91</td>
<td align="center" valign="top">5.66</td>
<td align="center" valign="top">5.74</td></tr>
<tr>
<td align="left" valign="top">30</td>
<td align="left" valign="top">Myricetin</td>
<td align="center" valign="top">4.28</td>
<td align="center" valign="top">3.98</td>
<td align="center" valign="top">4.03</td>
<td align="center" valign="top">3.98</td>
<td align="center" valign="top">4.00</td></tr>
<tr>
<td align="left" valign="top">31</td>
<td align="left" valign="top">Quercetin <xref ref-type="table-fn" rid="tfn3-ijms-13-15387">b</xref></td>
<td align="center" valign="top">4.76</td>
<td align="center" valign="top">4.28</td>
<td align="center" valign="top">4.87</td>
<td align="center" valign="top">4.39</td>
<td align="center" valign="top">4.89</td></tr>
<tr>
<td align="left" valign="top">32</td>
<td align="left" valign="top">Rutin</td>
<td align="center" valign="top">3.73</td>
<td align="center" valign="top">3.91</td>
<td align="center" valign="top">3.62</td>
<td align="center" valign="top">3.82</td>
<td align="center" valign="top">3.62</td></tr>
<tr>
<td align="left" valign="top">33</td>
<td align="left" valign="top">Hesperidin</td>
<td align="center" valign="top">3.94</td>
<td align="center" valign="top">3.71</td>
<td align="center" valign="top">4.23</td>
<td align="center" valign="top">3.73</td>
<td align="center" valign="top">4.26</td></tr>
<tr>
<td align="left" valign="top">34</td>
<td align="left" valign="top">Isosakuranetin</td>
<td align="center" valign="top">5.94</td>
<td align="center" valign="top">5.74</td>
<td align="center" valign="top">5.45</td>
<td align="center" valign="top">5.68</td>
<td align="center" valign="top">5.43</td></tr>
<tr>
<td align="left" valign="top">35</td>
<td align="left" valign="top">Naringenin</td>
<td align="center" valign="top">5.11</td>
<td align="center" valign="top">5.05</td>
<td align="center" valign="top">4.87</td>
<td align="center" valign="top">5.20</td>
<td align="center" valign="top">5.04</td></tr>
<tr>
<td align="left" valign="top">36</td>
<td align="left" valign="top">(+)-Catechin <xref ref-type="table-fn" rid="tfn3-ijms-13-15387">b</xref></td>
<td align="center" valign="top">2.99</td>
<td align="center" valign="top">3.91</td>
<td align="center" valign="top">4.07</td>
<td align="center" valign="top">3.89</td>
<td align="center" valign="top">3.63</td></tr>
<tr>
<td align="left" valign="top">37</td>
<td align="left" valign="top">(−)-Epicatechin <xref ref-type="table-fn" rid="tfn2-ijms-13-15387">a</xref></td>
<td align="center" valign="top">3.26</td>
<td align="center" valign="top">3.66</td>
<td align="center" valign="top">3.67</td>
<td align="center" valign="top">3.63</td>
<td align="center" valign="top">3.28</td></tr>
<tr>
<td align="left" valign="top">38</td>
<td align="left" valign="top">Genistein</td>
<td align="center" valign="top">5.09</td>
<td align="center" valign="top">5.15</td>
<td align="center" valign="top">5.12</td>
<td align="center" valign="top">5.37</td>
<td align="center" valign="top">5.21</td></tr>
<tr>
<td align="left" valign="top">39</td>
<td align="left" valign="top">(+)-Taxifolin</td>
<td align="center" valign="top">3.85</td>
<td align="center" valign="top">3.57</td>
<td align="center" valign="top">4.02</td>
<td align="center" valign="top">3.51</td>
<td align="center" valign="top">3.78</td></tr></tbody></table>
<table-wrap-foot><fn id="tfn1-ijms-13-15387">
<p>For the ANN models, letters mark the data subset to which a compound belongs; unlabelled compounds are in the training set.</p></fn><fn id="tfn2-ijms-13-15387">
<label>a</label>
<p>Compound in the test set.</p></fn><fn id="tfn3-ijms-13-15387">
<label>b</label>
<p>Compound in the validation set.</p></fn></table-wrap-foot></table-wrap></sec></back></article>
