<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">biomolecules</journal-id>
      <journal-title>Biomolecules</journal-title>
      <abbrev-journal-title abbrev-type="publisher">Biomolecules</abbrev-journal-title>
      <abbrev-journal-title abbrev-type="pubmed">Biomolecules</abbrev-journal-title>
      <issn pub-type="epub">2218-273X</issn>
      <publisher>
        <publisher-name>MDPI</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3390/biom2010001</article-id>
      <article-id pub-id-type="publisher-id">biomolecules-02-00001</article-id>
      <article-categories>
        <subj-group>
          <subject>Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Exploring the Optimal Strategy to Predict Essential Genes in Microbes</article-title>
      </title-group>
      
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Deng</surname>
            <given-names>Jingyuan</given-names>
          </name>
          <xref rid="af1-biomolecules-02-00001" ref-type="aff">1</xref>
          <xref rid="af2-biomolecules-02-00001" ref-type="aff">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Tan</surname>
            <given-names>Lirong</given-names>
          </name>
          <xref rid="af1-biomolecules-02-00001" ref-type="aff">1</xref>
          <xref rid="af2-biomolecules-02-00001" ref-type="aff">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Lin</surname>
            <given-names>Xiaodong</given-names>
          </name>
          <xref rid="af4-biomolecules-02-00001" ref-type="aff">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Lu</surname>
            <given-names>Yao</given-names>
          </name>
          <xref rid="af5-biomolecules-02-00001" ref-type="aff">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Lu</surname>
            <given-names>Long J.</given-names>
          </name>
          <xref rid="af1-biomolecules-02-00001" ref-type="aff">1</xref>
          <xref rid="af2-biomolecules-02-00001" ref-type="aff">2</xref>
          <xref rid="af3-biomolecules-02-00001" ref-type="aff">3</xref>
          <xref rid="c1-biomolecules-02-00001" ref-type="corresp">*</xref>
        </contrib>
      </contrib-group>
	  <aff id="af1-biomolecules-02-00001"><label>1 </label>Division of Biomedical Informatics, Cincinnati Children’s Hospital Research Foundation, 3333 Burnet Avenue, Cincinnati, OH 45229-3026, USA; Email: <email>dengjn@gmail.com</email> (J.D.); <email>lrtan.lydia@gmail.com</email> (L.T.)</aff>
      <aff id="af2-biomolecules-02-00001"><label>2 </label>Department of Computer Science, School of Computing Sciences and Informatics, University of Cincinnati, 814 Rhodes Hall, Cincinnati, OH 45221-0030, USA</aff>
      <aff id="af3-biomolecules-02-00001"><label>3 </label>Department of Environmental Health, College of Medicine, University of Cincinnati, 231 Albert Sabin Way, Cincinnati, OH 45267-0524, USA</aff>
      <aff id="af4-biomolecules-02-00001"><label>4 </label>Department of Management Science &amp; Information Systems, Rutgers University, 252 Janice H. Levin Hall, Piscataway, NJ 08854, USA; Email: <email>xiaodonglin@gmail.com</email></aff>
      <aff id="af5-biomolecules-02-00001"><label>5 </label>Shanghai Institute of Medical Genetics, Shanghai Jiaotong University, 24/1400 Beijing (W) Road, Shanghai 200040, China; Email: <email>lvyao2005@hotmail.com</email></aff>
      <author-notes>
        <corresp id="c1-biomolecules-02-00001"><label>*</label> Author to whom correspondence should be addressed; Email: <email>long.lu@cchmc.org</email>; Tel.: +513-636-8720; Fax: +513-636-2056.</corresp>
      </author-notes>
      <pub-date pub-type="epub">
        <day>26</day>
        <month>12</month>
        <year>2011</year>
      </pub-date>
      <pub-date pub-type="collection">
        <month>03</month>
        <year>2012</year>
      </pub-date>
      <volume>2</volume>
      <issue>1</issue>
      <fpage>1</fpage>
      <lpage>22</lpage>
      <history>
        <date date-type="received">
          <day>11</day>
          <month>11</month>
          <year>2011</year>
        </date>
        <date date-type="rev-recd">
          <day>16</day>
          <month>12</month>
          <year>2011</year>
        </date>
        <date date-type="accepted">
          <day>19</day>
          <month>12</month>
          <year>2011</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>© 2012 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
        <copyright-year>2012</copyright-year>
        <license xmlns:xlink="http://www.w3.org/1999/xlink" license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/">
          <p>This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p>
        </license>
      </permissions>
      <abstract>
        <p>Accurately predicting essential genes is important in many aspects of biology, medicine and bioengineering. In previous research, we developed a machine-learning-based integrative algorithm to predict essential genes in bacterial species. This algorithm lends itself to two approaches for predicting essential genes: learning the traits from known essential genes in the target organism, or transferring essential gene annotations from a closely related model organism. However, for an understudied microbe, each approach has potential limitations. The first is constrained by the often small number of known essential genes. The second is limited by the availability of model organisms and by evolutionary distance. In this study, we aim to determine the optimal strategy for predicting essential genes by examining four microbes with well-characterized essential genes. Our results suggest that, unless the known essential genes are few, learning from the known essential genes in the target organism usually outperforms transferring essential gene annotations from a related model organism. In fact, the number of known essential genes required to make accurate predictions is surprisingly small. In prokaryotes, when the number of known essential genes is greater than 2% of total genes, this approach already comes close to its optimal performance. In eukaryotes, achieving the same best performance requires known essential genes amounting to over 4% of total genes, reflecting the increased complexity of eukaryotic organisms. Combining the two approaches improved performance when the known essential genes were few. Our investigation thus provides key information on accurately predicting essential genes and will greatly facilitate annotations of microbial genomes.</p>
      </abstract>
      <kwd-group>
        <kwd>essential genes</kwd>
        <kwd>machine learning</kwd>
        <kwd>annotation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec sec-type="intro">
      <title>1. Introduction</title>
      <p>Essential genes are defined as those that, when disrupted, confer a lethal phenotype on microorganisms under defined conditions. As such, the essentiality of a gene is the indispensability of this gene’s product to the survival of a microorganism. A complete understanding of gene essentiality is important in multiple facets of biology, medicine and bioengineering. For example, because of the lethal consequences of their disruption, essential genes are often attractive targets of antibiotics [<xref ref-type="bibr" rid="B1-biomolecules-02-00001">1</xref>]. Essential genes of an organism also constitute its minimal gene set, a key concept in the emerging field of synthetic biology [<xref ref-type="bibr" rid="B2-biomolecules-02-00001">2</xref>,<xref ref-type="bibr" rid="B3-biomolecules-02-00001">3</xref>]. Furthermore, studying gene essentiality is a crucial step toward unraveling the complex relationship between genotype and phenotype [<xref ref-type="bibr" rid="B4-biomolecules-02-00001">4</xref>], a fundamental question in genetics.</p>
      <p>Systematic genome-wide interrogations of essential genes have been conducted by single gene knockouts [<xref ref-type="bibr" rid="B5-biomolecules-02-00001">5</xref>,<xref ref-type="bibr" rid="B6-biomolecules-02-00001">6</xref>,<xref ref-type="bibr" rid="B7-biomolecules-02-00001">7</xref>,<xref ref-type="bibr" rid="B8-biomolecules-02-00001">8</xref>], transposon mutagenesis [<xref ref-type="bibr" rid="B9-biomolecules-02-00001">9</xref>,<xref ref-type="bibr" rid="B10-biomolecules-02-00001">10</xref>,<xref ref-type="bibr" rid="B11-biomolecules-02-00001">11</xref>,<xref ref-type="bibr" rid="B12-biomolecules-02-00001">12</xref>,<xref ref-type="bibr" rid="B13-biomolecules-02-00001">13</xref>,<xref ref-type="bibr" rid="B14-biomolecules-02-00001">14</xref>,<xref ref-type="bibr" rid="B15-biomolecules-02-00001">15</xref>], or antisense RNA inhibition [<xref ref-type="bibr" rid="B16-biomolecules-02-00001">16</xref>,<xref ref-type="bibr" rid="B17-biomolecules-02-00001">17</xref>]. Although the efficiency of gene deletion has improved, performing large-scale experiments to knock out each gene encoded in an organism’s genome, usually numbering in the thousands, is still a daunting task. The work of experimentally identifying essential genes in an organism is even more formidable than was once thought, because researchers have found that growth conditions can significantly alter the spectrum of essentiality in bacteria [<xref ref-type="bibr" rid="B18-biomolecules-02-00001">18</xref>,<xref ref-type="bibr" rid="B19-biomolecules-02-00001">19</xref>,<xref ref-type="bibr" rid="B20-biomolecules-02-00001">20</xref>,<xref ref-type="bibr" rid="B21-biomolecules-02-00001">21</xref>,<xref ref-type="bibr" rid="B22-biomolecules-02-00001">22</xref>] and yeast [<xref ref-type="bibr" rid="B23-biomolecules-02-00001">23</xref>]. Therefore, computational methods for predicting essential genes have become an appealing option for circumventing the expense and difficulty of experimental screens. Computational prediction is especially useful when the organism is either unculturable, such as <italic>Pneumocystis carinii</italic>, or difficult to subject to gene disruption, such as <italic>Aspergillus fumigatus</italic>.</p>
      <p>In our previous research, we developed a machine-learning-based algorithm that predicts essential genes by integrating diverse types of information encoded in a microorganism’s genome that are potentially associated with gene essentiality [<xref ref-type="bibr" rid="B24-biomolecules-02-00001">24</xref>]. We tested this algorithm in four bacterial species whose essential genes have been well characterized: <italic>Escherichia coli</italic> (<italic>EC</italic>), <italic>Pseudomonas aeruginosa</italic> (<italic>PA</italic>), <italic>Acinetobacter baylyi</italic> (<italic>AB</italic>) and <italic>Bacillus subtilis</italic> (<italic>BS</italic>). Ten-fold cross-validation in each organism showed a high predictive accuracy (AUC: ~0.9). We also reported that gene essentiality can be reliably transferred using features trained and tested in a distantly related microorganism (AUC: 0.69–0.89). Cross-organism predictions significantly outperformed homology mapping.</p>
      <p>Our algorithm thus significantly extended our ability to predict essential genes beyond orthologs by providing two alternative approaches. In the first, we learn the characteristics underlying the subset of known essential genes in one organism and predict the essentiality of the remaining genes in the same organism. In the second, we transfer gene essentiality from the most closely related model organism for which a complete set of essential genes is available. However, for determining the essential gene set in an understudied microbe, both approaches have potential limitations. The first approach is limited by the often low number of known essential genes, while the second approach is limited by the availability of model organisms and the evolutionary distance to the target organism. Although our previous work demonstrated that both approaches are capable of producing accurate predictions, further study is needed to determine the situations in which each approach is most suitable.</p>
      <p>The current study builds on our previous work by seeking an optimal strategy for predicting essential genes in an understudied microbe. To this end, we examined the potential limitations of the two above-mentioned approaches, as well as a third approach that combines them. We performed our investigations on two pairs of microbes with well-characterized essential genes: two prokaryotes, <italic>Escherichia coli</italic> K-12 (<italic>EC</italic>) and <italic>Acinetobacter baylyi</italic> ADP1 (<italic>AB</italic>), and two eukaryotes, <italic>Saccharomyces cerevisiae</italic> S288c (<italic>SC</italic>) and <italic>Neurospora crassa</italic> OR74A (<italic>NC</italic>). We withheld different fractions of known essential genes in each organism and evaluated the predictive performance. Through these simulations, we were able to reveal the conditions under which each approach is most suitable for predicting essential genes in a microbe, with respect to the number of known essential genes. The results obtained from our study will greatly facilitate the annotation of microbial genomes and provide valuable information for synthetic biology.</p>
    </sec>
    <sec sec-type="methods">
      <title>2. Experimental</title>
      <sec>
        <title>2.1. Data Sources</title>
        <p><italic>E. coli</italic> K-12 sequence data were downloaded from the Comprehensive Microbial Resource (CMR) database at <uri>http://cmr.jcvi.org</uri>. The dataset contains 4289 protein sequences in total [<xref ref-type="bibr" rid="B25-biomolecules-02-00001">25</xref>]. The essential genes of <italic>E. coli</italic> K-12 were downloaded from the PEC database [<xref ref-type="bibr" rid="B7-biomolecules-02-00001">7</xref>]. The Kato dataset contains 302 essential genes from gene deletion experiments.</p>
        <p><italic>A. baylyi</italic> ADP1 sequences were collected from the Magnifying Genomes database (<uri>http://www.genoscope.cns.fr/agc/mage</uri>). Of a total of 3308 genes, 499 are essential genes according to de Berardinis <italic>et al</italic>. [<xref ref-type="bibr" rid="B6-biomolecules-02-00001">6</xref>].</p>
        <p><italic>S. cerevisiae</italic> S288c sequences were downloaded from the <italic>Saccharomyces</italic> Genome Database at <uri>http://downloads.yeastgenome.org/sequence/genomic_sequence/</uri>. The dataset contains 5885 ORFs. The essential gene list was obtained from Giaever <italic>et al</italic>. [<xref ref-type="bibr" rid="B26-biomolecules-02-00001">26</xref>]. This dataset contains 1049 essential genes from targeted mutagenesis experiments.</p>
        <p><italic>N. crassa</italic> OR74A ORFs were downloaded from the <italic>Neurospora crassa</italic> database at the Broad Institute at <uri>http://www.broadinstitute.org/annotation/genome/neurospora/MultiDownloads.html</uri>. Dubious ORFs and pseudogenes were excluded from this list. The essential gene dataset was kindly provided by K. Borkovich at UC Riverside from the systematic genome deletion project in <italic>N. crassa</italic>. This list contains 7172 experimentally verified essential/nonessential genes, of which 1251 are essential.</p>
        <p>Gene expression data for these organisms were downloaded from NCBI GEO [<xref ref-type="bibr" rid="B27-biomolecules-02-00001">27</xref>], ArrayExpress [<xref ref-type="bibr" rid="B28-biomolecules-02-00001">28</xref>], and the gene-expression profiles of microarray data from Gasch <italic>et al</italic>. [<xref ref-type="bibr" rid="B29-biomolecules-02-00001">29</xref>].</p>
      </sec>
      <sec id="sec2dot2-biomolecules-02-00001">
        <title>2.2. Genomic Features</title>
        <p>Based on our previous research, we considered three main types of features: (A) those intrinsic to a gene’s sequence (e.g., GC content, length); (B) those derived from genomic sequence (e.g., localization signals and codon adaptation measures); and (C) experimental functional genomics data (e.g., gene-expression microarray data) (<xref ref-type="table" rid="biomolecules-02-00001-t001">Table 1</xref>). Detailed descriptions of these features and their biological implications can be found in the supplemental materials as well as in Deng <italic>et al</italic>. [<xref ref-type="bibr" rid="B24-biomolecules-02-00001">24</xref>]. For example, the domain enrichment score (DES) reflects the conservation of local domains rather than of the entire gene; it is calculated as the ratio of a domain’s occurrence frequency in essential genes to its occurrence frequency in all genes of a given organism. As another example, the phylogenetic score (PHYS) measures the evolutionary conservation of a gene and is calculated by counting the number of genomes that contain orthologous hits. Such conservation has been shown to correlate well with the indispensability of a gene. The diversity of these features suggests that gene essentiality is likely determined not solely by the genomic sequence, but by multiple aspects of biology acting together.</p>
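        <p>As a minimal illustrative sketch, and not the implementation used in this study, the two scores described above could be computed along the following lines. The function names, the dictionary-based inputs, and the choice to count each domain at most once per gene are assumptions made purely for illustration; how per-domain ratios are combined into a single gene-level DES is not specified here and is therefore left out.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def domain_enrichment_ratios(domains_by_gene, essential_genes):
    """Per-domain ratio of occurrence frequency in essential genes vs. all genes.

    domains_by_gene: dict mapping gene id -> iterable of domain ids (hypothetical input).
    essential_genes: iterable of gene ids known to be essential (hypothetical input).
    Assumption: a domain is counted at most once per gene.
    """
    essential = set(essential_genes) & set(domains_by_gene)
    if not essential:
        raise ValueError("no known essential genes among the annotated genes")

    total_counts = Counter()
    essential_counts = Counter()
    for gene, domains in domains_by_gene.items():
        for domain in set(domains):
            total_counts[domain] += 1
            if gene in essential:
                essential_counts[domain] += 1

    n_total = len(domains_by_gene)
    n_essential = len(essential)
    return {
        domain: (essential_counts[domain] / n_essential)
        / (total_counts[domain] / n_total)
        for domain in total_counts
    }


def phylogenetic_score(ortholog_hits):
    """Count the genomes in which a gene has an orthologous hit.

    ortholog_hits: dict mapping genome id -> bool (hypothetical input).
    """
    return sum(1 for has_hit in ortholog_hits.values() if has_hit)


if __name__ == "__main__":
    toy_domains = {"g1": ["D1", "D2"], "g2": ["D1"], "g3": ["D2"], "g4": ["D3"]}
    toy_essential = {"g1", "g2"}
    print(domain_enrichment_ratios(toy_domains, toy_essential))
    print(phylogenetic_score({"genomeA": True, "genomeB": False, "genomeC": True}))
```
]]></preformat>
        <p>In this toy example, a ratio above 1 marks a domain that occurs more often among the known essential genes than in the genome as a whole, and a ratio of 0 marks a domain never observed in a known essential gene.</p>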
        <table-wrap id="biomolecules-02-00001-t001" position="anchor">
          <object-id pub-id-type="pii">biomolecules-02-00001-t001_Table 1</object-id>
          <label>Table 1</label>
          <caption>
            <p>The thirty-five features considered.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th colspan="2" align="left" valign="middle">Feature</th>
                <th colspan="2" align="left" valign="middle">Description</th>
                <th align="left" valign="middle">Class *</th>
                <th align="left" valign="middle">Data type</th>
                <th align="left" valign="middle">Available **</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td colspan="2" align="left" valign="middle">Aromo</td>
                <td colspan="2" align="left" valign="middle">Aromaticity score</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle"><bold>EC</bold>/<bold>AB</bold>/<bold>SC</bold>/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">A3s</td>
                <td colspan="2" align="left" valign="middle">Base composition A</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle">EC/AB/SC/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">C3s</td>
                <td colspan="2" align="left" valign="middle">Base composition C</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle">EC/AB/SC/<bold>NC</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">G3s</td>
                <td colspan="2" align="left" valign="middle">Base composition G</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle">EC/<bold>AB</bold>/SC/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">T3s</td>
                <td colspan="2" align="left" valign="middle">Base composition T</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle">EC/<bold>AB</bold>/SC/<bold>NC</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">CAI</td>
                <td colspan="2" align="left" valign="middle">Codon adaptation index</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle">EC/<bold>AB</bold>/SC/<bold>NC</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">CBI</td>
                <td colspan="2" align="left" valign="middle">Codon bias index</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle"><bold>EC</bold>/AB/SC/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">Fop</td>
                <td colspan="2" align="left" valign="middle">Frequency of optimal codons</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle">EC/AB/<bold>SC</bold>/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">Nc</td>
                <td colspan="2" align="left" valign="middle">Effective number of codons</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle"><bold>EC</bold>/<bold>AB</bold>/<bold>SC</bold>/<bold>NC</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">L_sym</td>
                <td colspan="2" align="left" valign="middle">Frequency of synonymous codons</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Integer</td>
                <td align="left" valign="middle">EC/AB/SC/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">L_aa</td>
                <td colspan="2" align="left" valign="middle">Length amino acids</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Integer</td>
                <td align="left" valign="middle"><bold>EC</bold>/<bold>AB</bold>/<bold>SC</bold>/<bold>NC</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">GC</td>
                <td colspan="2" align="left" valign="middle">GC content</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle">EC/AB/<bold>SC</bold>/<bold>NC</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">GC3s</td>
                <td colspan="2" align="left" valign="middle">GC content 3rd position of synonymous codons</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle">EC/AB/SC/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">Gravy</td>
                <td colspan="2" align="left" valign="middle">Hydrophobicity score</td>
                <td align="left" valign="middle">A</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle">EC/AB/<bold>SC</bold>/NC</td>
              </tr>
              <tr style="border-top:solid thin">
                <td colspan="2" align="left" valign="middle">Cytoplasm</td>
                <td colspan="2" align="left" valign="middle">Subcellular localization: Cytoplasm</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle"><bold>EC</bold>/<bold>AB</bold>/<bold>SC</bold>/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">Extracellular</td>
                <td colspan="2" align="left" valign="middle">Subcellular localization: Extracellular</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle"><bold>EC</bold>/AB/<bold>SC</bold>/<bold>NC</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">Inner</td>
                <td colspan="2" align="left" valign="middle">Subcellular localization: Inner membrane</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle"><bold>EC</bold>/<bold>AB</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">Outer</td>
                <td colspan="2" align="left" valign="middle">Subcellular localization: Outer membrane</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle">EC/AB</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">Periplasm</td>
                <td colspan="2" align="left" valign="middle">Subcellular localization: Periplasm</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle">EC/AB</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">Golgi</td>
                <td colspan="2" align="left" valign="middle">Subcellular localization: Golgi</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle">SC/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">Nucleus</td>
                <td colspan="2" align="left" valign="middle">Subcellular localization: Nucleus</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle"><bold>SC</bold>/<bold>NC</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">Mito</td>
                <td colspan="2" align="left" valign="middle">Subcellular localization: Mitochondrion</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle">SC/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">Plasma</td>
                <td colspan="2" align="left" valign="middle">Subcellular localization: Plasma membrane</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle">SC/<bold>NC</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">Vacuole</td>
                <td colspan="2" align="left" valign="middle">Subcellular localization: Vacuole</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle">SC/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">Peroxisome</td>
                <td colspan="2" align="left" valign="middle">Subcellular localization: Peroxisome</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle">SC/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">ER</td>
                <td colspan="2" align="left" valign="middle">Subcellular localization: Endoplasmic reticulum</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle">SC/NC</td>
              </tr>
              <tr style="border-top:solid thin">
                <td colspan="2" align="left" valign="middle">ExpAA</td>
                <td colspan="2" align="left" valign="middle">Expected number of amino acids in helices</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle">EC/AB/SC/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">First60</td>
                <td colspan="2" align="left" valign="middle">Expected number of amino acids in helices within the first 60 amino acids</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle">EC/AB/SC/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">PredHel</td>
                <td colspan="2" align="left" valign="middle">Number of predicted TM helices</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Integer</td>
                <td align="left" valign="middle">EC/AB/<bold>SC</bold>/<bold>NC</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">PHYS</td>
                <td colspan="2" align="left" valign="middle">Phylogenetic score</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle"><bold>EC</bold>/<bold>AB</bold>/<bold>SC</bold>/<bold>NC</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">PA</td>
                <td colspan="2" align="left" valign="middle">Paralogy </td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle"><bold>EC</bold>/<bold>AB</bold>/<bold>SC</bold>/<bold>NC</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">DES</td>
                <td colspan="2" align="left" valign="middle">Domain enrichment score</td>
                <td align="left" valign="middle">B</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle"><bold>EC</bold>/<bold>AB</bold>/<bold>SC</bold>/<bold>NC</bold></td>
              </tr>
              <tr style="border-top:solid thin">
                <td colspan="2" align="left" valign="middle">FLU</td>
                <td colspan="2" align="left" valign="middle">Fluctuation</td>
                <td align="left" valign="middle">C</td>
                <td align="left" valign="middle">Real</td>
                <td align="left" valign="middle"><bold>EC</bold>/<bold>SC</bold>/<bold>NC</bold></td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">CEH</td>
                <td colspan="2" align="left" valign="middle">Coexpression network hubs</td>
                <td align="left" valign="middle">C</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle"><bold>EC</bold>/SC/NC</td>
              </tr>
              <tr>
                <td colspan="2" align="left" valign="middle">CEB</td>
                <td colspan="2" align="left" valign="middle">Coexpression network bottlenecks</td>
                <td align="left" valign="middle">C</td>
                <td align="left" valign="middle">Boolean</td>
                <td align="left" valign="middle"><bold>EC</bold>/SC/NC</td>
              </tr>
            </tbody>
          </table>
        <table-wrap-foot>
		<fn>
		<p>*—Class A: Sequence-based intrinsic features; Class B: Sequence-derived intrinsic features; Class C: Context-dependent features; **—Features used in the training and testing in each organism are in bold.</p>
		</fn>
		</table-wrap-foot>
		</table-wrap>
        
        <p>We evaluated these features based on their predictive power following the procedure described in Deng <italic>et al</italic>. [<xref ref-type="bibr" rid="B24-biomolecules-02-00001">24</xref>]. Briefly, we performed a logistic regression analysis and ranked all features according to the coverage length of the log-odds ratio. A longer overall coverage length indicates a greater contribution of the corresponding feature to gene essentiality. Because we were more interested in predicting essential genes than non-essential genes, the features with a positive coverage length were our candidate features. We also used prior biological information to remove redundant features. </p>
      </sec>
      <sec>
        <title>2.3. Training and Testing Sets Preparation</title>
        <p>The training data included the attribute values for each feature and the class assignments. Each gene was assigned a Boolean value according to its essentiality (1, essential; 0, non-essential). The training data were divided into 10 equal parts. Nine parts were used to train the classifiers and the remaining part was used for testing. The control training set was generated by randomly assigning essential labels to genes, so that the number of random “essential genes” equaled the number of true essential genes. This control set was passed through the same training and testing framework.</p>
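        <p>The ten-part split and the random-label control can be sketched as follows. This is only an illustrative sketch, not the Orange-based implementation used in the study: the function names <monospace>k_fold_splits</monospace> and <monospace>random_control_labels</monospace> are hypothetical, and we assume that permuting the true labels is an acceptable way to obtain a control with the same number of “essential genes”.</p>
        <preformat preformat-type="code">
```python
import random


def k_fold_splits(items, rng, k=10):
    """Yield (train, test) pairs; fold sizes differ by at most one item."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        # Train on every fold except fold i, test on fold i.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def random_control_labels(labels, rng):
    """Permute labels so the control keeps the true number of essential genes."""
    permuted = list(labels)
    rng.shuffle(permuted)
    return permuted


rng = random.Random(0)
labels = [1] * 10 + [0] * 90          # 1 = essential, 0 = non-essential
genes = list(range(len(labels)))      # hypothetical gene indices
splits = list(k_fold_splits(genes, rng))
control = random_control_labels(labels, rng)
```
        </preformat>
        <p>In this sketch every gene appears in exactly one test fold, and the control labels contain as many ones as the true labels.</p>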
        <sec>
          <title>2.3.1. Same-Organism Approach</title>
          <p>For each of the four organisms (<italic>i.e.</italic>, <italic>EC</italic>, <italic>AB</italic>, <italic>SC</italic> and <italic>NC</italic>), we withheld different fractions of known essential genes to simulate the situation in which only a subset of the true essential genes is known. These known essential genes were selected through random sampling and comprised our “gold standard” positive set. Because there are more non-essential genes than essential genes (10:1 in prokaryotes and 5:1 in eukaryotes), we constructed our training datasets with the same essential <italic>vs.</italic> non-essential ratio to resemble the situation in nature. That is, for a “gold standard” positive set of size <italic>N</italic>, we randomly selected <italic>xN</italic> (<italic>x = </italic>10 for prokaryotes, and 5 for eukaryotes) genes from the non-essential genes as the “gold standard” negative set. We then addressed the resulting class imbalance through data re-sampling, in which we extracted a smaller set of non-essential genes while preserving all the essential instances. This method modifies the prior probability of the non-essential and essential classes to obtain a more balanced training set. Similar approaches have been used in other studies [<xref ref-type="bibr" rid="B30-biomolecules-02-00001">30</xref>,<xref ref-type="bibr" rid="B31-biomolecules-02-00001">31</xref>]. We trained our model using this training set. For each fraction, we repeated the random sampling 200 times to obtain a reliable result. </p>
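          <p>The sampling steps above can be illustrated with a short sketch. The function names, the toy genome, and in particular the final balance of one non-essential gene per essential gene are assumptions made for illustration; the text does not state the ratio retained after re-sampling.</p>
          <preformat preformat-type="code">
```python
import random


def sample_gold_standard(essential, nonessential, n_known, ratio, rng):
    """Draw n_known essential genes and ratio * n_known non-essential genes."""
    positives = rng.sample(essential, n_known)
    n_negatives = min(len(nonessential), ratio * n_known)
    negatives = rng.sample(nonessential, n_negatives)
    return positives, negatives


def undersample_negatives(positives, negatives, balance, rng):
    """Keep every positive; keep at most balance * len(positives) negatives."""
    n_keep = min(len(negatives), balance * len(positives))
    return list(positives), rng.sample(negatives, n_keep)


# Hypothetical prokaryote-like genome: 300 essential, 3000 non-essential genes.
rng = random.Random(0)
essential = ["ess%d" % i for i in range(300)]
nonessential = ["non%d" % i for i in range(3000)]

# N = 60 known essential genes and x = 10, as for prokaryotes.
gold_pos, gold_neg = sample_gold_standard(essential, nonessential, 60, 10, rng)

# Assumed balance of 1:1 after re-sampling (not specified in the text).
train_pos, train_neg = undersample_negatives(gold_pos, gold_neg, 1, rng)
```
          </preformat>
          <p>Repeating these two calls with a different random seed for each of the 200 repetitions would give one training set per repetition.</p>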
        </sec>
        <sec>
          <title>2.3.2. Cross-Organism Approach</title>
          <p>As described in Deng <italic>et al.</italic> [<xref ref-type="bibr" rid="B24-biomolecules-02-00001">24</xref>], when predicting essential genes in each of the four organisms, the training set is the complete gene set of its paired organism. For example, when we predict essential genes in <italic>EC</italic>, the training set is the complete gene set in <italic>AB</italic>, where the complete set of <italic>AB</italic> essential genes composes the “gold standard” positive set and the remaining <italic>AB</italic> non-essential genes constitute the “gold standard” negative set.</p>
        </sec>
        <sec>
          <title>2.3.3. The Combined Approach</title>
          <p>For each of the four organisms, the training set was constructed as the combination of the training sets in the same-organism approach and the cross-organism approach. We assigned different weights to the genes of each organism based on the evolutionary distance to the target organism. For example, when we predicted essential genes in <italic>EC</italic>, the “gold standard” positive set consisted of a randomly selected fraction of essential genes in <italic>EC</italic> together with the complete set of essential genes in <italic>AB</italic>, where genes from <italic>EC</italic> were assigned a weight <italic>w</italic> (<italic>w</italic> &gt; 1), and those from <italic>AB</italic> were assigned a weight of 1. Similarly, the “gold standard” negative set consisted of the same fraction of randomly selected non-essential genes from <italic>EC</italic> together with the complete set of non-essential genes in <italic>AB</italic>, with weights <italic>w</italic> and 1, respectively.</p>
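          <p>One way to read this weighting scheme is that each gene contributes to the training objective in proportion to its weight. The sketch below illustrates that reading with a weighted logistic log-loss; it is an assumption about how the weights enter the fit, not a description of the software used in the study, and the names <monospace>combine_training_sets</monospace> and <monospace>weighted_log_loss</monospace> are hypothetical.</p>
          <preformat preformat-type="code">
```python
import math


def combine_training_sets(target_examples, model_examples, w):
    """Merge (features, label) pairs into (features, label, weight) triples.

    Target-organism genes receive weight w, model-organism genes weight 1.
    """
    weighted = [(x, y, float(w)) for x, y in target_examples]
    weighted += [(x, y, 1.0) for x, y in model_examples]
    return weighted


def weighted_log_loss(weighted_examples, coef, intercept):
    """Weighted mean of the logistic log-loss over all examples."""
    total, weight_sum = 0.0, 0.0
    for x, y, weight in weighted_examples:
        z = intercept + sum(c * v for c, v in zip(coef, x))
        p = 1.0 / (1.0 + math.exp(-z))
        total += -weight * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += weight
    return total / weight_sum


# Toy data with a single hypothetical feature per gene.
target = [([1.0], 1), ([0.0], 0)]   # genes from the target organism
model = [([1.0], 0)]                # gene from the paired model organism
combined = combine_training_sets(target, model, 4)
loss = weighted_log_loss(combined, [0.0], 0.0)
```
          </preformat>
          <p>With <italic>w</italic> = 4, a misclassified target-organism gene costs four times as much as a misclassified gene from the paired organism, so the fitted model is pulled toward the target organism's own data.</p>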
        </sec>
      </sec>
      <sec>
        <title>2.4. Classifier Design</title>
        <p>We used a logistic regression classifier to train and test the model. All classifiers were implemented using the Orange software package (<uri>http://www.ailab.si/orange/</uri>). To train and test our classifier, features were first extracted where available for each ORF and annotated with known essentiality values, thereby creating our “gold standard” data set. Then the “gold standard” dataset was divided into 10 equal parts. Nine parts were used to train the classifiers and the remaining part was used for testing.</p>
        <p>Then we applied the model to the target organism, and predicted the probability of essentiality for each gene in that organism. Based on the true gene labels and the predicted probabilities, we calculated the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and the sensitivity (number of correctly predicted essential genes/total essential genes) of the prediction. AUC and sensitivity were then used to evaluate the performance of the model.</p>
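        <p>For illustration, both measures can be computed from the true labels and the predicted probabilities as sketched below. The AUC is obtained here from the rank-sum (Mann-Whitney) identity, with tied scores given their average rank. The sensitivity function assumes that the <italic>k</italic> highest-scoring genes are the ones called essential, and both function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
from itertools import groupby


def roc_auc(labels, scores):
    """AUC via the rank-sum identity; tied scores share their average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels))
    rank_sum, position = 0.0, 0
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        members = list(group)
        # Ranks position+1 .. position+len(members) average to this value.
        average_rank = position + (len(members) + 1) / 2.0
        rank_sum += average_rank * sum(y for _, y in members)
        position += len(members)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def sensitivity_at_top_k(labels, scores, k):
    """Fraction of essential genes found among the k highest-scoring genes."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = sum(y for _, y in ranked[:k])
    return true_positives / float(sum(labels))


# Toy example: 1 = essential, 0 = non-essential.
labels = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
sens = sensitivity_at_top_k(labels, scores, 3)
```
        </preformat>
        <p>In this toy example two of the three essential genes fall in the top three predictions, and six of the nine essential/non-essential pairs are ranked correctly.</p>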
      </sec>
    </sec>
    <sec sec-type="results">
      <title>3. Results and Discussion</title>
      <sec>
        <title>3.1. Optimal Strategy for Predicting Essential Genes in EC</title>
        <p><italic>EC</italic> is a gram-negative bacterium commonly found in the lower intestine of warm-blooded organisms. It is one of the most well-studied prokaryotic model organisms and has the best-characterized essential genes. </p>
        <p>We compared three approaches using our previously developed integrative algorithm (<xref ref-type="table" rid="biomolecules-02-00001-t002">Table 2</xref>): (1) the same-organism approach, where we learned traits among the partially known essential genes in <italic>EC</italic> and predicted the rest of the essential genes; (2) the cross-organism approach, in which we learned traits among the known essential genes in <italic>AB</italic>, a closely-related model organism, and tried to predict the essential genes in <italic>EC</italic>; and (3) the combined approach, in which we learned traits among the known essential genes in <italic>AB</italic> as well as the partially known essential genes in <italic>EC</italic> and tried to predict the rest of the essential genes in <italic>EC</italic>. Because in our previous research we have shown that our cross-organism approach outperforms homology mapping [<xref ref-type="bibr" rid="B24-biomolecules-02-00001">24</xref>], we did not compare homology mapping in this study.</p>
        <table-wrap id="biomolecules-02-00001-t002" position="anchor">
          <object-id pub-id-type="pii">biomolecules-02-00001-t002_Table 2</object-id>
          <label>Table 2</label>
          <caption>
            <p>Summary of the three approaches (see Experimental Section for details).</p>
          </caption>
          <table>
            <thead>
              <tr style="border-top:solid thin">
                <th rowspan="2" align="left" valign="middle">Approach</th>
                <th rowspan="2" align="left" valign="middle">Description</th>
                <th colspan="2" align="center" valign="middle">“Gold Standard” Set</th>
                <th rowspan="2" align="left" valign="middle">Prediction Set</th>
              </tr>
              <tr style="border-top:solid thin">
                <th align="left" valign="middle">Training Set</th>
                <th align="left" valign="middle">Testing Set</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td align="left" valign="top">Same-organism approach</td>
                <td align="left" valign="top">Learning from the limited number of known essential genes in the target organism</td>
                <td align="left" valign="top">9/10 of the “gold standard” set of the target organism</td>
                <td align="left" valign="top">1/10 of the “gold standard” set of the target organism</td>
                <td align="left" valign="top">The entire set of genes except the “gold standard” in the target organism </td>
              </tr>
              <tr>
                <td align="left" valign="top">Cross-organism approach</td>
                <td align="left" valign="top">Learning from essential genes from a closely-related model organism</td>
                <td align="left" valign="top">9/10 of the “gold standard” set in the related model organism </td>
                <td align="left" valign="top">1/10 of the “gold standard” set in the related model organism</td>
                <td align="left" valign="top">The entire set of genes except the “gold standard” in the target organism</td>
              </tr>
              <tr>
                <td align="left" valign="top">Combined approach</td>
                <td align="left" valign="top">Learning from known essential genes in the target organism as well as a closely-related model organism with higher weights to the former</td>
                <td align="left" valign="top">9/10 of the “gold standard” combined set. The weights assigned to the genes in the target and model organisms are w:1</td>
                <td align="left" valign="top">1/10 of the “gold standard” combined set</td>
                <td align="left" valign="top">The entire set of genes except the “gold standard” in the target organism</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <sec>
          <title>3.1.1. Same-Organism Approach: Learning Traits from the Partially Known Essential Genes in <italic>EC</italic></title>
          <p>Among all the characteristic features that we considered, we identified 13 that are potentially associated with gene essentiality in <italic>EC</italic> and are only weakly correlated with one another (<xref ref-type="table" rid="biomolecules-02-00001-t001">Table 1</xref>). Among these 13 features, we previously identified the domain enrichment score (DES) as the strongest [<xref ref-type="bibr" rid="B24-biomolecules-02-00001">24</xref>], suggesting that gene essentiality is likely preserved through the function of protein domains or domain combinations rather than through the conservation of entire genes. To demonstrate its contribution, we separated this dominant feature from the remaining 12 features during model construction. First, we used the 12 features excluding DES to build the “no-DES” model. Next, we combined the DES feature with the other features to form the “with-DES” model. </p>
          <p>We first built the “no-DES” model in <italic>EC</italic> (see Experimental Section). The 12 selected features were used as input variables in the logistic regression classifier. The classifier generated a probability score of essentiality for each gene of the entire target organism (both “gold standard” set and prediction set (<xref ref-type="table" rid="biomolecules-02-00001-t002">Table 2</xref>)). Combining this probability score and the true essentiality information of each gene, we generated the ROC curve, which was then evaluated by the AUC score. We gradually increased the number of known essential genes in our model. The AUC score increased from 0.84 to 0.88 before the number of known essential genes reached 2% of the total genes in the genome. At this point, the model already performed very close to its optimum, achieving over 95% of its best performance. Beyond this point, the AUC score increased slowly from 0.88 to 0.89 even with a substantial increase in known essential genes (<xref ref-type="fig" rid="biomolecules-02-00001-f001">Figure 1</xref>a, red curve). </p>
          <fig id="biomolecules-02-00001-f001" position="anchor">
            <label>Figure 1</label>
            <caption>
              <p>Comparison of three approaches in <italic>EC</italic>. (<bold>a</bold>) The distribution of AUC along with the different sizes of known essential genes in <italic>EC</italic>: red curve: same-organism approach “with no-DES”; black curve: same-organism approach “with DES”; blue curve: combined approach; green curve: the DES feature only; dashed line: cross-organism approach. The bar chart of the correctly classified essential genes among the top 400 predictions with respect to the different sizes of known essential genes in <italic>EC </italic>using (<bold>b</bold>) “no-DES” model; (<bold>c</bold>) “with-DES” model; and (<bold>d</bold>) combined model. The black bar shows the correctly classified essential genes in the “gold standard” set.</p>
            </caption>
            <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="biomolecules-02-00001-g001.tif"/>
          </fig>
          <p>Besides the AUC score, we were also interested in the number of genes we successfully classified. Using 10% as a cutoff, the top 400 genes with the largest probability scores were predicted as essential genes. Those 400 genes came from two parts, the “gold standard” set (<xref ref-type="fig" rid="biomolecules-02-00001-f001">Figure 1</xref>b, black bar) and the prediction set (<xref ref-type="fig" rid="biomolecules-02-00001-f001">Figure 1</xref>b, white bar). <xref ref-type="fig" rid="biomolecules-02-00001-f001">Figure 1</xref>b showed that the performance was nearly stable if the known essential genes took up more than 2% of the total genes in <italic>EC</italic>. </p>
          <p>Next, we combined the DES feature with the other 12 features and built the model by the same process used for the “no-DES” set. Compared with the “no-DES” model, the results were significantly improved (<xref ref-type="fig" rid="biomolecules-02-00001-f001">Figure 1</xref>a, black curve, and <xref ref-type="fig" rid="biomolecules-02-00001-f001">Figure 1</xref>c). The AUC reached 0.94 when about 2% of total genes were known to be essential. <xref ref-type="fig" rid="biomolecules-02-00001-f001">Figure 1</xref>c also suggested that the classification performance is stable if more than 2% of the total genes are known to be essential. Both the AUC and the number of correctly classified essential genes decrease quickly as less essentiality information is given. We also applied our model using only the DES feature and compared the predictions with those of both the “no-DES” and “with-DES” models (<xref ref-type="fig" rid="biomolecules-02-00001-f001">Figure 1</xref>a). The comparison showed that DES alone is not enough to make optimal predictions, suggesting that including more features is necessary to achieve the optimal prediction performance. </p>
        </sec>
        <sec>
          <title>3.1.2. Cross-Organism Approach: Transferring Essential Gene Annotations from <italic>AB</italic></title>
          <p><italic>AB </italic>is a gram-negative bacterium commonly found in aquatic and soil environments. It belongs to the same class of gram-negative proteobacteria as <italic>EC</italic>. A set of 499 <italic>AB</italic> essential genes has been identified by targeted mutagenesis. We were able to use the <italic>AB</italic> essential gene set to predict essential genes in <italic>EC</italic> [<xref ref-type="bibr" rid="B24-biomolecules-02-00001">24</xref>], and the direct prediction yielded an ROC curve with an AUC score of 0.92. In <xref ref-type="fig" rid="biomolecules-02-00001-f001">Figure 1</xref>a, the dashed line shows the AUC of the prediction from <italic>AB</italic>, and the black curve rises above it once 1.5% of the total genes are known to be essential. This suggests that knowing essential genes amounting to 1.5% or more of the total genes in <italic>EC</italic> is sufficient to achieve a better prediction than transferring annotations from <italic>AB</italic>.</p>
        </sec>
        <sec>
          <title>3.1.3. Combined Approach: Combining Both <italic>AB</italic> and Partially Known Essential Information in <italic>EC</italic></title>
          <p>The above results raised a new question: if we combine both <italic>AB</italic> and the fraction of genes with known essentiality information in <italic>EC</italic> as the new “gold standard” set and try to predict the rest of the essential genes in <italic>EC</italic>, is the result significantly improved? To answer this question, we randomly chose a fraction of genes (we gradually increased the number of known genes from 10% to 90%) from <italic>EC</italic> and combined them with the <italic>AB</italic> dataset (see Experimental Section). In the model training process, we assigned different weights to the two gene sets to obtain a more reliable result. Here, the <italic>EC</italic> genes with known essentiality information were weighted 4:1 <italic>vs.</italic> the <italic>AB</italic> genes. We trained the model on this combined “gold standard” set. For each fraction, we again repeated the random sampling 200 times to estimate the variance. The results (<xref ref-type="fig" rid="biomolecules-02-00001-f001">Figure 1</xref>a, blue curve) showed that the combined approach outperformed the same-organism approach when few essential genes were known. However, the black curve quickly overtook the blue curve as the number of known essential genes in <italic>EC</italic> increased. The numbers of correctly predicted genes in <xref ref-type="fig" rid="biomolecules-02-00001-f001">Figure 1</xref>d also supported this result.</p>
        </sec>
      </sec>
      <sec>
        <title>3.2. Optimal Strategy for Predicting Essential Genes in AB</title>
        <p>In <italic>AB</italic>, we identified 11 features that are potentially associated with gene essentiality and have relatively weak correlations among themselves [<xref ref-type="bibr" rid="B24-biomolecules-02-00001">24</xref>] (<xref ref-type="table" rid="biomolecules-02-00001-t001">Table 1</xref>). We followed the same analysis procedure as in <italic>EC</italic>. In the same-organism approach, we first used the 10 features excluding DES as the input of the classifier to build the “no-DES” model, and then included DES to build the “with-DES” model. The model generated a probability score of gene essentiality for each gene of the entire target organism. Combining this probability score and the true essentiality information of each gene, we were able to evaluate the performance. In <xref ref-type="fig" rid="biomolecules-02-00001-f002">Figure 2</xref>a, the red and black curves show the distribution of the AUC scores of the “no-DES” and “with-DES” models, respectively. Both curves increase rapidly before 2% (66/3308) of total genes are known to be essential, at which point they achieve more than 95% of the best performance. Compared with the “no-DES” results, the “with-DES” results were significantly improved. Also, the dashed line in <xref ref-type="fig" rid="biomolecules-02-00001-f002">Figure 2</xref>a shows the AUC of the cross-organism approach using <italic>EC</italic> essential genes, suggesting that knowing 2% of total genes to be essential is “sufficient” to yield a better prediction than transfer from <italic>EC</italic>. <xref ref-type="fig" rid="biomolecules-02-00001-f002">Figure 2</xref>b and c show the bar charts of the correctly classified essential genes using the “no-DES” and “with-DES” models, respectively. For <italic>AB</italic>, we adopted a similar percentage cutoff to that used in <italic>EC</italic>, and the top 400 genes with the largest probability scores were predicted as essential genes. In both <xref ref-type="fig" rid="biomolecules-02-00001-f002">Figure 2</xref>b and 2c, the performance is nearly stable if the known essential genes make up more than 2% of the total genes in <italic>AB. </italic>In the combined approach, we combined the <italic>EC</italic> essential genes with increasing numbers of known <italic>AB</italic> essential genes by assigning different weights. The blue curve (<xref ref-type="fig" rid="biomolecules-02-00001-f002">Figure 2</xref>a) shows the combined approach outperforming the same-organism approach only when few essential genes are known. Compared with <xref ref-type="fig" rid="biomolecules-02-00001-f001">Figure 1</xref>, the difference between the combined approach and the same-organism approach in <italic>AB</italic> was less pronounced than in <italic>EC</italic>. The green curve in <xref ref-type="fig" rid="biomolecules-02-00001-f002">Figure 2</xref>a shows the performance of the DES feature alone. This suggests that integrating different features yields more accurate predictions than using DES alone. </p>
        <fig id="biomolecules-02-00001-f002" position="anchor">
          <label>Figure 2</label>
          <caption>
            <p>Comparison of three approaches in <italic>AB. </italic>(<bold>a</bold>) The distribution of AUC along with the different sizes of known essential genes in <italic>AB</italic>: red curve: same-organism approach “with no-DES”; black curve: same-organism approach “with DES”; blue curve: combined approach; dashed line: cross-organism approach. The bar chart of the correctly classified essential genes among the top 400 predictions with respect to the different sizes of known essential genes in <italic>AB </italic>using (<bold>b</bold>) “no-DES” model; (<bold>c</bold>) “with-DES” model; and (<bold>d</bold>) combined model. The black bar shows the correctly classified essential genes in the “gold standard” set.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="biomolecules-02-00001-g002.tif"/>
        </fig>
      </sec>
      <sec>
        <title>3.3. Optimal Strategy for Predicting Essential Genes in SC</title>
        <p>Our results suggested that essential genes are highly predictable by learning the characteristics underlying gene essentiality in prokaryotes. To test whether the same trend can also be observed in eukaryotic species, we chose <italic>SC</italic> and <italic>NC </italic>as our test candidate species. </p>
        <p><italic>SC</italic> is an important eukaryotic model organism in cell biology and is one of the most thoroughly studied eukaryotic microorganisms. There are 1049 essential genes identified by the systematic deletion project [<xref ref-type="bibr" rid="B26-biomolecules-02-00001">26</xref>]. Using the same-organism approach in <italic>SC</italic>, we identified 14 features potentially associated with gene essentiality (<xref ref-type="table" rid="biomolecules-02-00001-t001">Table 1</xref>). Domain enrichment score (DES) was found to be a strong feature in predicting essential genes in eukaryotes as well. This suggests that, much as in prokaryotes, gene essentiality in eukaryotes is likely preserved through the function of protein domains or domain combinations rather than through the conservation of entire genes. First, we used the 13 features excluding DES as the input of the classifier. After the 10-fold cross-validation, each gene of the target organism received a probability score of essentiality. Combining this probability score and the true essentiality information of each gene, we were able to evaluate the performance. <xref ref-type="fig" rid="biomolecules-02-00001-f003">Figure 3</xref>a (red curve) shows the AUC curve of the “no-DES” results. It increases gradually with the number of known essential genes and stabilizes at around the 4% point on the <italic>x</italic>-axis, achieving 95% of the best performance. Besides the AUC curve, we also plotted the bar chart of correctly predicted essential genes (<xref ref-type="fig" rid="biomolecules-02-00001-f003">Figure 3</xref>b). Since essential genes comprise about 20% of a eukaryotic genome, we used 1200 as the cutoff, <italic>i.e.</italic>, the 1200 genes with the highest essentiality scores were predicted as <italic>SC</italic> essential genes. The performance increased as we increased the size of the training dataset, and the saturation point was at 4%. <xref ref-type="fig" rid="biomolecules-02-00001-f003">Figure 3</xref>a (green curve) shows that, as in prokaryotes, DES is a strong feature for the prediction of gene essentiality, and that combining it with other functional and genomic features achieves optimal performance. </p>
        <fig id="biomolecules-02-00001-f003" position="anchor">
          <label>Figure 3</label>
          <caption>
            <p>Comparison of three approaches in <italic>SC. </italic>(<bold>a</bold>) The distribution of AUC along with the different sizes of known essential genes in <italic>SC</italic>: red curve: same-organism approach “with no-DES”; black curve: same-organism approach “with DES”; blue curve: combined approach; dashed line: cross-organism approach. The bar chart of the correctly classified essential genes among the top 1200 predictions with respect to the different sizes of known essential genes in <italic>SC </italic>using (<bold>b</bold>) “no-DES” model; (<bold>c</bold>) “with-DES” model; and (<bold>d</bold>) combined model. The black bar shows the correctly classified essential genes in the “gold standard” set.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="biomolecules-02-00001-g003.tif"/>
        </fig>
        <p>Next, we added the DES feature into the model. <xref ref-type="fig" rid="biomolecules-02-00001-f003">Figure 3</xref>a (black curve) and <xref ref-type="fig" rid="biomolecules-02-00001-f003">Figure 3</xref>c show a similar trend, except that the values are significantly higher than those of the “no-DES” results. This further supports the notion that the DES feature has strong power in predicting essential genes in eukaryotic species. Moreover, we note that the saturation occurred at the 4% point in both figures. Thus, building a reliable prediction requires that the known essential genes make up 4% or more of the total genes. </p>
        <p>In the combined approach, we used both <italic>NC</italic> and the partially known essential genes in <italic>SC</italic> as the new training set. Would the result be significantly improved again? We followed the same scheme as described above. The results were consistent: As shown in <xref ref-type="fig" rid="biomolecules-02-00001-f003">Figure 3</xref>a, the performance of the same-organism approach (black curve) dominates the performance of the combined approach (blue curve) from about 1.5% on the <italic>x</italic>-axis. Although the saturation point of the prediction is different, the dominating points are almost the same as those in <italic>EC</italic> and <italic>AB</italic>. </p>
      </sec>
      <sec>
        <title>3.4. Optimal Strategy for Predicting Essential Genes in NC</title>
        <p><italic>NC</italic> is an ascomycete, the red bread mold. Like all fungi, it reproduces by spores. It is used as a eukaryotic model organism because it is easy to grow and has a haploid life cycle, which makes genetic analysis easier. There are 1250 essential genes in <italic>NC</italic> identified by the systematic gene deletion project. We identified 14 features potentially associated with gene essentiality in <italic>NC </italic>(<xref ref-type="table" rid="biomolecules-02-00001-t001">Table 1</xref>). Following the same procedure as above, we analyzed the “no-DES” and “with-DES” datasets of the same-organism approach separately. We assigned the top 1500 genes as the predicted essential genes. <xref ref-type="fig" rid="biomolecules-02-00001-f004">Figure 4</xref>a shows that when about 4% of total genes are known to be essential, the prediction achieves a stable AUC at over 95% of the best performance. Compared with the red curve, the black curve is significantly improved. The blue curve shows the performance of the combined approach using <italic>SC</italic> and the partially known <italic>NC</italic> essential genes. The conclusion is similar to that in <italic>SC</italic>: the same-organism approach in <italic>NC</italic> (black curve) dominates the combined approach (blue curve) once at least 1.5% of the total genes are known to be essential.</p>
        <fig id="biomolecules-02-00001-f004" position="anchor">
          <label>Figure 4</label>
          <caption>
            <p>Comparison of three approaches in <italic>NC. </italic> (<bold>a</bold>) The distribution of AUC along with the different sizes of known essential genes in <italic>NC</italic>: red curve: same-organism approach “with no-DES”; black curve: same-organism approach “with DES”; blue curve: combined approach; dashed line: cross-organism approach. The bar chart of the correctly classified essential genes among the top 1500 predictions with respect to the different sizes of known essential genes in <italic>NC </italic>using (<bold>b</bold>) “no-DES” model; (<bold>c</bold>) “with-DES” model; and (<bold>d</bold>) combined model. The black bar shows the correctly classified essential genes in the “gold standard” set.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="biomolecules-02-00001-g004.tif"/>
        </fig>
      </sec>
      <sec>
        <title>3.5. Discussion</title>
        <p>Our results suggest that, in prokaryotes, when the number of known essential genes is greater than 2% of total genes, the model achieves over 95% of its best performance, recovering &gt;68% of total essential genes at the given cutoff. For example, for an understudied organism with 3000 genes, we need to know ~60 essential genes in order to accurately predict the majority of its ~300 essential genes. In contrast, in eukaryotes, achieving the same level of performance requires more than 4% of total genes, reflecting the increased complexity of eukaryotic organisms. This complexity has several sources. One possibility is that eukaryotes have more complex genome structures than prokaryotes, such as an expanded protein domain repertoire. In fact, <italic>EC</italic> and <italic>AB</italic> contain 5468 and 4204 unique domains, respectively, while <italic>SC</italic> and <italic>NC</italic> contain 6023 and 7031 unique domains, respectively, according to the InterPro database. In addition, higher organisms have larger and more complex cellular structures and perform more diversified functions, which also requires them to have more essential genes. </p>
        <p>We found that the required number of known essential genes was surprisingly small for both prokaryotes and eukaryotes, suggesting that the distribution of genomic features extracted from this small subset already provides a close approximation to the distribution of those extracted from the entire essential gene set. This illustrates an advantage of predicting essential genes using machine-learning approaches.</p>
        <p>We also noticed that, even when the model reaches saturation, a portion of the essential genes (<italic>i.e.</italic>, 32% in prokaryotes) still cannot be correctly predicted as essential. We further explored these incorrectly predicted essential genes by plotting the distributions of their associated features. Here we defined the essential genes that were correctly predicted as true positives (TPs) and those that were incorrectly predicted as false negatives (FNs). <xref ref-type="fig" rid="biomolecules-02-00001-f005">Figure 5</xref> shows boxplots for the two groups of genes in <italic>AB</italic>. The features whose distributions differed most widely between the two groups are DES and PHYS, followed by CAI, Nc and Aromo, all of which were derived from genomic sequences. This suggests that, in order to correctly predict the FNs, relying on features based on genomic sequences is no longer enough. Other strong functional genomics features have to be discovered and incorporated into predictions. This observation also supports the notion that gene essentiality is likely determined not solely by genomic sequence, but by multiple aspects of biology, from sequence to function.</p>
        <fig id="biomolecules-02-00001-f005" position="anchor">
          <label>Figure 5</label>
          <caption>
            <p>The distribution of features among true positives (TPs) and false negatives (FNs) in <italic>AB</italic>.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="biomolecules-02-00001-g005.tif"/>
        </fig>
        <p>We then performed a functional analysis of the FN genes by categorizing them according to the Clusters of Orthologous Groups (COGs) protein classification. In COGs, genes can be generally classified into four broad functional categories: information storage &amp; processing, cellular processes &amp; signaling, metabolism and poorly characterized. Previous work has shown that essential genes are overrepresented in the information storage and processing category, which covers basic cellular functions such as RNA processing and modification and DNA replication [<xref ref-type="bibr" rid="B32-biomolecules-02-00001">32</xref>]. Essential genes in this category are often well conserved across species. On the other hand, species-specific essential genes are mainly distributed in the cellular processes and metabolism categories, which often reflect a microbe’s unique lifestyle and living environment. <xref ref-type="fig" rid="biomolecules-02-00001-f006">Figure S1</xref>a,b show the distributions of FN genes across the functional categories in <italic>EC</italic> and <italic>SC</italic>, respectively. In <italic>EC</italic> the FN genes are enriched in the metabolism category, while in <italic>SC</italic> they are enriched in the cellular processes and signaling category.</p>
        <p>Comparing the sets of features used for the prokaryotes (<italic>EC</italic>, <italic>AB</italic>) and eukaryotes (<italic>SC</italic>, <italic>NC</italic>) in <xref ref-type="table" rid="biomolecules-02-00001-t001">Table 1</xref>, the features they share are Nc, L_aa, PHYS, PA, DES and FLU. These features cover all three categories described in <xref ref-type="sec" rid="sec2dot2-biomolecules-02-00001">Section 2.2</xref>. This supports our conclusion that the computational integration of different genomic and functional features is able to accurately predict essential genes in both prokaryotes and eukaryotes. However, there are some differences in the features used, particularly among the sub-cellular localization features. For example, Nucleus, Plasma and PredHel are used only by <italic>SC</italic> and <italic>NC</italic>, while Inner membrane is used only by <italic>EC</italic> and <italic>AB</italic>. These reflect the differences in cellular structure between prokaryotes and eukaryotes: eukaryotic cells are much larger and more complex than prokaryotic cells.</p>
        <p>Through our analysis, we realize that the evolutionary distance between the understudied organism and the model organism may affect the thresholds observed in our study. Nevertheless, our results suggest that an organism’s own known essential genes usually contain more information about its unique physiology and are a better representative set of its total essential genes.</p>
        <p>Logistic regression was chosen in this study mainly because of its simplicity and the ease of interpreting its results. Other machine-learning methods could have been used. However, most alternative techniques suffer from their own limitations, e.g., missing value problems or being prohibitively time-consuming, which prevented them from being used in this study. Nevertheless, we expect that our conclusions would be unlikely to change if a different machine-learning technique were used. Since the four species we studied are all microorganisms, the conclusions from this study may not be applicable to more complex systems, such as mouse and human. Finally, we believe the results obtained from our study provide important information on accurately predicting essential genes and will greatly facilitate the annotation of microbial genomes.</p>
      </sec>
    </sec>
    <sec>
      <title>4. Conclusion</title>
      <p>In this study, we investigated the performance of three approaches for predicting essential genes under conditions where information on different numbers of known essential genes is given. Our results suggest that when determining the best strategy for predicting essential genes, unless the number of known essential genes is small, <italic>i.e.</italic>, less than 1.5% of total genes, learning from the known essential genes in the target organism usually outperforms transferring essential gene annotations from a related model organism. This is consistent in both prokaryotes and eukaryotes. Moreover, when the known essential genes are few (<italic>i.e.</italic>, &lt;1.5% of total genes), and a closely related organism is available, combining these two sources of information results in a significantly increased performance over either the same-organism approach or the cross-organism approach. On the other hand, when the target organism has a sufficiently large number of known essential genes, combining the annotations from a model organism often results in a reduced performance as compared with using its own known essential genes, reflecting the slight differences of the underlying properties of essential genes between different organisms.</p>
    </sec>
  </body>
  <back>
    <ack>
      <title>Acknowledgements</title>
      <p>The authors would like to thank four anonymous reviewers for valuable suggestions. LJL designed the research; JD and LT implemented the research. XL and YL offered critical suggestions. JD, XL and LJL contributed to writing and revising the article. This research was supported by a Cincinnati Children’s Hospital Medical Center (CCHMC) Trustee Grant and a grant from The Midwest Center for Emerging Infectious Diseases (MI-CEID) awarded to LJL.</p>
    </ack>
	<app-group>
    <app>
      <title>Supplementary Section</title>
    <sec>
      <title>Intrinsic and Context-Dependent Genomic Features</title>
      <p>To create a training dataset for our classifier, features are extracted where available for each ORF in each organism and annotated with known essentiality values from the essential gene datasets. Our study considers three main types of features: (A) those intrinsic to a gene’s sequence (e.g., GC content, length); (B) those derived from genomic sequence (e.g., localization signals and codon adaptation measures) and (C) experimental functional genomics data (e.g., gene-expression microarray data).</p>
      <p>A-1. <italic>Genomic sequence properties</italic>: We use CodonW (<uri>http://bioweb.pasteur.fr/seqanal/interfaces/codonw.html</uri>) to calculate the following properties associated with genomic sequences: Kyte and Doolittle’s grand average of hydropathicity (GRAVY) [<xref ref-type="bibr" rid="SB1-biomolecules-02-00001">1</xref>], protein length (amino acids), GC content, and two measures of codon usage: effective Nc [<xref ref-type="bibr" rid="SB2-biomolecules-02-00001">2</xref>,<xref ref-type="bibr" rid="SB3-biomolecules-02-00001">3</xref>] and CAI [<xref ref-type="bibr" rid="SB4-biomolecules-02-00001">4</xref>]. </p>
      <p>B-1. <italic>Predicted subcellular localization</italic>: We used the PA-SUB Server v2.5 to obtain these features [<xref ref-type="bibr" rid="SB5-biomolecules-02-00001">5</xref>]. Gram-negative bacteria (<italic>EC</italic>, <italic>PA</italic> and <italic>AB</italic>) have five predicted localizations: Inner membrane, Extracellular, Cytoplasm, Periplasm, Outer membrane. Gram-positive bacteria (<italic>BS</italic>) have three predicted localizations: Extracellular, Cytoplasm, Plasma membrane. </p>
      <p>B-2. <italic>Transmembrane helices for each ORF</italic>: The putative transmembrane helices are calculated by TMHMM Web server v2.0 [<xref ref-type="bibr" rid="SB6-biomolecules-02-00001">6</xref>,<xref ref-type="bibr" rid="SB7-biomolecules-02-00001">7</xref>].</p>
      <p>B-3. <italic>Evolutionary conservation of a gene</italic>: We used the reciprocal best hit (RBH) method to search for orthologs of each gene of the target organism (<italic>PA</italic>, <italic>EC</italic>, <italic>AB</italic> and <italic>BS</italic>) in multiple complete genomes. The number of genomes with orthologous hits was used as a measure of the evolutionary conservation of a gene. Such conservation has been shown to correlate well with the dispensability of a gene [<xref ref-type="bibr" rid="SB8-biomolecules-02-00001">8</xref>].</p>
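      <p>A minimal sketch of the reciprocal-best-hit idea and of the resulting conservation count is given below. It assumes that the best hits in each direction have already been obtained from a sequence search; the dictionaries, gene names and function names are hypothetical and are not taken from the original pipeline.</p>
      <preformat><![CDATA[
```python
def reciprocal_best_hits(best_fwd, best_rev):
    """Return target genes whose best hit in the other genome points back to them.

    best_fwd maps a target gene to its best hit in the other genome;
    best_rev maps a gene of the other genome to its best hit in the target.
    """
    return {g: h for g, h in best_fwd.items() if best_rev.get(h) == g}


def conservation_counts(target_genes, hits_per_genome):
    """Count, for each target gene, the genomes in which it has an RBH ortholog."""
    counts = {g: 0 for g in target_genes}
    for best_fwd, best_rev in hits_per_genome:
        for g in reciprocal_best_hits(best_fwd, best_rev):
            if g in counts:
                counts[g] += 1
    return counts


# Toy data (hypothetical): two comparison genomes, three target genes.
genome1 = ({"a": "x1", "b": "y1"}, {"x1": "a", "y1": "c"})
genome2 = ({"a": "x2", "b": "y2"}, {"x2": "a", "y2": "b"})
counts = conservation_counts(["a", "b", "c"], [genome1, genome2])
print(counts)
```
]]></preformat>
      <p>In this toy example gene a has a reciprocal best hit in both genomes, gene b in only one, and gene c in none, so a would be scored as the most conserved.</p>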
      <p>B-4. <italic>Paralogy</italic>: Duplicated genes in an organism are often referred to as paralogs. An all-against-all FASTA search was conducted for the whole set of ORFs in the target organism (<italic>PA</italic>, <italic>EC</italic>, <italic>AB</italic> and <italic>BS</italic>) to identify the paralogs with an E-value threshold of 10<sup>−20</sup>.</p>
      <p>B-5. <italic>Domain enrichment</italic>: For each individual domain, we collected its occurrence in each organism (<italic>PA</italic>, <italic>EC</italic>, <italic>AB</italic> and <italic>BS</italic>) using the Pfam database (<uri>http://pfam.sanger.ac.uk</uri>). We then estimated the domain enrichment score according to the ratio of occurrence frequencies between essential gene sets and the total genes in the target organism: <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="biomolecules-02-00001-i001.tif"/>, where n<sub>ess</sub> and n<sub>non-ess</sub> represent a domain’s occurrence frequency in the essential and non-essential datasets, respectively, and N<sub>ess</sub> and N<sub>non-ess</sub> are the sizes of the essential and non-essential datasets, respectively.</p>
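      <p>The sketch below illustrates one way such a score could be computed from the four counts defined in the preceding paragraph. The exact published expression is the inline formula given there and is not reproduced here; the plain ratio of relative frequencies and the additive pseudocount used below are assumptions made only for illustration, as is the function name.</p>
      <preformat><![CDATA[
```python
def domain_enrichment(n_ess, n_non_ess, N_ess, N_non_ess, pseudocount=1.0):
    """Ratio of a domain's relative frequency in essential vs. non-essential genes.

    ASSUMPTION: a plain frequency ratio with an additive pseudocount (to avoid
    division by zero); the published score may use a different functional form.
    """
    freq_ess = (n_ess + pseudocount) / (N_ess + pseudocount)
    freq_non = (n_non_ess + pseudocount) / (N_non_ess + pseudocount)
    return freq_ess / freq_non


# Toy numbers: a domain found in 12 of 300 essential genes
# and in 9 of 2700 non-essential genes.
score = domain_enrichment(12, 9, 300, 2700)
print(score)
```
]]></preformat>
      <p>Under this assumed form, a score above 1 indicates a domain that occurs relatively more often among essential genes, and a score below 1 the opposite.</p>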
      <p>C-1. <italic>Fluctuation in gene-expression</italic>: The mRNA expression levels of essential genes often vary, on average, within a narrower range, whereas the expression of nonessential genes fluctuates more widely [<xref ref-type="bibr" rid="SB9-biomolecules-02-00001">9</xref>]. Gene expression data for these bacteria were obtained from NCBI GEO [<xref ref-type="bibr" rid="SB10-biomolecules-02-00001">10</xref>] and ArrayExpress [<xref ref-type="bibr" rid="SB11-biomolecules-02-00001">11</xref>], together with the microarray gene-expression profiles from Gasch <italic>et al</italic>. [<xref ref-type="bibr" rid="SB12-biomolecules-02-00001">12</xref>]. The variance of each gene’s expression across these profiles was used as a measure of the fluctuation of gene expression.</p>
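      <p>A minimal sketch of the fluctuation feature, assuming the expression profiles are already available as one list of values per gene. The gene names and values are invented, and the choice of the population variance rather than the sample variance is an assumption of this sketch.</p>
      <preformat><![CDATA[
```python
from statistics import pvariance


def expression_fluctuation(profiles):
    """Map each gene to the variance of its expression values across conditions."""
    return {gene: pvariance(values) for gene, values in profiles.items()}


# Hypothetical expression profiles over four conditions.
profiles = {
    "geneA": [5.0, 5.1, 4.9, 5.0],  # stable expression
    "geneB": [1.0, 9.0, 3.0, 7.0],  # widely fluctuating expression
}
fluct = expression_fluctuation(profiles)
print(fluct)
```
]]></preformat>
      <p>Here geneA, with its narrow expression range, receives a much smaller fluctuation value than geneB.</p>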
      <p>C-2. <italic>Topology in gene co-expression network</italic>: From gene expression microarray data, a gene-expression cooperativity graph is constructed as <italic>G<sub>g</sub></italic>(<italic>D</italic>) = (<italic>V<sub>g</sub>,E<sub>g</sub></italic>), with the vertex set <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="biomolecules-02-00001-i002.tif"/> and the edge set <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="biomolecules-02-00001-i003.tif"/> for <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="biomolecules-02-00001-i004.tif"/> and | <italic>r<sub>ij</sub></italic> | ≥ 0.7. Each vertex represents a gene and each edge represents a gene pair whose expression profiles have a correlation coefficient | <italic>r<sub>ij</sub></italic> | of at least 0.7. This cutoff value of | <italic>r<sub>ij</sub></italic> | is based on our previous work [<xref ref-type="bibr" rid="SB13-biomolecules-02-00001">13</xref>]. Hubs (nodes with high degree) and bottlenecks (nodes with high betweenness, <italic>i.e.</italic>, nodes that lie on many shortest paths) have been found to correlate with gene essentiality [<xref ref-type="bibr" rid="SB14-biomolecules-02-00001">14</xref>]. The network statistics are calculated using tYNA (<uri>http://www.gersteinlab.org/tyna</uri>).</p>
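      <p>The construction of such a graph can be sketched as follows. This is an illustration only and is not the tYNA implementation: the Pearson correlation is assumed as the correlation measure, the gene names and profiles are invented, and only the node degree (used to identify hubs) is computed.</p>
      <preformat><![CDATA[
```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treated as uncorrelated
    return cov / (sx * sy)


def coexpression_edges(profiles, cutoff=0.7):
    """Edges (i, j) between genes whose |r_ij| is at least the cutoff."""
    edges = set()
    for gi, gj in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[gi], profiles[gj])) >= cutoff:
            edges.add((gi, gj))
    return edges


def degrees(profiles, edges):
    """Degree of every node; high-degree nodes are the hubs of the graph."""
    deg = {g: 0 for g in profiles}
    for gi, gj in edges:
        deg[gi] += 1
        deg[gj] += 1
    return deg


# Hypothetical expression profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.1, 5.9, 8.0],    # strongly correlated with g1
    "g3": [4.0, 3.0, 2.0, 1.0],    # strongly anti-correlated with g1
    "g4": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with the others
}
edges = coexpression_edges(profiles)
deg = degrees(profiles, edges)
print(sorted(edges), deg)
```
]]></preformat>
      <p>Because the absolute value of the correlation is thresholded, both strongly correlated and strongly anti-correlated gene pairs are connected, while the uncorrelated gene g4 remains isolated.</p>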
	  <fig id="biomolecules-02-00001-f006" position="anchor">
          <label>Figure S1</label>
          <caption>
            <p>Functional distribution of false negative genes according to the Clusters of Orthologous Groups (COGs) classification in <italic>EC</italic> (<bold>a</bold>) and <italic>SC</italic> (<bold>b</bold>), respectively.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="biomolecules-02-00001-g006.tif"/>
        </fig>
    </sec>
    <ref-list>
      <title>Supplementary References</title>
      <ref id="SB1-biomolecules-02-00001">
        <label>1.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Kyte</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Doolittle</surname>
              <given-names>R.F.</given-names>
            </name>
          </person-group>
          <article-title>A simple method for displaying the hydropathic character of a protein</article-title>
          <source>J. Mol. Biol.</source>
          <year>1982</year>
          <volume>157</volume>
          <fpage>105</fpage>
          <lpage>132</lpage>
          <pub-id pub-id-type="doi">10.1016/0022-2836(82)90515-0</pub-id>
        </citation>
      </ref>
      <ref id="SB2-biomolecules-02-00001">
        <label>2.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Wright</surname>
              <given-names>F.</given-names>
            </name>
          </person-group>
          <article-title>The ‘effective number of codons’ used in a gene</article-title>
          <source>Gene</source>
          <year>1990</year>
          <volume>87</volume>
          <fpage>23</fpage>
          <lpage>29</lpage>
          <pub-id pub-id-type="doi">10.1016/0378-1119(90)90491-9</pub-id>
        </citation>
      </ref>
      <ref id="SB3-biomolecules-02-00001">
        <label>3.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Fuglsang</surname>
              <given-names>A.</given-names>
            </name>
          </person-group>
          <article-title>The ‘effective number of codons’ revisited</article-title>
          <source>Biochem. Biophys. Res. Commun.</source>
          <year>2004</year>
          <volume>317</volume>
          <fpage>957</fpage>
          <lpage>964</lpage>
          <pub-id pub-id-type="doi">10.1016/j.bbrc.2004.03.138</pub-id>
        </citation>
      </ref>
      <ref id="SB4-biomolecules-02-00001">
        <label>4.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Sharp</surname>
              <given-names>P.M.</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>W.H.</given-names>
            </name>
          </person-group>
          <article-title>The codon adaptation index—A measure of directional synonymous codon usage bias, and its potential applications</article-title>
          <source>Nucleic Acids Res.</source>
          <year>1987</year>
          <volume>15</volume>
          <fpage>1281</fpage>
          <lpage>1295</lpage>
          <pub-id pub-id-type="doi">10.1093/nar/15.3.1281</pub-id>
        </citation>
      </ref>
      <ref id="SB5-biomolecules-02-00001">
        <label>5.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Lu</surname>
              <given-names>Z.</given-names>
            </name>
            <name>
              <surname>Szafron</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Greiner</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Lu</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Wishart</surname>
              <given-names>D.S.</given-names>
            </name>
            <name>
              <surname>Poulin</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Anvik</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Macdonell</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Eisner</surname>
              <given-names>R.</given-names>
            </name>
          </person-group>
          <article-title>Predicting subcellular localization of proteins using machine-learned classifiers</article-title>
          <source>Bioinformatics</source>
          <year>2004</year>
          <volume>20</volume>
          <fpage>547</fpage>
          <lpage>556</lpage>
          <pub-id pub-id-type="doi">10.1093/bioinformatics/btg447</pub-id>
        </citation>
      </ref>
      <ref id="SB6-biomolecules-02-00001">
        <label>6.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Sonnhammer</surname>
              <given-names>E.L.</given-names>
            </name>
            <name>
              <surname>von Heijne</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Krogh</surname>
              <given-names>A.</given-names>
            </name>
          </person-group>
          <article-title>A hidden Markov model for predicting transmembrane helices in protein sequences</article-title>
          <source>Proc. Int. Conf. Intell. Syst. Mol. Biol.</source>
          <year>1998</year>
          <volume>6</volume>
          <fpage>175</fpage>
          <lpage>182</lpage>
        </citation>
      </ref>
      <ref id="SB7-biomolecules-02-00001">
        <label>7.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Krogh</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Larsson</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>von Heijne</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Sonnhammer</surname>
              <given-names>E.L.</given-names>
            </name>
          </person-group>
          <article-title>Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes</article-title>
          <source>J. Mol. Biol.</source>
          <year>2001</year>
          <volume>305</volume>
          <fpage>567</fpage>
          <lpage>580</lpage>
          <pub-id pub-id-type="doi">10.1006/jmbi.2000.4315</pub-id>
        </citation>
      </ref>
      <ref id="SB8-biomolecules-02-00001">
        <label>8.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Chen</surname>
              <given-names>Y.</given-names>
            </name>
            <name>
              <surname>Xu</surname>
              <given-names>D.</given-names>
            </name>
          </person-group>
          <article-title>Understanding protein dispensability through machine-learning analysis of high-throughput data</article-title>
          <source>Bioinformatics</source>
          <year>2005</year>
          <volume>21</volume>
          <fpage>575</fpage>
          <lpage>581</lpage>
          <pub-id pub-id-type="doi">10.1093/bioinformatics/bti058</pub-id>
        </citation>
      </ref>
      <ref id="SB9-biomolecules-02-00001">
        <label>9.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Jeong</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Oltvai</surname>
              <given-names>Z.N.</given-names>
            </name>
            <name>
              <surname>Barabasi</surname>
              <given-names>A.L.</given-names>
            </name>
          </person-group>
          <article-title>Prediction of protein essentiality based on genomic data</article-title>
          <source>ComPlexUs</source>
          <year>2003</year>
          <volume>1</volume>
          <fpage>19</fpage>
          <lpage>28</lpage>
          <pub-id pub-id-type="doi">10.1159/000067640</pub-id>
        </citation>
      </ref>
      <ref id="SB10-biomolecules-02-00001">
        <label>10.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Barrett</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Troup</surname>
              <given-names>D.B.</given-names>
            </name>
            <name>
              <surname>Wilhite</surname>
              <given-names>S.E.</given-names>
            </name>
            <name>
              <surname>Ledoux</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Rudnev</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Evangelista</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Kim</surname>
              <given-names>I.F.</given-names>
            </name>
            <name>
              <surname>Soboleva</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Tomashevsky</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Edgar</surname>
              <given-names>R.</given-names>
            </name>
          </person-group>
          <article-title>NCBI GEO: Mining tens of millions of expression profiles: Database and tools update</article-title>
          <source>Nucleic Acids Res.</source>
          <year>2007</year>
          <volume>35</volume>
          <fpage>D760</fpage>
          <lpage>D765</lpage>
        </citation>
      </ref>
      <ref id="SB11-biomolecules-02-00001">
        <label>11.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Parkinson</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Kapushesky</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Shojatalab</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Abeygunawardena</surname>
              <given-names>N.</given-names>
            </name>
            <name>
              <surname>Coulson</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Farne</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Holloway</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Kolesnykov</surname>
              <given-names>N.</given-names>
            </name>
            <name>
              <surname>Lilja</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Lukk</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Mani</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Rayner</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Sharma</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>William</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Sarkans</surname>
              <given-names>U.</given-names>
            </name>
            <name>
              <surname>Brazma</surname>
              <given-names>A.</given-names>
            </name>
          </person-group>
          <article-title>ArrayExpress: A public database of microarray experiments and gene expression profiles</article-title>
          <source>Nucleic Acids Res.</source>
          <year>2007</year>
          <volume>35</volume>
          <fpage>D747</fpage>
          <lpage>D750</lpage>
        </citation>
      </ref>
      <ref id="SB12-biomolecules-02-00001">
        <label>12.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Gasch</surname>
              <given-names>A.P.</given-names>
            </name>
            <name>
              <surname>Spellman</surname>
              <given-names>P.T.</given-names>
            </name>
            <name>
              <surname>Kao</surname>
              <given-names>C.M.</given-names>
            </name>
            <name>
              <surname>Carmel-Harel</surname>
              <given-names>O.</given-names>
            </name>
            <name>
              <surname>Eisen</surname>
              <given-names>M.B.</given-names>
            </name>
            <name>
              <surname>Storz</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Botstein</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Brown</surname>
              <given-names>P.O.</given-names>
            </name>
          </person-group>
          <article-title>Genomic expression programs in the response of yeast cells to environmental changes</article-title>
          <source>Mol. Biol. Cell</source>
          <year>2000</year>
          <volume>11</volume>
          <fpage>4241</fpage>
          <lpage>4257</lpage>
        </citation>
      </ref>
      <ref id="SB13-biomolecules-02-00001">
        <label>13.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Lu</surname>
              <given-names>L.J.</given-names>
            </name>
            <name>
              <surname>Xia</surname>
              <given-names>Y.</given-names>
            </name>
            <name>
              <surname>Paccanaro</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Yu</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Gerstein</surname>
              <given-names>M.</given-names>
            </name>
          </person-group>
          <article-title>Assessing the limits of genomic data integration for predicting protein networks</article-title>
          <source>Genome Res.</source>
          <year>2005</year>
          <volume>15</volume>
          <fpage>945</fpage>
          <lpage>953</lpage>
          <pub-id pub-id-type="doi">10.1101/gr.3610305</pub-id>
        </citation>
      </ref>
      <ref id="SB14-biomolecules-02-00001">
        <label>14.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Yu</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Greenbaum</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Lu</surname>
              <given-names>H.X.</given-names>
            </name>
            <name>
              <surname>Zhu</surname>
              <given-names>X.</given-names>
            </name>
            <name>
              <surname>Gerstein</surname>
              <given-names>M.</given-names>
            </name>
          </person-group>
          <article-title>Genomic analysis of essentiality within protein networks</article-title>
          <source>Trends Genet.</source>
          <year>2004</year>
          <volume>20</volume>
          <fpage>227</fpage>
          <lpage>231</lpage>
          <pub-id pub-id-type="doi">10.1016/j.tig.2004.04.008</pub-id>
        </citation>
      </ref>
    </ref-list>
    </app>
  </app-group>
    
    <ref-list>
      <title>References</title>
      <ref id="B1-biomolecules-02-00001">
        <label>1.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Haselbeck</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Wall</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Jiang</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Ketela</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Zyskind</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Bussey</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Foulkes</surname>
              <given-names>J.G.</given-names>
            </name>
            <name>
              <surname>Roemer</surname>
              <given-names>T.</given-names>
            </name>
          </person-group>
          <article-title>Comprehensive essential gene identification as a platform for novel anti-infective drug discovery</article-title>
          <source>Curr. Pharm. Des.</source>
          <year>2002</year>
          <volume>8</volume>
          <fpage>1155</fpage>
          <lpage>1172</lpage>
          <pub-id pub-id-type="doi">10.2174/1381612023394818</pub-id>
        </citation>
      </ref>
      <ref id="B2-biomolecules-02-00001">
        <label>2.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Gibson</surname>
              <given-names>D.G.</given-names>
            </name>
            <name>
              <surname>Glass</surname>
              <given-names>J.I.</given-names>
            </name>
            <name>
              <surname>Lartigue</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Noskov</surname>
              <given-names>V.N.</given-names>
            </name>
            <name>
              <surname>Chuang</surname>
              <given-names>R.Y.</given-names>
            </name>
            <name>
              <surname>Algire</surname>
              <given-names>M.A.</given-names>
            </name>
            <name>
              <surname>Benders</surname>
              <given-names>G.A.</given-names>
            </name>
            <name>
              <surname>Montague</surname>
              <given-names>M.G.</given-names>
            </name>
            <name>
              <surname>Ma</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Moodie</surname>
              <given-names>M.M.</given-names>
            </name>
            <name>
              <surname>Merryman</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Vashee</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Krishnakumar</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Assad-Garcia</surname>
              <given-names>N.</given-names>
            </name>
            <name>
              <surname>Andrews-Pfannkoch</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Denisova</surname>
              <given-names>E.A.</given-names>
            </name>
            <name>
              <surname>Young</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Qi</surname>
              <given-names>Z.Q.</given-names>
            </name>
            <name>
              <surname>Segall-Shapiro</surname>
              <given-names>T.H.</given-names>
            </name>
            <name>
              <surname>Calvey</surname>
              <given-names>C.H.</given-names>
            </name>
            <name>
              <surname>Parmar</surname>
              <given-names>P.P.</given-names>
            </name>
            <name>
              <surname>Hutchison</surname>
              <given-names>C.A.</given-names>
              <suffix>III</suffix>
            </name>
            <name>
              <surname>Smith</surname>
              <given-names>H.O.</given-names>
            </name>
            <name>
              <surname>Venter</surname>
              <given-names>J.C.</given-names>
            </name>
          </person-group>
          <article-title>Creation of a bacterial cell controlled by a chemically synthesized genome</article-title>
          <source>Science</source>
          <year>2010</year>
          <volume>329</volume>
          <fpage>52</fpage>
          <lpage>56</lpage>
        <pub-id pub-id-type="doi">10.1126/science.1190719</pub-id><pub-id pub-id-type="pmid">20488990</pub-id></citation>
      </ref>
      <ref id="B3-biomolecules-02-00001">
        <label>3.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Pennisi</surname>
              <given-names>E.</given-names>
            </name>
          </person-group>
          <article-title>Genomics. Synthetic genome brings new life to bacterium</article-title>
          <source>Science</source>
          <year>2010</year>
          <volume>328</volume>
          <fpage>958</fpage>
          <lpage>959</lpage>
          <pub-id pub-id-type="doi">10.1126/science.328.5981.958</pub-id>
        </citation>
      </ref>
      <ref id="B4-biomolecules-02-00001">
        <label>4.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Dowell</surname>
              <given-names>R.D.</given-names>
            </name>
            <name>
              <surname>Ryan</surname>
              <given-names>O.</given-names>
            </name>
            <name>
              <surname>Jansen</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Cheung</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Agarwala</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Danford</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Bernstein</surname>
              <given-names>D.A.</given-names>
            </name>
            <name>
              <surname>Rolfe</surname>
              <given-names>P.A.</given-names>
            </name>
            <name>
              <surname>Heisler</surname>
              <given-names>L.E.</given-names>
            </name>
            <name>
              <surname>Chin</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Nislow</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Giaever</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Phillips</surname>
              <given-names>P.C.</given-names>
            </name>
            <name>
              <surname>Fink</surname>
              <given-names>G.R.</given-names>
            </name>
            <name>
              <surname>Gifford</surname>
              <given-names>D.K.</given-names>
            </name>
            <name>
              <surname>Boone</surname>
              <given-names>C.</given-names>
            </name>
          </person-group>
          <article-title>Genotype to phenotype: A complex problem</article-title>
          <source>Science</source>
          <year>2010</year>
          <volume>328</volume>
          <fpage>469</fpage>
          <pub-id pub-id-type="doi">10.1126/science.1189015</pub-id>
        </citation>
      </ref>
      <ref id="B5-biomolecules-02-00001">
        <label>5.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Baba</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Ara</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Hasegawa</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Takai</surname>
              <given-names>Y.</given-names>
            </name>
            <name>
              <surname>Okumura</surname>
              <given-names>Y.</given-names>
            </name>
            <name>
              <surname>Baba</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Datsenko</surname>
              <given-names>K.A.</given-names>
            </name>
            <name>
              <surname>Tomita</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Wanner</surname>
              <given-names>B.L.</given-names>
            </name>
            <name>
              <surname>Mori</surname>
              <given-names>H.</given-names>
            </name>
          </person-group>
          <article-title>Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: The Keio collection</article-title>
          <source>Mol. Syst. Biol.</source>
          <year>2006</year>
          <volume>2</volume>
          <comment>Article number: 2006.0008</comment>
        <pub-id pub-id-type="pmid">16788596</pub-id></citation>
      </ref>
      <ref id="B6-biomolecules-02-00001">
        <label>6.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>de Berardinis</surname>
              <given-names>V.</given-names>
            </name>
            <name>
              <surname>Vallenet</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Castelli</surname>
              <given-names>V.</given-names>
            </name>
            <name>
              <surname>Besnard</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Pinet</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Cruaud</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Samair</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Lechaplais</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Gyapay</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Richez</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Durot</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Kreimeyer</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>le Fevre</surname>
              <given-names>F.</given-names>
            </name>
            <name>
              <surname>Schachter</surname>
              <given-names>V.</given-names>
            </name>
            <name>
              <surname>Pezo</surname>
              <given-names>V.</given-names>
            </name>
            <name>
              <surname>Doring</surname>
              <given-names>V.</given-names>
            </name>
            <name>
              <surname>Scarpelli</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Medigue</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Cohen</surname>
              <given-names>G.N.</given-names>
            </name>
            <name>
              <surname>Marliere</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Salanoubat</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Weissenbach</surname>
              <given-names>J.</given-names>
            </name>
          </person-group>
          <article-title>A complete collection of single-gene deletion mutants of Acinetobacter baylyi ADP1</article-title>
          <source>Mol. Syst. Biol.</source>
          <year>2008</year>
          <volume>4</volume>
          <comment>Article number: 174</comment>
          <pub-id pub-id-type="doi">10.1038/msb.2008.10</pub-id>
        </citation>
      </ref>
      <ref id="B7-biomolecules-02-00001">
        <label>7.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Kato</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Hashimoto</surname>
              <given-names>M.</given-names>
            </name>
          </person-group>
          <article-title>Construction of consecutive deletions of the Escherichia coli chromosome</article-title>
          <source>Mol. Syst. Biol.</source>
          <year>2007</year>
          <volume>3</volume>
          <comment>Article number: 132</comment>
          <pub-id pub-id-type="doi">10.1038/msb4100174</pub-id>
        </citation>
      </ref>
      <ref id="B8-biomolecules-02-00001">
        <label>8.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Kobayashi</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Tsuda</surname>
              <given-names>Y.</given-names>
            </name>
            <name>
              <surname>Yoshida</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Takeuchi</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Utsunomiya</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Takahashi</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Suzuki</surname>
              <given-names>F.</given-names>
            </name>
          </person-group>
          <article-title>Bacterial sepsis and chemokines</article-title>
          <source>Curr. Drug Targets</source>
          <year>2006</year>
          <volume>7</volume>
          <fpage>119</fpage>
          <lpage>134</lpage>
          <pub-id pub-id-type="doi">10.2174/138945006775270169</pub-id>
        </citation>
      </ref>
      <ref id="B9-biomolecules-02-00001">
        <label>9.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Akerley</surname>
              <given-names>B.J.</given-names>
            </name>
            <name>
              <surname>Rubin</surname>
              <given-names>E.J.</given-names>
            </name>
            <name>
              <surname>Novick</surname>
              <given-names>V.L.</given-names>
            </name>
            <name>
              <surname>Amaya</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Judson</surname>
              <given-names>N.</given-names>
            </name>
            <name>
              <surname>Mekalanos</surname>
              <given-names>J.J.</given-names>
            </name>
          </person-group>
          <article-title>A genome-scale analysis for identification of genes required for growth or survival of Haemophilus influenzae</article-title>
          <source>Proc. Natl. Acad. Sci. USA</source>
          <year>2002</year>
          <volume>99</volume>
          <fpage>966</fpage>
          <lpage>971</lpage>
        <pub-id pub-id-type="doi">10.1073/pnas.012602299</pub-id><pub-id pub-id-type="pmid">11805338</pub-id></citation>
      </ref>
      <ref id="B10-biomolecules-02-00001">
        <label>10.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Gallagher</surname>
              <given-names>L.A.</given-names>
            </name>
            <name>
              <surname>Ramage</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Jacobs</surname>
              <given-names>M.A.</given-names>
            </name>
            <name>
              <surname>Kaul</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Brittnacher</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Manoil</surname>
              <given-names>C.</given-names>
            </name>
          </person-group>
          <article-title>A comprehensive transposon mutant library of Francisella novicida, a bioweapon surrogate</article-title>
          <source>Proc. Natl. Acad. Sci. USA</source>
          <year>2007</year>
          <volume>104</volume>
          <fpage>1009</fpage>
          <lpage>1014</lpage>
        <pub-id pub-id-type="doi">10.1073/pnas.0606713104</pub-id><pub-id pub-id-type="pmid">17215359</pub-id></citation>
      </ref>
      <ref id="B11-biomolecules-02-00001">
        <label>11.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Gerdes</surname>
              <given-names>S.Y.</given-names>
            </name>
            <name>
              <surname>Scholle</surname>
              <given-names>M.D.</given-names>
            </name>
            <name>
              <surname>Campbell</surname>
              <given-names>J.W.</given-names>
            </name>
            <name>
              <surname>Balazsi</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Ravasz</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Daugherty</surname>
              <given-names>M.D.</given-names>
            </name>
            <name>
              <surname>Somera</surname>
              <given-names>A.L.</given-names>
            </name>
            <name>
              <surname>Kyrpides</surname>
              <given-names>N.C.</given-names>
            </name>
            <name>
              <surname>Anderson</surname>
              <given-names>I.</given-names>
            </name>
            <name>
              <surname>Gelfand</surname>
              <given-names>M.S.</given-names>
            </name>
            <name>
              <surname>Bhattacharya</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Kapatral</surname>
              <given-names>V.</given-names>
            </name>
            <name>
              <surname>D’Souza</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Baev</surname>
              <given-names>M.V.</given-names>
            </name>
            <name>
              <surname>Grechkin</surname>
              <given-names>Y.</given-names>
            </name>
            <name>
              <surname>Mseeh</surname>
              <given-names>F.</given-names>
            </name>
            <name>
              <surname>Fonstein</surname>
              <given-names>M.Y.</given-names>
            </name>
            <name>
              <surname>Overbeek</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Barabasi</surname>
              <given-names>A.L.</given-names>
            </name>
            <name>
              <surname>Oltvai</surname>
              <given-names>Z.N.</given-names>
            </name>
            <name>
              <surname>Osterman</surname>
              <given-names>A.L.</given-names>
            </name>
          </person-group>
          <article-title>Experimental determination and system level analysis of essential genes in Escherichia coli MG1655</article-title>
          <source>J. Bacteriol.</source>
          <year>2003</year>
          <volume>185</volume>
          <fpage>5673</fpage>
          <lpage>5684</lpage>
        <pub-id pub-id-type="doi">10.1128/JB.185.19.5673-5684.2003</pub-id><pub-id pub-id-type="pmid">13129938</pub-id></citation>
      </ref>
      <ref id="B12-biomolecules-02-00001">
        <label>12.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Glass</surname>
              <given-names>J.I.</given-names>
            </name>
            <name>
              <surname>Assad-Garcia</surname>
              <given-names>N.</given-names>
            </name>
            <name>
              <surname>Alperovich</surname>
              <given-names>N.</given-names>
            </name>
            <name>
              <surname>Yooseph</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Lewis</surname>
              <given-names>M.R.</given-names>
            </name>
            <name>
              <surname>Maruf</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Hutchison</surname>
              <given-names>C.A.</given-names>
              <suffix>III</suffix>
            </name>
            <name>
              <surname>Smith</surname>
              <given-names>H.O.</given-names>
            </name>
            <name>
              <surname>Venter</surname>
              <given-names>J.C.</given-names>
            </name>
          </person-group>
          <article-title>Essential genes of a minimal bacterium</article-title>
          <source>Proc. Natl. Acad. Sci. USA</source>
          <year>2006</year>
          <volume>103</volume>
          <fpage>425</fpage>
          <lpage>430</lpage>
        <pub-id pub-id-type="doi">10.1073/pnas.0510013103</pub-id><pub-id pub-id-type="pmid">16407165</pub-id></citation>
      </ref>
      <ref id="B13-biomolecules-02-00001">
        <label>13.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Hutchison</surname>
              <given-names>C.A.</given-names>
            </name>
            <name>
              <surname>Peterson</surname>
              <given-names>S.N.</given-names>
            </name>
            <name>
              <surname>Gill</surname>
              <given-names>S.R.</given-names>
            </name>
            <name>
              <surname>Cline</surname>
              <given-names>R.T.</given-names>
            </name>
            <name>
              <surname>White</surname>
              <given-names>O.</given-names>
            </name>
            <name>
              <surname>Fraser</surname>
              <given-names>C.M.</given-names>
            </name>
            <name>
              <surname>Smith</surname>
              <given-names>H.O.</given-names>
            </name>
            <name>
              <surname>Venter</surname>
              <given-names>J.C.</given-names>
            </name>
          </person-group>
          <article-title>Global transposon mutagenesis and a minimal Mycoplasma genome</article-title>
          <source>Science</source>
          <year>1999</year>
          <volume>286</volume>
          <fpage>2165</fpage>
          <lpage>2169</lpage>
        <pub-id pub-id-type="doi">10.1126/science.286.5447.2165</pub-id><pub-id pub-id-type="pmid">10591650</pub-id></citation>
      </ref>
      <ref id="B14-biomolecules-02-00001">
        <label>14.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Jacobs</surname>
              <given-names>M.A.</given-names>
            </name>
            <name>
              <surname>Alwood</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Thaipisuttikul</surname>
              <given-names>I.</given-names>
            </name>
            <name>
              <surname>Spencer</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Haugen</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Ernst</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Will</surname>
              <given-names>O.</given-names>
            </name>
            <name>
              <surname>Kaul</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Raymond</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Levy</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Chun-Rong</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Guenthner</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Bovee</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Olson</surname>
              <given-names>M.V.</given-names>
            </name>
            <name>
              <surname>Manoil</surname>
              <given-names>C.</given-names>
            </name>
          </person-group>
          <article-title>Comprehensive transposon mutant library of Pseudomonas aeruginosa</article-title>
          <source>Proc. Natl. Acad. Sci. USA</source>
          <year>2003</year>
          <volume>100</volume>
          <fpage>14339</fpage>
          <lpage>14344</lpage>
        <pub-id pub-id-type="doi">10.1073/pnas.2036282100</pub-id><pub-id pub-id-type="pmid">14617778</pub-id></citation>
      </ref>
      <ref id="B15-biomolecules-02-00001">
        <label>15.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Liberati</surname>
              <given-names>N.T.</given-names>
            </name>
            <name>
              <surname>Urbach</surname>
              <given-names>J.M.</given-names>
            </name>
            <name>
              <surname>Miyata</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Lee</surname>
              <given-names>D.G.</given-names>
            </name>
            <name>
              <surname>Drenkard</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Wu</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Villanueva</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Wei</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Ausubel</surname>
              <given-names>F.M.</given-names>
            </name>
          </person-group>
          <article-title>An ordered, nonredundant library of Pseudomonas aeruginosa strain PA14 transposon insertion mutants</article-title>
          <source>Proc. Natl. Acad. Sci. USA</source>
          <year>2006</year>
          <volume>103</volume>
          <fpage>2833</fpage>
          <lpage>2838</lpage>
        <pub-id pub-id-type="doi">10.1073/pnas.0511100103</pub-id><pub-id pub-id-type="pmid">16477005</pub-id></citation>
      </ref>
      <ref id="B16-biomolecules-02-00001">
        <label>16.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Forsyth</surname>
              <given-names>R.A.</given-names>
            </name>
            <name>
              <surname>Haselbeck</surname>
              <given-names>R.J.</given-names>
            </name>
            <name>
              <surname>Ohlsen</surname>
              <given-names>K.L.</given-names>
            </name>
            <name>
              <surname>Yamamoto</surname>
              <given-names>R.T.</given-names>
            </name>
            <name>
              <surname>Xu</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Trawick</surname>
              <given-names>J.D.</given-names>
            </name>
            <name>
              <surname>Wall</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Brown-Driver</surname>
              <given-names>V.</given-names>
            </name>
            <name>
              <surname>Froelich</surname>
              <given-names>J.M.</given-names>
            </name>
            <name>
              <surname>C</surname>
              <given-names>K.G.</given-names>
            </name>
            <name>
              <surname>King</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>McCarthy</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Malone</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Misiner</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Robbins</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Tan</surname>
              <given-names>Z.</given-names>
            </name>
            <name>
              <surname>Zhu</surname>
              <given-names>Z.Y.</given-names>
            </name>
            <name>
              <surname>Carr</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Mosca</surname>
              <given-names>D.A.</given-names>
            </name>
            <name>
              <surname>Zamudio</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Foulkes</surname>
              <given-names>J.G.</given-names>
            </name>
            <name>
              <surname>Zyskind</surname>
              <given-names>J.W.</given-names>
            </name>
          </person-group>
          <article-title>A genome-wide strategy for the identification of essential genes in Staphylococcus aureus</article-title>
          <source>Mol. Microbiol.</source>
          <year>2002</year>
          <volume>43</volume>
          <fpage>1387</fpage>
          <lpage>1400</lpage>
        <pub-id pub-id-type="doi">10.1046/j.1365-2958.2002.02832.x</pub-id><pub-id pub-id-type="pmid">11952893</pub-id></citation>
      </ref>
      <ref id="B17-biomolecules-02-00001">
        <label>17.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Ji</surname>
              <given-names>Y.</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Van Horn</surname>
              <given-names>S.F.</given-names>
            </name>
            <name>
              <surname>Warren</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Woodnutt</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Burnham</surname>
              <given-names>M.K.</given-names>
            </name>
            <name>
              <surname>Rosenberg</surname>
              <given-names>M.</given-names>
            </name>
          </person-group>
          <article-title>Identification of critical staphylococcal genes using conditional phenotypes generated by antisense RNA</article-title>
          <source>Science</source>
          <year>2001</year>
          <volume>293</volume>
          <fpage>2266</fpage>
          <lpage>2269</lpage>
        <pub-id pub-id-type="doi">10.1126/science.1063566</pub-id><pub-id pub-id-type="pmid">11567142</pub-id></citation>
      </ref>
      <ref id="B18-biomolecules-02-00001">
        <label>18.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Bijlsma</surname>
              <given-names>J.J.</given-names>
            </name>
            <name>
              <surname>Burghout</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Kloosterman</surname>
              <given-names>T.G.</given-names>
            </name>
            <name>
              <surname>Bootsma</surname>
              <given-names>H.J.</given-names>
            </name>
            <name>
              <surname>de Jong</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Hermans</surname>
              <given-names>P.W.</given-names>
            </name>
            <name>
              <surname>Kuipers</surname>
              <given-names>O.P.</given-names>
            </name>
          </person-group>
          <article-title>Development of genomic array footprinting for identification of conditionally essential genes in Streptococcus pneumoniae</article-title>
          <source>Appl. Environ. Microbiol.</source>
          <year>2007</year>
          <volume>73</volume>
          <fpage>1514</fpage>
          <lpage>1524</lpage>
        <pub-id pub-id-type="doi">10.1128/AEM.01900-06</pub-id><pub-id pub-id-type="pmid">17261526</pub-id></citation>
      </ref>
      <ref id="B19-biomolecules-02-00001">
        <label>19.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Daniels</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Godoy</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Duque</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Molina-Henares</surname>
              <given-names>M.A.</given-names>
            </name>
            <name>
              <surname>de la Torre</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>del Arco</surname>
              <given-names>J.M.</given-names>
            </name>
            <name>
              <surname>Herrera</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Segura</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Guazzaroni</surname>
              <given-names>M.E.</given-names>
            </name>
            <name>
              <surname>Ferrer</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Ramos</surname>
              <given-names>J.L.</given-names>
            </name>
          </person-group>
          <article-title>Global regulation of food supply by Pseudomonas putida DOT-T1E</article-title>
          <source>J. Bacteriol.</source>
          <year>2010</year>
          <volume>192</volume>
          <fpage>2169</fpage>
          <lpage>2181</lpage>
        <pub-id pub-id-type="doi">10.1128/JB.01129-09</pub-id><pub-id pub-id-type="pmid">20139187</pub-id></citation>
      </ref>
      <ref id="B20-biomolecules-02-00001">
        <label>20.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Joyce</surname>
              <given-names>A.R.</given-names>
            </name>
            <name>
              <surname>Reed</surname>
              <given-names>J.L.</given-names>
            </name>
            <name>
              <surname>White</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Edwards</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Osterman</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Baba</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Mori</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Lesley</surname>
              <given-names>S.A.</given-names>
            </name>
            <name>
              <surname>Palsson</surname>
              <given-names>B.O.</given-names>
            </name>
            <name>
              <surname>Agarwalla</surname>
              <given-names>S.</given-names>
            </name>
          </person-group>
          <article-title>Experimental and computational assessment of conditionally essential genes in Escherichia coli</article-title>
          <source>J. Bacteriol.</source>
          <year>2006</year>
          <volume>188</volume>
          <fpage>8259</fpage>
          <lpage>8271</lpage>
        <pub-id pub-id-type="doi">10.1128/JB.00740-06</pub-id><pub-id pub-id-type="pmid">17012394</pub-id></citation>
      </ref>
      <ref id="B21-biomolecules-02-00001">
        <label>21.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Molina-Henares</surname>
              <given-names>M.A.</given-names>
            </name>
            <name>
              <surname>de la Torre</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Garcia-Salamanca</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Molina-Henares</surname>
              <given-names>A.J.</given-names>
            </name>
            <name>
              <surname>Herrera</surname>
              <given-names>M.C.</given-names>
            </name>
            <name>
              <surname>Ramos</surname>
              <given-names>J.L.</given-names>
            </name>
            <name>
              <surname>Duque</surname>
              <given-names>E.</given-names>
            </name>
          </person-group>
          <article-title>Identification of conditionally essential genes for growth of Pseudomonas putida KT2440 on minimal medium through the screening of a genome-wide mutant library</article-title>
          <source>Environ. Microbiol.</source>
          <year>2010</year>
          <volume>12</volume>
          <fpage>1468</fpage>
          <lpage>1485</lpage>
        <pub-id pub-id-type="pmid">20158506</pub-id></citation>
      </ref>
      <ref id="B22-biomolecules-02-00001">
        <label>22.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Tong</surname>
              <given-names>X.</given-names>
            </name>
            <name>
              <surname>Campbell</surname>
              <given-names>J.W.</given-names>
            </name>
            <name>
              <surname>Balazsi</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Kay</surname>
              <given-names>K.A.</given-names>
            </name>
            <name>
              <surname>Wanner</surname>
              <given-names>B.L.</given-names>
            </name>
            <name>
              <surname>Gerdes</surname>
              <given-names>S.Y.</given-names>
            </name>
            <name>
              <surname>Oltvai</surname>
              <given-names>Z.N.</given-names>
            </name>
          </person-group>
          <article-title>Genome-scale identification of conditionally essential genes in E. coli by DNA microarrays</article-title>
          <source>Biochem. Biophys. Res. Commun.</source>
          <year>2004</year>
          <volume>322</volume>
          <fpage>347</fpage>
          <lpage>354</lpage>
          <pub-id pub-id-type="doi">10.1016/j.bbrc.2004.07.110</pub-id>
        </citation>
      </ref>
      <ref id="B23-biomolecules-02-00001">
        <label>23.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Hillenmeyer</surname>
              <given-names>M.E.</given-names>
            </name>
            <name>
              <surname>Fung</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Wildenhain</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Pierce</surname>
              <given-names>S.E.</given-names>
            </name>
            <name>
              <surname>Hoon</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Lee</surname>
              <given-names>W.</given-names>
            </name>
            <name>
              <surname>Proctor</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>St Onge</surname>
              <given-names>R.P.</given-names>
            </name>
            <name>
              <surname>Tyers</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Koller</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Altman</surname>
              <given-names>R.B.</given-names>
            </name>
            <name>
              <surname>Davis</surname>
              <given-names>R.W.</given-names>
            </name>
            <name>
              <surname>Nislow</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Giaever</surname>
              <given-names>G.</given-names>
            </name>
          </person-group>
          <article-title>The chemical genomic portrait of yeast: Uncovering a phenotype for all genes</article-title>
          <source>Science</source>
          <year>2008</year>
          <volume>320</volume>
          <fpage>362</fpage>
          <lpage>365</lpage>
        <pub-id pub-id-type="doi">10.1126/science.1150021</pub-id><pub-id pub-id-type="pmid">18420932</pub-id></citation>
      </ref>
      <ref id="B24-biomolecules-02-00001">
        <label>24.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Deng</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Deng</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Su</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Lin</surname>
              <given-names>X.</given-names>
            </name>
            <name>
              <surname>Wei</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Minai</surname>
              <given-names>A.A.</given-names>
            </name>
            <name>
              <surname>Hassett</surname>
              <given-names>D.J.</given-names>
            </name>
            <name>
              <surname>Lu</surname>
              <given-names>L.J.</given-names>
            </name>
          </person-group>
          <article-title>Investigating the predictability of essential genes across distantly related organisms using an integrative approach</article-title>
          <source>Nucleic Acids Res.</source>
          <year>2010</year>
          <volume>39</volume>
          <fpage>795</fpage>
          <lpage>807</lpage>
        <pub-id pub-id-type="pmid">20870748</pub-id></citation>
      </ref>
      <ref id="B25-biomolecules-02-00001">
        <label>25.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Hashimoto</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Ichimura</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Mizoguchi</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Tanaka</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Fujimitsu</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Keyamura</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Ote</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Yamakawa</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Yamazaki</surname>
              <given-names>Y.</given-names>
            </name>
            <name>
              <surname>Mori</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Katayama</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Kato</surname>
              <given-names>J.</given-names>
            </name>
          </person-group>
          <article-title>Cell size and nucleoid organization of engineered Escherichia coli cells with a reduced genome</article-title>
          <source>Mol. Microbiol.</source>
          <year>2005</year>
          <volume>55</volume>
          <fpage>137</fpage>
          <lpage>149</lpage>
        <pub-id pub-id-type="pmid">15612923</pub-id></citation>
      </ref>
      <ref id="B26-biomolecules-02-00001">
        <label>26.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Giaever</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Chu</surname>
              <given-names>A.M.</given-names>
            </name>
            <name>
              <surname>Ni</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Connelly</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Riles</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Veronneau</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Dow</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Lucau-Danila</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Anderson</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Andre</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Arkin</surname>
              <given-names>A.P.</given-names>
            </name>
            <name>
              <surname>Astromoff</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>El-Bakkoury</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Bangham</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Benito</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Brachat</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Campanaro</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Curtiss</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Davis</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Deutschbauer</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Entian</surname>
              <given-names>K.D.</given-names>
            </name>
            <name>
              <surname>Flaherty</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Foury</surname>
              <given-names>F.</given-names>
            </name>
            <name>
              <surname>Garfinkel</surname>
              <given-names>D.J.</given-names>
            </name>
            <name>
              <surname>Gerstein</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Gotte</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Guldener</surname>
              <given-names>U.</given-names>
            </name>
            <name>
              <surname>Hegemann</surname>
              <given-names>J.H.</given-names>
            </name>
            <name>
              <surname>Hempel</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Herman</surname>
              <given-names>Z.</given-names>
            </name>
            <name>
              <surname>Jaramillo</surname>
              <given-names>D.F.</given-names>
            </name>
            <name>
              <surname>Kelly</surname>
              <given-names>D.E.</given-names>
            </name>
            <name>
              <surname>Kelly</surname>
              <given-names>S.L.</given-names>
            </name>
            <name>
              <surname>Kotter</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>LaBonte</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Lamb</surname>
              <given-names>D.C.</given-names>
            </name>
            <name>
              <surname>Lan</surname>
              <given-names>N.</given-names>
            </name>
            <name>
              <surname>Liang</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Liao</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Liu</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Luo</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Lussier</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Mao</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Menard</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Ooi</surname>
              <given-names>S.L.</given-names>
            </name>
            <name>
              <surname>Revuelta</surname>
              <given-names>J.L.</given-names>
            </name>
            <name>
              <surname>Roberts</surname>
              <given-names>C.J.</given-names>
            </name>
            <name>
              <surname>Rose</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Ross-Macdonald</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Scherens</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Schimmack</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Shafer</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Shoemaker</surname>
              <given-names>D.D.</given-names>
            </name>
            <name>
              <surname>Sookhai-Mahadeo</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Storms</surname>
              <given-names>R.K.</given-names>
            </name>
            <name>
              <surname>Strathern</surname>
              <given-names>J.N.</given-names>
            </name>
            <name>
              <surname>Valle</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Voet</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Volckaert</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>C.Y.</given-names>
            </name>
            <name>
              <surname>Ward</surname>
              <given-names>T.R.</given-names>
            </name>
            <name>
              <surname>Wilhelmy</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Winzeler</surname>
              <given-names>E.A.</given-names>
            </name>
            <name>
              <surname>Yang</surname>
              <given-names>Y.</given-names>
            </name>
            <name>
              <surname>Yen</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Youngman</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Yu</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Bussey</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Boeke</surname>
              <given-names>J.D.</given-names>
            </name>
            <name>
              <surname>Snyder</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Philippsen</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Davis</surname>
              <given-names>R.W.</given-names>
            </name>
            <name>
              <surname>Johnston</surname>
              <given-names>M.</given-names>
            </name>
          </person-group>
          <article-title>Functional profiling of the Saccharomyces cerevisiae genome</article-title>
          <source>Nature</source>
          <year>2002</year>
          <volume>418</volume>
          <fpage>387</fpage>
          <lpage>391</lpage>
        <pub-id pub-id-type="doi">10.1038/nature00935</pub-id><pub-id pub-id-type="pmid">12140549</pub-id></citation>
      </ref>
      <ref id="B27-biomolecules-02-00001">
        <label>27.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Barrett</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Troup</surname>
              <given-names>D.B.</given-names>
            </name>
            <name>
              <surname>Wilhite</surname>
              <given-names>S.E.</given-names>
            </name>
            <name>
              <surname>Ledoux</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Rudnev</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Evangelista</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Kim</surname>
              <given-names>I.F.</given-names>
            </name>
            <name>
              <surname>Soboleva</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Tomashevsky</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Edgar</surname>
              <given-names>R.</given-names>
            </name>
          </person-group>
          <article-title>NCBI GEO: Mining tens of millions of expression profiles—Database and tools update</article-title>
          <source>Nucleic Acids Res.</source>
          <year>2007</year>
          <volume>35</volume>
          <fpage>D760</fpage>
          <lpage>D765</lpage>
        <pub-id pub-id-type="doi">10.1093/nar/gkl887</pub-id><pub-id pub-id-type="pmid">17099226</pub-id></citation>
      </ref>
      <ref id="B28-biomolecules-02-00001">
        <label>28.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Parkinson</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Kapushesky</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Shojatalab</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Abeygunawardena</surname>
              <given-names>N.</given-names>
            </name>
            <name>
              <surname>Coulson</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Farne</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Holloway</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Kolesnykov</surname>
              <given-names>N.</given-names>
            </name>
            <name>
              <surname>Lilja</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Lukk</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Mani</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Rayner</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Sharma</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>William</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Sarkans</surname>
              <given-names>U.</given-names>
            </name>
            <name>
              <surname>Brazma</surname>
              <given-names>A.</given-names>
            </name>
          </person-group>
          <article-title>ArrayExpress—A public database of microarray experiments and gene expression profiles</article-title>
          <source>Nucleic Acids Res.</source>
          <year>2007</year>
          <volume>35</volume>
          <fpage>D747</fpage>
          <lpage>D750</lpage>
        <pub-id pub-id-type="doi">10.1093/nar/gkl995</pub-id><pub-id pub-id-type="pmid">17132828</pub-id></citation>
      </ref>
      <ref id="B29-biomolecules-02-00001">
        <label>29.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Gasch</surname>
              <given-names>A.P.</given-names>
            </name>
            <name>
              <surname>Spellman</surname>
              <given-names>P.T.</given-names>
            </name>
            <name>
              <surname>Kao</surname>
              <given-names>C.M.</given-names>
            </name>
            <name>
              <surname>Carmel-Harel</surname>
              <given-names>O.</given-names>
            </name>
            <name>
              <surname>Eisen</surname>
              <given-names>M.B.</given-names>
            </name>
            <name>
              <surname>Storz</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Botstein</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Brown</surname>
              <given-names>P.O.</given-names>
            </name>
          </person-group>
          <article-title>Genomic expression programs in the response of yeast cells to environmental changes</article-title>
          <source>Mol. Biol. Cell</source>
          <year>2000</year>
          <volume>11</volume>
          <fpage>4241</fpage>
          <lpage>4257</lpage>
        <pub-id pub-id-type="pmid">11102521</pub-id></citation>
      </ref>
      <ref id="B30-biomolecules-02-00001">
        <label>30.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Scott</surname>
              <given-names>M.S.</given-names>
            </name>
            <name>
              <surname>Barton</surname>
              <given-names>G.J.</given-names>
            </name>
          </person-group>
          <article-title>Probabilistic prediction and ranking of human protein-protein interactions</article-title>
          <source>BMC Bioinformat.</source>
          <year>2007</year>
          <volume>8</volume>
          <comment>Article number: 239</comment>
          <pub-id pub-id-type="doi">10.1186/1471-2105-8-239</pub-id>
        </citation>
      </ref>
      <ref id="B31-biomolecules-02-00001">
        <label>31.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Chawla</surname>
              <given-names>N.V.</given-names>
            </name>
            <name>
              <surname>Japkowicz</surname>
              <given-names>N.</given-names>
            </name>
            <name>
              <surname>Kolcz</surname>
              <given-names>A.</given-names>
            </name>
          </person-group>
          <article-title>Editorial: Special issue on learning from imbalanced data sets</article-title>
          <source>SIGKDD Explor.</source>
          <year>2004</year>
          <volume>6</volume>
          <fpage>1</fpage>
          <lpage>6</lpage>
        </citation>
      </ref>
      <ref id="B32-biomolecules-02-00001">
        <label>32.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Zhang</surname>
              <given-names>C.T.</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>R.</given-names>
            </name>
          </person-group>
          <article-title>Gene essentiality analysis based on DEG, a database of essential genes</article-title>
          <source>Methods Mol. Biol.</source>
          <year>2008</year>
          <volume>416</volume>
          <fpage>391</fpage>
          <lpage>400</lpage>
          <pub-id pub-id-type="doi">10.1007/978-1-59745-321-9_27</pub-id>
        </citation>
      </ref>
    </ref-list>
  </back>
</article>
