Microorganisms 2013, 1(1), 137-157; doi:10.3390/microorganisms1010137
Review

A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM)

Yuki Iwasaki 1,2, Takashi Abe 1,3,*, Kennosuke Wada 1, Yoshiko Wada 1,4 and Toshimichi Ikemura 1
Received: 26 September 2013; in revised form: 5 November 2013 / Accepted: 8 November 2013 / Published: 20 November 2013
(This article belongs to the Special Issue Advances and New Perspectives in Microbial Research)
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract: With the remarkable increase in genomic sequence data from microorganisms, novel tools are needed for comprehensive analysis of the big sequence data now available. The self-organizing map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition, on a single map. By modifying the conventional SOM, we developed the batch-learning SOM (BLSOM), which classifies sequence fragments (e.g., 1 kb) according to phylotype, depending solely on oligonucleotide composition. Metagenomic studies of uncultivable microorganisms in clinical and environmental samples should allow extensive surveys of genes important in the life sciences. Because it clusters fragmentary sequences by phylotype from oligonucleotide composition alone, BLSOM is well suited to the phylogenetic assignment of metagenomic sequences. We first constructed oligonucleotide BLSOMs for all available sequences from genomes of known species; by mapping metagenomic sequences onto these large-scale BLSOMs, we can predict the phylotype of each metagenomic sequence, revealing the community structure of uncultured microorganisms, including viruses. BLSOM has also shown that influenza viruses isolated from humans and birds differ clearly in oligonucleotide composition. Based on this host-dependent oligonucleotide composition, we have proposed strategies for predicting directional changes in virus sequences and for surveillance of potentially hazardous strains that may be introduced into humans from non-human sources.
Keywords: metagenome; oligonucleotide composition; influenza virus; big data; peptide composition; bioinformatics; SOM; genome signature; microbial community
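
The input representation described in the abstract, oligonucleotide composition of fixed-length fragments, can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names, the tetranucleotide default (k = 4), and the use of non-overlapping windows are assumptions made for the example, and the BLSOM training step itself is not shown. Only the 1 kb fragment length comes from the abstract.

```python
from itertools import product


def kmer_index(k):
    """Map every DNA k-mer over A, C, G, T to a fixed vector position."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def composition(fragment, k=4):
    """Return the normalized k-mer frequency vector of one fragment.

    k=4 (tetranucleotides) is an assumed default for illustration.
    Windows containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    index = kmer_index(k)
    counts = [0] * len(index)
    seq = fragment.upper()
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:
            counts[pos] += 1
    total = sum(counts)
    # An all-zero vector is returned when no valid window exists.
    return [c / total for c in counts] if total else [0.0] * len(counts)


def fragments(sequence, size=1000):
    """Split a sequence into non-overlapping fragments of `size` bases.

    A trailing piece shorter than `size` is dropped (an assumption).
    """
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, size)]
```

Each 1 kb fragment then becomes one 256-dimensional vector (for k = 4), and a set of such vectors is the kind of high-dimensional data a SOM would cluster and display on a single map.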


MDPI and ACS Style

Iwasaki, Y.; Abe, T.; Wada, K.; Wada, Y.; Ikemura, T. A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM). Microorganisms 2013, 1, 137-157.

AMA Style

Iwasaki Y, Abe T, Wada K, Wada Y, Ikemura T. A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM). Microorganisms. 2013; 1(1):137-157.

Chicago/Turabian Style

Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi. 2013. "A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM)." Microorganisms 1, no. 1: 137-157.

Microorganisms, EISSN 2076-2607. Published by MDPI AG, Basel, Switzerland.