Self-Organizing Map for Characterizing Heterogeneous Nucleotide and Amino Acid Sequence Motifs
Abstract
1. Introduction
2. Distance or Similarity between Two Vectors
2.1. Distance for Homologous Input Sequences
2.2. Distance for Non-Homologous Sequences
3. The Algorithmic Details of Self-Organizing Map (SOM)
3.1. Training Data
3.2. SOM Grid Size and Initialization
3.3. Update SOM
3.3.1. Identify the Winning Node
3.3.2. Learning by Revising the Winning Node and Its Neighbors: Numeric Vectors
3.3.3. Learning by Revising the Winning Node and its Neighbors: Fixed-Length Sequences
4. The Fit of SOM to Input Data
5. Software Implementing SOM with PWM
6. Conclusions
Acknowledgments
Conflicts of Interest
References
Seq. Pair | Ns | Nv | Ni | DIE | DSE |
---|---|---|---|---|---|
S1 vs. S2 | 9 | 4 | 87 | 0.1451 | 0.1464 |
S1 vs. S3 | 40 | 30 | 30 | Inapplicable | 2.0915 |
S2 vs. S3 | 20 | 10 | 70 | 0.4024 | 0.4116 |
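The DIE column can be reproduced with the Kimura (1980) two-parameter (K80) formula, d = −0.5 ln(1 − 2P − Q) − 0.25 ln(1 − 2Q). This requires reading Ns, Nv and Ni as the numbers of transitional, transversional and identical sites among 100 aligned sites, with P = Ns/100 and Q = Nv/100. That reading is an assumption, because the table caption is not part of this excerpt. The function name `k80_distance` is hypothetical.

The sketch returns `None` when a logarithm's argument is not positive. This is the case marked "Inapplicable" for S1 vs. S3, where 1 − 2P − Q = 1 − 0.8 − 0.3 < 0. The DSE column is not reproduced because the method behind it is not described here.

```python
import math
from typing import Optional


def k80_distance(ns: int, nv: int, ni: int) -> Optional[float]:
    """Kimura two-parameter (K80) distance computed from site counts.

    Assumed meanings: ns = transitional differences, nv = transversional
    differences, ni = identical sites. Returns None when K80 is undefined.
    """
    n = ns + nv + ni
    p = ns / n  # proportion of sites differing by a transition
    q = nv / n  # proportion of sites differing by a transversion
    a = 1.0 - 2.0 * p - q
    b = 1.0 - 2.0 * q
    if a <= 0.0 or b <= 0.0:
        return None  # logarithm of a non-positive number: distance inapplicable
    return -0.5 * math.log(a) - 0.25 * math.log(b)


pairs = {
    "S1 vs. S2": (9, 4, 87),
    "S1 vs. S3": (40, 30, 30),
    "S2 vs. S3": (20, 10, 70),
}
for name, counts in pairs.items():
    d = k80_distance(*counts)
    print(name, "Inapplicable" if d is None else f"{d:.4f}")
```

Under this reading, the printed values match the DIE column: 0.1451, Inapplicable and 0.4024.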
(a) | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
A | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
C | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
G | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
T | 0 | 0 | 0 | 0 | 1 | 1 | 0 |

(b) | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
A | 1.695 | −4.963 | −4.963 | −4.963 | −4.963 | −4.963 | 1.695 |
C | −4.379 | 2.280 | 2.280 | −4.379 | −4.379 | −4.379 | −4.379 |
G | −4.379 | −4.379 | −4.379 | 2.280 | −4.379 | −4.379 | −4.379 |
T | −4.963 | −4.963 | −4.963 | −4.963 | 1.695 | 1.695 | −4.963 |
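The (b) values are consistent with a log2-odds position weight matrix built from the (a) counts: PWM(b, j) = log2[((n_bj + α) / (N + 4α)) / p_b]. Here n_bj is the count of base b at site j, N is the number of sequences, α is a per-cell pseudocount and p_b is the background frequency of b.

This excerpt does not state α or p_b. Back-calculating from the table, α = 0.01 with background frequencies of 0.3 for A and T and 0.2 for C and G reproduces every entry to three decimals. Treat these parameters as inferred, not given. The name `pwm_from_counts` is hypothetical.

```python
import math

BASES = "ACGT"
# Inferred from the table values, not stated in the source.
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
PSEUDOCOUNT = 0.01


def pwm_from_counts(counts, background=BACKGROUND, alpha=PSEUDOCOUNT):
    """Convert a {base: [count per site]} matrix into log2-odds PWM scores."""
    length = len(counts["A"])
    pwm = {b: [] for b in BASES}
    for j in range(length):
        n = sum(counts[b][j] for b in BASES)  # number of sequences at site j
        for b in BASES:
            freq = (counts[b][j] + alpha) / (n + 4 * alpha)  # smoothed frequency
            pwm[b].append(math.log2(freq / background[b]))
    return pwm


# Counts for the single sequence ACCGTTA, as in panel (a) above
counts = {
    "A": [1, 0, 0, 0, 0, 0, 1],
    "C": [0, 1, 1, 0, 0, 0, 0],
    "G": [0, 0, 0, 1, 0, 0, 0],
    "T": [0, 0, 0, 0, 1, 1, 0],
}
for base, row in pwm_from_counts(counts).items():
    print(base, " ".join(f"{v:.3f}" for v in row))
```

Running this prints the four rows of panel (b). With the same assumed α and background, passing in the next table's panel (a) counts also yields that table's panel (b).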
(a) | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
A | 1 | 0 | 0 | 1 | 0 | 0 | 2 |
C | 0 | 2 | 2 | 0 | 0 | 0 | 0 |
G | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
T | 0 | 0 | 0 | 0 | 2 | 2 | 0 |

(b) | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
A | 0.723 | −5.935 | −5.935 | 0.723 | −5.935 | −5.935 | 1.716 |
C | −5.350 | 2.301 | 2.301 | −5.350 | −5.350 | −5.350 | −5.350 |
G | 1.308 | −5.350 | −5.350 | 1.308 | −5.350 | −5.350 | −5.350 |
T | −5.935 | −5.935 | −5.935 | −5.935 | 1.716 | 1.716 | −5.935 |
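One way to use such matrices when a SOM is trained on fixed-length sequences is to treat each PWM as a node. Each sequence is then assigned to the node on which it scores highest, the score being the sum of the PWM entries selected by the sequence's bases. This additive scoring rule, the labels `node1`/`node2` and the names `score` and `winning_node` are assumptions for illustration, not the article's stated procedure.

The count matrix above is consistent with, for example, the pair ACCGTTA and GCCATTA (ACCATTA and GCCGTTA fit equally well). The first of these is also the single sequence behind the preceding count matrix. The sketch scores both sequences against the two (b) panels, copied verbatim from the tables.

```python
# PWM values copied from panels (b) of the two tables above (hypothetical node labels)
NODE_PWMS = {
    "node1": {  # built from one sequence
        "A": [1.695, -4.963, -4.963, -4.963, -4.963, -4.963, 1.695],
        "C": [-4.379, 2.280, 2.280, -4.379, -4.379, -4.379, -4.379],
        "G": [-4.379, -4.379, -4.379, 2.280, -4.379, -4.379, -4.379],
        "T": [-4.963, -4.963, -4.963, -4.963, 1.695, 1.695, -4.963],
    },
    "node2": {  # built from two sequences
        "A": [0.723, -5.935, -5.935, 0.723, -5.935, -5.935, 1.716],
        "C": [-5.350, 2.301, 2.301, -5.350, -5.350, -5.350, -5.350],
        "G": [1.308, -5.350, -5.350, 1.308, -5.350, -5.350, -5.350],
        "T": [-5.935, -5.935, -5.935, -5.935, 1.716, 1.716, -5.935],
    },
}


def score(seq, pwm):
    """Additive log-odds score: sum the PWM entry for each base at its site."""
    return sum(pwm[base][j] for j, base in enumerate(seq))


def winning_node(seq, nodes):
    """Return the name of the node whose PWM gives seq the highest score."""
    return max(nodes, key=lambda name: score(seq, nodes[name]))


for seq in ("ACCGTTA", "GCCATTA"):
    scores = {name: round(score(seq, pwm), 3) for name, pwm in NODE_PWMS.items()}
    print(seq, scores, "->", winning_node(seq, NODE_PWMS))
```

ACCGTTA scores 13.62 on node1 and 11.781 on node2. GCCATTA scores 0.303 on node1 and 11.781 on node2. Under this rule, the two sequences are therefore assigned to different nodes.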
© 2017 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Xia, X. Self-Organizing Map for Characterizing Heterogeneous Nucleotide and Amino Acid Sequence Motifs. Computation 2017, 5, 43. https://doi.org/10.3390/computation5040043