Open Access Article

Revealing the Presence of a Symbolic Sequence Representing Multiple Nucleotides Based on K-Means Clustering of Oligonucleotides

1 School of Advanced Materials Science and Engineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Korea
2 School of Chemical Engineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Korea
3 Biointerfaces Institute, University of Michigan, Ann Arbor, MI 48109, USA
4 Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI 48109, USA
5 Department of Pharmaceutical Sciences, University of Michigan, Ann Arbor, MI 48109, USA
6 SKKU Advanced Institute of Nanotechnology (SAINT), Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Korea
7 Department of Biological and Environmental Engineering, Cornell University, Ithaca, NY 14850, USA
* Author to whom correspondence should be addressed.
Molecules 2019, 24(2), 348; https://doi.org/10.3390/molecules24020348
Received: 12 December 2018 / Revised: 16 January 2019 / Accepted: 17 January 2019 / Published: 18 January 2019
(This article belongs to the Section Medicinal Chemistry)
In biological systems, a few sequence differences diversify the hybridization profile of nucleotides and enable the quantitative control of cellular metabolism in a cooperative manner. In this respect, the information required for a better understanding may lie not in each individual nucleotide sequence but in representative information shared among them. Existing methodologies for nucleotide sequence design have been optimized to track the function of a genetic molecule and to predict its interactions with others. However, there has been no attempt to extract new sequence information that represents their inheritance function. Here, we sought to reveal, at a conceptual level, the presence of a representative sequence within groups of nucleotides. The combined application of the K-means clustering algorithm and social network analysis enabled the effective calculation of the representative sequence. First, a “common sequence” is constructed that has the highest hybridization tendency toward the analog sequences. Next, the sequence complementary to the common sequence is designated the “representative sequence”. On this basis, we obtained a representative sequence from multiple analog sequences that are 8–10 bases long. Their hybridization was tested empirically, which confirmed that the common sequence had the highest hybridization tendency and that the representative sequence aligned better with the analogs than a mere complementary sequence did.
Keywords: representative nucleotide; hybridization profile; K-means clustering; multiple equilibria; sociogram
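
The two-step construction described in the abstract (a common sequence first, then its complement as the representative sequence) can be illustrated with a minimal sketch. This is not the authors' K-means and sociogram procedure: it assumes equal-length analogs, uses a plain mismatch count as a crude stand-in for hybridization tendency, and treats the complementary strand as the reverse complement. The function names and the example analogs are hypothetical. Under these assumptions, the sequence with the fewest total mismatches to the perfect partners of the analogs is their position-wise majority, which plays the role of a single-cluster centroid.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that pairs perfectly with seq (read 5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def column_majority(seqs):
    """Position-wise majority base; minimizes total mismatches to seqs."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    out = []
    for column in zip(*seqs):
        counts = Counter(column)
        # Highest count wins; ties break alphabetically for determinism.
        out.append(min(counts, key=lambda b: (-counts[b], b)))
    return "".join(out)


# Hypothetical 8-base analogs, each differing from the others by one or two bases.
analogs = ["AAGTCCGT", "AAGTCCGA", "ATGTCCGT", "AAGACCGT"]

# Step 1: the common sequence should pair well with every analog, so it is
# built from the perfect partners (reverse complements) of the analogs.
partners = [reverse_complement(s) for s in analogs]
common = column_majority(partners)

# Step 2: the representative sequence is the complement of the common sequence.
representative = reverse_complement(common)


def total_mismatches(candidate):
    """Mismatch-count proxy: lower means better pairing with all analogs."""
    return sum(mismatches(candidate, p) for p in partners)
```

In this toy setting the representative sequence coincides with the position-wise consensus of the analogs. A realistic treatment would replace the mismatch count with a thermodynamic hybridization model and would cluster the analogs before computing a centroid for each group.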
MDPI and ACS Style

Lee, B.; Ahn, S.Y.; Park, C.; Moon, J.J.; Lee, J.H.; Luo, D.; Um, S.H.; Shin, S.W. Revealing the Presence of a Symbolic Sequence Representing Multiple Nucleotides Based on K-Means Clustering of Oligonucleotides. Molecules 2019, 24, 348.
