Application of Information-Theoretic Concepts in Chemoinformatics

Received: 1 September 2010; in revised form: 26 September 2010 / Accepted: 16 October 2010 / Published: 20 October 2010
(This article belongs to the Special Issue What Is Information?)
Abstract: Computational methods for chemical database mining, molecular similarity searching, and structure-activity relationship analysis have become an integral part of modern chemical and pharmaceutical research. Such studies fall within the scope of chemoinformatics and are usually large-scale in character. Concepts from information theory, such as Shannon entropy and Kullback-Leibler divergence, have also been adopted for chemoinformatics applications. In this review, we introduce these concepts, describe their adaptations, and discuss exemplary applications of information theory to a variety of relevant problems. These include, among others, chemical feature (or descriptor) selection, database profiling, and compound recall rate predictions.
Keywords: database profiling; feature selection; feature significance; information theory; similarity searching; molecular topology; virtual screening
