Main Text
Shannon’s Information Theory was devised to improve the performance of communication systems and to ensure efficient and reliable message exchange over a communication channel. In this context, the question “what is information?” per se was never asked and was irrelevant to the engineering problems under consideration. The newly invented notion of an “information measure” served these design tasks quite well. This led to a long-lasting, improper conflation of the notions of “information” and “information measure”, which, in turn, left the relations between the notion of “information” and the notions of “data”, “knowledge”, and “semantics” blurred, intuitive, and undefined.
However, recent advances in almost all scientific fields have created an urgent demand for an explicit definition of what information is, and especially of what meaningful information is, a notion that today dominates contemporary life science research. To meet this demand, I have proposed a new definition of information, which in its latest version reads as follows:
“Information is a linguistic description of structures observable in a given data set”.
Here, I would like to provide some auxiliary arguments in support of this definition. Shannon’s Information Theory was devised for use in communication systems, where the transmitted message is always shaped as a linear, one-dimensional string of signal data. Even a TV image was once transmitted in a line-by-line scan fashion. The human brain, however, perceives an image as a single two-dimensional entity. Providing an information measure for a two-dimensional signal is a problem that Information Theory did not foresee. Therefore, I have deliberately chosen a digital image to explore my definition of what information is.
A digital image is a two-dimensional set of data elements called picture elements, or pixels. In an image, pixels are not placed randomly; because of the similarity of their physical properties, they naturally group into lumps or clusters. I propose to call these clusters primary, or physical, data structures.
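To make the idea of primary (physical) data structures a little more concrete, here is a minimal sketch, assuming a grayscale image stored as a list of lists of integer intensities and a hypothetical similarity rule: two 4-connected neighbouring pixels belong to the same cluster when their intensities differ by at most `tol`. The function name `label_primary_structures` and the parameter `tol` are illustrative assumptions, not part of the definition above; the technique itself is ordinary flood-fill connected-component labeling.

```python
from collections import deque


def label_primary_structures(image, tol=0):
    """Group pixels into 4-connected clusters of similar intensity.

    Hypothetical similarity rule (an assumption for illustration):
    neighbouring pixels join the same cluster when their intensities
    differ by at most `tol`. Returns (labels, count), where `labels`
    has the same shape as `image` and holds a cluster index per pixel.
    """
    if not image or not image[0]:
        return [], 0
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]  # -1 marks "not yet labeled"
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            # Start a new cluster at this unlabeled pixel and grow it breadth-first.
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and abs(image[ny][nx] - image[y][x]) <= tol):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count


if __name__ == "__main__":
    img = [
        [0, 0, 9, 9],
        [0, 0, 9, 9],
        [5, 5, 5, 9],
    ]
    labels, count = label_primary_structures(img)
    print(count)  # 3
    for row in labels:
        print(row)
```

With `tol > 0` the similarity is chained from neighbour to neighbour, so a smooth gradient may end up as a single cluster; a different grouping rule would yield different primary structures. Any such rule only models the objective, physical grouping; the subsequent semantic grouping described next is not something this sketch attempts.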
In the eyes of an external observer, the primary data structures are further grouped into larger and more complex agglomerations, which I propose to call secondary data structures (structures of structures). These secondary structures reflect a human observer’s view of how the primary data structures are grouped, and therefore they could be called meaningful, or semantic, data structures. While the formation of primary (physical) data structures is based on objective (natural, physical) properties of the data, the subsequent formation of secondary (semantic) data structures is a subjective process guided by human conventions and habits.
As stated above, the description of structures observable in a data set should be called “Information”. In this regard, two types of information must be distinguished: Physical Information and Semantic Information. Both are language-based descriptions; however, physical information can be described in a variety of languages (recall that mathematics is also a language), while semantic information can be described only by means of natural human language. (More details on the subject can be found in [1].)
The distinction between physical and semantic information is the most essential insight into the nature of information provided by the new definition. Indeed, most present-day followers of Shannon’s Information Theory speak predominantly about Integrated Information Theory, Generalized Information Theory, and all sorts of United, Unified, Integral, Consolidated, and similar “informations”, cherishing the idea that semantic information can be seen as an extension of Shannon’s information and can in some way be merged with it. Shannon himself always kept his distance from such an approach and warned in 1956: “In short, information theory is currently partaking of a somewhat heady draught of general popularity. It will be all too easy for our somewhat artificial prosperity to collapse overnight when it is realized that the use of a few exciting words like information, entropy, redundancy, do not solve all our problems” [2].
Although my definition, which treats information as a complex notion composed of “real” and “imaginary” parts (in our case, physical and semantic information), clearly highlights the duality of information, the mainstream information processing community persistently tries to treat the two jointly.
From the point of view of my definition, all the “informations” known today, such as Shannon’s, Fisher’s, Rényi’s, Kolmogorov’s, and Chaitin’s, should be seen as incarnations of physical information. Categorically, semantic information cannot be derived or drawn from physical information. Despite this, people persistently try to do so again and again.
Only from the point of view of my definition can the ambiguous relations between data and information, knowledge and information, and cognition and information be clarified and made distinct. Floridi’s question, “Is information meaningful data?” [3], must now be answered decisively:
No! Information has nothing to do with data! Semantic information (semantic interpretation) is ascribed to physical information, not to the data that carry it. The relations between knowledge and information can now also be expressed more precisely:
knowledge is semantic information memorized in the system. Cognition (intelligence, thinking) is also explicated unambiguously:
Cognition is the ability to process information [4].
Only from the point of view of my definition, which declares and affirms the duality of information, can one understand and explain the paradigm shift that we witness today in all fields of science: from a computational approach (that is, one based on physical information processing, i.e., data processing) to a cognitive approach (that is, one based on semantic information processing). No one can deny this ubiquitous paradigm shift: from computer vision to cognitive vision, from computational linguistics to cognitive linguistics, from computational biology to cognitive biology, from computational neuroscience to cognitive neuroscience, and so on; the list could be extended endlessly.
Only from the point of view of my definition are information descriptions reified as text strings written in some language with a case-appropriate alphabet. That is, information must now be seen as a material entity: not a spiritual or psychic impression, but a solid physical substance (information as a thing, once a much-debated topic). This requires an urgent revision of many well-established notions and information processing practices (in brain-, neuro-, bio-, and many other life sciences).
I hope my humble opinion will be helpful when the time comes to face these issues.