From Data to Knowledge Processing Machines †

Abstract: Natural complex adaptive systems arise, live, and eventually die. One could say that they become unfit for their environment. The science of complexity puts it differently: the equilibrium states of a system change all the time as it adjusts to external fluctuations. Down the road, when facing positive feedback loops, equilibria can drift across the edge of chaos. Applied to the digital world, where control is in our hands, we can imagine steering equilibria to areas where beneficial emergence arises while avoiding collapse. This paper deals with the ground principles needed to reach such control.


Introduction
The most sophisticated neural networks and the most automated features of megacloud providers do not come close to naturally intelligent, resilient, and sentient biological systems. They are limited by the sacred pillars of computing science: the Church-Turing thesis and its implementation, the von Neumann stored-program control architecture. New research from various fields of science is changing the game and paving the way to artificial general intelligence (AGI). These fields include the science of complexity, which Stephen Hawking called the science of the 21st century; neuroscience, especially as applied to the construction of our knowledge framework within the neocortex; and genomics, which shows us how our cells share and process information to take action and maintain overall stability. In this line of study, AGI is not a sophisticated version of existing AI models but a property of evolving machines. This leads us to redefine information as a building block of knowledge with context rather than mere data. With appropriate mathematical objects, namely the structural machines proposed by Professor Burgin in his general theory of information [1,2], we can implement a new class of information processing machines, the autopoietic machines [3]. These open a new era of relationships between humans and machines, with better performance, lower cost, and as yet unknown applications limited only by human imagination and ingenuity. We also emphasize that current information processing systems are complex adaptive systems prone to emergent behavior when faced with large fluctuations and limited resources. Emergence has both positive and negative outcomes. To maintain stability without disrupting the unity of the system and its function, we need to change the foundational architecture of information systems, just as the neocortex repurposed the reptilian brain without having to change its fundamental architecture.

Complex Adaptive Systems
A system is complex when it cannot be explained through the study of its isolated parts, because the behavior of one part affects the behavior of the others. It is adaptive when it resists external fluctuations up to a certain extent, beyond which it crosses the border into chaos and collapses. Systems have been studied for millennia, but the creation of the Santa Fe Institute by Murray Gell-Mann et al. in 1984 is the landmark in the building of the science of complexity [4]. From dinosaurs to empires, from rainforest ecosystems to large cities, and across all living species, we now understand global behaviors that reductionism and determinism cannot explain. From this observation, a working hypothesis is that AGI cannot emerge from anything other than a complex adaptive system (CAS). Hence, the first question to address becomes: what are digital networks missing to behave like a CAS? The answer starts with fathoming what confers robustness on complex adaptive systems.

Movement for Steadiness
The overall steadiness and homeostasis of a CAS do not mean that the system has reached a point of equilibrium. Liquid water looks stable in a glass, whereas its chemical composition in molecules, atoms, and ions changes every 10⁻¹⁵ s. Likewise, a complex adaptive system moves from one equilibrium point to another all the time. In our body, adjustments of pH, sugar, and temperature levels are real-time, never-ending processes driven by positive and negative feedback loops between many types of cells. In economics, Walras mathematically proved that the general equilibrium in a market of perfect competition should equal the marginal utility and the cost of production in the long run, but, as Henri Poincaré pointed out, the foundational principles were not consistent with observation. Supply, demand, and hence prices change all the time due to external events and the internal behaviors of all the agents of the trade ecosystem. There is no such thing as perfect competition, as information is biased and asymmetrical. That is what complexity economics [5] is redefining with great results, thanks to Nash, Akerlof, Kahneman, and Arthur, among others.

Movement for Emergence
CAS evolve through never-ending trial and error. Most trials are errors, but the key point is that a few lead to a higher level of complexity and fitness. When a trial is successful, the code is updated accordingly, through DNA for living species or by law for social organizations, so that it spreads over the system and to the next generations. Nonetheless, the exploitation of a successful behavior should be challenged with venturesome exploration. These random walks can lead to an absorbing boundary, and thus extinction, or to a higher level of prosperity. When, and only when, equilibria drift to the frontier of chaos, emergence can happen. In other words, evolution is a succession of rare successful trials leading to a phase transition, a leap that betokens a higher level of order. To rephrase Ilya Prigogine, this tiny pre-chaotic area is where life emerges and why time is not reversible. Applied to digital systems, the problem to solve is to replace random walks with controlled steps that trigger beneficial emergence and avoid the detrimental kind. In other words: find the local minima of entropy around the ephemeral equilibria close to the edge of chaos, move there, and compute again.
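The idea of replacing random walks with controlled steps can be illustrated with a minimal sketch. It assumes, purely for illustration, that the system can be in one of a few numbered configurations, that each configuration has an observed distribution of outcomes, and that neighboring configurations differ by one index. The names `CONFIGS`, `neighbors`, `entropy_of`, and `controlled_descent` are hypothetical and not part of the framework described in this paper. The search is a plain greedy descent: it moves to the lowest-entropy neighbor until no neighbor improves.

```python
import math
from typing import Callable, Dict, List


def shannon_entropy(probs: List[float]) -> float:
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def controlled_descent(start: int,
                       neighbors: Callable[[int], List[int]],
                       entropy_of: Callable[[int], float],
                       max_steps: int = 100) -> List[int]:
    """Greedy local search: step to the lowest-entropy neighbor
    until no neighbor has lower entropy than the current state."""
    path = [start]
    current = start
    for _ in range(max_steps):
        best = min(neighbors(current), key=entropy_of)
        if entropy_of(best) >= entropy_of(current):
            break  # local minimum of entropy reached
        current = best
        path.append(current)
    return path


# Hypothetical configurations and their observed outcome distributions.
CONFIGS: Dict[int, List[float]] = {
    0: [0.25, 0.25, 0.25, 0.25],
    1: [0.5, 0.25, 0.25],
    2: [0.7, 0.2, 0.1],
    3: [0.9, 0.1],
    4: [0.6, 0.4],
}


def neighbors(i: int) -> List[int]:
    """Adjacent configurations (assumed one-dimensional layout)."""
    return [j for j in (i - 1, i + 1) if j in CONFIGS]


def entropy_of(i: int) -> float:
    return shannon_entropy(CONFIGS[i])


print(controlled_descent(0, neighbors, entropy_of))
```

Starting from configuration 0, the search stops at configuration 3, whose neighbors both have higher entropy. A greedy descent only finds a local minimum, which matches the "move there and compute again" loop: the landscape is expected to change, so the search would be rerun as new observations arrive.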

Movement for Knowledge
In his 1943 lectures, Erwin Schrödinger introduced the concept of an aperiodic crystal that could encode all the "what ifs" that our cells need to behave as they do, which became a source of inspiration for Watson and Crick in the discovery of the DNA double helix ten years later. In 1948, Shannon's entropy established that the informational value of a communicated message depends on the degree of surprisal of the message. Applied to complex adaptive systems, DNA (or the rule of law) enables local autonomy and the trade of information. Local communications reduce noise and surprisal. Local agents (e.g., cells) move at a scale that upper layers cannot apprehend. Multiple sources of information (sensors) related to the same event then decrease the Shannon entropy, triggering more appropriate action thanks to reliable information and marginally fueling the knowledge repository, which includes, most importantly, knowledge of the behaviors of others. Cognition needs knowledge, nourished with information whose value is proportional to movement through diversity of scales, senses, and representations of the world.
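The effect of multiple sensors on Shannon entropy can be made concrete with a small sketch. It assumes two hypotheses about an event, a uniform prior, and sensors that are conditionally independent and each report a detection with the same (hypothetical) reliability of 0.8. None of these numbers come from the paper. Under these assumptions, each additional agreeing reading lowers the entropy of the belief about the event; conflicting readings would instead raise it.

```python
import math
from typing import Dict, List


def shannon_entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy, in bits, of a distribution over hypotheses."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def fuse(prior: Dict[str, float],
         likelihoods: List[Dict[str, float]]) -> Dict[str, float]:
    """Bayesian update of the prior with one likelihood per sensor reading,
    assuming the sensors are conditionally independent."""
    posterior = dict(prior)
    for likelihood in likelihoods:
        posterior = {h: posterior[h] * likelihood[h] for h in posterior}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}


# Assumed setup: uniform prior and identical sensor reliability.
prior = {"event": 0.5, "no_event": 0.5}
detection = {"event": 0.8, "no_event": 0.2}  # P(reading = "detected" | hypothesis)

for n_sensors in range(4):
    belief = fuse(prior, [detection] * n_sensors)
    print(n_sensors, round(shannon_entropy(belief), 3))
```

With no sensor the entropy is exactly 1 bit, and it shrinks with each agreeing reading, which is the sense in which redundant local sources yield more reliable information.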

Neocortex
Building on Vernon Mountcastle's publications [6,7], Jeff Hawkins et al. [8] are trying to understand how knowledge is constructed and updated. Mountcastle's theory is widely accepted by the community and states that the neocortex of mammals is made of a set of cortical columns sharing the very same blueprint. Their mission depends only on their connections (sensors, motors, or other columns for deeper reasoning). Each of these columns is a building block of our global knowledge, with its own representation of the world. A human brain has around 150,000 of these partial views, and every new piece of incoming information is a marginal update of some of them. Many copies of the same information exist in an indefinite number of columns. The second finding, from Hawkins, is that each column is a predictive machine using reference frames for thousands or millions of objects and concepts, physical or abstract: 1D, 2D, or 3D for physical objects, and even more dimensions for complex abstract concepts or thoughts. A reference frame is a grid in which all items are labeled with coordinates. Columns then exchange information to come up with a final and unique result. The wiring between the columns is both hierarchical and peer-to-peer, depending on what is being processed and how the prediction compares with reality. Knowledge is consensual: it is updated before events occur and definitively validated afterwards. Applied to artificial intelligence, the first implication of these findings is that, even though mimicking the firing process of a neuron has given outstanding results, cortical columns and their reference frames exhibit a higher level of organization than neural networks and may open new opportunities. The second is that learning is a process of prediction adjustment, meaning that (1) new information must be consistent with previous knowledge and (2) prediction enables real-time processing.
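The way columns with partial views reach a single result can be sketched as a simple vote. This is a toy illustration only, not a model of the neocortex: the `Column` class, the feature names, and the majority rule are all assumptions introduced here. Each column knows a few features per object, proposes every object compatible with what it currently senses, and the object with the most votes wins.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


class Column:
    """A toy column: maps object names to the features it has learned for them."""

    def __init__(self, name: str, models: Dict[str, Set[str]]) -> None:
        self.name = name
        self.models = models

    def candidates(self, feature: str) -> Set[str]:
        """Objects in this column's partial view that are compatible with the feature."""
        return {obj for obj, feats in self.models.items() if feature in feats}


def vote(columns: List[Column], observations: Dict[str, str]) -> Optional[str]:
    """Each column votes for its candidate objects; return the most supported one."""
    tally: Counter = Counter()
    for column in columns:
        for obj in column.candidates(observations[column.name]):
            tally[obj] += 1
    return tally.most_common(1)[0][0] if tally else None


# Hypothetical columns attached to different senses.
touch = Column("touch", {"cup": {"curved", "handle"}, "bowl": {"curved"}, "pen": {"thin"}})
vision = Column("vision", {"cup": {"handle", "rim"}, "bowl": {"rim"}, "pen": {"clip"}})

print(vote([touch, vision], {"touch": "curved", "vision": "handle"}))
```

Touch alone cannot separate a cup from a bowl, but the combined vote settles on the cup. The sketch leaves out reference frames and prediction entirely; it only shows the consensus step.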

From Data Processing to Knowledge Processing Machines
The science of complexity points out that a higher level of order emerges at the edge of chaos, after a phase transition. This is our first hypothesis for building digital cognition and homeostasis in digital networks. The neocortex architecture described above is, in turn, a profound confirmation that knowledge does not arise at the neuron level but at a higher level of order: the cortical columns, through organization and governance. That is the second hypothesis we adopt to build a cognitive, self-managed, and self-organized network of networks of machines.

Data, Information, Knowledge
Framing the idea of knowledge processing vs. information processing starts with the right definitions of information and knowledge. The widely accepted model is the DIKW pyramid (data, information, knowledge, wisdom): data is raw (symbols), information brings context, knowledge gives meaning, and wisdom elevates us to the understanding of the why. However, this presentation misses the essence of how CAS and the neocortex work. Instead, the KIME square [9] states that, as matter contains energy, knowledge contains information. In other words, "Information is to Knowledge what Energy is to Matter", which is not a metaphor but a fundamental rule of the general theory of information. As energy gives dynamics to substance, information provides relief and contrast to knowledge [4]. Bateson stated in 1972 that "information is the difference that makes a difference". Although this definition has been considered as deceptive as it is beautiful, the concept of difference has strong implications. The difference is what a specific cognizing system is capable of detecting as a difference. The difference is therefore receiver-specific [10], i.e., the observer is part of the observed. This approach makes a lot of sense when compared to Jeff Hawkins's brain theory: we acquire new knowledge by difference. When new information comes in, it is compared to existing reference frames and either triggers a marginal update or does not.
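The receiver-specific nature of a difference can be shown with a short sketch. It assumes, for illustration, that knowledge is a table of numeric values and that each receiver has its own detection threshold. The `Receiver` class and its `threshold` parameter are hypothetical names. The same incoming value is information for one receiver and not for another.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Receiver:
    """A cognizing system that only learns from differences it can detect."""
    threshold: float  # smallest difference this receiver can detect (assumed)
    frame: Dict[str, float] = field(default_factory=dict)  # what it currently knows

    def observe(self, key: str, value: float) -> bool:
        """Compare the value with existing knowledge.
        Return True if it made a difference and triggered an update."""
        known = self.frame.get(key)
        if known is not None and abs(value - known) < self.threshold:
            return False  # no detectable difference: nothing is learned
        self.frame[key] = value  # marginal update of the reference
        return True


coarse = Receiver(threshold=1.0)
fine = Receiver(threshold=0.1)

for receiver in (coarse, fine):
    receiver.observe("temperature", 20.0)  # first observation is always new

print(coarse.observe("temperature", 20.5))  # False: below the coarse threshold
print(fine.observe("temperature", 20.5))    # True: detectable by the fine receiver
```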

The Structural Machines Framework
If the KIME square is the first cornerstone of this framework, then knowledge has forms and shapes that confer adaptive properties (D'Arcy Thompson) [11]. Complex forms in the physical world as we know it, such as proteins, are 3D. In the world of ideas and thoughts, it appears that our cortical columns dedicated to abstract objects encode n-dimensional models of knowledge. That is the second cornerstone. We live in a world of perpetual novelty, as John H. Holland [12] put it; therefore, structures of knowledge must be continuously updated to reflect reality. That is the third cornerstone. In a nutshell, the idea is to model an n-dimensional motion picture of sparsely connected entities whose states and behaviors can change at any time. Such a thing does not exist in the field of information processing today, whether in standard computers, elaborate neural networks, or even quantum computing. The elementary unit is a triad made of two entities and a connection between them. This connection depicts the relationship between the two entities and the behavior that evolves between them over time. A knowledge structure is a network of triads connected to each other in a dynamic n-dimensional graph. All these elements are named and compose a hierarchical network of networks of triads. The structural machine framework describes a process that allows information processing through the transformation of knowledge structures. The structural machine needs two key devices: a processor, which takes knowledge structures as input and delivers the processed information as knowledge structures in the output space; and a control device, outside the processing machine, whose role is to determine at any moment whether a processor is eligible to process a given workload.
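The roles of triad, knowledge structure, processor, and control device can be sketched in a few lines. This is a deliberately reduced illustration under stated assumptions, not the formal structural machine of the general theory of information: triads are plain named tuples of strings, a knowledge structure is a set of triads, the processor is any function from knowledge structure to knowledge structure, and the controller's eligibility rule (a size limit) is invented here. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Triad:
    """Two named entities and the named connection between them."""
    source: str
    relation: str
    target: str


KnowledgeStructure = FrozenSet[Triad]
Processor = Callable[[KnowledgeStructure], KnowledgeStructure]


def link_one_step(ks: KnowledgeStructure, relation: str = "feeds") -> KnowledgeStructure:
    """Example processor: for every chain a -> b -> c over `relation`,
    add the triad a -> c. Input and output are both knowledge structures."""
    derived = {
        Triad(a.source, relation, b.target)
        for a in ks for b in ks
        if a.relation == b.relation == relation and a.target == b.source
    }
    return frozenset(ks | derived)


class Controller:
    """Control device outside the processor: decides whether a processor
    is eligible for a workload (here, an assumed size limit)."""

    def __init__(self, max_triads: int) -> None:
        self.max_triads = max_triads

    def eligible(self, processor: Processor, workload: KnowledgeStructure) -> bool:
        return len(workload) <= self.max_triads

    def run(self, processor: Processor, workload: KnowledgeStructure) -> KnowledgeStructure:
        # An ineligible workload is returned untouched rather than processed.
        return processor(workload) if self.eligible(processor, workload) else workload


network = frozenset({
    Triad("sensor", "feeds", "router"),
    Triad("router", "feeds", "server"),
})
result = Controller(max_triads=10).run(link_one_step, network)
print(Triad("sensor", "feeds", "server") in result)
```

Keeping the controller separate from the processor mirrors the text: the processor only transforms structures, while the decision to run it sits outside. The sketch omits the evolving behavior carried by each connection, which a fuller model would need.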
The controller allows the implementation of a cognizing agent overlay that manages the downstream processors and the associated knowledge-structure evolution [3,4]. The cognizing agents are defined using the mathematical theory of oracles proposed and developed by Prof. Mark Burgin [13]. The consequences of such an information processing framework are profound:

• Knowledge structures are not limited to symbols (numbers or words) but also embed the relationship between these symbols and their evolutionary behaviors.
• It is a generalization of a Turing Machine: if knowledge structures are words and if the transformation process is an algorithm, then we are back to a standard Turing Machine.
• It describes how to control a complex system made of triads.
• This complex system is adaptive, as triads are evolutionary agents with different states, relationships, and behaviors.

What emerges from this model is the concept of a digital gene in addition to the digital neuron. CAS follow models, i.e., sets of rules. Agents have attributes, relationships, and behaviors. Unlike structural machines, natural CAS are not controlled. Depending on the use case, points of equilibrium can either be kept far away from the chaotic boundaries or be brought closer to them to foster emergence and phase transition.

Autopoietic Machines
The structural machine framework allows control through digital gene editing. Another way to put it is that structural machines are autopoietic machines. Autopoiesis is the property of a system to maintain its homeostasis beyond the lifespan of its components. The halting problem (the fact that an algorithm may run forever if the conditions to stop are never met) is an example of a homeostasis issue (an undecidable problem) in a network of Turing Machines, whereas cognizing agents can detect a halting problem and design a solution.
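One practical reading of a cognizing agent that detects a non-terminating computation and designs a solution is a supervisor placed outside the worker. The sketch below is an assumption-laden illustration: it does not decide the halting problem, which remains undecidable, but applies a step budget and switches to a repaired worker when the budget is exhausted. The functions `countdown`, `supervise`, and `run_with_recovery` are hypothetical.

```python
from typing import Callable, Iterator, Tuple


def countdown(n: int, step: int) -> Iterator[int]:
    """A worker that halts only if it reaches exactly 0."""
    while n != 0:
        yield n
        n -= step


def supervise(make_worker: Callable[[], Iterator[int]], budget: int) -> Tuple[bool, int]:
    """Run a worker for at most `budget` steps.
    Return (finished, steps_used). Exceeding the budget is treated as
    'presumed not halting', which is a policy, not a proof."""
    steps = 0
    for _ in make_worker():
        steps += 1
        if steps > budget:
            return False, budget
    return True, steps


def run_with_recovery(primary: Callable[[], Iterator[int]],
                      fallback: Callable[[], Iterator[int]],
                      budget: int) -> Tuple[str, int]:
    """Supervisor policy: try the primary worker, then a repaired fallback."""
    finished, steps = supervise(primary, budget)
    if finished:
        return "primary", steps
    finished, steps = supervise(fallback, budget)
    return ("fallback", steps) if finished else ("failed", steps)


# countdown(9, 2) skips over 0 and never stops; countdown(9, 1) terminates.
print(run_with_recovery(lambda: countdown(9, 2), lambda: countdown(9, 1), budget=1000))
```

The worker stays an ordinary computation, and the decision to stop or replace it lives in the overlay, which is the separation the framework relies on. A budget can also cut off a worker that would have finished later, so the choice of budget is itself part of the supervisor's knowledge.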

Deep Reasoning
Highways, roads, traffic lights or signs, and the connected vehicles moving across them constitute a complex adaptive system. The DNA is what is set at inception, through extraction and classification from neural networks. New triads captured by "miner" cognizing agents then update the model, and "designer" cognizing agents can analyze the situation and make decisions in a bespoke model of reasoning. The same applies to countless other use cases: cyberattacks or any fraudulent behavior in the financial or insurance markets; robots, especially when used in hazardous places where external fluctuations are extreme; swarms of drones seeking self-coordination; and all domains under the influence of weather randomness, such as precision sustainable agriculture.
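The division of labor between "miner" and "designer" agents in the traffic example can be sketched as follows. The event format, the `is_on` relation, the capacity table, and the congestion rule are assumptions made for illustration. The paper does not specify how either agent is implemented.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Set


class Triad(NamedTuple):
    source: str
    relation: str
    target: str


def miner(events: List[Dict[str, str]]) -> Set[Triad]:
    """Miner agent: capture new triads (vehicle, 'is_on', segment) from raw reports."""
    return {Triad(e["vehicle"], "is_on", e["segment"]) for e in events}


def designer(model: Set[Triad], capacity: Dict[str, int]) -> List[str]:
    """Designer agent: analyze the model and return the segments whose load
    exceeds their capacity, i.e., where rerouting should be considered."""
    load = Counter(t.target for t in model if t.relation == "is_on")
    return sorted(seg for seg, n in load.items()
                  if n > capacity.get(seg, float("inf")))


# Hypothetical position reports from connected vehicles.
events = [
    {"vehicle": "v1", "segment": "A"},
    {"vehicle": "v2", "segment": "A"},
    {"vehicle": "v3", "segment": "A"},
    {"vehicle": "v4", "segment": "B"},
]
model = miner(events)
print(designer(model, capacity={"A": 2, "B": 2}))
```

Here the miner only enriches the knowledge structure, and the designer only reasons over it, so each can evolve independently. Segments without a known capacity are never flagged, which is a conservative choice for the sketch.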

Conclusions
Building autopoietic machines using the tools and findings of the science of complexity, and mimicking the neocortex to manage knowledge, including the knowledge of the body and of the sensors that fuel it with never-ending information, paves the way for: (i) autonomous networks of machines combining performance, low energy consumption, and fewer human resources to supervise them; (ii) a digital gene that is a richer code of communication than APIs; (iii) deep reasoning within reach for many use cases; (iv) new roads to all-purpose artificial intelligence. Nonetheless, it is still theory, and as Richard Feynman put it: "It doesn't make a difference how beautiful your guess is. It doesn't make a difference how smart you are, who made the guess, or what his name is. If it disagrees with experiment, it's wrong." [14].