Cognition as a Mechanical Process

Cognition is often defined as a dual process of physical and non-physical mechanisms. This duality originated from past theory on the constituent parts of the natural world. Even though material causation is not an explanation for all natural processes, phenomena at the cellular level of life are modeled by physical causes. These phenomena include the function of organ systems, including the nervous system and information processing in the cerebrum. This review restricts the definition of cognition to a mechanistic process and draws on studies that support an abstract set of proximate mechanisms. Specifically, this process is approached from a large-scale perspective, the flow of information in a neural system. Study at this scale further constrains the possible explanations for cognition since the information flow is amenable to theory, unlike a lower-level approach where the problem becomes intractable. The candidate hypotheses include stochastic processes that explain cognition, along with principles that support an abstract format for the encoded information.


The Many Definitions of Cognition
Common definitions of cognition often include the phrase mental process or acquisition of knowledge. Reference to mental processing descends from an assignment of non-material substances to the act of thinking. Philosophers, such as the Cartesians and Platonists, have written on this topic, including the relationship between mind and matter. This perspective further involves concepts such as consciousness and intentionality. However, these ideas are based on metaphysical explanations and not on a modern scientific interpretation [1].
The metaphysical approach is exemplified by the philosopher Plato and his Theory of Forms, a hypothesis of how knowledge is acquired. The idea is that a person is aware of an object, such as a kitchen table, by comparison with an internal representation of that object's true form. The modern equivalent of this hypothesis is that our recognition of an object is by the similarity of its measurable properties with its true form. According to this theory, these true and perfect forms originate in the non-material world.
However, face recognition in primates shows that an object's measured attributes are not compared against a true form, but instead that recognition is from a comparison between stored memory and a set of linear metrics of the object [2]. These findings agree with studies of artificial neural networks, an analog of cerebral brain structure, where objects are recognized as belonging to a category without prior knowledge of the true categories [3].
The theory of true forms originates from a view of a perfectly designed world with deterministic processes, while a theory without true forms may instead depend on probabilistic processes. The rise of probabilistic thinking in natural science has coincided with modern statistical methods and explanations of natural phenomena at the atomic level [4].
A modern experimental biologist would approach a study of the mind from a material perspective, such as by the study of the cells and tissue of brain matter. This approach is dependent on reduction of the complexity of a problem. An example is from economics, where an individual is generalized as a single type and consequently the broader theories of population behavior are based on this assumption [5]. There is a similar approach in Newtonian physics where an object's spatial extent is simplified as a single point in space.
Since some natural phenomena are not tractable to mechanistic study, concepts exist that are not solely based on material and physical causes. However, it is necessary to base a scientific theory of brain function on natural mechanisms while disallowing mental causation. There are exceptions where the physical world is visually indescribable and solely dependent on mathematical description, but these occurrences are typically not applicable to the investigation of life at the cellular level.

Mechanical Perspective of Cognition
Even though a mechanical perspective of neural systems is not controversial, there remains a non-mechanical and metaphysical perspective concerning our sensory perception of the world. An example is the philosophical conjecture about the relationship between the human mind and any simulation of it [6]. This conjecture is based on assumptions about intentionality and the act of thinking. However, others have presented scientific evidence where these assumptions do not hold true [7]. One example is the mechanism for an intent to move a body limb, such as in the act of walking. Whereas the traditional perspective expects a mental process of thinking that leads to the generation of these body movements, instead the mechanistic perspective is that a neuronal cell is the generator of the intent of a body movement [8].
While a metaphysical explanation for phenomena is applicable to some areas of knowledge, such as in the study of ethics, these explanations are not informative about natural phenomena where physical processes are expected. In the case of neural systems, the neurons, their connections, and the neural processes are measurable by their properties, so their phenomena are assignable to material causes instead of mental causes. Further, there is a hierarchy of cellular organization that describes the brain where each level of this hierarchy is associated with a particular scientific approach [9]. An example is at the cellular level where the neurons are studied by the methods of cellular anatomy. This area of study also includes the mechanisms for neuron formation and communication between neurons.
Neural systems may be studied at a higher-level perspective, such as at the level of brain tissue or how information is communicated throughout the neural system [10]. The information processing of the brain is particularly relevant since it has a close analog with the artificial neural network architectures of computer science [11,12]. However, the lower levels of biological organization are not as comparable, such as where an artificial neural system is firmly based on an abstract and simplified concept of a neuronal cell and its synaptic functions.

Purpose of This Review
This review is a search for a modern scientific definition of cognition. This mechanistic perspective is ideally approached at the higher scale of a neural system, the flow of information. Since cognition involves knowledge, the informational level is the most relevant. The purpose here is to provide a solid foundation for building a theory on cognition that is free of the constraint of metaphysics. This includes rejection of traditional terminology that is not informative in explaining the cognitive processes. The language from metaphysics detracts from the scientific questioning process and inhibits the construction of a language for explaining cognitive mechanisms.
Common biological processes, including evolutionary theory, are also introduced here as a guide for helping define cognition. This guide restricts the possible explanations for the traits of cognition since these traits are constrained in their capacity for change. There is also an emphasis here on a putative process of how information is encoded in a neural system. Most of the examples are in the visual system since that is the best studied of the sensory systems, and is supported by the theories of optics and information flow. Lastly, there is a section on general cognition that approaches the problem from an evolutionary perspective.

Stochastic Processes in Biology
Vision is the best studied of the sensory systems in primates [13,14]. It is particularly relevant since the visual processes occupy one-half of the cerebral cortex [15]. There is theory from the cognitive sciences that vision and language are the major drivers for acquiring knowledge and perception of the world. It may seem daunting to imagine that our vivid awareness of a scene is built upon levels of basic physical processes. However, cellular life has generated a high degree of complexity by layering physical processes, such as mutation and exponential population growth, over an evolutionary time scale.
This problem of causation of complex phenomena has occurred in explanations for the origin of the camera eye. The formation of a camera eye that has transformed from a simpler organ, such as an eye spot, requires a model with a very large number of advantageous modifications over time [16,17]. A casual observer of the different forms of eyes, such as for this case, would find it difficult to imagine a material process that could design a functional camera eye from a simpler form. The experienced observer would instead invoke biological processes, such as random morphological change [17] and selection for those changes that favor an increase in the rate of offspring production. The result is the potential for a complex adaptation.
Further evidence that the formation of a camera eye is within the reach of natural processes is provided by the analogous camera eye in a lineage of invertebrate cephalopods. This resulted from an adaptation that occurred independent of the origin of the vertebrate camera eye. Yet another case of Darwinian evolution is in the optimized refractive index of the camera eye lens. This adaptation occurred by modifications that led to recruitment of protein molecules from other uses to the lens of the eye [18].
There is another case of independent evolution as observed in the neural circuitry of animals. The circuit for motion detection in the visual field has converged on a similar design in two different eye forms, both the invertebrate compound eye and the mammalian camera eye [19]. These examples show evolutionary convergence on a similar physical design and evolution's potential for forming complex biological systems. In addition, the process of evolutionary convergence is dependent on developmental constraint on the kinds of modifications; otherwise, the chance of convergence on a single design is expected to be low.
These are all examples of natural engineering of life forms by stochastic processes. They are not deterministic processes since they are not directed toward a final goal, but instead the adaptations are continually undergoing change by genetic and phenotypic causes.
The neural system of the brain is a direct analog of the above processes. The organ is considered highly complex and our perceptions are not easily translated to cellular level mechanisms. However, by the same probabilistic processes, the neurons and their interconnections have evolved into a cognitive system that is capable of complex computation with large amounts of sensory data. These cognitive processes include the identification of visual objects, encoding of sensory data to an efficient format, and pattern matching of visual objects to memory.

Abstract Encoding of Sensory Input
The biologically plausible proximate mechanism of cognition originates from the receipt of high dimensional information from the outside world. In the case of vision, the sensory data consist of reflected light rays that are absorbed across a two-dimensional surface, the retinal cells of the eye. These light rays range across the electromagnetic spectra, but the retinal cells are specific to a small subset of all possible light rays.
From an abstract perspective, the surface that receives the visual input is a two-dimensional sheet of cells where each cell has an activation value at a point in time (Figure 1). Over a length of time, the distribution of these activations is undergoing change, so the neural system is reporting from a dynamic state of activations. This view at the visual surface is representative of both the spatial and temporal components of the proximate cause of vision.
NeuroSci 2021, 2, FOR PEER REVIEW
Figure 1. An abstract representation of data that are received by a sensory organ, such as light rays that are absorbed by cells along the surface of the retina of a camera eye. The drawing shows the spatial pattern, but there is also a temporal dimension since this sensory input data are changing over time.
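The sheet of activations described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the grid size, the random drift rule, and the function names (make_sheet, step, mean_change) are hypothetical and are not taken from any cited model.

```python
import random

def make_sheet(rows, cols, rng):
    """Return a rows x cols grid of activation values in [0, 1)."""
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]

def step(sheet, rng, drift=0.1):
    """Return the next time point: each activation moves by a small
    random amount (an assumed rule) and is clipped to [0, 1]."""
    return [[min(1.0, max(0.0, a + rng.uniform(-drift, drift))) for a in row]
            for row in sheet]

def mean_change(before, after):
    """Mean absolute change in activation between two time points."""
    total = sum(abs(a - b)
                for row_a, row_b in zip(before, after)
                for a, b in zip(row_a, row_b))
    return total / (len(before) * len(before[0]))

rng = random.Random(0)           # fixed seed so the sketch is repeatable
frames = [make_sheet(4, 6, rng)]  # spatial component: one 4 x 6 sheet
for _ in range(3):                # temporal component: three later time points
    frames.append(step(frames[-1], rng))

print(mean_change(frames[0], frames[-1]))
```

The list of frames carries the spatial pattern in each grid and the temporal component in the sequence of grids.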
This representation of sensory data is similar to that received by artificial neural network systems. These artificial systems are capable of identifying objects in a visual scene and labeling them by their membership to a category of related objects. This also shows analogous function between the artificial process and natural cognition [20].
The open problem has been generalizing this knowledge (transfer learning) that is acquired from processing sensory input data. This is the essential problem for artificial systems in emulating cognition in animals. However, there is recent work that employs artificial models of transfer learning [21,22].
A related problem is in identifying an object where the viewpoint is variable. It is addressed by a model [3] that is designed for biological realism, along with a robust architecture for sampling the parts of an object. This approach includes the sampling of visual data which are then encoded in an abstract format, a vector of number values. Specifically, this sampling occurs across blocks of columns in a visual scene. Further, each column consists of a set of vectors where each vector is assigned to a discrete category by its level of representation of the input data (Figure 2). These processed data are then utilized for finding columns of similarity that correspond to the parts of an object, a consensus-based approach toward establishing a robust identification of an object.
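The idea of finding columns of similarity can be illustrated with a toy grouping routine. This is a minimal sketch and not the model in [3]: the cosine measure, the greedy grouping rule, the threshold of 0.9, and the example vectors are all assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def consensus_groups(columns, threshold=0.9):
    """Greedily group column indices whose vectors agree with the
    first member of an existing group (an assumed, simplified rule)."""
    groups = []
    for i, vec in enumerate(columns):
        for group in groups:
            if cosine(columns[group[0]], vec) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no agreeing group found: start a new one
    return groups

# Hypothetical vectors, one per column, at a single level of representation.
columns = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [1.0, 0.0, 0.1],
    [0.1, 0.9, 0.0],
]
groups = consensus_groups(columns)
print(groups)  # [[0, 1, 3], [2, 4]]
```

Columns that land in the same group are treated here as sampling the same part, which is the sense of consensus used in the paragraph above.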
Previous approaches to artificial systems have often overfit the network model to a training data set. Overfitting hinders the generalizability of the final model [23]; in this case, the model is a network of nodes interconnected with weight values. The overfitting problem leads to loss of transferability of the model to other applications. Nature solves this problem by a set of processes. One is the visual processing for spatial and temporal invariance of an object in a scene [24,25]. This leads to a more generalized form of the object than otherwise.
A second and complementary method is to neurally code the object by metrics that are abstract and generalizable. This reflects the example where a photograph of a cat is encoded so that it matches to both another photograph and a pencil sketch of the cat. This generalizability in identifying objects is now possible in the case of artificial systems [26]. Additionally, this generalizability leads to corrections for the variability in an object's form, such as change in its orientation, deobfuscation against the background, or detection based on a partial view (Figure 3).
Figure 3. (a) The first panel shows a photograph of a visual scene that contains a table along with other objects. The second panel in (a) is the same scene but transformed so that it appears as a pencil sketch drawing; (b) The first panel is a visual drawing of the digit nine (9), while the next panel is the same digit but transformed by rotation of the image.

Perception as a Mechanical Process
There is an extensive amount of visual processing in the brain since it occupies one-half of the cerebral tissue [15]. Further, the number of neurons increases exponentially from millions in the earlier visual pathways to billions in the higher layers of the cerebrum [15]. This hierarchy of processes creates our visual perception of the world, but there is no evidence that a perception of a scene is processed by a single cognitive path. Studies show that an object is identified independent of the visual scene and its attributes are modified to disfavor variability in its appearance, so that any transformation of the object does not lead to misclassification error [27].
Temporally, the advanced sensory processing occurs over a millisecond time scale [24], so it is not expected that perceptions occur in real time. Instead, cognitive processes create an internal representation, a facsimile, of the sensory data and that construction is the perception.
Studies have further divided perception and awareness into multiple types, but in all cases these cognitive processes are a mechanical construction of the outside world [7]. These internal models that form our representation of the world are material processes, including the perceived awareness of objects, a scene, and the occurrence of events. The physical events that occur over time in a scene are also time delayed and the length of that delay is subject to perception. Therefore, the representation of the time delay is not calibrated with real time. Artificial neural networks show analogous processes with models that are capable of predictive coding, such as completing a written sentence or the next frame of a visual image [28].
Visual perception also includes other processes, such as the transformation of a scene's brightness and contrast levels [29]. This may help in identifying objects against a background. Further, the cerebral processing in vision is more extensive than that of the early steps along the visual pathway, so it is reasonable to assume that the perceptual image is weakly correlated with the initial retinal input or the earlier-path internal representations of the visual data.
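One way to picture a transformation of brightness and contrast is a simple normalization of pixel intensities. The sketch below is an assumption for illustration only: it uses zero-mean, unit-variance rescaling, which is a common preprocessing step in artificial systems, and it is not claimed to be the mechanism reported in [29]. The function name normalize and the pixel values are hypothetical.

```python
import math
import statistics

def normalize(pixels):
    """Rescale intensities to zero mean and unit standard deviation."""
    mean = statistics.fmean(pixels)
    sd = statistics.pstdev(pixels)
    if sd == 0:
        return [0.0 for _ in pixels]  # a flat scene carries no contrast
    return [(p - mean) / sd for p in pixels]

scene = [10, 20, 30, 40]                   # hypothetical intensities
brighter = [p + 50 for p in scene]         # uniform brightness shift
higher_contrast = [p * 3 for p in scene]   # uniform contrast gain

# After normalization the three versions of the scene coincide.
same = all(
    math.isclose(a, b, abs_tol=1e-9) and math.isclose(a, c, abs_tol=1e-9)
    for a, b, c in zip(normalize(scene), normalize(brighter), normalize(higher_contrast))
)
print(same)  # True
```

Under this assumption, an object keeps the same normalized values whether the scene as a whole is made brighter or its contrast is scaled, which is one reason such a correction could help in separating objects from a background.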
Last, the limit on the number of evolutionary and developmental outcomes restricts the possible hypotheses about cognition. For example, the evolution of the camera eye expectedly occurred by modifications of small effect, along with the accompanying adaptations in cognition. This predicts that the artificial systems can emulate the visual cognitive processes by a finite number of steps as represented by an algorithm. This has held true since deep learning methods are competitive with our cognitive ability to identify objects and process natural language.

Cognition as a Pattern Matching Process
To find a class of similar visual objects, a comparison to memory is necessary. The informative comparisons occur at particular dimensions in the visual input data. This pattern matching and sampling process is the expected model of cognition.
Animal cognition is expected to handle these pattern matching problems at the lower dimensional levels of information. In contrast, artificial systems are often designed to encode and process at a higher dimensional level, such as for the unmodified two-dimensional pixel data in the case of vision. Another example is for a grid of values and rules of a deterministic board game, such as chess. It is known that the high dimensional information is transformed to a lower dimensional form in the layers of the neural network, but a naive approach to these tasks has not been consistent with the goals of transfer learning.
A chess game among human players is mainly based on recognition of patterns of chess pieces on the board, along with a limited capacity to predict future possibilities for the state of the chess board [30]. Instead, the artificial systems are often designed by a different approach. They typically compute a best move by heuristic searching through all possible outcomes from all possible game moves, a method that is exhaustive in its search of possible combinations of board states [31].
A human player searches through a small set of possible outcomes in complex board games. The alternative approach based on a low level representation of the board state leads to a computational problem with complexity that is likely beyond the capacity and energetics of the brain. Since a human player is mainly restricted to observing patterns of pieces on the board, it is expected that natural cognition is mainly operating on the information at a state of lower dimensionality. There is empirical support for these ideas, too [32].
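The contrast between exhaustive search and pattern recognition can be sketched on a game far simpler than chess. The example below is an assumption chosen for brevity: a toy take-away game in which two players alternately remove one to three stones and the player who takes the last stone wins. The function names are hypothetical, and the sketch does not represent how any chess program in [31] is implemented.

```python
def wins_by_search(stones, counter):
    """Exhaustive search: True if the player to move can force a win.
    counter[0] records how many positions were visited."""
    counter[0] += 1
    if stones == 0:
        return False  # no stones left: the player to move has already lost
    return any(not wins_by_search(stones - take, counter)
               for take in (1, 2, 3) if take <= stones)

def wins_by_pattern(stones):
    """Pattern rule for this toy game: a multiple of four is a lost position."""
    return stones % 4 != 0

visited = [0]
result = wins_by_search(12, visited)
print(result, wins_by_pattern(12), visited[0])
```

Both functions agree on every position, but the search visits many positions while the pattern rule inspects a single low dimensional feature of the board state, the stone count modulo four.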
Similarly, transfer learning is likely occurring at a lower dimensionality than is present in the unprocessed input source data. Natural cognition receives high dimensional sensory input through a robust sampling process, and the input data are reformulated for constructing a perceptual model. This perception is an internal, high-level representation of the source data. In the case of vision, the sampling occurs across a scene, across an object, and then it is possible to also sample across the internal representation of that object. These are statistical processes that are expected in modeling the variability of sensory objects (Figure 3). Without the robust sampling process, the identification of an object is expected to be overfitted to a form not represented in memory, which impedes any process of transfer learning.
These cognitive processes may also be described as a reduction of complexity in the sensory input data, along with extraction of relevant information for downstream cognitive processing. Likewise, it is already known that visual scenes are highly compressible [33] and consequently both natural and artificially designed systems are capable of extracting visual objects from a scene. This processing leads to an internal representation of objects and their properties. This process is complemented by preprocessing pathways for efficiency in cognition, such as internal correction of overall brightness and contrast levels in a visual scene.
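The compressibility of a structured scene can be illustrated with a general-purpose compressor. This is a rough sketch under stated assumptions: the synthetic 64 x 64 "scene" (a bright square on a dark background), the use of zlib, and the function name compression_ratio are illustrative choices and are not the method of [33].

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)

width, height = 64, 64

# Structured scene: a bright square on a dark background (hypothetical data).
scene = bytes(
    200 if 16 <= x < 48 and 16 <= y < 48 else 20
    for y in range(height)
    for x in range(width)
)

# Unstructured input of the same size: uniformly random pixel values.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(width * height))

print(compression_ratio(scene), compression_ratio(noise))
```

The structured scene reduces to a small fraction of its original size, while the random input barely compresses at all, which reflects the redundancy that an encoding process can exploit.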

Cognition and Essential Animal Behavior
A definition of general cognition includes the communication of abstract representations and functions related to pattern matching. This definition applies to both a natural and artificial design. However, animal cognition has the component of general cognition that is confounded with processes related to essential animal behavior. Insight into these differences is available from knowledge of the evolution and development of animals.
For example, it is necessary for animal populations to consist of individuals with a common set of behaviors. Examples of these include an adult form that survives to reproductive age and that sufficient progeny are produced to maintain the population. If a population does not maintain a sufficiently high birth rate to counteract the death rate, then the population will become extinct over a number of generations. This concept is a mathematical necessity. Since evolutionary time is very long, even slight changes in animal behavior may lead to population extinction, a process that is highly frequent across the history of life [34]. Therefore, cognition is not at all likely a standalone process, but instead heavily influenced by behavior that ensures reproduction and survival at each relevant life stage.
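The mathematical necessity noted above can be made explicit with a minimal model. The symbols are assumptions introduced here for illustration: N_t is the population size in generation t, and b and d are constant per-capita birth and death rates per generation, with discrete generations and no density dependence.

```latex
\begin{align}
N_{t+1} &= (1 + b - d)\,N_t
  &&\Longrightarrow&
N_t &= N_0\,(1 + b - d)^{t}, \\
b < d &\;\Longrightarrow\; 0 \le 1 + b - d < 1
  &&\Longrightarrow&
\lim_{t \to \infty} N_t &= 0 .
\end{align}
% Worked example with assumed values b = 0.10, d = 0.11, N_0 = 10^6:
% N_t < 1 once t > \ln(10^{6}) / (-\ln 0.99) \approx 1375 generations.
```

In this sketch, a deficit of only one percent per generation removes a population of one million within roughly 1375 generations, which is short relative to evolutionary time.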
It is possible to imagine animal cognition without essential animal behavior. This is the presumed state of an artificial cognitive system that is not specifically programmed with a set of behavioral characteristics. There are popular conjectures that models of artificial cognition may lead to a metaphysical property of animal cognition, such as intentionality or consciousness. However, these are properties that are unsupported from a mechanistic perspective of brain computation. Instead, any artificial design of cognition is essentially the same as any tool that is undirected by design [35]. History is a better judge of how undirected tools are utilized than conjectures that confound a cognitive process with non-material causes.

Cognition and Large-Scale Neuroanatomical Changes
In the case of mammals, adaptations may occur that require enhancement or reduction to one or more of the functions of cognition. This leads to the prediction that there is not a hierarchy of general intelligence by brain size, but instead that the cognitive capacity, whether visual, auditory, or somatosensory, is a complex phenotype that is subject to evolution at the different levels of cellular organization and in specific cerebral regions.
In the case of the human lineage, it is arguable that general cognition has expanded to meet the requirements of advanced speech, speech perception, and representation of abstract concepts [36,37]. This is one hypothesis for the proximate cause of the evolution of cognition in recent hominins. However, the hypothesis for an expansion in cognitive function is not necessarily a one-to-one relationship with brain size, as exemplified in other mammals. It has been shown in the cerebral and cerebellar regions of whales that cognitive capability is not simply described by a change in neuron count or density [38].
To reiterate, the morphological changes of the brain and its regions are not necessarily a simple correlation with cognitive function. The addition and subtraction of cognitive capabilities, such as observed by contrasting species of marine and terrestrial mammals, are complex phenomena that are molded by evolution and development. Therefore, it is problematic to oversimplify the relationship between molecular or anatomical characters and a cognitive function.

Cognition as a Physiological Process
The brain is an organ with a physiology that is explainable at the different biological scales. At the molecular level, the neurons are described by a vast number of cellular processes, along with the electrochemical signaling that interconnects the system. There is also a higher scale process that codes for the internal neural representations of sensory data. Both these scales have an analog in other organs, such as the cellular composition of the heart and its electrical system that controls its pumping action.
However, the brain is commonly separated from the other organs and assigned a role that is both biological and metaphysical. One example is illustrated by the diverse set of academic disciplines that study the brain, such as the cognitive sciences, sociology, and cellular biology. These disciplines often approach the problem at a different scale and perspective. Not all approaches are amenable to the study of proximate mechanisms, such as for areas in clinical psychology or the philosophy of mind, and this is one reason for the current retention of explanations based on mental processes [7].
Instead, it is more efficient to study cognition and brain computation as a product of physical forces and communication of information across the system. This scale provides a better foundation for an experiment by a theoretical or an empirical approach. Theoretical study is possible by use of models from network science and information theory, both mature areas of inquiry. The other is from computer science and the use of artificial neural networks. An example is a recent demonstration that colors in a visual scene are efficiently expressed by language. This was shown in an artificial neural network [39]. This shows evidence for efficiency in the neural coding of information and that artificial models emulate this efficiency.
In summary, at its essence, cognition is a study of the physical processes of the brain. As with the organs of most animals, the brain has evolved and acquired derived characters, as documented across the history of life. At the lower biological scales, the brain is an organ no more complex than the heart. Moreover, the proximate mechanisms of cognition, the input and coding of sensory data, are not unimaginable in complexity. Instead, it is an organ with a physiology that is tractable for study at the different scales. As the heart is an organ with a physiology that includes the pumping of blood, the brain is an organ with a physiology that includes the communication of information.

Suggestions for the Natural and Computer Sciences
The cognitive sciences are broad in scope and methods. It is important to continue to integrate their findings with both the other natural and computer sciences. Past findings on perception and awareness have not permeated some of the other areas of knowledge, but eventually any metaphysical basis will yield to a more material definition of cognition [8]. The definition includes an expectation of probabilistic processes that are an essential part of cognition, along with an encoding process that efficiently stores information from the outside world. For acquisition of knowledge of the world and to form generalizations, it is expected that sensory information is internally represented at a higher level of representation, one that is built from the lower levels.
Computer scientists should remain skeptical of assertions about their artificial systems. The benchmark for these systems, such as the advanced deep learning approaches, is not some process of thinking. These are metaphysical ideals that are not material in origin. A neuron is the proximate cause of cognition and does not possess a special immeasurable quality. If a deep learning approach is an effort to emulate a pathway on how we acquire knowledge, then a valid and realistic model should be established beforehand. The problem is not whether the artificial systems can emulate the metaphysics of human thinking, a false proposition, but instead that these systems are emulating a specific and measurable cognitive process.