A Contextual Model for Visual Information Processing

Despite significant achievements in the artificial narrow intelligence sphere, the mechanisms of human-like (general) intelligence are still undeveloped. There is a theory stating that the human brain extracts the meaning of information rather than recognizes the features of a phenomenon. Extracting the meaning is finding a set of transformation rules (a context) and applying them to the incoming information, producing an interpretation. Then, the interpretation is compared to something already seen and is stored in memory. Information can have different meanings in different contexts. A mathematical model of a context processor and a differential contextual space which can perform the interpretation is discussed and developed in this paper. This study examines whether the basic principles of differential contextual spaces work in practice. The model is developed in the Rust programming language and trained on black and white images which are rotated and shifted both horizontally and vertically, according to the saccades and torsion movements of a human eye. Then, a picture that has never been seen in the particular transformation, but has been seen in another one, is exposed to the model. The model considers the image in all known contexts and extracts the meaning. The results show that the program can successfully process black and white images which are transformed by shifts and rotations. This research prepares the grounding for further investigations of the contextual model principles with which general intelligence might operate.


Introduction
It is widely considered that the achievements in artificial intelligence (AI), and especially its applications in image processing, have essentially exceeded the most courageous expectations of previous years [1]. However, these AI models belong to the domain of artificial narrow intelligence (ANI) [2]. Research is concerned with whether these models can be developed to the level of artificial general intelligence (AGI), which is believed to be built on different principles. These principles remain undeveloped due to the complexity and limitations of experiments on the only known example of general intelligence: the living human brain [3,4].
The human brain can understand the meaning of information, which is nowadays thought to be an active process. It consists of three successive phases: sensory reflection, processing of the information, and synthesis of the integral image of the object. In other words, it starts with recognizing the elements of a phenomenon from the external world by the organs of sense (perception), followed by understanding and synthesizing them into an integral objective image. This process is personal and subjective, based on the individual's previous experience [5]. So, information that has not met a recipient is an opportunity that has not become reality. It receives meaning only by being interpreted by an individual [6]. This has some relation to a quantum system of qubits, where every qubit's state is a superposition of two possible values. It obtains a value only after measurement. Before this act, the system does not have any values, or has all of them simultaneously. The same is true of information: until the message is interpreted, it does not have any meaning, or has all possible meanings at the same time. The relations between quantum theory and information processing in biological systems are discussed in [7].
Redozubov and Klepikov [8] state that the process of extracting the meaning of a message can be described as follows: the information is processed somehow, and then the result of this transformation is compared to the objects stored in memory. If there is a match, then the information has received an interpretation. In other words, the incoming information has been interpreted as something encountered before.
The same approach was used by the 'British Bombe' machine for cracking the 'Enigma' code with crib-based (known-text) decryption. The idea is to apply some transformation to the given encrypted message and check if the result makes sense, or, put differently, check if the result matches the words we know. For example, the incoming encrypted message is 'TEWWER'. It makes sense to no one. However, if there is a known text showing that the given meaningless sequence of symbols is transformed into 'WETTER' (German for 'weather'), then we can tell that 'TEWWER' is interpreted as 'WETTER'. The next task is to guess the rules of the transformation. 'T' and 'W' are interchangeable in the given example; thus, they are Stecker partners [9]. As soon as the rules are found for a particular part of the message, the transformation can be applied to another part of it to obtain meaning. Thus, interpretation is tightly coupled with the rules of transformation, which are defined as 'context' in [8]. Information does not have meaning on its own, but can receive it by being interpreted within a particular context. This also means that the same information can potentially have many meanings when interpreted in different contexts. Jokes in different cultures are based on this ambiguity [10,11].
Existing object recognition models, such as HMAX [12,13], VisNet [14,15], different architectures of artificial neural networks [16], and image pattern recognition models [17], are mostly based on hierarchical principles of organization. They recognize primitive objects at the first level and generalize them into complex ones at higher levels. These systems have shown high effectiveness in research and industry. Combining them with the contextual approach is the main question for further investigations.
The architecture of hybrid cognitive systems such as Soar [18], ACT-R [19], LIDA [20], or iCub [21] consists, in general, of two blocks: cognition and action selection. The cognition block is built from a combination of specialized ANI components and performs the processing of incoming information. Depending on its output, the action selection block chooses a particular behavior for the system. The cognitive models represent a theory of the computational structures that are necessary to support human-level agents. They combine reinforcement learning, semantic memory, episodic memory, mental imagery, and an appraisal-based model of emotion, and consider the process of cognition as the process of recognizing an object with a specialized block and then taking the following decisions. Thus, they support the statement that a human-like intelligence (AGI) can be built out of more specialized blocks (ANI), which is still under discussion.
Restricting the number of objects that the system works with to one context is a popular approach [22][23][24][25]. Real-world objects are not isolated, but rather part of an entire scene and adhere to certain constraints imposed by their environment. For example, many objects rely on a supporting surface, which restricts their possible image locations. In addition, the typical layout of objects in the 3D world and their subsequent projection onto a 2D image leads to a preference for certain image neighborhoods that can be exploited by a recognition algorithm. Therefore, context plays a crucial role in scene understanding, as has been established in both psychophysical [26][27][28] and computational [29,30] studies [31]. The main difference of the contextual model described in this paper is that it focuses on the set of transformation rules applied to an object rather than on its recognition. This set has been named 'context', and it has been suggested to be the key difference between ANI and AGI by Redozubov and Klepikov in [8]. This investigation, being based on that paper, follows the same concept and naming conventions. Applying transformation rules to the incoming information restores the original object, followed by matching it with ideal objects stored in memory. If the match is successful, then the incoming information has been interpreted in a particular context. Thus, the contextual model suggests another approach to information processing, focusing on the situation in which a phenomenon appears rather than on its features only. This is different from the existing successful object recognition models and cognition architectures described above.
Thus, despite the fact that current systems have achieved incredible results, they are not able to solve the tasks that AGI is expected to solve. The contextual model described in this paper suggests a new view of the information flow. The main contribution of this research is that it tests the basic principles of the contextual model of visual information processing in experiments. The paper is focused on the concept of applying transformation rules rather than on performance, object recognition accuracy, or practical application. This work is expected to fill the gap in the practical grounding of the contextual approach. The model will be essentially improved in subsequent work by introducing the latest achievements of AI into it.
The paper consists of the following sections. The 'Materials and Methods' section describes the mathematical grounding, algorithms, datasets, and training approaches used for the model's creation and testing. The 'Results' section reports how successfully the model performed in testing, with the analysis in the 'Discussion' section. The 'Conclusions' section summarizes the results and sets the direction for the further development of the model.

Contextual Model Overview
The formalization of the contextual information processing model has been performed by Redozubov [32]. It is stated that if an arbitrary information message consists of discrete elements (concepts c), then the variety of possible messages can be described by N available concepts which form a dictionary C = {c_1, c_2, c_3, ..., c_N}. Thus, an informational message of length k can be defined as a set of concepts I = {c_i1, c_i2, ..., c_ik}, where c_ij ∈ C. The original message, I, can be transformed into another message, I_int, by replacing concepts c_i of the original message with other concepts c_j from the same dictionary C: I = {c_i} → I_int = {c_j}, where c_i, c_j ∈ C; i, j ∈ 1...N. The array of messages I_i with their known interpretations I_int^i forms subject S's memory M, in which every element m is a pair of an initial message I and its interpretation I_int: m = (I, I_int). Thus, the whole memory is M = {m_i | i ∈ 1...N_M}. In the learning mode, subject S is provided with incoming messages I_i and their correct interpretations I_int to save them into memory M.
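The formalism above can be sketched in a few lines of Rust. This is a minimal illustration, not the paper's actual code: `Concept`, `Message`, and `Memory` are assumed names, with concepts encoded as indices into the dictionary C.

```rust
// Sketch of the formalism: a concept is an index into the dictionary C,
// a message I is a sequence of concepts, and memory M stores pairs m = (I, I_int).
pub type Concept = usize;
pub type Message = Vec<Concept>;

pub struct Memory {
    pub entries: Vec<(Message, Message)>, // m_i = (I_i, I_int_i)
}

impl Memory {
    pub fn new() -> Self {
        Memory { entries: Vec::new() }
    }
    // Learning mode: store an incoming message together with its known interpretation.
    pub fn learn(&mut self, original: Message, interpretation: Message) {
        self.entries.push((original, interpretation));
    }
}

fn main() {
    let mut m = Memory::new();
    // A 'TEWWER' -> 'WETTER' style substitution, with letters encoded as indices.
    m.learn(vec![0, 1, 2, 2, 1, 3], vec![2, 1, 0, 0, 1, 3]);
    assert_eq!(m.entries.len(), 1);
}
```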
In this research, a valid memory entry m_i is, for example, a right-shifted picture I_1 paired with the original picture I_int. Another pair is the same original picture, I_int, shifted to the left, I_2. Since the original picture, I_int, consists of primitive objects, c_int^j, they are transformed into other objects, c_j. Here, c_int^j and c_j are the pictures' pixels turning into each other. M can be divided into K groups combining different original concepts c_j with the same interpreted concepts c_int: R_j = {(c_j, c_int)}, where j ∈ 1...K. In other words, if there are three different original concepts, c_1, c_2, and c_3, interpreted into the same c_int, then the memory elements m_i which include the concepts c_j in their information messages I_i can be divided into three groups as well. The frequency of every particular c_int in M gives an appropriate estimate of the interpretation probability. A set of the same transformation rules forms a context Cont. Thus, all the revealed groups, with the same transformation rules in each, form a space of contexts {Cont_i} for the subject S.
When the learning is finished, the interpretations I_int, without their original messages I, form the subset M_int = {I_int^i | i ∈ 1...N_M_int}. Depending on the number of the interpretations I_i → I_int in M, every element in M_int can be given its coherence ρ (1).
Since some different messages I can have the same interpretation I_int, the number of unique elements I_int is N_M_int ≤ N_M. Applying the rules R_j of the context Cont_j, we can obtain the interpretation I_int^j for any new message I and then calculate its consistency with the interpretation memory M_int: ρ_j = ρ(I_int^j). Thus, the computation in a context can be displayed as the schema in Figure 1.
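The exact form of the coherence function (1) is not reproduced in this chunk; the sketch below assumes the simplest estimate consistent with the text, namely the relative frequency of an interpretation among all interpretations stored in M_int. The function name is illustrative.

```rust
// Hedged sketch: estimate the coherence ρ of an interpretation as its
// relative frequency among the interpretations stored in memory M_int.
// This frequency-based form is an assumption, not the paper's equation (1).
pub fn coherence(interpretation: &[u8], memory_int: &[Vec<u8>]) -> f64 {
    if memory_int.is_empty() {
        return 0.0;
    }
    let count = memory_int
        .iter()
        .filter(|m| m.as_slice() == interpretation)
        .count();
    count as f64 / memory_int.len() as f64
}

fn main() {
    // Two of four stored interpretations equal [1, 0], so ρ = 0.5.
    let m_int = vec![vec![1, 0], vec![1, 0], vec![0, 1], vec![1, 1]];
    assert!((coherence(&[1, 0], &m_int) - 0.5).abs() < 1e-9);
    assert_eq!(coherence(&[9, 9], &m_int), 0.0);
}
```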
Based on the coherence value, it is possible to calculate the probability of the interpretation in a particular context Cont_j (2).
As a result, the interpretation of the information I in each of the K possible contexts and the probability of this interpretation are received: {(I_int^j, p_j) | j = 1...K}.
Redozubov [32] states that the information is not understood by the subject S, and therefore has no meaning for it, if all probabilities of its interpretations I_int^j are zero: Σ_j p_j = 0. On the contrary, the interpretations I_int^j with p_j ≠ 0 form a set of possible meanings of the incoming information I: {(I_int^k, p_k) | k = 1...M, M ≤ K, p_k > 0}. The interpretation with the highest probability is the main one for the subject S: I_int' = I_int^l, where l = index(max(p_k)).
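The meaning-selection rule can be written out directly. This is a sketch under the definitions above: an index into the per-context probability array stands in for the interpretation I_int^j, and a `None` result encodes the "not understood" case Σ_j p_j = 0.

```rust
// Meaning selection: interpretations with p_j = 0 carry no meaning; the
// interpretation with the highest probability becomes the main meaning I_int'.
pub fn main_meaning(probabilities: &[f64]) -> Option<usize> {
    let mut best: Option<usize> = None;
    for (j, &p) in probabilities.iter().enumerate() {
        if p > 0.0 && best.map_or(true, |b| p > probabilities[b]) {
            best = Some(j);
        }
    }
    best // None means Σ_j p_j = 0: the information is not understood
}

fn main() {
    assert_eq!(main_meaning(&[0.0, 0.2, 0.7, 0.1]), Some(2));
    assert_eq!(main_meaning(&[0.0, 0.0]), None);
}
```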
Original incoming information I is received by the contextual space. Each context Cont_k in the space applies its transformation rules R_k and receives an interpretation I_int^k. It is compared to the memory, which is identical for all modules. The comparison involves the calculation of the conformity assessment ρ_k. It can be described as the degree to which I_int^k resembles items in M_int. The final meaning selection is based on the conformities ρ and the interpretation probabilities p saved in the memory. When the interpretation is received, the memory can be updated, accumulating experience.
The algorithm of the differential context space model was developed. It describes how the model learns and interprets transformations from given examples. It was programmed in the Rust programming language and tested as an executable module for a personal computer, with 32 × 32 pixel black and white images placed on a 64 × 64 pixel field. Transformation learning was performed on 242 different 32 × 32 icons. They were normalized to have only black (considered as 0) and white (considered as 1) pixels in the 64 × 64 field (Figure 2). The choice of images was based on the following criterion: they should be different enough to make it possible to learn a transformation for every pixel.
Firstly, the model was tested with only xy transformations. Then, rotations were added. As a result, in the first step, the contextual space worked with data in which every incoming bit was related to a bit in the output. With the rotations, this rule was eased, making the pixel transformations less obvious. The exceptions are α ∈ {0, π/2}: in these cases, every pixel in the input image also has a related pixel in the output. This happens due to the nature of raster graphic rotations. The algorithm uses nearest-pixel interpolation logic for the rotation realization.
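Nearest-pixel rotation can be sketched as follows. This is an illustration under assumed conventions (inverse mapping about the field centre with rounding), not the module's actual code: for α ∈ {0, π/2} the mapping permutes pixels exactly, while other angles may merge or drop pixels.

```rust
// Nearest-pixel rotation of a square binary raster about its centre.
// Each destination pixel is inverse-mapped into the source and the
// nearest source pixel is sampled.
pub fn rotate_nearest(src: &[Vec<u8>], alpha: f64) -> Vec<Vec<u8>> {
    let n = src.len();
    let c = (n as f64 - 1.0) / 2.0; // centre of the pixel grid
    let (sin, cos) = alpha.sin_cos();
    let mut dst = vec![vec![0u8; n]; n];
    for y in 0..n {
        for x in 0..n {
            let dx = x as f64 - c;
            let dy = y as f64 - c;
            // Inverse rotation: where did this destination pixel come from?
            let sx = (cos * dx + sin * dy + c).round() as isize;
            let sy = (-sin * dx + cos * dy + c).round() as isize;
            if sx >= 0 && sy >= 0 && (sx as usize) < n && (sy as usize) < n {
                dst[y][x] = src[sy as usize][sx as usize];
            }
        }
    }
    dst
}

fn main() {
    let img = vec![vec![1, 0], vec![0, 0]]; // a single set pixel
    let quarter = rotate_nearest(&img, std::f64::consts::FRAC_PI_2);
    // A quarter turn is lossless: exactly one set pixel remains.
    let set: u32 = quarter.iter().flatten().map(|&p| p as u32).sum();
    assert_eq!(set, 1);
}
```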
The images were subjected to 4356 different transformations in total. They include xy shifts from 0 to 16 px, to imitate saccades, and rotations of α ∈ {0, π/4, π/2, 3π/4}, to imitate torsions of the eye [33]. This is a subset of the rotations used in [34]: {0, π/8, π/4, 3π/8, π/2, 5π/8, 3π/4, 7π/8}. The number of possible transformations is explained by the fact that a 32 × 32 px image located in the center of the 64 × 64 px field can be shifted up to 16 px in the x or y direction, including zero. This gives 2 × 16 + 1 = 33 possible positions for every coordinate, and hence 33 × 33 = 1089 possible xy combinations. Multiplying by the number of angles α gives 1089 × 8 = 8712 transformations for the full set. The full set of rotations and shifts was then used for the horizontal bar interpretation tests. Every new image in the learning set was exposed to the model with no shift and rotation, to store it in memory without transformations as the interpretation.
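The counting above can be checked directly:

```rust
// The transformation-count arithmetic from the text, written out.
fn main() {
    let positions_per_axis = 2 * 16 + 1; // shifts of up to 16 px either way, including zero
    let xy = positions_per_axis * positions_per_axis;
    assert_eq!(xy, 1089); // possible xy combinations
    assert_eq!(xy * 4, 4356); // with rotations {0, π/4, π/2, 3π/4}
    assert_eq!(xy * 8, 8712); // full set of eight rotation angles
}
```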
The trained model was saved into a file to restore it in the same state for different testing sessions. Then, the model was presented with an image from a set which it had never seen. This set is also normalized to black and white 32 × 32 pixel images in the 64 × 64 field. The objects in the pictures are taken from the 'Galaxy' computer game and have no relation to the training icon set (see Section 3.1). The set was chosen because every image in it is different from the rest in the set and from those in the transformation learning set. Firstly, the system was shown the interpretation object. Then, it was exposed to one of the transformations learned on the other examples, but it had never seen the object in it before. The model interpreted the given image, putting in the output file the interpretation, the transformation that was applied to it, and the accuracy of the result.

Structures Diagram of the Model
The main structures used in the diff_context_space program and the relations between them are shown in Figure 3.
ContextSpace consists of a set of contexts and has two methods: Learn and Interpret. In the learning mode, the space receives some known Transformation t, incoming Information i, and its interpretation in the given transformation.

In relation to the human brain and visual information, it is known that the eyes constantly perform small (microsaccades) and medium (saccades) movements [35,36]. The muscles which operate the eye can move it up, down, left, and right. It is also possible to rotate the eye by a small angle α (torsions). In other words, the picture on the retina is transformed by an xy shift or rotation. Since it is the brain which operates the eyes, the transformation of the picture is known. As a result, the brain receives the previous picture (I_int), the new one (I), and what happened to it (the transformation). The same parameters are received by the ContextSpace structure in its Learn method.
The Transformation structure has two fields which describe the horizontal and vertical shifts as integer values: Horizontal and Vertical. The structure has methods to calculate the distance to another instance (DistanceTo) and to apply itself to information (ApplyTo).
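The Transformation structure might be sketched as below. The field and method names follow the text, while the method bodies are assumptions: a Manhattan distance metric and a shift that drops pixels leaving the field.

```rust
// Sketch of the Transformation structure: integer horizontal/vertical
// shifts, a distance metric, and application to a square pixel field.
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Transformation {
    pub horizontal: i32,
    pub vertical: i32,
}

impl Transformation {
    // DistanceTo: assumed here to be the Manhattan distance between shifts.
    pub fn distance_to(&self, other: &Transformation) -> i32 {
        (self.horizontal - other.horizontal).abs() + (self.vertical - other.vertical).abs()
    }

    // ApplyTo: shift the field; pixels shifted outside the field are dropped.
    pub fn apply_to(&self, field: &[Vec<u8>]) -> Vec<Vec<u8>> {
        let n = field.len() as i32;
        let mut out = vec![vec![0u8; n as usize]; n as usize];
        for y in 0..n {
            for x in 0..n {
                let (nx, ny) = (x + self.horizontal, y + self.vertical);
                if nx >= 0 && ny >= 0 && nx < n && ny < n {
                    out[ny as usize][nx as usize] = field[y as usize][x as usize];
                }
            }
        }
        out
    }
}

fn main() {
    let t = Transformation { horizontal: 1, vertical: 0 };
    let shifted = t.apply_to(&vec![vec![1, 0], vec![0, 0]]);
    assert_eq!(shifted, vec![vec![0, 1], vec![0, 0]]); // pixel moved one step right
    assert_eq!(t.distance_to(&Transformation { horizontal: 0, vertical: 0 }), 1);
}
```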
The Information structure has one field to keep the data (Data). It is a one-dimensional array of unsigned integer values which can be represented in 1, 2, 4, 8, or 16 bytes. This structure has a method to calculate its coherence to other information, returning a float value from 0 to 1.
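A possible shape of the Information structure is sketched below. The coherence formula is not given in the text; the intersection-over-union of set bits used here is an assumption that merely satisfies the stated 0-to-1 range.

```rust
// Sketch of the Information structure: a flat array of unsigned integers
// holding packed pixels, with a coherence measure in [0, 1].
pub struct Information {
    pub data: Vec<u64>, // packed pixels, one bit each
}

impl Information {
    // Assumed coherence: shared set bits divided by all set bits (Jaccard-like).
    pub fn coherence_to(&self, other: &Information) -> f64 {
        let both: u32 = self
            .data
            .iter()
            .zip(&other.data)
            .map(|(a, b)| (a & b).count_ones())
            .sum();
        let any: u32 = self
            .data
            .iter()
            .zip(&other.data)
            .map(|(a, b)| (a | b).count_ones())
            .sum();
        if any == 0 { 1.0 } else { both as f64 / any as f64 }
    }
}

fn main() {
    let a = Information { data: vec![0b1100] };
    let b = Information { data: vec![0b1010] };
    // One shared set bit out of three set bits overall.
    assert!((a.coherence_to(&b) - 1.0 / 3.0).abs() < 1e-9);
}
```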
An instance of the ContextSpace finds, in its array of contexts, the one with the same transformation and initiates its Learn method, passing the incoming information i and its interpretation. The learning logic of the Context structure is discussed in Section 2.3.
In the interpretation mode (the Interpret method), an instance of the ContextSpace structure receives incoming information i and a float value for the minimum desired accuracy, varying from 0 to 1. The contextual space requests every instance of the Context in its array and initiates its Interpret method, passing i to it. The algorithm of the interpretation is explained in detail in Section 2.4.
Having applied its own transformation rules, every Context returns an interpretation and an accuracy. The latter is used to select the winning context with probability-dependent logic. The higher the accuracy, the higher the chance that the corresponding context becomes the winner.
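The probability-dependent winner selection can be sketched as a roulette-wheel draw over the accuracies. This is an illustrative implementation, not the module's code; the random value is passed in so the function stays deterministic and testable.

```rust
// Probability-dependent winner selection: the chance of a context winning
// is proportional to its accuracy. `draw` is a uniform random value in [0, 1).
pub fn select_winner(accuracies: &[f64], draw: f64) -> Option<usize> {
    let total: f64 = accuracies.iter().sum();
    if total <= 0.0 {
        return None; // no context produced a usable interpretation
    }
    let mut threshold = draw * total;
    for (i, &a) in accuracies.iter().enumerate() {
        threshold -= a;
        if threshold < 0.0 {
            return Some(i);
        }
    }
    Some(accuracies.len() - 1) // guard against floating-point rounding
}

fn main() {
    let acc = [0.1, 0.6, 0.3];
    assert_eq!(select_winner(&acc, 0.05), Some(0)); // draw falls in [0, 0.1)
    assert_eq!(select_winner(&acc, 0.5), Some(1));  // draw falls in [0.1, 0.7)
    assert_eq!(select_winner(&acc, 0.95), Some(2)); // draw falls in [0.7, 1.0)
    assert_eq!(select_winner(&[0.0, 0.0], 0.5), None);
}
```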

Context Learning Algorithm
The main point of the learning algorithm is to pick up the transformation rules for every context and clean them of additional data. A particular context that is responsible for a certain transformation is given incoming information i and its interpretation in the learning mode (Section 2.2). The first example in Figure 4 shows one right-shift position of a vertical line in an 8 × 8 pixel visual field.

We suspect that every pixel set to 1 in the incoming information has a corresponding pixel, or group of pixels, in the interpretation. This comes from the nature of visual information. Thus, the context creates a set of rules for the set pixels, in which every rule is nothing other than a hypothesis about how one pixel is transformed. The first learning example does not reveal exactly which pixel or group of pixels a given pixel is transformed into; that is why the whole interpretation picture is saved. During the next stage, the context is given another group of pixels under the same transformation (a 1 px shift to the right). Having processed it in the same way, the interpretation rule can be clarified for the pixels which received an interpretation in the past. The simplest way to achieve this is to combine the new experience with the existing one using the AND operator. This is illustrated in Figure 4. This algorithm is deliberately simple, in order to receive predictable results. It is essential to prove the model's principles in practice. The methods for revealing the transformation rules in the field shall be more sophisticated; however, they shall learn the rules from examples rather than have them encoded.
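The AND-accumulation of per-pixel rule hypotheses described above can be sketched as follows. This is a minimal illustration, not the diff_context_space code: the 64-bit field and the function name are assumptions.

```rust
// Learning step: for each set input bit, a rule holds the hypothesis of
// which output bits it maps to. The first example stores the whole
// interpretation; later examples narrow the hypothesis with a bitwise AND.
pub fn learn_rules(rules: &mut Vec<Option<u64>>, input: u64, interpretation: u64) {
    for bit in 0..64 {
        if input & (1 << bit) != 0 {
            rules[bit] = Some(match rules[bit] {
                None => interpretation,              // first evidence: keep everything
                Some(prev) => prev & interpretation, // later evidence: intersect
            });
        }
    }
}

fn main() {
    let mut rules: Vec<Option<u64>> = vec![None; 64];
    // Two examples of a 1-bit-left shift in an 8-bit "image":
    learn_rules(&mut rules, 0b0000_0010, 0b0000_0100);
    learn_rules(&mut rules, 0b0000_0110, 0b0000_1100);
    // Bit 1 appeared in both examples, so its rule narrows to bit 2 only.
    assert_eq!(rules[1], Some(0b0000_0100));
    // Bit 2 appeared once; its rule is still the whole second interpretation.
    assert_eq!(rules[2], Some(0b0000_1100));
}
```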

Context Interpretation Algorithm
The interpretation algorithm receives information i, applies the learned transformation rules to it, and calculates the accuracy, as shown in Figure 5. In the first step, the incoming information is disassembled into a set of bits. Then, the context applies its transformation rules to every set bit there. Having summarized the results with the OR operator, the context receives an interpretation. The interpretation is compared to the memory, shared between all contexts, which holds all of the already known interpretations, to find the best match. The calculation of the accuracy is based on comparing the number of set bits in the incoming information with the number of rules the context has for these particular bits. The presented logic is simple, but it is enough to prove the basic principles of the interpretation in the context. It can be essentially improved, for example, by adding modern artificial neural networks for selecting the match in the interpretation memory [37].
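The OR-summarization and the accuracy calculation can be sketched as follows, reusing the same assumed 64-bit rule representation; accuracy is taken here as the share of set input bits for which the context has a rule, matching the comparison described in the text.

```rust
// Interpretation step: OR together the rules of every set input bit, and
// report accuracy as the fraction of set bits the context has a rule for.
pub fn interpret(rules: &[Option<u64>], input: u64) -> (u64, f64) {
    let mut out = 0u64;
    let mut set_bits = 0u32;
    let mut covered = 0u32;
    for bit in 0..64 {
        if input & (1 << bit) != 0 {
            set_bits += 1;
            if let Some(r) = rules[bit] {
                out |= r; // summarize per-bit rules with OR
                covered += 1;
            }
        }
    }
    let accuracy = if set_bits == 0 {
        0.0
    } else {
        covered as f64 / set_bits as f64
    };
    (out, accuracy)
}

fn main() {
    let mut rules: Vec<Option<u64>> = vec![None; 64];
    rules[1] = Some(0b0100); // bit 1 maps to bit 2 (a learned 1-bit-left shift)
    let (out, acc) = interpret(&rules, 0b0011); // bit 0 has no rule yet
    assert_eq!(out, 0b0100);
    assert!((acc - 0.5).abs() < 1e-9);
}
```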
Figure 5. Information interpretation algorithm. The incoming information is disassembled into its set pixels. Then, the rules received in the learning mode are applied to it, producing interpretations with some accuracy. Set (value 1) pixels are highlighted with orange color.


XY Transformations
The model and programmed module are described in Section 4. Firstly, the model was tested with 242 16 × 16 pixel black and white images placed on a 32 × 32 pixel field. The images were subjected to 289 different shifts to imitate saccades: from zero to eight pixels in the horizontal and vertical directions. The total number of images consumed by the model to learn the transformation rules is 289 × 242 = 69,938. The icons were normalized to have only black (considered as 0) and white (considered as 1) pixels in the 32 × 32 field. The icons and the results are grouped into Table 1. Then, the model was tested with the same set of images, but with 32 × 32 px sizes in a 64 × 64 field. The number of transformations applied was 1089. The results were similar: all transformations and images were recognized properly.

XY and 4α Transformations for the Galaxy Images
The model was tested with 32 × 32 pixel black and white images placed on a 64 × 64 pixel field. The images were subjected to 4356 different transformations. Similar to the transformations reported in Section 2.1, they included xy shifts from 0 to 16 px to imitate saccades; however, α rotations of 0, π/4, π/2, and 3π/4 were added to imitate torsions. The total number of images consumed by the model to learn is 4356 × 242 = 1,054,152. The interpretation results were assessed by the number of errors the system made in image recognition. An error means that the model was not able to recognize the initial image. Another type of assessed error was transformation recognition. In the results table, the column '# of transformations' shows the number of transformations tested; 'Correctly recognized transformations' shows the proportion of cases in which the model recognized the transformation properly; and 'Average interpretation coherence' shows the coherence between the selected interpreted image and the one selected from memory (from 0.0 to 1.0). The results of applying the selected context to find the interpretation are shown in Figure 6. Some of them have artifacts (extra pixels).

XY and 4α Transformations for the Galaxy Images
The model was tested with 32 × 32 pixel black and white images placed on a 64 × 64 pixel field.The images have taken 4356 different transformations.Similar to the reported transformations in Section 2.1, the transformations included xy shifts from 0 to 16 px to imitate saccades.However, 0, π/4, π/2, and 3π/4 α rotations were added to imitate torsions.The total number of images consumed by the model to learn is 4356 × 242 = 1,054,152.The interpretation results were assessed regarding the number of errors the system made in image recognition.An error means that the model was not able to recognize Then, the model was tested with the same set of images, but with 32 × 32 px sizes in a 64 × 64 field.The number of transformations applied was 1089.The results were similar: all transformations and images were recognized properly.The distribution of the image recognition errors, depending on the applied transformation, are shown in Figure 7.In the charts, the x and y axis refer to the x and y values of the transformation in which image interpretation error occurred.The rotation angle α is not depicted there.The distribution of the image recognition errors, depending on the applied transformation, are shown in Figure 7.In the charts, the x and y axis refer to the x and y values of the transformation in which image interpretation error occurred.The rotation angle α is not depicted there.The distribution of the image recognition errors, depending on the applied transformation, are shown in Figure 7.In the charts, the x and y axis refer to the x and y values of the transformation in which image interpretation error occurred.The rotation angle α is not depicted there.Almost all images from the 16 examples have interpretation errors that count for 14% or less.Among them, two images (Figure 7d,e) had around 1.3% interpretation errors; one image (Figure 7o) had less than 3% interpretation errors; two images (Figure 7g,i) had around 3.5% interpretation errors; three images (Figure 
7a,b,l) had around 6% interpretation errors; three images (Figure 7h,k,p) had around 8% interpretation errors; three images
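The counts above can be reproduced with a short Rust sketch of the transformation grid. This assumes the xy shifts range over −16..16 px on each axis (33 × 33 = 1089 combinations, a reading consistent with the reported totals) together with the four listed rotation angles:

```rust
/// Enumerate all (dx, dy, alpha) transformations under the stated assumptions:
/// shifts of -16..=16 px per axis, rotations of 0, pi/4, pi/2, 3pi/4.
fn transformation_grid() -> Vec<(i32, i32, f64)> {
    let angles = [
        0.0,
        std::f64::consts::FRAC_PI_4,
        std::f64::consts::FRAC_PI_2,
        3.0 * std::f64::consts::FRAC_PI_4,
    ];
    let mut grid = Vec::new();
    for dx in -16..=16 {
        for dy in -16..=16 {
            for &a in &angles {
                grid.push((dx, dy, a));
            }
        }
    }
    grid
}

fn main() {
    let grid = transformation_grid();
    // 33 * 33 shifts times 4 rotations = 4356 transformations,
    // and 4356 transformations * 242 images = 1,054,152 training presentations.
    assert_eq!(grid.len(), 4356);
    assert_eq!(grid.len() * 242, 1_054_152);
}
```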

XY and 8α Transformations for Horizontal Bar
The next step of this investigation will address the pinwheel formation of slit or bar rotation activations [38,39]; with this in mind, the model was tested only on the interpretation of a horizontal 32 × 1 px bar (Figure 8) instead of the Galaxy set (Table 1). This setup also allows training examples to be excluded from the interpretation memory, requiring the model only to recognize the bar with a certain accuracy. The number of training pictures was reduced to 100 images, but with smoother lines (Figure 9). Based on the results in Section 3.1, the number of learning examples can be decreased without an essential influence on the extraction of transformation rules by the model. This allows for a significant speed-up of the training process, since the number of transformations is 8712. The results revealed the following: in 20 cases (0.23%), the interpretation was not found at the 0.7 level of accuracy; in 425 cases (4.88%), the transformation was not selected properly.
All transformation errors had an incorrect x coordinate, while y and α were selected properly. A significant number of errors are related to a 1 px mistake in the coordinate (Figure 10).
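The 0.7 accuracy threshold mentioned above can be illustrated with a minimal Rust sketch. The coherence measure used here is a Jaccard-style overlap of set pixels, which is an assumption: the paper only specifies that coherence is a value from 0.0 to 1.0.

```rust
/// Coherence between two binary images: shared set pixels over the union of
/// set pixels (an assumed measure, returning a value in 0.0..=1.0).
fn coherence(a: &[u8], b: &[u8]) -> f64 {
    let inter = a.iter().zip(b).filter(|(x, y)| **x == 1 && **y == 1).count();
    let union = a.iter().zip(b).filter(|(x, y)| **x == 1 || **y == 1).count();
    if union == 0 { 1.0 } else { inter as f64 / union as f64 }
}

/// Return the index of the best-matching memory image, but only if its
/// coherence clears the accuracy threshold; otherwise no interpretation.
fn interpret(candidate: &[u8], memory: &[Vec<u8>], threshold: f64) -> Option<usize> {
    memory.iter().enumerate()
        .map(|(i, m)| (i, coherence(candidate, m)))
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .filter(|&(_, c)| c >= threshold)
        .map(|(i, _)| i)
}

fn main() {
    let memory = vec![vec![1u8, 1, 1, 0], vec![0u8, 0, 1, 1]];
    // Candidate overlaps the first memory image in 2 of 3 set pixels (2/3).
    let candidate = vec![1u8, 1, 0, 0];
    assert_eq!(interpret(&candidate, &memory, 0.7), None); // 2/3 < 0.7: not found
    assert_eq!(interpret(&candidate, &memory, 0.5), Some(0)); // accepted
}
```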

Discussion
The presented model is specialized for visual information processing. This paper concentrates only on proving the basic principles of the model in order to lay the foundation for its further development.
The images used in this investigation were only black and white, with a low resolution, to make the results of the model predictable. However, the model can be improved by adding edge detection algorithms so that it can work with greyscale or color pictures as well. Also, increasing the image resolution should decrease the influence of pixel loss on rotation transformations, making the edges smoother. Adding uncertainty to the interpretation of pixels with probability, or using the most successful recognition models described in Section 4 for matching interpretations with memory, should essentially improve the model. At the same time, it will become less predictable, making it harder to reproduce and analyze the results. Nevertheless, this should make an essential contribution to practical applications.

Conclusions
This paper has shown that the differential context space can successfully learn the rules of xyα transformations of black and white visual information. This was demonstrated for 16 × 16 px images in a 32 × 32 field and for 32 × 32 px images in a 64 × 64 field. The model can successfully interpret these images, even though they have never been seen with the particular shifts and rotations. It has been demonstrated that the model does not make interpretive mistakes if no pixels are lost during the transformation. The highest number of errors occurred for rotations of π/4 and 3π/4 with the furthest shift to the right. This fact can be explained by the nature of raster image rotations: in these positions, the maximum number of pixels is lost compared to the original. Thus, the results show that the basic principles of contextual information processing have experimental grounding and that continued investigations are warranted. The next step is to improve the contextual model by integrating artificial neural networks into the object recognition steps.

Figure 1.
Figure 1. Computational scheme of the context module.


Figure 3.
Figure 3. Structures diagram of the model. ContextSpace consists of a set of contexts and has two methods: Learn and Interpret. In the learning mode, the space receives some known Transformation t, incoming Information i, and its interpretation in the given transformation. In relation to the human brain and visual information, it is known that the eyes constantly perform small (microsaccades) and medium (saccades) movements [35,36]. The muscles which operate the eye can move it up, down, left, and right. It is also possible to rotate the eye by a small angle α (torsions). In other words, the picture on the retina is transformed by an xy shift or rotation. Since it is the brain which operates the eyes, the transformation of the picture is known. As a result, the brain receives the previous picture (I_int), the new one (I), and what happened to it (transformation). The same parameters are received by the ContextSpace structure in its Learn method. The Transformation structure has two fields describing the horizontal and vertical shifts as integer values: Horizontal and Vertical. The structure has methods to calculate the distance to another instance (DistanceTo) and to apply itself to information (ApplyTo). The Information structure has one field to keep the data (Data). It is a one-dimensional array of unsigned integer values which can be represented in 1, 2, 4, 8, or 16 bytes. This structure has a method to calculate its coherence to other information, returning a float value from 0 to 1.
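The structures described above can be sketched in Rust as follows. The names follow the paper's diagram (Transformation, Information, and their methods), but the method bodies are illustrative stubs rather than the authors' implementation; in particular, the coherence measure shown (fraction of matching pixels) is an assumption.

```rust
/// "Transformation": an xy shift with integer Horizontal and Vertical fields.
#[derive(Clone, Copy)]
struct Transformation {
    horizontal: i32, // "Horizontal" field
    vertical: i32,   // "Vertical" field
}

impl Transformation {
    /// "DistanceTo": Manhattan-style distance to another transformation
    /// (the exact metric is not specified in the text, so this is assumed).
    fn distance_to(&self, other: &Transformation) -> i32 {
        (self.horizontal - other.horizontal).abs()
            + (self.vertical - other.vertical).abs()
    }
}

/// "Information": a one-dimensional array of unsigned pixel values.
struct Information {
    data: Vec<u8>, // "Data" field; 0 = black, 1 = white
}

impl Information {
    /// Coherence to other information, from 0.0 to 1.0, shown here as the
    /// fraction of matching pixels -- an assumed concrete measure.
    fn coherence(&self, other: &Information) -> f64 {
        let same = self.data.iter().zip(&other.data)
            .filter(|(a, b)| a == b).count();
        same as f64 / self.data.len() as f64
    }
}

fn main() {
    let t1 = Transformation { horizontal: 3, vertical: -1 };
    let t2 = Transformation { horizontal: 0, vertical: 0 };
    assert_eq!(t1.distance_to(&t2), 4);

    let a = Information { data: vec![1, 0, 1, 1] };
    let b = Information { data: vec![1, 0, 0, 1] };
    assert!((a.coherence(&b) - 0.75).abs() < 1e-9); // 3 of 4 pixels match
}
```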


Computers 2024 ,
13, x FOR PEER REVIEW 8 of 17

Figure 4 .
Figure 4. Context learning algorithm example.The context creates a set or the transformation rules for every set pixel in the incoming information and, as a result, produces 6 and 4 transformation rule hypotheses for 2 pictures.The consolidation of the transformation rules combines the previous experience, clarifying the hypotheses for every incoming set (value 1) pixel (highlighted with orange color).The reset (changed value from 1 to 0) pixels on clarification are highlighted with red color.


Figure 7 .
Figure 7. Image recognition errors with xy transformation distribution (a-p).


Figure 10 .
Figure 10. X transformation errors for the slit. The error distribution with respect to rotation is shown in Figure 11. The essential number of transformation misinterpretations happened for α ∈ {45°, 135°} (α ∈ {π/4, 3π/4}).



Figure 11 .
Figure 11. X transformation error distribution (blue line) for the slit, depending on α.
