A Brain-Inspired Goal-Oriented Robot Navigation System

Abstract: Autonomous navigation in unknown environments remains a challenge for robotics. Many efforts have been made to develop truly autonomous goal-oriented robot navigation models based on the neural mechanisms of spatial cognition and mapping in animals' brains. Inspired by the Semantic Pointer Architecture Unified Network (SPAUN) neural model and the neural navigation mechanism, we developed a brain-like, biologically plausible mathematical model and applied it to robotic spatial navigation tasks. The proposed cognitive navigation framework adopts a one-dimensional ring attractor to model the head-direction cells, uses a sinusoidal interference model to obtain the grid-like activity pattern, and derives the optimal movement direction from the entire set of activities. The application of adaptive resonance theory (ART) effectively reduces resource consumption and addresses the stability–plasticity problem in the dynamic adjustment of the network. This brain-like system model broadens the perspective for developing more powerful autonomous robotic navigation systems. The proposed model was tested under different conditions and exhibited superior navigation performance, proving its effectiveness and reliability.


Introduction
Animals have innate skills to navigate in unfamiliar environments [1]. It seems effortless for animals such as rats and primates but has been a fundamental long-term challenge for robots. Developing a brain-like navigation model based on the brain navigation mechanism of animals has attracted increasing attention.
The hippocampus, one of the most important brain regions, encodes the direction, distance, and speed of movement and is thought to play a pivotal role in place learning, navigation, and exploration. Place cells in the CA1 field of the hippocampus show a conspicuous functional characteristic of location-specific activity: they fire at a high rate when the animal is in a particular location and may together form a map-like representation of the environment that animals can use for efficient navigation [2]. The grid cells discovered by the Mosers are also key cells for navigation [3][4][5]. Grid cells in the entorhinal cortex (EC) have strikingly periodic firing fields: a regularly tessellating, triangular grid-like pattern covers the entire environment, regardless of external cues. This suggests that grid cells support path integration by internally generating a map of the spatial environment, which has been demonstrated with attractor network models [6,7]. Head-direction (HD) cells exist in the postsubiculum and the EC [8]. Like a compass, they provide directional input for the brain-like navigation system [9]. The discharge frequency of the speed cells found in the medial entorhinal cortex (MEC) is regulated by running speed, allowing the animal's location to be represented dynamically [10]. By integrating HD cells and speed cells, animals know their direction and speed of motion and can track their idiothetic movement [11].
Navigation also involves the memory and retrieval of environmental information, which is considered a cognitive mechanism for storing and processing information. A navigation system with spatial cognition and mapping can achieve faster and more stable navigation. Anatomically, the neural responses of working memory are found in large areas of the parietal and frontal cortex. Accumulating evidence suggests that the posterior parietal cortex (PPC) and the dorsolateral prefrontal cortex (DLPFC) in the cerebral cortex are closely involved in working memory [12][13][14] and in the temporary storage and manipulation of data related to cognitive control [15]. Adaptive resonance theory (ART) is a cognitive and neural theory of how the brain autonomously learns to categorize and recognize events in a changing world; in this paper, the ART mechanism is used to handle the retrieval and storage of environmental information. Rewards are a positive incentive for desired actions: a reward increases the probability that the preceding action is selected again and guides the animal toward the final goal. Anatomically, the orbitofrontal cortex (OFC) is involved in processing reward signals [16]. In this paper, rewards are represented by the value functions of Q-learning to aid navigation.
Numerous computational models have been proposed based on place cells and grid cells. Each place cell has its own firing field and, taken together, the firing fields of place cells tend to cover the entire surface of the environment; they have mainly been modeled as Gaussian functions [17][18][19]. Grid cells have mainly been modeled in three classes: oscillatory interference [20][21][22], continuous attractor networks [6,7,23], and self-organizing models [24][25][26]. These models concentrate only on the formation of place cells and grid cells; they do not really address navigation and offer little guidance for autonomous robot navigation.
Many robot navigation systems use brain-like computational mechanisms to complete navigation tasks. Milford developed a bionic navigation algorithm called RatSLAM [27,28]. It combines head-direction cells and place cells to form abstract pose cells, which are mainly used for accurate positioning. Zeng and Si [29] proposed a cognitive mapping model in which HD cells and grid cells use the same working mechanism; however, a place cell network is not included, so their model cannot provide metric mapping. Tang [30] developed a hierarchical brain-like cognitive model with a particular focus on the spatial coding properties of the EC, but it includes neither a navigation strategy nor cognitive mapping. Moreover, none of the above models contains a biological mechanism for memory storage and retrieval, which has an important effect on effective navigation. As noted above, adaptive resonance theory (ART) describes how the brain autonomously learns to categorize, recognize, and predict objects and events in a changing world. In this paper, ART is adapted to handle working memory; its match-based learning is an effective way to maintain the dynamic stability of the network and to achieve better navigation performance.
Eliasmith [31] proposed a unified set of neural mechanisms, the Semantic Pointer Architecture Unified Network (SPAUN), which performs a wide range of tasks. The structure of SPAUN is divided into three layers: input information processing, working memory, and action execution. Inspired by the framework of SPAUN's neural structure and the neural mechanisms of biological brain navigation, we propose a novel three-layer brain-like cognitive navigation structure that complements SPAUN. The proposed navigation model can autonomously complete navigation tasks.

System Structure
Inspired by the SPAUN neural structure and the mechanism of neural navigation in animals' brains, a brain-like, biologically plausible cognitive model was proposed and applied to robotic spatial navigation tasks. The model's architecture is shown in Figure 1.

Figure 1. System diagram of the navigation model based on the Semantic Pointer Architecture Unified Network (SPAUN) and the brain's neural system. This framework processes data from input signals to final action generation to achieve autonomous navigation. The model consists of three layers: the hippocampal region, which processes input signals (head-direction and instantaneous speed signals) by path integration to express position; the cerebral cortex, which creates and modifies the topological map of environmental information and passes reward information to the decision-making layer; and the basal ganglia, which determine the final action by a winner-take-all rule. Abbreviations: DG, dentate gyrus; CA3 and CA1, internal areas of the hippocampus; EC, entorhinal cortex; PPC, posterior parietal cortex; DLPFC, dorsolateral prefrontal cortex.
The model includes three layers: cerebral cortex, hippocampal region and basal ganglia. The basal ganglia determine and perform the most highly recommended behavior, in this case, leading the robot to the target. The cerebral cortex mainly handles the working memory and reward function. When new knowledge is learned, the memory capacity of ART can increase adaptively without destroying the previous knowledge of the network. The ART method can build environmental information dynamically and stably. In this paper we used the ART network to dynamically process information, helping to achieve rapid and stable navigation.
Sitting between the cerebral cortex and basal ganglia is the hippocampal region, which includes the hippocampus and the EC. This region plays a crucial role in spatial learning and navigation [32]: here, external information (HD and speed signals) is converted into an internal representation of the environment. The hippocampal region is where grid cells, HD cells, and speed cells converge to transfer place information to the place cells.
Initially, speed cells and HD cells feed into layer III of the EC, which handles the ring-attractor pre-processing. Their signals then pass through layer II of the EC, where the grid cells' activities are obtained. The grid cells in EC layer II project through the perforant pathway into the dentate gyrus (DG). The granule cells in the DG connect to CA3 of the hippocampus via mossy fibers, and CA3 projects to CA1 via Schaffer collaterals. The deep layers V/VI provide top-down feedback from the hippocampus to the EC to help stabilize memories.

Model Description
Motion information is obtained by integrating direction and speed signals and then updating the self-position. This paper uses a one-dimensional ring attractor to model the HD cells, where φ_{i,j} denotes the preferred direction of the j-th cell in the i-th ring attractor, j = 1, ..., m, and m = 6 is the number of neurons in a single attractor. There are three ring attractors with different preferred directions, separated by an angle of π/3.
Suppose that at time t the robot is moving with speed V(t). The velocity is projected onto the preferred direction of each ring attractor, and from the projected velocity the wavelength of the corresponding sine function can be derived. If two sinusoidal functions of different frequencies interfere with each other, they generate beat-frequency oscillations with a fixed spatial period (as shown in Figure 2b). The beat frequency equals the difference between the two frequencies, and the wavelength λ_i of the resulting periodic oscillation is a fixed value that represents the distance between vertices in the grid-cell space. We set f_0 = 7 Hz as the reference frequency. The beat-frequency oscillation is integrated into the corresponding phase in the different ring attractors (as shown in Figure 2a), and the phase code is then converted into a rate code to express the position signal, giving the activity of the j-th cell in the i-th ring.
With the combined input of ring attractors with different preferred orientations, the amplitude of the beat-frequency oscillations is recognized at each x ± kλ_i (x = V(t) × t). This means that when the robot moves linearly, it produces grid-like activity with vertex spacing λ_i (as shown in Figure 2c), from which the activity of the grid cells is obtained.

Figure 2. (a,b) Schematic diagram of the periodic oscillation obtained by interference of waveforms with different frequencies: y1 = sin(14πx + π/2); y2 = sin(16πx + 3π/2); y3 = y1 + y2. The θ-band beat interference produces a beat y4 with a fixed interval, and the amplitude of the beat oscillation is identified at each attractor. (c) Illustration of how the hexagon-like firing pattern was generated: trajectory of the robot's random movement (blue) and the grid-like firing pattern generated by this paper's method (red).
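As a numerical check of the interference described above, the sketch below (a Python illustration, not part of the paper) superimposes the two waveforms stated in the Figure 2 caption and confirms that the envelope of the sum repeats at the 1 Hz beat frequency, i.e., with wavelength 1/(f1 − f0):

```python
import numpy as np

# Waveforms taken from the Figure 2 caption: y1 at f0 = 7 Hz, y2 at f1 = 8 Hz.
f0, f1 = 7.0, 8.0
x = np.linspace(0.0, 4.0, 4001)              # 4 s sampled at 1 kHz
y1 = np.sin(14 * np.pi * x + np.pi / 2)      # sin(2*pi*f0*x + pi/2)
y2 = np.sin(16 * np.pi * x + 3 * np.pi / 2)  # sin(2*pi*f1*x + 3*pi/2)
y3 = y1 + y2                                 # interference pattern

# sin a + sin b = 2 sin((a+b)/2) cos((a-b)/2): the slow cosine factor is the
# envelope, oscillating at the beat frequency |f1 - f0| = 1 Hz.
envelope = 2 * np.abs(np.cos(np.pi * (f1 - f0) * x + np.pi / 2))
beat_wavelength = 1.0 / (f1 - f0)            # spatial period of the beat
print(beat_wavelength)  # 1.0
```

The fixed beat wavelength plays the role of the vertex spacing λ_i in the grid-cell space.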
The place cells receive the inputs of the grid-cell population with different activities. A place cell is activated when its summed input exceeds the threshold h.
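A minimal sketch of this thresholding step; the activity values and the threshold h below are illustrative, not taken from the paper:

```python
# Hypothetical grid-cell population activities at the current position; the
# values and the threshold h are illustrative, not taken from the paper.
grid_activity = [0.9, 0.7, 0.8, 0.2, 0.1]
h = 2.0                                   # place-cell activation threshold

# A place cell fires only when its summed grid-cell input exceeds h.
total_input = sum(grid_activity)          # about 2.7
place_cell_active = total_input > h
print(place_cell_active)  # True
```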

Navigation Strategy
There are two ways to guide the robot to the goal in an unknown environment: select the direction according to the reward signal or the Q-value of the action cells. Initially, the robot randomly explores the environment and will get a reward signal after reaching the target. If there are no reward signals within the search scope, it then selects a direction according to the Q-values; otherwise, the robot follows the reward signals.
If reward signals can be detected within the step range, the value of the reward signal increases according to Equation (12). With each step, all nodes update their reward signals, and the direction with the maximum sum of reward signals is chosen:

r(t) = r(t − 1) + Δr. (12)
If the reward signal is zero at the present position, the robot selects the direction according to the Q-values. In our system, there are eight action directions (at 45° intervals). Each action cell receives input from all activated place cells. According to a winner-takes-all mechanism, the direction with the maximum mean of Q-values is the optimal output direction.
If both the Q-value and the reward signal are zero at the present position, the robot will move randomly. In this case, the robot keeps the direction with a probability of 1 − ξ. With a probability of ξ, it randomly chooses a new direction with the aim of finding other, shorter routes.
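The three-tier strategy above can be sketched as follows. The function name, the reward/Q-value inputs, and the default ξ = 0.1 are illustrative assumptions; only the eight 45° directions, the winner-takes-all rule, and the keep-direction probability 1 − ξ come from the text:

```python
import random

DIRECTIONS = [i * 45 for i in range(8)]   # eight action directions, in degrees

def choose_direction(reward_sums, q_values, current_dir, xi=0.1, rng=random):
    """Sketch of the three-tier strategy: reward signals first, then Q-values,
    then random exploration with probability xi (names are illustrative)."""
    if max(reward_sums) > 0:              # follow the direction with the
        return DIRECTIONS[reward_sums.index(max(reward_sums))]  # largest reward sum
    if max(q_values) > 0:                 # winner-takes-all on mean Q-values
        return DIRECTIONS[q_values.index(max(q_values))]
    if rng.random() < xi:                 # forced exploration: new random direction
        return rng.choice(DIRECTIONS)
    return current_dir                    # otherwise keep heading (prob. 1 - xi)
```

With all rewards and Q-values at zero, the robot keeps its current heading with probability 1 − ξ, matching the forced-exploration mechanism described above.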

Network Dynamic Adjustment
The stability–plasticity dilemma, whereby the brain must learn quickly and stably without catastrophically forgetting its past knowledge [33], has a significant influence on stable robot navigation. In our system, nodes are sparsely added to the cognitive map during robot exploration. We combined the ART network [34] with the robot navigation system not only to address the stability and plasticity of the network's dynamic adjustment, but also to enable the robot to complete navigation tasks autonomously.
In this paper, ART (which works and learns simultaneously) was slightly modified to evaluate whether nodes should be added to the map. The fundamental structure of ART is shown in Figure 3. The algorithm flow is as follows:
(1) The network receives the input x_i.
(2) Compute the match degree: calculate the distance between x_i and the other points b_j in the map.
(3) Choose the best-matching neuron: obtain the winning neuron j* by competition.
(4) Inspect the vigilance threshold: if net_j* < ρ, compare the rewards of the two nodes, add the node with the bigger reward to the map, and then update the map connections; otherwise, if a random probability exceeds 0.25, add the winning neuron to the map.
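The four steps above can be sketched as follows. The inverse-distance match degree, the map's data structure, the value ρ = 0.8, and the interpretation that recruiting "the winning neuron" means adding the currently matched input are assumptions of this sketch, not the paper's exact formulation:

```python
import math
import random

def art_update(cog_map, x, reward, rho=0.8, rng=random):
    """One node-recruitment step of the modified ART process (a sketch).

    cog_map is a list of {'pos': (x, y), 'reward': float} nodes; the
    inverse-distance match degree and rho = 0.8 are assumptions.
    """
    candidate = {'pos': x, 'reward': reward}
    if not cog_map:                            # empty map: recruit directly
        cog_map.append(candidate)
        return
    # (2) match degree: similarity decays with distance to each stored node
    def similarity(node):
        return 1.0 / (1.0 + math.dist(node['pos'], x))
    # (3) winner-takes-all competition
    winner = max(cog_map, key=similarity)
    # (4) vigilance test
    if similarity(winner) < rho:
        # poor match: of the two nodes, keep only the one with more reward
        if reward > winner['reward']:
            cog_map.append(candidate)
    elif rng.random() > 0.25:
        # good match: recruit the matched input with probability 0.75
        cog_map.append(candidate)
```

Because a new node is recruited only when it matches poorly (a new region) or passes the probabilistic gate, nodes stay sparse while earlier knowledge is preserved.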

Figure 3. The fundamental structure of adaptive resonance theory.

Simulation Setup
Numerous researchers have used the Morris water maze experiment [35][36][37][38], in which a rat must find a hidden platform in its environment. This has become the standard paradigm for measuring navigation tasks.
A dry variant of the Morris water maze experiment was adopted to verify our system model. The navigation task was completed in a 100 × 100 virtual environment. The robot was placed in the lower-left corner to find the target, as shown in Figure 4. Initially, the robot randomly explored the surroundings, and the reward value was set to 1 when it found the goal. Table 1 summarizes the parameters used in the model.
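A minimal sketch of this setup; only the 100 × 100 arena, the lower-left start, and the unit reward come from the text, while the goal position here is hypothetical (the paper's Figure 4 fixes it):

```python
import numpy as np

GRID = 100                      # 100 x 100 virtual environment (from the text)
start = (0, 0)                  # robot placed in the lower-left corner
goal = (80, 80)                 # hypothetical goal; the paper's Figure 4 fixes it

reward_map = np.zeros((GRID, GRID))
reward_map[goal] = 1.0          # reward value set to 1 once the goal is found
print(int(reward_map.sum()))  # 1
```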

 



Simulation Result
The robot started random exploration with an empty cognitive map and continued to update it using the ART process. We report three types of simulation environments to determine the effectiveness of our system model: an ideal situation with no obstacles in the environment (Figure 5), simple obstacles set in the environment (Figure 6), and a changed start position in the simple-obstacle situation (Figure 7).
In the ideal situation with no obstacles, the robot explored until it had found the target 30 times. The early paths were relatively random due to the absence of reward signals, but with our navigation strategy the robot quickly found a shorter, approximately straight route to the goal after five trials. Moreover, a random-exploration probability of 0.1 was used to increase the chance of finding other, shorter paths; for example, on the 8th and 16th paths the robot tried new directions and avoided getting trapped in a local optimum. To further test the performance of the model, we added simple obstacles to the environment (Figure 6). In this situation, the robot must search for the goal while effectively avoiding obstacles. Compared with the obstacle-free situation, more exploration steps were needed in the early stage, but with the diffusion of rewards the robot could quickly find a better path to the goal. In this paper, we aim to outline general navigation principles in different environments, not to provide a specific obstacle-avoidance algorithm.
The main challenge of robot navigation in an unfamiliar environment is simultaneous localization and mapping (SLAM). Once the cognitive map is established, navigation becomes a path-planning problem. The A* algorithm, one of the most popular heuristic search algorithms, is widely used in path optimization. Using the cognitive map shown in Figure 6c, we easily obtained a relatively short path with the A* algorithm, both before (Figure 7a) and after (Figure 7b) changing the start point.
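Path planning over the learned cognitive map can be sketched with a textbook A* search over a node graph; the graph structure and the straight-line heuristic below are illustrative choices, not the paper's implementation:

```python
import heapq
import math

def a_star(nodes, edges, start, goal):
    """Textbook A* over a cognitive-map graph (illustrative data structures:
    nodes = {name: (x, y)}, edges = {name: [neighbor names]})."""
    def h(n):                         # admissible straight-line heuristic
        return math.dist(nodes[n], nodes[goal])

    open_heap = [(h(start), 0.0, start, [start])]
    best_g = {start: 0.0}             # cheapest known cost to each node
    while open_heap:
        _, g, cur, path = heapq.heappop(open_heap)
        if cur == goal:
            return path
        for nb in edges.get(cur, []):
            ng = g + math.dist(nodes[cur], nodes[nb])
            if ng < best_g.get(nb, float('inf')):
                best_g[nb] = ng
                heapq.heappush(open_heap, (ng + h(nb), ng, nb, path + [nb]))
    return None                       # goal unreachable from start

# Toy cognitive map: four nodes on a unit square.
nodes = {'A': (0, 0), 'B': (1, 0), 'C': (1, 1), 'D': (0, 1)}
edges = {'A': ['B', 'D'], 'B': ['C'], 'D': ['C']}
print(a_star(nodes, edges, 'A', 'C'))  # ['A', 'B', 'C']
```

Because the heuristic never overestimates the remaining distance, the first path popped at the goal is a shortest one, which is why A* suits planning on the established cognitive map.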

Contrastive Analysis
The number of steps needed to complete each trial is an important indicator of a navigation system. Kulvicius [39] added four pieces of olfactory information to aid navigation. For comparison, the number of steps required to reach the goal is presented in Figure 8. In our system, with no olfactory cues to guide navigation, the first few trials needed more exploratory steps, but the robot quickly found a relatively short path. This indicates that, given the same environmental cues, our system provides better performance. Unlike the model in [39], the proposed model is consistent with the biological mechanism and navigates effectively in the experiment with immobile obstacles. In terms of map building, nodes were sparsely added to the cognitive map during robot exploration. In [35], by contrast, there was no judgment when new nodes were recruited, which leads to highly random nodes and does not help further navigation. We added the ART network to address the stability and plasticity of the network's dynamic adjustment. Based on match-based learning, the ART process can effectively save resources, dynamically represent the environment, and autonomously complete navigation tasks.


Conclusions
Due to the flexibility and autonomy of biological navigation, many efforts have been made to apply biological mechanisms to the development of autonomous robot navigation systems. Although a variety of computational models have been proposed based on animals' goal-oriented behavior, they have rarely included the structure of multiple types of navigation cells. We propose a general structure for robot positioning, navigation, and memory mapping, and combine it with ART to form a new intelligent navigation system. The application of ART in robotics promotes the system's cognitive ability. The proposed model was simulated and verified under three different conditions. The results show that the model enables the robot to build an overall topological and metric map of an unknown environment. It also has the following additional capabilities and advantages:
(1) Under the guidance of population activities and reward signals, the robot can complete autonomous navigation tasks in an unknown environment and converge to an optimal path faster.
(2) By using the mechanism of forced exploration, new paths can be discovered, effectively avoiding local optima.
In conclusion, this paper makes three main contributions. First, we constructed a unified navigation mathematical model that combines the biological plasticity of brain-like learning processes with working memory; this provides a valuable feedback tool for neuroscience. Second, we applied ART to aid navigation, which reduces resource consumption and broadens the perspective for developing more powerful autonomous robotic navigation systems. Third, in probing the potential mechanisms of biological spatial navigation, we propose the hypothesis that there may be a brain region corresponding to the ART function, awaiting exploration by neuroscientists.