Autonomous Navigation Framework for Intelligent Robots Based on a Semantic Environment Modeling

: Humans have an innate ability of environment modeling, perception, and planning while simultaneously performing tasks. However, it is still a challenging problem in the study of robotic cognition. We address this issue by proposing a neuro-inspired cognitive navigation framework, which is composed of three major components: semantic modeling framework (SMF), semantic information processing (SIP) module


Introduction
Environment modeling, recognition, and planning are fundamental components for intelligent systems, such as mobile robots, that enable them to understand and perceive the complex environment in the same way a human does and perform the task reliably. When mobile robots are deployed in the real-world, numerous challenges arise: acquiring spatial models of the continuously changing environment that leads to the problem of robotic mapping, a radically changed scene appearance that is one of the significant factors towards the visual recognition failure [1], and the dynamic nature of the operating environment that is a critical challenge for robot planning [2]. Many approaches have been proposed to handle these issues [3]. However, among them, a growing world-wide trend is introducing semantic information into robotic systems using ontology [4], which endows the mobile robot with human-like mapping [5], recognition [6] and planning [7] capabilities. Inspired by these recent advances, our framework exploits the semantic knowledge by defining a Triplet Ontological Semantic Model (TOSM).
Recent developments in robotics have heightened the need for semantic information processing techniques to design object and place recognition systems based on the findings of cognitive science. Semantic recognition systems based on Convolutional Neural Network (CNN) models have made groundbreaking advances to tackle scene understanding tasks with an emphasis on object and place recognition [8]. The complex representation of visual information and their processing by biologically inspired design of CNN model has shown unprecedented advancements in visual recognition. The layered structure in the CNN model is more reliable to represent the lower area of the human visual cortex [9]. Motivated by these developments, we combine the strengths of a deep learning-based CNN model with an on-demand database for semantic object-based place recognition and robot localization.
When planning a task, an agent requires a description of the environment and of itself, containing the possible states, actions, and events. Classical planners, such as Planning Domain Definition Language (PDDL) planners [10], make use of a set of predicates and actions to generate a valid action sequence. Those planners have been vastly used by the research community and integrated into powerful frameworks [11]. However, they tend not to perform well on unpredictable and dynamic environments, usually described by Partially Observed Markov Decision Processes (POMDP). Aiming to solve POMDPs on a variety of domains, reinforcement learning approaches try to use a large amount of data to approximate a policy capable of obtaining an optimal action based solely on the current state. Those approaches have shown promising results, but they still underperform when dealing with long-term scenarios. Our framework tries to overcome those shortcomings by combining classical PDDL planners, sampling-based planners, and reinforcement learning algorithms into a hierarchical scheme.
Autonomous navigation frameworks based on knowledge base have become essential for complex applications of robotic systems in real-world environments. In this direction, the Ontology-Based Unified Robot Knowledge (OUR-K) framework [12] emphasized at robot intelligence to perform navigation, recognition, planning, and manipulation tasks using knowledge description (for contexts, features, objects, spaces, and actions) and knowledge association for establishing relationships between descriptions by supporting inference. The OUR-K framework considered small static indoor environments such as an office room; however, our framework concentrates on robot navigation in large dynamic environments using the TOSM-based architecture. The RoboEarth framework [13] designed a knowledge-based system to generate, share, and reuse data using a cloud database. In contrast, our framework uses the on-demand database to ensure real-time capabilities. Furthermore, it integrates semantic modeling, information processing, and autonomous navigation modules to perform a task. Our prior [14] work has persuaded us to introduce the navigation framework in this study.
Our framework makes major contributions to the semantic robotic research area as follows: • Builds a knowledge base comprising of the advanced ontological environment model.

•
Creates an on-demand database considering the robot's capabilities and missions.

•
Combines state-of-the-art deep learning-based CNN recognition model with an on-demand database. • Defines a multi-layer hierarchical planning scheme that integrates semantic knowledge, a classical planner, and reinforcement learning to enable multi-story building navigation.
The rest of our paper is divided into five sections as follows: Section 2, gives a brief overview of related work in other frameworks, environment modeling, recognition, and planning. Section 3, explains the overall framework and describes each component in detail. Section 4, illustrates experiments and discusses the results. Finally, conclusions and future directions follow in Section 5.

Related Work
In this section, we first discuss the related work for knowledge-based navigation frameworks (Section 2.1). After that, we review the literature that deals with the individual modules of our autonomous navigation framework distinguishing into environment modeling for robots (Section 2.2), recognition (Section 2.3), and planning (Section 2.4).

Knowledge-Based Navigation Frameworks
Recently, there is a growing interest in integrating the knowledge-based approaches into robotic systems, which use ontologies for knowledge representation or semantic web technologies for linking local robotic knowledge with web resources [15]. Therefore, the focus of robotic research community has been shifted to use these approaches for developing efficient robot navigation systems. Suh et al. [16] proposed Ontology-based Multi-layered Robot Knowledge Framework (OMRKF). It aimed at improving the robot's intelligence with its four levels of knowledge related to the model, context, perception, and activity. Its knowledge representation helped the robots to recognize objects with incomplete information and execute a task by only determining high-level service. It modeled the objects, actions, and concepts using Prolog [17]. Later, Lim et al. extended OMRKF and built OUR-K [12] knowledge framework for robot navigation, object recognition, action selection, manipulation, and planning tasks by applying ontology. It consisted of knowledge description for low-level data integration with high-level information and knowledge association in conjunction with bidirectional and unidirectional rules to enable the robot with reasoning capabilities when only partial information about an object was available.
With the rise of cloud-robotics, the web-scale approaches for knowledge processing are used by robotic frameworks. Few researchers suggest reusing and sharing abstract representations of high-level knowledge across different platforms. However, the problem is that most data depends on hardware-specific configurations. Besides, current database systems usually store only one type of data in isolation. These constraints limit the data reuse efficiently. Beetz et al. [13] addressed this problem in the RoboEarth, in which data collection, storage, and sharing were independent of specific robot hardware. It allowed the robots to access and share real-world environment and object models. In its web-based knowledge base, the robot's skills, surrounding environment, and objects were defined as generic action recipes. It generated the semantic maps for robot localization and navigation by integrating the geometrically precise visual SLAM methods with semantically enriched models of recognized objects that could be downloaded from the database. It relied on reference ontology and provided the semantic representation of actions along with the support for reasoning capabilities and rule-based learning [18]. Similarly, the KnowRob [19] knowledge processing system was a semantic framework for performing manipulation tasks that were previously vaguely described. It reduced the gap between detailed description and vague task description. It modeled the environment while integrated the common sense and encyclopedic knowledge with task and object description, along with robot capabilities. It also supported reasoning capabilities and used inference methods to query the knowledge base.

Environment Modeling for Robots
In this section, we present the literature review for environment modeling, focusing on two key areas: map representation and ontology-based knowledge models.

Map Representation
In the Simultaneous Localization and Mapping (SLAM) field, filter-based state estimation techniques and graph-based optimization methods are well studied for building environment maps. Thrun et al. [20] presented a follow-up research work for an offline SLAM problem by introducing GraphSLAM, which represented the log-likelihood of the data and generated the maps with more than 10 8 features in urban environments. Ref. [21], used selective resampling operations to reduce the number of particles in Rao-Blackwellized [22] particle filter for building accurate map of environment. Ref. [23], addressed the motion estimation and distortion problem in lidar cloud by proposing a real-time approach divided into two algorithms that run in parallel. The lidar odometry algorithm estimated transform between consecutive frames at a higher frequency, while the lidar mapping algorithm performed fine matching and registration of the point clouds at a lower frequency. Subsequently, Parallel Tracking and Mapping (PTAM) [24] was considered the most popular method to implement visual based SLAM because of its ability to handle large number of 3D points. Later, Mur-Artal et al. [25] extended the adaptability of PTAM to intractable environments by designing a new feature-based monocular ORB-SLAM system from scratch.
The semantic mapping problem has attracted the researchers' attention in recent years. Pronobis et al. [26] proposed a probabilistic system for extracting semantic information from different heterogeneous modalities and combined it with common-sense conceptual knowledge. It made the semantic maps more descriptive based on the concept of spatial properties that were represented by the chain graph. Kostavelis et al. [27] gave a qualitative description to enhance the representation of the environment and enabled the robot to perceive its surroundings similar to the human. McCormac et al. [28] used CNNs and dense SLAM system ElasticFusion for providing dense correspondence from the 2D frame into the 3D map. The semantic predictions from CNN were obtained from multiple viewpoints and then fused into a dense map. Yang et al. [29] proposed an incremental semantic 3D mapping system with a scrolling occupancy grid map while CNN was used for segmentation and computation of pixel label distribution. Nakajima et al. [30] presented an approach for incrementally building dense 3D maps that were semantically annotated. The class probabilities were assigned to each region of the 3D map and segmented using geometric-based techniques, while CNN enhanced frame-wise semantic segmentation. Xiao et al. [31] presented a Dynamic-SLAM by taking advantage of deep learning-based recognition technique, known as SSD, to recognize the dynamic objects at semantic level while recall rate was improved by proposing a missed detection compensation algorithm.

Ontology-Based Knowledge Model
Ontology-based knowledge systems are required to transform a simple robot into an intelligent one. For this, Riazuelo et al. [32] proposed a system by integrating the knowledge base with a visual SLAM to provide a precise and continuous perception of the environment and accurate recognition of objects. He designed the semantic mapping method based on scene geometry and object locations by combining both the visual SLAM and the RoboEarth ontology. The knowledge base methods used prior information of the landmarks' locations in the semantic map to determine the potential locations of objects for recognition and robot guidance when searching for a specific object. A robot can efficiently perform its task if it has semantic knowledge. The explicit semantic representation of a real-world environment enables it to decide that an object model is suitable to perform the assigned task; if not, then the robot can determine to select among other alternative object models using semantic information. Tenorth et al. [33] worked on this challenge and proposed semantic representation language to speed-up this task by describing actions, articulation models, object recognition, semantic maps of the environment, and reasoning methods about these pieces of information.
It is a challenging task to establish a relationship between representations and their entities, known as grounding. Johnston et al. [34] extended the robotic grounding systems using semantic web technologies. Its architecture was composed of an ontology-based vision subsystem that was mainly focused on object recognition, reasoning, categorization, communication, and collaboration. Autonomous household robots rely on accurate mapping and navigation that need spatial information. Tenorth et al. [35] proposed a system that combined spatial information, encyclopedic knowledge, and general knowledge with activity observations knowledge for environmental map building. Crespo et al. [36] proposed a semantic relational model based on a physical and conceptual representation of real-world objects, places, and semantic relationships among them to endow the robot with the capabilities of making queries about its surroundings for planning and navigation tasks.

Recognition
In this section, we focus on reviewing the object and place recognition approaches, along with localization techniques described in the literature.

Object Recognition
Visual object recognition is a significant problem in mobile robots. Advanced research efforts in object recognition [37,38] are focused on understanding how a cognitive representation, from the filtered output of sensor data, can be achieved to empower the robot with human-like perception [39]. In [40], human-like object recognition was developed for information processing using three hypotheses based on cognitive psychology. Deep neural networks (DNNs) inspired by neurons in the brain have recently come out as powerful algorithms to investigate the biological vision. The neural basis of DNNs has motivated the researchers to use them as models for information processing for recognition tasks [41]. CNN-based object recognition models are categorized into two-stage and one-stage detectors, in which You Only Look Once (YOLO) [42] is considered extremely fast and performs the detection task as a regression problem. Ref. [43], presented a robust method for object recognition based on a semantic model graph in which structural and geometrical attributes along with semantics as intrinsic properties of objects were used to establish a link between high-level semantics and low-level image features. Zhang et al. [44] proposed SPDA-CNN architecture consisting of two sub-networks and introduced mid-level layers for extracting part-specific features to improve the recognition performance. Detecting and recognizing partially occluded parts is a challenging task for modern object detectors because they cannot deal with occlusions. Wang et al. [45] addressed this issue by accumulating the confidence of local visual cues, also known as visual concepts, which were extracted from the internal states of deep networks using clustering.
Many researchers have considered the ontology-based knowledge-driven approaches for the semantic recognition task. In an early study, Choi et al. [46] proposed an approach of ontology inference for constructing the semantic contexts extracted from robot sensors to perform object recognition tasks in real-world environment. Web Ontology Language (OWL) was used for the representation of object ontology in which semantic contexts were generated using axiomatic rules. Another study [47] represented a cognitive vision approach for recognizing the categories of complex objects. This approach was proposed for the semantic interpretation of individual objects. The visual concept of ontology was composed of spatial, color, and texture data. These concepts were used as an intermediate layer, while the link between symbols and sensory information was established using machine learning techniques. Allani et al. [48] introduced a semantic information retrieval approach to build a modular ontology and graph-based model for visual features. A seminal study in this area was the Adaptive Bayesian Recognition framework introduced by Lee et al. [49], which endowed the robot with semantic understanding of 3D objects in home environment. A novel FER-CNN model was incorporated with this framework to extract and reconstruct 3D features by semantically linking to the ontology.

Place Recognition
Visual place recognition (VPR) plays a vital role in achieving long term autonomy for mobile robotic applications [1]. Several studies have shown that features extracted by CNN models outperform traditional vision-based approaches such as SIFT. This success of deep learning-based CNN models has motivated the researchers to use CNN based approaches for the VPR task. Generally, VPR methods are divided into image acquisition, feature extraction, and image search [50].
In a study, Schönberger et al. [51] proposed an approach based on 3D geometric and semantic understanding of real-world environment by training the generative model on the semantic scene to improve the descriptor learning for missing observations. Garg et al. [52] developed a set of semantics and appearance-based methods for efficient VPR in a challenging scenario by extracting convolutional features from a dense semantic segmentation network using the proposed Local Semantic Tensor (LoST) descriptor of images. Chen et al. [53] generated condition and viewpoint-invariant features by training two CNN models to recognize specific places with a multi-scale feature encoding method. Ref. [54], used CNN to perform the task of recognizing revisited locations under different environmental variations and mapping visual information to low dimensional space with Euclidean distance that corresponds to place dissimilarity.

Localization
The robotic research community is making steady progress in determining the autonomous robot localization concerning the environment using recognition techniques. Mitsuhashi et al. [55] proposed a system for mobile robot localization with an appearance-based place recognition approach in conjunction with the dead reckoning and gyrodometry model. Ref. [56], presented a vision-based localization approach in the outdoor environment against a known map by combining a place-dependent feature detector with prior experience of place. Instead of representing edges and corners, these features represented mid-level patches of different elements like windows. Ref. [57], presented the Visual Landmark Sequence-based Indoor Localization (VLSIL) approach and addressed environmental variations using semantic information of steady indoor objects (landmarks) while performed the localization using objects' occurrence order in the video. Zhong et al. [58] proposed a mobile robot's self-localization method using recognition of artificial visual landmarks which were identified using a bilayer recognition algorithm. The geometric relation was established between image and environmental frames to get the orientation and position of the camera for robot localization.

Planning
Planning is one of the main study areas in the robotics field, especially after robots started performing central roles in modern industry and our daily life [59,60]. Moreover, the Defense Advanced Research Projects Agency (DARPA) challenge [61] provided real-world conditions and pushed the boundaries of the Self-Driving Vehicle (SDV) researches. It also showed that sophisticated planning solutions were paramount for SDVs being able to handle a broad span of driving scenarios.
Planning is referred to as a sequence of actions necessary to accomplish a goal, given an initial state. To complete such a goal, the agent requires a description of the environment, its actions, and its capabilities. A planner can perform different roles depending on how general they are. There are many ways of dividing those roles, such as Global/Local planner or Mission/Behavior/Motion planner. The Global planner, generally performed offline, generates a high-level low-resolution sequence of actions, based on the a priori known environment, to accomplish the final goal. The local planner, on the other hand, focuses on a low-level high-resolution sequence of actions, based on the online sensor data, that focus on a portion of the global path [62]. In this paper, however, the Mission/Behavior/Motion division is explored [59].

Mission Planning
On robots, the mission planner generally performs as a "task planner," while on SDV, it focuses on map-level navigation or route planning [3,63].
Task planners usually follow one of two different approaches: classical planning or decision-theoretic planning. One of the first works to integrate classical planners into robotics was the Stanford Research Institute Problem Solver (STRIPS) [64], first implemented on a real robot by Nilson [65]. The planning community standardized STRIPS-based planning by introducing the PDDL, which was later expanded to support temporal planning as well as [10]. Cashmore et al. introduced the ROSPlan framework [11] integrating PDDL 2.1 with the Robot Operating System (ROS), which is vastly used by the robotics research community. By using a Knowledge Base, ROSPlan is able to build the initial domain state automatically, validated the generated plan, and decide when to re-plan. Estivill-Castro et al. [66] showed that PDDL-planners could also work on dynamic environments when other agents follow a deterministic behavior. Researchers have been attempting to integrate Hierarchical Task Network (HTN) with task planning using different techniques. Dvorak et al. [67] proposed the first Flexible ANML Actor and Planner (FAPE) that combined most of the Action Notation Modeling Language (ANML) [68] with HTN task decomposition. It also integrated acting that followed Procedural Reasoning System (PRS) refinements, plan repair and re-planning. However, it used non-recursive decomposition hierarchy which decreased the expressiveness of HTN planning.
Nonetheless, classical planners generally do not perform well in a dynamic and unpredictable environment due to its linear nature, making it unable to reason about multiple outcomes. Decision-theoretic planning, on the other hand, tries to maximize the probability of success (also known as a maximum expected utility, MEU) for every possible action outcomes from any state the robot reaches during the plan execution. It can solve Markov Decision Processes (MDP) and even POMDPs, but its high computational cost makes decision-theoretic approaches prohibitive. Most of the research done in this field was trying to overcome this issue. Boutilier et al. [69] tried to solve an approximated optimal policy by identifying which propositions are relevant and using this information to reduce the state-space. On a similar fashion, Dean et al. [70] used a known initial state to model the world nondeterminism and restrict the planner to focus only on states that might be encountered during the plan execution.
Some newer approaches tried to combine ontology and semantic knowledge into planning. Hence, ontological knowledge eases semantic, temporal, and spatial representations of the environment [4]. Galindo et al. [71] used semantic maps to extend the planner reasoning capabilities, aiding high-level tasks understanding. KnowRob [72] is a knowledge processing framework that combines episodic memory, mental simulation, and a generalized ontology to help a robot to reason, plan, and learn. This work, however, focused only on manipulation tasks [73].

Behavior Planning
The behavior planner receives the next goal/task information from the mission planner combined with the local sensorial data and uses such information to check feasibility, adjust the local level plan based on environmental constraints, and re-plan when necessary. Earlier works used an Finite State Machines (FSM) to make behavioral decisions [74]. Koo et al. [75] used a Strategy/Behavior/Motion hybrid planning approach to control an Unmanned Aerial Vehicle (UAV) in which an FSM was used to define the next action based on a given strategy and the perceived state. Shi et al. [76] combined an FSM with fuzzy logic transition rules obtained from human experience and heuristics. FSMs are, however, limited by the manually designed states, which may lead to livelocks or deadlocks whenever the agent encounters a situation that was not explicitly encoded into the FSM structure [3]. More sophisticated planning architectures tried to overcome those issues. Wei et al. [77] used a Prediction and Cost-Based algorithm to generate action candidates, predict the environment state, and evaluate the best action based on several cost values.

Motion Planning
Motion planners receive a goal point coupled with sensor data and try to find a valid motion sequence to reach the goal without violating any constraint, such as not hitting any obstacle along the way. Usually, motion planners are compared based on their computational efficiency (the process run time and its scalability with the environment growth) and completeness (its ability to find the optimal solution in a finite time) [3]. The most notable motion planning methods are the Cell Decomposition Methods (CDMs), the Potential Field Methods (PFMs), and the Sampling-Based Methods (SBMs).
In cell decomposition methods, the environment's free space is divided into small cells, and the output is a sequence of adjacent cells that leads the initial position to the goal while avoiding obstacles. Arney [78] used approximate cell decomposition, which does not set a fixed cell shape and size, to navigate efficiently. Lingelbach [79] combined CDM with probabilistic sampling to improve its scalability. Cell decomposition approaches, however, suffer from limited granularity, combinatorial explosion, and often generating infeasible solutions [80]. In potential field methods, an attractive potential is attached to goals while a repulsive one is attached to obstacles, and the robot follows the gradient direction generated by the combination of those fields. Both Pradhan et al. [81] and Kang et al. [82] showed that such approach is suitable for multi-robot navigation. Potential fields, however, suffer from often converging to local minima, trapping the robot midway [80].
Sampling-based algorithms have attracted considerable attention due to their ability to scale well while also being probabilistic complete, namely if the number of samples converges to infinity, the probability of finding the optimal solution converges to 1. The most notable SBMs are Probabilistic Roadmaps (PRM) and Rapidly-exploring Random Trees (RRT) [83]. Despite both being based on connecting points sampled randomly, the way both algorithms connect those points largely differ. PRM [84] maintains multiple graph expansions simultaneously, and have been shown to perform well on high dimensional spaces, while RRT [85] rapidly explores a single graph, making it more suitable for local planning on small sized maps. Karaman et al. [86] showed that under mild conditions, original PRM and RRT algorithms, most of the time, returned non-optimal values. The author then introduced asymptotically optimal versions of both methods, namely PRM* and RRT*. This advantage, however, is overshadowed by their relatively slow convergence, which hinders their usage in real-time scenarios. They also provided no theoretical guarantees for obtaining an optimal solution [87].

Deep Reinforcement Learning
Reinforcement Learning (RL) algorithms try to map a set of sensory inputs to actions by formulating this task as a POMDP. Deep RL uses Deep Neural Networks (DNN) to approximate such mapping. Mnih et al. [88] created a Deep Q-Network (DQN), namely a DNN, to estimate Q-values for a value-based RL approach. Shah et al. [89] combined a DQN with several other deep structures to map a natural language command, a segmented semantic image, and a depth image to a motion command, aiming end-to-end map-less navigation. DQN, however, is only suitable for discrete action spaces, thus making it less practical for navigation tasks. Lillicrap et al. [90] proposed an actor-critic RL approach called Deep Deterministic Policy Gradients (DDPG), which used separated neural networks for action generation and Q-value approximation. This hierarchical structure was able to output continuous actions. Tai et al. [91] expanded this usability by creating the asynchronous DDPG (ADDPG), which used several threads to collect experience and a centralized gradient learner. This method was used to perform mapless navigation, and it was shown that it has a faster convergence than the original DDPG.
Our work utilizes a similar approach to Faust et al. [92], who combined a PRM algorithm with an RL policy to address the shortcomings of each method.

TOSM-Based Autonomous Navigation Framework
This section introduces a novel autonomous navigation framework that endows robots with the capacity to perform complex missions using high-level and human-like knowledge. The framework illustrated in Figure 1 is composed of semantic modeling framework (SMF, Section 3.1), semantic information processing (SIP, Section 3.2) module, and semantic autonomous navigation (SAN, Section 3.3) module.

Semantic Modeling Framework
For robots to perform complicated missions, they must have the ability to comprehend the essence of each mission and their working environment. However, recognition methods based on traditional modeling techniques that utilize low-level geometric and semantic information are not suitable for performing complex missions. In other words, robots need a novel high-level environmental model that combines advanced knowledge information to apprehend environments. In this section, we propose the SMF based on the TOSM that imitates the human ability to express and understand the environment.

TOSM-Based Environment Modeling
To begin with, we divide the elements that make up the environment into an object, place, and robot. Each environmental element is represented by the explicit, implicit, and symbolic models based on the TOSM, as shown in Figure 2, to include high-level environmental information.
The explicit model refers to information that can be obtained directly by processing sensor modalities. It encompasses metric information (pose, velocity, size), geometrical features (shape, boundary), and image information (image features, colors) that can be measured using logical sensors based on sensor models. In contrast, the implicit model consists of induced information based on knowledge models and databases. The relations between environmental elements (e.g., The blue chair is inside of the meeting room.) and facts (e.g., a car is movable.) such as human semantic memory defined in the field of cognitive science [93] can be considered implicit model components. Finally, the symbolic model symbolizes environmental elements so that robots and humans can interact efficiently. Each model of TOSM is described based on classes, object properties, and datatype properties, which are terminologies defined in the OWL reference [94]. In the following tables, we explain the environment modeling based on TOSM using the OWL terminologies.
We use classes to group resources with similar characteristics for providing an abstraction mechanism. Table 1 lists the example of hierarchical classes for environmental elements. Object, Place, and Robot are defined as subclasses of the EnvironmentalElement, and each class is defined as superclass of the classes expressed in its right column (e.g., Door is a superclass of AutomaticDoor). Subclasses inherit the characteristics of superclasses, and the relation between two classes can be defined as "is-a." For example, AutomaticDoor inherits all the characteristics of Door, Object, and EnvironmentalElement; the relationship between AutomaticDoor and Door can be described as "An AutomaticDoor is a Door." The classes are used to determine domains and ranges of object properties and datatype properties. Therefore, subclasses can define more specific properties rather than superclasses, such as AutomaticDoor describes Door in more detail. In the case of subclasses for Robot, we use the robot description ontology model as defined in [95]. Object properties are used to represent relations between environmental elements corresponding to the implicit model; in the OWL, object properties provide relations between individuals that are members of the classes. The object properties can be defined using object property hierarchy, descriptions, and characteristics. The object property hierarchy in Table 2 expresses the nodes corresponding to the object properties in a tree-like structure; the right child node is a subproperty of the left parent node. We use SubPropertyOf, domains, and ranges among the various descriptions of object properties. SubPropertyOf is used to represent the subproperty relations between nodes in the object property hierarchy (e.g., isFrontOf SubPropertyOf spatialRelation). The object properties can set domains and ranges of each property using the classes. The individual using the property is inferred as an instance of the domain and range class. For example, if "CIR752 isLookingAt Table34" and Robot and Object are domain and range of isLookingAt respectively, CIR752 is inferred as Robot and Table34 as Object. We use symmetric and transitive characteristics of the object properties. "The property is symmetric" means that it is reasonable to exchange the domain and range of the property. For example, "B isNextTo A" can be inferred from "A isNextTo B", because isNextTo is symmetric. In case of isInsideOf, the property is transitive because we can realize "A isInsideOf C" using "A isInsideOf B" and "B isInsideOf C". We divide relations between objects, places, and robots into spatialRelation and robotRelation. SpatialRelation means the spatial relations between objects and places that constitute an environment. In particular, the properties, where Object specifies both domains and ranges, express relative position relations between objects within places. The properties can be used by robots to infer the relations between objects. Robots can also use a database which consists of isInsideOf and isConnectedTo to understand the environment similarly the way a human does. We explain the spatialRelation in detail with an example in Section 3.1.2. robotRelation is used to define the properties that are determined by the robot's current state. When the robot is in operation, it uses isLookingAt and isLocatedAt properties to comprehend the current state in real-time by associating it with objects and space stored in the database. We explain the robotRelation exactly in Section 3. 2 We use datatype properties to define the TOSM of environmental elements; in the OWL, data properties are relations between instances of classes and RDF literals or XML Schema datatypes. The datatype property hierarchy, similar to object property hierarchy, represents the subproperty relations between datatype properties. Additionally, classes determine domains of the datatype properties, and datatypes determine ranges. Table 3 lists definitions of the symbolic, explicit, and implicit model for Object. The symbolic model consists of name and ID to symbolize individuals of Object. For the sake of clarity, we describe ranges of pose, velocity, size, and color in the explicit model as above. The ranges of these properties are defined as an array of the datatype (e.g., float array) in the database. isKeyObject in the implicit model is utilized to represent crucial objects used for localization and place recognition. Furthermore, the properties, whose domains are the subclasses of Object, are used to define a more specific model of each subclass. In the case of Door, isOpen means whether it is open or not, and canBeOpen means whether it can be opened if it is closed. Robots can use this information when planning and performing tasks. The definitions of the TOSM for Place and Robot using datatype properties are shown in Tables 4 and 5 respectively. The definitions of the symbolic models are the same with Object. We select boundary composed of polygons for the explicit model of Place. In the implicit model, complexity informs the state of Place. capability and purpose in the implicit model of Robot are the main components when making an on-demand database.  We generate an on-demand database based on the environment database considering the robot's hardware-dependent functionalities when it is assigned a mission. For example, assuming a robot that cannot drive on stairs is given a mission from room1 to sidewalk1, a database for building2 connected to sidewalk1 and staircase1 is not necessary for the robot; we can create an on-demand database by sorting out the database that the robot needs. Based on the on-demand database, the robot can realize that it is in room1 inside f loor2 inside building1 using isInsideOf properties connected to room1. Besides, the paths to sidewalk1 from room1 can be planned, utilizing the individuals of Place connected to room1 by isConnectedTo.

Semantic Information Processing
Our Semantic Information Processing (SIP) module endows the robot with cognitive vision capability, inspired by the human ability, that tends to recognize the places by the objects present there, such as "computer lab" or "printer room," which are common terms that are used by human to recognize the places based on their objects. Similarly, humans identify outdoor places by distinguishing different features or objects, such as billboards.
Motivated by these ideas, our SIP module also uses an objects-based recognition approach to form a symbolic abstraction of the places. In the SIP module, first, the objects are recognized by a CNN model, then the inference is performed to leverage the recognized objects with their semantic information stored in the on-demand database. It also infers the robot's relationship information based on the spatial information of recognized objects in the environment. Our SIP module aims at performing recognition and localization tasks with semantically meaningful information to effectively guide the robot's behaviors to reach a goal location. To achieve this, we combine our recognition model with the on-demand database, as shown in Figure 4, and perform the inference on recognized objects against their semantic information stored in the on-demand database. We have selected the third version of YOLO [42] for object recognition because it presents a good trade-off between execution time and accuracy, compared with other object detectors. For example, its accuracy is almost similar to RetinaNet [96]; however, it is four times faster. Moreover, it is much better than SSD variants and Darknet-53 as its backbone network architecture makes it 1.5 times faster than ResNet-101 [42].
First, we train state-of-the-art one-stage CNN based model, YOLOv3 [42] for object recognition with our own data set. When a mobile robot navigates in the environment, visual data captured by a camera is passed to the CNN model; it produces a sequence of layered activations, which generate feature vectors that have a deeply learned representation of high-level features. We perform pre-processing step to remove the noise before passing the visual data captured by the robot's camera to the CNN model. Both the raw-data and the CNN model are hosted by short-term memory (STM). Raw-data does not have access to the on-demand database, while the CNN model, which holds the recognition components, has access to the on-demand database. Object and Place models are initially stored in this database. We download these semantically enriched models of recognized objects from the on-demand database when required by the robot.
Our YOLO-based recognition model consists of two phases: the processing phase and the retrieving phase.
The processing phase is also known as an offline or training stage in which we train the YOLOv3 model for object recognition using our dataset. The training phase involves three major steps; (i) Dataset annotation: we label objects in each image using the YOLO-Mark tool. (ii) Convolution weights: we use YOLOv3 weights trained on ImageNet for convolutional layers. (iii) Configuration set-up: we prepare a model configuration file that loads 64 images in the forward pass for each iteration during the training process. The model updates the weights when backpropagating after gradients' calculation. Thus, for each iteration, our model uses 64 batch size with 16 mini-batches, and four images are loaded for each mini-batch. In our configuration file, we define filter size to 60, the maximum number of iterations 30,000 for 15 classes. The YOLOv3 has three YOLO layers; therefore, we generate nine anchors to recognize small, medium, and large size object at its three YOLO layers. Finally, when the training phase is completed, we get new weights for our dataset.
The retrieval phase is known as an online or testing stage. In the testing phase, weights that were generated during model training, are used to recognize objects in a real-world environment.
In our SIP module, after object recognition using the CNN model, inference for semantic information is performed at run-time by integrating the recognized objects with their semantic information that contains spatial relationships between objects, described in Table 2. Through these spatial relationships among recognized objects, we infer the relationship information of the robot by leveraging with the semantic objects and space stored in the on-demand database as follows: Assume x is an object recognized in the environment by the CNN model, while X is the subclass of EnvironmentalElement (e.g., Furniture, Door, and Corridor). Then the class of object x is defined by a class definition function IsA(x, X), which indicates that object x is class X. As shown in Figure 4, there are two objects in the environment: table and chair. The class definition function IsA (table, Table) and IsA(chair, Chair) states that in both cases, the class of the table is Table and the chair is Chair. When the class is defined, we can get the datatype properties of the object from the database based on the robot's pose and relative pose of the object. In other words, Chair is specified chair2 (we call this X n ) that is an instance of Chair by the database query such as illustrated in the gray table inside Figure 4.
We also use a relationship inference function SR = rel(X n , X m ) to define relationships between recognized objects, other objects, and places in the database. According to the class definition function, X n indicates an instance of class X. SR describes a spatial relationship between X n and X m , such that "X n isNextTo/isInsideO f /isInFrontO f X m " (e.g., "table1 isNextTo chair2".) Similarly, we can infer the robot's relationship in the environment; in that case, the X n is an instance of Robot.
So, when the robot navigates in a real-world environment, and it finds two objects, table and chair, shown in Figure 4, the robot recognizes these two objects using the CNN model and infers these objects with their semantic information that is saved in the database. As a result, we get spatial relationships such as "table1 isNextTo chair2", "table1 isInsideO f corridor2", "robot isLookingAt chair2", and "robot isLocatedAt corridor2".
We can also see that the recognition model and the TOSM-based on-demand database share human-like memory architecture when working together for visual semantic recognition. The SIP module is processed in working memory (Short Term Memory, STM), while prior semantic knowledge stored in the on-demand database is hosted by Long-Term Memory (LTM), as shown in Figure 5. When the mobile robot navigates in a real-world environment, it sends the sensory data to SIP, which passes the pre-processed data to the recognition model that is stored in working memory and gets the periodic visual updates as robot navigates.

Semantic Autonomous Navigation
The Semantic Autonomous Navigation (SAN) module allows the robot to perform complex missions through a hybrid hierarchical planning scheme. The knowledge base, combined with a behavior database, enables autonomous navigation in a variety of environments while taking the robot capabilities into account. Throughout this section, we use a single mission example for a clearer understanding. Consider a multi-story building represented by the database displayed in Figure 3, where elevators and staircases connect each floor; a robot located on the 2nd floor inside the building; a mission given by a human that says it should retrieve a package at the sidewalk outside of the main entrance located on the first floor. The following sections demonstrate, through this mission, how the SAN module can enable robots to perform high-level multi-story navigation.

Task Planner
After receiving a mission, such as robotAt place1 (navigation) and inspected place2 (inspection), the SMF creates the robot's on-demand database based on the mission and its capabilities. For example, different robots have different capabilities: some can go up and downstairs, while others can operate an elevator autonomously, as shown in Figure 6. Considering a robot that can operate the elevator, the on-demand database does not need to contain the staircases' information.
The task (i.e., mission) planner access only to the on-demand database speeding up the queries during the plan execution. The environmental implicit information, namely isConnectedTo property together with the robot's location, is used to populate the ROSPLAN [11] knowledge base. The robot's implicit information is also used to provide the domain PDDL file. Thereupon, ROSPLAN is used to generate a valid behavior sequence. To generate a behavior sequence, PDDL planners require two files: domain and problem. The domain file describes the environment predicates and robot actions. As different robots can perform a distinct set of actions, a comprehensive set of domain actions, all sharing common predicates between them, is stored into the behavior database. Given the robot's implicit TOSM data, a matching domain file is generated and fed to ROSPLAN by the behavior database; the problem file is generated automatically by ROSPLAN knowledge base. It needs, however, the environment state information to do so. ROSPLAN state update services are used to feed the map data into the knowledge base. After that, the behavior sequence is generated through the Partial Order Planning Forwards (POPF) planner [97]. Despite the current state of the behavior database not including temporal-actions, we have plans to also support them in the future. Therefore, we opted to use the POPF planner due to its ability to deal with both simple and durative actions. ROSPLAN also sends the current behavior to the Behavior Planner through the medium of the Plan Dispatcher node.  In our example, the 02 robot shown in Figure 6b receives the multi-floor navigation mission. The robot then receives the on-demand database generated by the SMF based on the robot's implicit information (i.e., its capabilities). This on-demand database contains the building floors and the elevators, but no stairs, as the robot cannot transverse through them. Moreover, a domain file containing all the possible robot actions is sent to the ROSPLAN knowledge base. An example action, which sends a robot to a waypoint on the same floor, is shown on Listing 1. The predicates are general and shared between all actions stored on the database, thus making possible generating the domain file simply by querying which actions can be performed by the robot. Subsequently, the environment TOSM data is used to populate the ROSPLAN knowledge base environment state, adding connected waypoints, elevator, robot, and goal predicates. After running the POPF planner, the behavior sequence shown on Listing 2 is generated and sent by the dispatcher node to the Behavior Planner module.

Behavior Planner
The behavior planner can be divided into two modules: the waypoint generator and the behavior handler. After receiving the next goal from the task planner, one of those two modules takes care of breaking down this behavior into atomic actions. The waypoint generator handles motion behaviors. While complex specific activities, such as entering and exiting an elevator or going up or downstairs, are processed by the behavior handler.
The waypoint generator receives a goal point from the task planner, the current position from the localization module, and a local metric map obtained from the on-demand database and stored into the STM. It then uses the PRM algorithm proposed by [84], after being adapted for mobile robot navigation, to generate a feasible sequence of waypoints. We start by generating random samples N i on the metric map free space C f ree . The PRM algorithm's goal is to generate a valid path between the current position N p and the goal N g by creating edges connecting two sample nodes E k (N i , N j ). Initially, it generates n random samples on the 2D metric map, eliminating any sample outside of C f ree and re-generating new random samples until n valid samples are successfully generated, whereas in our approach, PRM works as a low-resolution path-planner, n can be of a smaller magnitude when compared to the overall map size, speeding up the query step. Thereon, it iterates through every sample generating valid paths connecting to the k-nearest neighbors. A path is considered valid if the two points can be connected by a straight line which is completely inside C f ree and is smaller than a maximum connection distance threshold. This graph generation depends only on the metric map and can be pre-generated by the robot when idle or while executing actions that do not require a large amount of processing power. Finally, to generate a waypoint sequence, N p and N g are added to the graph and the Dijkstra's algorithm [98] is used to query the shortest path.
The behavior handler receives a specific desired behavior from the task planner and then queries the behavior database for instructions on how to execute such activity. The behavior database (BDB) can be seen as the human implicit memory [93], which saves cognitive skills learned throughout one's life. The BDB stores finite state machines (FSM) able to coordinate single behaviors, such as the one shown in Figure 7. Each state (i.e., action) has a standardized ID and a possible events list that is used to check the transition to the next state. After obtaining the FSM description, the behavior handler sends the initial action ID to the action planner and waits for the returned events. Thereon, it uses the event list to choose the next state and sends it back to the action planner. This sequence is repeated until one of the two final states is reach: success or failure.
Both the waypoint generator and the behavior handler emit a success signal to the task planner when the behavior is completed, allowing the planner dispatcher to send the next one. In case of failure, a homonymous signal is sent back instead, and the PDDL planner is able to perform re-planning starting from the current state.

Action Planner
The action planner can receive two different kinds of actions: a waypoint from the waypoint generator or a specific action ID from the behavior handler.
When receiving an action ID, the action planner looks for the implementation of such action inside the robot's LTM. Each robot should have its implementation of the needed actions, as their execution is dependent on the robot sensors and actuators. In that case, the planner is unable to find the requested action, it returns a failure event to the behavior planner. On the other hand, if the action implementation is found, the robot executes it and returns the events related to the conclusion of such action.
If the requested action is related to following a waypoint instead, a different structure is used to handle it. A mentally simulated world was generated and used to train an autonomous navigation policy using reinforcement learning algorithms. This process is further explained on Sections 3.3.4 and 3.3.5. The action planner uses the policy returned by the mental simulation in order to navigate towards a waypoint while also avoiding close obstacles. After getting close enough to the goal point, the action handler returns the success event to the behavior planner and waits for the next instruction.

Mental Simulation
Mental simulation is one of the integral cognitive skills, which has been long studied by cognitive science and neuropsychology research [99]. Nonetheless, just a few works tried to implement such concepts into robotic platforms [72]. Some usages of the mental simulation are re-enacting memories, reasoning about different possible outcomes, and learning by imagination. In this work, the mental simulation module is used to perform reinforcement learning aiming for autonomous navigation. The simulation building algorithm was originally proposed on [100].
One of the main issues of performing mental simulation on robots is the need of domain experts to model the simulated world based on the robot working environment. Modeling the environment is not feasible when robots navigate through large-scale dynamic environments. To overcome such limitation, we built a middle-ware capable of generating a simulated world, including the robot itself, using only the information stored on the on-demand database. By doing so, whenever the robot updates its internal knowledge about the environment, the simulation is also updated in the same manner. Whenever the simulated environment is needed, the middle-ware queries the needed data from the on-demand database and generates Universal Robot Description Format (URDF) and Simulation Description Format (SDF) files, which are used to create the virtual world on the Gazebo Simulator. Figure 8 shows an example output of the simulation building algorithm. It uses the data stored on the on-demand database to generate the simulation environment. To ensure this simulation is robust while also maintaining its consistency with the real world, we use a library containing 3D meshes for common objects such as doors, chairs, and tables. If the robot encounters a real-world object in which there is no 3D model available, a placeholder is automatically generated using the perceived shape and color of the object. Such point can be seen on Figure 8, where the simulated door uses a 3D mesh, while the vending-machine and the beverage storage are represented by automatically generated shapes.

Reinforcement Learning
The main usage of the mental simulation is to generate experience tuples (S, A, R, S , D), where S is the current state, A is the chosen action, R is the reward and S the next state after performing A, and D represents whether or not the episode has ended. Those simulated episodic memories are generated whenever the robot is idle, such as when charging. Those memories are then sampled in batches to train a policy using the DDPG algorithm [90]. The RL structure is shown on Figure 9. Three consecutive sparse laser scans and relative distances to the goal are concatenated as the algorithm input. This input goes through an actor-network which outputs two different values. A sigmoid function is used to encode the robot forward speed between 0 and 1, while a tanh function encodes the angular speed between -1 and 1. Those outputs are concatenated with the input and served to the critic-network, which outputs the Q-value (i.e., the expected future reward) for the action chosen on this given state. Finally, the environment rewards are defined as where r completion , r time , r closer and r collision are positive integers defined trivially.
As the training step only needs to sample the experience tuples, it can be performed either on-board or by using a central cloud service, provided that such structure is accessible by the robot. The Deep Deterministic Policy Gradients (DDPG) structure used to approximate an autonomous navigation policy.

Experiments
In this section, we present our experiments using the TOSM-based autonomous navigation framework. The objective of the experiments is to demonstrate the framework's validity and introduce a possible use case.

Experimental Environments
We conducted experiments in the convention center environment expressed in Figure 10 to apply the proposed framework. The specified working environment was a size of 5000 m 2 with dynamic pedestrians. In Figure 10a, a red circle represents the robot's home position, and a translucent red polygon represents the area where the robot performed the mission. Additionally, Figure 10b  The environmental database, illustrated in Figure 11, for the first floor in the convention center consisted of 23 instances from three subclasses of Object and 16 instances from nine subclasses of Place. Although there were many objects on the first floor, we selected only static objects that can be worked as dominant features for recognizing environments; We defined this property as isKeyObject in the implicit model for objects. In the case of objects which were not key objects (e.g., pedestrian), we considered the objects only in the short term memory for the efficiency of memory consuming. By doing so, the database consumed memory less than 0.1 MB in size, although it covered the large environments.  Figure 12 illustrates how the robot performed its mission. First, the robot, located at the corridor1, received a mission from an administrator to inspect kdjHall1 on the first floor of the convention center and come back base1. Then the SMF, shown in a blue box, generated an on-demand database that containd only the database needed in the first-floor database, considering the robot's characteristics and missions. In addition to those defined in Table 1, classes expressed in the database were used by adding object and space classes, such as Column and InformationCenter, in consideration of the convention center environment. The SMF also built and saved an occupancy map for estimating the robot's pose and generating waypoints. Second, the SAN module set goals as "inspected kdjHall1" and "robot_at cir752 base1" to divide the mission into six modular behaviors using the ROSPlan, which moved the robot to the working area, realized the inspection, and returned to the robot base; each behavior was described on Listings 1 and 3.

Experimental Sequences
Each waypoint following behavior can be expanded, as shown in Figure 12, where the PRM algorithm used randomly sampled points to generate a 2D graph. This graph was traversed to generate the shortest path to the goal. Afterward, each desired point in the graph was sent sequentially to the Action planner, which then used the trained policy to navigate safely. ) : e f f e c t ( i n s p e c t e d ?wp) ) Figure 12 describes the moment of executing second behavior "goto_waypoint cir752 c2 kdjHall1 f l1", which meant cir752 should go to kdjHall1 from c2 inside f loor1. When the robot started executing the behavior, it got sensory inputs to recognize the environment. The SIP module recognized objects and places using sensory inputs based on the CNN model with semantic knowledge in the on-demand database. In this case, it could infer "cir752 isLookingAt column12, column16, and in f ormationCenter1", "cir752 isLocatedAt corridor2 that isInsideO f f loor1", and "column12 isNextTo column16". In addition, it got datatype properties of each instance. Finally, if the robot executed the final behavior successfully, which was goto_waypoint base1, it accomplished the mission.

Experimental Results
In this section, we evaluate the performances of the PRM-based behavior planner and the action planner to demonstrate the validation of our framework.

Behavior Planner
One of the key hyper-parameters of the PRM algorithm is the number of nodes n. We conducted experiments aiming to show the influence of this value on the algorithm performance. Initially, four differently sized places were selected, as shown in Table 6. Using each of the places' grid map, we calculated the map total area by multiplying the map's height by the map's width in meters. To calculate the free space ratio value, we divided the amount of pixels representing a free space on the map image by the total amount of pixels. Finally, we multiplied the free space ratio by the total map area to generate the free space area. We chose a set of the number of samples {50, 100,200, 300, 400, 500} heuristically, and ran our PRM algorithm 50 times for each n value on each map. The most important performance information for our application was the time taken to build the PRM graph, the time taken to find the optimal path, the length of the final path, and whether or not the algorithm was able to find a path between N p and N g . The average values for each of those measurements are shown in Figure 13. While the time taken to build the PRM graph grew quadratically with the number of samples, the time taken to find the optimal path only grew linearly. As our approach generated the graph only when receiving a new map, the effect of building the PRM graph can be mitigated. Smaller places showed higher times for both, graph building and path finding, when using the same amount of samples as larger places, because it became progressively harder for the algorithm to find free spaces when the map was nearly filled with samples. In addition, graphs with a high density of nodes per square meter tended to have a higher variety of possible paths, which could slow down the path search.
Additionally, we were able to show that the number of samples barely influenced the final path length. However, it had a considerable influence on the ability to find a valid path. When the amount of nodes was too small while comparing to the map size, some nodes could not find valid neighbors, which led to disconnected graphs. This situation is not desirable when performing a mission, as it would require the robot to re-generate the PRM graph whenever encountering such an issue. This can lead to unexpected delays and even dead-locks. Aiming to select an optimal number of nodes based on the free space information, we performed a linear regression while trying to minimize n with two constraints: the path search time should take no more than 15 ms; the success rate should be above 90%. We obtained the following coefficients: where ceil represents the rounding up operation. By using Equation (1), the number of samples needed by each one of the aforementioned maps is displayed in Table 7. We ran the algorithm 50 more times, this time using the optimal n values. It has been shown that by using the proposed formula, both constraints were satisfied.

Action Planner
The policy training described in Section 3.3.5 was performed using the simulation shown in Figure 8 generated by the mental simulation building algorithm. We used an experience database with a maximum size of 1 million tuples and learned on batches of 512. The learning rate used was 10 −6 , and the critic network decay value was 0.95. Finally, the rewards were chosen as follows: r completion = 1000, r time = 1, r collision = 500, and r closer were rewarded sparsely when the robot travelled 25%, 50% and 75% of the path, giving a reward proportional to r completion . We trained the robot for 30,000 episodes (approximately 1.3 million steps). The first 5000 episodes used a purely random policy in order to generate a large variety of experiences. The rewards obtained on each episode can be seen in Figure 14. It can be seen that until episode 15,000, the robot was stagnated, then learned a better policy and started exploiting these actions, which lead to higher rewards. After 22,000 steps, the leaning plateaued again. To demonstrate the advantage of our hybrid approach when compared to pure reinforcement learning navigation methods, we performed the experiment shown in Figure 15. The robot started on the red arrow, and its goal was to navigate until the red circle by avoiding collision with any obstacle within 10 min. The simulated environment was generated by the mental simulation building algorithm. It was based on a narrow corridor sequence inside a university building. The total path size was 65.6 m. We performed two different experiments ten times each. The first one was using only the policy learned to navigate the robot directly to the goal. The second one used the PRM algorithm to generate a waypoint sequence as the one displayed in Figure 15b, which would be used as goals for the navigation policy. The results of this experiment are summarized in Table 8. It can be seen that the hybrid approach arrived at the goal 40% faster than the navigation done purely by using the learned policy. When the goal was too far away, the robot kept turning itself to the goal point, going in the direction of a wall, then turning back to avoid a collision. This behavior slowed down the navigation and was less predictable than the path followed by the PRM-RL approach. Both methods had two failures. The RL method timed out after going back and forth in the first corridor. This failure happened depending on how close the robot was to the trashcan when approaching the corner. If the robot was too close, the laser scanner would not be able to see the second corridor's entrance, believe it was a dead-end, and turn back. In the case of the PRM-RL approach, the failures were caused because the PRM algorithm created a sample too close to trashcan, which led to a collision. We believe this could be solved by using a more robust sampling algorithm than the one currently used.
(b) Generated Waypoint Sequence.  The experimental results show that our hybrid approach added an extra layer of robustness to autonomous navigation based on reinforcement learning policies. It was able to perform a navigation task 40% faster than the baseline while maintaining the success rate.

Conclusions
In this paper, we have introduced our cognitive science inspired navigation framework and its three composing parts: semantic modeling framework (SMF), semantic information processing (SIP) module, and semantic autonomous navigation (SAN) module. We also showed how we modeled the environment employing the triplet ontological semantic model (TOSM), which allowed the three modules mentioned above to utilize the same modeled data. The on-demand database allowed different robots to share a common database by selectively querying the environment data based on each robot's intrinsic capabilities. The SIP module, performed object detection and localization using the deep learning-based approach, while, inferred the robot's relationship information based on the semantic properties of recognized objects derived from the on-demand database. Moreover, the SAN structure showed that a hybrid planning approach, which combines classical and sampling-based planners with reinforcement learning algorithms, is able to mitigate each planner's shortcomings leading to a more robust planning algorithm. The performed experiment showed that our framework can be applied to real-world scenarios, and navigation tasks on large dynamic environments can be performed seamlessly with the aid of TOSM data.
Immediate future work is focused on expanding the framework to other environments, more specifically outdoor environments. A modeled outdoor environment can also be used on different platforms, such as self-driving vehicles. Moreover, a large variety of TOSM semantic information shall be used by the SIP module when performing object recognition and localization. In addition, we are planning to add a large variety of possible behaviors to enable higher-level missions to be performed. A more robust sampling algorithm can also be used to improve both the graph build time and the success rate of navigation tasks. Finally, another line of research is using the mental simulation to check the plan validity before its execution. For example, the feasibility of following a sequence of points generated by the PRM algorithm can be tested on simulation. In case of failure, a new sampled graph can be generated.