Ontology-Based Framework for Cooperative Learning of 3D Object Recognition

Featured Application: This article proposes a semantic web framework and application able to perform semantic analysis to create a common understanding of objects across varied robots, enabling robots to learn from each other's experience.

Abstract: Advanced service robots are not, as of yet, widely adopted, partly due to the limited effectiveness of robots' object recognition capabilities, the issue of object heterogeneity, a lack of knowledge sharing, and the difficulty of knowledge management. To encourage more widespread adoption of service robots, we propose an ontology-based framework for cooperative robot learning that takes steps toward solving these problems. We present a use case of the framework in which multiple service robots offload compute-intensive machine vision tasks to cloud infrastructure. The framework enables heterogeneous 3D object recognition with the use of ontologies. The main contribution of our proposal is that we use the Unified Robot Description Format (URDF) to represent robots, and we propose the use of a new Robotic Object Description (ROD) ontology to represent the world of objects known by the collective. We use the WordNet database to provide a common understanding of objects across various robotic applications. With this framework, we aim to give a widely distributed group of robots the ability to cooperatively learn to recognize a variety of 3D objects. Different robots and different robotic applications could share knowledge and benefit from the experience of others via our framework. The framework was validated and then evaluated using a proof-of-concept, including a Web application integrated with the ROD ontology and the WordNet API for semantic analysis. The evaluation demonstrates the feasibility of using an ontology-based framework and the Web Ontology Language (OWL) to provide improved knowledge management while enabling cooperative learning between multiple robots.


Introduction
Service robots and automation systems have rapidly evolved to support humans' day-to-day activities in the home, office, and other places [1]. Among the many possible features of a service robot, one of the most important for effective human assistance is object recognition [2]. Effective 3D object recognition requires far more storage and computing power than a typical robot system provides. Offloading the necessary data storage and computing from service robots to a centralized infrastructure would reduce robots' power consumption and the complexity of their firmware. This idea has led us and other researchers to explore the use of cloud computing to support service robots, especially in the use case of 3D object recognition [3]. Cloud computing, by providing a means for knowledge sharing, can enable multiple service robots to cooperate in learning 3D object recognition capabilities. Different service robots have different capabilities and different physical characteristics, but connecting multiple service robots within the same network could enable sharing and pooling of resources. To make this goal a reality, more research is needed on internet-based architectures for robot learning. In our work, we propose a cloud-based framework for knowledge sharing between networked service robots to enable cooperative 3D object recognition. Our framework represents knowledge of robots (e.g., a robot's capabilities and physical characteristics) and objects (e.g., an object's appearance and description). Using the knowledge sharing capabilities of our framework, robots can retrieve knowledge synthesized by other robots without performing extensive computation. This will potentially lower robots' computation costs and time for performing object recognition tasks. We have developed a proof-of-concept prototype of the cloud-based knowledge sharing framework and a cloud-based service for the use case of 3D object recognition.
An evaluation of the proposed framework with our prototype demonstrates the feasibility of delivering object recognition services to robots.
Currently, different service robotic applications describe objects using their own specific methods. This prevents cooperative learning about detection and recognition between different service robots across different applications. In a cooperative environment in which robots share knowledge, translating object descriptions between different robotic applications would be tedious and error prone, with different robot systems and different applications creating non-reusable knowledge. With the aim of enabling cooperative learning of recognition models for heterogeneous 3D objects, we propose an ontology-based framework. The framework allows a distributed system of cooperative robots to populate a unified knowledge base of heterogeneous 3D objects. We build on the Unified Robot Description Format (URDF) defined by ROS.org to represent robots, because URDF is the de facto standard for describing robots [4,5]. We define a new ontology, the Robotic Object Description (ROD) ontology, to represent heterogeneous objects. With robots and objects represented in a machine-interpretable manner, we can bring down barriers preventing different applications and robots from working cooperatively with each other. In order to map the ROD ontology to human-understandable concepts, we bundle the WordNet database with our framework.
The rest of the paper is organized as follows: Section 2 presents related work and highlights our contributions. Section 3 demonstrates the need for an ontology-based framework for sharing knowledge of objects among different robotic applications and provides an overview of our framework architecture. In Section 4, we present, in detail, a specific approach to cooperative robot learning using ontologies. A validation of the framework is given through a proof-of-concept prototype implementation and ontology evaluation, which are described in Section 5. Finally, Section 6 concludes with an outlook toward future work.

Related Work
Many researchers have explored the possibility of adopting cloud infrastructure for robotic applications. See Jordan et al. for a survey on the rise of robotic systems supported by cloud computing and Kehoe et al. for a more extensive survey [6,7]. Robotic cloud support systems have, in general, been developed as instances of the traditional "platform as a service" (PaaS) and "software as a service" (SaaS) cloud service models. However, a new "robot as a service" (RaaS) model has also been introduced. For example, Yinong et al. [8] propose a RaaS system as a standard platform for common robotic services with a corresponding development process and execution platform. The system provides cloud infrastructure to schedule utility robotic services and give users access to robot resources at low cost [9].
As an example of robotic applications at the PaaS level, Rocha et al. [10] propose the cloud computing platform REALabs for providing cloud resources to networked robotics applications. The platform provides a Virtual Machine management service that allows users to provision and manage their own virtual machines as well as a collection of robotic services, including access control and data security. This platform allows some knowledge sharing among robotic applications.
At the SaaS level, Arumugam et al. [11] present DAvinCi, a cloud-based framework for multiple service robots. They describe a parallel FastSLAM (Fast Simultaneous Localization and Mapping) system deployed on the Hadoop MapReduce API. Their work demonstrates the utility of sharing a centralized large-area map among networked robots. Another example of SaaS is the SCRS, the Senior Companion Robot System [11]. Its proponents propose a cloud-computing platform to provide remote robot control services for interaction between humans and robots. The platform assists seniors with day-to-day activities such as reading books and controlling home devices. A further example is the ROBOEARTH system, a cloud-based knowledge representation using the Web Ontology Language, proposed by Tenorth et al. [12]. Their system allows knowledge reuse among networked robots. One use case of the system was demonstrated in a convenience store, in which a robot tracked objects with attached RFID tags to detect and pick up specific objects.
Moving toward object recognition support using cloud computing, Kehoe et al. present an application based on the Willow Garage PR2 research robot that uses cloud-based support to perform highly complex tasks. Concretely, the robot acquires a 2D image and measures 3D point cloud data for a scene. Thereafter, the robot sends the 2D image to be processed on the cloud-based server for object recognition [13]. If the server can classify the object, it responds with the object's 3D CAD model. The robot utilizes both the 3D point cloud data and the 3D CAD model to compute grasping strategies, and it stores the results on the server for future reuse. Turnbull and Samanta [14] present a robotic system integrated with cloud infrastructure that offloads computationally intensive tasks such as object perception. The system contains multiple sensors for acquiring image data from robotic applications and uploading them to the cloud server. Thereafter, the cloud server determines the locations of objects and how the robot can interact with them.
Other researchers are exploring support for interoperability between multiple robotic applications. For example, Sun et al. demonstrate a robot decision-making approach using an ontology-based platform developed at Shandong University [15]. They describe household objects, sensors, equipment, and location of robots in a machine-interpretable manner using an ontology. The use of this ontology-based platform allows automation of service robots without human intervention. Concretely, they define a context-aware infrastructure ontology to represent human activities at home. They also include a concept in their ontology, namely "object," to describe the types of objects used in the smart home. The ontology, however, does not comply with standards or reuse existing ontologies [16].
Dogmus et al. define an ontology, namely RehabRobo-Onto, to formally represent data about rehabilitation robots in a knowledge base [17]. They also develop a cloud-based platform to integrate this knowledge base with other knowledge resources such as patient and disease data. The platform can recommend suitable robots to assist patients. The platform allows knowledge to be exchanged between rehabilitation robot researchers over the Internet to improve the development of rehabilitation robots. Mahieu et al. propose an ontology-based platform using the Internet-of-Robotic-Things (IoRT) ontology [18]. The IoRT ontology contains different modules that describe various aspects of networked service robots, for example, semantic sensor networks for IoT, localization, person profiles, tasks, and personalized robotics. The platform provides a service for robot task planning based on IoT sensor data.
Prestes et al. and Carbonera et al. propose the Core Ontology for Robotics and Automation (CORA) to represent heterogeneous robots in a knowledge base [19,20]. CORA was developed by the IEEE Ontologies for Robotics and Automation Working Group. Their work focuses only on robot description [21]. Olszewska describes an ontology standard for autonomous robots called ORA developed by the IEEE RAS Autonomous Robotics (AuR) Study Group [22]. ORA includes concepts and relations for robotic applications inherited from SUMO [23]. Our work, in contrast to the previous work, represents heterogeneous robots with the Unified Robot Description Format (URDF), the de facto standard for describing robots defined by ROS.org [24,25]. The existing work does not consider knowledge sharing about object recognition tasks between different robots and different robotic applications. To achieve that goal, we propose a new ontology, the Robotic Object Description (ROD) ontology, to enable object learning by possibly heterogeneous robots. Using URDF and ROD with WordNet, a large lexical database, our framework provides a cooperative learning environment inclusive of disparate robots and applications [26].
Umbrico et al. present the Sharework Ontology for Human Robot Collaboration (SOHO) ontology built on top of the CORA ontology to represent collaborative manufacturing scenarios [27]. Gómez and Miura propose an ontology-based knowledge management system to represent the environment of home service robots and the interaction between robots and the environment using natural language commands [28]. Our work, instead of focusing on enabling a collaborative environment for robots and humans, involves a framework to provide a common understanding about objects between service robots. To achieve this, we propose to use ontologies and linguistic analysis to describe robots and objects.
Brono et al. propose an ontology-based framework for storing cultural information to manage and adapt a robot's interaction to the user's habits and preferences [29]. This work presents a generic framework, while our work defines a specific ontology for robot cooperative learning. Diab et al. also present an ontology-based framework, namely Perception and Manipulation Knowledge (PMK) to provide common vocabularies for knowledge sharing between humans and robots [30]. Their work focuses on a single environment, while our work proposes a framework to share knowledge among varied robots in different environments.
The technology of 3D object perception supports the effective operation of service robots, because a helpful robot must use 3D data to estimate the positions of objects relative to itself and its surroundings. In order to enable effective recognition of 3D objects and to enable planning of grasping or other manipulation strategies, we use the Point Cloud Data (PCD) format standard [31]. Recently, many methods for detecting objects in 3D data have been introduced for robots [32], and the PCD standard facilitates detection of objects described by their 3D point patterns by finding the best match between an object description and the points in a region of the scene. Object detection with PCD involves finding the translation and rotation of a template that best align it with a scene [33][34][35]. Additional techniques are also being introduced to improve 3D object recognition capabilities, such as the use of 2D images with multi-view projection detection to speed up the detection of 3D objects [36] and efforts to improve the Iterative Closest Point (ICP) alignment algorithm by accelerating the processing of 3D point clouds [37].
The main contribution of our proposal is a method for cooperative object recognition task learning by multiple robots. We further propose a general framework for allowing heterogeneous robots to share knowledge related to 3D object recognition tasks (and in the future, other tasks as well). Our framework contains knowledge of robots (e.g., robots' capabilities and physical characteristics) and objects (e.g., objects' appearance and description). By using the cooperative learning and knowledge sharing capabilities of our framework, robots can exploit previous learning without performing extensive computation. Hence, our proposal will, in the future, lower onboard computation time and embedded system cost, compared to existing work. See Table 1 for a comparison of our framework to the existing frameworks in the literature.

Proposed Cloud and Ontology-Based Framework Architecture
In this section, we propose a framework for large-scale distributed robot learning using ontologies through cloud computing, with a specific case study of object detection and recognition for service robots. We also detail how the framework supports cooperative learning and explain our object detection and recognition approach.

Motivational Example
We illustrate our proposal with the motivational example given in Figure 1. Consider two known objects (A and B) captured by different robots with different RGBD cameras or laser scanners. The structure and appearance of each object is represented in the Point Cloud Data (PCD) format (i.e., a set of points sampled from the object's surface) as defined by the Open Perception Foundation [31]. The PCD data are uploaded to the cloud platform through the Object Training Service, which constructs, over time, a shared training set for object detection and recognition, a set of Object Templates. We allow multiple object templates for a single object in order to handle different viewpoints for identifying asymmetric objects (e.g., object A). If we suppose that another robot wants to identify another instance of object A in a target scene X, also represented by a point cloud, then the robot merely needs to submit X to the Object Detection and Recognition Service to perform matching of the target scene with the set of Object Templates specifically for object A. Multiple object template matching processes can be run in parallel on the cloud platform to reduce the time required to find the best match. Finally, the best-matching template would be returned to the robot for the purpose of identifying the position and orientation of the instance of object A in the scene and making it possible to plan manipulation strategies.
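The parallel matching step can be sketched as follows. This is a minimal sketch with illustrative names: the scoring function is a deliberately simplified placeholder (index-paired RMS distance after centroid alignment) standing in for a real point-cloud registration algorithm such as SAC-IA, and a thread pool stands in for the cloud platform's worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def match_score(template, scene):
    """Placeholder score: RMS distance between index-paired points after
    centroid alignment. A real deployment would run a registration
    algorithm (e.g., SAC-IA) here instead of this toy metric."""
    t = template - template.mean(axis=0)
    s = scene - scene.mean(axis=0)
    n = min(len(t), len(s))
    return float(np.sqrt(((t[:n] - s[:n]) ** 2).sum(axis=1).mean()))

def best_template(scene, templates):
    """Score every template concurrently; the lowest score wins."""
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda t: match_score(t, scene), templates))
    return int(np.argmin(scores))
```

On the cloud platform, each template job would instead be dispatched to a separate worker, so the wall-clock time to find the best match grows with the slowest single job rather than with the number of templates.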

The Ontology-Based Framework
We create an ontology-based framework to support the creation of a cooperative heterogeneous robot learning environment. Our framework allows multiple robots to offload compute-intensive machine vision tasks to cloud infrastructure. Concretely, we propose the Robot Object Description (ROD) ontology, which uses the SUMO ontology to represent objects and the Friend-of-a-Friend (FOAF) ontology to represent users who own or utilize robots. A user can label an object observed by a robot with a WordNet concept to create a common understanding of what the object is. Figure 2 gives a high-level overview of the framework. A newly labeled object can be added to a collective training dataset by a particular robot. Once the new knowledge is incorporated into the collective knowledge base, other robots can reuse the new knowledge to, in turn, recognize the same or a similar object and possibly perform an action with the object. In this example, the robot's goal is to grasp the object. The knowledge-based component of the proposed framework has five main modules, namely the Robot Repository, the Object Ontology, WordNet, the 3D Object Repository, and the Object Grasping Strategy Repository. We detail each module as follows.
The Robot Repository is a collection of robot descriptions. We use the Unified Robot Description Format (URDF), which is a standard for describing the physical and mechanical characteristics of robots, as shown in Figure 3. In addition, Appendix A elaborates on the URDF format in more detail. Physical differences among a heterogeneous collective of robots could affect the transferability of knowledge and the robots' ability to cooperate toward a particular goal. For example, in order for a service robot to grasp an object, the robot must first be able to recognize the identity of the object, gauge the position of the object in the robotic frame of reference, and estimate the orientation of the object with respect to its own position, using an appropriate sensor apparatus. However, for the service robot to plan how to move and grasp the object, it also needs a physical description of itself.
The Object Ontology is a knowledge base of heterogeneous objects. When robots learn knowledge in isolation, the information they acquire may only be usable by that particular robot. This would mean the platform would not allow reuse of object recognition and pose estimation knowledge. It would also require each robot to store knowledge of an exhaustively large number of objects, which is infeasible due to limited storage and computational power. We solve these problems by providing a cloud-based 3D Object Repository to store the collective knowledge of multiple service robots. Our knowledge base uses the new Robot Object Description (ROD) ontology to represent objects and FOAF to represent people that label objects. The use of ontologies promotes cooperation in learning about objects by multiple robots with different physical descriptions. WordNet enables the system to share a common understanding of objects across multiple robots and multiple applications.
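For reference, a URDF description is an XML document composed of links (rigid bodies) and joints connecting them. The fragment below is a minimal sketch; the robot name, link names, and dimensions are illustrative, not taken from the prototype's actual robot descriptions.

```xml
<?xml version="1.0"?>
<robot name="service_bot">  <!-- illustrative robot name -->
  <!-- Rigid bodies of the robot -->
  <link name="base_link">
    <visual>
      <geometry><cylinder radius="0.20" length="0.60"/></geometry>
    </visual>
  </link>
  <link name="gripper_link">
    <visual>
      <geometry><box size="0.05 0.05 0.10"/></geometry>
    </visual>
  </link>
  <!-- Kinematic connection between the bodies -->
  <joint name="gripper_joint" type="revolute">
    <parent link="base_link"/>
    <child link="gripper_link"/>
    <origin xyz="0 0 0.60" rpy="0 0 0"/>
    <axis xyz="0 0 1"/>
    <limit lower="-1.57" upper="1.57" effort="10.0" velocity="1.0"/>
  </joint>
</robot>
```

A planner can traverse this link/joint tree to derive the robot's kinematic chain, which is exactly the self-description needed for grasp planning.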
The basic operation required for a human to assist a robot in an object understanding task is to identify (label) the object and optionally provide a textual description. To construct and share such knowledge among multiple robots in various locations operated by different users, linguistic analysis needs to be considered. We integrate WordNet, a lexical database for the English language, with our ontology-based framework to enable interpretation and identification of objects and their descriptions. In this way, all the robots in our cooperative learning environment can share knowledge of objects and build up a common understanding of such objects.
The Point Cloud 3D Object description format is a standard for describing a 3D object as a collection of data points defined in a 3D vector space, as shown in Figure 4. Appendix B elaborates on the PCD format in more detail. This standard allows robots to draw inferences about the geometry of an object and its position and orientation with respect to the robot. With a point cloud, the robot can perform autonomous actions such as moving to and grasping an object. In our ontology-based framework, we use the Point Cloud Library (PCL) to formally describe the geometry of objects in the 3D Object Repository so that robots in the cooperative learning environment can reuse those data when performing object learning and recognition [32][33][34]. We allow association of multiple partial point clouds to form a single object template to represent different viewpoints of the object. Our prototype application generally uses template matching to identify the object in a given scene that best matches a given template. Hence, the object template must be linked with the ROD ontology to formally describe the object represented by the template. The Object Grasping Strategy module is used for object learning and recognition. Consider the use case of grasping and moving an object. The grasping strategy module uses an SVM with grasping rectangles (i.e., a set of coordinates at which it is possible to grasp a specific object) and point cloud data to determine how to grasp a particular object. This task is computationally intense, so we offload it to cloud infrastructure to reduce compute time, robot cost, and robot power consumption.
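For orientation, an ASCII PCD file consists of a short header followed by one point per line. The header fields below follow the PCD v0.7 layout used by PCL; the coordinate values are illustrative.

```text
# .PCD v0.7 - Point Cloud Data file format
VERSION 0.7
FIELDS x y z
SIZE 4 4 4
TYPE F F F
COUNT 1 1 1
WIDTH 4
HEIGHT 1
VIEWPOINT 0 0 0 1 0 0 0
POINTS 4
DATA ascii
0.012 -0.104 0.731
0.015 -0.102 0.734
0.018 -0.099 0.736
0.021 -0.097 0.739
```

WIDTH x HEIGHT gives the number of points (HEIGHT greater than 1 denotes an organized cloud from a depth camera), and VIEWPOINT records the sensor pose from which the points were acquired.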

Cooperative Robot Learning Approach
Herein, we provide a description of the cooperative object learning, detection, and recognition approach built on top of the previously described cloud-based platform. Figure 5 shows a sequence diagram that describes the object training, ontology population, and object manipulation use cases for a cooperative learning scenario. We assume, for the time being, that each robot participating in the distributed system has a camera and a Wi-Fi connection to the Internet. The method is divided into three phases, elaborated in the following subsections: (i) object training, (ii) ontological knowledge population in ROD, and (iii) object recognition and manipulation. Figure 6 illustrates the approach to object recognition and detection training in our ontology-based framework. We first acquire an RGBD image to get a Point Cloud Data (PCD) structure, as shown in Figure 7a. Thereafter, we remove outlier points using statistical outlier removal, thus segmenting the objects in the scene from the background (Figure 7b,c). We utilize multiple PCDs for an object template to represent different viewpoints of objects. Such PCDs are uploaded and inserted into the ROD 3D object repository through the ROD service. The ROD service also provides the capability for linguistic analysis with WordNet to find, based on the user's English word labeling, the concept that best represents the object. Moreover, the ROD service provides the capability of finding a set of coordinates at which it is possible to grasp the object, namely object grasping knowledge. Grasping knowledge is stored in the Object Grasping Strategy Repository for other robots to use when grasping identical or similar objects. The approach to populating the repository and linking all this knowledge is explained in the next section.
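The statistical outlier removal step can be sketched in a few lines. This brute-force version (the function name and parameter defaults are our own) mirrors the statistic computed by PCL's StatisticalOutlierRemoval filter, which does the same with a k-d tree instead of a full distance matrix.

```python
import numpy as np

def remove_statistical_outliers(points, k=8, std_ratio=1.0):
    """Drop points whose mean distance to their k nearest neighbours
    exceeds mean + std_ratio * std of that statistic over the cloud.

    Brute-force O(n^2) illustration of statistical outlier removal;
    PCL's StatisticalOutlierRemoval computes the same statistic with
    a k-d tree for efficiency.
    """
    # Pairwise Euclidean distance matrix (n x n).
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    d.sort(axis=1)                       # column 0 is the zero self-distance
    mean_knn = d[:, 1:k + 1].mean(axis=1)
    threshold = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn <= threshold]
```

After this filtering, the remaining foreground points can be segmented from the background plane before being packaged as object-template PCDs.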

Ontological Knowledge Population
To enable cooperative learning and recognition of 3D objects among multiple service robots, we propose a new ontology, namely the Robot Object Description (ROD) ontology, which utilizes the existing SUMO ontology to represent object knowledge for service robots. Furthermore, we perform linguistic analysis on user-provided labels of objects to find the best matching concept. This allows a common understanding of an object between multiple service robots, multiple users, and multiple applications. Figure 8 shows the relationships between ROD core concepts in detail. An Object Description is an abstraction of a real-world object. An object instantiated from an Object Description also refers to a SUMO concept to represent the object's type. The object description also links to a WordNet concept through the relation hasLabel. The object description also contains the textual description, PCD files, and grasping strategies through the relations hasDescription, hasPCD, and hasGraspingStrategy, respectively.
A Grasping Strategy provides information about how to grasp an object through the relation hasURL.
A Robot Description is an abstraction of a real-world service robot. A robot can learn about objects and has a URDF structure through the relations hasLearnt and hasURDF, respectively. Moreover, a robot can be trained by a person represented via FOAF through the relation hasTrained. Figure 8 also depicts an ontological knowledge base for a sample object, Object A, represented as an instance of the concept Object Description from the ROD ontology and Drinking Cup from the SUMO ontology. With the use of linguistic analysis and WordNet, we can construct a linkage property, ROD:hasLabel, between Object A and the English concept Coffee_Mug in WordNet. Furthermore, Object A has an additional description through the property ROD:hasDescription. Thereafter, we construct a sample grasping strategy, Grasping A, as an instance of a Grasping Strategy from ROD. Thus, Object A links to Grasping A through ROD:hasGraspingStrategy. The relationship between an object and a robot is enabled by instantiating a Robot Description, for example, Robot A. The fact that Robot A has learnt Object A is represented through the property ROD:hasLearnt. The person instance Person A is represented using the FOAF ontology. If Person A is the person who provided the label for a particular object to Robot A, this information is provided through the property ROD:hasTrained. Finally, we construct linkages of Object A, Grasping A, and Robot A to their Internet resources through the properties ROD:hasPCD, ROD:hasURL, and ROD:hasURDF, respectively. List 1 depicts an excerpt of the ontological knowledge base in RDF (Resource Description Framework) format.
List 1. Excerpt of the ontological knowledge base in RDF.
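The triples described above can be sketched in Turtle syntax as follows. This is a hedged reconstruction based on the relations named in the text, not the paper's actual listing; the namespace URIs, file locations, and literal values are illustrative assumptions (only the FOAF namespace is the standard one).

```turtle
@prefix rod:  <http://example.org/rod#> .        # illustrative namespace
@prefix sumo: <http://example.org/sumo#> .       # illustrative namespace
@prefix wn:   <http://example.org/wordnet#> .    # illustrative namespace
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

rod:Object_A a rod:ObjectDescription , sumo:DrinkingCup ;
    rod:hasLabel            wn:Coffee_Mug ;
    rod:hasDescription      "A ceramic mug with a handle." ;  # illustrative
    rod:hasPCD              <http://example.org/pcd/object_a.pcd> ;
    rod:hasGraspingStrategy rod:Grasping_A .

rod:Grasping_A a rod:GraspingStrategy ;
    rod:hasURL <http://example.org/grasp/grasping_a.svm> .

rod:Robot_A a rod:RobotDescription ;
    rod:hasLearnt  rod:Object_A ;
    rod:hasURDF    <http://example.org/urdf/robot_a.urdf> ;
    rod:hasTrained rod:Person_A .

rod:Person_A a foaf:Person ;
    foaf:name "Alice" .   # illustrative person name
```

Serialized this way, the knowledge base can be loaded into any RDF triple store and queried over SPARQL by other members of the collective.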

Object Detection and Manipulation
If the robot cannot respond to a person's request, for example when the object is unknown, the robot should interact with the person and with the learning module, uploading the acquired PCD to the cloud. In addition, a robot may find it necessary to improve the training set for a particular object on the cloud platform while trying to respond to the person's request at the same time. Cooperative learning between robots is then a simple matter of sharing templates for specific objects.
To recognize and manipulate a specific object using a robot with a camera, we first acquire an image to obtain a PCD point cloud. Thereafter, we use the ROD service to identify the object and its concept using template matching with the Sample Consensus Initial Alignment (SAC-IA) algorithm. Given a set of previously captured templates of an object in the ROD Repository, we can thus determine the object's position and orientation in the robot frame of reference. Concretely, as input, we take a depth image containing an object and attempt to fit one of the previously captured templates to the scene. This works well for getting the position and orientation of the object in a cluttered scene, as shown in Figure 9. However, finding the best match between a scene and a set of templates in the point cloud representation is computationally expensive, because the number of object templates required is relatively large. We, therefore, offload the task to cloud-based parallel computing to reduce the time required to compute the best match. The ROD Service also provides grasping strategies for the object; thus, the robot can use this knowledge to grasp and manipulate the object.

Validation
In this section, we present an experimental validation of our approach. We first describe the development of a prototype as a proof-of-concept. Afterward, we describe our experiments to validate the ontology.

Proof-of-Concept
Figure 10 depicts the architecture of our proof-of-concept system. Concretely, we use a web application to simulate the API requests that would be made during a service robot's interaction with the system. The system allows users to provide training data about objects and request object detection results. The training interface is shown in Figure 11. When training data for an object are uploaded for the first time, a training set is generated by the ROD service. The training set includes PCDs and grasping strategies. This training set is uploaded to the 3D Object Repository. Furthermore, the ROD service populates an ontological knowledge base in OWL format, namely the ROD Repository, to provide access to the training set files to any member of the robot collective over the Internet. This enables cooperative learning for multiple networked robots. For the ROD Service to perform the tasks mentioned above, we integrate multiple application interfaces: a WordNet interface to provide access to the WordNet repository, a PCD Viewer to visualize PCD files, three.js to visualize URDF files, and SVM Light to generate grasping strategies. We developed prototypes of each module and a web application simulating cooperative learning by a collective of robots, using OpenLink Virtuoso to generate the necessary SPARQL endpoints and the ROD Repository.
Figure 11 shows a sample screenshot from the training interface of our proof-of-concept web application. Users can train the system on objects using the following steps:
Select a Service Robot (URDF): Users specify the model of the service robot by selecting a URDF file. The system creates an instance representing the robot in the ROD ontology based on this name. The prototype provides a small number of initial robot descriptions.
Class of Object: A user can specify the class of the training object from the list of SUMO classes.
Label of Object: Based on the selected class, the system generates a list of related concepts obtained from WordNet for the user to choose from, in order to further describe the object.
Describe the Detail: Users can provide additional information to describe the object.
3D Image PCD File: Users can upload the PCD file of the object.
Finally, users can submit all these data to generate a training set for the new object in the ROD repository.
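Once the training set is in place, another robot can retrieve the shared knowledge from the SPARQL endpoint with a query along the following lines. The property names follow the ROD relations described earlier; the namespace URIs are illustrative assumptions.

```sparql
PREFIX rod: <http://example.org/rod#>   # illustrative namespace
PREFIX wn:  <http://example.org/wordnet#>

# Fetch the templates and grasping knowledge for any object
# labeled with the WordNet concept Coffee_Mug.
SELECT ?object ?pcd ?graspURL
WHERE {
  ?object a rod:ObjectDescription ;
          rod:hasLabel            wn:Coffee_Mug ;
          rod:hasPCD              ?pcd ;
          rod:hasGraspingStrategy ?strategy .
  ?strategy rod:hasURL ?graspURL .
}
```

The robot then downloads the returned PCD templates and grasping strategy files over the Internet instead of recomputing them onboard.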

ROD Ontology Implementation and Usage
Here, we demonstrate an implementation and usage of our ROD ontology based on two main features of ontologies: equivalence and disjointness between classes. Firstly, we construct a training set for two types of objects: drinking cups and bottles. We selected these objects based on the descriptions of two SUMO classes, DrinkingCup and Bottle, as detailed in Table 1. Concretely, 25 different drinking cups and 25 different bottles were used to train our proof-of-concept web application. Objects of the same type are modeled as equivalent classes, while objects of different types are modeled as disjoint classes. Finally, 50 instances from the two classes (i.e., DrinkingCup and Bottle) are stored in the Object Store and the ROD Ontology.
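In OWL terms, the equivalence and disjointness relations used above can be stated directly. A minimal Turtle sketch, using hypothetical rod: and sumo: prefixes rather than the actual ROD namespace:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rod:  <http://example.org/rod#> .
@prefix sumo: <http://example.org/sumo#> .

rod:DrinkingCup a owl:Class ;
    owl:equivalentClass sumo:DrinkingCup ;
    owl:disjointWith rod:Bottle .

rod:Bottle a owl:Class ;
    owl:equivalentClass sumo:Bottle .
```

A reasoner can then infer that no instance classified as a rod:DrinkingCup may also be a rod:Bottle, which is what keeps the two training sets separated.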
We train a robot on an object from the given set of objects that belong to the classes Bottle and DrinkingCup in the SUMO ontology. Thus, our ontology-based framework constructs linked data specifying the learned object instances. Table 2 illustrates the data stored in the system. We first train a robot on multiple objects to see if the framework can map the classes to those objects correctly. We then use the robot to recognize objects among a given set of different objects to evaluate the accuracy of object recognition by robots using shared knowledge from our framework. We use another robot to detect previously unseen objects to see if they can be detected correctly (i.e., return the correct classes).

Table 2. Example of SUMO classes for semantic analysis using WordNet.

Class of Object (SUMO) | WordNet
DrinkingCup | An open Fluid_Container that is intended to serve a Beverage to a single person. Note that this class includes both cups with handles and drinking glasses (beer_glass, beer_mug, champagne_flute, coffee_cup, coffee_mug, drinking_cup, flute, flute_glass, stein).
To demonstrate common understanding of objects learned by the system, we developed a web application integrating the ROD ontology and the WordNet API for semantic analysis. The result shows that our framework can describe real-world data with linkages to linguistic data, which can be useful to provide a common understanding between varied robotic applications.

Framework Usability Evaluation
We evaluate the usability of our framework by simulating a scenario in which a participant commands a service robot to look for and grasp an object. In the experiment, 20 non-expert participants were assigned to command a robot to search for different specific objects in a given scene. Figure 12 shows an example of a target object in a given scene. The goal is to find the 3D coordinates in the robot's 3D frame, based on which an object grasping strategy can then be applied. The participant selects an object from among multiple objects within the scene and commands the robot to look for and grasp it. The simulation starts with the participant defining an object class based on the SUMO ontology. The system then performs semantic analysis to present a description as a label of the object from WordNet (see Table 3). The participant verifies that the label is suitable to represent the selected object. The participant can also elaborate on the physical appearance of the object by specifying a description, in order to populate the knowledge base with more knowledge that could be shared within the framework. The robot simulates the object recognition and object grasping tasks, then responds to the user as to whether the object can be found in the scene and grasped or not. Lastly, the participant evaluates the usability of the framework based on System Usability Scale (SUS) metrics [38]. The results show user satisfaction with the system, with a SUS score of 75.25, as shown in Table 4.

Table 3. Example of objects inserted into the ROD Ontology.

This experiment allowed participants to execute queries based on a common understanding of the learned data stored by the system. Without a shared ontological knowledge base, searching for an object without a precise class or label specification often results in a larger than necessary search space and is time-consuming.
On the other hand, a precise specification of a class or label will make the search-space smaller, resulting in less time spent on the search process. Therefore, this study affirms that the proposed ontology-based framework can benefit robots' cooperative learning.
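For reference, the SUS score reported above follows the standard scoring procedure: each of the 10 Likert items (rated 1-5) contributes (response − 1) for odd-numbered (positively worded) items and (5 − response) for even-numbered (negatively worded) items, and the 0-40 sum is scaled by 2.5. A minimal sketch:

```python
def sus_score(responses):
    """Compute a System Usability Scale score from 10 Likert responses (1-5).

    Odd-numbered items are positively worded, even-numbered negatively worded.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5  # scale the 0-40 sum to the 0-100 SUS range
```

Averaging the per-participant scores yields a group score such as the 75.25 reported in Table 4.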

Class of Object (SUMO) | Label of Object (WordNet) | Description
We also evaluated the efficacy of the object recognition algorithms used in our framework, namely Template Matching and Iterative Closest Point [36]. We adopt three commonly used evaluation metrics for object recognition, i.e., accuracy, F1-score, and IoU (intersection over union). We manually identify the object in each test scene to establish ground truth, then measure the precision, recall, accuracy, and F1-score. We also calculate bounding boxes from the 3D point cloud data of each ground-truth object to measure the system's average IoU. Tables 5 and 6 show the evaluation results for detecting different drinking cups and bottles, respectively. From the results, we conclude that the design of the framework needs to be flexible, allowing administrative users to add newly improved object recognition algorithms to the framework to enhance the system's accuracy.
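The IoU of two axis-aligned 3D bounding boxes mentioned above is the ratio of their intersection volume to their union volume. A minimal sketch (the box format (xmin, ymin, zmin, xmax, ymax, zmax) is our own convention here, not prescribed by the framework):

```python
def box_volume(box):
    """Volume of an axis-aligned 3D box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d(a, b):
    """Intersection-over-union of two axis-aligned 3D bounding boxes."""
    inter = 1.0
    for i in range(3):  # overlap extent along x, y, z
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0  # boxes do not overlap
        inter *= hi - lo
    return inter / (box_volume(a) + box_volume(b) - inter)
```

Averaging this value over all ground-truth boxes in the test scenes gives the system's average IoU.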

Discussion
We have presented a framework to support collaborative learning by a group of robots. The system provides a robot collective with the capability of improved learning. The usability test suggests that the ontology can be used to generate useful cooperative robot learning data. A key limitation of robot learning is that it is difficult to bring knowledge from multiple robots, environments, and applications together. For example, two robot users may use different words to describe the same object, making the descriptions distinct and preventing sharing of the data. The ontology enables description of objects to create a common understanding of classes in a hierarchy. Our framework uses cloud storage so that the robots can effectively share knowledge for object recognition and object grasping strategies. This solves the problem of memory and processing limitations of individual robots.
Grasping and moving an object requires complex computation, because the physical robot information specified in URDF is necessary to calculate the relative position between the robot and the object in order to move the robot arm to the object. This computation costs power. Using an ontology to store grasping strategies allows robots to retrieve them without performing extensive computation, reducing the robot's energy consumption and computation time.
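For instance, a robot could retrieve a stored grasping strategy with a single SPARQL query against the ROD Repository's endpoint; the class and property names below are hypothetical placeholders rather than the actual ROD vocabulary:

```sparql
PREFIX rod: <http://example.org/rod#>

SELECT ?object ?strategy
WHERE {
  ?object a rod:DrinkingCup ;
          rod:hasGraspingStrategy ?strategy .
}
LIMIT 10
```

Answering such a query is a lookup in the knowledge base, so the robot avoids recomputing the grasp from the URDF model and point cloud each time.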
In the future, we plan to improve the framework by adding more capabilities to support cooperative learning in which many robots share knowledge without the need to re-learn. This reduces the problem of storing large amounts of data and reduces the burden of processing, which is one of the major problems in the practical development of robots. We plan to extend our ROD ontology to support more types of objects that are not described in the SUMO ontology. As pointed out by one of this paper's reviewers, self-learning of object ontologies and semantic representation of 3D object structure are useful and interesting ideas to explore as part of future work. Modeling a semantic representation of 3D objects would enable a wider application of the proposed framework beyond simple 3D object recognition; see [39] for an example of an application in the 3D cultural artifact domain. We also plan to incorporate voice commands to facilitate the commanding of robots, avoiding the need to use a web browser to execute training and detection operations.

Conclusions
In this paper, we have presented an ontology-based framework for cooperative robot learning that utilizes knowledge representations in the Web Ontology Language. Concretely, we define a new Robotic Object Description (ROD) ontology with the help of the SUMO ontology in conjunction with WordNet to create a common understanding of objects. We consider the use case of a robot handling requests from users to look for and retrieve objects. Since multiple users may have different ideas about how to describe the same object, the ontology-based framework will facilitate users to arrive at a common understanding through the knowledge populated in the framework. We have developed a proof-of-concept implementation of the framework to demonstrate the use of the ontology to allow learning about multiple objects to perform object recognition and grasping. This demonstrates how the system can support robots with different physical characteristics to learn and share knowledge. As a result, cooperative learning of robots is possible.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The Unified Robot Description Format (URDF) is a data standard used by the Robot Operating System (ROS) to describe the physical structure of a robot. Robots are described in the form of XML files, which makes implementation on various systems convenient. Open-source libraries for visualization of URDF files are also available. URDF is easy to understand; it specifies the movement of a robot's parts as well as its physical appearance. A URDF data structure details all of the parts of the robot, and the description of each part includes color and shape information. An example showing some of the important details of a URDF specification is shown in Figure A1 and described here.
Link: The connection of an individual part to other parts along with collision properties.
Joint: Pivot points for joint motion. Movement restrictions are identified to allow the control system to ensure safety.
Transmission: Relationship between parts when one of the parts is moving.

Figure A1. Sample description of physical and mechanical characteristics of a robot using the URDF XML schema.
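To make the link/joint structure concrete, the fragment below is a minimal illustrative URDF file. The element names follow the URDF XML schema; the robot name, link names, and dimensions are invented for illustration:

```xml
<robot name="example_bot">
  <link name="base_link">
    <visual>
      <geometry><box size="0.3 0.3 0.1"/></geometry>
      <material name="grey"><color rgba="0.5 0.5 0.5 1"/></material>
    </visual>
    <collision>
      <geometry><box size="0.3 0.3 0.1"/></geometry>
    </collision>
  </link>
  <link name="arm_link">
    <visual>
      <geometry><cylinder radius="0.02" length="0.4"/></geometry>
    </visual>
  </link>
  <joint name="base_to_arm" type="revolute">
    <parent link="base_link"/>
    <child link="arm_link"/>
    <axis xyz="0 0 1"/>
    <limit lower="-1.57" upper="1.57" effort="10" velocity="1.0"/>
  </joint>
</robot>
```

The limit element on the joint is what allows a control system to enforce the movement restrictions mentioned above.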

Appendix B
Different research communities, including the computer graphics [39] and robotics communities, use different methods to represent and manipulate 3D object information. The PCD format specifies groups of 3D points with optional color information. A PCD file must contain the following details:

VERSION: PCD file format version.

FIELDS: defines the data to store for a set of dimensions describing 3D object point data. Example:
• FIELDS x y z: for 3D point coordinates,
• FIELDS x y z rgb: for 3D point coordinates with color information.

SIZE: data type size for each dimension, either 1, 2, 4, or 8 bytes.

TYPE: data type for each dimension, such as int or float.

COUNT: the number of elements in the data type for each dimension, usually 1.

WIDTH: width of the point cloud dataset, using one of two meanings: the total number of points in the cloud for an unorganized dataset, or the total number of points in each row for an organized point cloud dataset.

HEIGHT: height of the point cloud dataset, using one of two meanings: 1 for an unorganized dataset, or the total number of rows that an organized point cloud dataset has.

VIEWPOINT: an acquisition viewpoint for the points in the dataset.

DATA: the data format used for the point cloud data: ASCII, binary, or binary_compressed.
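As a sketch of how these header fields are consumed in practice, the following minimal Python function parses the ASCII header of a PCD file into a dictionary. It handles only the header (stopping at the DATA line), not the point data, and performs no validation:

```python
def parse_pcd_header(text):
    """Parse the ASCII header of a PCD file into {keyword: [values]}.

    Comment lines (starting with '#') are skipped; parsing stops after
    the DATA line, which marks the end of the header.
    """
    header = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        keyword, _, rest = line.partition(" ")
        header[keyword] = rest.split()
        if keyword == "DATA":
            break
    return header
```

For example, a header with `FIELDS x y z` and `WIDTH 2` yields `header["FIELDS"] == ["x", "y", "z"]` and `header["WIDTH"] == ["2"]`.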