Robotics Classification of Domain Knowledge Based on a Knowledge Graph for Home Service Robot Applications

Yiqun Wang; Rihui Yao; Keqing Zhao; Peiliang Wu; Wenbai Chen

doi:10.3390/app142411553

,

and

¹

School of Automation, Beijing Information Science & Technology University, Beijing 100192, China

²

School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China

^*

Author to whom correspondence should be addressed.

Appl. Sci.2024, 14(24), 11553;https://doi.org/10.3390/app142411553

This article belongs to the Special Issue Artificial Intelligence in Complex Networks (2nd Edition)

Version Notes

Order Reprints

Abstract

The representation and utilization of environmental information by service robots has become increasingly challenging. In order to solve the problems that the service robot platform has, such as high timeliness requirements for indoor environment recognition tasks and the small scale of indoor scene data, a method and model for rapid classification of household environment domain knowledge is proposed, which can achieve high recognition accuracy by using a small-scale indoor scene and tool dataset. This paper uses a knowledge graph to associate data for home service robots. The application requirements of knowledge graphs for home service robots are analyzed to establish a rule base for the system. A domain ontology of the home environment is constructed for use in the knowledge graph system, and the interior functional areas and functional tools are classified. This designed knowledge graph contributes to the state of the art by improving the accuracy and efficiency of service decision making. The lightweight network MobileNetV3 is used to pre-train the model, and a lightweight convolution method with good feature extraction performance is selected. This proposal adopts a combination of MobileNetV3 and transfer learning, integrating large-scale pre-training with fine-tuning for the home environment to address the challenge of limited data for home robots. The results show that the proposed model achieves higher recognition accuracy and recognition speed than other common methods, meeting the work requirements of service robots. With the Scene15 dataset, the proposed scheme has the highest recognition accuracy of 0.8815 and the fastest recognition speed of 63.11 microseconds per sheet.

Keywords:

service robot; knowledge graph; kast classification; lightweight network; transfer learning

1. Introduction

Many breakthroughs have been achieved in service robotics in various fields due to technological advances in computer vision and robotics. A mobile service robot is usually a man–machine environment system. In this system, human participation and coordination are critical to improve system performance. Service robots have broad application prospects, especially in the home environment [1]. The service robot’s perception of the home environment is typically based on the recognition and detection of indoor scenes, objects, and various tools. Reference [2] introduced a visual attention mechanism to detect salient objects in the environment, improving the efficiency, intelligence, and robustness of the 3D modeling of the robot environment. It manually annotated the functional parts of objects and identified scene categories using public datasets to ensure that the robot could quickly find key areas of objects in the before research [3].

An indoor functional area is an area with a specific function, such as the living room, study, bedroom, dining room, bathroom, and other spaces. Various functional tools with functional and operational attributes are used in these spaces. Both perception and classification of the indoor environment are required by robots for analysis, decision making, and human–computer interaction [4].

There is a lack of research on knowledge representation and management models of tools and functional areas. Current research has mainly focused on the semantics of tool names and spatial relationships. Problems in this area include a large number of labels and difficulties in labeling some tools. Functional semantic features are critical in human cognition. Functionality refers to the potential uses of a tool or place. It is a better semantic description than the relationship between tool names and spatial semantics. The latest research by Li Feifei et al. [5] shows that the integration of functionality into environmental cognition substantially improves the cognitive ability of service robots.

It is expected that using knowledge representation of tools and functional areas through machine learning, especially deep learning, will improve the autonomous knowledge of service robots. Due to their natural interaction and humanized services, service robots already can recognize basic semantic information such as tool names and categories. As an important part of semantic knowledge, utility connects the real world and the task world and has attracted the attention of experts and scholars. Service robots typically use passive cognition to obtain the functional semantics of tools from semantic tags. Researchers are also exploring active cognitive methods based on inference learning [6,7].

Intelligent control is an important development direction for people-centered home service intelligent robots. The development of artificial neural networks has greatly promoted the development of intelligent control theory, making robots more humanized and more able to think from the perspective of people. Developing human-centered, natural, and efficient processes will become the main goal of the new generation of human–computer interaction technology.

The main contributions of this study are as follows:

(1): We use the theory of ontology and knowledge graphs to establish a knowledge graph system suitable for the home environment.
(2): An analysis of the characteristics and determinants of robot service is conducted to establish a decision-making template for service tasks by correlating the three types of knowledge in the home environment domain ontology. Knowledge acquisition is based on visual, physical, category, functional, and other attributes.
(3): A framework for the rapid classification of scenarios and tools for service robots is proposed because robots require timely information on indoor scenes using relatively small datasets.

The rest of the manuscript is divided into three sections. Section 2 presents the related work. Section 3 describes the application of domain knowledge based on the knowledge graph to home service robots. Section 4 presents the classification of home domain knowledge. Section 5 provides the experiments with models and frameworks. Section 6 provides the conclusion.

2. Related Work

Intelligent services are the core of semantic cognition and effective representation of environmental information, which include predicate logic representation, production rule representation, and ontology representation. Ontology has attracted people’s attention because of its structured knowledge expression and reasoning ability. Park et al. [8] proposed a design of a knowledge network system based on domain ontology. Hao et al. [9] used the ontology model to recombine the original datasets to obtain new datasets with better logic to meet the needs of upper-layer applications. Peng et al. [10] divided knowledge into three layers—basic knowledge, deep knowledge, and application knowledge, and created an agent-based hierarchical knowledge graph construction framework and methodology. Aiming at the overall formal representation and construction framework of knowledge graphs, Buchgeher et al. [11] clarified the specific representation mode of the pattern layer and data layer from a logical perspective. Yang et al. [12] proposed a semi-automatic tagging framework to represent Internet of Things (IoT) metadata. A probabilistic graph model was used to map the framework IoT resources to a knowledge network in an independent domain to build a knowledge graph to enable joint reasoning of entities, classes, and relationships.

The KnowRob knowledge system [13] was proposed for home service robots to provide robots with a richer knowledge of objects to improve the intelligence of robot task execution. The object knowledge was shared between the system’s modules. The team of Tian et al. [14,15,16] at Shandong University conducted in-depth research on ontology models for multi-domain knowledge sharing and reuse of robotic platforms. Rafferty et al. [17] established a knowledge-based model to determine the user’s intent and reasoning in a home environment. Xiangru Zhu et al. [18] improved the ability of machines to understand the real world through the construction of a multi-modal knowledge graph. Lin et al. [19] discussed the development of knowledge representation in accordance with the difference of entities, relationships, and properties.

Zhou et al. [20] showed the co-occurrence of “target pairs” with the Bayesian method to represent and identify the scene preferably. Ricardo Pereira et al. [21] proposed a two-branch CNN architecture based on 1D and 2D convolution layers, which enhances the classification of indoor scenes with the semantic feature of distance between objects. Zaiwei Zhang et al. [22] proposed a deep generation scene-modeling technology suitable for the indoor environment, in which the generation model can be trained with a feedforward neural network. The network maps the previous distribution (e.g., normal distribution) to the distribution of the main objects in the indoor scene. The research of Shengyu Huang et al. [23] also showed the recognition of successful scenes, which were not only about identifying single objects unique to some scene types, but depended on several different clues, including rough 3D geometry, color, and (implicit) distribution of object categories. Chen et al. [24] preprocessed multi-source data at the instance level, mapped it into the content of a knowledge graph, and further used SWRL to effectively extend OWL semantics and realize rule representation. López-Cifuentes et al. proposed a new scene recognition method based on the end-to-end multi-modal CNN, which significantly reduced the number of network parameters by combining image and context information through the attention module [25]. B. Miao et al. proposed an object scene model, in which the object feature aggregation module and the object attention module learn object features and object relationships, respectively [26].

Numerous deep learning image classification networks have been proposed in recent years. Brock et al. [27] proposed NFNets to improve training efficiency and accuracy by eliminating expensive batch normalization. Srinivas et al. [28] focused on improving training speed and accuracy by adding attention layers to ConvNets. In 2014, Oxford University and Deepmind Company proposed a deep convolutional network with a depth of 16–19 layers. ResNet [29] includes a residual block to ensure the efficiency of increasing the depth of deep networks. The network depth reached an unprecedented level, and ResNet provided excellent classification performance. These deep networks with many parameters have powerful data representation capabilities. The parameter layer of the neural network can extract shallow features, such as texture, as well as the functional semantic features of objects.

Although deep structures have powerful data representation capabilities, they have high computational complexity. Thus, it is difficult to perform real-time analysis on mobile terminals without efficient GPU resources. Some recent work has been aimed at improving training or reasoning speed rather than parametric efficiency. For example, RegNet [30], ResNeSt [31], and EfficientNet-X [4] focus on GPU and TPU inference speeds. In 2016, lightweight models achieved a significant breakthrough. They use depthwise separable convolution and have fewer parameters with only a negligible decrease in accuracy. Examples include SqueezeNet [32], MobileNet [33], and XceptionNet [34]. Significant progress has been made in reducing the size of the network. MobileNet, ShuffleNet, and XceptionNet are excellent applications of depthwise separable convolution. The combination of complementary search techniques and the NetAdapt method has resulted in further improvements in the inference speed [35]. In addition, transfer learning has been applied to image classification with small sample sizes. Wenmei Li et al. [36] proposed a scene classification method for HSRRS images with transfer learning and the DeCNN (TL DeCNN) model, with obvious classification advantages which will not overfit. Amsa Shabbir et al. [37] also fine-tuned the classification of satellite and scene images based on transfer learning and ResNet50, which classifies images more effectively. Transfer learning is also suitable for solving the problem of insufficient indoor scene recognition data.

3. Using a Knowledge Graph for a Home Service Robot

3.1. Home Service Robot Service System

In this paper, the home service robot system takes the family environment as the specific research scene. By identifying the common objects and scenes in the text, images, and videos marked with semantic labels in the sample database, it learns the semantic features such as apparent features, names, functions, and spatial layout of various objects and scenes in a supervised manner, so as to obtain the knowledge of home scenes.

As shown in Figure 1, the service system of the home service robot system consists of the following parts:

Figure 1. Home service robot service system.

(1): Knowledge extraction of data information of service robot

First, starting from a large number of household object samples of text, RGB-D images, and videos equipped with functional semantic tags, the RGB-D apparent features and semantic features including functional and usage methods of various objects are learned, object knowledge representation of combined apparent features and semantic features is constructed, and knowledge extraction is carried out by the learning algorithm.

Then, starting from the home location scene samples containing functional semantic tags, combining the semantic features such as object layout and spatial relationship, the scene location knowledge is represented, and knowledge extraction is carried out by the learning algorithm. Finally, starting from the sample of family characters with semantic tags, we learn all kinds of characters and their behavior characteristics, carry out knowledge representation of the characters, and extract knowledge using the learning algorithm.

(2): Home Service robot domain knowledge graph system design

The home environment domain knowledge graph representation form, which is more in line with the natural interaction requirements of service robots, is first designed. Then, an effective link between object knowledge, environment knowledge, human knowledge, and the knowledge graph is designed to build a domain knowledge graph system for service robots.

(3): Research on the construction and application of knowledge graph for home service robot system

Based on the knowledge acquisition of daily objects, location scenes, and people by service robots, as well as the design of domain knowledge graph systems, a household service domain knowledge graph will be constructed under a household service Internet of Things robot system. This will involve a position and pose observation of the sensor nodes of the Internet of Things and service robots.

In addition, the application research based on the domain knowledge graph will be further carried out. For the newly acquired information of the sensor nodes of the Internet of Things, the knowledge graph can accurately represent the relationship between various types of information, and help the service robot realize the reasoning and planning of service tasks.

In this paper, the service system of the home service robot system obtains semantic labels and other information of scenes and tools in the image through image recognition, and then connects the semantic information through the “relationship” through knowledge representation and ontology modeling. Finally, the information and relationship are managed and utilized through the knowledge graph, and the specific development and application is carried out in the field of family service. The system combines the visual domain with the knowledge graph to ensure the effective use of the visual information obtained by the service robot.

3.2. Construction of Concept Layer of the Home Service Robot Knowledge Graph

Domain ontology refers to the concepts, entities, and interrelationships in a specific domain and describes the rules and characteristics of knowledge in this domain. A domain ontology of the home environment is constructed by classifying the domain data of the home environment. This approach enables the description of people, objects, and robot service information at the semantic level to enable the sharing of domain knowledge for different pieces of equipment and different types of information.

Two basic methods are typically used for ontology construction. The top-down approach is sufficient to meet the application requirements in fields with relatively mature and complete knowledge systems. The broadest concepts in the field are defined first and are subsequently refined. However, in some fields, the data are not systematic, and the knowledge is incomplete. Thus, a bottom-up and data-driven approach is required. The most specific concepts are defined first and summarized next. The top-down and bottom-up approaches are combined in real-life applications of developing complex domain ontology. A concept layer is first established using the top-down approach. Then, the classes and attributes are supplemented by the bottom-up approach using various data. The following two steps are used in this study to construct the domain ontology model of the home environment:

Determine the classes in the home environment and their hierarchical relationship to establish an empty ontology model with a class structure. This involves analyzing the home environment to categorize relevant entities such as objects and locations. This initial step lays the foundation for the ontology by creating an organized framework of classes that represent the primary components of the home environment.
Determine the object attributes that describe the relationships between classes and the data attributes that describe the class (or instance). The domain and range for each property are established to ensure clear relationships between different classes and their instances.

These two steps are used to construct the concept layer of the knowledge graph, which has the following three parts:

Classification of knowledge in the home environment.

The environment class represents the environment domain, including the subclasses object class and location class. The object class contains all objects in the home environment, including furnishings, household items, and household appliances. The location class contains location information, and the user class represents the use characteristics, including the subclasses people class and behavior class. The people class contains information on the family, and the behavior class describes the behavior of the family. The robot service class represents robot services, including the operation and service subclasses. The operation class contains information on the robot’s operations of various objects and equipment. The service class contains details on specific service tasks of the robot.

2.: Define the relevant object attributes, use the class as the definition domain and the value domain, and establish the connection between classes.

“isLocation” describes the location of objects in the home environment. It uses the location class as the value domain to establish a connection between the object class and the people class “.toDo” determines the actions involved in the robot service operation. Its definition domain is the service class, and its value domain is the operation class.

“actOn” determines the objects involved in the robot service operation; its definition domain is the operation class, and its value domain is the object class. “hasBehavior” describes the behavior of the family. Its definition domain is the people class, and its value domain is the behavior class.

3.: Define the data attributes used to describe objects in the home environment, i.e., object knowledge representation.

The domain ontology of the home environment is shown in Figure 2.

Figure 2. Domain ontology of the home environment.

Based on the knowledge graph system of the home service robot, the semantic web rule language (SWRL) is used to establish a rule base for service inference, and the Jess inference engine is used to parse the ontology knowledge and service rules. Rule inference is used to match facts and rules in the inference engine to enable autonomous reasoning and decision making to perform services.

To ensure that the knowledge graph remains comprehensive and up-to-date, we draw upon rich data from esteemed public knowledge databases, such as DBpedia and ConceptNet. These sources provide foundational knowledge about household objects, their interrelationships, and domain-specific concepts, enriching the depth and breadth of the knowledge graph. We adopt a strategy of periodic data integration, continuously incorporating updates from these external repositories to maintain the relevance and accuracy of the knowledge base. This dynamic approach allows the knowledge graph to evolve gracefully, reflecting the latest insights and information in the field.

3.3. Home Service Reasoning Based on SWRL Rules

The robot uses the knowledge graph system to obtain and analyze the data in the ontology knowledge base to perform semantic cognition of the home environment. According to the user’s identity, behavior, habits, and the environmental information, the robot can determine which services should be provided using the SWRL rule base.

The subclasses, instances, and attributes of the environment and user domains in the home environment domain ontology can be combined using the SWRL rule and the subclasses and instances in the service domain as the conclusion of the SWRL rule. A decision-making template for service tasks is established by correlating the three categories in the home environment domain ontology.

As shown in Figure 3, the template indicates that the service task that the robot should provide is determined according to the environmental, user identity, and behavior information in the home environment. The establishment of the SWRL rule base should consider the conditions of various services and determine the various service modes that may occur in the home environment and the execution logic of the service tasks. An appropriate system and scale will substantially enhance the robot’s ability to provide services.

Figure 3. Template for developing the service inference SWRL rule base.

In addition, the construction and description of the SWRL-based service inference rules can be divided into three steps:

Determine the types, locations, and operations of the service and define them;
Comprehensively analyze various factors affecting the service and clarify the preconditions by defining their classes and attributes;
Determine the type and operation mode of the service.

Use “water delivery” as a service example. First, determine that the service object is user Alan, and the operation object includes the water cups, which are defined. Then analyze the factors affecting the water delivery, which may include “User Alan is sitting in the bedroom, and it is time for drinking water”. The service operation mode corresponding to the “water delivery” service is “send”.

To further enhance the water delivery service, we consider additional conditions, such as the environmental temperature. For example, if the temperature inside the home exceeds a certain threshold (e.g., 30 degrees Celsius), the service robot should proactively deliver water to help prevent dehydration.

Therefore, the SWRL rules for the water delivery service are established as follows:

User (Alan) ^ StateUser (sit) ^ Location (bedroom) ^ Time (t1) ^ hasUserState (Alan,sit) ^ hasLocation (Alan, bedroom) ^ hasTime (sit, t1) → operateOn (send, cup).

User (Alan) ^ Location (home) ^ Temperature (temp) ^ greaterThan (temp, 30) ^ isLocatedIn (Alan, home) ^ hasTemperature (home, temp) → operateOn (send, cup).

These rules ensure that the service robot can intelligently decide when to deliver water based on both user behavior and environmental conditions. The combination of basic and extended rules allows the robot to dynamically adapt its actions to various situations, enhancing the efficiency and responsiveness of the water delivery service.

The service inference rule base serves as an essential foundation for the robot’s decision making in service tasks, and it must account for abnormal factors that may occur in real household scenarios. Therefore, the robot must respond appropriately to highly unusual situations involving individuals or environmental conditions, such as a user falling, fainting, requiring rescue, or encountering abnormal environmental conditions like extreme temperatures or harmful gas concentrations.

For instance, in the case of a user fainting, the robot should provide an emergency call service. The SWRL rule for establishing this emergency call service is as follows:

StateUser(sleep) ^ Location(?x) ^ swrlb: booleanNot (?x, bed) → operateOn(call, family).

The service reasoning template based on SWRL rules effectively infers the service tasks that the robot should provide by analyzing diverse daily information and specific abnormal information in the context of household scenarios. This inference is achieved by incorporating detailed environmental information, user data, and user behavior patterns.

3.4. Object Knowledge Reasoning Based on SWRL Rules

It is difficult for the system to identify the color, shape, and material properties of each object instance without making mistakes. Thus, some properties are lacking, and reasoning is an effective method for obtaining missing knowledge. This section describes the dependencies between the object attributes. SWRL rules are used to establish the acquisition mechanism for missing attributes of objects and to perform reasoning using object knowledge. This strategy enhances the logic of the ontology model and enables the robot can infer the missing attributes based on the existing attributes of the object instance.

As shown in Figure 4, based on the acquisition of various attributes of object instances in the initial stage of the system, the mechanism includes the following rules:

Figure 4. The acquisition mechanism of missing attributes of objects.

The first rule type is the inference acquisition of the functional properties of objects. It includes two cases. The first case is to obtain functional attributes based on the category, physical, and visual attributes, including rules such as: Category (?x, Cup) → hasAffordance (?x, Containable), indicates that if the instance ?x is a “Cup”, then it has the functional attribute “Containable”. The second case is to use co-occurrence and infer missing functional attributes from known functional attributes, including rules such as: hasAffordance (?x, Openable) → hasAffordance (?x, Closable), indicating that if instance ?x has the functional attribute “Openable”, then it also has another functional attribute “Closable”.

The second rule type is the inference acquisition of the operational properties of objects. The operation attribute is obtained by functional attribute inference. It contains rules such as: hasAffordance (?x, Graspable) → hasOperational Attribute (?x, Grasp), indicating that if the instance ?x has the functional attribute “Graspable”, then it also has the operational attribute “Grasp”.

In addition, as shown in Figure 5, a mechanism for acquiring missing category, physical, and visual attributes of the object is established. The reasoning relationship between attributes is formally represented to achieve mutual reasoning for the three attributes and the individual attributes. It includes rules such as: Category (?x, Ball) → hasShape (?x, Sphere), indicates that if the instance ?x is a “Ball”, then it has a visual attribute, i.e., the shape is a “Sphere”; hasShape (?x, cubic) ^ hasWidth (?x, 20) → hasHeightcm (?x, 20) ^ hasLengthcm (?x, 20) means that if the shape of instance x is a cube and its width is 20 cm, then its length and height are also 20 cm.

Figure 5. The acquisition mechanism of missing attributes category, physical, and visual attributes of objects.

4. Classification of Domain Knowledge

4.1. Model Selection and Usage Strategies

The service robot platform is constrained by computing power, which limits the model’s parameter size and inference time. As a result, using networks with a large depth for direct training would lead to long training periods, and parameters cannot be updated promptly during the training process. Additionally, the large model size makes it difficult to deploy the model onto the service robot platform, resulting in delays in making timely predictions. Furthermore, the robot requires a high level of responsiveness for real-time services. MobileNetV3 is specifically designed to optimize speed and efficiency. By using depthwise separable convolutions, MobileNetV3 significantly reduces the number of parameters and model size. Moreover, MobileNetV3 is optimized for deployment on mobile and edge devices. Its optimized architecture ensures that it can run efficiently on these edge devices, enabling the robot to perform fast local inference while minimizing latency and energy consumption, thus guaranteeing the model’s quick response capabilities. Therefore, in this study, MobileNetV3 is used as the convolutional module of the model to ensure the efficient extraction of high-dimensional features from image data.

As shown in Figure 6, the network structure of MobileNetV3 is divided into three parts. The difference is the number of basic unit bnecks and internal parameters, mainly the number of channels.

Figure 6. Network structure of MobileNetV3.

MobileNetV3 includes two versions (large and small) that are suitable for different scenarios. The overall structure of the two versions is the same, but the difference is the number of basic units bottlenecks (bneck) and the internal parameters, such as the number of channels. Table 1 lists the parameters of MobileNetV3 (large) used in this work.

Table 1. Specific parameters of MobileNetV3 (large). ✓ indicates whether the SE module is used.

Since people live in indoor environments, few datasets are available due to privacy concerns. In this study, we used the CIFAR-100 and ImageNet datasets. Therefore, it is difficult to train deep learning models that require many parameters. In addition, a large model with many parameters deployed on the mobile terminal may be limited by the hardware computing power, and real-time applications may not be possible. Moreover, indoor images have worse lighting conditions and more occlusions than outdoor images, and it is difficult to perform effective feature extraction to describe the rich semantic features in the image.

Transfer learning strategies compensate for the shortcomings of convolutional neural networks for these tasks. Transfer learning is a technique that leverages pre-trained models to extract useful features from large and diverse datasets, followed by fine-tuning the model to adapt to a new dataset specific to the current task. As shown in Figure 7, the transfer learning strategy applied in this study enables the model, through pre-training on a large dataset, to learn visual features such as edges, textures, and patterns from various types of images. These features are shared across different domains, while also acquiring weights. After completing model pre-training, the linear classifier of the pre-trained model is removed, and the remaining network layers are used as convolutional layers with their weights frozen. This means that the weights of these layers will not be updated during further training. These frozen convolutional layers retain the general visual features learned from the large dataset. Finally, the new linear layer is used as the classifier, and only the parameter gradients of the last classification layer are updated on the indoor scene dataset, substantially improving the classification accuracy of the model.

Figure 7. The proposed transfer learning strategy.

The parameters of the convolutional layer obtained from pre-training contain the characteristic information of the original field. The classification of indoor functional areas is based on the detection of key objects. If pillows are present, the functional area is likely a bedroom. If there are dining tables, plates, and spoons, the functional area is likely a dining room. Local semantic features in visual information may be more helpful for classification than texture or edge features. Appropriate pre-training methods can improve the network’s ability to extract semantic features, prevent focusing on the entire space, and ultimately improve network performance.

It should be note that, when computational resources are limited, the width multiplier can be reduced from 1.0 to 0.75 or 0.5 to decrease the number of parameters and computational load. For simpler detection environments, the number of certain blocks can be appropriately reduced to simplify the model structure and shorten inference time. Adjusting the dropout rate according to the size of the dataset can help prevent overfitting and improve the model’s generalization ability. Setting the learning rate too high may prevent the model from finding the optimal solution, while setting it too low could slow down the convergence rate. The learning rate should be fine-tuned based on the actual dataset to determine the optimal value.

4.2. The Semantic Cognitive Framework

We use the strategy of pre-training with a large sample size and finetuning using the indoor dataset. First, the MobileNetV3 network is pre-trained on the CIFAR-100 dataset to obtain the weights and parameters. The final classification layer of the trained neural network is removed and used as a feature extractor. Then we add the linear layer as a classifier, perform a “gradient freeze” on the feature extractor, and retrain it on the indoor scene and the tool dataset. As shown in Figure 8, the structure of the semantic cognitive framework can be divided into two stages, i.e., feature extraction and type recognition. The feature extraction stage is used to extract image semantic features at different levels. The type recognition stage is used to reduce the dimension of the features extracted in the feature extraction stage and predict the sample category.

Figure 8. The structure of the semantic cognitive framework.

In the feature extraction stage, a convolutional neural network model pre-trained on a large dataset is used to obtain the transfer network parameters. The structure of the model is changed to construct a feature extraction model. The output of the feature extraction model is a feature vector with one dimension containing rich information. In this stage, the features are extracted to distinguish the various categories in different images to perform type recognition of the scenes and tools in the next stage.

The type recognition stage inputs the scene and tool images into the same feature extraction model as the previous stage. The feature extraction model is used to process the features of all scenes and tool images in the dataset to represent the scene and tool category. The model matches the category labels of each scene and tool to train a classifier to classify the scene and tool category.

5. Experiments with Models and Frameworks

The experimental dataset is as follows:

(1): Pre-training dataset

CIFAR-100 is an image dataset with 60,000 images, including various outdoor scenes and objects.

There are 100 sub-categories with fine labels, with 600 images per category; these 100 categories are divided into 20 major categories with coarse labels. Figure 9 shows some examples of the CIFAR-100 dataset and the fine labels.

Figure 9. Examples of the CIFAR-100 dataset.

(2): Scene dataset (indoor functional area dataset)

The Scene15 dataset is an extension of the Scene13 scene dataset. Scene13 contains more than 3000 images in 13 categories, including city, country, suburban house, street, highway, high-rise, mountain, coast, forest, living room, bedroom, office, and kitchen. The Scene15 dataset includes images in two additional categories (stores and factories), comprising 15 scenes. Figure 10 shows some examples of the indoor scene category of the Scene15 dataset.

Figure 10. Examples of the Scene15 dataset.

(3): Scene dataset (utility tool dataset)

The UMD part affordance dataset is a comprehensive dataset of general household tools. This dataset was acquired with RGB-D sensors and includes semantic labels for 17 household tools, including cup, turner, ladle, shear, mug, spoon, trowel, knife, tenderizer, scoop, shovel, scissors, hammer, saw, pot, bowl, and mallet. Figure 11 shows examples of the tool images and classes in the dataset.

Figure 11. Examples of the UMD part affordance dataset.

We used the Scene15 dataset for indoor functional area classification to verify the proposed recognition model. We considered five indoor scenes based on the home service robot’s working environment: living room, bedroom, kitchen, store, and office. For each scene, 80% of the images were selected as the training set, 10% were used as the validation set, and 10% were used as the test set. The hyperparameters for model training included a batch dataset of 256 images, 200 training epochs, and an initial learning rate of 0.001. Figure 12a,b show the loss values of the training set and the accuracy of the validation set during training.

Figure 12. Loss values and accuracy during training for the classification of indoor functional areas.

For the Scene15 scene dataset, the convergence of the loss value occurs after 50 epochs and stabilizes around 0.3. The accuracy of the model for the validation set converges to 80% after 30 epochs, increases after 125 epochs, and eventually stabilizes at around 88%.

In the experiment of the functional tool classification, the 17 household tools in the UMD part affordance dataset are divided into a training set, validation set, and test set at a ratio of 8:1:1. The hyperparameters for model training include a batch dataset of 256 images, 200 training epochs, and an initial learning rate is 0.001. Figure 13a,b show the loss values of the training set and the accuracy of the validation set during training.

Figure 13. Loss values and accuracy.

Breakpoint training is used to prevent falling into a local optimal solution during training. The loss value of the training set converges after 125 epochs and stabilizes below 0.5. The accuracy rate of TOP1 in the validation set is about 96%, and that of TOP5 is close to 100%.

The proposed model is compared with other advanced indoor scene recognition methods to demonstrate its performance. The classification accuracy and inference speed are used as evaluation metrics. The results indicate that the proposed method has a higher inference speed of 63.11ms compared to the other advanced indoor scene recognition methods, as shown in Table 2.

Table 2. Performance comparison of different methods.

The Table 3 results indicate an excellent performance for predicting the tool type, with accuracy values close to 100%. Classical deep networks such as ResNet have many network parameters, a low inference speed, and a high computational complexity. In contrast, the proposed method has a high inference speed due to the low number of parameters. The inference time is only about 1/4 that of ResNet, and its accuracy is significantly higher than that of the other models.

Table 3. Prediction accuracy for various tools.

Note that further expanding the dataset including more diverse and challenging indoor environments would be beneficial to the model, helping it to remain adaptable to a wider range of real-world situations and have a high scalability in various home settings.

6. Conclusions

The proposal established a domain ontology model of the home environment in order to standardize the data information to enable home service robots to acquire information on the home environment. A decision-making template was developed using a knowledge graph to acquire missing information and perform task reasoning. Knowledge reasoning occurred at two levels (attribute and instance) to ensure the stable and effective use of the knowledge graph.

The proposed transfer learning method exhibited effective feature extraction using different datasets. The use of lightweight neural networks optimized the model’s inference speed. Therefore, the model exhibited faster inference speed and higher prediction accuracy than classical deep neural network and machine learning methods. By incorporating transfer learning strategies, our model retains efficiency while being fine-tuned for specific indoor environments, significantly improving classification accuracy and real-time response capabilities. This approach demonstrates strong advantages in addressing issues related to data scarcity in service robots and the need for rapid responses, especially in dynamic indoor environments where quick task execution is essential.

This study also has some limitations. For instance, variations in lighting and occlusions are common in household environments, posing challenges to the robot’s recognition tasks. It is necessary to further enhance robustness to these variations through data augmentation techniques and real-time adaptive strategies. Additionally, the boundaries of functional areas in homes are often ambiguous, making features prone to confusion. In future work, multi-task learning can be introduced to integrate the features of functional areas and objects, thereby improving recognition performance in complex scenarios. It should be mentioned that, in future work, we will also focus on metrics such as memory usage and scalability in various household environments to enhance the overall performance of the designed model.

Author Contributions

Conceptualization, Y.W.; methodology, R.Y.; data curation, P.W.; writing—original draft preparation, R.Y.; writing—review and editing, K.Z.; funding acquisition, W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (62276028), the Major Research Plan of the National Natural Science Foundation of China (92267110), and the Young Backbone Teacher Support Plan of Beijing Information Science &Technology University (YBT202414).

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wu, Q.; Liu, Y.J.; Wu, C.S. An overview of current situations of robot industry development. ITM Web Conf. 2018, 17, 03019. [Google Scholar] [CrossRef]
Guo, B.; Dai, H.; Li, Z. A visual-attention-based 3D mapping method for mobile robots. Acta Autom. Sin. 2017, 43, 1248–1256. [Google Scholar]
Ye, C.; Yang, Y.; Mao, R.; Fermüller, C.; Aloimonos, Y. What can i do around here? deep functional scene understanding for cognitive robots. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; IEEE: New York, NY, USA, 2017; pp. 4604–4611. [Google Scholar]
Liu, S.; Tian, G. An indoor scene classification method for service robot Based on CNN feature. J. Robot. 2019, 2019, 8591035. [Google Scholar] [CrossRef]
Srivastava, S.; Li, C.; Lingelbach, M.; Martín-Martín, R.; Xia, F.; Vainio, K.E.; Lian, Z.; Gokmen, C.; Buch, S.; Liu, K. Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments. In Proceedings of the Conference on Robot Learning, London, UK, 8–11 November 2021; pp. 477–490. [Google Scholar]
Wu, P.; Fu, W.; Kong, L. A Fast Algorithm for Affordance Detection of Household Tool Parts Based on Structured Random Forest. Acta Opt. Sin. 2017, 37, 0215001. [Google Scholar]
Porzi, L.; Bulo, S.R.; Penate-Sanchez, A.; Ricci, E.; Moreno-Noguer, F. Learning depth-aware deep representations for robotic perception. IEEE Rob. Autom. Lett. 2016, 2, 468–475. [Google Scholar] [CrossRef]
Park, W.; Han, M.; Son, J.-W.; Kim, S.-J. Design of scene knowledge base system based on domain ontology. In Proceedings of the 2017 19th International Conference on Advanced Communication Technology (ICACT), Pyeongchang, Republic of Korea, 19–22 February 2017; pp. 560–562. [Google Scholar]
Hao, Q.; Li, Y.; Wang, L.M.; Wang, M. An ontology-based data organization method. In Proceedings of the 2017 Fifth International Conference on Advanced Cloud and Big Data (CBD), Shanghai, China, 13–16 August 2017; pp. 135–140. [Google Scholar]
Peng, Z.; Song, H.; Zheng, X.; Yi, L. Construction of hierarchical knowledge graph based on deep learning. In Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 27–29 June 2020; pp. 302–308. [Google Scholar]
Buchgeher, G.; Gabauer, D.; Martinez-Gil, J.; Ehrlinger, L. Knowledge graphs in manufacturing and production: A systematic literature review. IEEE Access 2021, 9, 55537–55554. [Google Scholar] [CrossRef]
Yang, Y.; Wu, Z.; Zhu, X. Semi-automatic metadata annotation of web of things with knowledge base. In Proceedings of the 2016 IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC), Beijing, China, 23–25 September 2016; pp. 124–129. [Google Scholar]
Tenorth, M.; Beetz, M. Representations for robot knowledge in the KnowRob framework. Artif. Intell. 2017, 247, 151–169. [Google Scholar] [CrossRef]
Lu, F.; Tian, G.; Li, Q. Autonomous cognition and planning of robot service based on ontology in intelligent space environment. Robot 2017, 39, 423–430. [Google Scholar]
Lu, F.; Jiang, Y.; Tian, G. Autonomous cognition and personalized selection of robot services based on emotion-space-time information. Robot 2018, 40, 448–456. [Google Scholar]
Li, C.-c.; Tian, G.-h.; Zhang, M.-y.; Zhang, Y. Ontology-based humanoid cognition and reasoning of object attributes. J. Zhejiang Univ. Sci. A 2018, 52, 1231–1238. [Google Scholar]
Rafferty, J.; Nugent, C.D.; Liu, J.; Chen, L. From activity recognition to intention recognition for assisted living within smart homes. IEEE Trans. Hum. Mach. Syst. 2017, 47, 368–379. [Google Scholar] [CrossRef]
Zhu, X.; Li, Z.; Wang, X.; Jiang, X.; Sun, P.; Wang, X.; Xiao, Y.; Yuan, N.J. Multi-modal knowledge graph construction and application: A survey. IEEE Trans. Knowl. Data Eng. 2022, 36, 715–735. [Google Scholar] [CrossRef]
Lin, J.; Zhao, Y.; Huang, W.; Liu, C.; Pu, H. Domain knowledge graph-based research progress of knowledge representation. Neural Comput. Appl. 2021, 33, 681–690. [Google Scholar] [CrossRef]
Zhou, L.; Cen, J.; Wang, X.; Sun, Z.; Lam, T.L.; Xu, Y. Borm: Bayesian object relation model for indoor scene recognition. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 39–46. [Google Scholar]
Pereira, R.; Garrote, L.; Barros, T.; Lopes, A.; Nunes, U.J. A deep learning-based indoor scene classification approach enhanced with inter-object distance semantic features. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 32–38. [Google Scholar]
Zhang, Z.; Yang, Z.; Ma, C.; Luo, L.; Huth, A.; Vouga, E.; Huang, Q. Deep generative modeling for scene synthesis via hybrid representations. ACM Trans. Graph. (TOG) 2020, 39, 3381866. [Google Scholar] [CrossRef]
Huang, S.; Usvyatsov, M.; Schindler, K. Indoor scene recognition in 3D. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 8041–8048. [Google Scholar]
Chen, J.; Ge, X.; Li, W.; Peng, L. Construction of spatiotemporal knowledge graph for emergency decision making. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 3920–3923. [Google Scholar]
López-Cifuentes, A.; Escudero-Vinolo, M.; Bescós, J.; García-Martín, Á. Semantic-aware scene recognition. Pattern Recognit. 2020, 102, 107256. [Google Scholar] [CrossRef]
Miao, B.; Zhou, L.; Mian, A.S.; Lam, T.L.; Xu, Y. Object-to-scene: Learning to transfer object knowledge to indoor scene recognition. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 2069–2075. [Google Scholar]
Brock, A.; De, S.; Smith, S.L.; Simonyan, K. High-performance large-scale image recognition without normalization. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 1059–1071. [Google Scholar]
Srinivas, A.; Lin, T.-Y.; Parmar, N.; Shlens, J.; Abbeel, P.; Vaswani, A. Bottleneck transformers for visual recognition. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 16519–16529. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
Radosavovic, I.; Kosaraju, R.P.; Girshick, R.; He, K.; Dollár, P. Designing network design spaces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10428–10436. [Google Scholar]
Zhang, H.; Wu, C.; Zhang, Z.; Zhu, Y.; Lin, H.; Zhang, Z.; Sun, Y.; He, T.; Mueller, J.; Manmatha, R. Resnest: Split-attention networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2736–2746. [Google Scholar]
Iandola, F.N. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
Zoph, B. Neural architecture search with reinforcement learning. arXiv 2016, arXiv:1611.01578. [Google Scholar]
Li, W.; Wang, Z.; Wang, Y.; Wu, J.; Wang, J.; Jia, Y.; Gui, G. Classification of high-spatial-resolution remote sensing scenes method using transfer learning and deep convolutional neural network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 1986–1995. [Google Scholar] [CrossRef]
Shabbir, A.; Ali, N.; Ahmed, J.; Zafar, B.; Rasheed, A.; Sajid, M.; Ahmed, A.; Dar, S.H. Satellite and scene image classification based on transfer learning and fine tuning of ResNet50. Math. Prob. Eng. 2021, 2021, 5843816. [Google Scholar] [CrossRef]
Wang, Q.; Li, P.; Zhang, L.; Zuo, W. Towards effective codebookless model for image classification. Pattern Recognit. 2016, 59, 63–71. [Google Scholar] [CrossRef]

Figure 1. Home service robot service system.

Figure 2. Domain ontology of the home environment.

Figure 3. Template for developing the service inference SWRL rule base.

Figure 4. The acquisition mechanism of missing attributes of objects.

Figure 5. The acquisition mechanism of missing attributes category, physical, and visual attributes of objects.

Figure 6. Network structure of MobileNetV3.

Figure 7. The proposed transfer learning strategy.

Figure 8. The structure of the semantic cognitive framework.

Figure 9. Examples of the CIFAR-100 dataset.

Figure 10. Examples of the Scene15 dataset.

Figure 11. Examples of the UMD part affordance dataset.

Figure 12. Loss values and accuracy during training for the classification of indoor functional areas.

Figure 13. Loss values and accuracy.

Table 1. Specific parameters of MobileNetV3 (large). ✓ indicates whether the SE module is used.

Input	Operator	Exp Size	#Out	SE	NL	S
$224^{2} \times 3$	conv2d	-	16	-	HS	2
$112^{2} \times 16$	bneck, 3 × 3	16	16	-	RE	1
$112^{2} \times 16$	bneck, 3 × 3	64	24	-	RE	2
$56^{2} \times 24$	bneck, 3 × 3	72	24	-	RE	1
$56^{2} \times 24$	bneck, 5 × 5	72	40	✓	RE	2
$28^{2} \times 40$	bneck, 5 × 5	120	40	✓	RE	1
$28^{2} \times 40$	bneck, 5 × 5	120	40	✓	RE	1
$28^{2} \times 40$	bneck, 3 × 3	240	80	-	HS	2
$14^{2} \times 80$	bneck, 3 × 3	200	80	-	HS	1
$14^{2} \times 80$	bneck, 3 × 3	184	80	-	HS	1
$14^{2} \times 80$	bneck, 3 × 3	184	80	-	HS	1
$14^{2} \times 80$	bneck, 3 × 3	480	112	✓	HS	1
$14^{2} \times 112$	bneck, 3 × 3	672	112	✓	HS	1
$14^{2} \times 112$	bneck, 5 × 5	672	160	✓	HS	2
$7^{2} \times 160$	bneck, 5 × 5	960	160	✓	HS	1
$7^{2} \times 160$	bneck, 5 × 5	960	160	✓	HS	1
$7^{2} \times 160$	conv2d, 1 × 1	-	960	-	HS	1
$7^{2} \times 960$	pool, 7 × 7	-	-	-	-	1
$1^{2} \times 960$	conv2d, 1 × 1, NBN	-	1280	-	HS	1
$1^{2} \times 1280$	conv2d, 1 × 1, NBN	-	k	-	-	1

Table 2. Performance comparison of different methods.

Methods	Accuracy for Different Scenes					Average Accuracy	Recognition Speed (ms/Sheet)
Methods	Bedroom	Kitchen	Living Room	Office	Store	Average Accuracy	Recognition Speed (ms/Sheet)
Work [4]	65.00%	60.00%	70.00%	65.00%	95.00%	71.00%	360
Work [38]	95.00%	75.00%	98.33%	73.33%	93.33%	87.00%	510
CLM+SVM	95.00%	80.00%	95.00%	86.67%	98.33%	91.04%	360
ResNet	82.29%	59.89%	76.64%	86.98%	80.46%	77.25%	221.75
MobileNetV3	72.55%	91.01%	88.30%	71.69%	62.05%	77.12%	50.23
Pre-trained MobileNetV3 (CIFAR-100)	94.26%	76.54%	93.83%	87.00%	89.14%	88.15%	63.11

Table 3. Prediction accuracy for various tools.

Category	Accuracy	Category	Accuracy	Category	Accuracy	Category	Accuracy
Trowel	0.969	Tenderizer	0.866	Ladle	0.899	Spoon	0.772
Scoop	0.993	Shovel	0.996	Bowl	0.961	Knife	0.928
Shears	0.938	Saw	1.000	Pot	0.753
Mug	0.981	Hammer	0.904	Mallet	0.699
Turner	0.987	Cup	0.965	Scissors	0.981

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Robotics Classification of Domain Knowledge Based on a Knowledge Graph for Home Service Robot Applications

Abstract

1. Introduction

2. Related Work

3. Using a Knowledge Graph for a Home Service Robot

3.1. Home Service Robot Service System

3.2. Construction of Concept Layer of the Home Service Robot Knowledge Graph

3.3. Home Service Reasoning Based on SWRL Rules

3.4. Object Knowledge Reasoning Based on SWRL Rules

4. Classification of Domain Knowledge

4.1. Model Selection and Usage Strategies

4.2. The Semantic Cognitive Framework

5. Experiments with Models and Frameworks

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics