Article

A Semantic Classification Approach for Indoor Robot Navigation

1 Department of Computer Science, Faculty of Computers & Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia
2 Industrial Innovation & Robotics Center (IIRC), University of Tabuk, Tabuk 71491, Saudi Arabia
* Author to whom correspondence should be addressed.
Electronics 2022, 11(13), 2063; https://doi.org/10.3390/electronics11132063
Submission received: 1 May 2022 / Revised: 23 June 2022 / Accepted: 29 June 2022 / Published: 30 June 2022
(This article belongs to the Special Issue Machine Learning: Advances in Models and Applications)

Abstract

Autonomous robot navigation has become a crucial concept in industrial development for minimizing manual tasks. Most existing robot navigation systems are based on the perceived geometrical features of the environment, with the employment of sensory devices including laser scanners, video cameras, and microwave radars to build the environment structure. However, scene understanding remains a significant issue in the development of robots that can be controlled autonomously. A semantic model of the indoor environment offers the robot a representation closer to human perception, and this enhances navigation tasks and human–robot interaction. In this paper, we propose a low-cost and low-memory framework that offers an improved representation of the environment using semantic information based on LiDAR sensory data. The output of the proposed work is a reliable classification system for indoor environments, achieving a classification accuracy of 97.21% on the collected dataset.

1. Introduction

Autonomous mobile robots offer solutions to diverse fields, such as transportation, defense, industry, and education. Mobile robots can perform other tasks as well, including monitoring, material handling, rescue operations, and disaster relief. Autonomous robot navigation is also a key concept in industrial development for minimizing manual work. However, in most applications, the autonomous robot is required to work in an unknown environment, which may contain several obstacles en route to the desired destination.
Robot navigation involves the robot’s ability to determine its own location, direction, and path to the destination. For most of the navigation tasks, mobile robots are required to read the environment based on sensor readings and prior knowledge. The formulation of this representation and the creation of a map, called Simultaneous Localization and Mapping (SLAM), was first addressed in [1,2].
In general, robots are expected to operate autonomously in dynamic and complex environments. Understanding the environment's characteristics is a significant task that allows the robot to move autonomously and make suitable decisions accordingly. Recently, the mobile robotics field has started to incorporate semantic information into navigation tasks, leading to a new concept called semantic navigation [3]. This kind of navigation brings robots closer to the human way of modeling and understanding the navigation environment, representing it in a human-friendly way.
Semantic knowledge has created new paths in robot navigation, allowing a higher level of abstraction in the representation of navigation information. In robot navigation, semantic maps offer a representation of the environment considering elements with high levels of abstraction. Most of the recently developed semantic approaches divide the indoor environments into three categories [4]: rooms, corridors, and doorways, as these are the three most representative semantic labels on the 2D map. According to [5], semantic navigation offers significant benefits to the area of mobile robot navigations, as follows:
  • Human-friendly models: The mobile robot models the environment in a way that humans understand.
  • Autonomy: The mobile robot can decide, on its own, how to travel to the designated location.
  • Efficiency: Calculating the route to the destination does not require the robot to explore the entire environment. Instead, it focuses on specific areas for partial exploration.
  • Robustness: The mobile robot can recover missing information.
Understanding the environment is an essential task for performing high-level navigation; therefore, this paper focuses on semantic information for mobile robot navigation. We developed a semantic classification system that can distinguish between four different environments (rooms, doors, halls, and corridors) based on LiDAR data frames, and it can be employed in indoor navigation approaches. The developed semantic classification system consists of two main phases: offline and online. In the former phase, the dataset is collected from four different environments, labeled, processed, and trained using Machine Learning (ML) models, whereas in the latter phase, the testing process takes place to assess the efficiency of the developed classification system. Thus, the main contributions of this paper are as follows:
  • Research on recently developed semantic-based navigation systems for indoor environments.
  • Building a semantic dataset that consists of indoor LiDAR frames collected from a low-cost RPLiDAR A1 sensor.
  • Adopting a preprocessing method to maintain the collected LiDAR data frames.
  • Investigating the accuracy of semantic classification using several ML models.
The rest of this paper is organized as follows: Section 2 discusses recently developed semantic navigation systems employed in mobile robotics, whereas Section 3 presents the developed semantic classification model. Section 4 discusses the experimental testbed, including the robot system and the environment structures. Section 5 presents the evaluation of the proposed system, Section 6 discusses the obtained results, and finally, Section 7 concludes the work presented in this paper and suggests future work.

2. Related Works

In general, robot navigation systems can be categorized into three groups: geometric-based navigation, semantic-based navigation, and hybrid (geometric and semantic) navigation. Geometric-based navigation uses geometric features or grids to describe the geometric layout of the environment, whereas semantic-based navigation involves a higher level of abstraction, with vertices corresponding to places in the environment and edges to the connections between them. The hybrid category employs geometric information along with semantic knowledge to represent the environment, for instance by employing range-finder sensors for the metric representation while using a digital camera to obtain semantic information about the environment.
Although LiDAR frames are simple and easy to obtain, they can be employed to perform semantic classification for several environments, and LiDAR sensors have been deployed in various classification projects. For instance, the authors of [6] proposed a new approach for road obstacle classification using two different LiDAR sensors. Moreover, the works presented in [7,8] developed classification systems for forest environment characteristics, whereas in [9], the authors used a single LiDAR sensor to maintain the continuous identification of a person in a complex environment. The authors of [10] developed a classification system using a LiDAR sensor, which was able to classify three different types of buildings: single-family houses, multiple-family houses, and non-residential buildings. The obtained classification accuracy was >70%.
The work presented in [11] developed a method for pointwise semantic classification for the 3D LiDAR data into three categories: non-movable, movable, and dynamic objects. In [12], the authors proposed a 3D point cloud semantic classification approach based on spherical neighborhoods and proportional subsampling. The authors showed that the performance of their proposed algorithm was consistent on three different datasets acquired using different technologies in various environments. The work presented in [13] utilized a large-scale dataset named SemanticKITTI, which showed unprecedented scale in the pointwise annotation of point cloud frames.
Recently, the rapid development of deep learning in image classification has led to significant improvements in the accuracy of classifying objects in indoor environments. Several powerful deep network architectures have been proposed recently, including GoogleNet [14,15], VGGNet [16], MobileNet [17], and ResNet-18 [18], to solve the problem of image classification. For instance, the authors of [19] proposed an object semantic grid mapping system using 2D LiDAR and an RGB-D camera. The LiDAR sensor is used to generate a grid map and obtain the robot's trajectory, whereas the RGB-D camera is employed to obtain the semantics of color images and to apply joint interpolation to refine camera poses. The authors employed the Robot@Home dataset to assess the system's efficiency and used the R-CNN model to detect static objects such as beds, sinks, microwaves, toilets, and ovens.
In [20], the authors proposed a framework to build an enhanced metric representation of indoor environments. A deep neural network model was employed for object detection and semantic classification in a visual-based perception pattern. The output of the developed system was a 2D map of the environment extended with semantic object classes and their locations. This system collected the required data from several sensors including LiDAR, an RGB-D camera, and odometers. In addition, the authors employed CNN-based object detection and a 3D model-based segmentation technique to localize and identify different classes of objects in the scene.
The work presented in [21] included the development of an intelligent mobile robot system which was able to understand the semantics of human environments and the relationships with and between humans in the area of interest. The obtained map offered a semi-structured human environment that provided a valuable representation for robot navigation tasks. Moreover, the obtained map consisted of high-level features, such as planar surfaces and door signs that involve text and objects.
The authors of [22] proposed a self-supervised learning approach for the semantic segmentation of LiDAR frames. Through this study, the authors revealed that it was possible to learn semantic classes without human annotation and then employ them to enhance the navigation process. The work presented in [23] developed a CNN model for classifying objects and indoor environments including rooms, corridors, kitchens, and offices using visual semantic representations of the data received from the vision sensor.
In [24], the authors proposed a probabilistic approach integrating heterogenous, uncertain information such as the size, shape, and color of objects through combining the data from multi-modal sensors. This system employed vision and LiDAR sensors and combined the received data to build a map. The work presented in [25] used a semantic relational model that involved both the conceptual and physical representation of places and objects in indoor environments. The system was developed using a Turtle robot with onboard vision and LiDAR sensors.
In [26], the authors employed image sensors for path planning based on the movable area extraction from input images using semantic segmentation. The experimental results proved that the ICNet could extract the moveable area with an accuracy of 90%.
The authors of [27] proposed a low-cost, vision-based perception system to classify objects in the indoor environment, converting the problem of robot navigation into scene understanding. The authors designed a shallow convolutional neural network with efficient scene classification accuracy to process and analyze images captured by a monocular camera.
The work presented in [28] analyzed the relationship between the performance of image prediction and the robot’s behavior, which was controlled by an image-based navigation system. In addition, the authors discussed the effectiveness of directing the camera into the ceiling to adapt to dynamic changes in the environment.
As stated above, most of the recently developed systems are based on the integration of visual and LiDAR representations of the environment. Table 1 presents a comparison between the research works discussed in this section according to the type of employed sensors, the number of classification environments and objects, and the obtained accuracy.

3. Semantic Classification System

In this work, we developed a semantic classification model for robot navigation that can recognize the navigation area using a low-cost, LiDAR-based classification system. The developed system can differentiate between four different environments: rooms, corridors, doorways, and halls.
The developed semantic classification system consists of two main phases: offline and online. In the offline phase, the LiDAR data frames are collected, processed, stored in a database file, and used as a training dataset for ML models, whereas in the online phase, the robot collects the LiDAR data, processes it, and then classifies the environment type (room, doorway, corridor, or hall) according to the pretrained model in the offline phase. Figure 1 shows the concept of the proposed semantic classification model (offline phase), which consists of five main stages, as follows:
  • The collection of LiDAR frames: The mobile robot scans the environment using the RPLiDAR A1 and collects 360 range samples (one per degree) for each scanned environment. However, in most cases, the scanned LiDAR data contain missing and infinity values, which result from walls/objects that are either very far or very close, and this may reduce the accuracy of the classification model.
  • Process the collected data: Since the scanned LiDAR data consist of missing and infinity values, it is important to recover the missing values and process the infinity values to guarantee high classification accuracy. Algorithm 1 presents the recovery function that has been employed to process the missing data, which runs on a Raspberry Pi computer. The missing data are processed according to the LiDAR data frames that exist before and after the missing data frame(s), where in certain cases, the missing data frames are either averaged or replicated according to the position and the quantity of the missing data frames.
  • Label the processed data: The developed semantic classification system can distinguish between four different environments: doorways, rooms, corridors, and halls. Therefore, each environment has been assigned a unique identification number (label) as presented later (for instance, room: 0, corridor: 1, doorway: 2, and hall: 3).
  • Store the labeled data: The processed LiDAR frames are stored in a csv file to make them available for the training phase. Figure 2 presents the structure of the processed LiDAR frames received from the LiDAR sensor after the processing stage.
  • Train the model: This phase involves employing several ML models to train and test the efficiency of the developed semantic classification system using various metrics.
Algorithm 1: Preprocessing phase of the sensed LiDAR data
Input: An array of 360 LiDAR range readings (one scan), possibly containing 'inf'/missing values
Output: The processed array of 360 readings, with 'inf'/missing values replaced
1: let frames[] be the 1D array of LiDAR readings
2: let cols be the total number of readings in frames[]
3: let count = 0 be a loop counter
4: let max be the maximum value in the neighborhood of an 'inf' run
5: let loc be the index of the first 'inf' value in the current run
6: while (count < cols)
7:     if (frames[count] == 'inf')
8:         loc = count
9:         if (count == 0)
10:            max = 0
11:        else: max = frames[count - 1]
12:        while (count < cols and frames[count] == 'inf')
13:            count++
14:        if (count < cols and max < frames[count])
15:            max = frames[count]
16:        while (count > loc)
17:            frames[count - 1] = max
18:            count--
19:    count += 1
20: end
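For reference, the following is a minimal Python sketch of the recovery step described by Algorithm 1. It assumes each scan arrives as a NumPy array of 360 range values in which missing or out-of-range returns appear as inf/NaN; the function and variable names are illustrative, not taken from the released code.
```python
import numpy as np

def recover_inf_values(frame: np.ndarray) -> np.ndarray:
    """Replace each run of inf/NaN readings in a 360-value LiDAR scan with the
    maximum of the finite readings immediately before and after the run
    (a sketch of Algorithm 1; names and types are assumptions)."""
    processed = frame.astype(float).copy()
    invalid = ~np.isfinite(processed)
    count, n = 0, processed.size
    while count < n:
        if invalid[count]:
            loc = count                                   # start of the invalid run
            before = processed[loc - 1] if loc > 0 else 0.0
            while count < n and invalid[count]:
                count += 1                                # skip over the run
            after = processed[count] if count < n else 0.0
            processed[loc:count] = max(before, after)     # replicate the larger neighbor
        else:
            count += 1
    return processed

# Example: a short out-of-range run between two valid readings
print(recover_inf_values(np.array([2.1, 2.3, np.inf, np.inf, 2.8, 3.0])))
# -> [2.1 2.3 2.8 2.8 2.8 3.0]
```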
The online phase involves collecting LiDAR frames from the LiDAR sensor, processing the collected data, and then classifying the environment type (where the robot is located) according to the pretrained model in the offline phase. This proceeds as follows:
  • Scan the area of interest: The LiDAR frames from the area where the mobile robot is placed are scanned.
  • Process the collected data: The collected data contain missing and infinity values; therefore, Algorithm 1 is employed to recover these values.
  • Classify the environment type: The processed LiDAR frames are passed to the pretrained model to classify the environment type, as sketched below.
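A minimal sketch of this online loop is shown below, assuming a scikit-learn style model saved with joblib during the offline phase, the recover_inf_values() helper sketched above, and a placeholder read_lidar_scan() function standing in for the actual RPLiDAR driver call; file names and function names are illustrative.
```python
import joblib
import numpy as np

CLASS_NAMES = {0: "room", 1: "corridor", 2: "doorway", 3: "hall"}

# Hypothetical model file produced in the offline phase.
model = joblib.load("svm_semantic_classifier.joblib")

def read_lidar_scan() -> np.ndarray:
    """Placeholder for the RPLiDAR A1 driver call returning 360 range values."""
    raise NotImplementedError

def classify_current_environment() -> str:
    scan = read_lidar_scan()                        # 1. scan the area of interest
    processed = recover_inf_values(scan)            # 2. recover inf/missing values (Algorithm 1)
    label = int(model.predict(processed.reshape(1, -1))[0])  # 3. classify the environment type
    return CLASS_NAMES[label]
```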

4. Experimental Testbed

This section discusses the experiment testbed including the mobile robot system, the labeling phase, and the collection procedure of the LiDAR frames from four different types of environments.

4.1. Mobile Robot System

This section presents the mobile robot system employed in our experiments. The employed mobile robot is a four-wheel drive robot equipped with a 2D laser scanner (RPLiDAR A1), a high-speed laser scanning and ranging unit developed by SLAMTEC. It can perform 360° scans within a 12 m range and take 8000 range samples per second. Figure 3 depicts the developed mobile robot system employed in our experiments. The mobile robot was able to move freely in the area of interest based on range-finder sensors.
The mobile robot system consists of a Raspberry Pi 4 computer with 4 GB of RAM, an array of sensors (gyro, motor encoders, and the RPLiDAR), a motor driver, and four-wheel drive motors. The mobile robot architecture is presented in Figure 4. Table 2 shows the full specifications of the employed RPLiDAR A1, whereas Table 3 presents the full specifications of the employed Raspberry Pi 4 computer system. The onboard Raspberry Pi 4 computer collects the LiDAR frames from the RPLiDAR sensor, preprocesses the received frames, and then employs an ML model to identify the environment class.
With regard to software requirements, the developed semantic classification system integrates several software packages, including Python 3.7.13 as the development environment, Pandas 1.3.5 for processing the LiDAR frames, and the NumPy 1.21.6 library for working with arrays.
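The paper does not list driver code; as an illustration of how one revolution of raw (angle, distance) measurements from the RPLiDAR A1 could be binned into the fixed 360-value frame used in the rest of the pipeline, a small helper might look as follows (the measurement format, the np.inf placeholder for empty bins, and the nearest-return rule are assumptions, not the authors' implementation):
```python
import numpy as np

def scan_to_frame(measurements) -> np.ndarray:
    """Convert one LiDAR revolution, given as (angle_deg, distance) pairs,
    into a fixed 360-value frame with one reading per degree.
    Bins with no return are left as np.inf for the preprocessing step."""
    frame = np.full(360, np.inf)
    for angle_deg, distance in measurements:
        idx = int(angle_deg) % 360
        frame[idx] = min(frame[idx], distance)  # keep the nearest return per bin (assumed rule)
    return frame
```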

4.2. Setting up the Semantic Information Dataset

Recognizing places and objects is a complicated task for robots; therefore, there are several methods for completing object identification tasks. As mentioned earlier, the proposed semantic classification system can distinguish between four different environments (rooms, doorways, corridors, and halls). First, the room dataset was collected from 43 different rooms, most of which were furnished and included cabinets, beds, chairs, and desks. The rooms measured approximately 3.5 × 3.5 m.
Second, the corridor dataset was collected from 51 different corridors located in different places (the Faculty of Computers & Information Technology building, the Industrial Innovation & Robotics Center, and other corridors located at the University of Tabuk). In the corridor dataset, the ratio of infinity values was around 5.89%, for two reasons: first, one or both sides of a corridor may be out of the sensor's range, and second, the RPLiDAR A1 may produce inaccurate readings in some situations.
Third, the doorway dataset was collected from approximately 63 different doors (in the open position). In most cases, the door dataset was collected from rooms, corridors, and halls with open doors. The infinity ratio in the doorway dataset was around 2.76%; as with the room dataset, most of the LiDAR frames are within the range of the RPLiDAR A1 sensor.
Fourth, the hall dataset was collected from different halls located at the University of Tabuk, including lecture halls, open-space halls, and large labs, for a total of 49 different halls. In the hall dataset, the ratio of infinity values is around 20.32%, since in some cases the LiDAR frames were out of range.
Figure 5 shows the total number of records for each class (environment); the amount of data collected from each environment is roughly equal across classes. As shown below, the established dataset with four classes is almost balanced, with 109, 100, 99, and 103 records for rooms, corridors, doorways, and halls, respectively. Table 4 presents general statistics about the number of readings collected from the different environments. Figure 6 shows an example of the hall, room, corridor, and door environments.
The labeling process has been completed manually by labeling each environment separately with a unique identifier (0: room, 1: corridor, 2: doorway, and 3: hall). The results were then saved in a database file.
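A short pandas sketch of this labeling and storage step is given below; the column layout and file name are assumptions, since the published dataset may use a different schema.
```python
import pandas as pd

LABELS = {"room": 0, "corridor": 1, "doorway": 2, "hall": 3}

def to_labeled_rows(frames, environment):
    """Attach the class label to each processed 360-value frame."""
    return [list(frame) + [LABELS[environment]] for frame in frames]

rows = []
# rows += to_labeled_rows(room_frames, "room")        # one call per collected environment
# rows += to_labeled_rows(corridor_frames, "corridor")
columns = [f"r{i}" for i in range(360)] + ["label"]   # 360 readings plus the class label
pd.DataFrame(rows, columns=columns).to_csv("lidar_frames.csv", index=False)
```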
One of the main limitations of the RPLiDAR A1 ranging sensor is that it may return infinity and missing values when scanning, which leads to inaccurate semantic information and thus an inefficient robot navigation system. Therefore, in this section, we present measurements of the "infinity" values received for each class. Figure 7 shows the total number of "infinity" values for each environment class. The ratio of infinity values differs from one environment to another; there is no noticeable variation among the corridor, doorway, and room classes, whereas a large difference appears between these first three classes and the fourth class (hall), because the hall areas were, in general, larger than the range of the RPLiDAR A1.
Figure 8 depicts the collected LiDAR data frames from the RPLiDAR A1 for the room environment, which consists of several ‘infinity’ values obtained from close or out of range objects. The processed LiDAR frames are presented in Figure 9, where the ‘inf’ values were processed with numerical values according to the preprocessing function presented in Algorithm 1.

4.3. Machine Learning Algorithms

This section discusses the employed ML models, including decision tree, CatBoost, random forest (RF), Light Gradient Boosting (LGB), Naïve Bayes (NB), and Support Vector Machine (SVM).
Recently, ML methods have played a significant role in the field of autonomous navigation, where ML techniques have provided robust and adaptive navigation performance while minimizing development and deployment time. In this work, we evaluated several ML models to train, test, and classify the environment type using the developed semantic classification approach, in order to find the best-performing one. The following ML models have been employed (a configuration sketch for all six models is given after the list):
  • Decision Tree: The decision tree model is a tree-like structure that supports decision making for a given process: starting from a root node, it branches according to the outcomes of successive attribute tests until a final decision is reached. We employed the decision tree because it is a powerful method for classification and prediction, and it can handle non-linear datasets effectively. In our experiments, the training parameter for the decision tree is the maximum depth (max_depth = 16).
  • CatBoost: The CatBoost classifier is based on gradient boosted decision trees. In the training phase, a set of decision trees is built consecutively. Each successive tree is built with reduced loss compared to the previous trees. Usually, CatBoost offers state-of-the-art results compared to existing ML models, and it can handle categorical features in an autonomous way. For the implemented CatBoost classifier, the training parameters are presented in Table 5.
  • Random Forest (RF): The random forest is a classification algorithm made up of several decision trees. Each individual tree is built in a random way to promote uncorrelated forests, which then employ the forest’s predictive power to offer an efficient decision. The random forest has been employed in our system, since it is useful when dealing with large datasets, and interpretability is not a major concern. The training parameters for the implemented RF classifier are presented in Table 6.
  • Light Gradient Boosting (LGB): This is a decision tree-based method. It is a fast, distributed, high-performance gradient boosting framework. In this work, we employed Light Gradient Boosting because it achieves high accuracy when the classifier is unstable and has a high variance. The training parameters for the implemented LGB classifier are presented in Table 7.
  • Naïve Bayes (NB): This classifier is based on Bayes' theorem. The Naïve Bayes classifier assumes strong (naïve) independence between the attributes of the data points. We employed Naïve Bayes because it predicts the probability of each class from the observed attributes in a simple and computationally inexpensive way.
  • Support Vector Machine (SVM): The SVM maps data to a high-dimensional feature space, so the data points can be categorized, even if the data are not linearly separable. SVM is effective in high-dimensional spaces, and it uses a subset of training points in the decision function. In addition, SVM is an efficient model in terms of memory requirements. The training parameters for the implemented Support Vector Machine classifier are presented in Table 8.
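As a configuration sketch for the six models above, assuming the scikit-learn, CatBoost, and LightGBM Python implementations (parameter names are mapped from the values in Table 5, Table 6, Table 7 and Table 8 and from the decision tree depth stated above; a Gaussian Naïve Bayes variant is assumed, and probability=True is enabled on the SVM only to allow the ROC analysis in Section 5.4):
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=16),
    "CatBoost": CatBoostClassifier(depth=8, learning_rate=0.05, iterations=65, verbose=0),
    "Random Forest": RandomForestClassifier(n_estimators=250, min_samples_leaf=80),
    "Light Gradient Boosting": LGBMClassifier(max_depth=5, num_leaves=100,
                                              min_child_samples=25, learning_rate=0.1,
                                              n_estimators=120),
    "Naive Bayes": GaussianNB(),
    "Support Vector Machine": SVC(gamma=1, kernel="rbf", C=0.15, probability=True),
}
```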

5. System Evaluation

This section details the training and testing phases of the classification process. The collected dataset has been divided into two subsets: training and testing, with 70% for training and 30% for testing (287 and 124 records for training and testing, respectively).
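A minimal sketch of this split-and-train procedure is shown below, reusing the CSV file and the models dictionary from the earlier sketches; the random seed is illustrative. With test_size=0.3, scikit-learn's train_test_split yields 287 training and 124 testing records out of the 411 collected frames, matching the figures above.
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = pd.read_csv("lidar_frames.csv")
X, y = data.drop(columns=["label"]), data["label"]

# 70/30 split of the 411 collected frames (287 training / 124 testing records)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: train={train_acc:.4f}, test={test_acc:.4f}")
```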
The proposed algorithms’ performance is tested and evaluated using the collected LiDAR frames dataset, which consists of 411 data frames distributed among four different class environments. For evaluation purposes, we assessed the performance of the ML model through different metrics, including:
  • Confusion matrix: This describes the complete performance of the environment classification model.
  • Accuracy: This refers to the percentage of the total number of environment classes that were correctly predicted. Accuracy defines how accurate the environment classification system is. The accuracy formula is presented below:
    $\text{Accuracy} = \dfrac{cp}{t}$
    where cp and t refer to the correct predictions and the total number of samples, respectively.
  • Precision: This refers to the ratio of correctly predicted positive observations over the total predicted positive observations. For multi-class classification, there are two different approaches to compute the precision:
    • Macro averaged precision: This calculates the precision for all classes individually and then averages the results.
    • Micro averaged precision: This calculates the class wise true positive and false positive and then uses them to calculate the overall precision. The precision formula is presented as follows:
      $\text{Precision} = \dfrac{TP}{TP + FP}$
      where TP and FP refer to the number of true positive cases and false positive cases, respectively.
  • Recall: This refers to the ratio of correctly predicted positive observations to all the observations in the actual class. In multi-class classification, the recall is estimated in two different ways, as follows:
    • Macro averaged recall: This calculates the recall for all the classes and then averages the results.
    • Micro averaged recall: This calculates the class wise true positive and false negative and then uses them to calculate the overall recall. The recall formula is presented as follows:
      $\text{Recall} = \dfrac{TP}{TP + FN}$
      where FN refers to the number of false negative cases.
  • F1-score: This refers to the weighted average of precision and recall. An F1-score is more useful than accuracy if the dataset is unbalanced. However, in our case, the environment classes are almost balanced. The F1-score can be calculated in two different ways:
    • Macro averaged F1-score: This calculates the F1-score for each class and then averages the results.
    • Micro averaged F1-score: This computes the micro-averaged precision and recall from the class-wise totals of true positives, false positives, and false negatives and then takes their harmonic mean. The F1-score formula is presented as follows:
      $\text{F1-score} = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
  • Receiver Operating Characteristic (ROC): This refers to the performance measurement for classification problems at different threshold settings. A computation sketch covering these metrics is given after this list.
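For completeness, the sketch below shows how these metrics can be computed for one trained model with scikit-learn, reusing the test split from the previous sketch. The per-class ROC-AUC computed from one-vs-rest probabilities is an assumption about how the per-class ROC values reported in Section 5.4 were obtained.
```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.preprocessing import label_binarize

clf = models["Support Vector Machine"]
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print("accuracy:", accuracy_score(y_test, y_pred))
for avg in ("macro", "micro"):
    print(avg, "precision:", precision_score(y_test, y_pred, average=avg),
          "recall:", recall_score(y_test, y_pred, average=avg),
          "f1:", f1_score(y_test, y_pred, average=avg))

# Per-class ROC-AUC (one-vs-rest), using the predicted class probabilities
y_score = clf.predict_proba(X_test)
y_bin = label_binarize(y_test, classes=[0, 1, 2, 3])
for cls, name in enumerate(["room", "corridor", "doorway", "hall"]):
    print(name, "ROC AUC:", roc_auc_score(y_bin[:, cls], y_score[:, cls]))
```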

5.1. Confusion Matrix

The confusion matrix for each ML model has been obtained to assess efficiency in the testing phase. For instance, Table 9 shows the confusion matrix for the NB classifier, where the classifier could classify the rooms, corridors, and halls with high classification accuracy; however, the classification accuracy for the doorway class was only reasonable.
On the other hand, Table 10 presents the confusion matrix for the RF classifier, which shows better classification accuracy than the NB classifier for both the corridor and doorway classes. In Table 11, the confusion matrix for the CatBoost classifier is presented, where efficient classification accuracy was obtained for the room, corridor, and hall classes, whereas the doorway class received low classification accuracy. Table 12 shows the confusion matrix for the decision tree classifier, which offers reasonable classification accuracy for the room, corridor, and hall classes, whereas the doorway class received the lowest classification accuracy. The confusion matrix for the LGB classifier is presented in Table 13, where the obtained classification accuracy was reasonable for all classes except the doorway class. Finally, Table 14 presents the confusion matrix for the SVM classifier. As noted, the SVM offers the best classification accuracy for almost all classes.

5.2. Classification Accuracy

The classification accuracy is a significant metric in this project, as the collected LiDAR dataset contains an almost equal number of samples for each environment (class). Therefore, the classification accuracy has been assessed for both the training and testing phases, as shown in Figure 10.
The training accuracy, along with several performance metrics for the employed ML models, is presented in Table 15. As noted, RF, CatBoost, decision tree, and LGB offer 100% training accuracy, whereas NB offers the minimum training accuracy with 90.97%, and SVM achieves 96.86% training accuracy.
We also evaluated the testing accuracy for all ML models. Table 16 presents several evaluation metrics for each ML model. In terms of accuracy, SVM achieves the best classification accuracy of 97.21% on the testing subset, whereas the decision tree offers the lowest accuracy, at 75.80%. In general, SVM offers high classification accuracy when there is a clear margin of separation between classes.

5.3. Precision, Recall, and F1-Score

In addition to the classification accuracy, the precision, recall, and F1-scores have all been assessed for each ML model in the training phase. As shown in Table 15, RF, CatBoost, decision tree, and LGB offer the best precision, recall, and F1-score results, with scores of almost 99.99%, whereas the NB classifier achieves the worst results, with an average score (precision, recall, and F1-score) of 93.29%. The SVM classifier offers efficient scores (precision, recall, and F1-score), with an average score of 98.89%.
The precision, recall, and F1-score indicators are also computed for the testing phase. In this case, precision represents the quality of the positive predictions made by the ML model. The SVM classifier offers the best precision among all the ML models, with an average precision of 97.24%, whereas the RF classifier comes in second with an average precision of 92.17%. Decision tree achieves the worst precision score among all the ML models, at 75.55%.
Unlike precision, recall provides a measure of how accurately the ML model was able to identify the relevant data. The SVM classifier offers the best recall score among the six ML models, at 97.48%. However, recall scores are not as significant as accuracy, since the established dataset is balanced. The F1-score was also computed; it summarizes the predictive performance of an ML model by combining the recall and precision scores. According to the results presented in Table 16, the SVM model achieves the best F1-score, at 97.34%.

5.4. Receiver Operating Characteristic (ROC)

This section discusses the ROC by analyzing the performance of the six classification models at different classification thresholds, as presented in Figure 11. As discussed earlier, SVM offers the best accuracy, precision, recall, and F1-scores. However, the ROC is a significant metric for assessing the performance of predicting each class separately. Table 17 presents the ROC evaluation for all ML models, where the SVM classifier offers the best classification accuracy for the room, doorway, and hall environments, at 100.0%, 96.50%, and 100.0%, respectively. SVM, on the other hand, offers reasonable identification accuracy for the corridor class, at 96.91%. The RF and CatBoost classifiers offer the best classification accuracy for the corridor class, with an accuracy of 99.50% for both models. Moreover, CatBoost also achieves 100.0% for the hall class. Therefore, CatBoost comes in second place after SVM.

6. Discussion

The importance of this research lies in studying the benefits of the information collected using the RPLiDAR A1 sensor to better grasp the semantic information of the environment traversed by a navigation robot.
In this research, we developed a system based on ML algorithms that can distinguish among four different areas surrounding the robot (rooms, corridors, doorways, and halls) using a range-finder sensor. We were able to achieve a high accuracy of 97.21% utilizing the SVM model after testing a variety of machine learning techniques. Our goal, as stated in the introduction, was to provide a low-power, high-performance navigation system that can run on a Raspberry Pi 4 computer, and this is what we achieved.
In comparison, the semantic vision-based systems developed in [19,20,21,22,23,24,25,26] employ deep learning on camera data to build semantic navigation maps for robot systems, classifying on average 4–8 environment/object classes with a reasonable classification accuracy (on average around 83%). However, these architectures require powerful GPUs, large memories, and huge datasets, which makes such solutions difficult to deploy on low-power devices.
In fact, we were unable to directly compare the results obtained in this study with the results of other researchers' work in this field due to a number of factors, the most important of which are the lack of a common dataset against which results can be compared and differences in research structure. Moreover, this research targets low-power, high-performance navigation systems, for which comparable solutions do not currently exist in the field.
Furthermore, the results obtained in this paper, which show that the SVM algorithm is superior to the other employed algorithms, are due to several factors, the most important of which is the nature of the data used: the classes are fairly well separated from each other, making it easier for the algorithm to classify them with high accuracy.

7. Conclusions

In this research, we effectively improved the ability of a robot to achieve a broader understanding of its surrounding environment by distinguishing four areas in which the robot may be located: rooms, corridors, doorways, and halls. The developed system was built on a low-power, high-performance navigation approach to label the 2D LiDAR maps for better and more effective semantic robot navigation. Although many ML algorithms were applied to the data collected from the RPLiDAR A1 sensor, we found that the support vector machine algorithm achieved the highest accuracy in the testing phase.
Further semantic information—through adopting additional range-finder sensors, building a large enough indoor dataset with more environment classes, and investigating more ML algorithms—will be important directions in future work.

Author Contributions

Z.A., E.A., M.A. (Mohammad Alqasir), and M.A. (Majed Alruwaili) were involved in the experimental setup, implementation, and the collection of the LiDAR datasets. O.M.A. contributed in the discussion and the experimental results parts. Finally, T.A. contributed in the implementation and testing including developing several Machine Learning models for semantic robot navigation. O.M.A. and T.A. prepared the complete manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The source code of the developed semantic classification system along with the gathered LiDAR data frames dataset are currently available to the public at https://www.kaggle.com/datasets/tareqalhmiedat/lidardataframes (accessed on 1 April 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Smith, R.C.; Cheeseman, P. On the representation and estimation of spatial uncertainty. Int. J. Robot. Res. 1986, 5, 56–68. [Google Scholar] [CrossRef]
  2. Leonard, J.J.; Durrant-Whyte, H.F. Simultaneous map building and localization for an autonomous mobile robot. IROS 1991, 3, 1442–1447. [Google Scholar]
  3. Borkowski, A.; Siemiatkowska, B.; Szklarski, J. Towards semantic navigation in mobile robotics. In Graph Transformations and Model-Driven Engineering; Springer: Berlin/Heidelberg, Germany, 2010; pp. 719–748. [Google Scholar]
  4. Mozos, Ó.M. Semantic Labeling of Places with Mobile Robots; Springer: Berlin/Heidelberg, Germany, 2010; Volume 61. [Google Scholar]
  5. Crespo, J.; Castillo, J.C.; Mozos, O.M.; Barber, R. Semantic information for robot navigation: A survey. Appl. Sci. 2020, 10, 497. [Google Scholar] [CrossRef] [Green Version]
  6. García, F.; Jiménez, F.; Naranjo, J.E.; Zato, J.G.; Aparicio, F.; Armingol, J.M.; de la Escalera, A. Environment perception based on LIDAR sensors for real road applications. Robotica 2012, 30, 185–193. [Google Scholar] [CrossRef] [Green Version]
  7. Hopkinson, C.; Chasmer, L.; Gynan, C.; Mahoney, C.; Sitar, M. Multisensor and multispectral Lidar characterization. Can. J. Remote Sens. 2016, 42, 501–520. [Google Scholar] [CrossRef]
  8. McDaniel, M.W.; Nishihata, T.; Brooks, C.A.; Iagnemma, K. Ground plane identification using LIDAR in forested environments. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–8 May 2010; pp. 3831–3836. [Google Scholar]
  9. Álvarez-Aparicio, C.; Guerrero-Higueras, Á.M.; Rodríguez-Lera, F.J.; Ginés Clavero, J.; Martín Rico, F.; Matellán, V. People detection and tracking using LIDAR sensors. Robotics 2019, 8, 75. [Google Scholar] [CrossRef] [Green Version]
  10. Lu, Z.; Im, J.; Rhee, J.; Hodgson, M. Building type classification using spatial and landscape attributes derived from Lidar remote sensing data. Landsc. Urban Plan. 2014, 130, 134–148. [Google Scholar] [CrossRef]
  11. Dewan, A.; Oliveira, G.L.; Burgard, W. Deep semantic classification for 3d lidar data. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 3544–3549. [Google Scholar]
  12. Thomas, H.; Goulette, F.; Deschaud, J.E.; Marcotegui, B.; LeGall, Y. Semantic classification of 3D point clouds with multiscale spherical neighborhoods. In Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 390–398. [Google Scholar]
  13. Behley, J.; Garbade, M.; Milioto, A.; Quenzel, J.; Behnke, S.; Stachniss, C.; Gall, J. Semantickitti: A dataset for semantic scene understanding of lidar sequences. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9297–9307. [Google Scholar]
  14. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012, 25. Available online: https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html (accessed on 28 June 2022). [CrossRef]
  15. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  16. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  17. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  18. Chen, Y.; Fan, H.; Xu, B.; Yan, Z.; Kalantidis, Y.; Rohrbach, M.; Yan, S.; Feng, J. Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 3435–3444. [Google Scholar]
  19. Qi, X.; Wang, W.; Liao, Z.; Zhang, X.; Yang, D.; Wei, R. Object semantic grid mapping with 2D Lidar and RGB-D camera for domestic robot navigation. Appl. Sci. 2020, 10, 5782. [Google Scholar] [CrossRef]
  20. Bersan, D.; Martins, R.; Campos, M.; Nascimento, E.R. Semantic map augmentation for robot navigation: A learning approach based on visual and depth data. In Proceedings of the 2018 Latin American Robotic Symposium, 2018 Brazilian Symposium on Robotics (SBR) and 2018 Workshop on Robotics in Education (WRE), Joao Pessoa, Brazil, 6 November 2018; pp. 45–50. [Google Scholar]
  21. Cosgun, A.; Christensen, H.I. Context-aware robot navigation using interactively built semantic maps. Paladyn J. Behav. Robot. 2018, 9, 254–276. [Google Scholar] [CrossRef]
  22. Thomas, H.; Agro, B.; Gridseth, M.; Zhang, J.; Barfoot, T.D. Self-supervised learning of Lidar segmentation for autonomous indoor navigation. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 14047–14053. [Google Scholar]
  23. Zender, H.; Mozos, O.M.; Jensfelt, P.; Kruijff, G.J.; Burgard, W. Conceptual spatial representations for indoor mobile robots. Robot. Auton. Syst. 2008, 56, 493–502. [Google Scholar] [CrossRef] [Green Version]
  24. Pronobis, A.; Jensfelt, P. Large-scale semantic mapping and reasoning with heterogeneous modalities. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation, Guangzhou, China, 11–14 December 2012; pp. 3515–3522. [Google Scholar]
  25. Crespo, J.; Barber, R.; Mozos, O.M. Relational model for robotic semantic navigation in indoor environments. J. Intell. Robot. Syst. 2017, 86, 617–639. [Google Scholar] [CrossRef]
  26. Miyamoto, R.; Adachi, M.; Nakamura, Y.; Nakajima, T.; Ishida, H.; Kobayashi, S. Accuracy improvement of semantic segmentation using appropriate datasets for robot navigation. In Proceedings of the 2019 6th International Conference on Control, Decision and Information Technologies (CoDIT), Paris, France, 23–26 April 2019; pp. 1610–1615. [Google Scholar]
  27. Ran, T.; Yuan, L.; Zhang, J.B. Scene perception based visual navigation of mobile robot in indoor environment. ISA Trans. 2021, 109, 389–400. [Google Scholar] [CrossRef]
  28. Ishihara, Y.; Takahashi, M. Empirical study of future image prediction for image-based mobile robot navigation. Robot. Auton. Syst. 2022, 150, 104018. [Google Scholar] [CrossRef]
Figure 1. The offline phase for the semantic classification system.
Figure 2. The structure of the semantic dataset.
Figure 3. The developed four-wheel drive mobile robot system with an RPLiDAR A1 sensor.
Figure 4. Mobile robot architecture.
Figure 5. The total number of records for each class environment: room, corridor, doorway, and hall.
Figure 6. Examples of the environments: (a) hall, (b) room, (c) corridor, and (d) door.
Figure 7. Total number of infinity values in each class.
Figure 8. The collected LiDAR frames for the room environment before the processing stage.
Figure 9. The processed LiDAR frames for the room environment after the processing stage.
Figure 10. The training and testing accuracy for six ML models.
Figure 11. Evaluation of the classification accuracy for the four classes in all ML models.
Table 1. A comparison between recently developed, semantic-based classification systems.

Research Work | Type of Sensors | Classified Environments/Objects | Accuracy
[6] | 2D LiDAR | Road obstacle classification | NA
[8] | 3D LiDAR | Ground, shrubs, tree trunks, and tree branches | —
[9] | 2D LiDAR | (single class) presence of people | NA
[10] | LiDAR | Single-family house, multiple-family house, and non-residential house | Accuracy: >70%
[11] | 3D LiDAR | Non-movable, movable, and dynamic objects | Precision: 79.06%; Recall: 79.60%
[12] | 3D LiDAR | Façade, tree, barrier, car, and ground | Precision: 82.84%
[13] | 3D LiDAR | 25 classes | NA
[19] | LiDAR and RGB-D | Bed, couch, sink, toilet, microwave, and oven | Precision: 68.42%; Recall: 65.00%
[20] | LiDAR, RGB-D, and odometer | People, window, door, bench, table, chair, trash bin, and fire extinguisher | Accuracy: 73.8%
[21] | 3D scanner, laser scanner, RGB-D camera | Objects, planar surfaces, door signs | NA
[22] | LiDAR and RGB-D camera | Ground, walls, doors, tables, chairs | Simulation experiments; Recall: 95%
[23] | Laser and vision sensors | Corridor, room, office, and kitchen | NA
[24] | Laser and vision sensors | Rooms with different shapes and sizes | Average accuracy: 80.5%
[25] | RGB-D cameras | Chair, sink, stove, refrigerator, heater, washing machine, sofa, TV set, dining table, desk, computer, bookcase | NA
[26] | Image sensors | Movable area | Accuracy: 99%
Table 2. The specifications for RPLiDAR A1.

Specification | Value
Range | 12 m
Rotational Degree | 360° omnidirectional
Sample Rate | 8000 samples/second
Frequency | 5.5 Hz
Angular Resolution | 1 degree
Manufacturer | Slamtec, China
Price | $100.00
Table 3. The specifications for Raspberry Pi 4.

Specification | Value
Processor | Cortex-A72 64-bit @ 1.5 GHz
Memory | 4 GB LPDDR4
Connectivity | 2.4 GHz and 5.0 GHz IEEE 802.11b/g/n/ac
GPIO | Standard 40-pin GPIO header
Operating System | Raspbian
Manufactured in | United Kingdom
Price | $120.00
Table 4. General statistics on the environment classes for the established dataset.

Environment | Label | # of Records | # of Readings | # of Inf | Inf Ratio | Average Size
Room | 0 | 109 | 39,240 | 1379 | 3.51% | 3.5 × 3.5 m²
Corridor | 1 | 100 | 36,000 | 2123 | 5.89% | 12 × 2.5 m²
Doorway | 2 | 99 | 35,640 | 986 | 2.76% | 1.20 m
Hall | 3 | 103 | 37,080 | 7536 | 20.32% | 15 × 15 m²
Table 5. Tuning parameters for the CatBoost classifier.

Parameter | Value
Depth | 8
Learning rate | 0.05
Iterations | 65
Table 6. Tuning parameters for the RF classifier.

Parameter | Value
Number of estimators | 250
Minimum sample leaf | 80
Table 7. Tuning parameters for LGB.

Parameter | Value
Maximum depth | 5
Number of leaves | 100
Min samples in leaf | 25
Learning rate | 0.1
Number of iterations | 120
Table 8. Tuning parameters for the SVM classifier.

Parameter | Value
Gamma | 1
Kernel | rbf
C | 0.15
Table 9. Confusion matrix for the Naïve Bayes classifier in the testing phase.

  | 0 | 1 | 2 | 3
0 | 30 | 0 | 0 | 0
1 | 4 | 23 | 11 | 0
2 | 0 | 0 | 21 | 0
3 | 0 | 1 | 2 | 32
Table 10. Confusion matrix for the random forest classifier in the testing phase.

  | 0 | 1 | 2 | 3
0 | 30 | 0 | 4 | 0
1 | 0 | 24 | 1 | 0
2 | 4 | 0 | 28 | 0
3 | 0 | 0 | 1 | 32
Table 11. Confusion matrix for the CatBoost classifier in the testing phase.

  | 0 | 1 | 2 | 3
0 | 31 | 0 | 8 | 0
1 | 0 | 24 | 1 | 0
2 | 3 | 0 | 25 | 0
3 | 0 | 0 | 0 | 32
Table 12. Confusion matrix for the decision tree classifier in the testing phase.

  | 0 | 1 | 2 | 3
0 | 29 | 1 | 12 | 0
1 | 0 | 22 | 5 | 2
2 | 5 | 1 | 14 | 1
3 | 0 | 0 | 3 | 29
Table 13. Confusion matrix for the Light Gradient Boosting classifier in the testing phase.

  | 0 | 1 | 2 | 3
0 | 30 | 0 | 8 | 0
1 | 0 | 23 | 3 | 0
2 | 4 | 1 | 19 | 0
3 | 0 | 0 | 4 | 32
Table 14. Confusion matrix for the SVM classifier in the testing phase.

  | 0 | 1 | 2 | 3
0 | 34 | 0 | 0 | 0
1 | 0 | 23 | 2 | 0
2 | 0 | 1 | 32 | 0
3 | 0 | 0 | 0 | 32
Table 15. Evaluation of several metrics for the ML models in the training phase.

ML Model | Training Accuracy | Precision (Macro) | Precision (Micro) | Recall (Macro) | Recall (Micro) | F1-Score (Macro) | F1-Score (Micro)
Naïve Bayes | 90.97% | 93.90% | 93.37% | 93.29% | 93.37% | 93.28% | 93.37%
Random Forest | 100.0% | 99.99% | 100.0% | 99.99% | 100.0% | 99.99% | 100.0%
CatBoost | 100.0% | 99.99% | 100.0% | 99.99% | 100.0% | 99.99% | 100.0%
Decision Tree | 100.0% | 99.99% | 100.0% | 99.99% | 100.0% | 99.99% | 100.0%
Light Gradient Boosting | 100.0% | 99.99% | 100.0% | 99.99% | 100.0% | 99.99% | 100.0%
Support Vector Machine | 96.86% | 98.92% | 98.95% | 98.91% | 98.95% | 98.89% | 98.95%
Table 16. Evaluation of several metrics for the ML models in the testing phase.

ML Model | Testing Accuracy | Precision (Macro) | Precision (Micro) | Recall (Macro) | Recall (Micro) | F1-Score (Macro) | F1-Score (Micro)
Naïve Bayes | 85.48% | 87.98% | 85.48% | 86.45% | 85.48% | 84.95% | 85.48%
Random Forest | 91.93% | 92.17% | 91.93% | 92.64% | 91.93% | 92.37% | 91.93%
CatBoost | 90.32% | 91.19% | 90.32% | 91.17% | 90.32% | 90.88% | 90.32%
Decision Tree | 75.80% | 75.55% | 75.81% | 77.19% | 75.81% | 75.21% | 75.81%
Light Gradient Boosting | 83.87% | 83.86% | 83.87% | 84.98% | 83.87% | 83.74% | 83.87%
Support Vector Machine | 97.21% | 97.24% | 97.58% | 97.48% | 97.58% | 97.34% | 97.58%
Table 17. Evaluation of the ROC for all ML models.

ML Model | Room | Corridor | Doorway | Hall
Naïve Bayes | 94.11% | 90.41% | 80.88% | 98.36%
Random Forest | 91.89% | 99.50% | 88.95% | 99.45%
CatBoost | 91.14% | 99.50% | 85.09% | 100.0%
Decision Tree | 85.42% | 92.33% | 66.69% | 93.68%
Light Gradient Boosting | 89.67% | 96.41% | 75.16% | 97.82%
Support Vector Machine | 100.0% | 96.91% | 96.50% | 100.0%
