Article

Deep-Learning-Based Indoor Human Following of Mobile Robot Using Color Feature

School of Mechanical Engineering, Sungkyunkwan University, Suwon 16419, Korea
* Author to whom correspondence should be addressed.
Sensors 2020, 20(9), 2699; https://doi.org/10.3390/s20092699
Submission received: 20 March 2020 / Revised: 3 May 2020 / Accepted: 5 May 2020 / Published: 9 May 2020
(This article belongs to the Special Issue Robotic Sensing for Smart Cities)

Abstract

Human following is one of the fundamental functions in human–robot interaction for mobile robots. This paper presents a novel framework with state-machine control in which the robot tracks the target person under occlusion and illumination changes and navigates with obstacle avoidance while following the target to the destination. People are detected and tracked using a deep learning algorithm, the Single Shot MultiBox Detector, and the target person is identified by extracting a color feature using the hue-saturation-value histogram. The robot follows the target safely to the destination using a simultaneous localization and mapping algorithm with a LIDAR sensor for obstacle avoidance. We performed intensive experiments on our human-following approach in an indoor environment with multiple people and moderate illumination changes. Experimental results indicate that the robot followed the target well to the destination, showing the effectiveness and practicability of the proposed system in the given environment.

1. Introduction

In the last few years, human-following robots have become a highly active research area in various mobile robotics and computer vision applications. Computer vision technology is crucial in various practical applications such as personal guides for passengers in airports [1], elderly assistance [2], autonomous vehicles [3], surveillance systems [4], and video and sports TV [5], among others.
To follow a specific person effectively, the robot must be able to reliably track the target person within the surrounding world. However, several challenges may occur in various scenarios, such as when the target person is occluded by other people or objects due to a sudden change in his/her pose. The disappearance of the target person from the view of the robot’s camera may eventually cause the robot to stop and lose its goal. Maintaining a safe distance between the robot and the obstacles or the target in the surrounding environment without incurring the Freezing Robot Problem (FRP) [6] is an additional challenge for robotic follower systems. Hence, reliable estimation of the person’s location is quite important. Furthermore, the system must overcome all these challenges within the limits of the available sensors (i.e., limited sensing range) mounted on the robot.
Robotic followers have been broadly studied in numerous published works, and different techniques have been proposed to solve some of these challenges. One of the earliest techniques in this field used a camera to detect and track the human [7], where the authors combined a contour-based approach with a color-based approach. A human-following robot was developed using vision-based tracking to identify an individual by the edges, color, and texture of their clothes [8]. Bellotto and Hu [9] adopted a fusion technique, using laser range data to detect the legs and estimate the distance to the target person and vision information to detect people’s faces in order to follow the human. However, this method has a significant limitation when there is occlusion by other people. Choi et al. [10] calculated the distance and angle in the screening process of children with Attention-Deficit Hyperactivity Disorder (ADHD) using simple mathematical algorithms. Li et al. [11] used a mobile robot with a red, green, blue (RGB) image segmentation process based on shape and color recognition. In [12], depth images and an H-S histogram in a Hue, Saturation, and Value (HSV) space were obtained using a red, green, blue, depth (RGB-D) camera for service robots. Vision-based tracking for mobile robots has also been proposed using template matching to follow the person [13,14]. These methods suffer from a high average computation time for video processing and are therefore sometimes insufficient for fast tracking of a human. Range finders mounted on the mobile robot have been used for human detection and tracking due to their wide field of view (FOV) [15,16]. However, these methods provide poor information about humans compared to vision-based tracking methods. Previous approaches detect people using a camera or laser range-finder sensor and follow them according to their clothes’ colors and positions. Although these methods try to improve the performance of human following and tracking, if the desired target is completely or partially occluded by another person or object, the mobile robot may lose track of the target person or erroneously follow another person. Therefore, the identification technique must be reliable and robust in various settings. In addition to the perceptual approach, the planning and control systems should be integrated in perfect harmony to enhance the ability of the robot to deal with unexpected circumstances.
Unlike previous methods, Koide and Miura [17] integrated features of the gait, height, and appearance based on an online boosting method used as a specific human classifier. They used multiple sensors: two range finders to ensure detection of the torso and leg and a web camera to extract the clothes’ color.
Brunetti et al. [18] and Zhao et al. [19] provided surveys of state-of-the-art human detection and tracking work based on computer vision with deep learning techniques. Currently, the availability of low-cost cameras in the market allows them to be easily mounted on a robot. In modern applications, researchers use various RGB-D cameras such as the Kinect, ASUS Xtion, and Orbbec Astra. These sensors are highly suitable for indoor environments and provide synchronized color and depth images.
In this paper, we introduce a novel framework that proposes a deep-learning-based approach for an indoor human-following robot addressing the following challenges. First, the proposed system can identify and follow the target person among other people even when the target is partially occluded, in the context of human-following robots. Second, the proposed system is able to perform human following under moderate illumination changes. Third, the proposed system attempts to recover the target in the event of the human disappearing from the robot’s FoV. This is done by selecting the appropriate control behavior for the robot to maintain robust following of the person and by tracking the target’s path history.
To the best of our knowledge, this is the first work that aims to keep the robot continuously in an active state using a robust algorithm that integrates a color feature, a point cloud, and the Single Shot Detector (SSD) algorithm with the Simultaneous Localization and Mapping (SLAM) algorithm to detect multiple people and estimate all their locations using only two sensors, in the context of person identification and indoor human-following robots.
The remainder of this paper is organized as follows. First, we describe the proposed approach in Section 2. Section 3 explains the system controller of the proposed approach. In Section 4, we describe the hardware used and the experimental results that were achieved. Finally, Section 5 concludes this work and discusses some ideas for possible future improvements.

2. Human Following of Mobile Robot

The proposed human following system is depicted in the state-machine diagram shown in Figure 1. This diagram describes the robotic behavior, expressing all the different states of the robot needed to achieve efficient tracking and autonomous navigation in dynamic environments. The tracking state may sometimes fail to track the desired person; however, the robot must be able to recover from the failure to keep track of the target during the course. Therefore, we defined three major states, i.e., tracking, last observed position (LOP), and searching, among which the robot can switch during its operation, in addition to the initialization and stop states. If the robot remains in a failure situation for a given time during the searching state, it stops its activity via a timer, or the user can stop it manually. We designed three methods to control the robotic behavior in each of its various states (see Section 3 for more details); in general, humans follow other humans in a similar manner. The first method applies when the target person is inside the camera’s FoV, in which case the robot moves directly towards the location of the person based on visual tracking. The second method takes place when the robot loses the target because the person exits the camera’s FoV, in which case the robot navigates to the person’s LOP. The final method arises when the robot does not find the target at the LOP; it then switches to the searching state and continues the human-following behavior around the last observed spot of the target. While these methods may not exactly match those of people, we hypothesized that they may be close to what humans would expect of robotic behavior.
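For illustration, the state transitions of Figure 1 can be condensed into a small transition function. The following is a minimal sketch, not the implementation used on the robot; the predicates target_visible, reached_lop, and search_timed_out are hypothetical placeholders for the perception and timer signals described above.

```python
from enum import Enum, auto

class State(Enum):
    INIT = auto()
    TRACKING = auto()
    LOP = auto()        # navigate to the last observed position
    SEARCHING = auto()
    STOP = auto()

def next_state(state, target_visible, reached_lop, search_timed_out):
    """Return the next state given the current observations (sketch of Figure 1)."""
    if state in (State.INIT, State.TRACKING):
        return State.TRACKING if target_visible else State.LOP
    if state == State.LOP:
        if target_visible:
            return State.TRACKING
        return State.SEARCHING if reached_lop else State.LOP
    if state == State.SEARCHING:
        if target_visible:
            return State.TRACKING
        return State.STOP if search_timed_out else State.SEARCHING
    return State.STOP
```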
Furthermore, this work combines the SLAM algorithm [20] with a visual tracking algorithm using Robot Operating System (ROS) (http://wiki.ros.org/ROS/Tutorials) tools. All these major states are described in detail in the following sections.

2.1. Initialization State

In this state, the initial position of the robot [0, 0, 0] and its quaternion [1, 0, 0, 0] are referenced to the base coordinate frame, where the initial vector [0, 0, 0] represents [x_0, y_0, z_0] and the quaternion [1, 0, 0, 0] represents the identity orientation (zero roll, pitch, and yaw). For navigation, only 2D plane information is needed, namely the orientation (θ_r) and coordinates (x_r, y_r) of the mobile robot. The initialization state lasts from the moment the code is run until the first frame of the tracking state, which is presented below.
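Since only the planar pose (x_r, y_r, θ_r) is used, the heading can be extracted from the quaternion with the standard conversion below; this is a generic sketch, not code taken from the paper.

```python
import math

def yaw_from_quaternion(w, x, y, z):
    """Extract the heading angle (yaw, in radians) from a unit quaternion (w, x, y, z)."""
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))

# The identity quaternion [1, 0, 0, 0] gives a yaw of 0 rad at the initial pose [0, 0, 0].
print(yaw_from_quaternion(1.0, 0.0, 0.0, 0.0))
```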

2.2. Tracking State

Humans have several specific behavioral and physical features, such as fingerprints, iris, face, and voice [21]. These features help the brain readily distinguish a certain person from others. Most of these features have already been studied in other fields such as security. However, employing some of these features, like the iris and fingerprint, in the robotics field poses some difficulties, such as requiring direct human action at very close range with expensive and advanced devices to identify a certain person. In contrast, the identification of a specific person by height, width, and clothing color is possible with affordable devices.
The most important sensor of humans is arguably the eye, which detects objects by their shapes, colors, widths, and heights, among other features. Similarly, robots can use visual sensors to identify objects in the real world. The framework of the proposed method of human tracking using an SSD model and a point cloud [22] is shown in Figure 2. We implemented a novel framework that involves the following steps: First, the SSD is used to create bounding boxes for human detection via Convolutional Neural Networks (CNNs) in a 2D image sequence. Second, 2D coordinates are converted to 3D coordinates in world space by reducing the point cloud to the volume of the human body based on the output of the SSD detector, for estimating distances. Meanwhile, the color feature is extracted from the region of interest, also based on the output of the SSD detector. Third, the distance and labeled color are used to identify and track the target person. Finally, this information is sent from a workstation (node 1), which publishes the people information as a talker, to a robot computer (node 2), which subscribes to this information as a listener. The robot computer is responsible for the control of the robot, path planning, and obstacle avoidance based on the publisher’s information and the surrounding environment. The ROS Master manages registration information between the ROS nodes and is the key element in the ROS system [23]. The communication between the workstation and the robot computer is made via a WiFi connection; we used 5 GHz 802.11ac with a bandwidth of 80 MHz for consistent communication. Human tracking is a highly complex task, especially in dynamic environments; thus, keeping a high video frame rate is very important for performance. Below, we briefly explain the key elements used in this context.
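The hand-off between the two ROS nodes can be illustrated with a minimal talker/listener pair; the topic name /people_info and the use of a JSON-serialized std_msgs/String message are our own placeholders, not the message definitions used by the authors.

```python
#!/usr/bin/env python
import json
import rospy
from std_msgs.msg import String

def talker():
    """Node 1 (workstation): publish the tracked people information."""
    pub = rospy.Publisher('/people_info', String, queue_size=10)
    rospy.init_node('people_talker')
    rate = rospy.Rate(20)  # roughly the camera frame rate
    while not rospy.is_shutdown():
        people = [{'id': 0, 'distance': 1.2, 'is_target': True}]  # placeholder payload
        pub.publish(String(data=json.dumps(people)))
        rate.sleep()

def callback(msg):
    """Node 2 (robot computer): consume the people information for control."""
    people = json.loads(msg.data)
    rospy.loginfo('received %d people', len(people))

def listener():
    rospy.init_node('people_listener')
    rospy.Subscriber('/people_info', String, callback)
    rospy.spin()
```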

2.2.1. Human Detection

To enable the robot to detect humans around it, CNNs were used by the SSD model as a detector. This technique shows excellent performance in detecting people in real time, outperforming its counterparts in detection speed. The SSD uses an end-to-end training method to improve both speed and accuracy; it is more accurate than You Only Look Once (YOLO) [24] and faster than Faster R-CNN (regions with CNN) [25] on some data sets [26]. However, these techniques require pre-training data. Figure 3 shows the MobileNet-SSD structure. The details of MobileNets and SSD are extensive and beyond the scope of this work, so we describe them only briefly here. In short, SSD achieves a balance between accuracy and speed. To detect objects in 2D RGB image sequences, in which the input size of the image is 300 × 300, only a single feed-forward pass is required, followed by a non-maximum suppression step to produce the final detections. Several layers produce multi-scale feature maps, and default boxes with many aspect ratios and multiple scales predict the detections. Each bounding box predicts the shape of the object with a confidence, as shown in Figure 4a, where every shape is either a rectangle or a square depending on the shape of the predicted object. Therefore, every bounding box has four values in pixel units, given in the format (u_min, u_max, v_min, v_max), with MobileNets [27] as the base network (see [26,27] for more details).
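As an illustration, a pretrained MobileNet-SSD Caffe model can be run through OpenCV’s DNN module to obtain such person boxes; the file names, class index, and confidence threshold below are assumptions for a standard VOC-trained model, not the authors’ exact configuration.

```python
import cv2
import numpy as np

# Hypothetical file names; any MobileNet-SSD Caffe deployment files can be used.
net = cv2.dnn.readNetFromCaffe('MobileNetSSD_deploy.prototxt',
                               'MobileNetSSD_deploy.caffemodel')
PERSON_CLASS_ID = 15  # 'person' in the 20-class VOC label set of MobileNet-SSD

def detect_people(frame, conf_threshold=0.5):
    """Return person bounding boxes as (u_min, v_min, u_max, v_max) in pixels."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()  # shape: (1, 1, N, 7)
    boxes = []
    for i in range(detections.shape[2]):
        conf = detections[0, 0, i, 2]
        class_id = int(detections[0, 0, i, 1])
        if class_id == PERSON_CLASS_ID and conf > conf_threshold:
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            boxes.append(box.astype(int).tolist())
    return boxes
```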

2.2.2. Color Feature

Color-based detection using the clothes’ color of the full body or the upper part of a human was adopted as the key to personal identification. The color feature can be extracted easily from a sequence of images in real time, and it is robust for classifying people depending on their clothes’ colors [28]. There are many methods used for object detection depending on shape, texture, or both, such as Scale Invariant Feature Transformation (SIFT) [29,30] and Histograms of Oriented Gradient (HOG) [31,32]. We used a hue-saturation histogram for its simplicity; it is one of the most popular representations of a color distribution. An HS-histogram is created from the set of pixels in a bounding box area with different sizes and positions in consecutive scenes.
In order to reduce computation time and errors, especially in the case of a colorful environment, the whole camera volume is cropped to a volume of interest, where the body is positioned. This volume in the images, which is meaningful and important, is called a region of interest (ROI). The blue box indicates the ROI, as shown in Figure 4b. The ROI is a region that concerns humans in videos or images, according to basic concepts of neuropsychological studies [33]. It is used for ease of optimization in image processing.
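A possible realization of this step computes the HS-histogram of the clothing ROI and compares it with the stored target histogram; the bin counts and the correlation metric below are our assumptions rather than the exact settings of the paper.

```python
import cv2

def hs_histogram(bgr_roi, h_bins=30, s_bins=32):
    """Compute a normalized hue-saturation histogram of a clothing ROI."""
    hsv = cv2.cvtColor(bgr_roi, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [h_bins, s_bins],
                        [0, 180, 0, 256])       # hue range 0-180, saturation 0-256
    cv2.normalize(hist, hist, 0, 1, cv2.NORM_MINMAX)
    return hist

def is_target(roi_hist, target_hist, threshold=0.7):
    """Identify the target by histogram similarity (correlation metric)."""
    score = cv2.compareHist(target_hist, roi_hist, cv2.HISTCMP_CORREL)
    return score > threshold
```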

2.2.3. Depth Image and Estimating the Distance

To avoid the FRP while switching between states, as presented in [34], the robot must be aware of its location and the relative positions of the people surrounding it. In human-following robot applications, estimating the accurate pose of people in 2D image data is not enough; therefore, 3D pose estimation is indispensable. Although the CNN was originally prepared for image processing, it also became a significant component in the 3D object recognition process. The distance is not measured directly from the RGB-D images. However, if the center points of the object and camera are known in 3D space, the distance can be easily obtained by directly generating the coordinates of people using a point cloud. Once the SSD model detects the human’s body in a 2D RGB image sequence and creates a bounding box around the person’s body, the center of the bounding box (c_u, c_v) can be obtained from its vertices. The coordinate of this point can be set as the pose, and the volume of interest can be extracted from the full image volume and converted from 2D to 3D coordinates. The purpose of the 2D-to-3D conversion process is to generate a coordinate for every person based on the point cloud in meter units (see Figure 5).
Consider a group of people standing in front of the robot camera, P_i, i = 1, 2, 3, …, N, with the RGB-D camera tracking them in real time over all frames. Let the bounding box of a person be denoted by person_i in the first frame. This means that we simultaneously obtain N people, N box centers, N coordinates, N distances, and N bounding boxes in a video stream, which is one powerful advantage of the SSD detector and ROS for object detection. Let person_i = {person_1, person_2, person_3, …, person_N}; then the centers of the boxes are (c_{u_i}, c_{v_i}) = ((c_{u_1}, c_{v_1}), (c_{u_2}, c_{v_2}), (c_{u_3}, c_{v_3}), …, (c_{u_N}, c_{v_N})), and so on. The centers of the boxes on the x-axis and y-axis are given by
$$c_{u_i} = \frac{u_{max_i} + u_{min_i}}{2} \qquad (1)$$
$$c_{v_i} = \frac{v_{max_i} + v_{min_i}}{2} \qquad (2)$$
where u_max, u_min, v_max, and v_min are the values of the vertices of the bounding box, relative to the resolution of the images in each frame, in pixel units.
Our system adopts [35,36,37] to estimate the distance in real-world dimensions and [38] for transferring the resulting dimensions and combining them with the SLAM algorithm. In the pinhole camera model, the relationship between the camera’s image plane coordinates (u_i, v_i) and the 3D coordinates of people (X_i, Y_i, Z_i) relative to the camera frame is expressed by the following equation:
$$\begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & s & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_i \\ Y_i \\ Z_i \end{bmatrix} \qquad (3)$$
where f_x and f_y are the focal lengths of the camera per axis in pixels, u_0 and v_0 are the principal point offsets of the camera per axis in pixels, (u, v) are the image coordinate axes in pixel units, and s is the axis skew, which is nonzero only if the u axis is not perpendicular to the v axis.
The aim of the depth information is to directly obtain the values of (X_i, Y_i, Z_i) of the people relative to the frame of the camera fixed on board the robot. Once these parameters are known, the relative distance from a person to the camera frame can be calculated in meter units. The distance d_i from the camera to person_i is defined as follows:
$$d_i = \sqrt{X_i^2 + Y_i^2 + Z_i^2} \qquad (4)$$
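Combining Equations (1)-(4), the box center, the back-projection of its depth pixel into the camera frame, and the resulting distance can be computed as in the sketch below; the intrinsic parameters are assumed values typical of a 640 × 480 RGB-D camera, not the calibration used in the paper.

```python
import math

def box_center(u_min, u_max, v_min, v_max):
    """Center of the bounding box in pixels, Equations (1)-(2)."""
    return (u_min + u_max) / 2.0, (v_min + v_max) / 2.0

def back_project(c_u, c_v, depth_m, fx, fy, u0, v0):
    """Invert the pinhole model of Equation (3) (zero skew) for one pixel."""
    Z = depth_m                       # depth of the pixel, from the RGB-D camera
    X = (c_u - u0) * Z / fx
    Y = (c_v - v0) * Z / fy
    return X, Y, Z

def distance(X, Y, Z):
    """Euclidean distance to the person, Equation (4)."""
    return math.sqrt(X * X + Y * Y + Z * Z)

# Example with assumed intrinsics (roughly those of a 640x480 RGB-D camera):
c_u, c_v = box_center(250, 390, 120, 400)
X, Y, Z = back_project(c_u, c_v, depth_m=2.0, fx=570.0, fy=570.0, u0=320.0, v0=240.0)
print(distance(X, Y, Z))
```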

2.3. Last Observed Position State

The robot may lose the desired target due to, for example, sudden moves at high speed around a corner. Prior information about the target person was recorded in the previous state (i.e., the tracking state) using the RGB-D camera, as mentioned above. When a target loss occurs, the robot navigates autonomously to the LOP of the desired target to re-find him/her and continue its following behavior. To achieve this task, the system uses the recorded last-frame information about the target as input to the SLAM algorithm together with a 2D pre-built map. Laser-based SLAM is adopted to create the 2D grid map using the gmapping package as a ROS node called slam_gmapping (https://wiki.ros.org/gmapping), which was proposed by Grisetti et al. [39].
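As an illustration of how the LOP can be handed to the navigation layer, the sketch below sends it as a goal through the ROS actionlib interface; the use of move_base and the map frame name are assumptions about the navigation stack, not details confirmed by the paper.

```python
#!/usr/bin/env python
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal
from tf.transformations import quaternion_from_euler

def go_to_lop(x, y, yaw, frame='map'):
    """Send the last observed position of the target as a navigation goal."""
    client = actionlib.SimpleActionClient('move_base', MoveBaseAction)
    client.wait_for_server()
    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = frame
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    qx, qy, qz, qw = quaternion_from_euler(0.0, 0.0, yaw)
    goal.target_pose.pose.orientation.x = qx
    goal.target_pose.pose.orientation.y = qy
    goal.target_pose.pose.orientation.z = qz
    goal.target_pose.pose.orientation.w = qw
    client.send_goal(goal)
    client.wait_for_result()
    return client.get_state()

if __name__ == '__main__':
    rospy.init_node('lop_navigator')
    go_to_lop(4.8, 0.34, 0.0)  # example LOP coordinates in meters
```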
If it can re-find the desired target, the robot switches to the tracking state in which the target person is within the camera’s FoV. If it fails to find the target person, the system switches to the searching state, which is presented below.

2.4. Searching State

The previous state assumes that a robot could re-find a person within the FoV of the camera when the mobile robot arrives at the LOP of a person. Intuitively, a person moves in several trajectories as time passes; hence, the robot may sometimes struggle to re-find and follow a person. However, the robot randomly rotates and scans the place to detect the desired target. If it re-finds the desired target, it switches to the tracking state. If the robot fails to find the desired human, such as in the event of the person moving away after a turn (i.e., corner) and disappearing, the robot will stop after a given time or allow the users to stop it. In some of these cases, it is logical to suppose that it would not be capable of detecting the desired human.
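The random scan could be realized as in the sketch below, which caps the rotation speed and stops after a timeout; the /cmd_vel topic, the 0.45 rad/s cap, and the 30 s timeout are illustrative assumptions, not values prescribed by the paper for this state.

```python
import random
import rospy
from geometry_msgs.msg import Twist

def search_for_target(target_visible, timeout_s=30.0, w_max=0.45):
    """Rotate and scan randomly around the LOP until the target is seen again.

    Assumes a ROS node has already been initialized; target_visible is a callable
    returning True when the perception pipeline re-identifies the target.
    """
    pub = rospy.Publisher('/cmd_vel', Twist, queue_size=1)
    rate = rospy.Rate(10)
    start = rospy.Time.now()
    while not rospy.is_shutdown():
        if target_visible():
            return True                  # switch back to the tracking state
        if (rospy.Time.now() - start).to_sec() > timeout_s:
            pub.publish(Twist())         # stop the robot
            return False                 # give up: stop state
        cmd = Twist()
        cmd.angular.z = random.choice([-1.0, 1.0]) * w_max
        pub.publish(cmd)
        rate.sleep()
```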

3. Robot Controller

To design a robust controller, we should study all the aspects of the robot navigation during all expected states. There are three major states as mentioned above: (i) the tracking state, (ii) the LOP state and (iii) the searching state. In the first two states, a proportional integral derivative (PID) controller is used [40] to reduce the error distance between the robot position and target position. In the tracking state, the target position is provided by the visual tracking. In the LOP state, the desired position is the last known position of the target before the loss of the target. The control system is a closed-loop (also known as a feedback control system). When the robot reaches the last known position of the target in the LOP, the robot moves to the searching state. While in the searching state, the robot will move randomly in an open-loop manner around the last known position to recover the target. For the safety of the robot during the random movement, we defined the maximum values of angular and linear velocities and the safe distance from the environment.
For the tracking state, the relationship between the linear velocity of the robot (V) and the distance error (d − D) is directly proportional, where D and d are the safe and actual distances between the robot and the target, respectively. Our system assumes D = 0.9 m, and d is the actual distance to the desired target person, which is obtained from Equation (4). The robot can navigate at speeds of up to 2 m/s, but we set V_max = 1.08 m/s in our experiments due to safety concerns.
The relationship between the angular velocity (ω) and the error in the u coordinate direction (c_u − u_center) is also directly proportional, where u_center is the image center in the horizontal direction. The proposed system adopts u_center = 320 pixels in our experiments, and c_u is the current center of the target in the image in pixel units, which is obtained from Equation (1). The equations of the robot velocity in the tracking state can be written as follows:
$$V = K_p (d - D) + K_i \int_0^{T} (d - D)\, dt \qquad (5)$$
$$\omega = K_p' (c_u - u_{center}) + K_i' \int_0^{T} (c_u - u_{center})\, dt \qquad (6)$$
where (c_u − u_center) and (d − D) are the error terms for the angular and linear velocities, respectively, K_p, K_i, K_p', and K_i' are the PI constants, and dt is the time interval between consecutive camera frames during tracking. The most difficult aspect of a PI(D) controller is tuning the K_p, K_i (and K_d) values to obtain the desired behavior. In our work, we used a K_p value of 0.415 when the distance d was less than 3.5 m and 0.168 when d was greater than 3.5 m due to safety concerns, together with a K_i value of 0.001. Furthermore, we used K_p' and K_i' values of 0.0023 and 0.0001, respectively.
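A discrete form of Equations (5) and (6) with the gain scheduling described above can be written as the following sketch; the velocity clamp at V_max is our own addition, and the class is illustrative rather than the authors’ code.

```python
class TrackingPI:
    """Discrete PI controller for the tracking state, Equations (5)-(6)."""

    def __init__(self, D=0.9, u_center=320.0, v_max=1.08):
        self.D = D                    # safe following distance (m)
        self.u_center = u_center      # image center in the horizontal direction (pixels)
        self.v_max = v_max            # linear velocity cap (m/s)
        self.Ki, self.Kp_w, self.Ki_w = 0.001, 0.0023, 0.0001
        self.int_d, self.int_u = 0.0, 0.0   # running integral terms

    def update(self, d, c_u, dt):
        """Return (V, omega) from the measured distance d and box center c_u."""
        Kp = 0.415 if d < 3.5 else 0.168    # distance-dependent gain for safety
        e_d, e_u = d - self.D, c_u - self.u_center
        self.int_d += e_d * dt
        self.int_u += e_u * dt
        V = Kp * e_d + self.Ki * self.int_d
        w = self.Kp_w * e_u + self.Ki_w * self.int_u
        return max(-self.v_max, min(self.v_max, V)), w
```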
For the LOP state, proportional control was used to navigate the robot to the last observed point of the target (x_t, y_t) in the 2D plane. The pose of the robot is (x_r, y_r, θ_r), where the angle θ_r represents the robot’s heading (in radians). The angle toward the target, θ_t, is determined from the positions of the target and the robot as
$$\theta_t = \tan^{-1}\!\left(\frac{y_t - y_r}{x_t - x_r}\right) \qquad (7)$$
The equations of robot velocity in the LOP state can be written as follows:
$$\omega = k_{p\omega} (\theta_t - \theta_r) \qquad (8)$$
$$V = k_{pv}\, e_p, \qquad e_p = \sqrt{(y_t - y_r)^2 + (x_t - x_r)^2} \qquad (9)$$
where k_{pω} and k_{pv} are the proportional controller constants, and (θ_t − θ_r) and e_p are the error terms for the angular (ω) and linear (V) velocities, respectively. In all experiments, the constants were set as k_{pω} = k_{pv} = 1.
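Equations (7)-(9) map directly to the small controller below; wrapping the heading error to [−π, π] is our own addition for robustness and is not stated in the paper.

```python
import math

def lop_control(x_r, y_r, theta_r, x_t, y_t, kp_w=1.0, kp_v=1.0):
    """Proportional control toward the last observed position, Equations (7)-(9)."""
    theta_t = math.atan2(y_t - y_r, x_t - x_r)          # bearing to the LOP, Eq. (7)
    e_theta = math.atan2(math.sin(theta_t - theta_r),   # wrap heading error to [-pi, pi]
                         math.cos(theta_t - theta_r))
    e_p = math.hypot(y_t - y_r, x_t - x_r)              # Euclidean position error, Eq. (9)
    omega = kp_w * e_theta                              # Eq. (8)
    V = kp_v * e_p                                      # Eq. (9)
    return V, omega
```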

4. Results and Discussion

We conducted extensive experiments on human following to demonstrate the performance of the proposed approach and its tracking recovery from failure situations when the target is lost. We divided our experiments into two categories: the first included multi-person detection along with the target person, while the second included only one person. The visual tracking state may sometimes fail to track the target person, and the robot must be able to recover from such a failure; therefore, we implemented the three major states described above, i.e., tracking, LOP, and searching. The units of coordinates, velocities, and time in all the experiments are meters, meters per second, and seconds, respectively. This section gives a detailed explanation of the robotic behavior during testing.

4.1. Infrastructure Setting

The experiments reported in this paper were carried out using a robot called Rabbot (http://edu.gaitech.hk/ria_e100/index.html) to evaluate the performance of the proposed approach, shown in Figure 6. The robot was fitted with an SLAMTEC RPLIDAR A2M8, an Orbbec Astra, and a computer (2.8 GHz with 4 GHz turbo frequency, i5 processor, hexa-core, 8 GB RAM, and 120 GB SSD), called the robot computer, running ROS Kinetic on Ubuntu 16.04 64-bit. The Orbbec Astra camera is an RGB-D sensor that provides synchronized color and depth images. Rabbot was equipped with the camera at a resolution of 640 × 480, mounted at a height of 1.47 m from the floor for better visual tracking of the environment. As mentioned in Section 2.3, an RPLIDAR A2M8 was used to create a map and obtain precise information about surrounding object positions. The code was run on two computers: one was the workstation for the deep learning module (Intel Core i7-6700 CPU @ 3.40 GHz) and the second was the on-board computer in the robot. The communication between the workstation and the robot computer was made via a 5 GHz WiFi connection. Rabbot weighs approximately 20 kg and can carry a payload of approximately 50 kg.

4.2. Human Following Experiments

The testing environment is depicted in Figure 7. The target person walks along a realistic scenario path and the robot follows him/her. The path runs from the laboratory (small black circle with the letter ‘S’, the robot’s departure point) to the end of the corridor (small black circle with the letter ‘E’, the target person’s destination point). We chose this path for testing because it is the longest path that remains within the WiFi signal range. Its length is approximately 27 m, measured using wheel odometry and then validated with a walking counter (WC). The blue and green dashed lines represent the paths of the robot and the target, respectively. The target person walks in front of the robot, and the Orbbec Astra camera tracks the target person using RGB and depth data. The red circles indicate other people, and the blue numbers indicate glass walls, doors, and windows that allow sunlight to pass into the testing environment.
In the first category, the experiments included multi-person detection along with the target person in the testing environment. The performance of the proposed system was analyzed over 18 experiments, as shown in Table 1. The system recorded data while the target walked at a normal speed. The total number of frames, traveled distance of the robot, and time for these experiments were 18,563 frames, 450.80 m, and 797.12 s, and the averages per experiment were 1031.3 frames, 25.04 m, and 44.28 s, respectively.
The robot failed to recover the target person in two experiments for two main reasons: the first was the disappearance of the target person around a corner or behind other objects while he continued walking toward the destination point; the second was that the robot tracked another person due to severe illumination changes, which is one limitation of the cameras used. These failures occurred after the robot had traveled 18.58 and 10.94 m, respectively, over the tracking, LOP, and searching states defined in Figure 1. However, the target person was correctly followed to the destination point in 16 of the experiments. The symbols O and X shown in the second row of Table 1 indicate the experiment status: O indicates that the robot arrived at the predefined destination point, and X indicates otherwise. Our judgment was based on success or failure, i.e., whether the robot arrived at the destination point, regardless of the percentage of distance traveled in a failed experiment. In failed experiments, the system was stopped manually by the observer.
In the successful experiments, the average travel distance of the robot was 26.33 m, which is comparable to the average travel distance of the target, 26.47 m. The traveled distance of the target was measured eight times using a WC. In addition, it was estimated for all experiments via the robot’s camera using a tf tree, as shown in Figure 5b. The average computation time for the video in real time was 23.34 frames/s, i.e., 42.84 ms per frame, which corresponds to the frame rate of the Orbbec Astra camera and is suitable for smooth, robust human following. The average and maximum robot velocities were 0.57 and 1.08 m/s, respectively.
Figure 8 shows snapshots of the experiments. The green bounding box around a person in the images indicates the target person, and the red bounding boxes indicate the other people. The yellow rectangle on the target person indicates the ROI, and the blue rectangle indicates the detection of the clothes’ colors. The test began at the laboratory and finished at the end of the corridor in a colorful indoor environment. The mobile robot followed the target person based on his/her clothes’ colors.
At the beginning of the test, the mobile robot extracted the color feature of the target person and estimated his position, then started following him. A second person moved from the laboratory (Figure 8a) to the corridor (Figure 8b). During the testing, the target person disappeared behind the wall that separates the laboratory and corridor. As a result, the robot lost the target, and it simultaneously detected another person as a non-target (Figure 8c). However, the robot re-tracked the target based on the LOP and it correctly re-identified him based on the color feature. Two other people appeared in the scene, one of them stood in front of the common room while another was walking in front of the target (Figure 8d). The robot kept following the target while one person was walking forward and backward freely along the left side of the target in the corridor (Figure 8d,e), and another was standing on the right side (Figure 8f). Although there was an illumination change due to the sunlight in the testing environment, especially at the end of the corridor (see the difference in Figure 8d,e), the robot successfully followed the target person.
Figure 9 shows the movement trajectories of the robot and the target person in a realistic scenario during the tracking state. Figure 9a,b show the positions of the robot and person along the x coordinate and y coordinate, respectively. The green curve indicates the target person’s trajectory, while the blue dotted curve indicates the robot’s trajectory. The red and yellow areas between the curves of the robot and the person along the x and y coordinates indicate the distance difference (d − D) between them in the + and − directions, respectively, relative to the robot’s departure point (0, 0). The high-frequency variation in the position components from 11 to 13.5 s, while both the robot and the person rotated at the corner between the laboratory and corridor, is due to sensor noise. Comparing their trajectories with the testing environment, the target person walked in the laboratory in the +x direction, with a slight slope toward the +y direction, then turned right at the door and kept walking in the corridor in the −y direction (see Figure 7).
Figure 9c,d show the linear and rotational velocities, respectively. The velocity components of the robot are smooth and well within its limits. In this experiment, the maximum and minimum linear velocities of the robot were 1.039 and 0.0 m/s, occurring at the first and last successfully tracked frames, respectively, and the average linear velocity was 0.85 m/s. The rotational velocity was high while the robot rotated in the −y direction from 9.5 to 15 s at the corner between the laboratory and the corridor. When the target person walked in an approximately straight line, the rotational velocity was approximately 0.0 rad/s because the center of the person was kept in the heading direction. Due to camera noise, the robot briefly lost the target: losses of 2, 4, and 4 frames occurred at 1.5, 11.2, and 12 s, respectively. As a result, the robot started rotating toward the LOP with a rotational velocity of ±0.45 rad/s, and the linear velocity dropped to zero. The black circles in Figure 9d indicate the lost frames. This means that the robot remains in an active state even when the target is lost for a brief moment. When the person arrived at the destination point, he stopped; therefore, the distance between the robot and the person decreased toward 0.9 m and, consequently, the velocity of the robot decreased until it stopped (see Figure 9b,c).

4.3. Target Recovery Experiments

To show the capability of target recovery, we performed different experiments. The testing environment without other people is depicted in Figure 10; it is identical to the testing environment shown in Figure 7. The path runs from the laboratory (small black circle with the letter ‘S’, the robot’s departure point) to the end of the corridor (small black circle with the letter ‘E’, the target person’s destination point). The blue and green dashed lines represent the paths of the robot and the target, respectively.
Table 2 shows the results of the 11 experiments, which constitute the second type of experiments. These experiments included only one person in the testing environment. The total number of frames, the traveled distance of the robot, and the time for all experiments were approximately 15,562 frames, 287.9 m, and 673.6 s, and the averages per experiment were 1414.73 frames, 26.18 m, and 61.24 s, respectively. The robot failed to recover the target person in one experiment: after the robot moved to the LOP, it searched for the person in the opposite direction, as the person moved to the right and the robot moved to the left in the corridor. However, the robot recovered the human tracking and continued to follow the person to the destination point in 10 experiments. The symbols O and X shown in the second row of Table 2 indicate the experiment status: O indicates that the robot arrived at the predefined destination point, while X indicates otherwise. Although these experiments were conducted with one person, the successful tracking rate decreased from 91.98% to 58%, owing to the intentional disappearances; this indicates that the robot spent a long time in the LOP and searching states. Meanwhile, the overall mission success rate increased from 88.9% to 90.91% due to the behavior of waiting near the LOP after the target was lost.
The graphs in Figure 11 show the robot and person movement trajectories in a realistic scenario under the tracking and LOP states. Figure 11a,b show the positions of the robot and the person along the x and y coordinates, respectively. The green curve indicates the target person’s trajectory, while the blue dotted curve indicates the robot’s trajectory. Black and red vertical dotted lines indicate the ends of the tracking and re-tracking, respectively. The green slanted dotted line indicates the disappearance of the target person from the robot’s FoV, as shown in Figure 11a,b. The difference from Figure 9a,b is that the target intentionally disappeared twice from the robot’s FoV and waited near the LOP behind the wall. The robot was able to re-track the target after arriving at the LOP during the first disappearance (from 15.1 to 30.7 s), while it re-tracked the target within a short time during the second disappearance (from 34.42 to 37.1 s). When the person arrived at the destination point, he stopped; therefore, the distance between the robot and the person decreased toward 0.9 m (see Figure 11b).
The robot and person trajectories in the x–y plane, generated by the path planner, are visualized in Figure 11c. The trajectories of the robot and the person are almost identical. The filled black and empty red circles indicate the positions of the robot and the person at the last tracked frame, respectively. The movement of the robot between them was carried out under the LOP state.
Figure 12 shows the trajectories of the robot and the person in the 2D plane for the three states. The green slanted dotted line indicates the disappearance of the target person from the robot’s FoV. In contrast to Figure 11c, this experiment was conducted in the laboratory only, and the robot could not find the target at the LOP; therefore, it switched to the searching state. The robot departed from the origin point (0, 0) toward the target person based on visual tracking and then lost the target at (3.7, 0.084) m, where the LOP was at (4.8, 0.34) m. The robot moved to the LOP and did not find the target there because his position had changed; it switched to the searching state and found him when the robot was at (4.419, 2.06) m. The filled black and empty red circles indicate the positions of the robot and the person at the last tracked frame, respectively. The trajectory of the robot between them was carried out under the LOP state. The filled blue circle indicates the end position of the robot in the searching state and the start of the re-tracking of the target. The movement of the robot between the empty red and the filled blue circles was carried out under the searching state.

4.4. Comparison with Previous Approaches

As mentioned in Section 1, there has been significant research on human following by mobile robots. It is generally difficult to directly compare the performance of different robotic systems due to several factors, such as different functional scopes, non-uniformity of the sensors/hardware used, the unavailability of a common dataset, and non-identical operating conditions. However, in order to clarify our technical contributions, we do our best to compare our system with previous significant approaches, such as those by Koide and Miura [17] and Lee et al. [41], at the function or module level rather than the full system level.
Developed when deep learning was not yet common, the system of Koide and Miura demonstrated full-scale human following that worked in both indoor and outdoor situations. They utilized various features such as appearance, color, height, and gait, combining these features with online boosting to achieve robust human following under severe illumination conditions. On the other hand, Lee et al. used deep learning for indoor human following, as we do, adopting a sophisticated Bayesian-based target trajectory prediction. However, they did not examine occlusions by other people and illumination changes as extensively as we do.
The successful tracking rate, defined as the ratio of the number of successful frames (where the target was correctly tracked) to the total number of frames [13], is critical for successful human following. As shown in Table 1, the successful tracking rate of our indoor human following was about 92%, while Koide’s approach showed 81.0% for their outdoor following. Unfortunately, Koide and Miura did not provide the tracking rate of the indoor following in their paper. Although the mission environments are different, our approach showed better performance in tracking the target during the following. Although Lee et al. addressed similar indoor following, we cannot compare with them because they did not report their tracking rate.
The computation time of the overall human following that we propose is approximately 0.042 s per frame, as shown in Table 1. According to their paper, Koide’s method took 0.5 s, which was much longer than ours, probably because they combined multiple features for both indoor and outdoor use and did not use recent techniques such as deep learning. According to their paper, Lee’s method took 0.3 s, which was also much longer than ours, probably because they used Bayesian-based target trajectory prediction (a sophisticated function that we do not have yet).
The mission success rate, showing how successfully the robot followed the target to the destination, was 88.9% in our case, as shown in Table 1. Other papers did not provide any information about the rate. The high mission success rate is important for robot developments because it shows how well the robot system is integrated for repeatability. Although we need to work harder to improve our human following, our system is fast, robust and repeatable in the event of occlusions and illumination changes.

5. Conclusions

This paper presents a novel framework that integrates a deep learning technique and state-machine control to develop a reliable and repeatable human-following robot. People are detected and tracked using an SSD detector, which is robust to occlusion. The target person is identified by extracting a color feature using an HS-histogram from the video sequence, which is robust to illumination changes. The robot follows the target safely to the destination using SLAM with the LIDAR sensor for obstacle avoidance. The contributions of this work are as follows. First, we designed an efficient and repeatable robotic state-machine control so that the robot stays in an active state of human following without freezing. Second, we adopted a vision algorithm robust to occlusions and illumination changes by integrating a color feature with the deep learning algorithm. Third, we developed a robust human-following system and verified it in realistic indoor situations.
Despite the high success rate of the proposed approach, it has some limitations. For instance, person identification is fully dependent on the color feature, which becomes meaningless when people wear clothes of the same or similar colors. In addition, the proposed system is weak in the case of severe illumination changes. The proposed method has numerous applications for mobile robots, such as carrying luggage or belongings for the elderly or disabled or delivering goods in grocery stores or airports. As possible future improvements, we are currently working on predicting the target’s trajectories with sophisticated algorithms, such as Markov and Bayesian models, instead of random search. In addition, we plan to use more human features beyond color, such as texture and shape, to make the system robust to occlusions and illumination changes in various situations.

Author Contributions

R.A. developed the experimental setup, realized the tests, coded the software necessary for the acquisition of the data from sensors, realized the software necessary for the statistical analysis of the delivered data, and prepared the manuscript. M.-T.C. provided guidance during the whole research, helping to set up the concept, design the framework, analyze the results and review the manuscript. All authors have contributed significantly, and have participated sufficiently to take responsibility for this research. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Technology Innovation Program (20003762) funded By the Ministry of Trade, Industry & Energy (MOTIE, Korea).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this paper:
FRP: Freezing Robot Problem
RGB-D: Red, Green, Blue-Depth
HSV: Hue, Saturation, and Value
FoV: Field of View
SSD: Single Shot Detector
GPU: Graphics Processing Unit
CPU: Central Processing Unit
LOP: Last Observed Position
SLAM: Simultaneous Localization and Mapping
ROS: Robot Operating System
CNNs: Convolutional Neural Networks
YOLO: You Only Look Once
R-CNN: Regions with Convolutional Neural Network
SIFT: Scale Invariant Feature Transformation
HOG: Histograms of Oriented Gradient
ROI: Region of Interest
PID: Proportional Integral Derivative
WC: Walking Counter
Exp.: Experiment
Elev.: Elevator
lab: Laboratory
ADHD: Attention-Deficit Hyperactivity Disorder
fps: Frame per Second

References

  1. Triebel, R.; Arras, K.; Alami, R.; Beyer, L.; Breuers, S.; Chatila, R.; Chetouani, M.; Cremers, D.; Evers, V.; Fiore, M.; et al. Spencer: A Socially Aware Service Robot for Passenger Guidance and Help in Busy Airports; Field and Service Robotics; Springer: Berlin, Germany, 2016; pp. 607–622. [Google Scholar]
  2. Tomoya, A.; Nakayama, S.; Hoshina, A.; Sugaya, M. A mobile robot for following, watching and detecting falls for elderly care. Procedia Comput. Sci. 2017, 112, 1994–2003. [Google Scholar] [CrossRef]
  3. Janai, J.; Güney, F.; Behl, A.; Geiger, A. Computer vision for autonomous vehicles: Problems, datasets and state-of-the-art. arXiv 2017, arXiv:1704.05519. [Google Scholar]
  4. Mabrouk, A.B.; Zagrouba, E. Abnormal behavior recognition for intelligent video surveillance systems: A review. Expert Syst. Appl. 2018, 91, 480–491. [Google Scholar] [CrossRef]
  5. Thomas, G.; Gade, R.; Moeslund, T.B.; Carr, P.; Hilton, A. Computer vision for sports: Current applications and research topics. Comput. Vis. Image Underst. 2017, 159, 3–18. [Google Scholar] [CrossRef]
  6. Trautman, P.; Ma, J.; Murray, R.M.; Krause, A. Robot navigation in dense human crowds: Statistical models and experimental studies of human–robot cooperation. Int. J. Robot. Res. 2015, 34, 335–356. [Google Scholar] [CrossRef] [Green Version]
  7. Schlegel, C.; Illmann, J.; Jaberg, H.; Schuster, M.; Wörz, R. Vision Based Person Tracking with a Mobile Robot; BMVC: London, UK, 1998; pp. 1–10. [Google Scholar]
  8. An, N.; Sun, S.Y.; Zhao, X.G.; Hou, Z.G. Online context-based person re-identification and biometric-based action recognition for service robots. In Proceedings of the 2017 29th Chinese Control And Decision Conference (CCDC), Istanbul, Turkey, 12–15 October 2017; pp. 3369–3374. [Google Scholar]
  9. Bellotto, N.; Hu, H. Multisensor-based human detection and tracking for mobile service robots. IEEE Trans. Syst. Man Cybern. Part B 2008, 39, 167–181. [Google Scholar] [CrossRef] [Green Version]
  10. Choi, M.T.; Yeom, J.; Shin, Y.; Park, I. Robot-Assisted ADHD Screening in Diagnostic Process. J. Intell. Robot. Syst. 2019, 95, 351–363. [Google Scholar] [CrossRef]
  11. Li, H.; Zhao, Q.; Li, X.; Zhang, X. Object detection based on color and shape features for service robot in semi-structured indoor environment. Int. J. Intell. Robot. Appl. 2019, 3, 430–442. [Google Scholar] [CrossRef]
  12. Ren, Q.; Zhao, Q.; Qi, H.; Li, L. Real-time target tracking system for person-following robot. In Proceedings of the 2016 35th Chinese Control Conference (CCC), Chengdu, China, 27–29 July 2016; pp. 6160–6165. [Google Scholar]
  13. Gupta, M.; Kumar, S.; Behera, L.; Subramanian, V.K. A novel vision-based tracking algorithm for a human-following mobile robot. IEEE Trans. Syst. Man Cybern. Syst. 2016, 47, 1415–1427. [Google Scholar] [CrossRef]
  14. Gross, H.M.; Scheidig, A.; Debes, K.; Einhorn, E.; Eisenbach, M.; Mueller, S.; Schmiedel, T.; Trinh, T.Q.; Weinrich, C.; Wengefeld, T.; et al. ROREAS: robot coach for walking and orientation training in clinical post-stroke rehabilitation—Prototype implementation and evaluation in field experiments. Auton. Robot. 2017, 41, 679–698. [Google Scholar] [CrossRef]
  15. Oishi, S.; Kohari, Y.; Miura, J. Toward a robotic attendant adaptively behaving according to human state. In Proceedings of the 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), New York, NY, USA, 26–31 August 2016; pp. 1038–1043. [Google Scholar]
  16. Ferrer, G.; Zulueta, A.G.; Cotarelo, F.H.; Sanfeliu, A. Robot social-aware navigation framework to accompany people walking side-by-side. Auton. Robot. 2017, 41, 775–793. [Google Scholar] [CrossRef] [Green Version]
  17. Koide, K.; Miura, J. Identification of a specific person using color, height, and gait features for a person following robot. Robot. Auton. Syst. 2016, 84, 76–87. [Google Scholar] [CrossRef]
  18. Brunetti, A.; Buongiorno, D.; Trotta, G.F.; Bevilacqua, V. Computer vision and deep learning techniques for pedestrian detection and tracking: A survey. Neurocomputing 2018, 300, 17–33. [Google Scholar] [CrossRef]
  19. Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019. [Google Scholar] [CrossRef] [Green Version]
  20. Kim, P.; Chen, J.; Cho, Y.K. SLAM-driven robotic mapping and registration of 3D point clouds. Autom. Constr. 2018, 89, 38–48. [Google Scholar] [CrossRef]
  21. Abdelwhab, A.; Viriri, S. A survey on soft biometrics for human identification. Mach. Learn. Biom. 2018. [Google Scholar] [CrossRef]
  22. Rusu, R.B.; Cousins, S. 3d is here: Point cloud library (pcl). In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 1–4. [Google Scholar]
  23. Balsa-Comerón, J.; Guerrero-Higueras, Á.M.; Rodríguez-Lera, F.J.; Fernández-Llamas, C.; Matellán-Olivera, V. Cybersecurity in Autonomous Systems: Hardening ROS Using Encrypted Communications and Semantic Rules; Iberian Robotics Conference; Springer: Berlin, Germany, 2017; pp. 67–78. [Google Scholar]
  24. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788. [Google Scholar]
  25. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99. [Google Scholar]
  26. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single Shot Multibox Detector; European Conference on Computer Vision; Springer: Berlin, Germany, 2016; pp. 21–37. [Google Scholar]
  27. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  28. Munaro, M.; Menegatti, E. Fast RGB-D people tracking for service robots. Auton. Robot. 2014, 37, 227–242. [Google Scholar] [CrossRef]
  29. Lowe, D.G. Object Recognition From Local Scale-Invariant Features; ICCV: Piscataway, PA, USA, 1999; Volume 99, pp. 1150–1157. [Google Scholar]
  30. Satake, J.; Chiba, M.; Miura, J. Visual person identification using a distance-dependent appearance model for a person following robot. Int. J. Autom. Comput. 2013, 10, 438–446. [Google Scholar] [CrossRef] [Green Version]
  31. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar]
  32. Cheng, X.; Jia, Y.; Su, J.; Wu, Y. Person-following for Telepresence Robots Using Web Cameras. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 2096–2101. [Google Scholar]
  33. Jangbari, P.; Patel, D. Review on region of interest coding techniques for medical image compression. Int. J. Comput. Appl. 2016, 134, 1–5. [Google Scholar] [CrossRef]
  34. Kim, M.; Arduengo, M.; Walker, N.; Jiang, Y.; Hart, J.W.; Stone, P.; Sentis, L. An architecture for person-following using active target search. arXiv 2018, arXiv:1809.08793. [Google Scholar]
  35. Zisserman, R.H.A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  36. Henry, P.; Krainin, M.; Herbst, E.; Ren, X.; Fox, D. RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments. Int. J. Rob. Res. 2012, 31, 647–663. [Google Scholar] [CrossRef] [Green Version]
  37. Geng, J. Structured-light 3D surface imaging: A tutorial. Adv. Opt. Photonics 2011, 3, 128–160. [Google Scholar] [CrossRef]
  38. Foote, T. tf: The transform library. In Proceedings of the 2013 IEEE Conference on Technologies for Practical Robot Applications (TePRA), Woburn, MA, USA, 22–23 April 2013; pp. 1–6. [Google Scholar]
  39. Grisetti, G.; Stachniss, C.; Burgard, W. Improved techniques for grid mapping with rao-blackwellized particle filters. IEEE Trans. Robot. 2007, 23, 34. [Google Scholar] [CrossRef] [Green Version]
  40. O’Dwyer, A. Handbook of PI and PID Controller Tuning Rules; Imperial College Press: London, UK, 2009. [Google Scholar]
  41. Lee, B.J.; Choi, J.; Baek, C.; Zhang, B.T. Robust Human Following by Deep Bayesian Trajectory Prediction for Home Service Robots. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 7189–7195. [Google Scholar]
Figure 1. State machine diagram for the proposed human following system.
Figure 2. Framework of identification system.
Figure 3. MobileNet-SSD network architecture.
Figure 4. Human detection showing bounding boxes around the human’s body and the region of interest. (a) Human detection. (b) Region of interest for color extraction.
Figure 5. Coordinate transformation of a mobile robot. (a) A mobile robot mounted with a fixed camera on a 2D plane. (b) A view of the standard transform library (tf) tree in the tf rviz plugin.
Figure 6. Mobile robot equipped with an RGB-D camera and range finding sensor.
Figure 7. Realistic scenario of paths of the robot and the target in human following.
Figure 8. Snapshots of the robot’s field of view while performing the human following experiments: (a) It begins to follow the target wearing the white hooded t-shirt. (b) The target is walking out of the laboratory, and another person is standing in the corridor outside the door. (c) The robot loses the target momentarily at the laboratory’s door. (d) The robot re-tracks the target immediately. (e) The robot follows well while another person is walking in the opposite direction. (f) The robot follows well despite illumination changes while the other person is standing in the middle of the corridor.
Figure 9. Robot and person movement trajectories in a realistic scenario for tracking state. (a) Relative position between the target and the robot in the x-coordinate. (b) Relative position between the target and the robot in the y-coordinate. (c) Linear velocity. (d) Rotational velocity.
Figure 10. Realistic scenario of paths of the robot and the target for recovery tests.
Figure 11. Robot and person movement trajectories in a realistic scenario for two states. (a) Robot and person trajectories along the x coordinate. (b) Robot and person trajectories along the y coordinate. (c) Robot and person trajectories in the world coordinate (2D) for two states.
Figure 12. Robot and person trajectories in the world coordinate (2D) for three states.
Table 1. Experimental results of human following.
Parameters | Exp. 1 | Exp. 2 | Exp. 3 | Exp. 4 | Exp. 5 | Exp. 6 | Exp. 7
Mission success rate (%) | O | O | O | O | O | O | x
Target’s travel distance (m) | 28.06 | 27.93 | 28.08 | 28.42 | 27.97 | 27.74 | 0.00
Robot’s travel distance (m) | 25.77 | 26.95 | 26.86 | 27.65 | 26.88 | 26.46 | 18.58
Robot’s travel time (s) | 44.28 | 42.16 | 43.44 | 45.68 | 42.18 | 42.39 | 68.96
Robot’s average velocity (m/s) | 0.58 | 0.64 | 0.62 | 0.61 | 0.64 | 0.62 | 0.27
# of frames | 1042 | 1024 | 1022 | 1120 | 932 | 992 | 1507
Successfully tracked (frames) | 1030 | 997 | 1010 | 1108 | 928 | 982 | 679
Lost track of the target (frames) | 12 | 27 | 12 | 12 | 4 | 10 | 828
Successfully tracked (s) | 43.77 | 41.04 | 42.93 | 45.19 | 42.00 | 41.96 | 31.07
Lost track of the target (s) | 0.51 | 1.11 | 0.51 | 0.49 | 0.18 | 0.43 | 37.89
Lost track of the target rate (%) | 1.2 | 2.6 | 1.2 | 1.1 | 0.4 | 1.0 | 54.9
Successful tracking rate (%) | 98.8 | 97.4 | 98.8 | 98.9 | 99.6 | 99.0 | 45.1
Computation time (fps) | 23.53 | 24.29 | 23.53 | 24.52 | 22.10 | 23.40 | 21.85

Parameters | Exp. 8 | Exp. 9 | Exp. 10 | Exp. 11 | Exp. 12 | Exp. 13 | Exp. 14
Mission success rate (%) | x | O | O | O | O | O | O
Target’s travel distance (m) | 0.00 | 27.93 | 27.37 | 27.40 | 28.79 | 27.85 | 28.05
Robot’s travel distance (m) | 10.94 | 27.08 | 26.43 | 25.89 | 27.21 | 26.45 | 25.82
Robot’s travel time (s) | 39.93 | 44.53 | 44.27 | 41.22 | 42.82 | 44.58 | 42.90
Robot’s average velocity (m/s) | 0.27 | 0.61 | 0.60 | 0.63 | 0.64 | 0.59 | 0.60
# of frames | 961 | 1042 | 987 | 999 | 1026 | 1017 | 991
Successfully tracked (frames) | 479 | 969 | 983 | 994 | 994 | 941 | 973
Lost track of the target (frames) | 482 | 73 | 4 | 5 | 32 | 76 | 18
Successfully tracked (s) | 19.90 | 41.41 | 44.09 | 41.01 | 41.48 | 41.25 | 42.12
Lost track of the target (s) | 20.03 | 3.12 | 0.18 | 0.21 | 1.34 | 3.33 | 0.78
Lost track of the target rate (%) | 50.2 | 7.0 | 0.4 | 0.5 | 3.1 | 7.5 | 1.8
Successful tracking rate (%) | 49.8 | 93.0 | 99.6 | 99.5 | 96.9 | 92.5 | 98.2
Computation time (fps) | 24.07 | 23.40 | 22.29 | 24.24 | 23.96 | 22.81 | 23.10

Parameters | Exp. 15 | Exp. 16 | Exp. 17 | Exp. 18 | Total | Average | Std
Mission success rate (%) | O | O | O | O | 88.9% | - | -
Target’s travel distance (m) | 26.68 | 28.17 | 28.08 | 26.37 | 444.89 | 24.72 | 9.01
Robot’s travel distance (m) | 25.80 | 26.83 | 25.35 | 23.86 | 450.80 | 25.04 | 4.06
Robot’s travel time (s) | 44.06 | 42.15 | 41.72 | 39.87 | 797.12 | 44.28 | 6.36
Robot’s average velocity (m/s) | 0.59 | 0.64 | 0.61 | 0.60 | - | 0.57 | 0.11
# of frames | 984 | 1029 | 940 | 948 | 18,563.0 | 1031.3 | 126.78
Successfully tracked (frames) | 956 | 1003 | 923 | 908 | 16,857.0 | 936.5 | 141.83
Lost track of the target (frames) | 28 | 26 | 17 | 40 | 1706.0 | 94.8 | 213.28
Successfully tracked (s) | 42.81 | 41.09 | 40.96 | 38.18 | 722.27 | 40.13 | 5.86
Lost track of the target (s) | 1.25 | 1.07 | 0.75 | 1.68 | 74.85 | 4.16 | 9.57
Lost track of the target rate (%) | 2.8 | 2.5 | 1.8 | 4.2 | - | 8.02 | 0.16
Successful tracking rate (%) | 97.2 | 97.5 | 98.2 | 95.8 | - | 91.98 | 0.16
Computation time (fps) | 22.33 | 24.41 | 22.53 | 23.78 | - | 23.34 | 0.85
std: standard deviation, fps: Frame per Second.
Table 2. Overall experimental results without other people.
Parameters | Exp. 1 | Exp. 2 | Exp. 3 | Exp. 4 | Exp. 5 | Exp. 6 | Exp. 7
Mission success rate (%) | O | O | O | O | O | O | x
Target’s travel distance (m) | 32.931 | 27.401 | 28.906 | 28.005 | 30.94 | 28.819 | 0.00
Robot’s travel distance (m) | 30.152 | 26.649 | 27.455 | 27.162 | 27.931 | 27.522 | 14.641
Robot’s travel time (s) | 77.567 | 52.433 | 64.566 | 61.687 | 71.067 | 62.706 | 44.984
Robot’s average velocity (m/s) | 0.39 | 0.51 | 0.43 | 0.44 | 0.39 | 0.44 | 0.33
# of frames | 1699 | 1341 | 1537 | 1537 | 1913 | 1502 | 1153
Successfully tracked (frames) | 1280 | 805 | 624 | 727 | 663 | 840 | 214
Lost track of the target (frames) | 419 | 536 | 913 | 810 | 1250 | 662 | 939
Successfully tracked (s) | 58.438 | 31.475 | 26.213 | 29.178 | 24.630 | 35.069 | 8.349
Lost track of the target (s) | 19.129 | 20.958 | 38.353 | 32.509 | 46.437 | 27.637 | 36.635
Lost track of the target rate (%) | 0.25 | 0.40 | 0.59 | 0.53 | 0.65 | 0.44 | 0.81
Successful tracking rate (%) | 0.75 | 0.60 | 0.41 | 0.47 | 0.35 | 0.56 | 0.19
Computation time (fps) | 21.90 | 25.58 | 23.81 | 24.92 | 26.92 | 23.95 | 25.63

Parameters | Exp. 8 | Exp. 9 | Exp. 10 | Exp. 11 | Total | Average | Std
Mission success rate (%) | O | O | O | O | 90.91% | - | -
Target’s travel distance (m) | 28.562 | 27.287 | 27.164 | 26.977 | 286.992 | 26.09 | 8.84
Robot’s travel distance (m) | 27.435 | 26.492 | 26.345 | 26.214 | 287.998 | 26.18 | 3.98
Robot’s travel time (s) | 71.974 | 59.975 | 51.331 | 55.327 | 673.617 | 61.24 | 9.84
Robot’s average velocity (m/s) | 0.38 | 0.44 | 0.51 | 0.47 | - | 0.43 | 0.06
# of frames | 1309 | 1333 | 1142 | 1096 | 15,562 | 1414.73 | 251.87
Successfully tracked (frames) | 1043 | 833 | 924 | 863 | 8816 | 801.45 | 266.45
Lost track of the target (frames) | 266 | 500 | 218 | 233 | 6746 | 613.27 | 334.73
Successfully tracked (s) | 57.348 | 37.479 | 41.532 | 43.565 | 393.276 | 35.75 | 14.51
Lost track of the target (s) | 14.626 | 22.496 | 9.799 | 11.762 | 280.341 | 25.49 | 11.85
Lost track of the target rate (%) | 0.20 | 0.38 | 0.19 | 0.21 | - | 0.42 | 0.21
Successful tracking rate (%) | 0.80 | 0.62 | 0.81 | 0.79 | - | 0.58 | 0.21
Computation time (fps) | 18.19 | 22.23 | 22.25 | 19.81 | - | 23.20 | 2.63
std: standard deviation, fps: Frame per Second.
