The field of robotics has found numerous applications in recent years. Earlier, robots were predominantly used for tedious and repetitive tasks, such as manufacturing and transportation. A new generation of more intelligent robots, however, has been shown to benefit several other industries, including the service, medical, and entertainment sectors. In particular, robots capable of interacting with humans through natural behavior have begun to emerge extensively due to their close proximity to humans.
Several methods for establishing human–machine interaction (HMI) have been explored in the literature. Gesture recognition refers to the recognition of human expressions through hand, head, and/or body movements. In recent years, gesture recognition has been one of the most actively studied research areas, with new methods and applications for interacting with medical, service, and entertainment devices. For example, Greatzel et al. [1] developed a system that replaces standard computer mouse operation with hand gestures using a computer vision algorithm. The system is designed to establish non-contact human–computer interaction (HCI), which helps surgeons use computers during surgery. With this method, a surgeon can operate the non-contact mouse by holding his/her hand stationary in the workspace for a moment and then moving it to the desired position. The system uses a Kalman filter to estimate hand velocity and predict the next hand position during movement.
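To make the prediction step in [1] concrete, the following is a minimal sketch of a constant-velocity Kalman filter for hand tracking; the state layout, the assumed 30 Hz frame rate, and the noise covariances are illustrative choices on our part, not values reported in [1].

```python
import numpy as np

dt = 1.0 / 30.0                       # assumed camera frame rate (30 Hz)
F = np.array([[1, 0, dt, 0],          # constant-velocity state transition
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],           # only the hand position is observed
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-3                  # process noise covariance (assumed)
R = np.eye(2) * 1e-2                  # measurement noise covariance (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle; x = [px, py, vx, vy], z = measured (px, py)."""
    # Predict: propagate state and covariance one frame ahead. The predicted
    # position H @ x is where the hand is expected to appear next.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: correct the prediction with the new measurement.
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Example: track a hand moving right at roughly constant speed.
x, P = np.zeros(4), np.eye(4)
for t in range(5):
    x, P = kalman_step(x, P, z=np.array([0.1 * t, 0.0]))
print("estimated velocity:", x[2:])
```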
In another study, an intelligent wheelchair that can be controlled by the user's hand gestures was presented by Yoshinori et al. [2]. This wheelchair can detect the owner's face and follow their hand gestures to perform actions. The system allows the user to command the robotic wheelchair through hand gestures to approach or move away from them when they are not riding it. An Eigenface-based face recognition method is used to detect the owner, and hand gesture commands are received through a spotting recognition method. Experiments conducted in several settings confirmed the operation and usefulness of the system, with reported success in detecting the owner's face and following hand gesture commands.
As with the medical field, service robotics is another field where gestures are used to control and command robots to achieve various tasks. For example, David et al. [3] proposed an approach for using showing and pointing hand gestures with a domestic service robot equipped with a time-of-flight camera. The study evaluated the showing and pointing gestures through a set of experiments and reported higher accuracy than existing stereo-based systems. The authors tested the system in real time with their domestic service robot, which competed in a robotics competition. During this event, the robot successfully identified an object through the user's pointing gestures, then approached and delivered the object to the user. In another setting, the robot recognized the object shown by the user and delivered a similar object located in its environment. In laboratory-based testing, 16 participants indicated 24 different objects through pointing gestures a total of 192 times, and the robot correctly identified the gestures 187 times, an accuracy of 97%. Yin and Xie [4] proposed a set of hand gesture-based commands for a humanoid service robot, HARO-1, to control its arm movements and turn-taking. Six hand gestures for controlling the robot's six-axis arms and two hand gestures for turning the robot clockwise and anti-clockwise were used in this method. A neural network was used to segment captured hand images, and the hand posture was recognized using a topological feature extraction method. The system was successfully integrated with the service robot HARO-1, demonstrating the effectiveness and robustness of the approach. In another study, Lee [5] presented a new method for recognizing whole-body gestures to operate a service robot. Unlike other gesture-controlled robots, which use only a part of the human body, this method spots and recognizes key gestures of the whole body to command the robot. The system was integrated into T-Rot, a personal service robot, for real-time evaluation. With an accuracy of 97.4%, the system reported success in recognizing whole-body gestures, such as 'sitting on the floor' and 'getting down on the floor'.
Robot entertainment, a field of the entertainment industry, uses a variety of semi-autonomous and autonomous robots to entertain users through gestures. Establishing natural interaction between humans and robots is essential for this class of robots due to their closeness with humans during deployment. Vision-, voice-, and gesture-based interactions are among the most desired interaction methods explored in entertainment robots. In particular, gesture-based entertainment robots have reported success in establishing human–robot interaction (HRI) more effectively than other explored methods. As an illustration, Hasanuzzaman et al. [6] presented a vision-based gesture recognition system using skin color segmentation and a pattern matching technique to create HRI between the entertainment robot AIBO and the user. The study trained the robot to recognize eight different hand gestures from the user and perform corresponding actions, such as 'stand', 'walk forward', and 'sit'. Another study by Hasanuzzaman et al. [7] described a gesture-based, human-centric HRI system using the Software Platform for Agent and Knowledge management (SPAK). The system uses face and gesture recognition to identify the user and the actions corresponding to that user's gestures. With this human-centric HRI system, the robot can perform different actions for the same gesture, depending on the user identified through the face recognition system. Several other robots, such as ROBITA, Robonaut, and Leonardo, use gestures to interact with humans [8].
Head pose detection is one of the gesture recognition techniques used in a variety of applications. It has been applied in fields such as robotics [9], computer engineering [10], physical science and the health industry [12], natural sciences [13], and industrial academic areas [14]. As an illustration, Sileye and Jean-Marc [17] deployed head pose detection using a Hidden Markov Model (HMM) to recognize the visual focus of attention of participants in meetings.
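As a rough illustration of how an HMM can map head pose observations to attention states in the spirit of [17], consider the following sketch; the hmmlearn library, the two-dimensional pan/tilt observations, and the three hidden states are our assumptions, not details of that study.

```python
import numpy as np
from hmmlearn import hmm

# Observations: per-frame head pose estimates (pan, tilt) in degrees.
# Hidden states: coarse focus-of-attention targets (e.g., speaker,
# table, elsewhere); three states are assumed purely for illustration.
rng = np.random.default_rng(0)
observations = rng.normal(size=(300, 2))   # stand-in for real pose tracks

model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
model.fit(observations)                    # unsupervised fit of the HMM

# Decode the most likely focus-of-attention state for each frame.
states = model.predict(observations)
print(states[:20])
```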
Eric and Mohan [18] presented a vision-based head pose detection and tracking method for monitoring driver awareness, proposing it as a way to monitor driver alertness and head pose orientation while driving. Javier and Patricio [19] proposed head pose detection between robots, in which an observing robot decides its next action: one robot acts as a performer and the other as an observer, and the observer processes changes in the performer's head pose to perform actions. Head pose detection and recognition systems have found wider application areas, such as face recognition, action recognition, gait recognition, head recognition, and hand recognition systems [20]. In such systems, several sensors, such as binary, digital, and depth cameras, are used to train and detect postures [24], and several machine learning feature extraction algorithms and classification methods are implemented for the detection and recognition of gestures. For instance, Samina et al. [31] analyzed several feature selection and extraction methods and presented their effectiveness in achieving high performance of learning algorithms. A real-time tracking system for human pose recognition was proposed by Jalal et al. [32] using ridge body part features, in which a support vector machine (SVM) was used to recognize different poses. In another study, a novel subspace learning algorithm, called discriminant simplex analysis (DSA), was developed by Fu et al. [33], in which intraclass compactness and interclass separability were measured by distances.
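A minimal sketch of SVM-based pose classification in the spirit of Jalal et al. [32] is shown below; the feature dimensionality, the random stand-in data, and the RBF kernel are illustrative assumptions, as we do not reproduce their ridge body-part features here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# X: one feature vector per frame; random data stands in for real pose
# features (e.g., ridge body-part features). y: pose class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))        # 64-dimensional features (assumed)
y = rng.integers(0, 5, size=500)      # five pose classes (assumed)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)        # RBF kernel is a common default
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```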
Even though head pose detection has been studied in robotics for many years, there are applications in which it can be further studied for effective use. Pet robots are one area where head pose detection can be very useful in establishing human–robot interaction, but it has received only limited attention in the literature. The applications of pet robots are manifold, ranging across the medical, service, and entertainment industries. For example, a pet robot that we have developed has been used to reduce the stress levels of patients [34], improve the learning abilities of children [35], and entertain participants [36]. With such multi-industry applicability, pet robots with head pose detection can further improve closeness with humans and create effective human–robot interaction models. Although several pet robots have been designed and developed for various needs in the literature, wearable pet robots, and human–robot interaction models for them, are absent. Given the known benefits of wearable pets [37], designing a wearable pet robot can provide extensive research and application opportunities in several sectors. In this paper, we present the design and development of a wearable parrot-inspired pet robot, KiliRo, and its human–robot interaction model using vision-based head pose detection. The novelty of this paper is threefold: First, we introduce a new design and development of a wearable parrot-inspired pet robot. Second, we provide the design of a human–robot interaction model for wearable pet robots using a vision-based head pose detection method. Third, we quantitatively demonstrate the success of this system through head pose images captured from the robot wearer's shoulder in five different orientations.
The remainder of this paper is organized as follows: After presenting the system architecture of our KiliRo robot in Section 2, we outline our system, consisting of methods for detecting and perceiving the head pose orientation of the person wearing the robot (Section 3). In Section 4, we present the experiments involving 1380 images of head poses in five different orientations to validate our approach. Lastly, in Section 5, we conclude this study and discuss future work.
2. Robot Architecture
The main scope of this research study is to design and develop a wearable pet robot that can mimic the head pose orientation of the wearer. In terms of morphology, the KiliRo robot can be described as a two-legged wearable robot with a physical appearance resembling a parrot.
We considered a set of design constraints, primarily on the size and weight of the robot, in deciding its dimensions during the concept generation process.
After a series of brainstorming sessions on concept generation and selection, we developed the wearable pet robot KiliRo, centered on a head design capable of 180 degrees of rotation. The curvature of the robot's leg design was optimized to create a wearable form. The dimensions and weight of the robot played a vital role in the wearable design, as the robot is worn by the user in most cases. The dimensions of KiliRo-W and the selection of commercial devices, such as servo motors and electronic boards, were chosen to fit the design constraints on size and weight. The robot has three parts: head, body, and wings. The neck connects the head and body, and a static tail is attached at the top of the head for aesthetic appeal. The feet were designed with inspiration from parrots and modified to suit the wearable design; they attach to the body part. The robot parts were designed to be hollow to minimize the weight and optimize the amount of three-dimensionally printed material. The specifications of the mechanical properties of the wearable parrot robot are listed in Table 1.
The robot's head is mounted with two servo motors (SG90, manufactured by TowerPro) to provide pitch and yaw motions. The robot can turn its head 90° left and right from the center, and move it up and down. The resulting design can mimic five head positions of the wearer, namely straight, left, right, left intermediate, and right intermediate, at angles of 0°, 90°, −90°, 45°, and −45°, respectively. The exploded view of the KiliRo robot is presented in Figure 1, and its physical architecture is illustrated in Figure 2. The head rotation positions and the schematic of the robot are presented in Figure 3 and Figure 4, respectively. The initial head position of the robot is set at 0°; when it detects the wearer's head position as left, it turns 90°, and it turns −90° when the wearer turns their head toward the right. Similarly, 45° and −45° are achieved when the robot detects the left intermediate and right intermediate positions of the wearer's head. Even though the robot can achieve pitch motion of its head, this was not deployed during this study. The robot uses the camera mounted on its head to detect the wearer's head position, processes the images using a Raspberry Pi 3 single-board computer, and actuates its head accordingly. The list of hardware used in the robot is presented in Table 2. A TREK Ai-Ball portable Wi-Fi camera was used as the imaging sensor for the KiliRo robot. This camera can capture images at 30 Hz with a maximum resolution of 640 × 480 and has a focal range from 20 cm to infinity with a view angle of 300°.
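To make the actuation logic concrete, the sketch below maps the five detected head pose classes to yaw angles of the SG90 servo on the Raspberry Pi; the GPIO pin number, the class numbering, and the duty-cycle calibration are our assumptions rather than details of the robot's actual firmware.

```python
import time
import RPi.GPIO as GPIO

YAW_PIN = 18                 # hypothetical GPIO pin driving the yaw servo

# Detected head-pose class -> yaw angle (deg), following the five
# positions described above; the class numbering itself is assumed.
CLASS_TO_ANGLE = {0: 0, 1: 90, 2: -90, 3: 45, 4: -45}

GPIO.setmode(GPIO.BCM)
GPIO.setup(YAW_PIN, GPIO.OUT)
pwm = GPIO.PWM(YAW_PIN, 50)  # SG90 servos expect a 50 Hz control signal
pwm.start(7.5)               # ~7.5% duty cycle centres the servo at 0 deg

def set_yaw(angle_deg):
    """Move the head to angle_deg in [-90, 90] (calibration approximate)."""
    duty = 7.5 + (angle_deg / 90.0) * 5.0   # ~2.5% at -90, ~12.5% at +90
    pwm.ChangeDutyCycle(duty)
    time.sleep(0.3)          # allow the servo time to reach the target
    pwm.ChangeDutyCycle(0)   # stop driving the line to reduce jitter

def mimic(head_pose_class):
    """Mirror the wearer's detected head pose with the robot's head."""
    set_yaw(CLASS_TO_ANGLE[head_pose_class])
```

Dropping the duty cycle to zero between moves is a common trick with inexpensive hobby servos, which otherwise chatter against small software PWM timing errors.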
After creating the database for head pose detection, the testing phase was performed. As in the learning phase, the images for the testing phase were captured with the same camera and from the same position on the wearer's shoulder. Illustrations of the five head pose orientations in the experimental phase are presented in Figure 10. Table 4 presents the results of the head pose detection system tested with 250 images. Overall, the system achieved an accuracy of 94.4%.
The results were analyzed by determining the probability of obtaining them if performance were simply random. For that purpose, the data can be presented as the number of correct responses (i.e., turning the head in the same direction as the human) and the number of incorrect responses (i.e., turning the head in the wrong direction). As there are five possible directions, the probability of getting the direction correct by chance is 0.20. For three of the required responses, Classes 0, 1, and 3, the robot emitted the correct response in 50 out of the maximum of 50 trials. According to a binomial distribution, the probability of obtaining this result by chance is near 0. For images corresponding to Class 2, performance was not flawless, and the robot emitted the correct response in 48 of the 50 trials. For Class 4 images, responses were the worst, with only 38 of the 50 trials correct. However, even in those two cases, the probability of obtaining this result simply due to chance is near 0. Performance is thus significantly better than chance (p < 0.0001) for all responses.
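These chance probabilities follow from the upper tail of a binomial distribution with n = 50 trials and success probability p = 0.2; a short sketch, assuming SciPy is available, reproduces them.

```python
from scipy.stats import binom

n, p = 50, 0.20   # 50 trials per class; 1-in-5 chance of a correct direction

# Upper-tail probability P(X >= k) under random responding, per class.
for label, correct in [(0, 50), (1, 50), (3, 50), (2, 48), (4, 38)]:
    tail = binom.sf(correct - 1, n, p)     # sf(k - 1) = P(X >= k)
    print(f"Class {label}: P(X >= {correct}) = {tail:.2e}")
```

Even the weakest result, 38 of 50 correct, has a tail probability many orders of magnitude below 0.0001, consistent with the significance claim above.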