Article

Hand-Guiding Gesture-Based Telemanipulation with the Gesture Mode Classification and State Estimation Using Wearable IMU Sensors

Mechanical Engineering Department, Soongsil University, Seoul 06978, Republic of Korea
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(16), 3514; https://doi.org/10.3390/math11163514
Submission received: 19 June 2023 / Revised: 24 July 2023 / Accepted: 26 July 2023 / Published: 14 August 2023
(This article belongs to the Topic Intelligent Systems and Robotics)

Abstract

This study proposes a telemanipulation framework based on two wearable IMU sensors that does not require human skeletal kinematics. First, the states (intensity and direction) of spatial hand-guiding gestures are separately estimated through the proposed state estimator, and these states are combined with the gesture’s mode (linear, angular, and via) obtained from the bi-directional LSTM-based mode classifier. The spatial pose of the 6-DOF manipulator’s end-effector (EEF) can then be controlled by combining spatial linear and angular motions based on the integrated gesture mode and state. To validate the significance of the proposed method, teleoperation of the EEF to designated target poses was conducted in a motion-capture space. As a result, it was confirmed that the mode could be classified with 84.5% accuracy in real time, even during the operator’s dynamic movement; that the direction could be estimated with an error of less than 1 degree; and that the intensity could be successfully estimated with the gesture speed estimator and finely tuned with the scaling factor. Finally, it was confirmed that a subject could place the EEF within an average of 83 mm and 2.56 degrees of the target pose using fewer than ten consecutive hand-guiding gestures and visual inspection on the first trial.

1. Introduction

Recently, as the spread of smart factories accelerates and the demand for more sophisticated manipulation grows, remote control and teleoperation technologies based on user-friendly and flexible gesture recognition have begun to attract attention [1]. For gesture-based remote control of a manipulator, the central research interest of this study, sufficiently high gesture recognition accuracy is an essential underlying technology for precise and accurate control. Sensors for hand-gesture-based remote control of manipulators can be categorized into vision (RGB) cameras, depth cameras, EMG sensors, IMU sensors, etc.
First, we examine research cases using vision sensors, such as RGB cameras and Kinect. In the study by C. Nuzzi et al. [2], data from 5 gesture classes collected through an RGB camera were classified with an accuracy of 92.6% using the R-CNN algorithm. However, they reported limitations such as light reflection, difficulty extracting the boundary between background and hand, and the limited working area of the camera FOV. To overcome these limitations of RGB cameras, W. Fang et al. [3] proposed a recognition method for 37 hand gestures using a CNN and DCGAN, collecting the gestures under various environmental conditions with artificial light sources. D. Jiang et al. [4] classified 24 hand gestures collected with Kinect with an accuracy of 93.63% using a CNN. However, like RGB sensors, the Kinect also suffers from a limited FOV and performs poorly under low illuminance [5].
Vision sensor-based gesture recognition methods are frequently employed for remote control of manipulators. One study [6] utilized Kinect V2 and OpenPose [7] to develop a real-time human–robot interaction framework for robot teaching through hand gestures, incorporating a background invariant robust hand gesture detector. The researchers employed a pre-trained state-of-the-art convolutional neural network (CNN), Inception V3, alongside the OpenSign dataset [8] to classify 10 hand gestures. With 98.9% accuracy in hand gesture recognition, they demonstrated gesture-based telemanipulation using an RGB camera. However, this approach requires users to memorize perceivable gestures for the robot, and the vision sensor’s depth range constrains its capabilities. In addition, the system has only been tested indoors and may struggle in bright light due to the resulting contrast in RGB images. In cases where Kinect’s skeletal information is used, researchers have successfully controlled the speed and steering of a mobile robot [9] and the position of a 5-axis manipulator [10]. Nevertheless, performance and usability issues often arise in research using vision sensors, such as limited field of view (FOV), light reflection, occlusion, and illumination. As a result, this approach is considered nearly infeasible in industrial settings, where operators must stand and perform gestures while facing the monitor screen.
EMG sensors’ limited controllable degrees of freedom (DOFs) and dependency on human kinematic models make them unsuitable for telemanipulation applications when used alone. Vogel [11] combined sEMG with Vicon motion-capture camera systems to record EMG signals from the wrist and pose information to remotely control the DLR LWR-III manipulator and train machine learning models. Furthermore, to minimize occlusion effects in gesture-based telemanipulation using only Kinect, an approach that combined hand posture recognition based on sEMG-derived biofeedback information was introduced [12]. In a study employing IMU and EMG sensors [13], six static hand motions were recognized and used to control a robot arm by mapping each motion to the corresponding robot arm movement.
In a study of IMU-based gesture recognition [14], six hand gestures were recognized with an average accuracy of 81.6%, and telemanipulation was achieved only through predefined motions mapped to each gesture. In a study using the operator’s skeletal kinematic model [15], omnidirectional manipulation was achieved by estimating the hand motion trajectory, although challenges persisted regarding uncertainties in pose estimation and differentiating between unintentional and intentional motions. However, in most cases of IMU-based motion recognition, if the operator’s initial body alignment determined just after sensor calibration does not hold, the accuracy of dynamic gesture recognition drops drastically. In a study on human motion tracking using a set of wearable IMU sensors [16], the joint positions between body segments were calculated in an earth-fixed frame rather than a body-fixed reference frame, following the reference method from the biomechanics domain [17]. In this case, the time-variant body-heading direction does not matter because the joint angles of the human body, essential features of the recognition model, are not affected by changes in the body-heading orientation. To apply this method, however, the segment axes must be determined segment by segment through predefined joint movements, such as pronation–supination for the upper-limb joints [18] and flexion–extension for the lower-limb joints [19]. Moreover, the relation of each segment to the global reference frame must be identified after estimating the relative pose of the sensor to the segment; only then can the joint position be calculated from two connecting segments. This procedure becomes time-consuming and inconvenient as the number of joints of interest increases.
Another approach secures a consistent reference inertial measurement frame in IMU sensor-based human motion analysis of the lower limb [20] and upper limb [21] with the simple stand–stooping sensor calibration gesture used in an earlier study [22]. Regarding the body-fixed frame, these studies addressed the critical issue of variations in the subject’s body alignment, which can affect the accuracy and consistency of motion analysis and gesture recognition using wearable IMU sensors. The researchers emphasized the importance of updating the body-fixed frame according to changes in the subject’s heading direction and body alignment. By doing so, the proposed method can effectively account for the time-varying nature of the body-fixed frame and maintain accurate gesture recognition, even when the wearer’s body alignment changes significantly. They also demonstrated the efficacy of the method through experimental evaluations, which showed that recognition performance remains robust even in the presence of substantial body alignment changes. These results suggest that the approach effectively addresses the challenges associated with body-fixed frame variations and can improve the utility of wearable inertial sensor-based systems for gesture recognition and related applications.
Thus, considering the method of the floating body-fixed frame [20,21], this study proposes a new hand-guiding gesture-based telemanipulation approach that addresses the limitations identified in previous hand-guiding gesture-based telemanipulation studies. As a result, this study makes the following research contributions:
  • This study proposes a novel spatial pose telemanipulation method for a 6-DOF manipulator that requires only two IMU sensors and no human skeletal kinematic model, based on an unprecedented combination of the gesture mode and states.
  • Consistent hand-guiding gesture mode classification and state estimation were successfully achieved by integrating the floating body-fixed frame method into the proposed telemanipulation method, even during the operator’s dynamic movements.
The rest of this paper is organized as follows. Section 2 describes the problem definition. Section 3 presents details of the proposed hand-guiding gesture-based manipulator remote-control method. Section 4 describes the experimental validation results, and Section 5 concludes the paper.

2. Problem Definition

As discussed earlier, studies on hand gesture-based telemanipulation using IMU sensors can be categorized into two approaches:
  • Gesture recognition model-based method: this approach enables robot arm control through specific hand gestures mapped to corresponding motions. However, it is limited in its capacity to control the robot arm in all directions.
  • Skeletal kinematic model-based method: this approach allows for omnidirectional control of the robot arm by replicating the hand movements of a human worker. Despite its versatility, it faces challenges implementing pure linear or angular motion and differentiating between intended and unintended control motions. Moreover, accurately discerning the operator’s motion intent without external sensor-based error feedback remains unattainable.
In this study, as illustrated in Figure 1, a single hand-mounted IMU sensor is employed to estimate hand gesture states, including intensity and direction. Concurrently, hand gesture modes are classified using a bi-directional LSTM. Omnidirectional control of a spatial manipulator with six or more degrees of freedom is achieved by combining the corresponding mode and state. Moreover, by utilizing a real-time pose-tracking controller, the operator can update the target pose at desired moments, even while the manipulator moves towards the target pose, creating a natural trajectory up to the final target pose.
Figure 1 illustrates the framework of the hand-guiding gesture-based teleoperation strategy. The operator fastens wearable IMU sensors to the pelvis and hand. The pelvis-mounted sensor is essential for updating the orientation of {FBf}, a virtually created body reference frame, in response to changes in the operator’s heading direction. The hand-mounted IMU sensor is used to classify the operator’s hand gesture mode and estimate its states. If the operator keeps the initial alignment with {FBf} constant, the pelvis-mounted IMU sensor is not needed. The hand-mounted IMU sensor’s output is expressed in the {FBf} frame and has an update rate of 100 Hz. A bi-directional LSTM model is employed to classify the three gesture modes; the model has a time horizon of 400 ms and classifies the gesture mode every 0.01 s. Ultimately, by combining the mode and state of the hand-guiding gesture, the target pose is generated from three modes: the unintentional gesture, the linear gesture for pure translational motion, and the angular gesture for pure rotational motion of the EEF. These three modes allow for the following executions of the manipulator’s spatial pose control (a minimal dispatch sketch follows the list).
  • (Operator side) Linear gesture → (Robot side) spatial translational motion;
  • (Operator side) Angular gesture → (Robot side) spatial rotational motion;
  • (Operator side) Unintentional gesture → (Robot side) zero motion.
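To make the mode and state combination concrete, the following minimal sketch (in Python, with placeholder names that are not taken from the authors’ implementation) shows how a classified mode and an estimated state could be dispatched to an EEF command:

```python
# Illustrative dispatch of the classified gesture mode and estimated state
# (direction, intensity) to an EEF command; names are placeholders only.
def gesture_to_eef_command(mode, direction, intensity):
    if mode == "linear":        # pure spatial translation of the EEF
        return {"type": "translate", "axis": direction, "magnitude": intensity}
    if mode == "angular":       # pure spatial rotation of the EEF
        return {"type": "rotate", "axis": direction, "angle": intensity}
    return {"type": "hold"}     # unintentional gesture -> zero motion
```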

3. Method

As mentioned earlier, the gesture’s mode classification and state estimation should be combined to realize the spatial pose telemanipulation of a 6-DOF manipulator without the human skeletal kinematic model. In addition, the floating body-fixed frame method should also be considered to realize consistent gesture mode classification, even during the operator’s dynamic movement. Thus, this section is organized as follows: (Section 3.1) Floating body-fixed frame, (Section 3.2) Bi-directional LSTM-based hand-guiding gesture classification, and (Section 3.3) Hand-guiding gesture’s state estimation.

3.1. Floating Body-Fixed Frame

This section presents a protocol for generating an {FBf} that updates according to the operator’s body-heading direction. Figure 2 illustrates the creation of a {Bf} through the stand–stooping calibration gesture. Given that human body activity is described with respect to the body motion planes (frontal, sagittal, and transverse planes), it is assumed that the initial {Bf} aligns with these planes. By calculating the orientation difference between the IMU sensors, attached in arbitrary postures, and the ideal {Bf} frame through the calibration gesture, each IMU sensor’s orientation is corrected to match the posture of {Bf}. For detailed {Bf} generation methods, please refer to the paper [6] describing the FGCD algorithm. The operator’s body-heading direction aligns with the positive x-axis of the initially created {Bf}, and the z-axis of {Bf} remains vertically upward with respect to the inertial frame. Subsequently, using the pelvic IMU’s orientation, {Bf} is continuously updated so that the operator’s current body-heading direction stays aligned with its x-axis. This concept allows gesture mode recognition and state estimation with respect to the initially created {Bf}, even when the body-heading direction varies dynamically. Equation (1) gives the formula for generating {FBf}.
${}^{G}R_{FBf} = {}^{G}R_{Sf,\,pelvic}\ \left({}^{G}R_{Sf,\,pelvic,\,stand}\right)^{T}\ {}^{G}R_{Bf} \qquad (\text{just after calibration: } \{S_c\} \equiv \{B_f\}) \qquad (1)$
Here, G denotes the inertial frame, {FBf} the floating body-fixed frame, and {Bf} the body-fixed frame. Sf,pelvic denotes the sensor-fixed frame of the pelvic IMU, and Sf,hand denotes the sensor-fixed frame of the hand-mounted IMU. The subscripts “stand” and “stoop” denote the operator’s posture when the orientation of the corresponding sensor is recorded. “Sc” represents a calibrated sensor-fixed frame, and “Cj” is the new sensor-fixed frame of each sensor at the instant {FBf} is initially created. Equation (2) converts the IMU sensor’s orientation, initially expressed in the inertial frame, into an expression with respect to frame {FBf}. To transform the acceleration and angular velocity outputs from the sensor-fixed frame to frame {FBf}, Equations (3) and (4) are used. Unlike orientation, acceleration and angular velocity are output with respect to the sensor-fixed frame of the sensor itself; consequently, the constant transformation defined in the final step of calibration, which transitions from {Sf} to {Sc}, is not required for these outputs.
${}^{FBf}R_{Cj} = \left({}^{G}R_{FBf}\right)^{T}\ {}^{G}R_{Sf,\,hand}\ {}^{Sf,\,hand,\,stand}R_{Sc,\,hand} \qquad (2)$
${}^{FBf}a_{Sf,\,hand} = \left({}^{G}R_{FBf}\right)^{T}\ {}^{G}R_{Sf,\,hand}\ {}^{Sf,\,hand}a \qquad (3)$
${}^{FBf}\omega_{Sf,\,hand} = \left({}^{G}R_{FBf}\right)^{T}\ {}^{G}R_{Sf,\,hand}\ {}^{Sf,\,hand}\omega \qquad (4)$
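The frame bookkeeping of Equations (1)–(4) can be summarized in a short sketch. The code below is a minimal illustration using rotation matrices as NumPy arrays; the variable names mirror the paper’s notation, but the function names and interfaces are our assumptions, not the authors’ implementation.

```python
# Minimal sketch of the floating body-fixed frame update (Eqs. (1)-(4)).
# Each IMU is assumed to provide its orientation as a 3x3 matrix G_R_Sf
# (sensor-to-inertial), plus acceleration / angular velocity in its own
# sensor-fixed frame.
import numpy as np

def floating_body_fixed_frame(G_R_pelvic, G_R_pelvic_stand, G_R_Bf):
    """Eq. (1): update {FBf} from the current pelvic orientation and the
    pelvic orientation recorded while standing during calibration."""
    return G_R_pelvic @ G_R_pelvic_stand.T @ G_R_Bf

def express_in_FBf(G_R_FBf, G_R_hand, Sf_R_Sc_hand_stand, a_Sf, w_Sf):
    """Eqs. (2)-(4): express the hand IMU orientation, acceleration and
    angular velocity with respect to {FBf}."""
    FBf_R_Cj = G_R_FBf.T @ G_R_hand @ Sf_R_Sc_hand_stand   # Eq. (2)
    a_FBf = G_R_FBf.T @ G_R_hand @ a_Sf                    # Eq. (3)
    w_FBf = G_R_FBf.T @ G_R_hand @ w_Sf                    # Eq. (4)
    return FBf_R_Cj, a_FBf, w_FBf
```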

3.2. Bi-Directional LSTM-Based Hand-Guiding Gesture Classification

Figure 3 illustrates that the linear gesture, representing pure translational motion, and the angular gesture, corresponding to pure spatial rotational motion, each consist of an intentional motion and a return motion. Visual inspection shows that the initial portion of the intentional motion and the return motion share a similar shape. Moreover, it is noteworthy that the considerable intensity difference between the intentional and return motions of the hand-guiding gestures depicted in Figure 3 inevitably degrades recognition accuracy and model generalization performance. Thus, in our preceding study [21], RNN-based models, which perform better at context understanding, were considered to realize an accurate gesture recognition model despite these significant intensity differences, and the bi-directional LSTM model showed the best performance in recognizing the hand-guiding gestures in Figure 3. That is why the bi-directional LSTM, which incorporates both past and future information at each time step, was chosen as the motion recognition model for this study.
Figure 4 presents the framework for recognizing hand gesture modes using a bi-directional LSTM. First, 9-dimensional data, consisting of orientation, acceleration, and angular velocity expressed with respect to {FBf}, are chosen as features. These nine data channels are extracted using the sliding window method, and the extracted windows are used as inputs to the bi-directional LSTM model. Following the results of the preceding study [21], the size of the sliding window was set to 9 × 40. Because the intensity of the hand-guiding gesture varies among operators, the data extracted through the sliding window undergo a normalization process. Table 1 presents detailed information on the computational cost and hyperparameters of the model in Figure 4. To prevent overfitting of the bi-directional LSTM model, the ReduceLROnPlateau and EarlyStopping callback functions were added. The time complexity calculated for the bi-directional LSTM model was O(6480).
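For illustration, a minimal Keras sketch of such a classifier is given below. The 40 × 9 input window, the three output modes, the Adam optimizer with a learning rate of 0.001, the batch size, the number of epochs, and the ReduceLROnPlateau/EarlyStopping callbacks follow Table 1 and the text; the hidden-layer size and callback patience values are assumptions, and the sketch is not the authors’ implementation.

```python
# Illustrative sketch of the bi-directional LSTM gesture-mode classifier
# described in Figure 4 (window of 40 samples x 9 features, 3 modes).
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

def build_mode_classifier(window_len=40, n_features=9, n_modes=3):
    model = models.Sequential([
        layers.Input(shape=(window_len, n_features)),
        layers.Bidirectional(layers.LSTM(16)),   # hidden size is an assumption
        layers.Dense(n_modes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# x: per-window-normalized sliding windows, y: one-hot gesture modes
# (linear / angular / unintentional).
# model = build_mode_classifier()
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           batch_size=5000, epochs=1000,
#           callbacks=[callbacks.ReduceLROnPlateau(patience=10),
#                      callbacks.EarlyStopping(patience=30,
#                                              restore_best_weights=True)])
```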

3.3. Hand-Guiding Gesture’s State Estimation

After obtaining the transformation matrix of the sensor-fixed frame of the hand-mounted IMU from t − 1 to t through Equation (5), the resulting rotation matrix is converted into an axis–angle representation through Equation (6) to avoid the singularity issue of the ZYX Euler angle convention. Through Equation (7), the instantaneous axis of rotation expressed w.r.t. {FBf} of the hand-mounted IMU, i.e., the direction of the angular gesture state ${}^{FBf}\omega_{Sf,\,hand,\,t}$, can be obtained, and the intensity $\theta_h$ can also be obtained.
${}^{Sc,\,hand,\,t-1}R_{Sc,\,hand,\,t} = \left({}^{FBf}R_{Sc,\,hand,\,t-1}\right)^{T}\ {}^{FBf}R_{Sc,\,hand,\,t} \qquad (5)$
$\theta_h = \arccos\!\left(\dfrac{r_{11}+r_{22}+r_{33}-1}{2}\right), \qquad {}^{Sc,\,hand,\,t-1}k_{Sc,\,hand,\,t} = \dfrac{1}{2\sin\theta_h}\begin{bmatrix} r_{32}-r_{23} \\ r_{13}-r_{31} \\ r_{21}-r_{12} \end{bmatrix} \qquad (6)$
${}^{FBf}k_{Sc,\,hand,\,t} = {}^{FBf}R_{Sc,\,hand,\,t-1}\ {}^{Sc,\,hand,\,t-1}k_{Sc,\,hand,\,t} \equiv {}^{FBf}\omega_{Sc,\,hand,\,t} \in \mathbb{R}^{3} \qquad (7)$
Here, $r_{ij}$ represents each component of the SO(3) matrix obtained through Equation (5). To generate a command for the pure rotational motion of the manipulator’s EEF, {FBf} is replaced with {B}, the base frame of the manipulator, as in Equation (8), and the unit screw axis S is defined in Equation (9) according to Def. 1. The parameter $\lambda_a$ is a scale factor that lets the user adjust the estimated gesture intensity. Equation (10) applies the SE(3) transformation matrix $e^{[S]\theta} \in SE(3)$, built from the previously obtained unit screw axis S, to the current pose ${}^{B}T_{T_{i-1}} \in SE(3)$ to obtain the target orientation of the manipulator EEF. The EEF’s target pose with respect to the manipulator’s base frame {B} obtained in this way is sent to the position trajectory controller or the position grouped controller of the robot operating system (ROS) via the MoveIt IK solver to perform joint-space control.
${}^{B}\omega_T = \left({}^{FBf}R_{B}\right)^{T}\ {}^{FBf}\omega_{Sc,\,hand,\,t}, \qquad \left({}^{FBf}R_{B}\right)^{T} = I \qquad (8)$
$S = \begin{bmatrix} {}^{B}\omega_T \\ 0 \end{bmatrix} \in \mathbb{R}^{6}, \qquad \theta = \lambda_a\,\theta_h \qquad (9)$
${}^{B}T_{T_i} = e^{[S]\theta}\ {}^{B}T_{T_{i-1}} \in SE(3) \qquad (10)$
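A compact sketch of Equations (5)–(10) is shown below. It uses SciPy’s matrix exponential for $e^{[S]\theta}$; the function name, the default scale factor, and the hard-coded assumption that {FBf} and {B} are aligned (as stated in Equation (8)) are illustrative choices, not the authors’ code.

```python
# Sketch of the angular-gesture state estimation and target-pose update
# (Eqs. (5)-(10)); assumes 0 < theta_h < pi so the axis extraction is defined.
import numpy as np
from scipy.linalg import expm

def skew(w):
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def angular_gesture_update(FBf_R_prev, FBf_R_curr, B_T_prev, lambda_a=0.3):
    # Eq. (5): incremental hand rotation between t-1 and t
    R_rel = FBf_R_prev.T @ FBf_R_curr
    # Eq. (6): axis-angle extraction (intensity theta_h and unit axis k)
    theta_h = np.arccos(np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0))
    k_rel = np.array([R_rel[2, 1] - R_rel[1, 2],
                      R_rel[0, 2] - R_rel[2, 0],
                      R_rel[1, 0] - R_rel[0, 1]]) / (2.0 * np.sin(theta_h))
    # Eqs. (7)-(8): express the axis in {FBf}, then in {B}
    # (the paper assumes {FBf} and {B} are aligned, so no extra rotation)
    B_w = FBf_R_prev @ k_rel
    # Eqs. (9)-(10): zero-pitch screw, scaled intensity, SE(3) update
    S_mat = np.zeros((4, 4))
    S_mat[:3, :3] = skew(B_w)          # [S] for a pure rotation (v = 0)
    theta = lambda_a * theta_h
    return expm(S_mat * theta) @ B_T_prev
```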
After calculating the average velocity by integrating the acceleration output of the hand-mounted IMU using Equation (11), the normalized average velocity is taken as the direction of the corresponding gesture through Equation (12). The parameter n denotes the number of samples collected during the intentional motion segment of the respective linear gesture.
${}^{FBf}v_{Sf,\,hand} = \dfrac{1}{n}\sum_{t-n}^{t} {}^{FBf}a_{Sf,\,hand}\,\Delta t \qquad (11)$
${}^{FBf}\hat{v}_{Sf,\,hand} = {}^{FBf}v_{Sf,\,hand} \,/\, \left\lVert {}^{FBf}v_{Sf,\,hand} \right\rVert \qquad (12)$
As in the case of the angular gesture described above, to generate a command for the pure translational motion of the manipulator’s EEF, {FBf} is replaced with {B}, the manipulator’s base frame (Equation (13)), and the unit screw axis S is defined according to Def. 1, as in Equation (14). A scale factor $\lambda_l$ is likewise defined for adjusting the estimated gesture’s intensity, and Equation (15) is used to obtain the target position of the manipulator EEF.
${}^{B}v_T = \left({}^{FBf}R_{B}\right)^{T}\ {}^{FBf}\hat{v}_{Sf,\,hand}, \qquad \left({}^{FBf}R_{B}\right)^{T} = I \qquad (13)$
$S = \begin{bmatrix} 0 \\ {}^{B}v_T \end{bmatrix} \in \mathbb{R}^{6}, \qquad \theta = \lambda_l\,\left\lVert {}^{FBf}v_{Sf,\,hand} \right\rVert \qquad (14)$
${}^{B}T_{T_i} = e^{[S]\theta}\ {}^{B}T_{T_{i-1}} \in SE(3) \qquad (15)$
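Analogously, Equations (11)–(15) reduce to a short computation, sketched below. Defining the intensity as $\theta = \lambda_l\lVert v\rVert$ is our reading of the speed-estimator description (not an exact quote of the paper); the rest mirrors the equations and again assumes {FBf} and {B} are aligned.

```python
# Sketch of the linear-gesture state estimation and target-position update
# (Eqs. (11)-(15)); a_FBf_window is an (n, 3) array of accelerations
# collected over the intentional-motion segment, dt the sample period.
import numpy as np

def linear_gesture_update(a_FBf_window, dt, B_T_prev, lambda_l=0.2):
    # Eq. (11): average velocity over the intentional-motion segment
    v_avg = np.sum(a_FBf_window * dt, axis=0) / a_FBf_window.shape[0]
    # Eq. (12): gesture direction as the unit average-velocity vector
    v_hat = v_avg / np.linalg.norm(v_avg)
    # Eq. (13): {FBf} and {B} assumed aligned, so B_v = v_hat
    B_v = v_hat
    # Eqs. (14)-(15): pure-translation screw (omega = 0), so e^{[S]theta}
    # reduces to a homogeneous translation by B_v * theta
    theta = lambda_l * np.linalg.norm(v_avg)   # intensity (our assumption)
    T = np.eye(4)
    T[:3, 3] = B_v * theta
    return T @ B_T_prev
```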
Definition 1
(Screw axis). For a given reference frame, a screw axis S of a joint can be written as:
$S_i = \begin{bmatrix} \omega_i \\ v_i \end{bmatrix} \in \mathbb{R}^{6}$
where either (i) $\lVert\omega\rVert = 1$ or (ii) $\omega = 0$ and $\lVert v\rVert = 1$. If (i) holds, then $v = -\omega \times q + h\omega$, where q is a point on the axis of the screw, and h is the pitch of the screw (h = 0 for a pure rotation about the screw axis). If (ii) holds, then the pitch of the screw is infinite, and the twist is a translation along the axis defined by $v$. Although we use the pair $(\omega, v)$ for both a normalized screw axis S (where one of $\lVert\omega\rVert$ or $\lVert v\rVert$ must be unity) and a general twist $\mathcal{V}$ (where there are no constraints on $\omega$ and $v$), the meaning should be clear from the context [23].
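For completeness, the closed-form expression of $e^{[S]\theta}$ used in Equations (10) and (15) follows the standard result in [23]; in the pure-translation case (ii), it reduces to a homogeneous translation by $v\theta$:

$e^{[S]\theta} = \begin{bmatrix} e^{[\omega]\theta} & \bigl(I\theta + (1-\cos\theta)[\omega] + (\theta-\sin\theta)[\omega]^{2}\bigr)v \\ 0 & 1 \end{bmatrix}$ for $\lVert\omega\rVert = 1$, and $e^{[S]\theta} = \begin{bmatrix} I & v\theta \\ 0 & 1 \end{bmatrix}$ for $\omega = 0,\ \lVert v\rVert = 1$.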

4. Experiments

This section examines the performance of the gesture mode classifier and the state estimator. Then, experimental studies were conducted in the motion-capture space to teleoperate the UR5e manipulator to the target poses with the developed hand-guiding gesture-based telemanipulation method, which combines the gesture mode and state.

4.1. Gesture Recognition

After subjects wore two XSENS MTw wearable IMU sensors, the datasets were collected for three predefined hand-guiding gesture modes. As shown in Figure 5, all subjects moved freely within the motion-capture space and repeatedly performed hand-guiding gestures in all directions during dataset acquisitions. Table 2 presents detailed training, validation, and test dataset acquisition information.
The classification accuracies achieved by the bi-directional LSTM model during the model-training step are presented in Table 3. Table 4 shows the results of the real-time telemanipulation experiments, specifically the hand-guiding gesture recognition accuracy of the trained model. The corresponding confusion matrices for both the model training and the real-time experiments are shown in Figure 6. According to these confusion matrices, a noticeable number of misclassifications inevitably exists, which could result in jerky manipulator motion due to undesired switching into incorrect gesture modes. To address this issue, as depicted in Figure 7, the latest twenty tentative modes are saved in a tentative mode history repository, and the most frequent mode is then selected as the current gesture mode. In other words, by incorporating a 0.2 s sliding window, the jerky motion of the manipulator caused by misclassifications can be effectively mitigated. However, it should be noted that this implementation introduces a minimum time delay of 0.2 s for the desired mode switching.
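A minimal sketch of this majority-vote filter is given below (Python; the class and variable names are illustrative, not the authors’ implementation):

```python
# Minimal sketch of the tentative-mode filter in Figure 7: keep the last
# twenty tentative classifications (a 0.2 s window at 100 Hz) and output
# the most frequent one as the current gesture mode.
from collections import Counter, deque

class ModeFilter:
    def __init__(self, history_len=20):
        self.history = deque(maxlen=history_len)

    def update(self, tentative_mode: str) -> str:
        """Push the latest tentative mode and return the majority vote."""
        self.history.append(tentative_mode)
        return Counter(self.history).most_common(1)[0][0]

# usage: current_mode = mode_filter.update(lstm_prediction)  # e.g. "linear"
```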

4.2. Gesture State Estimation

In this study, a testbench, as shown in Figure 8, was prepared to verify the operator’s control intention estimation performance through the combination of mode recognition and state estimation described in Section 3.2 and Section 3.3. The details of the test bench and measurement information are as follows.
  • Testbench: within the testbench, 6 OptiTrack Prime 13 cameras, 2 retro-reflective marker sets (hand, manipulator’s EEF), and a workstation with MOTIVE 2.1.1 (NaturalPoint Inc., Corvallis, OR, USA) software are installed. The pose of each marker-fixed frame is measured with respect to the OptiTrack-fixed frame {Of} defined by an L-shaped calibration square (CS-200) located within the motion-capture volume.
  • Hand trajectory: one wireless IMU sensor is attached to the back of the subject’s pelvis and one to the right hand with a strap, and a marker set is attached to the right hand to measure its position with respect to {Of}.
  • Gesture state: all outputs of the wireless IMU sensor are expressed with respect to {FBf}, and the converted features are turned into the gesture mode and state through the models described in Section 3.2 and Section 3.3. Note that the transformation between {FBf} and {Of}, the latter defined by the calibration square, cannot be accurately identified; however, both share the same z-axis, [0 0 1], and the authors tried their best to align the body-heading direction with the L-square heading direction during the calibration gesture.
Figure 9 presents the measurement results for the linear gesture’s hand trajectory, estimated direction, and estimated intensities. Examining the results projected onto the XY plane reveals that the estimated direction of the linear gesture closely follows the actual direction of the hand movement. The hand trajectory was measured with respect to the {Of} frame, while the gesture direction was estimated with respect to the {FBf} frame, resulting in different reference frames. After transforming the estimated direction into the {Of} frame, its origin was set to the starting point of the hand trajectory for improved readability. The estimated intensity is multiplied by a scaling factor ranging from 0.1 to 0.4 to determine the target position dots. Thus, it was experimentally validated that the position of the manipulator’s end-effector can be controlled according to the direction and intensity intended by the user.
Figure 10 illustrates the hand trajectory, hand orientation, estimated rotational axis, and axis of pure rotational motion of the EEF for the angular gesture. In contrast to the linear gesture, direction estimation for the angular gesture requires the hand-mounted IMU’s orientation, as described in Equations (5)–(7). Consequently, the pose of the hand-mounted marker, expressed with respect to the {Of} frame at the start and end points of the intentional motion section of the angular gesture, is displayed. In angular gesture experiments with more than five subjects, there were instances where the hand performed the angular gesture while slightly bent toward the forearm. This can introduce an angular error between the intended and estimated rotation axes, so the difference between the intended and estimated directions needs to be compared. Therefore, the estimated rotational axis and the axis of pure rotational motion of the EEF are shown in Figure 10, with the angle between the two axes confirmed to be 0.57 degrees. These results confirm that the user can command pure rotational motion of the EEF according to the intended direction and intensity. The combination of linear and angular gestures is discussed in Section 4.3.

4.3. Validation of Hand-Guiding Gesture-Based Telemanipulation

Figure 11 illustrates the results of telemanipulation to move the UR5e manipulator’s EEF from the home pose to the goal pose, as depicted in the framework described in Figure 1. The heading direction was allowed to move freely without needing to be fixed. The experimental environment is presented in Figure 8. To track the EEF pose, a reflective marker set was attached, and the frames shown at the starting and ending points of the EEF trajectory in Figure 11 correspond to the marker set.
Figure 11a displays the control results using the position trajectory controller of ROS for the linear gesture case, demonstrating that a target pose is generated for each gesture. Due to the characteristics of the trajectory controller, a new target pose cannot be assigned until the EEF reaches the target point. Over three experiments, the EEF could be moved to within approximately 83 mm of the goal pose on average. This 83 mm offset was significantly influenced by the unavoidable distance gap caused by part interference between the marker set attached to the EEF and the marker set corresponding to the goal pose.
In Figure 11b,c, the target pose can be updated in real time based on the pose-tracking controller of ROS. Figure 11b presents the results of linear gesture-only telemanipulation, while Figure 11c shows the results of telemanipulation combining both linear and angular gestures. Notably, in the case of Figure 11c, the position and orientation were controlled simultaneously, starting from different goal and initial EEF poses. The angle difference between the final EEF pose and the goal pose was confirmed to be within 2.56 degrees. Additionally, in all experiments, it was verified that the EEF could be moved to the desired goal pose with fewer than ten hand-guiding gestures.

5. Results and Discussion

This study presents a complete framework for teleoperating 6-DOF manipulators using two wearable IMU sensors, without relying on a complex human skeletal kinematic model. To validate the performance of the proposed method, the teleoperation experiments using a UR5e manipulator for spatial pose control were successfully conducted in the motion-capture space. As a result, the following experimental results were confirmed:
  • Utilizing the floating body-fixed frame, the hand-guiding gesture mode could be classified with 84.5% accuracy in real time, even during the operator’s dynamic movements.
  • The spatial direction of hand-guiding gestures could be estimated with an error of less than 1 degree.
  • The gesture intensity of hand-guiding gestures could be successfully estimated with a speed estimator and finely tuned with the scaling factor.
  • Finally, a subject could place the EEF within an average of 83 mm and 2.56 degrees of the target pose using fewer than ten consecutive hand-guiding gestures and visual inspection on the first trial.
The main research contribution of this study is the unprecedented combination of the gesture’s mode and states to realize control over spatial direction and displacement with a minimal number of wearable IMU sensors. First, the intensity and direction of hand-guiding gestures were separately estimated through the proposed gesture state estimator and combined with the gesture mode obtained from the bi-directional LSTM-based mode classifier. Based on the integrated gesture mode and state, the manipulator’s EEF can be successfully controlled by combining spatial translational and rotational motions. To show the significance of our method, Table 5 compares the method proposed in this study with the most recent related studies in terms of recognition accuracy and controllability. While the recognition accuracy of our study falls short of that reported in recent research, we have introduced an additional process for selecting the mode of the hand-guiding gesture, which resolves the issue of manipulator malfunction due to misrecognition of the hand-guiding gesture mode. Moreover, our study not only recognizes the mode of the hand-guiding gesture, but also estimates its state (direction and intensity), allowing remote control of the manipulator in the direction desired by the user, whereas the cited studies can only command fixed directions and predetermined displacements. Therefore, to the best of our knowledge, this method is the first to control spatial linear or angular motion, including displacement, with only one hand-mounted and one pelvis-mounted IMU sensor.
Based on these significant results, future works should focus on the following potential areas for improvement and expansion:
  • Intuitive user interface (U.I.) with AR-assisted devices: The current system allows unskilled subjects to place the EEF within an average range of 83 mm and 2.56 degrees in the goal pose, but there is additional room for improvement in the U.I. part. Thus, future work should look into more intuitive control interfaces or improved feedback systems to help the operator guide the manipulator in remote sites more accurately.
  • Constraints on manipulator motion: In the current study, while it is possible to separately control the position and orientation of the manipulator’s EEF as intended by the user, it is impossible to control both simultaneously. Therefore, in future work, we need to add types of hand-guiding gestures to control both the position and orientation of the manipulator’s EEF simultaneously.
  • Integration with force feedback: Although the current research does not include haptic feedback, force feedback could offer the operator a more immersive and intuitive control experience.
By exploring these areas, we can continue to refine and expand the capabilities of the telemanipulation framework, thus making it even more effective and adaptable for various applications.

Author Contributions

Conceptualization, D.L.; methodology, D.L.; software, H.C. and H.J.; validation, H.C., H.J., D.N., T.K. and D.L.; formal analysis, H.C., D.N., T.K. and D.L.; investigation, D.L.; resources, D.L.; data curation, H.C.; writing—original draft preparation, D.L.; writing—review and editing, D.L.; visualization, H.C., H.J., D.N., T.K. and D.L.; supervision, D.L.; project administration, D.L.; funding acquisition, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2022 R1F1A1074704); Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00218); MSIT (Ministry of Science and ICT), Korea, under the Innovative Human Resource Development for Local Intellectualization support program (IITP-2022-RS-2022-00156360) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation); and Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (N000P0017033) and (N000P0017123).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kumar, N.; Lee, S.C. Human-machine interface in smart factory: A systematic literature review. Technol. Forecast. Soc. Chang. 2022, 174, 121284. [Google Scholar] [CrossRef]
  2. Nuzzi, C.; Pasinetti, S.; Lancini, M.; Docchio, F.; Sansoni, G. Deep learning-based hand gesture recognition for collaborative robots. IEEE Instrum. Meas. Mag. 2019, 22, 44–51. [Google Scholar] [CrossRef] [Green Version]
  3. Fang, W.; Ding, Y.; Zhang, F.; Sheng, J. Gesture recognition based on CNN and DCGAN for calculation and text output. IEEE Access 2019, 7, 28230–28237. [Google Scholar] [CrossRef]
  4. Jiang, D.; Li, G.; Sun, Y.; Kong, J.; Tao, B. Gesture recognition based on skeletonization algorithm and CNN with ASL database. Multimedia Tools Appl. 2019, 78, 29953–29970. [Google Scholar] [CrossRef]
  5. Suarez, J.; Murphy, R.R. Hand gesture recognition with depth images: A review. In Proceedings of the 2012 IEEE RO-MAN: The 21st IEEE International Symposium on Robot and Human Interactive Communication, Paris, France, 13 September 2012; pp. 411–417. [Google Scholar]
  6. Mazhar, O.; Navarro, B.; Ramdani, S.; Passama, R.; Cherubini, A. A real-time human-robot interaction framework with robust background invariant hand gesture detection. Robot. Comput. Manuf. 2019, 60, 34–48. [Google Scholar] [CrossRef] [Green Version]
  7. CMU-Perceptual-Computing-Lab/Openpose. GitHub. Available online: https://github.com/CMU-Perceptual-Computing-Lab/openpose (accessed on 13 March 2023).
  8. OpenSign—Kinect V2 Hand Gesture Data—American Sign Language. NARCIS. Available online: https://www.narcis.nl/dataset/RecordID/oai%3Aeasy.dans.knaw.nl%3Aeasy-dataset%3A127663 (accessed on 4 April 2023).
  9. Zhou, D.; Shi, M.; Chao, F.; Lin, C.M.; Yang, L.; Shang, C.; Zhou, C. Use of human gestures for controlling a mobile robot via adaptive cmac network and fuzzy logic controller. Neurocomputing 2018, 282, 218–231. [Google Scholar] [CrossRef]
  10. Bouteraa, Y.; Ben Abdallah, I.; Ghommam, J. Task-space region-reaching control for medical robot manipulator. Comput. Electr. Eng. 2018, 67, 629–645. [Google Scholar] [CrossRef]
  11. Vogel, J.; Castellini, C.; van der Smagt, P. EMG-based teleoperation and manipulation with the DLR LWR-III. In Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA, 25–30 September 2011. [Google Scholar]
  12. Bouteraa, Y.; Ben Abdallah, I. A gesture-based telemanipulation control for a robotic arm with biofeedback-based grasp. Ind. Robot. Int. J. 2017, 44, 575–587. [Google Scholar] [CrossRef]
  13. Chico, A.; Cruz, P.J.; Vasconez, J.P.; Benalcazar, M.E.; Alvarez, R.; Barona, L.; Valdivieso, A.L. Hand Gesture Recognition and Tracking Control for a Virtual UR5 Robot Manipulator. In Proceedings of the 2021 IEEE Fifth Ecuador Technical Chapters Meeting (ETCM), Cuenca, Ecuador, 12–15 October 2021. [Google Scholar] [CrossRef]
  14. Kulkarni, P.V.; Illing, B.; Gaspers, B.; Brüggemann, B.; Schulz, D. Mobile manipulator control through gesture recognition using IMUs and Online Lazy Neighborhood Graph search. Acta IMEKO 2019, 8, 3–8. [Google Scholar] [CrossRef]
  15. Shintemirov, A.; Taunyazov, T.; Omarali, B.; Nurbayeva, A.; Kim, A.; Bukeyev, A.; Rubagotti, M. An open-source 7-DOF wireless human arm motion-tracking system for use in robotics research. Sensors 2020, 20, 3082. [Google Scholar] [CrossRef] [PubMed]
  16. Roetenberg, D.; Luinge, H.; Slycke, P. Xsens MVN: Full 6DOF Human Motion Tracking Using Miniature Inertial Sensors; Technical Report; Xsens Motion Technologies B.V.: Enschede, The Netherlands, 2013; pp. 1–7. [Google Scholar]
  17. Luinge, H.; Veltink, P.; Baten, C. Ambulatory measurement of arm orientation. J. Biomech. 2007, 40, 78–85. [Google Scholar] [CrossRef] [PubMed]
  18. Van der Helm, F.C.T.; Pronk, G.M. Three-dimensional recording and description of motions of the shoulder mechanism. J. Biomech. Eng. 1995, 117, 27–40. [Google Scholar] [CrossRef] [PubMed]
  19. Zhang, J.-T.; Novak, A.C.; Brouwer, B.; Li, Q. Concurrent validation of Xsens MVN measurement of lower limb joint angular kinematics. Physiol. Meas. 2013, 34, N63–N69. [Google Scholar] [CrossRef] [PubMed]
  20. Kim, M.; Lee, D. Wearable inertial sensor based parametric calibration of lower-limb kinematics. Sens. Actuators A Phys. 2017, 265, 280–296. [Google Scholar] [CrossRef]
  21. Jeon, H.; Choi, H.; Noh, D.; Kim, T.; Lee, D. Wearable Inertial Sensor-Based Hand-Guiding Gestures Recognition Method Robust to Significant Changes in the Body-Alignment of Subject. Mathematics 2022, 10, 4753. [Google Scholar] [CrossRef]
  22. Yuan, Q.; Chen, I.-M. Human velocity and dynamic behavior tracking method for inertial capture system. Sens. Actuators A Phys. 2012, 183, 123–131. [Google Scholar] [CrossRef]
  23. Lynch, K.M.; Park, F.C. Modern Robotics; Cambridge University Press: Cambridge, UK, 2017. [Google Scholar]
  24. Yoo, M.; Na, Y.; Song, H.; Kim, G.; Yun, J.; Kim, S.; Moon, C.; Jo, K. Motion estimation and hand gesture recognition-based human–UAV interaction approach in real time. Sensors 2022, 22, 2513. [Google Scholar] [CrossRef] [PubMed]
  25. Chamorro, S.; Jack, C.; François, G. Neural network based lidar gesture recognition for real-time robot teleoperation. In Proceedings of the 2021 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), New York City, NY, USA, 25–27 October 2021. [Google Scholar]
  26. Se-Yun, J.; Kim, E.-S.; Park, B.Y. CNN-based hand gesture recognition method for teleoperation control of industrial robot. IEMEK J. Embed. Syst. Appl. 2021, 16, 65–72. [Google Scholar]
  27. Kim, E.; Shin, J.; Kwon, Y.; Park, B. EMG-Based Dynamic Hand Gesture Recognition Using Edge A.I. for Human–Robot Interaction. Electronics 2023, 12, 1541. [Google Scholar] [CrossRef]
  28. Cruz, P.J.; Vásconez, J.P.; Romero, R.; Chico, A.; Benalcázar, M.E.; Álvarez, R.; López, L.I.B.; Caraguay, L.V. A Deep Q-Network based hand gesture recognition system for control of robotic platforms. Sci. Rep. 2023, 13, 7956. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Overview of the proposed hand-guiding gesture-based teleoperated manipulator control framework. Explanation of superscripts in the figure: 1 Operator’s body-heading direction estimated through the pelvic IMU. 2 Hand-guiding gestures’ mode = Linear gesture (l), Angular gesture (a), Via gesture (v). 3 Intensity and 3-D direction of the hand-guiding gesture. 4 kdl_kinematics_plugin/KDLKinematicsPlugin (with a kinematic search resolution of 0.005). 5 https://github.com/ros-controls/ros_controllers/blob/melodic-devel/position_controllers/include/position_controllers/joint_group_position_controller.h (accessed on 4 April 2023).
Figure 2. The procedure of calibration gesture for defining a {Bf}.
Figure 3. Hand-guiding gesture modes: unintentional gesture, intentional gestures (linear gesture and angular gesture).
Figure 4. A framework of the bi-directional LSTM designed for three different hand-guiding gesture classifications.
Figure 5. (Left, three photos) Testbench environment composed of the six-camera motion-capture system; (right) trajectories of the reflective marker attached to the subject’s head during training dataset acquisition.
Figure 6. Confusion matrices of the hand-guiding gesture classification in (a) the training stage and (b) the real-time experiments.
Figure 7. A block diagram outlining the method for classifying the modes of hand-guiding gestures.
Figure 8. Overview of the testbench composed of six OptiTrack Prime 13 vision cameras, Xsens MTw wearable IMU sensors, a workstation, and a mobile manipulator.
Figure 9. Linear gesture results: hand trajectory measured by reflective marker set, estimated intended direction, and intensity of the linear gesture, and estimated target position dots w.r.t. the scaling factor of [0, 0.1, 0.2, 0.3, 0.4].
Figure 10. Angular gesture results: hand trajectory and orientation (at start and end pose) measured by six OptiTrack Prime 13 cameras and a reflective marker set, and the intended gesture’s direction and intensity estimated by the proposed method; the angle between the IMU-based axis and the OptiTrack-based axis is 0.57 degrees.
Figure 11. Results of hand-guiding gesture-based telemanipulation without constraint on the operator’s fixed body-heading direction: (a) pure translational motion with the “position trajectory control” ROS controller, (b) pure translational motion with the “position grouped control” ROS controller, (c) combined pure translational and rotational motion-based control with the “position grouped control” ROS controller.
Table 1. Information related to the computational cost and hyperparameters of the deployed bi-directional LSTM.

Model Size | Input Sequence Length | GPU Memory | Latency
114 kB | 40 | 48 GB (RTX 3090) | 1.3
Optimizer | Learning Rate | Batch Size | Epochs
Adam | 0.001 | 5000 | 1000
Table 2. Information on the training, validation, and test datasets.

Training: 6 subjects; 163,375 training sets; 108,276 validation sets
Test: 3 subjects; 90,547 datasets
Table 3. Training and test accuracies of the bi-directional LSTM.

Training/Test Classification Accuracy [%]
Linear Gestures | Angular Gestures | Unintentional Gestures | Total
99.7/88.7 | 98.4/85.6 | 97.6/74.4 | 98.6/82.9
Table 4. Accuracy of the hand-guiding gesture classification in real-time hand-guiding gesture-based telemanipulation experiments.

Classification Accuracy [%]
Linear Gestures | Angular Gestures | Unintentional Gestures | Total
89.4 | 84.5 | 79.6 | 84.5
Table 5. Comparison of the method proposed in this study with the most recent studies related to arm/hand gesture recognition in terms of recognition accuracy and controllability.

Gesture Type | Accuracy [%] | Controllability: Spatial Direction | Controllability: Spatial Displacement | Limitation
(Dynamic) Hand gesture (this study) | 84.5 | O | O | Decoupling of controllable motion into spatial linear and angular motion
(Static) Hand gesture [24] | 91.7 | × | × | Predefined motion only for UAV (e.g., move forward/backward, left/right, ascend/descend)
(Static & Dynamic) Arm gesture [25] | 93.1 | × | × | Predefined motion only for vehicles (e.g., stop, go forward fast, go forward while turning left/right, etc.)
(Static) Hand gesture [26] | 88.0 | × | × | Limited linear motion in only left and right directions
(Static) Hand gesture [27] | 96.0 | × | × | Predefined motion (e.g., close, open, rest, supination, fist, etc.)
(Static) Arm & hand gestures [28] | 88.1 | × | × | Predefined motion (e.g., wave in/out for selection of a new orientation reference, open for increasing/decreasing speed, etc.)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
