1. Introduction
In character animation and humanoid robotics, analyzing and rebuilding the diverse interactions between humans and the environment offers the opportunity to create alternative solutions for humanoid activities. Over the last two decades, research on physics-based humanoid animation and robotics has advanced rapidly, resulting in highly realistic and adaptive control achievements [1]. Despite this promising progress, the diversity and complexity of human behaviors, which include a great number of behavior patterns and uncertain personal style preferences, have restricted the applications of previously proposed active controllers. The schematic of interaction based on humanoid active intention is shown in Figure 1. The human body’s central nervous system, sensing system, and musculoskeletal energy system couple together to determine behavior performance. On the one hand, people tend to make a general decision with a typical skeletal composition in reaction to a regular stimulus from the environment, which produces a behavior pattern, for example, walking slowly after a meal. Rebuilding such a pattern requires a comprehensive analysis of the environment and social habits. On the other hand, personal styles generate small variations of a behavior pattern; for instance, one person may walk in a sneaky manner while another swaggers down the street. These variations cause unstable disturbances for pattern-based behavior rebuilding. Moreover, because the characteristic body parameters of humans and controlled characters differ, the rebuilt performance often becomes sensitive to the environment and unpredictable. Although plenty of works have been devoted to solving these problems, such as SIMBICON [2] and GENBICON [3], which create simple walking controllers; the data-driven controllers [4,5], which provide modulation methods for mocap data; the solver-based controllers [6,7], which use dynamic solvers to optimize the reference motions; and the style-based controllers [6,8], which inherit the styles of the considered movements, their generalization to various scenarios and robustness to diverse changes in human behavior remain limited. Therefore, programming a humanoid character to achieve human-level perception and motion control in environments is still an active yet challenging issue for character animation.
Humanoid characters are complex systems, and it is hard to match them with the sophistication and adaptability of stylized motion control intelligence. In addition, a character’s dexterous behavior requires high-level analysis that combines visual and proprioceptive perceptions to understand the behavior of the objects being handled. We envision that humans can help the controlled characters react to these perceptions and produce whole-body style trajectories subject to these behaviors. In this work, we propose a style-based teleoperation framework to dynamically synchronize the style motion of a human operator with that of an animated character. With the help of the human operator’s perception and decisions in dealing with different tasks in unknown environments, this framework can rebuild the human’s motion on the animated character while keeping the operator’s personal style. Specifically, a human operator generates a motion with an understanding of the required task and a perception of the character simulation scene, and the physics-based character then simulates a balanced motion that mimics the style of this human and achieves the task. As can be seen from Figure 2, reference motion data with the center of mass (COM) of the human body are captured as input for the data processing and motion optimization parts, and an optimized motion that mimics the style of the human body is generated. The body balance and synchronization controller then computes a set of body joint torques for the optimized motion to drive the physics-based character.
The data processing part employs a composite scheme to smooth the visually captured human motion and COM position, and the motion optimization part optimizes the smoothed motion with high-level style features in consideration of the current character pose. The high-level features for style mimicking are extracted according to the human-style analysis in this paper. With these features, we deliver a style-expressive performance of the character that keeps pace with the operator’s motion in real-time. The optimization is able to synthesize balance and style to create a feasible pose under the different physical scales of the human operator and simulated character. This algorithm reduces the unstable pose errors caused by characteristic differences (such as size, preference, etc.) and produces stable, stylized motion data. The body balance and synchronization controller generates body joint torques, considering both the balance of the character and the optimized style motion. We devise a COM and root virtual control (CR-VC) strategy added on top of the PID joint controller, and design a model-based torque compensation for the CR-VC. The compensation torques modulate the joint torques to synchronize the motion and achieve simultaneous balance and motion control for the character.
Our system coordinates the simulated character with the human operator for the required task and, meanwhile, mimics the motion style of the operator. Additionally, the motion synchronization of teleoperation is achieved in this work. Our framework can run online in real-time with publicly available physics engines without any inverse dynamics knowledge or preprocessing requirements. No offline optimization or learning methods are needed. To our knowledge, there is currently no synchronized teleoperation system that has all these features while aiming to keep balance and style and to achieve the task motion in real-time.
Generalization is the core advantage of our framework. Specifically, it achieves generalization across regular patterns, motion styles, and characteristic parameters. This makes our system suitable for interactive applications such as gaming and rehabilitation. For example, a set of virtual exercises on top of this framework can be designed for the diagnosis and rehabilitation of patients with impairments [9,10,11], and it may also help increasingly immersive game variants improve user interest, enjoyment, and game-playing experience [12]. Users can interactively create desired behaviors and immediately obtain a flexible, stylized, and physically simulated imitation. In this context, a wide range of results can be achieved using our method. Under human-character synchronization, we demonstrate the generalization across different balance recovery patterns and different walking styles, integrating standing, balancing, walking, and the transitions between them. Additionally, through the teleoperation system, our character can navigate towards a target object placed anywhere and reach it to kick it. These capabilities cover various fundamental motions of gaming and rehabilitation.
Our paper is organized as follows. Section 2 reviews the relevant works, and Section 3 gives a human-style analysis with the system overview. In Section 4, the Kinect motion capture and data processing methods are presented. The motion optimization based on the reference pose with the style extractor is explained in Section 5. The body balance and synchronization controller, which tracks the desired pose and maintains balance with rhythm synchronization, is explained in Section 6. In Section 7, experiments and performance details are given through a demonstration of our system under various tasks. Finally, in Section 8, we discuss this paper and mention a few ideas for future work.
2. Related Work
For human-like motion animation, the physics-based controller offers an effective way to analyze behavior patterns from visually captured data and to create the corresponding motions with balance and adaptiveness in simulation. In recent decades, researchers have imitated these behavior patterns mainly through the three following modes.
The first mode standardizes a single regular motion pattern and proposes models and control strategies for it. Most previous works of this mode focus on walking control strategies [6,13,14,15,16,17,18] to achieve robust, stable, and generalizable walking performance in physics-based simulation. Models are used to simplify the joint actions of the high-dimensional humanoid structure, and controllers are provided to compute the dynamics of these models. The inverted pendulum model (IPM) [13] is mainly designed for humanoid walking, reflecting the unbent-knee and slow-speed features of walking. Work in [14] provided a strategy to extract the IPM from the whole body to automatically adjust the desired motion and produce an adaptive walk by a velocity-driven torque method. Although it achieved effective foot placement planning, the constant length of the IPM makes it hard to use in humanoid multi-motion control. Many works have extended this model by adding different elements, such as the linear inverted pendulum model (LIPM) [15], the momentum-mapped inverted pendulum model (MMIPM) [16], the double inverted pendulum model (DIPM) [17], and the inverted-pendulum-based abstract model (IPAM) [18]. Although these works showed good performance in extending the walking motion to other types of motion, they are limited to performing standardized motion patterns and can hardly express the style variations within one pattern.
The second mode chooses befitting dynamic solvers to optimize the desired actions. Since humanoid actions involve high-dimensional joint groups, which lead to complex and nonlinear dynamics of the action model, several algorithms serve to solve this problem, for example, covariance matrix adaptation (CMA) [19], sequential least-squares quadratic programming (SLSQP) [20], and the iterative Linear Quadratic Gaussian (iLQG) [21]. Others seek to solve this problem by simplifying and pre-linearizing the complex model, such as the quadratic program (QP) in [6,7,22]. However, with a large number of samples and iterations, these methods remain computationally expensive and time-consuming for high-dimensional systems, which can cause problems for applications that require fast, real-time responses.
The third mode adopts kinematic balance strategies to correct the reference motion and demonstrates feasibility using penalty-based controllers. This mode is not limited by characteristic simplified models and can generate various gaits and styles of behavior patterns with low computational costs. A famous kinematic penalty-based controller is SIMBICON [2], which adopts a linear feedback-penalty framework to adjust hip joint torques directly from the stance foot position error and COM velocity error. For its intuitiveness and high efficiency, this framework has been widely used in research [8,23,24,25]. Typically, the GENBICON [3] strategy developed SIMBICON with an IPM pose design for foot placement and a transposed-Jacobian virtual torque control (VTC) for COM velocity tuning. However, these works resolved the original data into partial joint trajectories and smoothed them for the pose design, thereby filtering out the connotative style elements in the data. To address this, work in [8] improved GENBICON by adding a style extractor with the COM velocity curve and step width for a walking pattern. That work proved the effectiveness of a high-level style feature extractor for the style mimicking issue.
Our work shares some methods with these works, which have demonstrated the ability to perform balanced, robust, and stylized motions: penalty-based correcting strategies [2] for joint angles and root orientation, the IPM and LIPM in humanoid walking balance strategies [3,16], and some pattern features of human styles [8]. Although they have provided highly anthropomorphic imitation for some motions, these works are based only on offline mocap preparation with a repeatable control strategy process. Our system does not require any offline preprocessing, nor does it rely on specific motion pattern controls. We employ real-time human motion as the mocap input, propose the teleoperation framework to improve the motion quality, and optimize it in real-time for better tracking and style expression. Previous research addressing the real-time motion imitation problem is limited. Many approaches use wearable measuring devices involving force/torque sensors to obtain the human motion data synchronously with different mapping methods [26,27,28,29]. Different from them, we only use a visual camera to explore a larger motion space and freer movement. Work in [30] employed the Kinect sensor and proposed a topple-free foot strategy for real-time motion reconstruction. Our framework adds a style extractor with corresponding pose optimization for better style expression. Moreover, due to the different physical scales (size, shape, etc.) of the human operator and simulated character, we optimize the time synchronization of the system with a torque compensation based on the simplified IPM and LIPM models of the human body.
5. Motion Optimization
Although the above module offers smoothed motion data, these data must still be improved to fit the character. Due to the limited accuracy of Kinect motion capture, the unpredictability of the operator’s behavior, and the characteristic differences between human and character, some motion data produced by the operator may be unsuitable or even infeasible for the controlled character. This section provides a motion optimization that not only makes the data suitable for the current character but also inherits the style features from the operator. This optimization can be described in terms of four parts: motion states and state transitions, the style feature extractor, style-based pose optimization, and the balance strategy. Each of them is described in further detail below.
5.1. Motion States and State Transitions
Humans always behave while standing on a single leg or on both legs. We adopt the finite state machine (FSM) [2] to annotate the reference action with three states: left stance (LS), right stance (RS), and double stance (DS). These states are defined by the desired foot-ground contact of the human operator. If the vertical height difference between the left and right feet is within a threshold error, the pose is annotated as a two-foot ground contact in the DS state. If the left foot is higher than the right and the difference is larger than the threshold, the pose is annotated as a right foot-ground contact in the RS state. Otherwise, it is the LS state.
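The threshold rule above can be sketched as a small classification function; the threshold value and the foot-height inputs are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the LS/RS/DS annotation described above.
# The tolerance below is a hypothetical choice, not a value from the paper.
FOOT_HEIGHT_THRESHOLD = 0.03  # meters; assumed tolerance for "both feet grounded"

def annotate_stance(left_foot_z: float, right_foot_z: float) -> str:
    """Annotate a pose by the desired foot-ground contact.

    DS: both feet within the height tolerance of each other.
    RS: left foot clearly higher, so the right foot is the stance foot.
    LS: otherwise, the left foot is the stance foot.
    """
    diff = left_foot_z - right_foot_z
    if abs(diff) <= FOOT_HEIGHT_THRESHOLD:
        return "DS"
    elif diff > FOOT_HEIGHT_THRESHOLD:
        return "RS"  # left foot lifted, right foot is the stance foot
    else:
        return "LS"
```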
The transitions between these states, shown in Figure 6, are similar to those of previous papers [2,3,4]. In the LS or RS state of the human motion rhythm, the swing foot of the character may contact the ground earlier or later than the reference action changes its state. If the actual contact is earlier, the transition from the LS or RS state to the DS state is triggered immediately. If the actual contact is later, the current reference pose is kept until the character achieves the contact and changes the state.
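The early/late-contact handling above can be summarized in a small helper; the function name and the boolean return flag are illustrative assumptions of how the rule could be coded, not the paper's implementation.

```python
def transition(current_state: str, reference_state: str, swing_foot_contact: bool):
    """Return (new_state, hold_reference_pose) for the LS/RS/DS machine.

    Early contact: swing foot touches down while still in LS/RS,
    so the transition to DS is triggered immediately.
    Late contact: the reference has already changed state but the character
    has not touched down, so the current reference pose is held.
    """
    if current_state in ("LS", "RS"):
        if swing_foot_contact:                    # contact earlier than the reference
            return "DS", False
        if reference_state != current_state:      # reference changed, no contact yet
            return current_state, True            # keep the current reference pose
    return reference_state, False                 # otherwise follow the reference
```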
5.2. Style Feature Extractor
Based on the human style analysis in Section 3.1, high-level style features are extracted from the smoothed reference pose. The style feature extractor describes the human characteristics that constitute the majority of both expressive and target-based style features. These features are the twists of the ankles that determine the foot orientations, the leg motion plane normal curves, the relative positions between the feet and the COM, and the heading of the COM. The style feature set therefore contains the twist angles of the left and right ankles along the axes of the lower legs, the motion plane normal curves of the left and right legs, the vectors from the COM to the left and right ankles, and the heading of the COM. In this set, the ankle twist angle is extracted from the vertical-axis component of the ankle quaternion of the given pose. The motion plane of a leg is spanned by the upper leg and lower leg, so the normal of this plane is perpendicular to both the upper-leg and lower-leg vectors. The heading of the COM is obtained from the vertical-axis angle of the root link orientation.
These extracted features are delivered to the relevant modules described in the following sections. Since they are extracted online, the style feature set is time-varying according to the real-time movement of the human body.
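Two of the geometric extractions above, the leg motion plane normal and the COM heading, can be sketched directly from their definitions; the function names, the joint-position inputs, and the (w, x, y, z) quaternion convention are assumptions for illustration.

```python
import numpy as np

def leg_plane_normal(hip, knee, ankle):
    """Unit normal of the leg motion plane spanned by the upper leg
    (hip -> knee) and lower leg (knee -> ankle); perpendicular to both."""
    upper = np.asarray(knee, dtype=float) - np.asarray(hip, dtype=float)
    lower = np.asarray(ankle, dtype=float) - np.asarray(knee, dtype=float)
    n = np.cross(upper, lower)
    return n / np.linalg.norm(n)

def com_heading(root_quat_wxyz):
    """Heading (yaw about the vertical z-axis) extracted from the root
    link orientation, given as a (w, x, y, z) quaternion."""
    w, x, y, z = root_quat_wxyz
    return np.arctan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
```

For a leg bent in the x-z plane, the returned normal points along the y-axis, which is the expected IK-plane normal for that pose.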
5.3. Style-Based Pose Optimization
As mentioned above, the real-time reference data may not be easy for the character to follow from its current pose. Directly transmitting these data to the torque controller may cause an unpredictable disturbance that leads to a balance failure of the character. This often happens when the character stands on the ground in the DS state. Figure 7a indicates that the character at the current pose cannot reach the reference pose without slipping or changing its stance state. In this case, we propose the style-based pose optimization to make the reference pose feasible while maintaining its style features.
Firstly, the relative positions between the human ankle and pelvis links are calculated to simplify the human lower body as two connected IPM models, and the pelvis position of the character’s lower-body simple model relative to the character’s feet is then linearly retargeted from the human’s model. Figure 7b represents the retargeting law. The positions of the character’s left and right ankles are fixed by the current pose, and the position of the character’s pelvis is calculated accordingly. The vertical height (the z position) of the character’s pelvis is scaled by the size difference between the human and the character, using the ratio of the lower-body heights of the character and human, which are obtained from the given heights of the character and human models, respectively.
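The vertical retargeting can be sketched as a simple ratio; the linear scaling by lower-body heights is an assumed form of the retargeting law, and the function and parameter names are illustrative.

```python
def retarget_pelvis_height(z_human: float,
                           lower_body_char: float,
                           lower_body_human: float) -> float:
    """Scale the human pelvis height to the character by the ratio of the
    character's and human's lower-body heights.

    This linear scaling is a plausible reading of the retargeting law,
    assumed here for illustration rather than taken from the paper."""
    return z_human * (lower_body_char / lower_body_human)
```

For example, a character with half the human's lower-body height would receive half the human's pelvis height.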
The horizontal position (the x and y position) of the pelvis relative to the left and right ankles is defined by the corresponding relative positions in the human model.
Once the character’s simple model is determined, calculating the optimized hip and knee orientations becomes a three-dimensional inverse kinematics (IK) problem with an infinite number of solutions. As mentioned in [8], this problem can be constrained to a unique solution by specifying the signed inverse kinematics plane. We determine the IK plane using the leg motion plane normal curves from the style feature set, which reduces the three-dimensional IK problem to a two-dimensional IK problem with one unique solution.
Figure 7a displays the optimization process of the hip and knee orientations of the character. The reference pose in blue shadow is optimized to the pose in red shadow, which demonstrates both feasibility and style inheritance.
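Once the IK plane is fixed by the motion-plane normal, the hip-knee chain reduces to a planar two-link problem with a unique solution. A minimal law-of-cosines sketch follows; the function name, the angle conventions, and the link-length parameters are illustrative assumptions.

```python
import numpy as np

def two_link_ik(target_dist: float, l_upper: float, l_lower: float):
    """Planar two-link IK inside the chosen leg plane.

    Returns (hip_angle, knee_angle) in radians: hip_angle is measured from
    the hip-to-target line, knee_angle is the flexion from a straight leg.
    """
    # Clamp the target to the reachable annulus of the two-link chain.
    d = np.clip(target_dist, abs(l_upper - l_lower), l_upper + l_lower)
    # Law of cosines for the interior knee angle, then convert to flexion.
    cos_knee = (l_upper**2 + l_lower**2 - d**2) / (2.0 * l_upper * l_lower)
    knee = np.pi - np.arccos(np.clip(cos_knee, -1.0, 1.0))
    # Law of cosines for the hip angle relative to the hip-target line.
    cos_hip = (l_upper**2 + d**2 - l_lower**2) / (2.0 * l_upper * d)
    hip = np.arccos(np.clip(cos_hip, -1.0, 1.0))
    return hip, knee
```

A fully stretched leg (target at the sum of the link lengths) gives zero hip and knee angles, while a right-angle knee appears when the target distance equals the hypotenuse of the two links.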
Lastly, we calculate the target foot headings with the ankle-twist style features. Generally, the contact foot should be kept smoothly parallel to the ground to satisfy the full-foot landing requirement. After the foot orientation is decided, the ankle orientation is computed from the quaternion difference between the lower-body orientation and the foot orientation.
5.4. Balance Strategy
Human balance behavior relies heavily on the root orientation and hip joints. We adopt the SIMBICON [2,4] modulation law for the root and hip joints.
In our system, for the balance of the character motion, it is preferable to keep the transverse plane of the character body parallel to the ground. Therefore, the root link orientation about the X and Y axes is defined to be parallel to the X-Y plane of the world coordinate frame, and the root link orientation about the Z axis equals the COM-heading style feature.
The hip joints are also modulated according to the character motion. In the LS and RS states, for the 3-DOF stance hip joints, the desired angle $\theta_d$ is modulated as a blend of the reference stance hip angle $\theta_{ref}$ and the current stance hip angle $\theta_{cur}$ with weight $w$:

$\theta_d = w\,\theta_{ref} + (1 - w)\,\theta_{cur}$

For each DOF of the swing hip joint, the balance modulation is applied to the desired swing hip joint angle $\theta_d$ as:

$\theta_d = \theta_{ref} + c_d\,(d_d - d) + c_v\,(v_d - v)$

where $\theta_{ref}$ is the reference swing hip joint angle, $d_d$ is the desired horizontal movement distance, $d$ is the horizontal feedback movement distance from the simulation, $v_d$ and $v$ are the desired and feedback horizontal velocities of the COM, and $c_d$ and $c_v$ are the gain parameters referring to [2], respectively.
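The swing-hip modulation follows the SIMBICON-style linear feedback law on the COM position and velocity errors; a one-line sketch, where the default gain values are placeholders rather than the paper's tuned parameters:

```python
def swing_hip_target(theta_ref: float,
                     d_des: float, d_fb: float,
                     v_des: float, v_fb: float,
                     c_d: float = 0.5, c_v: float = 0.2) -> float:
    """SIMBICON-style balance feedback for one swing-hip DOF.

    theta_ref: reference swing hip angle from the optimized motion.
    d_des/d_fb: desired vs. simulated horizontal movement distance.
    v_des/v_fb: desired vs. simulated horizontal COM velocity.
    c_d, c_v:  feedback gains (placeholder values, not the paper's).
    """
    return theta_ref + c_d * (d_des - d_fb) + c_v * (v_des - v_fb)
```

With zero position and velocity errors, the target simply reproduces the reference angle; a positive distance error pushes the swing hip forward to recover balance.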
8. Discussion
We have presented a style-based teleoperation framework for motion synchronization and style mimicking to achieve tasks within human-character interactions in real-time. In this work, the human operator helps the character understand the required tasks and perceive the environment, and the character achieves the tasks by following the human motion while maintaining the human’s style and keeping an adaptable balance. The resulting motions are robust under external disturbances through different balance recovery patterns, such as changing the stance states or waving the hands, as decided by the human operator. Additionally, the system can mimic styles under distinct style differences while achieving the task, and it avoids the balance failures caused by the characteristic differences between the operator and the character. Our framework does not rely on inverse dynamics or dynamic solvers for computation and can run in publicly available forward simulation engines without any preprocessing by offline optimization or learning methods. These features make our method useful for many interactive applications such as gaming and rehabilitation. We expect this system to amplify a user’s authoring effort in creating more comprehensive behaviors based on the environment and social habits. A circular chain can thus be formed by human cognition and physics-based character animation in alternation, promoting the interaction and integration between them for a more anthropomorphic virtual world.
Certain limitations still exist. First, the operator motions in our system are restricted. Due to the model-based torque compensation in Section 6.3, the operator motions should fit the simplified models, i.e., the IPM in the DS state and the LIPM in the LS and RS states. This excludes highly dynamic or abnormal movements. Another drawback of our method is that it does not guarantee the motion feasibility of the character. Our system partially relies on human perception and understanding of the current tasks and situations; if the human is unresponsive or unskilled in operating the character, the control performance may not be as good as it should be. Future work may involve employing a pattern and style database to predict the next pose with respect to the required tasks and the human pose, which would help humans control the character more easily. The integration of strategies for solving highly dynamic motions with high computational efficiency is another interesting future direction; it would allow users to create rich styles and patterns of motion in teleoperation. Moreover, with rich styles and patterns, multi-person situations could be considered to create a more socialized and comprehensive strategy.