Electronics
  • Article
  • Open Access

17 May 2024

Machine Learning-Based Hand Pose Generation Using a Haptic Controller

1 Department of Digital Media, Seoul Women’s University, Seoul 01797, Republic of Korea
2 Department of Computer Science, Boston University, Boston, MA 02215, USA
3 Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85719, USA
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Multi-Robot Systems: Collaboration, Control, and Path Planning

Abstract

In this study, we present a novel approach to deriving hand poses from data input via a haptic controller, leveraging machine learning techniques. The input values received from the haptic controller correspond to the movement of five fingers, each assigned a value between 0.0 and 1.0 based on the applied pressure. The wide array of possible finger movements requires a substantial amount of motion capture data, making manual data integration difficult. This challenge is primarily due to the need to process and incorporate large volumes of diverse movement information. To tackle this challenge, our proposed method automates the process by utilizing machine learning algorithms to convert haptic controller inputs into hand poses. This involves training a machine learning model using supervised learning, where hand poses are matched with their corresponding input values, and subsequently utilizing this trained model to generate hand poses in response to user input. In our experiments, we assessed the accuracy of the generated hand poses by analyzing the angles and positions of finger joints. As the quantity of training data increased, the margin of error decreased, resulting in generated poses that closely emulated real-world hand movements.

1. Introduction

In character animation, the generation of dynamic movements, including body and hand poses, is crucial for conveying emotions, actions, and subtle expressions. Smooth and realistic hand movements significantly enhance the immersion of animated characters, thereby elevating storytelling and audience engagement. Character animation is commonly categorized into two main approaches: data-driven and physics-based methods. Data-driven methods utilize existing datasets of motion capture or keyframe animation to inform character movements and typically involve algorithms that analyze and interpolate motion data for animation generation [1,2]. Physics-based methods, on the other hand, simulate the physical interactions and constraints that govern movement in the real world [3,4]. In recent years, there has been active research into deep learning techniques for such data generation. However, research specifically targeting hand motion generation remains limited, with the majority of studies concentrating on character movement and interaction.

Hand tracking technology has shown significant advancements, particularly with camera-based systems [5]. Such systems excel at capturing the overall shape and position of the hand in real time, offering valuable data for diverse applications. However, camera-based tracking often struggles to capture the fine details of individual finger movements. Haptic controllers address this limitation by providing highly precise data on individual finger positions and bending angles, which is crucial for tasks requiring delicate manipulation in VR or animation. Furthermore, haptic controllers offer a supplementary control method in situations where camera views might be occluded. Overall, haptic controllers complement camera-based hand tracking by providing more refined finger movement data, enhancing their utility in various applications.

Thus, in this study, we propose a novel framework that uses a machine learning model driven by input values from a haptic controller to generate hand pose animations. In particular, our work introduces a custom-built haptic controller, as illustrated in Figure 1, whose five buttons are manipulated with the fingers. Because each button is attached to a motor, the custom controller can provide haptic feedback to the fingers when grasping a virtual object.
Figure 1. In-house haptic controller.
To facilitate data generation for our machine learning model, we initiated the process by smoothly integrating the Unity game engine with the haptic controller. Through this integration, we were able to use the input values from the controller to create detailed hand poses, which were essential components for training our model. In our setup, the haptic controller serves as the means for user interaction. When no buttons are engaged, the fingers are relaxed, while activating all buttons manually transitions the hand into a clenched fist. This intentional setup ensures precise control over the hand’s state, providing various expressions for data generation. Users are able to easily manipulate the haptic controller with their five fingers, shaping the hand’s configuration according to their preferences. Each manipulation prompts the automatic generation of corresponding data points, effectively translating the user’s intended hand movements into data. These input values, reflecting the movements of the five fingers, formed the foundation of our dataset. Meanwhile, the output values captured the nuanced angles of 15 joints, closely aligned with the poses of the fingers. With each finger having three joints, totaling 45 rotational dimensions, the output data offered a comprehensive representation of hand movement. For every frame of data, we followed a structured format: five input values paired with forty-five output values, ensuring a robust dataset for effective machine learning. Leveraging this carefully created dataset, spanning thousands of frames, we trained an XGBoost model to generate natural hand gestures.
The training methods of machine learning could be classified into three categories: supervised learning, unsupervised learning, and reinforcement learning. In our research, we introduce a method specifically aimed at generating smooth animations through supervised learning, which utilizes labeled data to establish relationships between input features and desired outputs. To effectively train the model, we relied on a labeled dataset containing examples of hand poses paired with their corresponding smooth animations, with various features extracted to capture hand movements such as joint angles and hand positions.
To generate hand gesture data, we captured hand poses using motion capture equipment. The generated motion consists of bending and extending one finger at a time. We used five motions that fully bend and extend the thumb, index finger, middle finger, ring finger, and pinky finger, respectively. Each motion consists of a total of 2000 frames. The label for a fully extended finger is 0, and the label for a fully bent finger is 1. In between, the pose is assigned a real value between 0.0 and 1.0 based on the bending angle. A hand consists of five fingers, and each finger has three joints, for a total of fifteen joints. Each joint has three angles (x, y, and z) in a local coordinate system. This indicates that, in one frame, a hand pose has 45 angle values. The five input buttons of the haptic controller correspond to the poses of each finger. Thus, five real numbers between 0.0 and 1.0 correspond to 45 real numbers between 0 and 360 degrees. The haptic controller was connected to the Unity game engine and automatically generated various data based on user input. Table 1 shows an example of data created in this manner. By connecting the Unity game engine and haptic controller, we could automatically generate and label datasets. This allowed us to generate as much training data as needed and rapidly train and evaluate our models, which helped in developing our hand pose mapping solution.
Table 1. Example of training data generated from Unity: each frame pairs the input values of the five fingers with three Euler angles per finger joint; with three joints per finger, this gives nine output values per finger (45 per frame).
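To make the per-frame format described above concrete, the following minimal sketch (written in Python for illustration, not the authors' Unity-side code) shows one way a per-finger label in [0.0, 1.0] could be derived from a bend angle, and how a single frame pairs five inputs with forty-five joint angles; the bend-angle range and the record fields are assumptions.

from dataclasses import dataclass
from typing import List

# Assumed bend range for illustration; the paper does not state the exact angles.
FULLY_EXTENDED_DEG = 0.0
FULLY_BENT_DEG = 90.0

def bend_label(bend_angle_deg: float) -> float:
    """Map a measured finger bend angle to a label in [0.0, 1.0] by linear interpolation."""
    t = (bend_angle_deg - FULLY_EXTENDED_DEG) / (FULLY_BENT_DEG - FULLY_EXTENDED_DEG)
    return min(max(t, 0.0), 1.0)

@dataclass
class FrameRecord:
    frame: int
    inputs: List[float]        # 5 values, one per finger, each in [0.0, 1.0]
    joint_angles: List[float]  # 45 values: 5 fingers x 3 joints x (x, y, z) Euler angles

# Example frame: index finger half bent, all other fingers fully extended (hypothetical values).
record = FrameRecord(frame=0,
                     inputs=[bend_label(a) for a in (0.0, 45.0, 0.0, 0.0, 0.0)],
                     joint_angles=[0.0] * 45)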
The generated datasets were of five sizes: 1000, 3000, 5000, 7000, and 9000 frames. Each was used to train an XGBoost model, with the data split into training and validation sets at a ratio of 8:2. Throughout the training process, which was evaluated on the validation set to ensure generalizability, the model learned to correlate the input features with the resulting hand poses. This approach enables the generation of smooth animations even for novel inputs, as the model can predict the corresponding pose when presented with new controller values. By leveraging supervised learning techniques, we can train models to produce realistic and seamless hand movement animations, ultimately enhancing the quality of animations across diverse applications, including virtual reality (VR), gaming, and motion capture. Hand pose generation is of particular importance in VR: realistic hand representation plays a crucial role in enhancing the sense of presence, and when virtual hands accurately mimic real-world hand movements, cognitive dissonance is reduced and immersion is enhanced. Thus, our proposed method effectively maps user input from a haptic controller to smooth and natural hand pose animations, contributing to both immersive VR experiences and advancements in character animation.
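As a brief illustration of this runtime use, the sketch below shows how a trained regressor mapping five controller values to forty-five Euler angles could be queried for a new input; the helper name and array layout are hypothetical, not part of the authors' implementation.

import numpy as np

def generate_pose(model, finger_inputs):
    """finger_inputs: five values in [0.0, 1.0], one per finger.
    Returns a (5, 3, 3) array: fingers x joints x (x, y, z) Euler angles."""
    x = np.asarray(finger_inputs, dtype=np.float32).reshape(1, 5)
    angles = model.predict(x)  # assumed to return an array of shape (1, 45)
    return angles.reshape(5, 3, 3)

# Example (assuming `model` has been trained as described in Sections 3 and 4):
# pose = generate_pose(model, [0.0, 0.5, 0.0, 0.0, 0.0])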
The key contributions of our proposed framework are outlined as follows:
  • We utilized an in-house haptic controller integrated with the Unity game engine to expedite motion data collection; this integration improved the speed, accuracy, and user-friendliness of hand pose data collection.
  • We employed supervised learning to generate hand pose animation seamlessly, eliminating the need for extensive motion data.
  • We systematically assessed the effectiveness of our proposed method by evaluating joint angle and position errors across different training dataset sizes.

3. Methods

We propose a novel machine learning-based technique to generate natural hand poses from a haptic controller. Figure 2 presents the overall pipeline of our work. First, to generate training data, we paired five input values from the haptic controller with corresponding hand poses. Each hand pose consists of fifteen joints, three for each of the five fingers, with each joint stored as Euler (x, y, z) angles in a local coordinate system. Consequently, each training sample comprised five real values as inputs and 45 (15 × 3) angle values as outputs. Using thousands of such samples, we trained a machine learning model to generate hand gestures. In our experiments, we observed variations in the accuracy of the results depending on how the training datasets were generated and how large they were.
Figure 2. The overall pipeline. (a) The haptic controller is used to generate coordinates for each joint, and (b) XGBoost is then employed for estimation. The input data are force values between 0 and 1 applied by each finger, and the output data are the corresponding coordinates (x, y, and z) of each joint. Here, i denotes an individual row of the force–coordinate data pairs.

3.1. Data Collection Using Unity Game Engine

In this work, to train the machine learning model for generating hand poses, we used the Unity game engine to create multiple training datasets of varying sizes. As shown in Figure 3, the Unity game engine was used both to generate hand pose data and to inspect the results produced by the trained machine learning model. Initially, the haptic controller was manually manipulated to generate values ranging from 0.0 to 1.0. From these values, we established a real-time data generation system capable of saving hand poses. The input values from users were collected via the five buttons on the haptic controller. This controller was linked to the Unity game engine using a specially crafted module, facilitating the instantaneous transfer of input data. When the buttons are inactive, the fingers maintain their fully extended position, and, when pressed, they bend accordingly. This setup enabled us to precisely capture the user’s hand movements.
Figure 3. Unity screen for hand pose data generation.
We gathered input data from the haptic controller and calculated the output values, representing the angles of 15 joints in hand poses. These angles were expressed as Euler angles (x, y, and z) and were computed at a rate of 90 frames per second, resulting in an initial dataset of 1000 samples. Moreover, we conducted experiments to explore the effects of varying data quantities. By incrementally increasing the dataset size in 2000-unit intervals, we produced datasets containing 3000, 5000, 7000, and 9000 samples. Using this approach, we evaluated how dataset size impacted the accuracy of the generated hand poses. To streamline the data collection process, the system incorporates an automated saving feature. When users repeatedly press and release the buttons on the haptic controller within a set timeframe, the data are automatically stored. This functionality ensured efficient and hassle-free data generation, enhancing the overall user experience. Table 1 illustrates an example of the dataset generated from Unity. The left column shows the frame number, the middle column shows the input values of the five buttons on the haptic controller, and the right column shows the 45 output values, which are the angles of the 15 hand joints that generate the hand pose. Each joint has three angle values (x, y, and z), and each finger has three joints. Thus, nine output values correspond to each input value. We used the generated data for training and validation in a ratio of 8:2.
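A minimal loading sketch is shown below; it assumes the Unity-side logger exports one CSV row per frame containing the frame number, the five input values, and the forty-five output angles (the file name and column layout are assumptions rather than the exact export format).

import numpy as np
from sklearn.model_selection import train_test_split

# Assumed layout: column 0 = frame number, columns 1-5 = controller inputs,
# columns 6-50 = 45 Euler angles (15 joints x 3 axes).
data = np.loadtxt("hand_pose_frames.csv", delimiter=",", skiprows=1)
X = data[:, 1:6]
y = data[:, 6:51]

# 8:2 split between training and validation data, as described above.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)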

3.2. Hand Pose Estimation Using Machine Learning

For our application, we employed machine learning techniques to generate realistic and seamless hand poses. We utilized supervised learning, leveraging an annotated dataset obtained from the Unity game engine. Specifically, we selected Extreme Gradient Boosting (XGBoost), as it is known for its performance and speed. Furthermore, XGBoost handles regression tasks and incorporates L1 and L2 regularization to mitigate overfitting. In this work, we utilized the XGBoost algorithm to illustrate the effectiveness of combining a number of weak models into a single strong model. XGBoost builds an ensemble of decision trees known as Classification and Regression Trees (CART). Each decision tree partitions the input data based on features at each node, and XGBoost combines multiple such trees, weighted according to their characteristics, so that individual trees are trained on different features.
The equation for the CART model is as follows:
$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}_{\mathrm{CART}}$
Each function $f_k$ in the CART model serves as a weighted tree function for specific features. The XGBoost model predicts the output for input $x_i$ as the sum of the outputs of all weighted functions in the CART model. The training objective function of CART is as follows:
$\mathrm{obj}(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$
Here, the function $l(y_i, \hat{y}_i)$ represents the loss function applied to $y_i$ and $\hat{y}_i$, and $\Omega(f_k)$ acts as a regularization function to prevent overfitting.
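For reference, in the standard XGBoost formulation [30], the regularization term for a tree $f$ with $T$ leaves and leaf weight vector $w$ is $\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2}$, where $\gamma$ penalizes the number of leaves and $\lambda$ is the L2 penalty on the leaf weights (the lambda hyperparameter tuned in Section 4.1).

Because each frame has 45 target angles, the regression can be treated as a multi-output problem. The following sketch shows one plausible setup that wraps XGBRegressor in scikit-learn's MultiOutputRegressor (fitting one boosted model per output); the paper does not state which multi-output strategy was used, so this wrapper is an assumption for illustration, and X_train/y_train follow the loading sketch in Section 3.1.

from xgboost import XGBRegressor
from sklearn.multioutput import MultiOutputRegressor

# One gradient-boosted tree ensemble per output angle (assumed strategy, not the authors' stated setup).
base = XGBRegressor(objective="reg:squarederror")
model = MultiOutputRegressor(base).fit(X_train, y_train)  # X_train: (N, 5), y_train: (N, 45)
val_pred = model.predict(X_val)                           # predicted angles, shape (n_val, 45)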

4. Experiments

4.1. Experimental Setup

In our experimental data training approach, we aimed to optimize results through parameter tuning within the XGBoost algorithm. We systematically adjusted three key parameters: the number of rounds, which determines the number of trees; the maximum tree depth; and the lambda for regularization. Our experimentation involved thorough exploration, testing different numbers of boosting rounds and varying tree depths. To address the common concern of overfitting observed with deeper trees, we tuned the lambda parameter, which plays an important role in L2 regularization. For parameter tuning, we used grid search to identify the most effective configuration. In addition, for this work, we generated thousands of diverse hand pose datasets containing non-sequential and dissimilar hand poses to help avoid overfitting. After thorough experimentation and rigorous evaluation, we found that the optimal setup used 200 boosting rounds (estimators), a maximum tree depth of 3, and a lambda value of 10. Throughout the experiment, we maintained a constant training-to-testing ratio of 7:3 for each hand pose dataset. Python 3.12.0 with XGBoost 2.0.3 and scikit-learn 1.4.0 was used to conduct our experiments.
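The parameter search described above can be reproduced along the following lines. The candidate grids below are assumptions chosen to bracket the reported optimum (200 rounds, maximum depth 3, lambda 10), and the multi-output wrapper follows the sketch in Section 3.2.

from xgboost import XGBRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

param_grid = {
    "estimator__n_estimators": [100, 200, 300],  # number of boosting rounds
    "estimator__max_depth": [3, 5, 7],           # maximum tree depth
    "estimator__reg_lambda": [1, 10, 100],       # L2 regularization strength (lambda)
}
search = GridSearchCV(
    MultiOutputRegressor(XGBRegressor(objective="reg:squarederror")),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)  # training split from Section 3.1
print(search.best_params_)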

4.2. Pose Difference Detection

In this work, we compared the actual hand pose with the generated pose to measure the accuracy of the hand movements produced by the XGBoost algorithm. Among the numerous joints comprising the hand, we analyzed the accuracy of the 15 finger joints involved in the animation. Figure 4 illustrates how we assessed the error for a single finger joint within a single frame. In the left image, dotted lines represent the finger bones generated by our proposed method, while solid lines depict those measured from the actual pose; the small circles along the lines indicate the joints. For one of the three bones forming a finger, we select a consistently colored reference joint and place the two bones at the same reference position, and then calculate both the angle and the distance between them. In the right image, the red arc illustrates the angle error between the two bones, while the blue line indicates the distance error between the two joints. Because the reference joint is aligned uniformly, all measured errors are local errors. We measure angle differences only with respect to the local axes, since referencing them to the world axis is not meaningful. For position differences, we measure the distance variation relative to the wrist’s root position, which serves as the reference for the world position.
Figure 4. Difference measurement of position and angle: The dotted lines represent the generated pose, while solid lines represent the ground truth pose. The red arc indicates the angle difference, and the blue line represents the position difference.
To calculate the position error ($S_d$) and angular error ($S_a$) of hand poses, we used the following formulas:
$S_d = \sum_{j=1}^{n_2} \sum_{i=1}^{n_1} d_i, \quad d_i = \lVert p_{1i} - p_{2i} \rVert$
$S_a = \sum_{j=1}^{n_2} \sum_{i=1}^{n_1} \theta_i, \quad \theta_i = \arccos\!\left( \frac{v_{1i} \cdot v_{2i}}{\lVert v_{1i} \rVert \, \lVert v_{2i} \rVert} \right)$
$S_d$ represents the sum of all finger joint position errors across all frames in a hand gesture animation, where $p_{1i}$ denotes the position of a finger joint in the original pose and $p_{2i}$ denotes its position in the generated pose. Likewise, $S_a$ represents the sum of all finger joint angle errors across all frames, where $v_{1i}$ denotes the vector of a finger bone in the original pose and $v_{2i}$ denotes its vector in the generated pose. The direction of each bone vector is defined from the joint closer to the palm to the joint closer to the fingertip, and the angle between the two vectors is calculated using the vector dot product. $n_1$ is the number of finger joints and $n_2$ is the total number of frames in the hand gesture animation; we set $n_1$ to 15 and $n_2$ to 2000.
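Both measures can be computed directly from arrays of joint positions and bone vectors. The sketch below assumes ground-truth and generated arrays of shape (n_frames, 15, 3); the function and variable names are illustrative, not taken from the authors' implementation.

import numpy as np

def position_error(p_true, p_gen):
    """S_d: sum of joint-to-joint distances over all frames and joints."""
    return np.linalg.norm(p_true - p_gen, axis=-1).sum()

def angle_error(v_true, v_gen, eps=1e-8):
    """S_a: sum of angles (in degrees) between corresponding bone vectors."""
    cos = (v_true * v_gen).sum(axis=-1) / (
        np.linalg.norm(v_true, axis=-1) * np.linalg.norm(v_gen, axis=-1) + eps)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).sum()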
The values in Table 2 and the graph in Figure 5 represent the angular differences between the original and generated hand poses. These angular differences are calculated using the formula for $S_a$, which sums the angle errors across all finger joints and all frames in the animation. The green graph in Figure 6 represents the local position differences of the finger joints ($p_1$, $p_2$) in the local coordinate system, calculated using the formula for $S_d$, which sums the position errors across all finger joints and all frames. The red graph in Figure 6 represents the world position differences of the finger joints in the world coordinate system, also calculated using the formula for $S_d$.
Table 2. Effects of training dataset size on hand pose generation accuracy.
Figure 5. Results of angle differences. The x-axis indicates the number of training data samples and the y-axis indicates the angle error in degrees (°).
Figure 6. Results of position differences. The x-axis indicates the number of training data samples and the y-axis indicates the position differences.

4.3. Experimental Results

To measure the accuracy of the hand poses generated by the XGBoost algorithm, we used the differences in joint angles and positions, which are visually distinguishable among the various error measures. Table 2 shows the errors of hand poses generated using each training dataset. From left to right, the table presents results obtained using 1000, 3000, 5000, 7000, and 9000 data samples. The angle difference represents the cumulative error in finger joint angles between the original pose and the pose generated by the proposed method. The unit for angles is degrees (°); position has no specific unit, as it corresponds to values measured in the Unity game engine.
To measure the accuracy of hand poses generated by the trained machine learning model, 2000 test samples were created by manipulating the haptic controller. These were generated using the same method as the training data, differing only in quantity and content. The error values represent the cumulative difference in all angles over 2000 frames across 15 joints, compared to the ground truth pose. For instance, the model trained with 1000 data samples produced a cumulative angle error of 93,232 degrees across all generated poses. Position differences are measured in the same way, but using joint positions instead of joint angles. World position refers to results measured along the world axis, while local position refers to results measured along each joint’s local axis. The error tended to decrease as the number of training samples increased.
Figure 5 presents the angle differences from Table 2 as a graph. The vertical axis represents the angle error, while the horizontal axis represents the amount of training data. As shown in Table 2, as the number of data samples increases, the joint angle error decreases. Similarly, Figure 6 illustrates the position differences from Table 2 as a graph; the red line represents the world position difference, while the green line represents the local position difference. As with the angle differences, the joint position error decreases as the number of data samples increases. The world position error is slightly larger than the local position error because errors in local positions accumulate in the world positions. The larger gap in error between the 1000-sample dataset and the datasets with 3000 or more samples may be attributed to the test dataset containing twice as many samples as the smallest training set, totaling 2000. From approximately 3000 samples onward, once the amount of training data exceeds that of the test data, we observe a consistent error rate. However, the reduction in error is relatively modest compared to the increase in dataset size.
Figure 7 presents a sequence of images with generated hand poses based on the number of data samples. The numbers at the top represent the number of data points used for training. The column on the far left shows the hand poses generated using real-world data; these are a few representative poses out of 2000 hand poses. The remaining columns, from left to right, illustrate the hand poses generated using 1000, 3000, 7000, and 9000 training data samples, respectively.
Figure 7. Sequence of images showing hand poses generated from varying numbers of data samples. The dotted red circles indicate the areas where there are errors in the position and angle of the hand joints.
Each column of consecutive images captures results at intervals of 150 frames (about 1.67 s), starting from frame 30 (i.e., frames 30, 180, 330, 480, 630, 780, 930, 1080, and 1380). Since poses in consecutive frames are similar, we sampled poses more than 1 s apart. Hand poses generated with 1000 data samples visually differ from the ground truth pose, especially in the way the second finger bends in the first and last images. However, as the number of data samples increases, the similarity to the ground truth pose becomes more apparent. The hand poses generated by the model trained with 1000 data samples exhibit noticeable visual discrepancies. With 3000 training samples, only subtle differences from the actual poses remain, and these may not be easily discernible without close observation. With 9000 or more samples, the generated poses closely resemble the actual ones, making it very difficult to distinguish between them.
In Figure 7, inconsistencies in the 1000-data point results can be observed in certain poses, but it is difficult to detect differences in the poses when there are 3000 or more data points. This is mainly because the error values listed in Table 2 are cumulative values of all the errors that occur across the 15 finger joints over 2000 frames, and thus appear large. However, the error that occurs in a single finger joint in a single frame is negligible. In other words, in the 1000-data point result, the accumulated angular difference is 93,232 degrees, but the actual error in a single finger joint, visible to the naked eye, is very small, averaging 3.1 degrees (93,232/(2000 × 15)). In the 3000-data point results, the per-joint angular error is even smaller, averaging about 1.1 degrees (33,992/(2000 × 15)). The positional difference can be calculated in the same way: the accumulated local error is 152.3, but the error in a single finger joint is negligible, averaging about 0.005 (152.3/(2000 × 15)).

5. Conclusions

In this work, we present a method to create natural hand movements using the XGBoost algorithm based on input values from our custom-built haptic controller. We synchronized the Unity game engine with the haptic controller to generate multiple training datasets consisting of thousands of frames. Moreover, we conducted experiments by adjusting the hyperparameters of the XGBoost algorithm numerous times to identify the most accurate parameters for generating hand poses. Specifically, we generated hand gestures using five different datasets ranging from 1000 to 9000 data points in 2000-point increments. We observed that the error decreased, and the generated poses became more similar to the real poses, as the number of data points increased. To measure the error, we considered not only visual aspects such as the pose but also numerical values such as the angles and positions of the finger joints, which are difficult to observe accurately with the naked eye. For the position error, we measured it using both local coordinates with the parent of the finger joint as the origin and world coordinates with the wrist as the origin. The error was larger in the world coordinate system, because the overall position of the joints was farther from the origin; however, as in the local coordinate system, the error showed a decreasing trend as the number of data points increased. The results using the 1000-data point dataset could be distinguished from the real poses to some extent with the naked eye, but the results trained with 3000 or more data points were difficult to distinguish. For this reason, we used the positions and angles of the finger joints as a numerical criterion for accuracy. Overall, the number of training data points and the size of the error showed an inverse relationship.
However, increasing the amount of training data beyond 10,000 samples did not yield significant further improvements in visual quality or error reduction. By employing a trained XGBoost model, we can generate hand poses that appear natural without the need for extensive motion data. Furthermore, we propose a technique to generate hand poses from a stick-type haptic controller, which does not conform to the shape of the hand. Even with extensive training data, however, minor tremors may still occur during hand pose generation. This highlights the potential benefit of integrating time-series methods or other refined approaches to further enhance the process. In the future, it would be interesting to explore making the controller thinner and more ergonomically designed for comfortable handling, or to utilize motion data interpolation to enable interaction with surrounding objects or environments. Such extensions could enhance the user experience and offer exciting possibilities for further development.

Author Contributions

Methodology, E.-J.L.; Implementation, J.C. and D.O.; Validation, J.L.; Investigation, J.L. and D.O.; Data curation, J.C.; Writing—review & editing, E.-J.L.; Supervision, E.-J.L.; Funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP): 2021-0-00986, and supported by a research grant from Seoul Women’s University: 2024-0028.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Gleicher, M. Retargetting Motion to New Characters. In Proceedings of the Annual Conference on Computer Graphics and Interactive Techniques, Orlando, FL, USA, 19–24 July 1998; Association for Computing Machinery: New York, NY, USA, 1998; pp. 33–42. [Google Scholar] [CrossRef]
  2. Kovar, L.; Gleicher, M.; Pighin, F. Motion Graphs. In Proceedings of the Annual Conference on Computer Graphics and Interactive Techniques, San Antonio, TX, USA, 23–26 July 2002; Association for Computing Machinery: New York, NY, USA, 2002; Volume 21, pp. 473–482. [Google Scholar] [CrossRef]
  3. Faloutsos, P.; Van De Panne, M.; Terzopoulos, D. Composable controllers for physics-based character animation. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’01), Los Angeles, CA, USA, 12–17 August 2001; Association for Computing Machinery: New York, NY, USA; pp. 251–260. [Google Scholar] [CrossRef]
  4. Coros, S.; Beaudoin, P.; Van De Panne, M. Robust task-based control policies for physics-based characters. In SIGGRAPH Asia ’09: ACM SIGGRAPH Asia 2009 Papers; Association for Computing Machinery: New York, NY, USA; pp. 1–9. [CrossRef]
  5. Kapitanov, A.; Kvanchiani, K.; Nagaev, A.; Kraynov, R.; Makhliarchuk, A. HaGRID–HAnd Gesture Recognition Image Dataset. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2024. [Google Scholar]
  6. Pejsa, T.; Pandzic, I.S. State of the Art in Example-Based Motion Synthesis for Virtual Characters in Interactive Applications. In Computer Graphics Forum; Blackwell Publishing Ltd.: Oxford, UK, 2010; Volume 29, pp. 202–226. [Google Scholar] [CrossRef]
  7. Geijtenbeek, T.; Pronost, N. Interactive Character Animation Using Simulated Physics: A State-of-the-Art Review. In Computer Graphics Forum; Blackwell Publishing Ltd.: Oxford, UK, 2012; Volume 31, pp. 2492–2515. [Google Scholar] [CrossRef]
  8. Karg, M.; Samadani, A.-A.; Gorbet, R.; Kühnlenz, K.; Hoey, J.; Kulić, D. Body Movements for Affective Expression: A Survey of Automatic Recognition and Generation. IEEE Trans. Affect. Comput. 2013, 4, 341–359. [Google Scholar] [CrossRef]
  9. Wang, X.; Chen, Q.; Wang, W. 3D Human Motion Editing and Synthesis: A Survey. Comput. Math. Methods Med. 2014, 2014, 104535. [Google Scholar] [CrossRef] [PubMed]
  10. Alemi, O.; Pasquier, P. Machine Learning for Data-Driven Movement Generation: A Review of the State of the Art. arXiv 2019, arXiv:1903.08356. [Google Scholar]
  11. Marsot, M.; Rekik, R.; Wuhrer, S.; Franco, J.S.; Olivier, A.H. Correspondence-free online human motion retargeting. arXiv 2023, arXiv:2302.00556. [Google Scholar]
  12. Victor, L.; Meyer, A.; Bouakaz, S. Pose Metrics: A New Paradigm for Character Motion Edition. arXiv 2023, arXiv:2301.06514. [Google Scholar]
  13. Holden, D.; Saito, J.; Komura, T.; Joyce, T. Learning Motion Manifolds with Convolutional Autoencoders. In SA ’15: SIGGRAPH Asia Technical Briefs; Association for Computing Machinery: New York, NY, USA, 2015. [Google Scholar] [CrossRef]
  14. Holden, D.; Saito, J.; Komura, T. A Deep Learning Framework for Character Motion Synthesis and Editing. ACM Trans. Graph. 2016, 35, 138. [Google Scholar] [CrossRef]
  15. Holden, D.; Habibie, I.; Kusajima, I.; Komura, T. Fast Neural Style Transfer for Motion Data. IEEE Comput. Graph. Appl. 2017, 37, 42–49. [Google Scholar] [CrossRef]
  16. Grassia, F.S. Practical Parameterization of Rotations Using the Exponential Map. J. Graph. Tools 1998, 3, 29–48. [Google Scholar] [CrossRef]
  17. Pavllo, D.; Grangier, D.; Auli, M. QuaterNet: A Quaternion-based Recurrent Model for Human Motion. In Proceedings of the British Machine Vision Conference, Newcastle, UK, 3–6 September 2018; BMVA Press: London, UK, 2018. Available online: https://dblp.org/rec/conf/bmvc/PavlloGA18 (accessed on 31 September 2018).
  18. Pavllo, D.; Feichtenhofer, C.; Auli, M.; Grangier, D. Modeling Human Motion with Quaternion-Based Neural Networks. Int. J. Comput. Vis. (IJCV) 2020, 128, 855–872. [Google Scholar] [CrossRef]
  19. Kim, S.; Park, I.; Kwon, S.; Han, J. Motion Retargetting based on Dilated Convolutions and Skeleton-specific Loss Functions. Comput. Graph. Forum 2020, 39, 497–507. [Google Scholar] [CrossRef]
  20. Aberman, K.; Weng, Y.; Lischinski, D.; Cohen-Or, D.; Chen, B. Unpaired Motion Style Transfer from Video to Animation. ACM Trans. Graph. 2020, 39, 64. [Google Scholar] [CrossRef]
  21. Aberman, K.; Li, P.; Lischinski, D.; Sorkine-Hornung, O.; Cohen-Or, D.; Chen, B. Skeleton-Aware Networks for Deep Motion Retargeting. ACM Trans. Graph. 2020, 39, 62. [Google Scholar] [CrossRef]
  22. Lee, K.; Lee, S.; Lee, J. Interactive Character Animation by Learning Multi-Objective Control. ACM Trans. Graph. 2018, 37, 180. [Google Scholar] [CrossRef]
  23. Starke, S.; Zhao, Y.; Komura, T.; Zaman, K. Local Motion Phases for Learning Multi-Contact Character Movements. ACM Trans. Graph. 2020, 39, 54. [Google Scholar] [CrossRef]
  24. Holden, D.; Kanoun, O.; Perepichka, M.; Popa, T. Learned Motion Matching. ACM Trans. Graph. 2020, 39, 53. [Google Scholar] [CrossRef]
  25. Tang, X.; Wang, H.; Hu, B.; Gong, X.; Yi, R.; Kou, Q.; Jin, X. Real-time controllable motion transition for characters. ACM Trans. Graph. 2022, 41, 1–10. [Google Scholar] [CrossRef]
  26. Duan, Y.; Shi, T.; Zou, Z.; Lin, Y.; Qian, Z.; Zhang, B.; Yuan, Y. Single-shot motion completion with transformer. arXiv 2021, arXiv:2103.00776. [Google Scholar]
  27. Kirac, F.; Kara, Y.E.; Akarun, L. Hierarchically constrained 3D hand pose estimation using regression forests from single frame depth data. Pattern Recognit. Lett. 2014, 50, 91–100. [Google Scholar] [CrossRef]
  28. Tang, D.; Yu, T.H.; Kim, T.K. Real-time articulated hand pose estimation using semi-supervised transductive regression forests. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013. [Google Scholar]
  29. Tang, D.; Jin Chang, H.; Tejani, A.; Kim, T.K. Latent regression forest: Structured estimation of 3D articulated hand posture. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014. [Google Scholar]
  30. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016. [Google Scholar]
  31. Oberweger, M.; Wohlhart, P.; Lepetit, V. Hands deep in deep learning for hand pose estimation. arXiv 2015, arXiv:1502.06807. [Google Scholar]
  32. Arimatsu, K.; Mori, H. Evaluation of machine learning techniques for hand pose estimation on handheld device with proximity sensor. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 25–30 April 2020. [Google Scholar]
  33. Wu, M.Y.; Ting, P.W.; Tang, Y.H.; Chou, E.T.; Fu, L.C. Hand pose estimation in object-interaction based on deep learning for virtual reality applications. J. Vis. Commun. Image Represent. 2020, 70, 102802. [Google Scholar] [CrossRef]
  34. Ohkawa, T.; Li, Y.-J.; Fu, Q.; Furuta, R.; Kitani, K.M.; Sato, Y. Domain adaptive hand keypoint and pixel localization in the wild. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer Nature: Cham, Switzerland, 2022. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
