A Markerless RGB-Based Dataset of Continuous Hand Joint Kinematics in Functional Grasping Tasks

Yadav, Shubham; Narayan, Jyotindra

doi:10.3390/data11060142

Open Access Data Descriptor

by Shubham Yadav ¹ and Jyotindra Narayan ²,*

¹ Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna 801106, India

² Smart Healthcare & Robotics Interfacing Laboratory, Department of Mechanical Engineering, Indian Institute of Technology Patna, Patna 801106, India

* Author to whom correspondence should be addressed.

Data 2026, 11(6), 142; https://doi.org/10.3390/data11060142

Submission received: 9 May 2026 / Revised: 9 June 2026 / Accepted: 10 June 2026 / Published: 12 June 2026

Abstract

The majority of currently available hand kinematic databases have been gathered using expensive marker-based systems or are restricted to a particular gesture-recognition task, failing to capture the dynamic nature of joints when the hand is engaged with an object. To address this gap, we introduce the RGB-based Hand Joint Kinematics (RGB-HJK) dataset, a publicly available collection of continuous, frame-level 3D joint angle trajectories, recorded while ten healthy adults (six male, four female; age 25.8 ± 3.2 years; BMI 22.8 ± 2.0 kg/m²) performed five standardized object interaction grasps: Power Grasp (cylindrical bottle), Tripod Grasp (pen), Static Power Hold (smartphone), Precision Pinch (thin paper), and Lateral Pinch (book). Data were collected using a standard RGB camera and the MediaPipe Hands markerless pipeline at 26.95 ± 0.29 Hz, a rate that was stable across all subjects. Each participant completed five trials for each grasp type. After active-hold filtering, 28,111 validated frames remained, with a 100% detection rate for all 250 trials. Intra-subject repeatability was good (mean SD ≤ 7.9° across all joint–grasp combinations) and inter-subject variability was within the range expected from normal anatomical diversity. Importantly, kinematic validation of the Index Proximal Interphalangeal (PIP) joint (61.8° ± 18.4°) showed values consistent with ranges reported in previous studies using instrumented gloves and depth sensors. Principal Component Analysis (PCA) confirmed clear linear separability among the five grasp configurations. Unlike existing datasets, the RGB-HJK method does not compromise the natural sense of touch and is free of hardware occlusions, thereby providing an easily accessible ecological baseline.

Dataset: https://doi.org/10.34740/kaggle/dsv/16123762.

Dataset License: CC BY 4.0.

Keywords: hand kinematics; grasp taxonomy; markerless motion capture; MediaPipe Hands; joint angles; rehabilitation; gesture recognition; RGB camera

1. Introduction

The human hand is arguably the most versatile part of the human body, enabling a smooth transition from forceful interactions with the environment to highly precise fine motor tasks [1,2,3]. Such unique versatility makes the quantitative characterization of hand joint kinematics a valuable pursuit in a wide variety of disciplines, including clinical rehabilitation, prosthetic hand control, human–computer interaction, and gesture recognition [4,5,6]. High-quality kinematic datasets provide the fundamental raw material that enables researchers to accurately model joint coordination, track recovery trajectories after neuromusculoskeletal injury, and design assistive technologies that truly reflect real-world, practical hand function [7,8,9].

For a long time, accurate human movement analysis relied on expensive marker-based systems. Although these systems provide precise spatial tracking, they require dedicated setups, lengthy preparation, and body-mounted markers that can interfere with natural hand movements during grasping tasks [10,11,12,13]. Instrumented data gloves address several of these line-of-sight and marker-occlusion concerns, but they raise challenges of their own: they tend to be bulky and uncomfortable, and they frequently require calibration and custom fitting. More recently, RGB-D cameras have become an accepted alternative, though with trade-offs: they require specialized equipment, and tracking may fail when fingers grasp an object tightly enough that each finger obstructs the others. Such failures are problematic when accurate results are needed [14,15].

Deep learning-based pose estimation has reached a high level of maturity and has greatly contributed to hand tracking and dataset generation. Frameworks such as MediaPipe Hands allow for real-time 3D hand landmark inference from standard RGB cameras, without the use of physical markers, depth sensors, or complex lab infrastructure [16,17,18]. Based on these advances, a number of recent hand kinematic datasets have been proposed [19,20]. However, the majority of these publicly available datasets are designed for static pose estimation or abstract gesture recognition tasks [19,20]. They mostly focus on classifying hand shapes, rather than encoding the continuous, frame-by-frame joint angle trajectories that are dynamically generated during a hand’s physical interaction with an object [21,22]. Even among those databases that offer kinematics information, there is little to no structure in terms of alignment with existing grasp classifications [23,24,25]. Daily grasping activities occur in predictable, repeatable patterns that have been studied and classified extensively [24]. Unfortunately, such grasp classifications are not consistently available in many databases. This creates a gap between the data available from vision-based databases and the need for proper biomechanical modeling of the hand. Another issue in this domain is the cost of such a setup. Table 1 presents a comprehensive comparison to other well-known datasets already present in the literature since 2017.

Considering the aforementioned constraints and deficiencies in existing hand–object interaction datasets (Table 1), the proposed RGB-Based Hand Joint Kinematics (RGB-HJK) dataset was developed to capture natural functional grasping behavior using an affordable markerless framework. Unlike large-scale datasets such as FreiHAND [26], which primarily focus on hand pose estimation, or datasets such as DexYCB [28], GRAB [29], and synthetic datasets like ObMan [35], which rely on RGB-D sensors, motion capture systems, multi-view setups, or simulated environments, the proposed dataset emphasizes natural object manipulation during clinically relevant grasp tasks using a single RGB webcam and real-world objects across five standardized grasp categories. One of the key novelties of RGB-HJK is the availability of continuous frame-wise 3D joint angle trajectories rather than only pose labels, bounding boxes, or sparse keypoint annotations commonly reported in existing datasets. With over 28,000 valid active-hold frames, the dataset provides continuous kinematic measurements of 11 hand joints from 10 participants performing multiple grasping tasks, enabling detailed temporal analysis of grasp transitions. Furthermore, unlike existing approaches that require costly motion capture systems, depth sensors, magnetic trackers, or wearable markers that may affect natural object interaction, the proposed framework uses a low-cost inference pipeline based on a commercially available webcam ($50), thereby improving accessibility for rehabilitation, assistive robotics, and biomechanics applications.

Moreover, in contrast to most current datasets that focus on large-scale pose estimation challenges or simulation-based settings, RGB-HJK focuses specifically on functional grasp biomechanics using anatomically realistic joint-angle representations applicable to rehabilitation robotics, prosthetics, human–robot interaction, and movement analysis applications. Despite the acquisition process being very simple, the obtained trajectories exhibit low intra-subject variability and distinct separability of different types of grasps. This shows that markerless RGB cameras can capture complex functional grasp movements that traditionally required specialized laboratory-grade equipment. The main contributions of this paper are as follows:

(i): Markerless RGB-based data of continuous 3D hand-joint kinematics during five functional grasping tasks with real-world objects.
(ii): A low-cost acquisition framework using a standard RGB webcam and the MediaPipe Hands pipeline.
(iii): Over 28,000 validated active-hold frames with continuous trajectories of 11 anatomical hand joints.
(iv): Comprehensive technical validation through repeatability, variability, and PCA-based grasp separability analyses.

2. Methods

2.1. Participants

Ten volunteers (six males and four females, all right-handed) were recruited from the institute (IIT Patna), including students and staff. The study was reviewed and approved by the Institute’s Human Ethics Committee. Participant recruitment and data collection were carried out in full compliance with institutional guidelines and regulations, and all procedures adhered to the principles of the Declaration of Helsinki. Prior to enrollment, each individual confirmed that they had no history of hand or wrist injury, neurological conditions, or musculoskeletal disorders of the upper limb; those who reported such conditions were excluded. At the beginning of each session, hand length and width were measured with a flexible tape according to the landmark conventions described in Feix et al. [23]. Full demographic and anthropometric details are given in Table 2.

2.2. Grasp Taxonomy and Experimental Protocol

The first design decision we faced was which grasps to record. We ultimately chose five object interaction types based on the GRASP taxonomy [23] and corroborated by prior studies on coordinated hand movements [41,42]. Two constraints guided the choice of grasping postures: (1) each pose had to correspond functionally to actions people use in everyday life, and (2) each had to be pertinent to disciplines such as rehabilitation, biomechanics, and assistive technologies. Additionally, each pose had to be performed while holding a real object rather than as a free-air posture. Using a bottle, pen, smartphone, sheet of paper, and book enforces realistic hand poses. Table 3 provides an overview of the five selected grasps along with their functional names.

Figure 1 features one frame from each grasp with the MediaPipe Hands skeletal overlay and the corresponding joint angles printed on the figure. Across the five subfigures, the differences in finger configuration are easy to spot: the full wrap of the Power Grasp, the tight three-point contact of the Tripod, the flatter curl of the Static Power Hold, and the opposed fingertips of the Precision and Lateral Pinches each leave a unique geometric signature on the landmark skeleton. All recordings were made under similar lighting conditions (∼500 lux) in a laboratory setting, with the participant seated comfortably and the right arm positioned in front of the camera. A 720p webcam was placed about 50 cm in front of the participant’s right hand, slightly above it. Before the actual experiment, the participant completed warm-up activities to achieve a suitable posture. The participant then performed Grasps 1–5 in sequence, with five repetitions per grasp. As illustrated in Figure 2, each repetition followed a four-phase process: (1) the approach phase toward the object, (2) the grasping phase to achieve the target posture, (3) the lift of the object to a height of approximately 10 cm off the table, and (4) the active-hold phase lasting at least five seconds.

2.3. Frame Level Dataset Statistics

Altogether, the recording process produced 43,873 frames. Of these, 28,111 (about 64%) were retained as active-hold frames for subsequent processing. The remainder were transition and resting frames between trials, as expected from the study protocol. The active-hold share was broadly consistent across participants, clustering around 64% despite variations in transition speed between trials, which indicates that the segmentation procedure behaved consistently from session to session. Figure 3 shows the active-hold and resting frame counts for each participant.
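As a quick check of the pooled proportions quoted in this subsection, the arithmetic can be reproduced in a few lines. This is only a worked example using the two frame totals reported in the text; the variable names are illustrative.

```python
# Frame totals as reported in the text.
total_frames = 43_873
active_hold_frames = 28_111

# Share of frames kept as active-hold, and the complementary rest/transition share.
active_fraction = active_hold_frames / total_frames
rest_fraction = 1.0 - active_fraction

print(f"active-hold: {active_fraction:.1%}, rest/transition: {rest_fraction:.1%}")
```

The pooled rest/transition share computed this way is close to the per-subject mean reported in the technical validation section, although the two are not the same statistic.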

2.4. Hardware, Software, and Landmark Pipeline

The data-capture system was kept simple by design. A Python (v3.14.1) script was used to capture frames and extract the landmarks using OpenCV (v4.13.0) [43] and MediaPipe Hands (v0.10.35) [44]. For each frame, the script used MediaPipe to detect the hand, retrieved the 3D coordinates of 21 landmarks, calculated joint angles, and appended the results to a table saved in an Excel file. The script captured data at a rate of 26.95 ± 0.29 Hz, slightly below the camera’s nominal rate of 30 Hz; the shortfall reflects the time needed to process each frame. The camera was the built-in 720p FaceTime HD unit of an Apple MacBook Pro 13-inch (M1, 2020; Apple Inc., Cupertino, CA, USA; 1280 × 720 px, 30 Hz nominal). The nearly constant rate, with a deviation of only 0.29 Hz, indicates stable system performance and suggests that the acquisition framework operated reliably throughout the experiments.

The MediaPipe Hands system consists of two stages and warrants some elaboration, as it directly affects data quality. First, a lightweight palm detector scans the entire image and extracts the hand location, which is then passed to a second model that estimates the 21 3D hand landmarks. The key aspect is that once the hand is detected in the initial frame, tracking mode is activated, allowing the algorithm to estimate the locations of the 21 keypoints using temporal information from the previous frame rather than performing independent detection in each image. As a result, jitter in the joint trajectories is significantly reduced and false hand detections while holding an object are minimized.

2.5. Joint Angle Computation

For each tracked joint j, two vectors are created from the MediaPipe landmark coordinates: $v_{1}$ points from the joint center to its proximal (parent) landmark and $v_{2}$ points from the joint center to its distal (child) landmark. The included angle at the joint is the angle between those two vectors, calculated by the standard dot product formula:

$$\theta_{j} = \arccos\left(\frac{v_{1} \cdot v_{2}}{\lVert v_{1} \rVert \, \lVert v_{2} \rVert}\right), \quad v_{1} = l_{p} - l_{j}, \quad v_{2} = l_{c} - l_{j}, \tag{1}$$

where $l_{j}$, $l_{p}$, and $l_{c}$ are the 3D coordinate vectors of the joint center, its proximal landmark, and its distal landmark, respectively. One implementation detail worth mentioning is that the dot product ratio is clamped to $[-1, +1]$ before being passed to arccos, avoiding the domain error that floating-point rounding can otherwise trigger near the domain boundary. The result is an absolute included angle describing the total three-dimensional opening of the joint, not a clinically isolated planar angle such as pure flexion/extension. A fully extended joint therefore reads close to 180°, and smaller values indicate deeper flexion.
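Equation (1) and the clamping step can be sketched in plain Python as follows. This is a minimal illustration and not the authors’ script: the function name is hypothetical, and returning NaN for a degenerate (zero-length) vector is an added assumption that the text does not specify.

```python
import math


def included_angle_deg(l_p, l_j, l_c):
    """Included angle (degrees) at joint l_j between its proximal and distal landmarks."""
    v1 = [p - j for p, j in zip(l_p, l_j)]  # joint center -> proximal landmark
    v2 = [c - j for c, j in zip(l_c, l_j)]  # joint center -> distal landmark
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(a * a for a in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        return float("nan")  # assumption: coincident landmarks give no defined angle
    cos_theta = sum(a * b for a, b in zip(v1, v2)) / (norm1 * norm2)
    # Clamp to [-1, 1] so rounding error cannot push acos outside its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

Three collinear landmarks give 180° (full extension), and a right-angle bend gives 90°, which matches the reading of the angle as the total opening of the joint.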

Eleven anatomical joints were tracked in total (the CMC, MCP, and IP joints of the thumb, and the MCP and PIP joints of the index, middle, ring, and little fingers). The distal interphalangeal (DIP) joints were excluded due to severe self-occlusions during object interaction. In the Power Grasp and Static Power Hold, the fingertips press directly against the object surface and leave the camera’s view partially or totally. We confirmed this in pilot recordings: the most distal landmarks (lm[8], lm[12], lm[16], lm[20] in MediaPipe’s landmark indexing) were noisy enough to produce physically implausible DIP angle estimates under those occlusion conditions. Dropping the four DIP joints brought the reliability to 100% for all five grasp types, and the remaining 11 joints carry the information that actually matters for grasp classification and biomechanical interpretation.
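To make the joint set concrete, the sketch below lists one possible assignment of (proximal, joint, distal) landmark triples for the 11 tracked joints, using the standard MediaPipe Hands landmark indices (0 = wrist, 1 to 4 = thumb, 5 to 8 = index, and so on). The exact triples used by the authors are not given in the text, so this mapping, and the `Little_PIP` name in particular, should be read as an assumption.

```python
# (proximal, joint, distal) landmark indices per joint; assumed mapping.
JOINT_TRIPLES = {
    "Thumb_CMC": (0, 1, 2),
    "Thumb_MCP": (1, 2, 3),
    "Thumb_IP": (2, 3, 4),
    "Index_MCP": (0, 5, 6),
    "Index_PIP": (5, 6, 7),
    "Middle_MCP": (0, 9, 10),
    "Middle_PIP": (9, 10, 11),
    "Ring_MCP": (0, 13, 14),
    "Ring_PIP": (13, 14, 15),
    "Little_MCP": (0, 17, 18),
    "Little_PIP": (17, 18, 19),
}

# Fingertip landmarks of the four fingers, which the text reports as unreliable.
FINGER_TIPS = {8, 12, 16, 20}

# Every landmark index that the 11 joint angles depend on.
used_landmarks = {idx for triple in JOINT_TRIPLES.values() for idx in triple}
```

Under this mapping, none of the 11 angles depends on a fingertip landmark of the index, middle, ring, or little finger, which is consistent with the rationale for excluding the DIP joints: a DIP angle would need the fingertip as its distal landmark.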

2.6. Data Cleaning

Raw session files arrive on disk as .xlsx spreadsheets containing the complete joint-angle trajectory of each session: approach, grasp, lift, hold, release, and everything in between. A Python script loops through each file and removes any frames that are not stable active-hold frames: transition segments, short tracking losses, and frames where one or more joint measurements are missing or outside the physiologically plausible range of 0–180° [45]. The output is a cleaned .xlsx file for each participant and a brief summary JSON file that logs key information, including the total number of frames, the number of active-hold frames, the resting percentage, the session duration, and the capture rate.
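The frame-level filter described above can be sketched as follows. The segmentation logic itself is not described in the text, so this example assumes each frame already carries a hypothetical `phase` label; the dictionary layout and the summary keys are likewise illustrative rather than the dataset’s actual schema.

```python
import math


def is_valid_frame(angles_deg, lo=0.0, hi=180.0):
    """Return True if every joint angle is present and lies within [lo, hi] degrees."""
    for value in angles_deg.values():
        if value is None or math.isnan(value):
            return False
        if not lo <= value <= hi:
            return False
    return True


def clean_frames(frames):
    """Keep valid active-hold frames and report simple reduction statistics."""
    kept = [f for f in frames if f["phase"] == "hold" and is_valid_frame(f["angles"])]
    total = len(frames)
    summary = {
        "total_frames": total,
        "active_hold_frames": len(kept),
        "rest_percent": 100.0 * (total - len(kept)) / total if total else 0.0,
    }
    return kept, summary
```

A frame is dropped if it lies outside a hold segment, has a missing value, or has any angle outside the 0–180° range, mirroring the three removal criteria listed in the paragraph.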

The difference between the raw and cleaned data is readily apparent. The raw joint angle trajectories, plotted with a separate Python script (Figure 4), show the expected artifacts: sudden spikes when the tracker momentarily loses a landmark, slow drifts during object retrieval, and noisy segments during the transitions between holds. The cleaned trajectories are much more regular, with low-variance plateaus in each hold segment and clear separation between grasps. Cleaning reduces the dataset from 43,873 to 28,111 frames, a reduction of just over a third, while the effective frame rate remains steady at 26.95 ± 0.29 Hz. The rate does not drop during cleaning, confirming that frames are removed in whole segments rather than randomly, which is exactly the behavior the segmentation logic was designed to produce. A Savitzky–Golay smoothing filter was applied during the preprocessing stage to reduce signal noise and improve the readability of the extracted signals.
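The window length and polynomial order of the Savitzky–Golay filter are not reported. Purely as an illustration of the technique, the sketch below implements the classic 5-point quadratic smoother, whose convolution weights are (−3, 12, 17, 12, −3)/35. The window size, the choice to leave the two samples at each end unchanged, and the function name are all assumptions.

```python
# Convolution weights of the 5-point quadratic Savitzky-Golay smoother (sum = 35).
SG5_QUADRATIC = (-3.0, 12.0, 17.0, 12.0, -3.0)


def savgol5(signal):
    """Smooth a 1D sequence with a 5-point quadratic Savitzky-Golay filter.

    The first and last two samples are returned unchanged (assumed edge handling).
    """
    smoothed = list(signal)
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        smoothed[i] = sum(c * x for c, x in zip(SG5_QUADRATIC, window)) / 35.0
    return smoothed
```

A useful property of this filter is that it reproduces any locally quadratic trend exactly while attenuating isolated spikes, which is why it preserves the shape of hold plateaus better than a plain moving average. In practice a library routine such as `scipy.signal.savgol_filter` would normally be used instead of a hand-written loop.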

3. Data Description

3.1. Dataset Overview and Access

The RGB-HJK dataset is openly hosted on Kaggle under a Creative Commons license CC BY 4.0; the permanent link is https://doi.org/10.34740/kaggle/dsv/16123762. We chose .xlsx as the primary file format because it opens without additional software on virtually every operating system and analysis environment. A companion .json metadata file travels with the data and records session-level provenance. The full set of dataset specifications is listed in Table 4.

3.2. Column Definitions

Each .xlsx file (raw and processed) contains two Excel sheets: Data and Metadata. The column definitions for the Data sheet are listed in Table 5.

3.3. Data Structure and File Organization

The RGB-HJK database is organized as a hierarchical folder system so that files can be dropped into any analysis or machine learning workflow. It has two top-level folders, “Processed” and “Raw,” each containing ten identically named Excel files, Subject01.xlsx to Subject10.xlsx. The Raw folder contains every recorded frame (e.g., all 3723 frames of Subject 01, covering approach, lift, and rest), which suits applications involving motion transitions or the development of segmentation algorithms. The Processed folder provides the curated data for applications such as biophysical simulation or classification, keeping only the stable active-hold frames (e.g., 2061 frames for Subject 01). Finally, to follow FAIR data principles, every .xlsx file in both folders pairs a Data sheet with a Metadata sheet.

The repository structure is organized as follows:

RGB-HJK dataset/ (root directory)
- Processed/
  * Subject01.xlsx – Subject10.xlsx
    · Sheet: Data (cleaned, validated active-hold frames only)
    · Sheet: Metadata (participant demographics and frame details)
- Raw/
  * Subject01.xlsx – Subject10.xlsx
    · Sheet: Data (continuous streams including transitions and rest)
    · Sheet: Metadata
  * cleaning_summary.json (overall segmentation logs and frame reduction statistics)
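Given this layout, the path to any subject file can be built programmatically. The helper below is a small sketch; the function name is hypothetical, and the root folder name is taken from the tree above and may differ in a local download.

```python
from pathlib import Path


def subject_file(root, split, subject):
    """Return the path of one subject workbook, e.g. <root>/Raw/Subject01.xlsx."""
    if split not in ("Processed", "Raw"):
        raise ValueError("split must be 'Processed' or 'Raw'")
    if not 1 <= subject <= 10:
        raise ValueError("subject must be between 1 and 10")
    return Path(root) / split / f"Subject{subject:02d}.xlsx"


# All ten processed workbooks, in subject order.
processed_files = [subject_file("RGB-HJK dataset", "Processed", s) for s in range(1, 11)]
```

Each returned path could then be opened with a spreadsheet reader, for instance `pandas.read_excel(path, sheet_name="Data")`, assuming pandas and an .xlsx engine are installed.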

Annotation quality was ensured through a standardized acquisition protocol, consistent grasp labeling across all participants, and automated landmark extraction using the MediaPipe Hands framework. All recordings were visually inspected to verify correct grasp execution and successful hand tracking. Furthermore, frames containing missing measurements, tracking failures, or physiologically implausible joint angles were removed during data cleaning. The high detection rate across all trials and the strong repeatability observed during technical validation further support the reliability and consistency of the dataset annotations.

4. Technical Validation and Data Visualization

4.1. Active-Hold Detection Reliability

Across the entire dataset of 250 trials (10 subjects × 5 grasps × 5 trials), the MediaPipe Hands detector did not miss a single hold frame. The hold detection rate was therefore 100%: every manually marked hold frame was successfully detected by the algorithm, with no tracking errors. This outcome was not something to take for granted, since occlusion during tight grasps was our main concern. The rest/transition fraction, by contrast, varied between subjects, ranging from 29.5% (S09) to 44.6% (S01) with a mean of 35.9 ± 4.3% across subjects (Figure 3). This spread reflects how quickly individuals retrieved and released the objects and does not imply any change in the acquisition pipeline between subjects.

4.2. Joint Angle Patterns Across the Five Grasps

Violin plots of all eleven joint angles across the five grasp types are provided in Figure 5a–d. The grasp types occupy distinct regions of joint space, which means the distributions are not random but directly reflect the mechanical demands of each grasp.

The Power Grasp is characterized by near-extended thumb joints (Thumb_CMC: 163.9 ± 6.7°, Thumb_MCP: 172.7 ± 5.1°) and moderately flexed finger MCP joints (e.g., Index_MCP: 165.1 ± 5.7°, Middle_MCP: 132.0 ± 4.7°), consistent with a cylindrical wrapping configuration. The Tripod Grasp exhibits reduced MCP angles and coordinated flexion patterns (Index_MCP: 110.0 ± 6.8°, Middle_MCP: 104.6 ± 9.4°) alongside stable PIP values (Index_PIP: 141.4 ± 6.1°, Middle_PIP: 141.2 ± 7.8°), reflecting pen stabilization. The Static Power Hold shows the greatest variability and deeper flexion in finger joints (Index_PIP: 108.4 ± 12.1°, Middle_PIP: 92.9 ± 15.8°, Ring_PIP: 92.0 ± 15.9°), indicating a firm, full-hand grasp. The Precision Pinch resulted in near extension at the thumb and finger MCP joints (Thumb_MCP: 174.0 ± 2.7°, Index_MCP: 161.9 ± 5.0°) and moderate PIP flexion (Index_PIP: 133.4 ± 9.0°), consistent with precise motor control. The Lateral Pinch presents moderate MCP flexion in the fingers (Index_MCP: 133.0 ± 12.4°, Ring_MCP: 120.8 ± 8.4°) alongside near-extended thumb joints (Thumb_CMC: 160.6 ± 4.3°, Thumb_MCP: 167.2 ± 3.4°), reflecting a side contact grasp configuration.

Figure 6a–d offers a complementary view of the same data. Each joint angle is normalized to its global range and the five grasp types are overlaid on radar plots, so the characteristic kinematic profile of each grasp type can be readily distinguished. The shaded regions represent ±1 SD across all pooled active-hold frames, illustrating the degree of variability and consistency of each grasp around its mean joint configuration.

4.3. Intra-Subject Trial to Trial Repeatability

For each subject–grasp combination, the per-trial mean joint angles were computed (one 11-dimensional vector per trial), and the standard deviation across the five trial means was taken as the trial-to-trial variability metric. These per-subject standard deviations were then averaged across the ten subjects to produce the mean SD heatmap shown in Figure 7a.
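For a single joint and a single grasp, the repeatability metric described above reduces to two nested aggregations. The sketch below assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and uses hypothetical function names and a nested-list input.

```python
from statistics import mean, stdev


def trial_to_trial_sd(trials):
    """SD across per-trial mean angles for one subject, joint, and grasp.

    `trials` is a list of trials, each a list of per-frame angles in degrees.
    """
    trial_means = [mean(frames) for frames in trials]
    return stdev(trial_means)  # assumption: sample SD across the trial means


def mean_intra_subject_sd(trials_by_subject):
    """Average the per-subject trial-to-trial SDs (one heatmap cell)."""
    return mean([trial_to_trial_sd(trials) for trials in trials_by_subject])
```

Repeating the second function for each of the 55 joint–grasp combinations would populate a heatmap of the kind the paragraph refers to.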

Across all 55 joint–grasp combinations (11 joints × 5 grasps), the mean intra-subject SD ranged from 1.2° (Thumb_CMC, Precision Pinch) to 7.9° (Thumb_IP, Static Power Hold; Little_MCP, Tripod Grasp), confirming strong intra-subject repeatability for the entire dataset. Precision Pinch was the most reproducible grasp (all joints ≤ 2.8° SD), consistent with the high proprioceptive control that participants exercise during fine pinch tasks. The Static Power Hold showed the largest intra-subject variability for Thumb_IP (7.9°), plausibly because the thumb in this flat-object grasp can adopt a range of adduction angles across trials without strong biomechanical constraints.

4.4. Inter-Subject Variability

Inter-subject variability was quantified by computing the standard deviation across subjects of the per-grasp, per-subject mean joint angles (Figure 7b). The Power Grasp and Tripod Grasp yielded the lowest inter-subject variability (all joints ≤ 6.1° and ≤ 5.8°, respectively), reflecting the strong mechanical constraints imposed by the cylindrical and pen grasping postures. The Static Power Hold and Precision Pinch showed substantially higher inter-subject variability (up to 15.9° for Ring_MCP in the Static Power Hold, and 15.8° for Little_MCP in the Precision Pinch), indicating that individuals adopt meaningfully different kinematic strategies when holding flat objects and during fine pinch. These values are consistent with previous studies: Jarque-Bou et al. [46] and Santello et al. [42] reported inter-subject standard deviations of 10–15° during daily hand activities. This suggests that the observed variability primarily reflects natural anatomical and behavioral differences rather than markerless tracking noise, in line with the hand-synergy observations of Santello et al. [42].
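The inter-subject metric differs from the repeatability metric only in the level at which means are taken: each subject contributes one mean angle per joint and grasp, and the spread is computed across subjects. A minimal sketch follows, again assuming the sample standard deviation and a hypothetical dictionary layout keyed by subject ID.

```python
from statistics import mean, stdev


def inter_subject_sd(frames_by_subject):
    """SD across subjects of each subject's mean angle for one joint and grasp.

    `frames_by_subject` maps a subject ID to that subject's active-hold angles (degrees).
    """
    subject_means = [mean(angles) for angles in frames_by_subject.values()]
    return stdev(subject_means)  # assumption: sample SD across subjects
```

Because each subject is collapsed to a single mean first, a subject with many frames does not weigh more heavily than one with few.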

4.5. PCA-Based Grasp Separability

The question was simply whether the 11 joint angles carry enough information to discriminate between the five grasps without subject-specific normalization. We tested this by performing PCA on the normalized active-hold frames. Plotting all 28,111 frames together would produce an unreadable cloud, so we used Python’s scikit-learn library [47] to take a uniform random sample of 2000 frames from each grasp type (total n = 10,000). To verify that the sample was not biased, we repeated the process with different random seeds during exploratory analysis. The variance ratios and cluster shapes were identical each time, so the PCA plot in Figure 8 is not an artifact of one particular sample. PC1 alone accounts for 64.4% of the variance and PC2 for 11.5%, so the first two components together explain 75.9%. The grasp types form mostly distinct clusters with little overlap. The leading components mainly capture the global flexion–extension differences between grasps, while subtler finger coordination patterns are represented in the remaining components.

Examining the individual clusters, Power Grasp and Lateral Pinch form the most tightly grouped and clearly separated clusters, far away from all other points, reflecting the low intra-subject variability that they exhibited in the repeatability heatmaps (Figure 7a). The Static Power Hold and Precision Pinch clusters, however, are more scattered, as expected, since each subject used different finger positions to hold a flat object or perform a precision pinch. Despite some overlap, the class centroids remain sufficiently separated such that misclassification would typically occur only in extreme outlier cases. Collectively, these results indicate that the 11 joint angles provide adequate discriminative information, making them suitable for classification, clustering, and direct machine learning applications.
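The separability analysis in this subsection can be approximated with a short NumPy sketch. The authors report using scikit-learn; the version here swaps in a plain SVD so that it has no dependency beyond NumPy. The normalization is assumed to be a per-joint z-score, which the text does not specify, and the function names are hypothetical.

```python
import numpy as np


def balanced_sample(X, labels, n_per_class, seed=0):
    """Draw the same number of frames from each grasp class without replacement."""
    rng = np.random.default_rng(seed)
    picks = [
        rng.choice(np.flatnonzero(labels == lab), size=n_per_class, replace=False)
        for lab in np.unique(labels)
    ]
    idx = np.concatenate(picks)
    return X[idx], labels[idx]


def pca_explained_variance(X, n_components=2):
    """Z-score the columns, then return PCA scores and explained-variance ratios."""
    # Assumption: per-joint z-scoring; columns with zero variance would need special handling.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    ratios = S**2 / np.sum(S**2)
    scores = Z @ Vt[:n_components].T
    return scores, ratios[:n_components]
```

With `X` holding the 11 joint-angle columns and `labels` the grasp type of each frame, calling `balanced_sample(X, labels, 2000, seed=s)` for several seeds `s` and comparing the resulting ratios mirrors the seed-robustness check described in the text.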

5. Discussion

Capturing hand kinematics has always involved a compromise between accuracy and ecological validity. Marker-based optical tracking systems offer highly accurate measurements, but they are cumbersome to set up, are largely confined to laboratory settings, and require retro-reflective markers attached to the skin [48]. Electromagnetic trackers and instrumented gloves from previous studies also have limitations [49,50]. As shown in Table 6, while sensor gloves are able to record task angles in real time, they share a common weakness in that they prevent natural tactile sensations [51,52,53]. The physical constraints, along with the stiffness of the glove material and sensor drift, alter how a participant physically handles the object. Traditional baseline techniques, such as goniometry [54], are vulnerable to human error and limited to static measurements. The latest approach to resolving the issue has been the use of depth-sensing RGB-D and infrared-based methods [55]. One advantage of this markerless sensing approach is that it eliminates the need for wearable hardware or physical attachments on the hand, enabling more natural interactions. However, direct object interaction inevitably introduces self-occlusions, which can affect tracking accuracy during grasping tasks: when a hand grips a bottle tightly, the closed fingers obstruct the sensor’s view [55,56].

The proposed RGB-HJK system addresses these limitations by using the MediaPipe library to process standard 2D RGB images captured with a low-cost webcam (approximately $50). To check this markerless pipeline against existing methods, Table 6 compares the kinematics of a sample joint, the Index PIP during our Static Power Hold (holding a phone), with related grasping activities described in the literature. The task angle we measured, 61.8° ± 18.4°, is comparable to the angles obtained in previous studies with instrumented gloves (54.7° to 68.3°) [51,52,53] and RGB-D cameras (53.4°) [55].

The variance observed in our results (SD = 18.4°) arises from the inherent biological variation in how hands and objects interact, a phenomenon that is typically minimized in lab-based studies or glove-mounted systems. The RGB-HJK pipeline successfully estimated the location of occluded joints during the active grasp phase, unlike the depth sensor/IR camera modalities, which generate significant noise when objects are at close range [55]. Our results indicate that precise, verifiable hand kinematics do not necessitate complex, expensive hardware or costly laboratories; an inexpensive $50 webcam with effective computer vision algorithms can suffice.

6. Conclusions

This work presented the RGB-HJK dataset, an open-access, marker-free database comprising continuous 3D hand-joint angle sequences spanning five standard grasping tasks with objects across ten healthy volunteers. The study measured intra-subject repeatability and inter-subject variability using standard deviation (SD) in degrees and used PCA variance ratios to measure the linear separability of the grasps. The database contains 28,111 active-hold frames with a mean frame rate of 26.95 Hz, and the 11 anatomical joints were detected with low variability between repeated trials by the same individual (≤ 7.9°). The framework demonstrates strong potential for rehabilitation assessment; however, limitations remain due to single-camera self-occlusions, exclusion of distal finger joints, limited participant diversity, and lack of validation against gold-standard motion capture methods. Future work should expand to clinical populations, include bilateral and more diverse grasp evaluations, and establish longitudinal validation for rehabilitation monitoring and anomaly detection in assistive technologies.

Author Contributions

Conceptualization, S.Y. and J.N.; methodology, S.Y. and J.N.; software, S.Y.; validation, S.Y. and J.N.; formal analysis, S.Y.; investigation, S.Y.; resources, J.N.; data curation, S.Y.; writing—original draft preparation, S.Y.; writing—review and editing, J.N.; visualization, S.Y.; supervision, J.N.; project administration, J.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was reviewed and approved by the Institute Human Ethics Committee of the Indian Institute of Technology Patna. All experimental procedures involving human participants were conducted in accordance with the approval granted under the project “AI-Assisted Elbow Exoskeleton for Upper Limb Therapy Exercises using Electromyography Signals” (approval date: 18 November 2025). Participant recruitment and data collection were carried out in full compliance with institutional guidelines and regulations, and all procedures adhered to the principles of the Declaration of Helsinki.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study. The dataset is fully anonymised and contains no personally identifiable imagery or video files.

Data Availability Statement

The dataset is openly available on Kaggle at https://doi.org/10.34740/kaggle/dsv/16123762 under the CC BY 4.0 licence.

Acknowledgments

The authors thank all volunteer participants at IIT Patna for their time and cooperation. The authors acknowledge the open-source contributions of the MediaPipe project, whose tools made this work possible.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

ADL	Activity of daily living
BMI	Body mass index
CMC	Carpometacarpal
DIP	Distal interphalangeal
FPS	Frames per second
GRASP	Grasp Representation with Annotations, Statistics, and Pose
IIT	Indian Institute of Technology
IP	Interphalangeal
MCP	Metacarpophalangeal
PC	Principal component
PCA	Principal component analysis
PIP	Proximal interphalangeal
RGB	Red–green–blue
RGB-HJK	RGB-Based Hand Joint Kinematics
SD	Standard deviation

References

1. Mohd Nayan, N.A.; Chien, C.W.; Lokman, N.; Alrashdi, M.; Che Daud, A.Z. Hand-Related Activities of Daily Living Challenges Among Individuals with Diabetic Peripheral Neuropathy: A Scoping Review. Ann. Rehabil. Med. 2025, 49, 139–151.
2. Choi, M.J.; Bustos, V.P.; Xu, K.Y.; Nayak, V.V.; Coelho, P.G.; Tadisina, K.K. The Impact of Handheld Device Use on Hand Biomechanics. Bioengineering 2025, 12, 1145.
3. Schneider, T.R.; Felbecker, A.; von Mitzlaff, B.; Weissofner, G.; Meier, S.; Eggenberger, P.; Annaheim, S. Hand dexterity and mobility independently predict cognition in older adults: A multi-domain regression analysis. Front. Aging Neurosci. 2025, 17, 1624307.
4. Garcia-Villalba, L.A.; Rodríguez-Ramírez, A.G.; Rodríguez-Picón, L.A.; Méndez-González, L.C.; Ghasemlou, S.M. Interactive Platform for Hand Motor Rehabilitation Using Electromyography and Optical Tracking. Appl. Sci. 2025, 15, 12434.
5. Nam, H.; Kim, C.; Kim, K.; Park, J.I. Physically Plausible Realistic Grip-Lift Interaction Based on Hand Kinematics in VR. Electronics 2023, 12, 2794.
6. Expo 2025 Osaka. Hand in Bionic Hand: The New Age of Accessible, High-Tech Prosthetics. Available online: https://www.japan.go.jp/kizuna/2025/04/high-tech_bionic_prosthetics.html (accessed on 15 February 2026).
7. Wang, X.; Cho, C.; Zhang, P.; Ge, S.; Chen, J. Medical Imaging-Based Kinematic Modeling for Biomimetic Finger Joints and Hand Exoskeleton Validation. Biomimetics 2025, 10, 652.
8. Diaz, M.T.; Benoit, A.R.; Kearney, K.M.; Kelly, T.F.; Lindbeck, E.M.; Tappan, I.; Bowers, W.S.; Durai, L.; Nunag, J.B.; Officer, M.B.; et al. A hand biomechanics dataset of kinematics, kinetics, electromyography, and imaging in healthy adults. bioRxiv 2025. bioRxiv:2025.08.21.671503.
9. Bazina, T.; Mauša, G.; Zelenika, S.; Kamenar, E. Reducing Hand Kinematics by Introducing Grasp-Oriented Intra-Finger Dependencies. Robotics 2024, 13, 82.
10. Gaillard, L.; Stubbe, L.; Riquet, D.; Houel, N. Influence of motion capture camera’s self-heating on optoelectronic plethysmography accuracy. Measurement 2025, 253, 117523.
11. Hulleck, A.A.; Menoth Mohan, D.; Abdallah, N.; El Rich, M.; Khalaf, K. Present and future of gait assessment in clinical practice: Towards the application of novel trends and technologies. Front. Med. Technol. 2022, 4, 901331.
12. Gutierrez, M.; Gomez, B.; Retamal, G.; Peña, G.; Germany, E.; Ortega-Bastidas, P.; Aqueveque, P. Comparing Optical and Custom IoT Inertial Motion Capture Systems for Manual Material Handling Risk Assessment Using the NIOSH Lifting Index. Technologies 2024, 12, 180.
13. Zhai, Y.; Wu, S.; Hu, Q.; Zhou, W.; Shen, Y.; Yan, X.; Ma, Y. Influence of grasping postures on skin deformation of hand. Sci. Rep. 2023, 13, 21416.
14. Wong, W.K.; Liang, H.; Sun, H.; Sun, W.; Yuan, H.; Zhao, S.; Fei, L. Learning to estimate 3D interactive two-hand poses with attention perception. Image Vis. Comput. 2025, 154, 105398.
15. Carfi, A.; Patten, T.; Kuang, Y.; Hammoud, A.; Alameh, M.; Maiettini, E.; Weinberg, A.I.; Faria, D.; Mastrogiovanni, F.; Alenyà, G.; et al. Hand-object interaction: From human demonstrations to robot manipulation. Front. Robot. AI 2021, 8, 714023.
16. Lugaresi, C.; Tang, J.; Nash, H.; McClanahan, C.; Uboweja, E.; Hays, M.; Zhang, F.; Chang, C.-L.; Yong, M.G.; Lee, J.; et al. MediaPipe: A Framework for Building Perception Pipelines. arXiv 2019, arXiv:1906.08172.
17. Latreche, A.; Kelaiaia, R.; Chemori, A.; Kerboua, A. Reliability and validity analysis of MediaPipe-based measurement system for some human rehabilitation motions. Measurement 2023, 214, 112826.
18. Heo, S.; Choi, T.; Choi, W. Clinical Validation of an On-Device AI-Driven Real-Time Human Pose Estimation and Exercise Prescription Program; Prospective Single-Arm Quasi-Experimental Study. Healthcare 2026, 14, 482.
19. Chen, W.; Yu, C.; Tu, C.; Lyu, Z.; Tang, J.; Ou, S.; Fu, Y.; Xue, Z. A survey on hand pose estimation with wearable sensors and computer-vision-based methods. Sensors 2020, 20, 1074.
20. Mastinu, E.; Coletti, A.; Mohammad, S.H.A.; van den Berg, J.; Cipriani, C. HANDdata - first-person dataset including proximity and kinematics measurements from reach-to-grasp actions. Sci. Data 2023, 10, 405.
21. Wade, L.; Needham, L.; McGuigan, P.; Bilzon, J. Applications and limitations of current markerless motion capture methods for clinical gait biomechanics. PeerJ 2022, 10, e12995.
22. Friemert, D.; Schnura, D.; Runkel, S.; Borsch, J.; Karamanidis, K.; Dellen, B.; Thieme, L.; Fiedler, A.; Jaekel, U.; Hartmann, U. Limitations of Public Biomechanical Movement Datasets for Deep Learning: Issues of Metadata, Standardization, and Variety in Motion Types. medRxiv 2025. medRxiv:2025.05.29.25328474.
23. Feix, T.; Romero, J.; Schmiedmayer, H.B.; Dollar, A.M.; Kragic, D. The GRASP Taxonomy of Human Grasp Types. IEEE Trans. Hum.-Mach. Syst. 2016, 46, 66–77.
24. Trejo Ramírez, M.P.; Thornton, C.J.; Evans, N.D.; Chappell, M.J. Quantification of finger grasps during activities of daily life using convolutional neural networks: A pilot study. Healthc. Technol. Lett. 2024, 11, 259–270.
25. Sun, J.; Mao, P.; Kong, L.; Wang, J. A Review of Embodied Grasping. Sensors 2025, 25, 852.
26. Zimmermann, C.; Ceylan, D.; Yang, J.; Russell, B.; Argus, M.J.; Brox, T. FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: New York, NY, USA, 2019; pp. 813–822.
27. Hampali, S.; Rad, M.; Oberweger, M.; Lepetit, V. HOnnotate: A method for 3D annotation of hand and object poses. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2020; pp. 3193–3203.
28. Chao, Y.W.; Yang, W.; Xiang, Y.; Molchanov, P.; Handa, A.; Tremblay, J.; Narang, Y.S.; Van Wyk, K.; Iqbal, U.; Birchfield, S.; et al. DexYCB: A Benchmark for Capturing Hand Grasping of Objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2021.
29. Taheri, O.; Ghorbani, N.; Black, M.J.; Tzionas, D. GRAB: A Dataset of Whole-Body Human Grasping of Objects. In Proceedings of the Computer Vision—ECCV 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 581–600.
30. Fan, Z.; Taheri, O.; Tzionas, D.; Kocabas, M.; Kaufmann, M.; Black, M.J.; Hilliges, O. ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2023; pp. 12943–12954.
31. Garcia-Hernando, G.; Yuan, S.; Baek, S.; Kim, T.K. First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2018.
32. Kwon, T.; Tekin, B.; Stuhmer, J.; Bogo, F.; Pollefeys, M. H2O: Two Hands Manipulating Objects for First Person Interaction Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: New York, NY, USA, 2021; pp. 10118–10128.
33. Brahmbhatt, S.; Tang, C.; Twigg, C.D.; Kemp, C.C.; Hays, J. ContactPose: A Dataset of Grasps with Object Contact and Hand Pose. In Proceedings of the Computer Vision—ECCV 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 361–378.
34. Moon, G.; Yu, S.I.; Wen, H.; Shiratori, T.; Lee, K.M. InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image. In Proceedings of the Computer Vision—ECCV 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 548–564.
35. Hasson, Y.; Varol, G.; Tzionas, D.; Kalevatykh, I.; Black, M.J.; Laptev, I.; Schmid, C. Learning Joint Reconstruction of Hands and Manipulated Objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2019; pp. 11799–11808.
36. Yuan, S.; Ye, Q.; Stenger, B.; Jain, S.; Kim, T.K. BigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2017; pp. 2605–2613.
37. Banerjee, P.; Shkodrani, S.; Moulon, P.; Hampali, S.; Han, S.; Zhang, F.; Zhang, L.; Fountain, J.; Miller, E.; Basol, S.; et al. HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2025.
38. Yang, L.; Li, K.; Zhan, X.; Wu, F.; Xu, A.; Liu, L.; Lu, C. OakInk: A Large-scale Knowledge Repository for Understanding Hand-Object Interaction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2022; pp. 20953–20962.
39. Shan, D.; Geng, J.; Shu, M.; Fouhey, D.F. Understanding Human Hands in Contact at Internet Scale. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2020; pp. 9866–9875.
40. Liu, Y.; Liu, Y.; Jiang, C.; Lyu, K.; Wan, W.; Shen, H.; Liang, B.; Fu, Z.; Wang, H.; Yi, L. HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2022; pp. 21013–21022.
41. Braido, P.; Zhang, X. Quantitative Assessment of Finger Motion Coordination in Hand Activities. Clin. Biomech. 2004, 19, 626–636.
42. Santello, M.; Flanders, M.; Soechting, J.F. Postural Hand Synergies for Tool Use. J. Neurosci. 1998, 18, 10105–10115.
43. Bradski, G. The OpenCV Library. Dr. Dobb’s J. Softw. Tools 2000, 25, 120–126.
44. Zhang, F.; Bazarevsky, V.; Vakunov, A.; Tkachenka, A.; Sung, G.; Chang, C.L.; Grundmann, M. MediaPipe Hands: On-device Real-time Hand Tracking. arXiv 2020, arXiv:2006.10214.
45. Casas, R.; Sandison, M.; Chen, T.; Lum, P.S. Clinical Test of a Wearable, High DOF, Spring Powered Hand Exoskeleton (HandSOME II). IEEE Trans. Neural Syst. Rehabil. Eng. 2021, 29, 1877–1885.
46. Jarque-Bou, N.J.; Manero, A.; Vergara, M.; Sancho-Bru, J.L.; Roda-Sales, A.; Gracia-Ibáñez, V. A Calibrated Database of Kinematics and EMG of the Forearm and Hand During 26 Daily Life Activities. Sci. Data 2019, 6, 180132.
47. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
48. Choi, H.; Park, D.; Rha, D.-W.; Nam, H.S.; Jo, Y.J.; Kim, D.Y. Kinematic analysis of movement patterns during a reach-and-grasp task in stroke patients. Front. Neurol. 2023, 14, 1225425.
49. Amaral, P.; Silva, F.; Santos, V. Recognition of Grasping Patterns Using Deep Learning for Human-Robot Collaboration. Sensors 2023, 23, 8989.
50. Pratap, S.; Ito, K.; Hazarika, S.M. Synergistic grasp analysis: A cross-sectional exploration using a multi-sensory data glove. Wearable Technol. 2025, 6, e2.
51. Roda-Sales, A.; Jarque-Bou, N.J.; Bayarri-Porcar, V.; Gracia-Ibáñez, V.; Sancho-Bru, J.L.; Vergara, M. Electromyography and kinematics data of the hand in activities of daily living with special interest for ergonomics. Sci. Data 2023, 10, 814.
52. Lai, J.; Song, A.; Shi, K.; Ji, Q.; Lu, Y.; Li, H. Design and evaluation of a bidirectional soft glove for hand rehabilitation-assistance tasks. IEEE Trans. Med. Robot. Bionics 2023, 5, 743–753.
53. Padilla-Magaña, J.F.; Peña-Pitarch, E.; Sánchez-Suarez, I.; Ticó-Falguera, N. Hand Motion Analysis during the Execution of the Action Research Arm Test Using Multiple Sensors. Sensors 2022, 22, 3276.
54. Ibrahim, M.; Baba, U.F.P.; Singh, V.; Karanjkar, A.; Madhavan, L.; Shah, R.A.; Haq, A.; Pawar, M.; Kumari, A.; Panse, N.; et al. The Normal Active Range of Motion of the Index, Middle, Ring, and Little Fingers in a Sample of Indian Population. Indian J. Plast. Surg. 2024, 57, 248–255.
55. Xie, Q.; Sheng, B.; Huang, J.; Zhang, Q.; Zhang, Y. A Pilot Study of Compensatory Strategies for Reach-to-Grasp-Pen in Patients with Stroke. Appl. Bionics Biomech. 2022, 2022, 6933043.
56. Borzelli, D.; Boarini, V.; Casile, A. A quantitative assessment of the hand kinematic features estimated by the Oculus Quest 2. Sci. Rep. 2025, 15, 8842.

Figure 1. Grasped objects and joint positions, along with the corresponding angles computed from them: (a) Power Grasp (grasping a bottle); (b) Tripod Grasp (grasping a pen); (c) Static Power Grasp (grasping a smartphone); (d) Precision Pinch (grasping a thin piece of paper); (e) Lateral Pinch (grasping a book). CMC, IP, MCP, and PIP denote the carpometacarpal, interphalangeal, metacarpophalangeal, and proximal interphalangeal joints, respectively.

Figure 2. Interaction protocol used during data collection: (1) approach, (2) grasping, (3) lifting, and (4) holding. This protocol ensures a consistent time structure of trials and allows for the consistent extraction of stable grasp frames for analysis.

Figure 3. Per-subject distribution of recorded frames into active-hold (solid blue) and rest/transition (hatched grey) components with active-hold percentages.

Figure 4. Three sample joint angle trajectories of Subject S08 showing raw and cleaned signals. Top panel (a) shows the raw trajectories with noise and artifacts. Bottom panel (b) shows the cleaned signals after pre-processing. Red dashed lines indicate transitions between trials, and black dashed lines indicate changes between grip types.

Figure 5. Distribution of joint angles over the five grasp types. The width of each violin at a given angle denotes how many samples had this angle; the horizontal line across each violin denotes the median and interquartile range.

Figure 6. Radar charts of the mean normalized joint configuration for each of the five grasp types.

Figure 7. Heatmaps of joint angle standard deviation (SD, degrees) for (a) intra-subject repeatability (mean SD of trial-mean angles within each subject–grasp combination) and (b) inter-subject variability (SD of per-grasp mean angles across the ten participants). Panel (a) confirms strong within-subject consistency (all cells $\leq 7.9°$); panel (b) shows that Static Power Hold and Precision Pinch exhibit greater inter-subject adaptation (up to $16°$), while Power Grasp and Tripod Grasp remain consistent across the population. Color scales differ between panels.
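The two SD measures defined in the Figure 7 caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the nested dictionary `trial_means` (subject, then grasp, then a list of per-trial mean angles for one joint) and all of its values are invented for the example, and the sample SD (`statistics.stdev`) is used because the text does not say whether the sample or population SD was applied.

```python
import statistics

# Hypothetical per-trial mean angles (degrees) for a single joint.
trial_means = {
    "S01": {"Power Grasp": [70.0, 72.0, 74.0], "Tripod Grasp": [40.0, 44.0, 48.0]},
    "S02": {"Power Grasp": [80.0, 82.0, 84.0], "Tripod Grasp": [50.0, 52.0, 54.0]},
}


def intra_subject_sd(trial_means, grasp):
    """Mean over subjects of the SD of trial means within each subject."""
    sds = [statistics.stdev(grasps[grasp]) for grasps in trial_means.values()]
    return statistics.mean(sds)


def inter_subject_sd(trial_means, grasp):
    """SD across subjects of each subject's mean angle for the grasp."""
    subject_means = [statistics.mean(grasps[grasp]) for grasps in trial_means.values()]
    return statistics.stdev(subject_means)


for grasp in ("Power Grasp", "Tripod Grasp"):
    print(grasp,
          f"intra = {intra_subject_sd(trial_means, grasp):.2f} deg,",
          f"inter = {inter_subject_sd(trial_means, grasp):.2f} deg")
```

In the toy data each subject repeats a grasp consistently (small intra-subject SD) while the subjects differ from one another (larger inter-subject SD), which is the same qualitative pattern the two panels are meant to contrast.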

Figure 8. Projected data of the first two principal components from PCA across the 5 grasp types and joint angles. Each dot is an active-hold frame (up to 2000 frames per grasp). The centroids of the clusters are represented as circles. PC1 (64.4%) and PC2 (11.5%) account for a total of 75.9% of the variance, while the clusters have little overlap and are clearly linearly separable into grasp types.

Table 1. Comprehensive comparison of the proposed RGB-HJK dataset with existing hand kinematics and pose estimation datasets.

Dataset	Data Modality	Subjects	Size (Frames)	Interaction/Task	Annotations Provided	Benefits/Advantages	Limitations
FreiHAND (2019) [26]	RGB	32	134K	Single Hand Pose	3D Hand Pose & Shape	Large-scale baseline for single hand poses.	Isolated poses; lacks object manipulation.
HO-3D (2020) [27]	RGB-D	10	77K	Hand–Object	3D Hand & Object Poses	Accurate simultaneous hand–object tracking.	Requires multi-camera depth setups.
DexYCB (2021) [28]	RGB-D	10	582K	Hand Grasping	3D Hand & 6D Object	Extensive manipulation with 6D object tracking.	Complex setup tailored for robotic grasping.
GRAB (2020) [29]	MoCap	10	1.6M	Whole-Body Grasp	Full Body & Object 3D	Provides whole-body pose and grasp kinematics.	Relies on expensive MoCap suits and markers.
ARCTIC (2023) [30]	MoCap & RGB	9	2.1M	Bimanual Articulated	3D Hand & Object	Detailed bimanual dexterity with articulated objects.	Requires complex lab MoCap setup.
FPHA (2018) [31]	RGB-D & Mag.	6	100K	Egocentric Action	3D Hand Pose	First-person perspective for egocentric actions.	Magnetic sensors hinder natural tactile feedback.
H2O (2021) [32]	RGB-D	4	570K	First-Person Bimanual	3D Hand & 6D Object	3D bimanual interactions in first-person view.	Small cohort (4 subjects); head-mounted cameras.
ContactPose (2020) [33]	RGB-D & Thermal	50	2.9M	Hand–Object Contact	3D Hand & Contact Map	Detailed contact mapping and thermal analysis.	Static contact maps; lacks dynamic trajectories.
InterHand2.6M (2020) [34]	RGB	26	2.6M	Interacting Hands	3D Hand Pose	Massive scale; ideal for hand-to-hand interactions.	Lacks functional object grasping.
ObMan (2019) [35]	Synthetic	0	150K	Hand–Object	3D Hand & Object Poses	Perfectly annotated, massive synthetic scale.	Fully synthetic; lacks real-world variance.
BigHand2.2M (2017) [36]	Depth	10	2.2M	Single Hand Pose	3D Hand Pose	Million-scale dataset with diverse articulations.	Depth only; uses intrusive magnetic sensors.
HOT3D (2025) [37]	Multi-RGB	19	3.7M	Hand–Object	3D Hand & Object Poses	Multi-view egocentric recordings from AR/VR.	Optical markers alter natural interactions.
OakInk (2022) [38]	RGB	12	230K	Intent-Oriented HOI	3D Hand & Object Poses	Focuses on intent-oriented virtual grasping.	Poses fitted from 2D keypoints, not directly measured.
100DOH (2020) [39]	RGB (Video)	>19K	100K	Hand–Object Contact	2D BBoxes & Contact	Unmatched diversity from unconstrained videos.	Lacks 3D kinematics; limited to 2D annotations.
HOI4D (2022) [40]	RGB-D	9	2.4M	Egocentric HOI	3D Hand & 6D Object	Massive 4D dataset with action segmentation.	Depth sensors struggle with heavy occlusion.
RGB-HJK (Proposed)	RGB	10	>43K (>28K Validated)	Functional Grasping	Continuous 3D Joint Angles	Markerless, single low-cost webcam; continuous kinematics preserving natural tactile feedback.	Single-camera self-occlusions; distal joints excluded.

Table 2. Participant demographic and hand anthropometry data. All participants were right-hand dominant with no exclusion criteria.

Subject	Age (yr)	Sex	BMI (kg/m²)	Hand L (mm)	Hand W (mm)
S01	24	M	22.4	191	88
S02	22	F	20.8	174	76
S03	28	F	24.1	196	91
S04	25	F	21.5	177	78
S05	30	M	23.7	198	93
S06	23	F	20.2	172	74
S07	27	M	25.0	193	89
S08	26	F	22.9	175	77
S09	22	M	21.1	189	87
S10	31	M	26.3	200	92
Mean ± SD	25.8 ± 3.2	6 M/4 F	22.8 ± 2.0	186.5 ± 10.9	84.5 ± 7.4

Table 3. The grasp taxonomy used in the dataset.

Name	Object	Functional Description
Power Grasp	Cylindrical water bottle	Full-hand cylindrical wrap; all fingers flexed around the object, with the thumb opposing across the palm.
Tripod Grasp	Standard ballpoint pen	Writing posture; pen stabilized between the index and middle fingers with thumb support.
Static Power Grasp	Smartphone	Large flat object held in a semi-power grasp; fingers curl around the device while the thumb rests on the surface.
Precision Pinch	Paper	Thin object held between the thumb tip and the lateral surface of the index finger.
Lateral Pinch	Paperback book	Book grasped at the spine; thumb pressed against the index and middle fingers for lateral support.

Table 4. Dataset Specifications.

Specification	Detail
Subject	Human Kinematics, Biomechanics, Computer Vision
Data Repository	Kaggle (https://doi.org/10.34740/kaggle/dsv/16123762)
Data Type	Tabular (continuous time series joint angles)
Data Format	`.xlsx` (Primary Data), `.json` (Metadata)
Data Collection Method	Markerless 3D tracking using a standard RGB camera and the MediaPipe Hands inference pipeline during standardized object interaction grasps.
Data Access	Openly accessible under the Creative Commons CC BY 4.0 license.
Related Article	This data descriptor.

Table 5. Column definitions for the Data sheet in each .xlsx file.

Column	Units	Description
Time_sec	s	Time of each frame.
Trial_ID	–	Trial index (five trials per grasp).
Grasp_ID	–	Grasp type (one of five categories).
Thumb_CMC	deg	Carpometacarpal joint angle of thumb.
Thumb_MCP	deg	Metacarpophalangeal joint angle of thumb.
Thumb_IP	deg	Interphalangeal joint angle of thumb.
Index_MCP	deg	MCP joint angle of Index.
Index_PIP	deg	PIP joint angle of Index.
Middle_MCP	deg	MCP joint angle of Middle.
Middle_PIP	deg	PIP joint angle of Middle.
Ring_MCP	deg	MCP joint angle of Ring.
Ring_PIP	deg	PIP joint angle of Ring.
Little_MCP	deg	MCP joint angle of Little.
Little_PIP	deg	PIP joint angle of Little.

Note: NaN values indicate rest or transition frames and are removed in processed data.
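As a minimal sketch of how the columns in Table 5 might be used, the Python snippet below drops frames whose joint angle is NaN (rest or transition frames) and summarizes one joint per grasp. The rows are made-up in-memory values rather than dataset records, the numeric `Grasp_ID` coding is an assumption, and `summarize_joint` is a hypothetical helper; reading the `.xlsx` files themselves (for example with a spreadsheet library) is left out.

```python
import math
import statistics
from collections import defaultdict

# Illustrative rows only (not taken from the dataset); keys follow Table 5.
rows = [
    {"Time_sec": 0.00, "Trial_ID": 1, "Grasp_ID": 3, "Index_PIP": float("nan")},
    {"Time_sec": 0.04, "Trial_ID": 1, "Grasp_ID": 3, "Index_PIP": 60.0},
    {"Time_sec": 0.07, "Trial_ID": 1, "Grasp_ID": 3, "Index_PIP": 64.0},
    {"Time_sec": 0.11, "Trial_ID": 1, "Grasp_ID": 1, "Index_PIP": 80.0},
    {"Time_sec": 0.15, "Trial_ID": 1, "Grasp_ID": 1, "Index_PIP": 90.0},
]


def summarize_joint(rows, joint):
    """Return {grasp_id: (mean, sample SD, n)} over non-NaN frames of `joint`."""
    by_grasp = defaultdict(list)
    for row in rows:
        angle = row[joint]
        if not math.isnan(angle):  # NaN marks rest/transition frames
            by_grasp[row["Grasp_ID"]].append(angle)
    summary = {}
    for grasp_id, angles in by_grasp.items():
        # A single frame has no sample SD; report 0.0 in that case.
        sd = statistics.stdev(angles) if len(angles) > 1 else 0.0
        summary[grasp_id] = (statistics.mean(angles), sd, len(angles))
    return summary


summary = summarize_joint(rows, "Index_PIP")
for grasp_id in sorted(summary):
    mean, sd, n = summary[grasp_id]
    print(f"Grasp {grasp_id}: {mean:.1f} +/- {sd:.1f} deg (n={n})")
```

The same grouping could be applied per subject file and per trial by adding `Trial_ID` to the grouping key.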

Table 6. Comparison of related hand motion tracking methods with the proposed markerless RGB-HJK framework.

Reference	Subjects	Modality	Tasks/Grasps	Joints Tracked	Index PIP Metric (°)	Limitations
Roda-Sales et al. (2023) [51]	26	Instrumented Glove	161 ADLs (incl. flat objects)	18 Sensors	∼ $55 °$ – $70 °$ (ROM Range)	Aggregated generalized ADLs; inhibits tactile feedback
Lai et al. (2023) [52]	8	Soft Sensor Glove	50-mm Cylinder Grasp	Multiple	$68.3 ° \pm 5.3 °$ (Task Angle)	Prototype hardware; inhibits natural tactile feedback
Padilla-Magaña et al. (2022) [53]	25	Sensor Glove	ARAT Grip Task (Washer)	15 Joints	$54.7 ° \pm 13.8 °$ (Task Angle)	Inhibits natural tactile feedback
Ibrahim et al. (2024) [54]	195	Goniometry	Max Active Flexion	12 Joints	$97.2 ° \pm 16.9 °$ (Max Flexion)	Manual error; static resolution only
Xie et al. (2022) [55]	10	RGB-D Camera	Reach-to-grasp pen	15 Joints	$53.4 ° \pm 10.2 °$ (Grasp End)	Depth noise at close ranges
Proposed (RGB-HJK)	10	Markerless RGB (1 webcam)	Holding Phone (Grasp Type 3)	11 Joints (Angles)	$61.8 ° \pm 18.4 °$ (Task Angle)	Single-camera self-occlusions; distal joints excluded; not validated against gold-standard motion capture

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
