1. Introduction
The human hand is arguably the most versatile part of the human body, enabling a smooth transition from forceful interactions with the environment to highly precise fine motor tasks [1,2,3]. Such unique versatility makes the quantitative characterization of hand joint kinematics a valuable pursuit in a wide variety of disciplines, including clinical rehabilitation, prosthetic hand control, human–computer interaction, and gesture recognition [4,5,6]. High-quality kinematic datasets provide the fundamental raw material that enables researchers to accurately model joint coordination, track recovery trajectories after neuromusculoskeletal injury, and design assistive technologies that truly reflect real-world, practical hand function [7,8,9].
For a long time, accurate human movement analysis relied on expensive marker-based systems. Although these systems provide precise spatial tracking, they require dedicated setups, lengthy preparation, and body-mounted markers that can interfere with natural hand movements during grasping tasks [10,11,12,13]. Instrumented data gloves avoid the line-of-sight and marker-occlusion problems of optical systems, but they introduce challenges of their own: they tend to be bulky and uncomfortable, and they frequently require calibration and custom fitting. More recently, RGB-D cameras have become an accepted markerless alternative, but they entail their own trade-offs: they require specialized equipment, and tracking can fail when the fingers grasp an object so tightly that they occlude one another, which undermines measurement accuracy [14,15].
Deep learning-based pose estimation has reached a high level of maturity and has greatly contributed to hand tracking and dataset generation. Frameworks such as MediaPipe Hands allow real-time 3D hand landmark inference from standard RGB cameras, without physical markers, depth sensors, or complex laboratory infrastructure [16,17,18]. Building on these advances, a number of hand kinematic datasets have recently been proposed [19,20]. However, most of these publicly available datasets are designed for static pose estimation or abstract gesture recognition [19,20]. They mostly focus on classifying hand shapes rather than encoding the continuous, frame-by-frame joint angle trajectories generated while a hand physically interacts with an object [21,22]. Even the databases that do offer kinematic information are rarely aligned with established grasp classifications [23,24,25]. Daily grasping activities follow predictable, repeatable patterns that have been studied and classified extensively [24], yet such grasp classes are not consistently labeled in many databases. This leaves a gap between the data available from vision-based databases and the requirements of biomechanical modeling of the hand. A further barrier is the cost of the acquisition setups on which many of these datasets depend.
Table 1 presents a comparison with other well-known hand datasets reported in the literature since 2017.
Considering the aforementioned constraints and deficiencies in existing hand–object interaction datasets (Table 1), the proposed RGB-Based Hand Joint Kinematics (RGB-HJK) dataset was developed to capture natural functional grasping behavior using an affordable markerless framework. Unlike large-scale datasets such as FreiHAND [26], which primarily focus on hand pose estimation, or datasets such as DexYCB [28], GRAB [29], and synthetic datasets like ObMan [35], which rely on RGB-D sensors, motion capture systems, multi-view setups, or simulated environments, the proposed dataset emphasizes natural object manipulation during clinically relevant grasp tasks using a single RGB webcam and real-world objects across five standardized grasp categories. One of the key novelties of RGB-HJK is the availability of continuous frame-wise 3D joint angle trajectories rather than only pose labels, bounding boxes, or sparse keypoint annotations commonly reported in existing datasets. With over 28,000 valid active-hold frames, the dataset provides continuous kinematic measurements of 11 hand joints from 10 participants performing multiple grasping tasks, enabling detailed temporal analysis of grasp transitions. Furthermore, unlike existing approaches that require costly motion capture systems, depth sensors, magnetic trackers, or wearable markers that may affect natural object interaction, the proposed framework uses a low-cost inference pipeline based on a commercially available webcam ($50), thereby improving accessibility for rehabilitation, assistive robotics, and biomechanics applications.
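Because RGB-HJK is described in terms of joint angle trajectories rather than raw landmarks, it may help to see how an angle can be derived from MediaPipe output. The sketch below is an assumption, not the authors' documented method: it takes the angle at a joint between the two bone vectors formed by three consecutive landmarks and reports flexion as 180° minus that angle, so a straight joint reads 0°. The indices follow MediaPipe's 21-point hand model (0 is the wrist; 1 to 4 are the thumb CMC, MCP, IP, and tip; 5 to 8 are the index MCP, PIP, DIP, and tip; 9 to 12 the corresponding middle-finger points), while the `JOINT_TRIPLETS` names and selection are hypothetical and do not reproduce the dataset's 11-joint definition.

```python
import numpy as np

# MediaPipe Hands indices: 0 wrist; 1-4 thumb CMC/MCP/IP/tip;
# 5-8 index MCP/PIP/DIP/tip; 9-12 middle MCP/PIP/DIP/tip.
# Hypothetical joint selection: (proximal, vertex, distal) landmark triplets.
JOINT_TRIPLETS = {
    "Thumb_MCP": (1, 2, 3),
    "Index_MCP": (0, 5, 6),
    "Index_PIP": (5, 6, 7),
    "Middle_PIP": (9, 10, 11),
}


def inter_segment_angle(landmarks, triplet):
    """Angle (degrees) at the vertex landmark between its two adjacent segments."""
    a, b, c = (np.asarray(landmarks[i], dtype=float) for i in triplet)
    u, v = a - b, c - b
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against rounding pushing the cosine slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def flexion_angle(landmarks, joint):
    """Flexion in degrees: 0 for a straight joint, larger when bent."""
    return 180.0 - inter_segment_angle(landmarks, JOINT_TRIPLETS[joint])


def frame_angles(landmarks):
    """Flexion of every joint in JOINT_TRIPLETS for one (21, 3) landmark frame."""
    return {name: flexion_angle(landmarks, name) for name in JOINT_TRIPLETS}
```

Applied to the 21 landmarks of each video frame in turn, this yields one angle sample per joint per frame, i.e., a continuous trajectory. Using flexion rather than the raw inter-segment angle is a presentation choice that makes larger values correspond to a more closed hand.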
Moreover, in contrast to most current datasets that focus on large-scale pose estimation challenges or simulation-based settings, RGB-HJK focuses specifically on functional grasp biomechanics, using anatomically realistic joint-angle representations applicable to rehabilitation robotics, prosthetics, human–robot interaction, and movement analysis. Despite the simplicity of the acquisition process, the resulting trajectories exhibit low intra-subject variability and clear separability between grasp types. This shows that markerless RGB cameras can capture complex functional grasp movements that traditionally required specialized laboratory-grade equipment. The main contributions of this paper are as follows:
- (i) A markerless, RGB-based dataset of continuous 3D hand-joint kinematics recorded during five functional grasping tasks with real-world objects.
- (ii) A low-cost acquisition framework using a standard RGB webcam and the MediaPipe Hands pipeline.
- (iii) Over 28,000 validated active-hold frames with continuous trajectories of 11 anatomical hand joints.
- (iv) Comprehensive technical validation through repeatability, variability, and PCA-based grasp separability analyses.
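The repeatability, variability, and PCA analyses listed in (iv) can be sketched as follows. This is an illustration under stated assumptions, not the authors' validation code: standard deviations use the sample estimator (`ddof=1`), and `pca_variance_ratio` computes the explained-variance ratio of each principal component of a frames-by-joints angle matrix from an SVD of the mean-centred data. How these ratios are mapped to a separability score in the paper is not specified in this section.

```python
import numpy as np


def sd_deg(values):
    """Sample standard deviation in degrees.

    Intra-subject: pass one subject's per-trial mean angles for a joint and grasp.
    Inter-subject: pass the per-subject mean angles for that joint and grasp.
    """
    return float(np.std(np.asarray(values, dtype=float), ddof=1))


def pca_variance_ratio(X):
    """Explained-variance ratio of each principal component.

    X: array of shape (n_frames, n_joints) holding joint angles in degrees.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each joint
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to component variances
    return var / var.sum()
```

For example, `sd_deg([10, 12, 14])` returns 2.0. Computing PCA through the SVD avoids forming the covariance matrix explicitly and is numerically more stable when joints are strongly correlated, as finger joints typically are during grasping.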
3. Data Description
3.1. Dataset Overview and Access
The RGB-HJK dataset is openly hosted on Kaggle under a Creative Commons Attribution 4.0 (CC BY 4.0) license; the permanent link is https://doi.org/10.34740/kaggle/dsv/16123762. We chose .xlsx as the primary file format because it can be opened by standard spreadsheet and data analysis tools on virtually every operating system. A companion .json metadata file accompanies the data and records session-level provenance. The full set of dataset specifications is listed in Table 4.
3.2. Column Definitions
Each .xlsx file (raw and processed) contains two Excel sheets: Data and Metadata. The column definitions for the Data sheet are listed in Table 5.
3.3. Data Structure and File Organization
The RGB-HJK database is organized as a hierarchical folder structure, so that its files can be easily integrated into any analytical or machine learning pipeline. It has two top-level folders, “Processed” and “Raw,” each containing ten identically named Excel files, Subject01.xlsx to Subject10.xlsx. The Raw folder contains every recorded frame (e.g., all 3723 frames of Subject 01, covering the approach, lift, and rest phases), which makes it well suited to studying motion transitions or developing segmentation algorithms. The Processed folder, by contrast, is a curated version of the dataset for specific applications (e.g., biophysical simulation or classification) and retains only frames from stable active-hold phases (e.g., 2061 frames for Subject 01). Finally, in line with the FAIR data principles, every .xlsx file in both folders pairs a Data sheet with a Metadata sheet.
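A minimal sketch of iterating over this folder layout, assuming the archive has been extracted into a local directory (the `root` argument; `RGB-HJK` below is a hypothetical folder name). `load_subject` uses pandas `read_excel` with a list of sheet names, which returns a dictionary of DataFrames keyed by sheet name; reading .xlsx files additionally requires an Excel engine such as openpyxl. The column names inside the Data sheet are those defined in Table 5 and are not assumed here.

```python
from pathlib import Path

FOLDERS = ("Raw", "Processed")


def subject_files(root, folder="Processed"):
    """Expected paths Subject01.xlsx ... Subject10.xlsx under root/folder."""
    if folder not in FOLDERS:
        raise ValueError(f"folder must be one of {FOLDERS}, got {folder!r}")
    return [Path(root) / folder / f"Subject{i:02d}.xlsx" for i in range(1, 11)]


def load_subject(path):
    """Read the Data and Metadata sheets of one subject workbook.

    Returns a (data, metadata) pair of pandas DataFrames.
    """
    import pandas as pd  # imported lazily; .xlsx reading also needs e.g. openpyxl

    sheets = pd.read_excel(path, sheet_name=["Data", "Metadata"])
    return sheets["Data"], sheets["Metadata"]


if __name__ == "__main__":
    # Print the shape of each available Processed workbook; missing files are skipped.
    for path in subject_files("RGB-HJK", "Processed"):
        if path.exists():
            data, meta = load_subject(path)
            print(path.name, data.shape)
```

Passing `"Raw"` instead of `"Processed"` gives the unsegmented recordings, including the approach, lift, and rest phases, which is the natural starting point for segmentation work.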
The repository structure is organized as follows:
Annotation quality was ensured through a standardized acquisition protocol, consistent grasp labeling across all participants, and automated landmark extraction using the MediaPipe Hands framework. All recordings were visually inspected to verify correct grasp execution and successful hand tracking. Furthermore, frames containing missing measurements, tracking failures, or physiologically implausible joint angles were removed during data cleaning. The high detection rate across all trials and the strong repeatability observed during technical validation further support the reliability and consistency of the dataset annotations.
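The cleaning step described above (removing frames with missing measurements, tracking failures, or physiologically implausible joint angles) can be sketched as a row filter over a frames-by-joints angle array. The bounds below are placeholders chosen only for illustration; the thresholds actually applied to RGB-HJK are not stated in this section, and joint-specific ranges of motion would be a natural refinement.

```python
import numpy as np

# Hypothetical plausibility bounds in degrees (not the dataset's actual thresholds).
ANGLE_MIN_DEG = -30.0
ANGLE_MAX_DEG = 180.0


def clean_frames(angles, lo=ANGLE_MIN_DEG, hi=ANGLE_MAX_DEG):
    """Keep frames with no missing joint values and all angles within [lo, hi].

    angles: array of shape (n_frames, n_joints); NaN marks a tracking failure.
    Returns the filtered array and the boolean mask of retained frames.
    """
    a = np.asarray(angles, dtype=float)
    finite = np.isfinite(a).all(axis=1)              # no missing values in the row
    with np.errstate(invalid="ignore"):
        in_range = ((a >= lo) & (a <= hi)).all(axis=1)
    keep = finite & in_range
    return a[keep], keep
```

Returning the mask alongside the filtered array makes it easy to report how many frames each trial lost, which is useful when documenting detection rates.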
5. Discussion
Capturing hand kinematics has always involved a compromise between accuracy and ecological validity. Marker-based optical tracking systems offer highly accurate measurements, but they are not without drawbacks: they are cumbersome to set up, are therefore largely confined to laboratory settings, and require retro-reflective markers attached to the skin [48]. The electromagnetic trackers and instrumented gloves used in previous studies also have limitations [49,50]. As shown in Table 6, although sensor gloves can record task angles in real time, they share a common weakness: they impede natural tactile sensation [51,52,53]. The physical constraints of the glove, together with the stiffness of its material and sensor drift, alter how a participant physically handles the object. Traditional baseline techniques such as goniometry [54] are vulnerable to human error and limited to static measurements. The latest approach to this problem has been the use of depth-sensing RGB-D and infrared-based methods [55]. One advantage of this markerless approach is that it eliminates wearable hardware and physical attachments on the hand, enabling more natural interaction. However, direct object interaction inevitably introduces self-occlusions, which can degrade tracking accuracy during grasping tasks: when a hand grips a bottle tightly, the closed fingers block the sensor's view [55,56].
The proposed RGB-HJK system addresses these limitations by using the MediaPipe library to process standard 2D RGB images captured with a low-cost webcam (approximately $50). To assess how this markerless pipeline compares with existing methods, Table 6 presents the kinematics of a sample Index PIP joint during our Static Power Grasp (holding a phone) alongside related grasping activities described in the literature. The task angle we measured, , is comparable to the angles obtained in previous studies with instrumented gloves ( to ) [51,52,53] and RGB-D cameras ( ) [55].
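A task angle like the one compared in Table 6 is typically reported as mean ± SD. A minimal sketch of that summary for one joint and one grasp follows; it assumes the angles and grasp labels are available as parallel per-frame arrays (for example, an Index PIP column and a grasp-label column of the Data sheet, whose actual names are given in Table 5) and that the SD is pooled over frames. Whether the reported values pool over frames or over per-trial means is not stated in this section.

```python
import numpy as np


def task_angle_summary(angles_deg, grasp_labels, grasp):
    """Mean and sample SD (degrees) of one joint's angle over frames of one grasp."""
    angles = np.asarray(angles_deg, dtype=float)
    mask = np.asarray(grasp_labels) == grasp  # elementwise label match
    if not mask.any():
        raise ValueError(f"no frames labelled {grasp!r}")
    selected = angles[mask]
    return float(selected.mean()), float(selected.std(ddof=1))


def format_task_angle(mean, sd):
    """Render the summary in the usual 'mean ± SD' form."""
    return f"{mean:.1f} ± {sd:.1f}°"
```

For instance, angles of 40°, 50°, and 60° labelled `"Static Power Grasp"` give a mean of 50° and an SD of 10°, formatted as `50.0 ± 10.0°`.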
The variance observed in our results ( ) arises from the inherent biological variation in how hands and objects interact, a phenomenon that is typically minimized in lab-based studies or glove-mounted systems. The RGB-HJK pipeline successfully estimated the locations of occluded joints during the active grasp phase, unlike depth-sensor and IR camera modalities, which generate significant noise when objects are at close range [55]. Our results indicate that precise, verifiable hand kinematics do not require complex, expensive hardware or costly laboratories; an inexpensive $50 webcam combined with effective computer vision algorithms can suffice.
6. Conclusions
This work has presented the RGB-HJK dataset, an open-access, markerless database of continuous 3D hand-joint angle sequences recorded from ten healthy volunteers performing five standard object-grasping tasks. Intra-subject repeatability and inter-subject variability were quantified as standard deviations (SD) in degrees, and PCA variance ratios were used to assess the linear separability of the grasps. The database contains 28,111 active-hold frames with a mean frame rate of , and the 11 anatomical joints showed low variability between repeated trials by the same individual (≤). The framework demonstrates strong potential for rehabilitation assessment; however, limitations remain, namely single-camera self-occlusions, the exclusion of distal finger joints, limited participant diversity, and the lack of validation against gold-standard motion capture. Future work should extend to clinical populations, include bilateral and more diverse grasp evaluations, and establish longitudinal validation for rehabilitation monitoring and anomaly detection in assistive technologies.
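A mean frame rate such as the one summarized above can be obtained from per-frame timestamps. A minimal sketch, assuming a hypothetical timestamp series in seconds, computes it as the reciprocal of the mean inter-frame interval, which equals (n − 1) / (t_last − t_first); any dropped or removed frames lengthen the intervals and therefore lower the estimate.

```python
import numpy as np


def mean_frame_rate(timestamps_s):
    """Mean frame rate (Hz) from strictly increasing timestamps in seconds."""
    t = np.asarray(timestamps_s, dtype=float)
    if t.size < 2:
        raise ValueError("need at least two timestamps")
    dt = np.diff(t)  # inter-frame intervals
    if np.any(dt <= 0):
        raise ValueError("timestamps must be strictly increasing")
    return float(1.0 / dt.mean())
```

Computed on the Raw recordings this reflects the capture rate, whereas on the Processed files the gaps left by segmentation and cleaning would bias it downward, so the raw timestamps are the more appropriate input.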