1. Introduction
Impairments of hand motor function are a common consequence of neurological injuries, particularly stroke [1]. In addition to conventional rehabilitation supervised by physical therapists, wearable robotics, such as Hand Exoskeleton Systems (HESs), can further enhance rehabilitation training [2]. It is well established that, when individuals are properly motivated, the expression of latent motor ability is facilitated [3]. Accordingly, integrating a serious game with a HES provides an interactive human–robot rehabilitation setting that can become more engaging and potentially more effective, as reported in the literature [4,5,6,7]. In this context, improving the reliability and usability of human motion monitoring is fundamental for effective human–robot interaction in rehabilitation.
A key requirement for such interactive rehabilitation is the availability of reliable hand motion tracking to drive a virtual hand in real time. In particular, tracking solutions should balance accuracy with a low setup burden, as long calibration procedures and intrusive instrumentation can hinder repeatable clinical-like interaction. Sensors integrated into the HES, such as inertial sensors (IMUs) and other wearable sensors (e.g., magnetic encoders, flex sensors, and EMG), can be used to track the state of the device in real time and, if appropriately mapped onto a model of the body part under rehabilitation, to produce a natural movement of the model in the game. From a human–robot interaction (HRI) perspective, reducing the number of sensors at run time decreases intrusiveness and environmental constraints, improving usability, shortening setup time, and supporting repeatable interaction in clinical-like scenarios. Alternative hand tracking solutions have been explored in the literature, including vision-based systems and wearable sensing devices, each with its own advantages and limitations. In the following section, these approaches are discussed and the rationale for the method adopted in this study is outlined.
As will be discussed in detail in the following sections, each of the finger mechanisms of the hand exoskeleton considered in this work is a 1-degree-of-freedom (DoF) rigid kinematic chain equipped with a single built-in encoder measuring the exoskeleton motion. The encoder-measured motion coordinate is the only actuated (independent) coordinate among the 18 variables describing the device kinematics. While this coordinate is sufficient to describe the device actuation, interactive virtual hand control typically requires joint-level kinematics, e.g., MetaCarpoPhalangeal (MCP) and Proximal InterPhalangeal (PIP) flexion angles. To enable such a representation while keeping the exoskeleton unmodified during operation, we propose a two-stage approach (Figure 1). During an initial model-fitting phase, we temporarily redesign the exoskeleton with 3D-printed add-on parts to mount two additional encoders that provide reference measurements of finger motion. The acquired encoder signals are filtered to attenuate measurement noise and obtain smoother trajectories; the resulting dataset is then used to identify polynomial mapping functions that relate the built-in encoder angle to the corresponding MCP and PIP joint angles. After the model-fitting stage, the add-on sensors can be removed and the original exoskeleton can be used unmodified: the virtual hand motion is reconstructed in real time using only the on-board encoder reading and the identified polynomial mappings, without requiring external motion capture systems or other intrusive equipment for deployment.
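As a minimal illustration of the run-time stage, the Python sketch below evaluates cubic mappings on an offset-corrected, normalized encoder reading. The coefficient values and the normalization bounds are hypothetical placeholders, not the identified models.

```python
import numpy as np

# Hypothetical group-level cubic coefficients (highest power first);
# in the real system these are identified during the model-fitting stage.
MCP_COEFFS = np.array([-10.0, 5.0, 80.0, 0.0])
PIP_COEFFS = np.array([-15.0, 20.0, 60.0, 0.0])

def reconstruct_joints(encoder_deg, open_hand_deg, closed_hand_deg):
    """Map the single on-board encoder reading to MCP/PIP estimates (deg).

    The encoder angle is offset-corrected at the open-hand reference and
    normalized to [0, 1] before evaluating the polynomial mappings.
    """
    beta = (encoder_deg - open_hand_deg) / (closed_hand_deg - open_hand_deg)
    beta = min(max(beta, 0.0), 1.0)  # clamp to the identified range
    mcp = float(np.polyval(MCP_COEFFS, beta))
    pip = float(np.polyval(PIP_COEFFS, beta))
    return mcp, pip
```

In the actual system, this evaluation runs inside the Unity game loop at every frame, using only the on-board encoder as input.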
In this work, we present the proposed model-fitting strategy and its integration in a Unity-based serious game to control a virtual hand (Figure 1). Resource efficiency is achieved by limiting sensing and infrastructure requirements during testing and routine operation: MCP and PIP flexion–extension angles are estimated in real time from a single on-board encoder, avoiding additional wearable sensors and external tracking systems during rehabilitation sessions; additional sensing is required only during the model-fitting phase, using low-cost temporary encoders mounted through reversible 3D-printed add-on parts, without permanently modifying the device worn by the patient. The approach is further evaluated to investigate whether the encoder-based sensing strategy, relying on a single on-board encoder at run time, can provide sufficiently accurate and repeatable MCP and PIP kinematics for interactive rehabilitation-oriented applications. External motion capture (MoCap) is used only for experimental validation and is not required for system operation.
The main contributions of this paper are as follows: (i) a reversible and resource-efficient 3D-printed redesign enabling temporary sensorization while preserving the original device for deployment; (ii) a polynomial regression-based mapping from the single built-in encoder signal to MCP and PIP joint kinematics, obtained from filtered data collected during the model-fitting stage; (iii) the integration of the proposed reconstruction pipeline in a Unity serious game for real-time virtual-hand control; and (iv) an experimental evaluation including repeatability analysis and motion capture-based validation.
The structure of this paper is as follows: Section 1 reviews motion-tracking methods for hand kinematics and positions the proposed approach within existing solutions; Section 2 describes the exoskeleton’s redesign, the sensing architecture, and the motion reconstruction pipeline; Section 3 reports the experimental results obtained with the proposed system; and finally, Section 4 discusses the findings and concludes this paper.
1.1. Related Work
The following discussion outlines the different tracking systems, their pros and cons, and the rationale behind the choice of the current method to improve finger motion tracking with the HES developed at the Department of Industrial Engineering of the University of Florence (DIEF-HES) (as detailed later in Section 2.1).
Tracking systems can be classified as non-visual, visual-based, or a combination of both [8,9,10,11,12] (Figure 2).
Visual tracking systems always include cameras, which can be either stereo cameras or 3D depth cameras (key system parameters typically include resolution, field of view, frame rate, and working distance). In the former case, two images are used to reconstruct the subject of interest, whereas in the latter, a camera is combined with one or more sensors, such as infrared. In both cases, both budget-friendly and expensive solutions are available, and an initial calibration is required for data acquisition, determining the quality of the capture (e.g., intrinsic/extrinsic calibration and/or hand model initialization). While the choice of the camera is a crucial factor in visual tracking systems, the setup of the tracked object is also a key aspect. One approach involves placing trackers in strategic positions on the object [13], in which case marker occlusions or marker swapping may occur [14] (often quantified via the tracking loss rate under occlusions and reprojection/pose consistency). Alternatively, colored gloves can be used [15]. Markerless measurement systems are also available, offering a less intrusive experience for the user at the cost of lower accuracy (performance is commonly reported in terms of positional/joint-angle error and end-to-end latency/jitter in real-time applications) [16,17].
The other main tracking method is non-visual tracking, which includes inertial-based, magnetic-based, or other sensor-based systems. Inertial Measurement Units (IMUs) [18,19,20] use accelerometers, gyroscopes, and/or magnetometers to acquire data on the inertial motion and 3D orientation of joined individual segments. Unlike visual systems, IMUs do not suffer from the line-of-sight problem, though fluctuating offsets and measurement noise can lead to integration drift. Magnetic sensors are also widely used for tracking movements in virtual reality (VR) due to their small size, although they may suffer from latency and jitter [21]. In all cases, the sensors can be integrated into devices worn by the subject. While this solution provides a higher measurement accuracy, it also occupies part of the limited space surrounding the hand, making it a more intrusive solution [22,23,24].
Finally, hybrid solutions [25,26,27] typically use cameras, such as Dynamic Vision Sensors (DVSs) or 3D depth cameras, to capture hand movements, combined with ElectroMyoGraphic (EMG) signal measurements to correlate muscle activity with the tracked motion.
Within this landscape, our work focuses on minimal, encoder-based sensing and on identifying a subject-specific mapping from device actuation to finger joint kinematics, avoiding external vision systems during deployment and thus eliminating line-of-sight constraints and reducing sensitivity to occlusions. The design choice of using encoders is motivated by (i) the limited space available in hand-coupled mechanisms, for which compact magnetic encoders (8 mm diameter, 3 mm height) are particularly suitable, and (ii) the presence of an on-board encoder of the same type in the baseline device. Therefore, to ensure measurement consistency, the two temporary encoders added for the model-fitting phase were selected to match the built-in one, enabling the use of three homogeneous sensors.
2. Materials and Methods
This section describes the materials and methods used in this study. The device adopted for the experiments is first presented, including its mechanical structure, operating principle, and integration with the other components of the system. The relevant kinematic model and variables used throughout this work are also introduced. The following subsections illustrate the redesign of the device, the measurement setup, and the signal processing pipeline. Finally, the validation procedures adopted to assess the performance of the proposed approach are outlined.
2.1. Baseline Hand Exoskeleton
This study was conducted using the 2022 model of the DIEF-HES, which represents the most recent stable and fully operational version available; a complete hardware and software redesign is currently under development and testing and was therefore not used. The system comprises the DIEF-HES, the Remote Actuation System (RAS), where the actuation and control components are housed, and a monitor that displays the serious game used during the rehabilitation process (Figure 3). The DIEF-HES, worn integral to the back of the hand and secured to the intermediate phalanges of each finger via Velcro straps, is connected to the RAS through Bowden cables (Semerfil Worldwires s.r.l., Bari, Italy) for actuation and through electrical cables for communication with the on-board sensors (encoders (RM08 Linear Miniature Rotary Magnetic Encoder, RLS, Komenda, Slovenia) and load cells (FSSM-500N, Forsentek Co., Shenzhen, China)). The serious game used alongside the DIEF-HES was specifically designed to work together with the system to engage the user during rehabilitation, while guiding them through the exercises prescribed by physical therapists. The game simulates the user’s hand moving and interacting with digital objects in a virtual environment. Specifically, the DIEF-HES’s encoders measure the Finger Mechanism (FM) motion and drive the flexion–extension movements of the digital fingers. The embedded load cells measure the force applied by the user with each finger while playing the game; the hand exoskeleton is hence force-controlled to follow the force references coming from the virtual reality (e.g., the force reference is zero when no interaction with objects is detected, or it assumes values proportional to the stiffness of the object and the indentation between the bounding boxes of the object and the fingers). From this perspective, accurate tracking of the finger kinematics is mandatory, since the interaction with virtual objects heavily relies on it. As described in later sections, the DIEF-HES did not have a sufficiently accurate finger motion tracking system, which motivated the work described in this paper.
Regarding the mechanical design of the FM, it consists of five links and a ground frame (Figure 4). The frame is rigidly coupled to the hand housing via a magnetic interface and a pin-hole coupling, making it integral to the patient’s hand. The FM has one DoF; consequently, the mechanism configuration is fully determined by a single generalized coordinate, this being the rotation of the rear crank, which is the actuated link. Actuation is provided through a Bowden cable transmission: the system is connected, on one side, to a pulley rigidly attached to the rear crank and, on the actuator side, to a pulley mounted on an electric motor. Two sensors are integrated in the system: (i) a magnetic encoder, placed at joint Z to measure the rear crank angle, which uniquely determines the full mechanism configuration at any time, and (ii) a load cell mounted integral to the connecting rod to continuously measure the axial force in that link. While the back of the hand is integral to the frame, the finger’s intermediate phalanx is secured to the thimble via a Velcro strap. The thimble is the only FM component whose position is not directly known during the exercise, as it is connected to the mechanism through a passive pivot–slider coupling (joint E). This interface constrains the finger–thimble interaction to occur only along the direction normal to the phalanx, which is desirable to minimize shear stresses on the skin.
2.2. Exoskeleton Redesign and Sensor Integration
The redesign focused on a single FM of the DIEF-HES, specifically the index finger FM, to validate the proposed method and assess its performance before extending the approach to the full device. The FM motion can be tracked in real time through the on-board encoder, which measures the rear crank angle; however, this measurement alone does not provide the thimble rotation (and thus the finger phalanx motion). The objective of the redesign is therefore to estimate the finger phalanx kinematics as a function of the mechanism motion. To this end, the rotational displacements of the MCP and PIP joints must be recorded together with the rear crank angle. Accordingly, two additional temporary encoders were introduced: one to measure the MCP rotation and one to measure the thimble rotation, which is coupled to the intermediate phalanx and can thus be used to measure the PIP rotation.
The mechanical redesign can be summarized in two main interventions. First, to embed a magnetic encoder in the thimble, the latter and the connected link were redesigned while leveraging the existing FM architecture. Second, a dedicated temporary external assembly, referred to as the Phalanx–Metacarpal Module (PMM), was developed to accommodate the second temporary magnetic encoder for MCP joint angle measurement.
Figure 5 illustrates the mechanical redesign adopted to enable temporary sensorization. For clarity,
Figure 6 further provides a schematic representation of the resulting closed-chain coupling between the assemblies and the finger, explicitly showing how the measured device variables relate to the finger joint kinematics. In this schematic, the two measured angles are introduced only to illustrate the kinematic coupling, as formalized in Equations (1) and (2). After offset removal at the open-hand reference, these quantities are directly mapped to the anatomical MCP and PIP joint angles, which are used consistently throughout the rest of this paper for clarity and for presenting the experimental results.
Since the two new assemblies were devised solely as test cases, encumbrance and weight were considered but not treated as strict design constraints, as the assemblies were not intended for final deployment in robot-assisted therapy with patients.
Both the redesigned thimble and PMM were designed for 3D printing in Acrylonitrile Butadiene Styrene (ABS).
The sensors selected for this study are two miniature rotary magnetic encoders, identical to the one already embedded in the device. These sensors offer several advantages, mainly a very compact size (8 mm diameter body), high accuracy, high-speed operation up to 30,000 rpm, and a non-contact, frictionless design, which make them suitable for the proposed integration.
Finally, structural static analyses of the new components were performed using the Finite Element Method (FEM). Material properties of ABS, including the yield stress (31 MPa) and Young’s modulus (1.5 GPa), were considered in SolidWorks v2020 simulations. Previous activities of the research group adopted a nominal force of 15 N; for this work, a force of 20 N was chosen to ensure the components’ reliability. The load was applied on joint E, while hinge constraints were used to replicate the interactions with connected components. In addition, the force direction identified in the latest HES study by the authors [6] was considered in relation to the global reference frame and the corresponding actuation angle.
It is noteworthy that, compared to the original aluminum design, the redesigned ABS CE-link was thickened to ensure its structural integrity. To conclude, the most stressed component of the system can be considered well-dimensioned since, under the worst operating conditions, its maximum stress remained far below the yield condition (with a comfortable safety factor) and its maximum displacement was acceptable, confirming the new assembly design.
2.3. Measurement and Signal Processing
Three compact non-contact rotary encoders (RLS RM08 Linear Miniature Rotary Magnetic Encoder) were the sensors chosen for this application. The encoder mounted at joint Z measures the rear crank angle, whereas the two temporary encoders, introduced for this measurement phase, measure the thimble rotation (i.e., the PIP angle) and the MCP angle; the latter encoder is placed in the new PMM (Figure 7). The data acquisition setup comprises an Arduino Mega microcontroller board, a breadboard for circuitry, and an ESP32.
Five healthy subjects were enrolled in this preliminary study after providing informed consent and were asked to wear the redesigned FM and the PMM. Data collection was performed as follows. First, the exoskeleton was set in the open configuration, with both the thimble and the PMM in a horizontal position. The initial configuration of the FM, thimble, and PMM was used as the starting reference for data evaluation, corresponding to the open position. Second, the user’s hand, and consequently the connected FM, were closed at a constant speed until the rear crank reached the closed end-stop. The procedure was repeated three additional times to minimize the influence of external noise or software-related errors. It is worth noting that the movement performed by the volunteers corresponds to the gesture intended to be displayed in the serious game during rehabilitation exercises.
The recorded data (stored as CSV files) were imported into MATLAB vR2023b for processing (Figure 8). This processing pipeline was selected to attenuate tremor and transmission-related fluctuations while preserving the low-frequency kinematic trends that are relevant for real-time interaction in the serious game. Prior to inclusion in the dataset, the signals were filtered according to the following pipeline: (i) Basic pre-processing, including offset removal; no timestamp correction or resampling was required because all encoders shared the same acquisition frequency (identical sensor model and configuration). (ii) Low-pass Butterworth filtering (4th order, zero-phase using filtfilt to avoid phase lag, and 5 Hz cut-off) [28,29,30] to attenuate high-frequency components attributable to physiological tremor and mechanical friction in the exoskeleton. (iii) Additional low-pass smoothing using a Savitzky–Golay filter (3rd-order polynomial) [31,32,33] to further reduce residual high-frequency fluctuations while preserving the overall signal shape.
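The three-step pipeline can be sketched in Python with SciPy (a sketch only: the paper's processing was performed in MATLAB, and the sampling frequency and Savitzky–Golay window length below are illustrative assumptions; the 4th-order zero-phase Butterworth filter, the 5 Hz cut-off, and the 3rd-order Savitzky–Golay polynomial are from the text):

```python
import numpy as np
from scipy.signal import butter, filtfilt, savgol_filter

def preprocess(theta_raw, fs=100.0, fc=5.0):
    """Filter one encoder trajectory following the paper's pipeline.

    fs (sampling frequency, Hz) and the Savitzky-Golay window length
    are illustrative assumptions, not values stated in the paper.
    """
    # (i) basic pre-processing: remove the open-hand offset
    theta = theta_raw - theta_raw[0]
    # (ii) 4th-order low-pass Butterworth, zero-phase via filtfilt
    b, a = butter(4, fc / (fs / 2.0), btype="low")
    theta = filtfilt(b, a, theta)
    # (iii) Savitzky-Golay smoothing with a 3rd-order polynomial
    theta = savgol_filter(theta, window_length=21, polyorder=3)
    return theta
```

Zero-phase filtering (filtfilt) matters here because a causal filter would delay the trajectory and bias the subsequent encoder-to-joint mapping.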
For each recording, flexion–extension cycles were extracted from the filtered encoder trajectories. A cycle was defined as a movement starting from a relaxed posture, reaching a clear flexion peak, and returning to the initial range of motion. Cycles with incomplete motion (reduced range), artifacts (loss of contact and signal saturation), or interruptions were discarded. From each recording, one representative flexion–extension cycle (close–open) was selected from the central portion of the trial and time-normalized to a common [0, 1] scale. Therefore, five total cycles were retained (one per subject). Each subject-specific cycle was used to fit a third-order polynomial relating the measured MCP and PIP angles to the FM control angle. A third-order polynomial was chosen as the lowest-order model capable of capturing the smooth but asymmetric curvature of the flexion–extension profile while avoiding the oscillations and overfitting typically introduced by higher-order fits.
The MCP and PIP mappings are each defined by four polynomial coefficients. For each subject, the approximation quality was quantified by computing the Root-Mean-Square Error (RMSE) between the measured angles and the corresponding polynomial reconstructions. Group-level models were then obtained by averaging the coefficients across cycles. The resulting MCP and PIP polynomials describe a representative flexion–extension pattern used to drive the joint motion of the virtual finger in the serious game.
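A minimal Python sketch of the per-cycle fit and the group-level averaging is given below (illustrative only: the paper's analysis was performed in MATLAB, and `np.polyfit` is assumed here as a stand-in for the actual fitting routine):

```python
import numpy as np

def fit_cycle(beta, joint_angle, order=3):
    """Fit a 3rd-order polynomial joint_angle ~ p(beta) for one cycle.

    beta: normalized actuation angle in [0, 1]; joint_angle: filtered
    MCP or PIP trajectory (deg). Returns the coefficients (highest
    power first) and the fit RMSE used as the per-subject quality check.
    """
    coeffs = np.polyfit(beta, joint_angle, order)
    residual = np.polyval(coeffs, beta) - joint_angle
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    return coeffs, rmse

def group_model(per_cycle_coeffs):
    """Average the coefficients across subject cycles (group-level model)."""
    return np.mean(np.asarray(per_cycle_coeffs), axis=0)
```

Averaging in coefficient space is equivalent, for a fixed polynomial basis, to averaging the fitted curves point by point, which is what makes the group-level polynomial a compact representative of the individual cycles.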
2.4. Validation with Motion Capture
The MoCap system used for this phase was the OptiTrack motion capture system. Twelve infrared cameras were employed: eleven for tracking purposes and one configured to record a black-and-white reference video for future reference. To capture the angles of interest, the experimental setup included the hand wearing the FM, and reflective markers were placed at strategic locations. Four markers were attached to the exoskeleton to track the rotations of the rear crank and the thimble (hence, the PIP joint), while three markers were placed on the finger to estimate the MCP and, again, PIP joint rotations (Figure 9).
Motive (OptiTrack’s software, v3.1) provides several predefined marker sets for describing body motion; however, for this application a custom marker set was created.
Five subjects were asked to wear the glove with the FM and perform the same movements executed during the previous encoder-based recordings. After the acquisitions, the recordings were refined with Motive in the dedicated data-editing section. The resulting data were then exported as CSV files and processed in MATLAB.
3. Results
Figure 10 shows the group-level polynomial mappings relating the normalized actuation angle to the corresponding joint kinematics. Reference MCP and PIP trajectories were obtained from the filtered measurements of the permanent on-board encoder and the two temporarily mounted encoders, as described in Section 2.3. In contrast, the mapping functions were evaluated using the raw encoder signal (after offset removal), in accordance with the intended usage in the Unity serious game. The top two panels of Figure 10 show the raw data taken from the five participants. For each subject, the time interval corresponding to the finger-closing movement with the best signal quality was selected and used to compute the final polynomial fit, obtained by averaging the single polynomial coefficients across subjects (Table 1). For completeness, the InterQuartile Range (IQR) (25–75%) is reported in all panels.
The volunteers for this preliminary dataset were a mixed group of female and male subjects aged 20–30 years; the starting point was set to 0 for each group of samples. For these reasons, data variability tends to increase as the motion progresses away from the starting point. Reconstruction accuracy was quantified by comparing the estimated angles with the reference trajectories (encoder measurements). Residuals were defined as the difference between the estimated and reference angles and are summarized in the bottom panels through their median and IQR.
Table 1 reports the error metrics, including the RMSE and MAE obtained for the MCP and PIP reconstructions. Given the preliminary nature of this study and the intended use of the mapping to drive a virtual finger in a Unity-based serious game, the main requirement is a stable and visually plausible motion rendering rather than clinical-grade goniometric accuracy. Accordingly, the observed error levels are considered acceptable for the target real-time application.
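For reference, the error metrics used in the evaluation (RMSE, MAE, and residual IQR) can be computed as in the following sketch (a generic implementation, not the authors' analysis script):

```python
import numpy as np

def error_metrics(estimated, reference):
    """RMSE, MAE, and residual IQR (25-75%) between two trajectories (deg)."""
    residual = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    mae = float(np.mean(np.abs(residual)))
    q25, q75 = np.percentile(residual, [25, 75])
    return rmse, mae, float(q75 - q25)
```

Reporting the residual IQR alongside RMSE/MAE is useful here because it is robust to the occasional artifacts (e.g., stick–slip events) that can inflate mean-based metrics.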
Repeatability was assessed by analyzing the variability of the reference trajectories and the consistency of the reconstructed angles across repetitions and subjects. The residual spread is minimal at the open-hand starting configuration and increases toward the closed-hand configuration, which is consistent with the alignment procedure at the starting point and with residual sources of variability such as micro-movements of the hand and occasional stick–slip of the mechanism, despite filtering.
MoCap recordings, used to validate the learned kinematic relationships, were collected in separate sessions; five volunteers were asked to perform repetitions of closing–opening of the hand. The collected data were then mapped in relation to the normalized actuation angle to allow a comparison with the encoder-based reconstructions (Figure 11). Specifically, the encoder and MoCap recordings were subject to event-based alignment; hence, participants were asked to hold the open and closed positions longer than the rest of the movements, so that the open and closed configurations could be used as the endpoints of the normalized actuation range. MoCap was used as an independent plausibility check of the reconstructed kinematics (range) rather than a synchronized point-wise ground truth. Agreement was quantified using MAE and IQR coverage, which are robust to marker noise and inter-subject variability.
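The event-based normalization onto the actuation range can be sketched as follows (a simplified Python illustration in which the held open and closed plateaus are approximated by the trajectory extrema, an assumption made here for brevity):

```python
import numpy as np

def normalize_actuation(theta, open_val=None, closed_val=None):
    """Map an actuation trajectory onto [0, 1].

    The held open and closed configurations serve as alignment events;
    if their plateau values are not supplied, the trajectory minimum and
    maximum are used as a simplifying stand-in.
    """
    theta = np.asarray(theta, dtype=float)
    lo = theta.min() if open_val is None else open_val
    hi = theta.max() if closed_val is None else closed_val
    return (theta - lo) / (hi - lo)
```

After this normalization, the encoder-based and MoCap-based trajectories share a common abscissa, so their ranges can be compared even though the two recordings are not sample-synchronized.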
Finally, the Unity serious game was updated with the identified MCP and PIP mapping functions, confirming that these results support the use of the group-level polynomial as a compact representation for real-time reconstruction.
4. Discussion and Conclusions
This work proposed a practical, two-stage workflow to reconstruct finger joint kinematics from the single on-board encoder signal of a 1-DoF FM of the DIEF-HES. During the first phase, the permanent on-board encoder and two temporary encoders were used to obtain reference MCP and PIP flexion–extension trajectories and to identify third-order polynomial mappings driven by the rear crank angle, the single variable that describes the 1-DoF FM kinematics. In the operational phase, the temporary sensors can be removed and the virtual finger motion can be rendered in real time using only the permanent encoder, thus reducing instrumentation, setup time, and overall user burden in rehabilitation-oriented HRI scenarios.
Experimental results on five healthy subjects showed that the learned mappings provide a stable and visually plausible reconstruction over repeated opening–closing cycles. Against the encoder-based reference trajectories, the approach achieved RMSE and MAE values compatible with the target application for both the MCP and PIP angles (Table 1). Residual trends indicated that errors are not uniformly distributed along the actuation range, with larger deviations appearing away from the starting posture, consistent with increased variability during the motion. An additional validation was performed using independent MoCap recordings as a plausibility check of the reconstructed kinematic ranges. In this setting, agreement remained satisfactory, with MoCap-based MAE and IQR coverage values for both joints supporting the consistency of the reconstructed motion within typical inter-subject dispersion.
Overall, the proposed method enables joint angle estimation suitable for interactive rehabilitation applications where the primary requirement is robust, repeatable, and realistic motion rendering (e.g., in a Unity serious game with average errors around 9°) [34,35,36,37], rather than clinical-grade goniometric accuracy (which demands substantially smaller average errors) [38,39]. Future work will (i) extend the evaluation to a larger group and to target users with motor impairments and (ii) generalize the approach to additional degrees of freedom and more complex grasp patterns, enabling richer HRI behaviors while preserving minimal sensing at run time.