Analysis of Upper-Limb and Trunk Kinematic Variability: Accuracy and Reliability of an RGB-D Sensor

In the field of motion analysis, the gold standard devices are marker-based tracking systems. Despite being very accurate, their cost, stringent working environments, and long preparation time make them unsuitable for small clinics as well as for other scenarios such as industrial applications. Since human-centered approaches have been promoted even outside clinical environments, the need for easy-to-use solutions to track human motion is topical. In this context, cost-effective devices, such as RGB-Depth (RGB-D) cameras, have been proposed, aiming at a user-centered evaluation in rehabilitation or of workers in industrial environments. In this paper, we aimed at comparing marker-based systems and RGB-D cameras for tracking human motion. We used a Vicon system (Vicon Motion Systems, Oxford, UK) as a gold standard for the analysis of accuracy and reliability of the Kinect V2 (Microsoft, Redmond, WA, USA) in a variety of gestures in the upper-limb workspace, targeting rehabilitation and working applications. The comparison was performed on a group of 15 healthy adult subjects. Each subject had to perform two types of upper-limb movements (point-to-point and exploration) in three workspace sectors (central, right, and left) that might be explored in rehabilitation and industrial working scenarios. The protocol was conceived to test a wide range of the field of view of the RGB-D device. Our results, detailed in the paper, suggest that RGB-D sensors are adequate to track the upper limb for biomechanical assessments, even though relevant limitations can be found in the assessment and reliability of some specific degrees of freedom and gestures with respect to marker-based systems.


Introduction
Human motion analysis (HMA) is a domain of biomechanics that aims at quantitatively describing human movement, which is traditionally acquired by means of motion tracking technologies. Applications of HMA span many areas, ranging from clinics and motor rehabilitation to the industrial and entertainment fields [1]. In the area of human motor rehabilitation, the data obtained from motion tracking are used for medical treatment plans and evaluations [2]. The most accurate standard systems for this field are marker-based optoelectronic tracking equipment. Placing the markers on anatomical landmarks on the subject's body, such systems allow a precise reconstruction of correlation between the gait parameters registered with a marker-based system and the Kinect device, concluding that it can be used to track gait parameters, although for its use in a clinical environment they suggest that further improvements in its sensitivity are needed. The accuracy of the Kinect V2 in full-body acquisitions, concentrated on walking and balance control, was assessed [52], concluding that the Kinect V2 could achieve a good level of precision for motion analysis. Another work [53] concentrated only on some degrees of freedom of the upper limb. The authors tested the accuracy of the sensor and found that the RGB-D sensor could approximate the results of the marker-based system, even though the range of movements analyzed was quite limited. A comprehensive assessment [54] analyzed in detail the performance of the Kinect with respect to the Vicon system. The authors concluded that the system is suitable for motion tracking, but the analyzed degrees of freedom were associated with specific gestures conceived to emphasize each specific motor primitive, which is a favorable condition for tracking.
In general, we found a limited number of studies that assessed the validity of RGB-D sensors for tracking upper-limb movement in multiple workspace sectors resembling the variety of possible tasks during daily-life or working activities. In particular, previous studies in this field concentrated mainly on paradigmatic rehabilitation tasks, gait, or functional movements, rather than mapping the variety of upper-limb gestures typical of working and real-life applications. Consequently, given that RGB-D sensors can promote a human-centered perspective in the context of motor evaluation, in this pilot study we implemented a comparison between the Kinect V2 (a well-known RGB-D sensor widely used for motion tracking) and an optoelectronic marker-based Vicon system (as gold standard). The aim was to evaluate the accuracy and reliability of upper-limb and trunk movement tracking in an extended simulation of gestures typical of working environments, addressing in particular novel applications related to the scaling-up of rehabilitation assessments to the industrial field.

Equipment
The tests were performed in the motion acquisition laboratory of the Italian Council of National Research (CNR) in Lecco, Italy. The laboratory was equipped with:
1. One Vicon Vero system, composed of 10 infrared cameras, and a set of reflective markers for motion tracking to be used with the Vicon system. In this experimental condition, 34 markers were used (25 for the upper-limb model, 9 for the target);
2. One Kinect V2.0 device to track the human body in space. The Kinect uses an RGB-D camera for frame acquisition at a 30 Hz sampling frequency and a time-of-flight infrared camera for depth sensing. Exhaustive details on the Kinect systems can be found in previous works [55,56]. The Kinect was mounted on an easel at about 2.5 m from the recorded scene for best tracking [52];
3. Two general-purpose computers: the first one was connected to the Vicon system and contained the software for the acquisition and pre-processing of the tracking data; the second one contained custom-made software written in C#, which communicated directly with the Kinect V2 device and could generate a file containing the 25 points of interest composing the Kinect SDK skeleton;
4. One 60 cm diameter circular target with 9 points of interest named N, NE, E, SE, S, SW, W, NW, and O. This target was used as a reference for the subjects to execute point-to-point and workspace exploration movements [57].
A schematic of the set-up is portrayed in Figure 1.

Movement Selection
Each subject was asked to execute two types of movements with the right arm. The selected tasks were point-to-point and hand exploration movements [57], commonly adopted in the context of motor rehabilitation. These are gestures extrapolated from daily life activities, and at the same time allow exploring up to the limits of the upper limb available workspace.
The point-to-point movement (Figure 2, top row) is a paradigmatic movement used by clinicians to test the motor capabilities of patients. It was chosen for its wide use in rehabilitation contexts [28]. The movement involves multi-joint coordination and is often used in daily life activities. The subject started from a predefined resting position with the upper arm lying along the body, reached toward a specific point on the circular target, and then returned to the starting position. The first movement performed by each subject was reaching towards the 'O' point, then back to the resting position. Afterwards, the subject pointed to 'NE', returned to the resting position, and then continued the motion pattern clockwise until reaching the 'N' point, and backwards.
The hand exploration task (Figure 2, lower row) is another gesture, conceived to simulate movements on a working surface and object displacement for distal limb coordination. Again, the circular target was used as a reference to drive the subject's movement. The objective of each participant was to move the hand radially across the target, starting from the resting position and going to different points on the circumference. The resting position for this movement was upright with the hand pointing towards the 'O' point. The first movement was towards 'NE', then 'O', and the motion pattern proceeded clockwise until the subject reached 'N', and finally back to 'O'.

Experimental Set-Up
The equipment was placed in the acquisition volume of the Vicon system. The subject stood frontally with respect to the Kinect. In order to test the field of view of the device, the circular target was placed in three different positions: on the right of the subject; on the left; and in front of the participant, horizontally. Regardless of the position of the target, movements were always performed with the right upper limb. For the right and left sectors, the target was placed on a tripod and adjusted so that the 'O' point was at the subject's shoulder height. In the central sector, the target was placed horizontally on a table. The workspace sectors are depicted in Figure 2.
Two general-purpose computers, operating the Vicon and the Kinect, respectively, were positioned outside of the acquisition volume so as not to interfere with the acquisition. The only objects present in the workspace were a table, the target, the two tripods, and the Kinect.

Acquisition
Before each recording session, the subject was equipped in accordance with the upper-limb model designed for the Vicon system. Five markers were placed on the trunk, one on each shoulder, three on each upper arm, two on each elbow, and four on each forearm and wrist (Figure 3), for a total of 25 markers [53]. Meanwhile, the Kinect V2 was left to warm up for about 20 min.
The datasets were acquired in the previously described environment with both the Kinect and the Vicon. Since the software for the two systems ran on different computers, the acquisition of the data was not synchronized during the recordings. We chose to always start the recording with the Vicon and afterwards start the Kinect data stream. At the signal of the operator, the subject started the movement.
The two different acquisition systems provided specific data structures. The Kinect dataset consisted of the Microsoft SDK 2.0 skeleton 3D data from 25 anatomical points (Figure 3). The custom-made software used to interact with the Kinect provided a file readable with MATLAB (Mathworks, Natick, MA, USA). This file contained the 3D coordinates of the SDK skeleton's points with respect to the Kinect's reference system.
The dataset recorded with the Vicon system needed preprocessing in the Nexus virtual environment (Vicon software), including marker tracking and labelling. Afterwards, using the upper-limb model, the software could accurately reconstruct the position of the glenohumeral joint center (shoulder center of rotation), humeroulnar joint center (elbow center of rotation), and radiocarpal joint center (wrist joint center) for both left and right upper limbs, as well as estimate the position of the trunk [58]. This tracking allowed the estimation of 11 degrees of freedom (details in Section 2.6). After preprocessing, the data were saved in c3d file format and imported into MATLAB.
In order to allow test-retest comparison, each acquisition was repeated twice; the second trial was performed about 2 min after the first, similarly to a previous analogous study [27].

Data Analysis
All computed angular data (Vicon and Kinect) were filtered with a low-pass Butterworth filter at 5 Hz [59]. Then, since the two systems acquired data at different sampling frequencies (100 Hz for Vicon; 30 Hz for Kinect) and over different acquisition times (Vicon started about 1 s before Kinect and ended 1 s after Kinect), the two datasets were aligned. The first step was the detection of movement phases: each acquisition was subdivided into sub-movements (nine sub-movements for arm point-to-point forward phases; eight forward phases and eight backward phases for workspace exploration). The segmentation into movement phases was achieved as follows. The curvilinear abscissa and velocity profile of the wrist marker (end effector) were computed. Then, synchronization of each phase onset and offset was achieved by means of an algorithm based on thresholding and local minimum detection on the velocity profile. Finally, after onset alignment, the data from each phase (O->NE, NE->O, ...) were resampled to 100 samples each to allow inter-variable, inter-sector, and inter-subject comparisons.
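The segmentation and time-normalization steps described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function names, the single speed threshold, and the use of linear interpolation are assumptions made for illustration, and the local-minimum refinement mentioned in the text is omitted.

```python
import math


def speed_profile(points, fs):
    """Tangential speed of a 3D trajectory sampled at fs Hz (one value per interval)."""
    return [math.dist(p, q) * fs for p, q in zip(points, points[1:])]


def detect_phase(speed, threshold):
    """Return (onset, offset) indices of the first run of samples above threshold.

    Hypothetical simplification: only thresholding is used here.
    """
    onset = next((i for i, v in enumerate(speed) if v > threshold), None)
    if onset is None:
        return None
    offset = onset
    while offset + 1 < len(speed) and speed[offset + 1] > threshold:
        offset += 1
    return onset, offset


def resample(values, n=100):
    """Linearly interpolate a 1D sequence onto n evenly spaced samples."""
    if len(values) < 2 or n < 2:
        raise ValueError("need at least two input and two output samples")
    step = (len(values) - 1) / (n - 1)
    out = []
    for k in range(n):
        x = k * step
        i = min(int(x), len(values) - 2)  # left neighbour, clamped at the end
        frac = x - i
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out
```

Because both the 100 Hz and the 30 Hz phases are mapped onto the same 100-sample axis, the two systems can then be compared sample by sample regardless of the original sampling frequency.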
In order to compare the two systems, the positions of the centers of rotation of the right upper limb were used to compute a set of 11 variables, namely the rotational degrees of freedom (DoF), which described the upper-limb motion. The following degrees of freedom were considered (Figure 4): shoulder elevation, shoulder rotation along the vertical axis, shoulder internal-external rotation, elbow flexion-extension, hand flexion-extension, hand pronation-supination, hand deviation, scapular elevation, trunk torsion, trunk anterior-posterior flexion, and trunk medial-lateral flexion. In Figure 4, the direction of the arrows indicates 'positive' angles. In order to avoid problems of reference system registration, all variables were computed relative to a subject-specific reference system.
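As an illustration of how one such variable can be derived from joint-center positions, the sketch below computes an elbow angle from the shoulder, elbow, and wrist centers. The function name and the convention (0° when the limb is fully extended) are assumptions chosen to match the discussion of the extended posture later in the text; the paper does not give its exact formulas.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def elbow_flexion_deg(shoulder, elbow, wrist):
    """Angle between the upper-arm and forearm vectors, in degrees.

    Assumed convention: 0 deg = fully extended limb.
    """
    upper_arm = _sub(elbow, shoulder)
    forearm = _sub(wrist, elbow)
    norm = math.sqrt(_dot(upper_arm, upper_arm)) * math.sqrt(_dot(forearm, forearm))
    if norm == 0:
        raise ValueError("coincident joint centers")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, _dot(upper_arm, forearm) / norm))
    return math.degrees(math.acos(cos_angle))
```

Since the angle depends only on the relative positions of the three points, it is independent of the global reference frame of either device, which is consistent with the choice of subject-specific references.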

Outcome Measures and Statistical Analysis
For measuring Kinect accuracy, the chosen comparison metric was the angular distance (ad) computed between the corresponding degrees of freedom, phase-by-phase, obtained with the Vicon and the Kinect systems, respectively. The implicit assumption was that the Vicon was the gold standard. First, directional pie-charts illustrating the ad for each of the degrees of freedom considered were generated, accounting for directions and for each of the sectors. For point-to-point movements, the directional pie-charts had eight cardinal directions, referring to the forward phases of the movements; for exploration movements, we also considered the backward phases, and thus the resolution of the pie-charts was doubled. A heatmap was designed to show the magnitude of the ad between Vicon and Kinect.
Then, all the ad data were pooled into three matrices, one matrix per sector, each matrix containing the ad for each DoF and computed phase. These three matrices were used as input for a two-way ANOVA test, with ad compared between degrees of freedom (11 levels) and sectors (3 levels: central, right, and left) as factors, in order to see how each DoF was tracked with the Kinect with respect to the Vicon and to quantify the performance of the RGB-D sensor in each degree of freedom and sector. The p-value (p) is the probability of obtaining test results at least as extreme as the results actually observed, assuming that the null hypothesis is correct. The tests were repeated separately for point-to-point and exploration movements. Post-hoc tests (MATLAB function multcompare) were also implemented to determine which degrees of freedom and/or sectors differed from the others. The post-hoc test was Tukey's honest significant difference (HSD) criterion. This method compares the variables of interest two at a time under the assumption of equal variance and statistical independence. The results offer a pairwise confidence interval of similarity. The level of significance for all statistical tests was 0.05.
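The text does not give a closed-form definition of the ad. As a hedged illustration only, the sketch below assumes it is the mean absolute difference between two time-normalized angle profiles of the same phase; the function name and this definition are assumptions, not the authors' stated formula.

```python
def angular_distance(reference_deg, test_deg):
    """Mean absolute difference (degrees) between two equal-length angle profiles.

    Assumed definition for illustration; the source does not specify the formula.
    """
    if not reference_deg or len(reference_deg) != len(test_deg):
        raise ValueError("profiles must be non-empty and of equal length")
    total = sum(abs(r - t) for r, t in zip(reference_deg, test_deg))
    return total / len(reference_deg)
```

Under this assumption, the same function serves both comparisons described in this section: Vicon versus Kinect for accuracy, and Kinect test versus Kinect retest for repeatability.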
For measuring Kinect repeatability, the chosen comparison metric was the angular distance (ad) computed between the corresponding degrees of freedom, phase-by-phase, obtained with the Kinect system in the test and re-test trials.
To show this, directional, heatmap pie-charts depicting the ad comparing Kinect performance in test and retest conditions were plotted. For point-to-point movements, the directional pie-charts had eight cardinal directions, referred to the forward phases of the movements; for exploration movements, we considered also the backward phases and thus the resolution of the pie-charts was doubled. A heatmap was designed to show the magnitude of the ad between test and retest.
In order to provide a metric for the reliability of the statistical results, a test-retest analysis was also performed, taking advantage of the fact that each acquisition was repeated twice. In order to quantify the reliability through the test-retest datasets, the intraclass correlation coefficient (ICC) between the datasets was chosen [54]. Post-hoc tests (MATLAB function 'multcompare') were also implemented to determine which degrees of freedom and/or sectors differed from the others. The post-hoc test was Tukey's honest significant difference (HSD) criterion. This method compares the variables of interest two at a time under the assumption of equal variance and statistical independence. The results offer a pairwise confidence interval of similarity. The level of significance for all statistical tests was set to 0.05.
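The ICC form used is not specified in the text. The following sketch assumes the two-way mixed-effects, consistency, single-measurement form, often written ICC(3,1), computed from the ANOVA mean squares; the function name and the choice of form are assumptions for illustration.

```python
def icc_consistency(scores):
    """ICC(3,1) for a table with one row per measured item and one column per session.

    Assumed form: (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two rows")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with at least two columns")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_error
    if denom == 0:
        raise ValueError("no variance in the data")
    return (ms_rows - ms_error) / denom
```

With two columns (test and retest), a constant offset between sessions does not lower this coefficient, because the consistency form removes the session effect; an absolute-agreement form would penalize such an offset.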

Marker-Based System vs. RGB-D Sensor
In this section, the Kinect accuracy with respect to the marker-based system and the statistical results obtained for all subjects are presented.
A series of visual representations of the circular target illustrates, through heatmaps, the performance of the Kinect against the Vicon system for each specific DoF, in each sector, and with respect to the directionality of the target. The mean accuracy of the Kinect device (across the two repetitions), averaged across subjects and obtained using the protocol established in this study, is reported. First, the error averaged across all subjects was reported for point-to-point movements in the central sector (Figure 5a). The same result was reported for the right (Figure 5b) and left (Figure 5c) sectors. The same results were also reported for exploration movements for the central sector (Figure 6a).
The results presented in this section are related to the test dataset (the comparison between test and retest is presented in Section 3.2). The two-way ANOVA showed that seven DoF presented an average error lower than 10°. Two DoF presented an average error between 10° and 20°, and one had ad > 20° for both the executed movements (point-to-point and exploration). Furthermore, one DoF had an error greater than 20° and a different mean between point-to-point and exploration movements.
A more in-depth analysis of the arm point-to-point movement revealed that trunk torsion, trunk antero-posterior flexion, trunk medio-lateral flexion, shoulder elevation, shoulder rotation along the vertical axis, elbow extension, and scapular elevation were tracked with an error lower than or equal to 5°. The hand deviation angle and hand flexion-extension were tracked with an error between 5° and 15°. The shoulder internal-external rotation showed a mean error of about 20°. Lastly, the hand pronation-supination angle had a mean error greater than 20°. Furthermore, in the analysis of the point-to-point movement, no statistical difference was found between the mean of shoulder elevation and the means of shoulder rotation along the vertical axis (p = 0.96), scapular elevation (p = 0.99), trunk torsion (p = 0.78), trunk medio-lateral flexion (p = 1), elbow extension (p = 0.59), and trunk antero-posterior flexion (p = 0.61). In contrast, shoulder internal-external rotation, hand pronation-supination, hand flexion-extension, and hand deviation had means that were statistically different from one another and from the other angles (p < 0.001). Considering the comparison between the sectors, we found that the average error in the right sector was smaller than in the other two sectors, ranging between 8° and 9°, and was statistically different from the left and central sectors (p = 0.012 and p = 0.0053, respectively). The left and central sectors had similar means and were not statistically different (p = 0.96); the mean error varied between 9° and 10°.
The same test performed for the exploration movement provided similar results regarding the average variations between the DoF, but with different statistical significance. We could identify some groups. Shoulder elevation, shoulder rotation along the vertical axis, and elbow extension were grouped together; in fact, we could not find a statistical difference with respect to shoulder elevation (shoulder rotation along the vertical axis, p = 1; elbow extension, p = 0.99). These DoF were statistically different from the rest of the variables, with p < 0.01 for all three. Hand pronation-supination and hand flexion-extension were not statistically different from each other (p = 0.87) and were statistically different from all the others (p < 0.001). Scapular elevation, trunk torsion, and trunk medio-lateral flexion were not statistically different when compared with trunk torsion (p = 0.94); meanwhile, the mean of trunk antero-posterior flexion was not statistically different only from that of trunk medio-lateral flexion (p = 0.99), but it was statistically different from trunk torsion (p = 0.05) and scapular elevation (p ≤ 0.001). Lastly, shoulder internal-external rotation and hand pronation-supination had statistically different means when compared with each other (p < 0.0001) and with the others (p < 0.001). In the case of exploration, the inter-sector error was distributed differently: no statistical differences were found among the central (11.13°), right (11.39°), and left (11.18°) sectors. The right and left sectors were not statistically different (p = 0.79). The same could be said for the right and central sectors (p = 0.68) and for the central and left sectors (p = 0.98).

RGB-D Sensor Reliability
In this section, we describe the test-retest analysis as a measure of the reliability of the RGB-D sensor. An overview of the results is available in Tables 1 and 2. A series of visual representations of the circular target illustrates, through heatmaps, the repeatability of the Kinect for each specific DoF in each sector with respect to the directionality of the target. First, the mean angular distance for point-to-point movements in the central sector was reported (Figure 7a). Then, the same result was reported for the right (Figure 7b) and left (Figure 7c) sectors. The same results were also reported for exploration movements for the central sector (Figure 8a). For point-to-point movements, the directional pie-charts had eight cardinal directions, referring to the forward phases of the movements. The graphical representation adopted a heatmap to depict the magnitude of the ad.
Figures 9 and 10 portray a visual representation of the statistical analysis of point-to-point movements and exploration movements, respectively. Furthermore, statistical results for the test and retest datasets were provided. As seen in Figure 9 and as reported with the ICC, for the point-to-point movement the results were consistent both in mean differences and in significance. Figure 10 shows that consistency between the degrees of freedom for the right and central sectors was preserved; meanwhile, the left sector showed a slight difference, although without statistical significance with respect to the other sectors.
Figure 9. Point-to-point movements: two-way ANOVA test results, for test and re-test datasets.
Figure 10. Exploration movements: two-way ANOVA test results, for test and re-test datasets.

Summary of the Results
The RGB-D device was in general able to track upper-limb and trunk motion for the majority of the considered articular angles. However, the ad found in shoulder internal-external rotation and forearm pronation-supination was not negligible. If one can accept this error, the Kinect V2 is useful to track the glenohumeral joint center motion, the humeroulnar joint center motion, and the trunk movements, especially in the right and central sectors while performing the arm point-to-point movement. On the contrary, our results recommend more caution when recording exploration movements, since the mean error per sector was slightly higher than for the point-to-point movement.

Degrees of Freedom
Most degrees of freedom were tracked with an error below 10°. This result is consistent with previous studies [18], in which the variables describing shoulder elevation and elbow extension were tracked with an error around 5°, but in a more constrained scenario (mono-directional reaching movements). A test of accuracy on healthy people, on indexes related to rehabilitation, revealed that most clinical parameters presented an absolute agreement and no systematic bias between RGB-D and marker-based systems [52]. Other similar investigations [53] found that most of the extracted parameters were tracked with an RMSE below 10°, which is comparable to the results obtained through the methodologies illustrated in this work. Other studies reported even more precise results for shoulder tracking but focused on a restricted range of postures [20]. The only degrees of freedom presenting low accuracy were the ones related to the hand. This finding suggests that one should be careful when recording data with RGB-D cameras in a context of high variability of motion or in unfavorable postures. In fact, in the current study, two angles presented a very high error: the shoulder internal-external rotation and the hand pronation-supination. The first one is critical since it is based on the projection of the forearm on the transversal plane of the upper arm. This projection highly depends on the angle of elbow extension, which means that when the angle between the forearm and the upper arm approaches 0°, the shoulder internal-external rotation cannot be computed with high reliability, since the arm is approaching a kinematic singularity. Moreover, in similar configurations, small displacements in joint center reconstruction can produce unpredictable variations in the extracted angle. In this study, this condition is stressed since the limb is often in a fully extended, singular posture (especially in exploration movements).
This is most likely the reason for finding, in general, worse results with respect to previous works [20,54]. It is likely that, in a wide exploration of the workspace, the tracking is worse than in more constrained scenarios.
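The sensitivity near the singular posture can be made concrete with a small numerical sketch. It projects the forearm onto the plane perpendicular to the upper arm and measures its rotation about the upper-arm axis, then shifts the wrist by 1 cm to mimic a reconstruction error. The segment lengths (0.30 m and 0.25 m), the reference direction, the 1 cm displacement, and all function names are hypothetical values chosen for illustration, not quantities taken from the study.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)


def _reject(v, axis):
    """Component of v perpendicular to the unit vector axis."""
    s = _dot(v, axis)
    return tuple(x - s * u for x, u in zip(v, axis))


def axial_rotation_deg(shoulder, elbow, wrist, reference):
    """Signed rotation of the projected forearm about the upper-arm axis."""
    axis = _unit(_sub(elbow, shoulder))
    forearm_proj = _reject(_sub(wrist, elbow), axis)
    reference_proj = _reject(reference, axis)
    sin_term = _dot(_cross(reference_proj, forearm_proj), axis)
    cos_term = _dot(reference_proj, forearm_proj)
    return math.degrees(math.atan2(sin_term, cos_term))


def sensitivity_deg(flexion_deg, noise_m=0.01, forearm_len=0.25):
    """Angle change caused by shifting the wrist by noise_m along y.

    Hypothetical geometry: arm hanging along -z, elbow flexing toward +x,
    0 deg flexion = fully extended limb.
    """
    shoulder, elbow, reference = (0.0, 0.0, 0.0), (0.0, 0.0, -0.30), (1.0, 0.0, 0.0)
    a = math.radians(flexion_deg)
    wrist = (forearm_len * math.sin(a), 0.0, elbow[2] - forearm_len * math.cos(a))
    noisy = (wrist[0], wrist[1] + noise_m, wrist[2])
    clean_angle = axial_rotation_deg(shoulder, elbow, wrist, reference)
    noisy_angle = axial_rotation_deg(shoulder, elbow, noisy, reference)
    return abs(noisy_angle - clean_angle)
```

In this hypothetical geometry, the same 1 cm wrist displacement changes the computed rotation by roughly 2° when the elbow is flexed at 90° and by roughly 25° when it is flexed at only 5°, which illustrates why the estimate degrades as the limb approaches full extension.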
Since the morphology of the reconstructed skeletons is different when using the Vicon and the Kinect V2 [54], the pronation-supination angle is critical too. Using the Kinect SDK skeleton, this variable was computed using the angular variation of a unit vector starting from the wrist and pointing towards the thumb. The angular variation of this vector around the axis of the forearm could be assumed to be the hand pronation-supination. A critical factor that made the extraction of this angle difficult was the poor quality of thumb tracking with the Kinect. Our results are in partial accordance with the findings of a comprehensive previous study [54], revealing that fine movements related to the hand could benefit from ad-hoc hand models, which were not implemented in this study. Moreover, in the assessment of body extremities (hands and feet), previous works also found a lower signal-to-noise ratio [52], leading to poor tracking. Furthermore, the movements we analyzed were very demanding for a reliable computation of hand pronation and supination.
In a more general discussion, a relevant finding was that the tracking performance appears to be better for movements with a wide range of motion (such as point-to-point movements, which were tracked better than hand exploration movements). Arguably, Microsoft's built-in algorithms are better at reconstructing wider movements that have a higher articular range. These results should be taken into account when tracking people in environments with high tracking precision requirements, or where high accuracy is required in fine movements.

Sectors
So far, RGB-D sensors such as Kinect V2 have been tested mainly by considering the performance of the sensor in tracking gestures conceived to emphasize specific degrees of freedom [54], or during rehabilitation-oriented tasks [18,33,36,51]. In this study, our novel contribution was to characterize the performance of the RGB-D sensor in upper-limb movements in a context of continuous variability, which is naturally found in daily-life tasks and working activities [57].
The tracking accuracy across workspace sectors differed between the two analyzed movements. For the point-to-point movement, there was a statistically significant difference for the left and central sectors, in which the tracking performance of the Kinect was poorer than in the right sector. This deterioration in tracking performance was probably due to the fact that, even if the target was positioned to the left or in front of the subject, the movement was performed with the right arm. Consequently, during the execution of the tasks in these conditions, part of the subject's body was hidden by the arm for significant periods of time, forcing the Kinect device to estimate the position of the covered points and probably introducing a further error. This experimental condition is common to RGB-D devices and should be considered one of the main limitations to the adoption of such sensors with respect to marker-based systems. Depending on the context, the use of ad-hoc algorithms might be considered to solve this issue.
For the exploration movement, we found that the performances in the left and central sectors were comparable, while in the right sector they were slightly worse. However, the ICC for the right and central sectors was higher, and thus these two sectors provided more consistent and reliable data. This suggests that the central sector might be the most appropriate one in which to record exploration movements, in order to have the highest reliability and lowest error. These results confirm that, in the presence of obstruction, the use of RGB-D devices can be more critical. However, we still have to underline that the overall difference is limited to less than 3°, even when performances were not statistically repeatable.
Our results might be of interest to the scientific community also for estimating more reliable configurations for accurate gesture recognition in several fields of application [60], and in particular for the use of the upper limb in the contexts of assisted living, clinics, and working environments [38].

RGB-D Sensor: Reliability
The test-retest analysis allowed us to characterize the reliability of the RGB-D sensor. We found that repeatability was quite high in both point-to-point and exploration movements, except for the hand pronation-supination degree of freedom. Our results are in general in line with previous findings. In fact, preliminary studies on healthy people reported that RGB-D sensors offer the same repeatability as marker-based systems [16]. Similar results were found in a comprehensive rehabilitation-oriented study analyzing whole-body movement and especially walking. The authors concluded that repeatability analysis yielded rather similar results for both Kinect V2 and Vicon [52]. The results of the current study agree with previous ones that show acceptable reliability and sensitivity across sessions for many parameters measured by Kinect, for both healthy subjects and stroke patients [61]. All these studies considered gestures strongly coupled with degrees of freedom: the slightly more cautious results achieved in this study are probably related to the choice of a more demanding protocol (choice of gestures, variability, and configurations) that stressed the tracking capability of the sensor on generic movements. No remarkable differences were found in repeatability across sectors, even though, as expected, the left sector showed slightly worse repeatability, probably due to obstruction.
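As a reminder of how test-retest repeatability can be quantified with an intraclass correlation coefficient, a minimal sketch is given below. The ICC form is an assumption: the two-way random-effects, absolute-agreement, single-measurement coefficient, often written ICC(2,1), is used here for illustration, and the numerical values in the example are hypothetical, not data from this study. The function name is likewise illustrative.

```python
def icc_2_1(table):
    """ICC(2,1) for a table with one row per subject and one column per
    session (two-way random effects, absolute agreement, single measure)."""
    n = len(table)        # number of subjects
    k = len(table[0])     # number of sessions
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

if __name__ == "__main__":
    # Hypothetical range-of-motion values (degrees): rows are subjects,
    # columns are the test and retest sessions.
    sessions = [[92.0, 95.0], [110.0, 108.0], [75.0, 80.0], [101.0, 99.0]]
    print(icc_2_1(sessions))
```

Values close to 1 indicate that between-subject differences dominate over session-to-session variation, whereas systematic offsets between sessions or large residual noise lower the coefficient.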

Applications in Real Scenarios
As highlighted in the literature, RGB-D devices can record upper-limb gestures and have already been used in rehabilitative applications for physical assessment or training, such as bi-dimensional movements in the sagittal plane [28], tele-rehabilitation scenarios [62], or even the three-dimensional range of motion of the upper limb [63]. Their use in this domain is a great advantage for clinicians and patients alike. For example, the measured ranges of motion approximate the joint positions accurately enough to provide clinicians with an objective indicator of the patients' well-being. From the patient's point of view, it might make home-based rehabilitative treatments more popular.
On the other hand, industrial applications are oriented towards monitoring of workplace physical occupation, safety, and injury prevention. Movements on a workbench include many simple movements, such as reaching for objects, and more complex ones, such as manipulating items on the table or interacting with machines using simple and controlled movements such as hand-over gestures [48][49][50]. In more general industrial contexts, RGB-D sensors have already been used to study workers' safety in working environments. In this context, we mention that in the framework of the recently started European H2020 research project "Mindbot" [64], RGB-D technologies will be used to monitor workers during interaction with collaborative robots. Moreover, the newly released Azure Kinect [65] enhances the tracking capability of Kinect V2 and provides a more evolved tool to foster the concepts proposed in this paper related to people and workers in living and working environments.

Limitations
The first limitation of this work is the limited number of workspace sectors. Real application scenarios might require the use of other sectors, as shown in other upper-limb studies [57] simulating assembly scenarios, which were excluded in this study due to limitations of the Kinect system tracking algorithms. Also, the number of movements could be expanded to mimic real scenarios, including other upper-limb gestures useful in industry [47,66,67] and the tracking of other body segments such as the legs [52,63]. Including a total body model would make it possible to inspect a larger number of degrees of freedom and compare the performances on full 3D movement. Moreover, other RGB-D solutions can be considered, both in the adopted sensors and in the employed algorithms. For this study, we decided to refer to a well-documented, widely used commercial solution, even though further developments of the proposed concepts should involve the use of other sensors or algorithms for skeleton segmentation.
Moreover, only conditions in which the subjects partially obstruct the tracking were considered; in real scenarios, the environment might include further limitations and occlusions which were not considered in this study. Lastly, in order to provide a more solid statistical analysis, a larger group of subjects might be enrolled.

Conclusions
In this paper, we compared a commercial RGB-D sensor with a gold-standard optoelectronic marker-based system. We found that the RGB-D sensor with its embedded human tracking algorithm could properly approximate the degrees of freedom (DoF) computed with the marker-based system. However, we found that the performance of the RGB-D sensor is not adequate for the detection of some DoF and that 'wider' movements, such as point-to-point, are tracked with greater accuracy than exploration movements. Despite some limitations, we can conclude that the RGB-D sensor is a potential candidate for motion analysis in rehabilitative and industrial environments when marker-based systems cannot be employed.

Conflicts of Interest:
The authors declare no conflict of interest.