Placement Recommendations for Single Kinect-Based Motion Capture System in Unilateral Dynamic Motion Analysis

Low-cost, portable, and easy-to-use Kinect-based systems have achieved great popularity in out-of-the-lab motion analysis. The placement of a Kinect sensor significantly influences its accuracy in measuring kinematic parameters for dynamic tasks. We conducted an experiment to investigate the impact of sensor placement on the accuracy of upper limb kinematics during a typical upper limb functional task, the drinking task. Using a 3D motion capture system as the gold standard, we tested twenty-one Kinect positions combining three distances and seven orientations. Upper limb joint angles, including shoulder flexion/extension, shoulder adduction/abduction, shoulder internal/external rotation, and elbow flexion/extension angles, were calculated via our developed Kinect kinematic model and the UWA kinematic model for the Kinect-based system and the 3D motion capture system, respectively. We extracted the angles at the point of target achieved (PTA). The mean absolute error (MAE) with respect to the standard represents the Kinect-based system's performance. We conducted a two-way repeated-measures ANOVA to explore the impacts of distance and orientation on the MAEs for all upper limb angles. There is a significant main effect for orientation, while the main effect for distance and the interaction effects do not reach statistical significance. Post hoc LSD tests for orientation show that the effect of orientation is joint-dependent and plane-dependent. For a complex task (e.g., drinking) that involves body occlusions, placing a Kinect sensor right in front of a subject is not a good choice. We suggest placing a Kinect sensor at the contralateral side of a subject with an orientation of around 30° to 45° for upper limb functional tasks. For all kinds of dynamic tasks, we put forward the following recommendations for the placement of a Kinect sensor. First, set an optimal sensor position for capture, making sure that all investigated joints are visible during the whole task. Second, sensor placement should avoid body occlusion at the maximum extension. Third, if an optimal location cannot be achieved in an out-of-the-lab environment, researchers could put the Kinect sensor at an optimal orientation by trading off the factor of distance. Last, for those who need to assess the functions of both limbs, the users can relocate the sensor and re-evaluate the other side once they finish evaluating one side of a subject.


Introduction
Three-dimensional (3D) motion analysis is the systematic study of human movement. With the quantification of joint kinematics, augmented by high-end instrumentation, investigators are able to obtain thorough 3D information on the movements of body segments through space and time, including linear and angular displacements, velocities, and accelerations. Prior work suggests that posing at a 45° angle to the Kinect may improve the spatial accuracy for standing trunk flexion, hand clasping, and finger tapping tasks; the 45° angle also gave better accuracy in distinguishing a foot from the floor and determining knee location when the leg was straight [25]. Seo et al. [33] found that measuring the upper limb range of motion was most accurate when the Kinect was elevated 45° in front of the subject and tilted toward the subject during the reaching task. For gait analysis, a Kinect sensor was mostly placed around one meter above the ground and tilted 0° [17,31] to −5° [34] in the horizontal plane, with the sensor placed in front of the subject to provide a frontal plane view. Yeung et al. evaluated the accuracies of two versions of Kinect sensors for measuring kinematics during treadmill walking at five camera viewing angles (from 0° to 90°) [35]. They found that at the frontal viewing angle of 0°, the Kinect v2 sensor captured the hip and knee sagittal angles better than at the other viewing angles [35]. To the best of our knowledge, there are few guidelines or recommendations that thoroughly instruct a user on how to place a Kinect sensor for dynamic tasks, especially for the upper limbs using a single Kinect-based system.
We conducted a series of experiments to evaluate the impact of sensor placement on the accuracy of a single Kinect-based system in measuring upper limb kinematics while performing a typical upper limb functional task, the drinking (or hand-to-mouth) task. Drinking is one of the most common activities of daily living, and the ability to perform drinking and similar tasks reflects movement coordination to some extent. The drinking task is a vital movement for assessing upper limb function and evaluating rehabilitation effects [36][37][38]. Compared with the tasks in previous studies, the drinking task is more complex because it involves movements of multiple joints in multiple planes. More importantly, when performing the drinking task, a subject's upper arm, or part of it, is always occluded by the forearm. Using the drinking task as an example to evaluate the performance of the Kinect sensor can therefore generalize to other upper limb functional tasks that involve complex movements and body occlusions.
The purpose of our study is to establish a sensor placement guideline for single Kinect-based systems in measuring joint kinematics during upper limb functional tasks, represented by the drinking task. The guideline is derived by evaluating the upper limb kinematic accuracy when the sensor is placed at twenty-one positions with different distances and orientations. We also give some general recommendations on the placement of Kinect sensors for motion analysis of all kinds of dynamic tasks.

Subjects
We recruited ten healthy male college students (age: 25.3 ± 3.6 years, height: 176 ± 4.2 cm, mass: 66.3 ± 4.5 kg) who had no upper limb musculoskeletal disorders in the six months prior to the experiment. All subjects volunteered to participate in the study and signed informed consent forms before the experiment. The experiment protocol was approved by the Ethics Committee of the Research Academy of Grand Health at Ningbo University.

Experiment Procedures
We conducted an experiment to compare the accuracy of upper limb joint kinematics measured by a single Kinect-based system when the sensor is placed at different locations. A state-of-the-art 3DMC system (Vicon, Oxford Metrics Ltd., Oxford, UK) was employed to generate the gold standard kinematic data. Each subject performed a series of hand-to-mouth tasks in the biomechanics laboratory of Ningbo University. An experimenter demonstrated the standard hand-to-mouth task for each subject to standardize the movement's posture and speed. The hand-to-mouth (drinking) task represents activities such as eating and reaching the face. Subjects start with the arm in the anatomical position, with the hand beside the body, and end with the hand reaching the mouth (see Figure 1). Any shiny objects, such as watches, were removed from subjects to prevent interference with the Kinect's motion detection. Twenty-one locations were evaluated for the single Kinect system. The locations differed in the distance between the sensor and the subject and in the Kinect's orientation relative to the subject; these two factors are easy to adjust in real-world environments. See Figure 2 for the details of the Kinect placements. We denote the joint angles derived from a Kinect-based system located at a distance of i meters and an orientation of j degrees by K_{i,j}, where i = 1.5, 2.0, 3.0 represents a sensor-to-subject distance of 1.5 m, 2.0 m, or 3.0 m, and j = −60, −45, −30, 0, 30, 45, 60 represents the orientation: negative values lie to the subject's left side (e.g., −60 is 60° to the left), 0 is right in front of the subject, and positive values lie to the subject's right side (e.g., 60 is 60° to the right). The Kinect sensor was placed on a tripod, 1 m above the ground.
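The twenty-one candidate placements are simply the Cartesian product of the three distances and the seven orientations. A minimal sketch (variable names are illustrative, not from the authors' software):

```python
from itertools import product

# Distances (m) and orientations (degrees; negative = subject's left,
# 0 = directly in front, positive = subject's right), as in the protocol.
distances = [1.5, 2.0, 3.0]
orientations = [-60, -45, -30, 0, 30, 45, 60]

# Each placement K_{i,j} corresponds to one (distance, orientation) pair.
placements = [(d, o) for d, o in product(distances, orientations)]
print(len(placements))  # 21 placements in total
```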
We concurrently recorded the 3D coordinates of all reflective markers using the 3DMC system and the coordinates of the skeletal joints using the Kinect v2 system. The coordinates of the reflective markers were recorded via the Nexus software at a sampling frequency of 100 Hz. The coordinates of the Kinect skeleton were recorded by self-developed software based on Kinect SDK 2.0 at a sampling frequency of 30 Hz. Before the experiment, reflective markers were attached to the anatomical landmarks of each subject according to the UWA marker set [39]. Each subject first performed a static trial in the anatomical position; the elbow and wrist markers were then removed for the dynamic trials. Each subject sat in a chair and performed the hand-to-mouth task at least five times for each Kinect position. Subjects were given rest breaks between Kinect relocations.

Data Analysis
Upper limb joint angles, including shoulder flexion/extension, shoulder abduction/adduction, shoulder internal/external rotation, and elbow flexion/extension are calculated for the 3DMC system and the Kinect-based system, respectively.
For the Kinect-based system, upper limb joint angles are calculated from the 3D coordinates of the trunk and upper limb joint centers (see Figure 3, left), including ShoulderRight, ShoulderLeft, SpineShoulder, SpineMid, ElbowRight, and WristRight, derived from the Kinect v2 SDK. The 3D coordinates are pre-processed by a zero-lag fourth-order Butterworth low-pass filter with a cut-off frequency of 6 Hz. According to the recommendation of R. Bartlett, 4 to 8 Hz are often used as cut-off frequencies when low-pass filtering movement data [40]. We carried out a series of residual analyses on the raw data using cut-off frequencies of 4, 5, 6, 7, and 8 Hz, respectively, and selected 6 Hz because it yields the best result in our task and has been validated for upper limb function assessment [12]. Local segment coordinate (LSC) systems of the torso and upper arm (taking the right arm as an example) are defined in Table 1. The Kinect kinematic model Φ calculates three Euler angles for the shoulder joint in the order flexion (+)/extension (−), adduction (+)/abduction (−), and internal (+)/external (−) rotation, denoted as α_FE, α_AA, and α_IE. The elbow flexion angle, denoted as α_EFE, is calculated trigonometrically from the position coordinates of ShoulderRight, ElbowRight, and WristRight. The model Φ was developed using Matlab 2018a. For the 3DMC system, upper limb joint angles are calculated using the UWA kinematic model [39]. The UWA marker set includes 18 markers (see Figure 3, right). The trunk, upper arm, and forearm segments are defined based on these markers. The shoulder joint center is determined by the posterior shoulder (PSH) and anterior shoulder (ASH) markers, the elbow joint center by the medial (EM) and lateral (EL) elbow epicondyle markers, and the wrist joint center by the ulnar styloid (US) and radial styloid (RS) markers.
We used the calibrated anatomical systems technique [39] to establish the motions of the anatomical landmarks with respect to the coordinate systems of the upper-arm cluster (PUA) and the forearm cluster (DUA). Thus, the motion of the upper-limb landmarks can be reconstructed from their constant relative positions in the upper-arm technical coordinate system. The shoulder joint angles, including flexion (+)/extension (−) β_FE, adduction (+)/abduction (−) β_AA, and internal (+)/external (−) rotation β_IE, as well as the elbow flexion (+)/extension (−) angle β_EFE, serve as the gold standard.

Statistical Analysis
We extracted the angle at the point of target achieved (PTA) from the shoulder and elbow angular waveforms for both the Kinect and the 3DMC system, denoted as K_Φ and K_Γ, respectively. The mean absolute error (MAE), E_{Φ,Γ} = |K_Φ − K_Γ|, represents the accuracy of the Kinect-based system.
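Averaged over repeated trials, the error measure is a standard paired MAE. A minimal sketch (the trial values below are made up for illustration):

```python
def mean_absolute_error(kinect_angles, reference_angles):
    """MAE between Kinect-derived and 3DMC-derived PTA angles (degrees)."""
    if len(kinect_angles) != len(reference_angles):
        raise ValueError("paired samples required")
    return sum(abs(k - r) for k, r in zip(kinect_angles, reference_angles)) / len(kinect_angles)

# Hypothetical PTA angles for five drinking trials (degrees)
kinect = [80.1, 82.3, 79.5, 81.0, 80.6]
vicon = [78.0, 84.0, 80.0, 79.5, 81.1]
print(mean_absolute_error(kinect, vicon))  # mean of per-trial |K_phi - K_gamma|
```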
The Shapiro-Wilk test was performed to test the normality of E_{Φ,Γ} for all upper limb angles at the 21 Kinect positions. A two-way repeated-measures ANOVA was conducted to explore the impacts of distance (1.5 m, 2 m, and 3 m) and orientation (see the details of the seven orientations in Section 2.3) on the MAEs. A statistically significant difference is accepted at p < 0.05. Eta squared (η²) is used as the measure of effect size; η² values of 0.01, 0.06, and 0.14 correspond to small, moderate, and large effects, respectively [41]. For those factors that reach statistical significance, post hoc analysis is applied to show where the difference occurs.
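The η² effect size is the ratio of the effect's sum of squares to the total sum of squares. The sketch below illustrates the definition for a simple one-way layout with toy data; the study itself used a two-way repeated-measures design computed in a statistics package:

```python
def eta_squared(groups):
    """Eta squared (SS_between / SS_total) for a one-way layout.

    Illustrates the effect-size definition only, on a one-way example.
    """
    all_vals = [x for g in groups for x in g]
    grand = sum(all_vals) / len(all_vals)
    # Between-group sum of squares: group sizes times squared mean offsets
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Total sum of squares around the grand mean
    ss_total = sum((x - grand) ** 2 for x in all_vals)
    return ss_between / ss_total

# Toy data: two conditions, three observations each.
# Benchmarks from [41]: ~0.01 small, ~0.06 moderate, ~0.14 large.
print(eta_squared([[1, 2, 3], [4, 5, 6]]))
```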

Results
Table 2 presents descriptive statistics of the mean absolute error (MAE) of the upper limb joint angles between the Kinect-based system and the gold standard system when the Kinect sensor is placed at different distances and orientations with respect to the subject. Table 2 also shows the result of the two-way repeated-measures ANOVA, which explores the impact of distance and orientation on the accuracy of the upper limb kinematic measurements by the Kinect-based system.
As the main effect for orientation reached significance for all four upper limb angles, post hoc comparisons using the LSD test were applied to investigate the performance of the Kinect at the seven orientations. The results are visualized in Figure 4. It is clear that the kinematic measurement performance differs substantially when the Kinect is placed at different orientations, and the impact of orientation differs across the four joint angles we investigated.
In terms of the shoulder flexion/extension angle (see Figure 4, upper left, and Table 2), the error at the orientation 0°, denoted by K_0, is significantly different from that at the other six orientations, i.e., 30°, 45°, and 60° to the left and right sides of the subject, denoted by K_{−30}, K_{−45}, K_{−60}, K_{30}, K_{45}, and K_{60}, respectively. The Kinect placed at 0° has the largest error of all orientations, deviating more than 10° further from the standard. There are no significant differences between the MAEs of the Kinect on one side of the subject, either left (p = 0.53 ∼ 0.99) or right (p = 0.46 ∼ 0.99).
The sagittal plane angle of the elbow joint, the elbow flexion/extension angle, is shown in Figure 4 (bottom right) and Table 2. In terms of the shoulder angle in the frontal plane, the shoulder abduction/adduction angle (see Figure 4, upper right, and Table 2), the MAE pattern is quite different from the angles in the sagittal plane. The MAE is the smallest (2.85°) when the Kinect is placed in front of the subject. The MAE is the largest at 60° to the right side and differs significantly from 30° to the left (p = 0.01), in front of the subject (p < 0.01), and 45° to the right (p = 0.03). The MAEs on the left side (K_{−60,−45,−30} = 4.80° ∼ 6.74°) are smaller than those on the right side (K_{0,30,45,60} = 6.05° ∼ 9.38°).
The shoulder angle in the transverse plane, the shoulder internal/external rotation angle, is shown in Figure 4 (bottom left) and Table 2.

Discussion
We investigated the accuracy of a single Kinect-based system in measuring upper limb kinematics during a typical upper limb functional task when the Kinect sensor is placed at different distances and orientations. The upper limb joint angles, including the shoulder flexion/extension, shoulder abduction/adduction, shoulder internal/external rotation, and elbow flexion/extension angles, were simultaneously measured by a Kinect v2 sensor and a standard 3D motion capture (3DMC) system. We evaluated the performance of the Kinect sensor at twenty-one positions via the mean absolute error (MAE) of each angle between the Kinect and the 3DMC system. We aim to establish a Kinect placement guideline for users assessing upper limb functional tasks, and to summarize some general recommendations on the placement of a single Kinect sensor for all dynamic functional tasks.

Effects of the Kinect Placement
Our study finds that, within the sensor's effective capture space, the distance between the subject and the sensor does not influence the kinematic accuracy for the shoulder and elbow joints in any degree of freedom: the MAEs between the Kinect sensor and the gold standard show no significant differences among the distances of 1.5 m, 2 m, and 3 m for the drinking task. Dutta [3] investigated the feasibility of the Kinect as a 3D motion capture system in the workspace by comparing the location accuracy of a simplified four-cube system with a Vicon system. In Dutta's study, the root-mean-squared errors between the two systems were around 0.0065 m, 0.0109 m, and 0.0057 m along the x, y, and z axes (to the right, away from the sensor, and upward), respectively. The accuracy was at the same level for most positions over the range of 1.0 m to 3.6 m, with the worst accuracy at the margin of the effective field of view in all three planes [3]. Although upper limb functional tasks are more complex than Dutta's simplified system, both our study and Dutta's showed similar results, indicating that, except near the margin of the Kinect's effective field of view, the distance between the sensor and the subject is not a sensitive factor for the accuracy of upper limb kinematics using a single Kinect-based system. The Kinect shows smaller MAEs when placed on the contralateral side of the investigated limb for our upper limb functional task. In our study, the subjects performed a right-side drinking task, which means that the subject's left side is the contralateral side.
Our results (see Table 2, Figure 4) reveal that, except for the shoulder flexion/extension angle, placing the Kinect on the left side of the subject yielded lower kinematic errors than the right side. However, in a computer use task, which involves no upper limb occlusion, Xu et al. found that placing the Kinect sensor in front of the subject is more accurate for measuring shoulder kinematics than placing the sensor 15° or 30° to the left [32]. In another upper limb joint angle measurement task using a Kinect sensor, the subjects were asked to lift their right arm and point their index finger at targets [33]. For such a reaching task, with little elbow flexion/extension and little occlusion between the upper arm and forearm, the optimal placement of the Kinect sensor is quite different from that of our study: the Kinect sensor in front of the subject, elevated 45°, showed the least error in measuring the upper limb range of motion [33].
The Kinect sensor is usually placed in front of the subject with zero or less than 5° of tilt in the horizontal plane for gait analysis and static posture assessments [17,31,34]. Compared with these applications, kinematic assessment of the upper limbs using Kinect sensors is far more challenging [42]. Upper limb functional activities show larger variations of execution in the healthy population (as opposed to the stereotyped gait pattern), related to less stringent task accomplishment and the higher degrees of freedom in the upper limb [42]. The upper limbs, especially the shoulder joint, have a very large working range in comparison with the lower limbs. Furthermore, the upper limb joints can easily occlude each other. The placement of the Kinect sensor is, therefore, easier for gait analysis and static posture assessments than for upper limb tasks.
There is no single optimal position for a Kinect sensor in measuring human postures: the optimal Kinect placement is task-dependent and should be investigated carefully for each functional task, especially those with body occlusions. For tasks with little body occlusion, such as gait analysis, static movement assessment, or simple upper limb tasks, the Kinect sensor can be placed in front of the subject, with a small tilt angle or some degrees of elevation, depending on the task. For tasks involving body occlusions, the placement of the Kinect sensor is more challenging and vital, and researchers must carefully evaluate it to ensure the system's optimal performance.
The original Kinect uses structured-light (SL) technology (denoted as Kinect_SL) to obtain depth information. The Kinect_SL uses a low number of patterns to obtain a depth estimate of the scenery at a relatively high sampling frequency (around 30 frames per second) [28]. It is based on the standard structured-light principle and uses simple triangulation to compute depth from the projected pattern seen by the near-infrared (NIR) camera and the reference pattern stored on the unit.
The updated Kinect sensors are based on the time-of-flight principle (denoted as Kinect_ToF) [28,43]. The Kinect_ToF measures the travel time of light emitted by an illumination unit using the continuous wave (CW) intensity modulation approach [45], and it has an ambiguous measurement range due to the periodicity of the modulated signal. The Kinect_ToF is more robust and accurate in tracking human pose [44] and is a better choice for depth-stereo applications [28]. Compared with the Kinect_SL, the human skeleton of the Kinect_ToF is more anthropometric, with smaller offsets of the skeleton joints, better overall accuracy of joint positions, and more reliable tracking even with partial body occlusions [44].
The Kinect_ToF suffers from various artifacts caused by light reflection [45]. Besides the systematic errors of the range sensors, the sensors also suffer from errors from the "multi-path effect" and dynamic scenery [45]. The average depth accuracy is under 2 mm in the central viewing cone and degrades to 2-4 mm at ranges up to 3.5 m. The maximal range captured by the Kinect_ToF is around 4.0 m, where the average error increases beyond 4 mm [44]. Within the effective field of view, the Kinect sensor should therefore be placed as near to the subject as possible, although in our study the upper limb kinematics were not significantly influenced by the distance factor.

Recommendations of the Kinect Placement for Dynamics Tasks
We propose the following recommendations for placing Kinect sensors when assessing the kinematics of dynamic tasks. First, researchers should set an optimal effective capture space, one that provides the maximal volume of view and ensures all investigated joints can be robustly captured, by setting the height and tilt angle of the sensor as well as its orientation relative to the subject. The Kinect combines a depth sensor and a color camera whose fields of view differ; the effective capture space should therefore account for both. Second, the placement of the Kinect should ensure minimal body occlusion during the entire movement. For motions such as static posture assessment, gait analysis, or dynamic balance tests, there is little body occlusion throughout the task, and the users only need to ensure that the subjects remain within the effective capture space during the investigated movements. For upper limb functional tasks such as drinking, however, body occlusions occur when the subject flexes the elbow or the hand reaches the mouth; the depth information and color images of the upper arm deviate because of the occlusion, which degrades the accuracy of the kinematic measures.
Third, the out-of-the-lab environment is more challenging for a 3D motion capture system than the traditional laboratory due to limited space, complex layout, or the application scenario, and the optimal location of the Kinect sensor sometimes cannot be achieved. In that case, the Kinect location should be chosen by trading off several key factors, such as distance and orientation. The orientations in our research are relative to the sagittal plane of the subject: the most common orientation, i.e., right in front of and facing toward the subject, is defined as 0°, and orientations to the right and left sides of the subject are defined as positive and negative, respectively (see Figure 2). From our study, we found that, within the effective field of view, the distance factor of the Kinect location has less influence on the kinematic measurements than the orientation factor; therefore, in a challenging scenario, the most important property of the Kinect location is that it ensures the optimal orientation.
Lastly, for functional tasks involving bilateral limbs, we could not find a single location at which a single depth-sensor-based system has good accuracy for both the left and right sides. If users need to assess the kinematics of both sides, they can relocate the sensor and re-evaluate the other side once they finish evaluating one side of a subject.

Conclusions
The placement of a Kinect sensor is of great importance for assessing joint kinematics during dynamic tasks using a single Kinect-based system. We found that placing a Kinect sensor right in front of a subject is not a good choice for complex upper limb tasks (such as the drinking task) that involve body occlusions. We suggest placing a Kinect sensor at the contralateral side of a subject with an orientation of around 30° to 45° for upper limb functional tasks. For all kinds of dynamic tasks, we put forward the following recommendations on the placement of a Kinect sensor. First, set an optimal location for capture, making sure that all investigated joints are visible during the whole task. Second, sensor placement should avoid body occlusion at the maximum extension by setting the height and tilt angle of the sensor as well as its orientation relative to the investigated subject. Third, if an optimal location cannot be achieved in an out-of-the-lab environment, researchers could put the Kinect sensor at an optimal orientation by trading off the factor of distance. Last, for those who need to assess the functions of both limbs, the users can relocate the sensor and re-evaluate the other side once they finish evaluating one side of a subject.

Informed Consent Statement: Informed consent was obtained from all subjects involved in this study.
Data Availability Statement: The data are available from the corresponding author upon request.

Acknowledgments:
We give our sincere thanks to Junmin Teng for her kind help in subject recruitment and in conducting the experiment.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

3D      Three-dimensional
3DMC    Three-dimensional motion capture system
RGB-D   A depth sensor combined with an RGB color camera