The Validity and Reliability of the Microsoft Kinect for Measuring Trunk Compensation during Reaching

Compensatory movements at the trunk are commonly utilized during reaching by persons with motor impairments due to neurological injury such as stroke. Recent low-cost motion sensors may be able to measure trunk compensation, but their validity and reliability for this application are unknown. The purpose of this study was to compare the first (K1) and second (K2) generations of the Microsoft Kinect to a video motion capture system (VMC) for measuring trunk compensation during reaching. Healthy participants (n = 5) performed reaching movements designed to simulate trunk compensation in three different directions and on two different days while being measured by all three sensors simultaneously. Kinematic variables related to reaching range of motion (ROM), planar reach distance, trunk flexion and lateral flexion, shoulder flexion and abduction, and elbow flexion were calculated. Validity and reliability were analyzed using repeated-measures ANOVA, paired t-tests, Pearson’s correlations, and Bland-Altman limits of agreement. Results show that the K2 was closer in magnitude to the VMC, more valid, and more reliable for measuring trunk flexion and lateral flexion during extended reaches than the K1. Both sensors were highly valid and reliable for reaching ROM, planar reach distance, and elbow flexion for all conditions. Results for shoulder flexion and abduction were mixed. The K2 was more valid and reliable for measuring trunk compensation during reaching and therefore might be prioritized for future development applications. Future analyses should include a more heterogeneous clinical population such as persons with chronic hemiparetic stroke.


Introduction
Upper extremity (UE) motor impairments are highly prevalent in many clinical populations such as stroke [1]. Impaired UE movement is frequently accompanied by compensatory strategies that help a person adapt to limitations in motor function but may impact recovery and cause negative effects if used long term [2][3][4]. There are numerous well-researched, standardized assessments that measure UE abilities according to factors such as speed, strength, range of motion (ROM), and movement quality, but few that directly measure the amount of compensation utilized during task performance [5][6][7]. Without objective measurement and subsequent intervention, continued compensatory movements can reduce the amount of task-driven neuroplastic change achieved following neurologic injury and ultimately contribute to maladaptive plasticity, learned disuse or non-use, and chronic pain or injury [2][3][4]. Objective assessment of targeted and compensatory UE movements often relies on video motion capture cameras (VMC) or electromagnetic sensors that, while extremely accurate, are typically expensive and not feasible for application in a clinical setting. Because the amount of motor recovery achieved, and inversely the amount of compensation used, is highly predictive of participation and

Participants
A convenience sample of five healthy participants (3 women and 2 men, mean age 24.8 years) was recruited for this study. A small sample size was considered acceptable given the large number of reaches (240 repetitions) performed by each participant and the study's focus on comparing repeatable reaching motions across sensors and testing days. All participants gave informed written consent and the study protocol was approved by the university's Institutional Review Board.

Hardware
Both the K1 and K2 combine standard red-green-blue (RGB) video and an infrared (IR) depth sensor with advanced pattern recognition algorithms to provide full-body, three-dimensional (3D) skeletal motion capture without the use of wearable trackers. Both sensors provide data at approximately 30 frames per second (fps), but the K2 generally boasts improved hardware compared to the K1 (Table A1) [25]. For example, the K2 collects high definition RGB images (1920 × 1080 pixels) while the K1 collects standard definition RGB (640 × 480 pixels) that fails to compete with most modern webcams [25]. The RGB and IR cameras in the K2 also have wider fields of view and, when combined with updated tracking algorithms, can track greater numbers of skeletal landmarks and overall users [25]. Most importantly, the K2 utilizes a time-of-flight algorithm for motion tracking that is more robust, less noisy, and more reliable than the structured light algorithm used by the K1 [25]. The VMC system was considered the gold standard for comparison in this case and consisted of eight IR motion capture cameras (MAC Eagle Digital Cameras, Motion Analysis Corp., Santa Rosa, CA, USA) measuring at 60 fps with a 3D resolution accurate to within one millimeter.

Experimental Procedure
Participants performed a set of targeted reaching movements similar to a previously developed reaching performance task [12,26] while simultaneously being measured by the K1, the K2, and an 8-camera VMC system. Each participant was seated on a stool in the center of the VMC capture volume with the K1 and K2 positioned at a midline distance of approximately 2.0 m and a height of 1.2 m [12]. Each movement set involved reaching towards a target in the sagittal (forward), scaption (45 degree angle), or frontal (lateral) planes at either a non-extended or extended distance. The non-extended distance was defined relative to each participant's anthropometrics as shoulder height and arm's length, while the extended distance was moved 20 cm beyond arm's length (Figure 1). This extended reach required a healthy participant to flex the trunk and displace the shoulder to meet the target, similar to compensatory movements employed for reaching by persons with hemiparetic stroke [23]. Participants were provided verbal instruction but, given that they were healthy participants performing a relatively simple targeted reaching movement, no formal training was provided. On two different testing days, five repetitions were performed within each of four sets for the three directions and two conditions, resulting in a total of 240 repetitions for each of five participants. Given the large number of movements, participants were regularly asked about signs of fatigue and pain. None of the healthy participants reported any pain or fatigue in the UE. Participants were also given short breaks between movement sets (approx. 3-5 min) to mitigate fatigue. These breaks allowed researchers to code and save data files, check for data errors, and double check or adjust experimental setup and procedures.

Data Collection
Kinematic data were collected for the K1 and K2 using the Microsoft Kinect for Windows Software Development Kit (SDK v1.8 and v2.0) [27], a virtual reality peripheral network (VRPN) server [28], and custom software designed in MATLAB (r2012a, Mathworks Inc., Natick, MA, USA). The 3D positions of 11 upper body landmarks for the K1 and K2 were measured relative to each sensor's origin (Figure 2). Common landmarks were head, neck, shoulders, elbows, wrists, and hands. The K1 defined torso as the body centroid, while the K2 defined the torso as a mid-spine landmark. Similar data were simultaneously collected for the VMC system using Motion Analysis software (Cortex, Motion Analysis Corp., Santa Rosa, CA, USA) to measure the positions of 25 retroreflective markers placed on bony landmarks on the participant's upper body. Markers were placed on the top of the head (vertex); C7, T10, L5, and S4 vertebrae; sternal notch; xiphoid process; acromion processes; medial and lateral epicondyles; ulnar and radial styloids; anterior superior iliac spines; dorsal hands; and index fingers. Two redundant markers were placed on the humerus and forearm.

Analysis Procedure
Once collected, Kinect data were filtered (6th order, 6 Hz Butterworth) and used to create body segment vectors including spine (torso-neck), humerus (shoulder-elbow), and forearm (elbow-wrist/hand). VMC data were similarly filtered (6th order, 6 Hz Butterworth), imported into MATLAB, and used to create analogous body segments using marker midpoints and biomechanical conventions [29]. Clinically relevant variables were calculated including reaching ROM, planar reaching distance (sagittal and frontal), shoulder flexion and abduction, trunk flexion and lateral flexion, and elbow flexion. Reaching ROM was defined as the Euclidean distance between the shoulder and the hand, while planar reaching distance was defined as the distance traveled by the hand in the sagittal or frontal plane. Shoulder flexion and abduction were defined as the angle between the humerus and spine in the sagittal and frontal planes, respectively. Trunk flexion and lateral flexion were similarly defined as the angle between the spine and the vertical coordinate axis in the sagittal and frontal planes, respectively. Finally, elbow flexion was defined as the angle between the forearm and the humerus.
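The kinematic definitions above can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code. The axis convention (x lateral, y vertical, z anterior-posterior), the function names, and the choice to measure elbow flexion between the two segment vectors (so that a straight arm gives 0 degrees) are assumptions made only for illustration, and filtering is omitted.

```python
import math

# Assumed axes (not stated in the paper): x = lateral, y = vertical (up),
# z = anterior-posterior. Planes are given as pairs of axis indices.
SAGITTAL = (1, 2)  # y-z plane
FRONTAL = (0, 1)   # x-y plane
VERTICAL = (0.0, 1.0, 0.0)


def segment(proximal, distal):
    """Body segment vector pointing from the proximal to the distal landmark."""
    return tuple(d - p for p, d in zip(proximal, distal))


def angle_deg(u, v):
    """Unsigned angle in degrees between two non-zero vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))


def planar_angle_deg(u, v, plane):
    """Angle between u and v after projecting both onto the given plane."""
    return angle_deg([u[i] for i in plane], [v[i] for i in plane])


def trunk_flexion_deg(torso, neck):
    """Spine (torso-neck) versus the vertical axis, in the sagittal plane."""
    return planar_angle_deg(segment(torso, neck), VERTICAL, SAGITTAL)


def trunk_lateral_flexion_deg(torso, neck):
    """Spine (torso-neck) versus the vertical axis, in the frontal plane."""
    return planar_angle_deg(segment(torso, neck), VERTICAL, FRONTAL)


def elbow_flexion_deg(shoulder, elbow, wrist):
    """Angle between the humerus and forearm segment vectors (assumed convention)."""
    return angle_deg(segment(shoulder, elbow), segment(elbow, wrist))


def reaching_rom(shoulder, hand):
    """Euclidean distance between the shoulder and the hand."""
    return math.dist(shoulder, hand)
```

Under these assumptions, a spine vector leaning equally forward and upward yields a trunk flexion of 45 degrees. Shoulder flexion and abduction would follow the same pattern, with the humerus and spine vectors projected onto the sagittal and frontal planes.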

Statistical Approach
A peak detection algorithm was used to determine the start and stop of each reach in terms of the maximum and minimum distance of the hand from the target. The target's position was not inherently available from the Kinect data, so it was estimated as the average hand position at its maximum Euclidean distance from neutral. The first repetition of each trial was disregarded due to variable starting positions of the arm and hand. A three standard deviation algorithm was used to identify and remove outliers due to motion tracking errors. Validity was investigated using data from the first testing day (D1) to calculate magnitude differences, Pearson's correlations (r), Bland-Altman 95% limits of agreement (LOA), and a repeated measures analysis of variance (ANOVA) with Bonferroni corrections across sensors. Reliability was investigated using averages within each testing day to calculate magnitude differences, intra-class correlations (ICC), Pearson's correlations (r), Bland-Altman 95% LOA, and paired t-tests between days [30,31]. Estimates of correlations in terms of r and ICC were evaluated as excellent (0.75-1), modest (0.4-0.74), or poor (0-0.39) [31]. Bland-Altman analyses for validity (Table A2) and reliability (Table A4) as well as Pearson's correlations for reliability (Table A3) are presented in Appendix A.
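Several of the statistics named above can be sketched in a few lines of Python. The sketch below is illustrative only and is not the authors' analysis code. The function names are hypothetical, the Bland-Altman limits use the common bias ± 1.96 SD formulation, and the handling of values that fall between the published category bounds (and of negative values) is an assumption.

```python
import math
from statistics import mean, stdev


def remove_outliers(values, n_sd=3.0):
    """Keep only values within n_sd sample standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * s]


def pearson_r(x, y):
    """Pearson's correlation coefficient between paired measurements."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(x, y):
    """Return (bias, lower LOA, upper LOA) for paired measurements x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def rate(coefficient):
    """Map r or ICC to the categories used in the text.

    Assumption: thresholds at 0.75 and 0.40, with anything below 0.40
    (including negative values) rated as poor.
    """
    if coefficient >= 0.75:
        return "excellent"
    if coefficient >= 0.40:
        return "modest"
    return "poor"
```

For a validity comparison, x would hold a Kinect measurement (for example, peak trunk flexion per repetition) and y the matching VMC measurement. For reliability, x and y would hold the day 1 and day 2 averages.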

Trunk Compensation
For trunk flexion and trunk lateral flexion, the K2 was closer in magnitude to the VMC than the K1 in all directions and for both non-extended and extended reaches (Table 1). For trunk flexion, when considering Bland-Altman LOA for all movements, the K2 was within −3.5° to 6.6° and the K1 was within −2.7° to 14.2° of the VMC (Table A2). Similarly for trunk lateral flexion, the K2 was within −5.9° to 7.9° and the K1 was within −9.0° to 13.4° of the VMC. Significant differences were found between K2 and VMC for trunk flexion during extended forward reaching and lateral flexion during extended scaption reaching (Table 1). Significant differences were found between K1 and VMC for trunk flexion during all extended reaches and lateral flexion in all conditions but extended lateral reaching.
The K2 was more valid than the K1 for measuring trunk movements during extended reaches (Table 2). The K2 showed excellent agreement with the VMC for measuring trunk flexion (r = 0.77-0.88) and lateral flexion (r = 0.77-0.89) during extended reaches. The K1 showed modest-excellent agreement with the VMC for trunk flexion (r = 0.52-0.78) and modest agreement for lateral flexion (r = 0.50-0.60) during extended reaches. For non-extended reaches, the K2 showed only modest agreement (r = 0.43) for measuring trunk flexion during lateral reaching. All other correlations were poor for both the K1 and K2. Bland-Altman analyses show that mean biases for trunk flexion and lateral flexion were smaller, with narrower LOA, for the K2 than for the K1 when compared to VMC (Table A2).
Reliability results were mixed for all three sensors when measuring the trunk (Table 3). The K2 showed excellent reliability for measuring trunk flexion during lateral reaching (ICC = 0.91), (Table A4).

Upper Extremity Movements
The movement traces for the three planar reaching conditions (i.e., sagittal, scaption, frontal) illustrate directional differences between the Kinects and the VMC (Figure 3). Discrepancies in reaching magnitude between the Kinects and the VMC were dependent on the direction of movement. Differences in reaching ROM and planar distance were greatest during forward reaching, reduced during scaption reaching, and least during lateral reaching (Figure 3). Reaching ROM, planar reach distance, and elbow flexion measurements consistently showed excellent validity for the K2 (r = 0.79-0.99) and modest-excellent validity for the K1 (r = 0.60-0.95) (Table 2). Reliability of these measurements was modest-excellent for all three sensors (Table 3). Validity and reliability of shoulder flexion and abduction measurements varied from poor to excellent for all three sensors (Tables 2 and 3).

Discussion
The purpose of this investigation was to establish the validity and reliability of two versions of the Microsoft Kinect for measuring UE and trunk kinematics during various reaching conditions. Specifically, participants were asked to perform both a non-extended and extended reach in each of three directions (forward, scaption, lateral) while their movements were recorded by the K1, K2, and the gold-standard VMC simultaneously. The K2 measured the trunk more similarly to the VMC as shown by smaller average magnitude differences in trunk flexion and lateral flexion. Validity results for trunk measurement were excellent for the K2 and modest-excellent for the K1 during extended reaching conditions intended to simulate movements that might be used by persons with chronic stroke. Reliability for trunk measurement was modest-excellent for extended reaching with the K1, with the exception of the forward direction, but varied from poor to excellent for the K2. Results for both sensors were generally excellent for measuring arm and hand displacement, excellent for measuring elbow flexion, and mixed for shoulder measurement, with reaches in the scaption and lateral directions providing more valid and reliable results than the forward direction.
The results of this study are supported by previous research that examines the validity of the K1 and K2 in terms of other functional movements. Bonnechere and colleagues [9] found similar results when comparing the K1 to VMC during the performance of four functional movements including shoulder abduction (similar to lateral reaching) and elbow flexion (similar to forward reaching). Clark and colleagues [11] found the K2 to have excellent concurrent validity for measuring trunk movements during dynamic balance tasks and anterior-posterior movements, but poor-moderate validity for static tasks and medial-lateral movements. In the current investigation, the K2 similarly shows the greatest validity for measuring trunk flexion during an extended movement in the anterior-posterior direction. Reither et al. [12] found similar results while measuring the K1, K2, and VMC simultaneously with a single participant reaching forward, reaching to the side, and performing shoulder movements in various planes, but did not investigate trunk kinematics during such movements. In summary, Reither et al. [12] similarly found a greater range in single-day correlations between K1 and VMC (r = 0.31-0.96) than between the K2 and VMC (r = 0.45-0.96) with correlation magnitudes dependent on movement plane. The authors also found varied day-to-day reliability results for both K1 and K2 and, in general, a greater direction-dependent underestimation of kinematics displayed by the K1 [12]. The current study goes beyond the methods of Reither et al. [12] by utilizing an increased sample size of participants and movements, the inclusion of extended reaches to elicit trunk compensations, analysis of the trunk along with the UE, and movements in the scaption plane along with sagittal, frontal, and transverse planes.
We found several low and negative reliability (ICC) values (Table 3), particularly for shoulder flexion, shoulder abduction, trunk flexion, and trunk lateral flexion during non-extended reaching in the forward and scaption directions for all sensors including VMC. Negative ICC values are not ideal and can often be attributed to low between-subjects variance in the phenomenon being measured [32]. Accordingly, these results might be due to small between-day variance in the kinematic variables being tested. For example, a negative ICC value (ICC = −0.53) was calculated for the K2 between days for trunk flexion during the extended forward reach, but Bland-Altman analysis shows a small mean bias (bias = −3.0°) and LOA (LOA = −13.2° to 6.8°). This suggests a relatively small mean difference, and thus satisfactory repeatability, between testing days even in the face of a negative ICC calculation that may be due to small and non-systematic variance. A more heterogeneous clinical population may improve correlation results by increasing variance in the sample. Pearson's correlations (Table A3) and Bland-Altman LOA (Table A4) were included to give a broader picture of absolute and relative reliability for all three sensors. Additional, more advanced analyses may also provide further insight into these discrepancies; for example, dynamic time warping (DTW) is an advanced signal processing technique that could provide a measure of signal match for the time series data collected by the K1 and K2 [33].
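How a homogeneous sample can produce a negative ICC may be easier to see with a small numerical sketch. The paper does not state which ICC form was used, so the one-way random-effects ICC(1,1) below is an assumption, and the data are invented solely for illustration.

```python
from statistics import mean


def icc_oneway(scores):
    """One-way random-effects ICC(1,1) (assumed form, not confirmed by the paper).

    scores: one row per participant, one column per testing day.
    """
    n = len(scores)      # number of participants
    k = len(scores[0])   # number of testing days
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    # Between-participant and within-participant mean squares.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((v - m) ** 2
              for row, m in zip(scores, row_means)
              for v in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Invented data: day 1 and day 2 trunk flexion (degrees) for five participants.
# Day-to-day differences are identical in both sets (about 1-2 degrees).
homogeneous = [[10, 12], [11, 9], [10, 11], [12, 10], [9, 11]]
heterogeneous = [[10, 12], [21, 19], [30, 31], [42, 40], [49, 51]]

icc_low_spread = icc_oneway(homogeneous)     # negative: participants barely differ
icc_high_spread = icc_oneway(heterogeneous)  # close to 1: participants differ widely
```

In both invented data sets the day-to-day differences are the same, so absolute repeatability is unchanged. Only the spread between participants differs, which is why the ICC changes sign while a Bland-Altman analysis would report the same bias and LOA.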
The most notable limitation to this work is the use of healthy participants rather than a sample of participants with hemiparesis. As mentioned previously, persons with hemiparesis reach significantly differently than unimpaired persons, namely with slower movement, less accuracy, impaired interjoint coordination, and increased use of compensatory movement at the trunk [22,23]. Targets placed beyond the reach of healthy participants can elicit a similar compensatory response at the trunk, but persons with hemiparesis exhibit less symmetry and earlier trunk recruitment in comparison [23]. Healthy reaching is simply not the same as hemiparetic reaching. However, the purpose of the current study is to validate the measurement capabilities of the K1 and K2 relative to each other and to a gold-standard VMC system. Numerous referenced studies use healthy participants for sensor validation with intentions for future clinical application [9][10][11][12][13][14][15][16][17]. Healthy participants are more accessible, can perform the large number of required movements without fatigue or pain, and can more readily reproduce movements across trials and testing days for validity and reliability analyses. Given that the ultimate application of this study is implementation for clinical measurement of neurologically impaired populations, the ecological validity of future work would greatly benefit from testing with a more heterogeneous sample of persons with hemiparetic stroke.
The current study provides some insights for the design of such future work; for example, it may be necessary to recruit more individuals and reduce the overall repetitions performed to better capture variability, mitigate fatigue, and enhance the generalizability of results for real-world clinical populations. In addition, the experimental protocol could be adjusted to provide detailed instruction and training for impaired populations to reduce trial variability and enhance the efficiency of data reduction and cleaning. Given that the evidence shows that persons with hemiparetic stroke recruit the trunk earlier and more often than healthy populations [23], it may be necessary to eliminate or reduce the distance of the extended reach to maximize reaching performance and reduce frustration. Finally, given the results of the current study, it may be prudent to focus on the planes of movement best measured by the K1 and K2 due to their hardware constraints (e.g., lateral > scaption > forward).
Other variations in results might be attributed to various study limitations. First, the Kinect SDK uses a tracking algorithm that does not rely on the specific placement of markers on palpable bony landmarks as does the VMC. While this is convenient for users, it has been previously noted as a limitation in the Kinect's ability to accurately measure kinematics of movement due to variable body segment lengths; however, previous studies have developed algorithms through regression that may be able to correct for this during real-time tracking [9]. Second, it was clear through both observation and the relatively high standard deviations attributed to each movement (Table 1) that different strategies were used for reaching by individual participants. No neutral starting point was defined a priori, and some participants returned their arm to their lap between repetitions while others remained in a flexed position. This resulted in large variations in range of motion, namely with elbow flexion. Finally, reliability results varied inconsistently for all three sensors, and it should be noted that, on top of statistical limitations, there are intra-individual differences across trials and across days in each participant's reaching kinematics. Participants were given similar instructions for each trial and testing day, but differences in the repeatability of human movement still exist and may account for the slight variance in between-day correlation and significance testing. Participants were provided verbal instruction but no formal training in the simple reaching movements, so movement may have differed between movement sets and even testing days due to subtle learning effects. It is also possible that the placement of motion capture markers varied slightly between days, resulting in reliability differences. Increasing the overall sample size in the future could mitigate these intra- and inter-individual differences in repeatable movement.
This study shows that the K1 and K2 may serve as useful tools for objectively measuring UE and trunk kinematics, but application may depend on the body segment, joint, and movement plane of interest. Few studies have investigated their relative measurement properties, but both sensors are widely employed as the basis for VR-based interventions for persons with motor impairments including stroke and cerebral palsy [19,21]. Use of such interventions continues to grow along with client interest, professional knowledge, and technological accessibility [34]. The current investigation may inform future VR development, namely the inclusion of real-time measurement of trunk compensation using the K2.

Conclusions
In conclusion, the K1 and K2 have been shown to be valid and reliable for measuring some aspects of UE and trunk kinematics during reaching. In particular, the K2 exhibited slightly better characteristics for measuring the trunk during standard and extended reaching in different directions, and may be recommended over the K1 in future development for purposes of measuring trunk compensation in clinical populations.

Appendix A

Table A1. Comparison of the first-generation Microsoft Kinect V1 (K1) and the second-generation Microsoft Kinect V2 (K2). The K2 boasts improved motion sensing hardware, particularly in resolution, field of view, and sensing algorithms. Adapted from Pagliari and Pinto (2015) [25].