Deep Learning-Based Upper Limb Functional Assessment Using a Single Kinect v2 Sensor.

We develop a deep learning refined kinematic model for accurately assessing upper limb joint angles using a single Kinect v2 sensor. We train a long short-term memory recurrent neural network using a supervised machine learning architecture to compensate for the systematic error of the Kinect kinematic model, taking a marker-based three-dimensional motion capture system (3DMC) as the gold standard. A series of upper limb functional task experiments were conducted, namely hand to the contralateral shoulder, hand to mouth or drinking, combing hair, and hand to back pocket. Our deep learning-based model significantly improves the performance of a single Kinect v2 sensor for all investigated upper limb joint angles across all functional tasks. Using a single Kinect v2 sensor, our deep learning-based model could measure shoulder and elbow flexion/extension waveforms with mean CMCs >0.93 for all tasks, and shoulder adduction/abduction and internal/external rotation waveforms with mean CMCs >0.8 for most tasks. The mean deviations of the angles at the point of target achieved and of the range of motion are under 5° for all investigated joint angles during all functional tasks. Compared with the 3DMC, our presented system is easier to operate and needs less laboratory space.


Introduction
Three-dimensional (3D) kinematic analysis of upper limb functional movement has been widely conducted in many areas. Upper limb kinematic analysis has been employed both in theoretical studies, such as those on the underlying theory of the neuromusculoskeletal system [1][2][3], and in practical settings such as the clinical assessment of motor function, rehabilitation training [4], and ergonomics studies [5,6]. Marker-based 3D motion capture systems (3DMC) [7] have been widely employed in quantitative measurements of upper limb functional tasks. In such a system, 3D motion data is obtained from passive or active markers attached to the anatomical landmarks of participants. These marker-based systems have been confirmed to be valid and reliable in assessing upper limb kinematics [3,8]. However, these systems are not practical for applications in small clinics or home-based assessment, given their expensive hardware, time-consuming experimental procedures, and strict requirements for laboratory space and trained technicians. A markerless motion capture system could be a possible alternative for upper limb functional assessment [9], especially after the introduction of a commercially available, low-cost, and portable device named Kinect (Microsoft, Redmond, WA, USA). The second iteration of the Kinect (denoted as Kinect v2) is capable of tracking real-time 3D motion with its depth image sensor [10] and its human skeleton tracking algorithm.

Various studies have been conducted to improve the accuracy of Kinect v2 in kinematic measurement. One type of solution is model-fitting algorithms. Xu et al. [33] employed a linear regression algorithm between each Kinect-based shoulder joint angle and its 3DMC counterpart. Given the nonlinear relationship between the upper limb joint angle trajectories calculated via Kinect and the 3DMC system, linear regression algorithms have limited ability to improve the kinematic measurement accuracy.
Only shoulder adduction/abduction angles were significantly improved after applying the linear regression algorithms, and the RMSEs between the Kinect sensor and the 3DMC system were around 8.1° and 10.1° for the right and left shoulders, respectively. Kim et al. [36] proposed a post-processing method, a combination of two deep recurrent neural networks (RNNs) and a classic Kalman filter, to correct unnatural tracking movements. This post-processing method only improves the naturalness of the captured joint trajectories; the accuracy remains insufficient for clinical assessment.
Another type of solution is applying marker-tracking technology with the Kinect system. Timmi et al. [37] developed a novel tracking method using Kinect v2 by employing custom-made colored markers and computer vision techniques. The markers, with diameters of 38 mm, were painted using matte acrylic paints. Magenta, green, and blue paints were chosen for the hip, knee, and ankle joint markers, respectively. The centers of the three markers should be placed on a straight line. They evaluated the method against lower limb kinematics over a range of gait speeds and found generally good results. However, the practical use of this kind of system appears limited for two reasons: (1) the marker-tracking Kinect system cannot solve the occlusion issue when performing upper limb functional tasks; (2) the introduction of markers into the Kinect system brings reliability issues from incorrect marker placement and complicates the experimental calibration procedure. Thus, the method is unlikely to provide significant benefits over the skeleton tracker algorithm [34].
Using multi-Kinect fusion systems might be another way to improve the assessment accuracy of the Kinect system, as they can reduce body occlusion and extend the field of view. However, these systems show apparent limitations: (1) it is difficult to set up and calibrate multiple depth cameras; (2) one Kinect sensor is likely to interfere with another. For these reasons, the evidence of improved accuracy is not strong [38].
Given the pros and cons of the existing algorithms, as shown above, we develop a novel deep learning refined kinematic model using a single Kinect v2 sensor for accurately assessing 3D upper limb joint angles. We form a kinematic model to calculate upper limb joint angles from Kinect. For a specific task, we construct a deep neural network to compensate for the systematic error in those joint angles. Such a neural network is trained using joint angles via the Kinect sensor as the input and their 3DMC counterparts as the target. For the 3DMC, a UWA kinematic model [39] is used to calculate 3D upper limb kinematics based on the 3D positions of reflective markers attached to the subjects. A deep neural network is a favorable tool for non-linear fitting, especially when the shape of the underlying model is unknown [40]. The recurrent neural network (RNN) architecture is designed specifically for time series data, which suits our joint angle data very well. Long short-term memory (LSTM) is a state-of-the-art RNN variant [41]. We employ a three-layer LSTM network in our method. See Figure 1 for a brief pipeline of our method.
A series of upper limb functional task experiments were conducted to evaluate the effectiveness of our developed deep learning-based model. The tasks represent a variety of daily functional activities [42]. The hand to the contralateral shoulder task represents activities such as washing the axilla or zipping up a jacket. The hand to mouth task represents eating or reaching the face. The combing hair task represents washing/combing hair or reaching the back of the head. The hand to back pocket task represents reaching the back and perineal care.
3D positions of the reflective markers according to the UWA marker set are recorded using a 3DMC system, while the joint centers extracted from the Kinect skeleton are recorded by a single Kinect v2 sensor. We use a leave-one-subject-out cross-validation protocol to evaluate the performance of our deep learning refined kinematic model. The coefficient of multiple correlation (CMC) and root-mean-squared error (RMSE) are used to evaluate the similarity between the measured joint angle waveforms. Range of motion (ROM) and angles at the point of target achieved (PTA) are extracted as key kinematic parameters. ROM and PTA via both our deep learning refined kinematic model and the kinematic model for the Kinect sensor are statistically compared with those via the 3DMC system. The absolute error and Bland-Altman plots are analyzed for the ROM and PTA via the deep learning refined kinematic model as well as the kinematic model for the Kinect sensor, in comparison with those via the 3DMC system. Our deep learning refined kinematic model significantly improves the performance of upper limb kinematic assessment using a single Kinect v2 sensor for all investigated upper limb joint angles across all functional tasks. At the same time, such an assessment system is easy to calibrate and operate, and its requirements for laboratory space and expertise are easily fulfilled. The system has great potential to be an alternative to the 3DMC system and to be widely used in clinics or other organizations that lack funds, specialists, or laboratory space.


Methods
We denote the kinematic model for Kinect by Φ and the UWA kinematic model for a 3DMC system by Γ. The deep learning refined kinematic model for Kinect v2 is denoted by Φ̂, which is a combination of the model Φ and the trained neural network N. The upper limb kinematics calculated by models Φ and Γ are defined as k Φ and k Γ , respectively. We train a long short-term memory (LSTM) recurrent neural network (RNN) N using a supervised machine learning architecture to compensate for the systematic error of Φ. During the training session, k Φ and k Γ are taken as the input data and the target data, respectively. In the application stage, k Φ is given as the input of N, and the output is our refined upper limb kinematics (defined as k Φ̂ ). See Figure 1 for a schematic illustration.
The UWA kinematic modeling for the 3DMC system and the upper limb kinematic modeling for the Kinect v2 system follow the procedures demonstrated in Figure 2. A standard 3D kinematic modeling procedure [43] includes four steps: setting up a global coordinate system, setting up local segment coordinate systems, calculating the transformation matrix for each investigated segment, and calculating the upper limb kinematics. The 3DMC system and the Kinect v2 sensor capture 3D marker trajectories and record 3D joint trajectories of a participant concurrently while the participant performs upper limb functional tasks.


Upper Limb Kinematic Modeling for Kinect v2
The 3D coordinates of the anatomical landmarks identified from the skeletal model of the Kinect v2 system (see Figure 3) during functional tasks are recorded concurrently with the 3DMC system. Local segment coordinate systems, including Thorax λ and Upper Arm η, are established; each segment is defined relative to the global coordinate system.

The origin of the thorax segment is defined by SpineShoulder (SS). The y-axis of the thorax segment is defined by the unit vector going from SpineMid (SM) to SS (Equation (1)). The z-axis of the thorax segment is defined by the unit vector perpendicular to the y-axis and the vector from ShoulderLeft (SL) to ShoulderRight (SR) (Equation (2)). The x-axis of the thorax segment is defined by the z and y-axes to create a right-hand coordinate system (Equation (3)). The coordinate system of the thorax segment C Φ,λ is then constructed by the x, y and z-axes (Equation (4)).

The origin of the right upper arm segment is the right elbow joint center ElbowRight (ER). The y-axis of the right upper arm segment is defined by the unit vector going from the elbow joint center to the shoulder joint center, ShoulderRight (SR) (Equation (5)). The z-axis of the right upper arm segment is defined by the unit vector perpendicular to the plane formed by the y-axis of the upper arm and the long-axis vector of the forearm, pointing laterally (Equation (6)). The x-axis of the right upper arm segment is defined by the unit vector perpendicular to the z and y-axes, pointing anteriorly (Equation (7)). The coordinate system of the upper arm segment C Φ,η is then constructed by the x, y and z-axes of the segment (Equation (8)).

Our customized upper limb kinematic model for the Kinect v2 system then calculates the three Euler angles (α FE , α AA , α IE ) for shoulder rotations, following the flexion (+)/extension (−), adduction (+)/abduction (−) and internal (+)/external (−) rotation order.
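The coordinate-system construction described above (Equations (1)-(8)) amounts to normalizing and crossing vectors between landmarks. A minimal NumPy sketch follows; since the equations themselves are not reproduced here, the sign conventions are an assumption and may need flipping to match the paper's figures.

```python
import numpy as np

def unit(v):
    """Return v normalized to unit length."""
    return v / np.linalg.norm(v)

def thorax_coords(ss, sm, sl, sr):
    """Thorax coordinate system from Kinect joints SpineShoulder (SS),
    SpineMid (SM), ShoulderLeft (SL) and ShoulderRight (SR),
    following the textual description of Equations (1)-(4)."""
    y = unit(ss - sm)                  # Eq. (1): SM -> SS, pointing upwards
    z = unit(np.cross(sr - sl, y))     # Eq. (2): perpendicular to SL->SR and y
    x = np.cross(y, z)                 # Eq. (3): completes right-hand system
    return np.column_stack([x, y, z])  # Eq. (4): axes as columns of a 3x3 matrix

def upper_arm_coords(er, sr, wr):
    """Right upper-arm coordinate system from ElbowRight (ER),
    ShoulderRight (SR) and WristRight (WR) (Equations (5)-(8))."""
    y = unit(sr - er)                  # Eq. (5): elbow -> shoulder
    forearm = wr - er                  # long axis of the forearm
    z = unit(np.cross(forearm, y))     # Eq. (6): perpendicular to both
    x = np.cross(y, z)                 # Eq. (7): completes right-hand system
    return np.column_stack([x, y, z])  # Eq. (8)
```

By construction each returned matrix is orthonormal with determinant +1, so it can serve directly as a segment rotation matrix.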
The rotation matrix R Φ (λ, η) is obtained via the parent coordinate system C Φ,λ (Equation (4)) and the child coordination system C Φ,η (Equation (8)). Shoulder flexion/extension α FE , adduction/abduction α AA , internal/external rotation α IE angles are calculated by solving the multivariable equations in Equation (9).
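Since Equation (9) is not reproduced here, the angle extraction can only be sketched generically. The snippet below forms the relative rotation from the parent and child coordinate systems and decomposes it with the z-x-y Cardan sequence mentioned later in the paper; which of the three angles maps onto flexion/extension, adduction/abduction and internal/external rotation depends on the segment axis conventions, so this is an illustrative decomposition, not the authors' exact equations.

```python
import numpy as np

def relative_rotation(C_parent, C_child):
    """Rotation of the child segment expressed in the parent frame."""
    return C_parent.T @ C_child

def cardan_zxy(R):
    """Decompose a 3x3 rotation matrix as R = Rz(a) @ Rx(b) @ Ry(c)
    and return the Cardan angles (a, b, c) in radians."""
    b = np.arcsin(R[2, 1])             # rotation about the x-axis
    a = np.arctan2(-R[0, 1], R[1, 1])  # rotation about the z-axis
    c = np.arctan2(-R[2, 0], R[2, 2])  # rotation about the y-axis
    return a, b, c
```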
The elbow flexion/extension angle α EFE is calculated from the position data of ShoulderRight (SR), ElbowRight (ER), and WristRight (WR) using trigonometric functions (Equations (10) and (11)). In Equation (10), V Φ,WE is the unit vector going from the elbow joint center to the wrist joint center. The upper limb kinematics via the Kinect-based system k Φ is formed by the shoulder and elbow joint angles (Equation (11)). The kinematic model for Kinect v2 was developed using MATLAB 2019a.

The angular waveforms between the Kinect v2 sensor and the Vicon system are synchronized during post-processing. The joint angles from both systems are first resampled to 300 Hz using the MATLAB function "interp" and then synchronized using a cross-correlation based shift synchronization technique.
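A sketch of this resampling-and-alignment step in Python, where `scipy.signal.resample_poly` stands in for MATLAB's `interp` (the paper's implementation is in MATLAB, so the exact resampling filter differs):

```python
import numpy as np
from scipy import signal

def synchronize(kinect_angle, vicon_angle, fs_kinect=30, fs_vicon=100, fs_common=300):
    """Resample both angle waveforms to a common rate, estimate the lag
    that maximizes their cross-correlation, and shift them into alignment."""
    k = signal.resample_poly(kinect_angle, fs_common, fs_kinect)
    v = signal.resample_poly(vicon_angle, fs_common, fs_vicon)
    n = min(len(k), len(v))
    k, v = k[:n], v[:n]
    # lag of the cross-correlation peak (means removed first)
    xc = np.correlate(k - k.mean(), v - v.mean(), mode="full")
    lag = int(np.argmax(xc)) - (n - 1)
    if lag > 0:                        # k leads v by `lag` samples
        k = k[lag:]
        v = v[:len(k)]
    elif lag < 0:                      # v leads k
        v = v[-lag:]
        k = k[:len(v)]
    return k, v
```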

UWA Kinematic Modeling via 3D Motion Capture System
The UWA kinematic model Γ for the reference 3DMC system (in this paper we use Vicon, Oxford Metrics Group, Oxford, UK) is based on the 3D trajectories of the reflective markers attached to anatomical positions of each subject according to the UWA upper limb marker set [44]. The UWA marker set includes the seventh cervical vertebra (C7), 10th thoracic vertebra (T10), sternoclavicular notch (CLAV), xyphoid process of the sternum (STRN), posterior shoulder (PSH), anterior shoulder (ASH), elbow medial epicondyle (EM), elbow lateral epicondyle (EL), most caudal-lateral point on the radial styloid (RS), most caudal-medial point on the ulnar styloid (US), a triad of markers affixed to the upper arm (PUA), a triad of markers affixed to the forearm (DUA) and the metacarpal (CAR) (see Figure 4 for the detailed marker setting). The PUA and DUA are positioned in areas that are not largely influenced by soft tissue artifact, according to Campbell et al. [45,46]. Medial and lateral elbow epicondyle markers are removed for the dynamic functional tasks.

A biomechanical model is employed based on the UWA upper limb marker set [39,44]. The coordinates of each marker at each sample point in the global coordinate system are recorded and represented by a three-dimensional vector (x, y, z). Four rigid body segments, namely Thorax, Torso, Upper Arm, and Forearm, are defined based on the anatomical landmark positions following the recommendations of the International Society of Biomechanics (ISB) [47]. In the following equations, the body segments Thorax, Torso, Upper Arm and Forearm are denoted by λ, µ, η, and ψ, respectively. The origin of a segment is denoted by o. The axes of each coordinate system are denoted by x, y and z.
The origin o Γ,λ of the thorax segment is defined as the midpoint between C7 and CLAV. The origin o Γ,µ of the torso segment is defined as the midpoint of T10 and STRN. The y-axis of the thorax coordinate system y Γ,λ is defined by the unit vector going from the midpoint of T10 and STRN to the midpoint of C7 and CLAV, pointing upwards. The z-axis of the thorax coordinate system z Γ,λ is defined by the unit vector perpendicular to the plane defined by T10, C7 and CLAV, pointing laterally. The x-axis of the thorax coordinate system x Γ,λ is defined by the unit vector perpendicular to the plane defined by the y-axis and z-axis to create a right-hand coordinate system. The coordinate system of the thorax segment C Γ,λ is then constructed by its x, y, and z-axes.
The origin o Γ,η of the right upper arm segment is defined by the elbow joint center E, which is the midpoint between EL and EM. The y-axis of the right upper arm segment y Γ,η is defined by the unit vector going from the elbow joint center E to shoulder joint center S, which is the center of PSH, ASH and ACR. The z-axis of the right upper arm segment z Γ,η is defined as the unit vector perpendicular to the plane formed by the y-axis of the upper arm and the long axis vector of the forearm. The x-axis x Γ,η is defined by the y-axis and the z-axis of the right upper arm segment to create a right-hand coordinate system. The coordinate system of the upper arm segment C Γ,η is then constructed by x, y and z-axis of the segment.
The origin o Γ,ψ of the right forearm segment coordinate system is defined by the wrist joint center W, which is the midpoint between RS and US. The y-axis of the right forearm segment coordinate system y Γ,ψ is defined by the unit vector from the wrist joint center W to the elbow joint center E, pointing upwards. The x-axis of the right forearm segment coordinate system x Γ,ψ is defined by the unit vector perpendicular to the plane formed by the y-axis and the vector from US to RS, pointing anteriorly. The z-axis z Γ,ψ is defined by the unit vector perpendicular to the x and y-axes of the right forearm segment, pointing laterally. The coordinate system of the forearm segment C Γ,ψ is then constructed by the x, y and z-axes of the segment.
The calibrated anatomical systems technique [48] is used to establish the motion of anatomical landmarks relative to the coordinate systems of the upper arm cluster (PUA) or the forearm cluster (DUA). The motion of the upper limb landmarks can be reconstructed from their constant relative positions in the technical coordinate system of the cluster. For each sampling time frame, the coordinates of each segment with respect to its proximal segment are transformed by a sequence of three rotations following the z-x-y order.
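The calibrated anatomical systems technique amounts to storing each landmark once in the cluster's technical frame during the static trial, then rebuilding it in the global frame at every dynamic frame from the current cluster pose. A minimal sketch (illustrative helper names, not the authors' code):

```python
import numpy as np

def landmark_in_cluster_frame(p_global, R_static, o_static):
    """Static trial: express a landmark relative to the marker cluster's
    technical coordinate system (rotation R_static, origin o_static)."""
    return R_static.T @ (p_global - o_static)

def reconstruct_landmark(local_pos, R_dynamic, o_dynamic):
    """Dynamic trial: rebuild the landmark in the global frame from the
    current cluster pose, assuming the local offset stays constant."""
    return o_dynamic + R_dynamic @ local_pos
```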
The UWA upper limb kinematic model Γ is developed using the Vicon Bodybuilder software (Oxford Metrics Group). The reference shoulder angles and elbow flexion/extension angle k Γ = [β FE , β AA , β IE , β EFE ] are used as the gold standard to train our deep learning refined model for the Kinect v2 system. We use a fourth-order zero-lag Butterworth low-pass filter with a cut-off frequency of 6 Hz for the UWA model Γ as well as the kinematic model for Kinect Φ. The cut-off frequency follows the recommendation from the literature and is determined by residual analysis for the upper limb tasks [49].

Long Short-Term Memory Neural Network
We construct a recurrent neural network [41] N to refine the upper limb kinematics k Φ = [α FE , α AA , α IE , α EFE ] calculated by the kinematic model for Kinect v2 (see Section 2.1). In order to reduce the systematic error of Φ, the kinematics k Γ = [β FE , β AA , β IE , β EFE ] calculated by the UWA model for the 3DMC system (see Section 2.2) are taken as the target. To adapt the neural network, we assume that k Φ and k Γ are normalized into the range [0, 1].
As shown in Figure 5, our neural network is formed by three long short-term memory (LSTM) layers. The input of our model is a 101-time-step sequence (t = 101), where each time step is a 4-dimensional vector. We use, empirically, 100 neural units in each LSTM cell. The output of the model is also a 101-time-step sequence of 4-dimensional vectors.
To train this model, we let k Φ be the input of the model. The output is denoted by k Φ̂ = [α̂ FE , α̂ AA , α̂ IE , α̂ EFE ]. We calculate the mean square error between k Φ̂ and k Γ as the loss of the model, and then employ the Adam method for optimization [50]. The network is trained with a batch size of 20 and a learning rate of 0.006 for 200 epochs. In application, the upper limb kinematics k Φ calculated by Kinect v2 is taken as the input of the neural network. The output of the neural network, namely k Φ̂ , is our refined upper limb kinematics.
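The architecture and training recipe above can be sketched as follows in PyTorch; the paper does not name its deep learning framework, so the class and function names here are illustrative.

```python
import torch
import torch.nn as nn

class RefinementLSTM(nn.Module):
    """Three stacked LSTM layers (100 units each) mapping a 101-step
    sequence of 4 Kinect joint angles to a refined 101-step sequence."""
    def __init__(self, n_angles=4, hidden=100, layers=3):
        super().__init__()
        self.lstm = nn.LSTM(n_angles, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, n_angles)   # back to 4 angles per time step

    def forward(self, x):                        # x: (batch, 101, 4), scaled to [0, 1]
        h, _ = self.lstm(x)                      # h: (batch, 101, 100)
        return self.out(h)                       # (batch, 101, 4)

def train(model, k_phi, k_gamma, epochs=200, lr=0.006, batch=20):
    """MSE loss between the refined output and the 3DMC target,
    optimized with Adam (batch size 20, learning rate 0.006, 200 epochs)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        perm = torch.randperm(len(k_phi))        # reshuffle trials each epoch
        for i in range(0, len(k_phi), batch):
            idx = perm[i:i + batch]
            opt.zero_grad()
            loss_fn(model(k_phi[idx]), k_gamma[idx]).backward()
            opt.step()
    return model
```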

Subjects
We recruited thirteen healthy male university students (age: 25.3 ± 2.5 years; height: 173.2 ± 4.1 cm; mass: 69.1 ± 6.5 kg). The participants were free of any upper limb neuromusculoskeletal problems or medication use that would affect their upper limb functions. The participants were informed about the basic procedure of the experiment before the test. The experimental protocol was approved by the Research Academy of Grand Health's Ethics Committee at Ningbo University, and all participants provided written informed consent.


Experiment Protocol
We used a concurrent validity design to evaluate our deep learning based upper limb functional assessment system using the Kinect v2 sensor. The 3D anatomical position of the upper limb (take the right side as an example) and trunk were recorded concurrently by a Kinect v2 sensor and a 3DMC system with eight high-speed infrared cameras (Vicon, Oxford Metrics Ltd., Oxford, UK). The Kinect v2 sensor and the 3DMC recorded the position of anatomical landmarks with sampling frequencies of around 30 Hz and 100 Hz, respectively. The Kinect sensor was placed on a tripod, 0.8 meters above the ground, and 2 meters in front of the subject [51].
Optical reflective markers were attached to the anatomical landmarks of each individual following the instructions of the UWA upper limb marker set [39]. A static trial was recorded first, during which the participant stood in the anatomical position. The elbow and wrist markers were removed during the dynamic trials. Four functional tasks, as shown in Figure 6, which represent a variety of daily functional activities [42] and at the same time are important for independent living [52], were performed. The tasks were selected based on previous studies [42,[52][53][54][55] after extensive consultation with clinicians. These tasks are also used in assessment scales such as the Mallet score, which is commonly used for the evaluation of shoulder function [56].

Task 1: Hand to the contralateral shoulder, which represents all activities near the contralateral shoulder such as washing the axilla or zipping up a jacket. Subjects started with the arm in the anatomical position, with the hand hanging beside the body in a relaxed neutral position, and ended with the hand touching the contralateral shoulder (see Figure 6, left).

Task 2: Hand to mouth or drinking, which represents activities such as eating and reaching the face. It begins at the same starting point and ends when the hand reaches the subject's mouth (see Figure 6, middle-left).

Task 3: Combing hair, which represents activities such as reaching the (back of the) head and washing hair. Subjects were instructed to move their hand to the back of their head (see Figure 6, middle-right).

Task 4: Hand to back pocket, which represents reaching the back and perineal care. It begins at the same starting point and ends when the hand is placed on the back pocket (see Figure 6, right).


Leave One Subject Out Cross-Validation
We first calculate the upper limb kinematics k Φ and k Γ using the upper limb kinematic model Φ for the Kinect v2 system and the UWA kinematic model Γ for the reference 3DMC system, respectively. For all four functional tasks, the joint angles are resampled to 101 time steps. Joint angles are represented as 0-100% across the time domain, with 0% being the start and 100% being the finish of the movement. Next, we use leave-one-subject-out cross-validation (LOOCV) (see Figure 7) to evaluate the performance of our proposed deep learning refined upper limb functional assessment model Φ̂ using a single Kinect v2 sensor.
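The time normalization to 101 points (0-100% of the movement) can be sketched with linear interpolation; the paper does not state its interpolation scheme, so this is an assumption.

```python
import numpy as np

def time_normalize(angle, n_points=101):
    """Resample one joint-angle trial to n_points samples spanning
    0-100% of the movement duration."""
    t_old = np.linspace(0.0, 100.0, len(angle))
    t_new = np.linspace(0.0, 100.0, n_points)
    return np.interp(t_new, t_old, angle)
```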
Using the LOOCV protocol, the kinematic data k Φ and k Γ are partitioned into training data and test data. Assuming that we have n subjects, the validation process iterates n times. For each iteration, the kinematic data of the left-out subject is set as the testing data and the kinematics of the remaining subjects is set as the training data. The testing data comprise one 3D matrix: the shoulder and elbow joint angles of the left-out subject calculated via the kinematic model Φ for the Kinect v2 system. The training data from the remaining subjects consist of two 3D matrices: the upper limb joint angles calculated via model Φ, used as the input data of the deep learning refined kinematic model Φ̂ for Kinect v2, and those calculated via the reference UWA kinematic model Γ for the 3DMC system, used as the target data of the model Φ̂. Our deep learning refined kinematic model Φ̂ learns the nonlinear relationship between the upper limb kinematics k Φ via the kinematic model for Kinect and the angles k Γ via the UWA model using the 3DMC system. Such a model can reduce the systematic error of the Kinect system.
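The LOOCV partitioning can be sketched as a generator over subject labels (an illustrative helper, not the authors' code):

```python
import numpy as np

def loocv_splits(subject_ids):
    """Yield (train_idx, test_idx) pairs for leave-one-subject-out
    cross-validation over trials labelled with their subject id."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        test = np.where(subject_ids == s)[0]   # all trials of the left-out subject
        train = np.where(subject_ids != s)[0]  # everything else
        yield train, test
```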


Performance Evaluation and Statistical Analysis of the Deep Learning Refined Kinematic Model
The performance of our developed model Φ̂ is evaluated on the test data, using the upper limb kinematics calculated via the model Γ for the 3DMC system as the ground truth. Coefficient of multiple correlation (CMC) values and root mean squared errors (RMSE) are calculated between the upper limb kinematic waveforms k_Φ and k_Γ, as well as between k_Φ̂ and k_Γ, in each application session.
The CMC values are used to evaluate the similarity and repeatability of the upper limb joint angle trajectories between k_Φ and k_Γ, as well as between k_Φ̂ and k_Γ. The CMCs are calculated following Kadaba's approach [57] and interpreted as excellent similarity (0.95-1), very good similarity (0.85-0.94), good similarity (0.75-0.84), moderate similarity (0.6-0.74), or poor similarity (0-0.59) [58]. The RMSE values quantify the mean errors between the upper limb angle waveforms k_Φ and k_Γ, as well as between k_Φ̂ and k_Γ, across all functional tasks.
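A minimal sketch of the two waveform metrics, assuming Kadaba's within-protocol CMC formulation (the function names are ours):

```python
import numpy as np

def cmc(waveforms):
    """Coefficient of multiple correlation (Kadaba et al.) for an (M, T) array of
    M waveforms over T time steps; here M = 2 (Kinect-based vs. 3DMC angles)."""
    y = np.asarray(waveforms, dtype=float)
    m, t = y.shape
    frame_mean = y.mean(axis=0)   # mean of the waveforms at each time step
    grand_mean = y.mean()         # mean over all waveforms and time steps
    num = np.sum((y - frame_mean) ** 2) / (t * (m - 1))
    den = np.sum((y - grand_mean) ** 2) / (m * t - 1)
    return np.sqrt(1.0 - num / den)

def rmse(a, b):
    """Root mean squared error between two joint-angle waveforms."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.mean((a - b) ** 2))
```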
Range of motion (ROM) values and the joint angle at the point of target achieved (PTA) via the kinematic model Φ and our deep learning refined kinematic model Φ̂ for the Kinect v2 system, as well as via the UWA kinematic model Γ for the 3DMC system, are calculated and extracted. Both ROM and PTA data are taken from the test data in the application process. The normality of all ROM and PTA values is tested with the Shapiro-Wilk test (p > 0.05). A paired-sample t-test is used for parameters that are normally distributed; the Wilcoxon signed-rank test is used for those that are not. Bland-Altman analysis with 95% limits of agreement (LoA) is performed to assess the agreement between the ROMs and PTAs via model Φ and model Γ, as well as between model Φ̂ and model Γ. The CMC and RMSE are analyzed using Matlab 2019a, and the remaining statistical analyses are carried out using SPSS 25.0.
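The agreement statistics can be sketched as follows, using SciPy's implementations of the Shapiro-Wilk, paired t-, and Wilcoxon signed-rank tests (a sketch; the helper names and the synthetic ROM values are ours):

```python
import numpy as np
from scipy import stats

def bland_altman(x, y):
    """Bias and 95% limits of agreement between paired measurements."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, bias - half_width, bias + half_width

def paired_comparison(x, y, alpha=0.05):
    """Paired t-test if the differences pass the Shapiro-Wilk normality test,
    Wilcoxon signed-rank test otherwise; returns the p-value."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    if stats.shapiro(diff).pvalue > alpha:
        return stats.ttest_rel(x, y).pvalue
    return stats.wilcoxon(x, y).pvalue

# Synthetic ROMs (degrees) for the refined Kinect model and the 3DMC reference
rom_ref = np.array([40.1, 35.2, 47.9, 30.5, 42.0])
rom_kinect = rom_ref + np.array([0.8, -1.1, 0.5, 1.9, -0.3])
bias, lower, upper = bland_altman(rom_kinect, rom_ref)
```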

Joint Kinematic Waveforms Validity
The kinematic waveforms of the chosen representative upper limb functional tasks via the kinematic model Φ and our deep learning refined kinematic model Φ̂ for the Kinect v2 system are presented in Figures 8-11 as average angles from the test data. Joint angles via the UWA kinematic model Γ for the 3DMC system are presented in Figures 8-11 as the gold standard. The CMC values between k_Φ and k_Γ, as well as between k_Φ̂ and k_Γ, are presented in Table 1. The RMSE values are presented in Table 2.
Our model Φ̂ significantly improves the waveform similarity (see Table 1) and decreases the RMSE (see Table 2). The RMSEs between k_Φ and k_Γ, as well as between k_Φ̂ and k_Γ, also demonstrate the ability of our deep learning refined kinematic model Φ̂ to increase upper limb joint kinematic accuracy using Kinect v2. The RMSEs are both plane-dependent and task-dependent. Our model Φ̂ yields much lower mean RMSEs and standard deviations for all degrees of freedom under all functional tasks compared with model Φ. The RMSEs via our model Φ̂ are significantly smaller than those via model Φ (p < 0.05) except for the shoulder flexion/extension angles during the hand to back pocket task; for that angle and task, although the difference does not reach significance, the RMSEs via both models are relatively small and our model Φ̂ still yields the lower values. Taking the combing hair task as an example, the RMSEs drop from 41.73° ± 8.19° to 11.50° ± 7.25° for shoulder flexion/extension, from 11.91° ± 4.61° to 5.14° ± 1.83° for shoulder adduction/abduction, from 31.45° ± 6.89° to 8.59° ± 2.91° for shoulder internal/external rotation, and from 25.83° ± 3.45° to 6.96° ± 2.92° for elbow flexion/extension after using model Φ̂ instead of model Φ.
Using our deep learning refined kinematic model Φ̂, shoulder and elbow flexion/extension angles during all four functional tasks show excellent similarity between k_Φ̂ and k_Γ, with mean CMCs of 0.95-0.99, except for slightly lower similarity during Task 4 (mean CMC = 0.94 and 0.93 for the shoulder and elbow joint, respectively). The shoulder internal/external rotation angles show excellent similarity (mean CMC = 0.98) during Task 1, very good similarity (mean CMC = 0.89) during Task 3, and good similarity during Task 2 and Task 4 (mean CMC = 0.75 for both tasks). For shoulder adduction/abduction angles, excellent similarity (mean CMC = 0.97), very good similarity (mean CMC = 0.88), and good similarity (mean CMC = 0.79) are observed in Task 3, Task 1, and Task 4, respectively. The lowest similarity is found for the shoulder adduction/abduction angle during the drinking water task, with a mean CMC of 0.72.

Joint Kinematic Variables Validity
The joint angles at the point of target achieved (PTA) and the range of motion (ROM) during the upper limb functional tasks via the kinematic model Φ and our deep learning refined kinematic model Φ̂ for the Kinect v2 system, as well as via the UWA kinematic model Γ for the 3DMC system, are presented in Tables 3 and 4 as mean and standard deviation values (± SD). Differences and statistical significance of the PTAs via model Φ and model Φ̂ in comparison with the PTAs via model Γ are given in Table 3, whereas the absolute errors and statistical significance of the ROMs are given in Table 4. The Bland-Altman plots for all PTAs and ROMs are presented in Figures 12-15.

Table 3. The joint angle at the point of target achieved (PTA), with mean and standard deviation (SD) values, calculated via the kinematic model Φ for Kinect v2, our deep learning refined kinematic model Φ̂ for Kinect v2, and the UWA model Γ for the 3DMC system. Φ − Γ represents the discrepancy between the PTAs via model Φ and the reference model Γ; Φ̂ − Γ represents the differences between the PTAs via model Φ̂ and the reference model Γ.

The PTAs via model Φ all reach significant differences in comparison with those via the reference model Γ (p < 0.05), except for the shoulder flexion/extension angle during the hand to back pocket task and the shoulder adduction/abduction angle during the combing hair task. In contrast, there is no significant difference between the PTAs via our model Φ̂ and the references, except for the elbow flexion/extension angles during the hand to back pocket task (p = 0.045). Although statistical significance exists in the PTAs of the elbow flexion/extension angle during the combing hair task

Bland-Altman plots with 95% limits of agreement for joint kinematic parameters during the hand to back pocket task. The X axes represent the angle means of the two systems and the Y axes represent the mean of the differences. The red line (middle one) represents the reference line at the mean, and the two dashed lines represent the upper and lower limits of agreement. The upper four rows are the angles at the point of target achieved (PTA) and the lower four rows are the range of motion (ROM) values. Plots of the left column are measurement differences between our deep learning refined kinematic model Φ̂ for Kinect and the UWA kinematic model Γ for the 3DMC. Plots of the right column are measurement differences between the kinematic model Φ for Kinect and the UWA kinematic model Γ for the 3DMC.

Discussion
Our study developed a novel deep learning refined kinematic model for 3D upper limb kinematic assessment using a single Kinect v2 sensor. Our refined model Φ̂ is in good agreement with the 3DMC system and is far more accurate than the traditional kinematic model using the same Kinect v2 sensor for upper limb waveforms, joint angles at the point of target achieved (PTA), and ranges of motion (ROM) across all functional tasks. Using our deep learning-based model, the Kinect v2 could measure shoulder and elbow flexion/extension waveforms with mean CMCs >0.93 for all investigated tasks, and shoulder adduction/abduction and internal/external rotation waveforms with mean CMCs >0.8 for most of the tasks. The mean deviations of angles at the PTA and ROM are under 5° for all investigated joint angles during all investigated functional tasks. In clinical applications, an error of less than 2° is generally considered acceptable, and an error between 2° and 5° may also be acceptable with appropriate interpretation [59,60]. Thus, the performance of our deep learning refined kinematic model using a single Kinect v2 sensor is promising for an upper limb functional assessment system.
These results agree with other studies on similar upper limb functional tasks [42]. During the combing hair task, at maximum elevation, the mean elbow flexion via our model Φ̂ is 146°. This is in line with the results of van Andel et al. [42], Magermans et al. [52], and Morrey et al. [61], who find average elbow flexion angles of 122°, 136°, and 100°, respectively. Van Andel et al. [42] find that the shoulder flexion angles reach nearly 100° in the combing hair task and stay under 70° during the other tasks. The same holds in our study via our model Φ̂ using a Kinect v2 sensor: shoulder flexion angles are around 108° during the hair combing task and remain under 60° during the hand to contralateral shoulder and hand to mouth tasks. The hand to mouth task does not require the full ROM of all joints, and the most important joint angle is elbow flexion [52]. The mean elbow flexion is 112° via our model Φ̂, which is consistent with Magermans et al.'s finding of 117° of elbow flexion [52].
The systematic errors of the proposed Kinect-based upper limb assessment system include errors due to inaccurate depth measurement and the motion artifact of moving objects [10]. Kinect v2 measures depth based on the Time of Flight (ToF) technique: ToF measures the time that "light emitted by an illumination unit requires to travel to an object and back to the sensor array". Kinect v2 uses the Continuous Wave (CW) intensity modulation approach, which requires several correlated images for the calculation of each depth image. The distance calculated by mixing the correlated images involves an approximation in the CW algorithm and causes systematic error in depth measurement. Recording and processing the correlated images are also affected by moving objects, leading to inaccurate depth measurement at object boundaries [10].
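As an illustration of the CW principle (a simplified single-frequency sketch; Kinect v2 actually combines several modulation frequencies, and 80 MHz here is only an assumed value):

```python
import numpy as np

C = 299_792_458.0  # speed of light (m/s)

def cw_tof_depth(phase_shift_rad, mod_freq_hz):
    """Depth from the phase shift of the modulated signal: d = c*phi / (4*pi*f).
    The factor 2 for the round trip to the object and back is folded into 4*pi."""
    return C * phase_shift_rad / (4.0 * np.pi * mod_freq_hz)

def ambiguity_range(mod_freq_hz):
    """Maximum unambiguous depth: the phase wraps around every 2*pi."""
    return C / (2.0 * mod_freq_hz)

# With an assumed 80 MHz modulation frequency, a half-cycle phase shift maps to:
d = cw_tof_depth(np.pi, 80e6)  # about 0.94 m; ambiguity_range(80e6) is about 1.87 m
```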
The systematic errors also include error due to the kinematic modeling. In both kinematic models, the shoulder joint angles are defined as rotations of the humerus coordinate system relative to the thorax coordinate system. The kinematic model developed for the Kinect v2 sensor and the model used for the 3DMC system follow the same recommendation on the definition of the joint coordinate systems of the trunk, shoulder, and elbow proposed by the International Society of Biomechanics [47,62]. The second option for the humerus coordinate system is used for both systems [47], in which the z-axis of the humerus coordinate system is perpendicular to the plane formed by the vector from the elbow joint center to the shoulder joint center and the vector from the wrist joint center to the elbow joint center. For the UWA model, the thorax segment is defined by the 7th cervical vertebra, the 10th thoracic vertebra, the sternoclavicular notch, and the xiphoid process of the sternum. Because of the limited skeletal joint tracking ability of the Kinect-based system, its thorax coordinate system is defined by Kinect Skeleton landmarks of both the trunk segment and the shoulder joints (i.e., SpineShoulder, SpineMid, ShoulderLeft, and ShoulderRight). Thus, tasks with large clavicle movements, such as combing hair, show large deviations in shoulder kinematic assessment. In our study, the shoulder joint angles during the combing hair task yield the largest root mean squared errors using our deep learning refined model in comparison with the gold standard system. From Figures 8-11, it can be seen that the systematic error of the Kinect-based system is highly nonlinear. The LSTM network we employ is a state-of-the-art recurrent neural network that is well suited to modeling nonlinear relationships in time series data. Our deep learning-based algorithm yields better results than the linear regression algorithm [63] in refining joint angles using a single Kinect sensor.
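The refinement network can be sketched in PyTorch roughly as follows; the layer sizes and training details are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class AngleRefiner(nn.Module):
    """LSTM that maps Kinect joint-angle sequences to 3DMC-referenced angles."""
    def __init__(self, n_angles=4, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(n_angles, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_angles)

    def forward(self, x):          # x: (batch, 101 time steps, n_angles)
        out, _ = self.lstm(x)      # hidden state at every time step
        return self.head(out)      # refined angles, same shape as the input

model = AngleRefiner()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(8, 101, 4)  # Kinect-model angles for 8 training trials (synthetic)
y = torch.randn(8, 101, 4)  # matching 3DMC reference angles (synthetic)
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```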
In assessing shoulder joint angles during a computer-use task, only shoulder adduction/abduction improved after the linear regression refinement [63]. As the measurement error is positively correlated with the magnitude of the joint angle [63], the measurement error is reported alongside its ROM. After applying the linear regression calibration, the mean RMSE of the shoulder adduction/abduction angle decreases from 14.8° and 9.1° for the right and left shoulder, respectively, to 7.5°, while the ROM of that angle is under 20°. Using our deep learning refined kinematic model Φ̂, by contrast, all upper limb joint angles, including shoulder flexion/extension, shoulder adduction/abduction, shoulder internal/external rotation, and elbow flexion/extension, are significantly improved during all functional tasks. Notably, the mean RMSEs of the shoulder adduction/abduction angles decrease to around 3° for Tasks 1, 2, and 4 and to around 5° for Task 3, with mean ROMs of 12.97° to 47.36°.
Previous studies reveal that Kinect v2 with its automated body-tracking algorithm is also not suitable for assessing lower-body kinematics. The deviation of hip flexion during the swing phase exceeds 30° during walking [15]. The limits of agreement (LoA) between the Kinect v2 sensor and the 3DMC system are 28°, 46° for the peak knee flexion angle at a self-selected walking speed [15] and 7°, 25° for trunk anterior-posterior flexion [16]. Average errors of 24° and 26° are observed for the right and left peak knee flexion angles during squatting [19].
Timmi et al. [37] employed custom-made colored markers placed on bony prominences near the hip, knee, and ankle. This marker-tracking approach improves knee angle measurement, with LoA of −1.8° and 1.7° for flexion and −2.9° and 1.7° for adduction during fast walking. Compared with gait analysis and static posture assessment, motion analysis of the upper limb using Kinect sensors is far more challenging: upper limb functional activities show larger variation across the healthy population, and the upper limb has a higher number of degrees of freedom. The upper limb, especially the shoulder joint, also has a very large working range compared with the lower extremity. Furthermore, the upper limb joints are easily occluded by each other. The marker-tracking methodology may therefore not be suitable for a Kinect-based system assessing upper limb kinematics.

Conclusions
We have developed a novel deep learning refined kinematic model for upper limb functional assessment using a single Kinect v2 sensor. The system demonstrates good kinematic accuracy in comparison with a standard marker-based 3D motion capture system while subjects perform upper limb functional tasks, suggesting that such a single-Kinect kinematic assessment system has great potential as an alternative to the traditional marker-based 3D motion capture system. Such a low-cost, easy-to-use system with good accuracy can help small rehabilitation clinics and meet the need for home-based rehabilitation.