Towards a Live Feedback Training System: Interchangeability of Orbbec Persee and Microsoft Kinect for Exercise Monitoring

Abstract: Many existing motion sensing applications in research, entertainment and exercise monitoring are based on the Microsoft Kinect and its skeleton tracking functionality. With the Kinect's development and production halted, researchers and system designers are in need of a suitable replacement. We investigated the interchangeability of the discontinued Kinect v2 and the all-in-one, image-based motion tracking system Orbbec Persee for use in an exercise monitoring system prototype called ILSE. Nine functional training exercises were performed by six healthy subjects in front of both systems simultaneously. Comparing the systems' internal tracking states from 'not tracked' to 'tracked' showed that the Persee system is more confident during motion sequences, while the Kinect is more confident for hip and trunk joint positions. Assessing the skeleton tracking robustness, the Persee's tracking of body segment lengths was more consistent. Furthermore, we used both skeleton datasets as input for the ILSE exercise monitoring including posture recognition and repetition-counting. Persee data from exercises with lateral movement and in uncovered full-body frontal view provided the same results as Kinect data. The Persee further preferred tracking of quasi-static lower limb motions and tight-fitting clothes. With these limitations in mind, we find that the Orbbec Persee is a suitable replacement for the Microsoft Kinect for motion sensing within the ILSE exercise monitoring system.


Introduction
Over recent decades, lifestyle factors such as sufficient physical activity have been identified as reducing the risk of common health issues such as diabetes, obesity or cardiovascular diseases [1][2][3]. With ageing, physical activity decreases as people have to deal with different physical and psychological limitations [4][5][6]. Accordingly, in 2010, the World Health Organization (WHO) released global recommendations on physical activity throughout the course of life [7]. Besides moderate to vigorous physical activity, the WHO recommends muscle-strengthening activities such as functional training, which is known to increase body flexibility and, thus, prevent injury, falls and other age-related disorders [8,9]. Nevertheless, lack of motivation, fear of injury and previous unpleasant experiences impede interventions that promote physical activity, in particular within the target group of people aged 65 years and older [10][11][12][13][14].
The interdisciplinary research field of Active and Assisted Living (AAL) supports the development and evaluation of assistive technologies and services for healthy ageing to maintain or even improve the quality of life [15]. Several AAL applications were developed to promote training in order to prolong people's functional abilities and, thus, foster independent living. With technological support, functional training can be promoted or even guided [16][17][18]. These exercise monitoring or guidance systems use motion sensing technologies such as wearable sensors or 3D cameras like the Microsoft Kinect. They record and automatically analyse the exercises performed, ideally at home in order to support active ageing of people before severe frailty [19,20]. The positive effect of exercise and, in particular, of functional training interventions on the quality of life has been observed by several studies [21][22][23].
The aim of the Austrian AAL pilot region fit4AAL was to develop and evaluate a plug and play system called ILSE. This system integrated posture recognition and live feedback for exercise monitoring during at-home functional training sessions. Sensor-based monitoring of remote training sessions enables the investigation of more objective usage patterns. For example, the reported amount of functional home training is based on measured activity, not on self-reported activity. Knowledge of the current progress has functional benefits, such as workflows that continue automatically after the target repetition count has been reached [24]. Trainers and personal coaches benefit from the knowledge of the quality and quantity at which their trainees performed functional training sessions at home. Older adults benefit from the guidance and live qualitative and quantitative feedback. Kritikos et al. [25] proved that live feedback improved movement quality gradually. Hence, we sought to provide an extended training experience with the advantages of live feedback during functional training at home. For these requirements, the consortium chose the Microsoft Kinect as a suitable device to design the motion sensing component following the design science research methodology by Johannesson and Perjons [26]. Right before the start of the prototype development in autumn 2017, Microsoft announced the discontinuation of the Kinect sensor series. Considering the numerous applications of Kinect in various research areas beyond fit4AAL, the research question of this paper arose as to whether there is a viable alternative to Kinect for technology-assisted exercise monitoring.
One group of alternative solutions providing skeleton tracking are the camera systems by Orbbec. In late 2015, Orbbec announced the world's first 3D camera-computer system, named Persee. It features a built-in Astra Pro 3D camera coupled with a processing unit comprising an ARM processor and a graphics processing unit. Until 2018, studies compared the Kinect v2 sensor to other marker-less motion capture technologies such as the Xtion by Asus or the RealSense by Intel for human motion analysis, excluding the Orbbec systems [27][28][29][30]. With the discontinuation of the Microsoft Kinect series, however, comparative studies have since considered Orbbec cameras as alternative solutions. As application developers, we considered the Orbbec Persee as a skeleton tracking system because, compared to other 3D camera systems, including the Kinect sensor, it is an all-in-one device that does not require additional hardware. Cebanov et al., for example, already pointed to the Persee as a prospective tool for AAL applications, although they developed their activity recognition system using the Microsoft Kinect [31]. In a 2018 study, Calin et al. [32] showed that the Kinect v2 and the Orbbec Astra camera systems are interchangeable for gesture recognition tasks involving 16 different poses. The poses mainly involved hand positions or movements, such as scratching the head or talking on a phone, as well as sitting, thinker and servant poses. They compared the classification results of several evidence-based posture recognition algorithms that were applied to databases collected by the Orbbec Astra and Kinect v2 sensors, and even to a mixed database. Seven out of 23 classifiers dropped in accuracy (between 50% and 85%) due to differences in the systems' skeleton models. Kritikos et al. [25] developed a mobile rehabilitation system using the Orbbec Persee as the 3D camera system. They designed a real-time feedback system for an upper limb exercise, the lateral raise of the right hand. The trainees received acoustic feedback via Bluetooth using an app on their smartphones, which they held in their left hand. They further tested the alignment of the system with a physician's opinions on the detection of incorrect repetitions, resulting in an alignment of 75.2%, where the system identified more incorrect repetitions than the physician [25].
Before further developing the ILSE training system towards a similar system that assesses movement quality, our goal was to examine whether skeleton tracking with the Orbbec Persee can sufficiently substitute for Kinect v2 skeleton tracking under the same conditions when monitoring exercises with the ILSE system's current static posture recognition and repetition counting during functional training.
In order to expand the knowledge on the usage and evaluate the use of Persee as a Kinect substitute, we aimed to test the interchangeability for functional exercise monitoring for the AAL pilot region fit4AAL. The selected analysis methods add to the accuracy comparison by Calin et al. [32], and offer an extension to assess skeleton tracking capabilities: the tracking state, i.e., the confidence of skeleton tracking, as well as the dynamic skeleton tracking robustness, i.e., the consistency of skeleton tracking, were taken into account. Furthermore, we compared the performance of Persee and Kinect with respect to the three different subsystems of the ILSE system: (1) the skeleton tracking, (2) the static posture recognition, and (3) the repetition counting. The drafted algorithms were used to determine for which exercises the 3D camera systems already resulted in acceptable accuracies and where a further iteration of design and development was required.

Technical Specifications of Microsoft Kinect v2 and Orbbec Persee
There are two main differences between Kinect v2 and Persee: (i) The Kinect v2 uses the Time-of-Flight (ToF) principle, while the Orbbec Persee, like the Kinect v1, uses the structured light principle for recognizing three-dimensional scenes. Both methods aim to acquire depth data of the scene, which the skeleton tracking then processes into position information. While structured light struggles with the recognition of objects further away from the camera, the ToF method is prone to artifacts due to reflections or closeness to planar surfaces such as the floor [33].
(ii) Moreover, the Kinect was developed as an add-on for the game console Xbox One, meaning that it requires additional hardware for processing and running software. The Persee, in contrast, comes with an integrated computer using either the Android or Ubuntu Linux operating system. The reported frame rate of both systems is 30 frames per second (FPS). A more detailed comparison of the technical specifications of Kinect v2 and Persee is given in Table 1. [...] at most two people simultaneously, such as the first SDK version of the Kinect v1 [34]. The difference of five joints between the skeleton tracking algorithms does not matter for this study, since 15 joints were of interest for the comparison between Kinect v2 and Persee. The identification of joints of interest was strongly linked to the selection of human body movements. The joints and their annotations used throughout this work are illustrated in Figure 1. To test the interchangeability, we collected skeleton data of different functional movements using the Kinect v2 and the Persee under the same conditions.

Skeleton Data Collection
For the skeleton recordings, Kinect v2 and Persee were placed above each other in one vertical line. Thus, almost the same field of view could be ensured by only a slight difference in height (6.5 cm between the Kinect v2 and Persee lenses). The Persee was connected to a TV via HDMI. An Android app based on Nuitrack version 1.3.1 and Android 5.0 (version 16.04) was used for data collection with buttons to start and stop recording. For the data acquisition with Kinect v2, the sensor was connected via USB to a laptop on which the Kinect Software Development Kit (SDK) version 2.0 with the integrated skeleton tracking algorithm recorded. The following variables were collected on both systems: timestamps, joint positions, and tracking states.
Six healthy volunteers (four women, two men; age: 28.5 ± 6 years) were instructed in the purpose and the procedure of the measurements. All subjects gave their written informed consent before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki. In addition, an ethics approval of the ethics committee of the University of Salzburg was obtained (EK-GZ: 23/2018).
In total, nine functional motor tasks were selected to be recorded based on their static or quasi-static characteristics. The selection of the human movements strongly depended on the intended AAL application for functional training. Therefore, the study design of Bonnechère et al. [35] for functional movements was adapted and extended. The accuracy of the joint estimation of the Kinect sensor has been comparable to motion capture systems in more controlled body postures such as standing and exercising arms [36,37]. Thus, and in order to assess more uncontrolled body postures, different body movement categories were separated based on the dynamics of the movement, from simple to more complex exercises. Low-dynamic movements are exercises in a sitting pose (shoulder abduction, elbow flexion, knee extension) or a standing pose (shoulder abduction, elbow flexion, hip abduction). The shoulder abduction movements consisted of raising both stretched arms from the initial position up to the height of the shoulders and back again. The elbow flexion also started with the stretched arms aligned along the upper body, followed by bending the lower arms to the front up to the height of the elbows and back again. These two exercises were performed while standing and sitting, shown as (1) and (4) as well as (2) and (5), respectively, in Figure 2. In addition to these bilateral movements, a unilateral exercise was added for each pose: in the sitting pose, raising one of the lower legs up to the front was considered, shown as (3) in Figure 2.
The successive lateral spreading of the entire leg was performed while standing, shown as (6) in Figure 2.
Dynamic movements performed with varying poses included knee flexion (squat), hip flexion (hip hinge), and hip/knee flexion and extension (lunge). The squat, shown as (7) in Figure 2, is a popular exercise in functional training. Since functional exercises often include movements with the trunk bent over, the hip flexion movement, described by bending over the upper body while keeping the back straight, was added, shown as (8) in Figure 2. The hip/knee flexion and extension in the form of a lunge was considered as an additional, more complex human movement, shown as (9) in Figure 2. The lunge was performed bilaterally to the front. Hence, starting from a neutral stand, one of the legs was moved to the front performing the hip and knee flexion, followed by the extension in order to get back to the initial posture.
In total, nine functional movements were performed in front of the two systems. The subjects repeated each movement 10 times. The exercises hip abduction (standing), knee extension (sitting) and the lunges were performed five times on the right and five times on the left side. After each body movement, a break of at least five seconds was predetermined. All recordings were done in an office with dark carpeted floor and white walls with a whiteboard in the background.

Interchangeability Analysis
The entire data preparation and analyses were performed in RStudio with R version 4.0.3. The two skeleton datasets were temporally synchronized. To this end, the subjects were asked to jump once before each body movement sequence. To detect the jumps and, thus, synchronize the systems, we used the local extrema in the signal along the longitudinal axis of the joint "Trunk Top", which represent the highest vertical displacement during the jump. Furthermore, the videos recorded by the Persee were used to label the synchronization jump and the repetitions for each exercise sequence. In Table 2, the number of labeled repetitions and the initial static posture for each exercise are given. The motion sequence of interest for the interchangeability analysis is defined from the start of the first repetition to the end of the last repetition.
Tracking state analysis: Each of the two skeleton tracking algorithms provides information on the reliability of the joint tracking by so-called tracking states. The tracking states of the Kinect v2 are described by 'tracked', 'inferred' or 'not tracked'. The Persee, in comparison, indicates these states with level numbers of 0.75, 0.50 and 0.00, respectively. Within the tracking state analysis, the motion sequences from both 3D cameras were used to identify where tracking states decreased or where no confident tracking was possible. Because both systems observed the same scene simultaneously, we looked for indicators of situations and/or joints where the tracking algorithm of one system is more confident than the other. We counted the occurrences of each tracking state to identify in which situations which joints reported tracking issues, and which exercises were confidently tracked by the 3D camera systems.
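The tracking state comparison can be illustrated with a minimal sketch. It assumes that each recording is available as a list of per-frame dictionaries; the helper names, the joint name "Wrist Right" and the toy values are hypothetical and are not taken from the study's R code or data. The Kinect labels are mapped onto the Persee level numbers named in the text so that both systems can be tallied in the same way.

```python
from collections import Counter

# Mapping of the Kinect v2 labels onto the Persee level numbers given in the text.
KINECT_TO_LEVEL = {"tracked": 0.75, "inferred": 0.50, "not tracked": 0.00}


def count_states(frames):
    """Count how often each joint was reported at each confidence level.

    frames: list of dicts mapping joint name -> confidence level (float).
    Returns a dict mapping joint name -> Counter of levels.
    """
    counts = {}
    for frame in frames:
        for joint, level in frame.items():
            counts.setdefault(joint, Counter())[level] += 1
    return counts


def joints_with_issues(frames, confident=0.75):
    """Return the sorted joints that fell below the confident level at least once."""
    counts = count_states(frames)
    return sorted(
        joint
        for joint, levels in counts.items()
        if any(level < confident for level in levels)
    )


# Toy data, invented for illustration only (not measurements from the study).
kinect_frames = [
    {"Wrist Right": "tracked", "Trunk Top": "tracked"},
    {"Wrist Right": "inferred", "Trunk Top": "tracked"},
]
persee_frames = [
    {"Wrist Right": 0.75, "Trunk Top": 0.75},
    {"Wrist Right": 0.00, "Trunk Top": 0.75},
]

# Convert the Kinect labels to levels so both systems share one representation.
kinect_levels = [
    {joint: KINECT_TO_LEVEL[state] for joint, state in frame.items()}
    for frame in kinect_frames
]

print(joints_with_issues(kinect_levels))
print(joints_with_issues(persee_frames))
```

Tallying per joint and per motion sequence in this way is one possible route to statements such as "the right wrist was reported with tracking issues in a given number of sequences".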
Skeleton tracking robustness: We used the body segment length variation as a measure for the robustness of the skeleton tracking. The body segment length describes the distance between two adjacent joints of the human body. For the computation of the segment lengths, the Euclidean distance d between the position vectors \vec{p} and \vec{q} of adjacent joints was calculated:
$$d(\vec{p}, \vec{q}) = \lVert \vec{p} - \vec{q} \rVert = \sqrt{\sum_{i=1}^{3} (p_i - q_i)^2}$$
In Table 3, the body segments of interest with their adjacent joints are shown. Naturally, the actual body segment length remains constant throughout the tracking algorithm's runtime [33]. However, the tracking algorithm produces varying lengths due to noisy input data (e.g., occlusion by other body parts or non-distinguishing depths [36]). A low variance in body segment length therefore means that the tracking algorithm can deal well with noisy input. For each body segment, the difference in length between successive frames was computed, and then the average and the corresponding standard deviation were calculated. The median of these averages and the median of the standard deviations constitute the skeleton tracking robustness. We calculated the skeleton tracking robustness once independent of subject, and once independent of exercise. For this analysis, only joint information with tracking state 'tracked'/0.75 was used in order to take the confidence of the systems into account, i.e., to determine the difference when the system is confident that it tracks the joint correctly.
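The robustness measure can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study (which was written in R): the function names and the toy coordinates are hypothetical, the frame-to-frame difference is taken as an absolute value (the text does not state whether signed or absolute differences were used), and frames are assumed to be already filtered to the state 'tracked'/0.75.

```python
import math
import statistics


def segment_length(p, q):
    """Euclidean distance between two 3D joint positions (x, y, z)."""
    return math.dist(p, q)


def frame_to_frame_variation(joint_a, joint_b):
    """Mean and standard deviation of the length change between successive frames.

    joint_a, joint_b: equally long lists of (x, y, z) positions of two adjacent joints.
    Assumption: the change is measured as an absolute difference.
    """
    lengths = [segment_length(p, q) for p, q in zip(joint_a, joint_b)]
    diffs = [abs(b - a) for a, b in zip(lengths, lengths[1:])]
    return statistics.mean(diffs), statistics.stdev(diffs)


def robustness(segments):
    """Median of the per-segment means and median of the per-segment SDs.

    segments: dict mapping a segment name -> (joint_a positions, joint_b positions).
    """
    stats = [frame_to_frame_variation(a, b) for a, b in segments.values()]
    means = [mean for mean, _ in stats]
    sds = [sd for _, sd in stats]
    return statistics.median(means), statistics.median(sds)


# Toy data in millimetres, invented for illustration only.
elbow = [(0.0, 0.0, 0.0)] * 4
wrist = [(250.0, 0.0, 0.0), (252.0, 0.0, 0.0), (251.0, 0.0, 0.0), (251.0, 0.0, 0.0)]

mean_diff, sd_diff = frame_to_frame_variation(elbow, wrist)
print(mean_diff, sd_diff)
print(robustness({"forearm": (elbow, wrist)}))
```

In this toy case the segment lengths are 250, 252, 251 and 251 mm, so the successive differences are 2, 1 and 0 mm. A perfectly stable tracker would yield differences of zero throughout.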
Application of ILSE exercise monitoring: Furthermore, we applied the drafted algorithms of the ILSE exercise monitoring for static posture recognition and repetition counting. The resulting accuracies can determine under which conditions one or both systems fail and which further development steps have to be performed. In order to apply the posture recognition, the buttock-knee depth difference and the knee heights were computed. In order to distinguish sitting from standing, the mean buttock-knee depth was calculated. Depending on whether this depth, or rather segment length, was below or above a certain threshold, the corresponding posture can be estimated. The threshold we used was half of the 5th percentile of the female buttock-knee length from the German DIN 33402 ergonomic anthropometric data collection [38], i.e., 275 mm. The percentile was selected in order to ensure that the static posture detection also works for "smaller" trainees. The developed generic static pose detection between standing and sitting further included:
1. Standing recognition (see Figure 3): In addition to the criterion that the mean buttock-knee depth had to be less than 275 mm, the left and right knee had to be approximately at the same height with a set tolerated difference of 20 mm. This avoids a standing detection when one of the two legs is raised.
2. Sitting recognition (see Figure 4): In addition to the criterion that the mean buttock-knee depth had to be more than 275 mm, the knees had to be approximately at the same height with a tolerated difference of 20 mm. This not only avoids detection of sitting when one of the legs is raised, but also identifies whether a lunge rather than a sitting position is performed.
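The static posture rule with the 275 mm depth threshold and the 20 mm knee height tolerance can be sketched as below. The function name, the argument names and the label "undetermined" are hypothetical, and the sketch assumes that the mean buttock-knee depth has already been computed from the joint positions; how ILSE averages this value is not specified here.

```python
DEPTH_THRESHOLD_MM = 275.0  # buttock-knee depth threshold stated in the text
KNEE_TOLERANCE_MM = 20.0    # tolerated knee height difference stated in the text


def classify_posture(mean_depth_mm, knee_height_left_mm, knee_height_right_mm):
    """Classify a static posture as 'standing', 'sitting' or 'undetermined'.

    mean_depth_mm: mean buttock-knee depth (assumed to be precomputed).
    knee_height_*_mm: heights of the two knee joints along the longitudinal axis.
    """
    knee_difference = abs(knee_height_left_mm - knee_height_right_mm)
    if knee_difference > KNEE_TOLERANCE_MM:
        # One leg raised or a lunge: neither standing nor sitting is reported.
        return "undetermined"
    if mean_depth_mm < DEPTH_THRESHOLD_MM:
        return "standing"
    if mean_depth_mm > DEPTH_THRESHOLD_MM:
        return "sitting"
    return "undetermined"


# Toy values in millimetres, invented for illustration only.
print(classify_posture(100.0, 500.0, 510.0))
print(classify_posture(400.0, 500.0, 505.0))
print(classify_posture(100.0, 500.0, 600.0))
```

Checking the knee heights first means that a raised leg or a lunge is rejected before the depth threshold is consulted, which mirrors the purpose of the tolerance given in both list items.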

On each new measurement, the transition criterion, which is derived from the initial posture recognition, is applied in order to determine whether or not to count the exercise movement. A generic repetition counting algorithm based on joint positions was designed for three principal movements of exercise (see Figure 5 for examples):
1. Upward movement: From the starting position, joint heights along the longitudinal axis are monitored. If they exceed the transition criterion, another repetition is counted.
2. Lateral movement: From the starting position, joint positions along the frontal axis are monitored. If they exceed the transition criterion, another repetition is counted.
3. Downward movement: From the starting position, joint heights along the longitudinal axis are monitored. If they are below the transition criterion, another repetition is counted.
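The three counting rules above share one pattern, a threshold crossing on a single joint coordinate, which can be sketched as follows. The function and parameter names are hypothetical, the transition criterion is passed in as a plain number because its derivation from the initial posture is not detailed here, and the re-arming step (a new repetition is only counted after the joint has returned to the other side of the criterion) is an assumption added so that one sustained crossing is not counted once per frame.

```python
def count_repetitions(values, criterion, direction="up"):
    """Count threshold crossings of a monitored joint coordinate.

    values: one coordinate of the monitored joint per frame
            (height for 'up' and 'down', frontal-axis position for 'lateral').
    criterion: transition criterion (assumed to be given).
    direction: 'up' or 'lateral' count values above the criterion,
               'down' counts values below it.
    """
    if direction not in ("up", "lateral", "down"):
        raise ValueError("direction must be 'up', 'lateral' or 'down'")

    count = 0
    beyond = False  # True while the joint stays beyond the criterion
    for value in values:
        crossed = value < criterion if direction == "down" else value > criterion
        if crossed and not beyond:
            count += 1  # first frame beyond the criterion starts a new repetition
        beyond = crossed
    return count


# Toy signals, invented for illustration only.
elbow_heights = [0, 5, 12, 13, 4, 11, 3]
trunk_heights = [100, 80, 100, 79, 78, 100]

print(count_repetitions(elbow_heights, 10, "up"))
print(count_repetitions(trunk_heights, 90, "down"))
```

Both toy signals cross their criterion twice, so each call reports two repetitions. In practice some smoothing or hysteresis around the criterion would probably be needed to keep noisy joint positions from producing extra counts.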
For the nine movements, in total, five adaptations of the counting algorithm were applied: for shoulder abduction movements, the upward movement of the elbow joints was tracked, for elbow flexions, the upward movement of the wrist joints, and for the sitting knee extension, the upward movement of the ankle joints. We used the abducting movement of the leg or rather the knee joints for the hip abduction movement. The three remaining exercises-hip hinge, squat and lunge-required the tracking of the downward movement of the top trunk joints of the skeletons.
The posture recognition and repetition counting algorithms only process joint information with tracking state of 'tracked'/0.75 to check when the system was confident to track the joints, how many postures are correctly recognized and how many repetitions are correctly counted. In total, the dataset consists of 542 repetitions, including 362 repetitions of 60 upward movement exercises, 120 repetitions of 12 lateral movement exercises, and 602 repetitions of 36 downward movement exercises.

Tracking State Analysis
Under the same conditions, we made the following observations. During the 54 motion sequences, the skeleton tracking of the Kinect v2 reported only the states 'tracked' and 'inferred', never 'not tracked'. The skeleton tracking of the Persee reported the states 0.75 and 0.00, never 0.50. For both systems, the joints most often labeled with tracking issues were the upper limb joints, namely the left and right wrist and elbow joints, and the worst-tracked joint was the right wrist. The Persee skeleton tracking reported the right wrist as 'not tracked' during 25 of the 54 motion sequences, while the Kinect v2 reported it as 'inferred' during 46 of the 54 motion sequences. The motion sequence shoulder abduction (standing) of subject P6 was the only one in which the Kinect did not report any inferred tracking. Independent of exercise and subject, the Kinect reported the joints "Trunk Top", "Trunk Mid", "Trunk Base" and the left and right "Hip" joints with the tracking state 'tracked'. The Persee did not report any tracking issues for the motion sequences shoulder abduction (standing and sitting) and hip abduction (standing) for all subjects except subject P4. In addition, the knee extension of subject P1 and the hip hinge of subject P4 were tracked entirely with the state 'tracked' (see Table 4). Across all exercises and subjects, there was no joint that the Persee reported without tracking issues.
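A per-joint tracking state summary of this kind can be sketched as follows. The function names and the example state sequences are hypothetical; only the state labels ('tracked', 'inferred', 0.75, 0.00) are taken from the text.

```python
from collections import Counter


def state_shares(states):
    """Share of frames per tracking state for one joint in one motion sequence."""
    if not states:
        return {}
    counts = Counter(states)
    total = len(states)
    return {state: n / total for state, n in counts.items()}


def has_tracking_issue(states, confident_state):
    """True if at least one frame deviates from the confident state."""
    return any(state != confident_state for state in states)


# Hypothetical right wrist states for one motion sequence per system.
kinect_wrist = ["tracked", "tracked", "inferred", "tracked"]
persee_wrist = [0.75, 0.75, 0.75, 0.0]

print(state_shares(kinect_wrist))  # {'tracked': 0.75, 'inferred': 0.25}
print(has_tracking_issue(persee_wrist, 0.75))  # True
```

Counting the motion sequences for which `has_tracking_issue` is true gives figures of the form "25 out of 54 motion sequences" for a given joint.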

Skeleton Tracking Robustness
Figures 6 and 7 show the results of the skeleton tracking robustness of the Kinect v2 and the Persee. Figure 6 represents the skeleton tracking robustness results within each subject across all exercises. Figure 7 shows the results within each exercise across all subjects.

Accuracies of ILSE Exercise Monitoring
The skeleton tracking of both 3D camera systems identified every standing and sitting initial posture: 36 of 36 standing positions and 18 of 18 sitting positions. Only during the sitting posture recognition of the knee extension movement of subject P4, the skeleton tracking of the Microsoft Kinect v2 reported the tracking state of the right knee as 'inferred'.
From the 542 labeled repetitions, the Kinect v2 counted 378 repetitions correctly, leading to an accuracy of 69.7%. The Persee counted 253 repetitions correctly, resulting in an overall accuracy of 46.7%. The repetition counting accuracy over the exercises ranges from 17% (hip abduction) to 100% for the Kinect v2, and from 0% to 100% for the Persee (see Table 5). Both systems counted all 59 repetitions of the shoulder abduction (standing) correctly. The repetitions of the two exercises shoulder abduction (sitting) and elbow flexion (standing) were correctly counted with the Kinect data. The application of the counting algorithm monitoring the trunk downward movement during lunge exercises on the Persee skeleton data did not result in a single count.
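The overall accuracies follow directly from the reported counts. The short check below reproduces them; the function name is chosen for illustration only.

```python
def counting_accuracy(correct: int, labeled: int) -> float:
    """Percentage of labeled repetitions that were counted correctly."""
    if labeled <= 0:
        raise ValueError("labeled must be positive")
    return 100.0 * correct / labeled


kinect_accuracy = counting_accuracy(378, 542)
persee_accuracy = counting_accuracy(253, 542)

print(f"Kinect v2: {kinect_accuracy:.1f}%")  # Kinect v2: 69.7%
print(f"Persee: {persee_accuracy:.1f}%")     # Persee: 46.7%
```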

Discussion
Within this study, we compared the confidence and consistency of skeleton tracking of Kinect and Persee by analysing their tracking states and their skeleton tracking robustness, respectively. Furthermore, the drafted posture recognition and repetition counting algorithms based on relative joint distances were used in order to determine if the algorithms work with data of both sources.
Confidence of skeleton tracking: Thus far, the analysis of the tracking state has been underrepresented in studies comparing depth camera systems (e.g., [33,39]). The tracking state analysis within this paper showed that, over entire motion sequences, the skeleton tracking of joints using the Orbbec Persee was more confident than that of the Microsoft Kinect v2. Over all exercises and subjects, the median of the mean proportion of not tracked time slots was 0.5 ± 1.4% for the Persee, compared with 1.4 ± 3.5% inferred data points for the Kinect. In particular, the tracking confidence for five of the six subjects during shoulder abduction and hip abduction movements indicates that the Persee tracks with high confidence during exercises performed with the body facing the sensor (frontal view). This is underlined by the fact that the Persee reported a high tracking confidence when subject P4 performed the hip hinge exercise with a very low range of motion. It can be argued that the systems differ in how they annotate tracking issues: when the Persee reported tracking issues, it assigned the tracking state 0.00, whereas the Kinect labeled the affected joint positions as 'inferred'. Furthermore, considering the tracking confidence per joint over all exercises, the Kinect reported inferred tracking of joints for 0.4 ± 1.8% of the motion sequence (median of averages), whereas the Persee reported no tracking of joints for 0.8 ± 3.6% of the motion sequence (median of averages). Hence, the outcome of the tracking state analysis depends on which property is considered more important: confident tracking of certain joints, such as the trunk and hip joints, over several exercises (Kinect), or confident tracking of all joints during frontal view exercises (Persee). One explanation for the right wrist being the worst-tracked joint could be the light falling in from the windows on the left side of the participants. The Persee reported the right wrist as not tracked during 12 ± 22% of the motion sequence on average, and the Kinect showed similar behavior, reporting the state 'inferred' during 12 ± 19% of the motion sequence on average.
Consistency of skeleton tracking: Assessing the tracking robustness of each system, the skeleton tracking used for the Persee (i.e., Nuitrack) was observed to be more consistent than the integrated skeleton tracking of the Kinect v2. For confidently tracked body segments, the skeleton tracking robustness values within and between the subjects were lower, and hence more consistent, for the Persee (see Figure 7). We noted that the Nuitrack skeleton tracking might apply a stabilization between the right and left limbs, since these resulted in similar lengths and, hence, similar length differences. This leads to a more robust skeleton tracking with the Persee.
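The idea of assessing consistency through body segment lengths can be sketched as below. The robustness measure used in this study is defined in the methods; the sketch substitutes the standard deviation of a segment length over the frames as a stand-in, and the joint coordinates are invented for illustration.

```python
import math
import statistics


def segment_lengths(joint_a, joint_b):
    """Euclidean distance between two joints for each frame (same units as input)."""
    return [math.dist(a, b) for a, b in zip(joint_a, joint_b)]


def length_spread(lengths):
    """Population standard deviation of a segment length over all frames.

    Assumption: a lower spread is read as more consistent tracking, since
    the true segment length of a subject does not change.
    """
    return statistics.pstdev(lengths)


# Hypothetical shoulder and elbow positions (x, y, z) in metres over three frames.
shoulder = [(0.0, 1.40, 2.0), (0.0, 1.40, 2.0), (0.0, 1.40, 2.0)]
elbow = [(0.0, 1.10, 2.0), (0.0, 1.08, 2.0), (0.0, 1.12, 2.0)]

upper_arm = segment_lengths(shoulder, elbow)
print([round(length, 2) for length in upper_arm])  # [0.3, 0.32, 0.28]
print(round(length_spread(upper_arm), 4))
```

Only frames in which both joints are confidently tracked would be passed to `segment_lengths`, in line with the restriction to confidently tracked body segments.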
Application of ILSE exercise monitoring: In verifying the usability of the systems for ILSE, both systems recognized all initial standing and sitting postures correctly. Only once did the tracking state reported by the Kinect hinder the application of the algorithm, since it was 'inferred' for the right knee. The skeleton tracking of the Microsoft Kinect v2, however, resulted in higher accuracies during repetition counting, and the Persee undercounted repetitions more often than the Kinect. With the Persee data, issues with counting the elbow joint movements occurred only during shoulder abduction (sitting) for subjects P4 and P5. Both wore loose three-quarter-sleeve blouses during the data collection instead of a T-shirt or a top, which affected the tracking of the elbow joints. The Kinect had the most issues with counting the lateral movements of the knee joints (hip abduction), the Persee with the upward movement of the wrist (elbow flexion) and the downward movement of the trunk. No lunge repetitions were counted with the Persee data. The shoulder abduction (standing) was the only exercise for which both systems achieved an accuracy of 100%. The Kinect further achieved accuracies above 79% for shoulder abduction, elbow flexion and lunges.
In summary, the skeleton tracking of the Persee showed confidence and consistency. When the same algorithms were applied to both skeleton datasets, however, the Persee performed worse. This can be explained by the previously observed reliable joint estimation of the Kinect v2 in occluded situations [33]. The accuracy of the joint estimation of the Kinect has been shown to be comparable to motion capture systems during more controlled body postures such as standing and exercising arms [36,37]. Nevertheless, it is remarkable that the generic ILSE exercise monitoring algorithms already performed well with the Kinect skeleton data, although they were optimized for neither the Kinect nor the Persee. Since Calin et al. [32] reported a slight deviation in the accuracy of gesture recognition classifiers when using the Persee instead of the Kinect, we are confident that a further iteration of the design and development of the repetition counting algorithm can improve the accuracy. The confidence and consistency of a skeleton tracking system can be increased by choosing a different skeleton tracking algorithm or by adding further sensing units such as a second 3D camera or even inertial measurement units [40][41][42]. The tracking state analysis as well as the skeleton tracking robustness can be useful methods for assessing skeleton tracking systems in iterative design processes. Furthermore, an additional comparison with the actual body segment lengths of the subjects can enrich the skeleton robustness assessment. Due to the small size of the exploratory dataset and the experimental development within this study, statistical testing was not meaningful, but it is strongly recommended for further development of the system. Moreover, different lighting conditions, more complex movements, and other controllable conditions should be added in the future. Within this exploratory investigation and without statistical significance, these first results suggest that both systems preferred exercises performed with uncovered body parts and a low range of motion.

Conclusions
Although the Kinect v2 and the Persee are based on two different motion sensing principles, both systems show promising behavior when it comes to human movement tracking. Both preferred exercises in more controlled situations without occlusions of body parts. The Persee additionally favored exercises with a low range of motion of the lower limbs and tight-fitting clothing. However, the agreement of both systems with the ground truth of the reported repetitions needs improvement. For the further assessment in the AAL context and the optimization of the algorithms for the ILSE application, the presented body movement dataset for functional training should be performed by the intended target group of AAL and extended by further movement types. Considering the known limitations of the two systems, such as occlusion [43] and low-dynamic functional movement tracking, the Orbbec Persee is interchangeable with the Microsoft Kinect v2 for the application in the ILSE exercise monitoring. The tracking accuracy might be improved by sensor fusion with inertial measurement units [44,45] or by a network of multiple cameras [46]. However, the validation of the functionality of the Orbbec Persee in a live feedback training system is the subject of future research. The training system will include live feedback on quantity (static posture detection and repetition counting) and on quality (detection of common posture errors during exercise).
Author Contributions: V.V. contributed to this study from conceptualization through methodology and software to investigation, in discussion with W.K. and T.S.; V.V. wrote the original draft and edited it following reviews by W.K. and T.S.; T.S. supervised the article. All authors have read and agreed to the published version of the manuscript.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data are not publicly available due to privacy issues, as the video data are not anonymized.