1. Introduction
There is a clear and advancing benefit to the development of digitised, automated systems that evaluate human motion using depth sensor technology in the healthcare domain [1,2,3]. The general population is living longer; therefore, new and innovative means of quantifying and assessing a person’s physical health are needed to better allocate resources and target interventions. While many in the ageing population will remain healthy, active, and engaged into later life, studies have shown that a minority of older adults suffer from frailty and musculoskeletal disorders [4]. Frailty is not a single disease, but a combination of the natural ageing processes, during which neuromuscular systems decline, and the accumulation of medical conditions that leaves a person vulnerable to illness, trips, or falls [5]. Further, older adults have poorer balance and motion stability than younger people, and the amount of body sway increases with more challenging motions [2,6].
Although in-person clinical assessment is vital, there is a need to develop more efficient clinical approaches suited to the Internet-of-Things, assistive living [7], and Cloud Computing era [5,8,9]. Current assessment processes have many limitations. First, clinician-led assessments depend on the skills, experience, and judgement of the individual clinician, and therefore may not always be objective. Second, clinical assessments are open to subjective bias and contain inter- and intra-rater variance between assessments. Third, the entire process can be time-consuming, considering the person’s need to attend the appointment and undertake the assessment, and the clinic’s need to arrange appointments and oversee the assessments. Fourth, people with physical mobility impairment increase their risk of further trauma by having to attend specialist clinics, so it would be preferable to undertake the assessment at home or another suitable location. Fifth, a person may exhibit different behaviour because of the examination, which may alter the outcome and the clinician’s perceptions.
Several studies have utilised depth sensor technology to analyse and quantify mobility to predict possible future declines in physical health [1,10,11]. Early identification could enable remedial clinician-led intervention to occur more quickly and thus improve patient outcomes [3,4,12]. Several attempts have been made to develop assessment systems to judge clinically relevant motions such as sit-to-stand, timed-up-and-go, and static balance [13,14,15]. While these systems have been shown to be useful in monitoring and quantifying balance, they fall short of assessing time- and speed-related measurements between distinct population groups, which could be insightful to a clinician in the decision-making process.
There are several methods which seek to characterise sit-to-stand by decomposing the motion into phases to identify the start, middle, and end phase, and how the movement was performed [2,16,17]. Bennett et al. (2014) [16] used pressure sensors to gather movement data. The Centre of Mass was calculated and evaluated using a classifier to determine whether different phases of motion could be identified. The authors could distinguish between slow, unstable sit-to-stand transitions and healthy transition phases. Ejupi et al. (2015) [13] used a depth sensor to examine the feasibility of detecting sit-to-stand motion among older adults who may be prone to falling. By developing a system which uses time- and speed-related measurements, the authors could discriminate between those who were at high risk of falling and those who were not.
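To make the notion of time- and speed-related measurement concrete, the following sketch segments a vertical hip-height trajectory into transition phases using a simple velocity threshold. The function name, frame rate, and threshold value are illustrative assumptions, not parameters taken from the cited studies.

```python
import numpy as np

def transition_segments(hip_y, fps=30.0, vel_thresh=0.1):
    """Return (start_frame, end_frame, duration_s) for each segment where the
    absolute vertical hip velocity exceeds vel_thresh (metres per second).
    Such segments approximate sit-to-stand / stand-to-sit transitions."""
    vel = np.gradient(hip_y) * fps                 # finite-difference velocity, m/s
    moving = np.abs(vel) > vel_thresh              # True during transitions
    # Pad with False so segments touching either end are closed properly
    padded = np.concatenate(([False], moving, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, ends = edges[::2], edges[1::2]         # rising / falling edges
    return [(int(s), int(e), (e - s) / fps) for s, e in zip(starts, ends)]
```

From each segment, duration (a time-related measure) and peak velocity within the segment (a speed-related measure) follow directly.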
In this paper, we propose a non-invasive, markerless, digitised, and automated framework, using novel feature generation and motion decomposition, to analyse the performance of masters athletes, healthy old people, and young adults performing the sit-to-stand (five repetitions) motion, a functional test commonly used in a clinical setting to assess balance and stability [18]. The framework acquires motion capture (mocap) data from a single depth sensor; the skeletal stream is de-noised using a heuristic algorithm, then decomposed into a set of novel time- and speed-related features. Analysis techniques are employed to identify the performance in execution, sitting, and stand-to-sitting, thus providing detailed insight into the stages of motion analysis for clinicians.
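The specific de-noising heuristic is described later in the paper; as one minimal sketch of the idea, a sliding-window median filter over the skeletal stream suppresses the spike noise that depth-sensor joint tracking typically produces. The window length here is an illustrative choice, not the framework's actual parameter.

```python
import numpy as np

def denoise_joints(frames, window=5):
    """Sliding-window median filter over a (n_frames, n_joints, 3) mocap array.
    Spikes shorter than window // 2 + 1 frames are replaced by the local
    median, leaving steady joint positions untouched."""
    n = len(frames)
    half = window // 2
    out = np.empty_like(frames)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)  # clamp at boundaries
        out[i] = np.median(frames[lo:hi], axis=0)
    return out
```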
4. Discussion and Conclusions
In this work, we utilise a depth sensor and automated framework to identify a range of clinically relevant outcome features that may be useful to a clinician in providing greater insight into the performance capability of a participant. Unique insights were obtained for each group. Young adults could execute the sit-to-stand but presented large anteroposterior (AP) and mediolateral (ML) sway. The healthy old group were able to execute the sit-to-stand but presented reduced AP and ML sway and an increase in the time taken to stand and sit. Masters athletes could execute the sit-to-stand with relative ease, with little impediment to their motion, but presented reduced upper body lean when standing and sitting.
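Sway outcomes of this kind are commonly summarised as the root-mean-square displacement of a centre-of-mass (or trunk) trajectory. A minimal sketch follows, assuming the Kinect axis convention (X roughly mediolateral, Z roughly anteroposterior); it illustrates the measure generically rather than reproducing the exact computation used in this study.

```python
import numpy as np

def rms_sway(com_xyz):
    """RMS sway of a centre-of-mass trajectory with shape (n_frames, 3).
    Returns (ap, ml) in the same units as the input (metres for Kinect).
    Assumes X ~ mediolateral and Z ~ anteroposterior axes."""
    centred = com_xyz - com_xyz.mean(axis=0)        # remove mean position
    ml = np.sqrt(np.mean(centred[:, 0] ** 2))       # mediolateral RMS
    ap = np.sqrt(np.mean(centred[:, 2] ** 2))       # anteroposterior RMS
    return ap, ml
```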
Comparing the performance between participant groups demonstrates the ability of the system to distinguish between effects of ageing. The young adults could perform the sit-to-stand with little impediment to their motion and were able to maintain control. While there is no doubt that masters athletes maintain a high physical capability [28], performance nevertheless declines with advancing age alongside loss of muscle power and cardiopulmonary function [29,30,31], so it is possible that the balance and performance of movements such as sit-to-stand in healthy old people and masters athletes decline with increasing age and loss of muscular control.
This work focused exclusively on depth sensor technology because of its ability to track human motion without any physical anatomical landmarks, sensors, or devices being placed on the participant’s body. Several studies have utilised wearable technology (e.g., mobile devices, accelerometers, and gravity sensors) to track balance and sit-to-stand motions successfully; however, these are only capable of providing outcomes in relation to where the device/marker is located, which means that the body itself is not being assessed, and they are expensive to implement widely [32,33,34]. Nevertheless, future work should explore uniting both modalities to provide a holistic overview of the execution of balance and sit-to-stand motions.
There are several limitations in this study, most of which relate to the use of technology in making a clinical judgement. First, the Kinect is sensitive to light, occlusion, and placement, which could impact the tracking of the skeletal joints and the outcomes from the framework. Future studies are needed to improve tracking in different environments. Second, this study relied on labelling annotated by human coders, and there is a potential for bias to impact the coding. Future studies are needed to explore the relationship between human and computerised coding. Third, the detection of outlier frames may have impacted the detection of phases. Future work should explore the use and reliability of interpolation methods, such as [35], to replace outlier frames with an estimation of the correct frame. Finally, this study presented multiple analyses; however, we should consider how a clinician would interpret these results. Future work should explore how such data should be presented in a clinical context.
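As a minimal sketch of the interpolation strategy suggested above, frames flagged as outliers can be replaced by linear interpolation from the nearest valid frames. The example operates on a one-dimensional signal; a skeletal stream would apply the same replacement per joint coordinate. This illustrates the general approach, not the specific method of [35].

```python
import numpy as np

def interpolate_outliers(signal, outlier_mask):
    """Replace frames flagged in outlier_mask (boolean array) with values
    linearly interpolated from the surrounding valid frames."""
    signal = np.asarray(signal, dtype=float).copy()
    good = ~np.asarray(outlier_mask)
    idx = np.arange(len(signal))
    # np.interp estimates each flagged frame from its valid neighbours
    signal[~good] = np.interp(idx[~good], idx[good], signal[good])
    return signal
```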
We have proposed a framework which unites depth sensor technology and feature extraction to assess the sit-to-stand motion sequence. The framework has been shown to be reliable and accurate in evaluating the transition phases and providing clinical outcome measures. Future work will focus on clinical validation, increasing the number of participants, improving reliability, and extending the framework to analyse a wider range of motions.