1. Introduction
In recent years, motor dysfunction accompanying cervical spondylosis, stroke, Alzheimer’s disease, and other conditions has seriously affected people’s physical health and quality of life. Additionally, the proportion of individuals experiencing motor function disorders due to various accidents is also on the rise [1]. Research has shown that recovering the ability to engage in activities of daily living as early as possible is the main goal of rehabilitation for patients, and specific daily movement training, such as activities of daily living training, is the main means of improving the probability of limb recovery [2].
Traditional action training programs suffer from over-reliance on doctors’ professional ability, boring and cumbersome processes, and low training efficiency [3]. With the development of technologies such as artificial intelligence and big data, intelligent rehabilitation medical treatment has received widespread attention [4]. Intelligent rehabilitation medical treatment records and analyzes the evaluation process through an intelligent action evaluation system, which simplifies training, saves human resources, and offers high convenience, efficiency, and accuracy [5]. Therefore, the automatic assessment of daily movements in rehabilitation training can standardize assessment criteria and is of great significance in providing training guidance [6].
Research on human action evaluation can be categorized into two types based on the method of data acquisition: research based on wearable sensors and research based on non-wearable sensors [7]. The former uses inertial sensors and potentiometers in wearable devices, which are cost-effective and free from angle occlusion issues but may cause discomfort and restrict natural action [8]. The latter employs vision sensors such as Kinect and Xtion PRO to provide natural, convenient, and non-intrusive 3D key-point data acquisition, which is essential to detecting abnormal actions and evaluating action quality [9]. Due to the cost and data advantages of non-wearable sensors, research in this area is flourishing. Ma used Kinect sensors to obtain depth images, proposed the use of multiple Kinect sensors to detect human key points and thus solve the action occlusion problem, and realized action quality evaluation for rehabilitation training [10]. Bruce compared human key-point data obtained from Kinect v2 and the Vicon optical motion capture system to achieve human action evaluation, including anomaly detection and quality evaluation [11]. Wang used an Azure Kinect camera (Microsoft, Redmond, WA, USA) to capture a 3D model of the human body and selected actions such as standing, stepping, leg lifting, and squatting to calculate joint angles, enabling knee function evaluation and exercise evaluation [12].
In terms of model building and algorithm research, a growing body of literature explores action evaluation methods based on machine learning. Ding quantitatively evaluated features through multiple supervised learning models and employed end-to-end recurrent neural networks to improve action classification and evaluation [13]. Wang used OpenPose to extract human key points and angle features from RGB images and successfully implemented tennis action evaluation [14]. Li addressed the need for the automation of action evaluation in rehabilitation medicine and used time-domain filtering and a convolutional neural network to classify fine actions, achieving efficient action evaluation [5].
Most current studies quantify action quality by comparison with manually predefined templates, an approach that suffers from inconsistent time-series lengths. Some research has proposed time alignment algorithms such as Dynamic Time Warping (DTW) and its variants to address this problem [15]. Zhou extended canonical correlation analysis to canonical time warping to solve the temporal alignment problem of human action sequences [16]. Gong combined manifold learning and a new robust similarity metric to solve the temporal synchronization problem by using dynamic manifold warping, extending previous temporal alignment studies [17]. Fan constructed action standard datasets and designed a distance function combined with a DTW algorithm for key-frame extraction and action evaluation [18]. While these methods succeed in resolving the inconsistency of time-series lengths, they produce errors in calculating angles and distances between key points and do not weight the angle and distance features.
To address the above problems, Section 2 of this article proposes an intelligent rehabilitation action evaluation system (IRAES) framework based on the Feature-Weighted DTW (FW-DTW) algorithm. Section 3, Section 4, and Section 5 present data acquisition and processing methods, action segmentation and feature extraction strategies, and template creation and action evaluation algorithms, respectively. Section 6 provides the experimental results. Finally, Section 7 concludes the paper.
2. Intelligent Rehabilitation Action Evaluation System (IRAES) Framework
Rehabilitation therapy usually requires regular action training at a specialized medical center, which is often costly and time-consuming. Therefore, it makes sense to set up an action evaluation system at home or in an outpatient clinic. Under the guidance of a professional doctor, the quality of action training is fed back through the network, and the action is evaluated with reference to the action rating scale, which yields an action achievement score. This study utilizes the intelligent rehabilitation action evaluation system (IRAES) framework, which combines the traditional patient–clinician assessment model and the new home monitoring assessment model [19], as shown in Figure 1.
- B.
System Component
In order to establish a link between rehabilitation training and action evaluation, this paper leverages the IRAES framework, adopts the FW-DTW algorithm to evaluate daily upper-limb actions, and proposes an action achievement score method to enhance evaluation accuracy. The whole system mainly includes modules such as data acquisition, data processing, action segmentation, feature extraction, action evaluation, and achievement score, as shown in Figure 2. The Azure Kinect sensor was used to collect videos of four daily actions, that is, drinking water, combing hair, touching shoulders, and touching pockets, to obtain the 3D coordinate data of the key points of the human skeleton. Data filtering and gap filling algorithms were used for data processing. The three-axis variance and difference of the main key points were calculated through a sliding window to find the peaks and valleys, and the start and end points of the action segments were determined for action segmentation. Human structure vectors were constructed, the joint angles and inter-joint distances were calculated, and features were normalized to form a feature matrix. The DTW Barycenter Averaging (DBA) algorithm was used to generate an action template, while the DTW algorithm matched the feature matrix to the action template sequence. The similarity between the two sequences was calculated, and a score mechanism was established to evaluate action performance.
3. Data Acquisition and Processing
An Azure Kinect depth camera was used to collect 3D coordinate data of 32 key points of the human skeleton, and the skeleton diagram composed of these 32 key points is shown in Figure 3. This paper evaluates upper-limb action by selecting key points including the nose, head, neck, shoulders, elbows, hands, wrists, thoracic spine, lumbar spine, spine base, and hips; these key points are highlighted with colored circles in Figure 3. Key points such as the eyes, collarbones, fingertips, and lower limbs exert minimal impact on action evaluation. Since the hand and thumb are close, it is sufficient to choose one of the two, represented by a gray circle, which can reduce the data dimension and improve the efficiency of action evaluation. During the basic action, where the person remains stationary, the remaining six key points, shown in black circles, are selected as reference points for action evaluation.
- B.
Data processing
During data collection, due to the occlusion of key points, ambient light, clothing worn, external interference, and equipment placement, there may be data gaps and noise, resulting in errors between experimental values and real values and loss of key information. In order to reduce the impact of these errors, gap filling and data filtering are required:
- (1)
Gap filling: In this paper, segmented cubic spline interpolation is used to fill the gaps. The main idea is to divide the data series into several intervals and fit the data in each interval with a cubic polynomial, from which the missing values are calculated. The fitted curve is represented by the segmented function $S(x)$. Given $n + 1$ data points $(x_i, y_i)$, where $i = 0, 1, \ldots, n$ and $x_i \in [a, b]$, the interval $[a, b]$ is divided into $n$ intervals $[x_0, x_1], [x_1, x_2], \ldots, [x_{n-1}, x_n]$, where the left endpoint is $x_0 = a$ and the right endpoint is $x_n = b$, and a cubic polynomial of the form $S_i(x) = a_i + b_i x + c_i x^2 + d_i x^3$ is constructed in each interval. $S(x)$ must meet the following conditions:
- (1)
Interpolation: the spline passes through every data point, that is, $S(x_i) = y_i$, where $i = 0, 1, \ldots, n$.
- (2)
In each interval $[x_i, x_{i+1}]$, $S(x)$ is a cubic polynomial.
- (3)
Curve smoothing: $S(x) \in C^2[a, b]$, that is, $S(x)$, $S'(x)$, and $S''(x)$ are continuous over the interval $[a, b]$.
$S(x)$ can be determined once boundary conditions are specified. Based on the cubic form of $S(x)$, each interval contains four unknowns. For $n$ intervals, this results in $4n$ unknowns, requiring an equal number of equations ($4n$) to solve. In total, $4n - 2$ equations can be obtained from the three conditions above, and the remaining two equations are obtained from the boundary conditions, which are of three types:
- (1)
The second-order derivatives at the endpoints are specified as $M_0$ and $M_n$, that is, $S''(x_0) = M_0$ and $S''(x_n) = M_n$; in particular, when $M_0 = M_n = 0$, that is, $S''(x_0) = S''(x_n) = 0$, they are called natural boundary conditions.
- (2)
The first-order derivatives at the endpoints are specified as $m_0$ and $m_n$, that is, $S'(x_0) = m_0$ and $S'(x_n) = m_n$.
- (3)
$S(x)$ is a periodic function with period $(b - a)$: $S^{(k)}(x_0 + 0) = S^{(k)}(x_n - 0)$, with $k = 0, 1, 2$.
Spline curves under the three different boundary conditions were compared, where boundary condition 1 specifies the second-order derivative at the endpoints, boundary condition 2 specifies the first-order derivative at the endpoints, and boundary condition 3 treats the curve as a periodic function within the interval; the results are shown in Figure 4.
As the figure shows, the spline curves under the three boundary conditions differ noticeably at the endpoints, while the curves remain almost unchanged in the middle section. Boundary condition 1 in its natural form, where the second derivative at the endpoints is set to zero, results in the smoothest and most natural transition at the endpoints. This condition avoids unnecessary oscillations and ensures a smooth, continuous curve that better reflects natural motion trends, especially in applications like human body key-point data interpolation. For motion data, the velocity and acceleration at the endpoints usually have physical significance; compared with prescribing first-order derivatives or imposing periodicity, setting the second derivative (i.e., the acceleration) at the endpoints to zero allows the interpolation curve to better align with the actual physical laws of motion. Additionally, experiments demonstrated that boundary condition 1 yields more accurate and realistic results in handling key-point drift and gap filling during fast movement, without introducing errors or unnatural changes. Its simplicity and computational efficiency further make it a suitable choice, especially when dealing with large amounts of data. Therefore, boundary condition 1 is chosen for segmented cubic spline interpolation in this study.
- (2)
Data filtering: This paper adopts the moving average method to remove noise from the key-point data. The filter window size was set to 1, 3, 5, and 7, and the X-axis data of one key point were mean-filtered; the filtering effect is shown in Figure 5.
As shown in the figure, with the increase in the window size, the data noise points are gradually reduced, and the smoothness of the curve is gradually increased, but when the window size is 7, the original data are changed to a large extent, and the accuracy of the experimental results cannot be guaranteed. In order to eliminate the influence of data noise, drift, etc., without changing the accuracy of the original data, it is appropriate to select the window size of 5 for the moving average.
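The two preprocessing steps above can be illustrated with a minimal Python sketch. It is not the implementation used in this study (which was written in C++); the function names `natural_cubic_spline`, `fill_gaps`, and `moving_average` are hypothetical, the frame index is assumed to serve as the x-coordinate, and missing samples are assumed to be marked with `None`. The spline uses the natural boundary condition $S''(x_0) = S''(x_n) = 0$, and the filter is a centered moving average whose window shrinks at the ends of the series (an assumption, since the edge handling is not specified in the text).

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return S(x) through the knots (xs, ys) with S''(x0) = S''(xn) = 0."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    M = [0.0] * (n + 1)  # second derivatives at the knots; the end values stay 0
    if n >= 2:
        # Tridiagonal system for the interior second derivatives M[1..n-1].
        diag = [2.0 * (h[j] + h[j + 1]) for j in range(n - 1)]
        rhs = [6.0 * ((ys[j + 2] - ys[j + 1]) / h[j + 1]
                      - (ys[j + 1] - ys[j]) / h[j]) for j in range(n - 1)]
        for j in range(1, n - 1):          # forward elimination
            f = h[j] / diag[j - 1]
            diag[j] -= f * h[j]
            rhs[j] -= f * rhs[j - 1]
        for j in range(n - 2, -1, -1):     # back substitution
            M[j + 1] = (rhs[j] - h[j + 1] * M[j + 2]) / diag[j]

    def S(x):
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((M[i] * a ** 3 + M[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - M[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6.0) * b)

    return S


def fill_gaps(series):
    """Replace None entries with natural-spline values (frame index as x)."""
    known = [(i, v) for i, v in enumerate(series) if v is not None]
    S = natural_cubic_spline([i for i, _ in known], [v for _, v in known])
    return [v if v is not None else S(i) for i, v in enumerate(series)]


def moving_average(series, window=5):
    """Centered moving average; the window shrinks near the ends."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out
```

For one coordinate of one key point, a call such as `moving_average(fill_gaps(x_series), window=5)` would apply both steps in the order described above; at least two valid samples are needed for the interpolation.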
4. Action Segmentation and Feature Extraction Strategy
- A.
Action Segmentation
The collected action data included invalid actions, such as the preparation process at the beginning of the action and the static process at the end; the invalid information should be removed, and the effective action data should be segmented before action feature extraction to improve the efficiency and accuracy of action evaluation. The action segmentation process is as follows:
① Set the length of the window to w and the fixed sliding step to l, where the units are all the number of samples;
② Select the hand key points, and calculate the sum of the X-, Y-, and Z-axis differences, $E_k$, within each sliding window with the following equation:
$$E_k = \sum_{i=2}^{w} \left( \left| x_i^k - x_{i-1}^k \right| + \left| y_i^k - y_{i-1}^k \right| + \left| z_i^k - z_{i-1}^k \right| \right) \tag{1}$$
where $E_k$ is the sum of the X-, Y-, and Z-axis differences of the hand key points at moment $k$; $x_i^k$, $y_i^k$, and $z_i^k$ are the hand key-point X-, Y-, and Z-coordinates of the $i$th point in the window at moment $k$, respectively; and $x_{i-1}^k$, $y_{i-1}^k$, and $z_{i-1}^k$ are the X-, Y-, and Z-coordinates of the hand key points of the $(i-1)$th point in the window at moment $k$, respectively.
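③ Calculate the sum of the X-, Y-, and Z-axis variances, $V_k$, of the key points of the hand within the sliding window with the following equation:
$$V_k = \frac{1}{w} \sum_{i=1}^{w} \left[ \left( x_i^k - \bar{x}^k \right)^2 + \left( y_i^k - \bar{y}^k \right)^2 + \left( z_i^k - \bar{z}^k \right)^2 \right] \tag{2}$$
where $V_k$ is the sum of the X-, Y-, and Z-axis variances of the hand key points at moment $k$, and $\bar{x}^k$, $\bar{y}^k$, and $\bar{z}^k$ are the mean values of X, Y, and Z in the window, respectively.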
④ Draw the V-value curve, and look for the peaks in the curve that meet the trajectory threshold condition; these peaks constitute the trajectory sequence $V_p = \{V_i \mid V_i > t_v\}$, where $t_v$ is the V-value trajectory threshold, the setting of which removes the interference generated by key-point judder.
⑤ Find the start point of the action by traversing the elements in the trajectory sequence $V_p$ from front to back. If the difference value at the sampling point of the current peak satisfies $E_k > t_E$, subtract the start-point window-length offset value $m_s$ from the current sampling point to obtain the start point of the action; if $E_k < t_E$, the sampling point is judged to be an interference point, and the next peak is examined. $t_E$ is the difference threshold, which is used to eliminate the interference generated by the judder of the key points of the hand.
⑥ Find the end point of the action, and traverse the elements in the trajectory sequence Vp from back to front. The judgement step is the same as in point ⑤; find the sampling point that meets the condition, add the sampling point to the end-point window length offset value me, and obtain the end point of the action.
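Steps ① to ⑥ can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: the names `window_stats` and `segment` are hypothetical, the difference sum uses absolute frame-to-frame differences, the variance is the population variance, a peak is taken to be a local maximum of the V curve above the mean of V, and the window index $k$ is mapped to the sample index of the window start. The defaults follow the parameter values reported in Section 6 ($w = 10$, $l = 1$, $t_E = 10$, $m_s = 1.5w$, $m_e = 3w$).

```python
def window_stats(points, w=10, step=1):
    """points: list of (x, y, z) hand positions. Returns (E, V), one value per window."""
    E, V = [], []
    for start in range(0, len(points) - w + 1, step):
        win = points[start:start + w]
        e = v = 0.0
        for axis in range(3):
            vals = [p[axis] for p in win]
            # Sum of absolute differences between consecutive samples.
            e += sum(abs(vals[i] - vals[i - 1]) for i in range(1, w))
            # Population variance of this axis within the window.
            mean = sum(vals) / w
            v += sum((x - mean) ** 2 for x in vals) / w
        E.append(e)
        V.append(v)
    return E, V


def segment(points, w=10, step=1, t_e=10.0):
    """Return (start, end) sample indices of the action, or None if no valid peak exists."""
    ms, me = int(1.5 * w), 3 * w           # window-length offsets
    E, V = window_stats(points, w, step)
    t_v = sum(V) / len(V)                  # V threshold: mean of the V sequence
    # Trajectory sequence: local maxima of V above the threshold.
    peaks = [k for k in range(1, len(V) - 1)
             if V[k] > t_v and V[k - 1] < V[k] >= V[k + 1]]
    # Peaks whose difference sum is too small are treated as judder.
    valid = [k for k in peaks if E[k] > t_e]
    if not valid:
        return None
    start = max(0, valid[0] * step - ms)                 # first valid peak, shifted forward
    end = min(len(points) - 1, valid[-1] * step + me)    # last valid peak, shifted backward
    return start, end
```

Traversing the list of valid peaks from the front and from the back reduces to taking its first and last elements, which is why no explicit loop appears for steps ⑤ and ⑥.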
- B.
Feature extraction
In each frame of the human body key-point data, 16 main key points are selected and combined in pairs to form 15 vectors representing the human body structure. In addition, two auxiliary vectors for calculating angles are added, which are from the left shoulder pointing to the left hip and from the right shoulder pointing to the right hip, for a total of 17 vectors. The projection of these vectors on the XOY plane is shown in
Figure 6.
Angle values are formed between structure vectors, and the changes in angle values reflect different action trends, so angle features can be extracted to evaluate actions. Through analysis, it is found that when different people perform the same action, the angle changes formed by key points such as shoulders, elbows, wrists, and hips are basically the same. Therefore, in this paper, we select four angle features consisting of these key points to evaluate the actions, as shown in Table 1.
Angle features can only reflect the action trend, and the distance features between different key points are also needed to supplement the description of action details for similar actions. Through analysis, it is found that the distance features between the hand and the nose, the head, the right shoulder, and the right hip are more obvious and can allow for distinguishing different details of the same action, so these eight distance features are selected and normalized in this paper, as shown in Table 2.
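Combining angle features and distance features to form a feature matrix can provide multi-dimensional action information for action evaluation and reduce the error of action evaluation. Assuming that the angle feature of the $m$th action sequence is $A_m = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_k]^{T}$, where $\mathbf{a}_i$ is the angle feature vector of the $i$th frame, and the distance feature is $D_m = [\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_k]^{T}$, where $k$ is the number of samples, the combination of angle and distance features forms the action feature matrix $F_m$, as shown in the following equation:
$$F_m = \begin{bmatrix} A_m & D_m \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1^{T} & \mathbf{d}_1^{T} \\ \mathbf{a}_2^{T} & \mathbf{d}_2^{T} \\ \vdots & \vdots \\ \mathbf{a}_k^{T} & \mathbf{d}_k^{T} \end{bmatrix} \tag{3}$$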
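As an illustration of how one row of the feature matrix could be assembled, the following Python sketch computes two joint angles and one normalized distance for a single frame. It is a sketch under assumptions, not the feature set of Tables 1 and 2: the key-point names in the `kp` dictionary, the choice of angles, the division of angles by 180, and the normalization of distances by a reference length `ref_len` are all hypothetical.

```python
import math


def vec(p, q):
    """Vector pointing from key point p to key point q."""
    return tuple(b - a for a, b in zip(p, q))


def angle_deg(u, v):
    """Angle between two 3D vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


def frame_features(kp, ref_len):
    """kp maps a key-point name to its (x, y, z) position in one frame."""
    elbow = angle_deg(vec(kp["elbow"], kp["shoulder"]), vec(kp["elbow"], kp["wrist"]))
    shoulder = angle_deg(vec(kp["shoulder"], kp["elbow"]), vec(kp["shoulder"], kp["hip"]))
    hand_nose = math.dist(kp["hand"], kp["nose"]) / ref_len
    # Angle features first, distance features last, as in the feature matrix.
    return [elbow / 180.0, shoulder / 180.0, hand_nose]


def feature_matrix(frames, ref_len):
    """Stack per-frame feature vectors: one row per frame."""
    return [frame_features(kp, ref_len) for kp in frames]
```

Keeping the angle columns before the distance columns makes it straightforward to apply separate weights to the two groups later on.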
5. Template Creation and Action Evaluation
- A.
The overall framework of the action evaluation process
In action evaluation, action templates are needed in order to evaluate the subject’s action achievement more objectively. In this paper, we use the DBA algorithm to create a unique action template for each type of action, which reduces the amount of computation and eliminates the arbitrariness of selecting a single sample as the template; then, we use the DTW algorithm to measure the similarity between the template action and the action to be tested, and we establish a score mechanism for action achievement based on the action rating scale [20] to realize the action evaluation; the flowchart is shown in Figure 7.
- B.
DTW algorithm
DTW is an algorithm that can compute the similarity of two time series of different lengths. It evaluates the similarity of two action sequences by dynamically adjusting the length of the action sequence to be measured and calculating the cumulative shortest distance from the template action sequence.
Let us suppose that there are two time series $x(i)$ and $y(j)$, where $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$. The warping path is denoted by $W$ to represent the alignment or mapping of the time series $x(i)$ and $y(j)$, $W = \{w_1, w_2, \ldots, w_p\}$, and $p$ denotes the length of $W$. The warping path should satisfy the three constraints of boundary condition, continuity, and monotonicity.
Construct an $m \times n$ lattice matrix whose rows and columns correspond to the elements of $x(i)$ and $y(j)$, with sequence lengths $m$ and $n$, respectively, as shown in Figure 8. The DTW algorithm accumulates $d(i, j)$ to find the grid points with the shortest cumulative distance $\gamma(i, j)$ and thereby plan the optimal path. The calculation of the cumulative distance $\gamma(i, j)$ of any point in the grid is shown in the following equation:
$$\gamma(i, j) = d(i, j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\} \tag{4}$$
where $d(i, j)$ is the Euclidean distance between the $i$th element of the $x(i)$ sequence and the $j$th element of the $y(j)$ sequence, $\gamma(i, j)$ is the cumulative distance along the path from (1, 1) to $(i, j)$, and $\min\{\cdot\}$ denotes the selection of the point with the smallest cumulative distance.
In this paper, $x(i)$ is replaced by the tested action feature matrix $F_m$, and $y(j)$ is replaced by the template action feature matrix $F'_m$. Since the feature matrix is a combination of angle features and distance features, $d(i, j)$ in Equation (4) is calculated as shown in Equation (5):
$$d(i, j) = \sqrt{\sum_{k=1}^{l} \left( F_m(i, k) - F'_m(j, k) \right)^2} \tag{5}$$
where $F_m(i, k)$ is the $k$th feature value of the $i$th frame feature vector in $F_m$, $F'_m(j, k)$ is the $k$th feature value of the $j$th frame feature vector in the template action feature matrix $F'_m$, and $l$ is the number of feature values.
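The cumulative-distance recursion and the Euclidean frame distance described above can be sketched in Python as follows. The function names `frame_distance` and `dtw_distance` are hypothetical, each sequence is assumed to be a list of equal-length feature vectors (one per frame), and no warping-window constraint is applied; this is the textbook dynamic program rather than the authors' C++ implementation.

```python
import math


def frame_distance(f, g):
    """Euclidean distance between two frame feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


def dtw_distance(x, y):
    """Cumulative shortest distance between frame sequences x and y."""
    m, n = len(x), len(y)
    inf = float("inf")
    # gamma[i][j] holds the cumulative distance from (1, 1) to (i, j);
    # row 0 and column 0 are padding that enforces the boundary condition.
    gamma = [[inf] * (n + 1) for _ in range(m + 1)]
    gamma[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = frame_distance(x[i - 1], y[j - 1])
            gamma[i][j] = d + min(gamma[i - 1][j - 1],  # diagonal step
                                  gamma[i - 1][j],      # advance in x only
                                  gamma[i][j - 1])      # advance in y only
    return gamma[m][n]
```

Because only steps to the right, downward, or diagonally are allowed, the continuity and monotonicity constraints on the warping path are satisfied by construction. The table costs O(mn) time and memory, which is acceptable for short segmented actions.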
- C.
Template creation
In this paper, the DBA algorithm is used to obtain the average time series of multiple action template time series of each type of action to obtain a unique template of each type of action, which reduces the matching calculation time between the action time series to be measured and multiple template action time series, improves the efficiency of action evaluation, and eliminates the contingency of selecting action templates.
The input of DBA is a set of time series, and the output is the average sequence of the set. The purpose of DBA is to calculate an average sequence that minimizes the sum of squares of the DTW distances to all sequences in the set. DTW alignment is first performed on each time series so that all sequences are mapped to the same timeline. The barycenter of the points mapped to each position is then calculated to obtain the initial value of the average sequence. Next, each sequence is aligned to the average sequence by DTW, and the barycenter of the points mapped to each element of the average sequence is calculated to obtain the new value of the average sequence. If the average sequence changes, this update is repeated until the average sequence converges. DBA is therefore an iterative algorithm, and the flowchart is shown in Figure 9.
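The iteration described above can be sketched in Python as follows. This is a compact illustration under assumptions, not the authors' implementation: the names `dtw_path` and `dba` are hypothetical, the first sequence of the set is used as the initial average (the initialization is a free choice in DBA), the alignment cost is the squared Euclidean distance between frames, and the loop stops after a fixed number of iterations or when the average no longer changes.

```python
def dtw_path(a, b):
    """Optimal warping path [(i, j), ...] between frame sequences a and b."""
    m, n = len(a), len(b)
    inf = float("inf")
    g = [[inf] * (n + 1) for _ in range(m + 1)]
    g[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1]))
            g[i][j] = cost + min(g[i - 1][j - 1], g[i - 1][j], g[i][j - 1])
    # Backtrack from the last cell to the first one.
    path, i, j = [], m, n
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((g[i - 1][j - 1], i - 1, j - 1),
                      (g[i - 1][j], i - 1, j),
                      (g[i][j - 1], i, j - 1))
    return path[::-1]


def dba(sequences, iterations=10):
    """Average sequence of a set of frame sequences (lists of feature vectors)."""
    avg = [list(frame) for frame in sequences[0]]  # initial average (assumption)
    for _ in range(iterations):
        # Collect, for every element of the average, the frames aligned to it.
        buckets = [[] for _ in avg]
        for seq in sequences:
            for i, j in dtw_path(avg, seq):
                buckets[i].append(seq[j])
        # Barycenter update: component-wise mean of the aligned frames.
        new_avg = [[sum(col) / len(frames) for col in zip(*frames)]
                   for frames in buckets]
        if new_avg == avg:  # converged
            break
        avg = new_avg
    return avg
```

The resulting template has the same length as the initial sequence, so the choice of initial sequence also fixes the template length.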
- D.
Action Evaluation
- (1)
DTW algorithm improvement: this paper improves the Euclidean distance calculation method in the DTW algorithm.
Euclidean distance calculation method based on feature weights: Data point drift of the selected key points during movement has a greater impact on angle features than on distance features, so the error of action evaluation becomes larger, especially when the actions are close to each other. Therefore, this paper proposes a Euclidean distance calculation method based on feature weights, setting the weight shares of angle features and distance features in action features as $w_1$ and $w_2$, respectively, where the sum of $w_1$ and $w_2$ is 1. Equation (5) is changed to the following equation:
where $F_m(i, k_1)$ is the $k_1$th feature value of the angle feature vector of the $i$th frame in the action feature matrix $F_m$, $F'_m(j, k_1)$ is the $k_1$th feature value of the angle feature vector of the $j$th frame in the action feature matrix $F'_m$, and $l_1$ is the number of angle feature values; $F_m(i, k_2)$ is the $k_2$th feature value of the distance feature vector of the $i$th frame in the action feature matrix $F_m$, $F'_m(j, k_2)$ is the $k_2$th feature value of the distance feature vector of the $j$th frame in the action feature matrix $F'_m$, and $l_2$ is the number of distance feature values.
- (2)
Similarity calculation: Action evaluation requires the actions to be quantified, evaluated, and compared to other actions by similarity calculations. Similarity measures can be used to compare how similar different people are when performing particular actions. It can reflect an individual’s action control and determine their level of performance when executing a specific action.
In this paper, DTW is used to calculate the distance between the action sequence to be tested and the template sequence; this distance is a relative quantity and cannot clearly reflect the degree of similarity between the two sequences. Therefore, after the cumulative shortest distance is obtained by DTW calculation, the similarity between the action sequence to be tested and the template sequence needs to be calculated in order to determine the degree of achievement of the action [21]. The specific calculation formula is shown in the following equation:
where $s$ denotes the similarity between the action sequence to be tested and the template sequence, with a value in the range of [0, 1]; $\gamma$ is the cumulative shortest distance between the action sequence to be tested and the template sequence obtained by DTW; and $k$ is the larger of the numbers of elements in the action sequence to be tested and the template sequence.
- (3)
Achievement score: In rehabilitation medicine, it is crucial to observe a patient’s motor performance and score his or her action achievement according to specific scoring criteria to evaluate the patient’s performance in rehabilitation training. Scoring criteria usually include requirements such as correct action form, speed, balance, and coordination. Assessors can use a scoring system to determine the patient’s level of rehabilitation, track the patient’s progress based on the score results, and adjust the treatment plan accordingly. This is an important aspect of rehabilitation medicine that aims to help patients return to a normal level of living and improve their quality of life. In this paper, the action performance is linked to the action score, and a five-point scale is used to score action performance under the guidance of a professional doctor and with reference to the relevant literature [22] and the Action Rating Scale [23].
In rehabilitation action evaluation, a more standardized score mechanism is important for rehabilitation trainers and can provide more reference for subsequent treatment. When the FW-DTW algorithm is used to calculate action similarity, the closer the similarity is to 1, the higher the degree of achievement of the action sequence to be tested relative to the template action. Therefore, this paper links action similarity with action score, sets the correspondence between action score and action similarity with reference to the action rating scale, and establishes a five-point action achievement score mechanism, where scores of 0 to 4 correspond to different levels of similarity, as shown in Table 3.
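The feature-weighted distance in item (1) above can be illustrated with the following Python sketch. The exact form of the weighted equation is not reproduced here, so the sketch assumes one plausible reading: the squared angle differences and the squared distance differences are weighted by $w_1$ and $w_2$ inside the square root. The names `fw_distance` and `fw_dtw` are hypothetical, the first `l1` columns of each frame are assumed to be angle features, and the default weights are the values reported in Section 6 ($w_1 = 0.2$, $w_2 = 0.8$). The similarity formula and the thresholds of Table 3 are not reproduced.

```python
import math


def fw_distance(f, g, l1, w1=0.2, w2=0.8):
    """Feature-weighted frame distance (assumed form: weights inside the root).

    f, g: frame feature vectors whose first l1 entries are angle features
    and whose remaining entries are distance features.
    """
    angle_part = sum((a - b) ** 2 for a, b in zip(f[:l1], g[:l1]))
    dist_part = sum((a - b) ** 2 for a, b in zip(f[l1:], g[l1:]))
    return math.sqrt(w1 * angle_part + w2 * dist_part)


def fw_dtw(test, template, l1, w1=0.2, w2=0.8):
    """Cumulative shortest distance using the feature-weighted frame distance."""
    m, n = len(test), len(template)
    inf = float("inf")
    gamma = [[inf] * (n + 1) for _ in range(m + 1)]
    gamma[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = fw_distance(test[i - 1], template[j - 1], l1, w1, w2)
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    return gamma[m][n]
```

With $w_1 < w_2$, an error of a given size in an angle feature contributes less to the cumulative distance than the same error in a distance feature, which is the intended effect of down-weighting the drift-sensitive angle features.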
6. Numerical Experiments
- A.
Preliminary
- (1)
Experimental environment: In this paper, we used a computer with an Intel i5-10400 CPU (Intel, Santa Clara, CA, USA), an RTX 2070 graphics card, and 16 GB of RAM; we used Azure Kinect sensors (Microsoft, Redmond, WA, USA) to collect motion data, C++ to write the program, and Visual Studio 2017 to compile it. Since Azure Kinect is no longer commercially available, it can be replaced with another depth camera.
- (2)
Data acquisition: Self-constructed healthy person action datasets, which include four types of daily activity actions, i.e., drinking water, combing hair, touching the opposite shoulder, and touching the back pocket, were used; each action type involved 10 healthy subjects to conduct the experiment, where each person completed the action process once in a natural state, resulting in a total of 40 sets of action data. Among the 10 groups of data for each type of action, 4 groups were randomly selected to generate templates for each type of action, and the remaining 6 groups were used as test groups. In the self-constructed patient action datasets, each type of action included experimental data completed by three patients, each of whom completed the action once in their natural state, and a total of 12 sets of action data were obtained.
- B.
Action segmentation
By setting the window length $w = 10$ and the sliding step $l = 1$, we calculated the three-axis difference sum and variance sum of the hand joints through the sliding window. We set the V-value trajectory threshold $t_v$ to the average of the sequence data, the difference threshold $t_E = 10$, and the window-length offset values $m_s = 1.5w$ and $m_e = 3w$ for the segmentation of the four kinds of actions of healthy people and patients. Figure 10 shows the results for the drinking action as an example.
In the healthy person’s action, there are two peaks in the drinking action; the first peak is shifted forward by the start-point window offset and the last peak is shifted backward by the end-point window offset to obtain the start point and end point of the action, respectively. There are two to three trajectory peaks and some small trajectory peaks in the patients’ actions, which are caused by abnormal hand actions and the limb fluttering of the patient, indicating that it is more difficult for the patient to carry out actions with a large spatial span. The results show that the method used in this experiment can reliably segment the start and end points of the action for both healthy subjects and patients.
- C.
Feature extraction
- (1)
Angle features: Based on the key-point vectors of the human body, we could obtain angle features of healthy people and patients.
Figure 11 shows the results for the drinking action as an example.
As can be seen from the figure, the patients performed the drinking action with varying degrees of differences in the angles of the vectors compared with the healthy individuals, which indicates that the patients performed the action with significant dyskinesia, especially in terms of large-scale action and elbow bending.
- (2)
Distance features: Based on the distance of the human body reference points in the Y-axis direction, we normalized the distance and obtained the distance features for healthy people and patients. Figure 12 shows the results for the drinking action as an example.
By comparing the distance characteristic plots of healthy people and patients and analyzing the changes in distance during the drinking action, it was found that the patients performed the action with greater fluctuations in the distance characteristic curves, shorter periods of calmness, and obvious hand shaking, which prevented them from completing the action.
- D.
Action Evaluation
- (1)
Similarity calculation: The feature matrix DTW algorithm and the FW-DTW algorithm proposed in this paper were both applied to healthy people and to patients, and the similarity of their actions was calculated and averaged to compare the differences between the two methods in healthy individuals and patients; the results are shown in Table 4. After extensive experimental testing, the weighting ratio of the angle features in the action features was set to $w_1 = 0.2$, and the weighting ratio of the distance features in the action features was set to $w_2 = 0.8$.
By comparing the similarity averages in Table 4, it can be found that, for healthy individuals, the FW-DTW algorithm improved the similarity scores of drinking water and combing hair by 0.0003 each and those of touching the opposite shoulder and touching the back pocket by 0.0017 and 0.0013, respectively. For patients, the similarity scores of drinking water, combing hair, touching the opposite shoulder, and touching the back pocket increased by 0.0821, 0.0614, 0.1019, and 0.2271, respectively. This is because the feature matrix-based algorithm integrates angle and distance features, while the feature weight-based algorithm reduces the error caused by the angle features and thus improves the similarity, indicating that the feature weight-based DTW algorithm contributes to improving the similarity of actions, most clearly for patients. Additionally, by comparing the average similarity values for each type of action between healthy individuals and patients, it was found that the similarity between the template actions and those performed by healthy individuals differed by more than 80% from those performed by patients. This indicates that the proposed evaluation method can effectively assess the action quality of individuals with varying health conditions, identifying those with limb disabilities.
- (2)
Achievement score: In rehabilitation action evaluation, for the four actions of drinking water, combing hair, touching the opposite shoulder, and touching the back pocket, we used the FW-DTW algorithm to compute action similarity for six healthy people and three patients and assigned a score to each result. The action score–similarity comparison table is shown in Table 5.
Mathematical fitting showed that the score–similarity relationship is linear.
A polynomial was fitted to the score–similarity data, and the results are shown in Figure 13; the confidence level of the fitted parameters reaches 99%, which meets the needs of practical applications.
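The fitting step can be sketched as an ordinary least-squares fit of a first-degree polynomial, consistent with the linear relation stated above. The similarity and score values below are made-up placeholders, not the data in Table 5, and the clamping of predicted scores to a 1 to 5 range is an assumption about how a five-point score might be bounded.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_score(sim, slope, intercept, lo=1.0, hi=5.0):
    """Map a similarity value to a score, clamped to [lo, hi] (assumed bounds)."""
    return max(lo, min(hi, slope * sim + intercept))


# Placeholder values for illustration only; NOT the data from Table 5.
similarities = [0.2, 0.4, 0.6, 0.8, 1.0]
scores = [1.0, 2.0, 3.0, 4.0, 5.0]

slope, intercept = fit_line(similarities, scores)
print(predict_score(0.7, slope, intercept))
```

Once the two coefficients are fitted, a new action can be scored directly from its similarity value without consulting the comparison table.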
7. Conclusions
This article proposes the intelligent rehabilitation action evaluation system (IRAES) framework, which improves the accuracy of action similarity assessment by improving the distance calculation in the DTW algorithm and provides a novel approach for the development of intelligent rehabilitation systems. Compared with traditional rehabilitation action evaluation methods, DTW-based evaluation using a feature matrix, and machine learning-based evaluation, the IRAES makes the following three main contributions:
(1) Firstly, this paper proposes an intelligent rehabilitation action evaluation system (IRAES) framework based on the Kinect sensor, addressing the inefficiency, cumbersomeness, and subjective bias of traditional assessment methods. The system uses Kinect to capture the three-dimensional coordinates of key points of the human skeleton and displays the patient’s movement assessment results in real time, which helps doctors adjust the rehabilitation plan promptly. Compared with manual observation, this system provides a more objective, accurate, and timely assessment of action quality. In particular, the DTW algorithm based on feature weights adaptively aligns action sequences and balances the weights of the angle and distance features, which reduces errors and improves the accuracy and robustness of action evaluation.
(2) Secondly, because existing studies primarily focus on data from healthy individuals, this paper constructed a comprehensive dataset for the IRAES containing action data of both healthy people and patients. Compared with studies that involve only simple actions such as standing, sitting, and walking or that use only data from healthy people, this paper better matches clinical application requirements in terms of action types and population coverage. By analyzing the similarity differences in four actions between healthy individuals and patients, limb disability can be detected, providing a reference for personalized rehabilitation plans. In addition, this paper proposes a DBA-based action template creation algorithm, which generates action templates from multiple action samples, significantly reducing computational load and time and thereby enhancing overall efficiency. These designs in dataset construction and template generation support the real-time applicability of the intelligent rehabilitation action evaluation system.
(3) Finally, to address problems such as human body differences, environmental interference, motion changes, and sensor errors, this article takes new measures in data processing, action segmentation, and feature extraction. We use piecewise cubic spline interpolation and a moving average to suppress environmental interference and propose an action segmentation strategy based on action feature analysis to handle random pauses and repeated actions effectively. By constructing human body key-point vectors and extracting angle and distance features, the impact of individual differences and sensing errors is reduced. To achieve intelligent and adaptive evaluation, we designed an intelligent index system for rehabilitation movement evaluation, established a five-point score mechanism based on movement similarity, and quantified movement quality through polynomial fitting. Given the challenges of algorithm complexity and modeling difficulty, this article selected the DTW algorithm, which has low complexity and strong adaptability, to achieve efficient and accurate rehabilitation action assessment, which has important reference value for the development of intelligent rehabilitation systems.
The experimental results demonstrate the following findings: First, compared with action evaluation based on the feature matrix DTW algorithm, action evaluation based on the FW-DTW algorithm improves the similarity of all four types of actions, most notably for patients, thereby enhancing the overall performance of action evaluation. Second, the difference in similarity between patients’ action data and the template action data is large enough to evaluate patients’ limb disability status. Finally, the confidence level of the established score mechanism reached 99%, confirming its applicability for rehabilitation action evaluation.