An Upper Extremity Rehabilitation System Using Efﬁcient Vision-Based Action Identiﬁcation Techniques

Featured Application: This study proposes an upper extremity rehabilitation system for home use that employs efficient vision-based action identification based on color and depth sensor information and performs well under complex ambient environments.

Abstract: This study proposes an action identification system for home upper extremity rehabilitation. In the proposed system, we apply an RGB-depth (color-depth) sensor to capture image sequences of the patient's upper extremity actions and identify their movements. We apply a skin color detection technique to assist with extremity identification and to build up the upper extremity skeleton points, and we use the dynamic time warping algorithm to determine the rehabilitation actions. Through the upper extremity skeleton and human skin color information, the upper extremity skeleton points are rapidly and effectively established by the proposed system, and the rehabilitation actions of patients are identified by a dynamic time warping algorithm. Thus, the proposed system can achieve a high recognition rate of 98% for the defined rehabilitation actions for the various muscles. Moreover, the computational speed of the proposed system can reach 125 frames per second, i.e., the processing time per frame is less than 8 ms on a personal computer platform. This computational efficiency allows efficient extensibility for future developments to deal with complex ambient environments and for implementation in embedded and pervasive systems. The major contributions of the study are: (1) the proposed system is not only a physical exercise game, but also a movement training program for specific muscle groups; (2) the hardware of the upper extremity rehabilitation system comprises only a personal computer and a depth camera.
This equipment is economical, so patients who need this system can set up one at home; (3) patients can perform rehabilitation actions in a sitting position, preventing falls during training; (4) the accuracy rate of identifying rehabilitation actions is as high as 98%, which is sufficient for distinguishing between correct and incorrect actions when performing specific action training; (5) the proposed upper extremity rehabilitation system performs real-time, efficient vision-based action identification with low-cost hardware and software, which is affordable for most families.


Introduction
Telemedicine and home-care systems have become a trend because of the integration of technology into the practice of medicine [1][2][3][4][5]. Some medical care can be provided at home using simple systems. If technology products are easy to operate, then patients or caregivers can conveniently perform daily self-care activities at home by themselves. This can improve the time spent on patient care, the quality of care, and the therapeutic benefits of care. Telemedicine and home-care systems can also reduce the costs and time associated with transportation between the hospital and home for follow-up care or treatment [6][7][8][9].
In rehabilitation, the person receiving treatment exhibits impaired mobility. Treatment typically takes a long time to achieve a positive effect, and if the treatment is stopped or interrupted, a functional decline or reversal of progress may occur [10,11]. Therefore, rehabilitation is a lengthy process that takes a heavy psychological, physical, and economic toll on patients and their families [12]. An efficient rehabilitation system designed for use at home would help patients perform the movements that they must repeat every day to maintain their mobility and physical function, eliminating transportation challenges and costs. Moreover, infection risks for patients with weakened immune systems can be avoided because frequent hospital visits are eliminated as well. A home-based rehabilitation program also makes rehabilitation more flexible, gives families more time, and enables more frequent exercise.
Rehabilitation involves the use of repetitive movements for maintaining or improving physical or motor functions [10]. After receiving a professional evaluation and a recommended exercise regimen to perform at home, a patient may only need to be observed and recorded by a rehabilitation supporting instrument during the basic training sessions performed at home. A professional demonstrates an action or the use of an instrument, and the patient can then operate a quality monitoring system or review visual feedback provided by a home rehabilitation system for quality self-monitoring of the motions performed during daily rehabilitation training at home [13].
Some somatosensory games claim to achieve the effects of sports and entertainment, and some of these have been used for physical training and physical rehabilitation [14][15][16]. However, these games are usually designed to involve whole-body activities, and most require a standing position [17][18][19][20][21][22], often in front of a camera at a distance of at least 100 cm. Moreover, most of those somatosensory games and human pose estimation methods focus on whole-body pose discrimination [22]; they neither pay attention to changes in the range of motion of joints nor consider how muscles work in those poses.
Nowadays, many human pose estimation systems extract skeletons or skeleton points from depth sensors [18][19][20][21][22][23]. However, determining the joint movements (including directions and angles) is necessary for rehabilitation applications. In [18][19][20], the authors derive a unified Gaussian kernel correlation (GKC) representation and develop an articulated GKC and articulated pose estimation for both the full body and the hands. That work achieves effective human pose estimation and tracking, but may have limitations in determining single joint movements, such as shoulder rotation or wrist pronation. In [21], the input depth map is matched with a set of pre-captured motion exemplars to generate a body configuration estimation, as well as a semantic labeling of the input point cloud; however, although a body figure can be shown or defined as a point cloud, the real body skeleton is segmental, and each movement comes from the angle change of a key joint rather than from each point of the cloud. In [23], the system takes a color image as input, extracts the 2D locations of anatomical keypoints, and then applies an architecture for jointly learning part detection and part association. The strength of [23] is 2D pose estimation of multiple people in images, which can be applied in social-context identification systems for autism or in security monitoring systems for homes or public spaces, but it may not be sufficiently accurate for medical assessment. In our study, we extract both color and depth information from RGB-D sensors and rebuild a new upper extremity skeleton and skeleton points. To provide an efficient rehabilitation system that can assist patients in training specific joints and muscles, we need to obtain not only the pose changes, but also to find out which joints in the real human body skeleton these changes come from.
Although many pose estimation systems and somatosensory games have been developed and presented, most existing systems extract skeletons or skeleton points from a depth sensor and mainly focus on providing human pose estimation, rather than determining the joint movements (including directions and angles). In this study, we aim to obtain not only the pose changes, but also to find out which joints in the real human body skeleton these changes come from. Most of those systems and game sets do not focus on training specific actions for the purpose of medical rehabilitation, have an insufficient evidence base for medical applications, and might be unsuitable for people who are unable to stand or to stand for long, lack standing balance, or only need to rehabilitate their upper extremity, such as patients with spinal cord injuries, hemiplegia, or advanced age.
To overcome the aforementioned challenges, this study proposes an action identification system for home upper extremity rehabilitation, which is a movement training program for specific joint movements and muscle groups, with advantages such as economical cost, suitability for rehabilitation in a sitting position, and ease of operation at home. The major contributions of the study are: (1) the proposed system is not only a physical exercise game, but also a movement training program for specific muscle groups; (2) the hardware of the upper extremity rehabilitation system comprises only a personal computer and a depth camera. This equipment is economical, so patients who need this system can set up one at home; (3) patients can perform rehabilitation actions in a sitting position, preventing falls during training; (4) the accuracy rate of identifying rehabilitation actions is as high as 98%, which is sufficient for distinguishing between correct and incorrect actions when performing specific action training; (5) the proposed system performs real-time, efficient vision-based action identification with low-cost hardware and software, which is affordable for most families.

The Proposed Action Identification System for Home Upper Extremity Rehabilitation
In the proposed system, we apply an RGB-depth sensor to capture the image sequences of the patient's upper extremity actions to identify its movements, and a skin color detection technique is used to assist with extremity identification and to build up the upper extremity skeleton points. A dynamic time warping algorithm is used to determine the rehabilitation actions (Figure 1).
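As a concrete illustration of the matching step, dynamic time warping can be sketched in a few lines of pure Python. The template dictionary, function names, and use of 1-D joint-angle sequences below are our own illustrative assumptions, not the paper's actual feature encoding:

```python
def dtw_distance(seq_a, seq_b):
    """Classic O(n*m) dynamic time warping cost between two 1-D
    sequences (e.g., per-frame joint angles in degrees)."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # dp[i][j] = cost of the best alignment of seq_a[:i] with seq_b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seq_a[i - 1] - seq_b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # insertion
                                  dp[i][j - 1],      # deletion
                                  dp[i - 1][j - 1])  # match
    return dp[n][m]

def classify_action(observed, templates):
    """Return the label of the reference template whose DTW distance
    to the observed sequence is smallest."""
    return min(templates, key=lambda label: dtw_distance(observed, templates[label]))
```

Because DTW warps the time axis, a patient performing the correct motion slower or faster than the reference template still yields a small alignment cost, which is what makes it suitable for comparing rehabilitation repetitions.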


Experimental Environment
We suppose the user is sitting in front of a table to use this system. If the patient performs rehabilitation actions in a sitting position, the patient saves energy compared with standing and can focus his or her mind on the motion control training of the upper extremities. Moreover, if patients have limited balance abilities, they can sit at a table or desk, which provides support and prevents falls. The proposed system is set up similarly to a study desk or computer table at home, which provides a familiar and convenient environment. Caregivers are not required to prepare an additional space to set up the rehabilitation system, and the patient can engage in rehabilitation in a small space (Figure 2). The hardware of the upper extremity rehabilitation system includes a personal computer and a Kinect camera.
The first step of the vision-based action identification technique is to identify the bones of the patient. Many methods exist to accomplish this task. In this study, we suppose that the user's sitting posture is similar to that of sitting at a table. In this situation, the bones of their upper extremities can be identified accordingly. Second, through the human skin detection approach, skeleton joint points can be determined as well. After the human skeletal structure is established, motion determination is conducted. Through the motion determination process, we can determine what type of motion the user just performed, and then inform the patient or caregiver whether the rehabilitation action was correct or not. The main purpose of these processes is to create a home rehabilitation system that can provide efficient upper extremity training programs for patients and their caregivers.

Depth/RGB Image Sensor
A home rehabilitation system should be easy to use and low cost; thus, we adopted the Microsoft Kinect RGB-depth (RGB-D) sensor to capture salient features. The Kinect depth camera and color camera have some differences, and some distortions occur between the RGB color features and depth features extracted from the RGB-D sensor. Therefore, a calibration process to adjust the color and depth features of images is necessary to achieve a coherent image. Figure 3 indicates the manner in which the angle of the depth camera and the angle of the color camera should be adjusted to calibrate depth information and color information to achieve a coherent image.


Skeletonizing
In this study, we detected and skeletonized the patient's upper extremities using OpenNI [24,25] techniques. OpenNI cannot identify the human skeleton immediately at a short distance, and human skeleton detection is difficult in a sitting position. Thus, we extracted the contours of the image and then determined the human upper body bones from these contours, thereby identifying the joints of the upper body. This study did not directly adopt the skeletal determination of OpenNI; rather, we applied the distance transform process to the body contour image in preprocessing [26,27].

• RGB to Gray Color Transform
If the background and the person are too similar in color, mistakes can be made in the identification of the human body and errors can be made in the establishment of skeletal joint points. Therefore, the image is converted to gray scale to remove the background.


• Distance Transform
A distance transform, also known as a distance map or distance field, is a derived representation of a digital image. The map labels each pixel of the image with the distance to the nearest obstacle pixel, the most common type of obstacle pixel being a boundary pixel in a binary image. One technique that may be used in a wide variety of applications is the distance transform or Euclidean distance map [28,29]. The distance transform method used here is an approximate Euclidean distance map: it labels each object pixel of the binary image with the distance between that pixel and the nearest background pixel. For example, in the binary image an object pixel is 1 and a background pixel is 0 (Figure 4a). After the distance transform, the farther a pixel is from a pixel with value 0, the greater the resulting distance. The pixel value of the center will change from 1 to 3, as depicted in Figure 4b. Thus, the distance transform features can highlight the outline of a skeleton frame (Figure 5).

• Gaussian Blur 7 × 7
The Gaussian smoothing method is applied to reduce the noisy features. After the Gaussian smoothing process is performed, the edges of the image become blurred, which simultaneously reduces salt-and-pepper noise. Gaussian blur convolution kernel sizes mostly range from 3 × 3 to 9 × 9. This study adopted the convolution kernel size 7 × 7, which obtained the best results for the overall skeleton frame building process in our experiments (Figure 6).
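The "center changes from 1 to 3" example above can be reproduced with a simple two-pass chamfer distance transform. The sketch below uses the city-block (L1) metric rather than the exact Euclidean map described in the text, which is a common fast approximation:

```python
def distance_transform(binary):
    """Two-pass city-block (L1) distance transform: each foreground
    pixel (1) is labeled with its distance to the nearest background
    pixel (0)."""
    h, w = len(binary), len(binary[0])
    INF = h + w  # larger than any possible L1 distance in the image
    dist = [[0 if binary[y][x] == 0 else INF for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top-left.
    for y in range(h):
        for x in range(w):
            if dist[y][x] == 0:
                continue
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    # Backward pass: propagate distances from the bottom-right.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < w - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist
```

Applied to a 5 × 5 block of ones surrounded by zeros, the border object pixels receive distance 1 and the center receives distance 3, matching the Figure 4 example; ridge pixels with locally maximal distance trace the skeleton.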

• Convolution Filtering
Next, to strengthen the skeleton features, we used directional convolution filters to enhance the features. This study used a 5 × 5 convolution kernel because it is faster than a 7 × 7 kernel and highlights the features of the human body image more clearly than a 3 × 3 kernel (Figure 7). Through the convolution filtering process, the areas that required handling were highlighted. We adopted four directions of convolution filtering; through the four-directional convolution filtering, the contour pixels in each of the four directions are strengthened, and the result highlights pixels in each single direction. Take the 0-degree term as an example, as depicted in Figure 8. We compute the maximal values from the four-directional convolution filtering results as the feature values of the corresponding feature points in the following process.
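The four-directional filtering and per-pixel maximum can be sketched as follows. The simple line-shaped kernels (ones along one direction of a 5 × 5 window) are our own illustrative choice; the paper's exact kernel coefficients may differ:

```python
def directional_response(img, size=5):
    """Convolve with four line kernels (0°, 45°, 90°, 135°) of the
    given odd size and keep, per pixel, the maximal response."""
    c = size // 2
    k0   = [[1 if y == c else 0 for x in range(size)] for y in range(size)]            # 0° (horizontal)
    k45  = [[1 if x + y == size - 1 else 0 for x in range(size)] for y in range(size)] # 45° (anti-diagonal)
    k90  = [[1 if x == c else 0 for x in range(size)] for y in range(size)]            # 90° (vertical)
    k135 = [[1 if x == y else 0 for x in range(size)] for y in range(size)]            # 135° (main diagonal)
    kernels = [k0, k45, k90, k135]
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = 0
            for k in kernels:
                s = 0
                for ky in range(size):
                    for kx in range(size):
                        iy, ix = y + ky - c, x + kx - c
                        if 0 <= iy < h and 0 <= ix < w:
                            s += img[iy][ix] * k[ky][kx]
                best = max(best, s)  # strongest single-direction response wins
            out[y][x] = s if False else best
    return out
```

A pixel lying on a horizontal skeleton segment responds strongly to the 0° kernel and weakly to the others, so taking the maximum preserves thin structures in any of the four orientations.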

• Binarization
We used a binarization process, as presented in Equation (1), to exclude nonskeletal features. First, we examined the image from the bottom left corner at the origin (0, 0) to the top right corner of the screen (255, 255) to determine whether pixels were present. To determine whether a pixel belongs to the skeleton or is noise, the cut-off threshold is set at 6:

outputImg(x, y) = { 0, if outputImg(x, y) < 6; 255, if outputImg(x, y) ≥ 6 }  (1)

Following the steps of distance transform, Gaussian blur, and convolution filtering, we could exclude non-skeleton noise and obtain the image of a human skeleton, as depicted in Figure 9.
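The thresholding of Equation (1) is a one-liner; the helper name below is illustrative:

```python
def binarize(img, threshold=6):
    """Keep pixels whose filtered response reaches the threshold
    (set to 255); suppress the rest as non-skeleton noise (set to 0)."""
    return [[255 if v >= threshold else 0 for v in row] for row in img]
```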


Skin Detection
We integrated skin color detection into the skeletonizing process to enable the system to more accurately establish the skeleton.

• RGB to YCbCr (luminance and chroma) Transform
In this study, we applied the YCbCr elliptical skin color model [30] to detect the skin regions. The YCbCr color model decouples the color and intensity features to reduce lighting effects in the color features. Although the YCbCr color model consumes more computational time than the RGB model, it performs computations faster than the HSV (hue-saturation-value) color model [28]. Therefore, using YCbCr color features from human bodies, we can extract skin regions with computational efficiency and more accurately establish the corresponding human skeletons.

• Elliptical Skin Model
We used an elliptical skin model to determine whether skin was present. In this model, pixels whose chroma values fall inside the ellipse are classified as skin color, while pixels outside the ellipse are not.
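The two steps together (the BT.601 RGB-to-CbCr conversion and the rotated-ellipse membership test) can be sketched as below. The ellipse center, axes, and rotation angle here are illustrative defaults, not the constants fitted in the cited model [30]:

```python
import math

def rgb_to_cbcr(r, g, b):
    """ITU-R BT.601 full-range RGB -> (Cb, Cr). The luminance Y is
    discarded, which is what reduces sensitivity to lighting."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def is_skin(r, g, b, center=(110.0, 152.0), axes=(25.0, 15.0), angle=2.53):
    """Test whether the pixel's (Cb, Cr) falls inside a rotated ellipse
    in the Cb-Cr plane. center/axes/angle are illustrative values."""
    cb, cr = rgb_to_cbcr(r, g, b)
    dx, dy = cb - center[0], cr - center[1]
    ca, sa = math.cos(angle), math.sin(angle)
    # Rotate into the ellipse's principal axes, then apply the
    # standard ellipse inequality (u/a)^2 + (v/b)^2 <= 1.
    u = ca * dx + sa * dy
    v = -sa * dx + ca * dy
    return (u / axes[0]) ** 2 + (v / axes[1]) ** 2 <= 1.0
```

A typical skin tone maps close to the ellipse center in the Cb-Cr plane, while saturated non-skin colors such as pure green fall far outside it.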

• Morphological Close Operation
The RGB image is first converted into YCbCr, and the elliptical skin model is used to separate areas that may be skin color from those that are not. Even after the areas of the RGB image likely to be skin color are separated and retained, the images may nevertheless include too much noise. In that case, we conduct morphological close processing on the resultant feature maps.
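A morphological close is a dilation followed by an erosion; a minimal sketch with a 3 × 3 structuring element (our own choice of element size) shows how it fills small holes in the skin mask:

```python
def dilate(img):
    """Binary dilation with a 3x3 structuring element."""
    h, w = len(img), len(img[0])
    return [[1 if any(img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if 0 <= y + dy < h and 0 <= x + dx < w) else 0
             for x in range(w)] for y in range(h)]

def erode(img):
    """Binary erosion with a 3x3 structuring element; out-of-bounds
    neighbors count as background, so objects touching the border shrink."""
    h, w = len(img), len(img[0])
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)) else 0
             for x in range(w)] for y in range(h)]

def close(img):
    """Morphological close = dilation then erosion: fills small
    holes and gaps without enlarging the overall object."""
    return erode(dilate(img))
```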

• Largest Connected Object
Morphological close makes the object itself more connected and also filters out noise. The operation uses connected-component analysis to find the largest connected object in the picture and then records the location of its center point. This center position is taken as the head of the skeleton, as depicted in Figure 10.
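Finding the largest connected object and its center can be sketched with a breadth-first flood fill (a standard technique; the function name is illustrative, and the mask is assumed to contain at least one foreground pixel):

```python
from collections import deque

def largest_component_center(mask):
    """Find the largest 4-connected component of 1s in a binary mask;
    return (its pixel count, the (row, col) centroid)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best_pixels = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # BFS flood fill of one component.
                queue, pixels = deque([(y, x)]), []
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(pixels) > len(best_pixels):
                    best_pixels = pixels
    cy = sum(p[0] for p in best_pixels) / len(best_pixels)
    cx = sum(p[1] for p in best_pixels) / len(best_pixels)
    return len(best_pixels), (cy, cx)
```

Smaller components (noise blobs that survived the close operation) are simply ignored, and the returned centroid serves as the head candidate.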
records the location of its center point.The center of the position is the head of the skeleton depicted in Figure 9.


Skeleton Point Establishment
After obtaining the curve-shaped human skeleton from the body contour image and skin color detection, the skeleton is not yet quite consistent with the actual upper extremity structure of the linear human skeleton with joints. Therefore, to accurately detect body movements, it is necessary to establish the accurate position of each joint where movements are produced. In this study, we present a rule-based process for establishing the skeleton points, as follows.

Head Skeleton Point Determination
To rapidly determine the skeleton points of the human head, we assume that when the rehabilitation user is sufficiently near the screen center, the head skeleton can be detected in the central region of the screen, denoted by S_C. We adopted the skeleton feature map and the detected skin region, denoted by SK and SR, respectively, to locate the head skeleton point. Because the user's face should be sufficiently close to the camera, the area of the face's skin region should be sufficiently large. The search process for the head skeleton point is performed through the following steps.
Step 1: Scan the skeleton feature map within the circular region whose center is located at the central coordinate of the input image with a radius of max(W/4, H/4), where W and H denote the width and height of the image, respectively.
Step 2: If the skeleton feature map can be obtained in Step 1, then we validate that the corresponding connected-component area of the skin color region covering the skeleton feature map is sufficiently large to be a human face, i.e., its area should be larger than a given threshold T_s, where the threshold is set at 800 pixels in our experiments. As a result, we can obtain the location of the head skeleton point, denoted by S_H and depicted in Figure 11.

Skeleton Point Establishment
After obtaining the curve-shaped human skeleton from the body contour image and skin color detection, the skeleton is not yet quite consistent with the actual upper extremity structure of the linear human skeleton with joints.Therefore, to accurately detect body movements, it is necessary to establish the accurate position of each joint where movements are produced.In this study, we present a rule-based process for establishing the skeleton point, which is as follows.

Head Skeleton Point Determination
To rapidly determine the skeleton point of the human head, we assume that when the rehabilitation user is sufficiently near the screen center, the head skeleton can be detected in the central region of the screen, denoted by SC. We adopted the skeleton feature map and the detected skin region, denoted by SK and SR, respectively, to locate the head skeleton point. Because the user's face should be sufficiently close to the camera, the area of the face's skin region should be sufficiently large. The search process for the head skeleton point is performed through the following steps.
Step 1: Scan the skeleton feature map within the circular region whose center is located at the central coordinate of the input image, with a radius of max(W/4, H/4), where W and H denote the width and height of the image, respectively.
Step 2: If skeleton feature points are obtained in Step 1, we validate that the corresponding connected-component area of the skin color region covering the skeleton feature map is sufficiently large to be a human face; its area should be larger than a given threshold Ts, which is set at 800 pixels in our experiments. As a result, we obtain the location of the head skeleton point, denoted by SH and depicted in Figure 10.
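Steps 1 and 2 can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper names and the 4-connected flood fill used to measure the skin blob are our assumptions, with the circle radius and the 800-pixel threshold Ts taken from the text.

```python
import numpy as np

def skin_component_area(skin_mask, seed):
    """Area of the 4-connected skin blob containing `seed` (flood fill)."""
    h, w = skin_mask.shape
    seen = np.zeros((h, w), dtype=bool)
    stack, area = [seed], 0
    while stack:
        y, x = stack.pop()
        if not (0 <= y < h and 0 <= x < w) or seen[y, x] or not skin_mask[y, x]:
            continue
        seen[y, x] = True
        area += 1
        stack += [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
    return area

def find_head_skeleton_point(skeleton_map, skin_mask, ts=800):
    """Steps 1-2: scan the central circle of radius max(W/4, H/4) for a
    skeleton feature point whose covering skin blob is face-sized (> Ts)."""
    h, w = skeleton_map.shape
    cy, cx, r = h // 2, w // 2, max(w // 4, h // 4)
    ys, xs = np.ogrid[:h, :w]
    central = (ys - cy) ** 2 + (xs - cx) ** 2 <= r ** 2
    for y, x in np.argwhere((skeleton_map > 0) & central):
        if skin_mask[y, x] and skin_component_area(skin_mask, (int(y), int(x))) > ts:
            return (int(y), int(x))  # head skeleton point SH
    return None
```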

Shoulder Skeleton Point Determination
After the skeleton point of the head (SH) is established, we can determine the pair of shoulder skeleton points according to the position of the head, because the shoulders should appear under the two sides of the head.
Step 3: To find the initial shoulder skeleton points, we first search down the rectangular boundaries of the box formed by the head's skin color region to locate the pair of skeleton feature points that are first encountered under the head. Then, we set these two skeleton points as the initial left and right shoulder skeleton points, denoted by SSL and SSR, respectively.
Step 4: As depicted in Figure 12a,b, the initial shoulder points are possibly determined on the basis of the clavicle positions under the head's boundaries, but the actual shoulder points should be closer to the lateral ends of the clavicles. Therefore, we set the initial shoulder points SSL and SSR as the centers of the corresponding semicircular regions formed by the movement regions of the two arms.
Step 5: Because the real shoulder points should be at the rotational centers of the shoulder joints, we set the initial shoulder points SSL and SSR as the centers and set a radius r, with the angle θ ranging from 90° to −90°, for the semicircular regions of the left and right shoulders. Then, we determine whether skeleton feature points are present over the two semicircular regions through Equation (2),
where Sx and Sy denote the x and y coordinates of the initial shoulder points (i.e., SSL and SSR), respectively, and SX and SY represent the x and y coordinates of the candidate shoulder points evaluated along the semicircular regions of the left and right shoulders, respectively. Because the real shoulder point is the rotational center of the shoulder joint, the candidate shoulder point should accord with this characteristic. Thus, if a pair of skeleton points is located in the search regions of the left and right shoulders, denoted by SL and SR, respectively, then we set these two skeleton points as the actual left and right shoulder points. The search process for the shoulder skeleton points is depicted in Figure 12.
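A sketch of the Step 5 semicircular search. Equation (2) is not reproduced in this excerpt, so the candidate parametrization below (SX = Sx + r cos θ, SY = Sy + r sin θ, with θ swept from 90° to −90°) is our assumption of its form:

```python
import math
import numpy as np

def refine_shoulder_point(skeleton_map, initial_pt, r):
    """Step 5: walk candidate points along a semicircle of radius r
    centred on the initial shoulder point and return the first candidate
    that lands on a skeleton feature point; otherwise keep the initial
    estimate."""
    h, w = skeleton_map.shape
    sy, sx = initial_pt
    for theta in range(90, -91, -1):          # 90 deg down to -90 deg
        a = math.radians(theta)
        x = int(round(sx + r * math.cos(a)))  # candidate SX
        y = int(round(sy + r * math.sin(a)))  # candidate SY
        if 0 <= y < h and 0 <= x < w and skeleton_map[y, x] > 0:
            return (y, x)                     # actual shoulder point
    return (sy, sx)
```

The same routine serves both shoulders; only the initial point and search region differ.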



Elbow and Wrist Skeleton Point Determination
After establishing the actual shoulder skeleton points (SL, SR), we can determine the elbow and wrist skeleton points.
Step 6: We set the shoulder skeleton points obtained in Step 5 (i.e., SL and SR) as the centers, then set a radius r with the angle θ ranging from 45° to 315° to search a 3/4-circular region and find the arm skeleton points, as depicted in Equation (3). The search is completed when the maximum value is found, and all of the found points constitute the arm skeleton points, as depicted in Figure 13a. We set the end points as the pair of wrist skeleton points, denoted by SWL and SWR, respectively, as depicted in Figure 13b,
where the radius of the search region r is determined to be twice the width of the shoulder, because the length of the human arm is typically within twice the shoulder width. It can be determined using the following equation:
Step 7: Next, we place the left and right elbow skeleton points halfway along the whole arm skeleton points. They are denoted as SEL and SER, respectively, and are depicted in Figure 13c.
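Once the arm skeleton points have been traced outward from the shoulder (Step 6), the wrist and elbow assignments of Steps 6 and 7 reduce to a simple rule. Below is a minimal sketch with hypothetical helper names; the circular-search details of Equation (3) are not reproduced here:

```python
def search_radius(shoulder_left_x, shoulder_right_x):
    """r is twice the shoulder width, since the arm length is typically
    within twice the shoulder width."""
    return 2 * abs(shoulder_right_x - shoulder_left_x)

def elbow_and_wrist(arm_points):
    """Given the arm skeleton points ordered from the shoulder outward:
    the far end is the wrist point SW, the halfway point the elbow SE."""
    if not arm_points:
        return None, None
    elbow = arm_points[len(arm_points) // 2]
    wrist = arm_points[-1]
    return elbow, wrist
```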

The Overall Skeleton Points Correction Process
If the user does not sit in an upright position or the user's body rotates, these situations may cause errors in the skeleton point setting, as indicated in Figure 14a. This study used depth information to correct the skeleton point positions. When the depth information differs between the right and left sides, we shrink the search radius r on the far side and extend it on the proximal side, as depicted in Figure 14b.
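The depth-based correction can be sketched as below. The paper states only the shrink/extend rule, so the proportional scaling by relative depth is our assumption:

```python
def corrected_radii(r, depth_left, depth_right):
    """Shrink the search radius on the far side (larger depth) and
    extend it on the near side, scaling r by the mean depth ratio."""
    mean = (depth_left + depth_right) / 2.0
    return r * mean / depth_left, r * mean / depth_right
```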


Action Identification Using Dynamic Time Warping
After the user is trained, the screen displays the movement starting point and end point to guide the user through the appropriate action. However, the time required to complete each action may vary by user. The Euclidean distance can compare two sequences, but it cannot directly compare two sequential movements of different durations: because the vectors combine action and time and have different lengths, the traditional Euclidean distance easily inflates the distance between two similar but differently timed vectors. The action classification process amounts to comparing two action sequences of different lengths. Therefore, we must adopt a methodology that handles two action vectors of different lengths on the time axis while still achieving the best correspondence. Nonlinear dynamic time-warped alignment allows a more intuitive measure to be calculated. The dynamic time warping (DTW) algorithm [31] solves the problem of varying durations without performing additional computations during the training process. It achieves the best nonlinear point-to-point correspondence between two motion trajectories and can compare trajectories of various lengths. The DTW algorithm is widely used in audio matching, voice recognition, gesture recognition, and limb motion recognition [32], and many studies use it to compare motion trajectories.
In this study, we adopted the DTW algorithm to compare actions and determine the similarity between the defined and the user-performed action, and to avoid the influence on action determination of the varying times spent on similar rehabilitation actions. The DTW algorithm identified the features of continuous action sequences and tolerated the deviation between two vectors of different lengths on the time series in the movement comparison. On the basis of the dynamic programming optimization results, the best point-to-point correspondence between the two nonlinear trajectories was found, so that movement trajectories with different time lengths for the same action could be accurately identified.
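The DTW comparison described above follows the classic dynamic programming recurrence. The sketch below is a generic textbook implementation rather than the authors' code, with a scalar trajectory and an absolute-difference local cost as illustrative choices:

```python
def dtw_distance(seq_a, seq_b, dist=lambda a, b: abs(a - b)):
    """Cost of the best nonlinear point-to-point alignment between two
    sequences of possibly different lengths (dynamic programming)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(seq_a[i - 1], seq_b[j - 1])
            # extend the cheapest of the three admissible predecessors
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

For example, a raise-and-lower trajectory performed more slowly still matches its template with zero cost, which is exactly the tolerance to duration differences exploited here.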

Results
The hardware of the upper extremity rehabilitation system comprised a personal computer with an Intel Core 2 CPU and a Kinect camera. The software comprised the Windows 7 operating system, the Qt 2.4.1 platform, and OpenNI, as presented in Table 2. Using the Kinect depth information, we performed experiments at resolutions of 640 × 480 and 320 × 240, as displayed in Table 3. To improve execution speed, we adopted the resolution of 320 × 240 for further examination. We examined 20 samples, each of which had at least 500 frames; the total number of frames was 11,134.
We used skin color detection to determine the presence of the human body and to locate the head. The skin color detection process yielded an average execution time of 0.26 ms, equivalent to 3889 frames per second (FPS) in a release build. The results from testing all of the system processing functions, including skin color detection and skeletonizing, indicated that each frame required only 8.02 ms, equivalent to 125 FPS in a release build; the results are presented in Table 4. The results for the test samples are depicted in Figures 15-17. Because the action identification system proposed in this study is aimed at assisting patients who must complete rehabilitation actions, the calculation of its accuracy is based on whether the designated user has performed an action correctly: a correct performance receives an OK system response, whereas an improperly performed action receives a NO system response.
In addition to quantitatively evaluating the performance of the proposed action identification system, this study evaluated the identification accuracy of rehabilitation actions. Table 5 displays the quantitative identification accuracy data for rehabilitation actions performed using the proposed system. The data indicate that the proposed system achieves high accuracy in identifying rehabilitation actions; the average action identification accuracy was 98.1%. The results of rehabilitation action identification are depicted in Figure 18. Accordingly, the high accuracy of the system's identification of rehabilitation actions enables it to effectively assist patients who perform upper extremity rehabilitation actions at home. The computational speed of the proposed system reaches 125 FPS, which equates to a processing time per frame of less than 8 ms. This low computation cost ensures that the proposed system can effectively satisfy the demands of real-time processing. Such computational efficiency allows for extensibility with respect to future system developments that account for complex ambient environments or implementation in embedded and pervasive systems.

Discussion
In this study, we replaced dual cameras with a Kinect depth camera. Compared with the study using dual cameras [33], we spent more time detecting skin color, approximately 12 ms. However, this disadvantage can be overcome, and the palm position can still be determined, if the user wears long sleeves that reveal only the palm. In addition, the rapid processing speed of the upper extremity tracking ensures the accuracy of the proposed system because we can directly judge each position. We also check the key points of the skeleton to ensure movements are correctly performed.
Compared with the study that used HSV for skin color detection [27], we chose the YCbCr and elliptical skin models to speed up the skin color detection process. The skeletonizing process of the proposed system is optimized and achieves a high degree of accuracy. It performs well at both a high resolution (640 × 480) and a low resolution (320 × 240), and achieves a high processing speed.
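An elliptical skin test in the CbCr plane can be sketched as follows. The BT.601 conversion is standard; the ellipse centre and axes below are illustrative values after Hsu et al.'s elliptical model, since the paper does not list its exact parameters (that model's rotation and nonlinear luma compensation are also omitted here):

```python
def rgb_to_ycbcr(r, g, b):
    """ITU-R BT.601 full-range RGB -> YCbCr."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def is_skin(r, g, b, cx=109.38, cy=152.02, ax=25.39, ay=14.03):
    """Classify a pixel as skin if (Cb, Cr) falls inside the ellipse
    centred at (cx, cy) with semi-axes (ax, ay) -- illustrative values."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return ((cb - cx) / ax) ** 2 + ((cr - cy) / ay) ** 2 <= 1.0
```

Because the test is a single quadratic evaluation per pixel, it avoids the per-pixel RGB-to-HSV conversion overhead, which is the speed advantage referred to above.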
The hand-pair gesture method presented by Patlolla et al. [33] used two skeleton points (the palms), achieved an accuracy of 93%, and operated nearly in real time. The body gesture method proposed by Gonzalez-Sanchez et al. [27] used three skeleton points and achieved 32 FPS with an accuracy of 98%. The method proposed in this study uses seven skeleton points and achieves 37 FPS at a resolution of 640 × 480 with an accuracy of 98%; at the lower resolution of 320 × 240, the processing time is 8 ms per frame, equivalent to 125 FPS. The comparison with these two studies is presented in Table 6.

Conclusions
This study proposed an action identification system for daily home upper extremity rehabilitation. In this section, we discuss the key contributions of this study. First, in the skeleton point establishment phase, we set up the skeleton points not only according to the changes of image features, but also in consideration of the real movements of joints and muscles. Hence, we established and corrected the skeleton points following the principles of the anatomy of actions and of muscle testing, and these principles were also applied to design the rehabilitation action programs. We have preset seven kinds of rehabilitation movements for specific muscle groups in the system to guide users in performing the actions. The selection of these rehabilitation actions was based on their importance for rehabilitating the ability to perform activities of daily living that require the use of the upper extremity, such as dressing oneself or reaching for things. Each rehabilitation action program corresponds to the training of specific muscle groups such as the biceps, triceps, or deltoid muscles. Second, the hardware used in the upper extremity rehabilitation system comprises a personal computer with a Kinect depth camera that can be set up like a study desk or computer table at home; thus, the system provides a familiar and convenient environment for rehabilitation users, and the patient can perform the rehabilitation routine even in a limited space without extraneous equipment. Third, patients who cannot stand for long periods of time, such as those with stroke, hemiplegia, muscular dystrophy, or advanced age, can perform rehabilitation actions in a sitting position, which reduces energy expenditure and enables the patient to focus on the motion control training of the upper extremities without worrying about falling. Fourth, the execution speed of the proposed system reaches 8 ms per frame at a resolution of 320 × 240, a frame rate equivalent to 125 FPS, and the system achieves 98% accuracy when identifying rehabilitation actions. Fifth, the proposed upper extremity rehabilitation system operates in real time, achieves efficient vision-based action identification, and consists of low-cost hardware and software. In light of these benefits, we contend that this system can be effectively used by rehabilitation patients to perform daily exercises, and can reduce the burden of transportation and the overall cost of rehabilitation.
The current limitations of the proposed system are: (1) we combined skin color detection into the skeletonizing process to allow the system to establish the skeleton more accurately, which may cause some errors in the skeletonizing process if the user wears long-sleeved clothes; (2) the proposed system establishes six rehabilitation actions in the training programs for specific joints, such as the shoulders and elbows, and for specific muscles, such as the biceps, triceps, and deltoid, which might be insufficient for a complete training program. We need to consider how the motion of each plane and axis of the human body translates into 3D or 2D pose estimation in order to develop a more comprehensive and more accurate rehabilitation action system.
In further studies, the proposed rehabilitation system can be improved and extended to fit the real human body skeleton and to measure the range of motion of human actions from images, rather than through manual professional measurement, on the basis of machine learning techniques. If an accurate and realistic human body skeleton map can be established in the pose estimation or action identification system, then we can determine not only the pose changes but also the amount of movement and the maximal ranges of active motion. On this basis, a vision-based movement analysis system can be built in our future study. Such a system can analyze the image sequences of a patient who is performing given activities, and the analytical results will determine whether the patient uses a compensatory action or an erroneous movement pattern that violates biomechanical principles.

Figure 1. System block diagram of the proposed upper extremity rehabilitation system.


Figure 3. Calibration process to adjust the color and depth features to be coherent: (a) before adjustment and (b) after adjustment.


Figure 4. Distance transform process using Euclidean distance: (a) binary image and (b) distance transform.

Figure 5. The distance transform process to highlight the outline of a skeleton frame: (a) before distance transform and (b) after distance transform.


Figure 6. The process to highlight the outline of a skeleton frame: (a) before Gaussian 7 × 7 smoothing and (b) after Gaussian 7 × 7 smoothing.


Figure 7. The effects of various convolution kernel processes to highlight the features of an image of the human body: (a) 3 × 3 convolution kernel, (b) 5 × 5 convolution kernel, and (c) 7 × 7 convolution kernel.




Figure 8. Four directions of convolution filtering.

Figure 10. Establishing the skeleton point of the head: (a) the user approaches the preset point, which triggers detection; (b) the skeleton feature map combined with YCbCr elliptical skin color detection establishes the head skeleton point.


Figure 12. The shoulder point search process: (a) the initial shoulder point is sought within the square region; (b) the initial shoulder points; (c) the actual shoulder points.


Figure 13. The established skeleton points of the wrist and elbow: (a) all the points we found constitute the arm skeleton points; (b) we set the end points at the wrist skeleton points; (c) we set the skeleton point of the elbow halfway along the arm skeleton points.

Figure 14. Results before and after arm skeleton point correction: (a) before arm skeleton point correction and (b) after arm skeleton point correction.


Figure 18. The results of rehabilitation action identification: positions 0 to 6.
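The action identification behind Figure 18 matches an observed skeleton-point trajectory against templates with dynamic time warping. A minimal 1-D DTW sketch with an absolute-difference local cost (an illustration of the standard algorithm, not the paper's exact feature encoding):

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance
    between two sequences, with |x - y| as the local cost."""
    n, m = len(a), len(b)
    D = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of match, insertion, deletion
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

d_same = dtw_distance([0, 1, 2, 2, 3], [0, 1, 2, 3, 3])   # time-shifted copy
d_diff = dtw_distance([0, 0, 0], [1, 1, 1])               # genuinely different
```

The warping path absorbs differences in execution speed, so a patient performing the same action faster or slower than the template still yields a small distance; classification then picks the template with the minimum DTW distance.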


Table 2. Hardware and software of the proposed system.

Table 3. Comparison of system execution speed at various resolutions.

Table 4. Experimental data of skin color detection speed and system execution speed.

Table 5. Accuracy of identification of rehabilitation actions.

Table 6. Comparison with two related studies.