Advanced Machine Learning for Gesture Learning and Recognition Based on Intelligent Big Data of Heterogeneous Sensors

With intelligent big data, a variety of gesture-based recognition systems have been developed to enable intuitive interaction by utilizing machine learning algorithms. Realizing a high gesture recognition accuracy is crucial, and current systems learn extensive gestures in advance to augment their recognition accuracies. However, the process of accurately recognizing gestures relies on identifying and editing numerous gestures collected from the actual end users of the system. This final end-user learning component remains troublesome for most existing gesture recognition systems. This paper proposes a method that facilitates end-user gesture learning and recognition by improving the editing process applied on intelligent big data, which is collected through end-user gestures. The proposed method realizes the recognition of more complex and precise gestures by merging gestures collected from multiple sensors and processing them as a single gesture. To evaluate the proposed method, it was used in a shadow puppet performance that could interact with on-screen animations. An average gesture recognition rate of 90% was achieved in the experimental evaluation, demonstrating the efficacy and intuitiveness of the proposed method for editing visualized learning gestures.


Introduction
The development of technologies based on intelligent big data, such as virtual reality and invoked reality, has increased research interest in natural user interfaces (NUI) and natural user experience (NUX). Gesture-based systems maximize immersion by allowing end-users to intuitively manipulate contents. Numerous applications recognize end-user gestures, for example to remotely control a television using hand gestures or to activate stage effects triggered by body gestures. The successful implementation of applications based on gesture recognition requires a high gesture recognition accuracy, which in turn requires defining end-user gestures and learning them in advance from a large number of collected gestures. However, obtaining adequate gesture recognition accuracies is difficult because of the constraints imposed by different system components. First, each sensor has a limited sensory range in terms of capturing and measuring body parts in motion, which reduces the accuracy with which complex gestures can be captured. Second, end-users encounter difficulty when they attempt to integrate their own gestures into the systems, for example during the learning process applied to incorporate actual end-user gestures. Non-experts often face difficulty in gathering training gestures and learning gestures because this process requires a certain degree of programming knowledge.
In this study, a generic gesture recognition and learning framework is developed, which utilizes heterogeneous sensors and enables end-users to modify their gestures based on intelligent big data. The use of heterogeneous sensors raises difficulties such as data synchronization and data fusion. However, it is difficult to accurately identify gestures without using heterogeneous sensors because of the limited range and recognition ability of each sensor. More complex gestures can be recognized using multiple sensors. Therefore, a suitable approach for utilizing heterogeneous sensors to increase the gesture recognition accuracy must be identified. In addition, the most popular and low-cost sensors, such as Kinect, Myo, and Leap Motion, must be integrated for a better utilization of the proposed method. The approach must enable end-users to learn and recognize their specific gestures. To recognize gestures correctly, they must be learned by applying an appropriate learning algorithm to good training gestures. If the obtained results are not satisfactory, the end-user must be able to re-edit the learning data.
Therefore, a novel gesture recognition and learning method was developed and experimentally evaluated. The method comprises four end-user interfaces (UIs): body selection, gesture learning, gesture editing, and gesture recording.
The remainder of this paper is organized as follows. In Section 2, the studies conducted on gesture learning and recognition are reviewed. Section 3 describes the structure of the proposed generic gesture recognition and learning framework. In Section 4, the implementation of the gesture learning and recognition process is described. Section 5 presents the experimental methods, results, and analysis of those results. Finally, Section 6 presents the conclusions of the study and the scope for future work.
For example, Ibañez et al. [23] proposed a Kinect-based gesture recognition tool that uses only the skeleton generated by Kinect as input data, and it can visualize the results. However, this tool cannot support a variety of other sensors or perform partial recognition tasks. Further, the editing of the obtained results is not supported by the tool. Signer et al. [24] proposed a 2D-data-based gesture recognition tool with gesture registration and learning features; this tool also supports visualization and simple editing tasks. Zhang et al. [25] proposed a gesture recognition framework that uses electromyography and gyroscope data and can learn from these two datasets simultaneously. The proposed framework utilizes the hidden Markov model (HMM) algorithm as its learning algorithm. Truong et al. [26] proposed a 3D gesture recognition framework using Laban movement analysis and HMM models. Ma et al. [27] proposed an enhanced HMM model that could recognize handwritten characters in real time. In addition, the framework classified gestures into long-term/short-term gestures and dynamic/static gestures. Borghi et al. [28] also proposed a gesture recognition algorithm, which was based on the HMM algorithm, using 3D skeleton data. The proposed method divides the skeleton into upper, lower, left, and right parts and recognizes the corresponding gestures. Suma et al. [29] proposed a Kinect-based gesture recognition tool that included a network communication structure, and it could visualize end-user gestures. The learned gestures could be applied in game control using the communication system. However, this tool could not learn or recognize gestures. Gillian et al. [30] developed a gesture recognition library based on C++ that had high universality, and it supported learning and recognition and various types of input data. However, it did not offer editing features for visualizing and learning data. Gris et al. [31] used Kinect to develop a gesture recognition tool that could collect data by specifying the body parts required for the gesture. However, the tool only supported the recognition of static gestures. Maes et al. [32] proposed a recognition system based on learning gestures that enabled users to practice dancing. However, the proposed system only focused on learning and recognition, and it did not include a module to edit the data used to learn the gesture. Yavşan et al. [33] proposed an interface for gesture recognition and an NAO robot control using Kinect. In the proposed method, the learning and recognition experiments were conducted using K-nearest neighbor (K-NN) and feed-forward neural networks (FNNs). However, only static gestures were recognizable, and the system lacked editing capabilities for learning data.
In this study, we develop a method wherein various gestures are precisely learnt by collecting them from multiple sensors. This method offers GUI-based editing features in which even non-experts can easily edit the collected gestures. Moreover, the learning accuracy can be improved through the collection of static and dynamic gesture data and the editing of this collected data. After reviewing the visual learning results, the learning data can be re-edited to achieve better recognition. Further, the learning results can be applied to interactive systems such as smart TVs through a communication network.

Generic Gesture Learning and Recognition Framework
The proposed generic gesture recognition and learning framework is designed to obtain and learn various gestures from sensing data obtained through multi-sensor fusion. In this framework, end-users can define customized gestures by performing the corresponding gestures, which are learnt by the framework to recognize user-specific gestures. Moreover, the defined gestures can be edited after the end-user analyzes the accuracy of the recognized gesture.

Overview of the Proposed Generic Gesture Recognition and Learning Framework
The proposed framework comprises five stages: gesture registration, editing, learning, recognition, and transfer. Figure 1 illustrates the working of the proposed generic gesture recognition and learning framework.
First, in the gesture registration stage, the proposed method obtains gesture data from the multi-sensor fusion. Using two or more sensors, various types of gestures can be learnt, and precise and accurate gesture recognition can be realized [17][18][19][20][21][22]. Moreover, the proposed method allows the end-user to limit the range of gestures to be registered to specific body parts, thereby improving the recognition accuracy. Second, if the learnt gestures are not found to be satisfactory, the end-user can edit the collected gestures using the GUI editor during the gesture editing stage. Thus, non-experts can easily define and learn various gestures. Third, during the gesture learning stage, the end-user can also select the most appropriate learning algorithm. Subsequently, during the gesture recognition and gesture transfer stages, the learnt data can be applied to various interaction systems. In short, the proposed framework can realize accurate gesture recognition and learning through multi-sensor data fusion and gesture data.

Gesture Registration Stage
The proposed framework obtains gestures from Kinect, Myo, and Leap Motion. The gesture formats of Kinect, Myo, and Leap Motion are shown in Table 1. The gesture registration stage consists of a module for selecting the relevant body part and a gesture recording module to register a new gesture.
The body part selection module selects the body parts by taking the recognition accuracy into consideration. For example, when the end-user wants the proposed method to learn gestures using the right arm, the gestures identified from body parts other than the right arm are excluded, and only the relevant gestures are learnt. Figure 2 shows six body parts recognized by Kinect, four body parts recognized by Myo, and two body parts recognized by Leap Motion. In the proposed framework, human gestures, expressed using arms and legs, are recognized based on the movement of the twelve learning body parts: head, trunk, left arm, right arm, left leg, right leg, lower left arm, lower right arm, upper left arm, upper right arm, left hand, and right hand. Mathematically, let $B$ be the set of all body parts, $B = \{b_1, b_2, \cdots, b_{12}\}$, and let $B^*_t$ be the subset of $B$ that contains the body parts selected by the end-user at time $t$, $B^*_t \subset B$.
The gesture recording module obtains gestures from the sensor values of Kinect, Myo, and Leap Motion. Given that the proposed method considers only the orientation of the user, the value of each sensor at time $t$ is expressed as $s_{i,j,t} = \{x_{i,j,t}, y_{i,j,t}, z_{i,j,t}\}$, where $i$ is the index of a body part in $B$ and $j$ is the index of a sensor of the $i$-th body part. Let $S$ be the set of all sensor values, $s_{i,j,t} \in S$, and let $S^*_t$ be the set of all sensor values for the selected body parts $B^*_t$ at time $t$, $S^*_t = \bigcup_{b_i \in B^*_t} \{s_{i,1,t}, s_{i,2,t}, \cdots\}$, $S^*_t \subset S$. Table 2 represents the available joints acquired by each sensor.
When the gesture recording module records and stores a gesture, the gesture at time $t$, $g_t$, includes $B^*_t$ and the sensor values $S_t$ at time $t$, $g_t = \{B^*_t, S_t\}$. Although the proposed method stores $S_t$, with $S^*_t \subset S_t \subset S$, during the gesture registration stage, the learning model learns and recognizes only $S^*_t$ during the gesture learning and recognition stages in order to reduce the processing time required. Figure 3 illustrates the visual relationship between these notations and the gestures collected by the Kinect, Myo, and Leap Motion sensors. For multi-sensor data fusion, the framework obtains sensor values using different devices such as Kinect, Myo, and Leap Motion. Therefore, the proposed method is susceptible to port collision when data is collected from various gesture recognition sensors through the communication model. To address this susceptibility, a port assignment module is designed that assigns a unique ID to each sensor so that the data of each sensor can be received through a different port, as shown in Figure 4. As shown in Table 1, the sensor value type of each sensor is unique. However, in this framework, only the skeleton orientation information is used for training.
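The relationship between the stored values $S_t$ and the learned subset $S^*_t$ can be illustrated with a minimal Python sketch. The dictionary layout, the body part indices, and the function name `select_sensor_values` are assumptions made for this illustration and are not taken from the framework's implementation.

```python
from typing import Dict, List, Set, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical layout: frame[i] holds the orientation values s_{i,j,t}
# of every sensor j attached to body part i at a single time t.
Frame = Dict[int, List[Vec3]]


def select_sensor_values(frame: Frame, selected_parts: Set[int]) -> Frame:
    """Return S*_t: only the sensor values of the selected body parts B*_t."""
    return {i: values for i, values in frame.items() if i in selected_parts}


# S_t for three body parts (the indices are illustrative only).
s_t: Frame = {
    1: [(0.0, 0.1, 0.2)],
    4: [(0.3, 0.4, 0.5), (0.6, 0.7, 0.8)],
    6: [(0.9, 1.0, 1.1)],
}
b_star = {4}  # B*_t: the end-user selected a single body part

# Registration stores the full S_t together with B*_t ...
g_t = {"body_parts": b_star, "sensor_values": s_t}
# ... while learning and recognition only look at S*_t.
s_star = select_sensor_values(g_t["sensor_values"], g_t["body_parts"])
```

Storing the full $S_t$ while learning only from $S^*_t$ means a later body part re-selection does not require re-recording the gesture.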

Gesture Editing Stage
The gesture editing stage comprises the visualization module and the collected gesture editing module. Figure 5 illustrates the structure of the two modules in this stage. As the name suggests, the visualization module visualizes the collected gestures while they are being edited in order to realize a more intuitive editing experience for the end-user. It consists of a collected gesture viewer and a learning information viewer. Each collected gesture is associated with a name, similarity, and trajectory, and the learning information includes the names, similarities, and counts of the gestures to be learnt.
The collected gesture editing module supports functions such as gesture re-recording, body part re-selection, and frame re-selection for higher recognition accuracies. If the results obtained after learning and recognition are deemed unsatisfactory, the inadequate gesture can be edited, deleted, or re-recorded to obtain the appropriate gesture. In Figure 5, the collected gestures are expressed as $g_k = \{B^*_k, S_k\}$, where $k$ is the index of the collected gestures in the database.

Gesture Learning Stage
As shown in Figure 6, the gesture learning stage comprises a learning algorithm selection module, body part filtering module, direction normalization module, and learning module. First, the end-user selects an algorithm to learn the collected gestures. Second, the body part filtering module increases the learning accuracy by focusing on the body parts selected when collecting gestures. Third, the direction normalization module normalizes each sensor's orientation along eight directions in order to improve the learning accuracy. Finally, the learning module learns the normalized gestures using the selected algorithm. In the proposed framework, the end-user can select from several learning algorithms including hidden Markov models (HMMs), dynamic recurrent neural networks (Dynamic RNNs), and dynamic time warping (DTW) algorithms. To enhance the gesture recognition accuracy, the appropriate learning algorithm can be selected through the learning algorithm selection module. Previous studies have shown that the recognition accuracies of HMMs and RNNs are higher than that of DTW [34][35][36]. Moreover, in our experiment, the highest accuracy was obtained when using HMMs. Therefore, the proposed framework uses HMM by default, and the user has the option of selecting another algorithm.
HMM is a statistical Markov model that comprises two elements: a hidden state and an observable state. HMM is suitable for tasks that involve recognizing patterns that change over time. Equation (1) represents the hidden Markov model in terms of P, G, π, A, and B, where P represents the observable states of a posture, and G represents the hidden states of a gesture. π indicates a matrix of the initial probabilities of hidden states, A indicates a matrix of the transition probabilities of hidden states, and B indicates a matrix of the emission probabilities of hidden states. In this stage, gesture recognition is performed using B.
The HMM uses the following three algorithms. First, the observation probabilities are calculated using the forward and backward algorithms. Second, the Viterbi algorithm is used to find the most appropriate state transition sequence for the observable result sequence. This algorithm selects the most probable transition state from the previous states and outputs a result by backtracking to the initial state. Third, the initialization, state transition, and observation probabilities, which are the HMM parameters, are optimized to maximize the observation probability. To optimize these parameters, the Baum-Welch algorithm is used to generate an optimized HMM for the observation sequence.
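To make the first two of these procedures concrete, the following is a minimal pure-Python sketch of the forward algorithm and the Viterbi algorithm for a discrete HMM. It is a textbook formulation and not the framework's implementation; the function names and the two-state toy model are assumptions chosen for illustration, and Baum-Welch training is omitted for brevity.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward_probability(obs: Sequence[int], pi: List[float],
                        A: Matrix, B: Matrix) -> float:
    """Forward algorithm: probability of the observation sequence."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * A[p][s] for p in range(n)) * B[s][o]
                 for s in range(n)]
    return sum(alpha)


def viterbi(obs: Sequence[int], pi: List[float],
            A: Matrix, B: Matrix) -> Tuple[List[int], float]:
    """Viterbi algorithm: most probable hidden state path and its probability."""
    n = len(pi)
    delta = [pi[s] * B[s][obs[0]] for s in range(n)]
    backpointers: List[List[int]] = []
    for o in obs[1:]:
        prev = delta
        # For each state, remember the best predecessor state.
        ptr = [max(range(n), key=lambda p: prev[p] * A[p][s]) for s in range(n)]
        delta = [prev[ptr[s]] * A[ptr[s]][s] * B[s][o] for s in range(n)]
        backpointers.append(ptr)
    last = max(range(n), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):  # backtrack to the initial state
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy model: 2 hidden states, 3 observable symbols (values are illustrative).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]            # transition probabilities
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # emission probabilities
observations = [0, 1, 2]

likelihood = forward_probability(observations, pi, A, B)
best_path, best_path_probability = viterbi(observations, pi, A, B)
```

For longer gesture sequences, a practical implementation would typically work with log-probabilities or scaling to avoid numerical underflow.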
In the proposed method, gesture recognition using HMM is achieved by using one integer to represent the direction of one sensor. The directions, namely, up, down, left, right, front, and back, are represented by corresponding bits, i.e., $00100_{(2)}$, $01000_{(2)}$, $00010_{(2)}$, $00001_{(2)}$, and $10000_{(2)}$, respectively. Therefore, one or more directions can be represented by one integer. For example, up/left/front is represented as $10110_{(2)}$. Equation (2) represents the algorithm for calculating the direction d of a sensor. The direction is determined based on the differences in the 3D coordinates of the sensor values between the previous and current frames. The direction sequence of the sensor values used in the gesture is calculated, and an optimized HMM for the gesture is created using the Baum-Welch algorithm. The gesture recognition result is obtained by calculating the observation probability of the HMM for each gesture using the Viterbi algorithm with the gesture direction sequence as the input value.
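The bit encoding described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the mapping of coordinate axes to directions, the `threshold` parameter, and the bit chosen for `BACK` (the text lists no code for it) are hypothetical and are not taken from Equation (2).

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Bit codes given in the text.
UP, DOWN, LEFT, RIGHT, FRONT = 0b00100, 0b01000, 0b00010, 0b00001, 0b10000
BACK = 0b100000  # assumption: no code for "back" is given in the text


def encode_direction(prev: Vec3, curr: Vec3, threshold: float = 0.01) -> int:
    """Encode the movement between two frames as one integer of direction bits.

    Assumed axis convention: +x is right, +y is up, +z is front.
    Differences smaller than `threshold` are treated as no movement.
    """
    dx, dy, dz = (c - p for c, p in zip(curr, prev))
    d = 0
    if dx > threshold:
        d |= RIGHT
    elif dx < -threshold:
        d |= LEFT
    if dy > threshold:
        d |= UP
    elif dy < -threshold:
        d |= DOWN
    if dz > threshold:
        d |= FRONT
    elif dz < -threshold:
        d |= BACK
    return d


def direction_sequence(frames: Sequence[Vec3]) -> List[int]:
    """Direction codes for each pair of consecutive frames of one sensor."""
    return [encode_direction(a, b) for a, b in zip(frames, frames[1:])]


# Moving left, up, and to the front at once combines three bits.
code = encode_direction((0.0, 0.0, 0.0), (-0.1, 0.2, 0.3))
sequence = direction_sequence([(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.5, 0.5, 0.0)])
```

With these assumptions, `code` equals `0b10110`, which matches the up/left/front example in the text, and the resulting integer sequence is the kind of observation sequence an HMM would be trained on.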

Gesture Recognition Stage and Gesture Transfer Stage
In the gesture recognition and transfer stage, the sensing module obtains gestures using Kinect, Myo, and Leap Motion. The body part filtering module filters the sensing values by considering the body parts selected by the end-user for the learnt gestures. Subsequently, the recognition module calculates the similarity of the input to each of the learnt gestures and selects the most similar learnt gesture. The selected learnt gesture is delivered as output to smart TVs and VR sets through the network module. Figure 7 illustrates the module structure chart of the gesture recognition and gesture transfer stage.
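The selection step of the recognition module can be sketched as a simple arg-max over per-gesture similarity scores. In the sketch below, `template_scorer` is a deliberately simple stand-in for the HMM-based similarity, and all names and gesture labels are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Scorer = Callable[[Sequence[int]], float]


def recognize(sequence: Sequence[int],
              scorers: Dict[str, Scorer]) -> Tuple[str, Dict[str, float]]:
    """Score the input against every learnt gesture and pick the most similar."""
    if not scorers:
        raise ValueError("no learnt gestures available")
    scores = {name: scorer(sequence) for name, scorer in scorers.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, scores


def template_scorer(template: Sequence[int]) -> Scorer:
    """Stand-in similarity: fraction of positions that match a template."""
    def score(sequence: Sequence[int]) -> float:
        matches = sum(1 for a, b in zip(sequence, template) if a == b)
        return matches / max(len(sequence), len(template))
    return score


# Two hypothetical learnt gestures described by direction-code templates.
learnt = {
    "wave": template_scorer([2, 1, 2, 1]),
    "raise_arm": template_scorer([4, 4, 4, 4]),
}
best_gesture, all_scores = recognize([4, 4, 4, 1], learnt)
```

In a full pipeline, each scorer would wrap a trained model for one gesture, and `best_gesture` would be the value handed to the network module.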


Generic Gesture Learning and Recognition Approach
The method developed in this study is based on the generic gesture learning and recognition framework. Figure 8 shows the UI architecture of the proposed generic gesture learning and recognition framework.

Generic Gesture Learning and Recognition Overview
Figure 9 shows a snapshot of the UI designed for the gesture learning application. As shown in Figure 9, a list of learnt gestures is displayed to the end-user. The end-user can select a gesture on the gesture list for editing. To add a new gesture to the gesture list, the end-user can define a new gesture in the gesture creation and option window, select the body parts to learn, and start recording. The end-user can also re-learn or re-edit the collected gestures for learning.


Implementation of User Interface
In the body part selection UI, the end-user can select the body part(s) that will be used for recording the gestures. Figure 10 shows the body selection UI for Kinect. In the UI, the end-user can select up to six parts of the body (the head, trunk, left arm, right arm, left leg, and right leg) during the gesture learning stage. As shown on the right side of Figure 10, the end-user can directly select the relevant body parts.
After choosing the body part, the end-user can record and save the gesture. The gesture recording UI is shown in Figure 11. In this UI, the end-user can view the original color video as well as the skeletal representation, and can therefore see whether the recorded skeletal representation fails to match the original color video. For example, a frame that shows an inappropriate motion trajectory for the target gesture, such as a frame in which a sensed value was measured incorrectly or an empty frame between the recorded frames, can be manually excluded. Learning from gestures that contain such frames leads to low recognition accuracies. Thus, the proposed method supports partial deletion and re-recording of such inappropriate frames, thereby improving the recognition accuracy.
As shown in Figure 12, the end-user can edit the recorded gestures using the gesture editing UI. The UI displays two previews, original and edited. Therefore, the end-user can compare the two previews and judge whether the editing is appropriate. Moreover, the end-user can use the slide bar to preview the recorded video, set the range of the video for deletion, and save the gesture to the gesture database.
After editing the recorded gestures, the end-user generates the HMM model using the gesture learning UI. The UI displays the set of gestures selected by the end-user for learning, as shown in Figure 13. In the gesture learning UI, the end-user can visualize the trajectory of a gesture and compare it against other gestures for the same type of data. Moreover, if differences arise between two trajectories obtained from the data because of unstable sensors, the end-user can edit the data to reduce the noise. After the end-user finishes editing the gestures, the proposed method automatically learns all the gestures, and the end-user can visualize the similarity of each data point corresponding to each gesture. Here, similarity refers to the similarity score between the gesture data and the trained HMM, computed by the Viterbi algorithm. After training the HMM, we use the dissimilarities to identify and correct noisy learning data, thereby enhancing the recognition accuracy of the HMM. If the similarity of a training sample is significantly lower than the average similarity of the other training data, that sample needs to be edited. In the proposed method, the GUI-based gesture learning UI allows the end-user to obtain improved recognition accuracy by deleting and re-recording low-similarity data.
model, we use the dissimilarities to identify and correct noisy learning data, thereby enhancing the recognition accuracy of the HMM model.If the similarity is significantly lower than the average of other training data's similarity, the corresponding data editing is required.In the proposed method, the GUI-based gesture learning UI allows the end-user to obtain improved recognition accuracy by the deleting and re-recording of low similarity data.The gesture recognition UI shown in Figure 14 is used for the gesture recognition and gesture transfer stages.For the gesture recognition stage, the UI can be used to test whether the learnt gesture is accurate.The end-user can visualize the raw data obtained from sensors in the gesture data log panel.In the gesture succeed log, the end-user can identify the gesture recognized by the Symmetry 2019, 11, 929 13 of 21 proposed method.If the recognition accuracy is lower than the threshold, the performed gesture is considered inadequate.For the gesture transfer stage, the end-user can transfer the recognized gesture result to other applications through the network module.Subsequently, after the HMM model is generated during the gesture learning stage, the Viterbi algorithm is utilized to calculate the similarity of end-user input data for each gesture.The gesture with the maximum similarity is chosen as the recognition result.The gesture recognition UI shown in Figure 14 is used for the gesture recognition and gesture transfer stages.For the gesture recognition stage, the UI can be used to test whether the learnt gesture is accurate.The end-user can visualize the raw data obtained from sensors in the gesture data log panel.In the gesture succeed log, the end-user can identify the gesture recognized by the proposed method.If the recognition accuracy is lower than the threshold, the performed gesture is considered inadequate.For the gesture transfer stage, the end-user can transfer the recognized gesture result to other 
applications through the network module.Subsequently, after the HMM model is generated during the gesture learning stage, the Viterbi algorithm is utilized to calculate the similarity of enduser input data for each gesture.The gesture with the maximum similarity is chosen as the recognition result.
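The recognition rule described here (score the input against every gesture's HMM with the Viterbi algorithm and pick the maximum) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes discrete observation symbols, uses the Viterbi log-probability as the similarity, and the names `viterbi_log_score`, `recognize`, `min_log_score`, and the two toy gesture models are hypothetical. How the paper converts scores into a percentage similarity is not specified, so no such conversion is attempted.

```python
import math


def _log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_log_score(obs, start_p, trans_p, emit_p):
    """Log-probability of the most likely state path of a discrete HMM."""
    if not obs:
        raise ValueError("observation sequence is empty")
    states = range(len(start_p))
    # Initialise with the first observation symbol.
    delta = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states]
    # For each later symbol, extend the best path that ends in each state.
    for o in obs[1:]:
        delta = [
            max(delta[r] + _log(trans_p[r][s]) for r in states)
            + _log(emit_p[s][o])
            for s in states
        ]
    return max(delta)


def recognize(obs, models, min_log_score=float("-inf")):
    """Return (best gesture or None, all scores) for one input sequence."""
    scores = {name: viterbi_log_score(obs, *m) for name, m in models.items()}
    best = max(scores, key=scores.get)
    # Reject the input when even the best model scores below the threshold.
    if scores[best] < min_log_score:
        return None, scores
    return best, scores


# Toy models (assumed values): symbol 0 = leftward motion, 1 = rightward.
# Each model is (start probabilities, transition matrix, emission matrix).
models = {
    "swipe_right": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
    "swipe_left": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
}
observed = [1, 1, 1, 0, 1]
gesture, scores = recognize(observed, models)
print(gesture, scores)
```

Working in log space avoids numerical underflow on long gesture sequences; the rejection threshold plays the role of the similarity threshold mentioned in the text.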

Experiments
In this study, the proposed method was implemented and evaluated. Kinect was used to identify and record gestures. In the experiments, the HMM algorithm was used as the learning algorithm.

Performance Show
The proposed method was experimentally evaluated using a shadow puppet performance show. As shown in Figure 15, the actors create animations on the screen and interact with them using hand gestures. For the experiments, ten hand gestures were defined based on the storyline of the shadow puppet performance show. These hand gestures were intended to activate stage effects during the performance show. Table 3 lists the gestures used in the performance show.

Table 3. The defined gesture list for the experiments.

The proposed method recognizes the gesture performed by the actor, and the performance show contents are organized by animating a corresponding virtual character based on this gesture information.

Implementation of the Gesture Learning and Recognition Approach
The gesture learning and recognition approach was implemented using the Microsoft Foundation Class (MFC) library, which is based on the C++ language, in a Windows 8.1 environment. We stored all gesture data in the Hadoop Distributed File System (HDFS). The OpenCV library was used to visualize each gesture image.

Gesture Registration Stage Result
In the experiment, the recognition rates of gestures performed by three subjects were compared. Each participant provided 120 examples for each of the 10 types of gestures (a total of 1200 gestures), which served as learning data. Table 4 shows the gestures recorded by the proposed method for different body parts as selected by the end-user. In the plotted gesture data, the horizontal axis is the frame count and the vertical axis shows the x, y, and z values. We confirmed that the coordinates of a particular body part are similar even when different subjects perform the same gesture. If the user selects specific body parts to be used for learning and recognition, gesture learning and recognition tasks are performed only for the corresponding body parts.
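Restricting learning and recognition to the selected body parts amounts to filtering each skeleton frame before it is turned into a feature vector. The sketch below illustrates the idea under stated assumptions: the joint names and the grouping of joints into the six body parts in `BODY_PARTS` are hypothetical, as are the function names `select_joints` and `to_feature_vector`; the paper does not list which joints belong to which part.

```python
# Hypothetical grouping of skeleton joints into the six selectable body parts.
BODY_PARTS = {
    "head": ["head", "neck"],
    "trunk": ["spine_shoulder", "spine_mid", "spine_base"],
    "left_arm": ["shoulder_left", "elbow_left", "wrist_left", "hand_left"],
    "right_arm": ["shoulder_right", "elbow_right", "wrist_right", "hand_right"],
    "left_leg": ["hip_left", "knee_left", "ankle_left", "foot_left"],
    "right_leg": ["hip_right", "knee_right", "ankle_right", "foot_right"],
}


def select_joints(frame, parts):
    """Keep only the joints of one frame that belong to the selected parts."""
    unknown = set(parts) - set(BODY_PARTS)
    if unknown:
        raise ValueError(f"unknown body parts: {sorted(unknown)}")
    wanted = [joint for part in parts for joint in BODY_PARTS[part]]
    return {joint: frame[joint] for joint in wanted if joint in frame}


def to_feature_vector(frame, parts):
    """Flatten the selected joints into [x, y, z, x, y, z, ...]."""
    selected = select_joints(frame, parts)
    return [value for joint in selected for value in selected[joint]]


# One illustrative frame: joint name -> (x, y, z) coordinates.
frame = {
    "head": (0.0, 1.6, 2.0),
    "hand_left": (-0.4, 1.1, 1.9),
    "hand_right": (0.4, 1.0, 1.9),
    "knee_left": (-0.1, 0.5, 2.0),
}
print(to_feature_vector(frame, ["right_arm"]))
```

Filtering before feature extraction keeps the model input small and prevents motion in unselected body parts from affecting the similarity score.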

Gesture Editing Stage Result
To improve the accuracy of gesture recognition, the user can eliminate duplicated or invalid frames from the acquired gesture data. Table 6 shows the specific frames that an end-user selected for each gesture.

Table 6. First key frame of the original skeleton and key frames of the transparent skeleton for different gestures. (Columns: Editing; Images of Frames.)

As can be inferred from the above table, frames 1-9, 10-19, 70-79, and 80 are similar before the gesture is edited. The table also shows the results obtained after eliminating frames 10-19 and 70-79 by using gesture visualization and the editing UI.
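The editing step in this example, removing frames 10-19 and 70-79 from an 80-frame recording, can be sketched as a range-based deletion. This is only an illustration of the operation; the function name `delete_frame_ranges` and the use of 1-based inclusive ranges are assumptions, and integers stand in for the actual skeleton frames.

```python
def delete_frame_ranges(frames, ranges):
    """Return a copy of `frames` without the given 1-based inclusive ranges."""
    drop = set()
    for first, last in ranges:
        if first < 1 or last > len(frames) or first > last:
            raise ValueError(f"invalid frame range: {first}-{last}")
        drop.update(range(first, last + 1))
    # Frame numbers are 1-based, matching the numbering used in the text.
    return [f for number, f in enumerate(frames, start=1) if number not in drop]


# Stand-in for an 80-frame recorded gesture (frame numbers as payload).
recorded = list(range(1, 81))
edited = delete_frame_ranges(recorded, [(10, 19), (70, 79)])
print(len(edited))  # 60
```

Returning a new list leaves the original recording untouched, which matches the editing UI's side-by-side preview of the original and edited gesture.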

Gesture Learning Stage Result
The gesture learning stage is evaluated by the recognition accuracy of the learnt gestures. The gesture recognition accuracy is calculated as the percentage of successfully recognized gestures in the test data, as shown in Equation (3). In the experiment, the performance of each algorithm was compared. The number of participants was 21. For each participant, 100 examples were recorded for each gesture. Among these, 60 examples were used for learning, and the remaining 40 examples were used for testing. The threshold for the similarity between a gesture performed by a participant and the target gesture was set to 40%. If the similarity was below 40%, the gesture recognition task was considered unsuccessful. Table 7 lists the average gesture recognition accuracy for each gesture using HMM, Dynamic RNN, and DTW. The averages were 92.00%, 91.98%, and 82.62% for HMM, Dynamic RNN, and DTW, respectively. The highest accuracy was obtained using HMM.
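The accuracy measure and the 40% similarity threshold can be sketched as follows. This is only an illustration under stated assumptions: similarities are expressed as fractions in [0, 1], a similarity equal to the threshold counts as recognized, and the 60/40 split takes the first 60 examples for learning. The function names and the sample similarity values are hypothetical.

```python
from typing import List, Sequence, Tuple

SIMILARITY_THRESHOLD = 0.40  # 40% threshold stated in the text


def recognition_accuracy(similarities: Sequence[float],
                         threshold: float = SIMILARITY_THRESHOLD) -> float:
    """Percentage of test examples whose similarity reaches the threshold."""
    if not similarities:
        raise ValueError("at least one test example is required")
    recognized = sum(1 for s in similarities if s >= threshold)
    return 100.0 * recognized / len(similarities)


def split_examples(examples: Sequence[int],
                   n_train: int = 60) -> Tuple[List[int], List[int]]:
    """Split recorded examples into learning and testing subsets.

    Assumption: the first n_train examples are used for learning.
    """
    return list(examples[:n_train]), list(examples[n_train:])


# 100 recorded examples per gesture: 60 for learning, 40 for testing.
train, test = split_examples(list(range(100)))
print(len(train), len(test))  # 60 40

# Synthetic similarities (placeholders, not experimental results).
print(recognition_accuracy([0.9, 0.35, 0.4, 0.1]))  # 50.0
```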

Gesture Recognition Stage Result
In addition to the above experiments, a scenario was designed using the 10 defined gestures to evaluate the recognition performance of the proposed method for each gesture. Table 8 shows the testing scenario. A 1 s interval was inserted between two consecutive gestures to distinguish the end of one gesture from the start of the next. Table 9 lists the recognition rates for the testing scenario, which indicate that the gestures were recognized even when multiple gestures were performed consecutively. The recognition rate was 90% on average.
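The role of the 1 s interval can be illustrated with a small segmentation sketch. The paper does not describe how the interval is detected, so this example assumes that each motion frame carries a timestamp in seconds and that a gap of at least 1 s between consecutive frames marks a gesture boundary. The function name `split_by_pause` and the timestamps are hypothetical.

```python
from typing import List, Sequence


def split_by_pause(timestamps: Sequence[float],
                   min_gap: float = 1.0) -> List[List[float]]:
    """Group frame timestamps into gestures separated by gaps >= min_gap seconds."""
    segments: List[List[float]] = []
    current: List[float] = []
    for t in timestamps:
        # A sufficiently long pause closes the current gesture.
        if current and t - current[-1] >= min_gap:
            segments.append(current)
            current = []
        current.append(t)
    if current:
        segments.append(current)
    return segments


# Synthetic timestamps (seconds) for three consecutive gestures.
stream = [0.0, 0.1, 0.2, 1.3, 1.4, 2.5]
gestures = split_by_pause(stream)
print(len(gestures))  # 3
```

Each resulting segment would then be passed to the recognizer as one gesture.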

Figure 1. Overall process of the proposed generic gesture recognition and learning framework.

Figure 2. Body parts recognized by Kinect, Myo, and Leap Motion.

In the proposed framework, human gestures, expressed using arms and legs, are recognized based on the movement of twelve learning body parts: head, trunk, left arm, right arm, left leg, right leg, lower left arm, lower right arm, upper left arm, upper right arm, left hand, and right hand. Mathematically, let $B$ be the set of all body parts, $B = \{b_1, b_2, \ldots, b_{12}\}$, and let $B^*_t$ be the subset of $B$ selected by the end-user at time $t$, $B^*_t \subseteq B$. The gesture recording module obtains gestures from the sensor values of Kinect, Myo, and Leap Motion. Given that the sensor values in the proposed method consider only the orientation of the user, each sensor value at time $t$ is expressed as $s_{i,j,t} = (x_{i,j,t}, y_{i,j,t}, z_{i,j,t})$, where $i$ is the index of a body part in $B$ and $j$ is the index of a sensor of the $i$th body part. Let $S$ be the set of all sensor values, $s_{i,j,t} \in S$, and let $S^*_t$ be the set of all sensor values for the selected body parts $B^*_t$ at time $t$: $S^*_t = \{s_{1,1,t}, s_{1,2,t}, \ldots\} \cup \{s_{2,1,t}, s_{2,2,t}, \ldots\} \cup \cdots$, with $S^*_t \subset S$. Table 2 lists the joints available from each sensor.
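The notation above can be made concrete with a short sketch. It assumes that the sensor values at one time step are held in a dictionary keyed by the pair (i, j) of body part index and sensor index, with body parts numbered from 1 in the order listed in the text. The names `BODY_PARTS`, `SensorFrame`, and `select_sensor_values`, and the numeric values, are hypothetical.

```python
from typing import Dict, Set, Tuple

Orientation = Tuple[float, float, float]            # s_{i,j,t} = (x, y, z)
SensorFrame = Dict[Tuple[int, int], Orientation]    # keyed by (i, j)

# The twelve learning body parts b_1 ... b_12, in the order given in the text.
BODY_PARTS = [
    "head", "trunk", "left arm", "right arm", "left leg", "right leg",
    "lower left arm", "lower right arm", "upper left arm", "upper right arm",
    "left hand", "right hand",
]


def select_sensor_values(frame: SensorFrame, selected: Set[int]) -> SensorFrame:
    """Return S*_t: the sensor values whose body part index is in B*_t."""
    return {(i, j): value for (i, j), value in frame.items() if i in selected}


# Synthetic sensor values at one time step (placeholders, not measured data).
frame_t: SensorFrame = {
    (1, 1): (0.0, 0.1, 0.2),   # head, sensor 1
    (3, 1): (0.3, 0.4, 0.5),   # left arm, sensor 1
    (3, 2): (0.6, 0.7, 0.8),   # left arm, sensor 2
    (11, 1): (0.9, 1.0, 1.1),  # left hand, sensor 1
}

selected_parts = {3, 11}  # B*_t: left arm and left hand
s_star = select_sensor_values(frame_t, selected_parts)
print(sorted(s_star))  # [(3, 1), (3, 2), (11, 1)]
```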

Figure 5. Modular structure of the gesture editing stage.

Figure 6. Modular structure of the gesture learning stage.

Figure 7. Modular structure of the gesture recognition and gesture transfer stage.
Symmetry 2019, 11, x FOR PEER REVIEW 10 of 21

4. Generic Gesture Learning and Recognition Approach

Figure 8. UI architecture of the proposed generic gesture learning and recognition framework.

Figure 9. UI of the proposed generic gesture learning and recognition application.

4.1.
Figure 9 shows a snapshot of the UI designed for the gesture learning application. As shown in Figure 9, the end-user is shown a list of learnt gestures and can select a gesture on the list for editing. To add a new gesture to the list, the end-user can define a new gesture in the gesture creation and option window, select the body parts to learn, and start recording. The end-user can also re-learn or re-edit the collected gestures.

In the body selection UI for Kinect, the end-user can select up to six parts of the body (the head, trunk, left arm, right arm, left leg, and right leg) during the gesture learning stage. As shown on the right side of Figure 10, the end-user can directly select the relevant body parts.

Figure 15. Shadow puppet performance show stage.

[Frame plots. Horizontal axis: frame count; vertical axis: x, y, z value.]

Table 1. Data format for each sensor.

Table 2. Available joints acquired by each sensor.

When the gesture recording module records and stores a gesture, the gesture at time $t$, $g_t$, includes $B^*_t$ and $S_t$ at time $t$: $g_t = (B^*_t, S_t)$. Although the proposed method stores $S_t$, with $S^*_t \subset S_t \subset S$, during the gesture registration stage, the learning model learns and recognizes only $S^*_t$ during the gesture learning and recognition stages in order to reduce the processing time required. Figure 3 illustrates the relationship between these notations and the gestures collected by the Kinect, Myo, and Leap Motion sensors.
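The distinction between what is stored and what is learnt can be sketched as a small record type. This is an assumption-laden illustration, not the data structure of the proposed system: the class `GestureFrame`, its field names, and the flattening order are hypothetical. The record keeps every sensor value ($S_t$) but exposes only the values of the selected body parts ($S^*_t$) as a feature vector for a learning model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Orientation = Tuple[float, float, float]


@dataclass(frozen=True)
class GestureFrame:
    """g_t = (B*_t, S_t): selected body parts plus all recorded sensor values."""
    selected_parts: FrozenSet[int]                      # B*_t (body part indices)
    sensor_values: Dict[Tuple[int, int], Orientation]   # S_t, keyed by (i, j)

    def learning_vector(self) -> List[float]:
        """Flatten S*_t in sorted (i, j) order for use as model input."""
        vector: List[float] = []
        for i, j in sorted(self.sensor_values):
            if i in self.selected_parts:
                vector.extend(self.sensor_values[(i, j)])
        return vector


# Synthetic frame (placeholder values): body part 3 is selected, 5 is not.
g_t = GestureFrame(
    selected_parts=frozenset({3}),
    sensor_values={
        (3, 1): (0.1, 0.2, 0.3),
        (3, 2): (0.4, 0.5, 0.6),
        (5, 1): (0.7, 0.8, 0.9),
    },
)

print(len(g_t.sensor_values))   # 3 values are stored (S_t)
print(g_t.learning_vector())    # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

Keeping the full $S_t$ in the record means the end-user could later re-learn the gesture with a different body part selection without recording it again, while the shorter vector reduces the input size at learning and recognition time.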

Table 3. The defined gesture list for the experiments.

Table columns: Gesture Type, Average Sensing Values for Three Participants.

Table 6. First key frame of the original skeleton and key frames of the transparent skeleton for different gestures.
Horizontal axis: frame count; vertical axis: x, y, z value.

Table 7. Comparison results of recognition accuracy among the hidden Markov model (HMM), dynamic recurrent neural networks (Dynamic RNN), and dynamic time warping (DTW).

Table 8. Testing scenarios to evaluate recognition performance.
