Estimation of Hand Motion from Piezoelectric Soft Sensor Using Deep Recurrent Network

Abstract: Soft sensors are attracting significant attention in human-machine interaction due to their high flexibility and adaptability. However, estimating motion state from these sensors is difficult due to their nonlinearity and noise. In this paper, we propose a deep learning network for a smart glove system to predict the moving state of a piezoelectric soft sensor. We implemented the network using Long Short-Term Memory (LSTM) units and demonstrated its performance in a real-time system based on two experiments. The sensor's moving state was estimated and the joint angles were calculated. Since we use the moving state in the sensor offset calculation and the offset value is used to estimate the angle value, accurate moving state estimation results in good performance for angle value estimation. The proposed network performed better than the conventional heuristic method in estimating the moving state. It was also confirmed that the calculated values successfully mimic the joint angles measured using a leap motion controller.


Introduction
In recent years, human-machine interaction techniques have been spotlighted in robotics research. For human-machine interaction, accurate representation of the human body state is important. In studies on human motion recognition, various sensors are used to measure human body states [1,2]. In particular, soft sensors have received significant attention with respect to human-machine interaction and are applied to many wearable applications due to their high flexibility and adaptability [3][4][5].
The raw data acquired from various sensors need to be appropriately processed before they can be used for motion state estimation. However, a sensor with a nonlinear characteristic is hard to analyze or utilize. To solve this problem, recent studies have proposed deep learning networks to predict the nonlinear correlation [6][7][8][9]. Given that soft sensors have more strongly nonlinear characteristics than traditional sensors [10], deep learning has been applied to soft sensor characterization and motion recognition in [11][12][13].
Among the various soft sensors, those made of piezoelectric materials are well suited for human-machine interactions in wearable devices [14,15] due to their flexibility associated with micro-scale thickness, low power, and convenience in installation [16,17]. In our previous studies, we demonstrated the performance of smart gloves using flexible piezoelectric sensors and demonstrated the interaction ability of these gloves in virtual reality applications [18,19]. Despite these advantages, piezoelectric sensors exhibit high nonlinearity and noise, like other soft sensors. In this study, we adopted deep learning as a method to address this problem. Specifically, in a real-time system the offsets of piezoelectric sensors change randomly, in addition to the nonlinearity. Since we track the offset value only while the sensor is not moving, the accuracy of the sensor offset value depends on the estimation of the moving state. At the same time, the angle value is calculated using the sensor offset value. This means that a good estimation of the moving state leads to a good result for angle estimation. Thus, we propose a network to predict the moving state for stabilization of the offset value. Moreover, we incorporated the trained network into the interface system for the smart glove to recognize hand joint angles.
Given that our goal is to apply the proposed network to a real-time virtual reality interaction system, we designed it to be suitable for processing sequential data at high speed. In the proposed network, sequential sensor data, which are analog voltages, are used as the input data. We introduce the structure of the network, which uses Long Short-Term Memory (LSTM) units for predicting outputs, and utilize the estimated result in our glove system. Several previous studies have already used the LSTM unit for processing sequential sensor data [11,12,20,21] and demonstrated the performance of the network.
The network was trained using data from a leap motion controller as the ground truth. The moving state, which is the target data for training, was defined from the measured angle value: a value of 1 means the sensor is moving, and a value of 0 means it is static. The performance of the network was demonstrated by two different experiments, one with pre-acquired datasets and one using a real-time interface system with a smart glove. We evaluated the accuracy of network predictions using our training and test datasets. In addition, the performance of the network for real-time application was evaluated using two criteria: the identical ratio of moving states and the similarity of the joint angles. An overview of the proposed system is shown in Figure 1. Our major contributions are as follows: (i) proposing a deep recurrent network that is trained on a novel type of input data, the output of a piezoelectric soft sensor; (ii) applying the proposed network to a real-time VR interface system; and (iii) showing that applying deep learning to sensor data processing improves performance compared to our prior works. This paper is organized into the following sections. In Section 2, we introduce the interface system used for the smart glove. In Section 3, we propose and evaluate the deep learning network for sensor data processing. We demonstrate the performance of the real-time application of the network in Section 4. Section 5 gives a brief summary of the main conclusions.

Interface System
Herein, we introduce the interface system used to receive signals from the smart glove and to calculate the joint angles of the hand motion using a piezoelectric voltage.

Glove Structure
In our previous studies [18,19], data from the fabricated sensor were received by an interface board. In this study, we use the same interface board and piezoelectric sensors proposed in those works. In particular, the sensor used in this study was proposed in [18]. A PVDF film was used to generate the piezoelectric effect, and the sensor was fabricated as a uniform beam structure with thin PVDF and Mylar layers. Detailed information on the structure of the interface board and sensor is given in [18]. In contrast to the previous works, we changed the structure of the glove and the interface program. The entire glove, including the main component and ring component, was fabricated using Ecoflex 00-30 silicone rubber. The new design improves the flexibility of the glove and the wear sensation. Specifically, we fabricated the ring part using several lines that were combined with the straight-finger section of the glove. Small neodymium magnets were included at the end of the ring component, and they were linked to each other to form a closed loop. The structure of the glove and the position of the sensors are shown in Figure 2a.

System Setup
In this study, we need to simultaneously acquire and process data from two different sensors: a piezoelectric sensor and the leap motion controller. The piezoelectric sensor data are the input on which the network is trained, and the data from the leap motion controller are used as the ground-truth data for training. Thus, we programmed the interface to implement two virtual hands, as shown in Figure 3. The virtual hand located on the left side was implemented based on finger joint angles estimated by the piezoelectric sensors, and the right hand shows joint angles measured directly using a leap motion controller. The detailed process of angle calculation is reviewed in Section 3. Given that the proposed program received two kinds of sensor data, it was necessary to synchronize the data. The data acquisition rate of the piezoelectric sensor is 100 Hz, while the rate of the leap motion controller is 60 Hz. Therefore, the leap motion values were synchronized to the piezoelectric sensor outputs using the following approximation.
Here, $t^{(l)}_k$ is the time of the $k$th sample of raw data from the leap motion controller.
In addition, the offset voltage applied to the sensor from the interface board is 2.5 V. This voltage changes when the piezoelectric sensor generates a piezoelectric voltage. The range of the analog voltage transferred from the interface board to the system is 0-5 V.
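The synchronization of the 60 Hz leap motion values to the 100 Hz piezoelectric samples can be sketched as follows. The exact approximation is not reproduced here, so this minimal sketch assumes a zero-order hold: each piezoelectric timestamp takes the most recent leap motion sample at or before it. The function name `hold_resample` and the sample values are hypothetical.

```python
from bisect import bisect_right

def hold_resample(leap_times, leap_values, piezo_times):
    """Assumed zero-order hold: for each piezoelectric timestamp, return the
    latest leap motion value whose timestamp is <= that timestamp."""
    out = []
    for t in piezo_times:
        k = bisect_right(leap_times, t) - 1  # last leap sample at or before t
        k = max(k, 0)                        # before the first sample, reuse it
        out.append(leap_values[k])
    return out

# Hypothetical example: 60 Hz leap motion angles resampled onto a 100 Hz grid.
leap_times = [k / 60 for k in range(7)]
leap_values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
piezo_times = [n / 100 for n in range(10)]
synced = hold_resample(leap_times, leap_values, piezo_times)
```

A linear interpolation between neighboring leap motion samples would be an equally plausible reading; the hold is used here only because it needs no future sample, which suits a real-time loop.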

Application of Recurrent Network
In this section, we propose the deep recurrent network used to process the piezoelectric sensor data. To build a network that works well with the input sensor data, we need to understand the characteristics of the sensor data. In addition, to train the network to good performance, the input data and the ground-truth data should be well defined. Therefore, we explain our motion recognition method and discuss the characterization of the piezoelectric sensor. Afterwards, we discuss how to obtain sensor data for training. Finally, the structure of the proposed network using two different recurrent units, RNN and LSTM, is introduced, and the performance of the two units is compared.

Conventional Angle Recognition Method
In our previous study, we used the relationship between the curvature and the voltage of the piezoelectric sensor given in [22] as Equations (2) and (3), where $q$ is the charge stored in the piezoelectric sensor, $C$ is the capacitance, $v$ is the piezoelectric voltage, $\bar{k}$ is the mean curvature, and $R_{load}$ is the load resistance; the relationship also involves the electromechanical coupling coefficient. Applying the Laplace transformation, Equations (2) and (3) can be written in the $s$ domain as Equations (4) and (5).
Inserting Equation (4) into Equation (5) by substituting for $Q(s)$ gives Equation (6). Reorganizing Equation (6) as the relation between $K(s)$ and $V(s)$ gives Equation (7). When the piezoelectric sensor has low capacitance and load resistance values ($1 \gg C \cdot R_{load}$), Equation (7) reduces to Equation (8), which is expressed in the time domain as Equation (9) after applying the inverse Laplace transform. In the beam structure, the value of the mean curvature is related to the change of the slope between the start and end of the beam, and the slope can be expressed as the angle of the sensor [22]. From these relations, we derive Equation (10), which relates the angular change of the sensor to the piezoelectric voltage [18]. In Equation (10), $\theta$ is the angle variation of the piezoelectric sensor, $A$ is the gain including all proportional terms, $O$ is the offset, $\Delta t_m$ is the time interval, and $t_n = \sum_{m=1}^{n} \Delta t_m$.
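As a minimal sketch, assuming Equation (10) is the discrete integral $\theta(t_n) = A \sum_{m=1}^{n} (v(t_m) - O(t_m)) \Delta t_m$ implied by the definitions above, the angle can be accumulated sample by sample. The function name, the gain value, and the voltages below are hypothetical.

```python
def integrate_angle(voltages, offsets, dts, gain):
    """Accumulate gain * (v - O) * dt over samples (assumed form of Equation (10))."""
    theta = 0.0
    angles = []
    for v, o, dt in zip(voltages, offsets, dts):
        theta += gain * (v - o) * dt
        angles.append(theta)
    return angles

# Hypothetical data: 100 Hz sampling, a true offset of 2.5 V, one flexion-extension pulse.
v = [2.5, 2.7, 2.7, 2.5, 2.3, 2.3, 2.5]
good = integrate_angle(v, [2.5] * 7, [0.01] * 7, gain=1000.0)  # correct offset
bad = integrate_angle(v, [2.4] * 7, [0.01] * 7, gain=1000.0)   # offset off by 0.1 V
```

With the correct offset the angle returns to zero after the movement, whereas the 0.1 V offset error adds a constant increment at every sample, so the computed angle drifts even when the sensor is at rest.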

Estimation of Moving State and Offset Voltage
Assigning $t = 0$ in Equation (9), $v(0)$ is determined by the initial value of the mean curvature $\bar{k}(0)$. This means that, if $\bar{k}(0)$ is nonzero, there is a voltage offset. Ideally, the mean curvature is zero at the original position. However, it is difficult to achieve zero mean curvature in a real case. In addition, given that a finger moves continuously in a real-time experiment, the initial mean value can vary slightly when the finger comes back to the original position. The difference between the original position and the varied position also affects the offset value.
As shown in Figure 4, calculating joint angle values with improper offsets gives incorrect results. If $v(t_m)$ is the same as $O(t_m)$, as shown in Figure 4a, the joint angle is calculated normally. However, if there is a difference between the estimated offset and the real offset of the raw data, as shown in Figure 4b,c, the error $(v(t_m) - O(t_m))$ is integrated over the time intervals $\Delta t_m$. Therefore, the selection of an appropriate offset voltage value is an important issue. In [18], we defined the offset value $O_a$ as the average of 3000 data. Using the average value is a reasonable choice for following the tendency of the signal. However, the average value can oscillate when samples with a large deviation enter the window, and it tracks a slight offset change only slowly. To solve this problem, in [19] we determined the offset voltage from the voltages measured during non-movement, so as to update $\bar{k}(0)$. Assuming that only a curvature change of the sensor results in a significant change of the piezoelectric voltage, the heuristic moving state $m_h$ is estimated by Equation (12), where $v_{th}$ is the threshold voltage for deciding the moving state. We consider the finger as moving when $m_h(t_n) = 1$ and stopped when $m_h(t_n) = 0$. This method can continuously renew the offset change caused by variation of $\bar{k}(0)$ faster than the previous one. However, the heuristic moving state estimation of Equation (12) faces the following problems in a real-time system:
• If noise is added to $v$, it is hard to distinguish noise from curvature change.
• Input noise can take random values when the system is operated in a different environment.
• When noise is random, $v_{th}$ is hard to determine.
For these reasons, it is difficult to estimate $m_h$ in a noisy environment. Therefore, as an alternative, we propose a network that learns the moving state using recurrent neural networks.
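The heuristic approach and its sensitivity to noise can be illustrated with a minimal sketch. The exact form of Equation (12) is not reproduced here; the sketch assumes that motion is flagged when the voltage deviates from the current offset by more than $v_{th}$, and that the offset is the mean of recent non-moving samples. The function name, the window length, and the voltages are hypothetical.

```python
from collections import deque

def heuristic_states(voltages, v_th, init_offset=2.5, window=200):
    """Assumed heuristic: flag motion when |v - offset| exceeds v_th, and
    refresh the offset from recent samples judged as not moving."""
    still = deque([init_offset], maxlen=window)  # recent non-moving voltages
    offset = init_offset
    states, offsets = [], []
    for v in voltages:
        moving = 1 if abs(v - offset) > v_th else 0
        if not moving:
            still.append(v)
            offset = sum(still) / len(still)
        states.append(moving)
        offsets.append(offset)
    return states, offsets

# Hypothetical trace: the 2.58 V sample is noise, the 3.00 V samples are a real bend.
trace = [2.50, 2.51, 2.58, 2.50, 3.00, 3.00, 2.50]
states, _ = heuristic_states(trace, v_th=0.05)
```

In this trace the noise sample is flagged as movement just like the real bend, and raising `v_th` to reject it would also hide slow, small movements, which is the difficulty listed above.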

Data Acquisition
For network training, we need a dataset that includes the input data and the ground-truth target data to be predicted. In this study, we extracted the ground-truth moving state from the finger joint angle measured by the leap motion controller. Unlike other vision sensors, the leap motion controller provides the finger joint angle directly, without additional processing. Although the angle measured by the leap motion controller differs from the real joint angle, as examined by Chophuk et al. [23], we chose the leap motion controller as the ground-truth sensor because it provides joint angles directly and facilitates the generation of a large dataset.
To obtain a dataset for learning, we conducted an experiment using the interface described in Section 2. For data acquisition, we set up the experiment as follows:
• We used the soft sensor located on the glove, and its piezoelectric voltage was measured with the interface board.
• The joint angle used to derive the moving state was measured simultaneously with the leap motion controller.
• A single movement consisted of flexion and extension.
• The flexion angle was chosen randomly between 30 degrees and 90 degrees.
• Movements were performed 1080 times for the training set and 270 times for the test set.
The moving state $m_{gt}$ is decided by thresholding the angle variation, where $\Delta\theta_{th}$ is the threshold angle variation. We assumed that the finger is moving when the difference between the average of the 10 most recent values and the new value is larger than $\Delta\theta_{th}$. We selected $\Delta\theta_{th}$ as 2 degrees based on preliminary tests. Figure 5 displays a sample of the dataset acquired from the experiment. The network was trained using the input data in Figure 5a to estimate the moving state in Figure 5c. Since our goal was to extract the real-time moving state from the piezoelectric sensor outputs, we preprocessed the piezoelectric voltage into sequential input data $X(t_n) = [x(t_{n-9}), ..., x(t_n)]$ using the sliding window method, where $x(t_n)$ is the preprocessed piezoelectric voltage sample at $t_n$. In addition, we define the target data of the network as $y(t_n) = m_{gt}(t_n)$. Summaries of the processed training and test sets are shown in Table 1.
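The labeling rule and the sliding window can be sketched as follows. This is a minimal sketch under two assumptions: the 10 recent values are taken as the samples preceding the new one, and samples without any history are labeled static. The function names and the example values are hypothetical.

```python
def moving_state_labels(angles, threshold_deg=2.0, history=10):
    """Label a sample 1 (moving) when it differs from the mean of the
    preceding `history` angles by more than threshold_deg, else 0."""
    labels = []
    for n, angle in enumerate(angles):
        recent = angles[max(0, n - history):n]
        if not recent:            # no history yet: assume static (sketch choice)
            labels.append(0)
            continue
        mean_recent = sum(recent) / len(recent)
        labels.append(1 if abs(angle - mean_recent) > threshold_deg else 0)
    return labels

def sliding_windows(samples, labels, length=10):
    """Pair each window X(t_n) = [x(t_{n-9}), ..., x(t_n)] with y(t_n) = labels[n]."""
    pairs = []
    for n in range(length - 1, len(samples)):
        pairs.append((samples[n - length + 1:n + 1], labels[n]))
    return pairs

# Hypothetical angles: static, then a flexion ramp of 3 degrees per sample.
angles = [0.0] * 12 + [3.0 * k for k in range(1, 6)]
labels = moving_state_labels(angles)
voltages = [2.5] * len(angles)          # placeholder piezoelectric samples
dataset = sliding_windows(voltages, labels)
```

Each training pair therefore holds 10 consecutive voltage samples and the single moving state at the last of those samples, which is what allows the network to be queried once per incoming sample in real time.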

Proposed Network
To learn the moving state of the sensor, we used a recurrent neural network (RNN), which is appropriate for training on sequential data. Using the input vector $x_t$ and the previous hidden state $h_{t-1}$, a simple RNN determines the hidden state $h_t$ with the network parameters $W_h$, $W_x$, and $b$. Given that the hidden state $h_t$ is determined from the previous hidden state $h_{t-1}$, a simple RNN can retain sequential information through the recurrent hidden state. However, as the sequence becomes longer, the gradient vanishes or explodes. To solve this problem and propagate the hidden layer information deeper, Long Short-Term Memory (LSTM) was proposed in [24]. Using the cell vector $c_t$ and several gates, the hidden state $h_t$ is calculated, where $i_t$, $f_t$, and $o_t$ are the input gate, forget gate, and output gate vectors, respectively; $g_t$ and $h_t$ are the cell input and output activations; $W$ and $b$ denote the weight matrices and bias vectors of the network; $\odot$ is the element-wise product of vectors; and $\tanh$ is the output activation function. The $h_t$ of an LSTM unit can retain more sequential information than the $h_t$ of a simple RNN unit. An LSTM unit has four times as many parameters as a simple RNN unit. We applied both recurrent units in the proposed network and compared their performance.
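The factor of four can be checked by counting parameters. The sketch below assumes the standard LSTM formulation without peephole connections and with a single bias vector per gate; the layer sizes are hypothetical, since the hidden size of the network is not stated here.

```python
def rnn_params(input_size, hidden_size):
    """Simple RNN layer: W_x (hidden x input), W_h (hidden x hidden), bias b."""
    return hidden_size * input_size + hidden_size * hidden_size + hidden_size

def lstm_params(input_size, hidden_size):
    """LSTM layer: one such parameter set for each of the input gate,
    forget gate, output gate, and cell input."""
    return 4 * rnn_params(input_size, hidden_size)

# Hypothetical sizes: one voltage value per time step, 32 hidden units.
print(rnn_params(1, 32), lstm_params(1, 32))  # prints: 1088 4352
```

Some frameworks store two bias vectors per gate, which changes the totals slightly but keeps the ratio between the two unit types at four.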
To implement real-time moving state estimation, we propose the network shown in Figure 6. At each time step $t_n$, $X(t_n)$ passes through two recurrent layers and produces the hidden layer $h(t_n)$, which contains the sequential information of $X(t_n)$. To convert the sequential information into a single moving state, the fully connected layer combines the information of $h(t_n)$ and determines the final output. Given that state 1 refers to moving and 0 refers to stopping, the final output of the network $p(t_n)$ gives the probability of moving. Therefore, we define the predicted moving state as $m_p(t_n) = [p(t_n)]$, that is, $p(t_n)$ rounded to 0 or 1. We trained the network for 100 epochs and used stochastic gradient descent (SGD) as the optimizer. In addition, we applied a dropout layer after the final recurrent layer at a rate of 50% during training to avoid overfitting.
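The forward pass of such a network can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the trained model: the weights are random and untrained, the hidden size of 16 and the gate ordering are hypothetical, the rounding is implemented as a threshold at 0.5, and the dropout layer and SGD training are omitted because they act only during training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(rng, input_size, hidden_size):
    """Random (untrained) weights for one LSTM layer; gate order i, f, o, g."""
    scale = 0.1
    return {
        "W_x": rng.normal(0.0, scale, (4 * hidden_size, input_size)),
        "W_h": rng.normal(0.0, scale, (4 * hidden_size, hidden_size)),
        "b": np.zeros(4 * hidden_size),
    }

def lstm_layer(params, inputs):
    """Run one LSTM layer over a sequence; returns the hidden state at each step."""
    hidden_size = params["W_h"].shape[1]
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    outputs = []
    for x in inputs:
        z = params["W_x"] @ x + params["W_h"] @ h + params["b"]
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g          # cell state update
        h = o * np.tanh(c)         # hidden state
        outputs.append(h)
    return np.stack(outputs)

def predict_moving_state(window, layer1, layer2, w_fc, b_fc):
    """Two stacked LSTM layers, then a fully connected sigmoid output p(t_n)."""
    seq = np.asarray(window, dtype=float).reshape(-1, 1)   # 10 steps, 1 feature
    h_last = lstm_layer(layer2, lstm_layer(layer1, seq))[-1]
    p = float(sigmoid(w_fc @ h_last + b_fc))
    return p, int(p >= 0.5)        # threshold at 0.5 to obtain m_p

rng = np.random.default_rng(0)
hidden = 16                         # hypothetical hidden size
layer1 = init_lstm(rng, 1, hidden)
layer2 = init_lstm(rng, hidden, hidden)
w_fc, b_fc = rng.normal(0.0, 0.1, hidden), 0.0
p, m_p = predict_moving_state([2.5] * 10, layer1, layer2, w_fc, b_fc)
```

Only the hidden state at the last time step is passed to the fully connected layer, so each 10-sample window yields exactly one probability and one moving state.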
As the recurrent unit, we used simple RNN and LSTM units and compared the results. Table 2 displays the accuracy on the training set and test set for both methods. The model using the LSTM unit exhibited slightly higher accuracy on both the test and training sets. In addition, the test accuracy of the model using the LSTM unit is more consistent than that of the model using the simple RNN. The result indicates that simply substituting the recurrent unit, without changing the network structure, improves performance in the proposed network. As stated in [25], the simple RNN structure has a modeling weakness, and the LSTM structure is used to overcome the vanishing error [24,26]. The error gradient vanishes more severely as the input-output dependencies span longer time ranges. This means that the LSTM unit can propagate the error information deeper than the RNN unit during training when the training data include long recurrent time steps. Since a single training sample in our work includes 10 time steps of sensor data, the LSTM unit can propagate the information deeper than the simple RNN unit when training the network. Therefore, we conclude that the LSTM unit is better suited than the simple RNN unit for soft sensor characterization, and we chose the LSTM unit for the proposed network.

Result and Discussion
We evaluated our model using a real-time glove system. The experiments were conducted with the interface and system described in Section 2. Two experiments were conducted in this study. We first demonstrated network performance in real-time moving state estimation and then established the validity of applying the network to joint angle calculation. To estimate the moving state $m_p$ of each joint, we applied the network proposed in Section 3.4 to every input $X$ received by the interface board from the different sensors. For calculation of the joint angle, we updated the offset values using $m_p$ and calculated the angles in degrees for all sensor locations.
Thumb bending and fist clenching without the thumb were selected as the experimental motions. The motions are shown in Figure 7. Each motion was performed five times for one test. Using the thumb bending motion, the three sensors installed in the thumb (thumb1, thumb2, and thumb3) were evaluated. The sensors installed in the index finger (index1, index2, and index3) were evaluated using the fist clenching motion without the thumb. Experiments were conducted 80 times for moving state estimation and 50 times for angle value calculation.

Moving State Estimation
To evaluate the performance of the proposed network, we compared the result $m_p$ with $m_h$ from Equation (12). Table 3 shows the identical ratio of $m_p$ and $m_h$ with $m_{gt}$. In Table 3, it is evident that the coincidence between $m_p$ and $m_{gt}$ is 5-25% higher than the coincidence between $m_h$ and $m_{gt}$. This shows that the predicted $m_p$ is more similar to the ground-truth data than $m_h$. In addition, while $m_h$ confuses noise with curvature change, as shown in Figure 8d, the result in Figure 8c shows that $m_p$ rarely confuses noise with sensor movement and recognizes only sensor movement. Based on this result, we demonstrate that the network exhibits good performance for moving state estimation on real-time data from multiple sensors as well as on pre-acquired data from a single sensor. The result indicates that the proposed network can also predict the moving state in a real-time system, and that tracking $\bar{k}(0)$ for offset renewal in a real-time system is also facilitated by the proposed network.
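The identical ratio can be read as the percentage of time steps at which an estimated moving state equals the ground-truth state. A minimal sketch under that assumption is given below; the function name and the state sequences are hypothetical and are not taken from Table 3.

```python
def identical_ratio(estimated, ground_truth):
    """Percentage of time steps where the estimated state equals the ground truth."""
    if len(estimated) != len(ground_truth):
        raise ValueError("state sequences must have equal length")
    matches = sum(1 for e, g in zip(estimated, ground_truth) if e == g)
    return 100.0 * matches / len(ground_truth)

# Hypothetical state sequences for one sensor.
m_gt = [0, 0, 1, 1, 1, 0, 0, 0]
m_p = [0, 0, 1, 1, 1, 1, 0, 0]
m_h = [0, 1, 1, 0, 1, 1, 0, 1]
ratio_p = identical_ratio(m_p, m_gt)
ratio_h = identical_ratio(m_h, m_gt)
```

For these sequences the ratios are 87.5 and 50.0, respectively; isolated flips such as those in `m_h` are the kind of error produced when noise crosses the voltage threshold.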

Angle Value Estimation
Building on the good network performance in predicting $m_p$, this subsection shows the ability to estimate joint angles by utilizing $m_p$. For demonstration, we compared the result with the methods used in our earlier studies [18,19]. In [18], we took the offset $O_a$ as the average value of the sensor data, without using the moving state. In [19], we estimated the moving state heuristically and determined the offset $O_h$. In this experiment, we determined the offset $O_p$ from the predicted moving state, as the average of the most recent 200 data for which $m_p = 0$.
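The offset $O_p$ can be sketched as a running average over the most recent samples predicted as static. The sketch below assumes that the offset is held at its last value while the sensor is predicted to be moving and that the 2.5 V board offset is used before any static sample arrives; the function name and the voltages are hypothetical.

```python
from collections import deque

def predicted_offsets(voltages, moving_states, window=200, init_offset=2.5):
    """Offset O_p as the mean of the most recent `window` voltages whose
    predicted moving state is 0; the offset is held while the sensor moves."""
    still = deque(maxlen=window)
    offset = init_offset            # board offset used until static data arrive
    offsets = []
    for v, m in zip(voltages, moving_states):
        if m == 0:
            still.append(v)
            offset = sum(still) / len(still)
        offsets.append(offset)
    return offsets

# Hypothetical trace: the offset drifts from 2.50 V to 2.60 V after a movement.
voltages = [2.50, 2.50, 3.10, 3.20, 2.60, 2.60]
m_p = [0, 0, 1, 1, 0, 0]
offsets = predicted_offsets(voltages, m_p, window=2)
```

Because samples with $m_p = 1$ never enter the average, the large voltages generated during the movement do not disturb the offset, and the new resting level is adopted as soon as static samples arrive.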
For joint angle calculation, we applied the offsets $O_a$, $O_h$, and $O_p$ in Equation (10). The calculated joint angles were compared to the joint angles measured by the leap motion controller, and the criterion is defined as the difference between them in degrees. Table 4 shows the average values of the angle difference in degrees between the joint angle from the leap motion controller and the calculated joint angles. Table 5 shows the standard deviation of the degree difference for each location. Figures 9 and 10 show example comparisons of the joint angles at each location. From the results, the averages and standard deviations of the degree differences are lowest for the joint angle calculated using $O_p$. In addition, as shown in Figures 9 and 10, the angle estimated by the proposed method is the closest to the ground-truth data. The result demonstrates that a good estimation of the moving state leads to a good result for angle estimation. One notable result is that the joint angle calculated using $O_h$ is occasionally very stable but does not always give a better angle estimate than $O_a$. This means that incorrect estimation of the moving state can lead to poor results for angle estimation. Therefore, we demonstrate the importance of precise estimation of the moving state via two results: the good estimation of the joint angle using $O_p$ and the occasionally poor estimation of the joint angle using $O_h$.
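The two criteria can be computed as in the following minimal sketch. It assumes that the difference is taken as an absolute value per sample and that the population standard deviation is used; neither detail is stated above, and the function name and angle values are hypothetical.

```python
from statistics import mean, pstdev

def angle_error_stats(calculated_deg, reference_deg):
    """Mean and standard deviation of the absolute angle difference in degrees."""
    diffs = [abs(c - r) for c, r in zip(calculated_deg, reference_deg)]
    return mean(diffs), pstdev(diffs)

# Hypothetical angles in degrees.
leap = [0.0, 10.0, 20.0, 30.0]
calc = [1.0, 9.0, 23.0, 27.0]
avg, std = angle_error_stats(calc, leap)
```

Reporting both values separates a constant bias, which raises only the mean, from the drift caused by a wrong offset, which raises the standard deviation as well.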

Conclusions
In this paper, we proposed a virtual reality interface system with a smart glove that uses a deep learning network to predict the moving state of piezoelectric soft sensors. To predict the nonlinear characteristic of sequential sensor data, we used LSTM as the recurrent unit of the proposed network. This network predicted the moving state of a single soft sensor with 87% accuracy on the test dataset and with more than 80% accuracy on real-time data from multiple sensors. In addition, the joint angle calculated using the proposed method was the most similar to the leap motion controller measurement among the compared methods. Based on these results, we conclude that deep learning can be applied to infer the characteristics of a piezoelectric sensor in a real-time system. In addition, precise estimation of the sensor moving state affects the estimation of the joint angles in real-time application. In future work, we will apply deep learning networks to other systems using piezoelectric sensors and broaden the range of sensor features that are predicted.