Analysis and Classification of Motor Dysfunctions in Arm Swing in Parkinson's Disease

Due to increasing life expectancy, the number of age-related diseases with motor dysfunctions (MD) such as Parkinson's disease (PD) is also increasing. The assessment of MD is visual and therefore subjective. For this reason, many researchers are working on an objective evaluation. Most of the research on gait analysis deals with the analysis of leg movement. The analysis of arm movement is also important for the assessment of gait disorders. This work deals with the analysis of the arm swing by using wearable inertial sensors. A total of 250 records of 39 different subjects were used for this task. Fifteen subjects of this group had MD. The subjects had to perform the standardized Timed Up and Go (TUG) test to ensure that the recordings were comparable. The data were classified by using the wavelet transformation, a convolutional neural network (CNN), and weight voting. During the classification, single signals, as well as signal combinations, were observed. We were able to detect MD with an accuracy of 93.4% by using the wavelet transformation and a three-layer CNN architecture.


Introduction
The life expectancy of humankind is increasing worldwide. Life expectancy is projected to increase in the 35 industrialised countries with a probability of at least 65% for women and 85% for men. There is a 90% probability that life expectancy at birth among South Korean women in 2030 will be higher than 86.7 years, the same as the highest worldwide life expectancy in 2012, and a 57% probability that it will be higher than 90 years [1]. Due to the increasing life expectancy, the number of old-age diseases is also increasing. One of them is PD. At present, there are 10 million people affected by this disease, and the trend is increasing [2]. Parkinson's disease is a neurodegenerative disease and is currently incurable. However, the progression of the disease can be delayed by medication. For this reason, an exact diagnosis is very important so that the medication can be adjusted as well as possible to the particular person. There are different rating scales for the uniform assessment, e.g., the Unified Parkinson's Disease Rating Scale (UPDRS) [3]. With the help of this rating scale, for example, cognitive and motor performance are assessed. One of the motor tests is the Timed Up and Go (TUG). The assessment is visual and therefore subjective. For this reason, many researchers are working on the objective evaluation of this test.
A similar system was used in [22] with nearly the same sensors and sensor position. An eigenvector method was suggested to compare the axes of the left and right hand. The results showed a difference between people with Parkinson's disease and healthy people.
In our approach, we propose a medical wearable system that: (a) classifies between subjects with motor dysfunctions and a control group based exclusively on arm motions; (b) uses 3D data from the accelerometer, gyroscope, and magnetometer; (c) includes new parameters; (d) is small and easy to use; (e) is not bound to a location; (f) requires a small number of sensors; and (g) is low cost.
This paper is organized as follows. Section 2 describes our materials and is divided into the medical experiment protocol, the hardware used, and the dataset. Section 3 describes our methods and how we applied them to our data. Section 4 presents the results. Finally, a discussion and comparison are found in Section 5.

Protocol
We decided to use the TUG test as a suitable test for recording gait data. Among other things, it is used to evaluate the motor performance of the UPDRS. For the test, only a chair with a backrest and armrests was needed. At first, the test person was sitting on a chair. Upon a command from the test leader, the test person stood up and walked straight ahead for ten meters at an appropriate speed to a mark. At the mark, the test person turned around and walked ten meters straight ahead, back to the chair. The test person sat down in the chair. The test and the recording were then finished. We divided the TUG into two different parts for later analysis of the data. Part (A) contained all data of the TUG including standing up and sitting down in the chair. Part (B) included going straight to the mark, turning around, and going straight back to the chair. Parts (A) and (B) are shown in Figure 1. The aim of this splitting was to extract the gait data from the complete recording.

Hardware
For data recording, we used two wristbands with the Meta Motion Rectangle wearable sensors from Mbientlab; see Figure 2 [23]. This is an inertial measurement unit (IMU) sensor. It consists of a BMI 160 with a 3-axis gyroscope and a 3-axis accelerometer and a BMM 150 with a 3-axis magnetometer. By using the Bosch sensor fusion algorithm, the Euler angle and linear acceleration can be obtained [24]. The x-axis corresponds to the gait direction.

Dataset
To create a dataset for later analysis, we worked together with the Niederlausitz Clinic in the study "Development of a digital Parkinson Disease Assessment" (ethics request granted in December 2018 by Ethics Committee Brandenburg). All persons were evaluated by the physicians. A total of 39 different persons with 250 recordings were available for the dataset. Of these, 15 were motor dysfunction patients with 80 recordings, and 24 persons with 170 recordings formed the control group. Table 1 summarizes the data.

Sensor Data
While the subjects performed the TUG test, 3D Euler angles and 3D linear acceleration of the arms were captured. The signals for the Euler angles and the linear acceleration were the result of the sensor fusion algorithm from Bosch, which used the data from the accelerometer, gyroscope, and magnetometer. Both signals were recorded at a frequency of 100 Hz. Figure 3 shows the complete signal of one wristband during the TUG test, with the 3D Euler angles at the top and the 3D linear acceleration signals at the bottom. Part (A) contains all recorded data and Part (B) the data between the black dotted lines, i.e., the active walking parts.
Figure 3 also shows that some jumps existed in the signal of the z-axis of the Euler angle. This was because the value range of the sensor was between 0° and 360°, so the signal wrapped around at the limits of this range. This made the signal unstable. To correct this, we removed all jumps that were greater than a threshold of 300°. Our procedure is shown in Equation (1): if the absolute value of the difference of two successive sensor values satisfied $|x_i - x_{i+1}| > 300$, a correction of the signal was performed, where $i \in \{1, \dots, N-1\}$ and $N$ indicates the length of the signal. The result of the cleanup is given in Figure 4.
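The jump removal can be sketched as follows. The text states the detection rule (a step larger than 300° between successive samples) but not the correction itself, so this sketch assumes that each detected jump is a wrap-around and compensates it by shifting all following samples by a full turn of 360°. The function name `remove_jumps` and its parameters are hypothetical.

```python
def remove_jumps(angles, threshold=300.0, full_turn=360.0):
    """Remove wrap-around jumps from an Euler-angle signal given in degrees.

    Assumption: a step larger than `threshold` between two successive
    samples is a wrap at the 0/360 boundary, and it is undone by shifting
    all following samples by one full turn.
    """
    if not angles:
        return []
    corrected = [angles[0]]
    offset = 0.0  # accumulated shift applied to the raw samples
    for prev, curr in zip(angles, angles[1:]):
        step = curr - prev
        if step > threshold:        # wrapped from near 0 up to near 360
            offset -= full_turn
        elif step < -threshold:     # wrapped from near 360 down to near 0
            offset += full_turn
        corrected.append(curr + offset)
    return corrected


# Usage with a short synthetic z-axis signal (illustrative values only)
raw = [350.0, 355.0, 2.0, 8.0, 359.0, 350.0]
clean = remove_jumps(raw)
print(clean)
```

After the correction, no two successive samples differ by more than the threshold, which is the property the subsequent differentiation step relies on.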

Derivation
It was not possible to create a classifier that could distinguish subjects with motor dysfunctions (MD) from subjects without MD by using the Euler angles directly, because the Euler angles were measured as absolute values. This means that the angles were not calibrated to a starting value at the beginning of the recording. For this reason, we calculated the derivative of each axis of the Euler angles as the difference between two successive measured values. The first-order discrete derivative is given in Equation (2), $x'_i = x_{i+1} - x_i$, where $i \in \{1, \dots, N-1\}$, $N$ is the length of the signal, $x_i$ is the signal at index $i$, and $x'_i$ is the value of the difference at $i$. The result of the derivation can be seen in Figure 5. The derivation makes the signals of different recordings more comparable because it uses relative rather than absolute angles.
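A minimal sketch of the first-order difference illustrates why it removes the dependence on the uncalibrated starting angle. The sign convention (next value minus current value) and the function name `first_difference` are assumptions; `numpy.diff` computes the same quantity for arrays.

```python
def first_difference(signal):
    """Return the differences between successive samples (length N - 1)."""
    return [b - a for a, b in zip(signal, signal[1:])]


# Two recordings of the same movement with different starting angles
recording_a = [10.0, 12.0, 15.0, 15.0]
recording_b = [value + 90.0 for value in recording_a]  # constant offset

diff_a = first_difference(recording_a)
diff_b = first_difference(recording_b)
print(diff_a, diff_b)
```

Both recordings yield the same difference signal, since a constant offset cancels in the subtraction.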

Resampling
Before the CNN can interpret the data, the signals must have a uniform length. To do this, we resampled each signal to a length of 512 values. For resampling, we used the Python library SciPy [25].
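The text does not state which SciPy routine was used. One candidate is the FFT-based `scipy.signal.resample(x, 512)`. To stay free of dependencies, the sketch below substitutes plain linear interpolation, which is a different technique with the same goal of mapping a recording of arbitrary length to 512 values. The function name `resample_linear` is hypothetical.

```python
def resample_linear(signal, target_len=512):
    """Resample `signal` to `target_len` values by linear interpolation.

    Stand-in for the SciPy resampling used in the paper; the first and
    last samples are preserved.
    """
    n = len(signal)
    if n == 0:
        raise ValueError("signal must not be empty")
    if n == 1 or target_len == 1:
        return [float(signal[0])] * target_len
    scale = (n - 1) / (target_len - 1)   # source step per output sample
    out = []
    for k in range(target_len):
        pos = k * scale                  # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


# Usage: a recording with 700 samples becomes 512 values
recording = [float(i) for i in range(700)]
uniform = resample_linear(recording, 512)
print(len(uniform))
```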

Wavelet Transformation
The Fourier transformation is very well suited to stationary signals, whose frequency content does not change over time. Unfortunately, there are hardly any stationary signals in the real world; most signals change their frequency content dynamically in time. This also applies to the human gait, which is a dynamic process. For this reason, Fourier analysis alone is of limited use here, because it does not show when a frequency occurs.
Since our data are time series, we preferred the wavelet transform, which increases the available information through a time-frequency decomposition. The classification accuracy obtained in our experiments confirmed that this transform yields useful features. For the wavelet transformation, a signal was convolved with a wavelet template. By selecting the kernel, we ensured that the range around 1.2 Hz (the frequency of the arm swing [26]) had a high amplitude. With this template, we calculated the wavelet transformation over the complete signal. In our case, these were the x-, y-, and z-axes of the derived Euler angles and the x-, y-, and z-axes of the linear acceleration of both wristbands. For each signal, we calculated the continuous wavelet transformation with the Morlet wavelet, using the Python library PyWavelets [27]. Figure 6 shows the scalograms of the individual signals of one wristband, with the frequency in hertz on the y-axis and the time in seconds on the x-axis.
Figure 6a,c,e corresponds to the x-, y-, and z-axes of the derived Euler angles. It can be seen that there was a high amplitude from 0.25 Hz. In the lower-frequency data (< 0.25 Hz), the individual arm swings can be seen. Figure 6b,d,f reflects the x-, y-, and z-axes of the linear acceleration. With these data, it can be seen that the largest amplitude was in the range of 1 Hz. This corresponds to the natural arm swing, which has a frequency of approximately 1.2 Hz [26].
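To illustrate how a scalogram exposes the arm-swing frequency, the following sketch hand-rolls a continuous wavelet transform with a complex Morlet wavelet using only the standard library. It is not the PyWavelets implementation used in the paper (there, a call such as `pywt.cwt(signal, scales, 'morl')` would be the likely counterpart). The function name, the `n_cycles` parameter, the truncation at three standard deviations, and the test frequencies are all assumptions made for illustration.

```python
import cmath
import math


def morlet_scalogram(signal, fs, freqs, n_cycles=4.0):
    """Return |CWT| as a list of rows, one row per frequency in `freqs`.

    Each row has the same length as `signal`; samples beyond the signal
    borders are treated as zero.
    """
    n = len(signal)
    rows = []
    for f in freqs:
        sigma = n_cycles / (2.0 * math.pi * f)     # envelope width in seconds
        half = int(math.ceil(3.0 * sigma * fs))    # truncate at +-3 sigma
        kernel = []
        for k in range(-half, half + 1):
            t = k / fs
            envelope = math.exp(-t * t / (2.0 * sigma * sigma))
            kernel.append(envelope * cmath.exp(-2j * math.pi * f * t))
        norm = sum(abs(c) for c in kernel)         # sum of the envelope
        row = []
        for i in range(n):
            acc = 0j
            for k, c in enumerate(kernel):
                j = i + k - half
                if 0 <= j < n:
                    acc += signal[j] * c
            row.append(abs(acc) / norm)
        rows.append(row)
    return rows


# Usage with a synthetic 1.2 Hz oscillation standing in for an arm swing
fs = 20.0
swing = [math.sin(2.0 * math.pi * 1.2 * i / fs) for i in range(200)]
freqs = [0.6, 1.2, 2.4]
scalogram = morlet_scalogram(swing, fs, freqs)
row_means = [sum(row) / len(row) for row in scalogram]
dominant = freqs[row_means.index(max(row_means))]
print(dominant)
```

The row belonging to 1.2 Hz carries the largest mean magnitude, which mirrors the high-amplitude band around the arm-swing frequency described for Figure 6.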

CNN
CNNs have been very successful in image classification and in the classification of other signals. Unlike a conventional fully connected neural network (NN), a CNN searches for local patterns in the input signal. When several convolutional layers are stacked, larger patterns can be detected [28,29]. Thus, a CNN often provides better classification results than a plain NN. In our case, we achieved the best results with three convolutional layers, followed by a dense NN with three encoder layers and one decoder layer. The CNN and its configuration are shown in Figure 7. We used Python and the Keras library to create the CNN [30]. We obtained the architecture by systematic testing. We wanted to keep the number of convolutional layers as small as possible; however, with fewer than three layers, no useful results were obtained.
In order to have a useful input for the CNN, we resampled the signal to a uniform length of 512 values; see Section 3.3. We then applied a wavelet transformation to the signal; see Section 3.4. This gave us a 128 × 512 matrix for the signal, which we used as input for the CNN. As the activation function, we used the ReLU function, $f(x) = \max(0, x)$, for all convolutional layers and for the hidden layers of the encoder and decoder; see Equation (3). The characteristic of the ReLU function is that its output is never negative. In the output layer, we used the sigmoid function, $\sigma(x) = \frac{1}{1 + e^{-x}}$; see Equation (4). After each convolutional layer, we performed a two-dimensional max-pooling with a pool size of 2 × 2 and a dropout with a probability of 0.2.
The first convolutional layer searched for the smallest patterns in the signal. For the convolution, we used a 3 × 3 kernel. In total, we created 64 different filters in the first convolutional layer. In the second convolutional layer, we increased the kernel size to 5 × 5 and again created 64 filters. The third convolutional layer had a kernel size of 7 × 7, and the number of filters was reduced to 32. After the convolutional layers, we used a flatten layer so that the signal could be interpreted by the dense layers. In the dense layers, we started with three encoder layers with 100, 50, and 10 neurons, followed by a decoder layer with 30 neurons. Finally, we obtained our prediction in the output layer. Since we had a binary problem, a single output neuron was used. For the training of the models, we used a batch size of 50 and 50 epochs. Training was performed on a four-core Intel Core i7-6700HQ at 2.6 GHz with 16 GB RAM and required approximately 45 min per model.
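To make the layer dimensions concrete, the sketch below traces the feature-map shape of a 128 × 512 scalogram through the three convolution and pooling stages. Kernel sizes and filter counts are taken from the text. The padding mode and a stride of 1 are not stated there, so both are assumptions and the function reports both padding options. The helper names are hypothetical, and no Keras code is executed.

```python
def conv2d_shape(height, width, kernel, padding):
    """Spatial output size of a stride-1 convolution."""
    if padding == "same":
        return height, width
    if padding == "valid":
        return height - kernel + 1, width - kernel + 1
    raise ValueError("padding must be 'same' or 'valid'")


def trace_shapes(height=128, width=512, channels=1, padding="valid"):
    """Return the (height, width, channels) after each stage and the flatten size."""
    layers = [(3, 64), (5, 64), (7, 32)]   # (kernel size, filters) from the text
    shapes = [(height, width, channels)]
    h, w = height, width
    for kernel, filters in layers:
        h, w = conv2d_shape(h, w, kernel, padding)
        h, w = h // 2, w // 2              # 2 x 2 max-pooling, remainder dropped
        shapes.append((h, w, filters))
    flat = h * w * layers[-1][1]           # input size of the first dense layer
    return shapes, flat


valid_shapes, valid_flat = trace_shapes(padding="valid")
same_shapes, same_flat = trace_shapes(padding="same")
print(valid_shapes, valid_flat)
print(same_shapes, same_flat)
```

Under either assumption, the flatten layer passes tens of thousands of values to the first encoder layer with 100 neurons, so most trainable weights sit in that first dense layer rather than in the convolutions.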

Multi-Channel CNN
In the last section, we presented our architecture for a single signal. To achieve better and more robust results, we wanted to use multiple channels for classification, namely the x- and y-axes of the Euler angles and the x-axis of the linear acceleration. For this reason, we extended the input by a third dimension whose size is the number m of signals used. Figure 8 shows the construction. Apart from this, the model was similar to the one in Figure 7; the only other difference was that the first convolutional layer created 128 filters. The computer required approximately 2 h to train a model.
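The construction of the multi-channel input can be sketched as stacking m scalograms of equal size along a new last axis. The channels-last layout is an assumption (it is the Keras default), and the function name `stack_channels` is hypothetical. With NumPy, `numpy.stack(scalograms, axis=-1)` gives the same result.

```python
def stack_channels(scalograms):
    """Stack m matrices of shape (rows, cols) into one (rows, cols, m) input."""
    rows, cols = len(scalograms[0]), len(scalograms[0][0])
    for matrix in scalograms:
        if len(matrix) != rows or any(len(line) != cols for line in matrix):
            raise ValueError("all scalograms must have the same shape")
    return [[[matrix[r][c] for matrix in scalograms] for c in range(cols)]
            for r in range(rows)]


# Usage with three tiny stand-in scalograms (2 x 3 instead of 128 x 512)
euler_x = [[1, 2, 3], [4, 5, 6]]
euler_y = [[10, 20, 30], [40, 50, 60]]
accel_x = [[100, 200, 300], [400, 500, 600]]
stacked = stack_channels([euler_x, euler_y, accel_x])
shape = (len(stacked), len(stacked[0]), len(stacked[0][0]))
print(shape)
```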

Weight Voting
The multi-channel CNN was trained with three signals at the same time. In voting, by contrast, a separate model was trained for each signal, independently of the other models. In our case, we had a binary problem, so the calculation for the voting was easy. We used the predicted classes and calculated the average of all predictions, $v = \frac{1}{M} \sum_{i=1}^{M} m_i$; see Equation (5), where $m_i$ is the prediction of the $i$-th classifier and $M$ is the number of classifiers.
If $v \geq 0.5$, then the predicted class is MD, and in all other cases, no MD; see Equation (6).
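The voting rule can be written in a few lines. The sketch assumes that each model outputs a hard class label, with 1 for MD and 0 for no MD; the function name `vote` is hypothetical.

```python
def vote(predictions, threshold=0.5):
    """Average binary predictions (1 = MD, 0 = no MD) and apply the threshold."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    v = sum(predictions) / len(predictions)
    label = "MD" if v >= threshold else "no MD"
    return v, label


# Usage with three single-signal models
v_majority, label_majority = vote([1, 0, 1])
v_minority, label_minority = vote([0, 0, 1])
print(label_majority, label_minority)
```

With an odd number of classifiers, such as the three used here, the average can never equal 0.5 exactly, so the rule reduces to a simple majority vote.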

Evaluation
We decided to use 3-fold cross-validation for the classification to make the results of our applied methods reliable. In each fold, we used 66.6% of the data for training and 33.3% for testing. For each measurement, we calculated the precision, specificity, recall (sensitivity), F1-score, and accuracy from the confusion matrix in Table 2, whose entries are the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), with MD as the positive class. Sensitivity (recall) is a widespread measure in medicine. It indicates the ratio of correctly predicted MD to all MD in our test data, $\frac{TP}{TP + FN}$; see Equation (7). The specificity describes how well our system can distinguish the control group (no MD) from MD. It is the ratio of correctly predicted no-MD persons to all no-MD persons in the test data, $\frac{TN}{TN + FP}$; see Equation (8). Precision is the proportion of correctly predicted MD among all persons predicted as MD, $\frac{TP}{TP + FP}$; see Equation (9). Accuracy is the ratio of all correctly recognized MD and no MD to all test data, $\frac{TP + TN}{TP + TN + FP + FN}$; see Equation (10). The F1-score (F1) is the harmonic mean of precision and recall, $2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$; see Equation (11).
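The five measures follow directly from the four entries of a binary confusion matrix. In the sketch below, MD is treated as the positive class, the function name is hypothetical, and the counts in the usage example are invented for illustration; they are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the evaluation measures from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a class is absent
        return numerator / denominator if denominator else 0.0

    recall = ratio(tp, tp + fn)            # sensitivity
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2.0 * precision * recall, precision + recall)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Usage with invented counts (not taken from the paper)
metrics = classification_metrics(tp=8, fp=2, tn=15, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```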

Methodology
Having presented our material and methods, we now describe how we applied these methods. As already mentioned, we divided our recordings into two different parts. In the first scenario, we classified Parts (A) and (B), which comprised the complete recording of the TUG test. In the second scenario, we only used Part (B), which contained only the gait. Figure 9 shows the complete algorithm of the classification. In principle, we distinguished between the signals of the Euler angles and the linear acceleration. For the Euler angles, we first removed the jumps within a signal and then calculated the derivation of the signal. This made the signal more comparable. These steps were not necessary for the linear acceleration. Then, we resampled the signals to a uniform length. This was necessary so that the signals could be interpreted by the CNN later during classification. After resampling, we calculated the wavelet transformation for each individual signal. We used the resulting scalograms for the classification. In the classification, we analyzed three different cases. In the first case, we classified each signal individually with the CNN. This allowed us to show which axes of the sensors were most important. In the second case, we used the three best signals for a multi-channel CNN. In the third case, we used the three best signals for classification by voting.

Single Layer
To find out which sensor data were particularly useful for classification, we first classified each signal separately. The results are shown in Table 3, for which we applied three-fold cross-validation to the sensor data. In the table, the results for the Euler angles and the linear acceleration are visually separated by a double line. For each signal, we calculated the precision, specificity, recall, F1-score, and accuracy. In every cell, we show the mean $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$ plus or minus the standard deviation $s$ of the values $x_i$, $i \in \{1, \dots, N\}$, where the $x_i$ are the results of the individual cross-validation folds and $N$ is their number. The best results are highlighted in bold. It can be seen that the x-axis of the Euler angle and the x-axis of the linear acceleration produced the best results. Furthermore, it can be seen that the z-axis of the Euler angle and of the linear acceleration provided the lowest results.
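A table cell of the form mean plus or minus standard deviation can be produced from the per-fold scores as sketched below. Whether the sample or the population standard deviation was used is not recoverable from the text; the sketch assumes the sample version (`statistics.stdev`). The function names are hypothetical and the fold scores are invented for illustration.

```python
import statistics


def mean_and_std(fold_scores):
    """Return the mean and the sample standard deviation of the fold scores."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


def format_cell(fold_scores, digits=1):
    """Format the fold scores as 'mean ± std' for a results table."""
    mean, std = mean_and_std(fold_scores)
    return f"{mean:.{digits}f} ± {std:.{digits}f}"


# Usage with invented accuracies (in percent) from three folds
accuracies = [90.0, 92.0, 94.0]
cell = format_cell(accuracies)
print(cell)
```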

Signal Combination
To get better results in the classification, we decided to combine the individual signals. For the combination, there were several possibilities. On the one hand, it was possible to use an ensemble classifier like voting. On the other hand, we could use a multi-channel CNN. In Table 3, the x-axis of the Euler angles and the x-axis of the linear acceleration produced the best results, followed by the y-axis of the Euler angles. In this section, we used these three signals to improve our results. The results are shown in Table 4. We again used three-fold cross-validation. Each cell represents the result as $\bar{x} \pm s$, as introduced in Section 4.1.1.
Table 4 shows the results of the signal combination classification. The three-channel CNN achieved better results than the three-signal voting. The three-channel CNN was also better than any single signal in Table 3.

Single Layer
In this section, we present our results when only Part (B) of the TUG test was used for classification. Table 5 shows the results of a CNN classification for each axis of the sensors. As in Section 4.1.1, we used three-fold cross-validation and calculated the average $\bar{x}$ plus or minus the standard deviation $s$. The best results for each sensor and each column are marked in bold. As in the analysis of the complete TUG test, the x-axis provided the best results for both the Euler angles and the linear acceleration. However, the results were not as accurate as in Section 4.1.1.
Table 6 shows the results of the signal combination for Part (B) of the TUG test. Again, three-fold cross-validation was applied, and for each cell, the average $\bar{x}$ plus or minus the standard deviation $s$ was calculated. The three-signal voting performed best. However, the results were only marginally better than the single-signal CNN classification in Table 5. Furthermore, the results were not as good as when the complete TUG test was used for the classification.

Discussion
In Tables 3 and 5, the x-axis always shows the best results. The x-axis corresponds to the movement in the sagittal plane. According to the literature, the most important characteristics of human gait are also present in this plane [31,32]. For this reason, it is a logical conclusion that the features with the highest significance are present on this axis.
We presented our results in the previous section. We compared the results obtained when the complete TUG test, Parts (A) and (B), was used for the classification with those obtained when only the gait, Part (B), was used. The results showed that for the classification of motor dysfunctions, the gait alone gave quite good results with an accuracy of 90.3%, but the complete test gave even better results with an accuracy of 93.3%. From this, we concluded that the complete TUG test was necessary for the analysis of motor dysfunctions.
Furthermore, we classified each signal separately. During the classification, we found that the x-axis of the Euler angle and of the linear acceleration gave the best results, regardless of whether Parts (A) and (B) or only Part (B) were used for the classification. From this, we concluded that the x-axis was the most relevant.
We also obtained better results through the combination of the signals than with single signals. In the classification of Parts (A) and (B), the three-channel CNN proved to be the best solution. When classifying with only Part (B), voting was the best choice. Table 7 shows our classification results compared to the corresponding state-of-the-art works. Our results were comparable to the results from large, expensive, and stationary video-based systems.

Table 7. Classification results compared to the corresponding state-of-the-art works.

| Reference | Description | Accuracy |
|---|---|---|
| Our system | IMU sensors | 93.3% |
| [12] | Kinect camera | 90% |
| [17] | Kinect, Bayesian network | 93.4% |
| [18] | Kinect and e-Motion capture program | 96.23% |
| [15] | Gyroscope | 90% |

Our system delivered better results than the wearable system that also classified the data [15]. We could not make a comparison with the other works because they focused on a statistical evaluation of the data. CNN in combination with wavelet transformations was a powerful technique for arm swing analysis.