Next Article in Journal
A Comparative Performance Evaluation of Routing Protocols for Flying Ad-Hoc Networks in Real Conditions
Previous Article in Journal
Modeling and Test Results of an Innovative Gyroscope Wave Energy Converter
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Performance Evaluation of RNN with Hyperbolic Secant in Gate Structure through Application of Parkinson’s Disease Detection

Graduate School of System Informatics, Kobe University, 1-1, Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
*
Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(10), 4361; https://doi.org/10.3390/app11104361
Submission received: 13 April 2021 / Revised: 7 May 2021 / Accepted: 9 May 2021 / Published: 11 May 2021
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

:
This paper studies a novel recurrent neural network (RNN) with hyperbolic secant (sech) in the gate for a specific medical application task of Parkinson’s disease (PD) detection. In detail, it focuses on the fact that patients with PD have motor speech disorders, by converting the voice data into black-and-white images of a recurrence plot (RP) at specific time intervals and constructing the detection model that combines RNN and convolutional neural network (CNN); the study evaluates the performance of the RNN with sech gate compared with long short-term memory (LSTM) and gated recurrent unit (GRU) with conventional gates. As a result, the proposed model obtained similar results to LSTM and GRU (an average accuracy of about 70%) with less hyperparameters, resulting in faster learning. In addition, in the framework of the RNN with sech in gate, the accuracy obtained by using tanh as the output activation function is higher than using the relu function. The proposed method will see more improvement by increasing the data in the future. More analysis on the input sound type, the RP image size, and the deep learning structures will be included in our future work for further improving the performance of PD detection from voice.

1. Introduction

In recent years, the recurrent neural network (RNN) has been frequently used in time series data processing such as in medical information processing, etc. The RNN has a recursive structure inside and can handle variable lengths of input data.
Generally, since it is difficult for a Simple RNN (Vanilla RNN) [1] with a simple structure to learn the time series data with long-term dependencies, two types of RNNs with complex gated structures to control the required information are proposed; they are long short-term memory (LSTM) [2,3] and gated recurrent unit (GRU) [4], respectively. However, while the performance of RNNs with gated structures is improved, since backpropagation through time (BPTT) used for learning works by unrolling all input time steps, the more parameters there are in the RNN, the more memory is required and the higher the calculation costs. In order to solve this problem, many papers have been published that attempted to simplify LSTM and GRU [5,6,7,8]. In our previous research, we proposed SGR (simple gated RNN) that uses the parts of gate structure of LSTM and GRU for Simple RNN to reduce parameters and realize faster learning [9]. In addition, we also proposed an RNN that introduced a new activation function “hyperbolic secant (sech)” into the gate [10], which is even simpler than Simple RNN (Vanilla RNN).
The objective of this paper is to evaluate this RNN model with sech in the gate through a specific medical application task. In detail, the task here is to detect Parkinson’s disease (PD) from the sound information of subjects. PD is the second most common neurodegenerative disease after Alzheimer’s disease [11]. The detection of PD is very important for medical treatments as well as improving the patient’s quality of live (QOL). Focusing on the fact that patients with PD have motor speech disorders, here we ask subjects to pronounce a voice of /a/ for about 4 to 10 s to acquire data. In order to check the periodicity of the voice, the sound data are then converted into a recurrence plot (RP) at specific time intervals. A recurrence plot (RP) can visualize a periodic nature and chaos in time series. The generated RPs are set as input to a neural network model that combines the convolutional neural network (CNN) and RNN for classification.
In the experiments, in order to evaluate the performance of the RNN with sech gate structure, we compared it with LSTM and GRU with conventional gate structures. We also compared the performance of this RNN when the used output activation function was changed between tanh and relu. For performance evaluation, accuracy, F1-score and Matthews correlation coefficient (MCC) were used. As a result, the proposed model obtained similar results to LSTM and GRU (an average accuracy of about 70%) with less hyperparameters, contributing to faster learning. In addition, in the framework of the RNN with sech in gate, the accuracy obtained by using tanh as the output activation function is higher than using the relu function.
The outline of the paper is as follows: Section 2 discusses related works. Section 3 explains the framework of PD detection models and data preprocessing. The details of the experiments, the results, and discussion are described in Section 4. Section 5 is the conclusions.

2. Related Work

In this section, we review important points regarding related works on RNNs and CNNs and the recurrence plot used in our study.
As mentioned in Section 1, recent RNNs generally refer to RNNs with weighted gate structures such as LSTM and GRU, rather than Simple RNN (or Vanilla RNN) with simple structures. This is because the RNN with gate structure succeeded in partially solving a vanishing gradient problem. When the vanishing gradient occurs, the error becomes smaller rapidly as it goes back layers, and the learning does not progress well. The gate structure deals with this problem by controlling the vanishing gradient with weight parameters. However, the weight parameters increase the calculation cost, and the analysis is difficult. To address these problems, many papers have been written that attempted to reduce the parameters of the RNN [5,6,7,8]. As in our previous study, we proposed SGR that reduced weighted gates while maintaining the performance. However, due to having weighted gates, the calculation costs were not significantly reduced, and the analysis remained difficult [9]. Therefore, we proposed a new RNN model that reduced parameters by removing the conventional weighted gate and using a new gate structure with scalar value controlling the vanishing gradient and the sech function as the activation function [10]. Regarding this RNN using the sech function, in [10], the task performance was lower than that of conventional gated RNNs (such as LSTM and GRU) due to the characteristics of the specific task and tanh activation function. However, in a binary classification task in natural language processing (NLP), it was found that, despite the parameter of about 1/6 or less, the performance was comparable to that of conventional gated RNNs without using normalization methods such as batch normalization. It is then important to clarify the performance quantity for more tasks such as time series data processing and image processing.
In this paper, we confirm the performance difference in a practical medical application task between this RNN with sech function and conventional gated RNNs and evaluate it. Therefore, in this study, we set the detection of Parkinson’s disease as the practical medical application task. In time-frequency analysis for analyzing time series data, it is common to use the short-time Fourier transform (STFT). STFT divides a time signal into short segments of equal length and computes the Fourier transform separately on each segment. However, Fourier transform has the drawback of lower resolution for non-stationary signals. Since the voice of PD patients is non-stationary due to a faint or unstable voice caused by dysarthria, in our study, we attempt voice analysis using a recurrence plot (RP) that can process even non-stationary signals. As the way to represent chaotic time series and visualize a periodic nature, a recurrence plot (RP) was introduced by Eckmann et al. [12]. There are various methods for generating recurrence plots, such as plotting points that are smaller than the arbitrary threshold or generating recurrence plots using percentile. Compared to the Fourier transform, which is not suitable for describing systems in which independent basis functions cannot be properly selected, RP can handle both non-linear and unsteady states [13]. Additionally, RP can detect even faint modulations of animal voice signals that cannot be captured by conventional time-frequency analysis and is a very powerful tool [14].
Recently, there has been increasing research in which recurrence plot images are classified using a CNN [15,16], which is used for the image recognition tasks [17,18,19]. Furthermore, there is a Parkinson’s disease identification study which used a CNN and recurrence plots of handwriting dynamics data, too [20]. Therefore, in this paper, we propose a Parkinson’s disease voice detection model that combines a CNN and RNN and use it to evaluate RNNs.
Related to other studies on PD detection, in the machine learning approach, there is a study that used embedding extracted from a deep neural network named x vectors and classified it using cosine distance, cosine distance preceded by Latent Dirichlet Allocation (LDA), and Polylingual Latent Dirichlet Allocation (PLDA) [21]. Additionally, in [22], four machine learning methods (k-nearest neighbor, multi-layer perceptron, optimum-path forest, and support vector machine) were used with 18 feature extraction techniques for the detection of PD. In the deep learning approach, in [23], multiple artificial neural networks (ANNs) were used with 26 speech features for PD detection, and principal component analysis (PCA) and self-organizing map (SOM) were applied for feature selection. While in [24], a deep neural network (DNN) was applied for PD severity prediction using 16 biomedical voice measures. In this research, the calculation costs of voice preprocessing are relatively high, and multiple voice features are required. Compared with this research, in our study, we propose to use an RP which only calculates the percentile and absolute distance so as to reduce the calculation cost and make it easy for implementation. By using an RP generated from simple vowel voice, our approach uniquely detects PD with the model combining the CNN and RNN. To the best of our knowledge, detecting PD with the model combining the CNN and RNN using an RP generated from simple vowel voice has not been referred before.

3. The Model Description and Data Preprocessing

This section describes neural network models used in this paper in detail.

3.1. RNN Models

3.1.1. Long Short-Term Memory (LSTM)

As is shown in Figure 1, a layer of typical LSTM is defined by Equations (1)–(6) as follows:
z t = tanh ( W z x t + U z h t 1 + b z )
i t = sigmoid ( W i n x t + U i n h t 1 + b i n )
o t = sigmoid ( W o u t x t + U o u t h t 1 + b o u t )
f t = sigmoid ( W f o r x t + U f o r h t 1 + b f o r )
c t = i t z t + f t c t 1
h t = tanh ( c t ) o t
where z t is an input to be added to cell state, c t is an internal cell state, h t is a hidden state at next time. Additionally, i t ,   o t ,   f t represent outputs from input gate, output gate, and forget gate, respectively. From here, unless otherwise specified, the operator symbol represents the Hadamard product, W ,   U m × n are weighting matrices, b n is bias, regardless of the subscript, and the explanations are omitted hereafter. In addition, x t is an input at time step t , h t 1 is a hidden state at time step t 1 , and h t is a hidden state at time step t in this paper.
The input gate controls input to cell state, using the value of i t   ( 0 < i t < 1 ) which is the output from Equation (2) so that it can be selectively stored from the input. Additionally, the output gate controls the output from cell state using the value of o t   ( 0 < o t < 1 ) which is the output from Equation (3). Finally, the forget gate controls the internal cell state directly by f t   ( 0 < f t < 1 ) which is the output from Equation (4) [2,3].

3.1.2. Gated Recurrent Unit (GRU)

As is shown in Figure 2, a layer of GRU is defined by Equations (7)–(10) as follows:
r t = sigmoid ( W r x t + U r h t 1 + b r )
u t = sigmoid ( W u x t + U u h t 1 + b u )
h t ˜ = tanh ( W g r x t + U g r ( r t h t 1 ) + b g r )
h t = u t h t 1 + ( 1 u t ) h t ˜
where r t ,   u t are outputs from reset gate and update gate. The reset gate controls how much the previous hidden state h t 1 is considered to create a new hidden state h t ˜ , using the gate output r t   ( 0 < r t < 1 ) which is the output from Equation (7). Similarly, the update gate is in control of deciding how much the new hidden state h t ˜ is mixed to generate the next hidden state h t , using the gate output u t   ( 0 < u t < 1 ) which is the output from Equation (8) [4]. The performance of GRU compared to that of LSTM depends on the learning task, while the required number of weight parameters for the same number of units is 3/4 that of LSTM.

3.1.3. Our Proposed RNN Model

In the conventional method, there are various problems such as an increase in the amount of calculation, difficulty in the analysis of the gate structure, and dependence on data due to batch normalization. Therefore, in order to solve these problems in the conventional RNNs, we constructed the new structure using the sech function as the gate structure [10]. In the conventional gate structure, if the gate structure has weight parameters, the calculation cost is very high, but if it does not have weight parameters, scaling for negative and positive values does not work properly, because the sigmoid function is not an even function. Hence, instead of the sigmoid function, we used the sech function that is an even function and has an output range from 0 to 1 for the gate structure [10].
The sech function is defined by Equation (11) as follows:
sech ( x ) = 2 / ( e x + e x )
Additionally, differentiating (11) is defined by Equation (12) as follows
d d x ( sech ( x ) ) = sech ( x ) · tanh ( x )
Figure 3 shows a graph of the sech function and sech’s differentiated function.
As is shown in Figure 4, a layer of the RNN with sech gate structure with the highest performance in the paper [10] is defined by Equations (13)–(15) as follows:
h t ˜ = sech ( a h t 1 ) h t 1
h t = ( W x x t + b x ) + h t ˜
o t = act ( h t )
where a is a scalar value, which is introduced as a parameter for controlling the degree of vanishing gradient. Additionally, the output to the lower layer is represented by o t , and act is activation function. Hereinafter, for the sake of simplicity, the RNN with sech gate structure is referred to as “RNN-SH”.

3.2. Parkinson’s Disease Detection

3.2.1. Voice Data Preprocessing and Recurrence Plot Creation

Here, we explain how to preprocess voice data and create recurrence plots. The voice data used in our experiments are recordings of the voice/a/ including men and women for about 4 to 10 s. Regardless of the recording, all voice data are converted to monaural, 16,000 Hz, and then, processing to create RPs is started, because voice data are recorded in various environments. Additionally, all voice data are quantized in 16 bits.
The preprocessing of voice data before creating RPs is as follows:
  • Delete the silent section before and after voice data: In order to avoid the influence of voice volume, all voice data are normalized so that the maximum amplitude is the expressible maximum value, and the sections of −40 dB or less are deleted only from before and after the sound.
  • Furthermore, delete the first and last of the above voice data for 0.1 s: This was because the first and last sounds after deleting the silent sections were often unstable.
Next, the procedure of creating recurrence plots is as follows:
  • Divide the above preprocessed voice data into 0.01 s sections.
  • Immediately before creating the recurrence plot, normalize the sound of 0.01 s section so that it becomes [−1, 1], and then plot points with a distance smaller than the 35th percentile in 0.01 s section. The reason for using the 35th percentile is that the accuracy was highest when the 35th percentile was used in the experiments, which is described later.
  • Compress the generated black-and-white RP with 160 × 160 image size to 20 × 20 using bilinear interpolation. This may deteriorate the accuracy, but it is carried out in consideration of memory efficiency of Video Random Access Memory (VRAM) when inputting to the neural network.
  • Repeat above 1–3 until all divided voice data become recurrence plots. However, if a fraction less than 0.01 s appears in the last section, it will be rounded down.

3.2.2. Parkinson’s Disease Detection Model

Figure 5 shows the structure of the Parkinson’s disease detection model. The CNN model, RNN model, and output layer are processed in this order for detection. Relu is used as the activation function in the CNN model, but for each RNN, a suitable activation function is used (tanh or relu).

4. Performance Evaluation

4.1. Outline of Experiments

In order to evaluate the performance of each RNN using the recurrence plots and the model constructed in Section 3, experiments were performed using a computer. Here, PyTorch 1.7.1 which is a machine learning library, cuda11 and Python 3 (version 3.8) was used. The execution environment was Ubuntu 18.04.4 (64 bit) and we used i7 8700 (RAM 16 GB) and gtx1080 of GPU (VRAM 8 GB) for single precision calculations. For the processing of voice data, a python library named Pydub [25] was used. Some of Pydub functions depend on FFmpeg [26]. In addition, when using GPU on PyTorch, it was necessary to set CUBLAS_WORKSPACE_CONFIG =: 16:8 as an environment variable for reproducibility due to use of cuda11. In the experiments, all random seeds were initialized to 10.

4.2. Input Voices and Preprocessing

As mentioned in Section 3, the voice dataset used in the experiments are recordings of the voice/a/ for about 4 to 10 s. This dataset contains data for 22 healthy people (HP) and 30 PD patients, and three times data for each person is recorded. Therefore, the total number of data is 156 ((22 + 30) × 3). The dataset contains 43 subjects (13 HP and 30 PD cases) who were hired as volunteers by the GYENNO SCIENCE Parkinson Disease Research Center (Ethical Approval: All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and the “Law of the People’s Republic of China on Medical Practitioners” (1998) declaration and its later amendments or comparable ethical standards), and nine subjects (nine HP) who were requested by the authors in order to alleviate the data imbalance. The breakdown of gender was 27 females and 25 males. PD patients consisted of patients with HY (Hoehn and Yahr) stage 1–5. A total of 13 PD patients were in stage 3–5, and 17 PD patients were lower than stage 3.
Figure 6 shows the examples of preprocessed voice data and generated RPs in the procedure of Section 3. If the value of the 35th percentile is p for the time series { x i } i = 1 N = { x 1 ,   x 2 ,   ,   x N } , the recurrence plot matrix R is plotted by Equation (16) as follows:
R ( i ,   j ) = { 1 if   | x i x j | < p 0 otherwise ,
where i ,   j { 1 ,   2 ,   ,   N } are the numbers of the components. R ( i ,   j ) = 1 means that black is plotted, and R ( i ,   j ) = 0 means that white is plotted.
We divided dataset into, train: 28 × 3 (HP: 10 × 3, PD: 18 × 3, about 54%), validation: 12 × 3 (HP: 6 × 3, PD: 6 × 3, about 23%), and test: 12 × 3 (HP: 6 × 3, PD: 6 × 3, about 23%). However, the three data of the same person were divided so that they belonged to the same group (train or validation or test) in the experiments. Additionally, at the time of training, the voices of healthy person data shifted by 2 s were used as oversampling data so that the total number of data did not become imbalanced. Furthermore, in order to avoid the influence of dispersion due to data division, the data were shuffled, experiments were performed five times consecutively for each RNN, and the performance was evaluated based on the average.

4.3. RNN Model Configuration and Hyperparameters

The RNNs used in the experiments were LSTM, GRU, and RNN-SH. Additionally, the number of hidden units was set to 64, 128, and 256. RNN-SH used tanh or relu as the output activation function. We also experimented when the RNNs are stacked in two layers. We used RAdam [27], which is a kind of stochastic gradient descent algorithm for learning, and set RAdam parameters to recommended values in Adam [28]. The dropout ratio and parameter of weight decay were set to 0.5 and 0.01, respectively, to prevent overfitting. The number of epochs was 50, and the mini-batch size was 27. Regarding the initialization of weight parameters, they were initialized to a Gaussian distribution with a mean of 0 and a variance of 1 / n ( n : the number of input units) [29], and all biases were initialized to 0. Additionally, CNN weights were initialized to a uniform distribution [ v ,   v ] where v = 3 / n [29]. The scalar value a of gate structure of RNN-SH was initialized with 1.0. Finally, Softmax cross entropy was used as the loss function. This loss function ( L ) is expressed by the following Equation (17):
L = 1 m i , j t i j log ( exp ( y i j ) k = 1 o exp ( y i k ) )
where m is the batch size, o is the output size, y i j is the output from output layer, and t i j is the correct answer label. In addition, i is the data number, j is the dimension, and t i j is the correct answer label of j-th dimension in the i-th data.

4.4. Experimental Results

The results of the experiments are shown in Table 1. In addition, Figure 7 shows graphs of the average values in Table 1 of each model. Table 1 shows the results of the validation set and test set at the time when the loss was the lowest with respect to the validation set in 50 epochs. Since the experiments were performed five times each, the mean and standard deviation of the validation set and the test set are respectively shown. Additionally, the overall mean and standard deviation of the validation set and test set are shown too. “Average (validation/test)” is the average of five times for each validation set or each test set, respectively. “Total Average” is the all average of the validation set and test set. Here, we used accuracy, F-score and Matthews correlation coefficient (MCC) as performance evaluation indicators. Each formula is defined by Equations (18)–(20) as follows:
        A c c u r a c y = ( T P + T N ) / ( T P + T N + F P + F N )
  F s c o r e = 2 T P / ( 2 T P + F P + F N )
M C C = ( T P × T N ) ( F P × F N ) ( T P + F P ) × ( T P + F N ) × ( T N + F P ) × ( T N + F N )   ,  
where T P ,   T N ,   F P ,   F N are the number of true positives, true negatives, false positives, and false negatives, respectively. Accuracy and F-score is a value between 0 and 1, the closer they are to 1, the better the performance is. For unbalanced data, F-score can perform more accurate performance evaluation than accuracy. Note that the F1-score shown in the experiments is macro-average. The MCC is a correlation coefficient value between −1 and +1, and +1 indicates a perfect prediction, 0 an average random prediction and −1 an inverse prediction. The MCC was calculated for more accurate evaluation, including inverse predictions. In Table 1, accuracy and F-score show the value rounded off the third decimal place, and MCC shows the value rounded off the fourth decimal place.
As a result, from Table 1, there was no significant difference in the performance between RNNs. Statistically significant difference was not found between the top three models (GRU: two layers and 256 units, LSTM: two layers, 64 units, and RNN-SH (tanh): one layer and 256 units) by the Friedman test. Here, Figure 8 shows the average ROC curve with AUC values of RNN-SH and GRU, and Figure 9 shows a graph of the relationship between the number of units and the parameters of the PD detection model. From Figure 8, it is clear that the AUC value of GRU is higher than the value of RNN-SH (tanh); however, from Figure 9, the parameters of RNN-SH (tanh) with one layer and 256 units are less than a third of the parameters of GRU with two layers and 256 units, although the difference in the performance is small. In addition, the parameter amount of LSTM with two layers and 64 units is not much different from that of RNN-SH with one layer and 256 units. However, in the case of the LSTM, though the total average is high, there is a difference between the validation average and the test average, and it can be confirmed that overfitting has occurred. On the other hand, for the RNN-SH, the validation average and test average are about the same value. Figure 10 shows the examples of confusion matrix in the first trial of LSTM and RNN-SH (tanh). In Figure 10, LSTM has a better performance for the validation set than RNN-SH, but not so much for the test set.
From Table 1 and Figure 7, in RNN-SH, the performance when the output activation function was tanh was higher than that of relu.
Figure 11 shows the execution time of each RNN in the case of one layer and 256 units in the first execution. Here, since we used LSTM and GRU that were pre-implemented in pytorch, accurate speed comparison with RNN-SH, which was implemented by the combination of pytorch functions, cannot be performed. Therefore, in Figure 11, we used LSTM and GRU that were individually implemented by the combination of pytorch functions, and the speeds are compared. In order to perform speed comparisons fairly, we considered the parallelism of LSTM and GRU, and implemented LSTM and GRU so that they could be calculated at high speed by splitting the result of parallel calculation for each weight of gate, etc. From Figure 11, since RNN-SH has fewer parameters and a lower calculation cost than LSTM and GRU, the learning of RNN-SH was faster than them.
Figure 12 shows learning curves of loss for each RNN with 256 units and one or two layers. From Figure 12, it can be seen that the losses oscillate due to the small amount of dataset, but the learning of RNN-SH progresses relatively gently compared to that of LSTM and GRU. However, RNN-SH (tanh) with two layers oscillate violently and could not learn well.

4.5. Discussion

Statistically significant difference was not found between RNN models, but the total number of parameters of the PD detection model using RNN-SH is about 1/4 of the model using LSTM and about 1/3 of the model using GRU when the number of units and layers are the same. This difference becomes even wider when the units increase or multiple layers are used. Hence, it is very effective in terms of memory efficiency and faster learning. When using RNN-SH, it is easy to take measures against overfitting because there are few parameters, slow learning progresses, and only weight parameters for the input.
Regarding the comparison when tanh and relu are used in RNN-SH, taking the case of one layer and 256 units for example, the scalar value a , which is the gate parameter, is about 1.007 for tanh and about 1.012 for relu, and they are very close with each other. This was also the case under other conditions in these experiments. Therefore, it is considered that the difference in the performance of tanh and relu is caused by the difference only in activation function. From Figure 12, RNN-SH (tanh) with 256 units and two layers oscillate violently, and the reason why it could not learn well comes from the vanishing gradient at the output due to tanh. On the other hand, RNN-SH (relu) with 256 units and two layers could be learned smoothly; however, the accuracy was lower than that of tanh. Here, Figure 13 shows the examples of confusion matrix in the first trial of the tanh and relu RNN-SH. From Figure 13, the FP rate of relu was higher than that of tanh, and TN rate of relu was lower than that of tanh, indicating that the relu model could not identify PD patients properly and could not be learned well. Relu has gathered attention as the activation function that contributes to the multi-layering of neural networks. However, relu is not symmetric with respect to the origin and has problems such as a dying relu problem [30]. In our experiments, we did not use optimal parameter initialization or normalization methods that optimize for relu. Thus, dying relu may have occurred and caused the accuracy to deteriorate. Since relu is the function that is important for multi-layering, it is necessary to analyze in the future what is needed to improve the performance when using the relu.
In terms of PD detection using an RP, the accuracy is around 70%. This is because some of the voice data of healthy people had hoarse voices and uneven voice volume. In our experiments, the voice/a/ was used; however, it was difficult for even a healthy person to utter the voice/a/ at a constant volume for a long time. Therefore, from the results of our experiments, the voice/a/ was not sufficient for PD detection. In order to improve the accuracy of PD detection, it is necessary to consider what pronunciation is more suitable for the detection in the future. Additionally, in our experiment, the RP images were resized by bilinear interpolation in consideration of the memory usage. However, when the RP image is compressed, features different from the original features may appear or important features may disappear. Since it is possible that the accuracy had deteriorated due to resizing, it is necessary to further study the image resizing method and the architecture of the CNN in the future.

5. Conclusions

This paper evaluated the effectiveness of our RNN-SH model in a practical medical application of PD detection using RP with less parameters than other gated RNNs such as LSTM and GRU. RNN-SH can greatly reduce parameters, more so than other RNNs, with comparable accuracy for time series data processing although the difference in accuracy was small and a statistically significant difference was not found between RNN models.
Since the gate parameter of RNN-SH is scalar, it is easy to analyze the result. In addition, another advantage is that the activation function can be easily changed according to the tasks. However, it turned out that it is difficult to improve the accuracy by simply replacing tanh with relu as the activation function in RNN-SH. From our experiment, the PD detection ability is lower using relu in comparison to using tanh, and we consider that dying relu might be the cause. It is necessary to analyze and make further improvements by considering parameter initialization and normalization methods, etc.
Since the dataset size in our experiment was relatively small, the proposed method will see more improvement by increasing the data in the future. More analysis on the input sound type, the RP image size, and the deep learning structures will be included in our future work for further improving the performance of PD detection from voice. In regard to RP image compression, we would investigate an image size or a CNN structure that does not deteriorate the accuracy of PD detection in more detail to improve the current model.

Author Contributions

Conceived and designed the model, T.F. and Z.L.; performed the experiment and analyzed the results, T.F.; wrote the preliminary version of this manuscript, T.F.; revised the manuscript, Z.L., C.Q., K.M., and S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and the “Law of the People’s Republic of China on Medical Practitioners” (1998) declaration and its later amendments or comparable ethical standards.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Due to the nature of this research, participants of this study did not agree for their data to be shared publicly at present, so supporting data is not available.

Acknowledgments

We wish to express our appreciation to the editor and reviewers for their insightful comments as well as the participants for supporting this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Elman, J.L. Finding Structure in Time. Cogn. Sci. 1990, 14, 179–211. [Google Scholar] [CrossRef]
  2. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  3. Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to Forget: Continual Prediction with LSTM. Neural Comput. 2000, 12, 2451–2471. [Google Scholar] [CrossRef]
  4. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representa-tions using RNN encoder-decoder for statistical machine translation. Conf. Empir. Methods Nat. Lang. Process. 2014, 1724–1734. [Google Scholar]
  5. Zhou, G.-B.; Wu, J.; Zhang, C.-L.; Zhou, Z.-H. Minimal gated unit for recurrent neural networks. Int. J. Autom. Comput. 2016, 13, 226–234. [Google Scholar] [CrossRef] [Green Version]
  6. Lei, T.; Zhang, Y.; Wang, S.I.; Dai, H.; Artzi, Y. Simple Recurrent Units for Highly Parallelizable Recurrence. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics (ACL): Stroudsburg, PA, USA, 2018; pp. 4470–4481. [Google Scholar]
  7. Yue, B.; Fu, J.; Liang, J. Residual Recurrent Neural Networks for Learning Sequential Representations. Information 2018, 9, 56. [Google Scholar] [CrossRef] [Green Version]
  8. Li, S.; Li, W.; Cook, C.; Zhu, C.; Gao, Y. Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 23 June 2018; pp. 5457–5466. [Google Scholar]
  9. Fujita, T.; Luo, Z.; Quan, C.; Mori, K. Simplification of RNN and Its Performance Evaluation in Machine Translation. Trans. Inst. Syst. Control Inf. Eng. 2020, 33, 267–274. [Google Scholar] [CrossRef]
  10. Fujita, T.; Luo, Z.; Quan, C.; Mori, K. Structure Construction and Performance Analysis of RNN Aiming for Reduction of Calculation Costs. Trans. Inst. Syst. Control Inf. Eng. 2021, 34, 89–97. [Google Scholar]
  11. Wirdefeldt, K.; Adami, H.-O.; Cole, P.; Trichopoulos, D.; Mandel, J. Epidemiology and etiology of Parkinson’s disease: A review of the evidence. Eur. J. Epidemiol. 2011, 26, 1–58. [Google Scholar] [CrossRef]
  12. Eckmann, J.-P.; Eckmann, J.-P.; Kamphorst, S.O.; Kamphorst, S.O.; Ruelle, D.; Ruelle, D. Recurrence Plots of Dynamical Systems. Synchronization Syst. Time Delayed Coupling 1995, 16, 441–445. [Google Scholar]
  13. Shiro, M.; Hirata, Y.; Aihara, K. Similarities and Differences between Recurrence Plot and Fourier Transform. Seisan Kenkyu 2020, 72, 137–138. [Google Scholar]
  14. Facchini, A.; Kantz, H.; Tiezzi, E. Recurrence plot analysis of nonstationary data: The understanding of curved patterns. Phys. Rev. E 2005, 72, 021915. [Google Scholar] [CrossRef] [PubMed]
  15. Garcia-Ceja, E.; Uddin, Z.; Torresen, J. Classification of Recurrence Plots’ Distance Matrices with a Convolutional Neural Network for Activity Recognition. Procedia Comput. Sci. 2018, 130, 157–163. [Google Scholar] [CrossRef]
  16. Chen, Y.; Su, S.; Yang, H. Convolutional Neural Network Analysis of Recurrence Plots for Anomaly Detection. Int. J. Bifurc. Chaos 2020, 30, 2050002. [Google Scholar] [CrossRef]
  17. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2012, 60, 1097–1105. [Google Scholar] [CrossRef]
  18. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  19. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  20. Afonso, L.C.; Rosa, G.H.; Pereira, C.R.; Weber, S.A.; Hook, C.; Albuquerque, V.H.C.; Papa, J.P. A recurrence plot-based approach for Parkinson’s disease identification. Futur. Gener. Comput. Syst. 2019, 94, 282–292. [Google Scholar] [CrossRef]
  21. Jeancolas, L.; Petrovska-Delacrétaz, D.; Mangone, G.; Benkelfat, B.E.; Corvol, J.C.; Vidailhet, M.; Lehéricy, S.; Benali, H. X-Vectors: New quantitative biomarkers for early Parkinson’s disease detection from speech. Front. Neuroinform. 2021, 15, 4. [Google Scholar] [CrossRef]
  22. Almeida, J.S.; Rebouças Filho, P.P.; Carneiro, T.; Wei, W.; Damaševičius, R.; Maskeliūnas, R.; de Albuquerque, V.H.C. De-tecting Parkinson’s disease with sustained phonation and speech signals using machine learning techniques. Pattern Recognit. Lett. 2019, 125, 55–62. [Google Scholar] [CrossRef] [Green Version]
  23. Berus, L.; Klancnik, S.; Brezocnik, M.; Ficko, M. Classifying Parkinson’s Disease Based on Acoustic Measures Using Artificial Neural Networks. Sensors 2018, 19, 16. [Google Scholar] [CrossRef] [Green Version]
  24. Grover, S.; Bhartia, S.; Akshama; Yadav, A. Predicting Severity of Parkinson’s Disease Using Deep Learning. Procedia Comput. Sci. 2018, 132, 1788–1794. [Google Scholar] [CrossRef]
  25. Pydub. Available online: https://github.com/jiaaro/pydub (accessed on 15 February 2021).
  26. FFmpeg. Available online: https://github.com/FFmpeg/FFmpeg (accessed on 15 February 2021).
  27. Liu, L.; Jiang, H.; He, P.; Chen, W.; Liu, X.; Gao, J.; Han, J. On the variance of the adaptive learning rate and beyond. arXiv 2019, arXiv:1908.03265. [Google Scholar]
  28. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference Learning Representations (ICLR), San Diego, CA, USA, 5–8 May 2015. [Google Scholar]
  29. LeCun, Y.A.; Bottou, L.; Orr, G.B.; Müller, K.-R. Efficient BackProp. In Transactions on Petri Nets and Other Models of Concurrency XV; Springer Science and Business Media LLC: Berlin/Heidelberg, Germany, 2012; pp. 9–48. [Google Scholar]
  30. CS231n Convolutional Neural Networks for Visual Recognition Course Website. Available online: https://cs231n.github.io/neural-networks-1/#actfun (accessed on 15 February 2021).
Figure 1. A layer of typical LSTM. LSTM has three gates with weight parameters to control the information that should be retained. The biases are omitted here.
Figure 1. A layer of typical LSTM. LSTM has three gates with weight parameters to control the information that should be retained. The biases are omitted here.
Applsci 11 04361 g001
Figure 2. A layer of GRU. GRU controls how much information is retained by two gates, reset gate and update gate. The biases are omitted here.
Figure 2. A layer of GRU. GRU controls how much information is retained by two gates, reset gate and update gate. The biases are omitted here.
Applsci 11 04361 g002
Figure 3. A graph of the sech function and sech’s differentiated function (cited from [10]). Sech function is even function and has output range from 0 to 1.
Figure 3. A graph of the sech function and sech’s differentiated function (cited from [10]). Sech function is even function and has output range from 0 to 1.
Applsci 11 04361 g003
Figure 4. A layer of RNN-SH (cited from [10]). The smaller scalar value a is, the larger the hidden state and the gradient are after passing through the gate, and the larger a is, the smaller they are. This a controls the amount of retained information and the degree of vanishing gradient. The biases are omitted here.
Figure 4. A layer of RNN-SH (cited from [10]). The smaller scalar value a is, the larger the hidden state and the gradient are after passing through the gate, and the larger a is, the smaller they are. This a controls the amount of retained information and the degree of vanishing gradient. The biases are omitted here.
Applsci 11 04361 g004
Figure 5. The structure of Parkinson’s disease detection model. The input image size (RP) is 20 × 20, and the relu was used as the activation function after each output of the convolutional layer. In order to prevent overfitting, dropout is applied immediately after max pooling and before output layer. If the RNN has two layers, dropout is applied before the RNN of the second layer too. The size input to RNN is 64 × 5 × 5 (= 1600) because the output from the CNN is flattened. The model output is binary (2 classes), and the output is generated after all RPs have been input.
Figure 5. The structure of Parkinson’s disease detection model. The input image size (RP) is 20 × 20, and the relu was used as the activation function after each output of the convolutional layer. In order to prevent overfitting, dropout is applied immediately after max pooling and before output layer. If the RNN has two layers, dropout is applied before the RNN of the second layer too. The size input to RNN is 64 × 5 × 5 (= 1600) because the output from the CNN is flattened. The model output is binary (2 classes), and the output is generated after all RPs have been input.
Applsci 11 04361 g005
Figure 6. The examples of preprocessed voice data and generated RPs. The left image (a) is an example of a healthy person, and the right image (b) is an example of a PD patient. (a1,b1) are voice waveform, (a2,b2) are normalized waveform for 0.280 to 0.290 s. (a3,b3) are generated RPs from normalized waveform for 0.280 to 0.290 s. PD patients have a faint voice by dysarthria and there are differences between the two recurrence plots. The RP was rotated 90 degrees in this figure.
Figure 6. The examples of preprocessed voice data and generated RPs. The left image (a) is an example of a healthy person, and the right image (b) is an example of a PD patient. (a1,b1) are voice waveform, (a2,b2) are normalized waveform for 0.280 to 0.290 s. (a3,b3) are generated RPs from normalized waveform for 0.280 to 0.290 s. PD patients have a faint voice by dysarthria and there are differences between the two recurrence plots. The RP was rotated 90 degrees in this figure.
Applsci 11 04361 g006
Figure 7. The graphs of the average value of each model in Table 1. The left is the validation average, the middle is the test average, and the right is the total average. Here, the standard deviation is omitted to make the graphs easier to see.
Figure 7. The graphs of the average value of each model in Table 1. The left is the validation average, the middle is the test average, and the right is the total average. Here, the standard deviation is omitted to make the graphs easier to see.
Applsci 11 04361 g007
Figure 8. The average ROC curve with AUC values of GRU and RNN-SH. (a) is the ROC curve of the GRU, and (b) is that of the RNN-SH (tanh). The average here is an average of the validation set and the test set.
Figure 8. The average ROC curve with AUC values of GRU and RNN-SH. (a) is the ROC curve of the GRU, and (b) is that of the RNN-SH (tanh). The average here is an average of the validation set and the test set.
Applsci 11 04361 g008
Figure 9. The graph of the relationship between the number of units and the number of parameters of PD detection model. If RNNs have the same number of units, the number of parameters of PD detection model using RNN-SH is considerably reduced.
Figure 9. The graph of the relationship between the number of units and the number of parameters of PD detection model. If RNNs have the same number of units, the number of parameters of PD detection model using RNN-SH is considerably reduced.
Applsci 11 04361 g009
Figure 10. The examples of confusion matrix in first trial of LSTM and RNN-SH (tanh). (a) is the confusion matrix of the LSTM, and (b) is that of the RNN-SH (tanh). Here, the confusion matrix is normalized.
Figure 10. The examples of confusion matrix in first trial of LSTM and RNN-SH (tanh). (a) is the confusion matrix of the LSTM, and (b) is that of the RNN-SH (tanh). Here, the confusion matrix is normalized.
Applsci 11 04361 g010
Figure 11. The execution time of each RNN in the case of 1 layer and 256 units. The number in parentheses is the epoch at the time when the loss was the lowest for the validation set. RNN-SH is faster than LSTM and GRU. GRU is slow due to its low parallelism.
Figure 11. The execution time of each RNN in the case of 1 layer and 256 units. The number in parentheses is the epoch at the time when the loss was the lowest for the validation set. RNN-SH is faster than LSTM and GRU. GRU is slow due to its low parallelism.
Applsci 11 04361 g011
Figure 12. Learning curves of loss for each RNN with 256 units and one or two layers. (a1,a2), (b1,b2), (c1,c2), and (d1,d2) represent the graphs of learning curves of RNN-SH (tanh), RNN-SH (relu), LSTM, and GRU, respectively. The number after the letters corresponds to the number of layers.
Figure 12. Learning curves of loss for each RNN with 256 units and one or two layers. (a1,a2), (b1,b2), (c1,c2), and (d1,d2) represent the graphs of learning curves of RNN-SH (tanh), RNN-SH (relu), LSTM, and GRU, respectively. The number after the letters corresponds to the number of layers.
Applsci 11 04361 g012
Figure 13. The examples of confusion matrix in first trial of RNN-SH. (a) is the confusion matrix of the RNN-SH (tanh), and (b) is that of the RNN-SH (relu). In first trial case, although the model could be learned relatively well when tanh is used, it could not be learned well when relu is used, and the PD detection ability was low.
Figure 13. The examples of confusion matrix in first trial of RNN-SH. (a) is the confusion matrix of the RNN-SH (tanh), and (b) is that of the RNN-SH (relu). In first trial case, although the model could be learned relatively well when tanh is used, it could not be learned well when relu is used, and the PD detection ability was low.
Applsci 11 04361 g013
Table 1. Experimental results (validation set/test set).
Table 1. Experimental results (validation set/test set).
The Number of TrialsAverage (Validation/Test)Total Average
LayersModelUnitsIndicators12345
1LSTM64Acc. (%)75.00/72.2277.78/55.5658.33/75.0069.44/63.8961.11/63.8968.33 ± 7.58/66.11 ± 6.8967.22 ± 7.33
F1. (%)74.98/72.1477.50/55.4258.04/74.5169.42/63.1860.63/62.4768.11 ± 7.67/65.54 ± 6.9566.83 ± 7.43
MCC0.501/0.4470.570/0.1120.169/0.5210.390/0.2890.228/0.3020.371 ± 0.154/0.334 ± 0.1410.353 ± 0.149
128Acc. (%)69.44/69.4475.00/55.5666.67/69.4461.11/72.2244.44/27.7863.33 ± 10.45/58.89 ± 16.6161.11 ± 14.05
F1. (%)69.23/68.8474.02/55.4266.56/68.2460.63/72.2244.44/27.5562.98 ± 10.23/58.46 ± 16.4860.72 ± 13.90
MCC0.394/0.4050.543/0.1120.335/0.4220.228/0.444-0.111/-0.4470.278 ± 0.219/0.187 ± 0.3400.233 ± 0.290
256Acc. (%)75.00/77.7880.56/61.1172.22/61.1161.11/72.2255.56/77.7868.89 ± 9.20/70.00 ± 7.5469.44 ± 8.43
F1. (%)74.98/77.7180.17/60.9970.78/59.0960.63/72.1455.42/77.7868.40 ± 9.13/69.54 ± 8.0468.97 ± 8.62
MCC0.501/0.5590.636/0.2240.496/0.2480.228/0.4470.112/0.5560.395 ± 0.194/0.407 ± 0.1450.401 ± 0.171
GRU64Acc. (%)69.44/75.0080.56/58.3363.89/63.8963.89/55.5658.33/77.7867.22 ± 7.54/66.11 ± 8.8566.67 ± 8.24
F1. (%)69.42/74.9880.17/58.3063.18/60.1763.86/55.4258.04/77.7866.94 ± 7.54/65.33 ± 9.1966.13 ± 8.44
MCC0.390/0.5010.636/0.1670.289/0.3510.278/0.1120.169/0.5560.352 ± 0.158/0.337 ± 0.1760.345 ± 0.167
128Acc. (%)69.44/77.7875.00/63.8969.44/66.6766.67/72.2247.22/52.7865.56 ± 9.56/66.67 ± 8.4366.11 ± 9.03
F1. (%)69.42/77.7873.33/62.4769.42/62.5066.56/72.2246.18/49.6364.98 ± 9.64/64.92 ± 9.6464.95 ± 9.64
MCC0.390/0.5560.577/0.3020.390/0.4470.335/0.444-0.058/0.0640.327 ± 0.209/0.363 ± 0.1700.345 ± 0.191
256Acc. (%)69.44/75.0083.33/58.3375.00/61.1172.22/69.4452.78/69.4470.56 ± 10.03/66.67 ± 6.0968.61 ± 8.52
F1. (%)69.42/74.9883.13/58.3074.83/57.8672.22/68.8451.85/69.2370.29 ± 10.29/65.84 ± 6.7068.07 ± 8.97
MCC0.390/0.5010.684/0.1670.507/0.2670.444/0.4050.058/0.3940.417 ± 0.205/0.347 ± 0.1170.382 ± 0.170
RNN-SH
(tanh)
64Acc. (%)72.22/69.4483.33/58.3363.89/69.4475.00/63.8961.11/55.5671.11 ± 7.97/63.33 ± 5.6767.22 ± 7.93
F1. (%)72.14/68.8483.13/57.5161.48/69.2374.98/63.1859.09/50.0070.16 ± 8.87/61.75 ± 7.2765.96 ± 9.13
MCC0.447/0.4050.684/0.1740.321/0.3940.501/0.2890.248/0.1490.440 ± 0.151/0.282 ± 0.1070.361 ± 0.153
128Acc. (%)75.00/72.2277.78/63.8966.67/63.8969.44/61.1155.56/55.5668.89 ± 7.74/63.33 ± 5.3966.11 ± 7.22
F1. (%)74.98/71.8877.14/63.6465.71/63.1869.42/60.6347.64/55.0066.98 ± 10.48/62.86 ± 5.4564.92 ± 8.60
MCC0.501/0.4560.589/0.2820.354/0.2890.390/0.2280.177/0.1140.402 ± 0.140/0.274 ± 0.1110.338 ± 0.142
256Acc. (%)72.22/80.5683.33/61.1172.22/66.6763.89/72.2261.11/69.4470.56 ± 7.78/70.00 ± 6.4370.28 ± 7.14
F1. (%)72.22/80.5483.13/61.1172.14/63.8863.64/72.2259.09/68.8470.04 ± 8.26/69.32 ± 6.8069.68 ± 7.58
MCC0.444/0.6120.684/0.2220.447/0.4010.282/0.4440.248/0.4050.421 ± 0.155/0.417 ± 0.1240.419 ± 0.140
RNN-SH
(relu)
64Acc. (%)63.89/77.7844.44/50.0063.89/69.4458.33/66.6755.56/75.0057.22 ± 7.16/67.78 ± 9.7262.50 ± 10.04
F1. (%)63.64/77.7137.50/37.6958.47/69.2357.51/66.5655.00/74.9854.42 ± 8.92/65.24 ± 14.3359.83 ± 13.10
MCC0.282/0.559-0.149/0.0000.402/0.3940.174/0.3350.114/0.5010.164 ± 0.185/0.358 ± 0.1950.261 ± 0.213
128Acc. (%)75.00/66.6766.67/63.8966.67/66.6763.89/69.4455.56/52.7865.56 ± 6.24/63.89 ± 5.8364.72 ± 6.09
F1. (%)74.83/65.7162.50/60.1764.94/66.2563.64/69.4244.62/39.2362.10 ± 9.78/60.16 ± 10.8861.13 ± 10.39
MCC0.507/0.3540.447/0.3510.372/0.3420.282/0.3900.243/0.1690.370 ± 0.099/0.321 ± 0.0780.346 ± 0.092
256Acc. (%)66.67/75.0083.33/61.1169.44/63.8966.67/75.0063.89/44.4470.00 ± 6.89/63.89 ± 11.2566.94 ± 9.82
F1. (%)66.25/74.8383.13/61.1168.24/63.6466.56/74.9863.64/42.8669.56 ± 6.94/63.48 ± 11.7666.52 ± 10.13
MCC0.342/0.5070.684/0.2220.422/0.2820.335/0.5010.282/−0.1180.413 ± 0.143/0.279 ± 0.2290.346 ± 0.202
2LSTM64Acc. (%)80.56/69.4477.78/58.3383.33/61.1169.44/66.6761.11/77.7874.44 ± 8.13/66.67 ± 6.8070.56 ± 8.44
F1. (%)80.54/69.4277.14/58.3083.33/57.8669.42/66.5660.99/77.7874.29 ± 8.12/65.98 ± 7.4370.14 ± 8.82
MCC0.612/0.3900.589/0.1670.667/0.2670.390/0.3350.224/0.5560.496 ± 0.165/0.343 ± 0.1300.420 ± 0.167
128Acc. (%)75.00/75.0075.00/58.3375.00/66.6769.44/66.6755.56/77.7870.00 ± 7.54/68.89 ± 6.8969.44 ± 7.24
F1. (%)74.98/74.9874.02/58.3074.02/62.5069.42/65.7155.42/77.7869.57 ± 7.34/67.85 ± 7.4068.71 ± 7.42
MCC0.501/0.5010.543/0.1670.543/0.4470.390/0.3540.112/0.5560.418 ± 0.163/0.405 ± 0.1360.411 ± 0.150
256Acc. (%)63.89/75.0075.00/55.5669.44/61.1163.89/72.2252.78/83.3365.00 ± 7.37/69.44 ± 9.9467.22 ± 9.02
F1. (%)62.47/74.8374.02/55.4268.24/59.0963.64/72.1449.63/83.2863.60 ± 8.08/68.95 ± 10.3066.27 ± 9.64
MCC0.302/0.5070.543/0.1120.422/0.2480.282/0.4470.064/0.6710.322 ± 0.160/0.397 ± 0.1970.360 ± 0.183
GRU64Acc. (%)75.00/77.7877.78/63.8963.89/55.5666.67/61.1161.11/69.4468.89 ± 6.43/65.56 ± 7.5867.22 ± 7.22
F1. (%)74.98/77.7177.14/63.6461.48/53.2566.56/61.1161.11/69.2368.26 ± 6.69/64.99 ± 8.1866.62 ± 7.65
MCC0.501/0.5590.589/0.2820.321/0.1240.335/0.2220.222/0.3940.394 ± 0.133/0.316 ± 0.1500.355 ± 0.147
128Acc. (%)66.67/77.7880.56/61.1180.56/66.6769.44/69.4455.56/55.5670.56 ± 9.40/66.11 ± 7.5468.33 ± 8.80
F1. (%)66.56/77.7880.17/61.1180.54/62.5069.42/68.8455.42/54.2970.42 ± 9.36/64.90 ± 7.9367.66 ± 9.10
MCC0.335/0.5560.636/0.2220.612/0.4470.390/0.4050.112/0.1180.417 ± 0.193/0.350 ± 0.1580.383 ± 0.180
256Acc. (%)69.44/77.7883.33/55.5677.78/66.6763.89/72.2258.33/77.7870.56 ± 9.06/70.00 ± 8.3170.28 ± 8.70
F1. (%)69.42/77.7183.13/55.0077.71/62.5063.64/72.1458.30/77.7170.44 ± 9.04/69.01 ± 8.9469.72 ± 9.02
MCC0.390/0.5590.684/0.1140.559/0.4470.282/0.4470.167/0.5590.416 ± 0.186/0.425 ± 0.1630.421 ± 0.175
RNN-SH
(tanh)
64Acc. (%)61.11/52.7883.33/61.1172.22/61.1161.11/66.6766.67/38.8968.89 ± 8.31/56.11 ± 9.6962.50 ± 11.06
F1. (%)54.18/39.2383.13/60.0070.78/60.0056.25/63.8866.56/38.1366.18 ± 10.50/52.25 ± 11.1859.21 ± 12.89
MCC0.354/0.1690.684/0.2360.496/0.2360.298/0.4010.335/-0.2280.433 ± 0.142/0.163 ± 0.2100.298 ± 0.225
128Acc. (%)72.22/69.4475.00/63.8972.22/58.3363.89/66.6761.11/55.5668.89 ± 5.39/62.78 ± 5.1565.83 ± 6.09
F1. (%)70.78/69.2374.02/61.4871.43/55.5663.64/66.5654.18/44.6266.81 ± 7.19/59.49 ± 8.7863.15 ± 8.82
MCC0.496/0.3940.543/0.3210.471/0.1930.282/0.3350.354/0.2430.429 ± 0.097/0.297 ± 0.0710.363 ± 0.108
256Acc. (%)75.00/75.0080.56/63.8969.44/75.0058.33/58.3358.33/61.1168.33 ± 8.89/66.67 ± 7.0367.50 ± 8.06
F1. (%)74.98/74.5180.42/63.1868.24/74.9857.51/56.7054.04/56.2567.04 ± 10.03/65.12 ± 8.2366.08 ± 9.23
MCC0.501/0.5210.620/0.2890.422/0.5010.174/0.1810.211/0.2980.385 ± 0.170/0.358 ± 0.1320.372 ± 0.153
RNN-SH
(relu)
64Acc. (%)72.22/77.7869.44/63.8963.89/72.2258.33/63.8952.78/55.5663.33 ± 7.11/66.67 ± 7.6665.00 ± 7.58
F1. (%)72.14/77.7166.30/60.1761.48/72.2257.51/63.1851.85/53.2561.86 ± 6.99/65.31 ± 8.6963.58 ± 8.08
MCC0.447/0.5590.491/0.3510.321/0.4440.174/0.2890.058/0.1240.298 ± 0.163/0.354 ± 0.1460.326 ± 0.158
128Acc. (%)69.44/75.0080.56/58.3363.89/63.8963.89/63.8961.11/66.6767.78 ± 6.94/65.56 ± 5.4466.67 ± 6.34
F1. (%)67.41/74.5180.54/55.5661.48/63.8663.64/63.1860.00/62.5066.61 ± 7.40/63.92 ± 6.0865.27 ± 6.90
MCC0.449/0.5210.612/0.1930.321/0.2780.282/0.2890.236/0.4470.380 ± 0.136/0.346 ± 0.1200.363 ± 0.129
256Acc. (%)66.67/72.2277.78/63.8961.11/63.8958.33/55.5652.78/50.0063.33 ± 8.50/61.11 ± 7.6662.22 ± 8.16
F1. (%)66.25/71.8877.71/62.4757.86/63.6457.51/53.2542.86/37.6960.44 ± 11.46/57.78 ± 11.6559.11 ± 11.63
MCC0.342/0.4560.559/0.3020.267/0.2820.174/0.1240.101/0.0000.288 ± 0.158/0.233 ± 0.1570.261 ± 0.160
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Fujita, T.; Luo, Z.; Quan, C.; Mori, K.; Cao, S. Performance Evaluation of RNN with Hyperbolic Secant in Gate Structure through Application of Parkinson’s Disease Detection. Appl. Sci. 2021, 11, 4361. https://doi.org/10.3390/app11104361

AMA Style

Fujita T, Luo Z, Quan C, Mori K, Cao S. Performance Evaluation of RNN with Hyperbolic Secant in Gate Structure through Application of Parkinson’s Disease Detection. Applied Sciences. 2021; 11(10):4361. https://doi.org/10.3390/app11104361

Chicago/Turabian Style

Fujita, Tomohiro, Zhiwei Luo, Changqin Quan, Kohei Mori, and Sheng Cao. 2021. "Performance Evaluation of RNN with Hyperbolic Secant in Gate Structure through Application of Parkinson’s Disease Detection" Applied Sciences 11, no. 10: 4361. https://doi.org/10.3390/app11104361

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop