Implementation of a Deep Learning Algorithm Based on Vertical Ground Reaction Force Time–Frequency Features for the Detection and Severity Classification of Parkinson’s Disease

Conventional approaches to diagnosing Parkinson’s disease (PD) and rating its severity level are based on medical specialists’ clinical assessment of symptoms, which is subjective and can be inaccurate. These techniques are not very reliable, particularly in the early stages of the disease. In this research, a novel detection and severity classification algorithm using deep learning approaches was developed to classify the PD severity level based on vertical ground reaction force (vGRF) signals. Variations in the force patterns of vGRF signals, caused by the gait abnormalities of PD patients, can indicate disease severity. The main purpose of this research is to aid physicians in detecting early stages of PD, planning efficient treatment, and monitoring disease progression. The detection algorithm comprises preprocessing, feature transformation, and classification processes. In preprocessing, the vGRF signal is divided into successive time windows of 10, 15, and 30 s. In the feature transformation process, the time domain vGRF signal in each window is transformed into a time–frequency spectrogram using a continuous wavelet transform (CWT). Then, principal component analysis (PCA) is used for feature enhancement. Finally, different types of convolutional neural networks (CNNs) are employed as deep learning classifiers. The algorithm performance was evaluated using k-fold cross-validation (kfoldCV). The best average accuracy of the proposed detection algorithm in classifying the PD severity stage was 96.52% using ResNet-50 with vGRF data from the PhysioNet database. The proposed detection algorithm can effectively differentiate gait patterns based on time–frequency spectrograms of vGRF signals associated with different PD severity levels.


Introduction
Parkinson's disease (PD) is a neurodegenerative disease that belongs to a group of motor system disorders caused by the loss of dopamine-producing brain cells. PD is the second most common neurodegenerative disease [1]; its prevalence is approximately 0.3% in the general population, approximately 1% in individuals older than 60, and approximately 3% in people aged 80 and over [1]. The incidence of PD is 8-18 per 100,000 people. The median age at onset is 60 years, and the mean duration of the progression of the disease from diagnosis to death is approximately 15 years [1]. There is a 1.5-2-fold greater prevalence and incidence of this disease in men [1]. PD treatments cost approximately USD 2500 each year, and therapeutic surgery costs up to USD 100,000 per patient [2]. The primary PD symptoms are tremors in the hands, arms, legs, jaw, and face; rigidity (inflexibility of the limbs and trunk); bradykinesia (slowness in movement); and postural instability (balance and coordination disturbance) [3][4][5]. As these symptoms become more severe, patients may experience difficulties walking, talking, or accomplishing simple tasks.

Table 1. Hoehn and Yahr (HY) rating scale [10] for PD severity stage.
Stage   Description
0       No signs of disease
1       Symptoms are very mild; unilateral involvement only
1.5     Unilateral and axial involvement
2       Bilateral involvement without impairment of balance
2.5     Mild bilateral disease with recovery on pull test
3       Mild to moderate bilateral disease; some postural instability; physical independence
4       Severe disability; still able to walk or stand unassisted
5       Wheelchair bound or bedridden unless aided

Classification is the process of identifying the class of a new observation using a set of categories based on a training process involving observations for which the classes are known. In PD classification, various machine learning algorithms have been implemented as classifiers and combined with sophisticated feature extraction methods for dimensionality reduction. Recently, deep learning approaches, instead of conventional machine learning algorithms, have been applied to improve PD classification performance. For example, Jane et al. presented a Q-backpropagated time delay neural network (Q-BTDNN) in a clinical decision-making system (CDMS) to diagnose patients with PD (PD vs. CO) [11]. The Q-BTDNN was trained using a Q-learning induced backpropagation (Q-BP) training algorithm by generating a reinforced error signal, and the weights of the network were corrected through the backpropagation of the generated error signal. Correa et al. implemented a method to model PD patients' difficulties in starting and ending movements by examining information from speech, handwriting, and gait [12]. These researchers trained a convolutional neural network (CNN) to classify PD patients and CO subjects. The PD population in the database was divided into three groups based upon the stage of PD: low, intermediate, or severe. Lee and Lim classified idiopathic PD patients and COs based on their gait force characteristics using a continuous wavelet transform (CWT) to generate approximate coefficients and detail coefficients [13]. Forty features were extracted from those coefficients using statistical approaches, including frequency distributions and their variabilities. The features of idiopathic PD patients and COs were classified using a neural network with weighted fuzzy membership functions (NEWFM). Zhao et al. developed a two-channel model that combined Long Short-Term Memory (LSTM) and CNNs to learn spatio-temporal patterns in gait data recorded by foot sensors [14]. The model was trained and tested on three public vGRF datasets. The model could perform multi-category classification on features such as the severity level of PD, while previous machine learning-based approaches could only perform binary classification.
As previously mentioned, only a few studies have used the deep learning approach for the detection and severity classification of PD, and some of them have used statistical features combined with machine learning methods. The drawbacks of using machine learning are the dependence of its performance on data size and the understanding of features [15,16]. Machine learning only performs well on small to medium datasets and needs a better understanding of features to represent the data. The objective of this work was to develop a deep learning classifier to help physicians screen and classify the severity of PD in patients, using vGRF spectrograms. The effectiveness of time-frequency spectrogram (feature transformation) of vGRF signals from left (LF), right (RF), and compound foot (CF = LF + RF) movements in classifying features of PD severity was investigated. Specifically, the aim was to determine whether a significant difference in vGRF is related to the specifics of disease severity, as passive (weight acceptance) and active (push off) peaks of vGRF are important gait parameters [17] and exhibit significant relevance in the detection of gait abnormalities, especially in the PD gait assessment [13,14,18-20]. Different deep learning algorithms (including AlexNet, ResNet-50, ResNet-101, and GoogLeNet) were also utilized with the proposed method to compare the effectiveness among classifiers.

Materials and Methods
The proposed PD severity classification algorithm attempts to extract pattern features and visualizations from vGRF signals in PD patients with severity stages of 0, 2, 2.5, and 3 on the HY rating scale by transforming one-dimensional time domain signals into two-dimensional patterns (images) using the feature transformation method from a CWT. The proposed PD severity classification algorithm consists of four main steps, as shown in Figure 2: (1) signal preprocessing of PD patients' vGRF signals, (2) feature extraction from a spectrogram of the vGRF signal generated using CWT and PCA, (3) construction and training of a CNN classifier, and (4) cross-validation to evaluate the performance of the classification algorithm.
The gait database used in this study (gaitpdb, from PhysioNet) contains information recorded from 93 idiopathic PD patients (average age: 66.3 years; 63% men and 37% women) and 73 CO subjects (average age: 66.3 years; 55% men and 45% women). Every subject was instructed to walk at their usual pace for about two minutes while wearing a pair of shoes with eight force sensors located under each insole. The raw vGRF signal data in this database were obtained using force-sensitive sensors (Ultraflex Computer Dyno Graphy, Intronic Inc., NL-7650 AB Tubbergen, The Netherlands) whose output is proportional to the force under the foot in Newtons, sampled at 100 Hz. The recordings also included two signals that reflect the sum of the eight sensor outputs from the left and right foot.
The database also contains information about each participant, including gender, age, height, weight, walking velocity, and severity level of PD. The PD severity level was assigned according to two rating scales, HY [10] and the Unified Parkinson's Disease Rating Scale (UPDRS) [25]. The HY rating scale, widely used to represent the way in which symptoms of PD progress, defines five stages of PD, with two additional intermediate stages, 1.5 and 2.5 (Table 1) [10]. The number of participants diagnosed using the HY rating scale is shown in Table 2.

Signal Preprocessing
A two-minute foot force signal was acquired during data collection from subjects. The LF, RF, and CF vGRF signals of the CO and PD subjects were used as inputs to the proposed algorithm. It was difficult to interpret the foot force data directly, despite using a CWT to transform the features, due to the length of the foot force signal. To observe the foot force signal more accurately, a window function was employed. A window function is a mathematical construct that is zero-valued outside of selected intervals. In this research, 10, 15, and 30 s window sizes were used. The aim of the time-windowing process was to obtain shorter signal data. In the clinical application, this data collection is more convenient for the PD patient and, furthermore, reduces the fall risk. The possibility of patient injury rises if the data collection time is longer. Normalization and zero-mean processing were also used, to reduce the redundancy and dependency of data.
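The windowing and zero-mean normalization steps described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation: the function names, the non-overlapping windows, the peak-amplitude scaling, and the synthetic input signal are all assumptions made for the example, since the text does not specify the exact normalization.

```python
def split_into_windows(signal, fs_hz, window_s):
    """Split a 1-D signal into successive, non-overlapping windows."""
    n = int(round(fs_hz * window_s))  # samples per window
    # Drop the incomplete tail so that every window has the same length.
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def zero_mean_normalize(window):
    """Subtract the mean, then scale to a peak absolute value of 1.

    Peak scaling is an assumed choice; the paper only states that
    normalization and zero-mean processing were applied.
    """
    mean = sum(window) / len(window)
    centered = [v - mean for v in window]
    peak = max(abs(v) for v in centered)
    if peak == 0:
        return centered  # constant window: nothing to scale
    return [v / peak for v in centered]


# Two minutes of a synthetic stand-in signal sampled at 100 Hz
# (hypothetical data, not a real vGRF recording).
fs = 100
signal = [float(i % 110) for i in range(120 * fs)]

windows_10s = split_into_windows(signal, fs, 10)
windows_15s = split_into_windows(signal, fs, 15)
windows_30s = split_into_windows(signal, fs, 30)
normalized = [zero_mean_normalize(w) for w in windows_10s]
print(len(windows_10s), len(windows_15s), len(windows_30s))
```

A shorter window yields more segments per recording, which is consistent with the later remark that smaller time windows produce more spectrogram images and longer computation time.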
In 1987, Nilsson and Thorstensson observed the adaptability in the frequency and amplitude of leg movements during human locomotion at different speeds [26]. They reported that the overall range of stride frequency for normal leg movements is 0.83-1.95 Hz. The stride cycle period is defined as the time from the heel contact of one foot with the ground to the next heel contact of the same foot with the ground. The stride cycle period can be derived from the vGRF signal, and the stride frequency is the inverted value of the stride cycle duration. In conclusion, we selected two frequency ranges, 0.83-1.95 Hz and 1.95-50 Hz, for detailed observations of vGRF spectrograms among CO and PD subjects.
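As a small worked example of the relationship between stride cycle period and stride frequency, the sketch below inverts the time between consecutive heel contacts of the same foot and checks the result against the 0.83-1.95 Hz range reported in [26]. The heel-contact times are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def stride_frequencies(heel_contacts_s):
    """Stride frequency (Hz) for each pair of consecutive same-foot heel contacts."""
    periods = [b - a for a, b in zip(heel_contacts_s, heel_contacts_s[1:])]
    return [1.0 / p for p in periods]  # frequency is the inverse of the period


# Overall range of stride frequency for normal leg movements [26].
NORMAL_RANGE_HZ = (0.83, 1.95)

# Hypothetical heel-contact times (seconds) of one foot.
contacts = [0.00, 1.10, 2.15, 3.25, 4.30]
freqs = stride_frequencies(contacts)
in_range = [NORMAL_RANGE_HZ[0] <= f <= NORMAL_RANGE_HZ[1] for f in freqs]
print([round(f, 3) for f in freqs], in_range)
```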

The Continuous Wavelet Transform
The continuous wavelet transform (CWT) is a signal processing tool used to observe the time-frequency spectrum characteristics of non-stationary signals [27]. As in the case of the Gabor transform [28], a CWT can be used to filter a signal using a dilated version of the mother wavelet, but the frequency translation is affected by dilation (scaling) and contraction. A CWT is a time-frequency transformation method, because it changes the signal time domain to the time-frequency domain. The output of a CWT is a time-frequency spectrogram (time-scale representation), which provides valuable information about the relationship between time and frequency.
A CWT consists of a time series function $x(t) \in L^{2}(\mathbb{R})$, with a scaling or dilation factor $s \in \mathbb{R}^{+}$ ($s > 0$) that controls the width of the wavelet, and a translation parameter, $\tau$, controlling the location of the wavelet, as expressed in the following equation:

$$\mathrm{CWT}_{x}(s, \tau) = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt$$

where $\psi(t)$ is a mother wavelet, also called a window function, and $\psi^{*}$ denotes its complex conjugate. The mother wavelet function used in this research was a Morlet or Gabor wavelet. This wavelet function consists of a Gaussian-windowed complex sinusoid (a complex exponential multiplied by a Gaussian window) as follows (shown here in the standard unit-width form, up to a normalization constant):

$$\psi(t) = e^{-t^{2}/2}\, e^{j 2 \pi f t}$$

The parameter $t$ refers to the time, and $f$ represents the reference frequency. The aim of the time-frequency transformation is to represent the vGRF signal (Figure 3a) as a time-frequency spectrogram image, as shown in Figure 3b,c, Figures 4 and 5. The images clearly show different patterns of vGRF between CO and PD subjects that cannot be found in the time and frequency domains of the signal. Using the time-frequency spectrogram, variations in the foot pressure signal caused by temporal characteristics can also be analyzed. Temporal characteristics, also known as spatial characteristics or linear gait variabilities, consist of the measurements of step length, the stance width, the length of the step rhythm, and the step velocity.
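To illustrate how a CWT with a Gaussian-windowed complex sinusoid produces a time-frequency magnitude map, the sketch below evaluates the standard CWT integral by direct summation. It is a slow, pure-Python illustration under stated assumptions: a unit-width Morlet wavelet without normalization constant, a reference frequency of 1 Hz, a kernel truncated at four scale widths, and a synthetic 2 Hz sine as input. The function names are hypothetical, and a practical implementation would use an optimized wavelet library.

```python
import cmath
import math


def morlet(t, f):
    """Gaussian-windowed complex sinusoid with reference frequency f."""
    return math.exp(-t * t / 2.0) * cmath.exp(2j * math.pi * f * t)


def cwt_magnitude(x, fs_hz, freqs_hz, f_ref=1.0):
    """Return |CWT| as rows (one per analysis frequency) by columns (time)."""
    dt = 1.0 / fs_hz
    n = len(x)
    rows = []
    for freq in freqs_hz:
        s = f_ref / freq  # scale at which the dilated wavelet oscillates at `freq`
        half = int(4.0 * s * fs_hz)  # Gaussian is negligible beyond 4 scale widths
        # Sampled conj(psi((t - tau) / s)) / sqrt(s) for t - tau = k * dt.
        kernel = [morlet(k * dt / s, f_ref).conjugate() / math.sqrt(s)
                  for k in range(-half, half + 1)]
        row = []
        for tau in range(n):
            acc = 0j
            for j, w in enumerate(kernel):
                i = tau + j - half  # sample index of t for this kernel tap
                if 0 <= i < n:
                    acc += x[i] * w
            row.append(abs(acc) * dt)  # Riemann-sum approximation of the integral
        rows.append(row)
    return rows


# Synthetic test signal: a 2 Hz sine sampled at 50 Hz for 4 s (assumed data).
fs = 50.0
x = [math.sin(2.0 * math.pi * 2.0 * (i / fs)) for i in range(200)]
freqs = [1.0, 2.0, 4.0]
scalogram = cwt_magnitude(x, fs, freqs)

mid = len(x) // 2
strongest = max(range(len(freqs)), key=lambda r: scalogram[r][mid])
print("dominant analysis frequency at mid-signal:", freqs[strongest], "Hz")
```

Each row of `scalogram` corresponds to one analysis frequency, so rendering the rows as an image gives the kind of spectrogram that is later passed to the CNN classifiers.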

Principal Component Analysis
The main goal of principal component analysis (PCA) is to perform dimension reduction for a dataset containing a large number of interrelated variables while, to the greatest extent possible, retaining the variations present in the dataset [29]. This reduction is achieved by transforming the dataset into a new set of variables, the principal components (PCs), which are ordered, de-correlated variables.
The PCA method is defined mathematically using the following steps. Consider a matrix $X = [P_1; P_2; P_3; \ldots; P_i]^{T}$ constructed using the spectrogram images of PDs and COs, where $P$ is a row vector consisting of the pixels of a spectrogram image of PDs or COs, and $i$ is the number of spectrogram images of PDs and COs. The PC is built using $X^{T}X$, a covariance matrix of the matrix $X$, to determine its eigenvalues and eigenvectors. The $W$ matrix, an $m \times m$ matrix of weights whose columns are the eigenvectors of $X^{T}X$, is obtained. Finally, the matrix for extracted feature $F$ can be described as the full PCs' decomposition of $X$, as shown in the following equation: $F = XW$.
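The PCA steps above (form the covariance-type matrix from X, take its eigenvectors as the columns of W, and compute F = XW with all PCs retained) can be sketched as below. Several choices are assumptions for illustration: the data are mean-centered before forming the matrix, the eigen-decomposition uses a cyclic Jacobi rotation method so that the example needs no external library, and the tiny 4 x 3 matrix stands in for flattened spectrogram images. The function names are hypothetical.

```python
import math


def jacobi_eigh(a, max_sweeps=50, tol=1e-20):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V), where the columns of V are the eigenvectors.
    """
    m = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(m) for j in range(m) if i != j)
        if off < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = (math.pi / 4 if diff == 0.0
                         else 0.5 * math.atan(2.0 * a[p][q] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(m):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(m):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(m):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(m)], v


def pca_full(X):
    """Return (F, W, eigenvalues) with F = Xc W and all PCs retained."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - means[j] for j in range(m)] for row in X]  # assumed centering
    gram = [[sum(Xc[r][i] * Xc[r][j] for r in range(n)) for j in range(m)]
            for i in range(m)]  # X^T X
    eigvals, V = jacobi_eigh(gram)
    order = sorted(range(m), key=lambda j: eigvals[j], reverse=True)
    W = [[V[i][j] for j in order] for i in range(m)]
    F = [[sum(Xc[r][k] * W[k][j] for k in range(m)) for j in range(m)]
         for r in range(n)]
    return F, W, [eigvals[j] for j in order]


# Four hypothetical "images" with three pixels each (one image per row).
X = [[2.0, 0.0, 1.0],
     [0.0, 1.0, 3.0],
     [1.0, 3.0, 0.0],
     [3.0, 2.0, 2.0]]
F, W, eigvals = pca_full(X)
print([round(v, 4) for v in eigvals])
```

Because all PCs are kept, F has the same dimensions as X; the transformation re-expresses each image in de-correlated coordinates rather than discarding information.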
The purpose of using PCA for feature enhancement was to extract fewer patterns while identifying the most important texture and pattern features. This processing was conducted in order to improve the performance of machine learning and artificial intelligence algorithms used for classifying the data points. The full PCs of each spectrogram image sample were selected to preserve the important texture and pattern features for visualization.

Convolutional Neural Network
A convolutional neural network (CNN) is composed of one or more convolutional layers, often with subsampling and pooling layers, followed by one or more fully-connected layers, as in a basic multi-layer neural network [30]. A CNN was utilized to distinguish the time-frequency spectrogram representation of vGRF between PD severity stages. The convolutional layer plays the most important role in CNN performance. This layer is composed of a set of kernels (learnable filters) as parameters, each of which has a small receptive field but extends through the full depth of the input. When the data pass through this layer, each kernel is convolved across the spatial dimensions of the input (width and height of the input volume), computing a dot product at each position and producing a 2D activation map. The filters in the convolutional layers act as edge detectors and color filters. An activation layer applies an activation function to the output of the previous layer, such as the non-saturating rectified linear unit (ReLU), f(x) = max(0, x), or the saturating sigmoid function, σ(x) = (1 + e^(−x))^(−1). Another important concept in CNNs is pooling, also known as non-linear down-sampling. The aim of the pooling layer is to reduce the dimensionality and minimize the number of parameters and the complexity of model computation. A common variant, the max-pooling layer, takes each activation map as input and reduces its dimensions using the MAX function. Finally, the fully connected layers generate scores from the previous activations to use for classification, as in traditional artificial neural networks (ANNs). Neurons in this layer have connections to all of the outputs of the previous layer. The performance of AlexNet, ResNet-50, ResNet-101, and GoogLeNet was examined in this study.
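The three building blocks described above (a kernel slid across the input, a ReLU activation, and max pooling) can be illustrated on a tiny single-channel image. This is a hedged sketch rather than any of the networks evaluated in the study: the function names, the hand-picked vertical-edge kernel, and the 5 x 5 input are assumptions, and, as in most CNN frameworks, the "convolution" is computed without flipping the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Element-wise f(x) = max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


# Hypothetical 5 x 5 input with a vertical edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hand-picked vertical-edge detector; in a CNN these weights are learned.
kernel = [[-1.0, 1.0],
          [-1.0, 1.0]]

activation_map = relu(conv2d_valid(image, kernel))
pooled = max_pool(activation_map, size=2)
print(activation_map[0], pooled)
```

The activation map responds only where the kernel straddles the edge, and pooling halves each spatial dimension while keeping that response.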

AlexNet CNN
The AlexNet architecture [31] comprises 25 layers, including an input layer, 5 convolution 2D layers, 7 rectified linear unit (ReLU) layers, 2 cross-channel normalization layers, 3 max-pooling 2D layers, 3 fully connected layers, 2 dropout layers for regularization, a softmax layer using a normalized exponential function, and an output layer. The input to the AlexNet CNN in the proposed method is a time-frequency spectrogram of the vGRF signals produced by the CWT. There are two methods for fine-tuning a pretrained AlexNet CNN: transfer learning and feature extraction. We chose the feature extraction method because it is easy to apply to pretrained networks without expending a lot of time, as it is faster than the transfer learning method and requires less training. This method applies two previous fully connected layers and uses a support vector machine (SVM) for classification.

ResNet-50 and ResNet-101 CNN
The main idea behind a residual network (ResNet) [32] is the introduction of a so-called "identity shortcut connection" that skips one or more layers. A shortcut (or skip) connection is used to solve the problem of vanishing or exploding gradients by using blocks that re-route the input and add it to the concept learned in the previous layer. During learning, a layer learns the concepts of the previous layer and merges with inputs from that previous layer. ResNet-X refers to a residual deep neural network with X number of layers; for example, ResNet-50 indicates a ResNet developed using 50 layers. The architectures of ResNet-50 and ResNet-101 are described in Table 3.

GoogLeNet CNN
GoogLeNet [33] is a pretrained CNN that has 22 layers with 9 inception layers. An inception layer determines the optimal local sparse structure in a convolutional vision network, which can be approximated and covered by readily available dense components. In general, the inception layer is a network consisting of parallel convolutions of different sizes and types (1 × 1, 3 × 3, and 5 × 5) for the same input, which stacks all of the outputs. The exact structure of GoogLeNet is as follows:
• An average pooling layer with a 5 × 5 filter size and a stride of 3.

Although AlexNet, ResNet-50, ResNet-101, and GoogLeNet achieved significant performance in the PD severity detection (overall accuracy of approximately 97%), their architecture characteristics exhibited different influences on performance based on the benefits and drawbacks of the networks. The advantages and disadvantages of AlexNet, ResNet-50, ResNet-101, and GoogLeNet applied in the proposed method are summarized in Table 4.
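The identity shortcut connection described for ResNet can be made concrete with a deliberately simplified block that computes y = ReLU(F(x) + x). The sketch below is an assumption-laden illustration: it uses two small fully connected layers in place of the convolutional layers of a real ResNet block, and all function names and weights are hypothetical.

```python
def relu(v):
    """Element-wise f(x) = max(0, x) on a vector."""
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """y = ReLU(F(x) + x), where F is two dense layers with a ReLU between."""
    h = relu(dense(x, w1, b1))
    f = dense(h, w2, b2)
    # Identity shortcut: the input skips both layers and is added back.
    return relu([fi + xi for fi, xi in zip(f, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
no_bias = [0.0, 0.0]

# With an all-zero residual branch, the block reduces to the identity
# (for non-negative inputs), which is what makes very deep stacks trainable.
y_identity = residual_block([1.0, 2.0], zeros, no_bias, zeros, no_bias)

# With non-zero weights, the block learns a correction on top of the input.
w = [[0.5, 0.0], [0.0, 0.5]]
y_learned = residual_block([1.0, 2.0], w, no_bias, w, no_bias)
print(y_identity, y_learned)
```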

Cross-Validation
Cross-validation is a statistical method used to assess and compare learning algorithms by dividing data into two groups: a training set used to train a model and a testing set used to test the model [36]. The training and testing sets are varied in consecutive rounds so that each data point is tested using a classifier in whose training it did not participate. There are two main purposes of using cross-validation. The first is to quantify the generalizability of an algorithm by testing the classifier on unseen data. The second is to evaluate the performance of different algorithms and identify the best algorithm with which to classify the available data or, alternatively, to compare the performance of two or more variants of a parameterized model. In order to compare the results with the existing literature, k-fold cross-validation was utilized. In this scheme, k iterations of training and testing are carried out in such a way that within each iteration, a different fold of the dataset is used for testing, while the remaining (k-1) folds are used for training. In this research, 10-fold cross-validation was applied.
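The fold rotation described above can be sketched as a generator of train and test index sets. This is a minimal illustration with hypothetical names; it shuffles the indices with a fixed seed and does not stratify by class, since the text does not state whether stratification was used.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 10-fold cross-validation over 20 hypothetical spectrogram images.
splits = list(k_fold_indices(n_samples=20, k=10))
for train, test in splits[:2]:
    print(len(train), "training samples,", len(test), "test samples")
```

Every sample appears in exactly one test fold, so each image is evaluated by a classifier that never saw it during training.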

Results
The experiments were carried out using MATLAB R2018a software on a computer with an NVIDIA GeForce GTX 1060 6 GB GPU and 24 GB RAM. The computation time is affected by the number of input time-frequency spectrogram images (related to the time-windowing process, where a smaller time window results in more images and longer computation time) and the number of neurons in the CNN. We employed multi-class classification for the COs and PD Stages 2, 2.5, and 3. This approach is representative of real-life applications, because doctors and neurologists do not have preliminary information about whether a patient is healthy or suffers from PD and, if the latter, what the severity is.
The sensitivity, specificity, accuracy, and AUC value of the proposed method were included as parameters for evaluation. The detailed definition of each evaluation parameter is provided in [37]. When selecting between diagnostic tests, Youden's index is often applied to evaluate the effectiveness of the test [38]. Youden's index is a function of sensitivity and specificity, and its value ranges between 0 and 1. A value close to 1 indicates that the diagnostic test's effectiveness is relatively high and the test is close to perfect, and a value close to 0 indicates poor effectiveness, where the test is useless. Youden's index (J) is the sum of the two fractions, minus one, and indicates whether the measurements correctly diagnosed the diseased group (sensitivity) and healthy controls (specificity) over all cut-points c, −∞ < c < ∞:

$$J = \max_{c} \left\{ \mathrm{sensitivity}(c) + \mathrm{specificity}(c) - 1 \right\}$$

The proposed method covered two kinds of classifications, multi-class (CO vs. PD Stage 2 vs. PD Stage 2.5 vs. PD Stage 3) classification and two-class (CO vs. PD) classification. In the two-class classification, PD Stage 2, 2.5, and 3 datasets were combined into one PD dataset. The best classification performance was obtained using AlexNet CNN for multi-class classification and ResNet CNN for two-class classification.

The best classification result of the Ga dataset has 98.15% sensitivity, 98.16% specificity, 98.16% accuracy, and an AUC value of 0.9816 on average for multi-class classification and 99.77% sensitivity, 98.80% specificity, 99.11% accuracy, and an AUC value of 0.9995 for two-class classification. The best classification result of the Ju dataset has 98.06% sensitivity, 98.38% specificity, 98.24% accuracy, and an AUC value of 0.9822 on average for multi-class classification and 98.94% sensitivity, 99.04% specificity, 99.01% accuracy, and an AUC value of 0.9993 for two-class classification. The best classification result for the Si dataset has 97.73% sensitivity, 98.76% specificity, 98.27% accuracy, and an AUC value of 0.9825 on average for multi-class classification and 98.85% sensitivity, 98.41% specificity, 98.56% accuracy, and an AUC value of 0.9964 for two-class classification. Based on these classification results, the performance of the proposed method was not influenced by different datasets in the database, even though the data collection processes varied among these datasets.
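For reference, the evaluation parameters and Youden's index at a single operating point can be computed from confusion-matrix counts as sketched below. The counts are hypothetical and unrelated to the reported results, the function name is an assumption, and the AUC is omitted because it requires classifier scores rather than counts.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and Youden's J from counts.

    tp/fn: PD samples classified correctly/incorrectly;
    tn/fp: CO samples classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    youden_j = sensitivity + specificity - 1.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden_j": youden_j,
    }


# Hypothetical two-class (CO vs. PD) confusion-matrix counts.
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
print({name: round(value, 4) for name, value in metrics.items()})
```

For the multi-class case, the same function can be applied once per class in a one-vs-rest fashion and the results averaged, which is one common way to obtain "on average" figures; whether the study averaged in exactly this way is not stated.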

PD Severity Classification of All Datasets (Merged)
For this classification, the three vGRF datasets in gaitpdb were merged and used as inputs to the proposed PD severity classification algorithm. For the 10 s, 15 s, and 30 s time windows, the input signal numbers for CO, PD Stage 2, PD Stage 2.5, and PD Stage 3 were 994, 1180, 784, and 277; 658, 781, 516, and 184; and 321, 379, 243, and 91, respectively. The best result for this classification type was obtained using the ResNet CNN, with 92.08% sensitivity, 95.60% specificity, 94.58% accuracy, and an AUC value of 0.9384 on average for multi-class classification and 94.46% sensitivity, 97.69% specificity, 96.63% accuracy, and an AUC value of 0.9949 for two-class classification. The complete classification results are shown in Tables 5-16 for multi-class classification and Tables 17-19 for two-class classification, and Tables 20-23 summarize the classification results.

Table 5. Multi-class classification of LF from Ga dataset for CO (Class 0), PD Stage 2 (Class 2), PD Stage 2.5 (Class 2.5), and PD Stage 3 (Class 3) using several CNN classifiers (AlexNet, ResNet-50, ResNet-101, and GoogLeNet) with 10-fold cross-validation.
Note: * denotes the best classification result and was selected using Youden's index criteria.

Discussion
In this section, we discuss the gait analysis for each severity stage of PD based on the time and frequency analyses of the time-frequency spectrograms. Some key features of a signal are difficult to observe with the naked eye, but time-frequency spectrogram analysis can help to decipher important information regarding time and frequency characteristics. A CWT was used in this study to transform the signal from the time domain into the time-frequency domain. The gait phenomena could be identified using pattern visualization and recognition based on time-frequency spectrograms for CO subjects and PD patients with severity stages of 2, 2.5, and 3.
This observation was only performed for the CF vGRF signal. Since this type of input signal is the additive force between the left and right foot force signals, it describes the correlations between the features of the left and right feet instead of a single feature of the left or right foot. In order to further investigate the gait phenomena, a 10 s time window spectrogram was selected because the image feature was derived from a shorter input signal, and more detail can be perceived from the texture and pattern visualization of gait phenomena. For a 15 and 30 s time window spectrogram, the texture and pattern information is more compressed, and thus, the gait phenomena are blurred and not easily observed (see Figures 4 and 5). The 0.1-5 Hz and 5-50 Hz frequency ranges were only applied to the detailed observations of the CWT time-frequency spectrogram and were not used for the classification.

Healthy Controls
Normal gait phenomena were interpreted by observing the time-frequency spectrogram of CO subjects, as shown in the first column of Figure 3. In the 0.1-5 Hz frequency range (Figure 3, first column, second row), the strongest walking force magnitude, represented in yellow, of the normal gait occurs at 1.6-2.1 Hz and is stable from the initial time to the end of the experiment. The foot force distributions and walking velocities for normal subjects were therefore the same when they were walking. At 2.5-3 Hz and approximately 4.5-5 Hz, small areas signifying the lowest force magnitude, shown in dark blue, alternate with a significant force magnitude, indicated by light blue, forming a regular pattern. This phenomenon appears in the spectrogram and is caused by the CF force signal at the lowest magnitudes. The three lowest magnitudes can be observed in one cycle of the CF force time domain signal (top left of Figure 3); the lowest magnitudes are almost equal in every cycle of the signal. The lowest magnitudes that occur at the beginning and end of the half gait cycle (that is, only the left or right foot gait cycle), close to zero force, correspond to toe-off and initial contact, whereas the lowest magnitude that occurs within the half gait cycle appears when only one foot is in contact with the ground.
In the 5-50 Hz frequency range ( Figure 3, first column, third row), a steady, strong force level, represented in yellow, occurs at approximately 5 Hz, with the same magnitude as that which occurs during walking, from the beginning to the end of the recording, and a significant force magnitude, shown in light blue, occurs up to 50 Hz in all records. Both time-frequency spectrograms indicate that the time and frequency components in the spectrogram have a regular pattern. This interpretation became a benchmark for investigating PD gait phenomena. These data were compared to analyze the gait characteristics of PD patients based on spectrogram analyses.

Parkinson's Disease Stage 2
The time-frequency spectrograms for PD patients were similar to those of the CO spectrograms. For PD Stage 2 patients, as presented in the second column of Figure 3, the strongest force is at 1.6-2.1 Hz in the 0.1-5 Hz (Figure 3, second column, second row) frequency range, and there is a significant, strong magnitude, shown in light yellow, at 1 Hz, which is weaker than the force magnitude at 1.6-2.1 Hz. The significant force magnitude at 2.5-3 Hz and approximately 4.5-5 Hz becomes more yellow instead of light blue as in the CO spectrogram. It is also apparent that the pattern of the lowest force magnitude at 2.5-5 Hz is regular at some times and irregular at other times. This observation indicates that the magnitudes of the global and local minima are not the same in every gait cycle ( Figure 3, second column, first row). In the time domain, the CF vGRF signal has fluctuating force magnitudes that cause an irregularity in the signal.
In the 5-50 Hz frequency range ( Figure 3, second column, third row), the strongest force magnitude, shown in yellow, is about 5 Hz, and significant force, represented by light blue, occurs up to 50 Hz every time. However, the force magnitude is not distributed equally over the entire walking period.

Parkinson's Disease Stage 2.5
As shown in the third column of Figure 3, the spectrogram for PD Stage 2.5 patients is not very different from the PD Stage 2 spectrogram in either frequency range. The only difference is that, in the 0.1-5 Hz frequency range, a significant, strong magnitude at 1 Hz becomes stronger, and yellow areas of force magnitude appear in the image. PD patients in the early stages of the disease (2 and 2.5) can have a walking velocity similar to that of COs, but their force distribution is typically not equally distributed, due to the presence of tremors.

Parkinson's Disease Stage 3
Of the patients studied in this research, those with PD Stage 3 had the most severe level of disease. The spectrograms of this group exhibit the most irregular patterns of all severity levels. In the fourth column, first row of Figure 3, the CF vGRF signal has the most fluctuation and irregular force magnitudes because of the jerky movements and tremors of the patients.
In the 0.1-5 Hz frequency range (fourth column, second row of Figure 3), the strongest walking force magnitude, shown in yellow, occurs at 1-1.5 Hz, a lower frequency than in stages 2 and 2.5. A significant strong force magnitude, depicted in light yellow, also appears at approximately 0.75 Hz, although the force level is not the same in every gait cycle. At 2-3 Hz and 3.5-4 Hz, significant force magnitude regions occur, as shown by colors that are more yellow.
In the 5-50 Hz frequency range (fourth column, third row of Figure 3), the strongest force magnitude only appears in certain gait cycles and is not equally distributed. A significant force magnitude, shown in light blue, only occurs up to 20 Hz, and forms an irregular pattern in every gait cycle.

Comparison of Results with the Existing Literature
A comparison between the proposed methodology and a study by Zhao et al. [14] is presented in Table 24. The authors carried out multi-class classification of vGRF signals for CO vs. PD Stage 2 vs. PD Stage 2.5 vs. PD Stage 3 using the same information found in the same database used for the proposed method, gaitpdb. These authors separated the classification types based on the three datasets (Ga, Ju, and Si) and used 10-fold cross-validation as the evaluation method. The two-class classification results were also compared with those of studies conducted by Maachi et al. [39], Wu et al. [40], Ertugrul et al. [41], Zeng et al. [42], Daliri [43], and Khoury et al. [44,45]. These comparison results are shown in Tables 25 and 26. In Khoury et al.'s study, the classification types were divided based on the three datasets (Ga, Ju, and Si).
In summary, the proposed method produced almost the same classification results as those published in the existing literature, but the proposed algorithm generated better visualizations via time-frequency spectrograms associated with the progression of PD severity. The irregularity in patterns in the spectrograms is proportional to the severity level of PD. The more severe the disease, therefore, the more irregular the spectrogram's pattern. This phenomenon could be helpful for medical specialists or neurologists in monitoring PD progression, allowing them to provide more effective and accurate medications and therapies to patients.

Conclusions
In this study, a deep learning algorithm was implemented based on vGRF time-frequency features for the detection and severity classification of Parkinson's disease. Pattern visualization and recognition of the time-frequency spectrogram made it possible to successfully differentiate PD severity stages and COs. A CWT was used to generate spectrograms to visualize gait foot force signals by transforming signals from the time domain into the time-frequency domain. Three time-window sizes (10, 15, and 30 s), two frequency ranges (0.83-1.95 and 1.95-50 Hz), and three types of gait foot force signals (LF, RF, and CF force signals) were selected as inputs to obtain good feature visualization. After the original signal was transformed, PCA was applied for feature enhancement, to increase between-class separability and to reduce within-class variability. Finally, CNNs were used to perform classification. To evaluate the CNNs' classification process, 10-fold cross-validation was performed, and the accuracy, sensitivity, specificity, and the AUC value were evaluated. The proposed method achieved a highest performance of more than 97.42% on the parameters being evaluated and achieved superior performance in comparison with the detection and PD severity classification performance of state-of-the-art methods found in the literature.
Although the evidence indicates that the proposed method achieved good performance, there are several major drawbacks that could be improved. First, an existing database was used with the proposed method, and clinical data with a greater number of severity levels should be used to verify the performance and to resolve the limitation of the relatively small number of PD patients at certain severity levels in the current database. Clinical data collection will be carried out using a smart insole with an embedded 0.5" force sensing resistor of our own design. The precision and accuracy of force sensing resistor readings are also considered in order to obtain the correct representation of the vGRF signal. PD patients will be asked to perform some simple daily activities, such as turning around and sitting, instead of only walking down a long pathway. Second, long-term data collection to monitor PD progression is important for treatment decisions, since the gait patterns of PD patients appear to change with the long-term progression of the disease. Third, to further investigate the clinical meaning of the results, PD gait phenomena based on time-frequency spectrograms should be discussed with physicians. Fourth, other input data, such as kinetic data, temporal data, step length, and cadence, and other classifiers should be used to confirm and compare the effectiveness of pattern visualization and recognition based on the use of time-frequency spectrograms in PD detection.