4.1. Experiment Setup
The experiments were conducted on a laptop equipped with a 2.4 GHz Intel i5-9300H CPU and an NVIDIA GeForce GTX 1650 Ti with Max-Q Design GPU (Santa Clara, CA, USA). To investigate the impact of missing different body parts on PD detection, a lightweight gated initialization network was implemented. The detailed architecture of PD–Healthy Classification is shown in Table 2. It is structured as follows: an initial layer of 128 units, followed by a dropout layer with a rate of 0.2 to prevent overfitting; a second layer of 64 units; a dense layer of 32 units, followed by another dropout layer with a rate of 0.2; and a final output layer consisting of a single neuron with a sigmoid activation function.
The model was compiled using the Adam optimizer with binary cross-entropy as the loss function. The learning rate was set between 0.00005 and 0.001, depending on the number of missing frames and the occluded body part. Early stopping monitored the validation loss with a patience of 5 epochs to avoid overfitting and unnecessary training. The model was trained for up to 50 epochs with a batch size of 32, using a training–validation split for evaluation, and the weights from the epoch with the best validation performance were saved.
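The stopping rule described above can be illustrated with a small framework-independent sketch. The function name `train_with_early_stopping` and the idea of feeding it a precomputed list of validation losses are assumptions made for illustration; the actual training code is not given in the text.

```python
def train_with_early_stopping(val_losses, patience=5, max_epochs=50):
    """Replay per-epoch validation losses and apply early stopping.

    Returns (best_epoch, stopped_epoch), both 1-indexed. In a real
    training loop, model weights would be checkpointed at best_epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    stopped_epoch = 0
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        stopped_epoch = epoch
        if loss < best_loss:
            # New best validation loss: remember the epoch (checkpoint here).
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs

    return best_epoch, stopped_epoch
```

With a patience of 5, training halts once five consecutive epochs fail to improve on the best validation loss, and the checkpoint kept is the one from the best epoch rather than the last one.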
4.2. Occlusion Simulation
Occlusion in human pose estimation often arises due to unfavorable camera angles or the presence of obstructing objects, which results in incomplete capture of the human body. To replicate such real-world conditions, an extended version of the original dataset was constructed by artificially removing selected body keypoints within a defined range of video frames, specifically between frame 10 and frame 40.
The extended dataset is stored in CSV format, where each record corresponds to a single video consisting of 50 frames. For every frame, 34 values were extracted using AlphaPose, representing the x- and y-coordinates of 17 human body keypoints. Consequently, each record contains a total of 1700 data values (34 coordinate values × 50 frames), representing the complete sequence of extracted coordinates for one subject.
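As a minimal sketch of this record layout, the snippet below unpacks one flat record of 1700 values into 50 frames of 17 (x, y) pairs. The frame-major ordering with interleaved x and y values, and the helper name `reshape_record`, are assumptions; the text does not specify how the columns are ordered in the CSV file.

```python
N_FRAMES = 50      # frames per video, as stated in the text
N_KEYPOINTS = 17   # body keypoints per frame


def reshape_record(values):
    """Turn a flat list of 1700 numbers into frames[f][k] = (x, y).

    Assumes the values are stored frame by frame, with each frame
    holding x0, y0, x1, y1, ..., x16, y16 (hypothetical ordering).
    """
    per_frame = N_KEYPOINTS * 2  # 34 coordinate values per frame
    expected = N_FRAMES * per_frame
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")

    frames = []
    for f in range(N_FRAMES):
        start = f * per_frame
        frame = [
            (values[start + 2 * k], values[start + 2 * k + 1])
            for k in range(N_KEYPOINTS)
        ]
        frames.append(frame)
    return frames
```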
To further emulate real-world occlusions where specific body regions are blocked, the keypoints were grouped based on body parts: the head, upper body, hips, and legs. These groups of keypoints were selectively removed to simulate localized occlusion effects.
Figure 6a illustrates the full-body keypoints of a patient, with each keypoint defined by its x- and y-coordinates. In the head region shown in Figure 6b, the occluded keypoints include the nose, left eye, right eye, left ear, and right ear. Each occluded keypoint was replaced with (x, y) = (0, 0) in the dataset. The occluded keypoints for the upper body shown in Figure 6c include the left shoulder, right shoulder, left elbow, and right elbow. For the hip region shown in Figure 6d, the occluded keypoints include the left wrist, right wrist, left hip, and right hip. Wrists were grouped under the hip region because, during walking, the hands (and wrists) typically align horizontally with the hips. The leg region shown in Figure 6e includes the left and right knees and the left and right ankles.
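The grouping can be sketched as a lookup table from body part to keypoint indices. The indices below assume the COCO 17-keypoint ordering (0 = nose through 16 = right ankle), which AlphaPose commonly outputs but which the text does not confirm; `BODY_PARTS` and `occlude_part` are illustrative names.

```python
# Keypoint indices per body part, assuming COCO 17-keypoint ordering.
BODY_PARTS = {
    "head": [0, 1, 2, 3, 4],        # nose, eyes, ears
    "upper_body": [5, 6, 7, 8],     # shoulders, elbows
    "hips": [9, 10, 11, 12],        # wrists, hips
    "legs": [13, 14, 15, 16],       # knees, ankles
}


def occlude_part(frame, part):
    """Return a copy of one frame with the given part's keypoints zeroed.

    `frame` is a list of 17 (x, y) tuples; occluded keypoints are
    replaced with (0.0, 0.0), and the input frame is left unchanged.
    """
    hidden = set(BODY_PARTS[part])
    return [
        (0.0, 0.0) if k in hidden else xy
        for k, xy in enumerate(frame)
    ]
```

For example, `occlude_part(frame, "hips")` zeroes both wrists and both hips while leaving the other 13 keypoints untouched.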
Figure 7 illustrates the dataset preprocessing pipeline designed to simulate occlusion for testing modules, while the training modules utilize the complete dataset without missing values. The rationale for applying different preprocessing strategies to training and testing data is to assess how effectively the proposed technique can recover missing keypoints.
To simulate missing data caused by occlusions or sensor errors, specific keypoints are manually removed from the complete dataset. The processed dataset is then divided into five subsets: one with no missing values serving as the ground truth and four subsets with progressively missing frames, where the last 10, 20, 30, and 40 frames are removed, respectively. The removed values are replaced with zeros. This backward frame removal was used to simulate realistic challenges, as tracking methods usually rely on past frames to make predictions for the future state.
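A minimal sketch of the backward frame removal is shown below. It assumes each video is held as a list of frames, each a list of (x, y) tuples; the function names and the optional `keypoints` argument (to restrict removal to selected keypoint indices) are illustrative and not taken from the original implementation.

```python
def drop_trailing_frames(frames, n_missing, keypoints=None):
    """Zero out keypoints in the last `n_missing` frames of a sequence.

    If `keypoints` is None, every keypoint in those frames is zeroed;
    otherwise only the listed keypoint indices are. Returns a new list.
    """
    n_frames = len(frames)
    if not 0 <= n_missing <= n_frames:
        raise ValueError("n_missing must be between 0 and len(frames)")

    cutoff = n_frames - n_missing  # first frame index that is occluded
    result = []
    for f, frame in enumerate(frames):
        if f < cutoff:
            result.append(list(frame))  # earlier frames stay intact
        else:
            result.append([
                (0.0, 0.0) if keypoints is None or k in keypoints else xy
                for k, xy in enumerate(frame)
            ])
    return result


def build_subsets(frames, missing_counts=(10, 20, 30, 40), keypoints=None):
    """Build the ground-truth copy (key 0) plus one subset per count."""
    subsets = {0: [list(frame) for frame in frames]}
    for n in missing_counts:
        subsets[n] = drop_trailing_frames(frames, n, keypoints)
    return subsets
```

For a 50-frame video, the subset with 40 missing frames keeps only frames 0 to 9 intact, so a tracker has to extrapolate the remaining 40 frames from a short observed history.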
In addition, occlusions were simulated at the body-part level by independently removing grouped keypoints (head, upper body, hips, and legs). This enables analysis of which body region contributes most significantly to Parkinson’s disease classification. Once preprocessed as described, the dataset is ready for the next stage of experimentation.
Figure 7.
Flowchart of occlusion simulation in testing dataset.
4.3. Evaluation Metrics
To evaluate model performance, Accuracy was used as the primary metric, followed by Precision, Recall, and F1-score, where the formulas are as follows:
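The standard definitions of these four metrics, written in terms of the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts, are given here for reference. The notation is an assumption and may differ from the original typesetting.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
\text{F1-score}  &= \frac{2 \times \text{Precision} \times \text{Recall}}
                        {\text{Precision} + \text{Recall}}
\end{align}
```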
A confusion matrix is used to provide a detailed overview of the model's classification performance by comparing the predicted labels with the actual labels. Figure 8 shows a confusion matrix; each element in the matrix represents the number of samples belonging to a prediction–actual combination. In PD classification, a true positive occurs when the model correctly classifies a sample as PD, a true negative occurs when the model correctly classifies a sample as healthy, a false positive occurs when the model incorrectly classifies a healthy sample as PD, and a false negative occurs when the model incorrectly classifies a PD sample as healthy.
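As a small worked sketch, the confusion-matrix counts and the derived metrics can be computed from label lists as follows. The label convention (1 = PD, 0 = healthy), the function names, and the choice to return 0.0 when a denominator is zero are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, TN, FP, FN) with 1 = PD and 0 = healthy (assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # PD predicted as PD
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy as healthy
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy as PD
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # PD as healthy
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from the four counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

In a screening setting, recall is worth reading alongside accuracy, since a false negative corresponds to a PD patient classified as healthy.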
Several performance metrics are applied to assess the effectiveness of the proposed approach. The performance metrics used include Mean Absolute Error (MAE) [36], Mean Squared Error (MSE) [37], and Mean Absolute Percentage Error (MAPE) [38]. The formulas for MAE, MSE, and MAPE are presented as follows:
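The standard forms of these three error measures are given here for reference. The symbols are assumptions: y_i denotes a ground-truth coordinate value, ŷ_i the corresponding recovered value, and n the number of values compared. Whether MAPE was reported as a percentage or as a fraction is not stated in the text.

```latex
\begin{align}
\text{MAE}  &= \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \\
\text{MSE}  &= \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2} \\
\text{MAPE} &= \frac{100\%}{n} \sum_{i=1}^{n}
               \left| \frac{y_i - \hat{y}_i}{y_i} \right|
\end{align}
```

Note that MAPE is undefined wherever a ground-truth value y_i equals zero, so it is only meaningful when computed against the complete (non-occluded) coordinates.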
4.4. Evaluation of PD Classification with Occluded Parts
Table 3 presents the results of PD classification using LSTM under different occlusion scenarios. The complete dataset, without any missing body parts or missing frames, achieved the highest accuracy of 0.8913 and an F1-score of 0.8387. From the overall comparison, leg and hip keypoints emerge as the most critical for accurate classification. In contrast, classification performance remains moderate when head and body keypoints are missing, which suggests that the model is less sensitive to data loss in these regions.
The results also show that the effect of missing keypoints varies across different body regions. As shown in Table 3, an increasing number of missing frames consistently reduces classification accuracy across all groups. In the head region, the accuracy decreases from 0.8587 (10 missing frames) to 0.7609 (40 missing frames). The body region shows a decline from 0.8370 to 0.5870, while the hips region drops from 0.8478 to 0.5870. The leg region shows the largest decline, from 0.8043 to 0.5217, as the number of missing frames increases from 10 to 40.
This downward trend, moving from head to legs, highlights the increasing importance of lower-body keypoints in PD classification. Since PD is strongly linked to gait and movement abnormalities, missing information in the leg region significantly undermines the model’s ability to distinguish PD patients from healthy controls. Thus, ensuring reliable recovery of leg and hip keypoints is essential for maintaining classification performance.
4.5. Missing Keypoints Recovery Ability
Table 4 shows the performance of the proposed method on an incomplete dataset with 40 missing frames and 70 LSTM hidden units across different body parts. The results show that the method remains effective even when a large portion of the data is unavailable. Among all body regions, the head exhibits the highest error rate, while the legs have the lowest error rate, indicating stronger recovery ability in the lower body.
Table 5 shows the PD classification performance of the proposed method on an incomplete dataset with 40 missing frames and 70 LSTM hidden units across different body parts, and Figure 9 shows the loss and accuracy curves of PD classification. The results show that classification performance is restored once the missing keypoints are recovered by the proposed method. For the body, hips, and leg regions, the recovered classification performance matches that obtained without occlusion.
Table 6 further evaluates the method under varying numbers of missing frames (10, 20, 30, and 40) for each body part. The error increases progressively as the number of missing frames grows. The head region shows the most significant performance drop, highlighting its vulnerability to missing information. In contrast, the legs and hips maintain relatively low errors even with 40 missing frames, demonstrating greater resilience to missing data.
Table 7 investigates the impact of different LSTM hidden unit sizes (30, 50, and 70) on recovery performance. Interestingly, the results reveal that the optimal performance does not always align with the largest LSTM size. For example, the legs achieve their lowest errors with 50 hidden units, while the hips perform best with 30 units. Across all configurations, the legs and hips consistently exhibit the highest predictability, whereas the head and body show greater sensitivity to model architecture. Collectively, these findings suggest that the proposed method effectively recovers missing keypoints, particularly in the hips and legs, which are the most crucial for PD-related movement analysis.
Table 8 shows the performance of the proposed method on an incomplete dataset with 40 missing frames and 50 LSTM hidden units under different levels of occlusion across different body parts. Performance declines as more body parts are occluded, and the case in which all body keypoints are occluded represents the worst-case occlusion scenario.
4.6. Comparison with Other Methods
Table 9 presents a comparison of six baseline models, namely Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU), Recurrent Neural Network (RNN), Temporal Convolutional Network (TCN), Fusion of 2D Keypoint and GEI, and Spatio-Temporal Graph Convolutional Network (STGCN), against the proposed RecovGait framework in classifying PD, and Figure 10 shows the confusion matrices of all the methods. These models were selected because they capture key temporal and spatial dependencies in gait data and have been widely applied in recent studies on human activity recognition and PD detection. Unlike RecovGait, these baseline models rely solely on the classification stage and do not incorporate initialization or movement tracking features, making their performance more vulnerable to degradation when the input data contain missing keypoints. In contrast, RecovGait integrates gated initialization and unscented tracking, enabling more stable performance under incomplete data conditions. All models were trained using the complete dataset and subsequently evaluated on datasets with missing keypoints.
The results show that RecovGait achieves an accuracy of 0.8804, whereas the best-performing baseline model attains only 0.6957. This substantial performance gap highlights the effectiveness of RecovGait in overcoming the negative impact of missing data on classification accuracy. Overall, the comparison underlines the importance of integrating recovery mechanisms into the model pipeline, showing that RecovGait not only improves robustness but also provides a more reliable solution for real-world PD classification tasks where missing data are inevitable.
Figure 10.
Confusion matrix of PD classification using RecovGait compared with other methods. (a) Confusion matrix of CNN. (b) Confusion matrix of GRU. (c) Confusion matrix of RNN. (d) Confusion matrix of TCN. (e) Confusion matrix of fusion of 2D Keypoint and GEI. (f) Confusion matrix of STGCN. (g) Confusion matrix of RecovGait.
4.8. Ablation Study
To understand the contribution of each component in the proposed method, an ablation study was conducted, as summarized in Table 10 and Table 11. The experiments were evaluated under four configurations: (i) without any recovery techniques, (ii) using only the unscented tracking method, (iii) using only the gated initialization model, and (iv) the proposed method combining unscented tracking with gated initialization. The evaluation was performed on sequences with the last 40 frames missing for each body part. For both the gated initialization model and the proposed method, a lightweight gated initialization model with 50 hidden units was employed to ensure a fair comparison.
As shown in Table 10, the proposed method consistently outperforms the other configurations across all error metrics (MAE, MSE, and MAPE). This demonstrates that the integration of unscented tracking with gated initialization enables the model to achieve accuracy levels comparable to more complex, higher-capacity architectures, while maintaining computational efficiency.
It is also observed that legs and hips exhibit lower recovery errors compared to head and body. The improvements for leg keypoints are particularly significant and visually more noticeable. This is likely due to the dynamic and periodic nature of leg movements in gait, which makes them inherently more predictable and well-suited for sequential modeling.
Figure 11.
Visualization of missing keypoints recovery using RecovGait.
As shown in Table 11, missing body-part keypoints severely degrade PD classification performance. Applying any of the recovery techniques resolves this issue, bringing the classification performance back to a robust level. This indicates that all of the recovery methods restore the keypoints well enough to enable correct high-level classification.