Machine Learning Assessment of Parkinson’s Disease Using a Novel Free-Living Egg-Beating Motor Task

Polvorinos-Fernández, Carlos; Sigcha, Luis; Valero, Mayca Marín; Grande, Miriam; de Arcas, Guillermo; Pavón, Ignacio

doi:10.3390/technologies14060345

Open Access Article


by Carlos Polvorinos-Fernández ¹,*, Luis Sigcha ², Mayca Marín Valero ³, Miriam Grande ³, Guillermo de Arcas ¹ and Ignacio Pavón ¹

¹ Department of Mechanical Engineering, Instrumentation and Applied Acoustics Research Group (I2A2), Escuela Técnica Superior de Ingenieros Industriales, Universidad Politécnica de Madrid, 28006 Madrid, Spain

² ALGORITMI Research Center, School of Engineering, University of Minho, 4800-058 Guimaraes, Portugal

³ Asociación Parkinson Madrid, 28014 Madrid, Spain

* Author to whom correspondence should be addressed.

Technologies 2026, 14(6), 345; https://doi.org/10.3390/technologies14060345

Submission received: 10 April 2026 / Revised: 29 May 2026 / Accepted: 8 June 2026 / Published: 9 June 2026

(This article belongs to the Special Issue Advances in Biomedical Engineering and Artificial Intelligence for Neurological Health)


Abstract

Assessing motor symptoms in Parkinson’s disease (PD) is challenging due to the progressive evolution of the condition and the variability of symptoms, which are not fully captured by periodic clinical visits. In this context, wearable sensors and machine learning (ML) have emerged as a viable path toward objective and continuous monitoring, although achieving robust generalization to free-living conditions remains a challenge. This work explores the egg-beating task, a simple everyday activity, as a digital approach for PD motor assessment using smartwatch-based inertial measurements and ML techniques. Twenty-two individuals with PD and sixteen healthy controls (HC) completed a one-minute egg-beating task while wearing a smartwatch equipped with tri-axial accelerometer and gyroscope sensors. Data were recorded both under supervised clinical conditions and during unsupervised home sessions. Time- and frequency-domain features were extracted from the inertial signals, and models trained exclusively on supervised recordings were then tested on supervised, unsupervised, and combined data. PD participants showed systematically lower movement amplitude, slower oscillation frequency, and a progressive drop in signal energy over the course of the task, all of which align with the characteristic features of bradykinesia. The support vector machine achieved the best overall performance, reaching 90% accuracy in distinguishing PD from healthy controls under supervised conditions, with a reduction of less than 4% when applied to unsupervised data. These results support the egg-beating task as a practical and ecologically valid method for real-world motor assessment, with potential for future use in remote monitoring and longitudinal assessment.

Keywords:

Parkinson’s disease; wearable devices; machine learning; diagnosis; inertial sensors


1. Introduction

Parkinson’s disease (PD) is a chronic, progressive neurodegenerative condition marked by the gradual loss of dopaminergic neurons in the substantia nigra, a brain region essential for the control of voluntary movement [1]. It affects more than 1% of individuals over the age of 60 and ranks as the second most common neurodegenerative disorder after Alzheimer’s disease [2]. As the global population ages, the burden of PD is expected to grow substantially, with prevalence estimates rising from around 12 million cases in 2021 to nearly 25 million by 2050 [3]. This expected increase draws attention to the growing public health impact of PD and reinforces the need for effective approaches to diagnosis, treatment, and long-term care [2].

Clinically, PD is associated with a wide range of symptoms that are generally divided into motor and non-motor domains [4]. Motor manifestations include resting tremor, muscular rigidity, bradykinesia (slowness of movement), and postural instability, all of which markedly compromise mobility and daily functioning [5]. Non-motor symptoms, which may precede the onset of motor signs and often exert a considerable impact on quality of life, encompass a broad spectrum of features, including cognitive decline, anxiety, autonomic dysfunction, and sensory disturbances [6]. The complex and multifaceted nature of PD highlights the importance of a comprehensive and multidisciplinary approach to patient management [7].

The diagnosis of PD remains primarily clinical, based on the evaluation of both patient-reported and observed manifestations. At present, no single definitive test exists that can confirm PD with complete certainty during a patient’s lifetime [8]. Accordingly, diagnosis relies on neurological examination, detailed clinical history, and the systematic exclusion of alternative neurological or systemic conditions presenting with similar features [9].

To improve diagnostic accuracy, several structured criteria have been developed over recent decades [10]. Among the most widely used are those proposed by the UK Parkinson’s Disease Society Brain Bank (UKPDSBB) [11], as well as the revised criteria introduced by the Movement Disorder Society (MDS) [12].

Both the UKPDSBB and MDS frameworks identify bradykinesia as a fundamental feature. In addition, each requires the presence of at least one other cardinal motor sign: resting tremor or muscular rigidity under both frameworks, with postural instability also accepted under the UKPDSBB criteria. The diagnostic accuracy of these criteria has been reported to range from approximately 65% to 90%, depending on the stage of the disease and the level of clinical expertise [8].

Wearables and Machine Learning for Motor Assessment

In recent years, the subjective nature of clinical PD assessment has driven growing interest in the use of wearable sensors for the continuous and objective monitoring of motor symptoms. Devices incorporating accelerometers, gyroscopes, and other inertial sensors are capable of capturing movement-related signals in both clinical and real-world environments, thereby offering a more representative view of a patient’s motor status in daily life [13].

Machine learning (ML) has proven useful for making sense of the large amounts of data that wearable sensors produce, identifying patterns that would be difficult to detect through conventional analysis. Unlike traditional clinical evaluation, which relies heavily on the examiner’s judgement, ML models can pick up movement patterns that separate PD patients from healthy controls (HC) even when data is collected under varying conditions. This is consistent with recent reviews noting that ML is particularly well suited to pulling clinically useful information out of multimodal sensor data [14].

Among the different types of wearable technologies, smartwatches have become some of the most widely used devices in the context of PD diagnosis and management. Equipped with inertial measurement units (IMUs), these devices allow for continuous and unobtrusive monitoring of both motor and physiological activity, making them particularly well suited for integration with ML-based approaches. Table 1 provides a summary of prior studies in which IMUs have been employed to assess motor symptoms associated with PD.

To enhance the accuracy and clinical relevance of PD diagnostic approaches, it is essential to perform assessments in both controlled clinical environments and free-living conditions. In addition, effective free-living evaluation for PD diagnosis requires the identification of activities that best capture the heterogeneous manifestations of the disease. Ruiz-Vitte et al. [24] examined a set of ten daily activities, including buttoning a button, cutting with a knife, and eating or drinking with utensils, obtaining precision values between 71% and 94% using a Support Vector Machine (SVM). Likewise, Li et al. [25] introduced twelve activities, such as standing up or picking up objects, reporting performance values ranging from 66% to 76% with k-Nearest Neighbours (kNN), SVM, and XGBoost models. More recently, Wang et al. [26] proposed a wearable machine learning framework for assessing PD severity from short daily activities using feature engineering and supervised classifiers, achieving up to 84.7% accuracy and area under the ROC curve (AUC-ROC) values above 0.90 by combining representative tasks such as walking, rising from a chair, and drinking.

Since bradykinesia is a mandatory criterion under current PD diagnostic guidelines, finding tasks that reliably elicit and capture it is particularly important. Bradykinesia is defined by slowness of movement or a progressive reduction in movement amplitude, often accompanied by increasing hesitations or pauses as the movement progresses [27]. Consequently, identifying everyday repetitive activities that allow the assessment of bradykinesia without modifying its natural expression is of particular relevance [28].

Tasks designed for clinical use often bear little resemblance to what patients actually do at home, which creates a mismatch when models trained in the lab are deployed in real-world settings. This gap tends to hurt model performance in practice and limits how useful these systems can actually be. As a result, there is a need for natural, repeatable daily activities that maintain task consistency while reflecting real-world behavior, thereby supporting robust ML-based motor assessment. This issue corresponds to a fundamental challenge in machine learning for digital health, namely generalization under distribution shift with limited labelled data.

For this reason, the primary aim of this work was to assess participants’ condition while performing a simple, reproducible daily-life task consisting of beating eggs for one minute. This activity was selected due to its ecological validity, minimal learning requirements, and its ability to elicit fine motor movements. The task involves repetitive flexion–extension movements of the wrist together with rotational coordination, requiring a sustained range of motion and rhythmic consistency, both of which are commonly affected in individuals with bradykinesia. Furthermore, as egg beating is also used as an occupational therapy exercise for patients, its evaluation not only enables the quantification of motor impairment but may also provide insights into its potential therapeutic relevance.

To our knowledge, this is the first study that investigates a simple, ecologically valid daily-life activity, such as the egg-beating task, for PD assessment under both supervised and unsupervised free-living conditions, explicitly addressing the challenge of machine learning generalization under domain shift. This work contributes: (1) an ML-oriented evaluation of generalization under domain shift between supervised clinical and unsupervised free-living conditions using participant-level splits; (2) a novel free-living egg-beating motor task that captures temporal degradation patterns associated with bradykinesia and is specifically designed for ML-based assessment; and (3) an open IMU-based dataset to support reproducible ML research on real-world motor assessment. The rest of this paper is organized as follows: Section 2 presents the methods used for data collection and analysis, Section 3 presents the results, Section 4 discusses the results, and, finally, Section 5 provides the conclusions.

2. Materials and Methods

2.1. Study Design

This study was a non-randomized observational study with two parallel groups defined by participants’ condition: individuals with PD and HC. The inclusion and exclusion criteria used for recruitment are described in the study protocol (NCT06817772) [29]. All PD patients were diagnosed in accordance with the UKPDSBB criteria, were in the early stages of the disease (Hoehn and Yahr scale [30] stage ≤ 2.5), and exhibited no motor symptom fluctuations.

The study was conducted over a total duration of eight consecutive days. On Day 1, participants performed the egg beating activity under supervised conditions in a research facility. Subsequently, for the following seven days (Days 2–8), participants were instructed to perform the same task once per day in their home environment in unsupervised contexts. On the final day of the study (Day 8), participants returned to the clinic and repeated the activity under supervised conditions. Figure 1 presents an overview of the experimental protocol.

The same protocol was followed for both the PD and HC groups. Participants carried out the egg-beating task continuously for one minute at a pace of their own choosing, while being asked to preserve a consistent performance across the eight-day period. Before data collection began, all participants attended a training session where the task was explained and demonstrated to make sure everyone understood what was expected of them. This was done to reduce variability that could stem from participants interpreting or performing the task differently.

To limit the influence of medication timing on the results, all participants were asked to perform the task at the same time each day throughout the home sessions. In addition, potential sources of bias related to external conditions were controlled by providing all participants with identical materials. Specifically, each participant used the same type of container for the task, an identical fork, and eggs obtained from the same commercial brand and batch. Using identical equipment across all sessions made the data easier to compare and helped ensure the results could be reproduced.

2.2. Data Acquisition Device

A custom-built smartwatch was worn on the dominant wrist by all participants during every data collection session, as shown in Figure 2. At the core of the device is the LSM6DSO sensor (STMicroelectronics, Geneva, Switzerland), which combines a tri-axial accelerometer and a tri-axial gyroscope within a single microelectromechanical systems (MEMS) unit. The accelerometer operates with a full-scale range of ±78.45 m/s² and a sensitivity of 0.00239 m/s², while the gyroscope captures angular velocity over a full-scale range of ±17.45 rad/s with a resolution of 0.00122 rad/s.
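For orientation, the full-scale ranges above can be converted into the units in which MEMS inertial sensors are commonly specified, using standard gravity g = 9.80665 m/s² and 1 rad = 180°/π:

```latex
\frac{78.45\ \mathrm{m/s^{2}}}{9.80665\ \mathrm{m/s^{2}}} \approx 8.0\ g,
\qquad
17.45\ \mathrm{rad/s} \times \frac{180^{\circ}}{\pi\ \mathrm{rad}} \approx 1000\ ^{\circ}/\mathrm{s}
```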

A custom application was developed for the smartwatch to capture accelerometer and gyroscope signals simultaneously at 50 Hz, which is sufficient to resolve the fine wrist movements involved in egg beating [31]. The interface of the development app is shown in Figure 3. The application operates in a locked mode, so participants could not accidentally navigate away from the recording screen during a session. To support participant adherence, the interface also displays standard watch information, such as the current time, date, and remaining battery level. Throughout each session, the recorded signals were stored locally in the device’s internal memory and later retrieved at the end of the study for preprocessing and subsequent analysis.

2.3. Data Collection

The dataset was obtained from a cohort consisting of 22 patients diagnosed with PD and 16 participants classified as healthy controls (HC). Across all sessions, 57.9% of the recordings came from PD participants and 42.1% from HC, giving a reasonably balanced dataset. For transparency and reproducibility purposes, the complete dataset has been made publicly accessible [32]. To track potential medication effects, participants kept a daily written log of their medication intake throughout the study. The logs showed consistent adherence across sessions, which is especially relevant here given that all PD participants were in early stages of the disease, where medication-related motor fluctuations are typically not a major factor.

Table 2 summarizes the demographic characteristics of the participants included in the study.

Figure 4 presents an example of a data collection session, showing a patient with PD performing the egg-beating task while wearing the smartwatch.

2.4. Signal Processing and Machine Learning Models

An overview of the workflow followed in this study is presented in Figure 5. The signal processing stage was carried out in MATLAB 2025a, whereas all machine learning procedures were implemented in Python 3.12. The ML pipeline in Python was constructed using several libraries, including scikit-learn 1.6.1 for the development and evaluation of the models, numpy 2.0.2 and scipy 1.16.3 for numerical operations and signal processing tasks, and matplotlib 3.10.0 for the generation of visualisations.

Before proceeding with data processing, the quality of the unsupervised home recordings was first assessed through a participant-wise visual inspection of the inertial signals. In this step, recordings obtained during home sessions were compared against the corresponding supervised baseline for each individual. As the task was standardized and consistently performed by the same participant, similar signal patterns were expected across sessions. In addition to visual inspection, quantitative criteria based on signal similarity and energy thresholds were considered to ensure consistent exclusion of non-compliant recordings. Any recordings that exhibited marked discrepancies relative to the supervised baseline, suggesting possible non-compliance with the task or inconsistencies in execution, were removed from the analysis. Sessions that were not available were treated as missing data and excluded accordingly.
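The paper does not specify the similarity metric or the energy thresholds used in this screening. The sketch below is a hypothetical illustration only: it assumes cosine similarity between magnitude spectra as the similarity measure and an RMS ratio against the supervised baseline as the energy criterion. The function name `passes_quality_check` and both default thresholds are illustrative placeholders, not values from the study.

```python
import numpy as np


def _normalised_spectrum(x: np.ndarray, n: int) -> np.ndarray:
    """Unit-norm one-sided magnitude spectrum of a mean-removed signal."""
    spec = np.abs(np.fft.rfft(x - np.mean(x), n=n))
    norm = np.linalg.norm(spec)
    return spec / norm if norm > 0 else spec


def passes_quality_check(home: np.ndarray, baseline: np.ndarray,
                         min_similarity: float = 0.6,
                         energy_bounds: tuple = (0.5, 2.0)) -> bool:
    """Hypothetical screen of a home recording against the supervised baseline.

    Both inputs are 1-D signals (e.g. acceleration magnitude). The similarity
    measure and both thresholds are illustrative placeholders, not study values.
    """
    n = max(len(home), len(baseline))
    # Spectral similarity: cosine similarity of magnitude spectra (range 0..1)
    similarity = float(np.dot(_normalised_spectrum(home, n),
                              _normalised_spectrum(baseline, n)))
    # Energy criterion: RMS of the home recording relative to the baseline
    rms_home = np.sqrt(np.mean((home - np.mean(home)) ** 2))
    rms_base = np.sqrt(np.mean((baseline - np.mean(baseline)) ** 2))
    if rms_base == 0:
        return False
    ratio = rms_home / rms_base
    low, high = energy_bounds
    return bool(similarity >= min_similarity and low <= ratio <= high)
```

A recording that reproduces the baseline rhythm at a similar intensity passes, whereas a silent recording or one with a very different energy level is flagged for exclusion.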

Following this quality control step, each session was labelled according to its acquisition context, distinguishing between supervised clinical recordings and unsupervised free-living conditions, as well as the corresponding participant group (PD or HC).

The inertial signals were subsequently processed using a third-order Butterworth high-pass filter with a cutoff frequency of 0.5 Hz. This filtering stage was applied to attenuate low-frequency components primarily associated with gravitational acceleration, thereby preserving the frequency range more representative of voluntary human movement [33].
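A minimal SciPy sketch of this filtering step, assuming the 50 Hz sampling rate from Section 2.2. Using zero-phase forward-backward filtering (`sosfiltfilt`) is an assumption, since the paper does not state whether the filter was applied causally or in both directions; the helper name `highpass` is illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_HZ = 50.0  # sampling rate reported in Section 2.2


def highpass(signal: np.ndarray, fs: float = FS_HZ, cutoff: float = 0.5,
             order: int = 3) -> np.ndarray:
    """Third-order Butterworth high-pass filter along the time axis.

    `signal` has shape (n_samples,) or (n_samples, n_axes). Zero-phase
    filtering is an assumption, not a detail confirmed by the paper.
    """
    # Second-order sections are numerically more robust than (b, a) form
    sos = butter(order, cutoff, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal, axis=0)
```

Applied to a gravity offset plus a 4 Hz oscillation, the constant component is removed while the movement-band oscillation passes essentially unchanged.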

A qualitative assessment of the filtered signals was then conducted through visual inspection of the time-domain waveforms and the corresponding scalograms for both accelerometer and gyroscope data. This step was motivated by the expectation that bradykinesia-related characteristics, such as progressive reductions in movement amplitude and alterations in rhythmic patterns, would be more readily observable in the time–frequency domain than when relying solely on aggregated statistical features.

After filtering, the signals were divided into overlapping segments using a sliding window approach, a commonly adopted method in time-series analysis and classification. Since bradykinesia is characterized by a gradual deterioration in motor performance, adequately capturing its temporal dynamics requires windows of sufficient length. Accordingly, three window sizes (128, 256, and 512 samples) and four overlap ratios (0%, 25%, 50%, and 75%) were systematically evaluated to determine the configuration that most effectively represents the motor patterns associated with the egg-beating task. The final selection was based on the average classification performance across all four ML models, rather than optimization for a single classifier, with the aim of achieving a more robust and generalizable outcome.
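The segmentation can be sketched as follows. Dropping trailing samples that do not fill a complete window is an assumption, because the paper does not describe how partial windows were handled; `sliding_windows` and `CONFIGS` are illustrative names.

```python
import numpy as np


def sliding_windows(signal: np.ndarray, window: int, overlap: float) -> np.ndarray:
    """Segment a (n_samples, n_channels) signal into overlapping windows.

    Returns shape (n_windows, window, n_channels). Trailing samples that do not
    fill a complete window are dropped (an assumption).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(round(window * (1.0 - overlap))))
    if len(signal) < window:
        return np.empty((0, window) + signal.shape[1:], dtype=signal.dtype)
    n_windows = 1 + (len(signal) - window) // step
    return np.stack([signal[i * step:i * step + window] for i in range(n_windows)])


# The twelve window/overlap configurations evaluated in the study
CONFIGS = [(w, o) for w in (128, 256, 512) for o in (0.0, 0.25, 0.5, 0.75)]
```

For a one-minute recording at 50 Hz (about 3000 samples), a 512-sample window spans 10.24 s; with 50% overlap it advances 5.12 s per step and yields 10 windows, compared with 90 windows for 128 samples at 75% overlap.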

For each combination of window size and overlap, a comprehensive set of features was extracted to characterize the temporal properties associated with bradykinesia. These features comprised statistical descriptors from both the time and frequency domains, including the mean, standard deviation, entropy, spectral centroid, skewness, and kurtosis. Feature extraction was performed independently for each sensor axis, as well as for the resultant signal obtained by combining the three axes (the Euclidean norm, referred to in this work as the root mean square (RMS) signal), as defined in Equation (1) for the accelerometer and Equation (2) for the gyroscope.

a = \sqrt{{a_{x}}^{2} + {a_{y}}^{2} + {a_{z}}^{2}}

(1)

ω = \sqrt{{ω_{x}}^{2} + {ω_{y}}^{2} + {ω_{z}}^{2}}

(2)

A general description of the features is provided in Table 3. In total, 90 features were computed per sensor. Both accelerometer and gyroscope data were utilized, which resulted in a combined total of 180 features per window.
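The sketch below computes the six descriptors named above for each axis and for the resultant magnitude of Equations (1) and (2). It covers only a subset of the 90 features per sensor listed in Table 3, and defining entropy as the Shannon entropy of the normalised power spectrum is an assumption; the paper does not give its exact entropy definition. `window_features` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import entropy, kurtosis, skew

FS_HZ = 50.0


def window_features(window: np.ndarray, fs: float = FS_HZ) -> dict:
    """Six illustrative features per channel for one (n_samples, 3) window.

    Channels are the three axes plus the resultant magnitude sqrt(x^2+y^2+z^2).
    """
    magnitude = np.sqrt(np.sum(window ** 2, axis=1))
    channels = {"x": window[:, 0], "y": window[:, 1], "z": window[:, 2],
                "mag": magnitude}
    freqs = np.fft.rfftfreq(window.shape[0], d=1.0 / fs)
    feats = {}
    for name, sig in channels.items():
        power = np.abs(np.fft.rfft(sig - sig.mean())) ** 2
        total = power.sum()
        feats[f"{name}_mean"] = float(np.mean(sig))
        feats[f"{name}_std"] = float(np.std(sig))
        feats[f"{name}_skewness"] = float(skew(sig))
        feats[f"{name}_kurtosis"] = float(kurtosis(sig))
        # Spectral centroid: power-weighted mean frequency in Hz
        feats[f"{name}_spectral_centroid"] = (
            float((freqs * power).sum() / total) if total > 0 else 0.0)
        # Assumed definition: Shannon entropy of the normalised power spectrum
        feats[f"{name}_spectral_entropy"] = (
            float(entropy(power)) if total > 0 else 0.0)
    return feats
```

For a pure tone, the spectral centroid equals the tone frequency and the spectral entropy is close to zero; broadband or irregular movement raises the entropy.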

Due to the high dimensionality of the resulting feature matrix, a feature selection (dimensionality reduction) stage was introduced prior to model training. To uncover redundancy among variables, a correlation-based approach was adopted, relying on the Pearson correlation coefficient as defined in Equation (3). Feature pairs with an absolute correlation of at least 0.85 (|r| ≥ 0.85) were considered to carry overlapping information, and in such cases only one feature from each pair was retained. This operation was performed only on the training data to avoid any risk of information leakage into the test set. The choice of a correlation-based filter was motivated by its model-agnostic nature, which reduces redundancy while preserving the interpretability of the feature space.

r = \frac{\sum (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum {(x_{i} - \bar{x})}^{2}} \cdot \sqrt{\sum {(y_{i} - \bar{y})}^{2}}}

(3)
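One way to implement this filter is a greedy pass over the columns of the training matrix, as sketched below. Visiting features in column order and keeping the first member of each correlated pair is an assumption, since the paper does not state which feature of a pair was retained; `select_uncorrelated` is a hypothetical name.

```python
import numpy as np


def select_uncorrelated(X_train: np.ndarray, threshold: float = 0.85) -> np.ndarray:
    """Greedy Pearson-correlation filter fitted on training windows only.

    A feature is kept only if its absolute correlation with every
    already-kept feature is below `threshold`. Returns kept column indices.
    """
    corr = np.abs(np.corrcoef(X_train, rowvar=False))
    corr = np.nan_to_num(corr)  # constant columns yield NaN correlations
    kept = []
    for j in range(X_train.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return np.array(kept, dtype=int)
```

The returned indices are then used to subset both matrices (for example `X_test[:, kept]`), so the test data never influence which features are dropped.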

The dataset was divided into two subsets, allocating 70% for model training and 30% for testing, using a random split stratified by class so that the proportion of PD and HC participants was preserved in both subsets. To obtain stable performance estimates and reduce dependence on a single data partition, this procedure was repeated 10 times with different random seeds. The reported metrics correspond to the mean values across these 10 iterations.

The division was carried out at the participant level rather than at the window level, thereby avoiding subject-specific information leakage and supporting a more realistic assessment of model generalization. A participant-level hold-out strategy was selected in preference to cross-validation to ensure a clear separation between supervised training data and free-living evaluation data, in line with the objective of evaluating real-world performance under domain shift. Moreover, the training subset was composed exclusively of recordings obtained under supervised conditions, where task execution was verified. In contrast, reflecting the focus of this work on generalization under domain shift with limited labelled data, the test set included recordings from participants under both supervised and unsupervised conditions, enabling the comparison of performance across different contexts.
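A minimal scikit-learn sketch of this outer split, assuming one label per participant: participant IDs (not windows) are split with stratification, then windows are assigned so that training uses only supervised recordings while the held-out participants contribute both contexts. The helper names and the `"supervised"` context string are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def participant_split(participant_ids, participant_labels, seed, test_size=0.3):
    """Stratified 70/30 split of participant IDs (not windows)."""
    train_ids, test_ids = train_test_split(
        np.asarray(participant_ids), test_size=test_size,
        stratify=np.asarray(participant_labels), random_state=seed)
    return set(train_ids.tolist()), set(test_ids.tolist())


def window_masks(window_pids, window_context, train_ids, test_ids):
    """Boolean masks over windows for the outer train and test sets.

    Training keeps only supervised windows of training participants; the
    test set keeps every window (supervised and unsupervised) of held-out ones.
    """
    window_pids = np.asarray(window_pids)
    window_context = np.asarray(window_context)
    train_mask = (np.isin(window_pids, list(train_ids))
                  & (window_context == "supervised"))
    test_mask = np.isin(window_pids, list(test_ids))
    return train_mask, test_mask
```

With 22 PD and 16 HC participants, each of the 10 seeds places 12 participants in the test set and 26 in the training set, with no participant appearing in both.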

Four ML classification algorithms were considered in this study: kNN, SVM, random forest (RF), and logistic regression (LR). For each model, hyperparameters were tuned to improve predictive performance and computational efficiency. A random search approach was adopted, allowing a wide range of parameter configurations to be explored and the best-performing configuration to be identified for each model [34]. This procedure involved 10 random iterations per model, and the final configuration was selected based on the average performance across multiple evaluation metrics, including accuracy, precision, recall, specificity, and F1-score.

Hyperparameter selection was performed exclusively within the training partition, with the test set held out entirely and used only for final performance evaluation. For each candidate configuration sampled by the randomized search, model performance was estimated using stratified 5-fold cross-validation grouped by participant identity. All windows belonging to the same participant were therefore assigned to the same fold and never split across training and validation subsets within the inner loop, mirroring the participant-level grouping applied in the outer train–test split. This internal cross-validation was carried out independently within each of the 10 outer participant-level train–test splits.

For the RF classifier, the tuning process covered a set of hyperparameters commonly adopted in IMU-based classification problems. Specifically, the number of decision trees was set to 50, 100, and 200, allowing the effect of ensemble size to be assessed. Tree complexity was controlled by testing both unconstrained growth and maximum depths fixed at 5 and 10 levels. In addition, the minimum number of samples required to perform a split and to form a leaf node were varied, using thresholds of 2 and 5 for splitting, and 1 and 2 for leaf nodes, respectively.

Regarding the kNN model, the neighbourhood size was explored using values of 3, 5, 7, and 9. Two weighting strategies were evaluated: uniform weighting and distance-based weighting. In parallel, two distance measures—Manhattan and Euclidean—were considered. Given the dimensionality of the feature space after selection, weighting neighbours according to distance was expected to better capture local structure, as closer observations are typically more informative in high-dimensional sparse settings.

For the SVM classifier, the regularisation parameter C was tested at 0.1, 1, and 10. To assess the nature of the decision boundary, three kernel types were examined: linear, radial basis function, and polynomial. This allowed the evaluation of whether nonlinear mappings were necessary to separate the classes effectively. The best overall performance of the SVM (Section 3.2) indicates that the features derived from the egg-beating task separate the two groups well, which aligns with the structured and repetitive characteristics of the activity; whether this separation is largely linear depends on the kernel retained during tuning.

In the case of the LR model, the regularisation strength C was also varied across 0.1, 1, and 10. Both L1 and L2 regularisation schemes were investigated to analyse their impact on model sparsity and generalisation capability. The maximum number of iterations was set to either 100 or 300 to ensure convergence under all tested configurations. Particular attention was given to L1 regularisation, as it tends to yield sparse solutions, which is advantageous in settings with a relatively large number of features and may improve interpretability in clinical applications.
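The search spaces described above can be transcribed as scikit-learn estimators and parameter lists. Choosing `solver="liblinear"` for LR is an assumption needed to make the L1 penalty valid, because scikit-learn's default `lbfgs` solver supports only L2; `SEARCH_SPACES` and `space_size` are hypothetical names.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hyperparameter lists transcribed from the text (estimator, search space).
SEARCH_SPACES = {
    "RF": (RandomForestClassifier(random_state=0), {
        "n_estimators": [50, 100, 200],
        "max_depth": [None, 5, 10],
        "min_samples_split": [2, 5],
        "min_samples_leaf": [1, 2],
    }),
    "kNN": (KNeighborsClassifier(), {
        "n_neighbors": [3, 5, 7, 9],
        "weights": ["uniform", "distance"],
        "metric": ["manhattan", "euclidean"],
    }),
    "SVM": (SVC(), {
        "C": [0.1, 1, 10],
        "kernel": ["linear", "rbf", "poly"],
    }),
    # liblinear is an assumed solver choice: it supports both L1 and L2
    "LR": (LogisticRegression(solver="liblinear"), {
        "C": [0.1, 1, 10],
        "penalty": ["l1", "l2"],
        "max_iter": [100, 300],
    }),
}


def space_size(name: str) -> int:
    """Number of distinct configurations in a model's search space."""
    return len(ParameterGrid(SEARCH_SPACES[name][1]))
```

With 10 random iterations, the RF search visits 10 of its 36 configurations, kNN 10 of 16, and LR 10 of 12, whereas the 9-point SVM space is covered exhaustively. This asymmetry is worth keeping in mind when comparing the tuned models.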

Model performance was assessed using a set of standard evaluation metrics, namely accuracy, precision, recall, specificity, F1-score, and AUC-ROC. Each of the four classifiers was trained and tested across all twelve combinations of window sizes and overlap ratios. The choice of the final segmentation configuration was based on the average performance obtained across the four models, rather than on the outcome of a single classifier, as configurations that demonstrate stable performance across different algorithms are more indicative of robust and meaningful signal patterns than those tailored to a specific model. Once the optimal configuration was identified, the classifier that achieved the highest performance within that setting was selected to carry out all subsequent analyses.
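A sketch of the evaluation metrics and of the segmentation-selection rule, assuming PD is the positive class (label 1). Specificity has no dedicated scikit-learn function, so it is derived from the confusion matrix. The nested `results` structure passed to `select_segmentation` is hypothetical.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)


def binary_metrics(y_true, y_pred, y_score) -> dict:
    """Study metrics with PD as the positive class (label 1)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "auc_roc": roc_auc_score(y_true, y_score),
    }


def select_segmentation(results: dict):
    """Pick the (window, overlap) with the best metric average over all models.

    `results` maps (window, overlap) -> {model_name: metrics_dict}.
    """
    def mean_score(per_model: dict) -> float:
        return float(np.mean([np.mean(list(m.values())) for m in per_model.values()]))

    return max(results, key=lambda cfg: mean_score(results[cfg]))
```

Averaging first within each model and then across models means that no single classifier can dominate the choice of segmentation, which is the robustness argument made above.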

The overall workflow of the proposed methodology is summarized in Algorithm 1, which presents the process in the form of pseudocode.

Algorithm 1 Signal Processing and Machine Learning Workflow.

Perform signal processing (MATLAB) and ML implementation (Python)

for each participant do

Compare home recordings with supervised baseline

Discard inconsistent or missing sessions

end for

Label data by context (supervised/unsupervised) and group (PD/HC)

Apply Butterworth high-pass filter (3rd order, 0.5 Hz)

for each window size and overlap do

Segment signals (sliding window)

Extract time- and frequency-domain features

Compute RMS magnitude (accelerometer, gyroscope)

end for

Split dataset (70% train, 30% test) at participant level

Remove correlated features (|r| ≥ 0.85) using training data only

for each model (kNN, SVM, RF, LR) do

Perform random search (10 iterations)

Evaluate performance (Accuracy, Precision, Recall, Specificity, F1, AUC)

end for

Select optimal segmentation and best-performing model

3. Results

3.1. Signal Analysis

The temporal behavior of the filtered acceleration signal during the egg-beating task is illustrated in Figure 6, where Figure 6a corresponds to a participant with PD and Figure 6b corresponds to a healthy control (HC) participant. The corresponding frequency-domain representations of these recordings are shown in Figure 7. It should be noted that all plots in this section are derived from the same recording for each participant within their respective clinical group.

The most obvious difference between the two groups is the amplitude of the acceleration signals. The PD participant in Figure 6a exhibits motion confined within approximately ±5 m/s², whereas the HC participant in Figure 6b reaches values close to ±10 m/s². This roughly twofold difference in amplitude was seen consistently across PD participants and matches what would be expected from the reduced motor output characteristic of bradykinesia.

Beyond amplitude, the two groups also differ in how movement energy is distributed across the three axes. For HC participants, the signal is predominantly concentrated along the Y and Z axes, which aligns with the expected kinematics of the egg-beating motion. In contrast, PD participants display a more uniform distribution of energy across all three axes, indicating reduced directional consistency in the movement. This observation was not explicitly anticipated prior to data collection and emerged from visual inspection of the signals, suggesting a potential indicator of motor disorganization in PD, possibly related to bradykinesia.

The scalograms highlight a pronounced contrast in the temporal evolution of movement frequency between the two groups. In the PD participant shown in Figure 7a, the dominant frequency shows a gradual decline over the duration of the one-minute task. This pattern appeared across PD participants and is consistent with what bradykinesia would produce in a sustained repetitive task. In contrast, the HC participant in Figure 7b maintains a steady dominant frequency from start to finish, reflecting consistent and well-coordinated movement throughout. The one-minute task turned out to be long enough to capture this gradual decline, something that would probably go unnoticed in shorter recordings.

The PD participant also shows a drop in the overall magnitude of the acceleration vector over time, pointing to a gradual loss of movement intensity that is often seen in PD as the disease affects motor endurance. In contrast, the HC participant maintains a relatively constant amplitude over time, reflecting both a higher level of endurance and the capacity to preserve movement intensity throughout the task.

The gyroscope signals reinforced the patterns identified in the accelerometer signals, while also adding information about the rotational component of the movement. As illustrated in Figure 8a, the PD participant exhibits a clear and progressive reduction in angular velocity across all axes over the duration of the task, reflecting increasing difficulty in maintaining the wrist rotation required for egg beating. This decline was more pronounced in the gyroscope than in the accelerometer data, which suggests that wrist rotation may be more sensitive to bradykinesia than linear acceleration in this particular task. In contrast, the HC participant shown in Figure 8b preserves a nearly constant angular velocity throughout the one-minute task, indicating a more consistent, coordinated, and sustained motor performance.

The frequency behavior of the gyroscope signal further supports and extends these findings. For the PD participant shown in Figure 9a, both the amplitude and the dominant frequency of angular velocity decrease progressively over the course of the task, with the reduction in frequency appearing more pronounced than that observed in the accelerometer scalogram. In contrast, the HC participant depicted in Figure 9b maintains relatively stable amplitude and frequency throughout the one-minute egg-beating activity, with no evident downward trend.

The trends described, namely the gradual reduction in movement amplitude, the decrease in dominant frequency, and the attenuation of overall signal energy, were consistently observed among PD participants across both accelerometer and gyroscope data. These observations directly shaped the feature extraction stage, where we focused on capturing exactly these kinds of temporal, spectral, and energy-related changes. The next subsection evaluates whether these differences at the signal level are reflected in robust classification performance under both supervised and unsupervised recording conditions.

3.2. ML Analysis

The ML analysis was performed using the 103 features that remained after applying the correlation-based feature selection procedure, corresponding to approximately 57% of the initial set of 180 features. Table 4 reports the mean classification performance of the four evaluated models (kNN, SVM, RF, and LR) across all combinations of window sizes and overlap ratios. The results show a clear trend: larger window sizes generally yield better mean classification performance across all metrics.
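A correlation-based selection step of this kind can be sketched as a greedy filter. The filter keeps a feature only if its absolute Pearson correlation with every previously kept feature stays at or below a threshold. The threshold of 0.9 and the greedy left-to-right column order are assumptions made for illustration. They are not necessarily the criterion used to reduce the 180 features to 103.

```python
import numpy as np

def correlation_filter(X, threshold=0.9):
    """Indices of columns kept by a greedy absolute-Pearson-correlation filter."""
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))   # (n_features, n_features)
    kept = []
    for j in range(X.shape[1]):                   # earlier columns take priority
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])   # column 1 duplicates column 0
kept = correlation_filter(X)
print(kept, f"{len(kept) / X.shape[1]:.0%} of features retained")
```

Constant features produce undefined correlations and would need to be removed beforehand.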

In particular, 512-sample windows achieve the highest accuracy at every overlap setting, surpassing the results obtained with 128- and 256-sample windows. Within that window size, a 50% overlap gave the best results, peaking at 84.2% accuracy with similarly strong scores across the remaining metrics. In comparison, the shorter window sizes (128 and 256 samples) produce lower and relatively uniform performance across overlap ratios, without a clearly superior configuration. For the 512-sample window, the intermediate overlaps (25% and 50%) outperformed both extremes (0% and 75%), although all accuracies within this window size remained within about one percentage point of each other.
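The segmentation schemes compared in Table 4 amount to cutting each recording into fixed-length windows separated by a fixed hop. The sketch below assumes a one-minute recording of 3000 samples with six IMU channels. The 50 Hz sampling rate behind that sample count is assumed purely for illustration. The example shows how window size and overlap change the number of windows obtained per recording.

```python
import numpy as np

def sliding_windows(signal, size=512, overlap=0.5):
    """Split an (n_samples, n_channels) recording into fixed-length overlapping windows.

    Returns an array of shape (n_windows, size, n_channels). Samples at the end that
    do not fill a complete window are discarded.
    """
    signal = np.asarray(signal)
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(round(size * (1 - overlap))))      # hop between window starts
    if len(signal) < size:
        return np.empty((0, size) + signal.shape[1:], dtype=signal.dtype)
    n_windows = 1 + (len(signal) - size) // step
    return np.stack([signal[i * step : i * step + size] for i in range(n_windows)])

# Hypothetical one-minute recording: 3000 samples (50 Hz assumed), 6 IMU channels.
recording = np.arange(3000 * 6, dtype=float).reshape(3000, 6)
for ov in (0.0, 0.25, 0.5, 0.75):
    print(ov, sliding_windows(recording, 512, ov).shape)
```

Under these assumptions, 512-sample windows yield 5, 7, 10 and 20 windows per recording for 0%, 25%, 50% and 75% overlap, respectively. Additional windows do not add independent participants, which is why participant-level splitting matters when these windows are used for training and testing.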

Based on these observations, the 512-sample window combined with a 50% overlap was selected as the optimal segmentation scheme for the subsequent analysis, as it delivered the highest mean performance on every metric in Table 4. Using this configuration, the four ML models were assessed on data collected under supervised conditions, enabling the identification of the top-performing classifier. The results, summarized in Figure 10, indicate that the SVM achieved the best overall performance, with metrics close to 90%, making it the most effective model among those considered. This may indicate that the feature space is largely linearly separable, so that more complex decision boundaries offer little additional benefit. The LR model followed with an approximate performance of 86%, while kNN and RF produced comparable results of around 81%. The error bars included in Figure 10 represent the variability of each reported metric across the 10 independent random train–test splits, reflecting the robustness of each classifier to different data partitions. Overall, these findings highlight the strong discriminative capability of the egg-beating task in distinguishing between PD and HC participants.
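The evaluation protocol behind Figure 10 compares four classifiers over 10 random train–test splits, and can be sketched with scikit-learn as follows. The sketch assumes participant-level splits via GroupShuffleSplit, a 70/30 split ratio, standardized features and default hyperparameters, apart from the 100 RF trees stated in the text. It runs on a synthetic cohort of 22 PD and 16 HC participants, so the printed accuracies say nothing about the real data.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GroupShuffleSplit
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

MODELS = {  # default hyperparameters (assumed); only the RF tree count comes from the text
    "kNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}

def synthetic_cohort(n_pd=22, n_hc=16, windows=10, n_features=20, seed=0):
    """Hypothetical window-level features with a per-participant offset."""
    rng = np.random.default_rng(seed)
    ids = np.repeat(np.arange(n_pd + n_hc), windows)
    y = (ids < n_pd).astype(int)                        # 1 = PD, 0 = HC
    centres = 0.5 * rng.normal(size=(n_pd + n_hc, n_features))
    X = centres[ids] + 0.8 * y[:, None] + rng.normal(size=(ids.size, n_features))
    return X, y, ids

def evaluate_models(X, y, groups, n_splits=10, test_size=0.3, seed=0):
    """Mean and SD of accuracy per model over participant-level random splits."""
    splitter = GroupShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed)
    scores = {name: [] for name in MODELS}
    for train, test in splitter.split(X, y, groups):    # no participant in both sets
        for name, model in MODELS.items():
            clf = make_pipeline(StandardScaler(), clone(model)).fit(X[train], y[train])
            scores[name].append(accuracy_score(y[test], clf.predict(X[test])))
    return {name: (float(np.mean(s)), float(np.std(s))) for name, s in scores.items()}

X, y, ids = synthetic_cohort()
results = evaluate_models(X, y, ids)
for name, (mean, sd) in results.items():
    print(f"{name}: {100 * mean:.1f} ± {100 * sd:.1f}%")
```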

To further assess whether the SVM model captures clinically meaningful patterns rather than participant-specific characteristics, permutation importance was computed across 10 independent random seeds, with 5 repeats per seed, using accuracy as the scoring metric. Figure 11 presents the top 10 features ranked by mean accuracy decrease. The most influential features were Correlation, Hjorth Complexity, and Entropy, all of which align with the progressive amplitude reduction, frequency slowing, and loss of movement consistency observed in the signal analysis section and are physiologically relevant to bradykinesia. Inter-axis gyroscope correlation reflects the loss of directional consistency in wrist movement, Hjorth Complexity captures changes in signal regularity associated with motor degradation, and Entropy quantifies the progressive fragmentation of rhythmic patterns over the course of the one-minute activity. The narrow error bars across seeds indicate that these importance estimates are stable and not dependent on a particular data partition.
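Permutation importance measures how much held-out accuracy drops when a single feature column is shuffled. A minimal sketch using scikit-learn's permutation_importance follows. Three details are assumptions: the participant-level split, the 70/30 ratio, and the use of each seed for both the split and the shuffling. The toy data contain a single informative feature so that the resulting ranking can be checked.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GroupShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def seeded_permutation_importance(X, y, groups, n_seeds=10, n_repeats=5, test_size=0.3):
    """Mean and SD (across seeds) of the accuracy drop caused by shuffling each feature."""
    per_seed = []
    for seed in range(n_seeds):
        splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
        train, test = next(splitter.split(X, y, groups))
        model = make_pipeline(StandardScaler(), SVC()).fit(X[train], y[train])
        result = permutation_importance(model, X[test], y[test], scoring="accuracy",
                                        n_repeats=n_repeats, random_state=seed)
        per_seed.append(result.importances_mean)        # averaged over the repeats
    per_seed = np.asarray(per_seed)                     # shape (n_seeds, n_features)
    return per_seed.mean(axis=0), per_seed.std(axis=0)

# Toy data: 40 participants x 10 windows, only feature 0 separates the classes.
rng = np.random.default_rng(0)
ids = np.repeat(np.arange(40), 10)
y = (ids % 2).astype(int)
X = rng.normal(size=(ids.size, 5))
X[:, 0] += 2.0 * y
mean_imp, std_imp = seeded_permutation_importance(X, y, ids)
ranking = np.argsort(mean_imp)[::-1]                    # most important feature first
print(ranking, np.round(mean_imp, 3))
```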

Beyond the SVM, the robustness of the remaining classifiers was verified through convergence diagnostics provided as Supplementary Materials. The out-of-bag error curve for the Random Forest classifier (Supplementary Figure S1) decreases sharply up to 80 trees and stabilizes thereafter, confirming ensemble convergence at the 100-tree configuration used in the study. The training log-loss for the Logistic Regression model (Supplementary Figure S2) converges in fewer than 100 iterations under L2 regularization, well below the maximum number of iterations employed, confirming that the iterative optimization reaches a stable solution.
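An out-of-bag error curve like the one in Supplementary Figure S1 can be produced by growing a single forest incrementally with warm_start and reading oob_score_ after each step, as in scikit-learn's own OOB example. The data below are synthetic and the tree counts are illustrative. Because the OOB samples here are windows rather than participants, such a curve diagnoses ensemble convergence, not participant-level generalization.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def oob_error_curve(X, y, tree_counts, seed=0):
    """OOB error (1 - OOB accuracy) of one forest grown incrementally to each size."""
    forest = RandomForestClassifier(warm_start=True, oob_score=True, random_state=seed)
    errors = []
    for n_trees in tree_counts:                     # must be increasing for warm_start
        forest.set_params(n_estimators=n_trees)
        with warnings.catch_warnings():
            # With few trees some samples are never out of bag; sklearn warns about it.
            warnings.simplefilter("ignore", UserWarning)
            forest.fit(X, y)
        errors.append(1.0 - forest.oob_score_)
    return np.asarray(errors)

X, y = make_classification(n_samples=400, n_features=20, n_informative=5, random_state=0)
tree_counts = list(range(10, 201, 10))
errors = oob_error_curve(X, y, tree_counts)
for n, e in zip(tree_counts, errors):
    print(n, round(float(e), 3))
```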

From this point on, the evaluation centered on the best-performing ML model (SVM), which was trained using data acquired under supervised conditions and then used to infer clinical labels from recordings obtained in different contexts. This setup was designed to examine the model’s ability to generalize beyond the conditions under which it was trained, a key requirement for real-world applicability where data acquisition is not always controlled. A comparison of the results obtained from supervised recordings, unsupervised recordings, and the combination of both is provided in Table 5, which reports the performance metrics achieved under each scenario.
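The cross-context protocol can be sketched as follows. The SVM is fitted only on supervised windows from training participants. It is then scored on three sets drawn from the held-out participants: their supervised windows, their home windows, and the union of both. Restricting the unsupervised test set to held-out participants is an assumption of this sketch. The data are also synthetic, with home recordings simulated by a small offset and extra noise.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GroupShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def cross_context_accuracy(sup, unsup, test_size=0.3, seed=0):
    """Train on supervised windows only; test on held-out participants in every context.

    sup and unsup are (X, y, participant_ids) tuples sharing the same participant IDs.
    """
    Xs, ys, gs = sup
    Xu, yu, gu = unsup
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train, test = next(splitter.split(Xs, ys, gs))
    model = make_pipeline(StandardScaler(), SVC()).fit(Xs[train], ys[train])
    held_out = np.isin(gu, np.unique(gs[test]))       # home windows of the test participants
    Xc = np.vstack([Xs[test], Xu[held_out]])
    yc = np.concatenate([ys[test], yu[held_out]])
    return {
        "supervised": accuracy_score(ys[test], model.predict(Xs[test])),
        "unsupervised": accuracy_score(yu[held_out], model.predict(Xu[held_out])),
        "combined": accuracy_score(yc, model.predict(Xc)),
    }

# Synthetic cohort: 38 participants (22 PD, 16 HC) x 8 windows, 10 features.
rng = np.random.default_rng(1)
ids = np.repeat(np.arange(38), 8)
labels = (ids < 22).astype(int)
centres = 0.5 * rng.normal(size=(38, 10)) + 0.8 * (np.arange(38) < 22)[:, None]
X_sup = centres[ids] + rng.normal(size=(ids.size, 10))
X_home = centres[ids] + 0.2 + 1.3 * rng.normal(size=(ids.size, 10))   # shifted, noisier
results = cross_context_accuracy((X_sup, labels, ids), (X_home, labels, ids))
print(results)
```

Because the combined set is the union of the other two, its accuracy is a weighted average of the supervised and unsupervised accuracies and always lies between them.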

Overall, the SVM model trained on supervised data produced higher values on every evaluation metric when tested on supervised recordings than when tested on unsupervised recordings or on the combined dataset. The differences, however, were modest: between 2.6 and 4.0 percentage points for unsupervised recordings, which indicates a high degree of consistency in model performance across acquisition settings. The small drop in performance under unsupervised conditions suggests that the structure of the egg-beating task may help limit the domain shift that typically hampers real-world health monitoring applications.
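The size of the drop can be checked directly from the mean values in Table 5. This short computation uses only the numbers reported in that table.

```python
# Mean values copied from Table 5 (percent): supervised, unsupervised, combined.
TABLE5 = {
    "Accuracy":          (91.1, 87.8, 89.1),
    "Balanced accuracy": (89.6, 86.4, 88.3),
    "Precision":         (90.7, 86.8, 88.7),
    "Recall":            (91.6, 89.0, 89.6),
    "Specificity":       (90.6, 86.6, 88.6),
    "F1-score":          (91.2, 87.9, 89.1),
}

# Percentage-point drop from supervised to unsupervised evaluation for each metric.
drops = {m: round(sup - unsup, 1) for m, (sup, unsup, _combined) in TABLE5.items()}
for metric, drop in drops.items():
    print(f"{metric:<18} {drop:.1f} pp")
```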

The confidence intervals are narrow across all three settings, which indicates stable and consistent results despite the relatively small sample. The intervals widen when moving from supervised recordings (±0.4–0.7%) to unsupervised ones (±1.1–1.4%), which is consistent with the higher variability typically associated with uncontrolled conditions. Even so, the intervals remain narrow overall, indicating that the reported performance estimates are robust to the choice of data partition.
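Suppose the ± values in Table 5 are confidence intervals across the 10 random splits. One common way to compute them, assumed here for illustration, is a 95% t-based interval around the mean of the split-level scores. Because the random splits reuse the same participants, intervals computed this way may understate the true uncertainty. The split accuracies below are hypothetical.

```python
import numpy as np
from scipy import stats

def mean_ci(scores, level=0.95):
    """Mean and t-based confidence-interval half-width across splits."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    sem = scores.std(ddof=1) / np.sqrt(n)                  # standard error of the mean
    half_width = stats.t.ppf(0.5 + level / 2, df=n - 1) * sem
    return scores.mean(), half_width

# Hypothetical accuracies from 10 random train-test splits (not the study's values).
split_acc = [0.90, 0.92] * 5
mean, hw = mean_ci(split_acc)
print(f"{100 * mean:.1f} ± {100 * hw:.1f}%")
```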

The confusion matrices shown in Figure 12 provide a detailed view of the classification outcomes for each recording context in terms of distinguishing between PD and HC participants. When evaluated on supervised data, the model achieves the highest overall accuracy, correctly identifying 91.6% of PD cases and 90.6% of HC subjects, with balanced performance across both classes. On unsupervised recordings, performance decreases slightly but remains high, with correct classification rates of 89.0% for PD and 86.5% for HC, and misclassification rates rising modestly to 11.0% and 13.5%, respectively. Evaluation on the combined dataset yields intermediate performance, with correct classification rates of 89.6% for PD and 88.6% for HC. In all three scenarios, false positive and false negative rates did not exceed 13.5%, reflecting reliable discrimination between PD and HC participants.
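The rates quoted above are the rows of a confusion matrix normalized by the true class, so the correct and misclassified rates of each class sum to 100%. A small sketch with scikit-learn makes the mapping to recall, specificity, and the false negative and false positive rates explicit. It uses toy labels, with PD coded as 1 and HC as 0 (an assumed encoding).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def class_rates(y_true, y_pred):
    """Per-class rates from a row-normalised confusion matrix (1 = PD, 0 = HC)."""
    cm = confusion_matrix(y_true, y_pred, labels=[1, 0], normalize="true")
    return {
        "recall": cm[0, 0],        # PD windows classified as PD
        "fnr": cm[0, 1],           # PD windows classified as HC (false negatives)
        "specificity": cm[1, 1],   # HC windows classified as HC
        "fpr": cm[1, 0],           # HC windows classified as PD (false positives)
    }

# Toy example: 10 PD and 10 HC windows, with 1 PD error and 2 HC errors.
y_true = np.array([1] * 10 + [0] * 10)
y_pred = np.array([1] * 9 + [0] + [0] * 8 + [1] * 2)
rates = class_rates(y_true, y_pred)
print(rates)
```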

Figure 13 presents the ROC curves and corresponding AUC values obtained for the SVM model under the different recording conditions. High values are observed across all three scenarios, further supporting the ability of the egg-beating task to capture biomechanical differences associated with PD. In line with the previously reported metrics, the model trained and evaluated on supervised data achieves the best performance, slightly exceeding the results obtained with both the unsupervised and combined datasets. Notably, the curves corresponding to the unsupervised and combined conditions are closely aligned, suggesting that the information extracted from the task remains stable and informative even when acquired outside controlled environments.
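Unlike the confusion matrices, ROC curves require a continuous score rather than a hard label. For an SVM, that score is typically the signed distance to the decision boundary returned by decision_function. The sketch uses synthetic data and a plain stratified split, and is meant only to show the mechanics.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
model = make_pipeline(StandardScaler(), SVC()).fit(X_tr, y_tr)

# Continuous SVM margins, not hard labels, are needed to trace the ROC curve.
scores = model.decision_function(X_te)
fpr, tpr, thresholds = roc_curve(y_te, scores)
auc = roc_auc_score(y_te, scores)
print(f"AUC = {auc:.3f}")
```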

4. Discussion

The results of this study show that the egg-beating task works as a practical tool for assessing motor function in PD. The signal features extracted during the task, particularly the progressive drop in amplitude, frequency, and signal energy seen in PD participants, were consistent enough across sessions to support reliable classification in both supervised and unsupervised settings. Importantly, these distinctions were obtained from a simple, one-minute daily activity performed using a standard fork, without reliance on specialized instrumentation or continuous clinical supervision.

Both the accelerometer and gyroscope data showed clear, repeatable differences between the two groups. Participants with PD consistently showed diminished movement amplitude, a progressive slowing of movement frequency, and a decline in signal energy over time. These findings are consistent with how bradykinesia is defined and measured in established clinical scales such as the MDS-UPDRS, particularly in items that assess repetitive hand movements. By contrast, healthy participants exhibited stable, rhythmic, and sustained motion throughout the activity, providing a robust baseline for normal motor behavior. Together, these findings suggest that simple everyday tasks, when paired with wearable sensors, can capture motor impairments that standard clinical assessments may miss.

One practical advantage of this approach is that the model outputs are relatively easy to interpret. The classifications can be traced back to concrete signal features, such as the gradual drop in spectral energy and the shift in dominant frequency visible in the scalograms. These signal-level patterns correspond closely to the features driving the model’s predictions, offering a transparent link between raw sensor data and classification outcomes. This matters in clinical settings, where clinicians need to understand why a model produces a given output, not just whether it is accurate.

Of the four models tested, the SVM performed best, reaching accuracies close to 90%. This suggests that the model is well suited to handling the variability inherent in biomechanical signals, as well as the inter-individual differences characteristic of PD. One of the main contributions of this work is showing that the model holds up when tested on data collected at home without supervision. The transition from supervised to unsupervised data reduced performance by at most 4 percentage points, indicating strong generalization despite the increased variability associated with real-world recordings. These findings underscore the importance of task design for model robustness: rather than relying exclusively on more complex algorithms, ensuring that the data acquisition process captures ecologically valid and temporally consistent behavior can substantially reduce the impact of domain shift.

Compared with prior studies using wrist-worn IMU data for PD classification, the results obtained here are competitive not only in terms of supervised accuracy but also in the relatively small performance gap observed under unsupervised conditions. Previous work by Varghese et al. [15] reported accuracies between 80% and 88%, while Shawen et al. [17] achieved AUC values of up to 0.68 for bradykinesia detection. More recent approaches based on daily-life data, such as Evers et al. [35], reported AUC values ranging from 0.70 to 0.84 when analyzing heterogeneous real-world activities. In contrast, the present study achieves accuracies exceeding 90% under supervised conditions and maintains values above 87% in unsupervised home settings, reflecting a comparatively smaller degradation in performance. This suggests that the combination of a well-chosen task and a straightforward model produces features that transfer better from training to real-world deployment.

In practice, the approach only needs a single supervised session to train the model, after which patients can carry out assessments on their own at home. This is particularly relevant for longitudinal monitoring, since it removes the need for repeated clinic visits to track how motor function changes over time. For early-stage patients such as those in this cohort, this could mean catching signs of progression sooner and adjusting treatment before symptoms worsen significantly.

Limitations and Future Work

Despite these promising results, several limitations should be considered. The sample was sufficient to provide preliminary evidence supporting the feasibility of the approach, but remains too small to exclude residual overfitting or participant-specific effects. Segmenting the signals into windows increases the number of data points available for training, but the number of independent participants remains the binding constraint. Future studies should recruit larger and more varied cohorts, covering a wider range of disease stages, and should examine whether the approach can distinguish between different levels of severity. In addition, further investigation of medication effects is warranted, particularly in more advanced stages of PD where motor fluctuations become more pronounced and may significantly influence performance. The proposed methodology could also be extended to other movement disorders characterized by motor impairment, such as essential tremor, which has a substantially higher prevalence and shares certain motor features that may be captured through the same wearable-based assessment framework.

From a methodological perspective, the combination of a relatively small cohort and a high-dimensional feature space introduces a potential risk of overfitting. Participant-level splitting and correlation-based feature selection helped mitigate this, but future work with larger datasets should explore more sophisticated feature selection or representation learning methods. Future work should also examine the sensitivity of hyperparameter optimization to sample size. With cohorts of limited size, cross-validation folds necessarily contain few independent participants, which may increase the variance of internal performance estimates during model selection. Larger datasets would allow the exploration of more systematic optimization strategies, such as nested cross-validation, and more principled dimensionality reduction approaches that could yield more parsimonious models.
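The nested scheme mentioned above can be sketched with scikit-learn by running a grid search inside each outer fold. Both loops are grouped by participant, so that no participant contributes windows to both training and evaluation. The parameter grid, the fold counts and the synthetic data (with alternating labels, for simplicity) are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

PARAM_GRID = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}   # hypothetical grid

def nested_group_cv(X, y, groups, outer_splits=5, inner_splits=3):
    """Outer folds estimate performance; inner folds tune C and gamma. Both split by participant."""
    scores = []
    for train, test in GroupKFold(n_splits=outer_splits).split(X, y, groups):
        search = GridSearchCV(make_pipeline(StandardScaler(), SVC()), PARAM_GRID,
                              cv=GroupKFold(n_splits=inner_splits), scoring="accuracy")
        # Inner splits only ever see the participants of the outer training fold.
        search.fit(X[train], y[train], groups=groups[train])
        # The best configuration is refitted on the outer training fold, then scored once.
        scores.append(search.score(X[test], y[test]))
    return np.asarray(scores)

rng = np.random.default_rng(0)
ids = np.repeat(np.arange(38), 10)                  # 38 participants x 10 windows
y = (ids % 2).astype(int)                           # toy labels alternating by participant
X = 0.5 * rng.normal(size=(38, 15))[ids] + 0.6 * y[:, None] + rng.normal(size=(ids.size, 15))
outer_scores = nested_group_cv(X, y, ids)
print(np.round(outer_scores, 3), outer_scores.mean())
```

The outer scores never influence hyperparameter choice, which is what keeps the resulting estimate free of selection bias.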

The home sessions introduced some real-world variability, but the protocol was still semi-controlled in important ways. Moving to fully unsupervised, long-term home monitoring would give a much richer picture of how motor function varies day to day and changes over time. Incorporating longitudinal data would also enable the investigation of disease progression and the sensitivity of the proposed task to subtle changes over time. In parallel, future research should consider integrating additional daily activities, such as walking or object manipulation, to develop a more comprehensive framework capable of capturing a wider range of PD motor symptoms beyond bradykinesia.

The study also relied entirely on inertial data from a single wrist-worn device. The inclusion of additional data modalities—such as physiological signals, medication timing, or patient-reported outcomes—could provide a more holistic understanding of both motor and non-motor symptom interactions. Testing different sensor positions and device types, and validating across devices, would be important steps before deploying this kind of system more broadly.

5. Conclusions

This study set out to evaluate whether the egg-beating task, recorded with a smartwatch and analyzed with ML, could serve as a practical tool for PD motor assessment. The results demonstrate that a one-minute task performed with standard household objects generates signals rich enough to separate PD patients from healthy controls, with accuracies above 90% in supervised conditions and above 87% at home.

Across both sensors, PD participants consistently showed lower movement amplitude, slower dominant frequency, and declining signal energy, patterns that map directly onto the clinical features of bradykinesia. These observations shaped the feature extraction strategy from the outset.

A key outcome of this work is the minimal loss of performance observed when moving from controlled to uncontrolled recording conditions, suggesting that the structured nature of the task helps preserve a stable feature representation across contexts. This directly tackles one of the main obstacles to deploying ML in real-world health monitoring.

Finally, the full dataset has been made publicly available so that other groups can reproduce these results and build on them. Next steps should include validation in larger and more diverse cohorts, extension to different disease stages, and integration of additional sensing modalities to build more complete assessment tools.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/technologies14060345/s1, Figure S1: Out-of-bag (OOB) error curve of the Random Forest classifier; Figure S2: Training log-loss convergence of the Logistic Regression classifier.

Author Contributions

Conceptualization: C.P.-F. and I.P.; Data curation: C.P.-F.; Formal analysis: C.P.-F.; Funding acquisition: G.d.A. and I.P.; Investigation: C.P.-F. and I.P.; Methodology: C.P.-F., M.M.V., M.G. and I.P.; Project administration: I.P.; Resources: I.P.; Supervision: I.P.; Validation: C.P.-F. and L.S.; Visualization: C.P.-F.; Writing—original draft: C.P.-F., L.S., M.M.V. and M.G.; Writing—review & editing: C.P.-F., L.S., M.M.V., M.G., G.d.A. and I.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was possible thanks to the financing of the project BIOCLITE PID2021-123708OB-I00, funded by MCIN/AEI/10.13039/501100011033/FEDER, EU.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Universidad Politécnica de Madrid (BDPLEDEMDP-IPG-DATOS-20231030, 3 November 2023).

Informed Consent Statement

Written informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The dataset will be made available after publication at [32].

Acknowledgments

During the preparation of this work, the authors used DeepAI 1.7.0 to assist with stylistic editing. Following the use of this tool, the authors reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
AUC-ROC	Area under the curve—Receiver operating characteristic
CNN	Convolutional Neural Network
F	Female
FFT	Fast Fourier Transform
HC	Healthy control
IMU	Inertial Measurement Unit
kNN	k-Nearest Neighbours
LR	Logistic regression
M	Male
MDS	Movement Disorder Society
MEMS	Microelectromechanical systems
ML	Machine Learning
PD	Parkinson’s disease
PSD	Power Spectral Density
RF	Random Forest
RMS	Root mean square
SD	Standard deviation
SVM	Support Vector Machine
UKPDSBB	United Kingdom Parkinson’s Disease Society Brain Bank

References

1. Poewe, W.; Seppi, K.; Tanner, C.M.; Halliday, G.M.; Brundin, P.; Volkmann, J.; Schrag, A.E.; Lang, A.E. Parkinson disease. Nat. Rev. Dis. Primers 2017, 3, 17013.
2. Luo, Y.; Qiao, L.; Li, M.; Wen, X.; Zhang, W.; Li, X. Global, regional, national epidemiology and trends of Parkinson’s disease from 1990 to 2021: Findings from the Global Burden of Disease Study 2021. Front. Aging Neurosci. 2025, 16, 1498756.
3. Su, D.; Cui, Y.; He, C.; Yin, P.; Bai, R.; Zhu, J.; Lam, J.S.T.; Zhang, J.; Yan, R.; Zheng, X.; et al. Projections for prevalence of Parkinson’s disease and its driving factors in 195 countries and territories to 2050: Modelling study of Global Burden of Disease Study 2021. BMJ 2025, 388, e080952.
4. Leite Silva, A.B.R.; Gonçalves de Oliveira, R.W.; Diógenes, G.P.; de Castro Aguiar, M.F.; Sallem, C.C.; Lima, M.P.P.; de Albuquerque Filho, L.B.; Peixoto de Medeiros, S.D.; Penido de Mendonça, L.L.; de Santiago Filho, P.C.; et al. Premotor, nonmotor and motor symptoms of Parkinson’s Disease: A new clinical state of the art. Ageing Res. Rev. 2023, 84, 101834.
5. Váradi, C. Clinical Features of Parkinson’s Disease: The Evolution of Critical Symptoms. Biology 2020, 9, 103.
6. Kumaresan, M.; Khan, S. Spectrum of Non-Motor Symptoms in Parkinson’s Disease. Cureus 2021, 13, e13275.
7. Virameteekul, S.; Revesz, T.; Jaunmuktane, Z.; Warner, T.T.; De Pablo-Fernández, E. Clinical Diagnostic Accuracy of Parkinson’s Disease: Where Do We Stand? Mov. Disord. 2023, 38, 558–566.
8. Tolosa, E.; Garrido, A.; Scholz, S.W.; Poewe, W. Challenges in the diagnosis of Parkinson’s disease. Lancet Neurol. 2021, 20, 385–397.
9. Armstrong, M.J.; Okun, M.S. Diagnosis and Treatment of Parkinson Disease: A Review. JAMA 2020, 323, 548.
10. Kulcsarova, K.; Skorvanek, M.; Postuma, R.B.; Berg, D. Defining Parkinson’s Disease: Past and Future. J. Park. Dis. 2024, 14, S257–S271.
11. Gibb, W.R.; Lees, A.J. The relevance of the Lewy body to the pathogenesis of idiopathic Parkinson’s disease. J. Neurol. Neurosurg. Psychiatry 1988, 51, 745–752.
12. Postuma, R.B.; Berg, D.; Stern, M.; Poewe, W.; Olanow, C.W.; Oertel, W.; Obeso, J.; Marek, K.; Litvan, I.; Lang, A.E.; et al. MDS clinical diagnostic criteria for Parkinson’s disease. Mov. Disord. 2015, 30, 1591–1601.
13. Polvorinos-Fernández, C.; Sigcha, L.; Borzì, L.; Olmo, G.; Asensio, C.; López, J.M.; de Arcas, G.; Pavón, I. Evaluating Motor Symptoms in Parkinson’s Disease Through Wearable Sensors: A Systematic Review of Digital Biomarkers. Appl. Sci. 2024, 14, 10189.
14. Seng, K.P.; Ang, L.M.; Peter, E.; Mmonyi, A. Machine Learning and AI Technologies for Smart Wearables. Electronics 2023, 12, 1509.
15. Varghese, J.; Alen, C.M.v.; Fujarski, M.; Schlake, G.S.; Sucker, J.; Warnecke, T.; Thomas, C. Sensor Validation and Diagnostic Potential of Smartwatches in Movement Disorders. Sensors 2021, 21, 3139.
16. LeMoyne, R.; Mastroianni, T. Implementation of a Smartwatch with Machine Learning for Ascertaining Efficacy of Deep Brain Stimulation for Parkinson’s Disease Treatment. In Proceedings of the 2024 E-Health and Bioengineering Conference (EHB); IEEE: New York, NY, USA, 2024; pp. 1–4.
17. Shawen, N.; O’Brien, M.K.; Venkatesan, S.; Lonini, L.; Simuni, T.; Hamilton, J.L.; Ghaffari, R.; Rogers, J.A.; Jayaraman, A. Role of data measurement characteristics in the accurate detection of Parkinson’s disease symptoms using wearable sensors. J. NeuroEng. Rehabil. 2020, 17, 52.
18. Gutowski, T.; Stodulska, O.; Ćwiklińska, A.; Gutowska, K.; Kopeć, K.; Betka, M.; Antkiewicz, R.; Koziorowski, D.; Szlufik, S. Machine Learning-Based Assessment of Parkinson’s Disease Symptoms Using Wearable and Smartphone Sensors. Sensors 2025, 25, 4924.
19. Shah, V.V.; McNames, J.; Mancini, M.; Carlson-Kuhta, P.; Nutt, J.G.; El-Gohary, M.; Lapidus, J.A.; Horak, F.B.; Curtze, C. Digital Biomarkers of Mobility in Parkinson’s Disease During Daily Living. J. Park. Dis. 2020, 10, 1099–1111.
20. Tsakanikas, V.; Ntanis, A.; Rigas, G.; Androutsos, C.; Boucharas, D.; Tachos, N.; Skaramagkas, V.; Chatzaki, C.; Kefalopoulou, Z.; Tsiknakis, M.; et al. Evaluating Gait Impairment in Parkinson’s Disease from Instrumented Insole and IMU Sensor Data. Sensors 2023, 23, 3902.
21. Trabassi, D.; Serrao, M.; Varrecchia, T.; Ranavolo, A.; Coppola, G.; De Icco, R.; Tassorelli, C.; Castiglia, S.F. Machine Learning Approach to Support the Detection of Parkinson’s Disease in IMU-Based Gait Analysis. Sensors 2022, 22, 3700.
22. Sousani, M.; Rojas, R.F.; Preston, E.; Ghahramani, M. Integrating IMU Sensors and Dual-Task Timed Up and Go to Identify Biomarkers for Early Stage Parkinson’s Disease Detection. IEEE Sens. J. 2025, 25, 38217–38229.
23. Meigal, A.Y.; Gerasimova-Meigal, L.I.; Reginya, S.A.; Soloviev, A.V.; Moschevikin, A.P. Gait Characteristics Analyzed with Smartphone IMU Sensors in Subjects with Parkinsonism under the Conditions of “Dry” Immersion. Sensors 2022, 22, 7915.
24. Ruiz-Vitte, A.; Comesaña, A.; Muñoz-Arcentales, A.; Larraga-García, B.; Alonso, A.; Rocon, E.; Gutiérrez, A. Activity recognition in patients with tremor: Integrating time–length windows for enhanced detection. Comput. Biol. Med. 2025, 193, 110305.
25. Li, Z.; Zhao, Y.; Qi, J.; Wang, X.; Yang, Y.; Yang, P. Effective Severity Assessment of Parkinson’s Disease using Wearable Sensors in Free-living IoT Environment. In Proceedings of the 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS); IEEE: New York, NY, USA, 2023; pp. 900–906.
26. Wang, X.; Peng, X.; Xu, Z.; Xu, M.; Yang, Y.; Zhou, M.; Zhao, Z.; Yue, P.; Yang, P. PDWearML: Leveraging Daily Activities for Fast Parkinson’s Disease Severity Assessment with Wearable Machine Learning. IEEE Trans. Biomed. Eng. 2026, 1–13.
27. Bologna, M.; Espay, A.J.; Fasano, A.; Paparella, G.; Hallett, M.; Berardelli, A. Redefining Bradykinesia. Mov. Disord. 2023, 38, 551–557.
28. Del Din, S.; Godfrey, A.; Mazzà, C.; Lord, S.; Rochester, L. Free-living monitoring of Parkinson’s disease: Lessons from the field. Mov. Disord. 2016, 31, 1293–1313.
29. Polvorinos-Fernández, C.; Sigcha, L.; Centeno-Cerrato, M.; de Arcas, G.; Grande, M.; Marín, M.; Pareés, I.; Martínez-Castrillo, J.C.; Pavón, I. Evaluation of Free-Living Motor Symptoms in Patients with Parkinson Disease Through Smartwatches: Protocol for Defining Digital Biomarkers. JMIR Res. Protoc. 2025, 14, e72820.
30. Hoehn, M.M.; Yahr, M.D. Parkinsonism: Onset, progression, and mortality. Neurology 1967, 17, 427.
31. Locatelli, P.; Alimonti, D.; Traversi, G.; Re, V. Classification of Essential Tremor and Parkinson’s Tremor Based on a Low-Power Wearable Device. Electronics 2020, 9, 1695.
32. Polvorinos-Fernández, C.; Centeno-Cerrato, M.; Sigcha, L.; Grande, M.; Marín, M.; de Arcas, G.; Pavón, I. BIOCLITE: Smartwatch Dataset for Parkinson’s Disease in Supervised and Unsupervised Contexts. 2025. Available online: https://zenodo.org/records/16408199 (accessed on 7 June 2026).
33. Winter, D.A. Kinetics: Forces and Moments of Force. In Biomechanics and Motor Control of Human Movement; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2009; Chapter 5; pp. 107–138.
34. Sun, S.; Cao, Z.; Zhu, H.; Zhao, J. A Survey of Optimization Methods From a Machine Learning Perspective. IEEE Trans. Cybern. 2020, 50, 3668–3681.
35. Evers, L.J.; Raykov, Y.P.; Krijthe, J.H.; Silva de Lima, A.L.; Badawy, R.; Claes, K.; Heskes, T.M.; Little, M.A.; Meinders, M.J.; Bloem, B.R. Real-Life Gait Performance as a Digital Biomarker for Motor Fluctuations: The Parkinson@Home Validation Study. J. Med. Internet Res. 2020, 22, e19068.

Figure 1. Experimental protocol.

Figure 2. Experimental setup.

Figure 3. Interface of the smartwatch data collection application.

Figure 4. PD participant during the egg-beating activity.

Figure 5. Pipeline followed for the evaluation of the egg-beating activity.

Figure 6. Time evolution of the accelerometer signal for (a) PD and (b) HC subjects during the egg-beating activity.

Figure 7. Frequency evolution of the accelerometer signal for (a) PD and (b) HC subjects during the egg-beating activity.

Figure 8. Time evolution of the gyroscope signal for (a) PD and (b) HC subjects during the egg-beating activity.

Figure 9. Frequency evolution of the gyroscope signal for (a) PD and (b) HC subjects during the egg-beating activity.

Figure 10. Metrics obtained by the ML models evaluated on the egg-beating activity.

Figure 11. Top-10 features ranked by permutation importance for the SVM classifier.

Figure 12. Confusion matrices for the evaluation of the egg-beating activity in different data collection contexts.

Figure 13. AUC-ROC results for the evaluation of the egg-beating activity in different data collection contexts.

Table 1. Technical specifications of the smartwatches evaluated.

Work	Year	Participants	Evaluated Activity	Main Findings
Varghese et al. [15]	2021	260 PD, 89 HC	Resting tremor, postural tremor, finger pointing, gait	A gradient boosting decision tree model applied to smartwatch data can successfully distinguish PD patients from HC, achieving accuracies between 80% and 88%.
LeMoyne and Mastroianni [16]	2024	1 PD	Resting tremor	Smartwatch-based inertial sensing combined with ML for evaluating the efficacy of deep brain stimulation in PD treatment, achieving 90% classification accuracy.
Shawen et al. [17]	2020	13 PD	Functional tasks, fine upper extremity tasks, gross upper extremity tasks, and tasks used in clinical assessment	Data from wrist-worn sensors using RF models to identify tremor and bradykinesia in PD patients, reporting AUC-ROC values of up to 0.79 for tremor and 0.68 for bradykinesia.
Gutowski et al. [18]	2025	241 PD	Resting tremor, postural tremor, pronation–supination	ML models integrating data from wearable devices and smartphones can estimate the severity of multiple motor symptoms in PD, achieving correlations with clinical ratings of up to r ≈ 0.8.
Shah et al. [19]	2020	29 PD, 27 HC	Gait	Turning and gait indicators discriminate PD from HC, with AUC = 0.87–0.89.
Tsakanikas et al. [20]	2023	19 PD	Gait	IMU-based devices can lead to accurate detection of gait impairment with AUC values of 0.93 using a SVM and RF.
Trabassi et al. [21]	2022	64 PD, 64 HC	Gait	IMU-derived data and SVM model achieved an accuracy of 86% in distinguishing PD from HC.
Sousani et al. [22]	2025	28 PD, 34 HC	TUG test, cognitive and motor dual-task TUG	IMU data combined with SVM and LR achieved 95% accuracy.
Meigal et al. [23]	2022	5 PD	Gait (extended TUG)	Smartphone head-mounted IMU used to extract step timing and acceleration features during an extended TUG.

Table 2. Participant Demographics. The table outlines demographic information, including the total sample size, mean age with standard deviation, age range, disease duration for PD group, and gender distribution (F: Female. M: Male).

	PD Patients	HC
Sample size	22	16
Mean ± SD age (years)	65.2 ± 9.2	60.5 ± 7.0
Range age (years)	45–79	49–73
Disease duration (years)	2–11	N/A
Gender	10 F, 12 M	10 F, 6 M

N/A: not applicable.

Table 3. Summary of Extracted Features.

Category	Features
Temporal statistics	RMS, amplitude level, mean, standard deviation, range, interquartile range, first quartile, third quartile, variance, median, kurtosis, absolute maximum, absolute minimum, skewness
Energy and movement intensity	Signal energy, dominant-band energy, signal magnitude area, absolute average variation, Shannon entropy, approximate entropy
Signal changes and peaks	Smoothness, zero-crossing count, peak-to-average ratio, mean square root of local maxima, standard deviation of local maxima, absolute maximum of local maxima
Jerk statistics	RMS, mean, standard deviation, kurtosis, skewness, range, maximum, global
Frequency statistics	Dominant frequency, dominant frequency (1–4 Hz), dominant frequency (4–8 Hz), dominant jerk frequency, spectral power, power spectral density (PSD), PSD (1–8 Hz), PSD (0.2–4 Hz), maximum PSD, dominant PSD frequency, second maximum PSD, second dominant PSD frequency, spectral entropy, average of the first five FFT (Fast Fourier Transform) components, FFT energy
Other metrics	Hjorth parameters, Lyapunov exponent, inter-axis correlation, absolute maximum inter-axis correlation, delay of first correlation maximum, signal magnitude vector
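Table 3 names the features but not their formulas. As a hedged illustration, the sketch below computes a handful of them for a single window of one sensor axis, using common textbook definitions that may differ from the authors' implementation: RMS, energy as a sum of squares, zero crossings of the mean-removed signal, jerk as a scaled first difference, dominant frequency (optionally restricted to a band such as 1–4 Hz), spectral entropy, and Hjorth parameters. All function names are hypothetical, and a naive DFT stands in for an FFT library only to keep the example dependency-free.

```python
import cmath
import math

def rms(x):
    """Root mean square of a window."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def signal_energy(x):
    """Sum of squared samples (one common definition of signal energy)."""
    return sum(v * v for v in x)

def _centered(x):
    """Return the window with its mean removed."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def zero_crossings(x):
    """Number of sign changes between consecutive mean-removed samples."""
    c = _centered(x)
    return sum(1 for a, b in zip(c, c[1:]) if a * b < 0)

def jerk(x, fs):
    """Numerical derivative: first difference scaled by the sampling rate fs (Hz)."""
    return [(b - a) * fs for a, b in zip(x, x[1:])]

def power_spectrum(x):
    """One-sided power spectrum of the mean-removed window via a naive O(n^2) DFT."""
    c = _centered(x)
    n = len(c)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(c[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(s) ** 2 / n)
    return spec

def dominant_frequency(x, fs, lo=0.0, hi=None):
    """Frequency (Hz) of the strongest spectral bin within [lo, hi]."""
    n = len(x)
    hi = fs / 2 if hi is None else hi
    bins = [(p, k * fs / n) for k, p in enumerate(power_spectrum(x))
            if lo <= k * fs / n <= hi]
    return max(bins)[1]

def spectral_entropy(x):
    """Shannon entropy (bits) of the normalized power spectrum, DC bin excluded."""
    spec = power_spectrum(x)[1:]
    total = sum(spec)
    return -sum((p / total) * math.log2(p / total) for p in spec if p > 0)

def _variance(x):
    """Population variance of a sequence."""
    c = _centered(x)
    return sum(v * v for v in c) / len(c)

def hjorth(x):
    """Hjorth activity, mobility and complexity (mobility in per-sample units)."""
    dx = [b - a for a, b in zip(x, x[1:])]
    ddx = [b - a for a, b in zip(dx, dx[1:])]
    activity = _variance(x)
    mobility = math.sqrt(_variance(dx) / activity)
    complexity = math.sqrt(_variance(ddx) / _variance(dx)) / mobility
    return activity, mobility, complexity

if __name__ == "__main__":
    fs = 50.0  # assumed sampling rate (Hz), for illustration only
    # Synthetic 3 Hz oscillation standing in for one axis of a beating motion.
    x = [math.sin(2 * math.pi * 3.0 * t / fs + 0.3) for t in range(100)]
    # Prints roughly 0.7071, then 11, then 3.0
    print(rms(x), zero_crossings(x), dominant_frequency(x, fs, 1.0, 4.0))
```

The synthetic signal and the 50 Hz rate are assumptions chosen so that the oscillation falls exactly on a DFT bin; with real recordings, a library FFT such as `numpy.fft.rfft` would replace the quadratic loop.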

Table 4. Evaluation metrics for each combination of window size and overlap.

Window Size (samples)	Overlap (%)	Accuracy	Precision	Recall	Specificity	F1-Score
128	0	82.0%	82.0%	81.9%	81.9%	82.0%
	25	81.4%	81.4%	81.3%	81.3%	81.4%
	50	81.7%	81.7%	81.7%	81.7%	81.7%
	75	81.5%	81.9%	81.6%	81.6%	81.4%
256	0	83.1%	83.2%	83.1%	83.1%	83.1%
	25	82.0%	82.1%	82.1%	82.1%	82.0%
	50	82.7%	82.7%	82.6%	82.6%	82.7%
	75	82.4%	82.5%	82.4%	82.4%	82.4%
512	0	83.2%	83.6%	83.2%	83.2%	83.1%
	25	83.5%	83.0%	83.6%	83.6%	83.5%
	50	84.2%	84.1%	84.0%	83.7%	84.3%
	75	83.2%	83.3%	83.1%	83.1%	83.2%
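The window and overlap settings in Table 4 determine how each recording is cut into segments before feature extraction. A minimal sketch of that segmentation, assuming window size is counted in samples, overlap is a percentage, and incomplete trailing windows are discarded (the paper's handling of the last partial window is not stated here); `sliding_windows` and `window_count` are hypothetical names:

```python
def sliding_windows(signal, size, overlap_pct):
    """Split a sequence into fixed-length windows with a given percentage overlap.

    Windows that would run past the end of the recording are dropped
    (an assumption; zero-padding the last window is another option).
    """
    if size <= 0 or not 0 <= overlap_pct < 100:
        raise ValueError("size must be positive and overlap_pct in [0, 100)")
    # Hop between window starts, e.g. 512 samples at 50% overlap -> 256.
    step = size - (size * overlap_pct) // 100
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def window_count(n_samples, size, overlap_pct):
    """Closed-form number of full windows produced by sliding_windows."""
    step = size - (size * overlap_pct) // 100
    return 0 if n_samples < size else (n_samples - size) // step + 1

if __name__ == "__main__":
    n = 3000  # hypothetical: a one-minute recording at an assumed 50 Hz
    for size in (128, 256, 512):
        for overlap in (0, 25, 50, 75):
            print(size, overlap, window_count(n, size, overlap))
```

Higher overlap yields more windows per recording, but neighbouring windows then share most of their samples. When such windows feed a classifier, splitting training and test data by participant rather than by window is the usual way to avoid optimistic estimates.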

Table 5. Performance metrics for the egg-beating task in each evaluation context. Models were trained exclusively on supervised recordings.

Context	Accuracy	Balanced Accuracy	Precision	Recall	Specificity	F1-Score
Supervised	$91.1 \pm 0.4\%$	$89.6 \pm 0.5\%$	$90.7 \pm 0.6\%$	$91.6 \pm 0.5\%$	$90.6 \pm 0.7\%$	$91.2 \pm 0.5\%$
Unsupervised	$87.8 \pm 1.4\%$	$86.4 \pm 1.1\%$	$86.8 \pm 1.2\%$	$89.0 \pm 1.3\%$	$86.6 \pm 1.3\%$	$87.9 \pm 1.1\%$
Combined	$89.1 \pm 0.6\%$	$88.3 \pm 0.8\%$	$88.7 \pm 0.9\%$	$89.6 \pm 0.7\%$	$88.6 \pm 0.8\%$	$89.1 \pm 0.7\%$
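The metrics in Tables 4 and 5 follow from a binary confusion matrix. The sketch below computes them with PD as the positive class and summarizes repeated runs as mean and sample standard deviation. This section does not say whether precision, recall, and F1 are reported for the PD class alone or averaged across classes, nor exactly what the ± values span, so both choices here are assumptions and the sketch is not expected to reproduce the table's numbers. Function names are hypothetical.

```python
from statistics import mean, stdev

def binary_metrics(tp, fp, tn, fn):
    """Binary classification metrics, treating PD as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity / true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Mean of the per-class recalls; insensitive to class imbalance.
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
    }

def summarize(runs, key):
    """Mean and sample standard deviation (in %) of one metric over repeated runs."""
    values = [100 * run[key] for run in runs]
    return mean(values), stdev(values)

if __name__ == "__main__":
    # Hypothetical window-level confusion matrix (not data from this study):
    # 100 PD windows, 50 HC windows.
    m = binary_metrics(tp=90, fp=10, tn=40, fn=10)
    print({k: round(100 * v, 1) for k, v in m.items()})
```

On this hypothetical matrix, accuracy is about 86.7% while balanced accuracy is 85.0%: the larger PD class, classified at 90% recall, pulls plain accuracy above the 80% specificity on the smaller HC class. Reporting both, as Table 5 does, guards against that effect when group sizes differ (22 PD vs. 16 HC here).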

Share and Cite

MDPI and ACS Style

Polvorinos-Fernández, C.; Sigcha, L.; Valero, M.M.; Grande, M.; de Arcas, G.; Pavón, I. Machine Learning Assessment of Parkinson’s Disease Using a Novel Free-Living Egg-Beating Motor Task. Technologies 2026, 14, 345. https://doi.org/10.3390/technologies14060345
