A Contactless Deep Learning Framework for Quantitative Motor Assessment Aligned with the Movement Disorder Society Unified Parkinson’s Disease Rating Scale Part III: A Healthy Baseline Definition Study

Zanela, Andrea

doi:10.3390/app16063091


by

Andrea Zanela

Energy and Data Science Lab, ENEA “Casaccia” Research Centre, I-00123 Rome, Italy

Appl. Sci. 2026, 16(6), 3091; https://doi.org/10.3390/app16063091

Submission received: 22 January 2026 / Revised: 25 February 2026 / Accepted: 17 March 2026 / Published: 23 March 2026

(This article belongs to the Special Issue Emerging Technologies for Assistive Robotics)


Abstract

The clinical evaluation of motor impairment in Parkinson’s disease is commonly based on the Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) Part III, which relies on visual assessment and is therefore subject to inter-rater variability. Existing technology-based solutions often require wearable sensors or lack structural alignment with the item-based architecture of the clinical examination. This study presents a fully automated and contactless framework designed to quantitatively describe motor performance in tasks explicitly aligned with MDS-UPDRS Part III. The system integrates stereo vision, deep learning-based pose estimation, and acoustic analysis to derive continuous, standardized quantitative descriptors. Objective Motor Item Indices were defined for 17 of the 18 motor items, excluding rigidity, which cannot be inferred from vision-based measurements. The framework was evaluated in a cohort of healthy subjects to establish an internal reference baseline for feature normalization and index construction. Within this cohort, descriptors exhibited coherent multivariate organization and internally consistent distributions, supporting methodological feasibility at this baseline definition stage. This work represents a methodological and baseline definition phase. Clinical validation in Parkinsonian populations, correlation with neurologist-rated scores, and longitudinal assessment remain necessary to determine diagnostic, severity-related, or early-stage applicability.

Keywords: Parkinson’s disease; MDS-UPDRS; contactless motor assessment; deep learning; pose estimation; digital motor biomarkers

1. Introduction

1.1. Background and Motivation

Population ageing represents a major global challenge and is accompanied by a rapid increase in neurodegenerative disorders (NDs) [1,2,3,4,5]. Among these, Parkinson’s disease (PD) is the second most prevalent condition after Alzheimer’s disease and a leading cause of motor disability. As prevalence rises, the need for objective, scalable, and reproducible tools for motor assessment becomes increasingly urgent. The severity of motor impairment in PD is most commonly evaluated using the Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS), with Part III representing the gold standard for clinical motor assessment in both research and outpatient settings [6].

Despite being a globally validated and expert-consensus-based clinical scale, the MDS-UPDRS includes motor examinations that rely on subjective visual assessment and clinical judgment, which may be affected by inter-rater variability and human error, particularly in longitudinal evaluations and multicenter studies [7].

Several technology-based solutions have been proposed to complement clinical observation, including wearable sensors and vision-based systems [8,9,10,11]. Wearables enable continuous monitoring but require physical instrumentation and calibration, potentially limiting usability in routine practice.

Contactless approaches, on the other hand, preserve the observational nature of the MDS-UPDRS while enabling objective quantification. Recent advances in artificial intelligence, machine learning, and deep learning have significantly expanded the ability to extract clinically relevant information from complex biomedical data [12,13,14,15,16,17,18,19]. In particular, computer vision and pose estimation techniques now allow quantitative reconstruction of human kinematics from video streams, overcoming limitations of subjective visual inspection.

1.2. Proposed Technological Framework

This study adopts a supervised learning paradigm in which algorithms are trained to associate image-based inputs with anatomically meaningful outputs. The proposed framework integrates stereo vision and deep learning-based pose estimation to reconstruct three-dimensional body kinematics from synchronized video streams [13,20,21]. Stereo imaging enables depth recovery from paired camera views, while convolutional neural network-based pose estimation identifies anatomical KeyPoints representing body segments.

By analyzing the temporal evolution of these KeyPoints, the system extracts quantitative descriptors aligned with the motor constructs assessed in the MDS-UPDRS Part III. The present work extends a previous clinical proof-of-concept study of the proposed vision-based assessment system [22] by broadening quantitative coverage to 17 of the 18 motor items of the MDS-UPDRS Part III scale. Detailed acquisition settings, signal processing steps, and feature definitions are described in Section 2.

1.3. Study Aim and Contributions

Importantly, the present study focuses on healthy subjects as a critical step in the validation and calibration of the automated assessment framework. By analyzing a population without motor disorders, a reference baseline is established that captures variability in motor features within a healthy reference cohort and provides a robust reference for future investigations involving PD patients.

It is important to emphasize that the healthy cohort analyzed in this study is not intended to define population-wide reference baseline values. Certain motor phenomena that are central to Parkinson’s disease, such as true freezing episodes, pathological fatigability, or marked motor hesitations, cannot be fully represented in healthy subjects. The present baseline analysis therefore focuses on physiological variability and methodological validation, while the characterization of disease-specific motor patterns will require dedicated validation in PD cohorts. Accordingly, the aim of the present study is not to establish diagnostic or severity metrics in PD, but to define and characterize a healthy reference framework necessary for future patient-oriented validation.

The focus is therefore on methodological robustness, feature standardization, and analysis of the internal structure of the derived indices.

Within this scope, the main contributions of this work are: (i) the implementation of a contactless multimodal acquisition and processing pipeline aligned with MDS-UPDRS Part III tasks; (ii) the definition of quantitative Motor Item Indices derived from standardized kinematic and acoustic descriptors; and (iii) the establishment of an internal reference baseline in healthy subjects to support future clinical validation.

The potential applicability of the framework for early detection or monitoring of subtle motor changes cannot be inferred from the present healthy-cohort analysis. However, the methodological architecture established here provides the necessary quantitative foundation for such investigations in appropriately designed clinical studies.

2. Methods and Materials

2.1. Methodological Framework

Motor impairment in PD is commonly assessed using the MDS-UPDRS [23]. In this work we focus on Part III (Motor Examination), which comprises 18 items scored on an ordinal scale from 0 (normal) to 4 (severe) and covers a broad range of motor domains (e.g., speech, facial expression, bradykinesia, tremor, gait and posture). The proposed framework is designed to align the experimental tasks and derived descriptors with the structure of Part III.

2.1.1. Image Processing and 2D-KeyPoints Computing

Once the videos for each participant in the test campaign have been collected, the KeyPoints of each subject must be located in the acquired images before motor features can be extracted. The input images are first pre-processed to minimize mismatches and other sources of error in the subsequent steps. The subjects in each image are then detected and their poses recovered.

For this task, we use the OpenPose software package (v1.5.1, Carnegie Mellon University, Pittsburgh, PA, USA) [24], a well-known tool for multi-person 2D pose estimation. OpenPose works with a bottom-up approach and uses a feed-forward multi-stage Convolutional Neural Network (CNN) to simultaneously predict a set of 2D confidence maps of human body part locations and a set of 2D vector fields, known as Part Affinity Fields (PAFs), which encode the degree of association between body parts. The confidence maps and affinity fields are then parsed through a greedy inference process, which outputs the 2D image coordinates of the KeyPoints for each subject detected in the image. Several models are available in OpenPose to approximate the main features of the human body and to identify individuals. In this work, a FULL-BODY model is adopted (see Figure 1) by combining three different models:

The BODY-25 model, which describes the posture of the head, torso, arms, legs, and feet using 25 KeyPoints connected by 24 segments.
The HAND-21 model, which describes the posture of each hand.
The FACE-68 model, which describes the posture of the face.

Overall, the posture of a subject is represented by four vectors (body, face, left hand, and right hand) for a total of 135 KeyPoints. Each KeyPoint is assigned to one of these vectors by prefixing its index with BO (body), FA (face), LH (left hand), or RH (right hand). OpenPose relies on the NVIDIA libraries CUDA 10.0 (NVIDIA Corporation, Santa Clara, CA, USA) [25] and cuDNN 7.2 (CUDA Deep Neural Network, NVIDIA Corporation, Santa Clara, CA, USA) [26], a GPU-accelerated library of primitives for deep neural networks. Our system runs on an HP Z6 workstation (HP Inc., Palo Alto, CA, USA) with an NVIDIA Quadro RTX 8000 GPU (NVIDIA Corporation, Santa Clara, CA, USA) with 48 GB of GDDR6 memory.

Using the OpenPose library, our system computes for each frame the image coordinates of the 135 KeyPoints for each person who appears within the scene. While OpenPose provides robust multi-person pose estimation under controlled conditions, its performance may be affected by occlusions, lighting variability, and clothing.

To mitigate these effects, temporal consistency checks and multi-camera fusion were applied, reducing the impact of occasional KeyPoint detection errors on downstream feature extraction.

2.1.2. Audio Processing and Speech Analysis

The audio stream, acquired using the AKG C417PP lavalier microphone (AKG Acoustics, Vienna, Austria) [27], was processed independently from the visual branch while maintaining temporal synchronization. Although audio and stereo video streams were recorded concurrently, the two modalities serve distinct roles within the proposed framework. Audio data were used exclusively for the assessment of speech-related impairment corresponding to the Speech item of the MDS-UPDRS, whereas all other motor items were quantified solely from three-dimensional kinematic information.

Speech recordings were analyzed through an automated Dysarthria Analyzer pipeline following established methodologies for quantitative characterization of Parkinsonian speech [28,29,30]. Acoustic descriptors were extracted to capture production-related characteristics such as loudness modulation, pitch variability, temporal organization, and articulation-related dynamics.

In parallel, speech intelligibility was quantified using an Automatic Speech Recognition (ASR) system. The ASR-generated transcript was compared to a predefined reference text, and similarity was evaluated using a Jaccard distance-based metric. This approach provides an objective and reproducible estimate of functional speech intelligibility, complementing signal-level acoustic descriptors.
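The transcript comparison can be illustrated with a minimal sketch. It assumes a word-level Jaccard distance computed on lowercased token sets; the tokenization, the function names, and the example sentences are illustrative assumptions and are not taken from the actual pipeline.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens (punctuation dropped)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard_distance(reference: str, hypothesis: str) -> float:
    """Jaccard distance between the word sets of two texts: 1 - |A & B| / |A | B|."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    union = ref | hyp
    if not union:  # both texts empty: treat them as identical
        return 0.0
    return 1.0 - len(ref & hyp) / len(union)


# Hypothetical reference text and ASR transcript (not from the study protocol).
reference = "the quick brown fox jumps over the lazy dog"
transcript = "the quick brown fox jumped over a lazy dog"
distance = jaccard_distance(reference, transcript)
print(f"Jaccard distance: {distance:.2f}")
```

A distance of 0 means that the transcript contains exactly the reference vocabulary, and 1 means that no word is shared. Because the measure is set-based, it ignores word order and repetitions, which is one reason it is best read alongside the signal-level acoustic descriptors.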

The integration of production-related acoustic features and intelligibility measures reflects the multidimensional structure of clinical speech assessment in Parkinson’s disease and enables the derivation of structured speech descriptors suitable for aggregation within the proposed analytical framework.

Apart from the speech-related features extracted from the microphone signal (loss of modulation, volume, and intelligibility), all motor features are obtained from the intelligent video analysis system based on the 2D KeyPoints detected in the images acquired by the two stereo cameras. For each detected 2D KeyPoint, the corresponding three-dimensional position is retrieved from the point cloud, yielding a set of 3D KeyPoints for each camera.

These raw 3D KeyPoints are then validated by removing false positives due to objects or other elements in the environment, enforcing temporal consistency with respect to previously detected poses, and correcting potential mismatches in tests involving more than one subject. After this validation stage, the two sets of 3D KeyPoints coming from the two stereo cameras are merged into a single unified set of 3D KeyPoints per frame [31,32].
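The lookup and merging steps can be sketched as follows. This is only an illustration under stated assumptions: the point cloud is taken to be an organized H x W array aligned with the image, low-confidence or out-of-image KeyPoints are simply rejected, and the two views are merged by averaging the valid estimates after a known rigid transform. The function names, the confidence threshold, and the averaging rule are hypothetical; the validation and fusion actually used follow [31,32].

```python
import numpy as np


def lift_keypoints(kp_2d, point_cloud, min_conf=0.1):
    """Look up the 3D position of each 2D KeyPoint in an organized point cloud.

    kp_2d:       (K, 3) array of (u, v, confidence) in pixel coordinates.
    point_cloud: (H, W, 3) array of (x, y, z) per pixel; NaN where depth is invalid.
    Returns a (K, 3) array with NaN rows for rejected KeyPoints.
    """
    h, w, _ = point_cloud.shape
    out = np.full((len(kp_2d), 3), np.nan)
    for i, (u, v, conf) in enumerate(kp_2d):
        if conf < min_conf:
            continue  # detection too uncertain to be trusted
        col, row = int(round(float(u))), int(round(float(v)))
        if 0 <= row < h and 0 <= col < w:
            out[i] = point_cloud[row, col]
    return out


def merge_views(kp_a, kp_b, rot_b_to_a, trans_b_to_a):
    """Map camera-B KeyPoints into camera A's frame and average the valid estimates."""
    kp_b_in_a = kp_b @ rot_b_to_a.T + trans_b_to_a
    stacked = np.stack([kp_a, kp_b_in_a])            # (2, K, 3)
    valid = ~np.isnan(stacked).any(axis=2)           # (2, K)
    counts = valid.sum(axis=0)                       # views available per KeyPoint
    summed = np.where(valid[..., None], stacked, 0.0).sum(axis=0)
    merged = np.full(kp_a.shape, np.nan)
    seen = counts > 0
    merged[seen] = summed[seen] / counts[seen, None]
    return merged


# Hypothetical 4 x 5 organized cloud in which pixel (row, col) maps to (col, row, 2.0).
rows, cols = np.mgrid[0:4, 0:5]
cloud = np.dstack([cols, rows, np.full_like(rows, 2)]).astype(float)
kp_2d = np.array([[1.2, 2.6, 0.9], [3.0, 1.0, 0.0], [10.0, 10.0, 0.9]])
kp_cam_a = lift_keypoints(kp_2d, cloud)
kp_cam_b = np.array([[3.0, 3.0, 2.0], [5.0, 5.0, 5.0], [np.nan, np.nan, np.nan]])
merged = merge_views(kp_cam_a, kp_cam_b, np.eye(3), np.zeros(3))
```

In this toy case, the first KeyPoint is seen by both cameras and averaged, the second is recovered from camera B alone, and the third remains undefined because neither view provides a valid estimate.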

The resulting trajectories are analyzed to compute kinematic features [33] corresponding to the clinical descriptors used in the MDS-UPDRS Part III [34,35,36]. To assess the motor skills of the subjects, a clear space is set up and objects such as chairs or obstacles are added as required by the specific test being performed. All evaluations were conducted using two stereo cameras with their baselines oriented perpendicular to each other.

The distance between the two stereo cameras is adjusted to the type of test: the cameras are placed closer together for tests in which the subject remains stationary and farther apart for walking tests (Figure 2).

2.1.3. Feature Standardization and Motor Item Index Computation

For each MDS-UPDRS Part III item, a specific subset of quantitative features was selected to represent the motor performance required by the task. The analysis was conducted on the reference group of 15 healthy subjects. For each subject s and each feature j, the feature value x_{s,j} was averaged across the three repetitions of the corresponding test. Descriptive statistics were then computed across subjects, and the mean μ_j and standard deviation σ_j for each feature were estimated from the healthy dataset.

The measured feature value x_{s,j} was expressed in standardized form as a z-score:

z_{s,j} = \frac{x_{s,j} - \mu_{j}}{\sigma_{j}} . \qquad (1)

A z-score expresses how many standard deviations a measurement deviates from the expected value in the control population. Values close to zero indicate performance consistent with the internal reference distribution, whereas large positive or negative values denote an increased deviation (e.g., slower, less regular, or less stable movement).

All standardized features contributing to a specific motor item were aggregated into a composite index [36,37], referred to as the Motor Item Index (MII). For each subject s and motor item k, the corresponding MII was computed as

\mathrm{MII}_{s,k} = \sum_{j=1}^{N_{k}} w_{k,j} \, z_{s,j} \qquad (2)

where N_k is the number of features associated with item k, and w_{k,j} are the relative weights assigned to each feature (with $\sum_{j=1}^{N_{k}} w_{k,j} = 1$). For the baseline analysis, uniform weighting was adopted as a pragmatic modeling assumption to avoid introducing data-driven bias in the absence of patient-level clinical outcomes. Accordingly, the composite indices should be interpreted as operational descriptors relative to the normative reference rather than clinically validated constructs.

This conservative choice reflects the absence of ground-truth clinical scores in the present baseline dataset. Future work will investigate data-driven and clinically informed weighting strategies, including sensitivity analyses and feature-importance modeling, once patient-level MDS-UPDRS ratings become available.
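Equations (1) and (2) can be illustrated with a short sketch. It assumes a subjects-by-features matrix whose entries are already averaged over the three repetitions, and it uses the sample standard deviation (ddof = 1), which is an assumption because the estimator is not specified in the text. The numerical values are invented for illustration only.

```python
import numpy as np


def zscores(x, ddof=1):
    """Standardize each feature column of x (subjects x features), as in Eq. (1)."""
    mu = x.mean(axis=0)
    sigma = x.std(axis=0, ddof=ddof)
    return (x - mu) / sigma


def motor_item_index(z, weights=None):
    """Weighted sum of standardized features per subject, as in Eq. (2)."""
    n_features = z.shape[1]
    if weights is None:  # uniform weighting, as adopted for the baseline analysis
        weights = np.full(n_features, 1.0 / n_features)
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    return z @ weights


# Hypothetical data: 5 subjects x 3 features, each averaged over the repetitions.
x = np.array([
    [1.0, 10.0, 0.5],
    [2.0, 12.0, 0.7],
    [3.0, 11.0, 0.6],
    [4.0, 15.0, 0.9],
    [5.0, 13.0, 0.8],
])
z = zscores(x)
mii = motor_item_index(z)
print(np.round(mii, 3))
```

With uniform weights the index is simply the mean of the item's z-scores for each subject, so its standard deviation across subjects depends on how strongly the features are correlated and generally differs from one.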

Equation (2) provides a single quantitative descriptor of the overall motor performance in the task described by the corresponding MDS-UPDRS item for subject s. In the baseline dataset of healthy controls, the mean of each standardized feature is zero by construction, with a standard deviation of one, as the healthy group itself defines the normalization reference. However, the group-level baseline index for item k,

\overline{\mathrm{MII}}_{k} = \frac{1}{S} \sum_{s=1}^{S} \mathrm{MII}_{s,k} \qquad (3)

where S = 15 is the number of healthy subjects, being a weighted combination of correlated features, does not necessarily preserve these exact properties.

Typically, the distribution of subject-level $\mathrm{MII}_{s,k}$ values exhibits a smaller standard deviation that reflects inter-feature correlations and inter-subject variability. Because $\mathrm{MII}_{s,k}$ is expressed in standardized units relative to the healthy reference cohort, it enables quantitative comparison with respect to an internal reference baseline, thereby providing a descriptive framework for future patient-based investigations aligned with established clinical assessment paradigms [34,35,38].

This approach yields a continuous, unit-free score that reflects the degree of deviation relative to the internal reference baseline. Its interpretation may follow conventional descriptive thresholds commonly adopted in quantitative assessment frameworks:

Values with $|\mathrm{MII}_{s,k}| \leq 0.5\,\mathrm{SD}$ fall within the central range of the reference distribution;
Values in the range $0.5\,\mathrm{SD} < |\mathrm{MII}_{s,k}| \leq 1.0\,\mathrm{SD}$ represent moderate deviations from the cohort mean;
Values with $|\mathrm{MII}_{s,k}| > 1.0\,\mathrm{SD}$ correspond to upper-tail deviations within the internal reference distribution;
Values with $|\mathrm{MII}_{s,k}| > 2.0\,\mathrm{SD}$ reflect extreme positions relative to the internal reference baseline.

These statistical thresholds are intended to support standardized interpretation of deviations relative to the internal healthy reference distribution.
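As a small illustration, the thresholds listed above can be applied as a lookup that returns the most extreme band a value reaches. The band labels and the function name are shorthand introduced here for the example and are not terminology defined by the framework.

```python
def deviation_band(mii: float) -> str:
    """Return the descriptive band of an MII value, checking the widest band first."""
    magnitude = abs(mii)
    if magnitude > 2.0:
        return "extreme"
    if magnitude > 1.0:
        return "upper-tail"
    if magnitude > 0.5:
        return "moderate"
    return "central"


# Hypothetical index values for five subjects on one motor item.
for value in (0.2, -0.75, 1.0, 1.5, -2.3):
    print(f"{value:+.2f} -> {deviation_band(value)}")
```

Because the bands are defined on the absolute value, the lookup does not distinguish the direction of the deviation; the sign of the index has to be inspected separately when direction matters.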

They are descriptive with respect to the normative sample and do not imply clinical impairment, diagnostic categorization, or severity grading. The use of z-score normalization assumes approximate symmetry of feature distributions within the reference cohort and is intended here as a pragmatic and interpretable standardization strategy.

In larger datasets, this framework can be readily extended to robust or percentile-based normalization approaches to further mitigate the influence of non-Gaussian feature distributions or outliers. The proposed indices provide continuous, objective descriptors of motor performance that complement, rather than replace, clinical rating scales.

This standardized framework enables quantitative comparison between individuals relative to a healthy reference and provides a foundation for future studies aimed at objective monitoring and integration with traditional clinical scales such as the MDS-UPDRS Part III. Unlike categorical scores, the continuous MII captures subtle intra- and inter-subject variability within the healthy reference cohort, supporting fine-grained quantitative characterization in the baseline definition setting adopted here. In addition to the computation of standardized feature values and the derivation of the MII, further analyses were carried out on the baseline dataset to examine the internal structure and interdependence of the selected kinematic features.

Multivariate analysis is increasingly adopted in quantitative movement analysis frameworks to characterize feature interdependence, reduce redundancy, and enhance the clinical interpretability of composite motor indices [39]. These analyses include:

Spearman correlation matrices, used to quantify monotonic relationships among features.
Principal Component Analysis (PCA) [40], used to summarize dominant patterns of variance within the standardized feature set.

Spearman correlation analysis was performed using the standardized feature set $\{z_{s,j}\}$ for each motor item. For each item k, a Spearman rank-correlation matrix $\rho_{j_{1}, j_{2}}$ was computed by aggregating the feature vectors across the 15 healthy subjects (and across both body sides when applicable).

This matrix provides a pairwise measure of association between features, independent of linearity assumptions and robust to non-Gaussian distributions. The resulting correlation structure makes it possible to identify functional clusters of features, such as those reflecting amplitude-related behavior, temporal variability, or fatigue-related decay. These clusters provide a descriptive overview of how selected features co-vary within the healthy reference cohort and may highlight redundant or complementary measures within the present sample. PCA was then applied to the same standardized feature matrix to examine the multivariate structure of the kinematic descriptors. Dimensionality reduction techniques, including PCA, are widely employed in wearable-based motor assessment to identify dominant movement patterns, reduce feature redundancy, and improve the interpretability of digital motor biomarkers [41].

PCA was conducted on the covariance matrix of $\{z_{s,j}\}$, yielding a set of orthogonal principal components ordered according to the amount of variance explained. For each item, the first two principal components typically capture the majority of the total variance and describe the dominant modes of variation within the present healthy reference cohort (for instance, a primary component reflecting amplitude-related variation and a secondary component associated with timing-related variability within this sample).

The PCA biplots provide a joint visualization of subject distribution and feature loadings, enabling the interpretation of measurement dimensions that contribute most strongly to inter-subject differences. In healthy subjects, samples usually cluster compactly in the PCA space, reflecting the constrained variability observed within the healthy reference cohort.

The loading vectors provide a descriptive indication of how selected features contribute to partially distinct patterns of variation within the present sample, or whether some descriptors exhibit redundant behavior.

Together, the Spearman correlation matrices and PCA constitute complementary exploratory tools that provide a descriptive overview of feature relationships and redundancy patterns within the present baseline dataset. These analyses contribute to the methodological characterization of the framework by illustrating how selected descriptors co-vary within the healthy reference cohort, without implying structural validation or independence of underlying measurement dimensions.
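Both exploratory analyses can be sketched with NumPy alone. The example computes Spearman correlations as Pearson correlations of average ranks and performs PCA by eigendecomposition of the covariance matrix of the standardized features. The synthetic 15 x 4 matrix, in which the second feature is a monotonic function of the first, is an assumption made purely for illustration.

```python
import numpy as np


def average_ranks(col):
    """Rank values from 1 to n, assigning tied values their mean rank."""
    order = np.argsort(col, kind="mergesort")
    ranks = np.empty(len(col), dtype=float)
    ranks[order] = np.arange(1, len(col) + 1)
    for value in np.unique(col):
        tied = col == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_matrix(z):
    """Spearman rank correlation between feature columns (Pearson on ranks)."""
    ranked = np.column_stack([average_ranks(z[:, j]) for j in range(z.shape[1])])
    return np.corrcoef(ranked, rowvar=False)


def pca(z):
    """PCA via eigendecomposition of the covariance matrix of z."""
    centered = z - z.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # explained-variance ratios
    scores = centered @ eigvecs                # subject coordinates for a biplot
    return explained, eigvecs, scores


# Synthetic data: 15 subjects x 4 features; feature 1 is monotonic in feature 0.
rng = np.random.default_rng(0)
x = rng.normal(size=(15, 4))
x[:, 1] = x[:, 0] ** 3
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

rho = spearman_matrix(z)
explained, loadings, scores = pca(z)
```

In this synthetic case the rank correlation between the first two features equals one although their relationship is nonlinear, which illustrates why a rank-based measure is suited to flagging redundant descriptors. The columns of `loadings` and the first two columns of `scores` are the quantities that would be drawn in a biplot.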

Each MII is designed to provide a quantitative descriptor aligned with the motor construct assessed by the corresponding MDS-UPDRS item, while preserving the continuous and multidimensional nature of quantitative kinematic and acoustic measurements. The definition of MIIs was designed to closely mirror the structure and scoring logic of the corresponding items in the MDS-UPDRS Part III.

In particular, the choice between unilateral and bilateral indices was driven by the clinical lateralization of the item, rather than by the execution modality of the task itself. For motor items that are clinically scored separately for the left and right sides (e.g., finger tapping, hand movements, toe tapping, leg agility, postural and kinetic tremor), independent indices were computed for each limb, even when the task was performed simultaneously with both sides.

Conversely, for non-lateralized items that assess global or bimanual motor performance (e.g., pronation–supination movements of the hands), a single composite index was derived to capture overall task execution.

This design choice ensures methodological consistency with the MDS-UPDRS clinical framework while preserving sensitivity to side-specific motor asymmetries when clinically relevant. Throughout the different motor tasks, identical symbols are used to denote features that represent the same underlying clinical construct (e.g., amplitude preservation, movement regularity). While their computational implementation may vary according to the biomechanics of the specific task, their intended physiological interpretation remains consistent across items.

In addition, to quantify the test–retest reliability of the extracted task-specific features under stable physiological conditions, within-subject coefficients of variation (wCV) and intraclass correlation coefficients (ICC(2,1)) were computed across the three repetitions performed by each subject for all evaluable tasks (see Supplementary Table S1). In the present methodological context, ICC values above 0.75 are generally interpreted as indicative of acceptable test–retest reliability for digital motor descriptors, whereas values exceeding 0.90 reflect high repeatability under stable physiological conditions.
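The two reliability measures can be sketched for a single feature arranged as a subjects-by-repetitions matrix. The ICC(2,1) expression is the standard two-way random-effects, absolute-agreement, single-measurement form. The wCV is computed here as the root mean square of the per-subject coefficients of variation, which is one common definition and an assumption, since the exact formula is not given in the text. The data values are invented.

```python
import numpy as np


def within_subject_cv(x):
    """Root mean square of per-subject CVs; x is subjects x repetitions."""
    cv = x.std(axis=1, ddof=1) / x.mean(axis=1)
    return float(np.sqrt(np.mean(cv ** 2)))


def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement."""
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between repetitions
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


# Hypothetical feature values: 4 subjects x 3 repetitions.
reps = np.array([
    [10.1, 10.3, 9.9],
    [12.0, 12.4, 12.1],
    [8.7, 8.5, 8.9],
    [11.2, 11.0, 11.3],
])
wcv = within_subject_cv(reps)
icc = icc_2_1(reps)
print(f"wCV = {wcv:.3f}, ICC(2,1) = {icc:.3f}")
```

The ICC is high in this toy example because the differences between subjects are much larger than the scatter across repetitions. In a homogeneous healthy cohort the between-subject spread can be small, so a low ICC does not by itself indicate a noisy measurement, and the wCV provides a complementary view.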

All implementation parameters governing video acquisition (ZED stereo RGB-D at 60 fps), 3D KeyPoint reconstruction via stereo triangulation, OpenPose-based pose extraction, audio preprocessing, voice activity detection, pitch extraction, spectral analysis, ASR configuration (OpenAI Whisper Python implementation, version 20240930, OpenAI, San Francisco, CA, USA, using the multilingual “small” model) and composite index normalization are consolidated in Supplementary Table S2 to ensure computational reproducibility.

To provide a quantitative indication of baseline stability, the normalization parameters (µ and σ) were recomputed in a leave-one-subject-out (LOSO) fashion for each item. Across all motor tasks, the LOSO-derived reference statistics showed limited dispersion relative to the full-sample estimates, while recalculation of the corresponding Motor Item Indices resulted in minimal changes, with mean absolute differences typically below 0.05 standardized units and maximum deviations below 0.20 in most cases, indicating limited sensitivity of the normalization baseline to individual-subject inclusion.
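A leave-one-subject-out check of this kind can be sketched as follows. The sketch assumes uniform weights and the sample standard deviation, and it measures the shift as the absolute change in every subject's index when the reference statistics are estimated without one subject. The exact comparison used in the study is not specified, so this is one plausible reading, and the data are synthetic.

```python
import numpy as np


def mii(x, mu, sigma):
    """Uniformly weighted Motor Item Index from raw features and reference statistics."""
    return ((x - mu) / sigma).mean(axis=1)


def loso_mii_shift(x, ddof=1):
    """Absolute MII change per subject (columns) for each left-out subject (rows)."""
    n = x.shape[0]
    full = mii(x, x.mean(axis=0), x.std(axis=0, ddof=ddof))
    shifts = np.empty((n, n))
    for left_out in range(n):
        ref = np.delete(x, left_out, axis=0)   # reference cohort without one subject
        loso = mii(x, ref.mean(axis=0), ref.std(axis=0, ddof=ddof))
        shifts[left_out] = np.abs(loso - full)
    return shifts


# Synthetic data: 15 subjects x 4 features for one motor item.
rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=(15, 4))
shifts = loso_mii_shift(x)
mean_abs, max_abs = shifts.mean(), shifts.max()
print(f"mean |dMII| = {mean_abs:.3f}, max |dMII| = {max_abs:.3f}")
```

The mean and maximum of the resulting matrix correspond to the kind of summary figures quoted above. With only 15 subjects, removing an outlying subject can shift σ noticeably, which is the situation this check is meant to expose.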

2.2. Experimental Setup and Materials

Following the methodological framework described in Section 2.1, the experimental setup adopted for data acquisition and the characteristics of the reference cohort used for pipeline calibration are detailed below.

2.2.1. Participants

The data presented in this paper were collected in the Energy and Data Science Laboratory at the ENEA “Casaccia” Research Centre using a synchronized multimodal acquisition setup integrating audio and stereo video streams. The study included 15 healthy subjects (12 men, 3 women), who provided written informed consent after receiving a comprehensive explanation of the study tools, procedures, and objectives. Participants were recruited among healthy staff members of the Research Centre in order to span a broad range of anthropometric characteristics. Exclusion criteria included history of neurological disorders, diagnosed speech or hearing impairment, musculoskeletal conditions affecting gait or posture, and use of medications known to interfere with motor or vocal performance. Table 1 summarizes the main physical features of the cohort.

Although limited in size, the cohort is consistent with previous proof-of-concept studies on digital motor and speech biomarkers in PD and age-matched controls [42,43,44,45]. The aim of the control dataset was not to define population-wide reference values, but to calibrate the processing pipeline and establish a standardized baseline for feature normalization.

The limited sample size and the predominance of male participants reflect the exploratory nature of the present study and the availability of volunteers within the research center. Potential effects of sex-related differences on speech, facial expressivity, and movement kinematics were not investigated at this stage and will be addressed in future studies involving larger and more balanced populations.

2.2.2. Multimodal Acquisition System

As illustrated in Figure 3, the proposed system processes synchronized audio and stereo video data through two parallel but temporally aligned branches. Audio recordings are analyzed to extract speech-related descriptors, while stereo image pairs are processed to reconstruct three-dimensional body kinematics. The outputs of the two branches are subsequently fused at the feature level to obtain subject-specific multimodal descriptors.

Although audio and stereo video streams are acquired synchronously, the two modalities serve distinct and complementary roles within the proposed framework. Specifically, audio data are exclusively used for the assessment of speech-related impairment, corresponding to the Speech item of the MDS-UPDRS, whereas motor items are quantified solely based on visual and 3D kinematic information.

The audio stream is not used for the analysis of limb or axial motor tasks but is recorded concurrently to ensure temporal consistency across modalities and to enable a unified experimental protocol. Speech data were acquired using an AKG C417PP lavalier microphone [27], selected for its omnidirectional polar pattern, compact form factor, and suitability for hands-free recording. The microphone placement ensured stable mouth-to-microphone distance across subjects and minimized variability due to head or trunk movements. The audio stream was continuously recorded and temporally synchronized with the stereo video signals.

As shown in Figure 3, the audio stream feeds an automated Dysarthria Analyzer pipeline, which combines acoustic feature extraction with an objective estimate of speech intelligibility derived from automatic speech recognition (ASR). This design reflects the dual nature of clinical speech assessment in PD, which integrates both signal-level production characteristics (e.g., loudness, pitch modulation, speech rate) and the functional ability to convey linguistic information. Acoustic descriptors were extracted following a validated framework for Parkinsonian speech analysis, while speech intelligibility was quantified by comparing ASR-generated transcripts with a predefined reference text using a Jaccard distance-based metric. The resulting speech-related features provide a compact and reproducible representation of vocal motor control and intelligibility, suitable for integration with motor descriptors.

For each subject, a total of 60 audio recordings were collected, resulting in 900 audio sequences across the entire cohort.

2.2.3. Stereo Vision and Recording Configuration

Visual data were acquired using two Stereolabs ZED2 stereoscopic cameras (Stereolabs, San Francisco, CA, USA) [46], positioned at predetermined viewpoints to fully capture whole-body movements. Each ZED2 device consists of two horizontally separated cameras with coplanar optical planes, and a baseline of 12 cm.

Depth information was computed using the ZED2 software development kit (ZED SDK v4.1.3), which leverages NVIDIA CUDA libraries [25] to perform real-time stereo matching and neural-network-based depth estimation on a GPU. The ZED2 cameras provide a depth range of 0.2–20 m, with a maximum resolution of 4416 × 1242 pixels and a maximum frame rate of 100 Hz. In this study, recordings were performed at a resolution of 2560 × 720 pixels and 60 fps, with left and right images of 1280 × 720 pixels each.
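The relation between the recorded 2560 × 720 frame and the two 1280 × 720 views can be made concrete with a short sketch. It assumes that the frame is stored side by side with the left view in the left half; the function name and the placeholder frame are illustrative.

```python
import numpy as np


def split_side_by_side(frame):
    """Split a side-by-side stereo frame (H, 2W, C) into left and right views."""
    width = frame.shape[1]
    if width % 2:
        raise ValueError("side-by-side frame must have an even width")
    half = width // 2
    return frame[:, :half], frame[:, half:]


# Placeholder for one recorded frame: 720 rows, 2560 columns, 3 color channels.
frame = np.zeros((720, 2560, 3), dtype=np.uint8)
left, right = split_side_by_side(frame)
print(left.shape, right.shape)
```

The two returned arrays are views into the original frame rather than copies, so the split adds no memory overhead when it is applied to every frame of a 60 fps recording.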

For each camera, the reference coordinate system was defined with the origin at the left camera of the stereoscopic pair; the x-axis lies in the optical plane, the y-axis follows the baseline direction, and the z-axis is orthogonal to both and points upward. The relative pose between the two cameras was known and fixed. Stereo image pairs were processed to generate dense 3D point clouds representing the external surface of the scene. Each point cloud was stored in four 32-bit floating-point channels: three for spatial coordinates (x, y, z) and one for color information.

In parallel, 2D body KeyPoints were extracted from RGB images using the OpenPose full-body model. As depicted in Figure 3, 2D KeyPoints and stereo-derived depth information were fused to reconstruct raw 3D body KeyPoints, which were subsequently validated and merged to obtain a consistent and anatomically plausible 3D skeletal representation.
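The fusion step can be illustrated with a minimal sketch. It assumes that the point cloud is organized (one x, y, z sample per pixel, registered to the image on which the 2D KeyPoints were detected) and that invalid depth values are stored as NaN. The function name lift_keypoints_to_3d and the neighbourhood size half_window are hypothetical; the subsequent validation and merging across the two cameras are not shown.

```python
import numpy as np

def lift_keypoints_to_3d(keypoints_2d, point_cloud, half_window=2):
    """Sample an organized point cloud (H x W x 4: x, y, z, color) at 2D KeyPoints.

    keypoints_2d: array-like of shape (K, 2) with (u, v) pixel coordinates.
    Returns an array of shape (K, 3); rows are NaN where no valid depth exists.
    """
    height, width = point_cloud.shape[:2]
    out = np.full((len(keypoints_2d), 3), np.nan, dtype=np.float32)
    for k, (u, v) in enumerate(np.asarray(keypoints_2d, dtype=float)):
        if not (np.isfinite(u) and np.isfinite(v)):
            continue  # KeyPoint not detected in this frame
        col, row = int(round(u)), int(round(v))
        if not (0 <= col < width and 0 <= row < height):
            continue  # KeyPoint falls outside the image
        r0, r1 = max(row - half_window, 0), min(row + half_window + 1, height)
        c0, c1 = max(col - half_window, 0), min(col + half_window + 1, width)
        patch = point_cloud[r0:r1, c0:c1, :3].reshape(-1, 3)
        valid = patch[np.isfinite(patch).all(axis=1)]
        if len(valid) > 0:
            out[k] = np.median(valid, axis=0)  # median resists depth outliers
    return out
```

Taking the median over a small neighbourhood, rather than reading a single pixel, is one simple way to tolerate isolated holes in the depth map around a KeyPoint.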

2.2.4. Experimental Protocol and Dataset Structure

Participants were asked to perform a standardized set of motor tasks commonly used for the assessment of motor function. Each task was recorded simultaneously by both 3D cameras and repeated three times. For each subject, a total of 120 video recordings were collected, resulting in 1800 stereo video sequences across the entire cohort. Audio recordings were acquired concurrently with the video data to ensure precise temporal alignment between speech and motor features.

Overall, the dataset comprises 900 audio files and 1800 stereo video recordings for the entire cohort. The resulting dataset provides synchronized audio, 2D visual, and 3D kinematic information for each subject and task, enabling the extraction of multimodal descriptors that jointly characterize speech and movement. These descriptors constitute the input for subsequent feature extraction and analysis stages, as detailed in the following sections.

2.3. Operationalization of Individual MDS-UPDRS Items

For all items described below, task-specific features were standardized using the z-score procedure defined in Section 2.1.3 and aggregated into the corresponding MII through Equation (2), unless otherwise specified. The implementation of each task follows the scoring logic of the corresponding MDS-UPDRS Part III item, preserving distinctions between unilateral and bilateral ratings where clinically relevant.
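As an illustration of the standardization step only, the following sketch assumes that each feature is standardized with the mean and standard deviation of the healthy reference cohort. The function name, the dictionary layout, and the use of the sample standard deviation are illustrative assumptions; the aggregation of Equation (2) is not reproduced here.

```python
import numpy as np

def zscore_against_reference(subject_features, reference_features):
    """Standardize one subject's features with the healthy cohort's mean and SD.

    subject_features: dict mapping feature name -> value for one subject.
    reference_features: dict mapping feature name -> healthy-cohort values.
    """
    z = {}
    for name, value in subject_features.items():
        ref = np.asarray(reference_features[name], dtype=float)
        mu = ref.mean()
        sigma = ref.std(ddof=1)  # sample SD; the choice of ddof is an assumption
        # A constant reference feature carries no scale; map it to 0 (assumption).
        z[name] = (value - mu) / sigma if sigma > 0 else 0.0
    return z
```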

For each item, the experimental configuration and feature extraction strategy were defined to reflect the underlying clinical construct through quantitative descriptors derived from synchronized audio and stereo video data. All indices described below are continuous measures expressed relative to the internal healthy reference cohort.

The following subsections detail the methodological implementation adopted for each item.

2.3.1. Testing Speech

Speech impairment in PD, commonly referred to as hypokinetic dysarthria, arises from altered control of respiratory, phonatory, and articulatory mechanisms and typically manifests as reduced vocal intensity, limited prosody, imprecise articulation, and decreased intelligibility [47,48,49]. In MDS-UPDRS Part III, item 3.1 provides a global perceptual evaluation of speech clarity, loudness, and modulation during a short spoken task [6]. Although clinically meaningful, this judgment is inherently subjective and may be affected by inter-rater variability, motivating the development of quantitative approaches [50]. To obtain objective descriptors aligned with this item, we combined acoustic measures derived from an automated Dysarthria Analyzer with a complementary estimate of speech intelligibility based on text similarity. This pipeline was used exclusively for item 3.1, whereas all other items rely on visual and kinematic information. The Dysarthria Analyzer follows the methodology proposed by Tsanas et al. and has been widely adopted for quantitative characterization of Parkinsonian speech [28,29,30].

Speech was recorded using a microphone placed at point P₂ (Figure 2e). Participants were instructed to read a predefined text at a comfortable pace and loudness. The signal was processed in parallel by the Dysarthria Analyzer and by an intelligibility module. A subset of acoustic descriptors was selected to represent perceptual dimensions relevant to clinical scoring:

SPL_mean: Mean sound pressure level, reflecting overall vocal intensity and hypophonia (dB).
F0_var: Variability of the fundamental frequency, capturing pitch modulation and prosodic monotony (Hz).
SpeechRate: Speech rate, reflecting alterations such as tachyphemia or slowed speech (syllables/s).
PauseRatio: Fraction of time classified as silence or pauses, associated with speech fluency and freezing phenomena (dimensionless).
HNR: Harmonics-to-noise ratio, indicating voice quality and articulatory precision (dB).

While these variables describe speech production, they only indirectly capture the functional outcome of communication. Because clinical evaluation also depends on how well the content is understood, we complemented acoustic analysis with an automatic estimate of intelligibility.

The spoken utterance was transcribed using an automatic speech recognition (ASR) system. Whisper was selected for its robustness to heterogeneous acoustic conditions and atypical speech patterns, including dysarthria [51,52,53]. Speech intelligibility analysis was conducted using the OpenAI-whisper Python implementation with the pre-trained multilingual "small" model, loaded via the whisper.load_model("small") interface. The transcription served exclusively to compute a similarity-based intelligibility score. Let T_ref denote the set of tokens in the reference text and T_asr the set of tokens in the ASR-generated transcript after standard normalization. Speech intelligibility was quantified using the Jaccard distance:

D_{j} = 1 - \frac{|T_{ref} \cap T_{asr}|}{|T_{ref} \cup T_{asr}|}

(4)

where D_j = 0 denotes perfect correspondence and larger values indicate increasing information loss. This metric acts as a functional proxy rather than a comprehensive linguistic evaluation; more advanced measures will be investigated in future work.
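A minimal sketch of Equation (4) is given below. The normalization shown (lowercasing and removal of punctuation) is an assumption, since the text only refers to "standard normalization"; the function names are illustrative.

```python
import re

def normalize_tokens(text):
    """Lowercase the text, drop punctuation, and return the set of word tokens."""
    return set(re.findall(r"[^\W_]+", text.lower()))

def jaccard_distance(reference_text, asr_text):
    """Jaccard distance between the token sets of two texts (Equation (4))."""
    t_ref = normalize_tokens(reference_text)
    t_asr = normalize_tokens(asr_text)
    union = t_ref | t_asr
    if not union:
        return 0.0  # two empty texts are treated as identical (assumption)
    return 1.0 - len(t_ref & t_asr) / len(union)
```

For example, a reference text "The quick brown fox" and a transcript "the quick fox!" share three of four distinct tokens, giving D_j = 0.25. In practice the transcript string would be taken from the ASR output, for instance the "text" field returned by the transcribe method of the loaded Whisper model.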

All speech descriptors were computed once per subject, consistent with the single-score nature of item 3.1. The resulting Speech Index (SI) corresponds to the MII for this item within the general aggregation framework described above.

2.3.2. Testing Facial Expression

Parkinson’s disease frequently affects facial motor control, leading to reduced spontaneous and voluntary movements (hypomimia). In MDS-UPDRS Part III, item 3.2 evaluates facial expressiveness, spontaneous blinking, and the ability to generate natural movements during rest and speech [6,54,55,56,57,58].

Although clinically meaningful, this judgment is inherently subjective and may lack sensitivity to subtle alterations. To obtain objective descriptors, facial dynamics were quantified through vision-based tracking of landmarks extracted from the frontal video stream of camera 2 (Figure 2e).

Participants remained seated and still for approximately ten seconds, first maintaining a neutral expression and then producing a brief verbal response, mirroring routine clinical observation. Facial movements were characterized using KeyPoints from the FACE-68 model [32,33,34,35,36]. Eight landmarks were used for blink analysis and four for mouth activity. For each eye, eyelid aperture was estimated as the sum of two vertical inter-KeyPoint distances: d(FA37, FA41) and d(FA38, FA40) for the right eye, and d(FA43, FA47) and d(FA44, FA46) for the left.

Summation improves robustness to local KeyPoint jitter, while z-score normalization mitigates inter-subject differences in facial scale. Rapid reductions in these signals toward zero indicate eyelid closure, enabling reliable detection of blink timing consistent with established video-based approaches [59,60,61]. Mouth activity and smile emergence were quantified through distances d(FA62, FA66) and d(FA60, FA64), which increase during lip opening and smiling.
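The eyelid-aperture signal and a simple blink detector can be sketched as follows. The sketch assumes that the FAxx labels are zero-based indices into the 68-point face array, and it detects blinks as runs in which the z-scored aperture falls below a threshold. The threshold value and the function names are placeholders, not the settings used in this study.

```python
import numpy as np

RIGHT_EYE_PAIRS = [(37, 41), (38, 40)]  # d(FA37, FA41), d(FA38, FA40)
LEFT_EYE_PAIRS = [(43, 47), (44, 46)]   # d(FA43, FA47), d(FA44, FA46)

def eyelid_aperture(face, pairs):
    """Sum of vertical inter-KeyPoint distances; face has shape (T, 68, D)."""
    return sum(np.linalg.norm(face[:, a] - face[:, b], axis=1) for a, b in pairs)

def detect_blinks(aperture, z_threshold=-2.0):
    """Return (start, end) frame indices of eyelid-closure runs.

    A frame counts as closed when the z-scored aperture is below z_threshold.
    The end index is exclusive. The signal must not be constant.
    """
    z = (aperture - aperture.mean()) / aperture.std()
    closed = z < z_threshold
    # +1 marks the first closed frame of a run, -1 the first frame after it.
    edges = np.diff(closed.astype(int), prepend=0, append=0)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))
```

From the returned events, a blink rate follows as the number of events divided by the window duration, and blink durations follow as (end - start) divided by the frame rate.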

These measures provide interpretable proxies of lower facial activation and are conceptually related to established facial description frameworks such as FACS [62]. Prior to feature extraction, all distance trajectories were low-pass filtered (cut-off 6 Hz) to attenuate high-frequency noise. From the processed signals, the following descriptors were extracted. Blink-related features were derived from the eyelid distance signals:

BlinkRate: Number of detected blink events per second, averaged over the observation window. Reduced blink rate is a well-established marker of hypomimia and facial bradykinesia in PD (blinks/s).
IBI_CV: Coefficient of variation of the inter-blink intervals (IBI), computed as the ratio between the standard deviation and the mean of successive blink intervals. This feature quantifies blink temporal regularity, with higher values indicating more irregular and sporadic blinking patterns (dimensionless).
BlinkDur_mean: Mean duration of blink events, measured as the time between eyelid closing and reopening. Alterations in blink duration may reflect impaired coordination of orbicularis oculi activation (s).
BlinkAmp: Mean blink amplitude, defined as the difference between baseline eyelid separation and the minimum distance reached during a blink. This feature reflects the completeness and strength of eyelid closure (m).
Smile-related features were derived from lip separation distance signals:
LipSep_mean: Mean lip separation over the observation window, reflecting the overall level of facial engagement and mouth opening (m).
LipSep_max: Maximum lip separation observed during the task, capturing the peak amplitude of voluntary or spontaneous smiling movements (m).
SmileOnset: Latency between the start of the observation window and the first detected smile activation, defined as the time at which lip separation exceeds a predefined threshold. This feature measures the promptness of facial motor activation (s).
SmileDur: Total cumulative duration during which lip separation remains above the smile threshold, reflecting the ability to sustain facial expressions (s).
SmileCount: Number of distinct smile activations detected during the observation window, providing a measure of facial expressivity frequency (dimensionless).
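The threshold-based smile descriptors listed above can be sketched as follows, assuming a filtered lip-separation signal sampled at a known frame rate. The threshold is passed in as a parameter because its value is not specified here; the function name is illustrative.

```python
import numpy as np

def smile_features(lip_sep, fps, threshold):
    """SmileOnset (s), SmileDur (s), and SmileCount from a lip-separation signal."""
    active = np.asarray(lip_sep, dtype=float) > threshold
    # +1 marks each frame at which the signal rises above the threshold.
    edges = np.diff(active.astype(int), prepend=0)
    onsets = np.flatnonzero(edges == 1)
    return {
        "SmileOnset": onsets[0] / fps if len(onsets) else float("nan"),
        "SmileDur": active.sum() / fps,  # cumulative time above threshold
        "SmileCount": int(len(onsets)),  # number of distinct activations
    }
```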

Features extracted from the three repetitions were aggregated at the subject level to obtain a single set of descriptors per individual. The resulting Facial Expression Index (FEI) represents the MII for this item within the standardized aggregation framework.

2.3.3. Testing Rigidity

Rigidity is a cardinal motor feature of Parkinson’s disease and is defined as increased resistance to passive movement, independent of velocity [6]. In MDS-UPDRS Part III (item 3.3), it is evaluated through examiner-driven manipulation of the neck and limbs while the patient remains relaxed.

The rating integrates haptic perception of resistance, uniformity, intermittency, and side asymmetry. Because rigidity emerges only in response to externally applied forces, it cannot be inferred from vision-based observation of voluntary or spontaneous movements. Passive joint resistance is therefore not recoverable from video-derived kinematics alone.

This limitation is widely acknowledged in the literature on digital biomarkers for PD. Several reviews emphasize that rigidity remains largely inaccessible to contactless sensing modalities and cannot be reliably quantified without physical interaction or direct torque measurements [8,37,38]. Accordingly, the present framework does not attempt to approximate rigidity. This choice prevents the introduction of unreliable surrogate indicators and maintains methodological consistency with the capabilities of video-based assessment.

For this reason, MDS-UPDRS item 3.3 is excluded from automated computation and remains within the domain of standard neurological examination.

2.3.4. Testing Finger Tapping

Finger tapping is a standard task in MDS-UPDRS Part III used to assess upper-limb bradykinesia through amplitude, speed, and rhythm of repetitive movements [6,27]. The protocol, illustrated in Figure 2d, includes separate evaluations for the right and left hands.

Participants were asked to perform ten consecutive taps by repeatedly touching the index finger to the thumb as quickly and as widely as possible. Hand motion was quantified by tracking the index fingertip and thumb tip. The Euclidean distances d(LH04, LH08) and d(RH04, RH08) were computed over time for the left and right hands, respectively. Minima correspond to finger contact, whereas maxima represent full opening. Distance trajectories were low-pass filtered (cut-off 12 Hz) prior to cycle segmentation [29,30]. Local extrema were then detected to segment individual tapping cycles. From each trial, the following descriptors were extracted [29,30,31,32,33]:

A_mean: the average per-cycle amplitude (mm).
f: the movement cycle frequency (Hz).
D: amplitude decrement ratio, defined as the ratio between the last and first cycle amplitudes (dimensionless).
R: regularity index computed as 1 − (σ_A/μ_A) (dimensionless).
P: pause ratio (0–1) defined as the total pause duration divided by the overall task duration (dimensionless).
J_t: temporal jitter, quantified as the standard deviation of inter-tap intervals (s).
S_amp: slope of amplitude variation over time, estimated via linear regression of cycle amplitudes across repetitions (mm/cycle).

Separate indices were computed for the left and right hands, reflecting the unilateral structure of the corresponding clinical item. The resulting Finger Tapping Index (FTI) summarizes task performance for each hand.
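The cycle segmentation and most of the descriptors listed above can be sketched as follows. The sketch assumes that the distance signal has already been low-pass filtered upstream, defines one cycle as the span between two successive contacts (distance minima), and takes the cycle amplitude as peak opening minus the contact value. These definitions and the function names are illustrative assumptions; the pause ratio P is omitted because its pause criterion is not specified here.

```python
import numpy as np

def local_minima(signal):
    """Indices of interior local minima of an already smoothed signal."""
    s = np.asarray(signal, dtype=float)
    # Strict on the left, non-strict on the right, so a plateau counts once.
    return np.flatnonzero((s[1:-1] < s[:-2]) & (s[1:-1] <= s[2:])) + 1

def tapping_descriptors(distance_mm, fps):
    """Cycle descriptors from a filtered thumb-index distance signal (mm)."""
    d = np.asarray(distance_mm, dtype=float)
    contacts = local_minima(d)  # finger contact = distance minimum
    amplitudes = np.array(
        [d[a:b].max() - d[a] for a, b in zip(contacts[:-1], contacts[1:])]
    )
    intervals = np.diff(contacts) / fps  # inter-tap intervals (s)
    cycle_index = np.arange(len(amplitudes))
    return {
        "A_mean": amplitudes.mean(),
        "f": 1.0 / intervals.mean(),
        "D": amplitudes[-1] / amplitudes[0],
        "R": 1.0 - amplitudes.std() / amplitudes.mean(),
        "J_t": intervals.std(),
        "S_amp": np.polyfit(cycle_index, amplitudes, 1)[0],  # mm per cycle
    }
```

With this convention, a progressive reduction in opening across repetitions yields D below 1 and a negative S_amp, whereas perfectly regular tapping yields R close to 1 and J_t close to 0.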

2.3.5. Testing Hand Movements

Hand opening–closing is a standard MDS-UPDRS Part III task used to evaluate upper-limb bradykinesia through movement amplitude, speed, and regularity [6,27]. The protocol mirrors the finger-tapping setup (Figure 2d) and includes independent assessment of the right and left hands.

Participants performed ten consecutive clench–unclench cycles as quickly and as widely as possible. For each hand, the thumb and little finger KeyPoints were tracked, and the Euclidean distances d(LH04, LH20) and d(RH04, RH20) were computed. Minima correspond to full clenching, whereas maxima indicate full opening. The distance signals were processed using the same pipeline adopted for finger tapping, including low-pass filtering, detection of local extrema, and segmentation into ten cycles.

This approach preserves voluntary dynamics while attenuating tremor components and localization noise [29,30]. From each trial, the following descriptors were extracted [29,30,31,32,33]:

A_mean: average per-cycle movement amplitude (mm).
f: movement cycle frequency (Hz).
D: amplitude decrement ratio (dimensionless).
R: regularity index, computed as 1 − (σ_A/μ_A) (dimensionless).
P: pause ratio (0–1) (dimensionless).
J_t: temporal jitter (s).
S_amp: slope of amplitude variation across repetitions (mm/cycle).
The resulting Hand Movement Index (HMI) summarizes overall task performance.

2.3.6. Testing Pronation–Supination Movements of the Hands

Pronation–supination is an MDS-UPDRS Part III task evaluating upper-limb speed, amplitude, and rhythmicity [6,27]. Participants performed ten synchronous rotations of both hands with arms extended, turning the palms upward and downward as widely and rapidly as possible (Figure 2d).

Bimanual motion was quantified through the inter-thumb distance d(LH04, RH04) derived from the 3D KeyPoints. Successive minima and maxima correspond to alternating palm-down and palm-up configurations. Signals were processed using the same pipeline described for previous hand tasks, including low-pass filtering and segmentation into ten cycles. This approach preserves voluntary dynamics while attenuating tremor and localization noise [29,30]. From each trial, the following descriptors were extracted [29,30,31,32,33]:

A_mean: average per-cycle amplitude representing rotational excursion (mm).
f: movement cycle frequency, reflecting execution speed (Hz).
D: amplitude decrement ratio, defined as A_last/A_first, quantifying fatigability across repetitions (dimensionless).
R: regularity index, computed as 1 − (σ_A/μ_A) (dimensionless).
P: pause ratio (0–1) (dimensionless).
J_t: temporal jitter (s).
S_amp: slope of amplitude variation across repetitions (mm/cycle).

The resulting Hand Pronation–Supination Index (H-PSI) corresponds to the MII and provides a continuous, unit-free quantitative descriptor of global bimanual motor performance, summarizing amplitude, speed, and rhythmic consistency within a single measure. This formulation aligns with current approaches to composite digital motor biomarkers in PD [29,30,31].

2.3.7. Testing Toe Tapping

Toe tapping is an MDS-UPDRS Part III task assessing lower-limb bradykinesia through movement amplitude, speed, and rhythmicity [6,27]. Right and left feet were evaluated independently.

Participants performed ten consecutive taps, lifting the toe as quickly and as widely as possible (Figure 2d). For each foot, the vertical coordinate of the big-toe KeyPoint was tracked, yielding the signal y(t). Minima correspond to ground contact, whereas maxima indicate peak elevation.

Signals were low-pass filtered (cut-off 5 Hz) prior to cycle segmentation [29,30]. Local extrema were then identified to segment ten cycles. From each trial, the following descriptors were extracted [29,30,31,32,33]:

A_mean: average per-cycle movement amplitude (mm).
f: movement cycle frequency (Hz).
D: amplitude decrement ratio, defined as A_last/A_first (dimensionless).
R: regularity index, computed as 1 − (σ_A/μ_A) (dimensionless).
P: pause ratio (0–1) (dimensionless).
J_t: temporal jitter (s).
S_amp: slope of amplitude variation across repetitions (mm/cycle).

The resulting Toe Tapping Index (TTI) corresponds to the MII for this item and provides a continuous descriptor of tapping performance relative to the healthy reference cohort.

2.3.8. Testing Leg Agility

Leg agility is an MDS-UPDRS Part III task evaluating lower-limb speed, amplitude, and rhythmicity [6,27]. Right and left legs were assessed independently.

Participants performed ten repetitions of lifting the foot and rapidly stomping it back down as widely and quickly as possible. Lower-limb motion was quantified by tracking the vertical coordinate of the heel KeyPoint, yielding the signal y(t). Minima indicate ground contact, whereas maxima correspond to peak elevation. Signals were low-pass filtered (cut-off 5 Hz) prior to cycle segmentation, and ten cycles were identified through local extrema detection [29,30].

From each trial, the following descriptors were extracted [29,30,31,32,33]:

A_mean: average per-cycle movement amplitude (mm).
f: movement cycle frequency (Hz).
D: amplitude decrement ratio, defined as A_last/A_first (dimensionless).
R: regularity index, computed as 1 − (σ_A/μ_A) (dimensionless).
P: pause ratio (0–1) (dimensionless).
J_t: temporal jitter (s).
S_amp: slope of amplitude variation across repetitions (mm/cycle).

The resulting Leg Agility Index (LAI) corresponds to the MII for this item and provides a continuous descriptor of leg agility performance relative to the healthy reference cohort.

2.3.9. Testing Arising from a Chair

Arising from a chair is evaluated in MDS-UPDRS item 3.9 to assess slowness, hesitation, balance, and the need for arm support [6,27].

Participants started seated and were instructed to stand up with arms crossed over the chest, avoiding armrests whenever possible. Upper-limb involvement was quantified by monitoring the distance between the left and right thumb KeyPoints, enabling detection of compensatory arm use.

Body kinematics during the rising phase were derived from the vertical trajectories of the nose, neck, and mid-hip landmarks. Rise duration was defined as the interval between movement onset and attainment of the upright position. Hesitations were identified from irregularities in these trajectories. Postural control after standing was evaluated through three body segments: the trunk (BO01–BO08), analyzed for forward flexion and lateral inclination; the shoulders (BO02–BO05), assessed for vertical asymmetry; and the hips (BO09–BO12), used to quantify pelvic tilt.

These variables were derived from the same acquisition setup adopted for the other tasks and are also used in the posture analysis described in Section 2.3.13. From these signals, the following descriptors were extracted [29,30,31]:

t_rise: total rise duration (s).
v_mean: mean vertical velocity of the mid-hip (m/s).
a_peak: peak vertical acceleration (m/s²).
U_hands: arm-use index, quantifying the contribution of the upper limbs during standing (dimensionless).
θ_forward_max: maximum forward trunk flexion during the rising phase (°).
θ_forward_slope: trunk extension rate from flexed to upright posture (°/s).
θ_lateral_max: maximum lateral trunk tilt, reflecting symmetry and balance (°).
R_sym: regularity index of lateral tilt recovery, quantifying postural steadiness (dimensionless).
P: pause ratio (0–1) (dimensionless).

The resulting Chair-Rising Performance Index (CRPI) corresponds to the MII for this item and summarizes performance of arising from a chair.
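The timing-related descriptors t_rise, v_mean, and a_peak can be sketched from the vertical mid-hip trajectory as follows. The onset and end criteria (5% and 95% of the seated-to-upright excursion) and the half-second baseline windows are hypothetical choices made for illustration, not the criteria used in this study.

```python
import numpy as np

def rise_descriptors(hip_height_m, fps, onset_frac=0.05, end_frac=0.95):
    """t_rise (s), v_mean (m/s), a_peak (m/s^2) from mid-hip height (m)."""
    z = np.asarray(hip_height_m, dtype=float)
    n_base = int(fps * 0.5)  # baseline windows: first and last 0.5 s
    seated, standing = z[:n_base].mean(), z[-n_base:].mean()
    progress = (z - seated) / (standing - seated)  # 0 = seated, 1 = upright
    onset = int(np.argmax(progress > onset_frac))  # first frame above 5%
    end = int(np.argmax(progress > end_frac))      # first frame above 95%
    t_rise = (end - onset) / fps
    velocity = np.gradient(z, 1.0 / fps)
    acceleration = np.gradient(velocity, 1.0 / fps)
    return {
        "t_rise": t_rise,
        "v_mean": (z[end] - z[onset]) / t_rise,
        "a_peak": acceleration[onset:end + 1].max(),
    }
```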

2.3.10. Testing Gait

Gait is a central domain of motor impairment in Parkinson’s disease and is evaluated in MDS-UPDRS Part III through observation of walking speed, stride amplitude, turning ability, arm swing, and postural control [6,27].

Participants walked along a straight path of at least 10 m, turned, and returned. Two conditions were tested: unobstructed walking and walking with obstacle bypass. Each condition was repeated as needed to achieve the required distance.

Lower-limb motion was quantified by tracking the heel KeyPoints of both feet. Step length and stride parameters were derived from their oscillatory trajectories, while walking speed was inferred from temporal progression. To reduce errors due to occlusions, only segments in which the foot closer to the camera was fully visible were analyzed. Turning performance was evaluated from the trajectory of the mid-hip KeyPoint, representing whole-body rotation. Foot clearance was estimated from vertical heel displacement during the swing phase.

Upper-limb coordination was characterized using wrist trajectories to quantify arm swing amplitude [29,30,31]. Postural control during locomotion was further assessed through body segment orientations: trunk (BO01–BO08) for forward and lateral inclination, shoulders (BO02–BO05) for vertical asymmetry, and hips (BO09–BO12) for pelvic alignment.

These measures complement spatiotemporal parameters and are also used in the posture analysis described in Section 2.3.13. From heel, wrist, and trunk signals, the following descriptors were extracted [29,30,31,32,33]:

Step length mean: average stride amplitude derived from heel oscillations (m).
Step length CV: coefficient of variation in step length, quantifying stride consistency (dimensionless).
Heel lift maximum: peak vertical displacement of the heel during the swing phase (m).
Heel lift CV: variability of heel-lift height, reflecting stability of foot clearance (dimensionless).
Arm swing amplitude: mean oscillation amplitude of the wrists in the sagittal plane (m).
Arm swing CV: variability of arm swing amplitude, indicating rhythmicity and coordination (dimensionless).
Distance traveled: total forward displacement during the walking segment (m).
Lateral tilt maximum: peak trunk inclination in the frontal plane (°).
Lateral tilt CV: temporal variability of lateral trunk inclination, reflecting postural control (dimensionless).
Forward tilt maximum: maximum trunk inclination in the sagittal plane (°).
Forward tilt CV: variability of forward trunk inclination during walking (dimensionless).

The resulting Gait Index (GI) corresponds to the MII for this item and provides a continuous quantitative descriptor of gait performance relative to the healthy reference cohort.

2.3.11. Testing Freezing of Gait

Freezing of gait (FoG) is an episodic inability to generate effective stepping despite the intention to walk and commonly emerges during turning or obstacle negotiation [27,30].

Within the present framework, FoG-related analysis was derived from the same walking protocol and trajectories described for general gait assessment (Section 2.3.10). Participants walked forward, turned 180°, and returned along the same path. Although the experimental space required repeated forward–backward segments, this configuration increased the number of direction changes, which are recognized contexts of higher freezing susceptibility [27].

The objective was not to provoke overt freezing episodes but to characterize vulnerability through quantitative descriptors associated with hypometric stepping and increased variability. Kinematic analysis relied on the same heel, wrist, and trunk KeyPoints adopted for gait evaluation, ensuring methodological consistency. No additional variables were introduced. Instead, the standardized gait features were reorganized into two interpretative domains.

The first domain includes amplitude-related descriptors reflecting reduced stride generation and forward propulsion, namely mean step length, maximum foot clearance, arm-swing amplitude, and total distance traveled.

The second domain comprises variability-related descriptors capturing instability of cycle-to-cycle coordination, including coefficients of variation in step length, foot clearance, arm swing, and trunk orientation [30,31].

The Freezing of Gait Index (FoGI) was computed using the aggregation framework defined in Equation (2). Although derived from the same standardized inputs as the Gait Index (GI), the FoGI differs in weighting logic: amplitude-related variables were sign-inverted so that reduced excursion corresponds to higher vulnerability, whereas variability-related descriptors were maintained in their direct form. The resulting FoGI provides a continuous measure of freezing-related gait susceptibility relative to the healthy reference and represents the MII associated with this construct.
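The sign-inversion logic can be sketched as follows. The feature names follow the gait descriptors used elsewhere in this section; the plain mean used for aggregation is only a placeholder, since Equation (2) may weight or combine the terms differently.

```python
# Amplitude-related gait descriptors: reduced excursion means higher vulnerability.
AMPLITUDE_FEATURES = {
    "step_length_mean",
    "heel_lift_max",
    "arm_swing_amp",
    "distance_traveled",
}

def fog_oriented_scores(z_scores):
    """Flip amplitude-related z-scores so that higher always means more vulnerable."""
    return {
        name: (-z if name in AMPLITUDE_FEATURES else z)
        for name, z in z_scores.items()
    }

def aggregate_unweighted(oriented):
    """Placeholder aggregation (plain mean), standing in for Equation (2)."""
    return sum(oriented.values()) / len(oriented)
```

For instance, a subject with a step length one standard deviation below the healthy mean (z = -1) and a step-length variability one standard deviation above it (z = +1) contributes +1 from both descriptors after orientation.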

2.3.12. Testing Postural Stability

Postural stability is evaluated in MDS-UPDRS Part III through an externally induced perturbation [6,27].

Participants stood upright with feet parallel and arms relaxed while a rapid backward pull was applied at the shoulders. The clinical rating considers the number of corrective steps or the occurrence of balance loss.

Postural responses were quantified by tracking lower-limb and trunk KeyPoints. Backward stepping behavior was estimated from variations in the distance between the left and right big toes (BO19, BO22), providing a robust measure of step initiation and count. Global body displacement during the perturbation was approximated from the vertical trajectory of the hip KeyPoint (BO12), used as a proxy of center-of-mass motion. From these trajectories, the following descriptors were extracted [29,30,31]:

steps_backward_count: number of backward steps following the perturbation (dimensionless).
max_feet_distance_change: maximum change in inter-feet distance (m).
hip_y_min_drop: minimum vertical displacement of the hip during perturbation (m).
recovery_time: time required to return to a stable upright posture (s).
peak_backward_velocity: peak backward velocity of the hip/center of mass (m/s).
balance_steps_number: total number of corrective steps taken (dimensionless).
recovery_step_length: length of the corrective recovery step (m).
sway_ap_std: standard deviation of anterior–posterior sway during the recovery phase (m).
sway_ml_std: standard deviation of medial–lateral sway (m).
com_displacement: backward displacement of the center of mass (m).
trunk_angle_peak: peak trunk angle variation during perturbation (°).

The resulting Postural Stability Index (PSI) corresponds to the MII for this item and provides a continuous quantitative descriptor of postural stability performance relative to the healthy reference cohort.

2.3.13. Testing Posture

Posture is evaluated in MDS-UPDRS Part III (item 3.13) through inspection of body alignment during both static stance and movement [6,27]. In the present framework, posture assessment integrates information from the sit-to-stand task (Section 2.3.9), gait analysis (Section 2.3.10), and an additional static standing test lasting approximately 10 s (Figure 2c).

In all conditions, alignment was quantified using three body segments: trunk (BO01–BO08), shoulders (BO02–BO05), and hips (BO09–BO12). Consistent with the clinical scoring logic, the worst configuration observed across tasks was retained for each descriptor prior to aggregation within the MII framework.

Posture-related descriptors were defined to capture both baseline alignment and dynamic stability. During static standing, mean segment orientations and their variability in sagittal and frontal planes were computed, reflecting global inclination and postural micro-adjustments.
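Sagittal and frontal trunk inclinations can be sketched from the two ends of the trunk segment as follows, assuming that BO01 and BO08 are the neck and mid-hip KeyPoints. The vertical and forward directions are passed in explicitly because they depend on the camera placement; the function name and sign conventions are illustrative assumptions.

```python
import numpy as np

def trunk_tilt_deg(neck, mid_hip, up, forward):
    """Forward and lateral trunk tilt (degrees) relative to the vertical.

    neck, mid_hip: arrays of shape (3,) or (T, 3) in a common 3D frame.
    up, forward: direction vectors of the vertical and of the subject's
    facing direction in that frame (assumed known from the setup).
    """
    trunk = np.asarray(neck, dtype=float) - np.asarray(mid_hip, dtype=float)
    up = np.asarray(up, dtype=float)
    up = up / np.linalg.norm(up)
    forward = np.asarray(forward, dtype=float)
    forward = forward - (forward @ up) * up  # keep only the horizontal part
    forward = forward / np.linalg.norm(forward)
    lateral = np.cross(up, forward)
    vertical_part = trunk @ up
    forward_tilt = np.degrees(np.arctan2(trunk @ forward, vertical_part))
    lateral_tilt = np.degrees(np.arctan2(trunk @ lateral, vertical_part))
    return forward_tilt, lateral_tilt
```

Applied to a (T, 3) trajectory, the function returns one angle per frame, from which the mean inclination and its variability around the mean over the static window can be computed directly.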

From the sit-to-stand transition, maximum forward and lateral inclinations together with the regularity index of realignment (R_sym) were derived.

From gait, peak trunk inclinations and their coefficients of variation were extracted to quantify alignment consistency during locomotion [29,30,31].

The resulting feature set included the following static standing features:

forward_tilt_mean_static: mean sagittal-plane trunk inclination (°).
lateral_tilt_mean_static: mean frontal-plane trunk inclination (°).
shoulder_tilt_mean_static: average shoulder-level asymmetry (°).
hip_tilt_mean_static: average hip-level asymmetry (°).
forward_tilt_RMS_static: RMS variability of forward trunk inclination (°).
lateral_tilt_RMS_static: RMS variability of lateral trunk inclination (°).
Dynamic posture features from Section 2.3.9:
θ_forward_max: maximum forward trunk inclination during rising (°).
θ_lateral_max: maximum lateral trunk inclination during rising (°).
R_sym: regularity index of postural realignment (dimensionless).
Dynamic posture features from Section 2.3.10:
forward_tilt_max: maximum forward trunk inclination during walking (°).
forward_tilt_CV: coefficient of variation of forward trunk inclination (dimensionless).
lateral_tilt_max: maximum lateral trunk inclination during walking (°).
lateral_tilt_CV: coefficient of variation of lateral trunk inclination (dimensionless).

The resulting Posture Performance Index (PPI) corresponds to the MII for this item and provides a continuous quantitative descriptor of postural alignment relative to the healthy reference cohort.

2.3.14. Testing Body Bradykinesia

Global body bradykinesia is summarized in MDS-UPDRS item 3.14 through the examiner’s overall impression of slowness, reduced amplitude, and diminished spontaneity observed across different motor contexts [6,27].

To obtain a quantitative counterpart, the present framework integrates descriptors derived from the arising-from-chair task (Section 2.3.9), gait analysis (Section 2.3.10), and posture evaluation (Section 2.3.13).

This strategy reflects the multidimensional nature of the construct and relies exclusively on features already defined in those sections. No additional metrics were introduced, ensuring methodological consistency across items. The following variables contributed to the estimation of global body bradykinesia. From Section 2.3.9:

t_rise: rise duration (s).
v_mean: mean vertical velocity (m/s).
a_peak: peak vertical acceleration (m/s²).
θ_forward_max: maximum forward trunk inclination (°).
θ_forward_slope: trunk extension rate during standing up (°/s).
P: pause ratio, reflecting hesitations (dimensionless).
R_sym: regularity of trunk realignment (dimensionless).
Features from Section 2.3.10:
arm_swing_amp: arm swing amplitude (m).
arm_swing_CV: variability of arm swing (dimensionless).
step_length_mean: average step length (m).
step_length_CV: variability of step length (dimensionless).
heel_lift_max: maximum heel clearance (m).
heel_lift_CV: variability of heel clearance (dimensionless).
distance_traveled: total locomotor displacement (m).
forward_tilt_max: maximum forward trunk inclination during gait (°).
forward_tilt_CV: variability of forward trunk inclination (dimensionless).
lateral_tilt_max: maximum lateral trunk deviation (°).
lateral_tilt_CV: variability of lateral trunk deviation (dimensionless).
Features from Section 2.3.13:
forward_tilt_mean_static: mean sagittal trunk inclination (°).
lateral_tilt_mean_static: mean frontal trunk inclination (°).
shoulder_tilt_mean_static: shoulder-level asymmetry (°).
hip_tilt_mean_static: hip-level asymmetry (°).
forward_tilt_RMS_static: RMS of anterior–posterior sway (°).
lateral_tilt_RMS_static: RMS of medio–lateral sway (°).

The resulting Global Body Bradykinesia Performance Index (GBPI) corresponds to the MII for item 3.14 and summarizes whole-body motor behavior across the evaluated domains. The GBPI should be interpreted as an operational quantitative descriptor rather than a direct clinical rating.

2.3.15. Testing Postural Tremor of the Hands

Postural tremor is evaluated in MDS-UPDRS Part III (item 3.15) during sustained arm extension against gravity, with separate ratings for the right and left hands [6,27]. Participants held both arms forward at shoulder height for approximately 10 s, with palms down and fingers spread.

Four fingertip KeyPoints were tracked: index and little finger for each hand. For each landmark, a distance signal d(t) was computed relative to a fixed upper-body reference, allowing isolation of oscillatory motion while reducing sensitivity to global body displacement.

To extract tremor-related components, signals were band-pass filtered in the 3–12 Hz range [29,30,31]. Feature computation was performed on the filtered trajectories d_BP(t). For each hand, descriptors obtained from the two fingertips were averaged to produce unilateral measures. The following features were extracted:

A_peak: maximum peak-to-peak tremor amplitude, measuring the largest excursion during the task (m).
A_RMS: RMS tremor amplitude, quantifying overall oscillatory energy (m).
A_95: robust tremor amplitude (5th–95th percentile range) (m).
f_peak: dominant tremor frequency, corresponding to the peak of the power spectrum within 3–12 Hz (Hz).
BW_50: half-power spectral bandwidth, indicating spectral coherence around the dominant frequency (Hz).
S_reg: regularity index derived from autocorrelation, reflecting rhythmic stability (dimensionless).
CV_cycle: cycle-to-cycle variability, quantifying temporal irregularity of tremor oscillations (dimensionless).
D_drift: drift magnitude, capturing slow non-tremor displacement over the task duration (m).
S_LR: left–right symmetry index, computed as the normalized difference between left- and right-hand RMS amplitudes (dimensionless).
C_LR: inter-hand coupling, defined as the maximum cross-correlation between left- and right-hand filtered signals (dimensionless).
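
A minimal sketch of how the amplitude and frequency descriptors listed above could be computed from an already band-pass filtered trajectory d_BP(t). The function names, the linear-interpolation percentile, the use of the global signal range as a stand-in for A_peak, and the plain discrete Fourier transform are assumptions made for illustration. The filter design and the sampling rate are not specified here and are left to the caller.

```python
import cmath
import math


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0..100) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def amplitude_features(d_bp):
    """A_peak, A_RMS and A_95 (m) of a band-pass filtered signal.

    Assumption: A_peak is approximated by the global max-min range.
    """
    s = sorted(d_bp)
    a_peak = s[-1] - s[0]
    a_rms = math.sqrt(sum(x * x for x in d_bp) / len(d_bp))
    a_95 = percentile(s, 95) - percentile(s, 5)
    return {"A_peak": a_peak, "A_RMS": a_rms, "A_95": a_95}


def dominant_frequency(d_bp, fs, f_lo=3.0, f_hi=12.0):
    """f_peak (Hz): frequency of the largest DFT power bin in [f_lo, f_hi]."""
    n = len(d_bp)
    best_f, best_p = None, -1.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if f_lo <= f <= f_hi:
            # Direct DFT coefficient for bin k (O(n) per bin, fine for short tasks).
            c = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(d_bp))
            p = abs(c) ** 2
            if p > best_p:
                best_f, best_p = f, p
    return best_f
```

The frequency resolution of this estimate is fs / n, so a 10 s recording yields bins spaced 0.1 Hz apart regardless of the frame rate.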

Two independent indices were computed—Left and Right Postural Tremor Indices (L-PTI and R-PTI)—corresponding to item 3.15. These indices summarize unilateral tremor amplitude and stability. Although not included in unilateral aggregation, bilateral descriptors such as S_LR and C_LR provide complementary information on tremor asymmetry and inter-limb coordination.

2.3.16. Testing Kinetic Tremor of the Hands

Kinetic tremor is evaluated in MDS-UPDRS Part III (item 3.16) during voluntary goal-directed movement, with separate ratings for each hand [6,27]. Participants alternated the index finger between the nose and a fixed target at a slow pace, and each side was tested independently.

For each trial, the index fingertip of the active hand and the nose KeyPoint were tracked. The three-dimensional inter-point distance d(t) captures the voluntary movement trajectory together with superimposed oscillatory components.

To isolate tremor-related activity, signals were band-pass filtered in the 3–12 Hz range [29,30,31], attenuating slow intentional motion and high-frequency noise. Feature extraction was performed on the filtered trajectories d_BP(t). Because movements are voluntary and non-stationary, bilateral symmetry or inter-hand coupling measures were not included. From the filtered signals, the following descriptors were extracted:

A_peak: maximum peak-to-peak tremor amplitude, representing the largest oscillatory excursion (m).
A_RMS: RMS tremor amplitude, quantifying overall oscillatory energy (m).
A_95: robust tremor amplitude (5th–95th percentile range) (m).
f_peak: dominant tremor frequency, corresponding to the peak of the power spectral density within 3–12 Hz (Hz).
BW_50: half-power spectral bandwidth, indicating spectral concentration and coherence (Hz).
S_reg: regularity index based on autocorrelation, reflecting rhythmic stability (dimensionless).
CV_cycle: cycle-to-cycle variability, quantifying temporal irregularity (dimensionless).
DC: tremor duty cycle, defined as the fraction of time during which tremor oscillations exceed a standardized amplitude threshold (dimensionless).
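
The duty cycle descriptor can be illustrated with a short sketch. It assumes that local amplitude is estimated as the RMS within consecutive non-overlapping windows and that the threshold is supplied by the caller; the window length, the windowing scheme, and the function names are hypothetical and not taken from the framework.

```python
import math


def windowed_rms(signal, window):
    """RMS of the signal in consecutive non-overlapping windows of `window` samples."""
    out = []
    for start in range(0, len(signal) - window + 1, window):
        seg = signal[start:start + window]
        out.append(math.sqrt(sum(x * x for x in seg) / window))
    return out


def duty_cycle(d_bp, window, threshold):
    """Fraction (0..1) of windows whose RMS amplitude exceeds `threshold`."""
    rms = windowed_rms(d_bp, window)
    if not rms:
        raise ValueError("signal is shorter than one window")
    return sum(r > threshold for r in rms) / len(rms)
```

Trailing samples that do not fill a complete window are ignored in this sketch.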

Two independent composite indices were computed—Left and Right Kinetic Tremor Indices (L-KTI and R-KTI)—corresponding to the Motor Item Indices associated with item 3.16. These indices provide continuous unilateral descriptors of kinetic tremor behavior relative to the normative reference framework.

2.3.17. Testing Rest Tremor Amplitude

Rest tremor amplitude is evaluated in MDS-UPDRS Part III (item 3.17) by considering the maximum oscillatory excursion observed while the body segment is at rest [6,27]. In the present framework, tremor constancy is addressed separately (item 3.18).

Although a standardized quiet-sitting interval was recorded, the automated analysis considered all examination segments in which a given anatomical region was identified as being at rest. Tremor activity was quantified using selected KeyPoints acting as local motion sensors. Distal limb landmarks were used for the right and left body sides, while lip and jaw landmarks represented the orofacial region.

For each KeyPoint, displacement was computed as the Euclidean distance from its mean position, thereby removing static offsets. Signals were analyzed exclusively during rest intervals and band-pass filtered in the 3–7 Hz range [29,30,31], consistent with typical Parkinsonian rest tremor frequencies. From the filtered signals, the following amplitude-oriented descriptors were extracted:

A_peak_rest: maximum peak-to-peak amplitude observed during rest intervals (m).
A_RMS_rest: RMS amplitude of the filtered signal, quantifying overall tremor energy (m).
A_95_rest: robust amplitude estimate defined as the 5th–95th percentile range (m).
A_max_event: maximum tremor amplitude observed across all valid rest intervals during the examination (m).

For each anatomical region, KeyPoint-level values were combined by retaining the maximum. When multiple rest intervals were available, the final regional descriptor corresponded to the largest value observed across windows.

Three regional indices were obtained—jaw/lips, right body side, and left body side—for item 3.17. The overall index was defined as the maximum among the three regional values, mirroring the summarization rule adopted for rest tremor assessment.
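
The displacement signal and the maximum-based aggregation described above can be sketched as follows. This is an illustrative outline only: the KeyPoint and region names used in the usage note are hypothetical, and band-pass filtering and rest-interval detection are assumed to happen elsewhere.

```python
import math


def displacement_from_mean(points):
    """Euclidean distance of each 3D sample from the trajectory's mean position."""
    n = len(points)
    mean_pos = [sum(p[i] for p in points) / n for i in range(3)]
    return [math.dist(p, mean_pos) for p in points]


def regional_descriptor(values_by_keypoint):
    """Largest value over all KeyPoints and all rest windows of one region.

    `values_by_keypoint` maps a KeyPoint name to a list with one value per rest window.
    """
    return max(v for per_window in values_by_keypoint.values() for v in per_window)


def overall_descriptor(regional_values):
    """Maximum among the regional values (jaw/lips, right side, left side)."""
    return max(regional_values.values())
```

For example, `regional_descriptor({"wrist": [0.001, 0.003], "index_tip": [0.002]})` keeps the single largest amplitude observed for that region across KeyPoints and windows.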

2.3.18. Testing Constancy of Rest Tremor

Rest tremor constancy (MDS-UPDRS item 3.18) describes the temporal persistence of tremor during periods in which the body region is at rest [6,27]. In contrast to amplitude estimation (item 3.17), this analysis focuses on the proportion of time during which tremor activity is present.

Kinematic trajectories were evaluated over the entire recording, restricting computation to intervals classified as rest according to the criteria adopted for rest tremor amplitude assessment. Within these intervals, tremor-related oscillations were isolated using band-pass filtering in the 3–7 Hz range [29,30,31].

Local tremor amplitude was estimated within sliding windows and compared with thresholds derived from the healthy reference cohort, yielding a binary representation of tremor presence (“ON”) or absence (“OFF”). From this temporal representation, the following descriptors were extracted:

RTCI: Rest Tremor Constancy Index (0–1), defined as the fraction of total rest time during which tremor activity is detected (dimensionless).
N_burst: tremor burst rate, defined as the number of distinct tremor episodes per minute of rest (min⁻¹).
T_on_med: median duration of tremor (“ON”) episodes, reflecting typical episode persistence (s).
T_off_med: median duration of tremor-free (“OFF”) intervals between successive episodes (s).
PI: Persistence Index, defined as the ratio between the duration of the longest tremor episode and the total rest time, capturing the dominance of sustained tremor activity over intermittent patterns (dimensionless).
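
As an illustration of the descriptors above, the following sketch derives them from a binary ON/OFF sequence. It assumes one flag per analysis window of fixed duration and treats only OFF runs flanked by two ON episodes as intervals "between successive episodes"; the function name and the handling of empty cases are assumptions, not part of the described framework.

```python
from itertools import groupby
from statistics import median


def constancy_descriptors(on_flags, window_s):
    """Constancy descriptors from an ON/OFF sequence (one flag per `window_s` seconds)."""
    if not on_flags:
        raise ValueError("no rest windows available")
    # Collapse the sequence into (state, duration in seconds) runs.
    runs = [(state, sum(1 for _ in grp) * window_s)
            for state, grp in groupby(bool(f) for f in on_flags)]
    total = len(on_flags) * window_s
    on_runs = [d for state, d in runs if state]
    # Runs alternate, so any OFF run that is neither first nor last lies between episodes.
    off_between = [d for state, d in runs[1:-1] if not state]
    return {
        "RTCI": sum(on_runs) / total,
        "N_burst": len(on_runs) / (total / 60.0),
        "T_on_med": median(on_runs) if on_runs else 0.0,
        "T_off_med": median(off_between) if off_between else None,
        "PI": max(on_runs, default=0.0) / total,
    }
```

With this definition, a record containing a single long episode yields PI equal to RTCI, whereas many short bursts give a PI well below RTCI.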

Regional constancy indices were computed separately for jaw/lips, right body side, and left body side. The subject-level descriptor was defined as the maximum across regions, providing a unified summary of tremor persistence within the automated framework. The resulting index represents a continuous description of tremor constancy relative to the normative reference and is not intended to replicate ordinal clinical ratings.

2.4. Synthesis of Motor Item Indices

For each evaluable task aligned with MDS-UPDRS Part III, a corresponding MII was defined as described in Sections 2.3.1–2.3.18.

Each MII represents a continuous, unit-free descriptor expressed in standardized units relative to the healthy reference cohort. Rigidity (item 3.3; Section 2.3.3) was not included, as its evaluation requires passive manipulation and cannot be inferred from vision-based kinematics alone. MIIs were derived from task-specific subsets of quantitative features describing speech, facial activity, limb movements, posture, gait, tremor, and balance. For clarity, a consolidated summary of the digital descriptors contributing to each MII across all evaluated tasks is provided in Supplementary Table S3. All features were standardized with respect to the normative sample, and item-level indices were obtained through the aggregation strategy defined in Equations (1)–(3).

Because all indices share the same normalization framework, they are expressed on a common reference scale. This property allows coherent comparison of measurements across tasks and individuals within the adopted internal reference baseline.
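
The idea of a common reference scale can be illustrated with a small sketch. Equations (1)–(3) are not reproduced in this section, so the code below assumes plain z-scoring against the healthy-cohort mean and sample standard deviation followed by an unweighted average; the actual aggregation may use different weights or sign conventions, and the function and feature names are hypothetical.

```python
from statistics import mean, stdev


def fit_reference(cohort):
    """Per-feature (mean, sample SD) from a list of {feature: value} dicts."""
    names = cohort[0].keys()
    return {n: (mean(s[n] for s in cohort), stdev(s[n] for s in cohort))
            for n in names}


def item_index(subject, reference):
    """Unweighted mean of z-scores (assumed aggregation rule, for illustration only)."""
    z = [(subject[n] - mu) / sd for n, (mu, sd) in reference.items()]
    return mean(z)
```

Under this assumption, indices computed for the reference cohort itself average to zero by construction, which matches the zero-centered distributions reported for healthy subjects.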

In Section 3, MIIs are examined both separately and jointly to explore their distribution, internal relationships, and structural organization within the normative dataset. This representation establishes a systematic bridge between multidimensional measurements and the item-based structure of the clinical examination, while remaining descriptive with respect to the reference population.

3. Results

All experimental trials were conducted in a dedicated indoor environment at the ENEA “Casaccia” Research Centre.

The room (approximately 6.30 × 5.90 m) was equipped with a synchronized multimodal acquisition system integrating stereo video and audio recording. Two stereo cameras provided complementary viewpoints, while a lavalier microphone captured speech concurrently with motor tasks. This configuration enabled temporal alignment between kinematic and acoustic streams. Representative views of the environment and corresponding 3D reconstructions are shown in Figure 4.

Unless otherwise specified, cameras were positioned at a height of 90 cm with frontal orientation, ensuring full-body coverage across seated, standing, and walking activities while limiting occlusions. Prior to quantitative processing, all recordings underwent quality-control procedures. Visual inspection verified KeyPoint stability and anatomical visibility.

Trials with excessive occlusion (>30% unreliable frames) were excluded. For repetitive tasks, acquisitions with fewer than 70% valid cycles were discarded. Audio signals were checked for clipping, noise contamination, and microphone stability. Retained recordings satisfied requirements for acoustic analysis and ASR-based processing. These criteria ensured that subsequent analyses relied on temporally aligned and technically reliable measurements.
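
The two exclusion thresholds stated above can be written as a simple check. This is a sketch under the assumption that per-trial counts of unreliable frames and valid cycles are already available; the function and parameter names are hypothetical.

```python
def trial_passes_qc(unreliable_frames, total_frames,
                    valid_cycles=None, total_cycles=None):
    """Return True if a trial satisfies the occlusion and valid-cycle criteria."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    # Exclude trials with more than 30% unreliable frames.
    if unreliable_frames / total_frames > 0.30:
        return False
    # Repetitive tasks only: discard acquisitions with fewer than 70% valid cycles.
    if total_cycles:
        if valid_cycles / total_cycles < 0.70:
            return False
    return True
```

Non-repetitive tasks simply omit the cycle counts, so only the occlusion criterion applies to them.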

3.1. Speech

Speech-related features were extracted from a standardized reading task using the automated speech analysis pipeline described in Section 2, which combines acoustic descriptors of speech production with an objective estimate of speech intelligibility derived from ASR-based text similarity. Features were aggregated at the subject level, yielding a single set of descriptors per individual, in accordance with the non-lateralized scoring of MDS-UPDRS item 3.1. The experimental setup adopted for speech assessment, together with an example of the automatically detected body and facial KeyPoints used in the analysis, is illustrated in Figure 5.

The same frontal configuration and KeyPoint detection pipeline were used consistently across all subjects. Descriptive statistics (mean ± SD, median with interquartile range [IQR; 25th–75th percentile], and 5th–95th percentile range) of the acoustic and intelligibility-related speech features described in Section 2.3.1 are summarized for the healthy control group in Table 2. These values provide a quantitative characterization of normal speech production across multiple complementary dimensions.

Mean sound pressure level (SPL_mean) falls within the typical conversational range and shows limited inter-subject variability, indicating preserved vocal intensity. Fundamental frequency variability (F0_var) exhibits moderate dispersion, consistent with normal prosodic modulation. Speech rate (SpeechRate) is tightly clustered around values expected for fluent adult speech, reflecting stable articulation timing. Fluency-related measures further support normal performance, with low pause ratios and short silent intervals. Voice quality, as quantified by the Harmonics-to-Noise Ratio (HNR), exhibits values typical of healthy phonation, indicating preserved phonatory control. Finally, the Jaccard distance used as an objective proxy for speech intelligibility remains close to zero with minimal variability, confirming a high correspondence between the spoken and reference texts.
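
For readers unfamiliar with the intelligibility proxy, the following sketch shows one common way to compute a Jaccard distance between a reference text and an ASR transcript. It assumes word-level sets after lowercasing and punctuation removal; the tokenization actually used in the pipeline is defined in Section 2.3.1 and may differ, and the function name is hypothetical.

```python
import re


def jaccard_distance(reference_text, transcript):
    """1 - |A ∩ B| / |A ∪ B| over the sets of words in the two texts."""
    ref = set(re.findall(r"[\w']+", reference_text.lower()))
    hyp = set(re.findall(r"[\w']+", transcript.lower()))
    union = ref | hyp
    if not union:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(ref & hyp) / len(union)
```

A value of 0 means the transcript contains exactly the same vocabulary as the reference passage, while a value of 1 means the two texts share no words.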

Based on these feature values, the Speech Index (SI) was computed for each subject in the healthy reference group following the standardized procedure described in Equations (1)–(3). In healthy controls, the resulting SI distribution has a mean close to zero and a standard deviation of 0.44. This spread reflects inter-subject variability across intensity, prosody, speech rate, fluency, voice quality, and intelligibility, with no evidence of systematic deviations or outlier behavior. The distribution defines the reference baseline derived from healthy subjects to support future quantitative comparisons in clinically characterized PD cohorts.

To avoid redundancy across motor domains, the interpretation of correlation matrices and PCA projections follows a consistent analytical framework throughout the manuscript. Unless otherwise specified, correlation matrices are discussed descriptively in terms of clustering tendencies and relative association strength, and PCA projections are interpreted as exploratory visualizations of variance organization within the present reference cohort.

To examine relationships among the acoustic and intelligibility-related speech features and to assess potential redundancy within the feature set, pairwise Spearman rank correlations were computed across healthy subjects. As shown in Figure 6 (left), the resulting matrix highlights which speech descriptors tend to co-vary and which ones behave more independently within the healthy reference cohort.
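
A minimal, dependency-free sketch of the Spearman rank correlation used for these matrices is given below. It computes the Pearson correlation of average ranks, which handles ties; the function names are illustrative, and a statistics library would normally be used in practice.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho as the Pearson correlation of rank-transformed samples.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Applying `spearman` to every pair of feature columns yields the symmetric matrix that is inspected for co-varying and independent descriptors.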

The resulting correlation matrix reveals a structured yet non-redundant pattern of associations. Moderate correlations are observed between prosodic variability (F0_var) and pause ratio, while speech rate shows weak-to-moderate negative associations with both prosodic variability and intelligibility-related measures. Mean sound pressure level exhibits a moderate positive association with Jaccard distance, suggesting that variations in vocal intensity may influence automatic transcription accuracy. In contrast, voice quality (HNR) shows weak correlations with the remaining features, supporting its role as a largely independent descriptor of phonatory control. Overall, these findings indicate that the selected features describe partially complementary aspects of speech production within the normative cohort.

The multivariate structure of the speech feature space was further investigated using PCA on the standardized features extracted from the healthy control group. As shown in Figure 6 (center and right), subjects cluster compactly around the origin, reflecting homogeneous speech variability within the healthy reference cohort. Variability along the first two principal components reflects normal inter-subject differences within this sample rather than distinct subgroups.

Inspection of the PCA loadings shows that the first principal component is primarily driven by vocal intensity and intelligibility-related measures, whereas the second component is dominated by prosodic and fluency-related features. Speech rate and voice quality contribute to complementary directions, further highlighting the intrinsically multidimensional nature of normal speech production. Collectively, these results provide a descriptive representation of how speech variability in healthy individuals can be organized across partially independent dimensions related to intensity, prosody, fluency, voice quality, and intelligibility. This multidimensional organization provides a rationale for aggregating these complementary descriptors into a composite Speech Index within the reference framework adopted in this study.

3.2. Facial Expression

Facial expression features were extracted from the same frontal video recordings used for speech assessment and aggregated at the subject level, yielding a single set of descriptors per individual, consistent with the non-lateralized scoring of MDS-UPDRS item 3.2. Facial movements were quantified using vision-based tracking of eyelid- and mouth-related KeyPoints, as described in Section 2.3.2.

Representative examples of detected facial KeyPoints and connecting segments during neutral facial posture and during smiling are shown in Figure 7, while the corresponding temporal trajectories highlighting spontaneous blinking and smile emergence are illustrated in Figure 8.

For visualization purposes, the plots in Figure 8, and in all subsequent figures throughout the manuscript, display the unfiltered trajectories. Figure 8 is therefore meant as an intuitive illustration of event dynamics (blink closures and smile activation timing), whereas all quantitative features and indices were computed exclusively from the filtered signals described in Section 2.3.2. This distinction allows direct inspection of the natural temporal variability of the KeyPoint signals while ensuring robustness and reproducibility of the derived metrics.

Reference distributions of the blink- and smile-related facial expression features (defined in Section 2.3.2) in the healthy control group are summarized in Table 3. These values provide a quantitative characterization of normal facial motor behavior across complementary dimensions and serve as a baseline for deviation analysis rather than as a diagnostic threshold.

Blink-related metrics indicate preserved spontaneous blinking activity, with blink rates within physiological ranges and moderate variability of inter-blink intervals. Mean blink duration and blink amplitude show compact distributions, consistent with normal eyelid kinematics and the absence of slowed or incomplete blinks. Smile-related features further reflect intact facial expressiveness, with adequate mean and maximum lip separation, timely smile initiation, and sufficient smile duration and frequency across subjects. Overall, the relatively narrow inter-subject variability observed across all features is consistent with the stereotyped nature of basic facial motor patterns in neurologically intact individuals.

Based on these feature values, the Facial Expression Index (FEI) was computed for each subject in the healthy reference group following the standardized procedure defined previously. In healthy controls, the distribution of individual FEI exhibits a mean value close to zero and a standard deviation of 0.28, indicating limited inter-subject variability of facial expression dynamics under controlled conditions. Compared with motor tasks involving distal limb movements or speech production, facial expressions exhibited a more compact variability profile, consistent with the constrained physiological range of spontaneous blinking and basic facial movements. This distribution provides a normative reference for describing deviations in facial motor behavior within the framework adopted in this study.

The correlation structure and PCA projection are summarized in Figure 9. Blink-related features showed moderate internal associations, while smile-related descriptors exhibited stronger coupling, particularly among amplitude and temporal variables. Cross-domain correlations between blinking and smiling were weaker, indicating partially independent facial subsystems within the reference cohort.

Consistent with this structure, PCA provides a compact low-dimensional visualization without distinct subject clustering. The first components reflect complementary contributions from eyelid activity, smile amplitude, and temporal organization of expressive events. Overall, these findings describe a multidimensional yet balanced organization of facial motor behavior in healthy individuals, supporting the aggregation of blink- and smile-related descriptors into a composite Facial Expression Index.

3.3. Rigidity

As discussed in Section 2.3.3, rigidity cannot be quantified within the present contactless vision-based framework. The clinical assessment of rigidity requires passive manipulation of joints by the examiner in order to perceive resistance to externally imposed movement, a mechanism that is intrinsically inaccessible to video-derived kinematic observation alone. Consequently, none of the five sub-items of MDS-UPDRS item 3.3 (neck, right and left upper limbs, right and left lower limbs) are computed in this study.

This exclusion reflects a deliberate methodological boundary between motor phenomena that can be described through observable motion and those that depend on passive biomechanical properties. Explicitly defining this scope ensures conceptual consistency between the automated measurements and the nature of the information that can be reliably extracted from contactless sensing. The framework therefore focuses on characterizing domains that are accessible to vision-based quantification, while leaving rigidity assessment to standard clinical examination.

3.4. Finger Tapping

Finger tapping performance was evaluated separately for the right and left hands using the kinematic descriptors defined in Section 2.3.4. Trials were visually inspected to verify adequate hand visibility and stable KeyPoint detection; no recordings were excluded in the healthy control dataset. Representative examples of pose tracking and thumb–index distance trajectories are shown in Figure 10.

Reference distributions of the extracted kinematic features for the healthy control group are summarized in Table 4, separately for the right and left hands.

Across subjects, all features exhibit physiologically plausible variability. Mean tapping amplitude (A_mean), movement frequency (f), amplitude decrement ratio (D), and regularity (R) fall within expected healthy ranges, with moderate inter-subject dispersion reflecting the intrinsically dynamic nature of the task.

Pause ratio (P) and temporal jitter (J_t) remain low overall, confirming the absence of interruptions or marked timing irregularities in healthy finger tapping.

Right–left differences are limited, with largely overlapping distributions for all features, indicating balanced bilateral fine-motor performance.

Based on these feature values, the Finger Tapping Index (FTI) was computed separately for the right and left hands following the standardized procedure defined previously. In the healthy reference group, the resulting FTI distributions were centered around zero, with standard deviations of 0.35 for the right hand and 0.48 for the left hand. This reflects the expected symmetry of the normalization procedure while capturing the greater inter-subject variability typical of repetitive fine-motor tasks. No systematic lateral bias was observed, supporting the use of both hands as independent but comparable references.

The correlation matrix and PCA projection (Figure 11) summarize the internal organization of finger-tapping descriptors. Amplitude-, speed-, and regularity-related features show coherent associations, whereas pause-related measures contribute more independently.

In the PCA space, subjects form a compact distribution, with the first component primarily reflecting overall movement execution and the second capturing timing variability. Together, these findings describe complementary quantitative dimensions of finger tapping within the healthy reference cohort, supporting their aggregation into a composite index within the exploratory baseline framework.

3.5. Hand Movements

Hand movements were evaluated separately for the right and left sides using the kinematic descriptors defined in Section 2.3.5. All recordings satisfied the predefined quality criteria. Representative examples of pose tracking and thumb–little finger distance trajectories are shown in Figure 12.

Reference distributions of the extracted features are summarized in Table 5.

Across subjects, amplitude (A_mean), frequency (f), decrement ratio (D), and regularity (R) display compact dispersion, while pause ratio (P) and temporal jitter (J_t) remain limited, indicating smooth and continuous execution. Values for the two sides largely overlap, supporting bilateral consistency within the normative cohort.

The Hand Movement Index (HMI) was computed independently for each hand using the standardized aggregation framework. In healthy participants, both indices are centered around zero, with standard deviations of 0.34 for the right hand and 0.30 for the left, reflecting expected inter-subject variability associated with larger-amplitude movements.

The correlation matrix and PCA projection (Figure 13) summarize the internal organization of hand movement descriptors. Amplitude- and regularity-related features exhibit coherent associations, whereas pause- and timing-related measures contribute more independently. In the PCA space, subjects form a compact distribution, with the first component primarily reflecting global movement execution and the second capturing timing variability. Overall, these results indicate complementary quantitative dimensions within the healthy reference cohort, supporting their integration into a composite index within the exploratory baseline framework.

3.6. Pronation–Supination Movements of Hands

Pronation–supination performance was analyzed using the kinematic descriptors defined in Section 2.3.6. All recordings satisfied the quality criteria. Representative examples of pose tracking and inter-thumb distance trajectories are shown in Figure 14.

Because the task was executed simultaneously with both hands, the extracted features describe global bimanual coordination rather than side-specific performance. Reference values for the healthy cohort are reported in Table 6. Amplitude (A_mean), frequency (f), decrement ratio (D), and regularity (R) show limited dispersion, whereas pause ratio (P) and temporal jitter (J_t) remain small, consistent with smooth rhythmic rotations.

The Hand Pronation–Supination Index (H-PSI) was obtained using the standardized aggregation framework. Within the normative sample, the index is centered around zero with moderate spread, reflecting physiological variability expected for coordinated bimanual actions. Relationships among features were explored using Spearman correlations and PCA (Figure 15).

Amplitude and regularity descriptors tend to vary coherently, while timing-related measures contribute more independently. The correlation pattern suggests the presence of two descriptive groupings within the normative dataset:

A movement-related set (A_mean, R, D, S_amp), mainly reflecting amplitude and stability of execution.
A timing-related set (P, J_t), associated with cycle-to-cycle irregularities.

In the PCA representation, observations remain compactly distributed, with the first component mainly reflecting overall movement execution and the second capturing residual timing fluctuations. These analyses provide a descriptive view of the internal organization of bimanual rotational performance in healthy individuals and support the integration of complementary dimensions within the composite index.

3.7. Toe Tapping

Toe tapping was analyzed separately for the right and left feet using the descriptors defined in Section 2.3.7. All recordings satisfied the quality criteria. Representative examples of pose tracking and vertical toe trajectories are shown in Figure 16.

Reference distributions for the healthy cohort are summarized in Table 7. Amplitude (A_mean), frequency (f), decrement ratio (D), and regularity (R) exhibit compact variability, while pause ratio (P) and temporal jitter (J_t) remain limited, indicating continuous rhythmic execution. Right–left values largely overlap, supporting bilateral consistency within the normative sample.

The Toe Tapping Index (TTI) was computed independently for each side using the standardized aggregation framework. In healthy subjects, both indices are centered around zero, with moderate dispersion reflecting expected inter-individual differences. Relationships among features were explored through Spearman correlations and PCA (Figure 17).

Amplitude and regularity descriptors tend to vary coherently, whereas timing-related measures contribute more independently. Within the healthy reference cohort, these relationships can be qualitatively summarized into two descriptive groupings: an amplitude–regularity set (A_mean, R, D, S_amp) and a timing-related set (P, J_t). In the PCA representation, observations remain compactly distributed around the origin. The first component mainly reflects overall movement execution, while the second captures residual timing fluctuations.

Together, these results provide a descriptive picture of the internal organization of toe-tapping performance within the normative dataset and support the combination of complementary dimensions into the composite index.

3.8. Leg Agility

Leg agility was evaluated separately for the right and left sides using the kinematic descriptors defined in Section 2.3.8. All recordings met the quality requirements. Representative examples of pose tracking and vertical heel trajectories are shown in Figure 18.

Reference values for the healthy cohort are reported in Table 8. Amplitude (A_mean), frequency (f), decrement ratio (D), and regularity (R) present limited dispersion, whereas pause ratio (P) and temporal jitter (J_t) remain small, consistent with continuous rhythmic execution. Right–left distributions largely overlap, supporting bilateral consistency within the normative sample.

Associations among features were explored using Spearman correlations and PCA (Figure 19). Figure 19 illustrates which leg agility descriptors capture largely overlapping information and which ones contribute orthogonal timing-related variability in the normative dataset. Amplitude and stability descriptors tend to vary coherently, whereas timing-related measures contribute more independently.

Within the healthy reference cohort, these relationships can be qualitatively summarized into two descriptive groupings: an amplitude–stability set (A_mean, R, D, S_amp) and a timing-related set (P, J_t). In the PCA space, observations remain compactly distributed around the origin. The first component mainly reflects overall movement execution, while the second captures residual timing fluctuations.

These analyses (Figure 19) provide a descriptive representation of how different quantitative dimensions of leg agility are organized within the normative dataset and support their integration into a composite index.

3.9. Arising from a Chair

The sit-to-stand task was analyzed using the descriptors defined in Section 2.3.9. All recordings satisfied the quality criteria. Representative frames illustrating the main phases of the movement are reported in Figure 20, while examples of the principal trajectories used for feature computation are shown in Figure 21.

Reference values for the healthy cohort are summarized in Table 9. Temporal and kinematic variables (t_rise, v_mean, a_peak) indicate consistent propulsion across subjects, with minimal reliance on upper limbs (U_hands). Trunk descriptors (θ_forward_max, θ_forward_slope, θ_lateral_max) show moderate variability, reflecting individual strategy differences, while R_sym remains high and P small, consistent with continuous execution.

The composite Chair-Rising Performance Index (CRPI) is centered around zero within the normative group, with moderate dispersion reflecting the multi-segment nature of the movement.

Associations among features were explored using Spearman correlations and PCA. As shown in Figure 22, propulsion-related and compensation-related descriptors exhibit partially distinct but interacting patterns within healthy sit-to-stand performance.

Propulsion-related measures tend to vary coherently, whereas lateral adjustments and brief hesitations contribute more independently.

Within the healthy reference cohort, these relationships can be qualitatively summarized into two descriptive groupings: a propulsion–strategy set (t_rise, v_mean, a_peak, θ_forward_max, θ_forward_slope) and a compensation-related set (U_hands, P, θ_lateral_max). In the PCA representation, subjects remain compactly distributed without evident separation.

The leading component mainly reflects global rising mechanics, while the second captures inter-individual differences in timing and trunk strategy.

These analyses provide a descriptive overview of how complementary aspects of the sit-to-stand action are organized within the normative dataset and support their integration into a unified index.

3.10. Gait

The gait assessment setup is shown in Figure 23, with pose skeletons overlaid on the frontal and lateral camera views during free walking and obstacle-avoidance walking. The gait task was executed following the protocol described in Section 2.3.10.

Because the available straight path in the experimental room was limited to 5 m, participants repeated the trajectory multiple times, including direction changes, in order to approximate the walking distance required by the clinical procedure while increasing the number of cycles available for analysis.

Representative kinematic signals describing lower-limb coordination, foot clearance, arm swing, forward displacement, and trunk posture are shown in Figure 24. Together, these trajectories provide a multidimensional description of stride amplitude, rhythmicity, inter-limb coordination, and balance control during locomotion. Reference values for the healthy cohort are summarized in Table 10. Step length, heel lift, and arm swing exhibit stable magnitudes across subjects, whereas variability metrics remain low. Trunk inclination measures show moderate dispersion, reflecting physiological differences in walking style without indicating instability.

After normalization, features were aggregated into the Gait Index (GI). Within the healthy reference group, the GI is centered around zero with limited spread, consistent with the constrained variability expected in normal locomotion.

Relationships among descriptors were explored using Spearman correlations and PCA. As shown in Figure 25, gait variables organize into two partially distinct components: locomotor amplitude/coordination and variability/postural control. Amplitude-related variables tend to covary, whereas variability metrics display more independent behavior within the healthy cohort.

Within the normative dataset, these associations can be qualitatively summarized into two descriptive groupings: a stride–coordination set (step_length_mean, heel_lift_max, arm_swing_amp, distance_traveled) and a variability–postural-control set (step_length_CV, heel_lift_CV, arm_swing_CV, lateral_tilt_CV, forward_tilt_CV). In the PCA representation, subjects form a compact cloud without evident separation. The first component mainly reflects global locomotor expression, whereas the second captures inter-individual differences in timing consistency and trunk control. Overall, the multivariate organization observed in healthy participants indicates that gait emerges from complementary but not redundant contributions of amplitude and variability descriptors, supporting their integration within a composite index.

3.11. Freezing of Gait

The Freezing of Gait (FoG) analysis was derived from the same standardized kinematic feature space used for the general gait evaluation (Section 3.10). Consequently, subjects, trajectories, descriptive statistics, and multivariate relationships remain identical and are not repeated here.

The present section reinterprets that shared feature space with a specific focus on patterns commonly associated with freezing-related vulnerability. From a biomechanical perspective, freezing phenomena are typically described through two complementary tendencies [30]: reduction in effective locomotor amplitude and increase in cycle-to-cycle variability. Following this view, amplitude-related descriptors (step_length_mean, heel_lift_max, arm_swing_amp, distance_traveled) and variability-related descriptors (step_length_CV, heel_lift_CV, arm_swing_CV, lateral_tilt_CV, forward_tilt_CV) were reorganized according to their functional interpretation within the freezing framework.

Using this grouping, a Freezing of Gait Index (FoGI) was computed at the subject level. Although the FoGI relies on the same normalized variables employed for the Gait Index (GI), the aggregation logic differs. In the GI, deviations may compensate across dimensions, yielding a global summary of gait performance. In contrast, the FoGI aligns features along a common direction associated with hypometria and instability.

To obtain this alignment, amplitude-related descriptors were sign-inverted prior to aggregation, whereas variability measures were maintained in their original form. Within the healthy cohort, the FoGI distribution remains centered around zero by construction (FoGI = 0.00 ± 0.35 SD). The slightly broader spread compared with the GI reflects the cumulative effect of coherently oriented contributions rather than compensatory balancing across variables.
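The difference between the two aggregation logics can be sketched as follows. This is a minimal illustration that assumes both indices are equal-weight means of the normalized features; the function names and the example subject are hypothetical, and the exact formulas remain those defined in the Methods.

```python
from statistics import mean

# Feature names follow the two groupings listed in the text.
AMPLITUDE = ["step_length_mean", "heel_lift_max", "arm_swing_amp", "distance_traveled"]
VARIABILITY = ["step_length_CV", "heel_lift_CV", "arm_swing_CV",
               "lateral_tilt_CV", "forward_tilt_CV"]

def gait_index(z):
    """Assumed GI form: plain equal-weight mean of all z-scores."""
    return mean(z[name] for name in AMPLITUDE + VARIABILITY)

def fog_index(z):
    """Assumed FoGI form: amplitude z-scores are sign-inverted before averaging."""
    aligned = [-z[name] for name in AMPLITUDE] + [z[name] for name in VARIABILITY]
    return mean(aligned)

# Hypothetical subject: reduced amplitude (z = -1) and increased variability (z = +1).
z_example = {name: -1.0 for name in AMPLITUDE}
z_example.update({name: 1.0 for name in VARIABILITY})

gi_value = gait_index(z_example)    # (-4 + 5) / 9: opposite signs partly cancel
fogi_value = fog_index(z_example)   # (4 + 5) / 9 = 1.0: contributions add up
```

For this hypothetical profile, the plain mean stays close to zero because the two groups offset each other, whereas the sign-aligned mean accumulates both tendencies in the same direction.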

The internal organization of the feature space follows the same qualitative structure described for gait, where amplitude and variability dimensions behave as partially independent components. In this context, the FoGI should be interpreted as a thematic reweighting of the existing descriptors, intended to summarize the expression of patterns that, in clinical settings, are often discussed in relation to freezing.

Overall, this reinterpretation does not introduce new measurements but provides an alternative lens through which the same biomechanical information can be summarized, maintaining full consistency with the normative framework defined for gait.

3.12. Postural Stability

Postural stability was assessed through a perturbation-based task designed to observe how subjects reorganize balance after an externally applied backward disturbance. Aside from rigidity testing, this was the only protocol item involving the interaction between the participant and an operator. The subject stood upright while the operator delivered the perturbation as described in Section 2.3.12. Representative moments of the maneuver are shown in Figure 26a,b.

Using the procedures defined in Section 2.3.12, features were extracted to describe stepping behavior, stance modulation, recovery timing, trunk motion, and residual sway. Descriptive statistics for the healthy cohort are reported in Table 11.

Across subjects, feature distributions appear compact and physiologically plausible. The number of backward or balancing steps is generally low, while stance widening and hip displacement remain limited.

Recovery time and peak backward velocity occupy relatively narrow ranges, indicating consistent responses within the cohort. Trunk excursion and center-of-mass displacement show moderate dispersion, whereas sway measures remain small in magnitude. These patterns describe the variability expected in neurologically intact individuals exposed to the same perturbation protocol.

After normalization, the Postural Stability Index (PSI) was computed for each subject. The distribution is centered around zero by construction (PSI = 0.00 ± 0.30 SD), with dispersion reflecting the coexistence of multiple recovery strategies rather than the presence of extreme behaviors. This provides an internal descriptive baseline for future comparisons.

For ease of reading, Figure 27 highlights the separation between step-based recovery behavior and sway-based stabilization, which represent two complementary modes of variability in the healthy cohort.

The Spearman matrix (Figure 27, left) reveals a structured pattern of monotonic associations. Stepping-related variables are naturally linked, with steps_backward_count closely associated with balance_steps_number. Recovery_step_length and peak_backward_velocity show inverse relationships, suggesting that larger steps tend to accompany more gradual deceleration. In contrast, sway descriptors exhibit weaker associations with stepping measures, indicating partial independence between oscillatory stabilization and step-based corrections. Trunk_angle_peak shows heterogeneous correlations, consistent with its role as an auxiliary adjustment. This organization can be summarized into two recurrent descriptive groupings:

Stepping–recovery descriptors (e.g., steps_backward_count, balance_steps_number, recovery_step_length, peak_backward_velocity), related to gross repositioning of the base of support.
Postural–oscillation descriptors (e.g., sway_ap_std, sway_ml_std, trunk_angle_peak), reflecting finer stabilization dynamics.

The PCA representation (Figure 27, center–right) is coherent with this reading. The first two components account for 47.4% of the variance (PC1 = 27.7%, PC2 = 19.7%). PC1 is primarily influenced by stepping and displacement variables, whereas PC2 receives stronger contributions from sway and trunk-related measures. The score distribution is compact and does not show evident outliers, indicating comparable recovery behaviors across participants. Importantly, these analyses are descriptive: they characterize how the selected variables organize within a healthy population and help identify dominant modes of variation in the dataset. They do not imply validation of specific physiological constructs but provide a structured reference for interpreting deviations in future patient cohorts.

3.13. Posture

Postural performance was evaluated using the experimental configuration described previously. Subjects performed the posture-related tasks detailed in Section 2.3.13, including quiet standing, rising from a chair, and walking.

The objective of this item is to capture both static alignment and dynamic trunk control within a unified quantitative framework corresponding to MDS-UPDRS item 3.13.

Representative examples are shown in Figure 28. The reconstructed skeletons illustrate typical upright alignment in the frontal and sagittal planes, while the reported trajectories highlight the small-amplitude oscillations normally observed during quiet stance.

Descriptive statistics for all posture-related variables are summarized in Table 12 and define the healthy reference baseline for index computation. Static alignment measures show small offsets and limited dispersion, consistent with stable upright posture. Dynamic features extracted from the rising-from-chair task exhibit moderate variability, reflecting natural inter-individual differences in strategy, whereas the regularity index remains uniformly high. During gait, trunk inclinations and their variability remain within narrow ranges, indicating repeatable postural behavior across steps.

Based on the standardized features, the Posture Performance Index (PPI) was computed for each subject. All variables were sign-aligned so that reduced sway and smaller deviations from neutral posture contributed coherently within the aggregated measure. By construction, the healthy cohort is centered around zero (PPI = 0.00 ± 0.46 SD), providing an internal reference for future comparisons. The correlation matrix (Figure 29, left) displays moderate and distributed associations without dominant block structures.

Static alignment and dynamic trunk-control descriptors occupy partially distinct yet interacting regions of the feature space, with mild intra-task coupling and limited cross-task relationships, indicating that multiple aspects of posture are sampled within the protocol.

PCA (Figure 29, center and right) provides a complementary low-dimensional representation. The first two components explain 25.4% and 18.6% of total variance, with subjects continuously distributed and no evident clustering. Loadings are balanced, suggesting that variability reflects the combined contribution of several descriptors rather than a single dominant parameter.

These analyses offer an exploratory characterization of posture-related variability within the healthy reference cohort, supporting the internal structure underlying computation of the PPI.

3.14. Body Bradykinesia

Body bradykinesia reflects a global reduction in spontaneous movement amplitude and speed involving axial control, transitional actions, and locomotion. In this framework, the construct was quantified in correspondence with MDS-UPDRS item 3.14 by integrating descriptors derived from quiet standing, rising from a chair, and gait, as defined in Section 2.3.14. Descriptive statistics for the contributing variables are reported in Table 13 and represent the healthy reference baseline for the computation of the Global Body Bradykinesia Performance Index (GBPI).

Across subjects, the distributions are compact and physiologically plausible. Static posture measures show limited dispersion, consistent with stable alignment. Rising-from-chair parameters display coherent temporal and dynamic patterns with minimal pauses. Gait descriptors exhibit slightly broader variability, reflecting natural differences in spontaneous locomotor style, yet remain within homogeneous limits.

Based on the standardized feature set, the GBPI was computed for each subject after sign alignment, so that reduced amplitude, slower execution, or increased variability contributed coherently to the aggregated metric. As expected from the normalization procedure, the healthy cohort is centered around zero (GBPI = 0.00 ± 0.32 SD). This distribution provides the internal reference model for subsequent comparisons.

Because this index integrates heterogeneous tasks, Figure 30 is intended as a structural summary showing that posture, sit-to-stand, and gait contribute partially distinct information within the internal reference population.

The correlation matrix (Figure 30, left) reveals predominantly low-to-moderate associations. Variables belonging to the same task tend to present mild coupling—for instance among gait amplitude measures or among trunk-related sit-to-stand descriptors—whereas cross-task correlations are weaker. This pattern indicates that posture, transitional movements, and walking contribute partially distinct information within the present dataset. PCA (Figure 30, right) provides a complementary summary. The first two components explain 21.0% and 18.8% of total variance, with subjects continuously distributed and no evident clustering.

Loadings (Figure 31) suggest that gait-related variables contribute more prominently to the first component, whereas sit-to-stand descriptors influence the second; static posture features have smaller contributions, consistent with their reduced variability in healthy individuals.

As in the other sections, these multivariate analyses should be interpreted as exploratory descriptions of how the selected variables organize within the reference population. They illustrate the internal structure supporting the GBPI computation, without implying validation of specific clinical constructs.

3.15. Postural Tremor of the Hands

Postural tremor was recorded using the acquisition configuration described for the facial and speech tasks. Participants kept both arms outstretched for 10 s, as specified in Section 2.3.15. This condition exposes the small-amplitude, high-frequency oscillations typical of physiological postural tremor. Figure 32 presents an example of the reconstructed pose together with representative fingertip displacement signals.

The trajectories show rapid oscillations superimposed on a stable postural reference and constitute the basis for amplitude-, frequency-, regularity-, and drift-related metrics. Descriptive statistics for each hand are summarized in Table 14 and define the normative reference used for standardization.

In healthy subjects, amplitude descriptors (A_peak, A_RMS, A_95) remain small and tightly distributed. The dominant frequency typically falls within the expected physiological range (≈7–10 Hz), while bandwidth values are generally narrow. Temporal measures indicate high rhythmic stability, with S_reg close to unity and modest cycle-to-cycle variability. Drift remains minimal, reflecting the ability to maintain a steady posture. No systematic differences emerge between hands.
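For readers unfamiliar with spectral descriptors, the sketch below shows one simple way to estimate a dominant frequency from a displacement signal: the peak of a discrete Fourier spectrum within a search band. It is an illustrative assumption, not the procedure of Section 2.3.15; the frame rate (60 Hz), the search band (3–15 Hz), the function name, and the synthetic signal are all hypothetical.

```python
import cmath
import math

def dominant_frequency(signal, fs, f_min=3.0, f_max=15.0):
    """Return the frequency (Hz) of the largest spectral peak inside [f_min, f_max]."""
    n = len(signal)
    mean_value = sum(signal) / n
    centered = [s - mean_value for s in signal]  # remove the static postural offset
    best_f, best_power = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if not (f_min <= f <= f_max):
            continue
        # Direct evaluation of the k-th DFT coefficient (adequate for short recordings).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_f, best_power = f, power
    return best_f

# Synthetic 10 s recording: an 8 Hz oscillation plus a slow linear drift.
fs = 60.0  # hypothetical camera frame rate
t = [i / fs for i in range(600)]
signal = [0.5 * math.sin(2 * math.pi * 8.0 * ti) + 0.002 * ti for ti in t]

peak_hz = dominant_frequency(signal, fs)  # 8.0 for this synthetic signal
```

With a 10 s window the frequency resolution is 0.1 Hz, and a frame rate of at least 20 Hz would be needed to resolve components up to 10 Hz.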

After z-score normalization, unilateral composite indices were computed for each side. As expected, both distributions are centered around zero (left SD = 0.592; right SD = 0.515), reflecting physiological variability while maintaining overall homogeneity.

In addition to unilateral descriptors, bilateral measures were evaluated to characterize inter-hand relationships. Their distributions are reported in Table 15.

Symmetry values are generally small, whereas temporal coupling shows broader dispersion, consistent with the largely local origin of physiological tremor. These bilateral parameters are not included in the index aggregation but may provide complementary information in future comparative analyses.

The Spearman matrices (Figure 33) reveal clear internal organization. Amplitude metrics form a strongly coupled group, indicating substantial redundancy among different formulations of tremor magnitude. Frequency and cycle variability measures show coherent associations, while drift remains comparatively independent. The same qualitative structure is observed for both hands.

PCA (Figure 34) provides a compact representation of this arrangement. The first two components explain 41.2% and 24.6% of total variance, respectively. Observations from the two hands overlap extensively, confirming the absence of lateral differentiation in the healthy cohort. Loadings indicate that amplitude-related features mainly contribute to the first component, whereas frequency and rhythmic descriptors influence the second. Drift and bandwidth display smaller, distributed contributions.

As in the previous sections, these analyses should be interpreted as exploratory descriptions of the internal geometry of the feature space within the reference population. They clarify how tremor descriptors relate to one another and motivate their aggregation, without implying validation of clinical constructs.

3.16. Kinetic Tremor of the Hands

For this task, participants executed repeated nose–finger movements with each hand, following the protocol described in Section 2.3.16 (Figure 35). The paradigm combines voluntary, goal-directed motion with superimposed micro-oscillations, enabling the characterization of kinetic tremor during movement execution.

The distance profiles are dominated by the large voluntary excursion, while small, irregular high-frequency fluctuations represent the tremor component. These residual oscillations were used to extract amplitude, spectral, temporal regularity, and duty cycle descriptors. Their distributions in the healthy cohort are summarized in Table 16 and define the reference baseline for normalization.

Across subjects, amplitude measures remain extremely small, confirming that tremor magnitude during voluntary movement is limited in physiological conditions. Dominant frequencies typically lie within the expected 5–10 Hz interval, whereas bandwidth values are broader than those observed in postural tremor, reflecting reduced spectral coherence. Temporal descriptors indicate moderate irregularity, with S_reg well below unity and non-negligible cycle variability. The duty cycle provides complementary information on the proportion of time in which oscillatory activity is detectable. No systematic lateral differences are observed.

Using standardized values, the Kinetic Tremor Index was computed separately for each hand.

As expected, both distributions are centered around zero (SD ≈ 0.46 right, 0.39 left), representing normal inter-subject variability without evidence of asymmetry.

The Spearman matrices (Figure 36) highlight a reproducible internal structure.

Amplitude metrics form a tightly coupled group, confirming substantial redundancy among magnitude descriptors. Relationships between frequency and cycle variability reveal an additional coherent pattern, whereas the duty cycle shows weaker associations, supporting its complementary role. The qualitative organization is similar for both hands.

PCA (Figure 37) offers a compact summary of these relationships. The first two components explain 57.7% and 14.3% of the total variance. Score distributions from the two hands largely overlap, again indicating bilateral homogeneity. Loadings show that amplitude-related features primarily shape the first component, while frequency and timing descriptors contribute mainly to the second; remaining variables exert smaller distributed effects.

Consistent with the other motor tasks, these multivariate analyses are intended as exploratory descriptions of feature interdependence within the normative population. They clarify how descriptors group together and inform aggregation choices, without implying construct validation.

3.17. Rest Tremor Amplitude

For this task, participants remained at rest while kinematic signals were recorded according to the protocol described in Section 2.3.17. Representative pose reconstructions and example distance trajectories from a fixed reference point for facial and body landmarks are shown in Figure 38.

The signals display extremely small fluctuations, consistent with physiological micro-movements rather than clinically relevant tremor. Region-specific amplitude descriptors were extracted for the jaw/lips and for the left and right sides of the body. Their distributions in healthy controls are summarized in Table 17 and constitute the reference baseline for standardization.

Across all regions, amplitudes remain minimal and show strong bilateral symmetry. Facial values are generally lower than limb values, in agreement with typical observations in neurologically intact individuals. The limited dispersion primarily reflects residual motion and measurement noise. Standardized z-scores were averaged with equal weights to compute three regional indices: the Jaw/Lips, Right-Body, and Left-Body Rest Tremor Amplitude Indices.

As expected, their means are close to zero within the reference population. In accordance with the clinical definition of item 3.17, a global Rest Tremor Amplitude Index (RTAI) was defined as the maximum of the three regional values.

The slightly positive mean of the resulting distribution (≈0.07) arises from this order-statistic operation rather than from abnormal oscillatory activity.
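The effect of the maximum operator can be verified on a toy example. The regional values below are hypothetical and chosen so that each region averages exactly to zero; the function name is likewise illustrative.

```python
from statistics import mean

# Hypothetical regional indices for four subjects; each region has zero mean.
jaw_lips = [0.2, -0.2, 0.1, -0.1]
right_body = [-0.1, 0.1, -0.2, 0.2]
left_body = [0.0, 0.0, 0.1, -0.1]

def global_rtai(jaw, right, left):
    """Subject-level index: the largest of the three regional values."""
    return [max(triple) for triple in zip(jaw, right, left)]

rtai = global_rtai(jaw_lips, right_body, left_body)
rtai_mean = mean(rtai)  # 0.15 here, although every regional mean is zero
```

Because the maximum of several values is never smaller than any one of them, the mean of the global index cannot fall below the largest regional mean, which explains a small positive offset even in a cohort without abnormal tremor.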

The Spearman matrices shown in Figure 39 indicate near-collinearity among amplitude descriptors within each anatomical region. These measures are almost perfectly associated, suggesting that they capture highly overlapping aspects of tremor magnitude and can be summarized through a single aggregated amplitude index.

PCA provides a compact representation of the same geometry (Figure 40). The first component alone explains virtually the entire variance (>99%), while subsequent components contribute negligibly. Observations from different regions largely overlap in score space, and loadings are strongly collinear. As in the previous sections, these analyses should be interpreted as descriptive summaries of internal redundancy in the healthy dataset.

They clarify why a single aggregated index can represent rest tremor amplitude while regional measures remain available for anatomical localization.
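The link between near-collinearity and a dominant first component can be made explicit with a simplified worked case. Assume, purely for illustration, that PCA is applied to p standardized amplitude descriptors sharing a common pairwise correlation ρ ≥ 0 (the symbols p and ρ are introduced here and do not appear in the original analysis). The leading eigenvalue of the correlation matrix and the corresponding fraction of explained variance are then:

```latex
\lambda_1 = 1 + (p-1)\,\rho,
\qquad
\frac{\lambda_1}{\sum_{k=1}^{p} \lambda_k} = \frac{1 + (p-1)\,\rho}{p}
```

For example, with p = 3 and ρ = 0.99 the first component would account for (1 + 1.98)/3 ≈ 99.3% of the variance, which is of the same order as the value observed in Figure 40.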

3.18. Constancy of Rest Tremor

Rest tremor amplitude (item 3.17) and rest tremor constancy (item 3.18) are both evaluated across the entire examination, but they address distinct clinical aspects. Whereas item 3.17 relies on peak magnitude, item 3.18 concerns the temporal persistence of tremor and therefore requires time-based descriptors.

For this reason, tremor detections were aggregated over all intervals in which each anatomical region was identified as being at rest, independently of the specific task being performed. This strategy mirrors the clinical logic of the MDS-UPDRS, where constancy reflects observations accumulated throughout the visit rather than during a single standardized test.

Reference values for region-specific constancy descriptors are summarized in Table 18.

Across all regions, the results are consistent with physiological resting conditions: tremor events are rare, brief, and poorly sustained. Rest Tremor Constancy Index (RTCI) values correspond to a very small fraction of the available rest time, burst counts are low, tremor-on durations remain short, and tremor-free intervals are long. Bilateral symmetry between body sides is preserved, while slightly higher values in the jaw/lips region likely reflect normal micro-movements and measurement sensitivity.

To obtain a subject-level quantity aligned with the clinical rule of item 3.18, a global RTCI was defined as the maximum value across regions. Within the healthy cohort, this distribution remains tightly clustered near zero, as expected in neurologically intact individuals. The use of the maximum operator ensures compatibility with clinical scoring while maintaining a continuous metric.
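A minimal sketch of how such time-based descriptors could be derived is given below. It assumes that each region yields a per-frame boolean tremor mask restricted to rest intervals and that the regional RTCI is the fraction of rest frames flagged as tremor; both assumptions, together with the function names, the frame rate, and the example values, are hypothetical and are not taken from the Methods.

```python
def constancy_descriptors(tremor_flags, frame_rate_hz):
    """Summarize a per-frame tremor mask recorded while a region is at rest."""
    bursts = []  # lengths, in frames, of consecutive tremor-on runs
    run = 0
    for flag in tremor_flags:
        if flag:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)  # close a burst that reaches the end of the mask
    n = len(tremor_flags)
    on_frames = sum(bursts)
    return {
        "rtci": on_frames / n if n else 0.0,  # assumed: fraction of rest time with tremor
        "burst_count": len(bursts),
        "mean_on_s": on_frames / len(bursts) / frame_rate_hz if bursts else 0.0,
    }

def global_rtci(regional_rtci):
    """Subject-level value: maximum across anatomical regions."""
    return max(regional_rtci.values())

# Hypothetical mask: 100 rest frames at 25 Hz with two short bursts (3 and 2 frames).
flags = [False] * 20 + [True] * 3 + [False] * 50 + [True] * 2 + [False] * 25
summary = constancy_descriptors(flags, frame_rate_hz=25.0)
subject_rtci = global_rtci({"jaw_lips": summary["rtci"],
                            "right_body": 0.01, "left_body": 0.02})
```

In this example the two bursts cover 5 of 100 frames, so the regional value is 0.05 and, being the largest of the three regions, also becomes the subject-level value.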

Spearman correlations were computed separately for each anatomical region to examine how temporal constancy metrics relate to one another. As shown in Figure 41, several descriptors exhibit monotonic associations within each region, indicating partially overlapping representations of tremor persistence.

RTCI shows strong positive associations with tremor-on duration and with the persistence index, and negative associations with tremor-off duration. Burst count tends to increase with RTCI, although with greater variability. Similar patterns appear across regions, suggesting a coherent organization of the temporal descriptors. As in previous sections, these relationships should be interpreted as descriptive properties of the feature set within the healthy population.

PCA offers a compact visualization of the same geometry (Figure 42). The first principal component captures the vast majority of variance (>95%), while the remaining components contribute only marginally. Loadings indicate that RTCI, tremor-on duration, burst number, and persistence index vary in a common direction, whereas tremor-off duration is oriented oppositely. Observations from different anatomical regions largely overlap in score space.

Together, these analyses indicate that rest tremor constancy in healthy individuals can be summarized along a predominant temporal dimension. This supports the practical use of RTCI as a continuous descriptor of tremor persistence while preserving regional measures for localization when needed.

3.19. Sensitivity Analysis of Composite Indices

Uniform aggregation of multiple standardized features offers a transparent and easily reproducible strategy for constructing Motor Item Indices; however, when features are partially correlated, equal weighting may introduce redundancy and could in principle overemphasize specific dimensions. To characterize the structural stability of the proposed composite indices with respect to feature redundancy, a sensitivity analysis was performed using a leave-one-out procedure.

3.19.1. Intra-Item Leave-One-Feature-Out Analysis

For each motor item, the corresponding MII was first computed from the full standardized feature set as defined in Section 2 (with sign alignment applied so that reduced amplitude, slower execution, increased variability, or reduced spontaneity contributed coherently to the index). The index was then recomputed repeatedly by excluding one feature at a time (leave-one-feature-out), while keeping the remaining features unchanged.

For each exclusion, the reduced index was compared to the full index across subjects by computing the Pearson correlation coefficient. High correlations indicate that the composite index is not dominated by any single descriptor and that removing an individual (potentially correlated) feature does not materially alter the subject-wise ordering induced by the index.

Across motor items, leave-one-feature-out correlations were consistently high (range approximately 0.85–0.99 in the present dataset), indicating that the composite indices are structurally stable to the removal of any single contributing feature. Item-wise results are reported in Supplementary Table S4.
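The leave-one-feature-out check can be sketched in a few lines. The example below assumes an equal-weight mean of sign-aligned z-scores as the composite index; the function names and the small data matrix are hypothetical and serve only to show the mechanics.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def composite(z_rows, skip=None):
    """Equal-weight mean of each subject's z-scores, optionally skipping one feature."""
    return [mean(v for j, v in enumerate(row) if j != skip) for row in z_rows]

def leave_one_feature_out(z_rows):
    """Correlate the full index with each reduced index (one feature removed)."""
    full = composite(z_rows)
    n_features = len(z_rows[0])
    return [pearson(full, composite(z_rows, skip=j)) for j in range(n_features)]

# Hypothetical sign-aligned z-scores: 5 subjects (rows) x 3 features (columns).
z_rows = [
    [-1.2, -0.8, -1.0],
    [-0.5, -0.1, -0.6],
    [0.0, 0.3, -0.2],
    [0.6, 0.2, 0.9],
    [1.1, 0.4, 0.9],
]
lofo_r = leave_one_feature_out(z_rows)  # one correlation per excluded feature
```

Because the three hypothetical features are strongly related, each reduced index remains highly correlated with the full one; a markedly lower value for one exclusion would flag a feature that dominates the composite.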

3.19.2. Inter-Task Leave-One-Block-Out Analysis for Global Body Bradykinesia (GBPI)

Because the Global Body Bradykinesia Performance Index (GBPI; item 3.14) integrates heterogeneous motor contexts (arising from chair, gait, and static posture), stability was additionally evaluated at the task-block level. After computing the full GBPI from the concatenated standardized feature set, three reduced GBPI variants were obtained by excluding one entire block at a time (arising-from-chair, gait, or static posture), while retaining the other two blocks. Each reduced GBPI was compared to the full GBPI via Pearson correlation across subjects.

The resulting correlations remained high when excluding each block (r = 0.962 without arising-from-chair, r = 0.970 without gait, and r = 0.963 without static posture), indicating that no single motor context disproportionately drives the global index and that the multi-domain aggregation preserves a stable representation of whole-body performance within the internal reference cohort.

Overall, these sensitivity analyses provide a descriptive check of index stability under alternative feature inclusion scenarios, supporting the use of uniform weighting as a transparent initialization strategy at this baseline definition stage.

3.20. Robustness of Baseline Normalization

Because the Motor Item Indices (MIIs) are derived from z-score normalization relative to the healthy reference cohort, the stability of the estimated normalization parameters (mean μ and standard deviation σ) was explicitly evaluated.

Given the sample size (n = 15) and the sex imbalance within the cohort, additional analyses were performed to characterize the sensitivity of the baseline normalization to sample perturbations and distributional assumptions.

3.20.1. Leave-One-Subject-Out Stability

A leave-one-subject-out (LOSO) procedure was applied to all motor items. For each subject, μ and σ were recomputed excluding that individual, and the corresponding MII was recalculated using the reduced reference set. The LOSO-derived MIIs were then compared with the original full-sample MIIs across subjects.

Across all items, correlations between full-sample and LOSO MIIs were consistently high (see Supplementary Table S5), and mean absolute differences remained small relative to the unit scale of the standardized indices. These findings indicate that excluding a single subject does not materially alter the subject-wise ordering or magnitude of the derived indices within the present dataset, suggesting limited sensitivity of the normalization parameters to individual sample perturbations.
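
The LOSO procedure can be sketched as follows. The sketch assumes that each subject's index is the mean of sign-aligned z-scores and that σ is the sample standard deviation (n − 1 denominator). Neither choice is confirmed by the text above. The function names, the sign vector, and the toy feature values are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mii_for_subject(raw_row, reference_rows, signs):
    """Mean sign-aligned z-score of one subject against a reference set."""
    z = []
    for j, sign in enumerate(signs):
        column = [row[j] for row in reference_rows]
        mu = statistics.fmean(column)
        sigma = statistics.stdev(column)  # sample SD; an assumption
        z.append(sign * (raw_row[j] - mu) / sigma)
    return statistics.fmean(z)


def loso_stability(raw_rows, signs):
    """Compare full-sample indices with leave-one-subject-out indices."""
    full = [mii_for_subject(row, raw_rows, signs) for row in raw_rows]
    loso = []
    for i, row in enumerate(raw_rows):
        reference = raw_rows[:i] + raw_rows[i + 1:]  # exclude subject i
        loso.append(mii_for_subject(row, reference, signs))
    r = pearson(full, loso)
    mean_abs_diff = statistics.fmean([abs(a - b) for a, b in zip(full, loso)])
    return full, loso, r, mean_abs_diff


# Toy raw features (NOT study data): [amplitude, cycle duration] per subject.
raw_rows = [
    [10.2, 0.52], [11.5, 0.47], [9.1, 0.58],
    [10.8, 0.50], [12.0, 0.45], [9.6, 0.55],
]
signs = [-1, 1]  # hypothetical alignment: low amplitude, long cycles raise the index

full_mii, loso_mii, r_loso, mean_abs_diff = loso_stability(raw_rows, signs)
print(f"r = {r_loso:.3f}, mean |difference| = {mean_abs_diff:.3f}")
```

Excluding a subject from the reference set pushes that subject's own z-scores away from zero, so LOSO indices are expected to be somewhat larger in magnitude than their full-sample counterparts, and more so when the cohort is small.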

3.20.2. Robust Normalization Comparison

To assess sensitivity to distributional assumptions, standard z-score normalization was compared with a robust scaling approach based on median and median absolute deviation (MAD). For each item, MIIs were recomputed using median/MAD scaling, and the resulting indices were compared with the standard z-score–based MIIs.

As reported in Supplementary Table S6, robustly scaled indices showed high concordance with standard MIIs across all items, with minimal deviations in subject-wise ranking and magnitude. This comparison indicates that the baseline normalization is not critically dependent on the assumption of strict normality within the reference cohort.
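
A comparison of this kind can be sketched as below. The factor 1.4826, which makes the MAD comparable to a standard deviation under normality, is an assumption, since the text does not state whether it was applied. The function names, the sign vector, and the toy values are hypothetical. The simple `ranks` helper assumes there are no ties.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n in ascending order (no tie handling)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def zscore(column):
    """Standard scaling with mean and sample standard deviation."""
    mu, sigma = statistics.fmean(column), statistics.stdev(column)
    return [(x - mu) / sigma for x in column]


def robust_score(column):
    """Robust scaling with median and scaled median absolute deviation."""
    med = statistics.median(column)
    mad = statistics.median([abs(x - med) for x in column])
    return [(x - med) / (1.4826 * mad) for x in column]  # 1.4826: assumption


def composite(columns, scaler, signs):
    """Unweighted mean of sign-aligned scaled features, per subject."""
    scaled = [[s * v for v in scaler(col)] for col, s in zip(columns, signs)]
    return [statistics.fmean(vals) for vals in zip(*scaled)]


# Toy raw features (NOT study data); the last subject is a mild outlier.
columns = [
    [10.2, 11.5, 9.1, 10.8, 12.0, 9.6, 10.4, 7.0],    # amplitude
    [0.52, 0.47, 0.58, 0.50, 0.45, 0.55, 0.51, 0.70],  # cycle duration
    [0.08, 0.05, 0.11, 0.07, 0.06, 0.09, 0.12, 0.15],  # timing variability
]
signs = [-1, 1, 1]  # hypothetical sign alignment

standard = composite(columns, zscore, signs)
robust = composite(columns, robust_score, signs)
pearson_r = pearson(standard, robust)
spearman_rho = pearson(ranks(standard), ranks(robust))
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```

For a single feature the two scalings are positive affine transforms of each other and therefore correlate perfectly. Differences only arise in a composite, where the two methods weight the features differently.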

Taken together, these analyses provide an empirical assessment of normalization stability under both sample perturbation and alternative scaling strategies. Within the present healthy reference dataset, the internal baseline parameters exhibit consistent behavior, supporting the methodological stability of the adopted standardization framework at this feasibility stage.

3.21. Summary of Baseline Results and Methodological Scope

Across all analyzed motor tasks, the proposed framework yielded quantitatively consistent and physiologically plausible descriptors of motor behavior in healthy subjects. The extracted kinematic and acoustic features, as well as the resulting Motor Item Indices, showed coherent patterns across repetitions and individuals. Multivariate analyses revealed recurring structural motifs across domains: in most tasks, descriptors organized into partially complementary dimensions reflecting movement amplitude/execution versus timing variability or stability-related components. Correlation matrices did not exhibit dominant block redundancy, and PCA projections showed compact subject distributions without systematic extreme deviations.

The sensitivity analyses presented in Section 3.19 further characterize the internal behavior of the composite indices. Leave-one-feature-out evaluations showed that removing any individual descriptor did not materially alter the subject-wise ordering induced by the Motor Item Indices. For the Global Body Bradykinesia Performance Index, exclusion of entire task blocks (arising-from-chair, gait, or static posture) likewise produced highly concordant results. These observations indicate that the proposed aggregation strategy is not driven by single features or single motor contexts within the present reference dataset.

Taken together, these findings provide an exploratory description of the internal organization of the feature sets and support the methodological coherence of the composite index construction. By establishing a standardized internal reference baseline in a healthy cohort, this study demonstrates the technical feasibility of a fully automated, vision- and audio-based quantification of motor performance aligned with the MDS-UPDRS Part III framework.

The use of continuous, unit-free indices enables fine-grained representation of motor behavior and facilitates quantitative comparison across individuals and tasks while preserving conceptual compatibility with the observational structure of the clinical examination. Importantly, the proposed system intentionally avoids generating automated estimates for motor constructs that are not inferable from visual or kinematic information alone, such as rigidity, which requires passive manipulation and haptic evaluation. This boundary condition maintains adherence to neurological examination principles and prevents overinterpretation of derived metrics.

The present work should therefore be regarded as a baseline characterization and methodological foundation in healthy subjects. It does not establish diagnostic thresholds, severity mapping, or clinical performance metrics. Rather, it defines a structured quantitative reference against which deviations in clinically characterized populations can be examined in future validation studies. Formal clinical validation, association with neurologist-rated MDS-UPDRS scores, and evaluation in Parkinsonian cohorts remain necessary steps to determine diagnostic or severity-related applicability.

4. Conclusions

This work presents a comprehensive, contactless, and fully automated framework for the quantitative description of motor performance in tasks aligned with the MDS-UPDRS Part III.

By integrating stereo vision, deep learning-based pose estimation, and multimodal kinematic and acoustic processing, the system generates continuous, unit-free descriptors designed to remain conceptually compatible with the clinical constructs underlying the neurological motor examination.

A central achievement of the study is the definition of a standardized reference baseline derived from healthy individuals. This baseline enables feature normalization, supports reproducible computation of Motor Item Indices, and provides an internal reference framework for interpreting deviations in future investigations. Within the healthy reference cohort, the indices exhibited organized behavior and limited dispersion. Multivariate analyses provided descriptive visualizations of feature relationships, illustrating structured patterns observed in the present sample. These observations are exploratory and sample-specific and do not imply dimensional stability or structural validation.

Importantly, these analyses are intended as descriptive tools to characterize internal organization and complementarity among features, rather than as demonstrations of clinical or construct validity. The framework was developed in explicit agreement with established neurological examination principles.

Automated estimation is deliberately not attempted for items that cannot be inferred from visual or acoustic information alone, such as rigidity, which requires passive manipulation and haptic evaluation. This boundary condition preserves clinical interpretability and avoids introducing surrogate measurements that would lack examination equivalence.

The contactless architecture offers several methodological advantages. It avoids sensor placement and calibration procedures, reduces operator dependency, and allows standardized acquisition across heterogeneous settings. The resulting indices provide a continuous representation of motor behavior that may facilitate quantitative comparisons across sessions and subjects while maintaining transparency with respect to how each measure is derived.

The present investigation represents a methodological and technical stage centered on the construction of the internal healthy reference baseline. It does not aim to establish diagnostic performance, severity grading, or replacement of clinical judgment. Rather, it defines the quantitative measurement infrastructure necessary for subsequent patient-oriented validation. By demonstrating the feasibility of a fully automated and contactless system capable of generating structured, multidimensional, and standardized motor descriptors aligned with the MDS-UPDRS Part III architecture, this study establishes a reproducible foundation for objective motor quantification.

Within the healthy reference cohort, the derived indices exhibited organized patterns. These observations are descriptive and sample-specific, and do not imply structural validation or dimensional stability.

Future work will extend this framework to clinically characterized Parkinson’s disease cohorts, where formal validation studies will assess agreement with neurologist-rated scores, structural behavior of the indices in pathological conditions, and their suitability for longitudinal investigation. These steps are essential to determine diagnostic, severity-related, or early-stage applicability and to clarify the translational scope of the approach.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/app16063091/s1, Table S1: Test–retest repeatability of digital motor features across tasks; Table S2: Computational configuration parameters; Table S3: Summary of digital descriptors contributing to each Motor Item Index; Table S4: Sensitivity analysis of composite indices; Table S5: Leave-one-subject-out (LOSO) stability of baseline normalization; Table S6: Comparison between standard z-score normalization and robust median/MAD scaling for all Motor Item Indices.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was exempt from requiring formal ethical approval, as determined by the ENEA Ethics Committee (Prot. No. 003/2026, 16 March 2026).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data supporting the findings of this study are available from the author upon reasonable request.

Conflicts of Interest

The author declares no conflicts of interest.

Figure 1. KeyPoints of the FULL-BODY OpenPose model. (Left) HAND-21 model (enlarged view). Starting from the outer side of the palm (HR00 in the case of the right hand shown), each finger is represented by four KeyPoints, with increasing indices from the finger base (at the palm) to the fingertip. The numbering proceeds from the thumb (right hand: HR01–HR04) to the little finger (HR17–HR20). (Center) BODY-25 model. The KeyPoints are defined as follows: Nose → BO00, Neck → BO01, RightShoulder → BO02, RightElbow → BO03, RightWrist → BO04, LeftShoulder → BO05, LeftElbow → BO06, LeftWrist → BO07, MidHip → BO08, RightHip → BO09, RightKnee → BO10, RightAnkle → BO11, LeftHip → BO12, LeftKnee → BO13, LeftAnkle → BO14, RightEye → BO15, LeftEye → BO16, RightEar → BO17, LeftEar → BO18, LeftBigToe → BO19, LeftSmallToe → BO20, LeftHeel → BO21, RightBigToe → BO22, RightSmallToe → BO23, RightHeel → BO24. (Right) FACE-68 model (enlarged view). Starting from the right side near the right ear, 17 KeyPoints follow the facial contour (FA00–FA16). The eyebrows are described by 5 KeyPoints each, numbered from the outer side toward the center (right eyebrow: FA17–FA21; left eyebrow: FA22–FA26). The nose is represented by 9 KeyPoints, with 4 defining the nasal bridge (FA27–FA30) and 5 defining the base (FA31–FA35). Each eye is described by 6 KeyPoints, starting from the upper-right corner (right eye: FA36–FA41; left eye: FA42–FA47). The mouth region is shown with a further magnification to highlight the KeyPoint distribution: the mouth is represented by 20 KeyPoints, with 12 KeyPoints defining the outer lip contour (FA48–FA59) and 8 KeyPoints defining the inner contour (FA60–FA67).

Figure 2. Test environment setup used to assess motor skills. Setup to assess the parameters of the subjects in various conditions, including a free walk (a), a walk with an obstacle to avoid (b), standing still (c), sitting at a medium distance from camera 2 (d), and sitting near camera 2 (e). The arrow indicates the direction of motion of the subject.

Figure 3. Overview of the proposed multimodal processing pipeline. The system processes synchronized audio and stereo video inputs through two parallel branches. The audio stream is analyzed by an automated Dysarthria Analyzer, combining acoustic feature extraction with ASR-based speech intelligibility estimation using Jaccard distance. In parallel, stereo image pairs are processed to obtain depth information via stereovision and 2D body KeyPoints using the OpenPose full-body model. The 2D KeyPoints and 3D spatial information are fused to reconstruct and validate 3D body KeyPoints. Finally, speech- and motion-related descriptors are jointly processed in the feature extraction stage to generate subject-specific multimodal features for subsequent analysis.

Figure 4. Test environment and 3D scene reconstruction. (a) View of the experimental room from camera 1 positioned at P1; (b) view from camera 2 positioned at P4. Panels (c,d) show the corresponding 3D scene reconstructions, illustrating the spatial coverage achieved by the two-camera setup.

Figure 5. Experimental setup for speech and facial expression assessment. (Left) Lateral view of the subject seated during the standardized reading task, illustrating the microphone and camera placement and body KeyPoints used for posture and kinematic reference. (Right) Frontal view of the subject, showing facial and upper-body KeyPoints automatically detected for speech and facial expression analysis. The frontal camera was positioned at a fixed distance and height to ensure consistent video acquisition, while a microphone placed below the camera enabled stable audio recording for acoustic and ASR-based speech analysis. Different colors are used to represent body segments and facial landmarks for visualization purposes only; the same color convention applies to all figures of this type in the manuscript.

Figure 6. Multivariate analysis of speech features. (Left) Spearman correlation matrix of acoustic and intelligibility-related speech features in the healthy control group. (Center) PCA scores showing the distribution of subjects in the space defined by the first two principal components. (Right) PCA loadings illustrating the contribution of each speech feature to PC1 and PC2.

Figure 7. Facial expression task. Pose skeleton of the subject during neutral facial posture (left) and during smile activation (right), captured with the frontal camera.

Figure 8. Raw KeyPoint distance signals used for facial expression analysis. (Left) Eyelid distance trajectories during right-eye blinking. (Right) Lip separation distances during smile activation.

Figure 9. Multivariate analysis of facial expression features. (Left) Spearman correlation matrix among blink- and smile-related features. (Center) PCA score plot showing inter-subject variability in the PC1–PC2 space. (Right) PCA loadings highlighting complementary contributions of eyelid dynamics and smile-related measures.

Figure 10. Pose skeletons and thumb–index distance time series for right hand (top) and left hand (bottom), illustrating KeyPoint tracking and cycle segmentation during finger tapping.

Figure 11. Multivariate analysis of finger tapping features. Spearman correlation matrices for left and right hands (left, center) and PCA biplot of normalized kinematic features (right) for the healthy control group (n = 15 subjects, 30 observations).

Figure 12. Pose skeletons and thumb–little finger distance time series for right hand (top) and left hand (bottom), illustrating KeyPoint tracking and cycle segmentation during the hand movement task.

Figure 13. Multivariate analysis of hand movement features. Spearman correlation matrices for left and right hands (left, center) and PCA biplot of normalized kinematic features (right) for the healthy control group (n = 15 subjects, 30 observations).

Figure 14. Pose skeletons and inter-thumb distance time series illustrating KeyPoint tracking and cycle segmentation during the pronation–supination task.

Figure 15. Multivariate analysis of pronation–supination features. Spearman correlation matrix (left) and PCA biplot of normalized kinematic features (right) for the healthy control group (n = 15).

Figure 16. Pose skeletons and y coordinate time series for right (top) and left feet (bottom), illustrating KeyPoint tracking and cycle segmentation during the toe-tapping task.

Figure 17. Multivariate analysis of toe-tapping features. Spearman correlation matrices for left and right feet (left, center) and PCA biplot of normalized kinematic features (right) for the healthy control group (n = 15 subjects, 30 observations).

Figure 18. Pose skeletons and vertical heel displacement time series for the right (top) and left legs (bottom), illustrating KeyPoint tracking and cycle segmentation during the leg agility task.

Figure 19. Multivariate analysis of leg agility features in the healthy control group (n = 15). (Left, center) Spearman correlation matrices for the right and left legs. (Right) PCA biplot of standardized kinematic features, highlighting the separation between amplitude–stability and timing-variability components.

Figure 20. Sit-to-stand task: main movement phases captured from camera 1 (lateral view, (top)) and camera 2 (frontal view, (bottom)), showing the reconstructed body skeleton with tracked joints and inter-segment connections.

Figure 21. Representative signals for arising-from-chair: (a) vertical trajectories of BO00 (nose), BO01 (neck), and BO08 (mid-hip); (b) inter-thumb distance as an arm-use proxy; (c) lateral inclination of trunk/shoulder/hip segments; (d) forward trunk flexion (BO01–BO08).

Figure 22. Multivariate analysis for arising-from-chair features (healthy controls, n = 15): Spearman correlation matrix (left), PCA scores (center), and PCA loadings (right).

Figure 23. Gait assessment setup. Pose skeletons overlaid on frontal and lateral camera views during free walking (top) and obstacle avoidance walking (bottom).

Figure 24. Representative gait signals. (a) Heel-to-heel distance; (b) right heel vertical trajectory; (c) inter-wrist distance; (d) distance traveled; (e) lateral trunk inclination for three body segments; (f) forward trunk inclination.

Figure 25. Multivariate analysis for gait features (healthy controls, n = 15): (Left) Spearman correlation matrix illustrating monotonic relationships among step parameters, foot clearance, arm swing, postural control, and variability metrics. (Center) PCA scores showing the distribution of subjects in the space defined by the first two principal components. (Right) PCA loadings indicating the contribution of each gait-related feature to PC1 and PC2, highlighting complementary dimensions associated with gait variability and locomotor amplitude/postural strategy.

Figure 26. Postural stability task. Representative moments of the postural stability assessment. (a,b) Backward pushing maneuver applied to the subject. (c) Temporal evolution of inter-feet distance during the perturbation and recovery phases. (d) Vertical displacement (y-coordinate) of the subject’s left hip, reflecting center-of-mass adjustments.

Figure 27. Multivariate analysis of postural stability features in healthy controls (n = 15). (Left) Spearman correlation matrix illustrating relationships among stepping, recovery, trunk, and sway-related variables. (Center) PCA scores showing the distribution of subjects in the space defined by the first two principal components. (Right) PCA loadings highlighting the contribution of each feature to PC1 (stepping–propulsion) and PC2 (postural–oscillation) dimensions.

Figure 28. Representative examples of the posture assessment. (a,b) Reconstructed pose skeleton of the subject viewed from the lateral and frontal cameras. (c) Lateral trunk inclination during quiet standing for three body segments (BO01–BO08, BO02–BO05, BO09–BO12). (d) Forward trunk inclination during quiet standing for the upper trunk segment (BO01–BO08).

Figure 29. Multivariate analysis of posture-related features in healthy controls (n = 15). (Left) Spearman correlation matrix illustrating relationships among static posture, rising-from-chair, and gait-related trunk features. (Center) PCA scores showing the distribution of subjects in the space defined by the first two principal components. (Right) PCA loadings highlighting the contribution of each feature to PC1 and PC2.

Figure 30. Multivariate analysis of kinematic features associated with body bradykinesia (MDS-UPDRS item 3.14) in healthy controls (n = 15). (Left) Spearman correlation matrix illustrating relationships among posture-, rising-from-chair-, and gait-related features. (Right) PCA scores showing the distribution of subjects in the space defined by the first two principal components (PC1–PC2).

Figure 31. PCA loadings for the body bradykinesia analysis, shown separately for static posture, rising-from-chair, and gait features. Only variables with substantial loading magnitude (|r| ≥ 0.25) are displayed to maintain readability; features with minimal contribution lie near the origin and were omitted. Dynamic gait-related features contribute predominantly to PC1, while rising-from-chair features contribute mainly to PC2.

Figure 32. Postural tremor assessment. (Left) Reconstructed pose skeleton of the subject during the task. (Right) Time evolution of the distance from a fixed reference point for the index and little fingers of the left and right hands, illustrating small-amplitude, high-frequency oscillations typical of physiological postural tremor.

Figure 33. Spearman correlation matrices among hand-specific tremor features for the postural tremor task in healthy controls (n = 15), shown separately for the left and right hands.

Figure 34. PCA of hand-specific postural tremor features. (Left) PCA scores for the 30 observations (15 subjects × 2 hands), showing substantial bilateral overlap. (Right) PCA loadings indicating that amplitude-related features primarily define PC1, while frequency-related features dominate PC2.

Figure 35. Left panels: pose skeletons of the subject during the nose–finger task for the right hand (top) and left hand (bottom), illustrating the arm trajectories during voluntary movement execution. Right panels: time evolution of the distance between the index finger and the nose for the corresponding hand, highlighting the presence of small, superimposed oscillations attributable to physiological kinetic tremor during movement.

Figure 36. Spearman correlation matrices among hand-specific kinematic features for the kinetic tremor task in healthy subjects (n = 15). (Left) Left hand; (Right) right hand.

Figure 37. PCA of kinetic tremor features. (Left) PCA score plot of left- and right-hand observations showing substantial bilateral overlap. (Right) PCA loadings, with amplitude-related features dominating PC1 and frequency- and timing-related features contributing mainly to PC2.

Figure 38. Testing rest tremor amplitude. (a) Pose skeleton during the test. Time profiles of distances from a fixed reference point for KeyPoints on (b) jaw/lips, (c) right side of body, and (d) left side of body.

Figure 39. Spearman correlation matrices of rest tremor amplitude features for body left side (left), jaw/lips (center), and body right side (right). Near-unitary correlations confirm the inherently one-dimensional structure of MDS-UPDRS item 3.17.

Figure 40. PCA of rest tremor amplitude features. (Left) PCA score plot of region-level observations (jaw/lips, body right side, body left side) showing a continuous distribution dominated by a single variance axis. (Right) PCA loadings indicating strong collinear contributions of all amplitude descriptors to PC1.

Figure 41. Spearman rank correlation matrices computed separately for the jaw/lips, right side of body, and left side of body. RTCI shows strong positive correlations with tremor-on duration and persistence index and strong negative correlations with tremor-off duration, indicating that MDS-UPDRS item 3.18 is dominated by a single temporal construct reflecting tremor persistence during rest.

Figure 42. PCA score plot (left) and loading plot (right) for region-specific temporal descriptors associated with item 3.18. The first principal component explains over 95% of the total variance and is dominated by RTCI, tremor-on duration, burst count, and persistence index, consistent with a predominantly one-dimensional organization within this sample.
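The multivariate analyses described in the captions above combine Spearman rank correlations with PCA of standardized features. The following is a minimal sketch of both steps using only the Python standard library, not the author's implementation. The function names, the power-iteration shortcut for the first principal component, and the example values are all assumptions introduced for illustration.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def pc1_explained_variance(columns, iters=500):
    """Variance share and loadings of PC1 for z-scored features.

    PCA on standardized features is an eigen-decomposition of the Pearson
    correlation matrix; the leading eigenpair is found by power iteration.
    """
    c = [[pearson(a, b) for b in columns] for a in columns]
    p = len(c)
    v = [1.0 / (i + 1) for i in range(p)]  # non-uniform start vector (assumption)
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the leading component")
        v = [t / norm for t in w]
    eig = sum(v[i] * sum(c[i][j] * v[j] for j in range(p)) for i in range(p))
    return eig / p, v

# Hypothetical values for three temporal descriptors (not data from the study).
t_on = [1, 2, 3, 4, 5, 6]
rtci = [2, 4, 5, 8, 11, 12]
t_off = [60, 50, 45, 30, 20, 10]
share, loadings = pc1_explained_variance([t_on, rtci, t_off])
```

With nearly collinear features such as these, `share` approaches 1, which is the pattern the captions describe as a predominantly one-dimensional organization. A full PCA (all components and scores) would normally use a linear algebra library instead of power iteration.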

Table 1. Summary of subjects in the study (n = 15).

Gender	Female	3 (20%)
	Male	12 (80%)
Age (years)		58.5 ± 11.8
Height (cm)		175.4 ± 9.3
Weight (kg)		77.4 ± 11.4

Table 2. Speech Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Values
SPL_mean (dB)	67.14 ± 3.05, 67.04 [65.87–69.11], 60.04–73.25
F0_var (Hz)	23.19 ± 4.19, 22.37 [19.79–25.67], 16.40–29.87
SpeechRate (syllables/s)	4.20 ± 0.33, 4.13 [4.08–4.28], 3.58–4.76
PauseRatio (-)	0.18 ± 0.04, 0.19 [0.16–0.21], 0.10–0.24
HNR (dB)	18.60 ± 3.25, 19.11 [16.02–20.86], 13.27–24.47
JaccardDistance (-)	0.07 ± 0.04, 0.06 [0.05–0.11], 0.00–0.15
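The reference tables report each feature as mean ± SD, median [IQR], and 5th–95th percentile. As an illustration of how one such row could be produced from per-subject values, here is a minimal sketch. The helper names are hypothetical, and the linear-interpolation percentile rule and the sample SD (n − 1 denominator) are assumptions, since the source does not state which conventions were used.

```python
import statistics

def percentile(values, q):
    """Percentile (q in [0, 100]) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def reference_summary(values):
    """Summary statistics in the layout used by the reference tables."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (assumption)
        "median": statistics.median(values),
        "iqr": (percentile(values, 25), percentile(values, 75)),
        "p5_p95": (percentile(values, 5), percentile(values, 95)),
    }

def format_row(values, digits=2):
    """Render one table cell: mean ± SD, median [Q1–Q3], P5–P95."""
    s = reference_summary(values)

    def f(x):
        return f"{x:.{digits}f}"

    return (
        f"{f(s['mean'])} ± {f(s['sd'])}, {f(s['median'])} "
        f"[{f(s['iqr'][0])}–{f(s['iqr'][1])}], "
        f"{f(s['p5_p95'][0])}–{f(s['p5_p95'][1])}"
    )

# Hypothetical per-subject values, not data from the study.
example_row = format_row([1, 2, 3, 4, 5])
```

With only 15 subjects, the 5th and 95th percentiles depend noticeably on the interpolation rule, so a different convention would shift the outer range slightly.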

Table 3. Facial Expression Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Values
BlinkRate (blinks/s)	0.29 ± 0.07, 0.28 [0.24–0.34], 0.18–0.42
IBI:CV (-)	0.31 ± 0.09, 0.30 [0.25–0.37], 0.17–0.49
BlinkDur_mean (s)	0.19 ± 0.03, 0.19 [0.17–0.21], 0.14–0.25
BlinkAmp (m)	0.0048 ± 0.0011, 0.0047 [0.0040–0.0055], 0.0031–0.0069
LipSep_mean (m)	0.041 ± 0.009, 0.040 [0.036–0.046], 0.026–0.058
LipSep_max (m)	0.078 ± 0.015, 0.076 [0.068–0.086], 0.052–0.109
SmileOnset (s)	2.31 ± 0.74, 2.18 [1.79–2.78], 1.12–3.91
SmileDur (s)	1.84 ± 0.63, 1.77 [1.42–2.19], 0.71–3.05
SmileCount (-)	1.87 ± 0.83, 2.00 [1.00–2.00], 0–3

Table 4. Finger Tapping Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Right Hand	Left Hand
A_mean (mm)	44.65 ± 5.52, 45.00 [40.28–48.94], 37.28–52.62	44.34 ± 4.90, 43.41 [40.74–45.91], 37.57–51.80
f (Hz)	3.59 ± 0.48, 3.69 [3.29–4.08], 2.87–4.21	3.35 ± 0.58, 3.35 [2.98–3.72], 2.57–4.13
D (-)	0.934 ± 0.048, 0.923 [0.888–0.958], 0.872–1.003	0.938 ± 0.054, 0.941 [0.900–0.982], 0.877–1.031
R (-)	0.929 ± 0.021, 0.919 [0.903–0.936], 0.908–0.963	0.915 ± 0.029, 0.920 [0.898–0.931], 0.874–0.953
P (-)	0.056 ± 0.023, 0.058 [0.045–0.068], 0.019–0.085	0.061 ± 0.022, 0.065 [0.049–0.074], 0.028–0.090
J_t (s)	0.0209 ± 0.0076, 0.0205 [0.0180–0.0233], 0.0086–0.0345	0.0255 ± 0.0085, 0.0243 [0.0200–0.0284], 0.0147–0.0367
S_amp (mm/cycle)	−0.318 ± 0.125, −0.349 [−0.415–−0.278], −0.474–−0.127	−0.345 ± 0.162, −0.357 [−0.464–−0.256], −0.601–−0.129

Table 5. Hand Movements Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Right Hand	Left Hand
A_mean (mm)	134.49 ± 9.91, 135.03 [127.73–139.90], 120.44–147.97	127.74 ± 7.42, 126.85 [123.50–131.18], 116.68–138.88
f (Hz)	2.71 ± 0.24, 2.73 [2.56–2.85], 2.37–3.05	2.66 ± 0.19, 2.68 [2.63–2.79], 2.30–2.90
D (-)	0.946 ± 0.017, 0.946 [0.932–0.957], 0.925–0.975	0.943 ± 0.020, 0.941 [0.931–0.959], 0.914–0.971
R (-)	0.933 ± 0.018, 0.938 [0.917–0.948], 0.907–0.952	0.919 ± 0.020, 0.920 [0.905–0.934], 0.884–0.945
P (-)	0.0547 ± 0.0156, 0.0524 [0.0478–0.0593], 0.0359–0.0773	0.0463 ± 0.0242, 0.0460 [0.0343–0.0582], 0.0065–0.0831
J_t (s)	0.0212 ± 0.0077, 0.0228 [0.0161–0.0263], 0.0094–0.0310	0.0246 ± 0.0065, 0.0238 [0.0206–0.0294], 0.0159–0.0333
S_amp (mm/cycle)	−0.453 ± 0.107, −0.445 [−0.524–−0.402], −0.587–−0.286	−0.433 ± 0.132, −0.424 [−0.526–−0.390], −0.626–−0.207

Table 6. Pronation–Supination Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Both Hands
A_mean (mm)	443.38 ± 31.96, 444.52 [395.57–493.03]
f (Hz)	1.03 ± 0.14, 1.07 [0.79–1.20]
D (-)	0.945 ± 0.035, 0.942 [0.888–0.993]
R (-)	0.932 ± 0.023, 0.930 [0.895–0.966]
P (-)	0.0346 ± 0.0178, 0.0337 [0.010–0.061]
J_t (s)	0.0315 ± 0.0047, 0.0309 [0.0244–0.0384]
S_amp (mm/cycle)	−2.60 ± 0.85, −2.53 [−3.80–−1.49]

Table 7. Toe Tapping Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Right Foot	Left Foot
A_mean (mm)	59.80 ± 4.65, 58.50 [54.81–62.50], 54.81–69.24	57.06 ± 5.11, 56.97 [53.28–57.00], 48.69–65.37
f (Hz)	2.03 ± 0.16, 2.06 [1.97–2.12], 1.73–2.22	1.87 ± 0.13, 1.82 [1.71–1.93], 1.71–2.07
D (-)	0.943 ± 0.023, 0.945 [0.923–0.945], 0.916–0.976	0.949 ± 0.027, 0.955 [0.924–0.956], 0.909–0.991
R (-)	0.933 ± 0.025, 0.927 [0.915–0.942], 0.902–0.977	0.942 ± 0.023, 0.937 [0.914–0.952], 0.913–0.972
P (-)	0.0348 ± 0.0171, 0.0345 [0.025–0.045], 0.0106–0.0600	0.0419 ± 0.0162, 0.0448 [0.027–0.045], 0.0146–0.0617
J_t (s)	0.0319 ± 0.0068, 0.0322 [0.027–0.036], 0.0219–0.0421	0.0311 ± 0.0081, 0.0335 [0.024–0.033], 0.0178–0.0425
S_amp (mm/cycle)	−0.652 ± 0.226, −0.632 [−0.966–−0.633], −0.976–−0.365	−0.588 ± 0.267, −0.640 [−0.777–−0.407], −0.921–−0.155

Table 8. Leg Agility Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Right Leg	Left Leg
A_mean (mm)	194.27 ± 22.01, 187.32 [172.51–202.12], 169.52–231.26	193.12 ± 22.57, 194.72 [182.82–206.62], 154.77–217.66
f (Hz)	2.11 ± 0.19, 2.15 [2.04–2.25], 1.82–2.37	2.12 ± 0.25, 2.08 [1.97–2.20], 1.73–2.52
D (-)	0.920 ± 0.037, 0.907 [0.886–0.927], 0.874–0.977	0.917 ± 0.053, 0.913 [0.883–0.949], 0.851–1.010
R (-)	0.900 ± 0.030, 0.909 [0.890–0.928], 0.850–0.932	0.906 ± 0.027, 0.902 [0.886–0.918], 0.865–0.947
P (-)	0.074 ± 0.024, 0.075 [0.060–0.090], 0.047–0.117	0.059 ± 0.023, 0.058 [0.045–0.071], 0.029–0.091
J_t (s)	0.0324 ± 0.0094, 0.0337 [0.030–0.036], 0.0168–0.0461	0.0343 ± 0.0109, 0.0322 [0.027–0.037], 0.0199–0.0529
S_amp (mm/cycle)	−1.46 ± 0.60, −1.55 [−2.05–−1.03], −2.18–−0.55	−1.42 ± 0.56, −1.30 [−1.57–−1.03], −2.25–−0.70

Table 9. Arising from a chair Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Values
t_rise (s)	0.98 ± 0.08, 1.01 [0.93–1.04], 0.84–1.08
v_mean (m/s)	0.70 ± 0.13, 0.72 [0.58–0.81], 0.52–0.87
a_peak (m/s²)	3.02 ± 0.52, 2.98 [2.60–3.32], 2.54–4.82
U_hands (-)	0.05 ± 0.035, 0.05 [0.03–0.06], 0.01–0.10
θ_forward_max (°)	32.07 ± 4.37, 32.27 [28.69–35.64], 27.19–39.25
θ_forward_slope (°/s)	55.82 ± 7.35, 53.65 [50.92–61.29], 44.04–67.91
θ_lateral_max (°)	2.61 ± 1.62, 2.50 [1.28–4.04], 0.67–4.83
R_sym (-)	0.935 ± 0.018, 0.932 [0.920–0.950], 0.911–0.961
P (-)	0.040 ± 0.013, 0.041 [0.028–0.049], 0.024–0.057

Table 10. Gait Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Values
step_length_mean (mm)	672.29 ± 55.25, 689.47 [638.23–710.88], 573.59–738.28
step_length_CV (-)	0.0650 ± 0.0222, 0.0689 [0.0448–0.0838], 0.0340–0.0941
heel_lift_max (mm)	105.62 ± 13.11, 105.94 [95.39–115.06], 84.39–126.56
heel_lift_CV (-)	0.0791 ± 0.0242, 0.0814 [0.0592–0.0944], 0.0426–0.1137
arm_swing_amp (mm)	127.40 ± 20.13, 125.71 [110.85–142.42], 93.89–158.81
arm_swing_CV (-)	0.1014 ± 0.0294, 0.0948 [0.0793–0.1249], 0.0587–0.1545
distance_traveled (m)	4.957 ± 0.633, 4.979 [4.432–5.369], 4.042–5.843
lateral_tilt_max (°)	3.106 ± 1.362, 2.958 [1.918–4.323], 1.188–4.975
lateral_tilt_CV (-)	0.0969 ± 0.0331, 0.0953 [0.0727–0.1270], 0.0518–0.1378
forward_tilt_max (°)	16.87 ± 4.35, 17.23 [14.49–18.19], 10.34–24.16
forward_tilt_CV (-)	0.0849 ± 0.0221, 0.0888 [0.0630–0.1023], 0.0561–0.1131

Table 11. Postural Stability Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Values
steps_backward_count (-)	0.200 ± 0.414, 0.000 [0.000–0.000], 0.000–1.000
max_feet_distance_change (m)	0.066 ± 0.026, 0.063 [0.048–0.083], 0.035–0.101
hip_y_min_drop (m)	0.027 ± 0.014, 0.028 [0.018–0.033], 0.007–0.047
recovery_time (s)	0.960 ± 0.265, 0.907 [0.790–1.143], 0.531–1.312
peak_backward_velocity (m/s)	0.203 ± 0.073, 0.209 [0.153–0.259], 0.089–0.304
balance_steps_number (-)	0.267 ± 0.458, 0.000 [0.000–0.500], 0.000–1.000
recovery_step_length (m)	0.360 ± 0.078, 0.371 [0.326–0.397], 0.264–0.454
sway_ap_std (m)	0.014 ± 0.005, 0.014 [0.010–0.016], 0.005–0.020
sway_ml_std (m)	0.013 ± 0.004, 0.012 [0.009–0.016], 0.007–0.019
com_displacement (m)	0.061 ± 0.016, 0.060 [0.051–0.073], 0.040–0.083
trunk_angle_peak (°)	6.985 ± 2.033, 7.112 [5.470–8.408], 3.922–9.433

Table 12. Postural Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Values
forward_tilt_mean_static (°)	15.06 ± 1.69, 15.65 [13.86–16.54], 12.77–17.08
lateral_tilt_mean_static (°)	0.01 ± 0.60, −0.08 [−0.28–0.36], −1.07–0.92
shoulder_tilt_mean_static (°)	0.24 ± 0.95, 0.46 [−0.49–0.75], −1.16–1.55
hip_tilt_mean_static (°)	−0.73 ± 0.94, −0.32 [−1.10–−0.17], −2.33–0.20
forward_tilt_RMS_static (°)	0.81 ± 0.23, 0.79 [0.69–0.98], 0.48–1.11
lateral_tilt_RMS_static (°)	0.44 ± 0.12, 0.42 [0.36–0.51], 0.29–0.61
θ_forward_max (°)	15.35 ± 3.78, 15.49 [13.23–18.74], 9.23–21.80
θ_lateral_max (°)	1.95 ± 1.18, 1.81 [1.24–2.32], 0.54–4.82
R_sym (-)	0.94 ± 0.02, 0.93 [0.92–0.95], 0.91–0.96
forward_tilt_max (°)	16.87 ± 4.35, 17.23 [14.49–18.19], 10.34–24.16
forward_tilt_CV (-)	0.08 ± 0.02, 0.09 [0.06–0.10], 0.06–0.11
lateral_tilt_max (°)	3.01 ± 1.03, 2.82 [2.16–3.87], 1.61–4.48
lateral_tilt_CV (-)	0.09 ± 0.03, 0.09 [0.07–0.11], 0.06–0.12

Table 13. Body Bradykinesia Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Rising-from-Chair Features	Values
t_rise (s)	0.98 ± 0.08, 1.01 [0.93–1.04], 0.84–1.08
v_mean (m/s)	0.70 ± 0.13, 0.67 [0.60–0.80], 0.56–0.88
a_peak (m/s²)	3.23 ± 0.48, 3.16 [3.00–3.60], 2.57–3.92
θ_forward_max (°)	15.35 ± 3.78, 15.49 [13.23–18.74], 9.23–21.80
θ_forward_slope (°/s)	29.01 ± 4.56, 28.16 [26.11–32.54], 23.41–38.53
P (-)	0.03 ± 0.02, 0.03 [0.02–0.05], 0.01–0.06
R_sym (-)	0.94 ± 0.02, 0.93 [0.92–0.95], 0.91–0.96
Gait Features	Values
arm_swing_amp (m)	15.25 ± 4.78, 15.42 [12.31–18.50], 7.27–22.68
arm_swing_CV (-)	0.18 ± 0.04, 0.18 [0.15–0.21], 0.11–0.23
step_length_mean (m)	0.53 ± 0.07, 0.52 [0.48–0.58], 0.43–0.63
step_length_CV (-)	0.06 ± 0.02, 0.06 [0.05–0.07], 0.04–0.09
heel_lift_max (m)	0.12 ± 0.02, 0.12 [0.11–0.13], 0.10–0.16
heel_lift_CV (-)	0.06 ± 0.01, 0.06 [0.05–0.07], 0.05–0.07
distance_traveled (m)	5.03 ± 0.62, 4.89 [4.57–5.48], 4.22–5.93
forward_tilt_max (°)	16.87 ± 4.35, 17.23 [14.49–18.19], 10.34–24.16
forward_tilt_CV (-)	0.08 ± 0.02, 0.09 [0.06–0.10], 0.06–0.11
lateral_tilt_max (°)	3.01 ± 1.03, 2.82 [2.16–3.87], 1.61–4.48
lateral_tilt_CV (-)	0.09 ± 0.03, 0.09 [0.07–0.11], 0.06–0.12
Static Posture Features	Values
forward_tilt_mean_static (°)	15.06 ± 1.69, 15.65 [13.86–16.54], 12.77–17.08
lateral_tilt_mean_static (°)	0.01 ± 0.60, −0.08 [−0.28–0.36], −1.07–0.92
forward_tilt_RMS_static (°)	0.81 ± 0.23, 0.79 [0.69–0.98], 0.48–1.11
lateral_tilt_RMS_static (°)	0.44 ± 0.12, 0.42 [0.36–0.51], 0.29–0.61

Table 14. Hand-Specific Postural Tremor Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Right Hand	Left Hand
A_peak (m)	0.978 ± 0.247, 0.856 [0.822–1.143], 0.685–1.463	0.935 ± 0.177, 0.919 [0.833–1.008], 0.573–1.317
A_RMS (m)	0.292 ± 0.064, 0.265 [0.255–0.333], 0.206–0.436	0.281 ± 0.055, 0.284 [0.246–0.309], 0.161–0.411
A_95 (m)	0.813 ± 0.179, 0.739 [0.711–0.926], 0.576–1.212	0.783 ± 0.154, 0.789 [0.684–0.860], 0.450–1.143
f_peak (Hz)	8.707 ± 0.859, 8.60 [8.25–9.30], 7.20–10.10	8.673 ± 1.080, 8.60 [7.90–9.30], 6.90–10.50
BW_50 (Hz)	0.027 ± 0.046, 0.000 [0.000–0.050], 0.000–0.100	0.020 ± 0.041, 0.000 [0.000–0.000], 0.000–0.100
S_reg (-)	0.9665 ± 0.0101, 0.9709 [0.9606–0.9728], 0.9479–0.9787	0.9655 ± 0.0102, 0.9683 [0.9564–0.9732], 0.9490–0.9810
CV_cycle (-)	0.271 ± 0.114, 0.234 [0.191–0.324], 0.153–0.567	0.259 ± 0.090, 0.245 [0.184–0.322], 0.155–0.443
D_drift (-)	0.1194 ± 0.0036, 0.1197 [0.1170–0.1225], 0.1131–0.1242	0.1181 ± 0.0060, 0.1178 [0.1148–0.1208], 0.1134–0.1294

Table 15. Bilateral Postural Tremor Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Values
S_LR (-)	0.057 ± 0.047, 0.042 [0.026–0.076], 0.006–0.155
C_LR (-)	0.223 ± 0.283, 0.100 [0.064–0.157], 0.046–0.868

Table 16. Hand-Specific Kinetic Tremor Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Right Hand	Left Hand
A_peak (m)	0.0071 ± 0.0008, 0.0071 [0.0066–0.0075], 0.0057–0.0087	0.0070 ± 0.0010, 0.0068 [0.0066–0.0074], 0.0056–0.0091
A_RMS (m)	0.0012 ± 0.0001, 0.0011 [0.0011–0.0012], 0.0010–0.0015	0.0011 ± 0.0001, 0.0011 [0.0011–0.0012], 0.0009–0.0014
A_95 (m)	0.0038 ± 0.0004, 0.0038 [0.0035–0.0041], 0.0031–0.0047	0.0037 ± 0.0005, 0.0037 [0.0034–0.0040], 0.0030–0.0049
f_peak (Hz)	7.16 ± 1.08, 7.17 [6.33–7.75], 5.50–9.17	6.50 ± 1.29, 6.67 [5.67–7.42], 4.17–9.33
BW_50 (Hz)	0.53 ± 0.23, 0.50 [0.33–0.67], 0.17–1.00	0.50 ± 0.31, 0.33 [0.33–0.67], 0.17–1.17
S_reg (-)	0.363 ± 0.088, 0.341 [0.319–0.387], 0.254–0.540	0.325 ± 0.095, 0.325 [0.267–0.392], 0.153–0.486
CV_cycle (-)	0.252 ± 0.041, 0.248 [0.225–0.282], 0.190–0.331	0.279 ± 0.058, 0.274 [0.252–0.325], 0.187–0.406
DC (-)	0.278 ± 0.021, 0.275 [0.269–0.285], 0.233–0.327	0.277 ± 0.016, 0.275 [0.271–0.280], 0.253–0.316

Table 17. Region-Specific Rest Tremor Amplitude Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Jaw/Lips	Body—Right Side	Body—Left Side
A_peak_rest (m)	0.0009 ± 0.0002, 0.0009 [0.0008–0.0010], 0.0006–0.0012	0.0014 ± 0.0003, 0.0014 [0.0011–0.0015], 0.0010–0.0019	0.0014 ± 0.0003, 0.0013 [0.0011–0.0015], 0.0010–0.0019
A_RMS_rest (m)	0.0003 ± 0.0001, 0.0003 [0.0002–0.0003], 0.0002–0.0004	0.0004 ± 0.0001, 0.0004 [0.0003–0.0005], 0.0003–0.0006	0.0004 ± 0.0001, 0.0004 [0.0004–0.0004], 0.0003–0.0006
A_95_rest (m)	0.0008 ± 0.0002, 0.0007 [0.0006–0.0008], 0.0005–0.0010	0.0012 ± 0.0003, 0.0011 [0.0010–0.0013], 0.0008–0.0017	0.0012 ± 0.0003, 0.0011 [0.0010–0.0013], 0.0008–0.0017
A_max_event (m)	0.0009 ± 0.0002, 0.0009 [0.0008–0.0010], 0.0006–0.0012	0.0014 ± 0.0003, 0.0014 [0.0011–0.0015], 0.0010–0.0019	0.0014 ± 0.0003, 0.0013 [0.0011–0.0015], 0.0010–0.0019

Table 18. Region-Specific Rest Tremor Constancy Reference Values (mean ± SD, median [IQR], and 5th–95th percentile).

Feature	Jaw/Lips	Body—Right Side	Body—Left Side
RTCI (-)	0.015 ± 0.008, 0.015 [0.009–0.021], 0.004–0.031	0.009 ± 0.005, 0.009 [0.005–0.013], 0.002–0.020	0.008 ± 0.005, 0.008 [0.005–0.011], 0.002–0.018
N_burst (min⁻¹)	1.067 ± 0.961, 1.000 [0.000–2.000], 0.000–3.000	0.467 ± 0.640, 0.000 [0.000–1.000], 0.000–2.000	0.400 ± 0.632, 0.000 [0.000–1.000], 0.000–2.000
T_on_med (s)	0.421 ± 0.087, 0.420 [0.345–0.490], 0.300–0.580	0.361 ± 0.055, 0.350 [0.325–0.400], 0.280–0.480	0.355 ± 0.052, 0.340 [0.320–0.390], 0.280–0.470
T_off_med (s)	37.607 ± 20.322, 31.400 [19.850–53.850], 12.900–75.500	50.733 ± 18.848, 50.200 [34.100–64.600], 20.700–82.100	53.760 ± 18.794, 53.600 [36.500–67.450], 22.400–84.000
PI (-)	0.042 ± 0.023, 0.039 [0.021–0.058], 0.013–0.090	0.025 ± 0.012, 0.023 [0.018–0.030], 0.011–0.058	0.024 ± 0.012, 0.022 [0.017–0.029], 0.011–0.056
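The reference values above serve as a healthy baseline for feature normalization. As a rough illustration, a new measurement could be standardized against one table row as follows. This sketch assumes a plain z-score, (value − mean) / SD, which may differ from the standardization actually defined in Section 2.1.3. The `Reference` class, the dictionary keys, and the helper names are hypothetical; the numbers are the right-hand finger tapping values from Table 4.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reference:
    mean: float
    sd: float
    p5: float
    p95: float

# Right-hand finger tapping reference values copied from Table 4.
FINGER_TAPPING_RIGHT = {
    "A_mean_mm": Reference(mean=44.65, sd=5.52, p5=37.28, p95=52.62),
    "f_hz": Reference(mean=3.59, sd=0.48, p5=2.87, p95=4.21),
}

def z_score(value, ref):
    """Distance from the healthy mean in units of healthy SD (assumed form)."""
    return (value - ref.mean) / ref.sd

def within_reference_range(value, ref):
    """True if the value lies inside the 5th–95th percentile band."""
    return ref.p5 <= value <= ref.p95

def describe(measurement, table):
    """Map each measured feature to (z-score, inside-range flag)."""
    return {
        name: (z_score(v, table[name]), within_reference_range(v, table[name]))
        for name, v in measurement.items()
    }

# Hypothetical new recording, not data from the study.
report = describe({"A_mean_mm": 50.17, "f_hz": 2.63}, FINGER_TAPPING_RIGHT)
```

In this hypothetical recording the tapping frequency sits about two SDs below the healthy mean and outside the 5th–95th percentile band, while the amplitude is about one SD above the mean and inside the band.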

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

