1. Introduction
Parkinson’s disease (PD) is a neurodegenerative disorder characterized by both motor and non-motor disturbances arising from dysfunction of basal ganglia–thalamo–cortical circuits. Deficits in movement initiation, scaling, and automatization give rise to the cardinal motor signs (bradykinesia, rigidity, tremor, postural instability) and to prominent axial manifestations such as reduced trunk–head rotation, impaired segmental coordination, and postural deformities [1,2]. Speech is frequently affected: hypokinetic dysarthria occurs in approximately 80–90% of patients and is characterized by hypophonia (reduced loudness), monopitch and monoloudness, flattened prosody, short rushes of speech, and imprecise articulation [2,3]. Mechanistically, these abnormalities reflect a combination of bradykinesia, peripheral rigidity, impaired amplitude scaling, and disrupted sensory control—including underestimation of self-produced loudness [1]. Endoscopic and imaging studies have documented glottal insufficiency and bowing or atrophy of the vocal folds, while aerodynamic analyses show reduced subglottal pressure and altered loudness strategies (Lombard effect), pointing to combined respiratory–laryngeal dysfunction [4,5].
Motor scaling deficits in PD extend across both orofacial and axial effectors. Axial symptoms include pronounced rigidity, reduced range of trunk–head rotation, en bloc coupling of head and trunk during turning, and an increased risk of falls due to decreased trunk mobility. Postural deformities such as camptocormia, antecollis, and Pisa syndrome further disrupt head orientation. Although head tremor is not a cardinal PD sign, subtle oscillatory or jerky head movements may appear and require differentiation from essential or dystonic tremor. Taken together, simultaneous measurement of speech acoustics and head kinematics has the potential to capture complementary pathophysiological markers: (i) a respiratory–laryngeal component (hypoadduction, reduced subglottal pressure, dysprosody) and (ii) an axial component (restricted range of motion, reduced velocity, decoupling of head–trunk movement, increased stiffness) [1,4]. Clinically and metrologically relevant metrics span head rotation/tilt ranges, angular velocity and jerk, and tremor-related spectral power (3–7 Hz), analyzed alongside acoustic features such as intensity, F0 variability, laryngeal tremor indices, and temporal discontinuities.
These clinical observations motivate the development of digital biomarkers that quantify PD-related changes in speech and movement in a scalable, objective manner. Voice is a particularly mature source of digital markers: classical signal processing and modern representation-learning approaches have demonstrated robust performance for PD detection and monitoring based on sustained phonation, reading, diadochokinetic tasks, and spontaneous speech [6,7]. At the same time, inertial measurement units (IMUs) placed on the trunk, limbs, or head capture kinematic signatures of gait, posture, tremor, and balance that are central to PD pathophysiology but often underrepresented in purely audio-based pipelines [8,9]. From a multimodal perspective, fused analysis of voice and movement is attractive: voice encodes fine-grained motor-speech control, whereas IMU-based kinematics provide context about axial and whole-body motor function. A concise overview of representative smartphone-, smartwatch-, and MR/AR-based digital-biomarker studies in PD is given in Table 1, which highlights both the rapid progress in sensor-based assessment and the relative lack of tightly synchronized voice–head-IMU pipelines in mixed reality. Mixed and augmented reality (MR/AR) head-mounted displays (HMDs) provide a natural platform for such multimodal assessment. Devices like Microsoft HoloLens 2 (HL2) combine standardized, programmable task elicitation with precise on-device sensing (microphones, cameras, IMUs), enabling reproducible measurements under controlled stimuli while preserving ecological validity [8,9,10]. Validation studies have shown that HL2 can yield accurate and reliable kinematic readouts for gait, balance, and upper-limb tasks, supporting its use in clinical assessments and longitudinal follow-up [8,9,11]. The headset-mounted microphone array offers standardized mouth–microphone geometry, while the rigidly attached IMU follows head motion without body–sensor displacement artifacts. This combination makes MR/AR HMDs a promising testbed for synchronized audio–kinematic biomarkers in PD.
Over the past two years our group has systematically explored MR-based assessments in PD. We have shown that (i) speech collected in an MR setting supports PD screening and task-specific analysis of acoustic and linguistic features [6], (ii) MR implementations of gait tests (e.g., Timed Up-and-Go) yield discriminative spatiotemporal characteristics between PD and healthy controls [18], and (iii) MR can be used to deliver multicomponent motor–cognitive and eye-tracking tasks on HoloLens 2 [20,21]. In these earlier studies, however, speech and motion channels were analyzed separately rather than fused within a synchronized MR pipeline, and the potential of concurrent voice–head kinematics for PD assessment remained unexplored.
In this work, we introduce DiagNeuro, an AR system for simultaneous acquisition of voice and head-IMU signals during interactive, standardized tasks with time-locked synchronization and a unified processing stack. We develop and evaluate a set of deep-learning models that operate on short, synchronized audio–IMU episodes to discriminate PD from healthy controls. Our contributions are threefold: (i) an MR acquisition architecture that ensures sample-level alignment between audio and head kinematics under guided protocols; (ii) a multimodal learning pipeline with early, intermediate, and gated fusion strategies complemented by probability calibration; and (iii) an analysis of how such MR-based fusion could translate to mobile deployments (e.g., smartphone-grade microphones and IMUs) while preserving task standardization. We specifically test the hypotheses that (H1) multimodal models fusing voice and head-IMU features outperform voice-only and IMU-only baselines for PD vs. HC discrimination; (H2) the incremental benefit of fusion is task-dependent, with larger gains in tasks that more strongly engage movement; and (H3) synchronized MR acquisition is feasible in a clinical setting and yields multimodal markers that align with established PD pathophysiology.
From a modeling perspective, we cast this study as a supervised binary classification problem. The input is a short, synchronized voice–IMU episode recorded during one of five standardized mixed-reality speech tasks (T02–T06), and the target label is the diagnostic group (PD or healthy control, HC). We analyze both a pooled configuration, where episodes from all tasks are combined to train a single classifier, and task-wise models that are fit and evaluated separately for each task. This formulation links the clinical questions to the technical pipeline: Section 2 details how episodes and clinical variables are stored in the database, and Section 3 describes the preprocessing and multimodal fusion architectures used to learn from these episode-level records.
3. Methodology
3.1. Overall Workflow
Figure 5 summarizes the modeling workflow used in this study. Briefly, we (1) acquire synchronized voice and head-IMU data in mixed reality using the DiagNeuro headset protocol; (2) preprocess and temporally align the signals (log mel-spectrograms for audio, normalized six-axis IMU at 100 Hz) into fixed-length crops; (3) instantiate a set of candidate architectures (audio_only, imu_only, early_concat, mid_xattn, gated_early); (4) train and evaluate these models under stratified, subject-aware cross-validation with probability calibration; and (5) compare pooled and task-wise performance, focusing on whether fusion yields a consistent gain over audio-only baselines, particularly in movement-engaging tasks.
3.2. Signal Preprocessing and Temporal Alignment
Audio waveforms are loaded at 16 kHz, converted to mono, and peak-normalized to unit amplitude. We extract log mel-spectrograms with fixed mel-filterbank and FFT settings and a hop length of 320 samples (≈20 ms at 16 kHz). For numerical stability, we add a small floor to the mel power values prior to taking the logarithm.
For accelerometer and gyroscope streams, we identify the time column (e.g., time, timestamp, ticks) via heuristic matching and select three numeric axes corresponding to {x,y,z}. Missing samples are forward/back-filled, time is re-based to start at zero, and each stream is linearly interpolated and resampled to 100 Hz. The resulting 6-channel IMU vector (ACC x,y,z; GYR x,y,z) is per-channel median-centered and standardized by its standard deviation (with a 10⁻⁶ floor).
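The resampling and normalization steps described above can be sketched as follows. This is a minimal numpy sketch under stated assumptions: the function name and uniform-grid construction are our own, and the forward/back-filling of missing samples is omitted for brevity.

```python
import numpy as np

def preprocess_imu(t, imu, fs_out=100.0, eps=1e-6):
    """Re-base time to zero, resample all IMU channels to fs_out Hz via
    linear interpolation, then median-center and standardize per channel.
    t: (N,) timestamps in seconds; imu: (N, 6) array (ACC xyz + GYR xyz)."""
    t = np.asarray(t, dtype=float) - float(t[0])       # re-base to start at zero
    n_out = int(np.floor(t[-1] * fs_out)) + 1
    t_new = np.arange(n_out) / fs_out                  # uniform 100 Hz grid
    out = np.stack([np.interp(t_new, t, imu[:, c])
                    for c in range(imu.shape[1])], axis=1)
    med = np.median(out, axis=0, keepdims=True)        # per-channel median centering
    std = out.std(axis=0, keepdims=True)
    return (out - med) / np.maximum(std, eps)          # 1e-6 floor for flat channels
```

In practice a 2 s irregularly sampled stream comes out as a 201×6 array on the uniform grid, with each channel median-centered.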
We operate on synchronized audio–IMU windows of 15 s. If a session exceeds 15 s, we sample a random crop during training; at inference we apply test-time augmentation (TTA) with independent random crops. If a session is shorter than 15 s, we use the full duration. Audio and IMU are time-aligned and truncated to the common overlap length T (the minimum of the two modalities). With the chosen parameters, a 15 s crop yields 750 mel frames (at the 20 ms hop) and 1500 IMU steps (at 100 Hz).
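The crop logic above can be illustrated with a short sketch. This is our own hypothetical helper, assuming both streams share t = 0 and that the mel frame rate is 50 fps (20 ms hop); the paper's actual implementation may differ in detail.

```python
import numpy as np

def sample_synced_crop(mel, imu, mel_fps=50.0, imu_fps=100.0, win_s=15.0, rng=None):
    """Sample one synchronized crop of up to win_s seconds from time-aligned
    mel (T_mel, F) and imu (T_imu, 6) streams that share t = 0.
    The crop is limited to the common overlap of the two modalities."""
    if rng is None:
        rng = np.random.default_rng()
    # common overlap in seconds, capped at the window length
    dur = min(mel.shape[0] / mel_fps, imu.shape[0] / imu_fps)
    crop = min(dur, win_s)
    start = rng.uniform(0.0, dur - crop) if dur > crop else 0.0
    m0, i0 = int(start * mel_fps), int(start * imu_fps)
    return (mel[m0:m0 + int(crop * mel_fps)],
            imu[i0:i0 + int(crop * imu_fps)])
```

A 20 s session thus yields a 750-frame mel crop paired with 1500 IMU steps, while a 4 s session is returned in full.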
To avoid implicit imputation artifacts in multimodal learning, sessions with incomplete modality availability are excluded from multimodal training/evaluation (see Table 2). Unimodal baselines (audio_only, imu_only) are trained and evaluated on the corresponding single-modality subsets. In practice, after synchronization the model consumes paired sequences truncated to the shared overlap; hence the effective sequence length T is determined by the minimum available duration across modalities within each crop.
3.3. Model Architecture and Multimodal Fusion
All variants share the same backbone structure: (i) modality-specific encoders producing per-timestep embeddings, (ii) an optional fusion block, (iii) a temporal model, and (iv) a classification head producing a binary logit for the PD vs. HC decision. Importantly, the temporal model and head are matched across variants so that performance differences can be attributed to the fusion mechanism rather than increased classifier capacity.
Let X ∈ ℝ^(T×F) denote the log-mel sequence (F mel bands) and Z ∈ ℝ^(T×6) the IMU sequence (after temporal alignment and truncation to the shared overlap). Both are projected to a common embedding dimension d. For audio, we apply two 1D temporal convolutions (kernel size 5, ReLU) to map X to A ∈ ℝ^(T×d); concretely, the Conv1D stack maps the F input channels to d output channels, with padding chosen to preserve the temporal length T. For IMU, we apply a per-timestep MLP (Linear–ReLU–Linear) mapping Z to M ∈ ℝ^(T×d), applied independently at each timestep. A sinusoidal positional encoding is added to each modality embedding.
Because audio and IMU are sampled at different nominal rates (mel hop of 20 ms, i.e., 750 frames per 15 s; IMU resampled at 100 Hz, i.e., 1500 samples per 15 s), we align modalities on the mel timestamp grid: after resampling IMU to 100 Hz, we linearly interpolate the 6-axis IMU stream onto the mel-frame timestamps, yielding an IMU sequence with the same T as the mel sequence. Thus, for a synchronized 15 s crop, the end-to-end tensor flow is: raw crop with X ∈ ℝ^(750×F) and IMU ∈ ℝ^(1500×6) → interpolation onto mel timestamps (750×6) → encoders (A, M ∈ ℝ^(750×d)) → fusion → BiGRU → mean pooling → MLP head → scalar logit.
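The interpolation onto the mel timestamp grid can be sketched directly with `np.interp`. The helper name and argument defaults are our own; they assume the 20 ms hop and 100 Hz IMU rate stated above.

```python
import numpy as np

def imu_to_mel_grid(imu_100hz, n_mel_frames, hop_s=0.02, imu_fs=100.0):
    """Linearly interpolate a (T_imu, 6) IMU stream sampled at imu_fs Hz
    onto the mel-frame timestamps (n_mel_frames frames, hop_s apart),
    so both modalities share the same temporal length T."""
    t_imu = np.arange(imu_100hz.shape[0]) / imu_fs
    t_mel = np.arange(n_mel_frames) * hop_s
    return np.stack([np.interp(t_mel, t_imu, imu_100hz[:, c])
                     for c in range(imu_100hz.shape[1])], axis=1)
```

With a 100 Hz stream and a 20 ms hop, every mel frame lands on every second IMU sample, so the interpolation reduces to a 2:1 decimation in this special case.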
We evaluate the following fusion strategies: (i) audio_only: the temporal model consumes A alone; (ii) imu_only: the temporal model consumes M alone; (iii) early_concat: per-timestep concatenation F_t = [A_t; M_t]; (iv) gated_early: an early-concatenation representation modulated by a learnable per-timestep gate, where the gate is produced by a small MLP over the concatenated embedding with a sigmoid output; (v) mid_xattn: a mid-level cross-attention block in which audio queries IMU (keys/values) to obtain an IMU-conditioned audio representation, which serves as the fused sequence passed to the shared temporal backbone. All fusion modules are trained end-to-end jointly with the encoders and temporal backbone.
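One plausible instantiation of the gated_early variant can be sketched as follows. The exact gate form is not fully specified in the text, so this numpy sketch is an assumption: a per-timestep sigmoid gate, produced by a Linear–ReLU–Linear MLP over the concatenated embedding, multiplicatively modulates that embedding.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_early_fusion(A, M, W1, b1, W2, b2):
    """Hypothetical gated_early fusion: a per-timestep gate
    g_t = sigmoid(MLP([A_t; M_t])) modulates the concatenated embedding.
    A, M: (T, d) audio / IMU embeddings; the gate MLP is Linear-ReLU-Linear."""
    C = np.concatenate([A, M], axis=1)        # (T, 2d) early concatenation
    h = np.maximum(C @ W1 + b1, 0.0)          # hidden layer with ReLU
    g = sigmoid(h @ W2 + b2)                  # (T, 2d) gate values in (0, 1)
    return g * C                              # gated fused sequence
```

Because the gate is bounded in (0, 1), the fused sequence is an elementwise attenuation of the concatenated representation, which matches the interpretation of the gate as deciding how much each channel contributes at each timestep.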
Given the fused per-timestep sequence (whose dimension depends on the variant), we model temporal dependencies using a bidirectional GRU, followed by mean pooling over time and a two-layer MLP head with dropout producing a single logit. The same BiGRU and head configuration is used for all unimodal and multimodal variants. The BiGRU uses 1 layer with hidden size 128 per direction (hence a 256-dimensional output per timestep), and the MLP head maps the 256-dimensional pooled representation to the scalar logit with ReLU activations and dropout.
3.4. Training, Validation, and Calibration Protocol
All models are trained end-to-end from random initialization. We do not employ external pretrained backbones, nor any fine-tuning strategy in this study; this isolates the effect of the fusion mechanisms and temporal modeling under a controlled-capacity regime.
We use an external 5-fold StratifiedGroupKFold split by participant identity to prevent subject leakage across folds. Within each training fold, we perform GroupShuffleSplit (train/validation = 70/30, grouped by participant) to obtain internal validation sets used only for early stopping and post hoc calibration (not for architectural re-tuning). Test metrics for a fold are obtained by ensembling the three independent repetitions (probabilities averaged after calibration). Thus, hyperparameters are held constant across outer folds; the inner splits are used strictly for early stopping, calibration, and threshold selection rather than per-fold hyperparameter search.
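The key property of the inner split is that no participant contributes episodes to both sides. A minimal pure-Python sketch of this group-aware behavior (mirroring what sklearn's GroupShuffleSplit does; the function name and 70/30 subject-level split are our own simplification):

```python
import random

def group_shuffle_split(episode_subjects, val_frac=0.3, seed=0):
    """Split episode indices into train/validation so that no subject
    appears on both sides (subject-aware, leakage-free splitting).
    episode_subjects: list giving the subject id of each episode."""
    subjects = sorted(set(episode_subjects))
    rng = random.Random(seed)
    rng.shuffle(subjects)                      # randomize subject assignment
    n_val = max(1, round(val_frac * len(subjects)))
    val_subjects = set(subjects[:n_val])       # held-out subjects
    train = [i for i, s in enumerate(episode_subjects) if s not in val_subjects]
    val = [i for i, s in enumerate(episode_subjects) if s in val_subjects]
    return train, val
```

Any fold produced this way keeps the subject sets of the two partitions disjoint, which is the leakage guarantee the protocol relies on.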
We optimize class-weighted cross-entropy with AdamW. Early stopping is based on a smoothed validation AUC computed as the mean of the last three epochs, with a fixed patience. We apply gradient clipping for numerical stability. All fixed training hyperparameters (learning rate, weight decay, batch size 32, maximum epochs 50, and the gradient-clipping norm) are reported explicitly in Table 4.
At test time, we apply TTA with random 15 s crops per sample; logits are averaged across crops prior to calibration.
To obtain well-calibrated probabilities and a stable operating point, we apply temperature scaling on the internal validation set and select an operating threshold τ on the calibrated probabilities. We select τ by maximizing validation F1 on the calibrated probabilities within a guard-rail interval. To prevent degenerate calibration on small validation sets, we enforce guard-rails on both the temperature and the threshold, and for very small validation sets we fall back to conservative default values. This regime is frequently triggered in task-wise evaluation (where validation sets are typically small), and any fallback or clipping is logged as red_flags for auditability.
Calibration statistics are reported for transparency, including the average calibration temperature and operating threshold for each variant (early_concat, gated_early, mid_xattn, audio_only) across pooled (ALL) experiments. For imu_only, calibration frequently saturates at the guard-rails (e.g., the temperature is clipped to its bound), consistent with near-chance validation AUC in the pooled setting; the guard-rails prevent unstable post hoc probability scaling.
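Temperature scaling with guard-rail clipping can be sketched in a few lines. This is an illustrative numpy sketch: the grid-search fit, the bounds `t_min`/`t_max`, and the function name are our assumptions (the paper's exact guard-rail values and optimizer are not reproduced), and threshold selection is omitted.

```python
import numpy as np

def fit_temperature(logits, labels, t_min=0.5, t_max=5.0):
    """Fit a calibration temperature by grid-searching the negative
    log-likelihood of sigmoid(logit / T) on validation data, then clip
    the result to guard-rail bounds [t_min, t_max] (bounds illustrative)."""
    logits = np.asarray(logits, float)
    labels = np.asarray(labels, float)
    grid = np.linspace(t_min, t_max, 181)          # 0.025-spaced candidates

    def nll(T):
        p = 1.0 / (1.0 + np.exp(-logits / T))
        p = np.clip(p, 1e-7, 1.0 - 1e-7)           # numerical floor
        return -np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

    T = grid[int(np.argmin([nll(T) for T in grid]))]
    return float(np.clip(T, t_min, t_max))         # guard-rail clipping
```

If the validation logits are overconfident by a known factor, the fitted temperature recovers that factor (up to grid resolution), and the final clip enforces the guard-rail interval.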
3.5. Metrics
The primary metric is ROC AUC computed on calibrated probabilities. We additionally report F1, precision, and recall at the selected threshold, as well as Accuracy at the selected operating point, to complement the threshold-dependent metrics and facilitate comparison with prior studies. For interpretability, we compute per-task metrics by grouping test episodes by task_id. Per-task summaries are reported for AUC (threshold-free) as well as for F1 and Accuracy (threshold-dependent). Fold-level scores are summarized by arithmetic means across outer folds. Unless stated otherwise, the operating threshold is selected on inner validation (per outer fold) and then applied to the corresponding held-out test fold.
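For reference, the threshold-free ROC AUC admits a compact rank-statistic definition. A minimal numpy sketch (our own helper; production pipelines would typically call sklearn's `roc_auc_score`):

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U identity: the probability that a
    random positive is scored above a random negative, ties counting 1/2."""
    labels = np.asarray(labels).astype(bool)
    pos = np.asarray(scores, float)[labels]
    neg = np.asarray(scores, float)[~labels]
    # all pairwise comparisons; fine at episode-level sample sizes
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    return wins / (len(pos) * len(neg))
```

This makes explicit why AUC is threshold-free: it depends only on the ordering of scores, not on any chosen operating point.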
3.6. Implementation Details
The pipeline is implemented in PyTorch 2.8.0. Audio I/O prioritizes torchaudio and falls back to soundfile or scipy as needed. The hidden size is shared across variants, and for the cross-attention variant we use multi-head attention with 4 heads. Dropout is applied in a component-specific manner (e.g., in the classifier head and in gating/attention modules where applicable). All random draws (crop positions, splits, weight initialization) are seeded for reproducibility. All experiments ensure subject-wise separation between training, validation, and testing partitions via group-aware splitting.
4. Results of Fusion and Interpretation
We first assess whether fusion improves performance on average across heterogeneous tasks, and we quantify the AUC–F1 trade-off after calibration.
Table 5 reports pooled means over five outer folds, including accuracy, precision, and recall. gated_early attains the highest mean AUC (0.875) and improves F1 versus mid_xattn (0.771 vs. 0.763), while audio_only still yields the highest mean F1 overall (0.785). In terms of accuracy, gated_early is also best on average (0.806), closely followed by audio_only (0.802), indicating that fusion yields a modest but consistent gain at the chosen operating point. The imu_only baseline remains near-chance in AUC (0.497) with a recall skew (0.983), underscoring the limited standalone discriminative power of IMU and its role as a modulator when fused. Notably, mid_xattn exhibits higher recall (0.812) at the expense of precision (0.729), suggesting a more liberal decision boundary relative to gated_early (precision 0.829, recall 0.730) under the selected thresholds.
Figure 6 visualizes pooled ROC curves for the main model variants, confirming the ranking observed in Table 5 and illustrating the operating regimes where fusion provides the largest gain (e.g., moderate false-positive rates, where gated_early dominates).
We next analyze task-dependent effects to reveal when IMU contributes the most. As shown in Table 6, gated_early ranks first on T03 and T06, early_concat ranks best on T02, and audio_only ranks best on T04 and T05 (per-task AUC values are given in the table). Two qualitative patterns follow: (i) tasks with strong acoustic signatures (T04, T05) are saturated by audio, leaving limited headroom for IMU; (ii) tasks requiring fine-grained movement context (T03, T06) benefit from injecting IMU at the representation level with an explicit gate, while T02 favors simple concatenation.
Figure 7 provides per-task ROC curves (one panel per task), showing that task heterogeneity is substantial: T04 and T05 exhibit strong separability for audio_only, whereas T03 (and, to a lesser extent, T06) benefits from IMU-conditioned fusion, consistent with the per-task AUC maxima in Table 6.
To complement threshold-free AUC with operating-point behavior, Table 7 and Table 8 report per-task F1 and accuracy, respectively. These results reveal that the method with the best AUC per task does not always maximize F1/accuracy on the same task. For example, on T06 gated_early has the highest AUC but early_concat achieves a substantially higher F1, indicating that the globally selected threshold (optimized on inner validation) may be better aligned with early_concat for that task’s score distribution. Similarly, on T02 early_concat is best in both AUC and F1, whereas audio_only attains the best accuracy, consistent with a threshold-dependent trade-off between precision/recall and overall correctness.
Table 9 summarizes the best method per task (selected by per-task AUC) and reports the corresponding F1/Accuracy/Precision/Recall at the operating point. The shift from early_concat to gated_early on T03 and T06 indicates that learning when and how much to trust IMU, rather than always concatenating, can improve separability while maintaining stable temporal modeling. More broadly, we observe a consistent hierarchy: (a) IMU-only remains insufficient across tasks (low AUC and unstable threshold-dependent behavior); (b) audio-only provides a strong baseline and dominates tasks with predominantly acoustic cues (T04, T05); and (c) fusion offers the largest benefits in tasks where head-motion context disambiguates speech production patterns (T03 and, in ranking terms, T06). This supports the interpretation of IMU as a context signal that regularizes or re-weights acoustic evidence, rather than an independent discriminator.
Table 9 also highlights that the “best” method depends on the evaluation target: optimizing ranking performance (AUC) favors gated_early on T03/T06, whereas optimizing a discrete operating point (F1/Accuracy) may select audio_only or early_concat for specific tasks. This motivates reporting both threshold-free and threshold-dependent metrics, and it suggests that task- or cohort-specific threshold calibration could further improve deployment-level performance without changing the backbone architecture.
IMU in isolation is insufficient but useful in fusion. The near-chance performance of the standalone IMU model is expected, given that participants were seated with instructions to maintain a neutral posture, in contrast to high-performance kinematic benchmarks typically obtained from gait or active motor tasks [8,9]. gated_early yields the highest pooled AUC and ranks first on T03 and T06, where nuanced motion context complements acoustics. audio_only remains a strong and sometimes superior choice on acoustically dominated tasks (T04, T05), consistent with its top pooled F1. The AUC–F1 trade-off between fusion and audio-only likely reflects operating-point selection after calibration; under alternative thresholds (e.g., optimizing the threshold per task), fusion may narrow the F1 gap while retaining ranking gains. These results support IMU as a conditional enhancer of audio, with explicit gating improving when and how IMU influences the representation.
6. Discussion
This study demonstrates the feasibility and benefits of fusing voice and head motion signals for Parkinson’s disease (PD) assessment in a mixed-reality (MR) environment. We developed a HoloLens 2-based system (DiagNeuro) that records synchronized speech and head inertial data during structured tasks, and we evaluated multiple sensor-fusion strategies.
The principal finding is that combining vocal features with head-mounted IMU data modestly improved PD vs. healthy classification performance over speech alone, while voice features remained the dominant discriminative modality. Specifically, an early-fusion model with a learned gating mechanism achieved the highest overall AUC (≈0.875), exceeding the audio_only baseline, and also the best pooled Accuracy (≈0.806; Table 5). Rather than uniformly improving sensitivity, the fusion models exhibit a precision–recall trade-off: mid_xattn yields higher recall (≈0.812) but lower precision (≈0.729), whereas gated_early improves precision (≈0.829) while maintaining competitive recall (≈0.730), confirming that multimodal integration can capture complementary PD markers [41].
This behavior is consistent with the pooled ROC curves (Figure 6), which illustrate the operating regions where fusion provides the clearest separation.
At the same time, the strong voice-only baseline reflected the well-known richness of acoustic biomarkers in PD [41]. In our protocol, the head-IMU channel on its own was only weakly discriminative (near-chance AUC), which we attribute primarily to the task and cohort characteristics rather than to hardware limitations. While IMU sensors typically achieve high diagnostic accuracy in gait or active motor tasks [9,17], all recordings were collected during seated speech with a neutral head posture in patients who were predominantly mild–moderate, so the available kinematic signal reflects subtle micromovements rather than overt axial abnormalities. Under these conditions, short IMU segments are expected to be low-contrast between PD and HC. Nevertheless, the same IMU stream yields a consistent, task-dependent gain when fused with audio, especially for tasks where axial–motor differences are more likely to manifest. This suggests that kinesthetic cues are indeed informative but act mainly as a conditional modulator of the much richer acoustic representation. These results support our hypothesis that subtle head-movement patterns (e.g., reduced range or tremor-like activity) contribute additional cues to PD detection when measured concurrently with speech, even though voice signals carry the bulk of the information. We therefore view the weak IMU-only performance as specific to this speech-centric battery and anticipate stronger standalone effects in future work that includes explicit gait/balance or head-movement tasks and a broader range of disease stages.
We also found that fusion efficacy is task-dependent: for highly voice-centric tasks (e.g., reading or rapid syllable repetition), adding IMU yielded no gain or a slight degradation, whereas in tasks engaging posture or sustained phonation, audio–IMU fusion gave measurable improvements. This is consistent with Table 9: audio_only is best on T04/T05, gated_early on T03/T06 (by AUC), and early_concat on T02. This nuanced outcome suggests that an adaptive fusion approach (as implemented by our gating model) is preferable, enabling the model to rely on motion cues only when they are informative.
In summary, DiagNeuro validates the concept of synchronized multimodal MR assessment for PD, showing that dual-channel analysis is not only technically feasible but can modestly enhance detection of PD-related abnormalities compared to single-modality analysis.
Our successful deployment of a 15–20 min MR task battery in over 160 sessions (Table 2) with minimal missing data demonstrates the practicality of this approach in a patient population. These advantages come alongside evidence that AR headsets can produce clinically valid measurements for motor functions: recent studies have shown HoloLens-derived metrics for gait, balance, and functional tests to concur with gold-standard instruments [6]. Our work extends this validity to the concurrent capture of voice and head movement, suggesting that AR devices can be multi-purpose evaluation tools in neurology.
A key question is how our MR-based findings translate to more ubiquitous platforms like smartphones or wearables. On one hand, the HoloLens 2 provided high-fidelity data in a controlled setting—a best-case scenario for digital biomarker capture. Moving to commodity mobile devices will introduce challenges such as variable hardware, environmental noise, and lack of holographic guidance.
A potential translational strategy would be to preserve the standardized task paradigm developed in MR but deliver it through a smartphone app. For example, the app could display written or audio instructions for a sustained phonation, a reading passage, or a brief spoken monologue, similar to our AR tasks. The phone’s microphone can record the speech, while its motion sensors could capture gross movements (if the phone is held in hand or kept in a shirt pocket to approximate upper-body motion). Although this would not exactly replicate head kinematics, it could pick up related signals (hand tremor, body sway or subtle movements as the person speaks). Recent work shows that such multimodal smartphone assessments are feasible and can detect early PD, lending hope that a “lighter” version of DiagNeuro could run on common devices.
In essence, MR-based research guides the design of mobile tools by indicating what to measure and how to standardize the measurement. Our analysis already touched on this by exploring early, intermediate, and late fusion; a mobile app could employ the same fusion strategies in software, even if the sensors differ.
In summary, while direct deployment of DiagNeuro on smartphones is not plug-and-play, the insights gained are transferable. We emphasize that maintaining task consistency is crucial—whether via MR or mobile—because uncontrolled free behavior might not yield the specific biomarkers (e.g., calibrated speech tasks, intentional head movements or steadiness) that a structured exam elicits. Thus, the ideal path to translation is a hybrid: use the phone’s convenience but enforce a protocol akin to an AR session. This could preserve much of the benefit, enabling broader use of multimodal PD assessment outside specialized labs [6].
Despite its promising results, our study has several limitations. First, the data were collected in a supervised MR setting, which may limit generalizability. Participants were aware of being evaluated via an AR headset and were guided through specific tasks. This controlled context is useful for consistency, but real-world conditions differ. In daily life, speech and movement are spontaneous and may be influenced by distractions or varying emotional states. Future work should evaluate our multimodal markers in more naturalistic environments—for example, by having patients use the system at home over longer periods, or by comparing MR-guided results with passive monitoring (like analyzing free speech during phone calls).
Secondly, our cohort consisted of moderate-stage PD patients and age-matched healthy controls, but we did not explicitly stratify by disease severity or phenotype. It remains unclear how early in the disease these voice and head-motion changes become detectable. Prior research indicates that acoustic changes can precede overt motor signs in PD [6], so a worthwhile next step is to test early-stage or even prodromal individuals with our pipeline. The head-movement differences in early PD might be very subtle (since axial rigidity and postural impairment typically worsen in later stages), potentially yielding a smaller fusion benefit. Longitudinal studies are also needed: can our synchronized metrics track disease progression or responses to therapy? Repeated MR assessments (for example, every few months) could show whether the multimodal score correlates with clinical changes over time.
Additionally, the contribution of the head-IMU signals, while statistically significant in some cases, was relatively modest. Our best fusion model improved AUC by only about 1–2 percentage points over audio-only in pooled analysis, an improvement that might not be clinically meaningful on its own. A detailed error analysis would help clarify when and why the fusion helps. It is possible that only a subset of PD patients (those with prominent axial symptoms) benefit from the motion features, whereas others do not. In future research, personalized or subgroup models could be explored, where the system adapts to patient-specific symptom profiles (e.g., placing more weight on IMU features for patients with higher axial rigidity or tremor).
Moreover, incorporating other modalities could amplify the gains: our platform could readily be extended with, say, a hand tremor task using the controller or an eye-tracking task, as we have prototyped separately. Fusing more than two modalities (e.g., voice, head motion, and eye movement) is an exciting direction made feasible by MR headsets. However, this also raises the challenge of feature overload and the need for efficient fusion algorithms to avoid noise from less informative channels.
The next point worth discussing is that our task battery focused on speech and did not include overt gait or balance tasks. This was by design (to target hypokinetic dysarthria and head micro-movements), but it means that some cardinal PD features were not probed. A comprehensive AR exam could integrate our voice + head tasks with brief motor tasks (like an on-spot stepping or sway test) to capture a wider spectrum of PD signs. Doing so might increase overall diagnostic accuracy, as suggested by multi-domain smartphone studies.
Relatedly, we did not combine the results across tasks for a subject-level diagnosis in this work—each task episode was classified independently. In practice, a clinician would consider the aggregate of a patient’s performance across multiple tasks. We expect that simple voting or averaging of the per-task outputs would improve stability and reduce false alarms. Future implementations of DiagNeuro can incorporate subject-level decision logic, possibly with learned weighting for each task’s contribution.
Lastly, the current system’s hardware (HoloLens 2) may not be easily accessible or tolerable for all patients. The device is relatively heavy and expensive, and wearing it might be uncomfortable for some elderly users or those with neck issues. While our participants generally managed the 15-minute sessions well, a few reported mild fatigue. As AR technology evolves, lighter glasses or even AR contact lenses could alleviate this limitation. In the meantime, careful protocol design (e.g., offering breaks, ensuring proper fit) is important when using such headsets clinically. We also acknowledge that some training is needed for users to get acquainted with MR interaction (such as using gaze or voice commands to navigate tasks), though in our experience the learning curve was short. Going forward, usability studies should be conducted to optimize the patient experience—particularly if assessments are to be carried out at home without technical support.
To provide a final contextualization of our results,
Table 12 summarizes the performance of the DiagNeuro system alongside key unimodal and multimodal benchmarks referenced in this study. This overview confirms our conclusion: although our seated protocol provides lower standalone kinematic discriminability compared to dynamic gait assessments, multimodal fusion effectively fills this gap.
7. Conclusions
This work sets the stage for truly multimodal digital biomarkers in Parkinson’s disease by leveraging an immersive MR platform. We demonstrated that synchronized voice and head-motion analysis is feasible and can modestly enhance detection of PD-related deficits, reinforcing the notion that no single sensor tells the whole story. The AR paradigm not only improves data quality through standardization but also opens new avenues for patient engagement and at-home monitoring.
As technology progresses, the gap between specialized MR systems and everyday mobile devices will continue to narrow. Our approach can be seen as a testbed for next-generation digital exams—one that can be iteratively distilled into more portable formats.
We envision a future where a patient might perform a brief multimodal task routine (speaking, moving, looking at targets) guided either by AR glasses or a smartphone app, and receive an immediate, objective report on their motor and speech health. Realizing this vision will require interdisciplinary efforts, validation in larger and more diverse cohorts, and close collaboration with clinicians to ensure that the digital scores map onto meaningful clinical outcomes.
In conclusion, this study validates DiagNeuro, a novel HoloLens 2-based system for the synchronized, multimodal assessment of Parkinson’s disease (PD) using voice and head-motion signals in a mixed-reality (MR) environment. The key finding is that fusing these two modalities modestly improved PD classification accuracy (reaching a peak AUC of ≈0.875) over the already strong voice-only baseline. This improvement confirms that head kinematics, even when subtle and acquired during seated speech tasks, provide complementary, task-dependent information that, when integrated via an adaptive fusion mechanism, enhances PD detection. This work establishes the technical feasibility and standardization benefits of the MR platform for capturing dual-channel digital biomarkers. Future research should prioritize longitudinal validation of these multimodal markers in early-stage and prodromal patients, explore the inclusion of additional modalities or overt motor tasks to amplify the diagnostic gains, and guide the translation of these standardized assessment paradigms to more scalable, ubiquitous mobile platforms for widespread clinical use.