Article

A Biomechanics-Guided and Time–Frequency Collaborative Deep Learning Framework for Parkinsonian Gait Severity Assessment

1 Department of Neurologic Surgery, Mayo Clinic Florida, Jacksonville, FL 32224, USA
2 Department of Neurosurgery, The 904th Hospital of the Joint Logistics Support Force of PLA, Wuxi 214044, China
3 College of Artificial Intelligence and Automation, Hohai University, Changzhou 213200, China
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(1), 89; https://doi.org/10.3390/math14010089
Submission received: 17 November 2025 / Revised: 14 December 2025 / Accepted: 18 December 2025 / Published: 26 December 2025

Abstract

Parkinson’s Disease (PD) is a neurodegenerative disorder in which gait abnormalities serve as key indicators of motor impairment and disease progression. Although wearable sensor-based gait analysis has advanced, existing methods still face challenges in modeling multi-sensor spatial relationships, extracting adaptive multi-scale temporal features, and effectively integrating time–frequency information. To address these issues, this paper proposes a multi-sensor gait neural network that integrates biomechanical priors with time–frequency collaborative learning for the automatic assessment of PD gait severity. The framework consists of three core modules: (1) BGS-GAT (Biomechanics-Guided Graph Attention Network), which constructs a sensor graph based on plantar anatomy and explicitly models inter-regional force dependencies via graph attention; (2) AMS-Inception1D (Adaptive Multi-Scale Inception-1D), which employs dilated convolutions and channel attention to extract multi-scale temporal features adaptively; and (3) TF-Branch (Time–Frequency Branch), which applies Real-valued Fast Fourier Transform (RFFT) and frequency-domain convolution to capture rhythmic and high-frequency components, enabling complementary time–frequency representation. Experiments on the PhysioNet multi-channel foot pressure dataset demonstrate that the proposed model achieves 0.930 in accuracy and 0.925 in F1-score for four-class severity classification, outperforming state-of-the-art deep learning models.

1. Introduction

PD is a common neurodegenerative disorder characterized by a series of motor impairments, such as tremor, rigidity, bradykinesia, and gait disturbance, all of which profoundly affect patients’ quality of life. Among these symptoms, gait abnormality serves as a critical biomarker for PD diagnosis and disease staging, as it objectively reflects dysfunctions in rhythm regulation, gait symmetry, and dynamic balance. Therefore, accurate and objective gait analysis plays a pivotal role in early PD screening, quantification of disease severity, and evaluation of therapeutic outcomes [1].
The rapid advancement of wearable sensing technologies has turned gait data collection using plantar pressure sensors and other devices into a key area of research. By utilizing deep learning models for automated analysis, these methods eliminate subjective bias from clinical observations and enable continuous monitoring of detailed gait dynamics. Despite progress, current studies still face challenges in effectively leveraging multidimensional information in gait signals.
Current methods struggle to adequately model spatial relationships among multiple sensors, as interactions among plantar pressure sensors are governed by human anatomy and biomechanics. Most approaches use fully connected structures or simple convolutional operations to learn sensor dependencies, lacking explicit biomechanical guidance. This often results in spatial features that are not physiologically plausible and may capture misleading correlations. Moreover, the flexibility of temporal feature extraction is limited [2]. Gait rhythms in individuals with PD can be unstable and variable, requiring models that capture multi-scale temporal patterns. While architectures such as Inception-1D are commonly used, their fixed kernel sizes limit adaptability to variations in gait across patients and disease stages. Finally, existing models do not sufficiently exploit frequency-domain information. Abnormal gait characteristics in PD are often more pronounced in the frequency domain; yet most models focus primarily on time-domain analysis, overlooking important spectral cues that could enhance the understanding of rhythmic gait abnormalities [3].
To address the challenges in gait analysis, we propose a multi-sensor gait neural network that integrates biomechanical knowledge with time–frequency modeling. Our framework introduces three key innovations:
(1) We create an adjacency matrix based on the anatomical layout of plantar sensors, using BGS-GAT to model sensor dependencies, enhancing the interpretability of spatial features.
(2) Building on the Inception-1D architecture, we design AMS-Inception1D, which combines dilated convolutions with adaptive channel weighting to flexibly model the unstable gait rhythms in PD.
(3) We design a parallel TF-Branch that extracts complementary features using RFFT and lightweight convolutional layers for comprehensive gait characterization.
This model effectively identifies gait patterns associated with PD severity, demonstrating superior accuracy and robustness compared to existing methods.
The rest of this paper is structured as follows. Section 2 reviews the relevant literature, and Section 3 offers a detailed description of the proposed model architecture. Section 4 outlines the experimental setup and presents the results, while Section 5 concludes the paper and discusses potential future research directions.

2. Related Work

This section reviews existing studies on wearable sensor-based gait analysis, the use of Graph Neural Networks (GNNs) in gait modeling, and recent advancements in time–frequency analysis for medical signal processing.

2.1. Gait Analysis Based on Wearable Sensors

Wearable devices such as plantar pressure sensors and Inertial Measurement Units (IMUs) have become essential tools for objective gait assessment in PD [4]. Early studies predominantly relied on hand-crafted gait features, including spatial-temporal parameters such as stride length, gait speed, and stance-phase percentage [5]. Although these features have clear physiological and biomechanical interpretations, they struggle to capture the complex, subtle nonlinear variations that occur during dynamic gait.
With the continuous advances in wearable sensing technologies—particularly the emergence of wireless IMUs featuring high sampling rates and high measurement precision—objective quantification of Parkinsonian gait impairments has become feasible, as highlighted by Moreau et al. [6]. These technological improvements enable continuous, high-resolution, non-invasive monitoring of gait and posture, addressing the limitations of traditional clinical tools, such as the Movement Disorder Society-Unified PD Rating Scale (MDS-UPDRS), in detecting subtle motor abnormalities, particularly those affecting gait and postural control.
Meanwhile, machine learning techniques have demonstrated substantial potential in analyzing complex kinematic data from wearable sensors, offering new avenues for identifying clinically meaningful motor fluctuations. Abujrida et al. [7] utilized gait parameters collected by a smartphone placed in the participant’s front pocket to successfully predict items 2.12 (walking and balance) and 2.13 (freezing) of the MDS-UPDRS II. Safarpour et al. [8] employed sensors worn on both feet and the lower back to record gait parameters during laboratory-based posture tasks and daily at-home activities, and estimated Postural Instability and Gait Disorder (PIGD) scores accordingly. Han et al. [9] proposed an automatic Freezing of Gait (FOG) assessment method leveraging IMUs and a Multi-Stage Temporal Convolutional Network (MS-TCN), systematically examining the influence of task conditions, medication states, and stopping behavior on detection performance. In addition, Tzallas et al. [10] concurrently recorded gait data using instrumented insoles and IMUs, demonstrating the consistency and complementarity of these sensing modalities in detecting Parkinsonian gait abnormalities, which further supports the clinical value of wearable multi-sensor fusion for objective assessment.
While research has shown that sensor-based assessments are practical for PD, they face limitations, including small sample sizes (often fewer than 40 participants), a lack of standardized protocols, and a primary focus on single-symptom scores, such as PIGD. These issues restrict detailed evaluations of gait and posture impairments.

2.2. GNNs for Spatial Gait Modeling

GNNs have emerged as practical tools for processing non-Euclidean structured data. In gait analysis, body joints and plantar pressure sensors are represented as nodes in a graph, with edges based on their physical connections or proximity. This framework presents a promising approach to modeling spatial dependencies in locomotion.
Zhong et al. [11] introduced RFdGAD, which employs a GNN to extract frequency-domain features from Vertical Ground Reaction Force (VGRF) signals. The framework comprises a frequency-domain learning module for each VGRF sensor and a graph-adaptive network that captures inter-sensor connectivity, thereby enhancing gait anomaly detection accuracy. In the medical domain, Graph Convolutional Networks (GCNs) have been increasingly applied to classify PD severity using gait analysis. For example, Zhang et al. [12] proposed WM-STGCN, a practical spatiotemporal modeling framework for PD gait recognition, demonstrating superior performance over traditional machine-learning and LSTM-based methods.
Nerrise et al. [13] developed a Geometric-Weighted Graph Attention Network to identify functional connectivity subnetworks associated with gait impairments in individuals with PD. By integrating geometric priors into the attention mechanism, the method provides both individual-level and group-level interpretability, offering a novel perspective on embedding biological structural priors into graph attention models. Wang et al. [14] further proposed a Global-Local Dynamic Directed Graph Neural Network (GLD2-GNN), which dynamically learns time-varying topologies from plantar VGRF signals and incorporates temporal convolution units to capture temporal dependencies. This design achieved highly robust PD detection across various datasets, opening new avenues for learning dynamic gait topology. Tian et al. [15] introduced a Cross-Spatiotemporal Graph Convolution Network (CST-GCN), which explicitly models cross-frame dependencies among gait skeletal joints through a cross-spatiotemporal neighborhood labeling strategy and employs non-shared weights across spatial and temporal dimensions. Their method yielded significant improvements in multi-view PD gait MDS-UPDRS score estimation.
In gait analysis, integrating biomechanical prior knowledge into graph attention mechanisms presents a critical challenge. Without these priors, models may capture misleading correlations between sensors. This study addresses the issue by introducing the BGS-GAT module, which embeds biomechanical constraints into the graph attention process.

2.3. Multi-Scale and Frequency-Domain Analysis of Temporal Signals

To capture multi-scale temporal patterns, researchers have examined multi-scale convolutional architectures. InceptionNet and its one-dimensional variant, Inception1D, are key frameworks that utilize parallel convolutions with different kernel sizes for feature extraction. Naimi et al. [16] introduced InceptoFormer, which combines Inception1D with a Transformer to classify gait severity in PD, demonstrating the benefits of combining multi-scale convolutions with attention mechanisms. Additionally, Wang et al. [17] developed the PhysioGait Predictive Network (PhysioGPN), which integrates temporal and spatial features to predict FOG, underscoring the importance of multi-scale feature extraction for complex gait dynamics.
However, existing studies still exhibit limitations: most models employ manually predefined convolutional kernel sizes and lack adaptive mechanisms to accommodate inter-individual variability in gait rhythms, limiting their performance in handling the highly complex and unstable gait cycles of PD patients. Furthermore, Zhao et al. [18] proposed a Federated Transferring Multi-Channel CNN that leverages multi-sensor parallel convolutions to extract hierarchical temporal features and adopts a layered transfer mechanism to enable cross-institution generalization with limited samples, highlighting the potential of multi-scale convolutional strategies for implicit feature fusion and robustness enhancement.
On the other hand, frequency-domain analysis offers unique advantages for revealing periodic and rhythmic components in biosignals. Typical features observed in PD patients—such as 4–6 Hz resting tremor and rhythm-integration abnormalities—are particularly prominent in the spectral domain [19]. Traditional approaches typically extract hand-crafted frequency-domain features (e.g., power spectral density) before feeding them into machine learning models. In the deep learning domain, transforming time-domain signals into spectral representations through FFT and combining them with convolutional architectures has achieved notable success in biomedical signal analysis tasks such as Electrocardiography (ECG) and Electroencephalography (EEG) [20].
Li et al. [21] performed time–frequency decomposition of PD sleep EEG signals using Tunable Q-factor Wavelet Transform (TQWT) and Wavelet Packet Transform (WPT), followed by Deep Residual Shrinkage Networks (DRSNs) for multi-class classification, achieving high-accuracy discrimination of PD, REM sleep Behavior Disorder (RBD), and comorbid conditions, further supporting the diagnostic potential of spectral features in neurodegenerative diseases. Additionally, Kwon et al. [22] employed a wearable single-lead ECG system for long-term monitoring of Heart Rate Variability (HRV). They demonstrated significantly decreased Low-Frequency (LF) power in PD patients, which correlated negatively with cerebellar gray matter volume. These findings indicate that frequency-domain ECG features can reflect autonomic dysfunction in PD, providing new physiological evidence for early diagnosis. However, in deep-learning-based PD gait analysis, most existing methods remain confined to purely time-domain modeling and fail to exploit the complementary benefits of joint time–frequency learning. Consequently, the morphological gait information embedded in the time domain and the rhythmic tremor signatures revealed in the frequency domain have yet to be deeply integrated in a unified framework.
In summary, while existing studies have made significant advancements in analyzing gait in PD, three significant limitations persist:
(1) There is a lack of biomechanical priors in spatial modeling, which leads to insufficient physiological reasoning when defining the relationships among multiple sensors.
(2) The use of fixed temporal feature extraction scales restricts adaptive modeling of unstable and diverse gait cycles in individuals with PD.
(3) Frequency-domain information is underutilized, resulting in limited integration of spectral features necessary for capturing rhythmic abnormalities.
To address these limitations, this work proposes a multi-sensor gait network that incorporates biomechanical priors along with time–frequency collaborative learning. By introducing three key modules—BGS-GAT, AMS-Inception1D, and TF-Branch—the model enables comprehensive, adaptive, and physiologically interpretable learning of gait features across spatial, temporal, and frequency domains.

3. Multi-Sensor Gait Network with Biomechanical Priors and Time–Frequency Collaboration

We propose a multi-sensor gait analysis model that combines biomechanical priors with a time–frequency collaborative mechanism. This model learns gait representations across three complementary branches—spatial, temporal, and frequency—enabling a comprehensive exploration of the multidimensional characteristics of gait signals. This approach aims to identify pathological gait patterns in individuals with PD. The overall architecture of the proposed framework is illustrated in Figure 1.

3.1. Model Input

The gait data used in this study were collected from an array of plantar pressure sensors. Each sample contains N = 18 sensor channels with a temporal length of T = 100 time steps. The raw signal can be represented as Equation (1).
$X = \{\, x_{t,c} \mid t = 1, \ldots, T;\ c = 1, \ldots, N \,\}$,   (1)
where $x_{t,c}$ denotes the pressure value recorded by the $c$-th sensor at time step $t$.
During preprocessing, the raw signals were segmented using a sliding window with a window length of 100 and a stride of 50, followed by normalization and filtering to ensure that each sample covers a complete gait cycle and to suppress noise. After preprocessing, each sample is represented as a time-series matrix of size ( T , N ) reflecting the pressure variations across 18 sensors over 100 time steps.
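The sliding-window segmentation described above (window length 100, stride 50) can be sketched as follows. This is a minimal illustration; the per-channel z-score normalization shown here is an assumed choice, since the text does not specify the exact normalization or filtering applied.

```python
import numpy as np

def segment_gait(signal, win=100, stride=50):
    """Split a (T_total, N) multi-sensor recording into overlapping windows."""
    starts = range(0, signal.shape[0] - win + 1, stride)
    return np.stack([signal[s:s + win] for s in starts])  # (num_windows, win, N)

def zscore(window):
    """Per-channel z-score normalization (an assumed choice, not specified in the text)."""
    return (window - window.mean(axis=0)) / (window.std(axis=0) + 1e-8)
```

For instance, a 300-step, 18-channel recording yields five windows of shape (100, 18) under this scheme.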
In the model implementation, the time-series matrix is treated as the raw input $X \in \mathbb{R}^{B \times T \times N}$, where $B$ denotes the batch size, $T$ the number of time steps, and $N$ the number of sensor channels. At the input layer, the model processes the 18-channel signals in parallel and feeds them into three dedicated feature extraction branches:
(1) Spatial branch (BGS-GAT): captures inter-regional force coordination patterns by leveraging the spatial topology of plantar sensors and modeling biomechanical dependencies.
(2) Temporal branch (AMS-Inception1D): extracts multi-scale dynamic features along the temporal dimension, characterizing gait rhythm and phase-dependent variations.
(3) Frequency branch (TF-Branch): applies an RFFT along the temporal axis to model spectral energy distributions and rhythmic characteristics of gait signals.
In the following sections, all three branches operate on the same raw input X and independently model gait dynamics from complementary spatial, temporal, and frequency perspectives. Their learned representations are subsequently integrated at the fusion layer for multimodal feature aggregation and final classification.
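The input transform of the frequency branch can be illustrated with NumPy's real FFT. This sketch covers only the RFFT step of the TF-Branch; the subsequent frequency-domain convolutions are learned layers and are omitted here.

```python
import numpy as np

def rfft_magnitude(X):
    """Apply an RFFT along the temporal axis of a (B, T, N) batch.

    Returns magnitude spectra of shape (B, T // 2 + 1, N), i.e. (B, 51, 18)
    for the T = 100, N = 18 configuration used in this paper.
    """
    return np.abs(np.fft.rfft(X, axis=1))
```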

3.2. BGS-GAT

To effectively model the spatial dependencies and biomechanical coupling among plantar pressure sensors, we introduce a BGS-GAT. This module represents the plantar sensor layout as a graph structure [23], where edges encode anatomical proximity and biomechanical relationships. By incorporating attention mechanisms, BGS-GAT adaptively learns the interaction strength between sensor nodes, enabling the model to capture spatial correlations and coordinated loading patterns throughout the gait cycle.

3.2.1. Design Motivation of the BGS-GAT Module

During the gait cycle, pressure variations across different foot regions are interconnected. For example, force transmission from the heel to the midfoot reflects the transition from loading to propulsion, while coordinated activation of the forefoot and toes corresponds to push-off. Individuals with abnormal gait patterns show altered force pathways and intensities, highlighting the need to model these biomechanical dependencies.
Traditional CNNs struggle with irregular topologies in non-Euclidean spaces. To address this, BGS-GAT models the 18 plantar pressure sensors as nodes in a graph, defining their connections based on biomechanical principles. An attention-based message-passing mechanism enables weighted feature aggregation and adaptive learning of spatial dependencies.
Compared to conventional fully connected or fixed convolutional methods, BGS-GAT offers key benefits:
(1) Structural Adaptivity: The attention mechanism adjusts interaction weights between nodes according to different gait phases.
(2) Physical Consistency: Node connectivity reflects the actual spatial layout of the sensors, aligning with biomechanical structures and physiological force transmission patterns.

3.2.2. Adjacency Matrix Construction

The plantar pressure sensing system consists of N = 18 sensors. Based on their spatial arrangement on the plantar surface, the foot can be modeled as a weighted undirected graph:
$G = (V, E, A)$,   (2)
where $V = \{v_1, v_2, \ldots, v_N\}$ denotes the set of sensor nodes, $E$ represents the set of edges connecting physically adjacent sensors, and $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix, with $A_{ij}$ measuring the connection strength between nodes $v_i$ and $v_j$.
The adjacency matrix is constructed based on both the physical 2D layout of the 18 plantar sensors and established biomechanical force transmission pathways. Nodes represent sensors located in anatomically defined regions (hindfoot, midfoot, forefoot, toes). Edges are assigned between sensors that are physically adjacent or exhibit functional biomechanical coupling during gait. The $|i - j| \le 2$ rule reflects the natural sequential loading from heel to toe and medial–lateral force redistribution, ensuring that graph connectivity encodes physiologically meaningful relationships. This hybrid anatomical–biomechanical strategy enhances interpretability and prevents the formation of non-physiological interactions.
Considering the anatomical continuity and hierarchical regional structure of the plantar surface (hindfoot–midfoot–forefoot–toe), we adopt a distance-decay weighting strategy as Equation (3).
$A_{ij} = \begin{cases} \dfrac{1}{1 + |i - j|}, & \text{if } |i - j| \le 2, \\ 0, & \text{otherwise}, \end{cases}$   (3)
To preserve self-connections, we set $A_{ii} = 1$ for all nodes.
In Equation (3), sensors within two index positions of each other are considered to be directly connected, and the connection strength decreases with increasing spatial distance. The adjacency matrix is kept fixed during training and serves as a masking matrix in the attention computation, ensuring that information propagates only within biomechanically plausible neighborhoods. It thus imposes structural constraints and enhances the model's physical interpretability.
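Under this rule, the fixed adjacency matrix can be constructed in a few lines. This is a sketch; sensor indices are assumed to follow the heel-to-toe ordering described above.

```python
import numpy as np

N = 18  # number of plantar pressure sensors
A = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        if abs(i - j) <= 2:                      # connect sensors within two positions
            A[i, j] = 1.0 / (1.0 + abs(i - j))   # distance-decay weight, Eq. (3)
# self-connections A_ii = 1 are already covered by the |i - j| = 0 case
```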

3.2.3. Graph Attention Mechanism

For each sample, the pressure sequence of the $i$-th sensor is first averaged along the temporal dimension to obtain its overall loading representation. A $1 \times 1$ convolution is then applied to project the signal into a higher-dimensional feature space, producing a feature vector for each node. By concatenating the feature vectors of all $N$ plantar sensors, we obtain the input feature matrix for the graph attention layer:
$X \in \mathbb{R}^{B \times N \times F}$,   (4)
where $B$ denotes the batch size, $N$ the number of nodes (sensors), and $F$ the feature dimension of each node. The graph attention layer performs weighted feature aggregation across nodes through the following four stages.
(1) Feature Projection
First, each node feature is linearly transformed and projected into a higher-dimensional latent space:
$H = X W, \quad W \in \mathbb{R}^{F \times F'}$,   (5)
where $W$ is the learnable projection matrix and $F'$ denotes the output feature dimension. This operation maps the original sensor features into the graph feature space.
(2) Attention Coefficient Computation
For any two nodes v i and v j , the unnormalized attention score between them is computed as Equation (6).
$e_{ij} = \mathrm{LeakyReLU}\!\left( \mathbf{a}_l^{\top} \mathbf{h}_i + \mathbf{a}_r^{\top} \mathbf{h}_j \right)$,   (6)
where $\mathbf{h}_i$ and $\mathbf{h}_j$ denote the projected features of nodes $i$ and $j$, respectively. The vectors $\mathbf{a}_l, \mathbf{a}_r \in \mathbb{R}^{F'}$ are learnable weight parameters that measure the importance of a node and its neighbors. The LeakyReLU activation with a negative slope of 0.2 is applied to avoid sparse gradients.
(3) Masking and Normalization
To prevent spurious dependencies between non-adjacent nodes, the adjacency matrix A is applied as a mask to the attention scores:
$\tilde{e}_{ij} = \begin{cases} e_{ij}, & \text{if } A_{ij} > 0, \\ -\infty, & \text{if } A_{ij} = 0. \end{cases}$   (7)
Subsequently, the masked attention scores are normalized using a softmax function to obtain the normalized attention weights:
$\alpha_{ij} = \dfrac{\exp(\tilde{e}_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(\tilde{e}_{ik})}$,   (8)
where $\mathcal{N}(i)$ denotes the set of neighboring nodes of node $i$.
(4) Feature Aggregation and Residual Connection
The output feature of each node is obtained by computing a weighted sum over the features of its neighboring nodes:
$\mathbf{h}_i' = \sum_{j \in \mathcal{N}(i)} \alpha_{ij} \mathbf{h}_j$.   (9)
In matrix form, this can be written as Equation (10).
$H' = \mathrm{Softmax}(\tilde{E}) H$,   (10)
where $\tilde{E} = [\tilde{e}_{ij}]$ denotes the masked attention score matrix.
To enhance gradient propagation and prevent feature degradation, a residual connection and a self-regularizing activation function are incorporated:
$\tilde{H} = \mathrm{SELU}(H' + X W_r)$,   (11)
where $W_r$ is a linear projection matrix that adjusts the residual branch when the input and output dimensions do not match, and the SELU activation ensures stable feature distributions.
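Equations (5)–(11) can be condensed into a single-head NumPy sketch. This is illustrative only: the actual module uses learned parameters, batched tensors, and two stacked layers, all of which are simplified away here.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def selu(x):
    alpha, lam = 1.6732632423543772, 1.0507009873554805
    return lam * np.where(x > 0, x, alpha * np.expm1(x))

def gat_layer(X, A, W, a_l, a_r, W_r):
    """Single-head masked graph attention over N sensor nodes.

    X: (N, F) node features; A: (N, N) biomechanical adjacency mask.
    """
    H = X @ W                                                  # Eq. (5): projection
    e = leaky_relu((H @ a_l)[:, None] + (H @ a_r)[None, :])    # Eq. (6): raw scores
    e = np.where(A > 0, e, -np.inf)                            # Eq. (7): adjacency mask
    w = np.exp(e - e.max(axis=1, keepdims=True))               # Eq. (8): stable softmax
    alpha = w / w.sum(axis=1, keepdims=True)
    H_agg = alpha @ H                                          # Eqs. (9)-(10): aggregation
    return selu(H_agg + X @ W_r)                               # Eq. (11): residual + SELU
```

Because the mask keeps every diagonal entry, each row of the attention matrix always has at least one finite score, so the softmax is well defined.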

3.2.4. Network Architecture and Implementation

The BGS-GAT module consists of two sequential graph attention layers, each with an output feature dimension of $F' = 64$. A dropout rate of 0.1 is applied during training to mitigate overfitting. The overall computation pipeline can be expressed as Equation (12).
$X \rightarrow H_0 \xrightarrow{\ \mathrm{GAT}_1(A)\ } H_1 \xrightarrow{\ \mathrm{GAT}_2(A)\ } H_2 \xrightarrow{\ \mathrm{GAP}\ } \mathbf{z}_g$,   (12)
where $H_0$ denotes the linearly projected node feature matrix; $\mathrm{GAT}_1(A)$ and $\mathrm{GAT}_2(A)$ represent the first and second graph attention operations with the adjacency mask $A$; and global average pooling (GAP) is applied over the node dimension to obtain the global spatial embedding vector $\mathbf{z}_g \in \mathbb{R}^{F'}$.
The spatial feature $\mathbf{z}_g$ produced by the BGS-GAT module is then combined with the representations from the temporal branch (AMS-Inception1D) and the frequency branch (TF-Branch). These multimodal features are jointly fed into the fusion layer to enable comprehensive spatiotemporal modeling of gait signals.
This structure effectively characterizes force interactions and coordinated pressure variations across plantar regions in a non-Euclidean space, providing physically interpretable and data-driven spatial representations for Parkinson’s gait severity classification.

3.3. AMS-Inception1D

In temporal modeling, gait signals display inherently multiscale dynamics. Rapid oscillations within short time windows represent transient gait fluctuations, while slower variations over longer periods capture the overall gait rhythm. A single convolutional kernel is insufficient to simultaneously capture features across these diverse temporal scales.
To address this challenge, we propose AMS-Inception1D, which utilizes parallel convolutional branches to extract features with different temporal receptive fields. Additionally, it incorporates a channel attention mechanism to adaptively combine these multi-scale representations.

3.3.1. Design Motivation of the AMS-Inception1D Module

In real gait processes, short-term features (e.g., heel strike or toe-off events) coexist with long-term characteristics (e.g., stance-phase duration or cadence). Both types of temporal patterns jointly shape the overall temporal dynamics of gait.
Traditional One-dimensional Convolutions (Conv1D) typically use fixed kernel sizes, which restrict the model to a single temporal scale. As a result, they struggle to simultaneously capture both short-term and long-term dynamics in gait sequences.
AMS-Inception1D adopts a parallel multi-scale convolution strategy, applying multiple temporal receptive fields to the input sequence and employing a channel attention mechanism (Squeeze-and-Excitation Gate, SE-Gate) to adaptively fuse the resulting features. This enables the model to automatically adjust the importance of different temporal scales according to the complexity of the input gait patterns.

3.3.2. Multi-Scale Feature Extraction

The AMS-Inception1D module is designed to capture dynamic patterns ranging from local to global temporal scales, thereby enhancing the model’s ability to perceive rhythmic characteristics in gait signals. The module consists of four parallel branches, each employing a different convolution or pooling operation to extract features at multiple temporal resolutions.
(1) Local Receptive Field Branch
To learn linear combinations across channels and capture short-range temporal dependencies, a 1 × 1 convolution is applied for feature projection:
$X_1 = \sigma(\mathrm{Conv1D}(X;\ k = 1))$,   (13)
where $k$ denotes the kernel size and $\sigma(\cdot)$ represents the SELU activation function. This branch performs a point-wise linear transformation along the temporal dimension, enhancing cross-channel feature interactions.
(2) Medium-Scale Feature Branch
To capture medium-range temporal variations across adjacent time steps, a 3 × 1 convolution kernel is employed:
$X_2 = \sigma(\mathrm{Conv1D}(X;\ k = 3))$.   (14)
This branch is able to model short-term gait fluctuation patterns, such as the local dynamics occurring between the stance and swing phases.
(3) Dilated Convolution Branch
To enlarge the temporal receptive field without increasing the number of parameters, a dilated convolution with a dilation rate of d = 2 is introduced:
$X_3 = \sigma(\mathrm{Conv1D}(X;\ k = 3,\ d = 2))$.   (15)
The effective receptive field of the dilated convolution is given by $k_{\mathrm{eff}} = k + (k - 1)(d - 1)$ (here $k_{\mathrm{eff}} = 3 + 2 \times 1 = 5$), which enables the model to capture long-range temporal dependencies across wider time intervals, thereby improving its ability to model complete gait cycles.
(4) Pooling Branch
To extract smooth global trends from the time series, a max-pooling operation is applied followed by a 1 × 1 convolution to restore the channel dimension:
$X_4 = \sigma(\mathrm{Conv1D}(\mathrm{MaxPool1D}(X;\ k = 3);\ k = 1))$.   (16)
This branch helps suppress local noise and extract global trend information.
(5) Multi-Scale Feature Fusion
The output feature tensors of the four branches are denoted as Equation (17).
$X_1, X_2, X_3, X_4 \in \mathbb{R}^{B \times T \times F}$,   (17)
where $B$ denotes the batch size, $T$ is the number of temporal steps, and $F$ represents the output channel dimension of each branch.
To integrate temporal information across different receptive scales, the feature maps from the four branches are concatenated along the channel dimension (i.e., the third axis), yielding a unified multi-scale representation:
$X_{\mathrm{cat}} = \mathrm{Concat}(X_1, X_2, X_3, X_4) \in \mathbb{R}^{B \times T \times 4F}$,   (18)
where $\mathrm{Concat}$ denotes channel-wise concatenation. After fusion, the number of output channels expands from $F$ to $4F$, enabling the model to simultaneously preserve multi-scale temporal patterns captured by the pointwise, medium-kernel, dilated-convolution, and pooling branches.
This design equips the model with both local and global temporal modeling capacity within a single layer, enabling it to effectively characterize temporal dynamics across different gait phases. Furthermore, it provides the subsequent Temporal Phase-Aware Transformer (TPAT) with rich multi-scale temporal features as its input.
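The four parallel branches and their concatenation can be sketched as follows. Fixed averaging and pooling kernels stand in for the learned convolutions, so this illustrates only shapes and receptive fields; channel projections and the SELU activations are omitted.

```python
import numpy as np

def conv1d_same(x, k, d=1):
    """'Same'-padded depthwise 1-D convolution on a (T, C) array.

    A uniform averaging kernel replaces learned weights here; the effective
    receptive field is k_eff = k + (k - 1)(d - 1), e.g. 5 for k=3, d=2."""
    T = x.shape[0]
    r = d * (k - 1) // 2
    xp = np.pad(x, ((r, r), (0, 0)))
    return sum(xp[i * d:i * d + T] for i in range(k)) / k

def maxpool1d_same(x, k=3):
    """'Same'-padded 1-D max pooling on a (T, C) array."""
    T = x.shape[0]
    xp = np.pad(x, ((k // 2, k // 2), (0, 0)), constant_values=-np.inf)
    return np.stack([xp[i:i + T] for i in range(k)]).max(axis=0)

def ams_branches(x):
    """Four parallel branches concatenated along channels: (T, F) -> (T, 4F)."""
    x1 = x                                    # 1x1 branch (identity placeholder)
    x2 = conv1d_same(x, k=3)                  # medium-scale branch
    x3 = conv1d_same(x, k=3, d=2)             # dilated branch, k_eff = 5
    x4 = maxpool1d_same(x, k=3)               # pooling branch
    return np.concatenate([x1, x2, x3, x4], axis=-1)  # Eq. (18)
```

With $T = 100$ and $F = 18$ input channels per branch, the concatenated output has shape (100, 72).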

3.3.3. Adaptive Channel Weighting Mechanism (SE-Gate)

Due to the substantial variability in temporal feature distributions across different gait phases, the importance of features at different scales also varies across subjects. To address this, a SE-Gate is introduced to adaptively re-weight multi-scale features.
First, a global channel descriptor is obtained by applying GAP to the concatenated feature representation $X_{\mathrm{cat}}$ along the temporal dimension:
$\mathbf{z} = \mathrm{GAP}(X_{\mathrm{cat}}) \in \mathbb{R}^{B \times 4F}$.   (19)
Next, channel-wise dependencies are modeled and non-linear recalibration is performed through a two-layer fully connected network:
$\mathbf{s} = \sigma_2(W_2\, \sigma_1(W_1 \mathbf{z}))$,   (20)
where $W_1 \in \mathbb{R}^{r \times 4F}$ and $W_2 \in \mathbb{R}^{4F \times r}$ denote the learnable weight matrices, $\sigma_1(\cdot)$ is the SELU activation function, $\sigma_2(\cdot)$ is the Sigmoid activation function, and $r$ is the reduced bottleneck channel dimension (approximately $F/8$).
Finally, the channel attention vector s is applied to recalibrate the fused feature representation:
X̃ = X_cat ⊙ s,
where ⊙ denotes channel-wise multiplication. Through this mechanism, the model can automatically emphasize informative temporal feature channels while suppressing redundant ones, thereby achieving adaptive fusion of multi-scale representations.
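The full SE-Gate computation (GAP, two-layer recalibration, channel-wise re-weighting) can be sketched in NumPy as follows; the weight values and the row-vector convention z @ W₁ with W₁ ∈ ℝ^(C×r) are illustrative assumptions:

```python
import numpy as np

def se_gate(x_cat, w1, w2):
    """Squeeze-and-excitation style channel gate over (B, T, C) features.

    z = GAP(x) over time, s = sigmoid(W2 * selu(W1 * z)),
    output = x * s broadcast over the temporal axis.
    """
    selu_scale, selu_alpha = 1.0507, 1.67326  # standard SELU constants
    z = x_cat.mean(axis=1)                    # GAP over time -> (B, C)
    h = z @ w1                                # squeeze to r dims
    h = selu_scale * np.where(h > 0, h, selu_alpha * (np.exp(h) - 1))
    s = 1.0 / (1.0 + np.exp(-(h @ w2)))       # sigmoid gate -> (B, C)
    return x_cat * s[:, None, :]              # channel-wise re-weighting

B, T, C, r = 2, 50, 32, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((B, T, C))
out = se_gate(x, rng.standard_normal((C, r)) * 0.1,
                 rng.standard_normal((r, C)) * 0.1)
assert out.shape == x.shape
```

Because the gate s lies in (0, 1), each output channel is a strictly attenuated copy of its input, which is what lets the gate suppress redundant channels without sign flips.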

3.3.4. Module Output and Feature Normalization

After channel re-weighting, the feature representation X̃ is further processed with Batch Normalization (BN) followed by the SELU activation function to stabilize training and accelerate convergence:
Y = SELU(BN(X̃)).
The final output Y ∈ ℝ^(B×T×4F) is then fed into the TPAT module to further capture global dependencies across temporal steps.

3.3.5. TPAT

To capture long-range temporal dependencies and phase variations in gait dynamics, a TPAT module is incorporated into the proposed architecture.
Unlike conventional Transformers, TPAT explicitly models the relative phase relationships between temporal steps during attention computation, thereby enhancing the model’s sensitivity to periodic gait characteristics.
Its input is the channel-weighted and normalized feature tensor Y ∈ ℝ^(B×T×4F), and temporal feature modeling is carried out in the following three stages:
(1) Temporal Position and Phase Encoding
The time index t is transformed into a phase embedding vector:
ϕ_t = [sin(2πt/P), cos(2πt/P)],
where P denotes an approximate gait-cycle period. This embedding is added to the input features, enabling the model to distinguish phase relationships across temporal steps during attention computation.
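The phase encoding can be sketched in a few lines of NumPy; P = 100 (one second at the 100 Hz sampling rate) is an assumed period for illustration, not a value fixed by the paper:

```python
import numpy as np

# Map each time index t to its phase embedding [sin(2*pi*t/P), cos(2*pi*t/P)].
P, T = 100, 200
t = np.arange(T)
phi = np.stack([np.sin(2 * np.pi * t / P),
                np.cos(2 * np.pi * t / P)], axis=-1)   # shape (T, 2)

# Time steps exactly one period apart receive identical embeddings,
# which is what lets attention reason about within-cycle phase.
assert np.allclose(phi[0], phi[P])
```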
(2) Multi-Head Self-Attention
Multi-head attention is applied to the phase-modulated features:
Attention(Q, K, V) = Softmax(QKᵀ/√(d_k) + Φ) V,
where Φ denotes the phase-shift matrix, which introduces relative positional relationships between temporal steps and facilitates learning of gait rhythm variations.
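A minimal single-head NumPy sketch of this phase-biased attention follows; the construction of Φ is not spelled out here, so a zero matrix (reducing to vanilla scaled dot-product attention) is used for illustration:

```python
import numpy as np

def phase_attention(Q, K, V, Phi):
    """Scaled dot-product attention with an additive phase-shift bias Phi.

    Implements Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k) + Phi) V
    for a single head; Phi is a (T, T) relative-phase bias matrix.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k) + Phi          # (T, T) logits
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)             # row-wise softmax
    return w @ V                                   # (T, d_v)

T, d = 6, 4
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
Phi = np.zeros((T, T))      # zero bias: ordinary attention, for demonstration
out = phase_attention(Q, K, V, Phi)
assert out.shape == (T, d)
```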
(3) Feed-Forward Network (FFN) and Residual Connections
Each attention layer is followed by an FFN, Layer Normalization (LayerNorm), and a residual connection to enhance feature stability:
Z_t = FFN(LayerNorm(Y_t + Attention(Y_t))).
Finally, the TPAT outputs the refined temporal representation:
Z_t ∈ ℝ^(B×T×F_t),
which simultaneously encodes local temporal dynamics and periodic phase characteristics, thereby providing richer temporal dependencies for subsequent time–frequency collaborative fusion.
Through the TPAT module, the model not only captures long-range temporal dependencies in gait signals but also remains sensitive to cross-cycle phase shifts (e.g., asymmetric foot strike patterns and delayed rhythmic transitions), thereby enabling more accurate discrimination of Parkinsonian gait severity levels.

3.4. TF-Branch

3.4.1. Design Motivation of the TF-Branch Module

Gait signals are non-stationary time series characterized by distinct periodicity and rhythmic structures, containing information in both time and frequency domains. Time-domain features capture local fluctuations, reflecting short-term stability and motor coordination, while frequency-domain features illustrate energy distribution across frequencies, revealing gait cadence and rhythmicity. In Parkinsonian gait, we observe a concentration of low-frequency energy, shifts in dominant frequency peaks, and increased high-frequency jitter, which purely time-domain models cannot fully capture. To address this, we introduce a TF-Branch that complements time-domain representations with frequency-domain cues, enabling more comprehensive analysis of gait signals.

3.4.2. Frequency-Domain Transformation and Representation

To extract the global rhythmic characteristics of gait, the pressure time series from all N = 18 sensors are first averaged along the channel dimension, yielding the global foot-pressure trajectory X̄ ∈ ℝ^(B×T×1). This aggregated signal reflects the overall temporal evolution of plantar loading and provides a stable representation of gait cadence and periodicity.
Using X̄ as the input to the frequency branch, RFFT is applied along the temporal dimension to obtain the complex-valued frequency spectrum:
F_r(X̄)(k) = Σ_{n=0}^{T−1} X̄(n) e^(−j2πkn/T),   k = 0, 1, …, T/2.
Since the magnitude spectrum more reliably reflects energy distribution than the phase spectrum, the magnitude component of the RFFT is adopted as the frequency-domain representation:
X_f = |F_r(X̄)| ∈ ℝ^(B×K×1),
where K = T/2 + 1 denotes the number of valid frequency bins.
To focus on the low-frequency components associated with gait rhythmicity and suppress high-frequency noise, only the first K = 64 low-frequency bins are retained:
X_f^(K) = { X_f(b, k, c) | k = 1, …, K },   X_f^(K) ∈ ℝ^(B×K×C).
Equation (29) ensures that the model focuses on the primary gait frequency and its harmonics, while preventing high-frequency noise from interfering with spectral feature learning.
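The averaging, RFFT, magnitude, and truncation steps above can be sketched in NumPy; the window length T and the truncation K = 32 below are illustrative values chosen so that K does not exceed the T/2 + 1 valid bins of a short demonstration window:

```python
import numpy as np

# B windows of T frames from N = 18 plantar sensors (random stand-in data).
B, T, N = 2, 100, 18
rng = np.random.default_rng(2)
x = rng.standard_normal((B, T, N))

x_bar = x.mean(axis=-1, keepdims=True)   # (B, T, 1) global pressure trajectory
spec = np.fft.rfft(x_bar, axis=1)        # complex spectrum, (B, T//2 + 1, 1)
x_f = np.abs(spec)                       # magnitude spectrum
assert x_f.shape == (B, T // 2 + 1, 1)

K = 32                                   # keep only the low-frequency bins
x_f_low = x_f[:, :K, :]                  # (B, K, 1)
```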

3.4.3. Frequency-Domain Feature Modeling

The frequency-domain feature X_f^(K) characterizes the distribution of signal energy across different frequency components.
To capture local variation patterns in the spectrum and the coupling relationships across frequency bands, a one-dimensional convolution operator is applied along the frequency axis to perform deep modeling of spectral features:
Y_f = σ(Conv1D(X_f^(K); k_f, F_f)),
where Conv1D(·) denotes the one-dimensional convolution operation applied along the frequency axis, k_f is the kernel size, F_f is the number of output channels, and σ(·) represents the SELU activation function.
This process is equivalent to sliding convolutional kernels over the spectral domain, thereby capturing local fluctuations in energy distribution, inter-band energy transitions, and relationships between the fundamental frequency and its harmonics.
After convolution, the resulting feature tensor Y_f ∈ ℝ^(B×K×F_f) constitutes the multi-dimensional spectral representation and serves as the frequency-domain embedding for subsequent time–frequency fusion.
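Sliding kernels over the spectrum can be illustrated with a toy NumPy implementation; the uniform averaging kernels, their size, and the valid-padding convention are arbitrary stand-ins for the learned filters:

```python
import numpy as np

def conv1d_freq(x_f, kernels):
    """Slide a bank of 1-D kernels along a spectrum.

    x_f: (K,) magnitude spectrum; kernels: (F_f, k_f) filter bank.
    Returns (K - k_f + 1, F_f) feature map (valid padding).
    """
    K, (F_f, k_f) = x_f.shape[0], kernels.shape
    out = np.empty((K - k_f + 1, F_f))
    for i in range(K - k_f + 1):
        out[i] = kernels @ x_f[i:i + k_f]   # dot each kernel with a local window
    return out

# Spectrum of a pure sinusoid as a simple test signal.
spectrum = np.abs(np.fft.rfft(np.sin(np.linspace(0, 20 * np.pi, 200))))
feat = conv1d_freq(spectrum, np.ones((4, 5)) / 5.0)   # 4 smoothing kernels
assert feat.shape == (spectrum.shape[0] - 5 + 1, 4)
```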

3.4.4. Time–Frequency Collaborative Fusion Mechanism

To achieve complementary integration of time-domain and frequency-domain features, a collaborative fusion strategy combining global feature aggregation and linear fusion is employed.
Let the high-level representation produced by the time-domain branch be Z_t ∈ ℝ^(B×T×F_t), and the output of the frequency-domain branch be Z_f = Y_f ∈ ℝ^(B×K×F_f). GAP is first applied along the temporal and frequency dimensions, respectively, to extract global embedding vectors:
z_t = GAP(Z_t) ∈ ℝ^(B×F_t),   z_f = GAP(Z_f) ∈ ℝ^(B×F_f).
Subsequently, the two representations are fused to construct the joint time–frequency feature:
z_tf = α z_t + (1 − α) z_f,
where α ∈ [0, 1] is a learnable weighting coefficient used to adaptively balance the contributions of time-domain and frequency-domain information.
A larger value of α encourages the model to emphasize temporal dynamics, whereas a smaller value shifts the focus toward spectral rhythmic features.
This learnable weighting strategy enables the network to automatically adjust the time–frequency fusion ratio based on the distributions of temporal and spectral characteristics in each input sample.
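A NumPy sketch of the fusion step follows. Note that the weighted sum requires the two embeddings to share a dimension (F_t = F_f), which is assumed here; α is fixed at 0.7 for illustration, whereas in the model it is a learned parameter:

```python
import numpy as np

B, F = 3, 16
rng = np.random.default_rng(3)
Z_t = rng.standard_normal((B, 40, F))   # time-branch output, (B, T, F)
Z_f = rng.standard_normal((B, 20, F))   # frequency-branch output, (B, K, F)

z_t = Z_t.mean(axis=1)                  # GAP over time      -> (B, F)
z_f = Z_f.mean(axis=1)                  # GAP over frequency -> (B, F)

alpha = 0.7                             # learnable scalar in [0, 1] in the model
z_tf = alpha * z_t + (1 - alpha) * z_f  # joint time-frequency embedding
assert z_tf.shape == (B, F)
```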
Finally, the fused vector z_tf ∈ ℝ^(B×F_tf) is used as a global time–frequency representation and fed into the classification layer. Together with the spatial embedding Z_g generated by the BGS-GAT branch, it forms a unified input to the classifier for comprehensive prediction of disease severity.

3.5. Feature Fusion and Classification Output

After multi-stage feature extraction through the spatial branch (BGS-GAT), temporal branch (AMS-Inception1D and TPAT), and frequency branch (TF-Branch), the model obtains three complementary high-level representations. The spatial features capture plantar loading topology and inter-sensor coupling relationships, the temporal features characterize dynamic evolution and phase-dependent gait variations, and the spectral features reveal periodic rhythmic patterns and energy distribution characteristics. To achieve unified modeling and joint decision-making across these modalities, a feature fusion and classification module is designed to integrate multi-source representations and predict gait severity levels.
First, the global feature vectors produced by the three branches are denoted as Equation (33).
z_g ∈ ℝ^(B×F_g),   z_t ∈ ℝ^(B×F_t),   z_f ∈ ℝ^(B×F_f),
where F_g, F_t, and F_f denote the feature dimensions of each branch, and z_g, z_t, and z_f correspond to the global average pooled outputs from the spatial, temporal, and frequency branches, respectively. To integrate information from different modalities, a feature concatenation strategy is adopted, in which the three representations are combined along the feature dimension to form a unified embedding vector:
z_fuse = Concat(z_g, z_t, z_f) ∈ ℝ^(B×(F_g+F_t+F_f)).
This operation preserves the independent representational capacity of each modality while allowing the network to learn high-dimensional cross-modal interactions in subsequent layers. Then, a fully connected mapping is applied to achieve nonlinear feature combination and dimensionality compression:
h = ϕ(W_f z_fuse + b_f),
where W_f ∈ ℝ^((F_g+F_t+F_f)×F_h) and b_f ∈ ℝ^(F_h) are learnable parameters, and ϕ(·) denotes the SELU activation function. This step enables semantic fusion across modalities within a unified feature space, extracting latent representations that are most discriminative for disease severity assessment.
To improve the model’s generalization ability and suppress overfitting, a Dropout regularization layer is applied after the fusion layer, randomly discarding a portion of neurons to enhance feature robustness. Finally, a Softmax classifier is used to generate four-class probability predictions for gait severity levels, with the output dimension defined by Equation (36).
ŷ = Softmax(W_c h + b_c) ∈ ℝ^(B×4),
where W_c and b_c are the learnable weights and biases of the classification layer, respectively, and the four output categories correspond to PD gait severity levels {0, 2, 2.5, 3}.
During training, the model is optimized using categorical cross-entropy loss and the adaptive Nadam optimizer for parameter updates. The cross-entropy loss is defined as Equation (37).
L_CE = −(1/B) Σ_{i=1}^{B} Σ_{c=1}^{4} y_{i,c} log(ŷ_{i,c}),
where y_{i,c} denotes the one-hot encoded ground-truth label and ŷ_{i,c} the predicted probability. To further balance class distributions, class-specific weights are introduced during training, ensuring stable convergence under imbalanced sample conditions.
By integrating temporal, frequency, and spatial domains into the semantic representation layer, the proposed framework achieves full-pipeline multimodal information fusion. This module enables the model to consolidate heterogeneous gait features within a unified representation space, thereby facilitating accurate recognition of Parkinsonian gait severity and enhancing biomechanical interpretability. The fusion strategy not only strengthens robustness against modality-specific variations but also reflects a comprehensive understanding of the multidimensional structure of human gait at the feature level, providing a reliable foundation for computer-aided disease assessment.
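A NumPy sketch of the fusion head at inference time (concatenation, dense layer with SELU, softmax classifier) together with the cross-entropy loss is given below; Dropout is omitted, and all dimensions and weight values are illustrative, not the paper's hyperparameters:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))   # stabilized softmax
    return e / e.sum(axis=-1, keepdims=True)

def head_forward(z_g, z_t, z_f, W_f, b_f, W_c, b_c):
    """Concat -> dense + SELU -> softmax over 4 severity classes."""
    z = np.concatenate([z_g, z_t, z_f], axis=-1)    # (B, Fg+Ft+Ff)
    h = z @ W_f + b_f
    h = 1.0507 * np.where(h > 0, h, 1.67326 * (np.exp(h) - 1))  # SELU
    return softmax(h @ W_c + b_c)                   # (B, 4) probabilities

def cross_entropy(y_true, y_pred):
    """Categorical cross-entropy: L = -(1/B) * sum y log(y_hat)."""
    return -np.mean(np.sum(y_true * np.log(y_pred + 1e-12), axis=-1))

B, Fg, Ft, Ff, Fh = 5, 8, 8, 8, 16
rng = np.random.default_rng(4)
y_hat = head_forward(rng.standard_normal((B, Fg)),
                     rng.standard_normal((B, Ft)),
                     rng.standard_normal((B, Ff)),
                     rng.standard_normal((Fg + Ft + Ff, Fh)) * 0.1,
                     np.zeros(Fh),
                     rng.standard_normal((Fh, 4)) * 0.1, np.zeros(4))
assert np.allclose(y_hat.sum(axis=-1), 1.0)         # rows are valid distributions

y_true = np.eye(4)[[0, 1, 2, 3, 0]]                 # one-hot labels for B = 5
loss = cross_entropy(y_true, y_hat)
assert loss > 0
```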

4. Experiments and Results

4.1. Dataset and Evaluation Metrics

This study employs a publicly available Parkinson’s gait dataset from PhysioNet. The dataset contains VGRF signals collected using wearable insoles equipped with 18 foot-pressure sensor channels, sampled at 100 Hz. VGRF signals capture the plantar loading patterns generated during each gait cycle. The 18 insole sensors record pressure waveforms corresponding to key gait events such as heel strike, mid-stance, and toe-off, providing both spatial and temporal information on foot–ground interaction. In Parkinson’s disease, VGRF signals often show reduced force peaks, prolonged stance phases, and increased variability across cycles. Thus, VGRF offers a physiologically meaningful representation of gait impairment and serves as an appropriate input modality for automated PD severity assessment. All participants walked on a flat surface at a natural self-selected pace for approximately two minutes, during which the system continuously recorded complete gait cycles.
Figure 2 illustrates recorded sequences of data output from diverse sensors of a healthy control participant and an individual with Parkinson’s disease.
A total of 166 participants were included in this study, consisting of 93 patients with PD and 73 Healthy Controls (HC). The PD group was further stratified into three categories according to the Hoehn & Yahr (H&Y) scale: mild (Stage 2), moderate (Stage 2.5), and moderately severe (Stage 3). The distribution of sex and disease severity among participants is summarized in Table 1. The HC group was age- and sex-matched with the PD group (p > 0.05).
In the preprocessing stage, the raw VGRF signals were segmented into time-series windows using a sliding-window strategy with a window length of 100 frames and a stride of 50 frames, where each sample contained time-series data from 18 sensor channels. To avoid data leakage caused by window-level splitting, all gait recordings from the same participant were first assigned to a specific fold, and the sliding-window segmentation was then performed within each fold. Consequently, all windows derived from the same subject were kept strictly within the same fold and never appeared across both the training and validation sets.
Subsequently, the following processing steps were applied:
(1) Normalization: Each sensor channel was standardized using z-score normalization based on its mean and standard deviation to mitigate inter-subject variability.
(2) Position encoding: A normalized position encoding was added along the temporal dimension to preserve sequential gait timing information.
(3) Class balancing: During training, the SMOTE algorithm was employed to oversample minority classes and alleviate class imbalance.
(4) Label mapping and discretization: According to the Hoehn & Yahr scale, severity levels {0, 2, 2.5, 3} were mapped to discrete categories {0, 1, 2, 3}, and a one-hot representation was used as the model output.
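The sliding-window segmentation together with steps (1) and (4) can be sketched as follows; SMOTE and the position encoding are omitted for brevity, and the random recording is a stand-in for a real VGRF sequence:

```python
import numpy as np

def make_windows(signal, win=100, stride=50):
    """Segment a (T, 18) recording into (n_windows, win, 18) windows."""
    starts = range(0, signal.shape[0] - win + 1, stride)
    return np.stack([signal[s:s + win] for s in starts])

def zscore(x, eps=1e-8):
    """Per-channel z-score normalization over all windows and time steps."""
    return (x - x.mean(axis=(0, 1))) / (x.std(axis=(0, 1)) + eps)

LABEL_MAP = {0: 0, 2: 1, 2.5: 2, 3: 3}   # H&Y stage -> discrete class index

rec = np.random.default_rng(5).standard_normal((1000, 18))  # 10 s at 100 Hz
windows = zscore(make_windows(rec))
assert windows.shape == (19, 100, 18)    # (1000 - 100)//50 + 1 = 19 windows
assert LABEL_MAP[2.5] == 2
```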
We employed the 10-fold cross-validation strategy for model training and evaluation. In this approach, the samples from the Parkinson’s Disease (PD) and healthy control (HC) groups were divided into 10 mutually exclusive subsets, ensuring that the class proportions remained consistent across folds (93 PD subjects and 73 HC subjects). During each iteration, one fold was set aside as the validation set, while the remaining nine folds were combined to form the training set. This process was repeated until each fold had been used once for validation. To maintain fairness and enhance generalization, SMOTE (Synthetic Minority Over-sampling Technique) oversampling and normalization were applied independently within each fold.
To comprehensively evaluate model performance, this study adopted four commonly used classification metrics: Accuracy, Precision, Recall, and F1-score. Let TP, TN, FP, and FN denote the numbers of true positive, true negative, false positive, and false negative samples, respectively. The metrics are defined as follows:
Precision: Pr = TP / (TP + FP)
Recall: Re = TP / (TP + FN)
F1-score: F1 = 2 × Pr × Re / (Pr + Re)
Accuracy: Acc = (TP + TN) / (TP + TN + FP + FN) × 100%
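These four metrics can be computed directly from the confusion counts; the counts below are made-up numbers for illustration:

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Precision, recall, F1, and accuracy from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# Example with arbitrary counts: 90 TP, 80 TN, 10 FP, 20 FN.
p, r, f1, acc = metrics_from_counts(tp=90, tn=80, fp=10, fn=20)
assert abs(p - 0.90) < 1e-9          # 90 / 100
assert abs(r - 90 / 110) < 1e-9      # 90 / 110
assert abs(acc - 0.85) < 1e-9        # 170 / 200
```

For the four-class task, the same formulas are applied per class (one-vs-rest counts) and averaged; the averaging convention is not stated in the text.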

4.2. Experimental Environment and Implementation Framework

The experiments were conducted on a computing platform equipped with an NVIDIA GeForce RTX 4090 GPU (manufactured by NVIDIA Corporation, Santa Clara, CA, USA), using Python 3.9. All deep learning models were implemented using the TensorFlow 2.x/Keras framework, with dynamic GPU memory allocation enabled to prevent out-of-memory issues. A fixed random seed (SEED = 42) was applied throughout the training process to ensure reproducibility of results. The main network architecture and training hyperparameters are summarized in Table 2.
During training, an early stopping mechanism was employed to monitor the validation loss, terminating the process when no significant improvement (Δloss < 0.01) was observed for 10 consecutive epochs. Meanwhile, ModelCheckpoint was used to automatically save the model weights achieving the best validation accuracy, and CSVLogger recorded training and validation metrics at each epoch. Additionally, a “Reduce on Plateau” strategy was applied in the later training phase to adaptively decrease the learning rate, thereby enhancing convergence stability and improving final performance.

4.3. Results and Analysis

4.3.1. Comparison with Baseline Methods

To verify the effectiveness of the proposed model, five representative baseline models were selected for comparison:
(1) Conventional deep learning models: CNN1D, LSTM, and CNN-LSTM;
(2) Advanced temporal models: Attention-LSTM and Temporal Convolutional Network (TCN).
All models were trained and evaluated under the same data partitioning and preprocessing conditions using 10-fold cross-validation to ensure fairness.
Table 3 and Figure 3 summarize the 10-fold cross-validation results of the proposed model and five representative baseline methods (CNN1D, LSTM, CNN-LSTM, Attention-LSTM, and TCN) on the PD gait severity classification task. As shown, under the same data partitioning and training settings, the proposed model achieves the best performance across all four evaluation metrics—Accuracy, Precision, Recall, and F1-score. In particular, the model attains an Accuracy of 0.930 and an F1-score of 0.925, significantly outperforming the other deep learning baselines.
Compared with the best-performing baseline, Attention-LSTM, the proposed model achieves improvements of 4.1% in Accuracy and 13.4% in F1-score. This indicates that the model not only improves overall recognition accuracy but also distinguishes gait patterns more effectively across different PD severity levels, demonstrating better stability and generalization.
Figure 4 presents the distribution and mean trends of F1-scores across 10-fold cross-validation for each model, providing insight into performance stability and robustness. It can be observed that although CNN1D, LSTM, CNN-LSTM, and Attention-LSTM achieve comparable average performance, their score ranges are relatively wide with noticeable fluctuation across folds, indicating sensitivity to data partitioning. The TCN model exhibits even greater dispersion, suggesting unstable generalization across different gait samples. The proposed model achieves the highest mean F1-score and produces compact box plots with few outliers, demonstrating consistent performance across folds, strong generalization, and superior robustness during training.
Figure 5 illustrates the 10-fold-averaged confusion matrices for all models, providing a fine-grained comparison of classification performance. The proposed model achieves the highest accuracy across all four classes, with particularly notable advantages in distinguishing moderate and advanced PD severity levels. Most misclassifications occur between adjacent severity categories, which aligns with the continuous and progressive nature of PD gait deterioration in clinical practice. These results indicate that the proposed method more effectively captures gait dynamics and underlying pathological patterns, demonstrating stronger clinical discriminative capability and practical application potential.

4.3.2. Ablation Study

To assess the contribution of each core component of the proposed framework, we conducted ablation experiments by removing the BGS-GAT, AMS-Inception1D, and TF-Branch modules in turn. Performance was then compared against the full model. The results are presented in Table 4 and Figure 6.
From Table 4 and Figure 6, it is evident that removing any individual module results in a decline in performance, confirming that each component contributes meaningfully to the overall architecture. Specifically, eliminating the BGS-GAT module results in the largest performance drop, with the F1-score decreasing from 0.925 to 0.892. This highlights the critical role of biomechanics-guided graph modeling in capturing plantar force transfer patterns and spatial structure abnormalities in PD gait. When the AMS-Inception1D module is removed, the model’s ability to extract multi-scale temporal features is weakened, reducing the F1-score to 0.903, validating its importance for modeling complex temporal dynamics and local gait transitions. In contrast, removing the TF-Branch causes a smaller decline (F1-score reduced to 0.921), indicating that time–frequency fusion serves primarily as an enhancement mechanism, further strengthening the model’s ability to capture subtle frequency-domain abnormalities associated with rhythmic disturbances in PD gait.
Additionally, Figure 7 compares the training and validation curves of different ablation models. As shown, the full model converges the fastest and exhibits the most stable trend, with minimal fluctuations in validation loss, demonstrating superior training stability and generalization capability. In contrast, removal of the BGS-GAT and AMS-Inception1D modules results in more pronounced oscillations in validation accuracy and loss, indicating their pivotal roles in model convergence and effective feature learning. Although excluding the TF-Branch maintains relatively stable training behavior, its overall performance remains inferior to the full model, further confirming the auxiliary value of frequency-domain information in enhancing model robustness and classification precision.
Figure 8 further illustrates the contributions of each module to classification outcomes. The full model achieves the highest correct recognition rate across all four categories, with particularly notable performance in distinguishing moderate and more advanced PD cases (classes 2 and 3). The few misclassifications observed are primarily concentrated between adjacent severity levels, aligning with the progressive and continuous nature of PD gait impairment. When the BGS-GAT module is removed, the recognition performance for moderate-severity subjects degrades most significantly, underscoring the essential role of biomechanics-guided spatial modeling in capturing subtle pathological differences. Eliminating the AMS-Inception1D module weakens the ability to model multi-scale temporal dynamics, leading to blurred class boundaries and reduced discrimination capability. Removing the TF-Branch results in a slight increase in misclassification, suggesting that frequency-domain cues provide complementary support in improving classification stability and precision.
Combining the observations from Figure 7 and Figure 8, it is evident that all three core modules contribute meaningfully to training convergence, classification accuracy, and model stability. Among them, BGS-GAT and AMS-Inception1D serve as the primary performance drivers, substantially enhancing spatial dependency learning and multi-scale temporal representation capability. Meanwhile, the TF-Branch provides complementary frequency-domain information, further improving robustness and discriminative power in complex gait representation tasks. Together, these modules enable the proposed model to achieve superior feature expressiveness and resilience in assessing Parkinsonian gait severity.

5. Conclusions

This paper presents a multi-sensor deep learning framework that integrates biomechanical priors with time–frequency collaborative modeling for the automatic assessment of gait severity in PD. The proposed architecture addresses the limitations of existing methods in spatial dependency modeling, multi-scale temporal feature extraction, and frequency-domain information utilization through three core modules—BGS-GAT, AMS-Inception1D, and TF-Branch—enabling comprehensive characterization of gait signals across spatial, temporal, and frequency domains. Experiments on a public plantar pressure dataset demonstrate that the proposed model achieves an accuracy of 0.930 and an F1-score of 0.925 in the four-class severity classification task, significantly outperforming state-of-the-art deep learning models and exhibiting strong stability and generalization capability across different data splits. Ablation studies further confirm the effectiveness and necessity of each module in enhancing overall performance.
Future work will focus on integrating multi-modal sensing data (e.g., IMU and electromyography signals) to build a more comprehensive assessment system, as well as exploring lightweight deployment and personalized adaptation strategies. The dilation rate and kernel sizes in AMS-Inception1D were selected based on gait characteristics rather than exhaustive tuning, and further temporal-scale exploration is left for future work.

Author Contributions

Conceptualization, W.L.; Methodology, W.L.; Software, W.L.; Validation, W.L.; Writing—original draft, T.Z. and Q.Y.; Project administration, W.L.; Funding acquisition, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key Project of the Wuxi Health Commission (Z202321).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Overall architecture of the proposed multi-sensor gait network with biomechanical priors and time–frequency collaborative modeling.
Figure 2. VGRF patterns obtained from the left (a) and right (b) feet of a control subject and a PD patient.
Figure 3. Comparison of average accuracy, precision, recall, and F1-score across models under 10-fold cross-validation.
Figure 4. Distribution and mean trends of F1-scores for different models.
Figure 5. Ten-fold averaged confusion matrices of different models. 0: Healthy, 1: Mild (Severity 2), 2: Moderate (Severity 2.5), 3: Moderate (Severity 3).
Figure 6. Radar chart comparison of accuracy, precision, recall, and F1-score across ablation models.
Figure 7. Comparison of training and validation accuracy/loss curves for ablation models.
Figure 8. Averaged confusion matrices of ablation models across four gait severity classes. 0: Healthy, 1: Mild (Severity 2), 2: Moderate (Severity 2.5), 3: Moderate (Severity 3).
Table 1. Demographic characteristics and disease stage distribution of participants.

| Groups  | Subjects | Male | Female | Healthy | Mild (Severity 2) | Moderate (Severity 2.5) | Moderate (Severity 3) |
|---------|----------|------|--------|---------|-------------------|-------------------------|-----------------------|
| PD      | 93       | 58   | 35     | -       | 56                | 27                      | 10                    |
| Control | 73       | 40   | 33     | 73      | -                 | -                       | -                     |
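The class imbalance visible in Table 1 (73 healthy controls versus only 10 subjects at severity 3) is what the class-weight adjustment listed in the training setup compensates for. A minimal sketch of inverse-frequency ("balanced") weights computed from these subject counts; the weighting rule is the standard one, not a formula stated in the paper:

```python
import numpy as np

# Subject counts per severity class, taken from Table 1
counts = {"Healthy": 73, "Mild (2)": 56, "Moderate (2.5)": 27, "Moderate (3)": 10}

def inverse_frequency_weights(counts):
    """Balanced class weights w_c = N / (K * n_c), the same rule
    scikit-learn applies for class_weight='balanced'."""
    n = np.array(list(counts.values()), dtype=float)
    w = n.sum() / (len(n) * n)          # N / (K * n_c) for each class c
    return dict(zip(counts.keys(), w))

weights = inverse_frequency_weights(counts)
print(weights)
```

The rarest class receives the largest weight: with N = 166 subjects and K = 4 classes, severity 3 gets 166 / (4 × 10) = 4.15, while the healthy class gets only 166 / (4 × 73) ≈ 0.57.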
Table 2. Main model architecture and training hyperparameters.

| Module              | Key Configuration |
|---------------------|-------------------|
| AMS-Inception1D     | Four convolutional branches (kernel = 1, 3, 3 [dilation = 2], MaxPool + 1 × 1); filters = 32; SE-Gate fusion |
| BGS-GAT             | Two-layer GAT; output dim = 64; adjacency matrix based on 18-sensor topology |
| TF-Branch           | RFFT (64 frequency bins); Conv1D (32, 64) + BatchNorm + MaxPooling |
| Classifier          | Dense (100 → 20 → 4); activation = SELU; output = Softmax |
| Optimizer           | Nadam (learning rate = 1 × 10^−4) |
| Loss function       | Categorical cross-entropy |
| Batch size / Epochs | 64 / 30 |
| Regularization      | Dropout (0.1–0.2) + BN |
| Model saving        | EarlyStopping (patience = 10) + ModelCheckpoint |
| Data balancing      | SMOTE oversampling + class weight adjustment |
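The RFFT front end of the TF-Branch in Table 2 can be illustrated with a short sketch: a real FFT over each gait window yields 64 frequency bins per sensor channel, which the frequency-domain Conv1D then consumes. The 126-sample window length below is an illustrative assumption (126 // 2 + 1 = 64 bins), not a value taken from the table:

```python
import numpy as np

def tf_branch_input(window: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Magnitude spectrum of a multi-channel gait window via the real FFT.

    Keeps the first `n_bins` frequency bins per channel, mirroring the
    64-bin RFFT stage of the TF-Branch described in Table 2.
    """
    spectrum = np.fft.rfft(window, axis=0)        # shape: (T // 2 + 1, channels)
    return np.abs(spectrum)[:n_bins].astype(np.float32)

# Illustrative window: 126 samples from the 18 VGRF sensors, so the
# real FFT yields exactly 126 // 2 + 1 = 64 frequency bins.
rng = np.random.default_rng(0)
window = rng.standard_normal((126, 18))
feats = tf_branch_input(window)
print(feats.shape)  # (64, 18)
```

Taking the magnitude discards phase, which is a common simplification when only rhythmic energy per band is of interest; the paper's exact frequency-domain representation may differ.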
Table 3. Performance comparison of different models on PD gait severity classification.

| Model          | Accuracy      | Precision     | Recall        | F1-Score      | Parameter Scale |
|----------------|---------------|---------------|---------------|---------------|-----------------|
| CNN1D          | 0.865 ± 0.087 | 0.779 ± 0.150 | 0.787 ± 0.155 | 0.767 ± 0.159 | ~10^5           |
| LSTM           | 0.885 ± 0.053 | 0.809 ± 0.142 | 0.811 ± 0.141 | 0.799 ± 0.146 | ~10^5–10^6      |
| CNN-LSTM       | 0.875 ± 0.071 | 0.797 ± 0.157 | 0.799 ± 0.151 | 0.783 ± 0.153 | ~10^5–10^6      |
| Attention-LSTM | 0.889 ± 0.064 | 0.801 ± 0.160 | 0.809 ± 0.147 | 0.791 ± 0.154 | ~10^5–10^6      |
| TCN            | 0.832 ± 0.087 | 0.767 ± 0.144 | 0.775 ± 0.152 | 0.754 ± 0.159 | ~10^5           |
| Our model      | 0.930 ± 0.067 | 0.938 ± 0.069 | 0.930 ± 0.067 | 0.925 ± 0.072 | ~10^6           |
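The mean ± standard deviation entries in Table 3 are fold-level statistics from 10-fold cross-validation. A minimal sketch of how such entries can be aggregated; the function name and the two-fold toy data are illustrative, not taken from the paper:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def summarize_folds(fold_true, fold_pred):
    """Mean and std of accuracy and macro-F1 across cross-validation folds."""
    accs = [accuracy_score(t, p) for t, p in zip(fold_true, fold_pred)]
    f1s = [f1_score(t, p, average="macro") for t, p in zip(fold_true, fold_pred)]
    return {
        "accuracy": (float(np.mean(accs)), float(np.std(accs))),
        "f1": (float(np.mean(f1s)), float(np.std(f1s))),
    }

# Toy example with two folds over the four severity classes 0-3
true = [[0, 1, 2, 3], [0, 1, 2, 3]]
pred = [[0, 1, 2, 3], [0, 1, 2, 0]]
print(summarize_folds(true, pred))
```

Macro averaging weights the four severity classes equally, which matters here because the severity-3 class is much smaller than the others (Table 1).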
Table 4. Comparison of ablation experiment results.

| Model              | Accuracy      | Precision     | Recall        | F1-Score      | Parameter Scale |
|--------------------|---------------|---------------|---------------|---------------|-----------------|
| w/o BGS-GAT        | 0.900 ± 0.110 | 0.908 ± 0.115 | 0.900 ± 0.110 | 0.892 ± 0.122 | ~10^6           |
| w/o AMS-Inception  | 0.910 ± 0.074 | 0.919 ± 0.076 | 0.910 ± 0.074 | 0.903 ± 0.081 | ~10^6           |
| w/o TF-Branch      | 0.930 ± 0.068 | 0.929 ± 0.081 | 0.930 ± 0.068 | 0.921 ± 0.085 | ~10^6           |
| Full model         | 0.930 ± 0.067 | 0.938 ± 0.069 | 0.930 ± 0.067 | 0.925 ± 0.072 | ~10^6           |

Share and Cite

MDPI and ACS Style

Lin, W.; Zhou, T.; Yang, Q. A Biomechanics-Guided and Time–Frequency Collaborative Deep Learning Framework for Parkinsonian Gait Severity Assessment. Mathematics 2026, 14, 89. https://doi.org/10.3390/math14010089
