Integrated Fractal Dimensions and Imbalance–Deviation Features for Smart-Insole Walking Gait Analysis: Application to Parkinson’s Disease Detection

Li, Hao; Ma, Jun; Cao, Boqiang; Ren, Xunhuan; Chen, Yiming; Guo, Qicheng; Li, Bohan; Baryskievic, Illa; Baryskievic, Anatoliy; Tsviatkou, Viktar

doi:10.3390/fractalfract10050297


by Hao Li ¹, Jun Ma ¹,*, Boqiang Cao ², Xunhuan Ren ¹,³, Yiming Chen ¹, Qicheng Guo ¹, Bohan Li ¹, Illa Baryskievic ¹, Anatoliy Baryskievic ¹ and Viktar Tsviatkou ¹

¹ Department of Infocommunication Technologies, Belarusian State University of Informatics and Radioelectronics, 220013 Minsk, Belarus

² Institute of Computer Technology and Artificial Intelligence, Kyrgyz National University named after Jusup Balasagyn, Bishkek 720033, Kyrgyzstan

³ Faculty of Information Engineering, Xinjiang Institute of Technology, Aksu 843100, China

* Author to whom correspondence should be addressed.

Fractal Fract. 2026, 10(5), 297; https://doi.org/10.3390/fractalfract10050297

Submission received: 17 March 2026 / Revised: 17 April 2026 / Accepted: 23 April 2026 / Published: 28 April 2026

(This article belongs to the Special Issue Fractional and Fractal Methods in Biomedical Imaging and Time Series Learning)


Abstract

Gait impairment is a common motor manifestation of Parkinson’s disease (PD), which is also frequently accompanied by other motor abnormalities such as bradykinesia, rigidity, postural instability, and movement asymmetry. These motor impairments are closely associated with reduced mobility and increased fall risk. Although wearable plantar insole sensing provides a promising basis for objective gait assessment, existing studies have mainly focused on conventional time- or frequency-domain descriptors, whereas the nonlinear complexity of gait, laterality-related imbalance, and deviation from normal gait patterns have not been characterized in an integrated manner. To address this gap, this paper proposes FID-Gait, a three-domain fusion framework for PD identification using instrumented insole data. The framework combines automated gait-cycle segmentation with multidomain feature modeling, including a fractal domain for nonlinear gait complexity, a plantar-loading–phase imbalance (PLPI) domain for loading asymmetry and temporal disturbance, and a covariance-adjusted deviation (CAD) domain for measuring deviation from normal gait patterns. Experiments on the PhysioNet Gait in Parkinson’s Disease dataset showed that FID-Gait achieved strong discriminative performance under multiple evaluation protocols. At the gait-cycle level, the selected MLP classifier achieved an accuracy of 99.11% and an F1-score of 99.47%. At the subject level, the selected AdaBoost classifier achieved the highest accuracy (90.22%), and the best F1-score reached 93.02%. Five-fold cross-validation further supported the robustness of the proposed representation, and leave-one-subject-out evaluation provided preliminary evidence of subject-independent generalization. Overall, FID-Gait provides an effective and interpretable framework for PD gait characterization and identification in offline experimental settings.

Keywords:

Parkinson’s disease; smart insoles; plantar insole sensor gait analysis; fractal-dimension features; plantar-loading–phase imbalance; covariance-adjusted deviation; spatial box-counting fractal dimension; gait classification

1. Introduction

Neurodegenerative disorders (NDDs) have become increasingly prevalent with population aging and longer life expectancy. Many NDDs are associated with progressive impairments in motor control, and gait and balance disturbances are among the most functionally relevant manifestations because they directly affect mobility, independence, and quality of life [1,2,3,4]. In current clinical practice, mobility assessment still relies heavily on observational examination and clinical rating scales administered by clinicians or physiotherapists. Although such assessments are clinically valuable, they are inherently subjective and can be influenced by inter- and intra-rater variability, for example due to differences in clinical training, observational experience, and scoring habits across raters [5,6]. In addition, many clinical scales rely on discrete ordinal scores, which may limit sensitivity to subtle motor changes. These limitations motivate the development of objective, repeatable, and scalable methods for gait assessment in both clinical and daily-life settings [7].

Parkinson’s disease (PD) is one of the most representative NDDs in which motor impairment and gait dysfunction play central roles. In addition to gait instability and fall risk, PD is frequently accompanied by other motor abnormalities such as bradykinesia, rigidity, postural instability, and movement asymmetry [8]. In some patients, PD is also associated with a forward-flexed trunk related to muscle rigidity, which further alters posture and locomotor behavior [9]. Common PD-related gait abnormalities include freezing of gait (FOG), reduced step length, increased stride-to-stride variability, as well as side-related gait asymmetry, all of which are closely associated with symptom severity and disease progression. In PD, laterality-related abnormalities are often reflected by asymmetric plantar loading and unequal temporal organization between the left and right limbs rather than the fixed unilateral compensatory patterns more typical of hemiplegic or foot-drop gait. PD may also present with postural abnormalities and tremor, further distinguishing its locomotor profile from other pathological gait patterns. In this context, contextualizing PD within the broader field of pathological gait recognition is important for avoiding confusion between PD-specific abnormalities and other pathological gait phenotypes [10]. With the rapid development of wearable sensing technologies, gait can now be quantified more objectively outside conventional laboratory settings. In particular, plantar insole sensors provide direct measurements of foot–ground interaction and preserve rich temporal and loading information during walking, making them a promising modality for PD-related gait assessment and monitoring [11].

In recent years, artificial intelligence (AI), especially machine learning (ML) and deep learning (DL), has been increasingly applied to medical data analysis because these methods can learn discriminative patterns from multidimensional physiological data [12,13]. In gait analysis, such methods provide useful tools for transforming sensor measurements into objective disease-related representations and classification models. Recent studies have shown the potential of deep neural networks for pathological gait recognition by combining heterogeneous gait-related representations, such as skeleton sequences, joint angles, and gait parameters [8], as well as for the optimized recognition of pathological locomotor patterns from inertial data [10]. In addition, wearable biosensor systems embedded in body-worn devices have also been successfully used together with deep learning for activity recognition, further supporting the feasibility of intelligent movement analysis beyond conventional laboratory settings [14]. Nevertheless, interpretability remains essential for clinically meaningful wearable gait assessment.

Beyond conventional descriptive statistics, gait is increasingly understood as a nonlinear and multiscale dynamical process. In mathematics, fractals are commonly associated with exact or statistical self-similarity across scales, and their structural complexity is often characterized by the fractal dimension, which is a scale-dependent quantity that extends the conventional Euclidean notion of dimension and may take non-integer values. In this sense, the fractal dimension describes how structural detail changes with observation scale and provides a mathematical way to characterize irregular geometries beyond traditional integer-dimensional objects [15,16]. Fractal and scale-related analyses have therefore been introduced to capture aspects of gait organization that may not be adequately reflected by low-order temporal- or amplitude-based measures alone [17]. At the same time, clinically meaningful PD gait abnormalities are not limited to complexity changes: they also involve laterality-related manifestations, altered temporal organization, and subject-specific deviation from normal walking patterns [18,19].

However, several limitations remain in current plantar insole sensor signal-based PD gait analysis. First, many existing approaches still rely primarily on conventional time-domain, frequency-domain, or time–frequency descriptors [20,21,22], whereas nonlinear gait complexity may remain insufficiently characterized. Second, although PD gait often exhibits asymmetry, disturbed gait-phase organization, and deviation from normal locomotor patterns, these properties are not always represented in a unified and interpretable manner [23,24]. Third, subject-level screening and monitoring would benefit from a continuous abnormality measure that quantifies how far an individual deviates from a normal baseline while accounting for the feature correlation structure [19]. Therefore, there remains a need for an integrated representation that jointly models gait complexity, biomechanical imbalance, and deviation from normal gait patterns.

To address these issues, we propose FID-Gait, a three-domain fusion framework for PD identification using smart-insole data. The framework integrates a fractal domain for nonlinear gait complexity, a plantar-loading–phase imbalance (PLPI) domain for loading asymmetry and temporal disturbance, and a covariance-adjusted deviation (CAD) domain for quantifying deviation from a normal reference distribution. By combining these complementary domains, FID-Gait provides an interpretable and discriminative representation for PD gait characterization and identification.

The main contributions of this work are summarized as follows:

We propose an automated gait-cycle segmentation pipeline for plantar insole sensor signals.
We design three complementary feature domains, namely fractal, PLPI, and CAD, to characterize nonlinear gait complexity, biomechanical imbalance, and deviation from normal gait patterns.
We develop the FID-Gait framework, which achieves strong performance at both the gait-cycle and subject levels.

The remainder of this paper is organized as follows. Section 2 reviews related studies on PD gait analysis and plantar insole sensor signal-based intelligent assessment. Section 3 introduces the dataset, signal preprocessing procedure, gait segmentation method, and multidomain feature construction. Section 4 presents the experimental settings and classification results, including ablation analysis and robustness evaluation. Section 5 discusses the main findings, practical implications, and study limitations. Finally, Section 6 concludes the paper.

2. Related Work

2.1. Wearable Gait Analysis for Parkinson’s Disease and Related Movement Disorders

Wearable sensing has enabled objective gait assessment for Parkinson’s disease and related movement disorders outside highly controlled laboratory environments. Among the commonly used wearable modalities, inertial measurement units (IMUs) have been widely studied for ambulatory gait monitoring and symptom assessment. Representative studies have explored IMU-based time–frequency features and machine learning for home-based PD detection [25], multimodal sensor fusion for early symptom recognition [26], and long short-term memory (LSTM) models for motion reconstruction and fall-risk-related analysis [27]. These studies support the feasibility of real-world gait monitoring, but the measured signals are indirect and may provide limited information about detailed foot–ground interaction. In parallel, vision-based and pose-estimation-based gait analysis has advanced as a complementary approach for extracting spatiotemporal and kinematic gait information without body-fixed instrumentation [7]. This line of work is relevant here because vision-based and pose-estimation-based systems provide a low-cost and potentially portable alternative for gait assessment with the growing applicability to point-of-care, home-based, and remote monitoring scenarios [28,29]. However, their performance may still be affected by viewpoint variation, illumination conditions, and body-part occlusion, which can reduce the accuracy and stability of estimated gait variables [28,30,31].

Compared with IMUs, plantar-pressure insoles directly capture foot loading patterns during gait and therefore offer a particularly relevant sensing modality for lower-limb motor assessment. Prior studies have used plantar-pressure data for PD-related tasks including disease progression assessment, Normal/PD classification, and the estimation of clinical scores. Existing approaches include signal and center-of-pressure (CoP)-based progression analysis [32], wavelet features combined with fuzzy neural networks [33], linear discriminant analysis using gait and tremor cues [21], deep architectures for UPDRS-related regression [34], and wavelet-based support vector machine (SVM) classification [33]. In addition to PD-oriented applications, plantar-pressure sensing has also shown value in related mobility-assessment scenarios such as elderly fall-risk evaluation, where pressure distribution and CoP-related dynamics provide informative predictors [35,36]. These studies demonstrate the practical utility of plantar-pressure sensing, but they also indicate that feature representation remains a central factor affecting robustness, interpretability, and generalization. A comparison of representative plantar-insole-based studies for PD gait analysis is summarized in Table 1.

2.2. Feature Representation for Plantar Insole Sensor Signal-Based PD Gait Analysis

Despite the increasing availability of wearable plantar insole sensor signal data, many PD gait studies still rely primarily on conventional time-domain, frequency-domain, or time–frequency descriptors. Existing reviews and representative pipelines show that these feature families have been expanded in multiple directions, including frequency-only analysis [20], time–frequency features with linear discriminant analysis (LDA) [21], large pools of handcrafted time-domain variables [37], image-based transformation of time-series signals for neural networks [38], and mixed time–frequency feature sets for support vector machine classification [22,35,39]. Although such approaches have achieved useful performance, they mainly describe local amplitudes, periodic content, or statistical summaries, and they may not fully capture the nonlinear, asymmetric, and heterogeneous nature of PD gait.

One research direction that addresses this limitation focuses on gait complexity. Fractal and scale-related analyses have been used to characterize gait organization beyond low-order statistics, and PD-related studies have linked fractal-like properties to stride length regulation and gait variability [40]. However, fractal-dimension estimation can be sensitive to time-series length and parameter settings, which becomes particularly important when features are extracted from short step-level windows [41]. This motivates the careful use of fractal descriptors in plantar insole sensor signal gait analysis rather than treating them as drop-in replacements for conventional statistics.

Another relevant direction concerns gait asymmetry and imbalance. PD is well known to exhibit clinically meaningful laterality effects, and gait asymmetry has been associated with disease characteristics and motor impairment. At the same time, symptom laterality does not always align perfectly with gait or turning asymmetry, indicating substantial inter-subject heterogeneity. For this reason, imbalance-related descriptors are most valuable when they remain interpretable at the individual level and can support cross-subject comparison or longitudinal follow-up [18]. More broadly, network-based representations have also been explored in other human movement-analysis settings to characterize coordination structure under different task conditions, suggesting that higher-order relational descriptors may complement conventional handcrafted features [42].

Overall, prior studies suggest that nonlinear complexity, imbalance/asymmetry, and deviation from a Normal reference may provide complementary information for PD gait characterization. In this context, deviation from a Normal reference denotes a measurable departure from healthy-control gait, which is reflected in differences in central tendency, stride-to-stride variability, or group-level separability of clinically relevant gait metrics [43,44,45]. However, these aspects are often investigated separately, and an interpretable representation of plantar insole sensor signals that unifies them within a single subject-level framework remains insufficiently explored. This gap motivates the three-domain design adopted in the proposed FID-Gait framework.

3. Materials and Methods

3.1. PD Dataset Description

The proposed method was evaluated using the publicly available Gait in Parkinson’s Disease dataset from PhysioNet [46]. The dataset contains gait recordings from 93 subjects with idiopathic Parkinson’s disease (PD) and 73 Normal control subjects, yielding 166 subjects in total, and the data were collected from three independent studies, namely Ga, Ju, and Si [47,48,49]. For the whole cohort, the subject-level sex composition was Male/Female = 58/35 for the PD group and Male/Female = 40/33 for the Normal control group.

During data acquisition, subjects walked on level ground at their usual self-selected pace for approximately 2 min. Plantar-force measurements were collected using instrumented insoles with 8 force sensors under each foot, resulting in 16 sensor channels in total. The signals were sampled at 100 Hz, corresponding to a sampling interval of 10 ms. In addition to the 16 individual sensor channels, the released records also provide two composite channels, corresponding to the summed force under the left foot and right foot, respectively.

Table 2 summarizes the subject-level and record-level composition used in this paper.

Because this research was conducted using a public dataset, the data-acquisition protocol was predetermined and did not include repeated walking trials separated by predefined rest intervals. It should be noted that gait measurements may be influenced by testing procedures and protocol design, and recent studies have emphasized the importance of protocol standardization in Parkinson’s disease gait research. In addition, repeated trials with appropriate inter-trial rest may facilitate a more thorough evaluation of measurement stability and reduce potential procedural interference. Therefore, in this paper, inter-trial consistency and its related effects could not be specifically assessed [50,51,52,53].

3.2. Statistical Analysis

To compare the PD and Normal groups, a unified statistical analysis workflow was adopted. Unless otherwise specified, all tests were two-sided with statistical significance set at $p < 0.05$.

After preprocessing, single-feature distributions were compared between the two groups using appropriate parametric or nonparametric tests according to the data characteristics. For significant differences, effect sizes were additionally reported with Cohen’s d used for mean-based comparisons.

To control the family-wise error rate in multiple feature comparisons, the Holm procedure was applied.
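The Holm step-down adjustment can be sketched in a few lines. This is a minimal illustration rather than the authors' code; the function name `holm_adjust` and the raw p-values are hypothetical.

```python
import numpy as np

def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices of ascending raw p-values
    scaled = (m - np.arange(m)) * p[order]       # k-th smallest p is multiplied by (m - k)
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)  # enforce monotonicity, cap at 1
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted            # map back to the input order
    return adjusted

raw_p = [0.010, 0.040, 0.030, 0.005]             # hypothetical raw p-values for four features
adj_p = holm_adjust(raw_p)
significant = adj_p < 0.05                       # decisions after family-wise error control
```

Reporting `raw_p` next to `adj_p` keeps raw and adjusted p-values clearly separated.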

For subject-level distribution analyses, statistical inference was based on subject-level summary values. Normality was first assessed using the Shapiro–Wilk test; when the assumption of normality was not satisfied, between-group differences were tested using the Wilcoxon rank-sum test, and rank-biserial correlation was reported as the effect size. For analyses involving multiple comparisons (e.g., plantar-region screening and regional fractal-dimension screening), the Holm correction was applied to control the family-wise error rate, and raw p-values were explicitly distinguished from adjusted p-values in the main text. For single prespecified comparisons (e.g., the CAD group comparison), no additional post hoc multiple-comparison correction was applied.
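The normality-gated comparison described above might look as follows with SciPy. This is a sketch under stated assumptions: the Wilcoxon rank-sum test is run through the equivalent Mann-Whitney U implementation, Welch's t-test stands in for the parametric test (which the text does not name), and all function names and sample values are hypothetical.

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_var = ((a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)) / (a.size + b.size - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def rank_biserial(a, b):
    """Rank-biserial correlation from the Mann-Whitney U statistic.
    The sign follows SciPy's convention for which sample U refers to."""
    u = stats.mannwhitneyu(a, b, alternative="two-sided").statistic
    return 2.0 * u / (len(a) * len(b)) - 1.0

def compare_groups(pd_values, normal_values, alpha=0.05):
    """Shapiro-Wilk gate, then a parametric or rank-based two-sided test."""
    both_normal = all(stats.shapiro(v).pvalue > alpha for v in (pd_values, normal_values))
    if both_normal:
        # Assumption: Welch's t-test is used here for the unspecified parametric test.
        p = stats.ttest_ind(pd_values, normal_values, equal_var=False).pvalue
        return {"test": "welch_t", "p_value": p, "effect_size": cohens_d(pd_values, normal_values)}
    p = stats.mannwhitneyu(pd_values, normal_values, alternative="two-sided").pvalue
    return {"test": "rank_sum", "p_value": p, "effect_size": rank_biserial(pd_values, normal_values)}

# Hypothetical subject-level summary values of one feature.
pd_demo = [1.42, 1.35, 1.51, 1.47, 1.39, 1.58]
normal_demo = [1.21, 1.18, 1.25, 1.30, 1.16, 1.23]
result = compare_groups(pd_demo, normal_demo)
```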

For feature-domain ablation analysis, classifier-level performance drops relative to the full feature set were compared across ablation settings using the Friedman test, which was followed by Wilcoxon signed-rank tests with Holm correction for post hoc pairwise comparisons when applicable.
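A minimal sketch of this ablation comparison, assuming SciPy is used, is shown below. The performance drops are hypothetical placeholders (one value per classifier), and the dictionary keys and the helper `holm_posthoc` are illustrative names, not taken from the paper.

```python
from itertools import combinations
import numpy as np
from scipy import stats

# Hypothetical accuracy drops relative to the full feature set (nine classifiers each).
drops = {
    "no_fractal": [0.050, 0.042, 0.061, 0.038, 0.055, 0.047, 0.066, 0.035, 0.058],
    "no_plpi":    [0.031, 0.020, 0.044, 0.012, 0.030, 0.033, 0.043, 0.015, 0.027],
    "no_cad":     [0.010, 0.015, 0.008, 0.022, 0.005, 0.013, 0.019, 0.003, 0.011],
}

def holm_posthoc(groups):
    """Pairwise Wilcoxon signed-rank tests with Holm-adjusted p-values."""
    pairs = list(combinations(groups, 2))
    raw = np.array([stats.wilcoxon(groups[a], groups[b]).pvalue for a, b in pairs])
    m = raw.size
    order = np.argsort(raw)
    adj_sorted = np.minimum(np.maximum.accumulate((m - np.arange(m)) * raw[order]), 1.0)
    adj = np.empty(m)
    adj[order] = adj_sorted
    return {pair: float(p) for pair, p in zip(pairs, adj)}

# Omnibus test across the three ablation settings (paired by classifier).
friedman = stats.friedmanchisquare(*drops.values())
# Post hoc comparisons are only interpreted when the omnibus test is significant.
posthoc = holm_posthoc(drops) if friedman.pvalue < 0.05 else {}
```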

3.3. Data Processing and Model Settings

To ensure reproducibility, the data preprocessing pipeline, gait-state segmentation procedure, feature-processing strategy, and classifier settings were standardized and are summarized here.

The plantar-force signals were converted into numerical time-series data and smoothed using a centered moving average with a window length of 5 samples (50 ms). Missing values introduced by centered smoothing, as well as non-numeric entries, were imputed by backward fill followed by forward fill. Each channel was normalized within each record by its maximum value. No additional band-pass filtering, resampling, or detrending was applied.
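The preprocessing steps above can be sketched with pandas as follows. The function name `preprocess_record` and the column name `L1` are hypothetical, and the sketch assumes each record is a table with one column per sensor channel.

```python
import numpy as np
import pandas as pd

def preprocess_record(df, window=5):
    """Smooth, fill, and max-normalize every channel of one record."""
    numeric = df.apply(pd.to_numeric, errors="coerce")              # non-numeric entries become NaN
    smoothed = numeric.rolling(window=window, center=True).mean()   # centered moving average; edges become NaN
    filled = smoothed.bfill().ffill()                               # backward fill first, then forward fill
    return filled / filled.max()                                    # per-channel, per-record max normalization

# Hypothetical single-channel record: a ramp sampled at 100 Hz.
record = pd.DataFrame({"L1": np.arange(10, dtype=float)})
clean = preprocess_record(record)
```

With a 5-sample window at 100 Hz, each output sample averages 50 ms of signal. A channel whose maximum is zero would yield NaN after the division, so such a channel would need separate handling.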

Gait-state segmentation was performed using the clustering-based adaptive threshold generation method described below. Briefly, each plantar-region signal was divided into two clusters by K-means clustering, with the number of clusters set to 2, the number of initializations set to 10, and the random seed set to 42. The adaptive threshold for each region was defined as the lower cluster centroid after convergence. Gait states were then identified based on these thresholds and the corresponding state-transition rules from which gait-cycle-related features were extracted.
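The threshold-generation step can be illustrated with scikit-learn's `KMeans`. The settings (2 clusters, 10 initializations, seed 42) follow the text, but the use of scikit-learn and the function name `adaptive_threshold` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def adaptive_threshold(zone_signal, seed=42):
    """Two-cluster K-means on one zonal signal; the lower centroid is the threshold."""
    x = np.asarray(zone_signal, dtype=float).reshape(-1, 1)   # KMeans expects a 2-D array
    km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(x)
    return float(km.cluster_centers_.min())

# Hypothetical normalized zonal signal with unloaded and loaded samples.
signal = np.array([0.0, 0.1, 0.0, 0.1, 0.9, 1.0, 0.9, 1.0])
threshold = adaptive_threshold(signal)
loaded = signal > threshold                                   # samples above the adaptive threshold
```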

No additional feature selection or dimensionality reduction was applied in the main experiments. Except for the ablation analysis, in which predefined feature groups were removed for comparison, all classifiers were trained on the full extracted feature set. Positive and negative infinity values were replaced with missing values and imputed using the median of each feature column. All numerical features were standardized before classification.
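The feature-cleaning steps can be sketched with NumPy. The text does not state on which subset the medians and scaling statistics are estimated; fitting them on the training portion only, as below, is an assumption, and the function names are hypothetical.

```python
import numpy as np

def fit_cleaner(X_train):
    """Estimate per-feature medians and scaling statistics on the training data."""
    X = np.asarray(X_train, dtype=float)
    X = np.where(np.isinf(X), np.nan, X)            # +/- infinity -> missing
    median = np.nanmedian(X, axis=0)
    filled = np.where(np.isnan(X), median, X)       # median imputation per column
    mean, std = filled.mean(axis=0), filled.std(axis=0)
    std[std == 0] = 1.0                             # guard against constant features
    return median, mean, std

def apply_cleaner(X, median, mean, std):
    """Apply the fitted imputation and standardization to any subset."""
    X = np.asarray(X, dtype=float)
    X = np.where(np.isinf(X), np.nan, X)
    filled = np.where(np.isnan(X), median, X)
    return (filled - mean) / std

# Hypothetical feature matrix with infinite and missing entries.
X_demo = np.array([[1.0, np.inf], [2.0, 4.0], [3.0, 6.0], [np.nan, -np.inf]])
Z_demo = apply_cleaner(X_demo, *fit_cleaner(X_demo))
```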

For model development, the dataset was first divided into a training portion (70%) and an independent test set (30%) using stratified sampling. The 70% training portion was then further split into actual training and validation subsets in a stratified 8:2 ratio. This hierarchical partitioning strategy is consistent with common machine-learning evaluation practice, in which an independent hold-out test set is preserved for final assessment on unseen data, whereas the training portion is further divided for model fitting, validation, model selection, and hyperparameter tuning. Similar evaluation designs have also been reported in gait-related studies, including explicit subject-level training/validation/testing splits, and the use of an internal validation subset derived from the training data for model selection and hyperparameter tuning [9,54,55,56]. In addition, the independent test set was not used for model fitting, hyperparameter tuning, or validation-stage model selection.
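The two-stage partition can be sketched with scikit-learn's `train_test_split`. The seed value and the function name `hierarchical_split` are assumptions; the resulting proportions are roughly 56% training, 14% validation, and 30% test.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def hierarchical_split(X, y, seed=42):
    """Stratified 70/30 hold-out split, then a stratified 8:2 split of the 70% portion."""
    X_dev, X_test, y_dev, y_test = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_dev, y_dev, test_size=0.20, stratify=y_dev, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# Hypothetical data: 100 samples, 2 features, imbalanced binary labels.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 60 + [0] * 40)
(train_X, train_y), (val_X, val_y), (test_X, test_y) = hierarchical_split(X_demo, y_demo)
```

The test subset is created first and never touched again, which mirrors the statement that it was excluded from fitting, tuning, and model selection.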

Nine classifiers were evaluated: Decision Tree, Logistic Regression, K-Nearest Neighbors, Random Forest, Gaussian Naive Bayes, Gradient Boosting, Multilayer Perceptron, Support Vector Machine, and AdaBoost. For classifiers with stochastic components, random_state = 42 was set when applicable. To address class imbalance, balanced class weights were used in Decision Tree, Logistic Regression, Random Forest, and Support Vector Machine. Logistic Regression and Multilayer Perceptron were trained with a maximum of 500 iterations. K-Nearest Neighbors used default settings (n_neighbors = 5, Minkowski distance, p = 2), and Random Forest used 100 trees. Gaussian Naive Bayes used the default var_smoothing = $10^{-9}$. The Multilayer Perceptron had one hidden layer with 100 neurons, ReLU activation, and the Adam optimizer. The Support Vector Machine used an RBF kernel with C = 1.0, γ = scale, and probability estimation enabled.
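The parameter names above match scikit-learn's estimator arguments, so the configuration can plausibly be written as below. That scikit-learn was the library used is an assumption; settings not mentioned in the text are left at library defaults.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              AdaBoostClassifier)
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

SEED = 42  # random_state used wherever the estimator has stochastic components

classifiers = {
    "Decision Tree": DecisionTreeClassifier(class_weight="balanced", random_state=SEED),
    "Logistic Regression": LogisticRegression(class_weight="balanced", max_iter=500,
                                              random_state=SEED),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2),
    "Random Forest": RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                            random_state=SEED),
    "Gaussian Naive Bayes": GaussianNB(var_smoothing=1e-9),
    "Gradient Boosting": GradientBoostingClassifier(random_state=SEED),
    "Multilayer Perceptron": MLPClassifier(hidden_layer_sizes=(100,), activation="relu",
                                           solver="adam", max_iter=500, random_state=SEED),
    "Support Vector Machine": SVC(kernel="rbf", C=1.0, gamma="scale", probability=True,
                                  class_weight="balanced", random_state=SEED),
    "AdaBoost": AdaBoostClassifier(random_state=SEED),
}
```

Each estimator would then be fitted on the standardized training features and compared on the validation subset before the final hold-out evaluation.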

3.4. A Bodily-Kinesthetic Control Integration (BKCI)-Based FID-Gait Architecture for Smart-Insole Gait Analysis

As shown in Figure 1, the proposed Bodily-Kinesthetic Control Integration (BKCI)-based architecture for smart-insole gait analysis consists of the Body System Module, the Multidomain Feature Generation Module (FID-Gait), and the Multilevel Integration Module. Within this architecture, BKCI provides a conceptual perspective for organizing gait analysis as a closed-loop process linking bodily regulation, plantar insole sensing, multidomain feature construction, and decision making. Accordingly, the Body System Module describes the bodily control basis of gait, the FID-Gait module organizes plantar insole sensor signal information into multidomain gait representations, and the Multilevel Integration Module connects sensing, feature generation, and classification into a unified analytical framework.

3.4.1. Body System Module and Bodily-Kinesthetic Control Integration

The Body System Module focuses on the central nervous system (CNS) and peripheral nervous system (PNS), which form a closed-loop regulatory system through motor output and sensory feedback, including proprioceptive and tactile inputs [57,58]. From a neuromechanical perspective, gait results from interactions among neural control, musculoskeletal dynamics, and sensory feedback rather than isolated limb movements [57,58,59]. Accordingly, plantar insole sensor signals acquired by smart insoles reflect both foot–ground interaction and peripheral manifestations of motor-control states.

Based on this rationale, this paper introduces Bodily-Kinesthetic Control Integration (BKCI), which is derived from Bodily-Kinesthetic Intelligence (BKI) [35,60]. Whereas BKI emphasizes bodily perception, motor control, and environmental interaction, BKCI extends this perspective into a closed-loop framework for smart-insole gait analysis by integrating neural regulation, sensory feedback, plantar-loading variation, and gait-pattern representation.

In this framework, plantar insole sensor signals and derived gait features are interpreted as structured representations of impaired bodily control rather than isolated measurements or purely statistical descriptors. The multidomain features therefore capture complementary aspects of gait abnormality, including altered dynamic regulation, impaired coordination and temporal organization, and deviations from normal gait patterns. Thus, the proposed FID-Gait framework functions as both a classification pipeline and an interpretable representation of PD-related gait abnormality from the perspective of bodily control.

3.4.2. Multidomain Feature Generation Based on FID-Gait

FID-Gait is a multidomain representation framework for plantar insole sensor signal data, consisting of three complementary domains:

(a): F: Fractal-Dimension Feature Generation (complexity domain).
Fractal-dimension features are extracted from step-level plantar insole sensor data sequences to characterize nonlinear complexity and signal roughness. This domain uses time-domain fractal-dimension estimators, including HFD, PFD, KFD, and BCFD, to describe irregular temporal dynamics.
(b): I: Plantar-Loading–Phase Imbalance Feature Generation (imbalance domain).
This domain includes two types of imbalance features: plantar insole sensor signal data imbalance features, which describe asymmetrical loading between the left and right feet and across plantar regions, and gait-phase ratio imbalance features, which capture abnormalities in phase organization and temporal structure. The latter are derived from the Gait State Segmentation Module, where adaptive thresholds are generated by clustering-based threshold generation and then used for gait-state set construction, providing a consistent basis for phase-ratio imbalance calculation.
(c): D: Deviation Feature Generation (deviation domain).
Using the normal-control distribution as a reference, this domain quantifies overall gait deviation with a covariance-aware metric. Specifically, the squared Mahalanobis distance is used as a continuous score of departure from the normal baseline.
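The deviation score in (c) can be sketched with NumPy as follows. The function names are hypothetical, and using a pseudo-inverse of the covariance matrix is an assumption made for numerical robustness; the text only specifies the squared Mahalanobis distance to the normal-control distribution.

```python
import numpy as np

def fit_reference(normal_features):
    """Mean vector and inverse covariance of the normal-control feature matrix."""
    X = np.asarray(normal_features, dtype=float)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)          # features are columns
    cov_inv = np.linalg.pinv(cov)          # assumption: pseudo-inverse tolerates near-singular covariance
    return mu, cov_inv

def cad_score(x, mu, cov_inv):
    """Squared Mahalanobis distance of each row of x from the reference."""
    d = np.atleast_2d(np.asarray(x, dtype=float)) - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

# Hypothetical two-feature reference set and two query points.
reference = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
mu, cov_inv = fit_reference(reference)
scores = cad_score([[1.0, 1.0], [3.0, 1.0]], mu, cov_inv)
```

Because the covariance enters the score, a deviation along a direction in which normal gait varies widely is penalized less than the same deviation along a tightly constrained direction.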

In summary, FID-Gait adopts a fused three-domain representation—complexity, imbalance, and deviation—to characterize PD-related gait differences while maintaining interpretability.
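For the complexity domain, two of the listed estimators can be sketched from their commonly used textbook definitions. The exact variants and parameter choices used in FID-Gait are not given in this section, so the Katz (KFD) and Petrosian (PFD) forms below are assumptions. The Katz form uses amplitude differences only, which is a frequent simplification for one-dimensional signals.

```python
import numpy as np

def katz_fd(x):
    """Katz fractal dimension (amplitude-only variant)."""
    x = np.asarray(x, dtype=float)
    steps = np.abs(np.diff(x))
    L = steps.sum()                          # total curve length
    d = np.abs(x - x[0]).max()               # largest excursion from the first sample
    n = steps.size                           # number of steps
    return np.log10(n) / (np.log10(n) + np.log10(d / L))

def petrosian_fd(x):
    """Petrosian fractal dimension based on sign changes of the first difference."""
    x = np.asarray(x, dtype=float)
    diff = np.diff(x)
    n_delta = np.count_nonzero(diff[1:] * diff[:-1] < 0)
    n = x.size
    return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * n_delta)))

ramp = np.arange(10, dtype=float)            # smooth monotone signal
rough = np.array([0.0, 2.0, 1.0, 3.0, 2.0, 4.0])   # hypothetical irregular signal
kfd_ramp, kfd_rough = katz_fd(ramp), katz_fd(rough)
pfd_ramp, pfd_rough = petrosian_fd(ramp), petrosian_fd(rough)
```

A monotone ramp yields a dimension of 1 under both estimators, while the irregular signal yields a larger value. The Katz form is undefined for a constant signal and when the largest excursion equals the mean step, so short step-level windows may need a guard.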

3.4.3. Multilevel Integration Module

The Multilevel Integration Module (Figure 1) describes how plantar insole sensing, multidomain representation, and classification are connected within a coherent analytical pipeline [61]. In this paper, this module is conceptualized as a three-level integration mechanism that links data acquisition, feature construction, and final decision making within a unified framework.

At the physical measurement level, plantar insole sensor signals are collected through smart insole sensors during walking. These sensors continuously capture the distribution and temporal variation of foot–ground interaction, thereby providing the primary observational basis for subsequent gait analysis.

At the multidomain feature level, the acquired plantar insole sensor signals are transformed into structured gait representations through the construction and fusion of the three feature domains, namely Fractal Dimension, Imbalance, and Deviation, thus converting raw sensor measurements into interpretable descriptors of gait dynamics, coordination, and abnormality.

At the decision-making level, the fused multidomain features are further processed by machine-learning classifiers to generate detection outcomes. Within the BKCI framework, these outputs can also be linked to feedback-related pathways, including human multimodal feedback, so as to support anomaly prompting and potential training or intervention applications.

3.5. Plantar-Loading–Phase Imbalance Feature Generation

3.5.1. Gait State Segmentation for PLPI Construction

To compute interpretable gait-phase ratio imbalance descriptors in the plantar-loading–phase imbalance (PLPI) domain, we introduce a Gait State Segmentation Module within the FID-Gait framework. As illustrated in Figure 2, this module transforms multizonal plantar insole sensor signals into gait-state transition points and gait-state duration descriptors by combining clustering-based threshold generation with rule-based gait state detection. Thus, the module provides a consistent temporal partition of each gait cycle and establishes the basis for subsequent phase-organization and imbalance analysis.

Specifically, the smart insole is divided into five plantar anatomical zones: heel (H), rearfoot (RF), midfoot (MF), forefoot (FF), and toe (T). Let

$$p_{z,l(r)} \in \{ p_{z,l(r)}(t_n) \mid n = 1, \dots, N_p \} \tag{1}$$

denote the plantar insole sensor data sequence of the z-th zone for the left or right foot, where $N_p$ is the total number of sampled time points and

$$t_n = \frac{n}{f_s}, \quad n = 1, 2, \dots, N_p \tag{2}$$

is the time moment of the n-th sample acquired at sampling frequency $f_s$.

The five zonal plantar insole sensor signals are defined as

$$p_T = p_8, \quad p_H = p_1, \quad p_{RF} = \frac{p_2 + p_3}{2}, \quad p_{MF} = \frac{p_4 + p_5}{2}, \quad p_{FF} = \frac{p_6 + p_7}{2}, \tag{3}$$

which correspond to the toe, heel, rearfoot, midfoot, and forefoot regions, respectively.
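Equation (3) maps the eight sensors of one foot onto five zones. A minimal NumPy sketch is given below, assuming the record for one foot is stored as an array whose columns 0 to 7 hold sensors $p_1$ to $p_8$; that column layout and the function name are assumptions.

```python
import numpy as np

def zonal_signals(foot):
    """Build the five zonal signals of Equation (3) from an (N_p, 8) sensor array."""
    foot = np.asarray(foot, dtype=float)
    p = {k: foot[:, k - 1] for k in range(1, 9)}   # p[k] is sensor k (1-based, as in the text)
    return {
        "H": p[1],
        "RF": (p[2] + p[3]) / 2,
        "MF": (p[4] + p[5]) / 2,
        "FF": (p[6] + p[7]) / 2,
        "T": p[8],
    }

# Hypothetical record: three samples in which sensor k always reads k.
demo_foot = np.tile(np.arange(1, 9, dtype=float), (3, 1))
zones = zonal_signals(demo_foot)
```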

As shown in Figure 2, each zonal plantar insole sensor data sequence is processed independently using iterative two-cluster partitioning (KMeans-2) to generate an adaptive threshold for that zone. After convergence, the lower cluster centroid is taken as the threshold, yielding the threshold set

\{Th_{p_{z}} \mid z \in \{H, RF, MF, FF, T\}\} .

(4)

The detailed iterative definitions of cluster assignment, cluster-mean updating, and the stopping criterion are provided in Appendix A.
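The thresholding step can be illustrated with a minimal one-dimensional two-cluster sketch. It is not the authors' implementation: the initialization at the signal extremes, the tolerance `tol`, the iteration cap, and the synthetic `zone` signal are all assumptions made for illustration, and the exact update and stopping rules are those of Appendix A.

```python
import numpy as np

def two_cluster_threshold(signal, max_iter=100, tol=1e-6):
    """Iterative two-cluster (1-D k-means, k = 2) partition of one zonal signal.

    Returns the lower cluster centroid, which the text takes as the threshold.
    """
    x = np.asarray(signal, dtype=float)
    # Assumed initialization: start the two centroids at the signal extremes.
    lo, hi = x.min(), x.max()
    for _ in range(max_iter):
        # Assign each sample to the nearer centroid.
        to_hi = np.abs(x - hi) < np.abs(x - lo)
        if to_hi.all() or (~to_hi).all():
            break  # degenerate case: one cluster is empty
        new_lo, new_hi = x[~to_hi].mean(), x[to_hi].mean()
        converged = abs(new_lo - lo) < tol and abs(new_hi - hi) < tol
        lo, hi = new_lo, new_hi
        if converged:
            break
    return lo

# Synthetic zone signal: near-zero swing samples plus loaded stance samples.
rng = np.random.default_rng(0)
zone = np.concatenate([rng.normal(0.02, 0.01, 400), rng.normal(0.80, 0.05, 600)])
threshold = two_cluster_threshold(zone)
```

With this synthetic signal the returned threshold sits near the unloaded baseline, so a crossing marks the onset or release of loading rather than the loading peak.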

After threshold generation, the five adaptive zonal thresholds are jointly used to identify gait-state transition points, defined here as the time instants at which the plantar insole sensor signal of a given zone crosses its adaptive threshold in the ascending or descending direction. These transition points are threshold-crossing points rather than peak-force points. Based on the temporal relationships among the crossings of the thresholds Th_{p_{T}}, Th_{p_{FF}}, Th_{p_{MF}}, Th_{p_{RF}}, and Th_{p_{H}}, the gait cycle is segmented into six canonical gait states: Initial Contact (IC), Loading Response (LR), Mid-Stance (MS), Terminal Stance (TS), Pre-Swing (PS), and Swing (SW). These state transitions further define higher-level temporal structures, including the overall gait-cycle duration TD_{GC}, the stance-state duration TD_{ST}, and the swing-state duration TD_{SW}, as shown in Figure 2.
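As an illustration of the transition-point definition, the sketch below detects ascending and descending threshold crossings for a single zone. The function name, the toy `heel` samples, the threshold of 0.3, and the sampling frequency of 100 Hz are hypothetical; the multizonal rules that map crossings to the six gait states are those of Appendix A.1 and are not reproduced here.

```python
def threshold_crossings(signal, threshold, fs):
    """Return (ascending_times, descending_times) in seconds.

    Sample n (1-based) is taken at t_n = n / fs, as in Eq. (2).
    """
    ascending, descending = [], []
    for n in range(1, len(signal)):
        prev_above = signal[n - 1] > threshold
        curr_above = signal[n] > threshold
        t = (n + 1) / fs  # 0-based index n is the (n + 1)-th sample
        if curr_above and not prev_above:
            ascending.append(t)
        elif prev_above and not curr_above:
            descending.append(t)
    return ascending, descending

# Toy heel signal with two loading episodes.
heel = [0.0, 0.1, 0.6, 0.9, 0.7, 0.2, 0.0, 0.0, 0.5, 0.8]
up, down = threshold_crossings(heel, threshold=0.3, fs=100.0)
```

Here `up` holds the instants at which loading begins and `down` the instant at which it is released; a state duration is then the difference between two such instants.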

Let

t_{i} = (t_{i, start}, t_{i, end}), \quad i \in \{IC, LR, MS, TS, PS, SW\}

denote the start and end moments of gait state i. Let N_{gc} denote the number of gait cycles and N_{cs} the number of gait states within one normal gait cycle. For the j-th gait cycle (j = 1, 2, \dots, N_{gc}) and the i-th gait state, where the states IC, LR, MS, TS, PS, and SW are indexed by i = 1, 2, \dots, N_{cs}, the duration of the i-th gait state in the j-th gait cycle is defined as

TD(i, j) = t_{end}(i, j) - t_{start}(i, j), \quad i = 1, 2, \dots, N_{cs}, \quad j = 1, 2, \dots, N_{gc},

(5)

where TD(i, j) serves as the temporal basis for subsequent phase-ratio feature construction. Since one normal gait cycle contains six canonical gait states (N_{cs} = 6), these durations fully describe the temporal structure of a cycle.

Using the transition points detected from the five plantar zones, a gait state set is generated for each gait cycle. Accordingly, the complete set of gait-state durations can be written as

\{T D (i, j) ∣ i = 1, 2, \dots, N_{c s}; j = 1, 2, \dots, N_{g c}\} .

(6)

Then, the total number of gait-state durations is

N_{w g s} = N_{c s} \times N_{g c} .

These gait-state durations provide the temporal basis for constructing PLPI descriptors, especially those characterizing phase organization and ratio-based imbalance. The complete multizonal gait-state transition rules are summarized in Appendix A.1.

Figure 3 illustrates a representative adaptive-threshold-based segmentation result and the corresponding gait-state percentage distribution in the Normal group. The segmented states showed a physiologically reasonable temporal organization with the overall stance-related and swing-related portions remaining close to the canonical Normal-gait pattern of about 60% and 40%, respectively. In classical gait analysis, Initial Contact, Loading Response, Mid-Stance, Terminal Stance, and Pre-Swing approximately occupy 0–2%, 2–12%, 12–31%, 31–50%, and 50–62% of the gait cycle, respectively, which are followed by swing from about 62% to 100% [62,63]. Although the exact percentages of individual sub-phases were not identical to textbook kinematic definitions, the present results still support the robustness and physiological plausibility of the proposed pressure-threshold-based gait-state segmentation.

3.5.2. Plantar-Loading Asymmetry Features

Abnormal plantar-loading coordination during walking is mainly reflected in two aspects: left–right limb imbalance and impaired loading transfer from heel contact to forefoot propulsion within the same foot. These patterns are closely related to gait-event organization, weight shifting, and dynamic locomotor control, and thus provide biomechanically meaningful information for gait-abnormality assessment [64,65,66,67]. Based on this rationale, plantar-loading asymmetry features were constructed from gait-cycle plantar insole sensor signals to quantify imbalance both within each foot and between the two feet. By jointly characterizing intra-foot anterior–posterior coordination and inter-foot asymmetry, these features provide a compact and interpretable representation of plantar-loading imbalance for subsequent gait assessment and classification.

From both functional and statistically corrected perspectives, the heel and forefoot were selected as the representative plantar regions. As shown in Figure 4a, threshold summaries across the five plantar regions were compared between the Normal and Parkinson groups at the subject level. After Wilcoxon rank-sum testing with Holm correction, only the heel (adjusted p = 0.0089, r = 0.22) and forefoot (adjusted p = 0.0015, r = 0.26) remained statistically significant, whereas the rearfoot, midfoot, and toe did not.
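For readers unfamiliar with the Holm procedure, the following sketch shows the standard step-down adjustment applied to a family of p-values such as the five regional comparisons. The raw p-values in `raw` are made up for illustration and are not the values behind Figure 4a.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        scaled = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, scaled)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

raw = [0.01, 0.04, 0.03, 0.20, 0.50]  # made-up raw p-values for five zones
adj = holm_adjust(raw)
```

A region is retained as significant when its adjusted value stays below the chosen level, which is the sense in which only the heel and forefoot "remained" significant after correction.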

For representative forefoot-channel selection, Figure 4b shows that both P6 and P7 exhibited significant between-group differences (Holm-adjusted p < 1 \times 10^{-4}) with rank-biserial correlations of 0.35 and 0.54, respectively. In contrast, the robust-dispersion comparison based on IQR/median showed no marked instability for either channel, suggesting that the group difference was mainly reflected by a shift in distribution location. Considering both disease sensitivity and robustness, P6 was retained as the representative forefoot channel. Accordingly, P1 and P6 were chosen as the representative channels for the heel and forefoot regions, respectively.

For the j-th gait cycle, let p_{L1}^{(j)}(t), p_{L6}^{(j)}(t), p_{R1}^{(j)}(t), and p_{R6}^{(j)}(t) denote the normalized plantar insole sensor signal time series of the representative heel and forefoot channels of the left and right feet. For each representative channel s \in \{L1, L6, R1, R6\}, the peak loading and mean loading are defined as

PP_{s}^{(j)} = \max_{t \in GC_{j}} p_{s}^{(j)}(t), \quad MP_{s}^{(j)} = \frac{1}{|GC_{j}|} \sum_{t \in GC_{j}} p_{s}^{(j)}(t),

where GC_{j} denotes the set of sampling points in the j-th gait cycle and |GC_{j}| is the corresponding number of samples.

To quantify the relative loading imbalance between two representative channels, the asymmetry index is defined as

AI(x_{a}^{(j)}, x_{b}^{(j)}) = \frac{|x_{a}^{(j)} - x_{b}^{(j)}|}{x_{a}^{(j)} + x_{b}^{(j)} + ε},

(7)

where ε is a small positive constant that avoids division by zero. Since the normalized plantar insole sensor values are nonnegative, AI \in [0, 1), and a larger value indicates a greater loading difference between the two compared channels.
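The stated range can be checked in one line. Assuming nonnegative inputs and a strictly positive ε (the superscript (j) is dropped for brevity), the chain of inequalities below gives the bound:

```latex
0 \le |x_{a} - x_{b}| \le \max(x_{a}, x_{b}) \le x_{a} + x_{b} < x_{a} + x_{b} + \varepsilon
\quad\Longrightarrow\quad
0 \le AI(x_{a}, x_{b}) < 1 .
```

For example, with illustrative values x_{a} = 0.9 and x_{b} = 0.3, the index is approximately 0.6 / 1.2 = 0.5, and it equals 0 whenever the two loadings coincide.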

Six representative channel pairs are considered: P_{1} = (L6, L1), P_{2} = (R6, R1), P_{3} = (R6, L1), P_{4} = (L6, R1), P_{5} = (R6, L6), and P_{6} = (L1, R1). Here, P_{1} and P_{2} characterize within-foot forefoot–heel coordination, P_{3} and P_{4} characterize diagonal inter-foot loading relationships, and P_{5} and P_{6} characterize inter-foot asymmetry between homologous forefoot and heel regions, respectively.

Based on these channel pairs, the plantar-loading asymmetry features of the j-th gait cycle are defined as

A S_{1}^{(j)} = A I (P P_{L 6}^{(j)}, P P_{L 1}^{(j)}), A S_{2}^{(j)} = A I (P P_{R 6}^{(j)}, P P_{R 1}^{(j)}),

(8)

A S_{3}^{(j)} = A I (P P_{R 6}^{(j)}, P P_{L 1}^{(j)}), A S_{4}^{(j)} = A I (P P_{L 6}^{(j)}, P P_{R 1}^{(j)}),

(9)

A S_{5}^{(j)} = A I (P P_{R 6}^{(j)}, P P_{L 6}^{(j)}), A S_{6}^{(j)} = A I (P P_{L 1}^{(j)}, P P_{R 1}^{(j)}),

(10)

A S_{7}^{(j)} = A I (M P_{L 6}^{(j)}, M P_{L 1}^{(j)}), A S_{8}^{(j)} = A I (M P_{R 6}^{(j)}, M P_{R 1}^{(j)}),

(11)

A S_{9}^{(j)} = A I (M P_{R 6}^{(j)}, M P_{L 1}^{(j)}), A S_{10}^{(j)} = A I (M P_{L 6}^{(j)}, M P_{R 1}^{(j)}),

(12)

A S_{11}^{(j)} = A I (M P_{R 6}^{(j)}, M P_{L 6}^{(j)}), A S_{12}^{(j)} = A I (M P_{L 1}^{(j)}, M P_{R 1}^{(j)}) .

(13)

Accordingly, the plantar-loading asymmetry feature vector for the j-th gait cycle is written as

A S^{(j)} = [A S_{1}^{(j)}, A S_{2}^{(j)}, \dots, A S_{12}^{(j)}] .

(14)

Here, AS_{1}^{(j)}–AS_{6}^{(j)} are the peak-loading asymmetry features, whereas AS_{7}^{(j)}–AS_{12}^{(j)} are the mean-loading asymmetry features, giving 12 plantar-loading asymmetry features per gait cycle.
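A compact sketch of Eqs. (7)–(14) is given below. The value of `EPS`, the dictionary layout of `cycle`, and the toy samples are assumptions for illustration; only the pair ordering and the peak/mean definitions follow the text.

```python
EPS = 1e-8  # assumed value of the small constant in Eq. (7)

# Channel pairs P_1..P_6 in the order given in the text.
PAIRS = [("L6", "L1"), ("R6", "R1"), ("R6", "L1"),
         ("L6", "R1"), ("R6", "L6"), ("L1", "R1")]

def asymmetry_index(a, b, eps=EPS):
    """Asymmetry index of Eq. (7) for two nonnegative loadings."""
    return abs(a - b) / (a + b + eps)

def loading_asymmetry_features(cycle):
    """cycle maps channel name -> list of normalized samples in one gait cycle."""
    peak = {s: max(v) for s, v in cycle.items()}
    mean = {s: sum(v) / len(v) for s, v in cycle.items()}
    peak_feats = [asymmetry_index(peak[a], peak[b]) for a, b in PAIRS]  # AS_1..AS_6
    mean_feats = [asymmetry_index(mean[a], mean[b]) for a, b in PAIRS]  # AS_7..AS_12
    return peak_feats + mean_feats

# Toy gait cycle: symmetric heels, weaker right forefoot.
cycle = {
    "L1": [0.0, 0.8, 0.4, 0.0],
    "L6": [0.0, 0.2, 0.6, 0.2],
    "R1": [0.0, 0.8, 0.4, 0.0],
    "R6": [0.0, 0.1, 0.3, 0.2],
}
features = loading_asymmetry_features(cycle)
```

In this toy cycle the heel-to-heel terms (AS_6 and AS_12) are zero because both heels carry identical loads, while the forefoot-to-forefoot terms are positive.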

From a biomechanical perspective, these features provide a compact and interpretable description of gait imbalance. The within-foot forefoot–heel asymmetry terms reflect abnormalities in loading-transfer coordination from heel contact to forefoot propulsion; the between-foot homologous asymmetry terms characterize lateral loading bias between corresponding plantar regions; and the diagonal asymmetry terms reflect contralateral coordination changes during alternating support [65,66,67].

3.5.3. Gait-Phase Ratio Imbalance Features

Abnormal gait is commonly characterized by imbalanced stance–swing proportions and disrupted temporal relationships among gait sub-phases. As these temporal patterns are closely linked to gait rhythm and phase coordination, they serve as informative indicators of gait abnormalities. Based on this rationale, this paper derives gait-phase ratio imbalance features from gait-state segmentation results to quantify representative temporal imbalance patterns within the gait cycle. These features characterize both the global stance–swing proportion and the local temporal coordination among non-swing sub-phases, providing an interpretable representation of gait-phase dysregulation.

For the j-th gait cycle, the phase durations of the left foot are denoted as TD_{LIC}^{(j)}, TD_{LLR}^{(j)}, TD_{LMS}^{(j)}, TD_{LTS}^{(j)}, TD_{LPS}^{(j)}, and TD_{LSW}^{(j)}, and those of the right foot are denoted as TD_{RIC}^{(j)}, TD_{RLR}^{(j)}, TD_{RMS}^{(j)}, TD_{RTS}^{(j)}, TD_{RPS}^{(j)}, and TD_{RSW}^{(j)}.

Accordingly, the total stance durations of the left and right feet are defined as

T D_{L S T}^{(j)} = T D_{L I C}^{(j)} + T D_{L L R}^{(j)} + T D_{L M S}^{(j)} + T D_{L T S}^{(j)} + T D_{L P S}^{(j)},

(15)

T D_{R S T}^{(j)} = T D_{R I C}^{(j)} + T D_{R L R}^{(j)} + T D_{R M S}^{(j)} + T D_{R T S}^{(j)} + T D_{R P S}^{(j)},

(16)

and the corresponding gait-cycle durations are defined as

T D_{L G C}^{(j)} = T D_{L S T}^{(j)} + T D_{L S W}^{(j)},

(17)

T D_{R G C}^{(j)} = T D_{R S T}^{(j)} + T D_{R S W}^{(j)} .

(18)

First, six global phase-ratio features are constructed:

GR_{1}^{(j)} = \frac{TD_{LST}^{(j)}}{TD_{LGC}^{(j)}}, \quad GR_{2}^{(j)} = \frac{TD_{LSW}^{(j)}}{TD_{LGC}^{(j)}}, \quad GR_{3}^{(j)} = \frac{TD_{LSW}^{(j)}}{TD_{LST}^{(j)}},

(19)

GR_{4}^{(j)} = \frac{TD_{RST}^{(j)}}{TD_{RGC}^{(j)}}, \quad GR_{5}^{(j)} = \frac{TD_{RSW}^{(j)}}{TD_{RGC}^{(j)}}, \quad GR_{6}^{(j)} = \frac{TD_{RSW}^{(j)}}{TD_{RST}^{(j)}} .

(20)

Here, GR_{1}^{(j)} to GR_{3}^{(j)} denote the stance-phase ratio, swing-phase ratio, and swing-to-stance ratio of the left foot, respectively, while GR_{4}^{(j)} to GR_{6}^{(j)} denote the corresponding features of the right foot.

To further characterize the relative temporal organization among non-swing sub-phases, the phase-ratio feature between any two phases a and b is defined as

GR^{(j)}(a, b) = \frac{\min(TD_{a}^{(j)}, TD_{b}^{(j)})}{\max(TD_{a}^{(j)}, TD_{b}^{(j)})} .

(21)

For the left foot, a, b \in \{LIC, LLR, LMS, LTS, LPS\} and a \neq b. According to the order (LIC, LLR), (LIC, LMS), (LIC, LTS), (LIC, LPS), (LLR, LMS), (LLR, LTS), (LLR, LPS), (LMS, LTS), (LMS, LPS), and (LTS, LPS), the left-foot sub-phase ratio features are defined as

GR_{6+k}^{(j)} = GR^{(j)}(a_{k}, b_{k}), \quad k = 1, 2, \dots, 10,

(22)

where (a_{k}, b_{k}) denotes the k-th left-foot phase pair in the above order.

Similarly, for the right foot, a, b \in \{RIC, RLR, RMS, RTS, RPS\} and a \neq b. According to the order (RIC, RLR), (RIC, RMS), (RIC, RTS), (RIC, RPS), (RLR, RMS), (RLR, RTS), (RLR, RPS), (RMS, RTS), (RMS, RPS), and (RTS, RPS), the right-foot sub-phase ratio features are defined as

GR_{16+k}^{(j)} = GR^{(j)}(c_{k}, d_{k}), \quad k = 1, 2, \dots, 10,

(23)

where (c_{k}, d_{k}) denotes the k-th right-foot phase pair in the above order.

Therefore, the gait-phase ratio imbalance feature set for the j-th gait cycle is defined as

G R^{(j)} = [G R_{1}^{(j)}, G R_{2}^{(j)}, \dots, G R_{26}^{(j)}] .

(24)

Among them, GR_{1}–GR_{6} are the global temporal ratio features of the stance and swing phases for the left and right feet, whereas GR_{7}–GR_{26} are the relative temporal coordination features among non-swing sub-phases. By definition, all sub-phase ratio features lie within (0, 1] whenever the sub-phase durations are positive, and values closer to 1 indicate more similar phase durations.
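The 26-dimensional vector of Eq. (24) can be assembled as in the sketch below. The function name and the example durations are hypothetical; the feature order (GR_1 to GR_3 left, GR_4 to GR_6 right, GR_7 to GR_16 left pairs, GR_17 to GR_26 right pairs) follows Eqs. (19)–(23).

```python
from itertools import combinations

SUB_PHASES = ["IC", "LR", "MS", "TS", "PS"]  # non-swing states, in gait order

def phase_ratio_features(left, right):
    """left/right map state name -> duration (s) for one gait cycle."""
    feats = []
    for d in (left, right):
        stance = sum(d[s] for s in SUB_PHASES)   # Eqs. (15)-(16)
        cycle = stance + d["SW"]                 # Eqs. (17)-(18)
        feats += [stance / cycle, d["SW"] / cycle, d["SW"] / stance]
    for d in (left, right):
        # The 10 pairs per foot, in the order listed in the text.
        for a, b in combinations(SUB_PHASES, 2):
            feats.append(min(d[a], d[b]) / max(d[a], d[b]))  # Eq. (21)
    return feats

# Illustrative durations in seconds (not measured data).
left = {"IC": 0.02, "LR": 0.10, "MS": 0.19, "TS": 0.19, "PS": 0.12, "SW": 0.38}
right = {"IC": 0.03, "LR": 0.12, "MS": 0.20, "TS": 0.18, "PS": 0.14, "SW": 0.33}
gr = phase_ratio_features(left, right)
```

Because Eq. (21) divides the shorter duration by the longer one, the sub-phase ratios do not depend on which phase of a pair is listed first.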

3.6. Fractal-Dimension Feature Generation

In the complexity domain of FID-Gait, regional fractal-dimension features were extracted to characterize the nonlinear complexity of plantar insole sensor signals. Higuchi, Petrosian, Katz, and box-counting fractal dimensions were jointly used to capture complementary dynamic properties of insole sensor data sequences from the five plantar subregions of both feet. Compared with whole-foot analysis, this regional strategy reduces the masking of local dynamics caused by signal superposition and better preserves functional variations across the Heel, Rearfoot, Midfoot, Forefoot, and Toe. It thus facilitates the identification of local complexity abnormalities and their spatial distribution during gait. Additionally, inter-limb difference features between homologous subregions were constructed to quantify bilateral complexity asymmetry.

3.6.1. Definition of Regional Plantar Insole Sensor Data Sequences

Based on the five plantar subregions H, RF, MF, FF, and T, for any foot side S \in \{L, R\}, let the j-th gait cycle window contain N_{j} sampling points, and let the sampling instants be written as t_{n} (n = 1, 2, \dots, N_{j}). Let p_{(S, i)}^{(j)}(t_{n}) represent the preprocessed plantar insole sensor signal value of the i-th original sensor on foot side S at time t_{n} within the j-th window, where i = 1, 2, \dots, 8.

The regional insole sensor data sequences of the five plantar subregions are defined as follows:

p_{(H, S)}^{(j)}(t_{n}) = p_{(S, 1)}^{(j)}(t_{n}),

p_{(RF, S)}^{(j)}(t_{n}) = \frac{p_{(S, 2)}^{(j)}(t_{n}) + p_{(S, 3)}^{(j)}(t_{n})}{2},

p_{(MF, S)}^{(j)}(t_{n}) = \frac{p_{(S, 4)}^{(j)}(t_{n}) + p_{(S, 5)}^{(j)}(t_{n})}{2},

p_{(FF, S)}^{(j)}(t_{n}) = \frac{p_{(S, 6)}^{(j)}(t_{n}) + p_{(S, 7)}^{(j)}(t_{n})}{2},

p_{(T, S)}^{(j)}(t_{n}) = p_{(S, 8)}^{(j)}(t_{n}) .

3.6.2. Computation of Four Types of Fractal Dimensions

For each regional insole sensor data sequence p_{(z, S)}^{(j)}(t_{n}), four time-domain fractal dimensions are computed: the Higuchi (HFD), Petrosian (PFD), Katz (KFD), and box-counting (BCFD) fractal dimensions. They characterize the dynamics of the sequence in terms of multiscale roughness, local oscillatory complexity, the relationship between trajectory length and overall extension, and the geometric covering-scale relationship, respectively. The detailed mathematical definitions of these four estimators are provided in Appendix A.2.
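To make two of these estimators concrete, the sketch below implements commonly used textbook forms of the Higuchi and Katz fractal dimensions. These forms, the default `k_max = 8`, and the amplitude-only Katz variant are assumptions; the definitions actually used in the paper are those of Appendix A.2 and may differ in detail.

```python
import numpy as np

def higuchi_fd(x, k_max=8):
    """Higuchi fractal dimension: slope of log L(k) against log(1/k).

    Assumes a non-constant series with more than k_max samples.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            n_steps = (n - 1 - m) // k       # number of increments of lag k
            if n_steps < 1:
                continue
            idx = m + k * np.arange(n_steps + 1)
            dist = np.abs(np.diff(x[idx])).sum()
            # Normalize to the full series length, then divide by the lag.
            lengths.append(dist * (n - 1) / (n_steps * k) / k)
        log_inv_k.append(np.log(1.0 / k))
        log_len.append(np.log(np.mean(lengths)))
    slope, _ = np.polyfit(log_inv_k, log_len, 1)
    return slope

def katz_fd(x):
    """Katz fractal dimension using amplitude differences only (1-D variant)."""
    x = np.asarray(x, dtype=float)
    steps = np.abs(np.diff(x))
    total_len = steps.sum()              # L: total curve length
    extent = np.abs(x - x[0]).max()      # d: farthest excursion from the start
    n = steps.size                       # number of steps
    return np.log10(n) / (np.log10(n) + np.log10(extent / total_len))

line = np.arange(200.0)                               # perfectly regular ramp
noise = np.random.default_rng(0).normal(size=2000)    # highly irregular series
```

Both estimators return 1 for the ramp, and larger values for the noise series, which is the sense in which a higher fractal dimension indicates a more irregular regional signal.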

Figure 5 presents representative regional insole sensor data sequences and the corresponding fractal-fitting schematics for the five plantar subregions in the Normal and Parkinson groups. The results visually illustrate the differences in fractal patterns and waveform characteristics across subregions.

3.6.3. Construction of Left–Right Difference Features

To characterize the complexity asymmetry between homologous plantar subregions on the two sides, left–right difference features were constructed for each subregion and each type of fractal dimension. For any subregion z \in \{H, RF, MF, FF, T\} and any fractal-dimension type FD \in \{HFD, PFD, KFD, BCFD\}, the left–right difference is defined as

ΔFD_{z}^{(j)} = FD_{(z, L)}^{(j)} - FD_{(z, R)}^{(j)},

(25)

where ΔFD_{z}^{(j)} > 0 indicates that the complexity of the corresponding subregion of the left foot is higher than that of the right foot, whereas ΔFD_{z}^{(j)} < 0 indicates that the complexity of the corresponding subregion of the right foot is higher.

3.6.4. Regional Fractal-Dimension Feature Set

For the j-th gait cycle window, four types of fractal dimensions are extracted from the five plantar subregions, and the corresponding left–right difference features are further constructed. Therefore, the regional fractal-dimension feature vector is defined as

RFD^{(j)} = \left[ FD_{(z, S)}^{(j)}, \; ΔFD_{z}^{(j)} \right]_{z \in \{H, RF, MF, FF, T\}, \; S \in \{L, R\}, \; FD \in \{HFD, PFD, KFD, BCFD\}} .

(26)

This feature set contains a total of 60 features, corresponding to five plantar subregions, four types of fractal dimensions, and three components for each fractal-dimension type, namely the left-foot value, the right-foot value, and the left–right difference. Among them, FD_{(z, L)}^{(j)} and FD_{(z, R)}^{(j)} describe the local complexity levels of each subregion of the left and right feet, respectively, whereas ΔFD_{z}^{(j)} describes the complexity asymmetry between homologous subregions on the two sides.
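The bookkeeping behind the count of 60 can be sketched as follows. The function names and feature keys are hypothetical, and the four entries of `stand_ins` are simple placeholder statistics used only to exercise the structure; they are not fractal-dimension estimators and would be replaced by the HFD, PFD, KFD, and BCFD functions.

```python
import numpy as np

ZONES = ["H", "RF", "MF", "FF", "T"]
# Sensors (1-based) averaged into each zone, following Section 3.6.1.
ZONE_SENSORS = {"H": [1], "RF": [2, 3], "MF": [4, 5], "FF": [6, 7], "T": [8]}

def regional_sequences(foot):
    """foot: array of shape (8, N_j) holding one gait cycle of one foot."""
    return {z: foot[[i - 1 for i in idx]].mean(axis=0)
            for z, idx in ZONE_SENSORS.items()}

def rfd_vector(left, right, estimators):
    """estimators maps a name (e.g. 'HFD') to a function of a 1-D sequence."""
    seq_l, seq_r = regional_sequences(left), regional_sequences(right)
    feats = {}
    for z in ZONES:
        for name, fd in estimators.items():
            fd_l, fd_r = fd(seq_l[z]), fd(seq_r[z])
            feats[f"{name}_{z}_L"] = fd_l
            feats[f"{name}_{z}_R"] = fd_r
            feats[f"d{name}_{z}"] = fd_l - fd_r   # Eq. (25)
    return feats

# Placeholder statistics standing in for the four estimators (assumption).
stand_ins = {"HFD": np.std, "PFD": np.var, "KFD": np.mean, "BCFD": np.median}
rng = np.random.default_rng(1)
left_cycle, right_cycle = rng.random((8, 120)), rng.random((8, 120))
rfd = rfd_vector(left_cycle, right_cycle, stand_ins)
```

The resulting dictionary has 5 × 4 × 3 = 60 entries, matching the feature count stated in the text.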

For the distribution analysis in Figure 6a, each regional fractal-dimension feature was first assessed for normality using the Shapiro–Wilk test, and between-group comparisons were then performed using the appropriate parametric or nonparametric test. All regional fractal-dimension candidates were jointly corrected within the same feature family using the Holm procedure. Figure 6a summarizes the top 10 regional fractal-dimension features ranked by Holm-adjusted p-values.

As shown in Figure 6a, only three regional fractal-dimension features remained statistically significant after Holm correction, namely Midfoot HFD (Bilateral, adjusted p < 1 \times 10^{-4}), Heel HFD (Δ (Left–Right), adjusted p = 6.95 \times 10^{-4}), and Forefoot HFD (Bilateral, adjusted p = 0.0418). The remaining seven features in the top-10 ranking did not remain significant after correction. These results indicate that the most robust between-group differences were concentrated in HFD-derived features, involving both bilateral regional complexity and left–right asymmetry.

3.7. Covariance-Adjusted Deviation Feature Generation

To quantify the overall deviation of a single gait-cycle sample from the Normal gait pattern, this paper defines the covariance-adjusted deviation (CAD) feature based on the squared Mahalanobis distance. This metric measures the standardized displacement of a sample from the center of the Normal reference distribution while accounting for feature-scale differences and inter-feature correlations. Its squared value is directly used as a continuous indicator of deviation intensity.

Let the input feature vector corresponding to the j-th gait cycle sample be x^{(j)} \in \mathbb{R}^{d}, where d denotes the feature dimension. Specifically, x^{(j)} = [AS^{(j)}, RFD^{(j)}]^{⊤} is formed by combining the plantar-loading asymmetry feature set and the regional fractal-dimension feature set. The covariance-adjusted deviation intensity of the j-th sample is then defined as

{CAD}^{(j)} = {(x^{(j)} - μ_{HC})}^{⊤} {\tilde{Σ}}_{HC}^{- 1} (x^{(j)} - μ_{HC}) .

(27)

Here, μ_{HC} denotes the mean vector of the Normal-control reference distribution, \tilde{Σ}_{HC} denotes the regularized covariance matrix of the Normal reference distribution, and \tilde{Σ}_{HC}^{-1} denotes its inverse. The detailed definitions of the Normal reference mean vector, covariance matrix, and regularization form are provided in Appendix A.3.

Accordingly, CAD^{(j)} is constructed as a one-dimensional continuous deviation feature. A larger CAD^{(j)} indicates that the corresponding sample exhibits a stronger overall deviation from the Normal reference distribution in the joint feature space, whereas a smaller CAD^{(j)} indicates that its feature structure is closer to the Normal gait pattern. Because this feature is defined under covariance constraints, it reflects not only the dispersion of individual feature dimensions but also the correlation structure among features, thereby providing a statistically consistent quantitative description of gait abnormality.
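A minimal sketch of the squared Mahalanobis computation in Eq. (27) is shown below. The ridge form Σ + λI with λ = 1e-6, the function names, and the synthetic four-dimensional reference data are assumptions; the regularization actually used is defined in Appendix A.3.

```python
import numpy as np

def fit_reference(x_ref, ridge=1e-6):
    """Mean and regularized covariance of the Normal reference samples.

    The ridge term (Sigma + ridge * I) is an assumed regularization form.
    """
    mu = x_ref.mean(axis=0)
    sigma = np.cov(x_ref, rowvar=False)
    sigma_reg = sigma + ridge * np.eye(x_ref.shape[1])
    return mu, sigma_reg

def cad(x, mu, sigma_reg):
    """Squared Mahalanobis distance of each row of x from the reference."""
    diff = np.atleast_2d(x) - mu
    # Solve Sigma * y = diff^T instead of forming the explicit inverse.
    solved = np.linalg.solve(sigma_reg, diff.T).T
    return np.einsum("ij,ij->i", diff, solved)

# Synthetic stand-in for Normal reference features (500 cycles, d = 4).
rng = np.random.default_rng(0)
normal_ref = rng.normal(size=(500, 4))
mu, sigma_reg = fit_reference(normal_ref)
cad_ref = cad(normal_ref, mu, sigma_reg)
cad_far = cad(mu + 5.0, mu, sigma_reg)
```

For the reference samples themselves the average value is close to the feature dimension d, which gives a rough scale against which larger CAD values can be read.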

The CAD distributions in Figure 6b were compared at the subject level after normality assessment with the Shapiro–Wilk test. Because CAD was analyzed as a single prespecified summary endpoint, this comparison did not involve a multiple-testing family and therefore no additional post hoc multiplicity correction was required. The between-group difference was evaluated using the Wilcoxon rank-sum test. As shown in Figure 6b, the Parkinson group exhibited significantly higher CAD values than the Normal group (raw p < 1 \times 10^{-4}), with a large rank-biserial effect size (r = -0.946), together with a higher median and a broader distribution. This result indicates a stronger overall deviation from the Normal reference distribution in the fused feature space and further supports the effectiveness of CAD as an interpretable deviation-sensitive feature for characterizing abnormal gait patterns.

Spatial Box-Counting Fractal Dimension Based on CAD

To further characterize the distribution pattern of gait-cycle samples in a fused multidomain feature space, a Gait Feature Space was constructed based on the AS score, the RFD score, and CAD. In this space, each gait-cycle sample is represented as a three-dimensional point, so that gait imbalance, fractal complexity, and deviation from the Normal reference distribution can be jointly described within a unified geometric framework.

The AS score of the j-th gait cycle sample was defined as

AS_{score}^{(j)} = \frac{1}{N_{A}} \sum_{k=1}^{N_{A}} \frac{a_{k}^{(j)} - μ_{a, k, NL}}{σ_{a, k, NL} + ε},

(28)

where N_{A} is the number of asymmetry-domain features, a_{k}^{(j)} is the value of the k-th asymmetry feature of the j-th gait cycle sample, μ_{a, k, NL} and σ_{a, k, NL} are the mean and standard deviation of the corresponding feature in the Normal reference samples, and ε is a small constant introduced to avoid division by zero.

Similarly, the RFD score of the j-th gait cycle sample was defined as

RFD_{score}^{(j)} = \frac{1}{N_{R}} \sum_{k=1}^{N_{R}} \frac{r_{k}^{(j)} - μ_{r, k, NL}}{σ_{r, k, NL} + ε},

(29)

where N_{R} is the number of regional fractal-dimension features, r_{k}^{(j)} is the value of the k-th fractal-dimension feature of the j-th gait cycle sample, and μ_{r, k, NL} and σ_{r, k, NL} are the corresponding mean and standard deviation in the Normal reference samples.

Based on these definitions, the three coordinates of the j-th gait cycle sample in the 3D Gait Feature Space were given by

X^{(j)} = AS_{score}^{(j)}, \quad Y^{(j)} = RFD_{score}^{(j)}, \quad Z^{(j)} = CAD^{(j)} .

(30)

Accordingly, each gait cycle sample was represented as

g^{(j)} = [AS_{score}^{(j)}, RFD_{score}^{(j)}, CAD^{(j)}]^{⊤},

(31)

and the corresponding set of gait cycle samples for subject u was written as

G_{u} = \{g^{(j)}\}_{j=1}^{N_{u}},

(32)

where N_{u} denotes the number of valid step-level samples of subject u.
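The construction of the subject-level point set can be sketched as below. The value of `EPS`, the use of the population standard deviation, the function names, and the synthetic arrays are assumptions made for illustration.

```python
import numpy as np

EPS = 1e-8  # assumed value of the small constant in Eqs. (28)-(29)

def domain_score(feats, ref_feats, eps=EPS):
    """Mean z-score of each row of feats relative to the Normal reference rows."""
    mu = ref_feats.mean(axis=0)
    sd = ref_feats.std(axis=0)  # population standard deviation (assumption)
    return ((feats - mu) / (sd + eps)).mean(axis=1)

def gait_points(as_feats, rfd_feats, cad_values, as_ref, rfd_ref):
    """Stack (AS score, RFD score, CAD) into an (N_u, 3) array for one subject."""
    return np.column_stack([
        domain_score(as_feats, as_ref),
        domain_score(rfd_feats, rfd_ref),
        cad_values,
    ])

# Synthetic stand-ins: 200 Normal reference cycles and 30 cycles of one subject.
rng = np.random.default_rng(0)
as_ref, rfd_ref = rng.normal(size=(200, 12)), rng.normal(size=(200, 60))
as_u = rng.normal(loc=0.5, size=(30, 12))
rfd_u = rng.normal(loc=0.2, size=(30, 60))
cad_u = rng.random(30) * 10.0
G_u = gait_points(as_u, rfd_u, cad_u, as_ref, rfd_ref)
```

Each row of `G_u` is one step-level point g^{(j)}; a sample located at the Normal reference mean has AS and RFD scores of zero.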

In this representation, the X-, Y-, and Z-axes, respectively, denote sample displacement in the gait-imbalance domain, displacement in the regional fractal-complexity domain, and overall abnormal deviation intensity in the covariance-aware fused feature space, jointly forming a unified 3D Gait Feature Space for step-level gait representation. To characterize subject-specific sample distributions in this space, a three-dimensional box-counting fractal-dimension analysis was performed (Appendix A.3). A lower D_{B, u} indicates a more compact and regular distribution, whereas a higher D_{B, u} indicates a more dispersed and irregular spatial pattern. Therefore, D_{B, u} complements CAD^{(j)} by providing a geometric measure of multidomain gait abnormality.
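A generic three-dimensional box-counting estimate can be sketched as follows. The min-max normalization to the unit cube and the scale set (2, 4, 8, 16) are assumptions; the procedure used to obtain D_{B, u} in the paper is specified in the appendix and may use different scales or normalization.

```python
import numpy as np

def box_counting_dimension(points, scales=(2, 4, 8, 16)):
    """Slope of log N(s) versus log s, with s boxes per axis in the unit cube."""
    pts = np.asarray(points, dtype=float)
    span = pts.max(axis=0) - pts.min(axis=0)
    span[span == 0] = 1.0                      # guard against flat axes
    unit = (pts - pts.min(axis=0)) / span      # assumed min-max normalization
    counts = []
    for s in scales:
        # Integer box index along each axis; points at 1.0 go to the last box.
        idx = np.minimum((unit * s).astype(int), s - 1)
        counts.append(len(np.unique(idx, axis=0)))
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return slope

# Reference shapes: a dense diagonal line and a filled cubic grid.
t = np.linspace(0.0, 1.0, 2000)
line_pts = np.column_stack([t, t, t])
g = (np.arange(16) + 0.5) / 16
cube_pts = np.array(np.meshgrid(g, g, g)).reshape(3, -1).T
```

The line yields a dimension of about 1 and the filled grid about 3, which brackets the range in which subject-level point clouds are interpreted.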

As shown in Figure 7, the Normal subject forms a compact cluster with limited voxel occupancy and a spatial fractal dimension of 0.9529, whereas the Parkinson subject exhibits a more dispersed distribution across the AS, RFD, and CAD axes and a higher fractal dimension of 1.2586, indicating stronger step-level deviations and greater multidomain structural complexity. Figure 7c further shows that the Parkinson group is shifted toward higher spatial fractal-dimension values, peaking at approximately 1.2380, compared with 0.9615 for the Normal group. Overall, these results indicate that the spatial box-counting fractal dimension complements CAD by capturing both deviation from the Normal baseline and the organizational complexity of gait states in the fused multidomain space.

The distributions of all normalized model-input features are provided in Appendix C (Figure A1).

4. Experiments and Results

To systematically evaluate the effectiveness and reproducibility of the proposed FID-Gait framework for Parkinson’s disease (PD) identification, experiments were conducted using plantar insole sensor data collected by smart insoles. The experimental analysis consisted of two parts: a gait-cycle-level classification experiment to evaluate the discriminative capability of the proposed multidomain fused features for abnormal gait samples and a subject-level classification experiment to further assess the stability and discriminative ability of the model at the individual level.

Prior to classification modeling, the raw plantar insole sensor signals were preprocessed by smoothing, boundary completion, and within-record normalization, which were followed by gait segmentation, cycle alignment, and feature extraction. Based on these procedures, a multidomain fused feature set was constructed for classification, including five feature groups across three domains: the fractal domain (regional fractal-dimension features), the PLPI domain (plantar-loading asymmetry features and gait-phase ratio features), and the deviation domain (covariance-adjusted deviation features and spatial box-counting fractal-dimension features).

To reduce scale discrepancies and distributional imbalance across feature domains, and thereby improve the stability and separability of multidomain fusion modeling, feature standardization was applied to the classification inputs. To prevent data leakage, within each experimental split the mean μ_{j} and standard deviation σ_{j} of the j-th feature were estimated exclusively from the training set, and each feature value x_{ij} was transformed as

\tilde{x}_{ij} = \frac{x_{ij} - μ_{j}}{σ_{j}} .

The same parameters were then consistently applied to the corresponding validation and test sets.
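The leakage-free standardization can be sketched as follows. The function names, the guard for constant features, and the synthetic arrays are assumptions; the essential point from the text is that the statistics come from the training set only.

```python
import numpy as np

def fit_standardizer(x_train):
    """Estimate per-feature mean and standard deviation on the training set only."""
    mu = x_train.mean(axis=0)
    sigma = x_train.std(axis=0)
    sigma[sigma == 0] = 1.0   # assumed guard for constant features
    return mu, sigma

def apply_standardizer(x, mu, sigma):
    """Apply training-set statistics to any split (train, validation, or test)."""
    return (x - mu) / sigma

# Synthetic feature matrices standing in for one experimental split.
rng = np.random.default_rng(0)
x_train = rng.normal(10.0, 3.0, size=(70, 5))
x_test = rng.normal(10.0, 3.0, size=(30, 5))
mu, sigma = fit_standardizer(x_train)
z_train = apply_standardizer(x_train, mu, sigma)
z_test = apply_standardizer(x_test, mu, sigma)
```

The standardized training features have zero mean and unit variance by construction, whereas the test features generally do not, which is expected when no test information enters the fit.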

To assess the discriminative performance of the proposed feature framework, nine classical machine learning classifiers were compared: Decision Tree, Logistic Regression, K-Nearest Neighbors, Random Forest, Gaussian Naive Bayes, Gradient Boosting, Multilayer Perceptron, Support Vector Machine, and AdaBoost. The experiments employed stratified splitting for the training, validation, and test sets, and five-fold stratified cross-validation was further performed to evaluate model robustness and generalization ability. The evaluation metrics were Accuracy, Precision, Recall, F1-score, and AUC, with the first four defined as

Accuracy = \frac{TP + TN}{TP + TN + FP + FN},

Precision = \frac{TP}{TP + FP},

Recall = \frac{TP}{TP + FN},

F1\text{-}score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall},

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The area under the receiver operating characteristic curve (AUC) measures the overall discriminative ability of a classifier across different decision thresholds.
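The four count-based metrics can be computed directly from a confusion matrix, as in the sketch below. The confusion counts are made up for illustration. The last line applies the F1 formula to the gait-cycle-level Precision and Recall reported for the full feature set in Section 4.3 (0.9940 and 0.9955) as a consistency check.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall, and F1-score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }

# Made-up confusion counts, with PD taken as the positive class.
metrics = classification_metrics(tp=90, tn=40, fp=10, fn=5)

# Consistency check against the reported Precision and Recall.
reported_f1 = round(f1_score(0.9940, 0.9955), 4)
```

The recomputed value agrees with the reported gait-cycle-level F1-score of 0.9947.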

4.1. Comparative Classification Performance

First, the proposed method was evaluated on gait-cycle-level features under a standard 7:3 split setting with an independent validation set further separated from the training portion. Different from the aggregated discrimination at the subject level, each sample in this setting corresponds to an aligned gait cycle, and the resulting performance therefore more directly reflects the ability of the model to identify local cycle-level gait patterns.

The classification results of the nine evaluated classifiers under the gait-cycle-level 7:3 split setting are summarized in Table 3. Most classifiers achieved high performance on this task, suggesting that the proposed multidomain fused representation effectively distinguishes PD gait cycles from normal gait cycles.

Among all classifiers, Multilayer Perceptron (MLP) achieved the best overall performance at the gait-cycle level with the highest Accuracy (0.9911) and F1-score (0.9947). Random Forest yielded the highest Recall (0.9986) and AUC (0.9990). KNN also showed highly competitive performance, achieving an Accuracy of 0.9894 and an F1-score of 0.9938. Overall, under the present random gait-cycle-level split setting, the multidomain fused features provided strong discrimination between PD and Normal gait cycles, suggesting that the proposed complexity–imbalance–deviation representation is effective for identifying abnormal cycle-level gait patterns.

In addition, subject-level classification was further performed under the same 7:3 split setting. In this case, each sample corresponds to an individual subject rather than a single aligned gait cycle, and the results therefore reflect the stability and discriminative ability of the model at the individual level more directly. The corresponding results are summarized in Table 4.

Under the 7:3 split setting, Logistic Regression and AdaBoost achieved the highest Accuracy, both reaching 0.9022. Of the two, Logistic Regression yielded the highest AUC (0.9621), whereas AdaBoost achieved the highest F1-score (0.9302). MLP was also competitive, with an Accuracy of 0.8913, the highest Recall (0.9531), and an AUC of 0.9481. These findings suggest that FID-Gait retains good discriminative performance at the subject level.

Figure 8 presents the visualization results of the top-performing models at two levels. Figure 8a,b shows the ROC curve and the corresponding confusion matrix for MLP, which achieved the highest performance in gait cycle-level classification. At the subject level, AdaBoost was selected for visualization because it achieved the highest F1-score while sharing the highest Accuracy with Logistic Regression. Figure 8c,d presents the ROC curve and the corresponding confusion matrix for AdaBoost. These results further demonstrate the effectiveness of the proposed multidomain fused features for PD gait classification at both the gait-cycle and subject levels.

4.2. Independent Validation Performance

To further assess the within-training stability and generalization ability of FID-Gait, an independent validation set was separated from the training portion at a ratio of 8:2 after the standard 7:3 split, and validation was performed at both the gait-cycle and subject levels before testing.

At the gait-cycle level, the independent validation results remained competitive, although the overall performance was lower than in the 7:3 split experiment (best Accuracy of 0.9055 versus 0.9911), and the best-performing classifier varied across metrics. Gradient Boosting achieved the highest Accuracy (0.9055) and F1-score (0.9460), whereas MLP achieved the highest AUC (0.9023). AdaBoost also showed competitive performance with an Accuracy of 0.9046 and an F1-score of 0.9454. These findings indicate that the gait-cycle-level features retained discriminative value across different training-data sub-splits despite some sensitivity to data partitioning.

At the subject level, the validation results showed a pattern broadly consistent with that of the 7:3 split experiment, although the top-performing classifier changed under the new sub-split. Random Forest achieved the highest Accuracy (0.8837) and F1-score (0.9091), whereas MLP yielded the highest AUC (0.9154) and maintained strong overall performance. AdaBoost also remained stable, achieving an Accuracy of 0.8605 and an F1-score of 0.8889. Overall, these results suggest that FID-Gait maintains good stability and generalization capability, particularly at the subject level. The representative independent validation results at the gait-cycle and subject levels are summarized in Table 5.

4.3. Contribution Analysis of Feature Domains

To assess the contribution of different feature domains to classification performance, an ablation study was performed. Starting from the full feature set, three settings were constructed by removing the PLPI, deviation, and fractal domains, respectively, while keeping the same training and testing configuration. Table 6 presents the results obtained using MLP at the gait-cycle level and AdaBoost at the subject level.

At the gait-cycle level, all ablation settings reduced performance compared with the full feature set. The full feature set achieved the best results with Accuracy, Precision, Recall, F1-score, and AUC scores of 0.9911, 0.9940, 0.9955, 0.9947, and 0.9973, respectively. Removing the fractal domain caused the largest decline, reducing Accuracy, F1-score, and AUC to 0.9799, 0.9882, and 0.9891, respectively. Removing the PLPI domain also decreased performance with Accuracy and F1-score dropping to 0.9893 and 0.9937. In contrast, removing the deviation domain produced only a small decline with Accuracy, F1-score, and AUC results of 0.9909, 0.9946, and 0.9969. These results indicate that the fractal domain contributed the most at the gait-cycle level.

At the subject level, the full feature set again achieved the best overall performance, with Accuracy, F1-score, and AUC values of 0.9022, 0.9302, and 0.9235, respectively. Removing the PLPI domain reduced Accuracy and F1-score to 0.8696 and 0.9048, although AUC increased slightly to 0.9364. Removing the deviation domain caused the largest decline, reducing Accuracy, F1-score, and AUC to 0.8152, 0.8722, and 0.8677, respectively. Removing the fractal domain also lowered Accuracy and F1-score to 0.8696 and 0.9032, with an AUC of 0.9219.
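The subject-level degradation values quoted above can be tabulated as differences from the full feature set. The following is a minimal sketch that uses only the Accuracy, F1-score, and AUC values reported in this subsection for AdaBoost; the dictionary layout and variable names are illustrative assumptions and not part of the original pipeline.

```python
# Subject-level AdaBoost results quoted in the text (Table 6).
full = {"Accuracy": 0.9022, "F1": 0.9302, "AUC": 0.9235}
ablated = {
    "w/o PLPI": {"Accuracy": 0.8696, "F1": 0.9048, "AUC": 0.9364},
    "w/o deviation": {"Accuracy": 0.8152, "F1": 0.8722, "AUC": 0.8677},
    "w/o fractal": {"Accuracy": 0.8696, "F1": 0.9032, "AUC": 0.9219},
}

# Degradation = full-set score minus ablated score (positive means a loss).
drops = {
    setting: {metric: round(full[metric] - scores[metric], 4) for metric in full}
    for setting, scores in ablated.items()
}

for setting, values in drops.items():
    print(setting, values)
```

In this sketch a negative entry marks a metric that improved after removal, which is the case only for the AUC of the setting without the PLPI domain.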

Figure 9 summarizes the contribution of each feature domain by showing the average performance degradation of each ablation setting relative to the full feature set across all classifiers and both evaluation levels. Larger values indicate greater performance loss after removal of the corresponding domain. Among the three domains, removing the fractal domain caused the largest decreases in Accuracy, Recall, and F1-score, indicating its dominant contribution to discriminative performance. Removing the PLPI domain also led to clear reductions, particularly in Recall and F1-score, highlighting its complementary role. In contrast, removing the deviation domain produced relatively smaller decreases across most metrics, although its effect on Precision remained noticeable. These results are consistent with Table 6 and further support the complementary value of the proposed multidomain feature representation. The corresponding statistical results are provided in Appendix B (Table A2).

4.4. Comparative Evaluation of Fractal-Dimension Components

As shown in Table 6 and Figure 9, removing the fractal domain caused the most pronounced performance degradation among all feature-domain ablation settings. Therefore, a refined ablation experiment was further conducted to compare different fractal-dimension components. Starting from a baseline model without any fractal features, four additional settings were constructed by introducing only one fractal descriptor, namely HFD, PFD, KFD, or BCFD, while keeping the remaining non-fractal features unchanged. The results of all nine classifiers under each setting are listed in Table 7, and the gait-cycle-level and subject-level results averaged across classifiers are summarized in Table 8 and Table 9, respectively.

From the averaged gait cycle-level results, the setting with only HFD achieved the best performance with mean Accuracy, mean F1-score, and mean AUC values reaching 0.9205, 0.9470, and 0.9571, respectively, all of which were higher than those of the baseline model without fractal features (0.9036, 0.9347, and 0.9442). The settings with only KFD and only BCFD also yielded performance improvements to varying degrees, whereas the setting with only PFD showed slightly lower mean Accuracy and mean F1-score values than the baseline despite a marginally higher mean AUC. Overall, HFD was the most effective single fractal component at the gait cycle level.

At the individual classifier level, MLP under the “Base + only HFD” setting achieved the highest gait-cycle-level Accuracy and F1-score, with values of 0.9887 and 0.9933, respectively. Compared with the corresponding baseline MLP model, its Accuracy, F1-score, and AUC were all improved. Similar trends were also observed for several other classifiers, such as KNN, SVM, and Decision Tree, further indicating that HFD provided the most effective complementary discriminative information among the four fractal-dimension descriptors.

For the subject-level evaluation, only the averaged cross-classifier results are reported, rather than listing the detailed performance of each classifier individually, because the main purpose of this analysis is to provide a complementary validation of the gait cycle-level findings from the perspective of aggregated decision making. As shown in Table 9, the setting with only PFD achieved the best averaged subject-level performance with Accuracy, Precision, Recall, F1-score, and AUC reaching 0.8563, 0.9010, 0.8924, 0.8962, and 0.9044, respectively, all of which were higher than those of the baseline model. In contrast, although HFD, KFD, and BCFD also showed certain gains, their overall subject-level performance remained inferior to that of PFD. These findings suggest that HFD was more advantageous at the gait cycle level, whereas PFD exhibited better overall adaptability after subject-level aggregation.

4.5. Cross-Validation Performance and Robustness

To further evaluate the robustness of the proposed FID-Gait framework, five-fold stratified cross-validation was conducted at both the gait cycle level and the subject level for all classifiers. For each evaluation metric, the mean across the five folds and the corresponding 95% confidence interval (CI) were calculated from the fold-wise results using the t-distribution. The results are summarized in Table 10 and Table 11.
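The fold-wise confidence interval described above can be sketched as follows. This is a minimal illustration under stated assumptions: the fold scores are hypothetical values chosen for the example, the function name is illustrative, and the critical values are standard two-sided 95% Student's t table entries (2.776 for four degrees of freedom, i.e., five folds).

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom
# (number of folds minus one). Standard table values.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 9: 2.262}


def mean_ci95(fold_scores):
    """Return (mean, lower, upper) of a t-based 95% CI over fold-wise scores."""
    n = len(fold_scores)
    mean = statistics.mean(fold_scores)
    sd = statistics.stdev(fold_scores)  # sample standard deviation (n - 1)
    half_width = T_CRIT_95[n - 1] * sd / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Hypothetical fold-wise accuracies, for illustration only.
scores = [0.92, 0.94, 0.93, 0.95, 0.91]
mean, lower, upper = mean_ci95(scores)
print(f"mean={mean:.4f}, 95% CI=[{lower:.4f}, {upper:.4f}]")
```

With only five folds the t critical value is considerably larger than the normal-approximation value of 1.96, so the resulting intervals are correspondingly wider.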

As shown in Table 10, the gait cycle-level cross-validation results were broadly consistent with the test-set evaluation. MLP achieved the highest mean Accuracy (0.9921) and F1-score (0.9953), Random Forest yielded the highest mean Recall (0.9986), and SVM achieved the highest mean Precision (0.9962). Random Forest and SVM both reached the highest AUC (0.9991). KNN and Gradient Boosting also maintained strong performance, whereas Logistic Regression and Gaussian Naive Bayes performed relatively worse, with Gaussian Naive Bayes showing the lowest results across most metrics.

At the subject level, the cross-validation results also supported the robustness of the proposed representation, as shown in Table 11. AdaBoost achieved the highest mean Accuracy (0.9314), mean Precision (0.9523), mean F1-score (0.9515), and mean AUC (0.9807), whereas Random Forest obtained the highest mean Recall (0.9627). Gradient Boosting and MLP also showed competitive performance. In contrast, Gaussian Naive Bayes and KNN showed relatively lower subject-level performance, particularly in terms of mean Accuracy and mean F1-score.

Figure 10 further illustrates the distributions of Accuracy and F1-score for the top five classifiers under five-fold cross-validation at both the subject and gait cycle levels. Overall, gait cycle-level models achieved higher performance and smaller inter-fold variability than subject-level models, indicating better stability across data partitions.

At the subject level, AdaBoost showed the best overall performance, with higher central values and a more favorable distribution for both Accuracy and F1-score, which is consistent with the mean results in Table 11. Gradient Boosting, Random Forest, and MLP also performed well, whereas Decision Tree showed lower central values and greater dispersion, indicating lower stability across folds.

At the gait cycle level, MLP showed the best performance, with the highest Accuracy and F1-score and relatively small dispersion, indicating both strong discrimination and high stability across folds. SVM, KNN, and Random Forest also showed strong and stable performance, whereas Decision Tree exhibited slightly lower central values and greater variability.

These visual results are consistent with the quantitative findings in Table 10 and Table 11, further confirming the robustness of the proposed FID-Gait representation under different cross-validation partitions. In particular, AdaBoost performed best at the subject level, whereas MLP achieved the best performance at the gait cycle level.

4.6. Performance Comparison of Machine Learning and Deep Learning Models Across Different Data Levels

Table 12 presents the classification performance of different models at the raw data, subject, and gait-cycle levels. The evaluation metrics include accuracy and average computation time per subject. The raw-data-level deep learning models used plantar insole sensor time-series signals as input, whereas the subject-level and gait-cycle-level models were built on the proposed three-domain feature representation after preprocessing, gait segmentation, and feature extraction. To improve comparability, the classifier settings and training-related parameters were kept consistent whenever applicable.

The deep learning baselines were implemented with a fixed random seed of 42. At the subject level, gait cycles were grouped by subject and ordered by cycle index to form variable-length sequences, which were zero-padded to a common length. The data were split in a stratified manner into training, validation, and test subsets, and feature standardization was fitted on the training set only. All deep learning models were trained using Adam with a learning rate of $1 \times 10^{-3}$, a batch size of 16, and 40 epochs, and the model with the best validation F1-score was retained for final evaluation. The LSTM and BiLSTM models used a single recurrent layer with a hidden size of 64, with the BiLSTM variant using bidirectional recurrence. In the CNN-LSTM and CNN-BiLSTM models, the convolutional front-end consisted of two one-dimensional convolutional layers with kernel size 3, padding 1, and 64 channels, which was followed by the corresponding recurrent layer. The dropout rate was 0.3, and the MLP baseline used 128 hidden units in the first fully connected layer.

At the raw data level, model accuracy ranged from 0.5700 to 0.7000, with CNN-LSTM and CNN-BiLSTM achieving the highest accuracy of 0.7000, whereas MLP required the shortest computation time (0.2974 s). At the subject level, accuracy increased to 0.8387–0.9022, with AdaBoost achieving the highest accuracy of 0.9022; computation times were similar across models (0.9807–0.9876 s). At the gait cycle level, all models achieved accuracies above 0.9900. BiLSTM obtained the highest accuracy (0.9937), whereas MLP showed the shortest average computation time per subject (1.3342 s), indicating a favorable balance between accuracy and efficiency.

The computation times reported in the table represent the average per subject. In large-scale or full-dataset scenarios, these differences would accumulate and become more pronounced. Overall, MLP and AdaBoost not only achieved strong classification performance at their respective levels but also showed clear advantages in computational efficiency, indicating greater practical value in real-world applications.

4.7. Subject-Independent Evaluation Using Leave-One-Subject-Out Cross-Validation

To further evaluate the classification performance of the proposed FID-Gait framework under unseen-subject conditions, leave-one-subject-out cross-validation (LOSO) was conducted at the subject level. In each iteration, one subject was used as the test set, and the remaining subjects were used for training. The predictions from all iterations were aggregated to calculate Accuracy, Precision, Recall, F1-score, and AUC under this strict subject-independent setting.
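The LOSO procedure described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this paper: the function names, the toy one-feature threshold classifier, and the sample values are all hypothetical. Predictions are pooled over all iterations before any metric is computed, as stated in the text.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one subject per iteration."""
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx


def loso_predictions(features, labels, subject_ids, fit, predict):
    """Pool the held-out predictions from all LOSO iterations."""
    pooled = [None] * len(labels)
    for _, train_idx, test_idx in loso_splits(subject_ids):
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        for i in test_idx:
            pooled[i] = predict(model, features[i])
    return pooled


# Hypothetical classifier: midpoint between the two class means of one feature.
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Hypothetical subject-level rows (one feature value per subject).
features = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3]
labels = [1, 1, 0, 0, 1, 0]
subjects = ["s1", "s2", "s3", "s4", "s5", "s6"]

pooled = loso_predictions(features, labels, subjects,
                          fit_threshold, predict_threshold)
accuracy = sum(int(p == y) for p, y in zip(pooled, labels)) / len(labels)
print(accuracy)
```

Because the split is keyed on subject identifiers, the same generator also applies when several rows belong to one subject, in which case all rows of the held-out subject leave the training set together.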

The subject-level LOSO results are summarized in Table 13. Overall, all classifiers maintained relatively good performance. Among them, MLP achieved the highest Accuracy (0.8954), Precision (0.9252), F1-score (0.9252), and AUC (0.9268), whereas Random Forest yielded the highest Recall (0.9533). Gradient Boosting, SVM, and AdaBoost also showed competitive performance, with F1-scores of 0.9144, 0.9125, and 0.9032, respectively. By contrast, Gaussian Naive Bayes and Decision Tree showed relatively lower performance.

Overall, although LOSO is a more stringent evaluation protocol, the proposed FID-Gait framework still achieved good classification performance across multiple classifiers, indicating that the constructed multidomain fused features retain good discriminative ability under subject-independent evaluation.

4.8. Accuracy Comparison with Previous Studies Based on Plantar Insole Sensor Signals for PD Classification

To further evaluate the effectiveness of the proposed framework, Table 14 compares the classification accuracy of the proposed method with representative studies reported in the literature. As shown in the table, previous studies mainly relied on time-domain features, frequency-domain features, combined time- and frequency-domain features, raw time-series signals, or spectrogram image representations derived from insole sensor data. Their reported accuracies ranged from 77.33% to 93.75%, depending on the adopted feature representation and classifier model.

In comparison, the proposed FID-Gait framework achieved superior performance at the gait cycle level, reaching an accuracy of 99.11% with the MLP classifier. In addition, at the subject level, the best-performing classifier, AdaBoost, achieved an accuracy of 90.22%. Unlike methods based only on time-domain or frequency-domain information, the proposed framework integrates three complementary domains, namely the fractal domain, imbalance domain, and deviation domain, thereby providing a more comprehensive representation of PD-related gait characteristics.

Overall, the comparative results suggest that the proposed FID-Gait framework achieves highly competitive performance relative to existing studies on insole sensor data-based PD gait classification, especially at the gait cycle level, while also maintaining strong discriminative ability at the subject level.

5. Discussion

This paper proposed the FID-Gait framework for PD identification based on plantar insole sensor data by integrating fractal-domain, imbalance-domain, and deviation-domain features. The results showed that the proposed multidomain representation achieved strong discriminative performance at both the gait-cycle level and the subject level under the present experimental settings. Under the gait-cycle-level 7:3 split setting, MLP achieved the highest Accuracy (0.9911) and F1-score (0.9947). Under the subject-level 7:3 split setting, AdaBoost and Logistic Regression yielded the highest Accuracy (0.9022), whereas AdaBoost achieved the highest F1-score (0.9302). The five-fold cross-validation results further supported the robustness of the proposed framework within this dataset, with MLP performing best at the gait-cycle level and AdaBoost performing best at the subject level. Collectively, these findings suggest that jointly modeling nonlinear gait complexity, biomechanical imbalance, and deviation from a Normal reference distribution can improve the discriminative capability of PD gait analysis.

The ablation results showed that the fractal domain contributed most substantially to the overall performance, particularly at the gait-cycle level, suggesting that fractal features capture aspects of gait complexity that are not adequately represented by conventional statistical descriptors. The imbalance domain provided complementary information related to plantar loading distribution and gait-phase organization, whereas the deviation domain appeared to be especially important at the subject level. In particular, removing the deviation domain reduced subject-level Accuracy from 0.9022 to 0.8152, supporting its relevance for subject-wise global abnormality assessment. These findings further support the complementary roles of the three feature domains in characterizing PD-related gait abnormalities.

A notable result of this paper is that the optimal classifier varied across evaluation protocols. Under the subject-level 7:3 split and five-fold cross-validation settings, AdaBoost showed the most favorable overall performance, whereas under the stricter subject-level LOSO setting, MLP became the best-performing classifier. Because LOSO leaves one entire subject out for testing in each iteration, it provides a more conservative assessment of subject-independent performance than random split or conventional k-fold validation [71]. In this context, our results suggest that AdaBoost was more effective at capturing discriminative structure under dataset-dependent partitioning, whereas MLP showed comparatively stronger performance under the LOSO protocol within this dataset. The representative independent validation results further showed that validation-stage stability and final test performance were not necessarily identical, although MLP, AdaBoost, Gradient Boosting, and Random Forest consistently remained among the strongest models. This observation suggests that the performance advantage of FID-Gait is mainly associated with the multidomain fused representation itself rather than reliance on a single classifier.

From an application-oriented perspective, the proposed framework showed a more favorable balance between accuracy and efficiency than direct raw-signal modeling under the current offline experimental setting. At the raw-data level, model Accuracy ranged from 0.5700 to 0.7000, whereas after the introduction of the three-domain fused features, the selected classifiers achieved Accuracy values of 0.9022 at the subject level with AdaBoost and 0.9911 at the gait-cycle level with MLP. In addition, AdaBoost at the subject level and MLP at the gait-cycle level showed relatively short average computation times. These findings indicate that interpretable multidomain features can provide an efficient alternative to direct end-to-end raw-signal modeling in the present task setting. However, these results should be interpreted strictly within the scope of offline experiments on a single public dataset. Although the proposed framework showed strong performance under random split, cross-validation, and LOSO evaluation within this dataset, its external generalizability has not yet been established. In particular, robustness across independent cohorts, acquisition protocols, wearable devices, and real-world clinical environments remains to be confirmed in future studies.

5.1. Real-World Applicability and Feasibility of Real-Time Implementation

Although this research was conducted offline, the results provide preliminary evidence for practical applicability. In real-time deployment, total system latency depends not only on classifier inference but also on signal acquisition, preprocessing, gait-cycle segmentation, and feature extraction. Previous studies have shown that wearable real-time gait systems must balance accuracy, latency, energy consumption, and edge-device resource constraints [64,72].

In this context, FID-Gait shows potential for near-real-time application. The relatively short computation times of AdaBoost and MLP, together with the LOSO performance of MLP on unseen subjects, support its potential for deployment to new users. However, the reported computation times were obtained offline and should not be regarded as end-to-end latency in real wearable systems. In free-living scenarios, factors such as sensor displacement, unstable contact, gait-speed variation, turning, noise, and data loss may affect the stability of gait segmentation and feature extraction. Therefore, the present findings support the feasibility of translating FID-Gait to real-time wearable applications, although its practical performance still requires end-to-end validation on embedded or edge platforms.

5.2. Limitations

Several limitations should be noted. First, the proposed framework was evaluated mainly under offline conditions; thus, its end-to-end latency, operational stability, and real-time feasibility in wearable applications remain to be verified. In addition, although random split, five-fold cross-validation, and leave-one-subject-out validation were performed, this paper was conducted using only one public dataset for binary PD-versus-Normal classification. Therefore, the reported performance mainly reflects within-dataset generalization, and external generalizability to other cohorts, sensor setups, and acquisition conditions remains unconfirmed. Validation in larger, multicenter, multi-device, and more heterogeneous clinical cohorts is still required.

Another limitation is related to the acquisition protocol of the public PhysioNet gait dataset. The dataset provides continuous walking recordings rather than repeated trials with standardized rest intervals. Therefore, although a substantial number of gait cycles were extracted, inter-trial consistency and its potential influence could not be specifically evaluated in this paper. This is relevant because gait measurements may be affected by testing procedures and protocol design, and recent studies have emphasized the importance of standardized gait protocols in Parkinson’s disease research [50,51,52,53].

In addition, although the proposed three-domain features are interpretable, they remain handcrafted descriptors and may not fully capture long-range temporal dependencies or more complex sequential dynamics. The deviation domain is also defined relative to a Normal reference distribution, which may be affected by sample composition, acquisition conditions, and device-related variation. Future work should therefore validate the framework in real-world, cross-device, and cross-cohort settings, further examine repeated-trial protocols with predefined inter-trial rest intervals, and further explore integration with sequence-based deep learning models to better balance interpretability and temporal modeling capacity [64,71,72].

6. Conclusions

This paper proposed FID-Gait, a three-domain fusion framework for Parkinson’s disease (PD) identification using plantar insole sensor data. By integrating fractal, plantar-loading–phase imbalance, and deviation domains, the framework provides an interpretable representation of gait complexity, biomechanical imbalance, and global deviation from normal gait.

Experimental results showed that FID-Gait achieved strong discriminative performance at both the gait-cycle and subject levels. At the gait-cycle level, MLP achieved the best performance under the 7:3 split setting with an Accuracy of 0.9911 and an F1-score of 0.9947. At the subject level, Logistic Regression and AdaBoost achieved the highest Accuracy of 0.9022, while AdaBoost obtained the best F1-score of 0.9302. Five-fold cross-validation further supported the robustness of the proposed framework, and subject-level LOSO evaluation provided preliminary evidence of subject-independent generalization within this dataset.

Ablation analysis confirmed that all three domains contributed to the final performance, with the fractal domain showing the largest contribution overall and the deviation domain playing an important role in subject-level classification. These findings suggest that PD gait can be more effectively characterized through the integrated modeling of gait complexity, plantar-loading imbalance, and deviation from normal reference patterns.

Overall, FID-Gait achieved strong performance across multiple evaluation protocols while maintaining interpretability and computational efficiency. However, the present findings are based primarily on offline experiments using a single public dataset. Therefore, further validation on independent external datasets and in real-world, cross-device, and cross-cohort settings remains necessary before broader clinical or wearable deployment can be established.

Author Contributions

Conceptualization, methodology, software development, data curation, and writing—original draft preparation, H.L.; formal analysis, visualization, and funding acquisition, H.L., J.M. and X.R.; validation and investigation, H.L., B.C., Y.C., Q.G. and B.L.; resources and writing—review and editing, J.M., I.B., A.B. and V.T.; supervision, J.M.; project administration, I.B., A.B. and V.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The APC was funded by the authors.

Institutional Review Board Statement

Ethical review and approval were waived for this paper because it was based exclusively on secondary analysis of a publicly available and de-identified dataset (PhysioNet Gait in Parkinson’s Disease) and did not involve any new collection of human participant data or direct subject intervention.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data analyzed in this paper are publicly available from the PhysioNet repository (Gait in Parkinson’s Disease). No new data were generated in this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Methodological Details: K-Means Thresholding, Gait-State Rules, Fractal-Dimension Estimators, and Covariance-/Space-Based Definitions

Appendix A.1. Detailed K-Means Threshold Generation and Gait-State Detection Rules

For the $z$-th plantar zone, let $C_{k_{i}, z, I_{C}}$ denote the set of insole sensor data samples assigned to cluster $k_{i}$ at iteration $I_{C}$, and let $\mu_{C(k_{c}, z, I_{C})}$ denote the corresponding cluster mean. The clustering procedure iteratively updates cluster assignments and cluster means until convergence. Specifically, the sample-assignment set of cluster $k_{i}$ in zone $z$ at iteration $I_{C}$ is defined as

$$C_{k_{i}, z, I_{C}} = \left\{ p_{z}(t_{n}) \;\middle|\; \left| p_{z}(t_{n}) - \mu_{C(k_{i}, z, I_{C} - 1)} \right| \leq \left| p_{z}(t_{n}) - \mu_{C(3 - k_{i}, z, I_{C} - 1)} \right| \right\}, \tag{A1}$$

where $k_{i} \in \{1, 2\}$ and $3 - k_{i}$ denotes the other cluster index in the two-cluster partition. The cluster mean is initialized from the corresponding cluster samples at $I_{C} = 0$ and is iteratively updated for $I_{C} \geq 1$ as

$$\mu_{C(k_{c}, z, I_{C})} = \frac{1}{N_{C(k_{c}, z, I_{C})}} \sum_{p_{z}(t_{n}) \in C_{k_{c}, z, I_{C}}} p_{z}(t_{n}), \tag{A2}$$

where $N_{C(k_{c}, z, I_{C})}$ denotes the number of samples assigned to cluster $k_{c}$ of zone $z$ at iteration $I_{C}$.

After convergence, the adaptive threshold of the $z$-th plantar zone is defined as the mean value of the cluster with the lower centroid, namely,

$$\mathrm{Th}_{p_{z}} = \mu_{C(k_{i}, z, I_{C})}, \quad k_{i} = \arg\min_{i \in \{1, 2\}} \mu_{C(i, z, I_{C})}. \tag{A3}$$

The stopping criterion for cluster $k_{c}$ in zone $z$ is defined as

$$SC(z, k_{c}, I_{C}) = \left| \mu_{C(k_{c}, z, I_{C})} - \mu_{C(k_{c}, z, I_{C} - 1)} \right| < \mathrm{Th}_{SC}(z, k_{c}, I_{C}), \tag{A4}$$

where $\mathrm{Th}_{SC}$ is a preset convergence threshold.
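The two-cluster thresholding of Equations (A1)–(A4) can be sketched for a single plantar zone as follows. This is a minimal illustration under stated assumptions: the initial means are set to the minimum and maximum of the signal (the initial partition is not specified above), ties in Equation (A1) are assigned to the first cluster, and the function name, tolerance, and sample values are hypothetical.

```python
def kmeans_threshold(samples, tol=1e-6, max_iter=100):
    """Two-cluster 1-D k-means; returns the lower centroid as the zone threshold."""
    # Assumed initialization: extreme values of the zone signal.
    means = [min(samples), max(samples)]
    for _ in range(max_iter):
        clusters = ([], [])
        for p in samples:
            # Eq. (A1): assign the sample to the nearer cluster mean.
            k = 0 if abs(p - means[0]) <= abs(p - means[1]) else 1
            clusters[k].append(p)
        # Eq. (A2): update each mean; keep the old mean if a cluster is empty.
        new_means = [sum(c) / len(c) if c else m
                     for c, m in zip(clusters, means)]
        # Eq. (A4): stop when every mean moved less than the preset tolerance.
        converged = all(abs(n - m) < tol for n, m in zip(new_means, means))
        means = new_means
        if converged:
            break
    # Eq. (A3): the threshold is the mean of the cluster with the lower centroid.
    return min(means)


# Hypothetical zone samples: low values (swing) and high values (loading).
zone_samples = [0.0, 1.0, 2.0, 98.0, 100.0, 102.0]
threshold = kmeans_threshold(zone_samples)
print(threshold)
```

Using the lower centroid rather than the midpoint between centroids places the threshold close to the unloaded baseline of the zone, so a crossing is registered early in the loading of that zone.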

The transition points of the six gait states are determined by a set of multizonal detection rules applied to the adaptive thresholds and zonal insole sensor signals. These rules are summarized in Table A1. In this paper, the transition points are determined by threshold-crossing events of zonal plantar insole sensor signals and should not be interpreted as peak-force points.

Table A1. Relationship between gait-state transition points and five-zone plantar insole sensor signals.

WGCS	Time Moments of Transition Points	Symbol	Gait-State Detection Rules
IC	Start	$t_{n} = t_{i = \mathrm{IC}, \mathrm{start}}$	$p_{H} (t_{n - 1}) < \mathrm{Th}_{p_{H}} < p_{H} (t_{n})$
IC	End	$t_{n} = t_{i = \mathrm{IC}, \mathrm{end}}$	$p_{H} (t_{n}) > \mathrm{Th}_{p_{H}} \cap p_{RF} (t_{n - 1}) < \mathrm{Th}_{p_{RF}} < p_{RF} (t_{n})$
LR	Start	$t_{n} = t_{i = \mathrm{LR}, \mathrm{start}}$	determined by the end of IC
LR	End	$t_{n} = t_{i = \mathrm{LR}, \mathrm{end}}$	$p_{MF} (t_{n - 1}) < \mathrm{Th}_{p_{MF}} < p_{MF} (t_{n})$
MS	Start	$t_{n} = t_{i = \mathrm{MS}, \mathrm{start}}$	determined by the end of LR
MS	End	$t_{n} = t_{i = \mathrm{MS}, \mathrm{end}}$	$p_{FF} (t_{n - 1}) < \mathrm{Th}_{p_{FF}} < p_{FF} (t_{n}) \cap p_{H} (t_{n}) < \mathrm{Th}_{p_{H}}$
TS	Start	$t_{n} = t_{i = \mathrm{TS}, \mathrm{start}}$	determined by the end of MS
TS	End	$t_{n} = t_{i = \mathrm{TS}, \mathrm{end}}$	$p_{H} (t_{n}) < \mathrm{Th}_{p_{H}} \cap p_{RF} (t_{n}) < \mathrm{Th}_{p_{RF}} \cap p_{MF} (t_{n}) < \mathrm{Th}_{p_{MF}} \cap p_{T} (t_{n - 1}) < \mathrm{Th}_{p_{T}} < p_{T} (t_{n})$
PS	Start	$t_{n} = t_{i = \mathrm{PS}, \mathrm{start}}$	determined by the end of TS
PS	End	$t_{n} = t_{i = \mathrm{PS}, \mathrm{end}}$	$p_{T} (t_{n + 1}) < \mathrm{Th}_{p_{T}} < p_{T} (t_{n})$
SW	Start	$t_{n} = t_{i = \mathrm{SW}, \mathrm{start}}$	determined by the end of PS
SW	End	$t_{n} = t_{i = \mathrm{SW}, \mathrm{end}}$	$p_{H} (t_{n - 1}) < \mathrm{Th}_{p_{H}} < p_{H} (t_{n})$
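
The single-zone rules in Table A1 reduce to threshold-crossing tests between consecutive samples. The following minimal sketch illustrates the rising-crossing rule (IC start and SW end on the heel zone) and the falling-crossing rule (PS end on the toe zone). The function names, the threshold, and the signal values are hypothetical, and the multizonal conjunctions in the table are not reproduced here.

```python
def rising_crossings(signal, threshold):
    """Indices n with signal[n-1] < threshold < signal[n] (e.g., IC start)."""
    return [n for n in range(1, len(signal))
            if signal[n - 1] < threshold < signal[n]]


def falling_crossings(signal, threshold):
    """Indices n with signal[n+1] < threshold < signal[n] (e.g., PS end)."""
    return [n for n in range(len(signal) - 1)
            if signal[n + 1] < threshold < signal[n]]


# Hypothetical zone signal covering two loading episodes.
zone_signal = [0, 0, 5, 30, 60, 40, 3, 0, 0, 8, 50, 20, 0]
zone_threshold = 10  # hypothetical adaptive threshold for this zone

print(rising_crossings(zone_signal, zone_threshold))
print(falling_crossings(zone_signal, zone_threshold))
```

The strict inequalities mirror the table: a sample that lies exactly on the threshold does not trigger a transition, and the detected instants are crossing events rather than peak-force points.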

Appendix A.2. Detailed Definitions of the Four Fractal-Dimension Estimators

For each regional insole sensor data sequence $p_{(z, S)}^{(j)}(t_{n})$, the four time-domain fractal dimensions, HFD, PFD, KFD, and BCFD, are defined as follows.

Higuchi Fractal Dimension

$$\mathrm{HFD}_{(z, S)}^{(j)} = \frac{\sum_{r = 1}^{R} (x_{r} - \bar{x}) (y_{r} - \bar{y})}{\sum_{r = 1}^{R} (x_{r} - \bar{x})^{2}} \tag{A5}$$

Here, $x_{r} = \log(1 / k_{r})$ and $y_{r} = \log L_{(z, S)}^{(j)}(k_{r})$, where $k_{r}$ is the $r$-th fitting step size and $R$ is the number of scale points involved in the fitting; $\bar{x}$ and $\bar{y}$ are the sample means of $\{x_{r}\}$ and $\{y_{r}\}$, respectively. In addition,

$$L_{(z, S)}^{(j)}(k_{r}) = \frac{1}{k_{r}} \sum_{m = 1}^{k_{r}} L_{(z, S, m)}^{(j)}(k_{r})$$

is the average curve length at step size $k_{r}$, and

$$L_{(z, S, m)}^{(j)}(k_{r}) = \frac{N_{j} - 1}{\left\lfloor \frac{N_{j} - m}{k_{r}} \right\rfloor k_{r}} \sum_{q = 1}^{\left\lfloor \frac{N_{j} - m}{k_{r}} \right\rfloor} \left| p_{(z, S)}^{(j)}(t_{m + q k_{r}}) - p_{(z, S)}^{(j)}(t_{m + (q - 1) k_{r}}) \right|$$

is the normalized curve length of the subsequence with starting index $m$, where $m = 1, 2, \dots, k_{r}$, and $q$ is the segment index in the corresponding subsequence. $\mathrm{HFD}_{(z, S)}^{(j)}$ is estimated from the slope of the least-squares linear fitting between $\log L_{(z, S)}^{(j)}(k_{r})$ and $\log(1 / k_{r})$.

Petrosian Fractal Dimension

$$\mathrm{PFD}_{(z, S)}^{(j)} = \frac{\log_{10}(N_{j})}{\log_{10}(N_{j}) + \log_{10}\left( \frac{N_{j}}{N_{j} + 0.4 N_{\Delta, z, S}^{(j)}} \right)} \tag{A6}$$

Here, $N_{\Delta, z, S}^{(j)}$ is the number of sign changes in the first-order difference sequence of the regional insole sensor data sequence. This metric is mainly used to characterize the local oscillatory complexity of short time series.
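
Equation (A6) can be evaluated directly from the sequence, as in the minimal sketch below. The function name and the test sequences are hypothetical, and the treatment of zero differences (plateaus are not counted as sign changes) is an assumption that is not specified above.

```python
import math


def petrosian_fd(p):
    """Petrosian fractal dimension following Eq. (A6)."""
    n = len(p)
    diff = [b - a for a, b in zip(p, p[1:])]
    # Sign changes between consecutive first-order differences.
    # Assumption: a zero difference does not count as a sign change.
    n_delta = sum(1 for a, b in zip(diff, diff[1:]) if a * b < 0)
    log_n = math.log10(n)
    return log_n / (log_n + math.log10(n / (n + 0.4 * n_delta)))


monotonic = [float(i) for i in range(10)]   # no sign changes
zigzag = [float(i % 2) for i in range(10)]  # sign change at every step

print(petrosian_fd(monotonic))
print(petrosian_fd(zigzag))
```

A sequence without sign changes gives exactly 1 because the second logarithm vanishes, and the value increases with the number of sign changes.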

Katz Fractal Dimension

$$\mathrm{KFD}_{(z,S)}^{(j)} = \frac{\log_{10}(N_{j} - 1)}{\log_{10}(N_{j} - 1) + \log_{10}\left(\frac{d_{(z,S)}^{(j)}}{L_{(z,S)}^{(j)}}\right)} \tag{A7}$$

Here, $L_{(z,S)}^{(j)} = \sum_{n=1}^{N_{j}-1} \left| p_{(z,S)}^{(j)}(t_{n+1}) - p_{(z,S)}^{(j)}(t_{n}) \right|$ denotes the total path length of the regional insole sensor data sequence, and $d_{(z,S)}^{(j)} = \max_{1 \leq n \leq N_{j}} \sqrt{(n-1)^{2} + \left( p_{(z,S)}^{(j)}(t_{n}) - p_{(z,S)}^{(j)}(t_{1}) \right)^{2}}$ is the maximum Euclidean distance from the starting point to any other point on the two-dimensional trajectory $(n, p_{(z,S)}^{(j)}(t_{n}))$. Equivalently, the average path length between adjacent sampling points is defined as $\bar{l}_{(z,S)}^{(j)} = \frac{L_{(z,S)}^{(j)}}{N_{j} - 1}$. This metric characterizes the relationship between waveform tortuosity and overall spatial extension. A larger value indicates a more tortuous and geometrically complex trajectory.
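As a minimal sketch of Eq. (A7), the code below computes the path length from amplitude differences and the extent from the two-dimensional trajectory, exactly as the definitions above are written. The function name and the error raised for a constant sequence are illustrative assumptions.

```python
import math

def katz_fd(p):
    """Katz fractal dimension of sequence p, following Eq. (A7)."""
    n = len(p)
    # Total path length L: sum of absolute amplitude differences.
    path_len = sum(abs(b - a) for a, b in zip(p, p[1:]))
    if path_len == 0:
        raise ValueError("constant sequence: path length is zero")
    # Extent d: largest distance from the first point on the trajectory
    # (sample offset, amplitude offset).
    extent = max(math.hypot(i, p[i] - p[0]) for i in range(n))
    log_steps = math.log10(n - 1)
    return log_steps / (log_steps + math.log10(extent / path_len))

ramp = list(range(11))                          # steady rise, little tortuosity
zigzag = [5 * (i % 2) for i in range(11)]       # repeated large reversals
print(katz_fd(ramp), katz_fd(zigzag))
```

The zigzag accumulates a long path while staying close to its starting level, so the ratio of extent to path length shrinks and the estimate grows, in line with the statement that larger values indicate a more tortuous trajectory.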

Box-Counting Fractal Dimension

$$\mathrm{BCFD}_{(z,S)}^{(j)} = \frac{\sum_{r=1}^{R_{B}} (u_{r} - \bar{u})(v_{r} - \bar{v})}{\sum_{r=1}^{R_{B}} (u_{r} - \bar{u})^{2}} \tag{A8}$$

Here, $u_{r} = \log(1/\varepsilon_{r})$ and $v_{r} = \log N_{B,z,S}^{(j)}(\varepsilon_{r})$, where $\varepsilon_{r}$ is the $r$-th grid side length and $R_{B}$ is the number of grid scales involved in the fitting; $\bar{u}$ and $\bar{v}$ are the sample means of $\{u_{r}\}$ and $\{v_{r}\}$, respectively; and $N_{B,z,S}^{(j)}(\varepsilon_{r})$ is the number of non-empty grids occupied by the regional plantar insole sensor signal data curve when the grid side length is $\varepsilon_{r}$. The slope of the least-squares linear fitting between $\log N_{B,z,S}^{(j)}(\varepsilon_{r})$ and $\log(1/\varepsilon_{r})$ is taken as the estimate of $\mathrm{BCFD}_{(z,S)}^{(j)}$.
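The box-counting estimate for a single waveform can be sketched as follows. Several details are assumptions that the appendix does not specify: the curve is rescaled into the unit square, neighbouring samples are joined by linear interpolation with `sub` intermediate points, and the grid sizes are powers of two. All names are hypothetical.

```python
import math

def least_squares_slope(xs, ys):
    """Slope of the least-squares line through (xs, ys), as in Eq. (A8)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return num / sum((x - x_bar) ** 2 for x in xs)

def box_counting_fd(p, grid_sizes=(4, 8, 16, 32, 64), sub=8):
    """Box-counting dimension of the curve (n, p(t_n)) rescaled to the unit square."""
    n = len(p)
    lo, hi = min(p), max(p)
    span = (hi - lo) or 1.0                       # avoid division by zero
    pts = [(i / (n - 1), (v - lo) / span) for i, v in enumerate(p)]
    us, vs = [], []
    for n_b in grid_sizes:                        # grid side length eps = 1 / n_b
        occupied = set()
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            for s in range(sub + 1):              # walk along the segment
                t = s / sub
                x = x0 + t * (x1 - x0)
                y = y0 + t * (y1 - y0)
                occupied.add((min(int(x * n_b), n_b - 1),
                              min(int(y * n_b), n_b - 1)))
        us.append(math.log(n_b))                  # log(1 / eps)
        vs.append(math.log(len(occupied)))        # log of non-empty grid count
    return least_squares_slope(us, vs)

ramp = list(range(257))
zigzag = [i % 2 for i in range(257)]
print(box_counting_fd(ramp), box_counting_fd(zigzag))
```

The estimate depends on the chosen grid scales and interpolation density, so values from this sketch are only comparable across signals processed with identical settings.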

Appendix A.3. Detailed Definitions of Covariance-Adjusted Deviation and Spatial Box-Counting Fractal Dimension

Here, $\mu_{HC}$ denotes the mean vector of the Normal-control reference distribution, $\tilde{\Sigma}_{HC}$ denotes the regularized covariance matrix of the Normal reference distribution, and $\tilde{\Sigma}_{HC}^{-1}$ denotes its inverse. The Normal-reference mean vector

$$\mu_{HC} = \frac{1}{N_{HC}} \sum_{n=1}^{N_{HC}} x_{HC}^{(n)} \tag{A9}$$

is estimated from the Normal sample set

$$X_{HC} = \{x_{HC}^{(1)}, x_{HC}^{(2)}, \dots, x_{HC}^{(N_{HC})}\}, \tag{A10}$$

where $N_{HC}$ denotes the number of Normal reference samples. Correspondingly, the Normal-reference covariance matrix is defined as

$$\Sigma_{HC} = \frac{1}{N_{HC} - 1} \sum_{n=1}^{N_{HC}} \left(x_{HC}^{(n)} - \mu_{HC}\right) \left(x_{HC}^{(n)} - \mu_{HC}\right)^{\top}. \tag{A11}$$

Considering that the empirical covariance matrix may become ill-conditioned, nearly singular, or yield an unstable inverse under finite-sample conditions, this paper further adopts a diagonal regularization form

$$\tilde{\Sigma}_{HC} = \Sigma_{HC} + \lambda I, \tag{A12}$$

where $I$ is the identity matrix and $\lambda > 0$ is a small regularization parameter. Thus, a more stable precision matrix, i.e., the inverse covariance matrix, can be obtained for deviation estimation.
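The effect of the diagonal regularization in Eq. (A12) can be seen in a small sketch written in plain Python so that it runs without third-party packages. The deviation itself is not restated in this appendix; the sketch assumes the usual Mahalanobis-type form $\sqrt{(x - \mu_{HC})^{\top} \tilde{\Sigma}_{HC}^{-1} (x - \mu_{HC})}$. The function names and the default value of `lam` are hypothetical.

```python
import math

def regularized_covariance(samples, lam=1e-3):
    """Return the mean vector (A9) and the covariance (A11) plus lam * I (A12)."""
    n, d = len(samples), len(samples[0])
    mu = [sum(col) / n for col in zip(*samples)]
    cov = [[0.0] * d for _ in range(d)]
    for x in samples:
        diff = [xi - mi for xi, mi in zip(x, mu)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b] / (n - 1)
    for a in range(d):
        cov[a][a] += lam                      # diagonal regularization
    return mu, cov

def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting."""
    d = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, d):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, d + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * d
    for r in range(d - 1, -1, -1):
        tail = sum(aug[r][c] * y[c] for c in range(r + 1, d))
        y[r] = (aug[r][d] - tail) / aug[r][r]
    return y

def deviation(x, mu, cov_reg):
    """Mahalanobis-type deviation of x from the reference (assumed form)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return math.sqrt(sum(di * yi for di, yi in zip(diff, solve(cov_reg, diff))))

# Reference samples lying on a line give a singular covariance matrix;
# the lam * I term makes the system solvable.
mu, cov = regularized_covariance([[0, 0], [1, 1], [2, 2]], lam=0.1)
print(deviation([1, 2], mu, cov))
```

Solving the linear system instead of forming the inverse explicitly is a design choice of the sketch; both give the same quadratic form when the regularized matrix is well conditioned.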

To further quantify the geometric complexity of the subject-specific sample distribution in this space, a three-dimensional box-counting fractal-dimension analysis was performed. After min–max normalization of the three coordinates into the unit cube $[0, 1)^{3}$, the space was uniformly partitioned into $n_{b} \times n_{b} \times n_{b}$ voxels with side length

$$\epsilon = \frac{1}{n_{b}}. \tag{A13}$$

Let $N_{u}(\epsilon)$ denote the number of occupied voxels of subject $u$ at scale $\epsilon$. For a point cloud with multiscale spatial occupancy,

$$N_{u}(\epsilon) \propto \epsilon^{-D_{B,u}}, \tag{A14}$$

where $D_{B,u}$ denotes the box-counting fractal dimension of subject $u$ in the 3D Gait Feature Space. Taking logarithms yields

$$\log N_{u}(\epsilon) = D_{B,u} \log\left(\frac{1}{\epsilon}\right) + C, \tag{A15}$$

where $C$ is a constant. Therefore, over multiple spatial scales $\epsilon_{k}$, the box-counting fractal dimension was estimated as the slope of the least-squares linear fit between $\log N_{u}(\epsilon_{k})$ and $\log(1/\epsilon_{k})$:

$$D_{B,u} = \frac{\sum_{k=1}^{N_{S}} \left(\log\frac{1}{\epsilon_{k}} - \overline{\log(1/\epsilon)}\right)\left(\log N_{u}(\epsilon_{k}) - \overline{\log N_{u}(\epsilon)}\right)}{\sum_{k=1}^{N_{S}} \left(\log\frac{1}{\epsilon_{k}} - \overline{\log(1/\epsilon)}\right)^{2}}, \tag{A16}$$

where $N_{S}$ is the number of spatial scales, and $\overline{\log(1/\epsilon)}$ and $\overline{\log N_{u}(\epsilon)}$ denote the mean values of $\log(1/\epsilon_{k})$ and $\log N_{u}(\epsilon_{k})$, respectively.
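The voxel-counting procedure of Eqs. (A13)–(A16) can be sketched in a few lines. The grid sizes, the function names, and the clamping of the maximum coordinate into the last voxel (one way to realize the half-open cube) are assumptions for illustration only.

```python
import math

def least_squares_slope(xs, ys):
    """Slope of the least-squares line through (xs, ys), as in Eq. (A16)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return num / sum((x - x_bar) ** 2 for x in xs)

def box_counting_3d(points, grid_sizes=(2, 4, 8, 16)):
    """Box-counting dimension of a 3D point cloud after min-max normalization."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]   # guard constant axes
    xs, ys = [], []
    for n_b in grid_sizes:                             # eps = 1 / n_b, Eq. (A13)
        occupied = {
            tuple(min(int((v - lo) / sp * n_b), n_b - 1)
                  for v, lo, sp in zip(pt, lows, spans))
            for pt in points
        }
        xs.append(math.log(n_b))                       # log(1 / eps)
        ys.append(math.log(len(occupied)))             # log N_u(eps)
    return least_squares_slope(xs, ys)

# A space-filling lattice versus points confined to a diagonal line.
lattice = [(i, j, k) for i in range(16) for j in range(16) for k in range(16)]
line = [(i, i, i) for i in range(64)]
print(box_counting_3d(lattice), box_counting_3d(line))
```

With finite samples the occupied-voxel count saturates at the number of points once the grid becomes fine, so the usable range of scales depends on how many gait cycles each subject contributes.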

Appendix B. Statistical Analysis of Feature-Domain Ablation Effects in Figure 9

To further support the trends shown in Figure 9, statistical comparisons were performed on the classifier-level percentage performance drop values under the three ablation settings: Without PLPI domain features, Without deviation-domain features, and Without fractal domain features. Because Figure 9 shows averaged percentage drops, the statistical tests were conducted on the classifier-level percentage-drop values before averaging.

For each evaluation metric, a Friedman test was first used to assess the overall difference among the three ablation settings. When the Friedman test was significant, post hoc pairwise Wilcoxon signed-rank tests with Holm correction were further conducted.
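The two-stage procedure can be sketched as follows, with rows standing for classifiers and columns for the three ablation settings. The Friedman statistic below omits the tie correction and therefore assumes no tied values within a row. The Wilcoxon signed-rank p-values would come from any standard statistics package; the raw p-values used here are hypothetical and are not results from the paper.

```python
def friedman_statistic(blocks):
    """Friedman chi-square statistic (no tie correction; assumes no ties per row)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):   # rank 1 = smallest drop
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        scaled = min(1.0, (m - rank) * p_values[i])  # multiplier m, m-1, ..., 1
        running_max = max(running_max, scaled)       # enforce monotonicity
        adjusted[i] = running_max
    return adjusted

# Hypothetical percentage drops for 9 classifiers under 3 ablation settings.
drops = [[0.1 + 0.01 * c, 0.2 + 0.01 * c, 1.0 + 0.01 * c] for c in range(9)]
print(friedman_statistic(drops))

# Hypothetical raw pairwise p-values (illustration only).
print(holm_adjust([0.5, 0.004, 0.012]))
```

Holm's method multiplies the smallest p-value by the number of comparisons, the next by one fewer, and so on, which keeps the family-wise error rate controlled while being less conservative than a plain Bonferroni correction.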

Table A2 shows significant overall differences among the three ablation settings for Accuracy, Precision, Recall, and F1-score. Post hoc comparisons further showed that Without fractal domain features caused significantly larger performance drops than Without PLPI domain features for all four metrics, and it caused significantly larger drops than Without deviation-domain features for Accuracy, Precision, and F1-score. No significant difference was observed between Without PLPI domain features and Without deviation-domain features. These results are consistent with the trend in Figure 9 and further support the strong contribution of the fractal domain to the overall classification performance.

Table A2. Friedman and post hoc Wilcoxon–Holm analyses of classifier-level percentage performance drops under different feature-domain ablation settings corresponding to Figure 9.

Metric	Friedman Statistic	Friedman p-Value	Without PLPI vs. Without Deviation (Holm-Adjusted p)	Without PLPI vs. Without Fractal (Holm-Adjusted p)	Without Deviation vs. Without Fractal (Holm-Adjusted p)
Accuracy	10.8889	0.0043	0.4961	0.0117	0.0234
Precision	14.0000	0.0009	0.1289	0.0117	0.0117
Recall	10.8889	0.0043	0.9102	0.0117	0.1094
F1-score	10.8889	0.0043	0.4961	0.0117	0.0234

Appendix C. Normalized Model-Input Features Used for PD–Normal Comparison

To improve transparency and verifiability, this appendix provides the distributions of the normalized model-input features used for the PD–Normal comparison. In accordance with the main experimental protocol, normalization was fitted using the training subset only and then applied to all samples. The resulting normalized feature distributions are shown in Figure A1.
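The train-only fitting rule can be illustrated with a short sketch. The appendix does not state which normalization was used; z-score standardization with the population standard deviation is assumed here, and the function names are hypothetical.

```python
import math

def fit_standardizer(train_rows):
    """Estimate per-feature mean and standard deviation from training rows only."""
    cols = list(zip(*train_rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]      # constant features keep scale 1
    return means, stds

def apply_standardizer(rows, means, stds):
    """Apply previously fitted statistics to any rows (training or held-out)."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

train = [[0.0, 10.0], [2.0, 30.0]]
held_out = [[3.0, 20.0]]
means, stds = fit_standardizer(train)           # statistics come from train only
print(apply_standardizer(train, means, stds))
print(apply_standardizer(held_out, means, stds))
```

Because the held-out rows never influence the fitted statistics, their normalized values may fall outside the range spanned by the training rows, which is expected and avoids information leakage.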

Figure A1. Boxplots of normalized model-input features used for the PD–Normal comparison. Normalization was fitted on the training subset only and then applied to all samples. (a) Plantar-loading asymmetry features (AS1–AS12). (b) Regional fractal-dimension features (RFD). (c) Gait-phase ratio features (GR1–GR26). (d) The 3D box-counting fractal dimension based on CAD. (e) The covariance-adjusted deviation (CAD).

References

1. Bahureksa, L.; Najafi, B.; Saleh, A.; Sabbagh, M.; Coon, D.; Mohler, M.J.; Schwenk, M. The impact of mild cognitive impairment on gait and balance: A systematic review and meta-analysis of studies using instrumented assessment. Gerontology 2016, 63, 67–83.
2. Mc Ardle, R.; Morris, R.; Wilson, J.; Galna, B.; Thomas, A.J.; Rochester, L. What can quantitative gait analysis tell us about dementia and its subtypes? A structured review. J. Alzheimer’s Dis. 2017, 60, 1295–1312.
3. Murman, D.L. Early treatment of Parkinson’s disease: Opportunities for managed care. Am. J. Manag. Care 2012, 18, S183.
4. Kikkert, L.H.J.; Vuillerme, N.; van Campen, J.P.; Hortobagyi, T.; Lamoth, C.J.C. Walking ability to predict future cognitive decline in old adults: A scoping review. Ageing Res. Rev. 2016, 27, 1–14.
5. Soubra, R.; Chkeir, A.; Novella, J.L. A systematic review of thirty-one assessment tests to evaluate mobility in older adults. Biomed. Res. Int. 2019, 2019, 1354362.
6. Li, H. Analysis of human gait balance based on plantar pressure sensors. In Tekhnologii Peredachi i Obrabotki Informatsii: Materialy Mezhdunarodnogo Nauchno-Tekhnicheskogo Seminara; Belarusian State University of Informatics and Radioelectronics: Minsk, Belarus, 2024; pp. 86–91.
7. Wang, C.; Su, W.; Li, J.; Xu, J. Bidirectional Mamba-Enhanced 3D Human Pose Estimation for Accurate Clinical Gait Analysis. Fractal Fract. 2025, 9, 603.
8. Jun, K.; Lee, K.; Lee, S.; Lee, H.; Kim, M.S. Hybrid deep neural network framework combining skeleton and gait features for pathological gait recognition. Bioengineering 2023, 10, 1133.
9. Palazzo, L.; Suglia, V.; Grieco, S.; Buongiorno, D.; Brunetti, A.; Carnimeo, L.; Amitrano, F.; Coccia, A.; Pagano, G.; D’Addio, G.; et al. A Deep Learning-Based Framework Oriented to Pathological Gait Recognition with Inertial Sensors. Sensors 2025, 25, 260.
10. Palazzo, L.; Suglia, V.; Grieco, S.; Buongiorno, D.; Pagano, G.; Bevilacqua, V.; D’Addio, G. Optimized deep learning-based pathological gait recognition explored through network analysis of inertial data. In Proceedings of the 2025 IEEE International Symposium on Medical Measurements and Applications (MeMeA), Chania, Greece, 28–30 May 2025; pp. 1–5.
11. Alam, M.N.; Garg, A.; Munia, T.T.K.; Fazel-Rezai, R.; Tavakolian, K. Vertical ground reaction force marker for Parkinson’s disease. PLoS ONE 2017, 12, e0175951.
12. Rashidi, H.H.; Pantanowitz, J.; Hanna, M.G.; Tafti, A.P.; Sanghani, P.; Buchinsky, A.; Fennell, B.; Deebajah, M.; Wheeler, S.; Pearce, T.; et al. Introduction to artificial intelligence and machine learning in pathology and medicine: Generative and nongenerative artificial intelligence basics. Mod. Pathol. 2025, 38, 100688.
13. Chakraborty, C.; Bhattacharya, M.; Pal, S.; Lee, S.-S. From machine learning to deep learning: Advances of the recent data-driven paradigm shift in medicine and healthcare. Curr. Res. Biotechnol. 2024, 7, 100164.
14. Mekruksavanich, S.; Jantawong, P.; Jitpattanakul, A. A deep learning-based model for human activity recognition using biosensors embedded into a smart knee bandage. Procedia Comput. Sci. 2022, 214, 621–627.
15. Mandelbrot, B.B. The Fractal Geometry of Nature; W. H. Freeman and Co.: New York, NY, USA, 1982.
16. Braun, O.; Parlitz, U. Estimating fractal dimensions: A comparative review and open source resource. Chaos 2023, 33, 102101.
17. Dierick, F.; Nivard, A.-L.; White, O.; Buisseret, F. Fractal analyses reveal independent complexity and predictability of gait. PLoS ONE 2017, 12, e0188711.
18. Fling, B.W.; Curtze, C.; Horak, F.B. Gait asymmetry in people with Parkinson’s disease is linked to reduced integrity of callosal sensorimotor regions. Front. Neurol. 2018, 9, 215.
19. Seuthe, J.; Hermanns, H.; Hulzinga, F.; D’Cruz, N.; Deuschl, G.; Ginis, P.; Nieuwboer, A.; Schlenstedt, C. Gait asymmetry and symptom laterality in Parkinson’s disease: Two of a kind? J. Neurol. 2024, 271, 4373–4382.
20. Huang, P.-H.; Huang, Y.-C.; Tsai, Y.-T.; Su, M.-C.; Wang, C.-H. Parkinson’s disease classification using gait characteristics and wavelet-based feature extraction. Expert Syst. Appl. 2012, 39, 7338–7344.
21. Perumal, S.V.; Sankar, R. Gait and tremor assessment for patients with Parkinson’s disease using wearable sensors. ICT Express 2016, 2, 168–174.
22. Daliri, M.R. Chi-square distance kernel of the gaits for the diagnosis of Parkinson’s disease. Biomed. Signal Process. Control 2013, 8, 66–70.
23. Naimi, S.; Bouachir, W.; Bilodeau, G.-A. 1D-convolutional transformer for Parkinson disease diagnosis from gait. Neural Comput. Appl. 2024, 36, 1947–1957.
24. Johri, A.; Tripathi, A. Parkinson disease detection using deep neural networks. In Proceedings of the 2019 Twelfth International Conference on Contemporary Computing (IC3), Noida, India, 8–10 August 2019; pp. 1–4.
25. Hammoud, M.; Shcherbak, A.; Istrakova, O.; Shindryaeva, N.; Bril, E.; Passerone, R.; Somov, A. Wrist-worn sensors and machine learning for Parkinson’s disease detection: Investigation of binary and multiclassification problem. IEEE Trans. Instrum. Meas. 2025, 74, 2509611.
26. Shcherbak, A.; Kovalenko, E.; Somov, A. Detection and classification of early stages of Parkinson’s disease through wearable sensors and machine learning. IEEE Trans. Instrum. Meas. 2023, 72, 4007909.
27. Liu, R.; Wang, Z.; Zhao, H.; Qiu, S.; Wang, C.; Shi, X.; Lin, F. Quantitative analysis of lower limb motion in Parkinson’s disease based on inertial sensors. IEEE Sens. J. 2022, 22, 20937–20946.
28. Vun, D.S.Y.; Bowers, R.; McGarry, A. Vision-based motion capture for the gait analysis of neurodegenerative diseases: A review. Gait Posture 2024, 112, 95–107.
29. Salchow-Hömmen, C.; Skrobot, M.; Jochner, M.C.; Schauer, T.; Kühn, A.A.; Wenger, N. Emerging portable technologies for gait analysis in neurological disorders. Front. Hum. Neurosci. 2022, 16, 768575.
30. Wang, H.; Su, B.; Lu, L.; Jung, S.; Qing, L.; Xie, Z.; Xu, X. Markerless gait analysis through a single camera and computer vision. J. Biomech. 2024, 171, 112027.
31. Viswakumar, A.; Rajagopalan, V.; Ray, T.; Gottipati, P.; Parimi, C. Development of a robust, simple, and affordable human gait analysis system using bottom-up pose estimation with a smartphone camera. Front. Physiol. 2022, 12, 784865.
32. Mazumder, O.; Khandelwal, P.; Gavas, R.; Sinha, A. Assessment of insole based gait feature variation with progression of Parkinson’s disease. In Proceedings of the 2018 IEEE SENSORS, New Delhi, India, 28–31 October 2018; pp. 1–4.
33. Vimalajeewa, D.; McDonald, E.; Tung, M.; Vidakovic, B. Parkinson’s disease diagnosis with gait characteristics extracted using wavelet transforms. IEEE J. Transl. Eng. Health Med. 2023, 11, 271–281.
34. Faiem, N.; Asuroglu, T.; Acici, K.; Kallonen, A.; Van Gils, M. Assessment of Parkinson’s disease severity using gait data: A deep learning-based multimodal approach. In Proceedings of the Nordic Conference on Digital Health and Wireless Solutions, Oulu, Finland, 7–8 May 2024; Springer Nature: Cham, Switzerland, 2024; pp. 29–48.
35. Li, H.; Baryskievic, I.A.; Baryskievic, A.A.; Tsviatkou, V.Y. A novel integration of bodily-kinesthetic intelligence (BKI) and feature mining methodology: Applications in fall risk assessments. IEEE Sens. J. 2025, 25, 8721–8736.
36. Li, H.; Ma, J.; Chen, Y.; Ren, X. Enhancing fall risk assessment in the elderly based on a hybrid neural network model utilizing plantar pressure sensor data. Biomed. Signal Process. Control 2026, 113, 109183.
37. Lim, C.M.; Ng, H.; Yap, T.T.V.; Ho, C.C. Gait analysis and classification on subjects with Parkinson’s disease. J. Teknol. 2015, 77, 1–6.
38. Hoang, N.S.; Cai, Y.; Lee, C.-W.; Yang, Y.O.; Chui, C.-K.; Chua, M.C.H. Gait classification for Parkinson’s disease using stacked 2D and 1D convolutional neural network. In Proceedings of the 2019 International Conference on Advanced Technologies for Communications (ATC), Hanoi, Vietnam, 17–19 October 2019; pp. 44–49.
39. Channa, A.; Ceylan, R.; Baqai, A. Machine learning for analyzing gait in Parkinson’s patients using wearable force sensors. In Proceedings of the International Conference on Intelligent Technologies and Applications, Bahawalpur, Pakistan, 23–25 October 2018; Springer: Singapore, 2018; pp. 548–559.
40. Hausdorff, J.M. Gait dynamics in Parkinson’s disease: Common and distinct behavior among stride length, gait variability, and fractal-like scaling. Chaos 2009, 19, 026113.
41. Phinyomark, A.; Larracy, R.; Scheme, E. Fractal analysis of human gait variability via stride interval time series. Front. Physiol. 2020, 11, 333.
42. Suglia, V.; Camardella, C.; Rinaldi, G.; Chiaradia, D.; Buongiorno, D.; Zhou, H.; Frisoli, A.; Leonardis, D.; Bevilacqua, V. Muscle network analysis of a dynamic bilateral task with an upper limb exoskeleton. In Proceedings of the 2025 International Conference on Rehabilitation Robotics (ICORR), Chicago, IL, USA, 23–27 June 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 419–424.
43. Zanardi, A.P.J.; da Silva, E.S.; Costa, R.R.; Passos-Monteiro, E.; dos Santos, I.O.; Kruel, L.F.M.; Peyré-Tartaruga, L.A. Gait parameters of Parkinson’s disease compared with healthy controls: A systematic review and meta-analysis. Sci. Rep. 2021, 11, 752.
44. König, N.; Singh, N.B.; Baumann, C.R.; Taylor, W.R. Can gait signatures provide quantitative measures for aiding clinical decision-making? A systematic meta-analysis of gait variability behavior in patients with Parkinson’s disease. Front. Hum. Neurosci. 2016, 10, 319.
45. di Biase, L.; Raiano, L.; Caminiti, M.L.; Pecoraro, P.M.; Di Lazzaro, V. Parkinson’s disease wearable gait analysis: Kinematic and dynamic markers for diagnosis. Sensors 2022, 22, 8773.
46. Hausdorff, J.M. Gait in Parkinson’s Disease. Available online: http://www.physionet.org (accessed on 17 March 2026).
47. Toledo, S.F. Effect of gait speed on gait rhythmicity in Parkinson’s disease: Variability of stride time and swing time respond differently. J. Neuroeng. Rehabil. 2005, 2, 23.
48. Hausdorff, J.M.; Lowenthal, J.; Herman, T.; Gruendlinger, L.; Peretz, C.; Giladi, N. Rhythmic auditory stimulation modulates gait variability in Parkinson’s disease. Eur. J. Neurosci. 2007, 26, 2369–2375.
49. Yogev, G.; Giladi, N.; Peretz, C.; Springer, S.; Simon, E.S.; Hausdorff, J.M. Dual tasking, gait rhythmicity, and Parkinson’s disease: Which aspects of gait are attention demanding? Eur. J. Neurosci. 2005, 22, 1248–1256.
50. Mancini, M.; Hausdorff, J.M.; Pelosin, E.; Bonato, P.; Camicioli, R.; Ellis, T.D.; Klucken, J.; Gifford, L.; Fasano, A.; Nieuwboer, A.; et al. A framework to standardize gait study protocols in Parkinson’s disease. J. Park. Dis. 2025, 15, 129–139.
51. Howard, C.K.; Rhea, C.K.; Moxey, J.R.; Langerhans, K.; Prupetkaew, P.; Samulski, B.S. How Many Trials Are Needed for Consistent Clinical Gait Assessment? Appl. Sci. 2025, 15, 12740.
52. Stuck, A.K.; Bachmann, M.; Füllemann, P.; Josephson, K.R.; Stuck, A.E. Effect of testing procedures on gait speed measurement: A systematic review. PLoS ONE 2020, 15, e0234200.
53. Majlesi, M.; Azadian, E.; Farahpour, N.; Bakhtiarian, R.; Nobari, H. Fatigue-Inducing Protocols in Parkinson’s Disease: Implications for Gait Assessment and Rehabilitation: A Systematic Review. Park. Dis. 2026, 2026, 8822220.
54. Alharthi, A. Explainable Gait Multi-Anchor Space-Aware Temporal Convolutional Networks for Gait Recognition in Neurological, Orthopedic, and Healthy Cohorts. Mathematics 2026, 14, 230.
55. Yin, W.; Zhu, W.; Gao, H.; Niu, X.; Shen, C.; Fan, X.; Wang, C. Gait analysis in the early stage of Parkinson’s disease with a machine learning approach. Front. Neurol. 2024, 15, 1472956.
56. Kim, Y.; Kuk, M.; Singh, R.E.; Holmes, G.; Wren, T.A.L. A deep-learning approach for automatically detecting gait-events based on foot-marker kinematics in children with cerebral palsy—Which markers work best for which gait patterns? PLoS ONE 2022, 17, e0275878.
57. Kimijanová, J.; Svoboda, Z.; Han, J. Editorial: Sensory control of posture and gait: Integration and mechanisms to maintain balance during different sensory conditions. Front. Hum. Neurosci. 2024, 18, 1378599.
58. Taiar, R.; Bernardo-Filho, M.; Sañudo, B.; Ivanenko, Y. Editorial: The Relationship Between Neural Circuitry and Biomechanical Action. Front. Hum. Neurosci. 2022, 16, 838028.
59. Duysens, J.; Forner-Cordero, A. A controller perspective on biological gait control: Reflexes and central pattern generators. Annu. Rev. Control 2019, 48, 392–400.
60. Project Zero, Harvard University. Multiple Intelligences. Available online: https://pz.harvard.edu/projects/multiple-intelligences (accessed on 17 March 2026).
61. Hao, L. Nonparametric walking model based on relationship between plantar insole sensor signal time-spatial features. In Novye Gorizonty–2025: Sbornik Materialov XII Belorussko-Kitaiskogo Molodezhnogo Innovatsionnogo Foruma; Belarusian National Technical University: Minsk, Belarus, 2025; Volume 1, pp. 78–79.
62. Perry, J.; Burnfield, J.M. Gait Analysis: Normal and Pathological Function, 2nd ed.; SLACK Incorporated: Thorofare, NJ, USA, 2010.
63. PM&R KnowledgeNow. Biomechanics of Normal Gait. Available online: https://now.aapmr.org/biomechanics-normal-gait/ (accessed on 15 April 2026).
64. Prasanth, H.; Caban, M.; Keller, U.; Courtine, G.; Ijspeert, A.; Vallery, H.; von Zitzewitz, J. Wearable sensor-based real-time gait detection: A systematic review. Sensors 2021, 21, 2727.
65. Ramírez-Bautista, J.A.; Huerta-Ruelas, J.A.; Chaparro-Cárdenas, S.L.; Hernández-Zavala, A. Gait Segmentation Method Using a Plantar Pressure Measurement System with Custom-Made Capacitive Sensors. Sensors 2020, 20, 656.
66. Caruso, M.; Cesarelli, G.; Pagano, G.; d’Addio, G.; Cuccarese, M.; Galiero, R.; Romano, M.; Iuppariello, L.; Sansone, M.; Ricciardi, C. Biomechanics Parameters of Gait Analysis to Characterize Parkinson’s Disease: A Scoping Review. Sensors 2025, 25, 338.
67. Wafai, L.; Zayegh, A.; Woulfe, J.; Mahfuzul, S.; Begg, R. Identification of Foot Pathologies Based on Plantar Pressure Asymmetry. Sensors 2015, 15, 20392–20408.
68. Li, A.; Li, C. Detecting Parkinson’s disease through gait measures using machine learning. Diagnostics 2022, 12, 2404.
69. Markovic, F.; Jovanovic, L.; Spalevic, P.; Kaljevic, J.; Zivkovic, M.; Simic, V.; Shaker, H.; Bacanin, N. Parkinsons detection from gait time series classification using modified metaheuristic optimized long short term memory. Neural Process. Lett. 2025, 57, 14.
70. Rangel-Cascajosa, C.; Luna-Perejón, F.; Vicente-Diaz, S.; Domínguez-Morales, M. Gait-based Parkinson’s disease detection using recurrent neural networks for wearable systems. Big Data Cogn. Comput. 2025, 9, 183.
71. Rehman, S.U.; Ali, A.; Khan, A.M.; Okpala, C. Human activity recognition: A comparative study of validation methods and impact of feature extraction in wearable sensors. Algorithms 2024, 17, 556.
72. Dion, G.; Tessier-Poirier, A.; Chiasson-Poirier, L.; Morissette, J.-F.; Brassard, G.; Haman, A.; Turcot, K.; Sylvestre, J. In-sensor human gait analysis with machine learning in a wearable microfabricated accelerometer. Commun. Eng. 2024, 3, 48.

Figure 1. Generalized structure of Bodily-Kinesthetic Control Integration (BKCI)-based multilevel integration and multidomain feature-based walking gait analysis.

Figure 2. Gait state segmentation for PLPI feature generation in the FID-Gait framework. Numbers 1–8 denote the plantar-insole sensor indices on the plantar plane.

Figure 3. Adaptive-threshold-based gait-state segmentation and gait-state percentage distribution in the Normal group. (a) Representative multiregional plantar-pressure signals with adaptive thresholds and segmented gait states across the heel, rearfoot, midfoot, forefoot, and toe regions. (b) Percentage distribution of each segmented gait state within the gait cycle in the Normal group.

Figure 4. Representative plantar-region and channel selection for plantar-loading asymmetry feature construction. (a) Boxplots of subject-level threshold summaries across five plantar regions (Heel, Rearfoot, Midfoot, Forefoot, and Toe) for the Normal and Parkinson groups with Holm-adjusted between-group comparison results and rank-biserial effect sizes. (b) Boxplots of subject-level raw-pressure summaries for channels P6 and P7, together with outlier-ratio and robust-dispersion statistics, used to support the selection of the representative forefoot channel.

Figure 5. Representative regional insole sensor data sequences and fractal-fitting schematics for the five plantar subregions. (a) Heel subregion. (b) Rearfoot subregion. (c) Midfoot subregion. (d) Forefoot subregion. (e) Toe subregion. Each panel presents representative regional insole sensor data sequences from the Normal and Parkinson groups together with the corresponding HFD, PFD, KFD, and BCFD estimation schematics.

Figure 6. Subject-level fractal-dimension summary and CAD comparison between the Normal and Parkinson groups. (a) Radar-style summary of the top 10 regional fractal-dimension features ranked by Holm-adjusted p-values. Blue and orange markers denote the Normal and Parkinson groups, respectively. Red feature labels indicate Holm-significant features. (b) Subject-level CAD distribution comparison using the Wilcoxon rank-sum test; because CAD was analyzed as a single prespecified comparison, multiple-comparison correction was not applicable. Significance labels denote Holm-adjusted results: * p < 0.05, *** p < 0.001, and **** p < 0.0001; ns, not significant after Holm correction.

Figure 7. Comparison of the spatial distributions of Normal and Parkinson gait cycle samples in the 3D Gait Feature Space and their space box-counting fractal dimensions. (a) Spatial distribution of gait cycle samples from a representative Normal subject with occupied boxes for space box-counting fractal-dimension estimation. (b) Spatial distribution of gait cycle samples from a representative Parkinson subject with occupied boxes for space box-counting fractal-dimension estimation. (c) Kernel density estimation (KDE) comparison of subject-level space box-counting fractal dimensions between the Normal and Parkinson groups, with dashed vertical lines indicating the distribution peaks.

Figure 8. Visualization results of the best-performing classifiers at two levels: (a) gait cycle-level ROC curve of MLP, where the blue solid line denotes the model ROC curve and the orange dashed diagonal line denotes the random-classification reference; (b) gait cycle-level confusion matrix of MLP, where darker blue indicates a higher percentage; (c) subject-level ROC curve of AdaBoost, where the blue solid line denotes the model ROC curve and the orange dashed diagonal line denotes the random-classification reference; (d) subject-level confusion matrix of AdaBoost, where darker blue indicates a higher percentage.

Figure 9. Average percentage performance drop under different feature-domain ablation settings across all classifiers at both the gait-cycle and subject levels.

Figure 10. Boxplot distributions of Accuracy and F1-score for the top five classifiers under five-fold cross-validation at the subject level and the gait cycle level. (a) Subject-level Accuracy distributions. (b) Subject-level F1-score distributions. (c) Gait cycle-level Accuracy distributions. (d) Gait cycle-level F1-score distributions. Colored points denote fold-wise results for different classifiers.

Table 1. Comparison of representative plantar-insole-based studies for Parkinson’s disease (PD) gait analysis. CoP, center of pressure; IMU, inertial measurement unit; LDA, linear discriminant analysis; UPDRS, Unified Parkinson’s Disease Rating Scale; SVM, support vector machine; Nor/PD, Normal-versus-PD classification.

Ref	Sensor	Research Task	Main Method	Advantage	Disadvantage
[32]	Insole sensor	Disease progression assessment	Signal data and CoP	Easy to understand, quantifies progression	Small sample size, lacks dynamic expression of progression
[33]	Insole sensor	Classification	Wavelet + fuzzy neural network	Interpretable, simple	Lacks multidomain feature processing, easily affected by noise
[35]	Plantar insole sensor	Fall risk assessment	Feature mining	Structured walking-model features	Generalization requires validation
[36]	Plantar insole sensor	Elderly fall risk assessment	Hybrid neural network model	Improved risk classification using pressure-derived dynamics	Model complexity
[21]	Heel/Toe + IMU	Early PD detection	Gait + tremor (LDA)	Sensitive, easy to implement	Lacks deep data fusion
[34]	Pressure sensor	UPDRS regression	Perceiver + features	High accuracy, available remotely	Complex model

Table 2. Summary of walking dataset construction for binary PD classification.

Characteristics	PD	Normal	Total
Number of subjects	93	73	166
Male/Female (subjects)	58/35	40/33	98/68
Mean age (years)	66.3	66.3	66.3
Number of gait-record files	214	92	306

Table 3. Comparative classification performance of different classifiers under the gait-cycle-level 7:3 split setting. Bold values indicate the best result in each evaluation metric across all classifiers.

Classifier	Accuracy	Precision	Recall	F1-Score	AUC
Decision Tree	0.9716	0.9795	0.9872	0.9833	0.9371
Logistic Regression	0.8666	0.9719	0.8673	0.9166	0.9348
KNN	0.9894	0.9899	0.9977	0.9938	0.9906
Random Forest	0.9875	0.9868	0.9986	0.9927	0.9990
Gaussian Naive Bayes	0.6675	0.9662	0.6288	0.7618	0.8646
Gradient Boosting	0.9585	0.9614	0.9907	0.9759	0.9875
MLP	0.9911	0.9940	0.9955	0.9947	0.9973
SVM	0.9883	0.9955	0.9906	0.9930	0.9989
AdaBoost	0.9332	0.9500	0.9722	0.9609	0.9661

Table 4. Comparative classification performance of different classifiers under the subject-level 7:3 split setting.

Classifier	Accuracy	Precision	Recall	F1-Score	AUC
Decision Tree	0.8152	0.8983	0.8281	0.8618	0.8039
Logistic Regression	0.9022	0.9365	0.9219	0.9291	0.9621
KNN	0.8152	0.8406	0.9063	0.8722	0.8597
Random Forest	0.8478	0.8676	0.9219	0.8939	0.9314
Gaussian Naive Bayes	0.8370	0.9016	0.8594	0.8800	0.8892
Gradient Boosting	0.8696	0.9063	0.9063	0.9063	0.8806
MLP	0.8913	0.8971	0.9531	0.9242	0.9481
SVM	0.8370	0.8889	0.8750	0.8819	0.9018
AdaBoost	0.9022	0.9231	0.9375	0.9302	0.9235

Table 5. Representative independent validation results at the gait-cycle and subject levels.

Level	Classifier	Accuracy	Precision	Recall	F1-Score	AUC
Gait-cycle level	Gradient Boosting	0.9055	0.9417	0.9503	0.9460	0.8821
	MLP	0.8783	0.9568	0.9009	0.9280	0.9023
	AdaBoost	0.9046	0.9421	0.9488	0.9454	0.8626
Subject level	Random Forest	0.8837	1.0000	0.8333	0.9091	0.9026
	MLP	0.8605	0.9286	0.8667	0.8966	0.9154
	AdaBoost	0.8605	1.0000	0.8000	0.8889	0.8821

Table 6. Contribution analysis of different feature-domain combinations using representative classifiers at the gait-cycle and subject levels. Bold values indicate the best performance in each metric within each evaluation level.

Level	Classifier	Feature Setting	Accuracy	Precision	Recall	F1-Score	AUC
Gait-cycle level	MLP	Full feature set	0.9911	0.9940	0.9955	0.9947	0.9973
		Without PLPI domain features	0.9893	0.9923	0.9950	0.9937	0.9951
		Without deviation-domain features	0.9909	0.9934	0.9959	0.9946	0.9969
		Without fractal domain features	0.9799	0.9854	0.9909	0.9882	0.9891
Subject level	AdaBoost	Full feature set	0.9022	0.9231	0.9375	0.9302	0.9235
		Without PLPI domain features	0.8696	0.9194	0.8906	0.9048	0.9364
		Without deviation-domain features	0.8152	0.8406	0.9062	0.8722	0.8677
		Without fractal domain features	0.8696	0.9333	0.8750	0.9032	0.9219
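
One way to read Table 6 is as the change in a metric when one feature domain is removed from the full set. The sketch below computes the accuracy drop for the subject-level AdaBoost rows. The values are copied from the table; the dictionary layout and variable names are illustrative only.

```python
# Subject-level AdaBoost accuracies copied from Table 6
full_accuracy = 0.9022
ablated_accuracy = {
    "Without PLPI domain features": 0.8696,
    "Without deviation-domain features": 0.8152,
    "Without fractal domain features": 0.8696,
}

# Accuracy lost when each domain is removed (positive = the domain helped)
accuracy_drop = {
    setting: round(full_accuracy - acc, 4)
    for setting, acc in ablated_accuracy.items()
}

# List the settings from the largest drop to the smallest
for setting, drop in sorted(accuracy_drop.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{setting}: -{drop:.4f}")

# Setting whose removal costs the most accuracy
largest = max(accuracy_drop, key=accuracy_drop.get)
```

For these rows, removing the deviation-domain features produces the largest subject-level accuracy drop, whereas at the gait-cycle level the largest drop for the MLP occurs when the fractal-domain features are removed (0.9911 to 0.9799).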

Table 7. Comparative evaluation of different fractal-dimension components across all nine classifiers at the gait-cycle level.

Feature Setting	Classifier	Accuracy	Precision	Recall	F1-Score	AUC
Base model without fractal features	Decision Tree	0.9649	0.9746	0.9842	0.9793	0.9220
	Logistic Regression	0.8105	0.9597	0.8099	0.8785	0.8810
	KNN	0.9789	0.9809	0.9944	0.9876	0.9827
	Random Forest	0.9814	0.9814	0.9969	0.9891	0.9982
	Gaussian Naive Bayes	0.5870	0.9493	0.5405	0.6888	0.8131
	Gradient Boosting	0.9465	0.9512	0.9874	0.9690	0.9752
	MLP	0.9799	0.9854	0.9909	0.9882	0.9891
	SVM	0.9581	0.9909	0.9593	0.9748	0.9888
	AdaBoost	0.9255	0.9406	0.9733	0.9567	0.9476
Base + only HFD	Decision Tree	0.9729	0.9795	0.9886	0.9840	0.9378
	Logistic Regression	0.8372	0.9646	0.8382	0.8970	0.9046
	KNN	0.9869	0.9870	0.9977	0.9923	0.9887
	Random Forest	0.9861	0.9853	0.9985	0.9918	0.9989
	Gaussian Naive Bayes	0.6438	0.9603	0.6037	0.7413	0.8512
	Gradient Boosting	0.9555	0.9589	0.9898	0.9741	0.9825
	MLP	0.9887	0.9920	0.9946	0.9933	0.9956
	SVM	0.9812	0.9945	0.9832	0.9888	0.9972
	AdaBoost	0.9319	0.9463	0.9748	0.9604	0.9576
Base + only PFD	Decision Tree	0.9638	0.9743	0.9831	0.9787	0.9207
	Logistic Regression	0.8234	0.9659	0.8201	0.8871	0.8946
	KNN	0.9798	0.9817	0.9946	0.9881	0.9846
	Random Forest	0.9832	0.9825	0.9979	0.9902	0.9985
	Gaussian Naive Bayes	0.5613	0.9579	0.5034	0.6599	0.8168
	Gradient Boosting	0.9478	0.9516	0.9885	0.9697	0.9782
	MLP	0.9816	0.9860	0.9923	0.9891	0.9902
	SVM	0.9639	0.9912	0.9659	0.9784	0.9910
	AdaBoost	0.9232	0.9396	0.9716	0.9553	0.9492
Base + only KFD	Decision Tree	0.9674	0.9763	0.9854	0.9808	0.9273
	Logistic Regression	0.8287	0.9650	0.8274	0.8909	0.8954
	KNN	0.9844	0.9863	0.9953	0.9908	0.9873
	Random Forest	0.9836	0.9827	0.9981	0.9904	0.9986
	Gaussian Naive Bayes	0.6391	0.9493	0.6056	0.7394	0.8187
	Gradient Boosting	0.9523	0.9549	0.9903	0.9723	0.9786
	MLP	0.9855	0.9883	0.9946	0.9914	0.9929
	SVM	0.9744	0.9934	0.9762	0.9847	0.9951
	AdaBoost	0.9274	0.9437	0.9721	0.9577	0.9555
Base + only BCFD	Decision Tree	0.9646	0.9750	0.9834	0.9792	0.9228
	Logistic Regression	0.8210	0.9596	0.8229	0.8860	0.8896
	KNN	0.9823	0.9834	0.9959	0.9896	0.9836
	Random Forest	0.9825	0.9819	0.9977	0.9897	0.9983
	Gaussian Naive Bayes	0.6322	0.9516	0.5953	0.7324	0.8231
	Gradient Boosting	0.9467	0.9512	0.9876	0.9691	0.9775
	MLP	0.9843	0.9878	0.9938	0.9908	0.9919
	SVM	0.9710	0.9918	0.9737	0.9827	0.9941
	AdaBoost	0.9223	0.9403	0.9696	0.9547	0.9504

Table 8. Average gait-cycle-level performance of different fractal-dimension components across the nine classifiers. Bold values indicate the best average performance among all feature settings for each metric.

Feature Setting	Mean Accuracy	Mean F1-Score	Mean AUC
Base model without fractal features	0.9036	0.9347	0.9442
Base + only HFD	0.9205	0.9470	0.9571
Base + only PFD	0.9031	0.9329	0.9471
Base + only KFD	0.9159	0.9443	0.9499
Base + only BCFD	0.9119	0.9416	0.9479
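
The averages in Table 8 can be reproduced from the per-classifier rows of Table 7. As a minimal check, the following Python sketch averages the nine accuracy values for two of the feature settings. The values are copied from Table 7; the variable names are illustrative.

```python
from statistics import mean

# Accuracy per classifier, copied from Table 7 in row order:
# Decision Tree, Logistic Regression, KNN, Random Forest, Gaussian Naive Bayes,
# Gradient Boosting, MLP, SVM, AdaBoost
accuracy_by_setting = {
    "Base model without fractal features": [
        0.9649, 0.8105, 0.9789, 0.9814, 0.5870, 0.9465, 0.9799, 0.9581, 0.9255,
    ],
    "Base + only HFD": [
        0.9729, 0.8372, 0.9869, 0.9861, 0.6438, 0.9555, 0.9887, 0.9812, 0.9319,
    ],
}

# Unweighted mean over the nine classifiers, rounded as in Table 8
mean_accuracy = {
    setting: round(mean(values), 4)
    for setting, values in accuracy_by_setting.items()
}

for setting, value in mean_accuracy.items():
    print(f"{setting}: mean accuracy = {value:.4f}")
```

Because the mean is unweighted, weak classifiers such as Gaussian Naive Bayes pull the average well below the best single-classifier result, so Table 8 is best read as a comparison between feature settings rather than as an absolute performance level.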

Table 9. Average subject-level performance of different fractal-dimension components across the nine classifiers. Bold values indicate the best average performance among all feature settings for each metric.

Feature Setting	Accuracy	Precision	Recall	F1-Score	AUC
Base model without fractal features	0.8357	0.8923	0.8715	0.8806	0.8977
Base + only HFD	0.8358	0.8867	0.8785	0.8815	0.8908
Base + only PFD	0.8563	0.9010	0.8924	0.8962	0.9044
Base + only KFD	0.8418	0.8977	0.8733	0.8847	0.8927
Base + only BCFD	0.8454	0.8973	0.8802	0.8877	0.8915

Table 10. Five-fold stratified cross-validation performance of different classifiers at the gait-cycle level.

Classifier	Accuracy (Mean, 95% CI)	Precision (Mean, 95% CI)	Recall (Mean, 95% CI)	F1-Score (Mean, 95% CI)	AUC (Mean, 95% CI)
AdaBoost	0.9347 (0.9326, 0.9368)	0.9498 (0.9490, 0.9507)	0.9742 (0.9724, 0.9761)	0.9619 (0.9607, 0.9631)	0.9665 (0.9653, 0.9677)
Decision Tree	0.9750 (0.9736, 0.9765)	0.9819 (0.9808, 0.9830)	0.9887 (0.9875, 0.9899)	0.9853 (0.9844, 0.9861)	0.9446 (0.9415, 0.9477)
Gaussian Naive Bayes	0.6773 (0.6729, 0.6817)	0.9664 (0.9652, 0.9676)	0.6406 (0.6350, 0.6462)	0.7705 (0.7665, 0.7744)	0.8662 (0.8638, 0.8687)
Gradient Boosting	0.9607 (0.9602, 0.9613)	0.9634 (0.9627, 0.9640)	0.9913 (0.9905, 0.9921)	0.9771 (0.9768, 0.9774)	0.9880 (0.9871, 0.9889)
KNN	0.9893 (0.9884, 0.9902)	0.9899 (0.9889, 0.9909)	0.9975 (0.9973, 0.9977)	0.9937 (0.9931, 0.9942)	0.9920 (0.9910, 0.9929)
Logistic Regression	0.8693 (0.8669, 0.8717)	0.9721 (0.9717, 0.9726)	0.8704 (0.8677, 0.8731)	0.9184 (0.9169, 0.9200)	0.9353 (0.9331, 0.9376)
MLP	0.9921 (0.9914, 0.9928)	0.9943 (0.9935, 0.9951)	0.9964 (0.9952, 0.9975)	0.9953 (0.9949, 0.9958)	0.9971 (0.9964, 0.9978)
Random Forest	0.9882 (0.9875, 0.9889)	0.9877 (0.9868, 0.9886)	0.9986 (0.9984, 0.9987)	0.9931 (0.9927, 0.9935)	0.9991 (0.9990, 0.9992)
SVM	0.9905 (0.9902, 0.9909)	0.9962 (0.9958, 0.9966)	0.9926 (0.9923, 0.9930)	0.9944 (0.9942, 0.9946)	0.9991 (0.9990, 0.9993)

Table 11. Five-fold stratified cross-validation performance of different classifiers at the subject level.

Classifier	Accuracy (Mean, 95% CI)	Precision (Mean, 95% CI)	Recall (Mean, 95% CI)	F1-Score (Mean, 95% CI)	AUC (Mean, 95% CI)
AdaBoost	0.9314 (0.9058, 0.9570)	0.9523 (0.9022, 1.0000)	0.9534 (0.9331, 0.9736)	0.9515 (0.9348, 0.9682)	0.9807 (0.9718, 0.9896)
Decision Tree	0.8727 (0.8522, 0.8931)	0.9196 (0.8947, 0.9446)	0.8975 (0.8637, 0.9312)	0.9077 (0.8918, 0.9236)	0.8631 (0.8314, 0.8949)
Gaussian Naive Bayes	0.8364 (0.7759, 0.8968)	0.9366 (0.8948, 0.9783)	0.8228 (0.7548, 0.8908)	0.8746 (0.8261, 0.9232)	0.9097 (0.8748, 0.9446)
Gradient Boosting	0.9182 (0.8932, 0.9433)	0.9413 (0.9029, 0.9797)	0.9440 (0.9258, 0.9621)	0.9419 (0.9252, 0.9586)	0.9690 (0.9536, 0.9843)
KNN	0.8529 (0.8427, 0.8631)	0.8756 (0.8414, 0.9098)	0.9254 (0.8689, 0.9818)	0.8975 (0.8864, 0.9085)	0.8947 (0.8519, 0.9375)
Logistic Regression	0.8562 (0.8248, 0.8875)	0.9115 (0.8642, 0.9588)	0.8833 (0.8634, 0.9032)	0.8961 (0.8758, 0.9165)	0.9299 (0.9001, 0.9597)
MLP	0.9083 (0.8599, 0.9567)	0.9259 (0.8704, 0.9814)	0.9485 (0.9148, 0.9822)	0.9358 (0.9032, 0.9685)	0.9395 (0.9109, 0.9681)
Random Forest	0.9085 (0.8845, 0.9325)	0.9141 (0.8708, 0.9574)	0.9627 (0.9445, 0.9808)	0.9367 (0.9215, 0.9519)	0.9662 (0.9581, 0.9744)
SVM	0.8626 (0.8196, 0.9055)	0.9134 (0.8592, 0.9675)	0.8926 (0.8530, 0.9322)	0.9011 (0.8724, 0.9299)	0.9269 (0.8826, 0.9713)
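
Tables 10 and 11 report each metric as a mean with a 95% confidence interval over the five folds. This section does not state how the intervals were computed; a common choice, assumed here, is a Student's t interval on the five fold scores. The sketch below illustrates that calculation. The function name, the fold scores, and the clipping to [0, 1] are all assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t with 4 degrees of freedom
# (five folds); hard-coded to keep the sketch dependency-free.
T_CRIT_DF4 = 2.776


def mean_ci_5fold(scores):
    """Return (mean, lower, upper) for exactly five fold scores, clipped to [0, 1]."""
    if len(scores) != 5:
        raise ValueError("this sketch assumes exactly five folds")
    m = mean(scores)
    # Standard error of the mean uses the sample standard deviation (n - 1)
    half_width = T_CRIT_DF4 * stdev(scores) / sqrt(len(scores))
    return m, max(0.0, m - half_width), min(1.0, m + half_width)


# Hypothetical fold accuracies, NOT taken from the paper
fold_accuracy = [0.92, 0.95, 0.91, 0.94, 0.93]
m, lo, hi = mean_ci_5fold(fold_accuracy)
print(f"{m:.4f} ({lo:.4f}, {hi:.4f})")
```

With only five folds, the interval width is driven mainly by fold-to-fold variability, which may explain why the subject-level intervals in Table 11 are much wider than the gait-cycle-level intervals in Table 10.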

Table 12. Performance comparison of machine learning and deep learning models across different data levels. Raw-data-level models used plantar insole sensor time-series signals as input, whereas subject-level and gait-cycle-level models were built on the proposed three-domain feature representation. Bold values indicate the best-performing model at each evaluation level.

Evaluation Level	Input Type	Model	Average Computation Time per Subject (s)	Accuracy
Raw data level	Plantar insole sensor signal	LSTM	1.4216	0.6500
		BiLSTM	3.0261	0.6800
		CNN-LSTM	1.4706	0.7000
		CNN-BiLSTM	2.6863	0.7000
		MLP	0.2974	0.5700
Subject level	Three-domain features	LSTM	0.9843	0.8710
		BiLSTM	0.9854	0.8710
		CNN-LSTM	0.9864	0.8710
		CNN-BiLSTM	0.9876	0.8387
		MLP	0.9833	0.8548
		AdaBoost	0.9807	0.9022
Gait-cycle level	Three-domain features	LSTM	1.3688	0.9919
		BiLSTM	1.5056	0.9937
		CNN-LSTM	1.5094	0.9909
		CNN-BiLSTM	1.8210	0.9900
		MLP	1.3342	0.9911

Table 13. Comparative classification performance of different classifiers under the subject-level LOSO setting. Bold values indicate the best result in each evaluation metric across all classifiers.

Classifier	Accuracy	Precision	Recall	F1-Score	AUC
Decision Tree	0.7843	0.8190	0.8879	0.8520	0.7144
Logistic Regression	0.8301	0.9050	0.8458	0.8744	0.8903
KNN	0.8529	0.8690	0.9299	0.8984	0.9059
Random Forest	0.8693	0.8718	0.9533	0.9107	0.9149
Gaussian Naive Bayes	0.7353	0.8889	0.7103	0.7896	0.8442
Gradient Boosting	0.8758	0.8826	0.9486	0.9144	0.8879
MLP	0.8954	0.9252	0.9252	0.9252	0.9268
SVM	0.8791	0.9234	0.9019	0.9125	0.9254
AdaBoost	0.8627	0.8909	0.9159	0.9032	0.8912
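
Leave-one-subject-out (LOSO) evaluation holds out all samples of one subject in each fold, so no subject contributes to both training and testing. The following sketch shows only the split logic in plain Python. The helper name `loso_splits` and the subject identifiers are hypothetical, and the paper's actual implementation is not described in this section.

```python
from collections import defaultdict


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for held_out, test_idx in by_subject.items():
        # Training set: every sample that belongs to any other subject
        train_idx = [
            i
            for sid, idxs in by_subject.items()
            if sid != held_out
            for i in idxs
        ]
        yield held_out, train_idx, test_idx


# Hypothetical subject labels for six samples (not from the dataset)
subject_ids = ["S01", "S01", "S02", "S03", "S03", "S03"]
splits = list(loso_splits(subject_ids))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Predictions from the held-out folds are typically pooled before computing the metrics, which is one plausible reading of how a single value per classifier is obtained in Table 13.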

Table 14. Accuracy comparison with representative previous studies on PD classification from insole sensor data. Bold values indicate the best reported accuracy at the corresponding analysis level and highlight the proposed method.

Study	Feature Domain	Accuracy (%)	Classifier
[20]	Frequency-domain features	77.33	NN
[68]	Frequency-domain features	84	SVM
[21]	Time-domain and frequency-domain features	86.9	LDA
[23]	Raw time-series signals	87.97	Hybrid ConvNet-Transformer
[24]	Spectrogram image representations	88.17	CNN
[38]	Time-domain features	88.7	CNN
[69]	Raw time-series signals	89.92	LSTM-PSOGO
[37]	Time-domain features	89.97	BPANN
Proposed	Fractal domain, imbalance domain, and deviation domain (subject-level features)	90.22	AdaBoost
[22]	Time-domain and frequency-domain features	91.2	SVM
[21]	Time-domain features	91.58	LDA
[70]	Raw time-series signals	93.75	RNN
Proposed	Fractal domain, imbalance domain, and deviation domain (gait-cycle-level features)	99.11	MLP

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, H.; Ma, J.; Cao, B.; Ren, X.; Chen, Y.; Guo, Q.; Li, B.; Baryskievic, I.; Baryskievic, A.; Tsviatkou, V. Integrated Fractal Dimensions and Imbalance–Deviation Features for Smart-Insole Walking Gait Analysis: Application to Parkinson’s Disease Detection. Fractal Fract. 2026, 10, 297. https://doi.org/10.3390/fractalfract10050297
