Article

Data Complexity-Aware Feature Selection with Symmetric Splitting for Robust Parkinson’s Disease Detection

Maulana Azad National Institute of Technology, Bhopal 462007, India
*
Author to whom correspondence should be addressed.
Symmetry 2026, 18(1), 22; https://doi.org/10.3390/sym18010022
Submission received: 11 October 2025 / Revised: 10 November 2025 / Accepted: 13 November 2025 / Published: 23 December 2025
(This article belongs to the Section Life Sciences)

Abstract

Speech is one of the earliest-affected modalities in Parkinson’s disease (PD). For more reliable PD evaluation, speech-based telediagnosis studies often use multiple samples from the same subject to capture variability in speech recordings. However, many existing studies split samples—rather than subjects—between training and testing, creating a biased experimental setup that allows data (samples) from the same subject to appear in both sets. This raises concerns for reliable PD evaluation due to data leakage, which results in over-optimistic performance (often close to 100%). In addition, detecting subtle vocal impairments from speech recordings using multiple feature extraction techniques often increases data dimensionality, although only some features are discriminative while others are redundant or non-informative. To address this and build a reliable speech-based PD telediagnosis system, the key contributions of this work are two-fold: (1) a uniform (fair) experimental setup employing subject-wise symmetric (stratified) splitting in 5-fold cross-validation to ensure better generalization in PD prediction, and (2) a novel hybrid data complexity-aware (HDC) feature selection method that improves class separability. This work further contributes to the research community by releasing a publicly accessible five-fold benchmark version of the Parkinson’s speech dataset for consistent and reproducible evaluation. The proposed HDC method analyzes multiple aspects of class separability to select discriminative features, resulting in reduced data complexity in the feature space. In particular, it uses data complexity measures (F4, F1, F3) to assess minimal feature overlap and ReliefF to evaluate the separation of borderline points. 
Experimental results show that the top-50 discriminative features selected by the proposed HDC outperform existing feature selection algorithms on five out of six classifiers, achieving the highest performance with 0.86 accuracy, 0.70 G-mean, 0.91 F1-score, and 0.58 MCC using an SVM (RBF) classifier.

1. Introduction

Parkinson’s disease (PD) is the second most common neurodegenerative disease with an incidence rate of 20/100,000 and prevalence rate of 100/100,000 at the age of 60 or above [1]. PD impacts the brain, impairing both motor and non-motor functions, which indicates the need for accurate and reliable PD telediagnosis systems. Existing works analyze brain stimulation using Electroencephalogram (EEG) signals [2,3,4,5,6], muscle activity through surface electromyography (sEMG) signals [7,8], gait [9,10], handwriting [11,12,13], and speech [14,15] for PD detection. Speech impairments affect 90% of PD patients and can appear at an early age [16,17,18]. Speech-based PD telediagnosis systems use sustained vowel and running speech signals to differentiate PD patients from healthy subjects [19]. PD telediagnosis studies use multiple speech samples and signal processing algorithms to extract features for the early detection of speech impairments such as hypophonia (reduced voice), dysarthria, and dysphonia (breathiness or noise), typically before the age of 45 [20].
To effectively capture impaired speech patterns, existing studies [21,22,23,24,25] have introduced various speech signal processing algorithms (i.e., feature extraction techniques) to extract features such as baseline, time–frequency, vocal, MFCCs, wavelet, and TQWT, each capturing different aspects of speech signals. Deviations from normal speech patterns can be detected through these features, indicating potential vocal disorders and vocal fold irregularities by analyzing fundamental frequency ( F 0 ) stability, pitch, formants, periodicity, and amplitude variations. Table 1 shows the evolution from traditional acoustic feature extraction techniques—such as pitch, jitter, shimmer, and formants—toward advanced feature extraction techniques that detect localized speech patterns associated with early PD-related impairments. Wavelet analysis offers multi-resolution decomposition to detect periodicity deviations, with minimal deviation in healthy subjects and significant deviation in PD patients. MFCCs improve spectral resolution by mapping frequencies to the Mel scale for narrow spectral sampling, enabling the detection of fine articulatory changes. TQWT further advances feature extraction by adaptively tuning the Q-factor, where higher Q values enable finer subband decomposition to better isolate PD-specific signal variations.
This work evaluates the contribution of various speech feature extraction techniques in providing discriminative features and investigates the size threshold for feature selection beyond which features become non-informative, contradictory, or redundant [26,27,28,29]. Similarly to this work, existing studies [25,30,31,32,33,34,35] employ feature extraction or feature selection methods to reduce dimensionality or identify informative features from the original feature set. Analysis of this work and existing studies demonstrates that, at most, the top-50 features (i.e., <7%) are sufficient to achieve the optimal classification performance. For dimensionality reduction, some studies employ feature extraction methods, such as octopus pooling with SVD [30] (32 features) and minimum average maximum tree with SVD [31] (50 features), while others employ feature selection methods, such as MRMR [25] (50 features) and the wrapper method [32] (<20 features). Other studies use deep learning approaches employing two-stage reduction, where [33] uses wolf search optimization (37 features) followed by a sparse autoencoder (8 features), [34] uses Crow search (36 features) followed by a sparse autoencoder (7 features), and [35] uses filter-based selection using ReliefF and Fisher score (60–90 features) followed by variational autoencoders (30 features).
This work explores data complexity measures to select discriminative features that capture multiple aspects of class separability. To this end, the work proposes a hybrid data complexity-based feature selection (HDC) method that combines data complexity measures (F1, F3, and F4) with ReliefF to select a discriminative feature set that effectively enhances classification performance. Class separability is quantified using the F1, F3, and F4 measures. The F1 measure (maximum Fisher’s discriminant ratio) captures the magnitude of feature overlap (i.e., the overlapping region) between classes, whereas F3 and F4 evaluate feature efficiency based on the number of data points (speech samples) within the overlapping region. As a novel contribution, this work introduces F1F3, a metric formed by combining the weights of F1 and F3 to rank features. This combination unifies the magnitude of feature overlap from F1 and the number of data points in the feature overlap from F3 to provide a holistic view of class separability. In addition, to incorporate different aspects of class separability in the final feature set, this work utilizes the F4 measure (which iteratively applies F3 to select a subset of features) and ReliefF (which evaluates the separation of borderline points).
However, the main motivation of this work is to develop a reliable PD telediagnosis system that performs well on unseen test data. In PD telediagnosis, each subject has multiple speech samples. As a result, existing PD studies have shortcomings in their experimental setup: in conventional cross-validation schemes, some samples of a subject are used for training and others for testing, leading to biased results. These results, derived from testing scenarios that do not reflect real-world use, are overestimated and do not generalize well to unseen subjects [36]. The study in [36] confirms this by re-evaluating the generalization of [37], which introduced a speech dataset of 31 subjects (23 PD and 8 healthy) and reported an over-optimistic accuracy of 91.40% using bootstrap resampling. The results show that, using the same dysphonia features with an unbiased Leave-One-Subject-Out (LOSO) cross-validation scheme, which ensured testing on unseen subjects, accuracy decreased significantly from 91.40% to 65.13%. Recently, Sakar et al. [25] introduced a speech dataset consisting of 252 subjects (i.e., 188 PD and 64 healthy subjects) and achieved 86% accuracy using the LOSO cross-validation scheme. Later, studies in [30,31,32,33,34] employed a conventional sample-wise k-fold (10-fold) cross-validation scheme in their experimental setups and reported over-optimistic accuracies of 95–99%.
To ensure reliable PD evaluation, this work introduces a uniform (fair) experimental setup that streamlines research in this domain. This approach facilitates the development of robust PD telediagnosis systems that generalize well to unseen subjects. This is achieved by keeping test data unseen during the training phase at two levels: (1) a modified subject-wise 5-fold benchmark dataset is proposed using an unbiased subject-wise k-fold cross-validation scheme, where the dataset is partitioned subject-wise rather than sample-wise into five train–test set pairs; and (2) if no proper external test data is available—as in the case of k-fold cross-validation—the selection of an optimal feature subset must be conducted independently for each training set [38]. This prevents data leakage (information from the test set is unintentionally used during training) and ensures a fair comparison of PD telediagnosis studies.
The main contributions of the work are delineated below:
  • This work presents the experimental error in the aforementioned studies and introduces a uniform (or fair) experimental setup to streamline research and ensure a fair comparison of results.
  • This work releases a publicly accessible 5-fold benchmark version of the PD speech dataset introduced in [25] for consistent and reproducible evaluation.
  • This work proposes a hybrid data complexity-aware feature selection (HDC) method using data complexity metrics such as F1, F3, and F4 measures along with the ReliefF feature selection algorithm.
  • Empirical analysis demonstrates
    The impact of feature extraction techniques and the size limit at which features remain informative for PD classification;
    An analysis of the top-50 features using the proposed HDC algorithm and existing state-of-the-art feature selection algorithms, including Information Gain, Gain Ratio, ReliefF, and mRMR.
  • This work employs Naive Bayes, Decision Tree, k-Nearest Neighbors (k-NN), and Support Vector Machine (SVM) classifiers to evaluate the efficiency of the proposed PD telediagnosis system based on four evaluation metrics: accuracy, G-mean, F1 measure, and MCC.
The next section describes the related work. Section 3 elaborates the proposed work in detail. Section 4 provides the details of the experimental setup and describes the results obtained and their analysis. The last section concludes the work and also provides the details of future work.

2. Related Work

2.1. Dataset and Speech Feature Categories

The publicly available dataset [39] released by Sakar et al. [25] is used in this study. Table 2 provides details of the dataset, which includes 252 subjects—188 PD patients and 64 healthy individuals—each with three recordings of sustained phonation of the vowel /a/. Table 3 presents the speech feature categories along with their descriptions.
Table 4 presents the equations for each speech feature category along with definitions of all parameters. Baseline speech features extract both linear and nonlinear dysphonia measures. Jitter and shimmer quantify variations in fundamental frequency ( F 0 ) and amplitude, respectively, indicating vocal instability. NHR and HNR assess breathiness and hoarseness by measuring the energy distribution between noise and tonal components, with PD patients typically showing increased NHR and reduced HNR. RPDE quantifies the unpredictability of vocal fold vibrations, where higher values indicate irregular phonation, a distinctive trait of PD dysphonia. PPE evaluates pitch variation irregularities, which increase in PD patients due to unstable vocal fold movements. DFA captures fractal scaling properties in speech, reflecting long-term voice correlations, with PD patients exhibiting altered DFA values due to impaired vocal control. These baseline features collectively assess phonatory instability, breathiness, and nonlinearity in PD speech, aiding in early detection and progression monitoring. Time–frequency features, such as intensity, formant, and bandwidth parameters, are extracted from speech signal spectrograms to analyze deviations in PD patients compared to healthy controls. Vocal fold features, including GNE, EMD-ER, VFER, and GQ, measure irregularities in vocal fold vibrations commonly observed in PD-related dysphonia. MFCCs and their first- and second-order derivatives, computed on the Mel scale, describe spectral and temporal changes in speech. These parameters assess articulatory and phonatory changes, as PD patients often exhibit reduced speech clarity and altered spectral patterns. As shown in Table 4, the cosine function (cos) and π implement the discrete cosine transform (DCT), which converts the log energies of the Mel-filtered bands into a set of decorrelated and compact cepstral coefficients (MFCCs) for efficient speech analysis. 
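The final MFCC step mentioned above (the DCT over log Mel-band energies) can be sketched numerically. The following is a minimal NumPy illustration, not the paper's implementation: it applies an unnormalized DCT-II to a vector of log Mel-band energies. The function name, the coefficient count, and the absence of normalization are our own assumptions; a full MFCC pipeline also includes framing, windowing, the FFT, and Mel filtering, which are omitted here.

```python
import numpy as np

def dct_ii(log_mel_energies, n_coeffs=13):
    """Unnormalized DCT-II of log Mel-band energies -> cepstral coefficients.

    log_mel_energies: 1-D array of shape (n_bands,).
    Returns the first n_coeffs decorrelated cepstral coefficients.
    """
    n = len(log_mel_energies)
    k = np.arange(n_coeffs)[:, None]   # cepstral coefficient index
    j = np.arange(n)[None, :]          # Mel band index
    # cos(pi * k * (2j + 1) / (2n)) is the DCT-II basis
    basis = np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    return basis @ log_mel_energies
```

For a constant input (equal energy in every band), all coefficients beyond the zeroth vanish, which illustrates the decorrelating, energy-compacting behavior that makes cepstral coefficients compact.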
Wavelet transform (WT) decomposes the speech signal into different scales and time positions and measures deviations from exact periodicity in sustained vowels, which are minimal in healthy subjects and significant in PD patients. Unlike traditional WT with fixed Q-factors, TQWT offers flexible time–frequency resolution, where Q-factor ( α ) controls the oscillatory nature of the wavelets, and the redundancy parameter ( β ) determines the overlap between subbands. This enables the detection of timing and amplitude anomalies in speech linked to motor impairments. With tunable resolution, TQWT isolates fine-grained speech variations for discriminative feature extraction in PD detection.

2.2. Existing Feature Selection Algorithms

Feature selection is crucial in domains like bioinformatics [40,41,42], image analysis [43,44], text analysis [45], and intrusion detection [46], which involve high-dimensional data due to large feature sets such as genes, pixels, words, or network attributes—making feature selection essential in enhancing model efficiency by selecting informative features and reducing dimensionality [47]. Similarly, this work employs state-of-the-art filter-based feature selection algorithms—Information Gain (IG), Gain Ratio (GR), ReliefF, and minimum redundancy maximum relevance (mRMR)—which are widely used to identify the most relevant acoustic features for Parkinson’s disease classification.
Information Gain (IG) [48] measures the reduction in uncertainty (entropy) and ranks features by their IG scores. It evaluates the importance of a feature by considering prior class information, which refers to the overall class distribution before splitting on feature A and is represented by $Entropy(D)$, and posterior class information, which represents the class distribution after splitting on A and is given by $\sum_{j \in A} \frac{|D_j|}{|D|} Entropy(D_j)$, as shown in Equation (1).

$$IG(D, A) = Entropy(D) - \sum_{j \in A} \frac{|D_j|}{|D|} Entropy(D_j) \quad (1)$$

where

$$Entropy(D) = -\sum_{i=1}^{C} p_i \log_2(p_i)$$

Here, $p_i$ represents the probability of class $i$ in dataset $D$. A higher IG score indicates a greater reduction in uncertainty (entropy), making the feature more relevant for classification.
The Gain Ratio (GR) mitigates Information Gain’s bias toward highly branched predictors by introducing a normalizing term $H_b$. It serves as the attribute selection criterion in well-known algorithms such as C4.5 [49,50]. Equation (2) shows that GR increases for the same IG as $H_b$ decreases, making GR less likely to favor finer partitions.

$$GR = \frac{IG}{H_b} \quad (2)$$

where

$$H_b = -\sum_{j \in A} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|}$$
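As a concrete illustration of Equations (1) and (2), the following NumPy sketch computes IG and GR for a discrete feature. This is our own minimal version, not the paper's implementation; the function names are illustrative, and continuous acoustic features would first need to be discretized.

```python
import numpy as np

def entropy(values):
    # Shannon entropy (in bits) of a discrete label/value vector
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels):
    # IG(D, A) = Entropy(D) - sum_j (|D_j| / |D|) * Entropy(D_j),
    # where D_j are the partitions induced by the feature's values
    n = len(labels)
    posterior = sum(
        (feature == v).sum() / n * entropy(labels[feature == v])
        for v in np.unique(feature)
    )
    return entropy(labels) - posterior

def gain_ratio(feature, labels):
    # GR = IG / H_b, where H_b is the entropy of the split itself
    # (the feature's own value distribution), penalizing many-valued splits
    h_b = entropy(feature)
    return information_gain(feature, labels) / h_b if h_b > 0 else 0.0
```

A feature that induces pure partitions attains the maximum IG (the dataset entropy itself), while a feature whose partitions mirror the overall class distribution yields IG near zero.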
ReliefF [51] is an extension of Relief [52] that supports multi-class classification and uses the k-Nearest Neighbors approach to handle noisy or incomplete datasets. It ranks features based on their discriminative power for borderline samples by considering both intra-class (i.e., $diff(A, R, H)$) and inter-class nearest neighbors (i.e., $diff(A, R, M(C))$). Here, $A$ is the attribute, $R$ is a randomly selected sample, $H_i$ denotes its $i$-th nearest neighbor from the same class (near-hit), $M_i(C)$ denotes its $i$-th nearest neighbor from a different class $C \neq cl(R)$ (near-miss), and $cl(R)$ represents the class label of the instance $R$. The number of such randomly selected samples is denoted by $m$, and the number of nearest neighbors considered per sample is $k$. ReliefF updates the feature weight vector $W[A]$ by averaging these contributions, as shown in Equation (3).

$$W[A] = W[A] - \sum_{i=1}^{k} \frac{diff(A, R, H_i)}{m \cdot k} + \sum_{C \neq cl(R)} \left[ \frac{P(C)}{1 - P(cl(R))} \sum_{i=1}^{k} \frac{diff(A, R, M_i(C))}{m \cdot k} \right] \quad (3)$$
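The weight update in Equation (3) can be sketched as follows. This is a simplified ReliefF of our own (not the implementation used in the paper): it uses every sample as $R$ (i.e., $m = n$), range-normalized distances, and $diff(A, x_1, x_2) = |x_1[A] - x_2[A]| / \text{range}(A)$.

```python
import numpy as np

def relieff(X, y, k=3):
    """Simplified ReliefF for numeric features; returns one weight per feature.
    Higher weight = near-misses differ more than near-hits on that feature."""
    n, n_feat = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                       # guard constant features
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    W = np.zeros(n_feat)
    for r in range(n):                          # use every sample as R (m = n)
        d = np.abs((X - X[r]) / span).sum(axis=1)
        d[r] = np.inf                           # exclude R itself
        # k near-hits: nearest neighbors from the same class
        hits = np.argsort(np.where(y == y[r], d, np.inf))[:k]
        W -= (np.abs(X[hits] - X[r]) / span).mean(axis=0) / n
        # k near-misses per other class, weighted by class prior
        for C in classes:
            if C == y[r]:
                continue
            miss = np.argsort(np.where(y == C, d, np.inf))[:k]
            w_c = prior[C] / (1.0 - prior[y[r]])
            W += w_c * (np.abs(X[miss] - X[r]) / span).mean(axis=0) / n
    return W
```

A feature that separates the classes receives a positive weight (misses differ, hits do not), while a feature whose variation is unrelated to class receives a weight near or below zero.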
Minimum redundancy maximum relevance (mRMR) [53] selects a feature subset by retaining the most discriminative features and discarding irrelevant ones. It aims to minimize the correlation (redundancy) between features while maximizing the correlation with the class (relevance). mRMR ranks features by maximizing relevance ($I(i, h)$) between a feature and the class while minimizing redundancy ($I(i, j)$) between features, where $|S| (= m)$ denotes the number of features in the subset $S$. Equation (4) demonstrates how relevance and redundancy are evaluated using mutual information (i.e., $I(i, h)$ and $I(i, j)$) for discrete features.

$$\text{Discrete MID:} \quad \max_{i=1}^{m} \left[ I(i, h) - \frac{1}{|S|} \sum_{j \in S} I(i, j) \right], \qquad \text{Discrete MIQ:} \quad \max_{i=1}^{m} \left[ I(i, h) \bigg/ \frac{1}{|S|} \sum_{j \in S} I(i, j) \right] \quad (4)$$

However, Equation (5) defines relevance and redundancy for continuous features, evaluated using the F-statistic (i.e., $F(i, h)$) and the Pearson correlation coefficient (i.e., $c(i, j)$), respectively.

$$\text{Continuous FCD:} \quad \max_{i=1}^{m} \left[ F(i, h) - \frac{1}{|S|} \sum_{j \in S} c(i, j) \right], \qquad \text{Continuous FCQ:} \quad \max_{i=1}^{m} \left[ F(i, h) \bigg/ \frac{1}{|S|} \sum_{j \in S} c(i, j) \right] \quad (5)$$
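A minimal greedy sketch of the continuous FCD criterion in Equation (5) follows. This is our own simplification, not the paper's tooling: relevance is a one-way ANOVA F-statistic, redundancy is the mean absolute Pearson correlation with already-selected features, and features are added one at a time.

```python
import numpy as np

def f_stat(x, y):
    # one-way ANOVA F-statistic of feature x against class labels y
    classes = np.unique(y)
    n, grand = len(x), x.mean()
    between = sum((y == c).sum() * (x[y == c].mean() - grand) ** 2
                  for c in classes) / (len(classes) - 1)
    within = sum(((x[y == c] - x[y == c].mean()) ** 2).sum()
                 for c in classes) / (n - len(classes))
    return between / within

def mrmr_fcd(X, y, n_select):
    """Greedy mRMR (FCD): maximize F(i, h) - mean |corr(i, j)| over selected j."""
    n_feat = X.shape[1]
    relevance = np.array([f_stat(X[:, j], y) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]      # seed with most relevant feature
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                           for s in selected])
            if relevance[j] - red > best_score:
                best, best_score = j, relevance[j] - red
        selected.append(best)
    return selected
```

The redundancy penalty steers the second and later picks away from features that merely duplicate what is already selected, even when their individual relevance is non-zero.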

2.3. Data Complexity Measures

As described in [54], data complexity measures fall into three main categories: (i) feature overlapping measures, (ii) class separability measures, and (iii) measures of geometry, topology, and density of manifolds. These measures are computed from the training dataset X, which contains n labeled samples ( x i , y i ) , where x i X has m features and y i { c 1 , . . . , c t } denotes the class label, with t being the total number of classes. This work employs the data complexity measures outlined below and combines these measures to introduce a novel feature selection method. The feature values of the data complexity measures lie in the range [0, 1], where larger values indicate higher complexity. This work employs only classifier-independent complexity measures to minimize bias and ensure that the feature selection results are not influenced by any specific classifier.
The F1 measure (maximum Fisher’s discriminant ratio) [54] evaluates class separability, which is inversely related to feature overlap (overlapping region), and is defined as
$$F1 = \frac{1}{1 + \max_{j=1}^{m}(f_j)} \quad \text{where} \quad f_j = \frac{(\mu_{c_1} - \mu_{c_2})^2}{\sigma_{c_1}^2 + \sigma_{c_2}^2}$$

Here, $m$ denotes the total number of features; $\mu_{c_1}$ and $\mu_{c_2}$ are the means of feature $j$ for classes $c_1$ and $c_2$, respectively; $\sigma_{c_1}^2$ and $\sigma_{c_2}^2$ are the corresponding variances; and $f_j$ is the Fisher score of feature $j$. The Fisher score of a feature reflects its discriminative power; a high score for any feature leads to a low F1 value (indicating low feature overlap), demonstrating the presence of at least one feature with better class separability than the others in the dataset.
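The F1 definition above translates directly into a few lines of NumPy. The sketch below is our own illustration (function name included) and assumes a two-class problem:

```python
import numpy as np

def f1_measure(X, y):
    """F1 = 1 / (1 + max_j f_j), with f_j the Fisher discriminant ratio
    of feature j; lower F1 means at least one well-separating feature."""
    c1, c2 = np.unique(y)
    A, B = X[y == c1], X[y == c2]
    fisher = (A.mean(axis=0) - B.mean(axis=0)) ** 2 / (A.var(axis=0) + B.var(axis=0))
    return 1.0 / (1.0 + fisher.max())
```

For a feature whose class means are far apart relative to the within-class variances, F1 approaches 0; when every feature's class means coincide, F1 equals 1 (maximum complexity).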
The F3 measure (Maximum Individual Feature Efficiency) [55] evaluates class separability by counting the number of samples that fall inside the overlapping region, thereby estimating the efficiency of each feature in separating the two classes. It is defined as

$$F3 = \min_{j=1}^{m} \frac{n_o(f_j)}{n}, \quad \text{where} \quad n_o(f_j) = \sum_{d=1}^{n} I\left( x_{dj} > maxmin(f_j) \;\wedge\; x_{dj} < minmax(f_j) \right)$$

$$maxmin(f_j) = \max\left( \min(f_j^{c_1}), \min(f_j^{c_2}) \right), \qquad minmax(f_j) = \min\left( \max(f_j^{c_1}), \max(f_j^{c_2}) \right)$$

Here, $n_o(f_j)$ denotes the number of overlapping samples for feature $j$; $I(\cdot)$ is the indicator function; and $\max(f_j^{c_t})$ and $\min(f_j^{c_t})$ denote the maximum and minimum values of feature $j$ in class $c_t$, respectively. A lower F3 value for a feature indicates that fewer samples lie inside the overlapping region.
The F4 measure (Collective feature efficiency) [55] evaluates class separability to select one feature per iteration using the F3 measure, and iteratively removes non-overlapping samples from $X_{train}^{d}$ over $d$ iterations until no overlapping samples remain. F4 provides a feature subset of $d$ features and is defined as

$$F4 = \frac{n_o\left( f_{min}\left( X_{train}^{d} \right) \right)}{n}$$

Here, $d$ denotes the number of features in the subset, and $n_o(f_{min}(X_{train}^{d}))$ represents the iterative F3-based procedure that selects the feature $f_j$ with the minimum F3 score ($f_{min}$) in each iteration and removes non-overlapping samples from $X_{train}$ until no overlap remains.
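Under these definitions, F3 and the iterative F4 procedure can be sketched as follows. This is a minimal NumPy illustration with our own function names; it assumes two classes and uses the strict overlap interval from the $n_o$ definition.

```python
import numpy as np

def f3_per_feature(X, y):
    """Per-feature F3 score: fraction of samples strictly inside the
    class-overlap interval (maxmin, minmax). Lower = more efficient."""
    c1, c2 = np.unique(y)
    A, B = X[y == c1], X[y == c2]
    maxmin = np.maximum(A.min(axis=0), B.min(axis=0))
    minmax = np.minimum(A.max(axis=0), B.max(axis=0))
    return ((X > maxmin) & (X < minmax)).mean(axis=0)   # n_o(f_j) / n

def f4_subset(X, y, max_features=None):
    """F4-style subset: repeatedly pick the lowest-F3 feature, keep only the
    samples still inside its overlap interval, stop when no overlap remains."""
    X, y = X.copy(), y.copy()
    chosen = []
    max_features = X.shape[1] if max_features is None else max_features
    while len(chosen) < max_features and len(np.unique(y)) == 2:
        scores = f3_per_feature(X, y)
        scores[chosen] = np.inf                 # do not reselect a feature
        j = int(np.argmin(scores))
        chosen.append(j)
        c1, c2 = np.unique(y)
        A, B = X[y == c1], X[y == c2]
        maxmin = max(A[:, j].min(), B[:, j].min())
        minmax = min(A[:, j].max(), B[:, j].max())
        keep = (X[:, j] > maxmin) & (X[:, j] < minmax)  # still-overlapping samples
        if not keep.any():
            break                               # no overlap left: subset complete
        X, y = X[keep], y[keep]
    return chosen
```

If a single feature already separates the classes (its overlap interval is empty), F4 selects only that feature and terminates, mirroring the "remove until no overlapping samples remain" rule.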

3. Proposed Work

As mentioned earlier, conventional sample-wise k-fold cross-validation may lead to a faulty experimental setup, as speech samples from the same subject may appear in both the training and test sets. To ensure reliable PD evaluation, train–test data should be split subject-wise rather than sample-wise. This paper mainly focuses on developing a fair and unbiased experimental setup for Parkinson’s disease detection. Figure 1 illustrates the proposed uniform experimental setup, which employs an unbiased subject-wise 5-fold cross-validation. Feature selection is applied separately for each training fold, ensuring that test data does not influence the selection process. This prevents data leakage (information from the test set) and ensures a more reliable evaluation of PD detection with five different feature subsets. In addition, this work proposes a hybrid data complexity-based (HDC) feature selection algorithm to select discriminative features that improve class separability.

3.1. Subject-Wise Dataset Bifurcation: A Fair, Common Platform for Research

This work uses the PD speech benchmark dataset [25], publicly available in the UCI repository, which consists of 252 subjects—188 PD patients (107 males, 81 females) and 64 healthy subjects (23 males, 41 females)—exhibiting class imbalance. The work in [25] publicly released the preprocessed dataset [39] for research, and many works [30,31,32,33,34] encountered problems in designing a fair experimental environment because the dataset is not available in train–test pair format. Therefore, this paper provides a modified subject-wise 5-fold benchmark dataset in a standard format to enable a fair comparison of results and streamline research in this area. The proposed modified subject-wise 5-fold benchmark dataset with proper bifurcation is available online at https://github.com/ArvindKumar-PhD/DataComplexity-aware-FeatureSelection (accessed on 10 October 2025). This work adopts a uniform (or fair) experimental setup to assess model performance in two configurations, as illustrated in Figure 2:
  • Internal test set evaluation (k-fold cross-validation):
    This work employs subject-wise 5-fold cross-validation to create five train–test pairs from a total of 245 subjects—185 PD (105 males, 80 females) and 60 healthy (20 males, 40 females)—using a symmetric data splitting strategy that ensures the same class and gender ratio as in the original population.
  • External test set evaluation:
    The remaining seven subjects—three PD (two males, one female) and four healthy (three males, one female)—are held out to maintain symmetry in the data split used for subject-wise 5-fold cross-validation. In this configuration, these 7 subjects are used as the external test set, and 245 subjects from the earlier setup are used as the training set.
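The subject-wise symmetric split described above can be sketched with a small, self-contained helper (our own stdlib illustration, not the released benchmark script): each subject, rather than each sample, is assigned to exactly one fold, with round-robin assignment within each class so that the class ratio is preserved across folds. The function names and the round-robin scheme are our assumptions; the paper's benchmark additionally preserves the gender ratio, which this sketch omits.

```python
import random
from collections import defaultdict

def subject_wise_stratified_folds(subject_ids, subject_labels, k=5, seed=0):
    """Assign each *subject* to one of k folds, stratified by class.
    Returns {subject_id: fold_index}; all of a subject's samples then
    inherit that single fold, preventing subject-level leakage."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sid, lab in zip(subject_ids, subject_labels):
        by_class[lab].append(sid)
    fold_of = {}
    for lab in sorted(by_class):
        sids = by_class[lab]
        rng.shuffle(sids)
        for i, sid in enumerate(sids):
            fold_of[sid] = i % k        # round-robin within each class
    return fold_of

def split_samples(sample_subjects, fold_of, test_fold):
    """Map a per-sample subject list to train/test sample indices."""
    train = [i for i, s in enumerate(sample_subjects) if fold_of[s] != test_fold]
    test = [i for i, s in enumerate(sample_subjects) if fold_of[s] == test_fold]
    return train, test
```

Because fold membership is decided per subject, the three recordings of any subject can never straddle the train–test boundary, which is precisely the leakage the sample-wise schemes in [30,31,32,33,34] fail to prevent.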

3.2. Hybrid Data Complexity-Based (HDC) Feature Selection

The difficulty in PD detection lies in finding clinically useful information from speech signals. To address this, the study in [25] implements various speech feature extraction techniques to extract informative features, which can enhance the representation of speech disorders but may also introduce redundant, non-informative, or contradictory features. Therefore, identifying the most relevant features is crucial for better classification but remains difficult due to uncertainty about which features to select. This work proposes an HDC algorithm that combines two data complexity categories to assess different aspects of class separability for feature selection [54]:
  • Category M 1 : measures of overlap of individual feature values;
  • Category M 2 : measures of the separability of classes.
The proposed HDC algorithm combines data complexity measures—F1, F3, and F4 [54,55,56]—from the M 1 category, which evaluate class separability based on feature overlap, with ReliefF [51] from the M 2 category, which evaluates class separability based on borderline data points.
Algorithm 1 presents the steps of the proposed HDC algorithm, where features are selected based on their ability to enhance class separability and reduce the underlying complexity of the feature space. A feature with a high F1, F3, or F4 value denotes significant overlap between classes and a more complex feature space, while a low value indicates better class separability (i.e., minimal feature overlap) and a less complex feature space for classification. Figure 3 illustrates lines 7–15 of Algorithm 1, which explain the optimal feature set selection process.
Algorithm 1 Hybrid data complexity-based feature selection
Input:  X_train^(1..k), Y_train^(1..k): subject-wise k-fold training data,
        m: total number of features,
        d: number of features to select via F4 (one per iteration),
        C: number of top features to select (Top-C),
        Corr%: correlation threshold for feature removal,
        μ_{c1}, μ_{c2}: mean values of feature j for classes c1 and c2,
        σ²_{c1}, σ²_{c2}: variances of feature j for classes c1 and c2,
        f_j: individual feature,
        n: number of samples in X_train^(i),
        n_o(f_j): number of overlapping samples for feature j
Output: OptimalSet (optimal feature subset for each training set)
 1: procedure OptimalSet(X_train^(1..k), Y_train^(1..k))
 2:   for i ← 1 to k do                              ▹ Compute optimal features for each fold
 3:     F1 ← 1 / (1 + max_{j=1..m}(f_j)), where f_j ← (μ_{c1} − μ_{c2})² / (σ²_{c1} + σ²_{c2})
                                                     ▹ Individual feature overlap
 4:     F3 ← min_{j=1..m} n_o(f_j) / n               ▹ Ranks features based on overlap
        where n_o(f_j) ← Σ_{d=1..n} I(x_{dj} > maxmin(f_j) ∧ x_{dj} < minmax(f_j)),
        maxmin(f_j) ← max(min(f_j^{c1}), min(f_j^{c2})), minmax(f_j) ← min(max(f_j^{c1}), max(f_j^{c2}))
 5:     F4 ← n_o(f_min(X_train^(i,d))) / n           ▹ Select one feature per iteration (total d features)
 6:     Iteratively remove non-overlapping samples from X_train^(i,d) using the F3 procedure until no overlapping samples remain
 7:     repeat
 8:       Iteratively refine the F4 feature subset
 9:       repeat
10:         F1F3 ← min_{j=1..m}(F1_score + F3_score) ▹ Combine F1 and F3 scores
11:         Remove F1F3 features with correlation exceeding Corr%
12:         ReliefF ← ReliefF(X_train^(i), Y_train^(i))  ▹ ReliefF ranks the features
13:         Remove ReliefF features with correlation exceeding Corr%
14:       until Top-C features are selected
15:     until F4 has no remaining features
16:   end for
17: end procedure
  • This work introduces F1F3, a novel metric that combines the weights of F1 and F3 (lines 10–11) to unify different perspectives of feature overlap for feature ranking, providing a holistic view of class separability.
    The F1 measure (Fisher’s discriminant ratio) evaluates the extent of the overlapping region (i.e., how much the classes overlap) in each feature (line 3), where μ and σ denote the mean and variance of feature f j within each class, respectively. Lower F1 values indicate better class separability (minimal feature overlap) [55].
    The F3 measure considers how many data points lie in the overlapping region (line 4) and evaluates a feature (say, j) based on whether the overlap is densely populated with samples (high complexity) or contains only a few samples (low complexity). A low value indicates better feature efficiency and implies that it can separate more samples.
  • The F4 measure builds on the F3 measure and iteratively reduces class overlap (lines 5–6) by focusing on local class separability to select a set of features. It selects the feature that eliminates the most samples from the overlapping region, continuing until no samples remain in the overlap.
  • ReliefF applies a different strategy to address class overlap by employing k-Nearest Neighbors (k-NN) to rank features based on their discriminative power for borderline points, considering both intra-class and inter-class nearest neighbors. It updates the feature weight vector by evaluating the distances between $x_i$ and its near-hit $H_i$ as well as between $x_i$ and its near-miss $M_i$, as defined in line 12 of Algorithm 1.
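The F1F3 ranking and correlation pruning steps (lines 10–11 and 13 of Algorithm 1) can be sketched as follows. This is our own minimal NumPy illustration, not the released implementation: per-feature F1 and F3 terms are summed for ranking, and a greedy pass then discards features too correlated with those already kept. The small epsilon and the threshold value are illustrative assumptions.

```python
import numpy as np

def f1f3_ranking(X, y):
    """Rank features by the combined F1F3 score (lower = more discriminative).
    Per-feature F1 term: 1 / (1 + fisher_j); per-feature F3 term: n_o(f_j) / n."""
    c1, c2 = np.unique(y)
    A, B = X[y == c1], X[y == c2]
    fisher = ((A.mean(axis=0) - B.mean(axis=0)) ** 2
              / (A.var(axis=0) + B.var(axis=0) + 1e-12))   # eps avoids 0/0
    f1 = 1.0 / (1.0 + fisher)
    maxmin = np.maximum(A.min(axis=0), B.min(axis=0))
    minmax = np.minimum(A.max(axis=0), B.max(axis=0))
    f3 = ((X > maxmin) & (X < minmax)).mean(axis=0)
    return np.argsort(f1 + f3)          # best (lowest combined score) first

def drop_correlated(X, ranked, corr_thresh=0.95):
    """Greedily keep best-ranked features, skipping any whose |Pearson r|
    with an already-kept feature exceeds the threshold (Corr% in Algorithm 1)."""
    kept = []
    for j in ranked:
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) <= corr_thresh
               for s in kept):
            kept.append(int(j))
    return kept
```

When two features are near-duplicates, both earn nearly identical F1F3 scores, so the correlation filter is what prevents the final Top-C set from wasting slots on redundant copies.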

4. Experimental Study and Results

This section presents the results of experiments conducted using the proposed modified subject-wise 5-fold cross-validation dataset. This work investigates the subset size or threshold at which features remain informative and evaluates the contribution of different speech feature extraction techniques in providing discriminative features. This work employs four classification methods in MATLAB 2023a to evaluate the performance of the proposed feature selection algorithm. Each classifier is implemented through the MATLAB built-in library functions, with default parameter settings and automatic kernel scale optimization. Naive Bayes uses normal distribution as an input parameter; Decision Tree applies the Gini split criterion (max splits = 587, pruning = on); k-NN uses k = 3 (as each subject has 3 speech samples, which prevents decision bias), Minkowski distance, exhaustive search, and equal weighting; and SVM employs Linear, Polynomial, and RBF kernels with box constraint = 1, automatic kernel scaling, and the Sequential Minimal Optimization (SMO) solver. The results for the top-50 features are analyzed, comparing the proposed HDC feature selection algorithm with four existing feature selection algorithms—InfoGain, GainRatio, ReliefF, and mRMR—to assess their effectiveness in PD detection. The following evaluation metrics are employed to assess how well a feature selection algorithm differentiates PD patients from healthy subjects:
  • Accuracy: Measures the proportion of correctly classified PD and healthy cases, and is given by
    $\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$
    where TP (true positives) represents correctly detected PD cases, TN (true negatives) indicates correctly identified healthy cases, FP (false positives) is misdiagnosed healthy individuals, and FN (false negatives) denotes missed PD cases.
  • Geometric Mean (G-Mean): Measures the balance between sensitivity (correct PD detection) and specificity (correct identification of healthy individuals), and is given by
    $\text{G-mean} = \sqrt{TP_{rate} \times TN_{rate}}$
    where $TP_{rate} = \frac{TP}{TP + FN}$ and $TN_{rate} = \frac{TN}{FP + TN}$.
  • F1-Score: The F1-score is crucial for evaluating false detections, as it balances precision (which penalizes false positives) and recall (which penalizes false negatives), and is given by
    $F_1\text{-Score} = \frac{2 \times Precision \times Recall}{Precision + Recall}$
    where $Precision = \frac{TP}{TP + FP}$ and $Recall = \frac{TP}{TP + FN}$.
  • Matthews Correlation Coefficient (MCC): MCC plays a vital role in assessing model reliability in clinical applications such as PD detection; a higher value (closer to +1) indicates accurate and reliable classification [57,58], values near 0 suggest random performance, and values near −1 imply poor or misleading predictions.
    $\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
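All four metrics above follow directly from the confusion-matrix counts; a minimal Python sketch (the paper's evaluation is implemented in MATLAB):

```python
import math

def clf_metrics(tp, fp, tn, fn):
    """Compute accuracy, G-mean, F1-score, and MCC from the confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)            # sensitivity (recall): correct PD detection
    tnr = tn / (fp + tn)            # specificity: correct healthy identification
    g_mean = math.sqrt(tpr * tnr)
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "g_mean": g_mean, "f1": f1, "mcc": mcc}
```

A perfect classifier scores 1.0 on all four metrics, while an imbalance between false positives and false negatives pulls G-mean and MCC down faster than accuracy, which is why the paper reports all four.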

4.1. Empirical Analysis of Existing Speech Feature Categories

This section analyzes the subset size or threshold at which features remain informative, along with the contribution of different speech feature extraction techniques [21,22,23,24,25]. The aim is to extract informative features that capture vocal disorders in PD detection and provide useful clinical insights. Recently, the study in [25] introduced a new feature extraction technique (i.e., TQWT) for PD detection and compared it with existing feature extraction techniques. This work conducts a comprehensive evaluation to identify which feature extraction techniques contribute most to the discriminative feature set using state-of-the-art filter feature selection algorithms, namely Information Gain, Gain Ratio, and ReliefF.
As shown in Figure 4, three classification models—Naive Bayes, k-Nearest Neighbors (3NN), and Decision Tree—are employed to assess the effectiveness of the ranked features from the aforementioned algorithms.

4.1.1. Size Limit or Threshold Analysis for Informative Features and Category-Specific Contributions

This work deploys the aforementioned feature selection algorithms to identify the threshold at which features remain informative and the contribution of each feature category to the informative features. Figure 4 presents the performance results of three different classifiers using the proposed modified subject-wise 5-fold cross-validation dataset. As shown in Table 5, this work applies thresholds (Information Gain < 0.05, Gain Ratio < 0.05, and ReliefF < 0.01) to discard features with negligible values and identify informative features. Out of 752 features, the informative feature counts for each training fold are as follows: Information Gain—249, 267, 322, 253, and 295; Gain Ratio—327, 383, 391, 331, and 366; and ReliefF—329, 328, 364, 323, and 350. Furthermore, these features are categorized into six categories—baseline, time–frequency, vocal, MFCC, wavelet, and TQWT—for evaluating the contribution of each feature category to informative features. Overall, Information Gain, Gain Ratio, and ReliefF feature selection algorithms determine that 37%, 48%, and 45% of speech features, respectively, are informative for PD detection. Despite their high absolute count, TQWT features contribute a smaller fraction of informative features (InfoGain: 35%, GainRatio: 49%, ReliefF: 38%). In contrast, baseline features are highly informative even with a small total count, while wavelet and MFCC features contribute similarly in terms of informative features (43–45%).
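The thresholding and per-category tally described above can be sketched as follows. The function name, toy scores, and category labels are illustrative assumptions, not the paper's actual 752-feature lists.

```python
import numpy as np
from collections import Counter

def informative_by_category(scores, categories, threshold):
    """Apply a relevance cut-off and tally surviving features per category.

    `scores` are filter scores (e.g. Information Gain, Gain Ratio, ReliefF);
    features below the threshold (e.g. InfoGain < 0.05, ReliefF < 0.01) are
    treated as non-informative and discarded. `categories` maps each feature
    index to its group (baseline, time-frequency, vocal, MFCC, wavelet, TQWT).
    """
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    return keep, Counter(categories[i] for i in keep)
```

Running this once per training fold and per filter method reproduces the kind of informative-feature counts and category contributions reported above.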
As shown in Figure 4, performance evaluation shows that the top-50 features achieve optimal performance, while features up to 200 maintain comparable performance. This work analyzes the top-200 features in intervals of 50—moving from the most important to the least important—revealing the contribution of each feature category. As shown in Figure 5 and Figure 6, this analysis reveals which feature categories contribute little within the top-50 and more beyond it (providing informative but less discriminative features), and which provide more discriminative features within the top-50. This work finds that the most contributing feature categories in the top-50 features are TQWT and MFCCs, which is consistent with [25,30,31,32]. In contrast, Wavelet features become prominent only beyond the top-50, indicating lower discriminative power for PD detection.

4.1.2. Feature Redundancy and Correlation Analysis

The feature extraction techniques extract multiple features from speech samples, yet only a few are truly discriminative, with others being redundant or non-informative, as shown in Figure 7. Similarly, Figure 8 shows wavelet and TQWT performing k-level decomposition to capture discriminative speech characteristics. Although higher decomposition levels (k) extract more discriminative features, they simultaneously generate proportionally more redundant or non-informative features.
Due to this redundancy, this work evaluates feature redundancy through correlation analysis, which identifies highly correlated feature pairs (many above 85% and 90%, with some remaining strongly correlated at 95% and 99%), as shown in Figure 9. Examining these correlations by category reveals that wavelet features are highly redundant, dominating the correlations at 99%, with minor contributions from baseline and TQWT. At the 85% level, all feature categories are present, but wavelet still leads, followed by TQWT, baseline, and MFCC, with only baseline and wavelet showing inter-category correlations. Since features correlated above 95% carry nearly identical information, as shown in Figure 10, eliminating such features during feature selection is recommended; this reduces dimensionality without significant information loss while ensuring a discriminative, non-redundant feature set for PD detection.
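The recommended correlation-based elimination can be sketched as a greedy filter. This Python sketch is an assumption about the exact procedure (which the paper does not spell out step by step): it keeps the first encountered feature of each highly correlated group and drops the rest.

```python
import numpy as np

def drop_correlated(X, threshold=0.95):
    """Greedy removal of highly correlated features (illustrative sketch).

    Walks the features in order; a feature is kept only if its absolute
    Pearson correlation with every already-kept feature stays below the
    threshold, so each correlated group keeps exactly one representative.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        if all(corr[j, i] < threshold for i in keep):
            keep.append(j)
    return keep
```

For example, if column 1 is a near-linear copy of column 0 while column 2 is independent, only columns 0 and 2 survive the 95% threshold.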

4.2. Design and Evaluation of the Final Feature Set for the Proposed HDC

The proposed HDC algorithm employs the F4 and F1F3 data complexity measures to reduce feature overlap and ReliefF to focus on samples near class boundaries, thereby improving class separability. This section explores how to effectively combine these data complexity measures with the ReliefF algorithm to select an optimal feature subset. The F4 measure produces a feature subset, while F1F3 and ReliefF each provide a ranked list of all features. F4 selects features by iteratively reducing the number of samples in the overlapping region; it is therefore important to identify the point up to which the F4-based subset remains discriminative, as features selected in later iterations may lack discriminative power. A correlation threshold is applied when constructing the final feature set: after including the F4-based subset, features from F1F3 and ReliefF are added sequentially. This sequential approach ensures that ReliefF features are retained, whereas block-wise addition would risk their removal through correlation filtering. A correlation-based analysis is conducted on the top-50 features over the proposed 5-fold benchmark dataset, with thresholds of 85%, 90%, and 95%. Table 6 presents different combinations of the top-50 most informative features for the proposed HDC. This work obtains five feature subsets, one per training set, but reduces the F4 subset by f (= 4) features uniformly across all folds instead of using fold-specific values. The final discriminative, non-redundant feature set is obtained by removing features correlated above 95%, which reduces dimensionality without significant information loss and improves the results, as shown in Table 6. The proposed HDC achieves its best results with an accuracy of 0.86, a G-mean of 0.70, an F1-score of 0.91, and an MCC of 0.58 using an SVM (RBF) classifier.
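The sequential construction described above can be sketched as follows. The function and argument names (`f4_subset`, `ranked_rest`) are hypothetical, and the merging of the F1F3 and ReliefF rankings into one candidate list is a simplifying assumption; only the order of operations (F4 subset first, then correlation-filtered sequential additions) follows the text.

```python
import numpy as np

def build_hdc_subset(X, f4_subset, ranked_rest, top_k=50, corr_thr=0.95):
    """Sketch of the sequential HDC feature-set construction.

    Starts from the F4-selected subset, then walks a ranked candidate list
    (from F1F3/ReliefF) one feature at a time, skipping any candidate whose
    absolute correlation with an already-selected feature exceeds the
    threshold, until `top_k` features are collected.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = list(f4_subset)
    for f in ranked_rest:
        if len(selected) >= top_k:
            break
        if f in selected:
            continue
        if all(corr[f, s] < corr_thr for s in selected):
            selected.append(f)
    return selected[:top_k]
```

Because candidates are added one at a time, a highly ranked feature that duplicates an F4 feature is skipped rather than displacing it, which matches the rationale for sequential over block-wise addition.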

4.3. Results and Discussion

This work adopts a uniform (or fair) experimental setup to facilitate a streamlined research workflow in PD detection. To support this, a modified subject-wise 5-fold dataset is introduced, which ensures fair comparison of studies and helps build reliable decision support systems for PD detection that perform well on unseen data. Several literature studies achieved a high accuracy of 96–100% using hold-out and k-fold (10-fold) cross-validation schemes [30,31,32,33,34,59,60,61]. These studies overlook that each subject has multiple speech samples, which creates virtual overlap when samples from the same subject are split between training and testing sets. Earlier, the study in [36] introduced an unbiased approach to handle this virtual overlap and re-evaluated the dysphonia features from [37] using a Leave-One-Subject-Out (LOSO) cross-validation scheme, which ensured testing on unseen subjects and led to a sharp accuracy drop from 91.40% to 65.13%. The studies in [25,62] follow this approach and use the LOSO cross-validation scheme for the dataset in [39], iteratively leaving out all samples from one subject as test data. This results in 86–87% accuracy representing unbiased (or fair) generalization, compared to the inflated 96–100% accuracy from sample-wise validation schemes.
This work confirms this inflation in Figure 11 by comparing subject-wise and sample-wise cross-validation schemes. The results show that subject-wise evaluation achieves a maximum accuracy of 86% with SVM (RBF), while sample-wise evaluation reaches 92% with SVM (Polynomial). This 6% inflation confirms that sample-wise cross-validation lacks unbiased assessment of classifier performance, which underscores the importance of subject-wise cross-validation for developing clinically reliable PD telediagnosis systems. In addition, this work develops a feature selection algorithm that explores class separability using data complexity measures and ReliefF. The proposed HDC algorithm is evaluated using subject-wise 5-fold cross-validation and compared with existing state-of-the-art feature selection algorithms across multiple evaluation metrics. The experimental results for six classifiers are presented in Table 7. The results demonstrate that the proposed method outperforms existing feature selection algorithms in five out of six classifiers and achieves the highest accuracy of 0.86, with a G-mean of 0.70, an F1-score of 0.91, and an MCC of 0.58, using the SVM (RBF) classifier.
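The core of subject-wise splitting is that fold membership is decided per subject, not per sample, so no subject's recordings can leak across the train/test boundary. A minimal Python sketch of this idea (a simplified round-robin assignment that omits the class and gender stratification the proposed benchmark additionally maintains):

```python
import numpy as np

def subject_wise_folds(subject_ids, n_folds=5, rng=None):
    """Assign whole subjects (not samples) to cross-validation folds.

    Every sample inherits the fold of its subject, so all of a subject's
    speech samples land in the same fold and never appear in both the
    training and test sets of any split.
    """
    rng = np.random.default_rng(rng)
    subjects = np.unique(subject_ids)
    rng.shuffle(subjects)
    fold_of_subject = {s: i % n_folds for i, s in enumerate(subjects)}
    return np.array([fold_of_subject[s] for s in subject_ids])
```

By contrast, sample-wise k-fold shuffles the rows directly, which for a dataset with three recordings per subject almost guarantees that each test subject also contributes training samples, producing the inflated accuracies discussed above.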
The publicly available preprocessed dataset [39] lacks standardized train–test splits, limiting fair comparison across studies. To address this, this work provides a modified subject-wise 5-fold benchmark dataset in standard format, using the maximum number of subjects (i.e., 245 out of 252 subjects) for cross-validation. This publicly accessible benchmark dataset enables consistent and reproducible evaluation, streamlining research in Parkinson’s disease speech classification. Beyond the 5-fold cross-validation, hold-out validation is conducted, where these 245 subjects serve as the training set and the remaining 7 subjects as the test set. However, as shown in Table 8, the evaluation results are not statistically reliable or representative of the broader population due to the limited test set of 7 subjects. The consistently high recall value (1 in most cases) and precision value close to 0.50 indicate that the test set size is insufficient for drawing meaningful generalization conclusions. It nevertheless offers a valuable resource for future researchers to conduct detailed case-by-case analyses, examine model behavior on individual patients, and identify specific failure modes or success patterns in personalized diagnostic scenarios.
Table 9 provides results using biased 10-fold (sample-wise) cross-validation, as adopted in existing studies, for comparison with earlier approaches. The proposed HDC method, combined with the k-Nearest Neighbors (1NN, similar to existing studies) classifier, demonstrates superior overall performance across multiple evaluation metrics. It achieves an accuracy of 0.96, F1-score of 0.97, MCC of 0.89, precision of 0.97, and recall of 0.98, outperforming most existing approaches, including one-against-all (OGA) data sampling, wrapper methods, minimum average maximum (MAMa) tree, and deep learning-based methods such as variational autoencoder (VAE) and sparse autoencoder (SAE). The high G-mean (0.94) further confirms the balanced classification performance across classes. These results highlight that the integration of data complexity measures in the proposed HDC method effectively enhances class separability and contributes to more reliable feature selection, leading to improved Parkinson’s speech classification.
In addition to the 5-fold benchmark dataset, this work introduces a hybrid data complexity-based (HDC) feature selection algorithm that integrates F4 and F1F3 data complexity measures with ReliefF to select discriminative features. The F4 measure iteratively reduces feature overlap to select a set of features that ensures better class separability, whereas F1F3 and ReliefF enhance class separability by minimizing feature overlap and improving separation of borderline samples. As shown in Figure 12, the selected features exhibit no significant correlations, as the HDC algorithm removes features with correlation exceeding 90% or 95%, ensuring a discriminative and non-redundant feature set. The 95% correlation threshold configuration of the proposed HDC algorithm outperforms other existing feature selection methods for five out of six evaluated classifiers.
To understand which features contribute most to classification performance, this work analyzes the contribution of each feature category within the top-50 features for the proposed HDC algorithm, along with existing state-of-the-art feature selection algorithms. As shown in Figure 13, the majority of the top-50 features belong to TQWT (72% for Information Gain, 64% for Gain Ratio, 50% for ReliefF, 48% for MRMR, and 60% for the proposed HDC), while the second most contributing group is MFCC (i.e., 10–30%), which confirms findings in [25]. Furthermore, Figure 13 shows consistent contribution of TQWT features across each fold (i.e., 40–80%) for all feature selection algorithms, as well as the distribution of TQWT sub-categories in the proposed HDC algorithm, including energy, Log Entropy, Shannon Entropy, Skewness, Kurtosis, and TKEO features, which capture the distorted periodic patterns in PD speech. The most discriminative features predominantly belong to the Log-Energy Entropy and Shannon Entropy sub-categories [25]. These entropy-based features apply logarithmic transformations to capture high-frequency information (Log-Energy Entropy to the signal's energy values, and Shannon Entropy to the probability distribution of signal values), enabling detection of subtle changes in speech signals that help discriminate PD patients from healthy individuals.
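The two entropy sub-categories can be illustrated with their common wavelet-entropy definitions; the exact normalization used in the paper's TQWT pipeline may differ, so this is a sketch of the standard conventions rather than the paper's implementation.

```python
import numpy as np

def log_energy_entropy(x, eps=1e-12):
    """Log-energy entropy: sum of the log of squared sample energies
    (a common wavelet-entropy convention); eps guards against log(0)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.log(x**2 + eps)))

def shannon_entropy(x, eps=1e-12):
    """Shannon entropy of the signal's normalized energy distribution:
    energy spread evenly over samples gives high entropy, energy
    concentrated in few samples gives low entropy."""
    x = np.asarray(x, dtype=float)
    p = x**2 / (np.sum(x**2) + eps)
    return float(-np.sum(p * np.log(p + eps)))
```

Applied per TQWT subband, such measures respond to how evenly a subband's energy is distributed over time, which is why they are sensitive to the distorted periodic patterns in PD speech.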
The analysis reveals that feature extraction techniques can extract clinically useful information related to speech disorders that directly influences model predictions. However, standard application of these techniques increases dimensionality and introduces many non-informative or redundant features. This work and existing studies [25,30,31,32,33,34,35,59,60,61,62] demonstrate that only 1–7% of features (at most, the top-50 features) are sufficient to achieve optimal performance, indicating that at least 93–99% are non-informative or redundant. This finding emphasizes the need for approaches that extract fewer but more discriminative features—either through developing novel feature extraction methods or finding innovative ways to apply standard feature extraction techniques—to improve PD detection from speech.

5. Conclusions and Future Work

This paper presents three main contributions to speech-based Parkinson’s disease detection. First, it introduces a uniform and fair experimental setup with unbiased subject-wise 5-fold cross-validation that maintains population ratios and keeps test data unseen, ensuring better generalization. This subject-wise partitioning facilitates fair comparisons and streamlines research in this area. Second, the paper proposes a hybrid data complexity-based feature selection (HDC) method that uses the F4 and F1F3 data complexity measures along with ReliefF to capture the intrinsic properties of the data (i.e., feature overlap and borderline points). The novel F1F3 metric combines F1 and F3 weights to strengthen feature efficiency through diverse perspectives of feature overlap, providing a holistic view of class separability. The proposed HDC algorithm provides an optimal feature subset that reduces geometric complexity and maximizes separability. The experimental results demonstrate that the proposed HDC method with a 95% correlation threshold outperforms existing feature selection methods across five out of six evaluated classifiers. The proposed HDC achieves the best results with an accuracy of 0.86, a G-mean of 0.70, an F1-score of 0.91, and an MCC of 0.58 using an SVM (RBF) classifier. These results demonstrate that the top-50 features selected by the proposed HDC algorithm consistently outperform existing methods by a clear margin. Third, the analysis of important features confirms that the majority of the most discriminative features are from the TQWT category (46–72%), followed by MFCCs (10–30%) and wavelet features (0–16%). The proposed HDC algorithm demonstrates that TQWT features, particularly the Log-Energy Entropy and Shannon Entropy sub-categories, exhibit superior discriminative ability in capturing the distorted periodic patterns in PD speech signals.
This work and existing studies show that 93–99% of features are non-informative, as only 1–7% are sufficient for optimal performance. This finding suggests important directions for future research. Researchers can develop novel handcrafted feature extraction approaches or apply existing techniques in innovative ways to extract fewer but more discriminative features for PD detection. Beyond traditional approaches, deep learning offers promising directions, including developing CNN architectures like SkipConNet or multi-layered CNNs, or employing sparse/variational autoencoders as dimensionality reduction techniques in two-stage approaches combined with traditional/meta feature selection algorithms.
Furthermore, specific to our work, future research can pursue several promising directions. The proposed method will be further refined to address its current limitations. This approach selects features that focus on feature overlap for class separability without considering structural information or the local neighborhood distribution of classes. Moreover, like standard filter-based methods, it requires a predefined subset size, as it lacks an automatic stopping criterion for optimal feature selection. Future work will address these limitations by integrating neighborhood and network data complexity measures with feature-based measures and introducing a stopping criterion to automatically determine the optimal feature subset. In addition, the proposed subject-wise 5-fold benchmark dataset, which maintains gender population ratios in each fold, facilitates investigation of gender-specific patterns. Future research can examine whether males and females exhibit distinct or similar speech disorder patterns in PD detection. Furthermore, as reported in [64], recent studies have increasingly focused on both diagnosis and prognosis of Parkinson’s disease. The dataset supports research beyond diagnosis to include prognostic applications, enabling monitoring of PD progression and planning of early interventions through speech analysis.

Author Contributions

Conceptualization, A.K., M.G. and S.S.; methodology, A.K.; validation, A.K. and S.S.; investigation, A.K.; writing—original draft preparation, A.K.; writing—review and editing, A.K., S.S. and M.G.; visualization, A.K.; supervision, M.G. and S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are available in a publicly accessible repository (GitHub) at https://github.com/ArvindKumar-PhD/DataComplexity-aware-FeatureSelection (accessed on 10 October 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. De Rijk, M.d.; Launer, L.; Berger, K.; Breteler, M.; Dartigues, J.; Baldereschi, M.; Fratiglioni, L.; Lobo, A.; Martinez-Lage, J.; Trenkwalder, C.; et al. Prevalence of Parkinson’s disease in Europe: A collaborative study of population-based cohorts. Neurologic Diseases in the Elderly Research Group. Neurology 2000, 54, S21–S23. [Google Scholar]
  2. Han, C.X.; Wang, J.; Yi, G.S.; Che, Y.Q. Investigation of EEG abnormalities in the early stage of Parkinson’s disease. Cogn. Neurodyn. 2013, 7, 351–359. [Google Scholar] [CrossRef] [PubMed]
  3. Yuvaraj, R.; Murugappan, M.; Acharya, U.R.; Adeli, H.; Ibrahim, N.M.; Mesquita, E. Brain functional connectivity patterns for emotional state classification in Parkinson’s disease patients without dementia. Behav. Brain Res. 2016, 298, 248–260. [Google Scholar] [CrossRef] [PubMed]
  4. Yuvaraj, R.; Acharya, U.R.; Hagiwara, Y. A novel Parkinson’s Disease Diagnosis Index using higher-order spectra features in EEG signals. Neural Comput. Appl. 2018, 30, 1225–1235. [Google Scholar] [CrossRef]
  5. Oh, S.L.; Hagiwara, Y.; Raghavendra, U.; Yuvaraj, R.; Arunkumar, N.; Murugappan, M.; Acharya, U.R. A deep learning approach for Parkinson’s disease diagnosis from EEG signals. Neural Comput. Appl. 2018, 32, 10927–10933. [Google Scholar] [CrossRef]
  6. Bhat, S.; Acharya, U.R.; Hagiwara, Y.; Dadmehr, N.; Adeli, H. Parkinson’s disease: Cause factors, measurable indicators, and early diagnosis. Comput. Biol. Med. 2018, 102, 234–241. [Google Scholar] [CrossRef]
  7. Lacy, S.E.; Smith, S.L.; Lones, M.A. Using echo state networks for classification: A case study in Parkinson’s disease diagnosis. Artif. Intell. Med. 2018, 86, 53–59. [Google Scholar] [CrossRef]
  8. Loconsole, C.; Cascarano, G.D.; Brunetti, A.; Trotta, G.F.; Losavio, G.; Bevilacqua, V.; Di Sciascio, E. A model-free technique based on computer vision and sEMG for classification in Parkinson’s disease by using computer-assisted handwriting analysis. Pattern Recognit. Lett. 2019, 121, 28–36. [Google Scholar] [CrossRef]
  9. Zeng, W.; Liu, F.; Wang, Q.; Wang, Y.; Ma, L.; Zhang, Y. Parkinson’s disease classification using gait analysis via deterministic learning. Neurosci. Lett. 2016, 633, 268–278. [Google Scholar] [CrossRef]
  10. Joshi, D.; Khajuria, A.; Joshi, P. An automatic non-invasive method for Parkinson’s disease classification. Comput. Methods Programs Biomed. 2017, 145, 135–145. [Google Scholar] [CrossRef]
  11. Sharma, P.; Sundaram, S.; Sharma, M.; Sharma, A.; Gupta, D. Diagnosis of Parkinson’s disease using modified grey wolf optimization. Cogn. Syst. Res. 2019, 54, 100–115. [Google Scholar] [CrossRef]
  12. Afonso, L.C.; Rosa, G.H.; Pereira, C.R.; Weber, S.A.; Hook, C.; Albuquerque, V.H.C.; Papa, J.P. A recurrence plot-based approach for Parkinson’s disease identification. Future Gener. Comput. Syst. 2019, 94, 282–292. [Google Scholar] [CrossRef]
  13. Rios-Urrego, C.D.; Vásquez-Correa, J.C.; Vargas-Bonilla, J.F.; Nöth, E.; Lopera, F.; Orozco-Arroyave, J.R. Analysis and evaluation of handwriting in patients with Parkinson’s disease using kinematic, geometrical, and non-linear features. Comput. Methods Programs Biomed. 2019, 173, 43–52. [Google Scholar] [CrossRef] [PubMed]
  14. Tsanas, A.; Little, M.; McSharry, P.; Ramig, L. Accurate telemonitoring of Parkinson’s disease progression by non-invasive speech tests. Nat. Preced. 2009. [Google Scholar] [CrossRef]
  15. Mostafa, S.A.; Mustapha, A.; Mohammed, M.A.; Hamed, R.I.; Arunkumar, N.; Abd Ghani, M.K.; Jaber, M.M.; Khaleefah, S.H. Examining multiple feature evaluation and classification methods for improving the diagnosis of Parkinson’s disease. Cogn. Syst. Res. 2019, 54, 90–99. [Google Scholar] [CrossRef]
  16. Hartelius, L.; Svensson, P. Speech and swallowing symptoms associated with Parkinson’s disease and multiple sclerosis: A survey. Folia Phoniatr. Logop. 1994, 46, 9–17. [Google Scholar] [CrossRef]
  17. Ho, A.K.; Iansek, R.; Marigliani, C.; Bradshaw, J.L.; Gates, S. Speech impairment in a large sample of patients with Parkinson’s disease. Behav. Neurol. 1998, 11, 131–137. [Google Scholar] [CrossRef]
  18. Erdogdu Sakar, B.; Serbes, G.; Sakar, C.O. Analyzing the effectiveness of vocal features in early telediagnosis of Parkinson’s disease. PLoS ONE 2017, 12, e0182428. [Google Scholar] [CrossRef]
  19. Tsanas, A.; Little, M.A.; McSharry, P.E.; Ramig, L.O. Enhanced classical dysphonia measures and sparse regression for telemonitoring of Parkinson’s disease progression. In Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 14–19 March 2010; IEEE: New York, NY, USA, 2010; pp. 594–597. [Google Scholar]
  20. Braak, H.; Ghebremedhin, E.; Rüb, U.; Bratzke, H.; Del Tredici, K. Stages in the development of Parkinson’s disease-related pathology. Cell Tissue Res. 2004, 318, 121–134. [Google Scholar] [CrossRef]
  21. Logemann, J.A.; Fisher, H.B.; Boshes, B.; Blonsky, E.R. Frequency and cooccurrence of vocal tract dysfunctions in the speech of a large sample of Parkinson patients. J. Speech Hear. Disord. 1978, 43, 47–57. [Google Scholar] [CrossRef]
  22. Little, M.; McSharry, P.; Roberts, S.; Costello, D.; Moroz, I. Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection. Nat. Preced. 2007. [Google Scholar] [CrossRef]
  23. Tsanas, A.; Little, M.A.; McSharry, P.E.; Ramig, L.O. New nonlinear markers and insights into speech signal degradation for effective tracking of Parkinson’s disease symptom severity. IEICE Proc. Ser. 2010, 457–460. [Google Scholar] [CrossRef]
  24. Tsanas, A.; Little, M.A.; McSharry, P.E.; Spielman, J.; Ramig, L.O. Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease. IEEE Trans. Biomed. Eng. 2012, 59, 1264–1271. [Google Scholar] [CrossRef] [PubMed]
  25. Sakar, C.O.; Serbes, G.; Gunduz, A.; Tunc, H.C.; Nizam, H.; Sakar, B.E.; Tutuncu, M.; Aydin, T.; Isenkul, M.E.; Apaydin, H. A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Appl. Soft Comput. 2019, 74, 255–263. [Google Scholar] [CrossRef]
  26. Sharma, G.; Umapathy, K.; Krishnan, S. Trends in audio signal feature extraction methods. Appl. Acoust. 2020, 158, 107020. [Google Scholar] [CrossRef]
  27. Tuncer, T.; Dogan, S. Novel dynamic center based binary and ternary pattern network using M4 pooling for real world voice recognition. Appl. Acoust. 2019, 156, 176–185. [Google Scholar] [CrossRef]
  28. Korkmaz, Y.; Boyacı, A. A comprehensive Turkish accent/dialect recognition system using acoustic perceptual formants. Appl. Acoust. 2022, 193, 108761. [Google Scholar] [CrossRef]
  29. Soumaya, Z.; Drissi Taoufiq, B.; Benayad, N.; Yunus, K.; Abdelkrim, A. The detection of Parkinson disease using the genetic algorithm and SVM classifier. Appl. Acoust. 2021, 171, 107528. [Google Scholar] [CrossRef]
  30. Tuncer, T.; Dogan, S. A novel octopus based Parkinson’s disease and gender recognition method using vowels. Appl. Acoust. 2019, 155, 75–83. [Google Scholar] [CrossRef]
  31. Tuncer, T.; Dogan, S.; Acharya, U.R. Automated detection of Parkinson’s disease using minimum average maximum tree and singular value decomposition method with vowels. Biocybern. Biomed. Eng. 2020, 40, 211–220. [Google Scholar] [CrossRef]
  32. Solana-Lavalle, G.; Galán-Hernández, J.C.; Rosas-Romero, R. Automatic Parkinson disease detection at early stages as a pre-diagnosis tool by using classifiers and a small set of vocal features. Biocybern. Biomed. Eng. 2020, 40, 505–516. [Google Scholar] [CrossRef]
  33. Xiong, Y.; Lu, Y. Deep feature extraction from the vocal vectors using sparse autoencoders for Parkinson’s classification. IEEE Access 2020, 8, 27821–27830. [Google Scholar] [CrossRef]
  34. Masud, M.; Singh, P.; Gaba, G.S.; Kaur, A.; Alroobaea, R.; Alrashoud, M.; Alqahtani, S.A. CROWD: Crow search and deep learning based feature extractor for classification of Parkinson’s disease. ACM Trans. Internet Technol. (TOIT) 2021, 21, 1–18. [Google Scholar] [CrossRef]
  35. Gunduz, H. An efficient dimensionality reduction method using filter-based feature selection and variational autoencoders on Parkinson’s disease classification. Biomed. Signal Process. Control 2021, 66, 102452. [Google Scholar] [CrossRef]
  36. Sakar, C.O.; Kursun, O. Telediagnosis of Parkinson’s disease using measurements of dysphonia. J. Med. Syst. 2010, 34, 591–599. [Google Scholar] [CrossRef]
  37. Little, M.; McSharry, P.; Hunter, E.; Spielman, J.; Ramig, L. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. Nat. Preced. 2008. [Google Scholar] [CrossRef]
  38. Smialowski, P.; Frishman, D.; Kramer, S. Pitfalls of supervised feature selection. Bioinformatics 2010, 26, 440–443. [Google Scholar] [CrossRef]
  39. Sakar, C.; Sakar, B. Parkinson’s Disease Classification; UCI Machine Learning Repository: Noida, India, 2018. [Google Scholar] [CrossRef]
  40. Liu, H.; Li, J.; Wong, L. A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns. Genome Inform. 2002, 13, 51–60. [Google Scholar]
  41. Hilario, M.; Kalousis, A. Approaches to dimensionality reduction in proteomic biomarker studies. Briefings Bioinform. 2008, 9, 102–118. [Google Scholar] [CrossRef]
  42. Zheng, C.H.; Huang, D.S.; Zhang, L.; Kong, X.Z. Tumor clustering using nonnegative matrix factorization with gene selection. IEEE Trans. Inf. Technol. Biomed. 2009, 13, 599–607. [Google Scholar] [CrossRef]
  43. Choi, J.Y.; Ro, Y.M.; Plataniotis, K.N. Boosting color feature selection for color face recognition. IEEE Trans. Image Process. 2010, 20, 1425–1434. [Google Scholar] [CrossRef]
  44. Goltsev, A.; Gritsenko, V. Investigation of efficient features for image recognition by neural networks. Neural Netw. 2012, 28, 15–23. [Google Scholar] [CrossRef]
  45. Forman, G. An extensive empirical study of feature selection metrics for text classification. J. Mach. Learn. Res. 2003, 3, 1289–1305. [Google Scholar]
  46. Ambusaidi, M.A.; He, X.; Nanda, P.; Tan, Z. Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans. Comput. 2016, 65, 2986–2998. [Google Scholar] [CrossRef]
  47. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar]
  48. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106. [Google Scholar] [CrossRef]
  49. Quinlan, J.R. C 4.5: Programs for machine learning. In The Morgan Kaufmann Series in Machine Learning; Elsevier: Amsterdam, The Netherlands, 1993. [Google Scholar]
  50. Quinlan, J.R. Improved use of continuous attributes in C4. 5. J. Artif. Intell. Res. 1996, 4, 77–90. [Google Scholar] [CrossRef]
  51. Kononenko, I. Estimating attributes: Analysis and extensions of RELIEF. In Proceedings of the European Conference on Machine Learning; Springer: Berlin/Heidelberg, Germany, 1994; pp. 171–182. [Google Scholar]
  52. Kira, K.; Rendell, L.A. A practical approach to feature selection. In Machine Learning Proceedings 1992; Elsevier: Amsterdam, The Netherlands, 1992; pp. 249–256. [Google Scholar]
  53. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [Google Scholar] [CrossRef]
  54. Ho, T.K.; Basu, M. Complexity measures of supervised classification problems. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 289–300. [Google Scholar] [CrossRef]
  55. Lorena, A.C.; Garcia, L.P.; Lehmann, J.; Souto, M.C.; Ho, T.K. How Complex is your classification problem? A survey on measuring classification complexity. ACM Comput. Surv. (CSUR) 2019, 52, 1–34. [Google Scholar] [CrossRef]
  56. Orriols-Puig, A.; Macia, N.; Ho, T.K. Documentation for the data complexity library in C++. Univ. Ramon Llull Salle 2010, 196, 12. [Google Scholar]
  57. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6. [Google Scholar] [CrossRef] [PubMed]
  58. Chicco, D.; Tötsch, N.; Jurman, G. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Min. 2021, 14, 13. [Google Scholar] [CrossRef] [PubMed]
  59. Li, H.; Pun, C.M.; Xu, F.; Pan, L.; Zong, R.; Gao, H.; Lu, H. A hybrid feature selection algorithm based on a discrete artificial bee colony for Parkinson’s diagnosis. ACM Trans. Internet Technol. 2021, 21, 1–22. [Google Scholar] [CrossRef]
  60. Celik, G.; Başaran, E. Proposing a new approach based on convolutional neural networks and random forest for the diagnosis of Parkinson’s disease from speech signals. Appl. Acoust. 2023, 211, 109476. [Google Scholar] [CrossRef]
  61. Hasanzadeh, M.; Mahmoodian, H. A novel hybrid method for feature selection based on gender analysis for early Parkinson’s disease diagnosis using speech analysis. Appl. Acoust. 2023, 211, 109561. [Google Scholar] [CrossRef]
  62. Gunduz, H. Deep learning-based Parkinson’s disease classification using vocal feature sets. IEEE Access 2019, 7, 115540–115551. [Google Scholar] [CrossRef]
  63. Polat, K.; Nour, M. Parkinson disease classification using one against all based data sampling with the acoustic features from the speech signals. Med. Hypotheses 2020, 140, 109678. [Google Scholar] [CrossRef]
  64. Xavier, D.; Felizardo, V.; Ferreira, B.; Zacarias, H.; Pourvahab, M.; Souza-Pereira, L.; Garcia, N.M. Voice analysis in Parkinson’s disease-a systematic literature review. Artif. Intell. Med. 2025, 163, 103109. [Google Scholar] [CrossRef]
Figure 1. Block diagram of the uniform experimental setup for the PD telediagnosis system; the dashed rectangle highlights our contribution.
Figure 2. Development of a subject-wise 5-fold cross-validation setup for internal evaluation and a separate external test set for external evaluation.
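The invariant behind the subject-wise split in Figure 2 is that all samples of a subject land in the same fold, so no subject can appear in both training and test sets. A minimal pure-Python sketch of such a splitter (function and variable names are ours, not from the released benchmark; a stratified variant would additionally balance PD and healthy subjects across folds):

```python
from collections import defaultdict

def subject_wise_folds(sample_subjects, n_folds=5):
    """Assign sample indices to folds so that every sample of a subject
    lands in the same fold (no subject straddles train and test)."""
    by_subject = defaultdict(list)
    for idx, subj in enumerate(sample_subjects):
        by_subject[subj].append(idx)
    folds = [[] for _ in range(n_folds)]
    # round-robin over whole subjects keeps fold sizes roughly balanced
    for i, subj in enumerate(sorted(by_subject)):
        folds[i % n_folds].extend(by_subject[subj])
    return folds

# toy example: 10 subjects with 3 vowel repetitions each (as in the UCI dataset)
subjects = [s for s in range(10) for _ in range(3)]
folds = subject_wise_folds(subjects)
for k, test_idx in enumerate(folds):
    train_subj = {subjects[i] for f in range(5) if f != k for i in folds[f]}
    assert not train_subj & {subjects[i] for i in test_idx}  # leakage-free
```

A sample-wise split, by contrast, would shuffle the 30 indices directly and almost surely place repetitions of the same subject on both sides, producing the inflated results discussed later (Figure 11).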
Figure 3. Block diagram of the proposed hybrid data complexity-based feature selection (HDC) algorithm.
Figure 4. Performance of three classifiers (Naive Bayes, k-NN, Decision Tree) on ranked lists of 752 speech features produced by Information Gain, Gain Ratio, and ReliefF.
Figure 5. Category-wise distribution of the top-50 most important (discriminative) features selected by Information Gain, Gain Ratio, and ReliefF for each fold.
Figure 6. Fold-averaged counts of important, less important, and least important features.
Figure 7. MFCCs and their derivatives for healthy and PD subjects from preprocessed dataset values.
Figure 8. k-level decomposition of a (speech) signal using wavelet transform (WT) and tunable Q-factor wavelet transform (TQWT).
Figure 9. Count of correlated feature pairs (a) above 85%, (b) above 90%, (c) above 95%, and (d) above 99% correlation.
Figure 10. Highly correlated (positive and negative) feature pairs from wavelet, baseline, and TQWT feature categories.
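The pair counts in Figure 9 come from thresholding pairwise Pearson correlations among the features. A self-contained sketch (helper names are ours) that counts each unordered pair exactly once at several absolute-correlation thresholds:

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def count_correlated_pairs(features, thresholds=(0.85, 0.90, 0.95, 0.99)):
    """Count feature pairs with |r| above each threshold (each pair once)."""
    counts = dict.fromkeys(thresholds, 0)
    for f1, f2 in combinations(features, 2):
        r = abs(pearson(f1, f2))
        for t in thresholds:
            if r > t:
                counts[t] += 1
    return counts

# toy features: the first two are perfectly (linearly) correlated
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [2 * v + 1 for v in a]            # r(a, b) = 1.0
c = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]    # roughly uncorrelated with a and b
counts = count_correlated_pairs([a, b, c])
```

Taking the absolute value of r also captures the strongly negatively correlated pairs visible in Figure 10.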
Figure 11. Comparison of subject-wise and sample-wise cross-validation schemes, where sample-wise cross-validation shows inflated results.
Figure 12. Heatmaps illustrating correlation patterns among the top-50 HDC features under (a) 90% and (b) 95% thresholds, showing that most features are weakly correlated while only a few exhibit high correlation.
Figure 13. Analysis of speech feature composition among the top-50 features: (a) distribution of major speech feature categories; (b) contribution of TQWT-based features; and (c) detailed distribution of TQWT sub-category features.
Table 1. Chronological evolution of feature extraction techniques for discriminative features in PD telediagnosis studies [21,22,23,24,25].
| Study Period | Feature Extraction Techniques | Advantages |
|---|---|---|
| Earlier studies | Baseline, Time–Frequency, Vocal | Capture general and traditional acoustic features (pitch, jitter, shimmer, formants) |
| Later studies | Wavelet, MFCC, TQWT | Capture localized, nonstationary, and fine-grained speech patterns; better at detecting subtle impairments |
Table 2. Details of the publicly available UCI Parkinson’s disease dataset [39].
| Name | Values | Description |
|---|---|---|
| # of Vowel /a/ Samples | 756 | Each subject with 3 repetitions; recordings collected with a microphone at a sampling rate of 44.1 kHz |
| # of Subjects | 252 (188 PD, 64 healthy) | Class imbalance ratio (3:1) |
| # of PD Patients | 188 (107 men, 81 women) | Age range: [33, 87] years; Mean ± Std: 65.1 ± 10.9 |
| # of Healthy Subjects | 64 (23 men, 41 women) | Age range: [41, 82] years; Mean ± Std: 61.1 ± 8.9 |
Table 3. Description of various speech feature categories [25].
| Feature Category | Description | # of Features |
|---|---|---|
| Baseline Features | Jitter and shimmer capture variations in fundamental frequency ($F_0$) and amplitude; statistical measures of $F_0$ perturbation (min, max, mean, median, std. dev., avg.); noise-to-tonal component ratios (noise-to-harmonics (NHR), harmonics-to-noise (HNR)); nonlinear measures, including Recurrence Period Density Entropy (RPDE), Pitch Period Entropy (PPE), and Detrended Fluctuation Analysis (DFA) | 23 |
| Time–Frequency Features | Intensity parameters (min, max, mean); formant frequencies (F1, F2, F3, F4); bandwidths (b1, b2, b3, b4) | 11 |
| Vocal Fold Features | Vocal fold excitation ratio (VFER), glottal-to-noise excitation (GNE), glottis quotient (GQ), empirical mode decomposition (EMD) | 22 |
| Wavelet Transform Features | Original and log transform of the $F_0$ contour; 10-level discrete wavelet transform; approximation and detail coefficients of Teager–Kaiser energy (TKEO) and entropy of Shannon and log energy | 182 |
| MFCC Features | Mean and std. dev. of the original 13 MFCCs and their first and second derivatives; log energy of the signal | 84 |
| TQWT Features | 36-level tunable Q-factor wavelet transform; min, max, mean, median, std. dev., skewness, kurtosis, entropy of log energy and Shannon, and Teager–Kaiser energy (TKEO) | 432 |
Table 4. Speech features for Parkinson’s disease detection with equations and parameter definitions.
| Feature | Equation/Definition with Parameters |
|---|---|
| Jitter ($\mathrm{Jitter}_{F_0,\mathrm{abs}}$) | $\frac{1}{N}\sum_{i=1}^{N-1}\lvert F_{0,i}-F_{0,i+1}\rvert$, where $F_{0,i}$: fundamental frequency of the $i$-th speech cycle; $N$: total number of $F_0$ periods |
| Shimmer ($\mathrm{Shimmer}_{\mathrm{dB}}$) | $\frac{1}{N}\sum_{i=1}^{N-1}\left\lvert 20\log_{10}\frac{A_{0,i}}{A_{0,i+1}}\right\rvert$, where $A_{0,i}$: peak amplitude of the $i$-th cycle; $N$: total number of cycles |
| HNR (dB) and NHR (dB) | HNR: $10\log_{10}\frac{R_{xx}(l_{\max})}{1-R_{xx}(l_{\max})}$; NHR: $10\log_{10}\frac{1-R_{xx}(l_{\max})}{R_{xx}(l_{\max})}$, where $R_{xx}(l_{\max})$: maximum autocorrelation value of the signal |
| RPDE | $-\dfrac{\sum_{i=1}^{T_{\max}} p(i)\ln p(i)}{\ln T_{\max}}$, where $p(i)$: probability of recurrence period $i$; $T_{\max}$: maximum recurrence period |
| PPE | $-\dfrac{\sum_{i=1}^{L_{\mathrm{PPE}}} p(i)\ln p(i)}{\ln L_{\mathrm{PPE}}}$, where $p(i)$: probability of pitch period $i$; $L_{\mathrm{PPE}}$: total number of pitch periods |
| DFA | $\dfrac{1}{1+\exp(-\gamma)}$, where $\gamma$: slope of the fluctuation function |
| MFCC ($\mathrm{MFCC}_n$) | $\sum_{k=1}^{K}E_k\cos\!\left[\frac{n\,(k-0.5)\,\pi}{K}\right],\ n=0,\dots,L$, where $E_k$: log energy of the $k$-th Mel-filtered frequency band; $K$: total number of Mel filters; $n$: index of the cepstral coefficient; $L$: total number of MFCC coefficients |
| WT ($\mathrm{WT}(r,s)$) | $\dfrac{1}{\sqrt{c_0^{\,r}}}\sum_{n=1}^{N}x[n]\,g\!\left(\dfrac{n-s\,d_0\,c_0^{\,r}}{c_0^{\,r}}\right)$, where $x[n]$: discrete-time speech signal; $g(\cdot)$: mother wavelet function; $r$: scale index (controls dilation of the wavelet); $s$: translation index (controls shift of the wavelet); $c_0$: scale step size (usually > 1); $d_0$: translation step size; $N$: total number of signal samples |
| TQWT | $H_0^{(k)}(\omega)=\begin{cases}\prod_{m=0}^{k-1}H_0(\omega/\alpha^{m}), & \lvert\omega\rvert\le\alpha^{k}\pi\\ 0, & \alpha^{k}\pi<\lvert\omega\rvert\le\pi\end{cases}$ and $H_1^{(k)}(\omega)=\begin{cases}H_1(\omega/\alpha^{k-1})\prod_{m=0}^{k-2}H_0(\omega/\alpha^{m}), & (1-\beta)\,\alpha^{k-1}\pi\le\lvert\omega\rvert\le\alpha^{k-1}\pi\\ 0, & \text{for other }\omega\in[-\pi,\pi]\end{cases}$, where $H_0(\omega), H_1(\omega)$: low-pass and high-pass filters; $\alpha$: Q-factor scaling parameter (controls the bandwidth of the filters); $\beta$: redundancy parameter (controls the overlap of frequency bands); $k$: decomposition level (number of wavelet subbands); $\omega$: frequency variable in radians per sample |
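The jitter and shimmer rows of Table 4 translate almost directly into code. A short sketch (function names are ours; the absolute value inside the shimmer sum follows the usual Praat-style definition, since without it consecutive dB terms would telescope away):

```python
import math

def jitter_abs(f0):
    """Absolute jitter: mean absolute difference of consecutive F0 values,
    (1/N) * sum_{i=1}^{N-1} |F0_i - F0_{i+1}| as in Table 4."""
    n = len(f0)
    return sum(abs(f0[i] - f0[i + 1]) for i in range(n - 1)) / n

def shimmer_db(amps):
    """Shimmer (dB): mean absolute 20*log10 ratio of consecutive
    peak amplitudes."""
    n = len(amps)
    return sum(abs(20 * math.log10(amps[i] / amps[i + 1]))
               for i in range(n - 1)) / n
```

For example, `jitter_abs([120.0, 121.0, 119.0, 120.0])` averages the cycle-to-cycle F0 differences 1, 2, and 1 Hz over the four cycles, giving 1.0.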
Table 5. Categorization of informative speech features from Information Gain, Gain Ratio, and ReliefF.
| Feature Selection | Fold (Total) | Baseline (23) | Time (4) | Frequency (7) | MFCC (84) | Wavelet (182) | Vocal (22) | TQWT (432) |
|---|---|---|---|---|---|---|---|---|
| Information Gain | Fold 1 (249) | 15 | 3 | 1 | 25 | 79 | 2 | 124 |
| | Fold 2 (267) | 11 | 3 | 1 | 20 | 78 | 1 | 153 |
| | Fold 3 (322) | 18 | 3 | 2 | 25 | 88 | 7 | 179 |
| | Fold 4 (253) | 10 | 3 | 1 | 24 | 79 | 3 | 133 |
| | Fold 5 (295) | 14 | 3 | 1 | 27 | 80 | 5 | 165 |
| Gain Ratio | Fold 1 (327) | 20 | 3 | 0 | 32 | 86 | 6 | 180 |
| | Fold 2 (383) | 16 | 3 | 2 | 33 | 88 | 4 | 237 |
| | Fold 3 (391) | 20 | 3 | 2 | 32 | 88 | 9 | 237 |
| | Fold 4 (331) | 15 | 3 | 2 | 31 | 87 | 5 | 188 |
| | Fold 5 (366) | 17 | 3 | 1 | 35 | 81 | 7 | 222 |
| ReliefF | Fold 1 (329) | 15 | 3 | 3 | 53 | 84 | 17 | 154 |
| | Fold 2 (328) | 14 | 3 | 5 | 53 | 74 | 15 | 164 |
| | Fold 3 (364) | 18 | 3 | 6 | 54 | 81 | 16 | 186 |
| | Fold 4 (323) | 15 | 3 | 4 | 53 | 77 | 16 | 155 |
| | Fold 5 (350) | 18 | 3 | 5 | 56 | 86 | 15 | 167 |
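For reference, the information-gain ranking behind Table 5 reduces, for each numeric feature, to the drop in class entropy after a threshold split, as C4.5 handles continuous attributes [48,50]. A minimal sketch with names of our own choosing:

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    if n == 0:
        return 0.0
    probs = (labels.count(c) / n for c in set(labels))
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(values, labels, threshold):
    """IG of a numeric feature split at `threshold`:
    H(Y) - [w_left * H(Y_left) + w_right * H(Y_right)]."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    conditional = (len(left) / n * entropy(left)
                   + len(right) / n * entropy(right))
    return entropy(labels) - conditional

# a perfectly separating threshold recovers the full class entropy (1 bit)
ig = information_gain([0.1, 0.2, 0.8, 0.9], ["HC", "HC", "PD", "PD"], 0.5)
```

Gain Ratio then divides this quantity by the split's intrinsic information, which penalizes fragmenting splits; ReliefF instead scores features by how well they separate nearest neighbors of opposite classes.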
Table 6. Results obtained using the proposed HDC feature selection algorithm for various combinations of features from the F4 measure, the F1–F3 measures, and the ReliefF feature selection algorithm. Each cell reports Accuracy / G-Mean / F1-Score / MCC.
| F4 Feature Subset | Classifier | Corr. > 0.85 | Corr. > 0.90 | Corr. > 0.95 |
|---|---|---|---|---|
| ‘All Features’ | Naive Bayes | 0.70 / 0.70 / 0.78 / 0.37 | 0.71 / 0.70 / 0.78 / 0.37 | 0.70 / 0.69 / 0.77 / 0.36 |
| | k-NN | 0.81 / 0.68 / 0.88 / 0.46 | 0.81 / 0.66 / 0.88 / 0.45 | 0.81 / 0.68 / 0.88 / 0.46 |
| | Decision Tree | 0.72 / 0.63 / 0.81 / 0.29 | 0.72 / 0.61 / 0.81 / 0.28 | 0.75 / 0.60 / 0.84 / 0.30 |
| | SVM (Linear) | 0.85 / 0.73 / 0.91 / 0.58 | 0.86 / 0.73 / 0.91 / 0.58 | 0.85 / 0.73 / 0.91 / 0.57 |
| | SVM (Polynomial) | 0.82 / 0.72 / 0.88 / 0.51 | 0.84 / 0.74 / 0.90 / 0.56 | 0.82 / 0.71 / 0.88 / 0.49 |
| | SVM (RBF) | 0.85 / 0.69 / 0.90 / 0.55 | 0.85 / 0.69 / 0.91 / 0.55 | 0.84 / 0.68 / 0.90 / 0.53 |
| ‘all’ − f(=1) | Naive Bayes | 0.70 / 0.70 / 0.77 / 0.37 | 0.71 / 0.70 / 0.78 / 0.38 | 0.69 / 0.69 / 0.77 / 0.35 |
| | k-NN | 0.82 / 0.68 / 0.88 / 0.47 | 0.82 / 0.67 / 0.89 / 0.47 | 0.81 / 0.67 / 0.88 / 0.45 |
| | Decision Tree | 0.72 / 0.63 / 0.81 / 0.29 | 0.72 / 0.63 / 0.81 / 0.29 | 0.76 / 0.63 / 0.84 / 0.34 |
| | SVM (Linear) | 0.85 / 0.72 / 0.90 / 0.56 | 0.85 / 0.72 / 0.91 / 0.57 | 0.85 / 0.73 / 0.91 / 0.57 |
| | SVM (Polynomial) | 0.83 / 0.73 / 0.89 / 0.52 | 0.83 / 0.73 / 0.89 / 0.53 | 0.81 / 0.71 / 0.88 / 0.48 |
| | SVM (RBF) | 0.85 / 0.69 / 0.91 / 0.56 | 0.85 / 0.68 / 0.91 / 0.56 | 0.84 / 0.69 / 0.90 / 0.53 |
| ‘all’ − f(=2) | Naive Bayes | 0.70 / 0.70 / 0.77 / 0.37 | 0.70 / 0.70 / 0.77 / 0.37 | 0.69 / 0.69 / 0.77 / 0.36 |
| | k-NN | 0.83 / 0.69 / 0.89 / 0.50 | 0.82 / 0.68 / 0.89 / 0.48 | 0.81 / 0.66 / 0.88 / 0.45 |
| | Decision Tree | 0.72 / 0.63 / 0.81 / 0.30 | 0.73 / 0.65 / 0.82 / 0.33 | 0.72 / 0.61 / 0.81 / 0.28 |
| | SVM (Linear) | 0.85 / 0.72 / 0.90 / 0.56 | 0.85 / 0.72 / 0.91 / 0.57 | 0.85 / 0.72 / 0.90 / 0.55 |
| | SVM (Polynomial) | 0.82 / 0.71 / 0.89 / 0.50 | 0.83 / 0.72 / 0.89 / 0.52 | 0.81 / 0.70 / 0.88 / 0.47 |
| | SVM (RBF) | 0.85 / 0.69 / 0.91 / 0.55 | 0.85 / 0.69 / 0.91 / 0.55 | 0.85 / 0.69 / 0.91 / 0.55 |
| ‘all’ − f(=3) | Naive Bayes | 0.73 / 0.72 / 0.81 / 0.40 | 0.73 / 0.71 / 0.80 / 0.38 | 0.72 / 0.71 / 0.80 / 0.38 |
| | k-NN | 0.83 / 0.69 / 0.89 / 0.50 | 0.81 / 0.67 / 0.88 / 0.45 | 0.82 / 0.68 / 0.89 / 0.48 |
| | Decision Tree | 0.73 / 0.63 / 0.82 / 0.30 | 0.74 / 0.66 / 0.83 / 0.34 | 0.73 / 0.62 / 0.82 / 0.30 |
| | SVM (Linear) | 0.84 / 0.71 / 0.90 / 0.54 | 0.85 / 0.73 / 0.91 / 0.57 | 0.85 / 0.73 / 0.91 / 0.57 |
| | SVM (Polynomial) | 0.81 / 0.70 / 0.88 / 0.48 | 0.83 / 0.73 / 0.89 / 0.53 | 0.82 / 0.72 / 0.88 / 0.51 |
| | SVM (RBF) | 0.85 / 0.69 / 0.91 / 0.56 | 0.85 / 0.68 / 0.91 / 0.55 | 0.85 / 0.70 / 0.91 / 0.55 |
| ‘all’ − f(=4) | Naive Bayes | 0.73 / 0.71 / 0.80 / 0.38 | 0.74 / 0.72 / 0.81 / 0.41 | 0.72 / 0.71 / 0.80 / 0.39 |
| | k-NN | 0.82 / 0.68 / 0.88 / 0.46 | 0.82 / 0.68 / 0.89 / 0.48 | 0.83 / 0.70 / 0.89 / 0.51 |
| | Decision Tree | 0.74 / 0.63 / 0.82 / 0.31 | 0.74 / 0.65 / 0.83 / 0.34 | 0.74 / 0.65 / 0.83 / 0.34 |
| | SVM (Linear) | 0.84 / 0.69 / 0.90 / 0.52 | 0.85 / 0.71 / 0.91 / 0.57 | 0.85 / 0.72 / 0.91 / 0.57 |
| | SVM (Polynomial) | 0.81 / 0.71 / 0.88 / 0.48 | 0.82 / 0.72 / 0.88 / 0.50 | 0.83 / 0.73 / 0.89 / 0.52 |
| | SVM (RBF) | 0.85 / 0.69 / 0.91 / 0.56 | 0.85 / 0.68 / 0.91 / 0.55 | 0.86 / 0.70 / 0.91 / 0.58 |
| ‘all’ − f(=5) | Naive Bayes | 0.73 / 0.71 / 0.81 / 0.40 | 0.75 / 0.72 / 0.82 / 0.42 | 0.73 / 0.72 / 0.80 / 0.40 |
| | k-NN | 0.81 / 0.67 / 0.88 / 0.45 | 0.81 / 0.67 / 0.88 / 0.46 | 0.82 / 0.68 / 0.89 / 0.49 |
| | Decision Tree | 0.74 / 0.64 / 0.82 / 0.32 | 0.74 / 0.66 / 0.82 / 0.34 | 0.74 / 0.65 / 0.82 / 0.34 |
| | SVM (Linear) | 0.84 / 0.69 / 0.90 / 0.53 | 0.85 / 0.70 / 0.90 / 0.55 | 0.86 / 0.73 / 0.91 / 0.59 |
| | SVM (Polynomial) | 0.81 / 0.70 / 0.88 / 0.47 | 0.82 / 0.71 / 0.88 / 0.50 | 0.83 / 0.73 / 0.89 / 0.53 |
| | SVM (RBF) | 0.84 / 0.69 / 0.90 / 0.53 | 0.85 / 0.68 / 0.91 / 0.55 | 0.85 / 0.69 / 0.91 / 0.56 |
| ‘all’ − f(=6) | Naive Bayes | 0.73 / 0.71 / 0.81 / 0.39 | 0.74 / 0.72 / 0.82 / 0.41 | 0.72 / 0.71 / 0.80 / 0.38 |
| | k-NN | 0.82 / 0.68 / 0.88 / 0.47 | 0.82 / 0.68 / 0.88 / 0.47 | 0.82 / 0.68 / 0.88 / 0.48 |
| | Decision Tree | 0.74 / 0.63 / 0.82 / 0.31 | 0.74 / 0.67 / 0.82 / 0.35 | 0.74 / 0.65 / 0.83 / 0.34 |
| | SVM (Linear) | 0.84 / 0.69 / 0.90 / 0.52 | 0.85 / 0.71 / 0.91 / 0.56 | 0.86 / 0.73 / 0.91 / 0.58 |
| | SVM (Polynomial) | 0.81 / 0.71 / 0.88 / 0.47 | 0.82 / 0.72 / 0.88 / 0.50 | 0.83 / 0.73 / 0.89 / 0.52 |
| | SVM (RBF) | 0.85 / 0.69 / 0.90 / 0.55 | 0.85 / 0.69 / 0.91 / 0.55 | 0.85 / 0.69 / 0.91 / 0.56 |
| ‘all’ − f(=7) | Naive Bayes | 0.73 / 0.71 / 0.80 / 0.39 | 0.74 / 0.72 / 0.81 / 0.40 | 0.72 / 0.71 / 0.80 / 0.38 |
| | k-NN | 0.82 / 0.68 / 0.88 / 0.47 | 0.81 / 0.66 / 0.88 / 0.44 | 0.82 / 0.67 / 0.88 / 0.47 |
| | Decision Tree | 0.73 / 0.63 / 0.82 / 0.31 | 0.74 / 0.65 / 0.82 / 0.32 | 0.74 / 0.66 / 0.83 / 0.35 |
| | SVM (Linear) | 0.84 / 0.69 / 0.90 / 0.53 | 0.85 / 0.72 / 0.91 / 0.57 | 0.86 / 0.73 / 0.91 / 0.59 |
| | SVM (Polynomial) | 0.82 / 0.71 / 0.88 / 0.49 | 0.82 / 0.71 / 0.88 / 0.49 | 0.83 / 0.73 / 0.89 / 0.52 |
| | SVM (RBF) | 0.84 / 0.69 / 0.90 / 0.54 | 0.85 / 0.68 / 0.91 / 0.56 | 0.85 / 0.69 / 0.91 / 0.56 |
Table 7. Results of subject-wise 5-fold cross-validation using the top-50 features selected by the proposed HDC algorithm and existing feature selection methods.
| Feature Selection | Classifier | Accuracy | G-Mean | F1-Score | MCC | Precision | Recall |
|---|---|---|---|---|---|---|---|
| Information Gain | Naive Bayes | 0.75 | 0.70 | 0.83 | 0.39 | 0.88 | 0.79 |
| | k-NN | 0.78 | 0.62 | 0.86 | 0.37 | 0.83 | 0.89 |
| | Decision Tree | 0.64 | 0.59 | 0.74 | 0.19 | 0.81 | 0.68 |
| | SVM (Linear) | 0.83 | 0.67 | 0.89 | 0.49 | 0.85 | 0.94 |
| | SVM (Polynomial) | 0.78 | 0.67 | 0.86 | 0.40 | 0.85 | 0.87 |
| | SVM (RBF) | 0.81 | 0.55 | 0.89 | 0.43 | 0.81 | 0.98 |
| Gain Ratio | Naive Bayes | 0.69 | 0.65 | 0.75 | 0.35 | 0.87 | 0.71 |
| | k-NN | 0.78 | 0.61 | 0.86 | 0.37 | 0.83 | 0.90 |
| | Decision Tree | 0.57 | 0.52 | 0.62 | 0.14 | 0.81 | 0.57 |
| | SVM (Linear) | 0.82 | 0.66 | 0.89 | 0.48 | 0.85 | 0.94 |
| | SVM (Polynomial) | 0.78 | 0.67 | 0.86 | 0.40 | 0.85 | 0.86 |
| | SVM (RBF) | 0.79 | 0.55 | 0.87 | 0.36 | 0.81 | 0.94 |
| ReliefF | Naive Bayes | 0.77 | 0.70 | 0.84 | 0.41 | 0.87 | 0.82 |
| | k-NN | 0.80 | 0.63 | 0.87 | 0.40 | 0.83 | 0.91 |
| | Decision Tree | 0.74 | 0.62 | 0.83 | 0.31 | 0.83 | 0.83 |
| | SVM (Linear) | 0.85 | 0.71 | 0.90 | 0.55 | 0.86 | 0.95 |
| | SVM (Polynomial) | 0.82 | 0.71 | 0.89 | 0.50 | 0.86 | 0.91 |
| | SVM (RBF) | 0.84 | 0.68 | 0.90 | 0.52 | 0.85 | 0.95 |
| MRMR [25] | Naive Bayes | 0.73 | 0.72 | 0.81 | 0.40 | 0.89 | 0.74 |
| | k-NN | 0.82 | 0.71 | 0.88 | 0.49 | 0.86 | 0.90 |
| | Decision Tree | 0.70 | 0.63 | 0.79 | 0.29 | 0.83 | 0.76 |
| | SVM (Linear) | 0.84 | 0.72 | 0.90 | 0.55 | 0.87 | 0.94 |
| | SVM (Polynomial) | 0.82 | 0.72 | 0.88 | 0.50 | 0.87 | 0.90 |
| | SVM (RBF) | 0.85 | 0.66 | 0.90 | 0.54 | 0.85 | 0.97 |
| Proposed HDC (Corr. > 0.90) | Naive Bayes | 0.74 | 0.72 | 0.81 | 0.41 | 0.89 | 0.75 |
| | k-NN | 0.82 | 0.68 | 0.89 | 0.48 | 0.85 | 0.92 |
| | Decision Tree | 0.74 | 0.65 | 0.83 | 0.34 | 0.84 | 0.81 |
| | SVM (Linear) | 0.85 | 0.71 | 0.90 | 0.55 | 0.86 | 0.95 |
| | SVM (Polynomial) | 0.82 | 0.72 | 0.89 | 0.51 | 0.87 | 0.91 |
| | SVM (RBF) | 0.85 | 0.68 | 0.91 | 0.55 | 0.85 | 0.97 |
| Proposed HDC (Corr. > 0.95) | Naive Bayes | 0.72 | 0.71 | 0.80 | 0.39 | 0.89 | 0.73 |
| | k-NN | 0.83 | 0.70 | 0.89 | 0.51 | 0.86 | 0.93 |
| | Decision Tree | 0.74 | 0.65 | 0.83 | 0.34 | 0.84 | 0.81 |
| | SVM (Linear) | 0.85 | 0.72 | 0.91 | 0.57 | 0.86 | 0.95 |
| | SVM (Polynomial) | 0.83 | 0.73 | 0.89 | 0.52 | 0.87 | 0.91 |
| | SVM (RBF) | 0.86 | 0.70 | 0.91 | 0.58 | 0.86 | 0.97 |
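The G-Mean and MCC columns reported throughout can be computed directly from a 2×2 confusion matrix; both are informative under the dataset's 3:1 class imbalance, where plain accuracy is not [57,58]. A self-contained sketch (function name is ours):

```python
import math

def gmean_mcc(tp, fp, fn, tn):
    """G-Mean (geometric mean of sensitivity and specificity) and
    Matthews correlation coefficient from a binary confusion matrix."""
    sensitivity = tp / (tp + fn)   # recall on the PD (positive) class
    specificity = tn / (tn + fp)   # recall on the healthy (negative) class
    gmean = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return gmean, mcc

g, m = gmean_mcc(tp=9, fp=1, fn=1, tn=9)  # balanced toy case: g = 0.9, m = 0.8
```

A classifier that always predicts PD on this dataset would score roughly 0.75 accuracy but a G-Mean and MCC of 0, which is why these two metrics accompany accuracy in Tables 6–8.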
Table 8. Results on 7 external test subjects using the top-50 features selected by the proposed HDC algorithm and existing feature selection methods.
| Feature Selection | Classifier | Accuracy | G-Mean | F1-Score | MCC | Precision | Recall |
|---|---|---|---|---|---|---|---|
| Information Gain | Naive Bayes | 0.67 | 0.65 | 0.72 | 0.48 | 0.56 | 1.00 |
| | k-NN | 0.52 | 0.41 | 0.64 | 0.28 | 0.47 | 1.00 |
| | Decision Tree | 0.48 | 0.48 | 0.48 | −0.03 | 0.42 | 0.56 |
| | SVM (Linear) | 0.57 | 0.50 | 0.67 | 0.35 | 0.50 | 1.00 |
| | SVM (Polynomial) | 0.57 | 0.50 | 0.67 | 0.35 | 0.50 | 1.00 |
| | SVM (RBF) | 0.52 | 0.41 | 0.64 | 0.28 | 0.47 | 1.00 |
| Gain Ratio | Naive Bayes | 0.57 | 0.57 | 0.53 | 0.14 | 0.50 | 0.56 |
| | k-NN | 0.67 | 0.65 | 0.72 | 0.48 | 0.56 | 1.00 |
| | Decision Tree | 0.62 | 0.53 | 0.43 | 0.19 | 0.60 | 0.33 |
| | SVM (Linear) | 0.57 | 0.50 | 0.67 | 0.35 | 0.50 | 1.00 |
| | SVM (Polynomial) | 0.67 | 0.65 | 0.72 | 0.48 | 0.56 | 1.00 |
| | SVM (RBF) | 0.67 | 0.65 | 0.72 | 0.48 | 0.56 | 1.00 |
| ReliefF | Naive Bayes | 0.62 | 0.58 | 0.69 | 0.42 | 0.53 | 1.00 |
| | k-NN | 0.52 | 0.41 | 0.64 | 0.28 | 0.47 | 1.00 |
| | Decision Tree | 0.62 | 0.58 | 0.69 | 0.42 | 0.53 | 1.00 |
| | SVM (Linear) | 0.57 | 0.50 | 0.67 | 0.35 | 0.50 | 1.00 |
| | SVM (Polynomial) | 0.62 | 0.58 | 0.69 | 0.42 | 0.53 | 1.00 |
| | SVM (RBF) | 0.57 | 0.50 | 0.67 | 0.35 | 0.50 | 1.00 |
| MRMR [25] | Naive Bayes | 0.71 | 0.71 | 0.75 | 0.55 | 0.60 | 1.00 |
| | k-NN | 0.57 | 0.54 | 0.64 | 0.26 | 0.50 | 0.89 |
| | Decision Tree | 0.52 | 0.51 | 0.58 | 0.12 | 0.47 | 0.78 |
| | SVM (Linear) | 0.62 | 0.58 | 0.69 | 0.42 | 0.53 | 1.00 |
| | SVM (Polynomial) | 0.67 | 0.65 | 0.72 | 0.48 | 0.56 | 1.00 |
| | SVM (RBF) | 0.57 | 0.50 | 0.67 | 0.35 | 0.50 | 1.00 |
| Proposed HDC (Corr. > 0.90) | Naive Bayes | 0.67 | 0.65 | 0.72 | 0.48 | 0.56 | 1.00 |
| | k-NN | 0.67 | 0.65 | 0.72 | 0.48 | 0.56 | 1.00 |
| | Decision Tree | 0.76 | 0.76 | 0.78 | 0.61 | 0.64 | 1.00 |
| | SVM (Linear) | 0.57 | 0.50 | 0.67 | 0.35 | 0.50 | 1.00 |
| | SVM (Polynomial) | 0.57 | 0.50 | 0.67 | 0.35 | 0.50 | 1.00 |
| | SVM (RBF) | 0.57 | 0.50 | 0.67 | 0.35 | 0.50 | 1.00 |
| Proposed HDC (Corr. > 0.95) | Naive Bayes | 0.76 | 0.76 | 0.78 | 0.61 | 0.64 | 1.00 |
| | k-NN | 0.62 | 0.58 | 0.69 | 0.42 | 0.53 | 1.00 |
| | Decision Tree | 0.52 | 0.51 | 0.44 | 0.03 | 0.44 | 0.44 |
| | SVM (Linear) | 0.57 | 0.50 | 0.67 | 0.35 | 0.50 | 1.00 |
| | SVM (Polynomial) | 0.67 | 0.65 | 0.72 | 0.48 | 0.56 | 1.00 |
| | SVM (RBF) | 0.57 | 0.50 | 0.67 | 0.35 | 0.50 | 1.00 |
Table 9. Comparison of results of the proposed HDC algorithm with existing studies using 10-fold (sample-wise) cross-validation.
| Study | Model | Accuracy | G-Mean | F1-Score | MCC | Precision | Recall |
|---|---|---|---|---|---|---|---|
| Polat [63] | OGA Sampling, wkNN | 0.89 | – | – | – | – | – |
| Xiong [33] | SAE (mRMR), LDA | 0.91 | – | – | – | 0.94 | – |
| Gunduz [35] | VAE (Relief), SVM | 0.96 | – | 0.97 | 0.88 | – | – |
| Solana-Lavelle [32] | Wrapper, kNN | 0.95 | – | 0.96 | 0.87 | 0.97 | 0.96 |
| Tuncer [31] | MAMa Tree, kNN | 0.97 | – | 0.96 | – | 0.97 | 0.95 |
| Proposed method | HDC, kNN | 0.96 | 0.94 | 0.97 | 0.89 | 0.97 | 0.98 |
Share and Cite

Kumar, A.; Gyanchandani, M.; Shukla, S. Data Complexity-Aware Feature Selection with Symmetric Splitting for Robust Parkinson’s Disease Detection. Symmetry 2026, 18, 22. https://doi.org/10.3390/sym18010022