Entropy-Based Machine Learning Model for Fast Diagnosis and Monitoring of Parkinson’s Disease

This study presents the concept of a computationally efficient machine learning (ML) model for diagnosing and monitoring Parkinson’s disease (PD) using rest-state EEG signals (rs-EEG) from 20 PD subjects and 20 normal control (NC) subjects recorded at a sampling rate of 128 Hz. In a comparative analysis of entropy calculation methods, fuzzy entropy showed the best results in diagnosing and monitoring PD using rs-EEG, with a classification accuracy (A_RKF) of ~99.9%. The most important frequency range of rs-EEG for PD diagnostics is 0–4 Hz, and the most informative signals were mainly received from the right hemisphere of the head. It was also found that A_RKF decreased significantly as the length of rs-EEG segments decreased from 1000 to 150 samples. Using a procedure for selecting the most informative features, it was possible to reduce the computational costs of classification by 11 times while maintaining an A_RKF of ~99.9%. The proposed method can be used in the healthcare internet of things (H-IoT), where low-performance edge devices can implement ML sensors to enhance human resilience to PD.


Introduction
By 2030, experts predict that every sixth person on Earth will be over 60 years of age due to increasing life expectancy [1]. It is estimated that 1.4 billion people will be over 60 by 2050. Age is a major risk factor for mortality and morbidity caused by neurodegenerative diseases [2][3][4]. The symptoms of neurodegenerative disease may begin as early as middle age [5], followed by overt signs and symptoms. By diagnosing and treating patients early, irreversible damage to the nervous system can be reduced, improving their quality of life and length of life.
In addition to diagnostics, a personalized approach to neurodegenerative disease treatment using IoT-enabled environments, such as smart homes and healthcare [8], smart spaces for mHealth applications [9], and smart healthcare [8], is essential to improving patients' quality of life [5][6][7]. The healthcare IoT (H-IoT) [10], also known as IoMT [11], is one of the most efficient tools for this purpose. The key point is that a sensor (within an IoT edge device) participates in making the device smart. In particular, machine learning (ML) methods can be used to analyze the sensed data for diagnosis, e.g., see our concept of an ML sensor for diagnosing COVID-19 [12]. The problem is that IoT edge devices are of low performance, and new effective ML algorithms are required.
Several works have focused on the differences in the entropy of signals in different frequency ranges between patients with PD and the control group [28,31,33]. In [33], relative spectral powers and wavelet packet entropy were used to identify PD. Although entropy features allow for better separation of the two classes, relative spectral power (especially in the beta band) can also be useful. Higher-order spectral features, such as bispectral entropies and mean magnitude, were used for PD diagnosis in [28] with five different classification algorithms: decision tree (DT), k-nearest neighbors (KNN), naive Bayes (NB), probabilistic neural network (PNN), and support vector machine (SVM). The SVM classifier achieved the highest mean accuracy in diagnosing PD, 99.6%. The authors of [31] used the KNN and SVM classifiers to diagnose PD based on energy and entropy features extracted from reconstructed wavelet signals; the KNN and SVM classifiers achieved 99.5% and 99.9% mean accuracy, respectively.
Although the presented results demonstrate a high classification accuracy (more than 99%), most of the approaches used to calculate features are limited. Also, the hyperparameters used when calculating entropy can significantly affect the calculation result. To obtain high accuracy, a number of studies have used many features [30,31,33,34], which complicates the implementation of these methods in low-performance IoT devices. This paper attempts to address these deficiencies by comparing various entropy methods, carefully selecting their parameters, and analyzing EEG signal frequency ranges for diagnosing PD. By analyzing EEG data collected from normal control (NC) subjects and Parkinson's disease (PD) patients using wireless Emotiv EPOC headsets, we have developed a novel method for detecting PD which can be used in a smart IoT environment to enhance human resilience to PD.
The major contributions of this paper are:
• A comparative analysis of the effectiveness of various methods for calculating entropy for identifying PD was carried out;
• The most significant frequency ranges and EEG channels were identified, as well as their combinations;
• A study was conducted to reduce computational costs by selecting the most significant features and reducing the length of the EEG segments analyzed;
• A method of monitoring a patient's condition based on entropy values was developed;
• We propose a machine learning model for monitoring the health status of Parkinson's patients using an IoT environment based on low-performance sensors.
The remainder of this paper is organized as follows. Section 2 provides an overview of the datasets, proposed methods, and performance evaluation. Section 3 presents a comparison of classification accuracy using different EEG channels and frequency bands. Section 4 describes options for optimizing the classification model by reducing the length of the EEG segment and the number of features. Section 5 outlines our further research on the smart IoT environment concept for patient health monitoring and enhancing human resilience. Section 6 summarizes the key findings and limitations of our study.

Dataset
This study was conducted using an EEG dataset consisting of 20 patients with Parkinson's disease and 20 age-matched normal control subjects without a history of psychological or neurological disorders. This dataset was collected at the Hospital Universiti Kebangsaan Malaysia in Malaysia. The entire data acquisition protocol at the Hospital Universiti Kebangsaan Malaysia was approved by the Institutional Ethical Review Board Committee as part of the hospital's ethical review process. An Emotiv EPOC wireless headset with a total of 14 channels (Figure 1a) was used for recording EEG signals from both NCs and PDs in the rest-state condition with the eyes closed for a period of 5 min. In accordance with the international standard 10-20 system, the 14 channels (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4) were placed on the subject's scalp (Figure 1b). The data collected for each of the channels were converted into digital signals at a sampling rate of 128 Hz. Using the Hoehn and Yahr scale, seven patients were classified as having Parkinson's disease stage III, eleven patients as having stage II, and two patients as having stage I. A complete description of the dataset, its acquisition, and its preprocessing can be found in [40][41][42].
Figure 2 shows the workflow diagram of the proposed classification method. It consists of three separate steps: signal preprocessing (Section 2.2), feature generation (Section 2.3), and classification (Section 2.4).

Signal Preprocessing
Considering the wide spectral range of EEG signals (0-64 Hz) and the fact that most brain activity information is contained in relatively narrow frequency subranges [44,45], the entropy of the original signal may be a poor indicator of class separability. Filtering the initial data and decomposing the signal into separate frequency bands using the wavelet transform can increase the information content of EEG signals. A fifth-order Butterworth band-pass filter with cut-off frequencies of 0.5 and 32 Hz was applied to all acquired signals to remove low- and high-frequency noise, while amplitude thresholding of ±85 µV was applied to remove artifacts (eye blinking, eyeball rotation, and eye movements) introduced during acquisition. Since the number of patients was relatively small, each EEG record was divided into 5 non-overlapping segments, each of which was treated as an independent observation in this study. All segments had the same length, which was varied from 150 (~1.2 s) to 1000 (~7.8 s) samples. A discrete wavelet transform (DWT) was performed on the signal using the db4 wavelet [46]. After decomposition into wavelet approximation coefficients (A1-A4) and detail coefficients (D1-D4), each set of coefficients was used on its own to reconstruct a signal, giving the reconstructed signals cA1-cA4 and cD1-cD4. A similar method was used in [31]; however, a different frequency band was chosen.
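Using the original dataset, 9 variants of different signal types were obtained: the original signal O (0-64 Hz), the approximation signals cA1 (0-32 Hz), cA2 (0-16 Hz), cA3 (0-8 Hz), and cA4 (0-4 Hz), and the detail signals cD1 (32-64 Hz), cD2 (16-32 Hz), cD3 (8-16 Hz), and cD4 (4-8 Hz).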
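The frequency ranges of the nine signal types follow from the dyadic structure of the DWT: each decomposition level halves the band passed on from the previous level. The sketch below only illustrates this arithmetic for a 128 Hz sampling rate and four levels; the function name `dwt_bands` is hypothetical, and the bands are nominal (an ideal split is assumed, whereas a real db4 filter bank has overlapping transition bands).

```python
def dwt_bands(fs_hz, levels):
    """Nominal frequency band (Hz) of the original and each reconstructed signal."""
    nyquist = fs_hz / 2.0
    bands = {"O": (0.0, nyquist)}            # original signal spans 0..fs/2
    for j in range(1, levels + 1):
        upper = nyquist / 2 ** (j - 1)       # top of the band entering level j
        split = upper / 2.0                  # each level halves that band
        bands[f"cA{j}"] = (0.0, split)       # approximation keeps the lower half
        bands[f"cD{j}"] = (split, upper)     # detail keeps the upper half
    return bands


if __name__ == "__main__":
    for name, (low, high) in dwt_bands(128, 4).items():
        print(f"{name}: {low:g}-{high:g} Hz")
```

For 128 Hz and four levels this yields nine entries, one per signal type used as input to the entropy calculation.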

Feature Generation
Entropy features were calculated from EEG signals after applying DWT and concatenated to form the feature vector for each class (NC and PD). These feature vectors were then used to classify patients using different machine learning methods. The following entropy measures were considered: singular value decomposition entropy (SVDEn) [47], permutation entropy (PermEn) [48], sample entropy (SampEn) [49], cosine similarity entropy (CoSiEn) [50], fuzzy entropy (FuzzyEn) [51], phase entropy (PhaseEn) [52], and attention entropy (AttnEn) [53]. Entropy was calculated using the EntropyHub (version 0.2) [54] software package, except for SVDEn and PermEn, which were calculated using the Antropy (version 0.1.6) [55] software package. The range of hyperparameters used for computing each type of entropy is shown in Table 1. There are no hyperparameters associated with AttnEn. These methods for calculating entropy are described below (Sections 2.3.1-2.3.7).

SVDEn
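To calculate SVDEn for a time series X = [x_1, x_2, ..., x_i, ..., x_N] of length N, an embedding matrix A is created as follows:
$$A = \begin{bmatrix} x_1 & x_{1+\mathrm{delay}} & \cdots & x_{1+(m-1)\,\mathrm{delay}} \\ x_2 & x_{2+\mathrm{delay}} & \cdots & x_{2+(m-1)\,\mathrm{delay}} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N-(m-1)\,\mathrm{delay}} & x_{N-(m-2)\,\mathrm{delay}} & \cdots & x_N \end{bmatrix} \qquad (1)$$
where m is the embedding dimension and delay is the time delay (in samples) between successive elements of a row. Singular value decomposition is the factorization of matrix A into the product:
$$A = U S V^{T}$$
Matrix U contains the left singular vectors of A, and matrix V contains the right singular vectors. Matrix S is always diagonal, and its coefficients are non-negative real numbers λ_1, ..., λ_k, located on the main diagonal of the matrix, which are called singular values.
The dispersion of the singular values also provides an indication of the complexity of signal dynamics [47]. Singular values can be normalized as:
$$\bar{\lambda}_i = \frac{\lambda_i}{\sum_{j=1}^{k} \lambda_j}$$
Singular value decomposition entropy is defined with the Shannon formula applied to the normalized singular values of the matrix, and calculated as follows [47]:
$$\mathrm{SVDEn} = -\sum_{i=1}^{k} \bar{\lambda}_i \log_2 \bar{\lambda}_i$$
After that, the SVDEn values are normalized in the range from 0 to 1:
$$\mathrm{SVDEn}_{\mathrm{norm}} = \frac{\mathrm{SVDEn}}{\log_2 m}$$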

PermEn
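PermEn is a complexity measure for time series based on the comparison of neighboring values. The permutation entropy PermEn of a one-dimensional data series X is:
$$\mathrm{PermEn} = -\sum_{i=1}^{m!} p_i \log_2 p_i$$
where p_i is the relative frequency of occurrence of the i-th permutation (ordinal pattern) among the rows of the embedding matrix A, which is defined in the same way as in (1).
After that, PermEn values are normalized in the range from 0 to 1:
$$\mathrm{PermEn}_{\mathrm{norm}} = \frac{\mathrm{PermEn}}{\log_2 (m!)}$$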
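As an illustration of the PermEn steps (ordinal patterns of embedded vectors, Shannon entropy of their frequencies, normalization to the range from 0 to 1), a minimal pure-Python sketch is given here. It is not the Antropy implementation used in the study; the function name, the base-2 logarithm, the normalization by log2(m!), and the tie-breaking rule (equal values ordered by position) are assumptions of this example.

```python
import math
from collections import Counter


def perm_entropy(x, m=3, delay=1, normalize=True):
    """Permutation entropy of sequence x with embedding dimension m."""
    if m < 2:
        raise ValueError("m must be at least 2")
    n_vectors = len(x) - (m - 1) * delay
    if n_vectors < 1:
        raise ValueError("series too short for the chosen m and delay")
    patterns = Counter()
    for i in range(n_vectors):
        window = [x[i + k * delay] for k in range(m)]
        # Ordinal pattern: indices that sort the window (ties keep position order).
        pattern = tuple(sorted(range(m), key=window.__getitem__))
        patterns[pattern] += 1
    entropy = -sum(
        (c / n_vectors) * math.log2(c / n_vectors) for c in patterns.values()
    )
    # Assumed normalization: divide by the maximum possible value log2(m!).
    return entropy / math.log2(math.factorial(m)) if normalize else entropy


if __name__ == "__main__":
    print(perm_entropy(list(range(20)), m=3))    # monotonic series, one pattern
    print(perm_entropy([0, 1, 0, 1, 0], m=2))    # two equally likely patterns
```

A monotonic series produces a single ordinal pattern and therefore zero entropy, while a series whose patterns are all equally frequent reaches the normalized maximum of 1.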

SampEn
The SampEn calculation of time series X = [x_1, x_2, ..., x_N] of length N contains several stages. First, the series is divided into template vectors X_i^m = [x_i, x_{i+1}, ..., x_{i+m−1}] of length m (m < N). Then, the number C(m, r) of pairs of vectors X_i^m and X_j^m (i ≠ j) for which the Chebyshev distance ChebDist[X_i^m, X_j^m] does not exceed r is calculated. SampEn for one-dimensional data series X is defined as:
$$\mathrm{SampEn}(m, r) = -\ln \frac{C(m+1, r)}{C(m, r)}$$
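The stages above can be sketched in a few lines of pure Python. This is an illustrative O(N²) version, not the EntropyHub implementation used in the study; the function names are hypothetical, the tolerance is taken as `r_factor` times the standard deviation of the series, and the same N − m starting points are used for both template lengths (a common convention, assumed here).

```python
import math
from statistics import pstdev


def chebyshev(a, b):
    """Maximum absolute component-wise difference between two vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn = -ln(C(m+1, r) / C(m, r)) with r = r_factor * std(x)."""
    n = len(x)
    r = r_factor * pstdev(x)

    def count_pairs(length):
        # Same N - m start points for templates of length m and m + 1.
        templates = [x[i:i + length] for i in range(n - m)]
        return sum(
            1
            for i in range(len(templates))
            for j in range(i + 1, len(templates))
            if chebyshev(templates[i], templates[j]) <= r
        )

    c_m, c_m1 = count_pairs(m), count_pairs(m + 1)
    if c_m == 0 or c_m1 == 0:
        return float("inf")      # undefined when no similar pairs are found
    return -math.log(c_m1 / c_m)


if __name__ == "__main__":
    print(sample_entropy([0, 1] * 10, m=2))   # perfectly regular series
```

For a perfectly periodic series every match of length m extends to a match of length m + 1, so the ratio is 1 and the entropy is zero.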

CoSiEn
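The CoSiEn calculation of time series X = [x_1, x_2, ..., x_N] of length N contains several stages. First, the series is divided into template vectors X_i^m = [x_i, x_{i+1}, ..., x_{i+m−1}] of length m (m < N). Then, the number B(m, r) of pairs of vectors X_i^m and X_j^m (i ≠ j) for which the angular distance AngDist[X_i^m, X_j^m] does not exceed r is calculated. Angular distance between two vectors is calculated as follows:
$$\mathrm{AngDist}[X_i^m, X_j^m] = \frac{1}{\pi} \arccos\left( \frac{X_i^m \cdot X_j^m}{\lVert X_i^m \rVert \, \lVert X_j^m \rVert} \right)$$
CoSiEn for one-dimensional data series X is defined as:
$$\mathrm{CoSiEn}(m, r) = -\left[ \bar{B}(m, r) \log_2 \bar{B}(m, r) + \left(1 - \bar{B}(m, r)\right) \log_2 \left(1 - \bar{B}(m, r)\right) \right]$$
where \bar{B}(m, r) is B(m, r) normalized by the total number of vector pairs.
Sensors 2023, 23, 8609

FuzzyEn
For a time series vector T of length N, it is possible to compose N − m + 1 vectors X_i^m of length m, consisting of normalized successive segments of the original series T. The normalization procedure consists of subtracting T_i^avg from each element of the segment:
$$X_i^m = [T_i, T_{i+1}, \ldots, T_{i+m-1}] - T_i^{avg}$$
where i = 1, ..., N − m + 1 and T_i^avg is calculated as follows:
$$T_i^{avg} = \frac{1}{m} \sum_{k=0}^{m-1} T_{i+k}$$
For any pair of vectors X_i^m and X_j^m (i ≠ j), one can determine the distance d_ij^m between them, equal to the maximum absolute difference between the vector components:
$$d_{ij}^m = \max_{k \in [0,\, m-1]} \left| \left(T_{i+k} - T_i^{avg}\right) - \left(T_{j+k} - T_j^{avg}\right) \right|$$
The similarity between vectors is determined using the fuzzy function D_ij^m:
$$D_{ij}^m = \exp\left( -\frac{\left(d_{ij}^m\right)^{r_2}}{r} \right)$$
The FuzzyEn entropy value is calculated based on the average similarity of vectors. For a finite series T it can be expressed as:
$$\mathrm{FuzzyEn}(m, r, r_2) = \ln \varphi_m - \ln \varphi_{m+1}$$
where the function φ_m is expressed through:
$$\varphi_m = \frac{1}{N-m} \sum_{i=1}^{N-m} \left( \frac{1}{N-m-1} \sum_{j=1,\, j \neq i}^{N-m} D_{ij}^m \right)$$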
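Because FuzzyEn is the measure that performed best in this study, a compact pure-Python sketch of its calculation may be helpful. It is not the EntropyHub code used by the authors: the function name is hypothetical, the exponential membership function exp(−d^r2 / r) and the use of N − m templates for both lengths are assumptions of this example, and the defaults mirror the best-performing parameters reported in this paper (m = 1, r = 0.15 × std, r_2 = 5).

```python
import math
from statistics import pstdev


def fuzzy_entropy(t, m=1, r=None, r2=5):
    """FuzzyEn = ln(phi_m) - ln(phi_{m+1}) for sequence t."""
    n = len(t)
    if r is None:
        r = 0.15 * pstdev(t)                 # tolerance tied to the series std
    if r <= 0:
        raise ValueError("r must be positive (constant series?)")

    def phi(length):
        # Zero-mean templates; the same N - m start points for m and m + 1.
        vectors = []
        for i in range(n - m):
            segment = t[i:i + length]
            avg = sum(segment) / length
            vectors.append([v - avg for v in segment])
        count = len(vectors)
        total = 0.0
        for i in range(count):
            for j in range(count):
                if i == j:
                    continue                 # self-matches are excluded
                d = max(abs(a - b) for a, b in zip(vectors[i], vectors[j]))
                total += math.exp(-(d ** r2) / r)   # fuzzy similarity degree
        return total / (count * (count - 1))

    return math.log(phi(m)) - math.log(phi(m + 1))


if __name__ == "__main__":
    print(fuzzy_entropy([0, 1] * 10))
```

The double loop makes the cost grow quadratically with segment length, which is consistent with FuzzyEn being the most resource-intensive step when long EEG segments are analyzed.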

PhaseEn
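In order to calculate PhaseEn of time series X = [x_1, x_2, ..., x_N] of length N, it is necessary to first construct vectors Y and W, which are the coordinates of the points on the second-order difference plot, defined as follows:
$$Y_n = x_{n+2} - x_{n+1}, \qquad W_n = x_{n+1} - x_n, \qquad n = 1, \ldots, N-2$$
Then, a vector containing the slope angles of each point (in the range of 0-2π) is calculated as follows:
$$\theta_n = \arctan\left( \frac{Y_n}{W_n} \right)$$
with the angle placed in the quadrant determined by the signs of Y_n and W_n. Then, the entire range (2π) is divided into K equal sectors, for each of which the total slope angle S_i (i = 1, ..., K) is calculated:
$$S_i = \sum_{\theta_n \in \text{sector } i} \theta_n$$
After that, probability distribution p_i is calculated for each of the K sectors:
$$p_i = \frac{S_i}{\sum_{j=1}^{K} S_j}$$
PhaseEn is computed as:
$$\mathrm{PhaseEn} = -\frac{1}{\ln K} \sum_{i=1}^{K} p_i \ln p_i$$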

AttnEn
The AttnEn calculation of time series X = [x_1, x_2, ..., x_N] of length N contains several stages. First, it is necessary to calculate the positions of local minima and maxima within the time series. By local minimum, we mean a point x_i for which the inequalities x_i < x_{i−1} and x_i < x_{i+1} hold, and by local maximum, we mean a point x_j for which the inequalities x_j > x_{j−1} and x_j > x_{j+1} hold. Then, the intervals between two successive peak points (minima and maxima) are calculated. In this case, 4 variants of such intervals are considered: between two maxima (I_max-max), between two minima (I_min-min), between a maximum and the subsequent minimum (I_max-min), and between a minimum and the subsequent maximum (I_min-max).
After calculating the 4 sets of intervals (I_max-max, I_min-min, I_max-min, I_min-max), the frequency of occurrence of each interval within each set is calculated, on the basis of which Shannon entropy values are calculated (ShEn_max-max, ShEn_min-min, ShEn_max-min, ShEn_min-max). The AttnEn value is the average of these entropies: AttnEn = (ShEn_max-max + ShEn_min-min + ShEn_max-min + ShEn_min-max)/4.
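The AttnEn procedure maps directly onto a short program: find strict local extrema, collect the four interval sets, and average their Shannon entropies. The sketch below is a pure-Python illustration rather than the EntropyHub routine used in the study; the function names are hypothetical, the base-2 logarithm is an assumption, and only consecutive extrema of different kinds contribute to the max-min and min-max sets.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def attention_entropy(x):
    """Average Shannon entropy of the four peak-to-peak interval sets."""
    inner = range(1, len(x) - 1)
    maxima = [i for i in inner if x[i] > x[i - 1] and x[i] > x[i + 1]]
    minima = [i for i in inner if x[i] < x[i - 1] and x[i] < x[i + 1]]

    i_max_max = [b - a for a, b in zip(maxima, maxima[1:])]
    i_min_min = [b - a for a, b in zip(minima, minima[1:])]

    # Walk through all extrema in time order to pair each with its successor.
    extrema = sorted([(i, "max") for i in maxima] + [(i, "min") for i in minima])
    i_max_min, i_min_max = [], []
    for (a, kind_a), (b, kind_b) in zip(extrema, extrema[1:]):
        if kind_a == "max" and kind_b == "min":
            i_max_min.append(b - a)
        elif kind_a == "min" and kind_b == "max":
            i_min_max.append(b - a)

    interval_sets = [i_max_max, i_min_min, i_max_min, i_min_max]
    if any(not s for s in interval_sets):
        raise ValueError("too few extrema to form all four interval sets")
    return sum(shannon_entropy(s) for s in interval_sets) / 4


if __name__ == "__main__":
    print(attention_entropy([0, 1, 0, 1, 0, 1, 0, 1, 0]))
```

Since only index comparisons and counting are involved, the cost is linear in the segment length, and no hyperparameters need to be tuned.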

Assessment of Classification Accuracy
Classification accuracy was assessed using support vector classifiers (SVCs) implemented in scikit-learn. The assessment involved two stages. In the first stage, hyperparameters were selected by means of repeated K-fold cross-validation (RKF) [56]. The dataset was divided into K = 10 blocks in N = 10 different ways. For each of the N partitions, the K blocks were filled with different samples while preserving a uniform distribution of classes. Sets of samples for training and validating the classifier were created from the K blocks, with each block used for validation once and the remaining K − 1 = 9 blocks used for training.
The classifier hyperparameters were then selected as those giving the maximum average accuracy on the validation sets. K-fold cross-validation allows for the selection of hyperparameter values that do not overfit the model because many training and validation sets are used. However, because the hyperparameters are optimized on a fixed set of partitions, the average cross-validation accuracy may be too optimistic. Consequently, after determining the optimal hyperparameters, a second stage was carried out. During the second stage, the optimal hyperparameter values were used and cross-validation was performed on N = 30 other partitions into K = 10 blocks, different from those of the first stage. Classification accuracy was measured as the average accuracy A_RKF across the new partitions.
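The partitioning scheme behind both stages can be sketched without any ML library. The generator below is a hypothetical stand-in written for illustration (scikit-learn provides `RepeatedStratifiedKFold` for the same purpose); `score_fold` is an assumed callback that trains a classifier on the training indices and returns its accuracy on the validation indices. Using different seeds for the two stages gives the second stage partitions that were not seen during hyperparameter selection.

```python
import random


def repeated_stratified_kfold(labels, k=10, n_repeats=10, seed=0):
    """Yield (train, validation) index lists for n_repeats stratified k-fold splits."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    for _ in range(n_repeats):
        folds = [[] for _ in range(k)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            for position, index in enumerate(shuffled):
                folds[position % k].append(index)   # deal each class round-robin
        for f in range(k):
            validation = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, validation


def mean_accuracy(labels, score_fold, k, n_repeats, seed):
    """Average score_fold(train, validation) over all repeated folds."""
    scores = [
        score_fold(train, validation)
        for train, validation in repeated_stratified_kfold(labels, k, n_repeats, seed)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    labels = [0] * 100 + [1] * 100               # 200 observations, two classes
    dummy = lambda train, validation: 1.0        # placeholder for a real classifier
    stage1 = mean_accuracy(labels, dummy, k=10, n_repeats=10, seed=1)   # tuning
    stage2 = mean_accuracy(labels, dummy, k=10, n_repeats=30, seed=2)   # A_RKF
    print(stage1, stage2)
```

In the first stage this average would be computed for every candidate hyperparameter set and the best one kept; in the second stage it is computed once, with the selected hyperparameters, to obtain the reported accuracy.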

Experimental Results and Discussion
In this section, we present the results of assessing classification accuracy using all features, one signal type, all channels, one channel, and one feature.

Classification Accuracy Using One Method for Calculating the Entropy
For both NC and PD subjects, each entropy feature was computed using all nine types of input EEG data (the original signal and eight reconstructed signals based on detail and approximation coefficients) across 14 channels (126 features in total). A model was developed to categorize NC and PD based on the features extracted from NC and PD pairs. Figure 3 shows the classification accuracy (A_RKF) of each entropy feature (PermEn, SampEn, CoSiEn, FuzzyEn, PhaseEn, AttnEn, and SVDEn) computed with different hyperparameter values. Using five non-overlapping segments from each of the 40 subjects (20 PDs and 20 NCs), we extracted entropy features from 200 observations. In this task, the optimal parameters for each of the entropy calculations were determined.
The best classification result, A_RKF = 99.9%, was demonstrated for FuzzyEn with parameters (m = 1, r = 0.15 × std, r_2 = 5). The influence of the r parameter in this case is insignificant. It was observed that the A_RKF value increases as the r_2 parameter increases from 1 to 5. The next most accurate entropy method was AttnEn (A_RKF = 97.9%). This method has no hyperparameters. Acceptable accuracy was achieved for PermEn (A_RKF = 95% for m = 5) and SVDEn (A_RKF = 93.6% for m = 3). Both curves have a maximum at intermediate values of the m parameter. The worst results were obtained using the SampEn (A_RKF = 91.5% for m = 2, r = 0.25 × std), PhaseEn (A_RKF = 81.5% for K = 6), and CoSiEn (A_RKF = 81.3% for m = 3, r = 0.05) methods.

Classification Accuracy Using One Type of Signal
Furthermore, we wished to identify which of the nine types of EEG data described in Section 2.2 is most effective for the different entropy measures. Through this investigation, the computational complexity (memory and computation time) of the proposed PD diagnosis system can be reduced. This section presents the results of calculating classification accuracy A_RKF using each of the nine signal types (O, cA1-cA4, cD1-cD4) across the 14 channels (14 features in total). The values of the optimal entropy parameters correspond to those presented in Section 3.1. Figure 4 shows the dependence of A_RKF on the type of signal.

According to the experimental results, FuzzyEn has higher accuracy than the other types of entropy features for all types of signals. The use of only one type of signal (14 features) generally reduces the classification accuracy A_RKF compared to using all 126 features. When using FuzzyEn, A_RKF was high for the following signals: cD2 (A_RKF = 98.9%), cA3 (A_RKF = 98.2%), and cA4 (A_RKF = 98%). For the other entropies, high A_RKF values were observed for signals O, cA1, cA2, cA3, and cA4. Perhaps this is due to the presence of a low-frequency component in the range from 0 to 4 Hz in these signals, namely O (0-64 Hz), cA1 (0-32 Hz), cA2 (0-16 Hz), cA3 (0-8 Hz), and cA4 (0-4 Hz), while the cD1 (32-64 Hz), cD2 (16-32 Hz), cD3 (8-16 Hz), and cD4 (4-8 Hz) signals contain only higher-frequency components. Low-frequency rhythms (delta and theta) are usually more prominent in the resting state with the eyes closed than in waking and alert states (with the eyes open and focused). In people with neurological disorders, the delta and theta rhythms tend to dominate more than in healthy individuals. For this reason, low-frequency rhythms (delta and theta) are more accurate in diagnosing Parkinson's disease than high-frequency rhythms (alpha to gamma).
The decrease in accuracy when using only one type of signal is quite significant: classification error E_RKF = 1 − A_RKF increased by 11 times compared to the result achieved when using all features (Section 3.1). Thus, the use of one frequency range is not enough to achieve the maximum classification accuracy A_RKF = 99.9%.
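The factor of 11 follows from comparing errors rather than accuracies: an accuracy drop from 99.9% to 98.9% looks small, but the error grows from 0.1% to 1.1%. A short check of this arithmetic, using a hypothetical helper `error_ratio` and the accuracy values quoted in this section, is given below.

```python
def error_ratio(acc_reduced_pct, acc_full_pct):
    """Ratio of classification errors E = 100 - A for two accuracies in percent."""
    return (100.0 - acc_reduced_pct) / (100.0 - acc_full_pct)


if __name__ == "__main__":
    # Best single signal type (cD2 with FuzzyEn) versus all 126 features.
    print(round(error_ratio(98.9, 99.9), 1))
```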

Classification Accuracy Using a Single Channel
In this section, we present the results of classification accuracy A_RKF using all nine signal types (nine features in total) corresponding to one of the 14 channels (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4). The values of the optimal entropy parameters are specified in Section 3.1. In Figure 5, A_RKF is shown in relation to the channel number.

Analyzing the results presented in Figure 5, it can be noted that for most channels the highest A_RKF value was obtained using FuzzyEn, with the best results for the P8 (A_RKF = 90.8%) and F8 (A_RKF = 88.8%) channels. It is not possible to find pronounced dependencies that are repeated for all entropies. The classification accuracy obtained when using one channel is significantly reduced compared to the results achieved when using all channels: the minimum classification error E_RKF is ~8 times higher than when using one type of signal across all channels (Section 3.2) and 92 times higher than when using all signals and all channels (Section 3.1). This suggests the need to use multichannel EEG measurement devices to maximize accuracy.

Classification Accuracy Using One Feature
In Sections 3.2 and 3.3, reduced datasets with fourteen (one signal type) and nine (one channel) features were used; however, Figures 4 and 5 show that classification accuracy varies significantly across different channels and signal types (frequency bands). At the same time, when analyzing these two criteria separately, we cannot determine the most informative combinations of channels and frequency ranges.
This section presents the results of using one feature (one type of signal for one channel). In this case, the FuzzyEn method, which produced the best accuracy estimate in Sections 3.2 and 3.3, is used with the parameters m = 1, r = 0.15 × std, r_2 = 5. The graphs are grouped by signal type into two groups: (1) the detail signals cD1-cD4 (Figure 6a); (2) the original signal O and the approximation signals cA1-cA4 (Figure 6b).
The most informative frequency range for the first group (Figure 6a) is cD4 (4-8 Hz), for which the average value of A_RKF (A_RKF_mean) is equal to 67.1%, while for the rest of the frequency ranges, A_RKF_mean is ~63%. Among the signals in the second group (Figure 6b), the most informative is cA3 (0-8 Hz), with A_RKF_mean = 71.4%, while signals containing higher-frequency components show lower values of A_RKF_mean: 63.2% for O (0-64 Hz), 62.9% for cA1 (0-32 Hz), and 65.8% for cA2 (0-16 Hz). The lower accuracy of A_RKF_mean = 68.2% for cA4 (0-4 Hz) may indicate that the 4-8 Hz range is needed to improve signal classification accuracy. The highest classification accuracy by one feature was obtained for the T8 channel and the cA3 signal: A_RKF = 79.5%.
To determine the most informative combinations of channels and frequency ranges, Table 2 was compiled. It contains the 15 combinations of channel and signal type with the highest A_RKF values from those presented in Figure 6a,b. It can be noted that for most of the channels presented in the table (T8, O2, FC6, F3, AF4), only the low-frequency components of the original signal are the most informative, namely cA3 (0-8 Hz), cA4 (0-4 Hz), and cD4 (4-8 Hz), while for channels F8 and O1, signals with high-frequency components are also informative: O (0-64 Hz) and cD1 (32-64 Hz). It is also worth noting that most of the channels that give the best results are located in the right hemisphere of the head. To our knowledge, there are no earlier studies that examine the impact of specific regions or specific hemispheres on PD diagnosis using rest-state EEG signals.
With the proposed entropy-based PD diagnosis methodology, right hemisphere channels differed significantly from left hemisphere channels. This result should be interpreted in light of the following constraints: (a) the limited number of PD subjects, the clinical history of the patients, and the progression of PD in the subjects; (b) the limited number of channels (14 channels); and (c) the proposed methodology of entropy features and machine learning-based diagnosis. No specific region of the brain has been singled out in the literature on diagnosing PD due to the lack of valid scientific evidence. By conducting the experiment on another PD dataset with a larger number of subjects and a higher number of EEG channels, we could test our proposed conclusion in the future.

Model Optimization
Section 3.4 showed that different types of signals perform best on different channels. Achieving a high classification accuracy with a minimum number of features is therefore an interesting goal. We examined how the accuracy A_RKF changes with the number of features computed using FuzzyEn (Section 3.4). To do this, we used an iterative approach in which the first selected feature was the one giving the maximum value of A_RKF on its own. Next, the A_RKF value was calculated for combinations of two features. The evaluation procedure was then repeated, each time adding one more of the remaining features. Figure 7 illustrates the dependence of A_RKF on the number of features.
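One way to read this iterative procedure is as greedy forward selection, sketched below. The greedy rule (at every step, add the remaining feature that maximizes the score of the enlarged set) is an assumption of this example, since the text does not spell out how the next feature is chosen. The names `forward_selection`, `score`, and `toy_weights` are hypothetical; in practice `score` would return the cross-validated accuracy A_RKF of a classifier trained on the given feature subset.

```python
def forward_selection(features, score, max_features=None):
    """Greedily grow a feature subset, recording the score after each addition."""
    selected, history = [], []
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        # Try each remaining feature together with those already selected.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((len(selected), score(selected)))
    return selected, history


if __name__ == "__main__":
    # Toy score: each feature contributes a fixed amount (illustration only).
    toy_weights = {"T8_cA3": 0.5, "O2_cA4": 0.3, "F8_O": 0.1}
    toy_score = lambda subset: sum(toy_weights[f] for f in subset)
    selected, history = forward_selection(toy_weights, toy_score)
    print(selected)
    print(history)
```

The recorded history corresponds to a curve of accuracy against the number of features, from which the smallest subset reaching the target accuracy can be read off.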
With 11 features, classification accuracy A_RKF is 99.9%, the same as that achieved using all 126 features. By minimizing the number of features, it is possible to reduce the computational costs of classification and use lower-performance devices for analysis, such as peripheral IoT devices or embedded analytical modules in EEG signal measurement devices.
The length of the EEG segment (L_EEG) can also be reduced to decrease the amount of data to be processed. In Section 3, we used segments of 1000 samples (~7.8 s). However, this length can be shortened in order to speed up calculations. Shortening the segment reduces the cost of the most resource-intensive part of the analysis, the calculation of FuzzyEn. Another part of the time is spent filtering the signal using wavelet methods. Figure 8 shows the dependence of A_RKF on segment length L_EEG when using all 126 features (see Section 3.1) or the 11 most informative ones (this section).
A segment length L_EEG of 1000 samples provides a high classification accuracy of 99.9% for both 11 and 126 features. As segment length L_EEG decreases, classification accuracy A_RKF also decreases, but less steeply for 126 features than for 11. For example, reducing the length by just 20% (to L_EEG = 800) decreased accuracy to 99.4% for 126 features and to 98.2% for 11 features. Thus, the error E_RKF increased by 6 times for 126 features and by 18 times for 11 features.
Since the main purpose of reducing computational costs is to reduce computation time, we compared the computation time for one segment (calculation of entropy features and classification by the trained model) for different segment lengths LEEG and different numbers of features. The calculations were performed on a desktop computer with an Intel i5-7200U (2.5 GHz) processor and 8 GB of RAM.
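To illustrate how the time for a single entropy feature might be measured, the sketch below implements one common formulation of fuzzy entropy in pure Python and times it with `time.perf_counter`. It is an assumption-laden illustration rather than the implementation used in this study: the embedding dimension `m = 2`, the tolerance `r = 0.2 * std`, and the exponent `power = 2` are assumed defaults, and the function names are hypothetical. This naive pairwise version scales quadratically with segment length, so its timings are not comparable with the values reported here.

```python
import math
import random
import statistics
import time
from typing import List, Sequence


def _mean_similarity(x: Sequence[float], dim: int, n_templates: int,
                     r: float, power: float) -> float:
    """Average fuzzy similarity over all pairs of baseline-removed templates."""
    templates: List[List[float]] = []
    for i in range(n_templates):
        window = x[i:i + dim]
        baseline = sum(window) / dim
        templates.append([v - baseline for v in window])
    total = 0.0
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            # Chebyshev distance between the two templates.
            d = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
            total += math.exp(-(d ** power) / r)
    return total / (n_templates * (n_templates - 1) / 2)


def fuzzy_entropy(x: Sequence[float], m: int = 2, r_factor: float = 0.2,
                  power: float = 2.0) -> float:
    """FuzzyEn = ln(phi_m) - ln(phi_{m+1}) with tolerance r = r_factor * std."""
    r = r_factor * statistics.pstdev(x)
    if r == 0.0:
        raise ValueError("signal is constant; tolerance r would be zero")
    n_templates = len(x) - m  # same template count for dim m and m + 1
    phi_m = _mean_similarity(x, m, n_templates, r, power)
    phi_m1 = _mean_similarity(x, m + 1, n_templates, r, power)
    return math.log(phi_m) - math.log(phi_m1)


if __name__ == "__main__":
    random.seed(0)
    for length in (200, 400):
        segment = [random.gauss(0.0, 1.0) for _ in range(length)]
        start = time.perf_counter()
        value = fuzzy_entropy(segment)
        elapsed = time.perf_counter() - start
        print(f"LEEG={length}: FuzzyEn={value:.3f}, time={elapsed:.3f} s")
```

A regular signal such as a sine wave should yield a lower value than white noise, which is a quick sanity check for any entropy implementation before timing it.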
With more than 350 samples, computation time tcomp depends linearly on segment length LEEG, since most of the time is spent calculating entropy features. It took approximately 0.06 s to calculate one feature for a segment of length LEEG = 1000. Figure 9 shows that reducing the number of features significantly reduces calculation time (for example, with LEEG = 1000, calculation time differs by a factor of 11) while maintaining a low classification error (see Figure 8). Reducing the segment length does not significantly improve calculation speed (for example, the speed difference between LEEG = 1000 and LEEG = 800 is only 25%), but it significantly increases classification error ERKF.
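These figures are consistent with a simple back-of-the-envelope model. The sketch below assumes that computation time is proportional both to the number of features and to the segment length, and that classification time is negligible. These are simplifying assumptions for illustration, and `estimated_time` is a hypothetical helper, not a function from the study.

```python
T_FEATURE_S = 0.06  # reported time to compute one feature at LEEG = 1000


def estimated_time(n_features: int, l_eeg: int = 1000,
                   t_feature_s: float = T_FEATURE_S,
                   l_ref: int = 1000) -> float:
    """Estimated entropy computation time in seconds for one segment.

    Assumes time scales proportionally with the feature count and with
    the segment length relative to the reference length l_ref.
    """
    return n_features * t_feature_s * l_eeg / l_ref


if __name__ == "__main__":
    t_all = estimated_time(126)
    t_best = estimated_time(11)
    print(f"126 features: {t_all:.2f} s, 11 features: {t_best:.2f} s")
    print(f"speed-up from feature selection: {t_all / t_best:.1f}x")
    ratio = estimated_time(11, 1000) / estimated_time(11, 800)
    print(f"LEEG = 1000 vs 800: {100 * (ratio - 1):.0f}% more time")
```

Under these assumptions, cutting the feature set from 126 to 11 changes the time by roughly an order of magnitude, whereas shortening the segment from 1000 to 800 samples changes it by only a quarter.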
Sensors 2023, 23, 8609

Future Work: Smart IoT Environment Concept for Patient Health Monitoring
Based on the results presented in Section 3, we conclude that entropy features can be used to analyze EEG signals in order to effectively diagnose PD patients. We now present the idea of a smart IoT environment that continuously monitors the patient's condition at home (Figure 10). Such a smart IoT environment collects and analyzes a wide array of information in real time using ML sensors in edge IoT devices. The results are then presented to both the patient and the attending physician through remote, authorized access to the data. The latter is especially important if the treatment takes place at home rather than in a medical facility [57]. An attending physician can intervene quickly if a patient's condition deteriorates, of which the patient may be unaware due to the deterioration in cognitive functions. This approach enhances human resilience to PD, making everyday life more comfortable and easier.

The optimization of information processing is an important step in developing IoT environments and low-performance sensors that monitor PD patients' health status. Due to their limited computing capabilities and small amounts of RAM, IoT devices and gateways need to reduce the volume of data they handle in order to speed up data processing. An IoMT network is capable of continuously monitoring physiological parameter changes in humans by using machine learning (ML) models trained on smart sensors [12,58,59]. Physiological or biomedical sensors that are placed on the patient's body (wearable sensors) measure different types of physiological responses, including heart rate, blood pressure, skin electrical conductivity, oxygen saturation, heart electrical activity, electroencephalograms (EEGs), etc. [60]. Additionally, some sensors can be placed in the room where the patient is located to monitor their movement patterns, gait, physical activity, etc. [61,62]. In addition to transforming the hardware designs of traditional sensor systems using ML techniques, artificial intelligence sensors (or smart sensors) can also be designed holistically based on ML methods [63] and machine learning algorithms [64,65]. A further development of the ML sensor paradigm was achieved by Warden et al. [59] and Matthew Stewart [58], where the authors introduced the terms Sensors 1.0 and Sensors 2.0. Sensors 2.0 involve both a sensor and a machine learning module integrated into one device.
In Section 4, we showed that only 11 features are sufficient to identify PD with a classification accuracy ARKF of ~99.9%. Future research could develop a Sensors 2.0 device implemented in real hardware (a wireless headset) (Figure 10). EEG signals will be input into the model, and the output will be the degree of disease development. This may be part of a smart IoT environment for patient health monitoring. To implement the EEG signal classification methods proposed in this work, we propose using a Raspberry Pi Zero W.
A further direction of research could be the fusion of information from EEG devices and an IoT video camera. Continuous monitoring of the patient's condition could include regular (e.g., weekly) EEG measurements at rest and continuous monitoring of motor activity using video surveillance. By analyzing the video image, it would be possible to identify specific motor activity disorders characteristic of Parkinson's disease. Both the patient and the attending physician would be able to monitor the patient's condition objectively based on the analysis results. The smart IoT environment could interact with a medical information system over the network. This would be especially relevant to remote northern regions with low population density and long distances to medical institutions with the necessary infrastructure. Additionally, it would reduce the burden on medical facilities and reduce the cost and time of transporting patients.

Conclusions
This study proposes a novel ML model based on EEG entropy features for PD diagnosis and monitoring in smart IoT environments. We investigated which entropy method is most effective for calculating EEG entropy features and found that fuzzy entropy performed well in detecting and monitoring Parkinson's disease. We also identified the most informative EEG frequency range: low-frequency signals (0–4 Hz) contributed the most to high classification accuracy. Additionally, the most informative signals were received primarily from the right hemisphere of the head (F8, P8, T8, FC6). A combination of signal frequency ranges and channels was selected to accurately diagnose PD with only 11 features, achieving a classification accuracy ARKF of ~99.9% while reducing data processing time by ~11 times. A study of the dependence of classification accuracy ARKF on the length of EEG segments (LEEG) showed a significant decrease in ARKF with a decrease in LEEG: from 99.9% for LEEG = 1000 to 98.3% for LEEG = 800 when using the 11 best features. At the same time, decreasing LEEG only slightly reduced computation time, so this approach does not make much practical sense. This also shows the limitations of the method: to obtain a high classification accuracy, it is necessary to use long segments of the EEG signal (1000 samples or ~7.8 s). An optimized model with a small number of features and reduced computational costs could be used in low-performance devices, and so would be applicable for smart IoT environments with ML sensors.

Figure 2. The workflow diagram of the proposed classification method.

Figure 7. Dependence of classification accuracy ARKF on the number of features.

Figure 8. Dependence of classification accuracy ARKF on segment length LEEG.

Figure 9. Dependence of computation time tcomp on segment length LEEG.

Figure 10. The concept of a smart IoT environment that can continuously monitor Parkinson's disease patients.

Figure 11. Histogram of distribution of FuzzyEn values for signal cA3 of channel T8.

Table 1. Ranges of parameters used to create entropy features.

Table 2. Combinations of channels and signal type that give the highest ARKF value.