Wearable-Sensor-Based Detection and Prediction of Freezing of Gait in Parkinson’s Disease: A Review

Freezing of gait (FOG) is a serious gait disturbance, common in mid- and late-stage Parkinson’s disease, that affects mobility and increases fall risk. Wearable sensors have been used to detect and predict FOG with the ultimate aim of preventing freezes or reducing their effect using gait monitoring and assistive devices. This review presents and assesses the state of the art of FOG detection and prediction using wearable sensors, with the intention of providing guidance on current knowledge, and identifying knowledge gaps that need to be filled and challenges to be considered in future studies. This review searched the Scopus, PubMed, and Web of Science databases to identify studies that used wearable sensors to detect or predict FOG episodes in Parkinson’s disease. Following screening, 74 publications were included, comprising 68 publications detecting FOG, seven predicting FOG, and one in both categories. Details were extracted regarding participants, walking task, sensor type and body location, detection or prediction approach, feature extraction and selection, classification method, and detection and prediction performance. The results showed that increasingly complex machine-learning algorithms combined with diverse feature sets improved FOG detection. The lack of large FOG datasets and highly person-specific FOG manifestation were common challenges. Transfer learning and semi-supervised learning were promising for FOG detection and prediction since they provided person-specific tuning while preserving model generalization.


Introduction
Parkinson's disease (PD) is a progressive neurodegenerative condition that presents numerous life-altering symptoms, including the characteristic upper-limb trembling [1]. In moderate to advanced PD, locomotion can deteriorate into a flexed upper body posture with small shuffling steps, an anteriorly-shifted centre of mass, decreased walking speed, poor balance, increased gait variability, and freezing of gait (FOG) [2][3][4][5][6].
A FOG episode is a complex and highly-variable phenomenon defined as a "brief, episodic absence or marked reduction of forward progression of the feet despite the intention to walk" [7]. Freezing is often described as the sensation of having one's feet glued to the floor with an inability to initiate the next step, and becomes increasingly common as PD progresses [2,8]. Although typically lasting only a few seconds [9], freezes can lead to falls [10-12]. Since FOG can occur multiple times a day, most commonly between doses when medication is wearing off [11,13], FOG-related fall risk is an ongoing concern.


Methods
Articles were eligible for inclusion if they met the following criteria:

1. Use wearable sensor data as input (direct from sensor or wearable sensor datasets).

2. Involve people with PD, or data from people with PD, who experience FOG.

3. Have the primary goal of detecting or predicting FOG. Articles were not included if they examined cueing using a FOG detection method developed in previous research and reported in another article, or if they only classified individuals as freezers or non-freezers, rather than detecting freezing episodes.
Articles were excluded if they were not published in English, if they were not full texts (abstract only publications were excluded), or if they lacked adequate descriptions and explanations of the detection or prediction methods (i.e., training and testing methods not described, important variables not defined, results not presented).
Eligible articles were used to extract, where available, the following characteristics: population, data collection location and summary, sensor type and location, FOG detection and prediction method (i.e., classifier or machine-learning algorithm), features, whether feature extraction and selection were used, classification performance, and evaluation in real-time.

Article characteristics included:
• Population: The number of participants in the study, i.e., healthy controls (HC), people with FOG symptoms (FOG-PD), people with no FOG symptoms (NFOG-PD), and FOG symptom status unknown or not reported (UFOG-PD); the number of PD participants who froze during data collection; medication state during data collection (ON or OFF); and the number of FOG episodes.
• Data collection location and summary: Whether data collection was performed in a laboratory setting or in the participant's home, and a summary of the walking tasks performed.
• Sensor type and location: The type and number of sensors used, and sensor location on the body.
• FOG detection method: Methods used to detect and predict FOG, i.e., general approach (e.g., machine-learning model); model training method (person-specific: trained using data from a single person; or person-independent: trained using data from multiple people and not customized for an individual); whether the data was windowed; window length; and extent of detection (i.e., detection performed on each data point, window, or FOG event, etc.). Where multiple methods were attempted, the method with the best performance or research focus was reported.
• Feature extraction and feature selection: Features are variables calculated from sensor data. Feature selection uses feature ranking, filtering, or other techniques to produce an appropriate feature subset with fewer redundant features. Reporting features that performed best in FOG detection, or comparing detection performance of different features after model testing, was not considered as feature selection.
• Classifier performance: Sensitivity, specificity, and other performance metrics reported.
• Real-time: Reporting the detection of a FOG episode as it occurs. In this review, real-time refers to detection using a live wearable-sensor data stream.
Feature analysis included:
• Feature Name: Feature name, or a short description if not named in the cited article.
• Sensor Type: The type of sensor used to calculate the feature: accelerometer (Acc), gyroscope (Gyro), force sensitive resistor (FSR), electromyography (EMG), electroencephalogram (EEG), galvanic skin response (GSR), goniometer, telemeter, or camera-based motion capture (CBMC) (included if used with a wearable sensor).
• Sensor Location: Body location where the sensor was placed.
• Feature Description: Brief explanation of the feature.
• Source: Articles that used the feature as input for FOG detection or prediction.

Results
The initial search provided 323 documents. An additional 10 articles that did not appear in the search but were referenced by other articles were included, resulting in 333 articles. After removing duplicates, 178 documents were available (Figure 1). Following screening and eligibility assessment, 74 articles were included in the review: 68 on FOG detection, seven on FOG prediction, and one article in both categories. Study characteristics related to population, data collection location and summary, sensor type and location, FOG detection method, feature extraction and selection, classifier performance, and whether analysis was performed in real time are presented in Table 1. Features extracted from wearable sensor data are presented in Table 2. Table 3 presents a summary of the top machine-learning methods from studies that compared different machine-learning classifiers for FOG detection using wearable sensors.

(Table 1, partially recovered: [52] reported 182 FOG episodes and [58] reported 184 episodes, using nine IMUs on the wrists, thighs, ankles, feet, and lower back.)

Table 2 features (partially recovered; each entry gives feature, sensor type, sensor location, description, and source):

• Variance (Acc, Gyro): Variance of acceleration or angular velocity in a given window, for 3 axes. In [83] and [91], variance was also calculated for the FFT signal and for the detail and approximation coefficients of the discrete wavelet transform. [42,45,54,83,91]
• Acceleration indicator (S_AC) (Acc; shank, thigh, lower back): Binary value to detect acceleration in each axis, S_AC = sgn(X − X̄ − σ)+, where X is a set of acceleration data, X̄ is the mean of X, σ is the standard deviation of X, sgn(a) is the sign function of a, and (a)+ returns a only if a ≥ 0, otherwise 0. [20]
• Zero velocity and trembling event intervals (ZVEI, TREI) (Acc, Gyro; heel): Direction of gravitational acceleration used to calculate ZVEI and TREI to determine whether the foot is stationary (zero velocity) or trembling, from all acceleration and angular velocity axes. [82]
• Foot speed (Acc, Gyro; heel): Foot position, orientation, and velocity, from 3-axis acceleration and angular velocity [106]. [82]
• Integral (Acc; waist, shank, thigh, lower back): Integral of acceleration in a given window, for a given axis.
• v-order 2 and 3, average amplitude change, difference absolute standard deviation, and maximum fractal length (MFL) (Acc; shank, thigh, lower back): Time-domain amplitude features computed for acceleration x within a window of N data points; calculated for 3 axes. [86]
• Step length (Acc, CBMC; waist, thigh, shank, foot): Distance (m) between consecutive footfalls of the same limb, measured as the double integral of A/P acceleration or by camera-based motion capture. [35,71,77]
• Step duration (Gyro; thigh, shank, ankle, foot): Duration (s) between consecutive footfalls of the same limb, calculated from angular velocity peaks (raw or filtered). [35,50,71]
• Cadence (Acc, Gyro; feet, shank, thigh, waist): Number of steps in a given time (e.g., steps/minute), from the time between peaks in angular velocity, vertical acceleration, or the second harmonic of acceleration in the frequency domain [65], or calculated as in [107]. [35,49,65,77]
• Cadence variation (Acc; waist): Standard deviation of cadence, from the last 3 windows. [49]
• Stride peaks (Gyro; shank (ankle)): Peak of low-pass filtered (4th-order Butterworth, 10 Hz) angular velocity within the gait cycle, in the frontal plane. [88,89]
• Zero crossing rate and mean crossing rate (Acc; shank, thigh, lower back): Number of times the acceleration signal changes between positive and negative, and between below average and above average, in a given window. Calculated for 3 axes. [45]
• Signal vector magnitude (Acc; shank, thigh, lower back): Summation of the Euclidean norm over 3 axes over the entire window, normalized by window length. [45]
• Asymmetry coefficient (Acc; shank, thigh, lower back): First moment of acceleration data in the window divided by the standard deviation over the window. Calculated for 3 axes. [45]
• Freezing of gait criterion (FOGC) (Gyro, Acc; shank): Cadence and stride length measure, for stride n: FOGC_n = (C_n · L_min)/(C_max · (L_n + L_min)), where C_n is cadence and L_n is stride length, with maximum cadence C_max set to 5 strides/s and minimum stride length L_min = 5 cm. Cadence and stride parameters calculated from angular velocity and acceleration [108]. [46,47]
• FOG detection on glasses (FOGDOG) (Acc; head): Based on D, the cumulative forward distance travelled during the window; D_ref, a pre-set normal forward distance; N_step, cadence (steps/s); and N_max, a pre-set maximum normal cadence. Forward distance from the double integral of forward acceleration after correction for head tilt angle, step length from [109].
• R value (Gyro, EMG; shank): Calculated once for each stride, where ABS is the absolute value of the moving-average angular velocity in the sagittal plane, sEMG the surface EMG signal, max(ABS) the maximum ABS during a stride, and sEMG|t=t_max(ABS) the value of the surface EMG at that instant. [94]
• Ratio of height of first peak (EMG; shank, tibialis anterior): Height of the peak at the origin in the autocorrelation of the filtered EMG signal, in a given window. [38,110]
• Lag of first peak not at origin (EMG; shank, tibialis anterior): From the autocorrelation of the filtered EMG signal, in a given window. [38,110]
• Pearson's correlation coefficient (PCC) (Acc, Gyro, FSR; shanks, thighs, waist; FSR under feet): Similarity between two signals with n sample points, where x_i and y_i are the i-th values of signals x and y with means x̄ and ȳ. Calculated between acceleration axes, or between the FSR force of a step and a template "normal" step. [37,50,57,74,75,81]
• Ground reaction force (FSR; under heel and ball of foot): Sum of forces from all force sensitive resistors under a foot. [37]
• Shank displacement (Acc, Gyro; shanks): Shank displacement (m) calculated from vertical acceleration and pitch angular velocity [111]. [50]
• Change of shank transversal orientation (Gyro; shanks): Rotation angle in the transversal plane, calculated as the integral of angular velocity about the vertical axis, for each limb and each stride. [50]
• Auto-regression coefficients (Acc; waist): Four auto-regression coefficients obtained by the Burg method from acceleration in all 3 axes [112]. [57,74,81]
• Entropy (Acc, Gyro, EEG; Acc: ankle, pants pocket, waist, wrists, chest, thigh; Gyro: chest, waist, lower back; EEG: head): Shannon's entropy, where the discrete variable x contains n values and P is a probability (often defined from a histogram). Calculated from each axis of acceleration or angular velocity in the time and frequency domains, or from filtered EEG voltage at multiple scalp locations. [39,42,43,45,54,64,66,87,96]
• Direct transfer function (EEG; head): Application of coherence directionality in a multivariate time series [113]. Signals from motor control regions: O1-T4 (visual), P4-T3 (sensorimotor affordance), Cz-FCz (motor execution), and Fz-FCz (motor planning). Data filtered band-pass (0.5-60 Hz) and band-stop (50 Hz), then normalized with a z-transformation.
• Total power, mean power, median frequency, peak frequency, and 1st, 2nd, and 3rd spectral moments (Acc; lower back, thigh, shank): Computed from P, the power spectrum of the acceleration signal for a window of length M [116,117]. Calculated for 3 axes. [86]
• Energy derivative ratio (EDR) (Acc; lateral waist): Derivative of vertical acceleration energy in the 3-8 Hz band divided by the derivative of energy in the 0.5-3 Hz band. [49,77]
• Peak amplitude and frequency of peak amplitude (Acc; waist, thighs, shanks): Maximum value in the frequency domain and the corresponding frequency bin, calculated for the 0.5-3 Hz and 3-8 Hz bands. In [41], the relative acceleration signal is used, as defined in [105]. [41,64]
• Spectral feature defined from x(n), the amplitude of bin n, and f(n), the frequency of bin n (Acc, EEG, goniometer, telemeter): Calculated from the 3-axis acceleration signal, filtered EEG voltage within specific frequency bands, knee angular rotation, or telemeter voltage. [57,66,74,81,86,96]
• Spectral coherence (Acc, EEG; lower back, thigh, shank; EEG: head): Calculated from 3D acceleration or filtered EEG data using the Welch method [118], C_xy(ω) = |P_xy(ω)|²/(P_xx(ω)·P_yy(ω)), where ω is frequency, P_xx(ω) and P_yy(ω) are the power spectra of signals x and y, and P_xy(ω) is their cross-power spectrum. Also used with the wavelet power spectrum in [96]. EEG signal from 4 locations: O1 (visual), P4 (sensorimotor affordance), Cz (motor execution), and Fz (motor planning), filtered band-pass (0.5-60 Hz). [20,67,69,96]
• Max amplitude and number of peaks of spectral coherence (Acc; foot, shank, thigh, lower back/hip): Maximum amplitude and number of peaks of the spectral coherence feature [20]. [67]
• Discrete wavelet transform (DWT) (Acc, EMG; lower back, thigh, shank; EMG: quadriceps): Decomposition coefficients (approximation and detail) used as features, calculated from the acceleration 3D vector magnitude, each axis individually, or the raw EMG signal. [51,56,71,79]
• Select bands of the CWT (Acc; lower back, thigh, shank): Continuous wavelet transform in specific ranges (0.5-3 Hz, 3-8 Hz), and the ratio of signal in the 0.5-3 Hz band to signal in the 0.3-8 Hz band. Calculated for 3 axes. [96]

Table 3. Top machine-learning methods from studies that compared different machine-learning classifiers for FOG detection using wearable sensors.

Table 3 lists, for each study, the machine-learning methods tested, the best method, the second best, the third best, and the source. Only one row of methods tested was recovered: random forests, decision trees, naive Bayes, k-nearest neighbor (KNN-1, KNN-2), multilayer perceptron NN, boosting (AdaBoost), and bagging with pruned decision trees.
The best performing classifiers for FOG detection were convolutional neural networks, support vector machines, random forest, and AdaBoosted decision trees (Table 3).

Decision Trees
Decision trees are a series of binary selections that form branches resembling a tree structure. More complex decision trees can improve performance. For example, random forest classifiers use multiple decision trees, where the final decision is the majority vote of the individual trees. Boosting can also improve performance. AdaBoosting (adaptive boosting) repeatedly retrains the classifier, placing increasing importance on incorrectly classified training examples [124,125]. LogitBoosting (logistic boosting) [126], RUSBoosting [127], and RobustBoosting [128] are extensions of AdaBoosting that can further improve performance [85]. Decision trees for FOG detection included ensembles of trees and boosting techniques [42,43,85], with performance results ranging from 66.25% to 98.35% for sensitivity and 66.00% to 99.72% for specificity [25,39,42,43,45,52,54,58,85].
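As an illustrative sketch (with synthetic stand-in data, not the datasets or parameters of the cited studies), ensembles and boosted trees of the kind used in [42,43,85] can be assembled with scikit-learn:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for windowed sensor features: rows = windows, cols = features.
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy FOG / no-FOG labels

# Random forest: the final decision is the majority vote of many decision trees.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# AdaBoost: each round re-weights misclassified windows before fitting a pruned tree.
ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2), n_estimators=50, random_state=0
).fit(X, y)

print(rf.predict(X[:5]), ada.predict(X[:5]))
```

In practice, the windowed features would come from accelerometer or gyroscope data rather than random numbers, and performance would be assessed on held-out participants rather than the training set.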

Support Vector Machines (SVM)
Support vector machines are binary (two-class) classifiers that find a plane separating the data points of each class. New data points are then classified according to the side of the plane on which they fall. If the data points are not linearly separable, a kernel can transform the data into a higher-dimensional space in which they are linearly separable [125]. SVM for FOG detection achieved 74.7%-99.73% sensitivity and 79.0%-100% specificity [64,71,74,75,81,83,86].
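The kernel idea can be sketched with scikit-learn, using synthetic features whose classes are only separable after a nonlinear (RBF) mapping; all names and values here are illustrative, not from the cited studies:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))                             # synthetic windowed features
y = (np.linalg.norm(X[:, :2], axis=1) > 1.0).astype(int)  # circular boundary: not linearly separable

# The RBF kernel implicitly maps the data into a space where a separating plane exists.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)
print(clf.score(X, y))
```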

Neural Networks (NN)
Different NN subtypes have been used in FOG detection and prediction, such as convolutional [85,90] and recurrent [97,100] NN. Convolutional neural networks (CNN) have become popular in numerous applications, including medical image analysis, in part due to their ability to recognize local patterns within images and because feature selection prior to implementation is not required [130,131]. A CNN performed well for FOG detection [85], achieving 91.9% sensitivity and 89.5% specificity. Recurrent NN have recently been used for FOG prediction due to their applicability to time-series data [97,100]. Recurrent neural networks (RNN) utilize previous data in addition to current inputs during classification [132], thus giving the network "memory" to help recognize sequences [133].
A long short-term memory network (LSTM), a type of RNN, was used for FOG prediction [100], achieving over 90% accuracy when predicting FOG 5 s in advance.
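The "memory" mechanism can be illustrated with a minimal, randomly-initialized LSTM cell in NumPy (an untrained sketch, not the model of [100]): the hidden and cell states are carried across the samples of a window, so the state at each step depends on the whole sequence so far.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: gates computed from the current input x and previous hidden state h."""
    z = W @ x + U @ h + b                      # stacked gate pre-activations
    i, f, o, g = np.split(z, 4)                # input, forget, output, candidate gates
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c = sig(f) * c + sig(i) * np.tanh(g)       # cell state: the network's "memory"
    h = sig(o) * np.tanh(c)                    # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 8                             # e.g., 3-axis acceleration, 8 hidden units
W = rng.normal(0, 0.5, (4 * n_hid, n_in))
U = rng.normal(0, 0.5, (4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
window = rng.normal(size=(50, n_in))           # 50 samples of synthetic sensor data
for x in window:                               # the hidden state accumulates sequence context
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)
```

A real predictor would learn W, U, and b by backpropagation through time and attach a classification layer to the final hidden state.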

Unsupervised and Semi-Supervised Models
Since freezing manifests differently for each person, person-specific models outperformed person-independent models [42,58,74,86] (with some exceptions, as in [53]). However, in practice, it is difficult to obtain enough data to develop a model for an individual. To address this small dataset problem, unsupervised learning has been attempted. These methods do not rely on experts labelling FOG episodes. Instead, clustering techniques are used to define the classes [87], or an anomaly detection approach is used to define the normal class and then identify abnormalities (such as FOG) that do not conform to it [45,90]. Unsupervised FOG detection approaches are appealing since they do not require data labelling; however, few studies have used unsupervised FOG detection, and unsupervised model performance has been worse than that of supervised models [90].
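A minimal sketch of the anomaly-detection idea (synthetic features; not the specific method of [45,90]): fit a multivariate Gaussian to windows of typical walking, then flag windows whose Mahalanobis distance exceeds a threshold set from the normal data alone.

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(500, 4))     # features from typical walking windows
abnormal = rng.normal(4, 1, size=(5, 4))     # synthetic stand-in for FOG-like windows

mu = normal.mean(axis=0)
cov = np.cov(normal, rowvar=False) + 1e-6 * np.eye(4)   # regularized covariance
inv = np.linalg.inv(cov)

def mahalanobis2(x):
    d = x - mu
    return d @ inv @ d                       # squared Mahalanobis distance from "normal"

# Threshold chosen so ~1% of normal windows would be flagged; no FOG labels needed.
threshold = np.quantile([mahalanobis2(x) for x in normal], 0.99)
flags = [mahalanobis2(x) > threshold for x in abnormal]
print(flags)
```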
Recently, transfer learning, which uses a previously-trained network as a base and adapts the model to a new task [100], and semi-supervised learning, which uses both labeled and unlabeled data during training [69,88,89], have been used to create partly personalized FOG detection methods without large amounts of data. In [100], transfer learning trained a neural network using group data before adding an additional network layer that was trained using an individual's data. Semi-supervised learning methods [69,88,89] use labeled data to train a base classifier before updating in an unsupervised manner. This reduces the need for labeled data and preserves the generalization ability from a multiple person data set, while allowing person-specific tuning. Semi-supervised learning theoretically combines the advantages of both supervised and unsupervised learning. When applied to FOG detection, performance achieved 89.2%-95.9% sensitivity [69,88,89] and 93.1%-95.6% specificity [69,88,89]. Although the methods are promising, due to a current shortage of studies, the value of these methods for FOG detection remains unclear.
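The semi-supervised idea can be sketched with scikit-learn's self-training wrapper (synthetic data; the cited studies [69,88,89] use their own update schemes): a base classifier is trained on a small labelled set and then pseudo-labels confident unlabelled windows, standing in for a new person's unannotated recordings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))                # synthetic windowed features
y_true = (X[:, 0] > 0).astype(int)           # toy FOG / no-FOG labels

# Only the first 40 windows are expert-labelled; the rest are marked -1 (unlabelled).
y = np.full(400, -1)
y[:40] = y_true[:40]

base = SVC(probability=True)                 # base classifier trained on labelled data
model = SelfTrainingClassifier(base, threshold=0.8).fit(X, y)
acc = (model.predict(X) == y_true).mean()
print(acc)
```

Transfer learning, by contrast, would keep the group-trained model fixed and retrain only an added layer (or the final layer) on the individual's data.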

Limitations and Challenges of FOG Detection
FOG detection and prediction are affected by the participant's medication state (ON and OFF), which has substantial effects on motor control, gait patterns, and physical abilities. Freezing occurs more frequently in the OFF state than in the ON state. In the OFF state, smaller shuffling steps are common, whereas in the ON state, many people can walk fairly normally. A machine-learning model trained during a person's optimal medication state may perform worse if the medication wears off and their unassisted gait changes. Given that medication is needed in PD management, medication state is crucial contextual information for FOG detection and prediction research.
With machine-learning algorithms becoming more prevalent, larger FOG detection and prediction datasets are needed for model development. FOG studies ranged from 1 to 32 participants, with most studies having more than 10 participants. Studies involving few participants may not adequately validate a FOG detection method, especially when machine-learning algorithms are involved. Data augmentation techniques [85] or additional testing with more participants are required. On the other hand, large participant pools may not guarantee unbiased datasets since some participants freeze many times during data collection, while others may not freeze at all. For example, in [48], only 6 of 20 participants froze during data collection, which may lead to person-biased models that over-represent the few individuals with FOG data. Difficulty in participant recruitment and FOG unpredictability are therefore challenges that may limit the availability and quality of training data.
Following data collection, FOG episodes are typically visually identified and labelled. Visual FOG identification is currently the gold standard, and these labels serve as the ground truth for detection method validation. Even though FOG is a well-defined clinical phenomenon [7], the criteria for defining the beginning and end of FOG episodes [24,25,98] were not stated in some articles. Differing FOG definitions make comparison between studies problematic. Published datasets can provide consistent ground-truth FOG labelling. The Daphnet [24] (10 participants) and CuPiD [101] (18 participants) datasets provide consistent input but fewer than 250 FOG episodes; thus, dataset size may be an issue for machine learning, especially if deep learning is used [85].
When evaluating a classification system, ideally, different data are used for training and testing, as in [25,38,51,55,56,64,66,67,85,96,97,99,100], in order to prevent the performance overestimation that can occur when a model is evaluated using the same data used for training. Cross-validation is often used when the dataset size is limited, as done in [24,31-33,39,42,43,45,52,54,58,74,75,81,86-90,98]. For FOG research, leave-one-person-out cross-validation was the most common. In this method, model training used data from all but one participant, model testing used data from the remaining participant, the process was repeated for each participant, and the performance results were averaged. Other studies, often more preliminary in nature, used ad hoc optimization to tune parameters and set thresholds [34,44,48,59-63,95]. This approach, although useful for initial system assessment, is not a good indicator of classifier performance, and should be followed by a more robust evaluation scheme, such as cross-validation.
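Leave-one-person-out cross-validation can be sketched with scikit-learn's LeaveOneGroupOut, treating each participant as a group (synthetic data; the feature and label construction is illustrative):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))                 # synthetic windowed features
y = (X[:, 0] > 0).astype(int)                 # toy FOG / no-FOG labels
participants = np.repeat(np.arange(6), 20)    # 6 participants, 20 windows each

# Each fold trains on 5 participants and tests on the held-out person;
# the per-fold scores are then averaged.
logo = LeaveOneGroupOut()
scores = cross_val_score(LogisticRegression(), X, y, groups=participants, cv=logo)
print(scores.mean())
```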
Feature calculation from wearable sensor data is typically done using data windows. Window lengths ranged from 0.2 to 32 s [36,48,92,93], with the most common window length being 1 s. Long windows with many sample points are desirable for calculating frequency-based features involving the discrete Fourier transform, since the number of sample points in the input signal will determine the output frequency bin resolution. However, long windows decrease the temporal resolution and do not permit distinguishing short events within the window. In addition, long windows with many data points may be slower to process and may introduce unwanted lags between data acquisition and classification for detection or prediction. Studies comparing multiple window lengths found that, in general, 1-4 s windows are preferable [42,44,48,57,63,64].
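The resolution trade-off follows from the discrete Fourier transform: for a window of N samples at sampling rate fs, the frequency bins are spaced fs/N apart. A short sketch, assuming a 100 Hz sampling rate (an illustrative value):

```python
import numpy as np

fs = 100.0                                   # assumed sampling rate (Hz)
for seconds in (1, 4):
    n = int(fs * seconds)                    # samples per window
    bins = np.fft.rfftfreq(n, d=1.0 / fs)    # frequency bins of the DFT
    # 1 s window -> 1.0 Hz spacing; 4 s window -> 0.25 Hz spacing
    print(seconds, "s window -> bin spacing", bins[1], "Hz")
```

A 4 s window thus resolves the 0.5-3 Hz locomotion band and 3-8 Hz freeze band more finely than a 1 s window, at the cost of temporal resolution and latency.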
FOG detection studies used different performance metrics. For example, a FOG detection system used to trigger a real-time cue during walking might emphasize freeze onset detection. Such a system might attempt to classify every data point or window as FOG or no freeze, and be evaluated using the number of correctly classified instances [24,31-33]. In contrast, a long-term monitoring system may treat each freeze occurrence as a binary event and evaluate whether the FOG event was successfully detected [74,75]. Experimental procedures and underlying definitions, such as ignoring FOG episodes shorter than 3 s [43] or calculating specificity with data from participants without FOG [64], also varied between studies. Differences in evaluation metrics and procedures make comparisons between FOG detection methods more difficult.
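For per-window evaluation, sensitivity and specificity follow directly from the confusion matrix; a small self-contained sketch with toy labels:

```python
def sensitivity_specificity(y_true, y_pred):
    """Per-window sensitivity (FOG recall) and specificity (no-FOG recall)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # FOG detected
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # FOG missed
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # no-FOG correct
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false alarm
    return tp / (tp + fn), tn / (tn + fp)

# Toy window labels: 1 = FOG, 0 = no freeze.
sens, spec = sensitivity_specificity([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(sens, spec)   # 2/3 and 2/3
```

Event-based evaluation, by contrast, would first merge consecutive FOG windows into episodes and count an episode as detected if any of its windows is flagged.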
To help compare future FOG detection and prediction studies, researchers should report study population details, including sex, PD severity, number of participants, the number and duration of FOG episodes (ideally for each person), and medication state during testing. Methodologically, the FOG labelling criterion, the detailed detection method, the validation method, and the basis upon which the performance evaluation metrics are calculated should be clearly stated.

FOG Prediction
The FOG prediction studies varied in approach and performance, with most being somewhat preliminary and focusing less on performance and more on understanding the intricacies of FOG prediction. In addition to FOG detection study considerations (e.g., dataset size, medication state, FOG definitions, contextual or study-specific performance metric definitions), FOG prediction studies must define the pre-FOG class using data before freeze onset. FOG prediction is typically done by training a machine-learning model to recognize data from the pre-FOG class. Six of the seven FOG prediction studies selected a pre-FOG segment duration that ranged from 1 s [45,96,100] to 6 s [45]. Since the transition from walking to FOG is subtle, labelling the start of pre-FOG from visual observation is difficult. Instead, a FOG episode is visually identified, and data prior to the FOG are selected using a single fixed duration. Three studies used a 5 s period [96,97,99]; one study used a 2 s period [98]; one used 1, 3, and 5 s periods [100]; and one used 1-6 s periods, in 1 s increments [45]. The seventh study [95] used an assumed 3 s period before FOG for feature selection; then, a person-specific, multivariate-Gaussian-distribution-based anomaly-detection model was created and manually tuned for each participant.
Optimal pre-FOG segment duration is difficult to determine. If the pre-FOG segment is assumed to be a linear degradation of gait leading to FOG (threshold theory [134]), data closest to the freeze would resemble FOG, and data farther from the freeze would resemble typical PD walking. For a two-class classifier (pre-FOG, typical PD walking), short pre-FOG segments are preferred, since data are closer to FOG onset and likely more distinct from typical walking [100].
A short pre-FOG segment may not be ideal when using a three-class classifier consisting of typical PD walking, pre-FOG, and FOG classes as in [45], which found that very short pre-FOG segments made it difficult to distinguish between the pre-FOG and FOG classes. Longer pre-FOG segments improved pre-FOG classification but greatly reduced FOG and typical walking classification accuracy. The best performing pre-FOG segment duration differed across participants, and likely between individual FOG episodes for the same person [45]. The observation that a single pre-FOG duration is inadequate is also supported by [95,98]. For this reason, a person-specific or episode-specific pre-FOG duration may help to reduce overlap with the walking class and increase class purity (contain only pre-FOG data), thus improving pre-FOG detection performance.
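A fixed-duration pre-FOG labelling scheme of the kind described above can be sketched as follows (the function, window length, and pre-FOG duration are illustrative assumptions, not taken from any cited study):

```python
import numpy as np

def label_windows(n_windows, win_len, fog_onsets, fog_ends, pre_s):
    """Label each window 0 = walking, 1 = pre-FOG, 2 = FOG, from annotated
    FOG onset/end times (s) and an assumed fixed pre-FOG duration pre_s (s)."""
    labels = np.zeros(n_windows, dtype=int)
    for start, end in zip(fog_onsets, fog_ends):
        for w in range(n_windows):
            t0, t1 = w * win_len, (w + 1) * win_len
            if t0 < end and t1 > start:
                labels[w] = 2                              # window overlaps the freeze
            elif t0 < start and t1 > start - pre_s and labels[w] == 0:
                labels[w] = 1                              # window falls in the pre-FOG segment
    return labels

# 10 one-second windows, one freeze annotated from 5 s to 7 s, 2 s pre-FOG segment.
print(label_windows(10, 1.0, fog_onsets=[5.0], fog_ends=[7.0], pre_s=2.0))
# -> [0 0 0 1 1 2 2 0 0 0]
```

A person-specific or episode-specific scheme would replace the single pre_s value with a per-participant (or per-episode) duration.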
A diverse feature set can better represent the wide range of FOG manifestations. Studies that combined time and frequency domain features [96] had better performance than either feature type individually. Time domain features can account for gait parameters such as step length [35,71,77], cadence [49], asymmetry [45], and peak limb angular velocity [88,89], whereas frequency domain features can capture more subtle patterns characteristic of FOG, such as trembling in specific frequency bands [29]. The best performance is typically achieved with multiple features.
The choice of features is very important, especially for real-time systems, where, in addition to classification performance, classification speed is critical. For example, calculating stride duration at the end of the stride (approximately 1 s) could delay the detection of a FOG event. Other features, such as step length, cadence, cadence variation, stride peaks, and FOGC, may share this limitation, depending on the feature calculation method. Features extracted from appropriately-sized windowed data do not have this problem, since the features can be calculated as soon as the data window is available. Feature availability to the classifier is determined by the step size of the sliding window and the calculation delay. All of the window-based features in Table 2 could be used in a real-time application, given sufficient processing power. However, an excessive number of features, or complex features requiring many calculation stages, may induce unacceptable delays when computing power is limited, as in many wearable systems. Using a minimal number of easily-calculated features is desirable; however, too few or overly-simple features may adversely impact classification performance. To address the balance between classification performance and classification speed, feature selection algorithms can be used to determine the best features from a larger set, as implemented in [45,51,55,56,58,66,67,76,80,83,86,95,96,98,99]. Algorithms such as Relief-F or correlation-based approaches can rank features by relevance so that the least relevant can be eliminated [139]. The most used feature selection methods in this review were paired t-tests [86,98], mutual information [45,58,67,95], and the Wilcoxon rank sum test [55,66,76,96]. The topic of feature selection is broad and encompasses numerous methods that can be used to improve classifier models.
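Filter-style feature selection can be sketched with scikit-learn using a mutual-information score (synthetic data; k and the score function are illustrative choices, not those of the cited studies):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                 # 10 candidate features per window
y = (X[:, 2] + X[:, 7] > 0).astype(int)        # only features 2 and 7 are informative

# Rank features by mutual information with the class label and keep the top k.
score = lambda X, y: mutual_info_classif(X, y, random_state=0)
selector = SelectKBest(score, k=2).fit(X, y)
print(sorted(np.flatnonzero(selector.get_support()).tolist()))
```

The redundant noise features receive near-zero scores and are dropped, leaving a smaller set that is cheaper to compute in real time.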
Given the diversity of features in the literature for detecting and predicting FOG, the best feature or feature set has yet to be determined. For future studies, it is generally suggested to begin with multiple features that can then be tuned or eliminated using feature selection methods to produce an optimal feature set.

Conclusions
Based on 74 freezing of gait detection and prediction articles, this review reported details of the participants, walking tasks, sensors, features extracted, detection and prediction methods, and performance. The continued development of high-performing FOG detection methods is important for long-term monitoring and real-time cueing, and, together with the development of FOG prediction systems, for implementation in gait-assist systems. While FOG detection methods have been steadily increasing in performance, important challenges remain. Small FOG datasets may limit the machine-learning models that can be used, especially for deep learning. Sets of diverse features in both the time and frequency domains have helped to represent the inconsistent nature of FOG. The adoption of transfer learning and semi-supervised learning models, built upon established FOG detection methods, could add an element of personalization while preserving the robust generalization of person-independent models, making them promising approaches for future FOG detection and prediction research.