Multivariate Multiscale Symbolic Entropy Analysis of Human Gait Signals

Abstract: The complexity quantification of human gait time series has received considerable interest for wearable healthcare. Symbolic entropy is one of the most prevalent algorithms used to measure the complexity of a time series, but it fails to account for the multiple time scales and multi-channel statistical dependence inherent in such time series. To overcome this problem, multivariate multiscale symbolic entropy is proposed in this paper to distinguish the complexity of human gait signals in health and disease. The embedding dimension, time delay, and quantization levels are appropriately designed to construct similarity of signals for calculating the complexity of human gait. The proposed method can accurately distinguish healthy and pathologic groups from realistic multivariate human gait time series on multiple scales. It strongly supports wearable healthcare with simplicity, robustness, and fast computation.


Introduction
Human gait is a nonlinear dynamic behavior based on spatial and temporal feedback, mainly controlled by the nervous and locomotor systems. Its outputs exhibit significant fluctuations on account of multiple interacting components commanded by various complex physiological systems [1]. Several conventional approaches and standard physical examinations cannot provide a complete pathological description of these complex fluctuations or of the emerging complexity of an abnormal human gait. In recent decades, it has been clearly demonstrated that the complexity of human gait can be adequately analyzed through the stride interval time series, where the stride interval is the duration of one gait cycle, defined as the time from one heel-strike of a limb to the next heel-strike of the same limb [2].
Hausdorff et al. [3][4][5] studied human gait variability with aging, certain disease states, and different walking conditions by applying detrended fluctuation analysis (DFA) and observed that stride interval fluctuations are more random, or less correlated, in elderly subjects and in subjects with Parkinson's disease (PD) and Huntington's disease (HD). The multiscale entropy (MSE) method proposed by Costa et al. [6] defined a quantitative measure of complexity that is large for both correlated stochastic processes and normal walking conditions. Aziz et al. [7,8] employed symbolic entropy (SyEn) to characterize human gait signals of healthy subjects and of pathological subjects with neurodegenerative diseases such as PD, HD, and amyotrophic lateral sclerosis (ALS). Goshvarpour et al. [9] evaluated nonlinear and complexity characteristics of gait signals in healthy subjects who walked at their usual, slow, and fast paces with Poincaré plots, Hurst exponents, and Lyapunov exponents. Thus, various nonlinear methods have been introduced to study complex human gait and physiological signals. Among these methods, entropy-based algorithms have received considerable attention in quantifying the complexity of physiological systems because of their potential implications for evaluating dynamical models of biologic control systems and for bedside diagnostics [10]. For instance, loss of complexity, due to the reduction of physiologic information content and individual adaptive capacity that results from aging and disease, has been proposed as a generic feature of pathologic dynamics [11,12].
Much work has been devoted to developing entropy-estimation algorithms that measure complexity in order to distinguish physiologic signals in health and disease. Traditional entropy-based algorithms, such as Shannon entropy [13], Kolmogorov entropy [14], spectral entropy [15], wavelet entropy [16], approximate entropy [17], and sample entropy [18], yield entropy estimates that are not always associated with dynamical complexity. For example, some algorithms assign a larger entropy to certain pathologic processes that are generally presumed to represent less complexity than healthy dynamics [11]. This is misleading, especially when the signal comes from more complex systems with significant underlying correlation over multiple spatio-temporal scales. Therefore, the MSE technique [10] and its modified algorithms [19][20][21][22] have been introduced for the analysis of various time series. Permutation entropy (PE) [23] and its improved algorithms [24,25] have been proposed for feature extraction in order to make a comprehensive analysis of biological and economic systems [26]. However, Qumar et al. [27] concluded that SyEn gives a more statistically significant separation than MSE between normal walking and walking under various stress conditions. Moreover, SyEn has higher overall calculation efficiency for different time series data lengths. PE is simple and computationally fast [23]. Nevertheless, PE does not consider the differences between amplitude values in a given time series and takes a little more computation time than SyEn [26,28]. Consequently, SyEn [7,8,27,29] can provide an accurate assessment of the dynamic behavior while taking less computation time for various signals.
Recently, to improve the prediction accuracy for pathologic signals, experimental measurements have increasingly recorded multichannel physiologic signals or synchronously coupled multivariate time series [19,25]. SyEn may be inapplicable because these signals are statistically dependent or correlated to some degree. More importantly, wearable health monitoring requires simple and fast computation as well as robustness in the presence of noise. To meet these demands, a multivariate multiscale complexity measure is proposed to robustly distinguish physiologic signals in health and disease with high computational efficiency.
In this paper, multivariate multiscale symbolic entropy (MMSyEn) is proposed to accurately quantify complexity, considering both within- and cross-channel dependencies and coupling in multichannel complex signals over a range of scales. The MMSyEn results for simulated stochastic data and for experimental gait signals obtained under different disease states and walking conditions demonstrate its advantageous performance in complexity quantification and feature extraction for real-world time series.

Multivariate Multiscale Symbolic Entropy
For a given p-variate time series $\{x_{k,i}\}_{i=1}^{N}$, $k = 1, 2, \ldots, p$, the consecutive moving-averaging multivariate time series $\{y^{\varepsilon}_{k,j}\}$ is constructed at scale factor $\varepsilon$. For scale one, the time series $\{y^{1}_{k,j}\}$ is simply the original time series. The moving-averaging multivariate time series is then transformed into a symbol sequence of "1" and "0" with respect to a given threshold; that is, quantization level 2 (symbols 0 and 1) is applied for symbolization. A value is assigned the symbol 1 when the absolute difference between it and the mean exceeds the threshold, and the symbol 0 otherwise. In this criterion, $\bar{y}_k$ is the mean of the p-variate time series, $\delta$ is the quantization level, and $\theta$ is the threshold. There are two main ways to define the threshold value $\theta$: as a fixed number, or as a value normalized to the standard deviation of the time series, $\theta = \zeta \times \mathrm{SD}(y)$, where SD is the standard deviation [7].
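The two steps above can be sketched in Python for a single channel. This is a minimal sketch under stated assumptions rather than the authors' implementation: the overlapping-window average, the use of the mean of the moving-averaged channel as the reference level, and the names `moving_average` and `symbolize` are all assumptions introduced here for illustration.

```python
from statistics import mean


def moving_average(x, scale):
    """Moving-average series at a given scale factor (assumed overlapping windows).

    For scale=1 the result equals the original series.
    """
    if scale < 1 or scale > len(x):
        raise ValueError("scale must be between 1 and len(x)")
    return [sum(x[j:j + scale]) / scale for j in range(len(x) - scale + 1)]


def symbolize(y, theta):
    """Quantization level 2: symbol 1 where |y_j - mean(y)| > theta, else 0.

    Assumption: the reference level is the mean of the series passed in.
    """
    y_bar = mean(y)
    return [1 if abs(v - y_bar) > theta else 0 for v in y]


# Hypothetical stride intervals in milliseconds (illustrative values only).
stride_ms = [1000, 1000, 1000, 1100, 1000, 1000]
coarse = moving_average(stride_ms, scale=2)   # [1000.0, 1000.0, 1050.0, 1050.0, 1000.0]
symbols = symbolize(coarse, theta=25)         # [0, 0, 1, 1, 0]
print(coarse, symbols)
```

For a p-variate recording, the same two functions would be applied to each channel in turn.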
The embedding vectors can be obtained from $\{Y^{\varepsilon,\delta}_{k}\}$ using the embedding dimension $m$ and the time delay $\tau$, and each embedding vector (word) is converted into a decimal number. For a word code series (embedding vector series) with embedding dimension $m$ and quantization level $\delta$, the total number of possible words is $\delta^{m}$. Here, eight different types of words are obtained with embedding dimension 3 (a word consisting of three symbols) and quantization level 2. The probability of each type of word is $p(w^{\varepsilon,\delta}_{k,m})$, and the Shannon entropy is calculated as

$$SE(k, \varepsilon) = -\sum_{w} p(w^{\varepsilon,\delta}_{k,m}) \log_2 p(w^{\varepsilon,\delta}_{k,m}).$$

To reduce the impact of random error and of systematic error or bias, Eguia et al. [30] proposed a correction term for the Shannon entropy, giving the corrected Shannon entropy (CSE)

$$CSE(k, \varepsilon) = SE(k, \varepsilon) + \frac{C - 1}{2M \ln 2},$$

where $M = \delta^{m}$ is the total number of possible words and $C$ is the number of words that actually occur. Two values of CSE obtained with different embedding dimensions $m$, at the same threshold $\theta$ and the same quantization level $\delta$, are still not directly comparable. To overcome this problem, MMSyEn is defined as the normalized corrected Shannon entropy, in which the CSE is normalized by its maximum value

$$CSE_{\max} = -\log_2 \frac{1}{M} + \frac{M - 1}{2M \ln 2},$$

which is obtained when all $M$ words occur with a uniform distribution in a data series. MMSyEn therefore varies from 0 to 1 for any parameters. A larger MMSyEn value implies that the time series is more complex and irregular, whereas a smaller MMSyEn value indicates that the time series is more regular and periodic.
To demonstrate the algorithm more clearly, three small segments of human gait time series are shown in Figure 1a. The three-channel time series $\{x_{k,i}\}_{i=1}^{N}$ ($k = 1, 2, 3$) are each processed to obtain the moving-averaging time series at different scales. Figure 1c demonstrates the symbolization process. For a quantization level of 2, data values whose absolute difference from the mean is above the threshold are labeled 1 and the rest are labeled 0, which generates the symbol series. With embedding dimension m = 3 and time delay τ = 1, the symbol series is converted into a decimal series, as shown in the last part of Figure 1c. The histograms generated from the decimal series are plotted in Figure 1d. Finally, the last histogram is used to calculate the MMSyEn.
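The word coding and entropy steps can be illustrated with the following Python sketch for one symbol series. It is a sketch under assumptions, not a definitive implementation: the most-significant-symbol-first ordering of each word, the function names, and the restriction to a single channel are choices made here; how the channels are combined into one multivariate value is not reproduced.

```python
from collections import Counter
from math import log, log2


def words_to_decimal(symbols, m=3, tau=1, levels=2):
    """Convert each embedding vector (m symbols spaced tau apart) to a decimal code.

    Assumption: the first symbol of a word is the most significant digit.
    """
    n_words = len(symbols) - (m - 1) * tau
    codes = []
    for j in range(n_words):
        code = 0
        for step in range(m):
            code = code * levels + symbols[j + step * tau]
        codes.append(code)
    return codes


def normalized_corrected_entropy(codes, m=3, levels=2):
    """Corrected Shannon entropy of the word histogram, divided by its maximum."""
    if not codes:
        raise ValueError("at least one word is required")
    M = levels ** m                      # number of possible words
    counts = Counter(codes)
    total = len(codes)
    se = -sum((c / total) * log2(c / total) for c in counts.values())
    C = len(counts)                      # number of words that actually occur
    cse = se + (C - 1) / (2 * M * log(2))
    cse_max = -log2(1 / M) + (M - 1) / (2 * M * log(2))
    return cse / cse_max


codes = words_to_decimal([0, 0, 1, 1, 0])   # words 001, 011, 110 -> [1, 3, 6]
print(codes, normalized_corrected_entropy(codes))
```

With m = 3 and two symbols there are 8 possible words, so a series in which all 8 occur equally often gives a value of 1, and a constant symbol series gives 0.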

Results and Discussions
The MMSyEn analysis is evaluated on multichannel stochastic data and real-world multivariate human stride interval recordings. All computations and analyses were run on a computer with the following specifications: operating system (Windows 7 Professional 64-bit), processor (AMD Athlon X2 245 @ 2.9 GHz), memory (4 GB RAM). To evaluate whether differences in the MMSyEn values between signals are statistically significant, the Mann-Whitney U test (also known as the Wilcoxon rank sum test) was applied to calculate p values, with p < 0.01 taken as significant.
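For readers who wish to reproduce the statistical comparison, a minimal pure-Python sketch of the Mann-Whitney U test is given below. It assumes a two-sided test with the normal approximation and tie correction and no continuity correction; whether the original analysis used an exact or an approximate p value is not stated, and the function name `mann_whitney_u` is introduced here. For small groups an exact method, such as the one available in `scipy.stats.mannwhitneyu`, is generally preferable.

```python
from math import erfc, sqrt


def mann_whitney_u(a, b):
    """Return (U statistic of sample a, approximate two-sided p value)."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2       # mean of ranks i+1 .. j for a tied group
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u1, 1.0                   # all values tied: no evidence of a difference
    z = (u1 - mu) / sqrt(var)
    return u1, erfc(abs(z) / sqrt(2))    # two-sided normal tail probability


# Hypothetical entropy values for two groups at one scale (illustrative only).
u, p = mann_whitney_u([0.81, 0.79, 0.84, 0.80], [0.62, 0.66, 0.70, 0.64])
print(u, p, p < 0.01)
```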

Validation on Synthetic Stochastic Data
To illustrate the behavior of MMSyEn in numerical simulations, four types of trivariate time series are generated, containing white noise (in 3 down to 0 channels) and independent 1/f noise (in the remaining 0 up to 3 channels). Furthermore, trivariate white and 1/f noise and the corresponding correlated time series are generated to illustrate that the proposed MMSyEn caters for both within- and cross-channel correlations. The values of the parameters applied to calculate MMSyEn in this section are τ = 1, θ = 0.2 × sum(SD) (the sum of the standard deviations of the trivariate time series), and other determinate parameters for each data channel. Figure 2 shows the MMSyEn curves for the cases considered. The MMSyEn values of each type of trivariate time series decrease monotonically with the scale factor, and MMSyEn is larger at higher scales as the number of variates containing 1/f noise increases; when all three data channels contain 1/f noise, the complexity at larger scales is the highest. This analysis is consistent with previous research [6,10,19] showing that 1/f noise (long-range correlated) is structurally more complex than uncorrelated random signals.
Figure 3 shows that the proposed MMSyEn approach accounts for both within- and cross-channel correlations and is able to distinguish between uncorrelated and correlated trivariate white and 1/f noise. Specifically, the MMSyEn values of the four types of trivariate time series decrease monotonically with the scale factor, and at large scales the MMSyEn values of the correlated trivariate 1/f noise are the largest, followed by the uncorrelated 1/f noise, the correlated white noise, and the uncorrelated white noise. In other words, the MMSyEn values of the correlated trivariate white and 1/f noise are larger than those of the corresponding uncorrelated time series. Therefore, MMSyEn indicates that the complexity of correlated multivariate white noise and 1/f noise is higher, which conforms with the underlying physics [19].
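Test signals of the kind used in this validation can be generated along the following lines. The generation procedure is not described in the text, so everything here is an assumption: 1/f noise is approximated with the Voss algorithm, cross-channel correlation is introduced through a shared component with a hypothetical mixing parameter `rho`, and all function names are illustrative. Only the series length of 5000 points is taken from the figure captions.

```python
import random


def white_noise(n, rng):
    """Independent Gaussian samples with zero mean and unit variance."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def pink_noise_voss(n, rng, n_sources=12):
    """Approximate 1/f noise (Voss algorithm): source k is redrawn every 2**k samples."""
    sources = [rng.gauss(0.0, 1.0) for _ in range(n_sources)]
    out = []
    for i in range(n):
        for k in range(n_sources):
            if i % (2 ** k) == 0:
                sources[k] = rng.gauss(0.0, 1.0)
        out.append(sum(sources))
    return out


def correlated_channels(make_noise, n, rng, p=3, rho=0.5):
    """p channels sharing one common component; rho sets the shared variance fraction."""
    common = make_noise(n, rng)
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    return [
        [a * c + b * e for c, e in zip(common, make_noise(n, rng))]
        for _ in range(p)
    ]


rng = random.Random(0)                       # fixed seed for reproducibility
uncorrelated_white = [white_noise(5000, rng) for _ in range(3)]
uncorrelated_pink = [pink_noise_voss(5000, rng) for _ in range(3)]
correlated_white = correlated_channels(white_noise, 5000, rng)
correlated_pink = correlated_channels(pink_noise_voss, 5000, rng)
print(len(correlated_pink), len(correlated_pink[0]))
```

Mixed cases, such as two white channels and one 1/f channel, follow by combining channels from the two uncorrelated lists.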

Complexity Analysis of Healthy Human under Different Walking Conditions
To demonstrate how MMSyEn applies to real stride interval recordings of healthy humans under different walking conditions, the MMSyEn algorithm is used to calculate the entropy by treating the three walking paces (slow, normal, fast) as multivariate measurements from the same system. Ten young, healthy men, whose mean age was 21.7 years (range: 18-29 years), height was 1.77 ± 0.08 m (mean ± S.D.), and weight was 71.8 ± 10.7 kg, walked continuously on level ground around an obstacle-free, long (either 225 or 400 m), approximately oval path, and the stride interval was measured using ultra-thin, force-sensitive switches taped inside one shoe [3]. Additionally, the subjects walked for one hour paced by a metronome, at the same average speed and for the same time as in the unconstrained walking state.
To discriminate the relative differences in complexity between the unconstrained and the corresponding metronomically paced conditions, three bivariate time series (slow and normal, slow and fast, normal and fast) and one trivariate time series (slow, normal, and fast) are generated. In addition, the corresponding surrogate time series are obtained by shuffling (randomly reordering) the sequence of each stride interval time series. Shuffling destroys the correlations among the stride intervals while preserving the statistical properties of their distribution.
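The shuffled surrogate described above can be produced with a few lines of Python. This is a minimal sketch; the function name `shuffled_surrogate` and the optional `seed` argument are introduced here for illustration.

```python
import random


def shuffled_surrogate(series, seed=None):
    """Return a randomly reordered copy of the series.

    The values (and hence their distribution) are unchanged; only the
    temporal order, and with it any correlation structure, is destroyed.
    """
    rng = random.Random(seed)
    surrogate = list(series)   # copy, so the original recording is not modified
    rng.shuffle(surrogate)
    return surrogate


# Hypothetical stride intervals in milliseconds (illustrative values only).
stride_ms = [1010, 1025, 1018, 1032, 1040, 1029, 1015, 1008]
surrogate = shuffled_surrogate(stride_ms, seed=42)
print(sorted(surrogate) == sorted(stride_ms))   # True: same values, new order
```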
The values of the parameters applied to calculate MMSyEn in this section are τ = 1, θ = 0.2 × (the sum of the standard deviations of the multivariate time series) or θ = 15 ms (θ can take other values, such as 8-35 ms), and other determinate parameters for each data channel. In Figure 4, the MMSyEn results show that when the walking conditions are considered within the multivariate approach (bivariate for any two walking speeds or trivariate for all three walking speeds), the proposed algorithm can effectively discriminate between the unconstrained and metronomically paced walking conditions at larger scales. More specifically, for the bivariate time series (the top panels in Figure 4), the values of MMSyEn for unconstrained walking are larger at higher scales than those for metronomically paced walking, indicating that unconstrained walking has more complex dynamics. The bottom panels in Figure 4 show that the entropy of all stride interval time series decreases monotonically with increasing scale factor and that, at larger scales, the MMSyEn curves for unconstrained walking lie above those for metronomically paced walking with non-overlapping error bars. The distinction is more pronounced for the trivariate time series than for the bivariate time series. These analyses reveal the underlying correlations, since the MMSyEn method considers all the walking conditions within one unifying model and benefits directly from the multivariate and multiscale approach.
The MMSyEn results for unconstrained and metronomically paced walking stride intervals and their corresponding shuffled time series are also presented in Figure 4. The values of MMSyEn for all unconstrained walking are larger at higher scales than those for the corresponding shuffled time series, and the Mann-Whitney U test shows a statistically significant difference (p < 0.01) in entropy between the original and shuffled time series. For metronomically paced walking, in contrast, there is no significant difference (p > 0.01) between the MMSyEn curves of the original stride intervals and the surrogate time series. These results indicate that persistent correlations, or long-range dependence, are present in unconstrained walking, while the correlations decrease for metronomically paced walking, which behaves similarly to white noise.
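The normalized threshold used in this section can be computed as in the following sketch. It assumes the sample standard deviation (n − 1 denominator) of each channel; the text does not state which estimator was used, and the function name `sd_sum_threshold` is introduced here.

```python
from statistics import stdev


def sd_sum_threshold(channels, zeta=0.2):
    """Threshold theta = zeta * (sum of the per-channel standard deviations).

    Assumption: sample standard deviation of each channel.
    """
    return zeta * sum(stdev(ch) for ch in channels)


# Hypothetical bivariate stride intervals in milliseconds (illustrative only).
slow = [1180, 1210, 1195, 1225, 1190]
fast = [960, 975, 955, 980, 965]
theta = sd_sum_threshold([slow, fast])
print(theta)
```

The fixed alternative mentioned in the text simply sets the threshold to a constant, for example 15 ms, for every channel.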
Table 1 shows the p values over a range of scales between unconstrained and metronomically paced walking conditions according to the statistical test. At larger scales, the entropy of all the multivariate stride interval time series differs significantly between unconstrained and metronomically paced conditions. Moreover, metronomically paced walking time series share uncorrelated random underlying dynamics both within and across channels, whereas unconstrained walking time series are correlated both within and across channels. Therefore, at larger scales, the output of the human locomotor system under unconstrained walking is more complex than under the metronomically paced protocol, and the difference is statistically significant over more scales when all the available walking conditions (multivariate measurements) are considered. Furthermore, MMSyEn supports the general view of MSE (complexity) loss with aging and disease, or of reduced adaptive capacity of biological organization at all levels when a system is under constraints (metronomically paced).

Complexity Analysis of Diseased Human Stride Interval
To evaluate the differences in relative complexity between healthy subjects and neurodegenerative subjects (with PD, HD, and ALS), the stride interval time series of the left and right feet are considered as different variables from the same system, and MMSyEn is applied to distinguish between the healthy and diseased subjects. The subjects were instructed to walk up and down a 77-m-long, straight hallway at their self-determined rate for 5 min on level ground [4]. Subjects with PD (n = 12), HD (n = 15), and ALS (n = 11) and 14 healthy control subjects are selected to calculate their complexity with the proposed MMSyEn. Before doing so, the singular values of these signals are removed.
In reference [7], when the coefficient ζ of the threshold θ was normalized to a single value, SyEn could not statistically discriminate between the control subjects and all diseased (PD, HD, and ALS) subjects. In other words, to discriminate the healthy controls from all diseased subjects, different coefficients ζ had to be selected for separately discriminating the healthy controls from subjects with PD, HD, and ALS. Consequently, the parameters used here to calculate MMSyEn are τ = 1, a fixed threshold θ = 4 ms (other thresholds, such as 1-12 ms, are also valid), and other determinate parameters. Figure 5 shows the MMSyEn curves for the cases considered; the entropy of each group is approximately constant across scales, at a different level for each group, and the complexity of the control subjects is the largest at all scales. Moreover, Table 2 shows that MMSyEn can discriminate between the control and diseased subjects with a good degree of distinction. This result also indicates lower complexity of the gait responses of diseased subjects with PD, HD, and ALS than of healthy subjects, reflecting reduced adaptive capacity of biological organization and conforming with the theory of complexity loss with disease.

Conclusions
An improved symbolic entropy is proposed to accurately quantify complexity, considering both within- and cross-channel dependencies and coupling in multichannel complex signals over a range of scales. The calculation method of multivariate multiscale symbolic entropy is introduced to obtain the values of MMSyEn using the normalized corrected Shannon entropy with a selectable embedding dimension and time delay. The values of MMSyEn vary from 0 to 1. A larger value implies that the multivariate time series is more complex and irregular.
The proposed entropy analysis of multichannel time series from synthetic stochastic signals is performed to verify the effectiveness of MMSyEn. It is consistent with the fact that 1/f noise is structurally more complex than white noise. Moreover, human gait signals under different walking conditions and from subjects with different diseases are employed to investigate their multivariate multiscale entropy characteristics. The results of MMSyEn demonstrate that the complexity of healthy and unconstrained gait is higher than that of gait under disease and constrained walking conditions. More importantly, the proposed method retains the advantages of symbolic entropy in terms of convenience, robustness, and fast computation. It will be helpful for wearable devices that monitor physiologic signals and for personal healthcare in the future.

Figure 2. MMSyEn analysis of 10 simulated three-channel data sets containing different numbers of white and 1/f noise variables, each with 5000 data points. In each case, values are shown as means ± standard deviations (SDs).

Figure 3. MMSyEn analysis of 10 uncorrelated and correlated simulated data sets containing trivariate white and 1/f noise, each with 5000 data points. In each case, values are shown as means ± standard deviations (SDs).

Figure 4. MMSyEn analysis of trials from 10 subjects containing self-paced (solid blue circle) vs. metronomically paced (solid red asterisk) stride interval time series and their corresponding randomized surrogates (dashed line) with 1000 data points. Top: bivariate MMSyEn analysis; Bottom: trivariate MMSyEn analysis. In each case, values are shown as means ± standard deviations (SDs).

Table 1. Mann-Whitney U test for the bivariate and trivariate human gait stride interval under different walking conditions. The p-values for the comparison of unconstrained and metronomically paced walking at various speeds (slow, normal, and fast) with MMSyEn indicate statistical significance at different scales. "No" indicates that the statistical difference is not obvious. The thresholds are θ1 = 0.2 × (the sum of the standard deviations of the multivariate time series) and θ2 = 15 ms.

Table 2. Mann-Whitney U test for the bivariate human gait stride interval from the spontaneous output of the human locomotor system during usual walking. The p-values for the comparison of healthy controls and subjects with PD, HD, and ALS with MMSyEn indicate statistical significance at different scales. "No" indicates that the statistical difference is not obvious.
