Using the Information Provided by Forbidden Ordinal Patterns in Permutation Entropy to Reinforce Time Series Discrimination Capabilities

Despite its widely tested and proven usefulness, there is still room for improvement in the basic permutation entropy (PE) algorithm, as several subsequent studies have demonstrated in recent years. Some of these new methods try to address well-known PE weaknesses, such as its focus on ordinal rather than amplitude information, and the possible detrimental impact of equal values found in subsequences. Other new methods address less specific weaknesses, such as the dependence of PE results on input parameter values, a common problem in many entropy calculation methods. The lack of discriminating power among classes in some cases is also a generic problem when entropy measures are used for data series classification. This last problem is the one specifically addressed in the present study. To that end, the classification performance of the standard PE method was first assessed by conducting several time series classification tests over a varied and diverse set of data. Then, this performance was reassessed using a new Shannon entropy normalisation scheme proposed in this paper: dividing the relative frequencies in PE by the number of different ordinal patterns actually found in the time series, instead of by the theoretically expected number. According to the classification accuracy obtained, this last approach exhibited a higher class discriminating power. It was capable of finding significant differences in six out of seven experimental datasets, whereas the standard PE method only did so in four, and it also achieved better classification accuracy. It can be concluded that, by using the additional information provided by the number of forbidden/found patterns, it is possible to achieve a higher discriminating power than with the classical PE normalisation method. The resulting algorithm is also very similar to that of PE and very easy to implement.


Introduction
Despite its relatively young age in comparison with other entropy statistics, permutation entropy (PE) has already become one of the most utilised time series entropy-related measures. It was proposed in the well-known paper by Bandt and Pompe [1] in 2002, and since then, it has given rise to a number of applications and further algorithm developments. This number is growing exponentially [2], which attests to the soundness of the PE approach.
Regarding PE applications, this measure has been used in many fields, such as medicine, engineering, seismology, and economics. In medicine, it has been frequently used as a diagnostic aid to disclose information hidden in physiological time series. The most common physiological time series processed using PE are probably electroencephalograms (EEG) and electrocardiograms, the latter in the form of R-wave interval series (RR records). For example, in [3], PE was applied to EEG records to find seizure-free, pre-seizure, and seizure phases. This is a recurrent application in many similar studies.

Permutation Entropy
The present study is based on the original PE algorithm described in [1]. This method computes a normalised histogram of the ordinal patterns found when the subsequences drawn from a time series are sorted in ascending order, from which the Shannon entropy is calculated. The length of these subsequences is defined by an input parameter, the embedding dimension m.
Formally, the input time series under analysis is defined as a vector of N components, x = {x_0, x_1, ..., x_{N−1}}. A generic subsequence extracted commencing at sample x_j of x is defined as a vector of m components, x_j^m = {x_j, x_{j+1}, ..., x_{j+m−1}}. In its original state, the samples in x_j^m can be assigned a default growing set of indices, π^m = {0, 1, ..., m − 1}. The subsequence x_j^m then undergoes an ascending sorting process, and the sample order changes are mirrored in the vector of indices π^m. The resulting new version of this vector, π_j^m = {π_0, π_1, ..., π_{m−1}}, with x_{j+π_0} ≤ x_{j+π_1} ≤ ... ≤ x_{j+π_{m−1}}, is compared, in principle, with all the possible m! ordinal patterns of length m. When a coincidence is found, the counter associated with that pattern, c_i ∈ c, is increased. This process is repeated for all the possible N − (m − 1) subsequences (0 ≤ j < N − m + 1) until the complete histogram is obtained. Each bin of the histogram is finally normalised by N − (m − 1) in order to obtain an estimation of the probability of each ordinal pattern, p = {p_0, p_1, ..., p_{m!−1}}, with p_i = c_i / (N − (m − 1)). This vector of probabilities is used to calculate PE as:

PE = −∑_{i: p_i > 0} p_i log(p_i)

There is another input parameter for PE, the embedding delay τ. This parameter, when τ > 1, defines the time scale at which PE is computed, and it can contribute to a deeper insight into the temporal correlations of the time series [23]. However, since this parameter is almost equivalent to a downsampling process [14], and given that the present study is a comparison in relative terms, we took τ = 1 in all the experiments, as in many other works [1,14,19,22].
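The procedure above can be sketched in a few lines of Python (a minimal illustration, not the authors' reference implementation; the function names are ours):

```python
from collections import Counter
from math import log

def ordinal_pattern(window):
    """Indices that sort the window in ascending order, as a tuple (pi_j^m)."""
    return tuple(sorted(range(len(window)), key=lambda k: window[k]))

def permutation_entropy(x, m=3):
    """Standard PE: Shannon entropy of the ordinal-pattern histogram,
    with each count normalised by the number of subsequences N - m + 1."""
    n_sub = len(x) - m + 1
    counts = Counter(ordinal_pattern(x[j:j + m]) for j in range(n_sub))
    probs = [c / n_sub for c in counts.values()]
    return -sum(p * log(p) for p in probs)
```

For a monotonically increasing series only one ordinal pattern occurs, so the entropy is 0; any mixture of patterns yields a positive value.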
Some numerical examples of PE computation can be found in the literature; see, for example, [15,18,24].
For comparative purposes, another improved version of PE, the weighted permutation entropy (WPE) [12], will be included in some experiments. The difference is that a weighting factor quantifying amplitude information is applied to the relative PE frequencies; each subsequence x_j^m is weighted by its amplitude variance, w_j = (1/m) ∑_{k=0}^{m−1} (x_{j+k} − x̄_j^m)^2, where x̄_j^m denotes the mean of x_j^m. Further WPE details are described in [15].
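A minimal sketch of this weighting follows, assuming the variance-based weight of [12] (the formula and the function name are our reconstruction, not the authors' listing):

```python
from collections import defaultdict
from math import log

def weighted_permutation_entropy(x, m=3):
    """WPE sketch: each subsequence contributes its amplitude variance
    (the weight w_j) to its ordinal pattern's bin, instead of a unit count."""
    weights = defaultdict(float)
    total = 0.0
    for j in range(len(x) - m + 1):
        vals = x[j:j + m]
        mean = sum(vals) / m
        w = sum((v - mean) ** 2 for v in vals) / m  # weight = subsequence variance
        pat = tuple(sorted(range(m), key=lambda k: vals[k]))
        weights[pat] += w
        total += w
    probs = [v / total for v in weights.values() if v > 0]
    return -sum(p * log(p) for p in probs)
```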

Permutation Entropy Using the Number of Patterns Found
Forbidden patterns, in the sense of patterns that will never occur in a sequence regardless of its length, have been demonstrated to provide additional information about the degree of determinism of the underlying time series [21]. These forbidden patterns can even be considered as a new dynamical property [21,25], and have already been used successfully as a quantifier to assess stock market inefficiency [22]. In cases of unobserved patterns due to low probabilities of occurrence in a relatively short time series, namely, not truly forbidden but missing patterns [25], they can also be considered potential distinguishing features if all the records are balanced with regard to length [26]. The numbers of forbidden and admissible patterns are two sides of the same coin, since they are complementary and together add up to the theoretical number of possible patterns, m!.
More formally, if the probability of an ordinal pattern π_i is P_{π_i}, then P_{π_i} > 0 for an admissible pattern, whereas P_{π_i} = 0 for a forbidden pattern (these probabilities can be thought of as relative frequencies for finite-length time series). A forbidden pattern implies that there is no x_j^m in x whose ordinal pattern is π_i. The presence of a forbidden pattern is heuristically linked to determinism [25], and it induces an even higher number of forbidden patterns for longer subsequences, which can also be exploited from a classification point of view, as will be described in the experiments later. For example, if π_i^3 = (2, 1, 0) is a forbidden pattern of x, then, for m = 4, (3, 2, 1, 0), (2, 3, 1, 0), (2, 1, 3, 0), and (2, 1, 0, 3) will also be forbidden patterns [20], and so on.
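The propagation rule illustrated by this example amounts to inserting the new symbol m at every position of the forbidden length-m pattern; a hypothetical helper (ours, for illustration only) makes this explicit:

```python
def derived_forbidden(pattern):
    """Given a forbidden ordinal pattern of length m, list the m + 1
    length-(m + 1) patterns it forbids, by inserting the new symbol m
    at every possible position [20]."""
    m = len(pattern)
    return [pattern[:i] + (m,) + pattern[i:] for i in range(m + 1)]
```

Applied to the example in the text, `derived_forbidden((2, 1, 0))` yields the four forbidden patterns of length 4 listed above.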
Given that PE looks at the dynamics of a time series in terms of the relative frequency of ordinal patterns, but overlooks the additional information provided by the number of forbidden/admissible patterns (which is also related to the randomness of the time series [20]), we hypothesised that there could be a potential synergy between the two sources. After studying several integration possibilities, a straightforward and simple solution was to replace the PE probability normalisation factor N − m + 1, which accounts for the number of subsequences that can be drawn from the time series, with the actual number of different ordinal patterns found, termed T.
In principle, PE becomes non-normalised this way, since ∑_k p_k ≠ 1 in most cases. The T value can be considered a rescaling or weighting factor that embeds the forbidden pattern information in the modified PE measure, PE2; its additional class discriminating power would be lost if an individual normalisation took place on a signal-by-signal basis (intrinsic). In order to keep the PE2 results within a reasonable interval, as with other similar measures, including PE, it is recommended to adopt a global feature normalisation scheme (extrinsic) after PE2 computation, if normalisation is desired.
There are many feature normalisation methods reported in the scientific literature: Z-score, min-max, and other linear and non-linear scaling methods [27]. If normalisation is necessary, we propose using a linear proportionate normalisation scheme [28]: each PE2 value is divided by the sum of all the PE2 values. Therefore, each result accounts for its relative contribution within the entire set of results; proportionality (namely, discriminating power) is not destroyed, and the newly computed value can easily be related to the original one. In other words, order and differences are not lost or modified. Moreover, it is not based on arbitrary choices, and its implementation is straightforward. An even more convenient variation of this method is to divide the PE2 values by the maximum PE2 value obtained, in order to keep the range of possible results between 0 and 1 [29], which is more easily interpretable.
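Both options reduce to a one-line computation; a small sketch (the function names are ours):

```python
def proportionate(values):
    """Linear proportionate normalisation [28]: each value divided by the
    sum of all values, so order and relative differences are preserved."""
    s = sum(values)
    return [v / s for v in values]

def max_normalise(values):
    """Variant of [29]: divide by the maximum value so that the results
    fall in (0, 1], which is more easily interpretable."""
    mx = max(values)
    return [v / mx for v in values]
```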
A numerical example of the computation, using the sequence analysed in [15], proceeds exactly as for PE, except that the final normalisation factor is T instead of N − m + 1. Once the PE2 values from all the time series under analysis have been computed, the normalisation scheme described above can be applied, although it is not necessary for classification purposes, because the differences are already present in the resulting PE2 values.
As with the PE1 algorithm, a PE2 algorithm can be implemented in many ways, but we chose to use a bubble sort approach to obtain the ordered indices, and to dynamically update the list of different ordinal patterns found, termed Π^m (initially empty), instead of assuming a set with all the possible m! permutations. This implementation is less computationally efficient due to the list operations (searching and appending), but it is better suited to implementing the improvements devised for the PE2 algorithm. Besides, it can be more memory-efficient, since, when there are forbidden patterns, as in many chaotic time series [22,30,31], there is no need to store all the theoretically possible m! ordinal patterns, only those really found in the data. This could entail significant memory savings when m is relatively large and/or forbidden patterns are frequent [26]. Last but not least, a linked list facilitates the implementation, and even the integration, of other PE variants based on a dynamic generation of patterns, such as FGPE [13]. Based on this approach, a suggested implementation is shown in Algorithm 1.
Algorithm 1 Permutation entropy scaled by the number of patterns found (PE2) algorithm.
The PE2 algorithm can become equivalent to PE just by replacing T with N − m + 1 at the line labelled DESNORMALISATION in Algorithm 1. If the records are very short, it is not just a question of forbidden patterns but of the low probabilities of certain ordinal patterns. If all the classes exhibit the same behaviour in terms of length, this should not be an influencing factor; otherwise, a length normalisation scheme should be devised.
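The core of Algorithm 1 can be rendered as a minimal Python sketch (a simplified reading using a dictionary in place of the dynamic linked list; the function name is ours, not the authors' exact listing):

```python
from math import log

def pe2(x, m=3):
    """PE2 sketch: as PE, but the counts are divided by T, the number of
    distinct ordinal patterns actually found, instead of N - m + 1.
    The pattern table is built dynamically, as in Algorithm 1."""
    found = {}  # only the patterns actually observed are stored
    for j in range(len(x) - m + 1):
        w = x[j:j + m]
        pat = tuple(sorted(range(m), key=lambda k: w[k]))
        found[pat] = found.get(pat, 0) + 1
    T = len(found)  # number of admissible patterns found
    # DESNORMALISATION step: divide counts by T rather than N - m + 1
    scaled = [c / T for c in found.values()]
    return -sum(p * log(p) for p in scaled)
```

Note that, as stated in the text, the result is non-normalised (it can even be negative, since the scaled values may exceed 1); replacing T with `len(x) - m + 1` in the DESNORMALISATION step recovers standard PE.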

Experimental Dataset
First, the addition of synthetic databases was considered, since this kind of record is also very useful for characterising the performance of methods under more controlled conditions. In a very recent paper [32], we proposed using a hidden Markov model to create synthetic records based on the transition probabilities of their ordinal patterns of length m = 3. This is a very suitable tool for creating a synthetic dataset for the present study, since the main difference between PE1 and PE2 is the use of the number of actual ordinal patterns found. By assigning a probability of 0 to some ordinal pattern transitions, the frequency of the destination pattern can be minimised, and for m > 3, the probability of derived patterns is likely to reach 0, since the number of forbidden patterns grows superexponentially [20].
With a_{ij} being the transition probability between consecutive states q_i and q_j at time t, a_{ij} = P(State_{t+1} = q_j | State_t = q_i), and with a one-to-one correspondence between model states and the ordinal patterns of length 3 (for instance, q_5 ← (2, 0, 1)), 100 records of two synthetic classes were generated using this model. For one class, the transition probabilities were a_{ij} = {0.5, 0.5, 0.0}, and for the second class, a_{ij} = {0.5, 0.0, 0.5}, with the probabilities defined as in [32]. Therefore, each class penalised a different transition, and that impacted the number of patterns found at m = 4 and beyond in a different way for each class, since the model is not symmetric (see details in [32]). An example of the resulting signals is shown in Figure 1. The experiments on these records used 10 random realisations in each test.

In addition, a real experimental dataset was chosen with the primary goal of assembling a publicly available collection of data widely representative of the most common time series entropy applications. This enables other researchers to replicate the experiments and draw sound conclusions about the most likely performance under a disparity of conditions, not just for a single dataset/case, as occurs in many studies. Figure 2 depicts an example of one signal from each dataset used, described next:

• BONN. This database was collected at the Department of Epileptology, University of Bonn [33], and is a frequently used dataset found in many similar research studies [17,34-38]. The length of the records is 4097 samples, with two classes of 100 time series each, corresponding to seizure-free and seizure-included electroencephalograms (EEGs). This dataset was chosen due to its popularity among the scientific community, and because EEGs are the focus of many entropy-related studies.

• GAIT. The records included in this dataset were drawn from the Physionet gait in ageing and disease database [39].
Although this is a small collection of gait data, with only five subjects per class, we found it very representative of another group of physiological data, not as common as EEGs, and useful for exploring the algorithms' performance. The 15 records correspond to five healthy young adults, five healthy old adults, and five old adults with Parkinson's disease. The data are stride intervals [40]. The length of the records is around 800 samples for healthy subjects and 200 for pathological ones, which suffices for a representative classification analysis using PE, according to recent studies [26]. In any case, a variation of this subset, termed GAIT2, where all the records were cut short to 200 samples, was included in the experiments for comparative purposes.

• FANT. The Fantasia dataset contains 120 min of electrocardiographic and respiration data from 20 young and 20 elderly healthy subjects, and it is also available at [39]. Only the RR-interval time series were used in the experiments in this paper. RR records are also a field of intensive research [4,41-44]. A detailed description of this database can be found in [43].

• RATS. Records of blood pressure readings from Dahl SS rats on high- and low-salt diets [45]. The database contains nine records of each class, sampled at 100 Hz, with a total length of 12,000 samples [39].

• WORMS. This database corresponds to the recorded 2D movement of genetically modified worms [46-48], and is publicly available at www.timeseriesclassification.com. It was included to have a dataset not related to physiological records and to widen the scope of the analysis. Specifically, the subset used in the experiments contained 181 records of two classes (76 wild type and 105 mutant type) of length 900.

• HOUSE. The records in this database are also publicly available at www.timeseriesclassification.com, and they also correspond to non-physiological data.
There are two classes of 20 records each, with 1022 samples per record [49].

• PAF. This dataset contains paroxysmal atrial fibrillation (PAF) records [50]. There are two classes of 25 records, each one (PAF and PAF-free episodes) five minutes in duration. These records were also drawn from [39].

Performance Analysis
The performance of each approach under analysis was quantified using the classification accuracy: the ratio of correctly classified records. The significance of this classification was qualitatively assessed by means of sensitivity (Se) and specificity (Sp), since very unbalanced results (for example, a 0.7 accuracy with Se = 0.4 and Sp = 1 is not considered significant; a minimum of 0.6 in each is required) reflect an underlying poor discriminating power, regardless of the global classification accuracy. The classification threshold was taken as the ROC (receiver operating characteristic) curve point closest to the (0,1) coordinates [51]. It is important to note that the goal of the study was not to design an optimal classifier, but to carry out a fair comparison between the performances of the two measures tested under the same conditions. The quantitative significance of the classification accuracy was assessed by means of an unpaired Wilcoxon-Mann-Whitney test. This is a very robust test that does not require data normality [52]. The significance threshold was set at α = 0.05.
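The threshold selection described above can be sketched as follows (an illustrative implementation, assuming higher feature values correspond to the positive class; the function name is ours):

```python
def closest_to_01_threshold(scores_pos, scores_neg):
    """Pick the classification threshold whose ROC point (FPR, TPR) lies
    closest to the ideal corner (0, 1) [51]; returns (threshold, Se, Sp)."""
    best = (float("inf"), None, None, None)
    for t in sorted(set(scores_pos) | set(scores_neg)):
        se = sum(s >= t for s in scores_pos) / len(scores_pos)  # sensitivity (TPR)
        sp = sum(s < t for s in scores_neg) / len(scores_neg)   # specificity
        d2 = (1 - sp) ** 2 + (1 - se) ** 2                      # squared distance to (0, 1)
        if d2 < best[0]:
            best = (d2, t, se, sp)
    return best[1], best[2], best[3]
```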

Results
The experiments were first carried out using the synthetic dataset described before. Since the transition probabilities were quite different, both PE1 and PE2 were capable of finding significant differences between the two classes generated. These results are shown in Table 1. It is important to note that, as m increased, the differences between classes in terms of patterns found became greater, and so did the PE2 accuracy, whereas the PE1 performance decreased.

Table 2 shows the classification performances and the statistical significance of the results achieved by the standard PE1 on real records. These experiments also included a variation of the parameter m, from 3 to 8, since the influence of input parameters is another topic of intense debate and research in the scientific literature. The previous experiments for PE were repeated using the new approach, PE2. These additional results are shown in Table 3.

Table 3. Classification results achieved using the new approach, PE2, applied to all the time series databases. Classification performance is quantified with three parameters: sensitivity, specificity, and accuracy.

In order to better support the addition of the number of admissible/forbidden patterns to the PE method, this number was computed for each dataset. The results of this experiment are shown in Table 4. A great difference between the numbers of patterns found for the classes under comparison would suggest that the addition of this number could make a significant contribution to the discriminating power of PE. Despite the results shown in Tables 2 and 3, some of them could be debatable due to the low number of subjects, especially in the case of the GAIT database. For this reason, we have included two plots of the PE1 and PE2 values obtained, to visually check the validity of the classification results for the GAIT database and for the RATS database, with nine members in each class. These plots are depicted in Figures 3 and 4, respectively.
In any case, it is important to note that the analysis should be viewed in comparative terms. We are not proposing a classifier for this GAIT dataset; we are assessing whether PE2 performs better than PE1 or not.

Figure 4. Range of values for PE1 and PE2 using the optimal m value to achieve the maximum possible classification accuracy for the two classes in the RATS database (High: high-salt diet; Low: low-salt diet).

There is an overlap of three or four members in PE1 for both classes, and that is surely why PE1 did not achieve statistical significance. For PE2, all the members in class High can be classified correctly, with an overlap of the two top members of the Low class. This corresponds to the numerical results 0.77, 1, and 0.88 in Table 3.

A summary of the best statistically significant performance achieved using the two approaches assessed (PE1, PE2) is shown in Table 5 for real records. As hypothesised, the combination of PE and the actual number of patterns, the newly proposed scheme PE2, achieved the highest accuracy and was more robust (only one dataset was not significant) than PE alone. Since WPE exhibited the best performance of a group of PE algorithm improvements in a previous study [15], Table 5 also includes the results of applying this method, along with a denormalised version (WPE + number of patterns found) using the same approach as for PE2, since WPE also enables the computation of the measure without including the number of expected patterns. WPE has also outperformed PE in other studies, such as [53,54].

Table 5. Summary of the performances achieved by PE1 and PE2, including a comparison with the same approach applied to WPE. Only the best significant case is reported, including the corresponding m value, in terms of classification accuracy only. If statistical significance was not achieved, the performance is labelled NS.
Other factors taken into account to assess the utility of an entropy measure are its dependence on parameters and artefacts, such as the time series length, and its robustness against noise. The results of a length influence analysis are shown in Table 6. In this case, PE1 and PE2 were compared using the m values for which their performances were most similar and significant, and the datasets with at least 750 samples in their time series. The results of the noise robustness test are shown in Table 7. It is important to note that the time series were probably already affected by noise, whose level was not known. Therefore, this test considered the signals completely free of noise when computing the resulting SNR, but in practical terms the real SNR should be considered lower. The levels tested were 40, 30, 25, 20, and 15 dB SNR. The noise was a random uniform time series added to the experimental datasets, with 10 realisations for each test.
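The noise addition can be sketched as follows (our illustration: zero-mean uniform noise scaled so that the nominal SNR, treating the record as noise-free, matches the target; the function name is ours):

```python
import random

def add_uniform_noise(x, snr_db):
    """Add zero-mean uniform noise scaled to a target SNR in dB,
    treating x as noise-free (as noted in the text, the real SNR is lower)."""
    n = len(x)
    mean = sum(x) / n
    p_signal = sum((v - mean) ** 2 for v in x) / n
    p_noise = p_signal / (10 ** (snr_db / 10))
    # a uniform variable on [-a, a] has variance a^2 / 3
    a = (3 * p_noise) ** 0.5
    return [v + random.uniform(-a, a) for v in x]
```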

Discussion
There was a clear correlation in Table 1 between the differences in patterns found and the accuracy achieved using PE2 on synthetic records. Since the ordinal patterns were generated using a length of 3, the performance of PE1 was high for m = 3 and its multiple m = 6, but far lower for the other m values used in the computation. The patterns that occurred scarcely or not at all at m = 3 gave rise to a set of forbidden patterns for m > 3. The number of forbidden patterns grows superexponentially [20], and that is why the differences between classes became more apparent for higher m values, and only PE2 was able to take advantage of this effect.
PE2 yielded a higher classification accuracy than PE1 for all the experimental real datasets except PAF, and it was equal in the GAIT case (Table 5). Moreover, PE1 was unable to find differences in three out of the seven datasets, whereas PE2 only failed in one. Therefore, it can arguably be concluded that PE2 has a greater discriminating power than PE1. The experimental dataset was reasonably varied and diverse, chosen from publicly available repositories, and used in other similar works.
It is important to note that the PE1 and PE2 results for low m values were very similar. This is probably due to the fact that there is no significant difference in terms of number of patterns found between classes for such low values. In fact, these differences become more apparent beyond m = 5 (Table 4).
Specifically, the results for the BONN database were Se = 0.93, Sp = 0.90, and Acc = 0.91 for PE1 with m = 3, and 0.90, 0.97, and 0.93, respectively, for PE2 with m = 5. This is not a great difference, probably because the PE1 performance was already very good, despite the clear differences in terms of patterns found. The situation was the same for the GAIT dataset results. For the FANT dataset, PE1 did not find significant differences for any of the m values tested, but PE2 did for m = 6, 7, precisely where the relative differences between the two classes in terms of patterns found were the highest. This was also the case for the RATS dataset: PE1 failed, but PE2 found differences, with higher accuracy correlated with higher relative differences in the number of patterns found, the maximum performance being reached at m = 8. The separability of the classes in the WORMS dataset was slightly improved with PE2. This could be due to the high variability in the number of patterns found, quantified in terms of the standard deviation (Table 4). The records in the HOUSE dataset were not distinguished by any of the measures tested; this was the only such case. These records, as can be seen in Figure 2, are very dichotomous, and it is very likely that just a few ordinal patterns monopolise most of the matches, making it difficult to correctly capture the differences between records [26]. In other words, it is not only a matter of the number of patterns found: with such a regular time series, the motif histogram will also be extremely biased.
The results for WPE also confirmed that the number of actual patterns found carries significant discriminant information. This method has already demonstrated its performance in previous studies [12,15], and the replacement of the weights as the normalisation factor by the number of different patterns found, as for PE (WPE + patterns in Table 5), further improved the WPE performance in most of the experiments. This fact suggests that the PE2 approach can be a transversal improvement that could be applied to many methods at the normalisation stage in order to enhance their discriminating power, not just as a method on its own.
A case that deserved further investigation was the PAF dataset, since this was the only case with better PE1 performance, despite significant differences and low variability in the number of patterns between the two classes (Table 4). The hypothesis was that, somehow, the information provided by the ordinal patterns and the number of patterns cancelled each other out. A straightforward approach to solving this problem was to change the arithmetic operation in which T was involved, using a multiplication instead of a division. The PAF experiments were repeated with this PE2 algorithm tweak, whose results are shown in Table 8. The PE2 performance improved significantly and was again better than that of PE1. The results in Table 8 confirm that the patterns found provide discriminant information, but perhaps in some specific cases the approach should be slightly different in terms of its integration with the ordinal patterns. This experiment was repeated for some records that already exhibited good performances using the initial approach described in Algorithm 1. These additional results, shown in Table 9, confirm that PE2 is a robust approach in general, and that only a few specific cases, such as that of PAF, need a more customised scheme. However, even with both approaches, it was impossible to find significant differences for the HOUSE database.
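The tweak amounts to multiplying the counts by T instead of dividing them; a minimal, self-contained sketch of our reading of this variant (the function name is ours):

```python
from math import log

def pe2_mult(x, m=3):
    """PAF-style tweak of PE2: the counts are multiplied by T, the number
    of distinct patterns found, instead of divided (p_i <- c_i * T);
    purely illustrative."""
    found = {}
    for j in range(len(x) - m + 1):
        w = x[j:j + m]
        pat = tuple(sorted(range(m), key=lambda k: w[k]))
        found[pat] = found.get(pat, 0) + 1
    T = len(found)
    scaled = [c * T for c in found.values()]  # the only change vs. division by T
    return -sum(p * log(p) for p in scaled)
```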
Although the denormalisation can be detrimental for low m values when not all the records have the same length, for greater m values the forbidden patterns become more apparent, and the additional information provided should make PE2 outperform PE1. In other words, in the worst case, PE2 and PE1 should be equivalent in terms of classification accuracy, but for real time series, which usually exhibit some degree of determinism, PE2 will yield the best performance. With the two approaches proposed, PE2 is a clear winner over PE1.

Table 9. Classification results achieved using the new approach, PE2, applied to other datasets.

Despite the p_i ← c_i · T change applied to the PAF case, the new method proposed is still that described initially in Algorithm 1. This final adjustment hints that there might not be a single way to integrate the information of the relative frequency of motifs with their absolute number, and other studies will be necessary to find a better combination and a more optimised algorithm. We tried to keep the algorithm very similar to that of PE to facilitate adoption and implementation, but there is room for PE2 improvement with regard to generalisation, performance, and even normalisation, which will have to be addressed in further studies.

The results for the GAIT2 database (all GAIT records cut short to 200 samples) were not significant, and therefore far worse than those for GAIT. It could be argued that the differences in the GAIT case were mainly due to differences in length, and this could be the case when comparing old-healthy with old-Parkinson's, and young-healthy with old-Parkinson's, since the original lengths were 800 and 200 samples in both cases. However, the comparison of old-healthy with young-healthy became impossible as well, and this pair had the same length in the GAIT database. Therefore, it can arguably be stated that the lack of significance was due to the insufficient length, since 200 samples are borderline [26], and not to the differences in length, another factor that requires further research. In any case, the important point is the comparative analysis between PE1 and PE2.
It could also be hypothesised that the differences in the number of patterns found would suffice to classify the records. In order to assess this point, a few experiments were repeated using the complete PE2 method and only the number of actual patterns found as single classification features. Although in some cases the performances of both approaches were very similar (for example, for the BONN database, with m = 6, the performances were PE2 = 0.935 vs. 0.930), in others they were quite different (for the RATS dataset, with m = 6, the performances were PE2 = 0.78 vs. 0.72). Since the computational cost of both approaches is almost the same (the algorithm has to be run almost completely; only the final Shannon calculation could be omitted), such a simplification is not advisable, because not all the differences lie only in the number of admissible patterns.
The parameter and noise influence analysis provided an additional insight into the PE2 approach capabilities. The comparative length influence analysis between PE1 and PE2 in Table 6 showed that both metrics exhibit a similar robustness against N variations and short lengths. For extremely short lengths, 50 samples, it was not possible to find differences between classes, which is understandable since 50 samples do not suffice to provide a reasonable pattern frequency estimation. At 250 samples, most of the results became significant, but it was at 500 samples where both methods provided significant classifications in all cases, except PE1 for the WORMS database. With 750 samples, performances were really close to those achieved with the entire records.
The performances of PE1 and PE2 with regard to noise interference seemed very similar, except for the PAF dataset (Table 7). This could be due to the fact that the noise impact can also be considered in terms of new ordinal patterns; random time series usually contain all the possible m! patterns as a sign of non-determinism. Therefore, since the difference between PE1 and PE2 lies in the number of patterns found, it can arguably be assumed that, in general, as the noise level increases, PE2 will converge to the PE1 performance, since the differences in terms of the number of patterns will become blurred.
PE2 is derived from PE1, and, without specific studies yet, it would be sensible to assume that some PE1 drawbacks may be inherited by PE2. Thus, like PE1, PE2 could be influenced by amplitude differences in the ordinal patterns [15], by ties [17], or by the record length [26]. A similar set of characterisation studies specifically for PE2 would be necessary to shed some light on these possible issues.

Conclusions
Every year quite a number of new tools to quantify the dynamical features of time series are described in the scientific literature. These new tools are claimed to be more efficient in algorithmic terms, more sensitive, more robust, or less dependent on input parameters, among many other possible benefits.
In this context, PE2 was introduced as an improvement on PE, taking advantage of the differences between time series classes in terms of the number of different patterns actually found. The present study assessed this discriminant power using several real-life datasets, and we could conclude that the discriminating capabilities of the ordinal patterns' relative frequencies and their counts are clearly complementary and synergistic. This led us to combine both measures in a single method to take advantage of their strengths and simultaneously minimise their possible weaknesses. The scheme used for PE with dynamic lists provided the algorithmic template to merge the ordinal pattern and pattern number information, by changing the way histogram bins are normalised while keeping the algorithm's implementation simple and similar to its ancestor, PE.
According to the results obtained, the PE2 approach can be considered a very promising tool in the field of symbolic dynamics. It should not be claimed to be a cure-all method, but the classification performance confirmed the hypothesis, and PE2 seems to be able to seamlessly exploit the synergy between PE and the number of patterns found in most cases. It was clearly more robust, since statistical significance was reached in six out of the seven datasets, two more than with PE1 (PE). It also achieved the maximum performance of the two methods tested in five cases, or six if the final PE2 algorithm tweaking is considered (Section 4).
The PE2 algorithm is just slightly more complex than that of PE, but more memory-efficient, since it only needs to store the patterns found, not all the possible m! ones. Moreover, the algorithm introduced can easily be optimised further. In addition to implementation issues regarding memory requirements or computational cost, the algorithm could be improved in terms of the ordinal and pattern number influences on the final calculations by using a normalisation or weighting scheme based on an additional parameter, such as A/(1 − A), as in [14]. Additionally, other factors could be included as additional symbols in the motifs. This way, other properties, such as sequence amplitude, would become part of the comparisons. In fact, we included the q parameter in the PE2 method, as described in [13], and the classification performance of PE2 increased in some cases. The experiments with WPE and WPE + Patterns also confirmed this point.
This new approach will need further studies using other databases and other integration schemes. The influence of equal values in the subsequences [17] and of the time delay τ could also be characterised. Further integration with other PE improvements [6,12-14,16] could also be worth exploring in the future.