Article

PFC: A Novel Perceptual Features-Based Framework for Time Series Classification

1 College of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
2 College of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
* Author to whom correspondence should be addressed.
Entropy 2021, 23(8), 1059; https://doi.org/10.3390/e23081059
Submission received: 9 July 2021 / Revised: 7 August 2021 / Accepted: 12 August 2021 / Published: 17 August 2021

Abstract
Time series classification (TSC) is a significant problem in data mining with applications in many domains. Mining distinguishing features is the primary approach, and algorithms based on the morphological structure of time series are promising because they are both interpretable and accurate. However, existing structural feature-based algorithms, such as time series forest (TSF) and shapelet-based methods, traverse all features through many random combinations, so a large amount of training time and computing resources is required to filter out meaningless features, and important distinguishing information may be ignored. To overcome this problem, in this paper we propose a perceptual features-based framework for TSC. We are inspired by how humans observe time series: usually only a few essential points of a time series need to be remembered. Although a complex time series contains many details, a small number of data points is enough to describe the shape of the entire sample. First, we use improved perceptually important points (PIPs) to extract key points and use them as the basis for time series segmentation, obtaining a combination of interval-level and point-level features. Second, we propose a framework to explore the effects of these perceptual structural features combined with decision trees (DT), random forests (RF), and gradient boosting decision trees (GBDT) on TSC. Experimental results on the UCR datasets show that our work achieves leading accuracy, which is instructive for follow-up research.

1. Introduction

In the information age, massive amounts of data have been generated over time, and these data are closely related to many studies. In mathematics, a time series is a series of data points indexed in time order. Most commonly, a time series [1] is a sequence taken at successive equally spaced points in time. A time series contains information in both the time dimension and the data dimension, and time series exist in many fields such as economics, life science, military science, space science, geology and meteorology, and industrial automation. Time series classification [2,3,4] is an essential task that has attracted widespread attention. Normally, time series classification refers to assigning a time series pattern to a specific category, for example, judging whether it will rain through a series of temperature data [5] or determining whether a patient has Parkinson's disease through a period of physiological data [6,7]. Dau et al. [8] proposed the UCR Time Series Classification Archive (UCR) for this task, which includes 128 datasets from different fields such as ECG, sensor, and image data. To understand TSC more intuitively, Figure 1 shows some representative datasets in UCR. These datasets cover almost all existing TSC tasks, show the morphological structure of various time series, and lay the foundation for researchers to explore general classification methods. Many methods have been proposed to solve this problem; they can be divided into five categories according to their cores: dictionary-based, distance-based, interval-based, shapelet-based, and kernel-based.
The dictionary-based method borrows the idea of natural language processing: researchers treat a time series as a special sentence composed of discrete characters or words. How to segment the time series and map it into characters is the first issue to consider. There are three main time series symbolization methods: Piecewise Aggregate Approximation (PAA) [9,10], Symbolic Aggregate approXimation (SAX) [11,12], and Symbolic Fourier Approximation (SFA) [13]. Subsequently, the Bag-of-SFA Symbols (BOSS) method based on the bag-of-words model was proposed [14]. This method records high-frequency symbol features and uses them to distinguish different types of time series samples. Matthew et al. [15] and James et al. [16] further proposed Contract BOSS (cBOSS) and Spatial BOSS (S-BOSS). In addition, Word Extraction for Time Series Classification (WEASEL) [17] is also a typical dictionary-based method, composed of a supervised symbolic time series representation for discriminative word generation and the Bag of Patterns (BOP) [18] model for building a discriminative feature vector.
Many TSC methods focus on the distance between time series. Generally, a time series can be regarded as a point in a multi-dimensional space whose dimension equals the length of the series. Different classes of time series form different aggregations, so distance is an effective way to distinguish them. K-Nearest Neighbors (KNN) and the Elastic Ensemble (EE) [19] are two commonly used methods. Ben et al. [20] proposed Proximity Forest, a decision tree forest that uses distance measures to partition data. It should be noted that since most distance calculations are computed "one to one", samples of equal length are necessary. For unequal-length sequences, dynamic time warping (DTW) [21,22,23] is a robust measure that can tolerate differences in length and shape. Combining KNN and DTW is a way to take advantage of both at the same time [24,25].
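To make the DTW idea above concrete, the following is a minimal dynamic-programming sketch of the DTW distance, illustrating why it tolerates unequal lengths. It is an illustration only, not an optimized implementation (no warping-window constraint; O(len(a) · len(b)) time and space).

```python
# Minimal DTW distance: each cell D[i][j] holds the cheapest cumulative
# alignment cost of a[:i] and b[:j]; the answer is D[len(a)][len(b)].
def dtw(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of diagonal match, insertion, deletion
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

# Two series of different length but the same shape align with zero cost.
print(dtw([0, 1, 2, 1, 0], [0, 1, 1, 2, 1, 0]))   # → 0.0
```

Replacing the Euclidean distance in a 1-NN classifier with `dtw` gives the KNN+DTW combination mentioned above.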
In reality, different types of time series may have precisely the same statistical characteristics such as mean, variance, standard deviation, and so on [26]. In order to avoid this problem, the interval-based method focuses on local features rather than overall features. Deng et al. [27] proposed a Time Series Forest (TSF) model that converts time series into statistical features of sub-sequences and uses random forest for classification. Cabello et al. [28] further constructed Supervised Time Series Forest (STSF), an ensemble of decision trees built on intervals selected through a supervised process. Random Interval Spectral Ensemble (RISE) is a popular variant of time series forest [29]. RISE differs from time series forest in two ways. First, it uses a single time series interval per tree. Second, it is trained using spectral features extracted from the series instead of summary statistics. Since RISE relies on frequency information extracted from the time series, it can be defined as a frequency-based classifier.
The shapelet-based method draws inspiration from pattern recognition. Shapelets are defined in [30,31] as "subsequences that are in some sense maximally representative of a class". Informally, assuming a binary classification setting, a shapelet is discriminant if it is present in most series of one class and absent from the series of the other class. However, any subsequence may be distinguishing, and the length of a subsequence is arbitrary, which means that all samples and their subsequences need to be checked through a sliding window; the search space for shapelets is therefore enormous. In response to this problem, Ji et al. [32,33] proposed a fast shapelet selection algorithm.
Building on the recent success of convolutional neural networks for time series classification, Dempster et al. [34] realized that simple linear classifiers using random convolutional kernels achieve state-of-the-art accuracy with a fraction of the computational expense of existing methods. Therefore, they proposed ROCKET, a kernel-based time series classification method. This is a new direction for TSC, which can both reduce computational complexity and improve accuracy.
By analyzing the five classification methods, we realized that the existing algorithms are essentially trying to find efficient distinguishing features by learning all the original information of the sample, which leads to high computational complexity and resource consumption. In fact, for human beings, it does not require all the information to distinguish time series. On the contrary, we only pay attention to a few critical data points, which are enough to describe the approximate outline of time series samples and present a significant distribution. This paper proposes a classification framework based on perceptual features, which can extract support points of morphological structure from the original time series and further obtain interval-level and point-level features for classifiers such as decision trees. The contributions of our work are described below.
  • An improved algorithm called globally restricted matching perceptually important points (GRM-PIPs) is proposed, which avoids the redundancy caused by sequential matching in traditional important point extraction.
  • How many data points are necessary to describe complete information? We conducted in-depth research on this question and verified our opinions through mathematical proofs and experiments.
  • The data points extracted by GRM-PIPs can divide the time series into sub-sequences similar to shapelets. We used statistical features such as mean, standard deviation, slope, skewness, and kurtosis to enhance discrimination further.
  • Most classifiers learn the information of the original time series, which is not suitable for perceptual features. Therefore, we matched a suitable classifier and proposed a complete perceptual features-based framework.
The remainder of this paper is organized as follows. In Section 2, related work on PIPs, decision trees, random forests, and gradient boosting decision trees is presented. Section 3 describes the details of PFC, including GRM-PIPs, perceptual feature extraction, and classifier adaptation. Section 4 presents the experimental setup and the performance of the proposed approach, as well as comparative experiments; a discussion of the differences in experimental results is also given in Section 4. Finally, the conclusions and directions for future research are given in Section 5.

2. Related Work

2.1. Perceptually Important Points

For time series, avoiding point-to-point local comparison is the key to reducing computational complexity. In time series pattern mining, unique and frequently occurring patterns can be abstractly represented by several critical points. It is precisely through these points with important visual impact that humans remember specific time series patterns [35]. The definition of perceptually important points was first introduced in reference [36]. The PIPs algorithm retains the key turning points in the time series, and its ability to capture critical points has been verified in time series segmentation and pattern recognition [37,38,39].
Interestingly, PIPs have been widely used in research on stock time series. Fu et al. [40] used PIPs as a new time series segmentation method to extract uptrend and downtrend patterns. Mojtaba et al. [41] regard PIPs as a dimensionality reduction method similar to PCA and combine it with support vector regression to predict the trend of the stock market. A turning point in a stock time series indicates a substantial change in the market, and PIPs are sensitive to these dividing points, which is the main advantage of PIPs.
In general, any time series can be defined as T = {t_1, t_2, …, t_n}, n ∈ ℤ⁺. This is the classic one-dimensional definition, which treats a time series as a string of data arranged chronologically. However, a one-dimensional data sequence has no explicit morphological structure and cannot be displayed on a two-dimensional plane. Therefore, we need to upgrade the traditional one-dimensional definition to two dimensions to explain the calculation process of PIPs. By introducing the time dimension explicitly, the two-dimensional definition of a time series is T = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, n ∈ ℤ⁺, where x_n represents the position of the n-th data point in the entire time series and y_n is the corresponding amplitude. PIPs use a concise idea to extract important points from the morphological structure of a time series. The process is shown below.
Definition 1.
Perceptually Important Points.
Given a time series sample T = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, n > 2, n ∈ ℤ⁺, an empty list L_p is set to save the extracted perceptually important points. In general, when extracting m important points, the following steps are followed.
Step 1: Put the first point (x_1, y_1) and the last point (x_n, y_n) of T into L_p as the initial two PIPs.
Step 2: Check each remaining point in T and calculate its distance to the two initial PIPs (x_1, y_1) and (x_n, y_n). Choose the point with the largest distance as the third PIP and save it in L_p.
Step 3: The fourth PIP is the point that maximizes the distance to its adjacent PIPs (which are either the first and the third, or the third and the second PIP). The fourth PIP is also saved into L_p.
Step 4: For each new PIP, use the same method as for the fourth PIP; repeat Step 3 until the length of L_p equals m.
For PIPs, there are three distance measures: the Euclidean distance (ED), the perpendicular distance (PD), and the vertical distance (VD). The calculation of the vertical distance VD(P_c) between P_c(x_c, y_c) and the line P_a P_b is shown in Formula (1) and Figure 2. We use VD to show the calculation process of PIPs through a simple example.
$$VD_{P_c} = \left|y_c - y_d\right| = \left|y_c - \left(\frac{y_b - y_a}{x_b - x_a} \cdot (x_d - x_a) + y_a\right)\right|, \quad x_d = x_c$$
We define the extraction of m PIPs from the sample T as PIPs(T, m). For example, if the one-dimensional representation of T is T = {0, 4, 2.5, 3, 1, 5, 6, 1, 2, 1, 0}, its corresponding two-dimensional representation is as below. The process of finding six PIPs from T is shown in Figure 3.
T = {(0, 0), (1, 4), (2, 2.5), (3, 3), (4, 1), (5, 5), (6, 6), (7, 1), (8, 2), (9, 1), (10, 0)}
It should be noted that when there are two points with the same maximum vertical distance, the first calculated point is usually set as the new PIP.
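To make Definition 1 concrete, the extraction steps can be sketched in Python as below. This is a minimal illustrative sketch, not the authors' implementation: it uses the vertical distance of Formula (1), represents a series by its amplitudes (with the index as the x-coordinate), and resolves ties by taking the first calculated point, as noted above.

```python
# A minimal sketch of PIP extraction with the vertical distance (VD).
def vertical_distance(series, a, b, c):
    """VD between point c and the line through points a and b (x = index)."""
    ya, yb, yc = series[a], series[b], series[c]
    yd = (yb - ya) / (b - a) * (c - a) + ya   # y-value of line P_a P_b at x = c
    return abs(yc - yd)

def extract_pips(series, m):
    """Return the indices of m perceptually important points."""
    n = len(series)
    pips = [0, n - 1]                          # Step 1: first and last point
    while len(pips) < m:
        pips.sort()
        best_idx, best_vd = None, -1.0
        # Steps 2-4: among all points, pick the one with the largest VD
        # to the line through its adjacent already-selected PIPs.
        for a, b in zip(pips, pips[1:]):
            for c in range(a + 1, b):
                vd = vertical_distance(series, a, b, c)
                if vd > best_vd:               # ties: first computed point wins
                    best_vd, best_idx = vd, c
        pips.append(best_idx)
    return sorted(pips)

T = [0, 4, 2.5, 3, 1, 5, 6, 1, 2, 1, 0]        # example series from the text
print(extract_pips(T, 6))                      # → [0, 1, 4, 6, 7, 10]
```

On this example, the global maximum at index 6 is selected first, after which the remaining PIPs are placed segment by segment.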

2.2. Decision Tree and Ensemble Methods

A Decision Tree (DT) is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that contains only conditional control statements. DT is a non-parametric supervised learning method used for classification and regression. Its purpose is to create a model that learns simple decision rules from data features in order to predict the value of a target variable [42]. DTs are commonly used in operations research, specifically in decision analysis, to help identify the strategy most likely to reach a goal, but they are also a popular tool in machine learning [43].
DT is a predictive model in machine learning that represents a mapping between object attributes and object values. Each node in the tree represents an object, each bifurcation path represents a possible attribute value, and each leaf node corresponds to the value of the object represented by the path from the root node to that leaf. A decision tree has only a single output; if a complex output is needed, an independent decision tree can be built to handle each output [44]. At the same time, DT is a frequently used technique in data mining that can be used both to analyze data and to make predictions [45].
Applications of decision trees to the TSC task mainly fall into three directions: pattern recognition, shapelet transformation, and feature selection. Pierre [4] believed that many time series classification problems can be solved by detecting and combining local properties or patterns in time series, and he proposed a DT-based technique to find patterns that are useful for classification. Qiu et al. [46] forecast the Shanghai Composite Index based on fuzzy time series and improved C-fuzzy decision trees. Willian et al. [47] explored shapelet transformation for time series classification in decision trees and developed strategies to improve the representation quality of the shapelet transformation. In essence, the DT learns the data with "if-then-else" rules, and the deeper the rules go, the better the data are fitted.
A full review of DT research is beyond the scope of this paper, so we introduce DT with a simple example. Assume a scenario that includes three factors: season, wind, and time. In this scenario, we record whether someone does morning exercises, as shown in Table 1. This is a typical classification task, and the decision tree constructed from Table 1 is shown in Figure 4.
Ensemble methods are a higher-level application in which the decision tree is regarded as a basic (weak) estimator. The goal of ensemble methods is to combine the predictions of multiple basic estimators to achieve better generalization or robustness than a single estimator. Ensemble methods generally fall into three categories:
  • Bagging Method. This method usually considers homogeneous weak estimators, learns them independently and in parallel, and combines them according to some deterministic averaging process [48]. In general, the combined estimator is better than any single estimator because its variance is reduced. Random forest (RF) [49] is a typical bagging method; it builds a large number of decision trees to filter features and obtain the best set of decision rules.
  • Boosting Method. The core of this method is also a combination of homogeneous weak estimators. It learns these weak estimators sequentially in a highly adaptive manner (each basic estimator attempts to reduce the bias of the combined estimator) and combines them according to a deterministic strategy. Popular boosting methods include AdaBoost and Gradient Tree Boosting. Freund and Schapire proposed the former in 1999 [50]; its core idea is to train a series of weak estimators by repeatedly modifying the weights of the data [51]. Gradient Tree Boosting [52], on the other hand, is a generalization of boosting to arbitrary differentiable loss functions. It can be used for classification and regression and has been applied in various fields, including web search ranking and ecology [53,54].
  • Stacking Method. Different from the previous methods, the stacking method uses heterogeneous estimators, learns them in parallel, and combines them by training a "meta-model" that outputs a final result based on the different predictions [55].
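The three families can be illustrated with scikit-learn as below. This is a generic sketch of the ensemble ideas, not part of the PFC framework; the synthetic dataset and parameter values are placeholders for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

# Bagging: independent, parallel trees whose votes are averaged.
bagging = RandomForestClassifier(n_estimators=100, random_state=0)

# Boosting: trees learned sequentially, each reducing the ensemble's bias.
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0)

# Stacking: heterogeneous base estimators combined by a meta-model.
stacking = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                ("rf", RandomForestClassifier(n_estimators=50, random_state=0))],
    final_estimator=LogisticRegression())

for clf in (bagging, boosting, stacking):
    clf.fit(X, y)
    print(type(clf).__name__, clf.score(X, y))
```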

3. Perceptual Features-Based Framework

This section introduces the perceptual features-based framework (PFC) in detail, divided into three parts: time series preprocessing with GRM-PIPs, feature extraction, and the classifier. These parts are applied in a fixed sequence in our framework.

3.1. Time Series Preprocessing with GRM-PIPs

The purpose of this part is to traverse the time series and extract a certain number of PIPs. Based on the traditional PIPs algorithm, we adopt a globally optimal selection strategy and propose a restrictive selection method. The relevant definition is as follows.
Definition 2.
Globally Restricted Matching Perceptually Important Points.
Given a time series sample T = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, n > 2, n ∈ ℤ⁺, an empty list L_p is set to save the extracted perceptually important points, and the interval between adjacent PIPs is defined as δ, with δ ∈ ℤ⁺, δ ≥ 4. Commonly, when the number of extracted PIPs m is large enough (m = n), all points in T will be identified as PIPs; but when the parameter δ is considered, the upper limit on the number of PIPs is further restricted. The calculation steps of GRM-PIPs are as follows.
Step 1: Put the first point P_1(x_1, y_1) and the last point P_n(x_n, y_n) of T into L_p as the initial two PIPs.
Step 2: Set a temporary PIP P_t, which can be any point in T, and calculate the vertical distance VD_t between P_t and the line P_1 P_n. P_t divides the sequence {P_1, …, P_n} into two subsequences, {P_1, …, P_t} and {P_t, …, P_n}. If the length of either subsequence is less than δ, the current P_t is not considered, and a new point is set as P_t to continue the calculation, until a P_t is found that maximizes VD_t while keeping the length of all subsequences greater than δ; this P_t is saved in L_p as the third PIP.
Step 3: The fourth PIP is the point that maximizes the vertical distance to its adjacent PIPs (which are either the first and the third, or the third and the second PIP) while keeping the length of all segmented subsequences greater than δ. The fourth PIP is also saved into L_p.
Step 4: For each new PIP, use the same recursive method as for the fourth PIP; repeat Step 3 until the length of L_p equals m.
GRM-PIPs ensure a well-distribution of PIPs in the entire time series by adding a restriction on the interval length. A simple example in Figure 5 is shown to distinguish between the traditional PIPs and the GRM-PIPs proposed by us.
T = {0, 1, 2, 10, 9, 10, 9, 9, 6, 4, 3, 1, 5, 3, 10, 10, 8, 9, 10, 11, 9, 6, 3, 0}
In this example, we set a time series sample T in (7) with the length n = 23. Figure 5 shows that the morphological structure of T is composed of two peaks and one trough. Seven PIPs were extracted from it. There is an apparent difference between the results of GRM-PIPs and PIPs, highlighted by red and green circles, respectively. Traditional PIPs tends to fall into local optima because there is no interval constraint, so the selected PIPs do not contribute to depicting the overall structure. GRM-PIPs avoids this problem and extracts PIPs that are more conducive to generalizing the structural features.
In GRM-PIPs, because of the interval length δ, the number of extracted PIPs has an upper limit. To calculate this limit, we first define the "quotient". Given two integers a and b, b ≠ 0, there is a pair of integers q and r satisfying a = q · b + r with 0 ≤ r < |b|; q is called the quotient of a divided by b, abbreviated as q = Q(a, b). In this way, the number of extracted PIPs satisfies:
2 ≤ m ≤ Q(n, δ) + 2
Obviously, the upper limit is closely related to δ. In our research, we set δ = 4 because this value is determined by the subsequent feature extraction; we explain the reason in detail in Section 3.2.
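A sketch of GRM-PIPs in the same style is shown below. It differs from a plain PIPs implementation only in the admissibility test: a candidate is skipped whenever one of the two subsequences it creates would contain fewer than δ points. The exact boundary convention (whether "length greater than δ" is strict) is our interpretation, so the selected indices may differ slightly from the paper's figures.

```python
def extract_grm_pips(series, m, delta=4):
    """Globally restricted matching PIPs (a sketch of Definition 2)."""
    n = len(series)
    assert 2 <= m <= n // delta + 2        # upper limit, cf. Formula (3)
    pips = [0, n - 1]
    while len(pips) < m:
        pips.sort()
        best_idx, best_vd = None, -1.0
        for a, b in zip(pips, pips[1:]):
            for c in range(a + 1, b):
                # interval restriction: both new subsequences must contain
                # at least delta points (our reading of the length rule)
                if c - a + 1 < delta or b - c + 1 < delta:
                    continue
                ya, yb = series[a], series[b]
                yd = (yb - ya) / (b - a) * (c - a) + ya
                vd = abs(series[c] - yd)       # vertical distance to the line
                if vd > best_vd:
                    best_vd, best_idx = vd, c
        if best_idx is None:                   # no admissible candidate remains
            break
        pips.append(best_idx)
    return sorted(pips)

T = [0, 1, 2, 10, 9, 10, 9, 9, 6, 4, 3, 1,
     5, 3, 10, 10, 8, 9, 10, 11, 9, 6, 3, 0]   # example series from the text
print(extract_grm_pips(T, 7))
```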

3.2. Feature Extraction

In this paper, we extract two kinds of features from time series: point-level features F_P and interval-level features F_I.
The point-level feature is straightforward: the coordinates of the PIPs. We found that for different classes of time series, the distributions of PIPs in two-dimensional space are significantly different. Most importantly, these special distributions are consistent across the training set and the test set. Therefore, the point-level feature is distinctive and consistent and should be taken seriously. Some representative UCR datasets shown in Figure 6 confirm this view.
On the other hand, PIPs can generate excellent time series segmentations. Many datasets show no significant differences in the distribution of PIPs; in this case, interval-level features need to be supplemented to help the classifier further distinguish samples of different categories. We use five interval-level features:
  • Arithmetic mean. The arithmetic mean (or simply mean) x ¯ of a sequence is the sum of all of the amplitudes divided by the length of the sequence n. This is a rough feature used to describe the average level of all data in the sequence. The calculation of the arithmetic mean follows Formula (4).
    $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
  • Standard deviation. In statistics, the standard deviation σ is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range. The standard deviation plays an important role in distinguishing frequently fluctuating series from stable changing series. The calculation of this feature is shown below.
    $$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
  • Slope. In mathematics, the slope or gradient of a line is a number that describes both the direction and the steepness of the line. Slope is calculated by finding the ratio of the "vertical change" to the "horizontal change" between (any) two distinct points on a line. We can also abstract any subsequence as a straight line connecting two adjacent PIPs, and the trend can be judged by calculating the slope of the interval. For a sequence S = {(x_1, y_1), …, (x_n, y_n)}, its slope can be calculated according to the following formula.
    $$m = \frac{\Delta y}{\Delta x} = \frac{y_n - y_1}{x_n - x_1}$$
  • Kurtosis. In probability theory and statistics, kurtosis is a measure of the "tailedness" of the probability distribution of a real-valued random variable. The standard measure of a distribution's kurtosis is a scaled version of the fourth moment of the distribution. Objectively speaking, kurtosis is not exactly the same as peakedness: higher kurtosis means that the data contain large deviations or extreme abnormal points that deviate from the mean. However, in most cases, when the amplitude over a period of the time series is high, the corresponding kurtosis is high. In the calculation of kurtosis G_2 we use the standard unbiased estimator. It is worth noting that n represents the number of samples and that the formula involves the factor n − 3. As part of the denominator, it is required that n − 3 > 0, which means that n must be a positive integer greater than 3. This is why we require the parameter δ to be equal to 4.
    $$G_2 = \frac{k_4}{k_2^2} = \frac{n^2\left[(n+1)m_4 - 3(n-1)m_2^2\right]}{(n-1)(n-2)(n-3)} \cdot \frac{(n-1)^2}{n^2 m_2^2} = \frac{(n+1)\,n}{(n-1)(n-2)(n-3)} \cdot \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^4}{k_2^2} - 3 \cdot \frac{(n-1)^2}{(n-2)(n-3)}$$
  • Skewness. In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. Skewness can be visually understood as the degree to which the shape leans to the left or right. For example, in the two sequences shown in Figure 7, S_2 is almost a mirror flip of S_1, which is indistinguishable by the mean, standard deviation, slope, and kurtosis; using skewness makes up for this deficiency. The calculation formula for skewness G_1 is similar to that of kurtosis; it is a scaled version of the third central moment.
    $$G_1 = \frac{k_3}{k_2^{3/2}} = \frac{n^2}{(n-1)(n-2)} \cdot b_1 = \frac{n^2}{(n-1)(n-2)} \cdot \frac{m_3}{\sigma^3} = \frac{n^2}{(n-1)(n-2)} \cdot \frac{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^3}{\left[\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2\right]^{3/2}}$$
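The five interval-level features can be computed for each subsequence between adjacent PIPs as sketched below. Skewness and kurtosis follow the standard unbiased estimators given above; the helper name `interval_features` is ours, not from the paper.

```python
import math

def interval_features(seg):
    """Five interval-level statistics for one subsequence between adjacent PIPs."""
    n = len(seg)                            # delta >= 4 ensures n - 3 > 0
    mean = sum(seg) / n
    m2 = sum((x - mean) ** 2 for x in seg) / n
    m3 = sum((x - mean) ** 3 for x in seg) / n
    sigma = math.sqrt(m2)                   # population standard deviation
    slope = (seg[-1] - seg[0]) / (n - 1)    # x-coordinates are the indices
    s2 = n * m2 / (n - 1)                   # unbiased sample variance k_2
    skew = n * n / ((n - 1) * (n - 2)) * m3 / s2 ** 1.5
    kurt = ((n + 1) * n / ((n - 1) * (n - 2) * (n - 3))
            * sum((x - mean) ** 4 for x in seg) / s2 ** 2
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return mean, sigma, slope, skew, kurt

# e.g. a monotone segment has zero skewness and negative excess kurtosis
print(interval_features([1, 2, 3, 4, 5]))
```

These bias-corrected estimators agree with scipy's `skew`/`kurtosis` when called with `bias=False`.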

3.3. Classifier and the PFC Framework

In a TSC dataset, the data format is D = (data, label) = ({T_1, …, T_d}, {L_1, …, L_d}), with d time series and their corresponding labels. We extract m PIPs through GRM-PIPs and obtain m − 1 intervals, thereby converting the original dataset into the corresponding feature set F_D = (F_P, F_I). Subsequently, the training portion of F_D is input into the classifier and the test portion is used for verification.
We realize that F_D is a high-level representation of the raw data: essentially a combination of many features and an explicit expression of the morphological information. Therefore, we prefer a classifier that is well suited to feature processing. In the PFC framework, we selected three levels of classifiers: the decision tree as the basic estimator, the random forest based on the bagging idea, and the gradient boosting decision tree based on boosting theory.
There are many decision tree implementations, such as ID3, C4.5, and CART. Under normal circumstances, CART performs better than the other methods, so we chose CART. The reason for choosing RF and GBDT is that they are classifiers developed on top of decision trees. RF performs joint learning by constructing a large number of decision trees and integrating all their classification results; it gives all basic estimators equal weight, while GBDT gradually upgrades weak classifiers into robust classifiers by iteratively changing the weights.
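The classifier stage can then be sketched with scikit-learn as below, using CART-style trees as the DT implementation and 600 estimators for RF and GBDT as in Section 4.1. The random feature matrix stands in for the extracted feature set F_D; it is a placeholder, not real UCR data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier          # CART-style trees
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 24))    # 60 samples; e.g. 6 PIPs -> 24 features
y_train = rng.integers(0, 2, size=60)  # binary labels
X_test = rng.normal(size=(20, 24))

classifiers = {
    "DT": DecisionTreeClassifier(),                        # basic estimator
    "RF": RandomForestClassifier(n_estimators=600),        # bagging
    "GBDT": GradientBoostingClassifier(n_estimators=600),  # boosting
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)          # train on the feature set F_D
    pred = clf.predict(X_test)         # verify on the held-out portion
```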
A schematic diagram of the PFC framework is shown in Figure 8. The innovation of our work is to propose GRM-PIPs, extract the combination of point-level and interval-level features, and use a suitable classifier to form a framework for TSC tasks. What we want to explore is the effect of the entire framework; therefore, we did not make any special optimizations to the classifiers, and all classifiers use traditional implementations. Further improvement of the classifiers is left for future work.

4. Performance Evaluation and Discussion

4.1. Experimental Design

The UCR archive has been widely used as a benchmark to evaluate TSC algorithms [8] (see http://www.timeseriesclassification.com, accessed on 1 May 2021). It currently contains 128 datasets; 15 of these have unequal lengths, 15 have missing values, and one (Fungi) has a single instance per class in the train files. Given this situation, in order to evaluate PFC, we selected part of the UCR archive. Since two-category datasets form a naturally separate group, we divide the verification into two types: two-category and hybrid.
In the two-category verification, we selected all the two-category datasets in the UCR Archive and excluded the two with many missing values, leaving 40 datasets for comparison experiments. Considering that PFC is a fast and straightforward classification method, comparing it with methods that use neural networks and consume substantial computing resources and time would be unfair. Therefore, we excluded such algorithms, for example ResNet and HIVE-COTE, from the benchmark models. The following five classification algorithms were selected for comparison: word extraction for time series classification (WEASEL), bag of SFA symbols (BOSS), time series forest (TSF), random interval spectral ensemble (RISE), and canonical time-series characteristics (Catch22). The results of these comparison algorithms have been officially recognized and released.
In the hybrid verification, we introduced some recently published methods as comparisons, including extreme-SAX (E-SAX, 2020) [56], interval feature transformation (IFT, 2020) [57], and discriminative virtual sequences learning (DVSL, 2020) [58]. PFC is tested on the same datasets as these methods, including both two-category and multi-category datasets.
In addition, through the analysis of the experimental results, we seek answers to the following questions:
  • What is the appropriate number of PIPs? Is more always better?
  • Does the number of PIPs have the same effect on different classifiers?
All experiments strictly follow UCR's division into training and test sets. Classification accuracy is uniformly adopted as the metric; some methods report classification error, which we convert to accuracy. The number of correctly classified time series is denoted n_c, and the total number of time series in the test set is denoted n_t. The formulas for classification accuracy (ACC) and error (ERR) are shown below.
ACC = n_c / n_t,  ERR = 1 − ACC
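For concreteness, the metric above can be sketched in a few lines of Python (the helper names are ours; any accuracy routine would do):

```python
def accuracy(y_true, y_pred):
    """ACC = n_c / n_t: fraction of correctly classified test series."""
    n_t = len(y_true)
    n_c = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return n_c / n_t

def error(y_true, y_pred):
    """ERR = 1 - ACC, used when converting published error rates."""
    return 1.0 - accuracy(y_true, y_pred)
```

For example, `accuracy([0, 1, 1, 0], [0, 1, 0, 0])` yields 0.75 (three of four series classified correctly).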
Due to the randomness of RF and GBDT, each reported result is the average of 50 runs under the same parameters. We perform no particular parameter optimization for DT, RF, or GBDT: DT uses information gain to measure the quality of a split, and nodes are expanded until all leaves are pure; RF contains 600 trees; and GBDT likewise performs 600 boosting stages.
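The paper does not name a specific library; assuming a scikit-learn implementation, the settings described above would correspond roughly to the following configuration (a sketch, not the authors’ exact code):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# DT: information-gain (entropy) splits, grown until all leaves are pure
dt = DecisionTreeClassifier(criterion="entropy", max_depth=None)

# RF: 600 trees, otherwise default parameters (no tuning)
rf = RandomForestClassifier(n_estimators=600)

# GBDT: 600 boosting stages, otherwise default parameters
gbdt = GradientBoostingClassifier(n_estimators=600)
```

Leaving all other parameters at their defaults mirrors the paper’s decision not to tune the classifiers.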

4.2. The Two-Category Verification

Table 2 lists the 40 two-category datasets from the UCR archive. They cover various situations, such as short-sequence classification (Chinatown and ItalyPowerDemand), long-sequence classification (HandOutlines, HouseTwenty, and SemgHandGenderCh2), and unbalanced training and test sets (ECGFiveDays and FreezerSmallTrain).
Table 3 shows the classification accuracy of the five benchmark methods and PFC on these datasets. Not all datasets have public results for the five benchmark methods; the results for two datasets (FordB and HandOutlines) are missing. These two were excluded when counting best-accuracy wins, so the experimental results of the remaining 38 datasets were considered.
The PFC framework achieved the best accuracy on 13 of the 38 UCR datasets. Interestingly, with DT or GBDT as the classifier, PFC obtains the best result 6 times, fewer than the 10 wins obtained with RF. Nevertheless, their performance is still better than that of RISE, TSF, and Catch22.
This seems counter-intuitive: GBDT, the most complex of the three classifiers, does not achieve the best results. However, this can be explained. We noticed a significant difference in the number of PIPs extracted by the GRM-PIPs algorithm when the best results are obtained (see Appendix A for details). When DT and GBDT achieve their best results, the number of PIPs is almost the same, while RF requires more PIPs to achieve higher accuracy. This suggests that RF has the highest performance ceiling among the three classifiers, which may be a consequence of performing no parameter optimization: GBDT and DT usually rely on parameter tuning to improve accuracy, whereas RF is not sensitive to parameters, and its large number of random decisions can effectively compensate for untuned settings.
We conduct an in-depth analysis of the experimental results shown in Table 3, which are divided into two aspects:
  • The impact of the length of the time series on accuracy. We sort all the datasets by length: those shorter than 100 form group G1, which contains 11 datasets; G2 holds the 11 datasets with lengths greater than 100 but less than 300; G3 covers the 15 datasets with lengths from 300 to 1000; and the remaining three datasets longer than 1000 form G4. From G1 to G4, PFC achieves the best accuracy 3, 6, 4, and 0 times, respectively. The results show that PFC is good at distinguishing time series whose length ranges from 100 to 1000. For samples shorter than 100, GRM-PIPs can extract at most 27 PIPs and generate 26 intervals, so the feature dimension becomes much larger than the original sequence dimension, and the resulting information redundancy prevents the classifier from obtaining robust decision rules. On the other hand, since we extract at most 30 PIPs in the experiments, the features of samples longer than 1000 may be extracted incompletely.
  • Does the imbalance between the training set and the test set affect the accuracy of PFC? As far as the current results are concerned, it does not.
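The length grouping used in the first point above can be written down directly; the handling of the exact boundary values (100, 300, 1000) is our assumption, since the text leaves it open:

```python
def length_group(series_length):
    """Assign a dataset to G1..G4 by its series length, as in the analysis above."""
    if series_length < 100:
        return "G1"          # 11 datasets, PFC wins 3 times
    elif series_length < 300:
        return "G2"          # 11 datasets, PFC wins 6 times
    elif series_length <= 1000:
        return "G3"          # 15 datasets, PFC wins 4 times
    else:
        return "G4"          # 3 datasets, PFC wins 0 times
```

Applied to Table 2, for instance, ECG200 (length 96) falls in G1 and HandOutlines (length 2709) in G4.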

4.3. The Hybrid Verification

In the hybrid verification, we compare PFC with the TSC methods of three recently published papers. Since these methods were validated on different datasets, we compare against each one individually, using the same datasets it reported.
First, we test the performance of PFC and DVSL. Abhilash et al. [58] believed that the existing VSML methods employ fixed virtual sequences, which might not be optimal for the subsequent classification tasks. Therefore, they proposed DVSL to learn a set of discriminative virtual sequences that help separate time series samples in a feature space. Finally, this method was validated on 15 UCR datasets. The results of the comparative experiment are shown in Table 4.
The results show that PFC performed better on the same 15 UCR datasets, surpassing DVSL in 12 of them. At the same time, we also notice that the accuracy of PFC is much lower than that of DVSL on datasets such as Beef. Figure 9 shows the distribution of PIPs in Beef: only the distribution of Label = 1 (the red dots) is distinguishable, while the distributions of the other categories are highly similar. We believe that PFC can distinguish samples with obvious distinguishing characteristics, but when these characteristics are highly similar across multiple classes, PFC fails. Although such cases are rare, PFC is based on morphological perception information, so it struggles with samples whose morphological differences are small.
The second comparison method is IFT [57], which also uses PIPs. The difference is that IFT adopts information gain-based selection for interval features, which makes the whole method a special decision tree. Since both PFC and IFT perceive the importance of morphological features, this is a meaningful comparative experiment. IFT was validated on 22 UCR datasets, and we also tested on the same datasets. The comparison results are shown in Table 5.
On these datasets, the performance of PFC almost completely surpasses that of IFT. One exception, however, is the huge gap between PFC and IFT on the ShapeletSim dataset. The samples in ShapeletSim resemble high-frequency sinusoidal signals, which causes most of the PIPs to be located at peaks and troughs. In this case, the distribution of PIPs describes only the boundary of the sample, a rectangle in Figure 10. The crux of the problem is not just the abnormality of these distributions; they also lack the necessary distinguishability. On this dataset, the performance of IFT is almost perfect, perhaps because its feature selection differs from ours and those features play an important role in classification.
Finally, we turn to E-SAX. One of the most popular dimensionality reduction techniques for time series data is the symbolic aggregate approximation (SAX), which is inspired by algorithms from text mining and bioinformatics. E-SAX uses only the extreme points of each segment to represent the time series [56]. The essence of SAX is to reduce the dimensionality of time series, as PIPs do, so we chose E-SAX as a comparison method.
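The extreme-point idea behind E-SAX can be illustrated with a minimal sketch: split the series into equal-width segments and keep only each segment's minimum and maximum. The actual E-SAX additionally symbolizes these values, which we omit here:

```python
def extreme_points(series, n_segments):
    """Reduce a series to the (min, max) pair of each of n_segments segments."""
    n = len(series)
    reduced = []
    for i in range(n_segments):
        # equal-width segments via integer boundaries
        seg = series[i * n // n_segments:(i + 1) * n // n_segments]
        reduced.extend([min(seg), max(seg)])
    return reduced
```

Reducing an 8-point series to 2 segments, for example, keeps only 4 values, which is the same spirit of dimensionality reduction that PIPs pursue.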
There are 45 UCR datasets used in this comparison, and all the results are listed in Table 6. Note that E-SAX originally reported classification error ERR; to facilitate comparison, we convert ERR into classification accuracy ACC according to Formula (9).
As shown in Table 6, PFC achieves the best ACC on 34 of the 45 datasets, which comprise 17 two-category and 28 multi-category datasets; PFC wins on 13 of the former and 21 of the latter. Although PFC is still at a disadvantage on some datasets, its results there are very close to those of E-SAX, and this is achieved without optimizing any parameters or the model structure. We believe that PFC still has room for improvement.
This comparison uses almost the same number of datasets as the previous two-category verification; in effect, part of the two-category datasets are removed and a large number of multi-category datasets are introduced. Under this change, the number of times that PFC with RF as the classifier achieves the best accuracy increases greatly, far exceeding DT and GBDT. RF can rely on a large number of decision trees to handle multi-classification tasks, and this advantage is clearly demonstrated.

4.4. Discussion on the Number of PIPs

This discussion is meaningful because most current papers ignore the problem. Whatever operations are performed later, we usually begin by extracting m PIPs from the original time series. Two questions then need to be answered:
  • What is the appropriate value of m? Is more always better?
  • Does the number of PIPs have the same effect on different classifiers?
The second question is relatively easy to answer. The data listed in Appendix A gives the answer: the same m has different effects on different classifiers. RF and GBDT usually require a large number of PIPs to achieve high accuracy, but DT is less demanding. As ensemble methods, RF and GBDT naturally benefit from more features, yet on some simple datasets DT can outperform them with only a few PIPs.
The first question is harder. As shown in Figure 11, taking the length of the dataset as the horizontal axis, we plot the number of PIPs used when the best accuracy is achieved on the corresponding dataset. The three distributions are similar, but for RF and GBDT the appropriate number of PIPs is greater than for DT.
On the other hand, for a given dataset, a larger m is not necessarily better. Across a large number of experimental records, we found no specific rule. For time series with quite different morphological structures, a small number of PIPs is enough to highlight their differences, and more PIPs may cause information redundancy and confusion. When the morphological structure of the time series is complex, the situation is reversed, and more PIPs are needed to describe the characteristics of the sample.
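For reference, the extraction that m controls can be sketched as follows. This is the textbook vertical-distance PIP algorithm, not the paper's GRM-PIPs, which additionally constrains the interval length:

```python
def extract_pips(series, m):
    """Iteratively pick the m most perceptually important points of a series.

    Start with the two endpoints; repeatedly add the point with the largest
    vertical distance to the chord joining its two neighbouring PIPs.
    """
    n = len(series)
    pips = [0, n - 1]  # indices of the current PIPs
    while len(pips) < min(m, n):
        best_idx, best_dist = None, -1.0
        for left, right in zip(pips, pips[1:]):
            y1, y2 = series[left], series[right]
            for c in range(left + 1, right):
                # height of the chord at position c
                yc = y1 + (y2 - y1) * (c - left) / (right - left)
                d = abs(series[c] - yc)
                if d > best_dist:
                    best_idx, best_dist = c, d
        if best_idx is None:   # every interval is already exhausted
            break
        pips.append(best_idx)
        pips.sort()
    return pips
```

For a series with a single spike, such as `[0, 0, 5, 0, 0]`, three PIPs already capture the shape: the two endpoints plus the spike itself.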

5. Conclusions

The introduction of morphological structure features is an important improvement to time series classification. Following the way humans visually perceive data, many studies have pointed out that the shape of a time series can be described by a sequence of important turning points. Inspired by these studies, we proposed GRM-PIPs, which controls the length of the intervals. We then used the PIPs to segment the time series and extracted a combination of interval-level and point-level features. The introduction of three classifiers, DT, RF, and GBDT, completes the perceptual features-based framework. Finally, we compared PFC against five benchmark methods and three recently published methods on a large number of UCR datasets. The experimental results show that our work performs excellently on the TSC task. In addition, we derived a threshold for the interval length and discussed the influence of the number of PIPs, addressing gaps in previous work.
In future work, we plan to add more types of classifiers and to optimize them, and we will consider further improvements to feature extraction.

Author Contributions

S.W.: Conceptualization, methodology, programming, validation of the results, analyses, writing, review and editing, supervision, investigation, and data curation. X.W.: Resources, supervision, and project administration. M.L.: Supervision and review. D.W.: Investigation and review. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Planning Project of Shenzhen Municipality, Grant number JCYJ20190806112210067.

Institutional Review Board Statement

Not applicable; this study did not involve humans or animals.

Informed Consent Statement

Not applicable; this study did not involve humans.

Data Availability Statement

The UCR dataset comes from https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ (accessed on 1 May 2021). The complete data package can be downloaded from https://www.cs.ucr.edu/~eamonn/time_series_data_2018/UCRArchive_2018.zip (accessed on 1 May 2021). The briefing documents of the UCR dataset can be downloaded here (https://www.cs.ucr.edu/~eamonn/time_series_data_2018/BriefingDocument2018.pdf and https://www.cs.ucr.edu/~eamonn/time_series_data_2018/BriefingDocument2018.pptx) (accessed on 1 May 2021). More information about the UCR dataset such as baseline and comparison can be found in http://www.timeseriesclassification.com (accessed on 1 May 2021).

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Summary of the Number of PIPs When the Best Results Are Obtained

Table A1. The best accuracy and the number of PIPs used at the time.
ACC PIPs
No.PFC-DTPFC-RFPFC-GBDTPFC-DTPFC-RFPFC-GBDT
10.90000.95000.90009721
20.95000.90000.9500575
30.97950.97950.9767574
41.00001.00001.00003153
50.70000.74000.7640252212
60.75000.78990.7753465
70.79130.80580.79865429
80.81000.86000.8500676
90.99880.95010.99546186
100.71360.85300.873472929
110.64320.70250.727192226
120.97820.96210.9775676
130.92810.90810.9421667
140.95330.99330.9533577
150.95890.99360.9873111011
160.98101.00000.9810565
171.00001.00001.0000333
180.66670.77140.7486312424
190.88650.92160.935171311
200.68750.65630.6875261216
210.87400.92430.8740192119
220.94170.94850.9105565
230.75410.81970.8197633
240.74230.83160.8178121111
250.77640.85940.7572222316
260.71450.79950.80077811
270.93330.95560.95003293
280.84190.87970.8965161417
290.82500.88670.88673105
300.56670.59440.5667212921
310.84690.78040.84696126
320.80010.78270.790114914
330.91080.98240.9622477
340.76750.89150.8333454
350.76920.83080.74625199
360.95430.97190.9543353
371.00001.00001.0000192719
380.70370.79630.722292610
390.71430.74030.7800151113
400.74970.82000.8097777

References

  1. Wei, W.W. Time series analysis. In The Oxford Handbook of Quantitative Methods in Psychology: Volume 2; Oxford University Press: Oxford, UK, 2006.
  2. Bagnall, A.; Lines, J.; Bostrom, A.; Large, J.; Keogh, E. The great time series classification bake off: A review and experimental evaluation of recent algorithmic advances. Data Min. Knowl. Discov. 2016, 31, 606–660.
  3. Fawaz, H.I.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963.
  4. Geurts, P. Pattern Extraction for Time Series Classification. In Principles of Data Mining and Knowledge Discovery; Springer: Berlin/Heidelberg, Germany, 2001; pp. 115–127.
  5. Elhoseiny, M.; Huang, S.; Elgammal, A. Weather classification with deep convolutional neural networks. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015.
  6. Pham, T.D.; Wardell, K.; Eklund, A.; Salerud, G. Classification of short time series in early Parkinson’s disease with deep learning of fuzzy recurrence plots. IEEE/CAA J. Autom. Sin. 2019, 6, 1306–1317.
  7. Joshi, D.; Khajuria, A.; Joshi, P. An automatic non-invasive method for Parkinson’s disease classification. Comput. Methods Progr. Biomed. 2017, 145, 135–145.
  8. Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR time series archive. IEEE/CAA J. Autom. Sin. 2019, 6, 1293–1305.
  9. Keogh, E.J.; Pazzani, M.J. Scaling up dynamic time warping for datamining applications. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’00), Boston, MA, USA, 20–23 August 2000; ACM Press: New York, NY, USA, 2000.
  10. Keogh, E.; Chakrabarti, K.; Pazzani, M.; Mehrotra, S. Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases. Knowl. Inf. Syst. 2001, 3, 263–286.
  11. Zhang, H.; Dong, Y.; Xu, D. Entropy-based Symbolic Aggregate Approximation Representation Method for Time Series. In Proceedings of the 2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 11–13 December 2020.
  12. Sun, Y.; Li, J.; Liu, J.; Sun, B.; Chow, C. An improvement of symbolic aggregate approximation distance measure for time series. Neurocomputing 2014, 138, 189–198.
  13. Schäfer, P.; Högqvist, M. SFA: A symbolic Fourier approximation and index for similarity search in high dimensional datasets. In Proceedings of the 15th International Conference on Extending Database Technology (EDBT’12), Berlin, Germany, 27–30 March 2012; ACM Press: New York, NY, USA, 2012.
  14. Schäfer, P. The BOSS is concerned with time series classification in the presence of noise. Data Min. Knowl. Discov. 2014, 29, 1505–1530.
  15. Middlehurst, M.; Vickers, W.; Bagnall, A. Scalable Dictionary Classifiers for Time Series Classification. In Intelligent Data Engineering and Automated Learning—IDEAL 2019; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 11–19.
  16. Large, J.; Bagnall, A.; Malinowski, S.; Tavenard, R. On time series classification with dictionary-based classifiers. Intell. Data Anal. 2019, 23, 1073–1089.
  17. Schäfer, P.; Leser, U. Fast and Accurate Time Series Classification with WEASEL. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; ACM: New York, NY, USA, 2017.
  18. Lin, J.; Khade, R.; Li, Y. Rotation-invariant similarity in time series using bag-of-patterns representation. J. Intell. Inf. Syst. 2012, 39, 287–315.
  19. Lines, J.; Bagnall, A. Time series classification with ensembles of elastic distance measures. Data Min. Knowl. Discov. 2014, 29, 565–592.
  20. Lucas, B.; Shifaz, A.; Pelletier, C.; O’Neill, L.; Zaidi, N.; Goethals, B.; Petitjean, F.; Webb, G.I. Proximity Forest: An effective and scalable distance-based classifier for time series. Data Min. Knowl. Discov. 2019, 33, 607–635.
  21. Xi, X.; Keogh, E.; Shelton, C.; Wei, L.; Ratanamahatana, C.A. Fast time series classification using numerosity reduction. In Proceedings of the 23rd International Conference on Machine Learning (ICML’06), Pittsburgh, PA, USA, 25–29 June 2006; ACM Press: New York, NY, USA, 2006.
  22. Górecki, T.; Łuczak, M. Non-isometric transforms in time series classification using DTW. Knowl. Based Syst. 2014, 61, 98–108.
  23. Datta, S.; Karmakar, C.K.; Palaniswami, M. Averaging Methods using Dynamic Time Warping for Time Series Classification. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, ACT, Australia, 1–4 December 2020.
  24. Yu, D.; Yu, X.; Hu, Q.; Liu, J.; Wu, A. Dynamic time warping constraint learning for large margin nearest neighbor classification. Inf. Sci. 2011, 181, 2787–2796.
  25. Forechi, A.; Souza, A.F.D.; Badue, C.; Oliveira-Santos, T. Sequential appearance-based Global Localization using an ensemble of kNN-DTW classifiers. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016.
  26. Ryabko, D.; Mary, J. Reducing statistical time-series problems to binary classification. Adv. Neural Inf. Process. Syst. 2012, 3, 2069–2077.
  27. Deng, H.; Runger, G.; Tuv, E.; Vladimir, M. A time series forest for classification and feature extraction. Inf. Sci. 2013, 239, 142–153.
  28. Cabello, N.; Naghizade, E.; Qi, J.; Kulik, L. Fast and Accurate Time Series Classification Through Supervised Interval Search. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020.
  29. Lines, J.; Taylor, S.; Bagnall, A. HIVE-COTE: The Hierarchical Vote Collective of Transformation-Based Ensembles for Time Series Classification. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016.
  30. Ye, L.; Keogh, E. Time series shapelets. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09), Paris, France, 28 June–1 July 2009; ACM Press: New York, NY, USA, 2009.
  31. Hills, J.; Lines, J.; Baranauskas, E.; Mapp, J.; Bagnall, A. Classification of time series by shapelet transformation. Data Min. Knowl. Discov. 2013, 28, 851–881.
  32. Ji, C.; Liu, S.; Yang, C.; Pan, L.; Wu, L.; Meng, X. A Shapelet Selection Algorithm for Time Series Classification: New Directions. Procedia Comput. Sci. 2018, 129, 461–467.
  33. Ji, C.; Zhao, C.; Liu, S.; Yang, C.; Pan, L.; Wu, L.; Meng, X. A fast shapelet selection algorithm for time series classification. Comput. Netw. 2019, 148, 231–240.
  34. Dempster, A.; Petitjean, F.; Webb, G.I. ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels. Data Min. Knowl. Discov. 2020, 34, 1454–1495.
  35. Yu, J.; Yin, J.; Zhou, D.; Zhang, J. A Pattern Distance-Based Evolutionary Approach to Time Series Segmentation. In Intelligent Control and Automation; Springer: Berlin/Heidelberg, Germany, 2006; pp. 797–802.
  36. Chung, F.; Fu, T.; Luk, W.; Ng, V. Flexible time series pattern matching based on perceptually important points. In Workshop on Learning from Temporal and Spatial Data in International Joint Conference on Artificial Intelligence; The Hong Kong Polytechnic University: Hong Kong, China, 2001; pp. 1–7.
  37. Phetchanchai, C.; Selamat, A.; Rehman, A.; Saba, T. Index Financial Time Series Based on Zigzag-Perceptually Important Points. J. Comput. Sci. 2010, 6, 1389–1395.
  38. Chi, X.; Jiang, Z. Feature recognition of the futures time series based on perceptually important points. In Proceedings of the 2012 2nd International Conference on Computer Science and Network Technology, Changchun, China, 29–31 December 2012.
  39. Lintonen, T.; Raty, T. Self-learning of multivariate time series using perceptually important points. IEEE/CAA J. Autom. Sin. 2019, 6, 1318–1331.
  40. Fu, T.C.; Chung, F.L.; Ng, C.M. Financial Time Series Segmentation based on Specialized Binary Tree Representation. In Proceedings of the 2006 International Conference on Data Mining (DMIN 2006), Las Vegas, NV, USA, 26–29 June 2006; pp. 3–9.
  41. Azimifar, M.; Araabi, B.N.; Moradi, H. Forecasting stock market trends using support vector regression and perceptually important points. In Proceedings of the 2020 10th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, 29–30 October 2020; pp. 268–273.
  42. Fenton, N.; Neil, M. Decision Analysis, Decision Trees, Value of Information Analysis, and Sensitivity Analysis. In Risk Assessment and Decision Analysis with Bayesian Networks; Chapman and Hall/CRC: Boca Raton, FL, USA, 2018; pp. 347–369.
  43. Kamiński, B.; Jakubczyk, M.; Szufel, P. A framework for sensitivity analysis of decision trees. Cent. Eur. J. Oper. Res. 2017, 26, 135–159.
  44. Quinlan, J. Simplifying decision trees. Int. J. Man-Mach. Stud. 1987, 27, 221–234.
  45. Kretowski, M. Decision Trees in Data Mining. In Studies in Big Data; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 21–48.
  46. Qiu, W.; Liu, X.; Wang, L. Forecasting Shanghai composite index based on fuzzy time series and improved C-fuzzy decision trees. Expert Syst. Appl. 2012, 39, 7680–7689.
  47. Zalewski, W.; Silva, F.; Maletzke, A.; Ferrero, C. Exploring shapelet transformation for time series classification in decision trees. Knowl. Based Syst. 2016, 112, 80–91.
  48. He, Y.; Chu, X.; Wang, Y. Neighbor Profile: Bagging Nearest Neighbors for Unsupervised Time Series Mining. In Proceedings of the 2020 IEEE 36th International Conference on Data Engineering (ICDE), Dallas, TX, USA, 20–24 April 2020; pp. 373–384.
  49. Biau, G.; Scornet, E. Rejoinder on: A random forest guided tour. Test 2016, 25, 264–268.
  50. Freund, Y.; Schapire, R.; Abe, N. A short introduction to boosting. J. Jpn. Soc. Artif. Intell. 1999, 14, 1612.
  51. Wang, J.; Tang, S. Time series classification based on ARIMA and AdaBoost. MATEC Web Conf. 2020, 309, 03024.
  52. Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378.
  53. Elish, M. Enhanced prediction of vulnerable Web components using Stochastic Gradient Boosting Trees. Int. J. Web Inf. Syst. 2019, 15, 201–214.
  54. Johnson, N.E.; Bonczak, B.; Kontokosta, C.E. Using a gradient boosting model to improve the performance of low-cost aerosol monitors in a dense, heterogeneous urban environment. Atmos. Environ. 2018, 184, 9–16.
  55. Džeroski, S.; Ženko, B. Is Combining Classifiers with Stacking Better than Selecting the Best One? Mach. Learn. 2004, 54, 255–273.
  56. Fuad, M.M.M. Extreme-SAX: Extreme Points Based Symbolic Representation for Time Series Classification. In Big Data Analytics and Knowledge Discovery; Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 122–130.
  57. Yan, L.; Liu, Y.; Liu, Y. Interval Feature Transformation for Time Series Classification Using Perceptually Important Points. Appl. Sci. 2020, 10, 5428.
  58. Dorle, A.; Li, F.; Song, W.; Li, S. Learning Discriminative Virtual Sequences for Time Series Classification. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Online, 19–23 October 2020; ACM: New York, NY, USA, 2020; pp. 2001–2004.
Figure 1. Three representative datasets from the UCR Time Series Classification Archive. Due to the large size of the original dataset, only some samples are shown as examples.
Figure 2. The schematic diagram of the vertical distance VD of point P_c.
Figure 3. The process of finding 6 PIPs from T.
Figure 4. A decision tree constructed based on the example.
Figure 5. GRM-PIPs and the traditional PIPs algorithms were used to extract PIPs from time series sample T.
Figure 6. The distributions of PIPs in two UCR datasets, which are Coffee (a) and ECGFiveDays (b). The figures above show that the PIPs extracted from the original sample are discriminative, while the figures below show that the distribution of PIPs is consistent on the training set and the test set.
Figure 7. An instance that cannot be distinguished by features such as mean and standard deviation. The sequence S 1 on the left is flipped to get the sequence S 2 on the right.
Figure 8. The schematic diagram of the PFC framework.
Figure 9. The original time series and the distribution of PIPs in Beef.
Figure 10. The distribution of PIPs in ShapeletSim. The distribution on the training set is on the left, and the right is the distribution on the test set.
Figure 11. The distribution of the number of PIPs for the different classifiers.
Table 1. The sample data of DT.
Season | Time | Wind | Exercise
Spring | Before 8:00 a.m. | Breeze | Yes
Winter | Before 8:00 a.m. | No wind | Yes
Autumn | After 8:00 a.m. | Breeze | Yes
Winter | Before 8:00 a.m. | No wind | Yes
Summer | Before 8:00 a.m. | Breeze | Yes
Winter | After 8:00 a.m. | Breeze | Yes
Winter | Before 8:00 a.m. | Gale | Yes
Winter | Before 8:00 a.m. | No wind | Yes
Spring | After 8:00 a.m. | No wind | No
Summer | After 8:00 a.m. | Gale | No
Summer | Before 8:00 a.m. | Gale | No
Autumn | After 8:00 a.m. | Breeze | No
Table 2. Summary of 40 two-category datasets in UCR Archive.
No. | Name | Type | Train | Test | Length
1 | BeetleFly | Image | 20 | 20 | 512
2 | BirdChicken | Image | 20 | 20 | 512
3 | Chinatown | Traffic | 20 | 343 | 24
4 | Coffee | Spectro | 28 | 28 | 286
5 | Computers | Device | 250 | 250 | 720
6 | DistalPhalanxOutlineCorrect | Image | 600 | 276 | 80
7 | Earthquakes | Sensor | 322 | 139 | 512
8 | ECG200 | ECG | 100 | 100 | 96
9 | ECGFiveDays | ECG | 23 | 861 | 136
10 | FordA | Sensor | 3601 | 1320 | 500
11 | FordB | Sensor | 3636 | 810 | 500
12 | FreezerRegularTrain | Sensor | 150 | 2850 | 301
13 | FreezerSmallTrain | Sensor | 28 | 2850 | 301
14 | GunPoint | Motion | 50 | 150 | 150
15 | GunPointAgeSpan | Motion | 135 | 316 | 150
16 | GunPointMaleVersusFemale | Motion | 135 | 316 | 150
17 | GunPointOldVersusYoung | Motion | 136 | 315 | 150
18 | Ham | Spectro | 109 | 105 | 431
19 | HandOutlines | Image | 1000 | 370 | 2709
20 | Herring | Image | 64 | 64 | 512
21 | HouseTwenty | Device | 40 | 119 | 2000
22 | ItalyPowerDemand | Sensor | 67 | 1029 | 24
23 | Lightning2 | Sensor | 60 | 61 | 637
24 | MiddlePhalanxOutlineCorrect | Image | 600 | 291 | 80
25 | MoteStrain | Sensor | 20 | 1252 | 84
26 | PhalangesOutlinesCorrect | Image | 1800 | 858 | 80
27 | PowerCons | Power | 180 | 180 | 144
28 | ProximalPhalanxOutlineCorrect | Image | 600 | 291 | 80
29 | SemgHandGenderCh2 | Spectrum | 300 | 600 | 1500
30 | ShapeletSim | Simulated | 20 | 180 | 500
31 | SonyAIBORobotSurface1 | Sensor | 20 | 601 | 70
32 | SonyAIBORobotSurface2 | Sensor | 27 | 953 | 65
33 | Strawberry | Spectro | 613 | 370 | 235
34 | ToeSegmentation1 | Motion | 40 | 228 | 277
35 | ToeSegmentation2 | Motion | 36 | 130 | 343
36 | TwoLeadECG | ECG | 23 | 1139 | 82
37 | Wafer | Sensor | 1000 | 6164 | 152
38 | Wine | Spectro | 57 | 54 | 234
39 | WormsTwoClass | Motion | 181 | 77 | 900
40 | Yoga | Image | 300 | 3000 | 426
Table 3. Classification accuracy of PFC and five benchmarks on 40 two-category UCR datasets.
| No. | WEASEL | BOSS | RISE | TSF | Catch22 | PFC-DT | PFC-RF | PFC-GBDT |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.8867 | 0.9433 | 0.8717 | 0.8333 | 0.8400 | 0.9000 | 0.9500 | 0.9000 |
| 2 | 0.8650 | 0.9833 | 0.8683 | 0.8150 | 0.8933 | 0.9500 | 0.9000 | 0.9500 |
| 3 | 0.9573 | 0.8771 | 0.8885 | 0.9530 | 0.9345 | 0.9795 | 0.9795 | 0.9767 |
| 4 | 0.9893 | 0.9857 | 0.9845 | 0.9869 | 0.9798 | 1.0000 | 1.0000 | 1.0000 |
| 5 | 0.7785 | 0.8005 | 0.7789 | 0.6488 | 0.7803 | 0.7000 | 0.7400 | 0.7640 |
| 6 | 0.8192 | 0.8117 | 0.8112 | 0.8058 | 0.8121 | 0.7500 | 0.7899 | 0.7753 |
| 7 | 0.7475 | 0.7460 | 0.7482 | 0.7475 | 0.7388 | 0.7913 | 0.8058 | 0.7986 |
| 8 | 0.8590 | 0.8783 | 0.8510 | 0.8600 | 0.7887 | 0.8100 | 0.8600 | 0.8500 |
| 9 | 0.9935 | 0.9923 | 0.9729 | 0.9520 | 0.8159 | 0.9988 | 0.9501 | 0.9954 |
| 10 | 0.9687 | 0.9214 | 0.9400 | 0.8158 | 0.9092 | 0.7136 | 0.8530 | 0.8734 |
| 11 | lack | lack | lack | lack | lack | 0.6432 | 0.7025 | 0.7271 |
| 12 | 0.9906 | 0.9881 | 0.9523 | 0.9971 | 0.9982 | 0.9782 | 0.9621 | 0.9775 |
| 13 | 0.9006 | 0.9616 | 0.8787 | 0.9614 | 0.9598 | 0.9281 | 0.9081 | 0.9421 |
| 14 | 0.9931 | 0.9964 | 0.9809 | 0.9553 | 0.9431 | 0.9533 | 0.9933 | 0.9533 |
| 15 | 0.9813 | 0.9949 | 0.9863 | 0.9777 | 0.9439 | 0.9589 | 0.9936 | 0.9873 |
| 16 | 0.9939 | 0.9996 | 0.9911 | 0.9960 | 0.9935 | 0.9810 | 1.0000 | 0.9810 |
| 17 | 0.9860 | 0.9992 | 0.9998 | 1.0000 | 0.9642 | 1.0000 | 1.0000 | 1.0000 |
| 18 | 0.8213 | 0.8375 | 0.8197 | 0.7994 | 0.6940 | 0.6667 | 0.7714 | 0.7486 |
| 19 | lack | lack | lack | lack | lack | 0.8865 | 0.9216 | 0.9351 |
| 20 | 0.6021 | 0.5958 | 0.5984 | 0.6042 | 0.5557 | 0.6875 | 0.6563 | 0.6875 |
| 21 | 0.8106 | 0.9560 | 0.9297 | 0.8378 | 0.9462 | 0.8740 | 0.9243 | 0.8740 |
| 22 | 0.9468 | 0.8709 | 0.9445 | 0.9595 | 0.8775 | 0.9417 | 0.9485 | 0.9105 |
| 23 | 0.6273 | 0.8191 | 0.6820 | 0.7645 | 0.7448 | 0.7541 | 0.8197 | 0.8197 |
| 24 | 0.8283 | 0.8095 | 0.8055 | 0.7995 | 0.7727 | 0.7423 | 0.8316 | 0.8178 |
| 25 | 0.9048 | 0.8442 | 0.8780 | 0.8555 | 0.8485 | 0.7764 | 0.8594 | 0.7572 |
| 26 | 0.8217 | 0.8174 | 0.8125 | 0.8057 | 0.7919 | 0.7145 | 0.7995 | 0.8007 |
| 27 | 0.9194 | 0.8900 | 0.9580 | 0.9931 | 0.8863 | 0.9333 | 0.9556 | 0.9500 |
| 28 | 0.8763 | 0.8655 | 0.8737 | 0.8489 | 0.8337 | 0.8419 | 0.8797 | 0.8965 |
| 29 | 0.7814 | 0.8877 | 0.8700 | 0.9474 | 0.8706 | 0.8250 | 0.8867 | 0.8867 |
| 30 | 0.9974 | 1.0000 | 0.7676 | 0.5137 | 0.9937 | 0.5667 | 0.5944 | 0.5667 |
| 31 | 0.9093 | 0.8977 | 0.8670 | 0.8637 | 0.8834 | 0.8469 | 0.7804 | 0.8469 |
| 32 | 0.9353 | 0.8794 | 0.9125 | 0.8743 | 0.9023 | 0.8001 | 0.7827 | 0.7901 |
| 33 | 0.9786 | 0.9705 | 0.9730 | 0.9675 | 0.9229 | 0.9108 | 0.9824 | 0.9622 |
| 34 | 0.9430 | 0.9249 | 0.8804 | 0.6671 | 0.8127 | 0.7675 | 0.8915 | 0.8333 |
| 35 | 0.9285 | 0.9615 | 0.9118 | 0.8026 | 0.8351 | 0.7692 | 0.8308 | 0.7462 |
| 36 | 0.9975 | 0.9847 | 0.9107 | 0.8706 | 0.8539 | 0.9543 | 0.9719 | 0.9543 |
| 37 | 0.9999 | 0.9989 | 0.9954 | 0.9966 | 0.9973 | 1.0000 | 1.0000 | 1.0000 |
| 38 | 0.9302 | 0.8926 | 0.8710 | 0.8623 | 0.7000 | 0.7037 | 0.7963 | 0.7222 |
| 39 | 0.8004 | 0.8078 | 0.7853 | 0.6935 | 0.7922 | 0.7143 | 0.7403 | 0.7800 |
| 40 | 0.8924 | 0.9102 | 0.8372 | 0.8658 | 0.8038 | 0.7497 | 0.8200 | 0.8097 |
| Best ACC | 9 | 12 | 0 | 4 | 1 | 6 | 10 | 6 |

"lack" indicates that a result is not available for that dataset. Taken together, the three PFC variants achieve the best accuracy on 13 of the 40 datasets.
Table 4. Comparison of PFC and DVSL on 15 UCR datasets.
| Datasets | Train | Test | Class | Length | DVSL | PFC-DT | PFC-RF | PFC-GBDT |
|---|---|---|---|---|---|---|---|---|
| ArrowHead | 36 | 175 | 3 | 251 | 0.7200 | 0.6114 | 0.7257 | 0.7257 |
| Beef | 30 | 30 | 5 | 470 | 0.9000 | 0.7000 | 0.7333 | 0.6667 |
| Car | 60 | 60 | 4 | 577 | 0.8350 | 0.7167 | 0.7167 | 0.7500 |
| ChlConcent | 467 | 3840 | 3 | 166 | 0.7743 | 0.5951 | 0.6497 | 0.6466 |
| Coffee | 28 | 28 | 2 | 286 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| ECG200 | 100 | 100 | 2 | 96 | 0.8350 | 0.8100 | 0.8600 | 0.8500 |
| ECGFiveDays | 23 | 861 | 2 | 136 | 0.9735 | 0.9988 | 0.9501 | 0.9954 |
| Herring | 64 | 64 | 2 | 512 | 0.6563 | 0.6875 | 0.6563 | 0.6875 |
| InsectWingb | 220 | 1980 | 11 | 256 | 0.5819 | 0.4217 | 0.5927 | 0.4788 |
| Meat | 60 | 60 | 3 | 448 | 0.9883 | 1.0000 | 0.9667 | 0.9667 |
| MPhaOLAge | 400 | 154 | 3 | 80 | 0.5818 | 0.5779 | 0.6558 | 0.5974 |
| OliveOil | 30 | 30 | 4 | 570 | 0.8467 | 0.9667 | 0.9333 | 0.9000 |
| SonyAIBR1 | 20 | 601 | 2 | 70 | 0.7616 | 0.8469 | 0.7804 | 0.8469 |
| TwoLeadECG | 23 | 1139 | 2 | 82 | 0.9160 | 0.9543 | 0.9719 | 0.9543 |
| Wine | 57 | 54 | 2 | 234 | 0.6500 | 0.7037 | 0.7963 | 0.7222 |
| Best ACC | | | | | 4 | 6 | 7 | 4 |

Taken together, the three PFC variants achieve the best accuracy on 12 of the 15 datasets.
Table 5. Comparison of PFC and IFT on 20 UCR datasets (excluding two datasets with missing values).
| Datasets | Train | Test | Length | Class | IFT | PFC-DT | PFC-RF | PFC-GBDT |
|---|---|---|---|---|---|---|---|---|
| BirdChicken | 20 | 20 | 512 | 2 | 0.9000 | 0.9500 | 0.9000 | 0.9500 |
| FreezerRegularTrain | 150 | 2850 | 301 | 2 | 0.9035 | 0.9782 | 0.9621 | 0.9775 |
| ShapeletSim | 20 | 180 | 500 | 2 | 0.9944 | 0.5667 | 0.5944 | 0.5667 |
| ToeSegmentation1 | 40 | 228 | 277 | 2 | 0.8816 | 0.7675 | 0.8915 | 0.8333 |
| Worms | 181 | 77 | 900 | 5 | 0.6623 | 0.5974 | 0.6623 | 0.6883 |
| Rock | 20 | 50 | 2844 | 4 | 0.6200 | 0.7000 | 0.7600 | 0.7000 |
| Meat | 60 | 60 | 448 | 3 | 0.9500 | 1.0000 | 0.9667 | 0.9667 |
| Beef | 30 | 30 | 470 | 5 | 0.6667 | 0.7000 | 0.7333 | 0.6667 |
| InlineSkate | 100 | 550 | 1882 | 7 | 0.3582 | 0.2800 | 0.3818 | 0.3327 |
| Coffee | 28 | 28 | 286 | 2 | 0.9643 | 1.0000 | 1.0000 | 1.0000 |
| ECGFiveDays | 23 | 861 | 136 | 2 | 0.8281 | 0.9988 | 0.9501 | 0.9954 |
| Ham | 109 | 105 | 431 | 2 | 0.6381 | 0.6667 | 0.7714 | 0.7486 |
| Herring | 64 | 64 | 512 | 2 | 0.6719 | 0.6875 | 0.6563 | 0.6875 |
| PowerCons | 180 | 180 | 144 | 2 | 0.9333 | 0.9333 | 0.9556 | 0.9500 |
| Wine | 57 | 54 | 234 | 2 | 0.7407 | 0.7037 | 0.7963 | 0.7222 |
| Yoga | 300 | 3000 | 426 | 2 | 0.7767 | 0.7497 | 0.8200 | 0.8097 |
| FaceFour | 24 | 88 | 350 | 4 | 0.6477 | 0.7045 | 0.6136 | 0.6591 |
| OliveOil | 30 | 30 | 570 | 4 | 0.7667 | 0.9667 | 0.9333 | 0.9000 |
| Fish | 175 | 175 | 463 | 7 | 0.8114 | 0.7600 | 0.8971 | 0.8500 |
| Plane | 105 | 105 | 144 | 7 | 1.0000 | 0.9429 | 1.0000 | 0.9905 |
| Best ACC | | | | | 2 | 8 | 9 | 4 |

Taken together, the three PFC variants achieve the best accuracy on 19 of the 20 datasets.
Table 6. Comparison of PFC and E-SAX on 45 UCR datasets. All results are uniformly converted to accuracy.
| Datasets | Train | Test | Class | Length | E-SAX | PFC-DT | PFC-RF | PFC-GBDT |
|---|---|---|---|---|---|---|---|---|
| SyntheticControl | 300 | 300 | 6 | 60 | 0.9970 | 0.8400 | 0.9767 | 0.9500 |
| GunPoint | 50 | 150 | 2 | 150 | 0.8600 | 0.9533 | 0.9933 | 0.9533 |
| CBF | 30 | 900 | 3 | 128 | 0.9190 | 0.9522 | 0.9111 | 0.9056 |
| FaceAll | 560 | 1690 | 14 | 131 | 0.7250 | 0.6817 | 0.8302 | 0.8012 |
| OSULeaf | 200 | 242 | 6 | 427 | 0.5160 | 0.5248 | 0.6612 | 0.6157 |
| SwedishLeaf | 500 | 625 | 15 | 128 | 0.7520 | 0.7296 | 0.8752 | 0.8112 |
| Trace | 100 | 100 | 4 | 275 | 0.6800 | 1.0000 | 1.0000 | 1.0000 |
| FaceFour | 24 | 88 | 4 | 350 | 0.7840 | 0.7045 | 0.6136 | 0.6591 |
| Lightning2 | 60 | 61 | 2 | 637 | 0.8360 | 0.7541 | 0.8197 | 0.8197 |
| Lightning7 | 70 | 73 | 7 | 319 | 0.6020 | 0.6986 | 0.7945 | 0.7397 |
| ECG200 | 100 | 100 | 2 | 96 | 0.8800 | 0.8100 | 0.8600 | 0.8500 |
| Adiac | 390 | 391 | 37 | 176 | 0.1460 | 0.4731 | 0.6419 | 0.5242 |
| Yoga | 300 | 3000 | 2 | 426 | 0.8210 | 0.7497 | 0.8200 | 0.8097 |
| Fish | 175 | 175 | 7 | 463 | 0.7540 | 0.7600 | 0.8971 | 0.8500 |
| Plane | 105 | 105 | 7 | 144 | 0.9710 | 0.9429 | 1.0000 | 0.9905 |
| Car | 60 | 60 | 4 | 577 | 0.7330 | 0.7167 | 0.7167 | 0.7500 |
| Beef | 30 | 30 | 5 | 470 | 0.6330 | 0.7000 | 0.7333 | 0.6667 |
| Coffee | 28 | 28 | 2 | 286 | 0.7140 | 1.0000 | 1.0000 | 1.0000 |
| OliveOil | 30 | 30 | 4 | 570 | 0.1670 | 0.9667 | 0.9333 | 0.9000 |
| CinCECGTorso | 40 | 1380 | 4 | 1639 | 0.9270 | 0.8800 | 0.9415 | 0.9223 |
| ChlorineConcentration | 467 | 3840 | 3 | 166 | 0.4920 | 0.5951 | 0.6497 | 0.6466 |
| DiatomSizeReduction | 16 | 306 | 4 | 345 | 0.9120 | 0.9052 | 0.9020 | 0.8889 |
| ECGFiveDays | 23 | 861 | 2 | 136 | 0.7650 | 0.9988 | 0.9501 | 0.9954 |
| FacesUCR | 200 | 2050 | 14 | 131 | 0.7940 | 0.5937 | 0.7405 | 0.6615 |
| Haptics | 155 | 308 | 5 | 1092 | 0.3380 | 0.3701 | 0.5097 | 0.4805 |
| InlineSkate | 100 | 550 | 7 | 1882 | 0.3300 | 0.2800 | 0.3818 | 0.3327 |
| ItalyPowerDemand | 67 | 1029 | 2 | 24 | 0.8880 | 0.9417 | 0.9485 | 0.9105 |
| MedicalImages | 381 | 760 | 10 | 99 | 0.6420 | 0.6618 | 0.7421 | 0.6868 |
| MoteStrain | 20 | 1252 | 2 | 84 | 0.8070 | 0.7764 | 0.8594 | 0.7572 |
| SonyAIBORobotSurface1 | 20 | 601 | 2 | 70 | 0.6940 | 0.8469 | 0.7804 | 0.8469 |
| SonyAIBORobotSurface2 | 27 | 953 | 2 | 65 | 0.8540 | 0.8001 | 0.7827 | 0.7901 |
| Symbols | 25 | 995 | 6 | 398 | 0.8970 | 0.8040 | 0.9317 | 0.8523 |
| TwoLeadECG | 23 | 1139 | 2 | 82 | 0.7220 | 0.9543 | 0.9719 | 0.9543 |
| InsectWingbeatSound | 220 | 1980 | 11 | 256 | 0.5470 | 0.4217 | 0.5927 | 0.4788 |
| ArrowHead | 36 | 175 | 3 | 251 | 0.7770 | 0.6114 | 0.7257 | 0.7257 |
| BeetleFly | 20 | 20 | 2 | 512 | 0.7500 | 0.9000 | 0.9500 | 0.9000 |
| BirdChicken | 20 | 20 | 2 | 512 | 0.6500 | 0.9500 | 0.9000 | 0.9500 |
| Herring | 64 | 64 | 2 | 512 | 0.5940 | 0.6875 | 0.6563 | 0.6875 |
| ProximalPhalanxTW | 400 | 205 | 6 | 80 | 0.6380 | 0.7366 | 0.8195 | 0.7951 |
| ToeSegmentation1 | 40 | 228 | 2 | 277 | 0.6450 | 0.7675 | 0.8915 | 0.8333 |
| ToeSegmentation2 | 36 | 130 | 2 | 343 | 0.8080 | 0.7692 | 0.8308 | 0.7462 |
| DistalPhalanxOutlineAgeGroup | 400 | 139 | 3 | 80 | 0.7500 | 0.7338 | 0.7626 | 0.7482 |
| DistalPhalanxOutlineCorrect | 600 | 276 | 2 | 80 | 0.6020 | 0.7500 | 0.7899 | 0.7753 |
| DistalPhalanxTW | 400 | 139 | 6 | 80 | 0.7280 | 0.6978 | 0.6906 | 0.6763 |
| WordSynonyms | 267 | 638 | 25 | 270 | 0.6290 | 0.4232 | 0.5799 | 0.4828 |
| Best ACC | | | | | 11 | 8 | 27 | 6 |

Taken together, the three PFC variants achieve the best accuracy on 34 of the 45 datasets.
Wu, S.; Wang, X.; Liang, M.; Wu, D. PFC: A Novel Perceptual Features-Based Framework for Time Series Classification. Entropy 2021, 23, 1059. https://doi.org/10.3390/e23081059
