Time Series Classification with Shapelet and Canonical Features

Abstract: Shapelet-based time series classification methods are widely adopted models for time series classification tasks. However, their high computational cost greatly limits their practicability. Moreover, a traditional Shapelet can only describe the overall shape characteristics of subsequences under the Euclidean distance metric, so it is vulnerable to noise. Beyond shape, subsequences contain other types of discriminative information. To deal with these problems, an accurate and efficient time series classification algorithm, named Shapelet with Canonical Time Series Features, is proposed in this paper. The proposed algorithm is based on the following three key strategies: (1) randomly selecting Shapelets and limiting the scope of each Shapelet to improve efficiency; (2) embedding multiple canonical time series features in Shapelets to improve the adaptability of the algorithm to different classification problems and make up for the accuracy loss caused by the random selection of Shapelets; and (3) building a random forest classifier based on the new feature representations to ensure the generalization ability of the algorithm. Experimental results on 112 UCR time series datasets show that the proposed algorithm is more accurate than the STC algorithm, which is based on exact Shapelet search and the Shapelet transform technique, as well as many other types of state-of-the-art time series classification algorithms. Moreover, extensive experimental comparisons verify the significant advantages of the proposed algorithm in terms of efficiency.


Introduction
Time series classification is an important research area in data mining and has received increasingly extensive attention in recent years [1,2]. Many practical applications are supported by time series classification technology, such as road condition prediction [3], disease diagnosis [4], and remote sensing data analysis [5], which has greatly promoted the rapid development of time series classification research. However, the increasing scale of data [6,7] and the constant introduction of complex classification tasks [7] mean that achieving accurate and efficient time series classification remains extremely challenging.
One of the key problems in time series classification research is how to define and find patterns that distinguish time series from different categories, which affects and even determines the performance of time series classification algorithms. Ye et al. [8] first proposed the concept of Shapelet, defining it as a discriminating subsequence in a time series. A Shapelet distinguishes different categories by the local shape of the time series, has strong predictive ability and interpretability, and has received widespread attention in the field of time series classification. However, the brute force Shapelet search algorithm requires iterating through all subsequences in the dataset, with a time complexity of up to $O(n^2 \cdot l^4)$ (n is the number of time series in the dataset, l is the average length of the time series) [8]. This expensive computational cost strongly limits the usefulness of Shapelet-based time series classification algorithms [9,10]. To address the issue, researchers have improved the search efficiency of Shapelet by reducing the Shapelet search space [9][10][11][12] or trading space for time [13] (see Section 2.1 for details). However, these methods are usually designed to bring only a certain efficiency gain without significantly reducing accuracy.
In addition to the efficiency problem, we realize that there is another flaw in the traditional Shapelet algorithm: it can only describe the overall shape characteristics of the subsequence under the Euclidean distance metric. It is therefore extremely susceptible to noise, and it is difficult to mine other types of features embedded in the subsequence. Consider the example in Figure 1, taken from the "ShapeletSim" dataset [7]. Figure 1a is a candidate Shapelet S chosen from the "Triangle" class, and Figure 1b,c present the subsequences most similar to candidate Shapelet S in each of the two categories, found using the sliding window technique. In fact, the characteristic that distinguishes the two categories of the "ShapeletSim" dataset is the artificially embedded triangular-shaped segment in the "Triangle" class (the area within the box in Figure 1a); all other data points in the two categories are random noise. As can be seen from Figure 1, the candidate Shapelet S intuitively meets our expectations for a Shapelet, as it has the triangular-shaped feature that the "Noise" class does not have. However, due to the influence of noise, it is impossible to correctly classify these two categories by sorting the Euclidean distances between the eight subsequences and the candidate Shapelet S (as shown in Figure 1d).
Appl. Sci. 2022, 12, x FOR PEER REVIEW
To deal with the above problems, this paper proposes a new time series classification algorithm: Random Shapelet Forest Embedded with Canonical Time Series Features (RSFCF). RSFCF is a randomized, tree-based ensemble classification algorithm designed to achieve accurate and efficient time series classification. By randomly selecting Shapelets, RSFCF reduces the time complexity by several orders of magnitude relative to the brute force Shapelet search. In addition, inspired by interval-based time series classification methods [3,14], we believe that for most real-world datasets, the local offset of the time series on the timeline is generally within a limited range. As a result, RSFCF limits the scope of each Shapelet, which further improves the efficiency of Shapelet matching while retaining the Shapelet's location information to a large extent.
Lubba et al. proposed 22 typical time series features ("Catch22" for short; see Section 3.3 for details), including statistical features, spectral features and other types of features [15]. In order to improve the applicability of the algorithm and compensate for the accuracy loss caused by the random selection of Shapelets, this paper combines the Shapelet transformation technique [16] with multiple typical time series features and proposes a random Shapelet transformation method that embeds typical time series features. The final random forest classification model is constructed based on the new feature representation of the data. In the example shown in Figure 1, the "triangle" causes the power spectrum of the subsequence to have a stronger response in the lower frequency band. Based on the value of each subsequence on the feature SP_Summaries_welch_rect_area_5_1 (the sum of the energies of the five lowest frequencies in the Fourier power spectrum, which is one of the features in Catch22), the two categories can be correctly distinguished (as shown in Figure 1e). In this case, although the candidate Shapelet S is not precise (i.e., it contains a lot of random noise in addition to the "triangle"), we are still able to tap into the discriminative information it contains and rely on it for accurate classification.
In order to fully verify the performance of the RSFCF algorithm proposed in this paper, we have conducted extensive experimental comparison and analysis against a number of current advanced time series classification algorithms on a large number of UCR time series datasets [7]. Experimental results on 112 datasets show that: (1) embedding typical time series features can effectively improve the accuracy of random Shapelet forests; (2) RSFCF surpasses the STC algorithm, which is based on exact Shapelet search and the Shapelet transformation technique [17], in terms of accuracy (a recent work by Bagnall et al. [18] showed that STC is the most accurate Shapelet-based time series classification algorithm), and is an order of magnitude faster than STC in training; (3) besides the STC algorithm, RSFCF surpasses many other types of advanced time series classification algorithms in terms of accuracy, including residual neural networks (ResNet) [19], Proximity Forest [5], and Canonical Interval Forest (CIF) [3].
The main contributions of this paper are summarized as follows: (1) Considering the characteristics of real datasets, limiting the scope of Shapelet matching is verified to effectively improve the matching efficiency of Shapelets without a significant loss of accuracy; (2) A novel method of embedding typical time series features in Shapelets is proposed, and experimental results show that this method can effectively compensate for the loss of accuracy caused by the random selection of Shapelets; (3) Based on the above methods, an accurate and efficient time series classification algorithm is proposed: Random Shapelet Forest Embedded with Canonical Time Series Features (referred to as RSFCF).
The rest of this article is organized as follows. Section 2 introduces related work and recent progress in time series classification, Section 3 introduces the relevant definitions and background knowledge, Section 4 describes the proposed RSFCF algorithm in detail, Section 5 verifies the performance of the RSFCF algorithm through extensive experimental comparison and analysis, and Section 6 gives a final summary and possible future work.

Related Work
This section reviews relevant research work on Shapelet and briefly introduces other types of time series classification methods in light of the latest research advances.

Classification Method Based on Shapelet
Shapelet-based time series classification algorithms distinguish between different categories based on whether a discriminating subsequence (i.e., a Shapelet [8]) appears in the time series, regardless of where it appears. The original method of Ye et al. [8] used information gain as an evaluation criterion to search for the optimal Shapelet by enumerating all subsequences. Since the high computational cost of this method severely hampers Shapelet's practicality, much of the research on Shapelet has focused on how to accelerate Shapelet discovery. For example, Mueen et al. [13] employed an intelligent caching technique that traded space for time, reducing the time complexity of an exact Shapelet search by an order of magnitude. The Fast Shapelet method [9] uses the SAX technique [20] to discretize subsequences and searches for an "approximate" optimal Shapelet in a low-dimensional space using random projection, reducing the time complexity of the Shapelet search to $O(n \cdot l^2)$. Karlsson et al. [10] evaluated only k randomly selected subsequences when constructing each node of a decision tree (k is much smaller than the number of all subsequences). In [11], Shapelet pre-screening was performed based on variance located at key points in the time series at each time point. Ref. [12] applied local Fisher discriminant analysis (LFDA) to find key dimensions in time series to reduce the Shapelet search space. Another method of improving efficiency is called LearningShapelet [1], which learns Shapelets through optimization rather than directly using subsequences in the dataset as candidates.
There are two main classification strategies for the above methods. The first is to fuse the Shapelet search process with the construction of the decision tree [8][9][10]13], and the second is to transform the dataset using the multiple Shapelets found (i.e., using the Shapelet transformation technique [16]; see Section 3.2 for details) and then classify the transformed data using traditional classifiers such as SVM [12]. Although a series of Shapelet-based classification methods have been proposed one after another, a recent assessment by Bagnall et al. [18] suggests that, in terms of accuracy, the STC classification algorithm [17], based on exact Shapelet search and the Shapelet transformation technique (the transformed data are used to train a rotation forest classifier [18]), represents the state of the art for this type of method. In the experimental analysis in Section 5, we will show that the accuracy and efficiency of the proposed RSFCF algorithm on the UCR time series datasets exceed those of the STC algorithm.

Other Types of Classification Methods
Interval-based classification methods assume that local features depend on where they appear and are generally more efficient. Typical methods include Time Series Forest (TSF) [14], Random Interval Spectral Ensemble (RISE) [21], and Canonical Interval Forest (CIF) [3]. TSF randomly selects a set of intervals in the time series, computes three time-domain features on each interval (mean, variance, and slope), and trains an ensemble of decision trees on the new feature representation [14]. Unlike TSF, RISE performs four transformations on each set of randomly selected intervals, namely the autocorrelation function, the partial autocorrelation function, the autoregressive model, and the power spectrum [21]. CIF adds the 22 features in Catch22 [15] to TSF, significantly exceeding TSF and RISE in accuracy [3].
Dictionary-based classification methods convert time series into a bag of patterns, distinguishing between different categories by the relative frequency with which patterns appear. Representative algorithms include Bag of Patterns (BoP) [22], SAX-VSM [23], BOSS [24], and WEASEL [25]. Among them, BoP and SAX-VSM use the symbolic aggregate approximation (SAX) [20] technique to convert subsequences into words, building feature vectors based on word frequency [23]. BOSS is the most commonly used dictionary-based classification method [21]; it constructs words using the symbolic Fourier approximation (SFA) [26] technique and builds an ensemble classifier [24] based on nearest neighbor classifiers with a specially tailored distance metric. WEASEL uses a "supervised" symbolic Fourier approximation technique, screens words with chi-square tests, and ultimately trains a logistic regression classifier [25]. In terms of accuracy, WEASEL represents the advanced level of dictionary-based classification methods [3,18]. However, researchers generally point out that the space complexity of WEASEL is extremely high, mainly due to its large feature space [5,27].
Distance-based classification methods usually use elastic distance measures (i.e., distance measures that can cope to some extent with phenomena such as local shifts or distortions of time series [28][29][30]) to quantify the distance between time series and classify test instances according to their distance from training instances. In order to improve accuracy, Lines et al. [30] proposed the Elastic Ensemble algorithm, consisting of 11 nearest neighbor classifiers based on different elastic distance measures. The training and classification complexity of this method are high, at $O(n^2 \cdot l^2)$ and $O(n \cdot l^2)$, respectively. Lucas et al. [5] proposed the Proximity Forest algorithm to improve the accuracy and efficiency of the Elastic Ensemble. A Proximity Forest is an ensemble of multiple proximity trees, where the data at each node are split according to their distance from a randomly selected time series in each class, and the distance measure and its required parameters are also randomly selected.
The time series classification methods introduced earlier, including STC [17], BOSS [24], Elastic Ensemble [31], and Proximity Forest [5], are all ensemble methods, but they are all homogeneous ensembles, that is, ensembles based on a single representation of time series. The meta-ensemble method [21,32] is an ensemble built from several homogeneous ensemble methods, that is, an ensemble of ensembles. HIVE-COTE [21] is the most accurate and representative meta-ensemble method available. It is an improved version of Flat-COTE [31] that combines Elastic Ensemble [31], STC [17], BOSS [24], Time Series Forest [14], and Random Interval Spectral Ensemble [21]. HIVE-COTE achieves the highest accuracy on the UCR time series datasets [7], but is extremely complex [5,21], making it difficult to apply to large-scale datasets.
The application of deep learning methods to time series classification has gained attention in recent years [6,19]. Wang et al. [19] and Fawaz et al. [6] demonstrated the powerful performance of fully convolutional neural networks (FCN) and residual neural networks (ResNet) in time series classification tasks.

Definition and Background
This section firstly provides a definition of the basic concepts, and then introduces the key techniques and theoretical foundations of this work: Shapelet transformation techniques and typical time series characteristics.

Definitions
Definition 1. Time series. A time series T of length l is an ordered sequence of l observations of a variable, which can be expressed as $T = \langle t_1, t_2, \ldots, t_l \rangle$, where $t_i \in \mathbb{R}$.
Compared with the entire time series, here we pay more attention to the local fragments of the time series, that is, time series subsequences.
Definition 2. Time series subsequence. A subsequence $T_{i,m} = \langle t_i, t_{i+1}, \ldots, t_{i+m-1} \rangle$ of time series T refers to a fragment consisting of m consecutive values from index i to index i + m − 1 in T.
In this article, any subsequence can be seen as a Shapelet. According to a Shapelet's discriminating ability, there will be expressions such as "optimal Shapelet" or "optimal k Shapelets".
Definition 3. Distance metric. The distances between subsequences can be used to reflect their similarity. For two subsequences $S_1$ and $S_2$ with the same length m, here we use the normalized Euclidean distance metric shown in Equation (1).
When calculating the distance between a subsequence S with length m and a time series T with length l (l > m), the subsequence S needs to slide over the time series T to find the best matching subsequence S', and the best matching distance (i.e., the distance between the subsequence S and S') is then taken as the distance between the subsequence S and the time series T (as shown in Equation (2)). Figure 2 describes the above process.
It should be noted that most Shapelet-based classification methods use the Z-normalized Euclidean distance in order to ensure invariance to the amplitude offset of the subsequence. That is, the Z-score is calculated according to Formula (3) before using Formula (1) to obtain the normalized Euclidean distance ($s_i$ is the ith value of subsequence S, $\mu$ is the mean of S, and $\sigma$ is the standard deviation of S).
$$s'_i = \frac{s_i - \mu}{\sigma}, \quad i = 1, 2, \ldots, m \qquad (3)$$
For computational efficiency reasons, the RSFCF algorithm uses the more efficient normalized Euclidean distance metric (without Z-normalization) when searching for the best match S', and only uses the Z-normalized Euclidean distance metric when calculating the best match distance.
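The sliding-window distance described in this section can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes that the "normalized Euclidean distance" of Equation (1) is the length-normalized form (the square root of the mean squared difference), and the function names `z_normalize`, `norm_euclidean`, and `sub_dist` are hypothetical.

```python
import numpy as np

def z_normalize(s, eps=1e-8):
    """Z-score a subsequence (Formula (3)); constant subsequences map to zeros."""
    s = np.asarray(s, dtype=float)
    sigma = s.std()
    if sigma < eps:
        return np.zeros_like(s)
    return (s - s.mean()) / sigma

def norm_euclidean(s1, s2):
    """Length-normalized Euclidean distance (assumed form of Equation (1))."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    return float(np.sqrt(np.mean((s1 - s2) ** 2)))

def sub_dist(s, t, z_norm=True):
    """Slide s over t; return (best distance, start index of the best match S')."""
    s = np.asarray(s, dtype=float)
    t = np.asarray(t, dtype=float)
    m = len(s)
    if z_norm:
        s = z_normalize(s)
    best, best_i = np.inf, -1
    for i in range(len(t) - m + 1):
        w = t[i:i + m]
        if z_norm:
            w = z_normalize(w)
        d = norm_euclidean(s, w)
        if d < best:
            best, best_i = d, i
    return best, best_i
```

With `z_norm=True`, a Shapelet such as `[10, 20, 10]` matches the window `[1, 2, 1]` with a distance of essentially zero, which illustrates the amplitude invariance that Z-normalization is meant to provide.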

Shapelet Transformation Technology
In order to overcome the shortcoming of the original Shapelet-based classification algorithm, which can only build a decision tree classification model, Hills et al. [16] proposed the Shapelet transformation technique, which separates Shapelet discovery from classifier training. The transformed data can be directly used to train various classification models, which greatly improves the flexibility with which Shapelets can be applied. Algorithm 1 describes the specific process of the Shapelet transformation algorithm.
The algorithm first scans the dataset to find the optimal k Shapelets (line 1; the specific process can be found in references [8,16]). The time series in the dataset are then converted into feature vectors in the Shapelet space. Specifically, for each time series $T_i$ in the dataset, its distances from the k Shapelets are calculated according to Equation (2). The vector $F_i$ is then formed by combining the k distances with the class label $y_i$ of $T_i$, and is added as an instance to the transformed dataset D' (lines 3-10).
Algorithm 1
...
5. for Shapelet S_j in Shapelets
6. ...
7. F_i.add(dist);
8. end for
9. F_i.add(y_i);
10. end for
11. return D';

As described in Section 3.1, when calculating the distance between Shapelet $S_j$ and the time series $T_i$, we obtain the subsequence S' (line 6) that is most similar to $S_j$ in $T_i$ under the Euclidean distance measure. However, S' is not explicitly utilized in Algorithm 1. In Section 4, we will describe how to fully exploit the discriminative information contained in it by embedding typical time series features. It is also important to note that the Shapelets found in the training set are used when transforming the test set.
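The transformation loop described above can be sketched in Python as follows. This is a hedged illustration rather than a reproduction of Algorithm 1: the Shapelets are assumed to have been found already, the distance is assumed to be the length-normalized Euclidean distance without Z-normalization, and the names `min_dist` and `shapelet_transform` are hypothetical.

```python
import numpy as np

def min_dist(shapelet, series):
    """Best-match distance of a shapelet against one series (sliding window)."""
    s = np.asarray(shapelet, dtype=float)
    t = np.asarray(series, dtype=float)
    m = len(s)
    # All length-m windows of t, shape (len(t) - m + 1, m).
    windows = np.lib.stride_tricks.sliding_window_view(t, m)
    return float(np.sqrt(np.mean((windows - s) ** 2, axis=1)).min())

def shapelet_transform(X, y, shapelets):
    """Map each series T_i to F_i = [dist to S_1, ..., dist to S_k, y_i]."""
    transformed = []
    for series, label in zip(X, y):
        f_i = [min_dist(s, series) for s in shapelets]
        f_i.append(label)
        transformed.append(f_i)
    return transformed
```

Each row of the result has k distance values followed by the class label, so any conventional classifier can be trained on the first k columns. When transforming a test set, the same `shapelets` list obtained from the training set would be passed in.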

Typical Time Series Characteristics (Catch22)
Catch22 (22 canonical time series characteristics) is a feature set of 22 typical time series features which was designed to assist time series analysis, particularly time series classification, through a concise, diverse, and informative set of descriptive features.
The vast majority of time series in the UCR time series datasets have been Z-score normalized [7,15] (i.e., standardized so that the mean equals 0 and the variance equals 1). Catch22 was originally proposed to perform feature transformation on the entire time series, and thus deliberately excludes features sensitive to mean and variance. However, the RSFCF algorithm proposed in this paper is based on the local characteristics of the time series and performs feature transformation on subsequences instead of the entire time series. Generally, the mean and variance of subsequences contain a wealth of discriminating information [14]. For example, the mean can distinguish between subsequences of similar shapes but different amplitudes, and the variance reflects the degree of dispersion of subsequences. Therefore, in addition to the Catch22 feature set, we also introduce two features: mean and variance. Moreover, since the slope feature reflects the trend of subsequences well and was successfully applied in [3,14], we also introduce it into the algorithm. In total, the RSFCF algorithm uses 25 features, namely mean, variance, slope, and the 22 Catch22 features. In the following, we refer to these 25 features collectively as typical time series features.

Algorithm
This section describes in detail the random Shapelet forest algorithm (RSFCF). First, a novel data transformation method is introduced, which fully exploits the discriminating information in each Shapelet by embedding multiple time series features in it. Second, the construction and classification process of the RSFCF model is described. Finally, a time complexity analysis of the RSFCF algorithm is given.

Random Shapelet Transformation Embedded with Typical Time Series Features
To improve efficiency while reducing accuracy loss, a new random Shapelet transformation method embedded with typical time series features is proposed on the basis of the traditional Shapelet transformation technique introduced in Section 3.2.
The transformation method (Algorithm 2) first randomly selects k Shapelets from all possible subsequences of the training set according to the specified minimum and maximum Shapelet length (line 1), and records the starting position of each Shapelet (e.g., Locations[i] = 10 indicates that the starting position of the ith Shapelet is at index 10). To improve efficiency, we randomly select $a$ of the 25 typical time series features to perform the subsequent transformation of the dataset instead of using all of them (line 2).
The transformation process that follows has two key differences from the traditional Shapelet transformation method. First, the restrictedSubDist method (line 7) limits the scope of Shapelet matching, i.e., it allows the Shapelet to be offset by at most shift positions to the left and to the right of its original location (as shown in Figure 3, Shapelet S can only look for the best match with other time series between the two dotted lines). In this way, the algorithm is still able to overcome, to a large extent, the time warping problem that is prevalent in time series, and the computational complexity of Shapelet matching is reduced from $O(l \cdot m)$ to $O(\mathit{shift} \cdot m)$. Second, when transforming time series $T_i$ with Shapelet $S_j$, the traditional method finds the subsequence S' most similar to $S_j$ in $T_i$, and only takes the normalized Euclidean distance between S' and $S_j$ as a feature value of the transformed instance. On this basis, the proposed method also calculates $a$ different feature values of S' and merges these $a$ values into the transformed instance (lines 9-12), so that the transformed data representation contains richer information.
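The two differences described above, restricted matching and feature embedding, can be sketched in Python as follows. This is an illustrative sketch under several assumptions: all series have equal length, the distance is the length-normalized Euclidean distance without the final Z-normalization step, and only mean, variance, and slope stand in for the 25 features (Catch22 itself is not included). The names `restricted_best_match`, `FEATURES`, and `transform_instance` are hypothetical.

```python
import numpy as np

def restricted_best_match(shapelet, loc, series, shift):
    """Search for the best match only at start positions in [loc - shift, loc + shift]."""
    s = np.asarray(shapelet, dtype=float)
    t = np.asarray(series, dtype=float)
    m = len(s)
    lo = max(0, loc - shift)
    hi = min(len(t) - m, loc + shift)
    best, best_i = np.inf, lo
    for i in range(lo, hi + 1):
        d = np.sqrt(np.mean((t[i:i + m] - s) ** 2))
        if d < best:
            best, best_i = d, i
    return float(best), t[best_i:best_i + m]

def slope(x):
    """Least-squares slope of x against its index."""
    return float(np.polyfit(np.arange(len(x)), x, 1)[0])

# Stand-ins for the 25 typical features; the Catch22 features are omitted here.
FEATURES = {"mean": np.mean, "variance": np.var, "slope": slope}

def transform_instance(series, shapelets, locations, feature_names, shift):
    """One transformed row: per shapelet, its distance plus the chosen features of S'."""
    row = []
    for s, loc in zip(shapelets, locations):
        dist, match = restricted_best_match(s, loc, series, shift)
        row.append(dist)
        row.extend(float(FEATURES[name](match)) for name in feature_names)
    return row
```

With k Shapelets and $a$ selected features, each transformed row has k · (1 + a) values. Because only about 2 · shift windows are examined per Shapelet, a perfect match lying outside the allowed range is deliberately not found, which is how location information is retained.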

Ensemble Classification Model Building
Algorithm 3 describes the process of building an RSFCF ensemble classification model. Algorithm 2 is first invoked to transform the training dataset (line 3), and then the transformed dataset is used to train the decision tree classifier (line 4). The Shapelets and features used in the dataset transformation are saved together in the corresponding decision tree (lines 5-7) for converting the test dataset when classifying. The above process is repeated r times to build a forest containing r trees.
The construction of the time series tree follows the top-down recursive strategy of the standard decision tree. Taking information gain as the criterion, it splits the instances of the current node using the best splitting threshold of the best splitting attribute, constructs left and right child nodes, and recurses until all instances of a node belong to the same category. To calculate the optimal split threshold for numeric attributes, the time series tree adopts a more efficient method: it divides the interval between the minimum and maximum values of the attribute into κ equal parts (here κ is fixed at 20), tests the boundary between each pair of adjacent cells as a candidate split threshold, and takes the threshold yielding the maximum information gain as the optimal split threshold for the attribute. In addition, when several split thresholds achieve the same information gain, the time series tree breaks the tie by maximizing the decision boundary (to calculate a reasonable boundary value, the attributes of the dataset need to be Z-score standardized before training the time series tree, to avoid the algorithm's preference for attributes of a larger magnitude).
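The equal-width threshold search for a single numeric attribute can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the tie-breaking quantity is assumed to be the distance from the threshold to the nearest attribute value, which is an assumption made for illustration, and the names `entropy` and `best_threshold` are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_threshold(values, labels, n_bins=20):
    """Test the interior boundaries of n_bins equal-width cells over [min, max]."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    lo, hi = values.min(), values.max()
    if lo == hi:
        return None, 0.0  # constant attribute: nothing to split on
    parent = entropy(labels)
    best_tau, best_gain, best_margin = None, -1.0, -1.0
    for tau in np.linspace(lo, hi, n_bins + 1)[1:-1]:
        left = values <= tau
        n_left = int(left.sum())
        if n_left == 0 or n_left == len(values):
            continue
        child = (n_left * entropy(labels[left])
                 + (len(values) - n_left) * entropy(labels[~left])) / len(values)
        gain = parent - child
        # Assumed tie-break: distance from the threshold to the nearest value.
        margin = float(np.abs(values - tau).min())
        better = gain > best_gain + 1e-12
        tied = abs(gain - best_gain) <= 1e-12 and margin > best_margin
        if better or tied:
            best_tau, best_gain, best_margin = float(tau), gain, margin
    return best_tau, best_gain
```

Only κ − 1 candidate thresholds are evaluated per attribute regardless of the number of instances, which is where the efficiency gain over testing every midpoint between sorted values comes from.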
Equation (4) shows how the decision boundary is calculated for the split threshold τ of the jth attribute, where $Att_i^j$ is the value of the ith instance on the jth attribute.

Algorithm 3
...
4. Tree_i ← buildTimeSeriesTree(D');
5. ...
7. Tree_i.add(Atts);
8. end for
9. return RSFCFModel;

Algorithm 4 describes the classification process of the RSFCF ensemble classification model. When classifying, RSFCF aggregates the classification results of all time series trees and gives the final classification by majority vote. It should be noted that before classifying with a time series tree, the time series to be classified must first be transformed using the Shapelets and attribute indices stored in the tree, according to the method described in lines 6 to 13 of Algorithm 2.

Algorithm 4
...
3. T' ← instanceTransform(T, Tree_i.Shapelets, Tree_i.Locations, Tree_i.Atts, shift);
4. ...
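The per-tree transformation followed by majority voting can be sketched in Python as follows. The container `TimeSeriesTree`, its field names, and the function `rsfcf_classify` are hypothetical and chosen only to mirror the quantities named in the text (Shapelets, Locations, Atts, shift); the fitted tree is represented by an arbitrary callable.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class TimeSeriesTree:
    shapelets: list   # Shapelets saved with the tree at training time
    locations: list   # their starting positions
    atts: list        # the features selected for this tree
    predict: Callable[[Sequence[float]], int]  # fitted tree: transformed row -> label

def rsfcf_classify(series, trees, instance_transform, shift):
    """Transform the series separately for each tree, then take a majority vote."""
    votes = []
    for tree in trees:
        row = instance_transform(series, tree.shapelets, tree.locations, tree.atts, shift)
        votes.append(tree.predict(row))
    return Counter(votes).most_common(1)[0][0]
```

Each tree sees a different representation of the same series because it stores its own randomly selected Shapelets and features, so the transformation cannot be shared across trees.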

Time Complexity Analysis
Training time complexity: RSFCF is an ensemble classification model composed of r time series trees. Building a time series tree first requires transforming the dataset and then training on the transformed dataset. The time complexity of the latter is $O(k \cdot a \cdot n \cdot \log(n))$ [14], where n is the number of training set instances, log(n) is the average depth of the tree, and the transformed dataset has $O(k \cdot a)$ attributes (k is the number of randomly selected Shapelets, a is the number of features used when transforming the dataset).

Randomly selected Shapelets vary in length and the computational complexity of each feature varies, which makes a precise analysis of the dataset transformation difficult, but a reasonable estimate is still possible. The transformation of time series $T_i$ over Shapelet $S_j$ consists of two processes: (1) finding the subsequence S' of $T_i$ that is most similar to $S_j$ under the Euclidean distance; (2) calculating the values of S' on a randomly selected features. Process 1 requires 2·shift normalized Euclidean distance calculations, each of complexity $O(m)$ (m is the length of Shapelet $S_j$), so its time complexity is $O(shift \cdot m)$. For process 2, we add three features of linear computational complexity (mean, variance and slope) to the Catch22 feature set and randomly select a of the resulting 25 features. Since the average computational complexity of the 22 Catch22 features is approximately linear ($O(m^{1.16})$) [15], the computational complexity of process 2 is also approximately linear on average, and the total time complexity of the two processes can be approximated as $O(shift \cdot m)$. Since n time series need to be transformed with k Shapelets, the time complexity of the dataset transformation is $O(n \cdot k \cdot shift \cdot m)$, where m is now the average length of the Shapelets. Overall, combining the transformation and tree-training costs over r trees, the time complexity of training an RSFCF classification model is:

$$O\big(r \cdot n \cdot k \cdot (shift \cdot m + a \cdot \log(n))\big)$$

Classification time complexity: The time complexity of transforming a time series to be classified is $O(k \cdot shift \cdot m)$, after which traversing log(n) nodes on average completes the classification by each tree. Therefore, over r trees, the time complexity of the classification process is:

$$O\big(r \cdot (k \cdot shift \cdot m + \log(n))\big)$$
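The two transformation processes analysed above can be illustrated with a short sketch. It is written under several assumptions that are not taken from the paper: the helper names are hypothetical, the scan covers every start position within shift of the Shapelet's original location (2·shift + 1 positions, clipped to the series), population variance is used, and only the three linear-time features (mean, variance, slope) are computed, with the Catch22 features left out.

```python
import math
import statistics


def z_normalize(seq):
    """Z-score a sequence; a constant sequence maps to all zeros."""
    mu = statistics.fmean(seq)
    sigma = statistics.pstdev(seq)
    if sigma == 0:
        return [0.0] * len(seq)
    return [(x - mu) / sigma for x in seq]


def slope(seq):
    """Least-squares slope of seq against its index 0..m-1."""
    m = len(seq)
    x_mean = (m - 1) / 2
    y_mean = statistics.fmean(seq)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(seq))
    den = sum((i - x_mean) ** 2 for i in range(m))
    return num / den if den else 0.0


def best_match(series, shapelet, location, shift):
    """Process 1: scan start positions within `shift` of `location` and return
    (distance, subsequence) for the closest z-normalized match."""
    m = len(shapelet)
    target = z_normalize(shapelet)
    start_lo = max(0, location - shift)
    start_hi = min(len(series) - m, location + shift)
    best = (math.inf, None)
    for start in range(start_lo, start_hi + 1):
        window = series[start:start + m]
        dist = math.dist(z_normalize(window), target)  # O(m) per candidate
        if dist < best[0]:
            best = (dist, window)
    return best


def transform(series, shapelet, location, shift):
    """Process 2: describe the matched subsequence by simple features."""
    _, match = best_match(series, shapelet, location, shift)
    return {"mean": statistics.fmean(match),
            "variance": statistics.pvariance(match),
            "slope": slope(match)}


# Toy series and Shapelet (hypothetical values).
features = transform([0, 0, 0, 1, 5, 1, 0, 0], [1, 5, 1], location=2, shift=2)
```

The loop in `best_match` makes the O(shift · m) cost of process 1 visible: the number of candidate positions grows with shift and each distance costs O(m).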

Experimental Analysis
This section first analyzes the parameter settings of the proposed algorithm RSFCF, then compares it with several state-of-the-art time series classification algorithms to evaluate its accuracy and efficiency, and finally verifies the effectiveness of the RSFCF design strategy through experiments, which show that embedding typical time series features in Shapelets effectively improves classification accuracy. To improve the reproducibility of the work, we provide the Java source code of the algorithm (https://github.com/gaozhenzhuo/RSFCF, accessed on 6 June 2022).

Parameter Settings
RSFCF has a total of six parameters to set (see Table 1). Since the length range of the patterns in a learning task cannot be known in advance, we simply set the minimum and maximum Shapelet lengths to 3 and l, respectively (l is the time series length in the learning task). The number of Shapelets k randomly selected per tree and the maximum Shapelet offset shift can be used to flexibly control the efficiency of the algorithm; larger values can in theory yield higher accuracy at the cost of efficiency. Parameter k is set to √l as a compromise between accuracy and efficiency. We conducted dedicated experiments to examine the effect of the maximum Shapelet offset shift, the number of features a used in the data transformation, and the number of trees r on RSFCF performance, in order to guide the parameter settings.

Table 1. Parameter settings of RSFCF (rows recoverable from the extracted text).
Parameter | Description | Value
k | The number of randomly selected Shapelets for each tree | √l
a | The number of features used in the data transformation | 8
shift | The maximum offset of Shapelet | l/10
r | The number of trees in the RSFCF classification model | 500

Effect of the number of trees r on the performance of RSFCF (a is set to 8, shift is set to l/10). For a reasonable assessment that also accounts for experimental cost, we selected all 60 datasets with fewer than 2000 instances in total and a time series length below 600 from the 112 datasets shown in Table 2 (i.e., the datasets with names in bold in Table 2, hereafter the Small60 datasets). Since RSFCF is essentially a randomized algorithm, we repeated the experiment 10 times on each Small60 dataset. As shown in Figure 4, the accuracy of RSFCF increases with the ensemble size. Although the average accuracy tends to stabilize once the number of trees exceeds 100, the box plot in Figure 4 shows that the accuracy of RSFCF over the 10 runs is more stable at r = 500 than at r = 100. The efficiency comparison presented later shows that RSFCF still maintains a significant efficiency advantage when r is set to 500; when efficiency requirements are higher, setting r to 100 achieves a five-fold efficiency improvement without a significant reduction in classification accuracy.

Effect of the number of features a used in data transformation on the performance of RSFCF (r is set to 500, shift is set to l/10). Figure 5 shows the average accuracy and the average training time of RSFCF on the Small60 datasets under different settings of parameter a. As the value of a increases, the average accuracy generally shows an upward trend, and the average training time also increases (nearly linearly). Since the design goal of RSFCF is accurate and efficient time series classification, we set a to 8, which loses only a small amount of accuracy but is nearly twice as efficient as using all features (i.e., setting a to 25).
Effect of the maximum offset of Shapelet shift on the performance of RSFCF (r is set to 500, a is set to 8). Parameter shift restricts the scope of the Shapelet matching process. To find a suitable setting, values of shift from 5% to 50% of the time series length l were tested on the Small60 datasets. As shown in Figure 6, the training time becomes significantly longer as shift increases, which is the result of expanding the search space of Shapelet matching. The classification accuracy increases at first, stabilizes when shift reaches 20% of the time series length, and even drops slightly at the end. This shows that although time warping is common in time series datasets, the vast majority of shifts occur only within small areas, and expanding the matching space of the Shapelet may be counterproductive. Again, as a compromise between efficiency and accuracy, we choose l/10 as the value of shift. To sum up, subsequent experiments use the default parameter settings given in Table 1.
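The default settings discussed in this section can be collected in a small configuration object. This is a sketch only: the class and field names are hypothetical, and rounding √l to the nearest integer and taking the integer part of l/10 are assumptions, since the text does not say how non-integer values are handled.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RsfcfParams:
    min_len: int  # minimum Shapelet length
    max_len: int  # maximum Shapelet length
    k: int        # Shapelets randomly selected per tree
    a: int        # features used in the data transformation
    shift: int    # maximum Shapelet offset
    r: int        # number of trees


def default_params(l):
    """Defaults from the text for series length l; rounding is an assumption."""
    return RsfcfParams(min_len=3, max_len=l, k=max(1, round(math.sqrt(l))),
                       a=8, shift=max(1, l // 10), r=500)


params = default_params(400)
```

For a series length of 400 this yields 20 Shapelets per tree and a maximum offset of 40 positions.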

Ablation Experiment
RSFCF randomly selects 8 out of a total of 25 features, namely mean, variance, slope, and the 22 Catch22 features. The ablation experiment is designed to test whether eight features are sufficient for describing a time series by comparing RSFCF with two variants: one using all Catch22 features and one using only the mean, variance and slope features. The experiment is conducted on all 112 time series datasets under these three settings. On each dataset, the variants with Catch22 features and with mean, variance and slope features are run only once, while the original RSFCF is run 10 times to reduce the effect of chance, as its eight features are randomly selected. The resulting average accuracies are shown in Table 3. The average accuracy of RSFCF is slightly higher than that of Catch22, showing that the algorithm benefits from adding the three features. Moreover, the feature set used by RSFCF is much smaller than that of Catch22. Using only the mean, variance and slope features gives the lowest average accuracy among the three, which means that these three features alone are not enough to describe a time series in classification tasks.

Accuracy Comparison
The UCR time series archive [7] is the standard benchmark in time series classification research [30] and contains a total of 128 datasets. Bagnall et al. [18] recently conducted a new evaluation of time series classification algorithms using 112 of them (15 of the 16 excluded datasets are not typically used to evaluate algorithm performance because of unequal time series lengths or missing values [7,18], and the other is the "Fungi" dataset, which has only one training instance per category), and reported the accuracy of the time series classification algorithms that represent the current state of the art.
To fully verify the performance of the RSFCF algorithm proposed in this paper, we used the same 112 datasets and compared RSFCF with the seven best-performing and most relevant algorithms: gRSF [10], STC [17], CIF [3], WEASEL [25], Proximity Forest [5], HIVE-COTE [21], and ResNet [19] (abbreviated as gRSF, STC, CIF, WS, PF, HCT, and RN). These algorithms belong to the six different types of time series classification methods introduced in Section 2 and represent the state of the art of their respective types. The experiment follows the original training and test split of each dataset [7], and the results of the comparison algorithms are taken from those reported by Bagnall et al. [18]. Table 2 shows the accuracy of RSFCF on the 112 datasets.
We use the critical difference plots described in [32] to compare the accuracy of multiple classifiers across multiple datasets. We set the significance level α to 0.05 and use the Friedman test to determine whether the hypothesis "there is no difference in the average accuracy ranking of the classifiers over the datasets" holds. If the hypothesis is rejected, we then use the Nemenyi test to group the classifiers (the classifiers connected by the same line in Figure 7 form a group); there is a significant difference in the average accuracy ranking of any two classifiers in different groups. The critical difference plot in Figure 7 shows that RSFCF achieves a better average accuracy rank than gRSF, STC, CIF, and WEASEL, and is significantly better than Proximity Forest. It is worth mentioning that, according to the recent evaluation of Bagnall et al. [18], STC, CIF, WEASEL, and Proximity Forest represent the state of the art of Shapelet-based, interval-based, dictionary-based, and distance-based time series classification methods, respectively. Among deep learning methods, RSFCF surpasses the strong baseline ResNet. The meta-ensemble method HIVE-COTE obtained the best average accuracy rank (2.9063), but its difference from RSFCF is not significant, and the efficiency comparison in the next section verifies that RSFCF is much more efficient than HIVE-COTE.
Table 4 shows the results of the pairwise comparison of RSFCF with the seven algorithms. Although HIVE-COTE ranks higher than RSFCF on average accuracy, Table 4 shows that RSFCF still surpasses HIVE-COTE on 36 datasets. Compared with the remaining six algorithms, RSFCF wins on nearly half or even more than half of the datasets, while the other algorithms win on only about one-third of the datasets, which highlights the accuracy advantage of RSFCF.
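The quantities behind a critical difference plot can be sketched as follows. The sketch makes several assumptions: the function names are hypothetical, the accuracy table is invented toy data, the Friedman statistic is the standard chi-square form without a tie correction, and the critical value q = 3.031 for eight classifiers at α = 0.05 is the commonly tabulated Studentized-range-based value rather than a number stated in this paper.

```python
import math


def ranks_desc(scores):
    """Rank scores so the highest gets rank 1; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based positions pos+1..end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def average_ranks(accuracy_table):
    """accuracy_table[d][c] is the accuracy of classifier c on dataset d."""
    n = len(accuracy_table)
    k = len(accuracy_table[0])
    totals = [0.0] * k
    for row in accuracy_table:
        for c, rank in enumerate(ranks_desc(row)):
            totals[c] += rank
    return [t / n for t in totals]


def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic computed from average ranks."""
    k = len(avg_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)


def nemenyi_cd(k, n_datasets, q_alpha):
    """Critical difference: CD = q_alpha * sqrt(k (k + 1) / (6 N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))


# Toy accuracies: 3 datasets x 3 classifiers (hypothetical values).
toy = [[0.90, 0.80, 0.70], [0.85, 0.85, 0.60], [0.70, 0.90, 0.80]]
avg = average_ranks(toy)
# Eight classifiers on 112 datasets, with the assumed q value for alpha = 0.05.
cd = nemenyi_cd(k=8, n_datasets=112, q_alpha=3.031)
```

Under these assumptions the critical difference for eight classifiers on 112 datasets is just under one rank, so two classifiers whose average ranks differ by less than that would be connected in the plot.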

Efficiency Comparison
To evaluate the efficiency of the algorithm, we ran RSFCF and five comparison algorithms on the Small60 datasets using the same computer (Intel Core i7-7700 (3.60 GHz) processor, 16 GB of memory) and recorded each algorithm's CPU running time (ResNet was not included in this evaluation because it requires a high-performance GPU to complete training in acceptable time). Figure 8 shows the average training time (horizontal axis) and the average accuracy rank (vertical axis) of RSFCF and 4 comparison algorithms on the Small60 datasets (HIVE-COTE was not included because it had not finished running on the full Small60 datasets after more than 5 days). An algorithm located closer to the lower left corner of the graph has a better average accuracy rank and requires less training time. As can be seen from Figure 8, the efficiency of RSFCF far exceeds that of STC. RSFCF is slightly less efficient than WEASEL and CIF, but its average accuracy rank is significantly better. This comparative analysis shows that RSFCF is highly competitive in terms of both accuracy and efficiency.

Design Strategy Validation
The main innovation of this paper is the idea of embedding typical time series features in randomly selected Shapelets. To verify the effectiveness of this strategy, we compared the accuracy of RSFCF and the naive Random Shapelet Forest (RSF) on the 112 UCR datasets. RSF is a simplified version of RSFCF that does not embed typical time series features in Shapelets (this simplification can be achieved by setting the parameter a to 0 or by commenting out lines 9-12 of Algorithm 2) (Figure 9).

Figure 10 shows the results of the comparison of RSFCF and RSF: RSFCF wins on 69 datasets, while RSF wins on only 31. RSF is significantly weaker than RSFCF because it does not embed other features and can therefore only use the overall shape of a Shapelet as the basis for distinguishing different categories of time series. Since the randomly selected Shapelets are only a very small subset of all possible Shapelets, the probability of drawing a discriminating "exact shape" is very low, which harms classification accuracy. RSFCF has two significant advantages over RSF. First, because the value of a Shapelet on one or more features can reflect the discriminating shape information contained within it (as shown in Figure 1), RSFCF relaxes the requirements placed on the Shapelet and thus reduces the risk that random Shapelet selection causes a loss of accuracy. Second, in addition to the sequence shape, RSFCF can also capture discriminating information from multiple perspectives, such as the numerical distribution characteristics, autocorrelation, and periodicity of the sequence, which is the most critical reason why RSFCF surpasses RSF and STC. Based on the comparison of RSFCF and RSF and the above analysis, we believe that embedding typical time series features in Shapelets plays a key role in improving classification accuracy.
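Win counts such as those reported for RSFCF versus RSF can be obtained with a simple per-dataset comparison. The sketch below uses a hypothetical helper `win_tie_loss` and invented accuracies, not the paper's results. If every one of the 112 datasets is counted, 69 wins and 31 losses would leave 12 ties.

```python
def win_tie_loss(acc_a, acc_b, tol=1e-12):
    """Count datasets where A beats, ties with, or loses to B."""
    if len(acc_a) != len(acc_b):
        raise ValueError("both lists must cover the same datasets")
    wins = ties = losses = 0
    for a, b in zip(acc_a, acc_b):
        if abs(a - b) <= tol:
            ties += 1
        elif a > b:
            wins += 1
        else:
            losses += 1
    return wins, ties, losses


# Hypothetical accuracies on five datasets (not taken from the paper).
rsfcf_acc = [0.91, 0.80, 0.75, 0.66, 0.88]
rsf_acc = [0.89, 0.80, 0.78, 0.60, 0.85]
result = win_tie_loss(rsfcf_acc, rsf_acc)
```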

Conclusions
The continuous growth of data scale places higher demands on the efficiency of data mining algorithms. To perform accurate and efficient time series classification, this paper proposes a random Shapelet forest algorithm embedded with typical time series features (RSFCF). RSFCF randomly selects Shapelets and limits their scope to improve efficiency, and embeds typical time series features in Shapelets to compensate for the loss of accuracy caused by random Shapelet selection. The classification results on the 112 UCR time series datasets show that the accuracy of RSFCF surpasses that of multiple advanced time series classification algorithms and reaches the current leading level. The meta-ensemble method HIVE-COTE is more accurate than RSFCF, but experiments show that its efficiency is much lower than that of RSFCF, so HIVE-COTE is difficult to apply to large-scale datasets. In summary, the RSFCF algorithm proposed in this paper balances accuracy and efficiency and is therefore more practical. Future work includes studying a fusion strategy that embeds RSFCF into the meta-ensemble method HIVE-COTE for more accurate classification in scenarios where real-time requirements are not high.


Figure 1. An example of mining discriminative information of subsequences via embedding canonical time series features.

Definition 4. Time series classification. Given a training set $D = \{(T_1, y_1), (T_2, y_2), \ldots, (T_n, y_n)\}$ containing n instances, each instance consists of a time series and its corresponding class label. The time series classification task aims to learn a classification model from the training set D and to use the model to predict the categories of unlabeled time series.

Algorithm 4. classification(T, RSFCFModel, shift)
Inputs: Time series to be classified T, classification model RSFCFModel, Shapelet maximum offset shift
Output: Category y of time series T
1. Y ← null;
2. for Tree_i in RSFCFModel

Figure 4. Average accuracy and variance of RSFCF over Small60 datasets under different ensemble sizes.

Figure 5. Average accuracy and training time of RSFCF over Small60 datasets under different settings of parameter a.

Figure 6. Average accuracy and training time of RSFCF over Small60 datasets under different settings of parameter shift.

Figure 7. Average ranks and critical differences on accuracy of RSFCF and 7 comparison algorithms over 112 UCR datasets.

Figure 8. Comparison of RSFCF and 4 classifiers in terms of average training time and average ranks over Small60 datasets.

Figure 9. Comparison of random Shapelet forest classifiers features (RSFCF versus RSF) in terms of accuracy over 112 U

Table 4. Pairwise comparison of RSFCF and 6 comparison algorithms in terms of accuracy over 112 UCR datasets.