Computers
  • Article
  • Open Access

17 December 2025

Scalable Univariate and Multivariate Time-Series Classifiers with Deep Learning Methods Exploiting Symbolic Representations

Department of Digital Systems, University of Piraeus, 185 34 Piraeus, Greece
Authors to whom correspondence should be addressed.
This article belongs to the Special Issue AI in Its Ecosystem

Abstract

Time-series classification (TSC) is an important task across the sciences. Symbolic representations (especially SFA) are very effective at combating noise. In this paper, we employ symbolic representations to create state-of-the-art time-series classifiers, aiming to advance scalability without sacrificing accuracy. First, we create a graph representation of the time series based on SFA words. We use this representation together with graph kernels and an SVM classifier to create a scalable time-series classifier. Next, we use the graph representation together with a Graph Convolutional Neural Network to test how it fares against state-of-the-art time-series classifiers. Additionally, we devise deep neural networks exploiting the SFA representation, inspired by the text classification domain, and study how they fare against state-of-the-art classifiers. Finally, we adapt the proposed deep learning classifiers to the multivariate time-series case and evaluate them against state-of-the-art time-series classification algorithms based on symbolic representations.

1. Introduction

Time-series classification (TSC) is an important task across many branches of science. Indicative applications of TSC are seizure detection [1], earthquake monitoring [2], insect classification [3], applications in power systems [4], and others [5,6,7,8,9,10,11,12,13]. Symbolic representations (especially SFA) [14] have proven very effective in combating noise. To this end, this paper employs symbolic representations to create state-of-the-art time-series classification algorithms, aiming to advance their scalability without sacrificing accuracy. More specifically, this paper aims to answer the following research questions (RQs):
  • RQ1. How would a classifier built on top of graph representations of SFA words perform in terms of accuracy and execution time? Our first attempt to answer this question is through the use of graph kernels, which are more scalable than Graph Neural Networks, with an SVM classifier on top.
  • RQ2. Closely linked with RQ1, we take the graph representation with SFA words and combine it with Graph Convolutional Neural Networks to see how the resulting classifier would fare against the state of the art.
  • RQ3. For this research question, we aim to answer whether SFA, together with state-of-the-art deep learning methods adapted from the text classification domain, can also provide state-of-the-art accuracy and execution times for multivariate time-series classification.
  • RQ4. Finally, we aim to answer whether SCALE-BOSS-MR [15], a state-of-the-art symbolic time-series classifier, can be adapted to the multivariate use case and whether it provides state-of-the-art accuracy and execution time.
To answer the above research questions, this paper makes the following contributions:
1.
We combine a graph representation with a symbolic representation to encode each time-series as a graph. We use this representation together with graph kernels and an SVM classifier to create SCALE-BOSS-GRAPH.
2.
We use the graph representation of time-series in conjunction with a Graph Convolutional Neural Network to see whether it can attain state-of-the-art accuracy and execution time.
3.
We adapt state-of-the-art neural network architectures from the text classification domain, and we use them in conjunction with symbolic representation to create state-of-the-art deep learning symbolic time-series classifiers. We also adapt the proposed deep learning methods to the multivariate use case and compare them against state-of-the-art deep learning methods.
4.
We adapt SCALE-BOSS-MR to the multivariate use case, and we compare the adapted version, SCALE-BOSS-MR-MV, to state-of-the-art time-series classifiers.

3. Graph Kernel Preliminaries

The idea behind graph kernels is to create a kernel matrix that can be used with off-the-shelf machine learning models to make predictions. The kernel matrix measures the similarity between all pairs of graphs. For the kernel matrix to be valid, it has to be symmetric and positive semi-definite (i.e., have non-negative eigenvalues).
The key idea of kernel methods is to design a feature representation of a given graph as a bag-of-words. In this case, we need a feature map $\phi(\cdot)$ such that $K(G_1, G_2) = \phi(G_1) \cdot \phi(G_2)$, where $K$ is the kernel matrix and $G_1$, $G_2$ are the two graphs being compared. The naive representation treats the node labels as words (a bag of node labels). The problem with this approach is that very different graphs can end up with the same representation, because it ignores the connectivity between the nodes of each graph.
To address this issue, we use the Weisfeiler–Lehman (WL) graph kernel. The main idea of the WL graph kernel [45] is to obtain an efficient feature descriptor $f(G)$ by exploiting the neighborhood structure of the graph to iteratively enrich the node vocabulary, i.e., the set of possible labels for the graph nodes. WL generalizes node degrees: a node's degree provides 1-hop information, whereas WL extends this to k-hop neighborhoods. The algorithm that achieves this is called the WL isomorphism test, or color refinement.

Color Refinement

This section describes the color refinement procedure behind the WL kernel.
Given a graph G with nodes V, the process assigns an initial color $c^{(0)}(v)$ to each node v. It then iteratively refines the colors as follows: $c^{(k+1)}(v) = \mathrm{HASH}\left(c^{(k)}(v), \{\, c^{(k)}(u) : u \in N(v) \,\}\right)$, where $N(v)$ denotes the neighbors of v, and HASH maps different inputs (node labels) to different colors. After k steps of color refinement, $c^{(k)}(v)$ summarizes the structure of the k-hop neighborhood of v.
Figure 1, Figure 2, Figure 3 and Figure 4 provide an example of the application of the WL graph kernel for two graphs.
Figure 1. Iteration 0 of color refinement.
Figure 2. Hashing after the first iteration of color refinement.
Figure 3. Hashing after the second iteration of color refinement.
Figure 4. Vector representation of the two graphs.
Initially, all nodes of the two graphs get a color (label) of 1. Next, for each node the process computes the next label by merging the colors of its neighbors with the node's previous color. For example, the top left node of $G_1$ gets a label of (1,111), whereas the top left node of $G_2$ gets a label of (1,11).
After creating the new color for each node, the process passes it through the hash function. For example, according to the hash table, (1,111) gets the new color 4, while (1,11) gets the new color 3.
Figure 3 shows the second iteration of the color refinement procedure.
Figure 4 provides the feature representation of the two graphs, by counting the total number of color occurrences in each graph.
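To make the procedure concrete, the following is a minimal Python sketch of color refinement and the resulting kernel value; the adjacency-dictionary encoding and the helper names are ours, and a plain dictionary plays the role of the HASH function:

```python
from collections import Counter

def wl_histogram(adj, labels, iterations=2, palette=None):
    """Color refinement. adj: {node: neighbors}, labels: {node: initial color}.
    Returns a Counter mapping each color to its number of occurrences."""
    palette = {} if palette is None else palette   # shared HASH table
    colors = dict(labels)
    histogram = Counter(colors.values())           # colors of iteration 0
    for _ in range(iterations):
        refined = {}
        for v, neighbors in adj.items():
            # signature: own color plus the sorted colors of the neighbors
            signature = (colors[v], tuple(sorted(colors[u] for u in neighbors)))
            if signature not in palette:           # unseen signature -> fresh color
                palette[signature] = len(palette) + max(labels.values()) + 1
            refined[v] = palette[signature]
        colors = refined
        histogram.update(colors.values())          # accumulate all iterations
    return histogram

def wl_kernel_value(h1, h2):
    # the kernel value is the dot product of the two color histograms
    return sum(count * h2[color] for color, count in h1.items())
```

For the two example graphs, calling wl_histogram with a shared palette yields exactly the count vectors of Figure 4, and wl_kernel_value is their dot product.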

4. Deep Learning Preliminaries

4.1. Convolutional Neural Networks

The original purpose of Convolutional Neural Networks (CNNs) [46,47] is to detect certain features in an image. The early layers of a CNN detect local visual features, and later layers of a CNN take those features and combine them to produce higher-level features.
In order for a CNN layer to detect features, the main idea is to pass a filter across the image (or more generally, a tensor) with a process called convolution.
The convolution operation retains spatial information; it performs an element-wise multiplication of an image patch with the filter and then sums the products to produce a single output value.
Usually, convolutional layers are followed by a pooling operation for dimensionality reduction. Pooling is an operation of applying a small filter that returns either the maximum value (max pooling) or the average (average pooling) in the specific window of the tensor.
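As a concrete illustration, here is a NumPy sketch of the "valid" convolution and max pooling described above; like most deep learning libraries, it actually computes a cross-correlation (the filter is not flipped):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 'valid' convolution (cross-correlation)."""
    h, w = kernel.shape
    out = np.empty((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # element-wise multiply the patch with the filter, then sum
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

def max_pool2d(x, size=2):
    """Max pooling with stride equal to the pool size."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))
```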

4.2. Recurrent Neural Networks

RNNs [48,49,50] are designed to handle sequential information. The RNN has a single hidden layer, but it also has a memory buffer that stores the output of the hidden layer and feeds it back into the layer along with the next element of the input sequence. This means that the RNN processes each input of the sequence together with context information from the previous inputs. One problem with RNNs is vanishing gradients: training an RNN on a sequence requires the error to be backpropagated through the entire length of the unrolled network, so gradients can shrink toward zero for long sequences.

4.3. Long Short Term Memory

In [51] the authors present Long Short-Term Memory (LSTM) cells, which aim to solve the vanishing gradient problem.
In LSTMs, the propagation of the activations within the cell through time is controlled by three components called gates: the forget gate, the input gate, and the output gate. The forget gate determines which activations in the cell should be forgotten at each time step; the input gate decides how the activations in the cell should be updated in response to the new input. Finally, the output gate controls what activations should be used to generate the output in response to the current input.
The forget gate takes the concatenation of the input and the hidden state and passes this vector through a layer of neurons with a sigmoid activation function. The output of this forget layer is a vector of values in the range 0 to 1. The cell state is then multiplied by this forget vector; as a result, activations in the cell state that are multiplied by components of the forget vector with values near 0 are forgotten, and activations multiplied by components with values near 1 are remembered. Next, the input gate decides what information should be added to the cell state.
The procedure works as follows. The gate decides which elements in the cell state should be updated and what information should be included in the update. The concatenation of the input $x_t$ and the hidden state $h_{t-1}$ is passed through a layer of sigmoid units to generate a vector of elements, the same width as the cell, where each element is in the range 0 (no update) to 1 (to be updated). At the same time that this filter vector is generated, the concatenated input and hidden state are also passed through a layer of $\tanh$ units. The final stage of processing in an LSTM is to decide which elements of the cell should be output in response to the current input.
A candidate output vector is generated by passing the cell state through a $\tanh$ layer. At the same time, the concatenated input and propagated hidden state vector are passed through a layer of sigmoid units to create another filter vector. The actual output vector is then calculated by multiplying the candidate output vector by this filter vector. An RNN with LSTM cells is called an LSTM network.
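In equation form (the standard formulation, with $\sigma$ the sigmoid, $\odot$ element-wise multiplication, and $[h_{t-1}, x_t]$ the concatenation described above):

```latex
\begin{aligned}
f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f) && \text{forget gate}\\
i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i) && \text{input gate}\\
\tilde{C}_t &= \tanh(W_C\,[h_{t-1}, x_t] + b_C) && \text{candidate update}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{new cell state}\\
o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o) && \text{output gate}\\
h_t &= o_t \odot \tanh(C_t) && \text{output (new hidden state)}
\end{aligned}
```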

4.4. Attention

In [52] the authors present the attention mechanism to address the bottleneck problem that arises from using a fixed-length encoding vector, where the decoder has limited access to the information provided by the input. This is thought to be especially problematic for long sequences.
The main idea of the attention mechanism is to have a context vector per time step. The intuition behind the context vector is that now the context vector attends to the relevant part of the input sequence.
The attention mechanism is composed of three parts (a code sketch follows the list):
1.
Alignment scores: The alignment scores e indicate how well the elements of the input sequence align with the current output. They are computed as $e = \tanh(xW + b)$, where $W$ is the weight matrix of the attention layer, $b$ is its bias, and $x$ is its input.
2.
Weights: The weights a are computed by applying a softmax to e, that is, $a = \mathrm{softmax}(e)$.
3.
Context Vector: The context vector c is computed as the dot product of x and a, that is, $c = \mathrm{dot}(x, a) = \sum_t a_t x_t$.
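A minimal Keras sketch of such an attention layer, following the three equations above; reading the context vector as the attention-weighted sum over timesteps is our interpretation, and the layer pools a sequence of shape (batch, timesteps, features) into a single vector:

```python
import tensorflow as tf

class SimpleAttention(tf.keras.layers.Layer):
    """e = tanh(xW + b), a = softmax(e), c = sum_t a_t * x_t."""
    def build(self, input_shape):
        self.W = self.add_weight(name="W", shape=(input_shape[-1], 1),
                                 initializer="glorot_uniform")
        self.b = self.add_weight(name="b", shape=(input_shape[1], 1),
                                 initializer="zeros")

    def call(self, x):
        e = tf.tanh(tf.matmul(x, self.W) + self.b)  # alignment scores
        a = tf.nn.softmax(e, axis=1)                # weights over timesteps
        return tf.reduce_sum(a * x, axis=1)         # context vector
```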

5. Proposed Methods

5.1. SCALE-BOSS-GRAPH

The SCALE-BOSS-GRAPH framework fuses graph and symbolic representations.
The idea of the graph-based symbolic representation comes from the input of BOSS, BOSS-VS and K-BOSS-VS. More specifically, after each time-series is encoded in SFA words, each successive word is connected to the one after it to create a graph. Figure 5 shows examples of graph-based symbolic representations. For example, for the first time-series in the figure, the first word “acbb” is connected with the second word “acbb”, which in turn is connected with the third word “acbc” and so on. Each time-series in the dataset is encoded as a graph and the nodes of the graph are labeled using the SFA representation.
Figure 5. Graph-based symbolic representation.
The workflow of SCALE-BOSS-GRAPH is as follows:
1.
In the first step, the training set is converted to its symbolic representation. In our instantiation we have chosen SFA, but the choice of symbolic representation is orthogonal to the framework.
2.
Next, we create the graph-based symbolic representation for the training set.
3.
In the third step, we create the kernel matrix for the training set. For this instantiation, we have chosen the WL graph kernel [45] with the Vertex Histogram kernel [53] as the base kernel. The choice of kernel is orthogonal to the method; we chose WL with Vertex Histogram as the base kernel for scalability reasons. More concretely, WL proved to be among the fastest graph kernels while giving very good accuracy results. We experimented with other graph kernels, but most were orders of magnitude slower than WL with no gains in accuracy. In addition, as already pointed out, the WL kernel is a generalization of node degrees: these provide one-hop neighborhood information, whereas the WL kernel generalizes this to k-hop information.
4.
Then, we convert the test set to its symbolic representation and from that to the graph-based symbolic representation.
5.
Then, we create the kernel representation for the test set.
6.
The SVC classifier is trained on the precomputed kernel for the training set.
We also used the graph-based symbolic representation as input to a Graph Convolutional Network (GCN) [34].
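A condensed sketch of the workflow with GraKeL and scikit-learn follows; the chain-graph encoding (one node per SFA word occurrence, consecutive occurrences connected) is our reading of Figure 5, sfa_words is a hypothetical helper returning the SFA word sequence of a series, and X_train/X_test/y_train are assumed to hold the raw series and labels:

```python
from grakel import Graph
from grakel.kernels import WeisfeilerLehman, VertexHistogram
from sklearn.svm import SVC

def to_chain_graph(words):
    # one node per SFA word occurrence, labeled with the word;
    # consecutive occurrences are connected (steps 1-2 of the workflow)
    edges = [(i, i + 1) for i in range(len(words) - 1)]
    return Graph(edges, node_labels=dict(enumerate(words)))

G_train = [to_chain_graph(sfa_words(ts)) for ts in X_train]
G_test = [to_chain_graph(sfa_words(ts)) for ts in X_test]

# steps 3 and 5: WL kernel with Vertex Histogram as the base kernel
wl = WeisfeilerLehman(n_iter=2, base_graph_kernel=VertexHistogram,
                      normalize=True)
K_train = wl.fit_transform(G_train)    # kernel matrix of the training set
K_test = wl.transform(G_test)          # test-versus-train kernel matrix

# step 6: SVC on the precomputed kernel (C = 3.5 as in one instantiation)
svc = SVC(kernel="precomputed", C=3.5).fit(K_train, y_train)
y_pred = svc.predict(K_test)
```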

5.2. Proposed Networks for the Univariate Case

In this section we describe the neural network architectures we studied to exploit the symbolic representation of univariate time-series. In all of the proposed architectures, the input is constructed as follows:
1.
First, the raw time-series is converted to the SFA representation.
2.
Then, each of the resulting SFA words is converted to an integer according to the following scheme (see the sketch after this list).
(a)
The most common SFA word is assigned the integer 2.
(b)
The second most common SFA word is assigned the integer 3, and so on, in decreasing order of frequency.
(c)
The integer 0 is used for padding and the integer 1 is reserved for the Out-of-Vocabulary token.
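A minimal sketch of this encoding scheme (the vocabulary is built on the training set only; the helper names and the fixed-length padding are ours):

```python
from collections import Counter

PAD, OOV = 0, 1  # reserved integers: padding and Out-of-Vocabulary

def build_vocab(train_word_seqs):
    # rank SFA words by frequency: most common -> 2, second -> 3, ...
    counts = Counter(w for seq in train_word_seqs for w in seq)
    return {word: rank + 2
            for rank, (word, _) in enumerate(counts.most_common())}

def encode(seq, vocab, max_len):
    # map words to integers, truncating and padding to a fixed length
    ids = [vocab.get(w, OOV) for w in seq][:max_len]
    return ids + [PAD] * (max_len - len(ids))
```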
Figure 6 depicts the architecture of the LSTM-CNN neural network. The first layer is an embedding layer that takes the sequence of integers and creates embeddings of length 32. The embeddings are fed into a CNN layer followed by a MaxPooling layer; the CNN layer has 32 filters and a kernel size of 3. The output of the MaxPooling layer is fed into an LSTM layer, which is followed by a batch normalization layer that helps the network train faster and achieve better accuracy. Next, a Dropout layer prevents overfitting. Finally, a fully connected layer does the classification.
Figure 6. LSTM-CNN network.
Different CNN architectures could be used if we would like to make the proposed architecture deeper. However, keeping the architecture relatively shallow has positive effects on execution time. Also, according to our tests, making the architecture deeper did not necessarily help in terms of accuracy.
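A minimal Keras sketch of the LSTM-CNN network of Figure 6; the embedding length (32), the 32 filters, and the kernel size of 3 follow the text, whereas the LSTM width and dropout rate are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

def lstm_cnn(vocab_size, seq_len, n_classes):
    return keras.Sequential([
        layers.Input(shape=(seq_len,)),
        layers.Embedding(vocab_size, 32),            # embedding length 32
        layers.Conv1D(32, 3, activation="relu"),     # 32 filters, kernel size 3
        layers.MaxPooling1D(),
        layers.LSTM(64),                             # width: our assumption
        layers.BatchNormalization(),
        layers.Dropout(0.5),                         # rate: our assumption
        layers.Dense(n_classes, activation="softmax"),
    ])
```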
In our evaluation section, we compare our proposed architectures with ResNet; they prove significantly faster and more accurate.
Figure 7 shows the architecture of a network that combines a Bidirectional LSTM with a self-attention layer. A Bidirectional LSTM adds a second LSTM layer that processes the data in reverse (right-to-left), providing the model with a complete view of the context surrounding a given point in the sequence. The first layer is an embedding layer that takes the sequence of integers and creates embeddings of length 32. The embeddings are fed into the Bidirectional LSTM layer, which is followed by an attention layer, a batch normalization layer, and a dropout layer. Finally, a fully connected layer does the classification.
Figure 7. LSTM-Attention network.
Figure 8 shows a neural network architecture that augments the LSTM-Attention network with a CNN layer. The CNN layer is placed before the Bidirectional LSTM layer and is followed by a MaxPooling layer; it has 32 filters and a kernel size of 3. The purpose of the CNN layer is to improve accuracy, but also to significantly reduce execution time: it performs dimensionality reduction (via MaxPooling) and thus reduces the complexity of the layers that follow it.
Figure 8. LSTM-Attention-CNN network.
Finally, in Figure 9 we see the LSTM-Attention-CNN2 network, which augments the LSTM-Attention-CNN network with an additional CNN layer followed by a MaxPooling layer. The first CNN layer has 32 filters, the second has 16, and both have a kernel size of 3. The extra CNN layer aims to reduce execution time.
Figure 9. LSTM-Attention-CNN2 network.
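A corresponding sketch of LSTM-Attention-CNN2, reusing the SimpleAttention layer sketched in Section 4.4; again, the BiLSTM width and dropout rate are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers
# SimpleAttention is the attention layer sketched in Section 4.4

def lstm_attention_cnn2(vocab_size, seq_len, n_classes):
    return keras.Sequential([
        layers.Input(shape=(seq_len,)),
        layers.Embedding(vocab_size, 32),
        layers.Conv1D(32, 3, activation="relu"),     # first CNN: 32 filters
        layers.MaxPooling1D(),
        layers.Conv1D(16, 3, activation="relu"),     # second CNN: 16 filters
        layers.MaxPooling1D(),
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        SimpleAttention(),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),
    ])
```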
The algorithms proposed were implemented on top of pyts [54], GraKeL [55], scikit-learn [56], Keras [57] and StellarGraph [58].

5.3. Merging Strategies for the Multivariate Use Case

In the case of multivariate time-series, there are two strategies for handling the multiple attributes: the BAG and the STACK strategies, which are used both in the proposed deep learning methods and in the proposed extension of SCALE-BOSS-MR (SCALE-BOSS-MR-MV, or SBMR-MV). More concretely, suppose we have a time-series with two attributes, Attr0 and Attr1, where Attr0 contains the sequence "aaaa bbbb" and Attr1 contains the sequence "cccc dddd". In the BAG case we create the single sequence "aaaa bbbb cccc dddd" and then either create the Term-Frequency vector (in the SBMR-MV case) or pass it into the embedding layer (in the deep learning case). In the STACK case we create a term frequency vector (or a vectorized representation produced by an embedding layer) for Attr0 and another for Attr1, and then concatenate the two vectorized representations horizontally (column-wise) to obtain the final vectorized representation.
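A schematic sketch of the two strategies for the term-frequency case; the vocabulary handling is simplified (in practice the vocabularies come from the training set):

```python
import numpy as np

def bag_tf(attr_word_seqs, shared_vocab):
    """BAG: pool the words of all attributes into one shared vocabulary."""
    merged = [w for seq in attr_word_seqs for w in seq]
    return np.array([merged.count(w) for w in shared_vocab], dtype=float)

def stack_tf(attr_word_seqs, per_attr_vocabs):
    """STACK: one term-frequency vector per attribute, concatenated
    horizontally (column-wise)."""
    parts = [np.array([seq.count(w) for w in vocab], dtype=float)
             for seq, vocab in zip(attr_word_seqs, per_attr_vocabs)]
    return np.concatenate(parts)

# If both attributes contain the word "aaaa" (cf. the example in Section 5.5),
# BAG counts every occurrence in a shared vocabulary, whereas STACK keeps the
# per-attribute counts separate:
attrs = [["aaaa", "bbbb"], ["aaaa", "dddd"]]
print(bag_tf(attrs, ["aaaa", "bbbb", "dddd"]))                 # [2. 1. 1.]
print(stack_tf(attrs, [["aaaa", "bbbb"], ["aaaa", "dddd"]]))   # [1. 1. 1. 1.]
```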

5.4. Proposed Multivariate Neural Networks

In the BAG case, each attribute of a multivariate time-series is transformed into its symbolic representation, the symbolic representations are fused into a single sequence, and the fused sequence is passed to an embedding layer. In the BAG case, the relationships between attributes are not retained.
From then on the process is the same as for the univariate time-series.
Figure 10 shows the workflow for LSTM-Attention-CNN2 with the BAG workflow.
Figure 10. Depiction of the BAG version of LSTM-Attention-CNN2 neural network.
In the STACK case, each attribute is transformed into a symbolic representation, which is fed into an embedding layer. Then, the embeddings are concatenated to form the final embedding of the input sequence.
In Figure 11 we can see the workflow for LSTM-Attention-CNN2 with the STACK workflow.
Figure 11. Depiction of the STACK version of LSTM-Attention-CNN2 neural network.
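A sketch of the STACK front end in the Keras functional API (one embedding per attribute, concatenated along the feature axis); the rest of the network is attached to the concatenated embedding exactly as in the univariate sketches:

```python
from tensorflow import keras
from tensorflow.keras import layers

def stack_front_end(vocab_sizes, seq_len):
    """One integer-encoded input and embedding per attribute; the embeddings
    are concatenated to form the final embedding of the input sequence."""
    inputs, embedded = [], []
    for a, vocab_size in enumerate(vocab_sizes):
        inp = keras.Input(shape=(seq_len,), name=f"attr{a}")
        inputs.append(inp)
        embedded.append(layers.Embedding(vocab_size, 32)(inp))
    merged = layers.Concatenate(axis=-1)(embedded)
    return inputs, merged   # attach CNN/BiLSTM/attention layers to `merged`
```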

5.5. The SCALE-BOSS-MR-MV (SBMR-MV) Algorithm

Before diving deeper into the specifics of SBMR-MV, we explain the merging strategies used in SBMR-MV in more detail. More concretely, in the BAG case the vocabularies of all attributes of the time-series are merged into a single vocabulary, and then a single term frequency vector is created. For example, the word aaaa for attribute 0 and the word aaaa for attribute 1 end up in the same bag, which then has a term frequency of 2 for time-series 0: the word aaaa has a term frequency of 1 for attribute 0 and a term frequency of 1 for attribute 1, and when the two are merged into the final Term-Frequency vector of the BAG strategy, we get a value of 2. The BAG merging strategy is shown in Figure 12.
Figure 12. BAG merging strategy example for SBMR-MV.
In the STACK case, each attribute is transformed into its symbolic representation, and a Term Frequency vector is created for each attribute. The term frequency vectors are then stacked column-wise to produce the final term frequency vector. In contrast to the BAG strategy, the vocabularies for each attribute remain distinct. For example, the word aaaa for attribute 0 is denoted "aaaa Atr0" and is different from the word aaaa for attribute 1, which is denoted "aaaa Atr1". In the STACK merging strategy, both "aaaa Atr0" and "aaaa Atr1" get a Term-Frequency value of 1 for time-series 0.
The STACK merging strategy is shown in Figure 13.
Figure 13. STACK merging strategy example.
The final Term Frequency vector is then passed to the classifier and the procedure from then on is the same as in SCALE-BOSS-MR.
Algorithm 1 describes the SCALE-BOSS-MR-MV (SBMR-MV) process in detail. Lines 1 through 16 describe the inner method of SBMR-MV.
In lines 4 to 7, we apply dilation to each attribute of the input time-series.
Then, in lines 8 through 12, we create the term frequency vector and transform it either using bag or stack merging strategy. Then, in line 13 we merge the current term frequency vector with the global term frequency vector to create the final term frequency vector that will be passed to the classifier. In lines 18 and 19, we create the global term frequency vectors for the train and test sets, respectively. In line 20 we run SBMR-MV-inner. This is a helper function for creating the term frequency vector from the input time-series for the training set. In line 23 we run the SBMR-MV-inner for the first-order differences of the training set. In line 24 we fit the classifier with the term frequency vector of the training set. In line 25 we run SBMR-MV inner for the test set. In line 28 we run the SBMR-MV-inner for the first-order differences of the test set. Finally, in line 29 we get the predicted values for the test set.
The algorithm has two sets of hyperparameters: the window sizes and the dilation sizes. The window size allows SBMR-MV to see the input at different granularities, similarly to WEASEL. Dilation allows the algorithm to see parts of the input that are further ahead, as illustrated in the sketch below.
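The sketch below shows one common reading of dilated windowing (keep every d-th sample inside each window); the exact mechanics in SBMR-MV follow SCALE-BOSS-MR [15] and may differ in detail:

```python
def dilated_windows(series, window, dilation, step=1):
    """Yield windows of `window` samples that skip `dilation - 1` samples
    between consecutive samples, sliding by `step`."""
    span = (window - 1) * dilation + 1   # samples covered by one window
    for start in range(0, len(series) - span + 1, step):
        yield series[start:start + span:dilation]

# window=4, dilation=2 on [0..9]: [0,2,4,6], [1,3,5,7], [2,4,6,8], [3,5,7,9]
print(list(dilated_windows(list(range(10)), window=4, dilation=2)))
```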
Algorithm 1: SCALE-BOSS-MR-MV (SBMR-MV) algorithm.

6. Evaluation

6.1. Experimental Setting

We have chosen mean accuracy and mean execution time as our evaluation measures.
We define execution time (Total_time) as the total execution time (training time plus test time) of each method for each dataset. Total_Time_Mean is the mean execution time across all datasets. Accuracy_Mean is the mean accuracy across all datasets.
We did so because all of the methods in the literature [15,18,19,20,24,25] use accuracy as a measure of qualitative performance. In addition, we used mean execution time as a measure of computational efficiency.
Our tests were performed on a mid-2015 MacBook with a 2.2 GHz Intel Core i7 and 16 GB of RAM.
Table 1 shows the characteristics of the datasets used for the evaluation of the univariate case. We have chosen the eight datasets from the UCR archive [59] with the largest training sets.
Table 1. Characteristics of the datasets.
In Table 2 we see the characteristics of the multivariate datasets.
Table 2. Description of the multivariate datasets.
We have chosen five UEA multivariate datasets, selecting datasets that have a large enough training set, contain no null values, and have at least 40 timestamps.
In our evaluation, we measured the following:
1.
The performance, in terms of accuracy and execution time for SCALE-BOSS-GRAPH for the univariate use case.
2.
The performance, in terms of accuracy and execution time of the proposed SFA-enhanced neural network classifiers in the univariate use case.
3.
The performance, in terms of accuracy and execution time of the proposed SFA-enhanced neural network classifiers in the multivariate use case.
4.
The performance, in terms of accuracy and execution time of the adaptation of SCALE-BOSS-MR (SCALE-BOSS-MR-MV) to the multivariate use case.
We provide the detailed Accuracy and Total Execution Time for each algorithm and dataset in the Appendix A of the paper.

6.2. SCALE-BOSS Results

Table 3 shows the mean accuracy and mean execution time for a selection of SCALE-BOSS instantiations used as a baseline to judge the performance of SCALE-BOSS-GRAPH. The selected instantiations were chosen because they are the best in terms of accuracy and execution time.
Table 3. Accuracy and total execution time of the SCALE-BOSS classifiers.

6.3. State-of-the-Art Algorithm Results

Table 4 shows the mean accuracy and mean execution time for the state-of-the-art algorithms.
Table 4. Accuracy and execution time of state-of-the-art algorithms.
We have chosen WEASEL 2.0 and MRSQM as state-of-the-art algorithms that use symbolic representations, and Rocket and miniRocket as state-of-the-art algorithms in terms of accuracy, which are also quite efficient in terms of execution time.
The mean accuracy and mean execution time for the state-of-the-art algorithms are used as a yardstick of success to compare the algorithms proposed. In bold we can see the algorithm with the best accuracy and the algorithm with the best execution time.

6.4. SCALE-BOSS-GRAPH Results

Table 5 shows the mean accuracy and mean execution time for several instantiations of the SCALE-BOSS-GRAPH framework.
Table 5. Mean accuracy and execution time of the proposed algorithms.
The SVC classifier has two main parameters: (a) the number of iterations and (b) the regularization parameter C.
In the first row, we can see the result of an SVC classifier with 2 iterations, using cross-validation over three C values (2.0, 3.0, and 3.5). In the second row, we can see the results for an SVC classifier with 2 iterations and a C value of 3.5. In the third row, we can see the result of an SVC classifier with 2 iterations and a C value of 1.0.
Table 5 answers RQ1. This question aims to answer whether graph kernel-based algorithms fused with SFA representation are accurate and efficient. As we can see in Table 5, the WL kernel together with the SFA representation and a kernel SVM is moderately accurate (it is less accurate than MB-K-BOSS-VS) with good execution time (twice the execution time of BOSS-RF and K-BOSS-VS, but significantly less than that of other state-of-the-art algorithms).

6.5. Evaluation of the Deep Neural Network Architectures

As part of our evaluation, we used the deep learning architectures provided by aeon [60] as a baseline to compare against our own.
Table 6 shows the average accuracy and execution time of the state-of-the-art neural classifiers. The ep20 notation means that the algorithm ran for 20 epochs, while the no-bn notation means that there is no batch normalization layer. The IndividualLite classifier trained for 100 epochs achieves the best accuracy with a reasonable execution time. The CNNClassifier trained for 200 epochs achieves very high accuracy with the best trade-off between accuracy and execution time.
Table 6. Accuracy and total execution time of the neural classifiers.
In Table 7 we can see the average accuracy and average execution time for the SFA-enhanced neural networks. The LSTM-attention-cnn2 trained for 50 epochs achieves the best accuracy with a very competitive execution time. The LSTM-attention-cnn trained for 50 epochs follows in terms of accuracy, but with double the execution time. The LSTM-attention-cnn2 trained for 20 epochs achieves very high accuracy with the best execution time. The addition of batch normalization gives an improvement in accuracy of 4% for the 50 epoch case. The Graph Convolutional Neural network (GCN) achieves the worst performance with relatively high execution time compared to the other networks.
Table 7. Accuracy and total execution time of the SFA-enhanced neural classifiers.
Table 7 answers RQ2, whether using GCN instead of the WL kernel is more efficient. As we can see, GCN is only slightly worse in accuracy compared to the WL kernel (as shown in Table 5) but is significantly worse in terms of execution time.
The rest of Table 7 answers RQ3 on how the proposed neural network architectures compare against the state-of-the-art neural network classifiers. As we can see, the proposed neural network architectures score the same or better in terms of accuracy compared to the state-of-the-art methods, but are significantly more efficient in terms of execution time.
In Figure 14 we can see the average accuracy and the average execution time for the SFA-enhanced neural classifiers. The SFA-enhanced neural classifiers are the classifiers we created by fusing the SFA symbolic representation and neural networks. LSTM-attention-cnn2-ep50 is the best in terms of balancing execution time and accuracy. It achieves the best accuracy with a modest 157 seconds of mean execution time. The same model trained for 20 epochs follows closely with 1% lower mean accuracy but with the best execution time.
Figure 14. Execution time and accuracy for the neural classifiers.
Figure 15 shows the average accuracy and average execution time for the state-of-the-art neural classifiers.
Figure 15. Execution time and accuracy for the state-of-the-art neural classifiers.

6.6. Multivariate Evaluation

We have chosen to reuse the windowing and dilation configurations that proved to be the best in SCALE-BOSS-MR [15]. These are shown in Table 8 and Table 9.
Table 8. Window configurations.
Table 9. Dilation configurations.
The W0 configuration has a single window size with step 1, which is the baseline configuration. The W8 configuration has many window sizes with step 1; this configuration is the most compute-intensive but gave the best results in terms of accuracy in the SCALE-BOSS-MR case. The W11, W14 and W15 are very similar in terms of window size and the main difference is in the window step parameter.
Regarding dilation, larger dilation sizes mean that the algorithm can look further ahead in time, and having multiple dilation sizes in theory gives the algorithm multiple resolutions of the time-series. The D0 configuration has dilation 1, which means no dilation; it serves as a baseline for assessing the effect of dilation in the other configurations. D2 has the most dilation sizes (4 in total), aiming to provide the best accuracy. D4 and D6 have a single dilation size besides 1.
In Table 10 we see the mean execution time and mean accuracy for the novel neural network architectures. As we can see, adding self-attention gives a 7% boost in accuracy, and the STACK strategy is significantly more accurate than the BAG strategy.
Table 10. Proposed neural network methods’ mean accuracy and execution time.
Table 11 provides the mean execution time and mean accuracy for the state-of-the-art neural networks. The CNNClassifier is really scalable and achieves good accuracy compared to other neural network methods. We can see that our proposed neural networks are significantly more accurate than the state-of-the-art classifiers.
Table 11. State-of-the-art neural network methods’ mean accuracy and execution time.
In Table 12 we see the mean execution time and mean accuracy for the single window SCALE-BOSS-MR Multivariate (SBMR-MV).
Table 12. SCALE-BOSS-MR Multivariate (SBMR-MV) mean accuracy and execution time for a single window.
SBMR-MV-ridge-cv-W0-trend-D2-BG-STACK achieves the best accuracy but at the expense of execution time.
On the other hand, SBMR-MV-RF-W0-trend-D4-BG-STACK and SBMR-MV-ridge-cv-W0-trend-D4-BG-STACK achieve similar accuracy with significantly lower execution time. SBMR-MV-RF-W0-trend-D0-UG-STACK achieves an accuracy of 0.657 with a very low execution time. SBMR-MV-RF-W0-noTrend-D0-UG-STACK does not contain any trend information and gives an accuracy of 0.65. The above results show that adding trend information and dilation helps improve accuracy.
In Table 13 we see the mean execution time and mean accuracy for the multiple-window SBMR-MV. This table shows the effect of using multiple windows and the chi-squared test for feature selection on top of dilation and trend information.
Table 13. SCALE-BOSS-MR Multivariate (SBMR-MV) mean accuracy and execution time for multiple windows.
SBMR-MV-ridge-cv-W15-trend-D2-BG-BAG-chi gives the best accuracy of 0.733 while being relatively scalable. SBMR-MV-ridge-cv-W15-trend-D2-BG-STACK-chi is not far behind, but with higher execution time. SBMR-MV-ridge-cv-W15-trend-D6-BG-STACK-chi is also very close, with an accuracy of 0.729, while being significantly more scalable compared to other window configurations. SBMR-MV-ridge-cv-W15-trend-D6-BG-BAG-chi follows with an accuracy of 0.725 while being even more scalable than SBMR-MV-ridge-cv-W15-trend-D6-BG-STACK-chi. SBMR-MV-ridge-cv-W15-trend-D6-BG-BAG achieves an accuracy of 0.713, which shows that the chi-squared test gives a slight bump in accuracy when using multiple windows.
We can also observe that the W15 configuration is generally better than W11 while also being significantly more scalable. W14 is not better than W11 or W15 in terms of either execution time or accuracy. W8 has by far the slowest execution time while not producing better results.
Table 12 and Table 13 answer RQ4 on how SCALE-BOSS-MR can be adapted to the multivariate use case and how it performs in terms of accuracy and execution time.
Table 14 shows the mean execution time and the mean accuracy for the Rocket family of classifiers. Both ROCKET and miniRocket proved to be very accurate, but ROCKET is significantly less scalable (in terms of execution time) than SBMR-MV.
Table 14. Rocket method mean accuracy and execution time.

7. Conclusions

During our evaluation, we arrived at the following conclusions:
1.
The SCALE-BOSS-GRAPH algorithm proved very scalable, but its accuracy is lower than the state of the art.
2.
The GCNs were on par in terms of accuracy with the state-of-the-art deep learning methods, but were not that scalable compared to other methods.
3.
The proposed deep learning methods inspired by the text classification domain proved very effective in terms of both accuracy and execution time when compared with state-of-the-art deep learning methods, in the univariate as well as the multivariate use case. The optimizations we employed were instrumental for this result: batch normalization gave a 5% improvement in accuracy over the baseline, whereas adding an attention layer to the model gave a 10% increase in accuracy, at a non-negligible cost in execution time. Adding convolutional layers to the architecture did not contribute much to accuracy but proved very beneficial in terms of execution time.
4.
The adaptation of SCALE-BOSS-MR to the multivariate use case, namely SCALE-BOSS-MR-MV, proved very scalable and accurate compared to the state of the art. More specifically, SCALE-BOSS-MR-MV proved to be more scalable than ROCKET while being marginally less accurate.
For future work, we plan to try transformer architectures fused with the SFA symbolic representation. Another line of research would be to try other Graph Neural Networks with more complex graph representations fused with the SFA symbolic representation.

Author Contributions

Conceptualization, A.G. and G.V.; methodology, A.G. and G.V.; software, A.G.; validation, A.G. and G.V.; formal analysis, A.G. and G.V.; investigation, A.G.; resources, not applicable; data curation, A.G.; writing—original draft preparation, A.G. and G.V.; writing—review and editing, A.G. and G.V.; visualization, A.G. and G.V.; supervision, G.V.; project administration, G.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Publicly available datasets were analyzed in this study. The UCR and UEA datasets used in this study can be found at: http://www.timeseriesclassification.com/dataset.php (accessed on 24 November 2025) and https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/ (accessed on 24 November 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

The appendix shows detailed accuracy and total execution time results for all the methods and datasets. The detailed tables help compare the proposed methods with other methods proposed in the literature, as well as compare the proposed methods against each other on a per-dataset basis.
In Table A1 we can see the accuracy of the instantiations of SCALE-BOSS-GRAPH for all the datasets.
In Table A2 we can see the execution time of the instantiations of SCALE-BOSS-GRAPH for all the datasets.
In Table A3 we can see the accuracy of the state-of-the-art neural classifiers for all the datasets.
In Table A4 we can see the execution time of the state-of-the-art neural classifiers for all the datasets.
In Table A5 we can see the accuracy of the SFA-enhanced neural implementation for all the datasets.
Crop, NonInvasiveFetalECGThorax2 and NonInvasiveFetalECGThorax1 have a class count higher than or equal to 8. Both LSTM-attention-cnn2-ep20 and LSTM-attention-cnn2-ep50 have accuracy higher than or equal to the best state-of-the-art neural classifiers. This shows that our proposed method works well with datasets with a high class count.
In Table A6 we can see the execution time of the SFA-enhanced neural implementation for all the datasets.
In Table A7 we see the accuracy for the single-window SBMR across all datasets.
ArticularyWordRecognition and UWaveGestureLibrary have a class count higher than or equal to 8. We can see that the proposed algorithm can achieve high accuracy even with a single window. It is worth noting that the accuracy on those datasets is sometimes lower than on datasets with a lower class count. This suggests that (a) the proposed algorithm works well with datasets with a high class count and (b) the class count is not a deciding factor for classification accuracy.
In Table A8 we see the execution time for single-window SBMR across all datasets.
In Table A9 we see the accuracy for the multiple-window SBMR across all datasets.
We can see that SBMR-MV with multiple window sizes achieves near-perfect accuracy on the datasets with a high class count. This also reinforces the argument that a high class count is not a good indicator of the classification difficulty of a dataset.
In Table A10 we see the execution time for multiple-window SBMR across all datasets.
In Table A11 we see the accuracy for the novel neural network architectures across all datasets.
We can see that the proposed algorithms, and especially LSTM-attention-cnn2-ep50-stack, perform really well on datasets with a high class count, especially compared with the state-of-the-art neural network architectures. We can also observe that the datasets with a high class count are not the most difficult to classify.
In Table A12 we see the execution time for novel neural networks across all datasets.
In Table A13 we see the accuracy for the state-of-the-art neural network architectures across all datasets.
In Table A14 we see the execution time for state-of-the-art neural networks across all datasets.
In Table A15 we see the accuracy for the Rocket family of algorithms across all datasets.
In Table A16 we see the execution time for the Rocket family of algorithms across all datasets.
Table A1. Accuracy of the instantiations of SCALE-BOSS-GRAPH.

| Algorithm | Crop | FordA | FordB | HandOutlines | NonInvasiveFetalECGThorax1 | NonInvasiveFetalECGThorax2 | PhalangesOutlinesCorrect | TwoPatterns |
|---|---|---|---|---|---|---|---|---|
| SVC-2_1.0 | 0.605 | 0.813 | 0.598 | 0.700 | 0.534 | 0.699 | 0.704 | 0.786 |
| SVC-2_3.5 | 0.612 | 0.811 | 0.620 | 0.695 | 0.588 | 0.732 | 0.726 | 0.794 |
| SVCcv-2_2.0_3.0_3.5 | 0.612 | 0.811 | 0.617 | 0.711 | 0.588 | 0.735 | 0.723 | 0.793 |
Table A2. Execution time (in seconds) of the instantiations of SCALE-BOSS-GRAPH.

| Algorithm | Crop | FordA | FordB | HandOutlines | NonInvasiveFetalECGThorax1 | NonInvasiveFetalECGThorax2 | PhalangesOutlinesCorrect | TwoPatterns |
|---|---|---|---|---|---|---|---|---|
| SVC-2_1.0 | 29.307 | 26.350 | 22.957 | 29.297 | 26.089 | 24.783 | 2.726 | 7.243 |
| SVC-2_3.5 | 29.857 | 26.383 | 23.948 | 27.087 | 24.552 | 24.667 | 2.413 | 6.966 |
| SVCcv-2_2.0_3.0_3.5 | 46.939 | 29.536 | 27.208 | 28.834 | 27.345 | 26.861 | 2.990 | 7.879 |
Table A3. Accuracy of the state-of-the-art neural network classifiers.

| Algorithm | Crop | FordA | FordB | HandOutlines | NonInvasiveFetalECGThorax1 | NonInvasiveFetalECGThorax2 | PhalangesOutlinesCorrect | TwoPatterns |
|---|---|---|---|---|---|---|---|---|
| CNNClassifier-ep20 | 0.049 | 0.510 | 0.509 | 0.859 | 0.018 | 0.018 | 0.613 | 0.344 |
| CNNClassifier-ep200 | 0.599 | 0.905 | 0.759 | 0.884 | 0.790 | 0.856 | 0.650 | 0.909 |
| EncoderClassifier-ep20 | 0.664 | 0.942 | 0.801 | 0.900 | 0.831 | 0.848 | 0.657 | 0.847 |
| FCNClassifier-ep20 | 0.505 | 0.896 | 0.752 | 0.641 | 0.030 | 0.042 | 0.667 | 0.838 |
| IInception-ep20 | 0.696 | 0.927 | 0.704 | 0.359 | 0.257 | 0.500 | 0.696 | 1.000 |
| IndividualLITEClassifier-ep100 | 0.720 | 0.948 | 0.809 | 0.781 | 0.589 | 0.883 | 0.828 | 1.000 |
| IndividualLITEClassifier-ep20 | 0.649 | 0.906 | 0.833 | 0.665 | 0.304 | 0.250 | 0.775 | 0.690 |
| IndividualLITEClassifier-ep50 | 0.706 | 0.937 | 0.775 | 0.822 | 0.537 | 0.683 | 0.700 | 1.000 |
| MLPClassifier-ep20 | 0.067 | 0.568 | 0.506 | 0.722 | 0.032 | 0.046 | 0.613 | 0.270 |
| MLPClassifier-ep200 | 0.270 | 0.698 | 0.598 | 0.854 | 0.205 | 0.219 | 0.614 | 0.484 |
| ResNet-ep20 | 0.652 | 0.928 | 0.716 | 0.668 | 0.371 | 0.700 | 0.611 | 0.999 |
Table A4. Execution time (in seconds) of the state-of-the-art neural network classifiers.

| Algorithm | Crop | FordA | FordB | HandOutlines | NonInvasiveFetalECGThorax1 | NonInvasiveFetalECGThorax2 | PhalangesOutlinesCorrect | TwoPatterns |
|---|---|---|---|---|---|---|---|---|
| CNNClassifier-ep20 | 9.292 | 8.740 | 8.856 | 8.097 | 6.642 | 6.664 | 2.929 | 2.835 |
| CNNClassifier-ep200 | 62.032 | 81.318 | 80.168 | 88.313 | 66.284 | 61.911 | 20.709 | 18.079 |
| EncoderClassifier-ep20 | 543.388 | 1464.732 | 1477.982 | 2208.398 | 1173.432 | 1165.267 | 186.202 | 147.210 |
| FCNClassifier-ep20 | 107.983 | 547.794 | 546.054 | 908.470 | 473.204 | 410.372 | 48.388 | 45.022 |
| IInception-ep20 | 405.621 | 1622.802 | 1674.065 | 2536.508 | 1170.444 | 1210.527 | 177.632 | 144.673 |
| IndividualLITEClassifier-ep100 | 277.578 | 836.207 | 888.090 | 1347.996 | 641.307 | 631.699 | 91.441 | 80.029 |
| IndividualLITEClassifier-ep20 | 64.009 | 186.180 | 187.645 | 305.871 | 145.600 | 146.959 | 23.528 | 21.885 |
| IndividualLITEClassifier-ep50 | 148.385 | 482.603 | 495.227 | 725.426 | 338.380 | 348.790 | 49.652 | 42.970 |
| MLPClassifier-ep20 | 15.172 | 9.065 | 8.830 | 7.562 | 6.117 | 6.087 | 4.518 | 3.489 |
| MLPClassifier-ep200 | 139.296 | 83.352 | 83.041 | 64.771 | 51.888 | 50.578 | 36.980 | 23.391 |
| ResNet-ep20 | 316.447 | 1341.227 | 1398.291 | 1800.079 | 979.298 | 953.220 | 146.271 | 116.175 |
Table A5. Accuracy of the SFA-enhanced neural network classifiers.

| Algorithm | Crop | FordA | FordB | HandOutlines | NonInvasiveFetalECGThorax1 | NonInvasiveFetalECGThorax2 | PhalangesOutlinesCorrect | TwoPatterns |
|---|---|---|---|---|---|---|---|---|
| LSTM-CNN-ep20 | 0.602 | 0.836 | 0.720 | 0.603 | 0.544 | 0.688 | 0.742 | 0.940 |
| LSTM-CNN-ep50 | 0.614 | 0.880 | 0.709 | 0.643 | 0.523 | 0.700 | 0.726 | 0.950 |
| LSTM-CNN-no-bn-ep20 | 0.612 | 0.900 | 0.683 | 0.646 | 0.343 | 0.450 | 0.721 | 0.969 |
| LSTM-CNN-no-bn-ep50 | 0.625 | 0.855 | 0.662 | 0.659 | 0.377 | 0.538 | 0.755 | 0.967 |
| LSTM-attention-cnn-ep20 | 0.614 | 0.908 | 0.731 | 0.854 | 0.720 | 0.819 | 0.753 | 0.935 |
| LSTM-attention-cnn-ep50 | 0.615 | 0.902 | 0.735 | 0.881 | 0.751 | 0.845 | 0.719 | 0.935 |
| LSTM-attention-cnn2-ep20 | 0.594 | 0.911 | 0.742 | 0.889 | 0.722 | 0.820 | 0.758 | 0.929 |
| LSTM-attention-cnn2-ep50 | 0.599 | 0.907 | 0.744 | 0.903 | 0.785 | 0.838 | 0.732 | 0.933 |
| LSTM-attention-ep20 | 0.614 | 0.903 | 0.740 | 0.881 | 0.692 | 0.777 | 0.718 | 0.941 |
| LSTM-attention-ep50 | 0.614 | 0.909 | 0.737 | 0.878 | 0.768 | 0.832 | 0.712 | 0.930 |
| gcn-ep20 | 0.525 | 0.912 | 0.730 | 0.676 | 0.582 | 0.606 | 0.642 | 0.772 |
Table A6. Execution time (in seconds) of the SFA-enhanced neural network classifiers.

| Algorithm | Crop | FordA | FordB | HandOutlines | NonInvasiveFetalECGThorax1 | NonInvasiveFetalECGThorax2 | PhalangesOutlinesCorrect | TwoPatterns |
|---|---|---|---|---|---|---|---|---|
| LSTM-CNN-ep20 | 29.245 | 106.697 | 145.504 | 321.038 | 158.140 | 156.379 | 12.654 | 21.750 |
| LSTM-CNN-ep50 | 33.414 | 130.198 | 139.316 | 277.145 | 236.521 | 260.698 | 14.464 | 19.466 |
| LSTM-CNN-no-bn-ep20 | 25.209 | 148.240 | 157.016 | 254.593 | 130.238 | 131.012 | 14.881 | 18.473 |
| LSTM-CNN-no-bn-ep50 | 45.386 | 200.621 | 108.940 | 463.126 | 292.452 | 295.635 | 20.117 | 25.116 |
| LSTM-attention-cnn-ep20 | 39.236 | 177.046 | 204.562 | 479.402 | 231.187 | 227.560 | 20.807 | 25.905 |
| LSTM-attention-cnn-ep50 | 46.482 | 204.554 | 164.150 | 1176.196 | 339.760 | 447.863 | 18.086 | 23.698 |
| LSTM-attention-cnn2-ep20 | 26.532 | 89.626 | 101.880 | 246.840 | 127.765 | 130.393 | 13.175 | 17.661 |
| LSTM-attention-cnn2-ep50 | 28.055 | 104.636 | 102.962 | 456.134 | 250.488 | 280.236 | 14.411 | 19.087 |
| LSTM-attention-ep20 | 67.502 | 554.118 | 364.036 | 1097.373 | 490.547 | 484.485 | 35.522 | 42.412 |
| LSTM-attention-ep50 | 100.004 | 409.937 | 497.081 | 1666.853 | 1145.013 | 970.118 | 38.963 | 55.774 |
| gcn-ep20 | 126.940 | 434.270 | 410.670 | 2714.180 | 437.547 | 425.622 | 24.379 | 38.267 |
Table A7. SBMR-MV accuracy for a single window across all datasets.

| Algorithm | ArticularyWordRecognition | EthanolConcentration | FingerMovements | SelfRegulationSCP1 | UWaveGestureLibrary |
|---|---|---|---|---|---|
| SBMR-MV-RF-W0-noTrend-D0-UG-BAG | 0.807 | 0.350 | 0.500 | 0.727 | 0.472 |
| SBMR-MV-RF-W0-noTrend-D0-UG-STACK | 0.953 | 0.399 | 0.500 | 0.734 | 0.666 |
| SBMR-MV-RF-W0-trend-D0-UG-BAG | 0.873 | 0.357 | 0.620 | 0.737 | 0.569 |
| SBMR-MV-RF-W0-trend-D0-UG-STACK | 0.963 | 0.285 | 0.570 | 0.765 | 0.700 |
| SBMR-MV-RF-W0-trend-D2-BG-BAG | 0.833 | 0.357 | 0.510 | 0.792 | 0.744 |
| SBMR-MV-RF-W0-trend-D2-BG-STACK | 0.957 | 0.331 | 0.480 | 0.819 | 0.831 |
| SBMR-MV-RF-W0-trend-D4-BG-BAG | 0.870 | 0.327 | 0.550 | 0.751 | 0.706 |
| SBMR-MV-RF-W0-trend-D4-BG-STACK | 0.963 | 0.354 | 0.550 | 0.751 | 0.838 |
| SBMR-MV-RF-W0-trend-D6-UG-BAG | 0.870 | 0.331 | 0.500 | 0.758 | 0.725 |
| SBMR-MV-RF-W0-trend-D6-UG-STACK | 0.950 | 0.331 | 0.440 | 0.788 | 0.828 |
| SBMR-MV-ridge-cv-W0-noTrend-D0-UG-BAG | 0.823 | 0.274 | 0.470 | 0.713 | 0.463 |
| SBMR-MV-ridge-cv-W0-noTrend-D0-UG-STACK | 0.990 | 0.316 | 0.480 | 0.706 | 0.653 |
| SBMR-MV-ridge-cv-W0-trend-D0-UG-BAG | 0.873 | 0.308 | 0.490 | 0.713 | 0.497 |
| SBMR-MV-ridge-cv-W0-trend-D0-UG-STACK | 0.983 | 0.323 | 0.480 | 0.734 | 0.697 |
| SBMR-MV-ridge-cv-W0-trend-D2-BG-BAG | 0.983 | 0.312 | 0.520 | 0.826 | 0.769 |
| SBMR-MV-ridge-cv-W0-trend-D2-BG-STACK | 0.987 | 0.274 | 0.500 | 0.850 | 0.891 |
| SBMR-MV-ridge-cv-W0-trend-D4-BG-BAG | 0.963 | 0.308 | 0.500 | 0.737 | 0.722 |
| SBMR-MV-ridge-cv-W0-trend-D4-BG-STACK | 0.993 | 0.312 | 0.500 | 0.761 | 0.856 |
| SBMR-MV-ridge-cv-W0-trend-D6-UG-BAG | 0.937 | 0.281 | 0.500 | 0.761 | 0.753 |
Table A8. SBMR-MV execution time (in seconds) for a single window across all datasets.

| Algorithm | ArticularyWordRecognition | EthanolConcentration | FingerMovements | SelfRegulationSCP1 | UWaveGestureLibrary |
|---|---|---|---|---|---|
| SBMR-MV-RF-W0-noTrend-D0-UG-BAG | 3.105 | 14.333 | 2.333 | 12.768 | 1.810 |
| SBMR-MV-RF-W0-noTrend-D0-UG-STACK | 3.242 | 14.439 | 2.543 | 12.773 | 1.939 |
| SBMR-MV-RF-W0-trend-D0-UG-BAG | 5.992 | 26.609 | 4.489 | 25.454 | 3.495 |
| SBMR-MV-RF-W0-trend-D0-UG-STACK | 6.176 | 27.464 | 4.869 | 25.030 | 3.543 |
| SBMR-MV-RF-W0-trend-D2-BG-BAG | 27.984 | 116.769 | 21.084 | 119.072 | 16.017 |
| SBMR-MV-RF-W0-trend-D2-BG-STACK | 29.462 | 113.139 | 25.120 | 113.507 | 16.038 |
| SBMR-MV-RF-W0-trend-D4-BG-BAG | 13.639 | 58.592 | 10.274 | 57.004 | 7.791 |
| SBMR-MV-RF-W0-trend-D4-BG-STACK | 14.583 | 57.821 | 12.483 | 56.129 | 7.905 |
| SBMR-MV-RF-W0-trend-D6-UG-BAG | 12.494 | 54.386 | 9.202 | 52.044 | 7.275 |
| SBMR-MV-RF-W0-trend-D6-UG-STACK | 12.421 | 51.986 | 9.732 | 50.525 | 7.008 |
| SBMR-MV-ridge-cv-W0-noTrend-D0-UG-BAG | 3.029 | 14.389 | 2.269 | 12.406 | 1.756 |
| SBMR-MV-ridge-cv-W0-noTrend-D0-UG-STACK | 3.084 | 14.438 | 2.410 | 12.546 | 1.756 |
| SBMR-MV-ridge-cv-W0-trend-D0-UG-BAG | 5.856 | 26.680 | 4.375 | 24.617 | 3.384 |
| SBMR-MV-ridge-cv-W0-trend-D0-UG-STACK | 6.158 | 26.524 | 4.742 | 25.588 | 3.454 |
| SBMR-MV-ridge-cv-W0-trend-D2-BG-BAG | 27.256 | 111.711 | 20.511 | 113.173 | 15.684 |
| SBMR-MV-ridge-cv-W0-trend-D2-BG-STACK | 29.062 | 112.905 | 24.957 | 112.336 | 15.853 |
| SBMR-MV-ridge-cv-W0-trend-D4-BG-BAG | 13.406 | 57.022 | 10.161 | 55.374 | 7.650 |
| SBMR-MV-ridge-cv-W0-trend-D4-BG-STACK | 14.341 | 56.462 | 12.269 | 56.234 | 7.807 |
| SBMR-MV-ridge-cv-W0-trend-D6-UG-BAG | 11.783 | 51.678 | 8.750 | 50.138 | 6.808 |
Table A9. SBMR-MV accuracy for multiple windows across all datasets.

| Algorithm | ArticularyWordRecognition | EthanolConcentration | FingerMovements | SelfRegulationSCP1 | UWaveGestureLibrary |
|---|---|---|---|---|---|
| SBMR-MV-RF-W11-trend-D6-BG-BAG | 0.897 | 0.350 | 0.570 | 0.788 | 0.831 |
| SBMR-MV-RF-W11-trend-D6-BG-STACK | 0.973 | 0.338 | 0.410 | 0.751 | 0.878 |
| SBMR-MV-RF-W15-trend-D6-BG-BAG | 0.793 | 0.327 | 0.550 | 0.771 | 0.812 |
| SBMR-MV-RF-W15-trend-D6-BG-STACK | 0.953 | 0.361 | 0.560 | 0.765 | 0.853 |
| SBMR-MV-ridge-cv-W11-trend-D4-BG-BAG-chi | 0.997 | 0.350 | 0.550 | 0.788 | 0.859 |
| SBMR-MV-ridge-cv-W11-trend-D4-BG-STACK-chi | 0.997 | 0.335 | 0.520 | 0.819 | 0.916 |
| SBMR-MV-ridge-cv-W11-trend-D6-BG-BAG | 0.993 | 0.316 | 0.510 | 0.857 | 0.863 |
| SBMR-MV-ridge-cv-W11-trend-D6-BG-BAG-chi | 0.993 | 0.342 | 0.570 | 0.823 | 0.863 |
| SBMR-MV-ridge-cv-W11-trend-D6-BG-STACK | 0.997 | 0.285 | 0.530 | 0.857 | 0.916 |
| SBMR-MV-ridge-cv-W11-trend-D6-BG-STACK-chi | 0.997 | 0.323 | 0.500 | 0.874 | 0.916 |
| SBMR-MV-ridge-cv-W14-trend-D6-BG-STACK | 0.993 | 0.297 | 0.490 | 0.833 | 0.903 |
| SBMR-MV-ridge-cv-W15-trend-D2-BG-BAG-chi | 0.987 | 0.373 | 0.610 | 0.846 | 0.850 |
| SBMR-MV-ridge-cv-W15-trend-D2-BG-STACK-chi | 0.990 | 0.342 | 0.560 | 0.850 | 0.909 |
| SBMR-MV-ridge-cv-W15-trend-D4-BG-BAG-chi | 0.993 | 0.304 | 0.540 | 0.775 | 0.884 |
| SBMR-MV-ridge-cv-W15-trend-D4-BG-STACK-chi | 0.993 | 0.327 | 0.520 | 0.826 | 0.906 |
| SBMR-MV-ridge-cv-W15-trend-D6-BG-BAG | 0.990 | 0.308 | 0.530 | 0.860 | 0.878 |
| SBMR-MV-ridge-cv-W15-trend-D6-BG-BAG-chi | 0.990 | 0.312 | 0.590 | 0.857 | 0.878 |
| SBMR-MV-ridge-cv-W15-trend-D6-BG-STACK | 0.990 | 0.281 | 0.520 | 0.843 | 0.916 |
| SBMR-MV-ridge-cv-W15-trend-D6-BG-STACK-chi | 0.990 | 0.300 | 0.590 | 0.850 | 0.916 |
| SBMR-MV-ridge-cv-W8-trend-D6-BG-BAG | 0.997 | 0.285 | 0.520 | 0.846 | 0.828 |
| SBMR-MV-ridge-cv-W8-trend-D6-BG-STACK | 0.997 | 0.289 | 0.530 | 0.853 | 0.894 |
Table A10. SBMR-MV execution time (in seconds) for multiple windows across all datasets.

| Algorithm | ArticularyWordRecognition | EthanolConcentration | FingerMovements | SelfRegulationSCP1 | UWaveGestureLibrary |
|---|---|---|---|---|---|
| SBMR-MV-RF-W11-trend-D6-BG-BAG | 31.708 | 110.903 | 25.473 | 112.669 | 16.795 |
| SBMR-MV-RF-W11-trend-D6-BG-STACK | 41.358 | 115.328 | 42.269 | 119.476 | 18.377 |
| SBMR-MV-RF-W15-trend-D6-BG-BAG | 18.695 | 58.289 | 16.674 | 58.350 | 9.610 |
| SBMR-MV-RF-W15-trend-D6-BG-STACK | 25.418 | 61.010 | 28.893 | 61.293 | 10.652 |
| SBMR-MV-ridge-cv-W11-trend-D4-BG-BAG-chi | 31.945 | 111.858 | 23.221 | 114.698 | 17.014 |
| SBMR-MV-ridge-cv-W11-trend-D4-BG-STACK-chi | 40.846 | 112.995 | 32.580 | 117.638 | 18.243 |
| SBMR-MV-ridge-cv-W11-trend-D6-BG-BAG | 38.256 | 145.062 | 31.322 | 143.597 | 20.178 |
| SBMR-MV-ridge-cv-W11-trend-D6-BG-BAG-chi | 32.462 | 112.871 | 23.014 | 112.695 | 17.219 |
| SBMR-MV-ridge-cv-W11-trend-D6-BG-STACK | 40.630 | 115.386 | 44.945 | 120.181 | 18.654 |
| SBMR-MV-ridge-cv-W11-trend-D6-BG-STACK-chi | 41.561 | 113.599 | 32.276 | 115.983 | 18.670 |
| SBMR-MV-ridge-cv-W14-trend-D6-BG-STACK | 49.472 | 160.643 | 47.373 | 167.632 | 24.661 |
| SBMR-MV-ridge-cv-W15-trend-D2-BG-BAG-chi | 38.014 | 112.522 | 27.867 | 113.787 | 19.632 |
| SBMR-MV-ridge-cv-W15-trend-D2-BG-STACK-chi | 55.559 | 116.257 | 42.339 | 120.743 | 22.030 |
| SBMR-MV-ridge-cv-W15-trend-D4-BG-BAG-chi | 18.775 | 58.028 | 14.462 | 57.993 | 9.299 |
| SBMR-MV-ridge-cv-W15-trend-D4-BG-STACK-chi | 25.860 | 58.994 | 21.088 | 59.509 | 10.347 |
| SBMR-MV-ridge-cv-W15-trend-D6-BG-BAG | 17.784 | 57.440 | 16.034 | 56.868 | 9.303 |
| SBMR-MV-ridge-cv-W15-trend-D6-BG-BAG-chi | 18.816 | 58.599 | 13.885 | 57.084 | 9.833 |
| SBMR-MV-ridge-cv-W15-trend-D6-BG-STACK | 24.287 | 58.777 | 27.667 | 60.622 | 10.238 |
| SBMR-MV-ridge-cv-W15-trend-D6-BG-STACK-chi | 28.181 | 63.833 | 22.487 | 65.002 | 11.692 |
| SBMR-MV-ridge-cv-W8-trend-D6-BG-BAG | 141.573 | 543.507 | 105.930 | 566.511 | 79.861 |
| SBMR-MV-ridge-cv-W8-trend-D6-BG-STACK | 157.743 | 546.896 | 145.858 | 560.055 | 79.709 |
Table A11. Neural network architecture accuracy across all datasets.

| Algorithm | ArticularyWordRecognition | EthanolConcentration | FingerMovements | SelfRegulationSCP1 | UWaveGestureLibrary |
|---|---|---|---|---|---|
| LSTM-attention-cnn2-ep20-bag | 0.470 | 0.255 | 0.530 | 0.853 | 0.394 |
| LSTM-attention-cnn2-ep20-stack | 0.927 | 0.266 | 0.500 | 0.498 | 0.653 |
| LSTM-attention-cnn2-ep50-bag | 0.647 | 0.266 | 0.600 | 0.860 | 0.531 |
| LSTM-attention-cnn2-ep50-stack | 0.907 | 0.346 | 0.530 | 0.853 | 0.756 |
| LSTM-CNN-ep20-stack | 0.913 | 0.247 | 0.510 | 0.638 | 0.562 |
| LSTM-CNN-ep50-stack | 0.887 | 0.327 | 0.480 | 0.686 | 0.625 |
Table A12. Neural network architecture execution time (in seconds) across all datasets.

| Algorithm | ArticularyWordRecognition | EthanolConcentration | FingerMovements | SelfRegulationSCP1 | UWaveGestureLibrary |
|---|---|---|---|---|---|
| LSTM-attention-cnn2-ep20-bag | 30.993 | 85.282 | 17.878 | 96.897 | 17.104 |
| LSTM-attention-cnn2-ep20-stack | 12.719 | 48.457 | 11.653 | 32.342 | 10.190 |
| LSTM-attention-cnn2-ep50-bag | 45.070 | 71.921 | 19.318 | 73.720 | 26.229 |
| LSTM-attention-cnn2-ep50-stack | 22.647 | 67.122 | 14.955 | 38.722 | 16.491 |
| LSTM-CNN-ep20-stack | 10.642 | 32.644 | 9.252 | 19.476 | 8.247 |
| LSTM-CNN-ep50-stack | 13.897 | 119.332 | 12.221 | 45.039 | 13.111 |
Table A13. State-of-the-art neural network classifiers' accuracy in all datasets.

| Algorithm | ArticularyWordRecognition | EthanolConcentration | FingerMovements | SelfRegulationSCP1 | UWaveGestureLibrary |
|---|---|---|---|---|---|
| CNNClassifier-ep20 | 0.083 | 0.255 | 0.460 | 0.846 | 0.191 |
| CNNClassifier-ep200 | 0.457 | 0.297 | 0.530 | 0.877 | 0.806 |
| EncoderClassifier-ep20 | 0.847 | 0.232 | 0.540 | 0.812 | 0.734 |
| IInception-ep20 | 0.920 | 0.255 | 0.480 | 0.778 | 0.456 |
| IndividualLITEClassifier-ep100 | 0.987 | 0.281 | 0.550 | 0.696 | 0.228 |
| IndividualLITEClassifier-ep50 | 0.930 | 0.308 | 0.510 | 0.884 | 0.244 |
| ResNet-ep20 | 0.163 | 0.247 | 0.460 | 0.502 | 0.125 |
Table A14. State-of-the-art neural network classifiers' execution time (in seconds) in all datasets.

| Algorithm | ArticularyWordRecognition | EthanolConcentration | FingerMovements | SelfRegulationSCP1 | UWaveGestureLibrary |
|---|---|---|---|---|---|
| CNNClassifier-ep20 | 1.969 | 3.426 | 1.835 | 2.652 | 1.631 |
| CNNClassifier-ep200 | 9.026 | 20.174 | 6.734 | 14.693 | 6.750 |
| EncoderClassifier-ep20 | 41.563 | 334.121 | 27.218 | 199.412 | 37.719 |
| IInception-ep20 | 52.853 | 414.763 | 32.317 | 222.469 | 46.094 |
| IndividualLITEClassifier-ep100 | 38.759 | 257.808 | 28.145 | 144.833 | 31.622 |
| IndividualLITEClassifier-ep50 | 20.604 | 138.162 | 16.901 | 74.098 | 17.632 |
| ResNet-ep20 | 33.791 | 283.649 | 19.412 | 147.360 | 31.053 |
Table A15. Rocket accuracy across all datasets.

| Algorithm | ArticularyWordRecognition | EthanolConcentration | FingerMovements | SelfRegulationSCP1 | UWaveGestureLibrary |
|---|---|---|---|---|---|
| ROCKET | 0.993 | 0.426 | 0.530 | 0.850 | 0.934 |
| miniRocket | 0.990 | 0.475 | 0.500 | 0.918 | 0.941 |
Table A16. Rocket execution time (in seconds) across all datasets.

| Algorithm | ArticularyWordRecognition | EthanolConcentration | FingerMovements | SelfRegulationSCP1 | UWaveGestureLibrary |
|---|---|---|---|---|---|
| ROCKET | 40.319 | 284.413 | 10.389 | 215.686 | 43.534 |
| miniRocket | 3.452 | 16.734 | 1.318 | 12.168 | 3.523 |

References

  1. Chaovalitwongse, W.A.; Prokopyev, O.A.; Pardalos, P.M. Electroencephalogram (EEG) time series classification: Applications in epilepsy. Ann. Oper. Res. 2006, 148, 227–250.
  2. Arul, M.; Kareem, A. Applications of shapelet transform to time series classification of earthquake, wind and wave data. Eng. Struct. 2021, 228, 111564.
  3. Potamitis, I. Classifying insects on the fly. Ecol. Inform. 2014, 21, 40–49.
  4. Susto, G.A.; Cenedese, A.; Terzi, M. Chapter 9—Time-Series Classification Methods: Review and Applications to Power Systems Data. In Big Data Application in Power Systems; Arghandeh, R., Zhou, Y., Eds.; Elsevier: Amsterdam, The Netherlands, 2018; pp. 179–220.
  5. Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Evaluating surgical skills from kinematic data using convolutional neural networks. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, 16–20 September 2018; Proceedings, Part IV 11. Springer: Berlin/Heidelberg, Germany, 2018; pp. 214–221.
  6. Tao, L.; Elhamifar, E.; Khudanpur, S.; Hager, G.D.; Vidal, R. Sparse hidden Markov models for surgical gesture classification and skill evaluation. In Proceedings of the Information Processing in Computer-Assisted Interventions: Third International Conference, IPCAI 2012, Pisa, Italy, 27 June 2012; Proceedings 3. Springer: Berlin/Heidelberg, Germany, 2012; pp. 167–177.
  7. Forestier, G.; Lalys, F.; Riffaud, L.; Trelhu, B.; Jannin, P. Classification of surgical processes using dynamic time warping. J. Biomed. Inform. 2012, 45, 255–264.
  8. Devanne, M.; Wannous, H.; Berretti, S.; Pala, P.; Daoudi, M.; Del Bimbo, A. 3-D human action recognition by shape analysis of motion trajectories on Riemannian manifold. IEEE Trans. Cybern. 2014, 45, 1340–1352.
  9. Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 221–231.
  10. Pinto, J.P.; Pimenta, A.; Novais, P. Deep learning and multivariate time series for cheat detection in video games. Mach. Learn. 2021, 110, 3037–3057.
  11. Younis, R.; Zerr, S.; Ahmadi, Z. Multivariate time series analysis: An interpretable CNN-based model. In Proceedings of the 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA), Shenzhen, China, 13–16 October 2022; IEEE: Piscataway Township, NJ, USA, 2022; pp. 1–10.
  12. Madrid, F.; Singh, S.; Chesnais, Q.; Mauck, K.; Keogh, E. Matrix Profile XVI: Efficient and effective labeling of massive time series archives. In Proceedings of the 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Washington, DC, USA, 5–8 October 2019; IEEE: Piscataway Township, NJ, USA, 2019; pp. 463–472.
  13. Devanne, M.; Rémy-Néris, O.; Le Gals-Garnett, B.; Kermarrec, G.; Thepaut, A. A co-design approach for a rehabilitation robot coach for physical rehabilitation based on the error classification of motion errors. In Proceedings of the 2018 Second IEEE International Conference on Robotic Computing (IRC), Laguna Hills, CA, USA, 31 January–2 February 2018; IEEE: Piscataway Township, NJ, USA, 2018; pp. 352–357.
  14. Schäfer, P.; Högqvist, M. SFA: A symbolic Fourier approximation and index for similarity search in high dimensional datasets. In Proceedings of the 15th International Conference on Extending Database Technology, Berlin, Germany, 27–30 March 2012; ACM: New York, NY, USA, 2012; pp. 516–527.
  15. Glenis, A.; Vouros, G.A. SCALE-BOSS-MR: Scalable Time Series Classification Using Multiple Symbolic Representations. Appl. Sci. 2024, 14, 689.
  16. Lin, J.; Keogh, E.; Wei, L.; Lonardi, S. Experiencing SAX: A novel symbolic representation of time series. Data Min. Knowl. Discov. 2007, 15, 107–144.
  17. Senin, P.; Malinchik, S. SAX-VSM: Interpretable time series classification using SAX and vector space model. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, 7–10 December 2013; IEEE: Piscataway Township, NJ, USA, 2013; pp. 1175–1180.
  18. Schäfer, P. The BOSS is concerned with time series classification in the presence of noise. Data Min. Knowl. Discov. 2015, 29, 1505–1530.
  19. Schäfer, P. Scalable time series classification. Data Min. Knowl. Discov. 2016, 30, 1273–1298.
  20. Schäfer, P.; Leser, U. Fast and accurate time series classification with WEASEL. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; ACM: New York, NY, USA, 2017; pp. 637–646.
  21. Glenis, A.; Vouros, G.A. Balancing between scalability and accuracy in time-series classification for stream and batch settings. In Proceedings of the Discovery Science: 23rd International Conference, DS 2020, Thessaloniki, Greece, 19–21 October 2020; Proceedings 23. Springer: Berlin/Heidelberg, Germany, 2020; pp. 265–279.
  22. Nguyen, T.L.; Ifrim, G. MrSQM: Fast time series classification with symbolic representations. arXiv 2021, arXiv:2109.01036.
  23. Nguyen, T.L.; Ifrim, G. Fast time series classification with random symbolic subsequences. In Proceedings of the International Workshop on Advanced Analytics and Learning on Temporal Data, Grenoble, France, 19–23 September 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 50–65.
  24. Glenis, A.; Vouros, G.A. SCALE-BOSS: A framework for scalable time-series classification using symbolic representations. In Proceedings of the 12th Hellenic Conference on Artificial Intelligence, Corfu, Greece, 7–9 September 2022; pp. 1–9.
  25. Schäfer, P.; Leser, U. WEASEL 2.0–A Random Dilated Dictionary Transform for Fast, Accurate and Memory Constrained Time Series Classification. arXiv 2023, arXiv:2301.10194.
  26. Schäfer, P.; Leser, U. Multivariate time series classification with WEASEL+MUSE. arXiv 2017, arXiv:1711.11343.
  27. Dempster, A.; Petitjean, F.; Webb, G.I. ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels. Data Min. Knowl. Discov. 2020, 34, 1454–1495.
  28. Dempster, A.; Schmidt, D.F.; Webb, G.I. MiniRocket: A very fast (almost) deterministic transform for time series classification. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021; pp. 248–257.
  29. Wang, Z.; Yan, W.; Oates, T. Time series classification from scratch with deep neural networks: A strong baseline. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; IEEE: Piscataway Township, NJ, USA, 2017; pp. 1578–1585.
  30. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
  31. Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814.
  32. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
  33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  34. Monti, F.; Frasca, F.; Eynard, D.; Mannion, D.; Bronstein, M.M. Fake news detection on social media using geometric deep learning. arXiv 2019, arXiv:1902.06673.
  35. Zhao, B.; Lu, H.; Chen, S.; Liu, J.; Wu, D. Convolutional neural networks for time series classification. J. Syst. Eng. Electron. 2017, 28, 162–169.
  36. Serra, J.; Pascual, S.; Karatzoglou, A. Towards a universal neural network encoder for time series. In Artificial Intelligence Research and Development; IOS Press: Amsterdam, The Netherlands, 2018; pp. 120–129.
  37. Ismail Fawaz, H.; Lucas, B.; Forestier, G.; Pelletier, C.; Schmidt, D.F.; Weber, J.; Webb, G.I.; Idoumghar, L.; Muller, P.A.; Petitjean, F. InceptionTime: Finding AlexNet for time series classification. Data Min. Knowl. Discov. 2020, 34, 1936–1962.
  38. Ismail-Fawaz, A.; Devanne, M.; Weber, J.; Forestier, G. Deep learning for time series classification using new hand-crafted convolution filters. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022; IEEE: Piscataway Township, NJ, USA, 2022; pp. 972–981.
  39. Ismail-Fawaz, A.; Devanne, M.; Berretti, S.; Weber, J.; Forestier, G. LITE: Light inception with boosting techniques for time series classification. In Proceedings of the 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA), Thessaloniki, Greece, 9–13 October 2023; IEEE: Piscataway Township, NJ, USA, 2023; pp. 1–10.
  40. Cui, Z.; Chen, W.; Chen, Y. Multi-scale convolutional neural networks for time series classification. arXiv 2016, arXiv:1603.06995.
  41. Wickstrøm, K.; Kampffmeyer, M.; Mikalsen, K.Ø.; Jenssen, R. Mixing up contrastive learning: Self-supervised representation learning for time series. Pattern Recognit. Lett. 2022, 155, 54–61.
  42. Ordonez, P.; Armstrong, T.; Oates, T.; Fackler, J. Using modified multivariate bag-of-words models to classify physiological data. In Proceedings of the 2011 IEEE 11th International Conference on Data Mining Workshops, Vancouver, BC, Canada, 11 December 2011; IEEE: Piscataway Township, NJ, USA, 2011; pp. 534–539.
  43. Baydogan, M.G.; Runger, G. Learning a symbolic representation for multivariate time series classification. Data Min. Knowl. Discov. 2015, 29, 400–422.
  44. Ruiz, A.P.; Flynn, M.; Large, J.; Middlehurst, M.; Bagnall, A. The great multivariate time series classification bake off: A review and experimental evaluation of recent algorithmic advances. Data Min. Knowl. Discov. 2021, 35, 401–449.
  45. Shervashidze, N.; Schweitzer, P.; Van Leeuwen, E.J.; Mehlhorn, K.; Borgwardt, K.M. Weisfeiler-Lehman graph kernels. J. Mach. Learn. Res. 2011, 12, 2539–2561.
  46. LeCun, Y. Generalization and network design strategies. Connect. Perspect. 1989, 19, 18.
  47. LeCun, Y.; Bengio, Y. Convolutional networks for images, speech, and time series. Handb. Brain Theory Neural Netw. 1995, 3361, 1995.
  48. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning Internal Representations by Error Propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations; MIT Press: Cambridge, MA, USA, 1986; pp. 318–362.
  49. Elman, J.L. Finding structure in time. Cogn. Sci. 1990, 14, 179–211.
  50. Jordan, M.I. Serial order: A parallel distributed processing approach. In Advances in Psychology; Elsevier: Amsterdam, The Netherlands, 1997; Volume 121, pp. 471–495.
  51. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  52. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
  53. Sugiyama, M.; Borgwardt, K. Halting in random walk kernels. Adv. Neural Inf. Process. Syst. 2015, 28, 1639–1647.
  54. Faouzi, J.; Janati, H. pyts: A Python Package for Time Series Classification. J. Mach. Learn. Res. 2020, 21, 1–6.
  55. Siglidis, G.; Nikolentzos, G.; Limnios, S.; Giatsidis, C.; Skianis, K.; Vazirgiannis, M. GraKeL: A graph kernel library in Python. J. Mach. Learn. Res. 2020, 21, 1–5.
  56. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  57. Chollet, F. Deep Learning with Python; Simon and Schuster: New York, NY, USA, 2021.
  58. Data61, CSIRO. StellarGraph Machine Learning Library. 2018. Available online: https://github.com/stellargraph/stellargraph (accessed on 9 September 2025).
  59. Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR time series archive. IEEE/CAA J. Autom. Sin. 2019, 6, 1293–1305.
  60. Middlehurst, M.; Ismail-Fawaz, A.; Guillaume, A.; Holder, C.; Guijo-Rubio, D.; Bulatova, G.; Tsaprounis, L.; Mentel, L.; Walter, M.; Schäfer, P.; et al. aeon: A Python toolkit for learning from time series. J. Mach. Learn. Res. 2024, 25, 1–10.
