Article

LA-ESN: A Novel Method for Time Series Classification

School of Software, Jiangxi Normal University, Nanchang 330022, China
* Authors to whom correspondence should be addressed.
Information 2023, 14(2), 67; https://doi.org/10.3390/info14020067
Submission received: 9 November 2022 / Revised: 14 January 2023 / Accepted: 19 January 2023 / Published: 26 January 2023

Abstract

Time-series data are an appealing study topic in data mining and have a broad range of applications. Many approaches have been employed to handle time series classification (TSC) challenges with promising results, among which deep neural network methods have become mainstream. Echo State Networks (ESN) and Convolutional Neural Networks (CNN) are commonly utilized as deep neural network methods in TSC research. However, ESN and CNN can only extract local dependency relations from time series, making long-term temporal dependencies challenging to capture. Therefore, an encoder-decoder architecture named LA-ESN is proposed for TSC tasks. In LA-ESN, the encoder is composed of an ESN, which is utilized to obtain a matrix representation of the time series. Meanwhile, the decoder consists of a one-dimensional CNN (1D CNN), a Long Short-Term Memory network (LSTM) and an Attention Mechanism (AM), which can extract local information and global dependencies from the representation. Finally, extensive comparative experiments were conducted on 128 univariate datasets from different domains, and three evaluation metrics, namely classification accuracy, mean error and mean rank, were used to evaluate the performance. In comparison with other approaches, LA-ESN produced good results.

1. Introduction

In the rapidly growing information era, massive data resources have accumulated in numerous industries, and such large-scale data provide valuable content. Time series data are sets of data points arranged in chronological order and can be divided into two categories: univariate and multivariate. In univariate time series data, only one variable varies over time, while in multivariate time series data, multiple variables change over time [1]. Time series data are now used in a wide range of applications, including statistics, pattern recognition, earthquake prediction, econometrics, astronomy, signal processing, control engineering, communication engineering, finance, weather forecasting, the Internet of Things, and medical care [2].
Much work has been carried out on the TSC problem in the last two decades. This attractive data mining topic focuses on classifying data points indexed by time by predicting their labels. Time series classification is an essential task in data mining and has been extensively employed in many fields. For example, in the field of network traffic, Xiao et al. [3] proposed traffic classification for dynamic network flows. In medical diagnosis, Michida et al. [4] proposed a deep learning method, based on the JNET classification, for classifying lesions in computer-aided diagnosis systems for colorectal magnified NBI endoscopy. In finance, Mori et al. [5] proposed the early classification of time series using multi-objective optimization techniques, and Wan et al. [6] proposed a formal approach to classifying chart patterns in financial time series.
Generally, solutions to the TSC task can be divided into three types [7]: (1) Distance-based approaches: the distances between samples are evaluated with a distance function and used for classification. Representative distance-based techniques include Time Warp Edit Distance (TWED) [8], Weighted Dynamic Time Warping (WDTW) [9], Complexity Invariant Distance (CID) [10], Derivative Transform Distance (DTD) [11], Derivative Dynamic Time Warping (DDTW) [12], 1-Nearest Neighbour with Euclidean Distance (ED) [13] and Longest Common Subsequence (LCSS) [14]. (2) Feature-based methods: feature extraction is used to obtain compelling features, either local or global, from the original data. Examples include Time Series Bag-of-Features (TSBF) [15], Time Series Forest (TSF) [16], Fast Shapelets (FS) [17], Shapelet Transform (ST) [18], Learning Pattern Similarity (LPS) [19], and Dynamic Time Warping Features (DTWF) [20]. (3) Neural network-based methods: time series are classified through an "end-to-end" learning model; the raw data are fed straight into the model, which generates results directly. Examples include the Long Short-Term Memory Fully Convolutional Network (LSTM-FCN) [21], the Echo State Network (ESN) [22], the Attention LSTM-FCN (ALSTM-FCN) [23], and the Temporal Convolutional Network (TCN) [24].
ESN, CNN and LSTM are widely used for time series classification tasks. Using ESNs alone is insufficient for time series classification, so several researchers have proposed models that fuse ESNs with CNNs to make ESNs more adaptable to TSC tasks. Ma et al. [22] used an ESN to generate a time series representation matrix before employing a CNN to extract classification features. LSTM is widely used to extract long-term dependencies in time series, and some researchers have fused CNNs with LSTMs for time series classification with quite good results. For example, Karim et al. [21] combined a fully convolutional block with an LSTM to propose a new method; the fully convolutional block contains three temporal convolutional blocks as feature extractors, each with a convolutional layer holding multiple filters. However, extracting features directly from the time series for classification necessitates substantial preprocessing. The attention mechanism has become an essential concept in deep learning and has been successfully applied to time series tasks. In recent years, the main principle behind employing attention in time series classification has been to focus on the information most relevant to the input data while extracting features from it.
In light of the preceding work, we propose an end-to-end TSC model called LA-ESN. LA-ESN consists of an echo memory encoder and a decoder: the encoder is formed by the reservoir layer, while the decoder is made up of a 1D CNN, an LSTM and Attention. The reservoir layer first projects the time series into a high-dimensional nonlinear space to generate echo states step by step, and the echo memory matrix is formed by collecting the echo states of all time steps in chronological order. To capture the critical historical information in the time series, we design a decoder that applies a 1D CNN and an LSTM with Attention to the echo memory matrix. The 1D CNN and the LSTM are employed to extract multi-scale features and to retrieve global information from the echo memory matrix, respectively, and Attention is then used to extract the essential information from the global information. Experimental findings on a variety of time series datasets show that LA-ESN is a competitive classification approach. The main contributions of this paper are as follows:
(1)
We propose a simple end-to-end model LA-ESN for handling time series classification tasks;
(2)
We modify the output layer of the ESN to handle time series better, using a CNN and an LSTM as the output layers to perform feature extraction;
(3)
The attention mechanism is deployed behind both the CNN and the LSTM, which effectively improves the classification performance and computational efficiency of LA-ESN;
(4)
Experiments on various time series datasets show that LA-ESN is efficacious.
The rest of this paper is organized as follows. Section 2 discusses the related work of this study. Section 3 explains the proposed method. Section 4 provides a description of the UCR time series database and the results of the comparison experiments. A brief conclusion is given in Section 5.

2. Related Work

2.1. CNN with Attention

CNN and AM have been broadly adopted in image processing and other fields, and CNNs combined with AM have been proposed for time series classification. AM can be interpreted as a feature enhancement method that extracts the most information-rich components of a signal. After the attention values are calculated from the saliency of the feature maps extracted by the convolutional layers, refined feature map weights can be derived from these attention values. A Deep CNN (DCNN) with an attention module has been proposed as a framework for time series classification [25]; this network improves the classification performance for various types of seismic events. Tripathi et al. [26] proposed an Attention-based Multivariate CNN (AT-MVCNN), which builds an attention-feature-based input tensor to encode information across multiple timestamps. Sun et al. [27] proposed a Prototypical Inception Network with Cross-Branch Attention (PIN-BA), which uses CNN branches with different receptive windows to capture features at different time scales and a cross-branch attention scheme to emphasize critical feature information during classification. In [28], five AMs were applied to six neural networks to focus on the valid information in the time series from either the channel or the spatial dimension. All of the above studies yielded promising results in time series classification.

2.2. ESN-Based Classifiers

An ESN is a recurrent structure built around a randomly initialized, fixed RNN. Although ESNs are mainly employed to predict time series, many ESN-based classification networks have also been designed for the TSC task. For example, using autoencoder (AE) theory, Wang et al. [7] developed an ESN with an input-weight framework and a globally reversible algorithm to reconstruct the randomly initialized input weights of the network; in this model, the initial input weights are reconstructed from the output weights generated during the encoding process. Huang et al. [29] proposed an innovative ESN approach named Functional Deep Echo State Network (FDESN), which introduces temporal and spatial aggregation; moreover, this approach considers the relative value of temporal data across several periods and the dynamic properties of multivariate time series (MTS). Wang et al. [30] offered a Discriminative Regularized ESN (DR-ESN) time series classification algorithm that combines Discriminative Feature Aggregation (DFA) with an Outlier Robust Weight (ORW) algorithm. First, the DFA algorithm replaces the random input weights of the ESN with bounded weights based on the sample information. Second, the ORW algorithm down-weights samples with notable training errors to achieve greater robustness during training. DR-ESN effectively improves the classification performance of the original ESN and significantly reduces the impact of outliers on the classification result.

3. The Proposed LA-ESN Framework

The framework of LA-ESN is divided into two parts: encoding and decoding. In the encoding part, each frame of the input time series is mapped into a high-dimensional reservoir state space to obtain an Echo State Representation (ESR), and the ESRs of all time steps are stored in a memory matrix. In the decoding phase, we design two parallel ways to decode the memory matrix. First, we adopt a multi-scale 1D CNN [31,32,33] to extract local information. Second, long-term dependency information is recovered from the memory matrix using an LSTM and Attention. Both attention mechanisms are intended to suppress irrelevant information while increasing computational efficiency. Finally, the local and long-term dependency information obtained by the two branches is pooled and merged, the merged features are passed through a fully connected layer, and the conditional probability distribution over the categories is calculated using a Softmax layer. The general design of the proposed LA-ESN model is shown in Figure 1.

3.1. Preliminary

TSC refers to the identification of non-labeled samples based on the given labeled samples. The time series dataset, consisting of N samples, can be expressed as follows:
$$D = \{(X_1, y_1), \ldots, (X_i, y_i), \ldots, (X_N, y_N)\}, \quad X_i = (X_i(1), \ldots, X_i(t), \ldots, X_i(T)) \tag{1}$$
where T denotes the number of time stamps (the length) of the time series data and y_i represents the category of the corresponding time series X_i.

3.2. Encoding Stage

The primitive ESN includes an input layer, a reservoir and an output layer; its diagram is shown in Figure 2. The primitive ESN uses a reservoir of randomly and sparsely connected neurons as the hidden layer, providing a high-dimensional, nonlinear representation of the input. The weights up to the hidden layer are generated in advance rather than trained; only the weights from the hidden layer to the output layer are trained. The generated reservoir therefore has good properties that guarantee excellent performance while training only the reservoir-to-output weights with linear methods. Given a k-dimensional input i(t) at time step t and the reservoir state r(t − 1) at time step t − 1, the update equations of the primitive ESN are as follows:
$$r(t) = f\big(W_{res}\, r(t-1) + W_{in}\, i(t)\big) \tag{2}$$

$$o(t) = f_{out}\big(W_{out}\, r(t)\big) \tag{3}$$

$$W_{res} = SR \cdot \frac{W}{\lambda_{max}(W)} \tag{4}$$
where W_in, W_res and W_out denote the input, intermediate (reservoir) and output weight matrices, respectively, and i(t) and r(t − 1) are the input and the reservoir state at the corresponding steps. W_in is randomly initialized and kept fixed, while W_out is trained. W_res is calculated by Equation (4), where SR is the spectral radius of W_res, λ_max(W) is the maximum eigenvalue of the matrix W, and the elements of W are generated randomly in [−0.5, 0.5].
In this work, we take the ESN as the base network and modify its output layer to make the model more suitable for TSC, as shown in Figure 1. Suppose s = (s(0), s(1), …, s(T − 1))^T is a k-dimensional time series. At each time step, the echo state r(t) is calculated according to Equation (2), and the echo state memory matrix R is obtained as follows:
$$R = \begin{bmatrix} r_1(0) & r_2(0) & \cdots & r_N(0) \\ r_1(1) & r_2(1) & \cdots & r_N(1) \\ \vdots & \vdots & \ddots & \vdots \\ r_1(T-1) & r_2(T-1) & \cdots & r_N(T-1) \end{bmatrix} \tag{5}$$
We use r(t) to denote the t-th row of R, which is the echo state at the t-th time step, where t ∈ {0, 1, …, T − 1}. The term r_m denotes the m-th column of R, expressing the states of the m-th reservoir unit over all time steps, where m ∈ {1, 2, …, N}. We keep all the calculated echo states in R to obtain a complete ESR of the sequence as the input of the decoder, and the decoder extracts the discriminative features to determine the category labels.
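For concreteness, the encoding stage of Equations (2)–(5) can be sketched in a few lines of NumPy. This is only an illustrative implementation, not the authors' code: the function name esn_encode is ours, the hyperparameter values follow the unified settings reported in Section 4.3.3 (SR = 0.9, input scaling 0.1, sparsity 0.7, reservoir size 32), and f is assumed to be tanh.

```python
import numpy as np

def esn_encode(series, n_res=32, sr=0.9, input_scale=0.1, sparsity=0.7, seed=0):
    """Map a univariate series of length T to an echo state memory matrix R of shape (T, n_res)."""
    rng = np.random.default_rng(seed)
    # Input weights: randomly initialized and kept fixed.
    w_in = input_scale * rng.uniform(-0.5, 0.5, (n_res, 1))
    # Reservoir weights: random in [-0.5, 0.5], sparsified, then rescaled to
    # spectral radius SR as in Equation (4).
    w = rng.uniform(-0.5, 0.5, (n_res, n_res))
    w[rng.random((n_res, n_res)) < sparsity] = 0.0
    w_res = sr * w / np.max(np.abs(np.linalg.eigvals(w)))
    r = np.zeros(n_res)
    states = []
    for x_t in series:                      # Equation (2), with f assumed to be tanh
        r = np.tanh(w_res @ r + w_in[:, 0] * x_t)
        states.append(r)
    return np.array(states)                 # the memory matrix R of Equation (5)

R = esn_encode(np.sin(np.linspace(0, 8 * np.pi, 100)))
print(R.shape)  # (100, 32)
```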

3.3. Decoding Stage

In previous studies, multi-scale convolution has been used as a feature extractor to extract effective classification features from time series representations. Alternatively, LSTM is used to learn straightforwardly from the input time series. Both approaches are capable of classifying time series to some extent, although they might be improved. Therefore, we propose to appropriately adapt and then combine them to be used as the output layer of ESN to learn better feature information from the echo states.
On the one hand, we employ multiple scales of 1D CNNs for convolution operations along the time direction and use multiple filters for each time scale. A batch normalization operation follows the convolution operations to avoid the gradient vanishing problem. Next, ReLU is adopted as the activation function, which increases the nonlinearity and sparsity of the connections between layers to reduce over-fitting. Batch normalization and ReLU therefore enable more robust learning. Finally, the multi-scale features are concatenated into a new feature. The structure diagram of a multi-scale 1D convolution is shown in Figure 3.
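The multi-scale convolution branch can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions: the module name MultiScaleConv1d and the filter count of 64 per branch are ours, while the kernel sizes 3, 5 and 8, the stride of 1, and the batch-normalization-plus-ReLU ordering follow the description above and the caption of Figure 3.

```python
import torch
import torch.nn as nn

class MultiScaleConv1d(nn.Module):
    """Parallel 1D convolutions with kernel sizes 3, 5 and 8, each followed by
    batch normalization and ReLU; the branch outputs are concatenated along
    the channel axis."""
    def __init__(self, in_channels, n_filters=64, kernel_sizes=(3, 5, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(in_channels, n_filters, k, stride=1, padding=k // 2),
                nn.BatchNorm1d(n_filters),
                nn.ReLU(),
            )
            for k in kernel_sizes
        )

    def forward(self, x):                        # x: (batch, channels, time)
        outs = [branch(x) for branch in self.branches]
        t = min(o.shape[-1] for o in outs)       # align lengths before concatenation
        return torch.cat([o[..., :t] for o in outs], dim=1)

feats = MultiScaleConv1d(in_channels=32)(torch.randn(8, 32, 100))
print(feats.shape)  # torch.Size([8, 192, 100])
```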
On the other hand, LSTM is extremely successful at dealing with time series challenges. Therefore, we propose to learn the long-term global dependence between the states from the echo state matrix using LSTM to make LA-ESN learn more robust classification feature information. In LA-ESN, the LSTM receives the echo state matrix as a multivariate state matrix with a single time step, and the operations of the LSTM are shown as follows:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{6}$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{7}$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{8}$$

$$w_t = \tanh(W_w \cdot [h_{t-1}, x_t] + b_w) \tag{9}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot w_t \tag{10}$$

$$h_t = o_t \odot \tanh(C_t) \tag{11}$$
where σ denotes the logistic sigmoid function and the symbol ⊙ represents element-wise multiplication. The weight matrices are denoted by W_f, W_i, W_o and W_w, and the biases by b_f, b_i, b_o and b_w. The diagram of the LSTM model is shown in Figure 4.
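The single-time-step framing mentioned above parallels the dimension shuffle used in LSTM-FCN [21]. One plausible reading, sketched below with illustrative sizes (this is our interpretation, not the authors' code), is to present the echo state matrix to the LSTM as one observation whose features are the flattened reservoir states:

```python
import torch
import torch.nn as nn

# Sketch: present the echo state matrix R (T x n_res) to an LSTM as a single
# time step whose feature vector is the flattened matrix. Sizes are illustrative.
T, n_res, hidden = 100, 32, 128
R = torch.randn(T, n_res)                     # echo state memory matrix
single_step = R.reshape(1, 1, T * n_res)      # (batch=1, time steps=1, features)
lstm = nn.LSTM(input_size=T * n_res, hidden_size=hidden, batch_first=True)
out, (h_n, c_n) = lstm(single_step)
print(out.shape)  # torch.Size([1, 1, 128])
```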
Then, an attention module, commonly used in natural language processing, is integrated. The context vector c_i depends on a sequence of annotations (h_1, …, h_Tx), where each annotation h_i contains information about the entire input sequence with a focus on the parts surrounding the i-th word; the encoder maps the input sequence to this sequence of annotations. The context vector c_i is a weighted sum of the annotations:

$$c_i = \sum_{j=1}^{T_x} a_{ij} h_j \tag{12}$$

The weight a_ij of each annotation h_j is calculated as follows:

$$a_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})} \tag{13}$$

$$e_{ij} = \varphi(s_{i-1}, h_j) \tag{14}$$

where h_j denotes the j-th hidden state vector of the encoder, s_{i−1} denotes the hidden state of the decoder at the previous time step, and φ is a scoring function used to calculate the similarity between h_j and s_{i−1}. Figure 5 depicts the attention model diagram. There are two advantages to using the AM. First, the AM can apply different weights to distinct echo state representations of the same time step; in other words, it can give more weight to information that is more significant for categorization while suppressing irrelevant data. Second, including the AM increases the model's running speed.
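A minimal PyTorch sketch of Equations (12)–(14) is given below, assuming the common additive (Bahdanau-style) scorer for φ; the layer sizes and the class name AdditiveAttention are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Scores e_ij from a small feed-forward scorer (Equation (14)), softmax
    weights a_ij (Equation (13)), and context vector c_i as the weighted sum
    of the annotations h_j (Equation (12))."""
    def __init__(self, hidden_dim, attn_dim=64):
        super().__init__()
        self.w_h = nn.Linear(hidden_dim, attn_dim, bias=False)
        self.w_s = nn.Linear(hidden_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, h, s_prev):
        # h: (batch, T_x, hidden_dim) annotations; s_prev: (batch, hidden_dim)
        e = self.v(torch.tanh(self.w_h(h) + self.w_s(s_prev).unsqueeze(1)))
        a = F.softmax(e, dim=1)        # attention weights a_ij, shape (batch, T_x, 1)
        c = (a * h).sum(dim=1)         # context vector c_i, shape (batch, hidden_dim)
        return c, a.squeeze(-1)

c, a = AdditiveAttention(128)(torch.randn(4, 100, 128), torch.randn(4, 128))
print(c.shape, a.shape)  # torch.Size([4, 128]) torch.Size([4, 100])
```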

4. Experiments and Results

4.1. Database Description

We conducted extensive tests using the publicly accessible UCR database, which can be acquired from this URL (https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ (accessed on 9 October 2022)), to ensure that the proposed LA-ESN approach is valid for the time series classification problem. The collection comprises 128 datasets divided into 15 categories. The specific descriptions of the datasets used are listed in Table 1. Each dataset has five characteristics: training set size, test set size, number of classes, sequence length and domain category [34]. The selected datasets cover multiple categories in the UCR database. We implemented our proposed LA-ESN approach in Python and executed all tests on an Intel Core i9-9900K CPU with 32 GB RAM and an Nvidia GeForce RTX 2080 GPU.

4.2. Evaluation Metric

In our experiment, the standard accuracy is used as the evaluation index. Meanwhile, the mean error (ME) and mean rank (MR) are employed to evaluate the classification performance of a given model on multiple datasets [35]. They are defined as follows:
$$PCE_k = \frac{e_k}{c_k} \tag{15}$$

$$ME = \frac{1}{K} \sum_{k=1}^{K} PCE_k \tag{16}$$

$$MR = \frac{SCR}{K} \tag{17}$$
where e k denotes the error rate of the k-th dataset, c k means the number of classes in the k-th dataset, and K represents the number of used datasets.
After the methods are ranked on each dataset from largest to smallest accuracy, the MR of a method is the average of its rank numbers over the datasets. The MR helps show which methods work well overall: a smaller MR indicates that the model is more accurate than the other methods on most datasets.
To better understand MR, we introduce the SCR term with an example to explain the process of calculating the average rank. SCR represents the sorted sum of a particular method for all datasets. For example, we have three datasets and three methods denoted as D = { d 1 , d 2 , d 3 } and F = { f 1 , f 2 , f 3 } , respectively, and the corresponding results of each method on each dataset are as follows:
 | f1 | f2 | f3
d1 | 0.80 | 0.75 | 0.83
d2 | 0.79 | 0.87 | 0.89
d3 | 0.90 | 0.89 | 0.93
Then, according to the accuracy on dataset d1, the methods f1, f2 and f3 are ranked from largest to smallest, giving ranks {2, 3, 1}. In the same way, the rankings on datasets d2 and d3 are {3, 2, 1} and {2, 3, 1}, respectively. Thus, f1 receives ranks 2, 3 and 2 on D, and its SCR is the sum of its ranks over all datasets: 7 = 2 + 3 + 2. According to Equation (17), the smaller the SCR, the smaller the MR; a method with higher accuracy therefore obtains a smaller MR. The MR can thus evaluate the overall classification performance of a method across all datasets.
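The SCR/MR computation of the example above can be reproduced with a few lines of NumPy; this is only a sketch (the double-argsort ranking shown here ignores ties):

```python
import numpy as np

# acc[i, j] = accuracy of method j on dataset i; the matrix is the example above.
acc = np.array([[0.80, 0.75, 0.83],
                [0.79, 0.87, 0.89],
                [0.90, 0.89, 0.93]])
# Rank methods on each dataset from largest to smallest accuracy (rank 1 = best).
ranks = (-acc).argsort(axis=1).argsort(axis=1) + 1
scr = ranks.sum(axis=0)          # SCR per method, e.g. f1: 2 + 3 + 2 = 7
mr = scr / acc.shape[0]          # MR = SCR / K (Equation (17)), K = 3 datasets here
print(scr, mr)                   # [7 8 3] [2.333... 2.666... 1.0]
```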

4.3. Results and Discussion

4.3.1. Compared with Traditional Methods

We compare the proposed LA-ESN model with traditional TSC models to conduct a general evaluation. In this work, we selected 12 conventional models as comparison methods, abbreviated as ED [13], DDTW [12], DTD [11], LCSS [14], BOSS [36], EE [37], FCOTE [38], TSF [16], TSBF [15], ST [18], LPS [19], and FS [17]. We briefly describe each comparison method as follows:
(1)
ED (1-Nearest Neighbour with Euclidean Distance). In this method, the Euclidean distance is employed to measure the similarity of two given time series, and then the nearest neighbor is used for classification.
(2)
DDTW (Derivative Dynamic Time Warping). This method weights the DTW distance between two time series together with the DTW distance between the corresponding first-order difference series.
(3)
DTD (Derivative Transform Distance). The DTW distance between sequences of sine, cosine, and Hilbert transforms is further considered based on the DDTW.
(4)
LCSS (Longest Common Subsequence). This method uses derivatives in a longest common subsequence dissimilarity measure to classify time series.
(5)
BOSS (Bag of SFA Symbols). This method slides windows over each series to form "words" and applies a truncated discrete Fourier transform to each window to obtain features.
(6)
EE (Elastic Ensemble). The EE method takes a voting scheme to combine eleven 1-NN classifiers with elastic distance metrics.
(7)
FCOTE (Flat Collective of Transform-based Ensembles). The method integrates 35 classifiers by using the cross-validation accuracy of training sets.
(8)
TSF (Time Series Forest). The time series is first divided into intervals in this method to calculate the mean, standard deviation and slope as interval features. Then, the intervals are randomly selected to train the tree forest.
(9)
TSBF (Time Series Bag-of-Features). This method selects multiple random-length subsequences from random locations and then divides these subsequences into shorter intervals to capture local information.
(10)
ST (Shapelet Transform). This method uses the shapelet transform to obtain a new representation of the original time series. Then, a classifier is constructed on the new representation using a weighted ensemble of eight different classifiers.
(11)
LPS (Learning Pattern Similarity). The method is based on intervals, but the main difference is that the subsequence is used as an attribute rather than the extracted interval features.
(12)
FS (Fast Shapelets). The method speeds up the shapelet search process by converting the original time series into a discrete low-dimensional representation via symbolic aggregate approximation. Random projections are then used to find potential shapelet candidates.
The average accuracy and standard deviation over five runs of our proposed LA-ESN method, alongside the other conventional TSC classifiers, on 76 datasets are shown in Table 2. From Table 2, we can conclude the following: (1) LA-ESN achieves higher classification accuracy than the other 12 traditional methods on 36 datasets and achieves comparable results on another 25 datasets; this number of wins significantly exceeds that of the other methods. (2) The average classification accuracy of the proposed LA-ESN over all datasets is also superior to that of the compared methods. (3) For MR, the performance of LA-ESN is only slightly worse than that of the FCOTE method but significantly better than that of the other methods. (4) The ME of the proposed LA-ESN is slightly higher than that of the FCOTE and ST methods but lower than that of the other methods. Therefore, the experimental results indicate that LA-ESN is effective for time series classification in most cases.
To better compare multiple classifiers on multiple datasets, we applied the pairwise post hoc analysis proposed by Benavoli et al. [39], where mean-rank comparisons were computed using the Wilcoxon signed-rank test [40] with Holm's alpha correction (5%) [41]. A critical difference diagram [42] is used to depict the outcome: the further to the right a classifier sits, the better it performs, and classifiers connected by a thick line show no significant difference in accuracy. Figure 6 shows the pairwise statistical difference comparison between LA-ESN and the traditional classifiers. As can be seen from Figure 6, our model is located to the right of the majority of approaches.

4.3.2. Compared with Deep Learning Methods

In this section, we compare the LA-ESN model with four deep learning classifiers: MLP [2], FCN [2], ResNet [2] and Inception Time [32]. The four deep learning comparison methods are briefly described as follows:
(1)
MLP (Multilayer Perceptron). The final result is obtained using three fully connected layers of 500 units each and a softmax layer. Dropout is applied, and ReLU is used as the activation function.
(2)
FCN (Fully Convolutional Network). The FCN model stacks three one-dimensional convolutional blocks with 128, 256 and 128 filters and kernel sizes of 8, 5 and 3, respectively. The features are then fed into a global average pooling layer and a softmax layer to obtain the final result. The FCN model uses the ReLU activation function and batch normalization.
(3)
ResNet (Residual Network). The residual network stacks three residual blocks. Each residual block consists of convolutions with 64, 128 and 256 filters and kernel sizes of 8, 5 and 3, respectively, each followed by batch normalization and a ReLU activation function. ResNet extends the neural network to a very deep structure by adding shortcut connections in each residual block, and it has a relatively high proclivity for overfitting the training data.
(4)
Inception Time. Instead of the usual fully connected layer, it consists of two distinct residual blocks, each made of three Inception sub-blocks. The input of each residual block is passed to the next block via a shortcut linear connection, alleviating the gradient vanishing problem by allowing gradients to flow directly.
The experimental results of the proposed LA-ESN and these four classifiers on 128 datasets are shown in Table 3.
From Table 3, we can observe the following: (1) LA-ESN achieves the highest classification accuracy on 64 datasets; this number of wins significantly exceeds that of the other methods. (2) The average classification accuracy of the proposed LA-ESN over all datasets is also superior to that of the compared methods. (3) The MR value of the proposed LA-ESN is 2.430, which is higher than that of the Inception Time method but smaller than those of the other three classifiers. (4) The proposed LA-ESN has an ME value of 0.047, which is lower than those of MLP and FCN but greater than those of ResNet and Inception Time. (5) LA-ESN and Inception Time perform better than the other three methods, because they can efficiently handle temporal dependencies in time series with high nonlinear mapping capacity and dynamic memory. (6) With the epoch set to 500, the running time of the proposed LA-ESN is short on small datasets and acceptable on large ones. A paired statistical difference comparison of the five deep learning methods is shown in Figure 7. As seen in the diagram, our model is to the right of the majority of methods.

4.3.3. Ablation Study

In this subsection, a step-by-step ablation study is conducted to further verify the validity of the proposed LA-ESN model. We divided the model into four variants: ESN_LSTM, ESN_CNN, ESN_LSTM_ATT (ELA) and ESN_CNN_ATT (ECA), and conducted experiments on 17 datasets. These four methods first employ the ESN to perform representation learning on the time series and then use different classifiers, namely LSTM, CNN, LSTM with attention (LSTM_ATT) and CNN with attention (CNN_ATT), to extract feature information from the representation. To verify the validity of each module, unified parameter settings are adopted: in the ESN, the spectral radius SR is 0.9, the input scaling IS is 0.1, the reservoir sparsity SP is 0.7, and the reservoir size is 32; for the whole experiment, the number of epochs is 500 and the batch size is 25. LSTM_ATT can automatically capture long-term temporal dependencies of sequences, and CNN_ATT can extract feature information at different scales; LA-ESN combines both advantages to extract more valuable information from the datasets and achieve better classification accuracy. Table 4 displays the results of LA-ESN and the four ablated classifiers on 17 datasets: LA-ESN produces the best results on 13 of them. The corresponding critical difference diagram is shown in Figure 8.
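For reference, the unified ablation settings above can be collected in a single configuration object. This is only an illustrative sketch mirroring the values stated in the text; the dictionary layout and key names are ours, not the authors' code.

```python
# Illustrative summary of the unified ablation settings described above.
ABLATION_CONFIG = {
    "esn": {
        "spectral_radius": 0.9,   # SR
        "input_scale": 0.1,       # IS
        "sparsity": 0.7,          # SP
        "reservoir_size": 32,
    },
    "training": {"epochs": 500, "batch_size": 25},
    # Which decoder branches each ablation variant keeps:
    "variants": {
        "ESN_CNN":  {"cnn": True,  "lstm": False, "attention": False},
        "ECA":      {"cnn": True,  "lstm": False, "attention": True},
        "ESN_LSTM": {"cnn": False, "lstm": True,  "attention": False},
        "ELA":      {"cnn": False, "lstm": True,  "attention": True},
        "LA-ESN":   {"cnn": True,  "lstm": True,  "attention": True},
    },
}
```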

5. Conclusions

This study proposes a deep learning model called LA-ESN for the end-to-end classification of univariate time series. The original time series is first passed through the ESN module of LA-ESN to obtain the echo state representation matrix. Then, the echo state representation matrix is used as the input of the LSTM module and of the multi-scale convolutional module, and feature extraction is performed by each. Finally, the results of the two modules are concatenated, and the softmax function is utilized to produce the final classification results. The attention-based LSTM module can automatically capture the long-term temporal dependence of the sequences, while the multi-scale 1D convolutional attention module can extract feature information from the echo state representations at different scales, highlighting the spatial sparsity and heterogeneity of the data. The model performs well without any data reshaping or pre-processing. In extensive trials on the UCR time series archive, the proposed LA-ESN model outperforms several classical approaches and currently popular deep learning methods on the great majority of the selected datasets.
However, this work has not yet addressed the class imbalance problem. Consequently, in the future we will improve the model to obtain better results on most datasets while also addressing class imbalance. Second, we will consider modifying the model to accomplish the classification of multivariate time series data. Finally, LA-ESN is a fairly generic model for time series, and we can subsequently consider applying it to tasks such as time series prediction and clustering.

Author Contributions

Data curation, H.S., M.L., J.H., P.L. and Y.P.; Formal analysis, H.S., M.L. and Y.P.; Methodology, H.S., M.L. and Y.Y.; Resources, H.S., M.L. and J.H.; Supervision, P.L. and Y.Y.; Writing—original draft, H.S., M.L. and P.L.; Writing—review and editing, H.S., M.L., P.L. and Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by grants from the National Natural Science Foundation of China (Nos. 62062040, 61967010 and 62067003), the Outstanding Youth Project of Jiangxi Natural Science Foundation (No. 20212ACB212003), the Jiangxi Province Key Subject Academic and Technical Leader Funding Project (No. 20212BCJ23017), the Jiangxi Province Graduate Innovation Project (No. YC2022-S275), the Central Guided Local Science and Technology Development Special Project (20222ZDH04090) and the Jiangxi Provincial Social Science “14th Five-Year” Planning Project (22WT76).

Data Availability Statement

The data were derived from public domain resources.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hamilton, J.D. Time Series Analysis; Princeton University Press: Oxford, UK, 2020. [Google Scholar]
  2. Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.-A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963. [Google Scholar] [CrossRef] [Green Version]
  3. Xiao, X.; Li, R.; Zheng, H.-T.; Ye, R.; Kumar Sangaiah, A.; Xia, S. Novel dynamic multiple classification system for network traffic. Inf. Sci. 2018, 479, 526–541. [Google Scholar] [CrossRef]
  4. Michida, R.; Katayama, D.; Seiji, I.; Wu, Y.; Koide, T.; Tanaka, S.; Okamoto, Y.; Mieno, H.; Tamaki, T.; Yoshida, S. A Lesion Classification Method Using Deep Learning Based on JNET Classification for Computer-Aided Diagnosis System in Colorectal Magnified NBI Endoscopy. In Proceedings of the 36th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC), Jeju, Republic of Korea, 27–30 June 2021; pp. 1–4. [Google Scholar]
  5. Mori, U.; Mendiburu, A.; Miranda, I.; Lozano, J. Early classification of time series using multi-objective optimization techniques. Inf. Sci. 2019, 492, 204–218. [Google Scholar] [CrossRef] [Green Version]
  6. Wan, Y.; Si, Y.-W. A formal approach to chart patterns classification in financial time series. Inf. Sci. 2017, 411, 151–175. [Google Scholar] [CrossRef]
  7. Wang, H.; Wu, Q.J.; Wang, D.; Xin, J.; Yang, Y.; Yu, K. Echo state network with a global reversible autoencoder for time series classification. Inf. Sci. 2021, 570, 744–768. [Google Scholar] [CrossRef]
  8. Marteau, P.-F.; Gibet, S. On Recursive Edit Distance Kernels with Application to Time Series Classification. IEEE Trans. Neural Networks Learn. Syst. 2014, 26, 1121–1133. [Google Scholar] [CrossRef] [Green Version]
  9. Wang, J.; Zhao, Y. Time Series K-Nearest Neighbors Classifier Based on Fast Dynamic Time Warping. In Proceedings of the 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 28–30 June 2021; pp. 751–754. [Google Scholar]
  10. Gustavo, E.; Batista, A.; Keogh, E.J.; Tataw, O.M.; De Souza, V.M.A. CID: An efficient complexity-invariant distance for time series. Data Min. Knowl. Discov. 2013, 28, 634–669. [Google Scholar] [CrossRef]
  11. Purnawirawan, A.; Wibawa, A.D.; Wulandari, D.P. Classification of P-wave Morphology Using New Local Distance Transform and Random Forests. In Proceedings of the 6th International Conference on Science and Technology (ICST), Yogyakarta, Indonesia, 7–8 September 2020; pp. 1–6. [Google Scholar]
  12. Qiao, Z.; Mizukoshi, Y.; Moteki, T.; Iwata, H. Operation State Identification Method for Unmanned Construction: Extended Search and Registration System of Novel Operation State Based on LSTM and DDTW. In Proceedings of the IEEE/SICE International Symposium on System Integration (SII), Virtual Conference, 9–12 January 2022; pp. 13–18. [Google Scholar]
  13. Rahman, B.; Warnars, H.L.H.S.; Sabarguna, B.S.; Budiharto, W. Heart Disease Classification Model Using K-Nearest Neighbor Algorithm. In Proceedings of the 6th International Conference on Informatics and Computing (ICIC), Virtual Conference, 3–4 November 2021; pp. 1–4. [Google Scholar]
  14. Górecki, T. Using derivatives in a longest common subsequence dissimilarity measure for time series classification. Pattern Recognit. Lett. 2014, 45, 99–105. [Google Scholar] [CrossRef]
  15. Baydogan, M.G.; Runger, G.; Tuv, E. A Bag-of-Features Framework to Classify Time Series. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2796–2802. [Google Scholar] [CrossRef]
  16. Deng, H.; Runger, G.; Tuv, E.; Vladimir, M. A time series forest for classification and feature extraction. Inf. Sci. 2013, 239, 142–153. [Google Scholar] [CrossRef] [Green Version]
  17. Ji, C.; Zhao, C.; Liu, S.; Yang, C.; Pan, L.; Wu, L.; Meng, X. A fast shapelet selection algorithm for time series classification. Comput. Networks 2019, 148, 231–240. [Google Scholar] [CrossRef]
  18. Arul, M.; Kareem, A. Applications of shapelet transform to time series classification of earthquake, wind and wave data. Eng. Struct. 2021, 228, 111564. [Google Scholar] [CrossRef]
  19. Baydogan, M.G.; Runger, G. Time series representation and similarity based on local autopatterns. Data Min. Knowl. Discov. 2016, 30, 476–509. [Google Scholar] [CrossRef]
  20. Kate, R.J. Using dynamic time warping distances as features for improved time series classification. Data Min. Knowl. Discov. 2016, 30, 283–312. [Google Scholar] [CrossRef]
  21. Karim, F.; Majumdar, S.; Darabi, H.; Harford, S. Multivariate LSTM-FCNs for time series classification. Neural Networks 2019, 116, 237–245. [Google Scholar] [CrossRef] [Green Version]
  22. Ma, Q.; Chen, E.; Lin, Z.; Yan, J.; Yu, Z.; Ng, W.W.Y. Convolutional Multitimescale Echo State Network. IEEE Trans. Cybern. 2019, 51, 1613–1625. [Google Scholar] [CrossRef]
  23. Karim, F.; Majumdar, S.; Darabi, H. Insights into LSTM Fully Convolutional Networks for Time Series Classification. IEEE Access 2019, 7, 67718–67725. [Google Scholar] [CrossRef]
  24. Koh, B.H.D.; Lim, C.L.P.; Rahimi, H.; Woo, W.L.; Gao, B. Deep Temporal Convolution Network for Time Series Classification. Sensors 2021, 21, 603. [Google Scholar] [CrossRef]
  25. Ku, B.; Kim, G.; Ahn, J.-K.; Lee, J.; Ko, H. Attention-Based Convolutional Neural Network for Earthquake Event Classification. IEEE Geosci. Remote. Sens. Lett. 2020, 18, 2057–2061. [Google Scholar] [CrossRef]
  26. Tripathi, A.M.; Baruah, R.D. Multivariate Time Series Classification with An Attention-Based Multivariate Convolutional Neural Network. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
  27. Sun, J.; Takeuchi, S.; Yamasaki, I. Prototypical Inception Network with Cross Branch Attention for Time Series Classification. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–7. [Google Scholar]
  28. Li, D.; Lian, C.; Yao, W. Research on time series classification based on convolutional neural network with attention mechanism. In Proceedings of the 11th International Conference on Intelligent Control and Information Processing (ICICIP), Yunnan, China, 3–7 December 2021; pp. 88–93. [Google Scholar]
  29. Huang, Z.; Yang, C.; Chen, X.; Zhou, X.; Chen, G.; Huang, T.; Gui, W. Functional deep echo state network improved by a bi-level optimization approach for multivariate time series classification. Appl. Soft Comput. 2021, 106. [Google Scholar] [CrossRef]
  30. Wang, H.; Liu, Y.; Wang, D.; Luo, Y.; Tong, C.; Lv, Z. Discriminative and regularized echo state network for time series classification. Pattern Recognit. 2022, 130, 1–14. [Google Scholar] [CrossRef]
  31. Zheng, Y.; Liu, Q.; Chen, E.; Ge, Y.; Zhao, J.L. Exploiting multi-channels deep convolutional neural networks for multivariate time series classification. Front. Comput. Sci. 2016, 10, 96–112. [Google Scholar] [CrossRef]
  32. Fawaz, H.I.; Lucas, B.; Forestier, G.; Pelletier, C.; Schmidt, D.F.; Weber, J.; Webb, G.I.; Idoumghar, L.; Muller, P.-A.; Petitjean, F. InceptionTime: Finding AlexNet for time series classification. Data Min. Knowl. Discov. 2020, 34, 1936–1962. [Google Scholar] [CrossRef]
  33. Zheng, Y.; Liu, Q.; Chen, E.; Ge, Y.; Zhao, J.L. Time series classification using multi-channels deep convolutional neural networks. In Proceedings of the International Conference on Web-Age Information Management, Macau, China, 16–18 June 2014; Springer: Cham, Switzerland, 2014; pp. 298–310. [Google Scholar]
  34. Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.-C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR time series archive. IEEE/CAA J. Autom. Sin. 2019, 6, 1293–1305. [Google Scholar] [CrossRef]
  35. Wang, Z.; Yan, W.; Oates, T. Time series classification from scratch with deep neural networks: A strong baseline. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 1578–1585. [Google Scholar]
  36. Schäfer, P. The BOSS is concerned with time series classification in the presence of noise. Data Min. Knowl. Discov. 2015, 29, 1505–1530. [Google Scholar] [CrossRef]
  37. Lines, J.; Bagnall, A. Time series classification with ensembles of elastic distance measures. Data Min. Knowl. Discov. 2015, 29, 565–592. [Google Scholar] [CrossRef]
  38. Bagnall, A.; Lines, J.; Hills, J.; Bostrom, A. Time-Series Classification with COTE: The Collective of Transformation-Based Ensembles. IEEE Trans. Knowl. Data Eng. 2015, 27, 2522–2535. [Google Scholar] [CrossRef]
  39. Benavoli, A.; Corani, G.; Mangili, F. Should we really use post-hoc tests based on mean-ranks? J. Mach. Learn. Res. 2016, 17, 152–161. [Google Scholar] [CrossRef]
  40. Wilcoxon, F. Individual comparisons by ranking methods. In Breakthroughs in Statistics; Springer: New York, NY, USA, 1992; pp. 196–202. [Google Scholar]
  41. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 1979, 6, 65–70. [Google Scholar]
  42. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar] [CrossRef]
Figure 1. The general architecture of the LA-ESN model.
Figure 2. The diagram of the primitive ESN model.
Figure 3. The diagram of a multi-scale 1D convolution. The stride of each 1D convolution in LA-ESN is set to 1. Different colours represent the convolution results with different convolution kernel sizes. From top to bottom, the convolution results are shown for convolution kernel sizes 3, 5 and 8, respectively.
Figure 4. The diagram of the LSTM model.
Figure 5. The diagram of the attention model.
Figure 6. Pairwise statistical difference comparison of LA-ESN and 12 traditional classifiers.
Figure 7. Pairwise statistical difference comparison of LA-ESN and four deep learning methods.
Figure 8. Pairwise statistical difference comparison of LA-ESN and four ablation models.
Table 1. The specific description of 128 datasets used for time series classification.
Dataset | Train | Test | Class | Length
ACSF1 | 100 | 100 | 10 | 1460
Adiac | 390 | 391 | 37 | 176
AllGestureX | 300 | 700 | 10 | 0
AllGestureY | 300 | 700 | 10 | 0
AllGestureZ | 300 | 700 | 10 | 0
ArrowHead | 36 | 175 | 3 | 251
Beef | 30 | 30 | 5 | 470
BeetleFly | 20 | 20 | 2 | 512
BirdChicken | 20 | 20 | 2 | 512
BME | 30 | 150 | 3 | 128
Car | 60 | 60 | 4 | 577
CBF | 30 | 900 | 3 | 128
Chinatown | 20 | 345 | 2 | 24
ChlorineCon | 467 | 3840 | 3 | 166
CinCECGTorso | 40 | 1380 | 4 | 1639
Coffee | 28 | 28 | 2 | 286
Computers | 250 | 250 | 2 | 720
CricketX | 390 | 390 | 12 | 300
CricketY | 390 | 390 | 12 | 300
CricketZ | 390 | 390 | 12 | 300
Crop | 7200 | 16,800 | 24 | 46
DiatomSizeR | 16 | 306 | 4 | 345
DistPhxAgeGp | 400 | 139 | 3 | 80
DistlPhxOutCorr | 600 | 276 | 2 | 80
DistPhxTW | 400 | 139 | 6 | 80
DodgerLoopDay | 78 | 80 | 7 | 288
DodgerLoopGame | 20 | 138 | 2 | 288
DodgerLoopWnd | 20 | 138 | 2 | 288
Earthquakes | 322 | 139 | 2 | 512
ECG200 | 100 | 100 | 2 | 96
ECG5000 | 500 | 4500 | 5 | 140
ECGFiveDays | 23 | 861 | 2 | 136
ElectricDevices | 8926 | 7711 | 7 | 96
EOGHorSignal | 362 | 362 | 12 | 1250
EOGVerticalSignal | 362 | 362 | 12 | 1250
EthanolLevel | 504 | 500 | 4 | 1751
FaceAll | 560 | 1690 | 14 | 131
FaceFour | 24 | 88 | 4 | 350
FacesUCR | 200 | 2050 | 14 | 131
FiftyWords | 450 | 455 | 50 | 270
Fish | 175 | 175 | 7 | 463
FordA | 3601 | 1320 | 2 | 500
FordB | 3636 | 810 | 2 | 500
FreezerRegularT | 150 | 2850 | 2 | 301
FreezerSmallTrain | 28 | 2850 | 2 | 301
Fungi | 18 | 186 | 18 | 201
GestureMidAirD1 | 208 | 130 | 26 | 360
GestureMidAirD2 | 208 | 130 | 26 | 360
GestureMidAirD3 | 208 | 130 | 26 | 360
GesturePebbleZ1 | 132 | 172 | 6 | 0
GesturePebbleZ2 | 146 | 158 | 6 | 0
GunPoint | 50 | 150 | 2 | 150
GunPointAgeSpan | 135 | 316 | 2 | 150
GunPointMaleFe | 135 | 316 | 2 | 150
GunPointOldYg | 135 | 316 | 2 | 150
Ham | 109 | 105 | 2 | 431
HandOutlines | 1000 | 370 | 2 | 2709
Haptics | 155 | 308 | 5 | 1092
Herring | 64 | 64 | 2 | 512
HouseTwenty | 34 | 101 | 2 | 3000
InlineSkate | 100 | 550 | 7 | 1882
InsectEPGRegTra | 62 | 249 | 3 | 601
InsectEPGSmallTra | 17 | 249 | 3 | 601
InsectWingSnd | 30,000 | 20,000 | 10 | 30
ItalyPowerDemand | 67 | 1029 | 2 | 24
LargeKitchenApp | 375 | 375 | 3 | 720
Lightning2 | 60 | 61 | 2 | 637
Lightning7 | 70 | 73 | 7 | 319
Mallat | 55 | 2345 | 8 | 1024
Meat | 60 | 60 | 3 | 448
MedicalImages | 381 | 760 | 10 | 99
MelbournePed | 1194 | 2439 | 10 | 24
MidPhxOutAgeGp | 400 | 154 | 3 | 80
MidPhxOutCorr | 600 | 291 | 2 | 80
MidPhxTW | 399 | 154 | 6 | 80
MixedRegularTrain | 500 | 2425 | 5 | 1024
MixedSmallTrain | 100 | 2425 | 5 | 1024
MoteStrain | 20 | 1252 | 2 | 84
NonFetalECGTh1 | 1800 | 1965 | 42 | 750
NonFetalECGTh2 | 1800 | 1965 | 42 | 750
OliveOil | 30 | 30 | 4 | 570
OSULeaf | 200 | 242 | 6 | 427
PhaOutCorr | 1800 | 858 | 2 | 80
Phoneme | 214 | 1896 | 39 | 1024
PickupGestureWZ | 50 | 50 | 10 | 0
PigAirwayPressure | 104 | 208 | 52 | 2000
PigArtPressure | 104 | 208 | 52 | 2000
PigCVP | 104 | 208 | 52 | 2000
PLAID | 537 | 537 | 11 | 0
Plane | 105 | 105 | 7 | 144
PowerCons | 180 | 180 | 2 | 144
ProxPhxOutAgeGp | 400 | 205 | 3 | 80
ProxPhaxOutCorr | 600 | 291 | 2 | 80
ProxPhxTW | 400 | 205 | 6 | 80
RefDevices | 375 | 375 | 3 | 720
Rock | 20 | 50 | 4 | 2844
ScreenType | 375 | 375 | 3 | 720
SemgHandGenCh2 | 300 | 600 | 2 | 1500
SemgHandMovCh2 | 450 | 450 | 6 | 1500
SemgHandSubCh2 | 450 | 450 | 5 | 1500
ShakeGestureWZ | 50 | 50 | 10 | 0
ShapeletSim | 20 | 180 | 2 | 500
ShapesAll | 600 | 600 | 60 | 512
SmallKitchenApp | 375 | 375 | 3 | 720
SmoothSubspace | 150 | 150 | 3 | 15
SonyAIBORobSur1 | 20 | 601 | 2 | 70
SonyAIBORobSur2 | 27 | 953 | 2 | 65
StarLightCurves | 1000 | 8236 | 3 | 1024
Strawberry | 613 | 370 | 2 | 235
SwedishLeaf | 500 | 625 | 15 | 128
Symbols | 25 | 995 | 6 | 398
SyntheticControl | 300 | 300 | 6 | 60
ToeSegmentation1 | 40 | 228 | 2 | 277
ToeSegmentation2 | 36 | 130 | 2 | 343
Trace | 100 | 100 | 4 | 275
TwoLeadECG | 23 | 1139 | 2 | 82
TwoPatterns | 1000 | 4000 | 4 | 128
UMD | 36 | 144 | 3 | 150
UWaveAll | 896 | 3582 | 8 | 945
UwaveX | 896 | 3582 | 8 | 315
UwaveY | 896 | 3582 | 8 | 315
UwaveZ | 896 | 3582 | 8 | 315
Wafer | 1000 | 6164 | 2 | 152
Wine | 57 | 54 | 2 | 234
WordSynonyms | 267 | 638 | 25 | 270
Worms | 181 | 77 | 5 | 900
WormsTwoClass | 181 | 77 | 2 | 900
Yoga | 300 | 3000 | 2 | 426
Table 2. The results of the LA-ESN model and different traditional TSC models on 76 datasets.
Dataset | ED | DDTW | DTD | LCSS | BOSS | EE | FCOTE | TSF | TSBF | ST | LPS | FS | LA-ESN
Adiac | 0.611 | 0.701 | 0.701 | 0.522 | 0.765 | 0.665 | 0.790 | 0.731 | 0.770 | 0.783 | 0.770 | 0.593 | 0.984 (0.002)
Beef | 0.667 | 0.667 | 0.667 | 0.867 | 0.800 | 0.633 | 0.867 | 0.767 | 0.567 | 0.900 | 0.600 | 0.567 | 0.935 (0.009)
BeetleFly | 0.750 | 0.650 | 0.650 | 0.800 | 0.900 | 0.750 | 0.800 | 0.750 | 0.800 | 0.900 | 0.800 | 0.700 | 0.783 (0.080)
BirdChicken | 0.550 | 0.850 | 0.800 | 0.800 | 0.950 | 0.800 | 0.900 | 0.800 | 0.900 | 0.800 | 1.000 | 0.750 | 0.775 (0.048)
Car | 0.733 | 0.800 | 0.783 | 0.767 | 0.833 | 0.833 | 0.900 | 0.767 | 0.783 | 0.917 | 0.850 | 0.750 | 0.931 (0.002)
CBF | 0.852 | 0.997 | 0.980 | 0.991 | 0.998 | 0.998 | 0.996 | 0.994 | 0.988 | 0.974 | 0.999 | 0.940 | 0.901 (0.030)
ChlorineCon | 0.650 | 0.708 | 0.713 | 0.592 | 0.661 | 0.656 | 0.727 | 0.720 | 0.692 | 0.700 | 0.608 | 0.546 | 0.911 (0.009)
CinCECGTorso | 0.897 | 0.725 | 0.852 | 0.869 | 0.900 | 0.942 | 0.995 | 0.983 | 0.712 | 0.954 | 0.736 | 0.859 | 0.909 (0.004)
Computers | 0.576 | 0.716 | 0.716 | 0.584 | 0.756 | 0.708 | 0.740 | 0.720 | 0.756 | 0.736 | 0.680 | 0.500 | 0.551 (0.012)
CricketX | 0.577 | 0.754 | 0.754 | 0.741 | 0.736 | 0.813 | 0.808 | 0.664 | 0.705 | 0.772 | 0.697 | 0.485 | 0.930 (0.004)
CricketY | 0.567 | 0.777 | 0.774 | 0.718 | 0.754 | 0.805 | 0.826 | 0.672 | 0.736 | 0.779 | 0.767 | 0.531 | 0.933 (0.002)
CricketZ | 0.587 | 0.774 | 0.774 | 0.741 | 0.746 | 0.782 | 0.815 | 0.672 | 0.715 | 0.787 | 0.754 | 0.464 | 0.935 (0.003)
DiatomSizeR | 0.935 | 0.967 | 0.915 | 0.980 | 0.931 | 0.944 | 0.928 | 0.931 | 0.899 | 0.925 | 0.905 | 0.866 | 0.978 (0.005)
DistPhxAgeGp | 0.626 | 0.705 | 0.662 | 0.719 | 0.748 | 0.691 | 0.748 | 0.748 | 0.712 | 0.770 | 0.669 | 0.655 | 0.803 (0.007)
DistPhxCorr | 0.717 | 0.732 | 0.725 | 0.779 | 0.728 | 0.728 | 0.761 | 0.772 | 0.783 | 0.775 | 0.721 | 0.750 | 0.748 (0.009)
DistPhxTW | 0.633 | 0.612 | 0.576 | 0.626 | 0.676 | 0.647 | 0.698 | 0.669 | 0.676 | 0.662 | 0.568 | 0.626 | 0.893 (0.005)
Earthquakes | 0.712 | 0.705 | 0.705 | 0.741 | 0.748 | 0.741 | 0.748 | 0.748 | 0.748 | 0.741 | 0.640 | 0.705 | 0.734 (0.014)
ECG200 | 0.880 | 0.830 | 0.840 | 0.880 | 0.870 | 0.880 | 0.880 | 0.870 | 0.840 | 0.830 | 0.860 | 0.810 | 0.904 (0.010)
ECG5000 | 0.925 | 0.924 | 0.924 | 0.932 | 0.941 | 0.939 | 0.946 | 0.939 | 0.940 | 0.944 | 0.917 | 0.923 | 0.973 (0.000)
ECGFiveDays | 0.797 | 0.769 | 0.822 | 1.000 | 0.983 | 0.820 | 0.999 | 0.956 | 0.877 | 0.984 | 0.879 | 0.998 | 0.883 (0.040)
ElectricDevices | 0.552 | 0.592 | 0.594 | 0.587 | 0.799 | 0.663 | 0.713 | 0.693 | 0.703 | 0.747 | 0.681 | 0.579 | 0.895 (0.003)
FaceAll | 0.714 | 0.902 | 0.899 | 0.749 | 0.782 | 0.849 | 0.918 | 0.751 | 0.744 | 0.779 | 0.767 | 0.626 | 0.974 (0.002)
FaceFour | 0.784 | 0.829 | 0.818 | 0.966 | 0.996 | 0.909 | 0.898 | 0.932 | 1.000 | 0.852 | 0.943 | 0.909 | 0.910 (0.011)
FacesUCR | 0.769 | 0.904 | 0.908 | 0.939 | 0.957 | 0.945 | 0.942 | 0.883 | 0.867 | 0.906 | 0.926 | 0.706 | 0.977 (0.003)
FiftyWords | 0.631 | 0.754 | 0.754 | 0.730 | 0.705 | 0.820 | 0.798 | 0.741 | 0.758 | 0.705 | 0.818 | 0.481 | 0.988 (0.001)
Fish | 0.783 | 0.943 | 0.926 | 0.960 | 0.989 | 0.966 | 0.983 | 0.794 | 0.834 | 0.989 | 0.943 | 0.783 | 0.965 (0.002)
FordA | 0.665 | 0.723 | 0.765 | 0.957 | 0.930 | 0.738 | 0.957 | 0.815 | 0.850 | 0.971 | 0.873 | 0.787 | 0.789 (0.011)
FordB | 0.606 | 0.667 | 0.653 | 0.917 | 0.711 | 0.662 | 0.804 | 0.688 | 0.599 | 0.807 | 0.711 | 0.728 | 0.697 (0.014)
GunPoint | 0.913 | 0.980 | 0.987 | 1.000 | 0.994 | 0.993 | 1.000 | 0.973 | 0.987 | 1.000 | 0.993 | 0.947 | 0.947 (0.007)
Ham | 0.600 | 0.476 | 0.552 | 0.667 | 0.667 | 0.571 | 0.648 | 0.743 | 0.762 | 0.686 | 0.562 | 0.648 | 0.689 (0.009)
HandOutlines | 0.862 | 0.868 | 0.865 | 0.481 | 0.903 | 0.889 | 0.919 | 0.919 | 0.854 | 0.932 | 0.881 | 0.811 | 0.903 (0.005)
Haptics | 0.370 | 0.399 | 0.399 | 0.468 | 0.461 | 0.393 | 0.523 | 0.445 | 0.490 | 0.523 | 0.432 | 0.393 | 0.783 (0.007)
Herring | 0.516 | 0.547 | 0.547 | 0.625 | 0.547 | 0.578 | 0.625 | 0.609 | 0.641 | 0.672 | 0.578 | 0.531 | 0.609 (0.018)
InlineSkate | 0.342 | 0.562 | 0.509 | 0.438 | 0.516 | 0.460 | 0.495 | 0.376 | 0.385 | 0.373 | 0.500 | 0.189 | 0.816 (0.004)
InsWngSnd | 0.562 | 0.355 | 0.473 | 0.606 | 0.523 | 0.595 | 0.653 | 0.633 | 0.625 | 0.627 | 0.551 | 0.489 | 0.931 (0.001)
ItalyPrDmd | 0.955 | 0.950 | 0.951 | 0.960 | 0.909 | 0.962 | 0.961 | 0.960 | 0.883 | 0.948 | 0.923 | 0.917 | 0.958 (0.008)
LrgKitApp | 0.493 | 0.795 | 0.795 | 0.701 | 0.765 | 0.811 | 0.845 | 0.571 | 0.528 | 0.859 | 0.717 | 0.560 | 0.637 (0.015)
Lightning2 | 0.754 | 0.869 | 0.869 | 0.820 | 0.810 | 0.885 | 0.869 | 0.803 | 0.738 | 0.738 | 0.819 | 0.705 | 0.702 (0.026)
Lightning7 | 0.575 | 0.671 | 0.657 | 0.795 | 0.666 | 0.767 | 0.808 | 0.753 | 0.726 | 0.726 | 0.739 | 0.644 | 0.901 (0.009)
Mallat | 0.914 | 0.949 | 0.927 | 0.950 | 0.949 | 0.939 | 0.954 | 0.919 | 0.960 | 0.964 | 0.908 | 0.976 | 0.976 (0.005)
Meat | 0.933 | 0.933 | 0.933 | 0.733 | 0.900 | 0.933 | 0.917 | 0.933 | 0.933 | 0.850 | 0.883 | 0.833 | 0.917 (0.054)
MedicalImages | 0.684 | 0.737 | 0.745 | 0.664 | 0.718 | 0.742 | 0.758 | 0.755 | 0.705 | 0.670 | 0.746 | 0.624 | 0.945 (0.002)
MidPhxAgeGp | 0.519 | 0.539 | 0.500 | 0.571 | 0.545 | 0.558 | 0.636 | 0.578 | 0.578 | 0.643 | 0.487 | 0.545 | 0.660 (0.021)
MidPhxCorr | 0.766 | 0.732 | 0.742 | 0.780 | 0.780 | 0.784 | 0.804 | 0.828 | 0.814 | 0.794 | 0.773 | 0.729 | 0.779 (0.009)
MidPhxTW | 0.513 | 0.487 | 0.500 | 0.506 | 0.545 | 0.513 | 0.571 | 0.565 | 0.597 | 0.519 | 0.526 | 0.532 | 0.855 (0.006)
MoteStrain | 0.879 | 0.833 | 0.768 | 0.883 | 0.846 | 0.883 | 0.937 | 0.869 | 0.903 | 0.897 | 0.922 | 0.777 | 0.848 (0.022)
NonInv_Thor1 | 0.829 | 0.806 | 0.841 | 0.259 | 0.838 | 0.846 | 0.931 | 0.876 | 0.842 | 0.950 | 0.812 | 0.710 | 0.997 (0.000)
NonInv_Thor2 | 0.880 | 0.893 | 0.890 | 0.770 | 0.900 | 0.913 | 0.946 | 0.910 | 0.862 | 0.951 | 0.841 | 0.754 | 0.997 (0.000)
OliveOil | 0.867 | 0.833 | 0.867 | 0.167 | 0.867 | 0.867 | 0.900 | 0.867 | 0.833 | 0.900 | 0.867 | 0.733 | 0.750 (0.000)
OSULeaf | 0.521 | 0.880 | 0.884 | 0.777 | 0.955 | 0.806 | 0.967 | 0.583 | 0.760 | 0.967 | 0.740 | 0.678 | 0.853 (0.005)
PhaOutCorr | 0.761 | 0.739 | 0.761 | 0.765 | 0.772 | 0.773 | 0.770 | 0.803 | 0.830 | 0.763 | 0.756 | 0.744 | 0.784 (0.019)
Phoneme | 0.109 | 0.269 | 0.268 | 0.218 | 0.265 | 0.305 | 0.349 | 0.212 | 0.276 | 0.321 | 0.237 | 0.174 | 0.963 (0.000)
ProxPhxAgeGp | 0.785 | 0.800 | 0.795 | 0.834 | 0.834 | 0.805 | 0.854 | 0.849 | 0.849 | 0.844 | 0.795 | 0.780 | 0.893 (0.006)
ProxPhxCorr | 0.808 | 0.794 | 0.794 | 0.849 | 0.849 | 0.808 | 0.869 | 0.828 | 0.873 | 0.883 | 0.842 | 0.804 | 0.865 (0.026)
ProxPhxTW | 0.707 | 0.769 | 0.771 | 0.776 | 0.800 | 0.766 | 0.780 | 0.815 | 0.810 | 0.805 | 0.732 | 0.702 | 0.926 (0.007)
RefDev | 0.395 | 0.445 | 0.445 | 0.515 | 0.499 | 0.437 | 0.547 | 0.589 | 0.472 | 0.581 | 0.459 | 0.333 | 0.596 (0.009)
ScreenType | 0.360 | 0.429 | 0.437 | 0.429 | 0.464 | 0.445 | 0.547 | 0.456 | 0.509 | 0.520 | 0.416 | 0.413 | 0.593 (0.013)
ShapesAll | 0.752 | 0.850 | 0.838 | 0.768 | 0.908 | 0.867 | 0.892 | 0.792 | 0.185 | 0.842 | 0.873 | 0.580 | 0.993 (0.000)
SmlKitApp | 0.344 | 0.640 | 0.648 | 0.664 | 0.725 | 0.696 | 0.776 | 0.811 | 0.672 | 0.792 | 0.712 | 0.333 | 0.619 (0.023)
SonyAIBORobot1 | 0.696 | 0.742 | 0.710 | 0.810 | 0.895 | 0.704 | 0.845 | 0.787 | 0.795 | 0.844 | 0.774 | 0.686 | 0.730 (0.054)
SonyAIBORobot2 | 0.859 | 0.892 | 0.892 | 0.875 | 0.888 | 0.878 | 0.952 | 0.810 | 0.778 | 0.934 | 0.872 | 0.790 | 0.836 (0.014)
StarlightCurves | 0.849 | 0.962 | 0.962 | 0.947 | 0.978 | 0.926 | 0.980 | 0.969 | 0.977 | 0.979 | 0.963 | 0.918 | 0.960 (0.002)
Strawberry | 0.946 | 0.954 | 0.957 | 0.911 | 0.976 | 0.946 | 0.951 | 0.965 | 0.954 | 0.962 | 0.962 | 0.903 | 0.947 (0.016)
SwedishLeaf | 0.789 | 0.901 | 0.896 | 0.907 | 0.922 | 0.915 | 0.955 | 0.914 | 0.915 | 0.928 | 0.920 | 0.768 | 0.984 (0.001)
Symbols | 0.899 | 0.953 | 0.963 | 0.932 | 0.961 | 0.959 | 0.963 | 0.915 | 0.946 | 0.882 | 0.963 | 0.934 | 0.950 (0.009)
SynthCntr | 0.880 | 0.993 | 0.997 | 0.997 | 0.967 | 0.990 | 1.000 | 0.987 | 0.994 | 0.983 | 0.980 | 0.910 | 0.990 (0.003)
Trace | 0.760 | 1.000 | 0.990 | 1.000 | 1.000 | 0.990 | 1.000 | 0.990 | 0.980 | 1.000 | 0.980 | 1.000 | 0.908 (0.010)
TwoLeadECG | 0.747 | 0.978 | 0.985 | 0.996 | 0.985 | 0.971 | 0.993 | 0.759 | 0.866 | 0.997 | 0.948 | 0.924 | 0.795 (0.050)
TwoPatterns | 0.907 | 1.000 | 0.999 | 0.993 | 0.991 | 1.000 | 1.000 | 0.991 | 0.976 | 0.955 | 0.982 | 0.908 | 0.980 (0.006)
UWaveX | 0.739 | 0.779 | 0.775 | 0.791 | 0.753 | 0.805 | 0.822 | 0.804 | 0.831 | 0.803 | 0.829 | 0.695 | 0.941 (0.001)
UWaveY | 0.662 | 0.716 | 0.698 | 0.703 | 0.661 | 0.726 | 0.759 | 0.727 | 0.736 | 0.730 | 0.761 | 0.596 | 0.922 (0.001)
UWaveZ | 0.649 | 0.696 | 0.679 | 0.747 | 0.695 | 0.724 | 0.750 | 0.743 | 0.772 | 0.748 | 0.768 | 0.638 | 0.925 (0.001)
Wafer | 0.995 | 0.980 | 0.993 | 0.996 | 0.995 | 0.997 | 1.000 | 0.996 | 0.995 | 1.000 | 0.997 | 0.997 | 0.995 (0.000)
Wine | 0.611 | 0.574 | 0.611 | 0.500 | 0.912 | 0.574 | 0.648 | 0.630 | 0.611 | 0.796 | 0.629 | 0.759 | 0.497 (0.007)
WordSynonyms | 0.618 | 0.730 | 0.730 | 0.607 | 0.659 | 0.779 | 0.757 | 0.647 | 0.688 | 0.570 | 0.755 | 0.431 | 0.970 (0.000)
Yoga | 0.830 | 0.856 | 0.856 | 0.834 | 0.918 | 0.879 | 0.877 | 0.859 | 0.819 | 0.818 | 0.869 | 0.695 | 0.844 (0.004)
Average | 0.703 | 0.766 | 0.767 | 0.756 | 0.805 | 0.785 | 0.831 | 0.780 | 0.769 | 0.814 | 0.777 | 0.694 | 0.861
Total | 1 | 3 | 2 | 5 | 9 | 4 | 13 | 4 | 7 | 13 | 3 | 1 | 36
MR | 10.605 | 7.855 | 7.947 | 7.026 | 5.474 | 6.158 | 2.961 | 6.224 | 6.566 | 4.750 | 7.263 | 10.921 | 4.684
ME | 0.079 | 0.065 | 0.065 | 0.062 | 0.050 | 0.060 | 0.046 | 0.058 | 0.059 | 0.048 | 0.060 | 0.077 | 0.052
Table 3. The results of LA-ESN and four deep learning methods.
Dataset | MLP | FCN | ResNet | Inception Time | LA-ESN AC (SD) | Time (s)
ACSF1 | 0.558 | 0.898 | 0.916 | 0.896 | 0.907 (0.004) | 169.4
Adiac | 0.391 | 0.841 | 0.833 | 0.830 | 0.984 (0.002) | 348.8
AllGestureX | 0.477 | 0.713 | 0.741 | 0.772 | 0.904 (0.001) | 257.7
AllGestureY | 0.571 | 0.784 | 0.794 | 0.813 | 0.912 (0.002) | 257.0
AllGestureZ | 0.439 | 0.692 | 0.726 | 0.792 | 0.891 (0.001) | 257.7
ArrowHead | 0.784 | 0.843 | 0.838 | 0.847 | 0.850 (0.016) | 39.0
Beef | 0.713 | 0.680 | 0.753 | 0.687 | 0.935 (0.009) | 40.1
BeetleFly | 0.880 | 0.910 | 0.850 | 0.800 | 0.790 (0.086) | 85.1
BirdChicken | 0.740 | 0.940 | 0.880 | 0.950 | 0.790 (0.037) | 30.8
BME | 0.905 | 0.836 | 0.999 | 0.993 | 0.928 (0.004) | 29.6
Car | 0.783 | 0.913 | 0.917 | 0.890 | 0.931 (0.002) | 107.4
CBF | 0.869 | 0.994 | 0.996 | 0.998 | 0.900 (0.033) | 129.0
Chinatown | 0.872 | 0.980 | 0.978 | 0.983 | 0.706 (0.091) | 39.3
ChlorineCon | 0.800 | 0.817 | 0.853 | 0.873 | 0.911 (0.009) | 892.9
CinCECGTorso | 0.838 | 0.829 | 0.838 | 0.842 | 0.909 (0.004) | 1033.2
Coffee | 0.993 | 1.000 | 1.000 | 1.000 | 1.000 (0.000) | 94.7
Computers | 0.558 | 0.819 | 0.806 | 0.786 | 0.548 (0.010) | 299.5
CricketX | 0.591 | 0.794 | 0.799 | 0.841 | 0.930 (0.004) | 308.2
CricketY | 0.598 | 0.793 | 0.810 | 0.839 | 0.933 (0.002) | 323.2
CricketZ | 0.629 | 0.810 | 0.809 | 0.849 | 0.935 (0.003) | 302.1
Crop | 0.618 | 0.738 | 0.743 | 0.751 | 0.911 (0.136) | 1657.0
DiatomSizeR | 0.909 | 0.346 | 0.301 | 0.935 | 0.978 (0.005) | 108.2
DistPhxAgeGp | 0.647 | 0.718 | 0.718 | 0.734 | 0.803 (0.007) | 123.5
DistlPhxOutCorr | 0.727 | 0.760 | 0.770 | 0.768 | 0.748 (0.009) | 197.1
DistPhxTW | 0.610 | 0.695 | 0.663 | 0.665 | 0.893 (0.005) | 123.7
DodgerLoopDay | 0.160 | 0.143 | 0.150 | 0.150 | 0.875 (0.013) | 49.8
DodgerLoopGame | 0.865 | 0.768 | 0.710 | 0.854 | 0.810 (0.048) | 34.5
DodgerLoopWnd | 0.978 | 0.904 | 0.952 | 0.970 | 0.971 (0.022) | 34.1
Earthquakes | 0.727 | 0.725 | 0.712 | 0.742 | 0.738 (0.012) | 287.4
ECG200 | 0.914 | 0.888 | 0.874 | 0.918 | 0.904 (0.010) | 93.7
ECG5000 | 0.930 | 0.940 | 0.935 | 0.939 | 0.973 (0.000) | 712.0
ECGFiveDays | 0.973 | 0.985 | 0.966 | 1.000 | 0.892 (0.038) | 203.2
ElectricDevices | 0.593 | 0.706 | 0.728 | 0.709 | 0.896 (0.003) | 4303.5
EOGHorSignal | 0.432 | 0.565 | 0.599 | 0.588 | 0.899 (0.005) | 511.0
EOGVerticalSignal | 0.418 | 0.446 | 0.445 | 0.464 | 0.899 (0.003) | 508.3
EthanolLevel | 0.386 | 0.484 | 0.758 | 0.804 | 0.765 (0.030) | 933.2
FaceAll | 0.794 | 0.938 | 0.867 | 0.801 | 0.974 (0.002) | 624.2
FaceFour | 0.836 | 0.930 | 0.955 | 0.957 | 0.910 (0.011) | 47.9
FacesUCR | 0.831 | 0.943 | 0.954 | 0.964 | 0.977 (0.003) | 320.9
FiftyWords | 0.708 | 0.646 | 0.740 | 0.807 | 0.988 (0.001) | 311.3
Fish | 0.848 | 0.961 | 0.981 | 0.976 | 0.965 (0.002) | 165.3
FordA | 0.816 | 0.914 | 0.937 | 0.957 | 0.786 (0.010) | 3204.3
FordB | 0.707 | 0.772 | 0.813 | 0.849 | 0.700 (0.014) | 2970.7
FreezerRegularT | 0.906 | 0.997 | 0.998 | 0.996 | 0.959 (0.013) | 352.7
FreezerSmallTrain | 0.686 | 0.683 | 0.832 | 0.866 | 0.676 (0.007) | 324.5
Fungi | 0.863 | 0.018 | 0.177 | 1.000 | 0.986 (0.006) | 35.4
GestureMidAirD1 | 0.575 | 0.695 | 0.698 | 0.732 | 0.969 (0.002) | 111.8
GestureMidAirD2 | 0.545 | 0.631 | 0.668 | 0.708 | 0.963 (0.002) | 110.5
GestureMidAirD3 | 0.382 | 0.326 | 0.340 | 0.366 | 0.955 (0.002) | 111.3
GesturePebbleZ1 | 0.792 | 0.880 | 0.901 | 0.922 | 0.942 (0.003) | 100.8
GesturePebbleZ2 | 0.701 | 0.781 | 0.777 | 0.875 | 0.922 (0.018) | 104.3
GunPoint | 0.928 | 1.000 | 0.991 | 1.000 | 0.947 (0.007) | 52.7
GunPointAgeSpan | 0.934 | 0.996 | 0.997 | 0.987 | 0.882 (0.009) | 71.4
GunPointMaleFe | 0.980 | 0.997 | 0.992 | 0.996 | 0.985 (0.004) | 70.1
GunPointOldYg | 0.941 | 0.989 | 0.989 | 0.962 | 0.992 (0.007) | 70.6
Ham | 0.699 | 0.707 | 0.758 | 0.705 | 0.688 (0.009) | 119.4
HandOutlines | 0.914 | 0.799 | 0.914 | 0.946 | 0.903 (0.005) | 3273.4
Haptics | 0.425 | 0.490 | 0.510 | 0.549 | 0.783 (0.007) | 368.5
Herring | 0.491 | 0.644 | 0.600 | 0.666 | 0.613 (0.018) | 89.4
HouseTwenty | 0.734 | 0.982 | 0.983 | 0.975 | 0.803 (0.009) | 154.8
InlineSkate | 0.335 | 0.332 | 0.377 | 0.485 | 0.816 (0.004) | 631.9
InsectEPGRegTra | 0.646 | 0.999 | 0.998 | 0.998 | 0.788 (0.015) | 94.3
InsectEPGSmallTra | 0.627 | 0.218 | 0.372 | 0.941 | 0.756 (0.005) | 73.0
InsectWingSnd | 0.604 | 0.392 | 0.499 | 0.630 | 0.931 (0.001) | 497.3
ItalyPowerDemand | 0.953 | 0.963 | 0.962 | 0.964 | 0.958 (0.008) | 166.9
LargeKitchenApp | 0.470 | 0.903 | 0.901 | 0.900 | 0.641 (0.014) | 497.1
Lightning2 | 0.682 | 0.734 | 0.780 | 0.787 | 0.702 (0.028) | 112.5
Lightning7 | 0.616 | 0.825 | 0.827 | 0.803 | 0.901 (0.009) | 86.6
Mallat | 0.923 | 0.967 | 0.974 | 0.941 | 0.976 (0.005) | 1212.1
Meat | 0.893 | 0.803 | 0.990 | 0.933 | 0.916 (0.059) | 93.0
MedicalImages | 0.719 | 0.778 | 0.770 | 0.787 | 0.945 (0.002) | 253.2
MelbournePed | 0.863 | 0.912 | 0.909 | 0.908 | 0.973 (0.004) | 322.6
MidPhxOutAgeGp | 0.522 | 0.535 | 0.545 | 0.523 | 0.660 (0.021) | 143.6
MidPhxOutCorr | 0.755 | 0.795 | 0.826 | 0.816 | 0.781 (0.008) | 208.5
MidPhxTW | 0.536 | 0.501 | 0.495 | 0.508 | 0.855 (0.006) | 173.7
MixedRegularTrain | 0.907 | 0.955 | 0.973 | 0.966 | 0.963 (0.002) | 1101.6
MixedSmallTrain | 0.841 | 0.893 | 0.917 | 0.912 | 0.934 (0.004) | 774.1
MoteStrain | 0.855 | 0.936 | 0.924 | 0.886 | 0.848 (0.022) | 201.1
NonFetalECGTh1 | 0.915 | 0.958 | 0.941 | 0.956 | 0.997 (0.000) | 2374.7
NonFetalECGTh2 | 0.918 | 0.953 | 0.944 | 0.958 | 0.997 (0.000) | 2320.2
OliveOil | 0.653 | 0.720 | 0.847 | 0.820 | 0.750 (0.000) | 63.8
OSULeaf | 0.560 | 0.979 | 0.980 | 0.925 | 0.853 (0.005) | 201.5
PhaOutCorr | 0.756 | 0.818 | 0.845 | 0.838 | 0.777 (0.013) | 512.4
Phoneme | 0.094 | 0.328 | 0.333 | 0.328 | 0.963 (0.000) | 1109.9
PickupGestureWZ | 0.604 | 0.744 | 0.704 | 0.744 | 0.936 (0.006) | 31.8
PigAirwayPressure | 0.065 | 0.172 | 0.406 | 0.532 | 0.969 (0.001) | 302.6
PigArtPressure | 0.105 | 0.987 | 0.991 | 0.993 | 0.971 (0.001) | 303.6
PigCVP | 0.076 | 0.831 | 0.918 | 0.953 | 0.974 (0.000) | 302.1
PLAID | 0.625 | 0.904 | 0.940 | 0.937 | 0.932 (0.001) | 791.3
Plane | 0.977 | 1.000 | 1.000 | 1.000 | 0.993 (0.001) | 79.7
PowerCons | 0.977 | 0.863 | 0.879 | 0.948 | 0.979 (0.010) | 70.1
ProxPhxOutAgeGp | 0.849 | 0.825 | 0.847 | 0.845 | 0.893 (0.006) | 127.1
ProxPhaxOutCorr | 0.730 | 0.907 | 0.920 | 0.918 | 0.865 (0.026) | 183.3
ProxPhxTW | 0.767 | 0.761 | 0.773 | 0.781 | 0.926 (0.007) | 133.6
RefDevices | 0.377 | 0.497 | 0.530 | 0.523 | 0.596 (0.009) | 439.7
Rock | 0.852 | 0.632 | 0.552 | 0.752 | 0.917 (0.029) | 105.3
ScreenType | 0.402 | 0.622 | 0.615 | 0.580 | 0.591 (0.014) | 470.7
SemgHandGenCh2 | 0.822 | 0.816 | 0.824 | 0.802 | 0.825 (0.033) | 599.5
SemgHandMovCh2 | 0.435 | 0.476 | 0.439 | 0.420 | 0.815 (0.015) | 710.8
SemgHandSubCh2 | 0.817 | 0.742 | 0.739 | 0.787 | 0.922 (0.007) | 707.8
ShakeGestureWZ | 0.548 | 0.884 | 0.880 | 0.900 | 0.920 (0.008) | 33.2
ShapeletSim | 0.513 | 0.706 | 0.782 | 0.917 | 0.510 (0.032) | 54.4
ShapesAll | 0.776 | 0.894 | 0.926 | 0.918 | 0.993 (0.000) | 561.3
SmallKitchenApp | 0.380 | 0.777 | 0.781 | 0.756 | 0.615 (0.024) | 452.6
SmoothSubspace | 0.980 | 0.975 | 0.980 | 0.981 | 0.880 (0.021) | 30.4
SonyAIBORobSur1 | 0.692 | 0.958 | 0.961 | 0.864 | 0.734 (0.059) | 116.2
SonyAIBORobSur2 | 0.831 | 0.980 | 0.975 | 0.946 | 0.838 (0.014) | 153.8
StarLightCurves | 0.950 | 0.965 | 0.972 | 0.978 | 0.960 (0.002) | 4779.5
Strawberry | 0.959 | 0.975 | 0.980 | 0.983 | 0.947 (0.016) | 350.3
SwedishLeaf | 0.845 | 0.967 | 0.963 | 0.964 | 0.984 (0.001) | 247.5
Symbols | 0.836 | 0.955 | 0.893 | 0.980 | 0.948 (0.009) | 345.0
SyntheticControl | 0.973 | 0.989 | 0.997 | 0.996 | 0.990 (0.003) | 100.6
ToeSegmentation1 | 0.589 | 0.961 | 0.957 | 0.961 | 0.639 (0.025) | 52.9
ToeSegmentation2 | 0.745 | 0.889 | 0.894 | 0.943 | 0.794 (0.026) | 44.2
Trace | 0.806 | 1.000 | 1.000 | 1.000 | 0.910 (0.009) | 78.2
TwoLeadECG | 0.753 | 0.999 | 1.000 | 0.997 | 0.806 (0.048) | 190.0
TwoPatterns | 0.948 | 0.870 | 1.000 | 1.000 | 0.980 (0.006) | 813.6
UMD | 0.949 | 0.988 | 0.990 | 0.982 | 0.925 (0.021) | 33.0
UWaveAll | 0.954 | 0.818 | 0.861 | 0.944 | 0.988 (0.001) | 1740.2
UWaveX | 0.768 | 0.754 | 0.781 | 0.814 | 0.941 (0.001) | 1324.3
UWaveY | 0.699 | 0.642 | 0.666 | 0.755 | 0.922 (0.001) | 1346.0
UWaveZ | 0.697 | 0.727 | 0.749 | 0.750 | 0.925 (0.001) | 1251.3
Wafer | 0.996 | 0.997 | 0.998 | 0.999 | 0.995 (0.000) | 1309.3
Wine | 0.541 | 0.611 | 0.722 | 0.659 | 0.496 (0.007) | 64.2
WordSynonyms | 0.599 | 0.561 | 0.617 | 0.732 | 0.970 (0.000) | 296.5
Worms | 0.457 | 0.782 | 0.761 | 0.769 | 0.794 (0.012) | 176.6
WormsTwoClass | 0.608 | 0.743 | 0.748 | 0.782 | 0.626 (0.056) | 177.2
Yoga | 0.856 | 0.837 | 0.867 | 0.891 | 0.845 (0.004) | 900.2
Average | 0.705 | 0.786 | 0.807 | 0.836 | 0.871
Total | 2 | 13 | 25 | 32 | 64
MR | 4.313 | 3.234 | 2.648 | 2.250 | 2.430
ME | 0.066 | 0.048 | 0.043 | 0.037 | 0.047
Table 4. Results of LA-ESN and four ablation models.
Dataset | ESN_CNN | ECA | ESN_LSTM | ELA | LA-ESN
Adiac | 0.952 | 0.978 | 0.983 | 0.974 | 0.984
Beef | 0.826 | 0.906 | 0.899 | 0.846 | 0.947
CricketX | 0.930 | 0.925 | 0.928 | 0.925 | 0.930
CricketY | 0.932 | 0.927 | 0.929 | 0.917 | 0.933
ECG5000 | 0.973 | 0.973 | 0.967 | 0.970 | 0.973
HandOutlines | 0.854 | 0.856 | 0.878 | 0.899 | 0.908
Haptics | 0.736 | 0.784 | 0.753 | 0.779 | 0.783
NonInv_Thor2 | 0.995 | 0.990 | 0.995 | 0.992 | 0.997
ProxPhxAgeGp | 0.852 | 0.869 | 0.874 | 0.871 | 0.893
ProxPhxTW | 0.913 | 0.926 | 0.934 | 0.935 | 0.926
ShapesAll | 0.992 | 0.992 | 0.990 | 0.987 | 0.993
SwedishLeaf | 0.982 | 0.983 | 0.978 | 0.978 | 0.984
UWaveX | 0.942 | 0.941 | 0.933 | 0.934 | 0.941
UWaveY | 0.910 | 0.915 | 0.899 | 0.912 | 0.922
UWaveZ | 0.924 | 0.927 | 0.907 | 0.915 | 0.925
Wafer | 0.986 | 0.960 | 0.990 | 0.990 | 0.995
WordSynonyms | 0.967 | 0.967 | 0.966 | 0.964 | 0.970
Average | 0.922 | 0.931 | 0.930 | 0.929 | 0.941
Total | 3 | 3 | 0 | 1 | 13
MR | 3.235 | 2.765 | 3.412 | 3.588 | 1.294
ME | 0.016 | 0.015 | 0.014 | 0.014 | 0.012
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
