Abstract
Time-series data is an appealing study topic in data mining and has a broad range of applications. Many approaches have been employed to handle time series classification (TSC) challenges with promising results, among which deep neural network methods have become mainstream. Echo State Networks (ESN) and Convolutional Neural Networks (CNN) are commonly utilized as deep neural network methods in TSC research. However, ESN and CNN can only extract local dependencies of a time series, so long-term temporal dependencies remain difficult to capture. Therefore, an encoder-decoder architecture named LA-ESN is proposed for TSC tasks. In LA-ESN, the encoder is composed of an ESN, which is utilized to obtain a matrix representation of the time series. Meanwhile, the decoder consists of a one-dimensional CNN (1D CNN), a Long Short-Term Memory network (LSTM) and an Attention Mechanism (AM), which extract local information and global dependencies from the representation. Finally, extensive comparative experiments were conducted on 128 univariate datasets from different domains, and three evaluation metrics, namely classification accuracy, mean error and mean rank, were used to evaluate the performance. In comparison to other approaches, LA-ESN produced competitive results.
    1. Introduction
Massive data resources have accumulated in numerous industries in the rapidly growing information era, and these large-scale data contain valuable content. Time series data are a set of data points arranged in chronological order and can be divided into two categories: univariate and multivariate. In univariate time series data, only one variable varies over time, while in multivariate time series data, multiple variables change over time []. Time series data are now used in a wide range of applications, including statistics, pattern recognition, earthquake prediction, econometrics, astronomy, signal processing, control engineering, communication engineering, finance, weather forecasting, the Internet of Things, and medical care [].
Much work has been carried out on the TSC problem over the last two decades. This attractive data mining topic focuses on predicting the class labels of data points indexed by time. Time series classification is an essential task in data mining and has been widely employed in many fields. For example, in the field of network traffic, Xiao et al. [] proposed a traffic classification method for dynamic network flows. In medical diagnosis, Michida et al. [] proposed a deep learning method that classifies lesions based on the JNET classification for computer-aided diagnosis in colorectal magnifying NBI endoscopy. In finance, Mori et al. [] proposed an early classification method for time series using multi-objective optimization techniques, and Wan et al. [] proposed a formal approach to classifying chart patterns in financial time series.
Generally, solutions to the TSC task can be divided into three types []: (1) Distance-based approaches: the distances between samples are evaluated through a distance function and used for classification. Representative distance-based techniques include Time-warping Edit Distance (TED) [], Weighted Dynamic Time Warping (WDTW) [], Complexity Invariant Distance (CID) [], Derivative Transform Distance (DTD) [], Dynamic Derivative Time Warping (DDTW) [], 1-Nearest Neighbour with Euclidean Distance (ED) [] and Longest Common Subsequence (LCSS) []. (2) Feature-based methods: feature extraction is used to derive informative features, either local or global, from the original data. Examples include Time Series Bag Feature (TSBF) [], Time Series Forest (TSF) [], Fast Shape Tree (FS) [], Shape Transform (ST) [], Learning Pattern Similarity (LPS) [], and Dynamic Time Warping Feature (DTWF) []. (3) Neural network-based methods: time series are classified through an "end-to-end" learning model; the raw data are fed directly into the model, which outputs the result. Examples include the Long Short-Term Memory Fully Convolutional Network (LSTM-FCN) [], the Echo State Network (ESN) [], the Attention LSTM-FCN (ALSTM-FCN) [] and the Temporal Convolutional Network (TCN) [].
ESN, CNN and LSTM are widely used for time series classification tasks. However, using an ESN alone is often insufficient for time series classification, so several researchers have proposed models that fuse ESNs with CNNs to make ESNs more suitable for TSC tasks. Ma et al. [] used an ESN to generate a time series representation matrix before employing a CNN to extract classification features. LSTM is widely used to extract long-term dependencies in time series, and some researchers have also combined CNNs with LSTMs for time series classification and achieved quite good results. For example, Karim et al. [] combined a fully convolutional block with an LSTM to propose a new method; the fully convolutional block contains three temporal convolutional blocks as feature extractors, each with a convolutional layer with multiple filters. However, extracting features directly from the time series for classification requires substantial preprocessing. The attention mechanism has become an essential concept in deep learning and has been successfully applied to time series tasks. In recent years, the main principle behind employing attention in time series classification has been to focus on the parts of the input most relevant to the classification decision while extracting features from the input data.
In light of the preceding work, we propose an end-to-end model called LA-ESN for time series classification. LA-ESN consists of an echo memory encoder and a decoder. The echo memory encoder is built from a reservoir layer, while the decoder is made up of a 1D CNN, an LSTM and an attention mechanism. The reservoir layer first projects the time series into a high-dimensional nonlinear space to generate echo states step by step, and the echo memory matrix is formed by collecting the echo states of all time steps in chronological order. To capture the critical historical information in the time series, the decoder applies a 1D CNN, an LSTM and attention to the echo memory matrix: the 1D CNN and the LSTM extract multi-scale features and retrieve global information from the matrix, and attention is then used to extract the essential information from the global information, which also improves network efficiency. According to experimental findings on a variety of time series datasets, LA-ESN is an effective classification approach. The main contributions of this paper are as follows:
- (1) We propose a simple end-to-end model, LA-ESN, for handling time series classification tasks;
- (2) We modify the output layer of the ESN to handle time series better and use a CNN and an LSTM as output layers to perform feature extraction;
- (3) The attention mechanism is deployed after both the CNN and the LSTM, which effectively improves the effectiveness and computational efficiency of LA-ESN;
- (4) Experiments on various time series datasets show that LA-ESN is efficacious.
2. Related Work
2.1. CNN with Attention
CNN and AM have been broadly adopted in image processing and other fields, and CNNs combined with AM have been proposed for time series classification. AM can be interpreted as a feature enhancement method that extracts the most information-rich components of a signal. After the attention values are calculated from the saliency of the feature maps extracted by the convolutional layers, refined feature map weights can be derived from these attention values. A Deep CNN (DCNN) with an attention module has been proposed as a framework for time series classification []; this network improves the classification performance for various seismic events. Tripathi et al. [] proposed an Attention-based Multivariate CNN (AT-MVCNN), which builds an input tensor based on attention features to encode information across multiple timestamps. Sun et al. [] proposed a Prototypical Inception Network with Cross Branch Attention (PIN-BA), which uses CNN branches with different receptive window sizes to capture features at different time scales and a cross-branch attention scheme to emphasize critical feature information during classification. In [], five AMs were applied to six neural networks to focus on the valid information in the time series from either the channel or the spatial dimension. All of the above studies yielded promising results in time series classification.
2.2. ESN-Based Classifiers
An ESN is a recurrent structure built around a randomly connected RNN. Although ESNs are mainly employed to predict time series, many ESN-based classification networks have also been designed for the TSC task. For example, using autoencoder (AE) theory, Wang et al. [] developed an ESN framework with a globally reversible autoencoder to reconstruct the randomly initialized input weights of the ESN; in such autoencoder-based ESN models, the initial input weights are replaced by the output weights generated during the encoding process. Huang et al. [] proposed an innovative ESN approach named Functional Deep Echo State Network (FDESN), which introduces temporal and spatial aggregation; moreover, this approach considers the relative importance of temporal data across several periods and the dynamic properties of multivariate time series. Wang et al. [] offered a Discriminative Regularized ESN (DR-ESN) time series classification algorithm that combines Discriminative Feature Aggregation (DFA) and an Outlier Robust Weight (ORW) algorithm. First, the DFA algorithm replaces the random input weights of the ESN with bounded weights based on the sample information. Second, the ORW algorithm is applied to weight samples with notable training errors, achieving greater robustness in the training process. DR-ESN can effectively improve the classification performance of the original ESN and significantly reduce the impact of outliers on the classification result.
3. The Proposed LA-ESN Framework
The framework of LA-ESN is divided into two parts: encoding and decoding. In the encoding part, each frame of the input time series is mapped into a high-dimensional reservoir state space to obtain an Echo State Representation (ESR), and the ESRs of all time steps are stored in a memory matrix. In the decoding phase, we design two parallel ways to decode the memory matrix. First, a multi-scale 1D CNN [,,] is adopted to extract local information. Second, LSTM and attention are used to recover long-term dependency information from the memory matrix. Both attention mechanisms are intended to suppress irrelevant information while increasing computational efficiency. Finally, the local and long-term dependency information obtained by the two branches is pooled and merged, the merged features are passed through a fully connected layer, and the conditional probability distribution over the categories is calculated by a Softmax layer. The general design of the proposed LA-ESN model is shown in Figure 1.
      
    
Figure 1. The general architecture of the LA-ESN model.
  
3.1. Preliminary
TSC refers to the identification of non-labeled samples based on the given labeled samples. The time series dataset, consisting of N samples, can be expressed as follows:

$$ D = \{(X_1, y_1), (X_2, y_2), \ldots, (X_N, y_N)\}, \quad X_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,T}) \quad (1) $$

where $T$ denotes the number of time stamps (length) of each time series and $y_i$ represents the corresponding category of each time series $X_i$.
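To make this notation concrete, here is a minimal sketch (ours, not from the paper) of how such a univariate dataset can be held in memory; the sizes and random values are purely illustrative.

```python
import numpy as np

# A univariate time series dataset with N samples of length T:
# X has shape (N, T), where row i is the series X_i = (x_i1, ..., x_iT);
# y has shape (N,), where y[i] is the class label of X[i].
N, T = 100, 150                      # illustrative sizes
rng = np.random.default_rng(0)
X = rng.standard_normal((N, T))      # placeholder series values
y = rng.integers(0, 3, size=N)       # placeholder labels from 3 classes
```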
3.2. Encoding Stage
The primitive ESN includes an input layer, a reserve pool (reservoir) and an output layer; its diagram is shown in Figure 2. The primitive ESN uses a reserve pool of randomly and sparsely connected neurons as the hidden layer, providing a high-dimensional, nonlinear representation of the input. The weights into and within the reserve pool are generated in advance rather than trained; only the weights from the hidden layer to the output layer are trained. The generated reserve pool therefore has good properties that guarantee excellent performance while training only the reservoir-to-output weights with linear methods. Given a k-dimensional input i(t) at time step t and the reserve pool state r(t − 1) at time step t − 1, the update equations of the primitive ESN are as follows:

$$ r(t) = f\big(W_{in}\, i(t) + W\, r(t-1)\big) \quad (2) $$

$$ o(t) = W_{out}\, r(t) \quad (3) $$

$$ W = SR \cdot \frac{W_0}{\lambda_{max}(W_0)} \quad (4) $$

where $W_{in}$, $W$ and $W_{out}$ denote the input, intermediate (reservoir) and output weight matrices, respectively; $i(t)$ and $r(t-1)$ express the time series input at step $t$ and the reservoir state at step $t-1$, respectively; $W_{in}$ and $W$ are randomly initialized and fixed; $W$ is calculated by Equation (4), where $SR$ is the spectral radius of $W$, $\lambda_{max}(W_0)$ is the maximum (in absolute value) eigenvalue of the random matrix $W_0$, and the elements of $W_0$ are generated randomly in $[-0.5, 0.5]$.
      
    
Figure 2. The diagram of the primitive ESN model.
  
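As a concrete illustration of how the fixed ESN weights of Equations (2)-(4) can be generated, the NumPy sketch below draws $W_{in}$ and $W_0$ at random and rescales $W_0$ so that the spectral radius equals SR. The default values follow the settings reported in Section 4.3.3; treating the sparsity SP as the fraction of zeroed connections is our assumption.

```python
import numpy as np

def init_reservoir(n_res, n_in, sr=0.9, input_scale=0.1, sparsity=0.7, seed=0):
    """Randomly initialize the fixed ESN weights (a sketch of Equations (2)-(4))."""
    rng = np.random.default_rng(seed)
    # Input weights W_in: random, fixed, scaled by the input cell scale IS.
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in)) * input_scale
    # Reservoir weights: random in [-0.5, 0.5], sparsified, then rescaled so
    # that the spectral radius equals sr (Equation (4)).
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    w[rng.random((n_res, n_res)) < sparsity] = 0.0   # assumed meaning of SP
    lam_max = np.max(np.abs(np.linalg.eigvals(w)))   # largest |eigenvalue|
    w = sr * w / lam_max
    return w_in, w
```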
In this work, we take the ESN as the base network and modify its output layer to make the model more suitable for TSC, as shown in Figure 1. Suppose $X = (i(1), i(2), \ldots, i(T))$ is a k-dimensional time series of length T. At each time step, the echo state r(t) is calculated according to Equation (2), and the echo state memory matrix R is obtained as follows:

$$ R = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(T) \end{bmatrix} \in \mathbb{R}^{T \times M} \quad (5) $$
We use r(t) to indicate the t-th row of R, which denotes the echo state at the t-th time step, where $1 \le t \le T$. The term $r_m$ denotes the m-th column of R, expressing the m-th dimension of the echo state over all time steps, where $1 \le m \le M$ and M is the reservoir size. We keep all the calculated echo states in R to obtain a complete ESR of the sequence as the input of the decoder, and the decoder extracts the discriminative features to determine the category labels.
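The following sketch (again ours) collects the echo states of all T time steps into the memory matrix R of Equation (5), using tanh as the reservoir nonlinearity f and the `init_reservoir` helper assumed in the previous sketch.

```python
import numpy as np

def echo_memory_matrix(x, w_in, w):
    """Compute R = [r(1); ...; r(T)] for a series x of shape (T, k), Equations (2) and (5)."""
    T, n_res = x.shape[0], w.shape[0]
    r = np.zeros(n_res)                      # initial reservoir state r(0)
    R = np.zeros((T, n_res))
    for t in range(T):
        r = np.tanh(w_in @ x[t] + w @ r)     # Equation (2) with f = tanh
        R[t] = r                             # store the echo state of step t
    return R                                 # shape (T, M) with M = n_res

# Usage for a univariate series (k = 1) of length 150:
w_in, w = init_reservoir(n_res=32, n_in=1)
R = echo_memory_matrix(X[0][:, None], w_in, w)   # X from the earlier sketch
```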
3.3. Decoding Stage
In previous studies, multi-scale convolution has been used as a feature extractor to obtain effective classification features from time series representations. Alternatively, an LSTM can be used to learn directly from the input time series. Both approaches can classify time series to some extent, although there is room for improvement. Therefore, we appropriately adapt and then combine them as the output layer of the ESN to learn better feature information from the echo states.
On the one hand, we employ 1D CNNs with multiple scales to perform convolution operations along the time direction, using multiple filters at each time scale. A batch normalization operation follows the convolutions to alleviate the vanishing gradient problem. Next, ReLU is adopted as the activation function, which increases the nonlinearity and sparsity of connections between layers and reduces over-fitting; batch normalization and ReLU therefore enable more robust learning. Finally, the multi-scale features are concatenated into a new feature. The structure of the multi-scale 1D convolution is shown in Figure 3.
      
    
Figure 3. The diagram of a multi-scale 1D convolution. The stride of each 1D convolution in LA-ESN is set to 1. Different colours represent the convolution results obtained with different convolution kernel sizes; from top to bottom, the results correspond to kernel sizes 3, 5 and 8, respectively.
  
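Below is a minimal PyTorch sketch of such a multi-scale convolution branch with kernel sizes 3, 5 and 8 and stride 1, each followed by batch normalization and ReLU as described above. The filter count, the "same" padding and the use of PyTorch are our assumptions rather than details reported by the authors.

```python
import torch
import torch.nn as nn

class MultiScaleConv1d(nn.Module):
    """Parallel 1D convolutions with kernel sizes 3, 5 and 8, concatenated along channels."""
    def __init__(self, in_channels, out_channels=64, kernel_sizes=(3, 5, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(in_channels, out_channels, k, stride=1, padding="same"),
                nn.BatchNorm1d(out_channels),   # mitigates vanishing gradients
                nn.ReLU(),                      # nonlinearity / sparse activations
            )
            for k in kernel_sizes
        ])

    def forward(self, x):                       # x: (batch, channels, T)
        feats = [branch(x) for branch in self.branches]
        return torch.cat(feats, dim=1)          # concatenate the multi-scale features

# Usage: the echo memory matrix R (T x M), transposed so that the M reservoir
# dimensions are channels and time is the spatial axis.
R_batch = torch.randn(1, 32, 150)               # (batch, M = 32, T = 150)
features = MultiScaleConv1d(in_channels=32)(R_batch)
print(features.shape)                           # torch.Size([1, 192, 150])
```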
On the other hand, LSTM is highly effective at dealing with time series. Therefore, we use an LSTM to learn long-term global dependencies between states from the echo state matrix, enabling LA-ESN to learn more robust classification features. In LA-ESN, the LSTM receives the echo state matrix as a multivariate state matrix with a single time step, and the operations of the LSTM are as follows:

$$ f(t) = \sigma\big(W_f\,[h(t-1),\, x(t)] + b_f\big) \quad (6) $$

$$ i(t) = \sigma\big(W_i\,[h(t-1),\, x(t)] + b_i\big) \quad (7) $$

$$ o(t) = \sigma\big(W_o\,[h(t-1),\, x(t)] + b_o\big) \quad (8) $$

$$ \tilde{c}(t) = \tanh\big(W_w\,[h(t-1),\, x(t)] + b_w\big) \quad (9) $$

$$ c(t) = f(t) \odot c(t-1) + i(t) \odot \tilde{c}(t) \quad (10) $$

$$ h(t) = o(t) \odot \tanh\big(c(t)\big) \quad (11) $$

where $\sigma$ denotes the logistic sigmoid function and $\odot$ represents element-wise multiplication; $x(t)$ and $h(t)$ are the input and hidden state at step $t$; $f(t)$, $i(t)$ and $o(t)$ are the forget, input and output gates, and $c(t)$ is the cell state. The recurrent weight matrices are denoted by $W_f$, $W_i$, $W_o$ and $W_w$, and the corresponding biases by $b_f$, $b_i$, $b_o$ and $b_w$. The diagram of the LSTM model is shown in Figure 4.
      
    
Figure 4. The diagram of the LSTM model.
  
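As a minimal sketch of the LSTM branch (ours, in PyTorch), the echo memory matrix R is read as a sequence of T echo states of dimension M, and an LSTM produces one hidden state per step; the hidden size and this reading of the input layout are assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

# One echo state of dimension M = 32 per step, T = 150 steps.
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

R_batch = torch.randn(1, 150, 32)       # (batch, T, M)
outputs, (h_n, c_n) = lstm(R_batch)     # outputs: (1, 150, 64), one hidden state per step
```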
Then, an attention module, which is commonly used in natural language processing, is integrated. The context vector $c_i$ depends on a sequence of annotations $(h_1, h_2, \ldots, h_T)$; each annotation $h_j$ contains information about the entire input sequence, with a focus on the parts surrounding the j-th element. The encoder maps the input sequence to this sequence of annotations, and the context vector $c_i$ is computed as a weighted sum of the annotations:

$$ c_i = \sum_{j=1}^{T} \alpha_{ij}\, h_j \quad (12) $$
The weight $\alpha_{ij}$ for each annotation $h_j$ is calculated as follows:

$$ \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T} \exp(e_{ik})} \quad (13) $$

$$ e_{ij} = a\big(s_{i-1},\, h_j\big) \quad (14) $$
where $h_j$ denotes the j-th hidden state vector of the encoder and $s_{i-1}$ denotes the hidden state of the decoder at the previous time step. The term $a(\cdot)$ is a scoring function used to calculate the similarity between $s_{i-1}$ and $h_j$. Figure 5 depicts the attention model. There are two advantages to using the AM. First, the AM can apply different weights to the distinct echo state representations of the same time step; in other words, the AM can give more weight to information that is more significant for categorization while suppressing irrelevant information. Second, including the AM increases the model's running speed.
      
    
Figure 5. The diagram of the attention model.
  
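To illustrate Equations (12)-(14), the sketch below (ours) implements a simple additive scoring function a(·) and computes the attention weights and the context vector over the LSTM outputs from the previous sketch; the structure and size of the scoring network are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Score each annotation h_j against a query s, then take a weighted sum (Eqs. (12)-(14))."""
    def __init__(self, hidden_size=64, attn_size=64):
        super().__init__()
        self.score = nn.Sequential(                 # scoring function a(s, h_j)
            nn.Linear(2 * hidden_size, attn_size),
            nn.Tanh(),
            nn.Linear(attn_size, 1),
        )

    def forward(self, annotations, query):
        # annotations: (batch, T, hidden); query: (batch, hidden)
        T = annotations.size(1)
        q = query.unsqueeze(1).expand(-1, T, -1)              # repeat the query for every step
        e = self.score(torch.cat([annotations, q], dim=-1))   # e_ij, Equation (14)
        alpha = F.softmax(e, dim=1)                           # attention weights, Equation (13)
        context = (alpha * annotations).sum(dim=1)            # context vector, Equation (12)
        return context, alpha.squeeze(-1)

# Usage with the LSTM outputs from the previous sketch (last hidden state as the query):
attn = AdditiveAttention()
context, weights = attn(outputs, h_n[-1])
```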
4. Experiments and Results
4.1. Database Description
We conducted extensive tests on the publicly available UCR archive, which can be obtained from https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ (accessed on 9 October 2022), to verify that the proposed LA-ESN approach is valid for the time series classification problem. The collection comprises 128 datasets divided into 15 categories. The specific descriptions of the datasets used are listed in Table 1. Each dataset has five characteristics: training set size, test set size, number of categories, sequence length and category []. The selected datasets cover multiple categories in the UCR archive. We use the Python programming language to implement the proposed LA-ESN approach and execute all tests on an Intel Core i9-9900K CPU with 32 GB RAM and an Nvidia GeForce RTX 2080 GPU.
       
    
Table 1. The specific description of the 128 datasets used for time series classification.
  
4.2. Evaluation Metric
In our experiment, the standard accuracy is used as the evaluation index. Meanwhile, the mean error (ME) and mean rank (MR) are employed to evaluate the classification performance of a given model on multiple datasets []. They are defined as follows:

$$ \mathrm{PCE}_k = \frac{e_k}{c_k} \quad (15) $$

$$ \mathrm{ME} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{PCE}_k \quad (16) $$

$$ \mathrm{MR} = \frac{\mathrm{SCR}}{K} \quad (17) $$

where $e_k$ denotes the error rate of the k-th dataset, $c_k$ means the number of classes in the k-th dataset, $\mathrm{PCE}_k$ is the resulting per-class error of the k-th dataset, $K$ represents the number of used datasets, and SCR denotes the sum of a method's ranks over all datasets (explained below).
For each dataset, the methods are sorted by accuracy from largest to smallest and numbered accordingly; the MR of a method is then the average of its ordinal ranks over all datasets. The MR helps show which methods work well overall: a smaller MR indicates that the model is more accurate than the other methods on most datasets.
To better understand MR, we introduce the term SCR and use an example to explain the process of calculating the average rank. SCR represents the sum of a method's ranks over all datasets. For example, suppose we have three datasets, denoted as d1, d2 and d3, and three methods, denoted as f1, f2 and f3, and each method obtains an accuracy on each dataset.
Suppose that, when the methods are sorted from largest to smallest accuracy on dataset d1, the ranks of f1, f2 and f3 are {2, 3, 1}; similarly, the ranks on datasets d2 and d3 are {3, 2, 1} and {2, 3, 1}, respectively. The rankings of f1 are thus 2, 3 and 2 on the three datasets, and the SCR of f1 is the sum of its ranks over all datasets, i.e., 7 = 2 + 3 + 2. According to Equation (17), the smaller the SCR, the smaller the MR; a method with higher accuracy on most datasets therefore obtains a smaller MR. Hence, the MR can evaluate the overall classification performance of a method on all datasets.
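The short snippet below (ours) reproduces this worked example; the accuracy values are made up and chosen only so that the resulting ranks match the ones above, with rank 1 given to the most accurate method, and it computes SCR and the MR of Equation (17).

```python
import numpy as np

# Rows = datasets d1..d3, columns = methods f1..f3 (made-up accuracies).
acc = np.array([[0.80, 0.70, 0.90],
                [0.60, 0.75, 0.85],
                [0.72, 0.65, 0.88]])

# Rank the methods on each dataset from largest to smallest accuracy (rank 1 = best).
ranks = (-acc).argsort(axis=1).argsort(axis=1) + 1
scr = ranks.sum(axis=0)            # SCR per method
mr = scr / acc.shape[0]            # Equation (17): MR = SCR / K

print(ranks)   # [[2 3 1] [3 2 1] [2 3 1]]
print(scr)     # [7 8 3]  -> the SCR of f1 is 7 = 2 + 3 + 2
print(mr)      # approximately [2.33 2.67 1.00]
```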
4.3. Results and Discussion
4.3.1. Compared with Traditional Methods
We compare the proposed LA-ESN model with traditional TSC models to conduct a general evaluation. In this work, we selected 12 conventional models as comparison methods, abbreviated as ED [], DDTW [], DTD [], LS [], BOSS [], EE [], FCOTE [], TSF [], TSBF [], ST [], LPS [], and FS []. We briefly describe each comparison method as follows:
- (1) ED (1-Nearest Neighbour with Euclidean Distance). The Euclidean distance is employed to measure the similarity of two given time series, and the nearest neighbour is then used for classification.
- (2) DDTW (Dynamic Derivative Time Warping). This method weights the DTW distance between the two time series and the DTW distance between the corresponding first-order difference series.
- (3) DTD (Derivative Transform Distance). Building on DDTW, this method further considers the DTW distance between sequences of sine, cosine and Hilbert transforms.
- (4) LS (Learning Shapelets). In this method, a shapelet-based classifier is designed using a heuristic gradient descent shapelet search procedure instead of enumeration.
- (5) BOSS (Bag of SFA Symbols). This method uses windows to form "words" over the series and then applies a truncated discrete Fourier transform to each window to obtain features.
- (6) EE (Elastic Ensemble). The EE method uses a voting scheme to combine eleven 1-NN classifiers with elastic distance metrics.
- (7) FCOTE (Flat Collective of Transform-based Ensembles). This method integrates 35 classifiers weighted by the cross-validation accuracy on the training sets.
- (8) TSF (Time Series Forest). The time series is first divided into intervals, for which the mean, standard deviation and slope are calculated as interval features; intervals are then randomly selected to train a tree forest.
- (9) TSBF (Time Series Bag Feature). This method selects multiple random-length subsequences from random locations and then divides these subsequences into shorter intervals to capture local information.
- (10) ST (Shape Transform). This method uses the shapelet transform to obtain a new representation of the original time series; a classifier is then constructed on the new representation using a weighted ensemble of eight different classifiers.
- (11) LPS (Learning Pattern Similarity). This method is interval-based, but the main difference is that the subsequences themselves are used as attributes rather than extracted interval features.
- (12) FS (Fast Shape Tree). This method speeds up the shapelet search by converting the original time series into a discrete low-dimensional representation through symbolic aggregate approximation; random projections are then used to find potential shapelet candidates.
Table 2 reports the average accuracy and standard deviation of the proposed LA-ESN method over five runs, together with the results of the other conventional TSC classifiers, on 76 datasets. From Table 2, we can conclude the following: (1) LA-ESN achieves higher classification accuracy than the other 12 traditional methods on 36 datasets and achieves comparable results on another 25 datasets; in total, it wins on considerably more datasets than the other methods. (2) The average classification accuracy of the proposed LA-ESN over all datasets is also superior to that of the compared methods. (3) In terms of MR, the performance of LA-ESN is only slightly worse than that of the FCOTE method but clearly better than that of the other methods. (4) The ME of the proposed LA-ESN is slightly higher than that of the FCOTE and ST methods but lower than that of the other methods. Therefore, the experimental results indicate that LA-ESN is effective for time series classification in most cases.
       
    
Table 2. The results of the LA-ESN model and different traditional TSC models on 76 datasets.
  
To better compare multiple classifiers on multiple datasets, we applied the pairwise post hoc analysis proposed by Benavoli et al. [], in which mean rank comparisons are computed using the Wilcoxon signed-rank test [] with Holm's correction (5%) []. A critical difference diagram [] is used to depict the outcome; the further to the right a classifier is, the better it performs, and classifiers connected by a thick line show no significant difference in accuracy. Figure 6 shows the pairwise statistical difference comparison between LA-ESN and the traditional classifiers. As can be seen from Figure 6, our model is located to the right of the majority of approaches.
      
    
Figure 6. Pairwise statistical difference comparison of LA-ESN and 12 traditional classifiers.
  
4.3.2. Compared with Deep Learning Methods
In this section, we compare the LA-ESN model with four deep learning classifiers: MLP [], FCN [], ResNet [] and Inception Time []. A brief description of these four deep learning comparison methods follows:
- (1) MLP (Multilayer Perceptron). The final result is obtained by passing the input through three fully connected layers of 500 units each and a softmax layer; dropout and ReLU activations are used.
- (2) FCN (Fully Convolutional Network). The FCN model stacks three one-dimensional convolutional blocks with 128, 256 and 128 filters and kernel sizes of 8, 5 and 3, respectively; the features are then fed into a global average pooling layer and a softmax layer to obtain the final result. The FCN model uses the ReLU activation function and batch normalization.
- (3) ResNet (Residual Network). The residual network stacks three residual blocks. Each residual block consists of convolutions with 64, 128 and 256 filters of sizes 8, 5 and 3, respectively, each followed by batch normalization and a ReLU activation function. ResNet extends the neural network to a very deep structure by adding shortcut connections in each residual block, and it has a higher tendency to overfit the training data.
- (4) Inception Time. Instead of the usual fully connected layers, it consists of two distinct residual blocks, each made of three Inception sub-blocks. The input of each residual block is transferred to the next block via a shortcut linear connection, thus alleviating the vanishing gradient problem by allowing gradients to flow directly.
The experimental results of the proposed LA-ESN and these four classifiers on 128 datasets are shown in Table 3.
       
    
Table 3. The results of LA-ESN and four deep learning methods.
  
From Table 3, we can observe the following: (1) LA-ESN achieves the highest classification accuracy on 64 datasets; in total, it wins on considerably more datasets than the other methods. (2) The average classification accuracy of the proposed LA-ESN over all datasets is also superior to that of the other compared methods. (3) The MR value of the proposed LA-ESN is 2.430, which is higher than that of the Inception Time method but smaller than those of the other three classifiers. (4) The proposed LA-ESN has an ME value of 0.047, which is lower than those of MLP and FCN but greater than those of ResNet and Inception Time. (5) LA-ESN and Inception Time perform better than the other three methods; the reason is that both can efficiently handle temporal dependencies in time series thanks to their high nonlinear mapping capacity and dynamic memory. (6) With the number of epochs set to 500, the running time of the proposed LA-ESN is short on small datasets and acceptable on large datasets. A pairwise statistical difference comparison of the five deep learning methods is shown in Figure 7. As seen in the diagram, our model is located to the right of the majority of methods.
      
    
Figure 7. Pairwise statistical difference comparison of LA-ESN and four deep learning methods.
  
4.3.3. Ablation Study
In this subsection, a step-by-step ablation study is conducted to further verify the validity of the proposed LA-ESN model. We divided the model into four variants: ESN_LSTM, ESN_CNN, ESN_LSTM_ATT (ELA) and ESN_CNN_ATT (ECA), and conducted experiments on 17 datasets. These four methods first employ the ESN to perform representation learning on the time series and then use different classifiers, namely LSTM, CNN, LSTM with attention (LSTM_ATT) and CNN with attention (CNN_ATT), to extract feature information from the representation. To verify the validity of each module fairly, a unified parameter setting is adopted: in the ESN, the spectral radius SR is 0.9, the input cell scale IS is 0.1, the reservoir sparsity SP is 0.7, and the reservoir size is 32; the number of epochs for the whole experiment is 500, and the batch size is 25. LSTM_ATT can automatically capture long-term temporal dependencies of the sequences, and CNN_ATT can extract feature information at different scales; LA-ESN combines both advantages to extract more valuable information from the datasets and achieve better classification accuracy. Table 4 displays the results of LA-ESN and the four ablation variants on 17 datasets; LA-ESN produces the best results on 13 of them. The corresponding critical difference diagram is shown in Figure 8.
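For reference, the shared hyperparameters listed above can be gathered into a small configuration dictionary; the values restate those reported here, while the key names are our own.

```python
# Shared settings used in the ablation experiments (key names are ours).
ABLATION_CONFIG = {
    "spectral_radius": 0.9,   # SR of the reservoir weight matrix
    "input_scale": 0.1,       # IS, input cell scale
    "sparsity": 0.7,          # SP, reservoir sparsity
    "reservoir_size": 32,     # number of reservoir units
    "epochs": 500,
    "batch_size": 25,
}
```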
       
    
Table 4. Results of LA-ESN and four ablation models.
  
      
    
Figure 8. Pairwise statistical difference comparison of LA-ESN and four ablation models.
  
5. Conclusions
This study proposes a deep learning model called LA-ESN for the end-to-end classification of univariate time series. In LA-ESN, the original time series is first passed through the ESN to obtain the echo state representation matrix. Then, the echo state representation matrix is used as the input of the LSTM module and the multi-scale convolutional module, which perform feature extraction. Finally, the outputs of the two modules are concatenated, and the softmax function is used to produce the final classification result. The attention-based LSTM module can automatically capture the long-term temporal dependencies of the sequences, and the multi-scale 1D convolutional attention module can extract feature information from the echo state representations at different scales, highlighting the spatial sparsity and heterogeneity of the data. The model performs well without any data reshaping or pre-processing. Based on extensive experiments on the UCR time series archive, the proposed LA-ESN model outperforms several classical approaches and currently popular deep learning methods on the great majority of the selected datasets.
However, this work has not yet addressed the class imbalance problem. In the future, we will therefore improve the model to obtain the best results on most datasets while also addressing the class imbalance issue. Second, we will consider extending the model to the classification of multivariate time series data. Finally, since LA-ESN is a fairly generic model for time series, we will also consider applying it to tasks such as time series prediction and clustering.
Author Contributions
Data curation, H.S., M.L., J.H., P.L. and Y.P.; Formal analysis, H.S., M.L. and Y.P.; Methodology, H.S., M.L. and Y.Y.; Resources, H.S., M.L. and J.H.; Supervision, P.L. and Y.Y.; Writing—original draft, H.S., M.L. and P.L.; Writing—review and editing, H.S., M.L., P.L. and Y.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This work is supported in part by grants from the National Natural Science Foundation of China (Nos. 62062040, 61967010 and 62067003), the Outstanding Youth Project of Jiangxi Natural Science Foundation (No. 20212ACB212003), the Jiangxi Province Key Subject Academic and Technical Leader Funding Project (No. 20212BCJ23017), the Jiangxi Province Graduate Innovation Project (No. YC2022-S275), the Central Guided Local Science and Technology Development Special Project (20222ZDH04090) and the Jiangxi Provincial Social Science “14th Five-Year” Planning Project (22WT76).
Data Availability Statement
The data were derived from public domain resources.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Hamilton, J.D. Time Series Analysis; Princeton University Press: Oxford, UK, 2020. [Google Scholar]
 - Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.-A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963. [Google Scholar] [CrossRef]
 - Xiao, X.; Li, R.; Zheng, H.-T.; Ye, R.; Kumar Sangaiah, A.; Xia, S. Novel dynamic multiple classification system for network traffic. Inf. Sci. 2018, 479, 526–541. [Google Scholar] [CrossRef]
 - Michida, R.; Katayama, D.; Seiji, I.; Wu, Y.; Koide, T.; Tanaka, S.; Okamoto, Y.; Mieno, H.; Tamaki, T.; Yoshida, S. A Lesion Classification Method Using Deep Learning Based on JNET Classification for Computer-Aided Diagnosis System in Colorectal Magnified NBI Endoscopy. In Proceedings of the 36th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC), Jeju, Republic of Korea, 27–30 June 2021; pp. 1–4. [Google Scholar]
 - Mori, U.; Mendiburu, A.; Miranda, I.; Lozano, J. Early classification of time series using multi-objective optimization techniques. Inf. Sci. 2019, 492, 204–218. [Google Scholar] [CrossRef]
 - Wan, Y.; Si, Y.-W. A formal approach to chart patterns classification in financial time series. Inf. Sci. 2017, 411, 151–175. [Google Scholar] [CrossRef]
 - Wang, H.; Wu, Q.J.; Wang, D.; Xin, J.; Yang, Y.; Yu, K. Echo state network with a global reversible autoencoder for time series classification. Inf. Sci. 2021, 570, 744–768. [Google Scholar] [CrossRef]
 - Marteau, P.-F.; Gibet, S. On Recursive Edit Distance Kernels with Application to Time Series Classification. IEEE Trans. Neural Networks Learn. Syst. 2014, 26, 1121–1133. [Google Scholar] [CrossRef]
 - Wang, J.; Zhao, Y. Time Series K-Nearest Neighbors Classifier Based on Fast Dynamic Time Warping. In Proceedings of the 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 28–30 June 2021; pp. 751–754. [Google Scholar]
 - Gustavo, E.; Batista, A.; Keogh, E.J.; Tataw, O.M.; De Souza, V.M.A. CID: An efficient complexity-invariant distance for time series. Data Min. Knowl. Discov. 2013, 28, 634–669. [Google Scholar] [CrossRef]
 - Purnawirawan, A.; Wibawa, A.D.; Wulandari, D.P. Classification of P-wave Morphology Using New Local Distance Transform and Random Forests. In Proceedings of the 6th International Conference on Science and Technology (ICST), Yogyakarta, Indonesia, 7–8 September 2020; pp. 1–6. [Google Scholar]
 - Qiao, Z.; Mizukoshi, Y.; Moteki, T.; Iwata, H. Operation State Identification Method for Unmanned Construction: Extended Search and Registration System of Novel Operation State Based on LSTM and DDTW. In Proceedings of the IEEE/SICE International Symposium on System Integration (SII), Virtual Conference, 9–12 January 2022; pp. 13–18. [Google Scholar]
 - Rahman, B.; Warnars, H.L.H.S.; Sabarguna, B.S.; Budiharto, W. Heart Disease Classification Model Using K-Nearest Neighbor Algorithm. In Proceedings of the 6th International Conference on Informatics and Computing (ICIC), Virtual Conference, 3–4 November 2021; pp. 1–4. [Google Scholar]
 - Górecki, T. Using derivatives in a longest common subsequence dissimilarity measure for time series classification. Pattern Recognit. Lett. 2014, 45, 99–105. [Google Scholar] [CrossRef]
 - Baydogan, M.G.; Runger, G.; Tuv, E. A Bag-of-Features Framework to Classify Time Series. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2796–2802. [Google Scholar] [CrossRef]
 - Deng, H.; Runger, G.; Tuv, E.; Vladimir, M. A time series forest for classification and feature extraction. Inf. Sci. 2013, 239, 142–153. [Google Scholar] [CrossRef]
 - Ji, C.; Zhao, C.; Liu, S.; Yang, C.; Pan, L.; Wu, L.; Meng, X. A fast shapelet selection algorithm for time series classification. Comput. Networks 2019, 148, 231–240. [Google Scholar] [CrossRef]
 - Arul, M.; Kareem, A. Applications of shapelet transform to time series classification of earthquake, wind and wave data. Eng. Struct. 2021, 228, 111564. [Google Scholar] [CrossRef]
 - Baydogan, M.G.; Runger, G. Time series representation and similarity based on local autopatterns. Data Min. Knowl. Discov. 2016, 30, 476–509. [Google Scholar] [CrossRef]
 - Kate, R.J. Using dynamic time warping distances as features for improved time series classification. Data Min. Knowl. Discov. 2016, 30, 283–312. [Google Scholar] [CrossRef]
 - Karim, F.; Majumdar, S.; Darabi, H.; Harford, S. Multivariate LSTM-FCNs for time series classification. Neural Networks 2019, 116, 237–245. [Google Scholar] [CrossRef]
 - Ma, Q.; Chen, E.; Lin, Z.; Yan, J.; Yu, Z.; Ng, W.W.Y. Convolutional Multitimescale Echo State Network. IEEE Trans. Cybern. 2019, 51, 1613–1625. [Google Scholar] [CrossRef]
 - Karim, F.; Majumdar, S.; Darabi, H. Insights into LSTM Fully Convolutional Networks for Time Series Classification. IEEE Access 2019, 7, 67718–67725. [Google Scholar] [CrossRef]
 - Koh, B.H.D.; Lim, C.L.P.; Rahimi, H.; Woo, W.L.; Gao, B. Deep Temporal Convolution Network for Time Series Classification. Sensors 2021, 21, 603. [Google Scholar] [CrossRef]
 - Ku, B.; Kim, G.; Ahn, J.-K.; Lee, J.; Ko, H. Attention-Based Convolutional Neural Network for Earthquake Event Classification. IEEE Geosci. Remote. Sens. Lett. 2020, 18, 2057–2061. [Google Scholar] [CrossRef]
 - Tripathi, A.M.; Baruah, R.D. Multivariate Time Series Classification with An Attention-Based Multivariate Convolutional Neural Network. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
 - Sun, J.; Takeuchi, S.; Yamasaki, I. Prototypical Inception Network with Cross Branch Attention for Time Series Classification. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–7. [Google Scholar]
 - Li, D.; Lian, C.; Yao, W. Research on time series classification based on convolutional neural network with attention mechanism. In Proceedings of the 11th International Conference on Intelligent Control and Information Processing (ICICIP), Yunnan, China, 3–7 December 2021; pp. 88–93. [Google Scholar]
 - Huang, Z.; Yang, C.; Chen, X.; Zhou, X.; Chen, G.; Huang, T.; Gui, W. Functional deep echo state network improved by a bi-level optimization approach for multivariate time series classification. Appl. Soft Comput. 2021, 106. [Google Scholar] [CrossRef]
 - Wang, H.; Liu, Y.; Wang, D.; Luo, Y.; Tong, C.; Lv, Z. Discriminative and regularized echo state network for time series classification. Pattern Recognit. 2022, 130, 1–14. [Google Scholar] [CrossRef]
 - Zheng, Y.; Liu, Q.; Chen, E.; Ge, Y.; Zhao, J.L. Exploiting multi-channels deep convolutional neural networks for multivariate time series classification. Front. Comput. Sci. 2016, 10, 96–112. [Google Scholar] [CrossRef]
 - Fawaz, H.I.; Lucas, B.; Forestier, G.; Pelletier, C.; Schmidt, D.F.; Weber, J.; Webb, G.I.; Idoumghar, L.; Muller, P.-A.; Petitjean, F. InceptionTime: Finding AlexNet for time series classification. Data Min. Knowl. Discov. 2020, 34, 1936–1962. [Google Scholar] [CrossRef]
 - Zheng, Y.; Liu, Q.; Chen, E.; Ge, Y.; Zhao, J.L. Time series classification using multi-channels deep convolutional neural networks. In Proceedings of the International Conference on Web-Age Information Management, Macau, China, 16–18 June 2014; Springer: Cham, Switzerland, 2014; pp. 298–310. [Google Scholar]
 - Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.-C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR time series archive. IEEE/CAA J. Autom. Sin. 2019, 6, 1293–1305. [Google Scholar] [CrossRef]
 - Wang, Z.; Yan, W.; Oates, T. Time series classification from scratch with deep neural networks: A strong baseline. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 1578–1585. [Google Scholar]
 - Schäfer, P. The BOSS is concerned with time series classification in the presence of noise. Data Min. Knowl. Discov. 2015, 29, 1505–1530. [Google Scholar] [CrossRef]
 - Lines, J.; Bagnall, A. Time series classification with ensembles of elastic distance measures. Data Min. Knowl. Discov. 2015, 29, 565–592. [Google Scholar] [CrossRef]
 - Bagnall, A.; Lines, J.; Hills, J.; Bostrom, A. Time-Series Classification with COTE: The Collective of Transformation-Based Ensembles. IEEE Trans. Knowl. Data Eng. 2015, 27, 2522–2535. [Google Scholar] [CrossRef]
 - Benavoli, A.; Corani, G.; Mangili, F. Should we really use post-hoc tests based on mean-ranks? J. Mach. Learn. Res. 2016, 17, 152–161. [Google Scholar] [CrossRef]
 - Wilcoxon, F. Individual comparisons by ranking methods. In Breakthroughs in Statistics; Springer: New York, NY, USA, 1992; pp. 196–202. [Google Scholar]
 - Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 1979, 6, 65–70. [Google Scholar]
 - Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar] [CrossRef]
 
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).