Effect of Data Representation for Time Series Classification: A Comparative Study and a New Proposal

Abstract: Time series classification (TSC) is becoming very important in the area of pattern recognition with the increased availability of time series data in various natural and real life phenomena. TSC is a challenging problem because the attributes are ordered, so traditional machine learning algorithms for static data are not well suited to processing temporal data. With the gradual increase of computing power, a large number of TSC algorithms have been developed recently. In addition to traditional feature-based, model-based or distance-based algorithms, ensembles and deep networks have recently become popular for time series classification. Time series are typically very large, and classifying raw data is computationally expensive in terms of both processing and storage. Representation techniques for data reduction and ease of visualization are needed for accurate classification. In this work a recurrence plot-based data representation is proposed, and time series classification in conjunction with a deep neural network-based classifier has been studied. A simulation experiment with 85 benchmark data sets from the UCR repository has been undertaken with several state-of-the-art algorithms for time series classification, in addition to our proposed classification scheme, for comparative study. It was found that, among non-ensemble algorithms, the proposed algorithm produces the highest classification accuracy for most of the data sets.


Introduction
A time series is an ordered sequence of data points; such data are abundant in nature as well as in real life. Due to the increasing use of various sensors, the advancement of ICT (Information and Communication Technology) and the decreased cost of storage, a huge amount of time series data is collected and stored regularly in various application domains. This high volume of time series data needs to be analysed for meaningful use of the data. Classification of time series is an important task in time series analysis [1], with applications ranging from biometric authentication such as online signature verification [2], to electroencephalogram (EEG) and electrocardiogram (ECG) analysis in the medical or health care field [3], to stock prices and exchange rates in financial applications [4], to human activity recognition [5,6].
Traditional time series classification algorithms can be summarized into three categories: model-based, feature-based and distance-based. The first category of approaches focuses on building a model for each class from raw time series data by fitting its parameters to that class; new data are classified according to the class model that best fits them. Models used in time series classification are mainly statistical, such as Gaussian, Poisson, Autoregressive [7], Markov and Hidden Markov Model (HMM) [8], or based on neural networks. Naive Bayes is the simplest model and it is used in text classification [9]. Hidden Markov models (HMM) are successfully used for biological sequence classification. Some neural network models, such as the recurrent neural network (RNN), are suitable for temporal data classification. Probabilistic distance measures are generally suitable for model-based classification of time series.
The second category consists of extracting meaningful features from the time series and transforming the time series into a feature vector; classification is then done by traditional machine learning classifiers. The choice of appropriate features plays an important role in this approach. A number of techniques have been proposed for feature subset selection by using a compact representation of high dimensional time series in one row to facilitate the application of traditional feature selection algorithms like recursive feature elimination (RFE), zero norm optimization and so forth [10,11]. Time series shapelets, characteristic subsequences of the original series, have recently been proposed as features for time series classification [12]. Another group of techniques extracts features from the original time series by using various transformation techniques like Fourier, Wavelet, and so forth. In Reference [13], a family of techniques has been introduced to perform unsupervised feature selection on time series data based on common principal component analysis (CPCA), a generalization of PCA for multivariate data items where all the data items have the same number of dimensions. Any distance metric can be used for classification of the feature-based representation of the time series data.
The third category of approaches is based on developing efficient distance functions to measure the similarity between two raw time series, combined with a good traditional classifier for clustering or classification. Similarity or dissimilarity measures are the most important component of this approach. Euclidean distance is the most widely used measure with a nearest neighbour classifier for time series classification. Although computationally simple, it requires two series to be of equal length and is sensitive to time distortion. Elastic similarity measures such as Dynamic Time Warping (DTW) [14] and its variants overcome the above problems and seem to be the most successful similarity measures for time series classification in spite of their high computational cost. The combination of DTW and k-nearest neighbour classifiers is known to be a very effective approach and was considered to be the best one until a few years ago. A comparative study of different distance measures can be found in Reference [15].
Recently, ensemble-based approaches have been developed in which different classifiers are combined to achieve a higher degree of accuracy. Different ensemble paradigms integrate various feature sets or classifiers. Elastic Ensemble (PROP) [16] combines 11 classifiers based on elastic distance measures with a weighted ensemble scheme. Collective of Transformation Ensembles (COTE) [17] is another ensemble of 35 different classifiers based on different feature subsets from the time and frequency domains. Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) [18] is an extended version of COTE. However, the computational times for ensemble classifiers are quite high compared to a single classifier, even with the increased use of high performance computers. A good comparative evaluation of recent time series classification algorithms can be found in Reference [19].
Due to increased interest in GPU-based computing, deep learning models are also becoming popular and have been successfully applied to the time series classification problem. A good review of the most successful applications of deep neural networks (DNN) can be found in Fawaz et al. [20]. Deep learning approaches for TSC can be grouped into two categories: generative and discriminative models. Among the various DNN models developed for different tasks, the Convolutional Neural Network (CNN) is the most widely applied architecture for TSC problems, probably due to its robustness and shorter training time compared to other complex architectures. A review of CNN models can be found in Reference [21]. Two baseline CNN models are used in Reference [22]: one is a fully convolutional neural network (FCN) and the other is a residual network (ResNet). CNN and ResNet are known to be the most successful and effective among deep neural networks (DNN) so far according to Reference [20]. Recurrent Neural Networks such as LSTM (Long Short Term Memory) have also been used for human activity recognition from various one dimensional time series data from different sensors or for classifying stocks [23].
For efficient classification, raw time series should be preprocessed into a new representation before being fed to a CNN. A raw time series needs to be converted into a set of fixed length vectors (to be used with a 1D CNN) or into a matrix before feeding it to a 2D CNN. The most popular transformation methods are Gramian Angular Fields (GAF) [24,25] and the Markov Transition Field (MTF) [26], which are used to encode time series signals as images for input to a 2D CNN. Another way of transforming a one dimensional time series into a two dimensional matrix is to use the recurrence plot [27]. This paper investigates the performance of recurrence plot-based time series representation with two DNN models, namely the Fully Convolutional Network (FCN) and ResNet, in time series classification problems. A modification of the recurrence-based representation is proposed, and the classification efficiency of the new representation method is compared with other representative classification algorithms from the literature by simulation experiments with 85 benchmark data sets from the UCR time series data repository. The next section contains a brief description of time series representation and classification as the background of the present work. Our proposed TSC approach with a modified recurrence plot is presented in Section 3. Section 4 describes the comparative study by simulation experiments, followed by the simulation results and analysis in Section 5. The final section contains the summary of the work and the conclusion.

Time Series Representation and Classification
Approaches for time series classification can also roughly be grouped into approaches based on raw time series data and approaches based on transformed data, in which the time series is converted into a set of feature vectors. Figure 1 represents the grouping of popular time series classification approaches after preprocessing of the data. The group of approaches on the left consists of representation of time series as a vector of global static features, selection of the most appropriate features and classification by traditional machine learning models such as SVM (support vector machine), KNN (k-nearest neighbour) or CART (decision tree). The block on the right represents the approaches for classification of raw time series by KNN using various similarity measures. The middle group represents various representation schemes (feature extraction) for classification by deep neural networks or other machine learning classifiers. Our approach in this work falls into this category.

Feature-Based Representation
Feature-based approaches for classification are generally faster than raw time series-based approaches. Feature extraction from time series can be done either in the time domain or in the frequency domain. Moreover, features can be derived from subsequences of a time series, characterizing local patterns, or from the whole time series, expressing global patterns. Features computed from different subsections of a time series are combined to form a bag of features framework (TSBF) for classification of time series in Baydogan et al. [28]. Some feature-based representations convert a time series into a vector of feature values, which are generally average statistical measures over a window of ordered sequences (the whole time series is divided into a sequence of fixed length or sliding windows), such as mean, standard deviation, skewness, kurtosis and their successive increments [29]. Those features are unable to preserve the dynamic information embedded in the time series. Another class of feature-based representation consists of various transformations of the time series, such as DFT (Discrete Fourier Transformation), SVD (Singular Value Decomposition), DCT (Discrete Cosine Transformation), DWT (Discrete Wavelet Transformation) and so forth. Timmet et al. [30] used a variety of time and frequency domain features to represent hand tremor time series. Morchen [31] used different features from the frequency domain for classification of different time series. Wang [32] used 13 features for classification of univariate and multivariate time series.
In feature-based approaches for TSC, classification accuracy is highly dependent on the extracted and selected features rather than on the classification model. The choice of features to characterize a time series is subjective and non-systematic. The best feature subset is also task dependent, and there is no single way of choosing features for all time series classification problems. All the approaches need to take care of preprocessing the data and selecting appropriate features for efficient classification.
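The window-based statistical features mentioned above can be illustrated with a minimal Python sketch. It assumes non-overlapping fixed-length windows and population (biased) moment estimates; the function name `window_features` and the convention of returning zero skewness and kurtosis for a constant window are illustrative assumptions, not the procedure of Reference [29].

```python
import math

def window_features(series, window):
    """Return [mean, std, skewness, kurtosis] for each non-overlapping window."""
    features = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        n = len(chunk)
        mean = sum(chunk) / n
        # Population variance (divide by n); an assumption of this sketch.
        std = math.sqrt(sum((v - mean) ** 2 for v in chunk) / n)
        if std == 0.0:
            # Constant window: higher moments are undefined, use 0 by convention.
            skew, kurt = 0.0, 0.0
        else:
            skew = sum(((v - mean) / std) ** 3 for v in chunk) / n
            kurt = sum(((v - mean) / std) ** 4 for v in chunk) / n
        features.append([mean, std, skew, kurt])
    return features

# Two windows of length 4: one rising ramp, one constant segment.
feats = window_features([1.0, 2.0, 3.0, 4.0, 2.0, 2.0, 2.0, 2.0], window=4)
```

Each window is reduced to four numbers regardless of the order of its samples, which is one way to see why such features do not preserve the dynamic information in the series.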

Time Series Classification with Deep Neural Network
Deep neural networks (DNN), known to be capable of automatic feature extraction, are now becoming very popular and have many successful applications in the field of image processing [33]. In addition to images, sequential text and audio data can also be processed successfully with deep neural networks. Motivated by this success, DNNs, especially convolutional neural networks (CNN), are increasingly used in TSC problems, as time series resemble text and audio data in their sequential nature.
A multi-channel CNN (MC-CNN), in which filters are applied to each channel and the features are flattened across channels as input to a fully connected layer, is proposed in Zheng et al. [34]. A multiscale convolutional neural network (MCNN) has been proposed for univariate time series classification, in which three types of representation (down sampling, skip sampling and sliding window) for preprocessing of raw time series are used as input to the network [35]. Another research work is based on the similar idea of simultaneously exploiting multiple branches of the same type of representation for time series classification [36].
Wang [22] suggested two other CNNs for time series classification, the fully convolutional neural network (FCN) without subsampling layers and ResNet (Residual Network). With the addition of some learning techniques, these two models produce better performance than MCNN or COTE, as is demonstrated by simulation with UCR benchmark data sets. An ensemble method of deep networks is proposed in Reference [37], in which LSTM (Long Short Term Memory) and FCN models are individually used and their outputs are concatenated and passed through a softmax classifier for the final decision. Although deep neural networks achieve quite good classification accuracy for time series classification problems, the high preprocessing effort and the tuning of a large set of hyperparameters make them difficult to use in a real situation.

Recurrence Plot for Deep Neural Network
There are basically two main approaches for time series classification with convolutional neural networks. In one approach, a traditional CNN is modified to accept a 1-dimensional time series as input; in the other, the time series is converted into a 2D image to be used with a conventional CNN. There are various methods for transforming time series signals into images, such as Gramian Angular Fields (GAF) [24,25], the Markov Transition Field (MTF) [26] and the Recurrence Plot (RP), a tool from chaos theory to visualize time series.
Silva et al. [38] used the Campana-Keogh distance measure for image similarity as a similarity measure (CK-1) between two recurrence plots corresponding to two time series and found an improvement in classification accuracy compared to Euclidean distance and dynamic time warping. Hatami et al. in Reference [39] used RP as an input to CNN for TSC problems. In a subsequent paper [40], the authors applied the bag of features concept to recurrence plots and generated a bag of recurrence patterns for representation of time series for classification with a Support Vector Machine (SVM) classifier. Michael et al. [41] defined a cross recurrence plot (CRP) as an extension of the recurrence plot to visualize similar recurring patterns in two time series and proposed another similarity measure called the cross recurrence plot compression distance (CRPCD), which is a modification of the work in Reference [38]. Recurrence quantification analysis (RQA) [42] was developed to quantify differences in recurrence plots of two dynamical systems. It is used as a similarity measure in time series classification tasks in several recent works [43][44][45]. To our knowledge, no research has considered modifying the recurrence plot for use with deep networks to obtain better classification accuracy in a time series classification problem.

The recurrence plot (RP), created by Eckmann [27], is a tool to visualize recurrent behaviour such as periodicity or irregular cyclicity, a typical phenomenon in the nonlinear dynamical systems that generate time series. It is a 2D plot for encoding a 1D time series, which provides a way to visualize the recurrence behaviour of a trajectory through phase space and enables us to investigate certain aspects of the m-dimensional phase space trajectory through a 2D representation. It can be defined by the following equation:

$$R_{i,j} = \Theta\left(\epsilon - \lVert x_i - x_j \rVert\right), \quad i, j = 1, \ldots, n \qquad (1)$$

where $x$ is a time series of length $n$, $x_i$ and $x_j$ are the subsequences observed at positions $i$ and $j$ of the time series, $\lVert \cdot \rVert$ is a norm (e.g., the Euclidean norm) between the observations, and $\epsilon$ is the recurrence threshold. The threshold is chosen in such a way that noise is filtered out but the recurrence structures are preserved. $\Theta$ is the Heaviside function. According to Equation (1), the recurrences of the phase state at times $i$ and $j$ are placed in a square matrix with black and white dots. Recurrence is marked by the black dots.
The cross recurrence plot (CRP) is an extension of RP which shows all the times when a state in one time series occurs in the other time series. When the lengths n and l of the two time series differ, the CRP matrix becomes non-square.
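The thresholded recurrence definition and its cross recurrence extension described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `recurrence_matrix`, the use of the Euclidean norm via `math.dist` (Python 3.8+), and the convention that the Heaviside function equals 1 at zero are assumptions, and states are passed as tuples.

```python
import math

def recurrence_matrix(states_a, eps, states_b=None):
    """Binary recurrence matrix: entry (i, j) is 1 when ||a_i - b_j|| <= eps.

    With states_b omitted this is an ordinary (square, symmetric) recurrence
    plot; with a second series it is a cross recurrence plot, which is
    non-square when the two series differ in length.
    """
    if states_b is None:
        states_b = states_a
    return [[1 if math.dist(a, b) <= eps else 0 for b in states_b]
            for a in states_a]

# Scalar observations wrapped as 1-tuples (embedding dimension 1).
x = [(0.0,), (1.0,), (0.1,), (1.1,)]
y = [(0.0,), (1.0,)]
rp = recurrence_matrix(x, eps=0.2)       # 4 x 4, symmetric
crp = recurrence_matrix(x, 0.2, y)       # 4 x 2, non-square
```

Here `rp` marks positions 0 and 2 (and 1 and 3) as mutually recurrent, which is the alternating pattern one would expect from a period-2 signal.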

Proposed TSC Approach by DNN with Modified Recurrence Plot
In this work, time series classification by deep neural networks with a proposed modification of the recurrence plot, aimed at improving classification performance, has been investigated. Based on our literature survey we considered two architectures, the fully convolutional network (FCN) and the Residual Network (ResNet), with three types of data representation: the first is the traditional recurrence plot and the other two are proposed modifications for time series classification.
The first step in the proposed classification approach is recurrence plot generation. The recurrence plot is a simple tool for reconstructing a nonlinear dynamical system from the observed time series based on the embedding theorem. The embedding theorem proposed by Takens and expanded by Sauer [46] guarantees that the phase space of time delayed vectors with sufficiently large dimension will capture the structure of the original phase space.
A deterministic time series signal $\{s_n(t)\}_{t=1}^{T_n}$ $(n = 1, 2, \ldots, N)$ can be embedded as a sequence of time delay coordinate vectors $v_{s_n}(t)$, known as the experimental attractor, with an appropriate choice of the embedding dimension $m$, which is the minimum number of coordinates needed to represent the time series with no overlapping in the state space, and the delay time $\tau$, which is the time lag between the time series points taken as coordinates.
For correct reconstruction of the attractor, a fine estimation of the embedding parameters (m and τ) is needed. There are a variety of heuristic techniques for estimating those parameters [47]. The most popular method of estimating m is False Nearest Neighbour, proposed by Kennel, and the most popular technique for estimating τ is Average Mutual Information.
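As an illustration of time delay embedding, the following Python sketch builds delay vectors from a scalar series. It assumes the common forward-delay convention (s(t), s(t+τ), ..., s(t+(m−1)τ)), which may differ from the convention used in the original equations, and it takes m and τ as given rather than estimating them with False Nearest Neighbour or Average Mutual Information. The name `delay_embed` is hypothetical.

```python
def delay_embed(series, m, tau):
    """Return the list of m-dimensional delay vectors of a scalar series.

    Vector t is (s[t], s[t + tau], ..., s[t + (m - 1) * tau]); this forward
    delay convention is an assumption of the sketch.
    """
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for this m and tau")
    return [tuple(series[t + k * tau] for k in range(m))
            for t in range(n_vectors)]

# A series of length 6 with m = 3 and tau = 2 yields 6 - 2*2 = 2 vectors.
vectors = delay_embed([0, 1, 2, 3, 4, 5], m=3, tau=2)
```

Note that embedding shortens the sequence by (m − 1)τ points, so the resulting recurrence plot is slightly smaller than the original series length.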

Recurrence Plot (RP) Generation
After estimation of the embedding parameters, a time series $v_i$ can be converted to a recurrence plot. The recurrence plot is an array of dots in an $n \times n$ square, where a dot is placed at $(i, j)$ whenever $x_j$ is sufficiently close to $x_i$. By choosing an embedding dimension $m$, the $m$-dimensional orbit of $x_i$ can be constructed by the method of time delay. Then $r_i$ is chosen such that the ball of radius $r_i$ centred at $x_i$ in $\mathbb{R}^m$ contains a reasonable number of other points of the orbit. Finally, a dot is plotted at each point $(i, j)$ for which $x_j$ is in the ball of radius $r_i$ centred at $x_i$, and the generated image is called the recurrence plot. The practical steps of generation are:

• Estimation of proper embedding parameters m and τ.
• Embedding of time series data with Equation (3).
The square distance matrix is finally converted to a grey scale image as the input to the CNN for classification. The generated square matrix is symmetric across the diagonal, so the lower left triangular part and the upper right triangular part contain the same information.
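The last two stages, computing the pairwise distance matrix of the embedded vectors and converting it to a grey scale image, can be sketched as follows. This is a minimal sketch under stated assumptions: Euclidean distance via `math.dist`, a linear rescaling of distances to integer grey levels 0..255, and hypothetical helper names `distance_matrix` and `to_greyscale`. The exact scaling and resizing used in the study are not specified here.

```python
import math

def distance_matrix(vectors):
    """Pairwise Euclidean distances between embedded state vectors."""
    return [[math.dist(a, b) for b in vectors] for a in vectors]

def to_greyscale(matrix):
    """Linearly rescale matrix entries to integer grey levels in 0..255."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return [[int(round((v - lo) * scale)) for v in row] for row in matrix]

# Three 2-dimensional delay vectors (illustrative values only).
vectors = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
image = to_greyscale(distance_matrix(vectors))
```

Because the distance is symmetric, `image[i][j] == image[j][i]` for every pair, which is the redundancy between the two triangular halves noted above.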

Proposed Modified Recurrence Plot (Recurrence Plot Raw RP1)
In our previous study [48] of time series classification on 85 benchmark data sets from the UCR repository using a CNN (Convolutional Neural Network) similar to the one used in Reference [39], it was found that, for most of the data sets, the two dimensional recurrence plot representation of the input data with a CNN produces better classification accuracy than one dimensional raw time series data with a 1NN classifier and Euclidean distance or DTW as the similarity measure. The simulation study was done with recurrence plot generation for different m and τ values. However, the following issues were noticed that need further consideration.
1. It was found that for some data sets it was possible to improve classification accuracy by tuning the parameters m and τ, while in other data sets tuning did not work. As an explanation for this, it is assumed that, during generation of the recurrence plot, if the change in the time series is small, the distance values in the matrix become close to zero, resulting in poor classification accuracy, while those types of time series are better classified with the 1NN classifier and DTW measure using the raw time series.
2. Due to the symmetry of the square recurrence plot image across the diagonal, only one triangular part is needed for representation of the data; the other part is redundant and only adds to the computational burden.
3. The computational cost increases with the size of the input image, so the recurrence plot image should be the smallest size that preserves the characteristic pattern of the time series for classification; resizing of the input image is therefore needed to reduce the computational burden.
To address the above points, a modified image representation of the input data is proposed here, in which one triangular half of the square image retains the recurrence plot of the input data and the other half contains information from the raw data. This removes the redundancy in the input image representation and allows different types of time series to be classified with similar accuracy. Finally, the image is resized, and it is checked that the resizing does not affect classification accuracy. The steps of generation of the transformed image are summarized below and are shown in Figure 2.
1. Estimation of proper embedding parameters m and τ.

Recurrence Plot DTW (RP2)
In another version of time series representation, step 3 of the recurrence plot algorithm for distance matrix calculation is modified and dynamic time warping (DTW) is used. The DTW distance matrix DTW(i, j) (the distance between the time series $p_i$ and $q_j$ with the best alignment) is obtained by the following Algorithm 1.

Algorithm 1: Calculation of DTW
DTW(0, 0) = 0; DTW(i, 0) = DTW(0, j) = ∞ for i, j ≥ 1
for i = 1 to n do
    for j = 1 to l do
        Cost = D(p_i, q_j)
        DTW(i, j) = Cost + min(DTW(i − 1, j), DTW(i, j − 1), DTW(i − 1, j − 1))
    end for
end for
return DTW(i, j)

D(p_i, q_j) represents the Euclidean distance between p_i and q_j.
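A runnable Python sketch of the standard DTW dynamic programming recurrence is given below for illustration. It assumes scalar observations with the absolute difference as the local cost, no warping window, and a border of infinities to handle the boundary; the function name `dtw_matrix` is hypothetical and this is not the authors' implementation.

```python
import math

def dtw_matrix(p, q):
    """Accumulated-cost matrix of DTW between scalar sequences p and q.

    Entry (i, j) is the cost of the best alignment of p[:i+1] with q[:j+1];
    the bottom-right entry is the DTW distance between the full sequences.
    """
    n, l = len(p), len(q)
    # One extra row and column of infinities acts as the boundary condition.
    dtw = [[math.inf] * (l + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, l + 1):
            cost = abs(p[i - 1] - q[j - 1])  # local distance (assumed scalar)
            dtw[i][j] = cost + min(dtw[i - 1][j],      # step in p only
                                   dtw[i][j - 1],      # step in q only
                                   dtw[i - 1][j - 1])  # step in both
    return [row[1:] for row in dtw[1:]]  # drop the boundary row and column

# q repeats the middle value of p, so warping aligns them at zero cost.
acc = dtw_matrix([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
distance = acc[-1][-1]
```

The example also shows that the two series may have different lengths, in which case the accumulated-cost matrix is non-square.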

Classification by FCN and ResNet
In this work, the fully convolutional neural network (FCN) and the Residual Network (ResNet) have been used for time series classification. The basic structure of the FCN used is shown in Figure 3. It consists of the input layer followed by two sets of a convolutional layer and a max-pooling layer, two fully connected layers and the output layer. The number of neurons in the first fully connected layer depends on the input image size (input image size × feature map size), and the second fully connected layer has 512 neurons. We used three sizes of input images: 70 × 70, 100 × 100 and 200 × 200. The detailed parameters, chosen after some trial and error with the model, are shown in Table 1. The basic structure of the ResNet used in this work is the same as that used in Reference [49] and is shown in Figure 4. The input image size for ResNet is restricted to 50 × 50 for all time series.

Comparative Study and Simulation Experiments
The proposed approaches based on FCN and ResNet with three types of recurrence plot-based data representation (RP, RP1 and RP2) for time series classification have been evaluated with benchmark data sets from the UCR archive. A comparative study has been done to verify the classification efficiency of the proposed approaches in comparison with some other popular and successful approaches for TSC. We selected the following classification approaches for the comparative study.
1. 1NN classifier with Euclidean distance as the similarity measure using raw time series. This is the simplest approach and has the lowest computational cost. However, this approach cannot be used to compare two time series of unequal length.
2. 1NN classifier with DTW (dynamic time warping) as the similarity measure between two time series. This is the most popular approach; it produces high classification accuracy but has a high computational cost. The algorithm is presented in the previous section.
3. 1NN classifier with longest common subsequence (LCSS) [50] as the similarity measure. LCSS is a variant of edit distance which also matches two time series by allowing them to stretch, like DTW. It has two parameters: ε, a matching threshold (two points from two time series are considered to match if their distance is less than ε), and δ, the warping threshold, which controls the window size for matching. It is known to be more robust to noise and outliers compared to DTW.
4. Cross Translation Error (CTE), a similarity measure for two time series based on the delay vector representation of time series, was developed previously by one of the authors for the online signature verification problem. The details can be found in Reference [51]. It is computationally very light, although its classification accuracy is poor. The calculation process is briefly described here.
• For the vectors $v_{s_i}(k)$ and $v_{s_e}(k')$, the transition in each orbit after one step is calculated as follows: $V_{s_i}(k) = v_{s_i}(k + 1) - v_{s_i}(k)$ and $V_{s_e}(k') = v_{s_e}(k' + 1) - v_{s_e}(k')$.
• The Cross Translation Error (CTE) $e_{cte}$ is calculated from $V_{s_i}(k)$, $V_{s_e}(k')$ and $\bar{V}$, where $\bar{V}$ denotes the average vector between $V_{s_i}(k)$ and $V_{s_e}(k')$.
• $e_{cte}$ is calculated $L$ times for different selections of the random vector $v_{s_i}(k)$, and the median of $e^{i}_{cte}$ $(i = 1, 2, \ldots, L)$ is calculated as $M(e_{cte}) = \mathrm{Median}(e^{1}_{cte}, \ldots, e^{L}_{cte})$.
The final cross translation error $E_{cte}$ is calculated by taking the average over $Q$ repetitions of the procedure, to suppress the statistical error generated by the random sampling in step (3).
5. Time series bag of features (TSBF) is an extension of Time series forest (TSF) with multiple stages. The first stage generates a subseries classification problem and the second stage forms class probability estimates for each subseries. The third stage constructs a bag of features from these probabilities, and finally a random forest classifier is built on the bag of features representation. The details can be found in Reference [28].
6. We also used one dimensional FCN (convolutional neural network) and ResNet with raw time series data, to compare the effect of the 2D recurrence plot approach against 1D raw time series data. Due to the limitation of computational resources while implementing ResNet, we compressed the time series for recurrence map generation; for a fair comparison, we used the same compressed time series for the one dimensional versions of FCN and ResNet.
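To make the role of the two LCSS parameters in item 3 concrete, the following Python sketch computes an LCSS-based dissimilarity for scalar series. It assumes the usual dynamic programming formulation, in which points match when their values differ by less than ε and their indices differ by at most δ, and it normalises by the shorter series length; the function names and the normalisation are illustrative assumptions rather than the exact variant of Reference [50].

```python
def lcss_length(a, b, eps, delta):
    """Length of the longest common subsequence under thresholds eps, delta."""
    n, l = len(a), len(b)
    table = [[0] * (l + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, l + 1):
            close_in_value = abs(a[i - 1] - b[j - 1]) < eps
            close_in_time = abs(i - j) <= delta
            if close_in_value and close_in_time:
                table[i][j] = table[i - 1][j - 1] + 1   # extend a match
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][l]

def lcss_distance(a, b, eps, delta):
    """Dissimilarity in [0, 1]: 0 when the shorter series matches fully."""
    return 1.0 - lcss_length(a, b, eps, delta) / min(len(a), len(b))

# The outlier 10.0 is simply skipped instead of dominating the distance.
a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.0, 10.0, 4.0]
matched = lcss_length(a, b, eps=0.5, delta=1)
dist = lcss_distance(a, b, eps=0.5, delta=1)
```

In this example three of the four points match, so a single outlier changes the dissimilarity by a bounded amount, which illustrates the robustness to noise and outliers mentioned in item 3.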

Dataset Used
The simulation experiments were done with the benchmark data sets from the UCR/UEA time series classification archive [52]. We used 85 data sets, details of which are presented on the archive website. The data sets contain time series of various characteristics: the length ranges from 24 to 2709 and the number of classes varies from 2 to 60. Some data sets have a very small training set. The data sets are collected from different application domains and can be divided into seven categories: Image Outline (29), Sensor Readings (16), Motion Capture (14), Spectrographs (7), ECG measurements (7), Electric device profiles (6) and Simulated Data (6), where the number in brackets is the number of data sets in that category.

Simulation Experiments
The following simulation experiments for time series classification were carried out with the benchmark data sets. Training and test sets were used as specified in the original data sets, with 10-fold cross validation for each classifier. For the convolutional neural network (CNN), some trial and error experiments were done to find an appropriate hyperparameter setting; the hyperparameters were set for the best results, which are presented in the next section. For ResNet, due to time limitations, we used previously reported parameters.

• FCN classifier with three types of recurrence plot representation (RP, the original one; RP2, in which DTW is used for distance calculation for the recurrence plot; RP1, our proposed modified recurrence plot in which raw data is also combined with the recurrence plot).
• The above experiments are repeated with ResNet with the same three types of recurrence plots.
• Experiments were done with the Nearest Neighbour classifier with Euclidean distance and DTW using the original raw time series.
• 1NN classifier with edit distance-based approaches, LCSS (longest common subsequence), TWED (time warped edit distance) and MSM (Move-Split-Merge), are used for classification using the original raw time series.
• Cross translation error (CTE) based on the concept of multidimensional delay vector representation with 1NN classifier.
• A feature-based approach, TSBF, with random forest classifier is used.
We attempted to implement ensemble-based algorithms on the data sets but due to lack of proper computing resources, we restricted our comparative study to non-ensemble algorithms.

Simulation Results and Analysis
Table 2 represents classification accuracies of 85 data sets with different classification approaches.In all tables, in every row, the highest value is presented in bold which represents the best classification accuracy obtained for the particular data set.Column 1 represents data sets, column 2 represents classification accuracies by FCN with traditional recurrence plot similar in the work presented in Reference [39].Columns 3, 4 and 5 represent classification accuracy values for FCN with recurrence plot RP2, FCN with proposed modified recurrence plot RP1 and ResNet with RP1 respectively.We found that RP1 produces better classification accuracies than RP and RP2, so we did not present (RP2 + ResNet) results.The rest of the columns represent classification accuracies for Euclid, DTW, LCSS, CTE and TSBF.We did not include the results of TWED and MSM as those have poor classification accuracies compared to the methods presented in the table.It is found that no algorithm is best for all the data sets.Though TSBF produces the best classification accuracy for most of the data sets, average classification accuracy over 85 data sets is not the highest among all the methods.Our proposed method (RP1 + ResNet) achieves the highest average classification accuracy over 85 data sets.RP2 uses DTW for distance calculation, which increased computational cost as well as accuracy for some of the data sets, as a whole the increase in classification accuracy is not so significant compared to the increase in computational cost.However, our proposed modification of recurrence plot RP1 seems to have the best effect on the increase of classification accuracy.This modification does not increase the computational cost.From this table it can be assumed that TSBF, RP1+FCN and RP1 + ResNet are the effective classifiers.
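The distinction drawn above between being best on the most data sets and having the highest average accuracy can be illustrated with a small Python sketch. The accuracy values below are toy numbers invented purely for illustration and are not results from Table 2; the function name `summarize` and the method labels "A" and "B" are hypothetical.

```python
def summarize(accuracies):
    """Map each method to (mean accuracy, number of data sets where it is best).

    `accuracies` maps a method name to a list of per-data-set accuracies;
    ties count as a win for every tied method.
    """
    methods = list(accuracies)
    n_sets = len(next(iter(accuracies.values())))
    wins = {m: 0 for m in methods}
    for d in range(n_sets):
        best = max(accuracies[m][d] for m in methods)
        for m in methods:
            if accuracies[m][d] == best:
                wins[m] += 1
    return {m: (sum(accuracies[m]) / n_sets, wins[m]) for m in methods}

# Toy values only: method A wins two of three data sets but fails badly on one.
toy = {"A": [0.90, 0.91, 0.40], "B": [0.88, 0.89, 0.85]}
summary = summarize(toy)
```

In this toy case method A has more wins while method B has the higher mean, mirroring how a method can lead on the count of data sets without leading on average accuracy.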
Table 3 represents the comparison of classification accuracies of different one dimensional deep networks with raw time series input and two dimensional deep network-based algorithms with recurrence plot input.We excluded TSBF here to focus on the results of recurrence plot-based methods.Column 4 and column 6 represent the results of 1D convolutional neural network and 1D ResNet, respectively.It is found that the results of two dimensional deep networks with recurrence plot input are far better than one dimensional deep networks with raw time series input for most of the data sets.From this table it is also found that the classification accuracy of column 7 (RP1 + ResNet) is the highest for the most of the data sets.It can be concluded that ResNet with our proposed modified recurrence plot input produces the best average classification accuracy and the highest classification accuracy for most of the data sets.Also, the variability of the classification accuracies among different data sets is the lowest (same as DTW).For consideration of computational cost, it is difficult to compare all the algorithms by implementing all of them in the same platform.It is needless to say that the parameter search of deep neural network architectures takes time and our reported results might not constitute the most optimized architecture.On the other hand, for NN-DTW, warping window size has a considerable effect on the final accuracy and we did not put significant effort into searching for the best warping window.As a rough comparison, our proposed representation technique based on recurrence plot and deep network considerably improved classification accuracy without incurring additional computational cost compared to other popular non ensemble and deep network-based algorithms.
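The distinction drawn above between "best on most data sets" and "highest average accuracy" can be illustrated with a minimal sketch. The accuracy values and method names below are made-up toy numbers chosen only to show how the two summaries can disagree; they are not taken from Table 2 or Table 3.

```python
import numpy as np

# Toy accuracy table: rows are data sets, columns are methods.
# These numbers are illustrative assumptions, NOT results from the paper.
methods = ["method_A", "method_B", "method_C"]
acc = np.array([
    [0.95, 0.94, 0.90],
    [0.91, 0.90, 0.88],
    [0.86, 0.85, 0.80],
    [0.40, 0.75, 0.70],
    [0.50, 0.72, 0.78],
])

# Average accuracy and variability of each method over all data sets.
mean_acc = acc.mean(axis=0)
std_acc = acc.std(axis=0)

# Number of data sets on which each method has the highest accuracy.
wins = np.bincount(acc.argmax(axis=1), minlength=len(methods))

for name, m, s, w in zip(methods, mean_acc, std_acc, wins):
    print(f"{name}: mean={m:.3f} std={s:.3f} wins={w}")
```

In this toy table, method_A wins on the most data sets, yet method_B has the higher mean because method_A fails badly on two data sets. This is the same kind of pattern the text reports for TSBF versus RP1 + ResNet.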

Conclusions
In this paper, the effect of time series data representation methods on time series classification has been studied in terms of increased classification accuracy with affordable computational cost and interpretability. Our study focussed mainly on recurrence plot-based representation of time series for use with deep network-based classifiers. Several deep network architectures exist, and reported results show that CNN and ResNet perform better than the others in time series classification problems; therefore, a fully convolutional network (FCN) and a residual network (ResNet) have been used in our work. A new modified recurrence plot representation of time series data has been proposed, which judiciously includes information from the raw time series in the recurrence plot framework without much additional computational cost, in order to improve classification accuracy.
The use of the recurrence plot as the input representation increases the interpretability of the classification method compared to raw time series input. Deep networks are known to be black boxes which inherently extract the features of the time series for grouping. Although this is convenient, the process is invisible to users. Recurrence plots are more visually interpretable than raw time series. Humans can later use the results of classification by the deep network to establish a correlation between the structure of the recurrence plot and the categories of time series. Interpretability of the classification process can also be increased by extracting explicit features from the time series and then classifying the time series by those features, which allows users to relate the classes to the characteristics of the time series. However, the proper selection of a feature set is important for efficient classification, and there is no general way to do that.
In our work, the modification of the recurrence plot by mixing in information from the raw time series data allows us to consider static and dynamic features of the time series simultaneously, and extends the use of the recurrence plot to a wide variety of time series data. Due to computational resource limitations, we optimized the size of the recurrence plot so that these limitations could be overcome without much degradation of classification accuracy. In accordance with our computational environment, we selected a 50 × 50 image size for input to ResNet for all time series, irrespective of their original length. The computational cost of deep network-based approaches with original time series input increases with the length of the time series, because the network complexity (number of parameters) increases, which in turn increases the training time. Our approach is an attempt to balance computational burden and classification accuracy while remaining generally applicable to different types of time series and adding interpretability. Of course, increasing the size of the recurrence plot might increase the classification accuracy for some time series. We did not tune the size for each time series individually at this stage. There is scope for further improvement of classification accuracy at the cost of more computation time.
A comparative study has been carried out with some of the state-of-the-art algorithms, and it was found that our proposed approach produces better classification accuracy for most of the data sets. We did not include ensemble algorithms in the comparison. Although ensemble algorithms produce better classification accuracy, the computational cost of finding the proper combination is too high. The comparative study shows that our proposed algorithm performs better than popular traditional non-ensemble algorithms on time series data sets for most of the domains available in the benchmark data set repository.
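As an illustration of bringing recurrence plots of different sizes to a fixed 50 × 50 input, the sketch below resamples a square matrix by nearest-neighbour index selection. The source does not state which resizing method was used, so the function `resize_nearest` and the choice of nearest-neighbour sampling are assumptions made only for this example.

```python
import numpy as np

def resize_nearest(mat, size=50):
    """Resample a 2D matrix to (size, size) by nearest-neighbour sampling.

    Hypothetical helper: the resizing method used in the paper is not
    specified, so this is only one simple possibility.
    """
    mat = np.asarray(mat, dtype=float)
    # Evenly spaced source indices, rounded to the nearest integer position.
    rows = np.round(np.linspace(0, mat.shape[0] - 1, size)).astype(int)
    cols = np.round(np.linspace(0, mat.shape[1] - 1, size)).astype(int)
    return mat[np.ix_(rows, cols)]

# A long series gives a large plot, a short series gives a small one;
# both end up with the same fixed input size.
big = np.arange(200 * 200, dtype=float).reshape(200, 200)
tiny = np.arange(20 * 20, dtype=float).reshape(20, 20)
small_from_big = resize_nearest(big)
small_from_tiny = resize_nearest(tiny)
print(small_from_big.shape, small_from_tiny.shape)
```

Nearest-neighbour sampling keeps the original matrix values unchanged; an interpolating or averaging resize would smooth the plot instead, which may or may not be preferable for a given data set.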

Figure 1. Approach of time series data classification.

2. Embedding of time series data with Equation (3).
3. Calculation of Euclid distance to generate the distance matrix $D_{i,j} = \mathrm{dist}(v_i - v_j)$.
4. Normalization of the distance values to lie between 0.0 and 1.0 to form the square matrix A.
5. Another square matrix B is formed with the original time series values shifted by τ. Suppose that the normalized original time series is represented by S, consisting of 11 points. Its distribution in a square matrix B with τ = 2 is shown in the left square of the figure.
6. The final square matrix F is designed by combining A and recurrence plot information from B, in which the upper triangle represents the upper triangle (except the diagonal) of the recurrence plot values and the lower triangle represents the lower triangle (with the diagonal) of the original square matrix A, as shown in the right square of the figure.
7. Finally, the square matrix F is converted to an image (RP1) and optimized to a proper size as a representation of the time series.
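The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Equation (3) is taken to be the standard delay embedding $v(t) = (s(t), s(t+\tau), \ldots, s(t+(m-1)\tau))$. The layout of B is given only in the figure, so a hypothetical stand-in is used for it. Which matrix fills which triangle is an interpretation of step 6. All function names are invented for this example.

```python
import numpy as np

def delay_embed(s, m, tau):
    """Step 2 (assumed form of Equation (3)): rows are m-dimensional delay vectors."""
    s = np.asarray(s, dtype=float)
    n = len(s) - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for the chosen m and tau")
    return np.stack([s[j * tau: j * tau + n] for j in range(m)], axis=1)

def normalized_distance_matrix(v):
    """Steps 3-4: pairwise Euclidean distances, scaled to lie in [0, 1]."""
    diff = v[:, None, :] - v[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    span = d.max() - d.min()
    return (d - d.min()) / span if span > 0 else np.zeros_like(d)

def combine_triangles(upper_src, lower_src):
    """Step 6: strict upper triangle of one matrix, lower triangle (with diagonal) of the other."""
    return np.triu(upper_src, k=1) + np.tril(lower_src, k=0)

s = np.sin(np.linspace(0.0, 4.0 * np.pi, 40))
v = delay_embed(s, m=3, tau=2)
A = normalized_distance_matrix(v)

# Hypothetical stand-in for B: each row is the min-max normalized series.
# This is NOT the layout shown in the paper's figure, which is not reproduced here.
s_norm = (s - s.min()) / (s.max() - s.min())
B = np.tile(s_norm[: len(A)], (len(A), 1))

F = combine_triangles(A, B)
print(v.shape, A.shape, F.shape)
```

Because the recurrence-plot distance matrix is symmetric, one triangle already carries all of its information. That is what leaves the other triangle free to hold values from the raw series without enlarging the image.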

Figure 3. Basic structure of the FCN used.

Figure 4. Basic structure of the ResNet used.

Figure 7. Critical difference plot for different classifiers (7 categories of data sets).

• Let $v_{s_i}(t)$ and $v_{s_e}(t)$ denote the m-dimensional delay vectors generated from time series $s_i(t)$ and time series $s_e(t)$, respectively, according to Equation (3).
• A random vector $v_{s_i}(k)$ is picked from $v_{s_i}(t)$. Let the nearest vector to $v_{s_i}(k)$ in $v_{s_e}(t)$ be $v_{s_e}(k')$. The index $k'$ of the nearest vector is defined as $k' \equiv \arg\min_t \lVert v_{s_i}(k) - v_{s_e}(t) \rVert$.
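The nearest-vector rule above amounts to an arg-min over Euclidean distances. The following sketch assumes the delay vectors are stored as rows of NumPy arrays; the function name and the toy vectors are hypothetical and serve only to illustrate the rule.

```python
import numpy as np

def nearest_delay_vector_index(v_query, v_candidates):
    """Return the index t minimizing ||v_query - v_candidates[t]|| (Euclidean norm)."""
    dists = np.linalg.norm(v_candidates - v_query, axis=1)
    return int(np.argmin(dists))

# Toy 2-dimensional delay vectors for two series (illustrative values only).
v_si = np.array([[0.0, 0.1], [0.5, 0.6], [1.0, 1.1]])
v_se = np.array([[0.9, 1.0], [0.4, 0.7], [0.0, 0.0]])

# Pick a random vector from the first series and find its nearest
# neighbour among the delay vectors of the second series.
rng = np.random.default_rng(0)
k = int(rng.integers(len(v_si)))
k_prime = nearest_delay_vector_index(v_si[k], v_se)
print(k, k_prime)
```

A brute-force search like this costs one distance evaluation per candidate vector. For long series a spatial index could replace it, but that is outside what the source describes.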

Table 2. Classification Accuracies with Different Algorithms.