Data Augmentation with Suboptimal Warping for Time-Series Classification

In this paper, a novel data augmentation method for time-series classification is proposed. In the introduced method, a new time-series is obtained in warped space between suboptimally aligned input examples of different lengths. Specifically, the alignment is carried out constraining the warping path and reducing its flexibility. It is shown that the resultant synthetic time-series can form new class boundaries and enrich the training dataset. In this work, the comparative evaluation of the proposed augmentation method against related techniques on representative multivariate time-series datasets is presented. The performance of methods is examined using the nearest neighbor classifier with the dynamic time warping (NN-DTW), LogDet divergence-based metric learning with triplet constraints (LDMLT), and the recently introduced time-series cluster kernel (NN-TCK). The impact of the augmentation on the classification performance is investigated, taking into account entire datasets and cases with a small number of training examples. The extensive evaluation reveals that the introduced method outperforms related augmentation algorithms in terms of the obtained classification accuracy.


Introduction
The ubiquity of interconnected sensors that record data over time results in collections of sequentially ordered multidimensional data points. Consequently, the classification of multivariate time-series (MTS) requires taking into account multidimensional features, their relationship with subsequences of features that belong to the same time-series, and possible nonlinear distortions in time scale [1]. The last problem involves sequences of different lengths, which are typical in applications capturing events that occur at different speeds. For example, sign language gestures of the same meaning can be performed differently, depending on the mood or intentions of the signer [2]. This applies to time-series captured by sensors in most action recognition problems [3]. Since the classification of relatively small datasets may lead to overfitting, methods that generate synthetic data examples are desired. Data augmentation techniques are more popular in the computer vision field, in which training images are subjected to different transformations, such as rotation, scaling, or noise injection [4]. Also, datasets that contain sets of multidimensional data points can be augmented using methods that create synthetic minority class examples [5]. They introduce new data samples between pairs of training samples. However, data augmentation of time-series seems more challenging than the augmentation of image data or sets of multidimensional points, as new data points cannot be created in the space between time-series. That space is often warped, and operators that reflect rotation or scaling of an image may damage the information carried by a sequence of captured events stored in a time-series. Such a sequence has meaning and its distortion may lead to classification errors [6]. Therefore, time-series augmentation techniques generate new examples by randomly stretching, shrinking, removing their parts [4], or perturbing them [7]. Also, weighted aligned averages [8,9] or generative models [10] are used.
In this paper, a new time-series example is created in the warped space between time-series using a suboptimal alignment. The suboptimality is caused by constraining the warping path. This allows the method to generate synthetic examples that enrich the pool of training data and often create new boundaries between classes. The method is designed for data captured by a variety of sensors [1] and modifies the alignment provided by the Dynamic Time Warping (DTW) [11].
The major contributions of this work are a novel method for the generation of synthetic time-series to augment sparse datasets and a comprehensive evaluation of the approach and related techniques on demanding MTS datasets using representative classifiers.
The rest of this paper is arranged as follows. Section 2 reviews previous work on augmentation of time-series datasets, Section 3 introduces the proposed technique, and Section 4 presents the extensive comparative evaluation of the method with related approaches. Finally, Section 5 concludes the paper and indicates possible directions for future work.

Related Work
The data augmentation methods aim to provide new data samples that cover unoccupied feature space of the considered class, assuming that this would improve the classification performance. Such simulated data samples are assigned with class labels of the samples used for the augmentation. As data generation techniques rely on linear transformations [5,12], the synthetic time-series generation can be particularly challenging due to frequent nonlinear transformations in time scale or the need for methods that create a new example preserving sufficient information to identify the class.
Simple approaches to time-series generation involve a selection of a time window in a time-series and warping it randomly by removing or adding some data points [4]. Similarly, cropping, or window slicing, in which a part of a time-series is removed can be also used [4]. Other such techniques involve the addition of noise, rotation, and scaling of the values in sequences [4]. A more advanced method, DTW Barycentric Averaging (DBA) [9], generates time-series as weighted averages of multiply aligned time-series. In that method, a time-series is selected and used for aligning the remaining samples. Then, weights that govern the alignment are iteratively updated to provide a sequence that averages the input time-series. The suitability of DBA for the training of deep learning models was discussed by Fawaz et al. [8].
The usage of data augmentation methods may depend on their application, since in some domains a time-series should not be rotated and window slicing can remove vital information, preventing its classification. Consequently, data warping should not be used in applications in which the time scale has significant physical meaning. Therefore, in the literature, some approaches are focused on time-series from one domain. For example, generative adversarial networks (GANs), which are composed of competing neural networks for the generation and discrimination of synthetic biosignals, were proposed by Haradal et al. [10] for the augmentation of electrocardiogram and electroencephalogram datasets. A similar approach using Conditional Generative Adversarial Networks (CGAN) and irregular sampling was introduced by Ramponi et al. [13]. Biosignal recordings (electrocardiogram (ECG)) were also augmented by Cao et al. [14] using samples that belong to different categories. In that work, the deep learning methods were trained on augmented data. In other work, Delaney et al. [15] created ECG data with a range of GAN architectures. Since neural networks often require large quantities of training examples, some approaches are designed for their training. For example, Yeomans et al. [16] used smoothed curves and randomly concatenated sections from the deviation from the mean curves, Um et al. [7] perturbed the temporal location of a within-window event in time-series of wearable sensor data for Parkinson's disease monitoring, and Le Guennec et al. [4] introduced noise and magnitude changes, and warped or cropped time slices. Krall et al. [17], in turn, augmented electroencephalographic data using temporal and spatial or rotational distortions. The importance of time-series augmentation for healthcare applications was considered by Dahmen and Cook [18]. In that work, nested sequences obtained with hidden Markov models and regression models were used. Ruiz et al. [19] proposed a compositional synthetic signature generation from shape primitives for off-line handwritten signature verification.
In contrast to the referred methods for time-series augmentation, the method introduced in this paper, SuboPtimAl Warped time-series geNEratoR (SPAWNER), creates new time-series in the warped space between suboptimally aligned and merged time-series and, as shown in the following sections, creates more diverse time-series and extends class boundaries.

Proposed Method
In the considered problem, a time-series $X = [x_1, x_2, \ldots, x_L]$ is an ordered collection of $M$-dimensional values of length $L$. Hence, each $x_l \in \mathbb{R}^{M}$, $l = 1, 2, \ldots, L$, and $X \in \mathbb{R}^{L \times M}$.
A dataset contains $N$ time-series, each of which can be of a different length $L_n$, $n = 1, 2, \ldots, N$, so that $X^n \in \mathbb{R}^{L_n \times M}$. Then, a dataset $U = \{(X^1, C_1), (X^2, C_2), \ldots, (X^N, C_N)\}$ is a collection of time-series and their labels, $C \in \{1, \ldots, K\}$, where $K$ is the number of classes. Consequently, a classifier is trained on $U$ and assigns a label $C$ to a previously unseen time-series $Y \in \mathbb{R}^{L \times M}$.
In the proposed time-series augmentation method, SPAWNER, two input sequences, $X^1$ and $X^2$, are aligned based on the DTW algorithm [11]. DTW is used as it aligns sensor data such as voice, action, or other measurements [1,20]. However, for image or text data, other alignment methods can be more suitable [21,22]. In the introduced method, given $X^1 = [x^1_1, x^1_2, \ldots, x^1_i, \ldots, x^1_{L_1}]$ and $X^2 = [x^2_1, x^2_2, \ldots, x^2_j, \ldots, x^2_{L_2}]$, DTW determines the optimal sequence $W = [w_1, w_2, \ldots, w_P]$, called the warping path, in which $P$ is the length of the path, the $p$-th element is $w_p = (i, j)$, and $\max(L_1, L_2) \le P < L_1 + L_2$. For this purpose, an $L_1 \times L_2$ matrix $D$ is computed. For all $(i, j)$, it contains the costs, or distances, between the time-series $[x^1_1, \ldots, x^1_i]$ and $[x^2_1, \ldots, x^2_j]$. To determine the optimal alignment between $X^1$ and $X^2$, the path $W^*$ that minimizes the total cumulative distance is found by calculating $D(i, j) = d(x^1_i, x^2_j) + \min\{D(i-1, j-1), D(i-1, j), D(i, j-1)\}$, where $d(\cdot, \cdot)$ denotes the local distance between two data points; this is the standard DTW recurrence. Each warping path must satisfy the following three conditions: (1) the boundary condition forces the path to start at the beginning of each time-series, $w_1 = (1, 1)$, and finish at their ends, $w_P = (L_1, L_2)$; (2) the monotonicity condition requires that the indices $i$ and $j$ never decrease along the path; and (3) the continuity condition restricts each step of the path to adjacent elements of $D$.
In addition, the path is commonly constrained by a warping window. Specifically, the warping window $\xi$ limits the elements of $X^1$ and $X^2$ that can be aligned, i.e., $|i - j| \le \xi$ for every $w_p = (i, j)$. Typically, $\xi$ is 10 percent of $\max(L_1, L_2)$ [11]. Since DTW is used for the distance computation between time-series, the value $D(L_1, L_2)$ is returned.
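The windowed alignment described above can be illustrated with a short Python sketch. It is not the authors' Matlab implementation: the function name `dtw`, the Euclidean local distance, and the widening of the window to at least $|L_1 - L_2|$ (so that the end point of the path stays reachable for sequences of very different lengths) are assumptions made for this example.

```python
import math


def dtw(x1, x2, window_frac=0.1):
    """Return (distance, path) for two sequences of M-dimensional points.

    The path is a list of 1-based (i, j) index pairs from (1, 1) to (L1, L2).
    """
    n, m = len(x1), len(x2)
    # Window: 10% of the longer sequence; widened to |n - m| (assumption)
    # so that the boundary condition can always be satisfied.
    xi = max(math.ceil(window_frac * max(n, m)), abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - xi), min(m, i + xi) + 1):
            # Local cost: Euclidean distance between points (assumption).
            cost = math.dist(x1[i - 1], x2[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from (n, m) to (1, 1) along the cheapest predecessors.
    path, i, j = [(n, m)], n, m
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
        path.append((i, j))
    path.reverse()
    return D[n][m], path
```

Each point is expected to be a tuple or list of $M$ numbers, so a univariate series is written as a list of one-element tuples.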
In this work, to create a new time-series given two input examples $X^1$ and $X^2$, an additional constraint on the warping path is introduced, forcing it to contain the element $w_p = (R_1, R_2)$, where $R_1 = \lceil r L_1 \rceil$, $R_2 = \lceil r L_2 \rceil$, $r$ is a single uniformly distributed random number in the interval $(0, 1)$, and $\lceil \cdot \rceil$ is the ceiling operator.
However, to avoid calculating the full $L_1 \times L_2$ matrix $D$ and to reduce the computational cost, two smaller matrices, $D_1$ and $D_2$, are computed for the parts of the sequences aligned before and after the element $(R_1, R_2)$, respectively. Consequently, warping paths $W^*_1$ and $W^*_2$ are obtained. The paths are optimal with respect to the introduced constraint and the matrices $D_1$ and $D_2$, but after their concatenation the obtained path is suboptimal compared with the path that can be obtained for $D$.
Furthermore, DTW aligns both sequences using a relatively large $\xi$, whereas in the proposed method the windows $\xi_1$ and $\xi_2$ used to determine $W^*_1$ and $W^*_2$ depend on $0.1 \cdot \max(R_1, R_2)$ and $0.1 \cdot \max(|L_1 - R_1|, |L_2 - R_2|)$, respectively. These values reduce the flexibility of the path from the perspective of the matrix $D$ as well as of the concatenated paths $W^*_1$ and $W^*_2$. Once the concatenation of the paths, $W^*_{1,2}$, is obtained, the method aligns $X^1$ to $X^2$, generating sequences $X'^1$ and $X'^2$ of the length of $W^*_{1,2}$. In SPAWNER, to produce a new time-series $X^*$, $X'^1$ and $X'^2$ are merged. Here, their average was primarily considered. However, as observed for many multivariate time-series examples, in some dimensions values may not change much in time, and introducing slight changes around the average facilitates the application of classifiers that depend on signal variances [24]. Therefore, a value $x^* \in X^*$ is a random number drawn from a normal distribution with a small $\sigma$, so that the drawn values are close to the average of the corresponding aligned values.
To present the output of the method, two time-series of each class from the ECG dataset [25] are shown for an exemplary feature dimension in Figure 1a, along with the time-series obtained with SPAWNER. The classes are indicated by colors. The figure also contains augmented data for all input sequences (Figure 1b). The method creates one time-series for each input pair that belongs to the same class.
The method was implemented in Matlab R2018b using the Signal Processing Toolbox on an i7-6700K 4 GHz CPU with 64 GB RAM. Its code and an exemplary application are available at http://marosz.kia.prz.edu.pl/SPAWNER.html.
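The generation steps of this section (selection of the forced element, two constrained alignments, concatenation of the paths, and noisy averaging) can be sketched in Python as follows. This is a minimal illustration rather than the published Matlab code. The function names, the Euclidean local distance, the widening of each window to the length difference, the choice of letting both sub-alignments share the element $(R_1, R_2)$, and the constant value of `sigma` are all assumptions made for this example.

```python
import math
import random


def dtw_path(a, b, window_frac=0.1):
    """Windowed DTW path between a and b as 1-based (i, j) pairs."""
    n, m = len(a), len(b)
    # Assumption: window widened to |n - m| so that (n, m) is reachable.
    xi = max(math.ceil(window_frac * max(n, m)), abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - xi), min(m, i + xi) + 1):
            step = min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
            D[i][j] = math.dist(a[i - 1], b[j - 1]) + step
    path, i, j = [(n, m)], n, m
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
        path.append((i, j))
    return path[::-1]


def suboptimal_path(x1, x2, r):
    """Concatenate two sub-alignments that both pass through (R1, R2)."""
    r1 = max(1, math.ceil(r * len(x1)))
    r2 = max(1, math.ceil(r * len(x2)))
    head = dtw_path(x1[:r1], x2[:r2])          # ends at (R1, R2)
    tail = dtw_path(x1[r1 - 1:], x2[r2 - 1:])  # starts at (R1, R2) (assumption)
    # Drop the duplicated element and shift the tail to global indices.
    return head + [(i + r1 - 1, j + r2 - 1) for i, j in tail[1:]]


def spawner_pair(x1, x2, rng, sigma=0.05):
    """Create one synthetic series from two same-class series.

    sigma is a hypothetical constant; the text only states that it is small.
    """
    path = suboptimal_path(x1, x2, rng.random())
    new_series = []
    for i, j in path:
        p, q = x1[i - 1], x2[j - 1]
        # Draw each value around the average of the aligned values.
        new_series.append(tuple(rng.gauss(0.5 * (u + v), sigma)
                                for u, v in zip(p, q)))
    return new_series
```

A call such as `spawner_pair(x1, x2, random.Random(0))` returns a series whose length equals the length of the concatenated path, so it lies between $\max(L_1, L_2)$ and $L_1 + L_2 - 1$.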

Datasets
In experiments, ten multivariate time-series benchmark datasets are employed. They represent a variety of collected sensor data, including challenging time-series from activity recognition datasets with common nonlinear distortions in time scale. The computation time required by some of the applied classifiers to process the augmented datasets on the used processing unit was also considered when selecting the benchmarks. The benchmarks are described in Table 1. Most of them contain time-series of different lengths (cf. Figure 1), the number of classes varies from 2 to 95, and the number of attributes from 2 to 75. Furthermore, some of them contain a relatively small number of examples (e.g., Kick vs. Punch, Occupancy, or AREM).

Time-Series Augmentation Methods
The proposed method is experimentally compared with three representative approaches: DBA [9], window slicing (WS), and window warping (WW) [4]. The Matlab source code of DBA is publicly available, while the remaining methods were implemented by the authors of this paper based on the findings published by Le Guennec et al. [4]. Among the compared approaches, DBA averages time-series and takes into account nonlinear transformations in time scale, reflected by different lengths of sequences in a dataset. In experiments, DBA provides average time-series for a randomly selected subset of the training data samples until the targeted size of the dataset is achieved. In WW, a randomly selected part of a time-series covering 10% of its length is warped [4], while in WS, a slice is removed from the sequence. Both data augmentation techniques are simple, but considering the simplicity of the operators typically used for image augmentation in computer vision applications, their performance should be examined. For a given dataset, its time-series are selected and processed until the required size of the output dataset is met. The input examples are added to the generated examples. Since SPAWNER generates new data from a pair of input time-series, the number of examples generated by the other methods is based on the size of the output dataset produced by SPAWNER to ensure a fair comparison of approaches. The number of data samples per dataset used in the evaluation is also given in Table 1.
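The two simple baselines can be sketched in Python as follows. The sketch follows the description given here (WS removes a slice, WW warps a window covering 10% of the series) and is not the code used in the experiments. The function names, the 10% slice size for WS, and the warping by a factor of 2 or 1/2 are assumptions made for this example.

```python
import random


def window_slice(series, rng, slice_frac=0.1):
    """Remove one randomly placed contiguous slice from the series.

    slice_frac is a hypothetical parameter; the slice size is not
    stated in the text.
    """
    n = len(series)
    k = max(1, int(round(slice_frac * n)))
    start = rng.randrange(0, n - k + 1)
    return series[:start] + series[start + k:]


def window_warp(series, rng, window_frac=0.1):
    """Stretch or shrink one randomly placed window of the series."""
    n = len(series)
    k = max(2, int(round(window_frac * n)))
    start = rng.randrange(0, n - k + 1)
    window = series[start:start + k]
    if rng.random() < 0.5:
        # Slow down: repeat every point of the window (factor 2, assumption).
        warped = [p for p in window for _ in range(2)]
    else:
        # Speed up: keep every second point (factor 1/2, assumption).
        warped = window[::2]
    return series[:start] + warped + series[start + k:]
```

Both functions expect a list of points and return a new list, so the length of the output differs from the length of the input, as it does for the sequences in the benchmark datasets.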

Time-Series Classifiers
The compared data augmentation methods generate training data for three time-series classifiers: nearest neighbor (NN) with DTW distance (NN-DTW), LogDet divergence-based metric learning with triplet constraints (LDMLT) [28], and the nearest neighbor with the recently introduced time-series cluster kernel (NN-TCK) [24]. These three classifiers are employed in tests due to their popularity and state-of-the-art performance. Also, their Matlab source code is publicly available and their efficient classification of large time-series datasets does not require a specific hardware setting [29]. In the implementation of the NN-DTW classifier, the size of the warping window is equal to 10% of the length of the longer sequence from the pair of sequences for which the DTW distance is calculated. Then, the distance is normalized by the sum of time-series' distances to reduce the impact of short time-series on the calculated distance. Since TCK cannot provide the kernel for time-series of different lengths, in experiments with its use, multidimensional time-series are transformed into time-series of the same length according to the approach of Wang et al. [30], also employed by Mikalsen et al. in tests with TCK [24]. The length of the resulting time-series is equal to $\lceil T_{max} / \lceil T_{max}/25 \rceil \rceil$, where $T_{max}$ is the length of the longest time-series in a dataset. In experiments, the time-series are standardized to zero mean and unit standard deviation [24].
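The common length used for the TCK experiments can be computed with a few lines of Python. The sketch assumes that both divisions in the expression are rounded up, as in the approach of Wang et al. [30]; the function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer division rounded up, without floating-point error."""
    return -(-a // b)


def common_length(t_max, divisor=25):
    """Length to which all series are transformed, given the longest length."""
    return ceil_div(t_max, ceil_div(t_max, divisor))
```

For example, a dataset whose longest time-series has 136 points gives an inner value of 6 and a common length of 23, whereas any dataset with $T_{max} \le 25$ keeps the length $T_{max}$.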
It is worth noting that the classification performance of LDMLT and NN-TCK with augmented time-series is reported for the first time in this paper, whereas most studies on such augmentation consider only one classifier.

Time-Series Classification with Augmented Datasets
In the evaluation, augmentation methods are compared using average classification accuracy and the average and geometric average ranks for methods and classifiers.
The accuracy is calculated as the number of correct predictions of the classifier divided by the total number of testing samples in the dataset. The division of a dataset into training and testing subsets is shown in Table 1. The average accuracy is calculated for 50 runs of the data augmentation methods in experiments with NN-DTW and for 10 runs with the remaining classifiers (NN-TCK, LDMLT) due to their high computational demand. To compare augmentation methods, ranks from 1 to 5 are used, where a lower rank is assigned to the method with greater accuracy. The results are presented in Table 2.
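The evaluation measures can be illustrated with the following Python sketch. It is only an assumption of how the ranks might be aggregated: the function names are hypothetical, and tied methods share the average of their positions, which the text does not specify.

```python
import math


def accuracy(true_labels, predicted_labels):
    """Fraction of testing samples that were classified correctly."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def rank_methods(accuracies):
    """Map each method to its rank on one dataset (1 = greatest accuracy).

    Tied methods share the average of their positions (assumption).
    """
    ordered = sorted(accuracies, key=accuracies.get, reverse=True)
    ranks, pos = {}, 0
    while pos < len(ordered):
        end = pos
        while (end + 1 < len(ordered)
               and accuracies[ordered[end + 1]] == accuracies[ordered[pos]]):
            end += 1
        for method in ordered[pos:end + 1]:
            ranks[method] = (pos + end + 2) / 2
        pos = end + 1
    return ranks


def summarize(per_dataset):
    """Return method -> (average rank, geometric average rank)."""
    all_ranks = [rank_methods(d) for d in per_dataset]
    summary = {}
    for method in all_ranks[0]:
        rs = [r[method] for r in all_ranks]
        arithmetic = sum(rs) / len(rs)
        geometric = math.exp(sum(math.log(r) for r in rs) / len(rs))
        summary[method] = (arithmetic, geometric)
    return summary
```

Here `per_dataset` is a list with one dictionary per benchmark, each mapping a method name to its average accuracy on that benchmark.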
The experimental comparison reveals that all three classifiers display a variety of performances on the benchmark datasets in a case in which data augmentation methods are not used. Specifically, each of them is better than the remaining two for some benchmarks. Here, NN-DTW outperforms NN-TCK and LDMLT on five datasets (Gesture phase, EEG, Movement AAL, Ozone, and LIBRAS), NN-TCK exhibits the best performance on three datasets (ECG, AREM, and Occupancy), and LDMLT on two (AUSLAN and Kick vs. Punch). The compared augmentation methods can improve the classification accuracy four times for NN-DTW and two times for NN-TCK. Apart from the improvement of the best-obtained results for the classifiers, the augmentation techniques also allowed them to improve their remaining results in many cases. The introduced SPAWNER outperforms other such techniques in terms of the number of datasets for which the accuracy is improved. Also, it is better ranked.
Interestingly, the compared methods seem less influential on the results of the NN-TCK classifier, due to the way it processes the time-series (i.e., the interpolation to shorter sequences). Taking into account the results for the datasets, it can be seen that the performance of a classifier for Kick vs.

Visualization
The accuracy of the augmentation methods reflects the way the synthetic samples appear in the space among input samples. This can be shown using Multi-Dimensional Scaling (MDS) [31] that reduces the data dimensionality and places samples in new space, using pairwise dissimilarities between them. However, the pairwise DTW distances cannot be used in this case, since DTW is not a metric. Therefore, in this work, Kruskal's nonmetric MDS was employed that approximates the ranks of the dissimilarities. Specifically, the obtained visualization contains points and distances between them, approximating a monotonic transformation of the samples' dissimilarities [31]. Hence, it can be used to interpret the relationship between samples of time-series in a dataset.
The scatter plots containing MDS representations of the first 15 and 25 sequences and the resulting augmented samples from the AUSLAN dataset are shown in Figure 2. The number of considered sequences is low to facilitate the visualization. The case in which more samples are used (i.e., Figure 2e-h) is presented to highlight the difficulty of the classification task in which more similar samples from different classes are taken into consideration. Larger or smaller dissimilarities, reflected by DTW distances, correspond to larger or smaller Euclidean distances between points in the plots. As shown, WW and WS tend to create new samples close to the original time-series, as they share most of their parts. This can be beneficial in tight spaces between classes (Figure 2e,f), where samples from different classes are close to each other and may confuse a classifier. However, for some datasets, they are unable to create samples that would use the space between original sequences. Here, DBA and SPAWNER put new samples in the space between input sequences, often creating new boundaries that can be closer in the nearest neighbor sense than those observed for the original data. Also, the diversity of the time-series obtained with these methods is larger in comparison to the WW and WS methods. In contrast to DBA, the introduced SPAWNER does not concentrate samples in the center of the cluster for a class (cf. Figure 2c,d). In such cases, the generation of new samples does not bring an additional advantage, since input samples cover the area. Some input sequences are closer to other classes than to the rest of the samples from their class. Here, SPAWNER creates samples that connect the outlier sequence to its class, DBA seems to use the sequence to create a new one that slightly shifts the class boundary in the direction towards the outlier, and the warping and slicing techniques create sequences in its neighborhood.
This can be seen for the horizontally placed examples of the "green" class shown in Figure 2a.
As shown, the addition of more examples per class improves the classification performance, but it locally depends on the randomly selected examples. Interestingly, the accuracy obtained with a small number of input samples augmented by SPAWNER and DBA for the AREM dataset is greater than the value obtained for the entire dataset, due to close boundaries between classes present in the entire (i.e., original) dataset or a large number of outliers. Also, in almost all experiments with the augmentation, the accuracy is improved. This means that for small datasets the generation of new samples is justified and should be employed. Since the comparison based on the data presented in Figure 3 can be difficult due to the number of conducted experiments, the approaches are compared using the average accuracy per dataset. The results are shown in Table 3.
Table 3. Experimental comparison of augmentation methods using the NN-DTW classifier and a small number of examples per class. Average accuracy for a dataset is reported (see Figure 3). The greatest value used in the rank is written in bold and rounded to two significant figures.
From the results of experiments with a small number of samples per benchmark, it can be seen that the introduced SPAWNER clearly outperforms other techniques, as it allowed the classifier to obtain the best result for a dataset five times, and is ranked best across datasets. The values for the case in which augmentation is not used (cf. the second column in the table) also confirm that the augmentation is beneficial if only a few examples are available.

Complexity Analysis
To compare the complexity of the time-series augmentation methods, their time and memory consumption on a representative dataset is evaluated. In experiments, the LIBRAS dataset is used since it contains a reasonably large number of classes. The average processing time and memory requirements are determined on the basis of 10 runs of the methods. The results shown in Table 4 indicate that WW and WS are much less complex than DBA and SPAWNER. However, since SPAWNER uses DTW to align pairs of time-series, whereas DBA aligns all selected time-series in a DTW-like manner, SPAWNER requires fewer resources and generates augmented sequences faster than DBA.

Limitations of the Method
Since the proposed method constrains the warping path while aligning two sequences to produce a new time-series, the case in which the warping path is not constrained is investigated. The average accuracy of the NN-DTW classifier calculated for 50 runs of SPAWNER for both cases is reported in Table 5. This classifier is used in tests since the time-series generated by the method influence its results the most. In the table, maximal values obtained for the constrained method (i.e., suboptimal) are also reported. As reported, the classifier with the proposed method of constraining the path outperforms the option in which such a constraint is not used for six datasets. For one dataset, the results were not changed, while in three cases the constraint resulted in a drop in the accuracy. Taking into account the best results obtained by the introduced approach, its application is justified in all cases. This means that the observed results for the aforementioned three datasets can be obtained by the classifier with the applied method using the constrained warping paths. Consequently, it can be assumed that there may be a way of aligning the time-series to provide satisfactory results for each benchmark dataset.
Since the augmentation proposed in this paper does not include a step that rejects input outliers, which may decrease the quality of the augmented time-series, a time-series selection is introduced that allows a given input time-series to create output sequences only with some of its neighbors. The average accuracy for 50 runs of the NN-DTW classifier for all datasets, with the number of neighbors ranging from 10% to 100% of the input sequences, is shown in Figure 4. As shown, for all datasets except EEG, which does not respond well to the augmented data, access to more neighbors while creating new samples from input pairs is beneficial to the performance of the classifier. Consequently, it seems that the datasets do not contain outliers that decrease its performance, and more distant time-series should be involved in the time-series augmentation.
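The neighbor-based selection can be sketched in Python as follows. The text does not give the details of the selection, so the sketch rests on several assumptions: the function name is hypothetical, the fraction is taken over the other sequences of the same class, and a pair is kept if either sequence is among the nearest neighbors of the other.

```python
import math


def neighbor_pairs(dist, frac):
    """Select the pairs of same-class series that may be merged.

    dist is a symmetric matrix (list of lists) of pairwise dissimilarities,
    e.g., DTW distances; frac is the fraction of neighbors, from 0.1 to 1.0.
    Returns a sorted list of index pairs (a, b) with a < b.
    """
    n = len(dist)
    k = max(1, math.ceil(frac * (n - 1)))
    pairs = set()
    for a in range(n):
        others = sorted((b for b in range(n) if b != a),
                        key=lambda b: dist[a][b])
        # Keep only the k nearest neighbors of the series a.
        for b in others[:k]:
            pairs.add((min(a, b), max(a, b)))
    return sorted(pairs)
```

With `frac = 1.0` every pair of sequences from the class is used, which corresponds to the setting without the selection step.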

Conclusions
In this work, the problem of generating artificial examples to augment time-series datasets is considered. A new method that aligns the input time-series and constrains the warping path is proposed. The proposed modification of the warping path while aligning the sequences allows producing new examples that resemble the original data. It is shown that the produced synthetic examples are better utilized by three time-series classifiers than those of other related methods, improving their classification accuracy on the demanding ten benchmark datasets. Furthermore, the proposed method creates more diverse examples and moves class boundaries.
In future work, deep learning architectures for time-series classification [32], data augmentation near class boundaries, or augmentation addressing a specific domain (e.g., captured human actions or sign language gestures [33,34]) will be considered.
To facilitate the reproducibility of the presented findings, the code of the introduced SPAWNER is available at http://marosz.kia.prz.edu.pl/SPAWNER.html.