Abstract
Dynamic time warping under limited warping path length (LDTW) is a state-of-the-art time series similarity evaluation method. However, it suffers from high space-time complexity, which makes some large-scale series evaluations infeasible. In this paper, an alternating matrix with a concise structure is proposed to replace the complex three-dimensional matrix in LDTW and reduce the high complexity. Furthermore, an evolutionary chain tree is proposed to represent the warping paths and ensure an effective retrieval of the optimal one. Experiments using the benchmark platform offered by the University of California-Riverside show that our method uses 1.33% of the space and 82.7% of the time used by LDTW on average, which demonstrates the efficiency of the proposed method.
1. Introduction
As a common data type, a time series is a sequence of discrete data obtained from a target at a fixed frequency over a period. A fundamental task for time series is to measure the similarity between two given series, which is critical to downstream work such as classification [,,,,], clustering [,,,,] and pattern recognition [,,,]. The dynamic time warping (DTW) [] algorithm and its variants [,,] are competent in similarity evaluation [].
Given series X and Y, if they are of the same length N, then the similarity S could be described as Expression (1).

$$S(X,Y)=\sum_{i=1}^{N} d(x_i,y_i) \tag{1}$$

where $d(\cdot)$ stands for the Euclidean distance, and $x_i$ and $y_i$ are the ith node of X and Y, respectively. However, more generally, the lengths of X and Y may not be the same. A key feature of DTW is that it can deal with two series of different lengths.
Let N and M be the lengths of X and Y, respectively; DTW finds the similarity by maintaining a two-dimensional cumulative distance matrix (CDM) D as shown in Expression (2). The algorithm calculates each element of D in row-major order (i.e., from left to right, from top to bottom), starting from $D_{1,1}$ and ending at $D_{N,M}$, according to Expression (3).

$$D=\begin{bmatrix} D_{1,1} & \cdots & D_{1,M}\\ \vdots & \ddots & \vdots\\ D_{N,1} & \cdots & D_{N,M}\end{bmatrix} \tag{2}$$

$$D_{i,j}=d(x_i,y_j)+\min\{D_{i-1,j},\,D_{i,j-1},\,D_{i-1,j-1}\} \tag{3}$$

where $d(x_i,y_j)$ is the distance between two nodes. After the traversal, $D_{N,M}$ will hold the value of the similarity. The matching results (in other words, the optimal warping path) could be determined according to the CDM.
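As a minimal sketch of the recurrence in Expression (3), the following Python function fills the CDM in row-major order. The function name `dtw`, the absolute difference used as the node distance for one-dimensional series, and the treatment of out-of-range neighbours as infinity are assumptions made for illustration; they are not taken from the implementation evaluated in this paper.

```python
import math


def dtw(x, y):
    """Fill the N x M cumulative distance matrix (CDM) in row-major order.

    Returns the similarity (last CDM element) and the CDM itself.
    """
    n, m = len(x), len(y)
    D = [[math.inf] * m for _ in range(n)]
    for i in range(n):            # top to bottom
        for j in range(m):        # left to right
            d = abs(x[i] - y[j])  # assumed node distance for 1-D series
            if i == 0 and j == 0:
                best = 0          # the first element has no predecessor
            else:
                best = min(
                    D[i - 1][j] if i > 0 else math.inf,
                    D[i][j - 1] if j > 0 else math.inf,
                    D[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
                )
            D[i][j] = d + best
    return D[n - 1][m - 1], D


similarity, cdm = dtw([1, 3], [1, 2, 3])
```

In this small case the two series have different lengths, and the node with value 2 in the second series has to be matched with a neighbour at a cost of 1.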
For the evaluation of series with different lengths, as depicted in Figure 1, DTW aims to find the optimal alignment between X and Y [], and a node in X may be matched with multiple nodes in Y (and vice versa). However, when too many nodes (marked within a green dotted circle in Figure 1) are matched with the same node (marked within a red solid circle in Figure 1), the alignment becomes unreasonable for real data. This is the well-known pathological alignment problem of DTW.
Figure 1.
Demonstration of the pathological alignment problem of DTW, where one node in X (marked with the red solid circle) is matched with too many nodes in Y (marked within the green dotted circle).
To address this problem, Zhang et al. [] presented a state-of-the-art method named dynamic time warping under limited warping path length (LDTW). By limiting the length of the warping path in a third dimension (see Figure 2), the pathological alignment problem could be relieved. As a result, LDTW achieves higher accuracy than other variants [,,,] on the benchmark platform offered by the University of California-Riverside (UCR) []. However, it also leads to much higher space and time consumption.
Figure 2.
Comparison of the space and the amount of computation between LDTW and DTW (tested on the UCR dataset SyntheticControl). The biggest cube is the CDM of LDTW, while the bottom part is the CDM of DTW.
To reduce the complexity of LDTW, an alternating matrix whose size is much smaller than the three-dimensional CDM used in LDTW is presented, and an evolutionary tree is introduced to represent the warping paths as well. The main contributions of this paper are twofold:
- (1) A two-channel matrix with an alternating scheme is proposed for similarity calculation.
- (2) A chain tree with an evolutionary scheme is proposed to find the optimal warping path simultaneously with the similarity calculation process.
2. Preliminary
2.1. DTW
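DTW is a dynamic programming algorithm for calculating the similarity of two sequences, especially those of different lengths []. Given time series X and Y defined by Expression (4):

$$X=\{x_1,x_2,\ldots,x_N\},\quad Y=\{y_1,y_2,\ldots,y_M\} \tag{4}$$

where N and M are the lengths of X and Y, respectively. If P(X,Y) defined by Expression (5) is a warping path of X and Y, each path node could be defined by a pair of nodes of X and Y as shown in Expression (6).

$$P(X,Y)=\{p_1,p_2,\ldots,p_s,\ldots,p_L\} \tag{5}$$

$$p_s=(x_i,y_j) \tag{6}$$

In addition, the warping path also abides by the following restraints.
- (1) $p_1=(x_1,y_1)$, $p_L=(x_N,y_M)$;
- (2) if $p_s=(x_i,y_j)$ and $p_{s+1}=(x_{i'},y_{j'})$, then $0\le i'-i\le 1$, $0\le j'-j\le 1$.
Let $\mathcal{P}$ denote all the warping paths of X and Y; DTW aims to find an optimal one that possesses the minimum cumulative distance, as shown in Expression (7).

$$DTW(X,Y)=\min_{P\in\mathcal{P}}\sum_{(x_i,y_j)\in P} d(x_i,y_j) \tag{7}$$

where $d(x_i,y_j)$ is the distance between two nodes $x_i$ and $y_j$ among a warping path P(X, Y).
The problem could be solved in a dynamic programming way. Namely, let $X_{s:e}$ (or $Y_{s:e}$) denote the subset of X (or Y) that starts from the sth node and ends at the eth node; the cumulative distance of $(X_{1:i},Y_{1:j})$ consists of the node distance $d(x_i,y_j)$ and the minimum value among $D(X_{1:i-1},Y_{1:j})$, $D(X_{1:i},Y_{1:j-1})$ and $D(X_{1:i-1},Y_{1:j-1})$, as described in Expression (8).

$$D(X_{1:i},Y_{1:j})=d(x_i,y_j)+\min\{D(X_{1:i-1},Y_{1:j}),\,D(X_{1:i},Y_{1:j-1}),\,D(X_{1:i-1},Y_{1:j-1})\} \tag{8}$$

where $D(\cdot)$ indicates the cumulative distance of a path. This is the reason for DTW to maintain the CDM and calculate according to Expression (3), which is another version of Expression (8).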
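The restraints on a warping path and the cumulative distance of a given path can be checked mechanically. The sketch below is illustrative only and assumes the standard boundary and step conditions of DTW: the helper names `is_warping_path` and `path_cost`, the 1-based index pairs, the absolute difference as the node distance, and the extra requirement that two consecutive path nodes differ are all assumptions.

```python
def is_warping_path(path, n, m):
    """Check a list of 1-based (i, j) index pairs against the two restraints."""
    # Restraint (1): the path starts at (1, 1) and ends at (n, m).
    if not path or path[0] != (1, 1) or path[-1] != (n, m):
        return False
    # Restraint (2): each index advances by 0 or 1 between neighbours.
    for (i, j), (i2, j2) in zip(path, path[1:]):
        di, dj = i2 - i, j2 - j
        if not (0 <= di <= 1 and 0 <= dj <= 1):
            return False
        if di == 0 and dj == 0:   # assumed: a path node is not repeated
            return False
    return True


def path_cost(path, x, y):
    """Cumulative distance of a path, i.e. the sum minimized over all paths."""
    return sum(abs(x[i - 1] - y[j - 1]) for i, j in path)


x, y = [1, 2, 3], [1, 2, 2, 3]
good = [(1, 1), (2, 2), (2, 3), (3, 4)]
bad = [(1, 1), (3, 4)]            # jumps over nodes, so restraint (2) fails
```

Among all paths accepted by `is_warping_path`, DTW returns the one for which `path_cost` is smallest.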
2.2. LDTW
To ease the pathological alignment problem, LDTW takes the warping path length into consideration in addition to the series length. This extends the original two-dimensional CDM of size $N\times M$ to a three-dimensional matrix of size $N\times M\times L_{UB}$, where N and M are the lengths of the two series and $L_{UB}$ is the upper bound of the warping path length, the range of which is $[\max(N,M),\,N+M-1]$ under the rule of DTW (see Ref. [] for the details about $L_{UB}$). For example, Figure 2 shows a case that applies LDTW to the UCR dataset SyntheticControl, where N = M = 60 and $L_{UB}$ = 79. The space used by LDTW is a three-dimensional matrix of size $60\times 60\times 79$. By contrast, DTW only uses the bottom of the cube. The elements that participate in the calculation are colored in the figure as well; there are 18490 in total for LDTW and 3600 for DTW. This shows that, compared to DTW, the time and space complexity of LDTW is greatly increased.
In this paper, a matrix of size $2\times M\times L_{UB}$ is used with an alternating scheme to replace the above three-dimensional CDM, which reduces the cost of time and space dramatically.
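The size difference can be made concrete with a short calculation for the SyntheticControl case. The sketch assumes that the alternating matrix holds two channels of $M\times L_{UB}$ elements each; the function names are hypothetical and only element counts are compared, not measured memory.

```python
def ldtw_cdm_elements(n, m, l_ub):
    """Number of elements in the three-dimensional CDM of LDTW."""
    return n * m * l_ub


def am_elements(m, l_ub, channels=2):
    """Number of elements in a two-channel matrix (assumed layout)."""
    return channels * m * l_ub


n = m = 60
l_ub = 79
full = ldtw_cdm_elements(n, m, l_ub)   # 60 * 60 * 79
am = am_elements(m, l_ub)              # 2 * 60 * 79
ratio = am / full                      # equals 2 / n
```

Under this assumption the element count shrinks by a factor of N/2, so the saving grows with the series length. The 1.33% average reported in the experiments is a measured memory figure over datasets of different lengths and is not derived from this count.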
3. The Proposed Method
In general, DTW and its variants have two goals: finding (1) the similarity and (2) the optimal warping path of two given time series. This section presents our solution to each in turn.
3.1. The Alternating Matrix Based Similarity Calculation
The primary innovation of the proposed method is the usage of a two-channel matrix with an alternating scheme, which can replace the three-dimensional CDM of LDTW and save a lot of computer memory.
As illustrated in Figure 3, the proposed matrix has two channels indicated by $D_{pre}$ and $D_{cur}$, respectively. It could be seen as a subset of the three-dimensional CDM that travels over the CDM space step by step during the similarity calculation process. In each step, data in $D_{pre}$ stand for the calculated result of the previous step and are reserved to participate in the calculation of the current step, which happens in $D_{cur}$. The last thing to accomplish in each step is to alternate the roles of the two channels; in other words, $D_{pre}$ (or $D_{cur}$) in Step $i$ will be $D_{cur}$ (or $D_{pre}$) in Step $i+1$, which is the main reason why we call our matrix the alternating matrix (AM).
Figure 3.
The proposed two-channel alternating matrix within the CDM space.
The calculation workflow can be seen in Figure 4. The system takes the above-mentioned X, Y and $L_{UB}$ as input and outputs the similarity S, which equals a specific element of the AM. The core step is the update of the AM, which is described in Algorithm 1. In the beginning, the algorithm travels over Y and the warping path dimension as shown from Step 1 to Step 4, where minS and maxS are the bounds of the path length range calculated by functions named MinStep() and MaxStep(), respectively. Readers can find the calculation details in Ref. []. Step 5 specifies how an element $D_{cur}[j][s]$, as shown in Figure 3, is determined by the pre-calculated $D_{pre}[j][s-1]$, $D_{pre}[j-1][s-1]$ and $D_{cur}[j-1][s-1]$. Channel $D_{pre}$ will be reset in Step 9 before the alternating process, for it will become $D_{cur}$ in the next round of iteration. The iteration stops when i becomes larger than N.
| Algorithm 1: AM Update | |
| Input: X, Y, N, M, D, i, cur, pre, LUB | |
| Output: updated D | |
| 1 | for j from 1 to M do |
| 2 | minS←MinStep(i, j), maxS←MaxStep(i, j, N, M, LUB) |
| 3 | if minS < maxS do |
| 4 | for s from minS to maxS do |
| 5 | Dcur [j][s]←d(xi, yj) + min{Dpre [j][s−1], Dpre [j−1][s−1], Dcur [j−1][s−1]} |
| 6 | end for |
| 7 | end if |
| 8 | end for |
| 9 | reset Dpre |
Figure 4.
The workflow of the proposed similarity calculation process.
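The alternating scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation evaluated in this paper: the path length s counts path nodes, `min_step` and `max_step` are plausible stand-ins for MinStep() and MaxStep() (whose exact definitions are given in the LDTW reference), the range of s is treated as inclusive, the node distance is the absolute difference, and the similarity is taken as the smallest value stored for the last node over all admissible path lengths.

```python
import math


def min_step(i, j):
    # Fewest path nodes needed to reach (i, j) from (1, 1) (assumed definition).
    return max(i, j)


def max_step(i, j, n, m, l_ub):
    # Most path nodes usable at (i, j) while (n, m) is still reachable
    # within l_ub nodes (assumed definition).
    return min(i + j - 1, l_ub - max(n - i, m - j))


def ldtw_am(x, y, l_ub):
    n, m = len(x), len(y)
    # Two channels of (m + 1) x (l_ub + 1) elements; index 0 is padding.
    pre = [[math.inf] * (l_ub + 1) for _ in range(m + 1)]
    cur = [[math.inf] * (l_ub + 1) for _ in range(m + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            lo, hi = min_step(i, j), max_step(i, j, n, m, l_ub)
            for s in range(lo, hi + 1):
                if i == 1 and j == 1:
                    cur[j][s] = d
                else:
                    cur[j][s] = d + min(pre[j][s - 1],
                                        pre[j - 1][s - 1],
                                        cur[j - 1][s - 1])
        # Reset the old channel, then alternate the roles of the two channels.
        for row in pre:
            row[:] = [math.inf] * (l_ub + 1)
        pre, cur = cur, pre
    # After the last alternation the final row of the CDM is held in `pre`.
    return min(pre[m][1:])


x, y = [0, 0, 0, 1], [0, 1, 1, 1]
results = [ldtw_am(x, y, l_ub) for l_ub in (4, 5, 6)]
```

For this pair, a zero-cost alignment needs six path nodes, so tightening the limit to five or four nodes raises the cumulative distance. Only two channels are alive at any time, regardless of the length of X.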
3.2. The Evolutionary Chain Tree Based Optimal Warping Path Determination
Besides the similarity, we can also find the corresponding warping path, which shows the matching pairs of two series. To achieve that, a chain tree with an evolutionary scheme is proposed. We also modified the structure of the AM, where each element possesses not only a value but also a pointer.
For example, the nodes and links of the chain tree are shown as dots and arrows in Figure 5, and six AM elements are drawn as cubes. Each cube is divided into two parts, the top part is the pointer domain leading to a corresponding tree node, while the bottom part is the value domain for the storage of the cumulative distance.
Figure 5.
Illustration of the AM and ECT.
The above tree is referred to as the evolutionary chain tree (ECT) because we use a chain tree to represent the warping paths, and the tree grows and is pruned dynamically during the process. The usage of the ECT is another major contribution of this work.
With the ECT, the workflow demonstrated in Figure 4 can be extended to an updated version shown in Figure 6. The main differences are marked as blocks in grey, which include the growing and pruning of the ECT, and the retrieval of the optimal warping path.
Figure 6.
The updated workflow of the proposed method.
3.2.1. Growing
The scale of the ECT grows after each update step of the AM. Specifically, as soon as the computation in $D_{cur}$ is finished, tree nodes will be created and linked to the ECT. Each tree node is initialized as a structure shown in Expression (9).

$$p=\{prior,\ data\} \tag{9}$$

where $prior$ is the pointer that leads to a prior tree node. A description of $data$ will be given later.
If a node $p_s(x_i,y_j)$ is initialized and linked from AM element $D_{cur}[j][s]$ as shown in Figure 7a, the next question is which node is its precursor. According to Step 5 in Algorithm 1, $D_{cur}[j][s]$ is partially determined by the minimum among $D_{pre}[j][s-1]$, $D_{pre}[j-1][s-1]$ and $D_{cur}[j-1][s-1]$. Therefore, the precursor of $p_s(x_i,y_j)$ is the tree node linked from the minimum among $D_{pre}[j][s-1]$, $D_{pre}[j-1][s-1]$ and $D_{cur}[j-1][s-1]$ as well. The above processes are shown in Algorithm 2, from Step 5 to Step 7.
Figure 7.
The potential precursors (a) and successors (b) of a tree node $p_s(x_i, y_j)$.
The data term of a tree node p is a four-digit binary value. The higher two digits are defined in Table 1 and serve as a clue for finding the X and Y indexes of all the optimal warping path nodes, since those indexes are not saved. Specifically, the retrieval of the optimal warping path begins from the tree node linked from the AM element that holds the similarity S and follows the pointers backwards to the first node. Because the indexes of the last node are known, it is easy to find the indexes of the rest with the higher two digits. The lower two digits stand for the number of successors of the node, which is no more than three as shown in Figure 7b. They are crucial to the pruning process introduced in the next section. Step 8 in Algorithm 2 describes the process related to the data term accordingly.
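A four-digit term of this kind can be handled with ordinary bit operations. The sketch below is only an illustration: the three direction codes are hypothetical placeholders (the actual codes are those of Table 1), and the helper names `pack`, `higher`, `lower` and `step_back` are assumptions.

```python
# Hypothetical codes for the higher two digits; the real ones are in Table 1.
FROM_PRE_SAME_J = 0b01   # precursor linked from Dpre[j][s-1]
FROM_PRE_PREV_J = 0b10   # precursor linked from Dpre[j-1][s-1]
FROM_CUR_PREV_J = 0b11   # precursor linked from Dcur[j-1][s-1]

# Index offsets (di, dj) back to the precursor for each hypothetical code.
OFFSETS = {
    FROM_PRE_SAME_J: (1, 0),
    FROM_PRE_PREV_J: (1, 1),
    FROM_CUR_PREV_J: (0, 1),
}


def pack(direction, successors):
    """Combine a 2-bit direction code with a 2-bit successor count."""
    assert 0 <= direction <= 0b11 and 0 <= successors <= 3
    return (direction << 2) | successors


def higher(data):
    return data >> 2


def lower(data):
    return data & 0b11


def step_back(i, j, data):
    """Indexes of the precursor of the node at (i, j)."""
    di, dj = OFFSETS[higher(data)]
    return i - di, j - dj


data = pack(FROM_PRE_PREV_J, 2)   # node reached diagonally, two successors
after_prune = data - 1            # one successor removed, direction unchanged
```

Because the successor count sits in the lower digits, decrementing the whole term by one removes a successor without disturbing the direction code, as long as the count is above zero.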
| Algorithm 2: ECT Growing | |
| Input: N, M, D, i, cur, pre, LUB | |
| Output: updated D | |
| 1 | for j from 1 to M do |
| 2 | minS←MinStep(i, j), maxS←MaxStep(i, j, N, M, LUB) |
| 3 | if minS < maxS do |
| 4 | for s from minS to maxS do |
| 5 | create tree node p and link it from Dcur [j][s] |
| 6 | q←min{Dpre [j][s−1], Dpre [j−1][s−1], Dcur [j−1][s−1]} |
| 7 | p.prior←the tree node linked from q |
| 8 | set p.data (higher digits per Table 1, lower digits 0b00), p.prior.data++ |
| 9 | end for |
| 10 | end if |
| 11 | end for |
Table 1.
Definition of the higher two-digit data term for tree node $p_s(cur, j)$.
3.2.2. Pruning
As the ECT grows, some branches stop growing. Figure 8a demonstrates such a case, where two branches gain no new nodes after the latest nodes have been added to the ECT. Those branches can be pruned to save memory; the pruning result is shown in Figure 8b.
Figure 8.
Illustration of ECT before (a) and after (b) pruning.
In our method, the pruning starts from the leaf nodes drawn as circles in Figure 8a. They can be found from $D_{pre}$ as shown in Algorithm 3, Step 5. If the lower two digits of their data term equal 0b00, they have no successor and need to be removed.
| Algorithm 3: ECT Pruning | |
| Input: N, M, D, i, cur, pre, LUB | |
| Output: updated D | |
| 1 | for j from 1 to M do |
| 2 | minS←MinStep(i−1, j), maxS←MaxStep(i−1, j, N, M, LUB) |
| 3 | if minS < maxS do |
| 4 | for s from minS to maxS do |
| 5 | p←the tree node linked from Dpre [j][s] |
| 6 | while lower(p.data) equal to 0b00 do |
| 7 | q←p, p←p.prior, p.data--, delete q |
| 8 | end while |
| 9 | end for |
| 10 | end if |
| 11 | end for |
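Growing, pruning and path retrieval can be combined into one compact sketch. It is an illustration under assumptions, not the implementation evaluated in this paper: each AM element is modelled as a (value, pointer) pair, the direction and successor count are kept in separate fields `move` and `succ` instead of a packed data term, the step range is an assumed inclusive stand-in for MinStep() and MaxStep(), and the counter `alive` exists only to make the effect of pruning visible.

```python
import math


class Node:
    """ECT node: a prior pointer plus the information carried by the data term."""

    def __init__(self, prior, move):
        self.prior = prior   # precursor tree node (None for the first node)
        self.move = move     # (di, dj) from the precursor (role of higher digits)
        self.succ = 0        # number of successors (role of lower digits)
        if prior is not None:
            prior.succ += 1


def ldtw_with_path(x, y, l_ub, prune=True):
    n, m = len(x), len(y)
    empty = (math.inf, None)                        # (value, pointer)
    pre = [[empty] * (l_ub + 1) for _ in range(m + 1)]
    cur = [[empty] * (l_ub + 1) for _ in range(m + 1)]
    alive = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            lo = max(i, j)                                   # assumed MinStep
            hi = min(i + j - 1, l_ub - max(n - i, m - j))    # assumed MaxStep
            for s in range(lo, hi + 1):
                if i == 1 and j == 1:
                    cur[j][s] = (d, Node(None, (0, 0)))
                    alive += 1
                    continue
                # Growing: the precursor is linked from the smallest neighbour.
                cands = [(pre[j][s - 1], (1, 0)),
                         (pre[j - 1][s - 1], (1, 1)),
                         (cur[j - 1][s - 1], (0, 1))]
                (val, ptr), move = min(cands, key=lambda c: c[0][0])
                if ptr is not None:
                    cur[j][s] = (d + val, Node(ptr, move))
                    alive += 1
        if prune:
            # Pruning: remove leaves of the previous row without successors,
            # then walk up the chain while the precursor also becomes a leaf.
            for row in pre:
                for _, p in row:
                    while p is not None and p.succ == 0:
                        q, p = p, p.prior
                        if p is not None:
                            p.succ -= 1
                        q.prior = None
                        alive -= 1
        for row in pre:
            row[:] = [empty] * (l_ub + 1)
        pre, cur = cur, pre
    # Retrieval: follow the prior pointers back from the best last element.
    cost, node = min(pre[m][1:], key=lambda e: e[0])
    path, i, j = [], n, m
    while node is not None:
        path.append((i, j))
        i, j = i - node.move[0], j - node.move[1]
        node = node.prior
    return cost, path[::-1], alive


x, y = [0, 0, 0, 1], [0, 1, 1, 1]
cost, path, kept = ldtw_with_path(x, y, 6)
_, _, kept_without_pruning = ldtw_with_path(x, y, 6, prune=False)
```

The path indexes are never stored in the tree; they are rebuilt from the known indexes of the last node and the per-node direction, which mirrors the role of the higher two digits. Comparing `kept` with `kept_without_pruning` shows how many tree nodes the pruning step releases on this small input.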
Figure 9a shows the final ECT obtained by applying the proposed method to SyntheticControl. Without pruning, it would look like the one shown in Figure 9b. Figure 9c shows the optimal warping path.
Figure 9.
The final ECT (a) with pruning and (b) without pruning. (c) The optimal warping path extracted from the final ECT.
4. Experiments and Results
The proposed method was implemented using the C++ programming language. The public dataset UCR [] was adopted for the 1-NN classification tests on a desktop computer with AMD Ryzen 7 5800X 3.80 GHz CPU, 64 GB memory. We compared our method with LDTW in terms of time and space consumption.
4.1. Comparisons
To compare the space costs of our method and LDTW, we tested both on all datasets in UCR. The results for 15 datasets are shown, each with a different name and length as described in the first and second columns of Table 2. There are two key phases in our method, namely the similarity calculation phase and the optimal warping path determination phase; therefore, we recorded their space costs as Ph1(MB) and Ph2(MB), respectively. As the table shows, our method uses 1.33% of the space used by LDTW on average.
Table 2.
Comparisons between LDTW and our method in terms of space costs.
The time costs were compared as well. In Figure 10, the horizontal axis lists the 15 datasets from the first column of Table 2 in ascending order of scale, and the vertical axis gives the time costs (ms) of our method and LDTW. As the scale of the time series increases from left to right, the superiority of our method becomes more obvious.
Figure 10.
Comparisons between LDTW and our method in terms of time costs.
4.2. Ablation Experiment
To show the contribution of the pruning proposed in our method, system performance with and without pruning is investigated. As Figure 11 shows, the space consumption is greatly reduced by the pruning process. It is normal that the space cost rises along with the increase of the parameter $L_{UB}$. With pruning, however, the space cost varies little in Figure 11, whereas the case without pruning is sensitive to the choice of $L_{UB}$. The scales of the data used in Figure 11 are listed in Table 3.
Figure 11.
The ablation experiment results. The horizontal axis is the parameter $L_{UB}$, and the vertical axis is the memory cost on different data. *-P and *-NP stand for the method with and without pruning, respectively.
Table 3.
The scales of the data shown in Figure 11.
5. Discussion
Thanks to the proposed alternating matrix, the memory cost is greatly reduced compared to the LDTW method. The price of this reduction is the need for an additional data structure to maintain the warping paths, as well as a new strategy for optimal warping path retrieval. We solve that problem with the proposed evolutionary chain tree, which costs a little time and space, but this overhead is small compared to the savings. The proposed method still outperforms LDTW considerably.
Another issue is the choice of $L_{UB}$, which is the only parameter in this method. The usage and setting criteria of $L_{UB}$ in our work follow the idea introduced by the LDTW algorithm []. In experiments, we found that different values of $L_{UB}$ may slightly alter the accuracy, but our final space costs are insensitive to it, as shown in the ablation experiment. Therefore, for a fairer comparison, we adopted the same method as [] for setting $L_{UB}$ to keep a similar parameter environment.
6. Conclusions
This paper proposes a novel solution for recording and exploring warping paths with much lower space-time complexity. Firstly, a two-channel matrix is created that travels over the entire cumulative distance space with an alternating scheme to calculate the similarity. Secondly, a chain tree is used to record all warping paths; the tree grows and is pruned as the matrix alternates, which ensures an efficient retrieval of the optimal path. Experiments running on the UCR benchmark show that our method uses 1.33% of the space and 82.7% of the time used by LDTW on average. Future work will focus on improving the evaluation accuracy.
Author Contributions
Manuscript writing, Z.Z.; experiments, X.-S.L.; project administration, S.-J.L.; funding acquisition and proofreading, M.-X.N. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Natural Science Foundation of Hunan Province (grant number 2021JJ30574), the Research Foundation of Education Bureau of Hunan Province (grant number 21B0424), the Natural Science Foundation of Fujian Province (grant number JAT210283, 2022J01932), the Open Project of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (grant number MJUKF-IPIC202208).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Lichtenauer, J.F.; Hendriks, E.A.; Reinders, M.J. Sign language recognition by combining statistical DTW and independent classification. IEEE Trans. Pattern Anal. 2008, 30, 2040–2046.
- Zhang, Z.; Tang, P.; Hu, C.; Liu, Z.; Zhang, W.; Tang, L. Seeded Classification of Satellite Image Time Series with Lower-Bounded Dynamic Time Warping. Remote Sens. 2022, 14, 2778.
- Amerineni, R.; Gupta, L.; Steadman, N.; Annauth, K.; Burr, C.; Wilson, S.; Barnaghi, P.; Vaidyanathan, R. Fusion Models for Generalized Classification of Multi-Axial Human Movement: Validation in Sport Performance. Sensors 2021, 21, 8409.
- Li, J.; Zhang, H.; Dong, Y.; Zuo, T.; Xu, D. An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging. Sensors 2021, 21, 7414.
- Lei, T.C.; Wan, S.; Wu, Y.C.; Wang, H.-P.; Hsieh, C.-W. Multi-Temporal Data Fusion in MS and SAR Images Using the Dynamic Time Warping Method for Paddy Rice Classification. Agriculture 2022, 12, 77.
- Kumar, D.; Wu, H.; Rajasegarar, S.; Leckie, C.; Krishnaswamy, S.; Palaniswami, M. Fast and scalable big data trajectory clustering for understanding urban mobility. IEEE Trans. Intell. Transp. 2018, 19, 3709–3722.
- Petitjean, F.; Ketterlin, A.; Gançarski, P. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recogn. 2011, 44, 678–693.
- Jiang, Y.; Qi, Y.; Wang, W.K.; Bent, B.; Avram, R.; Olgin, J.; Dunn, J. EventDTW: An Improved Dynamic Time Warping Algorithm for Aligning Biomedical Signals of Nonuniform Sampling Frequencies. Sensors 2020, 20, 2700.
- He, Y.; Zhang, X.; Wang, R.; Cheng, M.; Gao, Z.; Zhang, Z.; Yu, W. Faulty Section Location Method Based on Dynamic Time Warping Distance in a Resonant Grounding System. Energies 2022, 15, 4923.
- Debella, T.T.; Shawel, B.S.; Devanne, M.; Weber, J.; Woldegebreal, D.H.; Pollin, S.; Forestier, G. Deep Representation Learning for Cluster-Level Time Series Forecasting. Eng. Proc. 2022, 18, 22.
- Cui, J.-W.; Li, Z.-G.; Du, H.; Yan, B.-Y.; Lu, P.-D. Recognition of Upper Limb Action Intention Based on IMU. Sensors 2022, 22, 1954.
- Zhao, S.; Cai, H.; Li, W.; Liu, Y.; Liu, C. Hand Gesture Recognition on a Resource-Limited Interactive Wristband. Sensors 2021, 21, 5713.
- Li, T.; Shi, C.; Li, P.; Chen, P. A Novel Gesture Recognition System Based on CSI Extracted from a Smartphone with Nexmon Firmware. Sensors 2021, 21, 222.
- Li, H.; Khoo, S.; Yap, H.J. Implementation of Sequence-Based Classification Methods for Motion Assessment and Recognition in a Traditional Chinese Sport (Baduanjin). Int. J. Environ. Res. Public Health 2022, 19, 1744.
- Berndt, D.J.; Clifford, J. Using dynamic time warping to find patterns in time series. In Proceedings of the Knowledge Discovery and Data Mining Workshop, Seattle, WA, USA, 31 July 1994.
- Phan, T.T.H.; Caillault, E.P.; Lefebvre, A.; Bigand, A. Dynamic time warping-based imputation for univariate time series data. Pattern Recogn. Lett. 2020, 139, 139–147.
- Guo, F.; Zou, F.; Luo, S.; Liao, L.; Wu, J.; Yu, X.; Zhang, C. The Fast Detection of Abnormal ETC Data Based on an Improved DTW Algorithm. Electronics 2022, 11, 1981.
- Chang, C.; Shaw, T.; Goutam, A.; Lau, C.; Shan, M.; Tsai, T.J. Parameter-Free Ordered Partial Match Alignment with Hidden State Time Warping. Appl. Sci. 2022, 12, 3783.
- Gong, L.; Chen, B.; Xu, W.; Liu, C.; Li, X.; Zhao, Z.; Zhao, L. Motion Similarity Evaluation between Human and a Tri-Co Robot during Real-Time Imitation with a Trajectory Dynamic Time Warping Model. Sensors 2022, 22, 1968.
- Combes, F.; Fraiman, R.; Ghattas, B. Time Series Sampling. Eng. Proc. 2022, 18, 32.
- Zhang, Z.; Tavenard, R.; Bailly, A.; Tang, X.; Tang, P.; Corpetti, T. Dynamic time warping under limited warping path length. Inform. Sciences 2017, 393, 91–107.
- Sakoe, H.; Chiba, S. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 1978, 26, 43–49.
- Anantasech, P.; Ratanamahatana, C.A. Enhanced weighted dynamic time warping for time series classification. In Proceedings of the Third International Congress on Information and Communication Technology, London, UK, 27–28 February 2018.
- Jeong, Y.S.; Jeong, M.K.; Omitaomu, O.A. Weighted dynamic time warping for time series classification. Pattern Recogn. 2011, 44, 2231–2240.
- Ratanamahatana, C.A.; Keogh, E. Making time-series classification more accurate using learned constraints. In Proceedings of the 2004 SIAM International Conference on Data Mining, Lake Buena Vista, FL, USA, 22–24 April 2004.
- Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR time series archive. IEEE/CAA J. Automatic. 2019, 6, 1293–1305.
- Cao, Y.; Ma, S.; Cao, Y.; Pan, G.; Huang, Q.; Cao, Y. Similarity Evaluation Rule and Motion Posture Optimization for a Manta Ray Robot. J. Mar. Sci. Eng. 2022, 10, 908.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).