Side-Length-Independent Motif (SLIM): Motif Discovery and Volatility Analysis in Time Series—SAX, MDL and the Matrix Profile
Abstract
1. Introduction
1.1. Literature Review
1.2. Contribution
2. Methodology
2.1. Underlying Algorithms
2.1.1. Symbolic Aggregate Approximation (SAX)
2.1.2. Minimum Description Length (MDL)
2.1.3. Matrix Profile (MP)
2.2. Combined Methodology
2.2.1. Application of MDL to SAX strings
2.2.2. Hyperparameter Selection: Influence of Alphabet Size Choice upon MDL Compression Rate
2.2.3. Motif Discovery
2.2.4. Independent Side-Length Motif Discovery Process
Algorithm 1: Side-Length-Independent Motif (SLIM) pseudocode.
Data: Input raw time series
Result: Candidate motif locations with variable side-length
Step A: Transform raw input series into a suitable SAX representation
Step B: Compress the SAX series using MDL to create an MDL-SAX series
Step C: MDL-SAX series serves as input to the MP algorithm, creating an MDL-SAX-MP series
while examining MDL-SAX-MP series do

end
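Steps A to C of Algorithm 1 can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation and rests on several assumptions: SAX is applied point by point (no PAA averaging), the MDL compression is approximated by joining runs of repeated symbols, and the matrix profile is a brute-force version using plain (not z-normalised) Euclidean distance with non-overlapping windows. All function names (`sax_symbols`, `join_runs`, `matrix_profile`, `slim_candidate`) are hypothetical.

```python
from math import inf, sqrt
from statistics import NormalDist, mean, pstdev


def sax_symbols(series, alphabet_size=5):
    """Step A (simplified): z-normalise, then map each point to a symbol
    1..alphabet_size using equiprobable Gaussian breakpoints."""
    mu, sigma = mean(series), pstdev(series)
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    symbols = []
    for x in series:
        z = (x - mu) / sigma if sigma > 0 else 0.0
        symbols.append(1 + sum(z > b for b in breakpoints))
    return symbols


def join_runs(symbols):
    """Step B (assumed form): merge consecutive repeated symbols into
    (symbol, run_length, raw_start_index) tuples."""
    runs = []
    for i, s in enumerate(symbols):
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1, i])
    return [tuple(r) for r in runs]


def matrix_profile(values, m):
    """Step C (brute force): nearest-neighbour distance and index for every
    window of length m, skipping windows that overlap the query."""
    n = len(values) - m + 1
    profile, index = [inf] * n, [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) < m:  # exclusion zone against trivial matches
                continue
            d = sqrt(sum((values[i + k] - values[j + k]) ** 2
                         for k in range(m)))
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index


def slim_candidate(series, alphabet_size=5, m=3):
    """Return the best motif pair as two (raw_start, raw_length) spans
    plus their distance in the compressed symbol space."""
    runs = join_runs(sax_symbols(series, alphabet_size))
    profile, index = matrix_profile([r[0] for r in runs], m)
    if not profile or min(profile) == inf:
        raise ValueError("series too short for the chosen window length")
    i = min(range(len(profile)), key=profile.__getitem__)
    j = index[i]

    def raw_span(start):
        # Map a window of m compressed symbols back to the raw series.
        return runs[start][2], sum(r[1] for r in runs[start:start + m])

    return raw_span(i), raw_span(j), profile[i]


if __name__ == "__main__":
    # The same low-mid-high shape appears once quickly and once slowly.
    series = [0, 5, 10, 5, 0, 0, 5, 5, 5, 10, 10]
    print(slim_candidate(series, alphabet_size=3, m=3))
```

In this toy series both sides of the motif cover the same three compressed symbols, yet they span 3 and 7 raw points respectively. This is the sense in which the side lengths of a motif pair are independent.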
2.2.5. Advantages
 Permits identification of motif pairs in which the length of each side is independent.
 Properties of the underlying algorithms are inherited:
   Dimensionality reduction of SAX (if required).
   Efficiency and scalability of the MP.
 Is independent of the SAX and MP versions used, and so can take advantage of further improvements to these algorithms.
3. Results and Discussions
3.1. Finance
3.1.1. Side-Length-Independent Motif Discovery
3.1.2. Alternative Motif Identification Algorithms Comparison
3.1.3. Localised Volatility Analysis
3.2. Energy Sector
3.2.1. Side-Length-Independent Motif Discovery
3.2.2. Globalised Volatility Analysis
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
SLIM  Side-Length-Independent Motif
SES  Simple Exponential Smoothing 
ARIMA  Autoregressive Integrated Moving Average 
SVR  Support Vector Regression 
WIG20  Warsaw stock exchange index 
SAX  Symbolic Aggregate Approximation 
MDL  Minimum Description Length 
MP  Matrix Profile 
MPI  Matrix Profile Index 
SFA  Symbolic Fourier Approximation 
S&P500  Standard and Poor’s 500 
Raw Series Date   SAX Series Index  SAXVal  SAXValDiff  SymJoinNum  RawSeries Index
2 January 2009    97                5       1           3           254
7 January 2009    98                4       −1          1           257
…                 …                 …       …           …           …
28 January 2009   104               4       1           1           271
29 January 2009   105               3       −1          2           272
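One plausible reading of the columns above is that `SymJoinNum` counts how many consecutive raw points share a SAX value, `SAXValDiff` is the change from the previous SAX value, and `RawSeries Index` marks where each joined run starts in the raw series. The short Python sketch below reproduces the first two rows under that assumption. The function name `sax_table_rows` and the raw symbol sequence fed to it are hypothetical, chosen only to match the tabulated values.

```python
def sax_table_rows(raw_symbols, raw_start=0, sax_start=0):
    """Join runs of equal symbols and derive per-row table columns.

    raw_symbols: one SAX value per raw observation (assumed layout).
    raw_start / sax_start: index of the first raw point / first output row.
    """
    rows = []
    for offset, sym in enumerate(raw_symbols):
        if rows and rows[-1]["SAXVal"] == sym:
            rows[-1]["SymJoinNum"] += 1  # extend the current run
            continue
        prev = rows[-1]["SAXVal"] if rows else None
        rows.append({
            "SAXSeriesIndex": sax_start + len(rows),
            "SAXVal": sym,
            "SAXValDiff": None if prev is None else sym - prev,
            "SymJoinNum": 1,
            "RawSeriesIndex": raw_start + offset,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical raw symbols for raw indices 253..257.
    for row in sax_table_rows([4, 5, 5, 5, 4], raw_start=253, sax_start=96):
        print(row)
```

With this input, the row at SAX index 97 has value 5, difference 1, three joined symbols and raw index 254. The row at SAX index 98 has value 4, difference −1, one symbol and raw index 257. Both agree with the table.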
MDLSAXSeriesIdx  SAXValDiffTotal  SymJoinNumTotal  SAXValAmplitude
1                25               12               15
2                26               11               14
3                26               10               17
…                …                …                …
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. 
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cartwright, E.; Crane, M.; Ruskin, H.J. Side-Length-Independent Motif (SLIM): Motif Discovery and Volatility Analysis in Time Series—SAX, MDL and the Matrix Profile. Forecasting 2022, 4, 219–237. https://doi.org/10.3390/forecast4010013