Simulation Study on How Input Data Affects Time-Series Classification Model Results
Abstract
1. Introduction
- the generation of synthetic time-series datasets with controlled characteristics, such as the number of classes, level of noise, and data contamination.
- the comparison of selected time-series classification models based on the generated datasets, with a focus on their accuracy and efficiency.
- the analysis of the models’ performance in response to changes in input data characteristics, including the number of classes and the level of noise, offering insights into how different dataset characteristics influence model performance.
2. Literature Review
3. Models
- Naive models;
- Feature- and pattern-based models.
3.1. Naive Methods
- c denotes the class label,
- j is the class index (iterator), where $j = 1, \ldots, k$,
- k represents the total number of classes,
- $n_j$ indicates the number of occurrences of class j in the dataset,
- N denotes the total number of observations,
- p indicates the number of time series (columns in the dataset).
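For concreteness, the following is a minimal sketch of a majority-class baseline expressed in this notation, assuming the naive model simply predicts the most frequent class in the training data; the class name `MajorityClassBaseline` and its interface are illustrative rather than the study's actual implementation.

```python
# Minimal sketch of a naive (majority-class) baseline: predict the class
# with the largest count n_j among the N training observations.
from collections import Counter

import numpy as np


class MajorityClassBaseline:
    """Predict the class with the largest training count n_j, ignoring X."""

    def fit(self, X, y):
        counts = Counter(y)                          # n_j for each class j
        self.majority_class_ = counts.most_common(1)[0][0]
        return self

    def predict(self, X):
        # Ignore the input series entirely and repeat the majority class.
        return np.full(len(X), self.majority_class_)
```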
3.2. Used Models
3.3. Accuracy Measures
4. Data Simulations
4.1. Wave Types
- t—time index; in the formulas below, t is expressed as a fraction of the series length, so that freq full cycles span the whole series;
- freq—frequency of the wave, determining how many cycles occur across the series length;
- min—minimum value of the amplitude;
- max—maximum value of the amplitude;
- Sinusoidal Wave (‘sin’): The sinusoidal wave is defined by $x(t) = \frac{max - min}{2} \sin(2\pi \cdot freq \cdot t) + \frac{max + min}{2}$.
  - The term $\frac{max - min}{2}$ represents the amplitude of the wave.
  - The offset $\frac{max + min}{2}$ centers the wave between min and max.
- Cosinusoidal Wave (‘cos’): The cosinusoidal wave is defined by $x(t) = \frac{max - min}{2} \cos(2\pi \cdot freq \cdot t) + \frac{max + min}{2}$.
  - The formula is similar to the sinusoidal wave, but it starts at its maximum value when $t = 0$.
- Triangle Wave (‘triangle’): The triangle wave is defined by $x(t) = min + 2\,(max - min)\,\bigl|\,freq \cdot t - \lfloor freq \cdot t + \tfrac{1}{2} \rfloor\,\bigr|$.
  - The function generates a triangular waveform with symmetrical peaks and valleys.
- Sawtooth Wave (‘sawtooth’): The sawtooth wave is defined by $x(t) = min + (max - min)\bigl(freq \cdot t - \lfloor freq \cdot t \rfloor\bigr)$.
  - This wave rises linearly to its peak and then drops sharply to the minimum value.
- Inverted Sawtooth Wave (‘inverted_sawtooth’): The inverted sawtooth wave is defined by $x(t) = max - (max - min)\bigl(freq \cdot t - \lfloor freq \cdot t \rfloor\bigr)$.
  - This wave descends linearly to the minimum value and then jumps sharply back to the maximum.
- Pulse Width Modulation (PWM) Wave (‘pwm’): The PWM wave is defined by $x(t) = max$ if $(freq \cdot t) \bmod 1 < duty$, and $x(t) = min$ otherwise.
  - The duty parameter determines how long the signal stays “high” during each cycle.
  - A duty cycle of 0.5 results in equal time spent in the “high” and “low” states.
  - By default in our study, the duty parameter is set to 0.7, meaning that the signal remains “high” for 70% of each cycle.
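The waveforms above can be generated in a few lines of NumPy. The sketch below is a minimal illustration, assuming time is normalized to the unit interval and the series takes values between min and max; the function name `generate_wave`, its defaults, and the renaming of min/max to `low`/`high` (to avoid shadowing Python built-ins) are illustrative choices, not the study's actual code.

```python
import numpy as np


def generate_wave(kind, length=200, freq=3, low=0.0, high=1.0, duty=0.7):
    """Generate one synthetic series of the given wave type.

    `low`/`high` play the role of min/max above; time is normalized so that
    `freq` full cycles span the series.
    """
    t = np.linspace(0.0, 1.0, length, endpoint=False)
    phase = freq * t                     # elapsed cycles at each time step
    frac = phase - np.floor(phase)       # position within the current cycle
    amp, offset = (high - low) / 2.0, (high + low) / 2.0
    if kind == "sin":
        return amp * np.sin(2 * np.pi * phase) + offset
    if kind == "cos":
        return amp * np.cos(2 * np.pi * phase) + offset
    if kind == "triangle":
        return low + (high - low) * 2.0 * np.abs(phase - np.floor(phase + 0.5))
    if kind == "sawtooth":
        return low + (high - low) * frac
    if kind == "inverted_sawtooth":
        return high - (high - low) * frac
    if kind == "pwm":
        return np.where(frac < duty, high, low)
    raise ValueError(f"unknown wave type: {kind}")
```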
4.2. Noise
4.3. Datasets
- 10 datasets with two classes: cosinusoidal and PWM.
- 18 datasets with three classes: cosinusoidal, PWM, and sinusoidal.
- 18 datasets with four classes: cosinusoidal, PWM, sinusoidal, and triangular.
- 18 datasets with five classes: cosinusoidal, PWM, sinusoidal, triangular, and sawtooth.
- 18 datasets with six classes: cosinusoidal, PWM, sinusoidal, triangular, sawtooth, and inverted sawtooth.
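To illustrate how a labeled dataset of this kind can be assembled, the snippet below builds one example with two classes (cosinusoidal and PWM, as in the first composition above); the per-class sample count, the fixed frequency, and the additive Gaussian noise model are assumptions made purely for illustration, since the exact noise and sampling settings follow the noise description in Section 4.2.

```python
import numpy as np

# Two illustrative per-class generators (cosinusoidal and PWM with duty 0.7);
# the fixed frequency of 3 cycles per series is an assumption, and any of the
# wave functions from the previous sketch could be plugged in instead.
GENERATORS = {
    "cos": lambda t: 0.5 * np.cos(2 * np.pi * 3 * t) + 0.5,
    "pwm": lambda t: np.where((3 * t) % 1.0 < 0.7, 1.0, 0.0),
}


def make_dataset(class_names, n_per_class=100, length=200, noise_std=0.10, seed=42):
    """Build a labeled dataset: n_per_class noisy copies of each base waveform."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, length, endpoint=False)
    X, y = [], []
    for label, name in enumerate(class_names):
        base = GENERATORS[name](t)
        for _ in range(n_per_class):
            X.append(base + rng.normal(0.0, noise_std, size=length))
            y.append(label)
    return np.asarray(X), np.asarray(y)


# Example: a two-class dataset (cosinusoidal vs. PWM) with Gaussian noise.
X, y = make_dataset(["cos", "pwm"])
```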
5. Experiment
- Catch22 Classifier: n_estimators = 50, 100, 250, 500;
- Rocket Classifier: n_kernels = 100, 1000, 5000, 10000;
- Time Series Forest Classifier: n_estimators = 50, 100, 250, 500;
- TSFresh Classifier: default_fc_parameters = “efficient”, “minimal”;
- WEASEL Classifier: window_inc = 2, 4, 8; alphabet_size = 2, 4, 8;
- CNN Classifier: n_epochs = 100, 1000, 2000;
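As an illustration of the sweep over these grids, the sketch below fits one model per grid value using the aeon toolkit's TimeSeriesForestClassifier; only that grid is shown, and the 70/30 split, accuracy metric, and random seed are assumptions rather than the study's exact evaluation protocol.

```python
# Minimal sketch of one hyperparameter sweep from the grids above, using the
# aeon toolkit's TimeSeriesForestClassifier. The split, metric, and seed are
# illustrative assumptions only.
from aeon.classification.interval_based import TimeSeriesForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def sweep_tsf(X, y, n_estimators_grid=(50, 100, 250, 500), seed=123):
    """Fit one classifier per grid value and report its test accuracy.

    X: 2D array of shape (n_cases, n_timepoints) holding univariate series.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    scores = {}
    for n_estimators in n_estimators_grid:
        clf = TimeSeriesForestClassifier(n_estimators=n_estimators,
                                         random_state=seed)
        clf.fit(X_train, y_train)
        scores[n_estimators] = accuracy_score(y_test, clf.predict(X_test))
    return scores
```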
5.1. Computational Resources
5.2. Results
5.3. Validity Threats
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Classifier | Measure | Group |
---|---|---|
CNN Classifier | 0.8812 | a |
Time Series Forest Classifier | 0.6493 | b |
Rocket Classifier | 0.6380 | b |
TSFresh Classifier | 0.6176 | b |
WEASEL | 0.6056 | b |
Catch22 Classifier | 0.3794 | c |
Naive Classifier | 0.2483 | c |
Classifier | Peak Memory (MiB) |
---|---|
Catch22 | 10.94 |
Time Series Forest | 17.10 |
TSFresh | 66.59 |
CNN | 72.02 |
Rocket | 101.30 |
WEASEL | 367.38 |
Number of Classes (d) | CNN | Catch22 | Rocket | TSFresh | TSF | WEASEL |
---|---|---|---|---|---|---|
2 | 0.9035 | 0.5560 | 0.9785 | 0.7455 | 0.7670 | 0.7410 |
3 | 0.7881 | 0.2509 | 0.5495 | 0.4678 | 0.4713 | 0.5462 |
4 | 0.7890 | 0.1260 | 0.5615 | 0.2920 | 0.3645 | 0.4180 |
5 | 0.7805 | 0.1435 | 0.1478 | 0.3032 | 0.3492 | 0.3637 |
6 | 0.7686 | 0.1015 | 0.1348 | 0.2661 | 0.2756 | 0.2837 |
Mean | 0.8060 | 0.2356 | 0.4744 | 0.4149 | 0.4455 | 0.4705
Std. dev. | 0.0493 | 0.1682 | 0.3128 | 0.1799 | 0.1725 | 0.1600
Noise | CNN | Catch22 | Rocket | TSFresh | TSF | WEASEL |
---|---|---|---|---|---|---|
5 | 1.0000 | 0.6461 | 0.9806 | 0.9602 | 0.9932 | 0.8665 |
10 | 0.9869 | 0.4062 | 0.6925 | 0.8114 | 0.8894 | 0.6774 |
15 | 0.9328 | 0.3311 | 0.5136 | 0.6233 | 0.7346 | 0.5609 |
20 | 0.8545 | 0.2577 | 0.4433 | 0.4589 | 0.5298 | 0.4768 |
25 | 0.8173 | 0.2025 | 0.4122 | 0.3444 | 0.3741 | 0.4387 |
30 | 0.7631 | 0.1655 | 0.3901 | 0.2649 | 0.2850 | 0.3833 |
35 | 0.7220 | 0.1112 | 0.3629 | 0.2137 | 0.2247 | 0.3643 |
40 | 0.6918 | 0.0923 | 0.3409 | 0.1845 | 0.1816 | 0.3371 |
45 | 0.6598 | 0.0852 | 0.3127 | 0.1595 | 0.1355 | 0.3150 |
50 | 0.6313 | 0.0582 | 0.2956 | 0.1285 | 0.1073 | 0.2850 |
Mean | 0.8060 | 0.2671 | 0.4744 | 0.4149 | 0.4455 | 0.4705
Std. dev. | 0.1276 | 0.1742 | 0.2014 | 0.2772 | 0.3078 | 0.1744