K-MDTSC: K-Multi-Dimensional Time-Series Clustering Algorithm
Abstract
1. Introduction
- Definition of K-MDTSC (K-Multi-Dimensional Time-Series Clustering) to group multi-dimensional time-series.
- Thorough validation of K-MDTSC and comparison with k-Shape, a state-of-the-art time-series clustering algorithm.
- Synthetic datasets and K-MDTSC code available at [8].
- Generic pipeline to process time-series data.
- Application to a real-world scenario with real data from the industrial field.
2. Related Work
3. Time-Series Clustering
3.1. K-Means
3.2. k-Shape
3.3. K-MDTSC
- A well-known weakness is the choice of the initial centroids. If a random point is chosen as a centroid and lies in a part of the space where no other points belong, it may create an empty cluster. This degrades the clustering result: in the end, instead of k clusters, the user retrieves fewer than k clusters. To reduce the probability of this situation, we choose the k initial centroids directly among the multi-dimensional time-series.
- Secondly, while the first solution reduces the probability of getting empty clusters, some corner cases are still possible, e.g., if two similar multi-dimensional time-series are present and both are used as centroids. To solve this issue, we verify whether any empty cluster exists. If any is present, for each empty cluster, we select from the most prominent cluster the farthest multi-dimensional time-series and take it as a new centroid.
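The two safeguards above can be illustrated with a minimal sketch. It is not the authors' implementation (their code is available at [8]); it assumes a Euclidean distance over all dimensions and time steps, reads "most prominent cluster" as the most populated one, and measures "farthest" with respect to that cluster's centroid. All function names are hypothetical.

```python
import random
from math import sqrt


def distance(a, b):
    """Euclidean distance across all dimensions and time steps (assumed metric)."""
    return sqrt(sum((x - y) ** 2
                    for dim_a, dim_b in zip(a, b)
                    for x, y in zip(dim_a, dim_b)))


def init_centroids(data, k, seed=0):
    """Pick the k initial centroids among the input series themselves."""
    rng = random.Random(seed)
    return [[dim[:] for dim in series] for series in rng.sample(data, k)]


def assign(data, centroids):
    """Label each series with the index of its closest centroid."""
    return [min(range(len(centroids)),
                key=lambda c: distance(series, centroids[c]))
            for series in data]


def repair_empty_clusters(data, centroids, labels):
    """Reseed every empty cluster with the series of the most populated
    cluster that is farthest from that cluster's centroid (assumed reading)."""
    k = len(centroids)
    for c in range(k):
        if c in labels:
            continue  # cluster c already has members
        sizes = [labels.count(j) for j in range(k)]
        largest = max(range(k), key=lambda j: sizes[j])
        members = [i for i, lab in enumerate(labels) if lab == largest]
        far = max(members,
                  key=lambda i: distance(data[i], centroids[largest]))
        centroids[c] = [dim[:] for dim in data[far]]
        labels[far] = c
    return centroids, labels


# Toy data: four two-dimensional series with two samples per dimension.
data = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[5.0, 5.0], [5.0, 5.0]],
]
# Corner case from the text: two identical series are both used as centroids.
centroids = [[dim[:] for dim in data[0]], [dim[:] for dim in data[1]]]
labels = assign(data, centroids)
centroids, labels = repair_empty_clusters(data, centroids, labels)
```

In this toy run every series is first assigned to cluster 0, because ties go to the first centroid, so cluster 1 is empty and is reseeded with the series farthest from centroid 0.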
3.4. Performance Metrics
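The reference list points to the adjusted Rand index (the Yeung and Ruzzo supplement and the scikit-learn `adjusted_rand_score` page), which suggests it is among the metrics used here. As an illustration only, a standard-library sketch of the index under its usual pair-counting definition could look as follows; the function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: treated as trivially identical

    # Pairs that fall in the same cell, the same row, or the same column.
    same_cell = sum(comb(c, 2)
                    for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = same_true * same_pred / total_pairs
    maximum = (same_true + same_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return (same_cell - expected) / (maximum - expected)


# Identical partitions with swapped label names score 1.0.
perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that splits every true cluster scores below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index is invariant to how the clusters are named, which matters when comparing the output of a clustering algorithm against the known families of a synthetic dataset.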
4. Results on Synthetic Data
4.1. Synthetic Dataset Generation
Lessons Learned
5. Real Use Case
5.1. Data Collection
5.2. Problem Definition
- Can we use the welding data to predict electrode aging and adopt predictive maintenance?
- Can we infer further information about the welding process to identify possible patterns hinting at problems?
- Noise reduction: While K-MDTSC can handle any multi-dimensional time-series, domain experts suggested filtering the time-series to remove high-frequency noise, which depends mostly on the sensors and the collection process.
- Data selection: Welds may have different durations even within the same welding spot. Moreover, two-phase welding spots are characterized by two distinct periods. Hence, we apply different data selection strategies to extract multi-dimensional time-series that (i) have the same duration and (ii) appear synchronized.
- Data Normalization: Since the current and voltage time-series have different operational ranges, we run a classic data normalization process based on z-score.
- Time-Series Clustering: We run K-MDTSC to group welding samples and identify different welding shapes within each welding spot.
- Correlation with external indexes: Finally, we study the clusters by using the wear to discover whether different welding shapes are characterized by different wear and tear of the electrode.
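The first two steps of the pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the filter or selection strategy actually used: a centered moving average stands in for the low-pass filter, and cutting every sample to the shortest duration in the batch stands in for the data selection. The function names and the `window` parameter are hypothetical.

```python
def moving_average(series, window=5):
    """Centered moving average, a simple low-pass filter (assumed stand-in).
    Near the borders the window shrinks to the samples that are available."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def truncate_to_common_length(samples):
    """Cut every multi-dimensional sample to the shortest duration found,
    so that all samples share the same number of time steps."""
    n = min(len(dim) for sample in samples for dim in sample)
    return [[dim[:n] for dim in sample] for sample in samples]


# A single spike is spread over its neighbours by the filter.
filtered = moving_average([0, 0, 3, 0, 0], window=3)

# Two samples, each with two dimensions (e.g., current and voltage),
# recorded with different durations.
samples = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8], [9, 10]],
]
aligned = truncate_to_common_length(samples)
```

Truncation is only one possible selection strategy; extracting a specific phase of the weld would need phase boundaries that this sketch does not model.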
5.3. Noise Reduction
5.4. Data Selection
5.5. Data Normalization
- Mean: ;
- Standard deviation:
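As a worked illustration of z-score normalization, the sketch below subtracts the mean from each value and divides by the standard deviation. It assumes the statistics are computed per time-series and uses the population standard deviation (dividing by the number of samples); the scope and estimator used in the paper may differ. The function name is hypothetical.

```python
from math import sqrt


def z_score(series):
    """Return the series rescaled to zero mean and unit standard deviation."""
    n = len(series)
    mean = sum(series) / n
    # Population standard deviation (assumption: divide by n, not n - 1).
    std = sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # constant series: nothing to rescale
    return [(x - mean) / std for x in series]


# After normalization, dimensions with different operational ranges
# become directly comparable.
current = z_score([8000.0, 9000.0, 10000.0])
voltage = z_score([1.0, 2.0, 3.0])
```

Both toy series map to the same normalized shape, which is the point of the step: the dimension with the larger operational range no longer dominates the distance computation.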
6. Case Study Evaluation
6.1. First Robot
6.2. Second Robot
Lessons Learned
6.3. Noise Sensitivity
Lessons Learned
7. Conclusions
- both K-MDTSC and k-Shape correctly group multi-dimensional time-series;
- however, K-MDTSC outperforms k-Shape when a high number of curve families or noise perturbations are present.
- the wear and tear of the electrode do not negatively impact the welding process;
- regardless of the data selection technique, the wear and tear of the electrode does not negatively impact the process, whether we consider the whole welding process or only the steady state in which the actual welding is performed.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Hill, T.; O’Connor, M.; Remus, W. Neural network models for time series forecasts. Manag. Sci. 1996, 42, 1082–1092.
- Bhandari, S.; Bergmann, N.; Jurdak, R.; Kusy, B. Time series data analysis of wireless sensor network measurements of temperature. Sensors 2017, 17, 1221.
- Wei, L.; Keogh, E. Semi-supervised time series classification. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006.
- Gopalapillai, R.; Gupta, D.; Sudarshan, T. Experimentation and analysis of time series data for rescue robotics. In Recent Advances in Intelligent Informatics; Springer: Berlin/Heidelberg, Germany, 2014; pp. 443–453.
- Aghabozorgi, S.; Seyed Shirkhorshidi, A.; Ying Wah, T. Time-series clustering—A decade review. Inf. Syst. 2015, 53, 16–38.
- MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; Statistical Laboratory of the University of California: Berkeley, CA, USA, 1967.
- Paparrizos, J.; Gravano, L. k-Shape: Efficient and Accurate Clustering of Time Series. ACM Sigmod Rec. 2016, 45, 69–76.
- Smartdatapolito. K-MDTSC: K-Multi-Dimensional Time-Series Clustering Algorithm. 2021. Available online: https://github.com/smartdatapolito/K-MDTSC (accessed on 7 April 2021).
- Celenk, M. A color clustering technique for image segmentation. Comput. Vis. Graph. Image Process. 1990, 52, 145–170.
- Chuang, K.S.; Tzeng, H.L.; Chen, S.; Wu, J.; Chen, T.J. Fuzzy c-means clustering with spatial information for image segmentation. Comput. Med. Imaging Graph. 2006, 30, 9–15.
- Dhanachandra, N.; Manglem, K.; Chanu, Y.J. Image segmentation using K-means clustering algorithm and subtractive clustering algorithm. Procedia Comput. Sci. 2015, 54, 764–771.
- Glowacz, A. Ventilation Diagnosis of Angle Grinder Using Thermal Imaging. Sensors 2021, 21, 2853.
- Huang, A. Similarity measures for text document clustering. In Proceedings of the Sixth New Zealand Computer Science Research Student Conference, Christchurch, New Zealand, 14 April 2008.
- Beil, F.; Ester, M.; Xu, X. Frequent term-based text clustering. In Proceedings of the eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002.
- Faroughi, A.; Javidan, R.; Mellia, M.; Morichetta, A.; Soro, F.; Trevisan, M. Achieving horizontal scalability in density-based clustering for URLs. In Proceedings of the IEEE International Conference on Big Data, Seattle, WA, USA, 10–13 December 2018.
- Giordano, D.; Traverso, S.; Grimaudo, L.; Mellia, M.; Baralis, E.; Tongaonkar, A.; Saha, S. YouLighter: A cognitive approach to unveil YouTube CDN and changes. IEEE Trans. Cogn. Commun. Netw. 2015, 1, 161–174.
- Morichetta, A.; Mellia, M. Clustering and evolutionary approach for longitudinal web traffic analysis. Perform. Eval. 2019, 135, 102033.
- Giordano, D.; Traverso, S.; Grimaudo, L.; Mellia, M.; Baralis, E.; Tongaonkar, A.; Saha, S. Youlighter: An unsupervised methodology to unveil youtube cdn changes. In Proceedings of the 2015 27th International Teletraffic Congress, Ghent, Belgium, 8–10 September 2015; pp. 19–27.
- Chen, H.; Yin, H.; Li, X.; Wang, M.; Chen, W.; Chen, T. People opinion topic model: Opinion based user clustering in social networks. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, 3–7 April 2017.
- Li, P.; Dau, H.; Puleo, G.; Milenkovic, O. Motif clustering and overlapping clustering for social network analysis. In Proceedings of the IEEE INFOCOM 2017-IEEE Conference on Computer Communications, Atlanta, GA, USA, 1–4 May 2017.
- Curiskis, S.A.; Drake, B.; Osborn, T.R.; Kennedy, P.J. An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit. Inf. Process. Manag. 2020, 57, 102034.
- Huang, L.; Shea, A.L.; Qian, H.; Masurkar, A.; Deng, H.; Liu, D. Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. J. Biomed. Inform. 2019, 99, 103291.
- Yelipe, U.; Porika, S.; Golla, M. An efficient approach for imputation and classification of medical data values using class-based clustering of medical records. Comput. Electr. Eng. 2018, 66, 487–504.
- Sun, W.; Cai, Z.; Liu, F.; Fang, S.; Wang, G. A survey of data mining technology on electronic medical records. In Proceedings of the IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom), Dalian, China, 12–15 October 2017.
- Hautamaki, V.; Nykanen, P.; Franti, P. Time-series clustering by approximate prototypes. In Proceedings of the 19th International Conference on Pattern Recognition, Tampa, FL, USA, 8–11 December 2008.
- Ghassempour, S.; Girosi, F.; Maeder, A. Clustering multivariate time series using hidden Markov models. Int. J. Environ. Res. Public Health 2014, 11, 2741–2763.
- Zakaria, J.; Mueen, A.; Keogh, E. Clustering time series using unsupervised-shapelets. In Proceedings of the IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10–13 December 2012.
- Rakthanmanon, T.; Keogh, E.J.; Lonardi, S.; Evans, S. MDL-based time series clustering. Knowl. Inf. Syst. 2012, 33, 371–399.
- Ding, H.; Trajcevski, G.; Scheuermann, P.; Wang, X.; Keogh, E. Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures. Proc. Vldb Endow. 2008, 1, 1542–1552.
- Vlachos, M.; Lin, J.; Keogh, E.; Gunopulos, D. A wavelet-based anytime algorithm for k-means clustering of time series. In Proceedings of the Workshop on Clustering High Dimensionality Data and Its Applications, San Francisco, CA, USA, 3 May 2003.
- Wang, X.; Smith, K.A.; Hyndman, R.J. Dimension reduction for clustering time series using global characteristics. In Proceedings of the International Conference on Computational Science, Atlanta, GA, USA, 22–25 May 2005.
- Abonyi, J.; Feil, B.; Nemeth, S.; Arva, P. Principal component analysis based time series segmentation. In Proceedings of the IEEE International Conference on Computational Cybernetics, Hotel Le Victoria, Mauritius, 13–16 April 2005.
- Fu, T.c.; Chung, F.l.; Ng, V.; Luk, R. Pattern discovery from stock time series using self-organizing maps. In Proceedings of the Workshop Notes of KDD2001 Workshop on Temporal Data Mining, San Francisco, CA, USA, 26–29 August 2001.
- Kumar, M.; Patel, N.R.; Woo, J. Clustering seasonality patterns in the presence of errors. In Proceedings of the eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002.
- Dias, J.G.; Vermunt, J.K.; Ramos, S. Clustering financial time series: New insights from an extended hidden Markov model. Eur. J. Oper. Res. 2015, 243, 852–864.
- Sadahiro, Y.; Kobayashi, T. Exploratory analysis of time series data: Detection of partial similarities, clustering, and visualization. Comput. Environ. Urban Syst. 2014, 45, 24–33.
- Ji, M.; Xie, F.; Ping, Y. A Dynamic Fuzzy Cluster Algorithm for Time Series; Abstract and Applied Analysis; Hindawi: London, UK, 2013; Volume 2013.
- Horenko, I. On clustering of non-stationary meteorological time series. Dyn. Atmos. Ocean. 2010, 49, 164–187.
- Wismüller, A.; Lange, O.; Dersch, D.R.; Leinsinger, G.L.; Hahn, K.; Pütz, B.; Auer, D. Cluster analysis of biomedical image time-series. Int. J. Comput. Vis. 2002, 46, 103–128.
- Möller-Levet, C.S.; Klawonn, F.; Cho, K.H.; Wolkenhauer, O. Fuzzy clustering of short time-series and unevenly distributed sampling points. In Proceedings of the International Symposium on Intelligent Data Analysis, Berlin, Germany, 28–30 August 2003.
- Liao, T.W. Clustering of time series data—A survey. Pattern Recognit. 2005, 38, 1857–1874.
- Javed, A.; Lee, B.S.; Rizzo, D.M. A benchmark study on time series clustering. Mach. Learn. Appl. 2020, 1, 100001.
- Bukhsh, Z.A.; Saeed, A.; Stipanovic, I.; Doree, A.G. Predictive maintenance using tree-based classification techniques: A case of railway switches. Transp. Res. Part C Emerg. Technol. 2019, 101, 35–54.
- Renga, D.; Apiletti, D.; Giordano, D.; Nisi, M.; Huang, T.; Zhang, Y.; Mellia, M.; Baralis, E. Data-driven exploratory models of an electric distribution network for fault prediction and diagnosis. Computing 2020, 1, 1–13.
- Verhagen, W.J.; De Boer, L.W. Predictive maintenance for aircraft components using proportional hazard models. J. Ind. Inf. Integr. 2018, 12, 23–30.
- Markudova, D.; Mishra, S.; Cagliero, L.; Vassio, L.; Mellia, M.; Baralis, E.; Salvatori, L.; Loti, R. Preventive maintenance for heterogeneous industrial vehicles with incomplete usage data. Comput. Ind. 2021, 130, 103468.
- Giordano, D.; Pastor, E.; Giobergia, F.; Cerquitelli, T.; Baralis, E.; Mellia, M.; Neri, A.; Tricarico, D. Dissecting a Data-driven Prognostic Pipeline: A Powertrain use case. Expert Syst. Appl. 2021, 180, 115109.
- Tessaro, I.; Mariani, V.C.; Coelho, L.d.S. Machine Learning Models Applied to Predictive Maintenance in Automotive Engine Components. Multidiscip. Digit. Publ. Inst. Proc. 2020, 64, 26.
- Glowacz, A. Acoustic fault analysis of three commutator motors. Mech. Syst. Signal Process. 2019, 133, 106226.
- Panicucci, S.; Nikolakis, N.; Cerquitelli, T.; Ventura, F.; Proto, S.; Macii, E.; Makris, S.; Bowden, D.; Becker, P.; O’Mahony, N.; et al. A Cloud-to-Edge Approach to Support Predictive Analytics in Robotics Industry. Electronics 2020, 9, 492.
- Bousdekis, A.; Lepenioti, K.; Apostolou, D.; Mentzas, G. A Review of Data-Driven Decision-Making Methods for Industry 4.0 Maintenance Applications. Electronics 2021, 10, 828.
- Uhlmann, E.; Pontes, R.P.; Geisert, C.; Hohwieler, E. Cluster identification of sensor data for predictive maintenance in a Selective Laser Melting machine tool. Procedia Manuf. 2018, 24, 60–65.
- Amruthnath, N.; Gupta, T. A research study on unsupervised machine learning algorithms for early fault detection in predictive maintenance. In Proceedings of the 5th International Conference on Industrial Engineering and Applications, Singapore, 26–28 April 2018.
- Kanawaday, A.; Sane, A. Machine learning for predictive maintenance of industrial machines using IoT sensor data. In Proceedings of the 8th IEEE International Conference on Software Engineering and Service Science, Beijing, China, 24–26 November 2017.
- Kulkarni, K.; Devi, U.; Sirighee, A.; Hazra, J.; Rao, P. Predictive maintenance for supermarket refrigeration systems using only case temperature data. In Proceedings of the Annual American Control Conference, Milwaukee, WI, USA, 27–29 June 2018.
- Jimenez-Cortadi, A.; Irigoien, I.; Boto, F.; Sierra, B.; Rodriguez, G. Predictive maintenance on the machining process and machine tool. Appl. Sci. 2020, 10, 224.
- Jie, Y.; Qiang, Y. Integrating hidden Markov models and spectral analysis for sensory time series clustering. In Proceedings of the Fifth IEEE International Conference on Data Mining, Houston, TX, USA, 27–30 November 2005.
- Tan, P.N.; Steinbach, M.; Karpatne, A.; Kumar, V. Introduction to Data Mining, 2nd ed.; Pearson: London, UK, 2018.
- Yeung, K.Y.; Ruzzo, W.L. Details of the adjusted rand index and clustering algorithms, supplement to the paper an empirical study on principal component analysis for clustering gene expression data. Bioinformatics 2001, 17, 763–774.
- Scikit Learn. Adjusted Rand Score. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html (accessed on 7 April 2021).
| Parameter | Description |
|---|---|
| i | When the weld was performed |
| | The welding spot |
| | Time-series describing the current curve |
| | Time-series describing the voltage curve |
| w | Percentage of tip dressing operations with respect to the expected number of tip dressing operations per electrode |
Working Station | # Clamps | # Welds | Calibration Program ID |
---|---|---|---|
Working Station 1 | 1 | 22,746 | 3 |
Working Station 2 | 1 | 26,546 | 4 |
Working Station 2 | 2 | 89,857 | 4 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Giordano, D.; Mellia, M.; Cerquitelli, T. K-MDTSC: K-Multi-Dimensional Time-Series Clustering Algorithm. Electronics 2021, 10, 1166. https://doi.org/10.3390/electronics10101166