Combining Log Files and Monitoring Data to Detect Anomaly Patterns in a Data Center
Abstract
1. Introduction
- RQ1. Can we detect service anomalies by considering log messages?
- RQ2. Are there NLP techniques that can automatically provide information on the state of a service?
- RQ3. Can we determine a machine's state by looking at monitoring metrics data?
- RQ4. Can we relate log and monitoring data to determine anomalous behaviour at machine level?
2. Related Works
2.1. Log Data
2.2. Monitoring Data
2.3. Multi-Sources Data
3. Source Data
3.1. Log Files
3.2. Monitoring Metric Files
4. Methodology Overview
Project Implementation
5. Log Anomaly Detection
5.1. Data Preprocessing
5.2. Creation of the Anomaly Dictionary
5.3. Feature Matrix
5.4. Clustering Algorithm
6. Anomaly Detection on Monitoring Metrics
Anomaly Scores Resulting from JumpStarter
7. Combining Anomalies at Machine Level
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Filename | Frequency | Filename | Frequency | Filename | Frequency |
---|---|---|---|---|---|
sudo.log | 378,781 | systemd.log | 107,700 | userhelper.log | 21,380 |
puppet-agent.log | 368,530 | mmfs.log | 72,620 | nslcd.log | 20,544 |
run-parts.log | 365,734 | rsyslogd.log | 70,210 | neutron_linuxbridge.log | 8572 |
crontab.log | 348,896 | kernel.log | 65,938 | runuser.log | 6859 |
crond.log | 347,708 | logrotate.log | 62,531 | cvmfs_x509_validator.log | 6031 |
sshd.log | 303,919 | syslog.log | 47,330 | cvmfs_x509_helper.log | 5399 |
anacron.log | 287,419 | yum.log | 43,301 | srp_daemon.log | 4938 |
postfix.log | 175,558 | fusinv-agent.log | 42,125 | edg-mkgridmap.log | 4083 |
auditd.log | 120,473 | root.log | 37,345 | libvirtd.log | 3328 |
smartd.log | 109,441 | gpfs.log | 31,000 | dbus.log | 3301 |
Category | Metrics |
---|---|
Linux load average | 1 min, 5 min, 15 min |
Memory | swap free, swap total, swap used, available, buffers, cached, dirty, free, used, total |
iostat average | cpu pct iowait, cpu pct nice, cpu pct steal, cpu pct system, cpu pct user, cpu pct idle |
Log Event msg | Log Event Type | Anomaly Key Term |
---|---|---|
.. reset error counters | error | reset |
.. failed create session connection time out | fail | time out |
Log Event msg | .. | Error | .. | Failed | .. | Connection Time | .. |
---|---|---|---|---|---|---|---|
.. reset error counters | .. | 1 | .. | 0 | .. | 0 | .. |
.. failed create session connection time out | .. | 0 | .. | 1 | .. | 1 | .. |
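The binary rows in the table above can be reproduced with a minimal sketch like the one below. It assumes that a column is set to 1 when its term (one or more consecutive words) appears in the preprocessed message. The term list, the whitespace tokenizer, and the function names are illustrative assumptions and not necessarily the implementation used in this work.

```python
# Hypothetical key terms, copied from the column headers of the table above.
TERMS = ["error", "failed", "connection time"]


def tokenize(message):
    """Lowercase a log message and split it on whitespace."""
    return message.lower().split()


def contains_term(tokens, term):
    """Return True if the (possibly multi-word) term occurs as consecutive tokens."""
    words = term.split()
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def feature_matrix(messages, terms=TERMS):
    """Build one binary row per message: 1 if the term occurs, 0 otherwise."""
    rows = []
    for message in messages:
        tokens = tokenize(message)
        rows.append([int(contains_term(tokens, term)) for term in terms])
    return rows


# The two example messages from the table above (leading ".." omitted).
messages = [
    "reset error counters",
    "failed create session connection time out",
]
matrix = feature_matrix(messages)
for message, row in zip(messages, matrix):
    print(row, message)
```

Each row of `matrix` can then serve as the input vector of a clustering step. Matching whole tokens rather than substrings keeps "error" from also firing on unrelated words that merely contain it.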
Date | Time | Host_Name | Process_Name | msg | Cluster |
---|---|---|---|---|---|
21 January 2021 | 09:12:53 | ui-tier1 | screen | pam_krb5[19197]: TGT verified | 0 |
21 January 2021 | 09:12:53 | ui-tier1 | screen | pam_krb5[21445]: got error -1 (Unknown code ____ 255) while obtaining tokens for infn.it | 4 |
21 January 2021 | 09:12:53 | ui-tier1 | screen | pam_krb5[19197]: authentication succeeds for ’username’ ([email protected]) | 0 |
6 August 2020 | 12:27:22 | ui02-virgo | screen | pam_unix(screen:auth): authentication failure | 1 |
6 August 2020 | 12:27:22 | ui02-virgo | screen | pam_krb5[24018]: authentication fails for ’username’ ([email protected]): Authentication failure (Decrypt integrity check failed) | 2 |
6 August 2020 | 12:27:22 | ui02-virgo | screen | pam_ldap(screen:auth): Authentication failure | 1 |
6 August 2020 | 12:27:34 | ui02-virgo | screen | pam_krb5[24018]: authentication fails for ’username’ ([email protected]): Authentication failure (Decrypt integrity check failed) | 2 |
6 August 2020 | 12:27:34 | ui02-virgo | screen | pam_ldap(screen:auth): Authentication failure | 1 |
6 August 2020 | 12:27:50 | ui02-virgo | screen | pam_krb5[24018]: error reading keytab ’FILE:/etc/krb5.keytab’ | 3 |
6 August 2020 | 12:27:50 | ui02-virgo | screen | pam_krb5[25026]: got error -1 (Unknown code ____ 255) while obtaining tokens for infn.it | 4 |
Date | Time | Host_Name | Process_Name | msg | Cluster |
---|---|---|---|---|---|
7 May 2021 | 17:33:50 | tb-cloud-net01 | auditd | Audit daemon is suspending logging due to previously mentioned write error | 1 |
6 May 2021 | 17:33:50 | tb-cloud-net01 | auditd | Audit daemon is suspending logging due to previously mentioned write error | 1 |
Index | Date | Time | Host_Name | Process_Name | msg | Cluster |
---|---|---|---|---|---|---|
0 | 7 May 2021 | 23:32:51 | tb-cloud-net01 | journal | Suppressed 18,739 messages from / | 2 |
1 | 7 May 2021 | 23:53:56 | tb-cloud-net01 | journal | Suppressed 5738 messages from / | 2 |
2 | 7 May 2021 | 23:53:56 | tb-cloud-net01 | journal | Suppressed 5672 messages from /system.slice/boot.mount | 2 |
3 | 7 May 2021 | 23:53:51 | tb-cloud-net01 | journal | Suppressed 19,279 messages from / | 2 |
... | ||||||
229 | 6 May 2021 | 23:32:26 | tb-cloud-net01 | journal | Suppressed 5640 messages from /system.slice/boot.mount | 2 |
Index | Date | Time | Host_Name | Process_Name | msg | Cluster |
---|---|---|---|---|---|---|
0 | 7 May 2021 | 23:54:30 | tb-cloud-net01 | sensu-service | {"level":"error","message":"log file is not writable", "log_file":"/var/log/sensu/sensu-client.log"} | 11 |
1 | 7 May 2021 | 23:54:30 | tb-cloud-net01 | sensu-service | {"level":"warn","message":"config file does not exist or is not readable", "file":"/etc/sensu/config.json"} | 12 |
2 | 7 May 2021 | 23:54:30 | tb-cloud-net01 | sensu-service | {"level":"warn","message":"ignoring config file", "file":"/etc/sensu/config.json"} | 13 |
3 | 7 May 2021 | 23:54:30 | tb-cloud-net01 | sensu-service | {"level":"warn","message":"loading config files from directory", "directory":"/etc/sensu/conf.d"} | 14 |
4 | 7 May 2021 | 23:54:30 | tb-cloud-net01 | sensu-service | {"level":"warn","message":"loading config file", "file":"/etc/sensu/conf.d/smart.json"} | 15 |
... | ||||||
2881 | 7 May 2021 | 02:53:44 | tb-cloud-net01 | sensu-service | {"level":"warn","message":"loading config file", "file":"/etc/sensu/conf.d/subscription_smartctl-os.json"} | 15 |
2882 | 7 May 2021 | 02:53:44 | tb-cloud-net01 | sensu-service | {"level":"warn","message":"config file applied changes", "file":"/etc/sensu/conf.d/subscription_smartctl-os.json","changes":{}} | 16 |
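One way to relate the clustered log events in the tables above to anomalies found in the monitoring metrics of the same machine is to pair events that share a host and fall close together in time. The sketch below illustrates that idea only. The 5-minute window, the tuple layout, the function names, and the metric anomaly rows (including their scores) are hypothetical and are not taken from this work; only the two log rows come from the tables above.

```python
from datetime import datetime, timedelta


def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM:SS' timestamp string."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def combine(log_anomalies, metric_anomalies, window=timedelta(minutes=5)):
    """Pair each anomalous log event with the metric anomalies on the same
    host whose timestamps lie within `window` of the log event."""
    pairs = []
    for host, log_ts, cluster in log_anomalies:
        for metric_host, metric_ts, score in metric_anomalies:
            same_host = host == metric_host
            close_in_time = abs(parse(log_ts) - parse(metric_ts)) <= window
            if same_host and close_in_time:
                pairs.append((host, log_ts, cluster, metric_ts, score))
    return pairs


# (host, timestamp, cluster) rows taken from the tables above.
log_anomalies = [
    ("tb-cloud-net01", "2021-05-07 17:33:50", 1),
    ("tb-cloud-net01", "2021-05-07 23:54:30", 11),
]
# (host, timestamp, anomaly score): invented values, for illustration only.
metric_anomalies = [
    ("tb-cloud-net01", "2021-05-07 23:52:00", 0.93),
    ("tb-cloud-net01", "2021-05-07 12:00:00", 0.88),
]

matches = combine(log_anomalies, metric_anomalies)
for match in matches:
    print(match)
```

With these illustrative inputs, only the sensu-service event at 23:54:30 is paired with a metric anomaly, because no other log event lies within five minutes of one. A wider window trades fewer missed pairings for more coincidental ones.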
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Viola, L.; Ronchieri, E.; Cavallaro, C. Combining Log Files and Monitoring Data to Detect Anomaly Patterns in a Data Center. Computers 2022, 11, 117. https://doi.org/10.3390/computers11080117