Article

Anomaly Detection in Log Files Using Selected Natural Language Processing Methods

Faculty of Electronics and Information Technology, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(10), 5089; https://doi.org/10.3390/app12105089
Submission received: 21 April 2022 / Revised: 16 May 2022 / Accepted: 17 May 2022 / Published: 18 May 2022
(This article belongs to the Special Issue Soft Computing Application to Engineering Design)

Abstract

In this article, we address the problem of detecting anomalies in system log files. Computer systems generate huge numbers of events, which are recorded in event log files. While most entries report normal actions, an unusual entry may indicate a failure or a malware infection. A human operator can easily miss such an entry; therefore, anomaly detection methods are used for this purpose. In our work, we used an approach known from the natural language processing (NLP) domain that operates on so-called embeddings, that is, vector representations of words or phrases. We describe an improved version of the LogEvent2Vec algorithm, proposed in 2020. In contrast to the original version, we propose a significant shortening of the analysis window, which both increased the accuracy of anomaly detection and made further analysis of suspicious sequences much easier. We experimented with various binary classifiers, such as decision trees and multilayer perceptrons (MLPs), using the Blue Gene/L dataset. We showed that selecting an optimal classifier (in this case, an MLP) and a short log sequence gave very good results. The improved version of the algorithm yielded a best F1-score of 0.997, compared to 0.886 for the original version of the algorithm.
Keywords: log analysis; natural language processing; anomaly detection; malware; word embeddings; fastText
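
The pipeline summarized in the abstract (short windows of log events, one vector per window, a binary decision, and evaluation with the F1-score) can be illustrated with a minimal sketch. It is not the authors' implementation. The event identifiers, the two-dimensional toy embeddings, the averaging of event vectors into a window vector, and the rule that a window is anomalous if any of its events is anomalous are all assumptions made for illustration; in the article the embeddings are learned and the decision is made by a trained classifier such as an MLP.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical toy data: event IDs, per-event anomaly flags, hand-made embeddings.
EVENTS = ["E1", "E2", "E1", "E9", "E2", "E1"]
LABELS = [0, 0, 0, 1, 0, 0]
EMBEDDINGS: Dict[str, Vector] = {
    "E1": [1.0, 0.0],
    "E2": [0.0, 1.0],
    "E9": [4.0, 4.0],
}


def split_into_windows(
    events: Sequence[str], labels: Sequence[int], window: int
) -> List[Tuple[List[str], int]]:
    """Cut the event stream into non-overlapping windows of `window` events.

    Assumption: a window is labeled anomalous (1) if any event in it is.
    """
    windows = []
    for start in range(0, len(events), window):
        chunk = list(events[start:start + window])
        label = int(any(labels[start:start + window]))
        windows.append((chunk, label))
    return windows


def window_vector(chunk: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Represent a window as the element-wise mean of its event vectors."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for event in chunk:
        for i, value in enumerate(embeddings[event]):
            total[i] += value
    return [value / len(chunk) for value in total]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for chunk, label in split_into_windows(EVENTS, LABELS, window=2):
        print(chunk, label, window_vector(chunk, EMBEDDINGS))
```

With a short window, each flagged window contains only a few events, which is what makes a suspicious sequence easy to inspect afterwards.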

Share and Cite

MDPI and ACS Style

Ryciak, P.; Wasielewska, K.; Janicki, A. Anomaly Detection in Log Files Using Selected Natural Language Processing Methods. Appl. Sci. 2022, 12, 5089. https://doi.org/10.3390/app12105089
