Unsupervised Anomaly Detection

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 December 2022) | Viewed by 48106

Special Issue Editor


Prof. Dr. Markus Goldstein
Guest Editor
Faculty of Computer Science, Ulm University of Applied Sciences, 89075 Ulm, Germany
Interests: anomaly detection; data science; machine learning; deep learning; NoSQL

Special Issue Information

Anomaly detection (also known as outlier detection) is the task of finding instances in a dataset which deviate from the norm. Anomalies are of specific interest in many real-world analytic tasks, since they can refer to incidents requiring special attention. Possible application domains include intrusion detection, payment fraud detection, public safety, complex system monitoring, and medical data analytics. Typically, anomaly detection is performed in an unsupervised setting, because no labeled training data are available. This poses many challenges for the research area, including the fair evaluation of algorithms, the smart combination of different algorithms (“outlier ensembles”), and the interpretability of scores.
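As a rough illustration of what "deviating from the norm" can mean without labels, the sketch below scores each value by its distance from the median, measured in units of the median absolute deviation (MAD). This is only one simple baseline and not a method taken from this Special Issue; the function names, the sample readings, and the cut-off of 3.5 are assumptions chosen for the example.

```python
from statistics import median


def mad_scores(values):
    """Return one anomaly score per value: distance from the median in scaled-MAD units."""
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)
    if mad == 0:
        # At least half of the values equal the median; fall back to raw deviations.
        return deviations
    # 1.4826 makes the MAD comparable to a standard deviation for Gaussian data.
    scale = 1.4826 * mad
    return [d / scale for d in deviations]


def flag_anomalies(values, threshold=3.5):
    """Return the indices whose score exceeds the (assumed) threshold."""
    return [i for i, s in enumerate(mad_scores(values)) if s > threshold]


# Hypothetical sensor readings with one obvious outlier at index 5.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2]
print(flag_anomalies(readings))
```

The scores, rather than the flagged indices, are usually the more useful output: they allow instances to be ranked, and choosing the threshold without labels is itself one of the evaluation challenges mentioned above.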

Potential topics of interest for this Special Issue include (but are not limited to) the following areas:

  • New or improved unsupervised anomaly detection algorithms;
  • Deep learning for anomaly detection;
  • Interpretability of scores;
  • Outlier ensembles;
  • Unsupervised anomaly detection datasets for benchmarks and quality assessments;
  • Applications of unsupervised anomaly detection, for example, surveillance, intrusion detection, fraud detection, medical applications or monitoring applications;
  • Anomaly detection in time series, image, video, and text data;
  • Semi-supervised anomaly detection (also known as one-class classification).

Prof. Dr. Markus Goldstein
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • anomaly detection
  • outlier detection
  • novelty detection
  • outlier ensembles
  • evaluation of unsupervised anomaly detection
  • time series anomaly detection
  • deep learning for anomaly detection
  • unsupervised learning
  • one-class classification

Published Papers (13 papers)

Editorial

3 pages, 171 KiB  
Editorial
Special Issue on Unsupervised Anomaly Detection
by Markus Goldstein
Appl. Sci. 2023, 13(10), 5916; https://doi.org/10.3390/app13105916 - 11 May 2023
Cited by 2 | Viewed by 1402
Abstract
Anomaly detection (also known as outlier detection) is the task of finding instances in a dataset which deviate markedly from the norm [...] Full article
(This article belongs to the Special Issue Unsupervised Anomaly Detection)

Research

21 pages, 2996 KiB  
Article
Phase I Analysis of Nonlinear Profiles Using Anomaly Detection Techniques
by Chuen-Sheng Cheng, Pei-Wen Chen and Yu-Tang Wu
Appl. Sci. 2023, 13(4), 2147; https://doi.org/10.3390/app13042147 - 07 Feb 2023
Cited by 1 | Viewed by 1024
Abstract
In various industries, the process or product quality is evaluated by a functional relationship between a dependent variable y and one or a few input variables x, expressed as y = f(x). This relationship is called a profile in the literature. Recently, profile monitoring has received a lot of research attention. In this study, we formulated profile monitoring as an anomaly-detection problem and proposed an outlier-detection procedure for phase I nonlinear profile analysis. The developed procedure consists of three key processes. First, we obtained smoothed nonlinear profiles using the spline smoothing method. Second, we proposed a method for estimating the proportion of outliers in the dataset. A distance-based decision function was developed to identify potential outliers and provide a rough estimate of the contamination rate. Finally, PCA was used as a dimensionality reduction method. An outlier-detection algorithm was then employed to identify outlying profiles based on the estimated contamination rate. The algorithms considered in this study included Local Outlier Factor (LOF), Elliptic Envelope (EE), and Isolation Forest (IF). The proposed procedure was evaluated using a nonlinear profile that has been studied by various researchers. We compared various competing methods based on commonly used metrics such as type I error, type II error, and F2 score. Based on the evaluation metrics, our experimental results indicate that the performance of the proposed method is better than other existing methods. When considering the smallest and hardest-to-detect variation, the LOF algorithm, with the contamination rate determined by the method proposed in this study, achieved type I errors, type II errors, and F2 scores of 0.049, 0.001, and 0.951, respectively, while the performance metrics of the current best method were 0.081, 0.015, and 0.899, respectively. Full article

19 pages, 37316 KiB  
Article
Anomaly Detection Method for Multivariate Time Series Data of Oil and Gas Stations Based on Digital Twin and MTAD-GAN
by Yuanfeng Lian, Yueyao Geng and Tian Tian
Appl. Sci. 2023, 13(3), 1891; https://doi.org/10.3390/app13031891 - 01 Feb 2023
Cited by 13 | Viewed by 3332
Abstract
Due to the complexity of the oil and gas station system, the operational data, with various temporal dependencies and inter-metric dependencies, has the characteristics of diverse patterns, variable working conditions and imbalance, which brings great challenges to multivariate time series anomaly detection. Moreover, the time-series reconstruction information of data from digital twin space can be used to identify and interpret anomalies. Therefore, this paper proposes a digital twin-driven MTAD-GAN (Multivariate Time Series Data Anomaly Detection with GAN) oil and gas station anomaly detection method. Firstly, an operational framework consisting of a digital twin model, a virtual-real synchronization algorithm, an anomaly detection strategy and a realistic station is constructed, and an efficient virtual-real mapping is achieved by embedding a stochastic Petri net (SPN) to describe the operating logic of the station. Secondly, based on the potential correlation and complementarity among time series variables, we present an MTAD-GAN anomaly detection method that combines knowledge graph attention and temporal Hawkes attention mechanisms to compute the reconstruction error of the multivariate time series and judges abnormal samples by a given threshold. The experimental results show that the digital twin-driven anomaly detection method can achieve accurate identification of anomalous data with complex patterns, and the performance of MTAD-GAN anomaly detection is improved by about 2.6% compared with other methods based on machine learning and deep learning, which proves the effectiveness of the method. Full article

25 pages, 3039 KiB  
Article
Is It Worth It? Comparing Six Deep and Classical Methods for Unsupervised Anomaly Detection in Time Series
by Ferdinand Rewicki, Joachim Denzler and Julia Niebling
Appl. Sci. 2023, 13(3), 1778; https://doi.org/10.3390/app13031778 - 30 Jan 2023
Cited by 12 | Viewed by 3548
Abstract
Detecting anomalies in time series data is important in a variety of fields, including system monitoring, healthcare and cybersecurity. While the abundance of available methods makes it difficult to choose the most appropriate method for a given application, each method has its strengths in detecting certain types of anomalies. In this study, we compare six unsupervised anomaly detection methods of varying complexity to determine whether more complex methods generally perform better and if certain methods are better suited to certain types of anomalies. We evaluated the methods using the UCR anomaly archive, a recent benchmark dataset for anomaly detection. We analyzed the results on a dataset and anomaly-type level after adjusting the necessary hyperparameters for each method. Additionally, we assessed the ability of each method to incorporate prior knowledge about anomalies and examined the differences between point-wise and sequence-wise features. Our experiments show that classical machine learning methods generally outperform deep learning methods across a range of anomaly types. Full article

18 pages, 5752 KiB  
Article
Hybrid Machine Learning–Statistical Method for Anomaly Detection in Flight Data
by Sameer Kumar Jasra, Gianluca Valentino, Alan Muscat and Robert Camilleri
Appl. Sci. 2022, 12(20), 10261; https://doi.org/10.3390/app122010261 - 12 Oct 2022
Cited by 3 | Viewed by 1753
Abstract
This paper investigates the use of an unsupervised hybrid statistical–local outlier factor algorithm to detect anomalies in time-series flight data. Flight data analysis is an activity carried out by airlines primarily as a means of improving the safety and operation of their fleet. Traditionally, this is performed by checking for exceedances of pre-set limits on the flight data parameters. However, this method highlights single events during a flight, making this analysis laborious. The process also fails to establish trends or reflect potential unknown hazards. This research took advantage of machine learning techniques to recognize patterns in large datasets by implementing the local outlier factor (LOF). In order to minimize human input, a statistical approach was adopted to establish the threshold value above which the flights are considered to be anomalous and to interpret the scores. This paper shows that LOF quantifies the degree of outlier-ness of a point rather than assigning it a binary inlier or outlier label, as in the case of clustering algorithms. Thus, with LOF, for the first time, we demonstrated that in the aviation industry, anomalous flights could not only be identified but also be given an anomaly score to compare two anomalous flights in an unsupervised manner. Furthermore, LOF helps to track anomalous behavior in time during the flight. This is insightful when a flight is abnormal for only a few seconds or a short duration. For the first time, we attempted to detect flight parameters responsible for anomalous behavior or at least give direction to human experts looking for the cause of abnormal behavior. This was all analyzed with real-life flight data in an unsupervised manner in contrast to simulated data. Full article

14 pages, 477 KiB  
Article
MST-VAE: Multi-Scale Temporal Variational Autoencoder for Anomaly Detection in Multivariate Time Series
by Tuan-Anh Pham, Jong-Hoon Lee and Choong-Shik Park
Appl. Sci. 2022, 12(19), 10078; https://doi.org/10.3390/app121910078 - 07 Oct 2022
Cited by 2 | Viewed by 4224
Abstract
In IT monitoring systems, anomaly detection plays a vital role in detecting unexpected behaviors and alerting system operators in a timely manner. With the growth of signal data in both volume and dimensionality during operation, unsupervised learning turns out to be a good solution for flagging anomalies because it works well with unlabeled data. In recent years, the autoencoder, an unsupervised learning technique, has gained much attention because of its robustness. An autoencoder first compresses input data into a lower-dimensional latent representation, which captures normal patterns; the compressed data are then reconstructed back to the input form to detect abnormal data. In this paper, we propose a practical unsupervised learning approach using Multi-Scale Temporal convolutional kernels with Variational AutoEncoder (MST-VAE) for anomaly detection in multivariate time series data. Our key observation is that combining short-scale and long-scale convolutional kernels to extract various temporal information of the time series can enhance the model performance. Extensive empirical studies on five real-world datasets demonstrate that MST-VAE can outperform baseline methods in effectiveness and efficiency. Full article

19 pages, 808 KiB  
Article
Functional Outlier Detection by Means of h-Mode Depth and Dynamic Time Warping
by Álvaro Rollón de Pinedo, Mathieu Couplet, Bertrand Iooss, Nathalie Marie, Amandine Marrel, Elsa Merle and Roman Sueur
Appl. Sci. 2021, 11(23), 11475; https://doi.org/10.3390/app112311475 - 03 Dec 2021
Cited by 4 | Viewed by 1689
Abstract
Finding outliers in functional infinite-dimensional vector spaces is widely present in the industry for data that may originate from physical measurements or numerical simulations. An automatic and unsupervised process of outlier identification can help ensure the quality of a dataset (trimming), validate the results of industrial simulation codes, or detect specific phenomena or anomalies. This paper focuses on data originating from expensive simulation codes to take into account the realistic case where only a limited quantity of information about the studied process is available. A detection methodology based on different features, such as h-mode depth or the dynamic time warping, is proposed to evaluate the outlyingness both in the magnitude and shape senses. Theoretical examples are used to identify pertinent feature combinations and showcase the quality of the detection method with respect to state-of-the-art methodologies of detection. Finally, we show the practical interest of the method in an industrial context thanks to a nuclear thermal-hydraulic use case and how it can serve as a tool to perform sensitivity analysis on functional data. Full article

23 pages, 4186 KiB  
Article
Time Series Anomaly Detection for KPIs Based on Correlation Analysis and HMM
by Zijing Shang, Yingjun Zhang, Xiuguo Zhang, Yun Zhao, Zhiying Cao and Xuejie Wang
Appl. Sci. 2021, 11(23), 11353; https://doi.org/10.3390/app112311353 - 30 Nov 2021
Cited by 6 | Viewed by 3087
Abstract
KPIs (Key Performance Indicators) in distributed systems may involve a variety of anomalies, which will lead to system failure and huge losses. Detecting KPI anomalies in the system is very important. This paper presents a time series anomaly detection method based on correlation analysis and HMM. Correlation analysis is used to obtain the correlation between abnormal KPIs in the system, thereby reducing the false alarm rate of anomaly detection. The HMM (Hidden Markov Model) is used for anomaly detection by finding the close relationship between abnormal KPIs. In our correlation analysis of abnormal KPIs, firstly, the time series prediction model (1D-CNN-TCN) is proposed. The residual sequence is obtained by calculating the residual between the predicted value and the actual value. The residual sequence can highlight the abnormal segment in each data point and improve the accuracy of anomaly screening. According to the obtained residual sequence, these abnormal KPIs are preliminarily screened out from the historical data. Next, KPI correlation analysis is performed, and the correlation score is obtained by adding a sliding window onto the obtained anomaly index residual sequence. The correlation analysis based on the residual sequence can eliminate the interference of the original data fluctuation itself. Then, a correlation matrix of abnormal KPIs is constructed using the obtained correlation scores. In anomaly detection, the constructed correlation matrix is processed to obtain the adaptive parameters of the HMM model, and the trained HMM is used to quickly discover the abnormal KPI that may cause a KPI anomaly. Experiments on public data sets show that the method obtains good results. Full article
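The residual and sliding-window steps described in this abstract can be pictured with a small sketch. It is not the authors' implementation: the 1D-CNN-TCN predictor is replaced by predictions passed in as plain lists, the windowed Pearson correlation and the "largest absolute value" aggregation are assumptions, and all function names are hypothetical.

```python
from math import sqrt


def residuals(actual, predicted):
    """Residual sequence: actual value minus predicted value at each time step."""
    return [a - p for a, p in zip(actual, predicted)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def sliding_correlation(res_a, res_b, window):
    """Correlation of two residual sequences inside each sliding window."""
    steps = len(res_a) - window + 1
    return [pearson(res_a[i:i + window], res_b[i:i + window]) for i in range(steps)]


def correlation_score(res_a, res_b, window):
    """Assumed aggregation: the largest absolute windowed correlation."""
    return max(abs(c) for c in sliding_correlation(res_a, res_b, window))


# Hypothetical KPI pair whose residuals spike at the same time step.
res_a = residuals([5, 5, 6, 10, 6, 5], [5, 5, 5, 5, 5, 5])
res_b = residuals([8, 8, 10, 18, 10, 8], [8, 8, 8, 8, 8, 8])
print(correlation_score(res_a, res_b, window=3))
```

Computing such a score for every pair of KPIs would give a symmetric matrix of the kind the abstract calls a correlation matrix; how the paper turns that matrix into HMM parameters is not specified here and is left out of the sketch.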

18 pages, 5921 KiB  
Article
Semi-Supervised Time Series Anomaly Detection Based on Statistics and Deep Learning
by Jehn-Ruey Jiang, Jian-Bin Kao and Yu-Lin Li
Appl. Sci. 2021, 11(15), 6698; https://doi.org/10.3390/app11156698 - 21 Jul 2021
Cited by 22 | Viewed by 5886
Abstract
Thanks to the advance of novel technologies, such as sensors and Internet of Things (IoT) technologies, large amounts of data are continuously gathered over time, resulting in a variety of time series. A semi-supervised anomaly detection framework, called Tri-CAD, for univariate time series is proposed in this paper. Based on the Pearson product-moment correlation coefficient and Dickey–Fuller test, time series are first categorized into three classes: (i) periodic, (ii) stationary, and (iii) non-periodic and non-stationary time series. Afterwards, different mechanisms using statistics, wavelet transform, and deep learning autoencoder concepts are applied to different classes of time series for detecting anomalies. The performance of the proposed Tri-CAD framework is evaluated by experiments using three Numenta anomaly benchmark (NAB) datasets. The performance of Tri-CAD is compared with that of related methods, such as STL, SARIMA, LSTM, LSTM with STL, and ADSaS. The comparison results show that Tri-CAD outperforms the others in terms of precision, recall, and F1-score. Full article

13 pages, 397 KiB  
Article
Online Forecasting and Anomaly Detection Based on the ARIMA Model
by Viacheslav Kozitsin, Iurii Katser and Dmitry Lakontsev
Appl. Sci. 2021, 11(7), 3194; https://doi.org/10.3390/app11073194 - 02 Apr 2021
Cited by 33 | Viewed by 5155
Abstract
Real-time diagnostics of complex technical systems such as power plants are critical to keep the system in its working state. An ideal diagnostic system must detect any fault in advance and predict the future state of the technical system, so predictive algorithms are used in the diagnostics. This paper proposes a novel, computationally simple algorithm based on the Auto-Regressive Integrated Moving Average model to solve anomaly detection and forecasting problems. The good performance of the proposed algorithm was confirmed in numerous numerical experiments for both anomaly detection and forecasting problems. Moreover, a description of the Autoregressive Integrated Moving Average Fault Detection (ARIMAFD) library, which includes the proposed algorithms, is provided in this paper. The developed algorithm proves to be an efficient algorithm and can be applied to problems related to anomaly detection and technological parameter forecasting in real diagnostic systems. Full article

24 pages, 578 KiB  
Article
Low-Cost Active Anomaly Detection with Switching Latency
by Fengfan Qin, Hui Feng, Tao Yang and Bo Hu
Appl. Sci. 2021, 11(7), 2976; https://doi.org/10.3390/app11072976 - 26 Mar 2021
Cited by 3 | Viewed by 1369
Abstract
Consider the problem of detecting anomalies among multiple stochastic processes. Each anomaly incurs a cost per unit time until it is identified. Due to resource constraints, the decision-maker can select one process to probe and obtain a noisy observation. Each observation and each switch between processes incurs a certain time delay. Our objective is to find a sequential inference strategy that minimizes the expected cumulative cost incurred by all the anomalies during the entire detection procedure under the error constraints. We develop a deterministic policy to solve the problem within the framework of the active hypothesis testing model. We prove that the proposed algorithm is asymptotically optimal in terms of minimizing the expected cumulative costs when the ratio of the single-switching delay to the single-observation delay is much smaller than the declaration threshold and is order-optimal when the ratio is comparable to the threshold. Not only is the proposed policy optimal in the asymptotic regime, but numerical simulations also demonstrate its excellent performance in the finite regime. Full article

14 pages, 556 KiB  
Article
Outlier Detection with Explanations on Music Streaming Data: A Case Study with Danmark Music Group Ltd.
by Jonas Herskind Sejr, Thorbjørn Christiansen, Nicolai Dvinge, Dan Hougesen, Peter Schneider-Kamp and Arthur Zimek
Appl. Sci. 2021, 11(5), 2270; https://doi.org/10.3390/app11052270 - 04 Mar 2021
Cited by 6 | Viewed by 2694
Abstract
In the digital marketplaces, businesses can micro-monitor sales worldwide and in real-time. Due to the vast amounts of data, there is a pressing need for tools that automatically highlight changing trends and anomalous (outlier) behavior that is potentially interesting to users. In collaboration with Danmark Music Group Ltd. we developed an unsupervised system for this problem based on a predictive neural network. To make the method transparent to developers and users (musicians, music managers, etc.), the system delivers two levels of outlier explanations: the deviation from the model prediction, and the explanation of the model prediction. We demonstrate both types of outlier explanations to provide value to data scientists and developers during development, tuning, and evaluation. The quantitative and qualitative evaluation shows that the users find the identified trends and anomalies interesting and worth further investigation. Consequently, the system was integrated into the production system. We discuss the challenges in unsupervised parameter tuning and show that the system could be further improved with personalization and integration of additional information, unrelated to the raw outlier score. Full article

Review

23 pages, 1889 KiB  
Review
A Review of Machine Learning and Deep Learning Techniques for Anomaly Detection in IoT Data
by Redhwan Al-amri, Raja Kumar Murugesan, Mustafa Man, Alaa Fareed Abdulateef, Mohammed A. Al-Sharafi and Ammar Ahmed Alkahtani
Appl. Sci. 2021, 11(12), 5320; https://doi.org/10.3390/app11125320 - 08 Jun 2021
Cited by 62 | Viewed by 9435
Abstract
Anomaly detection has gained considerable attention in the past couple of years. Emerging technologies, such as the Internet of Things (IoT), are known to be among the most critical sources of data streams that produce massive amounts of data continuously from numerous applications. Examining these collected data to detect suspicious events can reduce functional threats and avoid unseen issues that cause downtime in the applications. Due to the dynamic nature of the data stream characteristics, many unresolved problems persist. In the existing literature, methods have been designed and developed to evaluate certain anomalous behaviors in IoT data stream sources. However, there is a lack of comprehensive studies that discuss all the aspects of IoT data processing. Thus, this paper attempts to fill this gap by providing a complete image of various state-of-the-art techniques on the major problems and core challenges in IoT data. The nature of data, anomaly types, learning mode, window model, datasets, and evaluation criteria are also presented. Research challenges related to data evolving, feature-evolving, windowing, ensemble approaches, nature of input data, data complexity and noise, parameters selection, data visualizations, heterogeneity of data, accuracy, and large-scale and high-dimensional data are investigated. Finally, the challenges that require substantial research efforts and future directions are summarized. Full article
