Special Issue on Unsupervised Anomaly Detection

Anomaly detection (also known as outlier detection) is the task of finding instances in a dataset which deviate markedly from the norm [...]


Introduction to Anomaly Detection 1
Anomaly detection (also known as outlier detection) is the task of finding instances in 2 a dataset which deviate markedly from the norm. Anomalies are often of specific interest in 3 many real-world analytic tasks, since they can refer to incidents requiring special attention. 4 The detection of suspicious activity can be helpful in both, post-incident investigation, or, 5 in early-warning setups, where anomalies are detected in recent up-to-date datasets [11] or 6 even in streaming data [10]. 7 Among others, intrusion detection [7,8,10,12], payment fraud detection, public safety, 8 complex system monitoring[1, 2,4,6,9], and medical data analytics are possible application 9 domains. 10 11 In the context of supervised anomaly detection, a labeled dataset and an established 12 classification algorithm, which can deal well with unbalanced classes, can be used. Since 13 anomalies are often not similar to each other and also often unknown during training, this 14 setup is barely used in practice. In semi-supervised anomaly detection, a model is learned 15 with the normal class only. Later, anomalies can be detected using deviations from that 16 model. This setup is also known as one-class machine learning or novelty detection. Lastly, 17 in unsupervised anomaly detection, no training is done at all -the data is solely analyzed 18 according to its intrinsic structure and anomalies are often scored according to their degree 19 of outlierliness. This special issue addresses primarily these algorithms, whereas many of 20 them can also be used in a semi-supervised setup [8].

22
From an application point of view, an anomaly can be a single record within a mul-23 tivariate dataset, which is also known as a point anomaly detection problem. If the context 24 time needs to be taken into consideration to detect the outliers, the task is also known as 25 contextual anomaly detection. Lastly, a collective anomaly is the scenario, where multiple in-26 stances can form altogether an anomaly. A collective anomaly is the most complex scenario 27 and can also be at the same time a contextual anomaly detection problem. Further details 28 about this taxonomy can also be found in Al-amri et al. [12]. To solve a contextual anomaly 29 detection task, the data can be transformed into a point anomaly detection problem[1,4] or 30 (multivariate) time series anomaly detection algorithm can be applied [2,3,5,[7][8][9]. for detecting outliers among process or product quality profiles. In this context, a profile 34 is a nonlinear relationship between input variables and an output variable, mapping a 35 collective or contextual anomaly detection task to a point anomaly detection problem.

36
A stochastic Petri net digital twin was used by Lian et al. [2] to detect complex 37 collective outliers for oil and gas station operation behavior based on a multivariate time 38 series anomaly detection using a GAN. According to the authors, the model could also be 39 used to explain anomalies utilizing the reconstructed information.

40
A comparison of traditional and deep learning unsupervised algorithms for time series 41 anomaly detection was carried out by Rewicki et al. [3], also focusing on the different types 42 of anomalies. Interestingly the classical machine learning approaches generally outperform 43 the deep learning based algorithms.

44
A very compelling application is the monitoring of flight parameter data in the article 45 by Jasra et al. [4]. The authors detected anomalous flights after preprocessing using LOF 46 with an automatically determined threshold among recorded and simulated flight data. 47 It is worth to point out that the method could also potentially be used in a near-realtime 48 setting during the flight.

49
A new approach for detecting anomalies in multivariate time series data was proposed 50 by Pham et al. [5] using Multi-Scale Temporal convolutional kernels with Variational 51 AutoEncoder (MST-VAE). As a result, short-scale and long-scale convolutional kernels 52 should be combined to improve the overall model performance.

53
Furthermore, a new interesting research direction is addressed by Rollón de Pinedo 54 et al. [6]: Functional outlier detection, where anomalies in terms of magnitude and shape 55 are detected based on using h-mode depth and dynamic time warping. The authors also 56 investigate a not very commonly used but interesting application scenario: The detection 57 of anomalies within data originating from costly simulations.

58
The analysis of monitored KPI data in distributed systems to detect abnormal system 59 states is carried out by Shang et al. [7]. Here, a correlation analysis of multivariate time 60 series is performed using a Hidden Markov Model. 61 Jiang et al. [8] proposed a new semi-supervised anomaly detection framework for 62 univariate time series entitled Tri-CAD. It categorizes time series into three types and uses 63 different models for each. The different models include statistics, wavelet transforms as 64 well as a deep autoencoder. 65 Furthermore, an improvement of the well-known ARIMA model to detect anomalies 66 in univariate time series more efficiently and continuously has been proposed by Kozitsin 67 et al. [9]. The focus of the new algorithm includes a better performance compared to the 68 original ARIMA as well as the ability to adopt the model over time to changing underlying 69 data distributions.

70
Anomaly detection utilizing multiple parallel data streams in the form of multiple 71 stochastic processes has been addressed by Qin et al. [10], introducing a low-cost determin-72 istic policy for detecting anomalous processes. The authors point out that their proposed 73 algorithm is an ideal candidate for the challenge of anomaly detection during DOS attacks 74 in intrusion detection systems.

75
Herskind Sejr et al. [11] developed an application for detecting anomalies in music 76 streaming behavior data, whereas an explanation of the outliers was included such that 77 deviations from the expected time series forecast are more informative. Interestingly, 78 anomalies are presented to the users e.g. studios or musicians in this application scenario, 79 not only to system administrators as commonly done.

80
The challenges of anomaly detection among IoT data and possible future directions 81 are discussed in the article by Al-amri et al. [12] in a comprehensive review. Besides the 82 problems of common unsupervised anomaly detection tasks, the authors especially identify 83 feature-evolving data streams as a core point for IoT anomaly detection tasks in the future. 84

85
In this special issue "Unsupervised Anomaly Detection" of Applied Sciences, a total 86 of 12 papers (11 research articles and one review paper) are published and show recent 87 advances in the area. Besides many contributions addressing particular practical anomaly 88 detection tasks on real-world challenges, also some papers improve well-known unsuper-89 vised algorithms. Additionally, it is worth pointing out two articles focusing on current 90 trending research directions: Explainable Anomaly Detection and Functional Anomaly Detec-91 tion.