Accurately detecting anomalies and timely interventions are critical for cloud application maintenance. Traditional methods for performance anomaly detection based on thresholds and rules work well for simple key performance indicator (KPI) monitoring. Unfortunately, it is difficult to find the appropriate threshold levels when there are significant differences between KPI values at different times during the day or when there are significant fluctuations stemming from different usage patterns. Therefore, anomaly detection presents a challenge for all types of temporal data, particularly when non-stationary time series have special adaptability requirements or when the nature of potential anomalies is vaguely defined or unknown. To address this limitation, we propose a novel anomaly detector (called KPI-TSAD) for time-series KPIs based on supervised deep-learning models with convolution and long short-term memory (LSTM) neural networks, and a variational auto-encoder (VAE) oversampling model was used to address the imbalanced classification problem. Compared with other related research on Yahoo’s anomaly detection benchmark datasets, KPI-TSAD exhibited better performance, with both its accuracy and F-score exceeding 0.90 on the A1benchmark and A2Benchmark datasets. Finally, KPI-TSAD continued to perform well on several KPI monitoring datasets from real production environments, with the average F-score exceeding 0.72.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited