Sensors
  • Article
  • Open Access

23 May 2023

Unsupervised Anomaly Detection for Cars CAN Sensors Time Series Using Small Recurrent and Convolutional Neural Networks

1 Renault Software Labs, 2600 Route des Crêtes, Sophia Antipolis, 06560 Valbonne, France
2 LEAT (CNRS), Bât. Forum, Campus SophiaTech, 930 Route des Colles, 06903 Sophia Antipolis, France
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Anomaly Detection and Monitoring for Networks and IoT Systems

Abstract

Predictive maintenance in the car industry is an active field of research for machine learning and anomaly detection. As the industry moves towards more connected and electric vehicles, cars produce a growing amount of time series data from their sensors. Unsupervised anomaly detectors are therefore well suited to process these complex multidimensional time series and highlight abnormal behaviors. We propose to use unsupervised anomaly detectors based on recurrent and convolutional neural networks with simple architectures on real multidimensional time series generated by car sensors and extracted from the Controller Area Network (CAN) bus. Our method is then evaluated against known, specific anomalies. Since the computational cost of machine learning algorithms is a growing concern in embedded scenarios such as in-car anomaly detection, we also focus on making the anomaly detectors as small as possible. Using a state-of-the-art methodology that combines a time series predictor with a prediction-error-based anomaly detector, we show that roughly the same anomaly detection performance can be obtained with smaller predictors, reducing parameters and calculations by up to 23% and 60%, respectively. Finally, we introduce a method to correlate variables with specific anomalies by using the anomaly detector's results and labels.

2. Car Time Series Extracted from the CAN Bus

In this section, we present the process of data generation in the car and how it can be exploited for machine learning purposes.

2.1. The Data and the Car

A car is a complex assembly of mechanical and electrical parts. The control and sensing of all those parts are handled by electronic control units (ECUs). ECUs are microcontroller-based units that can receive, process, and send information from and to other ECUs. Specifically, information transits through a controller area network (CAN) bus. For example, one ECU might control the oil temperature while receiving and processing information about the motor's rotational speed sent from the ECU managing the motor area. In a modern car, tens of ECUs can process information simultaneously while the car is running. The information transiting through the CAN bus is thus a very good reflection of the car's global state. To extract this information, a spy system can listen to the traffic on the CAN bus and save it in a structured manner, associating each sample with its variable and timestamp. The resulting data can then be sent to a cloud database and fed into an anomaly detector, as shown in Figure 1.
Figure 1. Flow chart presenting the process of data generation in a modern car leading to anomaly detection. Within the car, ECUs handle sensors and communicate some of their values through the CAN bus. That CAN bus can be read in order to extract those sensors' data, which can then be sent to the cloud or exploited internally by another ECU. The resulting time series are then resampled, a prediction is emitted after each sample, and an anomaly is either detected or not given the prediction error.

2.2. Our Dataset

The dataset used in this paper consists of recordings of 486 variables gathered from an Alpine Renault car during driving tests conducted on circuits. The tests took place over the course of 4 months, resulting in 17 GB of data recorded at a sampling rate of 10 Hz. To simplify the training and evaluation of the model, we filtered variables with respect to their complexity and retained only the richest ones. Many variables correspond to commands with only one to three discrete values that are rarely triggered, so most of these signals show very few variations through time and are therefore poorly informative; these are the variables we filtered out. This process resulted in an 85-variable system. As the recordings do not have the same temporal length, the dataset can be seen as a set of multivariate time series of different lengths. Finally, we resampled the data to 1 Hz by averaging in order to save time during training and testing. This dataset is not public, but it is necessary for applying state-of-the-art anomaly detectors to a real case study. Nevertheless, Table 1 compares different state-of-the-art models, as well as the ones we have chosen, on an open-source benchmark in order to validate our method and implementation.
Table 1. Comparison of state-of-the-art anomaly detectors on the public Yahoo benchmark [32]. The LSTM (ours) corresponds to the LSTM with 50 cells and two layers, as presented in Section 4, using the semi-supervised threshold. A1 to A4 correspond to the different groups of time series within the Yahoo benchmark.
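As an aside on the preprocessing described above, the following is a minimal resampling sketch (pandas and the `timestamp` column name are our assumptions; this illustrates the averaging step, not the authors' actual pipeline):

```python
import pandas as pd

def to_1hz(raw: pd.DataFrame) -> pd.DataFrame:
    """Average a 10 Hz multivariate CAN recording down to 1 Hz."""
    idx = pd.to_datetime(raw["timestamp"], unit="s")
    return raw.drop(columns="timestamp").set_index(idx).resample("1s").mean()
```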

2.3. Labels

The utilized labels consist of 3683 timestamps at which an anomaly linked to the oil pressure was noticed. They were automatically generated following expert rules over five key variables, namely the oil pressure, the oil temperature, the mean effective torque, the engine RPM, and the engine coolant temperature. The abnormality is here defined as problematic oil pressure behavior. It is worth noting that this highly specific abnormal behavior does not cover all possible abnormal behaviors of the car and thus provides a limited evaluation of our unsupervised anomaly detector.

3. System Model and Problem Formulation

3.1. System Model

The overall system model is represented in Figure 2. We considered a time series $T$ of $n$ samples, each sample being an indexed vector of $m$ dimensions representing the variables, with $n \in \mathbb{N}$ and $m \in \mathbb{N}$, such that
$T = \{x_0, x_1, \dots, x_{n-1}\}.$
Figure 2. Process of anomaly detection.
Each time series sample $x_t$, $t \in \{0, \dots, n-1\}$, is potentially corrupted by an anomaly that modifies its "normal behavior", in the sense that the value registered at this point does not fit the usual pattern of the time series. The goal of the anomaly detector is then to associate each sample $x_t$ with an estimated binary anomaly label $\hat{a}_t \in \{0,1\}^m$ indicating whether said sample is corrupted by an anomaly. These labels form the time series
$A = \{\hat{a}_0, \hat{a}_1, \dots, \hat{a}_{n-1}\}.$

3.2. Unsupervised Anomaly Detection

In this section, we review our baseline algorithm, a stacked LSTM-based anomaly detector from References [17,19]. We added the anomaly likelihood method to the anomaly detection process, as proposed in Reference [7]. This method allows the anomaly detector to focus on dynamic variations of the prediction error. Since different variables at different moments can exhibit behaviors that are more or less difficult to predict, the average prediction error can drift, which makes the use of a single fixed threshold tricky. The anomaly likelihood adapts nicely to this complex behavior, as it evaluates the local probability of a given prediction error occurring, and it also prevents random point anomalies from raising alarms. Finally, we explain below how we acquired all of the parameters.
In Table 1, we have gathered the anomaly detection results reported in References [9,13] on the public Yahoo benchmark [32]. We added our LSTM with 50 cells and two layers, as presented in Section 4, using semi-supervised threshold optimization, just as in Reference [10]. For this model, we used 100 epochs, a learning rate of $10^{-3}$, and a time window of $w = 100$. We can see that the LSTM and CNN performed well compared to some of the more complex models, such as generative adversarial networks (TadGAN) or bio-inspired models. As we wanted to use small models with simple architectures in order to reduce computational costs while still providing good performance, we chose to exploit LSTM and CNN predictors in this paper. We also chose to compare the GRU [22] with the LSTM, as it is a simplified version of the LSTM. All of these predictors are described below.

3.3. LSTM Predictor

The first block of the anomaly detector is an LSTM neural network, an RNN that has proven very efficient at capturing the temporal dependencies of a time series using internal memory [15]. The LSTM cell is described in Figure 3. Each cell is recurrent and outputs two vectors, $c_t$ and $h_t$. One LSTM layer can be made of several cells. In the following sections, we use the notation LSTM (n,m) to refer to an LSTM with n cells in the first layer and m cells in the second layer.
Figure 3. The Long Short Term Memory cell. $h_t$ and $c_t$ are the hidden and context vectors, respectively. $\sigma$ stands for the sigmoid activation function.
The overall stacked LSTM we used has two layers. It takes a sample $x_t$ as input and predicts the next sample $x_{t+1}$ while using its internal state recurrently. A simple linear layer converts the LSTM output vector $h_t$ into a prediction of $x_{t+1}$, as shown in Figure 4.
Figure 4. Organization/architecture of the Long Short Term Memory.
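For concreteness, here is a minimal PyTorch sketch of this predictor (a hedged illustration consistent with Figure 4, not the authors' exact code; PyTorch is assumed since pytorch-OpCounter is used for profiling in Section 4.2):

```python
import torch.nn as nn

class LSTMPredictor(nn.Module):
    """Stacked LSTM that predicts x_{t+1} from x_t through a linear head."""
    def __init__(self, n_vars: int = 85, hidden: int = 50, layers: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(n_vars, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, n_vars)  # maps h_t to a prediction of x_{t+1}

    def forward(self, x, state=None):
        # x: (batch, time, n_vars); the internal state carries temporal context
        h, state = self.lstm(x, state)
        return self.head(h), state
```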
In this paper, we also compare the LSTM with two other models.

3.4. CNN Predictor

A convolutional neural network (CNN) is a well-known machine learning algorithm that has proven to be very powerful, especially for image classification. In Reference [10], it was used to predict time series and detect anomalies in the same way we propose in this paper, and it is one of the state-of-the-art approaches to unsupervised anomaly detection. The limitation of a CNN lies in its inability to integrate temporal information beyond its convolutional window. As can be seen in Figure 5, the time series is cut into windows over which the convolutions are run. The model can then predict the next time series sample $x_{t+w+1}$ given the window $\{x_t, \dots, x_{t+w}\}$. This CNN model has two layers, each made up of a 1D convolution followed by a ReLU activation function. $l_t$ and $h_t$ are intermediate hidden vectors generated by the model. In the following sections, we use the notation CNN (n,m) to refer to a CNN with n channels in the first layer and m channels in the second layer.
Figure 5. Organization/architecture of the Convolutional Neural Network.
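A corresponding sketch of the windowed CNN predictor (the kernel size and padding are assumptions, as they are not specified in the text):

```python
import torch.nn as nn

class CNNPredictor(nn.Module):
    """Two Conv1d+ReLU blocks over a window of w samples, then a linear head."""
    def __init__(self, n_vars: int = 85, channels=(32, 32), w: int = 100, k: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_vars, channels[0], kernel_size=k, padding=k // 2),
            nn.ReLU(),
            nn.Conv1d(channels[0], channels[1], kernel_size=k, padding=k // 2),
            nn.ReLU(),
        )
        self.head = nn.Linear(channels[1] * w, n_vars)

    def forward(self, x):
        # x: (batch, n_vars, w) -> prediction of the sample following the window
        return self.head(self.net(x).flatten(1))
```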

3.5. GRU Predictor

Finally, we propose to use a gated recurrent unit (GRU) network [22], which is a simplified version of the LSTM with fewer gates in each cell forming a lighter RNN. Its architecture is exactly the same as that of the LSTM shown in Figure 4. Depending on the complexity of the task, a GRU can sometimes perform just as well or even better than an LSTM while having fewer parameters. We show such a result very clearly in Table 2. In the following sections, we use the notation GRU (n,m) to refer to a GRU with n cells in the first layer and m cells in the second layer.
Table 2. Performance evaluation comparison between the LSTM [17], GRU, and CNN [10] models. CNN (n,m) refers to a CNN with n channels in the first layer and m channels in the second layer, while LSTM (n,m) and GRU (n,m) refer to an LSTM or GRU with n cells in the first layer and m cells in the second layer.
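Mirroring the LSTM sketch above, a GRU predictor (again a hedged sketch) swaps only the recurrent cell, which is what yields the parameter savings reported in Table 2:

```python
import torch.nn as nn

class GRUPredictor(nn.Module):
    """Same architecture as the LSTM predictor sketch; only the cell differs."""
    def __init__(self, n_vars: int = 85, hidden: int = 50, layers: int = 2):
        super().__init__()
        self.gru = nn.GRU(n_vars, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, n_vars)

    def forward(self, x, state=None):
        h, state = self.gru(x, state)
        return self.head(h), state
```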

3.6. Anomaly Detection

During training, for each sample $x_t$, the predictor output a prediction $\hat{x}_{t+1}$. The mean squared error (MSE) over all dimensions and time steps was then computed between the predicted and true samples of one time series:
$\mathrm{MSE} = \frac{1}{nm} \sum_{t=0}^{n-1} \left\| (\hat{x}_t - x_t)^2 \right\|_1$
where $\| \cdot \|_1$ stands for the 1-norm of a vector and the square is applied element-wise.
This error was used as the loss function to optimize the predictor via gradient descent. With this method and an LSTM or a GRU, we were able to take full advantage of having time series of different lengths in the dataset: no information was lost to padding or cutting the time series, and the predictor adapted efficiently to the variation in time series length.
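A minimal training sketch under this setup (a hedged illustration: the list-of-recordings layout and per-recording updates are our assumptions; the model follows the predictor sketches above, returning a prediction and a state), showing how recordings of different lengths are consumed without padding or cutting:

```python
import torch

def train(model, recordings, epochs=300, lr=1e-4):
    """Optimize a next-step predictor with the MSE loss defined above."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for ts in recordings:                       # ts: (n_i, m), lengths vary
            x = ts.unsqueeze(0)                     # (1, n_i, m)
            pred, _ = model(x[:, :-1])              # predict x_{t+1} from x_t
            loss = ((pred - x[:, 1:]) ** 2).mean()  # MSE over time and variables
            opt.zero_grad()
            loss.backward()
            opt.step()
```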
In a vehicle, most of the variables that we exploited here are correlated; for example, a failure in the oil pressure impacts the engine temperature. We made use of this assumption and provided all $m$ variables to the predictor together in order to help it capture more complex temporal and cross-variable correlations and better predict the signal.
By comparing the true values of the $t$-th sample $x_t$ with the predicted values, the squared prediction error for the $t$-th sample was formed as
$e_t = (\hat{x}_t - x_t)^2$
We then obtained the time series of the squared prediction errors:
$E = \{e_0, e_1, \dots, e_{n-1}\}$
We then applied the anomaly likelihood method to the time series $E$, as in Reference [7], so as to obtain the anomaly scores $s_t$ and labels $\hat{a}_t$. Note that we adapted its expression here for use with multidimensional time series.
First, we computed the parameters $\mu_t$ and $\sigma_t^2$ of a sliding normal distribution over $w$ samples of $E$:
$\mu_t = \frac{1}{w} \sum_{i=0}^{w-1} e_{t-i}$
$\sigma_t^2 = \frac{1}{w-1} \sum_{i=0}^{w-1} (e_{t-i} - \mu_t)^2$
We then used the Gaussian tail probability (Q-function [34]) of the recent sliding average of $W$ samples of prediction errors in order to obtain the anomaly score
$s_t = 1 - Q\left(\frac{\tilde{\mu}_t - \mu_t}{\sigma_t}\right)$
where
$\tilde{\mu}_t = \frac{1}{W} \sum_{i=0}^{W-1} e_{t-i}$
Finally, we used a threshold vector $\theta$ to flag the detected anomalies:
$\hat{a}_t^i = \begin{cases} 1 & \text{if } s_t^i \geq \theta^i \\ 0 & \text{otherwise} \end{cases}$
where $\hat{a}_t^i$ is the scalar of the $i$-th dimension of the vector $\hat{a}_t$.
After the prediction, each variable was treated separately with its own parameters; thus, there were m anomaly likelihoods being processed in parallel.
We finally defined the set $A_T$ of timestamps at which at least one variable was found to be abnormal:
$A_T = \{ t \in [t_0, t_{n-1}] \mid \| \hat{a}_t \|_1 > 0 \}$
where $t_i$ is the timestamp of the corresponding time series sample.
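To make the above concrete, here is a sketch of the anomaly likelihood for a single variable's squared-error series (a NumPy/SciPy illustration under our reading of the equations above; $w$ and $W$ are the long and short window sizes):

```python
import numpy as np
from scipy.stats import norm

def anomaly_scores(e: np.ndarray, w: int = 100, W: int = 10) -> np.ndarray:
    """Return s_t = 1 - Q((mu_tilde_t - mu_t) / sigma_t) for one variable."""
    s = np.zeros_like(e)
    for t in range(w, len(e)):
        window = e[t - w + 1 : t + 1]               # sliding normal model
        mu, sigma = window.mean(), window.std(ddof=1)
        mu_tilde = e[t - W + 1 : t + 1].mean()      # recent short-term average
        # Q is the Gaussian tail probability: Q(z) = norm.sf(z)
        s[t] = 1.0 - norm.sf((mu_tilde - mu) / max(sigma, 1e-12))
    return s

# Per-variable labels: a_t^i = 1 when s_t^i >= theta^i (threshold vector theta)
```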

4. Results

After learning the predictor parameters, we ran our unsupervised anomaly detector through all the time series of our dataset and then exploited the results through different evaluations.
For all models, during the learning phase, we used 300 epochs, a learning rate of $10^{-4}$ with the Adam optimizer, and 2000 subsequences of 100 points each; additionally, we held out 20% of the data for testing. Adam is one of the most effective and most widely used gradient-descent-based optimization algorithms for neural networks such as CNNs and RNNs [35].
Below, we will first evaluate how well our algorithm catches the given anomalies using our specific labels, and then we will propose measuring the correlation between labels and variables using the multidimensional nature of our algorithm.

4.1. Evaluation with Labels (Oil Pressure Failure)

As explained in Section 2, labels that have been provided to our team express a specific abnormal behavior due to failures in oil pressure and do not cover all possible abnormal behaviors.
These labels point to specific timestamps in the dataset where this oil pressure failure was detected. The abnormal behavior is not instantaneous and lasts several seconds. Thus, we have to consider a time window $w_a$ around the labeled timestamps to assess whether an abnormal sample found by our algorithm matches a label. We define the set of true positives TP as:
$\mathrm{TP} = \{ t \in A_T \mid \exists\, t_l \in L,\ t_l - w_a \leq t \leq t_l + w_a \}$
where $L$ and $A_T$ are, respectively, the sets of label timestamps and detected anomaly timestamps.
We count at most one true positive per label, even if several detections fall within that label's window; hence the use of the set formalism.
We then define the false positive (FP), true negative (TN), and false negative (FN) sets as follows:
$\mathrm{FP} = \{ t \in A_T \mid \nexists\, t_l \in L,\ t_l - w_a \leq t \leq t_l + w_a \}$
$\mathrm{TN} = \{ t \in [t_0, t_{n-1}] \setminus A_T \mid \nexists\, t_l \in L,\ t_l - w_a \leq t \leq t_l + w_a \}$
$\mathrm{FN} = \{ t \in [t_0, t_{n-1}] \setminus A_T \mid \exists\, t_l \in L,\ t_l - w_a \leq t \leq t_l + w_a \}$
In order to obtain a clearer evaluation, we then compute the corresponding rates, namely the true positive rate (TPR) for TP and so on, as:
$\mathrm{TPR} = \frac{|\mathrm{TP}|}{|L|}$
$\mathrm{FPR} = \frac{|\mathrm{FP}|}{n}$
$\mathrm{TNR} = \frac{|\mathrm{TN}|}{n}$
$\mathrm{FNR} = \frac{|\mathrm{FN}|}{n}$
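A minimal sketch of this windowed matching (timestamps represented as integer seconds; the set-based layout is our illustration of the definitions above, counting at most one true positive per label):

```python
def rates(alarms: set, labels: set, n: int, w_a: int = 60):
    """TPR/FPR per the definitions above; at most one TP per label."""
    near = lambda t, t_l: t_l - w_a <= t <= t_l + w_a
    matched = {t_l for t_l in labels if any(near(t, t_l) for t in alarms)}
    fp = sum(1 for t in alarms if not any(near(t, t_l) for t_l in labels))
    return len(matched) / len(labels), fp / n  # TPR = |TP|/|L|, FPR = |FP|/n
```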
Table 3 provides the TPR, FPR, TNR, and FNR obtained with windows $w_a$ of 5, 30, and 60 s for an LSTM with two layers of 50 cells, denoted LSTM (50-50).
Table 3. Performance evaluation of the LSTM (50-50) on limited labels.
The best results were obtained using the largest window, $w_a = 60$ s, achieving 86% successful detection.
The FPR is 0.061. If the false positives were uniformly distributed over time, then, given our 1 Hz sampling rate, the model would send a false positive alert on average every 16 s ($1/0.061 \approx 16.4$ samples). First, we have to remember that our labels are limited to one specific kind of anomaly; thus, this relatively high FPR does not necessarily reflect unsatisfactory behavior of our unsupervised anomaly detector. A subset of these false positives might correspond to actual abnormal behaviors of the vehicle that are not covered by our labels.
The evaluation of whether our false positives match abnormal behaviors is not always straightforward. Visual interpretation can be difficult, as the model processes 85 variables altogether.
In Figure 6, we show an example of a successful detection made by our model regarding the oil pressure problem. As the engine mean effective torque is rising, the oil pressure should do the same; however, it does not do so here, which constitutes abnormal behavior. We also see how anomaly likelihood offers a nice dynamic adaptation of the quadratic error processing to find anomalies.
Figure 6. Results for a slice of the engine oil pressure (a) and engine mean effective torque (b) time series (signal in blue). On the left side, we see a specific abnormal behavior of the oil pressure: as the mean effective torque rises, the oil pressure is supposed to rise as well, but it does not do so here, which constitutes an abnormal behavior that our model successfully highlighted even though a label is missing at this timestamp. On the right side, we see an example of a true positive and the impact of the anomaly likelihood, which adapts to the variation of the prediction error from the left to the right side. This method also induces a small delay in the anomaly detection time because of the local average that is computed.

4.2. LSTM Computation Costs Comparison

Here, we evaluate LSTM models with different numbers of cells and layers. We start from our baseline LSTM (50-50) and run smaller architectures. We also compute the number of parameters and multiply–accumulate operations (MACs) of each model in order to evaluate its computational cost, using the pytorch-OpCounter library (https://github.com/Lyken17/pytorch-OpCounter, accessed on 12 January 2022).
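As an illustration, profiling the LSTM (50-50) baseline with pytorch-OpCounter might look as follows (a sketch: the dummy input shape of one 85-variable sample is our assumption, and `LSTMPredictor` refers to the sketch in Section 3.3; the package is imported as `thop`):

```python
import torch
from thop import profile

model = LSTMPredictor(n_vars=85, hidden=50, layers=2)
macs, params = profile(model, inputs=(torch.randn(1, 1, 85),))
print(f"{int(macs)} MACs, {int(params)} parameters")
```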
In Table 4, the LSTM (10) shows the best TPR, but, for a better evaluation, the FPR has to be considered, as discussed below.
Table 4. Performance evaluation comparison between different Long Short Term Memories for w a = 60 s.
When evaluating anomaly detection results, the two main factors to consider are the TPR and the FPR: we want the highest TPR while keeping the FPR as low as possible. Both vary with the threshold used during anomaly detection, and it is difficult to hold either of them equal across different models. Thus, we propose to use the positive likelihood ratio (PLR), defined in Equation (20), which allows for a better evaluation and comparison between models.
$\mathrm{PLR} = \frac{\mathrm{TPR}}{\mathrm{FPR}}$
In Table 5, we first see that the LSTM (50) gives the best PLR of 14.2. We also observe that the LSTM (10), with only 10 cells, performs very similarly, with a PLR of 14.1, while costing about 6.6 times less computation. As a result, our best predictor for anomaly detection is the LSTM (10). It is one of the lightest in terms of parameters and MACs, which makes it a better candidate for embedded systems. However, it is also one of the worst predictors, with an MSE of 0.0059. This is an interesting property of the overall anomaly detection process: it allows for the use of simpler predictors despite their limited prediction performance.
Table 5. Computation costs comparison between different LSTMs; the PLR is computed with respect to w a = 60 s.
In order to evaluate each state-of-the-art model more directly and quantitatively, including the CNN [10], we tested all models at various sizes and without the anomaly likelihood. The method used to find anomalies in this case is semi-supervised, and we use the f1 score, defined in Equation (23), to evaluate our results [10]. The models are trained and used for prediction exactly as before.
$\mathrm{recall} = \frac{\mathrm{TP}}{\mathrm{P}}$
where TP is the number of true positives and P the number of real positives;
$\mathrm{precision} = \frac{\mathrm{TP}}{\mathrm{PP}}$
where PP is the number of predicted positives; and
$f1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$
Instead of using the anomaly likelihood method, we simply search for the anomaly threshold that maximizes the PLR. This threshold is the value above which the prediction error is considered to be abnormal. As finding this threshold requires labels, this method is therefore semi-supervised. This allows us to focus our performance analysis more on the ML model itself and avoid unsupervised post-processing bias.
In Table 2, all of the models’ performances are given in detail. We can see that the best f1 score was achieved by the CNN (32-32) followed very closely by the GRU (50-50) which is much lighter in terms of computational costs.
Considering the window $w_a = 5$ s, for example, we observe that the CNN (8-8), the GRU (50-50), and the LSTM (50-50) reach the same f1 score. However, this GRU has 23% fewer parameters and consumes 23% fewer MACs than this LSTM; and while it has 3.79 times more parameters than this CNN, it still consumes 60% fewer MACs. This result highlights that a CNN might not always be the best solution for time series, as convolutions induce a lot of computation over a given time window. Since no information is carried over from one time window to the next, a CNN indeed needs to process N sliding time windows, whereas an LSTM or a GRU processes N samples.
In terms of TPR and FPR, the GRU also shows the highest performance at the lowest computational cost. Note that, in our implementation, the signal is resampled for each time window $w_a$; the resulting signals therefore do not have the same length, and some labels can be merged together. This affects the range of the f1, TPR, and FPR values, which makes comparing those metrics across time windows difficult.
In accordance with the previous results, we see that the f1 score, TPR, and FPR are not, or only very slightly, impacted by the model size, whereas the prediction performance does increase with size. We thus show again that the best prediction model might not be the best anomaly detection model in terms of detection performance and computational cost. Finally, the proposed semi-supervised anomaly threshold optimization method yields very low or null FPRs.
In the next section, we present a method to correlate variables with a specific kind of anomaly.

4.3. Correlation of Variables with Labels

As our labels are focused on one specific abnormal behavior, the question was raised as to whether it is possible to explain which variables are linked to that behavior.
In our case, abnormal behavior has been defined by experts using five expert variables: MeanEffTorque, EngineRPM, RST_EngineOilPressure, RST_EngineCoolantTemp, and RST_OilTemperature. The anomaly is linked to a problem with the oil pressure.
Making use of our anomaly detector and the way it handles each variable independently after prediction, we propose to use the model's per-variable PLR to correlate variables with the anomaly, the underlying idea being that variables showing better detection are more strongly linked to the anomaly.
We look at the variables with the highest PLR. In Figure 7, we see that three of our five expert variables rank within the first quarter of the 85 variables when ordered by decreasing PLR. Modifying the parameters of the anomaly detection model impacts this result, but we always find most of the five expert variables in that best quarter. Note that the oil pressure is found to be the second-best variable. This method thus effectively points out variables that are correlated with the anomaly, although we would need additional expert feedback to understand the role of the neighboring variables and whether they, too, are correlated with the anomaly; unfortunately, we have not yet been able to obtain that feedback.
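A minimal sketch of this ranking (the per-variable TPR and FPR arrays are assumed to have been computed as in Section 4.1, one entry per variable):

```python
import numpy as np

def rank_variables(tpr: np.ndarray, fpr: np.ndarray, names: list) -> list:
    """Order variables by decreasing per-variable PLR = TPR / FPR."""
    plr = tpr / np.maximum(fpr, 1e-12)  # guard against a zero FPR
    order = np.argsort(-plr)            # indices sorted by decreasing PLR
    return [(names[i], float(plr[i])) for i in order]
```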
Figure 7. Variable correlations with labels. Variables are ordered by decreasing PLR. Circled in green are the variables known by experts to have caused the oil pressure failure.

5. Discussion and Future Work

In this paper, we have proposed unsupervised anomaly detection models using different state-of-the-art algorithms, namely LSTM, GRU, and CNN predictors combined with either the anomaly likelihood or a semi-supervised anomaly threshold optimization. We applied these models to effectively point out abnormal temporal behaviors in a real, complex, multivariate sensor time series extracted from a vehicle's CAN bus during circuit tests.
The evaluation of these models with respect to our limited labels shows an acceptable detection rate, although there is still room for improvement. First, as our labels express only one specific abnormal behavior, it is difficult to fully evaluate our model. Second, more advanced data preprocessing and feature engineering conducted with experts would certainly help.
The comparison between different LSTMs, GRUs, and CNNs shows that the anomaly detector can work just as well or even better with smaller predictors, despite a slight decrease in prediction performance. Under certain circumstances, we have shown that the GRU can save up to 23% of parameters and 60% of MAC operations compared to the LSTM and the CNN. This is an interesting property for embedded systems, as it reduces the algorithm's computational cost.
As future work, we intend to embed anomaly detectors inside vehicles and thus expose our models directly to tight computational constraints. We are currently studying spiking neural networks as a candidate to tackle this problem. An in-depth study of anomaly detection performance and computational costs should also be conducted on public benchmarks, with a focus on reducing computational costs, as this is lacking in the literature reviewed in this paper.
Finally, we proposed a method for evaluating the correlation between variables and a specific abnormal behavior given in labels. This method shows interesting results and would need further in-depth evaluation in order to refine it.

Author Contributions

Conceptualization, Y.C., A.V., B.M. and A.P.; methodology, Y.C., A.V., B.M. and A.P.; software, Y.C.; validation, A.V., B.M. and A.P.; investigation, Y.C. and A.V.; data curation, Y.C. and A.V.; writing—original draft preparation, Y.C. and A.P.; writing—review and editing, Y.C. and A.P.; supervision, B.M., A.V. and A.P.; project administration, A.P.; funding acquisition, B.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Renault and the ANRT (Research contract LEAT—Renault n°2021-C-5854/CNRS n°239386).

Data Availability Statement

The Yahoo anomaly benchmark can be found here: https://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70 (accessed on 12 January 2022). The Renault car data are not publicly available.

Acknowledgments

This material is based upon work supported by the French technological research agency (ANRT) through a CIFRE thesis in collaboration with Renault.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 15. [Google Scholar] [CrossRef]
  2. Shaukat, K.; Alam, T.M.; Luo, S.; Shabbir, S.; Hameed, I.A.; Li, J.; Abbas, S.K.; Javed, U. A review of time-series anomaly detection techniques: A step to future perspectives. In Advances in Information and Communication, Proceedings of the 2021 Future of Information and Communication Conference (FICC), Vancouver, BC, Canada, 29–30 April 2021; Springer: Berlin/Heidelberg, Germany, 2021; Volume 1, pp. 865–877. [Google Scholar]
  3. Guha, S.; Mishra, N.; Roy, G.; Schrijvers, O. Robust random cut forest based anomaly detection on streams. In Proceedings of the 33rd International Conference on International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016. [Google Scholar]
  4. Kejariwal, A. Introducing Practical and Robust Anomaly Detection in a Time Series. Twitter Engineering Blog. Web, 15. 2015. Available online: https://blog.twitter.com/engineering/en_us/a/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series (accessed on 12 January 2022).
  5. Stanway, A. Etsy Skyline. Online Code Repository. 2013. Available online: https://github.com/etsy/skyline (accessed on 12 January 2022).
  6. Laptev, N.; Amizadeh, S.; Flint, I. Generic and scalable framework for automated time-series anomaly detection. In Proceedings of the ACM SIGKDD International Conference, Sydney, NSW, Australia, 10–13 August 2015; pp. 1939–1947. [Google Scholar]
  7. Ahmad, S.; Lavin, A.; Purdy, S.; Agha, Z. Unsupervised real-time anomaly detection for streaming data. Neurocomputing 2017, 262, 134–147. [Google Scholar] [CrossRef]
  8. Widanage, C.; Li, J.; Tyagi, S.; Teja, R.; Peng, B.; Kamburugamuve, S.; Baum, D.; Smith, D.; Qiu, J.; Koskey, J. Anomaly detection over streaming data: Indy500 case study. In Proceedings of the 2019 IEEE 12th International Conference on Cloud Computing (CLOUD), Milan, Italy, 8–13 July 2019; pp. 9–16. [Google Scholar]
  9. Maciąg, P.S.; Kryszkiewicz, M.; Bembenik, R.; Lobo, J.L.; Del Ser, J. Unsupervised anomaly detection in stream data with online evolving spiking neural networks. Neural Netw. 2021, 139, 118–139. [Google Scholar] [CrossRef] [PubMed]
  10. Munir, M.; Siddiqui, S.; Dengel, A.; Ahmed, S. DeepAnT: A Deep Learning Approach for Unsupervised Anomaly Detection in Time Series. IEEE Access 2019, 7, 1991–2005. [Google Scholar] [CrossRef]
  11. Ren, H.; Xu, B.; Wang, Y.; Yi, C.; Huang, C.; Kou, X.; Xing, T.; Yang, M.; Tong, J.; Zhang, Q. Time-series anomaly detection service at microsoft. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 3009–3017. [Google Scholar]
  12. Zhao, H.; Wang, Y.; Duan, J.; Huang, C.; Cao, D.; Tong, Y.; Xu, B.; Bai, J.; Tong, J.; Zhang, Q. Multivariate time-series anomaly detection via graph attention network. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; pp. 841–850. [Google Scholar]
  13. Geiger, A.; Liu, D.; Alnegheimish, S.; Cuesta-Infante, A.; Veeramachaneni, K. Tadgan: Time series anomaly detection using generative adversarial networks. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 33–43. [Google Scholar]
  14. Längkvist, M.; Karlsson, L.; Loutfi, A. A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognit. Lett. 2014, 42, 11–24. [Google Scholar] [CrossRef]
  15. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory; MIT Press: Cambridge, MA, USA, 1997; Volume 9, pp. 1735–1780. [Google Scholar]
  16. Bontemps, L.; McDermott, J.; Le-Khac, N. Collective anomaly detection based on long short-term memory recurrent neural networks. In Proceedings of the International Conference of Future Data and Security Engineering, Can Tho City, Vietnam, 23–25 November 2016. [Google Scholar]
  17. Malhotra, P.; Vig, L.; Shroff, G.; Agarwal, P. Long Short Term Memory Networks for Anomaly Detection in Time Series. In Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 22–24 April 2015. [Google Scholar]
  18. Filonov, P.; Lavrentyev, A.; Vorontsov, A. Multivariate industrial time series with cyber-attack simulation: Fault detection using an lstm-based predictive data model. arXiv 2016, arXiv:1612.06676. [Google Scholar]
  19. Chauhan, S.; Vig, L. Anomaly detection in ECG time signals via deep long short-term memory networks. In Proceedings of the International Conference on Data Science and Advanced Analytics (DSAA), Paris, France, 19–21 October 2015. [Google Scholar]
  20. Cherdo, Y.; De Kerret, P.; Pawlak, R. Training lstm for unsupervised anomaly detection without a priori knowledge. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 4297–4301. [Google Scholar]
  21. Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on KNOWLEDGE Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 387–395. [Google Scholar]
  22. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar]
  23. Meyer, P.; Häckel, T.; Korf, F.; Schmidt, T.C. Network anomaly detection in cars based on time-sensitive ingress control. In Proceedings of the 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall), Victoria, BC, Canada, 18 November–16 December 2020; pp. 1–5. [Google Scholar]
  24. Rajbahadur, G.K.; Malton, A.J.; Walenstein, A.; Hassan, A.E. A survey of anomaly detection for connected vehicle cybersecurity and safety. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 421–426. [Google Scholar]
  25. Zhou, A.; Li, Z.; Shen, Y. Anomaly detection of CAN bus messages using a deep neural network for autonomous vehicles. Appl. Sci. 2019, 9, 3174. [Google Scholar] [CrossRef]
  26. Sun, H.; Chen, M.; Weng, J.; Liu, Z.; Geng, G. Anomaly detection for in-vehicle network using CNN-LSTM with attention mechanism. IEEE Trans. Veh. Technol. 2021, 70, 10880–10893. [Google Scholar] [CrossRef]
  27. Boumiza, S.; Braham, R. An anomaly detector for CAN bus networks in autonomous cars based on neural networks. In Proceedings of the 2019 International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), Barcelona, Spain, 21–23 October 2019; pp. 1–6. [Google Scholar]
  28. Wang, Y.; Masoud, N.; Khojandi, A. Real-time sensor anomaly detection and recovery in connected automated vehicle sensors. IEEE Trans. Intell. Transp. Syst. 2020, 22, 1411–1421. [Google Scholar] [CrossRef]
  29. Narayanan, S.N.; Mittal, S.; Joshi, A. OBD_SecureAlert: An anomaly detection system for vehicles. In Proceedings of the 2016 IEEE International Conference on Smart Computing (SMARTCOMP), St. Louis, MO, USA, 18–20 May 2016; pp. 1–6. [Google Scholar]
  30. Bogdoll, D.; Nitsche, M.; Zöllner, J.M. Anomaly detection in autonomous driving: A survey. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 4488–4499. [Google Scholar]
  31. Pereira, P.J.; Coelho, G.; Ribeiro, A.; Matos, L.M.; Nunes, E.C.; Ferreira, A.; Pilastri, A.; Cortez, P. Using deep autoencoders for in-vehicle audio anomaly detection. Procedia Comput. Sci. 2021, 192, 298–307. [Google Scholar] [CrossRef]
  32. Yahoo! Webscope Research. S5—A Labeled Anomaly Detection Dataset, Version 1.0(16M). Available online: https://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70 (accessed on 12 January 2022).
  33. Lavin, A.; Ahmad, S. Evaluating Real-Time Anomaly Detection Algorithms–The Numenta Anomaly Benchmark. In Proceedings of the 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 9–11 December 2015; pp. 38–44. [Google Scholar]
  34. Karagiannidis, G.K.; Lioumpas, A.S. An improved approximation for the Gaussian Q-function. IEEE Commun. Lett. 2007, 11, 644–646. [Google Scholar] [CrossRef]
  35. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
