Abstract
Online anomaly detection is critical for industrial safety and security monitoring, but it is challenged by data streams that evolve with changing working conditions and performance degradation. Existing approaches fall short of these challenges: models trained on a fixed distribution may be invalidated as the data distribution evolves. This paper presents a framework for online anomaly detection of data streams whose baseline algorithm is the incremental learning method Growing Neural Gas (GNG). It handles complex and evolving data streams via the proposed model, Bayes-Optimized Adaptive Growing Neural Gas (BOA-GNG). Firstly, novel learning rate adjustment and neuron addition strategies are designed to enhance model convergence and data representation capability. Then, a Bayesian algorithm is adopted to realize a fine-grained search of the BOA-GNG hyperparameters. Finally, comprehensive studies on six data sets verify the superiority of BOA-GNG in terms of detection accuracy and computational efficiency.
1. Introduction
Anomalies are abnormal points whose characteristics differ from those of the majority of the data; in industrial facility monitoring, such novel observations may result from a specific failure, unexpected noise, etc. Online anomaly detection of industrial data streams can support operation and maintenance staff in identifying and locating potential equipment failures in a timely and accurate manner, avoiding serious faults and accidents.
Today, complex data streams are common from Distributed Control Systems (DCSs) and IoT devices in many industrial fields, where multivariate data with complex correlations and time-varying characteristics arrive continuously. This complexity poses a severe challenge to anomaly detection, and evolving data streams can quickly render preprocessing pipelines or trained models outdated.
This paper concerns computational methods for dealing with complex, evolving data streams. Incremental learning-based anomaly detection provides an effective way to address this challenge. In particular, competitive neural networks have been widely used, as they suit the unsupervised setting that is natural for anomaly detection with rare labels. However, existing state-of-the-art competitive neural network models, such as Growing Neural Gas (GNG) [1,2], Self-Organizing Neural Networks (SONNs) [3,4], and Adaptive Resonance Theory (ART) [5], are designed for static offline data and cannot effectively cope with evolving data streams. A few competitive neural network-based methods have been proposed for time series anomaly detection [6,7,8]; however, they focus on learning time-varying characteristics by adding adaptive learning strategies and ignore the computational overhead that the improved algorithms incur on evolving data streams.
The goal of this paper, thus, is to provide a novel framework for online anomaly detection that adapts the baseline GNG algorithm to an online setting and effectively deals with the challenges of complex and evolving data streams.
2. Related Work
2.1. Incremental Learning
Undoubtedly, handling a complex, evolving data stream with a pre-trained fixed model is ineffective, and repeatedly creating a new model is inefficient [9]. A common approach is to build an initial model and update it incrementally as data arrive; Incremental Learning (IL) thus adapts a model to the latest data, however the data stream evolves. According to the learning factor, IL methods can be categorized into three types: Sample Incremental Learning (SIL), Class Incremental Learning (CIL), and Feature Incremental Learning (FIL) [10]. SIL continuously learns internal attributes to maintain the model's ability to represent dynamic data streams and extracts new knowledge to enhance model accuracy. CIL learns new classes from arriving data and adds them to the historical class set to improve classification performance. FIL adds new features of the evolving data to construct a new representation space, again with the goal of improving classification accuracy. Because anomalies are scarce in anomaly detection tasks, the boundary between normal and abnormal samples is hard to define, and labels for building a supervised model are lacking; the CIL approach therefore struggles to perform well in anomaly detection of data streams. Additionally, excellent and distinguishable features are challenging to extract dynamically, especially when modeling data streams [11]. Thus, SIL is better suited to constructing unsupervised models for the anomaly detection of evolving data streams.
2.2. Incremental Anomaly Detection
The time-varying characteristics of data streams require an anomaly detection model to have incremental learning ability. Existing state-of-the-art SIL-based anomaly detection approaches mainly comprise improved classical machine learning methods and competitive neural networks. Andrew et al. [12] proposed a model integrating an autoencoder with incremental clustering, providing a reliable machine learning-based monitoring method for electrical applications with varying power cycle patterns. Bigdeli et al. [13] designed a fast, noise-resilient, and incremental two-layer cluster-based anomaly detector to lower the false alarm rate of real-time anomaly detection on dynamic data. Gokcesu et al. [14] designed a two-stage filtering and hedging algorithm for sequential anomaly detection, where an incremental decision tree constructs a multimodal probability density function and an adaptive thresholding scheme detects anomalies. Furthermore, deep learning is commonly chosen as the baseline for incremental anomaly detection because of its ability to capture spatiotemporal features. Nawaratne et al. [15] proposed an incremental spatiotemporal learner for real-time video surveillance: an unsupervised deep-learning module that continuously updates and distinguishes between new anomalies and normality over time. Agarwal et al. [16] developed an LSTM-autoencoder-based incremental anomaly detection model to detect machine chatter, capturing changes in system dynamics over time and incrementally improving detection accuracy via transfer learning. Existing incremental anomaly detection approaches based on improved classical machine learning or deep learning algorithms generally design a continuous learning strategy in an offline setting and pay little attention to evolving data streams. Furthermore, many deep learning-based incremental anomaly detection models focus on learning temporal relationships in local sequences and incrementally updating the initial model, which is unsuitable for handling arbitrarily evolving data streams. The computational overhead of deep anomaly detection models is another challenge for real-time streaming anomaly detection.
Competitive neural network-based incremental anomaly detection methods are typically shallow neural networks, which can dynamically learn the characteristics of data streams while keeping computational complexity low. Typical algorithms of this family are the Self-Organizing Incremental Neural Network (SOINN), Growing Neural Gas (GNG), Adaptive Resonance Theory (ART), and their variants. Fahn et al. [17] designed a SOINN-based abnormal trajectory detector for efficient video condensation; its feature extraction module compresses the original video size to 10% while the detection accuracy is maintained at 95%. Hu et al. [18] developed a novel algorithm for fault diagnosis of redundant inertial measurement units, introducing SOINN and PCA to achieve high accuracy on tiny faults with low computational complexity. Building on the original GNG algorithm, Mahmoudabadi et al. [19] integrated a fuzzy inference system with GNG for online anomaly detection of data streams; the selection of winning neurons in GNG is improved, and the algorithm shows better accuracy on public datasets than existing clustering models. Song et al. [20] optimized the learning rate adjustment and the neuron addition and deletion strategies, improving both the accuracy and the computational efficiency of GNG in online anomaly detection. Therefore, in comparison with improved machine learning models, competitive neural network-based incremental anomaly detection offers the advantages of good precision and timeliness, making it more suitable for streaming anomaly detection. It remains challenging, however, to improve the representation of complex data streams and to optimize the adjustment strategy of the network structure. The motivation of this paper is to further optimize our work in [20].
2.3. Highlights
The main idea in this paper is to use the GNG algorithm as a baseline model, which can process evolving data streams via an adaptive learning mechanism. On the basis of our previous work in [20], we propose a novel framework (BOA-GNG) for online anomaly detection of data streams, and the contributions of this work are briefly summarized as follows.
1. An improved GNG algorithm is proposed to better model the complex and evolving data streams for online anomaly detection. Therein, a novel learning rate adjustment of GNG is designed to obtain better model convergence and stability; the neuron addition strategy is improved for better data representation ability.
2. In terms of hyperparameter settings for GNG, a Bayesian algorithm is introduced for fine-grained search of network hyperparameters instead of brute force search, optimizing the computational efficiency of online detection.
3. Extensive experiments on six data sets verify the superiority of BOA-GNG in terms of detection accuracy and computational efficiency, and further ablation studies are conducted to show the effectiveness of the improvement strategies.
3. Methodology
3.1. Algorithm of Original GNG
The base model of the proposed method is the GNG algorithm; the theory and process of GNG are described briefly first, and the vital improvements to the original GNG are then detailed in this section.
The original GNG, which combines neural gas [21] and competitive Hebbian learning [22], builds a topological graph. It aims to represent the characteristics of multivariate data by allowing the number of neurons to increase and by taking the neighborhood relations of neurons into account. The learning process of the original GNG includes the adjustment of neuron parameters, the adjustment of relationships between neurons, and the insertion and deletion of neurons; the process is given in Algorithm 1, and the relevant parameters are shown in Table 1.
Table 1.
Parameters of the Growing Neural Gas (GNG) algorithm.
Algorithm 1: The original Growing Neural Gas (GNG).
As shown in the original GNG algorithm, a minimum number of neurons is initially created (lines 1~3), and new neurons and neighborhood connections (edges) are then added between them during learning, according to the input instances. For each new instance $x$ from the data stream (lines 4~5), the two closest neurons $s_1$ and $s_2$ are found via the Euclidean distance (line 6). The local representation error of the winning neuron $s_1$ is increased (line 7), and the age of the edges connected to this neuron is updated (line 8). The winning neuron $s_1$ and its neighboring neurons (linked to $s_1$ by an edge) are adapted according to the learning rates $\varepsilon_b$ and $\varepsilon_n$ (line 9). Moreover, the two neurons $s_1$ and $s_2$ are linked by a new edge of age 0 (lines 10~11). When edges reach a maximum age without being reset, they are deleted; if any neuron of the graph becomes isolated, it is also deleted (line 12). As the data stream is processed, the graph periodically creates a new neuron between the two neighboring neurons that have accumulated the largest representation errors (lines 13~14). Finally, the representation error of every neuron is subject to an exponential decay (line 15) in order to emphasize the importance of recently measured errors.
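Because the pseudocode box of Algorithm 1 did not survive extraction, the following is a minimal Python sketch of the loop just described. The parameter names (eps_b, eps_n, a_max, k, alpha, d) follow the usual GNG conventions rather than the paper's exact notation, and isolated-neuron deletion and the edge rewiring that accompanies insertion are elided for brevity, so this is an illustrative sketch rather than the paper's implementation.

```python
import numpy as np

class GNG:
    """Minimal sketch of the original GNG learning loop (Algorithm 1)."""
    def __init__(self, eps_b=0.05, eps_n=0.006, a_max=50, k=100, alpha=0.5, d=0.995):
        self.eps_b, self.eps_n = eps_b, eps_n   # learning rates: winner / neighbors
        self.a_max, self.k = a_max, k           # maximum edge age, insertion period
        self.alpha, self.d = alpha, d           # error decay factors
        self.w, self.err = [], []               # neuron weights, representation errors
        self.edges = {}                         # (i, j) with i < j -> edge age

    def fit_instance(self, x, step):
        x = np.asarray(x, dtype=float)
        if len(self.w) < 2:                     # lines 1~3: start with two neurons
            self.w.append(x.copy()); self.err.append(0.0)
            return
        dists = [np.linalg.norm(x - w) for w in self.w]
        s1, s2 = map(int, np.argsort(dists)[:2])          # line 6: two closest neurons
        self.err[s1] += dists[s1] ** 2                    # line 7: winner's local error
        for e in [e for e in self.edges if s1 in e]:      # line 8: age incident edges
            self.edges[e] += 1
        self.w[s1] += self.eps_b * (x - self.w[s1])       # line 9: adapt winner...
        for e in self.edges:
            if s1 in e:                                   # ...and its graph neighbors
                n = e[0] if e[1] == s1 else e[1]
                self.w[n] += self.eps_n * (x - self.w[n])
        self.edges[(min(s1, s2), max(s1, s2))] = 0        # lines 10~11: reset/create edge
        self.edges = {e: a for e, a in self.edges.items()
                      if a <= self.a_max}                 # line 12: drop overaged edges
        if step % self.k == 0:                            # lines 13~14: periodic insertion
            q = int(np.argmax(self.err))                  # largest accumulated error
            nbrs = [e[0] if e[1] == q else e[1] for e in self.edges if q in e]
            if nbrs:
                f = max(nbrs, key=lambda n: self.err[n])  # worst neighbor of q
                self.w.append((self.w[q] + self.w[f]) / 2.0)
                self.err[q] *= self.alpha; self.err[f] *= self.alpha
                self.err.append(self.err[q])
        self.err = [e * self.d for e in self.err]         # line 15: exponential decay
```

Feeding instances in arrival order, e.g. `for t, x in enumerate(stream): gng.fit_instance(x, t)`, grows the topology online.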
As stated above, the GNG model can dynamically represent the distribution characteristics of streaming data, as shown in Figure 1. However, it also suffers from several drawbacks. First, the original GNG organizes neurons to represent the streaming data using fixed learning rates, which affects model convergence and stability. Second, inserting new neurons into the model every $k$ steps cannot guarantee that the new neurons are necessary and may result in model redundancy. Third, hyperparameter optimization of GNG via grid search is time-consuming and imprecise, making it difficult to obtain the optimal solution.
Figure 1.
The dynamic topology learning of the Growing Neural Gas (GNG) algorithm. The blue bullets are GNG neurons. The red bullet represents a newly inserted neuron as the topology grows.
Several GNG variants have been proposed to solve these problems, such as GNG-U [23], GWR [24], and Online GNG [25]. GNG-U defines a utility measure that removes neurons located in low-density regions and inserts them in regions of high density; however, we may need to create new neurons without removing others. GWR judges whether new neurons should be added according to the activity and firing thresholds of the winning neurons, but it does not consider optimizing the learning rate. Online GNG can estimate the learning efficiency and adjust the network size to automatically fit the changing data space; however, its neighbor-related strategies treat all neurons equally, which may let the sparsest or densest regions dominate parameters shared by all neurons. In this paper, a novel GNG-based anomaly detection method for data streams is proposed on the basis of our previous work in [20], with the detailed improvements following the highlights in Section 2.3.
3.2. Insertion Strategy of New Neurons
Representing a large amount of data with a few neurons is the essential idea of GNG models. As described in Algorithm 1, the original GNG creates a new neuron periodically, a strategy that cannot adapt to a sudden change in the distribution of streaming data with a new topology; conversely, if no new pattern appears for a long time, the algorithm creates many redundant neurons. In our previous work (GNG-I), a threshold was defined to determine whether an input is distinctive enough to warrant inserting new neurons, but a fixed threshold may limit the model's ability to adapt to dynamically changing data streams. In this paper, we further optimize the threshold-setting strategy.
When a new instance $x$ of the data stream arrives, we find the winner $s_1$ following line 6 of Algorithm 1. If $s_1$ has directly connected neighbors, the new threshold $T_{s_1}$ is calculated as the maximum distance between $s_1$ and its neighbors, following Formula (1):

$$T_{s_1} = \max_{c \in N_{s_1}} \lVert w_{s_1} - w_c \rVert \tag{1}$$

where $w_{s_1}$ is the weight of $s_1$, $w_c$ is the weight of $c$, $N_{s_1}$ is the set of neurons connected to $s_1$, and $\lVert w_{s_1} - w_c \rVert$ is the Euclidean distance between neurons $s_1$ and $c$. If $s_1$ has no neighbors, the threshold is calculated using the minimum distance between $s_1$ and the other neurons:

$$T_{s_1} = \min_{c \in A \setminus \{s_1\}} \lVert w_{s_1} - w_c \rVert \tag{2}$$

where $A$ is the set of all neurons. During the learning process, if the distance between a new instance and its winner neuron is larger than this threshold, the algorithm adaptively inserts a new neuron to support the new topology. The original GNG inserts the new neuron between the two neurons with the largest and second-largest cumulative errors (line 14 of Algorithm 1), overlooking the role of the new neuron, which is to describe a new input pattern for the model. Therefore, in this paper the insertion location is between $s_1$ and $x$, taking into account both the winner neuron and the new input, and the weight of the new neuron is calculated as follows:

$$w_{new} = \frac{w_{s_1} + x}{2} \tag{3}$$
The purpose of optimizing the neuron insertion is to improve the model's adaptability in online anomaly detection scenarios for streaming data: when the data distribution is stationary, fewer neurons are created, and when it changes, neurons are created in a timely manner to handle the change.
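A minimal sketch of this insertion test, under the reconstruction of Formulas (1)–(3) above, is given below; the helper names and data layout are our own choices.

```python
import numpy as np

def insertion_threshold(s1, weights, neighbors):
    """Threshold T_{s1} of Formulas (1) and (2).
    weights: list of np.ndarray neuron weights; neighbors: dict id -> set of ids."""
    if neighbors.get(s1):
        # Formula (1): maximum distance from s1 to its directly connected neighbors
        return max(np.linalg.norm(weights[s1] - weights[c]) for c in neighbors[s1])
    # Formula (2): minimum distance from s1 to any other neuron when s1 is isolated
    return min(np.linalg.norm(weights[s1] - weights[c])
               for c in range(len(weights)) if c != s1)

def maybe_insert(x, s1, weights, neighbors):
    """Insert a new neuron midway between winner s1 and input x (Formula (3))
    when x lies outside the winner's threshold; returns the new id or None."""
    if np.linalg.norm(x - weights[s1]) > insertion_threshold(s1, weights, neighbors):
        new_id = len(weights)
        weights.append((weights[s1] + x) / 2.0)
        neighbors[new_id] = {s1}                    # connect the new neuron to the winner
        neighbors.setdefault(s1, set()).add(new_id)
        return new_id
    return None
```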
3.3. Adjustment Strategy of Learning Rate
As shown in Table 1, to learn an input pattern, the winner and its neighbors change their weights by the fixed rates $\varepsilon_b$ and $\varepsilon_n$. Previous works [26,27] indicate that a dynamic learning rate is helpful for modeling streaming data. In this paper, an adaptive learning rate is designed, calculated as follows.
where $s_1$ is the winner neuron, $c$ is a neighboring neuron of $s_1$, $N_{s_1}$ is the set of neurons connected to $s_1$, $\lVert x - w \rVert$ is the Euclidean distance between the instance $x$ and a neuron with weight $w$, $M$ is the number of times a neuron has won, and $T_{s_1}$ is defined in Equations (1) and (2). Such an adaptive learning rate has two main advantages.
First, neurons with more wins are considered more important, and such relatively vital neurons need only small adjustments after continuous incremental learning. A relatively large learning rate enables quick weight updates at the beginning, while its gradual attenuation lets neurons with a high winning frequency converge gradually toward the end of model training.
Second, the adaptive learning rate enables the model to adjust its neurons with appropriate step sizes. For example, if the distance between the input and its winner is large, the winner should adjust its weights quickly; conversely, once the winner has matured by winning enough patterns, this acceleration can be decreased.
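The equation image for the adaptive rate did not survive extraction, so the exact functional form of the paper's equation is not reproduced here. The sketch below merely illustrates the two properties just described; its functional form, the `base` parameter, and the clipping are our assumptions.

```python
import numpy as np

def adaptive_rate(x, w_s1, T_s1, M_s1, base=0.5):
    """Illustrative adaptive winner rate (assumed form, not the paper's equation):
    grows with the normalized input-winner distance, decays with winning count M_s1."""
    dist_term = np.linalg.norm(x - w_s1) / max(T_s1, 1e-12)  # larger gap -> larger step
    return base * min(dist_term, 1.0) / M_s1                 # attenuates as wins accumulate
```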
3.4. Optimization of Hyperparameters
The choice of hyperparameters is very important for the performance of GNG-based models. In this paper, the Bayesian optimization method is adopted to select optimal hyperparameters for the proposed BOA-GNG model. Bayesian optimization, which mainly consists of a probabilistic surrogate model and an acquisition function, is an efficient global optimization algorithm [28,29]. Its objective function is designed as follows:

$$f(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \lVert x_i - w_{s_1(x_i)} \rVert \tag{5}$$

where $\theta$ is the set of parameters to be optimized, $V$ is the validation set, $x_i$ is an instance from the validation set, $n$ is the total number of validation instances, and $s_1(x_i)$ is the neuron nearest to $x_i$. We seek the optimal parameter set $\theta^{*}$ that minimizes the average distance from the validation instances to the existing neurons and thus maximizes the value of the objective function. Taking the objective function as the optimization goal, the process of Bayesian optimization can be described below.
1. Initialization: Initialize the search scope of parameter set . Start by randomly selecting a small set of initial sampling points from the search space. Evaluate the objective function to obtain initial observations.
2. Build a Probabilistic Surrogate Model: Based on the initial observations, construct a probabilistic surrogate model (defined as a Gaussian process in this paper) that approximates the objective function. This model captures both the mean prediction and the uncertainty of the objective function.
3. Optimize the Acquisition Function: Define an acquisition function (probability of improvement is selected as the acquisition function in this paper) that quantifies the utility of sampling at a given point based on the current surrogate model. Optimize this acquisition function to find the next point to sample.
4. Sample the Next Point: Evaluate the objective function at the point identified by optimizing the acquisition function. This new observation is used to update the surrogate model.
5. Iterate: Repeat steps 3 and 4, continuously updating the surrogate model and optimizing the acquisition function until a stopping criterion is met (e.g., a maximum number of iterations is reached, or the improvement in the objective function value falls below a certain threshold).
6. Return the Best Solution: At the end of the optimization process, return the point in the search space that corresponds to the best objective function value observed.
By balancing exploration and exploitation, Bayesian optimization can efficiently navigate the search space to find high-quality solutions.
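A compact, self-contained sketch of this six-step loop is shown below, using a Gaussian-process surrogate and the probability-of-improvement acquisition, as selected above. Maximizing the acquisition over random candidates and all the names here are implementation choices of ours, not the paper's code.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def bayes_opt(f, bounds, n_init=5, n_iter=30, seed=0):
    """Maximize f over box bounds [(lo, hi), ...] with GP + probability of improvement."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds], float)
    hi = np.array([b[1] for b in bounds], float)
    X = rng.uniform(lo, hi, size=(n_init, len(bounds)))        # step 1: initial design
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)  # step 2: surrogate
        cand = rng.uniform(lo, hi, size=(512, len(bounds)))    # step 3: PI over candidates
        mu, sd = gp.predict(cand, return_std=True)
        pi = norm.cdf((mu - y.max()) / np.maximum(sd, 1e-12))  # probability of improvement
        x_next = cand[int(np.argmax(pi))]                      # step 4: next sample point
        X = np.vstack([X, x_next])                             # step 5: update observations
        y = np.append(y, f(x_next))
    best = int(np.argmax(y))                                   # step 6: best solution
    return X[best], y[best]
```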
4. BOA-GNG-Based Anomaly Detection of Streaming Data
Anomaly detection methods learn a model from a reference set of regular (or normal) data and classify unexpected data as irregular (or abnormal) [30]. However, if the reference data come as a stream whose distribution changes over time, a model trained on historical data may lose efficacy as the distribution evolves. The proposed BOA-GNG can adapt to evolving data streams, and its representational topology changes adaptively during the online anomaly detection of streaming data. In the detection process, a distance-based method is used to estimate the anomaly state of the input data, and the threshold is calculated as follows:

$$d_{th} = \frac{1}{|E|} \sum_{(i,j) \in E} \lVert w_i - w_j \rVert \tag{6}$$

where $|E|$ is the current number of edges in the BOA-GNG model, $(i,j)$ is an edge connecting neurons $i$ and $j$, $E$ is the set of edges, and $\lVert w_i - w_j \rVert$ is the Euclidean distance between the connected neurons. Manually choosing a convenient value for the decision parameter $d_{th}$ is hard because it depends not only on the dataset but also on the number of neurons in the model, which varies over time. Therefore, we heuristically set $d_{th}$ equal to the expected distance between neighboring neurons in the model; in other words, $d_{th}$ at any time is defined as the average length of the edges at that time.
Obviously, each neuron in the BOA-GNG model can be considered the center of a hyper-sphere, and at any time the model covers the space that represents regular data. If a new instance $x$ from the data stream arrives and the distance between it and the winner neuron is larger than $d_{th}$, the data point is not part of the existing topology of the model, and $x$ is considered an anomaly. The anomaly judgment rule is defined as follows:

$$\mathrm{label}(x) = \begin{cases} \text{anomaly}, & \min_{c \in A} \lVert x - w_c \rVert > d_{th} \\ \text{normal}, & \text{otherwise} \end{cases} \tag{7}$$

where $A$ is the set of all neurons, $x$ is the current input, $w_{s_1}$ is the weight of the winning neuron $s_1$, and $\lVert x - w_{s_1} \rVert = \min_{c \in A} \lVert x - w_c \rVert$ is the Euclidean distance between $x$ and $s_1$; $d_{th}$ is calculated using Equation (6). As in Equation (7), the Euclidean distance between each $x$ and its winner neuron is computed for online anomaly detection. The whole process of BOA-GNG is described by the pseudocode in Algorithms 2 and 3, and the notations of BOA-GNG are shown in Table 2.
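Equations (6) and (7) translate into a short detection routine; the sketch below (names ours) recomputes $d_{th}$ as the model's current average edge length and flags inputs that fall outside every neuron's hyper-sphere.

```python
import numpy as np

def edge_threshold(weights, edges):
    """d_th of Equation (6): average Euclidean length of the current edges.
    edges: iterable of (i, j) neuron-index pairs."""
    lengths = [np.linalg.norm(weights[i] - weights[j]) for i, j in edges]
    return float(np.mean(lengths)) if lengths else float("inf")

def is_anomaly(x, weights, edges):
    """Equation (7): x is anomalous if it is farther than d_th from its winner."""
    d_th = edge_threshold(weights, edges)                 # varies as the graph evolves
    d_win = min(np.linalg.norm(x - w) for w in weights)   # distance to the winner neuron
    return d_win > d_th
```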
Table 2.
The notations of Bayes-Optimized Adaptive Growing Neural Gas (BOA-GNG).
Algorithm 2: The learning process of BOA-GNG.
Algorithm 3: The hyperparameter optimization of BOA-GNG via Bayesian optimization.
5. Validation of the Proposed Method
In this section, we validate the proposed method. Firstly, we describe the experimental datasets. Secondly, the evaluation indicators of the experiments are introduced. Thirdly, the effect of improvements is discussed. Finally, comparison and ablation studies are presented, and the results are discussed.
5.1. Datasets
The experimental datasets include five publicly available datasets and one real engineering dataset from the Payload of China’s aerospace satellite. Table 3 gives a brief summary of the six datasets.
Table 3.
Summary of the six experimental datasets.
Shuttle dataset: This dataset was used to delineate the position of radiators on NASA space shuttles, primarily for classification purposes. The original data comprise 58,000 samples, 80% of which belong to the first category. The version used in this study is post-processed: samples belonging to the first category are considered normal, while the other categories are considered anomalous. In the experiments, the test set consisted of 1778 samples evenly distributed between normal and anomalous data.
KDD-CUP99 HTTP dataset: This dataset was collected over a period of 9 weeks from a simulated Air Force network and includes network connection and system audit data, which are commonly used to validate the performance of intrusion detection algorithms. To meet unsupervised or semi-supervised requirements, this study ultimately adopted a simplified version of KDD99, known as KDD-CUP99 HTTP. This subset used only HTTP traffic data from the original dataset, which consists of 620,000 samples, including 1053 anomalous samples, accounting for 0.17%. In the experiment, 80,000 normal samples were used for training, and 2100 samples were used for testing.
Satellite Dataset: This dataset was generated by the Australian Centre for Remote Sensing from NASA data. This dataset contains 36 parameters and serves as a multi-class classification dataset. Throughout the experiment, all normal data were considered the positive class, while all anomalous data were considered the negative class. A selection of 4100 normal samples was used for training, and 1000 samples were reserved for testing, of which 925 were normal and 75 were anomalous.
SMAP dataset: This dataset was obtained from a NASA spacecraft and contains real-world data. It consists of 55 telemetry channels, 429,735 telemetry values, and 69 anomalous sequences. Anomalies within the dataset are divided into two categories: point anomalies and contextual anomalies. For the experiment, 70,000 normal samples from channels A1 to A9 were selected for training, and 7000 samples were selected for testing, with 10% of the testing samples being anomalous.
MSL dataset: This dataset is a real-world dataset derived from NASA’s spacecraft via the Mars Science Laboratory (MSL). It comprises anomalous data stemming from Incident, Surprise, and Anomaly (ISA) reports from a spacecraft monitoring system. The data set consists of 27 telemetry channels, 66,709 telemetry values, and 36 anomalous sequences. To perform the experiments, 15,000 normal samples were selected for training, and 1500 samples were selected for testing, with 10% of the testing samples being anomalous.
Payload Dataset: This dataset consists of operation data collected from the payload of a Chinese aerospace satellite. It includes variables such as current, voltage, temperature, and command parameters, totaling 66 dimensions. To represent a typical complex-mode anomaly during the operational phase, a subset of 96,662 samples was extracted after preprocessing steps including noise reduction. Analysis of the data samples showed a change in current during abnormal operational stages, although it remained within the valid threshold values. Figure 2 shows that under normal conditions, the payload operates with a current of approximately 1.44 A within the South Atlantic Anomaly (SAA) and around 1.33 A outside the SAA; during the anomaly, however, the current surged to approximately 1.49 A. To reduce the influence of the large variations in the current dimension during experimentation, we removed it and retained the remaining 65 dimensions, then selected 80,000 normal samples as the training set and the remaining 16,662 samples as the test set. The test set comprised 3083 anomalous samples and 13,580 normal samples.
Figure 2.
The current curve before and after the fault.
5.2. Performance Metrics
Several metrics were chosen to evaluate the performance of the proposed method in the experiments of this section: the precision $P$, recall $R$, F1 score, and processing speed $v$:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2PR}{P + R}, \qquad v = \frac{N}{t}$$

where $TP$ (True Positive) is the number of correctly detected anomalies, $FP$ (False Positive) is the number of incorrectly detected anomalies, $FN$ (False Negative) is the number of missed anomalies, $N$ is the total number of samples, and $t$ is the time used to process the $N$ samples.
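For reference, these standard definitions translate directly into code; the small helper below uses names of our own choosing.

```python
def metrics(tp, fp, fn, n_samples, elapsed_s):
    """Precision, recall, F1, and processing speed from the counts defined above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    speed = n_samples / elapsed_s   # samples processed per second
    return precision, recall, f1, speed
```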
5.3. The Improvement Effect of BOA-GNG
In this section, we demonstrate the detailed effects of the improvements in BOA-GNG. As described in Section 3.3, BOA-GNG adopts an adaptive learning rate that depends on the Euclidean distance between the input data and the winner neuron, the maximum Euclidean distance between the winner neuron and its neighbors, and the winning count of each neuron. Figure 3a shows the curves of the learning rate and winning count for one neuron on the Payload dataset. Although the adaptive learning rate fluctuates with the Euclidean distance between the input data and the winning neuron during learning, the improved learning-rate adjustment strategy achieves a faster convergence speed.
Figure 3.
(a) The changing curve of learning rate and neuron winning times. (b) Value range setting of Bayes-Optimized Adaptive Growing Neural Gas (BOA-GNG) hyperparameters.
Here, we discuss the detailed hyperparameter optimization of BOA-GNG on the Payload dataset. The hyperparameters of the proposed method, listed in Table 2, are the maximum edge age, the maximum number of neurons, and the deletion threshold for isolated neurons; together they form the parameter set $\theta$, and the objective function is defined in Equation (5). We selected samples from the training set at a ratio of 4:1 as the validation set, randomly selected 320 parameter sets $\theta$, and calculated the value of the objective function for each. To determine an appropriate search scope for the parameter set $\theta$, the control-variable method was used to analyze the relation between the optimization loss and the hyperparameters. As shown in Figure 3b, the objective value decreases once the maximum edge age exceeds 120, remains almost unchanged once the maximum number of neurons exceeds 260, and decreases once the deletion threshold exceeds 150. The search scope of the parameter set $\theta$ is initialized accordingly.
On the basis of this range setting, the Bayesian optimization steps in Algorithm 3 are conducted to search for the optimal parameter setting of BOA-GNG, and the best parameter combination is obtained. Figure 4 shows the iteration loss of BOA-GNG under different hyperparameter settings via grid search and Bayesian optimization, demonstrating that the latter attains a better convergence value with a faster convergence speed.
Figure 4.
The iteration loss of Bayes-Optimized Adaptive Growing Neural Gas (BOA-GNG) under different hyperparameter settings.
5.4. Comparison Experiments
In the comparison experiments, the proposed model was compared with the original GNG, GWR, GNG-I, SOINN, and K-Means. The experiments were conducted on a computer equipped with an Intel(R) Core(TM) i7-10700 CPU at 2.90 GHz and 16 GB of RAM. The values of the corresponding evaluation criteria on the test sets are shown in Table 4. For each dataset, the result of the best-performing method is highlighted in bold; BOA-GNG achieves good results on all datasets.
Table 4.
The comparison results of Bayes-Optimized Adaptive Growing Neural Gas (BOA-GNG) and other methods on six datasets.
Given the sequential nature of streaming data, the test set is divided into three equal parts in order (part 1, part 2, part 3). To simulate the continuous arrival of streaming data, Stream 1 comprises part 1; Stream 2 comprises parts 1 and 2; and Stream 3 comprises parts 1, 2, and 3. For each dataset, the results of the rolling test of BOA-GNG are shown in Table 5. As the data stream expands, the corresponding evaluation criteria of BOA-GNG remain at a relatively high level.
Table 5.
The results of Bayes-Optimized Adaptive Growing Neural Gas (BOA-GNG) on the rolling tests.
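The rolling streams used in Table 5 are simply nested prefixes of the ordered test set; a minimal sketch of the stream construction (function name ours) is given below.

```python
def rolling_streams(test_set):
    """Split an ordered test set into Streams 1-3 as nested prefixes."""
    part = len(test_set) // 3
    return [test_set[:part],          # Stream 1: part 1
            test_set[:2 * part],      # Stream 2: parts 1 and 2
            test_set[:]]              # Stream 3: parts 1, 2, and 3
```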
For online anomaly detection scenarios, computational efficiency, which comprises the learning time and detecting time of an incremental learning technique, is as important as high detection accuracy. The experiments therefore compare the computational efficiency of BOA-GNG and the other GNG-based models. The results are shown in Table 6, where bold fonts denote the highest computational efficiency on each dataset; the learning velocity and detecting velocity respectively represent the average speed of the learning and detecting processes. BOA-GNG outperforms the other models on most datasets in terms of both learning and detecting velocity.
Table 6.
Comparison of online learning rates on each dataset.
5.5. Ablation Experiments
Ablation experiments are conducted to verify the effectiveness of the optimization strategies in Section 3.2 and Section 3.3. We define the model with a fixed learning rate as BOA-GNG-FLR, the model with StepLR as BOA-GNG-SLR, and the model with a fixed insertion step size as BOA-GNG-FIS. The anomaly detection results are shown in Table 7, with the result of the best-performing model highlighted in bold. The adaptive learning rate and the new insertion strategy both have a positive effect on GNG-based anomaly detection performance.
Table 7.
Results of ablation experiments on the payload dataset.
6. Conclusions
In this paper, we propose a novel incremental learning model named BOA-GNG for online anomaly detection of streaming data. The proposed approach adopts our previous model [20] as the baseline, and its learning rate, neuron insertion strategy, and network optimization strategy are redesigned for better dynamic data learning ability and online detection performance.
We demonstrate that the adaptive learning rate can obtain faster convergence ability compared to the previous linear learning rate. Then, the new neuron insertion method can improve the model’s adaptability for evolving data streams and make it more flexible to adapt to changes in data distributions. Finally, Bayesian optimization is introduced for fast and fine-grained hyperparameter setting instead of grid search.
Five open datasets and a real engineering dataset from an aerospace satellite payload serve as experimental cases to verify the effectiveness and superiority of the proposed model. The results indicate that BOA-GNG improves the precision, recall, F1 score, and computational efficiency of online anomaly detection compared with classical GNG-based models, with both the precision and recall of BOA-GNG exceeding 95% on all six datasets. Additionally, even the lowest learning and detecting velocities reach nearly 500 points/s and 4000 points/s, respectively. Further studies demonstrate that the adaptive learning rate and the new neuron insertion strategy are effective in improving the baseline model.
Author Contributions
Methodology, J.Z., L.S. and S.G.; writing—original draft, J.Z. and L.S.; software, S.G. and M.L.; supervision, X.L. and L.G.; investigation, C.H. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Prospective Foundation of the Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, grant number T303271.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data that support the findings of this study can be obtained by contacting songlei@csu.ac.cn.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Fritzke, B. A growing neural gas network learns topologies. Adv. Neural Inf. Process. Syst. 1995, 7, 625–632. [Google Scholar]
- Frezza-Buet, H. Following non-stationary distributions by controlling the vector quantization accuracy of a growing neural gas network. Neurocomputing 2008, 71, 1191–1202. [Google Scholar] [CrossRef]
- Xiang, Z.; Zhu, J. Network anomaly detection with improved self-organizing incremental neural network. Comput. Eng. Appl. 2014, 50, 88–91. [Google Scholar]
- Ren, H.; Guo, C.; Yang, R.; Wang, S. Fault diagnosis of electric rudder based on self-organizing differential hybrid biogeography algorithm optimized neural network. Measurement 2023, 208, 112355. [Google Scholar] [CrossRef]
- Masuyama, N.; Amako, N.; Yamada, Y.; Nojima, Y.; Ishibuchi, H. Adaptive resonance theory-based topological clustering with a divisive hierarchical structure capable of continual learning. IEEE Access 2022, 10, 68042–68056. [Google Scholar] [CrossRef]
- Zheng, S.; Lan, F.; Castellani, M. A competitive learning scheme for deep neural network pattern classifier training. Appl. Soft Comput. 2023, 146, 110662. [Google Scholar] [CrossRef]
- Wang, X.; Wang, J.; Zhang, Y.; Du, Y. Analysis of local macroeconomic early-warning model based on competitive neural network. J. Math. 2022, 2022, 7880652. [Google Scholar] [CrossRef]
- Vanguri, N.; Pazhanirajan, S.; Kumar, T. Competitive feedback particle swarm optimization enabled deep recurrent neural network with technical indicators for forecasting stock trends. Int. J. Intell. Robot. Appl. 2023, 7, 385–405. [Google Scholar] [CrossRef]
- Yoon, S.; Lee, J.; Lee, B. Ultrafast local outlier detection from a data stream with stationary region skipping. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 1181–1191. [Google Scholar]
- Van de Ven, G.M.; Tuytelaars, T.; Tolias, A.S. Three types of incremental learning. Nat. Mach. Intell. 2022, 4, 1185–1197. [Google Scholar]
- Pang, G.; Shen, C.; Cao, L.; Hengel, A.V.D. Deep learning for anomaly detection: A review. ACM Comput. Surv. (CSUR) 2021, 54, 38. [Google Scholar] [CrossRef]
- Andrew, C.; Syed, A.; Des, M. Autoencoder and incremental clustering-enabled anomaly detection. Electronics 2023, 12, 1970. [Google Scholar] [CrossRef]
- Bigdeli, E.; Mohammadi, M.; Raahemi, B.; Matwin, S. Incremental anomaly detection using two-layer cluster-based structure. Inf. Sci. Int. J. 2018, 429, 315–331. [Google Scholar] [CrossRef]
- Gokcesu, K.; Neyshabouri, M.; Gokcesu, H.; Kozat, S.S. Sequential outlier detection based on incremental decision trees. IEEE Trans. Signal Process. 2018, 67, 993–1005. [Google Scholar] [CrossRef]
- Nawaratne, R.; Alahakoon, D.; Silva, D.; Yu, X. Spatiotemporal anomaly detection using deep learning for real-time video surveillance. IEEE Trans. Ind. Inform. 2020, 16, 393–402. [Google Scholar] [CrossRef]
- Agarwal, R.; Nagpal, T.; Roy, D. A Novel Anomaly Detection for Streaming Data using LSTM Autoencoders. Int. J. Recent Technol. Eng. 2021, 10, 233–241. [Google Scholar]
- Fahn, C.; Kao, C.; Wu, M.; Chueh, H.E. SOINN-based abnormal trajectory detection for efficient video condensation. Comput. Syst. Sci. Eng. 2022, 42, 451–463. [Google Scholar] [CrossRef]
- Hu, X.; Zhang, X.; Peng, X.; Yang, D. A novel algorithm for the fault diagnosis of a redundant inertial measurement unit. IEEE Access 2020, 8, 46080–46091. [Google Scholar] [CrossRef]
- Mahmoudabadi, A.; Rafsanjani, M.; Javidi, M. Online one pass clustering of data streams based on growing neural gas and fuzzy inference systems. Expert Syst. 2021, 38, e12736. [Google Scholar] [CrossRef]
- Song, L.; Zheng, T.; Wang, J.; Guo, L. An improvement growing neural gas method for online anomaly detection of aerospace payloads. Soft Comput. 2020, 24, 11393–11405. [Google Scholar] [CrossRef]
- Martinetz, T.; Berkovich, S.; Schulten, K. Neural-gas network for vector quantization and its application to timeseries prediction. IEEE Trans. Neural Netw. 1993, 4, 558–569. [Google Scholar] [CrossRef]
- Hebb, D.O. The Organization of Behavior; Wiley: New York, NY, USA, 1988; pp. 43–54. [Google Scholar]
- Fritzke, B. A self-organizing network that can follow nonstationary distributions. In International Conference on Artificial Neural Networks; Springer: Berlin/Heidelberg, Germany, 1997; pp. 613–618. [Google Scholar]
- Marsland, S.; Shapiro, J.; Nehmzow, U. A self-organizing network that grows when required. Neural Netw. 2002, 15, 1041–1058. [Google Scholar] [CrossRef] [PubMed]
- Sun, Q.; Liu, H.; Harada, T. Online growing neural gas for anomaly detection in changing surveillance scenes. Pattern Recognit. 2017, 64, 187–201. [Google Scholar] [CrossRef]
- Bouguelia, M.-R.; Nowaczyk, S.; Payberah, A.H. An adaptive algorithm for anomaly and novelty detection in evolving data streams. Data Min. Knowl. Discov. 2018, 32, 1597–1633. [Google Scholar]
- Zhang, Q.; Wu, H.; Tao, J.; Ding, W.; Zhang, J.; Li, J. Fault Diagnosis of Rolling Bearing Based on CNN with Attention Mechanism and Dynamic Learning Rate. In Proceedings of the 2021 International Conference on Sensing, Measurement & Data Analytics in the Era of Artificial Intelligence (ICSMD), Nanjing, China, 21–23 October 2021; pp. 1–7. [Google Scholar]
- Liu, X.; Ma, T.; Gao, W.; Zhu, X.; Wen, Y.; Pan, W. Outlier Detection Using Machine Learning Algorithms Integrated with Bayesian Optimization. In Proceedings of the 2022 International Conference on Algorithms, Data Mining, and Information Technology (ADMIT), Xi’an, China, 23–25 September 2022; pp. 160–165. [Google Scholar]
- Zhou, A.; Zhu, Q.; Zhang, J.; Meng, K. Ship Intrusion Detection Technology Based on Bayesian Optimization Algorithm and XGBoost. In Proceedings of the 2023 3rd International Conference on Electrical Engineering and Control Science (IC2ECS), Hangzhou, China, 29–31 December 2023; pp. 1647–1652. [Google Scholar]
- Sarhan, M.; Kulatilleke, G.; Lo, W.W.; Layeghy, S.; Portmann, M. DOC-NAD: A Hybrid Deep One-class Classifier for Network Anomaly Detection. In Proceedings of the 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW), Bangalore, India, 1–4 May 2023; pp. 1–7. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).