Anomaly Detection in Time Series Data Using Reversible Instance Normalized Anomaly Transformer
Sensors 2023, 23, 9272

Anomalies are infrequent in nature, but detecting them can be crucial for the proper functioning of any system. Their rarity also makes them hard to detect, as detection models must rely on the relations between each datapoint and its adjacent datapoints. In this work, we exploit the rarity of anomalies to detect them. To this end, we introduce the reversible instance normalized anomaly transformer (RINAT). Rooted in the foundational principles of the anomaly transformer, RINAT incorporates both prior and series associations for each time point. The prior association uses a learnable Gaussian kernel to capture the adjacent concentration inductive bias, whereas the series association uses self-attention to focus on the original raw data. Furthermore, because anomalies are rare in nature, we use normalized data to identify series associations and non-normalized data to uncover prior associations. This approach sharpens the modelled series associations and, consequently, improves the association discrepancies.


Introduction
Anomaly detection in time series data is pivotal in modern data analysis [1] and involves identifying rare patterns or discrepancies that deviate from expected behaviors. This form of detection has a broad range of applications in industries such as manufacturing, healthcare, and finance [2][3][4]. As technological advancements continue, we are producing and collecting more data than ever before. This influx is not just about the sheer volume of data but also its complexity. Complex relationships and patterns embedded within data grow more nuanced as data expand. Given that many modern systems and processes are data-driven, even minor irregularities can lead to significant consequences [5].
Not all anomalies are of concern; some might be benign outliers without substantial impacts. However, others could indicate severe issues, such as critical system failures. In industries like system operations, finance, and healthcare, distinguishing between these types of anomalies can be of paramount importance [6][7][8].
We analyze time series data as either univariate or multivariate [9]. These data can be further decomposed into four distinct types [10]. The secular trend represents the consistent, long-term direction of a dataset. Seasonal variations are predictable patterns that recur at regular intervals, like sales spikes during holidays. Cyclical fluctuations refer to longer-term changes without a fixed pattern, often influenced by broader conditions, like economic recessions. Irregular variations represent unpredictable changes due to unforeseen events or outliers, with irregular anomalies being these sudden, unexpected variations.
In this paper, we propose the reversible instance normalized anomaly transformer for unsupervised anomaly detection in real-life time series data. First, we considered the transformer [11] architecture, following the success of the anomaly transformer [12] in anomaly detection. Transformers have also achieved positive results in the areas of natural language processing [13], machine vision [14,15], and time series [16]. These successes can be attributed to the ability of self-attention in transformers to capture long-range individual relationships. Furthermore, following the observations in [17,18], it is evident that time series data undergo distribution shifts. Addressing distribution shifts during time series forecasting has led to significantly improved results [17,18]. However, naively normalizing time series data during anomaly detection could degrade the performance of the model owing to the sparsity of anomalies in real data: normalizing the data as in [17] could nullify the anomalies, as anomalous datapoints are pulled closer in value to the normal datapoints. We therefore normalize the time series data for anomaly detection in such a way that anomalies are highlighted compared to normal datapoints. In anomaly transformers [12], the combination of prior associations and series associations has proved highly effective. Series associations are calculated using the self-attention of transformers, and prior associations are calculated using learnable Gaussian kernels with respect to the relative temporal distance. To that end, reversibly normalized data should be used for determining series associations, and the original datapoints should be passed to determine prior associations. In this way, when a minimax-strategy-based association discrepancy is used for anomaly detection, the anomalies are highlighted more strongly compared to normal datapoints. The contributions of this paper can be summarized as follows:
• we propose the reversible instance normalized anomaly transformer to highlight anomalies better than normal datapoints;
• we achieve comparable or better results on four real-world datasets.

Related Works
Anomalies in time series data can occur in various ways and can broadly be categorized into temporal, intermetric, or combined temporal-intermetric anomalies. Temporal anomalies [19] can be global, where singular or multiple points in a series have values significantly different from the rest. Contextual anomalies are variations relative to neighboring data points; anomalies in one context might be normal in another. Seasonal anomalies deviate from the regular seasonality of a series. Trend anomalies cause a persistent shift in the data's mean, leading to a change in the time series trend without affecting its cyclical and seasonal patterns. Shapelet anomalies pertain to subsequences whose cycles or shapes deviate from the usual pattern, influenced by external factors. Several algorithms have been proposed for anomaly detection in time series data. Based on intrinsic model characteristics, these algorithms can be systematically classified into five distinct categories.

Stochastic Models
Although modern machine-learning-based methods are increasingly popular for this task, several traditional techniques have been used over the years. These models operate on the assumption that data follow a specific statistical pattern or distribution. Anomalies are identified when observed data points deviate significantly from this expected pattern. Examples include the autoregressive integrated moving average (ARIMA) [20], the exponential smoothing state space model (ETS) [21], and the seasonal decomposition of time series (STL) [22].
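As a minimal illustration of this residual-based idea (a trailing moving-average forecast stands in for a fitted ARIMA/ETS model; the function name and thresholds are ours, not from the cited works):

```python
import numpy as np

def residual_anomalies(x, window=5, z_thresh=3.0):
    """Flag points whose deviation from a trailing moving-average
    forecast exceeds z_thresh standard deviations of the residuals."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # one-step "forecast": mean of the previous `window` observations
    preds = np.array([x[i - window:i].mean() for i in range(window, n)])
    resid = x[window:] - preds
    sigma = resid.std() + 1e-12          # guard against zero variance
    flags = np.zeros(n, dtype=bool)
    flags[window:] = np.abs(resid) > z_thresh * sigma
    return flags

series = np.ones(200)
series[120] = 9.0                        # inject a single spike
flags = residual_anomalies(series)
print(flags.nonzero()[0])                # -> [120]
```

A real ARIMA or ETS model would replace the moving-average forecast, but the residual test is the same.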

Distance-Based Models
The core idea of these models is that anomalies are data points that lie far away from other points. Examples include the k-nearest neighbor (k-NN) algorithm [23], where a point is considered an anomaly if the distance to its k-th nearest neighbor exceeds some threshold. Density-based methods like DBSCAN [24] also belong to this category, where sparse regions with a low density of data points can be indicative of anomalies.
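The k-NN criterion above can be sketched in a few lines; `knn_anomaly_scores` is an illustrative name, and the brute-force pairwise distance computation is only suitable for small datasets:

```python
import numpy as np

def knn_anomaly_scores(points, k=3):
    """Score each point by the distance to its k-th nearest neighbor;
    large scores suggest the point lies far from the rest of the data."""
    pts = np.asarray(points, dtype=float)
    # pairwise Euclidean distances
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    np.fill_diagonal(dists, np.inf)          # ignore self-distance
    # k-th nearest neighbor distance for each point
    return np.sort(dists, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
cluster = rng.normal(0.0, 0.1, size=(50, 2))  # dense cluster of normal points
outlier = np.array([[5.0, 5.0]])              # one distant point
scores = knn_anomaly_scores(np.vstack([cluster, outlier]))
print(scores.argmax())                        # -> 50 (the outlier's index)
```

In practice, a spatial index (e.g., a k-d tree) replaces the dense distance matrix for larger datasets.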

Information-Theoretic Models
These models are based on concepts from information theory, such as entropy. The idea is to measure the randomness or unpredictability in the data [25]. High- or low-entropy regions, depending on the context, can be indicative of anomalies. A sudden spike in entropy in time series data might indicate an anomaly.
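A sliding-window entropy detector along these lines might look as follows (a sketch with hypothetical parameter choices, not a method from [25]):

```python
import numpy as np

def window_entropy(x, window=50, bins=10):
    """Shannon entropy (bits) of the value histogram in each sliding window."""
    x = np.asarray(x, dtype=float)
    ents = []
    for i in range(len(x) - window + 1):
        hist, _ = np.histogram(x[i:i + window], bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]                          # drop empty bins (0 log 0 = 0)
        ents.append(float(-(p * np.log2(p)).sum()))
    return np.array(ents)

rng = np.random.default_rng(1)
regular = np.full(300, 5.0)                   # perfectly predictable segment
noisy = rng.uniform(0.0, 10.0, size=100)      # high-randomness burst
ents = window_entropy(np.concatenate([regular, noisy]))
# entropy stays near zero over the constant segment and
# spikes toward log2(bins) once windows cover the noisy burst
```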

Machine Learning and Deep Learning Models
These models are trained on historical time series data to learn data patterns. Anomalies are detected when new data points significantly differ from the model's predictions. We can further divide machine learning and deep learning models into two categories, namely, forecasting-based models and reconstruction-based models.

Forecasting-Based Models
Forecasting-based models learn the usual patterns from past data, predict future patterns, and then label anomalies if the real future data differ too much from their predictions. Recurrent neural networks (RNNs) are a commonly used approach, as they are designed to handle sequences of data, making them naturally suited for time series. RNNs are trained on a sequence of data points to learn the pattern; when predicting future data points, if the actual data deviate significantly from the predictions, the data are labeled as anomalous. Long short-term memory (LSTM) networks are an advanced type of RNN designed to remember patterns over long sequences and avoid the long-term dependency issues found in traditional RNNs. LSTMs are particularly good at capturing long-term patterns in time series data: if an LSTM's prediction for a future data point does not match the actual observation, this indicates an anomaly. Owing to their long memory, they can be particularly useful for spotting anomalies based on long-term patterns [26,27]. Convolutional neural networks (CNNs) are primarily designed for image processing to identify spatial hierarchies in data. However, they can be adapted to time series data by treating segments of the series as local patterns. A CNN can slide over a time series and learn local patterns [28]; after training, if a new pattern appears that does not match any learned pattern, the CNN can label it as an anomaly. This makes CNNs effective for capturing local anomalies in a dataset. Transformer-based models [29] use attention mechanisms to weigh the importance of different data points in a sequence. Introduced for natural language processing tasks, their adaptability has extended their usage to time series forecasting. Transformers can attend to significant patterns in a time series dataset; once trained, if the model encounters a data point or sequence that significantly deviates from the patterns it attended to, it can label that as an anomaly. The capacity to handle long sequences with varied attention spans makes transformers robust for complex anomaly detection scenarios. Graph neural networks (GNNs) are designed for graph-structured data. Graphs consist of nodes and edges, and GNNs process these data by propagating and aggregating information from neighboring nodes to enhance the feature representation of each node or edge. Time series data can be transformed into a graph format, especially when there are relationships or correlations between different time series. For instance, in multivariate time series, where different series influence each other, or in scenarios where temporal patterns form a network of relationships [30], GNNs learn the underlying structure and relationships in the data. A deviation from the learned graph structure or relationship pattern then indicates an anomaly.

Reconstruction-Based Models
This type of model aims to learn a compressed representation of the data and then reconstruct it. Anomalies are often identified based on how well the model can reconstruct a particular data point or sequence. Autoencoder-based models [31] aim to copy their inputs to their outputs and consist of an encoder, which compresses the input into a latent-space representation, and a decoder, which reconstructs the input data from this representation. For anomaly detection, the autoencoder is trained on normal data so that it learns to reconstruct the input data well. When an anomalous data point is passed through, the reconstruction error (the difference between the original data point and its reconstruction) tends to be high, signaling an anomaly. Variational autoencoder (VAE)-based models [32] are autoencoders with added constraints on the encoded representations; they are designed to generate new data points and hence are often used in generative tasks. For anomaly detection, like standard autoencoders, VAEs are trained on normal data to learn the data structure. Anomalies are data points that are difficult for the VAE to reconstruct, leading to high reconstruction errors. Additionally, the latent space of a VAE (where data are compressed) follows a specific distribution, and deviations from this can also signal anomalies. Generative adversarial network (GAN)-based models [33] consist of two networks: a generator that produces data and a discriminator that evaluates them. The generator tries to produce data that the discriminator cannot distinguish from real data. GANs can be trained on normal data, where the generator learns to produce normal data samples. When a real data point fed to the discriminator is deemed "fake" (i.e., different from the learned distribution), this can indicate an anomaly [34].
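The encode/compress/reconstruct loop can be illustrated without a neural network: a PCA projection acts as a linear autoencoder, and points off the learned manifold incur a high reconstruction error. This is a sketch of the principle, not the models of [31-33]:

```python
import numpy as np

def pca_reconstruction_error(train, test, n_components=2):
    """Project test points onto the top principal components of the
    training data and measure the squared reconstruction error.
    A linear stand-in for an autoencoder's encode/decode round trip."""
    mu = train.mean(axis=0)
    _, _, Vt = np.linalg.svd(train - mu, full_matrices=False)
    W = Vt[:n_components]                # "encoder"/"decoder" weights
    codes = (test - mu) @ W.T            # encode: compress to latent space
    recon = codes @ W + mu               # decode: reconstruct
    return ((test - recon) ** 2).sum(axis=1)

rng = np.random.default_rng(2)
# normal data lies on a 2-D plane inside 5-D space
basis = rng.normal(size=(2, 5))
train = rng.normal(size=(200, 2)) @ basis
normal_pt = rng.normal(size=(1, 2)) @ basis   # on the plane
anomaly = rng.normal(size=(1, 5)) * 3.0       # off the plane
errs = pca_reconstruction_error(train, np.vstack([normal_pt, anomaly]))
# the on-plane point reconstructs almost perfectly; the off-plane point does not
```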

Proposed Method
We propose an anomaly detection method that combines a transformer architecture with an autoencoder structure. Transformer-based models were originally designed for natural language processing tasks [11]. These models use an attention mechanism to weigh the importance of different data points in a sequence, enabling them to capture long-range dependencies in data. Transformers can be trained in a reconstruction manner similar to autoencoders and can learn to predict or reconstruct a segment of a time series based on its context; a high reconstruction error indicates an anomaly. Given the transformer's ability to handle long sequences and varied attention spans, it can capture both local and global anomalies in data. The majority of existing time series anomaly detection methods place a prevalent emphasis on understanding predominant temporal patterns. However, these traditional approaches prioritize either pointwise representations focusing on individual data points or pairwise associations examining relationships between pairs. Thus, these models often fail to comprehensively capture the adjacent concentration inductive bias of each time point in time series data. This inductive bias suggests that for each time point in a time series, its immediate neighbors are more relevant or influential for its representation than distant points. Furthermore, these models can be susceptible to distribution shifts in the data, meaning that they might struggle when the underlying statistical properties of the time series change over time.
To address the challenges faced by traditional time series anomaly detection methods, a two-fold solution is proposed.First, the learnable Gaussian kernel is introduced to effectively handle the adjacent concentration inductive bias, ensuring that each data point in the series adequately emphasizes its immediate neighbors.Second, the integration of reversible instance normalization (RevIN) is suggested, incorporating both normalization and denormalization with a learnable affine transformation.This approach provides a robust mechanism to counteract distribution shifts, ensuring consistent model performance even as the underlying statistical properties of the data evolve.

Anomaly Transformer
The anomaly transformer is an adaptation of the transformer architecture designed for unsupervised time series anomaly detection. In anomaly transformers, the temporal association of each time point with the rest of the series is obtained using a self-attention map and is termed the 'series association'. The series association is more informative for non-anomalous time points and less so for anomalous time points. Because anomalous time points are rare, it is difficult for them to build associations with the entire series; their associations instead concentrate on adjacent time points, where similar disruptions are more likely to appear. This adjacent-concentration pattern is modeled by the 'prior association'. Based on the series association and prior association, a new criterion called the 'association discrepancy' is introduced for anomaly classification. The self-attention mechanism is modified to separately obtain the prior association and series association for each time point. Although series associations are obtained using conventional self-attention, prior associations are obtained using learnable Gaussian kernels. A minimax approach is implemented to enhance the differentiation between normal and abnormal patterns in the association discrepancy.

Reversible Instance Normalization
Time-series forecasting models frequently encounter challenges related to distribution shifts, where the statistical properties of the training and test data evolve over time, leading to performance issues. Although removing non-stationary information from input sequences can mitigate these discrepancies, it may compromise the model's ability to capture the original data distribution. To address this issue, reversible instance normalization (RevIN) was introduced, a method that normalizes input sequences and then denormalizes the model's output sequences using the stored normalization statistics [17]. This approach maintains performance while effectively handling distribution shifts in time-series forecasting. Suppose we have input and output time series data, $X = \{x^{(i)}\}_{i=1}^{N}$ and $Y = \{y^{(i)}\}_{i=1}^{N}$, respectively, where $N$ is the number of sequences, $K$ is the number of variables, $T_x$ is the length of the input, and $T_y$ is the length of the output. Then, given the mean and variance of each instance, $x_k^{(i)} \in \mathbb{R}^{T_x}$, the data are normalized as follows:
$$\hat{x}_{kt}^{(i)} = \gamma_k \,\frac{x_{kt}^{(i)} - \mathbb{E}_t\big[x_{kt}^{(i)}\big]}{\sqrt{\mathrm{Var}\big[x_{kt}^{(i)}\big] + \epsilon}} + \beta_k,$$
where $\mathbb{E}_t[x_{kt}^{(i)}]$ and $\mathrm{Var}[x_{kt}^{(i)}]$ are the instance-wise mean and variance, respectively, and $\gamma, \beta \in \mathbb{R}^{K}$ are learnable affine parameters. The mean and variance are given as follows:
$$\mathbb{E}_t\big[x_{kt}^{(i)}\big] = \frac{1}{T_x}\sum_{j=1}^{T_x} x_{kj}^{(i)}, \qquad \mathrm{Var}\big[x_{kt}^{(i)}\big] = \frac{1}{T_x}\sum_{j=1}^{T_x}\Big(x_{kj}^{(i)} - \mathbb{E}_t\big[x_{kt}^{(i)}\big]\Big)^2.$$
Similarly, the forecasting-model output, $\tilde{y}^{(i)}$, is denormalized as follows:
$$\hat{y}_{kt}^{(i)} = \sqrt{\mathrm{Var}\big[x_{kt}^{(i)}\big] + \epsilon}\cdot\frac{\tilde{y}_{kt}^{(i)} - \beta_k}{\gamma_k} + \mathbb{E}_t\big[x_{kt}^{(i)}\big].$$
In this work, we intentionally use normalization to further emphasize the differences between the anomalous and non-anomalous datapoints. Because anomalies are rare, it is difficult for them to build series associations, whereas their associations with their neighboring datapoints are stronger. When the input data are normalized, the anomalies in the data become less pronounced. Considering this, we propose to find series associations using normalized data and prior associations using the original (non-normalized) data. We hypothesize that, in this way, stronger prior associations can be observed, which will help us obtain better association discrepancies. In our architecture, we do not use the learnable shift parameter, $\beta$, as it has previously been determined that the difference between using and omitting it is negligible [18].
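A minimal numpy sketch of the RevIN normalize/denormalize round trip described above (with the scale γ fixed at one and, per the text, no learnable shift β; the class name is ours):

```python
import numpy as np

class RevIN:
    """Minimal reversible instance normalization (after [17]).
    Normalizes each instance with its own statistics, stores them,
    and later undoes the transform on the model's output."""
    def __init__(self, num_vars, eps=1e-5):
        self.gamma = np.ones(num_vars)   # learnable scale (kept fixed here)
        self.eps = eps

    def normalize(self, x):              # x: (time, variables)
        self.mu = x.mean(axis=0, keepdims=True)
        self.var = x.var(axis=0, keepdims=True)
        return self.gamma * (x - self.mu) / np.sqrt(self.var + self.eps)

    def denormalize(self, y):            # invert with the stored statistics
        return y / self.gamma * np.sqrt(self.var + self.eps) + self.mu

rng = np.random.default_rng(3)
x = rng.normal(5.0, 2.0, size=(100, 4))  # a shifted, scaled instance
revin = RevIN(num_vars=4)
x_norm = revin.normalize(x)              # zero mean, unit variance per variable
x_back = revin.denormalize(x_norm)       # recovers the original instance
```

In a trained model, `gamma` would be a learned parameter and the model would sit between `normalize` and `denormalize`.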

Reversible Instance Normalized Anomaly Transformer (RINAT)
Motivated by the limitations of transformers and the success of the anomaly transformer in unsupervised anomaly detection, we enhanced the anomaly transformer into the reversible instance normalized anomaly transformer. We adopted the anomaly transformer [12] as it addresses the challenge of the adjacent concentration inductive bias by introducing the prior association and series association of each time point. We also leveraged the concept of reversible normalization and rethought the anomaly transformer for the same application. This architecture estimates the anomaly score based on the association discrepancy and the reconstruction error. The association discrepancy considers the prior association and series association of each time point. The prior association employs the learnable Gaussian kernel to represent the adjacent concentration inductive bias of each time point. The series association corresponds to the self-attention weights learned from the raw series. We renovated the anomaly transformer by adding reversible instance learnable normalization to the input time series data; because anomalies are rare, normalization might reduce their impact. Thus, normalization was only applied to the series association part, as shown in Figure 1. This partial application of reversible instance normalization brings to light the variations between the series associations and the prior associations when determining the association discrepancies. As in the anomaly transformer, we utilized an encoder-only design with stacks of specially designed attention blocks and feedforward layers; these stacks are repeated multiple times. However, our attention block differs from the anomaly attention block. Given the time series data, $X \in \mathbb{R}^{T}$, with $T$ time steps and each time-step value $x_i$ in the sequence, we perform embedding on the given time series data. For the input layer, we take layer $l = 0$.
For $X_{Out}^{l=0} \in \mathbb{R}^{T \times D}$, $D$ represents the embedding dimension, effectively capturing both the time series length and the embedded feature dimensions. The proposed transformer architecture for anomaly detection integrates the power of the traditional transformer with additional steps. These steps include reversible normalization and semi-stationary anomaly attention, as well as strategic placement of the layer normalization and denormalization. A salient feature of this architecture is the semi-stationary anomaly attention, which takes two distinct inputs: the first is the normalized data from the reversible normalization stage, and the second is the raw embedded data directly from the embedding phase. The reversible normalization stage normalizes the given data by subtracting their mean, $\mu$, and dividing by their standard deviation, $\sigma$.
Given the two distinct inputs to the semi-stationary anomaly attention, this stage estimates the association discrepancy using a two-branch structure. One branch estimates the prior association to address the challenge of the adjacent concentration inductive bias. The relationship between two temporal points, $i$ and $j$, with respect to their relative temporal distance within the series is quantified using the Gaussian kernel, represented by the following equation:
$$P_{i,j}^{l} = \mathrm{Rescale}\left(\frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left(-\frac{|j-i|^2}{2\sigma_i^2}\right)\right),$$
where $\mathrm{Rescale}(\cdot)$ normalizes each row into a probability distribution. Benefiting from the unimodal property of the Gaussian kernel, this design essentially pays more attention to the adjacent time points. The learnable scale parameter, $\sigma$, of the Gaussian kernel makes the prior associations adapt to various time series patterns, such as different lengths of anomaly segments.
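Assuming the row-wise rescaling described above, the prior association can be sketched as follows (a fixed σ stands in for the learnable scale parameter):

```python
import numpy as np

def prior_association(T, sigma):
    """Gaussian-kernel prior association P for one attention head.
    P[i, j] falls off with the temporal distance |i - j|, and each row
    is rescaled into a probability distribution, so adjacent time
    points receive most of the attention mass."""
    idx = np.arange(T)
    dist = np.abs(idx[:, None] - idx[None, :])              # |i - j|
    kernel = np.exp(-dist ** 2 / (2.0 * sigma[:, None] ** 2)) \
             / (np.sqrt(2.0 * np.pi) * sigma[:, None])
    return kernel / kernel.sum(axis=1, keepdims=True)       # row-wise rescale

# per-point scale; learnable in the actual model, fixed here
P = prior_association(T=8, sigma=np.full(8, 1.5))
```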
Next, the other branch of the normalized anomaly attention estimates the series association, which corresponds to the self-attention weights learned from the raw series. Given the normalized embedded data, $X_{norm}^{l}$, self-attention weights are computed using a scaled dot-product between the query, $Q^l$, and the key, $K^l$, followed by a SoftMax operation. We compute $Q^l$, $K^l$, and $V^l$ as follows:
$$Q^l = X_{norm}^{l} W_Q^{l}, \qquad K^l = X_{norm}^{l} W_K^{l}, \qquad V^l = X_{norm}^{l} W_V^{l},$$
where $W_Q^{l}$, $W_K^{l}$, $W_V^{l} \in \mathbb{R}^{D \times D}$ are the weights for layer $l$. Then, the series association coefficient, $S^l$, is derived as follows:
$$S^l = \mathrm{SoftMax}\left(\frac{Q^l \big(K^l\big)^{\top}}{\sqrt{D}}\right).$$
The series association coefficient and prior association coefficient both represent probability distributions. The disparity between the prior and series associations is measured using the Kullback-Leibler (KL) divergence.
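A compact sketch of the series association and the KL-based discrepancy (random weights stand in for the learned projections; a uniform prior is used only to exercise the discrepancy function):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerically stable
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def series_association(x_norm, Wq, Wk):
    """Self-attention weights S from the normalized embedding:
    scaled dot-product of queries and keys, then row-wise softmax."""
    Q, K = x_norm @ Wq, x_norm @ Wk
    return softmax(Q @ K.T / np.sqrt(Wq.shape[1]))

def association_discrepancy(P, S, eps=1e-12):
    """Symmetrized KL divergence between the prior and series
    associations, summed per row (one value per time point)."""
    kl_ps = (P * np.log((P + eps) / (S + eps))).sum(axis=1)
    kl_sp = (S * np.log((S + eps) / (P + eps))).sum(axis=1)
    return kl_ps + kl_sp

rng = np.random.default_rng(4)
T, D = 16, 8
x_norm = rng.normal(size=(T, D))
S = series_association(x_norm, rng.normal(size=(D, D)), rng.normal(size=(D, D)))
P_uniform = np.full((T, T), 1.0 / T)          # illustrative prior
disc = association_discrepancy(P_uniform, S)  # one discrepancy per time point
```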
After the attention mechanism, the output is normalized using layer normalization. This step improves the model convergence and ensures stable activations. The normalized output is then directed to a feedforward neural network, which further extracts high-level features and representations from the data. Once processed, the output undergoes another layer normalization step to maintain a stabilized activation range. To preserve the original time series scale and pattern, a denormalization step is employed, reversing the effects of the initial normalization and ensuring the final output remains intricately tied to the original series dynamics.
In the proposed architecture, training is guided by two loss functions simultaneously. This dual-loss approach helps the network learn and adapt based on two different objectives. The primary component is the reconstruction loss, measuring the disparity between the original series and the decoded output, essentially guiding the series association to recognize the most pivotal associations. Complementing this is the association discrepancy loss, which highlights the differences between typical and unusual patterns in time series data. The loss function for the input series is as follows:
$$\mathrm{Loss}_{Final}(X, P, S, \lambda; X_{Rev}) = \left\|X - X_{Rev}\right\|_F^2 - \lambda \left\|\mathrm{AssDis}(P, S; X)\right\|_1, \qquad (12)$$
where $\|\cdot\|_F$ denotes the Frobenius norm and $X_{Rev}$ is the reconstructed series. The value of $\lambda$ determines the influence of the association discrepancy within the broader context of the loss function. Additionally, we implemented the minimax strategy to make the association discrepancy more distinguishable. This strategy is applied between the series association and the prior association in two phases. In the minimize phase, the model adjusts the prior association, $P^l$, to approximate the series association, $S^l$. The prior association serves as an initial understanding, which is then refined based on the actual patterns observed in the series association; this enables the prior association to adapt to the variety of temporal patterns found in the data. Conversely, in the maximize phase, the objective is to increase the association discrepancy, pushing the series association to focus more on non-adjacent data points; the model pays extra attention to data points that are separated by significant time intervals. Finally, a score, $AS(X)$, is assigned to each data point in the series to quantify its deviation from the norm, giving a pointwise anomaly criterion based on the association discrepancy.
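The pointwise criterion can be sketched as follows, in the spirit of [12]: the softmax of the negated association discrepancy weights the pointwise reconstruction error, so points that reconstruct poorly and associate narrowly score highest. The synthetic inputs below are ours:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def anomaly_score(x, x_rec, ass_disc):
    """Pointwise anomaly criterion: a small association discrepancy
    (anomalies associate mostly with their neighbors, so their series
    association stays close to the prior) combined with a large
    reconstruction error yields a high score."""
    rec_err = ((x - x_rec) ** 2).sum(axis=-1)   # pointwise reconstruction error
    return softmax(-ass_disc) * rec_err

T = 100
rng = np.random.default_rng(5)
x = rng.normal(size=(T, 3))
x_rec = x + rng.normal(scale=0.05, size=(T, 3))  # good reconstruction...
x_rec[40] = x[40] + 2.0                          # ...except at point 40
ass_disc = np.ones(T)
ass_disc[40] = 0.1                               # anomalies associate narrowly
scores = anomaly_score(x, x_rec, ass_disc)       # point 40 scores highest
```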

Experiments
We extensively evaluated the proposed RINAT on four publicly available datasets from three practical applications.

Datasets
We used the following four datasets in our experiments: (1) the server machine dataset (SMD) [35], which is a dataset collected from a large internet company and consists of five-week-long data with 38 dimensions; (2) pooled server metrics (PSMs) [36], which are a collection of internally collected data from multiple application server nodes at eBay and have 26 dimensions; and the (3) Mars Science Laboratory (MSL) rover [37] and (4) Soil Moisture Active Passive (SMAP) [37] satellite datasets, which are public datasets made available by NASA, contain telemetry anomaly data derived from Incident Surprise Anomaly (ISA) reports of spacecraft monitoring systems, and have 55 and 25 dimensions, respectively.

Implementation Details
The overall experiments were performed on a system with a single Nvidia GeForce RTX 3090, and the code was implemented in the PyTorch framework, version 1.13. The overall setup followed that of the anomaly transformer [12]. A non-overlapping sliding window was used to obtain a set of sub-series, as in [38]. For all the datasets, the sliding window was set to a fixed size of 100. Time points were labeled as anomalies if their anomaly scores were higher than a certain threshold, δ. The threshold, δ, was determined such that a proportion, r, of the data in the validation dataset would be labeled as anomalies. We set r = 0.5% for the SMD dataset and 1% for the rest. For anomaly detection, if a single time point in a certain segment of an anomalous time series was detected, the whole anomalous segment was considered detected. This adjustment strategy has previously been widely adopted [35,38,39]. Similar to the anomaly transformer [12], our model contains three layers. We set the number of channels of the hidden state to 512 and the number of heads, h, to 8. The hyperparameter λ (Equation (12)) was set to 3 for all the datasets to trade off the two parts of the loss function. We used the ADAM optimizer [40] with an initial learning rate of 10^-4. The training process was stopped early, within 10 epochs, with a batch size of 32.
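The threshold selection and the segment-level adjustment strategy described above can be sketched as follows (`select_threshold` and `point_adjust` are illustrative names, not functions from the released code):

```python
import numpy as np

def select_threshold(val_scores, r=0.01):
    """Pick the threshold delta so that a proportion r of the
    validation scores fall above it (i.e., are labeled anomalous)."""
    return np.quantile(val_scores, 1.0 - r)

def point_adjust(pred, label):
    """Widely used adjustment: if any point inside a ground-truth
    anomalous segment is detected, mark the whole segment detected."""
    pred = pred.copy()
    in_seg = False
    for i in range(len(label)):
        if label[i] and not in_seg:          # a labeled segment starts
            in_seg = True
            start = i
        if not label[i]:
            in_seg = False
        if label[i] and pred[i] and in_seg:  # hit inside the segment:
            j = start                        # back- and forward-fill it
            while j < len(label) and label[j]:
                pred[j] = True
                j += 1
    return pred

val_scores = np.linspace(0.0, 1.0, 1000)
delta = select_threshold(val_scores, r=0.01)     # top 1% labeled anomalous

label = np.array([0, 1, 1, 1, 0, 0, 1, 1], dtype=bool)
pred  = np.array([0, 0, 1, 0, 0, 0, 0, 0], dtype=bool)
adjusted = point_adjust(pred, label)             # first segment fully credited
```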

Results
Table 1 shows the quantitative comparison of the precision, recall, and F1 scores for the 16 baseline models and the proposed model. Although the performance of the proposed model is comparable to that of the anomaly transformer on the SMD, MSL, and SMAP datasets, it is better than that of the state-of-the-art anomaly transformer on the PSM dataset. Figures 2-5 compare the precision, recall, and F1 scores for the four groups of classifiers. The proposed model outperforms almost all the existing algorithms except for the anomaly transformer.
Table 1. Quantitative results for the proposed model and 16 other models on four real-world datasets. The metrics used for comparison are precision (P), recall (R), and F1 scores; higher values represent better performance. The results of the anomaly transformer were replicated using the publicly provided code, while the results of the remaining models were copied from the anomaly transformer paper [12].
On the MSL data, the proposed model shows slightly lower precision and F1 scores than the anomaly transformer. On the PSM dataset, the proposed model performs slightly better, with an F1 score of 98.28, although its performance remains very comparable to that of the anomaly transformer. On the SMAP dataset, the F1 scores are almost the same, indicating a similar overall performance. On the SMD dataset, the proposed model shows a drop in the recall and F1 scores compared to those of the anomaly transformer. Overall, the anomaly transformer tends to perform better in the MSL and SMD datasets in terms of the F1 score, while the proposed model has a slight edge in the PSM dataset, and both models perform similarly in the SMAP dataset. The proposed model consistently shows higher precision than the anomaly transformer across all the datasets but lower recall in the MSL and SMD datasets. Figure 6 shows the ROC curves for the proposed architecture alongside those of the anomaly transformer and BeatGAN architectures. The AUC values of the proposed architecture in the SMAP and PSM datasets exceed even those of the anomaly transformer, and although the proposed architecture does not outshine the anomaly transformer in the MSL and SMD datasets, the AUC values are comparable.

Conclusions
In conclusion, our paper introduced the reversible instance normalized anomaly transformer, building upon the fundamental principles of the anomaly transformer. Through a comprehensive evaluation against well-established benchmarks, including the anomaly transformer and 16 other baseline models across multiple datasets, we have gained valuable insights. Although our model demonstrates commendable performance, it is crucial to recognize that its strengths and limitations are context-dependent, varying across datasets.

Figure 2. Comparison of the proposed model with four different models in group 1 classifiers using four different datasets: (a) SMD; (b) MSL; (c) SMAP; (d) PSM.


Figure 4. Comparison of the proposed model with four different models in group 3 classifiers using four different datasets: (a) SMD; (b) MSL; (c) SMAP; (d) PSM.
