Applied Sciences
  • Article
  • Open Access

27 June 2023

Anomaly Detection through Grouping of SMD Machine Sounds Using Hierarchical Clustering

1 Division of Computer Engineering, Hankuk University of Foreign Studies, Yongin 17035, Republic of Korea
2 Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Smart Machines and Intelligent Manufacturing

Abstract

Surface-mounted device (SMD) assembly machines refer to production lines that assemble a variety of products to fit their purposes. As the required products become more diverse, the number of models that oversee product anomaly detection also increases linearly. To oversee products efficiently, the number of models has to be reduced by grouping products with similar characteristics and overseeing them together. In this paper, we show that it is possible to handle a large number of new products using latent vectors obtained from an autoencoder model. By hierarchically clustering the latent vectors, the method finds product groups with similar characteristics and oversees them by group. Furthermore, we validate our multi-product operation strategy for anomaly detection with a newly collected SMD dataset. Experimental results show that anomaly detection using hierarchical clustering of latent vectors is a practical management method for SMD anomaly detection.

1. Introduction

SMD refers to a manufacturing line that continuously assembles various products. Two characteristics of SMDs are as follows:
  • Continuity: The SMD assembly machine constantly assembles products along the manufacturing line, so if the manufacturing line breaks down, the damage is enormous.
  • Product variety: The product type varies depending on which sensor is assembled, and the number of product types increases linearly over time.
There is a need for a system that can immediately detect abnormalities [1,2] when there is a problem in the manufacturing line; as products become linearly diversified, there is a need for a method that can effectively detect abnormalities in products.
Considering the real-world scenario, where a sufficient amount of abnormal data cannot be obtained, unsupervised learning, which trains only the normal data with the autoencoder [3,4,5] and detects the anomaly with the reconstruction error [6], is suitable.
Previous SMD anomaly detection studies [7,8,9] using autoencoders developed as follows. Oh et al. [7] performed anomaly detection using a convolutional neural network (CNN)-based autoencoder. The anomaly detection performance of the proposed model was good, but to address its long training time caused by the large number of parameters, Park et al. [8] proposed the fast adaptive RNN encoder–decoder (FARED). FARED is a recurrent neural network (RNN)-based anomaly detection model; it improves anomaly detection performance while training faster and using fewer parameters than a CNN, achieved through effective preprocessing and stacked RNNs. At that time, there were not many product types, and the product assembly time was short and constant, so FARED detected abnormalities well. As products diversified over time, two problems arose: the relatively short and constant assembly time became long and varied, and the number of models to manage grew too large. To address these problems, Nam et al. [9] proposed temporal adaptive average pooling (TAAP), a preprocessing method that enables more effective learning of longer and more varied assembly times, and the self-attention-based sequence-to-sequence autoencoder (SSS-AE), which can detect abnormalities in multiple products with one model while maintaining a fast training speed. Although the multi-product anomaly detection performance of SSS-AE is robust, products have continued to diversify, and overseeing all of them with one model requires retraining whenever a new SMD product is introduced. Therefore, we considered how to group and oversee products with relatively similar characteristics to reduce both the number of models to be managed and the number of model trainings.
The proposed method is an anomaly detection technique based on hierarchical clustering of latent vectors obtained through a pre-trained autoencoder. The autoencoder [3,4,5] is trained to represent data well in the latent space through compression and reconstruction; if latent vectors obtained through the encoder of a pre-trained autoencoder have similar characteristics, their distances in the latent space will be closer than those of data with different characteristics. Hierarchical clustering [10] groups objects into clusters based on the distances between individual clusters. Our method leverages the characteristics of both the autoencoder and hierarchical clustering to manage data efficiently by grouping them according to similar features. First, we pre-train the autoencoder model with normal data from products for which abnormal data also exist; we use such products because their abnormal data serve as the threshold for grouping data with similar features. Using the pre-trained autoencoder, we obtain latent vectors of the SMD data and then perform hierarchical clustering based on the distances between the latent vectors in the latent space. If the latent vectors of the normal data of another product are closer to the latent vectors of the normal data of the pre-training products than the latent vectors of the abnormal data are, we group the products together. The proposed method thus uses both the autoencoder and the clustering technique to determine whether data exhibit features similar to the training data. With abnormal data as the reference, if new data demonstrate similar characteristics, anomaly detection can be performed using the existing model without unnecessary training.
This approach proves suitable for anomaly detection in SMD products, which are diverse and continuously produced. In this paper, we show that it is possible to group and oversee products with similar characteristics by hierarchically clustering the latent vector obtained from the autoencoder. Furthermore, we validate our proposed method with a newly collected SMD dataset.

3. Proposed Method

In Section 3, we describe the proposed method. The method applies hierarchical clustering to latent vectors obtained from a pre-trained autoencoder and establishes clustering criteria based on two types of actual abnormal data. This approach effectively reduces the number of unnecessary models and training iterations. The encoder of the autoencoder is trained to accurately represent the features of the training data in the latent space. Consequently, when obtaining latent vectors of data using a pre-trained autoencoder, data with characteristics similar to the training data exhibit shorter distances in the latent space, while data with different characteristics exhibit longer distances. To leverage these characteristics, we pre-train an autoencoder model with normal data from products for which abnormal data exist, perform hierarchical clustering using the obtained latent vectors, and group and manage data with similar characteristics. In the hierarchical clustering, we use the distance between the latent vectors of the training data and the latent vectors of the abnormal data as the threshold for grouping. If the latent vectors of normal data from untrained products fall within this threshold, the products are grouped; if they exceed it, individual models are trained for anomaly detection. Figure 1 shows the flowchart of the proposed method. Pre-training of the autoencoder is described in detail in Section 3.1 and hierarchical clustering in Section 3.2.
Figure 1. The flowchart of the proposed method.

3.1. Pre-Training the Autoencoder Model

Before explaining the pre-training of the autoencoder model, we introduce the autoencoder. The autoencoder [3] is a neural network that uses unsupervised learning algorithms to train so that the output is equal to the input. The autoencoder consists of an encoder that encodes a high-dimensional input $x = [x_0, x_1, \ldots, x_{n-1}]$ into a low-dimensional hidden representation $h = [h_0, h_1, \ldots, h_{d-1}]$ and a decoder that reconstructs the low-dimensional hidden representation $h$ into a high-dimensional output $\hat{x} = [\hat{x}_0, \hat{x}_1, \ldots, \hat{x}_{n-1}]$. The equations for the encoder and decoder networks are as follows:
$$h = f(W_h x + b_h),$$
$$\hat{x} = g(W_{\hat{x}} h + b_{\hat{x}}),$$
where $x, \hat{x} \in \mathbb{R}^n$, $n$ is the dimension of the input $x$ and output $\hat{x}$; $h \in \mathbb{R}^d$, $W_h \in \mathbb{R}^{d \times n}$, $W_{\hat{x}} \in \mathbb{R}^{n \times d}$, and $d$ is the dimension of the hidden representation $h$; $W$ and $b$ denote the weight matrices and bias vectors; and $f, g$ are non-linear activation functions [22]. The autoencoder is trained by reducing the reconstruction error, given by Equations (17) and (18), between the input $x$ and the output $\hat{x}$. If the input is continuous, we calculate the reconstruction error with the mean squared error:
$$L(x, \hat{x}) = \| x - \hat{x} \|^2.$$
If the input is categorical, we calculate the cross-entropy loss:
$$L(x, \hat{x}) = -\frac{1}{n} \sum_{i=1}^{n} \left( x_i \log \hat{x}_i + (1 - x_i) \log(1 - \hat{x}_i) \right),$$
where $n$ is the number of categories of the categorical data. In the process of compressing and reconstructing the input data, the encoder of the autoencoder is trained to extract important features from the input data so that the decoder can accurately reconstruct the hidden representation.
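The encoder, decoder, and the two reconstruction losses above can be expressed in a few lines of numpy. This is a minimal sketch: the dimensions, weights, and the choices of tanh and sigmoid for the non-linearities $f$ and $g$ are illustrative, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 8, 3                                 # input and hidden dimensions (illustrative)
W_h = rng.normal(scale=0.1, size=(d, n))    # encoder weights
b_h = np.zeros(d)                           # encoder bias
W_x = rng.normal(scale=0.1, size=(n, d))    # decoder weights
b_x = np.zeros(n)                           # decoder bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    # h = f(W_h x + b_h), with f = tanh as an example non-linearity
    return np.tanh(W_h @ x + b_h)

def decode(h):
    # x_hat = g(W_x h + b_x), with g = sigmoid so outputs lie in (0, 1)
    return sigmoid(W_x @ h + b_x)

def mse_loss(x, x_hat):
    # squared reconstruction error for continuous inputs
    return float(np.sum((x - x_hat) ** 2))

def bce_loss(x, x_hat):
    # cross-entropy reconstruction error for categorical inputs
    return float(-np.mean(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat)))

x = rng.uniform(size=n)      # toy input in [0, 1)
h = encode(x)                # low-dimensional hidden representation
x_hat = decode(h)            # high-dimensional reconstruction
```

Training would then minimize `mse_loss` (or `bce_loss`) with respect to the four parameter arrays.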
We pre-train the SSS-AE model [9] to utilize the characteristics of the autoencoder. SSS-AE is an autoencoder-based SMD anomaly detection model with the best performance among existing SMD anomaly detection models, meaning that it represents the input well in the latent space. SSS-AE is a Seq2Seq-based autoencoder model whose RNN layers are LSTMs. The encoder and decoder of SSS-AE consist of N blocks, each composed of an LSTM, the multi-head self-attention of the transformer [23], and layer normalization [24] connected sequentially. A residual connection [25] adds the output of each sub-layer of the block before layer normalization. The last block of the decoder consists of one LSTM layer that reconstructs the output of the previous block.
We pre-train SSS-AE using normal data from products that contain abnormal data according to Algorithm 1. This is because abnormal data will be a threshold for finding untrained data with characteristics similar to those trained. Then, we obtain latent vectors for abnormal data of the trained product and normal data of untrained products using the pre-trained SSS-AE model.
Algorithm 1: Pre-training the autoencoder model with normal data
Input: Normal dataset $[x_0, x_1, \ldots, x_{n-1}]$ of the product, where $n$ is the number of product data samples
Output: Encoder of the autoencoder
   repeat
        calculate $L(X, \hat{X})$ by Equations (9), (10), and (17), where $X$ is a set of data samples $x$ and $\hat{X}$ denotes the output sequence data
        update the parameters using the gradients of $L$
   until the number of epochs given in the experiment is reached
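The repeat/update/until structure of Algorithm 1 can be sketched as a plain gradient-descent loop. The sketch below substitutes a linear autoencoder on synthetic data for SSS-AE, with hand-derived gradients, purely to make the loop concrete; the dimensions, learning rate, and epoch count are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, epochs, lr = 6, 2, 500, 0.1

# stand-in for the "normal dataset" of a product: 32 samples of dimension n
X = rng.normal(size=(32, n))

W_h = rng.normal(scale=0.2, size=(d, n))    # encoder parameters
W_x = rng.normal(scale=0.2, size=(n, d))    # decoder parameters

losses = []
for _ in range(epochs):                     # "repeat ... until epochs"
    H = X @ W_h.T                           # encode all samples
    X_hat = H @ W_x.T                       # decode (reconstruct)
    E = X_hat - X
    losses.append(float(np.mean(E ** 2)))   # reconstruction loss L(X, X_hat)
    # analytic gradients of the mean-squared reconstruction error
    dX_hat = 2.0 * E / E.size
    dW_x = dX_hat.T @ H
    dH = dX_hat @ W_x
    dW_h = dH.T @ X
    W_x -= lr * dW_x                        # "update the parameters"
    W_h -= lr * dW_h

def encoder(x):
    # the output of Algorithm 1: an encoder mapping inputs to latent vectors
    return x @ W_h.T
```

After training, `encoder` is what the proposed method keeps: it embeds new product data into the latent space for clustering.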

3.2. Hierarchical Clustering

By hierarchically clustering the obtained latent vectors according to their distances in the latent space, products whose latent vectors lie closer than the distance between the normal data of the trained product and its abnormal data, Error 1 and Error 2, can be grouped and overseen. Error 1 is noise that is difficult for humans to distinguish, and Error 2 is noise that humans can recognize.
Hierarchical clustering starts with each object as a separate cluster and successively merges clusters until all objects are grouped into a single cluster [10]. The following linkage methods [26] are used to compute the distance between two clusters $U$ and $V$, denoted $d(U, V)$.
  • Centroid: The distance between clusters $U$ and $V$ is the distance between their center points:
    $$d(U, V) = \| c_u - c_v \|_2,$$
    where $c_u$ and $c_v$ are the center points of the two clusters $U$ and $V$, respectively. The center point of a cluster is the average of all data contained in the cluster:
    $$c_u = \frac{1}{|U|} \sum_i u_i,$$
    where $|\cdot|$ refers to the number of elements in a cluster.
  • Single: For all combinations of data $u_i$ in cluster $U$ and data $v_j$ in cluster $V$, we measure the distances and take the smallest value:
    $$d(U, V) = \min_{i,j} d(u_i, v_j).$$
  • Complete: For all combinations of data $u_i$ and $v_j$, we measure the distances and take the largest value:
    $$d(U, V) = \max_{i,j} d(u_i, v_j).$$
  • Average: For all combinations of data $u_i$ and $v_j$, we take the average of the distances:
    $$d(U, V) = \sum_{i,j} \frac{d(u_i, v_j)}{|U| |V|}.$$
  • Median: This method is a variation of the centroid linkage method. As in the centroid method, the distance between clusters is the distance between their center points. When clusters $S$ and $T$ combine to form cluster $U$:
    $$U \leftarrow S + T,$$
    the center point of cluster $U$ is not recalculated from all of its data; instead, the average of the center points of the two original clusters is used:
    $$c_u = \frac{1}{2}(c_s + c_t).$$
    Therefore, the calculation is faster than obtaining the center point by averaging all the data in the cluster.
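The linkage distances above are easy to check numerically. A small numpy example with two hand-picked clusters (the coordinates are arbitrary; the median update rule is noted in a comment since it applies only at merge time):

```python
import numpy as np

U = np.array([[0.0, 0.0], [1.0, 0.0]])       # cluster U
V = np.array([[4.0, 0.0], [6.0, 0.0]])       # cluster V

# pairwise Euclidean distances d(u_i, v_j) for all combinations
pair = np.linalg.norm(U[:, None, :] - V[None, :, :], axis=-1)

c_u, c_v = U.mean(axis=0), V.mean(axis=0)    # cluster center points
centroid = np.linalg.norm(c_u - c_v)         # ||c_u - c_v||_2
single = pair.min()                          # smallest pairwise distance
complete = pair.max()                        # largest pairwise distance
average = pair.sum() / (len(U) * len(V))     # mean over all pairs
# median linkage: when S and T merge into a new cluster, its center
# point is taken as (c_s + c_t) / 2 rather than recomputed from all data
```

Here `centroid` and `average` coincide (4.5) only because the toy clusters are symmetric; in general the five linkages give different merge orders.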
The dendrogram resulting from the hierarchical clustering of the latent vectors can be used to set a threshold: the distance between the latent vectors of the normal data and the abnormal data of the trained product. Normal data of untrained products that lie closer than this threshold can be grouped for anomaly detection. This is because the pre-trained autoencoder embeds data with characteristics similar to the training data closely in the latent space. By performing hierarchical clustering on these latent vectors, data with characteristics similar to the training data can be grouped together, which enables anomaly detection without separate model training. If the grouping threshold is placed at Error 1, products can be overseen more strictly, although fewer products can be grouped; if the threshold is placed at Error 2, many more products can be grouped, so they can be overseen more efficiently.
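The threshold-based grouping can be sketched with SciPy's hierarchical clustering. The synthetic 2-D "latent vectors" below, and the choice of half the final merge distance as the cut, are illustrative assumptions rather than the paper's actual values; the point is the mechanism, i.e. deriving a distance threshold from where the abnormal data merge and cutting the dendrogram there.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)

# stand-ins for latent vectors: trained product normals, an untrained
# product with similar sound, and far-away abnormal (Error-level) data
normal = rng.normal(loc=0.0, scale=0.05, size=(5, 2))
untrained = rng.normal(loc=0.1, scale=0.05, size=(3, 2))
abnormal = np.array([[3.0, 3.0]])

# distance at which the abnormal data merge with the trained normals,
# scaled by an illustrative safety factor of 0.5 to stay inside it
Z_ref = linkage(np.vstack([normal, abnormal]), method="centroid")
threshold = Z_ref[-1, 2] * 0.5

# cluster everything and cut the dendrogram at the threshold distance
X = np.vstack([normal, untrained, abnormal])
Z = linkage(X, method="centroid")
groups = fcluster(Z, t=threshold, criterion="distance")
```

The untrained product lands in the same flat cluster as the trained normals, so it could be overseen by the existing model, while the abnormal point stays in its own cluster.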

4. Experiment

In this section, we describe the dataset in Section 4.1, the experimental process in Section 4.2, and the experimental results in Section 4.3. Section 4.1 describes the dataset used in the experiment and newly collected data for verification of the proposed method, and Section 4.1.1 describes how to preprocess the data to train the model more effectively before training it. Section 4.2 describes how the proposed method described in Section 3 was applied to the experiment, and Section 4.3 describes the experimental results.

4.1. Dataset

The SMD dataset consists of sound data collected by microphones installed on SMD assembly machines. As the SMD machine continuously assembles the product, the raw data represent the continuous assembly sounds of the product. As the raw data were too long for training, the data were segmented based on the assembly section of each individual product during the data collection process. The data collection process is the same as in the previous studies [7,8,9]. Figure 2 shows an example of the data collection process. In a previous study [9], we collected 30 normal data samples and 6 abnormal data samples. Because SMD products diversify according to the combination of sensors, products in the real field are becoming more varied. We collected 15 sets of normal data and 5 sets of abnormal data from 15 SMD products; the new SMD dataset size is 6.1 GB. Abnormal data were collected when an actual problem occurred in the assembly process and were classified into two categories based on the degree of human recognition. These two error levels are expressed as Error 1 and Error 2 and are marked as product name-1 and product name-2. Since there are many types of products and the names are complicated, the first two letters of the product name, with one letter in alphabetical order, are combined and marked in the Abbr. column in the table. Amt. is the abbreviation for amount; the Amt. column in the table represents the number of samples collected for each product's data. The previously collected dataset is described in Table 1 and the newly collected dataset in Table 2. Bold text in the tables indicates abnormal data.
Figure 2. Example of the sound data collection process. Each sample is the assembly sound datum divided by the standard when the assembly of one product was completed.
Table 1. Large SMD dataset consisting of 30 normal products and 6 abnormal products. Bold text refers to abnormal products. Abnormal products were excluded from training.
Table 2. Newly acquired SMD dataset consisting of 15 normal products and 5 abnormal products. Bold text refers to abnormal products. Abnormal products were excluded from training.

4.1.1. Data Preprocessing

Before training the model, we transform the data into spectrograms $s \in \mathbb{R}^{T \times D}$, representing the data in the time–frequency domain, where $T$ is the length of the spectrogram and $D$ is the number of frequency bins. As shown in the Time column in Table 1, the lengths of the data vary, and the length $T$ of the converted spectrogram $s$ also varies. We use temporal adaptive average pooling (TAAP) [9], an effective data preprocessing method that reduces the variable length $T$ to a target length $L$ using an adaptive kernel. TAAP is calculated as follows:
$$k = \left\lceil \frac{T}{L} \right\rceil,$$
$$\bar{s}_j = \frac{1}{k} \sum_{i = j \times k}^{(j+1) \times k - 1} s_i,$$
where $\lceil \cdot \rceil$ refers to the ceiling function and $j = 0, 1, \ldots, L-1$. If $T$ is not a multiple of $L$, zero-padding is applied to the spectrogram $s$. By using TAAP, data corruption can be minimized, and since the data are reduced to a fixed length, the calculation speed increases because the data are not truncated or padded when configuring a batch, and the model can be trained reliably. After TAAP, we normalize the sequence $\bar{s}$ with min–max normalization.
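TAAP as defined by the two equations above has a direct numpy implementation (the toy spectrogram is made up for illustration):

```python
import numpy as np
from math import ceil

def taap(s, L):
    """Temporal adaptive average pooling: reduce a (T, D) spectrogram
    to (L, D) by averaging with an adaptive kernel k = ceil(T / L)."""
    T, D = s.shape
    k = ceil(T / L)                          # adaptive kernel size
    pad = k * L - T                          # zero-pad if T is not a multiple of L
    if pad:
        s = np.vstack([s, np.zeros((pad, D))])
    # mean over each window [j*k, (j+1)*k - 1] for j = 0 .. L-1
    return s.reshape(L, k, D).mean(axis=1)

spec = np.arange(6, dtype=float).reshape(6, 1) + 1.0   # toy spectrogram, T=6, D=1
pooled = taap(spec, L=3)                               # k = 2: [[1.5], [3.5], [5.5]]
norm = (pooled - pooled.min()) / (pooled.max() - pooled.min())  # min-max normalize
```

Because every input is pooled to the same length `L`, batches can be formed without truncation or padding at training time.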

4.2. Experimental Process

The experimental process consists of the following four main steps, and we describe the experimental setup in detail.
  • Step 1: Data preprocessing.
We transform all datasets described in Table 1 into Mel spectrograms. A Hanning window was used, with a window size of 2048, a hop size of 512, and a Mel filter bank of 80 for the transform. Then, TAAP was applied with a target length of 32 according to Equations (26) and (27), and the result was normalized with min–max normalization.
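The windowing step with the stated parameters (window 2048, hop 512) can be sketched in plain numpy. The Mel filter-bank stage is omitted here for brevity, so this yields a magnitude spectrogram rather than a Mel spectrogram, and the input signal is synthetic:

```python
import numpy as np

def stft_mag(y, n_fft=2048, hop=512):
    """Magnitude spectrogram using a Hanning window; frames that do not
    fit a full window are dropped rather than padded."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))   # shape (n_frames, n_fft//2 + 1)

rng = np.random.default_rng(3)
y = rng.normal(size=2048 + 3 * 512)              # short synthetic signal
S = stft_mag(y)
S_norm = (S - S.min()) / (S.max() - S.min())     # min-max normalization
```

In the actual pipeline an 80-band Mel filter bank would be applied to `S` before TAAP and normalization.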
  • Step 2: Train the autoencoder model.
Each SSS-AE model was trained with normal data from product data {GTC, STE, STI}, where abnormal data exist to obtain an encoder that can represent each product well in the latent space. Then, with a trained encoder, all normal data and abnormal data, i.e., Error 1 and Error 2 of the base product of the group, were embedded in the latent space. The batch size for pre-training was set to 64, the number of epochs was set to 5000, the number of heads for multi-head self-attention was set to 8, and the Adam optimizer [27] was used.
  • Step 3: Hierarchical clustering.
The latent vectors, obtained by embedding the data from each product {GTC, STE, STI} into the latent space using the encoder of the trained model, were hierarchically clustered using the centroid linkage method based on Equations (19) and (20).
  • Step 4: Verification of newly collected data.
To validate our methodology, we verify it with the newly collected dataset. We find out how many products can be grouped and overseen without training, including the product group found through hierarchical clustering, and verify whether anomaly detection is possible with the abnormal data of the newly collected dataset.

4.3. Experiment Result

To verify how many normal products can be overseen with only one pre-trained model, we apply hierarchical clustering to the data from three products, i.e., {STE, STI, GTC}, having normal and abnormal sounds. Figure A1 summarizes the hierarchical clustering results on each latent vector obtained from SSS-AE. We denote the distance value at which each of the abnormal products merges with a magenta dotted line for Error 1 and a red dotted line for Error 2. First, in the results of the hierarchical clustering, we found that the numbers of normal products grouped into the same cluster under Error 1 for product data {STE, STI, GTC} were 1, 6, and 2, respectively. In the case of Error 2, 23, 26, and 30 products can be overseen in the same cluster. Error 2 has relatively stronger noise than Error 1, so more products can be grouped and overseen in one cluster, whereas Error 1, with relatively weak noise, resembles a normal product. It is up to the user to decide whether to group and manage more products under strong noise, such as Error 2, or to group them under weak noise, such as Error 1, and manage them in the same cluster. We set Error 1 as the threshold and verified our methodology with the newly collected SMD data. Figure A2 shows the result of hierarchically grouping the products that can be grouped by the Error 1 threshold of the trained product data {STI} together with the products that can be grouped by the Error 1 threshold among the newly collected data. The black labels refer to previously collected data, and the blue labels refer to newly collected data. The magenta label refers to abnormal data at the Error 1 level, and the red label refers to abnormal data at the Error 2 level. The left and center dendrograms are the hierarchical clustering results with newly collected product data {GTD, NAC}, respectively, which involves finding product data within the Error 1 threshold among the newly collected data.
The dendrogram at the right is a hierarchical clustering result that includes product data {GTD, NAC, GTD-2}, which shows that the proposed method works by hierarchically clustering products within the Error 1 threshold among the newly collected data and their abnormal data. Normal data are grouped together with normal data, the existing Error 1 level of abnormal data is then clustered hierarchically, and the newly collected Error 2 level of abnormal product data {GTD-2} is clustered with abnormal product data {STI-2}. This shows that product data that can be grouped through hierarchical clustering were found, and anomaly detection was possible through actual abnormal data.

5. Conclusions

SMD refers to a machine used for assembling electronic components or semiconductors on electronic boards. Due to the characteristics of SMD, which is continuously assembled along the manufacturing line, a large loss can be prevented by immediately detecting an abnormality. The problem is that the types of SMD products produced according to the combination of electronic components and sensors vary, and accordingly, the number of anomaly detection models also increases linearly. To resolve these problems, in this work, we proposed a practical management method that clusters various products using the structural characteristics of an autoencoder. The method hierarchically clusters the latent vectors embedded by the encoder of the autoencoder. Even with a strict threshold that can detect the Error 1 noise level, which is difficult for humans to detect, one model trained only on the base product can group and oversee up to 7 untrained normal products; for more efficient management, up to 30 untrained products can be grouped and overseen with a threshold at the Error 2 noise level, which is relatively easy to detect. The findings suggest that our method allows anomaly detection without training individual anomaly detection models for each new SMD product, as long as the data of the new product exhibit characteristics similar to the previous training data. These results provide evidence that the proposed methodologies can effectively operate SMD anomaly detection models in real-world situations.
Our research employs a threshold using abnormal data, but it primarily trains on normal data for anomaly detection. With the passage of time, the SMD assembly machine is expected to undergo aging, which can lead to changes in data characteristics even when the machine operates normally. Currently, our anomaly detection research does not encompass changes in data characteristics caused by machine aging. However, we plan to conduct future research on anomaly detection to account for such changes.

Author Contributions

Writing–original draft, Y.J.S.; Writing–review & editing, K.H.N.; Supervision, I.D.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science, Technology (no. 2019R1A2C1085113) and the 2023 Hankuk University of Foreign Studies Research Fund.

Institutional Review Board Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A presents the results of the hierarchical clustering using the proposed method as a dendrogram.
Figure A1. Hierarchical clustering results of products {STI, STE, GTC} with abnormal data. The green label refers to the base product used for training, the black label refers to the normal product data previously collected, the magenta dotted line refers to the Error 1 grouping threshold, and the red dotted line refers to the Error 2 grouping threshold.
Figure A2. Hierarchical clustering results of products that can oversee Error 1 as a grouping threshold. The green label refers to the base product used for training, and the blue label refers to the newly acquired normal data. The black label refers to the normal product data previously collected, the magenta dotted line refers to the Error 1 grouping threshold, and the red dotted line refers to the Error 2 grouping threshold.

References

  1. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. (CSUR) 2009, 41, 1–58. [Google Scholar] [CrossRef]
  2. Chalapathy, R.; Chawla, S. Deep learning for anomaly detection: A survey. arXiv 2019, arXiv:1901.03407. [Google Scholar]
  3. Zhang, G.; Liu, Y.; Jin, X. A survey of autoencoder-based recommender systems. Front. Comput. Sci. 2020, 14, 430–450. [Google Scholar] [CrossRef]
  4. Han, J.; Liu, T.; Ma, J.; Zhou, Y.; Zeng, X.; Xu, Y. Anomaly Detection and Early Warning Model for Latency in Private 5G Networks. Appl. Sci. 2022, 12, 12472. [Google Scholar] [CrossRef]
  5. Elhalwagy, A.; Kalganova, T. Multi-Channel LSTM-Capsule Autoencoder Network for Anomaly Detection on Multivariate Data. Appl. Sci. 2022, 12, 11393. [Google Scholar] [CrossRef]
  6. Wulsin, D.; Blanco, J.; Mani, R.; Litt, B. Semi-supervised anomaly detection for EEG waveforms using deep belief nets. In Proceedings of the 2010 Ninth International Conference on Machine Learning and Applications, Washington, DC, USA, 12–14 December 2010; pp. 436–441. [Google Scholar]
  7. Oh, D.Y.; Yun, I.D. Residual error based anomaly detection using autoencoder in SMD machine sound. Sensors 2018, 18, 1308. [Google Scholar] [CrossRef]
  8. Park, Y.; Yun, I.D. Fast adaptive RNN encoder–decoder for anomaly detection in SMD assembly machine. Sensors 2018, 18, 3573. [Google Scholar] [CrossRef]
  9. Nam, K.H.; Song, Y.J.; Yun, I.D. SSS-AE: Anomaly Detection using Self-Attention based Sequence-to-Sequence Autoencoder in SMD Assembly Machine Sound. IEEE Access 2021, 9, 131191–131202. [Google Scholar] [CrossRef]
  10. Johnson, S.C. Hierarchical clustering schemes. Psychometrika 1967, 32, 241–254. [Google Scholar] [CrossRef] [PubMed]
  11. Chen, Z.; Yeo, C.K.; Lee, B.S.; Lau, C.T. Autoencoder-based network anomaly detection. In Proceedings of the 2018 Wireless Telecommunications Symposium (WTS), Phoenix, AZ, USA, 17–20 April 2018; pp. 1–5. [Google Scholar]
  12. Masci, J.; Meier, U.; Cireşan, D.; Schmidhuber, J. Stacked convolutional autoencoders for hierarchical feature extraction. In Proceedings of the International Conference on Artificial Neural Networks, Espoo, Finland, 14–17 June 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 52–59. [Google Scholar]
  13. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 dataset. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; pp. 1–6. [Google Scholar]
  14. Said Elsayed, M.; Le-Khac, N.A.; Dev, S.; Jurcut, A.D. Network anomaly detection using LSTM based autoencoder. In Proceedings of the 16th ACM Symposium on QoS and Security for Wireless and Mobile Networks, Alicante, Spain, 16–20 November 2020; pp. 37–45. [Google Scholar]
  15. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  16. Diday, E.; Simon, J. Clustering analysis. In Digital Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 1976; pp. 47–94. [Google Scholar]
  17. Izakian, H.; Pedrycz, W. Anomaly detection in time series data using a fuzzy c-means clustering. In Proceedings of the 2013 Joint IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), Edmonton, AB, Canada, 24–28 June 2013; pp. 1513–1518. [Google Scholar]
  18. Kumar, S.; Khan, M.B.; Hasanat, M.H.A.; Saudagar, A.K.J.; AlTameem, A.; AlKhathami, M. An Anomaly Detection Framework for Twitter Data. Appl. Sci. 2022, 12, 11059. [Google Scholar] [CrossRef]
  19. Shi, P.; Zhao, Z.; Zhong, H.; Shen, H.; Ding, L. An improved agglomerative hierarchical clustering anomaly detection method for scientific data. Concurr. Comput. Pract. Exp. 2021, 33, e6077. [Google Scholar] [CrossRef]
  20. Lyon, R.J.; Stappers, B.; Cooper, S.; Brooke, J.M.; Knowles, J.D. Fifty years of pulsar candidate selection: From simple filters to a new principled real-time classification approach. Mon. Not. R. Astron. Soc. 2016, 459, 1104–1123. [Google Scholar] [CrossRef]
  21. Saraçli, S.; Doğan, N.; Doğan, İ. Comparison of hierarchical cluster analysis methods by Cophenetic correlation. J. Inequalities Appl. 2013, 2013, 1–8. [Google Scholar] [CrossRef]
  22. Sharma, S.; Sharma, S.; Athaiya, A. Activation functions in neural networks. Towards Data Sci. 2017, 6, 310–316. [Google Scholar] [CrossRef]
  23. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
  24. Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar]
  25. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  26. Murtagh, F.; Contreras, P. Algorithms for hierarchical clustering: An overview. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2012, 2, 86–97. [Google Scholar] [CrossRef]
  27. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
