Data Anomaly Detection of Bridge Structures Using Convolutional Neural Network Based on Structural Vibration Signals

Structural health monitoring provides valuable information on the state of structural health, which is helpful for structural damage detection and structural state assessment. However, when sensors are exposed to harsh environmental conditions, various anomalies caused by sensor failure or damage lead to abnormal monitoring data. Because monitoring systems produce massive amounts of data, removing abnormal data manually is inefficient. In this paper, a data anomaly detection method based on structural vibration signals and a convolutional neural network (CNN) is proposed, which can automatically identify and eliminate abnormal data. First, the anomaly detection problem is modeled as a time series classification problem. Data preprocessing and data augmentation, including data expansion and down-sampling to construct new samples, are employed to process the original time series. For classes with few samples in the data set, the methods of randomly adding outliers, symmetrical flipping, and noise addition are used for data expansion, adding samples with the same label without altering the original samples. The down-sampling method of symmetrically extracting the maximum and minimum values at the same time effectively reduces the dimensionality of the input sample while retaining the characteristics of the data to the greatest extent. With hyperparameter tuning of the class weights, the CNN deals more effectively with unbalanced training sets. Finally, the effectiveness of the proposed method is demonstrated by anomaly detection on acceleration data from a long-span bridge. For the anomaly detection problem modeled as a time series classification problem, the proposed method can effectively identify various abnormal patterns.


Introduction
In the field of structural health monitoring, the problem of data accumulation has received increasing attention. During the real-time monitoring of bridges, large volumes of data are generated every day. These data, which contain damage information about the bridge structure, are the basis of bridge state assessment and long-term performance prediction. However, the installed sensors are exposed to harsh environments. As working time increases, sensor performance degrades, which may cause sensor failure or data anomalies [1]. Without an effective data processing mechanism, anomalies not only increase storage costs but also fail to guide the formulation of bridge maintenance strategies.
The existing data anomaly detection methods can generally be divided into model-based methods and data-driven methods. Model-based methods rely on finite element models to reflect inherent structural characteristics, and a series of statistical and mechanical models have been established to predict measurement outputs [2][3][4][5]. Model-based methods can achieve good detection accuracy. However, when dealing with large volumes of SHM data, it is difficult to create a reliable explicit finite element model that describes the behavior of the structure in service [6].
Data-driven methods include statistical process control and machine learning methods. They do not rely on finite element models and directly analyze measured time series data, which hopefully alleviates the shortcomings of model-based methods [7]. Among data-driven methods, deep-learning-based methods have the potential to learn from big data containing abnormal data to automatically diagnose various abnormal data. Recently, deep learning has been increasingly applied to solve time-series-related tasks [8][9][10], including time series classification, time series prediction, and time series anomaly detection. Bao et al. [11] proposed a data anomaly detection method based on computer vision and deep learning. The original time series measurement values are first converted into image vectors, and then these image vectors are input to a deep neural network (DNN) to identify various anomalies. Tang et al. [12] proposed a new anomaly detection method using computer vision and deep learning methods. This method first converts the original time series data into images, imitating human-vision-based data collection, and then trains a CNN for abnormal classification. Mao et al. [13] combined the generative adversarial network with an autoencoder to improve the performance of existing unsupervised learning methods and used two data sets from full-scale bridges to verify the proposed method.
Supervised deep learning relies heavily on a large number of labeled training data to train the network. However, many abnormal data patterns in actual projects do not have enough labeled data. Therefore, how to efficiently generate a large number of labeled synthetic data with fewer samples is a problem worthy of attention. As an effective tool to improve the quantity and quality of training data, data augmentation is essential for the successful application of deep learning models. The basic idea of data augmentation is to allow limited data to generate more value when new data are not added substantially while maintaining correct labels. Data augmentation has achieved good results in many application scenarios [14]. Sun et al. [15] proposed a simple but effective data augmentation method for generating multi-view 2D pose annotations. Liu et al. [16] proposed an image generation technique to enhance the robustness of the convolutional neural network model. Time-domain transformation is the most direct data augmentation method for time series data. Most of them directly process the original input time series. Cui et al. [17] proposed a sliding window method combined with a Multi-scale Convolutional Neural Network (MCNN) to solve the time series classification problem and achieved good results on a large number of benchmark data sets. Fawaz et al. [18] proposed a new method for generating new time series with DTW and ensembled them by a weighted version of the DBA algorithm. Wen et al. [19] used data augmentation methods such as random mutation and adding random trends in different data sets and proposed a time series segmentation approach based on convolutional neural networks (CNN) and transfer learning. Gao et al. [20] proposed a label expansion method to change those data points near the labeled anomalies and their labels as anomalies, which brings performance improvement for time series anomaly detection.
For the time series classification problem, most studies model anomaly detection as a computer-vision-based classification problem, while classification methods based directly on vibration signals are rarely studied. In addition, little research uses time series data augmentation to obtain a more balanced sample set, and one-dimensional convolutional networks, which are faster for time series problems, are also rarely used. In this paper, a data anomaly identification method using a one-dimensional CNN is proposed based on bridge monitoring acceleration data, in which data augmentation is employed to process the samples.

Bridge Overview and Data Set Composition
This research uses the health monitoring data set of a long-span cable-stayed bridge in China. The main span of the bridge is 1088 m long, and the two side spans are 300 m each; the two towers are 306 m high. The structural health monitoring system of the bridge consists of 38 sensors, whose positions on the bridge are shown in Figure 1. The sensors include accelerometers, anemometers, strain gauges, global positioning systems (GPS), and thermometers. For this research, one month (1 January-31 January 2012) of acceleration data from all 38 sensors of the SHM system was used for data anomaly detection. The sampling frequency of the accelerometers is 20 Hz. The original continuous measurement data are divided into one-hour segments through the method of non-overlapping windows, yielding 744 time series per sensor over the month and 28,272 (744 × 38) data segments in total. The dimension of a single data segment is 1 × 72,000. Figure 2a-g shows an example of each type of data pattern. Table 1 describes the quantity and characteristics of normal data and the six types of abnormal data. Each data segment has a true category label: normal time series are labeled 1, and the six abnormal data patterns are labeled 2-7. Nearly 52% of the data are abnormal. "Trend" is the main abnormal pattern, constituting 20.4% of the data set, followed by "missing" and "square", each accounting for about 10%. On the other hand, "outlier" accounts for only 1.9% of the data set, and "drift" for 2.4%.

Data Preprocessing
Since there may be missing values or computation errors in the process of data acquisition, data cleaning is performed on all data to remove missing or incorrectly computed values, which appear in MATLAB as "NaN". In order to keep the data length unchanged as the input of the neural network, all "NaN" values are replaced with 0.
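This cleaning step can be sketched in a few lines of NumPy (a minimal illustration; the function name is ours, not from the paper's code):

```python
import numpy as np

def clean_series(x):
    """Replace NaN values with 0 so the series keeps its original length,
    matching the preprocessing described in the text."""
    x = np.asarray(x, dtype=float).copy()
    x[np.isnan(x)] = 0.0
    return x

# Toy segment with two missing values
segment = np.array([0.1, np.nan, -0.2, np.nan, 0.05])
cleaned = clean_series(segment)  # NaNs become 0, length is unchanged
```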



Zero-mean normalization is used to process each one-hour segment so that the normalized data have zero mean and unit standard deviation. This method eliminates errors caused by self-variation or large differences in magnitude, making the data more suitable for subsequent steps. As shown in Equation (1), x*_i = (x_i − µ)/σ, where x is a one-hour time series {x_1, x_2, . . . , x_N}, µ is the mean of all sampling points, σ is the standard deviation of all sampling points, and x* is the normalized time series.

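The zero-mean normalization of Equation (1) is a standard z-score transform; a minimal sketch (function name is ours):

```python
import numpy as np

def zero_mean_normalize(x):
    """Equation (1): x* = (x - mu) / sigma, applied to one hour of data."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# One hour of 20 Hz data: 72,000 points with a nonzero offset
hour = np.random.default_rng(0).normal(loc=3.0, scale=0.5, size=72_000)
normed = zero_mean_normalize(hour)  # mean ~0, std ~1 afterwards
```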

Data Augmentation
Augmentation methods should always be selected appropriately for the case under consideration [14]. For example, when applied to a time series containing outliers, a sliding window may fail to capture the mutation features. Therefore, this research processes each sample at its full one-hour length.
Data augmentation includes two steps: data expansion for the minority classes and down-sampling of all samples.
Data expansion is applied to the two minority classes, namely outlier and drift. Not all abnormal classes need to be expanded.
Outlier data can be defined as individual points whose amplitude greatly exceeds the normal range. Therefore, a data expansion method that magnifies individual points is used to construct outlier samples. Let x = {x_1, x_2, . . . , x_N} be a normal sample; as shown in Equation (2), p randomly selected points of x are replaced by mean + β × range, where p is a random integer between 10 and 60, mean is the mean value of x, β is a random number between −2 and 2, and range is the difference between the maximum and minimum values in x.
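A sketch of this expansion step is given below. The exact form of Equation (2) is inferred from the parameter description in the text, and all names are ours, so treat this as an illustrative reading rather than the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

def make_outlier_sample(x, rng):
    """Turn a normal sample into a synthetic outlier sample by replacing
    p randomly chosen points with mean + beta * range (assumed Eq. (2))."""
    x = np.asarray(x, dtype=float).copy()
    p = rng.integers(10, 61)                    # p: random integer in [10, 60]
    idx = rng.choice(len(x), size=p, replace=False)
    value_range = x.max() - x.min()             # 'range' in the text
    beta = rng.uniform(-2, 2, size=p)           # beta: random in [-2, 2]
    x[idx] = x.mean() + beta * value_range
    return x

normal = rng.normal(size=72_000)
outlier = make_outlier_sample(normal, rng)      # same length, a few spikes
```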
The methods of symmetrical flipping and noise addition are used to expand the drift samples. Drift data exhibit a random drift upwards or downwards, so up-and-down symmetrical flipping constructs a valid new sample.
The dimension of a single one-hour sample is 1 × 72,000, which is relatively large as the input of a neural network. Therefore, down-sampling is used to reduce the dimensionality of the sample while retaining its characteristics as much as possible, which increases the efficiency of the neural network. Since the upper and lower contours of a sample are both useful features, a down-sampling method that uses a sliding window to symmetrically extract the maximum and minimum values is adopted. All 1 × 72,000 samples are down-sampled over the entire sample length. A step size is selected (20 in this paper), and the maximum and minimum values are extracted from every window of that length. After processing, each 1 × 72,000 sample becomes a 2 × 3600 sample. Comparisons of examples before and after down-sampling are shown in Figure 4a,b, where the horizontal axis represents the number of sampling points and the vertical axis the acceleration amplitude in m/s².
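The max/min down-sampling step can be sketched as follows (function name is ours; the window length of 20 follows the text):

```python
import numpy as np

def downsample_max_min(x, step=20):
    """Reduce a 1 x N series to a 2 x (N // step) array by keeping the
    maximum and minimum of each non-overlapping window of `step` points,
    preserving the upper and lower contours of the signal."""
    x = np.asarray(x, dtype=float)
    n = len(x) // step
    windows = x[: n * step].reshape(n, step)
    return np.stack([windows.max(axis=1), windows.min(axis=1)])

# A 72,000-point sample becomes 2 x 3600, as in the text
sample = np.sin(np.linspace(0, 200 * np.pi, 72_000))
reduced = downsample_max_min(sample, step=20)
print(reduced.shape)  # (2, 3600)
```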

1D-CNN
A convolutional neural network (CNN) usually consists of an input layer, convolutional layers (Conv), pooling layers (Pooling), dense layers (Dense), and an output layer. In the CNN architecture, the first few layers usually alternate between convolutional and pooling layers, and the last few layers close to the output are dense layers. A CNN is an end-to-end learning model that can be trained with existing supervised gradient descent algorithms. For time-series-processing problems, the performance of a one-dimensional convolutional neural network (1D-CNN) can be comparable to that of a recurrent neural network (RNN) at a much smaller computational cost. For simple tasks such as time series classification, a small one-dimensional convolutional network can completely replace the RNN and runs faster [21].
Regardless of whether one-dimensional or two-dimensional convolution is used, convolutional neural networks have a similar structure. The structure starts with a stack of convolutional and pooling layers, followed by a flatten layer that converts the two-dimensional features into a one-dimensional output, after which multiple dense layers can be added for classification or regression. However, there is one small difference: one-dimensional convolutional neural networks can use larger convolution kernels [21]. For example, a 3 × 3 kernel in a two-dimensional convolutional layer contains 3 × 3 = 9 weights, whereas a kernel of size 3 in a one-dimensional convolutional layer contains only 3 weights. Therefore, a one-dimensional convolution kernel of size 9 or larger can easily be used.
The Python scientific suite, TensorFlow, and Keras are used to build the neural network architecture with GPU acceleration. The processor and graphics card of the hardware platform are an Intel Core i5-9400F and an Nvidia GeForce RTX 2070. The objective function of the CNN is set to categorical cross-entropy to estimate the difference between the actual and predicted data categories. The metric is set to accuracy to evaluate the performance of the model. To minimize the objective function, Adam, an adaptation of the mini-batch stochastic gradient descent algorithm, is used as the optimizer.
In the classification, the imbalanced training set needs to be considered; that is, the number of normal samples is much larger than the number of abnormal samples. If an imbalanced training set is used to train the network, abnormal samples may all be predicted as normal samples during testing while the accuracy remains high, which is meaningless. Therefore, the class weight technique [22] is used, which makes important categories of samples contribute more to the objective function during training. Batch Normalization (BN) [23,24] is a method widely used in deep network training. Adding BN after the convolutional layer and before the activation function reduces the need for deliberate, slow manual parameter tuning. Figure 5 shows the workflow of the proposed method.
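One common way to derive such class weights is inverse-frequency weighting; the paper cites [22] but does not give its exact formula, so the scheme below is an assumption for illustration:

```python
import numpy as np

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency so that rare classes
    contribute more to the objective function during training."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Imbalanced toy labels: 900 normal (1) vs. 100 abnormal (2)
labels = np.array([1] * 900 + [2] * 100)
w = balanced_class_weights(labels)  # {1: 0.555..., 2: 5.0}
```

The resulting dictionary can be passed to Keras as `model.fit(..., class_weight=w)`.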



Bridge Monitoring Data Verification
According to the proposed workflow, anomaly detection is performed on the bridge monitoring data set. First, data preprocessing is performed on all original samples: missing values are removed, and samples are standardized. In order to test the generalization ability of the model, the data set is divided into training and test sets: 80% of the samples are randomly selected as the training set (22,616 samples), and the remaining 20% form the test set (5656 samples). In order to simulate real anomalies, the distribution of test samples is unbalanced; Table 2 shows the distribution of the selected test samples. Constructing a training set with balanced categories is beneficial to the training process, so data expansion is carried out on the minority anomaly classes in the training set, namely outlier and drift. Normal samples in the training set are converted into outlier samples by magnifying individual points. Gaussian noise with a standard deviation of 2%, 3%, 4%, 5%, 6%, 7%, and 8% of the signal is added to each drift sample, and each drift sample is also symmetrically flipped once, producing 8 times the original number of drift samples. Therefore, an additional 10,860 (13,575 × 80%) outlier samples and 4345 (679 × 80% × 8) drift samples are obtained. After adding these to the training set, the new training set size is 37,821 (22,616 + 10,860 + 4345).
Down-sampling is implemented on the test set and new training set samples, and the dimensionality of the samples is reduced from 1 × 72,000 to 2 × 3600 while retaining most of their features.
To build the 1D-CNN architecture, two one-dimensional convolutional layers are stacked to extract deep features of the samples efficiently, and a flatten layer and two dense layers are connected to convert the two-dimensional features into a one-dimensional output. The last layer of the network uses the softmax multi-classifier. In short, the softmax function maps the outputs of the previous layer to values in (0, 1) that sum to 1 and can be interpreted as probabilities; the node with the largest probability is selected as the predicted data type. The network structure is shown in Figure 6. The detailed structure of the 1D-CNN is shown in Table 3, and the hyperparameter configuration in Table 4.
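A minimal tf.keras sketch of such an architecture is shown below. The filter counts, kernel widths, and dense sizes are illustrative placeholders (the exact values are in Tables 3 and 4, which are not reproduced here); the BN-before-activation ordering, the 2 × 3600 input, the softmax output, and the Adam/cross-entropy setup follow the text:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_1dcnn(input_len=3600, channels=2, n_classes=7):
    """Two Conv1D blocks (Conv -> BN -> ReLU -> pooling), then a flatten
    layer and two dense layers ending in a softmax over 7 classes."""
    model = models.Sequential([
        layers.Input(shape=(input_len, channels)),   # 2x3600 sample, channels-last
        layers.Conv1D(32, kernel_size=9),
        layers.BatchNormalization(),                 # BN before the activation
        layers.Activation("relu"),
        layers.MaxPooling1D(4),
        layers.Conv1D(64, kernel_size=9),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling1D(4),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_1dcnn()
```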
The monitored validation loss is the mean squared error, MSE = (1/N) Σ (Y_i − Y_0,i)², where Y represents the predicted value, Y_0 represents the true label value, and N represents the total number of samples.
In the training process, 12.5% of the training set is split off as the validation set. During training, the training loss and validation loss (MSE) are monitored, as are the training accuracy and validation accuracy. The changes in the loss function and accuracy are shown in Figures 7 and 8.
It can be seen that the overall loss value shows a downward trend and the overall accuracy an upward trend. The amplitude of change is large at the beginning of training, indicating that the learning rate is appropriate. There are local glitches and oscillations, possibly because a large batch size is selected for the large number of samples and because the real-world data set contains a small number of incorrectly labeled samples. After the loss value and accuracy stabilized, the final training and validation accuracy both exceeded 95%.
Table 5 shows the classification results in a statistical way. In the statistical analysis of binary or multi-class classification, precision, recall, and F1 score are measures of the accuracy of the classification results; the last is the harmonic mean of the first two. Recall is relative to the samples, that is, how many of the positive samples are predicted correctly. Take the missing-type samples in Table 5 as an example: there are 603 missing-type samples in total, and if 602 are predicted correctly, the recall is 602/603 = 99.83%. Precision is relative to the prediction results; it indicates how many of the samples predicted as positive are correct. Taking the normal-type samples as an example, a total of 2590 samples are predicted as normal; if 2542 of these predictions are correct, the precision is 2542/2590 = 98.15%.
Recall and precision are sometimes contradictory. If a single comprehensive indicator is needed to express the results of recall and precision, the most common choice is the F1 score, F1 = 2 × recall × precision / (recall + precision). It can be seen that the proposed method can effectively identify various data patterns. The recall of the normal, missing, minor, square, trend, and drift categories all reach above 90%. Except for the low F1 scores of outlier and drift, the other types are all high. A small number of minor samples are classified into the normal category. Some outlier samples are classified into the normal category, and a few into the minor category. An outlier sample may contain only a few peaks, so most of its features closely resemble those of a normal sample, and features that are too small may be lost during convolution. Trend and drift are partly confused, probably because both have slanted features.
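These definitions can be checked with a small sketch (the function and toy labels are ours; the recall count mirrors the missing-type example in the text):

```python
import numpy as np

def precision_recall_f1(y_true, y_pred, cls):
    """Per-class precision, recall, and F1 from label arrays."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == cls) & (y_true == cls))
    recall = tp / np.sum(y_true == cls)      # correct among actual cls samples
    precision = tp / np.sum(y_pred == cls)   # correct among predicted cls samples
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1

# 603 true samples of class 2, of which 602 are predicted correctly
y_true = [2] * 603 + [1] * 397
y_pred = [2] * 602 + [1] * 398
p, r, f1 = precision_recall_f1(y_true, y_pred, cls=2)
print(round(r * 100, 2))  # 99.83, matching 602/603 in the text
```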

Conclusions
This paper modeled the anomaly detection problem as a time series classification problem. The original time series undergoes data preprocessing and data augmentation to obtain a sample set with a more uniform distribution, more obvious features, and smaller dimensions. Data augmentation includes data expansion and down-sampling. For minority classes, the methods of symmetrical flipping, noise addition, and randomly generated outliers are used for data expansion, adding samples with the same label without altering the original samples. The down-sampling method of symmetrically extracting the maximum and minimum values effectively reduces the dimensionality of the input sample while retaining its features. A one-dimensional convolutional neural network model, which is faster for time series classification problems, is built, and hyperparameter tuning of the class weights makes the network more effective in dealing with an unbalanced training set. The method is verified with one month of acceleration data from a long-span cable-stayed bridge. For the anomaly detection problem modeled as a time series classification problem, the results show that the proposed method can automatically detect a variety of data anomaly categories with high precision.
The proposed method can accurately identify most types of abnormal data, but for abnormal types with very inconspicuous features, such as outlier data, there is still much room for improvement in recognition accuracy. In future work, time series augmentation will not only be carried out in the time domain, but will be expanded to the frequency domain, or more advanced methods (such as GAN) will be used to expand samples.