Label-Free Anomaly Detection Using Distributed Optical Fiber Acoustic Sensing

Deep learning anomaly detection is important in distributed optical fiber acoustic sensing (DAS). However, anomaly detection is more challenging than traditional learning tasks because true-positive data are scarce and datasets are vastly imbalanced and irregular. Furthermore, because it is impossible to catalog all types of anomalies, the direct application of supervised learning is deficient. To overcome these problems, an unsupervised deep learning method that learns only the normal data features of ordinary events is proposed. First, a convolutional autoencoder is used to extract DAS signal features. A clustering algorithm then locates the feature center of the normal data, and the distance from a new signal to this center is used to determine whether the signal is an anomaly. The efficacy of the proposed method was evaluated in a real high-speed rail intrusion scenario, in which all behaviors that may threaten the normal operation of high-speed trains were considered abnormal. The results show that the threat detection rate of this method reaches 91.5%, which is 5.9% higher than that of the state-of-the-art supervised network, and its false alarm rate of 7.2% is 0.8% lower. Moreover, using a shallow autoencoder reduces the parameter count to 1.34 K, significantly lower than the 79.55 K of the state-of-the-art supervised network.


Introduction
Anomaly detection refers to the process of detecting data events that deviate significantly from the norm. Owing to the increasing demand for such applications in domains such as risk management, security, financial surveillance, health, and medicine, a variety of machine-learning anomaly detection methods are being developed [1]. Video data are a common resource for anomaly detection [2,3]. However, three key challenges hamper the deployment of machine-learning video-anomaly solutions. First, deployment is difficult and incurs high costs; moreover, cameras can only be installed in specific areas, whereas abnormal events can occur anywhere, so blind and dead spots are problematic. Second, troubleshooting and maintenance are difficult and require vast human and material resources. Last, video capture is easily disturbed by external factors, such as haze, weather, luminance, and reflections. Distributed optical fiber acoustic sensing (DAS) can solve these problems because it offers many advantages, such as ultra-long detection distances, easy deployment, and resistance to harsh environments [4]. Therefore, it is highly suitable for anomaly detection [5].
DAS utilizes Rayleigh backscattering to measure sound or vibrations along an optical fiber over ultra-long distances [6]. Presently, the application of DAS is being studied in several fields, such as pipeline monitoring [7][8][9][10] that uses a buried optical fiber cable next to the pipeline. Perimeter security [11][12][13] is similarly monitored using a DAS system at the boundary of important facilities to detect intrusions. Earthquake detection [14][15][16][17] and other vibrational events are also monitored in this manner.
Numerous studies have focused on improving the performance of DAS, e.g., by applying new transmitted light schemes and scattering phase demodulation. Sensing range performance parameters, e.g., spatial and sensing resolutions, have significantly improved over the years [18]. To improve performance further, DAS methods are being combined with deep learning techniques for the intelligent and real-time identification of vibrations along the optical fiber [19][20][21][22][23].
However, the integration of deep learning inherits the same challenges faced by other machine learning anomaly detection models [1,24]. For example, the extreme imbalance between normal and abnormal samples in sensory datasets reduces the generalizability of the model. Many studies use artificially simulated abnormal data; however, it is impossible to simulate all DAS abnormalities because there is an abundance of noise in real environments. These challenges make it difficult to apply traditional supervised learning methods directly.
To address these challenges, this paper proposes an unsupervised method that learns only normal data features. Because DAS systems collect multi-channel time-series signals, the proposed method considers both the space and time dimensions: the original DAS signal over a given period is divided into windows consisting of several nearby channels, which become the model's input. An autoencoder is then used to extract normal data features, and a clustering algorithm is used to establish feature centers. During testing, if a window's features are sufficiently distant from the centers, the window is judged to be abnormal. Notably, the autoencoder's shallow convolutional structure is time efficient. To evaluate the efficacy of the proposed method, experiments were performed on data from a real-world high-speed rail intrusion scenario, in which intrusion behaviors such as wall climbing and wall breaking are regarded as abnormal. Because the scene contains intense background noise (e.g., high-speed trains and heavy trucks), identification is more difficult. Nevertheless, the experimental results show that our method improves the threat detection rate by 5.9% and reduces the false alarm rate by 0.8% compared with the state-of-the-art supervised network.

Principle of the DAS System
DAS can be understood as multiple sensors along an optical fiber (i.e., multiple continuous channels), with the distance between adjacent channels being the spatial resolution of the DAS; the data received by each channel differ. DAS can collect all vibration information that causes strain on the optical fiber, such as high-speed trains, heavy vehicles passing by, knocking on isolation walls, and climbing fences. The DAS structure shown in Figure 1 is based on the preliminary work of our team [13]. The system was deployed at a high-speed railway station, with the optical fiber fixed to the isolation wall along the tracks. Taking the railway station as the starting point, the overall sensing range of the optical fiber is about 40 km. Because a large amount of the data within the actual monitoring range is invalid (i.e., recorded in a silent state), only 150 channels (about 1.5 km) containing abnormal events were used to construct the dataset, together with the data from the sections extending forward and backward from them. The dataset contains many different noise signals, which lends the study a degree of generality.
This system applies phase-sensitive optical time-domain reflectometry (Φ-OTDR) [25]. It provides high spatial resolution (i.e., ≥5 m) and a high signal-to-noise ratio, which, in turn, drastically increases the sensing range of the DAS. The system uses a 1550 nm narrow-linewidth fiber laser with a maximum power of 13 dBm. The light is divided into two parts by a 2:98 coupler (OC1); 98% of the light is modulated into pulses by an acousto-optic modulator (AOM). An erbium-doped fiber amplifier (EDFA) amplifies the pulses, and a band-pass filter (BPF) suppresses the amplified spontaneous emission noise. The pulsed light passes through a circulator, and the Rayleigh backscattered (RB) light is collected. The RB light is then mixed with the remaining 2% of the light by OC2. Subsequently, a balanced photodetector detects the mixed light, retaining the frequency and phase information of the RB signal, and the frequency and phase of the mixed signal are obtained in an FPGA through a fast Fourier transform. The final spatial resolution is 10.2 m, and the sampling frequency (fs) of the vibration signal is 488 Hz.

Proposed Framework
The framework of the proposed unsupervised method is illustrated in Figure 2, which shows the vibration-signal capture and the training and testing phases. First, the DAS collects vibration signals along the optical fiber and generates space-time windows as the smallest input units. During training, a convolutional autoencoder learns the normal features; for each input x_i, it produces r n-dimensional feature vectors, F_i ∈ R^{r×n}. Next, a clustering algorithm is used to establish K feature centers, C ∈ R^{K×n}. During testing, the autoencoder extracts latent features from the test data, and the distances between the latent features and their nearest centers are calculated as the model's output. If the distance is greater than a given threshold (set using validation data), the window is identified as containing an anomaly. Finally, the results are smoothed to improve performance.

Space-Time Window
The efficacy of the proposed method was evaluated on a real-world intrusion scene at a high-speed railway station. The collected dataset includes intrusion behaviors such as climbing the isolation wall, destroying the wall, and destroying the isolation iron spikes, all of which are regarded as abnormal events during testing. Additionally, the signal strength of noise, such as that of a passing high-speed train, may be several times that of abnormal events, which increases the difficulty of event recognition (the data collected by the DAS are the relative phases between adjacent channels, which can also be understood as the magnitude of vibration energy). To give the data certain visual characteristics, bilinear interpolation was applied during image visualization (all DAS images in this paper were drawn after this operation). Figure 3 shows a typical multi-channel DAS signal, in which the abscissa is time, the ordinate is the channel number, the red box marks the signal of an abnormal event (here, the destruction of the separation wall), and the yellow boxes mark additional noise: noise 1 is the high-intensity noise of a passing train, and noise 2 is generated by the surrounding environment. Because the data collected by the DAS system have temporal and spatial continuity, with one event affecting multiple channels simultaneously, single-channel data are insufficient. Thus, the data are divided into adjustable space-time windows spanning multiple adjacent channels over a given period; the experiments compare the effects of different window sizes. The input is scanned over a certain time interval (i.e., the scanning period), and a window whose length equals the scanning period slides across all channels; the width of the sliding window is the channel width. If intrusion data are found in a window, it is determined to be an intrusion window.
Each space-time window is the smallest unit input to the model. Figure 4 illustrates the window generation process.
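As an illustrative sketch (not the authors' code), the window generation can be written as follows; the array layout (channels × samples) and the non-overlapping stride are assumptions:

```python
import numpy as np

def make_windows(signal, scan_period, channel_width):
    """Split a multi-channel DAS record of shape (channels, samples) into
    space-time windows of shape (channel_width, scan_period).
    A non-overlapping stride is assumed here for simplicity."""
    n_ch, n_t = signal.shape
    windows = []
    for c in range(0, n_ch - channel_width + 1, channel_width):
        for t in range(0, n_t - scan_period + 1, scan_period):
            windows.append(signal[c:c + channel_width, t:t + scan_period])
    return np.stack(windows)

# toy record: 30 channels, 1024 samples; windows of 10 channels x 256 samples
das = np.random.default_rng(0).normal(size=(30, 1024))
w = make_windows(das, scan_period=256, channel_width=10)
print(w.shape)  # (12, 10, 256): 3 channel groups x 4 time slices
```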

Autoencoder
An autoencoder is a neural network whose expected output is the input itself. It transforms a high-dimensional input into a low-dimensional code and restores it using paired encoding and decoding processes. This procedure forces the encoder to find the most effective low-dimensional expression of the high-dimensional data:

y = D(E(x)), (1)

where x is the input, y is the output, E is the encoder, and D is the decoder; the encoder and decoder can use any network structure. The purpose is to train E and D to minimize the difference between input and output, which forces the encoder to extract the most important features of the data to improve the decoder's reconstruction. Because an autoencoder is non-linear, it differs from linear dimensionality-reduction methods, such as principal component analysis, making it applicable to this study. The influence of different autoencoder depths on recognition performance and computational cost is discussed in the experimental section. The loss function is the mean squared error:

L = (1/N) ∑_{i=1}^{N} ||x_i − y_i||², (2)

where N is the number of windows; Adam optimization is used to update the network parameters. The specific structure is given in the experiment section.
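To make the objective concrete, the following minimal sketch trains a toy autoencoder on random data by plain gradient descent on the MSE loss. It is illustrative only: the paper's model is convolutional and trained with Adam, whereas this sketch uses a single dense encoder/decoder pair.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: 200 flattened windows of dimension 64
X = rng.normal(size=(200, 64))

# one dense encoder layer and one dense decoder layer
# (stand-ins for the paper's convolutional E and D)
n_in, n_hid = 64, 8
We = rng.normal(scale=0.1, size=(n_in, n_hid))   # encoder weights
Wd = rng.normal(scale=0.1, size=(n_hid, n_in))   # decoder weights

def relu(z):
    return np.maximum(z, 0.0)

lr = 0.05
losses = []
for _ in range(300):
    H = relu(X @ We)          # latent features E(x)
    Y = H @ Wd                # reconstruction D(E(x))
    err = Y - X
    losses.append(np.mean(err ** 2))  # MSE loss (Equation (2))
    # backpropagation of the MSE gradient
    gY = 2.0 * err / err.size
    gWd = H.T @ gY
    gH = gY @ Wd.T
    gH[H <= 0] = 0.0          # ReLU gradient
    gWe = X.T @ gH
    We -= lr * gWe
    Wd -= lr * gWd

print(losses[0] > losses[-1])  # reconstruction loss decreases: True
```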

Clustering
A clustering algorithm is a typical unsupervised learning method in which similar data are grouped without regard to class labels. K-means [26] is widely used owing to its simplicity and efficiency; a smaller Euclidean distance between two samples indicates greater similarity.
In this study, the autoencoder was used to extract normal features, and K-means clustering was used to establish 128 cluster centers.
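A plain K-means pass over learned features can be sketched as below (illustrative only; the paper uses K = 128 centers on the autoencoder features, while a toy two-blob example is used here):

```python
import numpy as np

def kmeans(features, k, n_iter=50, seed=0):
    """Plain K-means: returns k cluster centers of the given features."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each feature to its nearest center (Euclidean distance)
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned features
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return centers

# toy normal features: two tight blobs -> two centers near the blob means
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0.0, 0.1, (50, 4)), rng.normal(5.0, 0.1, (50, 4))])
centers = kmeans(feats, k=2)
print(np.sort(centers[:, 0]))  # close to [0, 5]
```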

Model Recognition
During testing, the distance between each latent feature and its nearest cluster center is computed using Equation (3), where F_i^j ∈ R^n, j ∈ {1, . . . , r}, is the j-th latent feature of input x_i:

d_i^j = min_k ||F_i^j − C_k||_2. (3)

The largest m distances are averaged as the output value of each window:

output_i = (1/m) ∑_{l=1}^{m} Top_l(d_i), (4)

where Top_l is a function that returns the l-th largest distance. If the output value is greater than the threshold, the corresponding window is considered an anomaly:

s_i = 1 if output_i > TH, otherwise s_i = 0, (5)

where TH is the threshold set from the ratio of false-positive windows in the validation set; s_i equals one if x_i is anomalous and zero otherwise.
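Putting these steps together, a minimal sketch of the per-window scoring follows; the variable names, centers, and threshold are toy values, not the paper's:

```python
import numpy as np

def window_score(F, centers, m):
    """Anomaly score of one window.
    F: (r, n) latent feature vectors of the window.
    centers: (K, n) cluster centers of the normal data.
    Each feature's distance to its nearest center is taken, and the
    m largest of these distances are averaged."""
    d = np.linalg.norm(F[:, None, :] - centers[None, :, :], axis=2).min(axis=1)
    return np.sort(d)[-m:].mean()

centers = np.zeros((1, 2))                  # one normal center at the origin
normal = np.random.default_rng(0).normal(0.0, 0.1, (8, 2))
anomal = normal + 5.0                       # features far from the center
s_norm = window_score(normal, centers, m=4)
s_anom = window_score(anomal, centers, m=4)
TH = 1.0                                    # toy threshold from validation data
print(s_norm > TH, s_anom > TH)  # False True
```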

Window Accumulation
Because anomalous behaviors are continuous, isolated detections of short duration are likely to be false positives. To avoid this, a window accumulation mechanism is added after the output. When determining whether window s_i is anomalous, the current and the prior α windows are traversed; if the number of anomalous windows among them reaches β, s_i is regarded as a true positive. Given consecutive windows containing the same set of channels, s_1, s_2, . . ., the final test result ŝ_i is defined as

ŝ_i = 1 if ∑_{j=i−α}^{i} s_j ≥ β, otherwise ŝ_i = 0. (6)
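A minimal sketch of this accumulation rule, assuming "exceeds β" means at least β flagged windows among the current and prior α windows:

```python
import numpy as np

def accumulate(s, alpha, beta):
    """Smooth raw per-window decisions s (0/1): window i is confirmed
    anomalous only if at least beta of the current and prior alpha
    windows were flagged."""
    s = np.asarray(s)
    out = np.zeros_like(s)
    for i in range(len(s)):
        lo = max(0, i - alpha)
        if s[lo:i + 1].sum() >= beta:
            out[i] = 1
    return out

# an isolated single-window false positive (index 1) is suppressed,
# while a sustained anomaly (indices 4-6) is confirmed
raw = [0, 1, 0, 0, 1, 1, 1, 0]
print(accumulate(raw, alpha=3, beta=2).tolist())  # [0, 0, 0, 0, 1, 1, 1, 1]
```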

Threshold Setting
The threshold used to detect space-time window abnormalities is set on the validation set: if the distance is greater than this threshold, the window is marked as abnormal. The threshold set on the validation set thus calibrates the model to the actual environment, and subsequent experiments are conducted on the test set using this threshold.
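One plausible realization of this calibration, taking the threshold as the quantile of validation-window scores that yields the target false alarm ratio (3% in the later experiments), is sketched below; the quantile formulation is an assumption, not the paper's stated procedure:

```python
import numpy as np

def set_threshold(val_scores, target_far=0.03):
    """Pick the threshold so that roughly target_far of the (normal)
    validation windows score above it."""
    return np.quantile(val_scores, 1.0 - target_far)

rng = np.random.default_rng(0)
val_scores = rng.normal(1.0, 0.2, 10_000)   # toy validation-set window scores
TH = set_threshold(val_scores, target_far=0.03)
far = (val_scores > TH).mean()
print(abs(far - 0.03) < 0.005)  # True: ~3% of validation windows exceed TH
```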

Evaluation Metrics and Parameters
The threat detection rate (TDR) is the ratio of true-positive intrusion windows detected after window accumulation and is equivalent to recall:

TDR = TP / (TP + FN), (7)

where TP is the number of true positives and FN is the number of false negatives. The false alarm rate (FAR) is the ratio of false alarms after window accumulation and is equivalent to one minus precision:

FAR = FP / (TP + FP), (8)

where FP is the number of false positives. The F1 score evaluates the balance between intrusion detection accuracy and recall; the higher the F1 score, the better:

F1 = 2 × (1 − FAR) × TDR / ((1 − FAR) + TDR). (9)

The F1 score objectively measures overall performance according to the false-positive and intrusion detection rates; a high F1 score is obtained only when the intrusion detection rate is high and the false-positive rate is low. The response time (RT) is the average time between the beginning of a true-positive intrusion signal and the space-time window at which the model first predicts it.
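The three metrics can be computed directly from window counts; the counts below are toy values for illustration:

```python
def metrics(tp, fp, fn):
    """TDR (recall), FAR (1 - precision), and F1 from window counts."""
    tdr = tp / (tp + fn)            # threat detection rate
    far = fp / (tp + fp)            # false alarm rate = 1 - precision
    precision = 1.0 - far
    f1 = 2 * precision * tdr / (precision + tdr)
    return tdr, far, f1

tdr, far, f1 = metrics(tp=90, fp=10, fn=10)
print(round(tdr, 2), round(far, 2), round(f1, 2))  # 0.9 0.1 0.9
```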
Several settable parameters can be used to fine-tune model performance. The threshold (TH) determines whether a space-time window is an intrusion window: if the distance of an event from the center exceeds this threshold, the window is flagged as an intrusion. The threshold controls model sensitivity; unless otherwise specified, it is set at the point where the false alarm rate on the validation set is 3%. The scanning period controls the temporal extent of the intercepted space-time window, whereas the channel width controls its spatial extent. The variable m is the number of largest distances that are averaged; model sensitivity can thus be controlled by adjusting m, which is set to 64 unless otherwise specified. α is the number of prior windows traversed in the accumulation strategy, and β is the count of anomalous windows that triggers a positive judgment; both also control model sensitivity.

Experimental Setup
First, the DAS system illustrated in Figure 1 was used to collect vibration signals, in which the DAS spatial resolution was set to 10 m (i.e., the space interval between two channels), and the signal sampling rate was 488 Hz. The DAS system was deployed at a high-speed railway station, and the optical fiber was fixed along the barrier beside the tracks for approximately 40 km.
In a normal environment, 60 min of signal data were collected randomly and intermittently, of which 40 min were randomly selected for the training set; the remaining data were used for validation. The test data were the same as those used in [13] and included approximately 30 min of multiple intrusion behaviors. The number of windows generated is shown in Table 1 (the ratio of abnormal to normal data is about 1:10). To eliminate bias between channels and the effect of extreme values, all channels were standardized at each moment before being input to the model, and the maximum value of each channel was clipped to 3.5 times the standard deviation of the same channel in the training set. Finally, the vibration data were normalized to the [0, 1] range.

Influence of Autoencoder Depth
The comparison results are shown in Table 2, in which bold represents the best result for each item. The autoencoder model was used to extract latent features. Each encoder layer was a Conv-ReLU-BatchNorm-MaxPool block, each decoder layer was an Interpolate-Conv-ReLU/Sigmoid block, and the convolution kernel size was 3. Each encoder layer corresponded to one decoder layer. Considering the model results and computation time, autoencoder models with 1-3 layers were compared. The results showed that the autoencoder using only one convolutional layer achieved the best performance and, owing to its shallow structure, saved computation time. Table 2. Influence of autoencoder model depth; "Par" represents the model parameter quantity, and "FLOPs" represents floating-point operations.
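The per-channel standardization, clipping at 3.5 times the training-set standard deviation, and [0, 1] normalization described in the setup can be sketched as follows; the exact order of operations is an assumption, as the text does not fully specify it:

```python
import numpy as np

def preprocess(x, train_std):
    """Preprocess a DAS record x of shape (channels, samples):
    center each channel, clip each channel at 3.5x its training-set
    standard deviation, then rescale the whole record to [0, 1]."""
    x = x - x.mean(axis=1, keepdims=True)
    x = np.clip(x, -3.5 * train_std[:, None], 3.5 * train_std[:, None])
    return (x - x.min()) / (x.max() - x.min())

rng = np.random.default_rng(0)
train = rng.normal(0.0, 2.0, (4, 1000))
test = rng.normal(0.0, 2.0, (4, 1000))
test[0, 0] = 100.0                        # spurious spike gets clipped
y = preprocess(test, train.std(axis=1))
print(y.min() == 0.0 and y.max() == 1.0)  # True
```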

Influence of Scanning Period and Channel Width
The scanning period controls the temporal extent of the space-time window, and the channel width controls its spatial extent. In the comparison experiments, the window scanning period ranged from 128 to 4096, and the channel width ranged from 10 to 35.
As detailed in Table 3, model detection accuracy improved with an increasing scanning period; in particular, the FAR dropped significantly. The response time increased accordingly, as a longer scanning period delays the model output. As detailed in Table 4, increasing the channel width improved anomaly detection performance and lowered the FAR; owing to the fixed scanning period, the change in RT was small. When the channel width was too small, missing spatial relationships led to poor detection; when it was much too large, a large amount of useless interference information was extracted, the TDR dropped, and the accurate positioning of anomalies was affected. The model reached peak performance when the channel width approximated the width of the channels affected by the abnormal behavior.

Impact of Window Accumulation Strategy
The window accumulation was controlled by two parameters, α and β, which balance model sensitivity and detection accuracy. To examine their influence on the model's sensitivity, α was varied from 2 to 10 and β from 1 to α − 1, with a scanning period of 128.
With the increase in α, model sensitivity increased along with the rate of anomaly detection. In contrast, with the increase in β, the anomaly detection rate declined, as shown in Figure 5. The false detection of the model increased with α, and it decreased as β increased, as shown in Figure 6.
Considering the trade-off performances of TDR and FAR, the F1 scores were compared, as shown in Figure 7.

Impact of Maximum Average Distance
Adjusting m tunes model performance, as shown in Table 5. With an increase in m, the various model indicators improved; when m reached a reasonable value range, the TDR improved and the FAR decreased. Concomitantly, a larger window width requires a larger m.

Model Performance Comparison
To verify the model's performance, experiments were conducted on this dataset using a variety of supervised algorithms as baselines. The results are shown in Table 6. The proposed method was trained using normal data only, whereas the supervised methods were trained on labeled anomaly data of the same size, with all other configurations identical. Table 6. Comparison of anomaly detection performance.

Visualization of Results
Figure 8 visualizes the model's detection results for the test sample shown in Figure 3. Nearly all intrusion signals were detected even under strong noise interference.
The results after window smoothing are shown in Figure 9, and further confirm the model detection performance. Window smoothing also eliminated the false positives caused by single-window identification errors.
Figure 10 shows the results obtained with a scanning period of 1024; the false positives were reduced significantly as the scanning period increased.

Conclusions
This study developed a label-free anomaly detection method based on DAS that requires only normal-state data for training to detect a variety of anomalous events. Because this method does not require labeled abnormal data, it addresses the general scarcity of labeled anomaly data and sidesteps the impossibility of defining all types of anomalies for supervised networks. In particular, the proposed lightweight model greatly reduces the number of parameters and the amount of computation.
The proposed method was validated using a high-speed rail intrusion dataset and was compared with multiple supervised methods under the same experimental configuration. The results show that the threat detection rate of this method reaches 91.5%, which is 5.9% higher than that of the state-of-the-art supervised network, and the false alarm rate, at 7.2%, is 0.8% lower than that of the supervised network.
We intend to test the model in complex environments (e.g., rain and thunder). By evaluating the efficacy of the proposed approach in more complex environments, the model will be refined for even better performance.