Signal Processing for Time Domain Wavelengths of Ultra-Weak FBGs Array in Perimeter Security Monitoring Based on Spark Streaming

To detect perimeter intrusion accurately and quickly, stream computing technology was used to improve real-time data processing in perimeter intrusion detection systems. The traditional density-based spatial clustering of applications with noise (T-DBSCAN) algorithm depends on manual adjustment of its neighborhood parameters; building on it, an adaptive-parameter DBSCAN (AP-DBSCAN) method that performs the calculation without supervision was proposed. The proposed AP-DBSCAN method was implemented on a Spark Streaming platform to handle data stream collection and real-time analysis, and to judge and identify the different types of intrusion. A number of sensing and processing experiments were carried out, and the experimental data indicated that the proposed AP-DBSCAN method on the Spark Streaming platform calibrated the adaptive parameters well and reached the same accuracy as the T-DBSCAN method without manual setting of the neighborhood parameters, while also performing well in the perimeter intrusion detection systems.


Introduction
With the widespread technological development of society, security issues have become increasingly prominent. Thanks to the recent progress of modern science and technology, better solutions have become available to solve security problems. Among them are fiber Bragg grating interference technology [1][2][3], big data processing technology [4][5][6], machine learning [7,8] and stream processing technology [9][10][11]. Recently, our team has realized the online writing of an ultra-weak FBG (UWFBG) array during the drawing process of single mode fibers (SMF). A large-scale UWFBG array is made up of hundreds or thousands of identical-wavelength FBGs with a reflectivity of about −50 dB for each FBG. Such a large-scale UWFBG sensor array has attracted a great deal of attention in major engineering monitoring, because of its low cost, low crosstalk, and strong multiplexing capacity [12][13][14]. In particular, the UWFBGs have the advantages of small size, favorable wavelength selectivity, and anti-electromagnetic interference, and so they are widely used in perimeter security and structural health monitoring.
The physical parameters of these UWFBGs, such as their reflected powers and Bragg wavelengths, vary with external vibration signals. The external signals are extracted from the light signals through demodulation and further data processing. As the demodulated data shows characteristics of having a by two definitions, namely density-reachability and density-connectability, which depend on two predefined parameter values: the size of the neighborhood, denoted by ε, and the number of neighborhood points in a cluster, denoted by N_min. In T-DBSCAN, one begins with a random point x and finds all of the points that are density-reachable from x with respect to ε and N_min. No points are density-reachable from x when x is a border point; in this case, T-DBSCAN moves on to another unclassified point and repeats the same process. The two predefined parameters ε and N_min therefore decide the quality and efficiency of the clustering.
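The expansion procedure described above can be illustrated with a minimal, generic DBSCAN sketch in plain Python. It is not the authors' implementation: the function names (`region_query`, `dbscan`) are hypothetical, the Euclidean distance is assumed, and the neighborhood count is assumed to include the point itself.

```python
import math

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, n_min):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already classified
        neighbors = region_query(points, i, eps)
        if len(neighbors) < n_min:
            # Not a core point: nothing is density-reachable from it.
            # It may still be relabeled later as a border point.
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= n_min:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels
```

With two tight groups and one isolated point, `dbscan(points, 0.2, 3)` assigns one label per group and marks the isolated point as noise; changing `eps` or `n_min` changes the result, which is the dependence on manual parameters that the adaptive method is meant to remove.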

AP-DBSCAN Algorithm
In the T-DBSCAN method, the values of ε and N_min are set by the user. To avoid human intervention, we proposed an unsupervised clustering method. A sample set composed of n sensing signals, Signs_n = {x_l, l = 1, 2, ..., n}, with the sampling frequency f and a quantity of n_f = n/f, can be denoted by

$$\mathrm{Signs}_n = \{ s_m,\ m = 0, 1, 2, \ldots, n_f \} \qquad (1)$$

Then, a set of characteristic parameters, namely the energy and the average amplitude, is calculated for each s_m, and together they form the sample characteristic set T. These characteristic data have different dimensions. An effective normalization is needed to eliminate the influence of the dimensions, so the min-max normalization method was used, which maps the energy and the average amplitude into the range [0, 1] by a linear transformation of the characteristic set T. With the horizontal axis and the vertical axis, each in the range [0, 1], denoting the normalized energy and the normalized average amplitude, respectively, the original one-dimensional data are converted into two-dimensional normalized data.

Then, a symmetric distance matrix that describes the distances between all pairs of points can be constructed:

$$D_t = \begin{pmatrix} d_{11} & d_{12} & \cdots & d_{1t} \\ d_{21} & d_{22} & \cdots & d_{2t} \\ \vdots & \vdots & \ddots & \vdots \\ d_{t1} & d_{t2} & \cdots & d_{tt} \end{pmatrix}$$

where t = 1 + n/f is the number of characteristic sample sets and d_ij is the distance between points i and j. Sorting the elements of each row of D_t in ascending order gives a new matrix D_s. In D_s, all of the elements in the first column are zero, and the elements in the kth column (k > 1) are the (k − 1)th nearest distances. The sorted matrix D_s can be represented by its column matrices:

$$D_s = (\zeta_1, \zeta_2, \ldots, \zeta_t), \qquad \zeta_i = (d_{1,i}, d_{2,i}, \ldots, d_{t,i})^T$$

A J value is calculated for each column matrix, and the characteristic column matrix is the one that produces the minimum of all the J values. Thus, the characteristic column matrix γ = ζ_imin = (d_{1,imin}, d_{2,imin}, ..., d_{t,imin})^T can be obtained. The maximum distance in the characteristic column matrix was then assigned as ε:

$$\varepsilon = \max(\gamma)$$

After ε is determined, the arithmetic mean of the number of points within the ε-neighborhood over the entire data set gives an optimal value of the point number in each cluster:

$$N_{\min} = \frac{1}{t} \sum_{i=1}^{t} X_i$$

where X_i is the number of points in the ε-neighborhood of each point.
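The parameter calculation in this section can be sketched in plain Python as follows. Several pieces are assumptions rather than the authors' definitions: the energy is taken as the sum of squares and the average amplitude as the mean absolute value of each segment, segments are non-overlapping blocks of f samples, X_i counts the point itself, and the J criterion, whose formula is not reproduced here, is passed in as a hypothetical callable `j_value`. The all-zero first column of D_s is skipped when searching for the characteristic column.

```python
import math


def segment_features(samples, f):
    """Return (energy, average amplitude) for each block of f samples.

    ASSUMPTION: energy = sum of squares, average amplitude = mean |x|.
    """
    feats = []
    for start in range(0, len(samples), f):
        seg = samples[start:start + f]
        energy = sum(x * x for x in seg)
        avg_amp = sum(abs(x) for x in seg) / len(seg)
        feats.append((energy, avg_amp))
    return feats


def min_max_normalize(feats):
    """Map every feature dimension linearly onto [0, 1]."""
    cols = list(zip(*feats))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(p, lo, hi)) for p in feats]


def adaptive_parameters(points, j_value):
    """Return (eps, n_min) for the normalized feature points.

    j_value is a HYPOTHETICAL stand-in for the paper's J criterion:
    any callable that maps a column of D_s to a number.
    """
    t = len(points)
    d_t = [[math.dist(p, q) for q in points] for p in points]  # distance matrix
    d_s = [sorted(row) for row in d_t]                         # rows sorted ascending
    # Column k of D_s holds every point's k-th nearest distance; column 0 is all zeros.
    columns = [[d_s[i][k] for i in range(t)] for k in range(1, t)]
    gamma = min(columns, key=j_value)   # characteristic column (minimum J)
    eps = max(gamma)                    # largest distance in that column
    counts = [sum(1 for d in row if d <= eps) for row in d_t]  # X_i, self included
    n_min = sum(counts) / t             # arithmetic mean of the X_i
    return eps, n_min
```

For example, `adaptive_parameters(min_max_normalize(segment_features(signal, 100)), j_value=my_criterion)` would yield the pair that can be handed to a DBSCAN routine, where `signal` and `my_criterion` are placeholders supplied by the reader. The mean in `n_min` is generally not an integer, so a rounding rule would have to be chosen before it is used as a point count.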

AP-DBSCAN on Spark Streaming
Figure 1 describes the implementation process of the proposed AP-DBSCAN method on Spark Streaming, which decomposes streaming computation into a series of short batch jobs. The batch engine is Spark Core, which divides the input data into pieces according to the batch size (for example, 4 s). The data are converted to resilient distributed datasets (RDDs) in Spark, the transformation operations on the stream become RDD transformation operations, and each RDD is the conversion of the data of T s demodulated by the fiber grating signal processor.
The main steps of AP-DBSCAN on Spark Streaming are shown in Algorithm 1, and the workflow of the algorithm is shown in Figure 2. Box plots [26] and the fast Fourier transform [27] are employed to deal with noise. The feature sets of the normal data in the first RDD, that is, data containing no intrusion, are obtained by Equations (3) and (4) and are then mixed into the input of AP-DBSCAN in case abnormal data appear at the beginning. The noise data are reconstructed and added to the normal data from AP-DBSCAN. On the DStream, each piece of data is continuously updated RDD by RDD. If there is no abnormal data in a certain RDD, the output of that RDD includes the normal feature samples of the last RDD and the current RDD; if there is abnormal data in a certain RDD, the output includes not only the normal feature samples of the last RDD and the current RDD, but also the abnormal samples of the current RDD. Thus, the clustering result of each RDD is obtained through AP-DBSCAN: the distinguished abnormal samples are output, and the normal data samples are mixed with the following RDD.
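As an illustration of the box-plot step mentioned above, the following sketch applies the usual box-plot fence rule to flag outlying samples. The factor 1.5 is the common Tukey convention and is an assumption here, as is the function name `box_plot_split`; how the flagged samples are subsequently reconstructed is not covered by this sketch.

```python
import statistics


def box_plot_split(values, k=1.5):
    """Split values into (inliers, outliers) using box-plot fences.

    ASSUMPTION: fences at Q1 - k*IQR and Q3 + k*IQR with k = 1.5.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers
```

Calling `box_plot_split([1, 2, 3, 4, 5, 6, 7, 8, 100])` separates the single extreme value from the rest of the samples.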

Algorithm 1. AP-DBSCAN on Spark Streaming.
Input: The training sets of n workers: D_n = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}; normal data: N_t = {(x_1, y_1), (x_2, y_2), ..., (x_t, y_t)}.
Step 1: Create a local streaming context with two working threads and a batch interval of 4 s.
Step 2: Create an input DStream.
Step 3: Operate on the DStream:
    Convert the segment data and the normal data to an RDD, and perform the first AP-DBSCAN to get the clustering result: first RDD → AP-DBSCAN → first clustering result.
    While input DStream = true:
        Separate the abnormal data from the clustering result; retain the normal data and mix it into the next data.
        Perform AP-DBSCAN to get the clustering result.
Step 4: Start Spark Streaming.
Output: The clustering results on each RDD.
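The batch-by-batch logic of mixing retained normal samples into the next batch can be sketched without Spark, as a plain Python loop over micro-batches. This is only a simulation of the control flow under stated assumptions: `process_stream` and `is_abnormal_fn` are hypothetical names, the detector is any callable that returns one boolean per sample (in the paper this role is played by AP-DBSCAN), and only the normal samples of the most recent batch are carried forward.

```python
def process_stream(batches, normal_seed, is_abnormal_fn):
    """Return the abnormal samples reported for each micro-batch.

    batches:        iterable of lists of feature points (one list per batch)
    normal_seed:    known-normal feature points mixed into the first batch
    is_abnormal_fn: HYPOTHETICAL detector, list of points -> list of bools
    """
    carried = list(normal_seed)
    outputs = []
    for batch in batches:
        batch = list(batch)
        mixed = carried + batch                 # normal reference + new data
        flags = is_abnormal_fn(mixed)
        outputs.append([p for p, bad in zip(mixed, flags) if bad])
        # Keep the normal samples of the current batch for the next round;
        # if the whole batch was abnormal, keep the previous reference set.
        current_normal = [p for p, bad in zip(batch, flags[len(carried):]) if not bad]
        if current_normal:
            carried = current_normal
    return outputs
```

In a real deployment the loop body would run inside the per-batch transformation of the streaming engine; the sketch only shows why a normal reference set is needed so that a batch consisting entirely of intrusion data can still be recognized as abnormal.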

Monitoring System Based on the UWFBG Array
The architecture of the intrusion monitoring and identification system is shown in Figure 3. The sensing system is composed of a quasi-distributed UWFBG array written on a single-mode silica optical fiber during drawing, a 1550-nm laser source (RIO, narrow-linewidth laser module, 1 kHz linewidth), an FBG signal processor, a detector (4-way photoelectric detection plate, self-control, 60 MHz bandwidth), and a computer. UWFBGs with the same Bragg wavelength were used as a string of vibration detectors to pick up the external vibration signals near the optical fibers. In our experiments, two kinds of UWFBG sensing arrays were prepared with a grating spacing of 5 m: one is the suspended UWFBG sensing array with a length of 100 m (20 sensors), which was fixed along a railing; the other is the buried UWFBG sensing array with a length of 300 m (60 sensors), which was buried under the ground.
All of the wavelength shift signals from the UWFBGs were transmitted to the signal processor based on Mach-Zehnder interference (MZI). The grating signal processor was connected to a computer through a network cable, and data were sent and received using the user datagram protocol (UDP). The software on the computer received the data stream from the grating signal processor, pushed the stream data to Spark Streaming for real-time processing, and saved the data to the Hadoop distributed file system (HDFS). When abnormal data appeared, the computer output them in real time, and finally performed the intelligent analysis and pattern recognition of the intrusion signals.
The data acquisition method is as follows: continuous light from an amplified spontaneous emission (ASE) source was modulated into nanosecond pulses by a semiconductor optical amplifier. The pulsed light was launched through a circulator into the fiber with uniformly distributed UWFBGs, which produced a train of reflected pulses. A phase demodulation unit consisting of an unbalanced MZI, a 3 × 3 coupler, and three detectors was used to restore the vibration signal. The unbalanced paths of the MZI separated each reflected pulse into two pulses; the slower pulse from the closer UWFBG coincided with the faster pulse from the farther UWFBG, so that coherence was maintained. Phase perturbations caused by vibrations between two adjacent UWFBGs can be demodulated from the interference light pulse. According to optical time domain reflectometry, the correspondence between the interference light pulses and the sensing positions was established.
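The time-to-position mapping used in optical time domain reflectometry can be written as z = c·τ / (2·n_eff), where τ is the round-trip delay of a reflected pulse. The sketch below applies it to the 5 m grating spacing stated above. The effective refractive index of 1.468 is an assumed typical value for single-mode silica fiber, not a figure from this paper, and the function names are hypothetical.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s
N_EFF = 1.468             # ASSUMED effective index of the fiber (not from the paper)
SPACING_M = 5.0           # grating spacing stated in the text, m


def position_from_delay(round_trip_s, n_eff=N_EFF):
    """Distance along the fiber (m) for a given round-trip delay (s)."""
    return C_VACUUM * round_trip_s / (2.0 * n_eff)


def delay_from_position(z_m, n_eff=N_EFF):
    """Round-trip delay (s) of a reflection from distance z_m (m)."""
    return 2.0 * n_eff * z_m / C_VACUUM


def grating_index(round_trip_s, spacing_m=SPACING_M, n_eff=N_EFF):
    """Number of grating spacings from the fiber start, rounded to the nearest grating."""
    return round(position_from_delay(round_trip_s, n_eff) / spacing_m)
```

Under these assumptions, adjacent gratings are separated by a round-trip delay of roughly 49 ns, which indicates the time resolution the pulse and detection chain must provide to tell neighboring sensors apart.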
The experiments were carried out on a cluster with four nodes: a master node and three compute nodes. The configuration of each node was as follows: 3.4 GHz Intel Core i7-6700 processor, 8 MB cache, 4 GB memory, and 1 TB storage. The software used was Spark 1.6.1 on the Ubuntu 16.04 operating system, and the sampling frequency was 100 Hz.

Signal Processing for Railing Sensors
Three kinds of railing intrusion behaviors, namely knocking, shaking, and climbing, were simulated, and the corresponding vibration signals are shown in Figure 4a-c. The T-DBSCAN and AP-DBSCAN calculations were carried out on the data of the three behaviors merged with static data. The corresponding clustering results are shown in Figure 5, where the horizontal axis and the vertical axis denote the normalized energy and the normalized average amplitude, respectively. Each cluster represents one behavior, and the type of behavior can be judged from the distance between the mass center of each cluster and the origin.
In addition, the number of clusters corresponding to all the behaviors and the number of points in each cluster, as given by the T-DBSCAN and AP-DBSCAN methods, were calculated and are shown in Table 1.
The numbers of data points for the static state, knocking, shaking, and climbing the rail are denoted by C1, C2, C3, and C4, respectively. The results showed that the AP-DBSCAN method can achieve the same precision as T-DBSCAN without manual setting of the neighborhood parameters, which is conducive to automatic detection in perimeter intrusion detection systems. Finally, for K-means [28], FCM (fuzzy C-means) [29], and AP-DBSCAN, the misclassified patterns, the computation time, and the error rate (ER) were computed and compared; the data are shown in Table 2, where the error rate was computed as follows:

$$\mathrm{ER} = \frac{\text{Number of misclassified objects}}{\text{Total number of objects}} \times 100\% \qquad (11)$$

The smaller ERs given by AP-DBSCAN indicate that the AP-DBSCAN method can produce better results than the other methods.
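Equation (11) translates directly into a few lines of Python. The sketch assumes that the cluster labels have already been mapped to behavior labels, so that predicted and true labels are directly comparable; the function name `error_rate` is hypothetical.

```python
def error_rate(true_labels, predicted_labels):
    """Error rate in percent: misclassified objects / total objects * 100."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return 100.0 * wrong / len(true_labels)
```

For instance, one misclassified sample out of four gives `error_rate(["knock", "shake", "climb", "static"], ["knock", "shake", "shake", "static"])`, that is, 25.0.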

Signal Processing for Buried Sensors
The underground sensing optical cable was buried at a depth of half a meter so that it was less affected by noise. We simulated five behaviors that influence the underground cable: walking on the buried cable, walking parallel to the cable at distances of 20, 40, and 60 cm from it, and standing still. Data were recorded for 20 s for each behavior, and the corresponding vibration signals are shown in Figure 6a-d. Because the noise is low, the collected data are processed by difference denoising methods, and the AP-DBSCAN method is then used to cluster the data of the five behaviors. The corresponding clustering results are shown in Figure 7, where the horizontal axis and the vertical axis denote the normalized energy and the normalized average amplitude, respectively. AP-DBSCAN on Spark Streaming and AP-DBSCAN on a single machine are compared in terms of clustering speed at an interval of 4 s. The abnormal behavior is obtained by clustering according to the data characteristics of the different behaviors. Each cluster represents one kind of behavior; the abscissa and the ordinate of each point denote energy and frequency amplitude, respectively. The type of behavior can be determined from the distance between the origin and the center of the cluster.
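The rule of judging a behavior from the distance between the cluster center and the origin can be sketched as follows. The code only ranks clusters by centroid distance; which behavior corresponds to which rank is not fixed here and would have to be calibrated from labeled data. The function name and the noise label -1 are assumptions of this sketch.

```python
import math
from collections import defaultdict


def rank_clusters_by_centroid(points, labels, noise_label=-1):
    """Return cluster labels ordered from nearest to farthest centroid (from the origin)."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        if lab != noise_label:          # ignore points that belong to no cluster
            groups[lab].append(p)
    centroids = {
        lab: tuple(sum(coord) / len(coord) for coord in zip(*pts))
        for lab, pts in groups.items()
    }
    return sorted(centroids, key=lambda lab: math.hypot(*centroids[lab]))
```

With normalized energy and amplitude as coordinates, low-activity clusters would be expected to sit closer to the origin than strongly excited ones, so the returned order gives a natural sequence in which to assign behavior types.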
Then, we calculated the number of clusters formed for each behavior and the number of points in each cluster, as given by T-DBSCAN and AP-DBSCAN. The numbers of data points for standing still, walking parallel to the cable at distances of 60, 40, and 20 cm from the cable, and walking on the buried cable are denoted by C1, C2, C3, C4, and C5, respectively. For the same sample data, the experimental results of the two methods are shown in Table 3, which shows that the AP-DBSCAN method can achieve the same accuracy as T-DBSCAN without manual setting of the neighborhood parameters; thus, the AP-DBSCAN method can save a large amount of labor time.
Next, for K-means [28], FCM (fuzzy C-means) [29], and AP-DBSCAN, the misclassified patterns, the computation time, and the error rate (ER) were computed and compared; the data are shown in Table 4. The smaller ERs given by AP-DBSCAN indicate that the AP-DBSCAN method can produce better results than the other methods.
Finally, the time response of AP-DBSCAN on Spark Streaming was also investigated. To test the timeliness of the algorithm, several intrusion scenarios were simulated. Starting 1 m from one side of the detection optical cable, one person walked perpendicularly toward the cable, reached the ground above it, and then walked 1 m to the other side of the cable. Meanwhile, 100 pieces of data were recorded every 1 s and processed every 2 s; the clustering data and the number of abnormal events are shown in Figure 8. It can be seen that the data can be received and processed within an effective time. To test the performance of the AP-DBSCAN method on Spark Streaming, the time response was measured and is shown in Figure 9. There is only a small difference between AP-DBSCAN on a single machine and AP-DBSCAN on Spark Streaming when the amount of test data is small. However, when the amount of test data is very large, the response time of AP-DBSCAN on Spark Streaming is significantly shorter than that of AP-DBSCAN on the single machine.

Conclusions
In this paper, we propose the AP-DBSCAN algorithm with adaptive parameters on the Spark Streaming platform, solving the problem of the real-time anomaly detection of large-scale data in perimeter security. The preprocessing of the algorithm combines the Box-plots and the fast Fourier transform, and it is necessary to make certain that the data stream of a segment is mixed with the normal data stream of the previous segment to detect the abnormal data of different types. In the verification experiment of AP-DBSCAN, the proposed algorithm improves the unsupervised capability of T-DBSCAN and can detect abnormal conditions of large-scale data in real-time, providing better convenience and service for perimeter security.
