Fast Measurement of Brillouin Frequency Shift in Optical Fiber Based on a Novel Feedforward Neural Network

Abstract: Brillouin scattering-based distributed optical fiber sensors have been successfully employed in various applications in recent decades because of benefits such as small size, light weight, electromagnetic immunity, and continuous monitoring of temperature and strain. However, the data processing requirements for the Brillouin Gain Spectrum (BGS) restrict further improvement of monitoring performance and limit real-time measurement. Studies using a Feedforward Neural Network (FNN) to measure the Brillouin Frequency Shift (BFS) have been performed in recent years to validate the possibility of improving measurement performance. In this work, a novel FNN that is 3 times faster than previous FNNs is proposed to improve BFS measurement performance. More specifically, after the original BGS is preprocessed by Principal Component Analysis (PCA), the data are fed into the FNN to predict the BFS.


Introduction
Smart manufacturing based on the Internet of Things (IoT) is growing rapidly, driving the rapid development of IoT technology [1,2]. IoT is a network system that connects the Internet, global positioning systems, and various sensors, and is widely used in a variety of industries [3]. The key technology of IoT is sensing technology, in which distributed fiber optic sensors offer the advantages of small sensing size, light weight, electromagnetic immunity, and continuous monitoring of temperature and strain. In the Brillouin optical time-domain analyzer (BOTDA), a typical distributed fiber optic sensing system, the pump wave is pulsed into the fiber, transferring energy to the probe wave. The BGS is constructed by scanning the pump-probe frequency difference over the required range. The BFS, the peak frequency of the BGS, is a function of the local temperature and strain in the fiber [4,5]. Hence, one of the key technologies for these sensors is estimating the BFS from the BGS, which can be realized using the traditional method, Lorentzian Curve Fitting (LCF). LCF is very time consuming and constitutes an obstacle to real-time measurement. Many techniques focused on hardware have been proposed to solve this problem, such as the use of balanced detection instead of an increased number of measurements to eliminate polarization effects, and the use of a digital signal generator to switch quickly among scanning frequencies [6,7]. Additionally, sweep-free BOTDA has been proposed in order to boost the measurement speed, at the expense of complicated hardware and high cost [8][9][10][11]. Hence, some software approaches have been developed in recent years: the training of an FNN using BGSs with different full-widths at half maximum (FWHM) was reported in 2019 [12,13], and the use of Principal Component Analysis (PCA) based on pattern recognition to avoid the measurement of BFS and directly measure the sensing information was reported in 2017 [14].
More specifically, the study conducted in [12] showed that an FNN using ideal BGSs as the training set and temperature as the network output is more accurate and faster than LCF. However, that FNN needs to be retrained when the frequency scanning step of the inputted spectra changes, which is a time-consuming task. The use of noisy BGSs for FNN training has also been reported, and the training method of the FNN has been well described [13]. The study conducted in [14] constructed a reference database comprising BGSs compressed by PCA and temperatures, and the predicted temperature was determined by finding the best match in the reference database. The measurement range of this algorithm is correlated with the size of the reference database, meaning that the required storage increases as the measurement range increases. This is a space-consuming task. Additionally, machine learning algorithms have been investigated to study the feasibility of using them for temperature measurement, skipping the BFS measurement process [15]. The Random Forest algorithm is usually outperformed by the FNN algorithm; when attempting to improve the efficiency of the Gradient Boosted Trees algorithm, its processing time increases; and the Support Vector Machine algorithm requires complex classifiers when the number of classes is greater than two, leading to increased processing time.
In this work, a novel FNN is proposed to improve BFS measurement and achieve a faster measurement speed. Simulation results show that the proposed FNN is able to measure BFS from BGS with a shorter measurement time and can adapt to different frequency scanning steps. In a proof-of-concept experiment, the results show that the novel FNN is able to accurately measure BFS from experimental BGSs.

PCA
PCA is a powerful tool for data analysis, enabling us to identify patterns in data, especially data of high dimension, and to express the data in such a way as to reduce their dimension [16][17][18]. Consequently, the other advantage of PCA is that it compresses the data without losing much information. In the mathematical computation of PCA, M samples are needed, where the i-th sample is represented as s_i, with a length of N. These samples are represented as

S = [s_1^T, s_2^T, ..., s_M^T]^T,

where T represents the matrix transpose. For PCA to work properly, the mean matrix S̄ is obtained using the following equations:

s̄ = (1/M) Σ_{i=1}^{M} s_i,    S̄ = [s̄^T, s̄^T, ..., s̄^T]^T.

The next step is to calculate the covariance matrix Cov using the following equation:

Cov = (1/M) (S − S̄)^T (S − S̄).

Then, the eigenvectors µ_i and eigenvalues λ_i of the covariance matrix Cov are calculated, where i = 1, 2, ..., N. These eigenvalues are distinct, and the eigenvectors are ordered by the corresponding eigenvalues from highest to lowest. Then, the first p eigenvectors are selected and the others are ignored, whereby p is decided by

δ = (Σ_{i=1}^{p} λ_i) / (Σ_{i=1}^{N} λ_i).

Most applications of PCA usually employ a δ of more than 0.9. The first p eigenvectors, sorted from highest to lowest, can be expressed together as a matrix PC = [µ_1, µ_2, ..., µ_p].
Therefore, the i-th sample s_i is transformed to

ŝ_i = (s_i − s̄) PC,

where ŝ_i is a row vector with a length of p, and i = 1, 2, ..., M. Because p < N, all samples can be reduced from a high dimension to a comparatively small dimension.
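The computation above can be sketched in a few lines of NumPy; the toy data set, the δ threshold of 0.9, and the function names (`fit_pca`, `to_feature`) are illustrative choices, not part of the original work.

```python
import numpy as np

def fit_pca(S, delta=0.9):
    """Learn the PC matrix from M samples of length N (the rows of S)."""
    s_mean = S.mean(axis=0)                     # mean sample
    centered = S - s_mean                       # subtract the mean matrix
    cov = np.cov(centered, rowvar=False)        # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix -> eigh
    order = np.argsort(eigvals)[::-1]           # sort highest to lowest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()  # cumulative eigenvalue ratio
    p = int(np.searchsorted(ratio, delta)) + 1  # smallest p with ratio >= delta
    return s_mean, eigvecs[:, :p]               # PC is an N x p matrix

def to_feature(s, s_mean, PC):
    """Transform one length-N sample into a length-p row vector."""
    return (s - s_mean) @ PC

# toy demo: 200 samples of length 50 that really live in ~3 dimensions
rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50))
s_mean, PC = fit_pca(samples)
print(PC.shape, to_feature(samples[0], s_mean, PC).shape)
```

Because the toy samples have an intrinsic dimension of about 3, the selected p is far smaller than N = 50, which is exactly the compression the method relies on.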

FNN
FNN is the simplest type of Artificial Neural Network, and is shown in Figure 1 [19]. All units of the previous layer are connected to all units of the next layer; however, units in the same layer are not connected to each other. In this network, information flows from the input layer, through the hidden layer, to the output layer, in only one direction. To understand the FNN, taking an FNN of 3 layers as an example, some common notations, as shown in Figure 1, are defined:

• The input vector is [x_1, x_2];
• The input and output of unit u_i in the hidden layer are m_i and n_i, respectively;
• The weight from the j-th unit in the previous layer to the i-th unit in the next layer is w_{j,i};
• The output value is h(x_1, x_2).
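With this notation, a minimal forward pass of the 2-3-1 network in Figure 1 might look as follows; the sigmoid activation and the random weights are illustrative assumptions.

```python
import numpy as np

def f(m):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + np.exp(-m))

def forward(x, W1, W2):
    """One forward pass of a 2-input, 3-hidden-unit, 1-output FNN."""
    m = W1 @ x           # m_i: input of hidden unit u_i
    n = f(m)             # n_i: output of hidden unit u_i
    return f(W2 @ n)[0]  # h(x1, x2): output of the single output unit

rng = np.random.default_rng(1)
W1 = rng.normal(size=(3, 2))  # w_{j,i}: input -> hidden weights
W2 = rng.normal(size=(1, 3))  # hidden -> output weights
h = forward(np.array([0.5, -0.2]), W1, W2)
print(h)  # a sigmoid output always lies in (0, 1)
```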

When information moves from the input layer to the output layer, the input of unit u_i in the hidden layer is

m_i = Σ_{j=1}^{2} w_{j,i} x_j,

and the output is

n_i = f(m_i),

where f(·) is the activation function and i = 1, 2, 3. Consequently, the output of the unit in the output layer is

h(x_1, x_2) = f( Σ_{i=1}^{3} w_i n_i ).

The expected result of the input [x_1, x_2] is y, and the error between the expected result and the output is

E = (1/2) (y − h(x_1, x_2))^2.    (12)

To use the algorithm of backpropagation (BP), the cost function can be minimized by adjusting the weights w_{j,i}. The FNN uses two datasets: the training set and the testing set. The training set is used to adjust the weights of the FNN, and the testing set is used to determine their effectiveness.
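A rough sketch of minimizing this cost by adjusting the weights is given below; for brevity it uses numerical gradients in place of the analytic gradients that BP computes, and the input, target value, and learning rate are made-up illustrative values.

```python
import numpy as np

def f(m):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + np.exp(-m))

def forward(x, W1, W2):
    """Forward pass of the 2-3-1 network from Figure 1."""
    return f(W2 @ f(W1 @ x))[0]

def cost(x, y, W1, W2):
    """Squared error between the expected result y and the output."""
    return 0.5 * (y - forward(x, W1, W2)) ** 2

# crude training loop: numerical gradients stand in for the analytic
# gradients that BP computes; data and learning rate are made up
rng = np.random.default_rng(2)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
x, y, lr, eps = np.array([0.5, -0.2]), 0.8, 0.5, 1e-6
before = cost(x, y, W1, W2)
for _ in range(200):
    for W in (W1, W2):
        base = cost(x, y, W1, W2)
        grad = np.zeros_like(W)
        for idx in np.ndindex(W.shape):
            old = W[idx]
            W[idx] = old + eps          # perturb one weight
            grad[idx] = (cost(x, y, W1, W2) - base) / eps
            W[idx] = old                # restore it
        W -= lr * grad                  # gradient descent step
after = cost(x, y, W1, W2)
print(after < before)  # the cost decreases as the weights are adjusted
```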

FNN with PCA
Making use of the advantages of PCA and FNN, a novel FNN is proposed in this work. The theoretical framework of this technique is shown in Figure 2. In an offline environment, a matrix is obtained through the mathematical computation of PCA and is not changed when real-time data are fed into the system. In an online environment, real-time data are fed into the system through the Input Interface, and the Output Interface outputs the predicted value.

In an offline environment, a set of data, each of length m, is needed. The aim of PCA is to learn a mapping relationship from this group of data. A matrix, referred to as PCs, represents this relationship. PCs has dimensions of m × n, and n determines the number of units in the input layer of the FNN. Additionally, note that there is a one-to-one correspondence between PCs and the length of the data, and n is much lower than m. In an online environment, real-time data can be fed into the system and a predicted result is measured through the FNN. In this process, the real-time data are transformed into low-dimensional data, referred to as a Feature Vector. In fact, the FNN receives the transformed data, not the raw data. This is the unique and novel aspect of the proposed method. Consequently, the novel FNN can achieve fast measurement without sacrificing accuracy.
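The offline/online split might be sketched as follows; the class name, the data sizes, and n = 10 are illustrative assumptions, not details from the original work.

```python
import numpy as np

class PCAFrontEnd:
    """Offline: learn PCs from length-m training data; online: compress
    incoming length-m data into length-n Feature Vectors (n << m)."""

    def fit(self, data, n=10):
        self.mean = data.mean(axis=0)
        cov = np.cov(data - self.mean, rowvar=False)
        vals, vecs = np.linalg.eigh(cov)
        self.PCs = vecs[:, np.argsort(vals)[::-1][:n]]  # m x n matrix
        return self

    def feature_vector(self, raw):
        """Online step: raw length-m data -> length-n input for the FNN."""
        return (raw - self.mean) @ self.PCs

rng = np.random.default_rng(3)
offline_data = rng.normal(size=(500, 200))       # m = 200
front = PCAFrontEnd().fit(offline_data)          # PCs fixed once, offline
fv = front.feature_vector(rng.normal(size=200))  # one online measurement
print(fv.shape)  # (10,): the FNN sees 10 inputs instead of 200
```

The key design point is that `fit` runs once, offline, and the fixed PCs are then applied to every real-time measurement.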


Proposed Method
To improve the measurement speed of BFS, the novel FNN was applied to estimate BFS, as shown in Figure 3, wherein the layout of the FNN was set as 10-50-15-1, where the numbers represent the number of neurons in the input layer, the two hidden layers, and the output layer, respectively. In the training phase, the training set of BGSs is multiplied by the selected PCs to obtain 10-dimensional Feature Vectors to train the FNN. The Feature Vectors and the corresponding BFSs are used to construct (feature vector, BFS) pairs, wherein the feature vector is the input of the FNN and the BFS is the peak of the BGS. Then, the weights of the FNN are randomly initialized and updated using the training set until Equation (12) is minimized by the algorithm of BP.
In addition, theoretical BGSs are used to train the FNN, not experimental BGSs. An example of a theoretical BGS is shown in Figure 4, presenting a Lorentzian spectral profile:

g_B(ν) = g_0 (∆ν_B/2)^2 / ((ν − ν_B)^2 + (∆ν_B/2)^2),    (13)

where g_0 is the maximum of the BGS, ∆ν_B is the FWHM, and the BGS peaks at the frequency ν_B. According to Equation (13), simulated BGSs can be produced. More specifically, a group of discrete data points, the red triangles in Figure 4, represents the BGS, and the sampling interval is the frequency scanning step. Obviously, the frequency scanning step affects the length of the BGS, i.e., m in Figure 2. This means that different frequency scanning steps require different PCs. Noisy BGSs are used to train the FNN. The starting frequency of the BGS is 10.60 GHz, the ending frequency is 11.00 GHz, and the frequency scanning step is 2 MHz; more details about the noisy BGSs are shown in Table 1. Meanwhile, BGSs without noise are used to obtain the PCs; these are almost the same as the noisy BGSs but have no noise. More details are shown in Table 2.
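Generating such a training spectrum can be sketched as below; the FWHM of 30 MHz, the peak gain g0 = 1, and the noise level are illustrative stand-ins for the values listed in Tables 1 and 2, which are not reproduced here.

```python
import numpy as np

def lorentzian_bgs(freqs_ghz, bfs_ghz, fwhm_mhz=30.0, g0=1.0):
    """Lorentzian gain profile peaking at the BFS, as in Equation (13)."""
    half = (fwhm_mhz / 1e3) / 2.0  # half of the FWHM, in GHz
    return g0 * half**2 / ((freqs_ghz - bfs_ghz) ** 2 + half**2)

# training-set scanning grid: 10.60 GHz to 11.00 GHz in 2 MHz steps
freqs = np.arange(10.60, 11.00 + 1e-9, 0.002)
clean = lorentzian_bgs(freqs, bfs_ghz=10.80)             # used to obtain PCs
noise = np.random.default_rng(4).normal(scale=0.05, size=freqs.size)
noisy = clean + noise                                    # used to train the FNN
print(freqs.size)  # 201 sampling points on this grid
```

Changing the step from 2 MHz to 1 MHz would double the number of sampling points, which is why each scanning step needs its own PC matrix.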
In the testing phase, the testing environment is simulated using the testing set. The Feature Vector is fed into the FNN with optimized weights, and the measured value of the BFS is output. The performance of the FNN is estimated by calculating the Mean Absolute Error:

MAE = (1/K) Σ_{i=1}^{K} |x_i − x_i^{target}|,    (14)

where K is the number of test samples, x_i is the i-th measured BFS, and x_i^{target} is the corresponding target value.
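Equation (14) amounts to a one-line computation; the three BFS values below are hypothetical, chosen only to show the units (MHz).

```python
import numpy as np

def mae(measured, target):
    """Mean Absolute Error between measured and target BFS, Equation (14)."""
    measured, target = np.asarray(measured), np.asarray(target)
    return np.mean(np.abs(measured - target))

# three hypothetical BFS estimates (MHz) against their target values
print(mae([10800.3, 10850.1, 10899.8], [10800.0, 10850.0, 10900.0]))
```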

Figure 4. A BGS sample obtained according to Equation (13).

Simulation Results
According to the proposed method, a well-trained FNN was constructed. In this section, simulated BGSs that were not used in the training phase are fed into the FNN. The MAEs of the group of simulated BGSs at different targeted BFSs are shown in Figure 5. It can be seen that the MAEs were less than 0.4 MHz between 10.65 GHz and 10.95 GHz. When the BFSs were less than 10.65 GHz or more than 10.95 GHz, the MAEs became larger, but were not more than 0.8 MHz. It can be concluded that the proposed FNN is capable of measuring BFS from BGS. In Figure 6, a comparison is presented between the proposed FNN, marked as FNN with PCA, and another kind of FNN [13], marked as FNN without PCA. The details of the comparison are shown in Table 3. The efficiencies of the two methods were taken into account in the comparison. The processing time is the total time required from feeding 1,712,223 samples into the neural network until all measurement results have been output. It can be seen that the efficiency of the FNN with PCA is 3 times that of the second FNN. Additionally, FNN with PCA has almost the same accuracy as FNN without PCA.
FNN with PCA is not only efficient, but can also be adapted to different frequency scanning steps. For the same frequency scanning range, different frequency scanning steps mean different numbers of sampling points. In other words, when the raw sampling data of the BGS are the input data of the FNN, the number of inputs is equal to the number of sampling points. In this way, BGSs with different frequency scanning steps cannot use the same FNN, necessitating retraining of the FNN. However, the FNN with PCA solves this problem, since the length of the Feature Vector does not change as the frequency scanning step changes. Consequently, the FNN with PCA has the ability to use BGSs with different frequency scanning steps as input. This ability is referred to as adaptability to different frequency scanning steps. Three frequency scanning steps, 1 MHz, 2 MHz and 5 MHz, are taken into consideration. The MAEs of the three steps are shown in Figure 7. It can be clearly seen that 5 MHz is more unstable than both 2 MHz and 1 MHz.
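This point can be illustrated as follows: one PC matrix per scanning grid, all projecting to the same feature length. A random orthonormal matrix stands in for the real PCs obtained by PCA on clean BGSs, so only the shapes are meaningful here; the function name and sizes are illustrative.

```python
import numpy as np

def pcs_for_grid(n_points, n=10, seed=0):
    """Stand-in for the offline PCA step: an orthonormal n_points x n matrix.
    (In the real system, the PCs for each grid come from PCA of clean BGSs.)"""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(n_points, n)))
    return Q

# the same 400 MHz scan range sampled with three different steps
for step_mhz, n_points in [(1, 401), (2, 201), (5, 81)]:
    PCs = pcs_for_grid(n_points, seed=step_mhz)  # one PC matrix per step
    bgs = np.random.default_rng(10 + step_mhz).normal(size=n_points)
    fv = bgs @ PCs
    print(step_mhz, fv.shape)  # every step yields a length-10 Feature Vector
```

Because every grid is compressed to the same length-10 input, the FNN itself never needs retraining when the scanning step changes.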
As shown in Table 4, the MAE for 5 MHz is 1.1968 MHz, for 2 MHz it is 0.2027 MHz, and for 1 MHz it is 0.4888 MHz. Hence, it can be concluded that the frequency scanning step of the inputted BGS can be either 1 MHz or 2 MHz. The MAE of 2 MHz is lower than that of 1 MHz because the training set consists of 2 MHz BGSs. The FNN only learns the pattern between the BFS and the Feature Vector of 2 MHz from the training set, and the 2 MHz pattern is different from the 1 MHz pattern. However, the patterns between the BFS and the Feature Vectors for the three steps, namely 1 MHz, 2 MHz and 5 MHz, are similar. Similarity is calculated using the following equation:

S(i, j) = ∥v_i − v_j∥,    (15)

where v_i is the Feature Vector of the i-th step and v_j is the Feature Vector of the j-th step; a smaller S(i, j) indicates greater similarity between them. Because the FNN with PCA is trained using a 2 MHz step, the similarity between 2 MHz and the other two steps should be examined. As shown in Table 5, S between 2 MHz and 1 MHz is the lowest. Comparing the similarities between the Feature Vectors with different steps, it can be seen that the Feature Vector with 2 MHz is more similar to the Feature Vector with 1 MHz than to the Feature Vector with 5 MHz, as can be seen in Figure 8. Consequently, the MAE of 1 MHz is smaller than the MAE of 5 MHz. Table 5. Similarity between different frequency scanning steps: S(i, j), calculated using Equation (15), is the similarity between the i-th and the j-th frequency scanning steps.

The i-th Frequency Scanning Step | The j-th Frequency Scanning Step | S(i, j)

Experimental Results
Experimental BGSs were obtained from [20], as shown in Figure 9. The starting frequency is 10.88 GHz, the ending frequency is 11.19 GHz, and the step is 1 MHz. In the 80-km BOTDA system, the 80-km sensing fiber is in a room at 26 °C, and a 5-m hotspot at 76 °C is placed at the beginning of the fiber. A three-dimensional BGS spectrum along the fiber in the BOTDA system is shown in Figure 10. Measurement of the BFS is performed in an online environment. The experimental BGSs have a step of 1 MHz, so the PCs taken from clean BGSs with a 1 MHz step are used to compress the experimental data and obtain the corresponding Feature Vectors. The experimental BGSs are then fed into the well-trained FNN to measure the BFS. In this way, the BFS can be estimated from the BGS, as shown in Figure 11. The measurement error of the technique is shown in Figure 12. The error is large where the BFS changes suddenly, because the shape of the BGS is distorted in the measurement process of the BOTDA system, such as when the shape changes from one peak to two peaks. In addition, the MAE is used for quantitative interpretation of the BFS error. The MAE excluding the data near temperature variations is 0.3289 MHz, as calculated using Equation (14), where the target value is obtained by LCF, indicating that FNN with PCA can accurately measure BFS from experimental BGSs.


Conclusions
An accurate and ultra-fast method for estimating BFS from BGS was proposed, and the FNN with PCA was systematically compared with the FNN without PCA. By comparison, our approach is 3 times faster than the other FNN without sacrificing accuracy. Meanwhile, experimental results show that the novel FNN can accurately measure BFS from experimental BGSs. This will contribute to distributed optical fiber sensors, especially when the sensing distance is long. More importantly, measuring BFS using FNN with PCA is adaptable to different sensing applications, such as temperature and strain. Additionally, the FNN based on PCA can be reused for different frequency scanning steps, saving training cost and improving the adaptability of the FNN.