Denoising of BOTDR Dynamic Strain Measurement Using Convolutional Neural Networks

The Brillouin optical time domain reflectometry (BOTDR) system measures the distributed strain and temperature information along the optic fibre by detecting the Brillouin gain spectra (BGS) and finding the Brillouin frequency shift profiles. By introducing small gain stimulated Brillouin scattering (SBS), dynamic measurement using BOTDR can be realized, but the performance is limited due to the noise of the detected information. An image denoising method using the convolutional neural network (CNN) is applied to the derived Brillouin gain spectrum images to enhance the performance of the Brillouin frequency shift detection and the strain vibration measurement of the BOTDR system. By reducing the noise of the BGS images along the length of the fibre under test with different network depths and epoch numbers, smaller frequency uncertainties are obtained, and the sine-fitting R-squared values of the detected strain vibration profiles are also higher. The Brillouin frequency uncertainty is improved by 24% and the sine-fitting R-squared value of the obtained strain vibration profile is enhanced to 0.739, with eight layers of total depth and 200 epochs.


Introduction
Due to their wide range of applications in structural health monitoring and their stability and flexibility in harsh environments, Brillouin scattering-based distributed fibre optic sensors have attracted significant attention in recent decades. These sensors can quantitatively measure distributed strain and temperature information along an optic fibre. Brillouin optical time domain reflectometry (BOTDR) has the advantages of single-end access and simple operation at the construction site, which makes it very popular in civil engineering applications. Furthermore, the demand for dynamic sensing of strain information is growing rapidly, along with developments in civil engineering-related fields such as the geophysical sciences and the oil and gas industries. Therefore, dynamic measurement of strain using the BOTDR system is a practical and promising topic, especially for civil engineering-related industries.
The BOTDR system obtains the distributed temperature and strain information along the optic fibre under test (FUT) by sensing distributed Brillouin frequency shift (BFS) profiles. The BFS profiles are extracted by finding the peak values of each Brillouin gain spectrum (BGS) along the FUT. Therefore, the signal-to-noise ratio (SNR) of the BGSs limits the performance of the BOTDR system. By introducing noise reduction, the system performance could be improved.
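The peak-search step described above can be sketched as follows. The Lorentzian line shape, the ~10.85 GHz centre frequency and the 0.05 GHz linewidth are illustrative assumptions for a standard single-mode fibre, not the paper's measured values.

```python
import numpy as np

def lorentzian(f, f_b, gain=1.0, fwhm=0.05):
    """Idealized Brillouin gain spectrum: a Lorentzian centred at the BFS f_b (GHz)."""
    return gain / (1.0 + ((f - f_b) / (fwhm / 2.0)) ** 2)

def estimate_bfs(freqs, bgs):
    """Estimate the Brillouin frequency shift as the peak frequency of one BGS."""
    return freqs[np.argmax(bgs)]

# Frequency axis around a typical ~10.85 GHz BFS (illustrative values)
freqs = np.linspace(10.7, 11.0, 601)
bgs = lorentzian(freqs, f_b=10.85)
bfs = estimate_bfs(freqs, bgs)
```

In practice the peak is found for the BGS at every fibre position, giving the distributed BFS profile; noise on the BGS directly perturbs this peak search, which is why the SNR of the BGSs limits the system performance.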
By treating the BGSs along the FUT as a two-dimensional image, noise suppression can be conducted on Brillouin-based distributed fibre optic sensors using image denoising methods [1][2][3]. It has been proved that image denoising methods are more effective than one-dimensional denoising methods such as one-dimensional wavelet denoising [4]. Non-local means (NLM) and wavelet denoising (WD) are the main image denoising methods applied to these sensors. Compared with such methods, the denoising convolutional neural network (DnCNN) does not change the redundancy of the information: the number of data sampling points is kept the same as that of the input. Additionally, the frequency accuracy and spatial resolution are not degraded with DnCNN. In this work, the denoising performance of DnCNN on the dynamic strain sensing of small gain stimulated Brillouin scattering (SBS) short-time Fourier transform (STFT)-BOTDR is experimentally demonstrated. Compared with conventional frequency scanning, STFT is faster and adds a noise removal effect. Small gain SBS boosts the Brillouin signal power of BOTDR and improves the SNR and system performance. The DnCNN performance on the detection of BGSs and distributed BFS profiles is experimentally validated. Experimental results of denoised BGS images, distributed BFS profiles, frequency uncertainties and the detection of strain vibration are compared with different total depths and epochs of the training networks.

Materials and Methods

DnCNN Architecture

Since CNNs have superior performance in image recognition, image denoising based on a CNN, named DnCNN, was later proposed in [22]. The aim of image denoising is to recover a clean image x from a noisy image y with noise v. The relationship between them is y = x + v, where v is commonly assumed to be additive white Gaussian noise (AWGN) with standard deviation σ. Instead of producing the clean image as the output, DnCNN predicts the noise as a residual image by residual learning and removes the latent clean image within the hidden layers. Since there is a larger difference between the input image and the residual noise, it is easier to learn the residual noise than the clean image. Figure 1 shows the architecture of the DnCNN for noise learning. The input is the noisy image (y = x + v) to be processed. There are three types of hidden layers in this model. The first layer is a convolutional layer with rectified linear units (ReLU) for nonlinearity, which gradually separates the image structure from the noisy input. The second type is the convolutional layer with batch normalization (BN) and ReLU. The third type is the convolutional layer that reconstructs the output. Batch normalization normalizes the inputs of each hidden layer by normalizing their means and variances [18]. Mini-batch stochastic gradient descent (SGD) is a very popular tool in the training of deep learning models. The mini-batch method randomly chooses a subset of the training dataset to perform the backpropagation and evaluate its gradient. Compared with computing over the whole batch (batch gradient descent) or a single training sample (stochastic gradient descent), mini-batch is a faster and more stable way to update the weights. During training with mini-batches, the distribution of a layer's inputs changes due to the random initialisation of the weights, the change of the parameters and the randomness of the input data [23].
When updating the parameters by calculating the gradients, the layer has to continuously adapt to the new distribution. In experimental operations, the change in the means and variances of the inputs to the layers during training is observed. This phenomenon is named internal covariate shift. Batch normalization is a method to mitigate the internal covariate shift: the arbitrary shift of the data is removed by normalizing the mean, while the arbitrary spread of the data is eliminated by normalizing the variance. With only two parameters added for each activation, batch normalization offers faster training and better convergence. The ReLU is a widely used activation function to add non-linearity [24]. It trains fast, is expressive and prevents the gradient vanishing problem. The ReLU is expressed as

g(a) = max(a, 0) (1)

Figure 2 shows the ReLU and its derivative.
It shows that, for input values below 0, ReLU returns 0, which adds nonlinearity and effectively filters out unwanted values (no small values are left, compared with some other activation functions). For input values greater than 0, ReLU works as a linear function. Its derivative is constantly 1, which is stable and eliminates the gradient vanishing problem. For activation functions with a derivative smaller than 1, the errors between the predicted values and the real values decay as they propagate through the layers; as the network goes deeper, it becomes harder for the model to converge and the gradient vanishing problem becomes obvious.
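The batch normalization and ReLU operations described above can be sketched in numpy. The default scale and shift parameters (γ = 1, β = 0) and the toy batch are illustrative assumptions.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch to zero mean and unit variance along the batch
    axis, then rescale with the two learnable parameters gamma and beta."""
    x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    return gamma * x_hat + beta

def relu(a):
    """g(a) = max(a, 0), Equation (1); its derivative is 1 for a > 0, else 0."""
    return np.maximum(a, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))  # shifted, spread-out batch
y = batch_norm(x)   # arbitrary shift and spread removed: mean ~0, std ~1
z = relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0]))    # negative inputs clipped to 0
```

After `batch_norm`, each activation has approximately zero mean and unit variance over the batch, which is the internal-covariate-shift mitigation described above; `relu` zeroes all negative values while passing positive values through unchanged.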
Instead of training the mapping function of the clean image F(y) = x, the target of the model is to train a residual mapping of noise R(y) = v. The clean image can then be predicted by x = y − R(y).

The loss function of the model is written as

ℓ(Θ) = (1/(2N)) Σ_{i=1}^{N} ||R(y_i; Θ) − (y_i − x_i)||² (2)

where N is the number of noisy-clean training image pairs [22]. This loss function learns the trainable parameters Θ by computing the averaged mean squared error between the estimated residual noise from the noisy input and the desired residual noise. In the training process, the DnCNN model extracts the features of the noise in the forward direction and minimizes the loss function by tuning the trainable parameters Θ in backpropagation.
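A toy computation of this residual-learning loss on numpy arrays. The array shapes and the noise level are illustrative; only the loss formula follows the text.

```python
import numpy as np

def dncnn_loss(residual_pred, noisy, clean):
    """Averaged mean squared error between the estimated residual noise
    R(y; Theta) and the desired residual (y - x), over N image pairs."""
    desired = noisy - clean
    n = len(noisy)
    return np.sum((residual_pred - desired) ** 2) / (2 * n)

rng = np.random.default_rng(1)
clean = rng.random((4, 8, 8))                    # N = 4 toy "clean" images
noise = 0.1 * rng.standard_normal((4, 8, 8))
noisy = clean + noise

perfect = dncnn_loss(noise, noisy, clean)        # ~0: residual predicted exactly
untrained = dncnn_loss(np.zeros_like(noise), noisy, clean)  # > 0
```

A perfect residual prediction drives the loss to (numerically) zero, while predicting no noise at all leaves a positive loss; training moves the network from the latter towards the former.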
Since the size of the input data shrinks as the data are convolved with the kernels on each hidden layer, zero-padding is added to keep the output size the same as the input size. Given the input size i (along both axes), the kernel size along both axes k, and the zero-padding parameter along both axes p, the output size o of a convolutional layer is calculated as [25]

o = (i − k) + 2p + 1 (3)

Having o = i gives 2p = k − 1. For k odd,

p = (k − 1)/2 (4)

As the kernel size of 3 by 3 (k = 3) has been proven to be the most effective and widely used kernel size in CNNs [20,26], 3 by 3 kernels are used here for image denoising. According to Equation (4), zero padding with p = 1 is added for k = 3 (1 row and column of 0 added on each boundary).
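The output-size relation and the "same" padding rule can be checked numerically (stride 1 is assumed, as in the text):

```python
def conv_output_size(i, k, p):
    """Output size of a stride-1 convolution with input size i, kernel size k
    and zero padding p on each side: o = (i - k) + 2p + 1."""
    return (i - k) + 2 * p + 1

def same_padding(k):
    """Padding that keeps the output the same size as the input, for odd k."""
    assert k % 2 == 1, "only odd kernel sizes admit symmetric 'same' padding"
    return (k - 1) // 2

# A 3-by-3 kernel needs p = 1 to preserve a 200-by-200 input:
p = same_padding(3)
o = conv_output_size(200, 3, p)
```

Without padding (p = 0), each 3-by-3 convolution would shrink a 200-by-200 image to 198-by-198, and the shrinkage would accumulate over the hidden layers.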

BOTDR Setup
To validate its performance, DnCNN is applied to the data processing of the small gain SBS STFT-BOTDR dynamic strain measurement as is shown in [27]. Figure 3 illustrates the BOTDR experimental setup. A narrow linewidth laser transmits continuous-wave (CW) light. The incident light is then equally split into two branches. In the upper branch, the light is modulated into a 40 ns pulse with 16 µs period. The modulator is controlled by a signal generator. The light pulse is amplified by an erbium-doped fibre amplifier (EDFA) and then goes through an optical bandpass filter to boost the signal power and reduce the noise after the amplification. The data with the input peak pulse power of 3.12 W are used in this work, as the optimal small gain SBS is induced at this input level. The incident light is sent into the FUT through a circulator and the backscattered light is delivered to the photodetector (PD). In the lower branch in Figure 3, a polarization scrambler (PS) is used to remove the effect of polarization mismatch between the two branches. The lower branch also goes to the PD so that the optical Brillouin signal is converted to the electrical signal on the PD. The signal is filtered by an electrical bandpass filter (BPF) and amplified by an amplifier (AMP). Since the Brillouin signal is around 11 GHz, which is a very high frequency, the electrical signal beats with an electrical local oscillator (LO) to down-convert the signal. After a second electrical amplifier (AMP), the signal is captured by a digitizer. The captured signal is processed by the signal processing method to reconstruct the BGS and BFS information. Each measurement is derived after 25 averaging, leading to 2.5 kHz sampling speed of the dynamic detection. In addition, the FUT is about 935 m long with the 60 Hz strain vibration added close to the far end of the fibre over about 6 m length. The fibre section with strain added is labelled as S2, while the first 921 m loose fibre is labelled as S1.
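The quoted 2.5 kHz dynamic sampling speed follows directly from the 16 µs pulse period and the 25-trace averaging. A quick consistency check, assuming one backscatter trace per pulse period:

```python
pulse_period_s = 16e-6      # pulse repetition period from the setup
n_averages = 25             # traces averaged per measurement

measurement_time_s = pulse_period_s * n_averages   # 400 microseconds
sampling_rate_hz = 1.0 / measurement_time_s        # 2.5 kHz dynamic sampling
```

This 400 µs per averaged measurement is what allows the 124 BFS profiles over 50 ms used later for the vibration analysis.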

For one vibration result, the BFS profiles are measured 124 times and the vibration information for a specific location is extracted from the 124 BFS profiles. Each measurement lasts for 50 ms. After the STFT of the captured time-domain sensing data is performed, the BGSs along the FUT are obtained. These BGSs along the FUT are illustrated as images: with the horizontal axis and the vertical axis representing the fibre length and the BGS frequency, respectively, the colour of the images indicates the amplitude of each point on the BGSs. Noise removal is conducted on the 124 derived BGS images with the trained DnCNN networks.
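A minimal numpy sketch of the STFT step that turns a digitized beat signal into local spectra (one column per fibre position). The sampling rate, window length, hop and toy tone are illustrative assumptions, not the system's actual parameters.

```python
import numpy as np

def stft_magnitude(signal, fs, window_len, hop):
    """Short-time Fourier transform magnitude: a sliding Hann-windowed FFT.
    Each column is the local spectrum of one time slot (fibre position)."""
    window = np.hanning(window_len)
    starts = range(0, len(signal) - window_len + 1, hop)
    frames = np.stack([signal[s:s + window_len] * window for s in starts])
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(window_len, d=1.0 / fs)
    return freqs, spectra.T  # rows: frequency bins, columns: positions

# A toy down-converted beat tone at 200 MHz sampled at 1 GS/s (illustrative)
fs = 1e9
t = np.arange(4096) / fs
sig = np.sin(2 * np.pi * 200e6 * t)
freqs, spec = stft_magnitude(sig, fs, window_len=256, hop=64)
peak_freq = freqs[np.argmax(spec[:, 0])]   # peak of the first local spectrum
```

Stacking the local spectra over the fibre length gives exactly the kind of two-dimensional BGS image (frequency versus distance) that the DnCNN is then asked to denoise.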

Training Setup
The Berkeley segmentation dataset (BSDS300), cropped to a size of 200 by 200, is used to train the network as clean images [22]. Gaussian noise with a standard deviation of 110 is added to the images of the dataset to generate the noisy input images. Since the training target is the noise, not the clean image, the BSDS300 is chosen instead of simulated BGS images. The BSDS300 dataset includes more variations of details than the simulated BGS images, and hence the network can be better trained. Simulated BGS images were tested as the training set, but the outcomes were even worse than the original BFS results, because real BGS images include more complicated BFS variations than simulated ones. On the other hand, if there were enough experimental data, an experimental dataset could be used as the training dataset.
The filter weights are initialized by the method in [28] and a batch size of 4 is used. The depth and the layers are described as follows. The first layer (1 convolutional layer with ReLU): 64 filters of size 3 by 3 are used to generate 64 feature maps. The second type of layers, with depth D (D convolutional layers with BN and ReLU): 64 filters of size 3 by 3 by 64 are used for each layer. The last layer (1 convolutional layer): a filter of size 3 by 3 by 64 is used to reconstruct the output. The depth D is to be investigated in the experiments; the total depth of the network is (D + 2). The choices of the depths and the epoch numbers are experimentally studied in Section 3.
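As a rough sketch of the model size, the layer description above implies the following trainable-parameter count. The single-channel (grayscale) input and the per-filter bias and BN bookkeeping are assumptions for illustration, not stated in the text.

```python
def dncnn_param_count(d, filters=64, k=3, channels=1):
    """Approximate trainable-parameter count for a DnCNN of total depth d + 2,
    assuming single-channel input images.
    First layer: k*k*channels weights per filter, plus one bias per filter.
    Middle d layers: k*k*filters weights per filter, plus 2 BN parameters
    (scale and shift) per filter.
    Last layer: k*k*filters weights per output channel, plus one bias."""
    first = k * k * channels * filters + filters
    middle = d * (k * k * filters * filters + 2 * filters)
    last = k * k * filters * channels + channels
    return first + middle + last

# Total depth 8 corresponds to D = 6 middle layers:
n_params = dncnn_param_count(6)
```

Each extra middle layer adds roughly 37 k parameters here, which is why deeper networks (total depth 12 or 16) are harder to train and, as discussed in Section 3, more prone to overfitting the relatively simple BGS images.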

Experiments with Different Total Depths and Epochs
To compare the performance of the neural network with different depths and epochs, the total depth of the hidden layers is set to 4, 8, 12 and 16, while the epoch number is set to 50 and 200. The depth of the second type of hidden layers is therefore 2, 6, 10 and 14, respectively. The neural network is trained eight times in total with the set depths and epochs. The 124 BGS images are denoised using each trained network. Figure 4 shows a three-dimensional map of the measured BGSs along the fibre distance; the top view of the map is the BGS image to be denoised.
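The eight training runs form a small grid over total depth and epoch number; a sketch of how such a grid might be enumerated (the dictionary keys are illustrative names):

```python
from itertools import product

total_depths = [4, 8, 12, 16]
epoch_numbers = [50, 200]

# One training run per (total depth, epoch number) combination, 8 in total;
# the middle-layer depth D is the total depth minus the first and last layers.
runs = [{"total_depth": d, "middle_depth": d - 2, "epochs": e}
        for d, e in product(total_depths, epoch_numbers)]
```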


The training loss curves with different total depths and epoch numbers are shown in Figure A2, Appendix B. Good convergence can be seen in all the figures in Figure A2, indicating reasonable training of the models.
The BGS image without DnCNN denoising is shown in Figure A1a in Appendix A. The denoised BGS images along the FUT, with different total depths and epochs of the DnCNN, are demonstrated in Figure A1b–i. It should be noted that there are 124 denoised images for each set of total depth and epoch; Figure A1 illustrates one single image from the 124 images for each training network. The centre yellow parts of the figures are the BGSs. Upshifts of the BGS frequencies are observed at the far end of the FUT, which are caused by the strain applied to the fibre. Clearer images are seen with an epoch number of 200 than with 50 when the total depth of the network is fixed. Figure A1a, without DnCNN denoising, is the fuzziest.
The detected BFS profiles along the FUT for one of the 124 measurements are shown on the left in Figures 5 and 6. By examining the 124 BFS profiles, the 60 Hz strain vibration over 50 ms of time is detected. The vibration profiles with DnCNN denoising at different total depths and epochs (red) are drawn in the right figures in Figures 5 and 6, compared with the detected strain vibration without denoising (blue). The peak-to-peak change in the BFS is about 16 MHz, corresponding to a 320 µε strain change on the fibre. The improvement of the vibration detection is slightly more obvious in Figures 5d and 6b,d by visual inspection, but visual inspection alone is not conclusive.
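The quoted numbers imply a strain coefficient of 16 MHz / 320 µε = 0.05 MHz per µε, close to the commonly quoted value for standard single-mode fibre. A minimal conversion sketch, with the coefficient derived from the text rather than independently calibrated:

```python
bfs_pp_mhz = 16.0        # peak-to-peak BFS change from the measurement
strain_pp_ue = 320.0     # corresponding peak-to-peak strain (microstrain)

# Implied strain coefficient of the fibre (MHz per microstrain)
coeff_mhz_per_ue = bfs_pp_mhz / strain_pp_ue

def bfs_to_strain_ue(delta_bfs_mhz):
    """Convert a BFS change (MHz) to strain (microstrain) with this coefficient."""
    return delta_bfs_mhz / coeff_mhz_per_ue
```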
To verify the performance of the noise reduction on the strain vibration detection, the measured vibration results are sine fitted and the R-squared values of the sine fitting are derived for each given total depth and epoch. The R-squared value is a statistical measure that shows how well the dataset fits the regression model. It determines the proportion of the variance in the dependent variable that can be explained by the independent variable. Its value ranges from 0 to 1, and the larger the value, the better the correlation between the fitting model and the data. The frequency uncertainty of the BFS is calculated as the standard deviation of the BFS over time. The R-squared values and the frequency uncertainties are listed in Table 1 for each total depth and epoch number of the DnCNN. The R-squared value and the frequency uncertainty of the measured result without denoising are also listed in Table 1 (No. i) as a comparison. The results with different depths and epochs are numbered in Table 1 (as a to h). The corresponding R-squared values and the frequency uncertainties are shown in the charts in Figures 7 and 8, respectively. The column numbers a to i agree with Table 1.
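The two metrics can be sketched as follows. The known-frequency linear sine fit and the toy 60 Hz signal are illustrative assumptions; the paper's actual fitting routine is not specified.

```python
import numpy as np

def sine_fit_r_squared(t, y, f_hz):
    """Least-squares fit of y ~ A*sin(2*pi*f*t) + B*cos(2*pi*f*t) + C at a
    known frequency, returning the R-squared of the fit."""
    design = np.column_stack([np.sin(2 * np.pi * f_hz * t),
                              np.cos(2 * np.pi * f_hz * t),
                              np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# 124 BFS samples over ~50 ms at the 2.5 kHz sampling rate, 60 Hz vibration
t = np.arange(124) / 2500.0
rng = np.random.default_rng(2)
bfs_mhz = 8.0 * np.sin(2 * np.pi * 60.0 * t) + 2.0 * rng.standard_normal(124)

r2 = sine_fit_r_squared(t, bfs_mhz, 60.0)
freq_uncertainty_mhz = np.std(bfs_mhz)  # standard deviation of the BFS over time
```

Reducing the noise on the BFS samples raises the R-squared of the sine fit and lowers the standard deviation, which is exactly the pattern reported in Table 1 for the denoised results.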
It can be observed in Table 1 and Figure 7 that the DnCNN with a total depth of 8 and an epoch number of 200 provides the best performance for the detection of the sinusoidal strain vibration, with the largest R-squared value among all the results. With total depths of 8, 12 and 16, the R-squared values with 200 epochs are higher than the results with 50 epochs. With a total depth of 4, the R-squared value with 50 epochs is slightly larger than that with 200 epochs. On the other hand, comparing the results with the same epoch number, the R-squared values with a total depth of 8 are the most enhanced. The R-squared values with total depths of 12 and 16 are almost the same as the second best, whereas the R-squared values with a total depth of 4 are the lowest. All the R-squared values of the sine fitting for the strain vibrations detected from denoised BGS images are better than the value without denoising.
From Table 1 and Figure 8, the frequency uncertainties with total depths of 8 and 16 and an epoch number of 200 are the most decreased among all the results, reduced to below 4 MHz. With the same total depth (4, 8, 12 or 16), the frequency uncertainties with 200 epochs are lower, and hence better, than those with 50 epochs. Moreover, the frequency uncertainty with a total depth of 16 is the smallest for 200 epochs, and the frequency uncertainty with a total depth of 8 is the smallest for 50 epochs. In contrast, the frequency uncertainties with total depths of 4 and 12 are the two largest, whichever the epoch number is. Furthermore, all the obtained frequency uncertainties with DnCNN denoising are improved compared with that without denoising.
Comparing the performance with the same total depth, generally, all the experimental results with 200 epochs are better than those with 50 epochs, although there is some disturbance for the depth of 4. With a batch size of 4 and a training dataset of 200 images, the trainable parameters are updated 50 times per epoch. With a larger number of epochs, the parameters (the weights) are updated more times and hence better convergence is realised: the loss function is better minimized and the denoising performance is better, at the expense of time consumption. On the other hand, as the epoch number grows further, the network training loss tends to flatten and the improvement of the denoising performance is limited. An epoch number of 200 can already provide noticeable training performance [22], and a larger epoch number would take more processing time without significant performance improvement.
On the other hand, by comparing the experimental results with the same epoch number, it can be seen that as the total depth increases from 4 to 16, the detected results first get better, then worse, and then better again. In a deep learning network, each layer progressively extracts higher-level features of the input until the final layer, from edges and object parts to objects [29]. With four hidden layers, the DnCNN extracts relatively low-level features, and the network is not yet trained to obtain enough features of the target BGS images. With the network going deeper to eight layers, more complex features of the input are extracted, and the network performs better training and better denoising of the images. As the total depth goes even deeper, to 12 and 16, even more complex features are extracted. However, more trainable parameters are also introduced into the network, and hence the training process is more complicated as well. Different from normal natural images, the BGS images are made up of a large number of spectra that have similar features and shapes, and their features are not as complicated as those of natural images. With a total depth of 8, the features can already be extracted well; with a deeper network, the BGS images are overfitted. From Figure 6, the denoised BFS profile in Figure 6d is smoother, but some details are suppressed. Since the network with a depth of 8 and 200 epochs can generate comparable frequency uncertainty while keeping more details of the BFS, the DnCNN with 8 layers and 200 epochs is the most feasible for the denoising of BGS images along the FUT.

Spatial Performance and the Brillouin Gain Spectra
To investigate the spatial performance with DnCNN denoising, Figure 9 illustrates the measured BFS profiles of optic fibre section S2 with strain applied, at different total depths and 200 epochs. The change in BFS due to the applied strain can be observed in all the profiles from 922 m to 928.4 m, except the profile with 12 layers, where the rising point of the detected strain is shifted one point forward. Due to the double-peak effect where there is a strain change, a distance error might be induced after denoising, as the Brillouin amplitudes of the two peaks can be comparable and the denoising operates on the whole image with spatial context information considered [30].
The obtained BGSs with and without DnCNN denoising are shown in Figure 10. It can be observed that the fluctuations in the spectra are reduced after denoising compared with the spectrum without denoising, although the noise level is not significantly suppressed. In addition, there are shifts in the peak BGS frequencies at different depths after denoising. In Brillouin-based distributed fibre optic sensors, there is another common definition of SNR that is better related to the performance of a system: the ratio of the mean BFS signal to the standard deviation of the BFS signal [7]. Hence, the suppression of the BGS fluctuations improves the SNR and can lead to more accurate detection of the BFS.
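The BFS-based SNR definition from [7] can be expressed directly; the numbers below are made up purely to illustrate that shrinking the BFS fluctuation raises this SNR even when the mean is unchanged:

```python
import statistics

def bfs_snr(bfs_profile):
    """SNR definition of [7]: mean BFS divided by its standard deviation.

    bfs_profile: sequence of BFS values (e.g. in MHz).
    """
    mu = statistics.fmean(bfs_profile)
    sigma = statistics.pstdev(bfs_profile)
    return mu / sigma

# Hypothetical BFS samples (MHz): same mean, smaller fluctuation after
# denoising, hence a higher SNR under this definition.
noisy    = [10850, 10846, 10854, 10849, 10851]
denoised = [10850, 10849, 10851, 10850, 10850]
print(bfs_snr(denoised) > bfs_snr(noisy))  # True
```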
Figure 10. The DnCNN-denoised BGS spectra.

Table 2 demonstrates the experimental results of Brillouin-based fibre optic sensors with some published image denoising methods, compared with the results of this work. It can be seen that the image denoising methods used are mainly NLM, WD and BM3D.
The BFS uncertainty can be improved to within the range of 57.2% to 23.9% of the original value with these three methods, depending on the denoising optimisation parameters chosen as well as the fibre length, the input pulse width (the nominal spatial resolution) and the averaging number of the traces. Nevertheless, the first three lines of Table 2 show the degradation introduced by these denoising methods, which can be very significant, as large as 23 times the nominal frequency uncertainty, due to the loss of data details as the SNR of the signal increases [1]. The spatial resolution can also broaden to over twice its original value. Furthermore, these methods require complex parameter adjustment [3,8], and the performance of NLM depends on the similarity of the neighbouring data. Moreover, due to confusion over the definition of frequency uncertainty, many works use the distance domain approach, i.e., the standard deviation of the BFS along the fibre from one measurement within a certain spatial window length, which is inaccurate and overestimates the BFS uncertainty, making the results appear much better than the real values [6]. In fact, the sequential domain approach, that is, calculating the standard deviation at each fibre location over consecutive measurements, is the more accurate method [6]. However, in many publications the calculation approach for the uncertainty is not clearly defined, making it difficult to compare the performance of different works. In addition, for many static measurements, only a single experimental result is obtained after averaging, so it can be inferred that the distance domain approach is used, since there are not enough data for the sequential domain approach. In this work, the sequential domain approach is applied. All the referred results in Table 2 are based on static measurements, and it is suspected that the distance domain approach is used for most of them.
Since the distance domain approach can lead to about 2.6 times overestimation of the frequency uncertainty, as discussed in [6], it is hard to compare the real performance of the works in Table 2.

Notes for Table 2: 1 The first value is the nominal frequency uncertainty, while the second value is the accuracy degradation. The temperature values are proportional to the BFS uncertainty for Brillouin-based sensors, with 1 °C corresponding to approximately 1 MHz. 2 The first value is the nominal spatial resolution, while the second value is the degraded spatial resolution.
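The two uncertainty estimators discussed above can be sketched as follows, with hypothetical data chosen to show how the distance domain approach misreports spatial BFS variation as uncertainty (function names and data are illustrative):

```python
import statistics

def distance_domain_uncertainty(bfs_trace, window_len):
    """Std of the BFS along the fibre within a spatial window,
    computed from a single measurement (distance domain approach)."""
    return statistics.pstdev(bfs_trace[:window_len])

def sequential_domain_uncertainty(bfs_traces, position):
    """Std of the BFS at one fibre position over consecutive
    measurements (sequential domain approach of [6])."""
    return statistics.pstdev([trace[position] for trace in bfs_traces])

# Hypothetical repeated measurements of a fibre whose BFS varies along
# its length but is perfectly repeatable: the sequential estimate is 0,
# while the distance domain estimate reports the fibre's own spatial
# variation as "uncertainty".
traces = [[10850.0 + p for p in range(10)] for _ in range(5)]
print(sequential_domain_uncertainty(traces, 3))    # 0.0
print(distance_domain_uncertainty(traces[0], 10))  # ≈ 2.87
```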

Comparison with Some Known Methods
On the other hand, compared with BOTDA, the Brillouin signal power from BOTDR is exponentially smaller, making the signal much harder to detect and the SNR worse. Furthermore, conventional Brillouin-based fibre optic sensors use the frequency sweeping method to acquire the data, which is time-consuming and limits the performance by the frequency step. STFT is a much faster way to obtain the frequency domain data while keeping the redundant information, and it also has a denoising effect on the data. The second line from the bottom of Table 2 gives the result with STFT and WD [31], where the frequency uncertainty is improved by 19%. The uncertainty in our work is improved by 24%, which is better than [31], indicating more effective denoising performance. Although the absolute value in [31] is smaller, the spatial resolution of [31] is 5 times that of this work, and the averaging number is 16 times that of our work. Considering the enhancement of SNR from these larger values and the possible use of the distance domain approach for the frequency uncertainty, the smaller absolute values do not necessarily indicate better results.
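As an illustration of the STFT processing mentioned above, here is a minimal short-time Fourier transform applied to a hypothetical beat signal; the sampling rate, tone frequency and window parameters below are illustrative only, not those of the actual system:

```python
import numpy as np

def stft(signal, win_len, hop):
    """Minimal STFT: slide a Hann window along the time-domain trace
    and FFT each segment, yielding the time-frequency map from which
    the BGS (and hence the BFS) can be extracted."""
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        seg = signal[start:start + win_len] * np.hanning(win_len)
        frames.append(np.abs(np.fft.rfft(seg)))
    return np.array(frames)  # shape: (n_frames, win_len // 2 + 1)

# Hypothetical 80 MHz beat tone sampled at 1 GS/s.
fs, f0 = 1e9, 80e6
t = np.arange(2048) / fs
x = np.cos(2 * np.pi * f0 * t)

spec = stft(x, win_len=256, hop=64)
peak_bin = spec[0].argmax()
print(peak_bin * fs / 256 / 1e6)  # close to 80 MHz (≈3.9 MHz bin spacing)
```

Because every window is transformed in one pass, no frequency sweep is needed, which is what enables the dynamic measurement rate.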
Still, all the other experiments in Table 2 are static measurements, whereas our work is dynamic. In our work, the dynamic measurement sampling rate is 2.5 kHz, enabling vibration detection up to 1.25 kHz. In addition, DnCNN denoising is applied on top of the results obtained with STFT processing, polynomial fitting and the induced small gain SBS, which already provide some denoising effect. The DnCNN further improves the system performance and helps to enhance the vibration detection results.
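The quoted 2.5 kHz sampling rate and 1.25 kHz detection limit follow directly from the pulse period and averaging number given later in this work (a simple arithmetic check):

```python
# Timing arithmetic behind the quoted figures: a 16 µs pulse period and
# an averaging number of 25 (values from the text) set the effective
# sampling rate of the dynamic measurement; Nyquist then gives the
# maximum detectable vibration frequency.
pulse_period = 16e-6      # s
averaging_number = 25

sampling_rate = 1.0 / (pulse_period * averaging_number)  # 2.5 kHz
max_vibration_freq = sampling_rate / 2.0                 # 1.25 kHz
print(sampling_rate, max_vibration_freq)
```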
The availability of experimental data is a limitation of this study. Given enough experimental data, real noisy data could be used as the training dataset in a later study, so that the training set would be more closely related to the validation dataset. Furthermore, experiments with different vibration speeds and amplitudes could better verify the denoising model. Finally, as deep learning is developing rapidly, better results might be achieved with other techniques.

Conclusions
In this work, the image denoising method of DnCNN is introduced and trained to reduce the noise of the BGS images and improve the performance of vibration detection of the small gain SBS STFT-BOTDR. To investigate the influence of the network depth and the epoch number on the denoising performance of the BOTDR system, the total depth is set to 4, 8, 12 and 16, respectively, and the epoch number is set to 50 and 200, respectively.
By denoising the BGS images along the fibre length, experimental results show that total depths of 8 and 16 with an epoch number of 200 generate the best denoised outputs for the BOTDR system in terms of the BFS frequency uncertainties (improved by 1.22 MHz and 1.48 MHz, respectively) and the sine-fitting R-squared values (improved by 0.029 and 0.02, respectively) of the detected vibration. The obtained BFS profiles along the FUT also fluctuate less in these cases. However, to better maintain the details and avoid over-denoising, a total depth of 8 with an epoch number of 200 is the best choice, with the frequency uncertainty improved to 3.88 MHz.
As the averaging number is a significant determinant of the dynamic sensing speed, it must be kept small to realise dynamic measurement with BOTDR [27]. On the other hand, a small averaging number leads to more noise, especially for a BOTDR system, where the Brillouin signal is weak and the SNR is very limited. Therefore, noise reduction is important for the dynamic BOTDR system to enhance the frequency uncertainty. In this work, the averaging number is set to 25, leading to a 2.5 kHz dynamic measurement sampling rate for a 16 µs pulse period. With DnCNN, the frequency uncertainty of the small gain SBS STFT-BOTDR measurement can be improved by 24%, and the vibration detection can be improved from 0.710 to 0.739 in terms of the R-squared fitting value. In addition, conventional image denoising methods, such as WD, NLM and BM3D, can result in non-negligible degradation due to the loss of information: the frequency degradation can be larger than 10 MHz, and the spatial resolution can be broadened to over twice its original value. DnCNN keeps the redundancy of the information and overcomes this drawback. Hence, with DnCNN, the system can produce more accurate BFS information and therefore more accurate vibration detection, which can be applied in fields such as earthquake detection and intrusion detection.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to potential commercial values.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.