Diversion Detection in Small-Diameter HDPE Pipes Using Guided Waves and Deep Learning

In this paper, we propose a novel technique for the inspection of high-density polyethylene (HDPE) pipes using ultrasonic sensors, signal processing, and deep neural networks (DNNs). Specifically, we propose a technique that detects whether there is a diversion on a pipe or not. The proposed model transmits ultrasound signals through a pipe using a custom-designed array of piezoelectric transmitters and receivers. We propose to use the Zadoff–Chu sequence to modulate the input signals, then utilize its correlation properties to estimate the pipe channel response. The processed signal is then fed to a DNN that extracts the features and decides whether there is a diversion or not. The proposed technique demonstrates an average classification accuracy of 90.3% (when one sensor is used) and 99.6% (when two sensors are used) on 3/4 inch pipes. The technique can be readily generalized for pipes of different diameters and materials.


Introduction
Pipelines are extensively utilized for oil and natural gas transfer. However, one of the main concerns with pipelines is leakage, with incidents occurring on a regular basis [1]. In addition to human errors, many other factors contribute to pipeline leakage, including external impacts, material faults during manufacturing time, environmental corrosion, internal erosion, ground surface movements, and inappropriate maintenance [2]. Early detection of structural degradation due to such factors is required to maintain structural safety and integrity to lower the likelihood of a catastrophic failure. Thus, the development of robust and cost-effective diagnosis techniques to ensure the structural safety of pipes has long been of great importance [1].
Many works in the literature have studied diagnosis methods (nondestructive testing) for steel pipes, such as radiography, conventional ultrasonic testing, acoustic emission, visual inspection (including thermal imaging and laser scanning), and ground penetrating radar [2][3][4][5][6][7]. However, less work has been conducted on the diagnosis of high-density polyethylene (HDPE) pipes, which are commonly used in residential natural gas service lines. Nonetheless, the diagnosis of HDPE pipes is important for several reasons, including leak detection and pipe integrity monitoring [8].
Gas diversions are carried out illegally by using methods to obstruct the flow of natural gas through the pipe and installing a T-fitting that redirects natural gas to an unmetered location for unmeasured consumption. This tampering with the pipe poses many risks since it is unrecorded, violates pipeline quality standards, and can lead to potential leaks and possibly explosions, consequently posing significant risk to public safety, property, and the environment in the vicinity of altered gas lines. Such diversions have been discovered in the past through word of mouth, leaks, or unexpected encounters with an unrecorded natural gas pipe in a construction site [9][10][11][12][13].
Therefore, effective and timely techniques for diversion detection are required to avoid such risks. Given the concealed nature of below-grade pipes, it is very challenging to inspect
• To propose, design, and test piezoelectric transducer arrays, electronic circuits, signal design and processing, and DNN models required to develop a DNN-based ultrasonic guided wave (UGW) detection scheme.
• To implement and evaluate a DNN-based UGW detection scheme for detecting diversions in HDPE pipes.
To this end, we design two clamps containing an array of piezoelectric transmitters and receivers to enable transmitting waves through the HDPE pipes. To develop the transmitter and receiver signal processing, we employ Zadoff-Chu sequences (due to their desired correlation properties [30]) at the transmitter side and correlation methods at the receiver side to estimate the pipe's impulse response. These signals are designed in a computer, which is interfaced with the transmitters and receivers through a waveform generator, an oscilloscope, and the necessary driving circuits. To classify the received correlation signals, we develop a DNN architecture that extracts features from the correlation signal and uses these features to detect a diversion. The DNN consists of a CNN that extracts features from segments of the correlation signal, followed by a long short-term memory (LSTM) network that extracts features from temporal correlations between segments, followed by a multi-layer perceptron that makes a decision. We train and test the DNN using signals from experiments on control pipes and pipes with diversions, and show that the developed method is able to classify control and diversion samples with an accuracy of 90.3% when one receiving sensor is used, and 99.6% when two sensors are used.
The details of the proposed method are described in the rest of the paper. Section 2 presents the system specifications and problem statement. The signal design and processing techniques are presented in Section 3. Empirical data collection is described in Section 4, and the proposed DNN architecture is illustrated in Section 5. The results are presented in Section 6, and the paper is concluded in Section 7.

System Specifications
We consider HDPE pipes with an outer diameter of 26.7 mm and a wall thickness of 2 mm (3/4 inch pipe), which are commonly used for transporting and distributing natural gas to households. For the purpose of diversion detection, we consider two types of pipes as shown in Figure 1: a control pipe without a diversion, and a pipe with a diversion, i.e., a tee-fitting installed along the pipe. The lengths of the considered pipes are 1, 2, 3, and 5 m, with a diversion installed at 0.6, 1.2, 1.5, and 3 m from one end of the pipe for diversion samples, and with nothing installed on the control samples. The length of the pipe extending from the tee-fitting is 30 cm. The goal is to detect the diversion using UGW and machine learning. To this end, piezoelectric transducers are used to transmit/receive ultrasonic waves. For UGW testing, an operating frequency between 20 kHz and 100 kHz is widely used [31], which was considered when selecting the transducers. We opted for 40 kHz transducers due to their availability in the market with proper dimensions. Moreover, operating at 40 kHz can help in mitigating the attenuation effect, which in turn increases the detection range [32]. These transducers generate an electrical signal in response to a sensed mechanical signal (propagating through the pipe), which is known as the piezoelectric effect, or generate a mechanical signal (to be sent through the pipe) in response to a driving electrical signal, which is known as the reverse piezoelectric effect. The transmitters and receivers are placed in two clamps (Figure 2), which are designed to fit the pipe and to hold the transducers. Placing multiple transducers in each clamp can aid in improving signal generation (transmitted power and modes) as well as signal detection.
For the purpose of this paper, the transmitter clamp was equipped with six transmitters, while the receiver clamp was equipped with two receivers, distributed evenly around the circumference of the pipe. The clamp has a total of eight slots, six of which form a ring where transducers can be placed 60 degrees apart, and two additional slots which are 180 degrees apart for deploying additional transducers. This clamp was modified from the original design in [33].
To sense the pipe for a diversion, we use the system shown in Figure 3. First, the signal to be transmitted is designed in MATLAB. Then it is sent to an arbitrary waveform generator (AWG, Siglent Technologies SDG2042X), which is interfaced with an Op-Amp driving circuit (2 kΩ and 10 kΩ resistors and a 741 Op-Amp), powered by a DC power supply (Siglent Technologies SPD3303X-E), which in turn drives the ultrasonic transmitters (Murata Electronics MA40S4S). The transmitters generate a wave that propagates along the pipe's circumference. The reflected signal is captured using the receivers (Murata Electronics MA40S4R), which are interfaced with an oscilloscope (Siglent Technologies SDS1104X-E) that records the signal. Finally, the recorded signal is sent to a PC for signal processing and analysis.
Due to the dispersion effect that takes place whenever an ultrasound signal travels through a solid object (in this case, the pipe), the wave will be decomposed into three distinct modes: flexural, torsional, and longitudinal modes [17]. These modes travel through the pipe at varying speeds depending on the mode. As a result, analyzing the received signal can be challenging, as the signal is likely to be a mix of these modes. Mode filtering/isolation is often used to improve the received signal quality [34]. However, to keep the process simple and to use all modes combined for diversion detection, we replace the classical signal processing techniques with a machine learning technique applied after correlating the received signal with the Zadoff-Chu sequence, as explained next.
Figure 3. Experimental setup: The designed signal is transmitted via ultrasonic transmitters with the aid of an AWG (arbitrary waveform generator), a DC power supply, and a driving circuit, and received using ultrasonic receivers with the aid of an oscilloscope. Signal processing takes place in a computer.

Signal Design and Processing
This section discusses the design of the transmitted signal and its benefits, in addition to the signal processing of the received signal, which is applied before feeding the signal into the DNN for classification. We provide a generic description of the scheme first, and then we specify the utilized parameters afterwards.

Transmitted Signal Design
To sense the pipe, we interpret the pipe as a channel whose impulse response can be used as a signature to determine the presence of a diversion or the absence thereof. Following this interpretation, we propose to transmit a training signal with good correlation properties in line with signals used for channel estimation, rather than a pulse as in [18,19,35]. There are several signals that can be used for this purpose, including maximum-length pseudo-noise sequences [36] and Zadoff-Chu sequences [30], which are known to aid in channel estimation as well as providing immunity against disturbances, such as noise and interference. These sequences have good auto-correlation properties, namely, their autocorrelation resembles a Dirac impulse. As such, when such a sequence is transmitted, correlating the received signal with the sequence itself yields the channel impulse response (within some distortion due to noise).
We opt for using a Zadoff-Chu sequence due to its desirable auto-correlation properties. A Zadoff-Chu sequence (named after Solomon A. Zadoff and D.C. Chu [30,37]) is a complex-valued poly-phase sequence with constant amplitude and zero auto-correlation (CAZAC). A length-$N$ Zadoff-Chu sequence $s[k]$, $k = 0, 1, \ldots, N-1$, can be expressed as
$$s[k] = e^{j\alpha_k}, \tag{1}$$
where $\alpha_k$ is given by
$$\alpha_k = \begin{cases} \dfrac{\pi M k^2}{N}, & N \text{ even}, \\[2mm] \dfrac{\pi M k (k+1)}{N}, & N \text{ odd}, \end{cases} \tag{2}$$
and $M$ is coprime with $N$. To convert this to a time-domain signal that can be transmitted using the transducers, we apply pulse shaping with a sinc function, where $T_c$ is the pulse duration, sampled at a sampling rate of $f_s$, to obtain the signal
$$\tilde{s}[n] = \sum_{k=0}^{N-1} s[k] \, \frac{\sin\left(\pi\left(\frac{n}{f_s T_c} - k\right)\right)}{\pi\left(\frac{n}{f_s T_c} - k\right)}, \tag{3}$$
where $n = 0, 1, \ldots, N T_c f_s - 1$. The signal $\tilde{s}[n]$ is then modulated using an $f_c = 40$ kHz carrier (the frequency of the transducer) to obtain the transmitted signal $x[n] = \tilde{s}[n] e^{j 2\pi f_c n / f_s}$, $n = 0, 1, \ldots, N T_c f_s - 1$ (note that we require $f_s > 2 f_c$ to avoid aliasing).
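The CAZAC property can be checked numerically. The following is a minimal sketch rather than the authors' MATLAB implementation: it assumes the common Chu phase rule (quadratic phase $\pi M k^2/N$ for even $N$ and $\pi M k(k+1)/N$ for odd $N$), uses a short illustrative length of 64 instead of 512 to keep the pure-Python loops fast, and omits pulse shaping and carrier modulation. The function names are hypothetical.

```python
import cmath
import math


def zadoff_chu(N, M=1):
    """Return a length-N Zadoff-Chu sequence with root M (M coprime with N)."""
    if math.gcd(M, N) != 1:
        raise ValueError("M must be coprime with N")
    seq = []
    for k in range(N):
        # Assumed phase rule: M*k^2 for even N, M*k*(k+1) for odd N.
        # Reducing modulo 2N keeps the phase argument small without changing
        # the value of exp(j*pi*q/N).
        q = (M * k * k) % (2 * N) if N % 2 == 0 else (M * k * (k + 1)) % (2 * N)
        seq.append(cmath.exp(1j * math.pi * q / N))
    return seq


def circular_autocorrelation(s):
    """R[l] = sum_k s[(k + l) mod N] * conj(s[k]) for l = 0, ..., N-1."""
    N = len(s)
    return [
        sum(s[(k + l) % N] * s[k].conjugate() for k in range(N))
        for l in range(N)
    ]


s = zadoff_chu(64, M=1)                      # illustrative length, not N = 512
R = circular_autocorrelation(s)
peak = abs(R[0])                             # equals the sequence length
max_sidelobe = max(abs(r) for r in R[1:])    # numerically zero
```

The peak at lag zero equals the sequence length, while all other circular lags vanish up to floating-point error, which is the impulse-like behavior exploited for channel estimation.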

Received Signal Processing
The received signal of each receiver is first sampled at a sampling rate of $f_s$ and bandpass filtered using an infinite-impulse-response (IIR) bandpass filter centered at 40 kHz with a 3 dB bandwidth of 10 kHz to eliminate out-of-band noise and interference. The resulting signal can be modeled as
$$y[n] = h[n] \ast x[n] + w[n], \tag{4}$$
where $n$ is the time index, $h[n]$ is the impulse response of the pipe, $\ast$ is the convolution operator, and $w[n]$ is noise. A $K$-tap impulse response can generally be represented as
$$h[n] = \sum_{k=1}^{K} \beta_k \, \delta[n - d_k], \tag{5}$$
where $\delta[n]$ is the Dirac delta pulse, which equals 1 when $n = 0$ and 0 otherwise, and $\beta_k \in \mathbb{R}$ and $d_k \in \mathbb{N}$ are the attenuation coefficient and the time delay of the $k$-th tap.
To estimate the channel response, we cross-correlate the received signal $y[n]$ with the reference signal $x[n]$ to obtain the following signal:
$$r[n] = y[n] \star x[n] = \left(h[n] \ast x[n] + w[n]\right) \star x[n], \tag{6}$$
where $\star$ denotes cross-correlation. Using the correlation properties of the Zadoff-Chu sequence [37], we have the following (note that while the circular correlation of the Zadoff-Chu sequence coincides with a Dirac pulse, the linear correlation is very close (but not equal) to a Dirac pulse):
$$x[n] \star x[n] = \delta[n], \tag{7}$$
where we replace the approximation by an equality for simplicity. This yields
$$r[n] = \sum_{k=1}^{K} \beta_k \, \delta[n - d_k] + \tilde{w}[n], \tag{8}$$
where $\tilde{w}[n] = w[n] \star x[n]$ is the noise after correlation. To eliminate the carrier, we then apply a Hilbert transform to $r[n]$ and compute the magnitude of the result to obtain the envelope of $r[n]$ [38]. The resulting signal will have prominent peaks at time instants $d_k$ (as can be observed in (8)), which can be interpreted as a 'signature' of the channel $h[n]$. It is expected that the signal transmitted through a control pipe will produce a signature that is different from the one produced from a diversion sample. This can be used to classify the pipes accordingly.
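The correlation-based channel estimate can be illustrated with a small noise-free simulation. This is a sketch under simplifying assumptions that differ from the experimental system: it works at baseband (no sinc pulse shaping, carrier, or Hilbert transform), uses a circular convolution model so that the autocorrelation is exactly an impulse, and uses a hypothetical three-tap channel. All names and tap values are illustrative.

```python
import cmath
import math


def zadoff_chu(N, M=1):
    """Length-N Zadoff-Chu sequence (assumed common Chu phase rule)."""
    out = []
    for k in range(N):
        q = (M * k * k) % (2 * N) if N % 2 == 0 else (M * k * (k + 1)) % (2 * N)
        out.append(cmath.exp(1j * math.pi * q / N))
    return out


def circular_channel(taps, x):
    """y[n] = sum_k beta_k * x[(n - d_k) mod N], with taps given as {d_k: beta_k}."""
    N = len(x)
    return [sum(beta * x[(n - d) % N] for d, beta in taps.items()) for n in range(N)]


def circular_correlate(y, x):
    """r[n] = (1/N) * sum_m y[(m + n) mod N] * conj(x[m])."""
    N = len(x)
    return [
        sum(y[(m + n) % N] * x[m].conjugate() for m in range(N)) / N
        for n in range(N)
    ]


# Hypothetical channel: delays (in samples) mapped to attenuation coefficients.
taps = {5: 1.0, 12: 0.6, 20: 0.3}

x = zadoff_chu(64)
y = circular_channel(taps, x)          # noise-free received signal
r = circular_correlate(y, x)           # channel estimate
envelope = [abs(v) for v in r]

# Peaks of the envelope mark the tap delays d_k.
detected = sorted(n for n, v in enumerate(envelope) if v > 0.1)
```

In this idealized setting the envelope reproduces the tap amplitudes at the tap delays and is zero elsewhere; with linear correlation, pulse shaping, and noise, the peaks are only approximate, which is one motivation for learning the classification rule from data.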
In this work, we use $N = 512$ and $M = 1$ to generate a Zadoff-Chu sequence, and we use $T_c = 300$ µs and $f_s = 1$ MHz. As seen in Figure 4, the resulting signal has a central frequency of 40 kHz and a 3 dB bandwidth of 3 kHz, which can be transmitted using the selected transducers that have a bandwidth of 3.3 kHz. Figure 5 depicts the received signal $y[n]$ in (4) and the envelope of the correlation signal in (8). Due to the low-frequency characteristics of this envelope, we down-sample it to 10 kHz. The resulting signal envelope, which has a total of 320 time samples, is used as an input for the proposed DNN. Note that a similar process is applied to the signal received from each of the two sensors. Next, we describe our data collection procedure, which uses the signal processing explained above.
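The arithmetic implied by these parameters can be worked through in a few lines. The sketch below only restates quantities derived from the stated values; the variable names are illustrative.

```python
# Parameters stated in the text.
N = 512            # Zadoff-Chu sequence length
T_c = 300e-6       # pulse duration in seconds
f_s = 1e6          # sampling rate in Hz
f_c = 40e3         # carrier frequency in Hz
f_ds = 10e3        # rate after down-sampling the envelope, in Hz
n_input = 320      # number of envelope samples fed to the DNN

# Derived quantities.
num_tx_samples = round(N * T_c * f_s)      # length of x[n] in samples
tx_duration_ms = N * T_c * 1e3             # duration of the transmitted signal
pulse_rate_khz = 1.0 / T_c / 1e3           # one pulse every T_c seconds
decimation_factor = round(f_s / f_ds)      # down-sampling ratio
input_window_ms = n_input / f_ds * 1e3     # time span covered by the DNN input
nyquist_ok = f_s > 2 * f_c                 # aliasing condition on f_s
```

The pulse rate of about 3.3 kHz is consistent with the reported 3 dB bandwidth of roughly 3 kHz, and the 320-sample input corresponds to a 32 ms window of the down-sampled envelope.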

Data Collection
To train and assess a deep learning algorithm for detecting diversions, we collect a dataset of correlation envelope signals (similar to Figure 5b) using the experimental setup described in Section 2 and the signal processing in Section 3. We consider two types of pipes, namely control and diversion pipes. To diversify the dataset, we use pipes with lengths of 1, 2, 3, and 5 m, with diversions (in the diversion samples) at distances of 0.6, 1.2, 1.5, and 3 m from one end of the pipe, respectively.
For each pipe, we attach the clamps so that the distance between the transmitting and receiving clamps is 11 cm. Additionally, to guarantee consistent readings of the received signals, we make sure the transmitting and receiving clamps are oriented similarly. To do this, we mark the upper side of each clamp (see Figure 2), and we ensure that a line passing through the two marks is coplanar with the axis of the pipe.
Initially, the clamps are installed so that the transmitting (respectively receiving) clamp is 20 cm (respectively 31 cm) away from the beginning of the pipe. To collect a diverse set of measurements from this clamp position, we rotate the clamps with increments of 120 degrees while maintaining the alignment of the transmitting and receiving clamps. After collecting data from this position, the clamps are advanced by 4 cm, and the same process is repeated to collect more data. This process is repeated until the clamps are 20 cm away from the diversion on the diversion pipes, and at a similar position on the control pipes. For each position and rotation, 2 to 4 samples are collected from each of the two sensors. We collected a dataset of 2292 samples, 1006 of which were from control pipes and 1286 from diversion pipes. Figure 6 shows the correlation envelope of a number of samples from each class (diversion and control). The figure shows the presence of differences between the two classes, the most evident of which is the delay spread which is larger in diversion samples than in control samples. However, the figure also shows that it is not always reliable to use this difference to classify the pipes. Fortunately, a deep neural network (DNN) can be used to learn subtle differences between the samples to produce a more reliable classification. Thus, we train a DNN to classify the pipes based on the correlation envelope signals, which is detailed in the next section.
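The scanning procedure above can be summarized as a measurement schedule. The sketch below is an interpretation, not the authors' protocol script: it assumes that the 120-degree increments give three orientations (0, 120, and 240 degrees) and that scanning stops once the receiving clamp would be closer than 20 cm to the diversion. The function name and its parameters are hypothetical.

```python
def clamp_schedule(diversion_cm, tx_start_cm=20, gap_cm=11, step_cm=4,
                   standoff_cm=20, rotation_step_deg=120):
    """Return a list of (tx_cm, rx_cm, rotation_deg) settings for one pipe.

    Positions are measured from the beginning of the pipe. The stopping rule
    (receiving clamp at least standoff_cm before the diversion) is an assumption.
    """
    rotations = list(range(0, 360, rotation_step_deg))
    schedule = []
    tx = tx_start_cm
    while tx + gap_cm <= diversion_cm - standoff_cm:
        for rotation in rotations:
            schedule.append((tx, tx + gap_cm, rotation))
        tx += step_cm  # advance both clamps together
    return schedule


# Example: the 3 m pipe, whose diversion is 1.5 m from one end.
schedule = clamp_schedule(diversion_cm=150)
num_settings = len(schedule)
```

Each setting would then contribute 2 to 4 recordings per sensor, as described in the text.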

DNN Architectures
The traditional techniques that are usually employed in the area of non-destructive testing (NDT) to categorize flaws are prone to subjective influences, such as a person's opinion [39]. In addition, since the received signals are one-dimensional, it is difficult to use them to precisely detect diversions by visual inspection. This challenge can be addressed by using an automated detection system based on a DNN to provide high-accuracy detection and avoid the manual detection drawbacks.
The first step in the proposed DNN is to learn features from the data. Various methods have been used in the literature to extract or learn features using time and frequency domain features [23,25], using wavelets and a CNN [26,40], or using pre-established standard time-series classification networks, such as long short-term memory recurrent networks (LSTMs) [41]. In this paper, we use CNN and LSTM layers to learn features from the correlation envelope signals. We investigate three types of DNN models, namely CNN-based models, LSTM-based models, and CNN-LSTM-based models, all implemented in TensorFlow. These models were chosen based on their ability to extract shape features (CNN-based models), analyze time-series signals (LSTM-based models), or combine both (CNN-LSTM-based models). In the following, we explain these models in detail.

CNN Based Model
It can be seen from Figure 6 that the correlation envelope signals of the control samples have different shapes than those of the diversion samples. Therefore, we can use the signal shape to classify the pipe. Since a CNN is known for its accuracy in identifying shapes, it can also be used to extract shape features from a 1D signal such as the correlation envelope signal.
A CNN-based model can contain one or multiple stacked CNN layers. In this paper, we consider one and two CNN layer models (CNN model and two stacked CNN model) with 64 filters in each layer as shown in Figure 7 (since it was noticed that increasing the number of layers/filters further does not improve accuracy). We describe the 2-layer architecture, noting that the 1-layer architecture can be described in an analogous way. We use the correlation envelope signals, represented as vectors with dimension 320 × 1, as inputs to the first CNN layer which uses 64 filters (kernels) each with size 3 × 1. Each filter learns to extract only one feature from the input sequence. By applying this filter to the whole input (which is 320 × 1), this results in an output of size 318 × 1 for each filter. Since 64 filters are used, this process yields a 318 × 64 array of features as an output. This output is then fed to a rectified linear unit (ReLU) activation layer, which is commonly used in many neural networks, and it is known for its fast training and superior results.
The output of the first CNN layer is further processed by feeding it to another CNN layer. The purpose of using a second layer is to extract composite features (features of features). As a result, the two layers combined extract features with complex patterns from the input signal. The second CNN layer operates similarly to the first one, where 64 distinct filters of size 3 × 64 and a ReLU activation layer are used to obtain a 316 × 64 output array.
A dropout layer is used at the output of the second CNN as a regularization technique to prevent over-fitting. It randomly 'drops' some neurons from the CNN (by setting their output to zero), which prevents the model from relying on specific features or neurons. In the training stage, the dropout layer randomly chooses a number of neurons to drop in each training iteration. The number of neurons is calculated based on a parameter called the drop out ratio, which varies from 0 (no dropout) to 1 (full dropout). We choose a ratio of 0.5 which means that 50 percent of the neurons are dropped at random (chosen since it led to the highest accuracy among a set of examined ratios). Note that, as this layer is a regularization layer, it is deactivated after the model is trained. This layer does not change the size of its input and thus its output is a 316 × 64 array.
A max pooling layer is often employed after a CNN to reduce the complexity through downsampling and to avoid data over-fitting. This is implemented by pooling a number of outputs into one output through choosing their maximum value. We used a pooling size of 2 and a stride of 2, which means that the output is the maximum value in nonoverlapping windows of size 2 for each filter and segment. As a result, the output becomes a 158 × 64 array.
After the features are extracted using the previous steps, we need to convert these features into a single feature vector to prepare it for classification. This is achieved using a flatten layer that reshapes features of shape 158 × 64 into a vector of shape 1 × 10,112. These features are then fed to the multilayer perceptron, which will be explained in Section 5.4.
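The layer output sizes quoted above follow from simple shape arithmetic, sketched below. The sketch assumes unpadded ("valid") convolutions with stride 1, which is consistent with the reported sizes; the helper names are illustrative and no network is actually built.

```python
def conv1d_out(length, kernel=3):
    """Output length of an unpadded, stride-1 1D convolution."""
    return length - kernel + 1


def pool1d_out(length, pool=2, stride=2):
    """Output length of 1D max pooling without padding."""
    return (length - pool) // stride + 1


def two_cnn_shapes(input_len=320, filters=64):
    """Trace the (length, channels) shape through the two-layer CNN model."""
    conv1 = conv1d_out(input_len)      # first CNN layer
    conv2 = conv1d_out(conv1)          # second CNN layer (dropout keeps shape)
    pooled = pool1d_out(conv2)         # max pooling, size 2, stride 2
    return {
        "conv1": (conv1, filters),
        "conv2": (conv2, filters),
        "pool": (pooled, filters),
        "flatten": pooled * filters,
    }


shapes = two_cnn_shapes()
```

Calling `two_cnn_shapes()` reproduces the 318 × 64, 316 × 64, and 158 × 64 arrays and the 10,112-element flattened vector described in this section.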

LSTM Based Models
An LSTM is a type of recurrent neural network (RNN) which is used to extract features from sequential data. Unlike feed-forward neural networks, LSTMs include feedback connections to enable them to learn dependencies between time-correlated data points. In this paper, we examine two LSTM-based models, i.e., the LSTM model and the 2-stacked LSTM model, as shown in Figure 8. As shown in Figure 9, a typical LSTM cell consists of three gates: a forget gate, an input gate, and an output gate. It has an input vector $x_t$, a cell state $c_t$, and a hidden state $h_t$. The hidden state $h_t$ is computed from $c_t$ and is used as feedback. The output of the LSTM cell is the hidden state obtained after processing the whole sequence $\{x_t\}_{t=0}^{319}$. The gates in the LSTM cell regulate the flow of information from one time step to the other. The forget gate uses the current input $x_t$ and the previous hidden state $h_{t-1}$ to gate the previous cell state $c_{t-1}$. The input gate uses $x_t$ and $h_{t-1}$ to determine their contribution to the current cell state $c_t$. The output gate uses $x_t$ and $h_{t-1}$ to determine if the current cell state $c_t$ contributes to the current hidden state $h_t$, which will be fed back to the LSTM cell in the next time step. This mechanism enables the LSTM cell to extract one feature from the time series $\{x_t\}_{t=0}^{319}$ while considering the temporal dependencies between the time steps. We utilize an LSTM layer that consists of 100 LSTM cells to extract 100 features from the input time series $\{x_t\}_{t=0}^{319}$ of dimension 320 × 1. This leads to an output of dimension 1 × 100.
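For reference, the gating mechanism described above is commonly written as follows. This is the standard LSTM formulation rather than notation taken from this paper: the weight matrices $W$, $U$, the biases $b$, and the candidate state $\tilde{c}_t$ are symbols assumed here, $\sigma$ is the logistic sigmoid, and $\odot$ is the element-wise product.

```latex
\begin{align}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{align}
```

With 100 units, $h_t$ and $c_t$ are 100-dimensional, and the layer output is the final hidden state $h_{319}$.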
To regularize the LSTM layers, we use a dropout layer with a dropout ratio of 0.5. This is only used while training and removed afterwards. It remains to use this information to produce a classification outcome, 0 for control and 1 for diversion, which is done using a multi-layer perceptron.

CNN-LSTM Based Models
To exploit both the shape and the time dependencies of the input signal, models with CNN and LSTM layers were examined. Theoretically, a CNN can be used to extract shape features, which can then be fed to the LSTM layer to combine the temporal information with shape features to produce global features. For those reasons, we considered three CNN-LSTM-based models: the CNN-LSTM, CNN-(2-LSTM), and (2-CNN)-LSTM models. The proposed DNNs take the processed correlation signals obtained from the previous stage as an input and classify the pipe as control or diversion based on this input. As shown in Figure 10, the proposed neural networks consist of three main blocks, namely, a CNN block, an LSTM block, and a multi-layer perceptron block. In this model, we use a time-distributed layer on top of the CNN block in order to preserve the time dependencies of the features extracted by the CNN block, which are then fed to the LSTM block as explained next. In the proposed method, we use a two-step procedure to learn features from the obtained correlation envelope signal. The first is to learn short time-scale features from segments of the correlation envelope signal using a CNN. The second is to learn long time-scale correlations between segments of the correlation envelope signal using an LSTM network.
To learn short time-scale features using a CNN while preserving longer time-scale information related to the temporal variations of the signal, the correlation signal, a 320 × 1 vector, is split into multiple segments. Each segment is fed into a CNN in parallel (using a time-distributed layer). Then the outputs of the CNNs are downsampled and converted into a time-sequence of feature vectors.
As explained in Section 3, the input of the DNN is a vector with 320 dimensions. This vector is then segmented into $N_s$ non-overlapping vectors, each of size $T_s \times 1$, where $T_s = 320 / N_s$ and $N_s$ is a divisor of 320. This process helps in capturing the temporal information in the signal, while decreasing the computational complexity of the model. Note that features will be extracted from each segment separately instead of the whole 320 samples, as we explain next. The final input size after segmentation is $N_s \times T_s \times 1$. In this work, we select $N_s = 4$ and $T_s = 80$. These values are selected after trying different values of $N_s$ and selecting the one that leads to the highest accuracy.
Two 1D CNN layers are used as follows. For the first layer, 64 filters (kernels) of size 3 × 1 are utilized. By applying these filters to the input (4 × 80 × 1), an output of size 4 × 78 × 1 is produced for each filter. This procedure generates a 4 × 78 × 64 array of features as a result of utilizing 64 filters. This output is then fed to a rectified linear unit (ReLU) activation layer.
The output of the first CNN layer is further processed by feeding it to another CNN layer. The second CNN layer operates similarly to the first one. We use 64 distinct filters of size 3 × 64 and a ReLU activation layer to obtain a 4 × 76 × 64 output array. The output is then passed to a dropout layer (ratio of 0.5), a max pooling layer (of size 2 and stride 2), and a flatten layer. The output of the flatten layer has a shape of 4 × 2432, which is then used as an input to the LSTM block.
Note that increasing the number of segments $N_s$ leads to increasing the length of this sequence, in return for shorter segments and fewer features. For instance, for $N_s = 8$, the length of each segment will be 40, and the result after the flatten layer will be a sequence of 8 vectors with 1152 features each. In general, one should examine different choices for a given application, and select the one that produces the desired accuracy.
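The trade-off between the number of segments and the number of features per segment can be reproduced with a short sketch. It assumes two unpadded convolutions with kernel size 3 followed by max pooling of size 2 and stride 2, as described above; the function names are illustrative.

```python
def segment(signal, n_segments):
    """Split a sequence into n_segments non-overlapping, equal-length parts."""
    if len(signal) % n_segments != 0:
        raise ValueError("n_segments must divide the signal length")
    seg_len = len(signal) // n_segments
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


def features_per_segment(seg_len, filters=64, kernel=3, pool=2):
    """Flattened feature count per segment after two CNN layers and pooling."""
    after_convs = seg_len - 2 * (kernel - 1)   # two unpadded convolutions
    after_pool = after_convs // pool           # pooling with stride == pool size
    return after_pool * filters


envelope = [0.0] * 320                          # placeholder 320-sample input

lstm_input_shapes = {}
for n_s in (4, 8):
    segments = segment(envelope, n_s)
    t_s = len(segments[0])
    lstm_input_shapes[n_s] = (n_s, features_per_segment(t_s))
```

For $N_s = 4$ the LSTM block receives a sequence of 4 vectors with 2432 features each, and for $N_s = 8$ a sequence of 8 vectors with 1152 features each, matching the values in the text.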
In order to extract 100 features from a 4 × 2432 dimensional input time-series, we employ an LSTM layer composed of 100 LSTM cells. The resulting output has a dimension of 1 × 100. A dropout layer with a 0.5 dropout ratio is used to regularize the LSTM. The output of the LSTM is then fed into a multilayer perceptron, which is explained next.

Multilayer Perceptron
The multilayer perceptron is a fully connected dense layer followed by a classification layer. We use a multilayer perceptron with 100 inputs, 100 hidden units with ReLU activation functions, and 1 output with a sigmoid activation function. The advantage of a dense layer is that it learns combinations of features from the 100 inputs produced by the LSTM layer. The output of this layer is a value between 0 and 1, representing the probability that the input sample belongs to the diversion class. This probability is used to determine a class by thresholding at 0.5.
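The forward pass of this block can be sketched in plain Python. This is an illustration only: the weights are random and untrained, the feature vector is a random stand-in for the LSTM output, and mapping a probability of exactly 0.5 to the diversion class is an assumption. The function names are hypothetical.

```python
import math
import random


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def mlp_predict(features, W1, b1, w2, b2, threshold=0.5):
    """Dense ReLU layer followed by a single sigmoid output and a threshold."""
    hidden = [
        relu(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(W1, b1)
    ]
    probability = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    label = int(probability >= threshold)   # 1 = diversion, 0 = control
    return probability, label


rng = random.Random(0)
n_in, n_hidden = 100, 100

# Untrained, randomly initialized parameters (for illustration only).
W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
b2 = 0.0

features = [rng.random() for _ in range(n_in)]   # stand-in for the LSTM output
probability, label = mlp_predict(features, W1, b1, w2, b2)

# Trainable parameters: hidden weights and biases plus output weights and bias.
num_parameters = n_in * n_hidden + n_hidden + n_hidden + 1
```

With these sizes, the multilayer perceptron block has 10,201 trainable parameters.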

Experimental Protocol and Results Discussion
In this section, we first provide the experimental protocols, discussing the models' evaluation metrics. Then we examine and discuss the performance of the proposed algorithms according to the performance metrics presented in Section 6.1.

Experimental Protocols
To evaluate the proposed deep learning models, we compare the performance of the LSTM, 2-LSTM, CNN, 2-CNN, CNN-LSTM, CNN-(2-LSTM), and (2-CNN)-LSTM models (cf. Section 5). We consider different performance metrics to examine the performance of the considered deep learning models. In particular, the considered DNNs are all evaluated using the following metrics:
• Accuracy: The accuracy is the percentage of data points that are classified correctly by the algorithm over the total number of data points. Mathematically, the accuracy can be expressed as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
where $TP$ is the number of correctly classified positive points (diversions), $FP$ is the number of misclassified negative points (control), $TN$ is the number of correctly classified negative points, and $FN$ is the number of misclassified positive points.
• Recall: The recall describes the ratio of correctly classified positive samples over the total number of positive samples, which is given by
$$\text{Recall} = \frac{TP}{TP + FN}.$$
• F1-score: The F1-score is the harmonic mean of the precision and the recall values, which can be expressed as
$$\text{F1} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}},$$
where $\text{Precision} = TP / (TP + FP)$.
We calculated these metrics to evaluate the models and to identify the model that achieves the highest performance. Furthermore, we studied these models in terms of their convergence rate (see Section 6.2).
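These metrics can be computed directly from the four confusion counts, as in the following sketch. The counts in the example are hypothetical and are not results from this paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1-score from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, with diversion as the positive class.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
```

Because the F1-score combines precision and recall, it penalizes a classifier that reaches high accuracy by favoring one class.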
Using the aforementioned data-collection procedures, we collect 2292 samples and split them into training, validation, and testing sets, where the splitting ratios are selected as 70%, 10%, and 20%, respectively. We shuffle the samples of the dataset before splitting to guarantee that each subset represents the true distribution of the whole dataset. A full description of the dataset is presented in Table 1. The dataset includes measurements from two sensors. To evaluate the performance of the proposed models, we trained each model with a single sensor reading and then again with both sensors' readings. Thus, we can evaluate the robustness and performance of each model under a variety of circumstances. We used Python supported by TensorFlow, a machine learning library, to build and train the considered DNNs. We adopted the categorical cross-entropy as a cost function, and used the ADAM optimizer (a first-order gradient-based approach) to tune the network parameters. To speed up the training process, we used the mini-batch learning approach, where the training dataset was divided into sub-sets (mini-batches) of 64 and fed into the DNN models in an iterative manner. The number of times the training iterates over all the mini-batches is the number of epochs which we set to be 25. Note that these hyperparameters are selected based on trying various values and selecting the ones that lead to the best performance on the validation set. The performance metrics of the DNNs were calculated using the testing set by training the model 25 times (folds) with random initialization and then taking the average and the standard deviation of each metric.
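The shuffle-and-split step and the resulting number of mini-batches can be sketched as follows. The rounding rule (truncating the training and validation sizes and assigning the remainder to the test set) and the fixed seed are assumptions made here; the exact subset sizes used in the experiments are those reported in Table 1.

```python
import math
import random


def split_indices(n_samples, train=0.7, val=0.1, seed=0):
    """Shuffle sample indices and split them into train, validation, and test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(train * n_samples)
    n_val = int(val * n_samples)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]       # remaining samples (about 20%)
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(2292)

batch_size = 64
batches_per_epoch = math.ceil(len(train_idx) / batch_size)
```

Under these assumptions, one epoch iterates over 26 mini-batches, the last of which is only partially filled.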

Discussion of the Results
The results in Table 2 and Figure 11a,b show that the LSTM-based models (the LSTM and 2-LSTM models) perform poorly in diversion classification compared to the other models. We attribute this to the nature of the input signal, which contains long-term time dependencies that the LSTM layer cannot capture because of its limited memory, even though LSTM networks have superior memory to standard RNNs.
In general, Table 2 reveals that the CNN-based models perform better than the LSTM-based models. Figure 11c,d depicts the learning curves of the CNN-based models, which show a steady improvement in both loss and accuracy as the number of epochs increases. This indicates that the CNN layers are able to extract features from the correlation signal that distinguish between the diversion and control samples with an average classification accuracy of 83%.
Table 2. Performance metrics of the tested models (using one sensor's readings).

Table 2 also shows that adding the LSTM layer to CNN-based models does not necessarily improve the classification performance when two sensors are considered. However, the table shows that the (2-CNN)-LSTM model reports the best performance among all the models. Figure 11g confirms the superiority of the (2-CNN)-LSTM model by showing its fast convergence compared to the other models. These results indicate that an appropriate combination of CNN and LSTM layers may provide a DNN model that can significantly outperform the LSTM- and CNN-based models.
The highest accuracy reported in Table 2 among all the proposed models is 90.3%. A likely reason is that a single sensor might not be enough to detect the diversions, since the sensor orientation may not be suitable to capture the signal reflected from the diversion. While this can be addressed by considering a more sophisticated DNN architecture, doing so increases the model complexity, which is not desired in practice. Instead, adding another sensor can help the DNN models characterize the reflected signals with higher confidence while decreasing complexity, as is discussed next.
Table 3 shows the performance of all examined models when two sensors are used, and their learning curves are shown in Figure 12. It can be observed that the performance of the LSTM-based models does not improve when a sensor is added to the system, owing to the difficulty in extracting long-term time dependencies. On the other hand, the CNN- and CNN-LSTM-based models all report a significant improvement in all the recorded metrics. This suggests that with enough sensors, the correlation envelope signals become easier to classify, and high accuracy can be achieved even with a single CNN layer, which is desirable due to its low complexity.
Table 3. Performance metrics of the tested models (using two sensors' readings).
The proposed scheme can be used with different types of pipes. It can also be used to classify different types of defects, as long as the defect is captured by the correlation envelope signal.
The limitations of this scheme are as follows. First, the maximum pipe length used in our experiments was 5 m, which raises the question of whether the proposed method can detect a diversion in longer pipes. The accuracy might decrease as the pipe length increases; however, enriching the dataset by using more than two sensors may improve performance. Second, the performance of the proposed method is expected to deteriorate if the pipes are buried below ground. Therefore, in our future work, we will extend this work to study diversion detection in buried pipes.

Conclusions
The method proposed in this paper was capable of classifying HDPE pipes into two classes, control and diversion pipes, with an accuracy of 90.3% (when one sensor is used) and 99.6% (when two sensors are used). The method relies on Zadoff–Chu sequences, which are well known for having excellent autocorrelation properties suited to channel estimation applications. As a result, using signal processing at the receiver combined with a DNN, we were able to determine whether the pipe has a diversion or is a control pipe. The algorithm could be enhanced further by adding a second step that estimates the location of the diversion using another DNN, which is left for future work.