A Reliable Acoustic Emission Based Technique for the Detection of a Small Leak in a Pipeline System

Abstract: This paper proposes a reliable leak detection method for water pipelines under different operating conditions. The approach segments acoustic emission (AE) signals into short frames using a Hanning window with an overlap of 50%. From each frame, an intermediate quantity is calculated that contains the symptoms of a leak and remains adequately stable even when the environmental conditions change. Finally, a k-nearest neighbor (KNN) classifier is trained using features extracted from the transformed signals to identify leaks in the pipeline. Experiments are conducted under different conditions to confirm the effectiveness of the proposed method. The results indicate that this method offers better quality and more reliability than using features extracted directly from the AE signals to train the KNN classifier. Moreover, the proposed method requires less training data than existing techniques. The transformation method is highly accurate and works well even when only a small amount of data is used to train the classifier, whereas the direct AE-based method returns misclassifications in some cases. In addition, robustness is tested by adding Gaussian noise to the AE signals. The proposed method is more resistant to noise than the direct AE-based method.


Introduction
Water pipeline systems are essential to commercial and domestic activities. A leak in such a system can cause financial loss, waste of resources, and even human deaths due to the collapse of the system. Thus, numerous approaches have been introduced to diagnose pipeline leaks. The approaches include passive and active systems [1] and hardware-based and software-based methods [2]. Previous works [3,4] also provide an overview of research on leak detection. Among those approaches, acoustic emission (AE)-based techniques, which are passive and hardware-based, are promising because AE sensors can detect small leaks quickly, offering high sensitivity to fault growth in a pipeline system. Thus, many researchers have exploited AE-based techniques for pipeline diagnostics [5][6][7][8][9][10][11].
The first stage of the diagnosis process is to spot an existing fault. Because of the complexity of AE activity, it is difficult to establish a leak detection model directly from mathematical equations; therefore, different methods have been proposed to train the model using recorded datasets [6,7]. Although these techniques offer high accuracy, they can be inefficient in diverse circumstances because the classifiers are trained using features extracted directly from AE signals. This makes the classifiers dependent on the absolute signal levels, which are influenced by the flow rate and pressure in a pipeline system [12,13]. Moreover, AE signals also vary with temperature, which is uncertain in industrial settings. As a result, AE measurements recorded under such uncertain working conditions have different signal levels, and a fault diagnosis model based on the absolute amplitude values of the signals is unreliable. In [14], the authors presented a study that investigates the inherent properties of the vibro-acoustic signals instead of relative amplitude values. It delivers useful insights and results for water leak detection, but the study focused on pipelines buried under soil, which are less influenced by external factors, whereas AE signals from above-ground pipelines in factories are prone to noise. This issue can be addressed by acquiring many training datasets under different working conditions to provide enough information for the classifier. However, this is not an optimal solution because the design of experiments can become immensely expensive or even prohibitively complicated. Therefore, this study extracts features from transformed signals that are independent of the absolute level instead of directly from AE signals. The obtained classifier can detect small leaks in a pipeline with high accuracy and reliability.
For the experiments, this study uses a water pipeline system deployed in a laboratory. Three AE sensors are mounted on different parts of the system with known distances among the sensors. A pinhole is drilled through the pipe wall to simulate leakage of the pipeline, and a valve is attached to the wall to control the flow of water through the leak. When the valve is closed and the system is working in a normal state (no leakage), there is no variation in the levels of the recorded signals.
If a small leak appears between sensors 2 and 3, an imbalance is created between AE channels 1 and 2 because they report dissimilar amounts of noise from the leak. Since sensor 2 is nearer the leak than sensor 1, the leak noise on channel 2 is greater than that on channel 1, as explained by the theory of wave attenuation [15][16][17]. Thus, this uneven behavior of the signals can be used to detect small leaks in a pipeline system. The leak-signal-to-normal-noise ratio and the attenuation law of wave propagation can be used to detect the leak; these characteristics are demonstrated in Section 3.
To demonstrate the advantages of the balance/imbalance-based approach, this paper uses the theory of Kullback-Leibler distance [18] to measure the separability between the two classes (i.e., NORMAL and ABNORMAL) in different experiments. Furthermore, the paper applies a k-nearest neighbor (KNN) model to classify between normal and leak states. The KNN classifier is trained by two approaches, one using features extracted directly from the AE signals and the other using features extracted from the transformed signals. In both approaches, the KNN classifier is trained on the data of one working condition and tested using data of other conditions. In addition, Gaussian noise is added to the AE measurements to evaluate the robustness of the classifiers trained by the two approaches.

Data Acquisition
Figures 1 and 2 show the setup for AE signal acquisition from a water pipeline system. The testbed installation and the sensor configuration parameters are listed in Table 1. The experiments are based on four leaks with different diameters, i.e., 2.0, 1.0, 0.5, and 0.3 mm, abbreviated as L1, L2, L3, and L4, respectively. The water flow is controlled using pressures of 7, 13, and 18 bar, denoted P1, P2, and P3, respectively. The experiments are conducted under a stable temperature of approximately 29 °C.

AE Sensors
To acquire AE signals, R15I-AST sensors are used because they provide high sensitivity during data recording, and the recorded signals tend to be free of low-frequency components. The sensors' characteristics are summarized in Table 2.

Data Record
The normal and abnormal states of the system refer to the closed and open positions of the valve installed on the leak, respectively. The data for every (P, L) combination are recorded for 2 min with a sampling frequency of 1 megahertz (MHz) after the water flow is stable. Thus, there are a total of 72 datasets for the three signal channels in the experimental conditions (3 channels × 3 pressures × 4 leaks × 2 classes). Figure 3 presents the mode setting of the water flow. In the first stage, the pump is turned on, the leak is deactivated (closed), and the pressure is controlled around 7 bar. Then, AE signals are recorded for 2 min. In the second stage, the leak is activated (open). The acquisition device waits for the flow to stabilize and records AE signals for 2 min. The process continues, as shown in Figure 3. Figure 4 shows the AE signals from channel 2 for leak L3 under three different flow pressures.
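As a quick check on the dataset bookkeeping described above, the following minimal sketch enumerates the recording conditions and the number of samples in one recording. The label strings and variable names are illustrative assumptions; only the counts, pressures, leak diameters, duration, and sampling rate come from the text.

```python
from itertools import product

# Values taken from the text; the label strings themselves are illustrative.
channels = ["ch1", "ch2", "ch3"]
pressures_bar = {"P1": 7, "P2": 13, "P3": 18}
leaks_mm = {"L1": 2.0, "L2": 1.0, "L3": 0.5, "L4": 0.3}
states = ["normal", "abnormal"]  # valve closed / valve open

# One dataset per (channel, pressure, leak, state) combination.
datasets = list(product(channels, pressures_bar, leaks_mm, states))

sampling_rate_hz = 1_000_000   # 1 MHz
duration_s = 2 * 60            # 2 min per recording
samples_per_recording = sampling_rate_hz * duration_s

print(len(datasets))           # 72
print(samples_per_recording)   # 120000000
```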

Small Leak Detection Methodology
The mathematical modeling of the leaks and their symptoms is based on the testbed, as shown in Figure 1.

Symptoms of Leak Presence
AE activity exists in pipelines even when they are operating in a healthy state. This activity can be due to mechanical sources, such as pumps or particles in the flow hitting the pipeline wall, or to hydraulic sources caused by pressure pulses at vortexes in the fluid inside the pipeline [12]. Let us assume that $n_i$ ($i = 1, 2$) are the AE signals acquired by sensors 1 and 2 in the normal state, and that their mean and variance are 0 and $N_1 \approx N_2 \approx N > 0$, respectively. When a small leak occurs in the testbed pipeline, it makes a small disturbance in the flow around the leak, which introduces a new AE source into the system. The previous source $n_i$ can be regarded as background noise; suppose that this noise is uncorrelated with the signal obtained from the leak. Therefore, the model for the AE measurements from the sensors in this scenario can be given as

$$z_i = s_i + n_i \quad (i = 1, 2),$$

where $s_i$ is the leak AE signal received by sensor $i$. If the variances of $z_i$ and $s_i$ are $Z_i$ and $S_i$, respectively, then, because $n_i$ and $s_i$ are uncorrelated, $Z_i = S_i + N$. This relation can be explored by setting $g = Z_2/Z_1$, which gives

$$g = \frac{Z_2}{Z_1} = \frac{S_2 + N}{S_1 + N}. \tag{1}$$

If the background noise $n_i$ ($i = 1, 2$) is band-limited white noise, then its variance is always $N$ over its entire frequency range. Next, consider the measurement model when the measurement consists of the background noise $n_i$ and the leak signal $s_i$ ($i = 1, 2$) at a frequency $\omega$. In fact, the spectrum of the leak noise covers a broad band of frequencies; however, all the components behave identically with respect to the leak phenomenon.
According to the AE attenuation characteristic derived from a power law [17], $S_1$ and $S_2$ are related by

$$S_1 = S_2\, e^{-\alpha d}. \tag{2}$$

In (2), $\alpha = \alpha(\omega)$ is the attenuation coefficient in wave propagation, which depends on frequency, $d$ is the distance between sensors 1 and 2, and $S_i$ ($i = 1, 2$) is the leak signal with frequency $\omega$.
By substituting (2) into (1), abbreviating $r = S_2/N$, which is the signal-to-noise ratio measured by sensor 2 at frequency $\omega$, and denoting $\beta = e^{-\alpha d}$, (1) becomes

$$g(r) = \frac{r + 1}{\beta r + 1}. \tag{3}$$

Next, take the partial derivative of $g$ with respect to $r$:

$$\frac{\partial g}{\partial r} = \frac{1 - \beta}{(\beta r + 1)^2}. \tag{4}$$

Since $0 < \beta < 1$ for $0 < d < \infty$, $\partial g/\partial r > 0$. As a result, $g(r)$ is a monotonically increasing function of $r$: if $r_1 \neq r_2$, then $g(r_1) \neq g(r_2)$. Naturally, a normal state has $r = 0$ at every frequency $\omega$, and an abnormal state always has $r \neq 0$; thus, the function $g(r)$ is applicable for leak detection. Figure 5 shows the dependence of $g(r)$ on $r$ with different $\beta$ values at a particular frequency. It can be easily observed that all the curves $g(r)$ increase from the normal state when $r$ increases.
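The monotonic behavior of g(r) can also be checked numerically. The sketch below assumes the closed form g(r) = (r + 1)/(βr + 1), which follows from Z_i = S_i + N, S_1 = βS_2, and r = S_2/N; the function names and the β values are illustrative choices, not values from the experiments.

```python
import numpy as np

def g_of_r(r, beta):
    """Variance ratio g = (r + 1) / (beta * r + 1), with r = S2 / N."""
    r = np.asarray(r, dtype=float)
    return (r + 1.0) / (beta * r + 1.0)

def dg_dr(r, beta):
    """Analytic derivative of g with respect to r."""
    r = np.asarray(r, dtype=float)
    return (1.0 - beta) / (beta * r + 1.0) ** 2

r = np.linspace(0.0, 10.0, 101)
for beta in (0.2, 0.5, 0.8):          # illustrative values only
    g = g_of_r(r, beta)
    assert g[0] == 1.0                # normal state: r = 0 gives g = 1
    assert np.all(np.diff(g) > 0)     # g grows monotonically with r
    assert np.all(dg_dr(r, beta) > 0) # derivative is positive for 0 < beta < 1
```

Under this model, g saturates at 1/β for large r, so a smaller β (stronger attenuation between the two sensors) leaves more room between the normal and abnormal states.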

Figure 5. The dependence of g(r) on r with different β at frequency ω.

Robustness of g(r) in Leak Manifestation
This section investigates the reliability of leak manifestation using the function $g(r)$ when the level of background noise increases. Gaussian noise is added to the signals to emulate the presence of noise. Suppose that noise $\Delta n$ with mean 0 and variance $\Delta N$ is added to the background noise while the leak signal remains the same. The background noise $n$ is then replaced by $n' = n + \Delta n$, its variance becomes $N' = N + \Delta N$, and $r$ is replaced by $r' = S_2/N'$.
Setting $\gamma = \Delta N/N$ produces $r' = r/(1 + \gamma)$, which is a function of the two variables $(r, \gamma)$. Consider the partial derivative of $r'$ with respect to $\gamma$:

$$\frac{\partial r'}{\partial \gamma} = -\frac{r}{(1 + \gamma)^2}. \tag{5}$$

Now, the function $g$ is replaced by

$$g' = g(r') = \frac{r' + 1}{\beta r' + 1} = \frac{r + 1 + \gamma}{\beta r + 1 + \gamma}, \tag{6}$$

and its partial derivative with respect to $\gamma$ is given by

$$\frac{\partial g'}{\partial \gamma} = -\frac{(1 - \beta)\, r}{(\beta r + 1 + \gamma)^2}. \tag{7}$$

Since $0 < \beta < 1$, $\partial g'/\partial \gamma < 0$ for $r > 0$, so $g'$ decreases when $\gamma$ increases. In other words, the discrimination quality of $g'$ becomes poorer as the intensity of the background noise increases. This characteristic is similar to that of $r'$; however, $g'$ is more reliable than $r'$ because the decline in $g'$ is smaller than the decline in $r'$.
Abbreviating $\partial r'/\partial \gamma = \Delta_1$ and $\partial g'/\partial \gamma = \Delta_2$, and taking the ratio $\Delta_2/\Delta_1$, produces

$$\frac{\Delta_2}{\Delta_1} = \frac{(1 - \beta)(1 + \gamma)^2}{(\beta r + 1 + \gamma)^2}. \tag{8}$$

Obviously, if the parameter $\beta$ in (8) is selected suitably, then $\Delta_2/\Delta_1 \ll 1\ \forall r, \gamma$. As a result, $g(r')$ varies more slowly than $r'$ as $\gamma$ increases. It turns out that if the background noise increases to some extent, the variable $r'$ exceeds the limit of leak discrimination, whereas the function $g(r')$ still provides enough differentiation.
In (8), if $\beta$ approaches 0, the ratio $\Delta_2/\Delta_1$ converges to 1, and the variation of the function $g$ is similar to that of $r$ when the background noise changes; the characteristic of $g$ is then no longer robust. In contrast, (3) reveals that if $\beta$ approaches 1, $g$ converges to 1 for any $r$. This means that the function $g$ does not manifest any abnormal state of the system. Hence, the parameter $\beta$ should be chosen to trade off between these two cases so that both high sensitivity and reliability can be achieved.
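The trade-off in β can be explored numerically. The following sketch assumes the model r' = r/(1 + γ) and g' = (r' + 1)/(βr' + 1), and compares the closed-form ratio of the two sensitivities with a finite-difference estimate; all function names and numeric values are illustrative assumptions.

```python
import numpy as np

def r_noisy(r, gamma):
    """Signal-to-noise ratio after the noise variance grows by a factor (1 + gamma)."""
    return r / (1.0 + gamma)

def g_noisy(r, gamma, beta):
    """g evaluated at the degraded signal-to-noise ratio r'."""
    rp = r_noisy(r, gamma)
    return (rp + 1.0) / (beta * rp + 1.0)

def sensitivity_ratio(r, gamma, beta):
    """Closed form of (dg'/dgamma) / (dr'/dgamma) under the stated model."""
    return (1.0 - beta) * (1.0 + gamma) ** 2 / (beta * r + 1.0 + gamma) ** 2

def finite_difference_ratio(r, gamma, beta, h=1e-6):
    """Central-difference estimate of the same ratio, used as a cross-check."""
    d1 = (r_noisy(r, gamma + h) - r_noisy(r, gamma - h)) / (2 * h)
    d2 = (g_noisy(r, gamma + h, beta) - g_noisy(r, gamma - h, beta)) / (2 * h)
    return d2 / d1

for beta in (0.01, 0.5, 0.99):  # illustrative values only
    exact = sensitivity_ratio(5.0, 0.5, beta)
    approx = finite_difference_ratio(5.0, 0.5, beta)
    print(f"beta={beta:4.2f}  ratio={exact:.4f}  finite-diff={approx:.4f}")
```

With β close to 0 the ratio approaches 1 (g is as noise-sensitive as r), while with β close to 1 it approaches 0 but g itself barely departs from 1, which is the trade-off described in the text.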

Detection Procedures
This paper proposes an algorithm to detect a small leak in a pipeline based on the function g(r) because it can indicate the presence of a leak, as described in the previous section. Figure 6 shows a generic framework for leak detection using two approaches: direct AE-based and g(r)-based. The AE signals from sensors 1, 2, and 3 are the inputs to the leak detection framework. In the direct AE-based method, there is no g(r)-construction block: after the AE signals are divided into frames, the frames are provided directly to the feature extraction block. In the g(r)-based method, the feature extraction process is carried out after the completion of the g(r)-construction process.

Frame Division
The recorded AE signals are segmented into a series of frames by the frame division block. The AE waves propagate over different distances from the leak to the sensors, so their arrivals are lagged. This means that frames with the same index on different channels are not exactly correlated.
Moreover, at the detection stage, the position of the leak is unknown, and so is the time of arrival of the signals at the sensors. One way to deal with this problem is to select a reasonable frame size. Figure 7 uses an example of two signals $s_1$ and $s_2$ to explain the method. In this figure, $\Delta t_{lag}$ and $t_{frame}$ are the lag time and the frame size (in time) of the two signals, respectively. Due to the existence of the lag time $\Delta t_{lag}$, a lagged part of signal $s_1$ cannot be correlated with any part of signal $s_2$ in the same frame index $i$ because it has already propagated in frame $(i - 1)$ of $s_2$. Thus, a formula for the frame size can be defined as

$$t_{frame} = \Delta t_{lag} + t_{ext}, \tag{9}$$

where $t_{ext}$ is an amount of time by which the frame size is extended. Obviously, the bigger $t_{ext}$ is, the smaller the lag is compared with the remainder of the frame, which reduces the effect of the lag on the correlation.
Next, the parameter $\Delta t_{lag}$ is calculated. The location of the leak in the pipeline is unknown; however, the leak lies somewhere within the tested pipeline. Under this condition, the maximum lag time can be calculated, and this result is used to determine a reasonable frame size.
In (10), $C$ is the wave speed. According to [19], AE signals can propagate in the fluid in the frequency range of 20 kHz to 80 kHz, in addition to propagating through the pipe wall at high frequencies. Furthermore, the AE signals of leaks originate from the flow turbulence and the interaction of particles at the leak point. Thus, AE signals might contain both kinds of propagation. The wave speed in water is smaller than in solid materials [20]. Hence, the value of $C$ should be calculated for propagation in water. The wave speed is calculated following [21,22], where $K$ and $\rho$ are the volumetric compressibility modulus and the density of the liquid inside the pipeline, $e$ and $D_p$ are the wall thickness and inner diameter of the pipe, $\psi$ is a factor related to the pipe supporting condition, and $u$ is Poisson's ratio. The frame size is then obtained from (14), in which $\zeta$ must keep the lagged part of the signals small compared with the rest of the signal.
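A minimal sketch of the frame division step is given below. It assumes that the frame size is the maximum lag plus an extension time and that the maximum lag equals a sensor spacing divided by the wave speed; the spacing, wave speed, and extension values are hypothetical placeholders, not values from Table 1. The Hanning window and 50% overlap follow the abstract.

```python
import numpy as np

def frame_length_samples(max_lag_s, ext_s, fs_hz):
    """Frame size t_frame = max_lag + t_ext, converted to a sample count."""
    return int(round((max_lag_s + ext_s) * fs_hz))

def split_into_frames(x, frame_len, overlap=0.5):
    """Split a 1-D signal into Hanning-windowed frames with fractional overlap."""
    x = np.asarray(x, dtype=float)
    hop = max(1, int(frame_len * (1.0 - overlap)))
    n_frames = 1 + (len(x) - frame_len) // hop if len(x) >= frame_len else 0
    window = np.hanning(frame_len)
    frames = np.empty((n_frames, frame_len))
    for k in range(n_frames):
        start = k * hop
        frames[k] = x[start:start + frame_len] * window
    return frames

# Hypothetical numbers for illustration only.
sensor_spacing_m = 1.0      # assumed distance between two sensors
wave_speed_m_s = 1000.0     # assumed wave speed in the water-filled pipe
max_lag_s = sensor_spacing_m / wave_speed_m_s   # assumed form of the maximum lag
frame_len = frame_length_samples(max_lag_s, ext_s=0.009, fs_hz=1_000_000)

signal = np.random.default_rng(0).standard_normal(50_000)
frames = split_into_frames(signal, frame_len)
print(frame_len, frames.shape)
```

Choosing the extension time much larger than the maximum lag keeps the uncorrelated, lagged portion a small fraction of each frame, at the cost of fewer frames per recording.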

g(r)-Construction
The quantity g(r) in Section 3.1 is formulated as a ratio of variances at a single frequency. In this section, a g(r) vector is constructed from the signals of sensors 1 and 2 by considering the ratios of the amplitudes over all the frequencies (Figure 8). In Figure 8, the time-domain signals are converted to the frequency domain by the fast Fourier transform (FFT), and only their amplitudes are used as the operands of the division at every frequency. After the transformation, a new signal in the form of g(r) is obtained, which contains information about leakage symptoms.
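The construction step can be sketched as follows, assuming that the g(r) vector for one frame is the per-frequency ratio of the FFT amplitudes of channel 2 to those of channel 1. The function name, the small constant `eps`, and the synthetic test signal are assumptions made for illustration; the exact scaling used to produce Figure 11 is not specified in the text.

```python
import numpy as np

def g_vector(frame_ch1, frame_ch2, eps=1e-12):
    """Per-frequency amplitude ratio |FFT(ch2)| / |FFT(ch1)| for one frame."""
    a1 = np.abs(np.fft.rfft(frame_ch1))
    a2 = np.abs(np.fft.rfft(frame_ch2))
    return a2 / (a1 + eps)  # eps guards against division by zero

# Synthetic check: if channel 2 is exactly twice channel 1, the ratio is 2
# at every frequency, regardless of the absolute signal level.
rng = np.random.default_rng(1)
x = rng.standard_normal(1024)
g_low = g_vector(x, 2.0 * x)
g_high = g_vector(10.0 * x, 20.0 * x)
print(g_low.shape, np.allclose(g_low, g_high))
```

The synthetic check illustrates why a ratio-based quantity is insensitive to a common change in signal level across both channels.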

Feature Extraction
In this study, the three features given in Table 3 are used to compare the performance of the direct AE-based method with that of the g(r)-based one. These features are selected because of their effectiveness in both time and frequency domains [18,23]. This paper uses the KNN-based classifier to solve a binary classification problem, i.e., whether a sample belongs to the normal or abnormal conditions of the pipeline.
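The binary KNN decision can be illustrated with a small NumPy-only sketch. It is a generic majority-vote KNN under a Euclidean distance, not the authors' implementation; the value of k, the toy feature values, and the 0/1 label encoding are assumptions, and the three columns merely stand in for the three features of Table 3.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=3):
    """Classify each test row by majority vote among its k nearest training rows."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    test_x = np.asarray(test_x, dtype=float)
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]  # labels of the k neighbours
    # Binary labels 0 (normal) / 1 (abnormal): majority vote via the mean.
    return (votes.mean(axis=1) > 0.5).astype(int)

# Toy data: three hypothetical features per frame.
train_x = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
           [1.0, 1.0, 1.0], [0.9, 1.0, 1.0], [1.0, 0.9, 1.0]]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [[0.05, 0.05, 0.0], [0.95, 1.0, 0.95]]

pred = knn_predict(train_x, train_y, test_x, k=3)
print(pred)  # [0 1]
```

An odd k avoids ties in the two-class vote.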

Experimental Results
For a comparison between the direct AE-based and g(r)-based approaches, the necessary dataset parameters are listed in Table 4, where P is one of the pressures {P1, P2, P3} and L is one of the leaks {L1, L2, L3, L4} defined in Section 2.1. After the AE signals are divided into frames, they are transferred directly to the feature extraction block in the direct AE-based method, and to the g(r)-construction block in the proposed method (Figure 6). Figure 9 illustrates the AE signals of the pair (P1, L1), and Figure 10 presents the signals in a frame index.
The R15I-AST AE sensors used in the experiment are manufactured by MISTRAS in accordance with industrial standards, and they can capture low-level signals, as shown in Figures 9 and 10. These signals demonstrate that the noise levels of all the channels are nearly the same in the normal condition but differ from those of the abnormal condition. Channel 1 is mounted at the furthest place from the leak, and the signal level at this point is the lowest and resembles the noise, while the two others have higher amplitudes because they are closer to the leak. Moreover, such low-level signals are prone to noise in industry. Therefore, a classification algorithm relying on absolute levels is unreliable if it is only trained on a limited number of datasets. In a real application, the leak position is unknown; AE signals can vary between low levels similar to noise (when the AE sensors are far from the leak) and higher levels (when they are closer to the leak). To address the issue, a classifier is trained on the quantity g(r) formed by AE signals from critical conditions, for example, when the sensors are mounted far from the leak and the pressure is low. The proposed model can detect a leak correctly even if the environment fluctuates widely.
Sensors 2019, 12, x FOR PEER REVIEW 9 of 18
The g(r)-construction block in Figure 8 converts the signals of channels 1 and 2 from the time domain to the frequency domain and then transforms them into the quantity g(r). Figure 11 presents a frame of g(r) in which the normal g(r) fluctuates around 0, but the abnormal g(r) has a different trend. This behavior is independent of distance and resistant to noise.

Effectiveness of the g(r)-Based Approach Compared with the Direct AE-Based Method
The effectiveness of the g(r)-based approach compared with the direct AE-based method is illustrated in two ways: by state scattering of the features in 3D space, as shown in Figures 12-15, and by a separability comparison that relies on the Kullback-Leibler (KL) distance [18] between the two classes under various experimental conditions. The KL distance is defined in terms of the following quantities: w_1 and w_2 are the two classes (i.e., NORMAL and ABNORMAL), x = [x_1, x_2, ..., x_n]^T is the vector of features for n data frames, and p is the probability density function.

Table 5 shows the KL distances for each pressure and leak pair (P, L). Since the KL distances depend on the features and their values can be relatively large, this study converts them to a logarithmic scale (dB) for presentation. The KL distances based on features extracted directly from the AE signals vary over a wide range. In contrast, the KL distances calculated using features extracted from g(r) remain around a certain value for each type of feature under diverse conditions. Moreover, the average values demonstrate that the class separability of the g(r)-based approach is greater than that of the direct AE-based one. Both the state scattering and the KL distances therefore indicate that the g(r)-based approach is more effective than the direct AE-based one.

This paper uses datasets of pressure and leak {P, L} pairs to train the KNN classifier used in the two approaches. Table 6 presents the performance of the two approaches. The P4 column in the table represents the combination of datasets with all the pressures and leaks. Although Table 6 shows high accuracies for both methods, their performance differs when they are applied under different conditions, as shown in the next section.
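The KL-based separability comparison in this section can be illustrated with a short sketch. Several choices here are assumptions rather than the paper's definitions: each class-conditional density is approximated by a univariate Gaussian fitted to one feature, the distance is taken as the symmetric sum of the two directed KL divergences, and the dB conversion is taken as 10 log10. The function names and toy feature values are hypothetical.

```python
import math

def mean_var(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v

def kl_gauss(m1, v1, m2, v2):
    """Directed KL divergence KL(N(m1, v1) || N(m2, v2)), univariate case."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def kl_distance_db(feat_normal, feat_abnormal):
    """Symmetric KL distance between Gaussian fits, in dB (assumed 10*log10)."""
    m1, v1 = mean_var(feat_normal)
    m2, v2 = mean_var(feat_abnormal)
    d = kl_gauss(m1, v1, m2, v2) + kl_gauss(m2, v2, m1, v1)
    # d is zero only for identical fits, in which case log10 is undefined.
    return 10.0 * math.log10(d)

# Toy feature values (assumption): one feature, five frames per class.
normal = [0.00, 0.10, -0.10, 0.05, -0.05]
weak_leak = [0.30, 0.40, 0.20, 0.35, 0.25]
strong_leak = [1.00, 1.10, 0.90, 1.05, 0.95]

print(kl_distance_db(normal, weak_leak))
print(kl_distance_db(normal, strong_leak))
```

With these toy values the second distance is larger than the first, which mirrors how a feature whose level depends on operating conditions yields KL distances that vary over a wide range.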

Cross Testing
In the previous section, 12 classifiers were trained for each method (direct AE-based and g(r)-based), one for each dataset condition {P_i, L_j}. This section tests them using datasets from conditions {P_m, L_n}, where i, m ∈ {1, 2, 3} and j, n ∈ {1, 2, 3, 4}. The purpose of the test is to verify the reliability of each approach when the training is carried out using only some of the datasets. Tables 7-10 show the results of the experiment carried out under this configuration. In each table, the trained classifiers are listed in the columns, and the test datasets are listed in the rows.
Tables 7-10 show that when the testing datasets use the same conditions as the training datasets, the accuracies reach 100% or nearly 100%. However, if the testing conditions differ from the training conditions, the accuracies of the direct AE-based method degrade, and misclassification can occur. In Table 8, an accuracy of 00.00% indicates misclassification. In the same situation, however, the g(r)-based approach identifies leaks without misclassification. Furthermore, the average accuracy of the g(r)-based approach is higher than that of the direct AE-based method in every test.

To detect a fault reliably, a data-dependent classifier must be trained using many datasets representing various experimental conditions. Table 11 shows the accuracies of the two approaches when the classifier is trained on the combined datasets and tested on the dataset of each pair {P, L}. The average accuracies are 99.28% and 99.97% for the direct AE-based and g(r)-based approaches, respectively. Thus, a model trained with many datasets can work in varied conditions. However, the g(r)-based method is more efficient than the direct AE-based one because it still achieves the expected accuracy with only a small number of training datasets.
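The cross-testing procedure can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the KNN uses Euclidean distance and majority voting with k = 3, the features are one-dimensional toy values standing in for an absolute signal level, and the condition labels and function names are hypothetical. For brevity the same-condition entries reuse the training frames, which is optimistic; a real evaluation would hold out separate frames.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(
        (math.dist(x, t), label) for t, label in zip(train_x, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Accuracy (%) of a KNN fitted on `train` and evaluated on `test`."""
    train_x, train_y = train
    test_x, test_y = test
    hits = sum(
        knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y)
    )
    return 100.0 * hits / len(test_y)

def cross_test(datasets, k=3):
    """Accuracy table keyed by (training condition, testing condition)."""
    return {
        (tr, te): accuracy(datasets[tr], datasets[te], k)
        for tr in datasets
        for te in datasets
    }

labels = ["NORMAL"] * 3 + ["ABNORMAL"] * 3
# Toy level-like feature (assumption): the leak signature is weak in one
# condition and strong in the other.
datasets = {
    "P1L1": ([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]], labels),
    "P3L4": ([[0.0], [0.1], [0.2], [3.0], [3.1], [3.2]], labels),
}

table = cross_test(datasets)
for (tr, te), acc in sorted(table.items()):
    print(f"train={tr} test={te} accuracy={acc:.2f}%")
```

In this toy setting the classifier trained on the strong-leak condition labels every weak-leak frame as NORMAL, which is the kind of failure that a level-dependent feature produces when training and testing conditions differ.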

Evaluating Two Approaches Using Combined Datasets with Added Noise
To evaluate the robustness of the proposed method, we add Gaussian noise to the acquired signals to increase the intensity of the background noise. The noise level is set by the parameter γ, expressed in dB, for each channel of the three AE sensors. Figure 16 illustrates the signals from channel 2 before and after noise is added with γ = 0 dB. This evaluation is conducted using the KNN classifier trained with the combination of all datasets. The testing accuracies are shown in Table 12. Table 12 shows that if γ is small, the accuracies of the two methods are approximately the same as in Table 11. When γ increases, the accuracy of both approaches decreases, and beyond a certain value, they can no longer guarantee correct classification. However, the direct AE-based method fails in classification when the added noise reaches 5 dB, whereas the proposed g(r)-based method works until γ reaches 35 dB. Thus, the g(r)-based approach produces a classifier that can detect a small leak more robustly than the direct AE-based classifier.
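The noise-injection step can be sketched as follows. The precise definition of γ is not given in this section, so the sketch assumes that γ is the power of the added zero-mean Gaussian noise relative to the power of the acquired signal, in dB (so γ = 0 dB adds noise with the same power as the signal). The function name, the sinusoidal test signal, and the seeds are hypothetical.

```python
import math
import random

def add_gaussian_noise(signal, gamma_db, rng=None):
    """Add zero-mean Gaussian noise whose power is gamma_db (dB) relative
    to the mean power of `signal` (assumed interpretation of gamma)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power * 10.0 ** (gamma_db / 10.0)
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]

def mean_power(values):
    return sum(v * v for v in values) / len(values)

# Toy channel (assumption): a sinusoid with a whole number of periods.
clean = [math.sin(2.0 * math.pi * k / 50.0) for k in range(20000)]

for gamma in (0.0, 10.0):
    noisy = add_gaussian_noise(clean, gamma, random.Random(1))
    added = [n - c for n, c in zip(noisy, clean)]
    ratio_db = 10.0 * math.log10(mean_power(added) / mean_power(clean))
    print(f"gamma={gamma:.0f} dB, measured noise-to-signal={ratio_db:.2f} dB")
```

Under this assumption, sweeping γ upward and re-running the trained classifier on the noisy frames would reproduce the kind of accuracy-versus-noise comparison reported in Table 12.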

Conclusions
AE-based techniques have the advantage of detecting physical changes within a structure, such as a water pipeline, with high accuracy. Because of the complexity of the AE phenomenon, model-based fault diagnosis of a water pipeline is difficult, whereas data-driven fault diagnosis is relatively easy to implement. However, if the training relies on features extracted directly from AE signals, the resulting classifier is unreliable because such data cannot reflect all possible information under diverse working conditions. This paper introduced an intermediate step in a water pipeline fault diagnosis framework in which the signals are preprocessed before the extraction of features.

Figure 2. The testbed for pipeline leak detection.

Figure 5. The dependence of g(r) on r with different β at frequency ω.

Figure 7. Frame division dependent on lag time.
popular kernel function used to identify instances belonging to different classes during the diagnosis. The theory of KNN is presented clearly in

Figures 12-15 show state scattering based on the features extracted from direct AE signals and g(r).

Figure 12. State scattering based on features extracted from the AE signal on channel 1 (CH1).

Figure 13. State scattering based on features extracted from the AE signal of channel 2 (CH2).

Figure 14. State scattering based on features extracted from the AE signal of channel 3 (CH3).

Figure 15. State scattering based on features extracted from g(r).

Hence, at the detection stage, the position of the leak is unknown, and so is the time of arrival of the signals at the sensors. One way to deal with this problem is to select a reasonable frame size. Figure 7 uses an example of two signals, s1 and s2, to explain the method, showing their lag time and frame size (in time).

Table 11. Accuracy (%) of classifiers of two approaches by combined datasets.

Table 12. Accuracy with added noise.