Real-Time Leak Detection for a Gas Pipeline Using a k-NN Classifier and Hybrid AE Features

This paper introduces a technique that uses a k-nearest neighbor (k-NN) classifier and hybrid features extracted from acoustic emission (AE) signals to detect leaks in a gas pipeline. The whole algorithm is embedded in a microcontroller unit (MCU) to detect leaks in real time. The embedded system continuously receives signals from a sensor mounted on the surface of a gas pipeline and diagnoses any leak. To construct the system, AE signals are first recorded from a gas pipeline testbed under various conditions and used to synthesize the leak detection algorithm via offline signal analysis. The work explores different features of the normal/leaking states from the corresponding datasets and eliminates redundant features and outliers to improve performance and guarantee the real-time characteristic of the leak detection program. To make leak detection robust, the paper normalizes the features and adapts the trained k-NN classifier to the specific environment where the system is installed. Besides using a classifier to categorize the normal/leaking states of a pipeline, the system monitors the accumulative leaking event occurrence rate (ALEOR) against a defined threshold to conclude the state of the pipeline. The entire proposed system is implemented on the 32F746G-DISCOVERY board; to verify it, numerous real AE signals stored on a hard drive are transferred to the board. The experimental results show that the proposed system executes the leak detection algorithm in a period shorter than the total input data time, thus guaranteeing the real-time characteristic. Furthermore, the system yields high average classification accuracy (ACA) even when white noise is added to the input signal, and false alarms do not occur with a reasonable ALEOR threshold.


Introduction
Gas pipelines play a vital role in the fuel transportation field. Even though they are designed and assembled according to strict technical principles [1,2], a gas leak could still occur due to material aging and corrosion [3,4], leading to violent explosions causing injuries, human deaths, and pollution of the environment. Hence, a real-time gas pipeline leak detection system is extremely important to reduce catastrophic consequences.
In early times, acoustic emission (AE) was mainly used for detecting growing cracks and discontinuities in materials because it was defined as the release of elastic energy in a deformed material [5]. However, AE currently refers to a phenomenon in which transient elastic waves are generated by the rapid release of energy from localized sources within a material, or to the transient elastic waves so generated [5]. As a result, a leak is also a source of AE and is detectable with AE equipment. This type of AE source is sometimes called a secondary source to distinguish it from classic AE, which is caused by material deformation [5]. AE-based leak detection is therefore feasible, and many researchers have applied this mechanism to detect leaks in gas pipelines [6][7][8][9][10][11][12][13]. Leak detection using AE signals is extremely beneficial because it is a non-destructive technique [...] and AE signal datasets recorded at a gas pipeline testbed under diverse experimental scenarios. Thus, the essential parameters of the k-NN classifier (training features and number of nearest neighbors) are chosen to ensure not only the real-time characteristic but also high accuracy of the leak detection program. Aside from ambient noise, any external factor that causes vibrations in the pipeline can trigger AE signals. For instance, a random pipe collision triggers a mechanical vibration that generates plentiful elastic waves propagating through the pipeline. AE sensors with enough sensitivity can capture signals resulting from those elastic waves, which interfere with the measured target signals. Hence, a k-NN classifier based on AE signals is exposed to discrete events near the tested pipeline, which generate false alarms. To address this problem, the current work proposes monitoring the accumulative leaking event occurrence rate (ALEOR) from the output of the state classifier. The final decision on the pipeline health state is based on a comparison between the instantaneous ALEOR and a defined threshold, hence avoiding false alarms.
Finally, the work evaluates the gas pipeline leak detection system constructed from the proposed methodology on the 32F746G-DISCOVERY board (STMicroelectronics, Quakertown, PA, USA) using recorded AE signal datasets. Experimental results demonstrate that the system can identify a leak in real time with high average classification accuracy under various pressure conditions, and that its robustness is satisfactory even when white noise is added to the input AE signal. Hence, the proposed MCU-based system is applicable to gas leak detection in real applications.

AE Signal Data Acquisition
A pipeline testbed is established to simulate gas leakage, as shown in Figure 1. The testbed is part of a real gas pipeline system (see Figure 1c) made from stainless steel 304 pipes with an outer diameter of 114.3 millimeters (mm) and a wall thickness of 6.02 mm. To create various leaks, we designed a leak tool, shown in Figure 1a, which is assembled onto the testing pipeline. This tool is composed of a valve and an orifice of diameter 0.3 mm, 0.5 mm, or 1 mm (see Figure 1b). Hence, the normal/leaking states of the pipeline correspond to the closed/open valve positions.

The experimental configuration is shown in Figure 2. To capture AE signals, two R15i-AST sensors (AE channels), manufactured by MISTRAS Group, Inc. (Princeton Junction, NJ, USA), are mounted at downstream and upstream locations on the surface of the testing pipeline. These sensors can detect elastic waves within their operating frequency range of 50 kilohertz (kHz) to 400 kHz [27]. Those elastic waves can be caused by diverse sources such as leak noise [10], negative pressure waves [4], ambient noise, and other vibrations of the pipe wall. The R15i-AST sensors are selected because their operating frequency range covers the frequency range of AE waves propagating in metal objects, which is 100 kHz to 300 kHz, as stated in the BSI standard BS EN 15856 [15]. AE signals are sampled at 1 megahertz (MHz) by the NI-9223 module. The sampling frequency of 1 MHz is more than double the maximum operating frequency of the sensors (400 kHz), thus satisfying the Nyquist-Shannon sampling theorem [28] for converting analog signals into digital signals. After the hardware setup is finished, data recording software installed on the computer controls the whole data acquisition. Additionally, we use the pencil lead break technique [29] to examine the sensitivity of both the sensors and the whole AE equipment. This ensures the reliability of the AE signal datasets before they are stored on the hard drive.
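As a quick check of the sampling argument above, the following minimal Python sketch compares the sampling rate with twice the upper sensor frequency. The constant and function names are illustrative only and do not come from the acquisition software.

```python
SAMPLING_RATE_HZ = 1_000_000   # NI-9223 sampling rate stated in the text
SENSOR_MAX_HZ = 400_000        # upper operating frequency of the R15i-AST sensors


def satisfies_nyquist(sampling_rate_hz, max_signal_hz):
    """Return True when the sampling rate exceeds twice the highest signal frequency."""
    return sampling_rate_hz > 2 * max_signal_hz


def nyquist_margin(sampling_rate_hz, max_signal_hz):
    """Ratio of the sampling rate to the Nyquist rate; values above 1 leave headroom."""
    return sampling_rate_hz / (2 * max_signal_hz)
```

With the values above, the sampling rate is 1.25 times the Nyquist rate of 800 kHz.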
In the experiment, the three orifices are alternated to simulate different leakages at three inner relative pressures of 700 kPa, 1300 kPa, and 1800 kPa, resulting in three normal states of the testing pipeline (closed valve) and nine diverse leaking states (open valve). Specifically, data acquisition has been performed as follows. First, an orifice was installed, and the pipeline system was configured at a pressure level of 700 kPa, 1300 kPa, or 1800 kPa, and this condition was kept relatively stable before acquiring AE signals. At this time, the valve of the leak tool was closed to simulate the normal state of the pipeline. For this state, the signals were recorded for 2 min. Next, the valve was opened to simulate a leakage. Here, the data corresponding to a leaking state were collected after pressure stabilization. Figure 3 presents gas flow rates measured in front of the testing pipeline during the experimental stages.

Leak Detection Methodology
The overall gas pipeline leak detection diagram is shown in Figure 4. It is composed of two processes: one offline and one online. The offline analysis synthesizes and optimizes the leak detection algorithm, while the online process tests and verifies the detection. The analysis blocks of the algorithm are described below.

Hybrid Feature Pool and Feature Selection
To detect the leaking state of a gas pipeline, time and frequency domain statistical features are extracted, as shown in Table 1, from raw AE signals utilized as diagnosis leakage signatures. We therefore obtain a hybrid feature pool of size R × M, where R is the number of feature types (R = 12, as shown in Table 1), and M is the number of analyzed signal frames. The value M should be large enough to reflect the statistical discrimination of the normal/leaking states precisely.
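The construction of the feature pool can be sketched as follows. This is a minimal illustration, not the authors' implementation: it computes only four time-domain features (RMS, STD, AVA, and STE, the names used later in the paper), and the formulas are common textbook definitions assumed here because Table 1 is not reproduced in this text. In particular, STE is assumed to be the sum of squared samples of a frame.

```python
import math


def time_domain_features(frame):
    """Compute four assumed time-domain features for one signal frame."""
    n = len(frame)
    mean = sum(frame) / n
    energy = sum(v * v for v in frame)
    return {
        "RMS": math.sqrt(energy / n),                                # root mean square
        "STD": math.sqrt(sum((v - mean) ** 2 for v in frame) / n),   # population std
        "AVA": sum(abs(v) for v in frame) / n,                       # average amplitude
        "STE": energy,                                               # assumed: sum of squares
    }


def build_feature_pool(frames):
    """Return the pool as {feature name: list of M values}, one value per frame."""
    rows = [time_domain_features(frame) for frame in frames]
    return {name: [row[name] for row in rows] for name in rows[0]}
```

Each dictionary entry corresponds to one row of the R × M pool, with one column per analyzed frame.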
In Table 1, which lists features such as the average amplitude (AVA), $x$ is an input signal, $N$ is the total number of samples, $X$ is the short-time spectral amplitude, $f$ is the frequency, $M$ is the total number of discrete frequencies, and $q_n = x_n^2 / \ldots$
Next, the feature pool should be refined to enhance the quality of the pipeline health classification. Outliers, that is, data points that differ significantly from the other data points in the same class, can cause serious problems in statistical analyses. Outliers in a feature extracted from an AE signal measured at a gas pipeline are inevitable and result from both exterior and interior factors. An exterior factor could be variability in the measurement. For example, power spikes can interfere with sensed signals, causing outliers in AE features. This problem can be mitigated by a careful experimental configuration and high-quality equipment. Outliers may also be created by interior factors of the pipeline system, such as burst emissions that appear with high amplitude and energy in AE signals. A gas pipeline itself generates such signals through disturbances in the inner gas flow and the interaction between the gas flow and the pipe wall. Nevertheless, outliers should be eliminated from the features used for training a classifier because they do not statistically characterize the discrimination between the normal and leaking states, and thus degrade the classification performance. This paper assumes a normal distribution for the AE features; outliers can therefore be detected by the three-sigma rule [25]. This rule is expressed as follows:
$$\Pr\left(\mu_{y_i} - 3\sigma_{y_i} \le Y_i \le \mu_{y_i} + 3\sigma_{y_i}\right) \approx 0.9973 \quad (1)$$
where $Y_i$ is an observation from a normally distributed feature $y_i$; $\mu_{y_i}$ and $\sigma_{y_i}$ are the mean and standard deviation of the distribution, respectively; and $i = 1, 2, \ldots, R$. According to (1), if $|Y_i - \mu_{y_i}| > 3\sigma_{y_i}$, the value $Y_i$ is considered an outlier and is removed from the set of $y_i$-feature observations. After the unwanted observations are eliminated, the length of the $y_i$-vector shrinks to $M_i^*$ ($M_i^* \le M$).
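The three-sigma filtering step can be illustrated with a short Python sketch. It is a minimal example under the normality assumption stated above; the function name is hypothetical, and the mean and standard deviation are estimated from the same observations that are being filtered.

```python
import statistics


def remove_outliers_three_sigma(values):
    """Keep only observations within three standard deviations of the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    # |Y - mu| > 3 * sigma marks an outlier according to the three-sigma rule.
    return [v for v in values if abs(v - mu) <= 3 * sigma]
```

For example, a list of twenty values equal to 10.0 plus a single value of 1000.0 is reduced to the twenty regular values, so the vector length drops from 21 to 20.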
Because the feature types are distributed differently, the outlier elimination may return different lengths $M_i^*$ of the $y_i$-vectors ($i = 1, 2, \ldots, R$). We therefore add new observations that satisfy (1) to the feature pool until $M_i^* = M$. The feature pool size thus remains intact ($R \times M$), but its elements are refined so that they satisfy (1). Furthermore, not all extracted features are equally effective for highly accurate leak detection. Inferior signatures not only impair the classification accuracy but also increase the computational complexity. Thus, we need to filter redundant features out of the pool to enhance the detection performance while reducing the computational load. This paper scores features using the Kullback-Leibler (KL) distance [26] and discards low-ranked elements of the feature pool. The KL distance is calculated as follows:
$$d_{KL}(w_1, w_2) = \int \left[ p(y_i \mid w_1) - p(y_i \mid w_2) \right] \ln \frac{p(y_i \mid w_1)}{p(y_i \mid w_2)} \, dy_i \quad (2)$$
where $d_{KL}$ is the KL distance; $w_1$ and $w_2$ are the two classes indicating the normal and leaking states, respectively; $y_i = [y_{i1}, y_{i2}, \ldots, y_{iM}]^T$ is the vector of observations of the $y_i$-feature in the refined feature pool; and $p$ is a conditional probability density function. Based on (2), we retain the features with the dominant KL distances and remove the others from the feature pool, because the greater the KL distance, the more discriminative the feature. Finally, we obtain a purified feature pool of size $r \times M$, where $r$ is the number of high-scored features ($r \le R$).
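The feature scoring can be sketched in Python as follows. This is an illustration under two assumptions that are not confirmed by the text: the KL distance is taken in its symmetric form, and each class-conditional density is approximated by a univariate Gaussian (in line with the normality assumption used for outlier removal), which gives a closed-form score. The function names are hypothetical.

```python
import statistics


def symmetric_kl_gaussian(class_a, class_b):
    """Symmetric KL distance between Gaussians fitted to two sets of observations.

    Both sets must have non-zero variance, otherwise the ratio is undefined.
    """
    mu_a, mu_b = statistics.fmean(class_a), statistics.fmean(class_b)
    var_a, var_b = statistics.pvariance(class_a), statistics.pvariance(class_b)
    d2 = (mu_a - mu_b) ** 2
    # KL(a||b) + KL(b||a) for two univariate Gaussians; the log terms cancel.
    return (var_a + d2) / (2 * var_b) + (var_b + d2) / (2 * var_a) - 1.0


def rank_features(normal_pool, leak_pool):
    """Return feature names sorted from most to least discriminative."""
    scores = {
        name: symmetric_kl_gaussian(normal_pool[name], leak_pool[name])
        for name in normal_pool
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Keeping the first r names of the returned ranking yields a purified pool of r feature types.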

Leak Detection Using a k-NN Classifier and Accumulative Leaking Event Occurrence Rate
With the purified feature pool, we use a k-NN classifier to distinguish the normal and leaking states: a new, unlabeled sample is assigned to the most common class among its k nearest neighbors, using the Manhattan distance given by:
$$\delta_j = \sum_{i=1}^{r} \left| z_i - y_{ij} \right| \quad (3)$$
where $\delta_j$ is the Manhattan distance between the input feature vector $z = \{z_1, z_2, \ldots, z_r\}$ and the $j$th training feature vector $y_{*j} = \{y_{1j}, y_{2j}, \ldots, y_{rj}\}$, with $j = 1, 2, \ldots, M$. The k-NN classifier assigns the input $z$ to the majority class among its k nearest neighbors, which correspond to the k minimum distances $\delta_j$ ($k < M$).
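A minimal Python sketch of this classification rule is given below. It is a plain brute-force k-NN with a majority vote and is not the embedded implementation; the function names and the string labels are illustrative.

```python
from collections import Counter


def manhattan(z, y):
    """Manhattan distance between two feature vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(z, y))


def knn_classify(z, train_vectors, train_labels, k):
    """Assign z to the majority class among its k nearest training vectors."""
    # Sort training indices by distance to z and keep the k closest.
    order = sorted(range(len(train_vectors)),
                   key=lambda j: manhattan(z, train_vectors[j]))
    votes = Counter(train_labels[j] for j in order[:k])
    return votes.most_common(1)[0][0]
```

With two classes, an odd k (such as the value k = 25 selected later in the paper) avoids tied votes.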
This detection approach targets extremely noisy industrial environments. A k-NN classifier is sensitive to noise, including ambient noise and discrete events, and may therefore yield a false alarm (the classified state is "leaking" but the true state is "normal") or miss a true leaking event (the classified state is "normal" while a leak is actually happening); thus, the normal/leaking decision should depend on monitoring the ALEOR. The leak detection criterion is given by:
$$\mathrm{ALEOR} = \frac{\Delta B}{\Delta t} > \gamma \quad (4)$$
where $\Delta B$ is the number of leaking events in the time period $\Delta t = t_2 - t_1$, from the moment $t_1$ to the moment $t_2$, and $\gamma$ is the threshold for issuing a warning about the pipeline health state. Pipeline operators can adjust this threshold flexibly for their specific environment.
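The decision rule can be sketched as a sliding-window counter. The sketch below rests on assumptions not fixed by the text: the observation period is expressed as a number of most recent frames, the ALEOR is taken as the count of frames classified as leaking within that window, and the class name `AleorMonitor` and the window length are hypothetical.

```python
from collections import deque


class AleorMonitor:
    """Track leaking events over the most recent frames and compare with a threshold."""

    def __init__(self, window_frames, threshold):
        self.window = deque(maxlen=window_frames)  # oldest frames drop out automatically
        self.threshold = threshold

    def update(self, is_leak_event):
        """Record one classifier output; return True when a leak should be reported."""
        self.window.append(1 if is_leak_event else 0)
        return sum(self.window) > self.threshold
```

Isolated leaking classifications caused by discrete events stay below the threshold, whereas a sustained run of leaking frames crosses it.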

Offline Analysis of AE Signal Datasets
Prior to developing the real-time gas leak detection program with the proposed methodology on an MCU-based architecture, we analyzed the AE signal datasets offline to search for a set of optimal parameters that enhances the performance of the real-time leak detection program. The optimized parameters are the feature pool for training the k-NN classification model and the number k of nearest neighbors used by the k-NN classifier. We perform the offline analysis using a number of AE datasets, as shown in Table 2. For feature selection, we first normalize the extracted features to place them on the same unit basis. The feature normalization is expressed by the following equation:
$$y_{new} = \frac{y_{old} - \mu_{y_n}}{\sigma_{y_n}} \quad (5)$$
where $y_{old}$ and $y_{new}$ are the original and rescaled features, respectively, and $\mu_{y_n}$ and $\sigma_{y_n}$ are the mean and standard deviation of the feature, estimated from samples belonging to the normal pipeline state. Table 3 lists the feature scores obtained with the KL distance method. The most highly ranked features are STE, RMS, AVA, and STD, and these are returned for every pressure condition. Hence, we consider only these features to build the real-time gas leak detection program. Figure 5 shows a 3-D visualization of the three features with the highest scores under the different pressure conditions, in which the normal and leaking states are clearly separated in all cases. Moreover, a large k may improve performance, but too large a k destroys locality. Therefore, to choose k appropriately, we use the k-NN fitting function "fitcknn" available in Matlab 2019a to try different values of k on the analysis datasets, and we obtain k = 25.
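The normalization against normal-state statistics can be illustrated as follows. This is a minimal sketch with hypothetical function names; it assumes the population standard deviation is used and that the normal-state feature is not constant.

```python
import statistics


def fit_normal_state(normal_values):
    """Estimate the mean and standard deviation of a feature from normal-state frames."""
    return statistics.fmean(normal_values), statistics.pstdev(normal_values)


def normalize(y_old, mu_yn, sigma_yn):
    """Rescale a feature value relative to the normal-state statistics."""
    return (y_old - mu_yn) / sigma_yn
```

Because both classes are rescaled with the normal-state statistics, normal frames cluster around zero while leaking frames are expressed as offsets from the healthy condition.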
In Table 2, the datasets belong to a signal channel (R15i Ch1 or R15i Ch2) and correspond to three pressure conditions, 700 kPa ($P_0$), 1300 kPa ($P_1$), and 1800 kPa ($P_2$), and to the pipeline health states normal ($L_0$) and leaking (0.3 mm ($L_1$), 0.5 mm ($L_2$), and 1 mm ($L_3$)), which were recorded in Section 2; $N_{FA}$ and $N_{FE}$ are the numbers of frames for the offline analysis and the experiment, respectively, and a frame consists of 8192 samples stored on the hard drive.

Real-Time Gas Leak Detection Implementation on the 32F746G-DISCOVERY Board
Figure 6 illustrates an MCU-based hardware architecture that implements the proposed method for real-time gas pipeline leak detection. A sensor channel is connected to a data acquisition (DAQ) module, which converts analog AE signals to digital AE signals and writes them directly to a synchronous random-access memory (SRAM) through a communication module and a direct memory access (DMA) channel available in the MCU; hence, the leak detection program can analyze AE signals in real time. We also include a portable memory (SD card) to store pre-defined parameters of the leak detection program and its runtime log files for later analysis. Hence, the program can be adjusted and updated quickly. Additionally, a liquid crystal display (LCD) is installed to indicate the output of the diagnostic program. This entire experimental design is embedded in the 32F746G-DISCOVERY board, as shown in Figure 7.
Because the MCU is limited in internal memory and operating speed, we use an integer format instead of a floating-point format for the feature calculation and the k-NN classification, which uses memory economically and lightens the computational load. In other words, a real feature value is multiplied by 10 before it is rounded, which preserves one decimal place of precision in the vectors of rounded features while avoiding a reduction in classification quality.
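The integer scaling described above can be illustrated in Python, although the target program runs on the MCU. This is a sketch with hypothetical function names; note that Python's `round` rounds exact halves to the nearest even integer, which may differ from the rounding used on the board.

```python
def to_fixed(value, scale=10):
    """Scale a real feature value by 10 and round it to an integer."""
    return int(round(value * scale))


def manhattan_int(z, y):
    """Manhattan distance computed entirely on integer-scaled features."""
    return sum(abs(a - b) for a, b in zip(z, y))
```

For instance, the feature vector (1.234, -0.96) becomes (12, -10), and all subsequent distance computations use integer arithmetic only.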
A trained classifier leans heavily on its training datasets, while AE signals acquired from a pipeline are prone to variation because the inner flow rate and pressure change constantly. The signals also fluctuate with the sensor installation location and the time of operation. To reconcile these differing environments, we must adjust the trained leak detection model to its real, specific operating conditions. Therefore, the paper proposes updating the classifier at run time by modifying the two parameters $\mu_{y_n}$ and $\sigma_{y_n}$, which relate to the normal pipeline state and are used in (5). Figure 8 shows the feature calculation and k-NN classification module of the real-time gas pipeline leak detection program implemented on the 32F746G-DISCOVERY board.
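One possible way to re-estimate the normal-state parameters at run time is an incremental (Welford-style) update, sketched below. The text does not specify the update procedure, so this is only an assumed approach: the class name is hypothetical, and the feature values are assumed to come from frames recorded while the installed pipeline is known to be healthy.

```python
import math


class RunningNormalStats:
    """Incrementally estimate the mean and standard deviation of a feature."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new normal-state feature value into the estimates."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self):
        """Population standard deviation of the values seen so far."""
        return math.sqrt(self._m2 / self.count) if self.count else 0.0
```

The update needs only three stored numbers per feature, which suits a memory-limited MCU.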

Experimental Results
To evaluate the gas pipeline leak detection system quickly, we emulate a real data acquisition (DAQ) device with a computer program that sends the recorded AE signal datasets described in Table 2 to the 32F746G-DISCOVERY board through an available communication channel. This does not affect the objectivity of the assessment, because the datasets were acquired from a practical pipeline testbed under various operating conditions. We examine three aspects: detection accuracy, real-time characteristic, and detection robustness, because these are the key factors in applying a leak detection system to a real environment.

Experimental Results
To evaluate the gas pipeline leak detection system quickly, we emulate a real data acquisition device (DAQ) using a computer program which dispatches recorded AE signal datasets, whose description is shown in Table 2, through an available communication channel to the 32F746G-DISCOVERY board. This does not affect the objective assessment because the datasets have been acquired from a practical pipeline testbed under various operating conditions. We here figure out three aspects: detection accuracy, real-time characteristic, and detection robustness, because those are key factors to apply a leak detection system for the real environment. Figure 9 shows confusion matrices of experimental results returned by the leak detection program running on the 32F746G-DISCOVERY board, and Table 4 illustrates classification accuracy and execution time for evaluation scenarios. The accuracy, as averaged over the two sensor channels (R15i Ch1 and Ch2), and that of various pipeline states (L 0 , L 1 , L 2 , and L 3 ), is relatively high at better than 98% for every pressure condition (P 0 , P 1 , and P 2 ). Besides, the mean execution time (t E = 109 s) is less than the total experimental dataset duration (t D = 123 s). This demonstrates the real-time characteristic of the implemented detection system that does not miss any data and returns a timely result during the analysis operation. Furthermore, the ALEOR is monitored while examining dataset pairs (L 0 , L 1 ), (L 0 , L 2 ), and (L 0 , L 3 ) subsequently (see Figure 10). This plot reveals the correct identification of pipeline states: normal (L 0 ), leaking (L 1 , L 2 , and L 3 ), exploiting a threshold γ = 10 (see red dash line in Figure 10). The leaking state is decided only if ALEOR exceeds the threshold, despite the fluctuation below it. Therefore, no false alarm is reported in the experiment and the leaking state is also indicated punctually. 
In Table 4, $t_D$ and $t_E$ are the total duration of the datasets and the execution time, respectively, measured in seconds, and A is the classification accuracy given by A = 100 × $N_C$/$N_{FE}$ [%], where $N_C$ is the number of correctly classified frames.
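The two quantities reported in Table 4 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the confusion matrix below is hypothetical (it is not taken from Figure 9), $N_{FE}$ is assumed to be the total number of frames in the matrix, and the function names are ours.

```python
def classification_accuracy(confusion):
    """Return A = 100 * N_C / N_FE for a square confusion matrix.

    N_C is the sum of the diagonal (correctly classified frames). N_FE is
    assumed here to be the total number of frames in the matrix.
    """
    n_correct = sum(confusion[i][i] for i in range(len(confusion)))
    n_total = sum(sum(row) for row in confusion)
    return 100.0 * n_correct / n_total


def is_real_time(t_exec, t_data):
    """True when processing finishes within the duration of the input data."""
    return t_exec < t_data


# Hypothetical 2x2 confusion matrix (rows: true state, columns: predicted).
example = [[495, 5],
           [10, 490]]
accuracy = classification_accuracy(example)

# Mean execution time and dataset duration reported in the text, in seconds.
real_time = is_real_time(109, 123)
time_budget_used = 109 / 123   # fraction of the available time spent computing
```

With the reported values, the program uses roughly 89% of the available time, which leaves a margin of about 14 s over the 123 s of input data.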

Detection Robustness
The results in Table 4 and Figures 9 and 10 were obtained using test datasets recorded under the same conditions as the training datasets. Consequently, they may not adequately demonstrate the effectiveness of the proposed leak detection system, because a real gas pipeline network is always subject to irregular disturbances that modify the AE signals, such as variations in the operating mode (inner pressure or flow rate) and noise interference. The measurement of an AE sensor can be modelled as follows:

$$z = x + \eta \qquad (6)$$

where z and x are the measured and original signals, respectively, and η represents any signal modification, including ambient noise and discrete events. We assume that both x and η are normally distributed. According to the probability rule specified in [30], and treating x and η as independent, z is also normally distributed, with the following mean and standard deviation:

$$\mu_z = \mu_x + \mu_\eta, \qquad \sigma_z = \sqrt{\sigma_x^2 + \sigma_\eta^2} \qquad (7)$$

where $\mu_z$, $\sigma_z$, $\mu_x$, $\sigma_x$, $\mu_\eta$, and $\sigma_\eta$ are the means and standard deviations of z, x, and η, respectively. Equation (7) shows that an abnormal disturbance distorts the original signals and thus degrades the signal-based leak detection model. To verify the robustness of the proposed leak detection method, we add white noise to the experimental datasets before conducting the real-time leak detection on the 32F746G-DISCOVERY board. This noise plays the role of the signal disturbance η and is simulated by an available function in the Matlab software according to the following rule:

$$\mu_\eta = 0, \qquad \sigma_\eta = \rho\,\sigma_{xn} \qquad (8)$$

where $\sigma_{xn}$ is the standard deviation of the normal-state signal (acquired when the pipeline is healthy), and ρ is a proportion ratio. We set $\mu_\eta = 0$ in (8) because the mean of a signal is mainly related to its low-frequency components, while the operating frequency range of the R15i sensors is from 50 kHz to 400 kHz. The low-frequency band (below 50 kHz) is not examined, and the influence of $\mu_\eta$ is therefore relatively minor, that is, $\mu_\eta \approx 0$. Figure 11 illustrates the signal distortion caused by adding white noise η according to (6) and (8) with ρ = 2.
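The additive-noise model can be checked numerically with a minimal Python sketch. It rests on several assumptions: the noise is taken to be zero-mean Gaussian with standard deviation ρ times $\sigma_{xn}$, the "signal" is a synthetic Gaussian stand-in rather than a recorded AE dataset, and Python's `random.gauss` replaces the Matlab function used in the paper. The helper name `add_white_noise` is hypothetical.

```python
import math
import random
import statistics


def add_white_noise(x, sigma_xn, rho, rng):
    """Return z = x + eta, with eta ~ N(0, (rho * sigma_xn)^2) per sample."""
    sigma_eta = rho * sigma_xn
    return [sample + rng.gauss(0.0, sigma_eta) for sample in x]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic stand-in for a normal-state AE record: zero-mean Gaussian samples.
x = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
sigma_xn = statistics.pstdev(x)

rho = 2.0
z = add_white_noise(x, sigma_xn, rho, rng)

# Compare the measured spread of z with the value predicted for the sum of
# two independent normal variables.
sigma_z_measured = statistics.pstdev(z)
sigma_z_predicted = math.sqrt(sigma_xn ** 2 + (rho * sigma_xn) ** 2)
```

Under these assumptions, the variance of the distorted signal is about (1 + ρ²) times that of the original, so ρ = 2 raises the signal energy roughly fivefold.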
Figure 11. A signal after adding white noise with ρ = 2.

As Figure 11 shows, the energy of the distorted signal is greater than that of the original because of the added noise. We vary ρ and observe the performance deterioration of the trained classifier. Figure 12 shows the dependence of the receiver operating characteristic (ROC) and the average classification accuracy (ACA) on ρ. The computation is carried out on all the datasets of the two sensor channels in two cases: with updating $\mu_{yn}$ and $\sigma_{yn}$ (see Section 4.2.2) and without updating. The classification performance declines substantially even at small values of ρ if we do not adapt the model to the added white noise (see Figure 12a and the blue dash-dot line in Figure 12c). In contrast, the classifier still works acceptably up to ρ = 70 if we adjust $\mu_{yn}$ and $\sigma_{yn}$ (see Figure 12b and the red solid line in Figure 12c). With ρ = 10, the resulting classification accuracy is above 90% (see Figure 12c), and the pipeline state is correctly identified by the ALEOR with a threshold γ = 10, as shown in Figure 13 for every experimental condition. In short, the proposed methodology can ensure the robustness of the leak detection system.

Although the proposed method sustains a high classification performance for small values of ρ, the performance still deteriorates gradually as ρ increases, and the classifier cannot operate reliably with ρ > 70, which causes severe distortion of the acquired signals. Therefore, the testbed should be configured to resemble the target pipeline before the datasets for training the classifier are gathered, so that an adequate leak detector is obtained. The greater the similarity between the testbed and the real pipeline, the more accurate the detection is.
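The benefit of updating $\mu_{yn}$ and $\sigma_{yn}$ can be sketched in Python as follows. This is a simplified illustration and not the procedure of Section 4.2.2: it assumes that updating means re-estimating the normal-state feature statistics from data recorded in the noisy environment, it uses the frame RMS as a single example feature, and it replaces AE recordings with synthetic Gaussian frames. All function names are hypothetical.

```python
import math
import random
import statistics


def rms(frame):
    """Root-mean-square value of one signal frame (an example feature)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def normal_state_stats(frames):
    """Mean and standard deviation of the feature over normal-state frames."""
    values = [rms(f) for f in frames]
    return statistics.mean(values), statistics.pstdev(values)


def normalize(value, mu_yn, sigma_yn):
    """Z-score a feature value against the normal-state statistics."""
    return (value - mu_yn) / sigma_yn


def make_frames(rng, sigma, n_frames=200, frame_len=256):
    """Synthetic stand-in for AE frames: zero-mean Gaussian samples."""
    return [[rng.gauss(0.0, sigma) for _ in range(frame_len)]
            for _ in range(n_frames)]


rng = random.Random(1)
rho = 2.0
clean = make_frames(rng, sigma=1.0)                        # training conditions
noisy = make_frames(rng, sigma=math.sqrt(1.0 + rho ** 2))  # noisy environment

mu_old, sd_old = normal_state_stats(clean)
mu_new, sd_new = normal_state_stats(noisy)

# Normal-state frames from the noisy environment, scored with stale and with
# updated statistics.
stale = statistics.mean(normalize(rms(f), mu_old, sd_old) for f in noisy)
updated = statistics.mean(normalize(rms(f), mu_new, sd_new) for f in noisy)
```

With the stale statistics, normal-state frames land many standard deviations away from zero and would resemble abnormal observations; with the updated statistics they are centered at zero again, which is the effect the adaptation relies on.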

Conclusions
This paper presented a complete system for real-time gas pipeline leak detection. First, recorded AE signals sampled at 1 MHz were analyzed offline. This process built a hybrid feature pool and normalized its elements using the mean and standard deviation of the feature observations related to the normal pipeline state. Then, the pool was purified using the three-sigma rule and the Kullback-Leibler distance to obtain the most discriminative signatures. Next, the system identified the pipeline health state (normal/leaking) from an input feature vector by exploiting a k-nearest neighbor classifier that searches the purified feature pool for the signatures closest to the input vector, based on the Manhattan distance. To avoid issuing false alarms, the system decided the pipeline state by monitoring the accumulative leaking event occurrence rate against a predefined threshold. Finally, the entire leak detection method was embedded in a compact MCU-based hardware platform for real-time leak detection. The detection accuracy, the real-time characteristic, and the robustness of the system were evaluated. The experimental results showed that the system indicated pipeline health states robustly and quickly enough for real-time application. Thus, the system can be applied to inspecting pipeline health in a real gas pipeline network.
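The classification step summarized above can be sketched in a few lines of Python. This is a generic k-NN classifier with the Manhattan distance, not the embedded implementation: the value k = 3, the two-dimensional feature vectors, and the pool contents are hypothetical choices made only for illustration.

```python
from collections import Counter


def manhattan(a, b):
    """Sum of absolute coordinate differences between two feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def knn_predict(pool, query, k=3):
    """Majority label among the k pool entries closest to `query`.

    `pool` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(pool, key=lambda item: manhattan(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical normalized two-feature pool: normal frames cluster near the
# origin, leaking frames lie farther away.
pool = [
    ((0.1, -0.2), "normal"), ((-0.3, 0.0), "normal"), ((0.2, 0.3), "normal"),
    ((4.0, 3.5), "leak"), ((3.6, 4.2), "leak"), ((4.4, 3.9), "leak"),
]

print(knn_predict(pool, (0.0, 0.1)))   # normal
print(knn_predict(pool, (3.9, 4.0)))   # leak
```

The Manhattan distance needs only subtractions, absolute values, and additions, which may be part of its appeal on an MCU, although the paper does not state this motivation in the present section.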
The testbed used in this paper for collecting AE signals is part of a real gas pipeline network. Hence, the resulting AE signals are not simple signals generated by simulating a pipeline leak in a laboratory. They not only contain information about the pipeline state (normal or leaking) but also depend on practical gas transportation and system behavior. Additionally, a noisy measurement location and wave attenuation could conceal symptoms of leakage in the recorded signals. This complicates the signal investigation because the relation between the leakage phenomenon and the AE signals is unclear in the initial analysis stages. Therefore, a short pipeline was chosen in this paper so that the signal classes related to the pipeline states in the different experimental scenarios could be separated easily, which made it convenient to propose a leak detection method and to evaluate the experimental results. However, we believe that the proposed technique can effectively monitor a long pipeline in a real application. The pipeline length that can be monitored depends on the signal detection ability of the AE sensors, that is, on their sensitivity and the specific working environment. These parameters can be estimated using pencil lead break tests.