Leak Detection in Gas Mixture Pipelines under Transient Conditions Using Hammerstein Model and Adaptive Thresholds

Conventional leak detection techniques require improvement to detect small leaks (<10%) in gas mixture pipelines under transient conditions. The current study aims to detect leakage in gas mixture pipelines under pseudo-random boundary conditions with a zero percent false alarm rate (FAR). Pressure and mass flow rate signals at the pipeline inlet were used to estimate the mass flow rate at the outlet under leak-free conditions using a Hammerstein model. These signals were further used to define adaptive thresholds that separate leakage from normal conditions. Unlike past studies, this work successfully detected leakage under transient conditions in an 80-km pipeline. The leak detection performance of the proposed methodology was evaluated for several leak locations, varying leak sizes, and various signal-to-noise ratios (SNR). A leakage of 0.15 kg/s (3% of the nominal flow) was successfully detected under transient boundary conditions with an F-score of 99.7%. Hence, the proposed methodology shows high potential to avoid false alarms and detect small leaks under transient conditions. In the future, the methodology may be extended to locate the leak and estimate its size.


Introduction
Piping systems have been found to be the fastest and most economical means of transporting oil and gas [1]. Unfortunately, pipelines are not immune to faults such as leakage and blockage, which result in huge losses [2,3]. For instance, in September 2010, an ageing gas pipeline in San Bruno, California, exploded due to leakage, resulting in 8 fatalities, 58 injuries and losses of around 14 million dollars [4]. Moreover, leakage from natural gas pipelines is the largest anthropogenic source of CH4 emissions in the USA and the second largest globally, and it contributes significantly to global warming [5]. Therefore, timely and accurate fault detection and diagnostics (FDD) in pipelines is crucial to ensure the safety of people, assets, and the environment.
According to the comprehensive review by Venkat et al. [6], various FDD techniques have been reported in the literature. Figure 1 presents an updated (brief) classification of leak detection techniques. Pipeline leak detection techniques can be broadly classified into hardware-based and software-based methods [7][8][9]. Hardware-based methods require the installation of external sensors. Various software-based techniques, such as fuzzy systems [17], support vector machines [18][19][20][21], neural networks [22][23][24][25], statistical methods [26][27][28], and transient models [29][30][31], have been applied in fault detection studies. To draw a clear picture of the current challenges in pipeline fault detection and diagnostics, selected studies are summarized in Table 1. Four critical issues related to leak detection in gas mixture pipelines are highlighted below.

1. Few studies incorporate the pipeline dynamics caused by transients. The studies that did consider system transients relied on step transients only [14,31,32]. However, for nonlinear systems, the excitation signals should cover the full range of amplitude and frequency in order to capture all possible system dynamics [33].
2. According to Pan et al. [14], most leak detection studies assume ideal gas conditions; a similar observation can be made from Table 1. For instance, Tiantian et al. [31] and Shouxi and Carroll [34] treated the gas as an incompressible fluid.
3. Table 1 also shows that most studies considered short pipelines, i.e., with a length of 10 km or less, whereas gas pipelines are usually longer [35,36].
4. Effects due to thermal changes are also ignored in previous studies; Table 1 shows that 3 out of 10 studies assumed a constant temperature along the pipeline length.

In this work, the potential of the system identification technique for leak detection in gas mixture pipelines is tested. System identification has two main advantages: it requires less training data than black-box models [39], and, as opposed to other data-driven techniques, the physical meaning of the system can be easily interpreted [33], which is essential for implementing a proposed methodology in real systems. The core objective of this study is to improve leak detection in gas mixture pipelines under transient conditions. The main contributions of this work are as follows:

1. The transient, compressible and non-isothermal flow of natural gas in a pipeline is modeled using the OLGA simulator to generate the data needed for designing, validating and testing the proposed leak detection system.
2. For the leak detection study, the mass flow rate at the pipeline inlet is designed as an amplitude modulated pseudo-random binary signal. The inlet mass flow rate and pressure signals are used to estimate the outlet mass flow rate using the Hammerstein model.
3. Adaptive thresholds are defined to monitor the pipeline outlet mass flow rate for leak detection under transient conditions.
4. The effects of leak location, leak size, and signal-to-noise ratio on leak detection performance are investigated using standard performance measures.

Proposed Leak Detection Methodology
The proposed architecture for leak detection is shown in Figure 2. It can be divided into four main steps: case study, model identification, adaptive threshold calculation, and leak detection. In the case study, data for training, validation and testing are generated based on the design of experiment (DOE). The data are then used for model identification (training). After that, the identified model is cross-validated against unseen boundary conditions. Finally, testing is performed using various sets of leakage data. The following subsections describe each step of the leak detection algorithm in detail.

Case Study for Data Generation
Data for training may be acquired through a supervisory control and data acquisition (SCADA) system (physical sensors) or from mathematical models of the pipeline (virtual sensors) [19]. In this study, pipeline data are generated with the OLGA simulator, which is based on transient mathematical models and has been used in several studies [24,40,41]. Transient conditions in actual pipelines arise for various reasons, such as varying customer demand, the compressibility of the gas mixture, changes in atmospheric conditions, dynamic friction factor, line shutdown, start-up, and compressor surges [42]. To study such systems, transients can be generated artificially by imposing transient signals at the pipeline boundaries (Xu and Karney [43]). These transients can be generated using step, impulse, and pseudo-random signals. In this study, an amplitude modulated pseudo-random binary signal (APRBS) of mass flow rate is imposed at the pipeline inlet to induce system transients, as defined by Deflorian and Zaglauer [44].
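The idea of an APRBS can be illustrated with a minimal Python sketch: the signal holds a random amplitude for a random number of samples and then jumps to a new level. This is not the construction of Deflorian and Zaglauer [44]; the function name `aprbs`, the hold-time limits and the amplitude range are hypothetical values chosen only for illustration.

```python
import random

def aprbs(n_samples, min_hold, max_hold, low, high, seed=0):
    """Return n_samples values that jump to a new random level after a
    random hold time (amplitude modulated pseudo-random binary signal)."""
    rng = random.Random(seed)
    signal = []
    while len(signal) < n_samples:
        hold = rng.randint(min_hold, max_hold)  # samples to hold this level
        level = rng.uniform(low, high)          # amplitude of this step
        signal.extend([level] * hold)
    return signal[:n_samples]

# Assumed settings (not from the paper): 9000 samples, levels between
# 0.8 and 1.2 kg/s, each level held for 30 to 300 samples.
inlet_flow = aprbs(n_samples=9000, min_hold=30, max_hold=300, low=0.8, high=1.2)
```

Varying both the hold time and the amplitude is what lets the excitation cover a range of frequencies and operating points, in contrast to a single step transient.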

System Model Identification for Normal Conditions (Training)
Mass flow rate and pressure measurements at the pipeline inlet are used as inputs, and outlet mass flow rate values are used as the output, for model identification using the Hammerstein model. The model parameters are estimated using the least-squares method (LSM) in MATLAB 2019b. Several pipeline models are estimated using different numbers of parameters. The theoretical background of the Hammerstein model is explained in the following section.

Stochastic Hammerstein Model
A block diagram of the single-input, single-output (SISO) Hammerstein model is shown in Figure 3, where u(k) and y(k) are the measured input and the measured output at time step k. The Hammerstein model is composed of a nonlinear function followed by a linear block, as shown in Figure 3. The linear part B(q^-1)/A(q^-1) can also be termed the memory, because it uses the previous memory of the system when the model parameters are estimated. The nonlinear part may be selected from a variety of available functions, for example quadratic, cubic, sigmoid, or wavelet functions. For a quadratic function, the Hammerstein model can be written as Equation (1) [45]. Here, A(q^-1) and B(q^-1) are referred to as the memory portions of the output and input measurements, respectively, and can be written as Equations (2)-(5). The above model is nonlinear in the parameters; thus, it requires nonlinear optimization. To avoid this, a generalized form of the Hammerstein model in prediction form can be written as Equation (5) [45], where g_o, g_1i, g_2i are the linearized parameters of the Hammerstein model and ŷ(k) is the predicted output.
The above model is deterministic, as it does not consider any noise in the process. When a stochastic model is considered, a random noise function e(k) is added to the data (Equation (6)). A common practice is to add white noise [46]. In this study, white noise of 0% to 0.5% is added to the mass flow rate and pressure signals.
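For orientation, one prediction form that is consistent with the parameter names g_o, g_1i and g_2i and with a quadratic nonlinearity is sketched below. It is an assumption and not a transcription of Equations (5) and (6); in particular, the lag range i = 1, ..., n_b and the additive placement of the noise term e(k) are illustrative choices.

```latex
% Assumed SISO prediction form with a quadratic nonlinearity
\hat{y}(k) = g_o + \sum_{i=1}^{n_b} g_{1i}\, u(k-i) + \sum_{i=1}^{n_b} g_{2i}\, u^{2}(k-i)

% Assumed stochastic counterpart with additive white noise
y(k) = \hat{y}(k) + e(k)
```

Written this way, the model is linear in g_o, g_1i and g_2i even though it is nonlinear in the input u, which is why an ordinary least-squares fit can be used in place of nonlinear optimization.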

Parameter Estimation Using LSM
For multiple inputs (pressure P_in and mass flow rate M_in at the inlet) and a single output (mass flow rate M_out at the outlet), the Hammerstein model for time step k can be written as Equation (7). For a range of time steps, it can be written in matrix form as Equation (8). Here, g_o, g_11, ..., g_1nb, g_21, ..., g_2nb, g_31, ..., g_3nb, g_41, ..., g_4nb are the linearized parameters of the Hammerstein model associated with the inlet mass flow rate and pressure data, giving 4nb + 1 parameters in total. Memory points with a zero or negative time index are set to zero. For simplicity, the mass flow rate and pressure terms and their respective parameters can be represented as A, B, C, D and g_o, g_1A, g_1B, g_1C, g_1D, so that Equation (8) reduces to Equation (9).
All the memory points and parameters are combined in Equation (10) to form an augmented matrix. With the model parameters collected in a single vector, the model can be written as Equation (11). According to Ljung [46], the system parameters can be readily estimated using the least-squares method (LSM); Equations (12) and (13) are the formulations used to estimate the parameters by LSM.
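The estimation step can be sketched in pure Python under explicit assumptions: each row of the augmented matrix is taken to hold a bias term plus n_b lags of M_in, M_in squared, P_in and P_in squared (so 4nb + 1 columns), and the parameters are obtained from the normal equations. The helper names `build_row`, `solve` and `least_squares`, the normalised synthetic signals and the "true" parameters are all hypothetical; the paper's own Equations (10)-(13) are not reproduced here.

```python
import random

def build_row(m_in, p_in, k, nb):
    """Regressor row for step k: bias plus nb lags of M, M^2, P, P^2.
    Samples before the start of the record are taken as zero."""
    def lag(x, i):
        return x[k - i] if k - i >= 0 else 0.0
    row = [1.0]
    row += [lag(m_in, i) for i in range(1, nb + 1)]
    row += [lag(m_in, i) ** 2 for i in range(1, nb + 1)]
    row += [lag(p_in, i) for i in range(1, nb + 1)]
    row += [lag(p_in, i) ** 2 for i in range(1, nb + 1)]
    return row

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def least_squares(u, y):
    """Least-squares estimate theta = (U^T U)^-1 U^T Y via normal equations."""
    n = len(u[0])
    utu = [[sum(r[i] * r[j] for r in u) for j in range(n)] for i in range(n)]
    uty = [sum(r[i] * yk for r, yk in zip(u, y)) for i in range(n)]
    return solve(utu, uty)

# Synthetic check with made-up, normalised signals (nb = 1, so 5 parameters)
rng = random.Random(1)
m = [rng.uniform(-1.0, 1.0) for _ in range(200)]
p = [rng.uniform(-1.0, 1.0) for _ in range(200)]
true_theta = [0.5, 2.0, 0.1, -1.0, 0.05]
U = [build_row(m, p, k, nb=1) for k in range(1, 200)]
Y = [sum(t * x for t, x in zip(true_theta, row)) for row in U]
theta = least_squares(U, Y)
```

For the parameter counts considered in this study (up to 4801), a numerical library with a dedicated least-squares routine would be preferable to this hand-written solver, which is meant only to make the structure of the regression explicit.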

Adaptive Thresholds-Based Leak Detection (ATBLD)
For cross-validation of the estimated model, new data points (unseen boundary conditions) are arranged in the form of the augmented matrix using Equation (10). The predicted outlet mass flow rate is then compared with the actual mass flow rate through the modeling errors. The predicted mass flow rate Ŷ_New is determined from Equation (14), and the modeling estimation errors are calculated from Equation (15).
Thresholding is the drawing of the boundary that separates normal conditions from faults. Here, the thresholds for normal conditions are defined using the model predictions of the mass flow rate. They are calculated based on the concept of standard deviation, in which a percentage acceptance region is defined for the monitored variable. Because this study considers transient behavior, fixed thresholds are modified into adaptive thresholds, whose values are updated at each data point according to the input boundary conditions. The adaptive thresholds can be written as Equations (16) and (17) [47], where
t_(α, N_d − n_θ − 1) is the value of Student's t-distribution for an α × 100% acceptance region,
N_d is the total number of data points,
n_θ is the total number of parameters,
U is the augmented matrix of input data,
U_New is the augmented matrix of new/validation data,
Th(k) (upper bound) is the upper limit of the mass flow rate at the outlet,
Th(k) (lower bound) is the lower limit of the mass flow rate at the outlet,
Ŷ_New(k) is the estimated value of the mass flow rate at the outlet.
The parameters from the training data are used to estimate the outlet mass flow rate for leak detection using Equation (14), given that the mass flow rate and pressure are available at the pipeline inlet. When a leak occurs, the actual mass flow rate at the outlet starts to violate the threshold limits, and the leakage is detected. For each observation k, the violations can have different amplitudes; these amplitudes are converted into a binary signal using Equation (18).
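The conversion of threshold violations into a binary signal can be sketched as follows. Equations (16)-(18) are not reproduced; the sketch assumes the adaptive limits are symmetric about the prediction and that their half-width at each step has already been computed. The function names and all numbers are hypothetical.

```python
def adaptive_bounds(predicted, half_width):
    """Lower and upper limits that move with the prediction at each step."""
    lower = [y - h for y, h in zip(predicted, half_width)]
    upper = [y + h for y, h in zip(predicted, half_width)]
    return lower, upper

def alarm_signal(measured, lower, upper):
    """Return 1 where the measurement leaves [lower, upper], else 0."""
    return [1 if (y < lo or y > hi) else 0
            for y, lo, hi in zip(measured, lower, upper)]

# Made-up values in kg/s: the last two measurements fall below the limits
predicted = [1.00, 1.02, 0.98, 1.01]
half_width = [0.03, 0.03, 0.03, 0.03]
measured = [1.01, 1.00, 0.93, 0.95]

lower, upper = adaptive_bounds(predicted, half_width)
flags = alarm_signal(measured, lower, upper)
```

Because the limits follow the predicted flow rather than a fixed set point, a legitimate change in the inlet boundary conditions shifts the acceptance band instead of raising an alarm.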

Performance Measures
For fault detection studies based on model identification, it is essential to quantify the leak detection performance. The ability of a method to separate faulty from normal conditions may vary with model estimation errors, leak size, leak location, and signal-to-noise ratio. Various indicators for testing the performance of a leak detection system are explained by the American Petroleum Institute [48]; these performance measures can be calculated according to the definitions by Han et al. [49]. In this study, accuracy (Ac) or recognition rate, error rate (ER), sensitivity (Se) or recall, specificity (Sp), precision (Pr), false alarm rate (FAR), F-score (FS), and leak detection time (LDT) were calculated. Table 2 lists the mathematical definitions of these indices.

Table 2. Indicators used to evaluate the performance of leak detection system.
Performance Measure Formula
Accuracy (percentage of correct classification)

In this study, the performance of the proposed leak detection algorithm was tested for three leak locations: 10 km (near the pipeline inlet), 45 km (close to the midpoint), and 70 km (near the outlet), using various numbers of parameters (41 to 4801). The effect of increasing noise from 0% to 0.5% was also analyzed. Additionally, leakage of 1% to 5% of the nominal flow (0.01 kg/s to 0.05 kg/s) was tested.
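The indices named above can be computed from confusion-matrix counts. The sketch below uses the standard textbook definitions, which are assumed (not confirmed) to match the formulas in Table 2; in particular, FAR is taken here as the fraction of normal samples flagged as leaks. The function name and the example counts are hypothetical, and leak detection time is not covered.

```python
def detection_metrics(tp, tn, fp, fn):
    """Classification indices from confusion-matrix counts.
    tp/fn: leak samples detected/missed; tn/fp: normal samples
    correctly passed/falsely flagged."""
    positives = tp + fn
    negatives = tn + fp
    total = positives + negatives
    sensitivity = tp / positives
    specificity = tn / negatives
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "far": fp / negatives,  # assumed definition: 1 - specificity
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Made-up counts: 100 leak samples (10 missed), 100 normal samples (none flagged)
metrics = detection_metrics(tp=90, tn=100, fp=0, fn=10)
```

With these definitions, a specificity of 100% implies a FAR of 0%, which is how the two quantities are used together in the results.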

OLGA Model Validation
A transient, one-dimensional, non-isothermal, compressible flow was simulated with the OLGA simulator to generate data for gas mixture flow in pipelines. Before the FDD study, experimental data from Taylor et al. [50] were used to validate the developed model. The benchmark pipeline has a nominal diameter of 8.15 inches (0.20701 m), a length of 44.9 miles (72,259.5 m) and a roughness of 0.617 mm. The simulated gas mixture has a specific gravity of 0.6962 at 15 °C (288.15 K). The OLGA model was run for 24 h with the pipeline discretized into 371 nodes. The inlet pressure was held constant at 4205 kPa, while the outlet mass flow rate varied with time following the trend in Figure 4.
The pipeline outlet pressure simulated by the OLGA model is shown in Figure 5. The developed model is in good agreement with the experimental results [50] and with other simulation studies [51][52][53]. The outlet pressure remains constant at the start and then rises steadily while the mass flow rate is steadily decreased. There was a delay of around 1.8 h between the maximum pressure and the minimum mass flow rate; this delay was due to the inertia effect [53]. Similarly, when the mass flow rate was increased to its maximum, the pressure decreased and reached its lowest value at around 15.2 h. Compared with the experimental data, the simulation results follow a similar trend throughout. At around 8.5 h, both the experimental and simulated pressures reach a maximum of around 2550 kPa.
After 16 h, the numerical study shows a gradual increase in pressure, while the measured pressure was almost constant. This discrepancy was due to the uncertainty in the measured data after 16 h. As can be observed, the mass flow rate at the boundary (Figure 4) suddenly becomes constant from 18 h onwards, which was very difficult for the sensors to measure because of their limited precision, resulting in uncertain pressure measurements.

Case Study
A case was developed to generate the mass flow rate data required for model training, validation, and testing. Amplitude modulated pseudo-random binary signals (APRBS) of mass flow rate were used as the design of experiment (DOE) at the pipeline inlet. Pressure, mass flow rate, and temperature were recorded at the pipeline inlet and outlet at an interval of 10 s. Simulations were run for 50 h: the first 25 h under constant boundary conditions to reach stable conditions, and the last 25 h under transient conditions. For the testing case, a 5% leakage was introduced after 30 min. Other parameters used in the study are listed in Table 3.
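As a quick check of the data volume implied by these settings, the number of samples follows directly from the run length and the 10 s recording interval. The helper name `n_samples` is hypothetical, and reading the 25 h transient window as the source of the training set is an inference from the sample count, not a statement from the study.

```python
SAMPLE_INTERVAL_S = 10  # recording interval stated in the case study

def n_samples(hours, interval_s=SAMPLE_INTERVAL_S):
    """Number of recorded samples in a run of the given length."""
    return hours * 3600 // interval_s

transient_samples = n_samples(25)  # last 25 h, transient boundary conditions
total_samples = n_samples(50)      # full 50 h simulation
```

The 25 h transient window gives 9000 samples, the same size as the training data set used for model identification.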
Several aspects need to be considered before applying the proposed technique to other pipelines, for instance gas composition, pipeline boundary conditions, system and sensor noise, pipeline length, and pipeline roughness. The results obtained in this study are specific to the conditions stated above. If the pipeline conditions and parameters differ, the design of experiment, the number of estimated parameters and the confidence interval of the adaptive thresholds need to be tuned accordingly.

Pipeline Model Identification and Validation
A training data set of 9000 measurements is used to estimate the system parameters using LSM (Equation (13)). These parameters are estimated offline and are then used to predict the pipeline outlet mass flow rate using Equation (14). Figure 6 presents the pipeline model identification results using 1201 parameters. Figure 6a shows the pipeline inlet mass flow rate and pressure under transient conditions; these measurements are used as the model input. Figure 6b presents the actual and estimated mass flow rate at the pipeline outlet (training); the flow rates estimated by the Hammerstein model are almost the same as the actual measurements. Figure 6c presents the errors between the actual and estimated outlet mass flow rate; the error fluctuates between −0.05 and 0.05 kg/s, with a root mean square error (RMSE) close to zero (0.0147). Similarly, Figure 7 presents the cross-validation results of the estimated model. The trained model accurately predicts the mass flow rate for new boundary conditions, with an RMSE of 0.0129.
Figure 8 shows the adaptive control limits used to monitor the mass flow rate under transient conditions. The mass flow rate measurements lie within the upper and lower bounds of the adaptive thresholds, indicating normal conditions. However, these thresholds are violated after 30 h (time of leak), indicating a faulty state, as shown in Figure 9. A leakage of 5% was introduced in the pipeline, and the limits were significantly violated. The smallest detectable leak in the study was 1%, but with very low accuracy.
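The RMSE figures quoted for training and cross-validation follow the usual definition, the square root of the mean squared difference between actual and estimated values. A minimal sketch with made-up numbers (not data from this study):

```python
import math

def rmse(actual, estimated):
    """Root mean square error between two equally long sequences."""
    squared = [(a - e) ** 2 for a, e in zip(actual, estimated)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical outlet flow values in kg/s
actual_flow = [1.00, 1.02, 0.98, 1.01]
estimated_flow = [1.01, 1.01, 0.99, 1.00]
error = rmse(actual_flow, estimated_flow)
```

Because the RMSE carries the units of the monitored variable, it can be compared directly with the width of the adaptive thresholds and with the size of the leak to be detected.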

Performance Evaluation of Fault Detection System
To test the performance of the proposed system, various performance indicators are calculated while changing the number of parameters, leak location, leak size, and percentage noise, as mentioned in Section 2.4.

Effect of Several Parameters and Leakage Locations on Fault Detection
Fault detection performance indices are calculated for various numbers of parameters, i.e., 41, 81, 201, 401, 801, 1201, 1601, 2001, 2401, 2801, 3601 and 4801. To incorporate process noise, 0.2% white noise was added to the signals, and a leakage of 0.05 kg/s (5% of nominal flow) was assumed for all cases. Table 4 presents the performance of the proposed technique for a leak near the inlet (10 km from the inlet) using various numbers of parameters. Tables 5 and 6 present the results for leaks at 45 km (near the midpoint) and 70 km (near the outlet), respectively.
Overall, the accuracy, sensitivity and F-score of leak detection increased with the number of parameters. This trend held for the leaks at 45 km and 70 km, but for the leak at 10 km the accuracy started to decline once the number of parameters exceeded 1000. This drop in performance was due to the small leak size and the long pipeline. When a small leak occurs near the inlet, only a weak pressure signal is received at the outlet (as pressure decreases with increasing length). With a high number of parameters, these weak fault signals are mixed with system transients and noise, which reduces the leak detection performance. The error rate and LDT show the opposite trend to accuracy. The specificity and precision of the system were 100% for up to 3601 parameters (0% FAR); when the number of parameters was increased beyond 3601, the specificity fell below 100%, raising false alarms.
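The parameter counts tested above all have the form 4nb + 1, matching one constant plus four groups of nb lagged terms. The sketch below converts each count into the implied memory length nb and, assuming each lag corresponds to one 10 s sample (an assumption, not stated in the study), into the time window the model looks back over. The helper name `memory_length` is hypothetical.

```python
SAMPLE_INTERVAL_S = 10  # assumed spacing between lagged samples
PARAMETER_COUNTS = [41, 81, 201, 401, 801, 1201,
                    1601, 2001, 2401, 2801, 3601, 4801]

def memory_length(n_params):
    """Lags per regressor group, assuming n_params = 4 * nb + 1."""
    nb, remainder = divmod(n_params - 1, 4)
    if remainder:
        raise ValueError("parameter count is not of the form 4*nb + 1")
    return nb

# Memory window in minutes for each tested parameter count
memory_minutes = {n: memory_length(n) * SAMPLE_INTERVAL_S / 60
                  for n in PARAMETER_COUNTS}
```

Under this reading, 1201 parameters correspond to 300 lags, or a 50 min window, and 4801 parameters to a 200 min window, which gives a feel for how much past inlet behavior each model uses.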

Selection of Parameters
The number of parameters can be chosen by considering attributes such as leak detection performance, leak detection time, computational time and FAR. Selecting the best solution requires a trade-off among these attributes. First, for leak detection performance, the average F-score for each number of parameters across the leak locations is compared in Figure 11. The highest average F-scores (94.14% and 94.64%) were achieved with 1201 and 2001 parameters, respectively. The average F-score is calculated from the individual F-scores of leak detection at 10 km, 45 km, and 70 km. The computational time required with 2001 parameters was approximately double that with 1201 parameters, while the F-score was almost the same. From Tables 4-6, it can be observed that the leak detection time was much shorter for higher numbers of parameters than for lower ones. There was therefore a trade-off between computational time and detection time. If detection time is the top priority, a higher number of parameters should be used; if computational power is limited, a lower number of parameters is suggested. The average difference in detection time between 1201 and 2001 parameters was around 4.8 min.

Figure 11. F-Score of 5% leak detection using various parameters at different locations.
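The trade-off described above can be written as a simple selection rule. The following is only a sketch under stated assumptions: the average F-scores, the roughly doubled computational time and the 4.8 min detection-time difference are taken from the text, while the `Candidate` structure, the `select` function and the 1-percentage-point F-score tolerance are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    n_params: int
    avg_f_score: float          # average F-score over leak locations, %
    rel_compute_time: float     # computational time relative to 1201 parameters
    extra_detection_min: float  # detection time above the fastest candidate, min


# Values quoted in the text; compute time is "approximately double" for 2001.
CANDIDATES = [
    Candidate(n_params=1201, avg_f_score=94.14, rel_compute_time=1.0, extra_detection_min=4.8),
    Candidate(n_params=2001, avg_f_score=94.64, rel_compute_time=2.0, extra_detection_min=0.0),
]


def select(candidates, priority, f_score_tolerance=1.0):
    """Shortlist candidates with near-best F-score, then apply the priority.

    f_score_tolerance (percentage points) is an assumed threshold for
    treating two F-scores as "almost the same".
    """
    best = max(c.avg_f_score for c in candidates)
    shortlist = [c for c in candidates if best - c.avg_f_score <= f_score_tolerance]
    if priority == "detection_time":
        return min(shortlist, key=lambda c: c.extra_detection_min)
    if priority == "compute":
        return min(shortlist, key=lambda c: c.rel_compute_time)
    raise ValueError(f"unknown priority: {priority!r}")


print(select(CANDIDATES, "detection_time").n_params)
print(select(CANDIDATES, "compute").n_params)
```

Because both candidates fall within the assumed tolerance, the choice is decided entirely by the stated priority: faster detection favors 2001 parameters, limited computing power favors 1201.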

Effect on Fault Detection by Increasing Noise
High noise in the signals notably increased the leak detection time compared with noise-free data. As shown in Table 7, when the noise was 0%, the leak detection time was only 10.2 s. When the noise was increased to 0.1%, the detection time rose sharply to 3 min, while the F-score was maintained. With noise of up to 0.5%, the F-score remained above 99.5%, but the detection time grew longer as the noise increased. Across noise levels from 0 to 0.5%, the average leak detection F-score was around 99.8% and the average detection time was around 2.8 min.

The performance results of the adaptive thresholds-based leak detection (ATBLD) for various leak sizes are given in Table 8. Leaks of 3% and larger were detected with an F-score above 99.5%; for leaks smaller than 3%, the F-score started to decrease, while specificity remained at 100% (i.e., 0% FAR). As the leak size increased, LDT was significantly reduced. For instance, the detection time for a 1% leak was 73.3 min, whereas a 2% leak was detected in only 8.83 min and a 5% leak in 3 min. Note that a leakage of 0.05 kg/s (1%) was equal to the maximum modeling error of the mass flow rate (Figure 6c). Thus, a 1% leak requires a significantly longer detection time than larger leaks.
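The link between leak size and modeling error can be made concrete with a short calculation. The sketch below assumes a nominal flow of 5 kg/s, which is implied by the statement that 0.05 kg/s corresponds to a 1% leak; the constant and function names are hypothetical and serve only this illustration.

```python
NOMINAL_FLOW_KG_S = 5.0      # assumption: implied by 0.05 kg/s being a 1% leak
MAX_MODEL_ERROR_KG_S = 0.05  # maximum mass flow modeling error quoted in the text


def leak_flow(percent, nominal=NOMINAL_FLOW_KG_S):
    """Leak mass flow rate (kg/s) for a leak given as a percentage of nominal flow."""
    return nominal * percent / 100.0


def leak_to_error_ratio(percent):
    """Ratio of leak mass flow to the maximum modeling error."""
    return leak_flow(percent) / MAX_MODEL_ERROR_KG_S


for pct in (1, 2, 3, 5):
    print(f"{pct}% leak: {leak_flow(pct):.2f} kg/s, "
          f"{leak_to_error_ratio(pct):.1f}x the maximum modeling error")
```

A ratio close to one means the leak signature is no larger than the residual the model can produce on its own, which is consistent with the much longer detection time reported for the 1% leak.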

Comparison between Proposed Methodology and Recent Studies
In Table 9, the proposed methodology is compared with other techniques reported in the literature. Several advantages and disadvantages related to fault detection are listed for each technique. The parameters that are important for fault detection studies are: type of fluid, length of the pipeline, boundary conditions, amount of data required, computational time/cost, missed and false alarms, leak detection time and accuracy.

Where ATBLD = adaptive thresholds-based leak detection; SVM = support vector machines; PCA = principal component analysis; IRF = impulse response function; ANN = artificial neural network; RTTM = real-time transient modeling.