An Integration Method Using Kernel Principal Component Analysis and Cascade Support Vector Data Description for Pipeline Leak Detection with Multiple Operating Modes

Abstract: Pipelines are one of the most efficient and economical methods of transporting fluids, such as oil, natural gas, and water. However, pipelines are often subject to leakage due to pipe corrosion, pipe aging, pipe weld defects, or third-party damage, resulting in huge economic losses and environmental degradation. Effective pipeline leak detection methods are therefore an important research issue for pipeline integrity management and accident prevention. Conventional pipeline leak detection methods generally need to extract features of the leak signal to establish a detection model, yet actual leakage signal samples are difficult to obtain in most applications. In addition, the operating modes of the pipeline fluid transportation process change frequently, for example through valve regulation and pump operation. To address these issues, this paper proposes a hybrid intelligent method that integrates kernel principal component analysis (KPCA) and cascade support vector data description (Cas-SVDD) for pipeline leak detection with multiple operating modes, using only leak-free data samples collected during pipeline operation. Firstly, the local mean decomposition method is used to denoise and reconstruct the measured signal and obtain the feature variables. Then, the feature dimension is reduced and the nonlinear principal components are extracted by the KPCA algorithm. Secondly, the K-means clustering algorithm is used to identify the multiple operating modes, from which multiple support vector data description models are trained to obtain the decision boundaries of the corresponding hyperspheres. Finally, pipeline leaks are detected with the Cas-SVDD method. The experimental results show that the proposed method can effectively detect small leaks and improve leak detection accuracy.


Introduction
Pipeline fluid transport is one of the safest, most efficient, and most energy-saving transportation methods, and plays an increasingly important role in the development of the national economy. However, pipelines are often subject to leakage due to pipe corrosion, pipe aging, pipe weld defects, or third-party damage [1]. In addition, pipeline fluids are often flammable, toxic, or corrosive media. Therefore, pipeline leakage not only leads to economic loss of products and resources, but also seriously pollutes the environment [2,3]. Pipeline safety management and accident prevention are increasingly required in countries around the world. Establishing an efficient pipeline leak detection system can report the occurrence and location of leak accidents in a timely manner, and minimize or even avoid economic losses. Among existing data-driven studies, Li et al. [11] specifically studied the leak detection of a water distribution system subject to failure of the socket joint. The acoustic characteristics of leak signals in the socket and spigot pipe segments were extracted and selected, and an artificial neural network was established as the classifier. More recently, an adaptive design was proposed that combined one-dimensional convolutional neural networks and a support vector machine [28]. This method enabled fast and accurate leak detection. Moreover, a graph-based localization algorithm was proposed to determine the leak location within a real water distribution system.
To this end, most of the published data-driven methods require extracting features from pipeline leak signal data samples to develop a classification or prediction model for leak detection. To ensure the accuracy and versatility of an algorithm, it is necessary to introduce leaks with different leak levels at different leak locations before the leak detection system operates. This is impossible and unrealistic in most applications [38]. An alternative suggestion is to use simulation techniques to generate leak samples for training for data-driven methods. However, due to the uncertainty and complexity of actual leaks, it is difficult to fully simulate the actual leakage signal with complete characteristics. Therefore, some researchers have introduced some leak detection methods that require only leak-free sample signals. For example, Wang et al. [32] extracted the time-domain statistical characteristics of the acoustic sensor from the normal (no leak) sample signal, and constructed the SVDD model, which was implemented in a field leak detection system.
In addition, frequent changes in operating modes, such as normal running, operating condition adjustment, and pump operation, are often encountered during the pipeline fluid transportation process. The training samples in the feature space of each operating mode are different in nature and unevenly distributed. Moreover, the characteristics of working condition adjustment and pump operation have certain similarities with those of pipeline leakage, resulting in a high false alarm rate for the pipeline leak detection system. As a result, the effective classification of pipeline operating modes is of great significance for improving the accuracy of the leak detection system.
In this paper, we present a novel integration method using kernel principal component analysis (KPCA) and cascade support vector data description (Cas-SVDD), namely, KPCA-Cas-SVDD, for pipeline leak detection with multiple operating modes. On the one hand, only the leak-free data samples are required from the actual running process, and the features are extracted and reconstructed by local mean decomposition (LMD). After that, KPCA is used to reduce the feature dimensions. On the other hand, based on the K-means clustering method, the various normal operating modes of the pipeline are classified, based on which the SVDD model is established for each operating mode. Subsequently, the Cas-SVDD method is used for pipeline leak detection, which greatly improves the leak detection accuracy. The rest of this paper is structured as follows. Section 2 presents the novel methodology for pipeline leak detection, including the basic algorithms of LMD, KPCA, K-means, and SVDD. Section 3 presents the background of the case study, data processing, and feature extraction. Section 4 consists of experimental results and discussion. Section 5 addresses the conclusion of this research and directions for future work.

LMD Based Signal Processing and KPCA for Feature Extraction
The local mean decomposition (LMD) was recently developed to analyze time series signals with nonlinear and non-stationary features, and can adaptively denoise and extract features from the original signals [39]. The LMD adaptively decomposes the original signal into pure frequency-modulated signals and envelope components of different magnitudes, and then obtains a set of product functions (PFs), each of which is the product of a pure frequency-modulated signal and an envelope signal [40]. By combining the instantaneous amplitudes and instantaneous frequencies of all PF components, the complete time-frequency distribution of the original signal can be obtained, so that the feature information in the original signal can be extracted more effectively.
Given any non-stationary measurement signal x(t), its decomposed PF components via the LMD algorithm can be written as follows:

PF_i(t) = a_i(t) s_i(t), (1)

where a_i(t) is the instantaneous amplitude of the PF component and s_i(t) is a pure frequency-modulated signal. The instantaneous frequency of the PF component is obtained from the pure frequency-modulated signal by the following formula:

f_i(t) = (1/2π) d[arccos(s_i(t))]/dt. (2)

All the PF components are separated from the signal x(t), and finally a residual component e_k(t) is obtained, which is constant or monotonic. In this way, the original signal x(t) can be represented as the sum of k PF components and e_k(t):

x(t) = Σ_{i=1}^{k} PF_i(t) + e_k(t). (3)

The noise in x(t) can be removed by carefully selecting the PF components and e_k(t) according to frequency. After the original signal is denoised and reconstructed by the LMD, 12 feature variables are extracted, including time-domain features (mean, variance, effective value, square root amplitude, and energy) and waveform-domain features (kurtosis, skewness parameter, kurtosis factor, pulse factor, shape parameter, peak coefficient, and valley factor), as shown in Table 1.
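As a sketch, the 12 feature variables of Table 1 can be computed from a denoised, reconstructed segment as follows. The exact definitions of the waveform-domain factors (kurtosis factor, pulse factor, shape parameter, peak coefficient, and valley factor) vary in the literature; the conventions used below are common in machinery diagnostics and should be checked against the formulas in Table 1:

```python
import numpy as np
from scipy import stats

def extract_features(x):
    """Compute the 12 time/waveform-domain features of Table 1
    from a denoised, reconstructed signal segment x (assumed conventions)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    variance = x.var()
    rms = np.sqrt(np.mean(x ** 2))               # effective value
    sra = np.mean(np.sqrt(np.abs(x))) ** 2       # square root amplitude
    energy = np.sum(x ** 2)
    kurtosis = stats.kurtosis(x, fisher=False)
    skewness = stats.skew(x)
    peak = np.max(np.abs(x))
    kurtosis_factor = kurtosis / rms ** 4 if rms else 0.0
    pulse_factor = peak / np.mean(np.abs(x))
    shape_factor = rms / np.mean(np.abs(x))
    crest_factor = peak / rms                    # peak coefficient
    valley_factor = peak / sra
    return np.array([mean, variance, rms, sra, energy, kurtosis,
                     skewness, kurtosis_factor, pulse_factor,
                     shape_factor, crest_factor, valley_factor])
```

Each segment of the reconstructed pressure signal then maps to one 12-dimensional feature vector, which is the input to the subsequent KPCA step.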
However, too many features, possibly including invalid or redundant ones, can reduce the accuracy of leak detection due to overfitting and greatly increase the computational complexity. Principal component analysis (PCA) is one of the classical techniques for multivariate analysis, which reduces the dimensionality while preserving most of the variance of the original data. PCA decorrelates the variables by obtaining a linear relationship that handles the second-order correlation between the variables. However, it is less effective at extracting nonlinear features of the pipeline measurement variables. In this case, kernel PCA (KPCA), using the idea of the kernel function, is proposed to handle nonlinear feature extraction by finding a suitable nonlinear mapping function Φ(X), by which the low-dimensional data sample set X = [x_1, x_2, ..., x_n]^T, where x_i ∈ R^m (i = 1, 2, ..., n), n is the number of samples, and m is the number of variables, is mapped to a high-dimensional feature space F, and principal component analysis is then performed in that high-dimensional feature space [41,42]. We briefly review the KPCA method here. The covariance matrix C^F on the F space is expressed by:

C^F = (1/n) Σ_{j=1}^{n} Φ(x_j) Φ(x_j)^T. (4)

The eigenvalue decomposition of Equation (4) can be written as:

λ_i v_i = C^F v_i, (5)

where λ_i and v_i are, respectively, the eigenvalues and the eigenvectors of the covariance matrix. The eigenvectors v_i of C^F can be expressed as:

v_i = Σ_{j=1}^{n} a_{ij} Φ(x_j), (6)

where a_{ij} is the feature space expansion coefficient. Substituting Equations (4) and (6) into Equation (5), the following equation can be obtained:

λ_i Σ_{j=1}^{n} a_{ij} K(x_k, x_j) = (1/n) Σ_{l=1}^{n} Σ_{j=1}^{n} a_{ij} K(x_k, x_l) K(x_l, x_j), k = 1, ..., n. (7)

Defining the n × n kernel matrix K, where K_{ij} = ⟨Φ(x_i), Φ(x_j)⟩, Equation (7) can be expressed as:

n λ_i a_i = K a_i, (8)

where a_i = [a_{i1}, a_{i2}, ..., a_{in}]^T, and the eigenvalues of the matrix K satisfy λ_1 ≥ λ_2 ≥ ... ≥ λ_n.
One can choose to preserve the first p (p ≤ n) eigenvalues and eigenvectors according to the cumulative variance contribution rate criterion (e.g., 0.85 or more). Through this method, KPCA achieves dimensionality reduction and nonlinear feature extraction for the original signal. Commonly used kernel functions in KPCA include the linear kernel, multilayer perceptron kernel, Gaussian kernel, and polynomial kernel. Of these, the Gaussian kernel was used in our study, as it provides good performance regardless of the total sample size and feature dimension.

Pipeline Leak Detection Method Based on K-Means and Cas-SVDD
The SVDD algorithm only requires data samples under normal operating conditions, the main idea of which is to create a closed compact hypersphere that contains as many data samples as possible [43,44]. SVDD is an excellent method of one-class classifier with the advantages of robustness, good generalization, and high computation efficiency. Additionally, SVDD inherits small-sample characteristics. However, due to the uncertainties caused by natural changes (such as corrosion and sediments), and human factors (such as valve operation and demand changes), there are frequent changes in operating modes in the pipeline during the fluid transportation process. Moreover, the signal characteristics generated by some pipeline operations, such as valve adjustment and pump opening or closing, are similar to those generated by pipeline leakage. If the different normal operating mode data samples collected are used to create a single SVDD hypersphere, the volume of the hypersphere may be too large, which would lead to an increase in false alarm rate. Therefore, it is necessary that the various operating modes data during normal operation should be first classified, and then multiple SVDDs established.
In our work, the K-means clustering algorithm was used to identify each operating mode in the data samples collected from normal operation of the pipeline. K-means, an unsupervised clustering algorithm, is a mature and widely used clustering method with the advantages of simplicity, favorable execution time, and good clustering performance [45]. Assuming that there are P operating modes in the pipeline transportation process, the algorithm divides the original data set W into P clusters, each of which has high within-cluster data similarity, with low similarity between clusters. First, P data samples are randomly selected from the data set W, each as the center of an initial cluster, and the distance between the remaining data samples and the initial cluster centers is calculated. Each data sample is then assigned to the cluster with the nearest center. Subsequently, the average of all the data in each cluster is calculated, yielding P new cluster centers. The iterative calculation is performed until no cluster center value changes, at which point the update process stops. The K-means algorithm chooses the squared Euclidean distance as the dissimilarity measure, so the optimization problem can be formulated as:

min H = Σ_{i=1}^{P} Σ_{q∈Q_i} ||q − C_i||²,

where H is the sum of the mean squared deviations, q is a data sample in cluster Q_i, and C_i is the mean of cluster Q_i (i.e., the cluster center). Through the K-means method, the original data set was classified into several sub-data sets with corresponding operating modes. However, as the K-means clustering method does not detect and remove anomalous data samples in each cluster, it was necessary to further describe the cluster-intensive data samples using SVDD for each cluster after K-means clustering.
In this way, the method not only overcomes the influence of abnormal sample noise, but also overcomes the shortcoming of the SVDD algorithm when the sample population density is not high.
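The clustering step can be sketched with scikit-learn's KMeans; the two-mode toy data and the value P = 2 are illustrative assumptions (the case study later uses two normal operating modes):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# stand-in for KPCA-reduced, leak-free samples from two operating modes
Z = np.vstack([rng.normal(0.0, 0.1, (80, 3)),
               rng.normal(1.0, 0.1, (80, 3))])

P = 2   # assumed number of operating modes
km = KMeans(n_clusters=P, n_init=10, random_state=0).fit(Z)
H = km.inertia_   # the sum-of-squared-deviations objective H above
modes = [Z[km.labels_ == i] for i in range(P)]   # per-mode training sets
```

Each array in `modes` then becomes the training set of one SVDD hypersphere.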
As mentioned earlier, the data samples for each type of operating mode only participate in the training of constructing an SVDD hypersphere, so that each SVDD hypersphere contains all or as many data samples of the corresponding operating mode. In this way, the multiple SVDD models can be established. Figure 1 gives the schematic diagram.
Figure 1. Schematic diagram of hyperspheres obtained by multiple SVDDs (i.e., each SVDD corresponds to one operating mode), where the symbols (○, •, and □) denote the corresponding samples from different modes, and the symbol (×) denotes outlier samples.
Even for the single-class data samples obtained by the K-means algorithm, there are often some data samples of a non-target class, which lead to a larger SVDD hypersphere volume after training and thus an increased false alarm rate. Conversely, if the data samples near the boundary of the hypersphere are removed, the volume of the obtained hypersphere becomes smaller, resulting in an increased false negative rate [46]. Therefore, in order to enhance the robustness of a single SVDD, a relaxation factor and a penalty parameter were introduced.
The problem of determining the minimum SVDD hypersphere is formulated as the following optimization problem:

min_{R_p, a_p, ξ} R_p² + C_p Σ_{i=1}^{N} ξ_{p,i}
s.t. ||Φ(x_{p,i}) − a_p||² ≤ R_p² + ξ_{p,i}, ξ_{p,i} ≥ 0, i = 1, ..., N, (9)

where ξ is the relaxation factor, which indicates that some training data samples are allowed to be erroneously classified; C_p is the penalty parameter, which controls the degree of punishment for misclassified data samples and thus trades off hypersphere volume against misclassification; N is the number of data samples; P is the number of operating modes; and a_p and R_p are the center and radius of the p-th hypersphere, respectively.
The kernel function used in SVDD maps the raw training data from a low-dimensional space to a high-dimensional feature space, thereby constructing a compact hypersphere in the high-dimensional feature space that contains all or most of the target training data samples. The Gaussian kernel was used for SVDD here, as is common for one-class classifiers such as SVDD, the support vector machine, and Parzen density estimation [47]. After introducing the kernel function, the dual of the optimization problem of Equation (9) can be obtained as follows:

max_α Σ_{i=1}^{N} α_{p,i} K(x_{p,i}, x_{p,i}) − Σ_{i=1}^{N} Σ_{j=1}^{N} α_{p,i} α_{p,j} K(x_{p,i}, x_{p,j}), (10)
s.t. 0 ≤ α_{p,i} ≤ C_p, Σ_{i=1}^{N} α_{p,i} = 1, (11)

where α_{p,i} is a Lagrange multiplier, and K(x_{p,i}, x_{p,j}) = ⟨Φ(x_{p,i}), Φ(x_{p,j})⟩ is the kernel function used to calculate the inner product in the feature space. By solving the above quadratic programming problem, the radius of the p-th hypersphere is calculated by Equation (12):

R_p² = K(x_{p,k}, x_{p,k}) − 2 Σ_{i=1}^{N} α_{p,i} K(x_{p,i}, x_{p,k}) + Σ_{i=1}^{N} Σ_{j=1}^{N} α_{p,i} α_{p,j} K(x_{p,i}, x_{p,j}), (12)

where x_{p,k} is a support vector. With the same method, the centers and radii of the SVDD hyperspheres of the other operating modes can be obtained.
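The dual problem and radius computation above can be sketched numerically as follows; SciPy's SLSQP solver is an assumed choice (the paper does not specify a QP solver), and the toy data stands in for the samples of one operating mode:

```python
import numpy as np
from scipy.optimize import minimize

def rbf(A, B, s=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 s^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * s ** 2))

def svdd_fit(X, C=0.2, s=1.0):
    """Solve the dual problem (10)-(11) for one operating mode and return
    the multipliers alpha and the squared radius R^2 of Equation (12)."""
    n = len(X)
    K = rbf(X, X, s)
    # maximize sum_i a_i K_ii - sum_ij a_i a_j K_ij == minimize the negative
    def neg_dual(a):
        return -(a @ np.diag(K) - a @ K @ a)
    cons = ({"type": "eq", "fun": lambda a: a.sum() - 1.0},)
    res = minimize(neg_dual, np.full(n, 1.0 / n), method="SLSQP",
                   bounds=[(0.0, C)] * n, constraints=cons)
    a = res.x
    # pick a boundary support vector (0 < alpha_k < C) for Equation (12)
    k = int(np.argmax((a > 1e-6) & (a < C - 1e-6)))
    R2 = K[k, k] - 2.0 * a @ K[:, k] + a @ K @ a
    return a, R2

rng = np.random.default_rng(2)
X = rng.normal(0.0, 0.1, (30, 2))   # toy leak-free samples of one mode
alpha, R2 = svdd_fit(X)
```

A dedicated QP solver would scale better for large N; the sketch only illustrates how Equations (10)-(12) fit together.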
Assuming that x_new is a new sample to be tested, the distance d_p between the test sample and the p-th hypersphere center is obtained by:

d_p² = K(x_new, x_new) − 2 Σ_{i=1}^{N} α_{p,i} K(x_{p,i}, x_new) + Σ_{i=1}^{N} Σ_{j=1}^{N} α_{p,i} α_{p,j} K(x_{p,i}, x_{p,j}). (13)

If d_p is greater than R_p, the test sample does not belong to the p-th class. For multi-SVDD classification, it should be noted that, due to the intersection of multiple hyperspheres, it may be difficult to determine which operating mode some data samples belong to; that is, there are uncertain regions. However, since the training data set is collected from leak-free historical data, a data sample in an overlap region is a leak-free one regardless of which hypersphere it is assigned to, so it is not necessary to determine which SVDD hypersphere it belongs to. For pipeline leak detection, we are concerned with whether a data sample is a leak or a leak-free one, not with which operating mode it belongs to. Since all SVDD hyperspheres are obtained under different normal operating modes, as long as a test data sample falls within any of the SVDD hyperspheres, no leak has occurred. This is the basis for our proposed Cas-SVDD for pipeline leak detection. In our work, Cas-SVDD is defined such that the SVDDs obtained under different operating modes are cascaded one after another.
After obtaining the SVDD model for each operating mode, Cas-SVDD can be applied with the use of the following strategies: (1) Before the Cas-SVDD is used, each SVDD should be reasonably ordered. The SVDD corresponding to the operating mode with high probability occurrence should be placed in front of other SVDDs, which can effectively reduce the online detection time; (2) The data samples collected online are sequentially passed through the Cas-SVDD model. If the sample is included in a certain SVDD hypersphere, it indicates that there is no leak and it is no longer necessary to enter the remaining SVDD hyperspheres. Conversely, if the sample is not included in any SVDD hypersphere, it indicates that the pipeline is leaking.
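The two strategies above can be sketched as follows. Scikit-learn's OneClassSVM with an RBF kernel is used here as a stand-in for the trained per-mode SVDD hyperspheres (it is a closely related one-class method, not the paper's exact model), and the mode data and parameters are illustrative:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)
mode1 = rng.normal(0.0, 0.1, (80, 3))   # e.g., normal running (more frequent)
mode2 = rng.normal(1.0, 0.1, (80, 3))   # e.g., valve adjustment
# Strategy (1): order the cascade by mode occurrence probability,
# most frequent mode first, to reduce online detection time.
cascade = [OneClassSVM(kernel="rbf", gamma=5.0, nu=0.05).fit(m)
           for m in (mode1, mode2)]

def detect_leak(x):
    """Strategy (2): accept (no leak) as soon as any mode contains x;
    raise a leak alarm only if every mode rejects the sample."""
    for model in cascade:
        if model.predict(x.reshape(1, -1))[0] == 1:  # inside this hypersphere
            return False
    return True
```

A sample near either mode's cluster exits the cascade early with "no leak", while a sample rejected by every model triggers the alarm.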

Procedure
Based on the KPCA and Cas-SVDD methods, the procedure of the proposed method for pipeline leak detection with multiple operating modes can be divided into an offline part and an online part, as shown in Figure 2. The detailed procedure is as follows: I: Offline training model module (1) Collect historical data of normal pipeline operation, and set the initial parameters of the LMD, KPCA, K-means, and SVDD algorithms; (2) Denoise and reconstruct the data samples by LMD, and extract the feature variables, each of which can be calculated according to the formulas provided in Table 1. Then, normalize the value of each feature to the same range (between 0 and 1); (3) Reduce the dimension of the feature variables by KPCA; (4) Identify the various operating modes using the K-means algorithm; (5) Establish an SVDD model for each operating mode, and obtain the center and radius of each hypersphere.

II: Online detection module
(1) Obtain a real-time operational data sample; (2) and (3) perform the same steps as (2) and (3) of the offline module; (4) Raise a leak alarm, or not, using Cas-SVDD according to the strategy mentioned above.
II: Online detection module (1) Obtain real-time operational data sample; (2) and (3) are the same as the steps as for the offline module; (4) Leak alarm, or not, by using Cas-SVDD according to the strategy mentioned above.
It should be noted that the issue of pipeline leak localization is not considered in this paper. For completeness of the pipeline leak methodology, the most widely used method for pipeline leak location, namely the negative pressure wave method, is briefly introduced here. Once a pipeline leak is detected, generalized correlation analysis can be used to estimate the time delay between the arrivals of the negative pressure wave generated by the leak at the sensors on either side of the leak point. The leak location can then be calculated according to the formula L_x = (L + v × ∆t)/2, where L_x is the distance of the leak point from the upstream reservoir, L is the pipeline length, v is the propagation speed of the negative pressure wave, and ∆t is the time delay. A detailed description of the leak location method can be found in the literature [48].
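The localization formula can be sketched directly; the sign convention for ∆t (upstream arrival time minus downstream arrival time) is an assumption consistent with the formula:

```python
def leak_location(L, v, dt):
    """Negative-pressure-wave localization, L_x = (L + v * dt) / 2.
    dt is the upstream-minus-downstream arrival-time difference (assumed
    sign convention); L_x is measured from the upstream reservoir."""
    return (L + v * dt) / 2.0

# Case-study values: L = 2000 m, v = 1000 m/s. A leak 500 m from the
# upstream end arrives 0.5 s upstream and 1.5 s downstream, so dt = -1.0 s.
x = leak_location(2000.0, 1000.0, -1.0)
```

With these values the formula recovers the simulated leak position of 500 m; ∆t = 0 corresponds to a leak at the pipeline midpoint.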

Data Generation by Flowmaster Software
The pipeline model and pipeline leak scenarios were established using Flowmaster software, as shown in Figure 3. The pipeline, between an upstream node and a downstream node, was 2000 m long. The pipeline model parameters were as follows: the inner diameter was 70 mm; the inner wall relative roughness was 0.015 mm; the constant-head upstream and downstream reservoir heights were 130 m and 0 m, respectively; the negative pressure wave velocity was 1000 m/s; and the temperature was 20 °C. The simulated leak was positioned 500 m from the upstream reservoir, and leak ball valves were selected to simulate the pipeline leak. The simulation time was 40 s and the sampling time was 0.01 s. The leak ball valve was opened within 2 s, and the leak occurred at 20 s. For the purpose of better demonstration, only two normal operating modes were considered in our study: one was normal running, where the pipeline was running without any operational adjustment; the other was pipeline valve adjustment. To verify the validity of the proposed method, three leak scenarios were simulated, namely small, medium, and large leaks. Here, a small leak was defined as less than 1% of the total instantaneous flow within the pipeline, a medium leak ranged from 1% to 5% of the flow, and a large leak was larger than 5% of the total flow. Each type of scenario was simulated for 80 sets of data samples. The pressure signals were collected at node 1 and node 2. To simulate real signals from pressure sensors, normally distributed random numbers were added to the pressure data collected at the nodes.
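The sensor-noise step can be sketched as follows; the pressure level, the leak-induced drop, and the noise standard deviation are illustrative placeholders, not values from the Flowmaster simulation:

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(0.0, 40.0, 0.01)             # 40 s simulation, 0.01 s sampling
pressure = np.full(t.size, 9.0e5)           # placeholder nodal pressure [Pa]
after = t >= 20.0                           # leak opens at 20 s
# illustrative leak-induced pressure drop (magnitude is an assumption)
pressure[after] -= 2.0e4 * (1.0 - np.exp(-(t[after] - 20.0)))
# additive normally distributed sensor noise, as described in the text
noisy = pressure + rng.normal(0.0, 1.0e3, t.size)
```

The resulting `noisy` series plays the role of one pressure-sensor record used for feature extraction.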


Data Processing and Feature Extraction
The collected pressure signal was used for noise reduction and signal reconstruction by using LMD. The pressure signal of 500 m from the upstream reservoir was used as an example to illustrate the noise reduction effect, as shown in Figure 4. The denoised and reconstructed pressure signal better showed the characteristics of time domain and waveform domain, which provided a basis for the subsequent feature extraction of the pressure variables.
A total of 400 sets of data samples were generated, of which 80 sets were in the normal running mode and 80 sets in the pipeline valve adjustment mode. The data samples for the small, medium, and large leak scenarios were 80 sets each. Table 2 gives an example of a set of extracted feature variables for each mode and leak scenario. It should be noted that the value of each feature was normalized to the same range (between 0 and 1) in order to eliminate cross-modal amplitude differences caused by the different feature extraction mechanisms.
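The per-feature min-max normalization mentioned above can be sketched as:

```python
import numpy as np

def minmax_normalize(F):
    """Scale every feature column of the sample matrix F to [0, 1]."""
    F = np.asarray(F, dtype=float)
    lo, hi = F.min(axis=0), F.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (F - lo) / span

N = minmax_normalize([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
```

In practice the minima and maxima fitted on the offline training set would be reused to scale online samples, so that both modules see features on the same scale.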


Results and Discussions
As mentioned above, during the pipeline fluid transport process, the operating modes, such as valve adjustment and normal running, change frequently, so the collected leak-free data samples contain multiple operating conditions. A single SVDD hypersphere is not compact enough across a variety of operating modes, which in turn leads to low classification accuracy. The K-means algorithm was therefore used to cluster the data samples processed by KPCA, yielding the data samples of each operating mode. The clustering result, shown in Figure 5, indicates a good separation of the modes. Next, an SVDD model was established for each operating mode in order to obtain the Cas-SVDD model, consisting of multiple compact SVDD hyperspheres. Figure 5 also shows that establishing a unique SVDD hypersphere over the data samples of all normal operating modes results in a region that is not compact enough: a leak data sample located between the individual SVDD hyperspheres may be accepted by the unique SVDD as a normal operating sample, whereas the Cas-SVDD easily identifies it as a pipeline leak sample.
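The mode-separation step is plain K-means over the KPCA scores. The sketch below uses a deterministic farthest-point initialization (an implementation choice of this sketch, not stated in the paper) and k=2, matching the two operating modes considered.

```python
import numpy as np

def kmeans(X, k=2, n_iter=100):
    """Cluster samples into k operating modes (Lloyd's algorithm)."""
    # deterministic farthest-point initialization: start from the first
    # sample and repeatedly add the sample farthest from all chosen centers
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assignment step: nearest center for every sample
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # update step: move each center to the mean of its samples
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Each cluster's samples are then used to train one SVDD of the cascade.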

To verify the proposed KPCA-Cas-SVDD method, the performance of the pipeline leak detection was compared to the corresponding data from the single SVDD (S-SVDD) and Cas-SVDD methods.
Here, the S-SVDD method means that only one hypersphere model was established using all data samples with different operating modes under normal operation; the difference between the Cas-SVDD and the KPCA-Cas-SVDD is that the former does not use KPCA for the dimension reduction of feature variables, while the latter does. The training and test data were split 5:5. The SVDD adopted a Gaussian kernel function. After 5-fold cross-validation and grid search, the optimal kernel width σ and target class error rate w of the SVDD in the three methods were obtained, as shown in Table 4. For the convenience of comparing the performance of the three methods, three cases were considered and all three methods were applied to each case. Case 1: 120 sets of data samples, of which 40 were from normal running, 40 from pipeline valve adjustment, and 40 from small leak; Case 2: 120 sets of data samples, of which 40 were from normal running, 40 from pipeline valve adjustment, and 40 from medium leak; Case 3: 120 sets of data samples, of which 40 were from normal running, 40 from pipeline valve adjustment, and 40 from large leak.
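The SVDD and cascade decision rule can be sketched as below. This is a deliberately simplified SVDD: the feature-space center is taken as the plain mean of the training images and the radius is set from a training-distance quantile standing in for the target class error rate w, whereas the real SVDD solves a quadratic program for support-vector weights. The kernel width and w values here are illustrative, not the tuned values from Table 4.

```python
import numpy as np

def rbf(X, Y, sigma):
    # Gaussian kernel between two sample sets
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

class SimpleSVDD:
    """Centroid-based approximation of SVDD (sketch, not the full QP)."""
    def __init__(self, sigma=1.0, w=0.05):
        self.sigma, self.w = sigma, w
    def fit(self, X):
        self.X = X
        K = rbf(X, X, self.sigma)
        self.k_mean = K.mean()  # <center, center> in feature space
        # squared feature-space distance of each training sample to the center;
        # k(x, x) = 1 for the RBF kernel
        d2 = 1.0 - 2.0 * K.mean(axis=1) + self.k_mean
        # radius from a quantile, so a fraction w of training samples falls outside
        self.r2 = np.quantile(d2, 1.0 - self.w)
        return self
    def dist2(self, Z):
        Kz = rbf(Z, self.X, self.sigma)
        return 1.0 - 2.0 * Kz.mean(axis=1) + self.k_mean
    def inside(self, Z):
        return self.dist2(Z) <= self.r2

def cas_svdd_predict(models, Z):
    # cascade decision: a sample is normal if ANY mode hypersphere contains
    # it; a sample outside every hypersphere is flagged as a leak
    leak = np.ones(len(Z), dtype=bool)
    for m in models:
        leak &= ~m.inside(Z)
    return leak
```

This mirrors the paper's decision logic: one hypersphere per operating mode, with samples falling outside all hyperspheres declared leaks.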
For the S-SVDD method, the pipeline leak detection results are shown in Figure 6, where Figure 6a-c represent Case 1, Case 2, and Case 3, respectively. For the convenience of graphic demonstration, the first 80 data samples are leak-free, and the leak data samples are the 81st through the 120th. It can be seen from Figure 6a that 15 small leak samples were not detected, Figure 6b shows that 12 medium leak samples were undetected, and Figure 6c shows that seven large leak samples were undetected. These results show that the performance of the S-SVDD method for pipeline leak detection is poor, with low leak detection accuracy. For the Cas-SVDD method, the pipeline leak detection results are shown in Figure 7. The 120 sets of data samples were first classified by the first SVDD of the Cas-SVDD; the remaining data samples that did not fall in the first SVDD hypersphere were then further classified by the second SVDD. Figure 7(a1) shows eight samples with false-positive results, and Figure 7(a2) shows 10 samples with false-negative results. In summary, the results in Figure 7 show that the performance of Cas-SVDD for pipeline leak detection was much better than that of S-SVDD. As shown in Figure 8, the KPCA-Cas-SVDD method had very few false-positive and false-negative results. This shows that the feature dimension reduction using KPCA has a great influence on the leak detection performance of the Cas-SVDD, because the KPCA algorithm can process high-dimensional projections of nonlinear data and obtain new comprehensive features that contain most of the information of the original feature variables. Table 5 summarizes the comparison of the three methods for pipeline leak detection. It also shows that the pressure at the location of the leak has a great influence on the accuracy of pipeline leak detection.
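The S-SVDD per-case leak detection rate follows directly from the undetected-sample counts above, given the 40 leak samples per case stated in the case definitions:

```python
# Undetected leak samples for S-SVDD (Figure 6), out of 40 leak samples per case
undetected = {"small": 15, "medium": 12, "large": 7}
rates = {case: (40 - miss) / 40 for case, miss in undetected.items()}
for case, rate in rates.items():
    print(f"{case} leak: S-SVDD detection rate = {rate:.1%}")
# small 62.5%, medium 70.0%, large 82.5%
```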
A high pressure change at the leak location indicates that a large leak has occurred. Large leaks are easier to detect than small leaks because a large leak produces a pressure change of higher magnitude that travels faster to the upstream and downstream pressure sensors, causing obvious changes in the feature variables. For small leaks, by contrast, the pressure change at the leak location is less obvious and the variation in the feature variables is also small. Even in this case, the proposed method showed a high detection accuracy compared to the other two methods.

Conclusions
In this paper, a hybrid intelligent method for pipeline leak detection was proposed. The method first performs signal denoising and reconstruction based on LMD. The KPCA is then used for feature dimension reduction, the K-means algorithm is used to cluster the various operating modes, and finally the Cas-SVDD is used for pipeline leak detection. The contributions of this paper are mainly two: (1) the proposed method only requires collecting the pressure signal during normal pipeline operation, without the leak data samples that are difficult to obtain in actual pipeline operation; (2) based on the integration of KPCA and Cas-SVDD, a novel method was proposed for pipeline leak detection with multiple operating modes. Compared with the S-SVDD and Cas-SVDD methods, the proposed method comprehensively considers the various operating modes in the pipeline transportation process and can effectively reduce the false alarm rate.
However, only two normal operating modes were considered in this work, while there may be more operating modes during actual pipeline operation. In that case, more SVDD models would need to be established; furthermore, the order of the SVDDs in the cascade structure should be optimized, which can reduce the online detection time and improve the accuracy of leak detection. Therefore, how to optimize the order of the Cas-SVDD to minimize online detection time while ensuring leak detection accuracy is an issue worthy of further study. In addition, a better clustering algorithm would also contribute to the construction of the Cas-SVDD and improve the accuracy of pipeline leak detection. Finally, some practical issues of the proposed method should be considered for leak detection in a real case. Future work will focus on these issues.