Pipeline Leakage Detection Using Acoustic Emission and Machine Learning Algorithms

Pipelines play a significant role in liquid and gas resource distribution. Pipeline leaks, however, result in severe consequences, such as wasted resources, risks to community health, distribution downtime, and economic loss. An efficient autonomous leakage detection system is clearly required. The leak diagnosis capability of acoustic emission (AE) technology has recently been well demonstrated. This article proposes a machine learning-based platform for detecting leaks of various pinhole sizes using the AE sensor channel information. Statistical measures, such as kurtosis, skewness, mean value, mean square, root mean square (RMS), peak value, standard deviation, entropy, and frequency spectrum features, were extracted from the AE signal as features to train the machine learning models. An adaptive threshold-based sliding window approach was used to retain the properties of both burst- and continuous-type emissions. First, we collected data from three AE sensor channels and extracted 11 time domain and 14 frequency domain features from each one-second window for each AE sensor channel. The measurements and their associated statistics were transformed into feature vectors. Subsequently, these feature data were utilized for training and evaluating supervised machine learning models to detect leaks and pinhole-sized leaks. Several widely known classifiers, such as neural networks, decision trees, random forests, and k-nearest neighbors, were evaluated on four datasets covering water and gas leakages at different pressures and pinhole leak sizes. We achieved an exceptional overall classification accuracy of 99%, providing reliable and effective results that are suitable for the implementation of the proposed platform.


Introduction
Pipelines are the primary mechanical component required for long-distance liquid and gas material distribution and transportation. Pipeline networks, therefore, need unwavering quality, high safety levels, and efficiency. The current state of pipeline networks involves colossal annual leakage rates and the corresponding waste of natural resources. Pipeline leaks may have detrimental effects on the environment, human safety, property, and reputation and additionally lead to financial losses from fines and cleanup expenses. BenSaleh et al. [1] noted that many countries rely heavily on long-distance transportation of oil and of water from desalination plants to their intended destinations. Unfortunately, significant quantities of these resources are lost annually due to leaks in the pipelines, with an estimated 60% of water being wasted each year due to pipeline leaks [2]. For pipeline operators to achieve optimal performance, a reliable leak detection system (LDS) is imperative. An effective LDS should be able to detect leaks promptly, provide accurate leak localization, minimize false alarms, be easy to retrofit, function well under various operating conditions, and utilize sensors with high dependability and low maintenance requirements.
This article proposes an LDS using acoustic emission (AE) and machine learning (ML) algorithms. Traditional AE hit features are extracted by defining a threshold above the level of continuous background noise; however, a predefined threshold for extracting the AE features can lead to false alarms due to noise in the AE signal. Furthermore, defining such a threshold requires human expertise and domain understanding. Additionally, the type of transportation medium also affects the AE hits in the signal. In order to address the above-mentioned problems, it is of primary importance to develop a leak-sensitive model for pipeline leak detection and size identification. As such, in this manuscript, an attempt has been made to develop a leak-sensitive model for pipeline leak detection and size identification with different transportation media, namely fluid and gas. Instead of utilizing a predetermined threshold for AE feature extraction, a sliding window is used in this paper. In order to exploit the statistical changes in the AE signal caused by defects in the pipeline, statistical indicators, such as kurtosis, skewness, mean value, RMS, peak value, standard deviation, and entropy, are calculated from each sliding window. Furthermore, the changes in the frequency spectrum due to a defect in the pipeline are captured by calculating spectral features from each sliding window. Additionally, a set of classification models was tested and validated for pipeline leak detection and size identification by considering two different transportation media, and the best classification model is reported in this study.
The overall novelty and contribution of this work can be summarized as follows: (i) In order to exploit the statistical changes in the AE signal caused by defects in the pipeline, a sliding window is used, and temporal statistical indicators are calculated from each sliding window. Furthermore, the changes in the frequency spectrum due to a defect in the pipeline are captured by calculating spectral features from each sliding window. To the best of our knowledge, utilizing a sliding window to extract statistical and spectral indicators from the AE signal is reported for the first time in this work; (ii) A pipeline health-sensitive classification model is reported in this study based on evaluating different classification models for pipeline leak detection and size identification by considering two different transportation media, namely fluid and gas; (iii) Real-world industrial fluid pipeline data were utilized in this study for leak detection and size identification using machine learning algorithms.
Overall, the proposed platform achieved an exceptional overall classification accuracy of 99%, which makes it a reliable and effective solution for pipeline leakage detection and leak pinhole size identification.
The following sections make up the structure of the paper. Section 2 proposes the architecture, methodology, and ML algorithms for leakage detection. The results and pipeline experimental test rig are presented in Section 3. Section 4 presents the conclusion of this study.

The Proposed Architecture and Methodology
Figures 1 and 2 show the overview of the proposed methodology for pipeline leakage detection. The architecture is implemented using acoustic emission sensors. Three sensors are placed on the pipeline at locations channel 3 = 0 mm, channel 2 = 1600 mm, and channel 1 = 2500 mm. The sensor data are passed to the data acquisition step, in which the signals that gauge physical conditions in the real world are sampled and the resulting samples are transformed into digital numeric values that a computer can work with. The data are acquired at the defined sampling rate of 1 MHz, the desired features are extracted, and a label is assigned to the feature vector extracted from each one-second window of the complete dataset. The completed dataset is then processed to test the classification accuracy of different algorithms for detecting leakage in the pipeline. Each window is assigned either the "Leakage" or the "Normal" label, and the system outputs whether the sensor data contain a leakage or are "Normal". The final step is to show the result on the display for the monitoring supervisor. In the rest of the following subsections, we briefly explain each subcomponent.
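As a rough illustration of this data flow, the minimal sketch below segments a raw AE recording into one-second windows at the stated 1 MHz sampling rate and attaches a condition label to each window. The array names, the placeholder recording, and the channel dictionary are assumptions for the example; the original implementation was realized in MATLAB rather than Python.

```python
import numpy as np

FS = 1_000_000          # AE sampling rate stated in the text (1 MHz)
WINDOW_SECONDS = 1      # one-second analysis window used for each feature vector
SENSOR_POSITIONS_MM = {"channel_1": 2500, "channel_2": 1600, "channel_3": 0}

def segment_into_windows(signal: np.ndarray, fs: int = FS, seconds: float = WINDOW_SECONDS) -> np.ndarray:
    """Split a 1-D AE signal into non-overlapping windows of `seconds` length."""
    samples_per_window = int(fs * seconds)
    n_windows = len(signal) // samples_per_window
    return signal[: n_windows * samples_per_window].reshape(n_windows, samples_per_window)

# Hypothetical usage: `raw` stands in for one channel of recorded AE data and
# `labels` for the operating condition under which each window was recorded.
raw = np.random.randn(10 * FS)          # placeholder for a 10-second AE recording
windows = segment_into_windows(raw)     # shape: (10, 1_000_000)
labels = ["Normal"] * len(windows)      # one label per one-second window
```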

Acoustic Emission
A leak in the pipeline results in a change in the structural integrity of the material. This change in the structural integrity can be due to fatigue rupture, stress cracks, corrosion cracks, and structural discontinuities. The structural discontinuity or leak in the pipeline (irrespective of the cause) will disturb the flow of the fluid or gas inside the pipeline. However, the intramolecular interactions or chemical bonding of the fluid will force the fluid to keep its flow consistent [29]. Thus, for the fluid to keep its flow consistent inside the pipeline, the molecules of the fluid will exert pressure at the position of the pipeline structural discontinuity or leak, which will result in a short, rapid release of energy in the form of an elastic wave. An AE can be defined as a transient sound wave generated by a short, rapid release of energy in the form of an elastic wave produced by a change in the structural integrity of the material within a specimen, such as a pipeline [30]. The physical phenomenon resulting from the fluid's interaction with the structural discontinuity is referred to as an AE event. This AE event is detected by the AE sensors in the form of AE hits. Thus, the AE hits in the signal can be related to a leak due to fatigue rupture, stress cracks, corrosion cracks, and structural discontinuities. For this reason, the nondestructive method based on AE is considered 'global' in nature: AE-based monitoring allows the investigator to obtain an overall picture of the performance of the specimen, irrespective of the cause of degradation [31]. Specifically, in this study, a nondestructive method based on AE was used to investigate the health condition of the pipeline. Given the global nature of AE, the proposed method can, in principle, identify a leak and its size.

Acoustic Emission Sensor
An acoustic emission (AE) sensor is a device that detects and analyzes the sound waves generated by changes in the internal structure of a material or structure. The R15I-AST, manufactured by MISTRAS Group, Inc., is one example of such a sensor. It uses piezoelectric transducers to convert mechanical stress or strain into electrical signals, which can then be analyzed to determine the location and severity of structural changes. These sensors are commonly used for non-destructive testing and the structural health monitoring of various structures, such as bridges, pipelines, and pressure vessels. With the capability of working under high temperatures, humidity, and pressure, the R15I-AST can monitor structural integrity in near real time and provide early warnings of potential issues, allowing the necessary actions to be taken before any damage occurs. The operating frequency range of the R15I-AST sensor and the parameters used for the data acquisition are listed in Table 1.

Data Acquisition
In data acquisition, the physical conditions in the real world are measured through the use of sensors. These measurements are then sampled and translated into digital numeric values that a computer can interpret. This is typically achieved by converting analog waveforms into digital values for further processing. The key component of the current data acquisition system is the acoustic emission sensor, which is used to convert physical parameters into electrical signals. Once the data are acquired from the sensors, they can be used to detect leaks or other anomalies in the pipeline. A frequency spectrum was computed from the time-amplitude signal to instantly identify the leak time.
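To make the spectrum computation concrete, the short sketch below derives a one-sided amplitude spectrum from a one-second window of the digitized AE signal. It is a minimal NumPy illustration, not the authors' MATLAB routine, and the placeholder window stands in for real acquired data.

```python
import numpy as np

FS = 1_000_000  # 1 MHz sampling rate used in the experiments

def amplitude_spectrum(window: np.ndarray, fs: int = FS):
    """Return the one-sided amplitude spectrum and frequency axis of an AE window."""
    spectrum = np.abs(np.fft.rfft(window)) / len(window)
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    return freqs, spectrum

window = np.random.randn(FS)             # placeholder one-second AE window
freqs, spec = amplitude_spectrum(window)
print(freqs[np.argmax(spec[1:]) + 1])    # frequency of the dominant non-DC component
```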

Feature Extraction in the Time and Frequency Domains
Feature extraction and selection methods are helpful for the transformation of data, translating the preprocessed data into processed data to identify significant trends. Feature extraction is an essential approach for reducing data size and provides valuable information for developing a classification model. Most researchers use statistical characteristics for feature extraction. Chai et al. [32] extracted various features from the AE signal, such as peak amplitude, entropy, energy, count, peak frequency, and centroid frequency, to find crack growth under different stress ratios. Muir et al. [33] reviewed the time domain, frequency domain, and composite features extracted from the AE signal for the damage analysis of a structure; they mentioned around 31 different features used in the literature. Traditional AE hit features, such as rise time, decay time, counts, etc., can be extracted from the AE signal by defining a threshold above the level of continuous background noise. However, a predefined threshold for extracting the AE features can lead to false alarms due to noise in the AE signal. Furthermore, defining a threshold above the level of continuous background noise requires human expertise and domain understanding. In order to address this concern, instead of utilizing a predetermined threshold for AE feature extraction, a sliding window is used in this paper. In order to exploit the statistical changes in the AE signal caused by defects in the pipeline, this research extracts 25 statistical time and frequency domain features for each AE channel using a sliding window. A total of 75 features are extracted from the three AE channels. All these features are provided as input to the classification model for the task of pipeline health assessment. The features extracted from each AE channel comprise 11 time domain features (such as mean, standard deviation, skewness, kurtosis, crest factor, and clearance factor) and 14 frequency domain features (namely, P1, P2, P3, ..., P14). The frequency spectrum-based feature extraction incorporates the power of both the lower and higher frequencies. Figure 4 shows the details of the feature extraction and feature vectors. Tables 2 and 3 show the mathematical formulas for the time domain and frequency domain features, respectively.
Figure 3 shows the time- and frequency-domain responses of the AE channels. Under normal conditions, when there is no leakage, both the time-domain and frequency-domain amplitudes are low. Figure 3g-i show the time-domain amplitude, and Figure 3j-l show the frequency response after the leakage was introduced; both the time and frequency amplitudes increased to roughly twice those of the normal condition.

Table 2. Time domain features of AE channels (excerpt): crest factor, X_crest = X_peak / X_rms; clearance factor, X_clearance = X_peak / X_root.
Table 3. Frequency domain features of AE channels (excerpt): mean frequency; root mean square frequency, P7 = X_rms_f.
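A hedged sketch of the feature computation is given below: for each one-second window it evaluates a handful of the Table 2 time domain indicators (RMS, peak, skewness, kurtosis, crest factor, clearance factor, entropy) and a few spectrum-based statistics in the spirit of the P1-P14 features of Table 3 (for example, the mean frequency and the RMS frequency). The exact 25-feature set follows the paper's tables; this Python version, including the histogram-based entropy estimate, only illustrates the pattern and is not the original MATLAB code.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def time_domain_features(x: np.ndarray) -> dict:
    """A subset of the Table 2 time domain indicators for one AE window."""
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    root = np.mean(np.sqrt(np.abs(x))) ** 2        # "root" amplitude used by the clearance factor
    hist, _ = np.histogram(x, bins=64)
    p = hist / hist.sum()
    p = p[p > 0]
    return {
        "mean": np.mean(x),
        "std": np.std(x),
        "rms": rms,
        "peak": peak,
        "skewness": skew(x),
        "kurtosis": kurtosis(x),
        "crest_factor": peak / rms,
        "clearance_factor": peak / root,
        "entropy": -np.sum(p * np.log(p)),         # histogram-based entropy (one common choice)
    }

def frequency_domain_features(x: np.ndarray, fs: float) -> dict:
    """Spectrum-based statistics in the spirit of the P1-P14 features."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = spectrum / np.sum(spectrum)
    mean_freq = np.sum(freqs * power)
    return {
        "mean_frequency": mean_freq,
        "rms_frequency": np.sqrt(np.sum((freqs ** 2) * power)),
        "spectral_std": np.sqrt(np.sum(((freqs - mean_freq) ** 2) * power)),
    }

def feature_vector(windows_per_channel: list, fs: float = 1_000_000) -> np.ndarray:
    """Concatenate features from all three AE channels into one vector per window."""
    features = []
    for channel_window in windows_per_channel:     # one window per AE channel
        td = time_domain_features(channel_window)
        fd = frequency_domain_features(channel_window, fs)
        features.extend(list(td.values()) + list(fd.values()))
    return np.asarray(features)
```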

Machine Learning Algorithms for Leakage Detection and Identification
Classification algorithms are used to predict the class or label of a given set of data. The input to these algorithms is a set of features extracted from raw sensor data, which are associated with specific activities or classes. A decision rule or function that can accurately predict the class of new data based on these features must be determined. This process is known as the classification task.
Classifiers are machine learning techniques that can be used to assign labels to activities. They are trained on a dataset of labeled data, where each feature vector in the training dataset has been given a label. The learning algorithm adjusts its parameters to generate a model or hypothesis, which can then be used to predict the label 'y' for new input data 'x'. In this research, the collected data were related to a pipeline, and MATLAB was used both to extract the features from each data segment and to apply the classification algorithms. To evaluate classification accuracy, four distinct algorithms were used.
A simple and successful technique for classification and regression applications is k-nearest neighbors (KNN). It operates by locating the K closest data points in the training set for a specific point in the test set and then making a prediction using the labels or values of these nearest neighbors. The Euclidean distance, or the distance along a straight line between two points, is one approach to gauge the separation between data points. The Euclidean distance in KNN is determined as the square root of the sum of the squared coordinate differences between the two points. The performance of the KNN model can be influenced by the distance metric that is selected. Euclidean and Manhattan distances can both be employed. In some instances, the Euclidean distance may be more appropriate, while the Manhattan distance may be more suitable in others. The dataset's characteristics and the task at hand should be considered when deciding which distance metric to use with KNN.
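As a small illustration of the distance computation described above, the sketch below contrasts the Euclidean and Manhattan distances and uses them to pick the K nearest neighbours of a query point by majority vote. The toy feature vectors and labels are invented for the example and do not come from the experimental datasets.

```python
import numpy as np

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    # Square root of the sum of squared coordinate differences.
    return float(np.sqrt(np.sum((a - b) ** 2)))

def manhattan(a: np.ndarray, b: np.ndarray) -> float:
    # Sum of absolute coordinate differences.
    return float(np.sum(np.abs(a - b)))

def knn_predict(query, X_train, y_train, k=5, distance=euclidean):
    """Majority vote over the k training points closest to the query."""
    dists = [distance(query, x) for x in X_train]
    nearest = np.argsort(dists)[:k]
    votes = [y_train[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy example with invented two-dimensional feature vectors.
X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [1.0, 0.9], [0.15, 0.25]])
y_train = ["Normal", "Normal", "Leakage", "Leakage", "Normal"]
print(knn_predict(np.array([0.12, 0.18]), X_train, y_train, k=3, distance=manhattan))
```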
A random forest is a classification method that relies on the construction of many decision trees (weak learners) and, in the end, adopts the verdict reached by the majority of those learners. A decision tree is a single tree, whereas a random forest has multiple trees. Overfitting can normally be mitigated by pruning the decision trees. With pruning, a trade-off must be made between precision and simplicity; not pruning results in greater complexity, extra work, and higher resource usage. The random forest uses the same parameters as a decision tree classifier.
Three different node types are formed while constructing a decision tree:
- The root node has no input link and can have zero or more output links;
- Internal nodes have one input link and two or more output links;
- Leaf nodes are the end nodes that have exactly one input link and no output link.
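The sketch below shows, on placeholder data, how a single decision tree and a random forest of 100 trees could be trained with scikit-learn using the Gini index as the splitting criterion, which is the configuration reported later in the results. It is an illustrative substitute for the MATLAB implementation used in the study, and the random feature matrix is an assumption standing in for the extracted feature vectors.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder feature matrix (75 features per window) and Normal/Leakage labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(360, 75))
y = rng.integers(0, 2, size=360)        # 0 = Normal, 1 = Leakage

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(criterion="gini").fit(X_tr, y_tr)                      # single tree
forest = RandomForestClassifier(n_estimators=100, criterion="gini").fit(X_tr, y_tr)  # majority vote of 100 trees

print("decision tree accuracy:", tree.score(X_te, y_te))
print("random forest accuracy:", forest.score(X_te, y_te))
```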
Neural networks can be used for more complex models and can be utilized in multi-class classification. Neural networks are inspired by the brain, which is a network of neurons. The neuron model consists of a number of inputs with input weights, a hidden layer, and an output (hypothesis). Neural networks interpret data through a kind of machine perception, labeling, or grouping of the information. The patterns they recognize are numerical and contained in vectors, into which all real-world data, be it images, sound, text, or time series, must be translated. A neural network is a collection of "neurons" connected by "synapses". Hidden layers are essential when the network needs to learn something truly complicated, contextual, or non-obvious, such as image recognition. In a network diagram, the circles represent the neurons and the lines represent the synapses. Synapses take the input and multiply it by a weight. The neurons add the outputs from all synapses and apply an activation function.
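A minimal numerical sketch of that neuron model is given below: inputs are multiplied by synaptic weights, summed with a bias, and passed through an activation function. The input values, weights, and bias are made up purely for illustration.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation function.
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    # Weighted sum of the inputs followed by the activation function.
    return sigmoid(np.dot(inputs, weights) + bias)

x = np.array([0.4, 0.7, 0.1])       # example inputs (e.g., three features)
w = np.array([0.2, -0.5, 0.9])      # synaptic weights (illustrative values)
print(neuron(x, w, bias=0.1))       # output ("hypothesis") of a single neuron
```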

Performance Metrics
In order to evaluate the performance of the proposed method in comparison to the reference method, metrics such as accuracy, precision, and recall were employed. Equations (1)-(3) were used to determine these metrics:

Accuracy = \frac{1}{N} \sum_{a=1}^{A} n_a \cdot \frac{TP_a}{TP_a + FN_a}  (1)

Precision = \frac{1}{A} \sum_{a=1}^{A} \frac{TP_a}{TP_a + FP_a}  (2)

Recall = \frac{1}{A} \sum_{a=1}^{A} \frac{TP_a}{TP_a + FN_a}  (3)

In this context, TP_a, FP_a, and FN_a refer to the true-positive, false-positive, and false-negative results, respectively, obtained from the features that are representative of class a; n_a represents the total number of samples from class a; A represents the overall number of classes in the dataset. The variable N denotes the total number of samples in the testing sets.
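As a rough illustration, the sketch below computes class-weighted accuracy and macro-averaged precision and recall from a confusion matrix in the manner of Equations (1)-(3). The confusion matrix values are invented, and the exact averaging used in the original evaluation may differ from this sketch.

```python
import numpy as np

def per_class_metrics(cm: np.ndarray):
    """Accuracy, macro precision, and macro recall from a confusion matrix.

    cm[i, j] = number of samples of true class i predicted as class j.
    """
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    n_per_class = cm.sum(axis=1).astype(float)   # n_a, samples per true class
    N = cm.sum()                                  # total number of test samples

    accuracy = np.sum(n_per_class * tp / (tp + fn)) / N   # class-weighted, as in Equation (1)
    precision = np.mean(tp / (tp + fp))                   # macro average, as in Equation (2)
    recall = np.mean(tp / (tp + fn))                      # macro average, as in Equation (3)
    return accuracy, precision, recall

cm = np.array([[118, 2], [1, 239]])   # illustrative Normal/Leakage confusion matrix
print(per_class_metrics(cm))
```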

Results and Discussion
In this section, the experimental setup and the performance comparison of the machine learning algorithms are described.

Experimental Setup
The proposed technique was experimentally validated using a specific experimental setup, as illustrated in Figure 5. The setup consists of a stainless-steel pipeline with a thickness of 6.02 mm and an outer diameter of 114.3 mm, and it is used for transporting water or gas and is equipped with AE sensors and a data collection system. The sensors used in this setup are R15I-AST-type sensors manufactured by MISTRAS Group, Inc. The sensors were fixed to the pipeline by using adhesive gels and mounting tapes. The data collection system comprised a NI-9223 National Instruments Data Acquisition system and a computer system that was set up to record information on the pipeline data conditions. In order to ensure the sensor's sensitivity to the applied stresses, the sampling frequency was adjusted to 1 MHz, and pencil lead break tests were carried out.
Leak simulation was conducted by incorporating a valve into the pipeline and conducting tests at pressure levels of 13 and 18 bar. The process involved initially closing the valve and collecting data for 2 min while the pipeline was in normal operation. The valve was then opened to introduce a 1 mm leak, and data were collected for an additional 4 min. When the valve was closed again, the pipeline's flow was stabilized. This process was repeated at both pressure levels for the same leak pinhole size. In total, 360 signal samples were collected for each test, with 120 samples from the normal condition and 240 samples from the leak condition being used for further analysis and evaluation.

Dataset Collection and Description
In this study, the pipeline provides a transportation medium for fluid and gas. For each transportation medium, pressures of 13 and 18 bar were adjusted with the help of a centrifugal pump. First, the valve was kept closed, and the pipeline was operated to transport fluid under 13 bar pressure; thus, the normal operating condition data were acquired. After the acquisition of the normal condition data, under the same pressure condition, the leak valve was opened to 1 mm, and data were acquired, which formed Dataset-1. After the acquisition of data under a pressure of 13 bar, the leak valve was kept closed, the pipeline was operated to transport fluid under a pressure of 18 bar, and the normal condition data were acquired. After the acquisition of the normal condition data, under the same pressure condition, the leak valve was opened to 0.7 mm, and data were acquired, which formed Dataset-3. The same process was repeated for the acquisition of Dataset-2 and Dataset-4. However, for safety purposes, the leak valve was only opened up to 0.5 mm during the data acquisition for Dataset-2 and Dataset-4. A detailed description of each dataset is given in Table 4. The dataset was divided into 70% for training, 15% for testing, and 15% for validation.

Neural Network
The 'logsig' activation function and two configurations of neurons were used, i.e., 10 and 50 neurons. Figure 6 shows the convergence curves, with the best validation values of 3.6829 × 10^−7 at epoch 24 and 1.44345 × 10^−7 at epoch 28 for 10 and 50 neurons, respectively. The convergence clearly shows improved accuracy when increasing the number of neurons, but at the cost of an increase in training time. Figure 7 shows the confusion matrices obtained using 10 and 50 neurons, respectively; in the confusion matrices, "1" represents "No Leakage/Normal" and "2" represents "Leakage". The neural networks achieved the highest accuracy for both the 10- and 50-neuron setups for training, testing, and validation.
Figure 8 shows the convergence curves, with the best validation values of 2.8079 × 10^−7 at epoch 24 and 1.5926 × 10^−7 at epoch 30 for 10 and 50 neurons, respectively, and Figure 9 shows the corresponding confusion matrices. Figure 10 shows the convergence curves, with the best validation values of 4.1322 × 10^−7 at epoch 26 and 1.2969 × 10^−7 at epoch 31 for 10 and 50 neurons, respectively, and Figure 11 shows the corresponding confusion matrices. Figure 12 shows the convergence curves, with the best validation values of 0.00067013 at epoch 20 and 1.5746 × 10^−7 at epoch 29 for 10 and 50 neurons, respectively, and Figure 13 shows the corresponding confusion matrices. In each case, the same trend was observed: increasing the number of neurons improved accuracy at the cost of longer training time, and the neural networks achieved the highest accuracy for both the 10- and 50-neuron setups for training, testing, and validation.
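A hedged sketch of this neural network configuration is given below using scikit-learn's MLPClassifier: a logistic ('logsig'-like) activation, hidden layers of 10 and 50 neurons, and a 70/15/15 train/validation/test split. The original experiments were run in MATLAB, so this is only an approximate Python equivalent on placeholder data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Placeholder feature matrix (75 features per one-second window) and labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(360, 75))
y = rng.integers(1, 3, size=360)        # 1 = No Leakage/Normal, 2 = Leakage

# 70% training, 15% validation, 15% testing.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=0)

for hidden in (10, 50):                  # the two neuron configurations reported above
    model = MLPClassifier(hidden_layer_sizes=(hidden,), activation="logistic",
                          max_iter=500, random_state=0).fit(X_train, y_train)
    print(hidden, "neurons:",
          "val acc =", model.score(X_val, y_val),
          "test acc =", model.score(X_test, y_test))
```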

K-Nearest Neighbor
The datasets were divided into 90-10%, 80-20%, and 70-30% train-test splits, and K = 5 with the Manhattan distance was used as the similarity metric to predict the closest label for each test sample. Figure 14 shows the accuracy for the four datasets; the highest accuracy of 100% was achieved.
Figure 14. Performance comparison of KNN algorithm accuracy for the four datasets for different data splits.

Random Forest
The dataset was divided into 90-10%, 80-20%, and 70-30% splits; the number of trees was set to 100, and the Gini index was used for splitting the trees. Figure 15 shows the accuracy for the four datasets; the highest accuracy of more than 99% was achieved.

Decision Tree
The dataset was divided into 90-10%, 80-20%, and 70-30% splits, and we used the Gini index for splitting the tree. Figure 16 shows the accuracy for the four datasets; the highest accuracy of more than 99% was achieved for each dataset.

Figure 17 shows the overall performance comparison of the applied ML algorithms in terms of prediction accuracy. The neural networks showed the highest accuracy among all the algorithms, and the KNN algorithm also performed best for all datasets across all the splits.
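A summary comparison in the spirit of Figure 17 could be produced with a loop like the one below, training the four classifiers on each split and collecting their test accuracies. The model hyperparameters mirror those stated in the text, while the feature matrix and labels are placeholders standing in for the real extracted features; the actual comparison in the study was carried out in MATLAB.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(360, 75))          # placeholder feature vectors
y = rng.integers(0, 2, size=360)        # placeholder Normal/Leakage labels

models = {
    "Neural network": MLPClassifier(hidden_layer_sizes=(50,), activation="logistic",
                                    max_iter=500, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5, metric="manhattan"),
    "Random forest": RandomForestClassifier(n_estimators=100, criterion="gini", random_state=0),
    "Decision tree": DecisionTreeClassifier(criterion="gini", random_state=0),
}

for test_size in (0.10, 0.20, 0.30):    # the 90-10%, 80-20%, and 70-30% splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=0)
    scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
    print(f"{int((1 - test_size) * 100)}-{int(test_size * 100)} split:", scores)
```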

Conclusions
This article presents a machine learning-based platform for detecting and localizing pipeline leaks using acoustic emission (AE) technology. By extracting various statistical measures from AE signals and using them as features to train machine learning models, the platform can accurately identify and locate leaks in pipelines. In order to preserve the characteristics of both burst- and continuous-type emissions, a sliding window with an adaptive threshold was used, allowing for real-time data collection and analysis. The article also presents an evaluation of the proposed platform using four datasets that contain water and gas leaks at different pressures and various machine learning classifiers, such as neural networks, decision trees, random forests, and k-nearest neighbors. An overall classification accuracy of 99% was achieved, indicating that the proposed platform is a reliable and effective solution for pipeline leak detection and localization. Overall, the article emphasizes the significance of pipeline leaks and how the proposed machine learning-based platform can be an effective solution to this problem. The severe consequences of pipeline leaks include wasted resources, health risks, distribution downtime, and economic losses, so it is important to develop efficient leak detection systems. The success of the proposed platform in detecting and localizing leaks with high accuracy provides a strong indication of its potential for use in real-world applications. It is also notable that AE technology is a promising solution for detecting pipeline leaks, as its leak diagnosis capability has been clearly demonstrated. The current study is capable of detecting and identifying the size of a leak in the pipeline. However, the classification model cannot predict the condition of the pipeline together with the pressure and transportation medium. For this reason, in the future, a classification model can be developed that predicts the condition of the pipeline along with the pressure and transportation medium.