Synthetic Leak Data Generation Using Variational Autoencoders to Address Data Imbalance in Acoustic Emission-Based Pipe Leak Detection
Featured Application
Abstract
1. Introduction
2. Related Work
2.1. Pipe Leak Detection Using Acoustic Emission Sensors
2.2. Data Imbalance Handling
3. Data Acquisition
4. Synthetic Leak Data Generation
4.1. Difference of Spectrograms
4.2. Training of Variational Autoencoder Using Differences-of-Spectrogram Images
4.3. Training of Leak Detection Models
5. Experimental Results
5.1. VAE Model Training Using Differences-of-Spectrogram Images
5.2. Leak Detection Models
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AE | Acoustic Emission |
| VAE | Variational Autoencoder |
| SVM | Support Vector Machine |
| ABC | Artificial Bee Colony |
| STFT | Short-Time Fourier Transform |
| IMF | Intrinsic Mode Function |
| EMD | Empirical Mode Decomposition |
| CNN | Convolutional Neural Network |
| LSTM | Long Short-Term Memory |
| CWT | Continuous Wavelet Transform |
| SMOTE | Synthetic Minority Oversampling Technique |
| ADASYN | Adaptive Synthetic Sampling |
| DFT | Discrete Fourier Transform |
| DoS | Difference of Spectrograms |
| KLD | Kullback–Leibler Divergence |
| PCA | Principal Component Analysis |

| Layer Name | Kernel | Output Shape |
|---|---|---|
| conv2d | 3 × 3, 512, stride 2 | 128 × 64 |
| conv2d | 3 × 3, 256, stride 2 | 64 × 32 |
| conv2d | 3 × 3, 128, stride 2 | 32 × 16 |
| conv2d | 3 × 3, 64, stride 2 | 16 × 8 |
| conv2d | 3 × 3, 32, stride 2 | 8 × 4 |
| linear | - | 1024 |
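
The output shapes in the encoder table can be checked with a few lines of arithmetic. The sketch below assumes a 256 × 128 single-channel input (inferred from the final output shape in the decoder table) and a padding of 1 for every 3 × 3 convolution; neither value is stated in the table, and the helper names `conv_out` and `trace_encoder` are hypothetical.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output size of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_encoder(height: int, width: int, channels: list[int]) -> list[tuple[int, int, int]]:
    """Return (channels, height, width) after each stride-2 convolution."""
    shapes = []
    for ch in channels:
        height, width = conv_out(height), conv_out(width)
        shapes.append((ch, height, width))
    return shapes


# Filter counts taken from the encoder table; the input size is an assumption.
ENCODER_CHANNELS = [512, 256, 128, 64, 32]
shapes = trace_encoder(256, 128, ENCODER_CHANNELS)

for ch, h, w in shapes:
    print(f"{ch:>3} channels -> {h} x {w}")

# Flattening the last feature map gives the width of the linear layer.
last_ch, last_h, last_w = shapes[-1]
flat_features = last_ch * last_h * last_w
print("flattened features:", flat_features)
```

Under these assumptions the last feature map is 32 × 8 × 4, which flattens to the 1024 units listed for the linear layer.
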

| Layer Name | Kernel | Output Shape |
|---|---|---|
| linear (mean) | - | 128 |
| linear (logvar) | - | 128 |
| reparameterization | - | 128 |
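
The mean, log-variance, and reparameterization rows correspond to the usual VAE sampling step. The following is a minimal NumPy sketch of that step and of the closed-form KLD against a standard normal prior, written in the standard formulation rather than taken from the authors' code. The batch size of 4 and the function names are illustrative assumptions; only the latent size of 128 comes from the table.

```python
import numpy as np

LATENT_DIM = 128  # latent size from the table


def reparameterize(mean: np.ndarray, logvar: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sample z = mean + std * eps with eps ~ N(0, I)."""
    std = np.exp(0.5 * logvar)
    eps = rng.standard_normal(mean.shape)
    return mean + std * eps


def kl_to_standard_normal(mean: np.ndarray, logvar: np.ndarray) -> np.ndarray:
    """Closed-form KL(N(mean, diag(exp(logvar))) || N(0, I)), one value per sample."""
    return -0.5 * np.sum(1.0 + logvar - mean**2 - np.exp(logvar), axis=-1)


rng = np.random.default_rng(0)
mean = np.zeros((4, LATENT_DIM))    # hypothetical encoder outputs for a batch of 4
logvar = np.zeros((4, LATENT_DIM))

z = reparameterize(mean, logvar, rng)
kld = kl_to_standard_normal(mean, logvar)
print(z.shape)  # (4, 128)
print(kld)      # all zeros: the posterior equals the prior here
```

Sampling through `mean + std * eps` keeps the random draw outside the learned parameters, which is what allows gradients to flow back to the two linear layers during training.
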

| Layer Name | Kernel | Output Shape |
|---|---|---|
| decoder input | - | 1024 |
| pixel shuffle | 3 × 3, 64, stride 2, upscale 2 | 16 × 8 |
| pixel shuffle | 3 × 3, 128, stride 2, upscale 2 | 32 × 16 |
| pixel shuffle | 3 × 3, 256, stride 2, upscale 2 | 64 × 32 |
| pixel shuffle | 3 × 3, 512, stride 2, upscale 2 | 128 × 64 |
| interpolation | bilinear, upscale 2 | 256 × 128 |
| | 3 × 3, 1, stride 1 | |
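
The pixel shuffle rows upsample by rearranging channels into space. The sketch below shows that rearrangement in NumPy, following the channel ordering used by the common sub-pixel convolution convention. It assumes the 1024-unit decoder input is reshaped to 32 × 8 × 4 and that a convolution before the first shuffle produces 64 × 2² = 256 channels; both are inferences from the table shapes, not stated details.

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange (C * r * r, H, W) into (C, H * r, W * r)."""
    c_in, h, w = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    c_out = c_in // (r * r)
    x = x.reshape(c_out, r, r, h, w)   # axes: (C, i, j, H, W)
    x = x.transpose(0, 3, 1, 4, 2)     # axes: (C, H, i, W, j)
    return x.reshape(c_out, h * r, w * r)


# Hypothetical feature map entering the first shuffle stage: 256 channels at 8 x 4.
features = np.arange(256 * 8 * 4, dtype=np.float32).reshape(256, 8, 4)
out = pixel_shuffle(features, r=2)
print(out.shape)  # (64, 16, 8)

# Tiny example: four 1 x 1 channels become one 2 x 2 map.
tiny = pixel_shuffle(np.arange(4).reshape(4, 1, 1), r=2)
print(tiny.tolist())  # [[[0, 1], [2, 3]]]
```

Each shuffle with an upscale factor of 2 doubles both spatial dimensions and divides the channel count by four, which matches the progression from 16 × 8 to 128 × 64 in the table.
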

| Parameter | Value |
|---|---|
| Frame length | 1024 |
| Hop size | 512 |
| Window type | Hann window |
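
To make the parameters above concrete, here is a minimal NumPy sketch of an STFT magnitude spectrogram with a frame length of 1024, a hop size of 512, and a Hann window. The sampling rate and the test tone are hypothetical values chosen for illustration; the table does not give them, and this is not the authors' preprocessing code.

```python
import numpy as np

FRAME_LENGTH = 1024  # from the table
HOP_SIZE = 512       # from the table


def stft_magnitude(signal: np.ndarray, frame_length: int = FRAME_LENGTH,
                   hop_size: int = HOP_SIZE) -> np.ndarray:
    """Return a magnitude spectrogram of shape (frequency bins, frames)."""
    if len(signal) < frame_length:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_length)  # Hann window
    n_frames = 1 + (len(signal) - frame_length) // hop_size
    frames = np.stack([
        signal[i * hop_size: i * hop_size + frame_length]
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames * window, axis=1)
    return np.abs(spectrum).T


SAMPLE_RATE = 100_000  # hypothetical sampling rate in Hz
TONE_HZ = 12_500       # hypothetical test tone, centred on bin 128

t = np.arange(SAMPLE_RATE) / SAMPLE_RATE  # one second of samples
signal = np.sin(2 * np.pi * TONE_HZ * t)

spec = stft_magnitude(signal)
print(spec.shape)               # (513, 194)
print(spec.argmax(axis=0)[:5])  # peak bin in the first frames
```

With a hop size of half the frame length, consecutive frames overlap by 50%, and a 1024-sample frame yields 513 non-negative frequency bins.
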

| Model | Oversampling Method | | | | |
|---|---|---|---|---|---|
| | Baseline | Random | SMOTE | ADASYN | Proposed |
| Logistic Regression | 0.901 | 0.890 | 0.890 | 0.889 | 0.990 |
| | 0.901 | 0.901 | 0.901 | 0.900 | 0.990 |
| Decision Tree | 0.518 | 0.519 | 0.519 | 0.519 | 0.973 |
| | 0.037 | 0.071 | 0.071 | 0.071 | 0.972 |
| SVM | 0.500 | 0.585 | 0.629 | 0.630 | 0.926 |
| | 0.000 | 0.289 | 0.409 | 0.413 | 0.920 |
| Gradient Boosting | 0.518 | 0.543 | 0.543 | 0.543 | 0.973 |
| | 0.037 | 0.157 | 0.157 | 0.157 | 0.972 |
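
For context on the SMOTE column, the sketch below illustrates the interpolation idea behind SMOTE: each synthetic sample lies on the line segment between a minority sample and one of its nearest minority neighbours. It is a simplified NumPy illustration with hypothetical data and names, not the implementation used to produce the scores in the table.

```python
import numpy as np


def smote_sketch(minority: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic rows by interpolating between minority neighbours."""
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    rng = np.random.default_rng(seed)
    k = min(k, n - 1)

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                    # sample to start from
    pick = neighbours[base, rng.integers(0, k, size=n_new)]  # one of its k neighbours
    gap = rng.random((n_new, 1))                             # position along the segment
    return minority[base] + gap * (minority[pick] - minority[base])


# Hypothetical minority-class features: 10 leak samples with 8 features each.
rng = np.random.default_rng(42)
leak_features = rng.normal(size=(10, 8))

synthetic = smote_sketch(leak_features, n_new=50)
print(synthetic.shape)  # (50, 8)
```

Because every synthetic row is a convex combination of two existing rows, this kind of oversampling cannot produce samples outside the region already covered by the minority class.
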
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Park, B.; Ryu, H.; Yoo, H. Synthetic Leak Data Generation Using Variational Autoencoders to Address Data Imbalance in Acoustic Emission-Based Pipe Leak Detection. Appl. Sci. 2026, 16, 3050. https://doi.org/10.3390/app16063050

