Denoising Masked Autoencoder-Based Missing Imputation within Constrained Environments for Electric Load Data
Abstract
1. Introduction
- Prior autoencoder-based imputation methods were developed in environments where plentiful complete daily load profile (DLP) data are available. The proposed DMAE method relaxes this training-data requirement, so it remains applicable when the data collection period is short and the missing rate is high.
- Previous autoencoder-based approaches to missing-value imputation used either the DAE or the MAE alone. The proposed DMAE integrates both methods to combine their respective advantages, which substantially improves imputation performance.
- To validate the proposed DMAE-based imputation technique, we conducted extensive experiments on three datasets, comparing it with the traditional methods HA and LI and with the existing autoencoder-based methods DAE and MAE. Although LI may perform better on DLP data with short missing durations, DMAE outperforms HA, LI, DAE, and MAE in imputation accuracy, especially for longer missing durations.
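The two traditional baselines named above, HA and LI, can be illustrated with a minimal sketch. It assumes that a missing reading is marked with `None`, that HA averages the same time slot over the days on which it was observed, and that LI interpolates within a single day; the function names and these conventions are illustrative assumptions, not the exact baseline implementations used in the experiments.

```python
from typing import List, Optional

# One daily load profile; None marks a missing reading (an assumed convention).
Profile = List[Optional[float]]


def historical_average(days: List[Profile]) -> List[List[float]]:
    """Fill each missing slot with the mean of that slot over the days where it was observed."""
    n_slots = len(days[0])
    slot_means = []
    for t in range(n_slots):
        observed = [d[t] for d in days if d[t] is not None]
        # Assumption: fall back to 0.0 when a slot was never observed on any day.
        slot_means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[slot_means[t] if v is None else v for t, v in enumerate(d)] for d in days]


def linear_interpolation(day: Profile) -> List[float]:
    """Fill gaps on a straight line between the nearest observed neighbours.

    Gaps at the start or end of the day copy the nearest observed value.
    """
    known = [t for t, v in enumerate(day) if v is not None]
    if not known:
        raise ValueError("profile has no observed readings")
    filled = []
    for t, v in enumerate(day):
        if v is not None:
            filled.append(v)
            continue
        left = max((k for k in known if k < t), default=None)
        right = min((k for k in known if k > t), default=None)
        if left is None:
            filled.append(day[right])
        elif right is None:
            filled.append(day[left])
        else:
            w = (t - left) / (right - left)
            filled.append(day[left] + w * (day[right] - day[left]))
    return filled
```

LI only uses the two readings that bracket a gap, which is consistent with it doing well on short gaps and degrading as the gap grows, whereas HA ignores the current day entirely.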
2. Related Works
2.1. Missing Imputation Model Based on Generative Adversarial Networks (GANs)
2.2. Missing Imputation Model Based on Denoising Autoencoders (DAEs)
2.3. Missing Imputation Model Based on Masked Autoencoders (MAEs)
3. Proposed Methodology
3.1. Denoising Autoencoder (DAE)
3.2. Masked Autoencoder (MAE)
3.3. Proposed Denoising Masked Autoencoder (DMAE)
Algorithm 1. Proposed DMAE-based missing imputation algorithm.
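As a rough illustration of the general idea of combining denoising-style input corruption with a masked reconstruction loss, the following sketch prepares one training sample and scores a reconstruction. It is not a reproduction of Algorithm 1: the zero-filling convention, the `drop_rate` parameter, and the function names are all assumptions made for the example.

```python
import random
from typing import List, Optional, Tuple


def corrupt_observed(profile: List[Optional[float]], drop_rate: float,
                     rng: random.Random) -> Tuple[List[float], List[bool]]:
    """Return (network input, observed mask) for one daily profile.

    Missing readings (None) and randomly dropped observed readings are both
    set to 0.0 in the input (an assumed convention). The mask still marks
    every originally observed slot, including the dropped ones.
    """
    observed = [v is not None for v in profile]
    net_input = [
        0.0 if (v is None or rng.random() < drop_rate) else v
        for v in profile
    ]
    return net_input, observed


def masked_mse(output: List[float], profile: List[Optional[float]],
               observed: List[bool]) -> float:
    """Mean squared error computed over originally observed slots only."""
    errors = [(o - v) ** 2 for o, v, m in zip(output, profile, observed) if m]
    return sum(errors) / len(errors) if errors else 0.0
```

Under these assumptions, the loss never touches slots whose true value is unknown, so incomplete profiles can still be used for training, while the random dropping forces the network to reconstruct readings it did not see in the input.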
4. Performance Evaluation
4.1. Experimental Setup
4.1.1. Data Description
4.1.2. Baseline Model
4.1.3. Evaluation Metrics
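The results tables report the normalized root mean square error (NRMSE) and the normalized mean absolute error (NMAE) as percentages. A minimal sketch of both metrics is given here; the normalization constant is passed in as a hypothetical `scale` argument because the exact normalization (for example, the range or the mean of the true load) is an assumption of this example.

```python
import math
from typing import Sequence


def nrmse(true: Sequence[float], pred: Sequence[float], scale: float) -> float:
    """Root mean square error divided by `scale`, expressed in percent."""
    if len(true) != len(pred) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))
    return 100.0 * rmse / scale


def nmae(true: Sequence[float], pred: Sequence[float], scale: float) -> float:
    """Mean absolute error divided by `scale`, expressed in percent."""
    if len(true) != len(pred) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(t - p) for t, p in zip(true, pred)) / len(true)
    return 100.0 * mae / scale
```

Both functions would typically be evaluated only on the positions that were imputed, with `scale` fixed per dataset so that errors are comparable across sites with very different load levels.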
4.1.4. Hyperparameter Selection
4.2. Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
DLP | Daily load profile |
HA | Historical average |
LI | Linear interpolation |
DAE | Denoising autoencoder |
MAE | Masked autoencoder |
DMAE | Denoising masked autoencoder |
PDF | Probability density function |
CDF | Cumulative distribution function |
Dataset | Mean (kW) | Std (kW) | Min (kW) | 25% (kW) | 50% (kW) | 75% (kW) | Max (kW) |
---|---|---|---|---|---|---|---|
Dataset 14 | 872.042 | 354.4232 | 359.4548 | 577.4333 | 688.5184 | 1198.8815 | 2585.3501 |
Dataset 366 | 22.2583 | 6.0807 | 7.6144 | 16.7268 | 23.093 | 25.7975 | 43.1901 |
Dataset 716 | 313.5718 | 64.4742 | 85.8452 | 277.8806 | 316.9546 | 352.3403 | 507.441 |
Hyperparameters | Value |
---|---|
The number of hidden layers (L) | 2 |
The size of the input neurons | 96 |
The size of the hidden neurons | 64 |
The size of the latent features | 32 |
Learning rate | 0.0001 |
Dropout rate | 0.1 |
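To make the hyperparameter table concrete, the sketch below collects the listed values in a small configuration object and derives the fully connected layer shapes. The mirrored decoder (96 → 64 → 32 → 64 → 96) is an assumption of this example, as are the class and function names; the table itself only gives the sizes, the learning rate, and the dropout rate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AutoencoderConfig:
    """Values copied from the hyperparameter table."""
    input_size: int = 96
    hidden_size: int = 64
    latent_size: int = 32
    learning_rate: float = 1e-4
    dropout_rate: float = 0.1


def layer_shapes(cfg: AutoencoderConfig) -> List[Tuple[int, int]]:
    """(in_features, out_features) of each dense layer, assuming a mirrored decoder."""
    widths = [cfg.input_size, cfg.hidden_size, cfg.latent_size,
              cfg.hidden_size, cfg.input_size]
    return list(zip(widths[:-1], widths[1:]))


def parameter_count(cfg: AutoencoderConfig) -> int:
    """Total number of weights and biases over all dense layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in layer_shapes(cfg))
```

With the table's values and the assumed symmetric layout, the network has four dense layers and on the order of 10^4 trainable parameters, which is small enough to train on a short data collection period.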
Metric | HA | LI | DAE | MAE | DMAE (Proposed) |
---|---|---|---|---|---|
NRMSE | 10.03% | 3.56% | 6.17% | 4.08% | 3.67% |
NMAE | 6.04% | 1.69% | 4.13% | 2.84% | 2.22% |
NRMSE | 8.71% | 7.05% | 11.00% | 6.02% | 4.98% |
NMAE | 5.43% | 3.76% | 8.16% | 3.73% | 2.86% |
NRMSE | 8.16% | 10.31% | 14.31% | 6.63% | 5.40% |
NMAE | 4.89% | 6.76% | 10.92% | 4.39% | 3.01% |
NRMSE | 8.35% | 10.40% | 14.82% | 10.66% | 5.97% |
NMAE | 5.35% | 7.73% | 11.59% | 6.04% | 3.27% |
Metric | HA | LI | DAE | MAE | DMAE (Proposed) |
---|---|---|---|---|---|
NRMSE | 5.08% | 3.43% | 5.70% | 3.77% | 3.31% |
NMAE | 2.67% | 2.67% | 4.51% | 2.79% | 2.35% |
NRMSE | 5.31% | 5.49% | 12.96% | 3.96% | 3.48% |
NMAE | 2.91% | 4.17% | 9.52% | 2.59% | 2.43% |
NRMSE | 4.81% | 8.39% | 16.10% | 3.77% | 3.59% |
NMAE | 3.01% | 6.61% | 12.53% | 2.69% | 2.59% |
NRMSE | 5.28% | 10.99% | 20.96% | 11.19% | 4.81% |
NMAE | 3.16% | 8.73% | 17.59% | 5.80% | 3.02% |
Metric | HA | LI | DAE | MAE | DMAE (Proposed) |
---|---|---|---|---|---|
NRMSE | 8.32% | 3.58% | 7.62% | 7.04% | 4.21% |
NMAE | 6.01% | 2.51% | 5.53% | 4.69% | 2.99% |
NRMSE | 8.56% | 3.91% | 13.71% | 6.38% | 4.38% |
NMAE | 5.41% | 2.89% | 11.01% | 3.92% | 3.32% |
NRMSE | 7.52% | 5.42% | 19.51% | 5.82% | 5.03% |
NMAE | 5.55% | 4.11% | 16.73% | 3.95% | 3.64% |
NRMSE | 8.78% | 6.00% | 24.85% | 15.03% | 5.13% |
NMAE | 5.98% | 4.59% | 21.95% | 7.46% | 3.76% |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jeong, J.; Ku, T.-Y.; Park, W.-K. Denoising Masked Autoencoder-Based Missing Imputation within Constrained Environments for Electric Load Data. Energies 2023, 16, 7933. https://doi.org/10.3390/en16247933