# Generating Energy Data for Machine Learning with Recurrent Generative Adversarial Networks


## Abstract


## 1. Introduction

## 2. Background

#### 2.1. Generative Adversarial Networks

#### 2.2. Recurrent Neural Networks

## 3. Related Work

#### 3.1. Data Sets in Energy Forecasting

#### 3.2. Generative Adversarial Networks (GANs)

## 4. Methodology

#### 4.1. Data Pre-Processing

#### 4.1.1. ARIMA

#### 4.1.2. Fourier Transform

#### 4.1.3. Normalization

#### 4.1.4. Sliding Window

#### 4.2. R-GAN

#### 4.3. Evaluation Process

**Train on Synthetic, Test on Real (TSTR):** A prediction model is trained with synthetic data and tested on real data. TSTR was proposed by Esteban et al. [35], who evaluated their GAN model on a clustering task using a random forest classifier. In contrast, our study evaluates R-GAN on an energy forecasting task using an RNN forecasting model, which is trained with synthetic data and tested on real data. Note that this forecasting RNN is different from the RNNs used for the GAN generator and discriminator and could be replaced by a different ML algorithm; an RNN was selected because of its recent success in energy forecasting studies [26].

**Train on Real, Test on Synthetic (TRTS):** This is the reverse of TSTR: a model is trained on real data and tested on synthetic data. The process is exactly the same as in TSTR, with the roles of synthetic and real data reversed. TRTS evaluates the GAN's ability to generate realistic-looking data. Unlike TSTR, TRTS is not affected by mode collapse, as limited diversity of the synthetic data does not reduce forecasting accuracy. Because the aim is to generate data for training ML models, TSTR is a more significant metric than TRTS.

**Train on Real, Test on Real (TRTR):** This is the traditional evaluation, with the model trained and tested on real data (with separate train and test sets). TRTR does not evaluate the synthetic data itself, but it allows comparison of the accuracy achieved when a model is trained with real data against the accuracy achieved with synthetic data. Low TRTR and TSTR accuracies indicate that the forecasting model cannot capture variations in the data; they do not imply low quality of the synthetic data. The goal of the presented R-GAN data generation is a TSTR value comparable to the TRTR value, regardless of their absolute values: this demonstrates that a model trained with synthetic data has abilities similar to those of a model trained with real data.

**Train on Synthetic, Test on Synthetic (TSTS):** Similar to TRTR, TSTS evaluates the ability of the forecasting model to capture variations in the data: TRTR evaluates accuracy with real data and TSTS with synthetic data. A large discrepancy between TRTR and TSTS indicates that the model performs much better with real data than with synthetic data, or the other way around; consequently, the synthetic data does not resemble the real data.
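The four protocols differ only in which data set is used for training and which for testing. The sketch below makes this pairing explicit; it is a minimal stand-in, not the paper's implementation: the hypothetical `fit_forecaster` predicts the training mean, where the paper would train the forecasting RNN.

```python
from statistics import mean

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    return 100 * mean(abs((t - p) / t) for t, p in zip(y_true, y_pred))

def fit_forecaster(X, y):
    """Stand-in forecaster: ignores the features X and predicts the
    training-set mean of y. The paper uses an RNN here; any ML
    regressor with the same fit/predict shape can be swapped in."""
    y_bar = mean(y)
    return lambda X_new: [y_bar] * len(X_new)

def evaluate_all(real_train, real_test, synth_train, synth_test):
    """Compute TRTR, TRTS, TSTR, and TSTS; each metric pairs one
    training set with one test set. Each argument is an (X, y) tuple."""
    pairs = {
        "TRTR": (real_train, real_test),
        "TRTS": (real_train, synth_test),
        "TSTR": (synth_train, real_test),
        "TSTS": (synth_train, synth_test),
    }
    results = {}
    for name, ((X_tr, y_tr), (X_te, y_te)) in pairs.items():
        model = fit_forecaster(X_tr, y_tr)
        results[name] = mape(y_te, model(X_te))
    return results
```

With this stand-in, a synthetic set whose level differs from the real set immediately shows up as a TSTR/TRTR gap, which is exactly the comparison the evaluation process relies on.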

## 5. Evaluation

#### 5.1. Data Sets and Pre-Processing

#### 5.2. Experiments

- Number of layers $L=2$
- Cell state dimension size $c=128$
- Learning rate = $2\times {10}^{-6}$
- Batch size = 100
- Optimizer = Adam

- Core features only.
- Core and ARIMA generated features.
- Core and FT generated features.
- Core, ARIMA, and FT generated features.

- Hidden layer sizes: 32, 64, 128
- Number of layers: 1, 2
- Batch sizes: 1, 5, 10, 15, 30, 50
- Learning rates: continuous from 0.001 to 0.03
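The hyperparameter lists above combine discrete grids with a continuous learning-rate range, which suits random search. As an illustrative sketch (the `objective` callback, which would train and validate the forecasting model and return its validation error, is a hypothetical stand-in):

```python
import random

# Discrete grids taken from the search space above; the learning rate
# is sampled continuously from its stated range.
SEARCH_SPACE = {
    "hidden_size": [32, 64, 128],
    "num_layers": [1, 2],
    "batch_size": [1, 5, 10, 15, 30, 50],
}

def sample_config(rng):
    """Draw one configuration from the search space."""
    cfg = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    cfg["learning_rate"] = rng.uniform(0.001, 0.03)
    return cfg

def random_search(objective, n_trials=20, seed=42):
    """Evaluate n_trials random configurations and keep the one with
    the lowest objective value (e.g., validation error)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Random search over mixed discrete/continuous spaces like this one often finds good configurations with far fewer trials than an exhaustive grid [50].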

#### 5.3. Results and Discussion—UCI Data Set

#### 5.4. Results and Discussion—Building Genome Data Set

## 6. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| AMI | Advanced Metering Infrastructure |
| ARIMA | AutoRegressive Integrated Moving Average |
| C-RNN-GAN | Continuous Recurrent GAN |
| CNN | Convolutional Neural Network |
| DL | Deep Learning |
| EEG | Electroencephalographic brain signal |
| EEG-GAN | GAN for generating EEG |
| ELU | Exponential Linear Unit |
| FT | Fourier Transform |
| LSTM | Long Short-Term Memory |
| LR | Learning Rate |
| GAN | Generative Adversarial Network |
| GELU | Gaussian Error Linear Unit |
| GRU | Gated Recurrent Unit |
| MAPE | Mean Absolute Percentage Error |
| MAE | Mean Absolute Error |
| ML | Machine Learning |
| MH-GAN | Metropolis-Hastings GAN |
| NILM | Nonintrusive Load Monitoring |
| R-GAN | Recurrent GAN |
| ReLU | Rectified Linear Unit |
| RNN | Recurrent Neural Network |
| SMOTE | Synthetic Minority Over-sampling TEchnique |
| SR | Superresolution |
| SRGAN | GAN for image superresolution |
| TRTR | Train on Real, Test on Real |
| TRTS | Train on Real, Test on Synthetic |
| TSTR | Train on Synthetic, Test on Real |
| TSTS | Train on Synthetic, Test on Synthetic |
| WGAN | Wasserstein GAN |
| WGAN-GP | Wasserstein GAN with Gradient Penalty |

## References

1. Zhou, K.; Fu, C.; Yang, S. Big data driven smart energy management: From big data to big insights. Renew. Sustain. Energy Rev. **2016**, 56, 215–225.
2. Tian, Y.; Sehovac, L.; Grolinger, K. Similarity-based chained transfer learning for energy forecasting with Big Data. IEEE Access **2019**, 7, 139895–139908.
3. Sial, A.; Singh, A.; Mahanti, A.; Gong, M. Heuristics-Based Detection of Abnormal Energy Consumption. In Smart Grid and Innovative Frontiers in Telecommunications, Proceedings of the International Conference on Smart Grid Inspired Future Technologies, Auckland, New Zealand, 23–24 April 2018; Springer: Berlin, Germany, 2018; pp. 21–31.
4. Sial, A.; Singh, A.; Mahanti, A. Detecting anomalous energy consumption using contextual analysis of smart meter data. In Wireless Networks; Springer: Berlin, Germany, 2019; pp. 1–18.
5. Deb, C.; Frei, M.; Hofer, J.; Schlueter, A. Automated load disaggregation for residences with electrical resistance heating. Energy Build. **2019**, 182, 61–74.
6. Miller, C.; Meggers, F. The Building Data Genome Project: An open, public data set from non-residential building electrical meters. Energy Proc. **2017**, 122, 439–444.
7. Ratnam, E.L.; Weller, S.R.; Kellett, C.M.; Murray, A.T. Residential load and rooftop PV generation: An Australian distribution network dataset. Int. J. Sustain. Energy **2017**, 36, 787–806.
8. Wang, Y.; Chen, Q.; Hong, T.; Kang, C. Review of smart meter data analytics: Applications, methodologies, and challenges. IEEE Trans. Smart Grid **2018**, 10, 3125–3148.
9. Guan, Z.; Li, J.; Wu, L.; Zhang, Y.; Wu, J.; Du, X. Achieving Efficient and Secure Data Acquisition for Cloud-Supported Internet of Things in Smart Grid. IEEE Internet Things J. **2017**, 4, 1934–1944.
10. Liu, Y.; Guo, W.; Fan, C.I.; Chang, L.; Cheng, C. A practical privacy-preserving data aggregation (3PDA) scheme for smart grid. IEEE Trans. Ind. Inform. **2018**, 15, 1767–1774.
11. Fan, C.; Xiao, F.; Li, Z.; Wang, J. Unsupervised data analytics in mining big building operational data for energy efficiency enhancement: A review. Energy Build. **2018**, 159, 296–308.
12. Genes, C.; Esnaola, I.; Perlaza, S.M.; Ochoa, L.F.; Coca, D. Robust recovery of missing data in electricity distribution systems. IEEE Trans. Smart Grid **2018**, 10, 4057–4067.
13. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
14. Zhang, H.; Xu, T.; Li, H.; Zhang, S.; Wang, X.; Huang, X.; Metaxas, D.N. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5907–5915.
15. Cai, J.; Hu, H.; Shan, S.; Chen, X. FCSR-GAN: End-to-end Learning for Joint Face Completion and Super-resolution. In Proceedings of the IEEE International Conference on Automatic Face & Gesture Recognition, Lille, France, 14–18 May 2019; pp. 1–8.
16. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. arXiv **2017**, arXiv:1701.07875.
17. Turner, R.; Hung, J.; Frank, E.; Saatchi, Y.; Yosinski, J. Metropolis-Hastings Generative Adversarial Networks. In Proceedings of the International Conference on Machine Learning, Siena, Italy, 10–13 September 2019; pp. 6345–6353.
18. Xu, T.; Zhang, P.; Huang, Q.; Zhang, H.; Gan, Z.; Huang, X.; He, X. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1316–1324.
19. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Paul Smolley, S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2794–2802.
20. Denton, E.L.; Chintala, S.; Fergus, R. Deep generative image models using a Laplacian pyramid of adversarial networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 1486–1494.
21. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4401–4410.
22. Arjovsky, M.; Bottou, L. Towards principled methods for training generative adversarial networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
23. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
24. Pascanu, R.; Gulcehre, C.; Cho, K.; Bengio, Y. How to construct deep recurrent neural networks. In Proceedings of the Second International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014.
25. Amasyali, K.; El-Gohary, N.M. A review of data-driven building energy consumption prediction studies. Renew. Sustain. Energy Rev. **2018**, 81, 1192–1205.
26. Sehovac, L.; Nesen, C.; Grolinger, K. Forecasting Building Energy Consumption with Deep Learning: A Sequence to Sequence Approach. In Proceedings of the IEEE International Congress on Internet of Things, Aarhus, Denmark, 17–21 June 2019.
27. Deb, C.; Zhang, F.; Yang, J.; Lee, S.E.; Shah, K.W. A review on time series forecasting techniques for building energy consumption. Renew. Sustain. Energy Rev. **2017**, 74, 902–924.
28. Lazos, D.; Sproul, A.B.; Kay, M. Optimisation of energy management in commercial buildings with weather forecasting inputs: A review. Renew. Sustain. Energy Rev. **2014**, 39, 587–603.
29. Pillai, G.G.; Putrus, G.A.; Pearsall, N.M. Generation of synthetic benchmark electrical load profiles using publicly available load and weather data. Int. J. Electr. Power **2014**, 61, 1–10.
30. Ngoko, B.; Sugihara, H.; Funaki, T. Synthetic generation of high temporal resolution solar radiation data using Markov models. Sol. Energy **2014**, 103, 160–170.
31. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690.
32. Zhang, H.; Sindagi, V.; Patel, V.M. Image de-raining using a conditional generative adversarial network. IEEE Trans. Circuits Syst. Video Technol. **2019**.
33. Mogren, O. C-RNN-GAN: Continuous recurrent neural networks with adversarial training. arXiv **2016**, arXiv:1611.09904.
34. Yu, L.; Zhang, W.; Wang, J.; Yu, Y. SeqGAN: Sequence generative adversarial nets with policy gradient. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
35. Esteban, C.; Hyland, S.L.; Rätsch, G. Real-valued (medical) time series generation with recurrent conditional GANs. arXiv **2017**, arXiv:1706.02633.
36. Hartmann, K.G.; Schirrmeister, R.T.; Ball, T. EEG-GAN: Generative adversarial networks for electroencephalographic (EEG) brain signals. arXiv **2018**, arXiv:1806.01875.
37. Douzas, G.; Bacao, F. Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Syst. Appl. **2018**, 91, 464–471.
38. Li, S.C.X.; Jiang, B.; Marlin, B. MisGAN: Learning from Incomplete Data with Generative Adversarial Networks. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
39. Shang, C.; Palmer, A.; Sun, J.; Chen, K.S.; Lu, J.; Bi, J. VIGAN: Missing view imputation with generative adversarial networks. In Proceedings of the IEEE International Conference on Big Data, Boston, MA, USA, 11–14 December 2017; pp. 766–775.
40. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training GANs. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 2234–2242.
41. Lucic, M.; Kurach, K.; Michalski, M.; Gelly, S.; Bousquet, O. Are GANs created equal? A large-scale study. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; pp. 700–709.
42. Sajjadi, M.S.; Bachem, O.; Lucic, M.; Bousquet, O.; Gelly, S. Assessing generative models via precision and recall. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; pp. 5228–5237.
43. Theis, L.; van den Oord, A.; Bethge, M. A note on the evaluation of generative models. In Proceedings of the International Conference on Learning Representations, San Juan, Puerto Rico, 2–4 May 2016; pp. 1–10.
44. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts: Melbourne, Australia, 2018.
45. Hendrycks, D.; Gimpel, K. Gaussian Error Linear Units (GELUs). arXiv **2018**, arXiv:1606.08415.
46. Candanedo, L.M.; Feldheim, V.; Deramaix, D. Data driven prediction models of energy use of appliances in a low-energy house. Energy Build. **2017**, 140, 81–97.
47. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A system for large-scale machine learning. In Proceedings of the Symposium on Operating Systems Design and Implementation, Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
48. Keskar, N.S.; Mudigere, D.; Nocedal, J.; Smelyanskiy, M.; Tang, P.T.P. On large-batch training for deep learning: Generalization gap and sharp minima. In Proceedings of the International Conference on Learning Representations, San Juan, Puerto Rico, 2–4 May 2016.
49. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. **2016**, 28, 2222–2232.
50. Bergstra, J.S.; Bardenet, R.; Bengio, Y.; Kégl, B. Algorithms for hyper-parameter optimization. In Proceedings of the Advances in Neural Information Processing Systems, Granada, Spain, 12–14 December 2011; pp. 2546–2554.
51. Grolinger, K.; L'Heureux, A.; Capretz, M.A.; Seewald, L. Energy forecasting for event venues: Big data and prediction accuracy. Energy Build. **2016**, 112, 222–233.
52. Ribeiro, M.; Grolinger, K.; ElYamany, H.F.; Higashino, W.A.; Capretz, M.A. Transfer learning with seasonal and trend adjustment for cross-building energy forecasting. Energy Build. **2018**, 165, 352–363.
53. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. **2002**, 16, 321–357.
54. Farrar, D.E.; Glauber, R.R. Multicollinearity in regression analysis: The problem revisited. Rev. Econ. Stat. **1967**, 49, 92–107.
55. Katrutsa, A.; Strijov, V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria. Expert Syst. Appl. **2017**, 76, 1–11.
56. Vatcheva, K.P.; Lee, M.; McCormick, J.B.; Rahbar, M.H. Multicollinearity in regression analyses conducted in epidemiologic studies. Epidemiology **2016**, 6.

**Figure 9.** Generator and discriminator loss for LR = $2\times {10}^{-6}$, $2\times {10}^{-4}$, and $2\times {10}^{-4}$ (UCI data set).

**Figure 13.** Mean Absolute Percentage Error (MAPE, %) comparison between TRTS/TSTS and TSTR/TRTR (Building Genome data set).

**Figure 14.** Mean Absolute Error (MAE) comparison between TRTS/TSTS and TSTR/TRTR (Building Genome data set).

**Table 1.** Train on Real, Test on Synthetic (TRTS), Train on Real, Test on Real (TRTR), Train on Synthetic, Test on Real (TSTR), and Train on Synthetic, Test on Synthetic (TSTS) accuracy for R-GAN (UCI data set).

| Features | TRTS MAPE (%) | TRTR MAPE (%) | TSTR MAPE (%) | TSTS MAPE (%) | TRTS MAE | TRTR MAE | TSTR MAE | TSTS MAE |
|---|---|---|---|---|---|---|---|---|
| Core features | 13.60 | 17.98 | 18.67 | 18.80 | 54.26 | 63.82 | 62.74 | 90.90 |
| Core and ARIMA features | 8.65 | 11.43 | 11.37 | 8.92 | 48.14 | 62.67 | 54.00 | 80.00 |
| Core and FT features | 9.07 | 15.84 | 17.79 | 15.10 | 48.99 | 63.12 | 61.74 | 90.67 |
| Core, ARIMA, and FT features | 5.28 | 10.81 | 10.12 | 6.80 | 46.41 | 62.27 | 52.54 | 78.35 |

| Model (Real vs. Synthetic) | Mann-Whitney U Value | p Value | Kruskal-Wallis H Value | p Value |
|---|---|---|---|---|
| Core features | 416.50 | 0.312 | 0.247 | 0.619 |
| Core and ARIMA features | 428.00 | 0.375 | 0.107 | 0.744 |
| Core and FT features | 390.50 | 0.191 | 0.775 | 0.375 |
| Core, ARIMA, and FT features | 380.50 | 0.180 | 0.885 | 0.355 |
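The Mann-Whitney U statistics reported here are computed from rank sums over the pooled real and synthetic samples. A stdlib-only sketch of that computation (in practice `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` are used, as they also return the p values):

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic via rank sums, with average ranks for
    ties. Returns min(U1, U2); a value near len(x)*len(y)/2 suggests
    the two samples come from similar distributions."""
    combined = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg_rank
        i = j + 1
    r1 = sum(ranks[: len(x)])  # rank sum of the first sample
    u1 = r1 - len(x) * (len(x) + 1) / 2
    return min(u1, len(x) * len(y) - u1)
```

Fully separated samples give U = 0, while identical samples give the maximum value n1·n2/2, so large U values in the table are consistent with real and synthetic data overlapping heavily.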

| Features | TRTS MAPE (%) | TRTR MAPE (%) | TSTR MAPE (%) | TSTS MAPE (%) | TRTS MAE | TRTR MAE | TSTR MAE | TSTS MAE |
|---|---|---|---|---|---|---|---|---|
| Core features | 6.16 | 5.13 | 6.48 | 4.65 | 48.98 | 46.88 | 49.00 | 45.54 |
| Core and ARIMA features | 10.37 | 10.54 | 11.89 | 9.84 | 61.38 | 62.15 | 64.16 | 59.20 |
| Core and FT features | 4.16 | 4.86 | 5.49 | 3.88 | 44.13 | 44.46 | 45.12 | 43.84 |
| Core, ARIMA, and FT features | 6.76 | 6.77 | 7.37 | 6.50 | 50.03 | 50.83 | 51.33 | 50.00 |

| Model (Real vs. Synthetic) | Mann-Whitney U Value | p Value | Kruskal-Wallis H Value | p Value |
|---|---|---|---|---|
| Core features | 430.00 | 0.387 | 0.087 | 0.767 |
| Core and ARIMA features | 434.00 | 0.409 | 0.056 | 0.813 |
| Core and FT features | 403.50 | 0.248 | 0.473 | 0.492 |
| Core, ARIMA, and FT features | 433.00 | 0.404 | 0.063 | 0.802 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Fekri, M.N.; Ghosh, A.M.; Grolinger, K.
Generating Energy Data for Machine Learning with Recurrent Generative Adversarial Networks. *Energies* **2020**, *13*, 130.
https://doi.org/10.3390/en13010130
