# A Two-Stage Multistep-Ahead Electricity Load Forecasting Scheme Based on LightGBM and Attention-BiLSTM

## Abstract

## 1. Introduction

- We present a forecasting model that combines an ensemble learning method and an RNN for accurate MSA forecasting;
- We show that the performance of an MSA forecasting model can be further improved by incorporating the prediction of a single-output forecasting model;
- The proposed model achieves stable forecasting accuracy over the entire forecasting horizon of 96 time points at 15-min intervals.
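The two-stage idea in the second bullet can be illustrated with a toy sketch. Everything here is a simplified stand-in (synthetic data, least-squares models in place of the paper's LightGBM and ATT-BiLSTM): a single-output model forecasts the first step ahead, and its prediction is appended as an extra input feature for the multistep-ahead model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy load series: a daily-style sinusoid plus noise (a stand-in for real load data).
t = np.arange(400)
load = 100 + 20 * np.sin(2 * np.pi * t / 96) + rng.normal(0, 1, t.size)

horizon = 4    # forecasting horizon (96 points in the paper; shortened here)
lookback = 8   # length of the input window

# Build (lookback window -> next `horizon` values) samples.
X, Y = [], []
for i in range(lookback, load.size - horizon):
    X.append(load[i - lookback:i])
    Y.append(load[i:i + horizon])
X, Y = np.asarray(X), np.asarray(Y)

split = int(0.8 * len(X))
X_tr, X_te, Y_tr, Y_te = X[:split], X[split:], Y[:split], Y[split:]

def fit_linear(A, b):
    """Least-squares linear model with a bias term (stand-in for LightGBM / ATT-BiLSTM)."""
    A1 = np.hstack([A, np.ones((len(A), 1))])
    W, *_ = np.linalg.lstsq(A1, b, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ W

# Stage 1: a single-output model forecasts only the first step ahead.
stage1 = fit_linear(X_tr, Y_tr[:, 0])

# Stage 2: the multistep-ahead model sees the lookback window PLUS stage 1's output.
X2_tr = np.column_stack([X_tr, stage1(X_tr)])
X2_te = np.column_stack([X_te, stage1(X_te)])
stage2 = fit_linear(X2_tr, Y_tr)

Y_hat = stage2(X2_te)
mae = np.abs(Y_hat - Y_te).mean()
print(f"toy two-stage MAE over a {horizon}-step horizon: {mae:.3f}")
```

The structural point is only the feature hand-off between the stages, not the particular models used here.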

## 2. Related Works

## 3. Data Collection and Preprocessing

### 3.1. Weather Data

### 3.2. Calendar Information and Historical Electricity Load

## 4. Methodology

### 4.1. Single-Output Forecasting

#### 4.1.1. LightGBM

#### 4.1.2. Time Series Cross-Validation

### 4.2. Attention-BiLSTM-Based MSA Forecasting

#### 4.2.1. Bidirectional Long Short-Term Memory

#### 4.2.2. Sequence-to-Sequence Recurrent Neural Networks

#### 4.2.3. Attention Mechanism

## 5. Results and Discussion

The forecasting performance was evaluated using the MAPE, MAE, RMSE, and NRMSE, as given in Equations (19)–(22). Here, ${A}_{t}$ and ${F}_{t}$ represent the actual and forecasted values, respectively, at time $t$; $n$ indicates the number of observations; and $\overline{A}$ represents the mean of the actual values.
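Equations (19)–(22) themselves are not reproduced in this excerpt. The standard definitions of these four metrics, written with the symbols defined above (normalizing the RMSE by $\overline{A}$, consistent with $\overline{A}$ being the only additional symbol introduced), are:

$$\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|A_t - F_t\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(A_t - F_t\right)^2}$$

$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\overline{A}} \times 100$$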

#### 5.1. Single-Output Forecasting Results

#### 5.2. Multistep-Ahead Forecasting Results

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
ICT | information and communications technology |
STLF | short-term load forecasting |
AI | artificial intelligence |
MSA | multistep-ahead |
RNN | recurrent neural network |
LSTM | long short-term memory |
GRU | gated recurrent unit |
S2S | sequence-to-sequence |
ANN | artificial neural network |
ARIMA | autoregressive integrated moving average |
MLR | multiple linear regression |
PCR | principal component regression |
PCC | Pearson correlation coefficient |
WCI | windchill index |
GBM | gradient boosting machine |
ReLU | rectified linear unit |
MAE | mean absolute error |
RMSE | root mean square error |
NN | neural network |
DARNN | dual-stage attention-based recurrent neural network |
ATT-GRU | attention-based gated recurrent unit |
LightGBM | light gradient boosting machine |
TSCV | time series cross-validation |
BiLSTM | bidirectional long short-term memory |
ATT-BiLSTM | bidirectional long short-term memory with attention mechanism |
SVR | support vector regression |
RF | random forest |
FIR | fuzzy inductive reasoning |
MLP | multilayer perceptron |
CNN | convolutional neural network |
XGB | extreme gradient boosting |
ML | machine learning |
KMA | Korea Meteorological Administration |
DI | discomfort index |
GBDT | gradient boosting decision tree |
BA | Bahdanau attention mechanism |
Adam | adaptive moment estimation |
MAPE | mean absolute percentage error |
NRMSE | normalized root mean square error |
RICNN | recurrent inception convolution neural network |
DALSTM | dual-stage attentional long short-term memory |
COSMOS | combination of short-term load forecasting models using a stacking ensemble approach |

## References

1. Atef, S.; Eltawil, A.B. Assessment of stacked unidirectional and bidirectional long short-term memory networks for electricity load forecasting. Electr. Power Syst. Res. **2020**, 187, 106489.
2. Hong, T.; Fan, S. Probabilistic electric load forecasting: A tutorial review. Int. J. Forecast. **2016**, 32, 914–938.
3. Li, B.W.; Zhang, J.; He, Y.; Wang, Y. Short-Term Load-Forecasting Method Based on Wavelet Decomposition with Second-Order Gray Neural Network Model Combined with ADF Test. IEEE Access **2017**, 5, 16324–16331.
4. Rana, M.; Koprinska, I. Forecasting electricity load with advanced wavelet neural networks. Neurocomputing **2016**, 182, 118–132.
5. Dong, Y.X.; Ma, X.J.; Fu, T.L. Electrical load forecasting: A deep learning approach based on K-nearest neighbors. Appl. Soft Comput. **2021**, 99, 106900.
6. Dodamani, S.; Shetty, V.; Magadum, R. Short term load forecast based on time series analysis: A case study. In Proceedings of the 2015 International Conference on Technological Advancements in Power and Energy (TAP Energy), Kollam, India, 24–26 June 2015; pp. 299–303.
7. Song, K.-B.; Baek, Y.-S.; Hong, D.H.; Jang, G. Short-term load forecasting for the holidays using fuzzy linear regression method. IEEE Trans. Power Syst. **2005**, 20, 96–101.
8. Taylor, J.W.; McSharry, P.E. Short-term load forecasting methods: An evaluation based on European data. IEEE Trans. Power Syst. **2007**, 22, 2213–2219.
9. Kelo, S.; Dudul, S. A wavelet Elman neural network for short-term electrical load prediction under the influence of temperature. Int. J. Electr. Power Energy Syst. **2012**, 43, 1063–1071.
10. Zhang, Z.C.; Hong, W.C.; Li, J.C. Electric Load Forecasting by Hybrid Self-Recurrent Support Vector Regression Model with Variational Mode Decomposition and Improved Cuckoo Search Algorithm. IEEE Access **2020**, 8, 14642–14658.
11. Chen, Y.B.; Xu, P.; Chu, Y.Y.; Li, W.L.; Wu, Y.T.; Ni, L.Z.; Bao, Y.; Wang, K. Short-term electrical load forecasting using the Support Vector Regression (SVR) model to calculate the demand response baseline for office buildings. Appl. Energy **2017**, 195, 659–670.
12. Yu, F.; Xu, X.Z. A short-term load forecasting model of natural gas based on optimized genetic algorithm and improved BP neural network. Appl. Energy **2014**, 134, 102–113.
13. Yeom, C.U.; Kwak, K.C. Short-Term Electricity-Load Forecasting Using a TSK-Based Extreme Learning Machine with Knowledge Representation. Energies **2017**, 10, 1613.
14. Liu, T.X.; Zhao, Q.J.; Wang, J.Z.; Gao, Y.Y. A novel interval forecasting system for uncertainty modeling based on multi-input multi-output theory: A case study on modern wind stations. Renew. Energy **2021**, 163, 88–104.
15. Pei, S.Q.; Qin, H.; Yao, L.Q.; Liu, Y.Q.; Wang, C.; Zhou, J.Z. Multi-Step Ahead Short-Term Load Forecasting Using Hybrid Feature Selection and Improved Long Short-Term Memory Network. Energies **2020**, 13, 4121.
16. Sehovac, L.; Nesen, C.; Grolinger, K. Forecasting building energy consumption with deep learning: A sequence to sequence approach. In Proceedings of the 2019 IEEE International Congress on Internet of Things (ICIOT), Milan, Italy, 8–13 July 2019; pp. 108–116.
17. Jarábek, T.; Laurinec, P.; Lucká, M. Energy load forecast using S2S deep neural networks with k-Shape clustering. In Proceedings of the 2017 IEEE 14th International Scientific Conference on Informatics, Poprad, Slovakia, 14–16 November 2017; pp. 140–145.
18. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv **2014**, arXiv:1409.0473.
19. Luong, M.-T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. arXiv **2015**, arXiv:1508.04025.
20. Sehovac, L.; Grolinger, K. Deep Learning for Load Forecasting: Sequence to Sequence Recurrent Neural Networks with Attention. IEEE Access **2020**, 8, 36411–36426.
21. Gollou, A.R.; Ghadimi, N. A new feature selection and hybrid forecast engine for day-ahead price forecasting of electricity markets. J. Intell. Fuzzy Syst. **2017**, 32, 4031–4045.
22. Jalili, A.; Ghadimi, N. Hybrid Harmony Search Algorithm and Fuzzy Mechanism for Solving Congestion Management Problem in an Electricity Market. Complexity **2016**, 21, 90–98.
23. Fan, G.F.; Peng, L.L.; Hong, W.C.; Sun, F. Electric load forecasting by the SVR model with differential empirical mode decomposition and auto regression. Neurocomputing **2016**, 173, 958–970.
24. Grolinger, K.; L’Heureux, A.; Capretz, M.A.M.; Seewald, L. Energy Forecasting for Event Venues: Big Data and Prediction Accuracy. Energy Build. **2016**, 112, 222–233.
25. Jurado, S.; Nebot, A.; Mugica, F.; Avellana, N. Hybrid methodologies for electricity load forecasting: Entropy-based feature selection with machine learning and soft computing techniques. Energy **2015**, 86, 276–291.
26. Zhang, X.B.; Wang, J.Z.; Zhang, K.Q. Short-term electric load forecasting based on singular spectrum analysis and support vector machine optimized by Cuckoo search algorithm. Electr. Power Syst. Res. **2017**, 146, 270–285.
27. Zheng, J.; Xu, C.; Zhang, Z.; Li, X. Electric load forecasting in smart grids using long-short-term-memory based recurrent neural network. In Proceedings of the 2017 51st Annual Conference on Information Sciences and Systems (CISS), Baltimore, MD, USA, 22–24 March 2017; pp. 1–6.
28. Marino, D.L.; Amarasinghe, K.; Manic, M. Building energy load forecasting using deep neural networks. In Proceedings of the IECON 2016-42nd Annual Conference of the IEEE Industrial Electronics Society, Florence, Italy, 23–26 October 2016; pp. 7046–7051.
29. Kim, J.; Moon, J.; Hwang, E.; Kang, P. Recurrent inception convolution neural network for multi short-term load forecasting. Energy Build. **2019**, 194, 328–341.
30. Jung, S.; Moon, J.; Park, S.; Hwang, E. An Attention-Based Multilayer GRU Model for Multistep-Ahead Short-Term Load Forecasting. Sensors **2021**, 21, 1639.
31. Kuo, P.-H.; Huang, C.-J. A high precision artificial neural networks model for short-term energy load forecasting. Energies **2018**, 11, 213.
32. Park, S.; Moon, J.; Jung, S.; Rho, S.; Baik, S.W.; Hwang, E. A Two-Stage Industrial Load Forecasting Scheme for Day-Ahead Combined Cooling, Heating and Power Scheduling. Energies **2020**, 13, 443.
33. Siridhipakul, C.; Vateekul, P. Multi-step power consumption forecasting in Thailand using dual-stage attentional LSTM. In Proceedings of the 2019 11th International Conference on Information Technology and Electrical Engineering (ICITEE), Pattaya, Thailand, 10–11 October 2019; pp. 1–6.
34. Moon, J.; Jung, S.; Rew, J.; Rho, S.; Hwang, E. Combination of short-term load forecasting models based on a stacking ensemble approach. Energy Build. **2020**, 216, 109921.
35. Nie, H.; Liu, G.; Liu, X.; Wang, Y. Hybrid of ARIMA and SVMs for short-term load forecasting. Energy Procedia **2012**, 16, 1455–1460.
36. Tian, C.; Ma, J.; Zhang, C.; Zhan, P. A deep neural network model for short-term load forecast based on long short-term memory network and convolutional neural network. Energies **2018**, 11, 3493.
37. Xie, Y.; Ueda, Y.; Sugiyama, M. A Two-Stage Short-Term Load Forecasting Method Using Long Short-Term Memory and Multilayer Perceptron. Energies **2021**, 14, 5873.
38. Oliveira, M.O.; Marzec, D.P.; Bordin, G.; Bretas, A.S.; Bernardon, D. Climate change effect on very short-term electric load forecasting. In Proceedings of the 2011 IEEE Trondheim PowerTech, Trondheim, Norway, 19–23 June 2011; pp. 1–7.
39. Park, J.; Moon, J.; Jung, S.; Hwang, E. Multistep-Ahead Solar Radiation Forecasting Scheme Based on the Light Gradient Boosting Machine: A Case Study of Jeju Island. Remote Sens. **2020**, 12, 2271.
40. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. **2017**, 30, 3146–3154.
41. Wang, Y.; Chen, J.; Chen, X.; Zeng, X.; Kong, Y.; Sun, S.; Guo, Y.; Liu, Y. Short-term load forecasting for industrial customers based on TCN-LightGBM. IEEE Trans. Power Syst. **2020**, 36, 1984–1997.
42. Park, S.; Jung, S.; Jung, S.; Rho, S.; Hwang, E. Sliding window-based LightGBM model for electric load forecasting using anomaly repair. J. Supercomput. **2021**, 77, 12857–12878.
43. Huang, H.; Jia, R.; Liang, J.; Dang, J.; Wang, Z. Wind Power Deterministic Prediction and Uncertainty Quantification Based on Interval Estimation. J. Sol. Energy Eng. **2021**, 143, 061010.
44. De Livera, A.M.; Hyndman, R.J.; Snyder, R.D. Forecasting time series with complex seasonal patterns using exponential smoothing. J. Am. Stat. Assoc. **2011**, 106, 1513–1527.
45. Moon, J.; Kim, Y.; Son, M.; Hwang, E. Hybrid Short-Term Load Forecasting Scheme Using Random Forest and Multilayer Perceptron. Energies **2018**, 11, 3283.
46. Werbos, P.J. Backpropagation through Time: What It Does and How to Do It. Proc. IEEE **1990**, 78, 1550–1560.
47. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. Proc. Int. Conf. Mach. Learn. **2013**, 28, 1310–1318.
48. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. **1997**, 9, 1735–1780.
49. Robinson, A.J. An Application of Recurrent Nets to Phone Probability Estimation. IEEE Trans. Neural Netw. **1994**, 5, 298–305.
50. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. **1997**, 45, 2673–2681.
51. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. arXiv **2014**, arXiv:1409.3215.
52. Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the ICML, Haifa, Israel, 21–24 June 2010.
53. Huber, P.J. Robust estimation of a location parameter. In Breakthroughs in Statistics; Springer: Berlin/Heidelberg, Germany, 1992; pp. 492–518.
54. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv **2014**, arXiv:1412.6980.
55. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
56. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L. Pytorch: An imperative style, high-performance deep learning library. arXiv **2019**, arXiv:1912.01703.

Input Variable Identifier | Description (Type) | Input Variable Identifier | Description (Type) |
---|---|---|---|
No.01 | Month (numeric) | No.11 | Windchill index (numeric) |
No.02 | Day (numeric) | No.12 | Discomfort index (numeric) |
No.03 | Hour (numeric) | No.13 | D-7 same point load (numeric) |
No.04 | Min (numeric) | No.14 | D-6 same point load (numeric) |
No.05 | Day of the week (numeric) | No.15 | D-5 same point load (numeric) |
No.06 | Holiday (binary) | No.16 | D-4 same point load (numeric) |
No.07 | Temperature (numeric) | No.17 | D-3 same point load (numeric) |
No.08 | Humidity (numeric) | No.18 | D-2 same point load (numeric) |
No.09 | Wind speed (numeric) | No.19 | D-1 same point load (numeric) |
No.10 | Wind direction (numeric) | | |

Statistic | Cluster A (Training) | Cluster A (Test) | Cluster B (Training) | Cluster B (Test) | Cluster C (Training) | Cluster C (Test) | Cluster D (Training) | Cluster D (Test) |
---|---|---|---|---|---|---|---|---|
Mean | 656.499 | 586.247 | 623.109 | 670.012 | 302.790 | 322.904 | 515.166 | 489.349 |
Standard error | 0.954 | 1.418 | 0.587 | 0.929 | 0.205 | 0.323 | 0.35 | 0.515 |
Median | 553.4 | 462.7 | 543.8 | 593.3 | 292.3 | 311.8 | 478.2 | 447.3 |
Mode | 271.2 | 265.9 | 454.1 | 518.4 | 259.9 | 275.8 | 413.4 | 414.9 |
Standard deviation | 357.686 | 333.674 | 220.187 | 218.575 | 76.902 | 76.172 | 119.92 | 121.328 |
Sample variance | 127,939.5 | 111,338.3 | 48,482.45 | 47,775.23 | 5914.042 | 5802.298 | 14,380.87 | 14,720.47 |
Kurtosis | −0.721 | −0.758 | 0.352 | 0.434 | 0.375 | 0.233 | −0.597 | −0.225 |
Skewness | 0.676 | 0.736 | 1.076 | 1.096 | 0.689 | 0.716 | 0.673 | 0.871 |
Range | 1529.7 | 1350.2 | 1104 | 1017.1 | 541.8 | 476.9 | 586.8 | 556.5 |
Minimum | 195.4 | 181.4 | 296.2 | 383.2 | 114.1 | 130.7 | 300.6 | 292.5 |
Maximum | 1725.1 | 1531.6 | 1400.2 | 1400.3 | 655.9 | 607.6 | 887.4 | 849 |
Sum | 92,204,063 | 32,417,138 | 87,514,440 | 37,049,022 | 42,526,339 | 17,855,317 | 60,336,352 | 27,059,063 |
Count | 140,448 | 55,296 | 140,448 | 55,296 | 140,448 | 55,296 | 117,120 | 55,296 |

**Table 3.** Selected hyperparameters for each single-output forecasting model; selected values are shown in bold.

Model | Cluster A | Cluster B | Cluster C | Cluster D |
---|---|---|---|---|
LightGBM | Learning rate: 0.01, 0.05, 0.1 | Learning rate: 0.01, 0.05, 0.1 | Learning rate: 0.01, 0.05, 0.1 | Learning rate: 0.01, 0.05, 0.1 |
 | No. of iterations: 500, 1000 | No. of iterations: 500, 1000 | No. of iterations: 500, 1000 | No. of iterations: 500, 1000 |
 | No. of leaves: 64 | No. of leaves: 64 | No. of leaves: 64 | No. of leaves: 64 |
 | Subsample: 0.5, 1.0 | Subsample: 0.5, 1.0 | Subsample: 0.5, 1.0 | Subsample: 0.5, 1.0 |
XGBoost | Learning rate: 0.01, 0.05, 0.1 | Learning rate: 0.01, 0.05, 0.1 | Learning rate: 0.01, 0.05, 0.1 | Learning rate: 0.01, 0.05, 0.1 |
 | No. of iterations: 500, 1000 | No. of iterations: 500, 1000 | No. of iterations: 500, 1000 | No. of iterations: 500, 1000 |
 | Subsample: 0.5, 1.0 | Subsample: 0.5, 1.0 | Subsample: 0.5, 1.0 | Subsample: 0.5, 1.0 |
 | Colsample by tree: 0.5, 1.0 | Colsample by tree: 0.5, 1.0 | Colsample by tree: 0.5, 1.0 | Colsample by tree: 0.5, 1.0 |
NGBoost | No. of iterations: 500, 1000, 1500 | No. of iterations: 500, 1000, 1500 | No. of iterations: 500, 1000, 1500 | No. of iterations: 500, 1000, 1500 |
Random Forest | No. of trees: 64, 128 | No. of trees: 64, 128 | No. of trees: 64, 128 | No. of trees: 64, 128 |
 | Random state: 32, 64 | Random state: 32, 64 | Random state: 32, 64 | Random state: 32, 64 |
MLP | No. of layers: 4, 5, 6, 7 | No. of layers: 4, 5, 6, 7 | No. of layers: 4, 5, 6, 7 | No. of layers: 4, 5, 6, 7 |
 | Activation function: ReLU | Activation function: ReLU | Activation function: ReLU | Activation function: ReLU |
 | Optimizer: Adam | Optimizer: Adam | Optimizer: Adam | Optimizer: Adam |
 | Learning rate: 0.001 | Learning rate: 0.001 | Learning rate: 0.001 | Learning rate: 0.001 |

Evaluation Metric | Model | Cluster A | Cluster B | Cluster C | Cluster D |
---|---|---|---|---|---|
MAPE (%) | LightGBM | 7.01 | 4.74 | 6.98 | 4.98 |
 | MLP | 12.06 | 7.03 | 7.61 | 5.67 |
 | RF | 7.53 | 4.98 | 7.24 | 5.38 |
 | XGBoost | 7.22 | 5.12 | 7.52 | 5.29 |
 | NGBoost | 9.64 | 5.44 | 7.73 | 5.90 |
MAE (kWh) | LightGBM | 40.71 | 32.00 | 22.18 | 23.47 |
 | MLP | 68.59 | 45.95 | 23.10 | 27.30 |
 | RF | 43.81 | 34.09 | 23.46 | 25.49 |
 | XGBoost | 41.81 | 32.81 | 24.76 | 24.81 |
 | NGBoost | 54.25 | 37.04 | 24.85 | 29.21 |
RMSE (kWh) | LightGBM | 61.31 | 49.01 | 30.81 | 36.26 |
 | MLP | 119.07 | 75.39 | 32.50 | 41.11 |
 | RF | 67.20 | 54.73 | 32.89 | 40.51 |
 | XGBoost | 63.58 | 48.34 | 34.39 | 37.16 |
 | NGBoost | 82.05 | 59.10 | 34.43 | 44.17 |
NRMSE (%) | LightGBM | 9.37 | 7.63 | 9.78 | 7.25 |
 | MLP | 18.91 | 11.74 | 10.32 | 8.22 |
 | RF | 10.67 | 8.52 | 10.45 | 8.10 |
 | XGBoost | 10.09 | 7.53 | 10.92 | 7.43 |
 | NGBoost | 13.03 | 9.21 | 10.93 | 8.83 |

Cluster A | Cluster B | Cluster C | Cluster D |
---|---|---|---|
0.984 | 0.978 | 0.923 | 0.957 |

Model | Package | Selected Hyperparameters |
---|---|---|
LightGBM | LightGBM, Scikit-learn | Learning rate: 0.05 |
 | | No. of iterations: 1000 |
 | | No. of leaves: 32 |
 | | Subsample: 0.5 |
Random Forest | Scikit-learn | No. of trees: 128 |
 | | Random state: 64 |
S2S BiLSTM | PyTorch | No. of hidden nodes: 15 |
 | | No. of hidden layers: 2 |
 | | Activation function: ReLU |
 | | Optimizer: Adam |
 | | Learning rate: 0.001 |
 | | No. of epochs: 350 |
S2S ATT-BiLSTM | PyTorch | No. of hidden nodes: 15 |
 | | No. of hidden layers: 2 |
 | | Activation function: ReLU |
 | | Optimizer: Adam |
 | | Learning rate: 0.001 |
 | | No. of epochs: 350 |
ATT-GRU [30] | PyTorch | No. of hidden nodes: 15 |
 | | No. of hidden layers: 2 |
 | | Activation function: SELU |
 | | Optimizer: Adam |
 | | Learning rate: 0.001 |
 | | No. of epochs: 150 |
DALSTM [33] (Stage 1: LSTM; Stage 2: DARNN) | PyTorch | LSTM: |
 | | No. of hidden nodes: 15 |
 | | No. of hidden layers: 2 |
 | | Activation function: ReLU |
 | | Optimizer: Adam |
 | | Learning rate: 0.001 |
 | | No. of epochs: 350 |
 | | DARNN: |
 | | No. of hidden nodes: 64 |
 | | Time steps: 96 |
 | | Optimizer: Adam |
 | | Learning rate: 0.001 |
 | | No. of epochs: 150 |
COSMOS [34] (Stage 1: MLP; Stage 2: PCR) | Scikit-learn | MLP: |
 | | No. of hidden nodes: 15 |
 | | No. of hidden layers: 4, 5, 6, 7 |
 | | Activation function: ReLU |
 | | Optimizer: Adam |
 | | Learning rate: 0.001 |
 | | No. of epochs: 150 |
 | | PCR: |
 | | Principal components: 1 |
 | | Sliding window size: 672 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Park, J.; Hwang, E. A Two-Stage Multistep-Ahead Electricity Load Forecasting Scheme Based on LightGBM and Attention-BiLSTM. *Sensors* **2021**, *21*, 7697. https://doi.org/10.3390/s21227697

**AMA Style**

Park J, Hwang E. A Two-Stage Multistep-Ahead Electricity Load Forecasting Scheme Based on LightGBM and Attention-BiLSTM. *Sensors*. 2021; 21(22):7697. https://doi.org/10.3390/s21227697

**Chicago/Turabian Style**

Park, Jinwoong, and Eenjun Hwang. 2021. "A Two-Stage Multistep-Ahead Electricity Load Forecasting Scheme Based on LightGBM and Attention-BiLSTM" *Sensors* 21, no. 22: 7697. https://doi.org/10.3390/s21227697