Electrical Load Forecasting in the Industrial Sector: A Literature Review of Machine Learning Models and Architectures for Grid Planning
Abstract
1. Introduction
2. Methodology of Literature Review
2.1. Research Questions
2.2. Search Strategy
2.3. Inclusion and Exclusion Criteria
- Inclusion Criteria
- I1.
- Papers published in the last 15 years (2010–2025) are considered, so that the review reflects recent developments in the industry.
- I2.
- Papers written only in English are considered.
- I3.
- Only papers with more than 5 citations are considered.
- Exclusion Criteria
- E1.
- Non-peer-reviewed works, studies on non-real-time forecasting, or those lacking methodological detail.
- E2.
- Papers that did not focus primarily on the energy sector.
2.4. Retrieval of Results
3. Results
3.1. Classification of Models and Methods
3.1.1. Statistical Models
- 1.
- 2.
- 3.
- Auto Regressive Integrated Moving Average (ARIMA): It combines AR and MA components and additionally handles non-stationary data through differencing. The common approach to establishing ARIMA models is the Box–Jenkins methodology, which selects the model that best fits the behavior of the time series [13]. It consists of model identification, parameter estimation, model validation, and forecasting [15].
- 4.
- 5.
- Spearman Rank Order Correlation Coefficient (SROCC): It is a non-parametric statistical method used to analyze monotonic, possibly non-linear, relationships between variables, with the primary purpose of quantifying the degree of influence of each parameter on another [14].
- 6.
| Year | Author | Input | Output | Accuracy | Model Type |
|---|---|---|---|---|---|
| 2010 | H Liu et al. [15] | Time-series wind speed, wind power | Forecast wind speed and power | 11.34% (MAPE) [short-term multi-step ahead] | Wavelet, ITSM, ARIMA, Box–Jenkins |
| 2012 | A.M. Foley et al. [13] | Wind and weather data, forecast | Wind power/Speed forecast | 10–15% (MAE) [short term] | Persistence, AR, MA, ARMA, ARIMA, among others |
| 2014 | S. Tasnim et al. [26] | Past wind data, generated power | Wind power forecast | ∼0.16 (MAE) [short term] | Linear Regression (LR) |
| 2015 | A. K. Nayak et al. [29] | Historical wind data, air density, power coefficient | Wind forecast, ramps | 136.103% (MAPE) [short term] | ARIMA |
| 2017 | C. Deb et al. [27] | Energy consumption, occupancy, timestamp | Energy, cooling/heating load, temp., electricity price | 1.05–2.59% (MAPE) [short, medium, and long term] | ES, ARMA, ARIMA |
| 2020 | M. Zaimi et al. [30] | Meteorological, PV-data, PV-parameters | PMPP, efficiency, I–V-curve, model parameters | <3% (NRMSE) [short term] | Non-linear fit, empirical, polynomial formula |
| 2020 | R. Ahmed et al. [31] | Meteorol. parameters, PV time series, timestamp | PV-output/radiation forecast | 2–17% (NRMSE) [short, medium, and long term] | EWMA, ARMA, ARIMA |
| 2022 | D.V. Pombo et al. [3] | Power production, weather measurements | Short term PV-forecast | 20.63% (MAPE, SPAR model), 32.07% (MAPE, Persistence model) [short term] | Persistence, SPAR |
| 2022 | C. Wang et al. [32] | Multi-load data, forecast date | Multi-energy load last 24 h | 1.30% (MAPE) [short term] | ARIMA, VAR, GRA, PCC |
| 2022 | W.H. Chung et al. [1] | Heat load variables, time, weather forecast | Short-term forecast load | 2.60 (MAE) [short term] | Elastic Net |
| 2023 | M. Yu et al. [14] | Cooling load, time factors, meteorol. factors | Cooling load forecast (buildings) | statistical benchmark not included [short term] | ARIMA, SROCC, ACF |
| 2024 | T. G. Grandón et al. [33] | Electricity consumption, meteorol., economic, calendar | Electricity demand forecast | 430.8 MW (MAE, LR) [medium term] | LR, ARIMA, ARMA |
| 2025 | W. Liao et al. [2] | Time series, load data | Load forecast | 4.4% (MAPE, LR) [short term] | LR, Persistence |
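The differencing and estimation steps behind ARIMA can be illustrated with a minimal sketch. It assumes a fixed ARIMA(1,1,0) order rather than performing the identification and validation stages of the Box–Jenkins methodology, and it estimates the AR coefficient by ordinary least squares. The function names (`difference`, `fit_ar1`, `forecast_arima_110`) are hypothetical and are not taken from any of the reviewed studies.

```python
def difference(series, d=1):
    """Apply d-th order differencing (the 'I' part of ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def fit_ar1(series):
    """Least-squares estimate of (c, phi) in x[t] = c + phi * x[t-1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        # Constant input: no slope can be identified, fall back to the mean.
        return mean_y, 0.0
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = cov_xy / var_x
    return mean_y - phi * mean_x, phi


def forecast_arima_110(series, steps=1):
    """Difference once, fit AR(1) on the differences, integrate back."""
    diffs = difference(series, d=1)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # predict the next difference
        level = level + last_diff         # undo the differencing
        forecasts.append(level)
    return forecasts


# A load series with a steady upward trend is non-stationary in level,
# but its first difference is constant.
print(forecast_arima_110([10, 12, 14, 16, 18, 20], steps=2))
```

In practice a dedicated time-series library would also handle order selection, MA terms, and residual diagnostics; the sketch only shows why differencing makes a trending series tractable for an AR model.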
3.1.2. Machine Learning Models
| Year | Author | Input | Output | Accuracy | Model Type |
|---|---|---|---|---|---|
| 2012 | Aoife M. Foley, Paul G. Leahy, Antonino Marvuglia, Eamon J. McKeogh [13] | Historical wind data, wind power production, NWP forecast values, Weather forecast data | Wind power generation/output/patterns, Forecasted wind speed | 10–15% (MAE) [short term] | Multi-layer Perceptron (MLP), Support Vector Machine (SVM), k-Nearest Neighbors (kNN), Bayesian methods |
| 2016 | HA Azzeddine, Mustapha Tioursi, Djamel Eddine C, Brahim K [41] | Cell temperature, Solar irradiance | Current, Voltage, Maximum power point (MPP) of a photovoltaic panel | 0.03 (MSE) [short term] | Radial Basis Function (RBF) neural network |
| 2017 | Weicong Kong et al. [42] | Household power consumption time series | Short-term residential electric energy forecast | 8.18% (MAPE, aggregated) [short term] | BPNN, SVM, Extreme Learning Machine (ELM), Adaptive Neuro Fuzzy Inference System (ANFIS), Radial Basis Function (RBF), Decision Tree (DT), Bayesian Neural Network |
| 2019 | Tae-Young Kim, Sung-Bae Cho [43] | Household power consumption: time variables, sensors, submetering | Residential electric energy consumption (Global Active Power) | 31.84% (MAPE) [short/medium term] | SVR, Random Forest (RF), Decision Tree (DT), Multi-Layer Perceptron (MLP) |
| 2020 | Liufeng Du, Linghua Zhang, Xu Wang [44] | Historical load data | Load forecasting | 2.14% (MAE) [short term, one hour ahead] | SVR, Extreme Learning Machine (ELM), Stacked Denoising Autoencoder (SDAE) |
| 2020 | Kuihua Wu et al. [45] | Electric load, Temperature, Gas consumption, Cooling load | Short-term electric load forecasting | 4.78% (MAPE) [short term] | BPNN, RF Regression (RFR), SVR |
| 2020 | PW Khan, Yung-Cheol Byun, Sang-Joon Lee, Dong-Ho Kang, Jin-Young Kang, Hae-Su Park [6] | Historical power consumption data, Time-series data | Energy consumption forecasting | 5.77% (MAPE) [short term, one hour] | Support Vector Regression (SVR), Lasso, Ridge, GradientBoost, XGBoost, MLP Regressor, CatBoost |
| 2022 | Huafeng Xian, Jinxing Che [11] | Historical load data | Power load forecasting | 3.37% (MAPE) [short term] | BPNN, SVR (RBF kernel), MSC framework for optimization |
| 2022 | Won Hee Chung, Yeong Hyeon Gu, Seong Joon Yoo [1] | Heat load variables, Time factors, Weather forecasts | Short-term load forecasting | 0.587 (MAE) [short term] | k-Nearest Neighbors (k-NN), SVR, RF, XGBoost |
| 2022 | Chen Wang et al. [32] | Multi-energy load data, Forecast date/timestamp/type | Multi-energy load forecasting (24 h) | 4.02% (MAPE) [short term] | Extreme Learning Machine (ELM), RF, SVR, Bagging-Boosting Neural Network |
| 2022 | Daniel Vázquez Pombo et al. [3] | Power production and meteorological measurements | Short-term PV power forecasting | 11.77% (RMSE) [short term, five hour] | RF, SVR |
| 2023 | Ke Li et al. [46] | Multi-energy load data, Meteorological factors | Short-term multi-energy load forecasting | 3.64% (MAPE) [short term] | SVM, Least Squares SVM (LSSVM), Generalized Regression Neural Network (GRNN) |
| 2023 | Jintao He, Lingfeng Shi, Hua Tian, Xuan Wang, Xiaocun Sun, Meiyan Zhang, Yu Yao, Gequn Shu [47] | State variables (temperature, pressure, mass flow rates), Input variables (cold source inlet mass flow rate, compressed rotational speed, pump rotational speed), Disturbance variables | Net power and refrigerating capacity of the CO2 combined cooling and power cycle (CCP) system | 0.0023 (RMSE) [short term] | Multilayer Feedforward Neural Network (MLFF), Radial Basis Function (RBF) neural network, Generalized Regression Neural Network (GRNN) |
| 2023 | Md Shazid Islam, A S M Jahid Hasan, Md Saydur Rahman, Jubair Yusuf, Md Saiful Islam S, Farhana Akter Tumpa [48] | Meteorological data: DNI, DHI, GHI, temperature, wind direction, wind speed | Solar power generation, predicted as a classification problem | 81.02% (Classification) [short term] | Ensembles: Adaboost Classifier, Gradient Boosting Classifier, Random Forest Classifier |
| 2023 | Connor Scott, Mominul Ahsan, Alhussein Albarbar [49] | PV generation history, Time variable, Meteorological vars. | Forecasted PV power output | 32.0 (RMSE) [short term] | Linear Regression (LR), RF, SVM, Neural Networks (NN) |
| 2024 | L.R. Visser et al. [50] | Meteorological vars., Lagged PV gen., Prob. forecasts, Market prices | Forecasted PV power generation | 4.8 (CRPS) [short term, day ahead] | Multiple Linear Regression (MLR), RF, Smart Persistence (SP), Quantile Regression (QR), Quantile RF (QRF), Clear Sky Persistence Ensemble (CSPE) |
| 2024 | Lionel P. Joseph et al. [9] | Meteorological vars., Predictors, Ground level, Satellite climate vars. | Hourly wind speed forecasting | 0.421 m/s (MAE) [short term, one hour] | RF, Decision Tree Regressor (DTR), Gradient Boosting |
| 2025 | Wenlong Liao et al. [2] | Time-series datasets, Scarce historical load data | Load forecasting | 6.1% (MAPE) [short term, 1 h–24 h] | Regression Tree (RT), XGBoost, MLP, Time-series large language model (TimeLLM) |
| 2025 | Emrah Dokur et al. [38] | Active/reactive power measurements, Past voltage | Forecasted node voltage | 0.0019 (MSE) [short term] | Extreme Learning Machine (ELM) Variants, MLP, ANFIS |
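The accuracy columns in the tables above mix several error metrics (MAE, MSE, RMSE, MAPE, NRMSE), which are not directly comparable across studies. The following sketch shows how each is commonly computed. The NRMSE here is normalized by the range of the actual values, which is an assumption: the reviewed papers may normalize by the mean or by installed capacity instead.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the unit of the target variable."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error, in squared units of the target variable."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error in percent; undefined if any actual is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def nrmse(actual, predicted):
    """RMSE as a percentage of the range of actual values (assumed convention)."""
    return 100.0 * rmse(actual, predicted) / (max(actual) - min(actual))


actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 380.0]
print(mae(actual, predicted), rmse(actual, predicted),
      mape(actual, predicted), nrmse(actual, predicted))
```

Because MAE and RMSE are scale-dependent while MAPE and NRMSE are relative, a value such as "2.60 (MAE)" and "4.4% (MAPE)" from different rows cannot be ranked against each other without knowing the load scale.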
3.1.3. Deep Learning Models
- 1.
- Artificial Neural Network (ANN): As a foundational element within the broader field of DL, ANNs are structured with input, hidden, and output layers [6,8,31]. They process input data through weighted connections and activation functions such as ReLU, learning complex patterns via backpropagation and optimization (e.g., the Adam solver). These models are increasingly employed for prediction tasks, such as determining whether a power delivery network design violates its target impedance, without needing additional simulations during optimization processes [51].
- 2.
- LSTM: A specialized type of Recurrent Neural Network (RNN) that is highly effective for processing sequential data and capturing temporal features from time series [2,52]. Studies have shown that LSTMs often outperform conventional models such as MLPs in tasks like short-term load forecasting [2].
- 3.
- Gated Recurrent Unit (GRU): Another well-known RNN model, considered a simplified version of the LSTM. It has a more streamlined architecture with fewer gates (typically two: a reset gate and an update gate), which can lead to faster training times while still maintaining competitive performance [2].
- 4.
- Convolutional Neural Network (CNN): CNNs excel at extracting spatial and temporal features from data through their convolutional and pooling layer architecture [2,53]. Originally popular in image processing, their application has expanded to time-series analysis, where they can identify local patterns and relationships within data segments [1,3,14]. In the context of load forecasting, CNNs are specifically used to depict spatial features between loads at different points in a system, such as various buses in a power network [1,2,53,54].
- 5.
- Graph Neural Network (GNN): These are specialized models designed to operate on structured graph data, which are characterized by nodes and edges representing entities and their relationships, respectively. They are particularly adept at capturing both structural and temporal information within complex networks [2,53]. This makes them highly suitable for applications where data exhibit relational structures, such as heating networks, smart grids, or traffic flow systems [2].
- 6.
- Transformer: Unlike traditional RNNs, which are inherently sequential and struggle with parallelization and long-term dependencies, transformers are based entirely on attention mechanisms [2,32]. To manage information flow, prevent gradient degradation, and accelerate convergence, residual connections and layer normalization are integrated around each sub-layer. Since the model has no recurrence, positional encodings (e.g., sine-cosine functions) are added to the input embeddings to inject information about the order of tokens in the sequence [32,55].
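The sine-cosine positional encoding mentioned for the Transformer can be sketched as follows. The base of 10000 follows the original Transformer formulation and is assumed here; whether the cited load-forecasting studies use exactly this constant is not stated in the text. The function name `positional_encoding` is hypothetical.

```python
import math


def positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position codes.

    Even embedding indices use sine and odd indices use cosine, with
    wavelengths that grow geometrically along the embedding dimension.
    """
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each sine/cosine pair (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


# Encode 24 hourly time steps for a model with 8-dimensional embeddings.
pe = positional_encoding(seq_len=24, d_model=8)
print(len(pe), len(pe[0]))
print(pe[0])  # position 0: sine terms are 0.0, cosine terms are 1.0
```

Each row is added element-wise to the input embedding of the corresponding time step, which gives the attention layers access to ordering information that they would otherwise lack.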
| Year | Author | Input | Output | Accuracy | Model Type |
|---|---|---|---|---|---|
| 2014 | V. Lo Brano, G. Ciulla, M. Di Falco [8] | Air temp.; cell temp.; solar irradiance; wind speed; open-circuit voltage; short-circuit current | PV module power forecast | 0.05–1% (Mean error) [short term] | RNN; Gamma Memory (GM) |
| 2018 | S. Bouktif, A. Fiaz, M.A. Serhani [4] | Energy-consumption data; time lags; weather; schedule vars | Short/medium-term load forecast | 0.56% (RMSE) [short and medium term] | LSTM; RNN |
| 2019 | N. Jinil, S. Reka [56] | EV motor power req.; other component loads; driving conditions | EV power-req. prediction; distribution optimization | 4.10% (MAPE) [long term] | Modular RNN (MRNN) |
| 2019 | C.M. Schierholz, K. Scharff, C. Schuster [51] | Simulated PCB-variation data | Binary TI-violation prediction | 88% (Classification) [static design] | Multi-layer ANN |
| 2019 | L. Du, L. Zhang, X. Wang [44] | Historical load data | Load forecasting | 2.14% (MAE) [short term, hour ahead] | 3D-CNN-GRU |
| 2019 | T.Y. Kim, S.-B. Cho [43] | Household consumption dataset (time, sensors, submeter vars) | Residential energy-consumption prediction | 31.83% (MAPE) [short term] | CNN; LSTM; GRU; Bi-LSTM; Attention LSTM |
| 2021 | H. Pariaman, G.M. Luciana, M.K. Wisyaldin, M. Hisjam [52] | Historical time-series sensor data | Reconstructed time-series patterns | 93.36% (MAE) [short term] | LSTM-Autoencoder |
| 2022 | H. Xian, J. Che [11] | Historical load data | Power-load forecasting | 3.025% (MAPE) [short term] | LSTM; RNN |
| 2022 | W.H. Chung, Y.H. Gu, S.J. Yoo [1] | Heat-load vars; time factors; weather forecasts | Short-term load forecast | 94.2% (R2) [short term] | DNN; RNN; LSTM; LSTM Attention |
| 2022 | Z. Gao, J. Yu, A. Zhao, Q. Hu, S. Yang [57] | Load data; internal/external disturbances | Short-term cooling forecast | 3.25% (MAPE) [short and medium term] | ELM; GRNN; BP; WNN |
| 2022 | D. Niu, M. Yu, L. Sun, T. Gao, K. Wang [58] | Cooling, heat, and electric load data; time features; external factors | Multi-energy-load forecast | 5.44% (MAPE) [short term] | LSTM; BiGRU |
| 2022 | C. Wang, Y. Wang, Z. Ding, T. Zheng, J. Hu, K. Zhang [32] | Multi-energy load data; forecast date/timestamp/type | Next-24h multi-energy forecast | 1.037% (MAPE) [short term, day ahead] | Multiple-decoder Transformer |
| 2022 | Y. Guo, Y. Li, X. Xuebo [59] | Multi-energy load; meteorological; date info | Combined heating, cooling and electric forecast | 1.76% (MAPE) [short term] | LSTM; BiLSTM; MTL |
| 2022 | D. Vázquez Pombo, P. Bacher, C. Ziras, H.W. Bindner, S.V. Spataru, P.E. Sørensen [3] | PV production and meteorological data | Short-term PV forecast | 14.03% (RMSE) [short term, 5 h] | CNN; LSTM |
| 2023 | W. Cui, W. Yang, B. Zhang [60] | Time-series data (voltage magnitude, rotor angle, frequency deviation), system topology, fault locations/types, power injections (active and reactive) | Predicted trajectories; unstable-case identification | 0.01% (Relative MSE) [transient, seconds] | DNN (Fourier-transform + filtering layers) |
| 2023 | J. He, L. Shi, H. Tian, X. Wang, X. Sun, M. Zhang, Y. Yao, G. Shu [47] | CCP system params (temp., pressure, mass-flow), disturbance vars (torque, speed, exhaust), inputs (mass-flow rate, compressor and pump speeds) | Net power; refrigerating capacity; state-variable prediction | 0.13% (RMSE) [transient, seconds] | MLFF NN; RNN; LSTM; GRU |
| 2023 | Y. Huang, Y. Zhao, Z. Wang, X. Liu, H. Liu, Y. Fu [61] | District-heating consumption data | Multi-horizon district-heat forecast | 31.2% (RMSE) [medium /long term] | TPA; LSTM; CRNN; Encoder; MSL |
| 2023 | M. Yu, D. Niu, J. Zhao, M. Li, L. Sun, X. Yu [14] | Cooling-load data; time | Short-term cooling forecast | Not specified | LSTM; Bi-LSTM; DNN; RNN; CNN; TTGAT-GTC |
| 2024 | X. Wang, H. Wang, S. Li, H. Jin [54] | Historical meter data | Real-time short-term load forecast | 0.92% (MAPE) [short term] | LSTM; LSTM + Att; BiLSTM + Att |
| 2024 | Y. Huang, Y. Zhao, Z. Wang, X. Liu, Y. Fu [53] | Heat-load records; meteorological; exogenous factors | Future heat-load forecast | 7.2% (MAE) [short term] | GNNs |
| 2024 | L.P. Joseph, R.C. Deo, D. Casillas-Pérez, R. Prasad, N. Raj, S. Salcedo-Sanz [9] | Meteorological predictors; ground and satellite data | Hourly wind-speed forecast | 0.149 m/s (MAE) [short term] | LSTM; BiLSTM |
| 2025 | W. Liao, S. Wang, D. Yang, Z. Yang, J. Fang, C. Rehtanz, F. Porté-Agel [2] | Time series; scarce historical load data | Load forecasting | 2.1% (MAPE) [short term] | Transformer (positional encoding, multi-head attention, CNN) |
3.1.4. Hybrid Models
| Year | Author | Input | Output | Accuracy | Model Type |
|---|---|---|---|---|---|
| 2018 | S. Bouktif, Ali Fiaz, Mohamed Adel Serhani [4] | Electric energy consumption data, Time lags, Weather data, Schedule-related variables | Forecasted electric load/consumption (short- and medium-term horizons) | 0.62% (RMSE) [short and medium term] | Genetic Algorithm (GA)-Enhanced LSTM-RNN |
| 2020 | PW Khan, Yung-Cheol Byun, Sang-Joon Lee, Dong-Ho Kang, Jin-Young Kang, Hae-Su Park [6] | Historical power consumption data, Time-series data | Energy consumption forecasting | 4.29% (MAPE) [short term] | Ensemble of three base models: (1) CatBoost, (2) SVR, (3) MLP |
| 2020 | Kuihua Wu, Jian Wu, Liang Feng, Bo Yang, Rong Liang, Shenquan Yang, Ren Zhao [45] | Historical electric load, Temperature, Gas consumption, Cooling load | Short-term electric load forecasting | 99.1% () [short term] | Attention-based CNN-LSTM-BiLSTM model |
| 2022 | Huafeng Xian, Jinxing Che [11] | Historical load data | Power load forecasting | 2.71% (MAPE) [short term] | MSC-PSO-SVR, Ensemble model (RF and XGBoost) |
| 2022 | Won Hee Chung, Yeong Hyeon Gu, Seong Joon Yoo [1] | Heat load-derived variables, Time factors, Weather forecasts | Short-term load forecasting | 94.2% () [short term] | Parallel CNN-LSTM Attention (PCLA) |
| 2022 | Daniel Vázquez Pombo, Peder Bacher, Charalampos Ziras, Henrik W. Bindner, Sergiu V. Spataru, Poul E. Sørensen [3] | Basic dataset including power production and meteorological measurements | Short-term Photovoltaic (PV) power forecasting | 18.65% (MAPE) [short term] | CNN-LSTM |
| 2022 | Dongxiao Niu, Min Yu, Lijie Sun, Tian Gao, Keke Wang [58] | Historical cooling, heat, and electrical load data, Time features, External influencing factors | Short-term multi-energy load forecasting | 2.75% (MAPE) [short term] | CNN-BiGRU, BiGRU-Attention, CNN-BiGRU-Attention-Multi-Task Learning (MTL) |
| 2023 | Min Yu, Dongxiao Niu, Jinqiu Zhao, Mingyu Li, Lijie Sun, Xiaoyu Yu [14] | Historical cooling load (CL) data, Time factors, Meteorological factors | Short-term building cooling load (CL) forecasting | 8.81% (MAPE) [short term] | SWT-TTGAT-GTC model: SWT (Synchrosqueezing Wavelet Denoising), TTGAT (Temporal Trend-aware Graph Attention Network), GTC (Gated Temporal Convolutional Layer) |
| 2024 | Ke Li, Yuchen Mu, Fan Yang, Haiyang Wang, Yi Yan, Chenghui Zhang [63] | Uncertain variables in an Integrated Energy System (IES), Meteorological data | Joint source-load-price forecasting | 4.10% (MAPE) [short and long term] | MCNN-SCAM-LSTM-MTL (MCNN: Multi-column Convolutional Neural Network; SCAM: Sequential Convolution Attention Module), MTL-BiLSTM, Radial Basis Function Deep Belief Network (RBF-DBN), MTL-LSSVM |
| 2024 | Sujan Ghimire, Ravinesh C. Deo, David Casillas-Pérez, Sancho Salcedo-Sanz [64] | Half-hourly electricity price sequences, Lagged values of the decomposed price series, Historical errors | Short-term, half-hourly electricity price forecasts | 5.83% (sMAPE) [short term] | VMD-CLSTM-VMD-ERCRF model (VMD: Variational Mode Decomposition; CLSTM: combination of CNN and LSTM; ERCRF: error compensation and Random Forest regression) |
| 2024 | Jungyeon Park, Estêvão Alvarenga, Jooyoung Jeon, Ran Li, Fotios Petropoulos, Hokyun Kim, Kwangwon Ahn [65] | Hourly electricity consumption series, Deseasonalized demand time series, Historical demand observations | Probabilistic load forecasts | 60% (Error Red.) [short term] | ARMA-GARCH (Autoregressive Moving Average - Generalized Autoregressive Conditional Heteroskedasticity) model |
| 2024 | Yaohui Huang, Yuan Zhao, Zhijin Wang, Xiufeng Liu, Yonggang Fu [53] | Historical heat load records, Meteorological factors, Exogenous factors | Future heat load values, Forecast of time steps ahead | 23.5% (RMSE) [short term] | Sparse Dynamic Graph Neural Network (SDGNN) |
| 2024 | Lionel P. Joseph, Ravinesh C. Deo, David Casillas-Pérez, Ramendra Prasad, Nawin Raj, Sancho Salcedo-Sanz [9] | Meteorological variables, Attributes used as predictors, Ground level data, Satellite based climate variables | Hourly wind speed forecasting | 99.5% (Index) [short term] | Three-phase hybrid model (3P-CBiLSTM): (1) TMGWO (Mutation Grey Wolf Optimizer) for feature selection; (2) BOHB (Hybrid Bayesian Optimization and HyperBand) algorithm for hyperparameter optimization; (3) CBiLSTM |
| 2025 | Ali Amini, Samuel Rey-Mermet, Steve Crettenand, Cécile Münch-Alligné [66] | High-frequency experimental data, Low-frequency SCADA data, Physics-based parameters, Engineered features | Instantaneous Power | 99% () [real time] | physics-based analysis and data-driven (machine learning) approach |
| 2025 | Emrah Dokur, Nuh Erdogan, Ibrahim Sengor, Ugur Yuzgec, Barry P. Hayes [38] | Time series of active power measurements, Time series of reactive power measurements, Past voltage values | Forecasted node voltage | 0.56% (Avg. Dec.) [near real time] | Extreme Learning Machine (ELM) combined with Single Candidate Optimizer (SCO) |
| 2025 | Weikun Deng, Hung Le, Khanh T.P. Nguyen, Christian Gogu, Kamal Medjaher, Jérôme Morio, Dazhong Wu [67] | Raw sensor data, Continuous time-series data, Temperature data | Remaining Useful Life (RUL) prediction for fast-charging lithium-ion batteries | 8.4% (MAPE) [long term] | Data-driven branch: Dilated Convolutional Neural Network (D-CNN); physics-informed branch: neural network with a physics-embedded algorithm structure; features from both branches are merged and processed by a Fully Connected Neural Network (FCNN) in the final output layer |
| 2025 | Hui Song, Boyu Zhang, Mahdi Jalili, Xinghuo Yu [68] | Energy demand data, Temperature data | Energy demand forecasting | 5.8% (RMSE) [short term] | Multi-swarm Multi-tasking Ensemble Learning (MSMTEL) |
- 1.
- Architecture defines the structural integration of components:
- (a)
- (b)
- Parallel (Ensemble): Also known as “ensemble methods” or “ensemble learning”, where multiple models run in parallel and their predictions are aggregated (bagging, boosting, stacking). These models emphasize variance reduction, improved generalization, and combining multiple base learners via averaging, voting, or stacking meta-learners [1,6].
- (c)
- 2.
- Functionality describes the logic behind the combination:
- (a)
- Decomposition-based: Models that first decompose the original signal into a set of simpler subseries (e.g., modes, frequency bands, trends) using techniques such as WT or VMD and then learn based on these components or their recombination to handle the complex oscillatory behavior more constructively [14,62].
- (b)
- Feature fusion-based: Models that integrate heterogeneous feature extractors and combine their latent representations through concatenation, attention, or learned weighting into a unified feature space that is passed to a downstream predictor, aiming to capture complementary aspects of the data such as spatiotemporal structure (e.g., instead of the standard CNN-LSTM combination, Graph Attention Networks (GAT) combined with TCN could be used) [14].
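The decomposition-based idea can be sketched in a few lines: split the signal into simpler components, forecast each one separately, and recombine the results. The sketch below deliberately swaps in a trailing moving average for WT or VMD, extrapolates the trend linearly, and uses a seasonal-naive rule for the residual. All function names and the choice of sub-forecasters are illustrative assumptions, not the methods of the cited studies.

```python
def moving_average(series, window):
    """Trailing moving average; early points use the values available so far."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def decompose(series, window):
    """Split a series into a smooth trend and an oscillating residual."""
    trend = moving_average(series, window)
    residual = [s - t for s, t in zip(series, trend)]
    return trend, residual


def forecast_decomposed(series, window, period):
    """One-step forecast: forecast each component, then recombine."""
    trend, residual = decompose(series, window)
    trend_next = trend[-1] + (trend[-1] - trend[-2])  # linear extrapolation
    residual_next = residual[-period]                 # seasonal-naive guess
    return trend_next + residual_next


# A load that alternates between a low and a high level every step.
load = [1, 3, 1, 3, 1, 3]
print(forecast_decomposed(load, window=2, period=2))
```

The design point is that each component is easier to model than the raw signal: the trend is smooth, and the residual is close to periodic, so simple sub-models can be matched to each.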
| Model Type | Strength | Weakness |
|---|---|---|
| CNN | They excel at extracting hidden structures and inherent features from time-series data, which improves learning efficiency and reduces the number of parameters [9,31]. | They are limited by their temporal receptive field, destructive flattening in hybrid setups, and rigidity with respect to sequence length [1]. |
| LSTM | It is designed to overcome the vanishing gradient problem of traditional RNNs and to learn and retain long-term dependencies in temporal sequences [2,3,6,31]. | While effective, LSTMs process information in a single direction only, so they can miss pertinent information [9]. |
| BiLSTM | They improve upon standard LSTMs by processing information in both forward and backward directions, and this dual information flow facilitates effective learning [1,9,14]. | Due to their complex architecture, they are non-interpretable “black-box” models, which require explainable AI techniques to increase model transparency [1,9,14]. |
| RF | Pombo et al. (2022) explicitly state that RF required 1–3 h for training compared to 12–35 h for hybrid CNN-LSTM models [3]. Joseph et al. (2024) stated that tree-based models generally offer better short-term accuracy than physical or statistical models [9]. | Tree-based models “perform poorly when extrapolating outside the range of the training data” [9]. |
| Ensemble | Khan et al. (2020) argue that ensemble learning allows weak classifiers to correct each other’s mistakes, resulting in a stronger supervised model [6]. Ensembles reduce the risk of selecting a single model with systematic bias errors [31]. | Ensembles increase computational costs because multiple base models must be processed in parallel [31]. |
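The error-correcting effect attributed to ensembles in the table above can be illustrated with a simple aggregation sketch. It shows plain averaging and a weighting scheme based on inverse validation error; both are generic aggregation rules assumed for illustration and do not reproduce the specific ensemble designs of the cited papers.

```python
def average_ensemble(forecasts):
    """Average the forecasts of several base models, step by step."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


def inverse_error_weights(validation_errors):
    """Give each base model a weight proportional to 1 / validation error."""
    inverse = [1.0 / err for err in validation_errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_ensemble(forecasts, weights):
    """Combine base-model forecasts with fixed weights that sum to one."""
    return [sum(w * v for w, v in zip(weights, values))
            for values in zip(*forecasts)]


# Two hypothetical base models: one under-forecasts, one over-forecasts.
model_a = [10.0, 20.0]
model_b = [14.0, 24.0]
actual = [12.0, 22.0]

print(average_ensemble([model_a, model_b]))   # errors of opposite sign cancel
weights = inverse_error_weights([1.0, 3.0])   # model_a was better on validation
print(weights, weighted_ensemble([model_a, model_b], weights))
```

The cost noted in the weakness column is visible even here: every base model must be trained and evaluated before any combined forecast can be produced.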
3.2. Application in Grid Planning
3.3. Adaptability Across Forecasting Horizons
- 1.
- Short-term and intra-hour horizons (real-time operations): In this domain, where horizons range from minutes to 48 h, model adaptability relies heavily on capturing high-frequency fluctuations and rapid meteorological changes [31]. DL models, specifically LSTM networks and CNNs, demonstrate superior adaptability due to their ability to capture non-linear dependencies in volatile time-series data [1,9,13,14]. While pretrained time-series models such as TimeGPT perform well in short look-ahead scenarios, their performance can degrade over longer horizons, often producing “conservative” forecasts that fail to capture the peaks and valleys necessary for granular operational planning [1,2,36].
- 2.
- Medium-term horizons (Scheduling and maintenance): Spanning one week to several months, this horizon carries the risk of “catastrophic forgetting” of older patterns in standard neural networks [4,31]. Techniques such as GA are used to optimize time lags, which has proven effective for prediction stability by identifying optimal historical windows [31]. Accurate medium-term forecasts allow utilities to optimize unit commitment and minimize reserve power requirements by predicting weekly load profiles with lower variance [2].
- 3.
- Long-term horizons (Capacity planning): For horizons extending from one year to decades, statistical and physical models often outperform pure ML approaches because they simulate atmospheric dynamics and physical boundary conditions, whereas pure ML approaches may not account for long-term climate shifts [36]. However, recent trends advocate hybridizing physical models with ML error correction to enhance long-term validity [13,16,36].
| Forecast Horizon | Dominant Input Feature | Recommended Model Architecture | Key Challenges |
|---|---|---|---|
| Intra-hour/Short-term (<1 h to 48 h) | Historical load, cloud motion, wind speed, temperature [31,71,72]. | DL and hybrid models such as CNN-LSTM, BiLSTM, Ensembles [1]. | Minimizing latency in data acquisition [3,45]. |
| Medium-term (1 week to 1 year) | Seasonal indices, calendar events, temperature trends [4,70]. | Optimized RNNs such as GA-LSTM and statistical methods like SARIMA [4,31]. | Ensuring stability and avoiding overfitting to short-term noise [4]. |
| Long-term (>1 year) | Macroeconomic indicators, demographics, climatological norms [13]. | Physical-statistical hybrid models [9,13]. | Accounting for non-stationary trends [13,63]. |
4. Discussion
4.1. General Discussion
4.1.1. RQ1 (Input and Output Characteristics)
- 1.
- Time-series data: This type of data is crucial because electrical load is inherently dynamic and influenced by factors that change over time. In the context of wind power, historical wind speed data is used, from which the one-day-ahead wind power target is derived using the power curve of specific turbines [9]. This historical wind speed itself forms a time-series input. The raw time-series data can be transformed using techniques such as the Fast Fourier Transform (FFT) to decompose the signal into frequency components [3]. This transformation also enhances the diversity of the inputs for ensemble learning [26]. For historical energy consumption data from both renewable and non-renewable sources, the raw data can be aggregated into a single total energy consumption series. The model then derives rich temporal features directly from the datetime index of the collected data, allowing ML algorithms to learn and forecast energy consumption patterns based on time-specific variations such as hour, day of the week, and year. This approach highlights the time-dependent context in which the load occurred, which supports accurate forecasting [6].
- 2.
- System-specific data: This category encompasses parameters and specifications unique to the physical and operational characteristics of the power system and associated technologies. These involve core electrical variables such as voltage, current, active/reactive power, system topology, and fault data [8]. Inputs such as cell temperature, solar irradiance, and efficiency are crucial when modeling systems with solar integration. Instead of relying solely on physical measurements, some models use a mathematical representation (e.g., the single-diode model) to generate high-quality datasets. These datasets enable accurate offline training of ANNs to predict the maximum power point under varying conditions [2,30,41].
- 3.
- Meteorological data: It plays a pivotal role, particularly in systems that incorporate renewable energy sources such as solar and wind power. Meteorological data is not only diverse but also exhibits high spatial and temporal variability, meaning the same model may perform differently across regions or over time due to changing climate conditions. Solar irradiance and Global Horizontal Irradiation (GHI) are critical variables for photovoltaic (PV) power forecasting, providing insight into the potential solar energy available at a given location and time [48].
- 4.
- Operational parameters: These provide essential insights into the functioning, configuration, and temporal context of the system being modeled. One example is an ANN model whose inputs capture design-specific and runtime characteristics of a Power Delivery Network (PDN). These inputs include the Target Impedance (TI), which defines the PDN performance threshold and influences how the ANN assesses design adequacy, and the Decoupling Capacitor (Decap) input, which represents the placement of capacitors on the Printed Circuit Board (PCB), encoded either via ring or via detailed grid and rectangular sector methods to enhance model performance. The inputs are synthetically generated using physics-based simulations, enabling the ANN to predict whether a given PDN design will violate its target impedance. Abstracting spatial data through sector-based pre-processing significantly improves prediction accuracy, demonstrating the importance of a thoughtful input representation [51].
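The use of the datetime index as a source of temporal features, described under time-series data above, can be sketched as follows. The feature set (hour, day of week, month, year, weekend flag) and the function names are illustrative assumptions; the cited work may derive a different set of calendar features.

```python
from datetime import datetime, timedelta


def calendar_features(timestamp):
    """Derive time-specific features from a single timestamp."""
    return {
        "hour": timestamp.hour,
        "day_of_week": timestamp.weekday(),   # Monday = 0, Sunday = 6
        "month": timestamp.month,
        "year": timestamp.year,
        "is_weekend": timestamp.weekday() >= 5,
    }


def build_feature_rows(start, loads, step_minutes=60):
    """Attach calendar features to a regularly sampled load series."""
    rows = []
    for i, load in enumerate(loads):
        timestamp = start + timedelta(minutes=i * step_minutes)
        row = calendar_features(timestamp)
        row["load"] = load
        rows.append(row)
    return rows


# Two hourly readings that straddle midnight on a weekend.
for row in build_feature_rows(datetime(2024, 1, 6, 23, 0), [5.0, 6.0]):
    print(row)
```

Rows of this form can be passed to any tabular regressor, which is how tree-based and boosting models gain access to daily and weekly load cycles without an explicit sequence model.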
4.1.2. RQ2 (Improving Electrical Grid Planning)
4.1.3. RQ3 (Performance Metrics)
4.2. Trends and Advancements
4.3. Gap and Challenges
- 1.
- Need for continuous adaptability: Models require continuous integration feedback, handling variation in parameters, applications, and modalities in real time. When the output length varies, the complexity is greatly amplified [80].
- 2.
- Handling irregularity: Models must handle irregular-length sequences without relying on padding or truncation, which can obscure critical information or introduce noise [81].
- 3.
- Balancing accuracy: A sliding window is a common approach to handling sequential data in dynamic forecasting [81]. Comparative optimizations show that the window size is a critical hyperparameter; for instance, expanding the window from 22 to 30 h in hybrid RF and LSTM models was found necessary to capture specific multi-day dependencies in non-stationary solar data [82].
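The role of the window size can be made concrete with a minimal sketch of sliding-window sample construction. The helper `make_windows` and the synthetic hourly series are assumptions for illustration; only the 22 h and 30 h window sizes are taken from the text above.

```python
import numpy as np

def make_windows(series, window, horizon=1):
    """Turn a 1-D series into (input window, future target) pairs."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested window and horizon")
    # Each row of X holds `window` past values; y holds the next `horizon` values.
    X = np.stack([series[i:i + window] for i in range(n)])
    y = np.stack([series[i + window:i + window + horizon] for i in range(n)])
    return X, y

# Synthetic 48 h series standing in for hourly measurements.
hourly_load = np.arange(48, dtype=float)
X22, y22 = make_windows(hourly_load, window=22)
X30, y30 = make_windows(hourly_load, window=30)
print(X22.shape, X30.shape)
```

A longer window gives each sample more history but yields fewer training samples from the same series, which is the trade-off that makes the window size worth tuning.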
5. Conclusions
- Adapting to dynamic signal lengths: There are currently no algorithms for dynamically adjusting the signal length in a manufacturing environment. Existing models assume constant clock intervals regardless of varying load profiles. Additionally, MAE or RMSE may be unsuitable and misleading for these results when signals are irregularly sampled. Therefore, further work is needed to examine industrially relevant metrics that best depict the results and their relevance for managing real-time industrial grid applications and demand response.
- Decoupling output length from signal prediction: In future work, the length of the model output can be decoupled from the signal prediction task. The model would then predict how long the future predictions will be based on the length of the forecast horizon. Therefore, two sub-modules, one for duration prediction and one for load shape prediction, can strengthen the overall algorithm.
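The proposed split into a duration sub-module and a load shape sub-module can be sketched as follows. This is only an illustration under strong simplifying assumptions: the duration is modeled as a least-squares line over a single hypothetical scalar feature, and the shape as an average profile on a normalized time axis. All function names are hypothetical, and a practical system would likely use learned models for both parts.

```python
import numpy as np

def fit_duration_model(features, durations):
    """Sub-module 1: least-squares line from a scalar feature to a duration."""
    A = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(A, durations, rcond=None)
    return coef

def predict_duration(coef, feature):
    """Predicted number of time steps, at least one."""
    return max(1, int(round(coef[0] * feature + coef[1])))

def fit_shape_model(profiles, n_points=50):
    """Sub-module 2: mean load profile on a normalized time axis [0, 1]."""
    grid = np.linspace(0.0, 1.0, n_points)
    resampled = [np.interp(grid, np.linspace(0.0, 1.0, len(p)), p) for p in profiles]
    return np.mean(resampled, axis=0)

def predict_load(shape, duration):
    """Stretch the normalized shape to the predicted duration."""
    grid = np.linspace(0.0, 1.0, len(shape))
    return np.interp(np.linspace(0.0, 1.0, duration), grid, shape)

# Illustrative training data: three processes of different lengths.
sizes = np.array([1.0, 2.0, 3.0])        # hypothetical component feature
durations = np.array([4.0, 6.0, 8.0])    # observed lengths in time steps
profiles = [np.linspace(0.0, 10.0, int(d)) for d in durations]

coef = fit_duration_model(sizes, durations)
shape = fit_shape_model(profiles)
steps = predict_duration(coef, 4.0)
forecast = predict_load(shape, steps)
```

Because the shape is stored on a normalized axis, the same shape model serves any predicted duration, so the output length no longer has to be fixed in the model architecture.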
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| SLR | Systematic Literature Review | ANN | Artificial Neural Network |
| ML | Machine Learning | kNN | k-Nearest Neighbor |
| DL | Deep Learning | LLMs | Large Language Models |
| LSTM | Long Short-Term Memory | FFT | Fast Fourier Transform |
| GHI | Global Horizontal Irradiation | TI | Target Impedance |
| PDN | Power Delivery Network | LR | Linear Regression |
| PCB | Printed Circuit Board | Decap | Decoupling Capacitor |
| SVM | Support Vector Machine | PCA | Principal Component Analysis |
| GA | Genetic Algorithm | CL | Cooling Load |
| FCNN | Fully Connected Neural Network | ACF | Autocorrelation Function |
| MI | Mutual Information | ReLU | Rectified Linear Unit |
| ELM | Extreme Learning Machine | TCN | Temporal Convolutional Network |
| GTC | Gated Temporal Convolution | DES | Discrete Event Simulation |
| CatBoost | Categorical Boosting | RBF | Radial Basis Function |
| MLP | Multi-Layer Perceptron | MTL | Multi-Task Learning |
| RNN | Recurrent Neural Network | GNN | Graph Neural Network |
| SHAP | SHapley Additive Explanations | DBN | Deep Belief Network |
| ES | Exponential Smoothing | ODE | Ordinary Differential Equations |
| MA | Moving Average | RMLP | Recurrent Multi-Layer Perceptron |
| AR | Auto Regressive | WT | Wavelet Transform |
| SVR | Support Vector Regression | AC-RNN | Active Graph Recurrent Network |
| FNOs | Fourier Neural Operators | NWP | Numerical Weather Prediction |
| ITSM | Improved Time-Series Methods | CNN | Convolutional Neural Network |
| BP | Back Propagation | DNN | Deep Neural Network |
| MAPE | Mean Absolute Percentage Error | SPAR | Semi-Parametric Auto Regressive |
| DR | Demand Response | VAR | Vector Auto Regression |
| GRU | Gated Recurrent Unit | GRA | Grey Relational Analysis |
| MOGA | Multi-Objective Genetic Algorithm | PCC | Pearson Correlation Coefficient |
| PINNs | Physics-Informed Neural Networks | MAE | Mean Absolute Error |
| EWMA | Exponentially Weighted Moving Average | MSE | Mean Squared Error |
| GRNN | Generalized Regression Neural Network | RMSE | Root Mean Square Error |
| PCNNs | Physics-Constrained Neural Networks | VMD | Variational Mode Decomposition |
| HEA | Hybrid Evolutionary Adaptive Approach | ARMA | Auto Regressive Moving Average |
| EPSO | Evolutionary Particle Swarm Optimization | MTLF | Medium-Term Load Forecasting |
| SROCC | Spearman Rank Order Correlation Coefficient | LTLF | Long-Term Load Forecasting |
| SDGNN | Sparse Dynamic Graph Neural Network | ARIMA | Auto Regressive Integrated Moving Average |
| PRISMA | Preferred Reporting Items for Systematic Reviews and Meta-Analyses | BiLSTM | Bidirectional Long Short-Term Memory |
References
- Chung, W.H.; Gu, Y.H.; Yoo, S.J. District heater load forecasting based on machine learning and parallel CNN-LSTM attention. Energy 2022, 246, 123350.
- Liao, W.; Wang, S.; Yang, D.; Yang, Z.; Fang, J.; Rehtanz, C.; Porté-Agel, F. TimeGPT in load forecasting: A large time series model perspective. Appl. Energy 2025, 379, 124973.
- Pombo, D.V.; Bacher, P.; Ziras, C.; Bindner, H.W.; Spataru, S.V.; Sørensen, P.E. Benchmarking physics-informed machine learning-based short term PV-power forecasting tools. Energy Rep. 2022, 8, 6512–6520.
- Bouktif, S.; Fiaz, A.; Ouni, A.; Serhani, M.A. Optimal deep learning lstm model for electric load forecasting using feature selection and genetic algorithm: Comparison with machine learning approaches. Energies 2018, 11, 1636.
- Ugbehe, P.O.; Diemuodeke, O.E.; Aikhuele, D.O. Electricity demand forecasting methodologies and applications: A review. Sustain. Energy Res. 2025, 12, 19.
- Khan, P.W.; Byun, Y.C.; Lee, S.J.; Kang, D.H.; Kang, J.Y.; Park, H.S. Machine learning-based approach to predict energy consumption of renewable and nonrenewable power sources. Energies 2020, 13, 4870.
- Kuster, C.; Rezgui, Y.; Mourshed, M. Electrical load forecasting models: A critical systematic review. Sustain. Cities Soc. 2017, 35, 257–270.
- Lo Brano, V.; Ciulla, G.; Di Falco, M. Artificial neural networks to predict the power output of a PV panel. Int. J. Photoenergy 2014, 2014, 193083.
- Joseph, L.P.; Deo, R.C.; Casillas-Pérez, D.; Prasad, R.; Raj, N.; Salcedo-Sanz, S. Short-term wind speed forecasting using an optimized three-phase convolutional neural network fused with bidirectional long short-term memory network model. Appl. Energy 2024, 359, 122624.
- Amasyali, K.; El-Gohary, N.M. A review of data-driven building energy consumption prediction studies. Renew. Sustain. Energy Rev. 2018, 81, 1192–1205.
- Xian, H.; Che, J. Multi-space collaboration framework based optimal model selection for power load forecasting. Appl. Energy 2022, 314, 118937.
- Ma, L.; Luan, S.; Jiang, C.; Liu, H.; Zhang, Y. A review on the forecasting of wind speed and generated power. Renew. Sustain. Energy Rev. 2009, 13, 915–920.
- Foley, A.M.; Leahy, P.G.; Marvuglia, A.; McKeogh, E.J. Current methods and advances in forecasting of wind power generation. Renew. Energy 2012, 37, 1–8.
- Yu, M.; Niu, D.; Zhao, J.; Li, M.; Sun, L.; Yu, X. Building cooling load forecasting of IES considering spatiotemporal coupling based on hybrid deep learning model. Appl. Energy 2023, 349, 121547.
- Liu, H.; Tian, H.Q.; Chen, C.; Li, Y.f. A hybrid statistical method to predict wind speed and wind power. Renew. Energy 2010, 35, 1857–1861.
- Wang, Q.; Li, Y.; Li, R. Integrating artificial intelligence in energy transition: A comprehensive review. Energy Strategy Rev. 2025, 57, 101600.
- Wanigasekara, C.; Swain, A.; Nguang, S.K.; Prusty, B.G. Neural network based inverse system identification from small data sets. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–6.
- Wanigasekara, C.; Oromiehie, E.; Swain, A.; Prusty, B.G.; Nguang, S.K. Machine learning based predictive model for AFP-based unidirectional composite laminates. IEEE Trans. Ind. Inform. 2019, 16, 2315–2324.
- Wanigasekara, C.; Oromiehie, E.; Swain, A.; Prusty, B.G.; Nguang, S.K. Machine learning-based inverse predictive model for AFP based thermoplastic composites. J. Ind. Inf. Integr. 2021, 22, 100197.
- Oromiehie, E.; Prusty, B.G.; Rajan, G.; Wanigasekara, C.; Swain, A. Machine learning based process monitoring and characterisation of automated composites. In Proceedings of the SAMPE, Seattle, WA, USA, 22–25 May 2017; pp. 1–6.
- Wanigasekara, C.; Swain, A.; Nguang, S.K.; Prusty, B.G. Improved learning from small data sets through effective combination of machine learning tools with VSG techniques. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–6.
- Witharama, W.; Bandara, K.; Azeez, M.; Bandara, K.; Logeeshan, V.; Wanigasekara, C. Advanced genetic algorithm for optimal microgrid scheduling considering solar and load forecasting, battery degradation, and demand response dynamics. IEEE Access 2024, 12, 83269–83284.
- McCullagh, P. What is a statistical model? Ann. Stat. 2002, 30, 1225–1310.
- Luna-Romero, S.F.; Serrano-Guerrero, X.; de Souza, M.A.; Escrivá-Escrivà, G. Enhancing anomaly detection in electrical consumption profiles through computational intelligence. Energy Rep. 2024, 11, 951–962.
- King, G.; Tomz, M.; Wittenberg, J. Making the most of statistical analyses: Improving interpretation and presentation. Am. J. Political Sci. 2000, 44, 347–361.
- Tasnim, S.; Rahman, A.; Shafiullah, G.; Oo, A.M.T.; Stojcevski, A. A time series ensemble method to predict wind power. In Proceedings of the 2014 IEEE Symposium on Computational Intelligence Applications in Smart Grid (CIASG), Orlando, FL, USA, 9–12 December 2014; pp. 1–5.
- Deb, C.; Zhang, F.; Yang, J.; Lee, S.E.; Shah, K.W. A review on time series forecasting techniques for building energy consumption. Renew. Sustain. Energy Rev. 2017, 74, 902–924.
- Spanos, A. Foundational issues in statistical modeling: Statistical model specification and validation. Ration. Mark. Morals 2011, 2, 146–178.
- Nayak, A.K.; Sharma, K.C.; Bhakar, R.; Mathur, J. ARIMA based statistical approach to predict wind power ramps. In Proceedings of the 2015 IEEE Power & Energy Society General Meeting, Denver, CO, USA, 26–30 July 2015; pp. 1–5.
- Zaimi, M.; El Achouby, H.; Zegoudi, O.; Ibral, A.; Assaid, E. Numerical method and new analytical models for determining temporal changes of model-parameters to predict maximum power and efficiency of PV module operating outdoor under arbitrary conditions. Energy Convers. Manag. 2020, 220, 113071.
- Ahmed, R.; Sreeram, V.; Mishra, Y.; Arif, M. A review and evaluation of the state-of-the-art in PV solar power forecasting: Techniques and optimization. Renew. Sustain. Energy Rev. 2020, 124, 109792.
- Wang, C.; Wang, Y.; Ding, Z.; Zheng, T.; Hu, J.; Zhang, K. A transformer-based method of multienergy load forecasting in integrated energy system. IEEE Trans. Smart Grid 2022, 13, 2703–2714.
- Grandón, T.G.; Schwenzer, J.; Steens, T.; Breuing, J. Electricity demand forecasting with hybrid classical statistical and machine learning algorithms: Case study of Ukraine. Appl. Energy 2024, 355, 122249.
- Sarker, I.H.; Hoque, M.M.; Uddin, M.K.; Alsanoosy, T. Mobile data science and intelligent apps: Concepts, AI-based modeling and research directions. Mob. Netw. Appl. 2021, 26, 285–303.
- Bergmann, D. What is Machine Learning? IBM—ibm.com. Available online: https://www.ibm.com/think/topics/machine-learning (accessed on 15 November 2025).
- Park, K.; Kim, J.; Seo, J. Pint: Physics-informed neural time series models with applications to long-term inference on weatherbench 2m-temperature data. arXiv 2025, arXiv:2502.04018.
- Dwivedi, Y.K.; Kshetri, N.; Hughes, L.; Slade, E.L.; Jeyaraj, A.; Kar, A.K.; Baabdullah, A.M.; Koohang, A.; Raghavan, V.; Ahuja, M.; et al. Opinion Paper: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Int. J. Inf. Manag. 2023, 71, 102642.
- Dokur, E.; Erdogan, N.; Sengor, I.; Yuzgec, U.; Hayes, B.P. Near real-time machine learning framework in distribution networks with low-carbon technologies using smart meter data. Appl. Energy 2025, 384, 125433.
- Faizan, M.; Afgan, I. Dynamic Assessment and Optimization of Thermal Energy Storage Integration with Nuclear Power Plants Using Machine Learning and Computational Fluid Dynamics. Appl. Energy 2025, 391, 125939.
- INT Global. 5 Major Limitations of Machine Learning. 2025. Available online: https://intglobal.com/blogs/5-major-limitations-of-machine-learning/ (accessed on 15 November 2025).
- Azzeddine, H.A.; Tioursi, M.; Chaouch, D.E.; Khiari, B. An offline trained artificial neural network to predict a photovoltaic panel maximum power point. Rev. Roum. Sci. Techn. Énerg 2016, 61, 255–257.
- Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans. Smart Grid 2017, 10, 841–851.
- Kim, T.Y.; Cho, S.B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81.
- Du, L.; Zhang, L.; Wang, X. Spatiotemporal feature learning based hour-ahead load forecasting for energy internet. Electronics 2020, 9, 196.
- Wu, K.; Wu, J.; Feng, L.; Yang, B.; Liang, R.; Yang, S.; Zhao, R. An attention-based CNN-LSTM-BiLSTM model for short-term electric load forecasting in integrated energy system. Int. Trans. Electr. Energy Syst. 2021, 31, e12637.
- Li, K.; Mu, Y.; Yang, F.; Wang, H.; Yan, Y.; Zhang, C. A novel short-term multi-energy load forecasting method for integrated energy system based on feature separation-fusion technology and improved CNN. Appl. Energy 2023, 351, 121823.
- He, J.; Shi, L.; Tian, H.; Wang, X.; Sun, X.; Zhang, M.; Yao, Y.; Shu, G. Applying artificial neural network to approximate and predict the transient dynamic behavior of CO2 combined cooling and power cycle. Energy 2023, 285, 129451.
- Islam, M.S.; Hasan, A.J.; Rahman, M.S.; Yusuf, J.; Sajol, M.S.I.; Tumpa, F.A. Location agnostic source-free domain adaptive learning to predict solar power generation. In Proceedings of the 2023 IEEE International Conference on Energy Technologies for Future Grids (ETFG), Wollongong, Australia, 3–6 December 2023; pp. 1–6.
- Scott, C.; Ahsan, M.; Albarbar, A. Machine learning for forecasting a photovoltaic (PV) generation system. Energy 2023, 278, 127807.
- Visser, L.; AlSkaif, T.; Khurram, A.; Kleissl, J.; van Sark, W. Probabilistic solar power forecasting: An economic and technical evaluation of an optimal market bidding strategy. Appl. Energy 2024, 370, 123573.
- Schierholz, C.M.; Scharff, K.; Schuster, C. Evaluation of neural networks to predict target impedance violations of power delivery networks. In Proceedings of the 2019 IEEE 28th Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS), Montreal, QC, Canada, 6–9 October 2019; pp. 1–3.
- Pariaman, H.; Luciana, G.; Wisyaldin, M.; Hisjam, M. Anomaly detection using lstm-autoencoder to predict coal pulverizer condition on coal-fired power plant. J. Nov. Carbon Resour. Sci. Green Asia Strategy 2021, 1, 89–97.
- Huang, Y.; Zhao, Y.; Wang, Z.; Liu, X.; Fu, Y. Sparse dynamic graph learning for district heat load forecasting. Appl. Energy 2024, 371, 123685.
- Wang, X.; Wang, H.; Li, S.; Jin, H. A reinforcement learning-based online learning strategy for real-time short-term load forecasting. Energy 2024, 305, 132344.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
- Jinil, N.; Reka, S. Deep learning method to predict electric vehicle power requirements and optimizing power distribution. In Proceedings of the 2019 Fifth International Conference on Electrical Energy Systems (ICEES), Chennai, India, 21–22 February 2019; pp. 1–5.
- Gao, Z.; Yu, J.; Zhao, A.; Hu, Q.; Yang, S. A hybrid method of cooling load forecasting for large commercial building based on extreme learning machine. Energy 2022, 238, 122073.
- Niu, D.; Yu, M.; Sun, L.; Gao, T.; Wang, K. Short-term multi-energy load forecasting for integrated energy systems based on CNN-BiGRU optimized by attention mechanism. Appl. Energy 2022, 313, 118801.
- Guo, Y.; Li, Y.; Qiao, X.; Zhang, Z.; Zhou, W.; Mei, Y.; Lin, J.; Zhou, Y.; Nakanishi, Y. BiLSTM multitask learning-based combined load forecasting considering the loads coupling relationship for multienergy system. IEEE Trans. Smart Grid 2022, 13, 3481–3492.
- Cui, W.; Yang, W.; Zhang, B. A frequency domain approach to predict power system transients. IEEE Trans. Power Syst. 2023, 39, 465–477.
- Huang, Y.; Zhao, Y.; Wang, Z.; Liu, X.; Liu, H.; Fu, Y. Explainable district heat load forecasting with active deep learning. Appl. Energy 2023, 350, 121753.
- Osório, G.J.; Matias, J.C.; Catalão, J.P. Hybrid evolutionary-adaptive approach to predict electricity prices and wind power in the short-term. In Proceedings of the 2014 Power Systems Computation Conference, Wroclaw, Poland, 18–22 August 2014; pp. 1–7.
- Li, K.; Mu, Y.; Yang, F.; Wang, H.; Yan, Y.; Zhang, C. Joint forecasting of source-load-price for integrated energy system based on multi-task learning and hybrid attention mechanism. Appl. Energy 2024, 360, 122821.
- Ghimire, S.; Deo, R.C.; Casillas-Pérez, D.; Salcedo-Sanz, S. Two-step deep learning framework with error compensation technique for short-term, half-hourly electricity price forecasting. Appl. Energy 2024, 353, 122059.
- Park, J.; Alvarenga, E.; Jeon, J.; Li, R.; Petropoulos, F.; Kim, H.; Ahn, K. Probabilistic forecast-based portfolio optimization of electricity demand at low aggregation levels. Appl. Energy 2024, 353, 122109.
- Amini, A.; Rey-Mermet, S.; Crettenand, S.; Münch-Alligné, C. A hybrid methodology for assessing hydropower plants under flexible operations: Leveraging experimental data and machine learning techniques. Appl. Energy 2025, 383, 125402.
- Deng, W.; Le, H.; Nguyen, K.T.; Gogu, C.; Medjaher, K.; Morio, J.; Wu, D. A Generic physics-informed machine learning framework for battery remaining useful life prediction using small early-stage lifecycle data. Appl. Energy 2025, 384, 125314.
- Song, H.; Zhang, B.; Jalili, M.; Yu, X. Multi-swarm multi-tasking ensemble learning for multi-energy demand prediction. Appl. Energy 2025, 377, 124553.
- Motevakel, P.; Roldán-Blay, C.; Roldán-Porta, C.; Escrivá-Escrivá, G.; Dasí-Crespo, D. Hybrid energy solutions for enhancing rural power reliability in the Spanish municipality of Aras de los Olmos. Appl. Sci. 2025, 15, 3790.
- Diaz-Iglesias, A.; Belaunzaran, X.; Florez-Tapia, A.M. Short-Term Power Demand Forecasting for Diverse Consumer Types to Enhance Grid Planning and Synchronisation. arXiv 2025, arXiv:2506.04294.
- Pinheiro, M.G.; Madeira, S.C.; Francisco, A.P. Short-term electricity load forecasting—A systematic approach from system level to secondary substations. Appl. Energy 2023, 332, 120493.
- Hasan, M.; Mifta, Z.; Papiya, S.J.; Roy, P.; Dey, P.; Salsabil, N.A.; Chowdhury, N.U.R.; Farrok, O. A state-of-the-art comparative review of load forecasting methods: Characteristics, perspectives, and applications. Energy Convers. Manag. 2025, 26, 100922.
- Dong, J.; Olama, M.M.; Kuruganti, T.; Melin, A.M.; Djouadi, S.M.; Zhang, Y.; Xue, Y. Novel stochastic methods to predict short-term solar radiation and photovoltaic power. Renew. Energy 2020, 145, 333–346.
- Hong, T.; Fan, S. Probabilistic electric load forecasting: A tutorial review. Int. J. Forecast. 2016, 32, 914–938.
- Benato, A.; Bracco, S.; Stoppato, A.; Mirandola, A. LTE: A procedure to predict power plants dynamic behaviour and components lifetime reduction during transient operation. Appl. Energy 2016, 162, 880–891.
- Suganthi, L.; Samuel, A.A. Energy models for demand forecasting—A review. Renew. Sustain. Energy Rev. 2012, 16, 1223–1240.
- Xie, X.; Parlikad, A.K.; Puri, R.S. A neural ordinary differential equations based approach for demand forecasting within power grid digital twins. In Proceedings of the 2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Beijing, China, 21–23 October 2019; pp. 1–6.
- Cuomo, S.; Di Cola, V.S.; Giampaolo, F.; Rozza, G.; Raissi, M.; Piccialli, F. Scientific machine learning through physics–informed neural networks: Where we are and what’s next. J. Sci. Comput. 2022, 92, 88.
- Beinert, D.; Holzhüter, C.; Thomas, J.M.; Vogt, S. Power flow forecasts at transmission grid nodes using graph neural networks. Energy AI 2023, 14, 100262.
- Azeem, A.; Ismail, I.; Jameel, S.M.; Romlie, F.; Danyaro, K.U.; Shukla, S. Deterioration of electrical load forecasting models in a smart grid environment. Sensors 2022, 22, 4363.
- Jagait, R.K.; Fekri, M.N.; Grolinger, K.; Mir, S. Load forecasting under concept drift: Online ensemble learning with recurrent neural network and ARIMA. IEEE Access 2021, 9, 98992–99008.
- Díaz-Bedoya, D.; González-Rodríguez, M.; Clairand, J.M.; Serrano-Guerrero, X.; Escrivá-Escrivá, G. Forecasting Univariate Solar Irradiance using Machine learning models: A case study of two Andean Cities. Energy Convers. Manag. 2023, 296, 117618.



| ID | Research Question |
|---|---|
| RQ1 | What are the primary input and output characteristics considered in the development of electrical load prediction models? |
| RQ2 | How can identified models and frameworks be applied effectively to improve electrical grid planning and management? |
| RQ3 | How do different models and architectures compare in terms of accuracy and performance metrics for the prediction of electrical loads? |

| Model Type | Hybrid Architecture | Functional Logic |
|---|---|---|
| PCLA (Parallel CNN-LSTM Attention) [1] | Parallel (Ensemble) | Extracts spatial (CNN) and temporal (LSTM) features simultaneously in parallel branches to capture spatiotemporal characteristics before fusion. |
| Physics-Informed (PV model + RF/SVR/ANN) [3] | Series (Sequential) | Feature fusion with physics-based variables is achieved by concatenating raw meteorological data with derived quantities from a PV performance model prior to training ML models such as RF. |
| GA-LSTM [4] | Embedded (Optimization) | Integrates GA to optimize the number of time lags and hidden layers for the LSTM model. |
| MSC-PSO-SVR [11] | Embedded (Optimization) | Integrates a multi-space collaboration (MSC) framework with Particle Swarm Optimization (PSO) to tune Support Vector Regression (SVR) parameters, preventing local optima. |
| SWT-TTGAT-GTC [14] | Series (Sequential) | Pipeline applies SWT for denoising and improving data quality, followed by graph attention (spatial) and gated temporal convolution (temporal). |
| ELM-SCO [38] | Embedded (Optimization) | Embeds the SCO algorithm to optimize the initial weights and biases of an ELM to prevent overfitting. |
| Attention-based CNN-LSTM-BiLSTM [45] | Series (Sequential) | Operates as a feature fusion pipeline in which a CNN extracts features, an attention block assigns weights, and LSTM-BiLSTM forecasts the load. |
| SDGNN [53] | Series (Sequential) | A pipeline that constructs a sparse dynamic graph, enhances spatiotemporal memory via convolution, and fuses features for global forecasting. |
| CNN-BiGRU Attention (Multi-task) [58] | Parallel (Ensemble) | Uses a hard weight-sharing mechanism for multi-task learning to share coupling information among cooling, heat, and electrical loads, combined with feature extraction. |
| VMD-CLSTM-VMD-ERCRF [64] | Series (Sequential) | Decomposition-based pipeline where VMD decomposes data for a CNN-LSTM forecast; residual errors are then decomposed and corrected by RF. |
| ARMA-GARCH [65] | Embedded (Optimization) | Error mitigation using GA to optimize the portfolio (aggregation) of demand to minimize probabilistic forecast error. |
| MSMTEL [68] | Embedded (Optimization) | Error mitigation using multi-swarm PSO to optimize knowledge transfer between tasks, followed by a PSO-optimized weighted ensemble. |
| MCNN-SCAM-LSTM-MTL [63] | Parallel (Ensemble) | Combines a hybrid attention mechanism with a Multi-Column CNN to enable separate extraction and unified fusion of features; the features extracted by the Multi-Column CNN are fused by SCAM. |

| Aspect | Static Length Forecasting | Dynamic Length Forecasting |
|---|---|---|
| Definition | Predicts a predetermined, fixed number of future time steps (e.g., from time t to t + L, where L is constant across all instances). | Predicts a variable number of future time steps (e.g., from time t to t + y, where y adapts based on instance-specific factors like component properties). |
| Parameter handling | Assumes uniform or averaged parameters (e.g., standard material sizes, processing times) across components. | Incorporates instance-specific parameters (e.g., varying dimensions, material quantities, or process durations per component). |
| Data dependency | Relies on generalized, aggregated, or historical average data; lesser need for fine-grained details. | Relies on detailed, instance-specific data (e.g., from simulations or measurements for each component). |
| Training data | Requires uniform-length sequences (often achieved via padding or truncation if raw data varies). | Natively handles variable-length sequences (modern approaches use masking; classical methods may still pad or truncate). |
| Model approach | Uses models with fixed output dimensions (e.g., direct multi-step prediction with a static horizon). | Uses adaptive models (e.g., auto-regressive with stop tokens, encoder-decoder with dynamic decoding, or masking in transformers) which adjust output length per instance. |
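
The masking mentioned in the training data row can be illustrated with a minimal sketch. It assumes zero padding and a boolean validity mask; the helpers `pad_with_mask` and `masked_mae` and the example values are hypothetical and are not taken from any of the reviewed studies.

```python
import numpy as np

def pad_with_mask(sequences, pad_value=0.0):
    """Pad variable-length sequences to a common length and record valid steps."""
    max_len = max(len(s) for s in sequences)
    padded = np.full((len(sequences), max_len), pad_value, dtype=float)
    mask = np.zeros((len(sequences), max_len), dtype=bool)
    for i, s in enumerate(sequences):
        padded[i, :len(s)] = s
        mask[i, :len(s)] = True
    return padded, mask

def masked_mae(y_true, y_pred, mask):
    """Mean absolute error computed over valid (unpadded) time steps only."""
    return float(np.abs(y_true - y_pred)[mask].mean())

# Two target sequences of different lengths (3 steps and 1 step).
targets = [[1.0, 2.0, 3.0], [4.0]]
y_true, mask = pad_with_mask(targets)

# Illustrative predictions; the last two values of row 2 fall on padding.
y_pred = np.array([[1.0, 2.0, 4.0], [5.0, 9.0, 9.0]])

naive = float(np.abs(y_true - y_pred).mean())   # padding counted as error
masked = masked_mae(y_true, y_pred, mask)       # padding ignored
print(naive, masked)
```

Without the mask, predictions at padded positions are scored against the pad value and inflate the error, which is one way padding can introduce noise into training and evaluation.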
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Eckhoff, J.; Wadhwa, S.; Fette, M.; Wulfsberg, J.P.; Wanigasekara, C. Electrical Load Forecasting in the Industrial Sector: A Literature Review of Machine Learning Models and Architectures for Grid Planning. Energies 2026, 19, 538. https://doi.org/10.3390/en19020538

