Very Short-Term Load Forecasting Model for Large Power System Using GRU-Attention Algorithm

Kim, Tae-Geun; Yoon, Sung-Guk; Song, Kyung-Bin

doi:10.3390/en18133229


by Tae-Geun Kim ¹, Sung-Guk Yoon ² and Kyung-Bin Song ²,*

¹ Department of Electrical Engineering, Soongsil University, Seoul 06978, Republic of Korea

² Department of Electrical Engineering and Convergence of Energy Policy and Technology, Soongsil University, Seoul 06978, Republic of Korea

* Author to whom correspondence should be addressed.

Energies 2025, 18(13), 3229; https://doi.org/10.3390/en18133229

Submission received: 21 May 2025 / Revised: 15 June 2025 / Accepted: 18 June 2025 / Published: 20 June 2025

(This article belongs to the Special Issue Energy, Electrical and Power Engineering: 4th Edition)


Abstract

This paper presents a very short-term load forecasting (VSTLF) model tailored for large-scale power systems, employing a gated recurrent unit (GRU) network enhanced with an attention mechanism. To improve forecasting accuracy, a systematic input feature selection method based on Normalized Mutual Information (NMI) is introduced. Additionally, a novel input feature termed the load variation is proposed to explicitly capture real-time dynamic load patterns. Tailored data preprocessing techniques are applied, including load reconstitution to account for the impact of Behind-The-Meter (BTM) solar generation, and a weighted averaging method for constructing representative weather inputs. Extensive case studies using South Korea’s national power system data from 2021 to 2023 demonstrate that the proposed GRU-attention model significantly outperforms existing approaches and benchmark models. In particular, the proposed method achieves a Mean Absolute Percentage Error (MAPE) of 0.77%, an improvement of 0.50 percentage points over the benchmark model using the Kalman filter algorithm and of 0.27 percentage points over the hybrid deep learning benchmark (CNN-BiLSTM). The simulation results clearly demonstrate the effectiveness of the NMI-based feature selection and the combination of load characteristics for very short-term load forecasting.

Keywords: deep learning; large power system; load forecasting; machine learning; real-time load forecasting; time-series forecasting; very short-term load forecasting (VSTLF)

1. Introduction

Very short-term load forecasting (VSTLF) refers to predicting electricity demand over a time horizon ranging from several minutes to several hours ahead. VSTLF provides higher-resolution, real-time load profiles compared to conventional short-term load forecasting (STLF). Accurate VSTLF is essential not only for maintaining power system stability under rapid load fluctuations but also for optimizing generator dispatch and managing energy storage systems. Furthermore, VSTLF plays a pivotal role in electricity market operations. Real-time market activities, including price determination, market clearing, and generation scheduling, heavily depend on accurate STLF and VSTLF. Inaccurate forecasts can lead to price volatility, increased imbalance costs, and inefficient dispatch decisions, adversely affecting both electricity suppliers and consumers. Moreover, demand response programs rely on precise VSTLF to maximize their effectiveness. Enhancing forecasting accuracy enables market participants to optimize bidding strategies, mitigate financial risks, and improve overall market efficiency.

Numerous studies have investigated load forecasting in recent years. Load forecasting methods can generally be categorized into statistical approaches [1,2,3] and machine learning approaches [4,5,6]. Statistical techniques, such as the autoregressive integrated moving average (ARIMA) model [1], exponential smoothing [2], and least absolute shrinkage and selection operator (LASSO) regression [3], have been widely applied. However, their performance is limited when addressing the nonlinear and uncertain nature of load, especially in systems with high integration of variable renewable energy sources (RESs).

To overcome these limitations, various machine learning techniques have been explored, including neural networks [4], gradient boosting algorithms [5], and support vector machines (SVMs) [6]. These approaches have shown superior performance by effectively capturing complex nonlinear relationships between input variables. Among them, neural network-based methods such as deep neural networks (DNNs) [7], convolutional neural networks (CNNs) [8], recurrent neural networks (RNNs) [9], and attention-based models [10] have gained significant attention. Given the sequential nature of load data, RNN-based models such as long short-term memory (LSTM) and gated recurrent unit (GRU), as well as CNN-based models like temporal convolutional networks (TCNs), have been widely adopted for STLF. For instance, Kwon et al. [11] proposed STLF using LSTM. Lin et al. [10] applied LSTM with attention for STLF, while Kong et al. [12] used LSTM. Cai et al. [13] combined variational mode decomposition (VMD) with GRUs and TCNs to enhance forecasting accuracy. Hua et al. [14] adopted a CNN-GRU hybrid architecture combined with multi-head attention for STLF. He et al. [15] employed a DDPG algorithm to tune GRU hyperparameters for STLF. Ahmad et al. [16] proposed TFTformer, a transformer-based model that integrates temporal convolution and feature-specific embeddings to enhance short-term load forecasting accuracy across diverse regional datasets.

Although these studies contribute to load forecasting, their direct applicability to VSTLF remains limited. Most of them focus on longer forecasting horizons and do not adequately address the rapid load fluctuations caused by high RES variability. These studies often rely solely on historical load and temperature as input features without employing a clear feature selection process, which limits their ability to capture real-time system dynamics. However, input feature selection is critical in load forecasting due to the strong dependency on past demand and meteorological variables, especially in VSTLF, where timely adaptation to changing conditions is essential. Furthermore, these studies focus on relatively small-scale power systems with peak loads up to 15 GW, limiting their generalizability to large-scale systems such as the one considered in this study, which has a peak load of approximately 97 GW, including significant BTM loads. In large-scale power systems, spatial heterogeneity in meteorological conditions becomes significant, necessitating the aggregation of data from multiple weather stations across regions to derive representative weather inputs. This introduces additional complexity, limiting the direct scalability of previous approaches to more realistic operational environments.

Several studies have specifically addressed VSTLF. Pati et al. [17] proposed a method based on incomplete fuzzy decision systems and genetic algorithms for small-city VSTLF. Wang et al. [18] utilized a combination of TCN and Light Gradient Boosting Machine (LGBM) to extract spatial–temporal features for load forecasting. Rafati et al. [19] applied artificial neural networks and SVMs in a photovoltaic (PV)-integrated microgrid. Jiang et al. [20] used a deep-autoformer model for household-level VSTLF. Zhang et al. [21] combined improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) and bidirectional LSTM (Bi-LSTM) for small-city VSTLF in China. Cheng et al. [22] developed a multi-head 1D convolutional block attention module for VSTLF in both Chinese and New England cities. However, most of these studies focus on one-step-ahead forecasting, which limits their usefulness for real-time system operations and market planning.

Some studies have explored multi-step VSTLF [23,24,25]. Vontzos et al. [23] compared multilayer perceptron (MLP), LSTM, and Bi-LSTM to forecast six future steps at five-minute intervals for a Greek airport building. Wang et al. [24] forecasted 60 steps ahead at one-minute intervals for a Chinese substation by combining Prophet-based decomposition and VMD for feature extraction, followed by models such as GRU, RNN, LSTM, and TCN. Zhao et al. [25] proposed a diffusion–attention-enhanced temporal (DATE-TM) model that integrates multiple features to perform 24-step-ahead household VSTLF at 1 min intervals. Yang et al. [26] developed hybrid CNN-GRU and TCN-GRU models for 15 min-ahead VSTLF at a 1 min resolution, incorporating SHAP-based interpretability to assess feature importance. While these efforts demonstrate multi-step forecasting, their scope is limited to small-scale settings such as households and buildings, making them less applicable to large-scale power system operations and real-time market applications.

Although the aforementioned studies contribute to the development of multi-step VSTLF methods, they primarily focus on load forecasting without explicitly addressing the distortions introduced by BTM solar generation. In power systems with widespread deployment of distributed PV resources, the observed load often significantly deviates from the actual system demand due to BTM consumption. This discrepancy poses a major challenge for accurate forecasting and system operation. To address this challenge, Tziolis et al. [27] proposed a short-term net load forecasting model for solar-integrated distribution systems using Bayesian neural networks, incorporating solar irradiance as an input feature to implicitly account for PV generation variability. Kerkau et al. [28] developed a day-ahead net load forecasting model for renewable integrated buildings using XGBoost, with solar irradiance as a key input variable to capture the impact of rooftop PV generation. Yuan et al. [29] proposed a net load forecasting model based on LSTM networks for distribution grid planning, utilizing solar irradiance to reflect the variability in distributed PV generation. While these approaches consider the impact of solar generation, they rely solely on external features such as irradiance and PV output to approximate BTM effects. Consequently, they may fail to fully capture the distortions caused by self-consumption of BTM solar generation. In contrast, Bae et al. [30] proposed an XGBoost-based day-ahead forecasting algorithm that reconstructs load profiles by combining measured net load with estimated BTM solar generation, thereby enabling more accurate forecasting in high PV-integration environments. Unlike studies that indirectly account for PV variability through solar irradiance inputs, this method directly reconstructs gross load to mitigate the masking effects of BTM generation. 
Although the reconstituted load approach enhances day-ahead forecasting accuracy, its applicability to VSTLF remains to be investigated to assess robustness and performance in very short-term scenarios.

To address the limitations of existing VSTLF research, this paper proposes a novel multi-step VSTLF model based on a GRU integrated with an attention mechanism. The model is specifically designed for large-scale systems and forecasts net load up to six hours ahead using nationwide data from South Korea. The reconstituted load method is adapted to account for the effects of BTM solar generation. To enhance input selection, a feature selection technique based on normalized mutual information (NMI) is introduced. Furthermore, a novel input variable termed “load variation” is proposed to explicitly capture the dynamic behavior of real-time load changes.

Comprehensive case studies are conducted to validate the effectiveness of the proposed model, evaluating both the impact of NMI-based input selection and the contribution of the new input feature. The main contributions of this paper are summarized as follows:

Development of a Large-Scale VSTLF Model: A VSTLF model tailored for a national-scale power system (South Korea) is proposed. The model forecasts the net load over a six-hour horizon, making it well-suited for real-time system operations and electricity market planning.
Customized Data Preprocessing for Large Systems: Two key preprocessing steps are introduced. First, reconstituted load is calculated to account for BTM solar generation. Second, a weighted average method is applied to derive representative weather inputs at a national scale.
Input Feature Selection Using NMI: A systematic method based on NMI is proposed to select dominant features, including weather variables and historical load patterns, by evaluating their mutual information (MI) with the target load.
Introduction of the “load variation” Feature: A new input feature is introduced to reflect the real-time rate of load change, enabling the model to better capture dynamic load behavior.
Parallel Input Architecture with GRU and Attention: A GRU-attention model architecture is designed to effectively process multi-dimensional inputs. GRU units learn temporal dependencies, while the attention mechanism identifies the relative importance and inter-dependencies of input features at each time step.

The remainder of this paper is organized as follows. Section 2 describes the characteristics of the South Korean load dataset used in this study. Section 3 presents the proposed algorithm. Section 4 provides results on feature selection, hyperparameter tuning, and comparative performance evaluation. Finally, Section 5 concludes the paper.

2. Load Characteristic Analysis

Load in power systems is shaped by temporal patterns—such as seasonality, day-of-week effects, and holidays—as well as socio-economic factors including economic growth. Accurate forecasting therefore requires rigorous analysis and integration of these inter-dependencies.

In addition, variability introduced by RES—primarily influenced by meteorological conditions—must also be considered. Specifically, ambient temperature and humidity exert primary influence on heating and cooling loads; precipitation and sky conditions affect photovoltaic output; and wind speed and direction determine wind power variability.

This study forecasts South Korea’s net load over a six-hour horizon at 15 min intervals. As of 2023, the national grid recorded a peak load of 93,615 MW and an installed capacity of 142,567 MW [31], with industrial consumption accounting for 53% of the total electricity demand [32]. Consequently, weekday demand—when industrial operations are active—is generally higher than on weekends. Monday morning loads tend to be lower due to reduced activity on the preceding Sunday, while Tuesday through Friday (i.e., “normal weekdays”) exhibit similar net-load profiles when weather effects are excluded. Figure 1 presents the daily net-load profile and the hour-by-hour distribution of net-load values.

As shown in Figure 1a,b, net-load variability is highest between 09:00 and 17:00, driven by a combination of social activity, seasonal influences, and the intermittent nature of solar PV generation.

In 2023, RESs accounted for approximately 9.67% of South Korea’s total power generation [33], with solar PV contributing 55.03% of the total renewable energy output. This substantial share of solar PV generation causes significant distortions in observed load profiles. Additionally, solar PV output is inherently variable, being directly influenced by solar irradiance, which fluctuates due to atmospheric conditions such as cloud cover, humidity, and time of day. As a result, the net load measured at the system level deviates from the actual demand, complicating the load forecasting process. Figure 2 illustrates the relationship between solar power generation and net load under contrasting weather scenarios—sunny versus cloudy days. In 2023, this variation in solar output led to a net-load difference of 9582 MW between the two conditions.

Hence, accurate VSTLF depends on the integration of periodic load patterns, weather-related variables, and advanced methods that can effectively capture the inherent variability in BTM solar generation.

3. Very Short-Term Load Forecasting Model for Large Power System Using GRU-Attention Algorithm

This paper proposes a VSTLF model that utilizes a GRU architecture enhanced with an attention mechanism to forecast net load over a 6 h horizon at 15 min intervals in large-scale power systems. The overall workflow is illustrated in Figure 3.

The proposed methodology begins with feature selection from historical load and meteorological data using NMI. The selected features are then preprocessed, including normalization and load reconstitution, to account for the BTM solar generation. The resulting dataset is then split into training and testing subsets, and the GRU-attention model is trained on the processed inputs. Forecast outputs are subsequently post-processed through denormalization and adjustment to yield the final net-load predictions. Finally, model performance is evaluated using standard accuracy metrics and comparisons against benchmark models.

3.1. Backgrounds of Gated Recurrent Unit and Attention Mechanism

This study employs GRU [34] due to its capability to learn long-range dependencies in load time-series data while alleviating the vanishing-gradient problem. The GRU architecture incorporates update and reset gates that adaptively regulate the flow of information, enabling the model to effectively capture relevant temporal patterns over extended sequences. As a result, the GRU is well suited to the proposed forecasting framework. The internal structure of the GRU cell utilized in this study is depicted in Figure 4.

The GRU update gate ($z_t$), reset gate ($r_t$), and candidate hidden state ($\tilde{h}_t$) at timestep $t$ are defined as

$$z_t = \sigma(W_z [h_{t-1}, X_t] + b_z), \tag{1}$$

$$r_t = \sigma(W_r [h_{t-1}, X_t] + b_r), \tag{2}$$

$$\tilde{h}_t = \tanh(W_h [h_{t-1} \circ r_t, X_t] + b_h), \tag{3}$$

where $z_t$, $r_t$, and $\tilde{h}_t$ denote the activation vectors of the update gate, reset gate, and candidate hidden state, respectively; $W_z$, $W_r$, and $W_h$ represent the input weight matrices associated with the update gate, reset gate, and candidate hidden state calculations, respectively; and $b_z$, $b_r$, and $b_h$ are the corresponding bias vectors. In addition, $X_t$ is the input vector at timestep $t$, $h_{t-1}$ is the hidden state from the previous timestep, $\sigma$ represents the sigmoid activation function, $\tanh$ represents the hyperbolic tangent activation function, and $\circ$ represents element-wise multiplication. The final hidden state ($h_t$) of the GRU at timestep $t$ is computed by combining the previous hidden state ($h_{t-1}$) and the candidate hidden state ($\tilde{h}_t$), controlled by the update gate $z_t$. Unlike LSTM [35], which maintains both a cell state and a hidden state, the GRU’s single hidden state ($h_t$) integrates information about the past and the current input. The final output $h_t$ is formulated as

$$h_t = (1 - z_t) \circ h_{t-1} + z_t \circ \tilde{h}_t. \tag{4}$$
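To make Equations (1)–(4) concrete, the following is a minimal sketch of a single GRU update in plain Python. The helper names (`affine`, `gru_step`), the toy dimensions, and the all-zero weights are assumptions chosen for illustration only; they are not the trained parameters or the implementation used in the study.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, v, b):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def gru_step(h_prev, x_t, params):
    """One GRU update following Equations (1)-(4)."""
    W_z, b_z, W_r, b_r, W_h, b_h = params
    concat = h_prev + x_t                                     # [h_{t-1}, X_t]
    z = [sigmoid(a) for a in affine(W_z, concat, b_z)]        # Eq. (1)
    r = [sigmoid(a) for a in affine(W_r, concat, b_r)]        # Eq. (2)
    gated = [h * ri for h, ri in zip(h_prev, r)] + x_t        # [h_{t-1} ∘ r_t, X_t]
    h_cand = [math.tanh(a) for a in affine(W_h, gated, b_h)]  # Eq. (3)
    # Eq. (4): blend the previous state and the candidate state
    return [(1 - zi) * h + zi * hc for zi, h, hc in zip(z, h_prev, h_cand)]

# Toy example (hypothetical sizes): hidden size 2, input size 1, zero weights.
hidden, inputs = 2, 1
zeros_W = [[0.0] * (hidden + inputs) for _ in range(hidden)]
zeros_b = [0.0] * hidden
params = (zeros_W, zeros_b, zeros_W, zeros_b, zeros_W, zeros_b)
h_next = gru_step([1.0, -1.0], [0.5], params)
```

With all-zero weights both gates evaluate to 0.5 and the candidate state to 0, so the new hidden state is simply half of the previous one, which makes the blending role of $z_t$ in Equation (4) easy to see.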

In addition to the GRU architecture, this study incorporates an attention mechanism, in the scaled dot-product form described by Vaswani et al. [36], to enable the model to focus on the most informative parts of the input sequence. The attention mechanism assigns varying weights to input features or time steps, thereby prioritizing more relevant information rather than treating all inputs uniformly. Owing to this ability to selectively emphasize salient inputs, the attention mechanism has been widely adopted across multiple domains, including image processing, natural language processing (NLP), and time-series forecasting. The core concept is that, at each step, the model dynamically re-evaluates the input sequence to attend to elements most pertinent to the current prediction. Figure 5 illustrates the dot-product attention method, a commonly used implementation in sequence modeling tasks.

The inputs to the attention mechanism are typically referred to as Query (Q), Key (K), and Value (V). The Query matrix is $Q \in \mathbb{R}^{L_q \times d_k}$, the Key matrix is $K \in \mathbb{R}^{L_k \times d_k}$, and the Value matrix is $V \in \mathbb{R}^{L_v \times d_v}$, where $L_q$, $L_k$, and $L_v$ denote sequence lengths, and $d_k$ and $d_v$ represent the dimensionalities of the key/query and value vectors, respectively. The lengths of the key and value sequences must be equal ($L_k = L_v$), since each key is associated with a corresponding value. Furthermore, $d_k$ must be the same for both queries and keys to allow for valid dot-product computation. The dot-product attention mechanism proceeds as follows. First, similarity scores between the query and key vectors, denoted as the attention score, are calculated using matrix multiplication:

$$\text{Attention Score}(Q, K) = Q K^{T}. \tag{5}$$

To stabilize training and control the scale of gradients, the scores are scaled by the square root of $d_k$:

$$\text{Scaled Attention Score}(Q, K) = \frac{Q K^{T}}{\sqrt{d_k}}. \tag{6}$$

A softmax function is then applied to obtain attention weights:

$$\text{Attention weights}(Q, K) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right). \tag{7}$$

These weights reflect the relative importance of each value vector with respect to the corresponding query. The final output is obtained by computing the weighted sum of the value vectors:

$$\text{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V. \tag{8}$$

The key advantage of dot-product attention lies in its ability to selectively emphasize relevant parts of the input sequence, thereby enhancing the model’s capacity to capture complex dependencies. This selective weighting improves forecasting performance compared to using the GRU alone.
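The four steps in Equations (5)–(8) can be traced in a short plain-Python sketch. The function names and the toy matrices below are hypothetical and serve only to illustrate the computation; a production model would rely on a deep learning framework instead.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(Q, K, V):
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)                                          # Eq. (5)
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]   # Eq. (6)
    weights = [softmax(row) for row in scaled]                       # Eq. (7)
    return matmul(weights, V), weights                               # Eq. (8)

# Toy example: one query, three key/value pairs (L_q = 1, L_k = L_v = 3).
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[10.0], [20.0], [30.0]]
attended, attn_weights = dot_product_attention(Q, K, V)
```

In this toy case the first and third keys align equally well with the query, so they receive equal weights that exceed the weight of the second key, and the output is a convex combination of the three value rows.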

3.2. Input Feature Selection Using Normalized Mutual Information

Accurate load forecasting requires consideration of the nonlinear dependencies between the load and input variables. Traditional linear correlation techniques, such as the Pearson correlation coefficient, are insufficient for capturing these complex relationships. To address this limitation, this study employs MI, a concept from information theory that quantifies the statistical dependence between two variables—accounting for both linear and nonlinear associations [37]. MI measures the amount of information gained about one variable through the observation of another. It can be defined either in terms of probability distributions or using entropy. The MI between two random variables X and Y is given by:

$$MI(X; Y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log_2\left(\frac{p(x, y)}{p(x)\, p(y)}\right) = H(X) + H(Y) - H(X, Y), \tag{9}$$

where $x$ and $y$ denote specific values of $X$ and $Y$; $p(x)$, $p(y)$, and $p(x, y)$ denote the marginal and joint probability distributions; and $H(\cdot)$ represents entropy. The marginal entropies of $X$ and $Y$ are defined as:

$$H(X) = -\sum_{x \in X} p(x) \log_2 p(x), \qquad H(Y) = -\sum_{y \in Y} p(y) \log_2 p(y), \tag{10}$$

and the joint entropy is

$$H(X, Y) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \log_2 p(x, y). \tag{11}$$

However, the raw MI value is sensitive to the entropy scale of the individual variables, making direct comparisons between variable pairs difficult. To overcome this issue, NMI [38] is adopted in this study. NMI standardizes the MI value by the combined entropy of the two variables:

$$NMI(X, Y) = \frac{MI(X; Y)}{H(X) + H(Y)}. \tag{12}$$

NMI values range from 0 to 1, with higher values indicating stronger statistical dependence and greater relevance for predicting the target variable. In this study, NMI is calculated between the target load $L_{target}$ and various input features, including meteorological variables, namely temperature ($T$), humidity ($H$), wind speed ($WS$), weather conditions ($WC$), and precipitation ($R$), as well as historical load values from one day (D-1) to seven days (D-7) prior. The complete feature selection process based on NMI is illustrated in Figure 6.

As shown in Figure 6, the input feature selection process based on NMI is applied to two categories of input variables: weather-related features (denoted as $W$), and historical input dates (denoted as $L$). The weather input feature ($W_{INPUT}$) and historical input date ($L_{INPUT}$) are selected by using the threshold parameters $\alpha$ and $\beta$. These thresholds are determined empirically through simulations to balance model complexity, forecasting accuracy, and training efficiency.
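As an illustration of how Equations (9)–(12) and a threshold rule might be applied, the sketch below estimates NMI from discretized series and keeps the candidates that meet a threshold. The equal-width binning, the bin count, the function names, the example series, and the threshold value are all assumptions of this sketch; the text does not specify how the probability distributions are estimated or what values $\alpha$ and $\beta$ take. Note also that, with the normalization written in Equation (12), two identical series give a value of 0.5, because MI cannot exceed either marginal entropy.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy in bits of a sequence of discrete symbols (Eqs. (10)-(11))."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def discretize(values, bins=10):
    """Equal-width binning; the bin count is an assumption of this sketch."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant series
    return [min(int((v - lo) / width), bins - 1) for v in values]

def nmi(x, y, bins=10):
    """NMI as written in Eq. (12): MI / (H(X) + H(Y))."""
    xd, yd = discretize(x, bins), discretize(y, bins)
    h_x, h_y = entropy(xd), entropy(yd)
    h_xy = entropy(list(zip(xd, yd)))
    mi = h_x + h_y - h_xy  # entropy form of Eq. (9)
    return mi / (h_x + h_y) if h_x + h_y > 0 else 0.0

def select_features(candidates, target, threshold):
    """Keep candidate series whose NMI with the target meets the threshold."""
    return [name for name, series in candidates.items()
            if nmi(series, target) >= threshold]

# Hypothetical data: a periodic target, an identical lagged copy, and a constant.
target = [float(i % 24) for i in range(240)]
candidates = {"lagged_load": list(target), "constant": [1.0] * 240}
selected = select_features(candidates, target, threshold=0.1)
```

The same `select_features` call could be run once over weather candidates and once over historical input dates, each with its own threshold.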

3.3. Additional Input Feature—Load Variation

To improve forecasting accuracy, it is beneficial to incorporate engineered features that explicitly capture load dynamics, in addition to conventional inputs such as historical load values and weather variables. While historical load values primarily represent absolute magnitudes, features characterizing load variations can provide critical insights into short-term fluctuation patterns, ramp rates, and deviations from typical behavior. Therefore, this study introduces and evaluates the “load variation” ($\Delta L$) feature, defined as

$$\Delta L(t) = L(t) - L(t_0), \tag{13}$$

where $t$ represents the current time step index and $t_0$ represents the initial time step of the model input sequence. The term $\Delta L$ captures the change in load relative to the initial point of the input sequence. To assess its relevance, the NMI is also computed between $\Delta L$ and $L_{target}$.
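Equation (13) amounts to subtracting the first value of an input window from every value in that window. A minimal sketch follows, with hypothetical load values and with $t_0$ taken as the first index of the window, which is this sketch's reading of "the initial point of the input sequence":

```python
def load_variation(load_window):
    """Return ΔL(t) = L(t) - L(t0), with t0 taken as the first step of the window."""
    base = load_window[0]
    return [value - base for value in load_window]

# Hypothetical 15 min net-load readings in MW.
window_mw = [61200.0, 61550.0, 62100.0, 61900.0]
delta_mw = load_variation(window_mw)
```

The first element is always zero, and the remaining elements show the ramp relative to the start of the window rather than the absolute load level.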

3.4. Data Preprocessing

To train an accurate VSTLF model, two preprocessing steps are essential: timestamp synchronization across all input variables and reduction in input uncertainty.

Timestamp synchronization: This study integrates multiple datasets, including net load, calendar attributes, and meteorological measurements, for model training. Since the forecasting target is a large-scale power system, weather data were collected from eight representative meteorological observatories. Given the 6 h forecasting horizon with 15 min resolution, all datasets were resampled accordingly. To ensure temporal alignment across all input features, linear interpolation was applied for missing or mismatched timestamps.
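The resampling step can be sketched as plain linear interpolation from an hourly series onto a 15 min grid. The function name and sample temperatures are hypothetical, and the sketch assumes evenly spaced hourly inputs with no gaps.

```python
def upsample_hourly_to_15min(hourly):
    """Linearly interpolate hourly values onto a 15-minute grid."""
    steps_per_hour = 4
    out = []
    for left, right in zip(hourly, hourly[1:]):
        for k in range(steps_per_hour):
            out.append(left + (right - left) * k / steps_per_hour)
    out.append(hourly[-1])  # keep the final hourly observation
    return out

# Hypothetical hourly temperatures in degrees Celsius.
temps_c = [10.0, 12.0, 11.0]
temps_15min = upsample_hourly_to_15min(temps_c)
```

Three hourly readings yield nine quarter-hour values, with the original readings preserved at positions 0, 4, and 8.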

Weighted average weather: Forecasting net load over a wide geographic area is particularly challenging due to spatial variability in weather conditions. Relying on data from a single observatory fails to fully capture the heterogeneous weather influences. To address this, a representative weather input was derived by computing a weighted average of meteorological measurements from the eight observatories, following the methodology in [39]. The representative weather

W_{a v g}

is computed as follows:

W_{a v g} = \sum_{i = 1}^{8} α_{i} \cdot w_{i},

(14)

where

α_{i}

and

w_{i}

are the weight [39] and weather data of city i, respectively.
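Equation (14) is a weighted sum over the eight observatories. The sketch below uses made-up weights that sum to one; the actual weights come from [39] and are not reproduced here, so both the values and the sum-to-one property are assumptions.

```python
def weighted_average_weather(observations, weights):
    """Compute W_avg = sum_i alpha_i * w_i over the observatories (Eq. (14))."""
    if len(observations) != len(weights):
        raise ValueError("one weight is required per observatory")
    return sum(a * w for a, w in zip(weights, observations))

# Hypothetical weights for eight observatories (assumed to sum to 1).
alphas = [0.30, 0.15, 0.10, 0.10, 0.10, 0.10, 0.10, 0.05]
uniform_temps_c = [20.0] * 8
varied_temps_c = [24.0, 22.0, 21.0, 20.0, 19.0, 23.0, 18.0, 25.0]
avg_uniform = weighted_average_weather(uniform_temps_c, alphas)
avg_varied = weighted_average_weather(varied_temps_c, alphas)
```

When the weights sum to one, the representative value always lies between the smallest and largest station readings.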

Load reconstitution: In addition to using representative weighted-average weather inputs, the increasing penetration of BTM solar generation has substantially distorted observed net-load profiles—especially during peak daylight hours—thereby reducing forecasting accuracy when relying on raw load data alone [30]. To mitigate this issue, a load reconstitution procedure is applied. Specifically, historical BTM solar generation is first added back to the observed net load, resulting in a reconstituted load series that excludes the influence of solar generation. The VSTLF model is trained on this reconstituted load. During inference, forecasted BTM solar generation over the prediction horizon is subtracted from the reconstituted forecast to yield the final net-load prediction. The complete procedure is illustrated in Figure 7.
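The add-back and subtract steps described above can be sketched as two element-wise operations. The function names and the MW values are hypothetical, and the BTM solar series is assumed to be already aligned with the net-load timestamps.

```python
def reconstitute(net_load, btm_solar):
    """Add estimated BTM solar generation back to the observed net load."""
    return [n + s for n, s in zip(net_load, btm_solar)]

def to_net_load(reconstituted_forecast, btm_solar_forecast):
    """Subtract forecasted BTM solar generation to recover a net-load forecast."""
    return [r - s for r, s in zip(reconstituted_forecast, btm_solar_forecast)]

# Hypothetical values in MW for three consecutive intervals.
net_mw = [60000.0, 58000.0, 59000.0]
btm_mw = [0.0, 4000.0, 2500.0]
reconstituted_mw = reconstitute(net_mw, btm_mw)
recovered_net_mw = to_net_load(reconstituted_mw, btm_mw)
```

In practice the second function would receive the model's reconstituted forecast and a separate BTM solar forecast; the round trip here only shows that the two operations are inverses.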

Normalization: All input features are normalized to a common [0, 1] scale to ensure stable and efficient model training. Discrete or categorical inputs—such as hour of day, calendar indicators (year, month, day, day of the week), weather condition flags, and precipitation—are first encoded using ordinal schemes. Continuous variables are subsequently normalized using min–max scaling, which has been widely adopted in load forecasting tasks due to its robustness and simplicity [40]. The normalization process is computed as follows:

$$x'(t) = \frac{x(t) - \min(x)}{\max(x) - \min(x)}, \tag{15}$$

where $x(t)$ represents the value at $t$, and $\max(x)$ and $\min(x)$ denote the maximum and minimum values of the respective data series. The normalized datasets are then reshaped to match the input format of the proposed VSTLF model.
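A minimal sketch of Equation (15) and of the matching denormalization used in post-processing is given below. Taking the minimum and maximum from a reference (for example, training) series and reusing them later is an assumption of this sketch, as are the function names and sample values.

```python
def fit_min_max(series):
    """Return the (min, max) pair used for scaling."""
    return min(series), max(series)

def normalize(series, lo, hi):
    """Apply Eq. (15); a constant series maps to zeros to avoid division by zero."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in series]
    return [(v - lo) / span for v in series]

def denormalize(scaled, lo, hi):
    """Invert Eq. (15) to return values to their original units."""
    return [v * (hi - lo) + lo for v in scaled]

# Hypothetical load values in GW.
loads_gw = [50.0, 75.0, 100.0]
lo, hi = fit_min_max(loads_gw)
scaled = normalize(loads_gw, lo, hi)
restored = denormalize(scaled, lo, hi)
```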

3.5. Very Short-Term Load Forecasting Model for Large Power System Using GRU-Attention Algorithm

In this section, we propose a VSTLF model architecture for large-scale power systems based on a GRU-attention mechanism. The combination of GRU networks and attention mechanisms is adopted due to their proven effectiveness in balancing computational efficiency and capturing temporal dependencies in time-series forecasting tasks. Specifically, GRU networks are employed to learn the temporal characteristics of the input data, while the attention mechanism is utilized to capture the inter-dependencies among different datasets. The input structure is designed based on the observation that daily load patterns tend to exhibit similarities on the same day of the week. Accordingly, daily sequences are constructed, where each time step represents a specific day, and the hourly load values are treated as feature dimensions. The proposed VSTLF model incorporates multiple types of input data, including meteorological variables, load, and load variation. To effectively process these diverse inputs, a parallel GRU structure is introduced, wherein each GRU independently extracts the temporal features of its corresponding input. This parallel design prevents potential distortions that may arise when heterogeneous data types are merged into a single input sequence. The outputs of each GRU capture the temporal features of individual input variables but do not inherently model inter-feature relationships. To address this limitation, an attention mechanism is applied to the GRU outputs, enabling the model to learn the relative importance and contextual relationships among the different features. This combined architecture effectively captures both temporal dynamics and cross-feature dependencies, thereby enhancing forecasting accuracy. The overall structure of the proposed VSTLF model is depicted in Figure 8.

The model ingests two groups of input features: (1) historical time-series data, including load variation ($\Delta L$), historical load ($L$), weather variables ($W$), and temporal indicators ($T_{time}$); and (2) forecasting-day features, which include calendar information, such as year ($Y$), month ($M$), day ($D$), and day of the week ($DoW$), as well as weather forecasts ($W_{For}$). Initially, NMI-based feature selection is applied to identify the most informative load and weather features. The selected features are then resampled to a 15 min resolution, normalized using min–max scaling, and reshaped to match the input format of the model. Each time-series input stream is passed through its corresponding GRU layer stack. The GRU outputs serve as the Value (V) inputs to dot-product attention blocks, with the Query (Q) and Key (K) derived from the same GRU outputs. The attention mechanism adaptively reweights each feature stream based on its relevance to the forecasting task. The attended outputs are then concatenated with the forecasting-day features and passed through fully connected (FC) layers. The final output layer produces a 1 × 24 vector representing net-load forecasts for the subsequent six hours. The model is trained using samples generated via a sliding-window approach, as shown in Figure 9, where the window advances by one time step for each training instance.
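The sliding-window sample generation can be sketched as follows. The 24-step horizon matches the 1 × 24 output described above, while the input length of eight steps and the flat single-series layout are simplifications assumed for illustration; the model in the text uses daily sequences across several parallel input streams.

```python
def sliding_windows(series, input_len, horizon=24):
    """Build (input, target) pairs, advancing the window one step at a time."""
    samples = []
    for start in range(len(series) - input_len - horizon + 1):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + horizon]
        samples.append((x, y))
    return samples

# Hypothetical series of 100 consecutive 15 min values.
series = list(range(100))
samples = sliding_windows(series, input_len=8)
first_x, first_y = samples[0]
second_x, _ = samples[1]
```

Each target covers 24 consecutive steps, that is, six hours at a 15 min resolution, and consecutive samples overlap in all but one time step.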

Model parameters are optimized based on the mean squared error (MSE) between the predicted and actual net-load values. During training, the network parameters are optimized by minimizing the mean squared error (MSE), which quantifies the average squared difference between the predicted and actual net-load values. The MSE function is defined as

MSE = \frac{1}{N} \sum_{t = 1}^{N} {(L_{t}^{Actual} - L_{t}^{Forecast})}^{2},

(16)

where N denotes the number of forecasted values, and L_{t}^{Actual} and L_{t}^{Forecast} denote the actual and forecasted load values at time t, respectively. The MSE serves as the loss function of the proposed model, and the optimization process seeks the network parameters that minimize it. In this paper, Adaptive Moment Estimation (Adam) [41] is adopted as the optimizer.
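To make the interaction between the MSE loss and the Adam update concrete, the sketch below fits a single weight of a toy linear model by minimizing the MSE with the standard Adam update rule. The toy model, the data, the step count, and the learning rate of 0.05 are assumptions chosen for illustration. The decay rates and epsilon are the commonly used Adam defaults, and none of these values are reported settings of the proposed model.

```python
import math


def mse(actual, forecast):
    """Mean squared error between two equal-length sequences."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def fit_scalar_with_adam(xs, ys, steps=500, lr=0.05,
                         beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit y ~ w * x by minimizing the MSE with Adam (toy example)."""
    w, m, v = 0.0, 0.0, 0.0
    n = len(xs)
    for t in range(1, steps + 1):
        # Gradient of the MSE with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Exponential moving averages of the gradient and its square.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad
        # Bias correction for the zero-initialized moments.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated with a true weight of 2
w_fit = fit_scalar_with_adam(xs, ys)
final_loss = mse(ys, [w_fit * x for x in xs])
```

In the full model the same update is applied to every network parameter, with the gradient obtained by backpropagation instead of the closed-form expression used here.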

4. Case Study

To evaluate the performance of the proposed VSTLF algorithm, a case study was conducted on the South Korean power system. This system represents a national-scale grid, with a peak load of 97,115 MW and a total installed generation capacity of 148,709 MW as of 2024. The proposed algorithm is designed to forecast the net load over a 6 h horizon at 15 min intervals.

The model was trained and validated using historical operational data, including net-load measurements and relevant weather variables. A detailed description of the dataset is provided in Section 4.1. Section 4.2 presents the results of input feature selection using NMI and the hyperparameter optimization process. Section 4.3 introduces the comparison algorithms and evaluation metrics. Finally, Section 4.4 provides a comprehensive performance evaluation of the proposed VSTLF model, including a comparative analysis against several benchmark methods.

4.1. Description of Dataset

The dataset used for the case study comprises three main components: 15 min interval net load data, hourly estimated BTM solar generation, and hourly weather observations. The dataset spans the period from 1 January 2021 to 31 December 2023.

The historical 15 min interval net load data were obtained from the Korea Power Exchange (KPX), the Independent System Operator (ISO) of South Korea, and are used to capture historical load patterns. To account for the impact of BTM solar generation—which distorts the net load—hourly estimated BTM solar generation is used for load reconstitution. These hourly values are linearly interpolated to align with the 15 min resolution of the net-load data.

Weather data relevant to real-time system operations—including temperature, humidity, wind speed, weather conditions, and precipitation—were obtained from the Korea Meteorological Administration (KMA). To model the weather dependency of the net load at the national level, hourly observations from eight major cities across South Korea are aggregated into representative inputs using a weighted average method [39]. These hourly weather observations are also linearly interpolated to produce 15 min resolution inputs synchronized with the demand data. A detailed summary of the dataset is provided in Table 1.
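The two alignment steps described above, linear interpolation from hourly to 15 min resolution and weighted averaging across cities, can be sketched as follows. The function names and the example weights are hypothetical. The actual city weights follow the method in [39] and are not reproduced here.

```python
def interpolate_hourly_to_15min(hourly):
    """Linearly interpolate hourly values to a 15 min resolution.

    Each pair of consecutive hourly points is expanded into four
    equally spaced values; the final hourly point is appended as is.
    """
    out = []
    for a, b in zip(hourly, hourly[1:]):
        for k in range(4):
            out.append(a + (b - a) * k / 4)
    out.append(hourly[-1])
    return out


def weighted_average(city_values, weights):
    """Combine per-city observations into one representative value."""
    return sum(v * w for v, w in zip(city_values, weights)) / sum(weights)


# Two hourly temperature readings expanded to five 15 min values.
quarter_hourly = interpolate_hourly_to_15min([10.0, 14.0])

# Hypothetical example: two cities with illustrative weights 3 and 1.
representative_temp = weighted_average([20.0, 10.0], [3.0, 1.0])
```

Interpolating first and averaging second gives the same result as the reverse order, because both operations are linear.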

All simulations, including model training and forecasting, were conducted in a Python 3.10.15 environment using an Intel Xeon Silver 4215R CPU and an NVIDIA GeForce RTX 3090 GPU.

4.2. Input Feature Selection and Hyperparameter Tuning Results Using Grid Search Algorithm

To analyze the relationship between the reconstituted load and the input variables, NMI was calculated using data from 1 January 2021 to 31 December 2022. NMI is computed from entropy and mutual information (MI) and quantifies the degree of dependency between each input variable and the net load. The NMI values of the candidate input features are summarized in Table 2 (weather-related variables) and Table 3 (historical load and load variation).
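For readers who want to experiment with this kind of dependency measure, the sketch below estimates NMI from two numeric series by equal-width binning. The binning scheme, the number of bins, and the normalization by the geometric mean of the two entropies are assumptions of this example. The exact estimator used in this work is the one defined in Section 3.2 and may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(xs, ys):
    """Mutual information between two sequences of discrete labels."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def discretize(values, bins):
    """Map continuous values to equal-width bin indices."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def nmi(xs, ys, bins=10):
    """NMI = MI / sqrt(H(X) * H(Y)), computed on binned values."""
    bx, by = discretize(xs, bins), discretize(ys, bins)
    hx, hy = entropy(bx), entropy(by)
    if hx == 0.0 or hy == 0.0:
        return 0.0
    return mutual_information(bx, by) / math.sqrt(hx * hy)


ramp = [float(i) for i in range(100)]
nmi_identical = nmi(ramp, ramp)            # fully dependent series
nmi_constant = nmi(ramp, [1.0] * 100)      # no information in a constant
```

Under this normalization, identical series score 1 and a constant series scores 0. Estimates from real data depend on the number of bins, so values from this sketch are not directly comparable with those in Table 2 and Table 3.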

As shown in Table 2, among the weather variables, temperature exhibited the highest NMI value of 0.082, which is more than twice that of the next highest variables, humidity and wind speed (both 0.030). This result can be attributed to the fact that heating and cooling demands constitute a significant portion of the total load and are strongly influenced by temperature. In addition to weather variables, the NMI values for historical load and load variation are presented in Table 3.

Compared to the weather variables, the historical load inputs exhibited significantly higher NMI values. Among them, the load of the previous day (D-1) and the load of the same day one week earlier (D-7) have relatively high NMI values of 0.243 and 0.239, respectively. This can be attributed to the fact that recent load patterns exert the greatest influence on forecasting and that load profiles from the same day of the week typically follow similar patterns. Based on this analysis, suitable thresholds α and β were determined empirically through comparative experiments over a range of candidate values, with each value evaluated using forecasting accuracy metrics. The selected thresholds are α = 0.08 and β = 0.2. Accordingly, the input features include temperature among the weather variables, as well as the historical load-related features: the previous-day load (D-1), the previous-week load (D-7), and the load variation.
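The threshold rule can be checked directly against the NMI values reported in Table 2 and Table 3. The sketch below assumes that a feature is kept when its NMI is greater than or equal to the threshold and that the load variation is compared against β together with the historical load inputs. The dictionary keys and the helper name are illustrative.

```python
# NMI values taken from Table 2 (weather variables).
weather_nmi = {
    "temperature": 0.082,
    "humidity": 0.030,
    "wind_speed": 0.030,
    "weather_conditions": 0.001,
    "precipitation": 0.001,
}

# NMI values taken from Table 3 (historical load and load variation).
load_nmi = {
    "D-1": 0.243, "D-2": 0.128, "D-3": 0.112, "D-4": 0.114,
    "D-5": 0.122, "D-6": 0.173, "D-7": 0.239, "load_variation": 0.542,
}

ALPHA = 0.08  # threshold for weather variables
BETA = 0.2    # threshold for load-related variables


def select_features(nmi_values, threshold):
    """Keep features whose NMI meets the threshold (assumed rule: >=)."""
    return [name for name, value in nmi_values.items() if value >= threshold]


selected_weather = select_features(weather_nmi, ALPHA)
selected_load = select_features(load_nmi, BETA)
```

Under these assumptions the rule keeps temperature from the weather variables and D-1, D-7, and the load variation from the load-related variables, which matches the feature set stated above.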

The load variation (ΔL) exhibited the highest NMI value among all candidate input features, reaching 0.542. The impact of input feature selection and of including ΔL is further analyzed in Section 4.4.

Hyperparameter tuning is critical because suboptimal settings can obscure a model's true capability and yield misleading performance estimates. Several advanced parameter optimization approaches have been proposed and have demonstrated good computational efficiency and accuracy [42]. However, the number of candidate hyperparameters in this work is small, so complete coverage of every combination is computationally tractable. Therefore, a grid search algorithm [43] is employed for hyperparameter tuning.

The grid search algorithm is an exhaustive search restricted to user-defined bounds. It evaluates every feasible combination, which guarantees that the best combination within the predefined search space is found. The search space is expressed as follows:

Ω = \prod_{i = 1}^{D} Ω_{i},

(17)

where D denotes the number of hyperparameters (i.e., the dimensionality of the search space), Ω_{i} is the discrete candidate set for the i-th hyperparameter, and Ω represents the full search space. The optimal hyperparameter vector (\hat{θ}) that minimizes the objective J(θ) and the cardinality of the full search space (|Ω|) are expressed as follows:

\hat{θ} = \underset{θ \in Ω}{\arg \min} J (θ),

(18)

J (θ) = {MAPE}_{val} (θ),

(19)

| Ω | = \prod_{i = 1}^{D} | Ω_{i} | .

(20)

In this work, D is three: the length of the training data, the number of GRU layers, and the number of features. The candidate sets contain 3, 3, and 5 values, respectively, as shown in Table 4, so the cardinality is |Ω| = 3 \times 3 \times 5 = 45. Because the search space comprises only 45 combinations, exhaustive evaluation of every combination is tractable and reproducible, making grid search preferable to more advanced hyperparameter-optimization algorithms such as Bayesian, evolutionary, or other surrogate-based strategies. The resulting best setting is shown in Table 4 and Figure 10: the length of the training data, the number of GRU layers, and the number of input features were selected as 12 months, 1, and 32, respectively.
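The exhaustive search over the grid in Table 4 can be sketched with `itertools.product`. The candidate sets below are taken from Table 4, whereas `placeholder_objective` is a purely artificial stand-in for the validation MAPE J(θ). It is constructed so that its minimum falls on the setting reported as selected in Table 4 and does not represent measured results. In practice each call would train the model and return its validation MAPE.

```python
from itertools import product

# Candidate sets from Table 4.
search_space = {
    "train_months": [1, 6, 12],
    "gru_layers": [1, 2, 3],
    "n_features": [4, 8, 16, 32, 64],
}


def grid_search(space, objective):
    """Evaluate every combination and return the one with the lowest score."""
    names = list(space)
    best_params, best_score = None, float("inf")
    n_evaluated = 0
    for combo in product(*(space[name] for name in names)):
        params = dict(zip(names, combo))
        score = objective(params)
        n_evaluated += 1
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated


def placeholder_objective(params):
    """Artificial objective (NOT measured data), minimal at (12, 1, 32)."""
    return (abs(params["train_months"] - 12) / 12
            + abs(params["gru_layers"] - 1)
            + abs(params["n_features"] - 32) / 32)


best_params, best_score, n_evaluated = grid_search(
    search_space, placeholder_objective
)
```

The loop visits 3 × 3 × 5 = 45 combinations, matching the cardinality computed above. Since each evaluation is independent, the combinations could also be trained in parallel.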

4.3. Comparison Algorithms and Evaluation Metrics

To assess the performance of the proposed VSTLF algorithm, four comparison models are employed. As a baseline, the Kalman Filter-based Real-Time Load Forecasting model (KRLF) [44], which is currently used by KPX for 15 min interval load forecasting, is selected. In addition, three GRU-attention-based models are designed to compare the impact of input feature selection and the inclusion of the load variation (ΔL). These models, along with the proposed method, are summarized in Table 5.

Model 1 does not employ input feature selection nor include the load variation.
Model 2 incorporates only the load variation without input feature selection.
Model 3 adopts only the input feature selection method, excluding the load variation.
The Proposed Model integrates both input feature selection based on NMI and the load variation within the GRU-attention framework.

This comparative setup enables an isolated and combined evaluation of the effects of each component on forecasting performance.

By comparing these algorithms, the performance of the proposed model is validated in three aspects: its accuracy against a real-time operational model, the effectiveness of input feature selection, and the contribution of the additional input feature, namely the load variation (ΔL). The forecasting performance of the proposed VSTLF model and the comparison algorithms is evaluated using three standard metrics: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). These evaluation metrics are defined as follows:

MAE = \frac{1}{n} \sum_{t = 1}^{n} |\hat{y_{t}} - y_{t}|,

(21)

RMSE = \sqrt{\frac{1}{n} \sum_{t = 1}^{n} {(\hat{y_{t}} - y_{t})}^{2}},

(22)

MAPE = \frac{100}{n} \sum_{t = 1}^{n} |\frac{\hat{y_{t}} - y_{t}}{y_{t}}|,

(23)

where \hat{y_{t}} and y_{t} are the forecasted load value and the actual load value at time step t, respectively, and n is the total number of samples.
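The three metrics translate directly into code. The sketch below implements them as defined in Equations (21)–(23); the load values in `actual` and `forecast` are made-up numbers used only to show the arithmetic.

```python
import math


def mae(actual, forecast):
    """Mean absolute error, Equation (21)."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error, Equation (22)."""
    return math.sqrt(
        sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )


def mape(actual, forecast):
    """Mean absolute percentage error in percent, Equation (23)."""
    return 100.0 / len(actual) * sum(
        abs((f - a) / a) for a, f in zip(actual, forecast)
    )


# Made-up values in MW, for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

example_mae = mae(actual, forecast)     # (10 + 10 + 0) / 3
example_rmse = rmse(actual, forecast)   # sqrt((100 + 100 + 0) / 3)
example_mape = mape(actual, forecast)   # 100 / 3 * (0.10 + 0.05 + 0)
```

MAE and RMSE are expressed in the unit of the load (MW), whereas MAPE is scale-free. RMSE penalizes large individual errors more strongly than MAE, which is why the two can rank models differently.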

For performance evaluation, forecasting results on normal days from 1 January 2022 to 31 December 2023 are used. To ensure a robust comparison, each algorithm except the KRLF model was trained and tested three times with different random initialization seeds, and the average performance is reported.

4.4. Results of Proposed VSTLF Model and Comparison Models

To assess the performance of the proposed VSTLF model and the comparison models, the annual average values of the evaluation metrics, as well as the metrics at 8:00 AM (a time characterized by high load variability), were analyzed for the years 2022 and 2023. The results are summarized in Table 6. Model 1 did not consistently outperform KRLF: although its total MAPE was lower in both years, its total MAE was higher in both years, and in 2023 all three of its 8:00 AM metrics were worse than those of KRLF.

The performance improvements achieved by Models 2 and 3 highlight the contributions of the additional input feature (ΔL) and the input feature selection method, respectively. Compared to the KRLF benchmark, Model 2 exhibited better performance across all evaluation metrics, except for the RMSE at 8:00 AM in 2023. In contrast, Model 3 consistently outperformed KRLF in all evaluation metrics for both 2022 and 2023.

The substantial performance gain of Model 3 over Model 2 underscores the critical importance of selecting dominant input features in VSTLF. Furthermore, a comparison among Models 2, 3, and the proposed VSTLF model reveals that incorporating ΔL, which captures recent load variations, as an additional input feature provides further accuracy improvements beyond those achieved by input feature selection alone.

Among the evaluated models, the proposed VSTLF model demonstrated the highest forecasting accuracy, outperforming the KRLF benchmark. In 2022, it reduced the MAPE by 0.50 percentage points, the MAE by 306.58 MW, and the RMSE by 411.58 MW when aggregated across all forecast horizons. At 8:00 AM specifically, the MAPE, MAE, and RMSE were reduced by 0.95 percentage points, 532.55 MW, and 635.40 MW, respectively. Similarly, in 2023, the proposed VSTLF model achieved reductions of 0.56 percentage points in MAPE, 329.39 MW in MAE, and 430.27 MW in RMSE across all forecast horizons. At 8:00 AM in 2023, these improvements were 0.49 percentage points (MAPE), 205.70 MW (MAE), and 128.81 MW (RMSE). These results highlight the effectiveness of integrating NMI-based feature selection and the additional load variation feature (ΔL) within the VSTLF framework.

Figure 11 presents the MAPE distributions for KRLF, Models 1–3, and the proposed VSTLF. Each box represents the interquartile range (IQR) with the central line indicating the median. The whiskers extend to 1.5 × IQR from the quartiles, and outliers are omitted for clarity. Compared to KRLF and Model 1, Model 2 exhibits a narrower IQR and shorter whiskers, indicating the benefit of incorporating the ΔL feature. Model 3 further reduces dispersion by applying feature selection. The proposed VSTLF achieves the lowest median MAPE and the most compact IQR and whiskers, demonstrating both the highest accuracy and the most consistent performance among all models.

Figure 12 and Figure 13 illustrate the mean predicted values at each forecast point for the proposed and comparison models, aggregated over all forecast horizons and at 08:00 AM, respectively.

As shown in Figure 12 and Figure 13, Models 1–3 outperformed the KRLF benchmark, and the proposed VSTLF yielded the most accurate forecasts overall.

Next, the forecasting accuracy of different algorithms is compared to evaluate the effectiveness of the proposed model. For comparison, two hybrid model structures commonly adopted in recent load forecasting studies are selected as baselines. The first comparison model is based on a CNN-GRU architecture combined with a multi-head attention mechanism (CNN-GRU-MHAT), which captures both spatial and temporal features through CNN-GRU layers while learning the importance of input features via the multi-head attention layers [14]. The second comparison model employs a hybrid CNN-BiLSTM structure, where the CNN extracts spatial features and the BiLSTM captures bidirectional temporal dependencies [26]. For a fair comparison, all models are trained using the same input features and identical hyperparameter settings. The forecasting results for each model are summarized in Table 7. Among the comparison models, the CNN-BiLSTM model demonstrated superior forecasting accuracy compared to the CNN-GRU-MHAT model. However, in 2022 the proposed model outperformed the CNN-BiLSTM model both across all time intervals and at 8:00 AM, the period of high load variability. In 2023, the proposed model again achieved better forecasting performance than the CNN-BiLSTM model on every evaluation metric except the RMSE at 8:00 AM.

5. Conclusions

In this study, a novel VSTLF model incorporating a GRU and an attention mechanism was proposed for forecasting net load in large-scale power systems. To enhance forecasting accuracy, two key techniques were integrated into the model: input feature selection based on NMI and the inclusion of a novel input feature, the load variation (ΔL), which captures real-time load dynamics.

Extensive case studies using national-level data from South Korea demonstrated that the proposed GRU-attention model outperformed the comparison GRU-attention architectures, the Kalman filter-based VSTLF model currently used for real-time system operations, and hybrid deep learning models. In particular, the Mean Absolute Percentage Error (MAPE) of the proposed method for 2022 is 0.77%, an improvement of 0.50 percentage points over the benchmark model using the Kalman filter algorithm and of 0.27 percentage points over the hybrid deep learning benchmark (CNN-BiLSTM). The simulation results clearly demonstrate the effectiveness of the NMI-based feature selection and the combination of load characteristics for very short-term load forecasting.

These findings highlight the practical relevance of the proposed approach for real-time power system operation and electricity market management. However, limitations remain: the model was trained and evaluated using actual measured data without considering weather forecast uncertainty, and it was not explicitly designed to account for holidays, which can cause atypical load behavior. Holidays exhibit unique load patterns that vary with the type of holiday, the day of the week on which it falls, and, for holidays set by the lunar calendar, the date itself. Because historical data for these events are limited, future studies should explore data augmentation techniques and develop methods for effectively handling the days surrounding holidays. Further investigation is also required to incorporate forecast uncertainty, either by utilizing predicted weather variables in the training phase or by developing strategies to appropriately integrate both observed and forecasted weather data. Finally, this work adopted a sequential approach that first selects the input features via NMI and then tunes the hyperparameters using a grid search algorithm. This approach becomes computationally intensive for high-dimensional parameter settings, for example when incorporating economic growth indicators, holiday effects, or learning rate schedules. Future work will therefore investigate advanced optimization techniques such as the step gravitational search algorithm (SGSA) [42], Bayesian optimization, or evolutionary strategies.

Author Contributions

Methodology, T.-G.K. and K.-B.S.; Software, T.-G.K.; Writing—review & editing, T.-G.K., S.-G.Y. and K.-B.S.; Supervision, K.-B.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Korea Institute of Energy Technology Evaluation and Planning (KETEP) grant number RS-2024-00398166.

Data Availability Statement

The 15-min resolution load datasets and source code presented in this article are not readily available because of institutional data handling policies. Requests to access the datasets should be directed to Korea Power Exchange (KPX). However, in order to support reproducibility, a publicly accessible 1 h resolution load dataset can be obtained from the KPX website at https://www.data.go.kr/data/15065266/fileData.do (accessed on 17 June 2025). Meteorological data were acquired in real time via API services from the Korea Meteorological Administration (KMA) at https://apihub.kma.go.kr/ (accessed on 17 June 2025). Please note that both websites are primarily provided in Korean.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Tarmanini, C.; Sarma, N.; Gezegin, C.; Ozgonenel, O. Short-term load forecasting based on ARIMA and ANN approaches. Energy Rep. 2023, 9, 550–557.
2. Dudek, G.; Pełka, P.; Smyl, S. A hybrid residual dilated LSTM and exponential smoothing model for midterm electric load forecasting. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2879–2891.
3. Lu, S.; Xu, Q.; Jiang, C.; Liu, Y.; Kusiak, A. Probabilistic load forecasting with a non-crossing sparse-group Lasso-quantile regression deep neural network. Energy 2022, 242, 122955.
4. Taieb, S.B.; Hyndman, R.J. A gradient boosting approach to the Kaggle load forecasting competition. Int. J. Forecast. 2014, 30, 382–394.
5. Guo, J.; Yun, S.; Meng, Y.; He, N.; Ye, D.; Zhao, Z.; Jia, L.; Yang, L. Prediction of heating and cooling loads based on light gradient boosting machine algorithms. Build. Environ. 2023, 236, 110252.
6. Ahmad, W.; Ayub, N.; Ali, T.; Irfan, M.; Awais, M.; Shiraz, M.; Glowacz, A. Towards short term electricity load forecasting using improved support vector machine and extreme learning machine. Energies 2020, 13, 2907.
7. Vanting, N.B.; Ma, Z.; Jørgensen, B.N. A scoping review of deep neural networks for electric load forecasting. Energy Inform. 2021, 4, 49.
8. Rafi, S.H.; Deeba, S.R.; Hossain, E. A short-term load forecasting method using integrated CNN and LSTM network. IEEE Access 2021, 9, 32436–32448.
9. Abumohsen, M.; Owda, A.Y.; Owda, M. Electrical load forecasting using LSTM, GRU, and RNN algorithms. Energies 2023, 16, 2283.
10. Lin, J.; Ma, J.; Zhu, J.; Cui, Y. Short-term load forecasting based on LSTM networks considering attention mechanism. Int. J. Electr. Power Energy Syst. 2022, 137, 107818.
11. Kwon, B.S.; Park, R.J.; Song, K.B. Short-Term Load Forecasting Based on Deep Neural Networks Using LSTM Layer. J. Electr. Eng. Technol. 2020, 15, 1501–1509.
12. Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans. Smart Grid 2017, 10, 841–851.
13. Cai, C.; Li, Y.; Su, Z.; Zhu, T.; He, Y. Short-term electrical load forecasting based on VMD and GRU-TCN hybrid network. Appl. Sci. 2022, 12, 6647.
14. Hua, Q.; Fan, Z.; Mu, W.; Cui, J.; Xing, R.; Liu, H.; Gao, J. A short-term power load forecasting method using CNN-GRU with an attention mechanism. Energies 2024, 18, 106.
15. He, X.; Zhao, W.; Gao, Z.; Zhang, L.; Zhang, Q.; Li, X. Short-term load forecasting by GRU neural network and DDPG algorithm for adaptive optimization of hyperparameters. Electr. Power Syst. Res. 2025, 238, 111119.
16. Ahmad, A.; Xiao, X.; Mo, H.; Dong, D. TFTformer: A novel transformer based model for short-term load forecasting. Int. J. Electr. Power Energy Syst. 2025, 166, 110549.
17. Pati, U.; Ray, P.; Singh, A.R. An intelligent approach towards very short-term load forecasting. Int. J. Emerg. Electr. Power Syst. 2022, 23, 59–72.
18. Wang, Y.; Chen, J.; Chen, X.; Zeng, X.; Kong, Y.; Sun, S.; Guo, Y.; Liu, Y. Short-term load forecasting for industrial customers based on TCN-LightGBM. IEEE Trans. Power Syst. 2020, 36, 1984–1997.
19. Rafati, A.; Joorabian, M.; Mashhour, E.; Shaker, H.R. Machine learning-based very short-term load forecasting in microgrid environment: Evaluating the impact of high penetration of PV systems. Electr. Eng. 2022, 104, 2667–2677.
20. Jiang, Y.; Gao, T.; Dai, Y.; Si, R.; Hao, J.; Zhang, J.; Gao, D.W. Very short-term residential load forecasting based on deep-autoformer. Appl. Energy 2022, 328, 120120.
21. Zhang, M.; Han, Y.; Zalhaf, A.S.; Wang, C.; Yang, P.; Wang, C.; Yang, P.; Wang, C.; Zhou, S.; Xiong, T. Accurate ultra-short-term load forecasting based on load characteristic decomposition and convolutional neural network with bidirectional long short-term memory model. Sustain. Energy Grids Netw. 2023, 35, 101129.
22. Tong, C.; Zhang, L.; Li, H.; Ding, Y. Attention-based temporal–spatial convolutional network for ultra-short-term load forecasting. Electr. Power Syst. Res. 2023, 220, 109329.
23. Vontzos, G.; Laitsos, V.; Bargiotas, D. Data-driven airport multi-step very short-term load forecasting. In Proceedings of the 14th International Conference on Information, Intelligence, Systems & Applications (IISA), Volos, Greece, 10–12 July 2023; pp. 1–6.
24. Wang, C.; Zhao, H.; Liu, Y.; Fan, G. Minute-level ultra-short-term power load forecasting based on time series data features. Appl. Energy 2024, 372, 123801.
25. Zhao, Y.; Li, J.; Chen, C.; Guan, Q. A diffusion–attention-enhanced temporal (DATE-TM) model: A multi-feature-driven model for very-short-term household load forecasting. Energies 2025, 18, 486.
26. Yang, Z.; Li, J.; Liu, C.; Wang, H. Forecasting very short-term power load with hybrid interpretable deep models. Syst. Sci. Control Eng. 2025, 13, 2486136.
27. Tziolis, G.; Spanias, C.; Theodoride, M.; Theocharides, S.; Lopez-Lorente, J.; Livera, A.; Georghiou, G.E. Short-term electric net load forecasting for solar-integrated distribution systems based on Bayesian neural networks and statistical post-processing. Energy 2023, 271, 127018.
28. Kerkau, S.; Sepasi, S.; Howlader, H.O.R.; Roose, L. Day-Ahead Net Load Forecasting for Renewable Integrated Buildings Using XGBoost. Energies 2025, 18, 1518.
29. Yuan, Y.; Yuan, X.; Wang, H.; Tang, M.; Li, M. Net load forecasting method in distribution grid planning based on LSTM network. Sci. Technol. Energy Transit. 2024, 79, 57.
30. Bae, D.-J.; Kwon, B.-S.; Song, K.-B. XGBoost-based day-ahead load forecasting algorithm considering behind-the-meter solar PV generation. Energies 2022, 15, 128.
31. Korea Power Exchange (KPX). 2023 Power System Operation Performance Report. Available online: https://www.kpx.or.kr/boardDownload.es?bid=0159&list_no=73276OOO202408271350024351&seq=1 (accessed on 5 June 2025).
32. Korea Electric Power Corporation (KEPCO). Electricity Usage by Sector and Tariff Classification. Available online: https://home.kepco.co.kr/kepco/EB/A/htmlView/EBAAHP002.do?menuCd=FN430102 (accessed on 5 June 2025).
33. Statistics Korea. Electric Power Sales by Usage. KOSIS—Korean Statistical Information Service. 2024. Available online: https://www.index.go.kr/unify/idx-info.do?idxCd=4293 (accessed on 21 April 2025).
34. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 1982, 79, 2554–2558.
35. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
36. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
37. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
38. Strehl, A.; Ghosh, J. Cluster ensembles—A knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 2002, 3, 583–617.
39. Lim, J.H.; Kim, S.Y.; Park, J.D.; Song, K.B. Representative temperature assessment for improvement of short-term load forecasting accuracy. J. Korean Inst. Illum. Electr. Install. Eng. 2013, 27, 39–43.
40. Kwon, B.S.; Park, R.J.; Jo, S.W.; Song, K.B. Analysis of short-term load forecasting using artificial neural network algorithm according to normalization and selection of input data on weekdays. In Proceedings of the 2018 IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC), Kota Kinabalu, Malaysia, 7–10 December 2018; pp. 280–283.
41. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
42. Fan, C.; Yang, L.T.; Xiao, L. A Step Gravitational Search Algorithm for Function Optimization and STTM’s Synchronous Feature Selection–Parameter Optimization. Artif. Intell. Rev. 2025, 58, 179.
43. Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
44. Jung, H.W.; Song, K.B.; Park, J.D.; Park, R.J. Very short-term electric load forecasting for real-time power system operation. J. Electr. Eng. Technol. 2018, 13, 1419–1424.

Figure 1. Load characteristics of South Korea. Distinct characteristics of daily net load (a) and the distribution of net-load values across different hours in South Korea (b).

Figure 2. The relationship between solar power generation and net load under different weather conditions.

Figure 3. Overall workflow of the proposed VSTLF method.

Figure 4. Structure of GRU cells.

Figure 5. The procedure of the dot-product attention method.

Figure 6. The procedure of input feature selection using NMI.

Figure 7. The procedure of the reconstituted load method.

Figure 8. Structure of proposed very short-term load forecasting algorithm.

Figure 9. Description of sliding window method.

Figure 10. Results of hyperparameter selection using Grid search algorithm.

Figure 11. Comparison of MAPE distribution across models.

Figure 12. Average of MAPE for proposed and comparison models (total).

Figure 13. Average of MAPE for proposed and comparison models (8:00 AM).

Table 1. Summary of the dataset.

Data	Value	Unit
Load	1240–94,929	MW
Time information	00:00–23:45	hh:mm
Calendar information (Year)	2021–2023	yyyy
Calendar information (Month)	1–12	MM
Calendar information (Day)	1–31	DD
Calendar information (Day of week)	Monday–Sunday	-
Temperature	−16–34	°C
Humidity	0–100	%
Wind speed	0–7.8	m/s
Weather conditions	Sunny, cloudy, overcast	-
Precipitation	0–86	mm

Table 2. NMI of weather variables.

	Temperature	Humidity	Wind Speed	Weather Conditions	Rain
NMI	0.082	0.030	0.030	0.001	0.001

Table 3. NMI of historical load and load variation.

	D-1	D-2	D-3	D-4	D-5	D-6	D-7	Load Variation
NMI	0.243	0.128	0.112	0.114	0.122	0.173	0.239	0.542

Table 4. Grid search space and selected hyperparameters.

Parameter	Search Space	Selected
Length of train data	[1 month, 6 months, 12 months]	12 months
Number of GRU layers	[1, 2, 3]	1
Number of features	[4, 8, 16, 32, 64]	32

Table 5. Summary of GRU-attention comparison models.

	Model 1	Model 2	Model 3	Proposed
Input feature selection	x	x	o	o
Load variation	x	o	x	o

Table 6. Results of proposed VSTLF model and comparison models.

Year	Metric	KRLF	Model 1	Model 2	Model 3	Proposed
2022	MAPE_total (%)	1.27	1.23	1.13	0.80	0.77
	MAE_total (MW)	854.11	863.84	790.22	569.55	547.53
	RMSE_total (MW)	1203.49	1205.59	1109.30	828.38	791.91
	MAPE_8am (%)	2.09	1.59	1.47	1.17	1.14
	MAE_8am (MW)	1401.91	1193.76	1111.04	894.69	869.36
	RMSE_8am (MW)	1830.38	1628.98	1504.77	1247.53	1194.98
2023	MAPE_total (%)	1.36	1.33	1.17	0.84	0.80
	MAE_total (MW)	883.26	903.51	799.15	574.35	553.87
	RMSE_total (MW)	1275.24	1274.28	1143.47	867.82	844.97
	MAPE_8am (%)	1.78	1.81	1.59	1.34	1.29
	MAE_8am (MW)	1164.60	1321.60	1163.16	983.43	958.90
	RMSE_8am (MW)	1549.45	1846.73	1604.57	1453.20	1420.64

Table 7. Comparison analysis of forecasting algorithm.

Year	Metric	KRLF	CNN-GRU-MHAT	CNN-BiLSTM	Proposed
2022	MAPE_total (%)	1.27	1.16	1.04	0.77
	MAE_total (MW)	854.11	806.37	723.45	547.53
	RMSE_total (MW)	1203.49	1077.24	969.02	791.91
	MAPE_8am (%)	2.09	1.43	1.31	1.14
	MAE_8am (MW)	1401.91	1077.01	985.31	869.36
	RMSE_8am (MW)	1830.38	1407.48	1293.59	1194.98
2023	MAPE_total (%)	1.36	1.04	0.95	0.80
	MAE_total (MW)	883.26	713.10	655.68	553.87
	RMSE_total (MW)	1275.24	970.23	902.35	844.97
	MAPE_8am (%)	1.78	1.45	1.35	1.29
	MAE_8am (MW)	1164.60	1073.99	997.94	958.90
	RMSE_8am (MW)	1549.45	1464.76	1373.79	1420.64

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
