1. Introduction
Climate change, driven by global warming, along with the global transition from fossil fuels to renewable energy to achieve carbon neutrality, has introduced substantial changes to power systems. In particular, the growing integration of renewable energy sources and climate-induced variability in system load have increased the magnitude of load fluctuations. These challenges have made accurate load forecasting more difficult. Nevertheless, precise forecasting remains essential to ensure the reliable and economic operation of power systems. This underscores the need for forecasting models that can effectively capture and adapt to such variability.
Recent studies have explored a variety of approaches to improve the accuracy of load forecasting. These studies can be broadly categorized according to input feature selection techniques, forecasting methodologies, and input enhancement strategies. Given that selecting appropriate input features based on the forecasting horizon and the characteristics of the target load is critical, extensive research has focused on identifying the most relevant features to enhance forecasting performance.
Several studies have proposed input feature selection methods that incorporate multiple analytical techniques. Subbiah et al. [
1] introduced a long-term forecasting framework that combines the RReliefF filter, mutual information, and recursive feature elimination. Similarly, Cui et al. [
2] applied XGBoost and Random Forest algorithms to identify relevant features for short-term load forecasting. These studies highlight that combining multiple techniques can provide greater robustness in input feature selection, as relying on a single method may not fully capture the diverse characteristics of load-related data. Therefore, a hybrid approach that integrates multiple selection techniques is essential for improving both the quality and reliability of input selection.
Traditional time-series models, such as ARIMA, exponential smoothing (ES), LASSO, the Kalman filter, and their nonlinear extensions, have long been applied to load forecasting. Taylor et al. [
3] employed double seasonal exponential smoothing for a 15-day load forecasting task in Brazil, Ziel et al. [
4] adopted LASSO estimation for probabilistic short-term forecasting, and Jung et al. [
5] applied the Kalman filter for very short-term load forecasting (VSTLF). While these approaches provide interpretable results, they struggle to capture the nonlinear dynamics of large-scale systems, particularly under increasing load variability driven by weather and renewable integration.
In contrast, machine learning methods, including support vector machines (SVM), gradient boosting, and hybrid approaches, have gained attention for their ability to model nonlinear dependencies. For example, Singh et al. [
6] utilized SVM with wavelet transforms for day-ahead forecasting, while Taieb et al. [
7] applied gradient boosting for short-term forecasting. While a basic SVM is linear, kernelized SVMs can capture nonlinear relationships, extending the modeling capacity of these methods beyond traditional time-series models. Hybrid approaches have also been explored: Dudek et al. [
8] combined exponential smoothing with Long Short-Term Memory (LSTM) for mid-term forecasting, and Velasco et al. [
9] proposed an ARIMA–ANN model where ARIMA captured linear components and ANN corrected residuals, achieving higher accuracy than single models.
Among machine learning techniques, Recurrent Neural Network (RNN)-based methods such as LSTM and Gated Recurrent Units (GRUs) have been extensively utilized in recent studies, as their internal feedback structures enable them to effectively capture temporal dependencies in sequential input data. Kwon et al. [
10] applied LSTM for short-term load forecasting in the South Korean power system. Lin et al. [
11] proposed an LSTM-based short-term load forecasting model with an attention mechanism, while Hua et al. [
12] developed a hybrid model that combines Convolutional Neural Networks (CNNs), GRUs, and an attention mechanism. Numerous recent studies have demonstrated that LSTM and GRU architectures are well suited for load forecasting tasks.
Moreover, various studies have explored input data enhancement techniques to improve the accuracy of load forecasting. While many approaches directly use historical load data, along with related weather variables, an increasing number of studies have focused on utilizing feature extraction methods involving component analysis of the original data to enhance input variables. Wang et al. [
13] proposed a method that applies input sequence enhancement by decomposing load signals using the Prophet algorithm and variational mode decomposition. Li et al. [
14] employed seasonal-trend decomposition using Loess (STL) to separate load data into trend, seasonal, and residual components, which were then used as input sequences. Shao et al. [
15] used ensemble empirical mode decomposition to decompose load data and utilized the resulting components as forecasting model inputs. These decomposition-based sequence enhancement methods have all demonstrated improvements in load forecasting accuracy.
Recent studies on load forecasting have increasingly addressed VSTLF, which targets predictions ranging from a few minutes to several hours ahead. Wang et al. [
13] proposed a VSTLF model based on Temporal Convolutional Networks (TCNs) combined with Light Gradient Boosting Machine (LGBM) to extract spatio-temporal load features. Zhang et al. [
16] developed a VSTLF approach that integrates an improved empirical mode decomposition algorithm with bidirectional LSTM for a small city in China. Tong et al. [
17] combined an attention module with a 1D convolutional block for VSTLF in Chinese and New England cities. Although these studies advanced the development of VSTLF, they primarily addressed one-step-ahead forecasting, which limits their applicability for real-time power system operation and market planning.
Although many recent studies have focused on input feature selection, input sequence enhancement, or RNN-based VSTLF, few have integrated all of these aspects in the context of a large-scale, nationwide power system. In particular, VSTLF for such systems remains challenging because it must capture heterogeneous weather conditions across wide geographic regions, the variability introduced by renewable energy sources, and the distortions in system load caused by Behind-the-Meter (BTM) PV generation. The rapid growth of renewable energy amplifies the uncertainty of BTM PV generation, which induces irregular system load patterns and poses significant challenges for nationwide forecasting.
Building upon previous studies that have improved load forecasting accuracy, this study proposes a VSTLF algorithm for the next six hours in a large-scale, nationwide power system at 15 min intervals. The proposed approach integrates the following: (1) Input feature selection based on multiple correlation analyses, including the Pearson correlation coefficient, Spearman correlation coefficient, and normalized mutual information (NMI); (2) Input sequence enhancement using pseudo-trend components generated by a Kalman filter-based predictor; and (3) An LSTM model tailored to handle multi-dimensional inputs.
The main contributions of this study are described as follows:
- 1.
Input Feature Selection Based on Multiple Correlation Analyses. This study identifies candidate input features by combining multiple correlation analyses: the Pearson correlation coefficient to capture linear relationships, the Spearman correlation coefficient for monotonic and potentially nonlinear associations, and NMI to quantify information shared between variables. These correlation measures are integrated to provide robustness in the candidate selection process. From the set of selected candidates, the input combination that yields the highest forecasting accuracy is selected.
- 2.
Input Sequence Enhancement Using a Kalman Filter-Based Predictor for Pseudo-Trend Generation. To enhance the input sequences for the LSTM-based forecasting model, this study employs a Kalman filter-based predictor to generate pseudo-trend components of the load. These pseudo-trends are used to augment the original historical input sequences, which enables the model to learn a more stable temporal structure and thereby improve forecasting performance.
- 3.
VSTLF for a Large-Scale, Nationwide Power System Considering Regional Characteristics and Renewable Variability. To perform VSTLF on a nationwide scale, this study adopts two key strategies: (1) a reconstituted load approach that accounts for the variability introduced by photovoltaic (PV) generation and (2) the use of representative weather variables that capture spatially heterogeneous conditions across regions of the power system.
The remainder of this paper is organized as follows.
Section 2 provides background information.
Section 3 presents the proposed algorithm.
Section 4 discusses case studies for validation, and
Section 5 concludes the paper.
3. VSTLF for Large Power Systems with Pseudo-Trend Information
The overall flowchart of the proposed VSTLF algorithm is illustrated in
Figure 4.
In the proposed method, the reconstituted load is first computed by combining the system load with PV generation data, thereby properly accounting for the influence of PV generation. Input candidates are then selected through multiple correlation analyses—the Pearson correlation coefficient, Spearman correlation coefficient, and NMI—applied to both weather variables and time-lagged features. To enrich the temporal characteristics of the input sequence, pseudo-trend components are generated using a Kalman filter-based predictor and incorporated into the input. The model forecasts the reconstituted load using combinations of selected input candidates and the pseudo-trend data. The forecasted loads are then converted back to system load, and the input combination that yields the lowest forecasting error is selected. Finally, hyperparameter tuning and model evaluation are performed.
3.1. Selection of Candidate Input Features Through Correlation Analyses
In this study, candidate input features are selected through an integrated correlation analysis framework that prioritizes variables consistently exhibiting strong associations with the target load. The selection process combines the Pearson correlation coefficient to assess linear relationships, the Spearman correlation coefficient to capture monotonic trends, and the NMI to quantify the mutual information shared between variables. The overall process of input feature selection is depicted in
Figure 5.
As shown in
Figure 5, each input variable (e.g., weather variables and time-lagged variables) is evaluated using Pearson, Spearman, and NMI correlation analyses. For each correlation analysis, a variable is marked as 1 when its correlation coefficient is greater than the average coefficient obtained from that specific analysis. Finally, variables that are consistently marked across all three methods are selected as the input candidates. Among these candidates, all possible combinations are evaluated through a brute-force search to identify the optimal input feature set for the forecasting model.
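For illustration, the marking scheme described above can be sketched as follows. The equal-width binning used to discretize continuous variables for NMI and the use of absolute correlation values are implementation assumptions, not details specified in the paper.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import normalized_mutual_info_score

def select_candidates(features: dict, target: np.ndarray, n_bins: int = 20) -> list:
    """Mark a variable per analysis when its score exceeds that analysis's
    average coefficient; keep variables marked by all three analyses."""
    def nmi(x, y):
        # NMI expects discrete labels; equal-width binning is one simple choice.
        xb = np.digitize(x, np.histogram_bin_edges(x, bins=n_bins))
        yb = np.digitize(y, np.histogram_bin_edges(y, bins=n_bins))
        return normalized_mutual_info_score(xb, yb)

    scores = {
        name: (abs(pearsonr(x, target)[0]),
               abs(spearmanr(x, target)[0]),
               nmi(x, target))
        for name, x in features.items()
    }
    means = np.mean(list(scores.values()), axis=0)  # average per analysis
    return [name for name, s in scores.items()
            if all(s[i] > means[i] for i in range(3))]
```

The brute-force evaluation of combinations of the surviving candidates then proceeds as described in Section 4.3.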
Since the proposed model is trained to forecast the reconstituted load directly, correlation analysis is conducted between the reconstituted load and both weather and time-lagged variables to identify meaningful input features. The weather-related variables considered in this study include temperature, humidity, wind speed, weather conditions, and precipitation, all of which are derived from weather forecasts. Time-lagged variables consist of historical load values from one day prior (D-1) to one week prior (D-7) to the forecast date. Additionally, calendar variables—such as year, month, day, and day of the week—are incorporated to account for periodicity and are treated as auxiliary features.
3.2. Data Pre-Processing
The load and weather data used as inputs to the forecasting model may contain missing values and differences in scale, which can introduce bias or increase the risk of overfitting. Furthermore, it is necessary to derive representative weather information that accurately reflects the characteristics of the entire large-scale power system and effectively captures the impact of renewable energy generation.
In this study, data pre-processing is carried out in the following steps: time resolution alignment, missing value removal, load reconstitution, representative weather data generation, and data normalization. The raw datasets used in this study were originally recorded at different temporal resolutions: the target variable, i.e., the system load, is recorded at 15 min intervals; weather data are provided hourly; and calendar variables—such as date and day of the week—are categorical.
All datasets are resampled to a unified 15 min resolution to match the forecasting target. Hourly weather data are converted to 15 min intervals using linear interpolation. Additionally, any days containing missing values in the training dataset are entirely excluded from the dataset.
To accurately account for the impact of renewable energy, particularly PV generation, the reconstituted load is calculated by summing PV generation and the system load. Representative weather data for the entire power system are derived by performing weighted averaging of weather observations from n major zones, where the weights reflect the relative significance of each region.
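The resolution alignment, load reconstitution, and representative-weather steps might be sketched as follows. The zone labels, weights, and the use of pandas linear interpolation are illustrative assumptions.

```python
import pandas as pd

def preprocess(system_load: pd.Series,
               pv_gen_hourly: pd.Series,
               zone_weather_hourly: pd.DataFrame,
               zone_weights: pd.Series) -> tuple[pd.Series, pd.Series]:
    """Align resolutions, build the reconstituted load, and derive
    representative weather via a weighted zonal average."""
    # Hourly series -> 15 min via linear interpolation (matches the target).
    pv = pv_gen_hourly.resample("15min").interpolate("linear")
    weather = zone_weather_hourly.resample("15min").interpolate("linear")
    # Reconstituted load = system load + PV generation.
    recon = system_load.add(pv, fill_value=0.0)
    # Weighted average over the n major zones; weights normalized to sum to one.
    w = zone_weights / zone_weights.sum()
    rep_weather = weather.mul(w, axis=1).sum(axis=1)
    return recon, rep_weather
```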
Finally, to ensure consistency in data scales, min–max normalization is applied. This technique is selected from among several normalization methods—including max normalization, min–max normalization, and Z-score normalization—based on prior studies [
10] reporting superior forecasting accuracy when using min–max scaling. Min–max normalization is defined as
x_norm,t = (x_t − x_min) / (x_max − x_min),
where x_max and x_min denote the maximum and minimum values of the data, respectively, and x_t denotes the value at time t.
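The normalization step maps each series onto [0, 1]:

```python
import numpy as np

def minmax_normalize(x: np.ndarray) -> np.ndarray:
    """Scale values to [0, 1] using the series minimum and maximum."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)
```

In practice the minimum and maximum from the training data would be stored so that forecasts can be mapped back to megawatts.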
3.3. VSTLF Model with Pseudo-Trend Input Sequence Enhancement
This study proposes a VSTLF model that utilizes load data, weather information, and a pseudo-trend generated by a Kalman filter-based predictor as inputs to an LSTM network. The model forecasts load every 15 min over a 6 h horizon on regular weekdays, excluding holidays. This forecasting scope is due to the limited availability of holiday load data and their irregular behavior, which will be addressed in future work.
The proposed model is designed to learn temporal patterns by processing daily load sequences and extracting their characteristics through hourly feature representations. Constructing input sequences that effectively capture patterns relevant to the forecasting target is essential for enhancing forecasting accuracy. To achieve this, the input sequence is constructed using selected time-lagged date combinations from a candidate pool.
Additionally, to provide the model with forward-looking information specific to the forecast date, a pseudo-trend is generated using a Kalman filter-based predictor and incorporated into the input sequence. This pseudo-trend is obtained using the method proposed by Jung et al. [
5], which is designed to capture the underlying load dynamics by identifying both recent and historical trend components. The resulting trend sequence represents the temporal direction of the forecast day’s load profile. The sequence-feature structure of the input data is illustrated in
Figure 6.
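As a rough sketch of how a Kalman filter-based predictor can produce a pseudo-trend, the following implements a generic local-linear-trend filter that is run over the recent load history and then extrapolated over the 6 h (24-step) horizon. This is not the exact formulation of Jung et al. [5]; the noise variances q and r are illustrative, not tuned values.

```python
import numpy as np

def kalman_pseudo_trend(history: np.ndarray, horizon: int = 24,
                        q: float = 1e-3, r: float = 1e-1) -> np.ndarray:
    """Filter the load history with a [level, slope] state model, then
    extrapolate the filtered state to form the pseudo-trend sequence."""
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition (level, slope)
    H = np.array([[1.0, 0.0]])               # observe the level only
    Q = q * np.eye(2)                        # process noise covariance
    R = np.array([[r]])                      # measurement noise covariance
    x = np.array([history[0], 0.0])          # initial state estimate
    P = np.eye(2)
    for z in history[1:]:
        # Predict.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the new load observation.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + (K @ (np.atleast_1d(z) - H @ x)).ravel()
        P = (np.eye(2) - K @ H) @ P
    # Extrapolate the filtered state over the forecast horizon.
    return np.array([(np.linalg.matrix_power(F, k) @ x)[0]
                     for k in range(1, horizon + 1)])
```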
The proposed algorithm extends the input structure illustrated in
Figure 6 by incorporating the pseudo-trend generated by the Kalman filter-based predictor. An example of the input data structure including the Kalman filter-based pseudo-trend is presented in
Figure 7.
The pseudo-trend, which captures the recent trend of the target day, is incorporated into the input sequence alongside historical load patterns. The input to the LSTM model consists of three components: (1) the past 6 h of load data prior to the prediction time, (2) time-lagged components selected through the input feature selection process, and (3) the pseudo-trend corresponding to the same 6 h forecast horizon.
When multiple types of data, such as load, temperature, and humidity, are jointly used in the form of sequences and features, the model input becomes multidimensional. To address this, either a merged input structure or a parallel input structure can be applied. However, merged input structures may cause distortion during concatenation because of potential temporal misalignment among heterogeneous variables.
To mitigate this issue, the proposed model preserves the original structure of each input type by feeding load data and each selected weather variable into separate LSTM layers. This architecture enables the model to independently capture the temporal dynamics of each variable, thereby enhancing its ability to learn feature-specific patterns more effectively.
However, since the parallel structure has limitations in learning inter-variable dependencies, the outputs of the individual LSTM layers are concatenated and passed to a fully connected (FC) layer. One-dimensional inputs—such as weather forecasts for the prediction day, the pseudo-trend generated by the Kalman filter-based predictor, and auxiliary features (e.g., calendar variables)—are also fed into the FC layer, along with the LSTM outputs.
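One possible realization of this parallel structure, sketched in PyTorch; the layer sizes, the number of sequential inputs, and the use of the last hidden state are illustrative choices, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ParallelLSTMForecaster(nn.Module):
    """Each sequential variable (load, each selected weather variable) gets
    its own LSTM; the LSTM outputs are concatenated with one-dimensional
    inputs (day-ahead weather forecast, pseudo-trend, calendar features)
    and passed to fully connected layers producing the 24-step forecast."""
    def __init__(self, n_seq_vars: int = 2, hidden: int = 64,
                 n_flat: int = 40, horizon: int = 24):
        super().__init__()
        self.lstms = nn.ModuleList(
            nn.LSTM(input_size=1, hidden_size=hidden, num_layers=2,
                    batch_first=True) for _ in range(n_seq_vars))
        self.fc = nn.Sequential(
            nn.Linear(n_seq_vars * hidden + n_flat, 128),
            nn.ReLU(),
            nn.Linear(128, horizon))

    def forward(self, sequences: list, flat: torch.Tensor):
        # Each sequence: (batch, seq_len, 1); keep each LSTM's last output.
        feats = [lstm(s)[0][:, -1, :] for lstm, s in zip(self.lstms, sequences)]
        return self.fc(torch.cat(feats + [flat], dim=1))
```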
The model is trained by minimizing the mean squared error (MSE) between the predicted and actual values using the Adam optimizer. The overall architecture of the proposed forecasting model is illustrated in
Figure 8.
4. Case Studies
4.1. Dataset Description and Experimental Setup
This study utilizes data spanning from 1 January 2021 to 31 December 2022. The dataset includes system load measurements recorded at 15 min intervals and PV generation data recorded hourly, both provided by the Korea Power Exchange. Additionally, hourly weather data—including temperature, humidity, wind speed, weather condition, and precipitation—were obtained from the Korea Meteorological Administration for eight major zones across the large-scale, nationwide power system. Calendar-related auxiliary variables, such as day of the week and holidays, were also collected. The types and ranges of all datasets used in this study are summarized in
Table 1.
As shown in
Table 1, the maximum recorded load reaches 94,929 MW, representing total national demand. Industrial consumption accounts for approximately 53% of this total, indicating that overall load patterns are primarily driven by industrial activity. PV generators are installed across most regions—with only limited deployment in the capital area—so their output must be considered in the forecasting model [
23].
All processes—including input selection, model training, forecasting, and performance evaluation—were implemented in Python 3.10.15 on a system equipped with an Intel Xeon Silver 4215R CPU.
4.2. Evaluation Metrics
To evaluate the forecasting performance of the proposed model, three commonly used error metrics are employed: mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE). The definitions of these metrics are
MAPE = (100/n) Σ_{t=1}^{n} |y_t − ŷ_t| / y_t,
RMSE = √( (1/n) Σ_{t=1}^{n} (y_t − ŷ_t)² ),
MAE = (1/n) Σ_{t=1}^{n} |y_t − ŷ_t|,
where y_t and ŷ_t denote the actual and predicted load at time t, respectively, and n is the number of forecast steps.
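Under these definitions, the three metrics can be written directly as:

```python
import numpy as np

def mape(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Mean absolute percentage error (%)."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Root mean square error, in the units of the load."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Mean absolute error, in the units of the load."""
    return float(np.mean(np.abs(y - y_hat)))
```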
MAPE measures the average percentage error between predicted and actual values, providing a scale-independent indicator of forecasting accuracy. MAE calculates the mean of the absolute errors and is less sensitive to outliers compared to RMSE. In contrast, RMSE penalizes larger errors more heavily by squaring the residuals, making it more suitable for emphasizing peak deviations in the forecast.
4.3. Results of Input Candidate Selection Based on Correlation Analysis
Input candidate selection was performed by conducting correlation analysis between the reconstituted load—obtained from system load and PV generation data—and the candidate variables. The results of the correlation analysis between the reconstituted load and both weather variables and time-lagged features are presented in
Table 2 and
Table 3.
As shown in
Table 2, temperature and humidity consistently exhibited higher correlation values than the average coefficients in both the Pearson and Spearman analyses. In contrast, the NMI results indicated that temperature, humidity, and wind speed exceeded the average NMI coefficient. Since the proposed input candidates were selected only when a variable exceeded the average value across all three correlation measures, temperature and humidity were identified as the final weather-related input features.
In addition, as shown in
Table 3, time-lagged variables D-1, D-6, and D-7 commonly exhibited higher correlation values across the three correlation analyses. As a result, D-1, D-6, and D-7 were selected as the final time-lagged input features.
Additionally, auxiliary information such as year, month, day, weekday, and time was consistently included to represent the calendar-based periodic characteristics of the target day.
To identify the optimal input combination, a brute-force search [
25] was conducted over all possible combinations of the selected input features. For weather-related inputs, two combinations were evaluated: [temperature] and [temperature, humidity]. Since temperature is widely recognized as the most influential weather variable in load forecasting [
10], combinations that excluded temperature were not considered. For time-lagged inputs, seven combinations were tested: [D-1], [D-1, D-6], [D-1, D-7], [D-1, D-6, D-7], [D-6], [D-6, D-7], and [D-7].
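The enumeration of the 14 configurations can be reproduced with a few lines; the evaluation function that trains a model per configuration and returns its MAPE is omitted here.

```python
from itertools import combinations

# Weather inputs: temperature is always retained.
weather_sets = [("temperature",), ("temperature", "humidity")]

# All non-empty subsets of the selected time-lagged candidates.
lag_pool = ("D-1", "D-6", "D-7")
lag_sets = [c for r in range(1, len(lag_pool) + 1)
            for c in combinations(lag_pool, r)]

# Cartesian product: each pair is one input configuration to train and score.
configs = [(w, lags) for w in weather_sets for lags in lag_sets]
```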
As a result, two valid combinations of weather variables and seven combinations of time-lagged variables yielded a total of 14 input configurations. Each configuration was evaluated using the MAPE to identify the one that achieved the highest forecasting accuracy. The forecasting errors associated with each input combination are summarized in
Table 4.
As shown in
Table 4, among the 14 evaluated input combinations, the configuration using temperature as the weather variable and D-1 and D-7 as time-lagged inputs, i.e., Case 3, yielded the highest forecasting accuracy. A comparison between the use of D-6 and D-7 indicates that incorporating D-7, which represents the same day of the previous week, is the most effective for improving forecasting performance. D-6, on the other hand, exhibited higher correlation with the target load than other time-lagged candidates and yielded better accuracy than using D-1 alone, but its contribution was less effective than that of D-7. These results highlight the importance of carefully selecting input features and lag structures. Given that the input sequence of the designed LSTM model is based on daily load patterns, the incorporation of D-7, which provides a more similar load pattern, yielded the greatest improvement in forecasting accuracy. Based on this result, the final input feature set was determined to include temperature as the weather input and D-1 and D-7 as the time-lagged variables.
4.4. Effect of Pseudo-Trend Sequence Input Enhancement
The Kalman filter-based pseudo-trend is generated at 15 min intervals over a 6 h forecasting horizon. This section evaluates the effectiveness of the pseudo-trend as an additional input sequence by analyzing its correlation with the actual target load. To quantify the relationship between the pseudo-trend and the target load, three correlation metrics are computed: the Pearson correlation coefficient, the Spearman correlation coefficient, and the NMI. The results are summarized in
Table 5, where the generated pseudo-trend sequence is denoted as KF.
As shown in
Table 5, the pseudo-trend exhibits a strong correlation with the target load, with Pearson and Spearman correlation coefficients of 0.937 and 0.935, respectively, and an NMI value of 0.992. When used as a direct forecast of the target load, the pseudo-trend yields an MAPE of 1.724%.
Based on these results, the pseudo-trend is incorporated as an additional component of the input sequence, alongside the historical load data. The input configuration of the forecasting model corresponds to the optimal feature set identified in the previous section, which includes temperature as the weather-related feature and D-1 and D-7 as the time-lagged features. In addition to the selected input configuration, the auxiliary calendar features are included as input in the same way.
The annual forecasting performance of the pseudo-trend-enhanced input sequence model is summarized in
Table 6. For comparison, Case 3 in
Table 4, which excludes the pseudo-trend from the input sequence, is also presented.
As shown in
Table 6, incorporating the pseudo-trend into the input sequence led to reductions across all annual average error metrics. This result confirms that the pseudo-trend makes a meaningful contribution in terms of improving forecasting accuracy. Detailed monthly average forecasting errors (MAPE, %) for the proposed model are provided in
Table 7.
As shown in
Table 7, the application of the pseudo-trend improved forecasting performance in January, March, April, September, October, November, and December. In the remaining months, the improvement was less evident, which can be attributed to discrepancies between the pseudo-trend values generated for the training and test data, as well as the fact that the model had not yet undergone hyperparameter tuning. These preliminary results nonetheless indicate that pseudo-trend augmentation has the potential to enhance forecasting performance, which becomes more evident after model optimization.
4.5. Hyperparameter Tuning Using Grid Search Method
Hyperparameters significantly influence both the training efficiency and forecasting accuracy of LSTM models. The types of tunable hyperparameters vary depending on the architecture of the forecasting model. In this study, the proposed model adopts a relatively simple structure based on parallel LSTM layers, and the tuning process was therefore restricted to the number of layers and the number of neurons per layer. Other parameters were fixed to widely adopted default settings for reproducibility. Specifically, the Adam optimizer was employed, and the batch size was set sufficiently large to process the entire dataset at once. Dropout was not applied in order to preserve the temporal dependency inherent in the LSTM architecture. To mitigate overfitting, the training dataset was randomly split into 80% for training and 20% for validation. Model performance was monitored by comparing training and validation losses, and the training was terminated early if the validation loss did not improve for more than 200 epochs.
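The patience-based early stopping and the random 80/20 split described above can be sketched as follows; the helper names are illustrative, not part of the paper's implementation.

```python
import random

class EarlyStopper:
    """Stop when validation loss has not improved for more than
    `patience` epochs (the paper uses patience = 200)."""
    def __init__(self, patience: int = 200):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs > self.patience

def train_val_split(samples: list, val_frac: float = 0.2, seed: int = 0):
    """Random split of the training dataset into training and validation."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - val_frac))
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]
```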
Given the model’s simplicity and the small number of hyperparameters, a grid search method [
26] is employed to identify the parameter combination that yields the highest forecasting accuracy within a predefined search space. The search ranges and the selected hyperparameter values are summarized in
Table 8, and the MAPE corresponding to each hyperparameter combination is illustrated in
Figure 9.
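The grid search itself reduces to an exhaustive scan; the option values below are illustrative rather than the paper's exact search ranges, and `evaluate` stands in for training a model and returning its validation MAPE.

```python
from itertools import product

def grid_search(evaluate, layer_options=(1, 2, 3), width_options=(32, 64, 128)):
    """Exhaustively evaluate every (layers, neurons-per-layer) pair and
    return the configuration with the lowest error, plus all results."""
    results = {cfg: evaluate(*cfg) for cfg in product(layer_options, width_options)}
    best = min(results, key=results.get)
    return best, results
```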
As shown in
Figure 9, the grid search results indicate that the best forecasting performance was achieved with a configuration of two LSTM layers and 64 features per layer, yielding an MAPE of 0.890%.
Table 9 summarizes the configurations and annual average forecasting errors of the comparison models. The model configurations are defined as follows: (1) KF—the Kalman filter-based predictor, whose pseudo-trend is used directly as the forecast; (2) Case 3—the model trained with the optimal input feature set selected via brute-force search; (3) baseline LSTM—a baseline model in which all input features are concatenated and processed through a single LSTM network; (4) Pseudo-Trend—the model incorporating pseudo-trend enhancement without hyperparameter optimization; and (5) Proposed—the final model incorporating pseudo-trend enhancement with optimized hyperparameters.
The results demonstrate that the proposed model outperforms the baseline configurations in forecasting accuracy, highlighting the effectiveness of both the input selection strategy and the pseudo-trend input enhancement.
As shown in
Table 9, the Kalman filter-based forecast, when used independently as the pseudo-trend input, produces relatively large errors across all evaluation metrics. In contrast, Case 3, which employs the best-performing input variable combination identified through correlation analysis, achieves higher forecasting accuracy. The results of Case 3 demonstrate that incorporating a similar load pattern with optimized input variables improves load forecasting accuracy.
In a similar manner, incorporating the pseudo-trend as an additional input, i.e., Pseudo-Trend, yields further improvements than Case 3. The Pseudo-Trend model incorporates the pseudo-trend as part of the input sequence, providing additional information about the trend of the target load. Since both D-7 and the pseudo-trend exhibit strong correlations with the target load, the model can capture more relevant information, leading to higher load forecasting accuracy.
A comparison between the baseline LSTM and the Pseudo-Trend model shows that the baseline LSTM achieves higher load forecasting accuracy. However, since the Pseudo-Trend model had not yet been optimized at this stage, a more equitable comparison is between the baseline LSTM and the optimized Pseudo-Trend model, i.e., the Proposed model; this comparison highlights the effectiveness of incorporating the pseudo-trend as an additional input sequence. With the inclusion of the pseudo-trend, the number of input sequences increased from three in the Case 3 model to four in the Proposed model, leading to a slight rise in computational cost; however, the forecasting errors were significantly reduced.
These results suggest that, although the Kalman filter-based predictor alone produces high forecasting errors, it captures meaningful trend information that enhances model performance when integrated with historical input sequences. A detailed comparison of monthly MAPE values for the comparison model configurations is provided in
Table 10.
As shown in
Table 10, the proposed algorithm achieved the highest forecasting accuracy in all months, except May and November. The load forecasting accuracy of the proposed model consistently outperformed that of the baseline LSTM across most months; however, in May, the baseline LSTM showed relatively better accuracy, achieving the lowest error of 0.836%. This can be attributed to the relatively stable load patterns observed in May, where the additional pseudo-trend component provided limited new information and, in some cases, introduced minor noise. Nevertheless, the proposed model achieved the best overall performance, on average, with an MAPE of 0.890%, especially in months characterized by greater variability in load and weather conditions. These results suggest that pseudo-trend augmentation is especially beneficial in periods of greater volatility, whereas future work may focus on adaptive schemes to further improve performance in months with relatively stable demand patterns. Compared with the pre-optimized results reported in
Table 7, these results confirm that hyperparameter tuning enabled the model to more effectively leverage the pseudo-trend input, leading to consistent improvements across nearly all months.
5. Conclusions
This study proposed a VSTLF algorithm for a nationwide power system with a 6 h forecasting horizon. The proposed approach demonstrates strong potential for deployment in large-scale power system operations and the efficient operation of an electrical energy market, as it consistently improved the accuracy of load forecasting compared with baseline models.
The model’s effectiveness stems from two key contributions: (1) a correlation-based input selection strategy that identifies the most informative variables and (2) the incorporation of Kalman filter-based pseudo-trend sequences that capture underlying short-term load dynamics. Together, these elements reduced the overall forecasting error to 0.890% MAPE and enhanced accuracy across nearly all months in the evaluation period.
These findings confirm the utility of combining systematic feature selection with pseudo-trend augmentation to develop more reliable VSTLF models. Nonetheless, limitations remain, as the analysis focused only on normal weekdays. Future research should extend the evaluation to holidays and investigate adaptive strategies for pseudo-trend generation and season-specific feature selection to improve performance during periods of relatively stable demand. In addition, exploring advanced deep learning architectures, such as attention mechanisms and hybrid models, may further enhance forecasting accuracy and interpretability. Another promising direction is to account for heterogeneous weather conditions across zones by performing forecasts at the zonal level, then aggregating them, which may capture regional variability more effectively and improve nationwide forecasting performance.