by Ziang Peng*, Haotong Han and Jun Ma

Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Dear Authors,

This manuscript introduces a novel short-term electric load forecasting model combining dilated BiGRU architecture and an Improved Beluga Whale Optimization (IBWO) algorithm. While the model design is innovative and the reported forecasting results are promising, several critical methodological issues need to be addressed:

Dataset Description: The manuscript does not specify the type of consumer (e.g., residential, industrial) nor the total volume or timeframe of the dataset used. This limits the interpretability and reproducibility of the study.

Lack of State-of-the-Art Benchmarking: The paper lacks a structured review of the current forecasting methods. We strongly recommend adding a benchmark table comparing various techniques (e.g., ARIMA, SVM, LSTM, hybrid models) under different load types and forecasting horizons (short-term vs ultra-short-term).

Model Validation: The model is validated only on the same dataset used for training. It is essential to test the model on external datasets to assess generalization, especially for practical applications where load characteristics may vary.

Comparative Fairness: Without identifying the type of load or consumer, comparisons with baseline models (e.g., LSTM, CNN-GRU) may be misleading. Forecasting accuracy is highly dependent on load volatility and temporal patterns.

Summary Table Recommendation: Consider including a summary table showing the performance of the proposed model versus others, categorized by accuracy, computation cost, and suitability across different demand profiles.

With these improvements, the manuscript will better position itself within the scholarly literature and provide a stronger basis for evaluating the model’s contribution and applicability.


Best Regards,

Author Response

Response to comments by Reviewer #1:

We sincerely appreciate your valuable review comments. Your feedback is both professional and insightful, particularly in the areas of data description, model validation, and benchmark comparison. These suggestions have provided crucial guidance for optimizing our research design and enhancing the overall quality of the manuscript. Following your recommendations, we have systematically revised the content of the manuscript, with detailed modifications outlined in the following sections.

Below are our detailed responses to each of your suggestions.

Comments:

  1. Dataset Description: The manuscript does not specify the type of consumer (e.g., residential, industrial) nor the total volume or timeframe of the dataset used. This limits the interpretability and reproducibility of the study.

Response:

We sincerely thank you for highlighting this important point. Indeed, in the original manuscript, we failed to explicitly specify the user type, time range, and total volume of the dataset utilized—an oversight in our writing process. Your comment draws attention to a detail that, while easily overlooked, is critical for readers to accurately understand the study. It has prompted us to adopt a more rigorous approach in our data description, thereby enhancing the interpretability and reproducibility of our research.

The dataset used in this study is sourced from the Kaggle platform, titled “Electricity Load Forecasting: short-term electricity load forecasting (Panama case study),” available at the following link: https://www.kaggle.com/datasets/ernestojaguilar/shortterm-electricity-load-forecasting-panama. This dataset comprises historical electricity load data collected from the daily dispatch reports issued by the national grid operator, thus reflecting the overall urban electricity demand. It includes all user groups—residential, commercial, and industrial—rather than being limited to a specific sector or a single user type.

The dataset spans from January 30, 2015, to June 27, 2020, with data recorded on an hourly basis. The original load feature contains 48,049 records; when combined with additional meteorological and temporal variables, the dataset comprises a total of 768,784 samples. For this study, we selected a subset of the data covering the period from January 1, 2019, to June 1, 2020. From this subset, we extracted nine feature variables, using load values as the prediction target. This segment contains 12,431 records per feature, resulting in a total of 111,879 data points.
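For reference, the subsetting step described above can be reproduced with a short Pandas sketch; the file name and the exact window boundaries below are placeholders, while the column names (datetime, nat_demand) follow the schema of the Kaggle dataset.

```python
import pandas as pd

# Minimal sketch of the subsetting step (file name is a placeholder; "datetime"
# and "nat_demand" follow the column names of the Kaggle Panama dataset).
df = pd.read_csv("panama_load.csv", parse_dates=["datetime"])

# Study window: January 1, 2019 through June 1, 2020, hourly records.
# The exact boundary handling is an assumption; the manuscript reports
# 12,431 hourly records per feature for this window.
mask = (df["datetime"] >= "2019-01-01") & (df["datetime"] < "2020-06-02")
subset = df.loc[mask]

print(len(subset))  # should be close to the 12,431 records per feature cited above
```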

In accordance with your suggestion, we have now supplemented the manuscript with critical details regarding the data source, time range, and scale. These additions are included in Section 2.1, “Feature Engineering.” Such information is essential for readers to understand the data foundation upon which the model is built, and it also facilitates comparative modeling and methodological studies under similar data conditions by other researchers.

Furthermore, the dataset we selected possesses advantages in openness, authenticity, and completeness. First, the dataset is fully open-source, allowing researchers to access it directly through a public platform, thereby lowering the data acquisition threshold. Second, the data originate from national grid dispatch records, reflecting real electricity demand and offering high representativeness and practical value. Third, we have clearly described the data processing workflow and feature engineering strategies in the manuscript, providing a valuable reference framework for future studies employing this dataset.

  2. Lack of State-of-the-Art Benchmarking: The paper lacks a structured review of the current forecasting methods. We strongly recommend adding a benchmark table comparing various techniques (e.g., ARIMA, SVM, LSTM, hybrid models) under different load types and forecasting horizons (short-term vs ultra-short-term).

Response:

The issue you pointed out is very accurate, and comparing additional baseline models is of great significance for validating the effectiveness of our method. Therefore, in the revised manuscript, we have included experimental comparisons with several typical models, including the traditional statistical model ARIMA, machine learning methods such as SVM and Random Forest (RF), as well as the ensemble learning method XGBoost. The corresponding prediction results have been summarized in the revised Table 1, which includes performance metrics such as RMSE, R², and MAE, providing a clearer and more intuitive comparison of the predictive capabilities of different methods.

Table 1. Forecasting Accuracy of Benchmark Models under Key Evaluation Metrics

| Model   | RMSE    | R²     | MAE     |
|---------|---------|--------|---------|
| ARIMA   | 77.1089 | 0.8272 | 59.6386 |
| RF      | 74.0867 | 0.8453 | 57.1238 |
| SVM     | 70.5128 | 0.8555 | 53.6242 |
| XGBoost | 69.3699 | 0.8636 | 52.9690 |

To enable readers to gain a comprehensive understanding of the predictive performance of each model, we have added a comparative experiment of baseline models in Section 3.2, "Baseline Prediction Models." We have redrawn the related result graphs and summarized the prediction accuracy of each model under key evaluation metrics in a table, offering a more intuitive demonstration of their performance. The prediction results of the newly added baseline models, including ARIMA, SVM, Random Forest (RF), and XGBoost, are presented here for the reviewers' reference. Additionally, we have briefly analyzed the predictive performance of each baseline model to further highlight the advantages of the model used in this paper in terms of prediction accuracy and stability.
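For transparency, the three metrics reported above can be computed as in the generic sketch below (scikit-learn based; the arrays are placeholders rather than our actual test split).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def report_metrics(y_true, y_pred):
    """RMSE, R² and MAE as used to compare the baseline models."""
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    return rmse, r2_score(y_true, y_pred), mean_absolute_error(y_true, y_pred)

# Placeholder arrays standing in for the test-set load (MW) and a model's forecast.
y_true = np.array([954.2, 913.9, 903.4, 889.1, 910.1])
y_pred = np.array([960.5, 905.0, 908.2, 880.3, 915.6])
print(report_metrics(y_true, y_pred))
```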

It is worth noting that, in the power system, short-term load forecasting typically refers to predicting the load for the next 1 hour to several days (usually 24 to 168 hours). Such forecasts are mainly used for daily scheduling, grid operation planning, electricity market pricing, and energy optimization management, among other critical tasks. This type of forecasting is highly practical and has widespread engineering demand.

On the other hand, ultra-short-term load forecasting generally refers to high-frequency predictions of load within a few minutes to 1 hour, commonly used in scenarios such as real-time frequency regulation, demand response control, and safety margin assessments, where there are higher requirements for model timeliness and response speed.

This study focuses on short-term load forecasting, based on two main considerations: first, short-term load forecasting holds higher practical application value in most grid scheduling tasks; second, load data at this time scale typically contains more pronounced trend and periodic structures, making it more suitable for evaluating the effectiveness of the proposed data processing method in modeling complex features. By focusing on the STLF scenario, we are able to more stably validate the advantages of the proposed method in improving the model’s generalization ability and accuracy, while also aligning with the operational scheduling needs of most power systems.

  3. Model Validation: The model is validated only on the same dataset used for training. It is essential to test the model on external datasets to assess generalization, especially for practical applications where load characteristics may vary.

Response:

The feedback you provided is of great significance in improving our manuscript, especially in offering clear guidance on enhancing the model’s generalization ability and practical application value. In response to this comment, we have added a new section on robustness experiments in the revised manuscript, as detailed in Section 3.5, “Robustness Experiment.” This section uses publicly available electricity load data from the ENTSO-E platform to test the model. The platform covers most of Europe and provides hourly load data, with publicly transparent sources and easy access, ensuring good availability and facilitating the assessment of the model’s adaptability in different geographic regions and load characteristics.

For this experiment, we selected the electricity load data for the entire year of 2023 from the Georgia region, which consists of 8,760 data samples. To help readers better understand the robustness experiment results, we have supplemented the revised manuscript with model fitting effect diagrams and comparison charts of the models' performance across evaluation metrics. The corresponding evaluation metrics table is also included for your reference, (data source: https://transparency.entsoe.eu/load-domain/r2/totalLoadR2/show?name=&defaultValue=false&viewType=TABLE&areaType=BZN&atch=false&dateTime.dateTime=22.07.2025+00:00|CET|DAY&biddingZone.values=CTY|10Y1001A1001B012!BZN|10Y1001A1001B012&dateTime.timezone=CET_CEST&dateTime.timezone_input=CET+(UTC+1)+/+CEST+(UTC+2)).

 

Figure 1. Robustness Evaluation of Model Fitting on Georgia Load Data

Table 2. Comparison of Model Prediction Performance on External Dataset

| Model              | RMSE    | R²     | MAE     |
|--------------------|---------|--------|---------|
| BiGRU              | 82.5158 | 0.8954 | 68.2491 |
| TCN                | 76.0564 | 0.9035 | 64.4329 |
| Dilated BiGRU      | 60.8832 | 0.9424 | 50.0802 |
| BWO-Dilated BiGRU  | 46.3615 | 0.9661 | 36.8688 |
| IBWO-Dilated BiGRU | 38.8091 | 0.9783 | 31.0496 |

As shown in Figure 1, all models exhibit a certain degree of fitting ability on the additional dataset and can accurately reflect the overall trend of electricity load changes. Furthermore, Table 2 compares the performance of the models under different evaluation metrics, and the results indicate that they all demonstrate good robustness on the additional dataset. Among these, the IBWO-Dilated BiGRU model achieved the best results across all evaluation metrics, with an RMSE of 38.8091, an MAE of 31.0496, and an R² of 0.9783, significantly outperforming other models.

Once again, we appreciate your suggestion, which has provided valuable guidance for improving our model evaluation system. The robustness experiment not only verified the models' adaptability in additional data environments but also further highlighted the stability and practical application potential of the proposed method in load conditions from other regions, providing a solid basis for its future deployment in broader application scenarios.

  4. Comparative Fairness: Without identifying the type of load or consumer, comparisons with baseline models (e.g., LSTM, CNN-GRU) may be misleading. Forecasting accuracy is highly dependent on load volatility and temporal patterns.

Response:

Thank you for your valuable comments regarding the fairness of comparisons. We fully understand that without clearly defining load types or user composition, the performance comparison between models may have certain limitations. In response to this issue, we have added a detailed description of the dataset in Section 2.1, “Feature Engineering,” of the revised manuscript, clearly stating that the dataset reflects the overall urban load, covering residential, commercial, and industrial users, which enhances its representativeness.

This paper proposes a generalizable feature engineering method, which is not only applicable to the Panama dataset used in this study but also extends to other multivariate time series tasks. By uniformly processing time and environmental variables, the method effectively extracts key factors influencing load variations, thereby enhancing the model’s ability to capture complex temporal structures and improving overall prediction performance.

It is important to emphasize that this feature engineering method does not depend on specific load types or user composition. The uniform processing strategy applied to the input variables enables it to adapt to the data characteristics of different regions and electricity usage structures. This provides effective support for the direct transfer and deployment of the model across various scenarios.

To further validate the model's generalization ability under different geographic and load conditions, we have added a robustness experiment in Section 3.5, “Robustness Experiment,” of the revised manuscript. This experiment uses the hourly electricity load data for the entire year of 2023 from Georgia provided by the ENTSO-E platform, which has a significantly different temporal structure compared to the Panama dataset. The experimental results show that the proposed IBWO-Dilated BiGRU model still maintains excellent fitting performance on this data, demonstrating its strong generalization ability.

Overall, despite differences in load types and temporal structures, the combined approach of feature engineering and model architecture design consistently demonstrates superior predictive performance across multiple data scenarios. This not only enhances the effectiveness of model comparisons but also provides solid support for its cross-regional promotion and application in real-world electricity load forecasting.

The generalizability of the feature engineering method has been further elaborated in Section 4, “Conclusions.”

  5. Summary Table Recommendation: Consider including a summary table showing the performance of the proposed model versus others, categorized by accuracy, computation cost, and suitability across different demand profiles.

Response:

Thank you for your suggestion to include a summary table. This table helps visually present the comprehensive performance of each model in terms of prediction accuracy and computational efficiency, thus improving the completeness of the result presentation. In response to your suggestion, we have added Section 3.6, “Comparative Analysis of Prediction Accuracy and Resource Usage,” in the revised manuscript. This section systematically compares the performance of the proposed model and various baseline models based on key performance indicators, summarizing the results in a performance overview table that includes dimensions such as prediction accuracy, training time, and memory usage.

Table 3. Comparison of Model Performance in Prediction Accuracy, Training Time, and Memory Usage

| Model              | RMSE    | R²     | MAE     | Time (s) | Memory Usage (MB) |
|--------------------|---------|--------|---------|----------|-------------------|
| ARIMA              | 77.1089 | 0.8272 | 59.6386 | 314.2    | 207.1             |
| RF                 | 74.0867 | 0.8453 | 57.1238 | 377.1    | 214.5             |
| SVM                | 70.5128 | 0.8555 | 53.6242 | 408.5    | 228.9             |
| RNN                | 69.7217 | 0.8613 | 54.5778 | 450.6    | 230.4             |
| XGBoost            | 69.3699 | 0.8636 | 52.9690 | 540.3    | 291.7             |
| LSTM               | 63.2030 | 0.8748 | 46.4893 | 513.6    | 240.1             |
| GRU                | 61.3893 | 0.8888 | 46.3095 | 498.7    | 233.8             |
| BiLSTM             | 60.0619 | 0.8912 | 45.0527 | 554.9    | 251.1             |
| BiGRU              | 52.3161 | 0.9132 | 39.8785 | 521.7    | 247.6             |
| TCN                | 50.5941 | 0.9222 | 39.0946 | 566.8    | 307.4             |
| Dilated BiGRU      | 44.0456 | 0.9464 | 31.9781 | 589.1    | 266.3             |
| BWO-Dilated BiGRU  | 27.2682 | 0.9784 | 19.5211 | 852.7    | 279.4             |
| IBWO-Dilated BiGRU | 26.1706 | 0.9812 | 18.5462 | 873.5    | 286.2             |

As shown in Table 3, the IBWO-Dilated BiGRU model performs the best in terms of prediction accuracy, with the lowest RMSE of 26.1706, the highest R² of 0.9812, and the MAE of only 18.5462, significantly outperforming other baseline models. In terms of resource consumption, the memory usage of all models is below 310MB, and the training time is generally controllable, demonstrating good deployment feasibility. Although the training time for IBWO-Dilated BiGRU is slightly longer (873.5 seconds), the improvement in accuracy is evident, offering both accuracy and efficiency advantages.

This comparative result further validates that the proposed model not only ensures high prediction performance but also maintains a high level of computational efficiency, providing a solid foundation for practical engineering deployment. We sincerely appreciate your suggestion, which has allowed us to make clearer and more comprehensive improvements in result presentation and model evaluation.

Yours Sincerely,

Ziang Peng

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript has studied a short-term electric load forecasting model. Feature engineering, model architecture design, and multi-strategy parameter optimization are focused on. A method combined IBWO with BiGRU is proposed. Major comments are as follows:

  1. In the context of global efforts toward energy conservation and emission reduction, distributed renewable energy sources and integrated energy systems are being integrated on the load side, which includes distributed energy generation, energy storage, and multiple types of loads such as electricity, heat, and cooling. System operation needs to comprehensively consider the coupled multi-energy supply and demand (Ref.: 'Source-load-storage distributional robust low-carbon economic scheduling considering carbon capture units and hydrogen energy storage system, International Journal of Hydrogen Energy, 2025'). Traditional single electric load analysis or forecasting can no longer adapt to current developments. The trend is gradually shifting toward comprehensive load demand forecasting. However, this manuscript mainly focuses on electric load forecasting. It is suggested that the manuscript could discuss why single electric load forecasting is focused on under these development conditions.
  2. The manuscript needs to conduct an in-depth discussion on the impact of weakly correlated variables on load forecasting. It should also explain why, in this study, only the low-frequency components of features with weak correlations are retained while the high-frequency components are removed.
  3. The literature review appears to be somewhat limited in scope and accuracy in the Introduction. The manuscript points out that with the ongoing expansion of power systems and the large-scale integration of renewable energy into the grid, the complexity of power data has grown explosively. However, integration of renewable energy into the grid will not primarily increase the complexity of power data on the load side; it mainly increases complexity on the source side and the grid side. Only when renewable energy sources are integrated on the load side will the complexity of the load data be affected (Ref.: More flexibility and waste heat recovery of a combined heat and power system for renewable consumption and higher efficiency, Energy, 2025). Similar issues need to be checked and revised.
  4. The correlation coefficient between the load and the hour of a day is 0.3 in this manuscript. According to the definition in formula (1), please check the calculation result of this coefficient. If possible, provide the data values for a typical day and calculate the correlation coefficient in the response.
  5. Under the same initial solution condition, the proposed IBWO algorithm exhibits perfect convergence performance (Figures 11 and 12). The authors need to discuss or check why the algorithm demonstrates such perfect convergence characteristics. Although the dynamic Lévy flight mechanism and improved whale fall step strategy are designed to adapt to different stages of the optimization process, the perfect convergence curves in Figures 11 and 12 fail to reflect the role of these two improved methods.
  6. The text requires thorough proofreading for errors. For instance, in the text and figures, 'GRU' is mistakenly written as 'GCU'.

Author Response

Response to comments by Reviewer #2:

We sincerely appreciate your in-depth and forward-looking review comments. Your feedback not only reflects a profound understanding of the research topic but also provides important guidance in areas such as methodological clarity, accuracy of referenced literature, and research positioning. In particular, your suggestions regarding the clarification of complexity sources on the load side, the refinement of the logic for handling weakly correlated features, and the inquiry into the convergence behavior of the optimization algorithm have prompted us to conduct a more rigorous review and revision of the manuscript, significantly enhancing the overall quality of the paper.

Below, we provide detailed responses to each of your comments.

Comments:

  1. In the context of global efforts toward energy conservation and emission reduction, distributed renewable energy sources and integrated energy systems are being integrated on the load side, which includes distributed energy generation, energy storage, and multiple types of loads such as electricity, heat, and cooling. System operation needs to comprehensively consider the coupled multi-energy supply and demand (Ref.: 'Source-load-storage distributional robust low-carbon economic scheduling considering carbon capture units and hydrogen energy storage system, International Journal of Hydrogen Energy, 2025'). Traditional single electric load analysis or forecasting can no longer adapt to current developments. The trend is gradually shifting toward comprehensive load demand forecasting. However, this manuscript mainly focuses on electric load forecasting. It is suggested that the manuscript could discuss why single electric load forecasting is focused on under these development conditions.

Response:

Thank you for your forward-thinking and professionally insightful comments. We fully agree that, in the context of global energy conservation, emission reduction, and the development of multi-energy complementarity, with the widespread integration of distributed renewable energy and integrated energy systems, the focus of load forecasting research is indeed shifting from single electricity load to multi-energy coupling and multi-type load collaborative analysis. Your suggestion not only reflects a profound understanding of the development trends in this field but also provides valuable inspiration for our future research direction, for which we sincerely thank you.

This study focuses on electricity load forecasting for the following reasons:

1. The Dominant Role of Electricity Load in Current Practice:

Although multi-energy systems have garnered widespread attention, electricity remains the most dominant energy carrier in practical scheduling and operations. Especially in developing countries and industrial sectors, short-term electricity load forecasting remains a core element in scheduling optimization, energy consumption management, and supply-demand balance.

Additionally, there is still a technical gap and knowledge barrier between current engineering practice and academic research. On the one hand, the engineering field tends to use mature and deployable electricity forecasting models in real-world applications; on the other hand, some cutting-edge research is difficult to directly apply to existing scheduling systems. Therefore, focusing on electricity load, a critical and realistic aspect, helps bridge the gap between scientific research and practical applications, improving the efficiency of transforming research into practice.

2. Data Availability and Model Verification Generalizability:

Compared to multi-energy load data, electricity load data is more abundant and has a higher degree of structural standardization, making it easier for model training and evaluation. Multi-energy load data often faces issues such as inconsistent data dimensions and limited sample sizes, which are not conducive to serving as a widely applicable base for model validation.

3. Research Focus and Future Expansion Potential:

This study aims to construct an accurate, clear, and scalable electricity load forecasting framework as a foundational module for future expansion into integrated load forecasting. The feature engineering and optimization strategies designed in this paper are highly generalizable and can be adapted to more energy carriers and multi-energy data scenarios in the future.

We have included a brief explanation in the revised manuscript, Section 4, “Conclusions,” outlining why we are currently focusing on electricity load forecasting and highlighting the potential directions for future work.

Once again, we thank you for your valuable comments, which have greatly advanced our thinking and refinement of the research positioning and application scenarios.

  2. The manuscript needs to conduct an in-depth discussion on the impact of weakly correlated variables on load forecasting. It should also explain why, in this study, only the low-frequency components of features with weak correlations are retained while the high-frequency components are removed.

Response:

Thank you for your valuable feedback. Due to space limitations, we were unable to discuss in detail the impact of weakly correlated variables on load forecasting in the manuscript. In fact, we have conducted related research on this topic, and in response to your suggestion, we have summarized these conclusions and added them to the revised manuscript. Your input has played a crucial role in enhancing the depth and completeness of our study.

During the model training process, we found that the high-frequency components of weakly correlated variables generally lack stable trends or periodic structures, often appearing as noise disturbances. Such information not only fails to support the prediction but also interferes with the model's ability to learn the main features, especially in complex scenarios with multiple environmental factors, where the impact is more pronounced. In contrast, the low-frequency components contain the long-term trends of load variations and can reveal the underlying patterns of the data. By effectively utilizing this information, the model is better able to capture long-term changes, thereby improving the accuracy of load forecasting.

Many existing studies simply discard weakly correlated variables to simplify the model, on the assumption that they contribute little to prediction accuracy. In contrast, to exploit the potential information and long-term trends contained in these variables, we propose a low-frequency extraction strategy that retains their trend components and removes the noise-dominated high-frequency parts, thereby making more effective use of the original data.
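To make the idea concrete, the sketch below illustrates the retain-the-trend, discard-the-noise step on a synthetic weakly correlated feature. It deliberately substitutes a simple zero-phase Butterworth low-pass filter for the modal decomposition actually used in the manuscript, and the cutoff value is an arbitrary assumption, so it is an illustration of the principle rather than our implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def keep_low_frequency(feature, cutoff=0.05, order=4):
    """Keep the slow trend of a weakly correlated feature, drop the noisy part.

    Illustrative stand-in for the modal-decomposition step in the manuscript;
    the cutoff (fraction of the Nyquist frequency) is an assumption.
    """
    b, a = butter(order, cutoff)      # low-pass design
    return filtfilt(b, a, feature)    # zero-phase filtering, so no time lag

# Synthetic example: a slow monthly-scale oscillation plus high-frequency noise.
rng = np.random.default_rng(42)
t = np.arange(24 * 60)                                   # hourly samples, 60 days
feature = np.sin(2 * np.pi * t / (24 * 30)) + rng.normal(scale=0.4, size=t.size)
trend = keep_low_frequency(feature)                      # low-frequency component only
```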

In response to your suggestion, we have designed and added comparative experiments in the revised manuscript (as shown in Figure 1). We compared the performance of "all feature signals decomposed into full-frequency (high-frequency and low-frequency) data" as input, direct input of the original dataset, and the data processing method proposed in this paper. In the experiment, we evaluated the prediction accuracy under the three data processing strategies using GRU, BiLSTM, and BiGRU deep learning models.

To further quantify the performance of each model under different evaluation metrics, we have listed the corresponding evaluation indicators, including RMSE, R², and MAE, in Table 1. The analysis combined with the charts provides a more comprehensive view of the effectiveness and stability of the proposed data processing method in improving load forecasting accuracy.

 

Figure 1. Comparison of Forecasting Performance Before and After Feature Processing

Table 1. Performance Comparison Before and After Feature Engineering Across Different Models

 

| Model  | All features kept after decomposition (RMSE / R² / MAE) | Raw data input directly (RMSE / R² / MAE) | Data processed as proposed in this paper (RMSE / R² / MAE) |
|--------|----------------------------------------------------------|--------------------------------------------|-------------------------------------------------------------|
| GRU    | 68.5934 / 0.8597 / 51.5732                               | 65.8601 / 0.8705 / 49.5442                 | 61.3893 / 0.8888 / 46.3095                                  |
| BiLSTM | 65.7128 / 0.8738 / 48.6737                               | 65.6279 / 0.8786 / 48.6312                 | 60.0619 / 0.8912 / 45.0527                                  |
| BiGRU  | 64.2005 / 0.8817 / 48.7456                               | 62.5517 / 0.8904 / 48.3497                 | 52.3161 / 0.9132 / 39.8785                                  |

The experimental results show that after adopting the proposed strategy, the prediction accuracy of various deep models such as GRU, BiLSTM, and BiGRU significantly improved. For instance, compared to directly inputting the original data or retaining all components, the models under our method showed a notable decrease in RMSE and MAE, with R² also improving. This validates the significant role of the feature processing method proposed in this paper in enhancing model effectiveness and suppressing redundant information, demonstrating its broad adaptability across different model structures. Relevant content has been added to Section 2.1, "Feature Engineering," as detailed in the manuscript.

It is worth mentioning that your attention to the role mechanism of high-frequency components is one of the key directions we plan to explore in the next phase of our research. We have included this in our future work plan and will conduct more systematic research on the representational power and modeling value of high-frequency features in subsequent studies.

  3. The literature review appears to be somewhat limited in scope and accuracy in the Introduction. The manuscript points out that with the ongoing expansion of power systems and the large-scale integration of renewable energy into the grid, the complexity of power data has grown explosively. However, integration of renewable energy into the grid will not primarily increase the complexity of power data on the load side; it mainly increases complexity on the source side and the grid side. Only when renewable energy sources are integrated on the load side will the complexity of the load data be affected (Ref.: More flexibility and waste heat recovery of a combined heat and power system for renewable consumption and higher efficiency, Energy, 2025). Similar issues need to be checked and revised.

Response:

Thank you for your professional suggestion regarding the description of data complexity in the review comments. We realize that the original wording may have caused ambiguity, potentially leading readers to believe that the integration of renewable energy directly contributes to the increased complexity of load-side data. In response, we have revised the relevant statements in the introduction to clarify that the primary impact is on the generation side and the grid side. Additionally, we have added an explanation to indicate that if the integration of renewable energy occurs on the user side, its output instability could influence the load data’s variation trend, thereby increasing the complexity of load-side data to some extent. This revision helps to improve the accuracy and rigor of the background description.

We sincerely appreciate your professional correction on this matter. Your feedback has prompted us to be more meticulous in our use of terminology and logical expression.

  4. The correlation coefficient between the load and the hour of a day is 0.3 in this manuscript. According to the definition in formula (1), please check the calculation result of this coefficient. If possible, provide the data values for a typical day and calculate the correlation coefficient in the response.

Response:

Thank you for your attention to the calculation of the correlation coefficient in your review comments. We understand your concern and would like to clarify: the reported correlation coefficient between load and hourOfDay (0.3) in the manuscript was indeed calculated using the Spearman correlation. Specifically, the calculation was performed using the DataFrame.corr(method='spearman') method from the Pandas library in Python, which was applied to the hourly load values and the corresponding hourly sequence (0–23) for all samples. This method is capable of identifying monotonic relationships between variables and does not rely on the linear distribution of the data, making it suitable for the statistical characteristics of load time series.

To enhance the transparency of our explanation, we have added a typical workday's load data and the corresponding hourly values in Table 2, allowing you to verify the data features and the applicability and validity of the Spearman calculation:

Table 2. Hourly Load Demand and Corresponding Hour of Day for a Typical Day

| datetime            | nat_demand | hourOfDay | datetime            | nat_demand | hourOfDay |
|---------------------|------------|-----------|---------------------|------------|-----------|
| 2015-01-31 01:00:00 | 954.2018   | 1         | 2015-01-31 13:00:00 | 1160.2838  | 13        |
| 2015-01-31 02:00:00 | 913.866    | 2         | 2015-01-31 14:00:00 | 1124.8878  | 14        |
| 2015-01-31 03:00:00 | 903.3637   | 3         | 2015-01-31 15:00:00 | 1112.4189  | 15        |
| 2015-01-31 04:00:00 | 889.0806   | 4         | 2015-01-31 16:00:00 | 1081.7406  | 16        |
| 2015-01-31 05:00:00 | 910.1472   | 5         | 2015-01-31 17:00:00 | 1064.8583  | 17        |
| 2015-01-31 06:00:00 | 922.1737   | 6         | 2015-01-31 18:00:00 | 1095.5704  | 18        |
| 2015-01-31 07:00:00 | 939.9442   | 7         | 2015-01-31 19:00:00 | 1116.6654  | 19        |
| 2015-01-31 08:00:00 | 1077.8575  | 8         | 2015-01-31 20:00:00 | 1094.677   | 20        |
| 2015-01-31 09:00:00 | 1179.6601  | 9         | 2015-01-31 21:00:00 | 1075.2083  | 21        |
| 2015-01-31 10:00:00 | 1255.1569  | 10        | 2015-01-31 22:00:00 | 1041.0831  | 22        |
| 2015-01-31 11:00:00 | 1253.4414  | 11        | 2015-01-31 23:00:00 | 988.5723   | 23        |
| 2015-01-31 12:00:00 | 1223.6116  | 12        | 2015-02-01 00:00:00 | 939.5286   | 0         |

In this set of typical workday hourly load data, the Spearman correlation coefficient between hourOfDay and nat_demand is 0.453, indicating a moderate positive correlation with statistical significance.
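The value of 0.453 can be reproduced directly from Table 2 with the same Pandas call mentioned above; a short sketch follows (the load values are copied from the table, with hour 0 taken from the 2015-02-01 00:00 entry).

```python
import pandas as pd

# Hourly load values from Table 2, indexed by hour of day (0-23).
nat_demand = [939.5286, 954.2018, 913.866, 903.3637, 889.0806, 910.1472,
              922.1737, 939.9442, 1077.8575, 1179.6601, 1255.1569, 1253.4414,
              1223.6116, 1160.2838, 1124.8878, 1112.4189, 1081.7406, 1064.8583,
              1095.5704, 1116.6654, 1094.677, 1075.2083, 1041.0831, 988.5723]
df = pd.DataFrame({"hourOfDay": range(24), "nat_demand": nat_demand})

rho = df.corr(method="spearman").loc["hourOfDay", "nat_demand"]
print(round(rho, 3))  # 0.453, matching the value reported above
```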

Therefore, the coefficient of 0.3 reported in the manuscript is based on a larger dataset, and due to the influence of seasonal fluctuations in the long-term data, the overall correlation is diluted, resulting in a lower correlation coefficient. This also confirms that the Spearman method is suitable for capturing long-term nonlinear relationships, but its calculation is affected by data seasonality and sample coverage.

  5. Under the same initial solution condition, the proposed IBWO algorithm exhibits perfect convergence performance (Figures 11 and 12). The authors need to discuss or check why the algorithm demonstrates such perfect convergence characteristics. Although the dynamic Lévy flight mechanism and improved whale fall step strategy are designed to adapt to different stages of the optimization process, the perfect convergence curves in Figures 11 and 12 fail to reflect the role of these two improved methods.

Response:

Thank you for your attention to the convergence characteristics of the optimization algorithm. The issue you raised has played a significant role in prompting us to clarify the specific functions of the various modules in the optimization strategy, and has also helped us systematically examine the sources and components of the algorithm's performance.

In response to your comments, we have added Section 2.3.5, "Ablation Study of IBWO Algorithm," in the revised manuscript. This section quantitatively evaluates the actual contribution of each submodule to the convergence performance by gradually stripping the Tanh-Sobol Population Initialization, Dynamic Lévy Flight Mechanism, and Improved Whale Fall Step Strategy from the IBWO algorithm. This experiment helps to more clearly reveal the role of each improvement mechanism at different stages and further supports the justification for the design rationality of the optimization strategy.
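As a rough illustration of the initialization component only, the sketch below draws a low-discrepancy initial population with SciPy's Sobol engine and scales it into the search bounds; the tanh-based warping used in our Tanh-Sobol strategy is intentionally not reproduced, so this is a generic Sobol initialization rather than the exact method in the manuscript.

```python
from scipy.stats import qmc

def sobol_population(pop_size, lower, upper, seed=0):
    """Low-discrepancy initial population inside the search bounds.

    Only the Sobol part of the Tanh-Sobol initialization is sketched here; the
    tanh-based warping described in the manuscript is not reproduced.
    """
    sampler = qmc.Sobol(d=len(lower), scramble=True, seed=seed)
    unit_samples = sampler.random(pop_size)        # points in [0, 1)^d
    return qmc.scale(unit_samples, lower, upper)   # map into [lower, upper]

# Example: 32 candidate solutions (a power of two suits Sobol) in 5 dimensions.
population = sobol_population(32, lower=[-10.0] * 5, upper=[10.0] * 5)
print(population.shape)  # (32, 5)
```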

 

Figure 2. Performance Comparison of IBWO Variants on the F1(x) Function

Table 3. Statistical Performance Metrics of IBWO Ablation Combinations on the F1(x) Function

 

| Metric             | IBWO   | Original BWO | BWO + Tanh-Sobol | BWO + Improved Fall | BWO + Dynamic Lévy | BWO + Tanh-Sobol + Improved Fall | BWO + Tanh-Sobol + Dynamic Lévy | BWO + Dynamic Lévy + Improved Fall |
|--------------------|--------|--------------|------------------|---------------------|--------------------|----------------------------------|---------------------------------|------------------------------------|
| Optimum value      | 0.0000 | 0.0000       | 0.0000           | 0.0000              | 0.0000             | 0.0000                           | 0.0000                          | 0.0000                             |
| Average value      | 0.2582 | 0.8342       | 0.5758           | 0.6298              | 0.5550             | 0.5982                           | 0.4264                          | 0.3301                             |
| Standard deviation | 4.4499 | 5.8163       | 5.0560           | 5.8947              | 6.0157             | 6.2886                           | 5.1161                          | 4.5394                             |

As shown in the iteration curves in Figure 2 and the statistical results in Table 3, the overall performance of the IBWO algorithm exhibits a continuous improvement trend with the gradual introduction of each improvement strategy. Notably, the addition of each module effectively enhances the algorithm’s optimization capability, particularly in the two key metrics of mean value and standard deviation.

Among all the individual strategies, the Dynamic Lévy flight mechanism brought the most significant improvement, reducing the mean value to 0.5550, which is better than the 0.6298 achieved by the Improved Fall strategy. This suggests that its enhancement of global search capability is more prominent. In contrast, while the Tanh-Sobol method brought some improvement in initialization, its contribution to the final accuracy was more limited.

Moreover, the combined multi-strategy approach demonstrates a clear synergistic effect. For example, the combination of Tanh-Sobol and Dynamic Lévy further reduced the mean value to 0.4264, while the final IBWO algorithm, integrating all modules, achieved the best result (0.2582) with the lowest standard deviation (4.4499). This indicates that the various improvement strategies complement each other in terms of optimization path, search depth, and convergence speed, collectively driving the algorithm to its optimal performance.

Once again, we appreciate your valuable suggestion regarding the convergence curve analysis. This suggestion prompted us to conduct a more systematic verification of the IBWO algorithm from the perspective of module functionality, further enhancing the rigor of the method design and the persuasiveness of the experimental section. The relevant ablation experiments and analysis results have been added to Section 2.3.5, "Ablation Study of IBWO Algorithm," in the manuscript.

  6. The text requires thorough proofreading for errors. For instance, in the text and figures, 'GRU' is mistakenly written as 'GCU'.

Response:

Thank you for pointing out the error in the manuscript where "GRU" was mistakenly written as "GCU." This mistake indeed resulted from our oversight, as we failed to thoroughly proofread the initial draft. Following your suggestion, we have conducted a comprehensive review of the entire manuscript and all related figures, and have corrected all instances of "BiGCU" to "BiGRU." We appreciate your meticulous attention to this detail, and your suggestion has played a significant role in improving the quality of the paper.

Yours Sincerely,

Ziang Peng

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

The authors propose a hybrid deep learning model for short-term electricity load forecasting. The model integrates feature selection techniques (using Spearman correlation and modal decomposition), a Dilated BiGRU architecture, and an enhanced optimization algorithm (Improved Beluga Whale Optimization, or IBWO).

The article is relevant and contributes to the body of knowledge on load forecasting, particularly by improving the traditional BiGRU model through the addition of a dilation mechanism and by enhancing the original BWO optimization algorithm.

  1. Some sections would benefit from stronger support through appropriate references (e.g., lines 50–52). The authors could also elaborate on some of the limitations mentioned in the paper, ideally with concrete examples and supporting references (lines 88–89).
  2. Please verify the validity of the claim regarding the 48,049 hourly samples between January 30, 2015, and June 27, 2020 — this may require a quick consistency check.
  3. Additionally, review the use of the Spearman formula, which differs from Pearson correlation, and ensure that its application is correct.
  4. Also, please provide units for evaluation metrics such as RMSE and MAE (e.g., in Table 11 and preceding sections).
  5. Finally, check for consistency in the notation used in formulas, for example, in Equation 19 (line 459), clarify whether the symbol used is ‘alpha’ or the variable ‘a’. If it is ‘a’, ensure that it is defined somewhere in the text.

Author Response

Response to comments by Reviewer #3:

We appreciate you taking the time out of your busy schedule to thoroughly review our manuscript and provide valuable comments and suggestions. Your feedback has not only helped improve the quality of our paper but has also prompted us to reflect more deeply on the details within the manuscript.

Your suggestion to enhance the literature citations in certain chapters and to provide a more detailed discussion on the limitations encouraged us to expand and refine these sections further. Additionally, your recommendations regarding data consistency checks, verification of the Spearman formula, clarification of evaluation metric units, and consistency of formula symbols have led us to rigorously cross-check and correct the technical details in the paper, significantly improving its accuracy and adherence to academic standards.

Below, we provide detailed responses to each of your comments.

Comments:

  1. Some sections would benefit from stronger support through appropriate references (e.g., lines 50–52).

Response:

We sincerely thank the reviewer for the valuable comments. In the revised manuscript, we have added relevant references in the section on the development of power load forecasting technologies. These references provide strong support for the background of the transition from traditional statistical methods to intelligent methods. Additionally, we have further analyzed the limitations of traditional statistical methods in power load forecasting. With the widespread integration of renewable energy sources and changes in user behavior patterns, traditional methods face increasing challenges in meeting the growing demand for accuracy.

The new references include:

  1. Rahmani-Sane, G.; Azad, S.; Ameli, M.T.; Haghani, S. The Applications of Artificial Intelligence and Digital Twin in Power Systems: An In-Depth Review. IEEE Access 2025. [https://doi.org/10.1109/ACCESS.2025.3580340]
  2. Akhtar, S.; Shahzad, S.; Zaheer, A.; Ullah, H.S.; Kilic, H.; Gono, R.; Jasiński, M.; Leonowicz, Z. Short-term load forecasting models: A review of challenges, progress, and the road ahead. Energies 2023, 16, 4060. [https://doi.org/10.3390/en16104060]
  3. Andriopoulos, N.; Magklaras, A.; Birbas, A.; Papalexopoulos, A.; Valouxis, C.; Daskalaki, S.; Birbas, M.; Housos, E.; Papaioannou, G.P. Short term electric load forecasting based on data transformation and statistical machine learning. Applied Sciences 2020, 11, 158. [https://doi.org/10.3390/app11010158]
  4. Singh, A.K.; Khatoon, S.; Muazzam, M.; Chaturvedi, D.K. Load forecasting techniques and methodologies: A review. In Proceedings of the 2012 2nd international conference on power, control and embedded systems, 2012; pp. 1-10. [https://doi.org/10.1109/ICPCES.2012.6508132]

These references not only review the research progress and challenges of load forecasting methods but also explore the gap between real-world demands and existing traditional methods. They further complement the analysis in our paper and strongly support the rationale for the choice of methods.

Comments:

The authors could also elaborate on some of the limitations mentioned in the paper, ideally with concrete examples and supporting references (lines 88–89).

Response:

We appreciate the reviewer’s detailed suggestions regarding the limitations section of the paper. Based on your comments, we have added relevant references to support the discussion on limitations, which highlight the challenges faced by machine learning methods in modeling high complexity and high-dimensional data, and have provided strong support for our analysis.

The new references include:

  1. Azeem, A.; Ismail, I.; Jameel, S.M.; Harindran, V.R. Electrical load forecasting models for different generation modalities: a review. IEEE Access 2021, 9, 142239-142263. [https://doi.org/10.1109/ACCESS.2021.3120731]
  2. Kuster, C.; Rezgui, Y.; Mourshed, M. Electrical load forecasting models: A critical systematic review. Sustainable cities and society 2017, 35, 257-270. [https://doi.org/10.1016/j.scs.2017.08.009]

Additionally, we have provided specific examples regarding the limitations of machine learning methods. For example, traditional machine learning models perform poorly when handling the non-stationarity of load data caused by renewable energy integration and changes in user behavior. This analysis offers useful insights for future improvements to forecasting methods, further enhancing the comprehensiveness and academic value of the paper.

  2. Please verify the validity of the claim regarding the 48,049 hourly samples between January 30, 2015, and June 27, 2020 — this may require a quick consistency check.

Response:

We sincerely thank the reviewer for the detailed and thoughtful comment. Upon verification, we acknowledge an error in the manuscript regarding the dataset description. The actual dataset consists of 48,048 hourly samples, spanning from January 3, 2015, 01:00 to June 27, 2020, 00:00. However, in the original manuscript, we mistakenly wrote the start date as January 30, 2015. We truly appreciate your careful reading and correction.

We have revised the relevant description in the manuscript accordingly and re-verified the time span of the dataset to ensure consistency and accuracy. This correction not only improves the precision of our data reporting but also strengthens the overall transparency and reproducibility of the experimental setup.

  3. Additionally, review the use of the Spearman formula, which differs from Pearson correlation, and ensure that its application is correct.

Response:

Thank you very much for your careful review and insightful comment — your observation is absolutely correct. The original description of the Spearman correlation coefficient in the manuscript contained an oversight on our part. As is well known, the Spearman coefficient is essentially the Pearson correlation applied to the ranked values R(x) and R(y), but our formula failed to clearly indicate the use of ranks, which may cause confusion with the standard Pearson correlation.

We have revised both the formula and its accompanying explanation in the revised manuscript to explicitly clarify that Spearman correlation is computed based on rank values. We appreciate your attention to this detail, as it greatly contributes to improving the clarity and precision of the paper.

  4. Also, please provide units for evaluation metrics such as RMSE and MAE (e.g., in Table 11 and preceding sections).

Response:

Thank you very much for your valuable comment. In response, we have explicitly clarified the units for RMSE and MAE (unit: MW) in Section 3.1, Performance Evaluation Metrics of the revised manuscript. Furthermore, based on your suggestion, we conducted a thorough review and identified a mislabeling of units in one of the fitting plots, which has now been corrected accordingly.

Your comment precisely identified a key detail we had overlooked, demonstrating your meticulous attention to scientific accuracy and rigor. This feedback significantly contributes to improving the clarity and completeness of our work.

  5. Finally, check for consistency in the notation used in formulas, for example, in Equation 19 (line 459), clarify whether the symbol used is ‘alpha’ or the variable ‘a’. If it is ‘a’, ensure that it is defined somewhere in the text.

Response:

Thank you for pointing this out. We confirm that the symbols α and a in Equations (17) and (19), respectively, refer to different parameters. Specifically, α is the adaptive stability index introduced in our improved Lévy flight mechanism (Eq. 17), while a in Eq. (19) is a standard parameter used in the computation of the Lévy scale factor σ, as commonly defined in the literature. To avoid confusion, we have added clarifying statements in the manuscript to distinguish these two symbols.
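For readers unfamiliar with this convention, the sketch below shows the standard Mantegna form of a Lévy-distributed step, in which a scale factor σ of the kind discussed above appears; the notation and the default stability index here are generic textbook choices, not a reproduction of our Eqs. (17) and (19).

```python
import numpy as np
from math import gamma, pi, sin

def levy_step(alpha=1.5, size=1, seed=0):
    """Lévy-distributed step via Mantegna's algorithm (generic textbook form).

    sigma is the Lévy scale factor; alpha is the stability index. This is an
    illustration only, not the exact formulation used in the manuscript.
    """
    rng = np.random.default_rng(seed)
    sigma = (gamma(1 + alpha) * sin(pi * alpha / 2)
             / (gamma((1 + alpha) / 2) * alpha * 2 ** ((alpha - 1) / 2))) ** (1 / alpha)
    u = rng.normal(0.0, sigma, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / alpha)

print(levy_step(alpha=1.5, size=5))  # occasional large jumps aid global exploration
```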

In addition, we have conducted a thorough review of all the formulas in the manuscript to ensure that similar symbol inconsistencies do not occur.

Once again, we would like to express our gratitude to the reviewer for the detailed review and valuable suggestions. Your feedback has played a significant role in improving the structure of the paper and enhancing the accuracy of the content, and we truly appreciate your assistance.

Yours Sincerely,

Ziang Peng

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Dear Authors,

The manuscript presents a relevant and technically solid approach for hybrid decomposition and forecasting of energy consumption using CEEMDAN, VMD and IBWO, with LSTM and BiLSTM integration. The revised version shows significant improvements in structure, figures, clarity of the equations, and the English language.

However, to further enhance the article and strengthen its scholarly merit, I suggest: (1) Include a short comparative discussion  on how the proposed model performs against classical forecasting methods (e.g., LSTM, XGBoost, SVR) using the same evaluation metrics, and (2) Add a brief critical note on the trade-offs of using IBWO, such as computational cost and generalization potential, to improve transparency and balance in the analysis.

Best regards,

Author Response

Response to comments by Reviewer #1:

We sincerely thank you for taking the time out of your busy schedule to review our manuscript once again and for providing us with insightful and constructive feedback. We deeply appreciate your continued attention and recognition of our work, as well as the valuable suggestions offered during this round of review. Your rigorous and professional comments have significantly contributed to improving the quality of our paper by prompting deeper reflection and refinement of our model performance analysis and presentation. We are truly grateful for your support.

Below are our detailed responses to each of your suggestions.

Comments:

The manuscript presents a relevant and technically solid approach for hybrid decomposition and forecasting of energy consumption using CEEMDAN, VMD and IBWO, with LSTM and BiLSTM integration. The revised version shows significant improvements in structure, figures, clarity of the equations, and the English language.

However, to further enhance the article and strengthen its scholarly merit, I suggest: (1) Include a short comparative discussion on how the proposed model performs against classical forecasting methods (e.g., LSTM, XGBoost, SVR) using the same evaluation metrics, and (2) Add a brief critical note on the trade-offs of using IBWO, such as computational cost and generalization potential, to improve transparency and balance in the analysis.

Response:

We sincerely thank the reviewer for the valuable comments. In the revised version, Section 3.6 "Comparative Analysis of Prediction Accuracy and Resource Usage" has been updated and expanded to include a comparative analysis of the proposed model’s performance against LSTM, XGBoost, and SVR under the same evaluation metrics.

Specifically, the proposed IBWO-Dilated BiGRU model demonstrates superior prediction accuracy, achieving RMSE = 26.1706, MAE = 18.5462, and R² = 0.9812. Compared to the average performance of three classical models (SVR, XGBoost, and LSTM) on the same dataset, the RMSE of our model is reduced by approximately 61.34%, MAE by 63.64%, and R² is improved by about 13.47%.

These results indicate that, under identical evaluation criteria, the proposed hybrid optimization model exhibits a significant advantage in prediction accuracy.

Building on this, we further analyzed the computational resource consumption of the proposed model and conducted a comparative study using three representative models: LSTM, XGBoost, and TCN, which respectively represent classical neural networks, traditional machine learning methods, and high-performance deep learning models.

Our IBWO-Dilated BiGRU model again achieved the best prediction performance, with RMSE = 26.1706, MAE = 18.5462, and R² = 0.9812. Compared with the average performance of the three baseline models (RMSE = 61.06, MAE = 46.18, R² = 0.8869), the proposed model reduces RMSE by approximately 57.14%, MAE by 59.83%, and improves R² by 10.62%, demonstrating a clear performance advantage.

In terms of computational resources, the IBWO-Dilated BiGRU model requires 873.5 seconds of runtime, which is 333.27 seconds longer than the average runtime (540.23 seconds) of LSTM, XGBoost, and TCN. The memory usage is 286.2 MB, slightly higher than their average (279.73 MB), with an increase of 6.47 MB. Additionally, compared to the same type of model without optimization algorithms, the runtime increases by approximately 300 seconds. Although resource consumption increases, the overall growth is moderate and remains within a controllable range. Given the substantial improvement in prediction performance, this level of resource investment is acceptable. For practical deployment scenarios with strict efficiency requirements, we recommend a flexible trade-off between model applicability, available resources, and accuracy demands.
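The relative figures quoted above can be checked with a few lines of arithmetic on the values from Table 3 of our first-round response (model values copied from that table).

```python
# Quick check of the comparison figures quoted above, using the Table 3 values
# for the proposed model and the LSTM, XGBoost and TCN baselines.
proposed = {"RMSE": 26.1706, "MAE": 18.5462, "R2": 0.9812,
            "Time (s)": 873.5, "Memory (MB)": 286.2}

baselines = {  # LSTM, XGBoost, TCN rows of Table 3
    "RMSE": [63.2030, 69.3699, 50.5941],
    "MAE": [46.4893, 52.9690, 39.0946],
    "R2": [0.8748, 0.8636, 0.9222],
    "Time (s)": [513.6, 540.3, 566.8],
    "Memory (MB)": [240.1, 291.7, 307.4],
}

for metric, values in baselines.items():
    avg = sum(values) / len(values)
    rel = (proposed[metric] / avg - 1) * 100
    print(f"{metric:12s} baseline avg {avg:8.2f}  proposed {proposed[metric]:8.2f}  change {rel:+6.2f}%")
# Prints roughly -57.1% (RMSE), -59.8% (MAE), +10.6% (R2), +61.7% (Time), +2.3% (Memory),
# i.e. the absolute gaps of about 333 s and 6.5 MB discussed above.
```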

To more comprehensively reflect the applicability and limitations of the proposed method, we have supplemented Section 3.6, "Comparative Analysis of Prediction Accuracy and Resource Usage," with relevant analysis to more transparently and objectively present the trade-off between the performance gains brought by IBWO and its associated resource costs.

Finally, in future work, we plan to explore more lightweight metaheuristic optimization methods as a direction to further improve model efficiency. This aims to reduce computational resource overhead while maintaining predictive accuracy. This research prospect has been added to the Conclusions section.

Yours Sincerely,

Ziang Peng

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

More work is needed to improve this research.

Author Response

Response to comments by Reviewer #2:

We thank the reviewer for taking the time to evaluate our manuscript. Although no specific revision comments were provided, we noted that you selected “Must be improved” for all six standard evaluation criteria provided by MDPI:

1. Is the content succinctly described and contextualized with respect to previous and present theoretical background and empirical research (if applicable) on the topic?

2. Are the research design, questions, hypotheses and methods clearly stated?

3. Are the arguments and discussion of findings coherent, balanced and compelling?

4. For empirical research, are the results clearly presented?

5. Is the article adequately referenced?

6. Are the conclusions thoroughly supported by the results presented in the article or referenced in secondary literature?

In response, we carefully reviewed our manuscript in conjunction with the detailed comments from the other reviewers, and made several improvements based on the items you selected. These include the addition of relevant literature references, a more in-depth discussion of methodological limitations, verification of data and equations, and clarification of the units used for evaluation metrics. We believe these revisions have significantly enhanced the academic rigor, completeness, and overall quality of the manuscript.

Comments:

More work is needed to improve this research.

Response:

1.We have added new references in the Introduction section to strengthen the discussion on the development of short-term power load forecasting methods. The revised content systematically outlines the evolution from traditional statistical techniques to machine learning and deep learning approaches. These additions not only provide a more solid theoretical foundation for the methodology adopted in this study but also enhance the logical coherence of the research background. The cited works include several representative studies published in recent years, ensuring that the discussion is aligned with current research trends and reflects a strong connection to ongoing technological developments.

The newly cited references include:

Rahmani-Sane, G.; Azad, S.; Ameli, M.T.; Haghani, S. The Applications of Artificial Intelligence and Digital Twin in Power Systems: An In-Depth Review. IEEE Access 2025. [https://doi.org/10.1109/ACCESS.2025.3580340]

Akhtar, S.; Shahzad, S.; Zaheer, A.; Ullah, H.S.; Kilic, H.; Gono, R.; Jasiński, M.; Leonowicz, Z. Short-term load forecasting models: A review of challenges, progress, and the road ahead. Energies 2023, 16, 4060. [https://doi.org/10.3390/en16104060]

Andriopoulos, N.; Magklaras, A.; Birbas, A.; Papalexopoulos, A.; Valouxis, C.; Daskalaki, S.; Birbas, M.; Housos, E.; Papaioannou, G.P. Short term electric load forecasting based on data transformation and statistical machine learning. Applied Sciences 2020, 11, 158. [https://doi.org/10.3390/app11010158]

Singh, A.K.; Khatoon, S.; Muazzam, M.; Chaturvedi, D.K. Load forecasting techniques and methodologies: A review. In Proceedings of the 2012 2nd International Conference on Power, Control and Embedded Systems, 2012; pp. 1-10. [https://doi.org/10.1109/ICPCES.2012.6508132]

2.We have supplemented the Introduction section with a discussion of the limitations and challenges identified in previous studies. A brief analysis of these limitations has been provided to highlight the shortcomings of traditional methods and existing machine learning approaches in dealing with complex load data, nonlinear characteristics, and non-stationary patterns. This addition further emphasizes the necessity and novelty of the research presented in this paper.

The specific references include:

Azeem, A.; Ismail, I.; Jameel, S.M.; Harindran, V.R. Electrical load forecasting models for different generation modalities: a review. IEEE Access 2021, 9, 142239-142263. [https://doi.org/10.1109/ACCESS.2021.3120731]

Kuster, C.; Rezgui, Y.; Mourshed, M. Electrical load forecasting models: A critical systematic review. Sustainable Cities and Society 2017, 35, 257-270. [https://doi.org/10.1016/j.scs.2017.08.009]

3.We have added a discussion on the robustness performance of the model in Section 3.5: Robustness Experiment, further emphasizing the generalization ability and stability of the IBWO-Dilated BiGRU model when applied to data from different regions.

4.We have revised the Introduction section to better connect our findings with practical applications in power grid management. This enhancement aims to highlight the real-world value of our proposed model and strengthen the overall contribution of the manuscript.

5.We conducted a comprehensive review of all figures and tables in the manuscript to ensure the accuracy of the data and information presented. In particular, we corrected unit-related errors that were identified in some original figures, thereby improving the standardization and rigor of visual representations. These adjustments help ensure that readers can interpret the results more clearly and accurately.

6.We thoroughly reviewed all equations in the manuscript, with a focus on ensuring the consistent use of parameters and eliminating any potential confusion or ambiguity. In addition, we supplemented and refined the definitions of certain variables to enhance the clarity and standardization of the formulae, ensuring that readers can accurately understand the meaning of each parameter and the logic of the equations.

Specifically, we corrected Formula 1 in the data visualization section, addressing an error caused by imprecise expression in the original version. This correction ensures the mathematical accuracy of the formula and improves the overall precision and rigor of the manuscript.

7.We have explicitly clarified and annotated the units for RMSE and MAE to prevent potential misunderstandings when interpreting the evaluation metrics. Furthermore, we have added an explanation in Section 3.1: Performance Evaluation Metrics regarding the rationale for selecting RMSE, MAE, and R² as performance indicators. This addition enhances the validity and persuasiveness of the evaluation methodology, helping readers better understand the role of these metrics in assessing model prediction accuracy and fitting performance.

8.We have added an explanation regarding the rationale for selecting the Panama dataset. Additionally, we corrected a date-related error in the original text to ensure that the data description is accurate, precise, and consistent with academic standards.

9.We have expanded Section 3.6: Comparative Analysis of Prediction Accuracy and Resource Usage by providing a more detailed comparison among three representative models—LSTM (traditional neural network), XGBoost (ensemble learning method), and TCN (advanced deep learning model). This comparison further clarifies the rationale and advantages behind selecting the IBWO-Dilated BiGRU model as our final approach. We emphasized that this model outperforms the others across multiple key performance indicators, particularly in RMSE, MAE, and R², which reflect predictive accuracy and fitting capability, thereby demonstrating strong modeling power and generalization ability.

At the same time, we conducted a systematic analysis of the model’s resource usage. Although the training time of the IBWO-Dilated BiGRU increased by approximately 333 seconds compared to other models, its memory consumption remained relatively low at 286.2 MB. These results indicate that the model maintains an acceptable level of resource consumption and exhibits good flexibility and feasibility for real-world deployment.

We further explained that the introduction of the IBWO optimization algorithm was motivated by the need to enhance both prediction accuracy and global search capability. Its goal is to improve the model’s adaptability to complex load variations and enhance robustness. While this optimization strategy increases computational overhead to some extent, we consider the trade-off justified and necessary given the significant gains in prediction accuracy and overall performance—particularly in power system scenarios where high forecasting precision is required.

Through the additions and comparative analysis in this section, we have clearly demonstrated the balance achieved by the IBWO-Dilated BiGRU model between performance and resource efficiency, and we have provided stronger theoretical support for the model selection, enhancing both the persuasiveness and academic rigor of the manuscript.

10.In the Conclusions section, we have added a discussion of future research directions. Specifically, we plan to explore more lightweight metaheuristic optimization algorithms with the aim of further reducing computational resource consumption while maintaining prediction accuracy, thereby improving the model’s operational efficiency and deployment flexibility in real-world applications. Members of our research group have already expressed strong interest in this direction, and further in-depth investigations along this line are planned.

We would like to thank you once again for taking the time to review our manuscript. We have carefully revised and improved the paper based on your comments and the suggestions from other reviewers. We believe that these modifications have significantly enhanced the quality of the manuscript, and we hope that the revised version now meets the publication requirements.

Yours Sincerely,

Ziang Peng

Author Response File: Author Response.pdf

Round 3

Reviewer 2 Report

Comments and Suggestions for Authors

The reviewer highly appreciates the substantial efforts made by the authors in this research work. However, the authors have not fully addressed the questions raised by the reviewer in the previous manuscript revision. The reviewer will only elaborate on one of the previously raised questions as a reference for the authors. The reviewer expects to see the authors achieve substantial research progress.

This study aims to improve the accuracy of electricity load forecasting by incorporating weakly correlated features. However, regarding this approach, the following issues merit further discussion:

  1. Weakly correlated features are generally considered to have minimal impact on prediction results. However, among these features, weakly correlated features or strongly correlated features, why are only the low-frequency components of weakly correlated features considered, while their high-frequency components are ignored?
  2. High-frequency components are usually regarded as noise. Among all high-frequency components, why are only the high-frequency components of strongly correlated features retained, while those of weakly correlated features are discarded?
  3. In the revised manuscript, the authors state: "Unlike some studies that directly exclude weakly correlated features, many researchers tend to completely discard these weakly correlated variables to simplify the model, believing they contribute little to prediction accuracy. However, in order to explore the potential information and long-term trends of weakly correlated variables, ……". Yet, how should this "potential information" be evaluated? After all, potential information is inherently embedded in the original dataset. Therefore, if the forecasting model is ‘big’ enough, the whole information will be captured, in theory.
  4. The manuscript compares the prediction results under three data processing strategies, but this is only based on a single case. The forecasting results based on the original dataset are worse than those based on decomposition. This is because the adopted method fails to model the characteristics of the original dataset. Scientific research typically aims to derive universal methods or theories, so the generalizability of the proposed approach still needs further verification.

Author Response

Response to comments by Reviewer #2:

We sincerely thank you for reviewing our manuscript once again with such care and rigor. In the second round, although you did not provide detailed revision requests and indicated only that the manuscript must be improved, your comment pointed toward deeper structural and logical issues. The thorough and insightful clarification you provided in this third round has allowed us to understand those concerns far more clearly.

In particular, your analysis and suggestions regarding the handling of weakly correlated features have significantly broadened our perspective. We acknowledge that our second-round revisions fell short of your expectations, and we are sincerely grateful for your rigorous and constructive guidance throughout the review process.

Below are our detailed responses to each of your suggestions.

Comments:

1.Weakly correlated features are generally considered to have minimal impact on prediction results. However, among these features, weakly correlated features or strongly correlated features, why are only the low-frequency components of weakly correlated features considered, while their high-frequency components are ignored?

Response:

We sincerely thank the reviewer for the valuable comments. The reasons why only the low-frequency components of weakly correlated features are retained, while the high-frequency components are discarded, are as follows:

Preservation of long-term trend information: Low-frequency components typically reflect long-term trends and seasonal patterns. Even in weakly correlated features, such components may still contain structural signals that are potentially related to the target load. In contrast, high-frequency components often represent abrupt fluctuations and random noise. For features that already exhibit weak correlation with the target variable, retaining such high-frequency information is more likely to introduce errors than to improve predictive performance.

Noise control in weak features: Since weakly correlated features have limited direct impact on load forecasting, retaining their high-frequency components may introduce redundant fluctuations, thereby increasing the complexity of model training. Accordingly, we adopted a conservative strategy that retains only the relatively stable low-frequency components to reduce potential interference from weak features.

Support from empirical observations: In preliminary experiments (not included in the manuscript for brevity), we compared two strategies—retaining versus removing the high-frequency components of weakly correlated features. The results showed that including high-frequency components did not lead to improved forecasting accuracy; in some cases, performance even deteriorated. These findings further support the validity of our adopted strategy.

As presented in Section 2.1, Feature Engineering, we also conducted experiments where all features were decomposed into high- and low-frequency components and fed entirely into the model. As shown in Figure 5, blindly including all decomposed components did not improve prediction accuracy, and in some cases introduced redundant information that negatively affected performance. Based on this observation, we further developed a more selective and targeted feature processing approach to enhance the model's ability to extract meaningful information.

In addition, as illustrated in Figure 4, after applying our proposed data preprocessing pipeline, the final input features exhibited significantly stronger correlation with the load. Compared with the original raw feature set, the overall correlation level was markedly improved. This further confirms the effectiveness of our feature decomposition and selection strategy in preserving useful information, suppressing noise, and enhancing the relevance of features to the load.
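To illustrate the idea, the sketch below decomposes a synthetic feature into low- and high-frequency components with a simple centred moving average and compares each component's correlation with the load. The data are randomly generated placeholders, and the decomposition method, window length, and correlation measure actually used in the manuscript may differ from this generic illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=24 * 60, freq="H")  # hypothetical hourly index

# Synthetic stand-ins for the load and for one weakly correlated feature.
load = pd.Series(1000 + 50 * np.sin(np.arange(len(idx)) * 2 * np.pi / 24)
                 + rng.normal(0, 10, len(idx)), index=idx)
feature = pd.Series(20 + 5 * np.sin(np.arange(len(idx)) * 2 * np.pi / (24 * 7))
                    + rng.normal(0, 3, len(idx)), index=idx)

# A centred moving average acts as a crude low-pass filter; the residual is
# treated as the high-frequency component.
low_freq = feature.rolling(window=24, center=True, min_periods=1).mean()
high_freq = feature - low_freq

print("corr(feature, load):  ", round(feature.corr(load), 3))
print("corr(low_freq, load): ", round(low_freq.corr(load), 3))
print("corr(high_freq, load):", round(high_freq.corr(load), 3))
```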

We have added these explanations to Section 2.1 (Feature Engineering) of the revised manuscript to clarify the rationale behind the design. Once again, we sincerely thank the reviewer for the thoughtful and insightful feedback on this issue.

Comments:

2.High-frequency components are usually regarded as noise. Among all high-frequency components, why are only the high-frequency components of strongly correlated features retained, while those of weakly correlated features are discarded?

Response:

We sincerely thank the reviewer for the insightful comments regarding the handling of high-frequency components. In our feature processing strategy, we did not simply treat all high-frequency components as noise and remove them indiscriminately. Instead, we adopted a differentiated approach, taking into account the degree of correlation between each feature and the target load variable.

Specifically:

High-frequency components are not inherently noise: While high-frequency signals often contain random fluctuations, in strongly correlated features, these fluctuations may reflect meaningful short-term behaviors in the load data—such as holiday effects, weekend patterns, extreme weather responses, or reactions to sudden events. Therefore, we retained the high-frequency components of strongly correlated features, as they may carry dynamic information closely related to load variations.

High-frequency components of weakly correlated features are more likely to be pure noise: For features with low correlation to the target load, the high-frequency components typically lack interpretability and contribute little meaningful information. Retaining them would unnecessarily increase the data dimensionality and model complexity, while introducing noise that could degrade generalization performance and prediction stability.

Balancing information contribution and noise risk: Based on these considerations, we adopted a feature processing strategy guided by the principle of “maximizing useful information retention while minimizing noise introduction.” This allows us to suppress ineffective high-frequency fluctuations and preserve informative signals that are truly beneficial to the forecasting model.
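Expressed as code, this retention rule could look like the hedged sketch below. The decompose() helper and the correlation threshold of 0.3 are hypothetical placeholders, since the manuscript's actual decomposition method and threshold are not restated here.

```python
import pandas as pd

def select_components(features: pd.DataFrame, load: pd.Series,
                      decompose, threshold: float = 0.3) -> pd.DataFrame:
    """Keep the low-frequency part of every feature; keep the high-frequency
    part only for features whose correlation with the load exceeds the
    threshold. Both decompose() and the threshold are illustrative."""
    selected = {}
    for name, series in features.items():
        low, high = decompose(series)       # user-supplied low/high split
        corr = abs(series.corr(load))       # e.g., Pearson correlation
        selected[f"{name}_low"] = low
        if corr >= threshold:               # strongly correlated: keep both parts
            selected[f"{name}_high"] = high
    return pd.DataFrame(selected)
```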

We have further elaborated on this strategy and its theoretical justification in Section 2.1 (Feature Engineering) of the revised manuscript to enhance clarity and transparency. Once again, we thank the reviewer for the thoughtful and detailed feedback regarding our feature handling methodology.

Comments:

3.In the revised manuscript, the authors state: "Unlike some studies that directly exclude weakly correlated features, many researchers tend to completely discard these weakly correlated variables to simplify the model, believing they contribute little to prediction accuracy. However, in order to explore the potential information and long-term trends of weakly correlated variables, ……". Yet, how should this "potential information" be evaluated? After all, potential information is inherently embedded in the original dataset. Therefore, if the forecasting model is ‘big’ enough, the whole information will be captured, in theory.

Response:

We sincerely thank the reviewer for the insightful critique regarding the concept of “potential information.” We fully acknowledge the theoretical perspective raised: from an information-theoretic standpoint, if all relevant information is inherently embedded within the original feature set and the forecasting model has sufficient capacity and representational power, then in theory, it should be able to capture all necessary information without requiring additional data processing.

However, in real-world applications, several key challenges prevent this theoretical assumption from being fully realized:

Trade-off between model capacity and generalization: In practical modeling scenarios, to avoid overfitting and ensure efficient training and stable performance, model architectures cannot be arbitrarily large. Moreover, large-scale models typically require significantly more data, computational resources, and hardware support, which makes them difficult to deploy in real engineering environments. Therefore, relying solely on the model to extract all latent patterns from high-dimensional raw data poses certain limitations in practice.

Feature preprocessing improves signal-to-noise ratio: Feature engineering plays a crucial role in the machine learning pipeline. Its primary objective is to extract the most informative patterns before they are potentially overwhelmed by irrelevant noise within the model. Through frequency decomposition and the selection of low-frequency components from weakly correlated variables, we aim to expose latent long-term structures in the data and ease the learning burden of the model.

Empirical validation of effectiveness: In our experiments, we compared three different feature processing strategies. The method proposed in this study consistently achieved superior performance across multiple evaluation metrics. This empirical evidence further supports the practical value of our approach under realistic model constraints.

In summary, we do not reject the theoretical assumption that all relevant information resides in the original dataset. Rather, given the limited capacity of models and the constraints of real-world tasks, we attempt to enhance the model's ability to perceive useful information through appropriate data preprocessing. We have further clarified the definition and rationale of “potential information” in Section 2.1 (Feature Engineering) of the revised manuscript and expanded the related discussion accordingly. Once again, we sincerely thank the reviewer for this precise and thought-provoking feedback.

Comments:

4.The manuscript compares the prediction results under three data processing strategies, but this is only based on a single case. The forecasting results based on the original dataset are worse than those based on decomposition. This is because the adopted method fails to model the characteristics of the original dataset. Scientific research typically aims to derive universal methods or theories, so the generalizability of the proposed approach still needs further verification.

Response:

We thank the reviewer for raising the important issue of methodological generalizability—an essential consideration in load forecasting research. In practical engineering applications, the transferability and adaptability of models are constant concerns for both researchers and system operators.

The experiments in this study were conducted based on a representative dataset, with the primary objective of maintaining controlled conditions that allow for a fair comparison of different data processing strategies within a unified experimental framework. As the reviewer rightly pointed out, conclusions drawn from a single case are not sufficient to fully demonstrate the generalizability of the proposed method. However, in the actual deployment of power systems, several real-world constraints—such as limited data dimensions, significant environmental heterogeneity, and challenges in synchronizing multi-source features—pose substantial obstacles to developing a truly “universal” model applicable across all scenarios. Therefore, rather than pursuing a theoretically universal approach, we focus on designing methods with high adaptability and practical operability under specific constraints.

Specifically:

Multi-feature data are not always available or reliable in engineering settings: In real-world deployments, multi-source feature data—such as meteorological information, holiday schedules, or industrial activity indices—are often costly to collect, updated at low frequencies, or lack quality assurance. This is especially true in regions with underdeveloped data infrastructure, where missing data, time lags, and scale mismatches are common. Even when such data exist, they may not be directly suitable for high-precision modeling. Therefore, our method is designed to remain robust even when faced with limited or noisy feature sets, thereby improving its practical applicability and adaptability.

Significant regional differences in load characteristics: Load behavior varies substantially across regions due to factors such as climate conditions, geographic location, economic structure, and policy regimes. As a result, a single model architecture is unlikely to be universally applicable. Instead, we emphasize building adjustable and extensible frameworks for data processing and model development that can accommodate diverse scenario-specific requirements, rather than aiming to fit all regions with one fixed structure.

Validation and adaptability of the method: Our proposed feature processing strategy does not rely on fixed patterns from a particular scenario. Rather, it is grounded in the general principle of extracting stable structural information and suppressing localized noise. Although the current study is conducted on a representative case, the method itself shows strong potential for adaptation. In future work, we plan to extend the dataset to include multiple regions, seasons, user types, and usage behaviors, in order to thoroughly assess the generalizability and cross-scenario stability of the proposed approach.


The manuscript has received approval from both reviewers. We sincerely hope that the current round of revisions aligns with your expectations and brings our rigorous and constructive exchange to a satisfying conclusion. Once again, we deeply appreciate your professional guidance and thoughtful feedback throughout the review process.