Article

Short-Term Load Forecasting in Power Systems Based on the Prophet–BO–XGBoost Model

State Grid Beijing Electric Power Company, Beijing 100071, China
* Author to whom correspondence should be addressed.
Energies 2025, 18(2), 227; https://doi.org/10.3390/en18020227
Submission received: 21 November 2024 / Revised: 13 December 2024 / Accepted: 2 January 2025 / Published: 7 January 2025
(This article belongs to the Special Issue New Progress in Electricity Demand Forecasting)

Abstract
To tackle the challenges of limited accuracy and poor generalization in short-term load forecasting under complex nonlinear conditions, this study introduces a Prophet–BO–XGBoost-based forecasting framework. This approach employs the XGBoost model to interpret the nonlinear relationships between features and loads and integrates the Prophet model for label prediction from a time-series viewpoint. Given that hyperparameters substantially impact XGBoost’s performance, this study leverages Bayesian optimization (BO) to refine these parameters. Using a Gaussian process-based surrogate model and an acquisition function aimed at expected improvement, this framework optimizes hyperparameter settings to enhance model adaptability and precision. Through a regional case study, this method demonstrated improved predictive accuracy and operational efficiency, highlighting its advantages in both runtime and performance.

1. Introduction

With advancements in information technology and increasing reliance on renewable energy, short-term load forecasting has become essential for ensuring the reliable operation of the power grid [1]. In power systems, this process involves predicting electricity demand for upcoming days based on historical data and external influences [2]. Inputs typically include past load patterns, meteorological data, and calendar information, as well as real-time data such as weather forecasts. By analyzing the relationships between power demand and external factors, accurate predictions can be achieved for future intervals [3]. These forecasts play a key role in optimizing grid operations, reducing energy consumption, and improving economic efficiency [4].
The growing adoption of renewable energy, the advancement of smart grids, and climate change have introduced significant nonlinearity and time-varying characteristics to power loads, highlighting the need for enhanced accuracy and stability in short-term load forecasting [5]. Forecasting approaches are generally classified into two main types: conventional methods and machine learning-based methods. Traditional techniques include the ARMA model [6] and regression analysis [7]. Reference [8] proposed a seasonal autoregressive moving average (ARMA) model to analyze forecast errors in multivariate correlated loads, preserving key statistical properties of the data. Similarly, reference [9] presented a hybrid approach combining support vector regression and local prediction, demonstrating superior accuracy compared to conventional methods such as ARMA and artificial neural networks. While these methods are valued for their simplicity and computational efficiency, they require high data-sequence stationarity and are only effective in scenarios with minimal influencing factors. They struggle with abrupt load changes and have limited capacity to model nonlinear relationships [10]. As a result, their application is constrained when handling real-time system dynamics and fluctuations, and alternative approaches are needed to improve robustness.
With the rapid progress in artificial intelligence, machine learning has become a key tool in power load forecasting, greatly improving prediction accuracy. Algorithms such as decision trees [11], neural networks [12], and deep learning [13] are commonly utilized. The XGBoost algorithm, in particular, introduces a sparse-aware approach for parallel tree learning, enhancing training and prediction efficiency. It excels in handling large-scale, high-dimensional datasets [14]. Reference [15] developed an XGBoost model utilizing complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), which effectively captures error sequence fluctuations. This model integrates well with various load prediction frameworks, delivering robust and precise forecasting. Reference [16] combined clustering techniques with the XGBoost algorithm for short-term load prediction. By employing the K-means algorithm for classification and constructing XGBoost regression models tailored to each category, this method provides accurate load estimates. Reference [17] introduced an approach based on ISFS and XGBoost, which leverages the improved spanning-tree forward selection (ISFS) algorithm for feature selection. The XGBoost model assesses features through cross-validation, enhancing training performance and reducing prediction errors. Furthermore, reference [18] proposed a hybrid model integrating LightGBM and XGBoost that was specifically designed to handle multi-feature data selection and error correction efficiently, meeting the rigorous demands of short-term electricity load forecasting.
In machine learning research, parameter tuning is critical for enhancing model accuracy. Manual tuning, however, often introduces variability, limiting the potential of the XGBoost algorithm [19]. Integrating optimization algorithms with machine learning offers a solution by leveraging optimization techniques for parameter selection, thus enhancing model performance in specific scenarios. Reference [20] applied the sparrow search algorithm, utilizing its multi-objective optimization capability to fine-tune XGBoost parameters and achieving higher prediction accuracy. Similarly, reference [21] employed the fireworks algorithm, which mimics the behavior of explosions to efficiently search a solution space. It demonstrated effectiveness in solving complex optimization challenges. Reference [22] adopted Bayesian optimization (BO) to fine-tune hyperparameters, offering high search efficiency and global optimization capabilities. Lastly, reference [23] compared a grid search with cross-validation and BO for hyperparameter tuning, showing that BO-based XGBoost delivers superior accuracy and efficiency compared to a grid search.
This study integrates traditional forecasting techniques with machine learning to propose a Prophet–BO–XGBoost-based method for short-term load forecasting in complex nonlinear environments. The method combines the strengths of time-series trend modeling with the ability of machine learning algorithms to capture nonlinear features. The main contributions of this study are as follows:
(1) A hybrid forecasting framework combining the Prophet and XGBoost models: The XGBoost model is used to explore the nonlinear relationships between feature values and the load, while the Prophet model provides label prediction from a time-series viewpoint. The framework fully exploits the advantages of the Prophet model in modeling trend and seasonality and the XGBoost model’s capacity to capture complex nonlinear relationships, which both deepens the model’s understanding of global load trends and significantly improves forecasting accuracy under complex conditions.
(2) Bayesian optimization for efficient hyperparameter tuning and performance enhancement: The BO algorithm employs a Gaussian process as the surrogate model and expected improvement as the acquisition function to efficiently search for optimal XGBoost hyperparameter combinations. Compared with traditional methods, this strategy adapts more effectively to diverse load scenarios, further enhancing the model’s predictive performance and generalization capability.
The remainder of this paper is organized as follows: Section 2 introduces the Prophet time-series model. Section 3 examines the principles of the XGBoost machine learning model. Section 4 proposes the Prophet–BO–XGBoost-based short-term load forecasting method. Section 5 validates this method using load data from a specific region. Section 6 presents this study’s conclusions.

2. Prophet Time-Series Forecasting Model

The Prophet model integrates two main modules (modeling and evaluation), as illustrated in Figure 1. The basic process of the Prophet model involves establishing a time-series model based on the forecasting problem, continuously evaluating the model to adjust its parameters, and ultimately providing feedback on all prediction results through visualization.
Prophet is a time-series prediction model that incorporates three core components: trend (g(t)), seasonality (s(t)), and holidays (h(t)). These components are combined mathematically as follows:
$$P(t) = g(t) + s(t) + h(t) + \varepsilon_t$$
where εt is the error component, which is typically assumed to follow a normal distribution with a mean of 0. It is used to reflect unexpected variations that are not accounted for in the model.
The trend component (g(t)) is used to capture the long-term trend in the time series. Its fundamental form is
$$g(t) = \left(k + \boldsymbol{\alpha}(t)^{T}\boldsymbol{\delta}\right)t + \left(b + \boldsymbol{\alpha}(t)^{T}\boldsymbol{\gamma}\right)$$
where $k$ is the growth rate; $b$ is the bias parameter; $\boldsymbol{\delta}$ is the vector of growth-rate changes; $\boldsymbol{\gamma}$ is the corresponding offset adjustment applied at each point where the growth rate ($k$) changes; and $\boldsymbol{\alpha}(t)^{T}$ indicates whether a slope or offset adjustment is active at time $t$ (1 for yes or 0 for no).
The seasonal component (s(t)) is approximated using a Fourier series to represent periodic variations. It is expressed as follows:
$$s(t) = \sum_{n=1}^{N}\left(a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right)\right)$$
where $N$ is the number of Fourier terms in the approximation (so $2N$ coefficients are estimated); $P$ denotes a specific fixed period; and $a_n$ and $b_n$ are the coefficients that need to be estimated.
The holiday component (h(t)) is used to consider abnormal data fluctuations occurring on specific dates that cannot be captured by periodic models. Therefore, independent models are established for different holidays. Its specific form is as follows:
$$h(t) = Z(t)\,\boldsymbol{\kappa}$$
$$Z(t) = \left[\,\mathrm{I}(t \in D_1),\ \mathrm{I}(t \in D_2),\ \ldots,\ \mathrm{I}(t \in D_L)\,\right]$$
$$\boldsymbol{\kappa} \sim \mathrm{Normal}\left(0, v^2\right)$$
where $D_i$ represents the set of dates for the $i$-th holiday; $\mathrm{I}(t \in D_i)$ is the indicator function, which equals 1 during the $i$-th holiday period and 0 otherwise; $\boldsymbol{\kappa}$ is the vector of holiday effect parameters; and $v$ is the impact factor of holidays on the prediction results, where a larger $v$ implies that holidays have a greater influence.
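To make the decomposition concrete, the following is a minimal sketch of fitting this additive model with the open-source prophet Python package. The hourly load series (load_series), the date range, and the holiday dates are illustrative assumptions rather than values from this study; ds and y are the package’s required column names.

```python
# A minimal sketch of fitting P(t) = g(t) + s(t) + h(t) + eps_t with the
# open-source `prophet` package. `load_series` and the holiday dates are
# hypothetical placeholders, not values from this study.
import pandas as pd
from prophet import Prophet

history = pd.DataFrame({
    "ds": pd.date_range("2018-01-01", periods=len(load_series), freq="h"),
    "y": load_series,  # observed hourly load (assumed to exist)
})

# h(t): independent effects for specific dates, with a window around each one
holidays = pd.DataFrame({
    "holiday": "example_holiday",
    "ds": pd.to_datetime(["2018-02-16", "2019-02-05", "2020-01-25"]),
    "lower_window": 0,
    "upper_window": 6,
})

m = Prophet(
    holidays=holidays,          # holiday component h(t)
    weekly_seasonality=True,    # Fourier terms with period P = 7 days
    yearly_seasonality=True,    # Fourier terms with period P = 365.25 days
)
m.fit(history)

future = m.make_future_dataframe(periods=48, freq="h")  # 48 h ahead
forecast = m.predict(future)
# The output exposes the individual components that are later reused as
# features: trend g(t), weekly/yearly s(t), holidays h(t), and overall yhat.
print(forecast[["ds", "trend", "weekly", "yearly", "yhat"]].tail())
```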

3. XGBoost Machine Learning Model

The XGBoost model improves upon the gradient-boosting algorithm by integrating multiple base classifiers (decision trees), significantly enhancing classification and prediction performance. Figure 2 illustrates the overall workflow of the XGBoost model. Initially, the training data are used to construct an initial decision tree, which provides a preliminary prediction of the target variable and generates the first-round results. In each subsequent iteration, the model calculates the residuals between the current predictions and the actual values, and these residuals are fed into the next decision tree. The new tree is guided to learn and correct the deficiencies of the previous predictions. Each new decision tree aims to minimize the current errors, allowing the model to iteratively optimize the predictions. Once all decision trees are trained, their outputs are combined using a weighted aggregation mechanism to produce the final ensemble result. Through this iterative optimization process, XGBoost effectively approximates the true target values, delivering highly efficient and accurate predictions.
The XGBoost algorithm employs decision trees as base learners to construct multiple weak learners. It then continuously trains the model in the direction of gradient descent. Assuming the model consists of k decision trees, the formulation is as follows:
$$\hat{y}_i = \sum_{t=1}^{k} f_t(x_i), \quad f_t \in F$$
$$F = \left\{ f_t(x_i) = \omega_{q(x_i)} \right\}$$
where $k$ is the number of trees; $f_t$ denotes a function in the function space $F$; $\hat{y}_i$ is the predicted value of the model; $x_i$ is the input of the $i$-th data point; $\omega_{q(x_i)}$ denotes the weight of the leaf node to which sample $x_i$ is assigned; and $q(x_i)$ indicates the leaf node corresponding to sample $x_i$.
The XGBoost algorithm adopts an additive, forward stagewise training strategy, in which each iteration leaves the previously fitted trees unchanged; that is,
$$\hat{y}_i^{(0)} = 0,\qquad \hat{y}_i^{(1)} = \hat{y}_i^{(0)} + f_1(x_i),\qquad \ldots,\qquad \hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$$
The XGBoost algorithm’s objective function (L) consists of two components: the loss function, which quantifies the error between the predicted and actual loads in the test dataset, and the regularization term, which mitigates overfitting and manages model complexity. The mathematical representation of the objective function is
$$L = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{j=1}^{k} \Omega(f_j)$$
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert \omega \rVert^2$$
where $l(y_i, \hat{y}_i)$ represents the error between the predicted and actual values; $\Omega(f_j)$ is the regularization term used to reduce overfitting; $T$ denotes the number of leaf nodes; $\gamma$ is the penalty factor for $T$; $\omega$ indicates the weights of the leaf nodes; and $n$ is the sample size. Upon training the $t$-th tree, the objective function is revised as
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{j=1}^{t} \Omega(f_j) = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{j=1}^{t-1} \Omega(f_j) + \Omega(f_t)$$
Since the first $t-1$ regression trees are already known, the sum of their regularization terms $\sum_{j=1}^{t-1}\Omega(f_j)$ is a constant that does not affect the optimization of the $t$-th tree. Therefore, the objective function can be simplified to
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \Omega(f_t) = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \gamma T + \frac{1}{2}\lambda \lVert \omega \rVert^2$$
The objective function can be rewritten in terms of leaf node traversal as
$$L^{(t)} = \gamma T + \sum_{j=1}^{T}\left( \sum_{i \in I_j} l\left(y_i, \hat{y}_i\right) + \frac{1}{2}\lambda\omega_j^2 \right) = \gamma T + \sum_{j=1}^{T}\left( \sum_{i \in I_j} l\left(y_i, \hat{y}_i^{(t-1)} + \omega_j\right) + \frac{1}{2}\lambda\omega_j^2 \right)$$
By expanding the loss with a second-order Taylor series around $\hat{y}_i^{(t-1)}$, $l\left(y_i, \hat{y}_i^{(t-1)} + \omega_j\right)$ can be represented as
$$l\left(y_i, \hat{y}_i^{(t-1)} + \omega_j\right) \approx l\left(y_i, \hat{y}_i^{(t-1)}\right) + l'\left(y_i, \hat{y}_i^{(t-1)}\right)\omega_j + \frac{1}{2}\, l''\left(y_i, \hat{y}_i^{(t-1)}\right)\omega_j^2$$
Since $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ is a constant with respect to the $t$-th tree, removing it from the objective function yields the following equation:
$$\tilde{L}^{(t)} = \gamma T + \sum_{j=1}^{T}\left( \sum_{i\in I_j}\left( h_i \omega_j + \frac{1}{2}\, g_i \omega_j^2 \right) + \frac{1}{2}\lambda\omega_j^2 \right) = \gamma T + \sum_{j=1}^{T}\left( \omega_j \sum_{i\in I_j} h_i + \frac{1}{2}\omega_j^2\left(\lambda + \sum_{i\in I_j} g_i\right) \right)$$
where $h_i = l'\left(y_i, \hat{y}_i^{(t-1)}\right)$ and $g_i = l''\left(y_i, \hat{y}_i^{(t-1)}\right)$.
A lower value for the objective function reflects a more optimal regression tree structure. Minimizing it and setting its derivative to zero yields the weights for each leaf node as follows:
$$\omega_j^{*} = -\frac{\sum_{i \in I_j} h_i}{\sum_{i \in I_j} g_i + \lambda}$$
By substituting these weights into the objective function, the minimum loss $\tilde{L}^{(t)}$ at this point is
$$\tilde{L}^{(t)}(q) = -\frac{1}{2}\sum_{j=1}^{T} \frac{\left(\sum_{i\in I_j} h_i\right)^{2}}{\sum_{i\in I_j} g_i + \lambda} + \gamma T$$
The construction of the XGBoost prediction model involves the following steps: (1) start with an initial iteration, building sub-models sequentially; (2) before each iteration, compute the first-order gi and second-order hi derivatives of the loss function for all training samples; (3) generate a new decision tree and calculate the predicted values for each leaf node using Equation (17); and (4) incrementally add the newly generated model to the existing models after each iteration, forming the final prediction model over multiple iterations.
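To make the derivative bookkeeping above concrete, the following sketch trains XGBoost with a hand-written squared-error objective that returns exactly the per-sample first- and second-order derivatives each new tree is fitted against. Note that this paper denotes the first derivative $h_i$ and the second $g_i$, the reverse of the library’s usual grad/hess naming; the toy data are an assumption for illustration only.

```python
# A minimal sketch of the boosting recursion described above. Derivative
# names follow this paper's notation (h_i = first derivative, g_i = second
# derivative). Toy regression data are used purely for illustration.
import numpy as np
import xgboost as xgb

def squared_error_obj(preds, dtrain):
    """Per-sample derivatives of l = 0.5 * (y_hat - y)^2."""
    y = dtrain.get_label()
    h = preds - y              # h_i: first-order derivative
    g = np.ones_like(preds)    # g_i: second-order derivative
    return h, g                # xgboost expects (grad, hess) in this order

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)
dtrain = xgb.DMatrix(X, label=y)

# `gamma` and `lambda` are the leaf-count and leaf-weight penalties from
# Omega(f) = gamma * T + 0.5 * lambda * ||w||^2 in the objective above.
params = {"max_depth": 3, "eta": 0.1, "lambda": 1.0, "gamma": 0.0}
model = xgb.train(params, dtrain, num_boost_round=100, obj=squared_error_obj)
print(model.predict(dtrain)[:5])
```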

4. Prophet–BO–XGBoost Load Forecasting Model

4.1. Bayesian Optimization of XGBoost Hyperparameter Tuning

The XGBoost model’s hyperparameters include general, booster, and task parameters, with the booster parameters being the most influential. For instance, the learning rate (η) scales the leaf-node weight updates: an overly large η can cause overfitting or unstable training, while an overly small η may lead to underfitting within a fixed number of boosting rounds. Thus, optimizing XGBoost hyperparameters is essential for enhancing model performance.
In general, hyperparameters can be determined using grid search or random search methods. However, these approaches involve significant randomness and chance and may not fully exploit the performance of the XGBoost algorithm. To address this issue, this study adopts the BO algorithm to optimize the XGBoost machine learning model. By leveraging previous evaluation information, BO reduces the number of attempts required to find the optimal hyperparameters.
The BO algorithm involves constructing a surrogate probability model for the loss function and iteratively refining it with new information to approximate the true distribution. The hyperparameter optimization of the XGBoost model using this algorithm is formulated as follows:
$$x^{*} = \underset{x_i \in \mathbb{R}^{d}}{\arg\max}\ f(x_i)$$
where $x_i$ indicates a candidate set of the XGBoost model’s hyperparameters; $f(\cdot)$ represents the objective function assessing model performance; $\mathbb{R}^{d}$ is the hyperparameter space, with $d$ the dimensionality of the hyperparameters to be optimized; and $x^{*}$ is the optimal hyperparameter configuration. Each evaluation of XGBoost is recorded as $y_i = f(x_i)$.
Bayesian optimization relies on two key components: the surrogate model and the acquisition function. The surrogate model estimates the objective function’s value and facilitates optimization of black-box functions. The acquisition function selects the next sampling point, balancing exploration and exploitation to improve the objective function value in subsequent steps. This study utilizes a Gaussian process (g(x)) as the surrogate model and adopts expected improvement (α(x|D)) as the acquisition function.

4.1.1. Gaussian Process (GP)

The Gaussian process approximates an objective function by placing a probability distribution over its value at each point. A stochastic process $\{X_t, t \in T\}$ is Gaussian if, for any finite set of indices $t_1, \ldots, t_k \in T$, the random vector $(X_{t_1}, \ldots, X_{t_k})$ follows a multivariate normal distribution. This implies that any linear combination of $X_{t_1}, \ldots, X_{t_k}$ is normally distributed.
Denoting the Gaussian process as GP(·), which is characterized by a mean function (m) and a covariance function (k), its mathematical representation is
$$f(x) \sim \mathrm{GP}\left(m(x), k(x, x')\right)$$
where k(·) denotes the covariance function, defined as
$$k(x_i, x_j) = \exp\left(-\frac{1}{2}\lVert x_i - x_j \rVert^2\right)$$
Let $D_{1:t} = \{x_{1:t}, f_{1:t}\}$ represent the historical data from exploration. Assuming the next search value is $x_{t+1}$ and $f_{t+1} = f(x_{t+1})$, the covariance matrix is denoted as $K$:
$$K = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_t) \\ \vdots & \ddots & \vdots \\ k(x_t, x_1) & \cdots & k(x_t, x_t) \end{bmatrix}$$
According to the properties of Gaussian processes, $f_{1:t}$ and $f_{t+1}$ constitute a joint Gaussian distribution, assuming a mean of zero. This distribution is expressed as
$$\begin{bmatrix} f_{1:t} \\ f_{t+1} \end{bmatrix} \sim N\left(0,\ \begin{bmatrix} K & k \\ k^{T} & k(x_{t+1}, x_{t+1}) \end{bmatrix}\right)$$
$$k = \left[\, k(x_{t+1}, x_1)\quad k(x_{t+1}, x_2)\quad \cdots\quad k(x_{t+1}, x_t) \,\right]$$
By finding its marginal density function, the result can be obtained as
$$f_{t+1} \mid D_{1:t}, x_{t+1} \sim N\left(\mu_t(x_{t+1}),\ \sigma_t^2(x_{t+1})\right)$$
$$\mu_t(x_{t+1}) = k^{T} K^{-1} f_{1:t}$$
$$\sigma_t^2(x_{t+1}) = k(x_{t+1}, x_{t+1}) - k^{T} K^{-1} k$$
Based on the above analysis, the normal distribution followed by xt+1 at any given value can be estimated, thereby allowing for the setting of a specific objective function to locate the optimal xt+1 for the next step.
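Computationally, this posterior requires only the kernel matrix and two linear solves. The following is a minimal numpy sketch under the squared-exponential kernel defined earlier (for which $k(x, x) = 1$); the observed points in the usage example are hypothetical.

```python
# GP posterior at a candidate point x_new, given observed points X_obs
# (t x d) with objective values f_obs (t,). Implements mu_t = k^T K^{-1} f
# and sigma_t^2 = k(x, x) - k^T K^{-1} k from the equations above.
import numpy as np

def sq_exp_kernel(A, B):
    """k(x_i, x_j) = exp(-0.5 * ||x_i - x_j||^2) for all row pairs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2)

def gp_posterior(X_obs, f_obs, x_new, jitter=1e-8):
    K = sq_exp_kernel(X_obs, X_obs) + jitter * np.eye(len(X_obs))  # K + jitter
    k_vec = sq_exp_kernel(X_obs, x_new[None, :]).ravel()           # vector k
    mu = k_vec @ np.linalg.solve(K, f_obs)         # posterior mean
    var = 1.0 - k_vec @ np.linalg.solve(K, k_vec)  # k(x, x) = 1 for this kernel
    return mu, max(var, 0.0)                       # clip tiny negative round-off

# usage: three observed points in a hypothetical 2-D hyperparameter space
X_obs = np.array([[0.1, 0.5], [0.4, 0.2], [0.9, 0.8]])
f_obs = np.array([1.2, 0.7, 1.5])
mu, var = gp_posterior(X_obs, f_obs, np.array([0.3, 0.4]))
```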

4.1.2. Expected Improvement (EI)

The expected improvement criterion seeks the $x_{t+1}$ that maximizes the expected improvement while balancing exploration and exploitation. Once a candidate $x_{t+1}$ is chosen, the improvement function is expressed as
$$I(x) = \max\left\{0,\ f_{t+1}(x) - f(x^{+})\right\}$$
where $x^{+}$ denotes the current best sample. The desired $x_{t+1}$ should satisfy
$$x_{t+1} = \underset{x}{\arg\max}\ \mathbb{E}\left(\max\left\{0,\ f_{t+1}(x) - f(x^{+})\right\} \mid D_t\right)$$
The formula for the expected improvement can be reformulated as follows:
$$\mathbb{E}(I) = \sigma(x)\left[\frac{\mu(x) - f(x^{+})}{\sigma(x)}\,\Phi\left(\frac{\mu(x) - f(x^{+})}{\sigma(x)}\right) + \phi\left(\frac{\mu(x) - f(x^{+})}{\sigma(x)}\right)\right]$$
The final simplified result is
$$\mathrm{EI}(x) = \begin{cases} \left(\mu(x) - f(x^{+})\right)\Phi(Z) + \sigma(x)\,\phi(Z), & \sigma(x) > 0 \\ 0, & \sigma(x) = 0 \end{cases}$$
where $Z = \frac{\mu(x) - f(x^{+})}{\sigma(x)}$, $\phi(\cdot)$ represents the standard normal probability density function, and $\Phi(\cdot)$ denotes its cumulative distribution function.
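The piecewise formula transcribes directly into numpy/scipy; the sketch below assumes the GP posterior mean and standard deviation at the candidate points have already been computed (e.g., via the gp_posterior sketch above).

```python
# Expected improvement under the maximization convention used above:
# EI(x) = (mu(x) - f(x+)) * Phi(Z) + sigma(x) * phi(Z) if sigma(x) > 0, else 0.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    safe_sigma = np.where(sigma > 0, sigma, 1.0)  # avoid division by zero
    z = (mu - f_best) / safe_sigma                # Z in the formula above
    ei = (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)
    return np.where(sigma > 0, ei, 0.0)           # EI = 0 where sigma(x) = 0
```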
The Bayesian optimization algorithm employs Bayes’ theorem to guide the search for the objective function’s maximum value. At each iteration, it uses historical observation data to refine the optimization process, aiming to identify the best hyperparameter combination. The optimization framework is detailed in Table 1, while Figure 3 illustrates the BO–XGBoost model’s process, with the steps described as follows:
(1) Data preprocessing: The input data are first standardized and normalized to eliminate discrepancies in feature scales, ensuring the consistency and quality of the input data. The processed data are then divided into training and testing sets, which are used for model training and performance evaluation, respectively.
(2) Initial sampling and model updating for Bayesian optimization: Bayesian optimization begins by randomly sampling from the hyperparameter space to construct a surrogate model using a Gaussian process. Through iterative updates, the surrogate model gradually fits the objective function distribution, enhancing the efficiency and accuracy of the sampling process.
(3) Hyperparameter optimization and optimal configuration: In each iteration, the expected improvement strategy is used to select new sampling points, calculate objective function values (e.g., prediction errors), and evaluate whether the optimization goal has been achieved. Once the goal has been met, the optimal hyperparameter configuration for the XGBoost model is output.
(4) BO–XGBoost model training: Using the optimal hyperparameters obtained through Bayesian optimization, the XGBoost model is trained. The model iteratively constructs multiple decision trees, correcting prediction errors step by step. The final predictions are generated by aggregating the outputs of all decision trees with weighted combinations, achieving high-precision predictions of the target variable.
(5) Model evaluation: after training, the testing set is fed into the model to assess its performance using evaluation metrics, validating the accuracy of the predictions.
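The paper does not name a specific implementation of this loop, so the following is a minimal sketch using scikit-optimize’s gp_minimize (a Gaussian-process surrogate with an EI acquisition function) as one way to realize steps (2) and (3); the search ranges and the prepared X_train/y_train arrays are assumptions for illustration.

```python
# One possible realization of the BO-XGBoost loop: a Gaussian-process
# surrogate with EI acquisition tunes XGBoost on 10-fold CV RMSE.
# X_train, y_train and the search ranges are assumed, not from the paper.
import xgboost as xgb
from sklearn.model_selection import cross_val_score
from skopt import gp_minimize
from skopt.space import Integer, Real

space = [
    Integer(3, 12, name="max_depth"),
    Real(0.01, 0.3, prior="log-uniform", name="learning_rate"),
    Integer(100, 1000, name="n_estimators"),
    Real(1.0, 30.0, name="min_child_weight"),
    Real(0.4, 1.0, name="subsample"),
]

def objective(params):
    max_depth, lr, n_estimators, min_child_weight, subsample = params
    model = xgb.XGBRegressor(
        max_depth=int(max_depth), learning_rate=lr,
        n_estimators=int(n_estimators), min_child_weight=min_child_weight,
        subsample=subsample, objective="reg:squarederror",
    )
    # 10-fold cross-validated RMSE, the optimization target used in Section 5
    scores = cross_val_score(model, X_train, y_train, cv=10,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()  # gp_minimize minimizes, so return positive RMSE

result = gp_minimize(objective, space, acq_func="EI",
                     n_calls=50, n_initial_points=10, random_state=0)
print("best CV RMSE:", result.fun, "best hyperparameters:", result.x)
```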

4.2. Hybrid Prediction Model

Based on the power load data, we construct both the Prophet model and the BO–XGBoost machine learning model. Assume that at time t the forecast from the Prophet model is denoted as P(t) and the forecast from the BO–XGBoost model is denoted as X(t), where t = 1, 2, …, n. These two individual models are then combined into an integrated Prophet–BO–XGBoost hybrid prediction model, defined as follows:
$$Y_t = \omega_1 P(t) + \omega_2 X(t)$$
$$\omega_1 = \frac{\varepsilon_X}{\varepsilon_P + \varepsilon_X}$$
$$\omega_2 = \frac{\varepsilon_P}{\varepsilon_P + \varepsilon_X}$$
where $\varepsilon_P$ and $\varepsilon_X$ represent the average relative errors of the Prophet and BO–XGBoost models, respectively, and $Y_t$ denotes the hybrid forecast value at time $t$.
The reciprocal error method, as shown in Equations (33) and (34), is applied to determine the weights. This method assigns larger weights to models with smaller average relative errors, aiming to reduce the overall average relative error of the mixed model and obtain more accurate forecast values.
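A minimal sketch of this weighting scheme follows; the two prediction arrays and the average relative errors are assumed to come from the previously fitted Prophet and BO–XGBoost models evaluated on a common validation period.

```python
# Reciprocal-error combination: the model with the smaller average relative
# error receives the larger weight. All inputs are assumed to come from the
# two fitted models evaluated on a common validation set.
import numpy as np

def hybrid_forecast(prophet_pred, xgb_pred, eps_p, eps_x):
    w1 = eps_x / (eps_p + eps_x)  # weight on the Prophet forecast P(t)
    w2 = eps_p / (eps_p + eps_x)  # weight on the BO-XGBoost forecast X(t)
    return w1 * np.asarray(prophet_pred) + w2 * np.asarray(xgb_pred)

# e.g., eps_p = 0.02 and eps_x = 0.01 give w1 = 1/3 and w2 = 2/3, so the
# more accurate BO-XGBoost forecast dominates the hybrid value Y_t.
```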
Figure 4 illustrates the load prediction model based on Prophet–BO–XGBoost. The Prophet model predicts labels from a time-series perspective, represented by the horizontal arrows, which indicate analysis along the time axis. The vertical arrows represent the BO–XGBoost analysis: Bayesian optimization fine-tunes the XGBoost model’s hyperparameters, and the resulting BO–XGBoost model captures the nonlinear relationships between the multiple labels (features) and the power load. The detailed model construction steps are outlined below:
(1) Collect relevant data, including historical load data and corresponding raw feature data, and preprocess the data samples for labeling.
(2) Establish the Prophet model to forecast various labels and obtain components such as trend, weekly, and yearly data.
(3) Utilize the Prophet results as feature variables, keeping the training and testing sample splits consistent with those used for the Prophet model. Establish the Prophet–BO–XGBoost model on the training sample set, optimizing the hyperparameters via Bayesian optimization and analyzing the nonlinear mapping relationships between the labeled feature values and the load.
(4) Output the prediction results and calculate the accuracy.

5. Case Analysis

To assess this model’s accuracy and performance, the mean absolute percentage error (δMAPE) and root-mean-square error (δRMSE) were chosen as evaluation metrics [24]:
$$\delta_{\mathrm{MAPE}} = \frac{1}{n}\left(\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right|\right) \times 100\%$$
$$\delta_{\mathrm{RMSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
where $y_i$ represents the actual observed load value at time point $i$; $\hat{y}_i$ represents the predicted load value at the same time point generated by the forecasting model; and $n$ denotes the total number of data samples, i.e., the number of time points.
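Both metrics are direct to compute; the following is a short transcription of the two definitions above.

```python
# Direct implementations of the two evaluation metrics defined above.
import numpy as np

def mape(y_true, y_pred):
    """delta_MAPE = (1/n) * sum(|(y_hat_i - y_i) / y_i|) * 100%."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0

def rmse(y_true, y_pred):
    """delta_RMSE = sqrt((1/n) * sum((y_hat_i - y_i)^2))."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))
```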
Load data from 2018 to 2020 for a specific region, along with features like temperature and humidity, were used as training and testing datasets. Figure 5 illustrates the load dataset.
The rainflow counting technique was used for statistical analysis of the load in this area, as depicted in Figure 6. The analysis revealed that the load predominantly ranged between 800 kW and 1400 kW, with most fluctuations measuring between 200 kW and 400 kW.
The regional load level was closely related to environmental variables such as temperature, humidity, precipitation, and wind speed. To ensure the scientific validity and rationality of the selected input features, Pearson correlation analysis was conducted to quantify the relationships between the load and key environmental variables, particularly temperature and humidity. The results, as shown in Figure 7, indicate significant correlations between the load and these variables, with temperature and humidity exerting strong influences on load variations. In the figure, the size of the circles represents the strength of the correlations, with larger circles indicating stronger correlations. This analysis not only validates the relevance and rationality of the selected features but also ensures a close association between the input features and the target load, providing a reliable data foundation for subsequent modeling.
In the Anaconda environment, a 10-fold cross-validation method was applied, using the RMSE as the target for Bayesian optimization. This approach identified the best hyperparameters for the XGBoost model, as shown in Table 2.
Figure 8 illustrates 48 h load predictions made using decision trees, random forests, XGBoost, and the proposed Prophet–BO–XGBoost algorithm. The results indicate that the predicted load generated by the proposed method aligns closely with the actual load.
Figure 9 illustrates the predictive errors (measured by the RMSE and MAPE) of four algorithms: decision trees, random forests, XGBoost, and the proposed Prophet–BO–XGBoost. The results clearly demonstrate the superior performance of the Prophet–BO–XGBoost algorithm, which achieved the lowest error values for both metrics (RMSE: 19.63 kW, MAPE: 1.35%). In comparison, the other algorithms showed higher error levels, indicating limitations in error control. These findings highlight the exceptional predictive accuracy of the proposed algorithm and provide a reference for selecting and optimizing forecasting models in future studies.
Table 3 presents the RMSE and MAPE values of 48 h load predictions made using decision trees, random forests, XGBoost, and the Prophet–BO–XGBoost algorithm. The results indicate that the proposed method achieved the lowest RMSE and MAPE, demonstrating superior predictive performance.

6. Conclusions

The Prophet–BO–XGBoost-based method for short-term load forecasting effectively tackles the issues of low accuracy and poor generalization in complex nonlinear scenarios. The case study analysis led to the following conclusions:
(1) The hybrid model presented integrates the Prophet model with the XGBoost machine learning algorithm, leveraging their complementary strengths. The case studies demonstrated that this combined approach enhances the accuracy of short-term load forecasting, offering an effective solution for addressing complex nonlinear scenarios.
(2) The Prophet–XGBoost algorithm comprehensively evaluates the impacts of variables like temperature, humidity, and wind speed, effectively mitigating overfitting. It facilitates feature selection, boosts prediction accuracy, and enhances the interpretability of the regression model.
(3) Compared to conventional hyperparameter tuning techniques, the XGBoost model employing Bayesian optimization demonstrates superior efficiency in exploring its hyperparameters. It dynamically adjusts the balance between exploration and exploitation processes, facilitating the discovery of the global optimal solution. Consequently, this approach significantly improves the model’s performance and its ability to generalize across different datasets.
(4) The BO–XGBoost model proposed in this study demonstrates strong performance in short-term load forecasting. However, its predictive accuracy is highly sensitive to the initial feature selection, as the rationality of feature selection directly impacts the model’s performance. Future research could explore automated feature selection methods, leveraging feature importance analysis techniques such as SHAP values, to further optimize feature engineering and reduce the impact of manual intervention on model performance.

Author Contributions

Conceptualization, H.Z.; Methodology, S.Z., C.L., B.Z. and Y.Z.; Validation, C.L., H.Z. and B.Z.; Investigation, S.Z.; Writing—original draft, H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the State Grid Beijing Electric Power Company (5700-202311602A-3-2-ZN).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

All authors were employed by the State Grid Beijing Electric Power Company. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Chen, K.; Chen, K.; Wang, Q.; He, Z.; Hu, J.; He, J. Short-term load forecasting with deep residual networks. IEEE Trans. Smart Grid 2019, 10, 3943–3952.
  2. Li, J.; Deng, D.; Zhao, J.; Cai, D.; Hu, W.; Zhang, M.; Huang, Q. A novel hybrid short-term load forecasting method of smart grid using MLR and LSTM neural network. IEEE Trans. Ind. Inform. 2021, 17, 2443–2452.
  3. Liu, Y.; Dutta, S.; Kong, A.W.K.; Yeo, C.K. An image inpainting approach to short-term load forecasting. IEEE Trans. Power Syst. 2023, 38, 177–187.
  4. Guo, Y.; Li, Y.; Qiao, X.; Zhang, Z.; Zhou, W.; Mei, Y.; Lin, J.; Zhou, Y.; Nakanishi, Y. BiLSTM multitask learning-based combined load forecasting considering the loads coupling relationship for multienergy system. IEEE Trans. Smart Grid 2022, 13, 3481–3492.
  5. Lin, W.; Wu, D.; Boulet, B. Spatial-temporal residential short-term load forecasting via graph neural networks. IEEE Trans. Smart Grid 2021, 12, 5373–5384.
  6. Alhmoud, L.; Nawafleh, Q. Short-term load forecasting for Jordan power system based on NARX-ELMAN neural network and ARMA model. IEEE Can. J. Electr. Comput. Eng. 2021, 44, 356–363.
  7. Madhukumar, M.; Sebastian, A.; Liang, X.; Jamil, M.; Shabbir, M.N.S.K. Regression model-based short-term load forecasting for university campus load. IEEE Access 2022, 10, 8891–8905.
  8. Hafen, R.P.; Samaan, N.; Makarov, Y.V.; Diao, R.; Lu, N. Joint seasonal ARMA approach for modeling of load forecast errors in planning studies. In Proceedings of the 2014 IEEE PES T&D Conference and Exposition, Chicago, IL, USA, 14–17 April 2014; pp. 1–5.
  9. Li, M.S.; Wu, J.L.; Ji, T.Y.; Wu, Q.H.; Zhu, L. Short-term load forecasting using support vector regression-based local predictor. In Proceedings of the 2015 IEEE Power & Energy Society General Meeting, Denver, CO, USA, 26–30 July 2015; pp. 1–5.
  10. López, J.C.; Rider, M.J.; Wu, Q. Parsimonious short-term load forecasting for optimal operation planning of electrical distribution systems. IEEE Trans. Power Syst. 2019, 34, 1427–1437.
  11. Xie, Z.; Wang, R.; Wu, Z.; Liu, T. Short-term power load forecasting model based on fuzzy neural network using improved decision tree. In Proceedings of the 2019 IEEE Sustainable Power and Energy Conference (iSPEC), Beijing, China, 21–23 November 2019; pp. 482–486.
  12. Velasco, L.C.P.; Arnejo, K.A.S.; Macarat, J.S.S. Performance analysis of artificial neural network models for hour-ahead electric load forecasting. Procedia Comput. Sci. 2022, 197, 16–24.
  13. Song, Z.; Cao, Z.; Wan, C.; Xu, S. An ensemble wavelet deep learning approach for short-term load forecasting. In Proceedings of the 2019 IEEE Innovative Smart Grid Technologies—Asia (ISGT Asia), Chengdu, China, 21–24 May 2019; pp. 1205–1210.
  14. Zhang, T.; Zhang, X.; Rubasinghe, O.; Liu, Y.; Chow, Y.H.; Iu, H.H.C.; Fernando, T. Long-term energy and peak power demand forecasting based on sequential-XGBoost. IEEE Trans. Power Syst. 2024, 39, 3088–3104.
  15. Lei, S.; Liang, X.; Xia, X.; Dai, H.; Zhang, C.; Ge, X.; Wang, F. A two-stage short-term load forecasting method based on comprehensive similarity day selection and CEEMDAN-XGBoost error correction. In Proceedings of the 2023 International Conference on Future Energy Solutions (FES), Beijing, China, 16–18 June 2023; pp. 1–6.
  16. Liu, Y.; Luo, H.; Zhao, B.; Zhao, X.; Han, Z. Short-term power load forecasting based on clustering and XGBoost method. In Proceedings of the 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 23–25 November 2018; pp. 536–539.
  17. Tang, Y.; Li, Z.; Ni, C.; Gong, D.; Chen, W.; Zhang, X. Short-term load forecasting by multi-feature iterative learning based on ISFS and XGBoost. In Proceedings of the 2021 IEEE Sustainable Power and Energy Conference (iSPEC), Nanjing, China, 2–4 December 2021; pp. 3745–3752.
  18. Yao, X.; Fu, X.; Zong, C. Short-term load forecasting method based on feature preference strategy and LightGBM-XGBoost. IEEE Access 2022, 10, 75257–75268.
  19. Prakash, A.; Thangaraj, J.; Roy, S.; Srivastav, S.; Mishra, J.K. Model-aware XGBoost method towards optimum performance of flexible distributed Raman amplifier. IEEE Photonics J. 2023, 15, 8800210.
  20. Song, J.; Jin, L.; Xie, Y.; Wei, C. Optimized XGBoost based sparrow search algorithm for short-term load forecasting. In Proceedings of the 2021 IEEE International Conference on Computer Science, Artificial Intelligence and Electronic Engineering (CSAIEE), Zhuhai, China, 13–15 August 2021; pp. 213–217.
  21. Suo, G.; Song, L.; Dou, Y.; Cui, Z. Multi-dimensional short-term load forecasting based on XGBoost and fireworks algorithm. In Proceedings of the 2019 18th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), Wuhan, China, 15–18 November 2019; pp. 245–248.
  22. Wang, Y.; Sun, S.; Chen, X.; Zeng, X.; Kong, Y.; Chen, J.; Guo, Y.; Wang, T. Short-term load forecasting of industrial customers based on SVMD and XGBoost. Int. J. Electr. Power Energy Syst. 2021, 129, 106830.
  23. Sun, L. Application and improvement of XGBoost algorithm based on multiple parameter optimization strategy. In Proceedings of the 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE), Harbin, China, 27–29 December 2020; pp. 1822–1825.
  24. Singh, U.; Vadhera, S. Random forest and XGBoost technique for short-term load forecasting. In Proceedings of the 2022 1st International Conference on Sustainable Technology for Power and Energy Systems (STPES), Noida, India, 29–30 July 2022; pp. 1–6.
Figure 1. Flow diagram of Prophet.
Figure 2. Structure chart of XGBoost model.
Figure 3. Flow diagram of BO–XGBoost model.
Figure 4. Flow diagram of Prophet–BO–XGBoost model.
Figure 5. The load data within the dataset.
Figure 6. Statistical analysis of regional load information.
Figure 7. Pearson correlation analysis results.
Figure 8. Load predictions generated by different algorithms.
Figure 9. The predictive errors of four algorithms.
Table 1. The workflow of the Bayesian optimization framework.

Bayesian Optimization Framework
Input: Initial point count (n0); maximum iterations (N); surrogate model (g(x)); acquisition function (α(x|D)).
Start
Step 1: Randomly generate n0 initial points, Xinit = {x0, x1, …, xn0−1}.
Step 2: Compute their function values f(Xinit) and form the initial dataset D0 = {Xinit, f(Xinit)}. Set t = n0 and Dt−1 = D0.
  while t < N
Step 3: Build the surrogate model g(x) using the current dataset Dt−1.
Step 4: Optimize the acquisition function α(x|Dt−1) to determine the next evaluation point: xt = argmax α(x|Dt−1).
Step 5: Evaluate f(xt) at xt, update the dataset Dt = Dt−1 ∪ {xt, f(xt)}, and return to Step 3.
  end while
Output: Optimal evaluation point.
End
Table 2. The optimal hyperparameters for the XGBoost model.

XGBoost Hyperparameter | Optimal Value
max_depth | 9
learning_rate | 0.042
n_estimators | 810
min_child_weight | 19.653
subsample | 0.808
colsample | 0.428
reg_alpha | 17.384
gamma | 0.079
Table 3. Prediction errors of various algorithms.

Algorithm | RMSE (kW) | MAPE (%)
decision trees | 66.517 | 3.567
random forests | 47.556 | 3.223
XGBoost | 78.910 | 5.071
Prophet–BO–XGBoost | 19.629 | 1.352