This section discusses the proposed method for long-term load forecasting using DL algorithms on a real dataset, as shown in Figure 11. The procedure begins once feature selection and dataset normalization are complete. The LSTM-RNN, BPNN, KNN, and ANN models are then trained on historical data to make long-term predictions. Next, the models' hyperparameters, such as learning rate, number of epochs, batch size, number of hidden layers, and optimizer, are tuned, and performance metrics are used to evaluate each model's accuracy. Finally, we select the optimal load forecasting method. All models were implemented with Pandas 2.2.3, Seaborn 0.13.2, Matplotlib 3.10.0, TensorFlow 2.15, NumPy 2.0.2, Sklearn 1.6.0, and Python 3.10 on Google Colab.
3.3.1. Data Processing
- 1.
Data normalization
Data normalization is a preprocessing step that is essential to improving model accuracy and performance in electrical load forecasting, since it prevents features with large numeric ranges from overpowering the others. Common normalization techniques include Z-score standardization and min–max scaling. This research uses min–max normalization, which maps input features such as load, temperature, and time onto a common scale between 0 and 1. This helps DL models such as LSTM converge more quickly and prevents bias toward features with larger numeric ranges. The min–max normalization is given below [29]:

$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

where $X$ is the original value, $X'$ is the normalized value, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the feature in the dataset, respectively.
Without normalization, variables with large magnitudes, such as load in kilowatts, could dominate features with smaller ranges, such as hour of day or temperature, resulting in unstable training or poor accuracy. Hence, normalization is a crucial preprocessing step in load forecasting to ensure uniform scaling, faster training, and more accurate predictions.
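As a minimal NumPy sketch of the min–max formula X' = (X − X_min)/(X_max − X_min), the helpers below scale each feature column to [0, 1] and invert the scaling so that forecasts can be reported in kilowatts again. Two assumptions are made here that the text does not state: the minimum and maximum are computed per feature on the training data only (to avoid leaking information from the test period), and the three sample rows are hypothetical. The helper names are illustrative; scikit-learn's `MinMaxScaler` with its default `feature_range=(0, 1)` performs the same per-feature transformation.

```python
import numpy as np

def min_max_fit(train):
    """Return per-feature minima and maxima (assumed: training data only)."""
    train = np.asarray(train, dtype=float)
    return train.min(axis=0), train.max(axis=0)

def min_max_transform(x, x_min, x_max):
    """Scale x to [0, 1] with X' = (X - X_min) / (X_max - X_min)."""
    x = np.asarray(x, dtype=float)
    # Guard against division by zero for a constant feature.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (x - x_min) / span

def min_max_inverse(x_scaled, x_min, x_max):
    """Map scaled values back to the original units (e.g., kW)."""
    return np.asarray(x_scaled, dtype=float) * (x_max - x_min) + x_min

# Hypothetical hourly samples: columns are [load_kW, temperature_C, hour]
data = np.array([[1200.0, 18.0, 0.0],
                 [1500.0, 22.0, 12.0],
                 [1800.0, 26.0, 23.0]])
x_min, x_max = min_max_fit(data)
scaled = min_max_transform(data, x_min, x_max)
restored = min_max_inverse(scaled, x_min, x_max)
print(scaled[:, 0])   # load column: [0.  0.5 1. ]
```

In a real pipeline, the predicted load would be passed through `min_max_inverse` with the load column's statistics before computing error metrics in kilowatts.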
- 2.
Feature selection for load forecasting
Feature selection is one of the most significant steps in load forecasting; it aims to identify the input variables most relevant to power demand. Choosing the right features for forecasting models, particularly ML and DL models, can increase accuracy, reduce overfitting, and speed up training [30]. In electrical load forecasting, typical features include temperature, time of day, day of the week, season, and historical load, and often humidity, holiday indicators, or economic activity levels. These variables are chosen using statistical techniques such as correlation analysis, mutual information, and permutation importance, together with domain expertise [31]. In this work, input sequences were created from lag-based features, so that the model learns patterns from previous time steps of the target variable [32]. The lag order, that is, the number of past observations used as input, was determined from the autocorrelation behavior and domain knowledge of electric load patterns. Because electricity demand shows strong hourly and daily dependencies, a 24 h lag window allows the model to capture short-term temporal variations effectively and contributes to higher forecasting accuracy.
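The lag-based sequence creation can be sketched with NumPy as follows. This is a minimal illustration, assuming a univariate hourly load series; `make_lag_sequences` and `autocorrelation` are hypothetical helper names, and the synthetic series with an exact 24 h cycle stands in for real data. The sample autocorrelation at a candidate lag is one simple way to check the daily dependency that motivates a 24 h window.

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a given lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def make_lag_sequences(series, lag=24):
    """Frame a 1-D series into (samples, lag, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - lag
    if n_samples <= 0:
        raise ValueError("series must be longer than the lag window")
    # Row k holds series[k : k + lag]; its target is the next value, series[k + lag].
    X = np.stack([series[k:k + lag] for k in range(n_samples)])
    y = series[lag:]
    return X[..., np.newaxis], y

# Hypothetical 10 days of hourly load with a clean daily cycle
t = np.arange(240)
load = 1000.0 + 200.0 * np.sin(2 * np.pi * t / 24)

print(round(autocorrelation(load, 24), 2))   # 0.9: strong daily dependency
print(round(autocorrelation(load, 12), 2))   # -0.95: half a day out of phase
X, y = make_lag_sequences(load, lag=24)
print(X.shape, y.shape)                      # (216, 24, 1) (216,)
```

The trailing feature axis gives the (samples, time steps, features) layout that recurrent layers such as LSTM typically expect.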
- 3.
Selecting an LSTM-RNN model for forecasting
The characteristics of the dataset strongly justify an LSTM-RNN model for STLF. The hourly load data from 2003 to 2014 shows several time-dependent patterns, including daily, weekly, and seasonal cycles, as well as long-range correlations that are partly influenced by temperature fluctuations. Because conventional models such as ANN, BPNN, and KNN lack mechanisms for storing long-term sequential information, they struggle to capture such recurrent, nonlinear patterns. LSTM networks, on the other hand, are well suited to learning temporal dependencies and temperature–load interactions, because their gated memory units allow the model to store, update, and forget information effectively. Moreover, LSTMs mitigate the vanishing-gradient problem of traditional RNNs, allowing stable learning over multiyear hourly sequences. Hence, based on both the observed dataset characteristics and the intrinsic capabilities of the architecture, the LSTM-RNN model is the most suitable option for precise short-term load prediction.
3.3.2. Machine Learning Algorithms
This section provides an overview of the four algorithms used in this study, which were selected for their accuracy and performance. Although the algorithms share the same objective, they differ in their mathematical models, advantages, and disadvantages. Each one estimates the dependent variable (electrical load) from a set of independent input variables. In this paper, four models were used to forecast electrical load: three neural networks (LSTM-RNN, BPNN, and ANN) and the instance-based KNN algorithm.
- 1.
Long Short-Term Memory Recurrent Neural Network
Electrical load forecasting is a time-series application in which time-dependent variables can be predicted effectively by a hybrid model that combines a recurrent neural network with LSTM units. LSTMs overcome the limitations of traditional RNNs, particularly the vanishing-gradient problem, through memory cells regulated by input, output, and forget gates [33]. This design can capture both transient variations and long-term seasonal trends in load profiles. This research proposes an LSTM-RNN-based hybrid model trained on historical hourly load data. The model architecture consists of an input layer, one 64-unit LSTM layer, and a dense output layer. The 64-unit size was selected to balance model complexity and computational efficiency, providing enough capacity to capture nonlinear load dynamics without overfitting. The data were normalized and framed into sequences with a 24 h look-back window to forecast the load for the following hour. The model was trained with the Adam optimizer using MSE as the loss function. Evaluation with metrics such as MAE, RMSE, MAPE, and R² shows that the LSTM-RNN technique learns from past trends and generates accurate predictions.
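The text names MAE, RMSE, MAPE, and R² without giving their formulas. A minimal NumPy sketch using the standard definitions is shown below; MAPE is expressed in percent and assumes no actual load value is zero, and the function name `regression_metrics` and the four sample values are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Standard regression metrics for comparing forecasts with actual load."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                      # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))               # root mean squared error
    mape = 100.0 * np.mean(np.abs(err / y_true))    # percent; assumes no zero actuals
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}

# Hypothetical actual vs. predicted loads (kW)
metrics = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
print(metrics)   # MAE 10.0, RMSE 10.0, MAPE ~5.21 %, R2 0.992
```

Because RMSE squares the errors, it penalizes occasional large misses more than MAE does, while MAPE weights errors relative to the load level, so small absolute errors at low-load hours count more heavily.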
$$\left(h_t, c_t\right) = \mathrm{LSTM}\left(x_t, h_{t-1}, c_{t-1}\right) \tag{3}$$

Equation (3) gives the compressed formulation of the LSTM-RNN forecasting model, where the hidden state $h_t$ and cell state $c_t$ at time $t$ are computed by an LSTM from the current input $x_t$ and the previous states $h_{t-1}$ and $c_{t-1}$. The model regulates temporal information through the input gate $i_t$, forget gate $f_t$, output gate $o_t$, and candidate cell $\tilde{c}_t$. Working together, these gates control the memory flow, preserve long-term dependencies, and mitigate the vanishing-gradient problem, allowing the model to represent nonlinear, time-dependent load dynamics accurately [33].
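To make the roles of the gates concrete, the following NumPy sketch implements one step of a standard LSTM cell, (h_t, c_t) = LSTM(x_t, h_{t−1}, c_{t−1}), and runs it over a 24-step window with a 64-unit hidden state, matching the sizes described in the text. The weights are random and untrained, the names `lstm_step` and `lstm_layer` are hypothetical, and the stacked gate order [i, f, o, c̃] is an arbitrary choice that need not match TensorFlow's internal weight layout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step: (h_t, c_t) = LSTM(x_t, h_{t-1}, c_{t-1}).

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in the (arbitrary) gate order [i, f, o, c~].
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i_t = sigmoid(z[:H])                  # input gate: how much new information to write
    f_t = sigmoid(z[H:2 * H])             # forget gate: how much old memory to keep
    o_t = sigmoid(z[2 * H:3 * H])         # output gate: how much memory to expose
    c_tilde = np.tanh(z[3 * H:])          # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde    # additive memory update
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

def lstm_layer(window, W, U, b):
    """Run the cell over a look-back window and return the final hidden state."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in window:
        h, c = lstm_step(np.atleast_1d(x_t), h, c, W, U, b)
    return h

rng = np.random.default_rng(0)
D, H = 1, 64                                  # one input feature, 64 LSTM units
W = rng.normal(0, 0.1, (4 * H, D))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
w_out, b_out = rng.normal(0, 0.1, H), 0.0    # dense output layer

window = rng.random(24)                       # hypothetical 24 normalized hourly loads
h_last = lstm_layer(window, W, U, b)
y_next = float(w_out @ h_last + b_out)        # next-hour forecast (untrained weights)
```

The additive update of `c_t` is what lets gradient information flow across many time steps when the forget gate stays close to 1, which is the mechanism behind the vanishing-gradient mitigation mentioned above.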
Figure 12 shows the hybrid architecture that combines RNN and LSTM layers for forecasting. The input layer receives the sequential data and passes it to the first hidden layer, which consists of RNN cells that are effective at identifying short-term dependencies. The output of this RNN layer is then fed into a second hidden layer of LSTM units, which captures long-term temporal trends and mitigates problems such as vanishing gradients. A fully connected layer further processes the LSTM outputs to extract relevant features and patterns before passing them to the final output layer. This architecture is especially suitable for forecasting tasks in which both short-term variations and long-term patterns matter, such as electrical load prediction.
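The layer sequence of Figure 12 (input → RNN → LSTM → fully connected → output) can be traced with a forward-pass sketch in NumPy. This is an illustration only: the weights are random and untrained, the ReLU activation in the fully connected layer is an assumption, and the layer sizes other than the 64 LSTM units and the 24 h window are hypothetical. The RNN layer returns its full sequence of hidden states so that the LSTM layer can process every time step.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init(shape, scale=0.1):
    """Small random weights (untrained, for illustration only)."""
    return rng.normal(0.0, scale, size=shape)

def rnn_layer(seq, Wx, Wh, b):
    """Simple RNN, h_t = tanh(Wx x_t + Wh h_{t-1} + b); returns every hidden state."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x_t in seq:
        h = np.tanh(Wx @ x_t + Wh @ h + b)
        states.append(h)
    return np.stack(states)                   # (T, H_rnn): full sequence goes to the LSTM

def lstm_last_state(seq, W, U, b):
    """LSTM over the sequence (gate order i, f, o, c~); returns the final hidden state."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in seq:
        z = W @ x_t + U @ h + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        c = f * c + i * np.tanh(z[3 * H:])
        h = o * np.tanh(c)
    return h

T, D = 24, 1                                  # 24 h look-back, one feature (load)
H_RNN, H_LSTM, H_DENSE = 32, 64, 16           # hypothetical sizes except the 64 LSTM units
params = {
    "rnn": (init((H_RNN, D)), init((H_RNN, H_RNN)), np.zeros(H_RNN)),
    "lstm": (init((4 * H_LSTM, H_RNN)), init((4 * H_LSTM, H_LSTM)), np.zeros(4 * H_LSTM)),
    "dense": (init((H_DENSE, H_LSTM)), np.zeros(H_DENSE)),
    "out": (init((1, H_DENSE)), np.zeros(1)),
}

def hybrid_forward(window, p):
    seq = np.asarray(window, dtype=float).reshape(T, D)
    r = rnn_layer(seq, *p["rnn"])             # first hidden layer: short-term features
    h = lstm_last_state(r, *p["lstm"])        # second hidden layer: long-term summary
    Wd, bd = p["dense"]
    d = np.maximum(0.0, Wd @ h + bd)          # fully connected layer (ReLU assumed)
    Wo, bo = p["out"]
    return float((Wo @ d + bo)[0])            # output layer: next-hour load (normalized)

y_hat = hybrid_forward(rng.random(T), params)
```

In practice the weights would be learned by minimizing MSE with Adam, as described for the LSTM-RNN model; the sketch only shows how data shapes flow between the layers.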
- 2.
Backpropagation Neural Network
The block diagram in Figure 13 illustrates the three main layers of a BPNN architecture: an input layer, one or more hidden layers, and an output layer. The network inputs $x_1$ to $x_m$, such as temperature, time, or past electrical demand, enter the input layer and are transmitted to the hidden neurons through weighted connections $W_{ij}$ [34]. Each hidden neuron computes a weighted sum of its inputs with an accompanying bias $\theta$ and applies a nonlinear activation function $\varphi$, such as the rectified linear unit (ReLU), which allows the network to represent complex nonlinear interactions. The output layer then receives the activated outputs $a_1$ to $a_L$ and generates the final prediction $o_k$, which for regression tasks usually uses a linear activation [34]. To reduce the loss, the backpropagation technique propagates the error backwards through the network and adjusts the weights:

$$E = \left(y - \hat{y}\right)^2 \tag{4}$$

In Equation (4), $y$ is the actual load, $\hat{y}$ is the predicted load, and $E$ is the squared error. To reduce this prediction error, the BPNN updates each weight by gradient descent, as shown in Equation (5), where $w_{\text{old}}$ is the current weight, $w_{\text{new}}$ is the updated weight, $\eta$ is the learning rate, and $\partial E / \partial w$ is the gradient of the loss with respect to that weight:

$$w_{\text{new}} = w_{\text{old}} - \eta \frac{\partial E}{\partial w} \tag{5}$$
After each prediction, the error is determined by comparing the predicted value with the actual value and is propagated back through the network. This procedure repeats iteratively during training to increase forecast accuracy. BPNNs perform effectively in nonlinear load forecasting, although they are less effective than recurrent models such as LSTM at capturing sequential dependencies.
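The backpropagation training loop can be sketched in NumPy for a BPNN with one ReLU hidden layer and a linear output neuron. This is a minimal illustration under stated assumptions: the toy data, the 16 hidden neurons, the learning rate η = 0.1, and full-batch updates are hypothetical choices, and the squared error (y − ŷ)² is averaged over the batch. Each iteration runs a forward pass, backpropagates the error, and applies the update w_new = w_old − η ∂E/∂w.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: 3 normalized inputs (e.g., temperature, hour, previous load) -> load
X = rng.random((200, 3))
y = (0.5 * X[:, 0] + 0.3 * np.sin(np.pi * X[:, 1]) + 0.2 * X[:, 2]).reshape(-1, 1)

n_hidden, eta, epochs = 16, 0.1, 500
W1 = rng.normal(0, 0.5, (3, n_hidden))   # input -> hidden weights W_ij
th1 = np.zeros(n_hidden)                 # hidden biases (theta)
W2 = rng.normal(0, 0.5, (n_hidden, 1))   # hidden -> output weights
th2 = np.zeros(1)                        # output bias

def forward(X):
    z1 = X @ W1 + th1
    a1 = np.maximum(0.0, z1)             # phi = ReLU
    o = a1 @ W2 + th2                    # linear output for regression
    return z1, a1, o

losses = []
for epoch in range(epochs):
    z1, a1, o = forward(X)
    err = o - y
    losses.append(float(np.mean(err ** 2)))   # squared error averaged over the batch
    # Backward pass: chain rule from the output layer to the input layer
    d_o = 2.0 * err / len(X)
    dW2 = a1.T @ d_o
    dth2 = d_o.sum(axis=0)
    d_z1 = (d_o @ W2.T) * (z1 > 0)            # ReLU derivative is 1 where z1 > 0
    dW1 = X.T @ d_z1
    dth1 = d_z1.sum(axis=0)
    # Gradient-descent update: w_new = w_old - eta * dE/dw
    W2 -= eta * dW2
    th2 -= eta * dth2
    W1 -= eta * dW1
    th1 -= eta * dth1

print(losses[0], losses[-1])   # the training loss should fall over the epochs
```

Deep-learning frameworks compute these gradients automatically; writing them out by hand shows that "propagating the error backwards" is simply the chain rule applied layer by layer.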
- 3.
Artificial Neural Network
Figure 14 presents the block diagram of an ANN, a computational model inspired by the structure and functions of the human brain that finds complex patterns and relationships in data. An ANN consists of layers of interconnected nodes (neurons), namely input, hidden, and output layers. Each connection carries a weight, and neurons apply activation functions to introduce nonlinearity:

$$y_j = f\left(\sum_{i=1}^{n} w_{ij}\, x_i + b_j\right) \tag{6}$$

As Equation (6) shows, ANNs transform inputs into outputs through weighted summation and nonlinear activation, where $x_i$ are the input variables, $n$ is the number of inputs, $w_{ij}$ is the weight connecting input $i$ to neuron $j$, $b_j$ is the bias, $f$ is the activation function, and $y_j$ is the output of neuron $j$. Because they can represent complex interactions between input features such as time and historical load levels, ANNs are suitable for nonlinear regression problems in electrical load forecasting. However, compared with the hybrid recurrent LSTM model, feedforward ANNs are less effective at time-series prediction because they cannot explicitly capture temporal dependencies.
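For a whole layer, the neuron equation y_j = f(Σ_i w_ij x_i + b_j) reduces to one matrix product followed by an element-wise activation. The NumPy sketch below is a small worked example; the inputs, weights, and the helper name `dense_layer` are hypothetical, with a ReLU hidden layer followed by a linear output neuron as is common for regression.

```python
import numpy as np

def dense_layer(x, W, b, f):
    """Evaluate y_j = f(sum_i w_ij * x_i + b_j) for every neuron j at once.

    W has shape (n_inputs, n_neurons), so column j holds the weights w_ij into neuron j.
    """
    return f(np.asarray(x, dtype=float) @ W + b)

def relu(z):
    return np.maximum(0.0, z)

def identity(z):
    return z

# Hypothetical normalized inputs, e.g., temperature and hour of day
x = np.array([0.5, 0.2])
W_hidden = np.array([[1.0, -1.0],
                     [2.0,  0.5]])
b_hidden = np.array([0.1, 0.0])
hidden = dense_layer(x, W_hidden, b_hidden, relu)
print(hidden)                                  # [1. 0.]

# A linear output neuron on top gives the regression forecast
W_out = np.array([[0.8], [0.3]])
b_out = np.array([0.05])
y_hat = dense_layer(hidden, W_out, b_out, identity)
print(y_hat)                                   # [0.85]
```

Neuron 2 receives 0.5 × (−1.0) + 0.2 × 0.5 = −0.4, which ReLU clips to 0; this clipping is the nonlinearity that lets stacked layers model more than a linear regression could.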
- 4.
K-Nearest Neighbors
KNN is an instance-based learning algorithm widely used for both classification and regression problems. In electrical load forecasting, KNN predicts the output for a given input by using a distance measure, such as the Euclidean distance, to find the $K$ most similar instances (neighbors) in the historical dataset [35]. Equation (7) gives the Euclidean distance used by KNN, where $x$ is the test input, $x_i$ is the $i$th training input, $x_j$ and $x_{i,j}$ are their $j$th features, and $n$ is the number of features. KNN then selects the $K$ training inputs with the smallest distances:

$$d\left(x, x_i\right) = \sqrt{\sum_{j=1}^{n} \left(x_j - x_{i,j}\right)^2} \tag{7}$$
The predicted value is typically the average of the neighbors' outputs. The red dot in Figure 15 marks the point at which a load prediction is produced; the algorithm finds its nearest neighbors and infers the outcome from their values. Although KNN is simple and interpretable, it can become computationally costly on large datasets and cannot model complex temporal patterns [36].
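KNN regression as described, Euclidean distance followed by averaging the K nearest outputs, fits in a few lines of NumPy. The training points, their load values, and the function name `knn_predict` are hypothetical; scikit-learn's `KNeighborsRegressor` with its default uniform weights and Minkowski metric with p = 2 (Euclidean) computes the same prediction.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Average the outputs of the k training samples closest to x_query."""
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Euclidean distance from the query to every training input
    d = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    nearest = np.argsort(d)[:k]            # indices of the k smallest distances
    return float(np.mean(np.asarray(y_train, dtype=float)[nearest]))

# Hypothetical normalized features (e.g., temperature, hour) and loads in kW
X_train = np.array([[0.10, 0.20], [0.20, 0.10], [0.90, 0.80],
                    [0.80, 0.90], [0.15, 0.15]])
y_train = np.array([100.0, 110.0, 300.0, 310.0, 105.0])

print(knn_predict(X_train, y_train, [0.12, 0.18], k=3))   # 105.0
```

Every prediction computes a distance to every stored sample, which is why the cost grows with dataset size, and the distance treats each feature independently with no notion of temporal order.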
Following the discussion of the individual methods, a comparative analysis of their advantages and disadvantages is given below in Table 4.