Load Forecasting for the Laser Metal Processing Industry Using VMD and Hybrid Deep Learning Models

Electric load forecasting is crucial for the metallurgy industry because it enables effective resource allocation, production scheduling, and optimized energy management. Achieving accurate load forecasting requires an efficient approach. In this study, we used the time factor of univariate time-series data to implement various deep learning models for predicting the load one hour ahead under different conditions (seasonal and daily variations), with the goal of identifying the most suitable model for each specific condition. Two hybrid deep learning models were proposed. The first combines variational mode decomposition (VMD) with a convolutional neural network (CNN) and a gated recurrent unit (GRU). The second combines VMD with a CNN and long short-term memory (LSTM). The proposed models outperformed the baseline models. The VMD–CNN–LSTM performed well for seasonal conditions, with an average RMSE of 12.215 kW, MAE of 9.543 kW, and MAPE of 0.095%. Meanwhile, the VMD–CNN–GRU performed well for daily variations, with an average RMSE of 11.595 kW, MAE of 9.092 kW, and MAPE of 0.079%. The findings support the practical application of the proposed models for electrical load forecasting in diverse scenarios, especially concerning seasonal and daily variations.


Background
The rise in population growth and economic development has resulted in a significant increase in electricity consumption in both the household and commercial sectors. For instance, in Poland, end-user energy consumption steadily increased by an average of 1% annually between 2015 and 2019, leading to an overall increase in electricity supplied to end users of approximately 6.8%, or 8.64 TWh [1]. Electric utilities must therefore maintain power grid stability by providing an adequate supply of electricity to meet load demand, which fluctuates over time. This requires electric utilities to estimate current and future power demand so that supply and demand remain balanced and the generated power meets the power system reliability criteria. Load forecasting, which involves estimating current and future load demand, is essential for power system operation and planning, enabling authorities to make crucial decisions regarding load changes, power generation planning, power transaction planning, and infrastructure development [2].
Energy intensity among companies in Poland, in both the industrial and service sectors, is comparatively high within the European Union [3]. Increasing electricity prices pose a significant challenge in the commercial sector. Consequently, companies are compelled to improve energy efficiency and explore alternative solutions to manage energy demand and prices. Despite the potential economic benefits and reduction in carbon emissions, only a few end-users in the industrial and service sectors have adopted renewable energy sources, battery storage, and smart technologies.

Related Work and Contribution
Several approaches are available for load forecasting, and these methods differ depending on whether they are used for short-, medium-, or long-term forecasts. The primary contrast between medium- and long-term forecasting and short-term forecasting is that the former involves external factors such as population growth projections, economic changes, and technological developments. End-use and econometric modelling are two well-known techniques employed for medium- to long-term predictions [2]. For short-term load forecasting, in contrast, the developed methods include conventional statistical methods [1], such as the autoregressive integrated moving average (ARIMA) [7], and artificial intelligence methods [8,9], such as machine learning [10] and deep learning (DL) [11,12].
A study in [13] compared the performance of short-term forecasting models based on statistical and deep learning methods. Researchers have determined that both methods have distinct advantages and drawbacks. The advantages associated with deep learning methods include a lower error rate, greater flexibility in addressing more intricate problems, and the ability to learn from data using computing power [14]. DL is a highly versatile approach that can be implemented effectively across a range of prediction tasks. However, the challenge in using the DL method is to determine the optimal configuration for the structure of the models. In contrast, traditional statistical methods can provide accurate prediction results and establish relationships between variables. However, this method primarily relies on traditional data analysis and may present challenges in its application to large and nonlinear datasets.
According to [2], the short-term load forecast can be affected by several key factors, such as time, meteorological conditions, and potential end-use categories. The time factor is important for distinguishing the pattern of electricity consumption under different conditions. For instance, electricity consumption in summer differs discernibly from that in winter, and there are significant disparities between weekday and weekend consumption patterns. The time factor is therefore very informative for load forecasting analysis via the time index of the data, such as the day of the week, hour of the day, and minute of the hour. Moreover, weather conditions also play an important role in shaping load demand characteristics; among the various parameters, temperature and humidity have the most significant influence on the prognosis of load demand. The last crucial factor is customer class. Electricity consumers are typically categorized into three groups: residential, commercial, and industrial. While electricity consumption patterns vary considerably among different classes, they exhibit a degree of similarity within each category [2]; therefore, most utilities differentiate load patterns by class. These key factors play a critical role in determining future load demand characteristics, so it is essential to consider them for accurate load forecasting. Nonetheless, the development of load forecasting models is not necessarily restricted to incorporating all of these key factors.
The present investigation focuses exclusively on a univariate time-series dataset that comprises historical variables related to electricity consumption. Notably, the relevant literature provides examples of studies that solely employ historical data of power demand or electricity load datasets for the development of load forecasting models. For instance, [15] presented an investigation wherein the authors exclusively utilized historical electric load datasets to devise a long-term hybrid model based on MLP and statistical techniques.
Similarly, [18] demonstrated a study that employed only historical load usage data to develop short-term load forecasting using an RNN. Reference [7] presented a study on the development of short-term load forecasting using the ARIMA and ANN approaches; there, the authors used real electricity load profiles as the historical datasets. To achieve accurate load demand forecasting, it is imperative to consider the specific crucial factors. Consequently, this study places primary emphasis on the time component as the principal element for load forecasting. By extracting the time index from the dataset, comprehensive information concerning the minutes, hours, days, weeks, months, and years associated with each data point can be obtained. This time index facilitates convenient categorization of data points based on comparable temporal conditions. From this perspective, it is evident that DL methods hold substantial value in addressing load forecasting challenges using historical electric load datasets. However, the current study identifies a significant research gap concerning the limited exploration of DL models, specifically for hour-ahead electricity load forecasting in the metallurgy industry. Although previous studies have established a foundation for the effectiveness of DL models in electric load forecasting, there remains a scarcity of comparative studies that focus on developing load forecasting models tailored to the unique requirements of the metallurgy industry. Furthermore, the influence of diverse conditions on the performance of deep learning models in this industry has not been adequately investigated.
Consequently, the present study aims to investigate the hypothesis that there exists a significant difference in the accuracy of hour-ahead electric load forecasting among various deep learning models when subjected to different conditions within the metallurgy industry.
The main contribution of this study is the implementation of several deep learning models to predict the one-hour-ahead electricity load under different conditions, while also identifying the appropriate individual DL model for each specific condition. These conditions are divided into two broad categories: seasonal conditions and day categories. Seasonal conditions encompass winter, spring, summer, and autumn. The day category was subdivided into weekdays (Monday, Tuesday, Wednesday, Thursday, and Friday) and weekends (Saturday and Sunday). The proposed DL models for load forecasting evaluated in this study are hybrid models of variational mode decomposition (VMD) with CNN-LSTM and VMD-CNN-GRU. These models were compared with the baseline models, including MLP, LSTM, GRU, CNN, hybrid CNN-LSTM, and hybrid CNN-GRU. The main contributions of this study are summarized as follows.

•	The inclusion of the time factor as a crucial component that influences load forecasting, which allows for the differentiation of each data point and the formation of numerous sub-datasets based on various conditions.
•	The implementation of several deep learning models for short-term load forecasting one hour ahead, considering different conditions.
•	The comparison and assessment of the performance of the deep learning models for load forecasting across diverse sub-datasets.
The paper is structured as follows: Section 2 provides a brief overview of deep learning models, Section 3 outlines the proposed methodology, Section 4 presents the dataset description, Section 5 presents the results and discussion, and Section 6 concludes the paper.

Multilayer Perceptron (MLP)
The MLP, a feedforward neural network, generates outputs from inputs by learning the complex relationship between linear and non-linear data patterns [27]. The MLP is a deep learning model characterized by a hierarchical structure comprising a minimum of three node layers: the input layer, hidden layers, and output layer [28,29]. These layers are structured and interconnected, as shown in Figure 1. Each layer in the MLP model has at least one node. Each node is a neuron (perceptron) with a nonlinear activation function that performs computation, except for the nodes in the input layer [28]. A node computes the sum of the products of the weighting values (w) and the input values (x) from the previous layer, plus a bias value (b); a transfer function (f) then produces the output (y). The calculation can be represented as in Equation (1):

y = f(∑ w_i x_i + b) (1)

In the learning phase, the MLP model employs the backpropagation technique to adjust the connection weight values [30]. This adjustment is based on the error rate, which is computed by comparing the predicted output with the actual output obtained from processing each dataset. The procedure is performed iteratively until the MLP model reaches an optimized error rate, enabling it to function with greater precision and reliability.
The MLP model is also a well-known model for load forecasting, as discussed in reference [15]. This model combines statistical techniques with machine learning methods to address the potential issue of average convergence when solely relying on machine learning for mid/long-term load forecasting. Similarly, another study discussed in reference [16] demonstrates the proficiency of the MLP model in accurately predicting electrical power demand usage.
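As a minimal NumPy sketch (not the authors' implementation), the per-node computation of Equation (1) for one layer can be written as follows; the tanh transfer function and the layer sizes are illustrative assumptions:

```python
import numpy as np

def dense_layer(x, w, b, f=np.tanh):
    """Per-layer form of Equation (1): y = f(w.x + b) for every node at once."""
    return f(w @ x + b)

x = np.array([0.5, -1.0, 2.0])            # outputs of the previous layer
w = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])           # one weight row per node
b = np.array([0.01, -0.02])               # one bias per node
y = dense_layer(x, w, b)                  # outputs of a 2-node hidden layer
```

Stacking such layers, with backpropagation adjusting w and b, yields the MLP described above.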

Recurrent Neural Network (RNN)
The recurrent neural network (RNN) represents a distinct variation of the deep learning (DL) model, which is considered unique when compared to other variants [31]. The reason for its uniqueness lies in the RNN layer's special ability to model short-term dependencies with a hidden state [32]. This hidden state serves as a storage unit, acting as a highway that passes information from one time step to another within the unrolled RNN units [31]. The structure of these unrolled RNN units is depicted in Figure 2. This model also has capabilities for load forecasting applications. In reference [17], the authors propose the Online Adaptive RNN as a load forecasting approach that demonstrates the ability to continually learn from new data and adapt to evolving patterns. Additionally, reference [18] focuses on the application of RNN models in predicting electrical load to maintain a balance between demand and supply.

In the unrolled RNN unit, the hidden state at the current time step (h(t)) is determined by the value of the previous hidden state (h(t−1)) and the current input (x(t)). This provides a memory function that retains information from the previous time step while processing the information of the current time step. Therefore, the output at the current time step (o(t)) of the RNN always depends on the previous elements in the sequence. All connections between the input, the hidden state, and the output of the unrolled RNN unit have weights (w) and biases (b) at all time steps. The hidden state (Equation (2)) and the output (Equation (3)) at the current time step are calculated as follows:

h(t) = f(w_h h(t−1) + w_x x(t) + b_h) (2)

o(t) = f(w_o h(t) + b_o) (3)

Basically, the RNN is a simple and powerful model. However, it suffers from exploding and vanishing gradients when backpropagation through time is used. Standard RNNs have difficulty capturing long-term dependencies because multiplicative gradients can decrease or increase exponentially with the number of layers. To address these issues, a different family of RNNs can be used, such as long short-term memory (LSTM) and gated recurrent unit (GRU). These models extend regular RNNs to handle long-term dependencies, storing information for longer periods of time without exploding or vanishing gradients [33].
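A minimal NumPy sketch of the unrolled computation in Equations (2) and (3); the dimensions and random weights are illustrative, not trained parameters:

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, Wo, bh, bo, f=np.tanh):
    """One unrolled RNN step."""
    h_t = f(Wh @ h_prev + Wx @ x_t + bh)   # Equation (2): new hidden state
    o_t = f(Wo @ h_t + bo)                 # Equation (3): output at time t
    return h_t, o_t

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 1, 4, 1
Wx = rng.normal(size=(n_hidden, n_in))
Wh = rng.normal(size=(n_hidden, n_hidden))
Wo = rng.normal(size=(n_out, n_hidden))
bh, bo = np.zeros(n_hidden), np.zeros(n_out)

h = np.zeros(n_hidden)                     # initial hidden state
for x in [np.array([0.1]), np.array([0.2]), np.array([0.3])]:  # a short load sequence
    h, o = rnn_step(x, h, Wx, Wh, Wo, bh, bo)
```

The loop makes the dependence explicit: each output is computed from a hidden state carried forward from all earlier steps.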
In reference [34], the LSTM model was introduced to solve the vanishing gradient problem by including memory cells and gates that regulate the flow of information through the network. A typical LSTM consists of memory blocks called cells, which have two states: the cell state and the hidden state. The cells decide to store or ignore information through three primary gates: a forget gate, an input gate, and an output gate, as shown in Figure 3a. The LSTM network operates in three steps. In the first step, the network uses the forget gate to determine what information to discard from or keep in the cell state by processing the input at the current time step (x_t) and the previous hidden state (h(t−1)) with the sigmoid function (S), as presented in Equation (4):

f_t = S(W_f [h(t−1), x_t] + b_f) (4)

In the second step, the network updates the old cell state (C(t−1)) into a new cell state (C_t) by selecting which new information to include in the long-term memory (cell state). This step uses the input gate (Equation (5)) and the candidate cell update (Equation (6)), combined with the forget gate as in Equation (7):

i_t = S(W_i [h(t−1), x_t] + b_i) (5)

C̃_t = tanh(W_C [h(t−1), x_t] + b_C) (6)

C_t = f_t ⊙ C(t−1) + i_t ⊙ C̃_t (7)

When the update of the new cell state (C_t) is complete, the final step defines the new hidden state (h(t)). This state acts as the memory of the network, containing information about previous data, and can also be used as the output for prediction. It is computed from the new cell state and the output gate, as shown in Equations (8) and (9):

o_t = S(W_o [h(t−1), x_t] + b_o) (8)

h(t) = o_t ⊙ tanh(C_t) (9)

Another variant, simpler than the LSTM, is the GRU model. In this model, the cell state is removed, and the hidden state alone transmits information. The GRU model has only two gates: the update gate (z_t) and the reset gate (r_t), shown in Figure 3b. The update gate merges the input gate and the forget gate into a single gate; it works similarly to the forget and input gates of the LSTM, deciding whether to add or ignore useful information [35]. The reset gate decides how much information to remove from the past. The update gate (Equation (10)) and the reset gate (Equation (11)) determine the candidate hidden state (Equation (12)) and, ultimately, the new hidden state (h(t)) in Equation (13):

z_t = S(W_z [h(t−1), x_t] + b_z) (10)

r_t = S(W_r [h(t−1), x_t] + b_r) (11)

h̃_t = tanh(W_h [r_t ⊙ h(t−1), x_t] + b_h) (12)

h(t) = (1 − z_t) ⊙ h(t−1) + z_t ⊙ h̃_t (13)

LSTM and GRU models possess the capacity to address load forecasting challenges. In reference [20], the authors combine an LSTM model with the Prophet model, aiming to overcome the aforementioned limitations and achieve accurate load prediction. Reference [21], in turn, presents the GRU as an adaptive approach with a targeted design that captures variable temporal dependence and incorporates both the periodic and nonlinear characteristics of load forecasting problems.
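A minimal NumPy sketch of a single GRU step following Equations (10)–(13); the dimensions and random weights are illustrative assumptions, not trained parameters:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU step: update gate, reset gate, candidate state, new hidden state."""
    xh = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ xh + bz)                                        # Equation (10)
    r = sigmoid(Wr @ xh + br)                                        # Equation (11)
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]) + bh)    # Equation (12)
    return (1 - z) * h_prev + z * h_cand                             # Equation (13)

rng = np.random.default_rng(1)
n_in, n_hidden = 1, 4
shape = (n_hidden, n_hidden + n_in)
Wz, Wr, Wh = rng.normal(size=shape), rng.normal(size=shape), rng.normal(size=shape)
bz = br = bh = np.zeros(n_hidden)

h = np.zeros(n_hidden)
h = gru_step(np.array([0.5]), h, Wz, Wr, Wh, bz, br, bh)
```

With only two gates and no separate cell state, the GRU step is visibly cheaper than the three-gate LSTM step.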

Convolutional Neural Network (CNN)
The convolutional neural network (CNN) is one of the DL models mainly used for image processing and pattern recognition [36], because the network is able to learn highly abstracted features of objects, such as spatial data [37]. However, the CNN can also be used for time-series prediction, since it can automatically learn features of sequence data involving multiple variables. Typically, a CNN from reference [38] consists of several layers: convolutional layers, a pooling layer, a flattening layer, and a fully connected layer, as shown in Figure 4.

The convolutional layers are the main part of the CNN; they recognize patterns and features in the input. Each convolutional layer produces outputs called feature maps, generated by applying filters to the input data, which can be used to detect relationships and patterns [36]. The second layer is the pooling layer, which creates a subsample by shrinking larger feature maps into smaller ones [37], reducing dimensionality and extracting the dominant features for efficient training of the model [36]. Finally, the flattening layer generates a one-dimensional vector that feeds the final layer of the CNN, the fully connected layer (FC).
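A toy NumPy sketch of the convolution and pooling operations described above; the difference filter is a hypothetical hand-picked example, whereas a real CNN learns its filters during training:

```python
import numpy as np

def conv1d(seq, kernel):
    """Valid 1D convolution: slide the filter over the sequence to build a feature map."""
    n = len(seq) - len(kernel) + 1
    return np.array([np.dot(seq[i:i + len(kernel)], kernel) for i in range(n)])

def max_pool1d(fmap, size=2):
    """Shrink the feature map by keeping the maximum of each non-overlapping window."""
    return np.array([fmap[i:i + size].max() for i in range(0, len(fmap) - size + 1, size)])

load = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])     # hypothetical hourly load values
fmap = conv1d(load, np.array([1.0, -1.0]))          # filter detecting local rises/falls
pooled = max_pool1d(fmap)                           # dominant features, halved length
```

The feature map here is [-2, 1, -3, 1, -2], and pooling keeps the dominant value of each pair, mirroring the subsampling role of the pooling layer.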
The CNN model can be applied to address various challenges encountered in load forecasting. For instance, in study [23], the authors utilized 1D convolutional neural networks to extract valuable features from historical load data sequences. The proposed approach exhibits excellent performance in short-term load forecasting. Similarly, reference [24] introduces a similar study that employs CNN to extract informative features from input data. Subsequently, a CNN-Seq2Seq model with an attention mechanism based on a multi-task learning method is proposed for short-term multi-energy load forecasting.

Hybrid Deep Learning Model
Various techniques can be employed to develop a hybrid model for time-series forecasting. In this comparative investigation, we used two distinct hybrid deep learning models based on the structures of the convolutional neural network (CNN), long short-term memory (LSTM), and gated recurrent unit (GRU). The first model fuses the CNN and LSTM networks, whereas the second integrates the CNN with the GRU network. The CNN-LSTM architecture is composed of a convolutional layer, a pooling layer, a flattening layer, and an LSTM network [14]. The CNN-GRU model shares the same structure, except that the LSTM network is substituted with the GRU network. The hybrid DL models used in this study, CNN-LSTM and CNN-GRU, operate in a sequential manner, where the output of one component serves as the input to the next component. The architectural layout of the CNN-LSTM/CNN-GRU hybrid deep learning model is illustrated in Figure 5.
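As an illustration only (the paper's exact layer sizes and hyperparameters are not reproduced), the sequential CNN-LSTM layout can be sketched in Keras. The filter count, kernel size, unit counts, and window length below are assumptions, and the pooled feature-map sequence is fed directly to the LSTM rather than through a flattening layer, since the LSTM consumes a sequence:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn_lstm(window=24, n_features=1):
    """Hedged sketch of a sequential CNN-LSTM: conv features -> pooling -> LSTM -> forecast."""
    model = keras.Sequential([
        keras.Input(shape=(window, n_features)),
        layers.Conv1D(filters=32, kernel_size=3, activation="relu"),  # feature extraction
        layers.MaxPooling1D(pool_size=2),                             # subsampling
        layers.LSTM(32),                                              # temporal modelling
        layers.Dense(1),                                              # hour-ahead load value
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_cnn_lstm()
```

Swapping `layers.LSTM(32)` for `layers.GRU(32)` yields the corresponding CNN-GRU variant.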


Description of Proposed Methodology
To assess the performance of deep learning models in predicting the load one hour ahead under various circumstances, the methodology shown in Figure 6 was employed. It comprises several distinct stages: data collection, data preprocessing, variational mode decomposition, construction of the forecast model, and model evaluation.


Data Collection
Data collection, also known as data acquisition, refers to the process of gathering information from various sources. Prior to storing the data in the storage system, it underwent a filtering and cleaning process. In the present study, the focus was on collecting univariate time-series data, specifically from the metallurgy industry in Poland. This dataset consists solely of a single variable, namely electricity consumption. Initially, the data were recorded at a frequency of 15 min using a power quality meter. However, for short-term forecasting, the dataset was artificially resampled into hourly intervals using the MATLAB software library. This resampling was carried out because the research primarily aimed to develop an individual deep learning model for predicting load demand with a one-hour lead time, considering specific conditions. Hence, the methods developed in this study specifically cater to univariate time-series data.

Data Preprocessing
Data pre-processing refers to the techniques used to prepare and convert raw data from various sources into a format suitable for DL models. The application of data preprocessing is critical because it enables the improvement of data quality by extracting valuable insights from data. In this research, several data pre-processing techniques are used sequentially, starting from data normalization, splitting the dataset, and reshaping the data structure using the sliding window method.

Data Normalization
Datasets often come from various sources, and their parameters may have different units and scales. This variation can affect the performance of DL models during the learning phase and lead to increased generalization error [39]. Therefore, it is necessary to scale all variables within the dataset. DL models perform better when input variables are scaled to a standardized range. Min-max normalization is a popular technique that maps the original values of the dataset to a new range [14,40]. The mathematical formula for min-max normalization used in this study is presented in the following Equation (14):

x_norm = (x − x_min) / (x_max − x_min),   (14)

where x is an original value, x_min and x_max are the minimum and maximum values of the dataset, and x_norm is the normalized value.
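As an illustration, min-max scaling of a load series can be implemented in a few lines of NumPy; the helper name and the toy kilowatt values below are ours, not taken from the study.

```python
import numpy as np

def min_max_scale(x, feature_range=(0.0, 1.0)):
    """Linearly map the values of x onto [lo, hi] (min-max normalization)."""
    lo, hi = feature_range
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    scaled = (x - x_min) / (x_max - x_min)   # -> values in [0, 1]
    return scaled * (hi - lo) + lo

loads_kw = np.array([120.0, 150.0, 90.0, 180.0])   # toy hourly loads
print(min_max_scale(loads_kw))   # values: 1/3, 2/3, 0, 1
```

In practice the scaler parameters (x_min, x_max) are fitted on the training split only and reused on the test split, so that no information leaks from the test data.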

Dataset Splitting
In the dataset splitting stage, the observed values in a time-series dataset are grouped based on their similarity in time conditions. This study incorporated two types of time conditions: seasonal and day category. The seasonal type divides the univariate time series data into sub-datasets representing winter, spring, summer, and autumn, whereas the day category divides them into sub-datasets for working days and weekends. In this study, these sub-datasets were used to develop forecasting models under different conditions. Each sub-dataset associated with a specific condition was further split into training, validation, and testing datasets. The primary objective of this process is to prepare a dataset for training the deep learning model and for evaluating and optimizing its performance.
In fact, there is no optimal solution for specifying the splitting ratio used to divide the original dataset into training, validation, and testing datasets. According to the literature, various approaches have been implemented to address dataset-splitting concerns. One such approach is presented in [14], where a ratio of 70% was used for the training dataset, 15% for the validation dataset, and 15% for testing. Other studies, referenced in [41,42], employed a different scenario with a 90% ratio for training and 10% for testing. Based on this literature, our study followed the scenario of 90% for training and 10% for testing. During the development of the training model, 20% of the training dataset was allocated to validation. This allocation is possible because the model-fitting routine provides an option to reserve a given fraction of the training data as a validation dataset. As a result, this process provides six separate sub-datasets based on the selected conditions (winter, spring, summer, autumn, working days, and weekend), each of which is further split into a training dataset (90%) and a testing dataset (10%).
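A chronological 90/10 split of this kind can be sketched as follows; the function name and stand-in data are illustrative, and the key point is that time-series data must not be shuffled before splitting.

```python
import numpy as np

def chronological_split(series, train_ratio=0.9):
    """Split a time series into train/test segments without shuffling,
    so the test set always lies after the training set in time."""
    n_train = int(len(series) * train_ratio)
    return series[:n_train], series[n_train:]

data = np.arange(100)                       # stand-in for one seasonal sub-dataset
train, test = chronological_split(data, train_ratio=0.9)
# A later model.fit(..., validation_split=0.2) call would hold out the
# last 20% of `train` as the validation set.
print(len(train), len(test))                # -> 90 10
```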

Sliding Window Approach
As the individual sub-dataset (training and testing dataset of each condition) is still in the form of time series data, it is necessary to reshape the structure into a supervised learning dataset since the deep learning models used in this study deal exclusively with supervised learning problems [8]. The dataset structure must consist of input patterns (X) and output patterns (y). The sliding window approach (see Figure 7) is commonly used for this purpose, where the values of the previous time steps serve as the input variables, and the value of the following time step serves as the output variable [38]. In this study, a sliding window with an input width of six and a label width of one was used. Specifically, the last six hours of data were taken as the input to predict the load one hour ahead of the current time (t). Figure 7 illustrates how a sliding window can be used to convert the structure of time series data into a format suitable for supervised learning. The red column represents the input variable (X), which shows the values of the last six hours, and the yellow column represents the output variable (y), which explains the load one hour ahead, while the blue column represents the current time (t). The sliding window method was applied to all subsets of the training and test time-series datasets in this study.
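The windowing step described above can be sketched as a short helper; the function name is ours, but the widths (six input lags, one label) match the study's configuration.

```python
import numpy as np

def sliding_window(series, input_width=6, label_width=1):
    """Reshape a 1-D series into supervised pairs: the last `input_width`
    values form X, the next `label_width` value(s) form y."""
    X, y = [], []
    for i in range(len(series) - input_width - label_width + 1):
        X.append(series[i:i + input_width])
        y.append(series[i + input_width:i + input_width + label_width])
    return np.array(X), np.array(y)

hourly_load = np.arange(10.0)            # toy hourly series
X, y = sliding_window(hourly_load)       # six lags in, one step out
print(X.shape, y.shape)                  # -> (4, 6) (4, 1)
print(X[0], y[0])                        # -> [0. 1. 2. 3. 4. 5.] [6.]
```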

Variational Mode Decomposition (VMD)
In this study, the VMD method was used to improve the accuracy of the proposed hybrid deep learning models by considering the nonlinear and non-stationary characteristics of the power consumption dataset. This method is known as adaptive [43] and is a data-driven approach for decomposing signals with complex and non-stationary characteristics. The VMD algorithm decomposes the original signal into a set of mode functions that represent the different oscillation components at different frequencies and scales. The iterative solution of an optimization problem generates these modes, where the objective is to minimize the cost function to extract the modes. From the viewpoint of empirical mode decomposition [44], these modes are referred to as signals that exhibit a difference of at most one between the number of local extrema and zero crossings. In subsequent related studies, this definition has been slightly modified and referred to as intrinsic mode functions (IMFs).

The main objective of VMD is to construct and deal with the variational problem [45]. This method breaks down a real-valued input signal into a set of sub-signals or modes, denoted as u_k, with specific sparsity properties while accurately representing the original signal [44]. The approach aims to minimize the total frequency bandwidth while ensuring that the sum of the decomposed modes equals the original input signal. This objective and constraint are depicted in Equation (15):

min_{ {u_k}, {ω_k} }  Σ_k ‖ ∂_t[ (δ(t) + j/(πt)) ∗ u_k(t) ] e^{−jω_k t} ‖²₂   subject to   Σ_k u_k = f,   (15)

In the given context, K represents the desired number of modes to be decomposed, which is a positive integer. {u_k} and {ω_k} refer to the k-th modal component and its center frequency, respectively. The function δ(t) represents the Dirac function, and (∗) denotes the convolution operator [43].
To address the reconstruction constraint (refer to Equation (15)), a combination of a quadratic penalty term and Lagrangian multipliers is proposed to make the problem unconstrained. This augmented Lagrange expression is presented in Equation (16):

L({u_k}, {ω_k}, λ) = α Σ_k ‖ ∂_t[ (δ(t) + j/(πt)) ∗ u_k(t) ] e^{−jω_k t} ‖²₂ + ‖ f(t) − Σ_k u_k(t) ‖²₂ + ⟨ λ(t), f(t) − Σ_k u_k(t) ⟩   (16)

Building Forecasting Model
In this phase, distinct basic predictive models are constructed for each specific condition. The primary objective behind the development of these models is to discern and identify the most optimal individual model capable of accurately forecasting load demand with a one-hour lead time under specific conditions. The basic structure of the deep learning (DL) models comprises four types of single networks: multilayer perceptron (MLP), long short-term memory (LSTM), gated recurrent unit (GRU), and convolutional neural network (CNN), as well as two hybrid networks, CNN-LSTM and CNN-GRU. These models were used as the baseline models, whereas our study proposed the integration of VMD with the hybrid CNN-LSTM and CNN-GRU models (VMD-CNN-LSTM and VMD-CNN-GRU). The Keras and TensorFlow libraries were utilized as the primary frameworks to construct the architecture and layers of all the DL models, as shown in Table 1.
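A hybrid stack of this kind can be sketched in Keras as follows. This is a hedged illustration, not the study's exact code: the filter count, kernel width, and recurrent units are our assumptions, and the convolutional features are passed to the recurrent layer as a sequence rather than through the flattening step mentioned in the architecture description.

```python
import numpy as np
from tensorflow.keras import layers, models

def build_cnn_rnn(input_width=6, rnn="lstm", units=50, filters=64):
    """Illustrative CNN-LSTM / CNN-GRU stack: conv -> pool -> RNN -> dense."""
    rnn_layer = layers.LSTM(units) if rnn == "lstm" else layers.GRU(units)
    model = models.Sequential([
        layers.Input(shape=(input_width, 1)),            # [time steps, features]
        layers.Conv1D(filters, kernel_size=2, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        rnn_layer,                                       # temporal features
        layers.Dense(1),                                 # one-hour-ahead load
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_cnn_rnn(rnn="gru")
pred = model(np.zeros((2, 6, 1), dtype="float32"))
print(pred.shape)   # one forecast value per sample
```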

Model Evaluation
Model evaluation is a method for measuring the accuracy and effectiveness of predictive models using a test dataset. This dataset contained information that was not used during model training. In this study, the separate test dataset was divided into subsets, which were associated with different conditions. To evaluate the predictive models, test data were input into the models to generate predictions. The accuracy of the predictions was measured using three error metrics [40,46]: the root mean square error (RMSE), presented in Equation (17); the mean absolute error (MAE), shown in Equation (18); and the mean absolute percentage error (MAPE), in Equation (19). The RMSE measures the spread of prediction errors [14,46], the MAE calculates the average magnitude of prediction errors [33,35], and the MAPE measures the average percentage difference between the predicted and actual values [47,48]. Smaller values of RMSE, MAE, and MAPE indicate better performance of the prediction model. The mathematical formulas for these error metrics are as follows:

RMSE = √( (1/N) Σ_{t=1..N} (y_t − ŷ_t)² )   (17)

MAE = (1/N) Σ_{t=1..N} | y_t − ŷ_t |   (18)

MAPE = (100%/N) Σ_{t=1..N} | (y_t − ŷ_t) / y_t |   (19)

where y_t and ŷ_t denote the actual and predicted values at time t, respectively, and N is the sample size of the test dataset.
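The three metrics translate directly into NumPy; the toy actual/predicted values below are ours, chosen only to show the arithmetic.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error: spread of the prediction errors."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error: average magnitude of the prediction errors."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

actual    = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
print(rmse(actual, predicted))   # -> 19.148...
print(mae(actual, predicted))    # -> 16.666...
print(mape(actual, predicted))   # -> 8.333...
```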

Dataset Description
The electricity load data employed in this study were obtained from a metallurgical plant located in Poland. The dataset consists of univariate time-series data that encompass a single variable, namely electricity consumption, measured in kilowatts (kW). The data span the period from 1 January 2019 to 31 December 2021. Power consumption data were collected from power quality measurements or smart meters provided by a utility company. Initially, the dataset was recorded at 15-min intervals. The sampling frequency, which represents the number of samples per unit time, is the reciprocal of the time interval; with a 15-min (0.25-h) interval, this gives 1/0.25 = 4 samples per hour. However, because the objective of this study is to predict power consumption one hour ahead under different circumstances, the original time-series dataset was resampled into 1-h granularity. The process of resampling the data from a 15-min interval to a 1-h interval can be accomplished by utilizing various methods, such as aggregating the data. As mentioned in the proposed methodology, the dataset used in this study underwent preprocessing techniques including data normalization, data restructuring using the sliding window method, and dataset splitting. The aim of this process is to provide a suitable dataset for a deep learning model.
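The study performed this resampling in MATLAB; an equivalent in pandas is sketched below. The readings are toy values, and the mean is shown as one plausible aggregation for kW readings (summing would instead be appropriate for energy in kWh).

```python
import pandas as pd

# Toy stand-in for the 15-min power-quality readings (kW)
idx = pd.date_range("2019-01-01 00:00", periods=8, freq="15min")
quarter_hourly = pd.Series([100, 110, 120, 130, 140, 150, 160, 170.0], index=idx)

# 4 samples/hour -> 1 sample/hour by averaging each hour's four readings
hourly = quarter_hourly.resample("60min").mean()
print(hourly.tolist())   # -> [115.0, 155.0]
```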
The original power consumption pattern within the dataset is depicted in Figure 8, indicating a consistent growth in electricity consumption from year to year. Upon extracting the dataset for year-by-year analysis (refer to Table 2), it was observed that the annual energy consumption escalated from 682 MWh in 2019 to 811.5 MWh in 2020, and further rose to 1190.9 MWh by the end of 2021 (see Figure 9). Furthermore, when considering specific circumstances, the dataset exhibits distinct patterns in relation to day categories and season categories (see Figure 9). Winter accounts for the highest percentage of annual electricity consumption, surpassing 27%, followed by autumn, spring, and summer. Weekdays exhibit higher electricity consumption compared to weekends, with over 80% of energy consumption occurring on weekdays. These trends suggest that load demand experiences seasonal and weekday/weekend fluctuations. Consequently, this study focuses on investigating the hypothesis that the performance of deep learning models for hour-ahead electric load forecasting varies significantly under different conditions in the metallurgy industry. Thus, separate forecasting models are required for each condition to accurately predict load demand.


Results and Discussion
In this study, a series of deep learning models encompassing various variants were developed. The objective was to create distinct models that could function autonomously for each predefined condition. Consequently, during the training phase of the experimental development, six baseline deep learning models and our proposed models were trained and compared for each season and day category condition. The purpose of this comparison was twofold: first, to determine the optimal variant of the deep learning model for specific conditions and second, to ascertain any significant disparities in the accuracy of hour-ahead electric load forecasting across different deep learning models under varying conditions within the metallurgy industry.
In this study, the Variational Mode Decomposition (VMD) was proposed as a solution to address the nonlinear and non-stationary characteristics of the dataset. Our proposed approach involves decomposing the training and testing datasets using VMD before inputting the data into our hybrid models, which integrate the CNN-LSTM and CNN-GRU architectures. The VMD function employed in this study was based on the methodology outlined in reference [49]. To configure the input parameters of the VMD function, we set the moderate bandwidth constraint to 2000, the noise tolerance to 0, and the number of modes to 3, and initialized the center frequencies uniformly (init = 1). The output of the VMD algorithm yielded a collection of decomposed modes. Figure 10 illustrates the original time-series dataset presented under different seasonal and day category conditions, accompanied by the collection of decomposed modes revealed by VMD. The dataset shown in Figure 10 represents the testing dataset utilized in our study, which is associated with certain conditions.
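For readers who want to reproduce the decomposition step, a minimal NumPy sketch of VMD is given below. It is patterned on the public vmdpy reference implementation of Dragomiretskiy and Zosso's algorithm, not on the exact function of reference [49]; the parameter names mirror the settings quoted above (bandwidth constraint alpha, noise tolerance tau, K modes, uniform initialization).

```python
import numpy as np

def vmd(signal, K=3, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Minimal VMD sketch. Returns (modes, center_freqs), where modes has
    shape (K, len(signal)) and center_freqs are in cycles/sample, ascending."""
    T = len(signal)
    # mirror both halves to soften boundary effects; mirrored length N = 2T
    f = np.concatenate([signal[:T // 2][::-1], signal, signal[T // 2:][::-1]])
    N = len(f)
    freqs = np.arange(N) / N - 0.5                 # centred frequency axis
    f_hat_plus = np.fft.fftshift(np.fft.fft(f))
    f_hat_plus[:N // 2] = 0                        # keep positive half only
    u_hat = np.zeros((K, N), dtype=complex)        # mode half-spectra
    omega = 0.5 * np.arange(K) / K                 # uniform initial centres
    lam = np.zeros(N, dtype=complex)               # Lagrangian multiplier
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-style update of mode k around its centre frequency
            u_hat[k] = (f_hat_plus - others - lam / 2) / (
                1.0 + alpha * (freqs - omega[k]) ** 2)
            power = np.abs(u_hat[k, N // 2:]) ** 2
            omega[k] = (freqs[N // 2:] @ power) / power.sum()
        lam = lam + tau * (u_hat.sum(axis=0) - f_hat_plus)
        if np.sum(np.abs(u_hat - u_prev) ** 2) / N < tol:
            break
    # rebuild conjugate-symmetric full spectra and invert to the time domain
    full = np.zeros((K, N), dtype=complex)
    full[:, N // 2:] = u_hat[:, N // 2:]
    full[:, N // 2:0:-1] = np.conj(u_hat[:, N // 2:])
    full[:, 0] = np.conj(full[:, -1])
    modes = np.real(np.fft.ifft(np.fft.ifftshift(full, axes=-1), axis=-1))
    modes = modes[:, N // 4:3 * N // 4]            # drop the mirrored ends
    order = np.argsort(omega)
    return modes[order], omega[order]

# toy example: two well-separated tones, 1000 samples over one second
t = np.arange(1000) / 1000.0
sig = np.cos(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 50 * t)
modes, centres = vmd(sig, K=2)
print(centres * 1000)   # estimated tone frequencies in Hz (approx. 5 and 50)
```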
Upon decomposing the signal, a thorough analysis was conducted on the resulting decomposed signal obtained through the variational mode decomposition (VMD) to investigate the inherent characteristics of the dataset. Numerous approaches can be employed to select the modes from the VMD. Throughout the developmental stage, a meticulous visual inspection of the decomposed modes enabled us to select Mode 2 as the input signal for our hybrid deep learning model. This visualization-oriented approach, exemplified by the utilization of plots in Figure 10, not only enhances the interpretability of our findings, but also effectively communicates the outcomes of our study.
Prior to feeding the dataset into the models employed in this study, careful consideration must be given to the data size and dimensions. This is because the utilized models were constructed using distinct architectures, resulting in varying requirements for input data treatment. For instance, the MLP model exclusively accepts a 2D array representation of the dataset, whereas other models, such as the LSTM, GRU, CNN, CNN-LSTM, and CNN-GRU, require a distinct dimensional dataset, specifically a 3D array. To accommodate these divergent requirements, the initial data preprocessing stage involves generating training datasets for each condition in a 2D array format, with data components denoted as [samples, time steps]. The number of samples corresponds to the rows in the dataset, and the number of time steps represents the inputs for the model. In this study, the time step value was set to six, utilizing the sliding window method to transform the sequence of time-series data into a supervised learning format. This entailed using the last six hours of the time-series sequence as input to predict the subsequent hour. To meet the requirements of other models that require a 3D array as the input dataset, it is necessary to convert the 2D array into a 3D array, incorporating an additional component referred to as features. The 3D array is structured with dimensions denoted as [samples, time steps, features], where the features correspond to the number of columns in each sample. In this study, the feature value was set to one for the dataset.
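The 2D-to-3D conversion described above is a single reshape; the toy array below is illustrative.

```python
import numpy as np

# 2D supervised dataset: [samples, time steps] -- here 4 samples of 6 lags
X_2d = np.arange(24.0).reshape(4, 6)

# The MLP consumes the 2D array as-is; the LSTM, GRU, CNN, CNN-LSTM, and
# CNN-GRU models need [samples, time steps, features], with features = 1
# for a univariate series
X_3d = X_2d.reshape(X_2d.shape[0], X_2d.shape[1], 1)
print(X_2d.shape, X_3d.shape)   # -> (4, 6) (4, 6, 1)
```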
During the model development phase, we conducted training and testing of various baseline models, as well as our proposed models, VMD-CNN-LSTM and VMD-CNN-GRU, using diverse datasets under predetermined conditions. In the training stage, it is crucial to configure the hyperparameter settings in deep learning models to regulate and optimize the model's behavior and performance during the training process. Hyperparameters are predetermined parameters that are not learned from data. To ensure consistency across different deep learning model structures, we applied similar hyperparameter settings to both the proposed and baseline models. Initially, the selection of hyperparameter settings for deep learning models involved a combination of systematic experimentation, following best practices, and utilizing domain knowledge. Therefore, it can be asserted that there is no fixed answer to defining the hyperparameter setting (excluding hyperparameter auto-tuning). In this study, we utilized previous research to establish the configuration of the hyperparameters. For instance, the training optimizer employed was the Adam optimizer, as implemented in [50,51]. The chosen loss function for calculating the error of the deep learning model's prediction against the provided target value is the mean squared error (MSE), based on [41,52]. Furthermore, the number of epochs was set to 100, as indicated in [40]. The validation split is equally divided with a value of 0.2, referencing study [14]. Additionally, the batch size, which defines the number of samples that must be processed before updating the internal model parameters, was set to a default value of 32.
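The quoted hyperparameters map onto a Keras training call as sketched below. The toy data, the small GRU model, and the two-epoch run are our assumptions to keep the example fast; the study itself used 100 epochs.

```python
import numpy as np
from tensorflow.keras import layers, models

# toy stand-in data: 100 samples of 6 lags -> one target each
rng = np.random.default_rng(0)
X = rng.random((100, 6, 1)).astype("float32")
y = rng.random((100, 1)).astype("float32")

model = models.Sequential([
    layers.Input(shape=(6, 1)),
    layers.GRU(16),
    layers.Dense(1),
])
# Hyperparameters quoted in the text: Adam optimizer, MSE loss,
# batch size 32, validation_split = 0.2 (epochs reduced here from 100 to 2)
model.compile(optimizer="adam", loss="mse")
history = model.fit(X, y, epochs=2, batch_size=32,
                    validation_split=0.2, verbose=0)
print(sorted(history.history))   # -> ['loss', 'val_loss']
```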
In this section, we undertake a comparative analysis between our proposed model and the baseline models to elucidate key distinctions. Our analysis centers on two pivotal parameters: the duration required for training and the model evaluation outcomes, encompassing the utilization of statistical metrics, such as RMSE, MAE, and MAPE. The quantification of the training time during the training phase of the deep learning model serves as a reliable gauge for the duration required to train the model on a given dataset. This evaluative measure facilitates the assessment of a model's efficiency and computational prerequisites, thereby enabling the identification of models that optimally align with specific requirements. Concurrently, the appraisal of model evaluation assumes a vital role in gauging the accuracy and efficacy of predictive models deployed on previously unseen datasets, commonly known as the testing dataset. Table 3 provides a comprehensive comparison of the computation times for the DL prediction models during the training phase for different datasets. It is evident from Table 3 that the GRU model takes the longest time to train, not only for seasonal forecasts but also for working days and weekend datasets, followed by the LSTM model. This can be attributed to the sequential processing nature of both GRU and LSTM models, wherein each subsequent step relies on the output of the preceding step [14]. Conversely, the Multilayer Perceptron (MLP) model was the fastest DL forecasting model for training. The proposed models, VMD-CNN-LSTM and VMD-CNN-GRU, exhibit training times that fall within a moderate range, occupying a position between the longer durations observed in certain models and the swiftness of the MLP. Nonetheless, upon meticulous examination, these models demonstrate a satisfactory suitability for load forecasting development scenarios across diverse conditions.
Regarding the assessment of model performance, we employed three previously mentioned metrics to evaluate the accuracy of all the trained deep learning (DL) models. These metrics facilitate the comparison of model performance across individual test subsets, allowing for the identification of the most appropriate DL model for load prediction under specific seasonal and day category conditions. During the model evaluation stage, we assessed all baseline models and proposed models that were previously trained and saved, based on their respective conditions, using separate testing datasets tailored to those conditions. As presented in Table 4, our proposed models demonstrated lower scores in every seasonal and day category condition compared to the baseline models. This outcome signifies that the proposed models exhibit good performance in predicting the electrical load of a metallurgy plant one hour ahead compared to the baseline models.
Specifically, for the winter dataset, the VMD-CNN-LSTM model outperformed the other models, with an RMSE of approximately 16.805 kW and a MAE of 12.634 kW. However, in terms of the MAPE score, the VMD-CNN-GRU model performed better, with a score of 0.153%. For the spring and autumn datasets, the VMD-CNN-LSTM model demonstrated excellent performance compared to the other models under evaluation. Conversely, in the summer dataset, the VMD-CNN-GRU model exhibited outstanding results in terms of both the RMSE metric, with a score of approximately 14.389 kW, and MAE metric, with a score of 11.376 kW. Notably, both the proposed models yielded the same MAPE score (0.116%) in this scenario. Regarding the categorization of days, the proposed VMD-CNN-LSTM and VMD-CNN-GRU models exhibited superior performance compared with the baseline models, as reflected by lower scores in terms of RMSE, MAE, and MAPE. Specifically, in the working days dataset, the VMD-CNN-GRU model demonstrated outstanding performance compared to the VMD-CNN-LSTM model, achieving the lowest scores of approximately 12.115 kW for RMSE, 9.818 kW for MAE, and 0.079% for MAPE. Similarly, in the weekend dataset, the VMD-CNN-GRU model continues to outperform, yielding scores of approximately 11.075 kW for RMSE, 8.367 kW for MAE, and 0.137% for MAPE.
If the evaluation results of each type of model are accumulated using an average value based on the metric type, it can be observed that the VMD-CNN-LSTM model has an average MAE of 9.543 kW for all seasonal conditions, whereas the VMD-CNN-GRU model has an MAE of 10.453 kW. From an RMSE perspective, the VMD-CNN-LSTM model had an average value of 12.21 kW, whereas the VMD-CNN-GRU model had an average RMSE value of 13.41 kW. Regarding the average MAPE value obtained from aggregating data from each season, that of the VMD-CNN-LSTM model was 0.095%, whereas that of the VMD-CNN-GRU model was 0.10%. Therefore, it can be concluded that the VMD-CNN-LSTM model performs better in predicting the electrical load one hour ahead under different seasonal conditions. When considering the results based on day conditions (average value of working days and weekend results), it is evident that the VMD-CNN-GRU model has an average RMSE value of 11.595 kW, whereas the VMD-CNN-LSTM model has an average of 12.21 kW. In terms of MAE, the VMD-CNN-GRU model achieved an average value of 9.09 kW, whereas the VMD-CNN-LSTM model had an average of 9.54 kW. Regarding the accuracy of the MAPE metric, the average value for the VMD-CNN-GRU model was 0.079%, whereas that for the VMD-CNN-LSTM model was 0.11%. Based on these results, it can be concluded that the VMD-CNN-GRU model demonstrates fairly good performance in predicting the electrical load one hour ahead under day category conditions.
In this experiment, the inclusion of Variational Mode Decomposition (VMD) techniques significantly improves the performance of the hybrid CNN-LSTM and CNN-GRU models. This enhancement is evidenced by achieving lower scores across all metric evaluations compared with the baseline models of CNN-LSTM and CNN-GRU, which do not incorporate VMD. The effectiveness of VMD has been further highlighted in previous studies, such as the work referenced in [53], where VMD was utilized in conjunction with a CNN-LSTM model. The experimental results of that study demonstrated superior performance compared with a recent method employing the same database, achieving an average accuracy of 98.65%. Likewise, in reference [54], the author employed a hybrid VMD-CNN-GRU model for the short-term forecasting of wind power. The proposed model exhibited exceptional performance in short-term forecasting, with notable metrics such as an RMSE of 1.5651, a MAE of 0.8161, a MAPE of 11.62%, and an R2 value of 0.9964. These results demonstrate the effectiveness of the proposed model in accurately predicting the wind power in the short term. Reference [55] introduced an integrated hybrid model called CNN-LSTM-MLP, which incorporates error correction and the variational mode decomposition (VMD). This study claims that the proposed model surpasses numerous conventional alternative approaches in terms of both accuracy and robustness. In reference [56], a novel approach to sparrow search algorithms integrated with VMD-LSTM was presented. The proposed model demonstrates significant enhancements in prediction accuracy and a reduction in wind power prediction errors compared with alternative methods. These findings provide empirical evidence supporting the effectiveness of the proposed prediction model. Figure 11 shows the forecast results of the proposed VMD-CNN-LSTM and VMD-CNN-GRU models for predicting the electrical load of the metallurgy plant one hour ahead under different seasonal conditions.
Figure 11a illustrates the results of a one-day load forecast during the winter season, covering the period from '2021-12-05T08:00:00' to '2021-12-06T07:00:00'. Similarly, Figure 11b presents the prediction outcome of a one-day electrical load forecast in the metallurgy sector during the spring season, spanning from '2021-05-04T15:00:00' to '2021-05-05T14:00:00'. Furthermore, Figure 11c provides the one-day forecast of electrical load during the summer season, encompassing the time range from '2021-08-04T15:00:00' to '2021-08-05T14:00:00'. Finally, Figure 11d displays the prediction results for electrical load during the autumn season, specifically covering the period from '2021-11-03T22:00:00' to '2021-11-04T21:00:00'. The forecast results of the electrical load for different day categories are depicted in Figure 12. This figure displays the one-day forecast of the electrical load. Specifically, Figure 12a presents the forecasted electrical load for a working day (Wednesday), spanning from '2021-09-15T01:00:00' to '2021-09-16T00:00:00'. On the other hand, Figure 12b showcases the forecast result of power consumption in a metallurgy plant during the weekend (Sunday), covering the period from '2021-09-12T01:00:00' to '2021-09-18T00:00:00'.

Conclusions
Electric load forecasting is of paramount importance for power system operation and planning as well as in the electricity market. Thus, there is an urgent need for an efficient approach to solve the load forecasting problem. In this study, the time factor was considered a crucial variable that could affect the load demand patterns. Consequently, the timestamp of the dataset was distinguished, and the data points were grouped based on similar conditions. It was found that the load demand varied according to seasons, working days, and weekends.
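The condition-based grouping of timestamps described above can be sketched in a few lines of Python. The month-to-season mapping (meteorological seasons) and the helper name `condition` are illustrative assumptions, not the paper's actual implementation:

```python
# Assign each ISO-8601 timestamp a season and a day category, so the dataset
# can be split into condition-specific sub-datasets. The month-to-season
# mapping assumes meteorological seasons (an illustrative choice).
from datetime import datetime

SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

def condition(ts):
    """Return (season, day category) for an ISO-8601 timestamp string."""
    dt = datetime.fromisoformat(ts)
    day = "weekend" if dt.weekday() >= 5 else "working day"
    return SEASONS[dt.month], day

print(condition("2021-12-05T08:00:00"))  # 2021-12-05 is a Sunday in winter
```

Grouping the load samples by these labels yields the seasonal and day-category sub-datasets on which the models were trained and evaluated.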
The proposed methodology aimed to develop a customized forecasting model to determine the load demand one hour ahead under specific conditions. In this study, we proposed integrating variational mode decomposition (VMD) with hybrid CNN-LSTM and CNN-GRU models. The main purpose of VMD is to handle the nonlinearity and non-stationarity of the power consumption data fed to the deep learning models. The CNN autonomously extracts intricate spatial features from the electrical load data, while the GRU or LSTM directly derives temporal features from previously recorded input data. Several popular deep learning models, including MLP, LSTM, GRU, CNN, CNN-LSTM, and CNN-GRU, were used as baseline models and compared with the proposed VMD-CNN-GRU and VMD-CNN-LSTM models to assess their performance and effectiveness.
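The decompose-predict-aggregate pattern underlying these hybrids can be sketched minimally as follows. To keep the example self-contained, a centred moving average stands in for the VMD modes and a naive persistence predictor stands in for the CNN-LSTM/CNN-GRU forecasters; these stand-ins are assumptions of the sketch, not the paper's method:

```python
# Sketch of the decompose-predict-aggregate pattern: split the load series
# into modes, forecast each mode separately, then sum the per-mode forecasts.

def moving_average(series, window=3):
    """Low-frequency component; a stand-in for a smooth VMD mode."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    return trend

def decompose(series):
    """Two-mode split: trend + residual. The modes sum back to the series."""
    trend = moving_average(series)
    residual = [x - t for x, t in zip(series, trend)]
    return trend, residual

def persistence_forecast(mode):
    """Placeholder per-mode predictor; the paper trains CNN-LSTM / CNN-GRU."""
    return mode[-1]

def forecast_one_hour_ahead(load):
    return sum(persistence_forecast(m) for m in decompose(load))

load = [310.0, 325.0, 298.0, 340.0, 355.0, 330.0]  # illustrative kW values
# Because the modes sum exactly and persistence repeats the last value,
# this equals the last observation: 330.0
print(forecast_one_hour_ahead(load))
```

In the actual models, VMD replaces the moving-average split (producing several intrinsic mode functions) and a trained CNN-LSTM or CNN-GRU replaces the persistence predictor for each mode.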
In this study, we conducted a comparative analysis between the proposed and baseline models, focusing on two key parameters: training duration and model evaluation using the statistical metrics RMSE, MAE, and MAPE. During the training stage, the GRU model required the longest training time across the various datasets, whereas the MLP model trained fastest on all condition-based sub-datasets. The proposed VMD-CNN-LSTM and VMD-CNN-GRU models exhibited moderate training times, falling between the longer durations of some baselines and the speed of the MLP, which makes them well suited to load forecasting under diverse conditions.
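For reference, the three evaluation metrics used throughout this comparison can be computed in plain Python; the load values below are illustrative, not taken from the dataset:

```python
# RMSE, MAE, and MAPE for a pair of actual/predicted load series.
# Values are in kW; MAPE is expressed as a percentage.
import math

def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

actual = [320.0, 335.0, 310.0, 300.0]  # illustrative hourly loads (kW)
pred   = [318.0, 340.0, 305.0, 304.0]
print(round(rmse(actual, pred), 3), round(mae(actual, pred), 3), round(mape(actual, pred), 3))
```

Lower values on all three metrics indicate a better forecast, which is the criterion used to rank the models above.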
In terms of model performance, the proposed models consistently outperformed the baselines, exhibiting lower RMSE, MAE, and MAPE scores across all datasets. Specifically, the VMD-CNN-LSTM model performed exceptionally well in predicting the electrical load one hour ahead under seasonal conditions, consistently achieving the lowest error metrics across most seasonal datasets. The VMD-CNN-GRU model, on the other hand, excelled in predicting the electrical load under the day-category conditions. Both proposed models demonstrate the advantage of incorporating VMD, which effectively enhances overall performance, as is evident when comparing them with the baseline CNN-LSTM and CNN-GRU models without VMD integration.
Based on our experimental findings, we conclude that deep learning models are highly effective for forecasting the electrical load one hour ahead, provided their hyperparameters are appropriately configured. Future research should therefore incorporate automatic hyperparameter tuning techniques to optimize the performance of the forecasting models. Furthermore, VMD has demonstrated its effectiveness in signal analysis and modeling, particularly for time-series analysis and forecasting tasks, and it enhanced the accuracy and performance of the models used in this study. As a result, we recommend the VMD-CNN-LSTM and VMD-CNN-GRU models for one-hour-ahead load forecasting in diverse scenarios, particularly those involving seasonal and daily variations. To further enhance forecasting capability, future research should consider integrating additional factors, such as weather data, and explore broader applications of deep learning models in predicting electricity consumption.