Multistep Forecasting of Power Flow Based on LSTM Autoencoder: A Study Case in Regional Grid Cluster Proposal

Abstract: A regional grid cluster proposal is required to tackle power grid complexities and evaluate the impact of decentralized renewable energy generation. However, implementing regional grid clusters poses challenges for power flow forecasting owing to the inherent variability of renewable power generation and diverse power load behavior. Accurate forecasting is vital for monitoring the power imported during peak regional load periods and the surplus power generation exported from the studied region. This study addressed the challenge of multistep bidirectional power flow forecasting by proposing an LSTM autoencoder model. During the training stage, the proposed model and the baseline models were developed with automated hyperparameter tuning to fine-tune the models and maximize their performance. The model utilized the last 6 h leading up to the current time (24 steps at 15 min intervals) to predict the power flow 1 h ahead (4 steps at 15 min intervals) from the current time. In the model evaluation stage, the proposed model achieved the lowest RMSE and MAE scores, with values of 32.243 MW and 24.154 MW, respectively. In addition, it achieved a good R² score of 0.93. These evaluation metrics demonstrate that the LSTM autoencoder outperformed the other models for the multistep forecasting task in a regional grid cluster proposal.


Introduction

Background
The reduction of greenhouse gas emissions is imperative, and a viable means to achieve this is by promoting the integration of renewable energy (RE) into power grids [1]. The proliferation of decentralized energy systems in electricity grid networks, mainly through the deployment of wind generators and photovoltaic (PV) systems, has fundamentally transformed the power supply system, transitioning it from a centralized, unidirectional structure to a decentralized, bidirectional one. However, because a considerable proportion of such systems are connected to the power grid at low voltage levels, novel challenges arise, including issues related to energy management and bidirectional power flow [2]. This is because the variability in electrical power generation, such as the intermittency of wind intensity and solar radiation, results in a mismatch between electricity demand and supply. Furthermore, the arrangement of installed renewable energy systems is heavily influenced by geographical and meteorological factors, as highlighted in the literature [3]. Hence, it is crucial to develop effective solutions to address these challenges [4].
According to the literature [2], which is our primary reference, regional analysis plays a vital role in addressing the aforementioned challenges, particularly in the context of weather-dependent renewable energies, because it accounts for the regional variations in power generation and consumption. A regional power grid model can facilitate modifying circuits to accommodate facilities or loads that may require power during peak load or feed surplus power to the main power supply.
A research gap exists in that the existing literature focuses on forecasting power flow in power systems but lacks specific research on multistep forecasting of bidirectional power flow in regional grid cluster study cases. Nevertheless, several studies have already addressed power flow forecasting. For instance, reference [22] introduces a novel deep learning model, called multichannel long short-term memory with time location (TL-MCLSTM), for multistep short-term power consumption forecasting in the smart grid. Similarly, reference [23] proposes a method for multistep time series forecasting utilizing an LSTM recurrent neural network (RNN). The proposed method provides several advantages, such as better data pattern fitting, less manual effort, and higher predictive accuracy. A further investigation of multistep forecasting has been conducted and reported in reference [24], where the authors employed a residual convolutional neural network (R-CNN) with a multilayered long short-term memory (ML-LSTM) architecture. The proposed methodology exhibited a substantial reduction in error rates compared with baseline models.
Another study, presented in reference [25], provides a 2D convolutional neural network (CNN) for multistep short-term electric load forecasting. The authors found that this model can significantly reduce the number of trainable parameters, along with the training time, model size, and computation requirements. Additionally, a similar study, mentioned in reference [14], utilized a CNN combined with a chaotic optimization algorithm for multistep short-term solar radiation forecasting. The authors claim that this model achieves accuracy and robustness, thereby improving the guidance for power grid dispatching. In [26], the multistep forecasting task on electricity load was solved by using a hybrid gated recurrent unit (GRU) with a feedforward neural network. The authors state that the proposed approach achieves better results than other methods in predicting the demand for charging stations in the short-term horizon.
Previous literature reviews have indicated that deep learning is a powerful method for the multistep forecasting task; however, more research is still needed in this domain. Additionally, there is a gap in the comparative analysis of different deep learning model variants for multistep forecasting of power flow in regional grid clusters. To address this research gap, this study has the research objective of developing a multistep forecasting approach for power flow within a regional grid cluster, specifically dedicated to bidirectional power flow. The multistep forecasting approach in this study is designed for four steps with a 15 min interval, which corresponds to forecasting 1 h ahead. This interval was chosen because we are working with a power measurement dataset that has a 15 min resolution. Forecasting 1 h ahead is intended to capture critical generation-load information [5] about the network cluster, such as the amount of power imported during high regional load and the surplus power generation exported from the investigated region.
The objective of this research is to conduct multistep forecasting of power flow within a regional grid cluster through the utilization of the LSTM autoencoder, a variation of the LSTM family of models. Several studies have employed deep learning models for multistep forecasting. Table 1 provides a comprehensive overview of the literature on power flow forecasting and the utilization of deep learning models for multistep forecasting. The existing literature suggests that the LSTM model is highly effective in short-term forecasting. Nevertheless, there are numerous LSTM model architectures that can be implemented for the same purpose. In the current study, we introduce the LSTM autoencoder as an appropriate model for multistep forecasting of power flow. This paper makes the following technical contributions:

• This study performs a comparative analysis between the LSTM autoencoder and four distinct LSTM family architectures for multistep forecasting, which, as far as the authors are aware, have not been subject to a comparative analysis in prior literature.

• This study presents a 1 h ahead (four steps at 15 min intervals) forecasting approach for power flow specifically tailored to a regional grid cluster application.

Table 1. Overview of the literature on power flow forecasting and deep learning models for multistep forecasting.

Power flow forecasting:
- Jost et al. [19]: using an extreme learning machine postprocessing technique to forecast the vertical power flow.
- Brauns et al. [20]: using an LSTM model with an updating process for vertical power flow forecasting.
- Paretkar et al. [21]: implementing Box–Jenkins ARIMA for predicting power flow in the short term on significant transmission interconnections.

Multistep forecasting:
- Sing et al. [25]: using a 2D CNN for multistep short-term electric load forecasting.
- Duan et al. [14]: proposing a CNN with a chaotic aquila optimization algorithm for multistep short-term solar radiation forecasting.
- Cheng et al. [26]: combining a GRU model and a feedforward neural network for multistep electricity load forecasting.

Our approach (multistep forecasting of power flow): proposing an LSTM autoencoder for multistep forecasting of power flow.
The subsequent sections of this paper are organized as follows: Section 2 presents a succinct summary of the deep learning model architectures employed. Section 3 elaborates on the case studies pertaining to the grid network cluster and the dataset utilized, while Section 4 delineates the proposed methodology. Section 5 showcases the outcomes and corresponding discussions, and Section 6 concludes the paper with closing remarks.

Long Short-Term Memory (LSTM) Structure
A recurrent neural network (RNN) is a type of deep learning model that is particularly well suited for processing sequential or time series data [27]. Due to its capacity for learning from training data, the RNN is frequently employed in solving ordinal or temporal problems. The RNN distinguishes itself from other deep learning models by incorporating a memory mechanism that allows it to leverage information from past inputs to influence the present input and output, in contrast to models that assume independence between inputs and outputs. The RNN is notorious for its susceptibility to the issues of exploding and vanishing gradients [28], which arise from the backpropagation through time (BPTT) algorithm employed to compute gradients during training. These problems can cause suboptimal performance and slow training for RNNs. To mitigate these issues, alternative models such as the long short-term memory (LSTM) and gated recurrent unit (GRU) models have been developed.
Hochreiter and Schmidhuber [29] initially introduced the long short-term memory model to address the issue of long-term dependence and alleviate the vanishing gradient problem, which is not feasible with the standard RNN model. The LSTM model (see Figure 1) is designed with memory cells and gates to effectively manage information flow and retain information over extended periods. As a result, the LSTM has become a popular deep learning model applied in a wide range of prediction and forecasting tasks. Broadly speaking, an LSTM network comprises memory blocks called cells, each having two states: the cell state and the hidden state. The LSTM network utilizes these cells to make critical decisions by selectively retaining or discarding information about significant components [7]. These components, called gates, are structured into forget gates, input gates, and output gates. As depicted in Figure 1, the LSTM model operates in three stages. During the first stage, the network employs the forget gate to determine which information is to be retained or discarded for the cell state. This process involves the input at the current time step (x_t) and the previous hidden state value (hs_(t−1)), both of which are subjected to the sigmoid function (Sg). The calculation for the forget gate (fg_t) is expressed as follows:

fg_t = Sg(w_f · [hs_(t−1), x_t] + b_f)

During the second phase, the network transforms the previous cell state (Cs_(t−1)) into a new cell state (Cs_t). This operation involves selecting the updated information that needs to be incorporated into the long-term memory (cell state). The updated cell state is obtained by considering the input gate (ig_t), the forget gate, and the cell update gate values (Cs′_t):

ig_t = Sg(w_i · [hs_(t−1), x_t] + b_i)
Cs′_t = tanh(w_c · [hs_(t−1), x_t] + b_c)
Cs_t = fg_t ⊙ Cs_(t−1) + ig_t ⊙ Cs′_t

Upon completion of the cell state update, the final step entails ascertaining the value of the hidden state (hs_t), which acts as the network's memory by retaining past information and facilitating predictions. This calculation incorporates the updated cell state and the output gate (og_t):

og_t = Sg(w_o · [hs_(t−1), x_t] + b_o)
hs_t = og_t ⊙ tanh(Cs_t)

The foregoing equations pertain to a single time step only. Consequently, they must be recomputed for each ensuing time increment. Accordingly, in the event of a 24-step series, the above equations are recomputed 24 times, once for each temporal phase.
The weight matrices (w_f, w_i, w_c, w_o) and biases (b_f, b_i, b_c, b_o) are stationary parameters, lacking temporal dependence. Hence, these matrices remain unaltered across successive time increments, that is, they persist as constants throughout the computation of output sequences for varying timesteps.
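As an illustration of the gate equations and the shared weights above, the following is a minimal NumPy sketch of a single LSTM cell unrolled over a 24-step window. The weights are random placeholders and the layer size (8 hidden units) is chosen arbitrarily for the example; this is a didactic sketch, not the trained model used in this study.

```python
# Minimal NumPy sketch of one LSTM cell step, following the gate
# equations above (fg = forget, ig = input, og = output gate;
# Cs = cell state, hs = hidden state). Weights are untrained,
# random placeholders.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hid = 1, 8  # univariate input, 8 hidden units (illustrative)

# Stationary parameters: one weight matrix and bias per gate,
# shared across all time steps.
w = {g: rng.normal(0, 0.1, (n_hid, n_hid + n_in)) for g in "fico"}
b = {g: np.zeros(n_hid) for g in "fico"}

def lstm_step(x_t, hs_prev, cs_prev):
    z = np.concatenate([hs_prev, x_t])        # [hs_(t-1), x_t]
    fg = sigmoid(w["f"] @ z + b["f"])         # forget gate
    ig = sigmoid(w["i"] @ z + b["i"])         # input gate
    cs_cand = np.tanh(w["c"] @ z + b["c"])    # cell update candidate
    cs = fg * cs_prev + ig * cs_cand          # new cell state
    og = sigmoid(w["o"] @ z + b["o"])         # output gate
    hs = og * np.tanh(cs)                     # new hidden state
    return hs, cs

# Unroll over a 24-step input window (15 min resolution = 6 h),
# reusing the same weights at every step.
hs, cs = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(24, n_in)):
    hs, cs = lstm_step(x_t, hs, cs)

print(hs.shape)  # (8,)
```

Note how the loop recomputes the same six equations 24 times while `w` and `b` stay fixed, which is exactly the stationarity of the parameters described above.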


LSTM Autoencoder
The LSTM autoencoder is a specific type of autoencoder designed to handle sequential data by incorporating LSTM layers [13]. This architecture, as depicted in Figure 2, is widely used in sequence-to-sequence tasks, such as time series forecasting. The input sequence is encoded by the first LSTM layer, which learns a compressed representation of the data. A dense layer can be added after the LSTM layer to extract essential features from the encoded representation before passing it to the repeat vector layer. The repeat vector layer repeats the encoded representation multiple times, enabling it to be decoded back into the original sequence format. The second LSTM layer decodes the repeated vector and reconstructs the original sequence. Refining the reconstructed sequence and improving its fidelity to the input can be achieved by adding another dense layer after the LSTM layer. It is worth noting that the number of units and layers used in each LSTM and dense layer can vary depending on the specific task and data under consideration. Moreover, it is critical to train the model with an appropriate loss function, because the loss function is part of the optimization algorithm: it estimates the loss of the model, allowing the weights to be updated so as to reduce the loss in subsequent evaluations. Additionally, different loss functions can have varying impacts on deep learning models, as they capture different aspects of the optimization problem. Therefore, the choice of loss function depends on the specific task and the behavior of the model [30].
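The encode–repeat–decode data flow described above can be sketched at the shape level. In the sketch below, plain tanh recurrences stand in for the LSTM layers to keep the example short, the weights are untrained random placeholders, and the latent size of 16 is arbitrary; the point is only to make the tensor shapes of the repeat vector pipeline concrete for a 24-step input and a 4-step output.

```python
# Shape-level sketch of the encoder -> repeat vector -> decoder flow.
# Simple tanh recurrences stand in for the LSTM layers; all weights
# are random, untrained placeholders.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_lat, n_out_steps = 1, 16, 4

w_enc = rng.normal(0, 0.1, (n_lat, n_lat + n_in))
w_dec = rng.normal(0, 0.1, (n_lat, n_lat + n_lat))
w_out = rng.normal(0, 0.1, (1, n_lat))  # dense readout per output step

def encode(x_seq):
    """Run the recurrent encoder; return the final hidden state."""
    h = np.zeros(n_lat)
    for x_t in x_seq:
        h = np.tanh(w_enc @ np.concatenate([h, x_t]))
    return h                                   # compressed representation

def decode(latent):
    """Repeat the latent vector, then decode it step by step."""
    repeated = np.tile(latent, (n_out_steps, 1))  # repeat vector: (4, 16)
    h, outputs = np.zeros(n_lat), []
    for v in repeated:
        h = np.tanh(w_dec @ np.concatenate([h, v]))
        outputs.append(w_out @ h)                 # dense layer output
    return np.array(outputs)                      # (4, 1)

x = rng.normal(size=(24, n_in))   # 6 h of 15 min measurements
y_hat = decode(encode(x))
print(y_hat.shape)  # (4, 1): a 4-step (1 h ahead) output sequence
```

The repeat vector is the key structural step: it converts the single compressed vector produced by the encoder into a short sequence that the decoder can unroll over, which is what allows the 24-step input and 4-step output to have different lengths.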


Grid Network Cluster
The organization of power grids into distinct voltage levels enables the efficient transmission and distribution of electrical energy across various equipment, such as transformers and transmission lines. However, the recent proliferation of dynamic grid topologies and the installation of renewable power generation systems, such as photovoltaic (PV) and wind systems, within distribution systems have introduced bidirectional power flow through transformers and posed significant challenges to the overlaid grid system. This has been further exacerbated by the increasing usage of feedlines by new commercial and industrial loads in distribution grids, contributing to high power transport. The inherent variability of power generation from renewable energy resources and the diverse behavior of power loads have made power flow forecasting a formidable task.
To address these challenges, a regional grid network cluster has been developed to simplify the power grid system and facilitate the analysis of decentralized power generation from renewable energy sources, following our preliminary result in the literature [2]. This cluster is designed to be located at the connection point between the transmission system operator (TSO) and the distribution system operator (DSO) through a grid reduction procedure, enabling a comprehensive analysis of power generation and consumption patterns and the loads on the power lines. This analytical tool offers valuable insights into the behavior of power systems under different scenarios and conditions and provides a basis for designing, optimizing, and predicting local power systems while integrating different generation and consumption sources. Similarly, reference [31] proposed a clustering of power networks to decompose a large interconnected power network into smaller, loosely coupled groups to facilitate easy and flexible management of the power transmission systems, allowing secondary voltage control at regional levels and controlled islanding that aims to prevent the spreading of large-area blackouts. Another study [32] proposed power grid network partitioning and clustering for splitting a power grid system into separate parts with self-sufficient power generation; internal connectivity is maximized within the individual clusters, while their power deficiency or surplus is minimized.
The importance of grid network clusters extends beyond the analysis of existing power systems, as they can also aid in the design and optimization of power systems and the prediction of power exchange between external grid systems [2]. Figure 3 illustrates an example of a regional electrical grid topology that encompasses low-voltage (LV), overlaid medium-voltage (MV), and high-voltage (HV) levels under the distribution grid system. The region receives power from two connected substations, and the circled area delineates one network cluster. Within this network cluster, a multitude of power generation units and loads are aggregated from different voltage levels. Our study focuses on the feedlines on both sides of the network cluster, which consist of six feedlines supplied by two connected substations, as detailed in the literature [2]. By measuring the power flow in the feedlines, researchers and grid system operators can gain a better understanding of the system's behavior and identify the potential power balance between local power generation and consumption.


Bidirectional Power Flow Dataset
This study focuses on a regional high-voltage subnet situated in the north-east region of Germany, which has already been documented in the literature [2]. For our investigation, we utilized a simplified grid, depicted in Figure 4, which is a visual representation of a network cluster comprising six feedlines that supply power to and receive power from two interconnected substations, namely Sub_A and Sub_B. Four feedlines (Line 3, Line 4, Line 5, Line 6) are connected to Sub_A, while two feedlines (Line 1, Line 2) are connected to Sub_B. Based on the simplified grid, the actual implementation involves the interconnection of lines in a parallel manner: Line 1 and Line 2 are parallel, Line 3 and Line 4 are parallel, and Line 5 and Line 6 are parallel. Consequently, based on observations from data measurements, it has been inferred that the parallel lines in the cluster exhibit similar power flow patterns, which are distinct from those of the remaining lines, as illustrated in Figure 5.
The power measurement of the feeder lines enables us to acquire vital information about the generation and load of the grid cluster, such as the amount of power imported during periods of high regional load and the quantity of surplus generation exported from the cluster under investigation [2,7]. In this study, real power measurement data are utilized to analyze and predict the regional power balance. To this end, we acquired directional feedline power measurement data with a 15 min temporal resolution from the local distribution system operators. These directional power flow data span from 1 January 2019 to 31 December 2019, and an instance of power flow in the network cluster studied in January 2019 is presented in Figure 5. Because the power measurement is bidirectional, the sign of the active power indicates the direction of power flow between the busbar and the cluster: a positive value signifies that power flows from the busbar to the cluster (imported power), while a negative value indicates power flow from the cluster to the busbar (exported power).
In this study, the primary objective is to predict the power net 1 h in advance, which refers to the total power flow of all feedlines in the investigated network cluster. The power net denotes the power flowing either from the busbar to the cluster or from the cluster to the busbar. As illustrated in Table 2, which shows an example of the dataset used, the dataset contains the power values from all feedlines and the power net. Mathematically, the power net at a specific point in time (i) is calculated by summing the power flowing at time (i) through Line 1, Line 2, Line 3, Line 4, Line 5, and Line 6:

P_net(i) = P_Line1(i) + P_Line2(i) + P_Line3(i) + P_Line4(i) + P_Line5(i) + P_Line6(i)
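As a concrete illustration of this summation and the sign convention, the snippet below computes the power net for one timestamp; the line readings are hypothetical values invented for the example.

```python
# Power net at time i = sum of the six feedline powers at time i.
# The readings below are made-up values, chosen only to illustrate
# the sign convention (positive = import from the busbar into the
# cluster, negative = export from the cluster to the busbar).
def power_net(line_values):
    return sum(line_values)

lines_t = [12.5, 11.8, -40.2, -38.9, 5.1, 4.7]  # MW, hypothetical
net = power_net(lines_t)
print(round(net, 1))  # -45.0 MW

# Negative net: the cluster exports surplus generation to the busbar.
direction = "export" if net < 0 else "import"
```

Here the large negative flows on Lines 3 and 4 dominate, so the cluster as a whole exports roughly 45 MW at this timestamp.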

Proposed Methodology
In this study, relevant primary data on bidirectional power flow were gathered from the examined power grid, and data cleansing and filtration were conducted prior to their use. The resulting high-quality data facilitated the training and testing of the proposed deep learning model for power flow forecasting in a simplified network cluster. The proposed methodology comprises three main stages after the data collection stage: data preprocessing, model construction, and model evaluation. An overview of the proposed methodology is illustrated in Figure 6.

Data Collection and Data Preprocessing
Data collection is a crucial step because all further steps depend on the availability of the data. It involves gathering all the necessary data from available sources. In this study, we solely utilized univariate time series data of the total bidirectional power flow of all feedlines in the investigated network cluster (power flow net). The reason for this is the lack of data availability for other external inputs, such as weather variables. Moreover, our study indicates that weather data have no strong correlation with the power flow net. This is because the regional grid network includes a combination of inherent variability in power generation from renewable energy resources and the diverse behavior of power loads.

After the data collection step, data preprocessing plays a pivotal role in transforming raw data into a compatible format for deep learning models by incorporating various techniques.In the present study, diverse methodologies were employed, including handling missing values, data normalization, sliding window, and dataset partitioning.

Step 1: Dealing with Missing Values
As the measurement data collected may contain missing values, which may result from device measurement malfunctions or errors in data collection, it is essential to address them to prevent potential sampling bias.Moreover, forecasting models typically require continuous and complete time series data [33], making it necessary to handle missing values appropriately.In this study, we employed the interpolation method to fill in missing values in the time series dataset by estimating them based on neighboring data points.
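A minimal sketch of the interpolation step described above, shown on hypothetical power net values with pandas' linear interpolation (one common way to estimate a missing point from its neighbors):

```python
import numpy as np
import pandas as pd

# Hypothetical 15 min power net series (MW) with two gaps from measurement dropouts.
idx = pd.date_range("2019-01-01 00:00", periods=6, freq="15min")
power = pd.Series([120.0, np.nan, 130.0, 135.0, np.nan, 145.0], index=idx)

# Linear interpolation estimates each missing value from its neighboring points.
filled = power.interpolate(method="linear")
print(filled.tolist())
```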

Step 2: Data Normalization
The dataset used in this study comprises bidirectional power flow data with varying scales.This difference in scale can have an impact on the performance of deep learning models during the learning process [34].Therefore, it is necessary to normalize the dataset to mitigate this issue.In this study, we employed the numerical scaling method of min-max normalization.The formula for converting the original values to normalized values is shown in the following Equation (8).
x' = (x - min(x)) / (max(x) - min(x))    (8)

where x' is the normalized value, x is the original value, and max(x) and min(x) are the maximum and minimum values of x.
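Min-max normalization can be sketched directly from Equation (8); the sample values below are hypothetical:

```python
import numpy as np

def min_max_normalize(x):
    """Scale values to [0, 1] per Equation (8): x' = (x - min(x)) / (max(x) - min(x))."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Bidirectional power flow can be negative (export) or positive (import).
values = np.array([-50.0, 0.0, 25.0, 100.0])
print(min_max_normalize(values))  # smallest value maps to 0, largest to 1
```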

Step 3: Sliding Window
Following the normalization of the dataset, a sliding window technique was employed to convert the structured time series data into a supervised learning format comprising multiple subsequences [35]. This approach was necessary because the forecasting model addresses a supervised learning problem, where the dataset must include input patterns (x) and output patterns (y). The sliding window approach leveraged previous time steps as the input variables and the values of the following time steps as the output variables. This process involved sliding a window of a fixed size along the time series dataset to generate multiple subsequences.
The primary objective of this study was to forecast power flow 1 h ahead. As a result, the time series data were transformed into the format required for multistep forecasting, which involves predicting multiple future time steps in a sequence. The optimal lengths of the input and output variables in a sliding window approach depend on several factors, such as the specific time series data and the patterns and dependencies within them. Therefore, there is no single universally optimal length for the input and output variables. However, several considerations can guide the selection of these lengths.
One such consideration is the data granularity. Granularity can affect the window size: if the data have fine-grained observations (such as hourly or daily), a smaller window size may suffice to capture relevant patterns, whereas data aggregated at a higher level (e.g., monthly or yearly) may require a larger window. In this study, the lengths of the input and output variables were determined based on the granularity of the data and on computational time, recognizing that a larger or smaller window size changes the computational requirements during the model training stage. Based on our observations, the last 6 h leading up to the current time (24 steps of 15 min intervals) were used as input data, and the 1 h ahead of the current time (4 steps of 15 min intervals) was used as output. The sliding window approach adopted is illustrated in Figure 7, where the yellow bar represents the length of the input variable, the red bar represents the output variable, and the blue bar represents the current time.
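The windowing described above can be sketched as follows; the helper below is illustrative, not the authors' actual implementation:

```python
import numpy as np

def sliding_window(series, n_in=24, n_out=4):
    """Convert a univariate series into (X, y) pairs: n_in past steps -> n_out future steps."""
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out])
    return np.array(X), np.array(y)

series = np.arange(100.0)          # stand-in for the normalized power net series
X, y = sliding_window(series)      # 24 steps (6 h) in, 4 steps (1 h) out
print(X.shape, y.shape)            # (73, 24) (73, 4)
```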

Step 4: Dataset Splitting
Data splitting is a crucial stage that involves dividing a dataset into training, validation, and testing sets. The training dataset is used to train a deep learning model, while the validation dataset is used to evaluate the performance of the model during the training process. Moreover, the testing dataset is used to assess the final performance and generalization capabilities of the trained model. In this study, we performed data splitting after reorganizing the structure of the time series dataset into a supervised learning format. There is no optimal percentage ratio for splitting the dataset. However, existing references indicate several ways to divide it. For example, the studies in references [15,36] split the dataset with a ratio of 90% for training and 10% for testing. In [7,37-41], a scenario of 70% for the training dataset, 15% for the validation dataset, and 15% for the test dataset is used. Based on these references, our study specifically allocated 80% of the total dataset for training, 10% for validation, and 10% for testing.
In the splitting process, our study does not recommend dividing the dataset randomly into training, validation, and testing sets when performing forecasting tasks. This is because the time series data used in this study have a temporal order, and the goal is to make predictions on future data based on past observations. Randomly shuffling and splitting the dataset can therefore invalidate the forecasting task through future information leakage and performance estimation bias.
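The chronological 80/10/10 split can be sketched as follows; the helper is illustrative, not the study's code:

```python
import numpy as np

def chronological_split(X, y, train=0.8, val=0.1):
    """Split supervised samples in temporal order (no shuffling) into train/val/test."""
    n = len(X)
    n_train = int(n * train)
    n_val = int(n * val)
    return ((X[:n_train], y[:n_train]),
            (X[n_train:n_train + n_val], y[n_train:n_train + n_val]),
            (X[n_train + n_val:], y[n_train + n_val:]))

X = np.arange(1000).reshape(1000, 1)   # placeholder supervised inputs
y = np.arange(1000).reshape(1000, 1)
train, val, test = chronological_split(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 800 100 100
```

Because the samples are never shuffled, every validation and test sample lies strictly after the training period, avoiding future information leakage.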

Model Construction with Autotune Hyperparameter
During the model construction stage, we developed several baseline models to assess the proposed model's performance. Common baseline models used for forecasting tasks include the simple RNN [8,42], LSTM [43,44], GRU [45,46], and bidirectional LSTM [13,47], whereas the proposed model was an LSTM autoencoder. All models used in this study were developed with the TensorFlow [48] and Keras [49] libraries. The structures and layers of all models are shown in Table 3. During model development and training, all deep learning models were built with autotuned hyperparameters. The main reason for this was to automatically search for optimal hyperparameter values, providing benefits such as improved performance, time efficiency, and resource efficiency.
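Since Table 3 is not reproduced here, the following is only a minimal sketch of an LSTM autoencoder of the kind described, with illustrative layer sizes rather than the tuned values; the compile settings (Adam optimizer, MSE loss) follow the study:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

N_IN, N_OUT, N_FEATURES = 24, 4, 1  # 6 h in, 1 h out, univariate power net

# Encoder-decoder (autoencoder-style) LSTM for multistep forecasting:
# the encoder compresses the input window into a latent vector, which the
# decoder unrolls over the 4 output steps.
model = models.Sequential([
    layers.Input(shape=(N_IN, N_FEATURES)),
    layers.LSTM(64),                          # encoder: summarize the 6 h window
    layers.RepeatVector(N_OUT),               # repeat latent vector per output step
    layers.LSTM(64, return_sequences=True),   # decoder: generate the 4-step sequence
    layers.TimeDistributed(layers.Dense(1)),  # one power value per output step
])
model.compile(optimizer="adam", loss="mse")   # settings used in the study
print(model.output_shape)  # (None, 4, 1)
```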

Model Evaluation
The process of model evaluation is a crucial step in assessing the precision and performance of all compared models using metric scores.In this study, prior to implementing model evaluation, the prediction results from the models and testing dataset were transformed into their original values, since their prior form was in a normalized state.
In this study, the selection of evaluation metrics was based on recommendations derived from previous research and reports in the domain of predictive modeling.These metrics encompassed the root mean square error (RMSE) [7,13], which is used to calculate the square root of the average of the squared differences between the predicted and actual values, the mean absolute error (MAE) [3,50], which measures the average magnitude of the errors without considering their direction, and the coefficient of determination (R 2 ) [51,52], which measures the proportion of the variance in the dependent variable that is explained by the independent variables in the model.
When evaluating forecasting models, RMSE and MAE are metrics typically used to assess the accuracy of model predictions, and a lower value of these metrics indicates better performance of the trained model.In contrast, R 2 is used to evaluate the overall quality of the model and assess how well it explains the variation in the data.A higher score of the R 2 metric indicates a better fit of the model.The formulas for computing these metrics used in this study are illustrated in the following equations.
RMSE = sqrt( (1/N) * sum over t of (O_t - Ô_t)^2 )
MAE = (1/N) * sum over t of |O_t - Ô_t|
R^2 = 1 - [ sum over t of (O_t - Ô_t)^2 ] / [ sum over t of (O_t - Ō)^2 ]

The actual value at time t is denoted as O_t and the predicted value as Ô_t, where Ō is the mean value of O, and N is the total number of observations. In this study, all metric evaluations were based on the scikit-learn library [53].
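A minimal check of these metrics using the scikit-learn library [53]; the actual and predicted values below are hypothetical:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical actual vs. predicted power net values (MW) for one forecast horizon.
actual = np.array([150.0, 140.0, 130.0, 120.0])
predicted = np.array([148.0, 143.0, 128.0, 121.0])

rmse = np.sqrt(mean_squared_error(actual, predicted))  # penalizes large errors
mae = mean_absolute_error(actual, predicted)           # average error magnitude
r2 = r2_score(actual, predicted)                       # explained variance, best = 1
print(round(rmse, 3), round(mae, 3), round(r2, 3))
```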

Comparison of Deep Learning Models in Training Stage
This section presents the results of our proposed LSTM autoencoder model compared with the baseline models during the training stage.All models were designed to forecast the net power flow value of a network cluster 1 h ahead.The structures of all the models compared in this study are based on the information provided in Table 3.
This study employed the autotune hyperparameter technique with the hyperband algorithm from the Keras tuner to optimize the deep learning models.The application of this technique encompasses both the baseline models and the proposed LSTM autoencoder throughout the model training stage.During the model development and training stages in this study, the autotuning hyperparameter technique was used to search for optimal values of key hyperparameters, such as the number of neurons in the hidden layers, preferred activation function, and appropriate learning rate value for the optimization method.This allowed us to automatically determine the most suitable configurations for these hyperparameters and optimize the performance of the deep learning models.
The utilization of the autotune hyperparameter technique yields significant advantages, including improved efficiency in the model training process and enhanced model quality [54].Moreover, it mitigated the need for laborious manual exploration of numerous hyperparameter combinations, which are often unavailable and require meticulous selection for deep learning models.The hyperband algorithm as a tuner was adopted in this study to optimize the hyperparameters in our deep learning models.This algorithm utilizes a successive halving method to iteratively eliminate poorly performing configurations [55].By employing the hyperband algorithm with appropriate settings during the model training phase, the hyperparameter space was efficiently explored, resulting in identification of the optimal configuration for our deep learning models.In this study, a hyperband tuner was configured to minimize the validation loss.In addition, the maximum number of epochs for training each model configuration was set to ten, and the algorithm employed a factor of three to determine the number of configurations in each bracket.
After identifying the optimal hyperparameter configurations for all the models, the models were trained using the training and validation datasets. The original structure of these datasets consists of time series data representing the total power flow of all feedlines (power net value) in the investigated network cluster. The training dataset comprised 80% of the total time series data, covering the period from 1 January 2019, at a 15 min interval, to midnight on 20 October 2019. The validation dataset comprised 10% of the data, spanning from 20 October 2019, at 12:15 a.m., at a 15 min interval, to 25 November 2019, at 11:15 a.m. These percentages indicate the ratio of the time series data used in this study. Furthermore, the structure of the datasets was converted into a supervised learning format by employing the sliding window technique, enabling them to be fed into the deep learning models. After this reorganization, the training dataset comprised 28,010 samples, each consisting of 24 time steps and 1 feature. Similarly, the validation dataset contained 3501 samples, each encompassing 24 time steps and 1 feature.
During the model training stage, all models with configured hyperparameters were trained on the computer listed in Table 4. The models were executed and fitted with a configuration in which the number of epochs was set to 100 and the batch size was set to 32. Furthermore, during model compilation, all models were set with the Adam optimizer and the mean squared error (MSE) as the loss function. After the training process, the loss value was recorded to provide an indication of how well each model learned from the training data; the MSE loss function was used to calculate the average squared difference between the predicted and actual values.

Figure 8 was constructed to monitor the performance of the models on both the training and validation data, where the x-axis represents the number of training epochs and the y-axis represents the loss value. The training loss curve displays the loss function evaluated for the training data during each training epoch, whereas the validation loss curve displays the loss function evaluated for the validation data during each epoch. As depicted in Figure 8, all models generally exhibit a decrease in loss values over several epochs on the training dataset but fluctuation on the validation dataset. The LSTM model had the highest loss value during training, whereas the other models were similar and tended to have small loss values. On the validation dataset, it is challenging to determine which model performs well, as all models tend to have fluctuating loss values throughout all epochs. This learning curve can diagnose an insufficient representation of the validation dataset, which implies that the data provided are inadequate for evaluating the models' generalization capability. This scenario can be identified when the training loss curve appears to be a suitable fit while the validation loss curve displays erratic fluctuations around the training loss [56].

Monitoring the duration of the training process is vital for assessing the efficiency of the models. It allows us to gain insights into the duration of model training and to identify potential issues, such as the need to adjust the batch size to improve training efficiency. Valuable insights can also be obtained regarding the performance and behavior of the models during the training process. Figure 9 shows that the simple RNN model had a prolonged training duration, which may be attributed to the inherent vanishing gradient problem discussed in Section 2. Conversely, the bidirectional LSTM and LSTM autoencoder models have similarly extended training durations due to their more complex architectures compared with the simple LSTM and GRU models.


Performance Comparison of Deep Learning Models
In this section, we present an evaluation of the performance and generalization ability of the trained model on a new dataset, referred to as the testing dataset, which was not used during the training or validation phases.The main objective of this stage is to assess the expected performance of the model in real-world scenarios.To achieve this, we employed several metrics to evaluate the models, including the RMSE, MAE, and R 2 .
In this study, we developed a function to evaluate the performance of a deep learning model.This function considers the input features of the test dataset (x_test) and the corresponding output features, labeled y_test, which serve as the ground truth.This function allows the trained forecasting model to make predictions based on the input features.The prediction results, along with the ground truth, were converted to their original scales.Subsequently, the function iterates a certain number of times in a loop, allowing for the individual evaluation of each element using relevant evaluation metrics.
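A sketch of such an evaluation function, assuming min-max denormalization with known bounds; the function and argument names are hypothetical, and `model_predict` stands in for a trained model's predict method:

```python
import numpy as np

def evaluate_forecasts(model_predict, x_test, y_test, x_min, x_max):
    """Predict on x_test, rescale predictions and ground truth back to MW,
    then score each test sample (forecast window) individually."""
    y_pred = model_predict(x_test)
    # Invert min-max normalization back to the original MW scale.
    y_pred = y_pred * (x_max - x_min) + x_min
    y_true = y_test * (x_max - x_min) + x_min
    per_sample_rmse = []
    for i in range(len(y_true)):  # loop: evaluate each forecast window separately
        err = y_true[i] - y_pred[i]
        per_sample_rmse.append(float(np.sqrt(np.mean(err ** 2))))
    return per_sample_rmse

# Toy check with a perfect "model": every per-sample RMSE should be zero.
x_test = np.zeros((3, 24, 1))
y_test = np.random.rand(3, 4)
scores = evaluate_forecasts(lambda x: y_test, x_test, y_test, x_min=-200.0, x_max=200.0)
print(scores)
```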
In Table 5, we present the evaluation results of each trained deep learning model on the testing dataset. It is evident that the proposed LSTM autoencoder model outperforms the other models in terms of the RMSE and MAE metrics, achieving the lowest scores of 32.243 MW and 24.154 MW, respectively. Additionally, it achieved the highest R 2 score, further confirming its superior performance. However, it is important to note that among all the compared models, the GRU model is closest to the proposed model: it obtains the second-lowest RMSE and MAE scores and the same R 2 score as the proposed model. Nevertheless, when comparing the training time, the LSTM autoencoder model requires a slightly longer training process than the GRU model. In Figure 10, we display an example of the multistep forecasting results of the bidirectional power flow from all trained models, including the proposed and baseline models. The testing input used in this section covers the last 6 h leading up to the current time, consisting of 24 steps with a 15 min interval; its time span ranges from '2019-11-29T10:45:00' to '2019-11-29T16:30:00'. Given this input, all trained models are expected to predict the bidirectional power flow in the investigated network cluster 1 h ahead (four steps of 15 min) of the current time; the expected output covers the time span from '2019-11-29T16:45:00' to '2019-11-29T17:30:00'.
Given the superior performance of our proposed model compared with the baseline models, as demonstrated in the model evaluation results, we present an extended forecast using our LSTM autoencoder. This extension involves expanding the test dataset to capture an additional four steps of 15 min interval forecast results from a moving window of the input dataset. The primary objective of this process is to provide enhanced insights and establish greater credibility in the forecasted results.
In Figure 11, we depict the continuation of the output forecast results for power flow in the network cluster. Specifically, we examine the scenario in which the input window advances every hour (four steps of 15 min intervals) from the starting time of '2019-11-29T10:45:00', as illustrated in Figure 10. According to the model evaluation results, our proposed model showed good performance, as indicated by low RMSE and MAE scores and a high R 2 score. Although the LSTM autoencoder is not a novel model, it has been employed in various studies and has consistently demonstrated good performance. For example, in [13], an LSTM autoencoder was used to forecast solar power 1 h ahead for participants in the intraday electricity market; the model achieved impressive performance, with average RMSE and MAE values of 12.87 kW and 6.91 kW, respectively. Another study [57] also demonstrated the superiority of the LSTM autoencoder for power load forecasting; that model integrated long-term and short-term features of the samples and achieved an MAE of less than 52 MW when comparing the predicted and actual load values.

Conclusions
The proposed regional grid cluster simplifies the power grid and facilitates the analysis of decentralized power generation.It is placed between the TSO and DSO via
The studied cluster receives power from two interconnected substations, namely Sub_A and Sub_B. Four feedlines (Line 3, Line 4, Line 5, Line 6) are connected to Sub_A, while two feedlines (Line 1, Line 2) are connected to Sub_B. Based on the simplified grid, the actual implementation involves the interconnection of lines in a parallel manner: Line 1 and Line 2 are parallel, Line 3 and Line 4 are parallel, and Line 5 and Line 6 are parallel.

Figure 5 .
Figure 5. Power flow in all lines of the network cluster.


Figure 7 .
Figure 7. Sliding window approach.


Figure 8 .
Figure 8. Learning Curve in Training and Validation dataset.

Figure 9 .
Figure 9. Training Duration of Forecasting Model.



Figure 10 .
Figure 10.Multistep power flow forecast of all trained models.


Table 1 .
Summary of related work.

Table 2 .
Example of bidirectional dataset used.


Table 3 .
Deep Learning Model structures.

Table 5 .
Performance Evaluation of Forecasting Model.
