Deep Learning-Assisted Short-Term Load Forecasting for Sustainable Management of Energy in Microgrid

Abstract: Supplying demand load and maintaining sustainable energy are important issues that have created many challenges in power systems. Short-term load forecasting has been proposed as one of the tools for energy management and supply in such systems. In this paper, after reviewing various load forecasting techniques, a deep learning method called bidirectional long short-term memory (Bi-LSTM) is presented for short-term load forecasting in a microgrid. By collecting relevant features available in the input data at the training stage, it is shown that the proposed procedure has important properties, such as a strong ability to process time series data. A microgrid in rural Sub-Saharan Africa, including household and commercial loads, was selected as the case study. The parameters affecting the formation of household and commercial load profiles are considered as input variables, and the total household and commercial load profiles of the microgrid are considered as the target. The Bi-LSTM network is trained on the input variables to forecast the microgrid load on an hourly basis by recognizing the consumption pattern. Performance evaluation indicators such as the correlation coefficient (R), mean squared error (MSE), and root mean squared error (RMSE) are used to analyze the forecast results. In addition, the performance of the proposed method is compared with other methods used in similar studies. The results for the training phase show an accuracy of R = 99.81% for the Bi-LSTM network. The test and load forecasting stage is performed by the Bi-LSTM network with an accuracy of R = 99.34% and forecasting errors of MSE = 0.1042 and RMSE = 0.3243. The results confirm the high performance of the proposed Bi-LSTM technique, with a high correlation coefficient when compared to other methods used for short-term load forecasting.


Introduction
Nowadays, the significant increase in power consumption has led to the development of, and fundamental changes in, power grids. Power grids now consist of large-scale power plants based on nuclear power and fossil fuels. These power plants are considered to be the primary source of energy production [1,2]. In these forms of generation, energy is transmitted in one direction, from centralized power plants (macrogrids) to energy consumers. Due to several factors, including the possible shortage of fossil fuels and the dangers of rising greenhouse gases, renewable energies like solar and wind energy are becoming much more popular as clean and novel energy sources [3,4]. To address this trend, many efforts have been made to create a novel type of power system that combines communication and optimization theory to achieve optimal power management. A smart grid is a next-generation power grid that depends on various sources of energy, such as renewable energy. A smart grid aims to use the energy generation and consumption data collected by smart meters to manage energy efficiently [5,6].
Inventions 2021, 6, 15
Distributed generation (DG) has been discussed as one of the characteristics of renewable energy generation; it is carried out by various small devices connected to the grid or to the distribution system [7]. DG is also commonly referred to as on-site generation, district energy, or distributed energy. Unlike traditional power plants, generating and consuming energy from renewable resources requires small-scale power plants, which makes the power system infrastructure more complex in general. To address this problem, the microgrid is used as a building block to ensure the reliability and efficiency of the power system infrastructure [8].
In fact, a microgrid is a small-scale grid and a suitable solution for integrating variable and unpredictable renewable energy sources into distribution networks. Based on its structure and scale, a microgrid is very cost-effective in terms of infrastructure transfer [9]. A microgrid can be defined as a self-sufficient energy system that brings together locally distributed generation sources, controllable loads, and energy storage equipment. Short-term load forecasting is a principal function of the microgrid energy management system, particularly if different renewable energy resources are integrated with the microgrid. In addition, short-term forecasting can be considered an essential tool for microgrid operators to maintain continuous network performance and increase economic gains [10,11].
Short-term load forecasting has been performed in various ways so far. The conventional procedures presented in previous studies to forecast the load are persistence, statistical, physical, artificial neural network (ANN), machine learning, deep learning, and hybrid techniques [12–14]. In a valuable study [15], a variety of data-driven techniques were introduced and compared for solving forecasting problems in the power grid. In [16], ANN models were used to forecast the amount of wind and solar power in microgrids. In a review paper [17], various types of ANN algorithms and issues with their application in the microgrid were reviewed. Short-term load forecasting in a microgrid was done in [14] with a novel hybrid technique based on support vector regression (SVR) and long short-term memory (LSTM) models. In [18], short-term load forecasting was performed using a hybrid machine learning technique called seasonality-adjusted support vector machines (SSA-SVM). In [19], load forecasting in a microgrid was done using various deep learning algorithms, a multilayer perceptron (MLP), and a support vector machine (SVM). Some other studies have proposed combinations of machine learning models with optimization algorithms for load forecasting. In [20], the microgrid load was forecasted over a short-term horizon using a hybridized model of an SVM and particle swarm optimization (PSO). In another valuable study [21], a combined approach of a wavelet transform (WT) with a fruit fly optimization (FFO) algorithm was proposed for short-term load forecasting. In most cases, such hybrid models are more prone to problems such as overfitting because of the high dimension of the input data, and they fail to identify the appropriate pattern in the data because of its time series characteristics.
Each of the reviewed studies attempted to forecast the load over a short-term time horizon using a variety of techniques. In some of them, the methods used were not suited to the available data and reduced the forecast accuracy due to factors such as overfitting and missing data in the training phase. In others, the selected procedure struggled to model the time series features of the data, which also reduced the forecasting accuracy. Most importantly, in some studies the chosen method was not able to forecast the load during peak hours, which caused problems in the network operator's scheduling.
Deep learning techniques have been used as a powerful tool in recent studies. They have performed well at preprocessing, processing, and extracting features from raw data, and at addressing the problems mentioned above in short-term forecasting [22]. Previous studies have presented several deep learning techniques for processing and predicting time series data, such as convolutional neural networks (CNNs), deep autoencoders (DAEs), recurrent neural networks (RNNs), and deep belief neural networks (DBNNs). Each of these techniques has its own advantages and disadvantages. The DAE and DBNN techniques struggle to capture long dependencies in time series samples related to anticipation time [23], while the CNN method can extract the basic features of time series data with the smallest number of cells and the least memory. However, the filters and the number of layers, if not selected properly, can cause problems such as overfitting in the training phase. The RNN-based techniques, such as LSTM and the gated recurrent unit (GRU), usually perform well in time series data processing and can model complex time-dependent nonlinear parameters. GRU networks mainly extract features that are not obtained by LSTM networks and are less complex than the LSTM. However, some studies have shown that LSTM algorithms, because they are trained only in the forward direction, suffer from problems such as missing data and overfitting when recognizing patterns in large-volume data [24].
In this paper, in order to forecast the short-term load in the microgrid and solve the problems related to the reviewed methods, a deep learning technique called bidirectional LSTM (Bi-LSTM) is proposed. Bi-LSTM is a time series-based technique that considers all data behavior in a time period. It should be noted that Bi-LSTM is proposed for the first time in this paper to forecast the short-term load in microgrids. Data affecting the network load have a long-term interconnected behavior and pattern. Accordingly, the bidirectional movement of the proposed method and the interconnected structure of its layers eliminate problems such as missing data and overfitting in the training phase. Compared with the other models reviewed in the literature, some of the structural and inherent advantages of Bi-LSTM, such as learning both the forward and the backward dependencies in the data, indicate the strong performance of this technique.
In general, the contributions of this paper can be highlighted as follows:
• Implementing a learning-based approach that passes the training phase without problems such as missing data and overfitting;
• Forecasting microgrid load without considering meteorological data that are not available in remote areas;
• Modeling of microgrid load consumption for a short-term time horizon (one hour) based on different household and commercial consumption loads;
• Evaluating the performance of the Bi-LSTM technique in the training phase and the results of load forecasting with different performance evaluation indicators, as well as presenting a comparative approach to express the effectiveness of the suggested method.
The rest of the paper is organized as follows. The suggested method is described in Section 2. The case study is introduced in Section 3. The results of the short-term load forecasting are presented in Section 4. Finally, the paper is concluded in Section 5.

Bidirectional Long Short-Term Memory (Bi-LSTM)
In recent years, the application of deep learning has been significantly considered and used in various scientific and industrial fields. As such, deep learning techniques are used today in various applications in power and energy systems, such as fault detection [25,26], cyberattack detection [27], renewable power plant potential measurement [28], non-intrusive load monitoring [29,30], and load forecasting [14,31]. Deep learning has different techniques, each of which is skilled in specific applications due to its unique structure. In this paper, to solve short-term load forecasting in microgrids, one of the most powerful deep learning techniques, called Bi-LSTM, is proposed.
Bi-LSTM is a deep learning technique used for classification, regression, pattern recognition, and feature extraction. One of the salient features of this technique is its excellent performance on time series data [32]. Bi-LSTM, as an extension of the traditional LSTM [33], is trained on the input sequence with two LSTMs that process it in opposite directions (see Figure 1). The LSTM layer reduces the vanishing gradient problem and allows the use of deeper networks compared with recurrent neural networks (RNNs) [34,35].
In the structure of the traditional RNN and the LSTM model, information propagates along a forward path only, so the state at time t depends only on the information before time t. In the Bi-LSTM network, unlike in the traditional LSTM, information flows from the backward layer to the forward layer and vice versa by means of a hidden state [36]. Additionally, the advantage of Bi-LSTM over convolutional neural networks (CNNs) is that it depends on the sequence of inputs by taking the forward and backward paths into account. The Bi-LSTM model behaves the same with all inputs. The mathematical formulations of Bi-LSTM are presented in detail in [36].
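The forward and backward paths described above can be illustrated with a minimal sketch. It is not the network used in this paper: the scalar LSTM cell (hidden size 1), the weight values in `W_FWD` and `W_BWD`, and the sample load values are all hypothetical and chosen only to show how the two directions see different parts of the sequence.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell (hidden size 1, illustrative weights)."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate state
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def run_lstm(xs, w):
    """Run the cell over xs in the given order and return all hidden states."""
    h, c, hs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        hs.append(h)
    return hs


def bi_lstm(xs, w_fwd, w_bwd):
    """Pair the forward and backward hidden states at every time step."""
    fwd = run_lstm(xs, w_fwd)
    bwd = run_lstm(xs[::-1], w_bwd)[::-1]  # re-align backward states with time
    return list(zip(fwd, bwd))


# Hypothetical fixed weights; a trained network would learn these values.
W_FWD = dict(wi=0.5, ui=0.1, bi=0.0, wf=0.4, uf=0.2, bf=1.0,
             wo=0.6, uo=0.1, bo=0.0, wg=0.9, ug=0.3, bg=0.0)
W_BWD = dict(wi=0.3, ui=0.2, bi=0.0, wf=0.5, uf=0.1, bf=1.0,
             wo=0.4, uo=0.2, bo=0.0, wg=0.7, ug=0.2, bg=0.0)

load = [0.2, 0.5, 0.9, 0.4]     # made-up hourly load values
changed = [0.2, 0.5, 0.9, 0.8]  # same sequence with the last hour altered

states = bi_lstm(load, W_FWD, W_BWD)
states_changed = bi_lstm(changed, W_FWD, W_BWD)
```

Altering only the last hour leaves the forward state at the first hour unchanged but changes the backward state there, which is the sense in which the bidirectional network uses information from both before and after time t.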

Case Study
In this paper, a rural microgrid in Sub-Saharan Africa was selected as the case study. The specifications and data related to this microgrid, which come from a freely available dataset, constituted the input variables and outputs of the dataset used in this paper [14]. Access to and use of electrical energy is a human rights matter for South African citizens and is guaranteed by government policies. However, some problems, such as the lack of a sustainable electricity supply, plague many remote rural areas. Accordingly, this paper selected a microgrid in South Africa as the case study in order to provide solutions for energy management and sustainable energy supplies. The studied microgrid included household and commercial loads, which constituted the total load consumption of the microgrid. In the existing dataset, the household load was modeled based on factors such as the number of households (NoH) available and the percentage of high-income (HI), middle-income (MI), and low-income (LI) households. The commercial load was modeled by factors such as water pumping (WP), grain milling (GM), and the number of clinics, small shops (SS), schools, and street lighting (SL). The modeling calculated the load on an hourly basis, at one-hour intervals. As an example, Figure 2 shows three examples of 24 h load profiles under different conditions in this microgrid. Table 1 introduces the prevailing conditions under which the load profiles shown in Figure 2 were formed.

Simulation Results
Short-term load forecasting in a microgrid using the Bi-LSTM method required a dataset containing effective hourly data on the microgrid load and the related load profiles. Hourly parameters related to the characteristics of households, equipment, and commercial places were considered as input variables, and the hourly load profile resulting from these characteristics was selected as the output variable. The existing dataset contained 240 samples of 24 h load profiles which, by considering the data associated with each hour as an input sample, formed a 5760 × 11 matrix for the Bi-LSTM input dataset. The designed Bi-LSTM network was trained on 70% of the data. Then, in the test phase, using the rest of the data, it forecasted the microgrid load over one-hour intervals.
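The dataset dimensions quoted above can be checked with a short sketch. The column layout (ten input variables plus one target load) and the unshuffled, order-preserving split are assumptions made here for illustration; the text only states the matrix size and the 70%/30% proportions. The names `flatten_profiles` and `chronological_split` are hypothetical helpers, and the zero-filled data are placeholders.

```python
HOURS_PER_PROFILE = 24
N_PROFILES = 240
N_COLUMNS = 11  # assumed layout: 10 input variables + 1 target load


def flatten_profiles(profiles):
    """Turn a list of 24 h profiles into one row per hour."""
    return [row for profile in profiles for row in profile]


def chronological_split(rows, train_percent=70):
    """Split rows into training and test parts without shuffling."""
    # Integer arithmetic avoids floating-point rounding of 0.7 * len(rows).
    n_train = len(rows) * train_percent // 100
    return rows[:n_train], rows[n_train:]


# Placeholder data: every profile holds 24 hourly rows of 11 zeros.
profiles = [[[0.0] * N_COLUMNS for _ in range(HOURS_PER_PROFILE)]
            for _ in range(N_PROFILES)]

rows = flatten_profiles(profiles)
train_rows, test_rows = chronological_split(rows)
print(len(rows), len(train_rows), len(test_rows))  # prints: 5760 4032 1728
```

Under these assumptions, 240 profiles of 24 hours give 5760 hourly samples, of which 4032 would be used for training and 1728 for testing.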
After the training and test stages, the results of each learning-based network should be evaluated using performance appraisal indicators. These indicators express the accuracy of the network at each step and indicate how close the forecasted or estimated values are to the actual values. In this paper, the Bi-LSTM network performance was evaluated by the correlation coefficient (R), mean squared error (MSE), and root mean squared error (RMSE). The R-index shows the correlation between the forecasted values and the real values, and higher values of R indicate higher accuracy of the network. The MSE and RMSE indicators express the prediction error; they were calculated for each sample, and a mean value was then calculated for the network performance over the whole dataset. The closer the MSE and RMSE values are to zero, the more accurate the network [37]. The indicators used in this paper are calculated as follows [38]:

$$R = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2}}$$

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2}$$

where $x_i$ and $y_i$ represent the actual values and forecasted values, respectively, $\bar{x}$ and $\bar{y}$ are the means of the actual values and forecasted values, respectively, and $N$ is the number of samples.
Figure 3 shows the performance of the Bi-LSTM network in the training phase using the R-index and in regression form. As shown in Figure 3, the Bi-LSTM network was able to pass the training stage with good performance. Figure 4 shows the amount of network error in the training phase. It can be seen that the training error rate was very small, and these results indicated good network training. When trained with high accuracy, the network would be able to ideally identify test data with an estimated model and forecast their values well.
After training, the test data, which was 30% of the input dataset, was fed to the trained network for prediction. At this stage, the trained Bi-LSTM estimated the load profile for each sample based on the data behavior pattern learned in the training stage. Figure 5a shows the load forecasting results by Bi-LSTM in the test stage. In order to clearly observe the performance of the proposed method in forecasting the microgrid load, Figure 5b zooms in on 100 samples of the test data presented in Figure 5a.
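The three indicators described above (R, MSE, and RMSE) can be sketched in a few lines. This is a minimal illustration using the standard pooled definitions over all samples; the averaging procedure used in the paper may differ in detail, and the sample values below are made up.

```python
import math


def correlation_coefficient(actual, forecast):
    """Pearson correlation between actual and forecasted values."""
    n = len(actual)
    mean_x = sum(actual) / n
    mean_y = sum(forecast) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, forecast))
    var_x = sum((x - mean_x) ** 2 for x in actual)
    var_y = sum((y - mean_y) ** 2 for y in forecast)
    return cov / math.sqrt(var_x * var_y)


def mean_squared_error(actual, forecast):
    """Average of the squared differences between actual and forecast."""
    return sum((x - y) ** 2 for x, y in zip(actual, forecast)) / len(actual)


def root_mean_squared_error(actual, forecast):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(actual, forecast))


# Made-up values: the forecast is the actual load shifted by a constant 1.0.
actual = [1.0, 2.0, 3.0, 4.0]
forecast = [2.0, 3.0, 4.0, 5.0]

r = correlation_coefficient(actual, forecast)
mse = mean_squared_error(actual, forecast)
rmse = root_mean_squared_error(actual, forecast)
```

In this example the forecast follows the shape of the actual series perfectly, so R is 1, while MSE and RMSE are both 1.0 because of the constant offset. This is why the correlation coefficient is reported together with the error indicators rather than on its own.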
The forecasted load was, in fact, the sum of the household and commercial loads of the microgrid at each hour. The high correlation of the forecasted values with the actual values of the load profile confirmed the good performance of the trained network. The network correlation coefficient reached an acceptable value (R = 0.9934), so the microgrid consumption could be estimated at any hour. Figures 6 and 7 show the network prediction error in the forms of the MSE and RMSE, and of a histogram, respectively.
Figure 6 shows the network prediction error for each sample. The error values reported in this figure (MSE = 0.1042 and RMSE = 0.3243) were the averages of the prediction errors over all samples. Figure 7 shows the minimum and maximum network errors in forecasting the value of each sample. One can see that, even in the worst case, the error of the Bi-LSTM in forecasting the microgrid load stayed between −0.6 and 0.6.
After presenting the results obtained with the proposed Bi-LSTM network and evaluating its performance with statistical indicators, the method was compared with similar works to confirm its effectiveness. To this end, the results of similar works presented in recent years were compared with the results obtained by Bi-LSTM using performance evaluation indicators. Table 2 shows the results of this comparison.
In the evaluation performed in Table 2, the performance of different machine learning and deep learning methods was compared. As can be seen, deep learning methods offer better performance than more conventional machine learning methods. The reason for this superiority is proper training and extraction of the appropriate pattern from the input data. Among the compared methods, the Bi-LSTM network proposed in this paper outperformed the others. When using data mining techniques, choosing a method that suits the available data is one of the most important issues. The proposed Bi-LSTM procedure can be selected as a tool to perform other time series-based data predictions in power and energy systems.

Conclusions
Short-term load forecasting in power and energy systems is a key technique for improving the supply of demand load and other energy management planning. The purpose of this paper was to forecast the short-term load in a rural microgrid in Sub-Saharan Africa in order to supply the demand load and provide access to sustainable energy. To this end, a deep learning algorithm called bidirectional long short-term memory (Bi-LSTM) was proposed. Unlike other deep learning techniques, the Bi-LSTM method, due to its unique structure and bidirectional training routine, offers a strong ability to process large-volume and time series data. In addition, avoiding the problems of missing data and overfitting during the training phase can be mentioned as another benefit of Bi-LSTM. Data related to household and commercial loads in the studied system were collected as the Bi-LSTM input dataset. The Bi-LSTM network was trained with the input data and then forecasted the microgrid load in one-hour intervals. The results of the forecast were analyzed by various performance evaluation indicators such as the correlation coefficient (R), mean squared error (MSE), and root mean squared error (RMSE). The trained Bi-LSTM network was able to forecast the microgrid load over the short-term time horizon with an accuracy of R = 99.34% and the lowest error values (in this case, MSE = 0.1042 and RMSE = 0.3243). Then, in a comparative approach, the results of the Bi-LSTM network were evaluated against other algorithms used in similar works. The results of the evaluations emphasized the effectiveness of the suggested procedure in the short-term load forecasting of microgrids compared with other solutions.
One future work direction is the application, implementation, and evaluation of the proposed methodology for the load forecasting of other microgrids in the presence of renewable energy resources.