Short-Term Load Forecasting of Microgrid via Hybrid Support Vector Regression and Long Short-Term Memory Algorithms

Abstract: Short-Term Load Forecasting (STLF) is the most appropriate type of forecasting for both electricity consumers and generators. In this paper, STLF in a Microgrid (MG) is performed via a hybrid application of machine learning methods. The proposed model is a modified combination of Support Vector Regression (SVR) and Long Short-Term Memory (LSTM), called SVR-LSTM. To forecast the load, the proposed method is applied to data from a rural MG in Africa. Factors influencing the MG load, such as various household types and commercial entities, are selected as input variables, and load profiles as target variables. Identifying the behavioral patterns of the input variables and modeling their behavior over short-term periods are the major capabilities of the hybrid SVR-LSTM model. To demonstrate the efficiency of the suggested method, the conventional SVR and LSTM models are also applied to the same data. The load forecasts of each network are evaluated using various statistical performance metrics. The results show that the SVR-LSTM model, with the highest correlation coefficient of 0.9901, provides better results than SVR and LSTM, which achieve 0.9770 and 0.9809, respectively. Finally, the results are compared with those of other studies in this field, which further supports the superiority of the SVR-LSTM model.


Introduction
The growing number of electrical energy consumers has caused problems in traditional power systems, such as reduced reliability and stability. To address these problems and improve demand response, power systems must increase their generation capacity. However, this creates other problems, such as increased fossil fuel consumption and environmental pollution [1,2]. As the energy and environmental crises become more serious, Distributed Generations (DGs), as the main forms of Renewable Energy Sources (RESs), have attracted much attention in issues related to energy management and the sustainability of power systems. In addition, efforts have been made to design a new type of power system, called the smart grid, for more energy-efficient management [3]. The main aim of the smart grid is energy management based on energy consumption and production data collected via smart meters. Energy management in power grids reduces costs for consumers and improves the reliability of the power supply. The Microgrid (MG) is a main component of the smart grid concept [4,5].
An MG is a small regional unit of the power system whose energy generation and consumption are independent of macrogrids. Energy consumers in MGs, in addition to consuming energy, can also generate energy using small-scale generation sources. Such energy consumers are called prosumers, and they can also act as sellers when they are involved in energy generation. This means that new electricity markets, such as trading between prosumers, can be created in an MG and that, unlike in conventional power systems, bidirectional energy transmission becomes necessary [6]. Awareness of MG energy consumption provides more information to electricity consumers and generators so that they can schedule properly for the balancing, intraday, and day-ahead markets. In this way, energy management in an MG can be achieved for both electricity consumers and generators. Achieving a sustainable energy system in the MG requires intelligent energy management. Load forecasting can be considered one of the most important solutions for energy management in power systems, especially in MGs [7][8][9].
Forecasting the consumed electrical load is one of the most important factors that can provide useful information to power system operators for managing and saving energy. Load forecasting is not an end in itself in the operation and development of distribution network infrastructure. Rather, accurate forecasting of electricity network demand is a critical and decisive tool for making short-term decisions in network operation and long-term decisions in planning the development of electricity network infrastructure [10,11].
Load forecasting can be performed over various time horizons in power system studies, such as short-term (one to several days), medium-term (one week to several months), and long-term (several years). Short-term horizons are considered mainly for the optimal operation of MGs and participation in the electricity market. MG operators can use short-term forecast results to provide appropriate pricing in the electricity market, as well as economic load distribution and energy management at the MG level for optimal operation [12,13]. Therefore, the more accurate the load forecast over the horizon ahead, the greater the expected savings in the operating costs of MGs and in energy supply to consumers.
Recently, a wide variety of methods has been suggested and implemented to solve load forecasting problems. These approaches can be categorized into (1) persistence models, (2) statistical models, (3) Artificial Neural Networks (ANNs), (4) machine learning, and (5) hybrid approaches. A linear regression model is utilized in [14] to forecast the hourly load. The structure of this solution is obtained by trial and error, which is not a systematic or reliable method. In [15], Short-Term Load Forecasting (STLF) is performed via a regression-based window. The choice of window has a significant impact on the forecast results, so it must be selected correctly. In [16], a persistence solution predicts the MG load in the same way (regression-based window). A Kalman filter-based model is presented in [17] for forecasting short-term household load demand. The Autoregressive Integrated Moving Average (ARIMA) model is selected in [18] for STLF. In [19], the short-term load of an MG is forecasted using a Seasonal ARIMA (SARIMA) model. Similarly, other models, such as modified Autoregressive Moving Average (ARMA) and ARMA with Exogenous Variables (ARMAX), are proposed in [20][21][22] for STLF. However, because the load is nonlinear, these methods lack the capacity to handle its nonlinear properties and are not capable of providing accurate load forecasts.
Later, ANNs, machine learning, and hybrid models were utilized as efficient tools to deal with the nonlinear features of the load. In this regard, solutions such as Support Vector Machines (SVM) and Seasonality-Adjusted Support Vector Machines (SSA-SVM) are utilized in [23][24][25] for STLF. The models used in these studies sometimes do not reach ideal results because of the high-dimensional input data. In other studies, these models are improved using optimization algorithms. Such models include SVM optimized by Particle Swarm Optimization (PSO) [26], Genetic Algorithm (GA) with SVM [27], SVM based on the Firefly Algorithm (FFA) [28,29], SVM optimized by the Grasshopper Optimization Algorithm [30], and Empirical Mode Decomposition (EMD) with PSO-SVM [31]. In some other STLF studies, hybrid approaches combining the Wavelet Transform (WT) with Fruit Fly Optimization (FFO) based on the Least Square Support Vector Machine (LSSVM) are utilized [32,33]. Additionally, in [34], load forecasting for an MG is performed using a hybrid FFO-based General Regression Neural Network (GRNN). In [35], an SVM based on the Dragonfly Algorithm (DA-SVM) is suggested to forecast the short-term load of an MG located in an offshore oil field. A three-stage architecture based on a Self-Organizing Map (SOM), the K-Means algorithm, and a Multilayer Perceptron (MLP) is utilized in [36] for STLF in MGs.
In [37], the aggregated power load in a community MG is forecasted via a Deep Recurrent Neural Network (DRNN) with Long Short-Term Memory (LSTM) units. To forecast load demand in an MG, a deep learning model based on a hybrid of Convolutional Neural Networks (CNN) and a Gated Recurrent Unit (GRU), called CNN-GRU, is developed in [38]. In addition, in [38], the proposed hybrid model is compared with other load forecasting methods, such as CNN-LSTM, 2-Dimensional (2D) CNN, GRU, LSTM, ARIMA, k-Nearest Neighbor (kNN), and Neural Network Ensemble (NNE). A Bi-directional LSTM unit-based DRNN model called DRNN Bi-LSTM is proposed in [4] to provide precise forecasts of aggregated electrical load demand and photovoltaic power production. An enhanced framework for energy management is introduced in [39] to efficiently investigate the uncertainties caused by climate change in an MG. The suggested framework utilizes a ten-stage Markov chain to produce stochastic solar radiation and a procedure based on recursive least squares for effective participation in electricity market bidding programs. In [40], power generation forecasting at Lancaster University is performed via a hybrid model of Radial Basis Function (RBF) and K-Means clustering.
In an MG energy management framework, the prediction of load demand, wind energy, and solar power generation is performed via ANN and Support Vector Regression (SVR) techniques, using temperature, meteorological, and historical data for different time horizons as inputs. In [41], Maximum Power Point Tracking (MPPT) is investigated for the photovoltaic arrays installed in an MG using a fuzzy control method. The MG load forecasting in the French metropolis, with power consumption data as output variables, is performed using the deep learning method LSTM [42]. In that study, a GA is used to optimize the parameters of the LSTM network in order to improve its performance and provide highly accurate predictions. To decrease the total cost of operations and improve the energy efficiency of an MG [43], a hybrid PSO and Opposition-based learning Gravitational Search Algorithm (PSO-OGSA) is utilized to solve the optimization problem under different constraints. STLF for plug-in electric vehicles in a smart grid is performed by the GRNN technique for energy management [44].
The above literature on STLF in MGs shows that there is still room for improvement in this field. A solution that can handle the nonlinear properties of the load and is highly efficient in dealing with the high-dimensional data that affect the network load would be a major step in tackling the load forecasting problem. Demand load forecasting of an MG is a time-series task, and a main problem of previous methods is their inability to process time-series data. Although many energy management solutions based on load forecasting have been offered, many of them depend on meteorological data. Given that meteorological data are not always available for future time horizons, a procedure that provides accurate forecasts without such data can be an important step in load forecasting and energy management.
In this paper, time-series data and their behavioral patterns are processed to forecast the short-term MG load using a machine learning approach called Support Vector Regression-LSTM (SVR-LSTM). The proposed hybrid model is a combination of the SVR and LSTM methods. The SVR-LSTM is applied to data from a rural MG in Sub-Saharan Africa. Owing to its structure and its ability to process high-dimensional time-series data, SVR-LSTM can overcome the problems of conventional solutions and provide high forecasting performance. In addition, the SVR and LSTM methods are applied to the same data for comparison with the results of the SVR-LSTM. Finally, the accuracy of the load forecasts of the models presented in this paper is compared with the results of other studies.
The remainder of this paper is organized as follows. Section 2 introduces the suggested solutions. The simulation results are presented in Section 3. Section 4 compares the forecasted load with the results of other studies. Section 5 presents the discussion. Section 6 concludes the paper.

Support Vector Regression (SVR)
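SVM was proposed in 1995 by Cortes and Vapnik as a machine learning method and was applied to problems of forecasting and dependency estimation [45]. SVR uses the same principles as SVM for regression and function approximation, with some minor differences [40,41,46,47,48]. The principal advantage of SVR is its ability to solve regression problems and forecast future values. Among the various versions of SVR, the classic model (ε-SVR) is the one mainly used in engineering and is also employed in this paper [10,49]. In ε-SVR, the goal is to find a function that is as flat as possible and maps the input data to the output data with an error of at most ε:
$$f(x) = \langle w, x \rangle + b \tag{1}$$
where $b$ is the bias, $w$ controls the flatness of the function (a flatter function corresponds to a smaller $w$), and $\langle w, x \rangle$ denotes the dot product, a linear function of the input $x$ [45].
This problem can be modeled as the following convex optimization problem:
$$\min_{w,\,b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon \\ \langle w, x_i \rangle + b - y_i \le \varepsilon \end{cases} \tag{2}$$
where $y_i$ is the target value corresponding to the input point $x_i$.
To deal with otherwise infeasible constraints of the optimization problem in Equation (2), slack variables $\xi_i$ and $\xi_i^*$ can be introduced. Hence, Equation (2) can be restated as [10]:
$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\|w\|^2 + c \sum_{i=1}^{N} (\xi_i + \xi_i^*) \quad \text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\ \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0 \end{cases} \tag{3}$$
where $i$ indexes the input data, $N$ is the number of input samples, and $c > 0$ is a trade-off between the smoothness of $f(x)$ and the tolerance of deviations greater than $\varepsilon$. To extend the formulation to nonlinear functions, the dual problem of Equation (3) can be derived using the Lagrange multipliers $\alpha_i$, $\alpha_i^*$, $\eta_i$, and $\eta_i^*$, forming the Lagrange function as follows [10]:
$$L = \frac{1}{2}\|w\|^2 + c \sum_{i=1}^{N} (\xi_i + \xi_i^*) - \sum_{i=1}^{N} (\eta_i \xi_i + \eta_i^* \xi_i^*) - \sum_{i=1}^{N} \alpha_i \left( \varepsilon + \xi_i - y_i + \langle w, x_i \rangle + b \right) - \sum_{i=1}^{N} \alpha_i^* \left( \varepsilon + \xi_i^* + y_i - \langle w, x_i \rangle - b \right) \tag{4}$$
After solving the dual problem, $f(x)$ can be derived as follows:
$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b \tag{5}$$
Finally, to make the algorithm nonlinear, the training patterns $x_i$ can be mapped into a feature space by $\Phi : \chi \to \Psi$. In doing so, $f(x)$ can be rewritten as follows [49]:
$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, k(x_i, x) + b \tag{6}$$
where $k(x_i, x) = \langle \Phi(x_i), \Phi(x) \rangle$ is the kernel function.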
The objective is to find a smooth function in the feature space instead of the input space. The main structure of the SVR is displayed in Figure 1.
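The kernel form of the SVR predictor and the ε-insensitive error can be illustrated with a minimal sketch. It assumes a Gaussian (RBF) kernel with a hypothetical width parameter `gamma`, and it takes the coefficient differences α_i − α_i* and the bias as already solved; the function names and toy values are illustrative and are not taken from the paper.

```python
import math

def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel k(u, v) = exp(-gamma * ||u - v||^2) (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def svr_predict(x, support_vectors, coeffs, bias, gamma=1.0):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i*) k(x_i, x) + b.

    coeffs[i] holds the already-solved difference alpha_i - alpha_i*.
    """
    return bias + sum(c * rbf_kernel(sv, x, gamma)
                      for sv, c in zip(support_vectors, coeffs))

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Deviations inside the epsilon tube cost nothing; larger ones cost the excess."""
    return max(0.0, abs(y_true - y_pred) - eps)

# Toy usage with hypothetical values: one support vector located at the query point.
value = svr_predict([0.0, 0.0], [[0.0, 0.0]], [2.0], bias=0.5)
inside_tube = eps_insensitive_loss(1.0, 1.05, eps=0.1)
outside_tube = eps_insensitive_loss(1.0, 1.5, eps=0.1)
```

In this sketch a forecast that misses the target by less than ε contributes no loss, which corresponds to the role of the slack variables: they are nonzero only for points outside the ε tube.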
Sustainability 2020, 12

Long Short-Term Memory (LSTM)
LSTM was suggested in 1997 by Hochreiter and Schmidhuber as a type of Recurrent Neural Network (RNN) for learning and processing long-term information, time-series data, feature extraction, and pattern recognition [50,51]. The vanishing and exploding gradient problems associated with long-term dependencies are mitigated by replacing the basic hidden neurons in the RNN structure with LSTM units. As shown in Figure 2, the principal structure of the LSTM includes the forget gate, input gate, update gate, and output gate [52]. The LSTM network implements temporary memory through switch gates to prevent gradient vanishing. The main computation formulas of the LSTM are as follows [52,53]:
$$f_t = \sigma\left(W_f\,(h_{t-1}, x_t) + b_f\right)$$
$$i_t = \sigma\left(W_i\,(h_{t-1}, x_t) + b_i\right)$$
$$g_t = \tanh\left(W_g\,(h_{t-1}, x_t) + b_g\right)$$
$$o_t = \sigma\left(W_o\,(h_{t-1}, x_t) + b_o\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$
$$h_t = o_t \odot \tanh(c_t)$$
where $f_t$, $i_t$, $g_t$, and $o_t$ are the output values of the forget, input, update, and output gates, respectively; $W_f$, $W_i$, $W_g$, and $W_o$ are the weight matrices; $b_f$, $b_i$, $b_g$, and $b_o$ are the bias vectors; $c_t$ and $\sigma$ denote the memory cell and the sigmoid activation function, respectively; $x_t$ is the input at time step $t$; and $\odot$ denotes element-wise multiplication. In addition, the inputs of the four gates contain the LSTM output $h_{t-1}$ from the previous time step $t - 1$.
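The gate computations described above can be sketched for a single LSTM unit with scalar state. This is a simplified illustration, not the network used in the paper: the `params` dictionary and its (w_h, w_x, b) layout are hypothetical, and a practical model would use weight matrices and vector states.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of a single-unit LSTM cell with scalar state.

    params maps each gate name ('f', 'i', 'g', 'o') to a tuple (w_h, w_x, b):
    the weight on the previous output, the weight on the input, and the bias.
    """
    def pre_activation(gate):
        w_h, w_x, b = params[gate]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre_activation("f"))    # forget gate
    i_t = sigmoid(pre_activation("i"))    # input gate
    g_t = math.tanh(pre_activation("g"))  # update (candidate) gate
    o_t = sigmoid(pre_activation("o"))    # output gate
    c_t = f_t * c_prev + i_t * g_t        # new memory cell value
    h_t = o_t * math.tanh(c_t)            # new output (hidden state)
    return h_t, c_t

# With all weights and biases at zero, every sigmoid gate outputs 0.5 and g_t is 0,
# so the cell simply halves its previous memory.
zero_params = {gate: (0.0, 0.0, 0.0) for gate in ("f", "i", "g", "o")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

The additive update of the memory cell is what lets information persist over many time steps, since the previous cell value is scaled by the forget gate rather than passed through repeated nonlinear transformations.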

Structure of the SVR-LSTM Model
Both the SVR and LSTM methods have useful properties for processing data and predicting future values. The SVR extracts the relationship between input and output variables by creating a linear mapping. The LSTM can estimate the future behavior of the data well by retaining their long-term behavior. The combination of these two models (Figure 3) can estimate the long-term load demand behavior of an MG over a short-term horizon with a high correlation coefficient.
Figure 4 shows the flowchart for training and forecasting with the SVR-LSTM model. $S_i$ ($i \in \{1, 2\}$) is a sample set of input data containing abnormal features. $T_i$ represents the sample set utilized for training SVR$_1$. $P_i$ is the sample set selected to estimate and obtain the steady series for the abnormal characteristics in $S_i$. $S_1$ is used for SVR$_2$ and LSTM training. $S_2$ is also intended for LSTM training and for SVR$_2$ and LSTM based forecasting. The start times of $T_i$, $P_i$, and $S_i$ are indicated by $S(T_i)$, $S(P_i)$, and $S(S_i)$, respectively. The time periods of $T_i$ and $P_i$ are denoted by $T(T_i)$ and $T(P_i)$, respectively, and $E(S_i)$ is the end time of $S_i$. After the structure of the proposed model is designed, SVR$_1$ is trained on $T_i$ and produces a steady series for $P_i$. Then, the abnormal characteristics, including the steady series and the time series, are added to $S_i$. These steps are repeated until sufficient samples for $S_i$ are generated [53,54].
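The repeated construction of the sample set can be read as a sliding-window loop. The sketch below is only one possible interpretation of that description: the window lengths, the step size, and the function names are assumptions, and a simple window-mean predictor stands in for the first SVR so that the example runs without external libraries.

```python
def build_sample_set(series, train_len, pred_len, fit, predict, n_samples):
    """Slide a training window T and a prediction window P over `series`.

    At each step, `fit` is called on T (the role played by the first SVR in the
    text), `predict` returns a steady series for P, and the pair
    (steady series, observed series on P) is appended to the sample set S.
    """
    samples = []
    start = 0
    while len(samples) < n_samples and start + train_len + pred_len <= len(series):
        t_window = series[start:start + train_len]
        p_window = series[start + train_len:start + train_len + pred_len]
        model = fit(t_window)
        steady = predict(model, pred_len)
        samples.append((steady, p_window))
        start += pred_len  # assumed step: advance both windows by one prediction period
    return samples

# Stand-in for the first SVR (an assumption for illustration): predict the window mean.
def fit_mean(window):
    return sum(window) / len(window)

def predict_mean(model, horizon):
    return [model] * horizon

samples = build_sample_set(list(range(10)), train_len=4, pred_len=2,
                           fit=fit_mean, predict=predict_mean, n_samples=3)
```

In a fuller implementation, the resulting pairs would feed the second SVR and the LSTM, which is outside the scope of this sketch.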

Simulation Results
Using learning methods such as SVR and LSTM to forecast the MG load requires a database of input and output variables. The dataset used in this paper is a freely available dataset of load profiles of a rural MG in Sub-Saharan Africa [55]. Access to electricity for South African citizens, including rural residents, is a human rights issue guaranteed by government policies. However, many remote rural areas suffer from problems such as the high cost of connecting to the central grid and an inadequate supply of the load demand. A practical approach to solving these problems and connecting rural communities to a sustainable electricity source using MG solutions is proposed in [56]. For this reason, this paper focuses on energy management and achieving sustainable energy in an MG in South Africa. In this dataset, the MG load is obtained from the total household and commercial load, regardless of weather conditions. The household load is modeled by characteristics such as the number of households and the percentages of high-income, medium-income, and low-income households. Various commercial entities, such as water pumping for irrigation, grain milling, small shops, schools, clinics, and street lighting, are used to model the commercial load. All modeling is performed hourly and within an hour [55]. For instance, Figure 5 shows two samples of 24-h MG load profiles generated under different conditions of the input variables. Table 1 shows the values of the input variables for each of the load profiles illustrated in Figure 5.

Table 1. Input variables in each instance of the illustrated load profiles in Figure 5 and their corresponding values (columns: Input Variable, Figure 5a, Figure 5b).
In this paper, the dataset used for each of the networks contains 240 samples of 24-h load profiles. Since the objective of this paper is short-term MG load forecasting, the input variables related to each hour are considered as one input sample, and the load in that hour is selected as its output variable. This yields 240 × 24 = 5760 hourly samples and an input matrix of size 5760 × 11, in which each column corresponds to one of the factors affecting the MG load. For each network, 70% of the data is selected for training, and the rest is used for testing. A network that can estimate an accurate relationship between the input and target variables passes the training phase with high accuracy. High competency in the training phase can also lead to better and more accurate results in the test phase.
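The construction of the input matrix and the 70/30 split can be sketched as follows. The placeholder profiles contain zeros only and stand in for the real dataset; the function names are hypothetical, and the split is taken in order without shuffling, which is an assumption since the paper does not state how the samples are selected.

```python
def flatten_profiles(profiles):
    """Turn daily profiles into hourly samples.

    profiles: list of days, each a list of 24 (features, load) pairs, where
    features is a sequence of 11 input variables for that hour.
    """
    X, y = [], []
    for day in profiles:
        for features, load in day:
            X.append(list(features))
            y.append(load)
    return X, y

def ordered_split(X, y, train_fraction=0.7):
    """Use the first train_fraction of the samples for training, the rest for testing."""
    n_train = int(round(train_fraction * len(X)))
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

# Placeholder data with the stated dimensions: 240 days x 24 hours x 11 inputs.
profiles = [[([0.0] * 11, 0.0) for _ in range(24)] for _ in range(240)]
X, y = flatten_profiles(profiles)
X_train, y_train, X_test, y_test = ordered_split(X, y)
```

With these dimensions, the training set holds 4032 hourly samples and the test set holds the remaining 1728.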
Evaluating the results of the training and test phases reveals the capability and efficiency of each network. The more thoroughly the forecasts are evaluated with statistical performance metrics, the more reliably the accuracy of the results and the effectiveness of the prediction models can be determined.
In this paper, statistical performance metrics, namely the correlation coefficient (R), Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), are utilized to assess the results. These metrics are calculated as follows [10]:
$$R = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2}}$$
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2}$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |x_i - y_i|$$
$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{x_i - y_i}{x_i} \right|$$
where $x_i$ and $y_i$ are the real and forecasted values of the load, respectively, $\bar{x}$ and $\bar{y}$ are the means of the real and forecasted values, respectively, and $N$ is the number of samples. Each of these metrics captures a specific aspect of the evaluation. R is the correlation coefficient between the actual and forecasted values of the designed model. MSE is the mean of the squared errors, i.e., the mean squared deviation between the actual and forecasted values. RMSE is a quadratic error metric and represents the standard deviation of the errors. MAE is the mean absolute distance between the actual and forecasted values. MAPE is often used in regression and time-series problems to express the accuracy of forecasts. The best outcome is the maximum value of R and the minimum values of the error metrics [4].
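The five metrics can be computed directly from their definitions, as in the following sketch. The function name and the toy values are illustrative; the code assumes that no actual value is zero (MAPE divides by the actual load) and that neither series is constant (R divides by the spread of each series).

```python
import math

def forecast_metrics(actual, forecast):
    """Return R, MSE, RMSE, MAE, and MAPE (in percent) for paired load values."""
    n = len(actual)
    mean_x = sum(actual) / n
    mean_y = sum(forecast) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, forecast))
    var_x = sum((x - mean_x) ** 2 for x in actual)
    var_y = sum((y - mean_y) ** 2 for y in forecast)
    errors = [x - y for x, y in zip(actual, forecast)]
    mse = sum(e * e for e in errors) / n
    return {
        "R": cov / math.sqrt(var_x * var_y),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 / n * sum(abs(e / x) for e, x in zip(errors, actual)),
    }

# Toy values (not from the paper's dataset).
metrics = forecast_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
perfectly_correlated = forecast_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

The second call illustrates why R should not be read alone: a forecast that is exactly twice the actual load still has R = 1, while its error metrics are large.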
Given that forecasts are made separately from the input variables for each hour of the day, the networks need to be well trained to model the MG load behavior for future forecasts. The designed networks are trained on specific data so that they can identify the patterns underlying the behavior of the input variables. Figure 6 shows the training phase results for each of the networks. The evaluation of the results in Figure 6 is performed using the R metric. The results presented in this figure express the ability and effectiveness of the hybrid SVR-LSTM method. In general, however, each of the networks is able to pass the training phase with acceptable accuracy, despite the differences in the results. Figure 7 shows the training phase errors of each network in terms of MSE and RMSE. Table 2 compares and evaluates the training phase results of each network using other statistical performance metrics.
The results of short-term MG load forecasting on the training data are evaluated for each of the models using a variety of statistical metrics. As the forecasting error of the SVR-LSTM model is lower than that of the conventional SVR and LSTM models and its correlation is high, the evaluation results indicate the efficiency of this hybrid model. Trained networks can be used as a toolbox to predict (1) new data, which relate to the future or to any other time, and (2) test data, which can be used at any time to forecast new data in the future. In the next step, test data are used to validate the training of each network.
Figure 8 shows the results of short-term MG load forecasts by each network for the test data. When learning methods are used, test results are very important, since the validity of the training phase depends on them. The results presented in the above figures, as in the training phase, show the efficiency of the hybrid SVR-LSTM procedure in short-term MG load forecasting. To better evaluate the results and efficiency of each of the models used in this study, Table 3 evaluates the test results with more statistical performance metrics. To check the accuracy of each network in forecasting the daily load on an hourly basis, a sample of the forecasted test data is presented in Figure 11. This figure shows the accuracy and prediction error of each network clearly and visually for each hour of the day.
The results presented in Table 3 show the efficiency of the saved networks in identifying new data and confirm the validity of the training of each network. It is evident that the hybrid SVR-LSTM model also performs well at this stage compared to the other two models. Between the two conventional models, LSTM provides better results than SVR for STLF in the MG. The dependency of the input variables on the hour of the day and their specific behavioral pattern make time series-based methods efficient. The proposed hybrid model can predict the demand load for each hour of future days. Hence, it is possible to provide proper scheduling for electricity consumers and generators in the MG so that they can be managed properly and participate in the electricity market with proper planning.

Comparison of the Performance of the Proposed Method with the Results of Other Studies
Analyzing and comparing results is one of the most important tasks in applying learning methods, since the performance of each model and the effectiveness of the data are established by evaluating and comparing various studies. Such a comparison should be performed with caution and for similar data. Most studies in this field rely on climate data, but meteorological data are not always available for the coming days. In this paper, STLF in the MG is performed without climatic data, focusing instead on the application conditions and on the electricity consumers and generators of an MG at any hour of the day. When using machine learning methods, selecting a method appropriate to the available data is the most important part of the analysis, and this choice significantly affects the obtained results. Table 4 compares the results of the methods used in this paper with the results of other studies. The comparison demonstrates the efficiency and accuracy of the suggested hybrid model in forecasting the short-term load in the MG. In addition to forecasting the MG load over a short time horizon, the SVR-LSTM model achieves higher accuracy across the different statistical performance metrics than the models proposed and used in other studies.

Discussion
Energy management in MGs, especially in remote areas, poses many challenges, and load forecasting can address many of them. This paper presents a procedure that accurately forecasts the consumption load of an MG over a short-term horizon without the need for meteorological information. The proposed SVR-LSTM model is structured to handle many of the problems associated with high-dimensional data and with the dependence of data on time series. The model performs well even compared with solutions presented in other studies because it captures an accurate relationship between the input and output variables in the time series. It should be noted that in machine learning applications, especially for big data, choosing a suitable method can be considered the most important stage of the project. Since the MG is examined here without RESs, future work can incorporate RES-related issues. Hence, the proposed method can be extended to net load forecasting of MGs in the presence of RESs. Additionally, evaluating the effect of each input variable on the consumption load of the MG would be valuable future work.

Conclusions
Forecasting the load of a Microgrid (MG) over a short-term horizon is valuable for the MG energy management system. Therefore, a new hybrid approach, namely Support Vector Regression-Long Short-Term Memory (SVR-LSTM), is presented in this paper for MG load forecasting. As a machine learning application, the suggested model requires a dataset. Hence, the SVR-LSTM is applied to a dataset related to the MG load in Sub-Saharan Africa. The surveys and data collection for this MG were performed without Renewable Energy Sources (RESs) in the MG and included only household and commercial loads. To demonstrate the efficiency of the designed method, the conventional SVR and LSTM models are also applied to the same data. The designed networks are trained on the input variables to learn the behavioral patterns of the factors that shape the MG load. The forecasting results are analyzed using different statistical performance metrics. The results show that the SVR-LSTM model, with the highest correlation coefficient (R = 0.9901) and the minimum error values, provides better results than the SVR and LSTM. Of the two conventional methods, the LSTM with R = 0.9809 provides better results than the SVR with R = 0.9770. Finally, by presenting a comparative approach to the results presented in this paper with the results of other studies, the efficiency of the performed hybrid model of the SVR-LSTM is

Figure 1. The main structure of the SVR.
$W_o$ denote the weight matrices; $b_f$, $b_i$, $b_g$, and $b_o$ denote the bias vectors; and $c_t$ and $\sigma$ denote the memory cell and the sigmoid activation function, respectively. In addition, the inputs of the four gates contain the LSTM target value $h_{t-1}$ at the past time step $t-1$.
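The gate notation above matches the commonly used LSTM cell update, which can be sketched in scalar form as follows. This is a minimal illustration and not the paper's implementation. It makes three assumptions: the function name `lstm_step` is hypothetical, states are reduced to scalars, and each gate weight is stored as a pair `(w_h, w_x)` acting on $h_{t-1}$ and the current input. The keys `f`, `i`, `g`, and `o` follow the subscripts of the bias vectors.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, the gate activation (sigma in the text)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step of a scalar LSTM cell (illustrative assumption).

    W maps each gate key ('f', 'i', 'g', 'o') to a pair (w_h, w_x);
    b maps each gate key to a scalar bias.
    Returns the new hidden value h_t and memory cell c_t.
    """
    def pre(key):
        # Every gate sees the previous output h_{t-1} and the current input.
        w_h, w_x = W[key]
        return w_h * h_prev + w_x * x_t + b[key]

    f_t = sigmoid(pre("f"))      # forget gate
    i_t = sigmoid(pre("i"))      # input gate
    g_t = math.tanh(pre("g"))    # candidate cell value
    o_t = sigmoid(pre("o"))      # output gate

    c_t = f_t * c_prev + i_t * g_t   # memory cell update
    h_t = o_t * math.tanh(c_t)       # new hidden value
    return h_t, c_t


# Usage with arbitrary example weights (not taken from the paper).
W = {"f": (0.1, 0.2), "i": (0.3, -0.1), "g": (0.5, 0.4), "o": (-0.2, 0.6)}
b = {"f": 0.0, "i": 0.0, "g": 0.0, "o": 0.0}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.0, W=W, b=b)
```

In a full model, the scalars become vectors and each weight pair becomes a matrix, but the order of operations stays the same.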
Figure 3 shows the hybrid structure of the SVR-LSTM model in this paper. As shown in the figure, periodic features are utilized as the inputs of the first SVR (named SVR 1) to calculate a series of initial values of the load. The recently observed actual value is selected as a time series. The steady series and time series are examples of abnormal features used as inputs of the second SVR (called SVR 2) and the LSTM. Oy1 and Oy2 are the outcomes of SVR 2 and the LSTM, respectively. The combination of Oy1 and Oy2 predicts the final output of the SVR-LSTM model [53,54].

Figure 4 shows the flowchart for training and forecasting the SVR-LSTM model. ( {1,2}) is a sample set of input data containing abnormal features. represents the sample set utilized for SVR 1 training. shows the sample set selected to estimate and gain steady series for abnormal characteristics in the . is used for SVR 2 and LSTM training. is also intended for the LSTM training and the SVR 2 and LSTM based forecasting. The start times for , , and are indicated by ( ), ( ), and ( ), respectively. The time periods for and are also denoted by ( ) and ( ), respectively. ( ) is the end time of . After forming and designing the structure of the proposed model, the SVR 1 is trained based on and produces a steady series based on . Then, the abnormal characteristics including steady series and time series are added to . These processes are repeated until sufficient samples for are generated [53,54].
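The data flow of the hybrid structure can be sketched as below. This is only an illustration of how the stages connect, and it rests on several assumptions. The three models are passed in as plain callables, and the toy lambdas in the usage are stand-ins rather than trained SVR or LSTM networks. The function name `hybrid_forecast` is hypothetical. The text does not say how Oy1 and Oy2 are combined, so the weighted average is an assumption.

```python
from typing import Callable, List, Sequence

# A "model" here is any callable mapping a feature sequence to one load value.
Model = Callable[[Sequence[float]], float]


def hybrid_forecast(
    periodic_features: Sequence[float],
    recent_load: Sequence[float],
    svr1: Model,
    svr2: Model,
    lstm: Model,
    weight: float = 0.5,
) -> float:
    """Combine SVR 2 and LSTM outputs following the described data flow.

    weight is an assumed mixing factor; the source does not specify
    the combination rule for Oy1 and Oy2.
    """
    # Stage 1: SVR 1 turns periodic features into an initial (steady) value.
    steady = svr1(periodic_features)

    # Stage 2: steady value plus recently observed load form the
    # abnormal-feature input shared by SVR 2 and the LSTM.
    abnormal: List[float] = [steady] + list(recent_load)

    oy1 = svr2(abnormal)   # outcome of SVR 2
    oy2 = lstm(abnormal)   # outcome of the LSTM

    # Stage 3: combine both outcomes into the final forecast.
    return weight * oy1 + (1.0 - weight) * oy2


# Usage with toy stand-in models (placeholders, not trained networks).
forecast = hybrid_forecast(
    periodic_features=[2.0, 4.0],
    recent_load=[5.0, 7.0],
    svr1=lambda feats: sum(feats) / len(feats),
    svr2=lambda feats: feats[0],
    lstm=lambda feats: feats[-1],
)
```

Passing the models as callables keeps the sketch independent of any particular library, so trained regressors can be substituted without changing the flow.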

Figure 4. Flowchart of the SVR-LSTM training and prediction.

Figure 5. Samples of 24-h MG load profile: (a) A low-population MG with the same income percentage for households, and (b) A populous MG with high-income households.

Figure 7. Training phase errors for each network in the MSE and RMSE formats.

Table 2. Evaluation of the results of the used networks in the training phase.

Figures 9 and 10 also illustrate the test data forecasting errors for each network in the forms of the MSE and RMSE, and in the form of histograms, respectively.

Sustainability 2020, 12, x FOR PEER REVIEW 11 of 17

indicate the efficiency of this hybrid model. Trained networks can be used as a toolbox to predict (1) new data, which relate to the future or to any other time, and (2) test data, which can be used at any time to forecast new data in the future. In the next step, test data are used to validate the training of each network.

Figure 9. Test data forecasting errors for each network in the forms of the MSE and RMSE.

Model      R        MSE      RMSE
SVR        0.9770   0.5983   0.7735   0.5855   12.83
LSTM       0.9809   0.5133   0.7164   0.5335   10.48
SVR-LSTM   0.9901   0.1316   0.3627   0.1239    3.74

In order to check the accuracy of each network in forecasting daily load on an hourly basis, a sample of forecasting test data is presented in Figure 11. This figure provides the accuracy and predictive error of each network clearly and visually for each hour of the day.
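The error metrics named in this section (the correlation coefficient R, the MSE, and the RMSE) can be computed as in the sketch below. The function names are illustrative. Reading the second and third reported values of each model as MSE and RMSE is an assumption, inferred from the fact that each third value is close to the square root of the second. The last part of the sketch checks that relation for the reported numbers.

```python
import math


def mse(actual, forecast):
    """Mean squared error between actual and forecast load values."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, the square root of the MSE."""
    return math.sqrt(mse(actual, forecast))


def correlation(actual, forecast):
    """Pearson correlation coefficient R between actual and forecast."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_f = sum(forecast) / n
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, forecast))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    return cov / math.sqrt(var_a * var_f)


# Reported values per model, read as (MSE, RMSE); this reading is an assumption.
REPORTED = {
    "SVR": (0.5983, 0.7735),
    "LSTM": (0.5133, 0.7164),
    "SVR-LSTM": (0.1316, 0.3627),
}


def rmse_matches_mse(mse_value, rmse_value, tol=1e-4):
    """True if rmse_value equals sqrt(mse_value) within the given tolerance."""
    return abs(math.sqrt(mse_value) - rmse_value) < tol


# Consistency check of the reported numbers under the assumed column reading.
consistent = {name: rmse_matches_mse(m, r) for name, (m, r) in REPORTED.items()}
```

The tolerance of 1e-4 allows for the four-decimal precision of the reported values.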

Figure 11. A sample of an hourly daily load forecast in MG by each of the utilized networks.

Table 3. Evaluation of the results of the used networks in the test phase.

Table 4. Evaluation and comparison of the results of this paper with the results of other studies.