Short-Term Electric Load and Price Forecasting Using Enhanced Extreme Learning Machine Optimization in Smart Grids

Abstract: A Smart Grid (SG) is a modernized grid that provides efficient, reliable and economical energy to consumers. Energy is the most important resource in the world, and efficient energy distribution is required as the number of smart devices increases dramatically. Forecasting of electricity consumption is a major constituent in enhancing the performance of the SG. Various learning algorithms have been proposed to solve the forecasting problem. The sole purpose of this work is to predict price and load efficiently. The first technique is Enhanced Logistic Regression (ELR) and the second is Enhanced Recurrent Extreme Learning Machine (ERELM). ELR is an enhanced form of Logistic Regression (LR), whereas ERELM optimizes weights and biases using a Grey Wolf Optimizer (GWO). Classification and Regression Tree (CART), Relief-F and Recursive Feature Elimination (RFE) are used for feature selection and extraction. On the basis of the selected features, classification is performed using ELR. Cross validation is done for ERELM using Monte Carlo and K-Fold methods. The simulations are performed on two different datasets. The first dataset, i.e., the UMass Electric Dataset, is multi-variate, while the second dataset, i.e., the UCI Dataset, is uni-variate. The first proposed model performed better with the UMass Electric Dataset than with the UCI Dataset, and the accuracy of the second model is better with UCI than with UMass. The prediction accuracy is analyzed on the basis of four performance metrics: Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Mean Square Error (MSE) and Root Mean Square Error (RMSE). The proposed techniques are then compared with four benchmark schemes to verify their adaptivity. The simulation results show that the proposed techniques outperformed the benchmark schemes and increased the prediction accuracy of load and price. However, the computational time increased in both scenarios. ELR achieved almost 5% better results than Convolutional Neural Network (CNN) and almost 3% better results than LR, while ERELM achieved almost 6% better results than Extreme Learning Machine (ELM) and almost 5% better results than Recurrent Extreme Learning Machine (RELM). The computational time increased by almost 20% with ELR and 50% with ERELM. Scalability is also addressed for the proposed techniques using half-yearly and yearly datasets. Simulation results show that ELR gives 5% better results and ERELM gives 6% better results when used with the yearly dataset.


Introduction
Traditional Grids (TGs) are used for electricity generation and distribution. The infrastructure of the TG is becoming obsolete, which results in energy loss and less efficient output. Because of this outdated infrastructure, intensive power losses occur, and these losses lead to load shedding, which is one of the major issues of today's world [1]. TGs use fossil fuels such as coal, petrol and diesel for the combustion process of turbines. The extensive use of fossil fuels leads to natural resource depletion and an increase in pollution. To overcome these issues, the literature suggests using Renewable Energy Sources (RES) and modifying the existing TGs by incorporating the latest technologies and updated infrastructure. The new and modified form of the TG is the Smart Grid (SG) [2]. Information and Communication Technology (ICT) is integrated with the TG to make the SG. It provides bi-directional communication between consumers and the utility. It monitors, protects and optimizes the generation, distribution and consumption of electric energy. It incorporates the latest technical, control and communication technologies in the TG to enable efficient energy transmission. With the ever increasing dilemma of energy shortage and cost inflation, people are attracted towards the SG. It provides consumers with reliable, economical, sustainable, secure and efficient energy, as it uses intelligent methods. In the SG, Demand Side Management (DSM) is used, which encourages consumers to optimize their energy usage. DSM allows efficient load utilization by shifting the maximum load from on-peak hours to off-peak hours. Thus, it reduces the cost of electricity. The differences between the TG and the SG are summarized in Table 1 [3].
Data analytics is the process of dealing with big data obtained from different sources. Big data is the term used for datasets having large volume, velocity, variety and veracity. Such data are extremely complex, which makes their processing difficult. Data analytics techniques are therefore necessary for the processing of big data. Data analytics can be used in a number of fields, for example, handling the financial details of customers by a bank, dealing with the flight details of different passengers by an airline company, or dealing with the electricity load and price forecasting of consumers. In the SG, data analytics is used to minimize the electricity cost and to improve the service quality of energy utilities. It is also used to predict the future patterns of electricity consumption. Forecasting is done to schedule the load consumption from on-peak hours to off-peak hours for the next day, week or month to reduce the electricity cost and enhance user comfort [4].
The terms forecasting and prediction are used interchangeably in this article, as are the terms load and consumption. The sole purpose of this work is to increase the accuracy of load and price forecasting. Two techniques, i.e., ELR and ERELM, are proposed to achieve this objective. Furthermore, two types of datasets are used, i.e., uni-variate and multi-variate. UCI is the uni-variate dataset. A uni-variate dataset contains one variable, which is load in this paper. However, real-time data has a number of variables. Thus, a multi-variate dataset is required to handle multiple variables and achieve a better understanding. In this paper, a multi-variate dataset, i.e., the UMass Electric Dataset, is used to predict the load and price. Two scenarios are considered in this paper, i.e., residential load and smart meter load. The proposed techniques outperformed existing techniques in terms of forecasting load and price. Consequently, energy prediction assists in energy management on the residential and utility sides. The list of abbreviations used in this paper is given in Table 2, whereas Table 3 shows the complete list of symbols. The rest of the paper is organized as follows: Section 2 deals with related work, and Section 3 contains a detailed description of the techniques used in this paper. Section 4 covers the proposed system models. Results and their discussion are given in Section 5, whereas Section 6 evaluates the proposed models using the performance metrics. The conclusion and future studies are discussed in Section 7.

Motivation
The authors in [5] used Multi Layer Perceptron (MLP) and Artificial Neural Network (ANN) to solve the load and price forecasting problem. We propose an enhanced technique, based on a modified loss function, to increase the accuracy of load and price forecasting. In Reference [6], the authors used ELM and RELM to predict electricity load. We propose an enhanced technique that optimizes the weights and biases of the network for efficient load forecasting. Furthermore, two scenarios are considered and two different types of datasets are used to predict the load and price efficiently.

Problem Statement
Data of SGs are increasing dramatically, so an efficient technique is required to predict the load and price of electricity. The authors in [6] used Recurrent Extreme Learning Machine (RELM) to predict the electricity load. However, in RELM, weights and biases are randomly assigned, which leads to drastic variations in prediction results. An enhanced technique is proposed to solve this issue. In Reference [7], the authors used Convolutional Neural Networks (CNNs) for predicting the energy demand. However, a CNN involves tuning a number of layers, which increases its space and time complexity.
In this paper, two enhanced techniques are proposed to increase the accuracy of electricity load and price forecasting. Uni-variate and multi-variate datasets are used for both techniques. Furthermore, analysis of both residential and utility data is performed collectively.

Contributions
The following are the contributions of this paper:

Related Work
Many forecasting techniques have been used in the past for load and price forecasting. These techniques can be categorized into three main groups: data driven, classical and Artificial Intelligence (AI). Data driven techniques consider past data to predict the desired outcomes. Classical methods comprise statistical and mathematical methods such as Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average (SARIMA), Random Forest (RF), etc., whereas AI methods, such as Feed Forward Neural Network (FFNN), Convolutional Neural Network (CNN), Long Short Term Memory (LSTM), etc., mimic the behaviour of biological neurons.

Electricity Load Forecasting
In Reference [8], behavioural analytics are performed using a Bayesian network and Multi Layer Perceptron (MLP). A number of experiments were performed using the data obtained from the smart meters. Both short-term and long-term forecasting were performed. In Reference [9], Multiple Linear Regression (MLR) is used for forecasting. However, it has the limitation that it cannot be used for long-term prediction. The authors in [10] used a residual network for forecasting load on the basis of weather data. The authors in [11] used Restricted Boltzmann Machine (RBM) to train the data and Rectified Linear Unit (ReLU) to predict the electricity load. In Reference [12], Discrete Wavelet Transform (DWT) and Inconsistency Rate (IR) methods are proposed to select the optimal features from the feature set, which helps in dimensionality reduction. Sperm Whale Algorithm (SWA) helps to optimize the parameters of the Support Vector Machine (SVM). The authors in [13] proposed a model for Short Term Load Forecasting (STLF). Mutual Information (MI) is used for feature selection, whereas better forecasting results are achieved by modifying the Artificial Neural Network (ANN). In Reference [14], the authors predicted the 24 h ahead cooling load of buildings using deep learning. The results show that deep learning techniques enhanced the load prediction. Similarly, in [15], the authors used a Recurrent Neural Network (RNN), which groups the consumers into a pool of inputs. The proposed model is implemented using the Tensorflow package and it achieved better results.
ELM is a generalized single hidden layer FFNN learning algorithm proposed by the authors in References [16,17]. It proved to be effective in both regression and classification. In References [18,19], the authors used NNs to achieve better load prediction. In the ELM learning process, input weights and biases are randomly assigned, whereas output weights are calculated using the Moore-Penrose generalized inverse technique. In Reference [20], the authors used Sparse Bayesian ELM for multi-classification purposes. The authors in [21] used Particle Swarm Optimization (PSO) and Discrete Particle Swarm Optimization (DPSO) techniques for efficient load forecasting. The authors in [22] implemented GWO with NNs to optimize weights and biases, and showed that optimization of weights and biases increases the efficiency of the network. In Reference [23], ELM is trained using back propagation by using context neurons as input to the hidden and input layers. Accuracy is improved by further adjusting weights using previous iteration errors, whereas the selection of biases and neurons affects prediction accuracy, as already discussed in [6].

Electricity Price Forecasting
In Reference [24], different models are used for price forecasting. These models belong to the class of deep learning. Based on the simulation results, it is shown that deep learning models perform better than the statistical models. In that work, Gated Recurrent Unit (GRU), which is a variant of RNN, is used. GRU outperformed LSTM and many other statistical models in terms of accuracy. In Reference [25], price forecasting is done using a variant of the Auto Regressive Moving Average Model (ARMAX), i.e., Hilbertian ARMAX, which uses exogenous variables. The functional parameters used are modeled as linear combinations of sigmoid functions. These parameters are then optimized using a Quasi Newton (QN) algorithm. In Reference [26], two AI networks, CNN and LSTM, are used for price forecasting in the PJM electricity market. In Reference [27], a Deep Neural Network (DNN) is used to extract complex patterns from the price dataset of Belgium. In Reference [28], Gray Correlation Analysis (GCA) is used along with Kernel Principal Component Analysis (KPCA) to deal with the dimensionality reduction issue. For prediction, Support Vector Machine (SVM) is used in combination with Differential Evolution (DE), where DE is used to tune the parameters of SVM. In Reference [29], a variant of the autoencoder is used. This method comprises an encoder and a decoder. First, the data is encoded to deal with space complexity. Once the output is obtained, it is decoded into its original form. The authors in [30] implemented an enhanced form of Artificial Bee Colony (ABC), known as Time Varying Coefficients Artificial Bee Colony (TVC-ABC), for parameter tuning of Nonlinear Least Square Support Vector Machine (NLS-SVM). Inputs are first fed to ARIMA and then the output of ARIMA is given to NLS-SVM. This ARIMA + TVC-ABC NLS-SVM is a Multi Input Multi Output (MIMO) forecast engine. Limitations of gradient descent methods, such as local minima, learning rate, stopping condition and learning iterations, led researchers to develop ELM [31]. ELM differs from traditional learning algorithms because it gives a comparatively lower forecasting error and better generalization performance [32].
Different versions of ELM have also been proposed by researchers, such as Kernel Based Extreme Learning Machine (KELM). Robust classification is performed in that work, which is inspired by Mercer's condition [33]. Related work is summarized in Table 4. In the literature, short-term load and price forecasting using conventional techniques is mostly performed on an individual basis, whereas we perform short-term load and price forecasting simultaneously using enhanced techniques that surpass the conventional techniques in terms of accuracy. The first proposed technique, i.e., ELR, outperformed LR in terms of prediction accuracy, whereas the second proposed technique, ERELM, uses GWO to optimize weights and biases and therefore outperformed ELM and RELM.

Existing and New Techniques
In this section, the existing and the proposed techniques are discussed.

Classification and Regression Tree (CART)
CART is a type of decision tree algorithm that consists of both classification and regression procedures, which are used to predict discrete and continuous variables, respectively. CART uses historical data to build decision trees. These newly built trees are then used for classification of data. It is a binary recursive process. A binary process has only two output values, i.e., 0 and 1. The algorithm searches all possible variables and values before performing the split operation [34]. The CART method has three main parts:
• Construction of the maximum tree,
• Choice of the right tree size,
• Classification of data using the constructed tree.
The construction of a maximum tree refers to splitting the tree down to the last set of observations. This is the most time-consuming phase in CART. Constructing the maximum tree is a complex process, and the tree can have more than a hundred levels. Therefore, the trees must be optimized before being used for classification of the data. Classification problems are those that involve discrimination between entities, e.g., discrimination among students to decide which student will be awarded the degree this year. On the other hand, regression uses historical data patterns to predict future values, e.g., load and price prediction of homes. The steps of CART are stated below:
• Specifying the accuracy criteria,
• Selecting the split size,
• Determining the threshold to stop splitting,
• Selecting the best tree.
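To illustrate the split operation at the core of CART, the following is a minimal sketch of how one binary regression split could be chosen for a single feature. The function names `variance` and `best_split` are hypothetical, and the variance-reduction criterion is an assumption (a common choice for regression trees); the accuracy criterion actually used in this work is not specified in the text.

```python
def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(x, y):
    """Return (threshold, score) of the best binary split of target y on feature x.

    Candidate thresholds are midpoints between consecutive distinct x values.
    The score is the size-weighted variance of the two child nodes (lower is better).
    If no split improves on the unsplit variance, threshold is None.
    """
    pairs = sorted(zip(x, y))
    best = (None, variance(y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [t for f, t in pairs if f <= threshold]
        right = [t for f, t in pairs if f > threshold]
        score = (len(left) * variance(left) + len(right) * variance(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best
```

A full tree would apply `best_split` recursively to each child node over all features until the stopping threshold is reached, and would then be pruned to the right size.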

Recursive Feature Elimination (RFE)
RFE is a feature extraction process. It selects the set of most important features that are least redundant. As the name suggests, it is an iterative process that keeps running in a loop until all the best features are selected. The features are then ranked in the order in which they are removed from the feature set. The computation time depends upon the number of features that need to be eliminated [35]. The pseudocode of RFE is given in Algorithm 1.
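The elimination loop described above can be sketched as follows. This is only an illustration under stated assumptions: the names `abs_correlation` and `recursive_feature_elimination` are hypothetical, and the absolute Pearson correlation with the target is used as a stand-in importance score, whereas a typical RFE implementation refits a model in every round and reads the importance from it.

```python
import math


def abs_correlation(column, target):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(target)
    mean_c = sum(column) / n
    mean_t = sum(target) / n
    cov = sum((c - mean_c) * (t - mean_t) for c, t in zip(column, target))
    var_c = sum((c - mean_c) ** 2 for c in column)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_c == 0 or var_t == 0:
        return 0.0  # a constant column carries no information
    return abs(cov) / math.sqrt(var_c * var_t)


def recursive_feature_elimination(features, target, n_keep, score=abs_correlation):
    """features maps a feature name to its list of values.

    Returns (selected_names, eliminated_names_in_removal_order).
    """
    remaining = dict(features)
    eliminated = []
    while len(remaining) > n_keep:
        # Re-score the surviving features, then drop the weakest one.
        scores = {name: score(col, target) for name, col in remaining.items()}
        worst = min(scores, key=scores.get)
        eliminated.append(worst)
        del remaining[worst]
    return sorted(remaining), eliminated
```

The removal order gives the ranking mentioned in the text: features eliminated first are the least important.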

Relief-F
Relief-F is an extensively used method for feature selection. This method randomly selects an instance R and then finds its nearest hit and miss instances. The nearest hits are the k-nearest neighbors of the selected random instance R. Afterwards, the average of all the weights of the nearest hits and misses is calculated to select the next instance. The pseudocode of Relief-F is given in Algorithm 2 [36]. Finally, the weights of all attributes are updated using Equation (1) and feature selection is performed.
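A simplified sketch of the hit/miss weighting idea is given below. It rests on several assumptions: the function name `relief_weights` is hypothetical, every instance is visited once instead of being sampled at random, all misses are pooled rather than weighted per class, features are assumed to be scaled to [0, 1], and the update is the commonly used Relief form, which may differ from Equation (1) of this paper.

```python
def relief_weights(X, y, k=1):
    """Return one relevance weight per feature.

    X: list of feature rows scaled to [0, 1]; y: list of class labels.
    A feature gains weight when it differs on nearest misses (other class)
    and loses weight when it differs on nearest hits (same class).
    """
    n, n_features = len(X), len(X[0])
    weights = [0.0] * n_features

    def distance(r1, r2):
        return sum(abs(u - v) for u, v in zip(r1, r2))

    for i in range(n):  # deterministic variant: visit every instance once
        others = [j for j in range(n) if j != i]
        hits = sorted((j for j in others if y[j] == y[i]),
                      key=lambda j: distance(X[i], X[j]))[:k]
        misses = sorted((j for j in others if y[j] != y[i]),
                        key=lambda j: distance(X[i], X[j]))[:k]
        for f in range(n_features):
            hit_diff = sum(abs(X[i][f] - X[j][f]) for j in hits) / max(len(hits), 1)
            miss_diff = sum(abs(X[i][f] - X[j][f]) for j in misses) / max(len(misses), 1)
            weights[f] += (miss_diff - hit_diff) / n
    return weights
```

Features with the largest weights separate the classes best and would be kept in the feature selection step.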

Convolutional Neural Network (CNN)
CNN is a type of NN. It is built from neurons and works like biological neurons. Each neuron is fed with some input, performs a dot product and finally gives the output. It consists of more than one convolutional layer, followed by a multilayer NN. The basic type of CNN is a 2D network and is mostly used for images. The layers in a CNN are: pooling layer, dense layer, dropout layer and convolutional layer. For forecasting data, a 1D CNN can also be used. It also has an activation function such as Sigmoid, ReLU, Tanh, etc. When new inputs are given to a CNN, it does not know the exact feature mapping. Therefore, it creates a convolutional layer and then convolves this layer to find the correct feature mapping. The pooling layer in a CNN is able to shrink large inputs. The most widely used activation function for CNN is ReLU. Its working is simple: whenever a negative number occurs, it is replaced by 0. Hidden layers are also present in a CNN. The error minimization is performed using these layers.
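The three building blocks mentioned above (convolution, ReLU and pooling) can be illustrated for the 1D case with a minimal sketch. The function names are hypothetical, the kernel is fixed by hand rather than learned, and no padding or stride options are modeled.

```python
def conv1d(signal, kernel):
    """Valid (no padding) 1D cross-correlation, as used in CNN layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(values):
    """Replace every negative value by 0."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]
```

For a load series, a kernel such as `[-1, 1]` responds to increases between consecutive readings; ReLU then keeps only the rises and pooling shrinks the resulting feature map.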

Logistic Regression (LR)
LR is a type of statistical model used for regression. It is used to analyze a given dataset and then perform predictions using the independent variables. The outcome of LR is in binary form. The main aim of LR is to describe the pattern between independent and dependent variables. There are two main components of LR: the loss function and the sigmoid function. The features should be in binary form to use the LR method. Hence, normalization of the data is required before implementing the LR model on the available data. The sigmoid function used in LR is given in Equation (2) [37], and the logistic loss function is also taken from [37].
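As an illustration of the two components named above, the sketch below implements the standard sigmoid, 1 / (1 + exp(-z)), and the standard logistic (binary cross-entropy) loss. These textbook forms are assumed here; the exact notation of [37] is not reproduced, and the function names and the clipping constant `eps` are hypothetical.

```python
import math


def sigmoid(z):
    """Standard sigmoid 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

The loss is small when the predicted probabilities agree with the labels and grows quickly for confident but wrong predictions.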

Enhanced Logistic Regression (ELR)
ELR is proposed in this paper. It is an enhanced form of the LR technique. In ELR, a new loss function is used. Loss functions are a group of objective functions that need to be minimized. A loss function measures how well a prediction model performs in predicting the outcome. Minimizing the value of the loss function increases the prediction accuracy. In this paper, the loss function is minimized to enhance the prediction accuracy. The equation for the loss function of ELR is given below:
ELR is used to predict the electricity load and price for a smart home and the load of smart meters efficiently. Two different datasets, i.e., the UMass Electric Dataset and the UCI Dataset, are used to test the proposed technique.

Grey Wolf Optimizer (GWO)
In this section, the GWO technique is discussed in detail. In the proposed model, the metaheuristic technique GWO is used. It follows the social leadership and hunting mechanism of grey wolves, as shown in Figure 1. The population is divided into groups, i.e., alpha (α), beta (β), gamma (γ) and omega (ω). α, β and γ are considered the fittest wolves, which guide the other wolves (ω) in the search space. Grey wolves update their location according to the positions of the three fittest wolves, i.e., α, β and γ [22]. The fittest solution is taken as α.
The pseudocode of GWO is given in Algorithm 3.
Algorithm 3: Pseudocode of GWO

RELM is a single hidden layer neural network (SHLRN). It is a feedback intra network that uses output or hidden layers as given in Equation (5) [6], where δ represents the delay, t is the current iteration and r is the total number of context neurons. Context neurons are connected backward from output to input. These neurons behave like input neurons and hold delayed values of the output neurons. The learning method used to update the weights and biases of ELM and RELM is similar to that shown in Figure 2. Weights and biases are decided randomly. Optimal results against weights and biases are utilized in RELM on a random basis. The training dataset is used to calculate the unknown weights of the hidden layer. The unknown weights of the hidden layer are calculated using the Moore-Penrose generalized inverse technique.
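The learning scheme described above (random weights and biases, analytically computed output weights) can be sketched for a plain ELM without context neurons. This is an illustration under stated assumptions: all function names are hypothetical, a sigmoid activation is used, and the Moore-Penrose solution is approximated by solving the regularized normal equations (H^T H + ridge I) beta = H^T y, with the small `ridge` term added only for numerical stability.

```python
import math
import random


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def hidden_layer(X, W, b):
    """Sigmoid activations H[i][j] for sample i and hidden neuron j."""
    return [[1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(W[j], row)) + b[j])))
             for j in range(len(b))] for row in X]


def train_elm(X, y, n_hidden, seed=0, ridge=1e-8):
    """Random input weights and biases; output weights from least squares."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = hidden_layer(X, W, b)
    HtH = [[sum(h[p] * h[q] for h in H) + (ridge if p == q else 0.0)
            for q in range(n_hidden)] for p in range(n_hidden)]
    Hty = [sum(h[p] * t for h, t in zip(H, y)) for p in range(n_hidden)]
    beta = solve_linear_system(HtH, Hty)
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Network output: hidden activations weighted by the output weights."""
    return [sum(h_j * beta_j for h_j, beta_j in zip(h, beta))
            for h in hidden_layer(X, W, b)]
```

Because `W` and `b` are drawn at random, different seeds give different fits, which is the source of the variation in prediction results that motivates optimizing these parameters.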

Enhanced Recurrent Extreme Learning Machine (ERELM)
ERELM is an enhanced form of RELM, whereas RELM is an enhanced form of ELM. ERELM is a single layer FFNN. In RELM, weights and biases are decided randomly, whereas the output weights are determined analytically using a simplified generalized inverse operation. The issue with ELM is that the classification boundary is not well defined and some samples are usually misclassified. To overcome this shortcoming, a new technique is proposed.
In the proposed technique, i.e., ERELM, weights and biases are decided after optimization using the GWO algorithm. GWO finds the optimized solution that minimizes RMSE. Cross validation in ERELM is done using Monte Carlo and K-Fold methods. The Monte Carlo technique is used to model the probabilistic nature of random variables. It performs risk analysis using the probability distribution. The common probability distributions used with Monte Carlo are: normal, uniform, triangular, discrete, etc. In the K-Fold cross validation process, the entire dataset is divided into K batches of samples. The value of K can be any positive integer. The most commonly used K-Fold method is 10-Fold cross validation, in which the value of K is 10. Each batch formed after splitting the data in the validation process is termed a fold. The pseudocode of ERELM is discussed in Algorithm 4.
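The two validation schemes can be sketched as index generators. The function names are hypothetical, and it is assumed here that "Monte Carlo" cross validation refers to repeated random sub-sampling into training and test sets; the text does not spell out the exact procedure.

```python
import random


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def monte_carlo_indices(n_samples, n_splits, test_fraction=0.3, seed=0):
    """Yield repeated random (train, test) splits; test sets may overlap."""
    rng = random.Random(seed)
    n_test = max(1, int(round(n_samples * test_fraction)))
    for _ in range(n_splits):
        shuffled = list(range(n_samples))
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])
```

In both cases the model would be trained on the `train` indices and its RMSE measured on the `test` indices, with the final score averaged over all splits.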

Proposed System Models
Two system models are proposed in this section. The description of these models is given below.

Proposed System Model 1
The proposed system model consists of residential load and price data of a smart home (SH). The SH under consideration consists of six rooms and eight heavy appliances. The proposed model consists of four basic steps, i.e., normalization of data, feature selection using CART and RFE, feature extraction using Relief-F and finally forecasting of load and price using CNN, LR and ELR. ELR is the proposed technique, which outperformed CNN and LR in terms of prediction accuracy. In this model, short-term forecasting is performed to make decisions for efficient load and price scheduling for the near future.
The first proposed model is shown in Figure 3.

Proposed System Model 2
The second proposed system model is shown in Figure 4. In the second system model, the load of 10 smart meters is taken. Subsequently, a comparison is performed with multivariate residential data. The first step in this model is the preprocessing of data; after the data is preprocessed, the best parameters are selected using RELM. The optimization of RELM is performed using GWO. GWO optimizes biases and weights to improve the accuracy. Thereafter, the proposed technique ERELM reduces the forecasting error. Cross validation is performed using Monte Carlo and K-Fold methods.
The assessment of both proposed models is done on the basis of four performance metrics: MAPE, MAE, RMSE and MSE. The results show that the proposed techniques beat the existing techniques in terms of prediction accuracy.

Simulation Results and Discussion
This section covers the simulation results of the proposed models along with their discussion. The simulations are performed in Spyder (Python 3.6 package) provided by Anaconda (a data science platform manufactured by Anaconda, Inc., located in Austin, Texas, USA) on an HP 450G ProBook with a 1 TB hard drive and 8 GB RAM.

Simulation Results and Discussion of Proposed System Model 1
The simulation results and discussion of proposed system model 1 are given below.

Data Description
The first dataset is taken from the UMass Electric Dataset [38]. It is a multivariate dataset used for forecasting. Half-yearly and yearly data are taken for the year 2016 to address scalability. The dataset contains the half-hourly load and price values of a single home. The dataset is divided in a 70:30 ratio, i.e., seventy percent of the data is used for training, whereas the remaining thirty percent is used for testing. Preprocessing of the dataset is done to remove the Not a Number (NaN) and blank values. The UMass dataset is used for both proposed system models, although it performs much better when used with ELR.
Table 5 shows the features of the UMass Electric Dataset excluding the target features. The target features are "Load" in the case of load prediction and "Price" in the case of price prediction. The values are given in standard units, i.e., kW for load and cents/kWh for price. The dataset is normalized to the range [0, 1].
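The preprocessing steps described above (removal of NaN and blank values, normalization to [0, 1] and a 70:30 split) can be sketched as follows. The function names are hypothetical, min-max scaling is assumed as the normalization method, and the ordered (unshuffled) split is an assumption, since the text does not state how rows were assigned to the two sets.

```python
import math


def drop_missing(values):
    """Remove blank (None) and NaN entries from a list of readings."""
    return [v for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))]


def min_max_normalize(values):
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def split_train_test(rows, train_percent=70):
    """Keep the first train_percent of rows for training, the rest for testing."""
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]
```

For example, a cleaned load column would be passed through `min_max_normalize` and then through `split_train_test` before being given to the forecasting model.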

Original Features
Air Conditioner (AC), Furnace, Cellar lights, Washer, First floor lights, Utility room + Basement, Garage outlets, Master bed + Kids bed, Dryer, Panels, Home office, Dining room, Microwave, Fridge

CART
Table 6 shows the results of CART used to predict load and price using the UMass Electric Dataset. CART gives respective values for different features.
RFE assigns one of two values to each feature, i.e., True or False. In the proposed model, the number of features selected through RFE is 8 when used for the UMass Electric Dataset. The features selected by RFE are: AC, cellar lights, washer, garage, master bed + kids bed, panels, dining room and microwave. For the UCI Dataset, RFE did not give any output, as it is a uni-variate dataset.
Figure 5a,b show the load prediction comparison of three different techniques for one day using two different hourly datasets. Similarly, Figure 6a,b show the load prediction comparison for one week using two different hourly datasets. From Figure 7a,b, the monthly load prediction comparison can be observed. In this case, to avoid cluttering the graphs, data is taken every four hours. These figures show that the proposed technique ELR outperformed LR and CNN for both datasets. It can be seen that the load prediction with ELR is close to the actual data. The prediction results obtained using the UMass Electric Dataset are better than those obtained using the UCI Dataset.

Price Forecasting
Figure 8 shows the price prediction comparison of three different techniques for one day using the UMass Electric Dataset. Similarly, Figure 9 shows the price prediction comparison for one week using the UMass Electric Dataset. From Figure 10, the monthly price prediction comparison can be observed. In this case, data is taken every four hours. These figures show that the proposed technique ELR outperformed LR and CNN for the UMass Electric Dataset in terms of price prediction. It can be seen that the price prediction with ELR is close to the actual data.

Simulation Results and Discussion of Proposed System Model 2
The simulation results and discussion of proposed system model 2 are given in this section.

Data Description
The second dataset is taken from the UCI machine learning repository. It is a uni-variate dataset developed by Artur Trindade [39]. The consumption of 370 substations is taken into consideration to analyze the load of smart meters. Daily data of the substations with meter IDs 166, 168, 169, 171, 182, 225, 237, 249, 250 and 257 are shown in Figure 11. The periodicity of load consumption can be observed. The pattern of intervals gives the trend of load consumption, which later helps in the prediction of future electricity load. The UCI Dataset is used for both proposed system models. In order to analyze scalability, half-yearly and yearly datasets are used. It performs well for smart meters because the only target feature is load. The values of load are given in kilowatts (kW). This dataset is also normalized to the range [0, 1].

Results Discussion
Multiple approximation functions are used in order to find the optimal forecasting accuracy. These functions include the hard limit, sine, tanh and sigmoid functions. The numbers of neurons and context neurons are set to 2 and 5, respectively. The MT166 dataset is selected to determine which functions produce optimal results. The dataset is normalized and scaled before use to remove spikes and noise in the data. ELM, RELM and ERELM are tested on all functions one by one, as given in Table 9. The sigmoid approximation function performed better than the other functions. The simulations for the second proposed model are carried out on both datasets using the sigmoid approximation function. In Table 10, simulation results of both datasets are given using six months of data. Cross validation is done using Monte Carlo and K-Fold. Simulations show that the proposed technique outperformed the conventional techniques in terms of forecasting and gives the minimum RMSE. Monte Carlo gives better results than K-Fold. Similarly, Table 11 addresses the scalability of the proposed system and shows that the prediction accuracy increases with the size of the dataset. Figure 12a,b show the regression line produced by predicted and actual load using ELM. Similarly, Figure 13a,b show the regression plots for RELM, which has a greater RMSE than the proposed technique. Figure 14a,b show the plots produced by ERELM, where the regression line shows actual and predicted electricity load with minimum RMSE. It is clearly visible that the predicted values are very close to the actual electricity load. Table 12 gives the computational time comparison for the execution of training and testing data of ELM, RELM and ERELM. ERELM has a higher computational time than ELM and RELM due to its metaheuristic behaviour. Thus, there is a tradeoff between accuracy and computational time.

Performance Metrics
The performance of the proposed system models is evaluated on the basis of four performance metrics: MAE, MSE, RMSE and MAPE. Of these four, MAPE is given as a percentage, whereas the other three are given as absolute values. The accuracy of the model is calculated using the following equation:
Tables 13-15 show the load performance metrics comparison for half-yearly and yearly data to address the scalability issue, using the UMass Electric Dataset. Similarly, Tables 16-18 show the price performance metrics comparison for half-yearly and yearly data to address the scalability issue, using the UMass Electric Dataset. Tables 19-21 show the load performance metrics comparison for half-yearly and yearly data to address the scalability issue, using the UCI Dataset.
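The four metrics can be computed as in the following sketch, which uses their standard definitions; the function names are hypothetical. MAPE divides by the actual values, so it is assumed here that no actual value is zero (which can occur for data normalized to [0, 1] and would need separate handling).

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Square Error: average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

Lower values indicate better forecasts for all four metrics, and RMSE penalizes large individual errors more strongly than MAE.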

Conclusions and Future Work
In this paper, electricity load and price forecasting are performed using two techniques.UMass Electric Dataset is used to predict day ahead, week ahead and month ahead load and price of a SH.Six months of hourly data are considered for day ahead and week ahead prediction, whereas four hours of data are considered for month ahead prediction.It is a multi-variate dataset.The data is first normalized and split into a training set and testing set.Feature engineering is then performed using three different techniques: RFE, CART and Relief-F.For efficient load and price prediction, a new technique, i.e., ELR is proposed.ELR outperformed CNN and LR in terms of prediction accuracy.ELR is used for UCI Dataset as well.It is a uni-variate dataset having data of smart meters of different substations.The results show that the first proposed model works well with UMass Electric Dataset.The techniques used are then accessed on the basis of four different performance metrics, i.e., MAPE, MAE, MSE and RMSE.The simulation results show that ELR outperformed LR and CNN for both datasets.
For accurate short term load forecasting, a new technique, i.e., ERELM, is proposed. Short term forecasting is performed to ensure efficient load scheduling and price reduction. Parameter optimization of RELM is done using GWO, which optimizes the biases and weights to improve the accuracy. Prediction accuracy is further increased using Monte Carlo and K-Fold cross validation. ERELM is used with both datasets. The results show that ERELM works well for the UCI Dataset. It is observed that ERELM outperformed ELM and RELM for both datasets. Scalability is also addressed using both proposed techniques. The results indicate that the prediction accuracy increases with the size of the dataset.
In the future, the proposed methods will be used to perform mid-term and long-term forecasting. The weights and biases of ERELM will be further optimized using better methods. Furthermore, additional work is required to reduce the computational time of ELR and ERELM.

Algorithm 1: Pseudocode of RFE
Input: Initialization
Tune the model using the training set
Calculate the performance
Calculate variable importance
for (each subset size S(i), i = 1...S) do
    Select the most important variables from S(i)
    ...
Establish the performance profile using S(i)
Determine the number of important variables
Use the model corresponding to the optimized S(i)
End
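The elimination loop at the core of RFE can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this work: the model is an ordinary least-squares fit via NumPy, variable importance is taken as the absolute coefficient (which presumes features on a comparable scale), and the function name `rfe_linear` and the synthetic data are hypothetical. The performance profile over subset sizes is omitted.

```python
import numpy as np

def rfe_linear(X, y, n_keep):
    """Backward elimination: refit, rank by |coefficient|, drop the weakest."""
    remaining = list(range(X.shape[1]))  # column indices still in the model
    eliminated = []                      # dropped columns, weakest first
    while len(remaining) > n_keep:
        coef, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        weakest = int(np.argmin(np.abs(coef)))
        eliminated.append(remaining.pop(weakest))
    return remaining, eliminated

# Synthetic, standardized data (hypothetical): only columns 0 and 2 drive y.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.01 * rng.standard_normal(200)
selected, eliminated = rfe_linear(X, y, n_keep=2)
print(selected, eliminated)
```

Refitting after every removal is what distinguishes recursive elimination from a one-shot ranking: the importance of the surviving variables is re-estimated once a correlated competitor has been dropped.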

Algorithm 2: Pseudocode of Relief-F
Input: Initialization
Assign weights to all attributes (A): W[A] = 0
for (i = 1 to m) do
    Randomly select an instance Ri
    Find k-nearest hits, Hj
end
for (all other classes C != class(Ri)) do
    Find k nearest misses, Mj(C)
end
for (A = 1 to a) do
    ...
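A compact sketch of Relief-F for numeric attributes and discrete class labels is given below. The weight update shown is the commonly published Relief-F form (hits decrease a weight, prior-weighted misses increase it) and is an assumption here, as are the function name `relieff`, the range-normalized difference and the toy data.

```python
import random

def relieff(X, y, m, k, seed=0):
    """Return one weight per attribute; larger means more relevant."""
    rng = random.Random(seed)
    n, a = len(X), len(X[0])
    # Attribute ranges scale every difference into [0, 1].
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(a)]
    diff = lambda j, u, v: abs(u[j] - v[j]) / span[j]
    dist = lambda u, v: sum(diff(j, u, v) for j in range(a))
    prior = {c: y.count(c) / n for c in set(y)}
    W = [0.0] * a
    for _ in range(m):
        i = rng.randrange(n)  # randomly selected instance Ri
        for c in prior:
            # k nearest neighbours of Ri inside class c (hits or misses)
            neigh = sorted((t for t in range(n) if t != i and y[t] == c),
                           key=lambda t: dist(X[i], X[t]))[:k]
            if not neigh:
                continue
            # Hits are penalized; misses are rewarded, weighted by class prior.
            scale = -1.0 if c == y[i] else prior[c] / (1.0 - prior[y[i]])
            for j in range(a):
                W[j] += scale * sum(diff(j, X[i], X[t]) for t in neigh) / (m * len(neigh))
    return W

# Toy data (hypothetical): attribute 0 separates the classes, attribute 1 does not.
X = [[0.0, 0.1], [0.1, 0.9], [0.2, 0.5], [0.05, 0.3],
     [0.9, 0.1], [1.0, 0.9], [0.8, 0.5], [0.95, 0.3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
weights = relieff(X, y, m=8, k=2)
```

Attributes whose weight exceeds a chosen threshold are retained; the threshold itself is a tuning choice.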

Figure 1. Grey wolf social hierarchy.
The general steps that are followed in GWO are:
• Initialize the parameters of the grey wolves, such as the maximum number of iterations, the population size, and the upper and lower bounds of the search space,
• Calculate the fitness value to initialize the position of each wolf,
• Select the three best wolves, i.e., α, β and γ,
• Calculate the positions of the remaining wolves (ω),
• Repeat from step 2 if the current solution is not satisfactory,
• Take the fittest solution as α.
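The GWO steps can be sketched in a few lines. This is a minimal illustration following the widely used position update, in which each wolf moves toward the average of three leader-guided positions while a control parameter decays linearly from 2 to 0. The function name `gwo`, the default population size and iteration count, and the sphere test function are assumptions, not settings from this work.

```python
import random

def gwo(fitness, dim, lower, upper, n_wolves=20, max_iter=100, seed=0):
    """Minimize `fitness` over the box [lower, upper]^dim."""
    rng = random.Random(seed)
    # Step 1: initialize the population inside the search bounds.
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_wolves)]
    best = min(wolves, key=fitness)[:]
    for it in range(max_iter):
        # Steps 2-3: evaluate fitness and pick the three best wolves.
        wolves.sort(key=fitness)
        leaders = [w[:] for w in wolves[:3]]
        if fitness(leaders[0]) < fitness(best):
            best = leaders[0][:]
        a = 2.0 * (1.0 - it / max_iter)  # decays linearly from 2 toward 0
        # Step 4: move every wolf toward the three leaders.
        for w in wolves:
            for d in range(dim):
                pulled = 0.0
                for lead in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    pulled += lead[d] - A * abs(C * lead[d] - w[d])
                w[d] = min(upper, max(lower, pulled / 3.0))
    # Final step: the fittest solution found is returned as alpha.
    final = min(wolves, key=fitness)
    return final if fitness(final) < fitness(best) else best

# Usage on a simple test function (hypothetical objective).
sphere = lambda x: sum(v * v for v in x)
alpha = gwo(sphere, dim=3, lower=-5.0, upper=5.0)
```

When GWO tunes a network, the position vector would hold the flattened weights and biases, and `fitness` would return the forecasting error obtained with them.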

Algorithm 4: Pseudocode of ERELM
Input: Original dataset of N samples, objective function
Output: Predicted desired value
begin
    Assign the input weights $w_i$ and biases $b_i$ as received from GWO
    Calculate the hidden layer output matrix $H$, where $H = [h_{ij}]$, $i = 1, \ldots, N$, $j = 1, \ldots, K$ and $h_{ij} = g(w_j \cdot x_i + b_j)$
    Calculate the output weight matrix as $\beta = H^{+} T$, where $H^{+}$ is the Moore-Penrose generalized inverse of the matrix $H$
    Updated weights are given as context neurons to input and hidden layers
end
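The hidden layer and output weight steps can be illustrated with NumPy. This sketch covers only the feed-forward ELM core under stated assumptions: the input weights `W` and biases `b` are drawn at random as stand-ins for the values that GWO would supply, the recurrent context neurons are omitted, `g` is taken as the sigmoid, and all function names and the toy target are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_output_weights(X, T, W, b):
    """Solve beta = pinv(H) @ T for fixed input weights W and biases b."""
    H = sigmoid(X @ W + b)        # hidden layer output matrix, shape (N, K)
    return np.linalg.pinv(H) @ T  # Moore-Penrose generalized inverse

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Toy regression problem (hypothetical): fit t = x^2 on normalized inputs.
rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
T = X ** 2
K = 20                                  # number of hidden neurons (illustrative)
W = rng.uniform(-1.0, 1.0, size=(1, K))  # placeholder for GWO-supplied weights
b = rng.uniform(-1.0, 1.0, size=K)       # placeholder for GWO-supplied biases
beta = elm_output_weights(X, T, W, b)
rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
```

Because the output weights follow from a single pseudo-inverse, only `W` and `b` remain for a metaheuristic to search over; the `rmse` obtained for a candidate pair is a natural fitness value.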

Figure 10. One month price prediction using UMass.

Table 1. Differences between TG and SG.

Table 2. List of abbreviations.

Table 3. List of symbols.

Table 4. Summary of related work.

Table 5. Features in UMass Electric Dataset.

Table 6. Results of CART for UMass Electric Dataset.
RFE is implemented for feature selection. RFE keeps iterating until the model is left with only the most prominent features. The choice of features depends upon the requirements. The results of RFE are given in Table

Table 7. RFE features for the UMass Electric Dataset.
Relief-F is used for feature extraction. The threshold for Relief-F is 10. Table 8 shows the Relief-F features for the UMass Electric Dataset. Relief-F did not give any output when used with the UCI Dataset because that dataset is uni-variate.

Table 8. Relief-F features for UMass Electric Dataset.

Table 10. Obtained RMSE for half-yearly testing data using ELM, RELM and ERELM by Monte Carlo and K-Fold cross validation.

Table 11. Obtained RMSE for yearly testing data using ELM, RELM and ERELM by Monte Carlo and K-Fold cross validation.

Table 12. Computational time comparison of ERELM, RELM and ELM execution.

Table 13. Load performance metrics comparison for one day using the UMass Electric Dataset.

Table 14. Load performance metrics comparison for one week using the UMass Electric Dataset.

Table 15. Load performance metrics comparison for one month using the UMass Electric Dataset.

Table 16. Price performance metrics comparison for one day using the UMass Electric Dataset.

Table 17. Price performance metrics comparison for one week using the UMass Electric Dataset.

Table 18. Price performance metrics comparison for one month using the UMass Electric Dataset.

Table 22. Accuracy of ERELM using RMSE, MSE and MAE for half-yearly data.

Table 23. Accuracy of ERELM using RMSE, MSE and MAE for yearly data.