Short Term Active Power Load Prediction on A 33/11 kV Substation Using Regression Models

Abstract: Electric power load forecasting is an essential task in the power system restructured environment for successful trading of power in energy exchange and economic operation. In this paper, various regression models have been used to predict the active power load. Model optimization with dimensionality reduction has been done by observing correlation among original input features. Load data has been collected from a 33/11 kV substation near Kakatiya University in Warangal. The regression models with available load data have been trained and tested using Microsoft Azure services. Based on the results analysis it has been observed that the proposed regression models predict the demand on substation with better accuracy.


Introduction
Electric power industries are seeking load prediction tools so that the balance between load and generation can be maintained properly. Prediction of active power load is required for arranging regular-interval activities as power firms expand their infrastructure [1]. Accurate load forecasting systems provide a better understanding of the dynamics of existing power systems [2]. Electric load forecasting is classified into three categories based on time horizon, as presented in Table 1 [3].
Short-term active power load prediction is vital to effective power system service, such as dispatching power into the network to prevent regular power outages. Short term active power estimation is a critical prerequisite for optimal dispatch of generators in power plants [4]. Customers would be able to select a more cost-effective energy usage scheme if the short-term load forecasting methodology was more accurate. It helps the power system to reduce cost of power production and to utilize resources optimally [5].
Artificial Intelligence (AI) is an integral part of many fields; some of its main subfields are machine learning and swarm intelligence. Machine learning has become an integral part of many fields, such as civil engineering applications [6,7], image processing [8] and time series data prediction [9]. Swarm intelligence was developed by taking inspiration from the swarming behavior of various natural systems, and it is used to solve various optimization problems [10,11].
Estimation methods to predict the active power load were classified into two classes, as shown in Figure 1. Prediction tools were used to estimate solar irradiation, temperature and wind speed. An ARIMA time series forecast model was developed in [12] to predict the temperature in Pakistan, together with a linear trend model to estimate electric power consumption. Digital Elevation Models were developed in [13] to predict the solar irradiation.
Forecasting techniques can help power system operators exchange active power for the highest possible benefit by calculating active power load and energy price. Electric energy price was predicted using artificial neural networks in [14] by considering day category, hour marker, holiday index, electric load, nonconventional energy generation and natural gas price as input features.

Table 1. Active power load prediction classification.

Load Prediction Type    Time                   Usage
Short term              Few hours to days      Electric power generation and transmission scheduling
Medium term             Few weeks to months    Fuel purchase scheduling
Long term               1-10 years             Establishment of power sector entities

A new model was developed in [15] to predict the active power load. Active power load was estimated in [16] by considering day category, hour marker, holiday index, electric load, renewable energy generation using artificial neural networks and MLR model. Active power load was predicted in [17] based on load data of last four hours. Active power load was estimated in [18][19][20] based on load data for the last four hours and load data at the same hour for the last two days.
An ANN model was developed in [21] to forecast the half-hourly electric load demand in Tunisia. The authors used a Levenberg-Marquardt learning algorithm to train the ANN model. Prediction of electricity demand and price one day ahead using functional models was discussed in [22]. Estimation of electric power consumption in Shanghai using a grey forecasting model was discussed in [23]. All of these methods make useful advances in load estimation, but they overlook useful elements such as dimensionality reduction, which improves model accuracy per number of model parameters.
In this paper, stochastic gradient descent optimizer [24] was used to update the parameters in the regression models. The methods described are analyzed by comparing them to previously developed models [17,18].
The main contributions of this paper are as follows:
• SLR and PR models were used to predict the active power load.
• A new approach, i.e., predicting the active power load based on the load during the last three hours and the load one day before, was used with various regression models, and a dimensionality reduction technique was used to reduce the complexity of the model so that the overfitting problem was avoided.
• Data analytic tools were used to process the data before feeding it to the model.

Methodology
Active power load on a 33/11 kV substation has been predicted using regression models, namely SLR, MLR and PR. In all regression models, a stochastic gradient descent optimizer has been used to update the parameters so that the error, i.e., the difference between actual and predicted load, is minimized.

Simple Linear Regression Model (SLR)
In SLR, the output variable ($Y_a$) is related linearly to the input variable ($X_a$). For a given input $X_a$, the predicted output $Y$ is calculated using Equation (1). The gradient descent optimization method has been used to find the values of $m$ and $c$ for the given inputs ($X_a$) and corresponding outputs ($Y_a$), such that the total distance between all the output data points and the line represented by Equation (1) is minimum, as shown in Figure 2. The objective of the gradient descent optimization method is to minimize the half mean distance between the actual data points and the line, as shown in Equation (2). This half mean distance is also called the error, and it measures the difference between the actual output $Y_a$ and the predicted output $Y$. Because the error to be minimized is a convex function of $m$ and $c$, the gradient descent optimizer can reach the global minimum for a suitably chosen step size. The optimizer updates the solution by stepping in the direction opposite to the gradient until it reaches the point where the gradient is zero. In the linear regression problem, $m$ and $c$ are the variables; the step size for $m$, i.e., $\delta m$, and the step size for $c$, i.e., $\delta c$, have been computed using Equations (3) and (4), respectively. The variables $m$ and $c$ are updated using Equations (5) and (6), respectively, so that the gradient approaches zero.
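Equations (1) to (6) are referenced in the paragraph above but are not reproduced in this text. One form that is consistent with the description (a linear model, a half mean squared error, and steps taken opposite to the gradient) is sketched below. The learning rate $\eta$, the sample count $n$, the sample index $i$ and the sign convention of the step are assumptions, not necessarily the authors' notation.

```latex
\begin{align}
Y &= m X_a + c \tag{1}\\
E &= \frac{1}{2n}\sum_{i=1}^{n}\left(Y_{a,i} - Y_i\right)^2 \tag{2}\\
\delta m &= -\eta\,\frac{\partial E}{\partial m}
          = \frac{\eta}{n}\sum_{i=1}^{n}\left(Y_{a,i} - Y_i\right) X_{a,i} \tag{3}\\
\delta c &= -\eta\,\frac{\partial E}{\partial c}
          = \frac{\eta}{n}\sum_{i=1}^{n}\left(Y_{a,i} - Y_i\right) \tag{4}\\
m &\leftarrow m + \delta m \tag{5}\\
c &\leftarrow c + \delta c \tag{6}
\end{align}
```

With this convention, both steps vanish exactly when the gradient of $E$ is zero, which matches the stopping behavior described in the text.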
SLR has been used to predict the active power load (L(D, t)) on a substation at a particular hour (t) of the day (D) based on the active power load (L(D−1, t)) at the same time (t) on the previous day (D−1). In this scenario, the L(D−1, t) data points represent the input ($X_a$), whereas the L(D, t) data points represent the output ($Y_a$). The procedure for load prediction using SLR is presented in Figure 3.
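The SLR procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses batch (not stochastic) gradient descent, the function names `make_slr_dataset` and `fit_slr`, the learning rate `eta` and the tolerance `tol` are hypothetical, and the data is a synthetic sine wave rather than the substation measurements.

```python
import numpy as np

def make_slr_dataset(hourly_load):
    """Pair each hourly load L(D, t) with the load 24 h earlier, L(D-1, t)."""
    load = np.asarray(hourly_load, dtype=float)
    x = load[:-24]   # input  X_a: L(D-1, t)
    y = load[24:]    # output Y_a: L(D, t)
    return x, y

def fit_slr(x, y, eta=0.1, tol=1e-6, max_steps=10_000, seed=0):
    """Fit Y = m*X + c by gradient descent on the half mean squared error."""
    rng = np.random.default_rng(seed)
    m, c = rng.normal(size=2)            # random initialisation of m and c
    n = len(x)
    for _ in range(max_steps):
        residual = y - (m * x + c)       # Y_a - Y
        error = np.sum(residual ** 2) / (2 * n)
        if error <= tol:                 # stop once the error is small enough
            break
        m += eta * np.sum(residual * x) / n   # step opposite to dE/dm
        c += eta * np.sum(residual) / n       # step opposite to dE/dc
    return m, c

# Synthetic stand-in for 30 days of normalised hourly load (assumption).
hours = np.arange(24 * 30)
load = np.sin(2 * np.pi * hours / 24)
x, y = make_slr_dataset(load)
m, c = fit_slr(x, y)
```

Because the synthetic series repeats exactly every 24 hours, the fitted line should be close to Y = X, i.e., m near 1 and c near 0.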

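[Figure 3: flowchart of the SLR load prediction procedure. Steps: start; prepare the dataset for $X_a$ and $Y_a$; randomly initialize $m$ and $c$, choose $\eta$ and the error tolerance, and initialize $n_s = 1$; calculate $Y$ using Equation (1); calculate $E$ using Equation (2); check whether $E$ is within the tolerance; if not, calculate $\delta m$ and $\delta c$.]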

Multiple Linear Regression Model (MLR)
In MLR, the output variable ($Y_a$) is related linearly to multiple input variables ($X_{a1}, X_{a2}, \cdots, X_{an}$). For given input variables ($X_{a1}, X_{a2}, \cdots, X_{an}$), the output $Y$ is predicted using Equation (7). The gradient descent optimization method has been used to find the values of ($m_1, m_2, \cdots, m_n$) and $c$ for the given inputs ($X_a$) and corresponding outputs ($Y_a$), such that the half mean distance between all the output data points and the hyperplane represented by Equation (7) is minimum, as shown in Figure 4.

$$Y = m_1 X_{a1} + m_2 X_{a2} + \cdots + m_n X_{an} + c \qquad (7)$$

In MLR, the step sizes $\delta m_i$ ($\delta m_1, \delta m_2, \cdots, \delta m_n$) and $\delta c$ have been computed using Equations (8) and (9), respectively. The variables $m_i$ ($m_1, m_2, \cdots, m_n$) and $c$ are updated using Equations (10) and (6), respectively, so that the gradient approaches zero.
MLR has been used to predict the active power load (L(D,t)) on a substation at a particular hour (t) of the day (D) based on the load data for the last three hours before the time of prediction, i.e., L(D, t−1), L(D, t−2) and L(D, t−3), and the active power load (L(D−1, t)) at the same time (t) on the previous day (D−1). In this scenario, the L(D, t−1), L(D, t−2), L(D, t−3) and L(D−1, t) data points represent the inputs ($X_a$), whereas the L(D,t) data points represent the output ($Y_a$). The procedure for load prediction using MLR is presented in Figure 5.
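As an illustration of the four-feature setup described above, the following sketch builds the lagged inputs from an hourly series and fits Equation (7) by gradient descent. It is a hedged example: the helper names `make_mlr_dataset` and `fit_mlr`, the learning rate and the step count are hypothetical, batch updates are used instead of stochastic ones, and the series is synthetic.

```python
import numpy as np

def make_mlr_dataset(hourly_load):
    """Inputs [L(D,t-1), L(D,t-2), L(D,t-3), L(D-1,t)] and target L(D,t)."""
    load = np.asarray(hourly_load, dtype=float)
    t = np.arange(24, len(load))      # first index with a full day of history
    X = np.column_stack([load[t - 1], load[t - 2], load[t - 3], load[t - 24]])
    y = load[t]
    return X, y

def fit_mlr(X, y, eta=0.05, steps=5_000):
    """Gradient descent for Y = m_1 X_1 + ... + m_n X_n + c."""
    n_samples, n_features = X.shape
    m = np.zeros(n_features)
    c = 0.0
    for _ in range(steps):
        residual = y - (X @ m + c)                # Y_a - Y
        m += eta * (X.T @ residual) / n_samples   # one step per coefficient m_i
        c += eta * residual.mean()                # step for the intercept c
    return m, c

# Synthetic stand-in for 30 days of normalised hourly load (assumption).
hours = np.arange(24 * 30)
load = np.sin(2 * np.pi * hours / 24)
X, y = make_mlr_dataset(load)
m, c = fit_mlr(X, y)
mse = float(np.mean((y - (X @ m + c)) ** 2))
```

The first 24 samples are dropped because they have no value for L(D−1, t); on real data the same construction applies after normalization.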

Polynomial Regression Model (PR)
In PR, the output variable ($Y_a$) is related nonlinearly to the input variable ($X_a$). For a given input $X_a$, the output $Y$ is predicted using Equation (11). The gradient descent optimization method has been used to find the values of ($m_1, m_2, \cdots, m_p$) and $c$ for the given input ($X_a$) and corresponding outputs ($Y_a$), such that the total distance between all the output data points and the nonlinear curve represented by Equation (11) is minimum, as shown in Figure 6. The objective of the gradient descent optimization method is to minimize the half mean distance between the actual data points and the nonlinear curve, as shown in Equation (2).
The gradient descent optimizer updates the solution by stepping in the direction opposite to the gradient until it reaches the point where the gradient is zero. In the PR problem, ($m_1, m_2, \cdots, m_p$) and $c$ are the variables; the step size for $m_i$, i.e., $\delta m_i$, and the step size for $c$, i.e., $\delta c$, have been computed using Equations (12) and (13), respectively. The variables $m_i$ and $c$ are updated using Equations (10) and (6), respectively, so that the gradient approaches zero.
The PR model has been used to predict the active power load (L(D,t)) on a substation at a particular hour (t) of the day (D) based on the active power load (L(D−1,t)) at the same time (t) on the previous day (D−1). In this scenario, the L(D,t) data points represent the output ($Y_a$), whereas the L(D−1,t) data points represent the input ($X_a$). The procedure for load prediction using PR is presented in Figure 7.
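Equation (11) is not reproduced in this text. The sketch below assumes the usual polynomial form Y = m_1 X + m_2 X^2 + ... + m_p X^p + c and fits it with the same gradient-descent update used for the linear models. The names `poly_features` and `fit_pr`, the learning rate and the step count are hypothetical, and the data is a synthetic quadratic.

```python
import numpy as np

def poly_features(x, degree):
    """Columns x, x^2, ..., x^degree (the constant c is handled separately)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x ** k for k in range(1, degree + 1)])

def fit_pr(x, y, degree, eta=0.05, steps=20_000):
    """Gradient descent for Y = m_1 X + ... + m_p X^p + c (assumed form)."""
    X = poly_features(x, degree)
    m = np.zeros(degree)
    c = 0.0
    n = len(y)
    for _ in range(steps):
        residual = y - (X @ m + c)       # Y_a - Y
        m += eta * (X.T @ residual) / n  # step for each m_i
        c += eta * residual.mean()       # step for c
    return m, c

# Synthetic check: recover a known quadratic on inputs scaled to [-1, 1].
x = np.linspace(-1.0, 1.0, 201)
y = 0.1 - 0.3 * x + 0.5 * x ** 2
m, c = fit_pr(x, y, degree=2)
```

Raw powers of the input become strongly correlated as the degree grows, so plain gradient descent may converge slowly for high-degree models; scaling the input to a small range, as done here, helps but does not remove the issue.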

Dimensionality Reduction
In data science forecasting problems, there are often too many variables used to make the final estimate. These variables are also known as features. The more features there are, the more difficult it is to visualize the training set and work with it. Occasionally, most of these features are strongly correlated and therefore redundant. This is where feature reduction algorithms come into play. Feature reduction is the process of reducing the number of variables taken into account by retaining a set of principal variables.
A dimensionality reduction technique based on the correlation among input features was used to reduce complexity and to avoid the overfitting problem in the MLR model.

Model Performance Metrics
Widely used error metrics, namely MAE [25], MSE [26] and RMSE [27], as shown in Equations (14)-(16), respectively, were used to measure performance and to select the best model structure among the SLR, MLR and PR models.
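Equations (14)-(16) are not reproduced in this text; the sketch below uses the standard definitions of the three metrics (mean absolute error, mean squared error and its square root). The function name `regression_errors` is hypothetical.

```python
import numpy as np

def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    diff = actual - predicted
    mae = float(np.mean(np.abs(diff)))   # mean absolute error
    mse = float(np.mean(diff ** 2))      # mean squared error
    rmse = float(np.sqrt(mse))           # root mean squared error
    return mae, mse, rmse

# Small worked example: only the last prediction is off, by 2.
mae, mse, rmse = regression_errors([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Since RMSE is the square root of MSE, any reported pair of values can be cross-checked against each other.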
Data from the electric utility, i.e., the 33/11 kV substation, were collected to train and test the machine learning models. The variation in predicted output with respect to each input feature is shown in Figure 8; the data samples cluster closely along a line. Given this data distribution, we begin with regression models rather than more complex models that provide highly nonlinear mappings between inputs and outputs, which are not necessary for this data. However, to improve prediction accuracy, some nonlinearity was added to the regression model in the form of a PR model, and multiple inputs were also tried. The proposed regression models are mathematically simple and have few model parameters (lightweight models), which allows quick computation and compact storage of the deployable model while retaining good accuracy, making them well suited to the distribution of the substation load data.

Results
Historical load data to train and test the models were taken from [28]. Data processing techniques for observing the data distribution and outliers and for data normalization were applied before using this data to train and test the regression models. A stochastic gradient descent optimizer has been used to train the proposed regression models. The proposed regression models have been implemented and tested in a cloud computing environment using Microsoft Azure Notebook [29].
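The preprocessing steps mentioned above (outlier inspection and normalization) could look like the following sketch. The text does not state which outlier rule or normalization was used, so the box-plot whisker rule with factor 1.5 and z-score standardization are assumptions, as are the function names and the sample values.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """True where a sample lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

def standardize(values):
    """Z-score normalisation: zero mean and unit standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

# Hypothetical load readings in MW; the last value is deliberately extreme.
readings = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
mask = iqr_outlier_mask(readings)
scaled = standardize(readings)
```

A box plot draws its whiskers with the same interquartile-range rule, so an all-False mask corresponds to a box plot with no outlier markers.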

Simple Linear Regression
A total of 2160 samples have been considered in the dataset; out of these, 1728 samples have been used for training and the remaining 432 samples have been used for testing.

Data Analysis
Statistical information of the active power load dataset is presented in Table 2. The scatter of the data available in the dataset is presented in Figure 9. From the data scatter, it has been observed that a linear regression model can predict the load with good accuracy. Box plots and histograms have been used to observe the input and output data distributions. Input and output data histograms are presented in Figure 10, and it has been observed that both input and output data samples follow the normal distribution. The box plot shown in Figure 11 is used to identify outliers in the data and confirms that there are no outliers in the active power load dataset. The distribution of the actual load $Y_a$ and the load predicted using Equation (1) with the training load data samples is presented in Figure 12; similarly, the distribution of the actual load $Y_a$ and the predicted load with the testing load data samples is presented in Figure 13. The performance of the model was observed using MAE, MSE and RMSE, and the respective values with the testing dataset were 0.0939, 0.0163 and 0.1277. The comparison of the actual load available in the testing dataset and the load predicted with the trained SLR model is presented in Figure 14.

Multiple Linear Regression
A total of 2160 samples have been considered in the dataset; out of these, 1728 samples have been used for training and the remaining 432 samples have been used for testing of MLR model.

Data Analysis
Statistical information in terms of mean, standard deviation and interquartile range of the active power load dataset is presented in Table 3. The dataset contains a total of four input parameters, L(D,t−1), L(D,t−2), L(D,t−3) and L(D−1,t), and one output parameter, L(D,t). Box plots and histograms have been used to observe the input and output data distributions. Input and output data histograms are presented in Figure 15, and it has been observed that both input and output data samples follow the normal distribution. The box plot shown in Figure 16 is used to identify abnormal data samples, and it confirms that there are no outliers in the active power load dataset for the MLR model. From Table 4, it has been observed that the load one hour before and one day before has a positive and high impact on the predicted load, whereas the load two and three hours before has a negative and low impact on the predicted load. The training performance of the MLR model was measured in terms of MAE, MSE and RMSE, and the respective values are 0.0723, 0.0105 and 0.1026. The respective values with the testing dataset were 0.0766, 0.0119 and 0.1092. The comparison of the actual load available in the testing dataset and the load predicted with the trained MLR model is presented in Figure 17.

MLR with Dimensionality Reduction (DR)
In the MLR model, the load at a particular hour (t) of the day (D), i.e., L(D,t), was predicted based on the load data for the last three hours before the time of prediction and the load 24 h before. This means that a total of four input features were considered. The correlation among the four input features was computed and is presented in Table 5; this information was used for dimensionality reduction, i.e., to reduce the number of input features and the complexity of the model. As presented in Table 5, L(D,t−2) and L(D,t−3) have more than 75% correlation with L(D,t−1) and L(D,t−2), respectively. L(D,t−2) is therefore strongly correlated with both of its neighboring features, so it was removed from the input features, and only L(D,t−1), L(D,t−3) and L(D−1,t) were considered as input features to predict the load L(D,t) using the MLR model. From Table 6, it has been observed that the load one hour before and one day before has a positive and high impact on the predicted load, whereas the load three hours before has a negative and low impact on the predicted load. The behavior of the MLR model prior to and after feature reduction is presented in Table 7; in terms of error metrics, it was almost the same before and after dimensionality reduction. The model with dimensionality reduction was also compared with the model without dimensionality reduction in terms of MAE, MSE and RMSE on the testing dataset, as presented in Table 8. The testing performance of the model improved after applying dimensionality reduction, which is attributed to the reduction of overfitting achieved by reducing the number of input features and the complexity of the model. The comparison of the actual load available in the testing dataset and the load predicted with the trained MLR model after reducing the number of input features is presented in Figure 18.
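One way to automate the correlation-based selection described above is sketched here. The rule used (repeatedly drop the feature that takes part in the most above-threshold pairs) is an assumption that happens to reproduce the reported choice; the text does not state the exact rule. The correlation values below are illustrative placeholders, not the contents of Table 5, and `drop_correlated` is a hypothetical name.

```python
import numpy as np

def drop_correlated(corr, names, threshold=0.75):
    """Drop features until no remaining pair has |correlation| > threshold."""
    corr = np.abs(np.asarray(corr, dtype=float))
    keep = list(range(len(names)))
    while len(keep) > 1:
        sub = corr[np.ix_(keep, keep)]
        high = (sub > threshold) & ~np.eye(len(keep), dtype=bool)
        counts = high.sum(axis=1)         # above-threshold partners per feature
        if counts.max() == 0:
            break
        keep.pop(int(np.argmax(counts)))  # ties: the earliest feature is dropped
    return [names[i] for i in keep]

names = ["L(D,t-1)", "L(D,t-2)", "L(D,t-3)", "L(D-1,t)"]
# Illustrative correlation matrix (hypothetical values, NOT Table 5).
corr = [
    [1.00, 0.80, 0.60, 0.50],
    [0.80, 1.00, 0.80, 0.45],
    [0.60, 0.80, 1.00, 0.40],
    [0.50, 0.45, 0.40, 1.00],
]
kept = drop_correlated(corr, names)
```

On real data the matrix could be obtained with `np.corrcoef(X, rowvar=False)`, where `X` holds one column per input feature.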

Polynomial Regression Model
A total of 2160 samples have been considered in the dataset; out of these, 1728 samples have been used for training and the remaining 432 samples have been used for testing of the PR model. The approach used for PR model is prediction of load at a particular hour of the day based on load at 24 h before, which is similar to the approach which was used for SLR. The same dataset was used for both simple and PR models.

PR Model Performance Analysis
PR models with different polynomial degrees (p) were trained with 1728 load data samples to find the optimum $m_i$, where $i \in \{1, 2, \ldots, p\}$, and $c$ values, so that the half mean distance from the actual load data points to the curve represented by the optimum $m_i$ and $c$ values was minimum. The optimum $m_i$ and $c$ values for PR models of different degrees are presented in Table 9. The actual performance of the various PR models has been observed with testing data and is presented in Table 11. The MSE and RMSE values decrease up to a polynomial degree of 15 and increase beyond that. That is, the testing performance of the model improves with the degree of the polynomial up to 15 and degrades beyond that, which is due to overfitting of the model. Figure 19 shows the variation of the RMSE value of the model with the degree of the polynomial on both the training and testing datasets. It has been observed from Figure 19 that the training performance of the model improves with the degree of the polynomial, whereas the testing performance improves only up to degree 15 and degrades beyond that because of overfitting. Hence, the PR model with degree 15 is considered the optimal model to deploy. The comparison of the actual load available in the testing dataset and the load predicted with the trained PR model having polynomial degree 15 is presented in Figure 20. It has been observed that most of the predicted load points are close to the actual load.
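The degree sweep described above can be illustrated as follows. For brevity this sketch fits each polynomial with NumPy's least-squares `np.polyfit` instead of the gradient descent optimizer used in the paper, and it runs on synthetic data, so the selected degree is not expected to match the reported value of 15. The name `rmse_by_degree` and the data-generating function are assumptions.

```python
import numpy as np

def _rmse(coeffs, x, y):
    """RMSE of a fitted polynomial (highest power first) on (x, y)."""
    return float(np.sqrt(np.mean((y - np.polyval(coeffs, x)) ** 2)))

def rmse_by_degree(x_train, y_train, x_test, y_test, degrees):
    """Map each degree to its (training RMSE, testing RMSE)."""
    scores = {}
    for p in degrees:
        coeffs = np.polyfit(x_train, y_train, deg=p)  # least-squares fit
        scores[p] = (_rmse(coeffs, x_train, y_train),
                     _rmse(coeffs, x_test, y_test))
    return scores

# Synthetic data with an 80/20 split, mirroring the 1728/432 proportion.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100)
y = np.sin(3.0 * x) + rng.normal(scale=0.1, size=100)
scores = rmse_by_degree(x[:80], y[:80], x[80:], y[80:], range(1, 9))
best_degree = min(scores, key=lambda p: scores[p][1])  # lowest testing RMSE
```

Training RMSE can only stay level or fall as the degree rises, because each higher-degree model contains the lower-degree ones; the testing RMSE is what reveals overfitting, which is why the degree is chosen on the testing curve.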

Comparative Analysis
The proposed SLR model, MLR model with and without DR, and PR model to predict the load on a 33/11 kV substation were compared in terms of MAE, MSE and RMSE, as shown in Table 12. It has been observed that the MLR model with DR has lower MAE, MSE and RMSE values than the SLR model, the MLR model without DR and the PR model.
This means that the MLR model with DR predicted the load with better accuracy than the SLR model, the MLR model without DR and the PR model. The load on the substation varied from 3 MW to 9 MW during the observation period, so a prediction error of more than +/−0.5 MW can be significant. The number of times each model failed to predict the load within this threshold limit, expressed as a percentage, is shown in Figure 21.
Figure 21. Model performance to predict load within threshold limit.
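The threshold-based measure plotted in Figure 21 can be computed as in the sketch below, assuming that a failure means an absolute prediction error larger than 0.5 MW and that predictions have been converted back to MW. The function name and the sample values are hypothetical.

```python
import numpy as np

def failure_rate_percent(actual_mw, predicted_mw, threshold_mw=0.5):
    """Percentage of samples whose absolute error exceeds the threshold."""
    actual = np.asarray(actual_mw, dtype=float)
    predicted = np.asarray(predicted_mw, dtype=float)
    failures = np.abs(actual - predicted) > threshold_mw
    return 100.0 * float(failures.mean())

# Hypothetical loads in MW: errors are 0.2, 0.7, 0.6 and 0.1 MW.
rate = failure_rate_percent([3.0, 5.0, 7.0, 9.0], [3.2, 5.7, 6.4, 9.1])
```

Two of the four sample errors exceed 0.5 MW, so the rate for this toy input is 50%.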
The performance of the proposed models was compared with existing models available in the literature [17,18] in terms of training and testing MSE, as presented in Table 13. The proposed multiple linear regression (MLR) model with dimensionality reduction performs well, with a low mean square error of 0.009 compared with the existing models.

Model    Training MSE    Testing MSE
[18]     0.23            0.44
[17]     0.29            1.59

In comparison with other techniques, the proposed regression models stand out on the data considered for the discussed problem in terms of speed, owing to their light weight; accuracy, owing to a model chosen to suit the data distribution; and simplicity, with only 2 model parameters for SLR, 16 model parameters for PR and 5 model parameters for MLR.

Conclusions
The SLR, MLR and PR models were used in this paper to estimate the active power load on a 33/11 kV substation. Dimensionality reduction based on correlation among MLR model input features was used to reduce the model's complexity and overfitting issue.
The analytical results indicate that the proposed regression models predict active power load with good accuracy and that the proposed MLR model with dimensionality reduction performs best among them. This paper offers a detailed tool for network operators to efficiently exchange energy and run the network.
The methodology to predict load using regression models can be applied to other power system research areas such as LMP computation, effective trading in the energy market, power system deregulation and so on. This load prediction work can be expanded by taking into account sequential networks as well as weekdays and weekends.

Acknowledgments:
We thank S R Engineering College Warangal, India, Woosong University, South Korea and Bennett University, India for supporting us during this work. We also thank the engineers in 33/11 kV substation near Kakatiya University in Warangal for providing the historical load data.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
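The following abbreviations are used in this manuscript:
AI      Artificial Intelligence
ANN     Artificial Neural Network
DR      Dimensionality Reduction
MAE     Mean Absolute Error
MLR     Multiple Linear Regression
MSE     Mean Square Error
PR      Polynomial Regression
RMSE    Root Mean Square Error
SLR     Simple Linear Regression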