A Novel Hybrid Feature Selection Method for Day-Ahead Electricity Price Forecasting

The paper proposes a novel hybrid feature selection (FS) method for day-ahead electricity price forecasting. The hybrid FS algorithm obtains an optimal feature set in order to achieve optimal forecast accuracy. The FS method is based on the elitist genetic algorithm (GA) combined with a tree-based method. Using the selected features, a performance test of the forecaster was carried out to establish the usefulness of the proposed approach. The performance of the proposed forecaster is compared with forecasters based on classification and regression trees. By analyzing and forecasting day-ahead electricity prices in the Australian electricity market, the proposed approach is evaluated, and it is established that, with the selected features, the proposed forecaster consistently outperforms the forecaster with the larger feature set. The proposed method is simulated in MATLAB and WEKA software.


Introduction
Efficient and consistent electricity price forecasting is vital for market participants when preparing appropriate risk management plans in an electricity market. The higher the complexity of the market, the greater the peril of producing a forecast error. An appropriate forecasting method allows suppliers and buyers to shape their bidding strategies to moderate losses and increase profit. Research on price forecasting tools for non-intervention markets is at an intermediate stage, and a variety of forecasting models covering many free-trade markets have emerged in recent years [1][2][3][4][5][6]. Since a price series is extremely volatile, with non-constant mean and variance, owing to the mobile nature and inflexible condition of establishing a real-time balance between demand and supply of electricity, short-term price prediction is a difficult task. Market clearing prices (MCP) are volatile in a deregulated power market; because it is an auction market, price forecasting is an important tool for such markets. The companies that do business in electricity markets make broad use of price forecasting methods either to shape bids or to hedge against volatility while auctioning in a pool system. Market contenders are requested to communicate their bids in terms of quantities and prices. A corporation can regulate its own production/price schedule based on its own forecast of hourly pool prices, and a good-quality MCP forecast with its confidence interval evaluation can help utilities and market participants. In related work, several approaches were used for the accurate forecasting of electricity price in the Iberian market, where historical data of 22 dominating factors, such as load, price, hydro and wind energy production, thermal generation hours, etc., are used to efficiently forecast electricity price.
In this work, electricity price forecasting is done using sequential minimal optimization (SMO) regression. The SMO algorithm, followed by regression analysis, is a powerful decomposition tool for training SVMs without the requirement of a quadratic programming (QP) solver. SMO is probably the only SVM optimizer that exclusively exploits the quadratic form of the objective function and simultaneously uses the analytic solution of the size-two case. Since most of the objective functions to be minimized come with a set of constraints, for optimization purposes such objective functions are represented in terms of Lagrange multipliers. These Lagrange multipliers form the dual of the primal set of predefined objective functions under linear constraints. The only problem with SMO is its convergence accuracy for non-sparse data sets, but a few modifications have been suggested by researchers to cope with such limitations. The novel contributions made in this work are:

1. A novel hybrid method based on the elitist GA and a tree-based method for input FS in price prediction;
2. SMO-regression-based SVM is used for price prediction, and the results obtained are compared with a classification tree (J48) [27] and regression trees [28] (bagging and M5P), with and without feature selection;
3. Fixing the error margins during price prediction by applying the confidence interval; and
4. Season-wise optimized FS for better forecasting accuracy.
The paper is organized as follows. The proposed methodology for price forecasting is summarized in Section 2. Section 3 of the paper explains the SMO regression. The methodology adopted for input FS using a novel hybrid method based on elitist GA and tree-based method is explained briefly in Section 4. Section 5 shows the outcomes and performance of the proposed methodology. Findings and concluding remarks are provided in Section 6.

Proposed Methodology
Forecasting of day-ahead prices with FS and without feature selection (WoFS) has been considered in this work. The present work emphasizes an elitist GA and tree-based FS method for price forecasting. Electricity price is forecast on a half-hourly basis, for each day and for all seasons, week-wise. A comparison of forecasting accuracy has been made between the full feature set and the reduced feature set, which validates the usefulness of forecasting with a reduced feature set. The proposed methodology for comparing the different methods is shown in Figure 1. As Figure 1 shows, electricity price forecasting is performed using SMO regression with the full feature set and, in parallel, the same SMO regression is used with the selected feature set. These two forecasting modes were compared to ascertain the superiority of FS. The results obtained from the SMO regression WoFS and with-FS methods are also compared with the classification tree and regression tree.
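As a minimal sketch of this comparison pipeline, assuming synthetic data and scikit-learn's SVR as a stand-in for the WEKA SMOreg forecaster (the data, feature indices, and model settings are illustrative assumptions, not the paper's):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Two forecasting modes, as in Figure 1: the same regressor trained
# on the full feature set vs. on a selected subset of features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                                   # 8 candidate features
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=200)  # price proxy

selected = [0, 3]                      # indices an FS step might return
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

full_model = SVR(kernel="rbf").fit(X_train, y_train)            # WoFS mode
fs_model = SVR(kernel="rbf").fit(X_train[:, selected], y_train) # FS mode

mae_full = mean_absolute_error(y_test, full_model.predict(X_test))
mae_fs = mean_absolute_error(y_test, fs_model.predict(X_test[:, selected]))
print(mae_full, mae_fs)
```

Comparing the two error values then quantifies what the reduced feature set buys over the full one.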

Sequential Minimal Optimization (SMO) Regression Algorithm
SMO regression was proposed by J. Platt (1998) for the training of SVMs (Vladimir Vapnik, 1979); it is available, for example, through the LIBSVM tool. SMO optimizes solutions of large QP problems without extra matrix storage and without numerical QP optimization steps. SMO is probably the only SVM optimizer that exclusively exploits the quadratic form of the objective function, and it breaks a large QP problem into a series of the smallest possible QP problems by using Osuna's theorem. SMO consists of two components: an analytic method and a heuristic process. During the computation, at every step SMO uses only two Lagrange multipliers and solves for them analytically. Then, by a heuristic process, it selects the best pair to update so that the system moves to a new optimal value. This selection method needs only very short and simple C code. Therefore, it speeds up training and omits the requirement of numerical QP optimization, which would require an entire QP library routine or complex matrix algorithms.

Methodology
The SMO algorithm basically works in two steps and, until convergence, keeps repeating these steps in each iteration:
Step 1: Break the large QP problem into a series of smallest possible QP problems. Find the most promising pair (µ1 and µ2).
Step 2: Solve the small QP problems analytically; this is much faster than a generic QP optimization process, which consumes more time in its inner loops. An important point is that SMO requires memory only in proportion to the smallest possible subproblem taken under Step 1. This enables it to handle very large training sets, i.e., very large QP problems. Optimize µ1 and µ2 while keeping the other µ's fixed.
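The two steps above can be sketched as a toy pairwise coordinate-descent loop. This is an illustrative simplification only, not Platt's full algorithm: the SVM equality constraint, the error cache, and the KKT-based working-set heuristics are omitted, and the pair is chosen simply by the largest gradient magnitudes.

```python
import numpy as np

def toy_smo(Q, p, C=1.0, iters=200):
    """Minimize (1/2)µᵀQµ − pᵀµ over the box [0, C] by pairwise updates."""
    n = len(p)
    mu = np.zeros(n)
    for _ in range(iters):
        grad = Q @ mu - p                     # gradient of the objective
        i, j = np.argsort(-np.abs(grad))[:2]  # Step 1: pick the most promising pair
        for k in (i, j):                      # Step 2: analytic update, others fixed
            rest = Q[k] @ mu - Q[k, k] * mu[k]
            mu[k] = np.clip((p[k] - rest) / Q[k, k], 0.0, C)
    return mu

Q = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.5]])
p = np.array([1.0, 0.5, -0.3])
mu = toy_smo(Q, p)
print(mu)
```

Each inner update is the exact minimizer of the quadratic in one variable, clipped to the box, mirroring SMO's use of an analytic rather than a numerical QP solution.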
A general quadratic problem has a quadratic objective function with linear constraints. A basic form is expressed by Equation (1):

minimize f(A) = (1/2) AᵀCA + DᵀA, subject to 0 ≤ a_i ≤ C, (1)

where A = [a1, a2]ᵀ is the vector of decision variables and C = [c11 c12; c21 c22] contains the constant coefficients that multiply the squared terms and the a1·a2 cross term; C is therefore multiplied twice by the column vector A. The coefficients d1 and d2 in the vector D form the linear term of the model. In the constraint, 0 defines the lower bound and the scalar C defines the upper bound (this bound is distinct from the coefficient matrix C).
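As a small illustration (a hypothetical numeric instance, not taken from the paper), the objective of Equation (1) can be evaluated directly:

```python
import numpy as np

# Hypothetical numeric instance of the quadratic objective in Equation (1):
# f(A) = 0.5 * Aᵀ C A + Dᵀ A, with 0 <= a_i <= upper bound.
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # coefficients of the squared and cross terms
D = np.array([-1.0, 0.5])    # coefficients d1, d2 of the linear term
A = np.array([0.5, 0.25])    # a feasible point within the box constraint

f = 0.5 * A @ C @ A + D @ A
print(f)
```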

Calculation of SMO Regression
See Appendix A.

Input Feature Selection Using Proposed Algorithm
GA is based on the Darwinian theory of natural evolution. It is basically a heuristic search technique. In this approach, no assumptions are made about the relationships among features while searching the feature-selection space. The genetic algorithm selects features according to their importance in describing the given data, encoding each decision as a sequence of Boolean values, which allows exploration of the feature space. It retains the features that benefit the classification task, while its intrinsic randomness simultaneously helps it avoid local optima. Using operators inspired by natural evolution, such as selection, mutation and crossover, the GA finds solutions that optimize the problem. For any given dataset, FS is used to select the dominating features among the pool of candidate features so that the data handled are smaller and more efficient. Here, FS is performed using an elitist GA technique and a tree-based method.
In this method, we select the dominant features of the input data set, i.e., those that affect the forecasting process, on a priority basis. The 20% elite chromosomes of the population are carried over to the next generation, so that the next population contains features whose classification accuracy is not less than that of the previous generation. For the purpose of FS, strings of '0's and '1's are used as chromosome segments. In a chromosome, '0' indicates that the feature corresponding to that index is not chosen and '1' indicates that it is chosen. The length of the chromosome equals the number of features in the given dataset. Computation is undertaken using the data mining workbench WEKA. The fitness function is derived through the tree and is the stratified 10-fold cross-validation (10-FCV) classification accuracy. The genetic operators are then applied, iteration by iteration, until the stopping criteria are met. The set of selected features is given by the chromosome with the optimal fitness value.
The fitness function for the proposed feature selection approach is defined as: Fitness function = classification accuracy. In this work, single-site crossover with probability 0.5 is performed in every step, with parents chosen through roulette-wheel selection. Mutation is performed with a probability of 0.005. Further, we keep the 20% elite chromosomes for the next generation. In this way, the resulting population retains the best chromosomes, i.e., those with the optimal classification accuracy on the given data set. The flowchart of the elitist GA and tree-based method for FS is shown in Figure 2.
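A minimal sketch of the elitist-GA feature selection loop described above, assuming a synthetic fitness function in place of the WEKA tree's 10-FCV classification accuracy (the set `USEFUL` and the fitness penalty are illustrative assumptions); the elitism fraction (20%), crossover probability (0.5) and mutation probability (0.005) follow the paper:

```python
import random

random.seed(1)
N_FEATURES, POP, GENS, ELITE = 31, 20, 30, 4  # ELITE = 20% of POP
USEFUL = {0, 3, 5}  # hypothetical "truly informative" feature indices

def fitness(chrom):
    # Stand-in for classifier accuracy: reward useful features, penalize size.
    hits = sum(chrom[i] for i in USEFUL)
    return hits - 0.05 * sum(chrom)

def roulette(pop, fits):
    lo = min(fits)
    weights = [f - lo + 1e-6 for f in fits]   # shift so all weights are positive
    return random.choices(pop, weights=weights, k=1)[0]

pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP)]
for _ in range(GENS):
    fits = [fitness(c) for c in pop]
    ranked = [c for _, c in sorted(zip(fits, pop), key=lambda t: -t[0])]
    nxt = [c[:] for c in ranked[:ELITE]]       # keep 20% elite chromosomes
    while len(nxt) < POP:
        a, b = roulette(pop, fits), roulette(pop, fits)
        child = a[:]
        if random.random() < 0.5:              # single-site crossover, prob. 0.5
            cut = random.randrange(1, N_FEATURES)
            child = a[:cut] + b[cut:]
        child = [1 - g if random.random() < 0.005 else g for g in child]  # mutation
        nxt.append(child)
    pop = nxt

best = max(pop, key=fitness)
selected = [i for i, g in enumerate(best) if g == 1]
print(selected)
```

Elitism guarantees the best fitness never decreases between generations, which is the property the paper relies on when carrying the 20% elite chromosomes forward.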

Result and Discussion
The half-hourly historical load and price data of New South Wales, Australia (taken from the Australian Energy Market Operator (AEMO)), and weather data of Sydney City (www.weatherzone.com/au, accessed on 18 August 2019) from January 2014 to June 2016 have been used for the forecasting. The elitist GA is implemented in MATLAB software, whereas the formation of the optimal regression tree was performed in WEKA software. The WEKA software was interfaced with MATLAB for performing all regression tree calculations. The final forecast was made using MATLAB software. Electricity price, load, wind speed, temperature, and humidity are considered as input variables in the present study. Table 1 lists the input variables that may affect day-ahead price forecasting, i.e., the input feature set and the time delay relative to the forecast hour; the feature set was chosen on the basis of the literature. Each data set consists of 31 input features. 2016 data sets are used in one training set to predict the electricity price. The results obtained from the proposed work are explained in two sections: the importance of FS is discussed in the first, and forecast accuracy in the second. The input variables for FS of electricity price forecasting are taken from Table 1. To obtain the FS, the classification accuracy of the data for the classifier is calculated using 10-FCV, which means the complete data set is tested at least once. In this way, a detailed feature analysis can be undertaken in the present study.
From Table 2 it is clear that the input variable price of the present day (Pr1) was selected in all the runs, i.e., 36 times. The price of the previous day (Pr4) was selected 22 times and the hour type (Hto) 24 times, which shows their relative importance. The load of the immediate hour (Lo1) and the previous day's load (Lo4) were important features, selected 20 and 23 times, respectively. The temperature of the present day (Te1) was selected more times than that of the previous day. The wind speed of the previous day (Wi6) was selected more times than that of the same day, and likewise the humidity of the previous day (Hu6) was selected more times than that of the same day. The effect of features can also be analyzed by season. Table 2 also indicates the top 10 most frequently selected features. Table 3 indicates the season-wise importance of the different features. From Table 3 it is observed that Lo6 and Lo1, among others, are features of greater importance during the winter season; Hu6 and Pr5 are more important during the spring season; and Lo4 and Te5 are more important during the summer season. Pr1 appears to be an important feature regardless of the season. These analyses indicate the relative importance of the features in terms of seasonal variation.
In this work, several error measures, namely the root-mean-square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and error variance (EV), are used for the numerical accuracy assessment of the price forecasting:

RMSE = √((1/K) Σ_p (A_p − F_p)²)
MAE = (1/K) Σ_p |A_p − F_p|
MAPE = (100/K) Σ_p |A_p − F_p| / A_week
EV = (1/K) Σ_p (|A_p − F_p| / A_week − MAPE/100)²

where A_p and F_p are the actual and forecasted values of the electricity price at time p, K is the length of the forecast horizon, and A_week = (1/K) Σ_p A_p is the average actual price over the forecast week. The electricity price forecasting is undertaken using SMO regression with the FS method for the New South Wales (NSW) electricity market. The average errors of each method for all seasons are calculated week-wise. Table 4 compares the SMO regression + FS approach with seven other approaches (SMO regression, M5P, M5P + FS, bagging, bagging + FS, J48, J48 + FS) in terms of MAPE, RMSE, MAE and EV, and summarizes the overall mean performance of each method in the last column. The results show that the SMO regression + FS method performs better than all other methods used for the comparison. To evaluate the confidence interval for a day, the errors of the four prior weeks are arranged at regular half-hour intervals from 00:00, 00:30, 01:00, ..., up to 23:30. Then the hourly standard deviation (δ) and 2δ are calculated for the 95% confidence interval. The upper and lower limits are calculated as:

Upper Limit = Forecasted value + 2δ (8)
Lower Limit = Forecasted value − 2δ (9)

The results of the proposed model for winter, spring and summer in the NSW electricity market are depicted in Figures 3-5, respectively. From Table 4 it is clear that the proposed method (SMO regression + FS) outperforms the other methods in all seasons and for all error measures. Table 5 shows the percentage improvement achieved by SMO regression + FS over the considered approaches. It is observed that SMO regression + FS gives a 52.16% improvement over the J48 method and improves the forecasting accuracy over all the methods considered for comparison.
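The error measures and the 2δ confidence band can be computed as follows (the price series here are hypothetical values for illustration, not the paper's data):

```python
import numpy as np

# Error measures over a forecast horizon of K points,
# plus the 2δ confidence band of Equations (8) and (9).
A = np.array([42.0, 45.5, 39.8, 51.2, 47.3, 44.1])  # actual prices (hypothetical)
F = np.array([41.2, 46.8, 41.0, 49.5, 48.0, 43.0])  # forecasted prices
K = len(A)

err = A - F
rmse = np.sqrt(np.mean(err ** 2))
mae = np.mean(np.abs(err))
A_week = np.mean(A)                                  # weekly average actual price
mape = 100.0 * np.mean(np.abs(err) / A_week)
ev = np.mean((np.abs(err) / A_week - mape / 100.0) ** 2)  # error variance

delta = np.std(err)                 # δ estimated from historical errors
upper = F + 2 * delta               # Equation (8)
lower = F - 2 * delta               # Equation (9)
print(round(rmse, 3), round(mae, 3), round(mape, 3))
```

Note that RMSE is always at least as large as MAE, so comparing the two gives a quick check on how much large individual errors dominate.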
Table 6 summarizes the daily MAPE corresponding to SMO regression and SMO regression + FS. The daily errors for the winter, spring and summer seasons, using SMO regression and SMO regression + FS, are depicted in Figures 6-8. These results indicate that the performance of SMO regression + FS is generally better than that of SMO regression.

Conclusions
In this paper, a novel hybrid method for day-ahead electricity price forecasting is presented. The elitist GA and tree-based method is used for input feature selection. The forecasting is done for the whole year, with FS and WoFS. MAPE, RMSE, MAE and EV have been calculated day-wise, week-wise and season-wise. The results obtained from SMO regression are compared with those of the classification tree (J48) and regression trees (bagging and M5P). It can be observed from the experimental results that SMO regression with the FS method provides better forecasts of electricity prices than WoFS, and that SMO regression outperforms the classification tree (J48) and regression tree (bagging and M5P) based forecasters. It was observed that SMO regression + FS could give improved accuracy (MAPE) in the range of 23.77% and 36.27% (for M5P and bagging) to 52.16% (for J48).

Appendix A. Calculation of SMO Regression
To express analytically the minimum of the model objective function as a function of two parameters u and v, we write it in terms of the two Lagrange multipliers λ_u and λ_v being updated. Representing the model objective function in terms of a single Lagrange multiplier, we substitute s* = λ_u + λ_v = λ*_u + λ*_v, so that the constraint remains true after a step in the parameter space, i.e., the sum of the λ values is kept fixed. Taking the partial derivative with respect to λ_v and equating it to zero, dW(λ_v)/dλ_v = 0, we obtain the analytic update for λ_v. The steps involved will minimize the global objective function if any of the parameters violates the KKT (Karush-Kuhn-Tucker) conditions of regression; if none of the parameters violates the KKT conditions, the global minimum has been reached.
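Since the appendix equations did not survive extraction, the two-multiplier reduction can be sketched generically as follows (a standard sketch under the stated substitution; the notation W for the reduced objective is an assumption, and this is not necessarily the paper's exact derivation). Because W is quadratic in λ_v, setting the derivative to zero yields an exact Newton step:

```latex
\lambda_u + \lambda_v = s^{*} \;\Rightarrow\;
W(\lambda_v) \equiv W\!\left(s^{*}-\lambda_v,\ \lambda_v\right),
\qquad
\frac{dW}{d\lambda_v}=0 \;\Rightarrow\;
\lambda_v^{\text{new}} = \lambda_v - \frac{W'(\lambda_v)}{W''(\lambda_v)},
\qquad
\lambda_u^{\text{new}} = s^{*} - \lambda_v^{\text{new}}.
```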