A Novel Feature Selection and Short-Term Price Forecasting Based on a Decision Tree (J48) Model

Abstract: A novel feature selection method based on a decision tree (J48) for price forecasting is proposed in this work. The method uses a genetic algorithm along with a decision tree classifier to obtain the minimum number of features giving an optimum forecast accuracy. The usefulness of the proposed approach is established through a performance test of the forecaster using the features selected by this approach. It is found that the forecast with the selected features consistently outperformed the forecast using the larger feature set.


Introduction
In the deregulated scenario, generators as well as consumers are free to buy and sell electricity as per their choice. Every market participant needs to know the accurate electricity price for each load block to achieve maximum profitability. If electricity market prices can be predicted accurately, generating companies and large-scale enterprises can reduce their risks and further maximize outcomes [1]. Price forecasting differs from load forecasting because of the uncertainties involved in operation and bidding strategies [2], and it is more complex than load forecasting. In the current deregulated environment, price forecasting has emerged as one of the major challenges for researchers and academics [3]. Researchers are continuously working to develop efficient tools and algorithms for electricity price forecasting.
Although load forecasting and price forecasting are mutually dependent, electricity price forecasting is a much more complex process due to its unique characteristics, such as non-constant mean and variance, high frequency, calendar effects, multiple seasonality, a high level of volatility, and a high percentage of unusual price movements. These characteristics arise for various reasons, such as the non-storable nature of electrical energy, the inelastic nature of demand over short time periods, the need to balance demand and supply, and the oligopolistic generation side. Due to these unique aspects, electricity price forecasting needs sophisticated and modern techniques and tools to cater to the demands of market players. Electricity price forecasting methods can be further divided on the basis of the time span and the tools used.
Alanis et al. [4] proposed a recurrent neural network based on the Kalman filter, with a stability proof using the Lyapunov methodology, for one-step-ahead and n-step-ahead electric energy price prediction. Rafiei et al. [5] used a probabilistic approach for hourly electricity price forecasting and used a bootstrapping technique to represent the uncertainty in the forecasting model as an uncertainty factor. A generalized learning method is applied for fast daily forecasting at low computational cost. Kim et al. [6] introduced a cuckoo search Levenberg-Marquardt (CSLM)-trained feedforward neural network (CSLM-FFNN) for electricity price forecasting, which combines the improved Levenberg-Marquardt and cuckoo search algorithms and uses actual power generation and system load as input sets. Darudi et al. [7] used a hybrid electricity price forecasting methodology with a new data fusion algorithm combining artificial neural network (ANN), adaptive neuro-fuzzy inference system, and autoregressive moving average methods to extract the advantages of these forecasting engines. Sarikprueck et al. [8] presented a novel hybrid market price forecasting method with data clustering techniques for very short-term forecasting of non-spike and spike wholesale market prices. The authors used support vector classification to predict spike price occurrences and then support vector regression to forecast the value of both non-spike and spike prices. Furthermore, three clustering techniques, namely classification and regression trees, K-means, and stratification methods, are used to reduce the high error in spike magnitude estimation. Wu et al. [9] proposed a hybrid, two-stage integrated price and load forecasting model to predict the day-ahead electricity price as well as the load in a smart grid.
Wan et al. [10] discussed a comprehensive evaluation model for probabilistic price forecasting, in which a hybrid method is used to construct prediction intervals of MCPs with a two-stage formulation. Miranian et al. [11] presented a singular spectrum analysis (SSA)-based method for day-ahead electricity price forecasting. SSA is used to decompose the original electricity price series into trend, periodic, and noisy components. Wu et al. [12] used functional principal component analysis (FPCA), which is a sophisticated tool of multivariate analysis, to forecast electricity prices. They further used a recursive dynamic factor analysis (RDFA) algorithm to reduce the arithmetic complexity. Mandal et al. [13] developed a hybrid intelligent algorithm using data filtering based on the wavelet transform. In this work, an optimization technique based on the firefly (FF) algorithm, along with a soft computing model based on the fuzzy ARTMAP (FA) network, is incorporated to forecast day-ahead electricity prices in the Ontario market. Chen et al. [14] introduced a method using an extreme learning machine and bootstrapping for electricity price forecasting. They used a fast method for single hidden layer feed-forward neural networks instead of the slow back-propagation (BP) approach. The extreme learning machine (ELM) is used to overcome the drawbacks of ANNs. Catalao et al. [15] proposed a combined model of wavelet transform, particle swarm optimization, and an adaptive-network-based fuzzy inference system to predict the electricity price in a case study of the electricity market of mainland Spain. Saini et al. [16] emphasized parameter selection in electricity price forecasting. They used a support vector machine (SVM) for function approximation and a genetic algorithm (GA) for optimization of the electricity price forecasting model.
Huang et al. [17] presented a data mining approach for electricity price classification. The authors focus on price threshold prediction for operation decisions in demand side management. Areekul et al. [18] developed a hybrid methodology that combines artificial neural network (ANN) and autoregressive integrated moving average (ARIMA) models for electricity price forecasting.
The process of selecting a subset of relevant features from a large dataset is termed feature selection [19]. The computational complexity of system models can be handled effectively with the help of feature selection. A number of feature selection techniques are available for the reduction of dimensionality and the optimization of features [20]. Feature selection is problem-dependent. In [21,22], feature ranking and subset selection are the two techniques used for feature selection. Amjady et al. [23] proposed an improved version of the relief algorithm for feature selection and a hybrid neural network for forecasting the electricity price. In another paper, Amjady et al. [24] developed a combination of a feature selection technique and a cascaded neuro-evolutionary algorithm (CNEA). They used an improved version of the mutual information (MI) technique for feature selection, along with the CNEA, which is composed of cascaded forecasters, a neural network (NN), and an evolutionary algorithm (EA). Abedinia et al. [25] proposed a feature selection method for load and price forecasting. Here, interaction is modeled in addition to relevancy and redundancy, based on information-theoretic criteria, and a hybrid filter-wrapper approach is used for feature selection. Tahmasebifar et al. [26] employed a new hybrid method to estimate point forecasts. The hybrid combines wavelet transformation (WT), extreme learning machine (ELM), feature selection based on mutual information (MI), and bootstrap approaches in an ensemble structure. Abedinia et al. [27] presented a novel forecast approach based on the combination of a neural network with a meta-heuristic algorithm as the hybrid forecasting engine. Gonzalez et al. [28] discussed regression tree-based models, such as bagging and random forests, to identify the inputs dominating the marginal price and highlighted the effectiveness of the proposed ensemble of tree-based models.
Baek et al. [29] presented a next-day forecast of total daily solar irradiation through an ensemble of multiple machine learning algorithms using forecasted weather scenarios from numerical weather prediction (NWP) models. Several data trimming techniques, such as outlier detection, input data clustering, and input data pre-processing, are developed and compared. Gao et al. [30] proposed a short-term electricity load forecasting model based on an empirical mode decomposition-gated recurrent unit (EMD-GRU) with feature selection (FS-EMD-GRU). First, the original load series is decomposed into several sub-series by EMD. Then, the correlation between the sub-series and the original load series is analyzed through the Pearson correlation coefficient method.
Price forecasting is a tricky problem, as it depends not only on a combination of several factors, such as time of day and type of weather (temperature, humidity, etc.), but also on bid pricing and market dynamics. The proposed method rests on the motivation that electricity prices are a function of combinations of feature values, rather than assuming that each feature contributes a specific amount to the price, as in regression with time series and other variables.
The tree-based classifier J48 has been used in several problems. In this paper, an attempt is made to study its applicability and usefulness for the price forecasting problem. In the earlier literature, various classifiers based on SVM, ANN, fuzzy logic, etc., have been used. However, a method based on the J48 tree-based classifier had not been used earlier. In this paper, the following novel contributions are made:
1. Using a combination of GA and J48 for feature selection in the price forecasting problem.
2. Using the J48 classifier for prediction of Australian price data.
3. Applying the confidence interval for fixing the error margins in the forecasted prices.
4. Attempting season-wise feature selection to draw insight into the number and type of features affecting the price in different seasons.
This paper focuses on feature selection for short-term price forecasting. Therefore, only the forecasting of day-ahead prices is considered, with and without selected features. Although the proposed method is general in nature, this work considers the forecast of the electricity price on a half-hourly basis for each day. The method combines the decision tree algorithm with a genetic algorithm for feature selection for price forecasting. The selected features are then used with decision trees to forecast the prices. The usefulness of the proposed algorithm is established by comparing the forecasts obtained using the full feature set with those obtained using the reduced feature set. The work proposed in the paper can lead to new insights into the type and seasonality of features and their effects on electricity prices.
Section 2 of the paper explains the working of the decision tree classifier. The application of the classifier to the task of price forecasting is explained with an example. Section 3 presents the method of feature selection using a GA and the J48 classifier. Section 4 presents the results of feature selection and the performance of the proposed method in detail. The conclusions and findings are summarized in Section 5.

Decision Tree Classifier (J48)
The J48 classifier is a decision tree classifier. By applying J48, one can predict the class label of a new record from a set of dependent and independent variables. The attribute to be predicted is known as the dependent variable, and the other attributes that help in predicting it are known as independent variables. The decision tree models the classification process through nodes and branches. The nodes of the tree denote attributes, the branches represent the splitting of an attribute based on its values, and the leaves denote the classes of the dependent variable. The attribute placed at a node is the one with the highest information gain ratio among the available attributes, and that attribute is then used for further branching. For the selected attribute, the split is the one giving the highest information gain. In this way the algorithm builds a decision tree from the attribute values of the available training set.
Here, the stratified 10-fold cross-validation (10-FCV) classification accuracy obtained with J48 is used as the fitness function in the genetic algorithm (GA) for feature selection. Stratified 10-FCV is the standard way of estimating the error rate of a learning technique given a single, fixed sample of data. The data are divided randomly into ten parts in which each class is represented in approximately the same proportion as in the full dataset. Each part is held out in turn, the learning scheme is trained on the remaining nine parts, and the error rate is calculated on the held-out part. The learning process is thus executed ten times on different training sets. The ten error estimates are finally averaged to obtain an overall estimate of the classification error [31].
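The procedure described above can be illustrated with a minimal sketch in plain Python. It is not the WEKA implementation used in the paper: the function names are hypothetical, and a majority-class predictor stands in for J48 purely to keep the example self-contained.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so each fold has similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds

def majority_classifier(train_labels):
    """Hypothetical stand-in learner (NOT J48): always predicts the most common class."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: most_common

def cross_validated_error(features, labels, k=10, seed=0):
    """Hold out each fold in turn, train on the rest, and average the k error rates."""
    folds = stratified_folds(labels, k, seed)
    errors = []
    for held_out in folds:
        held = set(held_out)
        train_labels = [labels[i] for i in range(len(labels)) if i not in held]
        predict = majority_classifier(train_labels)
        wrong = sum(predict(features[i]) != labels[i] for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / k
```

The classification accuracy used as the GA fitness would then be one minus this averaged error. The sketch assumes at least `k` samples so that no fold is empty.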
Let the dataset be $S = (X_1, \ldots, X_n, C_i)$, where $n$ is the number of independent variables and $C_i$ is the dependent variable, $i = 1, 2, \ldots, K$, where $K$ is the number of classes of the dependent variable. A new node is added to the decision tree for every partition. In a partition $S$, a test attribute $X$ is selected for further partitioning the set into $S_1, S_2, \ldots, S_L$. New nodes for $S_1, S_2, \ldots, S_L$ are created and added to the decision tree as children of the node for $S$. The node for $S$ is labelled with the test $X$, and the partitions $S_1, S_2, \ldots, S_L$ are then recursively partitioned. A partition in which all the records have identical class labels is not partitioned further, and the leaf corresponding to it is labelled with that class of the dependent variable.

Decision Tree Algorithm (J48):
The algorithm to construct the decision tree using J48 takes the following steps [32].
Step 1: Calculate Entropy(S) of the training set S as follows:
$$\mathrm{Entropy}(S) = -\sum_{i=1}^{K} \frac{\mathrm{freq}(C_i, S)}{|S|} \log_2 \frac{\mathrm{freq}(C_i, S)}{|S|}$$
where $|S|$ is the number of samples in the training set, $C_i$ is the dependent variable, $i = 1, 2, \ldots, K$, $K$ is the number of classes of the dependent variable, and $\mathrm{freq}(C_i, S)$ is the number of samples included in class $C_i$.
Step 2: Calculate the Information Gain for the test attribute X used to partition S:
$$\mathrm{InformationGain}_X(S) = \mathrm{Entropy}(S) - \sum_{i=1}^{L} \frac{|S_i|}{|S|}\,\mathrm{Entropy}(S_i)$$
where $L$ is the number of outputs of the test $X$, $S_i$ is the subset of $S$ corresponding to the $i$-th output, and $|S_i|$ is the number of samples in subset $S_i$. For a given attribute, the partition that provides the maximum information gain is selected as the threshold. The two partitions $S_i$ and $S - S_i$ form the branches of the node. If all the instances belong to the same class, the tree is a leaf, and the leaf is returned labelled with that class of the dependent variable.
Step 3: Calculate the partition information value Split Info(X) for S partitioned into L subsets:
$$\mathrm{SplitInfo}(X) = -\sum_{i=1}^{L} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$$
Step 4: Calculate the Gain Ratio(X):
$$\mathrm{GainRatio}(X) = \frac{\mathrm{InformationGain}_X(S)}{\mathrm{SplitInfo}(X)}$$
Step 5: The attribute having the highest gain ratio is designated as the root node, and the same computation from Step 1 to Step 4 is repeated for every intermediate node until all the instances are exhausted and a leaf node is reached according to Step 2.
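Steps 1 to 4 can be checked on a small example with the following sketch. It assumes the standard C4.5 definitions of entropy, information gain, split information, and gain ratio for a discrete attribute. The function names and the toy data are illustrative and are not taken from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Step 1: entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def partition(values, labels):
    """Group the class labels by the value of the test attribute."""
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    return list(subsets.values())

def information_gain(values, labels):
    """Step 2: entropy of S minus the size-weighted entropy of the subsets S_i."""
    total = len(labels)
    remainder = sum(len(s) / total * entropy(s) for s in partition(values, labels))
    return entropy(labels) - remainder

def split_info(values):
    """Step 3: entropy of the partition sizes themselves."""
    return entropy(values)

def gain_ratio(values, labels):
    """Step 4: information gain normalised by split information."""
    si = split_info(values)
    return information_gain(values, labels) / si if si > 0 else 0.0

# Toy data: a discretized hour type that separates the price classes perfectly,
# and a second attribute that carries no information about the price class.
hour_type = ["peak", "peak", "off", "off"]
noise = ["a", "b", "a", "b"]
price_class = ["high", "high", "low", "low"]
```

With these toy lists, `gain_ratio(hour_type, price_class)` evaluates to 1.0 and `gain_ratio(noise, price_class)` to 0.0, so Step 5 would place the hour type at the root.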
The characteristic features of the J48 algorithm are as follows:
(1) It handles classification when values are missing in the data.
(2) It can be applied to both discrete and continuous variables.
(3) It also performs pruning of the tree.
(4) It can handle high-dimensional data.
(5) It replaces an internal node with a leaf node during pruning and thus reduces the error rate.

Input Feature Selection Using a Genetic Algorithm
Genetic Algorithm (GA) is a heuristic search technique based on the Darwinian theory of natural evolution and the genetics of survival of the fittest. This heuristic is used to generate useful solutions to optimization problems. While searching the feature space, this approach makes no assumption about the relationships among the features involved. A GA can easily encode decisions as sequences of Boolean values, allowing exploration of the feature space while retaining the decisions that benefit the classification task. Owing to its intrinsic randomness, it also helps to avoid local optima [33,34]. It generates solutions to optimization problems using operators inspired by natural evolution, such as selection, crossover, and mutation [35,36].
In the present method, decision trees are used for two purposes, viz. (i) feature selection and (ii) classification of the target class, i.e., the price for the given feature set. Unlike regression trees, which are normally used to predict a value for a given set of features, the present method requires that the target classes be fixed beforehand for building the decision tree. Thus, the data for all features need to be discretized, and the discretization is carried out over a much wider range because some cases may fall beyond the observed values. If a value goes beyond the range, it is fixed to the maximum or the minimum value taken for that particular variable. In the present work, the variables were discretized in steps of 2.5% of the range. This value was arrived at by experimenting with values in the range of 1-5%.
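The discretization step can be sketched as follows, assuming equal-width bins over a chosen range and clamping of out-of-range values to the nearest bound. The function name and its parameters are hypothetical. A step of 2.5% of the range corresponds to 40 levels.

```python
def discretize(value, lower, upper, levels=40):
    """Map a continuous value to a level index in 0..levels-1.

    Values outside [lower, upper] are clamped to the nearest bound first,
    so they fall into the first or the last level.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    clamped = min(max(value, lower), upper)
    width = (upper - lower) / levels  # 2.5% of the range when levels == 40
    # The upper bound itself would give index == levels, so cap it.
    return min(int((clamped - lower) / width), levels - 1)
```

For example, with a range of 0 to 100, a value of 50 maps to level 20, while values of 100 and 150 both map to the top level, 39.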
In the present work, feature selection is performed using an evolutionary elitist genetic algorithm [34]. The significant features of the input dataset, i.e., those that materially affect the forecast, are selected simultaneously.
In the elitist GA, the top 20% of the population (the elite) is passed unchanged to the next generation, so that the best classification accuracy in the next generation is never lower than in the previous one. The parameters for this method are the same as in [37]. The fitness function is the stratified 10-fold cross-validation (10-FCV) classification accuracy on the given dataset, computed with the decision tree-based J48 classifier in the WEKA data mining workbench [31]. Classification accuracy is an estimate of the fraction of instances correctly classified. Chromosomes are strings of 0s and 1s: a 1 indicates that the feature at that index is selected, and a 0 indicates that it is not. The length of the string is the same as the number of features in the dataset. Roulette wheel selection is used in this work, followed by single-site crossover with a probability of 0.7 in every step. The mutation operator is applied with a mutation probability of 0.005. The final chromosome encoding provides the best set of selected features for forecasting the electricity price.
The flowchart of the GA for feature selection is shown in Figure 1.
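A minimal sketch of such an elitist GA over feature bitstrings is given below. The crossover probability (0.7), mutation probability (0.005), and elite fraction (20%) follow the text. The population size, number of generations, and the `toy_fitness` function are assumptions made only for illustration: in the paper the fitness is the J48 stratified 10-FCV accuracy computed in WEKA, and the remaining parameters are taken from [37].

```python
import random

def roulette_pick(population, fitnesses, rng):
    """Roulette wheel selection: pick proportionally to (non-negative) fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)
    threshold = rng.uniform(0, total)
    running = 0.0
    for chromosome, fit in zip(population, fitnesses):
        running += fit
        if running >= threshold:
            return chromosome
    return population[-1]

def evolve(fitness, n_features, pop_size=20, generations=30,
           p_cross=0.7, p_mut=0.005, elite_frac=0.2, seed=0):
    """Elitist GA over 0/1 feature masks; returns (best mask, best fitness, history)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    n_elite = max(1, int(elite_frac * pop_size))
    history = []
    for _ in range(generations):
        fitnesses = [fitness(c) for c in population]
        ranked = sorted(zip(fitnesses, population), key=lambda p: p[0], reverse=True)
        history.append(ranked[0][0])
        # Elitism: copy the best chromosomes unchanged into the next generation.
        next_gen = [list(c) for _, c in ranked[:n_elite]]
        while len(next_gen) < pop_size:
            a = list(roulette_pick(population, fitnesses, rng))
            b = list(roulette_pick(population, fitnesses, rng))
            if rng.random() < p_cross:
                site = rng.randint(1, n_features - 1)  # single-site crossover
                a, b = a[:site] + b[site:], b[:site] + a[site:]
            for child in (a, b):
                # Bit-flip mutation, applied independently to every gene.
                child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
                if len(next_gen) < pop_size:
                    next_gen.append(child)
        population = next_gen
    fitnesses = [fitness(c) for c in population]
    best_fit, best = max(zip(fitnesses, population), key=lambda p: p[0])
    return best, best_fit, history

def toy_fitness(mask):
    """Hypothetical stand-in for the J48 10-FCV accuracy: features 0 and 3 are
    useful and every selected feature carries a small cost."""
    return 1.0 + 2.0 * mask[0] + 2.0 * mask[3] - 0.1 * sum(mask)
```

Calling `evolve(toy_fitness, n_features=8)` returns the best mask found, its fitness, and the best fitness per generation. Because the elite is copied unchanged and the fitness is deterministic, that history never decreases, which mirrors the property claimed for the elitist GA above.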

Results
For short-term price forecasting, data were collected from the Australian Energy Market Operator for New South Wales, Australia. The data consist of half-hourly load and prices for all seasons from January 2014 to June 2016. The weather data for Sydney City were taken from www.weatherzone.com/au. The weather data considered in the present study are half-hourly wind speed, temperature, and humidity. All the data were quantized at 40 levels, each level covering 2.5% of the range. Thus, for a particular week, all the data take only 40 discrete values.
Table 1 shows the list of features which are assumed to affect half-hourly electricity prices. It has been observed that energy prices are similar for the same hours of the day, although there may be a small shift in the price of similar hours. It is quite common in the load and price forecasting literature to use "similar hours" of the day as inputs. The set of input features thus consisted of 31 features. The training set for the classifier consisted of 2016 records. The results of the present work are discussed in two parts. The first part discusses the results of feature selection and its significance. The second part discusses the accuracy of the forecasted results.
Feature selection: The method of simultaneous feature selection using an elitist genetic algorithm is employed in this work. The features listed in Table 1 are taken as the input feature set from which the feature selection is done. The process of feature selection was performed weekly. The evolution of the fitness function over the generations for the first week of February 2016 is shown in Figure 2.
The training set was built on the concept of similar weeks. The dataset corresponds to five similar weeks from the previous year and the past week of the current year, which is consistent with the training set size of 2016 records (6 weeks × 7 days × 48 half-hours). The classifier uses the stratified 10-fold cross-validation classification accuracy for finalizing the feature selection. Thus, all of the data are tested at least once in this method. The major advantage of feature selection is that the method can be employed for feature analysis, i.e., to determine which feature or component affects the forecast or consumption pattern more significantly. A detailed feature analysis is performed on the basis of the present studies.
Table 2 shows the number of times each feature was selected out of the 36 feature selection runs. It is observed from Table 2 that the price of the present day (P1) was selected in all the runs, and the hour type (Ho) was the second most important feature, selected 25 times. It is notable that the humidity of the previous day (H5 and H6) was selected a greater number of times than that of the same day (H1 and H2). The wind speed and temperature were selected fewer times than humidity. The load of the immediate hour (L1) was selected only 12 times, compared to the load 24 hours before (L3), which was selected 16 times. The effects of the features can also be analyzed according to the season.
Table 3 shows the number of times a feature was selected in a particular season. From the table it can be observed that the previous-hour load value is invariably selected in winter, whereas in spring the previous-day load values are not selected. Similarly, the weather variables, such as wind speed, temperature, and humidity, are selected quite regularly in winter compared to the summer and spring seasons. The maximum and minimum numbers of features selected for each of the seasons are given in Table 4. Table 4 also shows the weeks for which the minimum and maximum numbers of features were selected. It is observed that the minimum number of features selected in the winter season was five, during July 8-14, 2015. The forecasts obtained for the same period are depicted in Figures 3 and 4.
The mean absolute percentage error (MAPE) for the week of July 8-14, 2015 with feature selection (FS) is 9.38, whereas without feature selection (WoFS) it is 11.72. Despite the very small number of features in the case of FS, the results are better than those of WoFS, which uses all 31 features. The maximum number of features selected in the winter season was 14, during August 8-14, 2015. The forecasts obtained for August 8-14, 2015 are depicted in Figures 5 and 6. The MAPE for this week with FS is 10.22, whereas for WoFS it is 10.47. Again, the results obtained with FS are better than those of WoFS.
In the spring season, the minimum number of features selected was two, during September 8-14, 2015. The forecasts obtained for September 8-14, 2015 are depicted in Figures 7 and 8. The MAPE for this week with FS is 8.73, whereas for WoFS it is 10.05. Despite the very small number of features in the case of FS, the results are better than those of WoFS, which uses all 31 features. The maximum number of features selected in the spring season was 15, during September 1-7, 2015. The forecasts obtained for September 1-7, 2015 are depicted in Figures 9 and 10. The MAPE for this week with FS is 11.23, whereas for WoFS it is 11.70, so the results with FS are again better than those of WoFS.
In the summer season, the minimum number of features selected was three, during December 8-14, 2015. The forecasts obtained for December 8-14, 2015 are depicted in Figures 11 and 12. The MAPE for this week with FS is 10.69, whereas for WoFS it is 13.08. Despite the very small number of features in the case of FS, the results are better than those of WoFS, which uses all 31 features. The maximum number of features selected in the summer season was 19, during February 8-14, 2016. The forecasts obtained for February 8-14, 2016 are depicted in Figures 13 and 14. The MAPE for this week with FS is 13.70, whereas for WoFS it is 12.81.

The feature selection was made season-wise. The MAPE recorded for the whole summer season (i.e., from December to February) with feature selection was 12.48%, whereas the MAPE without feature selection was 14.06%. The cases depicted in Figure 13 (without feature selection) and Figure 14 (with feature selection) had MAPEs of 12.81 and 13.70, respectively. Thus, for this particular week the error was higher with feature selection than without it. This specific case was considered a worst case and is therefore depicted in Figures 13 and 14.
After forming the training and testing data sets, the daily forecasts are computed using the J48 classifier and the forecast accuracy is evaluated. The MAPE is calculated for the entire week. The MAPEs calculated without feature selection (WoFS) and with feature selection (FS) for the three seasons are shown in Table 5. The monthly data are categorized into three seasons, viz. winter, spring, and summer. The data from June 2015 to August 2015 fall under winter, the data from September 2015 to November 2015 under spring, and the data from December 2015 to February 2016 under summer. The results are further obtained weekly without feature selection (WoFS) and with feature selection (FS). Additionally, month-wise, season-wise, and yearly averages are determined for WoFS and FS. When the MAPE is compared season-wise, it is observed that the FS-based method provides a lower MAPE (11.07) than WoFS (11.62) for winter. The same holds for all three seasons. However, the effect of feature selection is more pronounced in the summer season, where the FS method yields a MAPE of 12.48 compared with 14.06 for WoFS. It is observed that the overall mean MAPE is lower with FS than with WoFS in three weeks, i.e., the first, second, and fourth. For the third week, however, the MAPEs of FS and WoFS are comparable.
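The weekly MAPE used in this comparison can be sketched as follows. This is a minimal illustration that assumes the standard MAPE definition (mean of |actual − forecast| / |actual|, expressed in percent) and assumes that the half-hourly values of all days in the week are pooled before averaging; the function names `mape` and `weekly_mape` are hypothetical and do not come from the paper.

```python
def mape(actual, forecast):
    """Return the mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    # Each term is |actual - forecast| / |actual|, so actual prices must be non-zero.
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def weekly_mape(actual_days, forecast_days):
    """Pool the half-hourly values of every day in the week and compute one MAPE.

    Assumption: each element of actual_days / forecast_days is one day's list
    of half-hourly prices (48 values per day in the half-hourly setting).
    """
    actual = [price for day in actual_days for price in day]
    forecast = [price for day in forecast_days for price in day]
    return mape(actual, forecast)
```

For example, `mape([50.0, 40.0, 25.0], [45.0, 44.0, 25.0])` evaluates to about 6.67, since the three relative errors are 0.1, 0.1, and 0. Computing the same quantity once with the FS forecasts and once with the WoFS forecasts gives the pair of values compared in the text.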
The MAPEs obtained for weekdays and weekend days are given in Table 6. It is observed that, in general, weekday prices are forecasted less accurately than weekend prices. The maximum MAPE is obtained for Wednesdays. In the summer season, Mondays yielded the maximum MAPE. It is observed that when FS is employed, the forecasts improve quite significantly for weekend days.
The comparison with other forecasting methods (ANN, SVM, and regression model) on the same data is computed for the summer season and summarized in Table 7. The comparison shows that the proposed method, J48 with FS, performs competitively.
To calculate the confidence interval for a particular day, the errors of the four previous weeks are calculated and arranged at regular half-hour intervals, i.e., 00:00, 00:30, 01:00, ..., up to 23:30. Then, the hourly standard deviation (δ) and 2δ are calculated for the 95% confidence interval. The upper and lower limits for the day are given below:

Upper limit = forecasted value + 2δ
Lower limit = forecasted value − 2δ

A typical forecast for weekdays of the winter season with the confidence interval is shown in Figures 15 and 16 for WoFS and FS, respectively. It is observed that the FS method provides a lower MAPE (6.03) than WoFS (6.88). A forecast for weekend days during the winter season with the confidence interval is shown in Figures 17 and 18 for WoFS and FS, respectively. It is observed that the FS method provides a lower MAPE (4.98) than WoFS (8.60).
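The confidence band described above can be sketched in a few lines. This is an illustrative reading of the procedure rather than the authors' implementation: it assumes that the past errors are supplied as one list of per-slot errors for each reference day taken from the previous four weeks, and that δ is the sample standard deviation of the errors in each time slot (the text does not state whether the sample or population form is used). The name `confidence_band` and its parameters are hypothetical.

```python
from statistics import stdev


def confidence_band(forecast, past_error_days, k=2.0):
    """Return (lower, upper) limits for every time slot of one forecast day.

    forecast        : forecasted prices, one value per time slot
    past_error_days : list of days, each a list of forecast errors per slot
                      (assumed to come from the previous four weeks)
    k               : multiplier of the standard deviation (2 for ~95%)
    """
    band = []
    for slot, value in enumerate(forecast):
        # Collect the errors observed in this slot on each reference day.
        slot_errors = [day[slot] for day in past_error_days]
        # Sample standard deviation; this is an assumption, see the lead-in.
        delta = stdev(slot_errors)
        band.append((value - k * delta, value + k * delta))
    return band
```

With `k=2.0` this reproduces the two limits given in the text, forecasted value ± 2δ, slot by slot. At least two reference days are needed, because the sample standard deviation is undefined for a single value.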

Conclusion
In this paper, a decision tree method with a novel feature selection is presented for predicting electricity prices. The method uses an elite genetic algorithm and a decision tree classifier for feature selection, and the feature selection is performed weekly. The paper examines the effect of feature selection year-wise and season-wise, and also analyzes its effect when the maximum and minimum numbers of features are selected. It is observed that, depending on the season, certain features are selected more often than others, and that for the same system within a year the number of required features can range from as few as 2 to as many as 19. The mean absolute percentage error (MAPE) has been calculated day-wise, week-wise, and season-wise without feature selection (WoFS) and with feature selection (FS), using predicted data for the whole year. The results establish that the proposed feature selection (FS) method provides better forecast accuracy of electricity prices than forecasting with the full feature set.


Energies 2019, 12, 3665
chromosomes with an optimal fitness value is the set of selected features. The mathematical definition of the fitness function for the GA for feature selection is given below:

$$\text{Fitness function} = \text{Classification accuracy} = \frac{\text{No. of instances correctly classified}}{\text{Total no. of instances in the data set}} \times 100 \qquad (5)$$
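The fitness evaluation of a single chromosome can be sketched as follows. This is a minimal sketch under stated assumptions: a chromosome is taken to be a list of 0/1 flags, one per candidate feature, and the classifier is passed in as a callback (`train_and_predict`, a hypothetical name) that stands in for the J48 decision tree. The toy rule `first_feature_rule` is not J48 and is included only so that the example runs on its own.

```python
def select_features(row, chromosome):
    """Keep only the feature values whose chromosome bit is set."""
    return [value for value, bit in zip(row, chromosome) if bit]


def fitness(chromosome, rows, labels, train_and_predict):
    """Classification accuracy (in percent) obtained with the selected features.

    train_and_predict(rows, labels) is a placeholder for the real classifier;
    it must return one predicted label per row.
    """
    if not any(chromosome):
        return 0.0  # an empty feature subset cannot classify anything
    reduced = [select_features(row, chromosome) for row in rows]
    predicted = train_and_predict(reduced, labels)
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return 100.0 * correct / len(labels)


def first_feature_rule(rows, labels):
    """Toy stand-in classifier: 'high' if the first kept feature exceeds 0.5."""
    return ["high" if row[0] > 0.5 else "low" for row in rows]


rows = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
labels = ["high", "low", "high", "low"]
score_first = fitness([1, 0], rows, labels, first_feature_rule)
score_second = fitness([0, 1], rows, labels, first_feature_rule)
```

In this toy data the first feature separates the two classes and the second does not, so a GA driven by this fitness would favor the chromosome `[1, 0]`. The sketch scores predictions on the same instances the classifier sees; whether the original work evaluates on held-out data or by cross-validation is not stated in this section.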

Figure 1. Flowchart of feature selection using a GA.


the 31 features. However, the maximum number of features selected was 15, in the spring season during Sept. 1-7, 2015. The forecasts obtained for Sept. 1-7, 2015 are depicted in Figures 9 and 10. The MAPE for this week with FS is 11.23, whereas for WoFS it is 11.70. It is observed that, even with the reduced number of features in the case of FS, the results are better than those of WoFS, which uses all 31 features.

Figure 10. September 1-7, 2015 forecasting with the maximum number of features selected in the spring season (MAPE = 11.23).


Figure 15. Winter weekday plot of Tuesday (7 July 2015) without selected features, with a MAPE of 6.88.


Figure 16. Winter weekday plot of Tuesday (7 July 2015) with selected features, with a MAPE of 6.03.


Figure 17. Winter weekend plot of Saturday (11 July 2015) without selected features, with a MAPE of 8.60.

Figure 18. Winter weekend plot of Saturday (11 July 2015) with selected features, with a MAPE of 4.98.

Table 1. List of features which are assumed to affect the half-hourly electricity price.

Table 2. Number of times a feature is selected year-wise.


Table 3. Number of times a feature is selected season-wise.

Table 4. Maximum and minimum number of features selected in all seasons.

Table 5. Mean absolute percentage error (MAPE) with FS and WoFS.

Table 7. Comparison of results of the proposed method with other forecasting methods.
