Price Forecasting in the Day-ahead Energy Market by an Iterative Method with Separate Normal Price and Price Spike Frameworks

A forecasting methodology for prediction of both normal prices and price spikes in the day-ahead energy market is proposed. The method is based on an iterative strategy implemented as a combination of two modules separately applied for normal price and price spike predictions. The normal price module is a mixture of wavelet transform, linear AutoRegressive Integrated Moving Average (ARIMA) and nonlinear neural network models. The probability of a price spike occurrence is produced by a compound classifier in which three single classification techniques are used jointly to make a decision. Combined with the spike value prediction technique, the output from the price spike module aims to provide a comprehensive price spike forecast. The overall electricity price forecast is formed as combined normal price and price spike forecasts. The forecast accuracy of the proposed method is evaluated with real data from the Finnish Nord Pool Spot day-ahead energy market. The proposed method provides significant improvement in both normal price and price spike prediction accuracy compared with some of the most popular forecast techniques applied for case studies of energy markets.


Introduction
Electricity price forecasting has become an important area of research in the aftermath of the worldwide deregulation of the power industry. Unlike electricity demand series, electricity price series can exhibit variable means, major volatility and significant spikes [1].
Based on the needs of the energy market, a variety of approaches for electricity price forecasting have been proposed in recent decades. Among them are models based on simulation of power system equipment and related cost information [2], game-theory based models which focus on the impact of bidder strategic behavior on electricity prices [3], models based on stochastic modeling of finance [4], regression models [5] and artificial intelligence models [6][7][8][9]. In recent years, hybrid approaches have become popular since it is almost universally agreed in the forecasting literature that no single method is best in every situation [10,11].
While most existing approaches to forecasting electricity prices are reasonably effective for normal prices, they cannot deal with price spikes accurately. In early research, price spikes were truncated before application of the forecasting model to reduce the influence of such observations on the estimation of the model parameters [12,13]. Electricity price spikes, however, matter greatly to energy market participants who need to stay competitive.
The Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) process has been tested to simulate price spikes in original price series, but it has not been able to incorporate spikes with a height usually observed in the original prices [14]. Spikes have been incorporated into a Markov-switching model and diffusion models with the addition of a Poisson jump component [15]. Data mining techniques have been applied to the spike forecasting problem and have achieved promising results [16][17][18][19].
An analysis of price forecasts along with knowledge of upcoming price spikes is important for market participants to estimate their potential and remain competitive. However, most of the work on electricity market price forecasting concentrates on improving forecast accuracy rather than on the effects of price forecast inaccuracy on market participants. Only a few approaches have been reported in the literature to deal with the problem of future price uncertainty in operation planning in competitive environments [20][21][22].
The methodology presented in this paper uses an iterative hybrid approach to separately predict normal prices and price spikes in the Finnish day-ahead energy market. Such a strategy provides an opportunity to train the forecasting models more effectively, whereas non-separate forecasting methods must learn the behaviors of both normal prices and price spikes at once. The proposed approach uses a wavelet transform (WT) combined with ARIMA, a neural network (NN), a compound classifier and a k-nearest neighbor model (k-NN) to separately implement the normal price and price spike forecasting processes. WT deals with non-stationarity by decomposing the price series into less volatile components. The ARIMA model captures the cyclicality of the series, which clearly exhibits hourly and weekly patterns. The compound classifier discriminates normal price and price spike processes so that the values of those processes can be predicted separately by different forecasting engines. The combined ARIMA and NN frameworks produce normal price forecasts by capturing linear and nonlinear patterns between target and exogenous variables. The k-NN is applied for the price spike value prediction. Time-varying model parameters allow capturing of localized trending of the series.
The methodology is evaluated with real data from the Finnish day-ahead energy market. It can, however, be considered applicable to the entire Nordic region, as well as to deregulated markets in other countries or even to financial markets, since the methodology addresses common statistical features of the price series.

Nordic Energy Market
The Nordic region has considerable experience with deregulated electricity markets. The Nordic electricity market was formed in 1993 in conjunction with deregulation of electricity markets in the region. The derivatives and energy markets were separated in 2002 to establish Nord Pool Spot, which currently operates in Norway, Denmark, Sweden, Finland, Estonia, Latvia and Lithuania.
The main goal of Nord Pool Spot is to balance the generation of electricity with the electricity demand, precisely and at an optimal price, so-called equilibrium point trading. The optimal price represents the cost of producing one kilowatt hour of power from the most expensive source needing to be employed in order to balance the system. All the employed generators are paid the same market price.
Two different physical operation markets are organized in Nord Pool Spot: Elspot and Elbas. Elspot is a day-ahead energy market in which market participants submit offers to sell, or bids to buy, physical electricity for the next day. Elbas is an intra-day energy market in which trades made in the day-ahead market can be adjusted until one hour prior to delivery time.

Mathematical Framework
Before the prediction strategy is described, key features of WT, ARIMA, NN, feature selection technique and different classification frameworks are first introduced.

WT
When using classical statistical techniques, a stationary process is assumed for the data. For electricity price time series, the assumption of stationarity usually has to be rejected. One way to capture localized trending in the series is to apply models with time-varying parameters [23]. Another way to deal with non-stationarity is the use of mathematical transformations of an initial series. In many cases, information that cannot be readily seen in the time domain can be obtained in the frequency domain. Fourier transform (FT) and short time FT (STFT) are probably the most popular transforms and are used in many different areas, including many branches of engineering. However, these transforms provide poor time or frequency resolution. The WT was developed as an alternative approach to FT/STFT to overcome the resolution problem [24]. Wavelet analysis begins with selection of a proper wavelet (mother wavelet) and analysis of its translated and dilated versions [25]. It is advantageous to scale and translate the mother wavelet using defined scales and positions usually based on powers of two [26]. This technique is known as the discrete WT. An algorithm to implement the discrete WT using filters has been developed by Mallat. Multiresolution via Mallat's algorithm is a procedure to obtain approximations (A) and details (D) from a given signal f [27]. In the reconstruction stage, these components can be assembled back into the original signal f' (see Figure 1). In this paper, a Daubechies wavelet of order 5 is used as the mother wavelet to transform the price series into several wavelet subseries. This wavelet offers an appropriate trade-off between wavelength and smoothness, resulting in an appropriate behavior for the price forecast [19,26,28]. Three decomposition levels are considered, since this describes the price series in a more thorough and meaningful way [26].
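The decomposition and reconstruction stages of Mallat's algorithm can be illustrated with a minimal sketch. For brevity it swaps in the two-tap Haar wavelet rather than the Daubechies wavelet of order 5 used in this paper, and it returns wavelet coefficients rather than full-length reconstructed subseries; the function names and the synthetic price series are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(signal):
    """One filter-bank level: returns (approximation, detail) at half length."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert one level: interleave the recovered even and odd samples."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def decompose(signal, levels=3):
    """Return [A_levels, D_levels, ..., D1] (Haar stand-in, not db5)."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)  # collected as D1, D2, D3
    return [approx] + details[::-1]

def reconstruct(coeffs):
    """Assemble the components back into the original signal."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_inverse_step(approx, detail)
    return approx

# Synthetic two-day hourly price series with a diurnal cycle (illustrative only).
prices = [30.0 + 5.0 * math.sin(2 * math.pi * h / 24) for h in range(48)]
coeffs = decompose(prices, levels=3)  # [A3, D3, D2, D1]
restored = reconstruct(coeffs)
```

In practice a wavelet library would supply the Daubechies filters; the structure of the three-level cascade stays the same.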

Seasonal ARIMA
AutoRegressive Moving Average (ARMA) models form a class of time series models that are widely applicable in the field of time series forecasting [29]. In the case of linear trends and/or seasonal behavior, non-stationary time series processes can be made stationary by differencing the series. The ARMA model is thereby transformed into an autoregressive integrated moving average (ARIMA) model [30]. To capture a linear trend and seasonality (diurnal cycle) in the time series, one-hour (regular) and 24-hour (seasonal) differencing is used.
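As a small illustration of the regular and seasonal differencing mentioned above, the following sketch applies a one-hour difference followed by a 24-hour difference to a synthetic series. The helper names and the data are hypothetical; a series made of a linear trend plus a fixed daily shape is reduced to zeros.

```python
def difference(series, lag):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def regular_and_seasonal_difference(series, season=24):
    """One-hour (regular) differencing followed by 24-hour (seasonal) differencing."""
    return difference(difference(series, 1), season)

# Four days of hourly data: linear trend plus a repeating daytime plateau.
daily_shape = [10.0 if 8 <= h % 24 < 20 else 0.0 for h in range(96)]
prices = [20.0 + 0.5 * h + daily_shape[h] for h in range(96)]
stationary = regular_and_seasonal_difference(prices)
```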
The Box-Jenkins approach is utilized to build the ARIMA model. The approach uses an iterative model building strategy consisting of four stages. In the first stage, the structure of the model is identified. The autocorrelation function (ACF) and the partial ACF (PACF) of the sample data are the basic tools used to identify the order of the best ARIMA model, which is then estimated by maximum likelihood in the second stage. The parameters of the model are estimated such that an overall measure of errors is minimized. Goodness-of-fit is tested on the estimated model residuals in the third stage. If the model is not adequate, a new tentative model should be identified. Forecasts of future outcomes are obtained in the fourth stage [29].

NN
The NN, also called a multilayer perceptron (MLP), is a semi-parametric model and has been developed based on study of the brain functions and the nervous system. Perceptrons are arranged in layers with no connections inside a layer, and each layer is fully connected to preceding and following layers without loops. The first and last layers are called input and output layers, respectively. Other layers are hidden layers. Each layer, therefore, consists of a specific number of computational elements, called neurons, which are connected to neurons in adjacent layers and capture complex non-linear phenomena. A sigmoid function is used in a hidden layer [31].
The procedure for developing NNs is as follows: data pre-processing; definition of the architecture and parameters; weights initialization; training until the stopping criterion is reached (a maximum number of iterations, or a sum of squared errors lower than a pre-determined value); finding the network with the minimum error; and forecasting the future outcome.
The NN toolbox of MATLAB was selected for NN model building due to its flexibility and simplicity. The Levenberg-Marquardt (LM) algorithm was used in this study, which is an advanced optimization algorithm and one of the more efficient for training NNs. According to Kolmogorov's theorem, an NN can solve a problem by using one hidden layer provided that it has a proper number of hidden neurons ($N_h$) [32]. Therefore, one hidden layer has been considered in the structure of all NNs utilized in this study.

Compound Classifier
The problem of the price spike occurrence prediction is stated as a classification problem that can be solved by a pattern recognition framework. The ultimate goal of pattern recognition is to discriminate the class membership of the observed novel objects with the minimum misclassification rate. It has been observed that even if one of the designs yields the best performance, the sets of patterns misclassified by the different classifiers do not necessarily overlap. This suggests that different classifier designs potentially offer complementary information about the patterns to be classified, which could be harnessed to improve the performance of the selected classifier. The idea behind use of the compound classifier presented in this paper is to avoid reliance on a single classifier. Various classifier combination schemes have been devised and it has been experimentally demonstrated that some of them consistently outperform a single best classifier [33]. The majority vote rule is applied to get an overall output (spike or non-spike) from the compound classifier.
The three individual classifiers used together in the compound classifier are a relevance vector machine (RVM), an ensemble of bagged decision trees (DT) and a probabilistic neural network (PNN). These methods are chosen because they provide probabilistic output (probability of class membership, e.g., probability of spike occurrence). The methods have been previously applied to several other applications with promising results [19,34,35].

RVM
RVM is a statistical learning technique based on Bayesian theory. It was developed for regression and classification problems. To deal with non-linear data, RVM uses a map function to map the training data from the input space into a high-dimensional feature space, so that the training data become linearly separable in the feature space. The related kernel function is used to avoid explicit knowledge of the high-dimensional mapping [36]. Consider a set of example input vectors $\{x_i\}_{i=1}^{N}$ with corresponding targets $\{t_i\}_{i=1}^{N}$. For a classification problem, $t_i$ should be 0 for class $C_1$ and +1 for class $C_2$. The RVM constructs a logistic regression model based on a set of sequence features derived from the input patterns, i.e.:

$$P(t_i = 1 \mid x_i) = \sigma\{y(x_i; w)\}, \qquad y(x_i; w) = w^{T}\phi(x_i),$$

where the basis function vector is $\phi(x_i) = (\phi_0(x_i), \phi_1(x_i), \ldots, \phi_N(x_i)) = [1, K(x_i, x_1), K(x_i, x_2), \ldots, K(x_i, x_N)]$, $\sigma\{y\} = 1/(1 + e^{-y})$ is the logistic sigmoid link function and $\{K(x_i, x_j)\}_{j=1}^{N}$ are kernel terms. Assuming a Bernoulli distribution for $P(t \mid x)$, the likelihood can be written as:

$$P(t \mid w) = \prod_{i=1}^{N} \sigma\{y(x_i; w)\}^{t_i}\,\left[1 - \sigma\{y(x_i; w)\}\right]^{1 - t_i}.$$

To form a Bayesian training criterion, a prior distribution over the vector of model parameters or weights, $p(w)$, must be imposed. The RVM adopts a separable Gaussian prior, with a distinct hyper-parameter, $\alpha_j$, for each weight:

$$p(w \mid \alpha) = \prod_{j=0}^{N} \mathcal{N}(w_j \mid 0, \alpha_j^{-1}).$$

The optimal parameters of the model are then given by the minimiser of the penalized negative log-likelihood:

$$E(w) = -\sum_{i=1}^{N} \left[ t_i \log y_i + (1 - t_i)\log(1 - y_i) \right] + \tfrac{1}{2}\, w^{T} A\, w, \qquad y_i = \sigma\{y(x_i; w)\},$$

where $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$ is a diagonal matrix with non-zero elements given by the vector of hyper-parameters. A detailed mathematical description of RVM is given in [37]. Here, we select a Gaussian radial basis function (RBF) kernel with its specific value of spread $\sigma_{RVM}$ for application of RVM [29].

DT
An ensemble of DT creates a forest of a specific number of decision trees whose outputs are combined to make the overall output for the ensemble:

$$p(C \mid v) = \frac{1}{N_{tree}} \sum_{i=1}^{N_{tree}} p_i(C \mid v),$$

where $N_{tree}$ is the number of trees, $C$ is a class label, $v$ is a feature vector and $p_i(C \mid v)$ is the posterior probability generated by the $i$-th tree.
Bagging is a method for obtaining improved class probability estimates from the DT classification algorithm. A mathematical description of the DT classifier and the bagging method can be found in [38,39].

PNN
PNN is a radial basis network that is suitable for classification problems. PNNs are closely related to the Parzen window probability density function estimator [40]. The particular estimator used in this study is:

$$f_C(X) = \frac{1}{(2\pi)^{p/2}\,\sigma_{PNN}^{p}} \cdot \frac{1}{m} \sum_{i=1}^{m} \exp\left[-\frac{(X - X_{Ci})^{T}(X - X_{Ci})}{2\sigma_{PNN}^{2}}\right],$$

where $i$ is a pattern number; $m$ is the total number of training patterns; $X_{Ci}$ is the $i$-th training pattern of class $C$; $\sigma_{PNN}$ is a spread parameter; and $p$ is the dimensionality of the measurement space. Function $f_C(X)$ is simply the sum of small multivariate Gaussian distributions centered at each training sample. However, the sum is not limited to being Gaussian. It can, in fact, approximate any smooth density function.
A PNN is organized into a multilayered feed-forward network with four layers: the input layer (set of measurements), pattern layer (Gaussian functions), summation layer (average operation of the outputs from the second layer for each class) and output layer (a vote, selecting the largest value). Mathematical details of PNN can be found in [41]. The spread of the Gaussian RBF, $\sigma_{PNN}$, is an adjustable parameter of PNN. If the spread is near zero, the network acts as a nearest neighbor classifier. As the spread becomes larger, the designed network takes into account several nearby design vectors.
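A minimal sketch of the four PNN layers follows, assuming an isotropic Gaussian Parzen window per class and equal class priors when the class densities are normalised into posterior probabilities. The function names and the two-feature training patterns are hypothetical and serve only to show the mechanics.

```python
import math

def parzen_density(x, patterns, sigma):
    """Pattern + summation layers: average of Gaussians centred at each pattern."""
    p = len(x)
    norm = (2.0 * math.pi) ** (p / 2.0) * sigma ** p
    total = 0.0
    for pattern in patterns:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, pattern))
        total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / (norm * len(patterns))

def pnn_classify(x, patterns_by_class, sigma):
    """Output layer: pick the class with the largest density.

    Posteriors assume equal class priors (an assumption of this sketch).
    """
    densities = {
        label: parzen_density(x, patterns, sigma)
        for label, patterns in patterns_by_class.items()
    }
    total = sum(densities.values())
    posteriors = {label: d / total for label, d in densities.items()}
    return max(posteriors, key=posteriors.get), posteriors

# Illustrative patterns: (lagged price, lagged price change).
training = {
    "normal": [(30.0, 0.0), (32.0, 1.0), (31.0, -1.0)],
    "spike": [(90.0, 40.0), (95.0, 45.0)],
}
label, posteriors = pnn_classify((92.0, 42.0), training, sigma=5.0)
```

With a very small spread the Gaussian of the single closest pattern dominates each class density, which corresponds to the nearest-neighbour behaviour noted above.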

Probability Threshold
Prediction of price spike occurrence is a seriously imbalanced classification problem (i.e., the non-spike class has many more samples than the spike class). The probabilities of spike occurrence obtained from the above-mentioned single classifiers are calculated for every input vector and then compared with a probability threshold $V_0$. If the probability is larger than the threshold, a spike is predicted to occur, regardless of whether this probability is less than the probability of non-spikes. This modification is performed because many spikes occur when their occurrence probabilities are smaller than 50% [17].
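The interplay between the probability threshold and the majority vote rule of the compound classifier can be sketched as follows. The probability and threshold values are purely illustrative, and the function names are hypothetical.

```python
def single_vote(spike_probability, threshold):
    """A single classifier votes 'spike' when its probability exceeds its V_0."""
    return spike_probability > threshold

def compound_decision(probabilities, thresholds):
    """Majority vote over the individual spike / non-spike votes."""
    votes = [single_vote(probabilities[name], thresholds[name]) for name in probabilities]
    return sum(votes) > len(votes) / 2

# Illustrative outputs of the three classifiers for one forecast hour.
probabilities = {"RVM": 0.35, "DT": 0.20, "PNN": 0.45}
thresholds = {"RVM": 0.30, "DT": 0.30, "PNN": 0.30}
is_spike = compound_decision(probabilities, thresholds)

# With the conventional 50% cut-off the same hour would be labelled non-spike.
conventional = compound_decision(probabilities, {name: 0.5 for name in probabilities})
```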

k-NN
After the probability of price spike occurrence has been determined, it is of considerable interest for market participants to be able to further predict the actual value of the price spike. A k-NN approach has been used for this task in this paper as in [18,42].
In the k-NN, the k neighboring samples closest to the unknown sample are selected from the training data samples. Then the sum of weighted values of the k closest samples is computed as the unknown sample's value. The Euclidean distance measure is employed here to calculate the closeness between two instances of the training data set. If k = 1, the unknown sample simply takes the value of its nearest neighbor.
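The k-NN value prediction can be sketched as below. The weighting scheme is not specified above, so inverse-distance weights normalised to sum to one are assumed here; the function name and the data are hypothetical.

```python
import math

def knn_predict(query, samples, values, k):
    """Weighted k-NN regression with Euclidean distance.

    Inverse-distance weights are an assumption of this sketch.
    """
    nearest = sorted(
        (math.dist(query, sample), value) for sample, value in zip(samples, values)
    )[:k]
    # An exact match would give an infinite weight, so return its value directly.
    for distance, value in nearest:
        if distance == 0.0:
            return value
    weights = [1.0 / distance for distance, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)

# Illustrative spike samples: two input features and the observed spike price.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
values = [100.0, 120.0, 150.0, 300.0]
prediction = knn_predict((0.5, 0.0), samples, values, k=2)
nearest_only = knn_predict((0.0, 2.0), samples, values, k=1)
```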

Feature Selection Technique
Suitable selection of input variables plays a large role in the success of any forecast method. For price forecasts, apart from lagged values, many other input variables can be considered: available generation, fuel costs etc. [10,43,44]. The set of potential inputs may be too large for further use in a model. Thus, it is necessary to refine the initial set of potential inputs such that a subset of the most effective inputs is selected to be applied to the forecast engine [43].
A two-step feature selection algorithm consisting of both relevance and redundancy filters is used [43]. The ability to filter out redundant information from the set of candidate features is the benefit of such a procedure over a simple calculation of the relevance value between target and independent variables. A mutual information (MI) criterion is used within the feature selection algorithm to capture non-linearity of the original price signal [45].
In the feature selection technique applied, $SET_I = \{x_1, x_2, \ldots, x_t\}$ denotes the set of candidate inputs. For each feature $x_i \in SET_I$, its MI value with the target feature $y$ (continuous or binary) is computed as $MI(x_i, y)$. If the $MI(x_i, y)$ value between a candidate variable and the target is greater than a pre-specified value $V_1$, then this candidate is retained for further processing; otherwise it is filtered out:

$$x_i \in SET_1 \ \text{if} \ MI(x_i, y) > V_1; \quad \text{otherwise } x_i \text{ is filtered out}.$$

In the second step, the set of retained candidates is denoted $SET_1 \subset SET_I$. For any two retained candidates $x_a, x_b \in SET_1$, their MI value is computed as the redundancy measure. If the MI value between any two candidate variables ($x_a$ and $x_b$) is smaller than a pre-specified value $V_2$, both variables are retained; otherwise, only the variable having the largest MI value with respect to the target [$MI(x_a, y)$ or $MI(x_b, y)$] is retained. For instance, for $x_a, x_b \in SET_1$:

$$\text{if } MI(x_a, x_b) \geq V_2: \quad \begin{cases} x_a \text{ is retained and } x_b \text{ is filtered out}, & MI(x_a, y) > MI(x_b, y), \\ x_b \text{ is retained and } x_a \text{ is filtered out}, & \text{otherwise}. \end{cases}$$

The redundancy filtering process is repeated for all candidate inputs of $SET_1$ until no redundancy measure is greater than $V_2$. The subset of candidate variables $SET_2 \subset SET_1$ that passes the redundancy filter is finally selected as the best inputs by the proposed two-step feature selection algorithm.
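The two filters can be sketched in a few lines. This is a simplified illustration under stated assumptions: the variables are taken to be already discretised so that MI can be estimated from joint frequency counts, and the redundancy filter is realised greedily by visiting candidates in order of decreasing relevance. The function names, feature names and threshold values are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """MI (in nats) of two discrete sequences, estimated from frequency counts."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def two_step_selection(candidates, target, v1, v2):
    """Relevance filter (threshold v1) followed by redundancy filter (threshold v2)."""
    relevance = {name: mutual_information(vals, target) for name, vals in candidates.items()}
    # Step 1: keep candidates whose MI with the target exceeds V_1.
    retained = [name for name in candidates if relevance[name] > v1]
    # Step 2: most relevant first; a candidate is dropped if it is redundant
    # (MI >= V_2) with a more relevant candidate that has already been kept.
    retained.sort(key=lambda name: relevance[name], reverse=True)
    selected = []
    for name in retained:
        if all(mutual_information(candidates[name], candidates[kept]) < v2 for kept in selected):
            selected.append(name)
    return selected

# Illustrative binary data: target y and three discretised candidate inputs.
target = [0, 0, 1, 1, 0, 0, 1, 1]
candidates = {
    "price_lag_24": [0, 0, 1, 1, 0, 0, 1, 0],
    "price_lag_25": [0, 0, 1, 1, 0, 0, 0, 0],
    "hour_parity": [0, 1, 0, 1, 0, 1, 0, 1],
}
strict = two_step_selection(candidates, target, v1=0.1, v2=0.3)
loose = two_step_selection(candidates, target, v1=0.1, v2=0.5)
```

Here `hour_parity` fails the relevance filter, and the two lagged prices are mutually redundant under the stricter threshold, so only the more relevant one survives.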

Electricity Price Spike Definition
A spike is defined as a price that surpasses a specified threshold. Some authors suggest the use of fixed log-price change thresholds [46], a varying log-price range threshold [47], or a fixed threshold value for a whole price time series under consideration [19]. Here, the statistical method employed in [16] is used, where the spike threshold is calculated as µ + 3σ. Notations µ and σ indicate the mean and standard deviation of the considered market price data. The threshold is time-varying and calculated on the basis of half a year of price data before each day of the considered period, i.e., of years 2009-2010, to capture evolving conditions of the market. All the prices exceeding this threshold are considered as spikes and extracted from the original price series of the Finnish day-ahead energy market of Nord Pool Spot over the period from 1 January 2009 to 31 December 2010 (see Figure 2). Table 1 shows the basic distribution parameters for normal prices and spikes in the Finnish day-ahead energy market of years 2009-2010.
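The time-varying µ + 3σ rule can be sketched as follows. The population standard deviation and the exact window handling are assumptions of this sketch, and a short 48-hour window replaces the half-year window only to keep the example small; the function names and data are hypothetical.

```python
import statistics

def spike_threshold(history):
    """mu + 3*sigma of the prices in the look-back window."""
    return statistics.mean(history) + 3.0 * statistics.pstdev(history)

def flag_spikes(prices, window_hours, hours_per_day=24):
    """Flag hours whose price exceeds the threshold computed from the
    window_hours immediately preceding the start of their day."""
    flags = [False] * len(prices)
    for day_start in range(window_hours, len(prices), hours_per_day):
        threshold = spike_threshold(prices[day_start - window_hours:day_start])
        day_end = min(day_start + hours_per_day, len(prices))
        for h in range(day_start, day_end):
            flags[h] = prices[h] > threshold
    return flags

# Two quiet days of history followed by one day containing a single spike.
prices = [30.0, 32.0] * 24 + [31.0] * 24
prices[60] = 200.0
flags = flag_spikes(prices, window_hours=48)
```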
It can be seen from Table 1 that spikes constitute about 1.0% of all the prices. However, their magnitude and unexpectedness cause them to have disproportionate significance in the energy markets. In multistep ahead prediction, the predicted price value of the current step is used to determine its value in the next step, and this cycle is repeated until the price values of the whole forecast horizon are predicted.

Price Spike Module: Compound Classifier
First, the set of candidate inputs for the compound classifier is constructed. Values of the original price series lagged up to 200 h before a forecast hour are considered among the candidate inputs. If the period of the study is extended further, the results are not affected seriously, that is, the relation of the current price with the price of much more than one week ago is very small [48]. Thus, lagged hours take into account short-run trend, daily and weekly periodicity of the electricity time series itself and external explanatory time series [19,43,49].
Electricity demand and supply are among the candidate inputs for the compound classifier since the relations of these variables are known to drive the movement in the price spikes to a large extent [17]. Therefore, total electricity generation (i.e., internal supply (sup)) and electricity demand (d) in Finland, both lagged up to 200 h before a forecast hour, are selected. Despite their general importance, demand and supply forecasts are not the focus of the present paper. Here, a WT + ARIMA model [26] is implemented to predict supply. The effect of weather variation is incorporated in a WT + ARIMAX model to predict demand. Atmospheric temperature is chosen as an indicator of weather variability.
Approximation ($A3_p$) and detail ($D1_p$) price wavelet components of a price series, both lagged up to 200 h before a forecast hour, are also candidate inputs for the compound classifier. In [19], a high correlation of a spiky price series with these wavelet components has been described. Hourly ($ind_h$), daily ($ind_d$), and seasonal ($ind_s$) indices are considered as candidate inputs to indicate the temporal effect.
The ARIMA is used as the initial price forecasting model and produces preliminary day-ahead predictions for all price wavelet subseries over the forecast period. For more clarity, prices and the wavelet components predicted by the ARIMA are additionally indexed as "arima" in the paper. For instance, the price value predicted by the initial forecasting model at hour h is denoted $p_{arima,h}$ and used in the candidate input set for the compound classifier.
Finally, the initial set of candidate inputs for the compound classifier, i.e., for each single classifier, includes both historical and forecasted features of both wavelet and time domains. For instance, the 1008 candidate inputs to predict possibility of spike occurrence at hour h are {p arima,h ,

Normal Price Module
If the forecast sample is classified as a non-spike, the normal price module is activated. All electricity price spikes are extracted from the original training price series and replaced by the corresponding mean price value to form a new normal price series.
Next, the set of candidate inputs for the normal price module is constructed. Firstly, the new normal price series is decomposed into four wavelet components of the normal price series. Although the wavelet components are obtained by decomposition of the normal price signal, historical values of the original normal price series are considered among the candidate inputs of each price wavelet component, since it is still possible that some characteristics of the price are better highlighted in the original time domain [45]. Historical and forecasted electricity demand data are also considered within the candidate input set. The ARIMA produces preliminary day-ahead forecasts for all wavelet subseries of the normal price series. Finally, the candidate input set for each wavelet subseries of normal price includes forecasted and lagged price and demand values of these subseries (e.g., $A3_d$ is an approximation wavelet subseries of demand to predict $A3_p$) plus original normal prices (p) lagged up to 200 h before a forecast day. For instance, the 602 candidate inputs to predict approximation normal price wavelet component at hour h

Price Spike Module: k-NN
If the forecast sample is classified as a spike, the price spike module is activated. The target set to train a k-NN is formed by the price spike samples extracted from the original training price series. The k-NN uses a set of candidate inputs similar to the one utilized for the compound classifier.

Search Procedure to Tune Model Parameters
Usually, the adjustable parameters of a forecasting model are selected based on past experience in the study. However, as each energy market has characteristics of its own, selection of optimal model parameters is an open area of research. In this work, an analytical iterative search procedure is realized. It can automatically adjust the parameters of a forecasting model on a selected validation set with minimum reliance on heuristics. The procedure is used to select the optimal set of inputs by defining the threshold values $V_1$, $V_2$ and the parameter settings separately for NN ($N_h$), k-NN (k), RVM ($\sigma_{RVM}$ and $V_0$), DT ($N_{tree}$ and $V_0$) and PNN ($\sigma_{PNN}$ and $V_0$).
There are four adjustable parameters when the proposed search procedure is applied for RVM. The procedure is outlined below:
A. Initial values for $V_0$, $V_1$, $V_2$ and $\sigma_{RVM}$ for RVM are set.
B. Using the selected inputs, training samples are constructed. The classifier is trained and produces a forecast on the validation set. The corresponding validation error is evaluated and stored.
C. Each adjustable parameter is varied in turn in a neighborhood around its previously selected value, while the three remaining parameters are kept constant. A fixed radius of neighborhood (±25% of the previously selected value) is considered in the local search. For each value of the varied parameter in the neighborhood, training of the classifier is repeated and the validation error is evaluated and stored. The value of the varied parameter resulting in the least validation error is selected and fixed. If only the first cycle of this step has been executed so far, the cycle is repeated once more. This modification is made to avoid a local minimum trap in the search procedure: if the procedure misses the optimum solution in one cycle, it may find the optimum point in the next cycle.
D. If the selected values of the adjustable parameters in the current cycle are the same as their previous values, the search procedure is terminated. Otherwise go to step C.
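Steps A to D can be sketched as a coordinate-wise local search. The number of trial values per neighborhood, the cycle cap and the quadratic stand-in for "train the classifier and evaluate the validation error" are assumptions made for illustration; all names are hypothetical.

```python
def local_search(validation_error, initial, radius=0.25, steps=5, max_cycles=50):
    """Vary one parameter at a time within +/-radius of its current value."""
    params = dict(initial)                      # step A: initial values
    best_error = validation_error(params)       # step B: initial validation error
    for cycle in range(max_cycles):
        previous = dict(params)
        for name in initial:                    # step C: one parameter at a time
            centre = params[name]
            for i in range(steps):
                factor = 1.0 - radius + 2.0 * radius * i / (steps - 1)
                trial = dict(params)
                trial[name] = centre * factor
                error = validation_error(trial)
                if error < best_error:
                    best_error, params = error, trial
        # The first cycle is always followed by a second one; afterwards the
        # search stops once a full cycle leaves every parameter unchanged (step D).
        if cycle > 0 and params == previous:
            break
    return params, best_error

def toy_validation_error(p):
    """Stand-in for training plus validation; minimum at V0 = 0.3, sigma = 2.0."""
    return (p["V0"] - 0.3) ** 2 + (p["sigma"] - 2.0) ** 2

start = {"V0": 0.5, "sigma": 1.0}
best, error = local_search(toy_validation_error, start)
```

Because the neighborhood is relative to the current value, the search settles on a point near, rather than exactly at, the minimum of the toy error surface.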

Forecast Strategy
The proposed forecast strategy can be summarized by the following step-by-step algorithm, shown also in Figure 4.
F. The overall electricity price forecast is formed as a joint output from the normal price and price spike modules.
G. The overall price forecast (original and transformed into price wavelet components) replaces the predictions produced by the initial forecasting model for the current forecast day, since it is expected that electricity prices predicted by the separate forecasting frameworks have more accuracy. After replacement, the forecasting cycle is repeated as shown in Figure 4 until no difference in the overall electricity price forecast output of two successive iteration steps is observed.

Case Study
For examination of the proposed method, the real hourly data of the Finnish day-ahead energy market are considered. The electricity price, demand and supply historical data over the period from November 2008 to December 2009 are used to establish the initial training data sample set. The data over the period from January 2010 to December 2010 are used as the test set.

Training Phase
Training periods for the forecasting models of the normal price and price spike modules are different. As recommended in [26,43], a 50-day training period preceding the forecast day is considered for the NNs of the normal price module. It should be borne in mind that the price series have local trends since market conditions evolve with time, and, hence, use of a long training period may result in significant inaccuracies.
However, there are only a few price spike samples in the whole data set (see Table 1). Unlike normal price prediction, a longer price series period is required in order to get a sufficient number of spike samples to train the model. Hence, the 365 days preceding the forecast day are considered for the price spike module (the compound classifier and the k-NN model).
Since the forecasting models of the normal price and price spike modules have inputs preliminarily predicted by other models, their training periods are extended to comprise two consecutive periods: a moving training period for the preliminary model and the training period of the main model.
As a result, to predict normal prices or price spikes, a day denoted by D is considered in the corresponding second training period. Values of prices for this day are assumed to be unknown. The preliminary ARIMA models are trained by the historical data of the 50 days preceding hour 1 of day D and predict the price wavelet subseries of day D. To improve the performance of the ARIMA forecast process for each day of the second training period (D = 1,…, 50 for NNs or D = 1,…, 365 for the price spike module), the ARIMA models are trained by the immediately previous 50-day period. This process is repeated until forecasts from the ARIMA models are obtained for all days of the corresponding second training period (see Figure 5).

Validation Phase
The 24 h before the forecast day are removed from the training set of the NNs of the normal price module and used as the validation data set. Then, the NNs are trained by the remaining training samples. Adjusted parameters are fine-tuned on the validation data set. For the price spike module, all adjustable parameters of the classification approaches are fine-tuned by a 10-fold cross-validation technique applied to the whole training data set.

Numerical Results
The results of the two-step feature selection algorithm implemented for the compound classifier, the k-NN and the NN to predict prices in the Finnish day-ahead energy market for a single forecast day, 5 January 2010, are presented in Tables 2 and 3. Since electricity price spikes have a very volatile stochastic nature with respect to the normal price time series, regular and periodic behaviour of price spikes is not so obvious (see Table 2). By contrast, variables of the short-run trend (e.g., $A3_{p,h-1}$, $D3_{p,h-2}$), daily periodicity (e.g., $A3_{p,h-25}$, $D3_{p,h-24}$) and weekly periodicity (e.g., $A3_{p,h-169}$, $A3_{d,h-169}$) are among the selected input features to forecast normal price wavelet components (see Table 3).
Table 2. Selected inputs for the three classification approaches of the compound classifier and the k-NN for a single forecast day, 5 January 2010.
Table 3. Inputs selected by the two-step feature selection to predict the normal price wavelet components for the NN for a single forecast day, 5 January 2010.
In this paper, the Adapted Mean Average Percentage Error (AMAPE) proposed in [10] is used to evaluate the forecast results:

$$AMAPE = \frac{1}{T} \sum_{i=1}^{T} \frac{\left| P_{iACTUAL} - P_{iFOREC} \right|}{\frac{1}{T} \sum_{j=1}^{T} P_{jACTUAL}} \times 100\%,$$

where $P_{iACTUAL}$ and $P_{iFOREC}$ are the actual and forecast values of hour $i$, respectively, and $T$ is the number of predictions.
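A short sketch of the AMAPE computation follows, assuming the definition of [10] in which each absolute error is normalised by the average actual price over the T predictions rather than by the individual hourly price. The function name and the numbers are illustrative only.

```python
def amape(actual, forecast):
    """Adapted MAPE in percent: absolute errors scaled by the mean actual price."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    total_abs_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return 100.0 * total_abs_error / (len(actual) * mean_actual)

# Illustrative three-hour example: absolute errors 4, 5 and 0, mean price 50.
error_percent = amape([40.0, 50.0, 60.0], [44.0, 45.0, 60.0])
```

Normalising by the average price keeps the measure stable when individual hourly prices are close to zero.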
In addition, two performance measures proposed in [17], spike prediction accuracy and spike prediction confidence, are used to assess the performance of the compound classifier.
Spike prediction accuracy is the ratio of the number of correctly classified spikes (N corr ) to the number of actual spikes (N sp ):

$$\text{Spike prediction accuracy} = \frac{N_{\mathrm{corr}}}{N_{\mathrm{sp}}}$$

This measure was introduced because the ability to correctly predict spike occurrence is the subject of greatest concern.
Spike prediction confidence aims to account for the uncertainties and risks carried within the forecast. It is defined as:

$$\text{Spike prediction confidence} = \frac{N_{\mathrm{corr}}}{N_{\mathrm{as\_sp}}}$$

where N corr is the number of correctly classified spikes and N as_sp is the number of observations classified as spikes. Because the classifier may misclassify some non-spikes as spikes, this measure indicates how often the classifier makes this kind of mistake.

Only a few research works have considered price forecasting in the Finnish day-ahead energy market, and it was not possible to find price forecast methods evaluated on the above-mentioned test period. Therefore, the overall accuracy of the proposed method is compared with some of the most popular price forecast techniques applied in case studies of energy markets of other countries: seasonal ARIMA [5,44,50]; WT + ARIMA [26,28]; NN [6,50]; WT + NN [11]. Additionally, WT + ARIMA + NN, which was not found in the literature, is included among the competing techniques. To demonstrate the efficiency of the proposed methodology, its results for the Finnish day-ahead energy market in year 2010 are shown in Table 4 together with the corresponding results obtained from the five other prediction techniques.
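Returning to the two spike measures defined above, the following sketch computes both from hourly spike flags. The function name and the return layout are assumptions made for illustration; values are returned in percent because that is how the tables report them.

```python
from typing import Dict, Sequence


def spike_metrics(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Count N_sp, N_as_sp and N_corr, and derive accuracy and confidence in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    n_sp = sum(1 for a in actual if a)            # actual spikes
    n_as_sp = sum(1 for p in predicted if p)      # observations classified as spikes
    n_corr = sum(1 for a, p in zip(actual, predicted) if a and p)
    nan = float("nan")
    return {
        "n_sp": n_sp,
        "n_as_sp": n_as_sp,
        "n_corr": n_corr,
        "accuracy": 100.0 * n_corr / n_sp if n_sp else nan,
        "confidence": 100.0 * n_corr / n_as_sp if n_as_sp else nan,
    }
```

The two measures pull in opposite directions: a classifier that flags every hour as a spike reaches 100% accuracy but very low confidence, which is why both are reported side by side.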
In the WT + NN and WT + ARIMA + NN models, separate NNs with the LM algorithm are applied to each price wavelet component. For a fair comparison, NN, WT + NN and WT + ARIMA + NN have historical and forecasted demand data among their candidate inputs. The proposed two-step feature selection is applied to all examined models, and the adjustable parameters of the competing models are fine-tuned by the proposed search procedure. It should be noted that, among the competing models, only WT + ARIMA + NN has preliminarily predicted price values in its set of candidate inputs, i.e., the NN uses predictions from ARIMA as candidate inputs.
As seen from Table 5, the iteration procedure converges in at most three cycles, and the prediction error for the four test weeks at the end of the iterative forecast process is improved by 13% on average with respect to Iteration 1.

In addition, the performance of the proposed compound classifier is compared with selected single classifiers and other techniques: Naïve Bayesian [17], SVM [17], PNN [19], RVM [34], and DT [35]. N corr and N as_sp for the Finnish day-ahead energy market of year 2010 are presented in the second and third columns of Table 6, respectively. The corresponding spike prediction accuracy and confidence, in percent, are given in the fourth and fifth columns of Table 6. The candidate inputs of all alternative classifiers are similar to the candidate input set of the compound classifier and are refined by the proposed two-step feature selection. All preliminarily predicted price variables in the input sets of the competing classifiers are predicted by the ARIMA model. This corresponds to the case in which spike occurrence is predicted using forecasts from the initial forecasting model.

To justify the proposed iteration strategy, particularly for the price spike occurrence forecast, N corr , N as_sp , accuracy and confidence obtained from the compound classifier on the final iteration step of the proposed methodology are shown in the sixth, seventh, eighth and ninth columns of Table 6, respectively. The total number of actual spike samples in the testing data set is 182. The results in Table 6 show that the iteration strategy yields a notable improvement in the accuracy of price spike occurrence prediction. Only RVM has slightly better spike prediction accuracy than the compound classifier, while the compound classifier has considerably better spike prediction confidence than RVM.
Table 7 shows the results obtained from each single classifier and from the compound classifier itself on the final iteration step. The set of actual price spike test samples of year 2010 is divided according to price value intervals (see the second column of Table 7). Large price spikes with values between 300 and 1500 euro/MWh constitute around 15% of all the spike samples. Because of their magnitude and stochastic character, they are extremely important for all market participants. All the classifiers presented in Table 7 correctly identify all the large spike samples of the test period. The accuracy of the examined classifiers varies in the prediction of price spike samples with values between 85 and 300 euro/MWh.

For a more detailed view of the performance of the proposed forecast strategy, and of price spike occurrence prediction separately, over the whole test year, the results for all the weeks of year 2010 are shown in Table 8. Six measures are given for each test week of the Finnish day-ahead energy market of 2010: AMAPE, N sp , N corr , N as_sp , and the accuracy and confidence of the spike forecast.
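The per-interval breakdown used for Table 7 can be sketched as below. The function name is hypothetical, and the two example intervals are taken only from the price ranges mentioned in the text; the actual table may use finer bins. Intervals are treated as half-open, [low, high), which is also an assumption.

```python
from typing import Dict, Sequence, Tuple

Interval = Tuple[float, float]


def accuracy_by_interval(
    spike_prices: Sequence[float],
    detected: Sequence[bool],
    intervals: Sequence[Interval],
) -> Dict[Interval, Tuple[int, int, float]]:
    """For each [low, high) interval return (N_sp, N_corr, accuracy in percent).

    `spike_prices` holds the prices of the actual spike samples and
    `detected[i]` tells whether sample i was classified as a spike.
    """
    if len(spike_prices) != len(detected):
        raise ValueError("spike_prices and detected must have the same length")
    results = {}
    for low, high in intervals:
        flags = [d for p, d in zip(spike_prices, detected) if low <= p < high]
        n_sp = len(flags)
        n_corr = sum(1 for d in flags if d)
        accuracy = 100.0 * n_corr / n_sp if n_sp else float("nan")
        results[(low, high)] = (n_sp, n_corr, accuracy)
    return results


# Example intervals in euro/MWh, based on the ranges quoted in the text.
EXAMPLE_INTERVALS = [(85.0, 300.0), (300.0, 1500.0)]
```

Breaking accuracy down this way shows whether missed spikes are concentrated among the smaller, harder-to-distinguish samples rather than among the large ones.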
As can be seen from Table 8, price forecasts for the weeks of the winter season (December-February), i.e., weeks 1-8 and 48-52 of year 2010, have a higher prediction error than price forecasts for the other seasons. The performance of the forecasting model is worse during the winter season because of extreme price volatility, reflected in price spikes, which is caused by a number of complex factors and arises during periods of market stress. These stressed market situations are generally associated with extreme meteorological events and unusually high demand. However, given that price spike values are highly stochastic, the achieved forecast accuracy is fairly good and allows market participants to analyze spikes and thus manage their risks. Moreover, as can be seen from Table 8, the occurrence of price spikes, which generally appear in the winter period, is predicted by the proposed methodology with high accuracy and confidence. In this context, price spike prediction can be considered a forecast of price volatility rather than of an exact price value.

To illustrate graphically the price forecast performance of the proposed methodology and emphasize its ability to capture spikes, the forecasted and actual signals for the four selected spiky weeks (weeks 1, 2, 5 and 28 in Table 8) of the Finnish day-ahead energy market of year 2010 are shown in Figure 6. As can be seen in Figure 6, all the forecasted price curves follow the actual curves acceptably. The proposed methodology, based on a hybrid iterative strategy, is able to capture the essential features of the given price time series: non-constant mean, cyclicality with daily and weekly patterns, major volatility and significant outliers. This ability results in the superiority of the proposed methodology over all the examined alternative techniques.

The total running time to set up the proposed separate forecasting strategy, including its normal price module, price spike module and iterative prediction process, for the first forecast day is about 42 h, since price predictions produced by the initial forecasting model are required over a period of up to 365 days. The running time of the training and prediction procedures for each subsequent forecast day is significantly lower (about 50 min) and is considered suitable for day-ahead energy market operation. All the competing non-separate forecasting approaches examined for price prediction have lower computation costs than the proposed separate forecasting strategy but are outperformed by it in terms of forecasting accuracy. Prediction accuracy is the crucial concern for a forecasting method, provided the computation time remains reasonable.

The PNN and RVM classifiers of the compound classifier have lower computational costs than the alternative back-propagation NN and SVM, respectively. Unlike the back-propagation algorithm, the training of the PNN requires only one pass over the training samples. The RVM is faster than the SVM in decision speed, as it has a much sparser structure (the number of relevance vectors versus the number of support vectors). The computation times to set up the proposed and competing forecasting strategies were measured on a hardware platform comprising an Intel Core i5 2.40 GHz processor (Intel Corporation, Santa Clara, CA, USA) and 3.24 GB RAM. All computations were carried out with the MATLAB (MathWorks, Natick, MA, USA) and R (R Development Core Team, Auckland, New Zealand) software packages.

Figure 2. (a) Original price data of the Finnish day-ahead energy market in years 2009-2010; (b) Extracted price spikes.

Figure 3. Time framework to forecast market prices for the Nordic day-ahead energy market.

Figure 4. Procedure of the proposed method.

Figure 5. Historical data required for the training of the normal price and price spike modules to produce overall price forecast on a single forecast day.

Figure 6. Real and predicted prices for the four weeks with prominent spikes of the Finnish energy market of year 2010: (a) Week 1; (b) Week 2; (c) Week 5; (d) Week 28.

Table 1. Basic statistics for normal spot prices and price spikes (euro/MWh).

Table 4. AMAPE (%) obtained from different techniques for price forecasts in the Finnish day-ahead energy market of year 2010.

It is expected that implementation of the proposed iteration strategy increases the accuracy of the overall price prediction. Detailed results of the proposed iteration strategy for the four test weeks of the Finnish day-ahead energy market of year 2010 are shown in Table 5. These test weeks correspond to 1−7 January 2010, 8−14 January 2010, 29 January−4 February 2010 and 5−11 February 2010, respectively, and represent periods of high volatility in the energy price series. Iteration 0 in Table 5 represents the results obtained from the initial forecasting model.

Table 5. Accuracy of the proposed iteration procedure in terms of AMAPE (%) for the four test weeks of year 2010.

Table 6. N corr , N as_sp , occurrence prediction accuracy and confidence in terms of percentage (%) for price spike classification in the Finnish day-ahead energy market of year 2010.

Table 7. Results obtained by the compound classifier for different price spike intervals.

Table 8. Results obtained from the proposed forecasting methodology for each week of 2010.