Day-Ahead Electricity Price Forecasting Using a Hybrid Principal Component Analysis Network

Bidding competition is one of the main transaction approaches in a deregulated electricity market. Locational marginal prices (LMPs) resulting from bidding competition and system operation conditions indicate electricity values at a node or in an area. The LMP reveals important information for market participants in developing their bidding strategies. Moreover, LMP is also a vital indicator for the Security Coordinator to perform market redispatch for congestion management. This paper presents a method using a principal component analysis (PCA) network cascaded with a multi-layer feedforward (MLF) network for forecasting LMPs in a day-ahead market. The PCA network extracts essential features from periodic information in the market. These features serve as inputs to the MLF network for forecasting LMPs. The historical LMPs in the PJM market are employed to test the proposed method. It is found that the proposed method is capable of forecasting day-ahead LMP values efficiently.


Introduction
There are two main transaction modes in a deregulated electric power industry, namely, competitive bidding and bilateral contracts. Competitive bidding is used in the energy, spot, firm-transmission-right and ancillary-service markets, while bilateral contracts are adopted outside the competitive market between any two individual entities, a buyer and a seller [1,2]. For either transaction mode, electricity price information serves as an essential signal for all entities to adjust their offers/bids and/or contract prices. In particular, locational marginal pricing (LMP) is one of the most popular modes for pricing electricity in a deregulated electricity market. LMPs reflect the electricity value at a node and may differ across nodes in a power network [3]. LMPs provide information that helps market participants develop their bidding strategies. The LMP is also a vital indicator for the Security Coordinator to mitigate transmission congestion [4]. In short, LMPs reveal important information for both the spot market and entities with bilateral contracts.
Past studies have investigated short-term System Marginal Price (SMP) forecasting [5,6]. Because the SMP does not account for transmission constraints, forecasting LMPs, which are subject to transmission constraints, is more difficult than forecasting Market Clearing Prices (MCPs). Current methods for short-term LMP forecasting can be classified into at least three groups: hour-ahead, day-ahead and week-ahead forecasting.
A recurrent neural network integrated with fuzzy c-means was proposed for hour-ahead LMP forecasting in [7]. Linguistic descriptions in the PJM market were transformed into fuzzy membership functions associated with the recurrent neural network for forecasting volatile hour-ahead LMP variations when a contingency occurs [8]. This paper investigates the more difficult problem of day-ahead price forecasting, which may be applied to the day-ahead market and is discussed in the following paragraphs.
In recent years, Contreras et al. [9] used the ARIMA model, and Nogales et al. [10] used the dynamic regression and transfer function approaches, to predict next-day (day-ahead) electricity prices. However, there is no discussion in [9,10] on extracting the market features for use by these approaches. Li et al. [11] integrated a fuzzy inference system with least-squares estimation to conduct day-ahead electricity price forecasting. The "week day", "yesterday price" and "local demand" were considered in the 18 antecedent (premise or condition) parts of the fuzzy rules in [11]. Specifying the membership functions of these three linguistic variables is quite heuristic. Moreover, the "local demand" for the fuzzy rules is not a forecasted but an actual value, which is generally not available in the day-ahead market. Amjady and Keynia [12] combined a mutual information technique (MIT) with a cascaded neuro-evolutionary algorithm (NEA) for day-ahead electricity price forecasting. In [12], 14 market features were selected by the MIT for 24 feedforward neural networks trained by the NEA. No reasonable explanation was given for these 14 features. Moreover, the use of as many as 24 neural networks makes the method impractical for industrial application. Garcia et al.
[13] presented an approach to predicting next-day electricity prices using the Generalized Autoregressive Conditional Heteroskedastic (GARCH) methodology, an extension of the auto-regressive integrated moving average (ARIMA) model. Amjady [14] presented a fuzzy neural network with an inter-layer, feedforward architecture using a new hypercubic training mechanism; the method predicted hourly market clearing prices for day-ahead electricity markets. Again, there is no discussion in [13,14] on extracting the market features for use by the GARCH methodology. Coelho and Santos [15] proposed a nonlinear forecasting model based on radial basis function neural networks (RBF-NNs) with Gaussian activation functions. Partial autocorrelation functions (PACFs), which rely on the mutual linear dependency among the studied parameters, were used to identify the market features. However, the relations among power market features are highly nonlinear.
The problem of week-ahead price forecasting is generally easier than that of day-ahead price forecasting because the price pattern of a day is similar to that of the corresponding day one week earlier. Catalao et al. [16] proposed a wavelet-based Sugeno-type fuzzy inference system to predict electricity prices in the market of mainland Spain. However, the selection of the numbers of membership functions in [16] is a trade-off between refinement and sparseness. Che and Wang [17] presented a method based on support vector regression and ARIMA modeling; however, only the MCPs of the California electricity market were used to examine the accuracy of the proposed method. The method has not been applied to forecasting LMPs, whose patterns are more nonlinear than those of MCPs.
Because LMPs vary dramatically, it is difficult to analyze the related data with traditional techniques (e.g., regression analysis). Like other forecasting problems [18-20], LMP forecasting needs feature extraction incorporated with a powerful modeling approach. As described above, neural networks are suitable for nonstationary time-series prediction and provide satisfactory results. In this paper, a principal component analysis (PCA) neural network cascaded with a multi-layer feedforward (MLF) neural network is proposed for day-ahead LMP forecasting. The PCA neural network is used to extract essential features of the electricity market. It also helps reduce high-dimensional data into low-dimensional data, which serve as inputs for the MLF neural network.
The rest of this paper is organized as follows: the PJM real-time market data will be described in Section 2. The proposed PCA neural network cascaded with the MLF neural network for forecasting day-ahead LMPs will be given in Section 3. Simulation results obtained using the PJM data are presented in Section 4. Concluding remarks are provided in Section 5.

Volatile LMPs in a Day-ahead Market
The PJM energy market comprises day-ahead and real-time markets. The day-ahead market is a forward market in which hourly LMPs are calculated for the next operating day using generation offers, demand bids and scheduled bilateral transactions. The real-time market is a spot market in which current LMPs are calculated at five-minute intervals according to actual grid operating conditions. PJM settles transactions hourly and issues invoices to market participants monthly. Figures 1 and 2 illustrate the LMPs at Fisk (4 kV) and Byberry (13 kV), respectively, on 1-7 July 2008. As can be seen, LMPs vary dramatically over a wide range.

The Proposed Method
The hybrid PCA neural network is developed by combining the unsupervised PCA and supervised MLF neural networks to conduct day-ahead LMP forecasting. The PCA neural network is employed to extract essential features of the electricity market. The PCA neural network can also reduce high-dimensional data into low-dimensional data, which serve as inputs of the MLF neural network to reduce the training CPU time.

Principal Component Analysis Neural Network
The purpose of the PCA neural network is to find a set of P orthonormal vectors (OVs) in a Q-dimensional space (Q >= P), such that these OVs account for as much of the variance of the input data as possible. The OVs are the P eigenvectors associated with the P largest eigenvalues of E(xx^t), where x denotes the Q-dimensional input column vector, i.e., x = (x_1 x_2 ... x_Q)^t. The direction of the q-th principal component is along the q-th eigenvector, q = 1, 2, ..., Q.
Let symbol t be the training index. This paper used Sanger's method [21] to update the weights between the neurons of the PCA network as follows:

w_p(t + 1) = w_p(t) + eta(t) y_p(t) [x(t) - sum_{i=1}^{p} y_i(t) w_i(t)],  p = 1, 2, ..., P  (1)

where eta(t) is the learning rate and y_p(t) = w_p(t)^t x(t) is the output of neuron p. Equation (1) is employed to train a neural network consisting of P linear neurons so as to find the first P principal components. More specifically, the Generalized Hebbian Algorithm makes w_p(t), p = 1, 2, ..., P, converge to the first P principal-component directions in sequential order, where v_i denotes the normalized eigenvector associated with the i-th largest eigenvalue of the correlation matrix of x, i.e., C = E(xx^t). It was shown that, upon convergence, the variance of y_p(t) and the weight vector w_p(t) are exactly the p-th eigenvalue and the p-th normalized eigenvector v_p of C, respectively. Consequently, neuron p can find the p-th normalized eigenvector of C. Detailed explanations can be found in [21].

It was shown that eta(t) should be smaller than the reciprocal of the largest eigenvalue of E(xx^t) to ensure the convergence of training a PCA neural network. When the training process converges, w_p, p = 1, 2, ..., P, converges to the p-th eigenvector of E(xx^t). Figure 3 shows the configuration of the hybrid PCA neural network: the left part of Figure 3 is the unsupervised PCA neural network, while the right part is the supervised MLF neural network. Because the training time of the unsupervised PCA neural network is trivial while that of the supervised MLF network is considerable, the PCA neural network is adopted to reduce both the dimension of the training data and the training time for the cascaded MLF network, which is trained by the well-known back-propagation algorithm (not detailed here).
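The update of Equation (1) can be sketched in a few lines of NumPy. This is a minimal illustration of Sanger's rule, not the paper's C++ implementation; the learning rate, epoch count and data layout (samples as rows) are illustrative assumptions.

```python
import numpy as np

def sanger_pca(X, n_components, lr=0.005, epochs=100, seed=0):
    """Sanger's rule / Generalized Hebbian Algorithm.

    Trains P linear neurons so that w_p converges to the p-th
    principal-component direction of the rows of X, in sequential
    order.  As noted above, lr should stay below the reciprocal of
    the largest eigenvalue of E(x x^t) to ensure convergence.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_components, X.shape[1]))  # rows = w_p
    for _ in range(epochs):
        for x in X:
            y = W @ x                        # neuron outputs y_p = w_p^t x
            for p in range(n_components):
                # w_p <- w_p + lr * y_p * (x - sum_{i<=p} y_i w_i)
                W[p] += lr * y[p] * (x - W[: p + 1].T @ y[: p + 1])
    return W
```

After training, the rows of W approximate the leading eigenvectors of the sample correlation matrix; in the hybrid network the corresponding outputs y_1, ..., y_P then feed the cascaded MLF network.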
In the proposed hybrid PCA network, the new hidden layer consists of 20 neurons. The number (P) of orthonormal vectors is 24 or 48, depending on the number of inputs. After training the unsupervised PCA network, the supervised MLF network is trained using the frozen weights of the unsupervised PCA network. The training sets are identical for both the unsupervised and supervised networks.

Features for Inputs of PCA Neural Network
The performance of a neural network depends strongly on the features adopted at the input layer. As shown in Figures 1 and 2, variations in system load affect LMPs. Assume that the LMP at hour h is to be forecasted. Let P(h) and L(h) be the LMP and MW demand at hour h, respectively.
Below are four alternatives for the input features x_1, x_2, ..., x_Q: (1) The features of the past 2 days: F1(h). This implies Q = 48. (2) The features of the same day of the last week and those of the past 2 days: F2(h). This implies Q = 96. (3) The features of the past 2 days and the designated day: F3(h) = F1(h) augmented with D (D is one of the seven days in a week). This implies Q = 49. (4) The features of the same day of the last week, those of the past 2 days and the designated day: F4(h) = F2(h) augmented with D (D is one of the seven days in a week). This means Q = 97.
The symbol D for the designated day denotes Monday, ..., Saturday or Sunday. Because the neural network cannot deal with symbols, the values 30, 50, ..., 150 stand for Monday, ..., Saturday and Sunday, respectively, in this paper.
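The four feature alternatives can be sketched as follows. Because the exact composition of F1 and F2 is partly lost in the text above, the sketch assumes F1(h) collects the 24 hourly LMPs and 24 hourly loads of a 24-hour window drawn from the two preceding days, and that F2 prepends the same window shifted one week (168 hours) back; these offsets and the helper names are assumptions made for illustration.

```python
import numpy as np

# Day-of-week codes 30, 50, ..., 150; the step of 20 is inferred from
# the seven equally spaced values named in the text.
DAY_CODE = {"Mon": 30, "Tue": 50, "Wed": 70, "Thu": 90,
            "Fri": 110, "Sat": 130, "Sun": 150}

def f1(prices, loads, h):
    """F1(h): 24 hourly LMPs and 24 hourly loads from a 24-hour window
    preceding the forecast target (Q = 48).  The window offset is an
    assumption consistent with the moving window of Figure 4."""
    return np.concatenate([prices[h - 48:h - 24], loads[h - 48:h - 24]])

def f2(prices, loads, h):
    """F2(h): the same window one week (168 h) earlier plus F1(h) (Q = 96)."""
    return np.concatenate([f1(prices, loads, h - 168), f1(prices, loads, h)])

def f3(prices, loads, h, day):
    """F3(h): F1(h) augmented with the day-of-week code (Q = 49)."""
    return np.append(f1(prices, loads, h), DAY_CODE[day])

def f4(prices, loads, h, day):
    """F4(h): F2(h) augmented with the day-of-week code (Q = 97)."""
    return np.append(f2(prices, loads, h), DAY_CODE[day])
```

The four vector lengths match the input-layer sizes 48, 96, 49 and 97 discussed later for Figure 3.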

Moving Data Windows for Forecasting
P(h) at the output layer is paired with F1(h), F2(h), F3(h) or F4(h). More specifically, assume that F1(h) is considered and that the 24 LMPs on Wednesday (the next day) are to be forecasted. Figure 4 illustrates the moving data window corresponding to the forecasted LMP. Hence, the paired training data are as follows: (F1(h), P(h)), (F1(h + 1), P(h + 1)), ..., (F1(h + 23), P(h + 23)). In Figure 4, the first data set involves only Monday and Wednesday. The last 23 data on Monday and the first data point on Tuesday are paired with P(h + 1) for the second data set. Restated, forecasting the 24 LMPs on Wednesday is completed at 23:00 on Tuesday.
When the proposed hybrid PCA neural network is used in the day-ahead market or in the testing stage, the input data for the past day (e.g., Monday in Figure 4) and the current day (e.g., Tuesday in Figure 4) are known, while the output (forecasted) data for the next day (e.g., Wednesday in Figure 4) are unknown. Assume that the current day is Tuesday and the LMPs on Wednesday are to be forecasted. Figure 5 shows the moving data window for F2(h) paired with P(h). The paired training data are as follows: (F2(h), P(h)), (F2(h + 1), P(h + 1)), ..., (F2(h + 23), P(h + 23)). As shown in Figure 5, the first data set involves only the last Wednesday, Monday and Wednesday. The time index h is increased by one at a time until h + 23. Restated, forecasting the 24 LMPs on Wednesday is completed at 23:00 on Tuesday.
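The pairing of target LMPs with moving data windows described above can be sketched as follows; the F1-style window used in the toy run assumes the same offsets as the sketch in the previous subsection and is only illustrative.

```python
import numpy as np

def build_training_pairs(prices, loads, h0, feature_fn):
    """Pair each target LMP P(h0), ..., P(h0 + 23) with the feature
    vector of its moving data window, as sketched in Figures 4 and 5.
    The window slides one hour at a time as the time index increases."""
    return [(feature_fn(prices, loads, h0 + k), prices[h0 + k])
            for k in range(24)]

# Toy run with an F1-style window (24 LMPs + 24 loads; offsets assumed).
prices = np.arange(24 * 10, dtype=float)   # 10 days of hourly prices
loads = np.ones(24 * 10)
window = lambda p, l, h: np.concatenate([p[h - 48:h - 24], l[h - 48:h - 24]])
pairs = build_training_pairs(prices, loads, h0=24 * 8, feature_fn=window)
```

Each element of `pairs` is one (input, target) training example; 24 such examples cover one forecasted day.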

Numbers of Neurons in Different Layers
The numbers of neurons in the input, output, second and fourth layers are discussed as follows: (1) The numbers of input neurons for the hybrid PCA neural networks are 48, 96, 49 and 97 for F1(h), F2(h), F3(h) and F4(h), respectively. That is, subscript Q in Figure 3 can be 48, 96, 49 or 97. (2) The number of neurons in the MLF output layer is one (i.e., P(h)), regardless of whether F1(h), F2(h), F3(h) or F4(h) is considered. (3) Because the purpose of the PCA neural network is to find a set of P orthonormal vectors (OVs) in a Q-dimensional space, P is expected to be smaller than the corresponding number of inputs. It is intuitive to set P in Figure 3 to 24 for the studied problem with Q = 48 or 49 because there are 24 hours in a day. Similarly, P = 48 when Q = 96 or 97. (4) A common choice for the number of neurons in the fourth (hidden) layer is (P + number of output neurons)/2 or (P x number of output neurons)^0.5. The simulation results show no significant difference between these two alternatives.

Simulation Results
In order to demonstrate the applicability of the proposed hybrid PCA neural network, the LMPs of the Fisk (4 kV) and Byberry (13 kV) areas in the PJM system were studied. Two sets of 366 x 24 data points (1 January-31 December 2008) for Fisk and Byberry from the PJM web site were employed to train, validate and test the proposed hybrid PCA neural network. The entire data set covers four seasons. The data of each season were further divided into three groups: training data and validation data (2/3 in total), and test data (1/3). The training data were used for training the neural network and updating the biases and weights. The validation data were utilized to monitor the training process. The remaining data were employed to test the proposed hybrid PCA neural network after it was well trained. A C++ program was developed and run on a PC equipped with a Pentium(R) Dual-Core E5200 2.5 GHz CPU and 4 GB of RAM to show the applicability of the proposed method.
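The seasonal data split can be sketched as below. The paper fixes only the 2/3 (training + validation) versus 1/3 (test) proportions; the 3:1 train:validation ratio inside the 2/3 is an assumption for illustration.

```python
import numpy as np

def seasonal_split(records, train_frac_within=0.75):
    """Split one season's chronologically ordered hourly records into
    training, validation and test sets.  The first 2/3 are used for
    training + validation and the last 1/3 for testing, as in the
    paper; the split inside the first 2/3 is an assumed 3:1 ratio."""
    n = len(records)
    cut = (2 * n) // 3
    train_val, test = records[:cut], records[cut:]
    vcut = int(len(train_val) * train_frac_within)
    return train_val[:vcut], train_val[vcut:], test
```

Keeping the split chronological (rather than shuffled) respects the time-series nature of the LMP data.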

Comparison between Hybrid PCA and Back Propagation-Based Neural Networks
In this subsection, the performance of the proposed hybrid PCA neural network is compared with that of a traditional back-propagation-based (BP-based) neural network for the Fisk area. The traditional BP-based neural network can be regarded as the network of Figure 3 without the second and third layers. Tables 1-4 display the CPU time (minutes:seconds), correlation coefficient (R2) and mean absolute error (MAE, $/MWh) obtained by the two methods for Fisk. The correlation coefficient represents the resemblance between the actual and the forecasted values; a value of one indicates that the actual values are identical to the forecasted ones. The average value and corresponding standard deviation (sd, $/MWh) of the actual LMPs in each season are also shown in the second and third columns of Tables 1-4. Figure 6 shows the comparisons among the actual, BP-based and hybrid PCA-based LMPs for Fisk (1-7 July 2008). The following remarks can be made: (1) For the same neural network, the R2 obtained with 49 inputs is better (larger) than that with 48 inputs; likewise, the network with 97 inputs performs better than that with 96 inputs in terms of R2. (2) For the same neural network, the R2 obtained with 96 (97) inputs is much better (larger) than that with 48 (49) inputs; however, the CPU times required with 96 (97) inputs are longer. (3) For the same number of inputs, the R2 and MAE ($/MWh) obtained by the hybrid PCA neural network are better than those obtained by the BP-based neural network. (4) For the same number of inputs, the CPU time required by the hybrid PCA neural network is shorter than that required by the BP-based neural network.
Tables 5-8 show the performance comparison between the proposed hybrid PCA neural network and the traditional BP-based neural network for the Byberry area. The same conclusions can be drawn for the Byberry area as for the Fisk area.
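The two accuracy measures reported in Tables 1-8 can be computed as below. The text does not state whether its "R2" denotes the Pearson correlation itself or its square, so the sketch returns the correlation and leaves the squaring to the caller.

```python
import numpy as np

def correlation(actual, forecast):
    """Pearson correlation between actual and forecasted LMPs; a value
    of one means the forecasts track the actual values exactly.  The
    paper's R2 may be this value or its square -- the convention is
    not pinned down in the text."""
    return float(np.corrcoef(actual, forecast)[0, 1])

def mae(actual, forecast):
    """Mean absolute error in $/MWh."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(forecast))))
```

Both measures are computed on the held-out test data of each season, never on the training data.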

Investigation of Number of Output Neurons for PCA Network
The second layer of the proposed hybrid PCA neural network shown in Figure 3 represents the features of the electricity market. The number of neurons (P) at this layer hence plays a crucial role in the proposed method. Tables 9 and 10 show the impact of different values of P at the second layer on R2 and MAE for the Fisk and Byberry areas, respectively, in fall. In order to show the effectiveness of the proposed method, only the case of 97 inputs (i.e., Q in Figure 3) was studied. The following remarks can be made according to Tables 9 and 10: (1) The larger the P, the longer the CPU time required by the supervised MLF neural network at the third, fourth and fifth layers in Figure 3. (2) A larger P results in better performance in terms of R2 and MAE. Hence, there is a trade-off between performance and CPU time. In general, performance is the more important consideration.

Comparison between Hybrid PCA Network and ARIMA
The traditional nonstationary time-series prediction method using ARIMA [9] was employed to study the same PJM day-ahead market. Because the hybrid PCA neural network with 97 inputs attained the best performance, as described in Section 4.1, it was compared with the ARIMA. The general ARIMA formulation is given as follows [9]:

phi(B) P(h) = c + theta(B) eps(h)  (2)

where P(h) is the LMP at hour h, phi(B) and theta(B) are polynomial functions of the backshift operator B (B^k P(h) = P(h - k)), c is a constant, and eps(h) is the error term. This paper adopted the functions phi(B) and theta(B) given in [9] for the comparisons. In [9], the load factor was not considered as a regressor in the ARIMA. Twenty-four lagged LMPs were used as regressors in the ARIMA. Tables 11 and 12 show the correlation coefficient (R2) and mean absolute error (MAE, $/MWh) obtained for all four seasons by the two methods.
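A stripped-down stand-in for this benchmark, keeping only the autoregressive part of Equation (2) with the 24 lagged LMP regressors mentioned above, can be fitted by least squares as below. A faithful reproduction of [9] would also need the moving-average terms theta(B) eps(h) and the exact polynomials used there.

```python
import numpy as np

def fit_ar24(prices):
    """Least-squares fit of P(h) = c + sum_{k=1}^{24} a_k P(h - k).
    Only the autoregressive part of Equation (2) is kept; the
    moving-average terms of [9] are omitted in this sketch."""
    k = 24
    # Design matrix: intercept column plus the 24 lagged price columns.
    X = np.column_stack(
        [np.ones(len(prices) - k)]
        + [prices[k - j:len(prices) - j] for j in range(1, k + 1)])
    y = prices[k:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(coef, history):
    """One-step-ahead forecast from the last 24 observed LMPs."""
    lags = np.asarray(history[-24:])[::-1]   # most recent lag first
    return float(coef[0] + coef[1:] @ lags)
```

On a purely periodic daily price profile this model reproduces the next hour exactly, since P(h) = P(h - 24) is one admissible solution of the least-squares fit.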
The following comments can be made according to the results shown in Tables 11 and 12: (1) For either Fisk or Byberry, the performance of the proposed hybrid PCA neural network is always better than that of the ARIMA in terms of both R2 and MAE. (2) The R2 values obtained by the ARIMA for Fisk and Byberry in winter are very low (0.488 and 0.419), while those obtained by the proposed method are much higher (0.822 and 0.837).
(3) The LMPs in the Byberry area are more volatile than those in the Fisk area in terms of the average R2 (0.566 versus 0.725). However, the proposed method is reliable regardless of the studied area; its average R2 of 0.843 for Byberry is close to the 0.852 for Fisk.

Diebold and Mariano Test
Diebold and Mariano proposed and evaluated explicit tests of the null hypothesis of no difference in the accuracy of two competing forecasts [22]. In this method, the loss function need not be quadratic or even symmetric, and the forecast errors can be non-Gaussian, nonzero-mean, serially correlated and contemporaneously correlated. This subsection utilizes the Diebold and Mariano test to evaluate the performance of the proposed hybrid PCA network, the BP-based network and the ARIMA. The loss function used in this paper is based on the mean squared error (MSE) [23].
Let H0 be the null hypothesis of no difference in accuracy between the proposed hybrid PCA network and the BP-based network. The alternative hypothesis is the union of H1 and H2, which state that the proposed hybrid PCA network is significantly better than the BP-based network and that the BP-based network is significantly better than the proposed hybrid PCA network, respectively. Under the null hypothesis, the test statistic S1 defined in [22] and used to test H0, H1 and H2 has an asymptotic standard normal distribution. Let the confidence level be 95%. If S1 is greater than 1.96, then H1 is accepted and H0 is rejected. If S1 is smaller than -1.96, then H2 is accepted and H0 is rejected. When S1 lies within [-1.96, 1.96], H0 is accepted and there is no significant difference in forecasting accuracy between the two models. According to Tables 13 and 14, the proposed hybrid PCA network has better performance in 9 out of 16 tests, while 7 tests accept H0; H2 is never accepted.
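The test statistic can be sketched as below for one-step-ahead forecasts with the squared-error loss. This is a simplified version: [22] adds an autocovariance correction to the variance estimate for multi-step horizons, which is omitted here, and the sign convention depends on the order of the loss differential.

```python
import numpy as np

def dm_statistic(errors_a, errors_b):
    """Diebold-Mariano statistic with a squared-error loss for
    one-step-ahead forecasts.  Under the null hypothesis of equal
    forecast accuracy it is asymptotically N(0, 1).  With this sign
    convention, a large negative value favours model A and a large
    positive value favours model B."""
    d = np.asarray(errors_a) ** 2 - np.asarray(errors_b) ** 2  # loss differential
    n = len(d)
    # Sample mean of d divided by the estimated std. dev. of that mean.
    return float(d.mean() / np.sqrt(d.var(ddof=0) / n))
```

At the 95% confidence level used in the paper, |S1| > 1.96 rejects the null hypothesis of equal accuracy.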
Similarly, the Diebold and Mariano test was conducted to compare the performance of the proposed hybrid PCA network and the ARIMA. Based on the same comparisons given in Tables 11 and 12, Table 15 shows that the proposed hybrid PCA network is significantly better than the ARIMA.

Conclusions
In this paper, a new method using a hybrid principal component analysis (PCA) neural network is proposed for day-ahead LMP forecasting in a deregulated market. The purpose of the PCA neural network is to find a set of 24 or 48 orthonormal vectors in a Q-dimensional space (24 for Q = 48 or 49, and 48 for Q = 96 or 97 in this paper). The PCA extracts the essential features of the power market and hence reduces the training time required for the cascaded multi-layer feedforward neural network.
Simulation results show that the features of the same day of the last week and of the designated day provide crucial information serving as inputs of the PCA neural network. Simulation results also show that, in terms of R2 and MAE, the performance of the proposed method is always better than that of the back-propagation-based neural network and the ARIMA. The results of the Diebold and Mariano test show that the proposed method is better than the back-propagation-based neural network for most of the studied cases and is significantly better than the ARIMA.

Figure 3 .
Figure 3.The proposed hybrid PCA neural network.

Table 1 .
Performance comparison between the proposed hybrid PCA and BP-based neural network (Fisk, spring).

Table 2 .
Performance comparison between the proposed hybrid PCA and BP-based neural network (Fisk, summer).

Table 3 .
Performance comparison between the proposed hybrid PCA and BP-based neural network (Fisk, fall).

Table 4 .
Performance comparison between the proposed hybrid PCA and BP-based neural network (Fisk, winter).

Table 5 .
Performance comparison between the proposed hybrid PCA and BP-based neural network (Byberry, spring).

Table 6 .
Performance comparison between the proposed hybrid PCA and BP-based neural network (Byberry, summer).

Table 7 .
Performance comparison between the proposed hybrid PCA and BP-based neural network (Byberry, fall).

Table 8 .
Performance comparison between the proposed hybrid PCA and BP-based neural network (Byberry, winter).

Table 13 .
Diebold and Mariano test between the proposed hybrid PCA and BP-based neural network (Fisk).

Table 14 .
Diebold and Mariano test between the proposed hybrid PCA and BP-based neural network (Byberry).