Day-Ahead vs. Intraday: Forecasting the Price Spread to Maximize Economic Benefits

Abstract: Recently, a dynamic development of intermittent renewable energy sources (RES) has been observed. To allow trading contracts to be adjusted to unplanned events and changing weather conditions, the day-ahead markets have been complemented by intraday markets; in some countries, such as Poland, balancing markets are used for this purpose. This research focuses on a small RES generator, which has no market power and sells electricity through a larger trading company. The generator needs to decide, in advance, how much electricity is sold in the day-ahead market. The optimal decision of the generator on where to sell the production depends on the relation between prices in different markets. Unfortunately, when making the decision, the generator is not sure which market will offer a higher price. This article investigates the possible gains from utilizing forecasts of the price spread between the intraday/balancing and day-ahead markets in the decision process. It shows that the sign of the price spread can be successfully predicted with econometric models, such as ARX and probit. Moreover, our research demonstrates that the statistical measures of forecast accuracy, such as the percentage of correct sign classifications, do not necessarily coincide with economic benefits.


Introduction
Over the last decade, many countries have experienced a dynamic development of intermittent renewable energy sources (RES), among which wind and solar play a central role. From 2005 to 2014, RES generation in the EU (28 countries) increased by 87.8%, from 495.7 TWh to 930.9 TWh. In 2016, it reached 29.6% of the total gross electricity generation. The increasing input of RES affects both the electricity market and the distribution system. On the one hand, it results in the reduction of CO2 emissions and the decrease of wholesale electricity prices [1][2][3][4]. On the other hand, expanding RES creates new challenges for market participants. Electricity generated by wind or photovoltaic (PV) installations depends strongly on weather conditions and, hence, is volatile and difficult to forecast; see [5] for a comprehensive discussion. In some countries, such as Germany, RES generation is granted priority during the dispatch and receives a fixed feed-in tariff. As a result, the day-ahead prices are more prone to extreme behavior, such as spikes or negative values [6]. Finally, RES generation increases the risk of system imbalance, because an inelastic demand needs to be covered by a stochastic supply.
Nowadays, a major share of electricity is sold on power exchanges, such as Nord Pool or the European Energy Exchange (EEX) in Europe, the Indian Energy Exchange (IEX) in Asia, and Pennsylvania New Jersey Maryland Interconnection LLC (PJM) or the New York Independent System Operator (NYISO) in the US, where day-ahead contracts dominate. The day-ahead prices, which in Europe are also called 'spot prices', are set around noon on the day preceding the delivery. In order to allow for

Data
This article analyzes two distinct electricity markets: Germany and Poland. Although they are both located in Europe and are neighbors, they differ in terms of the generation structure and the market organization. Germany is well known for its success in RES penetration, which, at the time of writing, accounts for 40.2% of total electricity production (see https://www.energycharts.de). At the same time, Polish electricity generation is based on coal, with RES reaching only 8.4% (see https://www.pse.pl/dane-systemowe/funkcjonowanie-kse/raporty-miesieczne-zfunkcjonowania-rb/raporty-miesieczne). In both countries, RES has priority during the dispatch. The German and Polish markets are also designed differently, which could affect the decision process of market participants [8]. The intraday market in Germany is based on the pay-as-bid principle, which has been complemented by intraday auctions. In Poland, the balancing market, which is considered the main market for contract adjustments, nominally uses a double price approach. In practice, however, the market operator prefers a single market-clearing price and, hence, the double prices have never been applied. Finally, in Germany, negative prices are allowed in both the day-ahead and intraday markets, whereas, in Poland, until 2019 the balancing prices were restricted to be positive and to stay within an interval of 70-500 PLN/MWh (which is equivalent to 17.5-125 EUR/MWh). A recent regulation, which came into force on 1 January 2019, changed the lower and the upper bounds to −50,000 PLN/MWh and 50,000 PLN/MWh, respectively.
The data used in this research is hourly and spans the period from 1 January 2016 to 31 December 2017. The data for Poland consists of day-ahead and balancing market prices (both in Polish zloty, PLN). The prices are complemented by exogenous variables: the forecasted energy demand, the forecasted wind generation, and the forecasted available energy reserves. The intraday prices are not included in the analysis, as the market is not liquid. The German data consists of energy prices (in euros) in the day-ahead and intraday markets. The intraday prices are calculated as the weighted average of all intraday contracts for the given hour. The exogenous variables are the forecasted total load, which can be treated as a proxy for the forecasted demand, and the forecasted wind generation. Data sources and units are summarized in Table 1.
Time paths of electricity prices, together with the price spread, computed as the excess of the intraday/balancing price over the spot price, are presented in Figures 1 and 2. First, it can be observed that the day-ahead prices are less volatile than the intraday/balancing prices, especially for Poland. Moreover, both time series are characterized by positive spikes, with Germany also exhibiting sudden drops, in which prices fall below zero. Spikes are also observed in the spread series, indicating that they are not perfectly synchronized between markets. The basic descriptive statistics of electricity prices are presented in Table 2. They confirm that balancing and intraday prices are, on average, higher and more volatile than day-ahead prices, with the difference being more pronounced for Poland. Finally, the spread in Germany has a much lower variance than any of the prices, whereas, in Poland, the spread is more volatile than spot prices.
In order to evaluate how well the price difference can be forecasted, the sample is divided into estimation and validation periods. The validation window contains the last 365 days, from 1 January 2017 to 31 December 2017 (see Figures 1 and 2). A rolling estimation window approach was adopted, with four window lengths: a month (30 days), a quarter (91 days), half a year (182 days), and a year (365 days).
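The rolling-window scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name `rolling_windows` is hypothetical, and the alignment of the calibration window (ending on the day before the forecasted day) is an assumption, since the text does not state it explicitly.

```python
from datetime import date, timedelta

def rolling_windows(first_forecast_day, last_forecast_day, window_days):
    """Yield (calibration_start, calibration_end, forecast_day) triples.

    Assumption: the calibration window covers the `window_days` days that
    end on the day before the forecasted day, and it moves one day at a time.
    """
    day = first_forecast_day
    while day <= last_forecast_day:
        calibration_end = day - timedelta(days=1)
        calibration_start = day - timedelta(days=window_days)
        yield calibration_start, calibration_end, day
        day += timedelta(days=1)

# Validation period and window lengths quoted in the text.
WINDOW_LENGTHS = (30, 91, 182, 365)
windows = list(rolling_windows(date(2017, 1, 1), date(2017, 12, 31), 365))
first_start, first_end, first_day = windows[0]
```

With the yearly window, the first forecast (1 January 2017) would be calibrated on 2 January 2016 to 31 December 2016, which fits inside the available sample.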

Forecasting Methods and Forecast Evaluation
In order to model market preference, let us define a decision variable $Y_{ht}$, which equals one when the generator decides to sell the electricity for hour $h$ and day $t$ in the intraday market, and zero otherwise. Let us consider two benchmark strategies. The first one assumes that the utility sells all generated electricity in the day-ahead market. In such a case, $Y_{ht} = 0$ for all $h$ and $t$. The second one assumes that the generator enters only the intraday/balancing market and, hence, it is always the case that $Y_{ht} = 1$. We refer to them as the naïve day-ahead and naïve intraday/balancing strategies, respectively. Here, we compare them with a data-driven approach, which assumes that the decision depends on the relationship between the day-ahead price ($P^0_{ht}$) and the intraday/balancing price ($P^1_{ht}$):

$$Y_{ht} = \begin{cases} 1 & \text{if } P^1_{ht} > P^0_{ht}, \\ 0 & \text{otherwise.} \end{cases} \qquad (1)$$

As the price difference $\Delta P_{ht} = P^1_{ht} - P^0_{ht}$ is not known in advance, the generator needs to base its decision on the predicted spread:

$$\hat{Y}_{ht} = \begin{cases} 1 & \text{if } \Delta \hat{P}_{ht|t-1} > 0, \\ 0 & \text{otherwise,} \end{cases} \qquad (2)$$

where $\Delta \hat{P}_{ht|t-1}$ is a forecast of $\Delta P_{ht}$, computed using the information available on day $t-1$.
In this research, two alternative ways of forecasting $Y_{ht}$ are considered. First, autoregressive models with exogenous variables (ARX) are examined, which either (1) separately model the level of prices $P^0_{ht}$ and $P^1_{ht}$ and then compute $\Delta P_{ht}$, or (2) describe directly the price spread, $\Delta P_{ht}$. In the regression, information on predicted market fundamentals and past levels of prices is used. The forecast, $\hat{Y}_{ht}$, is based on the sign of the predicted spread $\Delta \hat{P}_{ht|t-1}$. Second, probit models are used, which describe directly the distribution of the binomial variable $Y_{ht}$ defined by (1). Similar to ARX models, the probability $\mathrm{Prob}(Y_{ht} = 1)$ is conditioned on exogenous variables and lagged prices. A more detailed description of the models is presented in Sections 3.1 and 3.2.
When the aggregated, daily models are considered, the endogenous and exogenous variables are constructed as the daily averages of the corresponding hourly variables. The decision variable $Y_t$ becomes

$$Y_t = \begin{cases} 1 & \text{if } \Delta P_t > 0, \\ 0 & \text{otherwise,} \end{cases} \qquad (3)$$

where $\Delta P_t = P^1_t - P^0_t$. This implies that the generator adopts the same strategy throughout the day, and hence $Y_{1t} = Y_{2t} = \dots = Y_{24,t} = Y_t$.
In the remaining part of the article, the following notation is used:
• $D_t$ denotes a (4 × 1) vector of deterministic variables: a constant and dummy variables for Mondays, Saturdays, and Sundays/Holidays.
• $X_{ht}$ represents a vector of exogenous variables, which is a subset of $\{X_{1,ht}, X_{2,ht}, X_{3,ht}\}$ for Poland and $\{X_{1,ht}, X_{2,ht}\}$ for Germany. Variables $X_i$ are defined in Table 1.
• $X_t$, $P^0_t$, and $P^1_t$ are daily averages of the corresponding hourly variables: $X_{ht}$, $P^0_{ht}$, and $P^1_{ht}$, respectively.
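To make the difference between the hourly and the daily decision variables concrete, the following Python sketch computes both from a handful of prices. The function names and the price values are hypothetical and serve only as an illustration of the definitions above.

```python
def hourly_decisions(day_ahead, intraday):
    """Return Y_ht per hour: 1 if the intraday price exceeds the day-ahead price."""
    return [1 if p1 > p0 else 0 for p0, p1 in zip(day_ahead, intraday)]

def daily_decision(day_ahead, intraday):
    """Return Y_t: 1 if the daily average spread is positive, else 0."""
    spreads = [p1 - p0 for p0, p1 in zip(day_ahead, intraday)]
    return 1 if sum(spreads) / len(spreads) > 0 else 0

# Hypothetical prices for four hours of one day (illustrative values only).
p0 = [100.0, 110.0, 120.0, 130.0]   # day-ahead prices
p1 = [105.0, 108.0, 119.0, 150.0]   # intraday/balancing prices
y_hourly = hourly_decisions(p0, p1)  # one decision per hour
y_daily = daily_decision(p0, p1)     # one decision for the whole day
```

In this toy example the hourly rule picks the intraday market in only two of the four hours, while the daily rule selects it for the whole day because one large positive spread dominates the average.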

Autoregressive Models
ARX is a linear model, which links the current level of an endogenous variable with its past values and a vector of exogenous variables. It has been widely applied in the literature and has proved useful in forecasting electricity prices; see [5] for a discussion. Here, two model specifications are considered. First, the prices $P^0_{ht}$ and $P^1_{ht}$ are modeled separately:

$$P^0_{ht} = \alpha D_t + \sum_{i \in L} \theta_i P^0_{h,t-i} + \beta X_{ht} + \varepsilon_{ht}, \qquad (4)$$

$$P^1_{ht} = \alpha D_t + \sum_{i \in L} \theta_i P^1_{h,t-i} + \gamma P^0_{h,t-1} + \beta X_{ht} + \varepsilon_{ht}. \qquad (5)$$

Then, the spread $\Delta P_{ht}$ is computed as their difference: $\Delta P_{ht} = P^1_{ht} - P^0_{ht}$. Alternatively, the price spread is modeled directly, according to the following formula:

$$\Delta P_{ht} = \alpha D_t + \sum_{i \in L} \theta_i \Delta P_{h,t-i} + \gamma P^0_{h,t-1} + \beta X_{ht} + \varepsilon_{ht}. \qquad (6)$$

In both model specifications, $\alpha$ and $\beta$ are vectors of coefficients corresponding to the deterministic and exogenous variables, respectively, while $\theta_i$ are the autoregressive parameters and $\varepsilon_{ht}$ are the residuals. Lags $i$ belong to the pre-defined set $L$. Unfortunately, on day $t-1$, not all prices $P^1_{h,t-1}$ are known yet, nor are the spreads $\Delta P_{h,t-1}$. For this reason, the lag $i = 1$ is excluded from set $L$, except when modeling $P^0_{ht}$. In order to compensate for this, the previous day's price $P^0_{h,t-1}$ is added to Equations (5) and (6).
Since the main interest of the generator is the optimal choice of the market, the results of regressions (4)-(6) are used to predict $\hat{Y}_{ht}$. According to Equation (2), the utility sells in the intraday/balancing market, $\hat{Y}_{ht} = 1$, when the predicted spread $\Delta \hat{P}_{ht|t-1}$ is positive. Otherwise, it chooses the day-ahead market, $\hat{Y}_{ht} = 0$.
For aggregated, daily models, the hourly observations in Equations (4)-(6) are replaced by their daily averages: $X_t$, $P^0_t$, $P^1_t$, and $\Delta P_t$. The structure of the models remains unchanged. As a result, the decision variable $Y_t$, defined by (3), is equal to one when $\Delta P_t > 0$, and zero otherwise.
Parameters $\alpha$, $\beta$, $\theta_i$, and $\gamma$ are estimated with the least-squares (LS) method. For each hour, $h$, separately, the model is fitted such that the sum of squared differences between the observed and predicted values is minimized.
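As an illustration of the least-squares step, the sketch below fits a simplified spread regression for a single hour and turns the one-step-ahead prediction into a market decision. It is an assumption-laden toy, not the specification used in the article: it keeps only a constant, the lags $L = [2, 7]$, and one exogenous variable (the weekday dummies and the lagged day-ahead price are omitted), the data are synthetic, and the helper names are hypothetical. The normal equations are solved with a small hand-written routine to keep the example free of external dependencies.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_arx_spread(spread, exog, lags=(2, 7)):
    """Least-squares fit of spread_t on a constant, lagged spreads, and exog_t."""
    start = max(lags)
    rows = [[1.0] + [spread[t - i] for i in lags] + [exog[t]]
            for t in range(start, len(spread))]
    y = spread[start:]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def forecast_decision(coefs, spread, exog_next, lags=(2, 7)):
    """Predict the next spread and return (prediction, Y_hat)."""
    t = len(spread)  # index of the day being forecasted
    x = [1.0] + [spread[t - i] for i in lags] + [exog_next]
    pred = sum(c * v for c, v in zip(coefs, x))
    return pred, 1 if pred > 0 else 0

# Synthetic, noise-free data generated from known coefficients (illustration only).
rng = random.Random(0)
true_coefs = [1.0, 0.5, -0.2, 0.3]  # constant, lag 2, lag 7, exogenous
exog = [rng.gauss(0.0, 1.0) for _ in range(201)]
spread = [rng.gauss(0.0, 1.0) for _ in range(7)]
for t in range(7, 200):
    spread.append(1.0 + 0.5 * spread[t - 2] - 0.2 * spread[t - 7] + 0.3 * exog[t])

coefs = fit_arx_spread(spread, exog[:200])
pred, y_hat = forecast_decision(coefs, spread, exog[200])
```

Because the synthetic data contain no noise, the fitted coefficients should reproduce `true_coefs` up to rounding error; with real prices the fit would of course be approximate.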

The Probit Model
The probit model is used to describe directly the probability distribution of the binary variable $Y_{ht}$. The model is formulated as follows:

$$\mathrm{Prob}(Y_{ht} = 1) = \Phi\Big(\alpha D_t + \sum_{i \in L} \theta_i \Delta P_{h,t-i} + \gamma P^0_{h,t-1} + \beta X_{ht}\Big), \qquad (7)$$

where $\Phi(x)$ is the standard normal cumulative distribution function, $\alpha$ is a (1 × 4) vector of parameters describing the impact of deterministic variables, and $\beta$ summarizes the effect of the exogenous variables. Finally, $\theta_i$ are the autoregressive parameters, with lags $i$ belonging to the pre-defined set $L$, as in Equations (4)-(6). In order to utilize all the information available on day $t-1$, the effect of $P^0_{h,t-1}$ on the probability is added by use of the parameter $\gamma$.
Parameters $\alpha$, $\beta$, $\theta_i$, and $\gamma$ are estimated separately for each hour, $h$, using the maximum likelihood method. Due to the lack of a closed-form solution for the maximization problem, the parameters are estimated numerically, using the Nelder-Mead algorithm [25]. Calculations are conducted in the R environment. The initial parameters for the procedure are estimated by a least-squares method and the number of iterations is limited to 10,000.
With parameter estimates $\hat{\alpha}$, $\hat{\beta}$, $\hat{\theta}_i$, and $\hat{\gamma}$ obtained within the rolling calibration window, the forecast of $Y_{ht}$ is defined as

$$\hat{Y}_{ht} = \begin{cases} 1 & \text{if } \Phi\Big(\hat{\alpha} D_t + \sum_{i \in L} \hat{\theta}_i \Delta P_{h,t-i} + \hat{\gamma} P^0_{h,t-1} + \hat{\beta} X_{ht}\Big) > \mu, \\ 0 & \text{otherwise,} \end{cases} \qquad (8)$$

where $\mu \in (0, 1)$ is the threshold parameter. Typically, the threshold is chosen to equal $\mu = 0.5$. However, the results indicated that $\mu = 0.4$ or $\mu = 0.3$ provide more accurate and profitable forecasts (see Section 4 for details). For aggregated, daily models, Formula (7) becomes

$$\mathrm{Prob}(Y_t = 1) = \Phi\Big(\alpha D_t + \sum_{i \in L} \theta_i \Delta P_{t-i} + \gamma P^0_{t-1} + \beta X_t\Big), \qquad (9)$$

with the parameters defined as above. The forecasts $\hat{Y}_t$ are obtained, as in (8), by comparing the forecasted probability with the threshold $\mu$.
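The role of the threshold $\mu$ can be illustrated with a short Python sketch. It only covers the forecasting step and the likelihood that a numerical optimizer would work with; the optimizer itself (Nelder-Mead in the article) is not reproduced. The coefficient and regressor values are hypothetical, and the function names are chosen for this example only.

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit_probability(coefs, regressors):
    """Prob(Y = 1) for a linear index built from coefficients and regressors."""
    index = sum(c * v for c, v in zip(coefs, regressors))
    return std_normal_cdf(index)

def probit_decision(coefs, regressors, mu=0.5):
    """Return 1 (intraday/balancing market) if the probability exceeds mu."""
    return 1 if probit_probability(coefs, regressors) > mu else 0

def neg_log_likelihood(coefs, rows, outcomes):
    """Negative probit log-likelihood, i.e., the objective to be minimized."""
    total = 0.0
    for x, y in zip(rows, outcomes):
        p = min(max(probit_probability(coefs, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Hypothetical estimates: a constant and one lagged-spread coefficient.
coefs = [-0.1, 0.02]
regressors = [1.0, -2.0]            # constant term, lagged spread
prob = probit_probability(coefs, regressors)
decision_default = probit_decision(coefs, regressors, mu=0.5)
decision_lowered = probit_decision(coefs, regressors, mu=0.4)
```

Here the forecasted probability lies between 0.4 and 0.5, so the default threshold keeps the generator in the day-ahead market, whereas the lowered threshold switches the decision to the intraday/balancing market.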

Forecast Evaluation
The literature proposes various methods for evaluating the accuracy of binomial variable forecasts. First, one could compute the classification power, denoted here by $p$:

$$p = \frac{1}{HT} \sum_{t=1}^{T} \sum_{h=1}^{H} \mathbb{1}_{\{\hat{Y}_{ht} = Y_{ht}\}}, \qquad (10)$$

where $H = 24$, $T = 365$, and $\mathbb{1}_{\{s\}}$ stands for an indicator variable, which takes value one when $s$ is true, and zero otherwise. This measure shows how often the forecast coincides with the true value. Second, two measures of the predictive power could be computed:

$$q_0 = \frac{\sum_{t=1}^{T} \sum_{h=1}^{H} \mathbb{1}_{\{\hat{Y}_{ht} = 0,\; Y_{ht} = 0\}}}{\sum_{t=1}^{T} \sum_{h=1}^{H} \mathbb{1}_{\{\hat{Y}_{ht} = 0\}}}, \qquad (11)$$

and

$$q_1 = \frac{\sum_{t=1}^{T} \sum_{h=1}^{H} \mathbb{1}_{\{\hat{Y}_{ht} = 1,\; Y_{ht} = 1\}}}{\sum_{t=1}^{T} \sum_{h=1}^{H} \mathbb{1}_{\{\hat{Y}_{ht} = 1\}}}. \qquad (12)$$

The first measure describes the probability that $Y_{ht} = 0$ when $\hat{Y}_{ht} = 0$. Similarly, $q_1$ indicates the probability that $Y_{ht} = 1$ when $\hat{Y}_{ht} = 1$. It should be noticed that, for the day-ahead strategy, the classification power $p = q_0$ and it is equal to the unconditional probability $\mathrm{Prob}(Y_{ht} = 0)$. Analogously, for the second naïve strategy, which always selects the intraday/balancing market, we have $p = q_1 = \mathrm{Prob}(Y_{ht} = 1)$.
The classical, statistical approach to prediction evaluation provides an interesting description of the forecast performance, but may not reflect the main concerns of the generator, which are the profit and the risk. Therefore, the analysis is complemented by two measures related to the financial outcomes of the adopted strategy. The potential gains and losses are computed relative to the benchmark, day-ahead strategy. This implies that the hourly profit, $\pi_{ht}$, becomes

$$\pi_{ht} = \hat{Y}_{ht} \left(P^1_{ht} - P^0_{ht}\right) = \hat{Y}_{ht} \, \Delta P_{ht}. \qquad (13)$$

Using Equation (13), one could compute the daily profit $\pi_t = \sum_{h=1}^{24} \pi_{ht}$ and the total yearly profit $\pi = \sum_{t=1}^{T} \pi_t$. In order to describe the financial aspects of the decision, we use the total yearly profit, $\pi$, and the 5% Value at Risk (VaR) associated with daily profits $\pi_t$.
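The three accuracy measures can be computed directly from the realized and forecasted decisions, as in the Python sketch below. The function name and the sample vectors are hypothetical; a measure is returned as `None` when the corresponding forecast value never occurs, which is a convention chosen for this example.

```python
def classification_measures(actual, forecast):
    """Return (p, q0, q1) for binary actual values and binary forecasts."""
    n = len(actual)
    p = sum(1 for y, f in zip(actual, forecast) if y == f) / n
    n0 = sum(1 for f in forecast if f == 0)          # forecasts of day-ahead
    n1 = n - n0                                      # forecasts of intraday
    hit0 = sum(1 for y, f in zip(actual, forecast) if y == 0 and f == 0)
    hit1 = sum(1 for y, f in zip(actual, forecast) if y == 1 and f == 1)
    q0 = hit0 / n0 if n0 else None
    q1 = hit1 / n1 if n1 else None
    return p, q0, q1

# Hypothetical realized decisions and forecasts for eight hours.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
forecast = [1, 0, 0, 1, 1, 0, 1, 1]
p, q0, q1 = classification_measures(actual, forecast)

# Naive intraday/balancing strategy: always forecast 1.
p_naive, q0_naive, q1_naive = classification_measures(actual, [1] * len(actual))
```

For the naive intraday/balancing strategy the sketch reproduces the property stated in the text: the classification power equals $q_1$, and both equal the share of hours in which the intraday price was higher.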
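A minimal Python sketch of the two financial measures is given below. It assumes that the profit is measured relative to the day-ahead strategy (forecast times spread, for 1 MWh per hour) and that the 5% VaR is taken as the empirical lower 5% quantile of daily profits; quantile conventions differ, so this particular choice is an assumption. All names and numbers are illustrative.

```python
import math

def daily_profits(forecast, spread, hours_per_day=24):
    """Daily sums of hourly profits relative to the day-ahead strategy."""
    hourly = [f * s for f, s in zip(forecast, spread)]
    return [sum(hourly[i:i + hours_per_day])
            for i in range(0, len(hourly), hours_per_day)]

def value_at_risk(profits, level=0.05):
    """Empirical lower-tail quantile of the profits at the given level."""
    ordered = sorted(profits)
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]

# Toy example with two "days" of two hours each (hypothetical values).
spread = [5.0, -2.0, -1.0, 20.0]
model_profits = daily_profits([1, 0, 1, 1], spread, hours_per_day=2)
naive_profits = daily_profits([1, 1, 1, 1], spread, hours_per_day=2)
total_model = sum(model_profits)
total_naive = sum(naive_profits)

# VaR on ten hypothetical daily profits.
var_5 = value_at_risk([12.0, -30.0, 5.0, 8.0, -4.0, 20.0, 3.0, -1.0, 15.0, 7.0])
```

A negative value returned by `value_at_risk` corresponds to a loss relative to selling everything in the day-ahead market.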

Results
The results for the Polish and German markets are analyzed from the perspective of forecast accuracy ($p$, $q_0$, and $q_1$) and financial profitability ($\pi$). The risk associated with each of the forecasting methods is measured with the 5% VaR of daily profits. Various model specifications are examined, which differ in terms of the aggregation (daily versus hourly data), the choice of the exogenous variables $X$, the lag structure $L$, and the length of the calibration window. Finally, the data-driven approach is compared with the second naïve strategy, which assumes selling only in the intraday/balancing market.

Poland
The results for the Polish electricity market are summarized in Tables 3 and 4. Table 3 shows the classical measures of prediction accuracy. The results for the top three model specifications for each forecasting approach are presented and compared with the naïve strategies. It should be emphasized that the balancing market provided higher prices than the day-ahead market in 51.2% of cases. First, the aggregated, daily models were considered. The most accurate were the ARX models, which separately described the day-ahead and balancing prices. They correctly predicted the sign of the spread in 57.3% of the cases. The best probit model had a classification accuracy of 55.3%. Note that the best ARX model specification was more successful in forecasting high spot prices than high balancing prices ($q_0 = 0.606 > 0.563 = q_1$). When the disaggregated models were considered, there were no substantial differences between model types and model specifications. The classification powers were clearly lower than those of the aggregated models, and the highest one reached 53.2%. This still exceeded the naïve balancing market strategy, but the gains for data-driven strategies were much less pronounced.
Note to Table 3: ARX is the autoregressive model with exogenous variables; $X$ stands for the subset of exogenous variables; $L$ defines the lag structure; $T$ is the length of the calibration window; measures $p$, $q_0$, and $q_1$ are defined by (10)-(12). The forecast threshold is $\mu = 0.3$.
The financial gains from choosing the data-driven trading strategy are presented in Table 4. Two measures, total yearly profits, $\pi$, and 5% VaR of daily profits, are presented in the last two columns. First, it can be observed that all the best-performing models outperformed the benchmarks. The highest yearly profit from selling 1 MWh was 84,191 PLN, which was equivalent to around 19,809 EUR (in the year 2017, the PLN exchange rate oscillated around 1 EUR = 4.25 PLN). At the same time, selling the whole production in the balancing market gave 82,576 PLN (19,430 EUR). Since the naïve balancing market strategy brought substantial profits, it was difficult to beat it. This is reflected by the fact that only seven out of 15 presented models gave profits exceeding 82,576 PLN. Profits from selling in the balancing market were burdened with some risk. The VaR showed that on 5% of days, the generator potentially lost, depending on the model specification, between 695 PLN and 829 PLN (163-195 EUR).
When the performance of different models was analyzed, it was observed that, similar to classification power, the most profitable forecasts were provided by ARX models, which separately described prices in the two competing markets. However, in the majority of cases, the model specifications selected with the $p$ measure did not coincide with those bringing the highest profits, as measured by $\pi$. This was particularly true for probit models, for which only one of the most accurate models was also among the top profit-yielding models.
Finally, the choice of the exogenous variables, the lag structure, and the length of the calibration window were compared, based on the results presented in Tables 3 and 4. The results indicated that the forecasts of the ARX models were more accurate when the parameters were estimated using a full year of observations. In contrast, the best forecasting probit model utilized only 91 days of data. Similar results were obtained when analyzing the profit level. The lag structure used in the best-performing models showed that there was no need to include all lags, $L = [2, ..., 7]$, and it was more efficient to choose $L = [2, 7]$ or $L = [2]$. Finally, when the choice of the exogenous variables was considered, the outcomes indicated that the most important variable in predicting the price spread sign was the forecasted wind generation, $X_2$. It was included in the majority of the best-performing models and model specifications.
As mentioned in Section 3.2, the forecasts based on probit models depend on the assumed level of the threshold $\mu$. The total profits, $\pi$, for different levels of thresholds are presented in Figure 3. The models are first ranked according to their forecast efficiency and then the profits, conditional on the threshold level, are presented. It seems that probit models underestimated the probability $\mathrm{Prob}(Y_{ht} = 1)$, and decreasing the threshold from 0.5 to 0.2-0.4 led to an increase of the overall profits. Therefore, Tables 3 and 4 show the results for $\mu = 0.3$.

Table 4. The comparison of profits and risks for Polish data. Only the top three best-performing model specifications in terms of total yearly profit $\pi$, relative to the day-ahead strategy, are reported in each group. Columns: No., $X$, $L$, $T$, $p$, $\pi$, VaR 5%.
Note to Table 4: $X$ stands for the subset of exogenous variables; $L$ defines the lag structure; and $T$ is the length of the calibration window. The forecast threshold is $\mu = 0.3$.
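The threshold comparison summarized in Figure 3 amounts to recomputing the total profit for several values of $\mu$. A minimal Python sketch follows; the forecasted probabilities and spreads are hypothetical numbers chosen so that the probabilities are systematically too low, and the function name is an assumption of this example.

```python
def profit_by_threshold(probabilities, spreads, thresholds):
    """Total profit relative to the day-ahead strategy for each threshold."""
    result = {}
    for mu in thresholds:
        # Sell in the intraday/balancing market only when the probability exceeds mu.
        result[mu] = sum(s for p, s in zip(probabilities, spreads) if p > mu)
    return result

# Hypothetical forecasted probabilities and realized spreads.
probs = [0.45, 0.35, 0.55, 0.25, 0.42]
spreads = [30.0, -5.0, 10.0, -20.0, 15.0]
profits = profit_by_threshold(probs, spreads, [0.2, 0.3, 0.4, 0.5])
best_mu = max(profits, key=profits.get)
```

In this toy setting the default threshold of 0.5 captures only one of the three positive spreads, so a lower threshold yields a higher total profit, while a threshold that is too low starts to pick up negative spreads as well.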

Germany
The German electricity market is one of the most mature in Europe. The intraday market is liquid and allows for adjustment of the trade contracts to varying conditions; for example, intermittent RES generation. Although the electricity prices in the intraday market are, on average, higher than in the day-ahead market (see Table 2), the price spread $\Delta P_{ht}$ is positive in only 48.9% of cases. Also, the average price levels seem similar in the two analyzed markets, which makes spread forecasting a demanding exercise.
The accuracy of the proposed prediction methods for the German market is summarized in Table 5. Contrary to the results for Poland, the highest classification power was achieved by probit models estimated for aggregated data. The probability of a correct decision exceeded 50% only slightly and reached 54.2% for the top probit models. The performance of the ARX models was slightly worse, and the best specification gave correct classification in 53.4% of cases. It was observed that, for the top models, the predictive power was $q_0 > q_1$, indicating that the models more often erred when predicting that intraday prices would be higher than the day-ahead ones.
The results, in terms of total profits and risk, are presented in Table 6. First, it should be noticed that the naïve intraday strategy led to a low profit, only 676 EUR a year, and a VaR of −156 EUR. This result was significantly improved by applying a data-driven approach. All of the presented models increased the profit and, at the same time, decreased the risk measured by VaR. Among them, probit models using the aggregated, daily data dominated by a wide margin. The third-best probit model, with a very short, 30 day calibration window and no exogenous variables, gave a yearly profit of 2880 EUR, which was higher than the best outcome of any other model type. The best probit model used a 182 day calibration window and total load as the exogenous variable, leading to a yearly profit of 3100 EUR.
Similar to the Polish market, the two ways of evaluating forecasts (statistical and financial) did not coincide. This shows that, from the perspective of the generator, the most profitable model was not necessarily the one that classified the most observations correctly. The reason for this discrepancy is the fact that profits were mainly driven by spikes and price differences, which were large in magnitude. At the same time, most of the observed spreads were close to zero, and hence were less influential on the financial outcome.
When the model specification is considered, the outcomes confirmed some of the results obtained for the Polish market. First, the top models, in terms of profits, had a reduced lag structure, with $L = [2]$ or $L = [2, 7]$. Second, the ARX models performed well for a long, yearly calibration window, whereas probit models provided the best forecasts for medium-length window sizes. Finally, it seems that both exogenous variables (total load and wind generation) affected the spread forecasts significantly. The best ARX model used information on both $X_1$ and $X_2$. Among the probit models, the most profitable one included only the total load.
The effect of the threshold on the performance of probit models is illustrated in Figure 4. It can be observed that the highest profits were obtained for $\mu = 0.4$, for which the results are presented in Tables 5 and 6.
Note to Table 5: $X$ stands for the subset of exogenous variables; $L$ defines the lag structure; $T$ is the length of the calibration window; and measures $p$, $q_0$, and $q_1$ are defined by (10)-(12). The forecast threshold is $\mu = 0.4$.
Note to Table 6: $X$ stands for the subset of exogenous variables; $L$ defines the lag structure; and $T$ is the length of the calibration window. The forecast threshold is $\mu = 0.4$.
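The discrepancy between statistical and financial evaluation can be reproduced with a very small example. In the Python sketch below, all numbers are hypothetical: most spreads are close to zero and one is a large spike. Forecast A classifies more hours correctly but misses the spike, while forecast B is less accurate but captures it.

```python
def accuracy(actual, forecast):
    """Share of hours in which the forecast matches the realized decision."""
    return sum(1 for y, f in zip(actual, forecast) if y == f) / len(actual)

def total_profit(forecast, spreads):
    """Profit relative to the day-ahead strategy: spread earned when forecast is 1."""
    return sum(f * s for f, s in zip(forecast, spreads))

# Hypothetical spreads: small values plus one spike of 60.
spreads = [1.0, -1.0, 2.0, -1.5, 60.0, -0.5]
actual = [1 if s > 0 else 0 for s in spreads]

forecast_a = [1, 0, 1, 0, 0, 0]   # accurate on small spreads, misses the spike
forecast_b = [0, 1, 0, 0, 1, 0]   # often wrong, but catches the spike

acc_a, acc_b = accuracy(actual, forecast_a), accuracy(actual, forecast_b)
profit_a, profit_b = total_profit(forecast_a, spreads), total_profit(forecast_b, spreads)
```

Forecast A is right in five of six hours and earns 3, whereas forecast B is right in only three hours and earns 59, which mirrors the observation that profits are driven by a few large spreads.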

Conclusions
In this article, we focused on predicting the sign of the spread between the day-ahead and the intraday/balancing market prices. Two types of econometric models were examined: ARX and probit models. In both modeling approaches, the dependent variable is linked to past prices and to a set of exogenous variables. Various model specifications were considered, depending on the data aggregation level, lag structure, length of the calibration window, and the set of exogenous variables.
First, the impact of the aggregation level on the forecast performance was analyzed. We showed that the inclusion of more information does not necessarily result in higher profits or more accurate classifications. The aggregated, daily models provided forecasts which outperformed their disaggregated, hourly counterparts. This outcome shows that the noise included in the hourly data could lead to incorrect trading strategies and, hence, decrease potential profits.
Second, different lag structures, L, were compared. We showed that it is reasonable to account for a weekly seasonality by including lag i = 7. On the other hand, the most efficient specifications restricted the number of lags included in the model, and chose L = [2,7].
The performance of various model specifications was evaluated for different lengths of the calibration window. This issue has been recently discussed in the literature [26,27]. In particular, Marcjasz et al. [27] showed the impact of the sample size on forecasting accuracy. They indicated that it is sometimes more efficient to use a shorter estimation window, because it can adjust better to the nonlinear behavior of the variables. The results presented in this article confirm these findings and show that, for some models, a short (monthly or quarterly) window size is optimal, in terms of accuracy and profits.
When the choice of exogenous variables was considered, the results indicated that the relationship between the day-ahead and intraday/balancing prices depends on the behavior of fundamental variables. This can reflect two phenomena: inefficient forecasts of the fundamentals and strategic behavior of market participants. In the first case, generators utilize additional information, which is not included in the forecasts, but could be correlated with them. This leads to the dependence of the price difference on the predicted total load/demand or wind generation. In the second case, market players (conventional generators and trading companies) can strategically choose their imbalances and, hence, decide on a position in the intraday/balancing market in order to optimize their profits and reduce their risk (see [8,24]).
Finally, the presented results encourage a discussion on the most accurate forecast evaluation method. We show that traditional measures do not coincide with financial measures and, hence, may fail to address questions important for practitioners. Therefore, choosing a model which is best, in terms of classification or prediction power, could be misleading and result in lower profits. Some issues, such as the choice of the optimal economic measure or an adjustment of the estimation method to account for financial gains, are left for further analysis.