Article

Load Nowcasting: Predicting Actuals with Limited Data

House of Energy Markets and Finance, University of Duisburg-Essen, 45141 Essen, Germany
Energies 2020, 13(6), 1443; https://doi.org/10.3390/en13061443
Received: 20 February 2020 / Revised: 10 March 2020 / Accepted: 10 March 2020 / Published: 20 March 2020
(This article belongs to the Special Issue Short-Term Load Forecasting 2019)

Abstract

We introduce the problem of load nowcasting to the energy forecasting literature. The recent load of the objective area is predicted based on limited available metering data within this area. Thus, slightly different from load forecasting, we are predicting the recent past using limited available metering data from the supply side of the system. Next to an industry benchmark model, we introduce multiple high-dimensional models for providing more accurate predictions. They evaluate metered interconnector and generation unit data of different types like wind and solar power, storages, and nuclear and fossil power plants. Additionally, we augment the model by seasonal and autoregressive components to improve the nowcasting performance. We consider multiple estimation techniques based on the lasso and ridge and study the impact of the choice of the training/calibration period. The methodology is applied to a European TSO dataset from 2014 to 2019. The overall results show that in comparison to the industry benchmark, an accuracy improvement in terms of MAE and RMSE of about 60% is achieved. The best model is based on the ridge estimator and uses a specific non-standard shrinkage target. Due to the linear model structure, we can easily interpret the model output.
Keywords: load forecasting; electricity consumption; lasso; Tikhonov regularization; load metering; preliminary load

1. Introduction and Motivation

In electricity system management, there is a wide range of load forecasting literature [1]. On a high hierarchy level, usually, the transmission system operator (TSO) and sometimes the distribution system operator (DSO) are responsible for the metering and publishing of the load in the corresponding electricity system. When it comes to the details, there exists a wide range of definitions for electrical load; see, e.g., [2,3]. In many countries, there exist accounting rules for the system operator, which define the metering process for billing and management purposes. Thus, from the economic point of view, these load values are very important for the generation and consumption side as well as for the system operator. However, in many countries, these values are finally published with a large delay with respect to delivery. For instance, PJM publishes the final metered load values with a delay of up to 90 days. Similarly, in Germany, the TSO publishes those final metered values in accordance with the accounting rules with a similar delay of up to three months.
In practice, the system operators also publish real-time electrical load data just after delivery with a very small time lag, usually less than an hour. Those load values are often referred to as preliminary/actual/instantaneous/estimated load, depending on the considered market. Of course, these preliminary load values should be as close as possible to the final metered load values that are computed with respect to the accounting rules for the electricity system. Still, there are usually deviations, which can be substantial in magnitude. For the computation of the preliminary load, the system operator usually only has limited metering data available to deduce the load values for the overall electricity system.
In this paper, we address the problem of providing more accurate preliminary load values just after delivery when there is only limited metering information available in the system. Those preliminary load values should be as close as possible to the metered load, which is derived with respect to the accounting and metering rules. The available academic literature on this topic is very limited; see [4].
We contribute to this topic and propose an efficient and robust method for nowcasting load using machine learning and data science techniques. In the data science and forecasting literature, especially in applications to economics and meteorology, the phrase nowcasting is used for extremely short-term forecasting or for predicting the very recent past [5,6,7,8,9]. As mentioned, in our electricity load situation, we are exactly in the case of predicting the very recent past load values under limited data availability. Hence, we propose the phrase load nowcasting for these situations.
In this manuscript, we first introduce the nowcasting problem in detail. Then, we propose several nowcasting models that are oriented to the load forecasting literature. Afterwards, we proceed with a nowcasting study to validate the models and discuss the corresponding results, including the interpretation of the best performing model. We close with a summary and some conclusions.

2. The Nowcasting Problem

2.1. Formal Problem Description

Based on the accounting rules, the system operator has to compute the final load values of the objective region for which it is responsible. We denote by $Y_t$ the corresponding load value at time point $t$. The detailed computation depends on the regulatory details and the mentioned accounting rules of the considered electricity market. Still, independent of the market, all accounting rules that determine the load $Y_t$ have in common that they specify the system balance, i.e., the match of supply and demand, the interconnection with neighboring areas, and potential grid losses.
Under the assumption of no grid losses, we could state for each time point $t$ that:
$$Y_t = \sum_i \text{Consumption\_of\_unit}_{i,t} + \sum_i \text{Interconnector\_balance}_{i,t}$$
where $\text{Consumption\_of\_unit}_{i,t}$ is the electricity consumption of unit $i$ and $\text{Interconnector\_balance}_{i,t}$ the imbalance of interconnector $i$. Obviously, both sums are taken across all consumption units and interconnectors. Of course, from the generation point of view, we can also state:
$$Y_t = \sum_i \text{Generation\_of\_unit}_{i,t} + \sum_i \text{Interconnector\_balance}_{i,t}$$
where $\text{Generation\_of\_unit}_{i,t}$ is the generated electricity of generation unit $i$ and $\text{Interconnector\_balance}_{i,t}$ is again the imbalance of interconnector $i$. In practice, the latter equation is easier to compute, as we usually have fewer production units (mainly large power plants) than consumers. Therefore, it is usually applied for deriving the load. Moreover, the generation units are often divided into subgroups, dependent on generation type, which could be nuclear, lignite, coal, natural gas, oil, pump storage, hydro, biomass, wind, and solar, among others. In formulas, the generation-based equation above turns into:
$$Y_t = \sum_i X_{G,i,t} + \sum_i X_{I,i,t} \tag{1}$$
where $X_{G,i,t}$ is the generation of generation unit $i$ and $X_{I,i,t}$ the balance of interconnector $i$. Again, the sums are taken across all units and interconnectors in the balancing area.
As introduced above, the key problem is that there is only limited metering information at the time of prediction. Some generation units or interconnectors are not metered (yet), which reduces the number of available time series. Thus, we use:
\begin{align}
Y_t &= L_t + \varepsilon_t \tag{2} \\
    &= \sum_{i=1}^{J_G} X_{G,i,t} + \sum_{i=1}^{J_I} X_{I,i,t} + \varepsilon_t \tag{3}
\end{align}
with $L_t$ as the overall metered load across all $J_G$ metered generation units and all $J_I$ interconnector balance time series. The error term $\varepsilon_t$ absorbs the part of $Y_t$ that is not covered by $L_t$, including potential grid losses and contaminated data. In practice, this usually means that the sum of all available metered generation units plus the interconnector imbalance, $L_t$, is well below the targeted load $Y_t$. In the application example, we show below that $L_t$ is about 80% of the overall load. Remember that in a perfect metering environment where $J_G$ and $J_I$ cover all units, it holds that $L_t = Y_t$ or, equivalently, $\varepsilon_t = 0$; see (1).
Now, the prediction task is to nowcast (or forecast) $Y_t$ by $\widehat{Y}_t$ given the available information up to time $t$, i.e., $X_{G,i,t}$ and $X_{I,i,t}$. A specific restriction is that recent values $Y_{t-k}$ (e.g., $Y_t$, $Y_{t-1}$, $Y_{t-2}$, or $Y_{t-3}$) are not available for predicting $Y_t$. As mentioned in the Introduction, the last known values usually have a huge delay, often up to 90 days. Thus, we assume that $Y_{t-K}$ is the last known value, where $K$ is a relatively large number. In the situation of hourly data with 90 days of publication delay, this would be $K = 24 \times 90 = 2160$.
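The two quantities defined in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `metered_load` and `publication_lag` and the toy numbers are hypothetical and not taken from the study.

```python
def metered_load(generation, interconnectors):
    """Sum all metered generation and interconnector series per time step.

    Both arguments are lists of equally long series (one list per meter).
    """
    series = list(generation) + list(interconnectors)
    return [sum(values) for values in zip(*series)]


def publication_lag(delay_days, steps_per_day):
    """Number of time steps K between the last known load and the target."""
    return delay_days * steps_per_day


# Toy data (hypothetical): two generation units, one interconnector, three steps.
gen = [[100.0, 110.0, 120.0], [50.0, 40.0, 45.0]]
inter = [[-10.0, 5.0, 0.0]]  # interconnector balances may be negative

L = metered_load(gen, inter)                 # metered load L_t per time step
K_hourly = publication_lag(90, 24)           # hourly data, 90 days delay
K_quarter_hourly = publication_lag(90, 96)   # quarter-hourly data, 90 days delay
```

With hourly data the lag reproduces the value of 2160 stated above; with quarter-hourly data, as used in the application, the same delay corresponds to 8640 time steps.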

2.2. Data and Problem Illustration

We considered a dataset ranging from 31 December 2014 to 30 April 2019 for the region of a European system operator. The data were metered in quarter-hourly resolution, and if not stated otherwise, all load values are given in MW. There were $J_G = 92$ generation time series and $J_I = 5$ interconnection balance time series available. The generation time series contained seven wind power series, five solar series, and a diverse collection of power plant productions of different types: nuclear, lignite, coal, natural gas (NG), oil, pump storage, hydro, and biomass. Potential missing data were replaced by the last known values. Moreover, we applied clock change adjustments to the data due to daylight saving time. Hence, for the last Sunday in March, we interpolated the missing clock change hour, and for the last Sunday in October, we averaged the doubled clock change hour.
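The preprocessing steps described above can be illustrated with a minimal sketch. The helper names are hypothetical, and the use of linear interpolation for the missing hour in March is an assumption, since the text does not state which interpolation was applied.

```python
def fill_with_last_known(values):
    """Replace None entries by the most recent non-missing value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if no value has been seen yet
    return filled


def interpolate_missing_hour(before, after, n_missing=4):
    """Linearly interpolate the skipped hour (assumed: 4 quarter-hours)."""
    return [before + (after - before) * k / (n_missing + 1)
            for k in range(1, n_missing + 1)]


def average_doubled_hour(first_pass, second_pass):
    """Average the two passes through the repeated hour in October."""
    return [(a + b) / 2 for a, b in zip(first_pass, second_pass)]


filled = fill_with_last_known([None, 1.0, None, None, 3.0, None])
march_hour = interpolate_missing_hour(10.0, 20.0)
october_hour = average_doubled_hour([10.0, 12.0], [14.0, 12.0])
```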
In Figure 1, we illustrate an example of the considered dataset for the last week of April 2019. We observed that the load process $Y_t$ exhibited the typical daily pattern with smaller values during the night than during the day, and smaller values on the weekend than on working days. Additionally, we see that the process $L_t$ (see Equation (3)) is the sum of all available metered series $X_{G,i,t}$ and $X_{I,i,t}$. Note that the metering data exhibited negative values, and this held particularly for the transmission data of the interconnectors and the storages. Thus, only if all metered data were positive was the process $L_t$ visually the sum of all individual generation and interconnector data. Such a period can be spotted in the last hours of Sunday in Figure 1.
Further, we observed that during the illustrated period, the generation had a substantial infeed of wind and solar power. Additionally, we see that nuclear power provided base load energy, as did some coal power plants in the last two days of April 2019. The remaining power plants contributed only little to the energy supply during this period. Finally, we want to highlight that $L_t$ followed the same pattern as $Y_t$, but lay consistently below $Y_t$. This also motivated the first simple model for predicting $Y_t$.

3. Nowcasting Models

3.1. Benchmark Model

The industry benchmark from the system operator solves the problem stated above by a linear regression on $L_t$, motivated by Equation (2). Thus,
$$Y_t = \alpha_0 + \alpha_1 L_t + \varepsilon_t \tag{4}$$
To estimate the unknown coefficients $\alpha_0$ and $\alpha_1$, the industry benchmark applies ordinary least squares (OLS) to the previous year's data of the same month as the target time $t$. Thus, if we want to predict $Y_t$ for a time point in January, we take all January values of $Y_t$ and $L_t$ of the previous year to estimate $\alpha_0$ and $\alpha_1$. As we had quarter-hourly data, these were $31 \times 96 = 2976$ data points. OLS yields the estimates $\widehat{\alpha}_0$ and $\widehat{\alpha}_1$, and the nowcast of $Y_t$ is computed by $\widehat{Y}_t = \widehat{\alpha}_0 + \widehat{\alpha}_1 L_t$.
The estimation principle is visualized in Figure 2. Here, $\alpha_0$ and $\alpha_1$ of Model (4) were estimated using the input data from April 2018 for nowcasting $Y_t$ in April 2019. Note that we generalize this estimation method slightly and consider a broader range of training period options in the application.
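A minimal sketch of the benchmark may help to fix ideas: a simple linear regression of the load on the metered load, fitted by OLS and then applied to current values of the metered load. The function names and the toy numbers are hypothetical; the operator's actual implementation is not described in the text.

```python
def fit_benchmark(L_train, Y_train):
    """OLS for Y_t = alpha0 + alpha1 * L_t + eps_t (simple linear regression)."""
    n = len(L_train)
    mean_L = sum(L_train) / n
    mean_Y = sum(Y_train) / n
    sxy = sum((l - mean_L) * (y - mean_Y) for l, y in zip(L_train, Y_train))
    sxx = sum((l - mean_L) ** 2 for l in L_train)
    alpha1 = sxy / sxx
    alpha0 = mean_Y - alpha1 * mean_L
    return alpha0, alpha1


def nowcast_benchmark(alpha0, alpha1, L_now):
    """Apply the fitted line to currently metered load values."""
    return [alpha0 + alpha1 * l for l in L_now]


# Toy training data (hypothetical): the load is an exact linear function of L_t.
L_train = [10.0, 12.0, 14.0, 16.0]
Y_train = [2.0 + 1.25 * l for l in L_train]

alpha0, alpha1 = fit_benchmark(L_train, Y_train)
Y_hat = nowcast_benchmark(alpha0, alpha1, [20.0])
```

In the benchmark setting, `L_train` and `Y_train` would hold the values of the same month of the previous year, and `L_now` the metered load at the nowcasting time.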

3.2. Proposed Nowcasting Model

The proposed model was motivated by Equation (3). First, we imposed a linear model on the individual generation and interconnector components by:
\begin{align}
Y_t &= \beta_0 + \sum_{i=1}^{J_G} \beta_{G,i} X_{G,i,t} + \sum_{i=1}^{J_I} \beta_{I,i} X_{I,i,t} + \varepsilon_t \tag{5} \\
    &= \beta_0 + \beta_G' X_{G,t} + \beta_I' X_{I,t} + \varepsilon_t \tag{6} \\
    &= \beta_0 + \beta_F' X_{F,t} + \varepsilon_t \tag{7}
\end{align}
with $X_{F,t} = (X_{G,t}, X_{I,t})$. We regarded this as a fundamental linear load model, as the only linear inputs were $X_F$, which contained all fundamental information: the generated power data $X_{G,t}$ and the interconnector imbalance $X_{I,t}$. Note that (7) can be regarded as a natural extension of (4) because Model (7) turns into (4) by choosing $\beta_i = \alpha_1$ for $i > 0$.
However, we extended Model (7) by two further terms: (i) a term that contains seasonal information and (ii) a term that represents autoregressive information. In load forecasting, both terms show high relevance; see, e.g., [10,11,12,13]. Sometimes, models with many seasonal and autoregressive components perform very well even in short-term forecasting; see, e.g., [14].
Formally, the extended model is given by:
$$\begin{aligned}
Y_t &= \beta_0 + \beta_F' X_{F,t} + \beta_S' X_{S,t} + \beta_A' X_{A,t} + \varepsilon_t \\
    &= \beta_0 + \beta' X_t + \varepsilon_t
\end{aligned} \tag{8}$$
where $X_{S,t}$ is a vector of seasonal regressors and $X_{A,t}$ is a vector of autoregressive components of $Y_t$. Of course, (8) turns into (7) by choosing $\beta_S = 0$ and $\beta_A = 0$. Note that we also defined $X_t = (X_{F,t}, X_{S,t}, X_{A,t})$, which does not include the intercept. Hence, $\beta = (\beta_F, \beta_S, \beta_A)$ does not include $\beta_0$.
It is widely known that periodic features play an important role in electricity demand, load, and consumption modeling. The most important seasonalities are daily, weekly, and annual cycles. We suggest modeling the three periodic components by periodic cubic B-splines with periodicities $S_D$, $S_W$, and $S_A$, which represent a day, a week, and a (meteorological) year, as in [15]. For our quarter-hourly data, we had $S_D = 96$, $S_W = 96 \times 7 = 672$, and $S_A = 96 \times 365.24 = 35{,}063.04$. In contrast to Fourier analysis, periodic B-splines have the advantage that the basis functions are local and allow for flexibility. When applied to positive data with positivity constraints, they also benefit from the fact that they are always non-negative. We chose equidistant basis functions for each period. Additionally, we specified the number of basis functions $B_D$, $B_W$, and $B_A$ for each period. For our application, we chose $B_D = 24$, $B_W = 12$, and $B_A = 24$. Thus, $\beta_S$ had a length of $B_D + B_W + B_A$, which was 60 in our application.
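One way to build such seasonal regressors is sketched below. It is an illustration under assumptions, not the construction of [15]: the basis functions are shifted copies of the cardinal cubic B-spline wrapped around the period, with knots placed equidistantly starting at time 0, and all function names are hypothetical.

```python
def cardinal_cubic_bspline(u):
    """Cubic B-spline with integer knots 0, 1, ..., 4; zero outside [0, 4)."""
    if 0 <= u < 1:
        return u ** 3 / 6
    if 1 <= u < 2:
        return (-3 * u ** 3 + 12 * u ** 2 - 12 * u + 4) / 6
    if 2 <= u < 3:
        return (3 * u ** 3 - 24 * u ** 2 + 60 * u - 44) / 6
    if 3 <= u < 4:
        return (4 - u) ** 3 / 6
    return 0.0


def periodic_basis(t, period, n_basis):
    """Evaluate n_basis (>= 4) equidistant periodic cubic B-splines at time t."""
    u = (t % period) / period * n_basis  # position in knot units, in [0, n_basis)
    return [cardinal_cubic_bspline((u - j) % n_basis) for j in range(n_basis)]


def seasonal_regressors(t, S_D=96, S_W=672, S_A=96 * 365.24,
                        B_D=24, B_W=12, B_A=24):
    """Stack the daily, weekly, and annual bases into one regressor vector."""
    return (periodic_basis(t, S_D, B_D)
            + periodic_basis(t, S_W, B_W)
            + periodic_basis(t, S_A, B_A))


x_s = seasonal_regressors(1000)
```

Each of the three bases is local (at most four basis functions are non-zero at any time), non-negative, and sums to one, so the stacked vector has 60 entries that sum to three. In a regression with an intercept, this implies collinearity, which the regularized estimators discussed later can handle.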
Furthermore, we had to specify the autoregressive term in (8). We defined the autoregressive components:
$$X_{A,t} = \left( (Y_{t-k})_{k \in \mathcal{K}_K}, (Y_{t-k})_{k \in \mathcal{K}_A} \right)$$
with two sets of lags $\mathcal{K}_K$ and $\mathcal{K}_A$. $\mathcal{K}_K$ contained lags around the most recent available $Y_{t-K}$, and $\mathcal{K}_A$ contained lags around a calendar year ago. The latter mimicked annual effects.
We specified for the most recent lags:
$$\mathcal{K}_K = \{ K + (0, 1, \ldots, 8),\; K + S_D + (-8, -7, \ldots, 8),\; K + 2 S_D, \ldots, K + 8 S_D \}$$
which contains the nine most recent known values, the values within eight lags around the day before $Y_{t-K}$, and the lags of the past eight days at the same time of day as $t$. Remember that $K = 90 S_D$ in our application; thus, $Y_{t-z}$ with $z = K + S_D = 91 S_D = 13 \times 7 S_D = 13 S_W$ had the same weekday as the target $Y_t$. For the lags around a year ago, we specified:
$$\mathcal{K}_A = \{ 50 S_W,\; 52 S_W + (-8, -7, \ldots, -1, 1, 2, \ldots, 8) S_D,\; 52 S_W + (-8, -7, \ldots, 8),\; 54 S_W \}$$
as $52 S_W$ corresponds to 364 days, which is approximately one calendar year. In total, $\mathcal{K}_K$ and $\mathcal{K}_A$ contributed 54 parameters to the model.
To summarize, the overall Model (8) had many parameters. In our application scenario, there were $5 + 12 + 80 + 60 + 54 = 211$ parameters in total. As this might lead to overfitting issues when applying plain OLS, we propose the application of efficient regularization techniques to tackle the nowcasting problem adequately.

3.3. Estimation of Proposed Nowcasting Model

We will see that the estimation procedure (or training method) plays an important role in accurate nowcasting. Obviously, a natural estimation candidate for Model (8) is linear regression. However, as we had many parameters and some of them might contain useless information, this might be suboptimal. Regularization can help to address the problem. In the energy forecasting literature, the lasso (least absolute shrinkage and selection operator) seems to be a popular choice for shrinkage and feature selection in linear models; see, e.g., [15,16,17,18]. An extension of the lasso is given by the elastic net, which has also been applied [19,20,21,22,23,24,25].
To introduce the estimation procedure, we require some further notation. Let $\{1, \ldots, T\}$ be the time points of available data for $Y_t$. Thus, our objective was to predict the load $Y_t$ at time point $t = T + K$, which corresponds to the actual time point. Let $\mathcal{T} \subseteq \{1, \ldots, T\}$ be the training period of size $n_{\mathcal{T}}$. Define $Y^{(m_0, s_0)} = ((Y_t - m_0)/s_0)_{t \in \mathcal{T}}$ as the $(m_0, s_0)$-standardized response vector and $X^{(\boldsymbol{m}_p, \boldsymbol{s}_p)} = (X_i^{(m_i, s_i)})_{i \in \{1, \ldots, p\}} = ((X_{i,t} - m_i)/s_i)_{(i,t) \in \{1, \ldots, p\} \times \mathcal{T}}$ as the scaled input matrix with scaling coefficients $\boldsymbol{m}_p = (m_1, \ldots, m_p)$ and $\boldsymbol{s}_p = (s_1, \ldots, s_p)$ and number of input parameters $p$. Denote by $\boldsymbol{m} = (m_0, m_1, \ldots, m_p)$ and $\boldsymbol{s} = (s_0, s_1, \ldots, s_p)$ the collections of all scaling coefficients.
Furthermore, denote by $\boldsymbol{c}$ a vector of the same size as $\beta$, which will be the shrinkage target. In the vast majority of applications, this is $\boldsymbol{c} = \boldsymbol{0}$. The intuition behind this choice is that a specific regressor has zero impact if it contains useless information, which reduces the garbage in, garbage out problem.
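With all the notation above, the elastic net estimator for $\beta$ in Model (8) is given as:
$$\widehat{\beta}_{\lambda,\alpha}(\boldsymbol{m}, \boldsymbol{s}; \boldsymbol{c}) = \underset{(\beta_0, \beta) \in \mathbb{R}^{p+1}}{\arg\min} \; \frac{1}{n_{\mathcal{T}}} \left\| Y^{(m_0, s_0)} - \left( \beta_0 + \beta' X^{(\boldsymbol{m}_p, \boldsymbol{s}_p)} \right) \right\|_2^2 + \lambda \alpha \left\| \beta - \boldsymbol{c} \right\|_1 + \lambda \frac{1 - \alpha}{2} \left\| \beta - \boldsymbol{c} \right\|_2^2 \tag{10}$$
where $\lambda, \alpha \geq 0$ are tuning parameters, $p$ is the number of parameters (length of $\beta$), and $\|\cdot\|_1$ and $\|\cdot\|_2$ are the standard $\ell_1$ and $\ell_2$ norms. The tuning parameters $\lambda$ and $\alpha$ characterize the regularization properties of the elastic net. For $\alpha = 1$, we obtain the popular lasso, and for $\alpha = 0$, we obtain ridge regression, which is also known as Tikhonov regularization. For $\lambda = 0$, we obtain the OLS solution, and for very large $\lambda$, we obtain a solution very close to the shrinkage target $\boldsymbol{c}$. In the non-ridge case $\alpha > 0$, we even obtain exactly $\boldsymbol{c}$ as the solution if $\lambda$ is sufficiently large. For the case of ridge regression, an explicit solution is available:
$$\widehat{\beta}_{\lambda,0}(\boldsymbol{m}, \boldsymbol{s}; \boldsymbol{c}) = \left( \widetilde{X}^{(\boldsymbol{m}_p, \boldsymbol{s}_p)\prime} \widetilde{X}^{(\boldsymbol{m}_p, \boldsymbol{s}_p)} + \mathrm{Diag}(\widetilde{\lambda}) \right)^{-1} \left( \widetilde{X}^{(\boldsymbol{m}_p, \boldsymbol{s}_p)\prime} Y^{(m_0, s_0)} + \lambda \widetilde{\boldsymbol{c}} \right)$$
with $\widetilde{X}^{(\boldsymbol{m}_p, \boldsymbol{s}_p)} = (\boldsymbol{1}, X^{(\boldsymbol{m}_p, \boldsymbol{s}_p)})$, $\widetilde{\lambda} = (0, \lambda \boldsymbol{1})$, and $\widetilde{\boldsymbol{c}} = (0, \boldsymbol{c})$. In the elastic net or lasso case with $\alpha > 0$, efficient estimation techniques based on coordinate descent or LARS (least angle regression) are available. Both options have the drawback that they can only handle the case $\boldsymbol{c} = \boldsymbol{0}$.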
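The explicit ridge solution can be sketched with NumPy as follows. The function name `ridge_with_target` and the simulated data are hypothetical. The sketch implements the closed-form expression as written, so its `lam` acts directly on the normal equations; relative to the objective in (10), this corresponds to a rescaling of $\lambda$ by a constant that depends on the sample size.

```python
import numpy as np


def ridge_with_target(X, y, lam, c):
    """Closed-form ridge estimate shrinking the slopes towards the target c.

    Solves (Xt'Xt + Diag(0, lam, ..., lam)) b = Xt'y + lam * (0, c),
    where Xt = (1, X); the intercept (first entry) is not penalized.
    """
    n, p = X.shape
    Xt = np.column_stack([np.ones(n), X])
    penalty = np.diag(np.concatenate([[0.0], np.full(p, lam)]))
    c_tilde = np.concatenate([[0.0], np.asarray(c, dtype=float)])
    b = np.linalg.solve(Xt.T @ Xt + penalty, Xt.T @ y + lam * c_tilde)
    return b[0], b[1:]


# Simulated toy data (hypothetical): three regressors, the last one is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([0.9, 1.1, 0.0]) + 0.1 * rng.normal(size=200)
c = np.array([1.0, 1.0, 0.0])

b0_ols, b_ols = ridge_with_target(X, y, 0.0, c)  # lam = 0: OLS solution
b0_big, b_big = ridge_with_target(X, y, 1e9, c)  # huge lam: slopes close to c
```

The two calls illustrate the limiting cases named in the text: without penalty the estimate coincides with OLS, and for a very large penalty the slopes approach the shrinkage target while the intercept stays free.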
However, the scaling coefficients $\boldsymbol{m}$ and $\boldsymbol{s}$ also impacted the estimation substantially. Usually, the scaling coefficients $\boldsymbol{m}$ and $\boldsymbol{s}$ in (10) are chosen so that $Y^{(m_0, s_0)}$ remains unchanged, by $m_0 = 0$ and $s_0 = 1$, and $X_i^{(m_i, s_i)}$ has mean zero and standard deviation of one, i.e., it holds that $X_i^{(m_i, s_i)\prime} \boldsymbol{1} = 0$ and $\| X_i^{(m_i, s_i)} \|_2 = 1$. The latter can be achieved by choosing $m_i = n_{\mathcal{T}}^{-1} X_i' \boldsymbol{1}$ and $s_i = \sqrt{n_{\mathcal{T}}^{-1} (X_i - m_i \boldsymbol{1})' (X_i - m_i \boldsymbol{1})}$. This scaling procedure is standard in the literature and is, e.g., the default in the glmnet and lars packages in R for estimating the elastic net and the lasso with $\boldsymbol{c} = \boldsymbol{0}$.
Still, it turned out that for our nowcasting problem, this scaling procedure for $X$ was suboptimal, as it ignores available observations. It is true that $Y_T$ was the last known observation of the load. However, for $X_t$, we knew all observations up to $T + K$, the time point when the forecast was created. Thus, we proposed to compute the scaling coefficients $s_i$ and $m_i$ on the larger and more recent information set $\mathcal{T}_K = \mathcal{T} \cup \{T + 1, \ldots, T + K\}$ for all $X_i$. Moreover, for reasons explained in the next paragraph, we suggested scaling $Y^{(m_0, s_0)}$ by the corresponding sample mean and standard deviation, $m_0 = n_{\mathcal{T}}^{-1} Y' \boldsymbol{1}$ and $s_0 = \sqrt{n_{\mathcal{T}}^{-1} (Y - m_0 \boldsymbol{1})' (Y - m_0 \boldsymbol{1})}$.
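The proposed scaling can be sketched as follows, assuming for simplicity that the training period covers all time points $1, \ldots, T$. The function name and the simulated data are hypothetical; the point is only that the regressors are scaled on all rows up to $T + K$, while the response is scaled on the training rows.

```python
import numpy as np


def scaling_coefficients(X_all, Y_train):
    """Scaling coefficients for regressors and response.

    X_all holds the regressor rows for times 1..T+K (all known at nowcast time);
    Y_train holds the load for the training period only.
    """
    m_x = X_all.mean(axis=0)
    s_x = X_all.std(axis=0)  # population standard deviation (divides by n)
    m_0 = Y_train.mean()
    s_0 = Y_train.std()
    return m_x, s_x, m_0, s_0


# Simulated toy data (hypothetical sizes): T training points, K further points.
rng = np.random.default_rng(1)
T, K = 50, 10
X_all = rng.normal(loc=5.0, scale=2.0, size=(T + K, 3))
Y_train = rng.normal(loc=100.0, scale=10.0, size=T)

m_x, s_x, m_0, s_0 = scaling_coefficients(X_all, Y_train)
X_scaled = (X_all - m_x) / s_x
Y_scaled = (Y_train - m_0) / s_0

X_train_scaled = X_scaled[:T]  # rows used for estimation
x_now_scaled = X_scaled[-1]    # row used for the nowcast at time T + K
```

A nowcast on the standardized scale would then be mapped back to MW by multiplying with `s_0` and adding `m_0`.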
Now, we discuss the impact of the shrinkage target $\boldsymbol{c}$ in more detail. We mentioned already that the standard choice $\boldsymbol{c} = \boldsymbol{0}$ is motivated by the idea that, by default, a regressor has no influence. Only if a regressor contributes substantially to the explanation of the response $Y_t$ will the estimated coefficient deviate from zero and show a corresponding impact. If we have no further information about our regressors, this is a reasonable approach. We applied this approach to the ridge and lasso estimators and denote them by 0-ridge and 0-lasso.
However, in our situation, we knew something about the fundamental relationship between our response $Y_t$ and the fundamental regressors $X_{F,t}$ from Equations (1) and (7). This fundamental relationship can help to impose a suitable regularization for our model. We explain this with the following example: Suppose that during the in-sample or training period $\{1, \ldots, T\}$, a certain power plant or interconnector is offline; thus, all its observations are zero. A reason could be that it is a new unit that started operating sometime after the last known observation $Y_T$. Then, the ridge or lasso estimators with $\boldsymbol{c} = \boldsymbol{0}$ will give an estimated coefficient of zero for the corresponding unit. Hence, the power plant will have no out-of-sample contribution to the overall load even though it is operating now, at time $T + K$. Thus, from the fundamental point of view, it makes sense to deviate from the shrinkage target of 0 for all generation units and interconnectors. If we assume that the metered values are reasonable, esp. not contaminated by implausible data, then taking these values into account should improve the forecasting accuracy. This holds at least for the situation just explained. Hence, we proposed the choice:
$$\boldsymbol{c}_C = (\boldsymbol{c}_F, \boldsymbol{c}_{A,S})$$
with $\boldsymbol{c}_F = \boldsymbol{1}$ and $\boldsymbol{c}_{A,S} = \boldsymbol{0}$, corresponding to the impact as in the perfect fundamental situation from Equation (1). Obviously, the vector $\boldsymbol{c}_F$ had the length of $\beta_F$, and $\boldsymbol{c}_{A,S}$ had the aggregated length of $\beta_A$ and $\beta_S$. We applied this choice for the ridge regression only and denote it by c-ridge. The reason why this choice was not applied to lasso or elastic net estimators with $\alpha > 0$ was the unavailability of efficient estimation algorithms.
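The offline-unit example can be reproduced in a small numerical sketch. All names and numbers are hypothetical, and the regressors are left unscaled for simplicity: unit B is offline during training (all zeros), so its coefficient is determined by the shrinkage target alone.

```python
import numpy as np


def ridge_with_target(X, y, lam, c):
    """Ridge estimate with unpenalized intercept and shrinkage target c."""
    n, p = X.shape
    Xt = np.column_stack([np.ones(n), X])
    penalty = np.diag([0.0] + [lam] * p)
    rhs = Xt.T @ y + lam * np.concatenate([[0.0], c])
    b = np.linalg.solve(Xt.T @ Xt + penalty, rhs)
    return b[0], b[1:]


# Training period (toy numbers): unit A is running, unit B is offline.
unit_a = np.array([100.0, 120.0, 90.0, 110.0, 105.0])
unit_b = np.zeros(5)
X = np.column_stack([unit_a, unit_b])
y = 20.0 + unit_a + unit_b  # load = unmetered base of 20 MW + metered generation

_, beta_zero = ridge_with_target(X, y, lam=1.0, c=np.zeros(2))  # 0-ridge
_, beta_c = ridge_with_target(X, y, lam=1.0, c=np.ones(2))      # c-ridge

# At nowcast time, unit B has started operating and feeds in 30 MW.
new_output_b = 30.0
contribution_zero = beta_zero[1] * new_output_b  # ignored by 0-ridge
contribution_c = beta_c[1] * new_output_b        # fully counted by c-ridge
```

Because the column of unit B is zero in the training data, the data carry no information on its coefficient, and the penalty alone pulls it to 0 (0-ridge) or to 1 (c-ridge).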

4. Nowcasting Study

We conducted a rolling window nowcasting study using the considered European dataset. The design was similar to a standard rolling window forecasting study, as illustrated in Figure 3. The initial last known load value $Y_T$ was on 29 January 2018 at 23:45. Based on historic data, we nowcast the $S_D = 96$ values $Y_{T+K+1}, \ldots, Y_{T+K+S_D}$. We considered a publication delay of $K = 90 \times 96 = 8640$ (90 days), which resulted in the first nowcast being on 30 April 2018, approximately three months later. Then, we shifted the last known load value by a day ($S_D = 96$ time points) to $Y_{T+S_D}$ and nowcast $Y_{T+K+S_D+1}, \ldots, Y_{T+K+2S_D}$. This procedure was repeated $N = 366$ times, which gave an out-of-sample period of about a year and $96 \times 366 = 35{,}136$ observations for evaluation. For the in-sample dataset, we considered six choices in our application:
(i) All available data from the past 37 months (three years plus one month): $(365 \times 3 + 30 - 90) \times 96 = 99{,}360$ observations of $Y_t$, denoted as 3years
(ii) All available data from the past 25 months (two years plus one month): $(365 \times 2 + 30 - 90) \times 96 = 64{,}320$ observations of $Y_t$, denoted as 2years
(iii) All available data from the past 13 months (one year plus one month): $(365 + 30 - 90) \times 96 = 29{,}280$ observations of $Y_t$, denoted as 1year
(iv) Data of the past year, 120 days centered around the nowcasting day of the past year: $120 \times 96 = 11{,}520$ observations of $Y_t$, denoted as 4months
(v) Data of the past year, 60 days centered around the nowcasting day of the past year: $60 \times 96 = 5760$ observations of $Y_t$, denoted as 2months
(vi) Data of the past year, 30 days centered around the nowcasting day of the past year: $30 \times 96 = 2880$ observations of $Y_t$, denoted as 1month
Option (i) used the maximum amount of data of $365 \times 3 + 30 - 90 = 1035$ days, which was also used for illustration in Figure 3. Note that Option (vi) was very close to the industry benchmark approach, which used the data of the same month of the previous year for estimating the model parameters.
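The arithmetic of the study design can be checked with a short sketch. The dictionary and function names are hypothetical; the numbers are those given in the text.

```python
from datetime import datetime, timedelta

S_D = 96       # quarter-hours per day
K = 90 * S_D   # publication delay in time steps (90 days)

# Length of each training period in days, as listed in options (i) to (vi).
training_days = {
    "3years": 365 * 3 + 30 - 90,
    "2years": 365 * 2 + 30 - 90,
    "1year": 365 + 30 - 90,
    "4months": 120,
    "2months": 60,
    "1month": 30,
}
training_obs = {name: days * S_D for name, days in training_days.items()}


def nowcast_window(last_known, window_index):
    """First and last time stamp nowcast in rolling step window_index (0-based)."""
    step = timedelta(minutes=15)
    shifted = last_known + window_index * S_D * step  # last known value moves by a day
    first = shifted + (K + 1) * step
    last = shifted + (K + S_D) * step
    return first, last


first, last = nowcast_window(datetime(2018, 1, 29, 23, 45), 0)
n_eval = 366 * S_D  # number of out-of-sample observations
```

The first window starts at 00:00 on 30 April 2018 and covers that whole day, in line with the description of the first nowcast above.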
We considered all competing models, benchm, 0-lasso($\lambda$), 0-ridge($\lambda$), and c-ridge($\lambda$), in the rolling window forecasting study. As emphasized, the lasso and ridge models depend on the tuning parameter $\lambda$, which we had to specify. For all models, we considered exponential grids $\Lambda$ for $\lambda$. In detail, for the ridge models, we chose $\Lambda = 2^{\mathcal{L}_r}$ with $\mathcal{L}_r$ as an equidistant grid from $-10$ to 20 of length 100, and for the lasso models, $\Lambda = 2^{\mathcal{L}_l}$ with $\mathcal{L}_l$ as an equidistant grid from $-30$ to 3 of length 100. Of course, we did not know the optimal $\lambda$ in advance. Therefore, we considered for the 0-lasso, 0-ridge, and c-ridge models a version where $\lambda$ was chosen based on the past performance (cumulated loss) of the corresponding models, initializing with $\lambda = 1$ for the first prediction. We denote these models by 0-lasso*, 0-ridge*, and c-ridge*.
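The grids and the selection rule for the starred models can be sketched as follows. The function names are hypothetical, and the details of the loss bookkeeping (which loss is cumulated, and over which horizon) are assumptions, as the text only states that the cumulated past loss is used.

```python
def exponential_grid(lo, hi, n=100, base=2.0):
    """base ** x for x on an equidistant grid from lo to hi with n points."""
    return [base ** (lo + (hi - lo) * i / (n - 1)) for i in range(n)]


ridge_grid = exponential_grid(-10, 20)  # grid for 0-ridge and c-ridge
lasso_grid = exponential_grid(-30, 3)   # grid for 0-lasso


def select_lambda(grid, cumulated_loss, initial=1.0):
    """Pick the grid value with the smallest cumulated past loss.

    Before any loss has been observed (cumulated_loss is None), use `initial`.
    """
    if cumulated_loss is None:
        return initial
    best = min(range(len(grid)), key=lambda i: cumulated_loss[i])
    return grid[best]


def update_loss(cumulated_loss, new_loss):
    """Add the losses of the latest nowcast (one value per grid point)."""
    if cumulated_loss is None:
        return list(new_loss)
    return [a + b for a, b in zip(cumulated_loss, new_loss)]


# Toy usage with a three-point grid (hypothetical losses).
toy_grid = [0.5, 1.0, 2.0]
loss = None
lam_first = select_lambda(toy_grid, loss)  # no history yet: initial value
loss = update_loss(loss, [3.0, 2.0, 2.5])
loss = update_loss(loss, [1.0, 2.5, 1.0])
lam_next = select_lambda(toy_grid, loss)   # cumulated losses: 4.0, 4.5, 3.5
```

Both grids contain the initial value $\lambda = 1 = 2^0$, since 0 is a grid point of both exponent grids.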
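To measure the nowcasting accuracy, we considered the out-of-sample MAE (mean absolute error) and the out-of-sample RMSE (root mean square error). To evaluate the forecasting accuracy also for each of the $S_D = 96$ quarter-hours separately, we defined:
$$\text{MAE} = \frac{1}{S_D} \sum_{s=1}^{S_D} \text{MAE}_s \quad \text{with} \quad \text{MAE}_s = \frac{1}{N} \sum_{n=1}^{N} \left| Y_{n,s} - \widehat{Y}_{n,s} \right|$$
$$\text{RMSE} = \frac{1}{S_D} \sum_{s=1}^{S_D} \text{RMSE}_s \quad \text{with} \quad \text{RMSE}_s = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left| Y_{n,s} - \widehat{Y}_{n,s} \right|^2}$$
where $Y_{n,s}$ and $\widehat{Y}_{n,s}$ denote the load and its nowcast for quarter-hour $s$ of out-of-sample day $n$.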
Note that our models were regression based, and the forecasted value should coincide with the expected value. Thus, the RMSE should be preferred for evaluation, as it identifies the true mean correctly. In contrast, the MAE is optimal for median forecasts. However, it is often used as a robust alternative to the RMSE. For more details on the evaluation of point forecasts, we refer to [26].
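The evaluation measures can be computed as in the following sketch, which follows the definitions above literally (the overall RMSE is the average of the quarter-hourly RMSE values). The function name and the toy numbers are hypothetical.

```python
import math


def intraday_scores(Y, Y_hat):
    """Per-slot and overall MAE and RMSE.

    Y and Y_hat are lists of N days, each a list of S_D values.
    Returns (MAE_s list, RMSE_s list, overall MAE, overall RMSE).
    """
    N, S = len(Y), len(Y[0])
    mae_s, rmse_s = [], []
    for s in range(S):
        errors = [Y[n][s] - Y_hat[n][s] for n in range(N)]
        mae_s.append(sum(abs(e) for e in errors) / N)
        rmse_s.append(math.sqrt(sum(e * e for e in errors) / N))
    return mae_s, rmse_s, sum(mae_s) / S, sum(rmse_s) / S


# Toy example: N = 2 days with S_D = 2 slots each.
Y = [[10.0, 20.0], [12.0, 18.0]]
Y_hat = [[9.0, 20.0], [15.0, 16.0]]
mae_s, rmse_s, mae, rmse = intraday_scores(Y, Y_hat)
```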

5. Results

5.1. Nowcasting Performance

We first discuss the overall nowcasting performance of the considered models. The out-of-sample MAE and RMSE values are given in Table 1 and Table 2. There, we also computed improvements in the MAE and RMSE with respect to the benchmark model benchm estimated on the shortest training period 1month. Remember that ridge* and lasso* chose the tuning parameter based on the past performance, whereas ridge and lasso represent the models that gave ex-post the best prediction accuracy on the $\lambda$-grid $\Lambda$.
First, we observe that all ridge and lasso models showed clear improvements against the benchmark. The largest improvement of around 60% in both measures was gained by the c-ridge (or c-ridge*) model calibrated on the training period of 2years. Second, we see that the ridge* and lasso* models showed almost the same performance as ridge and lasso, which indicates that selecting $\lambda$ based on past performance, instead of ex-post, costs little accuracy. Next, among the benchmark models, benchm with the short calibration periods of 1month and 2months showed the best prediction accuracy. In contrast, for the ridge and lasso approaches, the long training periods of 2years and 3years performed best. The reason was likely that the estimation of many parameters required more data to obtain stable parameter estimates. Figure 4 illustrates the solution path of the ridge and lasso models for the calibration period 2years, which uses about two years of data.
Here, the $\|\cdot\|_1$-norm of $\widehat{\beta}$, a typical measure for model complexity, is plotted against the MAE and RMSE scores. Note that $\|\widehat{\beta}\|_1$ is the sum of all absolute parameters. The solution paths for different $\lambda$ values of a certain model, e.g., c-ridge($\lambda$) (red circle), are represented by the color intensity. The darker the color of the symbol within the solution path, the smaller $\lambda$. Thus, black symbols correspond to the OLS solution.
We observe that all three models c-ridge($\lambda$), 0-ridge($\lambda$), and 0-lasso($\lambda$) converged to the OLS solution for small $\lambda$. The OLS solution had an MAE of around 500 MW and an RMSE of slightly above 700 MW with a $\|\cdot\|_1$-norm of $\widehat{\beta}$ of around 5.5. We clearly see that for large $\lambda$ values, 0-ridge($\lambda$) and 0-lasso($\lambda$) obtained smaller $\widehat{\beta}$ values and tended towards the $\boldsymbol{0}$ solution. In contrast, c-ridge($\lambda$) always had a similar range of the $\|\cdot\|_1$-norm of $\widehat{\beta}$. The corresponding MAE and RMSE minima have a $\|\cdot\|_1$-norm of around 5.2, which is of a similar magnitude as for the OLS solution. Thus, the parameter complexity of both solutions was comparable, but the parameters were better selected by the c-ridge approach due to the shrinkage towards a reasonable target instead of $\boldsymbol{0}$.
Next, we look at the intraday structure of the nowcasting errors across the 96 quarter-hours. The forecasting accuracy in terms of $\text{MAE}_s$ and $\text{RMSE}_s$ is visualized in Figure 5. There, we observe that the benchmarks exhibited a relatively clear diurnal pattern. The nowcasting error was largest during the working hours, esp. during the afternoon. For the lasso and ridge models, the daily pattern was substantially reduced. For instance, the $\text{MAE}_s$ of c-ridge* varied between 383 MW and 484 MW, which was a variation of around 100 MW. The intraday variation of the $\text{MAE}_s$ of the benchmark model was around 300 MW and thus considerably larger. However, as the overall forecasting error was reduced by 60%, the relative variation of the MAE remained at a similar level.
We saw that the proposed models with an in-sample size of about two years performed best. It was clear that the computational complexity increased with the amount of data used for training and calibration. Still, in all cases, the models allowed the implementation and application on a real-time basis due to the linear model structure. For instance, the estimation of the c-ridge, 0-ridge, and 0-lasso models on the full $\lambda$-grid with a training period of 2years took 3.0 s, 0.5 s, and 2.3 s, respectively. These times were measured on a standard computer using a single CPU. The ridge models were estimated using the solve.QP function of the R package quadprog, and the lasso model was trained and calibrated using the glmnet function of the R package glmnet.

5.2. Model Interpretation

As our models were linear models, it was relatively easy to interpret the parameters. The easiest way to get an understanding of the impact of each parameter in the model was to evaluate the absolute impact of parameter $i$ with respect to the overall parameter contribution, $|\widehat{\beta}_i| / \|\widehat{\beta}\|_1$. Those impacts of the c-ridge* model with a training period of about two years, as well as of the benchmark model benchm with a training period of about a month, are illustrated in the bar chart in Figure 6. As the full model had many parameters, we grouped the impacts $|\widehat{\beta}_i| / \|\widehat{\beta}\|_1$ by parameter type to maintain readable results.
Obviously, only the c-ridge model had a contribution from external regressors and autoregressive impacts (EXT_A, EXT_W, and EXT_D represent the annual, weekly, and daily seasonal components; LAGS_A and LAGS_S represent the annual and short-term autoregressive lags), as the benchmark model did not take those effects into account. Here, it seemed that the annual impacts contributed substantially to the c-ridge* model, and this held for both types of effects: deterministic external regressors (EXT_A) and autoregressive effects (LAGS_A). Furthermore, the daily seasonal component (EXT_D) showed about a 3.5% contribution to the overall solution. For the generation units, we observed that all reduced their absolute impact in the c-ridge* model with respect to the benchmark model. However, all parameters remained relevant.
The interpretation by the absolute impacts $|\hat{\beta}_i| / \|\hat{\beta}\|_1$ was suitable for evaluating the impact within the estimated model. However, the regressors $X_{i,t}$ lived on completely different scales. To obtain interpretable impacts with respect to the load $Y_t$, we had to evaluate the time series $\hat{\beta}_i X_{i,t}$, which represented the contribution of each single component to the final model. Therefore, Figure 7 shows a time series plot of the actual load $Y_t$, the benchmark model benchm nowcasts, and the c-ridge* model nowcasts, along with the estimated contributions $\hat{\beta}_i X_{i,t}$ for each regressor $i$.
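A minimal Python sketch of this decomposition follows. It computes the contribution series $\hat{\beta}_i X_{i,t}$ for each regressor and adds them to the intercept to obtain the nowcast. All numbers and regressor names are hypothetical and only illustrate the mechanics.

```python
def contributions(beta, regressors):
    """Return the series beta_i * X_{i,t} for every regressor i."""
    return {
        name: [beta[name] * x for x in series]
        for name, series in regressors.items()
    }


def nowcast(intercept, contrib):
    """Add the intercept to the per-time-step sum of all contributions."""
    n_steps = len(next(iter(contrib.values())))
    return [
        intercept + sum(series[t] for series in contrib.values())
        for t in range(n_steps)
    ]


# Hypothetical coefficients and metered values in MW, for illustration only.
beta_hat = {"interconnector": 1.0, "wind": 0.8, "hydro": -0.5}
X = {
    "interconnector": [5000.0, 5200.0],
    "wind": [3000.0, 2500.0],
    "hydro": [400.0, 600.0],
}

parts = contributions(beta_hat, X)
y_hat = nowcast(2000.0, parts)
```

Because every contribution is expressed in the unit of the load, the series can be stacked in one plot, with negative contributions drawn below zero.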
We observed that for both models, the interconnector, wind, and solar contributed substantially to the final solution. For the c-ridge* nowcast, a very important contribution to $\hat{Y}_t$ came from the annual autoregressive impacts (LAGS_A). These contributions were mainly positive, but some were negative. For the c-ridge* nowcast, a moderate impact could also be seen from nuclear power and hydro. The latter contributed more to the negative side than to the positive, which was a bit surprising, as the fundamental model would suggest a positive impact. Furthermore, the benchmark model had no negative contribution from hydro power. All other generation types had only a minor impact for both considered models. Finally, we observed that the intercept contributed around 2000 MW to the final solution of the c-ridge* model, which was about 10% of the overall load $Y_t$. Remember that about 80% of the load $Y_t$ was metered (by generation units and interconnectors). Thus, of the missing 20% of the load, around half (=10%) seemed to be base load.

6. Summary and Conclusions

We formally introduced the problem of load nowcasting to the energy forecasting literature. In contrast to load forecasting, the recent load of a certain balancing area was predicted based on limited available metering data within this area. Thus, we were predicting the recent past. We introduced an industry benchmark model and multiple high-dimensional linear models to tackle the nowcasting problem. The model design was inspired by load forecasting problems. In addition to the impacts of metered generation and interconnector units, the models had seasonal and autoregressive components to improve the prediction performance. We considered multiple estimation techniques based on lasso and ridge and studied the impact of the choice of the training/calibration period.
The overall results showed that in comparison to the industry benchmark, an accuracy improvement in terms of MAE and RMSE of about 60% was achieved. The best model was based on the ridge estimator and used a specific non-standard shrinkage target. Moreover, we highlighted that the model parameters could be interpreted. The overall results showed that the annual effects (deterministic and autoregressive) contributed significantly to the proposed ridge model.
Future research could investigate more nowcasting models, especially non-linear ones, like artificial neural networks or support vector machines. The study could also be extended to probabilistic nowcasting. The considered nowcasting models could also serve as a basis for the construction of load forecasting models. Here, the generation and interconnector units $X_{i,t}$ would have to be considered in a lagged manner ($X_{i,t-k}$), potentially for multiple lags. In general, many methodologies can be transferred from energy forecasting, especially from short-term load forecasting.
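The lagged design mentioned above can be sketched as follows in Python. The function name, the column naming scheme, and the toy series are assumptions. The sketch only shows how the regressors $X_{i,t-k}$ could be arranged for several lags $k \geq 1$.

```python
def lagged_design(regressors, lags):
    """Build rows [X_{i,t-k} for each regressor i and lag k] for all usable t."""
    if min(lags) < 1:
        raise ValueError("forecasting requires lags k >= 1")
    max_lag = max(lags)
    n_steps = len(next(iter(regressors.values())))
    names = [f"{name}_lag{k}" for name in regressors for k in lags]
    rows = [
        [regressors[name][t - k] for name in regressors for k in lags]
        for t in range(max_lag, n_steps)
    ]
    return names, rows


# Hypothetical metered series, for illustration only.
X = {"wind": [10.0, 11.0, 12.0, 13.0], "solar": [0.0, 1.0, 2.0, 3.0]}
names, rows = lagged_design(X, lags=[1, 2])
```

The first `max(lags)` time steps are dropped because their lagged values are not available, so the load series has to be trimmed to the same range before estimation.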
Finally, the model accuracy might be improved by the use of more external information. In load forecasting, the (average) temperature of an objective area is often seen as highly relevant. Thus, its incorporation into a nowcasting model could be beneficial as well. This information can be added easily by including the temperature (and potential non-linear transformations) as a new regressor in the model. We can also add further dummy variables that characterize known structural breaks, e.g., for changes in the regulation or reshaping of the balancing area. Furthermore, additional metering information would clearly improve the nowcasting accuracy. With respect to renewable energy information from wind and solar power, a finer geographical resolution might improve the nowcasting accuracy, as Figure 7 shows a high importance for a few individual time series of the c-ridge* model with respect to the benchmark model.
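As a hedged illustration of these extensions, the Python sketch below builds temperature regressors with simple non-linear transformations and a dummy variable for a known structural break. The chosen transformations, the reference temperature of 18 °C, and all names are assumptions and are not part of the study.

```python
def temperature_features(temps, reference=18.0):
    """Return columns for temperature, its square, and heating/cooling parts.

    The reference temperature (in degrees Celsius) is an assumed value.
    """
    return {
        "temp": list(temps),
        "temp_sq": [t ** 2 for t in temps],
        "heating": [max(reference - t, 0.0) for t in temps],
        "cooling": [max(t - reference, 0.0) for t in temps],
    }


def break_dummy(n_steps, break_index):
    """Return 0 before a known structural break and 1 from it onwards."""
    return [0.0 if t < break_index else 1.0 for t in range(n_steps)]


# Hypothetical temperatures and break position, for illustration only.
features = temperature_features([5.0, 18.0, 25.0])
dummy = break_dummy(4, 2)
```

Each returned column can be appended to the regressor set of a linear model, so the estimation procedure itself does not have to change.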

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Hong, T.; Fan, S. Probabilistic electric load forecasting: A tutorial review. Int. J. Forecast. 2016, 32, 914–938. [Google Scholar] [CrossRef]
  2. Schumacher, M.; Hirth, L.; How Much Electricity Do We Consume? A Guide to German and European Electricity Consumption and Generation Data (2015). FEEM Working Paper No. 88.2015. Available online: https://ssrn.com/abstract=2715986orhttp://dx.doi.org/10.2139/ssrn.2715986 (accessed on 20 December 2019).
  3. Hirth, L.; Mühlenpfordt, J.; Bulkeley, M. The ENTSO-E Transparency Platform—A review of Europe’s most ambitious electricity data platform. Appl. Energy 2018, 225, 1054–1067. [Google Scholar] [CrossRef]
  4. Gerbec, D.; Gubina, F.; Toros, Z. Actual load profiles of consumers without real time metering. In Proceedings of the IEEE Power Engineering Society General Meeting, San Francisco, CA, USA, 12–16 June 2005; IEEE: Piscataway, NJ, USA, 2005; pp. 2578–2582. [Google Scholar]
  5. Banbura, M.; Giannone, D.; Reichlin, L. Nowcasting. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1717887 (accessed on 20 December 2019).
  6. Sun, J.; Xue, M.; Wilson, J.W.; Zawadzki, I.; Ballard, S.P.; Onvlee-Hooimeyer, J.; Joe, P.; Barker, D.M.; Li, P.W.; Golding, B.; et al. Use of NWP for nowcasting convective precipitation: Recent progress and challenges. Bull. Am. Meteorol. Soc. 2014, 95, 409–426. [Google Scholar] [CrossRef][Green Version]
  7. Xingjian, S.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.c. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Dutchess, NY, USA, 2015; pp. 802–810. [Google Scholar]
  8. Sanfilippo, A. Solar Nowcasting. In Solar Resources Mapping; Springer: Cham, Switzerland, 2019; pp. 353–367. [Google Scholar]
  9. Sala, S.; Amendola, A.; Leva, S.; Mussetta, M.; Niccolai, A.; Ogliari, E. Comparison of Data-Driven Techniques for Nowcasting Applied to an Industrial-Scale Photovoltaic Plant. Energies 2019, 12, 4520. [Google Scholar] [CrossRef][Green Version]
  10. Gaillard, P.; Goude, Y.; Nedellec, R. Additive models and robust aggregation for GEFCom2014 probabilistic electric load and electricity price forecasting. Int. J. Forecast. 2016, 32, 1038–1050. [Google Scholar] [CrossRef]
  11. Ziel, F. Modeling public holidays in load forecasting: A German case study. J. Mod. Power Syst. Clean Energy 2018, 6, 191–207. [Google Scholar] [CrossRef][Green Version]
  12. Ziel, F. Quantile regression for the qualifying match of GEFCom2017 probabilistic load forecasting. Int. J. Forecast. 2019, 35, 1400–1408. [Google Scholar] [CrossRef][Green Version]
  13. Kanda, I.; Veguillas, J.Q. Data preprocessing and quantile regression for probabilistic load forecasting in the GEFCom2017 final match. Int. J. Forecast. 2019, 35, 1460–1468. [Google Scholar] [CrossRef]
  14. Haben, S.; Giasemidis, G.; Ziel, F.; Arora, S. Short term load forecasting and the effect of temperature at the low voltage level. Int. J. Forecast. 2019, 35, 1469–1484. [Google Scholar] [CrossRef][Green Version]
  15. Ziel, F.; Liu, B. Lasso estimation for GEFCom2014 probabilistic electric load forecasting. Int. J. Forecast. 2016, 32, 1029–1037. [Google Scholar] [CrossRef][Green Version]
  16. Dudek, G. Pattern-based local linear regression models for short-term load forecasting. Electr. Power Syst. Res. 2016, 130, 139–147. [Google Scholar] [CrossRef]
  17. Takeda, H.; Tamura, Y.; Sato, S. Using the ensemble Kalman filter for electricity load forecasting and analysis. Energy 2016, 104, 184–198. [Google Scholar] [CrossRef]
  18. Wang, Y.; Gan, D.; Zhang, N.; Xie, L.; Kang, C. Feature selection for probabilistic load forecasting via sparse penalized quantile regression. J. Modern Power Syst. Clean Energy 2019, 7, 1200–1209. [Google Scholar] [CrossRef][Green Version]
  19. Uniejewski, B.; Nowotarski, J.; Weron, R. Automated variable selection and shrinkage for day-ahead electricity price forecasting. Energies 2016, 9, 621. [Google Scholar] [CrossRef][Green Version]
  20. Ambach, D.; Croonenbroeck, C. Space-time short-to medium-term wind speed forecasting. Stat. Methods Appl. 2016, 25, 5–20. [Google Scholar] [CrossRef]
  21. Liu, W.; Dou, Z.; Wang, W.; Liu, Y.; Zou, H.; Zhang, B.; Hou, S. Short-term load forecasting based on elastic net improved GMDH and difference degree weighting optimization. Appl. Sci. 2018, 8, 1603. [Google Scholar] [CrossRef][Green Version]
  22. Kath, C.; Ziel, F. The value of forecasts: Quantifying the economic gains of accurate quarter-hourly electricity price forecasts. Energy Econ. 2018, 76, 411–423. [Google Scholar] [CrossRef][Green Version]
  23. Narajewski, M.; Ziel, F. Econometric modelling and forecasting of intraday electricity prices. J. Commod. Mark. 2019, 100107. [Google Scholar] [CrossRef][Green Version]
  24. Pirbazari, A.M.; Chakravorty, A.; Rong, C. Evaluating feature selection methods for short-term load forecasting. In Proceedings of the 2019 IEEE International Conference on Big Data and Smart Computing (BigComp), Kyoto, Japan, 27 February–2 March 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–8. [Google Scholar]
  25. Muniain, P.; Ziel, F. Probabilistic forecasting in day-ahead electricity markets: Simulating peak and off-peak prices. Int. J. Forecast. 2020. [Google Scholar] [CrossRef][Green Version]
  26. Gneiting, T. Making and evaluating point forecasts. J. Am. Stat. Assoc. 2011, 106, 746–762. [Google Scholar] [CrossRef][Green Version]
Figure 1. Time series plot of the load $Y_t$ and the process $L_t$ with its single components $X_{G,i,t}$ and $X_{I,i,t}$ classified by generation type in the last week of April 2019.
Figure 2. (Left) Scatter plot of the process $L_t$ (see (3)) and load $Y_t$ in April 2018 with the fitted line of Model (4). (Right) Time series plot of $Y_t$, $L_t$, and $\hat{Y}_t = \hat{\alpha}_0 + \hat{\alpha}_1 L_t$ for the last week of April 2019 as in Figure 1.
Figure 3. Illustration of the nowcasting study design.
Figure 4. Graph of $\|\hat{\beta}\|_1$ against MAE (left) and RMSE (right) of the selected lasso and ridge models, illustrating the solution paths for different λ values. The darker the color, the smaller the shrinkage (black = OLS).
Figure 5. Intraday prediction accuracy in $\mathrm{MAE}_s$ and $\mathrm{RMSE}_s$ of selected models.
Figure 6. Bar chart of the absolute impact $|\hat{\beta}_i| / \|\hat{\beta}\|_1$ of Model c-ridge* for 2years and benchm for 1month grouped by parameter type.
Figure 7. Time series plot of the actual load $Y_t$ (black), with the fitted model of the benchmark model (red) and the c-ridge* approach (blue) on 6–12 August 2018. Additionally, the estimated impact of the single components $\hat{\beta}_i X_{i,t}$ for the c-ridge* model (bottom) and benchmark model (top) classified by type with different colors is illustrated.
Table 1. Out-of-sample MAE in MW with relative improvement (Imp.) in % with respect to the benchmark trained on the shortest training period for all models and training periods. A heat map is used to indicate better (→ green) and worse (→ red) performing models.

| Period | benchm MAE | benchm Imp. | c-ridge* MAE | c-ridge* Imp. | 0-ridge* MAE | 0-ridge* Imp. | 0-lasso* MAE | 0-lasso* Imp. | c-ridge MAE | c-ridge Imp. | 0-ridge MAE | 0-ridge Imp. | 0-lasso MAE | 0-lasso Imp. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3years | 1302.7 | −18.3 | 453.6 | 58.8 | 483.6 | 56.1 | 509.5 | 53.7 | 452.1 | 58.9 | 481.4 | 56.3 | 507.0 | 53.9 |
| 2years | 1328.8 | −20.7 | 430.0 | 60.9 | 474.1 | 56.9 | 487.8 | 55.7 | 428.7 | 61.1 | 469.0 | 57.4 | 484.7 | 56.0 |
| 1year | 1290.5 | −17.2 | 653.9 | 40.6 | 588.7 | 46.5 | 591.0 | 46.3 | 630.5 | 42.7 | 581.7 | 47.2 | 588.8 | 46.5 |
| 4months | 1130.2 | −2.7 | 934.3 | 15.1 | 549.5 | 50.1 | 583.8 | 47.0 | 923.2 | 16.1 | 538.3 | 51.1 | 578.6 | 47.4 |
| 2months | 1097.9 | 0.3 | 944.5 | 14.2 | 602.4 | 45.3 | 626.6 | 43.1 | 919.6 | 16.5 | 593.8 | 46.1 | 617.2 | 43.9 |
| 1month | 1100.9 | 0.0 | 918.0 | 16.6 | 607.1 | 44.9 | 635.0 | 42.3 | 913.1 | 17.1 | 604.1 | 45.1 | 629.3 | 42.8 |
Table 2. Out-of-sample RMSE in MW with relative improvement (Imp.) in % with respect to the benchmark trained on the shortest training period for all models and training periods. A heat map is used to indicate better (→ green) and worse (→ red) performing models.

| Period | benchm RMSE | benchm Imp. | c-ridge* RMSE | c-ridge* Imp. | 0-ridge* RMSE | 0-ridge* Imp. | 0-lasso* RMSE | 0-lasso* Imp. | c-ridge RMSE | c-ridge Imp. | 0-ridge RMSE | 0-ridge Imp. | 0-lasso RMSE | 0-lasso Imp. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3years | 1556.0 | −18.8 | 578.9 | 55.8 | 710.0 | 45.8 | 868.5 | 33.7 | 582.2 | 55.5 | 713.0 | 45.6 | 825.0 | 37.0 |
| 2years | 1562.4 | −19.3 | 560.4 | 57.2 | 705.1 | 46.2 | 759.5 | 42.0 | 556.8 | 57.5 | 699.5 | 46.6 | 721.9 | 44.9 |
| 1year | 1460.6 | −11.5 | 1051.3 | 19.7 | 858.9 | 34.4 | 940.9 | 28.2 | 919.9 | 29.8 | 817.2 | 37.6 | 923.3 | 29.5 |
| 4months | 1332.9 | −1.8 | 1185.3 | 9.5 | 776.6 | 40.7 | 960.9 | 26.6 | 1102.3 | 15.8 | 754.6 | 42.4 | 880.6 | 32.8 |
| 2months | 1299.5 | 0.8 | 1274.3 | 2.7 | 877.1 | 33.0 | 975.9 | 25.5 | 1121.3 | 14.4 | 828.2 | 36.8 | 966.9 | 26.2 |
| 1month | 1309.7 | 0.0 | 1147.9 | 12.4 | 850.3 | 35.1 | 917.6 | 29.9 | 1150.5 | 12.2 | 858.2 | 34.5 | 914.5 | 30.2 |
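
The error measures and the improvement columns of Tables 1 and 2 can be reproduced with the short Python sketch below. The improvement formula, 100 · (1 − score / reference score), is inferred from the tabulated values rather than quoted from the text, and the function names are assumptions.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def improvement_pct(score, reference_score):
    """Relative improvement in % over the reference (positive = better)."""
    return 100.0 * (1.0 - score / reference_score)


# Values from Table 1: c-ridge* with 2years against benchm with 1month.
imp_mae = improvement_pct(430.0, 1100.9)
# Values from Table 2: c-ridge* with 2years against benchm with 1month.
imp_rmse = improvement_pct(560.4, 1309.7)
```

Rounded to one decimal place, these two calls give the 60.9% and 57.2% reported for the c-ridge* model with the 2years training period.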