# Load Nowcasting: Predicting Actuals with Limited Data

## Abstract


## 1. Introduction and Motivation

## 2. The Nowcasting Problem

#### 2.1. Formal Problem Description

#### 2.2. Data and Problem Illustration

## 3. Nowcasting Models

#### 3.1. Benchmark Model

#### 3.2. Proposed Nowcasting Model

#### 3.3. Estimation of Proposed Nowcasting Model

**m** and **s** impacted the estimation substantially. Usually, the scaling coefficients **m** and **s** in (10) are standardized so that $\mathbb{Y}({\mathit{m}}_{0},{\mathit{s}}_{0})$ remains unchanged by ${\mathit{m}}_{0}=0$ and ${\mathit{s}}_{0}=1$, and ${\mathbb{X}}_{\mathit{i}}({\mathit{m}}_{\mathit{i}},{\mathit{s}}_{\mathit{i}})$ has mean zero and standard deviation of one, i.e., it holds that ${\mathbb{X}}_{\mathit{i}}{({\mathit{m}}_{\mathit{i}},{\mathit{s}}_{\mathit{i}})}^{\prime}1=0$ and $\parallel {\mathbb{X}}_{\mathit{i}}({\mathit{m}}_{\mathit{i}},{\mathit{s}}_{\mathit{i}}){\parallel}_{2}=1$. The latter can be achieved by choosing ${\mathit{m}}_{\mathit{i}}={\mathit{n}}_{\mathbb{T}}^{-1}{\mathbb{X}}_{\mathit{i}}^{\prime}1$ and ${\mathit{s}}_{\mathit{i}}=\sqrt{{\mathit{n}}_{\mathbb{T}}^{-1}{({\mathbb{X}}_{\mathit{i}}-{\mathit{m}}_{\mathit{i}}1)}^{\prime}({\mathbb{X}}_{\mathit{i}}-{\mathit{m}}_{\mathit{i}}1)}$. This scaling procedure is standard in the literature and is, e.g., the default in the `glmnet` or `lars` packages in `R` for elastic net and lasso estimation with $\mathit{c}=0$.
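The choice of ${\mathit{m}}_{\mathit{i}}$ and ${\mathit{s}}_{\mathit{i}}$ above can be illustrated with a minimal Python sketch. It assumes that the scaled regressor is $({\mathbb{X}}_{\mathit{i}}-{\mathit{m}}_{\mathit{i}}1)/{\mathit{s}}_{\mathit{i}}$, which is the usual convention but is not confirmed by the text shown here. The function name `standardize` and the sample values are hypothetical.

```python
import math

def standardize(x):
    """Center x by its mean m and scale by its (population) standard deviation s.

    Assumption: the scaled regressor is (x - m) / s, with
    m = n^{-1} * sum(x) and s = sqrt(n^{-1} * sum((x - m)^2)).
    """
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / n)
    return [(v - m) / s for v in x], m, s

# Hypothetical regressor values, for illustration only.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z, m, s = standardize(x)

# Mean and standard deviation of the scaled regressor.
z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum(v * v for v in z) / len(z))
print(m, s, z_mean, z_sd)
```

For this sample, `m` is 5 and `s` is 2, and the scaled values have mean zero and standard deviation one.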

**0-ridge** and **0-lasso**.

**c-ridge**. The reason why this choice was not applied to lasso or elastic net estimators with $\mathit{\alpha}>0$ was the unavailability of efficient estimation algorithms.

## 4. Nowcasting Study

- (i) All available data from the past 37 months (three years plus one month): $(365\times 3+30-90)\times 96=99{,}360$ observations of ${\mathit{Y}}_{\mathit{t}}$, denoted as **3years**
- (ii) All available data from the past 25 months (two years plus one month): $(365\times 2+30-90)\times 96=64{,}320$ observations of ${\mathit{Y}}_{\mathit{t}}$, denoted as **2years**
- (iii) All available data from the past 13 months (one year plus one month): $(365+30-90)\times 96=29{,}280$ observations of ${\mathit{Y}}_{\mathit{t}}$, denoted as **1year**
- (iv) Data of the past year, 120 days centered around the nowcasting day of the past year: $120\times 96=11{,}520$ observations of ${\mathit{Y}}_{\mathit{t}}$, denoted as **4months**
- (v) Data of the past year, 60 days centered around the nowcasting day of the past year: $60\times 96=5760$ observations of ${\mathit{Y}}_{\mathit{t}}$, denoted as **2months**
- (vi) Data of the past year, 30 days centered around the nowcasting day of the past year: $30\times 96=2880$ observations of ${\mathit{Y}}_{\mathit{t}}$, denoted as **1month**
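The observation counts of the six training periods follow directly from the number of days and the 96 observations per day used in the list above. The short Python sketch below reproduces this arithmetic. The dictionary names are illustrative only.

```python
OBS_PER_DAY = 96  # factor used in the list above (24 hours x 4 quarter-hours)

# Number of days in each training period, as given in items (i)-(vi).
days = {
    "3years": 365 * 3 + 30 - 90,
    "2years": 365 * 2 + 30 - 90,
    "1year": 365 + 30 - 90,
    "4months": 120,
    "2months": 60,
    "1month": 30,
}

# Number of observations of Y_t per training period.
n_obs = {name: d * OBS_PER_DAY for name, d in days.items()}

for name, n in n_obs.items():
    print(f"{name}: {days[name]} days, {n} observations")
```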

**benchm**, **0-lasso**($\mathit{\lambda}$), **0-ridge**($\mathit{\lambda}$), and **c-ridge**($\mathit{\lambda}$) in the rolling window forecasting study. As emphasized, the lasso and ridge models depended on the tuning parameter $\mathit{\lambda}$, which we had to specify. For all models, we considered exponential grids $\mathsf{\Lambda}$ for $\mathit{\lambda}$. In detail, for the ridge models, we chose $\mathsf{\Lambda}={2}^{{\mathcal{L}}_{\mathit{r}}}$ with ${\mathcal{L}}_{\mathit{r}}$ as an equidistant grid from −10 to 20 of length 100, and for the lasso models, $\mathsf{\Lambda}={2}^{{\mathcal{L}}_{\mathit{l}}}$ with ${\mathcal{L}}_{\mathit{l}}$ as an equidistant grid from −30 to 3 of length 100. Of course, we did not know the optimal $\mathit{\lambda}$ in advance. Therefore, we considered for the **0-lasso**, **0-ridge**, and **c-ridge** models a version where $\mathit{\lambda}$ was chosen based on the past performance (cumulated loss) of the corresponding models, initializing with $\mathit{\lambda}=1$ for the first prediction. We denoted these models by **0-lasso${}^{*}$**, **0-ridge${}^{*}$**, and **c-ridge${}^{*}$**.
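The grid construction and the selection of $\mathit{\lambda}$ from past performance can be sketched as follows. This is a minimal illustration under assumptions: the loss values are taken as given (the text only says "cumulated loss"), ties are resolved in favor of the smallest grid index, and the function names `exp_grid` and `rolling_selection` are hypothetical.

```python
def exp_grid(lo, hi, length=100, base=2.0):
    """Exponential grid base**L, with L equidistant from lo to hi."""
    step = (hi - lo) / (length - 1)
    return [base ** (lo + k * step) for k in range(length)]

ridge_grid = exp_grid(-10, 20)  # ridge models: exponents from -10 to 20
lasso_grid = exp_grid(-30, 3)   # lasso models: exponents from -30 to 3

def rolling_selection(loss_rows, grid, initial=1.0):
    """Choose lambda for each step from the cumulated loss of earlier steps.

    loss_rows[t][k] is the loss of grid[k] at step t, which is only known
    after the prediction for step t was made. The first prediction uses
    `initial` (lambda = 1 in the text).
    """
    cumulated = None
    chosen = []
    for row in loss_rows:
        if cumulated is None:
            chosen.append(initial)
            cumulated = list(row)
        else:
            best = min(range(len(grid)), key=lambda k: cumulated[k])
            chosen.append(grid[best])
            cumulated = [c + l for c, l in zip(cumulated, row)]
    return chosen

# Toy example with a three-point grid and hypothetical losses.
toy_grid = [0.5, 1.0, 2.0]
toy_losses = [[3.0, 2.0, 1.0], [0.0, 5.0, 5.0], [0.0, 0.0, 0.0]]
toy_choice = rolling_selection(toy_losses, toy_grid)
print(toy_choice)
```

In the toy example, the first step uses the initial value 1.0, the second step picks 2.0 (smallest loss so far), and the third step switches to 0.5 once its cumulated loss is the lowest.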

## 5. Results

#### 5.1. Nowcasting Performance

**benchm** estimated on the shortest training period **1month**. Remember that **ridge${}^{*}$** and **lasso${}^{*}$** chose the tuning parameter based on the past performance, whereas ridge and lasso represented the models that gave, ex post, the best prediction accuracy on the $\mathit{\lambda}$-grid $\mathsf{\Lambda}$.

**c-ridge** (or **c-ridge${}^{*}$**) model calibrated on the training period of **2years**. Second, we see that the **ridge${}^{*}$** and **lasso${}^{*}$** models showed almost the same performance as **ridge** and **lasso**, which indicated that selecting $\mathit{\lambda}$ from past performance, rather than ex post, was not a big problem. Next, the benchmark model **benchm** with short calibration periods of **1month** and **2months** showed the best prediction accuracy among the benchmark variants. In contrast, the ridge and lasso approaches showed that long training periods of **2years** and **3years** performed best. The reason was likely that the estimation of many parameters required more data to obtain stable parameter estimates. Figure 4 illustrates the solution path of the ridge and lasso models for the calibration period **2years**, which uses about two years of data.

**c-ridge**($\mathit{\lambda}$) (red circle), are represented by the color intensity. The darker the color of the symbol within the solution path, the smaller $\mathit{\lambda}$. Thus, black symbols correspond to the OLS solution.

**c-ridge**($\mathit{\lambda}$), **0-ridge**($\mathit{\lambda}$), and **0-lasso**($\mathit{\lambda}$) converged to the OLS solution for small $\mathit{\lambda}$. The OLS solution had an MAE of around 500 MW and an RMSE of slightly above 700 MW with an ${\parallel \cdot \parallel}_{1}$-norm of $\mathit{\beta}$ of around $5.5$. We clearly see that for large $\mathit{\lambda}$ values, **0-ridge**($\mathit{\lambda}$) and **0-lasso**($\mathit{\lambda}$) obtained smaller $\mathit{\beta}$ values and tended towards the $0$ solution. In contrast, **c-ridge**($\mathit{\lambda}$) always had a similar range of the ${\parallel \cdot \parallel}_{1}$-norm of $\mathit{\beta}$. The corresponding MAE and RMSE minima had a ${\parallel \cdot \parallel}_{1}$-norm of around 5.2, which is of similar magnitude to that of the OLS solution. Thus, the parameter complexity of both solutions was comparable, but the parameters were better selected by the **c-ridge** approach due to the shrinkage towards a reasonable target, instead of $0$.
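The difference between shrinkage towards $0$ and shrinkage towards a target can be seen in a one-parameter toy case. The sketch below assumes a penalty of the form $\mathit{\lambda}{(\mathit{\beta}-\mathit{c})}^{2}$ and no intercept, which gives the closed form $\widehat{\mathit{\beta}}=(\sum x_t y_t+\mathit{\lambda}\mathit{c})/(\sum x_t^{2}+\mathit{\lambda})$. This is an illustration only: the actual **c-ridge** model is estimated with a quadratic programming solver and may differ in its details. The data and the target value $\mathit{c}=1$ are hypothetical.

```python
def ridge_towards_target(x, y, lam, c=0.0):
    """One-parameter ridge estimate that shrinks towards the target c.

    Minimizes sum((y_t - beta * x_t)^2) + lam * (beta - c)^2.
    With c = 0 this is the ordinary ridge estimate.
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return (sxy + lam * c) / (sxx + lam)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0]
y = [1.1, 1.9, 3.2]

beta_ols = ridge_towards_target(x, y, lam=0.0)
for lam in [0.0, 1.0, 14.0, 1e6]:
    b0 = ridge_towards_target(x, y, lam, c=0.0)  # shrinks towards 0
    bc = ridge_towards_target(x, y, lam, c=1.0)  # shrinks towards c = 1
    print(f"lambda={lam:g}: 0-ridge={b0:.4f}, c-ridge={bc:.4f}")
```

For $\mathit{\lambda}=0$ both estimates equal the OLS solution. As $\mathit{\lambda}$ grows, the first estimate tends to $0$ while the second stays between the OLS solution and the target, so its magnitude changes little when the target is close to the OLS solution.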

**c-ridge${}^{*}$** varied between 383 MW and 484 MW, which was a variation of around 100 MW. The intraday variation of the MAE of the benchmark model was around 300 MW and thus considerably larger. However, as the overall forecasting error was reduced by 60%, the relative variation of the MAE forecasting performance remained at a similar level.

**c-ridge**, **0-ridge**, and **0-lasso** models on the full $\mathit{\lambda}$-grid with a training period of **2years** took $3.0$ s, $0.5$ s, and $2.3$ s, respectively. These times were measured on a standard computer using a simple CPU. The ridge models were estimated using the `solve.QP` function of the `R` package `quadprog`, and the lasso model was trained and calibrated using the `glmnet` function of the `R` package `glmnet`.

#### 5.2. Model Interpretation

**c-ridge${}^{*}$** model with a training period of about two years as well as the benchmark model **benchm** with a training period of about a month are illustrated in the bar chart in Figure 6. As the full model had many parameters, we grouped the impacts $|{\widehat{\mathit{\beta}}}_{\mathit{i}}|/\parallel \widehat{\mathit{\beta}}{\parallel}_{1}$ by parameter type to maintain readable results.
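The grouping of the relative impacts $|{\widehat{\mathit{\beta}}}_{\mathit{i}}|/\parallel \widehat{\mathit{\beta}}{\parallel}_{1}$ by parameter type can be sketched in a few lines of Python. The coefficient values, regressor names, and the group label `GEN` are hypothetical and serve only to show the computation. `EXT_D` and `LAGS_A` are used as example labels.

```python
from collections import defaultdict

def grouped_relative_impact(beta, groups):
    """Sum |beta_i| / ||beta||_1 over all regressors of the same type."""
    l1_norm = sum(abs(b) for b in beta.values())
    impact = defaultdict(float)
    for name, b in beta.items():
        impact[groups[name]] += abs(b) / l1_norm
    return dict(impact)

# Hypothetical estimated coefficients and their parameter types.
beta_hat = {"nuclear": 0.5, "hydro": -0.25, "lag_1y": 1.0, "ext_daily": 0.25}
param_type = {
    "nuclear": "GEN",
    "hydro": "GEN",
    "lag_1y": "LAGS_A",
    "ext_daily": "EXT_D",
}

impact = grouped_relative_impact(beta_hat, param_type)
print(impact)
```

By construction, the grouped impacts are non-negative and sum to one, which makes them directly comparable across models with different numbers of parameters.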

**c-ridge${}^{*}$** model, and this held for both types of effects: those from deterministic external regressors (EXT_A) and autoregressive effects (LAGS_A). Furthermore, the daily seasonal component (EXT_D) showed about a 3.5% contribution to the overall solution. For the generation units, we observed that all reduced their absolute impact in the **c-ridge${}^{*}$** model with respect to the benchmark model. However, all parameters remained relevant.

**benchm** nowcasts, and the **c-ridge${}^{*}$** model nowcasts, along with the estimated contributions ${\widehat{\mathit{\beta}}}_{\mathit{i}}{\mathit{X}}_{\mathit{i},\mathit{t}}$ for each regressor $\mathit{i}$.

**c-ridge${}^{*}$** nowcast, a very important contribution to ${\widehat{\mathit{Y}}}_{\mathit{t}}$ came from the annual autoregressive impacts (LAG_A). It mainly had positive contributions, but also some negative contributions. For the **c-ridge${}^{*}$** nowcast, some moderate impact could be seen from nuclear power and hydro. The latter contributed more to the negative side than to the positive, which was a bit surprising, as the fundamental model would suggest a positive impact. Furthermore, the benchmark model had no negative contribution from hydro power. All other generation types had only a minor impact for both considered models. Finally, we observed that the intercept contributed around 2000 MW to the final nowcast of the **c-ridge${}^{*}$** model, which was about 10% of the overall load ${\mathit{Y}}_{\mathit{t}}$. Remember that about 80% of the load ${\mathit{Y}}_{\mathit{t}}$ was metered (by generation units and interconnectors). Thus, of the missing 20% of the load, around half (about 10%) seemed to be base load.
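The decomposition of a nowcast into an intercept and the contributions ${\widehat{\mathit{\beta}}}_{\mathit{i}}{\mathit{X}}_{\mathit{i},\mathit{t}}$ can be sketched as below. The sketch assumes a plain linear model on unscaled regressors, and all coefficient and regressor values are hypothetical. They are not estimates from the study.

```python
def decompose_nowcast(intercept, beta, x_t):
    """Split a linear nowcast into per-regressor contributions beta_i * X_{i,t}."""
    contributions = {name: beta[name] * x_t[name] for name in beta}
    y_hat = intercept + sum(contributions.values())
    return y_hat, contributions

# Hypothetical coefficients and regressor values (in MW) at one time point t.
intercept = 2000.0
beta_hat = {"nuclear": 1.5, "hydro": -0.5, "lag_1y": 0.25}
x_t = {"nuclear": 4000.0, "hydro": 1000.0, "lag_1y": 40000.0}

y_hat, contributions = decompose_nowcast(intercept, beta_hat, x_t)
intercept_share = intercept / y_hat
print(y_hat, contributions, round(intercept_share, 3))
```

A negative coefficient, as for the hypothetical hydro regressor here, yields a negative contribution at every time point with positive generation, which is the pattern discussed above.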

## 6. Summary and Conclusions

**c-ridge${}^{*}$** model with respect to the benchmark model.

## Funding

## Conflicts of Interest


**Figure 1.** Time series plot of the load ${Y}_{t}$ and the process ${L}_{t}$ with its single components ${X}_{G,i,t}$ and ${X}_{I,i,t}$ classified by generation type in the last week of April 2019.

**Figure 2.** (Left) Scatter plot of the process ${L}_{t}$ (see (3)) and load ${Y}_{t}$ in April 2018 with the fitted line of Model (4). (Right) Time series plot of ${Y}_{t}$, ${L}_{t}$, and ${\widehat{Y}}_{t}={\widehat{\alpha}}_{0}+{\widehat{\alpha}}_{1}{L}_{t}$ for the last week of April 2019 as in Figure 1.

**Figure 4.** Graph of $\parallel \widehat{\mathit{\beta}}{\parallel}_{1}$ against MAE (left) and RMSE (right) of the selected lasso and ridge models, illustrating the solution paths for different $\mathit{\lambda}$ values. The darker the color, the smaller the shrinkage (black = OLS).

**Figure 5.** Intraday prediction accuracy in MAE${}_{\mathit{s}}$ and RMSE${}_{\mathit{s}}$ of selected models.

**Figure 6.** Bar chart of the absolute impact $|{\widehat{\mathit{\beta}}}_{\mathit{i}}|/\parallel \widehat{\mathit{\beta}}{\parallel}_{1}$ of Model **c-ridge${}^{*}$** for **2years** and **benchm** for **1month** grouped by parameter type.

**Figure 7.** Time series plot of the actual load ${\mathit{Y}}_{\mathit{t}}$ (black), with the fitted model of the benchmark model (red) and the **c-ridge${}^{*}$** approach (blue) on 6–12 August 2018. Additionally, the estimated impact of the single components ${\widehat{\mathit{\beta}}}_{\mathit{i}}{\mathit{X}}_{\mathit{i},\mathit{t}}$ for the **c-ridge${}^{*}$** model (bottom) and benchmark model (top) classified by type with different colors is illustrated.

**Table 1.** Out-of-sample MAE in MW with relative improvement (Imp.) in % with respect to the benchmark trained on the shortest training period for all models and training periods. A heat map is used to indicate better (→ green) and worse (→ red) performing models.

Period | benchm MAE | benchm Imp. | c-ridge${}^{*}$ MAE | c-ridge${}^{*}$ Imp. | 0-ridge${}^{*}$ MAE | 0-ridge${}^{*}$ Imp. | 0-lasso${}^{*}$ MAE | 0-lasso${}^{*}$ Imp. | c-ridge MAE | c-ridge Imp. | 0-ridge MAE | 0-ridge Imp. | 0-lasso MAE | 0-lasso Imp. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3years | 1302.7 | −18.3 | 453.6 | 58.8 | 483.6 | 56.1 | 509.5 | 53.7 | 452.1 | 58.9 | 481.4 | 56.3 | 507.0 | 53.9 |
2years | 1328.8 | −20.7 | 430.0 | 60.9 | 474.1 | 56.9 | 487.8 | 55.7 | 428.7 | 61.1 | 469.0 | 57.4 | 484.7 | 56.0 |
1year | 1290.5 | −17.2 | 653.9 | 40.6 | 588.7 | 46.5 | 591.0 | 46.3 | 630.5 | 42.7 | 581.7 | 47.2 | 588.8 | 46.5 |
4months | 1130.2 | −2.7 | 934.3 | 15.1 | 549.5 | 50.1 | 583.8 | 47.0 | 923.2 | 16.1 | 538.3 | 51.1 | 578.6 | 47.4 |
2months | 1097.9 | 0.3 | 944.5 | 14.2 | 602.4 | 45.3 | 626.6 | 43.1 | 919.6 | 16.5 | 593.8 | 46.1 | 617.2 | 43.9 |
1month | 1100.9 | 0.0 | 918.0 | 16.6 | 607.1 | 44.9 | 635.0 | 42.3 | 913.1 | 17.1 | 604.1 | 45.1 | 629.3 | 42.8 |

**Table 2.** Out-of-sample RMSE in MW with relative improvement (Imp.) in % with respect to the benchmark trained on the shortest training period for all models and training periods. A heat map is used to indicate better (→ green) and worse (→ red) performing models.

Period | benchm RMSE | benchm Imp. | c-ridge${}^{*}$ RMSE | c-ridge${}^{*}$ Imp. | 0-ridge${}^{*}$ RMSE | 0-ridge${}^{*}$ Imp. | 0-lasso${}^{*}$ RMSE | 0-lasso${}^{*}$ Imp. | c-ridge RMSE | c-ridge Imp. | 0-ridge RMSE | 0-ridge Imp. | 0-lasso RMSE | 0-lasso Imp. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3years | 1556.0 | −18.8 | 578.9 | 55.8 | 710.0 | 45.8 | 868.5 | 33.7 | 582.2 | 55.5 | 713.0 | 45.6 | 825.0 | 37.0 |
2years | 1562.4 | −19.3 | 560.4 | 57.2 | 705.1 | 46.2 | 759.5 | 42.0 | 556.8 | 57.5 | 699.5 | 46.6 | 721.9 | 44.9 |
1year | 1460.6 | −11.5 | 1051.3 | 19.7 | 858.9 | 34.4 | 940.9 | 28.2 | 919.9 | 29.8 | 817.2 | 37.6 | 923.3 | 29.5 |
4months | 1332.9 | −1.8 | 1185.3 | 9.5 | 776.6 | 40.7 | 960.9 | 26.6 | 1102.3 | 15.8 | 754.6 | 42.4 | 880.6 | 32.8 |
2months | 1299.5 | 0.8 | 1274.3 | 2.7 | 877.1 | 33.0 | 975.9 | 25.5 | 1121.3 | 14.4 | 828.2 | 36.8 | 966.9 | 26.2 |
1month | 1309.7 | 0.0 | 1147.9 | 12.4 | 850.3 | 35.1 | 917.6 | 29.9 | 1150.5 | 12.2 | 858.2 | 34.5 | 914.5 | 30.2 |
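The improvement columns in Tables 1 and 2 are consistent with the relation $\text{Imp.}=100\,(1-\text{error}/\text{error}_{\text{ref}})$, where the reference is **benchm** trained on **1month**. The Python sketch below checks this for a few table entries and also states the standard definitions of MAE and RMSE. The function names and the two toy error values are illustrative.

```python
import math

def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def improvement(error, reference):
    """Relative improvement in % with respect to a reference error."""
    return 100.0 * (1.0 - error / reference)

# Reference values: benchm trained on 1month (Tables 1 and 2).
MAE_REF, RMSE_REF = 1100.9, 1309.7

imp_cridge_3y_mae = round(improvement(453.6, MAE_REF), 1)    # c-ridge*, 3years
imp_benchm_3y_mae = round(improvement(1302.7, MAE_REF), 1)   # benchm, 3years
imp_cridge_3y_rmse = round(improvement(578.9, RMSE_REF), 1)  # c-ridge*, 3years
print(imp_cridge_3y_mae, imp_benchm_3y_mae, imp_cridge_3y_rmse)

# Toy forecast errors in MW, for illustration only.
toy_errors = [3.0, -4.0]
print(mae(toy_errors), rmse(toy_errors))
```

A negative improvement, as for **benchm** on the longer training periods, means that the model performed worse than the reference.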

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ziel, F.
Load Nowcasting: Predicting Actuals with Limited Data. *Energies* **2020**, *13*, 1443.
https://doi.org/10.3390/en13061443
