Article

Time Series Decomposition of the Daily Outdoor Air Temperature in Europe for Long-Term Energy Forecasting in the Context of Climate Change

by Santiago Moreno-Carbonell, Eugenio F. Sánchez-Úbeda * and Antonio Muñoz
Institute for Research in Technology (IIT), ICAI School of Engineering, Comillas Pontifical University, 28015 Madrid, Spain
* Author to whom correspondence should be addressed.
Energies 2020, 13(7), 1569; https://doi.org/10.3390/en13071569
Submission received: 22 February 2020 / Revised: 20 March 2020 / Accepted: 24 March 2020 / Published: 29 March 2020
(This article belongs to the Special Issue Solutions to Climate Emergency)

Abstract

Temperature is widely recognized as one of the most important drivers for forecasting electricity and gas variables, such as the load. For that reason, temperature forecasting has long been of great interest to energy forecasters, and several approaches and methods have been published. However, these methods usually do not consider the temperature trend, which causes significant error increases in medium- or long-term estimations. This paper presents several temperature forecasting methods based on time series decomposition and analyzes their results and the trends of 37 different European countries, proving their annual average temperature increase and their different behaviors regarding trend and seasonal components.

1. Introduction

The main goal of large electric and natural gas utility companies is to provide energy to customers, spread out through large regions such as countries or states. These companies devote great efforts to optimize all processes required to produce and deliver electricity and natural gas, due to the high costs of building, maintaining and operating the involved infrastructure. Within this context, short-term demand forecasting is carried out in order to ensure the reliability of supply in daily operation, whereas medium- and long-term demand forecasting is the basis for effective operation and planning (see e.g., Reference [1]).
It is well known that meteorological conditions have a significant influence on end-use energy consumption. Among all the factors derived from weather variables such as solar radiation, humidity, wind speed, cloudiness, or rainfall, outdoor air temperature is the main weather driver of electricity and natural gas demand (see e.g., References [2,3]). For example, residential and commercial natural gas consumption by end use is primarily linked with heating (including hot water) and cooking. Electricity is used not only for heating and cooking, but also for a variety of other purposes, including lighting and cooling. Therefore, these energy consumption categories are clearly influenced by outdoor air temperature. Note that the impact of weather factors can vary depending on the geographical location, climate and industrial structure of the region.
Focusing on electric load forecasting, the non-linear relationship between temperature and electricity demand has been widely studied, and many papers use temperature as the main driver (see Reference [4]). In fact, a large number of examples can be found among the participants of the Global Energy Forecasting Competitions (GEFCom) of 2012, 2014 and 2017 (see References [5,6,7], respectively). For that reason, temperature forecasting has for years been of great interest for energy forecasting. Furthermore, in the last few years interest in probabilistic load forecasting has grown, and generating several temperature scenarios to feed a point load forecasting model is a very common approach. For example, in Reference [8] the authors review several methods for temperature scenario generation and provide some guidelines, focusing on electric load forecasting.
Decomposition methods are a common and useful approach for time series forecasting (including temperature forecasting), since they allow the different underlying patterns to be analyzed separately [9]. Figure 1 shows 40 years of daily minimum and maximum temperatures (1980–2019) from weather stations (WS) in four European countries, specifically Spain (ES), France (FR), Germany (DE) and Sweden (SE); these series present strong seasonality within each year. Regarding temperature trends, Figure 1 does not show an evident annual increase, but the effect of considering the trend or not will be discussed in this paper. Furthermore, in order to reflect the underlying seasonal patterns more clearly, Figure 2 shows seasonal plots for each WS.
Time series decomposition methods usually split the time series into three main components or underlying patterns: trend (or trend-cycle), seasonal and remainder. In the context of climate change, the scientific consensus regarding human-caused global warming exceeds 90% according to Reference [10], and interest in temperature trend analysis has increased. For example, in Reference [11], the authors analyze monthly European temperatures, showing that most of the trend components in the time series are positive and linear. Regarding temperature forecasting, Reference [12] proposes a load-based temperature forecasting model tested with and without the trend as an input variable, without finding significant error improvements. However, in that paper the authors use 2 years (2008 and 2009) as the training set to forecast 2010. What happens if we want to forecast a longer period? Would the inclusion of the trend be negligible for our model? Or, on the contrary, would the error increase if we did not consider it? Regarding the seasonal component, several different methods will be tested. Finally, the remainder, when dealing with medium- or long-term forecasting, is usually assumed to be uncorrelated and normally distributed with zero mean and unknown variance. These residuals of the trend and seasonal components typically model cold and heat waves, such as those described in Reference [13]. Here we make the aforementioned assumption, forecasting the expected value of temperature based on its trend and seasonal components.
This paper, focusing on long-term forecasting (according to Reference [4], more than three years) and time series decomposition methods, aims to answer three questions. First, does trend inclusion improve the performance of our models? Our results, based on 6 different temperature forecasting models, indicate that in most cases the answer is yes. Secondly, which method behaves better regarding temperature forecasting? Finally, once those two main questions have been answered, what do our models say about the behavior of European minimum and maximum temperatures?

2. Temperature Time Series Decomposition

Time series decomposition involves separating the time series into several distinct components of interest. As stated in Reference [9], it is often helpful to split time series into several components, each one representing an underlying pattern category. As the magnitude of the annual fluctuations does not vary with the level of the temperature time series, the additive decomposition is the most appropriate for temperature time series:
$$y_t = D_t + R_t = T_t + S_t + R_t, \tag{1}$$
where $y_t$ is the daily temperature at time $t$, and $T_t$, $S_t$ and $R_t$ are the trend, seasonal and remainder components, all at time $t$, respectively. The sum of the trend and the seasonal components, $D_t$, represents the deterministic component of $y_t$, whereas the deviations of $y_t$ around the expected value given by $D_t$ are represented by the remainder component, that is, those variations not explained by the deterministic one.
Although decomposition is primarily useful for studying historical changes over time, it can also be used in forecasting. In this paper, the deterministic component is projected into the future to estimate the expected daily temperatures. Thus, the trend and seasonal components have to be modeled to allow not only their accurate estimation, but reliable extrapolation into the future as well.
Regarding temperature time series decomposition, and as can be seen in Figure 1, if the trend exists, it should be very weak. For that reason, a simple linear regression on the input time $t$ has been used, as a robust and reliable model in line with Reference [11]. Regarding the trend component, we have ignored possible cyclic behavior of non-fixed frequency. Concerning the seasonal component, as in classical decomposition, it is assumed in this paper to be constant from year to year. For daily temperature time series this is a reasonable assumption in long-term forecasting. Finally, modelling the remainder component is out of the scope of this paper, since our goal is to model the deterministic component and measure the impact of the trend when dealing with long-term forecasts.
Among all possible alternatives for the deterministic component, in this paper we have proposed a particular set of models. These models are listed in Table 1 and are explained in more detail in the following sections.

2.1. Naïve Linear Regression Model

The naïve model (REG) has the form of a multiple linear regression given by
$$D_t^{\mathrm{REG}} = T_t^{\mathrm{REG}} + S_t^{\mathrm{REG}}, \tag{2}$$
where $T_t^{\mathrm{REG}}$ and $S_t^{\mathrm{REG}}$ are the trend and the seasonal components, respectively. $T_t^{\mathrm{REG}}$ is a simple linear term in the year, whereas $S_t^{\mathrm{REG}}$ is a function of the day of the year.
Different alternatives for modeling $S_t^{\mathrm{REG}}$ are possible. For example, a third-order polynomial in the day of the year, considered as a continuous variable, can produce a seasonal component in a compact form with a few parameters. However, in our experiments better results are obtained when the day of the year is considered as categorical, that is, the seasonal component consists of 364 dummy variables representing the day of the year. Thus, the naïve model of Equation (2) can be rewritten as
$$D_t^{\mathrm{REG}} = \beta_0 + \beta_1\, \mathrm{year}_t + \sum_{i=1}^{364} \alpha_i\, d_t^{i}, \tag{3}$$
where $\mathrm{year}_t$ is the year of time $t$, and $d_t^{i}$ is the dummy variable for the $i$-th day of the year. The parameters of Equation (3) are estimated by least squares, minimizing the Mean Squared Error (MSE).
Figure 3 shows an example of the seasonal component estimated by REG for the maximum temperature of Spain, where 30 years of daily data have been used. Compared with the other methods, it is clear that REG generates a noisy but non-biased seasonal component. In all our experiments with the minimum and maximum temperatures of the 37 countries this is the typical behavior.
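The REG specification can be illustrated with a minimal sketch, assuming NumPy is available and using synthetic data rather than the ECA&D series. The function name `fit_reg`, the choice of day 365 as the reference category, and the centering of the year (which only changes the meaning of the intercept) are assumptions made for illustration, not details taken from the article.

```python
import numpy as np

def fit_reg(year, doy, y):
    """Least-squares fit of y ~ b0 + b1*year + 364 day-of-year dummies.

    doy takes values 1..365; day 365 is the reference category (assumption).
    The year is centered for numerical stability, so b0 is the level at the
    mean year for the reference day; the slope b1 is unaffected.
    """
    year = np.asarray(year, dtype=float)
    doy = np.asarray(doy, dtype=int)
    y = np.asarray(y, dtype=float)
    n = y.size
    dummies = np.zeros((n, 364))
    mask = doy <= 364
    dummies[np.nonzero(mask)[0], doy[mask] - 1] = 1.0
    X = np.column_stack([np.ones(n), year - year.mean(), dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [b0, b1, alpha_1, ..., alpha_364]

# Synthetic example: four 365-day years with a known trend of 0.04 degrees/year.
years = np.repeat(np.arange(1980, 1984), 365)
days = np.tile(np.arange(1, 366), 4)
temps = 12.0 + 0.04 * (years - 1980) + 8.0 * np.sin(2 * np.pi * days / 365)
coef = fit_reg(years, days, temps)
slope = coef[1]  # estimated trend per year
```

With one free coefficient per day of the year, the seasonal term can follow any noise present in the training data, which is consistent with the noisy REG curve described above.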

2.2. Discrete-Time Fourier Transform

The Discrete-time Fourier Series decomposition (FFT) proposed for the deterministic term of Equation (1) has the form
$$D_t^{\mathrm{FFT}} = T_t^{\mathrm{FFT}} + S_t^{\mathrm{FFT}}, \tag{4}$$
where $T_t^{\mathrm{FFT}}$ is a simple linear trend given by $\beta_0 + \beta_1 t$, and $S_t^{\mathrm{FFT}}$ is the seasonal component, represented by a Discrete-time Fourier series (see e.g., Reference [14])
$$S_t^{\mathrm{FFT}} = \sum_{h=1}^{H} \theta_h \sin(\omega h t) + \sum_{h=1}^{H} \phi_h \cos(\omega h t), \tag{5}$$
where $\theta_h$ and $\phi_h$ are the coefficients of the Fourier series, $H$ is the number of harmonics, and the angular frequency has been fixed to $\omega = 2\pi/365$ in order to model the periodic oscillations of temperature with the seasons of the year. Note that the frequencies of the sines and cosines are multiples of the fundamental frequency $1/365$; therefore, the frequency $h/365$ is called the $h$th harmonic. For a given value of $H$, the amplitudes of the sines and cosines can be estimated by Ordinary Least Squares (OLS). The value of $H$ has been selected using repeated cross-validation (see Section 3.2). Both the trend and the seasonal components of Equation (4) are iteratively refitted using the backfitting algorithm of Section 3.1.
According to the illustrative example of Figure 3, the seasonal component estimated by FFT has a good trade-off between bias and variance. This is the typical behavior observed in all our experiments with the 37 countries. Figure 4 shows the trend and seasonal components estimated for the maximum temperature of Spain (ES) and Sweden (SE). The number of harmonics estimated in both cases is 2. Thus, the seasonal component is the result of combining two sines and two cosines.
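For a fixed number of harmonics, estimating the Fourier seasonal component reduces to a linear least-squares problem. The following is a minimal sketch under stated assumptions: NumPy is available, the input series has already been detrended, and the helper names (`fourier_design`, `fit_fourier`) are hypothetical. It covers only the seasonal fit, not the backfitting loop or the selection of $H$.

```python
import numpy as np

OMEGA = 2.0 * np.pi / 365.0  # fundamental angular frequency: one cycle per year

def fourier_design(t, H):
    """Design matrix with columns sin(w*h*t), h=1..H, then cos(w*h*t), h=1..H."""
    t = np.asarray(t, dtype=float)
    h = np.arange(1, H + 1)
    angles = OMEGA * np.outer(t, h)
    return np.hstack([np.sin(angles), np.cos(angles)])

def fit_fourier(t, detrended, H):
    """OLS estimates (theta, phi) of the sine and cosine amplitudes."""
    X = fourier_design(t, H)
    coef, *_ = np.linalg.lstsq(X, np.asarray(detrended, dtype=float), rcond=None)
    return coef[:H], coef[H:]

# Synthetic detrended series over two 365-day years with known amplitudes.
t = np.arange(730)
series = 8.0 * np.sin(OMEGA * t) + 3.0 * np.cos(2 * OMEGA * t)
theta, phi = fit_fourier(t, series, H=2)
seasonal = fourier_design(t, 2) @ np.concatenate([theta, phi])
```

With $H = 2$ the seasonal term has only four coefficients, which is why this model is so compact compared with the 364 dummies of REG.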

2.3. Weighted Moving Average

In contrast to the proposed FFT method of Section 2.2, where a predefined form of the seasonal component is assumed, smoothers do not make any assumption about its form (see e.g., References [15,16]).
The proposed weighted moving average decomposition (AVG) for the deterministic term of Equation (1) has the form
$$D_t^{\mathrm{AVG}} = T_t^{\mathrm{AVG}} + S_t^{\mathrm{AVG}}, \tag{6}$$
where $T_t^{\mathrm{AVG}}$ is a simple linear trend given by $\beta_0 + \beta_1 t$, and the seasonal component $S_t^{\mathrm{AVG}}$ is obtained by fitting a locally weighted linear regression, that is, placing less weight on the points at the edge of the smoothing window, centered about the current element, according to a Gaussian function (see Reference [16]). The window length sets the number of weighted neighbouring elements used to fit the linear regression locally by OLS, and it has been selected using repeated cross-validation (see Section 3.2). Both components of Equation (6) are refitted using the backfitting algorithm of Section 3.1.
The illustrative example of Figure 3 is representative of the typical behavior observed in all our experiments with the 37 countries. In this particular case, a window size of 89 was selected for the seasonal component estimated by AVG. As expected, this window length produces a smooth output with a reasonable compromise between bias and variance, with the bias concentrated in summer and winter, the periods of higher curvature.
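A Gaussian-weighted local linear smoother of the kind described for AVG can be sketched as follows, assuming NumPy is available. The function name `gaussian_local_linear` and the choice of a standard deviation equal to one fifth of the window length are assumptions made for illustration; the article does not state how the Gaussian width relates to the window. The sketch also truncates the window at the ends of the series instead of wrapping around the year.

```python
import numpy as np

def gaussian_local_linear(x, y, window):
    """Smooth y(x) with a locally weighted linear fit inside a sliding window.

    Weights follow a Gaussian centered on the current point; sigma = window / 5
    is an illustrative assumption. Use window >= 3 so that every local fit has
    at least two distinct points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    half = window // 2
    sigma = window / 5.0
    fitted = np.empty_like(y)
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        xs, ys = x[lo:hi], y[lo:hi]
        w = np.exp(-0.5 * ((xs - x[i]) / sigma) ** 2)
        # Closed-form weighted least squares for a straight line.
        xm = np.sum(w * xs) / np.sum(w)
        ym = np.sum(w * ys) / np.sum(w)
        slope = np.sum(w * (xs - xm) * (ys - ym)) / np.sum(w * (xs - xm) ** 2)
        fitted[i] = ym + slope * (x[i] - xm)
    return fitted

# A straight line is reproduced exactly by a local linear fit.
x_demo = np.arange(60, dtype=float)
y_demo = 2.0 * x_demo + 1.0
smooth_demo = gaussian_local_linear(x_demo, y_demo, window=11)
```

Because each local fit is a straight line, the smoother is unbiased where the seasonal curve is locally linear and biased where it bends, which matches the bias observed around the summer and winter extremes.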

2.4. Robust Locally Estimated Scatterplot Smoothing

As an alternative to the proposed AVG method, the robust locally estimated scatterplot smoothing decomposition (LOESS) replaces the locally weighted linear regression used in the seasonal component of AVG by robust LOESS, a weighted quadratic least squares regression robust to possible outliers, see References [17,18]. Note that LOESS uses the tri-cube weight function instead of the Gaussian one used in AVG. As in the proposed AVG method, the window length has been selected using repeated cross-validation (see Section 3.2).
According to Figure 3, the seasonal component estimated by LOESS presents a good compromise between bias and variance but, as with AVG, bias is concentrated in the periods with more curvature (i.e., summer and winter). It is noteworthy that these two models deviate from all the others in these periods.

2.5. Linear Hinges Model

The linear hinges model decomposition (LHM) for the deterministic term of Equation (1) has the form
$$D_t^{\mathrm{LHM}} = T_t^{\mathrm{LHM}} + S_t^{\mathrm{LHM}}, \tag{7}$$
where $T_t^{\mathrm{LHM}}$ is a simple linear trend given by $\beta_0 + \beta_1 t$, and the seasonal component $S_t^{\mathrm{LHM}}$ is obtained by fitting the Linear Hinges Model proposed in References [19,20]. Thus, the seasonal component $S_t^{\mathrm{LHM}}$ is a piecewise linear model defined by $K$ knots, the points specifying the pieces.
The trend and seasonal components of Equation (7) are refitted using the backfitting algorithm of Section 3.1. In each iteration of this algorithm the trend component is fitted by OLS, whereas the number and positions of knots in $S_t^{\mathrm{LHM}}$ are obtained using the learning algorithm described in Reference [19], a particular implementation of backfitting that combines a greedy divide-and-conquer strategy with a computationally efficient pruning approach and special updating formulas.
Figure 3 shows the seasonal component estimated by LHM, which has a good trade-off between bias and variance. Note that $S_t^{\mathrm{LHM}}$ is a very simple seasonal model. With only 22 parameters it is able to represent the underlying seasonal pattern in a compact form. The rest of the seasonal models, except FFT, require hundreds of coefficients.

2.6. Generalized Additive Model

Following Reference [15], the Generalized Additive Model decomposition (GAM) proposed for the deterministic term of Equation (1) has the form
$$D_t^{\mathrm{GAM}} = T_t^{\mathrm{GAM}} + S_t^{\mathrm{GAM}}, \tag{8}$$
where $T_t^{\mathrm{GAM}}$ and $S_t^{\mathrm{GAM}}$ are the trend and the seasonal component, respectively. The trend $T_t^{\mathrm{GAM}}$ is modeled by a simple linear regression $\beta_0 + \beta_1 t$. Concerning the seasonal component $S_t^{\mathrm{GAM}}$, we have used a cyclic penalized cubic regression spline on the day of the year $d_t$, with knots at each day of the year $\{1, \ldots, 365\}$. This type of cubic spline forces the beginning and the end of the seasonal term to match up to the second derivative (see Reference [21] for further details). Note that because both model components have a straightforward representation using basis functions, all the parameters of this model can be fitted directly using OLS.
It is worth noting that this model has also been used to test the suitability of the linear trend. To do so, we have compared the results of the GAM with linear trend against a version in which the trend has also been fitted using cubic regression splines (see, e.g., Reference [22]). The first achieves a lower out-of-sample error than the second in more than 84% of the cases. This confirms our initial assumption, and what was stated in Reference [11].
According to the illustrative example in Figure 3, the seasonal component estimated by GAM has a good trade-off between bias and variance. Note that the GAM's seasonal term seems to be a smoothed version of the seasonal component obtained with REG. In fact, GAM provides the best results in our experiments of Section 4.

3. Estimation and Model Selection

The parameters of the previous models are estimated by minimizing the Mean Squared Error (MSE), or equivalently, the Root Mean Squared Error (RMSE)
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( y_t - D_t \right)^2}, \tag{9}$$
where $N$ is the number of observations in the data set, $y_t$ is the actual temperature and $D_t$ is the temperature estimated by the deterministic component. We have also used the RMSE to evaluate the performance of the different models tested in the following sections.
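As a small worked illustration of the error measure, the RMSE can be computed as below. The function name `rmse` and the toy values are illustrative and are not taken from the article.

```python
import math

def rmse(actual, estimated):
    """Root mean squared error between observed and estimated temperatures."""
    if len(actual) != len(estimated) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - d) ** 2 for y, d in zip(actual, estimated))
    return math.sqrt(squared / len(actual))

# Toy example: a single 2-degree error over three days.
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Since the square root is monotonic, the parameters that minimize the MSE also minimize the RMSE; the RMSE is simply easier to read because it is expressed in degrees.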
According to the particular specification of each candidate model, only REG and GAM have a fixed number of parameters that can be estimated using ordinary least squares (OLS). The rest of the models require a mechanism to automatically estimate their complexity, as well as an alternative to standard OLS to fit the parameters. In this paper we have used repeated cross-validation (RCV) for selecting the complexity of these models, combined with backfitting to estimate their parameters. The list of models is presented in Table 1, showing those fitted using backfitting and RCV.
Note that, to fit the seasonal component of the proposed methods, every February 29 is discarded beforehand in order to work with years of 365 days. Furthermore, all the methods except REG and FFT require forming a learning set by overlapping years, as in the scatterplots of Figure 2.
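The leap-day handling described above can be sketched with the standard library alone. The helper names `day_of_year_365` and `daily_index` are hypothetical, and the sketch covers only the calendar bookkeeping, not the temperature data themselves.

```python
import calendar
from datetime import date, timedelta

def day_of_year_365(d):
    """Return the day index in 1..365, or None for 29 February."""
    if d.month == 2 and d.day == 29:
        return None
    doy = d.timetuple().tm_yday
    # In leap years, days after February shift back by one position.
    if calendar.isleap(d.year) and d.month > 2:
        doy -= 1
    return doy

def daily_index(start, end):
    """List (date, day-of-year) pairs from start to end inclusive, skipping 29 February."""
    pairs = []
    d = start
    while d <= end:
        k = day_of_year_365(d)
        if k is not None:
            pairs.append((d, k))
        d += timedelta(days=1)
    return pairs

index_1980_2019 = daily_index(date(1980, 1, 1), date(2019, 12, 31))
```

Overlapping the years then amounts to grouping the observations by this day index, so that each of the 365 positions collects one value per year.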

3.1. Backfitting

Among all the proposed models, only the parameters of REG and GAM can be estimated in one shot by OLS. When it is not possible to estimate the full set of parameters of Equation (1) by ordinary least squares in one shot, an alternative is to estimate each component in a forward stepwise manner. This is the common approach, for example, in classical additive decomposition of time series.
In the forward fitting approach the trend component is first estimated from the original data. Once the trend has been estimated, the seasonal component is estimated from the detrended series, that is, the time series resulting from subtracting the estimated trend from the original time series. The remainder term ($R_t$) is simply the residual of the deterministic component estimated using this simple procedure.
However, this forward one-step fitting can be improved using backfitting. This algorithm was initially proposed by Reference [23] in the context of projection-pursuit regression, being used for parameter estimation in well-known models such as GAM (see Section 2.6) and LASSO [24]. It is also the global fitting strategy of the LHM (see Section 2.5), the SNAKE model [25], and the medium-term forecasting model of Reference [26]. Reference [15] makes intensive use of this general algorithm, providing justifications for its use. Note that the backfitting algorithm is in fact a kind of coordinate descent optimization method, see Reference [27]. According to Reference [28], these coordinate descent algorithms are also used to solve problems that arise in machine learning and data analysis, particularly in big data settings.
Basically, backfitting is an iterative strategy where the parameters of the model are grouped such that the solution for those in each group is straightforward given fixed values for those outside the group. The algorithm iterates through these groups, one by one, making several passes over the groups. Although this strategy does not guarantee that the solution is the global minimum, this does not mean that the algorithm is not useful. Moreover, experience tells us that it is very effective in practice.
In this paper we propose a particular implementation of the backfitting algorithm (see Algorithm 1), designed for fitting the decomposition model of Equation (1) when direct ordinary least squares is not possible. There are two groups of parameters, those related to the trend component and those that define the seasonal term. The remainder component is just computed at the end by subtracting the estimated trend and seasonal terms.
According to Reference [16], the required number of cycles m of the backfitting algorithm is usually less than 20, depending on the amount of correlation in the inputs. In this paper we have set m = 20 in Algorithm 1.
Algorithm 1: The backfitting algorithm for the additive decomposition model
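The two-group iteration described in this section can be sketched as follows. This is an illustrative reconstruction from the prose and not the authors' Algorithm 1: it assumes NumPy is available, uses per-day averages (`doy_means`, a hypothetical stand-in) in place of the seasonal smoothers of Section 2, and centers the seasonal term so that the level is carried by the trend intercept.

```python
import numpy as np

def doy_means(doy, values):
    """Stand-in seasonal estimator (assumption): mean of values per day of the year."""
    sums = np.bincount(doy, weights=values, minlength=366)
    counts = np.bincount(doy, minlength=366)
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return means[doy]

def backfit(t, doy, y, seasonal_smoother=doy_means, m=20):
    """Alternate m times between a linear trend fit and a seasonal fit."""
    t = np.asarray(t, dtype=float)
    doy = np.asarray(doy, dtype=int)
    y = np.asarray(y, dtype=float)
    seasonal = np.zeros_like(y)
    for _ in range(m):
        # Group 1: linear trend fitted by OLS to the deseasonalized series.
        b1, b0 = np.polyfit(t, y - seasonal, 1)
        trend = b0 + b1 * t
        # Group 2: seasonal term fitted to the detrended series, then centered.
        seasonal = seasonal_smoother(doy, y - trend)
        seasonal = seasonal - seasonal.mean()
    remainder = y - trend - seasonal
    return b0, b1, seasonal, remainder

# Synthetic series: four 365-day years with a known level, slope and seasonal cycle.
t_demo = np.arange(4 * 365, dtype=float)
doy_demo = np.tile(np.arange(1, 366), 4)
y_demo = 10.0 + 0.001 * t_demo + 5.0 * np.sin(2 * np.pi * doy_demo / 365)
b0, b1, seasonal, remainder = backfit(t_demo, doy_demo, y_demo)
```

Any of the seasonal estimators discussed in Section 2 could be passed as `seasonal_smoother`, provided it accepts the day-of-year index and the detrended values.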

3.2. Complexity Selection: Repeated Cross-Validation

Determining the optimal value for the complexity parameter is critical for ensuring that the model performs well. In this paper, where the complexity of several models (specifically, FFT, AVG and LOESS) had to be determined before being used for each country, we have used repeated cross-validation (RCV, also known as leave-group-out or Monte Carlo cross-validation, see Reference [29]) to ensure a good complexity-accuracy balance. Unlike K-fold cross-validation, where the number of folds also determines the size of the equal-sized, mutually exclusive folds, RCV allows decoupling the number of partitions from their size. Letting $N$ be the size of our dataset, a randomly selected (without replacement) subset of $M$ data points is first taken as the training set, and the rest of the points are used for validation. This sampling is repeated $K$ times, with $K$ and $M$ being independent and controlled by the practitioner. The error for each partition is evaluated over the remaining $N - M$ points.
In this paper, RCV has been chosen over K-fold cross-validation to better control the variance of our results, since it has been used to determine the complexity of the aforementioned three methods for all the time series analyzed in the case study of Section 4 (74 time series: minimum and maximum temperatures from 37 European countries). To do so, and using the values suggested in Reference [29], at each replication we have used 75% of our training data for parameter estimation ($M$), and we have carried out 100 repetitions ($K$).
Finally, in order to better control model complexity rather than simply selecting the model with the lowest RCV error, and following an approach similar to the one-standard-error rule detailed in Reference [16], the most parsimonious model whose error lies inside the 95% confidence interval of the error of the best one is finally chosen. Figure 5 shows an example of this final step.
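The sampling scheme and the final selection rule can be sketched with the standard library. Several details here are assumptions rather than statements from the article: the helper names, the normal-approximation interval (mean plus 1.96 standard errors of the best model), and the convention that a smaller complexity value means a simpler model, which holds for the number of harmonics but would be reversed for a window length.

```python
import math
import random
import statistics

def rcv_splits(n, train_fraction=0.75, repetitions=100, seed=0):
    """Yield (train, validation) index lists; each split is drawn independently."""
    rng = random.Random(seed)
    m = round(train_fraction * n)
    for _ in range(repetitions):
        train = set(rng.sample(range(n), m))
        validation = [i for i in range(n) if i not in train]
        yield sorted(train), validation

def select_parsimonious(errors, z=1.96):
    """Pick the smallest complexity whose mean error is within the best model's interval.

    errors maps each candidate complexity to its list of validation errors,
    one per repetition.
    """
    means = {c: statistics.fmean(e) for c, e in errors.items()}
    best = min(means, key=means.get)
    half_width = z * statistics.stdev(errors[best]) / math.sqrt(len(errors[best]))
    upper = means[best] + half_width
    return min(c for c in means if means[c] <= upper)

# Toy errors: complexity 3 is best on average, but 2 is statistically as good.
toy_errors = {
    1: [2.0, 2.2, 1.8, 2.0],
    2: [1.0, 1.2, 0.8, 1.0],
    3: [0.95, 1.15, 0.75, 0.95],
}
chosen = select_parsimonious(toy_errors)
splits = list(rcv_splits(100, train_fraction=0.75, repetitions=5, seed=1))
```

In a full run, each candidate complexity would be fitted on every training subset and scored on the matching validation subset before applying the selection rule.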

4. Results: The European Case

In this section, the minimum and maximum daily outdoor temperatures from 37 weather stations (WS) will be used, in order to assess the impact of the trend component in the context of medium- or long-term estimations, as well as to compare exhaustively all the methods listed in Table 1 and explained in previous sections. Furthermore, the best performing model will be finally selected, analysing the estimated temperature trends in Europe according to the selected model.

4.1. Data Description

First, the data used for the case study are described. All the data used in this paper come from the European Climate Assessment & Dataset project (ECA&D) [30]. The dataset consists of thousands of weather stations, with fairly complete daily temperature observations since 1980 for the main European cities. It should be noted that, although we have focused on the last forty years of data, the ECA&D dataset contains much older information for some weather stations. As an example, the oldest (non-empty) available data point comes from the Radcliffe Meteorological Station of Oxford University (ID 274 in ECA&D), for which there is information from December 1814. In this paper, we select one reference time series for each country. A first pre-processing step to select those reference temperatures, remove outliers and fill in their missing values was required before testing the different models. Regarding missing values, we have applied a hierarchical regression imputation method, based on neighbouring stations, that is detailed in Appendix A.2. All the information regarding data pre-processing is detailed in Appendix A, including the list of reference weather stations that have been selected.
Our dataset consists of the minimum (TMIN) and maximum (TMAX) daily outdoor temperatures recorded at the 37 weather stations of Table A1, from January 1980 to December 2019. Thus, this case study consists of 74 daily time series of length 40 years (14,610 days). The data have been partitioned as follows: the years 1980–2009 have been used as training data (in-sample, 10,958 points), and the years 2010–2019 have been used as the test set (out-of-sample, 3,652 points). Figure 6 shows the boxplot of the minimum and maximum daily temperatures of the selected station for each country, once the main outliers have been removed (see Table A2). The clear differences between the Mediterranean countries, such as Malta (MT), Italy (IT), Greece (GR), Spain (ES) or Cyprus (CY), and the rest are noticeable.
Dendrograms of Figure 7 summarize the complex spatio-temporal correlations of the selected weather stations based on the minimum and maximum temperatures, respectively. It is remarkable that, considering a correlation threshold of 0.9, the clusters formed in both dendrograms are different. For example, according to the dendrogram of TMAX shown in Figure 7a, Finland (FI) has similar maximum temperatures to Norway (NO), Denmark (DK) and Sweden (SE), whereas according to the one of TMIN (bottom), FI is closer to Russia (RU) and Estonia (EE) in terms of minimum temperature.
Figure 8 shows the location of the selected weather stations. Each station has been coloured according to the identified clusters of the dendrogram of the maximum temperature (see Figure 7a). Note that both latitude and longitude explain those clusters.

4.2. Importance of the Trend Component

This section briefly analyzes the effect of including the trend in all the methods described in Section 2. To do so, two versions of each model have been fitted: one using linear trends as described in Section 3, and the other using only seasonal components and the level (i.e., mean value) of each time series. Table 2 shows the out-of-sample error improvements obtained by including the trend in each case, calculated as the percentage improvement in RMSE of the model with linear trend, compared to the one using the mean.
First, it can be seen that results are systematic: including the trend in a particular time series (minimum or maximum from any country) provides similar error improvements regardless of the model in use. Secondly, regarding trend significance, we have obtained p-values lower than 0.05 in 73 of the 74 time series; the only exception is the minimum temperature from Romania (RO), with a p-value of 0.112. It can therefore be clearly seen that including the trend improves model performance in nearly all cases. In terms of minimum temperatures, excluding RO, 92% of the countries present out-of-sample error improvements, whereas in the case of TMAX that rate rises to 97%. Finally, it can be seen that several countries, such as Cyprus (CY), Poland (PL), or Serbia (RS), present high error improvements. On the other hand, one of the 74 analyzed time series has a surprising result: the trend of the minimum temperature from Ireland (IE), whose p-value is $7.41 \cdot 10^{-5}$ and which provides nearly a 1% in-sample error improvement, causes an out-of-sample error increase of 6%. As will be discussed in Section 4.4, for all the models the resulting trend in the minimum temperature of IE is negative, which does not seem to match the behavior of the time series during the 10 years of the test set. Ignoring that case, and as mentioned above, the trend has proved to improve out-of-sample model performance in most series and countries.

4.3. Empirical Comparative Analysis

Having described the dataset to be used, and having confirmed the importance of the trend component in long-term temperature forecasting, this section aims to compare the performance of the different candidate models of Table 1 in the selected 37 reference European stations (see Table A1). As aforementioned, for those methods that require selecting model complexity (i.e., FFT, LOESS and AVG), their hyper-parameters have been estimated for each country by using repeated cross-validation, and are detailed in Table A4. The estimated number of knots of the LHM for each country, which is automatically determined by its learning algorithm, can be also seen in Table A4. The results provided by the six methods over the 37 reference weather stations, and for their minimum and maximum temperatures, are detailed below.
First, before presenting the R-squared and the in-sample and out-of-sample errors (RMSE) of all the methods, Figure 9 and Figure 10 show the relative position of the different methods when estimating the minimum and maximum temperatures, respectively, both in-sample and out-of-sample. It should be noted that in both figures countries have been sorted according to the clusters of Figure 7, but there is no best-performing model for each group, and we have not found any relationship between that ordering and model performance.
The results obtained for the minimum and maximum temperatures are quite similar: the best performing models are the same in Figure 9 and Figure 10. First, it should be noted that, regarding in-sample error, the REG method outperforms all the other models in all the countries. However, REG is never among the top-3 models for out-of-sample performance. For that reason, we can conclude that it is clearly over-fitted. Ignoring REG, it can be seen that, for both minimum and maximum temperatures, the second and third places regarding in-sample error belong to the GAM and the LHM (the latter beaten by the FFT in some countries).
On the other hand, in the out-of-sample set, there are three models that clearly outperform the others: GAM, FFT and LHM. The first one, which was also the second-best in-sample performer after the over-fitted REG, has the lowest out-of-sample RMSE in most of the cases. As can be seen in Figure 9 and Figure 10, GAM is the out-of-sample winner in almost 60% of the countries, for both the minimum and maximum temperatures. After the GAM, FFT provides the lowest error in approximately 25% of the countries, and LHM in the vast majority of the remaining cases.
Before presenting the RMSE for all the methods and countries, and in order to check how well the estimated deterministic (trend plus seasonal) components explain temperature variance, Table 3 and Table 4 present the adjusted R-squared ($R^2_{adj}$) for the minimum and maximum temperatures, respectively. Although the remainder has not been modelled and forecasting performance can therefore still be improved, the obtained $R^2_{adj}$ values are over 0.7 in the vast majority of cases. The average $R^2_{adj}$ for the minimum temperature over all models and countries is 0.731, for both the in-sample and out-of-sample sets. For the maximum temperature, the average $R^2_{adj}$ is 0.777 in-sample and 0.776 out-of-sample. The only cases with a weak $R^2_{adj}$ (lower than 0.5 on average) are Ireland (IE), for its out-of-sample minimum temperature, and Iceland (IS), for its maximum temperature (both in- and out-of-sample).
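For reference, the two goodness-of-fit measures used in this section can be computed as in the minimal sketch below. It assumes the textbook definition of the adjusted R-squared, 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of fitted parameters excluding the intercept; how p is counted for each of the six methods is not specified here, so `n_params` is an input the user must supply. The function names and the small example vectors are hypothetical.

```python
import numpy as np


def rmse(y, y_hat):
    """Root mean squared error between observations and estimates."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def adjusted_r2(y, y_hat, n_params):
    """Adjusted R^2 with n_params fitted parameters (intercept excluded)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = y.size
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1))


# Made-up example values, only to exercise the functions.
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fit = [1.1, 1.9, 3.2, 3.8, 5.0]
example_rmse = rmse(y_obs, y_fit)
example_r2_adj = adjusted_r2(y_obs, y_fit, n_params=1)
```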
Finally, Table 5 and Table 6 present the in-sample and out-of-sample errors (RMSE) of all the models, for the minimum and maximum temperatures respectively. As mentioned above, the GAM provides good in-sample results, is always among the top-3 models in terms of out-of-sample performance, and beats all the others in almost 60% of cases. For that reason, the GAM is the only method used in Section 4.4 to analyze long-term temperature trends in Europe and their expected values in the following years.

4.4. Analysis of the Long-Term Temperature Trends with the Best Performance Model

Having analysed the performance of all the methods over the 37 European countries and confirmed that the GAM is the best-performing model, this subsection analyzes the results provided by that model in all the countries, in order to shed some light on potential future changes and to better understand the behavior of European temperature trends. Furthermore, to present the results of the GAM exhaustively, Appendix C shows the seasonal and trend components estimated by this method in all the countries. Although only the results of the GAM are analyzed in this section, the same conclusions regarding the trend component could be drawn from any of the tested methods, since their resulting trends are very similar. As an example, the average difference between the three best performing models (FFT, LHM and GAM) in the trends of the minimum temperatures is $9.43 \cdot 10^{-4}$ °C/year.
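To illustrate how a trend expressed in °C/year can be read off a daily series, the sketch below fits a linear trend plus a few annual harmonics by ordinary least squares. This is not the GAM used in this paper; it is a simplified stand-in chosen for brevity. The function name `fit_trend_and_harmonics`, the number of harmonics and the synthetic series (with a known slope of 0.05 °C/year) are assumptions made only for illustration.

```python
import numpy as np


def fit_trend_and_harmonics(t_years, temp, n_harmonics=2):
    """Least-squares fit of temp ~ a + b*t + sum_k [c_k cos(2*pi*k*t) + s_k sin(2*pi*k*t)].

    t_years is time in years, so coef[1] is the trend slope in degC/year.
    """
    t = np.asarray(t_years, dtype=float)
    columns = [np.ones_like(t), t]
    for k in range(1, n_harmonics + 1):
        columns.append(np.cos(2 * np.pi * k * t))
        columns.append(np.sin(2 * np.pi * k * t))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, np.asarray(temp, dtype=float), rcond=None)
    return coef


# Synthetic 40-year daily series (NOT real ECA&D data): known slope + annual cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(40 * 365) / 365.0
true_slope = 0.05  # degC/year, chosen arbitrarily for the example
temp = 10.0 + true_slope * t + 8.0 * np.cos(2 * np.pi * (t - 0.55)) + rng.normal(0.0, 3.0, t.size)

coef = fit_trend_and_harmonics(t, temp)
estimated_slope = coef[1]
```

On such a long series the estimated slope should land close to the value used to generate the data, which is the property that makes a 40-year training window useful for trend estimation.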
It should be noted that this section presents the resulting trends of the GAM, using the reference weather stations from the ECA&D dataset presented in Table A1. Although we have performed a systematic data pre-processing step, removing outliers and filling all the missing points (see Appendix A), any data inconsistency in the original dataset can affect model results. As an example, the minimum temperature of the reference weather station from Estonia presents a sudden increase of around 1 °C during the last 13 years of our in-sample period. For that reason, its estimated trend may not reflect the actual behavior of that temperature.
Figure 11 shows the trends estimated by the GAM for the minimum and maximum temperatures of all the countries. First, most countries present positive temperature trends of 0.02 to 0.08 °C/year in both series. The only exceptions are Romania (RO), Ireland (IE) and Estonia (EE) regarding TMIN, and Malta (MT) and Croatia (HR) regarding TMAX. In the case of RO, the minimum temperature grows at a rate of 0.007 °C/year, whereas the maximum does so at 0.06 °C/year. The case of IE is more surprising: although its maximum temperature is growing by 0.04 °C each year, it is the only country for which the GAM (and all the other models) has estimated a negative trend, with its minimum temperature decreasing by 0.013 °C/year. At the other extreme, EE presents a TMAX trend very similar to that of IE, but it is the country whose minimum temperature has the highest growth rate: 0.102 °C/year. However, due to the aforementioned data issue, this result may not be representative.
On the other hand, in terms of maximum temperatures, Malta (MT) is warming more slowly than all the other countries (0.013 °C/year), whereas Croatia (HR), at the other extreme, has the largest increase, with 0.089 °C/year. In summary, the average trend of the minimum temperatures of the 37 European countries is 0.0485 °C/year, whereas the average trend of the maximum temperatures is 0.0554 °C/year. For full detail, Table 7 shows the results of the GAM for all the countries and for the minimum and maximum temperatures.
In order to find possible similar behaviors between countries, Figure 12 presents the trends of Table 7, separated into minimum and maximum, and coloured according to the clusters shown in Figure 7. Once the countries have been sorted by cluster, several trend patterns can be appreciated.
Let us analyze several examples. Cluster number 5 of the maximum temperatures, formed by the Baltic countries, the Nordic countries (except Iceland), Belarus (BY), Poland (PL) and Russia (RU), presents trend values between 0.047 and 0.072 °C/year. Spain (ES) and Portugal (PT) form, for both minimum and maximum temperatures, the Iberian cluster, with positive but low trend values. Ireland (IE) is in the same cluster as the UK for the maximum temperatures, with similar values; however, they are in two separate clusters for TMIN (IE is the only country with a negative trend for that variable). France (FR), Belgium (BE), Switzerland (CH) and the Netherlands (NL) have similar results in terms of maximum temperature, but regarding TMIN, FR presents lower increases than all its neighbours. Italy (IT) and Malta (MT) behave similarly in TMAX (flat trends), but in TMIN, MT presents a higher value.
In order to check the geographical distribution of these trends, Figure 13 shows the European map of minimum and maximum temperature trends. First, in general terms, the minimum temperatures are increasing at a slower rate than the maximum temperatures: 68% of countries present higher growth rates in their maximum temperature. Regarding maximum temperatures, fourteen countries present annual increases higher than 0.06 °C/year, led by Croatia (HR) with 0.089 °C/year. Only seven countries grow in both minimum and maximum temperatures at a rate higher than 0.06 °C/year: the Czech Republic (CZ), Luxembourg (LU), Moldova (MD), Norway (NO), Serbia (RS), Sweden (SE) and Slovenia (SI). It should be noted that Finland (FI), with trends of 0.052 and 0.063 °C/year for its minimum and maximum temperatures respectively, is not very far from that group, so Scandinavian countries present quite high growth rates for both variables.

5. Conclusions

Temperature forecasting is a common step in most energy forecasting methods, and several techniques for temperature scenario generation can be found in the literature. Furthermore, and also in relation to electric load forecasting, recent papers have discussed the use of the temperature trend, concluding that the effect of that component, when working with two years of training data and one year of test data, is negligible in terms of temperature forecasting accuracy.
However, when dealing with long-term estimations (i.e., more than three years, and ten years in this paper) and training with longer time series (40 years), our results have shown that more accurate long-term predictions of the daily minimum and maximum temperatures can be made by including a linear trend in the model. Using time series decomposition, and also modelling the seasonal component, six different methods have been analyzed, concluding that Generalized Additive Models (GAM) outperform all the others, providing the lowest out-of-sample error in almost 60% of the 74 time series analyzed (minimum and maximum temperatures from reference weather stations in 37 European countries).
In addition, a brief analysis of the GAM results for the temperatures of those 37 countries has been carried out, showing that both maximum and minimum temperatures present linear increasing trends (with p-value = 0), with rates between 0.02 and 0.08 °C/year in most cases. As a next step, including the remainder in this kind of temperature decomposition method will allow us to model the effect of cold and heat waves, and to better understand the behavior of that component and its possible correlation between different countries.

Author Contributions

Conceptualization, E.F.S.-Ú. and A.M.; methodology, S.M.-C. and E.F.S.-Ú.; software, S.M.-C.; validation, S.M.-C., E.F.S.-Ú. and A.M.; formal analysis, S.M.-C.; data curation, S.M.-C.; writing—original draft preparation, S.M.-C.; writing—review and editing, S.M.-C., E.F.S.-Ú. and A.M.; visualization, S.M.-C.; supervision, E.F.S.-Ú. and A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

We acknowledge the data providers in the ECA&D project. Klein Tank, A.M.G. and Coauthors, 2002. Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. Int. J. of Climatol., 22, 1441-1453. Data and metadata available at https://www.ECAD.eu.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following main abbreviations are used in this manuscript:
ECA&D: European Climate Assessment & Dataset
FFT: Fast Fourier Transform
GAM: Generalized Additive Models
LHM: Linear Hinges Model
LOESS: Locally Estimated Scatterplot Smoothing
OLS: Ordinary Least Squares
RMSE: Root Mean Squared Error
TMAX: Maximum temperature
TMIN: Minimum temperature
TR: Training set
TS: Test set
WS: Weather Station

Appendix A. Data Cleansing and Hierarchical Regression Imputation

This section describes the complete data pre-processing carried out to go from the original ECA&D datasets [30] (specifically, blended daily minimum and maximum temperatures) to the dataset finally used in the paper. The two following subsections describe the most critical parts of this pre-processing, outlier detection and data imputation, but first let us briefly describe how the reference weather stations were selected. As our case study focuses on Europe, 9 countries and a collective grouping of two remote jurisdictions of Norway (Svalbard and Jan Mayen) were removed from our original dataset. Specifically, the excluded countries were Algeria, Egypt, Greenland, Israel, Libya, Morocco, Syria, Turkmenistan, and Tunisia. Secondly, in order to select a reference weather station for each of the remaining 37 European countries, a simple data quality assessment was carried out. We selected stations located at the country capitals whenever the amount of available data and of missing data in the period of interest for our case study (January 1980 to December 2019) was reasonable. It should be noted that, in order to better select the reference temperature for electric load forecasting, methods such as those described in Reference [31] or Reference [32] would provide better results in terms of error. However, since this paper aims to model the deterministic component of a temperature time series regardless of its subsequent use, WS selection is out of the scope of this paper.
Table A1 lists the weather stations that were finally selected. For each country, it includes its ISO 3166 code (Country ID), the ECA&D station identifier (Station ID), the station name (Station), the latitude and longitude of the WS in degrees (Lat. and Lon.), and the station elevation in meters (Hgth.).
Table A1. Detail of the European weather stations used for each country.
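| Country ID | Country | Station ID | Station | Lat. | Lon. | Hgth. |
|---|---|---|---|---|---|---|
| AT | Austria | 16 | Wien | 48.2331 | 16.35 | 198 |
| BA | Bosnia and Herzegovina | 276 | Sarajevo | 43.8678 | 18.4228 | 630 |
| BE | Belgium | 17 | Uccle | 50.8 | 4.3664 | 100 |
| BY | Belarus | 653 | Brest | 52.1167 | 23.6831 | 146 |
| CH | Switzerland | 240 | Genevecointring | 46.25 | 6.1331 | 420 |
| CY | Cyprus | 23 | Larnaca | 34.8831 | 33.6331 | 1 |
| CZ | Czech Republic | 510 | Milesovka | 50.555 | 13.9306 | 830 |
| DE | Germany | 41 | Berlin-Dahlem | 52.4639 | 13.3017 | 51 |
| DK | Denmark | 116 | Koebenhavn:Landbohojskolen-1 | 55.6831 | 12.5331 | 9 |
| EE | Estonia | 11357 | Narva | 59.3892 | 28.1128 | 28 |
| ES | Spain | 230 | Madrid-Retiro | 40.4117 | −3.6781 | 667 |
| FI | Finland | 28 | Helsinkikaisaniemi | 60.175 | 24.9478 | 4 |
| FR | France | 11249 | Orly | 48.7167 | 2.3842 | 89 |
| GB | United Kingdom | 1860 | Heathrow | 51.4789 | −0.44889 | 25 |
| GR | Greece | 61 | Heraklion | 35.3331 | 25.1831 | 39 |
| HR | Croatia | 21 | Zagreb-Gric | 45.8167 | 15.9781 | 156 |
| HU | Hungary | 849 | Pecspogany | 46.0056 | 18.2328 | 202 |
| IE | Ireland | 1718 | Dublinairport | 53.4281 | −6.2408 | 71 |
| IS | Iceland | 65 | Dalatangi | 65.2681 | −13.5756 | 9 |
| IT | Italy | 174 | Brindisi | 40.6331 | 17.9331 | 10 |
| LT | Lithuania | 200 | Kaunas | 54.8831 | 23.8331 | 77 |
| LU | Luxembourg | 203 | Luxembourgairport | 49.6258 | 6.2033 | 376 |
| LV | Latvia | 2951 | Liyepayaamsg | 56.55 | 21.02 | 4 |
| MD | Moldova | 394 | Kisinev | 47.02 | 28.87 | 173 |
| MT | Malta | 447 | Luqa | 35.85 | 14.4831 | 91 |
| NL | Netherlands | 598 | Rotterdam | 51.9606 | 4.4467 | −4 |
| NO | Norway | 193 | Osloblindern | 59.9428 | 10.7206 | 94 |
| PL | Poland | 209 | Warszawa-Okecie | 52.1628 | 20.9608 | 107 |
| PT | Portugal | 212 | Braganca | 41.8 | −6.7331 | 690 |
| RO | Romania | 219 | Bucuresti-Baneasa | 44.5167 | 26.0831 | 90 |
| RS | Serbia | 263 | Belgrade (Observatory) | 44.8 | 20.4667 | 132 |
| RU | Russia | 85 | St.Petersburg | 59.9667 | 30.3 | 3 |
| SE | Sweden | 10 | Stockholm | 59.35 | 18.05 | 44 |
| SI | Slovenia | 228 | Ljubljanabezigrad | 46.0656 | 14.5169 | 299 |
| SK | Slovakia | 227 | Hurbanovo | 47.8667 | 18.1831 | 115 |
| TR | Turkey | 346 | Isparta | 37.75 | 30.55 | 997 |
| UA | Ukraine | 252 | Kiev | 50.4 | 30.5331 | 166 |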

Appendix A.1. Outlier Detection

As stated in Reference [12], where the authors propose a temperature anomaly detection method based on electricity demand, raw data collected by local weather stations are usually full of missing values and outliers, which must be corrected in order to preserve model accuracy. Our dataset is no exception: even the finally selected temperatures of each country, which were chosen after an initial data quality assessment, have missing values and several outliers. In order to modify the original set of time series as little as possible, and to do so in a controlled way, we established upper and lower thresholds for outlier detection and set to missing all the days outside that interval. Specifically, all temperatures above 50 °C or below −60 °C have been flagged as outliers.
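The thresholding rule above can be sketched in a few lines. The thresholds (50 °C and −60 °C) come from the text; the function name `mask_outliers` is hypothetical, and the short example series is made up except for the two outlier values, which are the Malta TMAX readings from 1990 listed in Table A2.

```python
import numpy as np

LOWER_C, UPPER_C = -60.0, 50.0  # thresholds stated in the text


def mask_outliers(temps_c, lower=LOWER_C, upper=UPPER_C):
    """Replace values outside [lower, upper] with NaN; return the cleaned copy and the count."""
    temps = np.asarray(temps_c, dtype=float).copy()
    # Comparisons against NaN evaluate to False, so existing gaps are left untouched.
    is_outlier = (temps < lower) | (temps > upper)
    temps[is_outlier] = np.nan
    return temps, int(is_outlier.sum())


# Illustrative series: 67.5 and 63.1 are outlier values from Table A2; the rest is made up.
series = [12.3, 67.5, np.nan, 14.1, 63.1]
cleaned, n_outliers = mask_outliers(series)
```

After this step outliers and original gaps are indistinguishable, so both are handled by the same imputation procedure.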
Figure A1 shows two of those outliers, subsequently removed from the maximum temperature of Malta, where several missing values can also be seen. It should be noted that, in order to carry out the regression imputation method described in Appendix A.2, which uses neighbouring WS to fill the missing values of the selected reference temperatures, the outliers of our complete input dataset have been removed. For simplicity, Table A2 presents only the points that have been removed from the original reference stations. As can be seen, the number of outliers is limited. In the worst case, they represent 0.027% of a time series: the minimum temperature from Malta, with 4 outliers. Overall, only 11 outliers have been detected. Furthermore, all these points have been filled using the method described below.
Figure A1. Example of outliers that have been detected in the reference maximum temperature from Malta (MT) on 28 January and 7 December 1990, and removed from the original dataset.
Table A2. Detail of the outliers removed from the reference weather stations, and TMAX and TMIN time series in the studied interval (January 1980 to December 2019).
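| Country | Variable | Date | Value (°C) |
|---|---|---|---|
| LV | TMIN | 13 March 1997 | 81.70 |
| LV | TMAX | 29 December 1994 | 87.20 |
| LV | TMAX | 12 March 1997 | 85.70 |
| MD | TMAX | 27 January 1995 | 66.60 |
| MT | TMIN | 2 October 1983 | 78.90 |
| MT | TMIN | 23 February 1989 | 71.10 |
| MT | TMIN | 18 June 1989 | 78.10 |
| MT | TMIN | 5 February 1995 | 89.10 |
| MT | TMAX | 20 February 1988 | 77.60 |
| MT | TMAX | 28 January 1990 | 67.50 |
| MT | TMAX | 7 December 1990 | 63.10 |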

Appendix A.2. Hierarchical Regression Imputation

After deleting all the outliers of the dataset, the goal of this last pre-processing step is to fill all the missing points of the reference time series. Considering the high correlation between temperature time series (especially between WS that are geographically very close), and the large number of stations available (together with their location and height), we propose a hierarchical regression imputation method. We assume that neighbouring weather stations are strongly correlated, provided they are not at a very different height. Given a particular reference temperature to fill (in this example, a maximum temperature), the remaining stations are sorted by distance. In order to avoid selecting temperatures at a very different height, all the stations with more than 200 m of height difference are deleted from the initial list of stations. The set of candidate variables to fill the gaps starts with the opposite variable of the same station: in our example, the minimum temperature of the reference station. After that, the nearest stations are added one by one until there are no gaps left to fill. Note that, if the nearest stations share the gaps of the reference temperature, they are not added to the candidate set. As a consequence, in some cases we have to select variables that are farther away than we could have initially assumed. Once it has been determined how many and which variables are enough to complete the reference time series, a linear regression model with that whole set as input is fitted, and all the gaps are replaced by the value estimated by the model at those missing points.
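The candidate selection and gap filling described above can be sketched as follows. This is one possible reading, not the implementation used in the paper, and it deliberately simplifies the last step: instead of a single linear regression on the whole selected set, it fits one simple regression per selected candidate, in order of priority, and lets each candidate fill the gaps it can cover. The function names, the dictionary layout of a candidate and the toy series are all hypothetical.

```python
import numpy as np


def select_candidates(target, candidates, ref_height, max_height_diff=200.0):
    """Greedily pick candidate series, nearest first, until every gap of target is covered.

    Each candidate is a dict with keys 'name', 'distance', 'height' and 'series'.
    The opposite variable of the reference station should be passed with distance 0.
    """
    gaps = np.isnan(target)
    usable = [c for c in candidates if abs(c["height"] - ref_height) <= max_height_diff]
    selected = []
    for cand in sorted(usable, key=lambda c: c["distance"]):
        if not gaps.any():
            break
        covers = gaps & ~np.isnan(cand["series"])
        if covers.any():  # skip candidates that share all the remaining gaps
            selected.append(cand)
            gaps = gaps & ~covers
    return selected


def impute(target, selected):
    """Fill gaps of target with per-candidate simple linear regressions (simplification)."""
    filled = target.copy()
    for cand in selected:
        x = cand["series"]
        train = ~np.isnan(target) & ~np.isnan(x)
        todo = np.isnan(filled) & ~np.isnan(x)
        if train.sum() < 2 or not todo.any():
            continue
        slope, intercept = np.polyfit(x[train], target[train], deg=1)
        filled[todo] = intercept + slope * x[todo]
    return filled


# Toy example (made-up numbers): gaps at positions 2 and 4 of the reference series.
nan = np.nan
reference = np.array([10.0, 12.0, nan, 16.0, nan, 20.0])
toy_candidates = [
    {"name": "A", "distance": 0.0, "height": 100.0, "series": np.array([5.0, 6.0, 7.0, 8.0, nan, 10.0])},
    {"name": "B", "distance": 50.0, "height": 150.0, "series": np.array([11.0, 13.0, 15.0, 17.0, 19.0, 21.0])},
    {"name": "C", "distance": 20.0, "height": 600.0, "series": np.array([9.0, 11.0, 13.0, 15.0, 17.0, 19.0])},
    {"name": "D", "distance": 30.0, "height": 120.0, "series": np.array([1.0, 2.0, nan, 4.0, nan, 6.0])},
]
chosen = select_candidates(reference, toy_candidates, ref_height=100.0)
completed = impute(reference, chosen)
```

In this toy case, candidate C is discarded by the height filter and candidate D is skipped because it shares the gaps of the reference series, which mirrors the situation in which a farther station has to be used.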
Table A3 shows all the missing points (after removing the outliers detailed in Table A2) that have been filled, and the WS that have been used for each country. In general, there is not a large number of missing values in our reference time series. Only 4 of the 74 variables used (TMAX and TMIN from 37 countries) contained more than 1% of empty points: the two series from Malta (MT) and the two from Latvia (LV), which had between 3.65% and 7.21% of missing values. To fill the empty points of MT, 2 additional time series from Italy have been required, and for LV, 5 neighbouring WS have been used: one from Latvia, three from Sweden, and one from Estonia.
Table A3. Detail of the missing points of the reference WS of each country. The first two columns show the Country ID (C-ID) and Station ID (WS-ID). The last two columns show the neighbouring WS that have been used to fill all the empty points of each time series.
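| C-ID | WS-ID | Empty Points (TMIN) | Empty Points (TMAX) | % Empty (TMIN) | % Empty (TMAX) | Selected WS to Fill (TMIN) | Selected WS to Fill (TMAX) |
|---|---|---|---|---|---|---|---|
| AT | 16 | 0 | 0 | 0 | 0 | - | - |
| BA | 276 | 0 | 0 | 0 | 0 | - | - |
| BE | 17 | 313 | 366 | 2.14% | 2.51% | TMAX+NL-2571 | TMIN+NL-2571 |
| BY | 653 | 0 | 0 | 0 | 0 | - | - |
| CH | 240 | 0 | 0 | 0 | 0 | - | - |
| CY | 23 | 6 | 6 | 0.04% | 0.04% | CY-24 | CY-24 |
| CZ | 510 | 7 | 5 | 0.05% | 0.03% | TMAX | TMIN |
| DE | 41 | 0 | 0 | 0 | 0 | - | - |
| DK | 116 | 0 | 6 | 0 | 0.04% | - | TMIN |
| EE | 11357 | 0 | 0 | 0 | 0 | - | - |
| ES | 230 | 0 | 0 | 0 | 0 | - | - |
| FI | 28 | 0 | 0 | 0 | 0 | - | - |
| FR | 11249 | 0 | 0 | 0 | 0 | - | - |
| GB | 1860 | 0 | 0 | 0 | 0 | - | - |
| GR | 61 | 56 | 47 | 0.38% | 0.32% | GR-63+TR-347 | GR-63+TR-347 |
| HR | 21 | 1 | 0 | 0.01% | 0 | TMAX | - |
| HU | 849 | 0 | 0 | 0 | 0 | - | - |
| IE | 1718 | 0 | 0 | 0 | 0 | - | - |
| IS | 65 | 1 | 1 | 0.01% | 0.01% | IS-2943 | IS-2943 |
| IT | 174 | 62 | 64 | 0.42% | 0.44% | TMAX+HR-10963+HR-1682 | TMIN+HR-10963+HR-1682 |
| LT | 200 | 6 | 3 | 0.04% | 0.02% | TMAX | TMIN |
| LU | 203 | 0 | 0 | 0 | 0 | - | - |
| LV | 2951 | 649 | 533 | 4.44% | 3.65% | TMAX+LV-199+SE-5282+SE-5283+SE-5281+EE-11364 | TMIN+LV-199+SE-5282+SE-5281+EE-11364+SE-5283 |
| MD | 394 | 118 | 97 | 0.81% | 0.66% | TMAX+RO-951 | TMIN+RO-951 |
| MT | 447 | 1054 | 896 | 7.21% | 6.13% | TMAX+IT-175+IT-174 | TMIN+IT-175+IT-174 |
| NL | 598 | 2 | 0 | 0.01% | 0 | TMAX | - |
| NO | 193 | 0 | 0 | 0 | 0 | - | - |
| PL | 209 | 0 | 0 | 0 | 0 | - | - |
| PT | 212 | 136 | 113 | 0.93% | 0.77% | TMAX+ES-1396 | TMIN+ES-1396 |
| RO | 219 | 0 | 0 | 0 | 0 | - | - |
| RS | 263 | 0 | 1 | 0 | 0.01% | - | TMIN |
| RU | 85 | 2 | 0 | 0.01% | 0 | TMAX | - |
| SE | 10 | 0 | 0 | 0 | 0 | - | - |
| SI | 228 | 3 | 0 | 0.02% | 0 | TMAX | - |
| SK | 227 | 2 | 2 | 0.01% | 0.01% | TMAX | TMIN |
| TR | 346 | 7 | 3 | 0.05% | 0.02% | TMAX | TMIN |
| UA | 252 | 12 | 5 | 0.08% | 0.03% | TMAX+UA-1482 | TMIN+UA-1482 |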

Appendix B. Estimated Model’s Complexity in the European Case

This section presents the complexity of the models presented in Section 3.1. As mentioned above, the hyper-parameters of three of them (FFT, AVG, and LOESS) were chosen by repeated cross-validation. The complexity (i.e., number of knots) of the LHM is automatically selected by the fitting algorithm. Table A4 shows the results for those four models. The obtained results are coherent with each other, that is, the smoothness selected by the different methods is similar within each country. For example, for the maximum temperature of Spain (ES), FFT uses 2 harmonics, AVG a window size of 89 days (the second lowest after Malta, 87), LOESS its lowest window size (223 days) and LHM its maximum number of knots ($K = 11$).
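The selection rule used with repeated cross-validation (see the caption of Figure 5: the smoothest model whose error lies inside the 95% confidence interval of the best one) can be sketched as below. Two assumptions are made that are not stated in the text: a larger window size is treated as a smoother model, and the 95% interval is approximated as the mean error of the best model plus 1.96 standard errors across repetitions. The function name and the error values are hypothetical; only the bandwidths 61 and 89 echo Figure 5.

```python
import numpy as np


def pick_smoothest_within_ci(bandwidths, cv_errors, z=1.96):
    """Return (best_bandwidth, chosen_bandwidth).

    cv_errors has shape (n_repeats, n_bandwidths). The chosen bandwidth is the largest
    one whose mean error does not exceed the upper confidence limit of the best model.
    """
    bandwidths = np.asarray(bandwidths)
    cv_errors = np.asarray(cv_errors, dtype=float)
    mean_err = cv_errors.mean(axis=0)
    std_err = cv_errors.std(axis=0, ddof=1) / np.sqrt(cv_errors.shape[0])
    best = int(np.argmin(mean_err))
    upper_limit = mean_err[best] + z * std_err[best]
    eligible = bandwidths[mean_err <= upper_limit]
    return int(bandwidths[best]), int(eligible.max())


# Made-up cross-validation errors: 3 repetitions x 4 candidate window sizes.
toy_bandwidths = [31, 61, 89, 121]
toy_errors = [
    [4.30, 4.00, 4.05, 4.40],
    [4.20, 4.10, 4.12, 4.50],
    [4.25, 3.90, 4.01, 4.45],
]
best_bw, chosen_bw = pick_smoothest_within_ci(toy_bandwidths, toy_errors)
```

Preferring the smoothest model among statistically indistinguishable ones trades a negligible increase in cross-validation error for a simpler seasonal component.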
Table A4. Complexity selected for each country and method, and for minimum and maximum temperatures. From left to right: number of harmonics for the FFT, window size for AVG and LOESS, and number of knots for the LHM.
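| Country ID | FFT (TMIN) | FFT (TMAX) | AVG (TMIN) | AVG (TMAX) | LOESS (TMIN) | LOESS (TMAX) | LHM (TMIN) | LHM (TMAX) |
|---|---|---|---|---|---|---|---|---|
| AT | 1 | 2 | 117 | 115 | 319 | 283 | 7 | 7 |
| BA | 1 | 2 | 127 | 109 | 325 | 295 | 5 | 7 |
| BE | 1 | 2 | 125 | 117 | 323 | 303 | 5 | 7 |
| BY | 1 | 1 | 133 | 101 | 331 | 261 | 6 | 7 |
| CH | 1 | 2 | 113 | 103 | 325 | 275 | 7 | 8 |
| CY | 2 | 2 | 91 | 103 | 279 | 283 | 8 | 9 |
| CZ | 1 | 2 | 117 | 105 | 323 | 255 | 7 | 6 |
| DE | 1 | 2 | 129 | 103 | 327 | 287 | 5 | 7 |
| DK | 2 | 1 | 117 | 101 | 317 | 251 | 7 | 8 |
| EE | 1 | 1 | 137 | 105 | 333 | 275 | 8 | 7 |
| ES | 2 | 2 | 97 | 89 | 231 | 223 | 7 | 11 |
| FI | 1 | 2 | 125 | 125 | 301 | 309 | 6 | 7 |
| FR | 2 | 2 | 125 | 107 | 319 | 269 | 5 | 7 |
| GB | 2 | 2 | 121 | 109 | 313 | 277 | 8 | 7 |
| GR | 2 | 1 | 99 | 107 | 263 | 321 | 8 | 6 |
| HR | 1 | 2 | 111 | 107 | 307 | 295 | 7 | 7 |
| HU | 1 | 2 | 117 | 95 | 317 | 277 | 7 | 8 |
| IE | 2 | 2 | 121 | 117 | 293 | 289 | 6 | 6 |
| IS | 2 | 2 | 123 | 169 | 289 | 343 | 6 | 5 |
| IT | 2 | 2 | 91 | 107 | 259 | 291 | 7 | 8 |
| LT | 1 | 2 | 139 | 109 | 347 | 261 | 6 | 7 |
| LU | 1 | 2 | 123 | 111 | 337 | 291 | 7 | 7 |
| LV | 1 | 2 | 141 | 101 | 347 | 265 | 6 | 6 |
| MD | 1 | 1 | 115 | 99 | 311 | 281 | 7 | 9 |
| MT | 2 | 2 | 105 | 87 | 303 | 241 | 6 | 6 |
| NL | 2 | 1 | 137 | 119 | 343 | 323 | 5 | 6 |
| NO | 1 | 1 | 117 | 107 | 319 | 291 | 5 | 8 |
| PL | 1 | 1 | 145 | 107 | 351 | 283 | 7 | 7 |
| PT | 2 | 2 | 115 | 87 | 301 | 237 | 6 | 10 |
| RO | 1 | 2 | 115 | 105 | 311 | 287 | 7 | 9 |
| RS | 1 | 2 | 109 | 99 | 311 | 273 | 7 | 8 |
| RU | 1 | 1 | 109 | 99 | 295 | 285 | 7 | 8 |
| SE | 2 | 2 | 121 | 99 | 317 | 265 | 7 | 6 |
| SI | 1 | 2 | 105 | 91 | 321 | 275 | 7 | 7 |
| SK | 1 | 2 | 127 | 99 | 329 | 249 | 6 | 8 |
| TR | 3 | 2 | 95 | 99 | 241 | 271 | 8 | 8 |
| UA | 1 | 2 | 129 | 111 | 315 | 279 | 8 | 9 |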

Appendix C. European Case: Detailed Trend and Seasonal Components

This last section presents the trend (Figure A2) and seasonal (Figure A3) components obtained using the GAM model in all the different European countries of our case study.
Figure A2. Trend component obtained with the GAM model for the 37 reference European weather stations and the minimum and maximum temperatures.
Figure A3. Seasonal component obtained with the GAM model for the 37 reference European weather stations and the minimum and maximum temperatures.

References

1. Feinberg, E.A.; Genethliou, D. Load forecasting. In Applied Mathematics for Restructured Electric Power Systems; Springer: Berlin, Germany, 2005; pp. 269–285.
2. Weron, R. Modeling and Forecasting Electricity Loads and Prices: A Statistical Approach; John Wiley and Sons, Ltd.: Hoboken, NJ, USA, 2013.
3. Muñoz, A.; Sánchez-Úbeda, E.F.; Cruz, A.; Marin, J. Short-term Forecasting in Power Systems: A Guided Tour. In Handbook of Power Systems II; Rebennack, S., Pardalos, P.M., Pereira, M.V.F., Iliadis, N.A., Eds.; Energy Systems; Springer: Berlin/Heidelberg, Germany, 2010; pp. 129–160.
4. Hong, T.; Fan, S. Probabilistic electric load forecasting: A tutorial review. Int. J. Forecast. 2016, 32, 914–938.
5. Hong, T.; Pinson, P.; Fan, S. Global Energy Forecasting Competition 2012. Int. J. Forecast. 2014, 30, 357–363.
6. Hong, T.; Pinson, P.; Fan, S.; Zareipour, H.; Troccoli, A.; Hyndman, R.J. Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond. Int. J. Forecast. 2016, 32, 896–913.
7. Hong, T.; Xie, J.; Black, J. Global energy forecasting competition 2017: Hierarchical probabilistic load forecasting. Int. J. Forecast. 2019.
8. Xie, J.; Hong, T. Temperature Scenario Generation for Probabilistic Load Forecasting. IEEE Trans. Smart Grid 2018, 9, 1680–1687.
9. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts: Melbourne, Australia, 2018.
10. Cook, J.; Oreskes, N.; Doran, P.T.; Anderegg, W.R.L.; Verheggen, B.; Maibach, E.W.; Carlton, J.S.; Lewandowsky, S.; Skuce, A.G.; Green, S.A.; et al. Consensus on consensus: A synthesis of consensus estimates on human-caused global warming. Environ. Res. Lett. 2016, 11, 048002.
11. Grieser, J.; Trömel, S.; Schönwiese, C.D. Statistical time series decomposition into significant components and application to European temperature. Theor. Appl. Climatol. 2002, 71, 171–183.
12. Sobhani, M.; Hong, T.; Martin, C. Temperature anomaly detection for electric load forecasting. Int. J. Forecast. 2019, 36, 324–333.
13. Guerreiro, S.B.; Dawson, R.J.; Kilsby, C.; Lewis, E.; Ford, A. Future heat-waves, droughts and floods in 571 European cities. Environ. Res. Lett. 2018, 13, 034009.
14. Oppenheim, A.V.; Schafer, R.W.; Buck, J.R. Discrete-Time Signal Processing, 2nd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 1999.
15. Hastie, T.; Tibshirani, R. Generalized Additive Models; Wiley Online Library: Hoboken, NJ, USA, 1990.
16. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer Science & Business Media: Berlin, Germany, 2009.
17. Cleveland, W.S. Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc. 1979, 74, 829–836.
18. Cleveland, R.B.; Cleveland, W.S.; McRae, J.E.; Terpenning, I. STL: A Seasonal-Trend Decomposition Procedure Based on Loess (with Discussion). J. Off. Stat. 1990, 6, 3–73.
19. Sánchez-Úbeda, E.F.; Wehenkel, L. The Hinges model: A one-dimensional continuous piecewise polynomial model. In Proceedings of the International Congress on Information Processing and Management of Uncertainty in Knowledge Based Systems, IPMU98, Paris, France, 6–10 July 1998.
20. Sánchez-Úbeda, E.F. Models for Data Analysis: Contributions to Automatic Learning. Ph.D. Thesis, Universidad Pontificia Comillas de Madrid, Madrid, Spain, 1999.
21. Wood, S.N. Generalized Additive Models: An Introduction with R, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2017.
22. Li, Y.; Jones, B. The Use of Extreme Value Theory for Forecasting Long-Term Substation Maximum Electricity Demand. IEEE Trans. Power Syst. 2019, 35, 128–139.
23. Friedman, J.H.; Stuetzle, W. Projection Pursuit Regression. J. Am. Stat. Assoc. 1981, 76, 817–823.
24. Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
25. Gascón, A.; Sánchez-Úbeda, E.F. Automatic specification of piecewise linear additive models: Application to forecasting natural gas demand. Stat. Comput. 2018, 28, 201–217.
26. Sánchez-Úbeda, E.F.; Berzosa, A. Modeling and forecasting industrial end-use natural gas consumption. Energy Econ. 2007, 29, 710–742.
27. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 2010, 33, 1–22.
28. Wright, S.J. Coordinate descent algorithms. Math. Program. 2015, 151, 3–34.
29. Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer Science & Business Media: Berlin, Germany, 2013.
30. Klein Tank, A.M.G.; Wijngaard, J.B.; Können, G.P.; Böhm, R.; Demarée, G.; Gocheva, A.; Mileta, M.; Pashiardis, S.; Hejkrlik, L.; Kern-Hansen, C.; et al. Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. Int. J. Climatol. 2002, 22, 1441–1453.
31. Moreno-Carbonell, S.; Sánchez-Úbeda, E.F.; Muñoz, A. Rethinking weather station selection for electric load forecasting using genetic algorithms. Int. J. Forecast. 2019, 36, 695–712.
32. Hong, T.; Wang, P.; White, L. Weather station selection for electric load forecasting. Int. J. Forecast. 2015, 31, 286–295.
Figure 1. Minimum and maximum daily temperatures of four weather stations from Europe: (a) Spain (ES). (b) France (FR). (c) Germany (DE). (d) Sweden (SE).
Figure 2. Seasonal plots where the daily data from each year are overlapped: (a) Spain (ES). (b) France (FR). (c) Germany (DE). (d) Sweden (SE).
Figure 3. Seasonal component for the maximum temperature of Spain. (a) Estimated by all the proposed methods of Table 1. (b) Estimated by REG. (c) Estimated by FFT. (d) Estimated by AVG. (e) Estimated by LOESS. (f) Estimated by LHM. (g) Estimated by GAM. (b–g) Black lines in the background represent the seasonal components estimated by the rest of the models of Table 1.
Figure 4. Trend and seasonal components estimated by the FFT model for (a) Spain (ES), and (c) Sweden (SE). (b) and (d) show the detail of the seasonal component estimated for each country. In order to facilitate the comparison between both countries, seasonal components of (b) Sweden and (d) Spain are indicated with red broken lines.
Figure 5. Detail of the error curve with error bars obtained by repeated cross-validation at each point, for the method AVG, and the maximum temperature of Spain. The smoothest model (bandwidth = 89) whose error is inside the 95% confidence interval of the best one (bandwidth = 61) is chosen, indicated by the red vertical broken line.
Figure 6. Boxplots for the (a) minimum daily temperature and (b) maximum daily temperature of each European country.
Figure 7. Dendrograms of the 37 European weather stations, based on the (a) minimum and (b) maximum temperatures. The coloured clusters correspond to those formed using a correlation threshold of 0.9.
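Grouping stations with a correlation threshold, as in the Figure 7 caption, can be illustrated with the sketch below. It is a simplification under stated assumptions: the caption does not give the linkage criterion, so the code merges any two series whose Pearson correlation is at least the threshold, which corresponds to cutting a single-linkage dendrogram at that level. A different linkage would generally give different clusters. The function names and the toy series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_clusters(series, threshold=0.9):
    """Group series names linked by correlations at or above threshold."""
    names = list(series)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path compression.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Merge every pair of series that is correlated strongly enough.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(series[a], series[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy series, invented for illustration only.
toy_series = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [1.1, 2.0, 3.2, 3.9, 5.1],
    "C": [5.0, 3.0, 4.0, 1.0, 2.0],
}
clusters = correlation_clusters(toy_series, threshold=0.9)
print(clusters)
```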
Figure 8. Location of the reference weather stations. The coloured clusters correspond to those formed using the dendrogram of Figure 7b, based on the maximum temperature.
Figure 9. Results for the minimum temperatures and all the tested methods: (a) dendrogram of the 37 weather stations; the three methods with the lowest (b) in-sample and (d) out-of-sample error; and the percentage of times each method is the winner, (c) in-sample and (e) out-of-sample.
Figure 10. Results for the maximum temperatures and all the tested methods: (a) dendrogram of the 37 weather stations; the three methods with the lowest (b) in-sample and (d) out-of-sample error; and the percentage of times each method is the winner, (c) in-sample and (e) out-of-sample.
Figure 11. Scatterplot of the trends of the minimum and maximum temperatures. Each point represents a country, and colors indicate the cluster to which each point belongs according to Figure 7b. The black broken line represents the values of equal trend for minimum and maximum temperatures.
Figure 12. Trend determined by the GAM for the (a) minimum and (b) maximum temperatures of the 37 countries. The colors indicate the cluster to which each country belongs according to Figure 7.
Figure 13. Annual increase of temperatures obtained from the trend component of the GAM for the (a) minimum and (b) maximum temperatures of the 37 countries.
Table 1. List of models and main characteristics.

| ID | Fitting | Trend | Seasonal |
|---|---|---|---|
| REG | OLS | Linear with the year | Day of the year as categorical |
| GAM | OLS | Linear with time | Cubic spline of the day of the year |
| FFT | Backfitting | Linear with time | Sum of weighted sines and cosines |
| AVG | Backfitting | Linear with time | Weighted moving average |
| LOESS | Backfitting | Linear with time | Robust LOESS |
| LHM | Backfitting | Linear with time | Piecewise linear model |
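As a rough illustration of the backfitting scheme listed in Table 1, the sketch below alternates between an ordinary least squares fit of a linear trend and a seasonal smoother applied to the detrended series. It is not the authors' implementation: the seasonal step uses a circular moving average with uniform weights over a 7-day window as a stand-in for the weighted moving average of the AVG model, leap days are ignored, and all function names and the synthetic series are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def seasonal_smoother(doy, resid, period, half_width):
    """Circular moving average of the per-day means of the residuals."""
    sums = [0.0] * period
    counts = [0] * period
    for d, r in zip(doy, resid):
        sums[d] += r
        counts[d] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    window = range(-half_width, half_width + 1)
    smooth = [
        sum(means[(d + k) % period] for k in window) / len(window)
        for d in range(period)
    ]
    # Centre the seasonal component so the trend keeps the overall level.
    centre = sum(smooth) / period
    return [s - centre for s in smooth]


def backfit(t, doy, y, period=365, half_width=3, n_iter=20):
    """Alternate between the trend fit and the seasonal fit."""
    seasonal = [0.0] * period
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        # Trend step: straight line fitted to the deseasonalised series.
        a, b = fit_line(t, [yi - seasonal[d] for yi, d in zip(y, doy)])
        # Seasonal step: smoother applied to the detrended series.
        resid = [yi - (a + b * ti) for yi, ti in zip(y, t)]
        seasonal = seasonal_smoother(doy, resid, period, half_width)
    return a, b, seasonal


# Synthetic daily series (invented): 10 years, 0.05 degrees per year of
# warming, and a sinusoidal annual cycle of amplitude 8 degrees.
period = 365
t = list(range(10 * period))
doy = [ti % period for ti in t]
y = [
    10.0 + 0.05 * ti / period + 8.0 * math.sin(2 * math.pi * d / period)
    for ti, d in zip(t, doy)
]
a, b, seasonal = backfit(t, doy, y, period=period)
print(round(b * period, 3), round(max(seasonal), 2))
```

Swapping `seasonal_smoother` for another smoother (harmonics, LOESS, a piecewise linear fit) would give the other backfitting variants of the table while leaving the loop unchanged.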
Table 2. Out-of-sample (TS) error improvements ($\Delta\mathrm{RMSE}$, %) obtained by including linear trends in all the methods, for the minimum (TMIN) and maximum (TMAX) temperatures from the 37 European countries.

| Country ID | TMIN REG | TMIN GAM | TMIN FFT | TMIN AVG | TMIN LOESS | TMIN LHM | TMAX REG | TMAX GAM | TMAX FFT | TMAX AVG | TMAX LOESS | TMAX LHM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AT | −4.353 | −4.472 | −4.507 | −4.401 | −4.485 | −4.327 | −3.821 | −3.893 | −3.923 | −3.809 | −4.118 | −3.895 |
| BA | −3.777 | −3.881 | −3.889 | −3.811 | −3.645 | −3.915 | −2.915 | −2.955 | −2.984 | −2.914 | −2.810 | −2.965 |
| BE | −1.530 | −1.581 | −1.638 | −1.543 | −1.401 | −1.578 | −1.537 | −1.585 | −1.620 | −1.551 | −1.843 | −1.548 |
| BY | −1.150 | −1.180 | −1.204 | −1.165 | −1.119 | −1.176 | −2.151 | −2.191 | −2.208 | −2.147 | −2.181 | −2.278 |
| CH | 2.057 | 2.162 | 2.044 | 2.150 | 2.211 | 2.174 | −1.842 | −1.896 | −1.911 | −1.853 | −2.296 | −1.847 |
| CY | −15.007 | −15.300 | −15.425 | −15.070 | −14.854 | −15.312 | −7.798 | −8.005 | −8.158 | −7.956 | −8.179 | −7.985 |
| CZ | −6.971 | −7.185 | −7.239 | −7.112 | −7.249 | −6.903 | −1.758 | −1.809 | −1.863 | −1.776 | −2.132 | −1.806 |
| DE | −0.439 | −0.448 | −0.477 | −0.431 | −0.500 | −0.455 | −2.051 | −2.104 | −2.153 | −2.083 | −2.336 | −2.110 |
| DK | −1.161 | −1.201 | −1.247 | −1.170 | −1.200 | −1.204 | −4.440 | −4.536 | −4.604 | −4.407 | −4.576 | −4.502 |
| EE | 1.906 | 1.952 | 1.899 | 2.009 | 2.534 | 2.586 | −1.709 | −1.773 | −1.790 | −1.727 | −1.884 | −1.759 |
| ES | −2.929 | −3.020 | −3.069 | −2.927 | −3.133 | −2.551 | −2.481 | −2.560 | −2.630 | −2.508 | −2.772 | −2.560 |
| FI | −2.929 | −3.083 | −3.071 | −3.030 | −2.993 | −3.094 | −1.937 | −2.027 | −2.079 | −1.892 | −2.046 | −2.020 |
| FR | −0.716 | −0.732 | −0.748 | −0.705 | −0.720 | −0.726 | −1.933 | −1.998 | −2.026 | −1.953 | −2.270 | −2.061 |
| GB | −0.079 | −0.097 | −0.150 | −0.075 | −0.074 | −0.107 | −1.560 | −1.627 | −1.656 | −1.589 | −1.824 | −1.991 |
| GR | −6.680 | −6.837 | −6.856 | −6.878 | −6.922 | −6.926 | −2.559 | −2.638 | −2.688 | −2.620 | −2.636 | −2.668 |
| HR | −5.897 | −6.055 | −6.087 | −5.942 | −6.052 | −6.771 | −4.622 | −4.705 | −4.730 | −4.587 | −4.623 | −4.756 |
| HU | −3.746 | −3.880 | −3.897 | −3.784 | −3.812 | −3.826 | −2.219 | −2.277 | −2.308 | −2.272 | −2.321 | −2.273 |
| IE | −0.448 | −0.456 | −0.450 | −0.452 | −0.464 | −0.450 | 5.978 | 6.086 | 6.061 | 6.017 | 5.985 | 6.049 |
| IS | −8.999 | −9.388 | −9.497 | −9.427 | −9.239 | −9.297 | −0.250 | −0.263 | −0.290 | −0.225 | −0.387 | −0.298 |
| IT | −5.217 | −5.370 | −5.388 | −5.326 | −5.274 | −5.376 | −1.034 | −1.076 | −1.124 | −1.054 | −1.282 | −1.163 |
| LT | −2.666 | −2.746 | −2.765 | −2.686 | −2.450 | −2.713 | −1.629 | −1.676 | −1.723 | −1.600 | −1.714 | −1.930 |
| LU | −1.504 | −1.549 | −1.608 | −1.491 | −1.597 | −1.409 | −1.953 | −2.001 | −2.037 | −1.946 | −2.696 | −1.992 |
| LV | −1.551 | −1.615 | −1.638 | −1.554 | −1.613 | −1.583 | −1.325 | −1.361 | −1.421 | −1.309 | −1.407 | −1.355 |
| MD | −2.888 | −3.008 | −3.068 | −2.896 | −2.836 | −2.774 | −3.225 | −3.349 | −3.329 | −3.235 | −3.359 | −3.301 |
| MT | −2.620 | −2.734 | −2.847 | −2.542 | −2.579 | −2.635 | −1.256 | −1.322 | −1.344 | −1.319 | −1.450 | −1.297 |
| NL | 0.005 | −0.007 | −0.045 | 0.015 | 0.069 | −0.030 | −1.372 | −1.419 | −1.480 | −1.383 | −1.780 | −1.419 |
| NO | 0.374 | 0.367 | 0.300 | 0.397 | 0.529 | 0.328 | −0.795 | −0.841 | −0.933 | −0.818 | −1.014 | −0.527 |
| PL | −4.342 | −4.459 | −4.471 | −4.248 | −4.291 | −4.444 | −3.660 | −3.728 | −3.755 | −3.668 | −3.760 | −3.670 |
| PT | −0.647 | −0.665 | −0.677 | −0.666 | −0.652 | −0.657 | −2.766 | −2.840 | −2.898 | −2.802 | −3.064 | −2.846 |
| RO | −0.562 | −0.580 | −0.596 | −0.615 | −0.602 | −0.583 | −2.012 | −2.102 | −2.156 | −2.051 | −1.985 | −1.997 |
| RS | −6.853 | −7.056 | −7.069 | −6.916 | −7.021 | −6.960 | −3.194 | −3.277 | −3.308 | −3.245 | −3.362 | −3.272 |
| RU | −3.413 | −3.566 | −3.579 | −3.534 | −3.401 | −3.848 | −1.424 | −1.487 | −1.504 | −1.436 | −1.532 | −1.737 |
| SE | −0.742 | −0.801 | −0.840 | −0.732 | −1.060 | −1.004 | −1.649 | −1.737 | −1.786 | −1.671 | −1.921 | −1.893 |
| SI | −7.528 | −7.749 | −7.811 | −7.669 | −7.484 | −7.474 | −3.870 | −3.956 | −4.009 | −3.939 | −4.111 | −3.958 |
| SK | −3.432 | −3.533 | −3.589 | −3.461 | −3.463 | −3.820 | −3.213 | −3.287 | −3.352 | −3.251 | −3.492 | −3.226 |
| TR | −4.612 | −4.680 | −4.749 | −4.585 | −4.635 | −4.677 | −3.892 | −4.050 | −4.111 | −4.035 | −3.898 | −4.498 |
| UA | −4.147 | −4.297 | −4.317 | −4.152 | −3.970 | −4.438 | −4.484 | −4.620 | −4.649 | −4.368 | −4.497 | −4.586 |
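The percentages in Table 2 can be read with the help of the following sketch. It assumes, since the definition is not given in this part of the article, that $\Delta\mathrm{RMSE}$ is the relative change of the out-of-sample RMSE when the linear trend is added, so that negative values mean a lower error with the trend. The function names and the toy numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def delta_rmse_pct(rmse_with_trend, rmse_without_trend):
    """Relative RMSE change in percent (assumed sign: negative = better)."""
    return 100.0 * (rmse_with_trend - rmse_without_trend) / rmse_without_trend


# Toy out-of-sample values, invented for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
forecast_without_trend = [11.0, 11.0, 15.0, 15.0]
forecast_with_trend = [10.5, 11.5, 14.5, 15.5]

delta = delta_rmse_pct(
    rmse(actual, forecast_with_trend),
    rmse(actual, forecast_without_trend),
)
print(delta)
```

Under this convention, the positive entries of the table (for example CH, EE and NO for TMIN, and IE for TMAX) would correspond to cases where adding the trend increased the out-of-sample error.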
Table 3. Summary of the in-sample (TR) and out-of-sample (TS) adjusted R-squared ($R^2_{\mathrm{adj}}$) for the minimum temperatures (TMIN), obtained with all the different methods and for all the countries. Values marked with * indicate the best method for each country and dataset.

| Country ID | TR REG | TR GAM | TR FFT | TR AVG | TR LOESS | TR LHM | TS REG | TS GAM | TS FFT | TS AVG | TS LOESS | TS LHM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AT | 0.777 * | 0.772 | 0.771 | 0.768 | 0.768 | 0.772 | 0.779 | 0.785 | 0.785 * | 0.780 | 0.781 | 0.785 |
| BA | 0.721 * | 0.713 | 0.711 | 0.708 | 0.709 | 0.712 | 0.717 | 0.725 | 0.724 | 0.719 | 0.720 | 0.728 * |
| BE | 0.660 * | 0.650 | 0.647 | 0.646 | 0.646 | 0.648 | 0.619 | 0.630 * | 0.628 | 0.627 | 0.626 | 0.624 |
| BY | 0.709 * | 0.703 | 0.702 | 0.697 | 0.698 | 0.701 | 0.683 | 0.691 * | 0.689 | 0.686 | 0.687 | 0.689 |
| CH | 0.767 * | 0.761 | 0.758 | 0.757 | 0.756 | 0.761 | 0.760 | 0.770 * | 0.768 | 0.762 | 0.761 | 0.768 |
| CY | 0.850 * | 0.845 | 0.845 | 0.844 | 0.844 | 0.845 | 0.860 | 0.863 | 0.863 * | 0.861 | 0.861 | 0.863 |
| CZ | 0.740 * | 0.734 | 0.732 | 0.730 | 0.729 | 0.734 | 0.717 | 0.725 | 0.726 * | 0.721 | 0.722 | 0.722 |
| DE | 0.709 * | 0.702 | 0.700 | 0.696 | 0.697 | 0.700 | 0.700 | 0.706 * | 0.704 | 0.699 | 0.701 | 0.705 |
| DK | 0.777 * | 0.770 | 0.770 | 0.766 | 0.765 | 0.770 | 0.763 | 0.771 | 0.772 * | 0.769 | 0.770 | 0.772 |
| EE | 0.705 * | 0.696 | 0.693 | 0.688 | 0.689 | 0.698 | 0.666 | 0.680 * | 0.677 | 0.674 | 0.672 | 0.675 |
| ES | 0.812 * | 0.807 | 0.805 | 0.804 | 0.805 | 0.806 | 0.822 | 0.828 | 0.828 * | 0.822 | 0.827 | 0.825 |
| FI | 0.741 * | 0.735 | 0.731 | 0.729 | 0.731 | 0.733 | 0.741 | 0.753 * | 0.749 | 0.749 | 0.750 | 0.751 |
| FR | 0.656 * | 0.647 | 0.647 | 0.642 | 0.643 | 0.643 | 0.641 | 0.647 * | 0.646 | 0.641 | 0.642 | 0.639 |
| GB | 0.660 * | 0.651 | 0.651 | 0.647 | 0.647 | 0.652 | 0.633 | 0.643 * | 0.642 | 0.638 | 0.639 | 0.640 |
| GR | 0.816 * | 0.810 | 0.808 | 0.807 | 0.807 | 0.808 | 0.785 | 0.791 | 0.789 | 0.790 | 0.789 | 0.792 * |
| HR | 0.794 * | 0.788 | 0.786 | 0.785 | 0.785 | 0.789 | 0.790 | 0.796 * | 0.795 | 0.792 | 0.792 | 0.796 |
| HU | 0.779 * | 0.773 | 0.772 | 0.769 | 0.770 | 0.773 | 0.787 | 0.795 * | 0.794 | 0.789 | 0.790 | 0.792 |
| IE | 0.583 * | 0.571 | 0.571 | 0.567 | 0.568 | 0.569 | 0.491 | 0.500 * | 0.500 | 0.496 | 0.497 | 0.495 |
| IS | 0.598 * | 0.587 | 0.585 | 0.581 | 0.580 | 0.585 | 0.572 | 0.591 | 0.594 * | 0.593 | 0.587 | 0.588 |
| IT | 0.824 * | 0.818 | 0.818 | 0.817 | 0.816 | 0.818 | 0.809 | 0.815 | 0.813 | 0.812 | 0.811 | 0.815 * |
| LT | 0.699 * | 0.692 | 0.690 | 0.685 | 0.685 | 0.692 | 0.686 | 0.694 * | 0.691 | 0.685 | 0.686 | 0.690 |
| LU | 0.707 * | 0.699 | 0.697 | 0.695 | 0.694 | 0.699 | 0.688 | 0.697 * | 0.697 | 0.692 | 0.692 | 0.693 |
| LV | 0.710 * | 0.703 | 0.699 | 0.695 | 0.695 | 0.700 | 0.695 | 0.707 * | 0.704 | 0.695 | 0.697 | 0.701 |
| MD | 0.805 * | 0.799 | 0.799 | 0.795 | 0.796 | 0.799 | 0.792 | 0.801 | 0.801 * | 0.794 | 0.796 | 0.798 |
| MT | 0.799 * | 0.792 | 0.792 | 0.789 | 0.789 | 0.791 | 0.839 | 0.846 * | 0.845 | 0.838 | 0.839 | 0.843 |
| NL | 0.615 * | 0.603 | 0.603 | 0.597 | 0.597 | 0.600 | 0.602 | 0.614 * | 0.613 | 0.605 | 0.606 | 0.608 |
| NO | 0.763 * | 0.756 | 0.753 | 0.752 | 0.751 | 0.755 | 0.746 | 0.761 * | 0.759 | 0.756 | 0.756 | 0.758 |
| PL | 0.707 * | 0.701 | 0.700 | 0.692 | 0.695 | 0.701 | 0.723 | 0.730 * | 0.729 | 0.714 | 0.718 | 0.729 |
| PT | 0.685 * | 0.677 | 0.675 | 0.672 | 0.671 | 0.673 | 0.622 | 0.632 | 0.632 | 0.634 * | 0.633 | 0.626 |
| RO | 0.787 * | 0.781 | 0.780 | 0.777 | 0.777 | 0.781 | 0.766 | 0.774 * | 0.772 | 0.768 | 0.768 | 0.772 |
| RS | 0.770 * | 0.764 | 0.762 | 0.761 | 0.760 | 0.764 | 0.765 | 0.772 * | 0.771 | 0.767 | 0.767 | 0.769 |
| RU | 0.760 * | 0.754 | 0.752 | 0.751 | 0.750 | 0.753 | 0.758 | 0.768 * | 0.766 | 0.765 | 0.766 | 0.766 |
| SE | 0.779 * | 0.772 | 0.772 | 0.767 | 0.767 | 0.773 | 0.767 | 0.779 * | 0.778 | 0.774 | 0.775 | 0.775 |
| SI | 0.778 * | 0.772 | 0.771 | 0.770 | 0.769 | 0.772 | 0.780 | 0.787 * | 0.787 | 0.785 | 0.784 | 0.785 |
| SK | 0.746 * | 0.739 | 0.738 | 0.734 | 0.735 | 0.739 | 0.737 | 0.745 | 0.745 * | 0.740 | 0.741 | 0.745 |
| TR | 0.771 * | 0.764 | 0.764 | 0.762 | 0.762 | 0.764 | 0.769 | 0.772 | 0.773 * | 0.769 | 0.769 | 0.772 |
| UA | 0.787 * | 0.782 | 0.780 | 0.775 | 0.777 | 0.782 | 0.786 | 0.793 * | 0.791 | 0.784 | 0.787 | 0.791 |
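For reference, the adjusted R-squared reported in Tables 3 and 4 is sketched below using its standard textbook definition, $R^2_{\mathrm{adj}} = 1 - (1 - R^2)(n - 1)/(n - p - 1)$. How the number of parameters $p$ is counted for each method is not stated in this part of the article, so the value used in the example is purely hypothetical, as are the function names.

```python
def r_squared(actual, predicted):
    """Coefficient of determination of a set of predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_params):
    """Standard adjustment of R-squared for the number of parameters."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_params - 1)


# Hypothetical example: R-squared of 0.8 with 101 observations and
# 10 fitted parameters (numbers invented for illustration).
adj = adjusted_r_squared(0.8, n_obs=101, n_params=10)
print(round(adj, 4))
```

Because the penalty grows with the number of parameters, a flexible seasonal component is only rewarded when it explains enough additional variance.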
Table 4. Summary of the in-sample (TR) and out-of-sample (TS) adjusted R-squared ($R^2_{\mathrm{adj}}$) for the maximum temperatures (TMAX), obtained with all the different methods and for all the countries. Values marked with * indicate the best method for each country and dataset.

| Country ID | TR REG | TR GAM | TR FFT | TR AVG | TR LOESS | TR LHM | TS REG | TS GAM | TS FFT | TS AVG | TS LOESS | TS LHM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AT | 0.788 * | 0.782 | 0.781 | 0.778 | 0.779 | 0.783 | 0.776 | 0.780 * | 0.780 | 0.775 | 0.777 | 0.779 |
| BA | 0.731 * | 0.723 | 0.722 | 0.720 | 0.719 | 0.722 | 0.720 | 0.725 | 0.725 | 0.719 | 0.718 | 0.727 * |
| BE | 0.718 * | 0.711 | 0.710 | 0.707 | 0.707 | 0.713 | 0.699 | 0.707 * | 0.707 | 0.701 | 0.703 | 0.705 |
| BY | 0.805 * | 0.800 | 0.795 | 0.797 | 0.796 | 0.800 | 0.796 | 0.799 | 0.797 | 0.795 | 0.797 | 0.800 * |
| CH | 0.786 * | 0.780 | 0.780 | 0.778 | 0.777 | 0.781 | 0.773 | 0.779 * | 0.777 | 0.775 | 0.775 | 0.776 |
| CY | 0.874 * | 0.871 | 0.870 | 0.868 | 0.869 | 0.871 | 0.870 | 0.873 | 0.874 | 0.873 | 0.873 | 0.874 * |
| CZ | 0.756 * | 0.749 | 0.746 | 0.746 | 0.745 | 0.748 | 0.737 | 0.744 | 0.744 * | 0.739 | 0.743 | 0.744 |
| DE | 0.767 * | 0.761 | 0.758 | 0.758 | 0.756 | 0.761 | 0.748 | 0.754 | 0.755 * | 0.751 | 0.753 | 0.754 |
| DK | 0.817 * | 0.813 | 0.808 | 0.810 | 0.809 | 0.813 | 0.820 | 0.824 | 0.824 * | 0.819 | 0.822 | 0.823 |
| EE | 0.809 * | 0.804 | 0.800 | 0.801 | 0.800 | 0.803 | 0.806 | 0.812 * | 0.810 | 0.808 | 0.809 | 0.810 |
| ES | 0.829 * | 0.824 | 0.821 | 0.821 | 0.820 | 0.825 | 0.825 | 0.831 | 0.831 * | 0.826 | 0.831 | 0.829 |
| FI | 0.831 * | 0.827 | 0.825 | 0.821 | 0.821 | 0.827 | 0.830 | 0.837 * | 0.836 | 0.828 | 0.830 | 0.834 |
| FR | 0.746 * | 0.739 | 0.739 | 0.737 | 0.737 | 0.740 | 0.734 | 0.742 * | 0.741 | 0.737 | 0.739 | 0.741 |
| GB | 0.746 * | 0.740 | 0.739 | 0.737 | 0.737 | 0.740 | 0.723 | 0.732 * | 0.731 | 0.727 | 0.729 | 0.730 |
| GR | 0.760 * | 0.752 | 0.751 | 0.749 | 0.749 | 0.750 | 0.752 | 0.760 | 0.761 | 0.759 | 0.759 | 0.762 * |
| HR | 0.787 * | 0.781 | 0.781 | 0.778 | 0.778 | 0.781 | 0.777 | 0.782 * | 0.780 | 0.776 | 0.777 | 0.780 |
| HU | 0.788 * | 0.781 | 0.781 | 0.780 | 0.778 | 0.782 | 0.771 | 0.777 * | 0.777 | 0.776 | 0.776 | 0.776 |
| IE | 0.728 * | 0.722 | 0.722 | 0.718 | 0.719 | 0.721 | 0.665 | 0.672 * | 0.672 | 0.666 | 0.670 | 0.667 |
| IS | 0.446 * | 0.431 | 0.430 | 0.416 | 0.417 | 0.422 | 0.413 | 0.440 | 0.443 * | 0.439 | 0.438 | 0.428 |
| IT | 0.828 * | 0.823 | 0.823 | 0.820 | 0.820 | 0.823 | 0.814 | 0.821 | 0.821 | 0.820 | 0.820 | 0.822 * |
| LT | 0.821 * | 0.816 | 0.815 | 0.812 | 0.811 | 0.815 | 0.815 | 0.819 * | 0.819 | 0.812 | 0.816 | 0.819 |
| LU | 0.750 * | 0.744 | 0.743 | 0.740 | 0.740 | 0.744 | 0.737 | 0.743 * | 0.742 | 0.737 | 0.740 | 0.741 |
| LV | 0.793 * | 0.787 | 0.785 | 0.784 | 0.782 | 0.787 | 0.793 | 0.798 | 0.799 * | 0.792 | 0.794 | 0.797 |
| MD | 0.808 * | 0.802 | 0.799 | 0.800 | 0.799 | 0.803 | 0.803 | 0.810 * | 0.806 | 0.804 | 0.805 | 0.809 |
| MT | 0.864 * | 0.859 | 0.858 | 0.858 | 0.857 | 0.857 | 0.870 | 0.877 * | 0.876 | 0.875 | 0.875 | 0.874 |
| NL | 0.732 * | 0.725 | 0.721 | 0.721 | 0.719 | 0.723 | 0.712 | 0.720 * | 0.720 | 0.715 | 0.717 | 0.715 |
| NO | 0.826 * | 0.821 | 0.818 | 0.818 | 0.817 | 0.821 | 0.812 | 0.820 | 0.822 * | 0.817 | 0.819 | 0.818 |
| PL | 0.794 * | 0.789 | 0.785 | 0.785 | 0.784 | 0.788 | 0.783 | 0.786 * | 0.785 | 0.782 | 0.784 | 0.786 |
| PT | 0.763 * | 0.756 | 0.753 | 0.754 | 0.752 | 0.757 | 0.770 | 0.776 | 0.778 * | 0.773 | 0.776 | 0.775 |
| RO | 0.822 * | 0.817 | 0.816 | 0.814 | 0.814 | 0.816 | 0.811 | 0.819 | 0.820 * | 0.815 | 0.815 | 0.815 |
| RS | 0.744 * | 0.737 | 0.736 | 0.735 | 0.734 | 0.737 | 0.734 | 0.741 * | 0.741 | 0.738 | 0.738 | 0.740 |
| RU | 0.818 * | 0.814 | 0.811 | 0.811 | 0.810 | 0.814 | 0.817 | 0.824 | 0.821 | 0.819 | 0.819 | 0.824 * |
| SE | 0.824 * | 0.819 | 0.817 | 0.816 | 0.816 | 0.819 | 0.827 | 0.834 * | 0.832 | 0.829 | 0.831 | 0.833 |
| SI | 0.810 * | 0.804 | 0.804 | 0.803 | 0.801 | 0.804 | 0.789 | 0.794 | 0.794 * | 0.793 | 0.793 | 0.793 |
| SK | 0.813 * | 0.808 | 0.807 | 0.805 | 0.806 | 0.808 | 0.793 | 0.798 | 0.799 * | 0.796 | 0.798 | 0.797 |
| TR | 0.839 * | 0.835 | 0.835 | 0.833 | 0.833 | 0.834 | 0.836 | 0.843 | 0.843 * | 0.841 | 0.841 | 0.842 |
| UA | 0.825 * | 0.821 | 0.818 | 0.816 | 0.816 | 0.821 | 0.819 | 0.825 * | 0.824 | 0.814 | 0.818 | 0.823 |
Table 5. Summary of the in-sample (TR) and out-of-sample (TS) errors (RMSE) for the minimum temperatures (TMIN), obtained with all the different methods and for all the countries. Values marked with * indicate the best method for each country and dataset.

| Country ID | TR REG | TR GAM | TR FFT | TR AVG | TR LOESS | TR LHM | TS REG | TS GAM | TS FFT | TS AVG | TS LOESS | TS LHM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AT | 3.436 * | 3.477 | 3.488 | 3.507 | 3.505 | 3.478 | 3.428 | 3.380 | 3.376 * | 3.415 | 3.406 | 3.378 |
| BA | 3.830 * | 3.885 | 3.895 | 3.919 | 3.912 | 3.889 | 3.850 | 3.793 | 3.806 | 3.836 | 3.831 | 3.779 * |
| BE | 3.429 * | 3.476 | 3.490 | 3.498 | 3.496 | 3.488 | 3.574 | 3.524 * | 3.535 | 3.540 | 3.539 | 3.551 |
| BY | 4.373 * | 4.419 | 4.428 | 4.465 | 4.454 | 4.433 | 4.553 | 4.499 * | 4.513 | 4.533 | 4.526 | 4.512 |
| CH | 3.111 * | 3.156 | 3.173 | 3.180 | 3.186 | 3.155 | 3.206 | 3.141 * | 3.151 | 3.194 | 3.197 | 3.151 |
| CY | 2.329 * | 2.362 | 2.365 | 2.374 | 2.376 | 2.362 | 2.243 | 2.221 | 2.219 * | 2.239 | 2.235 | 2.220 |
| CZ | 3.712 * | 3.761 | 3.775 | 3.788 | 3.793 | 3.761 | 3.927 | 3.867 | 3.860 * | 3.894 | 3.889 | 3.887 |
| DE | 3.608 * | 3.651 | 3.665 | 3.685 | 3.679 | 3.662 | 3.705 | 3.669 * | 3.679 | 3.711 | 3.703 | 3.676 |
| DK | 2.966 * | 3.008 | 3.014 | 3.037 | 3.041 | 3.008 | 3.022 | 2.972 | 2.965 * | 2.985 | 2.980 | 2.966 |
| EE | 4.924 * | 4.993 | 5.020 | 5.056 | 5.048 | 4.981 | 5.122 | 5.014 * | 5.039 | 5.062 | 5.060 | 5.058 |
| ES | 2.801 * | 2.836 | 2.853 | 2.858 | 2.856 | 2.844 | 2.858 | 2.813 | 2.813 * | 2.860 | 2.820 | 2.836 |
| FI | 4.424 * | 4.477 | 4.513 | 4.524 | 4.511 | 4.497 | 4.342 | 4.240 * | 4.275 | 4.280 | 4.267 | 4.261 |
| FR | 3.469 * | 3.512 | 3.514 | 3.535 | 3.533 | 3.532 | 3.556 | 3.524 * | 3.532 | 3.555 | 3.550 | 3.562 |
| GB | 3.103 * | 3.143 | 3.144 | 3.162 | 3.163 | 3.139 | 3.188 | 3.146 * | 3.152 | 3.166 | 3.162 | 3.157 |
| GR | 2.336 * | 2.374 | 2.383 | 2.390 | 2.391 | 2.383 | 2.507 | 2.477 | 2.484 | 2.481 | 2.483 | 2.467 * |
| HR | 3.376 * | 3.420 | 3.437 | 3.446 | 3.446 | 3.418 | 3.422 | 3.372 * | 3.380 | 3.407 | 3.405 | 3.375 |
| HU | 3.612 * | 3.656 | 3.669 | 3.687 | 3.684 | 3.654 | 3.560 | 3.494 * | 3.502 | 3.544 | 3.535 | 3.521 |
| IE | 2.886 * | 2.927 | 2.927 | 2.941 | 2.939 | 2.934 | 3.401 | 3.369 * | 3.371 | 3.382 | 3.380 | 3.385 |
| IS | 2.697 * | 2.736 | 2.740 | 2.755 | 2.757 | 2.740 | 2.544 | 2.486 | 2.479 * | 2.481 | 2.497 | 2.497 |
| IT | 2.570 * | 2.611 | 2.615 | 2.621 | 2.625 | 2.612 | 2.707 | 2.666 | 2.674 | 2.683 | 2.689 | 2.666 * |
| LT | 4.445 * | 4.493 | 4.507 | 4.547 | 4.543 | 4.498 | 4.583 | 4.519 * | 4.541 | 4.586 | 4.584 | 4.551 |
| LU | 3.471 * | 3.516 | 3.529 | 3.542 | 3.548 | 3.518 | 3.555 | 3.504 * | 3.509 | 3.534 | 3.537 | 3.532 |
| LV | 4.057 * | 4.104 | 4.135 | 4.161 | 4.158 | 4.124 | 4.284 | 4.204 * | 4.220 | 4.285 | 4.274 | 4.246 |
| MD | 3.863 * | 3.917 | 3.923 | 3.954 | 3.949 | 3.917 | 4.073 | 3.991 | 3.991 * | 4.053 | 4.040 | 4.014 |
| MT | 2.347 * | 2.386 | 2.388 | 2.402 | 2.406 | 2.394 | 2.111 | 2.066 * | 2.070 | 2.113 | 2.111 | 2.084 |
| NL | 3.574 * | 3.626 | 3.627 | 3.654 | 3.654 | 3.642 | 3.635 | 3.580 * | 3.583 | 3.620 | 3.616 | 3.604 |
| NO | 3.740 * | 3.802 | 3.820 | 3.833 | 3.834 | 3.807 | 3.848 | 3.738 * | 3.747 | 3.772 | 3.768 | 3.755 |
| PL | 4.170 * | 4.214 | 4.222 | 4.274 | 4.257 | 4.217 | 4.213 | 4.159 * | 4.168 | 4.285 | 4.256 | 4.171 |
| PT | 3.307 * | 3.350 | 3.358 | 3.372 | 3.379 | 3.371 | 3.563 | 3.513 | 3.514 | 3.506 * | 3.509 | 3.543 |
| RO | 3.789 * | 3.846 | 3.853 | 3.879 | 3.874 | 3.842 | 4.062 | 3.994 * | 4.006 | 4.046 | 4.041 | 4.010 |
| RS | 3.716 * | 3.765 | 3.776 | 3.787 | 3.789 | 3.760 | 3.830 | 3.771 * | 3.782 | 3.815 | 3.814 | 3.797 |
| RU | 4.602 * | 4.663 | 4.685 | 4.695 | 4.699 | 4.670 | 4.567 | 4.473 * | 4.495 | 4.500 | 4.498 | 4.489 |
| SE | 3.488 * | 3.540 | 3.546 | 3.579 | 3.582 | 3.536 | 3.544 | 3.456 * | 3.460 | 3.495 | 3.491 | 3.484 |
| SI | 3.495 * | 3.539 | 3.548 | 3.558 | 3.567 | 3.542 | 3.421 | 3.366 * | 3.368 | 3.384 | 3.391 | 3.381 |
| SK | 3.890 * | 3.938 | 3.945 | 3.978 | 3.971 | 3.944 | 3.934 | 3.875 | 3.873 * | 3.915 | 3.905 | 3.876 |
| TR | 3.379 * | 3.428 | 3.430 | 3.443 | 3.445 | 3.428 | 3.398 | 3.376 | 3.373 * | 3.401 | 3.396 | 3.376 |
| UA | 4.189 * | 4.239 | 4.257 | 4.300 | 4.285 | 4.233 | 4.271 | 4.194 * | 4.219 | 4.287 | 4.262 | 4.219 |
Table 6. Summary of the in-sample (TR) and out-of-sample (TS) errors (RMSE) for the maximum temperatures (TMAX), obtained with all the different methods and for all the countries. Values marked with * indicate the best method for each country and dataset.

| Country ID | TR REG | TR GAM | TR FFT | TR AVG | TR LOESS | TR LHM | TS REG | TS GAM | TS FFT | TS AVG | TS LOESS | TS LHM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AT | 4.391 * | 4.448 | 4.461 | 4.492 | 4.485 | 4.445 | 4.610 | 4.567 * | 4.573 | 4.624 | 4.603 | 4.576 |
| BA | 5.163 * | 5.240 | 5.245 | 5.268 | 5.277 | 5.244 | 5.421 | 5.376 | 5.374 | 5.431 | 5.440 | 5.360 * |
| BE | 3.924 * | 3.973 | 3.981 | 4.001 | 4.003 | 3.965 | 4.152 | 4.094 * | 4.095 | 4.135 | 4.125 | 4.110 |
| BY | 4.529 * | 4.586 | 4.638 | 4.618 | 4.629 | 4.584 | 4.790 | 4.753 | 4.780 | 4.804 | 4.784 | 4.744 * |
| CH | 4.065 * | 4.122 | 4.128 | 4.146 | 4.151 | 4.120 | 4.216 | 4.151 * | 4.174 | 4.191 | 4.192 | 4.182 |
| CY | 2.250 * | 2.276 | 2.284 | 2.301 | 2.298 | 2.281 | 2.224 | 2.195 | 2.189 | 2.196 | 2.196 | 2.187 * |
| CZ | 4.729 * | 4.797 | 4.825 | 4.826 | 4.833 | 4.806 | 5.144 | 5.077 | 5.072 * | 5.125 | 5.092 | 5.073 |
| DE | 4.320 * | 4.373 | 4.399 | 4.400 | 4.417 | 4.369 | 4.593 | 4.541 | 4.529 * | 4.564 | 4.551 | 4.545 |
| DK | 3.276 * | 3.314 | 3.351 | 3.340 | 3.344 | 3.311 | 3.371 | 3.336 | 3.335 * | 3.384 | 3.356 | 3.344 |
| EE | 4.415 * | 4.472 | 4.518 | 4.512 | 4.518 | 4.481 | 4.532 | 4.459 * | 4.484 | 4.512 | 4.494 | 4.485 |
| ES | 3.551 * | 3.603 | 3.633 | 3.627 | 3.637 | 3.591 | 3.814 | 3.755 | 3.745 * | 3.805 | 3.745 | 3.769 |
| FI | 3.728 * | 3.772 | 3.794 | 3.844 | 3.837 | 3.782 | 3.803 | 3.731 * | 3.744 | 3.831 | 3.806 | 3.762 |
| FR | 3.920 * | 3.972 | 3.976 | 3.994 | 3.993 | 3.968 | 4.052 | 3.988 * | 4.001 | 4.027 | 4.017 | 3.995 |
| GB | 3.275 * | 3.313 | 3.318 | 3.334 | 3.333 | 3.310 | 3.497 | 3.438 * | 3.447 | 3.472 | 3.459 | 3.453 |
| GR | 2.895 * | 2.941 | 2.946 | 2.957 | 2.961 | 2.954 | 2.881 | 2.836 | 2.828 | 2.840 | 2.840 | 2.821 * |
| HR | 4.299 * | 4.357 | 4.362 | 4.387 | 4.390 | 4.360 | 4.518 | 4.472 * | 4.487 | 4.530 | 4.524 | 4.491 |
| HU | 4.584 * | 4.649 | 4.658 | 4.669 | 4.683 | 4.646 | 4.742 | 4.676 * | 4.678 | 4.686 | 4.689 | 4.692 |
| IE | 2.579 * | 2.609 | 2.610 | 2.629 | 2.624 | 2.614 | 2.932 | 2.900 * | 2.901 | 2.927 | 2.914 | 2.924 |
| IS | 3.425 * | 3.471 | 3.475 | 3.517 | 3.513 | 3.500 | 3.107 | 3.034 | 3.026 * | 3.035 | 3.042 | 3.067 |
| IT | 2.769 * | 2.810 | 2.811 | 2.835 | 2.833 | 2.812 | 2.814 | 2.759 | 2.759 | 2.767 | 2.768 | 2.753 * |
| LT | 4.240 * | 4.295 | 4.311 | 4.344 | 4.348 | 4.305 | 4.503 | 4.452 * | 4.455 | 4.543 | 4.499 | 4.459 |
| LU | 4.164 * | 4.218 | 4.226 | 4.245 | 4.249 | 4.215 | 4.386 | 4.332 * | 4.339 | 4.383 | 4.372 | 4.350 |
| LV | 3.858 * | 3.915 | 3.927 | 3.943 | 3.960 | 3.918 | 4.015 | 3.971 | 3.961 * | 4.025 | 4.006 | 3.983 |
| MD | 4.775 * | 4.843 | 4.887 | 4.870 | 4.878 | 4.837 | 5.104 | 5.008 * | 5.062 | 5.095 | 5.085 | 5.031 |
| MT | 2.351 * | 2.392 | 2.399 | 2.402 | 2.406 | 2.406 | 2.279 | 2.216 * | 2.224 | 2.231 | 2.228 | 2.235 |
| NL | 3.591 * | 3.637 | 3.664 | 3.666 | 3.675 | 3.650 | 3.766 | 3.713 * | 3.715 | 3.744 | 3.742 | 3.745 |
| NO | 3.823 * | 3.882 | 3.911 | 3.917 | 3.918 | 3.879 | 4.017 | 3.930 | 3.907 * | 3.965 | 3.943 | 3.958 |
| PL | 4.448 * | 4.507 | 4.553 | 4.544 | 4.557 | 4.518 | 4.708 | 4.668 * | 4.677 | 4.712 | 4.696 | 4.669 |
| PT | 4.002 * | 4.062 | 4.086 | 4.080 | 4.098 | 4.057 | 4.117 | 4.065 | 4.045 * | 4.092 | 4.061 | 4.071 |
| RO | 4.609 * | 4.673 | 4.676 | 4.712 | 4.712 | 4.680 | 4.783 | 4.682 | 4.672 * | 4.732 | 4.727 | 4.733 |
| RS | 5.100 * | 5.174 | 5.179 | 5.194 | 5.197 | 5.175 | 5.266 | 5.194 * | 5.197 | 5.225 | 5.224 | 5.206 |
| RU | 4.420 * | 4.472 | 4.511 | 4.504 | 4.516 | 4.475 | 4.528 | 4.447 | 4.482 | 4.504 | 4.506 | 4.442 * |
| SE | 3.784 * | 3.842 | 3.864 | 3.868 | 3.876 | 3.844 | 3.811 | 3.731 * | 3.744 | 3.778 | 3.764 | 3.736 |
| SI | 4.163 * | 4.228 | 4.234 | 4.242 | 4.262 | 4.228 | 4.465 | 4.409 | 4.408 * | 4.423 | 4.419 | 4.416 |
| SK | 4.365 * | 4.423 | 4.433 | 4.451 | 4.448 | 4.422 | 4.639 | 4.583 | 4.574 * | 4.608 | 4.590 | 4.600 |
| TR | 3.845 * | 3.898 | 3.901 | 3.926 | 3.919 | 3.906 | 3.817 | 3.740 | 3.734 * | 3.756 | 3.757 | 3.748 |
| UA | 4.533 * | 4.593 | 4.622 | 4.649 | 4.657 | 4.585 | 4.971 | 4.898 * | 4.911 | 5.045 | 4.996 | 4.915 |
Table 7. Trends (°C/year) for the minimum (TMIN) and maximum (TMAX) temperatures and all the countries, estimated by the GAM.

| Country ID | TMIN trend (°C/year) | TMAX trend (°C/year) | Country ID | TMIN trend (°C/year) | TMAX trend (°C/year) |
|---|---|---|---|---|---|
| AT | 0.039 | 0.053 | IT | 0.031 | 0.030 |
| BA | 0.045 | 0.046 | LT | 0.046 | 0.059 |
| BE | 0.065 | 0.054 | LU | 0.067 | 0.079 |
| BY | 0.035 | 0.050 | LV | 0.042 | 0.054 |
| CH | 0.064 | 0.056 | MD | 0.069 | 0.073 |
| CY | 0.065 | 0.055 | MT | 0.054 | 0.013 |
| CZ | 0.061 | 0.076 | NL | 0.046 | 0.059 |
| DE | 0.029 | 0.062 | NO | 0.062 | 0.069 |
| DK | 0.037 | 0.057 | PL | 0.041 | 0.056 |
| EE | 0.102 | 0.047 | PT | 0.021 | 0.037 |
| ES | 0.033 | 0.024 | RO | 0.007 | 0.061 |
| FI | 0.052 | 0.063 | RS | 0.061 | 0.064 |
| FR | 0.039 | 0.061 | RU | 0.055 | 0.055 |
| GB | 0.056 | 0.051 | SE | 0.064 | 0.072 |
| GR | 0.027 | 0.037 | SI | 0.071 | 0.069 |
| HR | 0.058 | 0.089 | SK | 0.057 | 0.074 |
| HU | 0.042 | 0.050 | TR | 0.076 | 0.042 |
| IE | −0.013 | 0.041 | UA | 0.047 | 0.069 |
| IS | 0.046 | 0.044 | - | - | - |

Share and Cite

MDPI and ACS Style

Moreno-Carbonell, S.; Sánchez-Úbeda, E.F.; Muñoz, A. Time Series Decomposition of the Daily Outdoor Air Temperature in Europe for Long-Term Energy Forecasting in the Context of Climate Change. Energies 2020, 13, 1569. https://doi.org/10.3390/en13071569
