Time Series Decomposition of the Daily Outdoor Air Temperature in Europe for Long-Term Energy Forecasting in the Context of Climate Change

Temperature is widely known as one of the most important drivers for forecasting electricity and gas variables, such as the load. For that reason, temperature forecasting has long been of great interest to energy forecasters, and several approaches and methods have been published. However, these methods usually do not consider the temperature trend, which causes significant error increases when dealing with medium- or long-term estimations. This paper presents several temperature forecasting methods based on time series decomposition and analyzes their results and the trends of 37 different European countries, proving their annual average temperature increase and their different behaviors regarding trend and seasonal components.


Introduction
The main goal of large electric and natural gas utility companies is to provide energy to customers spread out through large regions such as countries or states. These companies devote great efforts to optimizing all the processes required to produce and deliver electricity and natural gas, due to the high costs of building, maintaining and operating the involved infrastructure. Within this context, short-term demand forecasting is carried out in order to ensure the reliability of supply in daily operation, whereas medium- and long-term demand forecasting is the basis for effective operation and planning (see e.g., Reference [1]).
It is well-known that meteorological conditions have a significant influence on end-use energy consumption. Among all the weather-derived factors, such as solar radiation, humidity, wind speed, cloudiness, or rainfall, outdoor air temperature is the main weather driver of electricity and natural gas demand (see e.g., References [2,3]). For example, residential and commercial natural gas consumption by end use is primarily linked with heating (including hot water) and cooking. Concerning electricity, it is used not only for heating and cooking, but also for a variety of purposes including lighting and cooling. Therefore, these energy consumption categories are clearly influenced by outdoor air temperature. Note that the impact of weather factors can vary depending on the geographical location, climate and industrial structure of the region.
Focusing on electric load forecasting, the non-linear relationship between temperature and electricity demand has been widely studied, and many papers use temperature as the main driver (see Reference [4]). In fact, many examples can be found among the participants of the Global Energy Forecasting Competitions (GEFCom) of 2012, 2014 and 2017 (see References [5][6][7] respectively). For that reason, temperature forecasting has been for years of great interest to energy forecasters.
Furthermore, in recent years there has been growing interest in probabilistic load forecasting, and generating several temperature scenarios to feed a point load forecasting model is a very common approach. For example, in Reference [8] the authors review several methods for temperature scenario generation and provide some guidelines, focusing on electric load forecasting.
Decomposition methods are a common and useful approach for time series forecasting (including temperature forecasting) in order to analyze different underlying patterns separately [9]. As can be seen in Figure 1, where 40 years of daily minimum and maximum temperatures (1980-2019) from weather stations (WS) of four European countries are shown (specifically, Spain (ES), France (FR), Germany (DE) and Sweden (SE)), these series present strong seasonality within each year. Regarding temperature trends, observing Figure 1 there is no evident annual increase, but the effect of considering it or not will be discussed in this paper. Furthermore, in order to reflect the underlying seasonal patterns more clearly, Figure 2 shows seasonal plots for each WS. Time series decomposition methods usually split the time series into three main components or underlying patterns: trend (or trend-cycle), seasonal and remainder. In the context of climate change, where the scientific consensus regarding human-caused global warming exceeds 90% according to Reference [10], interest in temperature trend analysis has increased. For example, in Reference [11] the authors analyze monthly European temperatures, showing that most of the trend components in the time series are positive and linear. Regarding temperature forecasting, Reference [12] proposes a load-based temperature forecasting model tested with and without the trend as an input variable, without finding significant error improvements. However, in that paper the authors use 2 years (2008 and 2009) as the training set to forecast 2010. What would happen if we wanted to forecast a longer period? Would the inclusion of the trend be negligible for our model? Or, on the contrary, would the error increase if we did not consider it? Regarding the seasonal component, several different methods will be tested.
Finally, when dealing with medium- or long-term forecasting, the remainder is usually assumed to be uncorrelated and normally distributed with zero mean and unknown variance. The residuals of the trend and seasonal components typically capture cold and heat temperature waves, such as those described in Reference [13]. Here we will make the aforementioned assumption, forecasting the expected value of temperature based on its trend and seasonal components. This paper, focusing on long-term forecasting (according to Reference [4], more than three years) and time series decomposition methods, aims to answer three questions. First, does trend inclusion improve the performance of our models? Our results, based on 6 different temperature forecasting models, conclude that in most cases the answer is yes. Secondly, which method behaves best regarding temperature forecasting? Finally, once those two main questions are answered, what do our models say about the behavior of European minimum and maximum temperatures?

Temperature Time Series Decomposition
Time series decomposition involves separating the time series into several distinct components of interest. As stated in Reference [9], it is often helpful to split a time series into several components, each one representing an underlying pattern category. As the magnitude of the annual fluctuations does not vary with the level of the temperature time series, the additive decomposition is the most appropriate one for temperature time series:

y_t = T_t + S_t + R_t, (1)

where y_t is the daily temperature at time t, and T_t, S_t and R_t are the trend, seasonal and remainder components at time t, respectively. The sum of the trend and seasonal components, D_t = T_t + S_t, represents the deterministic component of y_t, whereas the deviations of y_t around the expected value given by D_t are represented by the remainder component, that is, those variations not explained by the deterministic one. Although decomposition is primarily useful for studying historical changes over time, it can also be used in forecasting. In this paper, the deterministic component is projected into the future to estimate the expected daily temperatures. Thus, the trend and seasonal components have to be modeled to allow not only their accurate estimation, but reliable extrapolation into the future as well.
Regarding temperature time series decomposition, and as can be seen in Figure 1, if the trend exists, it should be very weak. For that reason, a simple linear regression on the input time t has been used, as a robust and reliable model and in line with Reference [11]. Related to the trend component, we have ignored possible cyclic behavior with no fixed frequency. Concerning the seasonal component, as in classical decomposition, in this paper it is assumed that the seasonal component is constant from year to year. For daily temperature time series this is a reasonable assumption in long-term forecasting. Finally, modelling the remainder component is out of the scope of this paper, since our goal is to model the deterministic component and measure the impact of the trend when dealing with long-term forecasts.
Among all possible alternatives for the deterministic component, in this paper we have proposed a particular set of models. These models are listed in Table 1 and are explained in more detail in the following sections.

Naïve Linear Regression Model
The naïve model (REG) has the form of a multiple linear regression:

y_t = T_t^REG + S_t^REG + R_t, (2)

where T_t^REG and S_t^REG are the trend and the seasonal components, respectively. T_t^REG is a simple linear term in the year, whereas S_t^REG is a function of the day of the year. Different alternatives for modeling S_t^REG are possible. For example, a third-order polynomial in the day of the year, considered as a continuous variable, can produce a seasonal component in compact form with few parameters. However, in our experiments better results are obtained when the day of the year is considered as categorical, that is, the seasonal component consists of 364 dummy variables representing the day of the year. Thus, the naïve model of Equation (1) can be rewritten as

y_t = β_0 + β_1 year_t + Σ_{i=1}^{364} γ_i d_t^i + R_t, (3)

where year_t is the year of time t, and d_t^i is the dummy variable for the i-th day of the year. The parameters of Equation (3) are estimated by least squares, minimizing the Mean Squared Error (MSE). Figure 3 shows an example of the seasonal component estimated by REG for the maximum temperature of Spain, where 30 years of daily data have been used. Compared with the other methods, it is clear that REG generates a noisy but unbiased seasonal component. In all our experiments with the minimum and maximum temperatures of the 37 countries this is the typical behavior.
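As an illustration (not the authors' code), the REG model of Equation (3) can be sketched in Python with NumPy. The synthetic series, the 0.05 °C/year slope and all variable names below are assumptions of this example:

```python
import numpy as np

# Sketch of the REG model: linear year trend plus 364 day-of-year dummies,
# fitted by least squares on a synthetic temperature-like series.
rng = np.random.default_rng(0)
years, days = 30, 365
year_idx = np.repeat(np.arange(years), days)        # year_t of each observation
doy = np.tile(np.arange(days), years)               # day of the year (0..364)

seasonal = 10.0 * np.sin(2 * np.pi * doy / 365)     # assumed seasonal pattern
y = 15.0 + 0.05 * year_idx + seasonal + rng.normal(0.0, 2.0, years * days)

# Design matrix: intercept, linear year term, and 364 dummies d_t^i
# (day 0 is the reference level absorbed by the intercept).
dummies = (doy[:, None] == np.arange(1, days)[None, :]).astype(float)
X = np.column_stack([np.ones(years * days), year_idx, dummies])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
trend_per_year = beta[1]        # should be close to the assumed 0.05 °C/year
```

The dummy design recovers the per-day seasonal offsets without imposing any functional form, which is what makes REG noisy but unbiased.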

Discrete-Time Fourier Transform
The Discrete-time Fourier Series decomposition (FFT) proposed for the deterministic term of Equation (1) has the form

y_t = T_t^FFT + S_t^FFT + R_t, (4)

where T_t^FFT is a simple linear trend given by β_0 + β_1 t, and S_t^FFT is the seasonal component, represented by a Discrete-time Fourier series (see e.g., Reference [14]):

S_t^FFT = Σ_{h=1}^{H} [θ_h sin(hωt) + φ_h cos(hωt)], (5)

where θ_h and φ_h are the coefficients of the Fourier series, H is the number of harmonics, and the angular frequency has been fixed to ω = 2π/365 in order to model the periodic oscillations of temperature with the seasons of the year. Note that the frequencies of the sines and cosines are multiples of the fundamental frequency 1/365; therefore, the frequency h/365 is called the h-th harmonic. For a given value of H, the amplitudes of the sines and cosines can be estimated by Ordinary Least Squares (OLS). The value of H has been selected using repeated cross-validation (see Section 3.2). Both the trend and the seasonal components of Equation (4) are iteratively refitted using the backfitting algorithm of Section 3.1. According to the illustrative example of Figure 3, the seasonal component estimated by FFT has a good trade-off between bias and variance. This is the typical behavior observed in all our experiments with the 37 countries. Figure 4 shows the trend and seasonal components estimated for the maximum temperature of Spain (ES) and Sweden (SE). The number of harmonics estimated in both cases is 2. Thus, the seasonal component is the result of combining two sines and two cosines.
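A minimal sketch of this harmonic regression, on assumed synthetic data with H = 2 and all coefficients chosen for illustration, could look as follows:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(20 * 365, dtype=float)      # 20 years of synthetic daily data
omega = 2 * np.pi / 365                   # fundamental angular frequency

# Assumed series: weak linear trend plus two harmonics plus noise
y = 12.0 + 1e-4 * t + 8.0 * np.cos(omega * t) + 3.0 * np.sin(2 * omega * t) \
    + rng.normal(0.0, 2.0, t.size)

H = 2                                     # number of harmonics
cols = [np.ones_like(t), t]
for h in range(1, H + 1):
    cols += [np.sin(h * omega * t), np.cos(h * omega * t)]
X = np.column_stack(cols)

# OLS recovers the trend and the Fourier amplitudes jointly;
# theta[2:] holds the coefficients (sin1, cos1, sin2, cos2)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the sines and cosines at harmonic frequencies are (nearly) orthogonal over complete years, the amplitudes can be estimated jointly in a single least-squares fit.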

Weighted Moving Average
In contrast to the proposed FFT method of Section 2.2, where a predefined form of the seasonal component is assumed, smoothers do not make any assumption about the form (see e.g., References [15,16]).
The proposed weighted moving average decomposition (AVG) for the deterministic term of Equation (1) has the form

y_t = T_t^AVG + S_t^AVG + R_t, (6)

where T_t^AVG is a simple linear trend given by β_0 + β_1 t, and the seasonal component S_t^AVG is obtained by fitting a locally weighted linear regression, that is, placing less weight on the points at the edges of the smoothing window, centered about the current element, according to a Gaussian function (see Reference [16]). The window length sets the number of weighted neighbouring elements used to fit the linear regression locally by OLS, and it has been selected using repeated cross-validation (see Section 3.2). Both components of Equation (6) are refitted using the backfitting algorithm of Section 3.1.
The illustrative example of Figure 3 is representative of the typical behavior observed in all our experiments with the 37 countries. In this particular case, the AVG method selected a window length of 89 for the seasonal component. As expected, this window length produces a smooth output with a reasonable compromise between bias and variance, with the bias concentrated in summer and winter, the periods of higher curvature.
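A minimal NumPy sketch of a Gaussian-weighted local linear fit of this kind (the synthetic profile, the window length and the Gaussian width are assumptions of this example, not the paper's exact settings):

```python
import numpy as np

def gaussian_local_linear(x, y, x0, window):
    """Fit a straight line around x0 with Gaussian weights, evaluate it at x0."""
    w = np.exp(-0.5 * ((x - x0) / (window / 4.0)) ** 2)   # Gaussian weighting
    X = np.column_stack([np.ones_like(x), x - x0])
    Xw = X * w[:, None]
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)            # weighted normal equations
    return beta[0]                                        # local intercept = fit at x0

# Smooth a noisy synthetic seasonal profile (day of the year vs. temperature)
rng = np.random.default_rng(3)
x = np.arange(365, dtype=float)
y = 10.0 * np.sin(2 * np.pi * x / 365) + rng.normal(0.0, 1.0, 365)
peak_estimate = gaussian_local_linear(x, y, 91.0, 89)     # near the seasonal peak
```

Note how the smoothed value near the peak slightly undershoots the true amplitude: this is exactly the curvature bias in summer and winter mentioned above.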

Robust Locally Estimated Scatterplot Smoothing
As an alternative to the proposed AVG method, the robust locally estimated scatterplot smoothing decomposition (LOESS) replaces the locally weighted linear regression used in the seasonal component of AVG by robust LOESS, a weighted quadratic least squares regression robust to possible outliers (see References [17,18]). Note that LOESS uses the tri-cube weight function instead of the Gaussian one used in AVG. As in the proposed AVG method, the window length has been selected using repeated cross-validation (see Section 3.2). According to Figure 3, the seasonal component estimated by LOESS presents a good compromise between bias and variance, but, as also happened with AVG, the bias is concentrated in the periods with more curvature (i.e., summer and winter). It is noteworthy that these two models are distanced from all the others in these periods.
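For comparison with the AVG sketch, the core of a LOESS-style fit (tri-cube weights, local quadratic) can be sketched as below; the robustness iterations against outliers of full robust LOESS are deliberately omitted, and the data and window are again assumptions:

```python
import numpy as np

def tricube_local_quadratic(x, y, x0, window):
    """Local quadratic fit with tri-cube weights, evaluated at x0.
    (Robustness iterations of full robust LOESS are omitted in this sketch.)"""
    u = np.minimum(np.abs(x - x0) / (window / 2.0), 1.0)
    w = (1.0 - u ** 3) ** 3                       # tri-cube weight function
    X = np.column_stack([np.ones_like(x), x - x0, (x - x0) ** 2])
    Xw = X * w[:, None]
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)    # weighted normal equations
    return beta[0]                                # fitted value at x0

rng = np.random.default_rng(7)
x = np.arange(365, dtype=float)
y = 10.0 * np.sin(2 * np.pi * x / 365) + rng.normal(0.0, 1.0, 365)
peak_estimate = tricube_local_quadratic(x, y, 91.0, 89)
```

The quadratic local model tracks the curvature at the seasonal peak better than the linear one used in AVG, at the price of somewhat higher variance.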

Linear Hinges Model
The linear hinges model decomposition (LHM) for the deterministic term of Equation (1) has the form

y_t = T_t^LHM + S_t^LHM + R_t, (7)

where T_t^LHM is a simple linear trend given by β_0 + β_1 t, and the seasonal component S_t^LHM is obtained by fitting the Linear Hinges Model proposed in References [19,20]. Thus, the seasonal component S_t^LHM is a piecewise linear model defined by K knots, the points specifying the pieces. The trend and seasonal components of Equation (7) are refitted using the backfitting algorithm of Section 3.1. In each iteration of this algorithm the trend component is fitted by OLS, whereas the number and positions of the knots in S_t^LHM are obtained using the learning algorithm described in Reference [19], a particular implementation of backfitting that combines a greedy divide-and-conquer strategy with a computationally efficient pruning approach and special updating formulas. Figure 3 shows the seasonal component estimated by LHM, which has a good trade-off between bias and variance. Note that S_t^LHM is a very simple seasonal model: with only 22 parameters it is able to represent the underlying seasonal pattern in compact form. The rest of the seasonal models, except FFT, require hundreds of coefficients.
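The flavor of a piecewise-linear seasonal fit can be sketched with a hinge basis and OLS. Note the key simplification: the actual LHM selects the number and positions of the knots automatically, whereas the fixed, evenly spaced knots below are an assumption of this sketch:

```python
import numpy as np

def hinge_basis(x, knots):
    """Design matrix of a piecewise-linear function with the given (fixed) knots."""
    cols = [np.ones_like(x), x] + [np.maximum(0.0, x - k) for k in knots]
    return np.column_stack(cols)

# Fit a piecewise-linear seasonal shape by OLS with hand-picked knots.
rng = np.random.default_rng(2)
x = np.arange(365, dtype=float)
truth = 10.0 * np.sin(2 * np.pi * x / 365)     # assumed seasonal pattern
y = truth + rng.normal(0.0, 1.0, 365)

knots = np.arange(30.0, 365.0, 30.0)           # 12 hypothetical fixed knots
B = hinge_basis(x, knots)
beta, *_ = np.linalg.lstsq(B, y, rcond=None)
fit_rmse = float(np.sqrt(np.mean((B @ beta - truth) ** 2)))
```

Even with this naive knot placement, about a dozen hinges approximate a smooth annual cycle closely, which illustrates why the LHM can be so compact.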

Generalized Additive Model
Following Reference [15], the Generalized Additive Model decomposition (GAM) proposed for the deterministic term of Equation (1) has the form

y_t = T_t^GAM + S_t^GAM + R_t, (8)

where T_t^GAM and S_t^GAM are the trend and the seasonal component, respectively. The trend T_t^GAM is modeled by a simple linear regression β_0 + β_1 t. Concerning the seasonal component S_t^GAM, we have used a cyclic penalized cubic regression spline on the day of the year d_t, with knots at each day of the year {1, ..., 365}. This type of cubic spline forces the beginning and the end of the seasonal term to match up to the second derivative (see Reference [21] for further details). Note that because both model components have a straightforward representation using basis functions, all the parameters of this model can be fitted directly using OLS.
It is worth noting that this model has also been used to test the suitability of a linear trend. In order to do that, we have compared the results of the GAM with a linear trend against a version in which the trend has also been fitted using cubic regression splines (see, e.g., Reference [22]). Comparing their results, the former outperforms the latter in out-of-sample error in more than 84% of the cases. That confirms our initial assumption, and what was stated in Reference [11].
According to the illustrative example in Figure 3, the seasonal component estimated by GAM has a good trade-off between bias and variance. Note that the GAM's seasonal term seems to be a smoothed version of the seasonal component obtained with REG. In fact, GAM provides the best results in our experiments of Section 4.

Estimation and Model Selection
The parameters of the previous models are estimated by minimizing the Mean Squared Error (MSE), or equivalently, the Root Mean Squared Error (RMSE):

RMSE = sqrt( (1/N) Σ_{t=1}^{N} (y_t − D_t)² ), (9)

where N is the number of observations in the data set, y_t is the actual temperature and D_t is the temperature estimated by the deterministic component. We have also used the RMSE to evaluate the performance of the different models tested in the following sections. According to the particular specification of each candidate model, only REG and GAM have a fixed number of parameters that can be estimated using ordinary least squares (OLS). The rest of the models require a mechanism to automatically estimate their complexity, as well as an alternative to standard OLS to fit the parameters. In this paper we have used repeated cross-validation (RCV) for selecting the complexity of these models, combined with backfitting to estimate their parameters. The list of models is presented in Table 1, showing those fitted using backfitting and RCV.
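For reference, the RMSE of Equation (9) is a one-liner; the helper name below is our own:

```python
import numpy as np

def rmse(y, d):
    """Root Mean Squared Error between observed temperatures y and
    the deterministic-component estimates d (Equation (9))."""
    y, d = np.asarray(y, dtype=float), np.asarray(d, dtype=float)
    return float(np.sqrt(np.mean((y - d) ** 2)))
```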
Note that, to fit the seasonal component of the proposed methods, February 29 is previously discarded from all the years in order to work with years of 365 days. Furthermore, all the methods except REG and FFT require forming a learning set by overlapping years, as in the scatterplots of Figure 2.

Backfitting
Among all the proposed models, only the parameters of REG and GAM can be estimated in one shot by OLS. When it is not possible to estimate the full set of parameters of Equation (1) by ordinary least squares in one shot, an alternative is to estimate each component in a forward stepwise manner. This is the common approach, for example, in classical additive time series decomposition.
In the forward fitting approach the trend component is first estimated from the original data. Once the trend has been estimated, the seasonal component is estimated from the detrended series, that is, the time series resulting from subtracting the estimated trend from the original time series. The remainder term (R t ) is just the residuals of the deterministic component estimated using this simple procedure.
However, this forward one-step fitting can be improved using backfitting. This algorithm was initially proposed by Reference [23] in the context of projection-pursuit regression, and it has since been used for parameter estimation in well-known models such as GAM (see Section 2.6) and LASSO [24]. It is also the global fitting strategy of the LHM (see Section 2.5), the SNAKE model [25], and the medium-term forecasting model of Reference [26]. Reference [15] makes intensive use of this general algorithm, providing justifications for its use. Note that the backfitting algorithm is in fact a kind of coordinate descent optimization method, see Reference [27]. According to Reference [28], coordinate descent algorithms are also used to solve problems that arise in machine learning and data analysis, particularly in big data settings.
Basically, backfitting is an iterative strategy where the parameters of the model are grouped such that the solution for those in each group is straightforward given fixed values for those outside the group. The algorithm iterates through these groups, one by one, making several passes over the groups. Although this strategy does not guarantee that the solution is the global minimum, this does not mean that the algorithm is not useful. Moreover, experience tells us that it is very effective in practice.
In this paper we propose a particular implementation of the backfitting algorithm, see Algorithm 1, designed for fitting the decomposition model of Equation (1) when direct ordinary least squares is not possible. There are two groups of parameters: those related to the trend component and those that define the seasonal term. The remainder component is just computed at the end by subtracting the estimated trend and seasonal terms.
According to Reference [16], the required number of cycles m of the backfitting algorithm is usually less than 20, depending on the amount of correlation in the inputs. In this paper we have set m = 20 in Algorithm 1.

Algorithm 1:
The backfitting algorithm for the additive decomposition model
Input: Training data given by the temperature time series y_t
Output: Model defined by the components T_t, S_t and R_t
1:  Initialize S_t = 0
2:  for cycle = 1, ..., m do
3:      Step 1: Fit the trend component
4:          Compute the partial residuals y_t − S_t and fit the linear trend T_t on them by OLS
5:      Step 2: Fit the seasonal component
6:          Compute the partial residuals y_t − T_t and fit the seasonal component S_t on them
7:          Center S_t = S_t − mean(S_t)
8:  end for
9:  Step 3: Compute the remainder component R_t = y_t − T_t − S_t
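As a minimal, self-contained sketch (not the authors' implementation), the backfitting loop can be written in NumPy. A per-day-of-year mean is used here as a stand-in for the seasonal smoothers of the paper, and the synthetic data are an assumption:

```python
import numpy as np

def backfit(y, doy, m=20):
    """Backfitting for y_t = T_t + S_t + R_t: alternate fitting a linear trend
    and a seasonal term (here a simple per-day-of-year mean)."""
    t = np.arange(y.size, dtype=float)
    A = np.column_stack([np.ones_like(t), t])
    S = np.zeros_like(y)
    for _ in range(m):
        # Step 1: fit the linear trend on the partial residuals y - S
        beta, *_ = np.linalg.lstsq(A, y - S, rcond=None)
        T = A @ beta
        # Step 2: fit the seasonal component on the partial residuals y - T
        resid = y - T
        S = np.array([resid[doy == d].mean() for d in range(365)])[doy]
        S -= S.mean()                      # center the seasonal component
    R = y - T - S                          # Step 3: the remainder
    return T, S, R

rng = np.random.default_rng(4)
doy = np.tile(np.arange(365), 10)          # 10 complete synthetic years
t = np.arange(doy.size, dtype=float)
y = (5.0 + 0.001 * t + 8.0 * np.sin(2 * np.pi * doy / 365)
     + rng.normal(0.0, 1.0, doy.size))
T, S, R = backfit(y, doy)
slope_per_day = (T[-1] - T[0]) / (doy.size - 1)
```

Over complete years the trend and seasonal terms are nearly orthogonal, so the loop converges in very few cycles, consistent with the m < 20 remark below.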

Complexity Selection: Repeated Cross-Validation
Determining the optimal value of the complexity parameter is critical to ensure that the model performs well. In this paper, where the complexity of several models had to be determined (specifically, FFT, AVG and LOESS) before being used for each country, we have used repeated cross-validation (RCV, also known as leave-group-out or Monte Carlo cross-validation, see Reference [29]) to ensure a good complexity-accuracy balance. Unlike K-fold cross-validation, where the data are split into a number of equal-sized and mutually exclusive folds, RCV allows decoupling the number of partitions and their size. Let N be the size of our dataset. First, a randomly selected (without replacement) fraction of the data of size M is taken as the training set, and the rest of the points are used for validation. This sampling is repeated K times, with K and M independent and controlled by the practitioner. The error for each partition is evaluated over the remaining N − M points.
In this paper, RCV has been chosen over K-fold cross-validation to better control the variance of our results, due to the fact that it has been used to determine the complexity of the aforementioned three methods, and for all the different time series that will be analyzed in the case study of Section 4 (74 time series, minimum and maximum temperatures from 37 European countries). To do that, and using the values suggested in Reference [29], at each replication we have used 75% of our training data for parameter estimation (M), and we have carried out 100 repetitions (K).
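The RCV procedure described above can be sketched as follows; the OLS toy model, the 400-point synthetic dataset and the function name are assumptions of this example, while the 75%/100-repetition settings mirror the text:

```python
import numpy as np

def repeated_cv_rmse(X, y, frac=0.75, K=100, seed=0):
    """Repeated (Monte Carlo) cross-validation: K random splits, a fraction
    `frac` of the data for fitting (an OLS model) and the rest for validation."""
    rng = np.random.default_rng(seed)
    n = y.size
    m = int(frac * n)
    errors = []
    for _ in range(K):
        idx = rng.permutation(n)                 # sample without replacement
        tr, va = idx[:m], idx[m:]
        beta, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)
        pred = X[va] @ beta
        errors.append(np.sqrt(np.mean((y[va] - pred) ** 2)))   # RMSE on N - M points
    return float(np.mean(errors)), float(np.std(errors))

rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 400)
X = np.column_stack([np.ones_like(t), t])
y = 1.0 + 2.0 * t + rng.normal(0.0, 0.5, t.size)
mean_rmse, sd_rmse = repeated_cv_rmse(X, y)
```

In model selection, this loop would be run once per candidate complexity value (e.g., per number of harmonics H or per window length), and the mean and spread of the K errors compared across candidates.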
Finally, in order to better control model complexity rather than simply selecting the model with the lowest RCV error, and following an approach similar to the one-standard-error rule detailed in Reference [16], the most parsimonious model whose error lies inside the 95% confidence interval of the error of the best one is finally chosen. Figure 5 shows an example of this final step.

Results: The European Case
In this section, the minimum and maximum daily outdoor temperatures from 37 weather stations (WS) will be used in order to assess the impact of the trend component in the context of medium- or long-term estimations, as well as to compare exhaustively all the methods listed in Table 1 and explained in the previous sections. Furthermore, the best performing model will be finally selected, analysing the estimated temperature trends in Europe according to the selected model.

Data Description
First, the data used for the case study are described. All the data used in this paper come from the European Climate Assessment & Dataset project (ECA&D) [30]. The dataset consists of thousands of weather stations, with quite complete daily temperature observations since 1980 for the main European cities. It should be noted that, in spite of the fact that we have focused on the last forty years of data, the ECA&D dataset, depending on the weather station, has much more past information. As an example, the oldest (non-empty) available data point comes from the Radcliffe Meteorological Station of Oxford University (ID 274 in ECA&D), for which there is information since December 1814. In this paper, we will select one reference time series for each country. A first pre-processing step to select those reference temperatures, remove outliers and fill their missing values was required before testing the different models. Regarding missing values, we have applied a hierarchical regression imputation method, based on neighbouring stations, that is detailed in Appendix A.2. All the information regarding data pre-processing is detailed in Appendix A, including the list of reference weather stations that have been selected.
Our dataset consists of the minimum (TMIN) and maximum (TMAX) daily outdoor temperatures recorded at the 37 weather stations of Table A1, from January 1980 to December 2019. Thus, this case study consists of 74 daily time series of length 40 years (14,610 days). Furthermore, let us describe the different data partitions that have been used in this paper. The years 1980-2009 have been used as training data (in-sample, 10,958 points), and the years 2010-2019 have been used as the test set (out-of-sample, 3652 points). Figure 6 shows the boxplot of the minimum and maximum daily temperatures of the selected station for each country, once the main outliers have been removed (see Table A2). The clear differences between the Mediterranean countries, such as Malta (MT), Italy (IT), Greece (GR), Spain (ES) or Cyprus (CY), and the rest are noticeable. Figure 8 shows the location of the selected weather stations. Each station has been coloured according to the identified clusters of the dendrogram of the maximum temperature (see Figure 7, top). Note that both latitude and longitude explain those clusters.

Importance of the Trend Component
This section aims to briefly analyze the effect of including the trend in all the methods described in Section 2. To do that, two versions of each model have been fitted: one using linear trends as described in Section 3, and the other just using the seasonal components and the level (i.e., mean value) of each time series. Table 2 shows the out-of-sample error improvements obtained by including the trend in each case, calculated as the percentage improvement in RMSE of the model with linear trend compared to the one using the mean.
First, it can be seen that the results are systematic: the effect of including the trend in a particular time series (minimum or maximum from any country) provides similar error improvements regardless of the model in use. Secondly, regarding trend significance, we have obtained p-values lower than 0.05 in 73 of the 74 time series; the only exception is the minimum temperature from Romania (RO), with a p-value of 0.112. Starting from this point, it can be clearly seen that including the trend improves model performance in nearly all the cases. In terms of minimum temperatures, excluding RO, 92% of the countries present out-of-sample error improvements, whereas in the case of TMAX that rate rises to 97%. Furthermore, several countries, such as Cyprus (CY), Poland (PL), or Serbia (RS), present high error improvements. However, on the other hand, one of the 74 analyzed time series shows a surprising result: the trend of the minimum temperature from Ireland (IE), whose p-value is 7.41·10^-5 and which provides nearly a 1% in-sample error improvement, causes an out-of-sample error increase of 6%. As will be discussed in Section 4.4, for all the models the resulting trend in the minimum temperature of IE has been negative, which does not seem to match the behavior of the time series during the 10 years of the test set. Ignoring that case, and as aforementioned, the trend has proved to improve out-of-sample model performance in most series and countries.

Empirical Comparative Analysis
Having described the dataset to be used, and having confirmed the importance of the trend component in long-term temperature forecasting, this section aims to compare the performance of the different candidate models of Table 1 over the selected 37 reference European stations (see Table A1). As aforementioned, for those methods that require selecting model complexity (i.e., FFT, LOESS and AVG), their hyper-parameters have been estimated for each country by using repeated cross-validation, and are detailed in Table A4. The number of knots of the LHM for each country, which is automatically determined by its learning algorithm, can also be seen in Table A4. The results provided by the six methods over the 37 reference weather stations, for both their minimum and maximum temperatures, are detailed below.
First, before presenting the R-squared and the in-sample and out-of-sample errors (RMSE) of all the methods, Figures 9 and 10 show the relative position of the different methods when estimating the minimum and maximum temperatures, respectively, both in-sample and out-of-sample. It should be noted that in both figures countries have been sorted according to the clusters of Figure 7, but there is not a best performing model for each group, and we have not found any relationship between that ordering and model performance.
The results obtained for the minimum and maximum temperatures are quite similar: the best performing models are the same in Figures 9 and 10. First, it should be noted that, regarding in-sample error, the REG method outperforms all the other models in all the countries. However, it can be seen that REG is never selected as one of the top-3 models for out-of-sample performance. For that reason, we can conclude that it is clearly over-fitted. Ignoring REG, it can be seen that for both minimum and maximum temperatures the second and third places regarding in-sample error belong to the GAM and the LHM (the latter beaten by the FFT in some countries). On the other hand, in the out-of-sample set, there are three models that clearly outperform the others: GAM, FFT and LHM. The first one, which was also the second best in-sample performer after the over-fitted REG, has the lowest out-of-sample RMSE in most of the cases. As can be seen in Figures 9 and 10, GAM is the out-of-sample winner in almost 60% of the countries, for both the minimum and maximum temperatures. After the GAM, FFT provides the lowest error in approximately 25% of the countries, and LHM in the vast majority of the remaining cases.
Before presenting the RMSE for all the methods and countries, and in order to check how well the estimated deterministic (trend plus seasonal) components explain temperature variance, Tables 3 and 4 present the Adjusted R-squared (R²_adj) for the minimum and maximum temperatures, respectively. It can be seen that, in spite of the fact that the remainder has not been modelled and, therefore, forecasting performance could be further improved, the obtained R²_adj values are over 0.7 in the vast majority of cases. The average R²_adj for the minimum temperature over all the models and countries is 0.731, for both the in-sample and out-of-sample sets. Regarding the maximum temperature, the average R²_adj is 0.777 and 0.776, respectively. The only cases where a weak R²_adj has been obtained (lower than 0.5 on average) are Ireland (IE), for its out-of-sample minimum temperature, and Iceland (IS), for its maximum temperature (both in- and out-of-sample). Finally, Tables 5 and 6 present the in-sample and out-of-sample errors (RMSE) of all the models, for the minimum and maximum temperatures respectively. As aforementioned, the GAM provides good in-sample results, is always among the top-3 models in terms of out-of-sample performance, and beats all the others in almost 60% of the cases. For that reason, in order to analyze the long-term temperature trends in Europe, and their expected values in the following years (Section 4.4), the GAM will be the only method used.

Table 6. Summary of the in-sample (TR) and out-of-sample (TS) errors for the maximum temperatures, obtained with all the different methods and for all the countries. The bold values with * indicate the best method for each country and dataset.

Analysis of the Long-Term Temperature Trends with the Best-Performing Model
Having analysed the performance of all the different methods over the 37 European countries, and confirmed that the GAM is the best-performing model, this subsection examines the results provided by that model in all the countries, in order to shed some light on potential future changes and to better understand the behavior of European temperature trends. It should be noted that, although only the GAM results are analyzed in this section, the same conclusions regarding the trend component could be drawn with all the tested methods, since their resulting trends are very similar. As an example, the average difference among the trends of the three best-performing models (FFT, LHM and GAM) for the minimum temperatures is 9.43·10⁻⁴ °C/year.
It should be noted that this section presents the trends obtained by the GAM using the reference weather stations from the ECA&D dataset presented in Table A1. Although we have performed a systematic data pre-processing step, removing outliers and filling all the missing points (see Appendix A), any data inconsistency in the original dataset can affect model results. As an example, the minimum temperature of the reference weather station from Estonia presents a sudden increase of around 1 °C during the last 13 years of our in-sample period. For that reason, its estimated trend may not reflect the actual behavior of that temperature. Figure 11 shows the trends estimated by the GAM for the minimum and maximum temperatures of all the countries. First, it can be seen that most countries present positive temperature trends in both series, between 0.02 and 0.08 °C/year. The only exceptions are Romania (RO), Ireland (IE) and Estonia (EE) regarding TMIN, and Malta (MT) and Croatia (HR) regarding TMAX. In the case of RO, its minimum temperature grows at a rate of 0.007 °C/year, whereas its maximum does so at 0.06 °C/year. The case of IE is more surprising: although its maximum temperature is growing by 0.04 °C each year, IE is the only country for which the GAM (and every other model) has estimated a negative trend: its minimum temperature is decreasing by 0.013 °C/year. At the far end, EE presents a TMAX trend very similar to that of IE, but it is the country where the minimum temperature grows fastest: 0.102 °C/year. However, due to the aforementioned data issue, this result may not be representative. In terms of maximum temperatures, Malta (MT) is growing more slowly than all the other countries (0.013 °C/year), whereas Croatia (HR), at the other extreme, has the largest increase, with 0.089 °C/year.
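A per-country trend of the kind reported above can be estimated with a much simpler stand-in than the paper's GAM: ordinary least squares on a linear trend plus annual Fourier seasonal terms. The sketch below is an illustrative assumption, not the authors' implementation; the synthetic series, its 0.05 °C/year injected trend, and the two-harmonic seasonal basis are all choices made for the example.

```python
import numpy as np

def fit_trend(temps, dt=1.0 / 365.25):
    """Estimate a linear trend (in °C/year) from daily temperatures by
    OLS with a linear term and annual/semi-annual Fourier harmonics as
    the seasonal component. Simplified stand-in for the paper's GAM."""
    n = temps.size
    t = np.arange(n) * dt          # time in years
    w = 2 * np.pi * t
    X = np.column_stack([
        np.ones(n), t,             # intercept and linear trend
        np.sin(w), np.cos(w),      # annual cycle
        np.sin(2 * w), np.cos(2 * w),  # semi-annual harmonic
    ])
    beta, *_ = np.linalg.lstsq(X, temps, rcond=None)
    return float(beta[1])          # slope in °C/year

# Synthetic 40-year daily series with a 0.05 °C/year warming trend
rng = np.random.default_rng(1)
t = np.arange(int(40 * 365.25))
y = (10 + 0.05 * t / 365.25
     + 8 * np.sin(2 * np.pi * t / 365.25)
     + rng.normal(0, 3, t.size))
print(round(fit_trend(y), 3))  # close to the injected 0.05
```

With 40 years of daily data, the slope estimate is tight even under substantial day-to-day noise, which is consistent with the very small inter-model trend differences reported above.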
In summary, the average trend of the minimum temperatures across the 37 European countries is 0.0485 °C/year, whereas the average trend of the maximum temperatures is 0.0554 °C/year. For full detail, Table 7 shows the results of the GAM for all the countries and for the minimum and maximum temperatures. In order to find possible similarities between countries, Figure 12 presents the trends of Table 7, separated into minimum and maximum, and coloured according to the clusters shown in Figure 7. It can be seen that, once the countries have been sorted by cluster, several trend patterns emerge.
Let us analyze several examples. Cluster number 5 of the maximum temperatures, formed by the Baltic countries, the Nordic countries (except Iceland), Belarus (BY), Poland (PL) and Russia (RU), presents trend values between 0.047 and 0.072 °C/year. Spain (ES) and Portugal (PT) form, for both minimum and maximum temperatures, the Iberian cluster, with positive but low trend values. Ireland (IE) is in the same cluster as the UK for the maximum temperatures, with similar values; however, they fall into two separate clusters for TMIN (IE is the only country with a negative trend for that variable). France (FR), Belgium (BE), Switzerland (CH) and the Netherlands (NL) have similar results in terms of maximum temperature, but regarding TMIN, FR presents lower increases than all its neighbours. Italy (IT) and Malta (MT) behave similarly in TMAX (flat trends), but in TMIN, MT presents a higher value.
In order to examine the geographical distribution of these trends, Figure 13 shows the European map of minimum and maximum temperature trends. In general terms, the minimum temperatures are increasing at a slower rate than the maximum temperatures: 68% of countries present higher growth rates in their maximum temperature. Regarding maximum temperatures, fourteen countries present annual increases higher than 0.06 °C/year, led by Croatia (HR) with 0.089 °C/year. Only seven countries grow in both minimum and maximum temperatures at a rate higher than 0.06 °C/year: the Czech Republic (CZ), Luxembourg (LU), Moldova (MD), Norway (NO), Serbia (RS), Sweden (SE) and Slovenia (SI). It should be noted that Finland (FI), with trends of 0.052 and 0.063 °C/year for its minimum and maximum temperatures respectively, is not far from that group, so the Scandinavian countries present quite high growth rates for both variables.

Conclusions
Temperature forecasting is a common step in most energy forecasting methods, and several techniques for temperature scenario generation can be found in the literature. Furthermore, also in relation to electric load forecasting, recent papers have discussed the use of the temperature trend, concluding that the effect of that component is negligible in terms of temperature forecasting accuracy when training on two years of data and testing on one.
However, when dealing with long-term estimations (i.e., more than three years, and ten years in this paper), and training on longer time series (40 years), our results have shown that including a linear trend in the model yields more accurate long-term predictions of the daily minimum and maximum temperatures. Using time series decomposition, which also handles the seasonal component, six different methods have been analyzed, concluding that the Generalized Additive Model (GAM) outperforms all the others, providing the lowest out-of-sample error in almost 60% of the 74 time series analyzed (minimum and maximum temperatures from reference weather stations of 37 European countries).
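The long-horizon gain from the trend component can be illustrated on synthetic data. The sketch below is an assumed toy setup, not the paper's experiment: a 50-year daily series with a 0.05 °C/year trend, a seasonal-only OLS model versus the same model with a linear term, 40 years for training and 10 for testing.

```python
import numpy as np

# Synthetic daily temperatures: level + linear trend + annual cycle + noise
rng = np.random.default_rng(2)
days = np.arange(int(50 * 365.25))
t_yr = days / 365.25
y = 10 + 0.05 * t_yr + 8 * np.sin(2 * np.pi * t_yr) + rng.normal(0, 3, days.size)

train = t_yr < 40   # first 40 years in-sample
test = ~train       # last 10 years out-of-sample

def fit_predict(with_trend):
    """OLS seasonal model, optionally including a linear trend term."""
    cols = [np.ones_like(t_yr), np.sin(2 * np.pi * t_yr), np.cos(2 * np.pi * t_yr)]
    if with_trend:
        cols.insert(1, t_yr)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    return X @ beta

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rmse_no_trend = rmse(y[test], fit_predict(False)[test])
rmse_trend = rmse(y[test], fit_predict(True)[test])
print(rmse_no_trend, rmse_trend)  # the trend model has the lower error
```

The seasonal-only model inherits a systematic bias that grows with the horizon, which is exactly why the trend matters for ten-year estimations but is hard to detect with only two years of training data.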
In addition, a brief analysis of the GAM results for the temperatures of those 37 countries has been carried out, showing that both the maximum and minimum temperatures present increasing linear trends (with p-value = 0), at rates between 0.02 and 0.08 °C/year in most cases. As a next step, including the remainder in this kind of temperature decomposition method will allow us to model the effect of cold spells and heat waves, and to better understand the behavior of that component, and its possible correlation, across different countries.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following main abbreviations are used in this manuscript:

Appendix A. Data Cleansing and Hierarchical Regression Imputation
This section describes the complete data pre-processing that has been carried out, from the original ECA&D datasets [30] (specifically, the blended daily minimum and maximum temperatures) to the dataset finally used in the paper. The two following subsections describe the most critical parts of this pre-processing, outlier detection and data imputation, but first let us briefly describe how the reference weather stations were selected. As our case study focuses on Europe, 9 countries and a collective grouping of two remote jurisdictions of Norway (Svalbard and Jan Mayen) were removed from our original dataset. Specifically, the excluded countries were: Algeria, Egypt, Greenland, Israel, Libya, Morocco, Syria, Turkmenistan, and Tunisia. Secondly, in order to select a reference weather station for each of the remaining 37 European countries, a simple data quality assessment was carried out. We selected stations located in the country capitals, whenever the amount of available data and the missing data in the period of interest for our case study (January 1980 to December 2019) were reasonable. It should be noted that, in order to better select the reference temperature for electric load forecasting, methods such as those described in Reference [31] or Reference [32] would provide better results in terms of error. However, since this paper aims to model the deterministic component of a temperature time series regardless of its subsequent use, WS selection is out of the scope of this paper. Table A1 lists the weather stations that were finally selected. For each country, it includes its ISO 3166 code (Country ID), the ECA&D station identifier (Station ID), the station name (Station), the latitude and longitude of the WS in degrees (Lat and Lon), and the station elevation in meters (Hgth).
As stated in Reference [12], where the authors propose a temperature anomaly detection method based on electricity demand, raw data collected by local weather stations are usually full of missing values and outliers, which must be corrected in order to preserve model accuracy. Our dataset is no exception: even the finally selected temperatures of each country, chosen after an initial data quality assessment, contain missing values and several outliers. In order to modify the original set of time series as little as possible, and to do so in a controlled way, we established upper and lower thresholds for outlier detection and set all the days outside the interval to missing values. Specifically, all temperatures above 50 °C or below −60 °C were flagged as outliers. Figure A1 shows two of those outliers, removed afterwards from the maximum temperature of Malta, where several missing values can also be seen. It should be noted that, in order to carry out the regression imputation method described in Appendix A.2, which uses neighbouring WS to fill missing values of the selected reference temperatures, the outliers were removed from our complete input dataset. For simplicity, Table A2 presents only the points removed from the original reference stations. As can be seen, the number of outliers is limited. In the worst case, the minimum temperature from Malta, 4 outliers represent 0.027% of the time series. Overall, only 11 outliers were detected. Furthermore, all these points have been filled using the method described below.

Figure A1. Example of outliers detected in the reference maximum temperature from Malta (MT) on 28 January and 7 December 1990, and removed from the original dataset.
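The thresholding rule described above is straightforward to sketch. The thresholds come from the text; the helper function and the toy series are illustrative assumptions.

```python
import numpy as np

# Thresholds from the appendix: temperatures outside [-60, 50] °C are
# physically implausible and flagged as outliers.
LOWER, UPPER = -60.0, 50.0

def flag_outliers(temps):
    """Return a copy with out-of-range values replaced by NaN (so they
    can be filled later by regression imputation), plus the indices
    of the removed points."""
    temps = np.asarray(temps, dtype=float).copy()
    bad = (temps < LOWER) | (temps > UPPER)
    temps[bad] = np.nan
    return temps, np.flatnonzero(bad)

# Toy series with two impossible readings (e.g., sensor spikes)
series = np.array([14.2, 15.0, 999.9, 16.1, -85.0, 13.7])
cleaned, removed = flag_outliers(series)
print(removed)
```

Flagging rather than deleting preserves the daily time index, which the imputation step below relies on.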

Appendix A.2. Hierarchical Regression Imputation
After deleting all the outliers from the dataset, the goal of this last pre-processing step is to fill all the missing points of the reference time series. Considering the strong correlation between temperature time series (especially between WS that are geographically very close), and the large number of stations available (together with their location and height), we propose a hierarchical regression imputation method. It assumes that neighbouring weather stations are strongly correlated, provided they are not at very different heights. Given a particular reference temperature to fill (in this example, a maximum temperature), the remaining stations are sorted by distance. In order to avoid selecting temperatures at a very different height, all the stations with more than 200 meters of height difference are removed from the initial list. The set of candidate variables to fill the gaps starts with the opposite variable of the same station; in our example, the minimum temperature of the reference station. After that, the nearest stations are added one by one, until there are no gaps left to fill. Note that if a nearby station shares the gaps of the reference temperature, it is not added to the candidate set. As a result, in some cases we must select variables that are farther away than initially expected. Once it is determined how many and which variables are enough to complete the reference time series, a linear regression model with all of them as inputs is fitted, and every gap is replaced by the value estimated by the model at that missing point. Table A3 shows all the missing points (after removing the outliers detailed in Table A2) that have been filled, and the WS used for each country. It can be seen that, in general, there are not many missing values in our reference time series.
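A simplified sketch of this imputation idea is shown below. It differs from the appendix method in one deliberate simplification: the paper fits a single multivariate regression on the selected candidate set, whereas this sketch fills gaps sequentially with univariate fits per neighbour, for brevity. The distance sorting and the 200 m height filter are assumed to have been applied beforehand; the toy data are illustrative.

```python
import numpy as np

def hierarchical_impute(ref, neighbours):
    """Fill NaN gaps in `ref` using neighbouring series, assumed to be
    pre-sorted by distance and pre-filtered by the height rule. Each
    neighbour in turn fills the gaps it can cover, via a linear
    regression fitted on the jointly observed days."""
    ref = np.asarray(ref, dtype=float).copy()
    for nb in neighbours:
        nb = np.asarray(nb, dtype=float)
        gaps = np.isnan(ref)
        if not gaps.any():
            break                       # nothing left to fill
        fillable = gaps & ~np.isnan(nb)
        if not fillable.any():
            continue                    # neighbour shares the remaining gaps
        both = ~np.isnan(ref) & ~np.isnan(nb)
        slope, intercept = np.polyfit(nb[both], ref[both], 1)
        ref[fillable] = intercept + slope * nb[fillable]
    return ref

# Toy demo: one neighbour that is a noiseless affine copy of the reference
nb = np.linspace(-5.0, 25.0, 60)
ref = 1.0 + 0.9 * nb
ref[[7, 30, 55]] = np.nan
filled = hierarchical_impute(ref, [nb])
print(filled[7])
```

Skipping neighbours that share the reference gaps mirrors the candidate-selection rule above, which is why the method may reach for stations farther away than initially expected.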
Only 4 of the 74 variables used (TMAX and TMIN from 37 countries) contained more than 1% of empty points: the two series from Malta (MT) and the two from Latvia (LV), which had between 3.65% and 7.21% of missing values. To fill the empty points of MT, 2 additional time series from Italy were required, and for LV, 5 neighbouring WS were used: one from Latvia, two from Sweden, and one from Estonia.

Table A3. Detail of the missing points of the reference WS of each country. The first two columns show the Country ID (C-ID) and Station ID (WS-ID). The last two columns show the neighbouring WS that have been used to fill all the empty points of each time series.