Short-Term Electricity Demand Forecasting Using a Functional State Space Model

Abstract: In the past several years, the liberalization of the electricity supply, the increase in the variability of electric appliances and their use, and the need to respond to the electricity demand in real time have made electricity demand forecasting a challenge. Many solutions have been proposed to meet this challenge. Electricity demand involves many sources such as economic activities, household needs and weather conditions, all of which make electricity demand forecasting difficult. To forecast the electricity demand, some authors have proposed parametric methods that integrate the main variables driving the demand; others have proposed nonparametric methods such as pattern recognition. In this paper, we propose to use only the past electricity consumption information, embedded in a functional vector autoregressive state space model, to forecast the future electricity demand. The proposed model is intended to be applied at some aggregation level between regional and nation-wide grids. To estimate the parameters of this model, we use likelihood maximization, spline smoothing, functional principal components analysis and Kalman filtering. Through numerical experiments on real datasets, both from the supplier Enercoop and from the Transmission System Operator of the French nation-wide grid, we show the appropriateness of the approach.


Introduction
Important recent changes in electricity markets make electricity demand and production forecasting a current challenge for the industry. Market liberalization, the increasing use of electronic appliances and the penetration of renewable electricity sources are just a few of the numerous causes that explain the current challenges [1]. On the other side, new sources of data are becoming available, notably with the deployment of smart meters. However, access to these individual consumers' data is not always possible, and so aggregated data are used to anticipate the load of the system. Load curves at some aggregate level (say regional or nation-wide) usually present a series of salient features that are the basis of any forecasting method. Common patterns are long-term trends, various cycles (with yearly, weekly and daily patterns) as well as a high dependence on external factors such as meteorological variables. We may separate these common patterns into two sets of effects. On one side are the effects linked to the social and economic organisation of human activity. For instance, the calendar structure imposes its cycle on the electrical demand: load needs during week days are higher than during weekends, and load during daylight is higher than during the night; holidays also have a large impact on the demand structure. Notice that these effects are mostly deterministic in nature, that is, they can be predicted without error. On the other side, we find the effects connected to the environment of the demand: for instance, the weather, since the demand usually depends on variables such as air temperature, humidity, wind speed and direction, dew point, etc., but also variables connected to exceptional events such as strikes or damage to the electrical grid that may affect the demand. While weather is still easier to anticipate than exceptional events, both effects share a stochastic nature that makes them more difficult to anticipate.
While only recorded at some time points (e.g., each hour, half-hour or quarter-hour), the electricity load of the system is a continuum. From this, one may consider mathematically the load curve as a function of time with some regularity properties. In fact, electrical engineers and forecasters usually represent the load curve as a function instead of a sequence of discrete measures. Then, one may study the electrical load as a sequence of functions. Recently, attention has been paid to this kind of setting, which is naturally called a functional time series (FTS). A nice theoretical framework to cope with FTS is that of autoregressive Hilbertian processes, defined through families of random variables taking values in a Hilbert space [2,3]. These processes are strictly stationary and linear, which are two restrictive assumptions for modeling the electrical load. An alternative to linearity was proposed in [4], where the prediction of a function is obtained as a linear combination of past observed segments, using weights induced by a notion of similarity between curves. Although the stationarity assumption on the full time series is still too strong for electrical load data [5], corrections can be made in order to render the hypothesis more reasonable. First, one may consider that the mean level of the curves presents some kind of evolution. Second, the calendar structure creates at least two different regimes in the data: workable and non workable days. Of course, the specific form of the corrections needed depends on the nature of the model used to obtain the predictions.
State-space models (SSM) and the connected notion of the Kalman filter are an interesting alternative to cope with the nonlinear and non stationary patterns of the electrical data. Let us mention some references where SSM have been used to forecast load demand. Classical vector autoregressive processes are used in [6] under the form of SSM to compare their predictive performance with respect to seasonal autoregressive processes. The main point in [7] is to combine several machine learning techniques with wavelet transforms of the electrical signal. SSM are then used to add adaptability to the proposed prediction technique. Since some of the machine learning tools may produce results that are difficult to interpret, the authors in [8] look for a forecasting procedure based on SSM that is easier to analyse, making explicit the dependence on some exogenous variables. They use a Monte Carlo based version of the Kalman filter to increase flexibility and add analytical information.
A more detailed model appears in [9], where the authors propose to describe the hourly load curve as a set of 24 individual regression models that share trends, seasons at different levels, short-term dynamics and weather effects, including non linear functions for heating effects. The equations represent 24 univariate stochastically time-varying processes that should be estimated simultaneously within a multivariate linear Gaussian state space framework using the celebrated Kalman filter [10]. However, the heavy computational burden is a drawback. A second work circumvents the problem by using a dimension reduction approach, which reasonably resizes the problem into a handy number of dimensions and renders the use of the Kalman filter practicable [11].
Some successful uses of SSM to cope with functional data (not necessarily time series) are reported in the literature: for instance, using common dynamical factors as in [12] to model electricity price and load, as in [13] to predict yield curves of some financial assets, or as in [14], where railway supervision is performed thanks to a new online clustering approach for functional time series using SSM.
Inspired by these ideas, we push forward the model in [11] to describe a completely functional autoregressive process whose parameters may eventually vary over time. Indeed, at each time point (days in our practical case), the whole functional structure (load curve) is described through its projection coefficients on a spline basis. Then, using a functional version of principal components analysis, the dimension of the representation is reduced. The vector of reduced coefficients is then used in a multivariate autoregressive process, as in [15]. Thus, our approach is completely endogenous, but with the ability to incorporate exogenous information (available at the time of the forecast) as covariates.
This paper is structured as follows. In Section 2, we describe the model we propose for forecasting electricity demand: we present the functional data, their representation in a spline basis, the state space model that we propose and the model estimation methods, including Kalman filtering and smoothing and functional principal components analysis. Section 3 presents model inference on a simulated dataset. Section 4 describes the experiments we perform on real data, with a simple application of our procedure at the aggregation level of a single supplier. We then explore, in Section 5, some corrections and extensions of the simple approach in order to take into account some of the non stationary patterns present in the data. Additional experiments are the object of Section 6, where we predict at the larger national aggregation level. The article concludes in Section 7, where some future lines of work are discussed.

Materials and Methods
We introduce in this section the notation and we construct the prediction method. For convenience, Table 1 sums up the nomenclature used, including all variables, acronyms, indexes and constants defined in the manuscript, in order to make the text clearer and more readable. The starting point of our modeling is a univariate continuous-time stochastic process Z = {Z(t), t ∈ R}. To study this process, a useful device [2] is to consider a second stochastic process X, which is now a discrete-time process taking values, at each time step, in some functional space. The process X is derived from Z as follows. For a trajectory of Z observed over the interval [0, T], T > 0, we consider the n subintervals of the form I_i = [(i − 1)δ, iδ], i = 1, . . . , n, such that δ = T/n. Then, we can write

X_i(t) = Z((i − 1)δ + t),   t ∈ [0, δ],   i = 1, . . . , n.

With this, anticipating the behavior of Z on, say, [T, T + δ] is equivalent to predicting the next function X_{n+1}(t) of X. The construction is usually called a functional time series (FTS). The setting is particularly fruitful when Z presents a seasonal component of size δ. In our practical application, Z represents the underlying electrical demand, δ is the length of one day, and so X is the sequence of daily electrical loads. Notice that X represents a continuum that is not necessarily completely observed. As mentioned in the Introduction, the records of load demand are only sampled on some discrete grid. We discuss this issue below.
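As an illustration, for a record sampled on a regular grid, the construction of X from Z reduces to slicing the long record into segments (a Python sketch with illustrative names; the paper's own computations are done in R):

```python
import numpy as np

def to_functional_series(z, delta):
    """Cut a long univariate record z into n segments of length delta.

    Row i holds the discrete sampling of the function X_{i+1}(t)
    on the subinterval I_i = [(i - 1) * delta, i * delta].
    """
    n = len(z) // delta
    return np.asarray(z[: n * delta]).reshape(n, delta)

# Example: 10 days of half-hourly records (delta = 48 samples per day)
z = np.arange(10 * 48, dtype=float)
X = to_functional_series(z, 48)
print(X.shape)  # (10, 48): one row per daily function
```

Predicting the next day then amounts to predicting one new row of this matrix.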

Prediction of Functional Time Series
The prediction task involves making assertions on the future value X_{n+1}(t) of the series having observed the first n elements X_1(t), . . . , X_n(t). From the statistical point of view, one may be interested in the predictor X̂_{n+1} which minimizes the L² prediction error given the available observations at moment n:

X̂_{n+1} = E[X_{n+1} | X_1, . . . , X_n].   (1)

A useful model is the (order 1) Autoregressive Hilbertian (ARH(1)) process defined by

X_{n+1}(t) = µ(t) + ρ(X_n − µ)(t) + ε_{n+1}(t),   (2)

where µ is the mean function of X, ρ is a linear operator and {ε_i(t)} is a strong white noise sequence of random functions. Under mild conditions, Equation (2) defines a (strictly) stationary random process (see [2]). The predictor (1) for the ARH(1) process is X̂_{n+1}(t) = µ(t) + ρ(X_n − µ)(t), which depends generally on two unknown quantities: the function µ and the operator ρ. The former can be estimated by the empirical mean µ̂(t) = X̄_n(t). For the latter, one can either estimate ρ by some ρ̂_n and obtain the prediction X̂_{n+1} = µ̂_n + ρ̂_n(X_n − µ̂_n), or estimate directly the application ρ(X_n − µ̂_n) of ρ to the last observed centered function. Both variants need an efficient way to approximate the possibly infinite-dimensional operator ρ or function ρ(X_n − µ̂_n), which is then estimated (see the discussion below on this point).
The inherent linearity of Equation (2) makes this model not flexible enough to be used for electricity load forecasting. Indeed, having only one (infinite-dimensional) parameter to describe the transition between any two consecutive days is not reasonable. Variants have been studied. We may mention [16], which incorporates weights in Equation (2), making the impact of recent functions more important; the doubly stochastic ARH model, which considers the linear operator ρ to be random [17]; or the conditional ARH, where an exogenous covariate drives the behavior of ρ [18]. For the sake of more flexibility, we aim to make predictions in a time-varying setting where the mean function µ(t) and the operator ρ are allowed to evolve.

Spline Smoothing for Functional Data
In practice, one only has access to a finite sampling x = {x(t_j), j = 1, . . . , N}, observed eventually with noise, from the trajectory x(t) of the random function X(t). Then, one wishes to approximate x(t) from the discrete measurements. A popular choice is to develop x(t) over the elements of an L² basis φ_1(t), . . . , φ_k(t), . . ., which is to write

x(t) = Σ_k ỹ_k φ_k(t),   (3)

where the coefficients ỹ_k = <x(t), φ_k(t)> are the coordinates resulting from projecting the function x on each of the elements of the basis. Among the several bases usually used, we choose to work with a B-spline basis because it is adapted to the nature of the data we want to model and has nice computational properties. B-splines form a basis system adapted to representing splines. In our case, we use cubic splines, that is, piecewise third-order polynomial functions. The pieces connect at points called knots so that the join is smooth, guaranteeing the continuity of the second-order derivative. An appealing property of B-splines is the compact support of their elements, which gives good localization properties as well as efficient computation. Figure 1 illustrates this fact with the reconstruction of a daily load curve. The B-spline elements have support on compact subintervals of the horizontal axis.
Another important property is that, at each point of the domain, the spline functions sum to 1. Since the shapes of the spline functions at the border knots are clearly different, this fact is most visible at the extreme points of the horizontal axis, where only one spline has a non-null value. Together with the regularity constraints and the additional knots at the extremes of the support, these points are subject to a boundary condition. Figure 1 illustrates this important issue concerning the behavior at the boundaries. To avoid this undesirable effect, we use a large number of spline functions in the basis, which empirically reduces the boundary effect.
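As an illustration of spline smoothing of a sampled daily curve, here is a minimal sketch using SciPy's least-squares B-spline fit (the paper uses the R package fda; the grid, the knot choice and the noise level below are illustrative assumptions):

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

# Half-hourly grid of one "day", rescaled to [0, 1]
t = np.linspace(0.0, 1.0, 48)
rng = np.random.default_rng(0)
curve = 8 + 0.8 * np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(48)

# Cubic B-spline basis: interior knots plus 4 repeated boundary knots
# on each side (the augmented knots mentioned in the text)
interior = np.linspace(0.0, 1.0, 12)[1:-1]
knots = np.r_[[0.0] * 4, interior, [1.0] * 4]

spline = make_lsq_spline(t, curve, knots, k=3)  # least-squares cubic fit
recon = spline(t)

rmse = np.sqrt(np.mean((curve - recon) ** 2))
print(f"reconstruction RMSE: {rmse:.4f}")  # close to the noise level
```

The vector of spline coefficients (`spline.c` here) plays the role of ỹ in Equation (3).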

Functional Principal Components Analysis
As in multivariate data analysis, Functional Principal Components Analysis (FPCA) provides a mechanism to reduce the dimension of the data at the cost of a controlled loss of information. Since data in FDA are of infinite dimension, some care must be given to the sense of dimension reduction. Indeed, what we look for is a representation of the functions like the one in (3) with a relatively low number of basis functions that are now dependent on the data. Moreover, if we also demand that the basis functions form an orthonormal system, then the solution is given by the elements of the eigendecomposition of the associated covariance operator (i.e., the functional equivalent of the covariance matrix) [19].
However, the problem is that these elements are functions and so of infinite dimension. The solution is to represent them in turn in a functional basis system (for instance, the one presented in the preceding section). Thus, the initial curve x(t) can be approximated in the eigenfunction basis system:

x(t) ≈ Σ_{k=1}^{p} y_k ψ_k(t),   (4)

where ψ_1, . . . , ψ_p are the first p eigenfunctions and the number p of eigenfunctions, expected to be relatively small, is chosen according to the approximation error of the curves. Since the representation system may be non-orthogonal, it can be shown that the inner product needed in FPCA is connected to the properties of the representational basis system.
Then, the notion of dimension reduction can be understood by comparing the lower number of eigenfunctions with the number of basis functions needed to represent an observation. FPCA reduces the representation dimension, which will yield a dramatic drop in the computational time of the model we describe next.
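On a discretized grid, FPCA amounts to a principal components analysis of the curve matrix. A minimal sketch follows (plain NumPy SVD on simulated curves; the paper's computation relies on the R package fda and additionally accounts for the non-orthogonality of the spline basis):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 48)

# 100 synthetic daily curves driven by 2 random factors plus small noise
scores_true = rng.standard_normal((100, 2))
curves = (8
          + np.outer(scores_true[:, 0], np.sin(2 * np.pi * t))
          + np.outer(scores_true[:, 1], np.cos(2 * np.pi * t))
          + 0.01 * rng.standard_normal((100, 48)))

mu = curves.mean(axis=0)
centered = curves - mu
_, s, Vt = np.linalg.svd(centered, full_matrices=False)

p = 2                          # retained eigenfunctions
eigenfun = Vt[:p]              # rows: discretized eigenfunctions
y = centered @ eigenfun.T      # FPCA scores, shape (100, p)
recon = mu + y @ eigenfun      # low-dimensional reconstruction, as in (4)

explained = (s[:p] ** 2).sum() / (s ** 2).sum()
print(f"variance explained by {p} components: {explained:.3f}")
```

Here the 48-dimensional curves are summarized by p = 2 scores per day with almost no loss, which is the kind of compression exploited by the model below.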

State Space Model
State Space Models (SSM) are a powerful tool to describe the dynamic behavior of time-evolving processes. The shape of the load curve may present long-term changes that induce non stationary patterns in the signal. Taking these changes into account is one of the challenges of electricity demand forecasting.
The linear SSM [10] includes two terms: an inertial term in the form of an intrinsic state of the whole system being modeled, and an observed output that is a function of the state, some covariates and a noise structure. The state evolution over time is modeled as a linear equation involving the previous state and a disturbance vector η. The general formulation is given by:

y_i = z_i' α_i + ε_i,
α_{i+1} = T_i α_i + R_i η_i,   (5)

where y_i is the target variable observed at time i, z_i ∈ R^{m+1} is a vector of predictors, the state at time i is represented as α_i ∈ R^{m+1}, T_i and R_i are known matrices, and ε_i and η_i are the noise and disturbance processes, usually assumed to be independent Gaussian with zero mean and respective covariance matrices H_i and Q_i, which usually contain unknown parameters. The evolution of the states is useful to understand the system. Using the celebrated Kalman filter and smoother, one is able to extract information about the underlying state vector [10]. The one-step-ahead state prediction and prediction error are, respectively,

a_{i+1} = E(α_{i+1} | y_1, . . . , y_i) and v_i = y_i − z_i' a_i.

In addition, their associated covariance matrices are of interest, so let us define P_{i+1} = Var(α_{i+1} | y_1, . . . , y_i) and F_i = Var(v_i) = z_i' P_i z_i + H_i. Since these definitions are recursive, an important step is their initialization. When the observations are unidimensional, an exact diffuse method can be used starting from an uninformative diffuse prior. However, the method may fail with multivariate observations because the diffusion phase can yield a non-invertible F_i matrix. Moreover, even when F_i is invertible, computations become slow due to its dimensionality. It is, however, possible to obtain a univariate alternative representation of (5) that theoretically reduces the computational cost of the Kalman filter and allows one to use the diffuse initialization.
Inference on SSM can be obtained by maximum likelihood (see [20]); the stability of the model under the diffuse initialization is described in [21].
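The filtering recursions above can be sketched in a few lines. This is a minimal NumPy implementation for scalar observations with a non-diffuse (known prior) initialization, applied to a toy local-level model; the paper relies on the R package KFAS instead, and all names here are illustrative:

```python
import numpy as np

def kalman_filter(y, z, T, R, H, Q, a0, P0):
    """One-step-ahead Kalman recursions for the scalar-observation SSM
    y_i = z' a_i + eps_i,  a_{i+1} = T a_i + R eta_i  (notation of the text).
    Returns predicted states a_{i+1}, innovations v_i and variances F_i."""
    a, P = a0.copy(), P0.copy()
    a_pred, v, F = [], [], []
    for yi in y:
        vi = yi - z @ a                       # innovation v_i
        Fi = z @ P @ z + H                    # F_i = z' P_i z + H_i
        K = T @ P @ z / Fi                    # Kalman gain
        a = T @ a + K * vi                    # predicted state a_{i+1}
        P = T @ P @ T.T - np.outer(K, K) * Fi + R @ Q @ R.T
        a_pred.append(a.copy()); v.append(vi); F.append(Fi)
    return np.array(a_pred), np.array(v), np.array(F)

# Local-level toy example: the state is a slowly moving mean level
rng = np.random.default_rng(2)
level = np.cumsum(0.1 * rng.standard_normal(200)) + 5.0
y = level + 0.5 * rng.standard_normal(200)
a, v, F = kalman_filter(y, z=np.ones(1), T=np.eye(1), R=np.eye(1),
                        H=0.25, Q=np.array([[0.01]]),
                        a0=np.zeros(1), P0=np.eye(1))
print(f"mean absolute tracking error: {np.mean(np.abs(a[:, 0] - level)):.3f}")
```

The diffuse initialization and the univariate treatment of multivariate observations mentioned above are refinements of exactly these recursions.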

A Functional State Space Model
Continuous-time approaches to SSM also exist. For instance, Ref. [10] presents the simple mean level model, where the random walk inherent to the state equation is replaced by a Brownian motion that drives the time-varying mean level. Early connections between FDA and SSM yielded derivations of an SSM with the help of FDA. For example, Ref. [22] uses spline interpolation to approximate the behavior of a time-dependent system described by a state space model.
Our choice is to keep time discrete while allowing the observations to be functions or curves. A similar idea is behind the model in [14], where functions are used to represent observations in an SSM, but only dependence between states is considered.
Let us consider the vector y_i of the p FPCA scores resulting from the projection of x_i(t), the load curve for day i, onto the eigenfunction basis system. Then, we may represent an autoregression by replacing the covariate z_i by the past load curve, or more precisely by its coefficients y_{i−1}.
We propose the following Functional State Space Model (FSSM):

y_i = (I_p ⊗ y_{i−1}') α_i + ε_i,
α_{i+1} = α_i + η_i,   (6)

where the state α_i ∈ R^{p²} stacks the coefficients of the autoregression. As before, the disturbance terms ε_i and η_i follow independent Gaussian distributions with zero mean vector and generally unknown covariance matrices H_i and Q_i. The sizes of these matrices are a function of p, the number of FPCA scores retained in the approximation step discussed above.
In order to keep the computation time under control while keeping some flexibility in the modeling, we focus on three structural forms for the matrices H_i and Q_i: full, diagonal and null, which yields six possible models. Table 2 summarizes the variants as well as the number of parameters to be estimated in the covariance matrices. The complexity of the variants grows while going from 1 to 6. When Q_i is null, the state equation establishes that the states are simply constant over time. Diagonal structures on Q_i and H_i assume that all the correlations are null, so only variance terms are to be treated. Conversely, full structures allow for a maximum of flexibility, letting all the covariances be free. However, the important drawback of dimensionality becomes crucial since the number of terms to be estimated is of order p^4.
The FSSM we propose is an SSM on the FPCA scores. Another choice could have been to apply the SSM directly on the spline basis coefficients ỹ_i, but such a choice would be computationally too expensive. It is illustrative to link these dimensions to the problem of electricity demand forecasting. Recall that the number of daily records on our load curves is 48 (half-hourly sampling), which is prohibitive to treat within our framework. Even if this number can easily be halved using a spline approximation, the number of coefficients would still be too high. Moreover, since the spline coefficients cannot be considered independent, one would need to use full structures for the covariance matrix H_i and eventually for Q_i. Lastly, our choice of reducing the dimension through FPCA is then justified since, with a handy number of eigenfunctions, say less than 10, most of the variants discussed above can be easily computed.
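The dimensionality argument can be made concrete by counting the free parameters of a symmetric covariance matrix under each structure. A small sketch (the spline-coefficient count of 24 is the illustrative "halved" figure from the text):

```python
# Free parameters in a symmetric covariance matrix of side d:
# 0 (null), d (diagonal) or d * (d + 1) / 2 (full). With a state of
# size p**2, a full Q_i has on the order of p**4 terms.
def cov_params(d, structure):
    return {"null": 0, "diagonal": d, "full": d * (d + 1) // 2}[structure]

for p, label in [(24, "spline coefficients"), (10, "FPCA scores")]:
    h = cov_params(p, "full")          # observation covariance H_i (p x p)
    q = cov_params(p ** 2, "full")     # state covariance Q_i (p^2 x p^2)
    print(f"{label:20s} p={p:2d}: full H_i -> {h:5d}, full Q_i -> {q:7d}")
```

The drop from p = 24 to p = 10 shrinks a full Q_i from hundreds of thousands of parameters to a few thousand, which is what makes the full variants tractable.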
The whole prediction procedure is schematically represented as a flowchart in Figure 2.

Experiments on Simulated Data
We illustrate in this section our approach to forecast using the proposed functional state space models on a functional time series. There are three steps in our approach. First, we approximate the initial data using a B-spline basis. Then, an FPCA is performed using the B-splines approximations of the curves. Finally, a fit of the FSSM is obtained. Prediction can then be done by applying the recursion equations on the last estimated state. The resulting predicted coefficients are then put into the functional expansion equations (see Equations (3) and (4)) to obtain the predicted function. For the experiments, we use the statistical software R [23] to fit our model with the packages fda [24] for spline approximation and FPCA computation and KFAS [20] for the FSSM estimation.

Simulation Scheme
Let us consider a process Y generated as follows:

Y(t) = β_0 + β_1 sin(2πt/δ_1) + β_2 sin(2πt/δ_2) + ε(t),   ε(t) = ν(t) + θν(t − 1),   (7)

where δ_1 and δ_2 are two distinct periods and ν(t) is a strong white noise process (i.e., independent and identically distributed zero-mean normal random variables N(0, σ²)). Following [4], we set β_0 = 8, β_1 = 0.8, β_2 = 0.18, θ = 0.8 and σ² = 0.05. Expression (7) is evaluated at discrete times ranging from 1 to δ × n, where n is the number of functions of length δ = 64. Then, we consider the segments of length δ as a discrete sampling of some unobserved functional process. Figure 3 represents a time window of the simulated data generated through model (7). Notice that the signal is composed of two additive sinusoidal terms of different frequencies plus a moving average structure for the noise term, in order to mimic the double seasonal structure of load curves.
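A generator of this kind of signal can be sketched as follows (the two periods, δ and 8δ, are illustrative assumptions standing in for the two frequencies of Equation (7); the β, θ and σ² values are those given above):

```python
import numpy as np

# Two sinusoids of different frequencies plus MA(1) noise, in the
# spirit of Equation (7). Periods delta and 8 * delta are assumptions.
beta0, beta1, beta2, theta, sigma2 = 8.0, 0.8, 0.18, 0.8, 0.05
delta, n = 64, 30
t = np.arange(1, delta * n + 1)

rng = np.random.default_rng(3)
nu = np.sqrt(sigma2) * rng.standard_normal(delta * n + 1)
eps = nu[1:] + theta * nu[:-1]                      # MA(1) noise structure

y = (beta0
     + beta1 * np.sin(2 * np.pi * t / delta)        # fast ("daily") cycle
     + beta2 * np.sin(2 * np.pi * t / (8 * delta))  # slow cycle
     + eps)

segments = y.reshape(n, delta)   # n functions of length delta
print(segments.shape)
```

Each row of `segments` then plays the role of one observed function X_i in the experiments.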

Actual Prediction Procedure
For each model variant, we build and fit the FSSM on the first 26 segments of the simulated signal. That is, each segment is projected on the B-spline basis, and these projections are used in an FPCA. We leave the number p of principal components as a tuning parameter of the whole procedure. Parameters are estimated and the states are filtered and smoothed as described in Section 2. The last state, together with the last segment coefficients, is then used to predict the coefficients of the next segment of the signal, which is naturally not used in the estimation of the model. Using the reconstruction expression, the actual predicted segment is obtained, which closes a prediction cycle.
In order to provide more robust prediction measures, several prediction cycles are run with a sequential increment of the training dataset. In what follows, we report results on four prediction cycles following the one-segment-ahead rolling basis described above.

How We Measure Prediction Quality
There are three steps at which the prediction quality must be measured: the spline representation quality of the initial functions x, the functional principal components representation of x, and the forecasting. For all three steps, we use the RMSE (Root Mean Square Error) and the MAPE (Mean Absolute Percentage Error). For the one-step-ahead forecast x̂_i of the vector x_i at time i, with h the length of x_i (h = δ in this case), these metrics are defined as:

RMSE_i = sqrt( (1/h) Σ_{j=1}^{h} (x_i(t_j) − x̂_i(t_j))² ),
MAPE_i = (100/h) Σ_{j=1}^{h} |x_i(t_j) − x̂_i(t_j)| / |x_i(t_j)|.

The RMSE is measured in the scale of the data (e.g., kWh for our electricity demand data), and the MAPE is expressed as a percentage. Notice that the MAPE cannot be calculated if the target variable is zero at some time point. While this is quite unlikely in practice, our simulated signal may present values quite close to zero, making the MAPE unstable. However, this measure is useful to compare prediction performance between signals of different mean magnitudes.
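The two criteria are straightforward to implement, for instance:

```python
import numpy as np

def rmse(actual, pred):
    """Root mean square error, in the units of the data (e.g., kWh)."""
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(pred)) ** 2))

def mape(actual, pred):
    """Mean absolute percentage error; undefined when actual hits zero."""
    actual, pred = np.asarray(actual), np.asarray(pred)
    return 100.0 * np.mean(np.abs((actual - pred) / actual))

# Toy curves (illustrative values)
x = np.array([100.0, 200.0, 400.0])
x_hat = np.array([110.0, 190.0, 380.0])
print(round(rmse(x, x_hat), 4))  # 14.1421
print(round(mape(x, x_hat), 4))  # 6.6667
```

Note how the same absolute error of 10 kWh weighs twice as much in the MAPE at a load of 100 than at 200, which is what makes the MAPE scale-free across signals.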

Spline Representation and Reconstruction
To represent the simulated data, we use cubic splines with a regular grid of knots (and augmented knots on the extremes). To avoid cutting down the predictive power of our forecast model, we may want to retain as many spline coefficients as possible (in our case 63). However, we have to make a special point here, since the boundary condition may yield artefacts in the spline coefficients near the boundaries. A simple way to reduce this problem is to choose the number of splines (and hence the spacing of the interior knots) to be about 59. This choice produces reasonable quality reconstructions, with a MAPE error of less than 0.18%.

Functional Principal Components
The reconstruction quality of the initial functions highly depends on the number of principal components. Of course, the quality of the forecasts will also be impacted by this choice. Table 3 reports the reconstruction quality as mean MAPE and RMSE for two, three and four principal components.

Forecasting Results
We now discuss the forecasting errors for each choice of the structure of the matrices Q_i and H_i, taking the cases of null, diagonal and full matrices as described in Table 2. Table 4 reports RMSE and MAPE values for the forecasting of the simulated data; both mean and standard deviation are presented. Better prediction performance produces lower MAPE and RMSE. On the one hand, as expected, the number of principal components retained has a large impact on the mean prediction performance. When only two principal components are kept, the prediction error is unreasonably large due to a poor reconstruction. On the other hand, there is no clear advantage for any variant, since the standard deviations are large enough to compensate any pairwise difference. This is mainly due to the very small number of prediction segments. Variants with a null Q_i matrix perform slightly better (i.e., smaller errors). This would indicate that a static structure is detected, where no time-evolving parameters are needed to predict the signal, which is the true nature of the simulated signal. Finally, we compare the variants in terms of the computational time needed to obtain the prediction. We can see in Table 5 that the differences in computing times are significant, since the standard deviations are quite small. For a fixed number of principal components, a clear ranking emerges in which the more parsimonious structures produce smaller computing times. Conversely, when the number of principal components increases, the computation time increases. However, the increment is larger for the variants whose covariance matrices have more parameters.

Experiments on Real Electrical Demand Data
We centre the experiments in this section around the French supplier Enercoop (enercoop.fr), one of the new agents that appeared with the recent liberalization of the French electricity market. Electricity supplied by Enercoop comes from green renewable electricity plants owned by local producers all over France. The utility has several kinds of customers, such as householders and industries as well as small and medium-sized enterprises (SMEs), with different profiles of electricity consumption. Households, for example, use electricity mainly for lighting, heating and, sometimes, cooling and cooking. The main challenge for Enercoop is to balance electricity demand and production, so anticipating load needs is crucial for the company. We work with the aggregated electricity data, that is, the simple sum of all the individual demands.
We first introduce the data paying special attention to those salient features that are important for the prediction task. Then, we introduce a simple benchmark prediction technique based on persistence that we compare to the naive utilization of our prediction procedure. Next, in Section 5, we study some simple variants constructed to cope with the existence of non stationary patterns.

Description of the Dataset
Let us use some graphical summaries to comment on the features of these data. Naturally, we adopt the perspective of time series analysis to describe the demand series. Figure 4 represents the dataset, which consists of half-hourly sampled records over six years, going from 1 January 2012 to 31 December 2017. Vertical bars delimit the years, which are shown on top of the plot. Each record represents the load demand measured in kWh. First, we observe a clear, growing long-term trend that is connected to a higher variability of the signal at the end of the period. Second, an annual cycle is present, with lower demand levels during the summer and higher during the winter. Both the industrial production calendar and the weather impact the demand, which explains this cyclical pattern. Figure 5 shows the profile of daily electricity consumption for a period of three weeks (from 31 October 2016 to 20 November 2016). This unveils new cycles present in the data, which can be seen as two new patterns: the weekly one and the daily one. The former is the consequence of how human activity is structured around the calendar: demand during workable days is larger than during weekend days. The latter is also mainly guided by human activity, with lower demand levels during the nighttime, the usual plateau during the afternoon and peaks during the evening. However, a certain similarity can be detected among days. Indeed, even if the profile of Fridays is slightly different, the ensemble of workable days shares a similar load shape. Similarly, the ensemble of weekends forms a homogeneous group. Bank holidays and extra days off may also impact the demand shape. For instance, in Figure 5, the second segment, which corresponds to 1 November, is the electrical demand on a bank holiday. Even though it occurs on a Tuesday, the load shapes of this day and the preceding one (usually an extra day off) are closer to those of weekend days.
We may also inspect the shape of the load curve across the annual cycle. A simple way to do this is to inspect the mean shape across months. Figure 6 describes four monthly load profiles, where each month belongs to a different season of the year. Some differences are easy to notice: for instance, the peak demand occurs during the afternoon in Autumn and Winter but at midday in Spring and Summer. Others are more subtle, like the effect of the daylight saving time clock change, which horizontally shifts the whole demand. Transitions between the cold and warm seasons are particularly sensitive for the forecasting task, especially when the change is abrupt.

Spline and FPCA Representation
As before, we report on the reconstruction error resulting from the spline and FPCA representations. For comparison purposes, we compute the error criteria for five choices of the number of splines (12, 24, 40, 45 and 47 splines) on the reconstruction of the observed functions. Table 6 shows the MAPE and RMSE between the reconstruction and the real load data. As expected, the lower the number of splines, the higher the reconstruction error. This shows that, using only spline interpolation, our approach is not pertinent because a relatively large number of spline coefficients is needed. The extreme case of 12 splines, which would make the computing times reasonable, produces a MAPE of 1.310% that is too large and hampers the performance of any forecasting method based on this reconstruction. At the other extreme, using 47 cubic splines to represent the 48-point discrete signal of the daily demand produces boundary effects that dominate the variability of the curves.
Since spline approximation is coupled with FPCA in our approach, we may check the reconstruction quality for all the choices issued from crossing the selected numbers of splines with a number of principal components between 2 and 10. Table 7 shows the MAPE and RMSE of the reconstructions obtained for each possible crossing. We may target a maximum accepted MAPE of 1% in reconstruction. Then, only a few options remain, most of them with very close MAPE values. In what follows, we choose to work with 45 cubic splines and 10 principal components.

Table 6. RMSE and MAPE between the spline approximation and the electrical load data as a function of the number of splines.
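The error criteria above can be made concrete with a short sketch. The following Python code is our illustration, not the authors' implementation (all function names are ours): it computes MAPE and RMSE, and reconstructs a matrix of spline coefficients from a truncated principal component basis, which is essentially what the crossed spline/FPCA evaluation measures.

```python
import numpy as np

def mape(actual, approx):
    # Mean absolute percentage error, in percent
    return 100.0 * float(np.mean(np.abs((actual - approx) / actual)))

def rmse(actual, approx):
    # Root mean squared error
    return float(np.sqrt(np.mean((actual - approx) ** 2)))

def pca_reconstruction(coefs, n_components):
    """Reconstruct an (n_days x n_basis) matrix of spline coefficients
    from its first n_components principal components."""
    mean = coefs.mean(axis=0)
    centered = coefs - mean
    # Principal directions from the SVD of the centered coefficient matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    return scores @ components + mean
```

Keeping all components reproduces the coefficients exactly; truncating trades reconstruction error against the dimension of the state in the forecasting model.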

Forecasting Results
Forecasting is done on a rolling basis over one year of the data. Load data from 1 January 2012 to 30 October 2016 are used as the training dataset and, as the testing dataset, we choose the period from 31 October 2016 to 30 October 2017. Predictions are obtained at a horizon of one segment ahead. This means that we are actually making predictions for the next 48 time steps (1 day) if we adopt a traditional time series point of view. Once the prediction is obtained, we compare it with the actual data and incorporate the observation into the training dataset. Thanks to the recursion in the SSM, only an update step is necessary here.
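The rolling evaluation described above can be sketched as follows; `fit` and `predict` are placeholders standing in for the FSSM update and the one-segment-ahead prediction, so this is a schematic of the protocol rather than of the model itself.

```python
import numpy as np

def rolling_forecast(series, train_end, fit, predict):
    """One-segment-ahead rolling evaluation: forecast the next segment,
    compare it with the realised data, then absorb it into the history."""
    history = list(series[:train_end])
    preds, actuals = [], []
    for i in range(train_end, len(series)):
        model = fit(history)               # for the FSSM, only a Kalman update is needed
        preds.append(predict(model, history))
        actuals.append(series[i])
        history.append(series[i])          # incorporate the newly observed segment
    return np.array(preds), np.array(actuals)

# Toy usage with a naive "last value" predictor
preds, actuals = rolling_forecast(
    [1.0, 2.0, 3.0, 4.0, 5.0], train_end=3,
    fit=lambda hist: None, predict=lambda model, hist: hist[-1])
```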
To give a comparison point to our methodology, we propose using a simple but powerful benchmark based on persistence forecasting.

Persistence-Based Forecasting
A persistence-based forecasting method for a functional time series amounts to anticipating X_{n+1}(t) with the simple predictor X̂_{n+1}(t) = X_n(t). Thus, the predictor can be connected to the ARH model (Equation (2)) where the linear operator is the identity operator ρ = Id. However, this approach is not convenient, since there are two groups of load profiles in the electricity demand, given by workable days and the other days (e.g., weekends or holidays). We therefore use a smarter version of the persistence model, which takes into account the existence of these two groups. The predictor is then written as

X̂_{n+1}(t) = X_n(t)       if day n is Monday, Tuesday, Wednesday or Thursday,
X̂_{n+1}(t) = X_{n−7}(t)   otherwise.    (8)

Table 8 summarizes the MAPE on prediction by type of day for the persistence-based forecasting method. We can observe that the global MAPE errors are several times the reconstruction error we observed above. There is a clear distinction between the days predicted from the previous day and the other ones (i.e., Saturdays, Sundays and Mondays). The lack of recent information for the latter group is a severe drawback and impacts its prediction performance. The increased difficulty of predicting bank holidays translates into the highest error levels.
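A minimal sketch of the smart persistence predictor of Equation (8), assuming `curves` is a list of daily load curves and `weekdays` the corresponding weekday codes (0 = Monday); both names are ours:

```python
import numpy as np

WORKABLE_NEXT = {0, 1, 2, 3}  # Monday..Thursday: the next day follows a workable day

def persistence_forecast(curves, weekdays, n):
    """Smart persistence: predict day n+1 from day n when day n is
    Monday-Thursday, and from the curve one week earlier otherwise."""
    if weekdays[n] in WORKABLE_NEXT:
        return curves[n]
    return curves[n - 7]
```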

FSSM Forecasting
We now present the results for the proposed FSSM. Only variant 1 in Table 2 is used, namely we consider the covariance matrix of the observations H_i as diagonal and the covariance matrix of the states Q_i as null. The reason is twofold. First, we showed on simulations that basic models give results as satisfactory as the more involved ones. Second, computing time must be kept within reasonable bounds for the practical application. Tables 9 and 10 show the MAPE on prediction by day type and by month, respectively, for the FSSM approach. In comparison with the persistence-based forecasting, the global error is noticeably reduced, with improvements on almost all day types. In addition, improvements are observed in all the months of the year but August (results for persistence-based forecasting are not shown). If we look at the distribution of the MAPE, we see that its range is compressed, with a lower maximum error but also a higher minimum error. This last effect is the price to pay for having an approximate representation of the function. We may think of the MAPE on approximation as a lower bound for the MAPE on forecasting: the higher this bound, the more limited the approach is. Despite this negative effect, the gain on the largest errors observed before more than compensates for the increment on the minimum MAPE and yields a globally better alternative. Among workable days, Mondays present the highest MAPE. FSSM being an autoregressive approach, it builds on the previous days, which present a different demand structure. Moreover, the mean load levels of these two consecutive days are sharply different. Undoubtedly, incorporating calendar information would help the model to better anticipate this kind of transition. Both mean load level corrections and calendar structure are modifications or extensions that can be naturally incorporated in our FSSM. We discuss some clues for doing this in the next section.
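For readers unfamiliar with the filtering recursion underlying the FSSM, the generic Kalman predict/update step for a linear Gaussian SSM can be sketched as follows. This is the textbook recursion, not the KFAS implementation used in the paper; in variant 1, H is diagonal and Q is the null matrix.

```python
import numpy as np

def kalman_step(m, P, y, T, Z, H, Q):
    """One predict/update recursion of the Kalman filter for the linear
    Gaussian state space model
        alpha_{i+1} = T alpha_i + eta_i,  eta_i ~ N(0, Q)
        y_i         = Z alpha_i + eps_i,  eps_i ~ N(0, H)."""
    # Prediction step
    m_pred = T @ m
    P_pred = T @ P @ T.T + Q              # Q is the null matrix in variant 1
    # Update step with the new observation y
    v = y - Z @ m_pred                    # innovation
    S = Z @ P_pred @ Z.T + H              # innovation covariance (H diagonal in variant 1)
    K = P_pred @ Z.T @ np.linalg.inv(S)   # Kalman gain
    m_new = m_pred + K @ v
    P_new = P_pred - K @ Z @ P_pred
    return m_new, P_new
```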

Corrections to Cope with Non Stationary Patterns
We explore two kinds of extensions to add exogenous effects. In the first one, the days are grouped into two groups, workable and non workable days, and the prediction is done separately in each group. In the second one, the day type and the month are introduced as exogenous fixed effects in the model.

Adding Effects as Grouping Variables
We aim to tackle some of the difficulties that non stationary patterns impose on the forecasting of load data by explicitly considering two groups of days: workable (i.e., Monday, Tuesday, Wednesday, Thursday, Friday) and non workable (i.e., Saturday, Sunday and holidays). We adapt our FSSM described in Equation (6) by fitting it on each group separately; we refer to the resulting model as (9). The only difference between models (6) and (9) is the data used in the estimation of the parameters. In the case of (9), we have two groups of data: workable days and non workable days. The structure of the matrices is the same as in model (6), but now they are specific to each group of days. In terms of forecasting procedure, if we want to predict a workable day, we use the data of the workable group, and similarly for non workable days. In Tables 11 and 12, we present the MAPE obtained with this procedure, reported by day type and by month. The results of this approach globally improve the forecast with respect to the initial model, since the global MAPE decreases. In addition, reduced MAPE values are obtained for most of the day types. However, we can also see that, for Saturdays and Mondays, the errors are significantly increased. These are the days where the transition between the groups occurs. They then share the additional difficulty of not having the most recent information (that of the preceding day) in the model. Some cold months' predictions are not improved; improvements are observed during the summer months or the months around summer. The high demand level and variability of the winter months drive prices up and thus make these months of particular interest. Forecasting accuracy for these months is important because a bad prediction can have a large economic impact for electricity suppliers. Another way to improve the forecasting of the load demand is to integrate exogenous variables such as day types and weather variables.
Day types are fixed variables, whereas weather variables are random variables. So far, we have only implicitly introduced day types into our modeling, not as an exogenous variable. In the next paragraph, we introduce the day-type variable into our model.

Adding Effects as Covariates
In this paragraph, we introduce into model (6) the day-type and month variables. We need an appropriate representation of these deterministic exogenous variables before predicting. We choose to create a binary dummy variable for each day of the week. In total, we have eight day types (including holidays), which we encode as eight dummy variables taking values in {0, 1}. A dummy variable takes the value 1 when the response vector y_i is observed on the corresponding day type, and 0 otherwise. For example, if the response vector y_i is observed on a Sunday, the dummy variable for Sunday takes the value 1 and the other day-type dummies take the value 0. In addition, we choose a numeric variable that represents the months of the year, with which we would like to control the seasonal effect of the series; it takes 12 values, one per month. Let Day denote the day-type dummy variables and Month the numeric month variable. Model (6) then becomes model (10). In terms of linear mixed model terminology, β_i controls the fixed effects of the load data. Parameter estimation for (10) is a bit different when using the package KFAS, which allows for estimating state space model parameters. In the case of model (6), we chose a random approach to estimate α_i, which consequently explains the random parts of the load curves. This means that we assume that Q_i exists.
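The day-type dummies and the month variable can be built, for instance, as follows (a hypothetical illustration with pandas; the holiday flags and column names are ours, using the first week of the test period, which contains the 1 November bank holiday):

```python
import pandas as pd

# Hypothetical daily index; holidays are flagged separately so that, together
# with the seven weekdays, eight day types are possible as in the text.
days = pd.DataFrame({
    "date": pd.date_range("2016-10-31", periods=7, freq="D"),
    "holiday": [False, True, False, False, False, False, False],  # 1 Nov 2016
})
# Day type: the weekday name, overridden by "Holiday" on bank holidays
days["day_type"] = days["date"].dt.day_name().where(~days["holiday"], "Holiday")
dummies = pd.get_dummies(days["day_type"])   # one 0/1 column per observed day type
days["month"] = days["date"].dt.month        # numeric month variable, 1..12
```

Each row of `dummies` has exactly one active indicator, matching the encoding described above.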
Tables 13 and 14 sum up the prediction performance of model (10) using daily and monthly MAPE, respectively. Figure 7 shows the errors of the observation equation and their variance H_i. A large change in the errors' variance can be observed at the end of 2016 (from November 2016 on). This change comes from a change in the pace at which consumers subscribed as clients of Enercoop. At the end of 2016, COP21 (21st Sustainable Innovation Forum) took place in France. As Enercoop's strategy is to support renewable electricity production and consumption, the visibility of Enercoop among the population increased at that moment, and Enercoop became better known to the French public than ever. This exposure brought many new customers to subscribe to electricity supplied by Enercoop. Over this period, it is difficult for the model to be accurate because outlying values were being observed.

A Short Case Study: Prediction at a National Grid Level
We now provide a detailed case study on the construction of a forecasting model using our approach for a more aggregated signal. We move from the supplier point of view to the Transmission System Operator (TSO) point of view. Data in this section come from RTE (Réseau de Transport d'Électricité) (http://www.rte-france.com/), the TSO of the French electrical grid. These data are openly available (http://www.rte-france.com/fr/eco2mix/eco2mix-telechargement). The size of the system is considerable: RTE operates the largest grid in Europe, with over 100,000 km of electrical lines, routing a total gross annual consumption of over 475 TWh. For the reasons detailed below, we incorporate meteorological data in the prediction model. These data are obtained from Météo-France, the national weather service in France (https://donneespubliques.meteofrance.fr/).

Salient Features on the Data
Data are retrieved from the RTE API (http://www.rte-france.com/fr/eco2mix/eco2mix-telechargement), where both past effective demands and the forecast load demand obtained by RTE are available. The daily load demand is recorded with 96 points per day, i.e., at fifteen-minute intervals.
We use a series of plots to highlight relevant features of the data that should be considered in the construction of a prediction model. The first view of the data is in Figure 8, which shows the time plot of the electrical load from 2012 until the first two months of 2018 (vertical lines separate years). When compared to Figure 5, the most important fact is the absence of a growing trend. Indeed, the gross yearly energy demand in France is quite stable during the period of study, which explains the constant trend. The annual cycle is clearly evident, with larger load demand levels during winter and lower ones during summer. Figure 9 illustrates the daily load demand for France from 13 February 2017 to 5 March 2017. As can be noticed, the load demand is correlated with human activities, which means that the load profile is high on workable days and low on weekends and holidays. Lastly, Figure 10 represents the relationship between load demand and temperature by both a scatter plot and a smooth curve estimated non parametrically. It can be seen that demand increases at both cold and hot temperatures.

Model Construction
For the FSSM training and test purposes, we use data from 1 January 2012 to 9 February 2017 as training dataset and from 10 February 2017 to 9 February 2018 for the test dataset.
For comparison purposes, we run experiments with model (6) into which we introduce some exogenous variables. For model training, we represent the initial load demand data in a cubic spline basis (with 82 splines) and in a functional principal component basis (with 10 functions). Observing the initial load data, we notice annual (see Figure 8) and weekly (see Figure 9) patterns. Therefore, we introduce four fixed variables: Week_1, Week_2, Year_1 and Year_2. A week has seven days, which we transform into a numeric variable NumDay; to extract the weekly pattern in the load demand, we choose Week_1 = sin(2πNumDay/7) and Week_2 = cos(2πNumDay/7). For the annual pattern, we introduce the variable PosDayYear, the position of the day in the year (from 1 to 365), and set Year_1 = sin(2πPosDayYear/365) and Year_2 = cos(2πPosDayYear/365). Finally, we gather these into the variables Week = {Week_1, Week_2} and Year = {Year_1, Year_2}. Year also controls the effect of months, which is why we do not need to introduce months in this model. As has been noticed, the load demand depends on the type of day. For this reason, we introduce the day-type dummy variables Day, as in the case of the Enercoop load demand forecasting. It should be noted that the fixed variables we add for the RTE load demand forecasting differ from those used for Enercoop's load demand forecasting. The reason for this choice is the annual and weekly seasonality observed in the load demand data.
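The Week and Year variables can be computed directly from their definitions; the function name below is ours:

```python
import numpy as np

def calendar_features(num_day, pos_day_year):
    """Sine/cosine encodings of the weekly and annual positions,
    as in the Week and Year variables of the text."""
    week1 = np.sin(2 * np.pi * num_day / 7)
    week2 = np.cos(2 * np.pi * num_day / 7)
    year1 = np.sin(2 * np.pi * pos_day_year / 365)
    year2 = np.cos(2 * np.pi * pos_day_year / 365)
    return week1, week2, year1, year2
```

Unlike the raw NumDay and PosDayYear values, these pairs are continuous across the week and year boundaries (day 7 is as close to day 1 as day 2 is).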
We have also observed that the load demand is roughly a quadratic function of temperature (see Figure 10). The target temperature that we consider here is an aggregation of the temperatures of France's 96 departments. Usually, RTE considers a few special zones that represent all of France in terms of temperature; we do not consider this representation method in this paper. It can be observed in Figure 10 that the load demand decreases with temperature below 10 °C and increases with temperature above 10 °C. Temperatures below 10 °C describe the heating season for the load demand, and temperatures above 10 °C the cooling season. For this reason, we define two variables, Temp_Heating and Temp_Cooling, to be introduced into the model, and gather them into a variable Temp = {Temp_Heating, Temp_Cooling}. This yields the FSSM model used for the RTE load demand data, which we denote (11).
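The text does not give the exact definition of Temp_Heating and Temp_Cooling; one plausible encoding, consistent with the described behaviour around 10 °C, is a pair of hinge variables (threshold and names are our assumptions):

```python
import numpy as np

THRESHOLD = 10.0  # deg C, the split suggested by Figure 10 (assumed)

def temperature_variables(temp):
    """Split temperature into heating and cooling parts around the threshold,
    so that each regime can enter the model with its own coefficient."""
    temp = np.asarray(temp, dtype=float)
    temp_heating = np.maximum(THRESHOLD - temp, 0.0)  # active below 10 deg C
    temp_cooling = np.maximum(temp - THRESHOLD, 0.0)  # active above 10 deg C
    return temp_heating, temp_cooling
```

With this encoding, a linear model in (Temp_Heating, Temp_Cooling) reproduces the asymmetric V-shape of the demand-temperature relationship.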

Prediction Performances
Tables 15 and 16 illustrate the results of the RTE prediction model on the test period. It can be observed that the forecasting MAPE is quite high during weekends and even higher on holidays. The global MAPE is about 1.47%, which corresponds to an RMSE of 912.12 MW. Tables 17 and 18 illustrate the results of our model (11). We can see that, on Saturdays and holidays, model (11) performs better than RTE's model. The monthly MAPE shows that, during February, March, May, June and October, our model performs better than the model of RTE. Nevertheless, the global MAPE for (11) is 1.54%, corresponding to an RMSE of 972.25 MW, which is slightly larger than for the RTE model. One way to improve our model is to introduce all the variables that can change the load signal, such as hydro power production with water pumping, wind power production, and photo-voltaic power production and consumption.

Discussion and Conclusions
In a competitive environment, electrical companies need to anticipate the load demand from data that present non stationary patterns induced by the arrival and departure of customers. Forecasting in this context is a challenge, since one desires to use as much past data as possible but needs to restrict the usable data to the records that describe the current situation. In this trade-off, adaptive methods have their role to play. Figure 7 witnesses the ability of the FSSM to adapt, to a certain extent, to a changing environment. The impact of an external event on the electrical demand is translated into larger variability of the error and so an inflation of the trace of the error variance matrix (cf. the end of 2016).
The forecasting process for Enercoop's load demand in this paper is mainly endogenous; only some calendar information is used as an exogenous variable. However, in electricity forecasting, it is well known that weather has a great influence on the load curve. For instance, temperature has an impact through the use of cooling systems in the hot season and of electrical heating during the cold season. In France, this dependence is known to be asymmetrical, with a higher influence of temperature on cold days. The nature of this covariate is different from that of the ones we used: while calendars can be deterministically predicted, this is not the case for the temperature. Using forecasted weather in an electrical demand forecast propagates the potential bias and the uncertainty of the weather forecasting system to the electrical demand prediction. Integrating weather information into our model to forecast Enercoop's load demand data, possibly changing the structure of the matrix H_i, and obtaining prediction intervals for the predictions are perspectives for future work.
For the load demand of the French national grid, we integrated fixed and weather variables that enable better forecasting than in the case of Enercoop. The accuracy of our model on these data could be further improved with more time to gather information on all the variables that could be introduced into the forecasting process.
In addition, only point predictions are obtained. In a probabilistic framework, one would like to have not only an idea of the mean level of the anticipated load, but also some elements about the predictive distribution of the forecast. Whether it is a whole distribution, some quantile levels or a predictive interval, this information is not trivially obtained from our approach. While the SSM does provide intervals through the Gaussian assumptions coupled with the estimation of the variance matrices, the FSSM has this information only on the coefficients of the functional representation. Transporting this information to the whole curve still needs to be studied.
Besides these technical considerations, a natural question is how this model can be used in other forecasting contexts. While the work is focused on short-term prediction horizons, there is no mathematical constraint against using the model on other time frames. For instance, for a long-term horizon, one would naturally coarsen the sampling resolution to, say, a monthly basis, in which case the functional segments could represent the annual cycle. However, as is well known in practice, predicting long-term patterns without relevant exogenous information is not the best option. Another case to consider is the presence of periodic and seasonal patterns. As is the case here, the electricity demand carries very clear cycles (e.g., yearly, weekly, daily) that are exploited by our model in two ways. First, the smallest one is taken into account within the functional scheme, and so all the non stationary patterns inside it are naturally handled by the model. Second, the largest ones are explicitly modelled as exogenous variables. A last case relevant to discuss is the one where the time series is sampled on an irregular basis. Although in our examples we only treated regular grid samplings, the functional data analysis framework allows one to cope with irregular sampling in a natural way. Indeed, the data of each segment (in our case each day) are used to fit the corresponding curve (e.g., the daily load curve) and then estimate the spline coefficients. With this, the rest of the forecasting procedure remains unchanged. Of course, the proposed model could be used to forecast other kinds of phenomena; its reliability and predictive quality would depend on the nature of these signals.
To sum up, the presented model has enough flexibility to be used in the anticipation of energy demands in the electricity industry, providing credible predictions of the energy supply and improving the efficient use of energy resources. This may help to handle theoretical and practical electricity industry applications for the further development of energy sources, strategic policy-making, energy mix plans and adoption patterns [25].