3.1. Non-Spatial GAM: Generalized Additive Model Regression Using Cubic Splines as a One-Dimensional Smoother for Modelling Time
Additive models were introduced by Ref. [
16] to deal with the dimensionality problem of the General Linear Models GLMs through univariate smoothing [
17]. The GAMs Stone approach can be seen as a traditional way of portraying GAMs, which in this case has the demand as the response variable d, and space or time as predictors 
, which can be written as
        
        or its summarised version
        
        where 
 and 
 are the smooth functions over the covariates 
.
The Stone approach fails to include interaction and therefore has unsatisfactory results when interaction exists [
18]. It is here that the approach from Ref. [
11] becomes important, as it not only includes interactions 
 but also includes the link function g over the response variable 
d to analyse 
 within a solution space of any member of the exponential family, usually Gaussian-Normal, but it is open to Gamma, Poisson, or Binomial [
17]. Then, a Generalized Additive model over the demand according to the Hastie and Tibshirani approach can be written as
        
        where 
 are the smooth functions over the covariates 
 allowing for both single basis smoothing 
 and interactions 
 and 
. In this approach, 
 is a regression for the variables that will not be modelled additively. Hastie and Tibshirani defineed the procedure for estimating 
 and 
 as a local scoring procedure for generalized additive models.
In the early 2000s, Ref. [
19] provided a more recent perspective to the GAM definition developed by Ref. [
11], denoted in Equation (
13). The author directly defined the iterative procedure with a model structure that has index 
i, in which 
 is represented as the product of the model matrix 
A and a vector of the associated model parameters 
, where the intercept is assumed to be included. 
A is a design matrix for the covariates 
x with linear effects. The author also included a linear operator 
 to multiply the smooth function 
.
        
        where 
 is distributed according to an exponential family EF that has median 
 and variance 
, 
.
The cubic regression spline smoother is represented through the function 
 where 
 is a base function of the spline and 
 is the vector containing all the basis coefficients such as
        
Finally, a vector of coefficients 
 that contains the linear fixed effects vector 
 followed by all the basis coefficients 
 for 
 such as 
 is defined. Then the estimation of the vector of coefficients 
 from the GAM model will be done by penalized maximum likelihood, where the optimal is given by some smoothing parameters 
 [
19]
        
        where 
 is a penalty matrix and the deviance of the model 
.
Note that not only the optimal vector of coefficients 
 must be estimated but also the parameters themselves 
. The smoothing parameters estimation 
 is done either by prediction error criteria (GCV) or maximum likelihood (ML or REML) [
20].
The method that we will be used in this article for selecting the smoothing parameters 
 is the Restricted Maximum Likelihood, REML. This will be done instead of other smoothness selection methods such as ML, Maximum Likelihood, and GCV, Generalized Cross-validation. In REML unlike ML, the estimation of variance components is better, as it has a higher probability of providing a complete rank of estimates of the variance components [
21]. On the other hand, REML is preferred over GCV due to it having a higher penalty for overfitting and being less likely to have multiple minimum points, therefore being less likely to have highly variable smoothing parameters 
 [
22].
In the literature, demand electricity is usually modelled by three patterns: daily, weekly, and monthly. Ref. [
4] modelled the previously mentioned patterns through a GAM by preselecting the knots according to the number of unique values of the variables included in the model. In that case, when modelling the patterns, only continuous variables were included with 24 knots for the daily pattern, 7 knots for the weekly pattern, and 12 knots for the monthly pattern. In this case, the daily pattern was included in the model through an hour variable, the weekly pattern through a categorical variable that differentiates days between weekend and weekdays, and instead of including a monthly pattern a trend pattern was included. In the model, the interactions between categorical and numerical variables were included. Knots were not set up based on unique values but on sensible values according to the underlying data.
Initially, the daily pattern was included in the model using a cyclic spline smooth over the discrete variable hour with a range from 0 to 23, . After this, the weekly pattern was included in the model as a categorical variable, wk, where weekly days were classified as 1-Monday and 7-Sunday, and were aggregated into two factors, . The corresponding interaction between the discrete numerical variable that accounts for the daily pattern, hr, and the categorical variable, wk, that accounts for the weekly pattern was also included using a factor smooth-interaction with the argument inside the smooth function, s, in the following way . Finally, trend pattern was fitted not by using the month’s numeric notation from 1 to 12 or the days of the year from 1 to 365, but through a trend variable that goes from 1 to 43,800, that accounts for the total number of data points available days per years per hour between 2013 and 2018 through a cubic spline smooth . The interaction between the daily pattern and the trend pattern was fitted using pairwise bivariate tensors product interactions .
This model can be expressed as
        
        where 
 accounts for the linear effects and the f function accounts for the non-linear effects defined by each smoothed covariate. The super indexes 
 correspond respectively to the Seasonal, Trend, and Interaction components. In the model 
 with the response variable demand being distributed as 
.
As can be seen in 
Table 1, the resultant temporal model accounts for eighty-five percent of the electricity demand variation.
The Wald 
t-test in 
Table 2 and 
Table 3 for the non-spatial GAM model shows that all smooth terms and parametric terms are highly significant.
The knots were preselected, but not based on unique values, as in Ref. [
4], but on sensible values that reduce the heteroscedasticity of the data as much as possible while maintaining them at logical levels below 24 knots. Splines over 10 knots are rare and can lead to model overfitting, so in this case the upper limit was set to 24 because of the number of unique values of the variable hours. Previous literature also points out that selecting unique values for time related variables makes a better fit.
Heteroscedasticity which indicates if the basis dimension 
k is too low, was checked through the 
k-index in 
Table 4. The model was divided between the residual variance to get a 
p-value.
The hours variable and its respective interactions showed no evidence of heteroscedasticity with a 
p-value over zero. On the other hand, the number of data points was clearly heteroscedastic with a 
p-value around zero. For reducing heteroscedasticity in the model, it was attempted to increase the number of knots for the data points variable up to 50, but this had little to no effect in the 
k-index or 
p-value. Fifty knots were chosen as an upper limit to test the heteroscedasticity values due to it accounts of two years of data. Going above this value was not explored because the size of the data does not call for a high number of knots, considering that there were only five years available. The residual vs. fitted plot (See 
Figure 4—second plot from left to right) also shows proofs of heteroscedasticity or non-constant spread in the data.
The model demonstrated convergence after ten iterations supporting the data being enough for the number of parameters. However, by looking at the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) it can be seen there is remaining autocorrelation that the model is not capturing (See 
Figure 5).
As is the case for GAMs, the correlation among covariates was tested through concurvity instead of collinearity due to the non-linear nature of the model. As expected, there is acceptable concurvity between individual effects and interactions (See 
Figure 6). There is no reason to believe that smooth terms of the model approximate to other terms.
Figure 7 exhibits flat line in the density graph (first line, third figure) for the interaction of hours and weekend indicating there is no much of an effect in the interaction of such variables.On the contrary, the interaction between hours and weekdays shows a clear spline pattern in the density graph (first line, second figure). Both the individual effects of hours and data points portrayed in the first figures of the first line and second line show splines with a higher degree of curvature than the interactions.
   3.2. Spatial GAM: Generalized Additive Model Regression Using Markov Random Fields as a Two-Dimensional Smoother for Modelling Space
Often, space modelling is associated with Geographically Weighted Regression, GWR [
23], which according to Ref. [
24] is based on the Varying Coefficient Model, VCM proposed by Ref. [
25]. Although there are many other types of modelling such as the Geoadditive model [
26], this one is specifically based on GLMs. In this article, the space modelling is going to be based on GAMs, as proposed by Refs. [
24,
27] and later incorporated by Ref. [
8].
The spatial modelling will be performed through GAM using GMRF over areal data, which is often considered a separate spatial modelling technique [
28] but in this case it was incorporated to the Generalized Additive Model as a two dimensional smoother so it can account for the model covariance as a function of the location, as proposed by Refs.  [
8,
27].
Areal data is important for its various applications not only in the energy demand fields where observations could be allocated to a region according to the electricity operator but also to census and survey data where observations are aggregated to administrative districts.
Spatial GAM has the same structure as a non-spatial GAM with the addition of a term that accounts for the spatial effect represented through the smooth function 
 [
27] such as
        
        where 
.
In Equation (
21), 
 accounts for the linear effects of covariates 
x while 
 accounts for the non-linear deterministic effects defined by the smoothers and 
 accounts for the spatially correlated effect that in the equation is calculated by the GMRF.
Here the dependence of all the terms from the spatial locations is suppressed for simplicity, but it is assumed when including the Gaussian stochastic process assumptions through , therefore no subindex s related to the locations is included.
Equation (
21) can also be represented in terms of the vector of spatial effects 
v [
27] through an incidence matrix 
, which ensures that each observation is assigned the spatial effect of the corresponding 
s region it belongs to by assigning ones or zeros according to whether that observation 
i has been collected in that region 
s or not
        
        which translates to
        
After establishing the Generalized Additive Model, GAM, this article will proceed to explain the assumptions under the spatially varying effects vector, v.
To understand how the spatially varying effects vector 
v operates and its underlying assumptions, it is necessary to briefly recapitulate the fact that the spatial effect vector forms part of a spatial function 
 expressed as 
 for the specific case of discrete spatial areal data. 
v distribution follows a Gaussian Markov Random Field, GMR,F prior distribution conditional on the variance parameter 
 [
24]. In numeral 2.3, it was mentioned how Gaussian Markov Random Fields, GMRFs, follow a multivariate normal distribution such as 
. In this case the multivariate normal model does not fully specify the join distribution of 
 because the variance 
 is known and denoted as 
, but 
 is a completely unknown reason, which is why we say that the prior is an improper prior, making the Gaussian Random Field be an Intrinsic Gaussian Random Field, IGMRF [
29]. The use of improper prior distributions is widely done and preferred in Bayesian statistical analysis when the information about parameters is weak or missing [
30]. According to Ref. [
15], Intrinsic Gaussian Random Fields, IGMRFs, can be defined in both regular or irregular lattice. A lattice is a structure or pattern for arranging an area of land into spatial units called sub-areas, cells, or units [
31]. As stated in Ref. [
31] these sub-areas share boundary edges between a single or more sub-areas and cannot intersect with each other. As regards the regularity or irregularity nature of the lattice, it all depends on the application over which spatial data is going to be analysed. Radar data, for example, is usually based on regular grids because the sub-areas formed have the same shape and size. However, when the spatial data are administrative regions, as in this case, then the nature of the lattice is irregular because they are defined following the natural shape of the territory, often setting boundaries according to natural features such as rivers or mountains. This applies only for first level divisions of a country such as states or departments. The ones that follow census units are not related with the territory but are more defined for data collection purposes. Due to the idea being to estimate 
 from the spatially varying effect vector 
v, then the deviation between spatial parameters 
v of adjacent regions such as 
 will proceed to be measured, which carefully corresponds to the form of a random walk of first order such as 
 with 
, where 
s exists in a set of regions 
 ordered in a neighbourhood structure called 
 where 
 [
27], which is what Ref. [
15] define as a First-order intrinsic GRMF on irregular lattices and for which they use the German regions as an example. So, basically, a first-order intrinsic GRMF on irregular lattices as defined in Ref. [
15] and later used an incorporated as a spatial smoother for modelling Generalized Additive Models by Refs. [
8,
27] is an independent Gaussian relationship between neighbouring regions 
i and 
j such as 
, also called an independent Gaussian increment or deviance between adjacent regions, which yields to the multivariate Gaussian Intrinsic GMRF prior, as denoted by Ref. [
27].
        
This relation or deviance between the adjacent region shown in Equation (
24) as 
 and [
15,
32] can also be represented as 
, where 
 symbolizes the set of all unordered pairs of neighbours. The pairs are defined as unordered to prevent double-counting due to the fact that if s is a neighbour from r, the order in which they get expressed 
 does not matter, as seen before in numeral 2.3.
The quadratic form 
, which represents the difference between the 
X vector and the 
 vector in the multivariate Gaussian prior of the GAM. can also be expressed in a matrix-vector form as shown in Refs. [
24,
27,
33].
Where 
v is the spatially varying vector, 
 is the transposed version of the spatially varying vector, and 
K is a precision matrix, otherwise known as 
Q [
15].
Letting 
 represent the number of neighbours for region 
s such as 
 the elements of the 
K precision matrix from Equation (
23) as
        
The super index of 
 could be expressed as the rank of improper precision matrix 
K  [
34]. The precision matrix 
K is called improper because it does not have a full rank. The rank of the precision matrix 
K from an Improper GRMF of order 
k is usually expressed as 
 [
15].
Ref. [
27] states that the multivariate Gaussian Intrinsic GMRF prior for spatial effects vector 
 expressed in Equation (
21) and its corresponding equivalent Equation (
23) can also be articulated under full conditionals as
        
Equation (
26) can also be expressed in terms of known unequal weights based on the extent of the common boundary or the distance to the region centroids [
27,
32]. but the application in this document is going to be restricted to equal weights 
 for each neighbouring regions 
, making 
 have no effect over the Equation (
26), which is why we are not including it.
In order to run the spatial GAM, data was aggregated into regions based on the energy operators, as previously mentioned.
The Spatial Component from the GAM was defined using a neighbourhood structure to get a precision penalty matrix 
K for the regions. Neighbouring regions were defined by making a neighbours list for each region based on contiguous borders, as shown in 
Figure 8.
This model can be expressed as Geoadditive regression because it has no further terms but the one with the spatial component such as
        
        where 
 is the incidence matrix between observation i and region 
s in which the observation 
i was collected, and 
v is the spatially varying effects vector that follows a Gaussian Markov Random Field, GMRF, prior distribution conditional on the variance parameter 
 such as 
 being 
 the number of neighbours for region s that can also be represented as 
 and in this case would represent the neighbourhood relationships shown in 
Figure 8.
The purely spatial GAM with Markov Random Fields has as a constraint, as it can only be limited to cross-sectional data where the precision penalty matrix K dimensions are based on the set of regions, such as K, which has  dimension. This is why the application of the GMRF in this dataset was limited to aggregated data from one-year—2018.
As can be seen in 
Table 5, the resultant spatial model accounts for seventy-seve percent of the electricity demand variation. The Akaike Information Criteria AIC for this model is not truly comparable with the non-spatial GAM, which is considerably higher.
The was no need to calculate the concurvity, because the model was only dependent on one covariate, which was the Markov Random Fields, and the diagnostic showed that the model converges after five iterations, which means the data is enough for the number of parameters.
Moran’s I test was calculated to test the spatial autocorrelation in the data, and, as can be seen in 
Table 6 and 
Table 7 the value obtained showed no autocorrelation meaning the assumption of independence was maintained.
  3.3. Extension of the Non-Spatial GAM with a Spatial-Categorical Regional Component
After running both temporal and spatial models as separate. A final model was built, where a covariate accounting for the regions was included over the already tested temporal one. Unfortunately, there was no way of doing this with Markov Random Fields across longitudinal data.For this reason, it was decided to use a categorical variable that had the ID of the regions such as
        
The components remain the same as the ones in Equation (
28) for the non-spatial GAM with the addition of the categorical component that accounts for the IDs of the regions. The link function keeps being Gaussian, therefore 
 is maintained.
As can be seen in 
Table 8, the resultant extended non-spatial model with the spatial categorical ID variable accounts for ninety-six percent of the electricity demand variation, which is ten percent more than the temporal model by itself.
The Wald 
t-test from the summary in 
Table 9 and 
Table 10 for the extended model shows that all smooth terms and parametric terms are highly significant.
The Akaike Information Criteria AIC for this model was nine times higher than the non-spatial GAM which points to the non-spatial model being a better fit even when the extended model explains more of the demand variation.
The model converges after seven iterations, showing the data being enough for the number of parameters, even where the heteroscedasticity still persists for the time individual covariate and its respective interaction as shown in 
Table 11.
Regarding concurvity, the conclusions from the non-spatial model are maintained. There concurvity of minor magnitude between the individual effects and the interactions.
For the extended model the conclusions remain the same as for the non-spatial GAM (See 
Figure 9). In this model there is additional information about regions not included in the non-spatial GAM. The model shows predominance in the energy consumption of regions three-Centro, seven-CostaAtlantica, and two-Antioquia when compared against other regions.
For the purpose of comparing both models, the non-spatial GAM without the ID variable was fitted again over the same dataset as the one used for the extended version, so we could run an added variable test to check if the extra variable was contributing to explain the response or not.
The results in 
Table 12 show that there is strong evidence that the spatial categorical variable does contribute to explaining the response even when an AIC test between the two models prefers the model with fewer terms.
All analyses were performed using R Statistical Software (v4.0.3; R Core Team 2020) [
35]. Generalized Additive models were fitted using the mgcv R package (v1.8-33; Wood 2020) [
20]. Splines were outlined using the mgcViz R package (v0.1.6; Fasiolo 2020) [
36]. Pairwise concurvity was portraited with a heatmap using base R. Maps were elaborated with the ggplot R package(v3.3.2;Wickham 2020) using geometries from the special features sf R package (v0.9.6; Pebesma 2020).