Performance Analysis of Short-Term Electricity Demand with Atmospheric Variables

Abstract: The quality of short-term electricity demand forecasting is essential for the energy market players for operation and trading activities. Electricity demand is significantly affected by non-linear factors, such as climatic conditions, calendar components and seasonal behavior, which have been widely reported in the literature. This paper considers parsimonious forecasting models to explain the importance of atmospheric variables for hourly electricity demand forecasting. Many researchers include temperature as a major weather component. If temperature is included in a model, other weather components, such as relative humidity and wind speed, are considered as less effective. However, several papers mention that there is a significant impact of atmospheric variables on electricity demand. Therefore, the main purpose of this study is to investigate the impact of the following atmospheric variables: rainfall, relative humidity, wind speed, solar radiation, and cloud cover to improve the forecasting accuracy. We construct three different multiple linear models (Model A, Model B, and Model C) including the auto-regressive moving average with exogenous variables (ARMAX) with the mentioned exogenous weather variables to compare the performances for Hokkaido Prefecture, Japan. The Bayesian approach is applied to estimate the weight of each variable with Gibbs sampling to approximate the estimation of the coefficients. The overall mean absolute percentage error (MAPE) performances of Model A, Model B, and Model C are estimated as 2.43%, 1.98% and 1.72%, respectively. This means that the accuracy is improved by 13.4% by including rainfall, snowfall, solar radiation, wind speed, relative humidity, and cloud cover data. The results of the statistical test indicate that these atmospheric variables and the improvement in accuracy are statistically significant in most of the hours. More specifically, they are significant during highly fluctuating and peak hours.


Introduction
Short-term electricity demand or load forecasting is a way of estimating future demand for a short time horizon, commonly an hour to one week ahead.According to the time horizon for prediction, Apadula et al. [1] classified the load forecast into four categories: very short-term forecasts (from a few minutes to 1 h ahead), short term forecasts (from 1 h-1 week ahead), medium-term forecasts (from one week to a year head) and long-term forecasts (longer than a year ahead).Such differences in the lead time influence the choice of models and methods to apply, as well as the selection of important external factors affecting the electricity demand (socio-economic, atmospheric, seasonal and time-dependent factors) [1].For short-term demand forecasting, we exclude socio-economical factors and consider atmospheric, seasonal and other short-term dependent variables.To avoid an ambiguous presentation, we note that the rest of this paper uses two terms: 'demand forecasting' and 'other weather variables' to refer to 'electricity demand forecasting' and 'rainfall, snowfall, solar radiation, wind speed, relative humidity, and cloud cover', respectively.
Knowledge of electricity demand behavior in advance is crucial for the planning, analysis and operation of power systems to assure an uninterrupted, reliable, secure and economic supply of electricity.These days, electric utility industries are undergoing a highly competitive market-driven industry.Every electricity provider is focusing on decreasing power generation cost by increasing the efficiency of load demand management.For the market operator, forecasting is crucial for scheduling and dispatch of generators' capacity.For electricity generators, the strategic choice involved in bidding and re-bidding of capacity depends on demand forecasts [2].Such bidding and re-bidding processes determine the market volatility for the electricity price.Similarly, for the electricity retailer, demand forecasting affects the decisions about the balance between the hedging spot acquisition of electricity for consistent energy supplies without black outs and possibly minimum cost.Improving the accuracy of electricity forecast could reduce the power marketing risk and keep the market stable for electricity demand and price, as well [3,4] Therefore, accurate electricity load demand forecasting for different time horizons is a very hot research issue.
Most electric utilities serve residential, commercial and industrial area customers.In residential areas, electricity depends on human activities.Therefore, electricity demand may rise to a peak during the morning and evening.Residential customers are very sensitive to weather fluctuations, but calendar (weekday, weekend, and holiday) effects have very low predictive power [5].Retail stores, restaurants, hotels and educational institutes are commercial customers, and their demand is affected by business schedules and some weather behavior.This results in electricity demand dropping significantly during weekends or holidays.Industrial customers play a vital role to increase or decrease electricity demand.However their characteristics are also similar to commercial customers and quite difficult to predict.In our study, aggregate demand data are used to analyze the impact of atmospheric variables; therefore, it seems more challenging.
Many authors emphasize univariate models with historical data and perform forecasting with acceptable accuracy [6].Despite that, [7] suggest that the level of accuracy can be improved by including weather variables.Furthermore, the Intergovernmental Panel on Climate Change (IPCC) suggests that global temperature is gradually increasing, and there is evidence of an increase in the frequency of extremes [8].These extremes directly affect electricity demand.According to IPCC, global warming has already made the world 0.74 • C warmer over the last 100 years, and temperatures are probably going to increase by 1.8-4 • C by the end of the century.Therefore, many authors have studied the impact of weather on electricity demand [1, 7,[9][10][11][12][13][14]. Historical data of electricity demand show negative linear effects on load demand during the winter season.The experience of many utilities is that atmospheric variables that cause an influence on electricity demand are temperature, humidity, wind speed and precipitation, in decreasing order of importance [15].This conclusion is drawn from the study of a considerable number of atmospheric variables on electricity demand for Greece for better forecasting performance.All these scenarios envision that electricity demand essentially depends on meteorological variability [1].However, only temperature is considered as the most essential variable [7,16,17].There are three main reasons behind excluding other weather variables: (i) they show lower impact on electricity demand [18]; (ii) it is expensive to install weather stations to collect all these data; and (iii) there are potential collinearity problems when simultaneously employing several weather variables as explanatory variables [16].However, our main concern in this paper is to analyze the impact of these other weather variables to enhance forecasting performance because [6,7,17] strongly recommended that including meteorological variables can improve forecasting accuracy.
An appropriate modeling technique in demand forecasting is crucial for high accuracy and stable performances.In short-term load demand forecasting, the well-cited papers [2,5,7,12,14,[19][20][21][22] implemented statistical techniques especially multiple linear regression (MLR), semi-parametric and ARMAX models.Similarly, artificial intelligence techniques especially artificial neural network (ANN), fuzzy and support vector machine (SVM) models were implemented in [9,10,[23][24][25].Several modeling concepts for robust parameter estimation used prior to 1990 were discussed in [26], and it was concluded that MLR was superior.Interest in ANN modeling for electricity demand forecasting began in 1990 because of its fairly good results.Nevertheless, these good results were due to the ability to peek into the future and undergo over-fitting due to either over-training or over-parameterization [27].For example, the non-linear autoregressive ANN model was implemented in [25] and achieved 30% improvement on the average error compared to traditional ANNs, ARMAX and state space models.In fact, ANNs are black-box techniques, and they do not offer insights into the relation between electricity demand and driving factors.Hippert et al. [23] analyzed 40 papers and concluded that without solid support, ANNs are good enough for short-term demand forecasting.Chow et al. [10] agreed that MLR models are capable of characterizing the relationship between load demand and other exogenous factors, but this is a complicated modeling technique and requires enormous computational effort to produce acceptable results.Various time series models such as seasonal autoregressive integrated moving average (SARIMA) and the SARIMA model with exogenous variables are not able to react fast enough to rapid demand changes that deviate from the historical demand forecast [5].The MLR model with a dynamic error structure and adaptive adjustment of forecasted error proposed in [19] was the winner of a load forecasting competition organized by a utility in the U.S. Hong et al. [28] reviewed the modeling techniques of winning teams in the Global Energy Forecasting Competition 2012 (GEFCom2012), where all four wining teams applied regression analysis, while only two teams implemented ANN.Apart from neural nets, other traditional and adaptive techniques such as SARIMA and regression ARIMA (RegARIMA) have been compared to linear regression for cognate energy prediction with weather variable selection.The result showed that linear regression was highly effective and better than other sophisticated techniques for the majority of simulations in [29].Therefore, more robust modeling techniques comprise our concern to handle the chaotic behavior of atmospheric variables.
Three different scenarios are discussed in this paper before drawing some conclusions.In the first scenario, an MLR as the ARMAX model with deterministic variables, historical demand and the interaction of days with historical demand data was constructed and named Model A. In the second scenario, temperature-derived variables and their interaction with months were added to Model A, and this new model was named Model B. Similarly, in the third scenario (named Model C), other atmospheric variables were added to Model B. These models were trained with a rolling window of two years of data and forecasted for one year out from the sample data.Finally, we interpreted the impact and significance of weather variables for short-term demand forecasting and also compared the forecasting accuracy among these three models.
This paper contributes to short-term demand forecasting by showing that atmospheric variables, such as relative humidity, wind speed, rain fall, solar radiation, and cloud cover, improve the accuracy, compared to the baseline models Model A, Model B, and Model C with temperature (alone) for Hokkaido Prefecture.Analysis of the individual impacts of the atmospheric variables is also a crucial contribution.We believe that the selection of the best weather stations and estimation of the exhaustion temperatures, the base temperatures for cooling degree days (CDD) and heating degree days (HDD) comprise a pioneer work for short-term electricity demand forecasting in Hokkaido Prefecture, Japan.No published study to our knowledge has taken an MLR approach with an ARMAX modeling for short-term demand forecasting in Hokkaido Prefecture.
The organization of this paper is as follows.Section 2 describes the characteristics of electricity demand data, and atmospheric variables are discussed.Different modeling techniques, methodologies and results related to this paper are presented in Section 3. Section 4 demonstrates an extensive empirical analysis of variables for the quality of the model fit for our dataset.An estimation and a forecasting technique followed in this paper is discussed in Section 5. Section 6 deals with the comprehensive discussion of the predicted results and analysis, and Section 7 concludes this paper.

Description of Data
The data comprise hourly electricity load demands from the area covered by the utility named the Hokkaido Electric Power Company (HEPCO), ranging from 1 January 2013 to the last day of 2015.The same period of the atmospheric dataset from Japan Meteorological Agency (JMA) is used for this study, which is freely available.There are no missing data on electricity demand, but some missing snowfall, and cloud data are filled up using the interpolation technique.The dataset then is separated into 24 subsets, each containing the load for a specific hour of the day, and used as single series.Such an hour-by-hour approach has been implemented by [19] and won a load forecast competition.

Electricity Demand Data
Hourly data of 26,280 samples are made available by HEPCO, Japan.Unlike many other countries, HEPCO electricity data exhibit a significant and persistent downward trend (Figure 1a), which is mainly associated with the economic, demographic and population growth of the country.Firstly Japan is already at the saturation level of development and has negative population growth.Secondly, people are shifting their interest towards renewable energy and implementing a smart switching system for residential and commercial buildings.Strong weekly and monthly seasonal patterns are superimposed on this yearly trend (Figure 2b) and are highly correlated with atmospheric factors, especially temperature.Hokkaido is a very cold region; therefore, the peak level of demand occurs during December-February (winter).During this season, the average temperature drops down to −5 • C, causing intensive use of electric heating appliances.Figure 1b shows the day type variation of electricity demand for all hours with various peaks.After midnight, electricity demand starts to rise until morning Hour 5. On holidays, people enjoy holiday activities during night hours.Therefore, demand rose to the maximum level on holidays.As was our expectation, electricity demand on weekdays dramatically rose after Hour 8 due to office hours and dropped again during lunch hours (noon to Hour 13).However, during the weekend, electricity demand significantly dropped to the minimum level at Hour 8 and slightly rose during the day time.The pattern of rising or falling electricity demand for all days is the same during day hours, and the peak is at Hour 19.The contour plot in Figure 2a illustrates the details of average electricity demand for all hours of the day and all months of the year from 2013-2015.During the winter season, especially the last week of December to the first week of February, there are two peaks of demand in the morning and evening time.Morning peaks exist from Hour 4-Hour 6 and an evening peak approximately from Hour 18-Hour 19 of 4800 megawatts (MW) in magnitude.However, in the summer season, the demand level dramatically falls to an extreme low during the morning time.Since Hokkaido Prefecture suffers from a very cold climate during the winter season (30-year mean temperature in January: −8 • C), people use electricity for warming purposes such as room heating, building heating, and water heating.Furthermore, the variation of the pricing and the needs of people cause excessive demand during the morning and evening time.The lowest demand for a similar time period is found in June, but soon after, a significant rise in temperature causes electricity demand to rise again until August, which is the hottest month (30-year mean temperature in August: 22 • C).This is exactly the opposite reason that causes and exhibits seasonal variation to that of winter (Figure 2b).

Atmospheric Variables
The climate of Japan is showing long-term changes, and the frequency of extremes of maximum temperature is increasing, as well as annual temperature is expected to increase.A good review paper about the impact of climate change for energy consumption [30] discussed the demand for energy, coal, gas, oil and electricity for Japan.The paper concluded that energy consumption was responding according to the variations in temperature.The higher the temperature, the higher will be the consumption during the summer in warmer countries; and the lower the consumption during winter in colder countries.This is unlike [30], because the cold region Hokkaido, Japan, shows a higher demand during winter.Therefore, the analysis of the impact caused by following atmospheric variables is of interest for short-term demand forecasting: Temperature contributes the most to the majority of load forecasting variations in electricity demand.Increased temperature during the winter will reduce electrical heating load.Conversely, in the summer, electrical heating is normally absent, but high temperatures increase the refrigeration and air-conditioning load.Wind speed may have some effects on air conditioning and cooling fans because of air chill.However, its effect is relatively localized.Rainfall also shows a localized impact on electrical heating demand similar to wind speed.If it is raining, there must be cloud cover, no radiation and the possibility of wind, as well.Therefore, these atmospheric variables are intra-correlated among themselves.Wet conditions can affect the efficiency of air-vented dryers and can be considered as the effect of relative humidity.This means that during the periods of high humidity, electricity demand tends to increase.However, in Hokkaido region, electricity demand decreases with the increment of humidity in the summer (Table 1) and vice versa in the winter.Cottet and Smith [20] resolved the nonlinearity issue by the interaction of temperature and humidity.Besides the relative humidity, an increase in cloud cover can also increase demand for lighting purposes.Recorded hours of sunshine in a period are a more direct indicator of cloud cover.In this study, forecast models are trained and tested using measured atmospheric data instead of forecasted data.From three years of sampled data, we have plotted the correlation matrix of all weather-related explanatory variables of electricity demand in Figure 3.There is a positive correlation of electricity demand with rainfall, snowfall, wind speed, and cloud cover, i.e., the increase of rainfall, snowfall, wind speed, and cloud cover causes an increase in electricity demand.However, there is a very high negative correlation with temperature (−0.68), indicating that the overall temperature in Hokkaido is cold, and this cold causes an increase in electricity demand.We can notice that other weather variables, except temperature, have a small correlation with electricity demand, indicating a lower impact on electricity demand.This is one reason why many researchers have used only temperature as the weather variable.The collinearity problem [16] is another reason for ignoring these weather variables, if temperature is already used in the model.This is because the temperature is expected to be correlated with radiation, cloud cover, and wind speed, as well.Furthermore, weather variables are correlated among themselves.For example, if there is rainfall, there is the possibility of cloud cover, humidity, wind speed, and snowfall.
The correlation of atmospheric variables with the electricity demand is measured in Table 1.For both seasons, temperature is the factor most correlated with electrical demand.Solar radiation has a negative correlation during the winter season, while it has a positive one during the summer.In the winter time, a significant lighting and heating load coincides with the lower temperatures; therefore, the solar radiation tends to provide warmth, resulting in less use of heating appliances.Conversely, in the summer, demand tends to increase because of solar radiation.Similarly, relative humidity and precipitation show a positive correlation during the winter and the opposite in the summer.However, Solar radiation and relative humidity during the summer show approximately equal and opposite correlation results with minimization of their individual effects, and a strong correlation of temperature remains a dominant factor for electricity demand during the summer.
Since there is a strong correlation between electricity demand and temperature, special analysis of this relationship is required.In the Hokkaido data, it can be seen that there is a strong non-linear relationship between demand and temperature (Figure 4).Electricity demand is increasing for an increase or decrease of temperature from a certain point known as the base temperature.It is reported in the literature that the base temperature varies from 12.9-22 • C (Table 2).Some industrial sectors use a very low base temperature (e.g., refrigerators or department stores).Local outdoor cooling sources and adaption to human comfort levels may cause these large variations of the base temperature [31].The relationship (Figure 4) between temperature and electricity demand is usually asymmetric.Therefore, third or higher order polynomial models are preferred to determine the base temperatures for minimum electricity demand and exhaustion point(s) (one or more points), rather than the second order polynomial.In this paper, we implement two techniques: (i) polynomial models (2nd, 3rd, 4th and 5th); and (ii) piece-wise linear models, as mentioned in Table 2.There is an approximately constant slope of electricity demand, no matter how much the temperature drops from −10.2 • C and no matter how increased from 33.18 • C.These two points are defined as exhaustion points.Similarly, the base temperature for minimum demand is observed at 16.28 • C.This point is important because there is no effect of temperature on electricity demand.However, our piece-wise linear model evaluates this point at 17.1 • C, from 2 p.m. data.
In piece-wise linear approximation, 2 p.m. demand from weekdays is used.The least squares estimation technique is used to approximate HDD and CDD points by varying the number of knots  from minimum to maximum temperature.From the eight knots' (at temperatures of −7.9  4b explains the temperature between 15.65 • C and 21.53 • C, exhibit base temperatures.In this temperature range, electricity demand has a negligible response to the variation of temperature.Therefore, we implement these two points as base temperature points to calculate HDD and CDD for our further calculation.

Weather Station Selection
JMA has installed 226 automated weather stations at different location of Hokkaido Prefecture.Our interest is to observe the most effective weather station concerning electricity demand variation.In this paper, a combination of three weather stations with the simple mean is used because of the lower MAPE and lower variance values.The three selected stations are the regional headquarter weather stations of Sapporo, the special automated weather station of Hakodate and Kitami.The details of the methodology are explained in Appendix A.

Related Works
In the literature, many authors have developed a model without any exogenous variables.Even so, they show competitive forecasting performance with many multivariate models.For example, Taylor [6] employed double seasonal exponential smoothing for half hour data and predicted very good results with MAPE of 1.2-2% for half hour to day ahead lead time forecasting.Nevertheless, only historical demand dataset may not sufficiently address the effect on electricity demand because temperature variation is also an important factor that directly influences electricity demand.Harvey and Koopman [39] used a time-varying splines model with temperature to get MAPE performance up to 3%.After 2003, climate change has been affecting the variation of demand such as modification of the annual daily load curve and shifting of the peak demand occurrence from evening to morning in Jordan [14].In Europe, extremely high temperatures during the summer of 2003 created significantly greater electricity demand.Ihara et al. [36] reported that a 1 • C rise in temperature was equivalent to a 180-MW increment in electricity demand for Tokyo, Japan.They also studied the impact of other weather variables for peak demand forecasting.The recent paper by Staffell and Pfenninger [40] concluded that electricity demand and supply are becoming increasingly weather dependent in the U.K. In [31], weather sensitivity for short-term prediction of electricity demand was analyzed for warm working days for Adelaide, Australia.For a long lead time forecasting model, weather prediction shows higher uncertainty, but for a short lead time level of uncertainty is very low.Therefore, Trotter et al. included weather uncertainty for demand forecasting in Brazil [22].Apadula et al. [1] mentioned the role of various weather-related components quantitatively for electricity demand in Italy.Lusis et al. mentioned that the atmospheric effect on demand is significant [5].
Weather variables, such as temperature and humidity, were employed for modeling electricity load consumption in [20], and it was concluded that including these variables yields better results with consistent performance.Factors affecting electricity demand in Athens, Greece, and London were studied in [21].They compared both weather and non-weather factors and found that temperature is the most sensitive for electricity load demand, especially in Athens.Satish et al. [41] studied the effect of temperature with 20 training patterns and compared the results, but there is no comprehensive analysis with atmospheric variables.Friedrich and Afshari [42] investigated the results for Abu Dhabi city electricity load using multiple weather variable for a 24 h-48 h prediction horizon and obtained a very promising result of 1.5% MAPE.Apadula et al. [1] analyzed weather and calendar variables affecting monthly electricity demand using an MLR model for Italy and concluded that the calendar component considerably contributed from January-May with 4-7.7%.The contribution of the cooling degree months ranges from 3.4-7.6%during the summer months, while the heating degree months of winter months was lower.The influence of the cloud cover variable is always below 1% throughout the year.They improved monthly demand forecast by 1.3% by including atmospheric variables.However, individual atmospheric variables were not analyzed comprehensively.
Exploring the link between electricity use and temperature is important to assess the impact of climate change on energy demand.Ongoing climate changes have promoted an increased scientific interest in the study of the relationships between energy consumption and weather variables.Similar to our result, [1] for Italy [43] and [12] for the European region concluded that including all weather-related variables can give the lowest MAPE, but the improvement in accuracy was usually smaller.In [12], the European region is classified based on their average monthly temperature into three groups: cold, intermediate and warm.Cold countries demonstrated almost a linear relationship with only heating components.For intermediate countries, the cooling component is clearly visible though dominated by the heating component.Warm countries presented a highly non-linear relationship with a cooling component of a similar scale as heating for Adelaide [31] and explained that for Japan [36].When wind speed is above some value, electricity usage is increased for heating in the winter as it increases the feeling of coldness when temperatures are low.For summer, the correlation between wind speed and electricity demand is not clearly visible.The value of humidity is reported to matter only if it coincides with the specified value of other variable.The feeling of hotness accompanying temperatures above certain value is intensified by high humidity, which can increase energy use for air conditioning.Xie et al. [17] included relative humidity in their model to improve relative accuracy by 21% in North Carolina.In [10], the weather variables that cause a significant impact on electricity demand for Hong Kong were discussed.The effect of wind cooling exterior walls of buildings was intensified if they were wet.For regions with a certain climate, usage of humidity was reasonable [10].This weather compensation neural network model was capable of providing more accurate forecasts with a 0.9% reduction in forecasting error for Hong Kong.
Therefore, it is worth examining the influence of each atmospheric variable for short-term electricity demand forecasting.This is because there is always a research gap for the statistical analysis of accuracy improvement and the significance of other individual atmospheric variables in the literature of demand forecasting.However, in real-time forecasting systems, we must implement forecasted weather data instead of measured data, as they always contain some error.Fay and Ringwood [44] studied the effect of weather forecast error while implementing a short-term load forecasting model.Their tested results showed that weather forecast error causes an approximately 1% deterioration in accuracy.

Modeling Trend
The existence of a trend largely depends on the length of the time period examined, industrialization and rapid economic growth.Electricity load data have occasionally been found to be non-stationary.The use of the logarithm or the performance difference of the historical data is applied to account for the non-stationarity.However, fitting a deterministic trend in the model is more appropriate rather than the difference.The seminal paper of Ramanathan et al. [19] fitted both a linear trend and the reciprocal of that to their model.In this paper, the logarithm is used for historical demand data to avoid possible heteroscedasticity and trend adjustment.The positive integer is also included in the model, but the estimated weight is almost zero, indicating that taking the logarithm of historical data had already captured the trend.

Modeling Cyclicality and Seasonality
A cyclic pattern exists when data exhibit rises and falls, but not in a fixed period.However, a seasonal pattern exists when a series is influenced by seasonal factors, such as the quarter of the year, the month or the day of the week.Therefore, seasonality is always of a fixed and almost known period.Unlike seasonal behavior, cycles no longer occur, but there is more variation of the magnitude.The class of ARMA models can handle both seasonality and cyclic behavior.An ARMA(p, q) model can be cyclic if p > 1.Therefore, ARMA models have been extensively applied in the load forecasting literature [45,46].The most popular time series techniques that have been adopted are ARMA or ARMA with exogenous variables (ARMAX) models.However, a long period of cyclicity is not handled very well.According to Hahn et al. [47], time series load data contain three seasonal patterns: intraday (daily), weekly and annual, which is quite similar to electricity price data.Silvano et al. [48] mentioned that electricity demand and price exhibit a strong daily seasonality due to day and night hours and weekly seasonality due to working and non-working days.
A lag structure-based model for MLR is constructed to capture seasonality because lag structures alone cannot capture the complete seasonal features during weekends [49].The reason is explained by Clements et al. [2]: the electricity demand for Saturday and Sunday was significantly over-predicted (negative bias in the errors).This stems from the fact that a higher load on a weekday is being used as a one-day lagged load in generating the forecast for weekends.Similarly, when the Sunday load is used in generating the forecast for Monday, significant under-prediction occurs (positive bias in the errors) [2].Essentially, this bias is due to the fact that the coefficients on one-day lagged load do not differentiate between days of the week.A simple way to deal with this issue is to combine the one-day lagged load with day-of-the-week dummy variables.However, our models avoid this step because of the excessive number of variables.

Mathematical Model
Since sampling of the data is performed every hour, there are N = 24 samples in a day.We separate these samples' hour-wise pattern to forecast the electricity demand for the first hour of the day by one equation and the second for the second hour of the day from the next equation, and so on.Hence, we need 24 individual equations for a one-day forecast, and now, the generic prototype model is, In Equation (1), predictor variables are grouped as deterministic, atmospheric, historical electrical demand and interaction terms, denoted by Det, AtmV, DHist and ITerms, respectively, for hour of day h and forecasting days d.The residual v h,d modeling is very sensitive and important in MLR because of the incorporation of an auto-regressive structure in the error term.According to the econometric regression theory, if the residuals are serially correlated, the estimation of coefficients may lead to instability and cause misleading results.Therefore, Equation (2) assumes that the current value of the error term at hour h of day d is denoted by v h,d , i.e., error terms are assumed to follow an autoregression in previous days and are modeled as the sum of the finite order of previous errors.This paper has treated each hour as a separate time series; therefore, the correlation error between hour can reasonably be taken as small [19].Mathematically, for h = 1, ..., 24.The term v h,d−i is the serially-correlated error term up to q lags.The simple Ljung-Box Q-test can be implemented for selecting the appropriate order of q and h,d = N(0, σ 2 ).
The term Det h,d refers to predictable variables that can capture the seasonality of electricity demand, where demand variation is based on a day of weeks, a month of years and years.The historical demand profile shows higher demand during the business week (Monday-Friday) than that of weekends (Saturday and Sunday) and public holidays.Such an effect can be addressed by dummy variables.The number of dummy variable selection criteria for weekdays dv = 7 and for months m = 12 is dv − 1 and m − 1, respectively.If all dummies (dv = 7 and m = 12) are used in the regression together with the intercept, then this set of dummies is linearly dependent on the intercept called the perfect multicollinearity or dummy variable trap and affects the stability of the regression coefficients [50].Therefore, six dummy variables WD i h,d where i = 1, ..., 6 represent the corresponding days of the week (i = 1 for Sunday, i = 2 for Monday,..., i = 6 for Friday), except Saturday.Because, Saturday is considered as a reference day.Each dummy variable has only two allowable values, either zero or one.For example, if the day d is Monday, then the variable WD 2 h,d takes the value one, while all other variables WD i h,d are zero.Similarly, if the day d is Sunday, then all the variables WD i h,d take the value zero, except WD 1 h,d .Now, the days' dummy variables are explained as, where Day(1), ..., Day(6) represent Sunday, ..., Friday, respectively.Additionally, the linear function of the dummy variables can be written as WD h,d = ∑ dv−1 i=1 β i WD i hd .Some other categorical variables such as working day after a holiday, working day before a holiday and long holidays are also included in the Det h,d modeling as dummy variables.Therefore, the model Det h,d is formulated with 26 variables, including constant β 0 .For simplicity, we use variable vector β for all the coefficients.
For the months' dummy variables, eleven dummy variables (M j hd ) where j = 1, ..., 11 in the model account for the monthly seasonality of electricity demand, not only related to the weather conditions.Monthly seasonality of electricity demand may be influenced by a number of economic and social factors such as summer vacations and the seasonal character of some industrial activities.The index j values represent correspondingly all months in a year (j = 1 for January, j = 2 for February and j = 11 for November), except the base month of December.The variable M j hd takes the value of one if the j observation belongs to the month j, otherwise M j hd takes zero.Therefore, months' dummy variables can be formulated as, where Month(1), ..., Month(11) represent January, ..., November, respectively.Therefore, the linear function of the dummy variables can be written as Hence, the term Det h,d can be formulated as, Here, all the coefficients are carried by vector β, where β 0 represents the intercept, β 1 is the trend and β i , i = 2, .., 7 is for the day of the week dummy variable.Similarly, i = 8, ..., 18 for the month of the year dummy variable, and i = 19, ..., 22 for working days, a working day before a holiday, a working day after a holiday and long holidays, respectively.
The term AtmV h,d is another group of variables that affects the demand of electricity.Some pre-processing of temperature helps to observe the non-linear influence on electricity demand (Figures 2 and 4a,b, which implies 17.1 • C as the reference point where there is no effect of temperature (dead zone) for demand.The base temperatures for HDD and CDD were estimated as 15.65 • C and 21.53 • C, respectively.This means that the temperature between 15.65 • C and 21.53 • C has no effect on electricity demand.However, this may not be true in the realistic observation of months and seasons.Human behavior is different according to the season.For example, humans at 16 • C during the winter season still feel cold and want to use a heater; however, during the summer season, 16 • C is a very comfortable temperature for the human body, and people never use a heater at this temperature.The same situation holds true for 22 • C in the summer season.Therefore, the interaction of CDD with months and the interaction of HDD with months seems necessary to address this realistic problem.A simple rule setup for CDD and HDD is as follows, The heating and cooling effect of the previous day is considered in the model as HDDDay d−1 , CDDDay d−1 .The representations of CDD and HDD for daily, maximum and minimum temperature are written as CDDDay, HDDDay, CDDmax, HDDmax, CDDmin, HDDmin, respectively.Similarly, the daily and hourly deviation of temperature are also added to model as TDevDay and TDevHr.
Therefore, only temperature-derived terms AtmVT h,d consist of the following 16 variables, Other atmospheric variables are also accounted for by the mathematical formulations.The weight of β i , i = 1, ..., 11 represents the coefficient values for these weather variables.Therefore, Equation ( 7) is added to the new model, and there will be 27 (temperature related: 16: and other atmospheric related: 11) variables combined in the atmospheric model, It is obvious that the next day demand level is correlated with its previous day demand from the observation of the historical demand pattern.However, the Ljung-Box Q-test is the simplest method for the correlation test, and in our test, the null hypothesis is rejected.Therefore, we have to accept that there is a significant correlation with the previous day's demand.Since the auto-regressive (AR) component captures the pattern of load in hour h = i for any given day, it gives a good indication that the load will be higher in hour h = i on the following day(s).Hourly plots of auto-correlation and partial auto-correlation indicate that demand is strongly correlated with day ahead, same time load and previous hour load.Hence, the amount of electricity demand that depends on its previous demand is denoted by DHist h,d , and the logarithm is taken (Section 4.1) for historical demand DHist h,d =ln(D h,d ); then, AR structure is modeled as, In Equation ( 9), the same hour of yesterday's demand, the day before yesterday's demand, last week's demand, yesterday's maximum demand, yesterday's minimum demand and the seven-day moving average of midnight demand are denoted as, respectively.The forecasting demand during Saturday and Sunday (weekends) is significantly higher with a negative bias in the errors.This is because the higher load during Thursday and Friday (β 1 Dem h,d−1 , β 2 Dem h,d−2 ) is used to forecast weekends' demand.Similarly, when Saturday and Sunday demands are used to predict Monday demand, significant under-prediction with positive bias in the errors occurs.To overcome this issue, the interaction between day types and electricity demand (Equation ( 10)) of that day is necessary.
Similarly, temperature also shows seasonal characteristics with months; therefore, it is meaningful to combine temperature-derived variables (CDD and HDD) with month dummy variables.It consists of 22 variables.
Therefore, the interaction term ITerms h,d is the sum of Equations ( 10) and (11).Based on this mathematical modeling, three models are constructed.

•
Model A: This model consists of the variables from deterministic terms, historical demand and historical demand-related interaction (e.g., DH I h,d ) term.Therefore, Model A is the sum of Equations ( 5), ( 9) and ( 10) and consists of a total of 47 variables including six correlated error terms (v h,d ).The electricity demand from Model A is denoted by DA h,d and can be generalized as, • Model B: Model B consists of Model A (Equation ( 12)), temperature variables (Equation ( 7)) and interaction terms with temperature (Equation ( 11)).Therefore, Model B consists of 83 variables.
The electricity demand from Model B is denoted by DB h,d and can be generalized as, • Model C: Model C consists of all the variables from Model B and atmospheric variables (Equation ( 8)).Therefore, Model C consists of 94 variables.The electricity demand from Model C is represented by DemC h,d and can be generalized as, As our interest is to analyze the effect of other atmospheric factors on the demand forecasting, e.g., rainfall, snowfall, wind speed, solar radiation, relative humidity and cloud cover, the co-variates for Model A, Model B, and Model C can be arranged in column vector form and estimated using the ordinary least square (OLS) technique first.These estimated point values from OLS are used as prior values for Bayesian estimation in the next step.Markov chain Monte Carlo (MCMC) is constructed to implement Gibbs sampling and generate the posterior distribution of weights for a better forecast.More detailed discussion of this process is given in Section 5.

Estimation and Forecasting
The MLR model for Model A, Model B, and Model C (Equations ( 12)-( 14)) can be setup as Y h = X h β h + v h , for h = 1, ..., m is the regression model for hour h.Here, Y h = (D h,1 , D h,2 , ..., D h,T ) , v h = (v h,1 , v h,2 , ..., v h,T ) , and β h = (β h,1 , β h,2 , ..., β h,P h ) .These P h variables in the regression are for hour h, and X h is the corresponding T × P h size matrix.This design matrix contains the variables as Equations ( 12)-( 14).For instance, if h = 1, ..., 24, there are 24 regressions, so that y = X β + v where y = (y 1 , ..., y 24 ), β = β 1 , ..., β 24 ), and X = bdiag(X 1 , ..., X 24 ) is a (24d × ∑ m h=1 P i ).The serially-correlated error v h (Equation ( 2)) term can be transformed to a serially-uncorrelated error , and i represents the order of lag for which we account.Similarly, , and X * h = X h − ρ i X h−i removes the serial correlation.Because of the use of transformed equations with serially-correlated error, simple OLS estimation cannot be used.Therefore, the Bayesian estimation procedure is implemented in two steps.Firstly, the MLR parameters are estimated using the OLS technique assuming serially-correlated error coefficients to be zero.The purpose of OLS estimation here is to obtain the initial value (prior) of parameters near the true value.Therefore, rolling windows of only a 200-day training dataset are used.This size of the training dataset for OLS estimation has been chosen such that the over-fitting problem is avoided.Secondly, the posterior distribution is updated based on prior information and the likelihood function.For this purpose, moving windows of a 730-day training dataset are used.Such a length of the training dataset can capture daily, weekly, monthly, and yearly characteristics with the trend.Lusis et al. [5] suggested that one year of historical data is sufficient to develop a demand forecast model.However, Clements et al. [2] used a three-year moving window of aggregate data of the Queensland region, Australia, for a model estimation and obtained impressive results (MAPE 1.36%).Since the length of the training dataset shows localized characteristics, therefore, three different sizes of training datasets, 1 year, 2 years, and 3 years, were tested on the basis of forecasting accuracy performed before the selection of the training size.As the number of variables increases in our models, the estimation of the posterior distribution becomes more difficult.Therefore, the MCMC method becomes quite useful.This paper uses the Gibbs sampling algorithm to draw inferences about the model variables.
The priors are informative and close to the true weights because the weights from OLS estimation are used for the priors.These priors' weights are further refined with the likelihood function the using Gibbs sampling techniques.The sampling scheme is run for 1000 iterations, after which we assume that it has converged to the joint posterior distribution.Nevertheless, some of the weights started to converge in very early draws.Then, we generated a further 1000 iterations to use for MCMC inference.Moreover, we have tested with 5000 and 10,000 iterations to see the possibility of divergence, but no symptoms of divergence were found.Since the median value minimizes the sum of the absolute deviation, it can have better prediction than the mean mean value.Therefore, median values are used for the prediction, and MAPE is used for error analysis.

Results and Discussion
We have used three years of data up to 2015 from 2013 with no missing data.For training data, two years (730 days) of moving windows are used and predicted for the year 2015.The multiple equations modeled here are estimated for each hour with separate equations having their own co-variants.Therefore, for every hour of the day, they have a different weight of the parameter value.

Atmospheric Variables
Since our motive is to find the atmospheric impact on electricity demand, their impacts vary according to the hour of the day.The estimated coefficients (only atmospheric variables are of interest to us) discussed for all the hours from Model C are plotted in Figure 5. Out of 26,280 samples, seven snowfall data and some cloud cover and humidity data were missing.Simple interpolation is used to fill up these missing data.At Hour 14, these variables show higher weights compared to other hours.The statistical and hypothesis testing of these weights is discussed in Section 6.4.  Figure 5 shows the overall impact of rainfall, solar radiation, wind speed, relative humidity, and cloud cover for all the hours.Rainfall during day hours (Hour 10-Hour 16) causes an increase of the cold environment, therefore showing the highest impact for the increment in electricity demand up to 0.5% (i.e., 18 MW for weekends, 18.4 MW for holidays and 20 MW for weekdays), followed by wind speed.However, sunlight (solar radiation) causes the environment to warm, and people use less heating appliances, which causes a decrease in demand during day hours.This is the reason why Hour 11 and Hour 15's electricity demand is reduced by more than 0.6% (approximately 23 MW in aggregate) by solar radiation.From Hours 8 to 18, the electricity demand is significantly affected by wind speed, as the demand is increased to 0.38%.The correlation figure (Figure 3) indicates that cloud cover has the highest positive correlation with electricity demand, among these weather parameters.This is because cloud cover causes people to turn on lights, but the estimated parameter value for cloud cover shows up to a 0.16% increment only.Snowfall causes a slight increment during evening to morning hour, but humidity does not show any significant changes on electricity demand.In general, atmospheric sensitivities appear from Hour 5-Hour 22, mainly Hour 9-Hour 17.This result is comparable to a previous study's result covering a business district of Tokyo Prefecture [36].

Temperature Variables
CDD and HDD terms are used to convert the non-linear relation of temperature to the linear relation by estimating the base temperature.However, still, square terms of CDD and HDD exist in the model.Therefore, the interpretation procedure of these terms and their impact on electricity demand are slightly different.Table 3 illustrates the increase in electricity demand due to CDD for Hour 1, Hour 14 and Hour 19 based on the reference temperature.The second column explains the CDD values for the corresponding temperature in the first column.These CDD values are re-estimated from the partial derivative of Dem d with respect to CDD, then solving the new CDD value (CDD_val), setting the equation equal to zero.Since the CDD value cannot be negative, therefore the first five rows are neglected.During night time, for instance Hour 1, when the temperature increased from 25-26 • C, the demand is predicted to rise from 1.19-1.79%.Moreover, for the same temperature during Hour 19, electricity demand is predicted to increase from 6.69-7.72%.When the temperature reaches 32 • C, electricity demand is expected to be increased by 14.15% during peak hours of the evening because of more human activities in the evening time.However, for other hours, there is less than half the impact of temperature compared to Hour 19.

Performance Analysis
In Figure 6, electricity demand prediction by Model C is plotted for the first week of January 2015 and compared with the actual demand.The mean, median, and 60 percentile forecast values from the posterior distribution are implemented for prediction, which is the beauty of Bayesian estimation.Due to the New Year's effects, forecasting performance is worse than other weeks.This week consists of a scheduled public holiday, unscheduled holidays, weekdays, and weekends.Therefore, it is worth discussing their effects on forecasting accuracy.The best MAPE performance for this week is 2.73% by Model C followed by Model B, while the worst performance is 4.25% (Model A).When the electricity demand for 1 January is predicted, the AR influence is clearly observed because the effect of previous days (high demand days) causes over-forecasting.Similarly, the electricity demand for 2 January is under-forecasted because of the lower demand of its previous day (1 January).As a normal Saturday demand, electricity demand is forecasted for 3 January, but actual demand is still lower than forecasted due to the New Year's impact.Therefore, this week has a significant rise and fall of electricity demand.
Similarly, Figure 7a,b explains the hourly demand prediction by Model B, and Model A, respectively, for the same time period as Model C. The existence of higher forecast error caused by Model A, especially on 2 January, 4 January and 7 January, is minimized by model B. However, the predictive nature of all the models (Model A, Model B, and Model C) follows similar patterns, but comparatively, Model C is better.In Figure 8, electricity demand prediction by Model C is plotted for the first week of July (summer season) 2015 and compared with actual demand.Electricity demand on 4 July (Saturday) is significantly over-forecasted.However, fluctuations in peaks are forecasted well.This week consists of one scheduled public holiday on 7 July (Tuesday), where both Model A, and Model B failed to predict the peak demand (Figure 9a,b).However, Model C predicted it very close to the actual demand.The best MAPE performance for this week is 1.69% by Model C, followed by Model A, while Model B is the worst with 1.96%.
Figures A1 and A2 illustrate the prediction error distribution of each model for individual hours.As was our expectation, error variation during day hours is higher than other hours.Model A predicts the highest variation in error distribution of 188 at Hour 14.This variation in the same hour is reduced to 122 by Model C. The skewness and kurtosis plot in Figure 10   Similarly, kurtosis criteria are a measure of how prone a distribution with a value of three is to outliers, being equivalent to a normal distribution.A high kurtosis value can be interpreted as a failure to predict high electricity demand occurrence.Model C being closer to three compared to the other two models indicates that Model C is better.This can be seen more clearly in Figure 11a,b.Over-forecasting and under-forecasting characteristics of all three models are compared for all hours of a day.At Hour 14, Model A over-forecasts up to more than 155 MW followed by Model B, and Model C, respectively.Similarly, under-forecasting also occurs at the same Hour 14.However, the highest value for under-forecasting is less than 150 MW.Moreover, after Hour 20 until Hour 8, Model C can predict withing a ±50 MW forecast error, which is 1.2-1.4% MAPE in aggregate demand for these hours.Higher forecasting error has occurred during other hours (Hour 9-Hour 19), which is also explained in Figure 11b.The forecast distribution of absolute errors for each hour (Hour 9-Hour 19) is wider when most peaks occur.Hourly mean forecast error for 2015 This box plot in Figure 12a shows the hourly deviation of predicted demand from real electricity demand.Throughout the hours, the median value is zero, but during day hours, especially Hour 9-Hour 17, higher variations in the error occurred.As was our expectation, the over-and under-forecasting probabilities are approximately equally distributed.However, at Hour 10, there is a negligible probability for under-forecasting.Another important intuition from this box plot in Figure 12a is that forecasting error is randomly distributed with zero mean with some variance.We have used the median value for the prediction of electricity demand because the median value minimizes the sum of absolute deviations.During simulations, the accuracy of electricity forecasting was measured by MAPE and the root mean square error (RMSE).For better accuracy, MAPE and the corresponding RMSE value should be low.However, only MAPE is presented in this paper for simplicity.Figure 12b indicates the box plot for accuracy measured in MAPE for each hour.The box plot can clearly explain the distribution of MAPE as a different percentile form.The MAPE value in our range (25th-75th percentile) is considered as an outlier in the box-plot and denoted by (+) in the plot.There are significant MAPE outliers throughout the hours that directly cause a higher mean MAPE value.The maximum MAPE is observed at morning Hour 9, afternoon Hour 14 and night Hour 23.This is because of the fluctuating electricity demand based on human activities during these periods.
Comparing the seasonal forecasting performance for all three models in Figure 13, Model A shows the worst MAPE value greater than 2% for all four seasons.However, for the same four seasons, Model C can forecast with less than 2% MAPE.Interestingly, the summer months from all the models show better forecasting performance.This is because the lower the demand, the lower the error probability and vice versa.This could be one reason why winter has a higher MAPE value than summer for all three models.The hourly performances of Model A, Model B, and Model C are tabulated in the Table A2.The first row of the table contains the description of each column such as Column 1-7: hours, model types, and days of the week.Column 8-15 contain the MAPE values of working days including holidays (WDH), workings day not including holidays (WDNH), weekends including holidays (WEH), weekends not including holidays (WENH), holidays on working days (HWD), holidays on weekends (HWE), only holidays, and non-holidays, respectively.For each column, we are selecting the largest 20 (randomly chosen number) MAPE values (bold and italic) and smallest 20 MAPE values (bold) to compare the hourly performance of individual models.
In general, the performances of the three models for all days from the evening at Hour 19 until the morning at Hour 7 in the Table A2 show that 99% of the smallest MAPE values are within these periods.Similarly, more than 99% of the largest MAPE values occur during morning Hour 8 to evening Hour 18.This is because of human activities during day hours.Model A is 48.3%;Model B is 38.7%, responsible for the higher MAPE; whereas Model C is much less (13%) responsible for the higher MAPE.However, the important question is: Which model plays a significant role in the minimization of MAPE?Model C has the highest percentage (53.7%)having minimum MAPE followed by Model B (40.7%), and the lowest is Model A with 5.6%.
From morning Hour 9 to afternoon Hour 17, Model C can forecast with the best MAPE value (100%) compared to the other models.However, for 1 a.m. to 4 a.m., Model B can forecast with 60% of the best MAPE values.Getting smaller MAPE values during working hours of the day is very challenging for researchers.This is because of unpredictable human activities and weather variations.Therefore, during the peaks and fluctuating hours, Model C can be used for better forecasting accuracy.In Hokkaido, peak demand is during Hour 17, and for this time period as well, Model C performs very well.There is very less possibility of getting higher MAPE values from Model C compared to Model B and Model A.
Figure 13b compares the performance of Model C with Model B and Model A, based on the accuracy of their forecasting ability.Throughout the months of 2015, the accuracy of Model C is better than the other two models.However, unlike our expectation, the accuracies of Model A and Model B are superior in June and September, respectively.Nevertheless, the difference between the accuracies is relatively low (0.005% MAPE).The one reason behind this result is the dramatic change in the weather variables (Figure 15a,b).In June, Model B and Model C have very poor performance especially on 4, 5, 10 and 16 June; see Figure 14a.On 5 June, the performance of Model C is 3.6-times worse than Model A. Therefore, a strong reason behind the poor performance of Model B and Model C is the fluctuation of the weather variables, especially temperature (because both Models B and C perform poorly).Figure 15a justifies the reason.On June 2015, except the first week, there was normal variation of temperature with a mean of 16.70%.However, the temperature on 5 June 2015 dramatically decreases to the minimum level.This sudden variation in temperature forces the poor forecasting performance of both Models B and C.  Similarly, in September 2015, unlike our expectations, only Model B has very competitive performance with Model C. Nevertheless, the difference in accuracy is negligible.Since Model C includes rainfall, snowfall, solar radiation, wind, humidity, and cloud cover, we can assume that there must be some unpredictable variation in these variables.Figure 14b shows that especially for 10 and 30 September 2015, the forecasting performance of Model C is poor.This is because on 10 September 2015, wind speed unexpectedly rose to a peak.The increased value of cloud cover on the same day also supports this volatility (Figure 15b).Therefore, such unexpected variation in weather variables causes the forecasting performance to be always challenging.
The overall measure of forecasting accuracy in different MAPE ranges is reported in Table 4, in which Models A, B, and C are compared.Tabulated values in the third column express the overall MAPE for 2015.Comparing all the models, the most significant MAPEs are obtained from Model C for all cases.More specifically, the introduction of all weather variables in Model C reduces the high MAPE values (MAPE ≥ 5%) by one-fourth on working days and at least by half on other days.The overall MAPE performances of Models A, B, and C are 2.43%, 1.98%, 1.72% respectively.Therefore, the performance is improved by 18.5%, if we include temperature with historical electricity demand data.Similarly, Model C can improve forecasting accuracy by 13.4%, by including other atmospheric variables.

Hypothesis Testing
We test the statistical significance for two aspects: statistical significance of other atmospheric variables and statistical significance for accuracy improvement by Model C. Since these atmospheric variables are added in Model C, the statistical significance of the estimated weights is tested (Figure 16).After Hour 20, there is no solar radiation data available until morning Hour 5; therefore, a void contour graph exist during those hours.The estimated weight of relative humidity (hourly) is significant for all hours, whereas daily rainfall and daily cloud cover are strongly not significant.Other variables such as wind speed (hourly and daily), solar radiation (hourly and daily) and hourly rainfall are fairly significant for some hours.
To analyze the improvement in accuracy by Model C, a one-sided z-test for 99% confidence is used for all individual hours.We determine whether the accuracy of any hour is significantly improved by adding atmospheric variables (Model C) compared to the accuracy of other models (without atmospheric variables) or not.The null hypothesis states that there is no significant difference in the models' performance tested at the 0.01 level against the alternative hypothesis.Testing the accuracy with respect to Model A, the test result shows that Model C has a significant improvement of the accuracy for all the hours of a day.However, testing the accuracy of Model C with respect to Model B, we cannot obtain sufficient evidence to reject the null hypothesis for Hours 1-6.i.e., the improvement of the accuracy by Model C during Hours 1-6 is statistically not significant.However, the improvement of the accuracy for the rest of the hours is statistically significant.

Computation Time
The computational time of the forecast model is important for utilities.If online training is used, where electricity demand observations from each day are added to the training dataset for model retraining, computational time plays a crucial role.When conducting an electricity demand forecast for 24 h ahead, the computational time depends on the length of the moving window, the number of predictors used and, most importantly, the selected forecasting technique.In this study, the computational time to forecast for 24 h was approximately 0.20 min for the OLS technique (for all three models).However, for the Bayesian technique, it was significantly higher than the OLS technique.The computational times to forecast for the next 24 h of Model A, Model B, and Model C were recorded as 3.77 min, 5.682 min and 7.919 min, respectively.The simulation was conducted on an Intel Core i5 2.5-GHz processor with 8 GB of RAM.

Conclusions
In this paper, three MLR models named as Model A, Model B, and Model C are constructed based on the types of variables used.During modeling, we pay attention to the weather variables that can affect electricity demand and try to analyze their impact quantitatively.Model A consists of historical demand data, the interaction of days with historical demand data and deterministic variables, such as day types or month types.Temperature-related variables are added to Model A to construct Model B. Model C consists of other weather variables including all variables of Model B. Therefore, the variables used in Model A, and Model B are a subset of Model C.These models are compared based on their capability in forecasting performance for one year out sample prediction, also focused on the marginal impact of temperature and other atmospheric variables on electricity demand.Preliminary work on the selection of weather stations suggests that the combination of three weather station's data (Sapporo, Hakodate and Kitami) can give the best demand forecasting result (MAPE 2.89%) for Hokkaido Prefecture.Similarly, the base temperatures 21.53 • C for CDD and 15.65 • C for HDD are also calculated from polynomials and piece-wise models, which are the pioneering works for this prefecture.
To minimize the MAPE error, Model C has the highest chance, with 53.7%, Model B with 40.7% and the lowest for Model A 5.6%.Because of the low and consistent electricity demand during June and September, forecasting errors are minimum, whereas they are maximum in January and December.The overall MAPE performances of Model A, Model B, and Model C are estimated as 2.43%, 1.98% and 1.72%, respectively.This means that the accuracy of Model C is improved by 13.4% when rainfall, snowfall, solar radiation, wind speed, relative humidity, and cloud cover data are included in the model.The introduction of these variables to Model C reduces the higher MAPE values by one-fourth on working days and at least by half on other days.At Hour 14, these weather variables show the highest effects on electricity demand.However, temperature shows the highest effects during peak demand, i.e., Hour 19.The weather sensitivities appear mainly from Hour 9 to Hour 17.This result is comparable to a previous studied paper [36]

7.
Calculate the forecasting error and find the combinations of weather stations that give the best (smallest) MAPE.
In summary, we can choose a best weather station in two steps.Firstly, we observed the MAPE of each weather station (Table A1) and arranged the weather stations in ascending order to combine their data.Weather station data from Sapporo city with approximately 53% of the population exhibits the best forecast with the lowest MAPE (2.959%) as we expected; nevertheless, the Hakodate (Index 2) and Moruran (Index 7) regions' weather stations show competitive MAPE values, ranking second and third, respectively (Table A1).
In the second step, the top ranked weather stations' data based on the lowest MAPE weather stations are combined using two different techniques.The first technique is the simple average as Equation (A3).The second technique is the weighted average by population as Equation (A4); where the symbols have their usual meaning.Then, follow Step 7 from Appendix A.2.
where T i , i = 1, 2, ..., N sample observations of temperature T and P i , i = 1, 2, ..., N observations of population P, assuming that the temperature per person is constant throughout Hokkaido territory.The overall MAPE performance, variation with R 2 and Adjusted − R 2 values of the combined weather stations are tabulated in the 8th-13th column of Table A1.Figures in parenthesis (eighth and 11th column) correspond to the variation in MAPE values, which indicates the consistency in accuracy.The overall performance of the weighted mean temperature by population (11th, 12th and 13th columns) gives better results with lower variation and higher R 2 values.To choose the best combination of weather stations, weather stations indexed as 1, 2 and 7 have the lowest MAPE and its variation with highest R 2 values.In the case of being weighted by population, the combination of Weather Stations 1, 2, 7, 8, 6, 5 and 4 also provides a competitive performance for the combination in the simple mean technique given by the 1, 2 and 7 combination.However, based on variance, R 2 and Adjusted − R 2 values, Weather Stations 1, 2 and 7 (i.e., Sapporo (population 53%), Hakodate (population 12%) and Kitami (population 10%) are selected as the best weather stations.In this paper, the combination of these weather stations with the simple mean is used because of its lower MAPE and lower variance values.

Figure 1 .
Figure 1.(a) Trend of the electricity demand profile: indicating the constant decreasing trend of demand.This is the averaged data sample of 2013-2015; (b) variation of electricity demand for day types in Hokkaido Prefecture from 2013-2015.

Figure 2 .
Figure 2. (a) Contour plot for electricity demand variation from 2013-2015; (b) daily mean plot of the seasonal variation of electricity demand and temperature in 2013.

Figure 3 .
Figure 3. Correlation plot for all the weather variables and electricity demand.

Figure 4 .
Figure 4. (a) Non-linear relationship of electricity demand with temperature; (b) piecewise linear approximation for the base temperature of CDD and HDD.

Figure 5 .
Figure 5. Impact of other weather parameters on the hourly electricity demand at different hours.

Figure 6 .Figure 7 .
Figure 6.Mean, median, and 60 percentile of the forecasts using Model C versus the actual load, for the first week of January 2015.

Figure 8 .Figure 9 .
Figure 8. Mean, median, and 60 percentile of the forecasts using Model C versus the actual load, for the first week of July 2015.

Figure 10 .
Figure 10.Statistics summary of prediction error distributions.

Figure 11 .
Figure 11.(a) Variation of forecasting error at different hours for 2015 (Model C); (b) percentile distribution of absolute forecasting error for 2015 (Model C).

Figure 12 .
Figure 12.(a) Variation of forecasting error at different hours for 2015 (Model C); (b) MAPE variation at different hours for 2015 (Model C).

Figure 13 .
Figure 13.(a) Seasonal comparison of MAPE among three models; (b) performance improvement of Model C with respect to Model A and Model B.

Figure 14 .
Figure 14.(a) The performance of Models A, B, and C for June 2015; (b) the performance of Models A, B, and C for September 2015.

Figure 15 .
Figure 15.(a) Fluctuation of temperature in June 2015; (b) fluctuation of weather variables in September 2015.

Figure 16 .
Figure 16.P-values for each weather variables for all the hours.

Table 1 .
Correlation coefficients of weather variables with electricity demand.

Table 2 .
Base temperature summary from some published studies.

Table 4 .
Measures of the forecasting accuracy for Models A, B, and C.
covering Tokyo; although Tokyo and Hokkaido have very different climates and populations.Rainfall causes increased electricity demand by 18 MW on weekends, 18.4 MW on holidays and 20 MW on weekdays.Wind speed follows after rainfall up to a demand of 16 MW at Hour 11, and solar radiation causes a decrease of demand up to 23 MW.Snowfall and humidity do not show significant changes in demand.The results of the statistical test indicate that these atmospheric variables and accuracy improvement by these variables are statistically significant in most of the hours.Therefore, such improvement on forecasting accuracy clearly indicates that including other atmospheric variables in model (rainfall, snowfall, solar radiation, wind speed, relative humidity, and cloud cover) is very necessary.

Table A1 .
Top eight weather stations based on populations and their forecasting performance on electricity demand for Hokkaido territory.Note: figures in parenthesis (8th and 11th column) correspond to the variation in MAPE values.