Wind Speed for Load Forecasting Models

Temperature and its variants, such as polynomials and lags, have been the most frequently used weather variables in load forecasting models. Well-known secondary driving factors of electricity demand include wind speed and cloud cover. Due to the increasing penetration of distributed energy resources, the net load is increasingly affected by these non-temperature weather factors. This paper fills a gap in the load forecasting literature by presenting a formal study on the role of wind variables in load forecasting models. We propose a systematic approach to include wind variables in a regression analysis framework. In addition to the Wind Chill Index (WCI), which is a predefined function of wind speed and temperature, we also investigate other combinations of wind speed and temperature variables. The case study is conducted for the eight load zones and the total load of ISO New England. The proposed models with the recommended wind speed variables outperform Tao’s Vanilla Benchmark model and three recency effect models on four forecast horizons, namely, day-ahead, week-ahead, month-ahead, and year-ahead. They also outperform two WCI-based models in most cases.

Among the various weather variables, temperature is the most entrenched one in the load forecasting literature. In the summer, load increases as temperature increases due to cooling needs. In the winter, load increases as temperature decreases due to heating needs. Commonly used temperature variables are the various forms of dry bulb temperature, such as the piecewise form [13], polynomials [9-12], and high-order regression splines [14]. Today's computing power also allows the inclusion of many lagged and moving average temperature variables, as recently proposed in [15].
Several other adjusted temperature variables, such as wet bulb temperature, dew point temperature, and the Temperature-Humidity Index (THI), are also of great interest to the load forecasting community. These variables adjust the temperature with humidity information. A formal study on humidity variables for load forecasting models was recently reported in [16], showing that splitting relative humidity and temperature results in more accurate load forecasts than using the predefined THI formula. That study sets an example for systematically investigating secondary weather variables for load forecasting.
Following an analytical framework similar to that of [16], this paper fills a gap in the load forecasting literature by presenting a formal study on the role of wind variables in load forecasting models. The motivation comes mainly from the fact that the net load is increasingly affected by the growing penetration of wind power resources. Understanding the effect of wind variables on electricity demand is therefore becoming a necessity for accurate load forecasting. Although wind speed information has been used in load forecasting models [17,18], it is usually embedded in the Wind Chill Index (WCI) or in wind speed-adjusted temperature [4,17-20]. Overall, the load forecasting literature on wind speed variables is far less extensive and thorough than that on temperature and relative humidity. The research question we try to answer in this paper is "will splitting wind speed and temperature result in more accurate load forecasts than using the predefined and well-established WCI formula?" If the answer is yes, the follow-up question is "what wind speed variables should we include in a load forecasting model?"

The rest of this paper is organized as follows: Section 2 introduces the background of this study, including the base models, WCI, and the forecast evaluation techniques; Section 3 presents the exploratory analysis of the case study data; Section 4 discusses the proposed models and the models for comparison; Section 5 presents the results and discussions. The paper is concluded in Section 6 with a brief discussion of future research.

Multiple Linear Regression Models for Load Forecasting
Multiple linear regression (MLR) is a widely deployed load forecasting technique because of its transparency, interpretability, and simplicity. For the same reasons, we use MLR models for the analysis in this paper. Tao's Vanilla Benchmark model is a frequently cited regression model for load forecasting. It was used as the benchmark model in GEFCom2012 [7] and has since been reproduced by other scholars [21,22]. The model is specified as in Equation (1):

$$ y_t = \beta_0 + \beta_1 \mathrm{Trend}_t + \beta_2 M_t + \beta_3 W_t + \beta_4 H_t + \beta_5 W_t H_t + f(T_t) \tag{1} $$

where $y_t$ is the expected load; $\mathrm{Trend}_t$ is a chronological trend at time $t$; $M_t$, $W_t$, and $H_t$ are class variables for the month of the year, the day of the week, and the hour of the day at time $t$; and $T_t$ is the coincident temperature. Let $f(T_t)$ be defined as in Equation (2):

$$ f(T_t) = \beta_6 T_t + \beta_7 T_t^2 + \beta_8 T_t^3 + \beta_9 T_t M_t + \beta_{10} T_t^2 M_t + \beta_{11} T_t^3 M_t + \beta_{12} T_t H_t + \beta_{13} T_t^2 H_t + \beta_{14} T_t^3 H_t \tag{2} $$

Furthermore, to capture the impact of lagged temperature on load, we can add various forms of lagged temperature variables to the benchmark model as in Equation (3):

$$ y_t = \beta_0 + \beta_1 \mathrm{Trend}_t + \beta_2 M_t + \beta_3 W_t + \beta_4 H_t + \beta_5 W_t H_t + f(T_t) + \sum_{d=1}^{N_d} f(\tilde{T}_{t,d}) + \sum_{h=1}^{N_h} f(T_{t-h}) \tag{3} $$

where $T_{t-h}$ is the temperature $h$ hours before time $t$; $\tilde{T}_{t,d} = \frac{1}{24} \sum_{h=24d-23}^{24d} T_{t-h}$ is the 24-h moving average temperature of the $d$-th previous day; and each $f(\cdot)$ term carries its own set of coefficients. These models are also known as recency effect models [15]. In this paper, we consider the benchmark model and three variations of the recency effect model to show the effectiveness of including wind variables in load forecasting models with different levels of sophistication. Specifically, the number of lagged hourly temperatures $N_h$ ranges from 0 to 2, and the number of daily moving averages $N_d$ is at most 1.
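To make the recency-effect inputs concrete, the following Python sketch computes a lagged temperature $T_{t-h}$ and the previous-day moving average $\tilde{T}_{t,d}$ from an hourly temperature list. It is a minimal illustration under stated assumptions, not the implementation used in this study: it assumes a gap-free hourly series indexed from 0 and $d \ge 1$, and the function names are hypothetical.

```python
from typing import List, Optional


def lagged_temperature(temps: List[float], t: int, h: int) -> Optional[float]:
    """Return T_{t-h}; None if hour t-h falls outside the series."""
    i = t - h
    return temps[i] if 0 <= i < len(temps) else None


def daily_moving_average(temps: List[float], t: int, d: int) -> Optional[float]:
    """Return the mean of T_{t-h} for h = 24d-23, ..., 24d (requires d >= 1)."""
    if d < 1:
        raise ValueError("d must be at least 1 (a previous day)")
    window = [lagged_temperature(temps, t, h) for h in range(24 * d - 23, 24 * d + 1)]
    if any(v is None for v in window):
        return None  # not enough history before hour t
    return sum(window) / 24.0


# Hypothetical hourly series: 48 hours of steadily rising temperature (°F).
temps = [float(x) for x in range(48)]
print(lagged_temperature(temps, 47, 2))    # 45.0, i.e. T_{45}
print(daily_moving_average(temps, 47, 1))  # 34.5, the mean of T_{23}..T_{46}
```

Each resulting value would then enter its own $f(\cdot)$ term with separately estimated coefficients.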

Wind Chill Index
WCI is defined by the National Oceanic and Atmospheric Administration (NOAA) National Weather Service (NWS) to measure how cold people feel in cold and windy weather. It is a predefined function of temperature and wind speed that describes how cold the body feels on exposed skin due to the flow of air. Its parameters were estimated from tests conducted with human subjects [23]. WCI is only defined when the temperature is lower than 50 °F and the wind speed is greater than 3 mph. As shown in Equation (4), for cold and windy weather we follow the WCI defined by NWS; otherwise, we set the WCI equal to the temperature:

$$ WCI_t = \begin{cases} 35.74 + 0.6215\,T_t - 35.75\,WS_t^{0.16} + 0.4275\,T_t\,WS_t^{0.16}, & T_t < 50 \text{ and } WS_t > 3 \\ T_t, & \text{otherwise} \end{cases} \tag{4} $$

where $T_t$ is the temperature (°F) and $WS_t$ is the wind speed (mph) at time $t$.
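A minimal Python sketch of the WCI in Equation (4) follows. It uses the NWS coefficients and the thresholds stated above (temperature in °F, wind speed in mph); the function name is hypothetical, and the sketch assumes scalar hourly inputs.

```python
def wind_chill_index(temp_f: float, wind_mph: float) -> float:
    """WCI as in Equation (4): NWS wind chill when cold and windy, else temperature."""
    if temp_f < 50.0 and wind_mph > 3.0:
        v = wind_mph ** 0.16
        return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v
    return temp_f  # outside the NWS definition: fall back to the temperature


print(round(wind_chill_index(30.0, 10.0)))  # 21
print(wind_chill_index(70.0, 10.0))         # 70.0 (too warm, so WCI = T)
```

Applied hour by hour, this yields the WCI series used by the WCI-based comparison models.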

Cross-Validation
Cross-validation is a commonly used forecast evaluation technique for avoiding potential overfitting of a forecasting model [24]. In this paper, we adopt V-fold cross-validation (VFCV) for variable selection, where V is the number of validation periods. Specifically, the data are first divided into V pieces of approximately equal size. Each time, one piece is used for validation and the other V − 1 pieces are used for model training. This process is repeated V times, once for each validation period, and the model is evaluated on its average performance over the V validation periods. In Section 4, we use three-fold cross-validation to select the wind variables for the load forecasting models.
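The V-fold procedure described above can be sketched in Python as below. This is an illustrative sketch, not the study's code: `v_fold_splits` and `cv_average` are hypothetical helpers, each observation is assumed to carry an integer fold label (for example, its year), and `score_fn` stands in for fitting a model on the training indices and returning its validation MAPE.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def v_fold_splits(labels: Sequence[int]) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (fold, train_idx, val_idx); labels[i] is the fold of observation i."""
    for fold in sorted(set(labels)):
        val = [i for i, lab in enumerate(labels) if lab == fold]
        train = [i for i, lab in enumerate(labels) if lab != fold]
        yield fold, train, val


def cv_average(labels: Sequence[int],
               score_fn: Callable[[List[int], List[int]], float]) -> float:
    """Average the validation score over the V folds."""
    scores = [score_fn(train, val) for _, train, val in v_fold_splits(labels)]
    return sum(scores) / len(scores)


# Toy example: two observations per year, one fold per year.
labels = [2012, 2012, 2013, 2013, 2014, 2014]
for fold, train, val in v_fold_splits(labels):
    print(fold, train, val)
```

With the 2012-2014 hourly data, the label of each observation would be its year, giving the three folds used for variable selection.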

Out-of-Sample Test
In this paper, we use a sliding simulation with a fixed-length history to test the forecast accuracy of the models [25]. The sliding simulation keeps rolling the forecast origin forward by the length of the predefined forecast horizon and uses a history of predefined length prior to the forecast origin for parameter estimation. For example, when forecasting the load one day ahead, we estimate the model parameters using a predefined length of history (e.g., three years) prior to the forecast origin and then forecast the next day. This process is repeated, moving the forecast origin forward one day at a time, until a day-ahead forecast has been produced for every day of the test year. Similarly, when the forecast horizon is one month, the forecast origin is moved forward one month at a time until a month-ahead forecast has been produced for every month of the test year. In Section 5, we present the performance of the models for day-ahead (specifically, 24-h-ahead), week-ahead, month-ahead, and year-ahead forecasting. We test the models on different forecast horizons to show that the proposed model is not restricted to short- or long-term load forecasting.
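The rolling-origin bookkeeping of the sliding simulation can be sketched as follows. The sketch assumes horizons measured in whole days and approximates "three years" of history as 3 * 365 days; month-ahead forecasting would instead step by calendar month, which is not shown. The function name `sliding_simulation` is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def sliding_simulation(test_start: date, test_end: date, horizon_days: int,
                       history_days: int) -> Iterator[Tuple[date, date, date, date]]:
    """Yield (train_start, train_end, forecast_start, forecast_end) per origin.

    The origin rolls forward by the horizon; training always uses the
    fixed-length history immediately before the origin.
    """
    origin = test_start
    while origin <= test_end:
        train_start = origin - timedelta(days=history_days)
        train_end = origin - timedelta(days=1)
        # The last window is truncated at the end of the test year.
        forecast_end = min(origin + timedelta(days=horizon_days - 1), test_end)
        yield train_start, train_end, origin, forecast_end
        origin += timedelta(days=horizon_days)


# Hypothetical setup: test year 2015, roughly three years of history.
day_ahead = list(sliding_simulation(date(2015, 1, 1), date(2015, 12, 31), 1, 3 * 365))
week_ahead = list(sliding_simulation(date(2015, 1, 1), date(2015, 12, 31), 7, 3 * 365))
print(len(day_ahead), len(week_ahead))  # 365 53
```

With a 7-day step, the first seven days of the test year form the first week and the final window covers only December 31.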

Data Description
ISO New England (ISONE) is an independent regional transmission organization serving six states in the Northeastern United States: Connecticut (CT), most of Maine (ME), Massachusetts (MA), New Hampshire (NH), Rhode Island (RI), and Vermont (VT). Massachusetts is further divided into three load zones, namely NEMASS, SEMASS, and WCMASS, while each of the other five states forms its own load zone. ISONE publishes the zonal hourly load on its website [26]. The weather variables used in this paper, including temperature and wind speed, were provided by the weather service vendor AccuWeather, Inc. (State College, PA, USA).
We use the ISONE system total data for demonstration purposes and present the results for the eight load zones and the system total in Section 5. For each zone, we use the weather data from the closest major airport; for the ISONE system total, we use the average of the weather data from all eight zones. The data used in this study range from 2012 to 2015. The first three years (2012-2014) are used for variable selection based on three-fold cross-validation, while 2015 is used for out-of-sample tests. Table 1 presents the summary statistics of the load and weather data for these four years. No pre-processing is applied other than adjusting the load data for daylight saving time (DST).
At the beginning of DST, we take the average of the two adjacent readings as the load of the missing hour. At the end of DST, we divide the load of the repeated hour by two.
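The DST adjustment can be sketched in Python as below. The source does not describe the raw data layout, so this sketch assumes that on the spring-forward day the skipped hour is simply absent from a 23-value list, and that on the fall-back day the repeated hour is reported as a single reading covering both clock hours. Both function names are hypothetical.

```python
from typing import List


def fill_spring_forward(loads: List[float], missing_pos: int) -> List[float]:
    """Insert the skipped hour at missing_pos as the mean of its two neighbours.

    Assumes 1 <= missing_pos < len(loads), so both neighbours exist.
    """
    filled = (loads[missing_pos - 1] + loads[missing_pos]) / 2.0
    return loads[:missing_pos] + [filled] + loads[missing_pos:]


def split_fall_back(loads: List[float], repeated_pos: int) -> List[float]:
    """Halve the reading at repeated_pos, assumed to cover two clock hours."""
    adjusted = list(loads)  # copy so the input list is left unchanged
    adjusted[repeated_pos] = adjusted[repeated_pos] / 2.0
    return adjusted


print(fill_spring_forward([10.0, 12.0, 16.0, 18.0], 2))  # [10.0, 12.0, 14.0, 16.0, 18.0]
print(split_fall_back([10.0, 40.0, 30.0], 1))           # [10.0, 20.0, 30.0]
```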

Exploratory Data Analysis
Figure 1 shows the time series plots of hourly load, temperature, WCI, and wind speed at the system level from 2012 to 2015. Temperature and WCI share a similar seasonal pattern: high in the summer and low in the winter. When WCI is defined (usually in winter months), it is always lower than the temperature and has a wider range. Wind speed varies from 0 to 30 mph during the winter months, while its range is much narrower during the summer months.
Figure 2 shows the scatterplot of load versus temperature, and Figure 3 shows the scatterplot of load versus WCI, both for 2014. To avoid verbose presentation, we use the 2014 data for demonstration; data from the other years present similar patterns. When the temperature is higher than 50 °F, WCI equals the temperature, so the right arms of the two scatterplots in Figures 2 and 3 are identical. Apart from the right arms, the load-WCI scatter is sparser than the load-temperature scatter.
Figure 4 shows the scatterplots of load versus wind speed by month during 2014. The correlation between the two variables from June to August is stronger than in the other months: during these three summer months, load tends to increase as wind speed increases, while in the other nine months the relationship between the two is weak. This appears to contradict the common-sense expectation that, at the same summer temperature, wind makes us feel cooler. For further investigation, we plot temperature against wind speed in Figure 5, which suggests a positive correlation between these two variables in most months of the year, with the strongest correlation in the summer months. In other words, the counterintuitive observation arises because wind speed tends to be higher when temperature is higher in the summer, and the load-temperature relationship is much stronger than the load-wind relationship, so the rising load in the load-wind scatter largely reflects the effect of temperature.
To sum up, the four time series plots in Figure 1 depict the seasonal features of load, temperature, WCI, and wind speed. The scatterplots in Figures 2 and 3 show that temperature and WCI have similar impacts on load. The monthly scatterplots of load and wind speed in Figure 4 suggest a strong correlation between load and wind speed during the summer months of June to August, although some of this correlation is due to the load-temperature relationship combined with the positive correlation between temperature and wind speed shown in Figure 5. For the rest of the paper, these three months are defined as the summer months, and the relationship between wind speed and load in these months is investigated further.

Wind Speed-Related Variables
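Because of the strong relationship between load and wind speed during the summer months (i.e., June to August), we introduce the dummy variable $S$ as in Equation (5):

$$ S = \begin{cases} 1, & \text{if the month is June, July, or August} \\ 0, & \text{otherwise} \end{cases} \tag{5} $$

We use $WSS_t = S \times WS_t$, where $WS_t$ is the wind speed at time $t$, to denote the coincident wind speed in summer; it equals the wind speed in the summer months and 0 otherwise.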
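The summer dummy and the summer wind speed can be sketched in Python as follows. The sketch assumes months numbered 1-12 and $WSS_t = S \times WS_t$; the function names are hypothetical.

```python
def summer_dummy(month: int) -> int:
    """S = 1 for June, July, or August; 0 otherwise (months numbered 1-12)."""
    return 1 if month in (6, 7, 8) else 0


def summer_wind_speed(month: int, wind_mph: float) -> float:
    """WSS_t, assumed to be S * WS_t: the wind speed kept only in summer."""
    return summer_dummy(month) * wind_mph


def wss_power(month: int, wind_mph: float) -> float:
    """WSS_t ** 0.16, the power used by the NWS wind chill formula (0.0 outside summer)."""
    return summer_wind_speed(month, wind_mph) ** 0.16
```

Raising zero to the power 0.16 gives 0.0, so the wind terms vanish outside June to August.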
For each of the four base models defined in Section 2, we introduce the wind speed variables in the following sequence: (1) $WSS_t^{0.16}$; (2) $T_t \times WSS_t^{0.16}$; and (3) $H_t \times WSS_t^{0.16}$. Note that we keep the power of 0.16 on wind speed to be consistent with the NWS definition. We then calculate the mean absolute percentage error (MAPE), as defined in Equation (6), in the three-fold cross-validation setting:

$$ \mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \tag{6} $$

where $y_t$ and $\hat{y}_t$ are the actual and forecasted load at time $t$, and $n$ is the total number of observations. The simple average MAPEs over the three validation years for all base models and for the models with additional wind speed variables are listed in Table 2. The None column corresponds to the MAPEs of the models without wind speed variables; the other three columns present the MAPEs of the models with the different sets of wind speed variables. The cooler the background color of a cell, the better the forecasts. For all four base models, the best results are obtained by using all three wind speed terms.

Let $G(T_t)$ represent a base model that depends on temperature variables; we define the proposed model family as in Equation (7):

$$ y_t = G(T_t) + g(WSS_t) \tag{7} $$

where:

$$ g(WSS_t) = \gamma_1 WSS_t^{0.16} + \gamma_2 T_t\,WSS_t^{0.16} + \gamma_3 H_t\,WSS_t^{0.16} \tag{8} $$

Compared with the NWS formula for WCI (Equation (4)), the proposed Equation (8) extends the WCI by adding interactions between wind speed and the dummy variable $S$, and between wind speed and hour. Furthermore, the proposed model allows the parameters to be estimated from the data, whereas the parameters in Equation (4) are predefined.
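The three wind speed regressors and the MAPE criterion can be sketched in Python as below. This is a sketch under assumptions, not the study's code: the function names are hypothetical, wind speed is assumed non-negative, and the hour interaction is coded as 23 hour-specific columns with hour 0 as the reference level, because the main effect $WSS_t^{0.16}$ is already in the model and a full set of 24 columns would be collinear with it.

```python
import math
from typing import List, Sequence


def wind_speed_terms(temp_f: float, hour: int, wss: float) -> List[float]:
    """Regressors for WSS^0.16, T*WSS^0.16, and H*WSS^0.16 (hours 1-23; hour 0 is the reference)."""
    w = wss ** 0.16
    hour_cols = [w if h == hour else 0.0 for h in range(1, 24)]
    return [w, temp_f * w] + hour_cols


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = (abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * math.fsum(errors) / len(actual)


row = wind_speed_terms(80.0, 15, 10.0)  # one summer hour at 3 p.m.
print(len(row))                                   # 25 columns
print(mape([100.0, 200.0], [110.0, 180.0]))       # about 10.0
```

These columns would be appended to the base model's design matrix, and the cross-validated MAPE compared with and without them.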

Two WCI-Based Models
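Since WCI can be seen as an adjustment to the temperature, we obtain the first WCI-based model, Equation (9), by replacing the temperature with WCI in Equation (3):

$$ y_t = \beta_0 + \beta_1 \mathrm{Trend}_t + \beta_2 M_t + \beta_3 W_t + \beta_4 H_t + \beta_5 W_t H_t + f(WCI_t) + \sum_{d=1}^{N_d} f(\widetilde{WCI}_{t,d}) + \sum_{h=1}^{N_h} f(WCI_{t-h}) \tag{9} $$

where $\widetilde{WCI}_{t,d}$ is the 24-h moving average WCI of the $d$-th previous day, and $N_h$ and $N_d$ are the numbers of lagged hourly and daily moving-average terms in the corresponding base model. Another way to include WCI is to treat it as the wind speed and replace the wind speed with WCI in Equation (7), which gives the second WCI-based model, Equation (10), where $G(T_t)$ is the temperature-based base model. However, when WCI is less than zero, the power term $WCI_t^{0.16}$ used in $g(WCI_t)$ is undefined. In such cases, we leave the WCI as-is without raising it to the power of 0.16:

$$ y_t = G(T_t) + g(WCI_t), \quad g(WCI_t) = \gamma_1 WCI_t^{*} + \gamma_2 T_t\,WCI_t^{*} + \gamma_3 H_t\,WCI_t^{*}, \quad WCI_t^{*} = \begin{cases} WCI_t^{0.16}, & WCI_t \ge 0 \\ WCI_t, & WCI_t < 0 \end{cases} \tag{10} $$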
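The rule for negative WCI values can be sketched in Python as follows. The function name is hypothetical. The guard matters in practice: in Python, a negative float raised to a fractional power such as 0.16 returns a complex number rather than raising an error, so the check must be explicit.

```python
def wci_power_term(wci: float) -> float:
    """WCI ** 0.16 when WCI >= 0; the WCI value unchanged when it is negative."""
    return wci ** 0.16 if wci >= 0 else wci


# Illustrative hourly WCI values (°F), including a sub-zero wind chill.
print([wci_power_term(x) for x in (-19.4, 0.0, 1.0)])
```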

Out-of-Sample Test
The out-of-sample test is conducted on the 2015 data for all eight zones and the system total. Four forecast horizons are evaluated: one day (i.e., 24 h), one week, one month, and one year. For one-week-ahead forecasting, the first seven days of the test year are treated as the first week, the next seven days as the second week, and so on.
The tested model groups are listed in Table 3. They are the base models (TM1), the base models plus the proposed wind speed terms (TM2), the base models with WCI replacing the temperature variables (TM3), and the base models plus the WCI terms (TM4).

Tested Model Group    Model Equation
TM1                   Equations (1) and (3)
TM2                   Equation (7)
TM3                   Equation (9)
TM4                   Equation (10)

Figures 6-9 show the corresponding out-of-sample performance for one-day-ahead, one-week-ahead, one-month-ahead, and one-year-ahead forecasts, respectively, for all zones across the four base models. Across all base models and zones, the proposed TM2 models outperform the TM1 models, with relative MAPE improvements ranging from 0.08% to 1.99%. This confirms the effectiveness of the proposed wind speed variables. On the other hand, the TM3 models are not as accurate as the TM1 models; in other words, simply replacing temperature with the predefined WCI does not improve forecast accuracy. Although the TM4 models also outperform the TM1 models, the TM2 models have the lowest MAPE in most cases. The percentage listed beside each base model label indicates the share of zones in which TM2 returns better results than TM4. In sum, adding the proposed wind speed-related variables improves the base models more, on average, than using the predefined WCI.

Ex Ante Forecasting
In this paper, all tests have been conducted in the ex post forecasting setting, where the actual weather is available throughout the forecast horizon. In practice, the actual weather is unknown, so predicted values have to be used to forecast the load. While short-term temperature forecasts are now quite accurate, other weather variables, such as wind speed and relative humidity, are not as predictable. Although our empirical case study shows that using the proposed wind speed terms could improve short-term load forecast accuracy, these terms may or may not improve ex ante forecast accuracy, depending on how accurate the wind speed forecasts are. Given the trade-off between the improvement brought by the wind speed variables and the error introduced by the wind speed forecasts, additional empirical studies would help determine whether wind speed variables benefit ex ante forecasts.
Beyond a few weeks ahead, all weather variables are unpredictable. In recent practice, simulated temperature scenarios have been fed into point load forecasting models to generate probabilistic load forecasts [10,12]. Adding wind speed variables should improve the efficacy of the probabilistic forecasts by expanding the range of scenarios considered. Furthermore, Xie and Hong [27] demonstrated the effectiveness of using point forecast accuracy to select the underlying model for temperature scenario-based probabilistic load forecasting. While this paper offers empirical evidence that including the proposed wind variable terms helps reduce point forecast error, additional empirical case studies can be conducted to test the effectiveness of adding wind speed scenarios to probabilistic load forecasting.

Future Research Directions
In this paper, we formally and systematically studied wind speed variables for load forecasting using models with different levels of sophistication. One future research topic is to follow the direction taken for temperature variables in [15] and include lagged and moving-average wind speed variables in the model. Once these wind speed variables are thoroughly studied, the investigation can be extended to net load forecasting with significant wind penetration.
The climate of a relatively small area may differ from that of the surrounding areas, especially for weather variables such as wind speed and cloud cover. Incorporating additional weather variables could potentially help forecast the load of a small area more accurately. This will require careful selection of weather stations for the small area; the weather station selection methodology proposed in [28] can be extended to include other weather variables. The research conducted in this paper also lays the groundwork for hierarchical load forecasting.

Conclusions
In this paper, we investigated the effect of wind on electricity demand using data from ISONE. Three wind speed-related terms were proposed based on the cross-validation MAPE. The out-of-sample tests showed that the proposed wind speed-related variables improve the temperature-only models and perform better than the NWS-defined WCI in most cases. Researchers can follow the same evaluation process to introduce wind speed variables into load forecasting models for other regions or datasets. The findings in this study also lay the groundwork for several future studies, such as net load forecasting, weather station selection, ex ante short-term load forecasting, probabilistic load forecasting, and hierarchical load forecasting.

Figure 1. Time series plot of the hourly load, temperature, wind chill index, and wind speed (2012-2015).

Figure 3. Scatterplot of the hourly load and wind chill index (year = 2014).

Figure 4. Scatterplot of hourly load and wind speed by the month of the year (year = 2014).

Figure 5. Scatterplot of hourly temperature and wind speed by the month of the year (year = 2014).

Table 2. MAPE (%) of base models with additional wind speed terms.

Table 3. List of tested model groups.