Recurrent Neural Network Based Short-Term Load Forecast with Spline Bases and Real-Time Adaptation

Short-term load forecast (STLF) plays an important role in power system operations. This paper proposes a spline bases-assisted Recurrent Neural Network (RNN) for STLF with a semi-parametric model being adopted to determine the suitable spline bases for constructing the RNN model. To reduce the exposure to real-time uncertainties, interpolation is achieved by an adapted mean adjustment and exponentially weighted moving average (EWMA) scheme for finer time interval forecast adjustment. To circumvent the effects of forecasted apparent temperature bias, the forecasted temperatures issued by the weather bureau are adjusted using the average of the forecast errors over the preceding 28 days. The proposed RNN model is trained using 15-min interval load data from the Taiwan Power Company (TPC) and has been used by system operators since 2019. Forecast results show that the spline bases-assisted RNN-STLF method accurately predicts the short-term variations in power demand over the studied time period. The proposed real-time short-term load calibration scheme can help accommodate unexpected changes in load patterns and shows great potential for real-time applications.


Introduction
Short-term load forecasting (STLF) can be used to find the most economical way to commit power generation sources while fulfilling policy requirements, ensuring reliability, and meeting the security, environmental, and equipment constraints of the power system [1].
The daily load profile generally follows cyclic and seasonal patterns related to both the climate and human activities, and is intrinsically a univariate time series. Many general forecasting methods based on regression or time-series models can be used for load forecasting (e.g., a semi-parametric additive model [2] or an autoregressive integrated moving average (ARIMA) model [3]). These methods assume, however, a linear relationship between the observed and future time series. This assumption makes them less effective for time series with significant nonlinear characteristics, such as those associated with energy demand. Chen et al. [4] consider a more complicated time series model with a functional trend curve to improve the forecast results.
Due to their nonlinear fitting ability, machine learning techniques have been applied to many forecasting problems. The Artificial Neural Network (ANN) [5] is a typical machine learning method. ANNs learn regularities and patterns automatically from past recorded data and produce generalized results with the ability to be self-adaptive. Feed forward Multilayer Perceptron (MLP) [6,7] and Generalized Regression Neural Network

Methodology
The proposed RNN-based STLF procedure, with selected bases through a semiparametric model and a real-time load forecast adjustment scheme, is shown in Figure 1. In the first stage, forecasts of daily load patterns up to the next seven days are obtained using the apparent temperatures predicted by the Taiwan Central Weather Bureau (TCWB).
In the second stage, based on past-forecasted results, real-time adapted forecasting load sequences are generated through the interpolation method using real-time adaptation and an exponentially weighted moving average (EWMA). Figure 1 presents the data flow of the proposed RNN-based RNN_adp model.
Appl. Sci. 2021, 11, x FOR PEER REVIEW

Apparent Temperature
The system net-loads are greatly affected by external factors such as temperature, humidity, wind speed and seasonal events that change over time. The apparent temperature index [21], equivalent to the temperature felt, is used to evaluate its effect upon the load. The apparent temperature $T_a$ is defined in terms of the temperature $T_c$ in Celsius, the (water vapor) pressure $e$ in hPa, the wind speed $V$ in m/sec, and the relative humidity $RH$ in percent, with

$$e = \frac{RH}{100} \times 6.105 \times \exp\left(\frac{17.27\,T_c}{237.7 + T_c}\right).$$

The apparent temperature for the next 48 h (with a 3 h resolution) is provided by the TCWB. Throughout this work, references to "temperature" refer to the apparent temperature.
Different regions have different weather patterns. The total system load is a combination of the loads from the north, central, south and east regions of Taiwan (whose average load proportions are 38%, 28%, 33% and 1%, respectively). Since the goal is to forecast the total system load, the temperatures of the four regions are merged into one value by taking the weighted average of their temperatures with weights equal to their load proportions; namely:

$$T_a = \sum_{i=1}^{4} \ell_i\, T_{a,i},$$

where $i = 1, \ldots, 4$ corresponds to the north, central, south and east regions respectively, $T_{a,i}$ is the temperature in the $i$th region and $\ell_i$ is the proportion of the regional load in the total system energy demand.
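As an illustration of the two computations described above, the following minimal Python sketch evaluates the vapor pressure term $e$ and merges four regional apparent temperatures into one system-level value. The function names and the sample temperatures are hypothetical; only the constants in the expression for $e$ and the load proportions come from the text.

```python
import math

# Average regional load proportions quoted in the text:
# north, central, south, east.
LOAD_SHARES = [0.38, 0.28, 0.33, 0.01]


def vapor_pressure(temp_c, rel_humidity):
    """Pressure term e in hPa from temperature (deg C) and relative humidity (%)."""
    return rel_humidity / 100.0 * 6.105 * math.exp(17.27 * temp_c / (237.7 + temp_c))


def system_temperature(regional_temps, shares=LOAD_SHARES):
    """Load-weighted average of the regional apparent temperatures."""
    if len(regional_temps) != len(shares):
        raise ValueError("one temperature per region is required")
    return sum(w * t for w, t in zip(shares, regional_temps))


# Hypothetical regional apparent temperatures in deg C.
print(system_temperature([30.0, 31.0, 32.0, 29.0]))
print(vapor_pressure(28.0, 75.0))
```

Because the east region carries only 1% of the load, its temperature has almost no influence on the merged value.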

Spline Basis Functions
To capture the general behavior of the daily load patterns, we consider the class of multi-resolution basis functions proposed by Tzeng and Huang [19], which are ordered in the direction of increasing resolution detail, with the number of bases, $K$, chosen to be large enough to represent the general 24-h patterns. In addition, because the daily load patterns may change rapidly between the peak and off-peak periods, we also include the cubic B-spline basis functions to accommodate load patterns that change substantially within relatively short periods of time. Two sets of basis functions are used: (1) the multi-resolution bases $\{f_1, \ldots, f_n\}$ defined on $n$ control points $\{s_1, \ldots, s_n\}$, and (2) the B-spline bases of order $d$, $B_{i,d}$, $i = 1, \ldots, n$, with knots at $\{s_1, \ldots, s_n\}$. Details about the spline basis functions can be found in [19].
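To make the B-spline part concrete, the sketch below evaluates B-spline basis functions with the standard Cox-de Boor recursion. It is a minimal illustration rather than the construction used in the paper: the uniform knot vector, the function names and the absence of any boundary or periodic treatment are assumptions, and the multi-resolution bases of [19] are not reproduced.

```python
def bspline_basis(i, order, knots, x):
    """Value at x of the i-th B-spline of the given order (degree = order - 1)."""
    if order == 1:
        # Piecewise-constant indicator on the half-open knot interval.
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + order - 1] - knots[i]
    right_den = knots[i + order] - knots[i + 1]
    left = 0.0
    right = 0.0
    if left_den != 0:
        left = (x - knots[i]) / left_den * bspline_basis(i, order - 1, knots, x)
    if right_den != 0:
        right = (knots[i + order] - x) / right_den * bspline_basis(i + 1, order - 1, knots, x)
    return left + right


def design_row(x, knots, order=4):
    """All basis values at x; order 4 corresponds to cubic B-splines."""
    return [bspline_basis(i, order, knots, x) for i in range(len(knots) - order)]


# Hypothetical uniform knots; the paper places knots at the control points.
knots = [float(k) for k in range(11)]
print(design_row(4.5, knots))
```

Each cubic basis function is nonzero on only four adjacent knot intervals, which is what lets it follow rapid local changes in the load.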

Semi-Parametric Model
A semi-parametric model (SPM) is adopted for STLF under the framework of additive models [22], using a suitable combination of the two aforementioned sets of spline basis functions together with a nonlinear function of the temperature. Let the sequence of daily load random vectors at time $t$ be $y_t = (y(t,1), \ldots, y(t,n))'$, $t = 1, \ldots, T$, with $y(t,s)$ denoting the load at time $t$ and local time grid location (control point) $s$, $s = 1, \ldots, n$.
The model for STLF is assumed to have the following form:

$$\log y(t,s) = \mu(t,s) + \varepsilon(t,s),$$

where $\mu(t,s)$ represents the mean function of the logarithm of the daily load $y(t,s)$, and $\varepsilon(t,s)$ is the corresponding random error at time $t = 1, \ldots, T$ and period $s = 1, \ldots, n$, assuming that $\varepsilon_t = (\varepsilon(t,1), \ldots, \varepsilon(t,n))' \sim N(0, \Sigma_n(t))$, with $\Sigma_n(t)$ being the covariance matrix of the random error vector $\varepsilon_t$ at time $t$. In the case of an independent error at time $t$ and assuming $\Sigma_n(t) = \sigma^2 I_n$, the time series error sequence $\varepsilon_t$, $t = 1, \ldots, T$, may yield a different covariance matrix estimate. Specific patterns of the response variable (i.e., the system net load) of the model for forecasts are described below, under the following explanatory variables.

1. Patterns are observed in the intra-daily, intra-weekly, peak and off-peak effects; these are modeled by the multi-resolution and cubic B-spline bases.

2. The temperature significantly affects the load pattern. The weighted average of the temperature at different periods each day, and similarly the daily highest and lowest and the weighted average of temperatures in the different regions, are included as important predictors.

3. The interaction effects of the period with the day type within each week are also crucial.

To accommodate these three effects, and also the temperature effect, let the mean function $\mu(t,s)$ at time $t$ and point $s$ be given by the sum of four components: (1) the intra-daily effect, (2) the intra-weekly effect, (3) the interaction effect among the intra-daily and intra-weekly effects, $DW(t,s)$, and (4) the apparent temperature effect, $T(t,s)$. To model the intra-daily effect, we use a combination of the first 24 multi-resolution bases $\{f_1, \ldots, f_{24}\}$ for 96 control points with $s_i = i$, $i = 1, \ldots, 96$, and 96 cubic B-spline bases $B^{d}_{j,4}$, $j = 1, \ldots, 96$, with knots at $s_j$, $j = 1, \ldots, 96$. Similarly, the intra-weekly effect is modeled by 7 cubic B-spline bases $B^{w}_{k,4}$ with knots at $w_k$, $k = 1, \ldots, 7$.
The interaction effects among the intra-daily and intra-weekly effects are modeled by the products of the corresponding intra-daily and intra-weekly basis functions.
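The product construction for the interaction effects can be sketched in a few lines of Python. The helper name and the toy basis values are hypothetical; in the paper the factors would be the intra-daily and intra-weekly basis functions evaluated at a given period and day type.

```python
def interaction_features(daily_basis, weekly_basis):
    """All pairwise products of intra-daily and intra-weekly basis values."""
    return [d * w for d in daily_basis for w in weekly_basis]


# Toy example: 2 intra-daily values and 3 intra-weekly values give 6 products.
row = interaction_features([1.0, 2.0], [3.0, 4.0, 5.0])
print(row)
```

The number of interaction columns is the product of the two basis counts, which is one reason a sparse selection step is useful before fitting.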

Model Bases Selection
Many standard estimators can be improved by shrinkage methods, such as ridge regression [23] and Lasso regression. This study adopts Lasso regression to obtain sparse solutions for the model bases selection.
In a Lasso regression, the value of the tuning parameter $\lambda$ controls both the size and the number of nonzero coefficients. Cross-validation is a resampling technique that can find a value of $\lambda$ giving a proper balance between bias and variance. Here, cross-validation takes the best tuning parameter value to be the one that minimizes the estimated test error rate of the forecasting results. More details about the Lasso estimate and the adaptive Lasso can be found in [17,18].
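The way the tuning parameter produces sparse solutions can be seen in a small Lasso solver. The sketch below uses cyclic coordinate descent with soft thresholding, a standard algorithm chosen here for illustration; the paper does not state which solver was used, and the function names, objective scaling and toy data are assumptions. Cross-validation over $\lambda$ is not shown.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, returning exactly 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1 / (2N)) * ||y - X b||^2 + lam * ||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy design: the second column has a weak signal and is dropped.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
print(lasso_coordinate_descent(X, y, lam=0.1))
```

A larger `lam` sets more coefficients exactly to zero, so fewer bases are retained as predictors.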

Temperature Forecast Adjustment
Currently, TCWB provides three-hour temperature forecasts for the present day (D-day), the next day (D + 1), and the maximum and minimum temperature forecasts for the days (D + 2) to (D + 7).

1. Calibration of temperature forecasts
To calibrate the day-ahead temperature forecasts at the eight time points $s_1, \ldots, s_8$ provided by TCWB before D-day (the first day for which the temperature forecasts are to be calibrated), the errors between the historical apparent temperatures $T_a(t,s)$ and the recorded day-ahead temperature forecasts $\hat{T}_a(t,s)$ of the 28 days before D-day, $t = D-1, \ldots, D-28$, $s = s_1, \ldots, s_8$, are used. Define the historical and the forecasted mean temperatures of the 28 days before D-day as $\bar{T}(D,s)$ and $\hat{\bar{T}}(D,s)$, respectively, where $s = s_1, \ldots, s_8$:

$$\bar{T}(D,s) = \frac{1}{28}\sum_{t=D-28}^{D-1} T_a(t,s), \qquad \hat{\bar{T}}(D,s) = \frac{1}{28}\sum_{t=D-28}^{D-1} \hat{T}_a(t,s).$$

For D-day, let the error between the mean temperatures and the calibrated temperature forecasts be, respectively:

$$\Delta(D,s) = \hat{\bar{T}}(D,s) - \bar{T}(D,s), \qquad \tilde{T}_a(D,s) = \hat{T}_a(D,s) - \Delta(D,s).$$

The calibrations at the (D + 1)-day's eight temperature forecast time points can be found similarly. Note that samples from historical days with unusual temperature patterns are treated as outliers and deleted beforehand.

2. Refined temperature forecasts
As load forecasts are needed at 15-min intervals, we first interpolate the provided three-hour forecast data to a 15-min resolution, which leads to smaller biases relative to the real 15-min interval temperatures. We use the well-established cubic Hermite interpolation method [20]; the interpolation formula is as follows.
Let the sequence $\{u_i\}_{i=1}^{k}$ be a partition $u_1 < u_2 < \cdots < u_k$ of the interval $[u_1, u_k]$, and let $\{T_i\}$, $T_i = h(u_i)$, be the corresponding data points. The local grid spacing is $\Delta u_{i+0.5} = u_{i+1} - u_i$, and the slope of the piecewise linear interpolant between the data points is

$$S_{i+0.5} = \frac{T_{i+1} - T_i}{\Delta u_{i+0.5}}.$$

The cubic Hermite interpolant polynomial defined for $u_i < x < u_{i+1}$ is

$$p(x) = T_i + \dot{T}_i (x - u_i) + c_2 (x - u_i)^2 + c_3 (x - u_i)^3, \qquad c_2 = \frac{3 S_{i+0.5} - 2\dot{T}_i - \dot{T}_{i+1}}{\Delta u_{i+0.5}}, \qquad c_3 = \frac{\dot{T}_i + \dot{T}_{i+1} - 2 S_{i+0.5}}{\Delta u_{i+0.5}^2},$$

where $\dot{T}_i$ denotes the derivative value assigned to the interpolant at $u_i$. The interpolation method then produces as its output a sequence of temperature forecasts at 15-min intervals.

3. Transformed temperature forecasts
The effect of temperature on the load is nonlinear, as shown in Figure 2. Upon examination, the load is approximately linearly related to a logistic sigmoid transformation of the temperature with parameters $c_0$ and $c_1$, where $c_0 = 30$ represents the location of the inflection from concave upward to concave downward, and $c_1 = 0.85$ represents the scale parameter controlling slope changes.
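The three temperature steps above (calibration, refinement and transformation) can be sketched together in Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the node slopes of the cubic Hermite interpolant are taken as simple averages of adjacent secant slopes, the sigmoid is assumed to have the form $1 / (1 + e^{-c_1 (T - c_0)})$, and all function names and sample values are hypothetical.

```python
import bisect
import math


def calibrate_forecast(forecast, past_forecasts, past_actuals):
    """Subtract the mean forecast error of the preceding days at each time point."""
    n_days = len(past_forecasts)
    calibrated = []
    for s, value in enumerate(forecast):
        mean_forecast = sum(day[s] for day in past_forecasts) / n_days
        mean_actual = sum(day[s] for day in past_actuals) / n_days
        calibrated.append(value - (mean_forecast - mean_actual))
    return calibrated


def hermite_interpolate(u, T, x_new):
    """Piecewise cubic Hermite interpolation of (u, T) at the points x_new."""
    k = len(u)
    # Secant slopes of the piecewise linear interpolant.
    S = [(T[i + 1] - T[i]) / (u[i + 1] - u[i]) for i in range(k - 1)]
    # Assumed node slopes: one-sided at the ends, averaged in the interior.
    d = [S[0]] + [(S[i - 1] + S[i]) / 2.0 for i in range(1, k - 1)] + [S[-1]]
    out = []
    for x in x_new:
        i = min(max(bisect.bisect_right(u, x) - 1, 0), k - 2)
        h = u[i + 1] - u[i]
        delta = x - u[i]
        c2 = (3.0 * S[i] - 2.0 * d[i] - d[i + 1]) / h
        c3 = (d[i] + d[i + 1] - 2.0 * S[i]) / h ** 2
        out.append(T[i] + d[i] * delta + c2 * delta ** 2 + c3 * delta ** 3)
    return out


def sigmoid_transform(temp, c0=30.0, c1=0.85):
    """Assumed logistic form with inflection at c0 and scale c1."""
    return 1.0 / (1.0 + math.exp(-c1 * (temp - c0)))


# Hypothetical three-hour forecasts for one day (deg C).
hours = [0, 3, 6, 9, 12, 15, 18, 21]
raw = [26.0, 25.5, 27.0, 30.5, 32.0, 31.0, 28.5, 27.0]
past_f = [[t + 1.0 for t in raw]] * 28  # past forecasts ran 1 degree warm
past_a = [raw] * 28
calibrated = calibrate_forecast(raw, past_f, past_a)
quarter_hours = [i * 0.25 for i in range(85)]  # 0 h to 21 h in 15-min steps
fine = hermite_interpolate(hours, calibrated, quarter_hours)
features = [sigmoid_transform(t) for t in fine]
print(len(features))
```

A monotonicity-preserving slope rule could replace the averaged slopes if overshoot between forecast points is a concern.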

Recurrent Neural Network with Selected Bases
An RNN introduces loops in the network and allows internal connections among hidden units to enable exploration of the temporal relationships among the data [24]. The RNN structure with selected model bases taken from the resulting SPM described above is presented in the following.

1. General Structure of RNN
Each of the RNN layers uses a loop to iterate over the time steps of the sequence. An RNN with a single hidden layer is described below.
The input training data are the effects selected by the adaptive Lasso estimator. The output $o^{(t)}$ is obtained recursively for $t = 1, \ldots, T$, starting from $o^{(0)} = 0$, where $\varphi$ are the activation functions, $u$ is the input weight, and $w$ is the weight for the lagged term; both weights are the same for all time points. The hidden state $h^{(t)}$, $t = 1, \ldots, T$, is mapped to the output through the output weight matrix $V$, and $b_0$ is a model parameter representing the bias of the hidden layer and the output layer. With suitable choices for the parameters, such as the number of layers $k$, the number of neurons in each layer $m$, and the number of time steps of a sequence $T$, the RNN is expected to perform better than a more general model structure considering time effects in the neural network framework for STLF problems. We build an RNN model with $k$ multilayer perceptrons in Python using the TensorFlow library.

2. Configuration Architecture
The RNN training process is heavily influenced by the choice of hyper-parameters: sequence size, number of hidden layers and number of nodes per hidden layer. A hyper-parameter space was searched to test different parameter combinations and identify those most suitable for the TPC system. The experiment was conducted using a standard RNN network to provide the best set of hyper-parameters. The results shown in Table 1 indicate that the best configuration has 14 units, 3 layers and 4 time steps.
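The recurrence and the role of the time-step setting can be illustrated with a plain-Python forward pass. This is a generic simple recurrent network written for illustration, assuming the form $h^{(t)} = \tanh(U x^{(t)} + W h^{(t-1)} + b)$ and $o^{(t)} = V h^{(t)} + b_0$ with a single hidden layer; it is not the TensorFlow model used in the paper, and the function names, weights and inputs are hypothetical.

```python
import math


def make_windows(series, time_steps=4):
    """Split a sequence into overlapping windows of length time_steps."""
    return [series[i:i + time_steps] for i in range(len(series) - time_steps + 1)]


def rnn_forward(window, U, W, b, V, b0):
    """Forward pass over one window; the same weights are reused at every step."""
    m = len(b)
    h = [0.0] * m  # hidden state starts at zero
    outputs = []
    for x in window:
        h = [
            math.tanh(
                sum(U[j][k] * x[k] for k in range(len(x)))
                + sum(W[j][k] * h[k] for k in range(m))
                + b[j]
            )
            for j in range(m)
        ]
        outputs.append(sum(V[j] * h[j] for j in range(m)) + b0)
    return outputs


# Hypothetical example: 1 input feature, 2 hidden units, 4 time steps.
U = [[0.5], [-0.3]]
W = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
V = [1.0, 1.0]
series = [[0.2], [0.4], [0.1], [0.3], [0.5]]
for window in make_windows(series, time_steps=4):
    print(rnn_forward(window, U, W, b, V, b0=0.0))
```

With 4 time steps, each training sample carries the three preceding inputs along with the current one, which is how the network sees short-range temporal structure.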

Real-Time Adapted Forecasting
The increasing use of renewable power sources has produced an increase in intermittency and ramping in the net load profile, which requires additional control efforts to maintain frequency quality. For a complete treatment of STLF, we also provide a real-time adaptive STLF procedure that gives system operators a detailed view of the real-time power system condition to aid their decision making. A quasi-real-time RNN-based forecasting model (RNN_adp) with the objective of providing short-term load forecasts is described below.

Load Forecasts Interpolation
In the first stage, the STLF results are interpolated into a sequence with a value for every 5-min period. The cubic Hermite interpolation method is used to produce the load forecasts at 5-min intervals. Real-time load data at 5-min intervals are then used in the second step to adaptively adjust the forecasting results.

Adaptive Load Forecasting
The correction value is the average difference between the actual and the forecasted load values in the past 15-min interval. In other words, it is the average of the three differences of the actual and forecasted load values calculated at 5-min intervals.
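A minimal sketch of this correction step follows. The averaging over the last three 5-min differences is taken from the text; applying the correction as an additive shift to the upcoming forecasts is an assumption, and the function names and load values are hypothetical.

```python
def correction_value(actual_5min, forecast_5min):
    """Average of the last three actual-minus-forecast differences."""
    diffs = [a - f for a, f in zip(actual_5min[-3:], forecast_5min[-3:])]
    return sum(diffs) / len(diffs)


def adapt_forecast(future_forecast, correction):
    """Shift upcoming forecasts by the correction value (assumed additive)."""
    return [f + correction for f in future_forecast]


# Hypothetical loads in MW over the past 15 min, sampled every 5 min.
actual = [30105.0, 30103.0, 30104.0]
forecast = [30100.0, 30100.0, 30100.0]
shift = correction_value(actual, forecast)
print(adapt_forecast([30110.0, 30120.0], shift))
```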

Exponentially Weighted Average
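Finally, we use the EWMA to smooth the correction result. The exponential smoothing is given by the standard recursion $S_i = \alpha y_i + (1 - \alpha) S_{i-1}$, with $0 < \alpha \le 1$, where $y_i$ is the $i$th corrected value and $S_i$ is the corresponding smoothed value.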
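The smoothing step can be sketched as below. The recursion is the usual exponentially weighted moving average; the smoothing factor `alpha = 0.5`, the initialisation with the first value, and the function name are assumptions, since the text does not specify them.

```python
def ewma(values, alpha=0.5):
    """Exponentially weighted moving average of a sequence of corrected values."""
    smoothed = []
    for y in values:
        if not smoothed:
            smoothed.append(y)  # start from the first observation
        else:
            smoothed.append(alpha * y + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical sequence of correction values in MW.
print(ewma([4.0, 2.0, 2.0], alpha=0.5))
```

A smaller `alpha` reacts more slowly to a sudden change in the correction, which trades responsiveness for stability.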

Test Results
TPC system load data from January 2012 to December 2019 are used for testing the proposed method. The load data used here are the net load (the power served by all generators minus the TPC's pumped storage load). The days in each year are divided into two classes: general days and special days. Special days refer to exceptional days that have their own load patterns (e.g., holidays, days experiencing a typhoon, etc.). General days refer to either typical working days or weekends. The main goal of this study is to provide an STLF method for general days.

Training Data Selection
For the future day loads to be predicted, the training samples are chosen from historical days with a similar load pattern. The input-target pairs are the historical temperature (predicted and actual) and load data recorded during the corresponding days in the previous 28 days (4 weeks), together with the 6 weeks around the same period of the previous year and the predicted temperatures of the future days from TCWB. To select a subset of model bases as predictors for estimating the future day loads, a training set of 70 days of load and temperature data (28 days plus 6 weeks), corresponding to the time period shown in Figure 4, is used. The forecasting process begins every morning at 9:00 a.m. and forecasts demand for up to the next 7 days at 15-min resolution. The test results obtained when applying the method to forecast the load in 2018-2019 are presented. STLF performance indices, such as the mean absolute error (MAE), root mean square error (RMSE), absolute percentage error (APE) and mean absolute percentage error (MAPE), are used to evaluate the forecasting accuracy of the model [25].
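For reference, the accuracy indices named above can be computed as in the following sketch, which uses their usual definitions (MAPE expressed in percent). The function names and sample values are hypothetical.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical actual and forecasted loads.
actual = [100.0, 200.0]
forecast = [110.0, 190.0]
print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```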

Comparison of Test Results Obtained from the Semi-Parametric Model and the RNN Model
The MAE, RMSE and MAPE of the load forecasts for every month from 2018-2019 are provided. The forecasting accuracies for the two models based on the historical temperatures serve as a baseline for comparison. In Table 2, the corresponding (D + 1)-day monthly MAE, RMSE and MAPE of the two models with historical temperatures are given in detail. It can be seen that, with the actual temperature, both models have good accuracies on the (D + 1)-day forecasts, and the performance of the RNN model is especially outstanding in most months of 2018-2019, with annual average MAPEs of 2.03 and 1.70, respectively.

Forecasts with Temperature Calibration
The actual temperature data indicate that temperature forecasting biases increase rapidly with large values (around 3 degrees). Figure 5 presents the MAPE time plots of two STLF models using original and adjusted temperature forecasts as inputs. From Figure 5 and Table 3, it can be seen that, after calibrating the forecasted temperature through bias correction based on the previous 28 days' temperature forecasting biases, the forecasting accuracies are significantly improved.
Figure 5. The mean absolute percentage errors (MAPEs) of the semi-parametric (SPM) and RNN models with the historical, forecasted and calibrated temperatures for the (D + 1)-day daily load patterns.

Real-Time Forecast Performance
The monthly performance comparisons for 2018 on the MAPE of the real-time D-day forecast for the next 6 h are given in Table 4. As the table shows, the annual averages of the MAPE for the RNN-based RNN_adp model are below 1%. Figure 6 shows the performance of the model for a typical day of the studied period, June 22. As can be seen, the real-time load pattern and the forecasted load pattern of the RNN model have similar trends, but the forecasted curve is much lower. With the adjusted model, RNN_adp, however, the forecast accuracy improves significantly, with an average error of 0.567% across the entire day.

Comparison of ANN, MIX, SPM and RNN Model Performance
In this sub-section, the performance of different STLF methods is compared, including a two-stage Artificial Neural Network (ANN) model and an STLF model developed previously by TPC with special attention given to the adjustments of the peak and nadir load forecasts [26], and a mixed model (MIX) formed as a weighted average of the ANN and our basic RNN (without Lasso variable selection or temperature calibration), where the weights are inversely proportional to the MAPEs of the previous day.
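The weighting rule of the MIX model can be sketched as follows: each model's weight is proportional to the reciprocal of its previous-day MAPE, normalised so that the weights sum to one. The function names and numbers are hypothetical, and the normalisation is an assumption consistent with a weighted average.

```python
def inverse_mape_weights(mapes):
    """Weights inversely proportional to each model's previous-day MAPE."""
    inverse = [1.0 / m for m in mapes]
    total = sum(inverse)
    return [v / total for v in inverse]


def mix_forecast(forecasts, mapes):
    """Weighted average of several model forecasts, point by point."""
    weights = inverse_mape_weights(mapes)
    n_points = len(forecasts[0])
    return [sum(w * f[i] for w, f in zip(weights, forecasts)) for i in range(n_points)]


# Hypothetical forecasts from two models and their previous-day MAPEs (%).
model_a = [100.0, 110.0]
model_b = [200.0, 210.0]
print(mix_forecast([model_a, model_b], mapes=[1.0, 3.0]))
```

The model that was more accurate on the previous day dominates the mix, so a single bad day for one model shifts weight to the other on the next day.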
The performances of the next (D + 1)-day monthly MAPEs for these four forecasting models are presented in Table 5, for the year 2018. As the table shows, each model has its own advantages and disadvantages in daily load profile and max/min load forecasts. For example, the RNN model has the best overall yearly average performance for monthly MAPEs: 2.34 for the daily loads and 2.23 for daily peak loads. The SPM model, however, performs the best for nadir loads, with an average monthly MAPE of 2.18. The RNN performance is the best in the spring seasons and fairly good in the winter, when daily load patterns are stable. The SPM performance is the best in the summer season when the weather varies more. The MIX model has its best accuracy in the winter season. The ANN does very well in February and August, when there is the most uncertainty in the load pattern. Both the ANN and MIX models have large biases in June, however, especially for days adjacent to special days.
On closer examination of the daily MAPEs of the four models, we find that there are only 2 days for which all four models have MAPEs greater than 4, thus failing to catch the real load patterns: the day before Chinese New Year's Eve in February and the "nine in one" election day in November. Figure 7 shows the actual load and the (D + 1)-day forecasts of the four models, made on the previous D-day, for these two days. All four models have similar forecasts with large biases relative to the real load on these two days. This indicates that these two days should be considered as special days in the future, so as to avoid large biases being included in the training samples for future forecasts, thereby helping to reduce the biases, particularly in February, for the SPM and RNN models. Figure 8 presents the boxplots of daily MAPEs in 2018, after deleting the two days mentioned above. The performances of SPM and RNN are shown to be generally more robust, with fewer extremely large MAPEs.
One of those days is 13 June 2018, where the MIX and ANN have similar daily forecasts, while the SPM and RNN perform reasonably well. The other day is 11 June 2018, where the MIX model is slightly better than the ANN, with smaller biases.
Another indication that the four models complement each other well is that only about 5% of forecasting days have MAPEs greater than 2.5 for all four models. This 5% compares to the 18% of days where the MAPEs of both ANN and MIX are greater than 2.5, and the 14% of days where this is true for both SPM and RNN, a reduction of more than 9% in these cases. Among those days where all four models had large errors, about two thirds had explainable unexpected circumstances that caused the forecasting errors, such as TPC executing electric demand bidding, extreme weather conditions, or special events.
A new model can, in fact, be created by using an optimal weighted average of the ANN, SPM and RNN forecasts as such a hybrid model might further reduce the forecasting errors with proper time-varying weightings. How to choose appropriate optimal weights is a topic worthy of further investigation.

Performances of the (D + 2) to (D + 7) Day Forecasting Accuracies of the RNN Models
Figure 10 presents the monthly averages of the (D + 2)-day to (D + 7)-day forecasting MAPEs of the RNN models with forecasting temperatures from 2018-2019. Note that the seasonal patterns appear in both years, and, as they are the (D + 2)-day to (D + 7)-day forecasts, the higher MAPEs due to longer-range forecasts are to be expected.

Conclusions
An STLF method using a semi-parametric model and an RNN with selected bases is presented. This tool has been adopted by the TPC Operation Department for daily operation purposes since 2019. Test results indicate that, due to the weather characteristics in Taiwan, STLF is especially challenging at season transitions: from spring to summer and from summer to autumn. The main advantage of the RNN-based STLF proposed here is that, with the calibrated forecasted temperatures and the features extracted from the load series through ensembles of B-spline and multi-resolution bases after a statistical variable selection step, it can avoid overfitting problems in the deep learning stage and adapt to these changing weather patterns earlier than other methods. Noticeable improvements in the MAPEs in 2019 are observed for the RNN model with calibrated temperatures as compared with other methods. However, the intra-day load forecasts are sometimes far off due to unexpected meteorological factors. The real-time adaptive scheme provides load forecasts for the next hour at 5-min intervals and helps the system operator adjust the ancillary service requirement to meet changes in electricity demand. Both [15] and [16] use LSTM as the deep learning methodology; we have also tried the LSTM model, and the results for the STLF show that the improvements in forecast accuracy are limited, while the model is more complicated and takes much more time to compute, which makes it less feasible for daily use in practice. In [16], a similar-days selection procedure is adopted, which merits further study to assess the advantages and shortcomings of this approach for our dataset with fast-changing weather patterns. Besides the techniques presented, an optimal model averaging of various load forecasting models is a topic for further investigation, as is how to extend the training samples to special day load forecasts.

Conflicts of Interest:
The authors declare no conflict of interest.