Short-Term Load Forecasting in Smart Grids: An Intelligent Modular Approach

Daily operations and planning in a smart grid require day-ahead forecasts of its customers' load. The accuracy of day-ahead load-forecasting models has a significant impact on many decisions, such as scheduling of fuel purchases, system security assessment, economic scheduling of generating capacity, and planning for energy transactions. However, day-ahead load forecasting is a challenging task because the load depends on external factors such as meteorological and exogenous variables. Furthermore, existing day-ahead load-forecasting models improve forecast accuracy at the cost of increased execution time. To improve forecast accuracy without paying this execution-time cost, this paper proposes a hybrid artificial neural network-based day-ahead load-forecasting model for smart grids. The proposed model comprises three modules: (i) a pre-processing module; (ii) a forecast module; and (iii) an optimization module. In the first module, correlated lagged load data, along with influential meteorological and exogenous variables, are fed to a feature selection technique that removes irrelevant and/or redundant samples from the inputs. The second module is an artificial neural network that uses a sigmoid activation function and is trained by a multivariate auto regressive algorithm. The third module uses a heuristics-based optimization technique, our modified version of an enhanced differential evolution algorithm, to minimize the forecast error. The proposed method is validated via simulations on the datasets of DAYTOWN (Ohio, USA) and EKPC (Kentucky, USA). In comparison with two existing day-ahead load-forecasting models, the results show improved performance of the proposed model in terms of accuracy, execution time, and scalability.


Introduction
A traditional grid system needs renovation to bridge the ever-increasing gap between demand and supply and to meet essential challenges such as grid reliability, grid robustness, and customer electricity cost minimization [1]. In this regard, the recent integration of advanced communication technologies and infrastructures into traditional grids has led to the formation of so-called smart grids (SGs) [2]. The National Institute of Standards and Technology (NIST) [3] conceptual diagram of a smart grid (SG) is shown in Figure 1. This conceptual diagram can be used as a reference model for standardization work in seven SG domains: generation, transmission, distribution, end users, markets, operations, and service providers. Each domain involves one or more SG actors (e.g., devices, systems, programs) that make decisions for realizing an application based on the exchange of information. Further details on each domain, its actors, and the respective applications can be found in [3]. One of the advantages of this integration is customer engagement, which plays a key role in the economics of energy trade. In other words, the old concept of uni-directional energy flow is replaced by the new concept of bi-directional energy flow: the transformation from a traditional consumer to a smart prosumer [4]. The resulting grid, integrated with advanced metering infrastructure, faces many challenges, such as [5]: (i) designing new techniques to meet the load without increasing the generation capacity; and (ii) devising new ways/policies to ensure customer engagement with the utility. When installing new technologies, utilities aim for the maximum possible return on investment. However, this maximization requires that the daily operations of an SG utility (such as strategic decisions to bridge the gap between demand and supply, and fuel resource planning) are properly planned. All these decisions are highly influenced by the load forecast strategy [6].
An accurate load forecast means that both utility and prosumer can maximize their electricity price savings through spot price establishment, which is one of the major reasons that utilities show growing interest in SG implementation. The utility forecasts the future price/load signal based on the users' past energy consumption patterns. In response to the forecast price/load signal, the users adjust their energy consumption schedules to minimize their electricity cost and/or maintain their comfort level [7].
In reference [8], Hippert et al. classify load forecasts by the time horizon to be predicted (Figure 2): short-term, medium-term and long-term. Short-term load forecasting is further categorized into two types: (i) very short-term; and (ii) short-term forecasting. The first has a prediction horizon from seconds/minutes to hours, and its models are applied in flow control. The second has a prediction horizon from hours to weeks; its models are applied in adjusting generation and demand and are therefore used to launch offers to the electricity market. Short-term forecasting models are vital in day-to-day operations, evaluation of net interchange, unit commitment and scheduling functions, and system security analysis. In medium-term forecasting, the prediction horizon is typically a matter of months. These models are used by utilities for fuel scheduling, maintenance planning, and hydro reservoir management. In long-term forecasting, the prediction horizon spans years. Utilities use these models for planning grid capacity and maintenance scheduling. Since utilities need an accurate load forecast to plan ongoing grid operations and manage their resources efficiently, this paper aims at an accurate load-forecasting model. However, the scope of this paper is limited to short-term load forecasting with a day-ahead prediction horizon only.
In the literature, two types of day-ahead load forecasting (DALF) models have been presented: linear and non-linear [9]. Reference [10] highlights the relative limitations of linear models as compared to non-linear models. In reference [9], the non-linear models are investigated in five classes: (i) support vector machine-based models; (ii) Markov chain-based models; (iii) artificial neural network (ANN)-based models; (iv) fuzzy ANN-based models; and (v) stochastic distribution-based models.
The support vector machine-based models [11][12][13] achieve relatively moderate accuracy, but at the cost of high execution time (slow convergence rate) due to high complexity. The Markov chain-based models [14][15][16], in contrast, have low execution time, but at the cost of reduced forecast accuracy. The stochastic distribution-based models [17][18][19][20] need improvement in terms of both accuracy and execution time. The fuzzy ANN-based models [21][22][23][24][25][26] achieve moderate accuracy, but at the cost of high execution time. Finally, hybrid ANN-based models improve the accuracy of ANN-based models to an extent, but again at the cost of high execution time. Among the hybrid ANN-based models, reference [27] combines feature selection via the mutual information (MI) technique with ANN-based prediction to forecast the day-ahead load (DAL) of SGs. To improve the accuracy of [27], the authors in [28] add a heuristic optimization-based technique to it. Similarly, another hybrid strategy for DALF of SGs is presented in [29]. However, references [27,29] achieve relatively high forecast accuracy only at the cost of long execution times. Furthermore, the forecast error of the existing works [28,29] increases significantly in the presence of meteorological variables (such as dew point temperature and dry bulb temperature) and exogenous variables (such as cultural and social events and human impact). Thus, we aim at improving the forecast accuracy of DALF models without increasing their execution time, and in the presence of meteorological and exogenous variables.
In this work, we present a hybrid ANN-based DALF model for SGs: a multi-model forecasting ANN with a supervised architecture, trained by the multivariate auto regressive algorithm (MARA). The proposed model has a modular structure with three functional modules: a pre-processor, a forecaster, and an optimizer. Given the correlated lagged load data along with influential meteorological and exogenous variables as inputs, the first module removes two types of features: (i) redundant; and (ii) irrelevant. Given the selected features, the second module employs an ANN to predict future values of load. Each artificial neuron (AN) is activated by a sigmoid function, and the ANN is trained by MARA. We further minimize the forecast/prediction error by using an optimization module in which a heuristics-based optimization technique is implemented. The proposed DALF strategy for SGs is validated via simulations, which show that it forecasts the future load of SGs with approximately 98.76% accuracy. To sum up, this paper has the following contributions/advantages:

• The proposed model takes into account external factors that influence DALF, such as meteorological and exogenous variables.
• We use MARA for training because of its better accuracy and shorter execution time; none of the existing forecast models uses it for training.
• To improve the forecast accuracy and minimize the execution time of the forecast model, we perform local training, which none of the existing forecast models uses.
• We use our modified version of the enhanced differential evolution (EDE) algorithm in the error minimization module, whereas the existing bi-level strategy [28] uses the original EDE algorithm.
• We test our proposed model on the datasets of two USA grids: DAYTOWN and EKPC. For evaluation and validation, we compare our proposed model with two existing forecast models (bi-level forecast and MI+ANN forecast) and provide extensive simulation results.
Please note that this work is a continuation of our previous work in [30,31], neither of which considered exogenous and meteorological variables. The rest of the paper is organized as follows. Section 2 discusses recent and relevant DALF works. Section 3 describes the proposed DALF model for SGs, which is based on an ANN and a modified evolutionary algorithm. Section 4 discusses simulation results, and Section 5 states the conclusions drawn from this work along with future work.

Related Work
For the sake of better understanding, the existing techniques are discussed in two classes (linear and non-linear) according to the type of model used [9]. The choice of model rests with the researcher and depends on specific design considerations.

Linear Models
Linear models give a continuous response that is a linear combination of one or more prediction variables. These models depend on the synthesis of all features of a problem that is more or less solved by complex equations. Examples include spectral decomposition-based models, ordinary least squares-based models, and autoregressive moving average (ARMA) models. Since demand prediction is complex due to non-linearities, linear forecast models predict with high relative errors because they cannot map the complex relationship between input and output. Thus, developing accurate linear models is highly challenging. Furthermore, Hagan et al. [10] highlighted the relative limitations of linear models as compared to non-linear models. Therefore, this work focuses on the discussion of non-linear models only.

Non-Linear Models
When the observational data are modeled by a non-linear combination of one or more prediction variables, the model is said to be non-linear. In describing the relation between residual and periodical components, Bunn and Farmer [32] conclude that non-linear models can overcome the limitations of linear models. In reference [9], the non-linear models are further categorized into five classes: (i) support vector machine-based models; (ii) Markov chain-based models; (iii) ANN-based models; (iv) fuzzy neural network-based models; and (v) stochastic distribution-based models. These models are discussed as follows.
(i) Support vector machine-based models: In reference [11], Niu et al. propose a load-forecasting technique for an SG based on a support vector machine and ant colony optimization. The authors use the ant colony optimization technique for pre-processing of the input data. In that paper, a system mining technique is used for feature selection. The selected features are fed into the forecaster, which is a support vector machine-based predictor. Another important work is presented by Li et al. in [12], who use a least squares variant of the support vector machine. Similarly, reference [13] models the cyclic nature of demand by support vector machine-based linear regression. In conclusion, the support vector machine-based works are better in terms of accuracy; however, developing these models is highly challenging due to their high complexity.
(ii) Markov chain-based models: Aiming at a robust DALF strategy, the authors in [14] propose a Markov chain-based strategy. This stochastic strategy aims to tackle the load-time series fluctuations associated with the energy consumption of users in a heterogeneous environment. The Markov chains are used to predict the future duty cycles of appliances. The technique is robust due to the memoryless nature of Markov chains (the predicted pattern depends only on the current state; past states are not considered). In reference [15], a Markov chain Monte Carlo method is used to model the switching pattern of household appliances. In simulations, the authors consider 100 households for one week. However, this model is limited in scope, as it applies to situations in the Netherlands only. Another work [16] proposes an explicit-duration hidden Markov model along with a differential observation-based model to predict the individual load of appliances. The authors collect the aggregated power signals with ordinary smart meters. The memoryless nature of Markov chains not only makes the DALF strategy robust but also makes it relatively less complex than the aforementioned techniques. However, the memoryless nature of Markov chains also has a drawback: lower accuracy.
(iii) ANN-based models: ANNs learn from experience/training to predict future values when fed with relevant input information. The advantages of these networks include, but are not limited to, self-organization, adaptive learning, fault tolerance, ease of integration with existing networks/technologies, and real-time operation. The abilities to generalize and to capture non-linearity in complex environments make ANNs very attractive for load forecasting problems. There are two basic ANN architectures: feed forward and feedback. The former carries information from input to output via the hidden layer in the forward direction only, i.e., the information of each layer is independent of that of the others. Feed forward ANNs are widely used for pattern recognition and forecasting problems. The latter carries information in both directions, forward and backward, such that the information of each layer depends on that of the others. Feedback ANNs are appropriate for complex and time-varying problems [33][34][35]. The learning modes of ANNs, in turn, fall under three categories: supervised [36], unsupervised [37], and reinforcement [38]. In the first category, the ANN attempts to minimize the mean squared error (MSE) for a known target vector (i.e., the input/output vectors are specified). For a given input/output pair, the error between the output and the target values is calculated. This error is used to update the weights and biases of the ANN until the MSE falls below a certain threshold. In the second category, the ANN does not need explicit target data; the system adjusts its output based on self-learning from different input patterns. In the third category, the connections between artificial neurons (ANs) are reinforced every time they are activated. Since this work is limited in scope to supervised learning, we discuss some of the latest relevant works as follows.
In reference [27], the authors present a hybrid technique for short-term price forecasting in SGs. This hybrid technique comprises two steps: feature selection and prediction. In the first step, a mutual information-based technique is implemented to remove redundancy and irrelevancy from the input load-time series. In the second step, an ANN along with an evolutionary algorithm is used to forecast the time series of the future load. In this process, the authors assume a sigmoid activation function for the ANs and the Levenberg-Marquardt algorithm for training. In addition, the authors fine-tune some adjustable parameters during the first and second steps via an iterative search procedure that is part of their work. In terms of forecast accuracy, this technique is efficient because it embeds various techniques; however, the cost paid is high execution time. In reference [28], the authors investigate the stochastic characteristics of an SG's load. More importantly, they present a bi-level DALF technique for SGs. In the first/lower level, an ANN and an evolutionary algorithm are implemented to forecast the future load/price curve. In the second/upper level, an enhanced differential evolution (EDE) algorithm is implemented to further minimize the prediction errors. The effectiveness of this work is shown via MATLAB simulations, which demonstrate that the strategy performs DALF in SGs with reasonable accuracy, at the cost of high execution time.
The hybrid methodology in [39] completes the DALF task in four steps: (i) data selection; (ii) transformation; (iii) forecast; and (iv) error correction. In step one, some well-known data selection techniques are used to reduce the curse of dimensionality in the input load-time series characteristics. Step two applies a wavelet transformation to the selected characteristics of the input load-time series to enable the implementation of a redundancy and irrelevancy filter. Step three uses an ANN and a training algorithm for DALF in SGs. More importantly, the authors choose a sigmoid activation function for the ANs because of its ability to capture non-linearity. Finally, error-correcting functions are used in step four to improve the accuracy of the DALF methodology. In simulations, this methodology is tested against practical household load; the results show that it improves accuracy at the cost of high complexity. Another strategy is presented in [40] to predict the occurrence of price spikes in SGs. It uses wavelet transformation for input feature selection. An ANN is then used to predict future price spikes based on training with the selected inputs.
(iv) Fuzzy neural network-based models: Doveh et al. [21] present a fuzzy ANN-based model for load forecasting. In their work, the input variables are heterogeneous, and the seasonal effect is modeled via a fuzzy indicator. In reference [22], the authors present a self-adaptive load-forecasting model for SGs. To correlate demand profile information with the operational conditions, a knowledge-based feedback fuzzy system is proposed. For error optimization, a multilayer perceptron ANN structure is used, trained via the back propagation method. Other hybrid strategies, such as [23,24], focus on fuzzy ANNs as well. Wang [23] presents an electric demand forecasting model using a fuzzy ANN, whereas Che et al. [24] present an adaptive fuzzy combination model that iteratively combines different subgroups while calculating fuzzy functions for all the subgroups. A few more works combining fuzzy ANNs with other schemes are presented in [25,26]. In designing a fuzzy neural network controller to improve prediction accuracy, the membership functions that express the inference rules in linguistic terms need proper definitions. As fuzzy systems lack such formal definitions, optimization of these functions is a potential research area. However, integrating an optimization technique further complicates the overall methodology.
(v) Stochastic distribution-based models: The model in [17] predicts the power usage time series using a probability-based approach. The model also configures household appliances differently for holidays and working days. A major assumption in this work is that the on-off cycles of household appliances, the number of appliances, and the power consumption pattern of appliances follow a Gaussian distribution. The work considers not only a wide range of appliances but also a high degree of appliance flexibility. However, the absence of a closed-form solution makes the Gaussian-based forecast strategy very complex. Moreover, these assumptions do not always hold; thus, the accuracy of the predicted load-time series is highly questionable. An improvement over [17] is presented in [18]. This work uses an $\ell_{1/2}$ regularizer to overcome the computational complexity of the Gaussian distribution-based DALF strategy in [17]. Moreover, the DALF strategy of [18] can capture the heteroscedasticity of load more efficiently than [17]. Simulations are conducted to show that the DALF strategy of [18] performs better than the existing one. To sum up, [18] overcomes the complexity of [17] to some extent; however, the basic assumptions (Gaussian distribution-based on-off cycles of household appliances, number of appliances, and power consumption pattern of appliances) remain and thus make the proposal highly questionable in terms of accuracy. A semi-parametric additive forecast model is presented in [19]. This work is based on point forecasts and calculates the prediction intervals via a modified bootstrap algorithm. Similarly, another semi-parametric generalized additive load forecast model is presented in [20]. In terms of forecast horizon, the generalized additive forecast model is better than the non-generalized one due to its dual forecast capability: short-term and medium-term. However, neither forecast model is sufficiently accurate when compared to the ANN-based models.
The overall classification hierarchy of forecast techniques is shown in Figure 2, and their summary is given in Table 1.

The Proposed Forecast Strategy
ANNs are widely used as forecasters because these networks can predict the non-linearities of SGs' load with low convergence time. However, the achieved prediction accuracy is sometimes not satisfactory. This has led to the adoption of optimization techniques that can significantly enhance the prediction accuracy of ANNs, at the cost of increased convergence time. Therefore, we aim to develop a new DALF strategy using hybrid integration, with two goals: (i) improvement of prediction accuracy; and (ii) reduction of convergence time.
Our proposed DALF strategy is implemented in three interconnected modules: (i) a pre-processing module; (ii) a forecast module; and (iii) an optimization module. Given the input data, the pre-processing module removes redundant and irrelevant samples. Using a sigmoid activation function and MARA, the hybrid ANN-based forecast module predicts the DAL of an SG. Finally, the optimization module minimizes prediction errors to improve the accuracy of the overall DALF strategy. A block diagram of the proposed model is shown in Figure 3. A detailed description of each module follows.

Pre-Processing Module
Since the ANN-based forecaster predicts the load of the next day, the input data must be pre-processed to remove redundant and irrelevant samples, for two reasons: (i) redundant features provide no additional information and thus unnecessarily increase the execution time of the training process (discussed later in the forecast module); and (ii) irrelevant features provide no useful information and act as outliers. A detailed description of the pre-processing module follows.
As mentioned earlier, the pre-processing module receives the historical input load-time series. Suppose the input load data are arranged as:
$$P = \begin{bmatrix} p(h_1, d_1) & p(h_2, d_1) & \cdots & p(h_m, d_1) \\ p(h_1, d_2) & p(h_2, d_2) & \cdots & p(h_m, d_2) \\ \vdots & \vdots & \ddots & \vdots \\ p(h_1, d_n) & p(h_2, d_n) & \cdots & p(h_m, d_n) \end{bmatrix} \qquad (1)$$
where $d_n$ is the nth day, $h_m$ is the mth hour of the day, and $p(h_m, d_n)$ is the power usage value at the mth hour of the nth day. Similarly, the input dew point temperature data are in a matrix $T_{DP}$, the input dry bulb temperature data in a matrix $T_{DB}$, and the input type-of-day (working day or holiday) data in a matrix $D_T$. The choice of n is left to the designer. A greater value of n means that more historical lagged samples are available (fine tuning). However, this fine tuning increases the execution time of the algorithm. Thus, there is a trade-off between convergence rate and accuracy. Before P is fed to the forecast/prediction module, its values are normalized.
In this process, a local maximum value $p^{c_i}_{max}$ is computed for each column of P:
$$p^{c_i}_{max} = \max\{p(h_i, d_1), p(h_i, d_2), \ldots, p(h_i, d_n)\}, \quad \forall i \in \{1, 2, \ldots, m\} \qquad (2)$$
By local normalization we mean normalizing each column of P by its local maximum (one maximum per column); the results are saved in $P_{nrm}$, whose entries lie in [0, 1]. Similarly, the matrices $T_{DP,nrm}$, $T_{DB,nrm}$ and $D_{T,nrm}$ are the normalized forms of $T_{DP}$, $T_{DB}$ and $D_T$, respectively.
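As a minimal sketch of the local normalization described above (not the authors' implementation), the following Python snippet divides each column of a load matrix by that column's own maximum. The function name `normalise_by_column_max` and the toy values are illustrative assumptions; rows are taken to be days and columns hours, and loads are assumed to be strictly positive.

```python
import numpy as np

def normalise_by_column_max(p):
    """Divide every column of p by that column's own maximum.

    Assumes strictly positive loads, so every column maximum is non-zero.
    """
    p = np.asarray(p, dtype=float)
    col_max = p.max(axis=0)          # one local maximum per column (hour)
    return p / col_max, col_max

# Toy load matrix: 3 days (rows) x 2 hours (columns); the values are made up.
P = [[200.0, 50.0],
     [400.0, 100.0],
     [300.0, 80.0]]
P_nrm, col_max = normalise_by_column_max(P)
```

Because each hour is scaled by its own peak, an off-peak hour and an on-peak hour both span the same range after normalization, which is consistent with the per-hour (local) training discussed later.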
These input matrices $P_{nrm}$, $T_{DP,nrm}$, $T_{DB,nrm}$ and $D_{T,nrm}$ contain both irrelevant and redundant features. To remove these two types of features, we use the mutual information technique proposed in [27] and later used in [28]. According to this technique, the relative amount of mutual information between two quantities, input K and target G, is:
$$MI(K, G) = \sum_{i}\sum_{j} p(K_i, G_j)\,\log_2\!\left(\frac{p(K_i, G_j)}{p(K_i)\,p(G_j)}\right) \qquad (3)$$
In Equation (3), $MI(K, G) = 0$ means that the input and target variables are independent, a high value of $MI(K, G)$ means that there is a strong relation between K and G, and a low value means that the relation between K and G is loose.
Using (3), we calculate $MI(K, G)$, with the help of which the two types of samples (redundant and irrelevant) are discarded from the input data matrices $P_{nrm}$, $T_{DP,nrm}$, $T_{DB,nrm}$ and $D_{T,nrm}$. According to [27,28], this MI technique achieves acceptable accuracy without a long execution time.
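The following Python sketch illustrates how mutual information between binary-encoded variables could drive such a filter. It is an illustration under stated assumptions rather than the procedure of [27,28]: the median-based encoding follows the description in Remark 3, while the two-stage rule in `select_features` (keep candidates whose relevance exceeds a threshold, then drop any candidate that shares too much information with one already kept) and the names `binarise`, `mutual_information`, `select_features`, `i_th` and `r_th` are hypothetical.

```python
import numpy as np

def binarise(x):
    """Encode values as 1 (above the median) or 0 (at or below it)."""
    x = np.asarray(x, dtype=float)
    return (x > np.median(x)).astype(int)

def mutual_information(k, g):
    """Mutual information, in bits, between two binary sequences."""
    k, g = np.asarray(k), np.asarray(g)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((k == a) & (g == b))          # joint probability
            p_a, p_b = np.mean(k == a), np.mean(g == b)  # marginals
            if p_ab > 0:                                 # treat 0*log(0) as 0
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return float(mi)

def select_features(candidates, target, i_th, r_th):
    """Assumed two-stage filter (irrelevancy first, then redundancy)."""
    relevance = {name: mutual_information(col, target)
                 for name, col in candidates.items()}
    # Stage 1: discard candidates that tell us little about the target.
    ranked = sorted((n for n in relevance if relevance[n] > i_th),
                    key=relevance.get, reverse=True)
    # Stage 2: discard candidates that duplicate an already selected one.
    selected = []
    for name in ranked:
        if all(mutual_information(candidates[name], candidates[s]) <= r_th
               for s in selected):
            selected.append(name)
    return selected

# Made-up example: two lagged loads that track the target and one unrelated input.
target = binarise([10, 20, 30, 40])
candidates = {
    "lag_24h": binarise([11, 19, 31, 42]),
    "lag_48h": binarise([12, 18, 33, 41]),   # identical to lag_24h after encoding
    "noise": np.array([0, 1, 0, 1]),
}
chosen = select_features(candidates, target, i_th=0.5, r_th=0.5)
```

In this toy case only `lag_24h` survives: `noise` is rejected as irrelevant and `lag_48h` as redundant. Working on binary classes keeps the probability estimates to simple counts.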
Remark 1. The data set used for training is historical, i.e., forecasting tomorrow's load requires measured load values of previous days. Although the historical data were time dependent when recorded, these values do not change with respect to the current day. In other words, we deal with previously recorded data, which means that the stationarity assumption is not violated. Thus, the computation of MI is applicable here.
Remark 2. The power consumption/demand of a user differs between holidays and working days. It also varies across hours, such as on-peak and off-peak hours. To better explain our choice, consider the following example. Considering matrix P in Equation (1), let $p(h_1, d_1)$ be the prediction variable. Then there are two possible cases for training: (a) the ANN is trained by all elements of the matrix P except the first row; (b) the ANN is trained only by the first column of the matrix P except $p(h_1, d_1)$. The training samples in case (a) lead to a greater prediction error due to the presence of outliers, whereas the training samples in case (b) lead to a smaller prediction error because the outliers are removed.
Remark 3. To improve the accuracy of a forecast/prediction model, the samples used for training must be made relevant a priori. Also, a smaller number of samples decreases the algorithm's execution time. For these two reasons, we choose local training for each hour. In our approach, the historical load values are locally normalized by local maxima. The normalized values are then binary encoded with respect to the local median. This encoding represents two classes of values: high and low. The classes are used for feature selection only, since mutual information is easily calculated for binary variables. This reduces the computational complexity of the mutual information-based feature selection strategy. Once the redundant and irrelevant samples are removed from the data set, the actual values corresponding to the binary encoded values are used for training and optimization in the remaining modules to prevent information loss. Thus, our approach is a compromise between computational complexity and information loss.
Remark 4. Feature selection is done at the beginning, and the selected features are then used for training during the operational life of the technique. From simulations, we conclude the following: (i) if the data set is small (≤1 month), feature selection has no significant impact on the computational complexity of the overall strategy; (ii) if the data set is of moderate size (≥1 month and ≤3 months), feature selection has some effect on the computational complexity of the overall strategy; (iii) if the data set is large (≥3 months), feature selection has a significant impact on the computational complexity of the overall strategy.

Forecast Module
From the works discussed in Section 2, we conclude that any DALF strategy must be capable of non-linear prediction. Therefore, we choose ANNs because they can capture the highly volatile characteristics of load-time series with reasonable accuracy.
For DALF, two strategies are used: direct forecasting and iterative forecasting [28]. However, as discussed in [41], the first strategy may introduce significant round-off errors and the second introduces large forecast errors. To overcome these imperfections, reference [28] introduced the idea of a cascaded strategy, which our forecast module implements. Our forecast module consists of an ANN with 24 consecutive cascaded forecasters, each of which has one output that forecasts the load of one hour of the upcoming day. It is worth mentioning that the 24 hourly forecasters/predictors are modeled explicitly instead of as a single implicit/complex one. These 24 one-hour-ahead forecasters improve accuracy [28]. The cascaded ANN forecast structure is a combination of the direct and iterative structures, such that the load of each hour of the next day is directly predicted and each forecaster yields exactly one output.
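The per-hour structure can be sketched as follows. This is a simplified illustration, not the authors' model: each of the 24 forecasters is reduced to a single sigmoid neuron trained only on its own hour's history, plain gradient descent stands in for MARA, the hidden layer and the cascading between forecasters are omitted, and all names (`train_neuron`, `forecast_day`, `lags`) and the synthetic data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_neuron(x, y, epochs=500, lr=0.5, seed=0):
    """Fit y ~ sigmoid(x @ w + b) by gradient descent on the squared error.

    Gradient descent is a stand-in here; the paper trains with MARA.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=x.shape[1])
    b = 0.0
    for _ in range(epochs):
        out = sigmoid(x @ w + b)
        grad = (out - y) * out * (1.0 - out)   # derivative of 0.5*error^2 w.r.t. z
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def forecast_day(p_nrm, lags=3):
    """One forecaster per hour, each trained locally on that hour's column.

    p_nrm: array of shape (days, 24) holding normalised loads in [0, 1].
    Returns 24 values, one forecast per hour of the next day.
    """
    n_days, n_hours = p_nrm.shape
    forecast = np.empty(n_hours)
    for h in range(n_hours):
        col = p_nrm[:, h]
        # Inputs: the previous `lags` days at this hour; target: the next day.
        x = np.stack([col[t - lags:t] for t in range(lags, n_days)])
        y = col[lags:]
        w, b = train_neuron(x, y)
        forecast[h] = sigmoid(col[-lags:] @ w + b)   # exactly one output each
    return forecast

# Synthetic normalised history: 30 days x 24 hours (made-up values).
history = np.random.default_rng(42).uniform(0.2, 1.0, size=(30, 24))
next_day = forecast_day(history)
```

Keeping the forecasters separate means each one sees only samples from its own hour, which corresponds to case (b) of Remark 2.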
In the forecast module, each forecaster is an AN that implements a sigmoid function for activation. We choose the sigmoid activation function because it enables the ANs to capture the highly volatile (non-linear), time-variant load characteristics of an SG. Different algorithms have previously been used to update the weights during the training process of the ANN. For example, reference [42] uses the Gradient Descent Back Propagation algorithm. Similarly, references [27,28] suggest the Levenberg-Marquardt algorithm, as it can train the ANN 1-100 times faster than the Gradient Descent Back Propagation algorithm. We use the multivariate auto regressive algorithm (MARA) [43] because it can train the ANN faster than both the Levenberg-Marquardt algorithm and the Gradient Descent Back Propagation algorithm [42]. According to Kolmogorov's theorem, an ANN with a proper number of ANs can solve a problem with one hidden layer. Thus, we consider one hidden layer in the cascaded ANN structure of all 24 ANs.
From the selected features $S_f(\cdot)$ of the pre-processing module, the forecast module constructs training and validation samples, $S_T = S_f(i, j)$ and $S_V = S_f(1, j)$, respectively (where $i \in [2, m]$ and $j \in [1, n]$). That is, the ANN is trained on all candidate inputs except the last one, and this last set of samples of the historical load-time series is used for validation. The validation set is thus drawn from the historical load set but held out of training, so that it is unseen by the ANN. To make the validation error a true representative of the forecast error, the validation set needs to be as close to the forecast horizon as possible. While forecasting tomorrow's load, we choose the samples of one day back for two reasons: (i) daily periodicity; and (ii) short-run trend [44]. Each of the 24 ANs is trained as per MARA using the aforementioned training and validation sets. Further details of the training process to update the weights can be found in [43], and a pictorial view of the learning process is shown in Figure 4. For a finite set of input-target pairs, once the weights are adaptively adjusted as per MARA [43], the forecast module returns the forecast error signal, the mean absolute percentage error (MAPE), to the optimization module:
$$MAPE(i) = \frac{1}{m}\sum_{j=1}^{m}\frac{|p_a(i, j) - p_f(i, j)|}{|p_a(i, j)|}$$
where $p_a(i, j)$ is the actual load value and $p_f(i, j)$ is the forecasted load value. The stepwise operations of the proposed forecast module are shown in Figure 5a.
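A short Python sketch of the hold-out split and the MAPE error signal is given below. It is illustrative only: the function names are hypothetical, row 0 of the feature array is assumed to play the role of $S_f(1, j)$ (the most recent samples), and MAPE is returned as a fraction, so it should be multiplied by 100 to express it in percent.

```python
import numpy as np

def split_train_validation(s_f):
    """Hold out the first row for validation; train on the remaining rows.

    Assumes row 0 holds the samples closest to the forecast horizon.
    """
    s_f = np.asarray(s_f, dtype=float)
    return s_f[1:], s_f[0]

def mape(actual, forecast):
    """Mean absolute percentage error as a fraction (0.1 means 10%)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast) / np.abs(actual)))

# Made-up numbers: two hourly loads and their forecasts.
error = mape([100.0, 200.0], [110.0, 180.0])

# Made-up feature array: 4 rows of samples, 3 features each.
features = np.arange(12.0).reshape(4, 3)
s_train, s_val = split_train_validation(features)
```

Here both forecasts are off by 10% of the actual value, so the resulting error is 0.1. Since the actual load appears in the denominator, the measure is only defined for non-zero loads.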

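[Figure 5a. Stepwise operations of the forecast module (flowchart). Pre-condition: MAPE(i) is the output of the ith AN, for a maximum of 24 ANs. Steps: receive the selected features from the pre-processing module; compute MAPE(i); check whether the maximum number of iterations has been reached.]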

Optimization Module
Based on the nature of the overall forecast strategy, the basic objective of the optimization module is to minimize the forecast error, $E_F(\cdot)$, where $i \in [1, m]$, and $I_{th}$ and $R_{th}$ represent the thresholds for irrelevancy and redundancy, respectively. The optimization module passes the optimized values of $I_{th}$ and $R_{th}$ to the MI-based feature selection module, which uses these threshold values for feature selection. Various optimization choices are available for this purpose, such as linear programming, non-linear programming, quadratic programming, convex optimization, and heuristic optimization. However, linear programming is not directly applicable here because the problem is highly non-linear. The non-linear problem can be converted into a linear one; however, the overall process would become very complex. Non-linear programming is applicable and gives accurate results, but at the cost of execution time. Similarly, quadratic programming and convex optimization suffer from slow convergence. It is worth mentioning that optimization here does not imply exactly reaching the optimum set of solutions; rather, near-optimal solution(s) is (are) obtained. To sum up, heuristic optimization techniques are preferred in these situations because they provide near-optimal solution(s) in relatively less execution time. Differential evolution (DE) is one of the heuristic optimization techniques, proposed in [45], and its enhanced version (EDE) is used for forecast error minimization in [28]. In this paper, we modify the EDE algorithm for the sake of accuracy improvement. A detailed discussion is presented in the following paragraphs.
According to [28], in generation $t$, the $j$th trial vector $y^t_{i,j}$ for the $i$th individual is given by (5), where $x^t_{i,j}$ and $u^t_{i,j}$ are the corresponding parent and mutant vectors, respectively. In (5), $FF_N(\cdot)$ denotes the fitness function ($0 < FF_N(\cdot) < 1$) and $Rand(j) \in [0, 1]$ is a random number drawn from a uniform distribution. Between $X^t_i$ and $Y^t_i$, the corresponding offspring of the next generation, $X^{(t+1)}_i$, is selected as per (6), where MAPE(.) is the objective function. From (5) and (6), it is clear that offspring selection depends on the trial vector, which in turn depends on the random number and the fitness function. From this discussion, we conclude that the selected offspring is not necessarily the fittest. To select the fittest one, our approach eliminates the influence of the random number on offspring selection, i.e., we modify (5) to obtain (7). From (7), it is clear that the trial vector no longer depends on the random number; instead, its dependence is now entirely on the mutant vector, which in turn depends on the parent vector. Offspring selection by this method ensures selection of the fittest individuals, and thus improves accuracy.
Stepwise operations of the optimization module are shown in Figure 5b.
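The crossover and selection logic discussed above can be sketched in Python as follows. This is a sketch under stated assumptions, not a definitive implementation: the random-gated rule (mutant component taken when Rand(j) <= FF_N) is an assumed form based on the description of (5), the modified rule is assumed to take the trial vector directly from the mutant vector, and the function names and the toy objective standing in for MAPE are hypothetical.

```python
import random


def ede_trial(parent, mutant, ff_n, rng):
    """Random-gated crossover (assumed form of the EDE rule):
    take the mutant component when Rand(j) <= FF_N, else keep the parent's."""
    return [u if rng.random() <= ff_n else x for x, u in zip(parent, mutant)]


def mede_trial(mutant):
    """Assumed form of the modified rule: no random number is drawn and
    the trial vector is taken entirely from the mutant vector."""
    return list(mutant)


def select_offspring(parent, trial, objective):
    """Greedy selection: the vector with the lower objective value survives."""
    return trial if objective(trial) <= objective(parent) else parent


def toy_objective(vec):
    """Hypothetical stand-in for MAPE: sum of squared components."""
    return sum(v * v for v in vec)


rng = random.Random(0)          # fixed seed so the sketch is repeatable
parent = [0.6, 0.4, 0.8]        # hypothetical parent vector
mutant = [0.5, 0.3, 0.2]        # hypothetical mutant vector

y_ede = ede_trial(parent, mutant, ff_n=0.5, rng=rng)
y_mede = mede_trial(mutant)

next_ede = select_offspring(parent, y_ede, toy_objective)
next_mede = select_offspring(parent, y_mede, toy_objective)
```

In this toy case the mutant is better than the parent in every component, so a randomly mixed trial vector can never beat the unmixed mutant; this is the situation in which removing the random gate helps.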

Simulation Results
To evaluate the proposed model, we conduct simulations in MATLAB on a machine with an Intel(R) Core(TM) i3-2370M CPU @ 2.4GHz, 2GB RAM, and Windows 7. The proposed MI+ANN+mEDE-based forecast model is compared with two existing DALF models: the MI+ANN forecast [27] and the bi-level forecast [28]. For the simulations, traces of real-time data for DAYTOWN and EKPC (the two USA grids) are taken from the PJM electricity market. This data is freely available at [46]. We have used the January-December 2014 load values for training the ANN, and the January-December 2015 data for testing the ANN. The simulation parameters used in our experiments are listed in Table 2. Justification of these parameters can be found in [27,28,42,43].

... forecast strategy is subjected to step ahead generations. However, during simulations, we observed that from the 89th to the 100th generation, the forecast error does not exhibit significant improvement. Therefore, the proposed and the existing forecast models are not subjected to further generations. There exists a possible trade-off between the accuracy of a forecast strategy and its convergence rate (refer to Sections 1-3). This trade-off is shown in Figure 6c-f. From these figures, it is clear that the bi-level forecast model improves the accuracy of the MI+ANN forecast model at the cost of a relatively slow convergence rate. On the other hand, the newly proposed MI+ANN+mEDE model modifies the EDE algorithm to further improve the accuracy of the bi-level forecast model. More importantly, the MI+ANN+mEDE model improves the prediction accuracy without paying a surplus cost in terms of execution time. However, the execution time of our proposed forecast model is still greater than that of the MI+ANN forecast model due to the integration of the optimization module.
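The stopping observation above (no significant improvement from the 89th to the 100th generation) can be expressed as a simple plateau check. The following Python sketch is illustrative only: the function name, the window length of 12 generations, the tolerance, and the synthetic error curve are assumptions, not values taken from the experiments.

```python
def has_plateaued(errors, window=12, tol=1e-3):
    """Return True when the forecast error improved by less than `tol`
    over the last `window` generations.

    `errors` holds one forecast error value per generation (oldest first).
    """
    if len(errors) <= window:
        return False  # not enough generations to judge
    before_window = errors[-window - 1]
    best_in_window = min(errors[-window:])
    return (before_window - best_in_window) < tol


# Synthetic error curve (hypothetical): steady improvement for 88 generations,
# then no further improvement for generations 89 to 100.
decreasing = [5.0 - 0.04 * g for g in range(88)]
errors = decreasing + [decreasing[-1]] * 12

stop_now = has_plateaued(errors)          # plateau over the last 12 generations
stop_early = has_plateaued(decreasing)    # still improving at generation 88
```

A check of this kind makes the choice of the maximum number of generations explicit instead of relying on visual inspection of the convergence curves.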
Figure 7 shows the impact of dataset size (number of training data samples) on the error performance (see Figure 7a) and the execution time (see Figure 7b) of the three selected models. Figure 7a shows that the error performance of all the compared STLF models improves when the number of lagged input samples increases from 30 to 120. This result follows Equation (1), i.e., the ANN is more finely tuned by increasing the value of n (30 to 120), which improves the forecast error performance. However, this improvement is not significant at much higher tuning, when the number of training samples is increased from 60 to 120 (the curves become stable). On the other hand, Figure 7b shows the cost, in terms of higher execution time, that the fine tuning pays to achieve this relative improvement in forecast accuracy. This is expected because training the ANN takes additional time when the number of training samples increases. From Figure 7a,b, it is clear that the proposed modular model is more scalable than the other two models (a relatively higher degree of stability can be seen for the MI+ANN+mEDE forecast). The reasons for this higher scalability are: the use of selected features for training the ANN, training the ANN via the MARA algorithm with local normalization, and the use of the mEDE algorithm for error minimization.
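A scalability study of this kind can be organized as a sweep over the number of training samples that records both the forecast error and the execution time. The Python sketch below shows only the measurement harness; the synthetic load series and the same-hour averaging forecaster are hypothetical stand-ins and do not represent any of the three compared models.

```python
import math
import time


def synthetic_load(day, hour):
    """Hypothetical hourly load with a daily cycle and a weekly offset."""
    return 1000.0 + 200.0 * math.sin(2 * math.pi * hour / 24) + 5.0 * (day % 7)


def naive_forecast(history, n):
    """Stand-in forecaster: average the same-hour load over the last n days."""
    recent = history[-n:]
    return [sum(day[h] for day in recent) / len(recent) for h in range(24)]


def mape(actual, forecast):
    """Mean absolute percentage error in percent (standard definition)."""
    terms = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)


def scalability_sweep(history, target, sizes):
    """Return (n, MAPE, elapsed seconds) for each training-set size n."""
    rows = []
    for n in sizes:
        start = time.perf_counter()
        forecast = naive_forecast(history, n)
        elapsed = time.perf_counter() - start
        rows.append((n, mape(target, forecast), elapsed))
    return rows


history = [[synthetic_load(d, h) for h in range(24)] for d in range(120)]
target = [synthetic_load(120, h) for h in range(24)]
rows = scalability_sweep(history, target, (30, 60, 90, 120))
```

Replacing `naive_forecast` with an actual forecast model would yield the kind of error and execution-time curves that Figure 7a,b reports.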
Table 7 shows the relationship between MAPE and the number of iterations of the three compared STLF models when tested on the DAYTOWN and EKPC datasets. The convergence characteristics (i.e., the number of iterations) indicate that the proposed MI+ANN+mEDE model and the bi-level model converge to an optimal value in almost the same number of iterations. On the other hand, the MI+ANN model takes only 20-23 iterations to converge to an optimal target value. This result is expected due to the added computational burden in the bi-level and the MI+ANN+mEDE models (i.e., these models use the optimization module), which is not the case in the MI+ANN model (i.e., this model does not use the optimization module). In other words, the MI+ANN model achieves its target of the required training, testing, and validation with the least number of iterations. However, this low computational burden comes at a high cost in forecast accuracy. In this regard, a regression analysis of the network was performed to evaluate the confidence interval of the training, testing, and validation performance of the compared forecast models, and the results are shown in Table 7. Clearly, the proposed MI+ANN+mEDE model achieves the highest confidence interval (i.e., 98%) as compared to the bi-level (i.e., 97%) and MI+ANN (i.e., 96%) models. This means that only 2% of the estimated data is not statistically significant for the network in the case of the proposed MI+ANN+mEDE model. As a result, the forecasted load demand of the proposed MI+ANN+mEDE model is closer to its actual value than that of the other two models (see Figure 6a,b).

Conclusions and Future Work
In SGs, DALF is an essential task because its accuracy has a direct impact on the planning schedules of utilities, which strongly affect the energy trade market. Moreover, high volatility in the historical load curves makes DALF in SGs more challenging than load forecasting for longer durations. Taking into account DALF influencing factors such as exogenous variables and meteorological variables, we have presented a hybrid ANN-based DALF model for SGs, which is a multi-model forecasting ANN with a supervised architecture and MARA for training. The proposed model significantly reduced the execution time and enhanced the forecast accuracy by distinctly carrying out local normalization and local training. Moreover, the sigmoid activation function and MARA enable the forecast strategy to capture non-linearities in the load-time series. Integration of the optimization module (based on our proposed modifications) with the forecast strategy also improved the forecast accuracy. Tests are conducted on three USA grids: DAYTOWN, EKPC and FE. Results show that the proposed model achieves relatively better forecast accuracy (98.76%) in comparison to an existing bi-level technique and an MI+ANN technique. Moreover, the improvement in forecast accuracy is achieved without paying the cost of a slow convergence rate. Thus, the trade-off between convergence rate and forecast accuracy is not created. Finally, from an application perspective, the proposed model can be used by utilities to launch better offers in the electricity market. This means that the utilities can save a significant amount of money through better adjustment of their generation and demand schedules, owing to the high accuracy of the proposed model.
In the future, we are interested in advanced signal processing techniques for feature selection and extraction purposes. Moreover, exploration of particle swarm optimization-based techniques and a complete forecast plus scheduling-based technique is also under consideration.

Figure 2. Classification of existing forecast techniques.

Figure 3. Block diagram of the proposed modular approach for an hour.

Figure 4. Supervised learning of the ANN.

Figure 6. Relative performance of the proposed intelligent modular approach tested on historical data of DAYTOWN and EKPC grids: STLF results for 27 January 2015.

Figure 7. Relative scalability analysis of the proposed intelligent modular approach.

Table 1. Performance analyses of the selected forecast classes.

Table 7. Comparison of training iterations (convergence) and regression analysis (accuracy).