Forecasting of Market Clearing Volume Using Wavelet Packet-Based Neural Networks with Tracking Signals

Abstract: In order to analyze the nature of electrical demand series in deregulated electricity markets, various forecasting tools have been used, all developed with the aim of improving forecast accuracy and reliability. In this work, Wavelet Packet Decomposition (WPD) was implemented to decompose the demand series into subseries. Each subseries was forecasted individually using its own features, which were chosen on the basis of mutual correlation among all time lags using the Auto Correlation Function (ACF). In this context, a new hybrid WPD-based Linear Neural Network with Tapped Delay (LNNTD) model, with a cyclic one-month moving window for one-year market clearing volume (MCV) forecasting, is proposed. The proposed model has been implemented on two years (2015–2016) of unconstrained MCV data collected from the Indian Energy Exchange (IEX) for 12 grid regions of India. The proposed model gives better accuracy than the compared models, with a yearly average MAPE of 0.201%, MAE of 9.056 MWh, and coefficient of regression (R²) of 0.9996. Further, forecasts of the proposed model have been validated using tracking signals (TS's), whose values lie within a balanced range of −4.92 to 6.83, and the universality of the model has been checked using multiple-steps-ahead forecasting up to the sixth step. The results indicate that hybrid models are powerful tools for demand forecasting.


Introduction
In present-day electricity supply markets, load forecasting tools are highly useful because they help manage demand, leading to a transparent electricity price for consumers. Furthermore, they support the security, decision making, reliability, and stability of the transmission system. The literature distinguishes three perspectives of load forecasting by time span: short-, mid-, and long-term. Each approach differs in data complexity and in the input parameters used in coordination with seasonal and environmental factors. It has also been observed that the level of accuracy achieved varies with the forecasting approach used [1,2].
In parallel, the pricing mechanism is also an important issue in electricity markets. Electricity is traded through a bidding mechanism via the power exchange, in which generating companies (GENCOs) submit generation bids corresponding to their bidding prices, and consumers do the same with respect to their load demand. The market is cleared at the equilibrium point where generation and demand bids meet. The quantity of electricity demand at this equilibrium point is called the market clearing volume (MCV), and the lowest price at that point is called the market clearing price (MCP). At the MCP, generation companies must be satisfied to sell their generation, and consumers must be satisfied to purchase their electricity demand corresponding to their respective bids. Hence, a proper bidding strategy is critical for market players seeking to maximize their profit in electricity markets. Generally, limited information is available about the market; therefore, both generators and consumers rely on the available load demand forecasts when preparing their bidding strategies [3,4].
In the last 20 years, many efforts have been made to develop models for MCV forecasting, including statistical, artificial intelligence, signal processing, and data mining-based stand-alone and hybrid models. Among these, artificial intelligence (AI)-based models are promising because of their ability to find hidden relationships between the inputs and outputs of the system. These AI-based models are also the most common, accurate, and efficient ones for load-profile estimation and have been utilized in three different ways: as an individual neural network (NN) and as two kinds of hybrid models (evolutionary and pre-processing-based) [1,5–8]. In forecasting, the main problems associated with the NN are learning and data pre-processing, and most of the available research addresses these factors. The parameters of the NN are determined by gradient search algorithms, which suffer from local minima and are quite sensitive to the initial values, resulting in higher error rates (due to over- and under-training). Thus, global search optimization techniques have been employed to initialize the parameters during the learning and training of the NN. In [9], a traditional Genetic Algorithm (GA) is utilized to optimize the fuzzy rule base of a hybrid fuzzy NN, whereas a modified GA with new genetic operations has been proposed to optimize the fuzzy rules of a Neural Fuzzy Network (NFN) for hourly load forecasting [10]. The problem of over- and under-forecasting during the learning process of the modified Radial Basis Function Neural Network (RBFNN) has been resolved using a GA-based optimization algorithm [11]. To improve the forecasting accuracy of the NN, a Particle Swarm Optimization (PSO)-based algorithm has been employed instead of the Levenberg-Marquardt (LM) algorithm [12].
By employing a four-step-ahead load forecast, the parameters of the Recurrent Support Vector Machine (RSVM) have been optimized using GA. Standard v-SVM suffers from high-frequency components; hence, a Gaussian loss function-based g-SVM has been proposed to approximate the load series with normal trend data and Embedded Chaotic PSO (ECPSO) has been used for parameters selection of g-SVM [13]. In [14], the PSO, based on Wang-Mendel (WM) for optimization of the fuzzy rule base of a load forecaster is demonstrated. To overcome the slow processing speed and over-training of SVM, Ant Colony Optimization (ACO) has been deployed for Wavelet Transform (WT) processed load sub-series [15].
A time series-based wavelet neural network was proposed in 2001, in which the training data was processed through a time-series input selection technique, WT normalized the input data, and the final prediction was made using multi-layer perceptron neural networks with denormalization of the data series [16]. Reference [17] designed four single-hidden-layer FFNNs with WT for 24 h load prediction. The authors in [18] proposed two separate three-layer perceptron networks for prediction of the next-day load, corresponding to the low- and high-frequency components obtained by WT decomposition of a historically similar day load series vector. The hourly seasonal load series behavior has been characterized by different frequency components using WT, and final forecasting has been carried out using a PSO+NN model [13]. Reference [19] reported two hybrid models for peak and minimum load forecasting, wavelet-based fuzzy neural networks (WFNN) and a fuzzy neural network based on the Choquet Integral (FNCI), and achieved better results than the Adaptive Neuro-Fuzzy Inference System (ANFIS). To examine the behavior of the historical load patterns, reference [20] uses a regression model, with the prediction made by an LM algorithm-based wavelet neural network. The authors in [21] utilized WT+NN for primary forecasting and improved the accuracy of this model using a WT-based ANFIS approach.
Similarly, in order to handle non-linear and non-stationary building heat load data, Gao et al. (2020) [22] proposed a novel ensemble prediction model which integrates Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and support vector regression (SVR). The CEEMDAN algorithm automatically decomposes the heat load data patterns into intrinsic mode function (IMF) signals according to the characteristics of the patterns on multiple time scales. In the same year, Gao et al. [23] deployed CEEMDAN for feature extraction in hourly solar irradiance forecasting. A stable average RMSE of 38.49 W/m² has been achieved by using deep learning network-based models, such as the convolutional neural network (CNN) and long short-term memory network (LSTM), on the individual intrinsic time series [23]. Further, a hybrid CEEMDAN and LSTM-based model has also been utilized to improve the forecasting accuracy of stock market prices [24]. Similarly, CEEMDAN has been utilized for the decomposition of wind speed data to improve the forecasting accuracy of NNs [25].
Zhang et al. [26] introduced a variational mode decomposition (VMD) algorithm with the aim of improving the forecasting accuracy of wind speed, in which VMD was deployed to decompose the original wind series data into IMFs, and each subseries was then forecasted using a GA-based NN. Similarly, the decomposed IMFs of wind data have been further denoised using WT, with final forecasting performed by back propagation (BP) and RBF-based NNs [27]. For the forecasting of wind power, a pattern recognition-based hybrid method has been proposed in which VMD is utilized for data processing, Gram-Schmidt Orthogonalization (GSO) is used for feature selection, and, in the last step, an Extreme Learning Machine (ELM) is trained on each feature-based subseries [28]. Similarly, VMD has been utilized for load forecasting with a long short-term memory network (LSTM) [29].
The conventional WT decomposes the signal into low-frequency and high-frequency components. For better accuracy, the low-frequency component of the signal is further decomposed into low- and high-frequency components using multiresolution analysis theory. The decomposed series has been further processed through a Group Method of Data Handling (GMDH)-based algorithm for the forecasting of load data series [30]. In 2021, the same decomposition process was carried out for temperature data, in which the mother wavelet was chosen on the basis of the energy-entropy ratio, and NNs based on different learning algorithms were proposed for the training and testing data [31]. On the other hand, WPD further decomposes both the higher and the lower frequency components of the load profile into lower and higher frequency components; combined with neural networks, it achieved almost 20% more accurate results than traditional WT [32]. An advanced WT has been presented in which the entropy cost function is used to select the best wavelet basis for data decomposition, mutual information is used for feature selection, and neural networks are used for one-step and multi-step-ahead prediction of electricity load [33]. In order to deal with the data noise of WPD, correlation analysis of the decomposed series has been deployed, and the data with all the features has been trained through an improved weighted extreme learning machine [34]. In this paper, to extract the maximum features of the input signal, the data was decomposed using WPD. Unlike WT, it decomposes the approximate and detailed components at the same time to achieve the maximum resolution of the input data.
As per the existing literature, the pre-processing of data is still an open issue from the forecasting point of view. Therefore, in this paper, the authors propose a time series (statistical)-based forecasting model in which WPD is used as an input data pre-processing tool for MCV forecasting. The results of the proposed model have been compared with stand-alone NN and conventional WT-based models for single-step-ahead point forecasting. The contributions are summarised as follows. First, a practical and transparent approach with the newly demonstrated LNNTD model has been implemented to forecast MCV; the input neurons of this model have been selected using ACF. Second, the proposed framework has been implemented to forecast MCV for a period of one year with all seasonal estimation weeks. The concept of a moving window has been adopted with a cyclic test period of one month, up to a one-year forecast, using a one-year training data set. In WPD-based models, two types of input selection criteria have been adopted. In the first, the combination of WPD-based decomposed series with ACF-based time lags (TL's) has been used as the input vector (neurons) of the model, and the same approach has also been used for conventional WT-based models. In the second (proposed) model, each WPD-based series has been forecasted individually, and its TL's have been selected on the basis of mutual correlation among all time lags using ACF. Third, to the best of the authors' knowledge from the existing literature, TS's of the forecasts have been measured for the first time to validate the single-step-ahead forecasted values. Multiple-step-ahead forecasting is conducted using an iterative approach up to the sixth step to check whether the forecast is applicable or not. Section 2 describes the strategy of the proposed model, the experimental work is presented in Section 3, the Discussion is in Section 4, and the paper is concluded in Section 5.

Strategy of Proposed Model
This section covers the methodology associated with MCV forecasting, with the aim of improving the accuracy of forecasts. The proposed model utilizes WPD as a pre-processing tool for LNNTD. In this model, each pre-processed subseries was forecasted individually, and the input features were chosen on the basis of mutual correlation among the TL's using ACF. The performance of the proposed model was compared with standard benchmark stand-alone NN models such as Feed Forward Neural Networks (FFNN), GA-based NN (GANN), and Elman Recurrent Neural Networks (ERNN), along with conventional WT-based NN models. The structural parameters of all compared models are discussed in this section. An extensive study of wavelet selection has also been carried out on all accuracy indices. The details are described in the following subsections.

Input Selection
The demand curve of electricity has been associated with various uncertain factors that will be reflected in the MCV curve. These factors affect the training and weight adjustment of neural networks that create difficulties during the input selection of NN models. Therefore, there is a requirement for special treatment for input selection. The selection of input variables is one of the most important parts of NN-based forecasting models. The input vector (neurons) determines the input architecture of the NN model on which the accuracy of the model is highly dependent. In the present work, a correlation-based time series method has been utilized to select the input time lag data, and the proposed model has been effectively implemented on the two year (2015-2016) IEX unconstrained MCV data [35].
In the time series context, it is necessary to know the relationship between the present-time MCV series and its past time-lag series. The ACF and partial autocorrelation function (PACF) have been used for input selection. ACF and PACF plots of a sample MCV series are shown in Figure 1. High ACF values indicate a strong correlation between successive lags, which drops off quickly at large time lags. The MCV forecast problem aims to find an estimate MCV(t+k) of the MCV curve based on the previous n measurements: MCV(t), MCV(t−1), ..., MCV(t−n). A total of 17 time lags have been selected.
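As an illustration of ACF-based input selection, the following minimal Python sketch computes the sample autocorrelation and keeps the lags with the largest absolute values. The function names, the ranking rule, and the synthetic 24-hour periodic series are assumptions made for illustration; the paper does not state the exact selection rule or list the 17 lags here.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def select_lags(series, max_lag, n_lags):
    """Return the n_lags lags (>= 1) with the largest |ACF|, in ascending order.

    Ranking by |ACF| is an assumed rule, not the paper's documented one.
    """
    acf = autocorrelation(series, max_lag)
    ranked = sorted(range(1, max_lag + 1), key=lambda k: abs(acf[k]), reverse=True)
    return sorted(ranked[:n_lags])


# Hypothetical hourly series with a daily (24-sample) cycle.
demo_series = [math.sin(2 * math.pi * t / 24) for t in range(240)]
demo_acf = autocorrelation(demo_series, 48)
demo_lags = select_lags(demo_series, 48, 3)
```

With real MCV data, `select_lags(mcv, max_lag, 17)` would return 17 candidate lags under this assumed rule.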

Pre-Processing Using Wavelet Transform
In this step, two types of wavelet theories have been adopted for the pre-processing of data: the first is the conventional Wavelet Transform (WT) and the second is Wavelet Packet-based Decomposition (WPD).

Conventional Wavelet Transform (WT)
For an MCV forecasting model, it is necessary to pre-process the input data because of its uncertainty and nonlinearity. The MCV time-series signal collected from the site might contain some corrupted and irrelevant information. This corrupted time series needs to be improved using WT, a mathematical signal processing tool used to handle continuous time-varying signals, which divides the original time series into subseries that can be forecasted better than the original. Based on the time series signal category, WT is of two types: continuous and discrete wavelet transforms.
For an input MCV signal, MCV(t), the continuous wavelet transform (CWT) is defined on the real axis (−∞, ∞) and is given as:

$$W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} MCV(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt$$

where a is the scaling variable and b is the time-shifting (translation) variable of the wavelet function W(a, b), * denotes the complex conjugate, and ψ(t) denotes the mother wavelet. CWT is the continuous scaling and time-shifting of the mother wavelet to either the high-scaled sub-frequency component or the low-scaled sub-frequency component. The high-scaled (low-pass filter) and low-scaled (high-pass filter) frequency components provide approximate and detailed information about the input MCV signal, respectively.
In the discrete wavelet transform (DWT), by contrast, the scaling (a = 2^i) and time-shifting (b = k·2^i) of the mother wavelet function W(a, b) are discrete. Using DWT, the decomposition and reconstruction of the original MCV signal follow Equation (2). DWT involves both high-pass and low-pass filters for the decomposition and reconstruction of the original MCV data pattern [20,21]. The approximate (A_1, A_2, ..., A_N) and detailed (D_1, D_2, ..., D_N) coefficients of the MCV data signal, MCV(t), are shown in Figure 2 as per the multilevel decomposition of Equation (3).
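The multilevel scheme can be sketched in a few lines of Python. This is a minimal illustration only: it uses the Haar wavelet to keep the filters trivial, whereas the paper uses db10, and all function names are hypothetical. It shows that only the approximation is split again at each level and that the signal is recovered exactly from A_N and D_N, ..., D_1.

```python
import math

_S = 1 / math.sqrt(2)  # Haar filter coefficient


def haar_step(signal):
    """One DWT level: low-pass gives the approximation, high-pass the detail."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one level by recombining approximation and detail coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * _S, (a - d) * _S])
    return out


def dwt(signal, levels):
    """Return (A_N, [D_N, ..., D_1]); len(signal) must be divisible by 2**levels."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.insert(0, detail)  # coarsest detail first
    return approx, details


def idwt(approx, details):
    """Rebuild the signal from A_N and the details, coarsest level first."""
    for detail in details:
        approx = haar_inverse_step(approx, detail)
    return approx


# Hypothetical 16-sample signal standing in for an MCV series.
demo_signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0,
               3.0, 1.0, 2.0, 4.0, 9.0, 7.0, 6.0, 2.0]
demo_approx, demo_details = dwt(demo_signal, 3)
demo_rebuilt = idwt(demo_approx, demo_details)
```

In practice a wavelet library would supply the db10 filters; the structure of the recursion stays the same.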

Wavelet Packet Based Decomposition (WPD)
In WPD, both the approximated signal components with low frequencies and the detailed signal components with high frequencies obtained from WT are further decomposed. Therefore, WPD-based decomposition can extract more valuable information from the original raw time series of MCV. The original time series of MCV has been decomposed into four sets of approximated and detailed signals, (A3:1, D3:2), (A3:3, D3:4), (A3:5, D3:6), and (A3:7, D3:8), up to the third level of decomposition, as shown in Figure 3; the waveforms are shown in Figure 4. The Daubechies (db) wavelet has been shown to be one of the most capable of dealing with MCV data, and the analysis for the selection of db is discussed in Section 2.5. The outputs of this technique are much more balanced than those of WT, and it can easily identify weak and singular signal components. If θ(t) is the scaling function and ψ(t) is the wavelet function, the two are related through Equation (4) [31,34]. In Equation (4), l_im and h_im denote the low- and high-pass frequency coefficients of the signal, respectively. The full wavelet packet {δ_n(t)} (n ∈ Z⁺) is built on the basis δ_0(t) = θ(t) and is derived in Equation (5). In Equation (5), h_m and g_m are the wavelet function coefficients, v is the wavelet packet (WP), n denotes the decomposition level, and b is the node position at that level; the wavelet packet coefficients at a specific level are generated by convolving the wavelet and scaling filters with the WP of the previous level. This process is repeated until the desired depth of the binary tree is reached.
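The difference from the conventional WT is that every node, not only the approximation, is split at each level, so a third-level decomposition yields 2^3 = 8 subseries. A minimal Python sketch follows, again assuming the Haar filters for brevity rather than the db10 wavelet used in the paper; the function names are hypothetical and the nodes are returned in filter-path order.

```python
import math

_S = 1 / math.sqrt(2)  # Haar filter coefficient


def haar_split(signal):
    """Split one node into its low-pass and high-pass children."""
    low = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return low, high


def wavelet_packet(signal, levels):
    """Split every node at every level; returns 2**levels coefficient lists."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            low, high = haar_split(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
    return nodes


# Hypothetical 32-sample signal; level 3 gives eight subseries of length 4.
demo_signal = [float(t % 7) for t in range(32)]
demo_nodes = wavelet_packet(demo_signal, 3)
```

Because the Haar filters are orthonormal, the total energy of the eight subseries equals that of the input, which is a convenient sanity check on any implementation.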

Linear Neural Network with Time Delay
The Linear Neural Network (LNN) is a three-layer NN in which the transfer function used is linear rather than hard-limiting. In LNN, there is a delay between the input and summation layer as shown in Figure 5.
The input MCV data pattern is fed to the input layer and passed via the n delays of the Finite Impulse Response (FIR) filter to the summation layer. The output signal of the summation layer is passed through the linear transfer function to the output layer. The network combines the features of the Multi-Layer Perceptron (MLP) structure with a delay layer between the input and summation layers [36–39]. The output of the network has been evaluated using Equation (6). The network has been trained using the standard Back Propagation (BP) algorithm.
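A tapped-delay linear neuron can be sketched as a weighted sum of the n most recent past values plus a bias. The Python sketch below is an illustration under stated assumptions: it uses the per-sample delta (Widrow-Hoff) rule, which is what gradient-based back propagation reduces to for a single linear layer, and the function names, learning rate, epoch count, and sinusoidal test series are all hypothetical rather than the paper's settings.

```python
import math


def tapped_delay_rows(series, n_delays):
    """Pair each target x(t) with its inputs [x(t-1), ..., x(t-n_delays)]."""
    rows = []
    for t in range(n_delays, len(series)):
        rows.append((series[t - n_delays:t][::-1], series[t]))
    return rows


def predict(weights, bias, inputs):
    """Linear transfer function applied to the summed, weighted delays."""
    return bias + sum(w * x for w, x in zip(weights, inputs))


def train_linear(rows, n_delays, lr=0.02, epochs=200):
    """Per-sample gradient descent on the squared error (delta rule)."""
    weights = [0.0] * n_delays
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in rows:
            err = target - predict(weights, bias, inputs)
            weights = [w + lr * err * x for w, x in zip(weights, inputs)]
            bias += lr * err
    return weights, bias


# Hypothetical series: a sinusoid is exactly predictable from two past values.
demo_series = [math.sin(0.3 * t) for t in range(200)]
demo_rows = tapped_delay_rows(demo_series, 2)
demo_weights, demo_bias = train_linear(demo_rows, 2)
demo_mae = sum(
    abs(predict(demo_weights, demo_bias, x) - y) for x, y in demo_rows
) / len(demo_rows)
```

For the paper's setting, `n_delays` would correspond to the ACF-selected time lags of each subseries.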

Wavelet Component Selected Time Lags for Forecasting
• Step 1: From the raw MCV data, an input time series vector has been formed; on the basis of ACF, 17 time-lag series were chosen as the input variables for the stand-alone NN models.
• Step 2: Decomposition of the original MCV time series into approximated (A1-A6) and detailed (D1-D6) subseries using db10.
• Step 3: The fourth-level approximated and detailed MCV subseries, together with the 17 MCV time lags, have been selected as input variables for the conventional WT-based models. The structure of the LNNTD model is shown in Figure 6 and the schematic flow diagram for the conventional WT-based MCV forecasting model is shown in Figure 8.
• Step 4: For the WPD-based model, a third-level decomposition is used [35–39], in which eight different high- and low-frequency component-based series are obtained. Two types of input selection criteria are adopted. In the first, the eight WPD-based decomposed series are used with the 17 ACF-based time-lag series, similar to the conventional WT-based model. In the second (proposed) model, each WPD-based series has been forecasted individually; the schematic flow diagram is shown in Figure 9 and the TL's, selected on the basis of ACF, are presented in Table 2.
• Step 5: For forecasting, the model was trained on one year of MCV data and tested on the next month; this process is repeated for the next 12 months with a one-month moving window, as shown in Figure 6. The number of epochs and the performance goal have been set to 10,000 and 0.001, respectively.
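The moving-window procedure of the last step can be sketched as follows. This is a minimal Python illustration that assumes the training window slides forward together with the test month (the text does not say whether the training window slides or expands); months are indexed from 0, and the function name is hypothetical.

```python
def moving_windows(n_months_total, train_months=12, test_months=1):
    """Return (train_month_indices, test_month_indices) for each window."""
    windows = []
    start = 0
    while start + train_months + test_months <= n_months_total:
        train = list(range(start, start + train_months))
        test = list(range(start + train_months,
                          start + train_months + test_months))
        windows.append((train, test))
        start += test_months  # slide the whole window by one test period
    return windows


# 24 months of data (e.g., January 2015 = 0, ..., December 2016 = 23).
demo_windows = moving_windows(24)
```

With two years of data this yields 12 windows: the first trains on months 0 to 11 and tests on month 12, and the last trains on months 11 to 22 and tests on month 23.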

Accuracy Metrics
To assess the accuracy of the estimated MCV results, the trained NN forecast output has been compared with the actual MCV values. Two measures are used: the Mean Absolute Percentage Error (MAPE) and the Mean Absolute Error (MAE). The MAPE is defined as:

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{MCV_t - F_t}{MCV_t} \right|$$

The MAE is given by:

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| MCV_t - F_t \right|$$

where MCV_t is the actual MCV value, F_t is the estimated MCV value for the t-th hour, and n is the number of hours considered for forecasting.

Consequently, the application of WT makes the input MCV data patterns more suitable and efficient for accurate forecasting by exposing the hidden characteristics of the original MCV data signal. The NN models train better on the input sub-frequency data components than on the actual MCV data, which results in better forecasts. For proper selection of the decomposition level, a complete experimental analysis has been carried out on the basis of the error rate. The results for increasing WT decomposition levels are given in Table 3, and a similar procedure has been adopted for the selection of the Daubechies wavelet, as shown in Figure 10. On the basis of this experimental analysis, the fourth-level decomposition using db10 provides the best-suited forecasts.
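Both accuracy measures described in this section follow directly from their definitions. The short Python sketch below uses hypothetical function names and toy values, not data from the paper.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 / n * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))


def mae(actual, forecast):
    """Mean Absolute Error, in the units of the series (MWh for MCV)."""
    n = len(actual)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / n


# Toy example: errors of 10 on actual values of 100 and 200.
demo_mape = mape([100.0, 200.0], [110.0, 190.0])
demo_mae = mae([100.0, 200.0], [110.0, 190.0])
```

Here the relative errors are 10% and 5%, so the MAPE is 7.5% and the MAE is 10.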

Effect of WPD
On the basis of the existing literature, the third-level decomposition using WPD is found to be the most suitable [30–33]. However, for proper selection of the decomposition level, a complete experiment has also been conducted to verify this on the basis of the error rate. The results for increasing WPD decomposition levels, up to the fifth level, are shown in Table 3, and a similar procedure has been adopted for the selection of the Daubechies wavelet, as shown in Figure 10. The fourth-level WPD error is close to that of the third level, but at the fifth level the error increases suddenly because the number of input variables is almost double that of the fourth level. Similarly, at the sixth level the number of input variables is almost double that of the fifth level; the inputs become very difficult to handle and the error rate rises abruptly. Thus, on the basis of the experimental analyses, third-level decomposition using db10 provides the best-suited forecasts.

Simulation Results
The hourly IEX unconstrained MCV data has been utilized to evaluate the performance of the presented load forecasting models. The data covers the historical load for the two-year period from 2015 to 2016. The authors have not removed any part of the time series, such as anomalous days, special events, or data that might be flawed. The performance of the stand-alone NN models has been compared with the WT-based NN models and the proposed model. The forecasting performance of all models has been evaluated on the basis of accuracy and R² analysis using MATLAB version R2020b.

Accuracy Analysis
The accuracy has been evaluated on a monthly and seasonal-week basis using the MAPE and MAE accuracy indices. The MAPE test results for the year 2016 are presented in Tables 4 and 5. The MAPE performance of all stand-alone NN models is similar, but LNNTD performs better than the other stand-alone NN models. For this reason, LNNTD has been used exclusively with the pre-processing techniques adopted in the present work. From Table 6, the average MAPE (January 2016 to December 2016) of the WT+LNNTD, WPD+LNNTD, and proposed models is 1.753%, 1.162%, and 0.201%, respectively. From Table 7, the average MAEs reported by the WT+LNNTD, WPD+LNNTD, and proposed models are 80.6851 MWh, 53.065 MWh, and 9.056 MWh, respectively.
It is observed that the performance of all stand-alone NN models is almost the same, but when WT is deployed for input pre-processing, the results improve to a large extent. The improved results confirm the utility of WT for input pre-processing. The following observations are made from an accuracy point of view:
• The accuracy of LNNTD is the best among all stand-alone NN models.
• The accuracy of the WT-based NN models is higher than that of the stand-alone NN models.
• The accuracy of the proposed model is the best among all models.
• In spite of that, FFNN is one of the toughest benchmarks to beat.
In order to check the quality of the forecasts, the four seasonal weeks presented in Table 6 [40] have also been taken into consideration. The forecasted results in terms of MAPE and MAE are given in Tables 7 and 8.

Coefficient of Regression R 2 Analysis
R² has been utilized to express the slope of the forecasted MCV against the actual MCV, as described in Table 9. The average value of R² determined for the year 2016 by FFNN is 0.882, which is similar to that of the other NN models. LNNTD outperformed the other stand-alone models in this study, with a value of 0.9034. The R² values of the WT-based models are better than those of the stand-alone NN models and close to unity (0.9996 for the proposed model). When forecasting accuracy is poor, R² moves away from unity; when accuracy is better, it moves closer to unity.

Discussion
In this section, the proposed model performance has been validated on the basis of the calculation of forecast tracking signals, and the universality of the model has been carried out using multiple-step ahead forecasts up to six steps. Further, in order to investigate the forecasting performance, the percentage of improvement in accuracy has also been taken into consideration.

Validation of the Model Using Tracking Signals (TS's)
In 1962, Brown introduced TS's with the objective of providing automatic quality control for forecasting systems [41]. The purpose is to find whether the forecast is behaving properly or misbehaving on the data set. Generally, a TS is applied to a system that covers thousands of data points and where the data set is highly influenced by unsuspected seasonal variations. From the forecasting point of view, it is necessary to detect such situations quickly so that a more appropriate and accurate model may be introduced. The TS is defined as the sum of the forecasting errors divided by the mean absolute deviation [42,43]. TS's are very helpful in finding out whether a forecasting model is balanced or moving in one direction, i.e., over-forecasting or under-forecasting. For balanced forecasts, the TS values should be close to zero and move in both directions, positive as well as negative. A positive value of TS suggests that the model output is less than the actual value (under-forecast), whereas a negative value of TS conveys that the model output is higher than the actual value (over-forecast). When a forecast is consistent in one direction, either positive or negative, it is referred to as a biased forecast. Here, bias means the continuous deviation of forecasts from the mean in one direction. The TS is therefore a very helpful tool for improving or adapting the forecasting model in real-time situations [43,44].
Table 10 contains the overall TS's for the stand-alone, pre-processing-based, and proposed models. As far as under- and over-forecasting is concerned, all models have both positive and negative values of TS, except for one or two models. In the case of the stand-alone models, the range of the tracking signals is very far from zero, although it contains both positive and negative values. Their range is wide because their accuracy was also poor.
On the other hand, the pre-processing-based models have a lower range of TS's; the proposed model is found to be the best, with a maximum tracking signal of 6.839 for the month of September and a minimum of −4.9257 for the month of December. Hence, the proposed model has the most accurate and balanced forecasts, with both positive and negative TS values. Even when a forecast is accurate, the TS's may be out of range because of over- and under-forecasting, as can be seen by comparing the tracking signals and the error signals. As per the MAPE and MAE tables, the accuracy of WPD+LNNTD is comparatively high, with an average MAPE and MAE of 1.162% and 53.065 MWh, respectively. However, the majority of tracking signals for the WPD-based model lie in the negative direction, reaching −28.4919, and the maximum positive value of the tracking signal is 4.056883, for the month of December. The majority of tracking signals for the WT+LNNTD-based model lie in the positive direction, with a maximum value of 133.80, and the only negative value of the tracking signal is −4.126882, for January. Therefore, these forecasts are not balanced. Hence, the value of the TS's depends not only on the accuracy but also on the under- and over-forecasted values of the model used.
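The tracking signal as defined in this section, the sum of the forecasting errors divided by the mean absolute deviation, can be sketched as follows. The error is taken as actual minus forecast so that a positive TS corresponds to under-forecasting, matching the sign convention in the text; the function name and toy values are hypothetical.

```python
def tracking_signal(actual, forecast):
    """Sum of forecast errors divided by the mean absolute deviation (MAD).

    Errors are actual - forecast, so TS > 0 means under-forecasting.
    Undefined (division by zero) if every forecast is exact.
    """
    errors = [a - f for a, f in zip(actual, forecast)]
    mad = sum(abs(e) for e in errors) / len(errors)
    return sum(errors) / mad


demo_actual = [10.0, 10.0, 10.0, 10.0]
demo_under = tracking_signal(demo_actual, [9.0, 9.0, 9.0, 9.0])       # always too low
demo_over = tracking_signal(demo_actual, [11.0, 11.0, 11.0, 11.0])    # always too high
demo_balanced = tracking_signal(demo_actual, [9.0, 11.0, 9.0, 11.0])  # errors cancel
```

The three toy cases have the same MAE of 1, yet their tracking signals are 4, −4, and 0, which illustrates why the TS depends on the direction of the errors and not only on accuracy.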

Multiple Steps-Ahead Forecasting
The performance of the proposed model has also been evaluated for more than one step ahead, to check the universality of the forecasts. In this case, the forecasting is done for more than one hour or day in advance. For multiple-step-ahead forecasting, a single model was trained multiple times, depending on the number of steps, and only the target set was changed according to the number of steps. The target matrix was shifted forward with each additional step, as given in Equations (12) and (13). If a large forecasting horizon is considered, the error that occurs at the first step compounds with each step; therefore, the accuracy of the model becomes lower. The results in Table 11 show the superiority of the proposed model for six-step-ahead forecasting. The proposed model achieved a MAPE of 1.57% and an MAE of 63.29 MWh for the sixth-step-ahead forecast, better than the other models.
Table 11. Multiple steps-ahead forecasting.
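The idea of keeping the inputs fixed and changing only the target set for each step can be sketched as below. This is a minimal Python illustration with hypothetical names and lags; it is not a reconstruction of Equations (12) and (13), which are not reproduced here.

```python
def step_ahead_rows(series, lags, step):
    """Build (inputs, target) pairs for a given forecast step.

    Inputs are the lagged values series[t - lag]; the target is
    series[t + step - 1], so step=1 is the usual one-step-ahead case.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series) - step + 1):
        inputs = [series[t - lag] for lag in lags]
        rows.append((inputs, series[t + step - 1]))
    return rows


# Toy series 0..9 with two hypothetical lags.
demo_series = list(range(10))
demo_step1 = step_ahead_rows(demo_series, [1, 2], step=1)
demo_step3 = step_ahead_rows(demo_series, [1, 2], step=3)
```

For the same inputs `[1, 0]`, the one-step target is 2 and the three-step target is 4; training one model per step on such pairs gives forecasts up to the desired horizon.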

Percentage of Improvement in Accuracy
The percentage of improvement in accuracy is also one of the criteria for investigating the forecasting performance of a model more comprehensively. Both seasonal and yearly average MAPE and MAE have been considered for comparison. Table 12 shows the percentage improvement in MAPE achieved by the proposed model with respect to each model utilized in this work, calculated as follows:

$$\text{Improvement in MAPE}\,(\%) = \frac{\mathrm{MAPE}(Y_r) - \mathrm{MAPE}(Y_p)}{\mathrm{MAPE}(Y_r)} \times 100 \tag{14}$$

$$\text{Improvement in MAE}\,(\%) = \frac{\mathrm{MAE}(Y_r) - \mathrm{MAE}(Y_p)}{\mathrm{MAE}(Y_r)} \times 100 \tag{15}$$

In Equations (14) and (15), Y_p represents the forecasted results of the proposed model and Y_r represents the forecasted results of the other models used in this work for performance comparison. Tables 12 and 13 confirm the superiority of the proposed model, which reduces the error to a significant degree; both MAPE and MAE have been considered for the interpretation of the results. Compared to the other models (both stand-alone NN and WT-based models), the improvement of the proposed model in the yearly average forecast ranges from 82.702% to 94.158% in MAPE and from 82.93414% to 94.39396% in MAE. Similarly, for the seasonal forecast, the improvement ranges from 80.635% to 96.053% in MAPE and from 80.87127% to 96.29833% in MAE.
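The improvement measure is the relative reduction in error with respect to a reference model. The Python sketch below uses a hypothetical function name and plugs in the yearly averages reported in this paper for WPD+LNNTD (reference) and the proposed model.

```python
def improvement_percent(reference_error, proposed_error):
    """Relative error reduction of the proposed model, in percent."""
    return (reference_error - proposed_error) / reference_error * 100.0


# Yearly averages reported in the text: WPD+LNNTD versus the proposed model.
demo_mape_gain = improvement_percent(1.162, 0.201)   # MAPE in percent
demo_mae_gain = improvement_percent(53.065, 9.056)   # MAE in MWh
```

These two values come out at roughly 82.70% and 82.93%, which agrees with the lower ends of the yearly improvement ranges quoted for MAPE and MAE.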

Conclusions
For appropriate electricity management, participation in electricity market bidding, and proper implementation of policies, an accurate load estimation tool is quite important. Electrical engineers and policymakers are working to satisfy various frequency load demand cycles. However, in developing countries, it is very difficult to match the demand and load curves, which makes it difficult to design accurate forecasting tools. Therefore, it has been worth analyzing yearly MCV forecasting performance in terms of accuracy and the R² value. The average MAPE and MAE obtained by the proposed model are 0.201% and 9.056 MWh, respectively, which is far better than the other models. The comprehensive experimental analysis of the tracking signals of the forecast, the multiple-steps-ahead forecasting, and the percentage of improvement in accuracy indicates the superiority of the proposed model. Furthermore, the accuracy can be improved by more efficient handling of input data through pre-processing and post-processing soft computing-based tools.

Data Availability Statement: Publicly available datasets were analyzed in this study. This data can be found here: https://www.iexindia.com/marketdata/areavolume.aspx (accessed on 17 September 2021).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: