Recent Development in Electricity Price Forecasting Based on Computational Intelligence Techniques in Deregulated Power Market

The development of artificial intelligence (AI) based techniques for electricity price forecasting (EPF) provides essential information to electricity market participants and managers because these techniques are better able to handle complex input and output relationships. Therefore, this research investigates and analyzes the performance of different optimization methods in the training phase of the artificial neural network (ANN) and the adaptive neuro-fuzzy inference system (ANFIS) to enhance the accuracy of EPF. In this work, a multi-objective optimization-based feature selection technique capable of eliminating non-linear and interacting features is implemented to create an efficient day-ahead price forecasting model. First, the multi-objective binary backtracking search algorithm (MOBBSA)-based feature selection technique examines various combinations of input variables to choose suitable feature subsets that simultaneously minimize both the number of features and the estimation error. Next, the selected features are passed to the machine learning-based techniques, which map the input variables to the output in order to forecast the electricity price. Furthermore, to increase the forecasting accuracy, a backtracking search algorithm (BSA) is applied as an efficient evolutionary search algorithm in the learning procedure of the ANFIS approach. The performance of the forecasting methods is investigated for the Queensland power market in the year 2018, which is well known as the most competitive market in the world, and compared to show the superiority of the proposed methods over other selected methods.


Introduction
Modern power system planning encompasses diverse resources to accommodate the increasing demand subject to numerous techno-economic and environmental constraints. Price forecasting is of paramount importance to all aspects of power system operation, which is reflected in the abundance of research on operation-related issues [1]. Researchers have put forth several methodologies that differ in their data processing, model selection, calibration and testing phases. Profuse literature is available on load forecasting due to years of extensive research and application, while the

Table 1. Review of the artificial intelligence techniques implemented in the literature for EPF.

| Model | Year | Applied for | Characteristics | MAPE |
|---|---|---|---|---|
| SVM + GA [9] | 2016 | Ontario | A real-time cost of Hourly Ontario Electricity Price (HOEP) in Ontario mainland is predicted hour by hour for 6 test weeks (HOEP and demand are considered as selected features for forecasting analysis) | 9.22% |
| Environmentally adapted generalized neuron model [24] | 2017 | NSW | The test cycle on the Australian electricity market is a one week test on the basis of historical price and demand data | 2.28% |
| Hybrid ANN-artificial cooperative search algorithm [25] | 2019 | Ontario | The test cycle on the Ontario electricity market is about 4 months by taking the consideration of different season conditions on the basis of historical price and demand data | 1.1% |
| ANFIS-BSA [26] | 2019 | Ontario | The test cycle on the Ontario electricity market is about 1 week by taking the consideration of different season conditions on the basis of historical price and demand data for the year 2017 | 0.79% |
| Optimized heterogeneous LSTM [27] | 2019 | PJM | The autocorrelation analysis is conducted to determine price data based on LSTM and EEMD and is used to decompose the electricity price sequence | 2.51% |
| Cuckoo search, singular spectrum analysis and SVM [28] | 2019 | New South Wales | This work considering the dynamic behavior of price series and to find optimum features by defining one threshold in different seasons | 4% |
| Extreme learning machine [29] | 2020 | New York City | Real-time data of the market in 2017 in different seasons has been simulated in order to predict the electricity market. Data were converted to four previous hours and were taken into account to forecast the current price | 1.44% |
| | | | Grey correlation analysis is applied to select efficient EPF and deep neural networks with stacked denoising auto-encoders to denoise data from different sources | 5.64% |
| GAN-based deep reinforcement learning [35] | 2021 | New England electricity market | The proposed approach uses generative adversarial networks to collect synthetic data and increase the training set, and to enhance the forecasting system considering more features, such as temperature and load data, as inputs | Not reported |
| Hybrid deep neural network [36] | 2021 | New York City | Hourly electricity price data from 2015 to 2018 are tested using VMD, CNN and GRU for four seasons | 0.73% |
| Multi-head self attention and EEMD framework [37] | 2021 | New England | LMP, hourly system load with temperature and dew point information, has been used as the input variable to the hybrid model | Not reported |
| LSTM-NN [38] | 2021 | PJM | Features are selected by a combination of entropy and mutual information, and wavelet transform is used to eliminate the fluctuation behavior of electricity load and price time series | 0.63% |
| Combined integration via Stacked Pruning Sparse Denoising Auto Encoder [39] | 2021 | Australian electricity market | The proposed method has been used to decrease the noise of the data set and Tensor Canonical Correlation Analysis to select features with low correlation ranks | 5% |
| Ensemble approach [40] | 2021 | Austria | A bootstrap aggregated-stack generalized architecture has been implemented to facilitate participants with renewable energy resources in real-time | 5.13% |

Energies 2021, 14, 6104
The contributions of this paper are as follows:
• In this work, day-ahead price forecasting is investigated on the basis of price of electricity (PoE) data and demand of electricity (DoE) data at different time intervals. Within the available data set, a feature selection technique is developed to extract the features with the highest impact on price prediction accuracy. The feature selection method combines two techniques, namely ANFIS and MOBBSA: MOBBSA is used to select non-dominated feature subsets from different combinations of input variables, while ANFIS determines the performance of every selected feature subset. Additionally, well-known multi-objective feature selection methods, such as MOPSO, NSGAII and NSGAIII, are simulated for EPF as benchmarks.
• Since electricity price and demand are inherently correlated, predicting them in a deregulated market, such as the Australian electricity market, is a challenging task in a smart grid environment. A novel prediction approach is therefore required, and to meet this requirement, different optimization techniques are implemented in the training phase of an AI-based approach to improve the forecasting accuracy. In particular, BSA is applied to tune the membership functions of ANFIS so that the error is minimized. For comparison, the improvement rates of different robust metaheuristic algorithms in the training phase of the ANFIS and ANN methods are verified on the basis of forecasting accuracy.
• Finally, the proposed method is evaluated through a comprehensive statistical analysis. It is observed that the model developed via a data-driven approach complies with all the necessary constraints. As such, it is suitable for future EPF.
To evaluate the accuracy of electricity price forecasting, the Queensland electricity market is considered, as it is one of the most unstable markets in the world. In Queensland, the Australian Energy Market Operator (AEMO) provides a wide range of forecasting and planning information so that power suppliers and consumers can submit their offers for sale and bids to buy electrical energy. An Independent System Operator (ISO) is then the entity in the market that arranges the price offers of the generators and the bid prices of the consumers. In this open market, the single wholesale market for electricity is known as the National Electricity Market (NEM). This infrastructure is responsible for the purchase and sale of electrical energy between interconnected regions, generating units and retailers. The Queensland competitive market is considered an electric grid that can deliver electricity in a controlled, smart way from points of generation to active consumers. It does so by promoting the interaction and responsiveness of customers, and it offers a broad range of potential benefits for system operation and expansion and for market efficiency. As a result, EPF becomes a massive challenge. Since the pattern of electricity demand changes with seasonality, short-term EPF is more useful for real-time decision making in the deregulated electricity market. Hence, in this work, feature selection and a forecasting method are adopted for short-term EPF, and data from different seasons of this market are used to verify the application of AI to price prediction.
The remainder of the paper is organized as follows: section II summarizes the price forecasting classification. In the same section, the process of the development of the method is briefly explained. In section III, the typical price forecasting approaches, such as ANN and ANFIS, are explained separately. The process of selecting the most influential features for price forecasting is discussed in section V. After investigating the most popular AI techniques, the recent techniques for price forecasting are simulated for the QLD competitive market for different seasons. Finally, the last section concludes the paper and recommends the potential future development of methodologies for accurate electricity price forecasting.

General Framework for the Development of Price Forecasting Method
Electricity price forecasting is generally divided into short-, medium- and long-term categories [13]. Nonetheless, the literature draws no particular boundary line between them. Generally,
• Short-term: This subcategory is the most relevant for daily market operations, and the forecast horizon ranges from a few minutes to several days.
• Medium-term: Medium-term electricity price forecasting supports balance sheet estimation, derivatives pricing and risk management strategies, and the forecast horizon ranges from a few days to a few months. Forecasts at this horizon are generally based on the distribution of prices over the future horizon rather than on actual point predictions.
• Long-term: This type of forecasting concentrates mostly on investment profitability analysis and planning, with a prediction horizon of months, quarters or even years. It generates useful information for evaluating potential sites or fuel sources for generating facilities.
Precise forecasting is a prerequisite for key players and decision-makers in the electricity market to develop an optimal strategy that includes the improvement of socio-economic benefits and risk reduction. Short-term forecasts attract substantial attention and are extensively utilized for economic dispatch and power system control in electricity markets. Therefore, removing impediments in the short-term forecasting of electricity prices will play an instrumental role in managing power systems to meet the growing demand, keeping in line with economic growth that is imperative for sustainable development of different competitive electricity markets.
The proposed electricity price forecasting strategy is presented in Figure 1. After historical price and demand data are collected, the data must be prepared through the selection of significant features. Therefore, in the first step, an enhanced feature selection that combines filter and embedded techniques is used to assess the quality of the features for the forecasting process. In the first stage, mutual information (MI) is applied to reduce the training time, given the high dimension of the available data. In the next stage, MOBBSA is applied to select the features that carry the most important information of the original set. In the second step, a robust forecasting technique based on the combined ANFIS-BSA is designed for day-ahead price forecasting in the highly volatile Queensland market, and BSA and other well-known optimization techniques are used to tune the membership function parameters to improve the price forecasting accuracy. In this study, two types of ANFIS are developed: one is used during feature selection to evaluate the selected input variables, and the other is used during forecasting to predict the electricity price.
In a deregulated electricity market, the electricity price is represented as a function of the electricity demand. In electricity price prediction, the price at time t depends not only on the electricity demand but also on previous values. This relationship between the price and the previous values of demand and price is stated in Equation (1), where DoE(t) and PoE(t) are the demand and price of electricity at time t, respectively, both treated as time series over the interval t. The number of lags of the electricity price is denoted by NL_PoE, and the number of lags of the electricity demand by NL_DoE. In this research, the hybrid ANFIS-BSA uses these two sets of historical input data to forecast the hourly Queensland electricity price (PoE). The historical input data for 2018, consisting of hourly Queensland PoE and DoE, were obtained from [41]. Equation (2) is applied to normalize the dependent and independent variables over the wide-ranging historical data.
The main purpose of data normalization is to bring raw data recorded on various scales to a common scale before the data are processed, as given by Equation (2), where Z and t refer to the value being normalized (in its raw and normalized forms) and the hourly interval, respectively. Assume that one week of hourly lagged values of the exogenous variables (NL_PoE = NL_DoE = 168) is used to predict the electricity price, which gives 336 lagged input variables. If all of these variables are used in the forecasting phase, the learning process may slow down, performance may deteriorate and the model may overfit the training data. Since these aspects are of utmost significance in electricity price prediction, only those features that have a major impact on performance should be selected.
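The data preparation described above can be illustrated with a short sketch. Since Equations (1) and (2) are not reproduced in this text, two assumptions are made here: the candidate inputs are the lagged values PoE(t-1), ..., PoE(t-NL_PoE) and DoE(t-1), ..., DoE(t-NL_DoE), and the normalization is a min-max scaling to [0, 1]. The function names and the toy series are hypothetical.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(series: Sequence[float]) -> List[float]:
    """Scale a series to [0, 1] (min-max scaling is an assumed choice)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(z - lo) / (hi - lo) for z in series]


def build_lagged_samples(
    poe: Sequence[float], doe: Sequence[float], nl_poe: int, nl_doe: int
) -> Tuple[List[List[float]], List[float]]:
    """Build one sample per hour t: lagged PoE and DoE values as inputs, PoE(t) as target."""
    if len(poe) != len(doe):
        raise ValueError("price and demand series must have the same length")
    start = max(nl_poe, nl_doe)  # first hour with a full lag window
    features, targets = [], []
    for t in range(start, len(poe)):
        row = [poe[t - k] for k in range(1, nl_poe + 1)]
        row += [doe[t - k] for k in range(1, nl_doe + 1)]
        features.append(row)
        targets.append(poe[t])
    return features, targets


# Toy hourly series covering 10 days (synthetic values, not market data).
hours = 24 * 10
toy_poe = [50.0 + (h % 24) for h in range(hours)]
toy_doe = [6000.0 + 10.0 * (h % 24) for h in range(hours)]

features, targets = build_lagged_samples(
    min_max_normalize(toy_poe), min_max_normalize(toy_doe), nl_poe=168, nl_doe=168
)
print(len(features), len(features[0]))  # number of samples, candidate inputs per sample
```

With one week of hourly lags for each variable, every sample carries 168 + 168 = 336 candidate inputs, which is the dimensionality that the feature selection stage is meant to reduce.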
The primary goal of a feature selection method is to determine the importance and consistency of the input features so that the best subset of the original feature set, carrying the relevant information, can be selected. Due to the curse of dimensionality, numerous predictors may lower the performance of the extracted models. The core principle of a feature selection technique is to remove redundant or irrelevant features from a data set without a significant loss of predictive precision. A feature selection algorithm combines a search strategy, which proposes candidate feature subsets, with an evaluation metric, which rates the performance of these candidates. The simplest feature selection algorithm evaluates every possible feature subset and picks the one that minimizes the error rate; this is an exhaustive search of the feature space and is computationally intractable for all but the smallest feature sets [42]. Therefore, the choice of evaluation metric and search strategy largely determines the effectiveness of a feature selection algorithm.
Depending on how the search technique is combined with the learning algorithm (evaluation metric) used to construct a model, feature selection methods fall into three classes, namely wrappers, filters and embedded methods [43]. Wrapper methods, in which the search is driven by accuracy, use a predictive model to evaluate feature subsets. Every candidate feature subset is used to train a model that is assessed on a holdout set, and the error rate of the model on this set gives the score of the subset. Because wrapper methods train a new predictive model for each candidate subset, they are computationally intensive, but they usually provide the best-performing feature set for that particular type of model.
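A minimal sketch of the wrapper idea follows. It is not the method used in this paper: a greedy forward search stands in for the search strategy, and a one-nearest-neighbour regressor stands in for the learning algorithm. Both are assumptions made to keep the example short, and all names and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def nearest_neighbour_predict(
    train_x: Sequence[Sequence[float]], train_y: Sequence[float],
    query: Sequence[float], cols: Sequence[int]
) -> float:
    """Predict with the target of the closest training sample, using only `cols`."""
    nearest = min(
        range(len(train_x)),
        key=lambda i: sum((train_x[i][c] - query[c]) ** 2 for c in cols),
    )
    return train_y[nearest]


def holdout_error(x, y, cols: Sequence[int], split: int) -> float:
    """Mean absolute error on the holdout part x[split:], training on x[:split]."""
    train_x, train_y = x[:split], y[:split]
    errors = [
        abs(nearest_neighbour_predict(train_x, train_y, q, cols) - target)
        for q, target in zip(x[split:], y[split:])
    ]
    return sum(errors) / len(errors)


def forward_wrapper_selection(x, y, split: int, max_features: int) -> Tuple[List[int], float]:
    """Greedily add the feature that most reduces the holdout error."""
    selected: List[int] = []
    best_error = float("inf")
    remaining = list(range(len(x[0])))
    while remaining and len(selected) < max_features:
        error, col = min((holdout_error(x, y, selected + [c], split), c) for c in remaining)
        if error >= best_error:
            break  # no candidate improves the holdout score
        selected.append(col)
        remaining.remove(col)
        best_error = error
    return selected, best_error


# Toy data: the target depends on column 0 only; columns 1 and 2 are noise.
rng = random.Random(0)
toy_x = [[rng.random(), rng.random(), rng.random()] for _ in range(250)]
toy_y = [2.0 * row[0] for row in toy_x]
chosen, error = forward_wrapper_selection(toy_x, toy_y, split=200, max_features=2)
print(chosen, error)
```

Every candidate subset triggers a fresh round of model fitting and holdout scoring, which is where the computational cost of wrappers comes from.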
In filter (information gain) methods, a proxy measure is used instead of the error rate to score a candidate feature subset. The proxy measure is chosen to be fast to compute while still capturing the usefulness of the feature set; common choices include mutual information, pointwise mutual information and the product-moment correlation coefficient. The MI technique is already extensively used in electricity market price prediction [44]. Nevertheless, this technique encounters challenges because the candidate inputs given by the electricity market consist of many lagged values of load demand, price and other variables, so it is hard to acquire both the individual and joint probability distributions of the candidate inputs [11]. In addition, the price of electricity is a time-variant signal. Hence, a long history of candidate inputs is not necessarily useful, as market circumstances fluctuate most of the time. As such, the shortage of informative values may mislead the price forecasting process or make it less accurate [11].
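As an illustration of a filter score, the sketch below estimates the mutual information between one candidate input and the target from a joint histogram and ranks the candidates by that score. The equal-width binning, the number of bins and the function names are assumptions, not details taken from this paper.

```python
import math
from collections import Counter
from typing import List, Sequence


def discretize(values: Sequence[float], bins: int) -> List[int]:
    """Assign each value to one of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant series
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: Sequence[float], y: Sequence[float], bins: int = 8) -> float:
    """Histogram estimate of I(X; Y) in nats."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # sum over occupied cells of p(a, b) * log( p(a, b) / (p(a) * p(b)) )
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items()
    )


def rank_by_mutual_information(
    columns: Sequence[Sequence[float]], target: Sequence[float], bins: int = 8
) -> List[int]:
    """Return candidate indices sorted from most to least informative."""
    scores = [mutual_information(col, target, bins) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
```

The estimate relies on the individual and joint histograms being well populated, which is the difficulty noted above when many lagged candidates have to be scored from a limited history.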
Wrapper methods are much more computationally intensive than filter methods, but they use a specific learning algorithm to evaluate the performance of each feature subset. Because filters involve no learning algorithm, the feature set they produce is more general than that of a wrapper and usually yields lower prediction performance. Filter methods are commonly used to reveal the relationships between variables, and they often provide a ranking of features rather than an explicit best feature subset. Hence, a hybrid feature selection method can be created by using a filter as a pre-processing step for a wrapper. In this hybrid method, the filter first reduces the dimensionality of a large data set, which then allows the wrapper to make the final selection of the most suitable features.
Another class of feature selection methods, known as embedded methods, identifies the features that contribute most to precision while the model is being constructed. Embedded methods do not separate the feature selection component from the training process, as feature selection and model construction are accomplished concurrently. Although embedded methods are less computationally intensive than wrapper methods, their major limitation is that they are sensitive to the structure of the underlying model. Hence, the selected features are normally specific to the learning algorithm. Examples of embedded methods are classification trees, such as random forests, and regularization approaches. The most generic version of the embedded feature selection method is the regularization approach, which is often called the penalization approach. It adds constraints to the model development, which simplifies the model by penalizing it for higher complexity.
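The penalization idea can be sketched with an L1-penalized linear model (the lasso), in which features whose coefficients are driven to zero are effectively removed during training. The lasso is chosen here only as a well-known example of a regularization approach; it is not stated to be the embedded method used in this work, and the toy data are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(value: float, lam: float) -> float:
    """Shrink a value towards zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(
    x: Sequence[Sequence[float]], y: Sequence[float], lam: float, sweeps: int = 200
) -> List[float]:
    """Minimize (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(x), len(x[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for the initial w = 0
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in x]
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue
            # correlation of feature j with the residual that excludes feature j
            rho = sum(col[i] * (residual[i] + col[i] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / norm
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                w[j] = new_w
    return w


# Toy data: the target depends on column 0 only; column 1 is an irrelevant pattern.
toy_x = [[i / 10.0, ((2 * i) % 5 - 2) / 10.0] for i in range(20)]
toy_y = [3.0 * row[0] for row in toy_x]
weights = lasso_coordinate_descent(toy_x, toy_y, lam=0.05)
print(weights)  # the coefficient of the irrelevant column is exactly zero
```

The penalty weight lam controls how strongly complexity is penalized: a larger value zeroes out more coefficients and therefore selects fewer features.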

Electricity Price Forecasting (EPF) Techniques
Electricity price forecasting techniques fall into two general groups: hard and soft computing techniques. Various studies have used hard computing approaches with different objectives, such as the transfer function model, the autoregressive integrated moving average (ARIMA), wavelet-ARIMA and mixed models. These approaches need an accurate model of the system so that the algorithm can find the optimal solution while considering physical phenomena. Although their accuracy is found to be high, they need a large amount of information and are computationally expensive. Several studies have also applied soft computing approaches to electricity price forecasting. Some recent approaches include artificial neural networks (ANN) and the adaptive neuro-fuzzy inference system (ANFIS). Generally, these approaches do not need any system modeling, since they develop an input-output mapping based on historical data. They are therefore computationally more efficient and, given correct inputs, offer higher accuracy as well as higher resolution [13]. Accordingly, the main focus of this paper is to review the methods and techniques that have been developed using soft computing models, namely AI techniques.

Artificial Neural Network
Artificial neural networks (ANN) are one of the promising technologies of the last few decades and are used extensively for numerous functions in various fields. In the 1980s, the ANN approach, basically a mathematical model, was introduced for the very first time. An artificial neural network, or simply a neural network, is modeled on the architecture and activity of biological neural networks in the brain. A large number of artificial neurons together construct an artificial neural network, and each neuron is connected to other neurons through synaptic weights (or simply weights). A simple biological neuron has four main parts: a cell body (soma), axons, dendrites and synapses. The dendrites carry input signals into the cell body. The axons transfer the signals from one neuron to the others. The dendrites of one cell and the axon of another cell meet at a point called a synapse. An artificial neural network consists mainly of weights, biases and activation functions. Generally, it can be divided into two main parts: the neurons and the connections between the network layers in which the neurons are located. A typical ANN consists of three main layers: an input layer, a hidden layer and an output layer. The ANN uses the concept of the multilayer perceptron (MLP), which is the most popular ANN method among researchers. The outputs (Y_n) of the ANN are determined from Equation (3), where X_n represents the input values and W_ni stands for the connection weights among the input, hidden and output layers; b_n and f_n are the bias and transfer function, respectively. From this model, the main challenge lies in handling the unknown variable, the transfer function, whose role is to determine the characteristics of an artificial neuron.
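The neuron model described above can be sketched as follows. Equation (3) is not reproduced in this text, so the sketch assumes the standard form in which each output is the transfer function applied to the weighted sum of the inputs plus a bias. The tanh hidden layer, the linear output layer and all names are illustrative choices.

```python
import math
from typing import Callable, List, Sequence


def neuron_output(
    inputs: Sequence[float], weights: Sequence[float], bias: float,
    transfer: Callable[[float], float]
) -> float:
    """Apply the transfer function to the weighted sum of inputs plus the bias."""
    return transfer(sum(w * x for w, x in zip(weights, inputs)) + bias)


def layer_forward(
    inputs: Sequence[float], weight_rows: Sequence[Sequence[float]],
    biases: Sequence[float], transfer: Callable[[float], float]
) -> List[float]:
    """Evaluate every neuron of one layer on the same inputs."""
    return [neuron_output(inputs, w, b, transfer) for w, b in zip(weight_rows, biases)]


def mlp_forward(
    inputs: Sequence[float],
    hidden_weights: Sequence[Sequence[float]], hidden_biases: Sequence[float],
    output_weights: Sequence[Sequence[float]], output_biases: Sequence[float]
) -> List[float]:
    """Forward pass of a three-layer MLP: input, one hidden layer, output."""
    hidden = layer_forward(inputs, hidden_weights, hidden_biases, math.tanh)
    return layer_forward(hidden, output_weights, output_biases, lambda s: s)
```

For price forecasting, the inputs would be the selected lagged PoE and DoE values and the single output the forecast price; training then amounts to choosing the weights and biases that minimize the forecasting error.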

Adaptive Neuro-Fuzzy Inference System (ANFIS)
Fuzzy logic methods rely on predefined rules and cannot learn or adapt to a new condition by themselves. To overcome this, the authors of [45] combined two different methods into a hybrid named ANFIS, which is a mixture of a fuzzy inference system (FIS) and an ANN. ANFIS can be regarded as a fuzzy system whose structure is mapped onto an ANN, so that the parameters of the fuzzy membership functions and of the fuzzy system output can be trained adaptively. ANFIS thus combines the benefits of FIS and ANN. The linguistic variables of the FIS bypass the drawbacks of the complicated procedures of neural networks, while the neural network provides the ability to learn and to adapt to a new condition. Hence, by applying the fuzzy system with ANN learning, this method is able to approximate complicated non-linear mappings. Furthermore, it is acknowledged as a universal estimator for long-, medium- and short-term forecasting [46].
The main reason to develop a system such as ANFIS is to obtain a system whose membership functions (MF) and set of fuzzy rules can be tuned during a training phase. Two sets of parameters can be optimized in the learning steps:
• Antecedent parameters (the MF parameters)
• Consequent parameters (the parameters of the fuzzy system output function)
The output is linear in the consequent parameters, so the linear least-squares method is applied to optimize them. The antecedent parameters are optimized by gradient descent, in conjunction with a procedure very similar to the backpropagation algorithm of neural networks.
ANFIS is usually constructed from five individual layers, each of which has its own node function. Each layer takes its inputs from the nodes of the preceding layer [47]. The layers are arranged sequentially as fuzzification (if-part) in layer 1, product in layer 2, normalization in layer 3, defuzzification (then-part) in layer 4 and, lastly, total output generation in layer 5. The ANFIS structure considered here has two inputs, the independent variables x and y, and a single output, the dependent variable f_out.
Two different kinds of fuzzy inference system are obtained by varying the consequent of the fuzzy if-then rules and the defuzzification procedure. These systems are called Mamdani-based FIS and Sugeno-based FIS.
In many respects, the Mamdani-based FIS approach is similar to the Sugeno approach. The first parts of the fuzzy inference process, fuzzifying the input data and applying the fuzzy operators, are the same for both types. The main distinction between Sugeno-based FIS and Mamdani-based FIS is the way the fuzzy inputs are converted into a crisp output. In Mamdani-based FIS, the crisp output is computed by defuzzifying the fuzzy output, whereas Sugeno-based FIS uses the weighted average method. The Sugeno method gives up the interpretability and expressive power of Mamdani because the rule consequents of the Sugeno method are not fuzzy. In return, Sugeno has a shorter processing time than Mamdani-based FIS, since the weighted average replaces the time-consuming defuzzification. Owing to the intuitive and interpretable nature of its rule base, the Mamdani method is widely used in decision-support applications. A further difference is that Mamdani-based FIS has output membership functions while Sugeno has none, so the Sugeno method gives an output that is either a linear (weighted) mathematical expression or a constant, whereas the Mamdani method gives an output that is a fuzzy set. Sugeno also offers more flexibility in system design than Mamdani-based FIS, as more efficient systems can be achieved when it is integrated with the ANFIS tool [48].
Given that ANFIS is built on Sugeno-based FIS, the first-order Sugeno fuzzy IF-THEN rules are known as the ANFIS rules and are written as:
Rule 1: If x is A1 and y is B1 then z is f1(x, y; p1, q1, r1) = x p1 + y q1 + r1
Rule 2: If x is A2 and y is B2 then z is f2(x, y; p2, q2, r2) = x p2 + y q2 + r2
where Ai and Bi are fuzzy sets, fi(x, y; pi, qi, ri) is the first-order polynomial function that defines the output of the Sugeno-based FIS, x and y are two separate inputs, and z is the output of the ANFIS model.
Within the ANFIS architecture, different layers comprise different node functions, and the nodes within the same layer perform functions of the same type. The layers are described in more detail as follows:
Layer 1: In this layer, x and y are the inputs to node i, Ai and Bi are the linguistic labels, and µ_Ai and µ_Bi are the membership functions of the fuzzy sets Ai and Bi, respectively. The membership grade of a fuzzy set, q1,i, is the output of node i in this layer and indicates the degree to which the given input (x or y) satisfies the linguistic label. In ANFIS, the membership function (MF) of a fuzzy set can be any parameterized membership function, such as the generalized bell-shaped, Gaussian, trapezoidal or triangular function.
Layer 2: Each node in this layer is a fixed node that outputs the product of all the incoming signals. Through this multiplication of the input signals, the firing strength of each rule is determined.
Layer 3: Each node in this layer is a fixed node. In this layer, the firing strength from the previous layer is normalized by computing the ratio of the i-th rule's firing strength to the sum of all rules' firing strengths.
Layer 4: Each node in this layer is an adaptive node with a node function.
Layer 5: This layer has one fixed node that computes the overall output of ANFIS by summing all the incoming signals.
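The five layers can be traced in a short forward-pass sketch for the two-rule, two-input case. The layer equations are not reproduced in this text, so the sketch assumes the standard first-order Sugeno formulation with generalized bell-shaped membership functions; the parameter values in the usage lines are arbitrary and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def bell_mf(value: float, a: float, b: float, c: float) -> float:
    """Generalized bell-shaped MF: 1 / (1 + |(value - c) / a|^(2b))."""
    return 1.0 / (1.0 + abs((value - c) / a) ** (2 * b))


def anfis_forward(
    x: float, y: float,
    mf_params_a: Sequence[Tuple[float, float, float]],
    mf_params_b: Sequence[Tuple[float, float, float]],
    consequents: Sequence[Tuple[float, float, float]],
) -> float:
    """Forward pass with one rule per (Ai, Bi) pair and consequents (pi, qi, ri)."""
    # Layer 1: membership grades of x in Ai and of y in Bi (antecedent parameters)
    mu_a: List[float] = [bell_mf(x, *params) for params in mf_params_a]
    mu_b: List[float] = [bell_mf(y, *params) for params in mf_params_b]
    # Layer 2: firing strength of each rule as the product of its incoming grades
    firing = [ma * mb for ma, mb in zip(mu_a, mu_b)]
    # Layer 3: each firing strength divided by the sum of all firing strengths
    total = sum(firing)
    normalized = [w / total for w in firing]
    # Layer 4: normalized strength times the first-order polynomial fi = x*pi + y*qi + ri
    rule_outputs = [
        w * (x * p + y * q + r) for w, (p, q, r) in zip(normalized, consequents)
    ]
    # Layer 5: overall output as the sum of all incoming signals
    return sum(rule_outputs)


# Two symmetric rules evaluated at x = y = 0, so both rules fire equally.
mf_a = [(1.0, 2.0, -1.0), (1.0, 2.0, 1.0)]
mf_b = [(1.0, 2.0, -1.0), (1.0, 2.0, 1.0)]
rules = [(1.0, 1.0, 0.0), (0.0, 0.0, 4.0)]
print(anfis_forward(0.0, 0.0, mf_a, mf_b, rules))  # average of f1 = 0 and f2 = 4
```

The bell parameters (a, b, c) are the antecedent parameters of layer 1, and (pi, qi, ri) are the consequent parameters of layer 4; these are the two groups of values that training adjusts.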
Lastly, ANFIS uses a hybrid learning algorithm to tune its parameters: the backpropagation algorithm updates the input MF parameters (antecedent parameters) in layer 1, and the least-squares method trains the consequent parameters.
Based on recent research in AI techniques, the simulation results for the Queensland market in different seasons of 2018 are obtained with the hybrid ANFIS-BSA. Therefore, BSA is explained as a recently developed optimization technique for the training phase of ANFIS and is compared with well-known optimization techniques to show that the proposed method can be applied to any deregulated electricity market.

Multi-Objective Backtracking Search Algorithm (MOBSA)
In multi-objective optimization problems, the Pareto optimality method (named after the economist Vilfredo Pareto) is applied to generate a set of solutions to the objectives instead of looking for a single solution. A set of optimal solutions is needed because a single point may not optimize all the objective functions at the same time when the objectives conflict. Consider two feasible solutions under the Pareto optimality technique, designated by ε = (ε1, . . . , εN) and ∂ = (∂1, . . . , ∂N), respectively. To compare them, two vectors of objective functions are used, f(ε) = (f1(ε), . . . , fm(ε)) and f(∂) = (f1(∂), . . . , fm(∂)), as shown in Equation (4). One solution is accepted as optimal over the other only when the condition f(ε) < f(∂) and Equation (4) are satisfied simultaneously; solution (ε) is then called the non-dominated solution and belongs to the Pareto optimal set. The depiction of the Pareto optimal set (containing the objective functions and decision variables) is designated as the Pareto front [49][50][51][52][53][54][55][56].
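As an illustration of the dominance idea, the sketch below uses the usual textbook definition for minimization: one objective vector dominates another if it is no worse in every objective and strictly better in at least one. The function names and the sample (RMSE, number of features) pairs are hypothetical and are not results from this study.

```python
def dominates(f_a, f_b):
    """True if objective vector f_a Pareto-dominates f_b (all objectives minimized)."""
    no_worse = all(a <= b for a, b in zip(f_a, f_b))
    strictly_better = any(a < b for a, b in zip(f_a, f_b))
    return no_worse and strictly_better

def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (error, feature count) pairs, for illustration only
candidates = [(0.10, 30), (0.12, 20), (0.11, 35), (0.15, 20)]
front = non_dominated(candidates)
```

Here `(0.11, 35)` is removed because `(0.10, 30)` is better in both objectives, and `(0.15, 20)` is removed because `(0.12, 20)` has a lower error with the same number of features.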
The Pareto optimal set of a multi-objective BSA scheme is shown in Figure 2. The Pareto front provides a group of optimal outcomes rather than a unique optimal solution in the BSA optimization analysis. In fact, no solution in the Pareto front should be discarded in favor of another, as all of them are considered crucial parts of the optimization technique. It may be impossible to achieve further improvement in the objectives if any one of them is eliminated from the optimization process. Consequently, trade-offs among the solutions are expected, and the most convincing solution is accepted when handling multi-objective optimization problems.
A standardization mechanism is designed to bring the objectives, which are originally defined on distinct scales, onto an approximately common scale in the multi-objective algorithm. To identify a single precise solution (the least-valued solution) from the Pareto set, the normalized outcomes have to be accumulated and compared on the common scale.
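One possible reading of this standardization step is sketched below: each objective is min-max scaled over the Pareto set, the scaled values are summed, and the solution with the smallest total is selected. The min-max scaling, the unweighted sum and the function name are assumptions made for illustration; the exact normalization used in the study is not given in the text.

```python
def pick_compromise(front):
    """Return the solution with the smallest sum of min-max normalized objectives."""
    n_obj = len(front[0])
    lows = [min(p[k] for p in front) for k in range(n_obj)]
    highs = [max(p[k] for p in front) for k in range(n_obj)]

    def score(p):
        total = 0.0
        for k in range(n_obj):
            span = highs[k] - lows[k]
            # An objective with no spread contributes nothing to the ranking
            total += (p[k] - lows[k]) / span if span > 0 else 0.0
        return total

    return min(front, key=score)
```

With equal weights, a solution that is moderately good on every objective is preferred over one that is best on one objective and worst on another.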
The backtracking search algorithm (BSA) is a recent evolutionary algorithm with a simple structure. It is capable of solving multimodal functions and various numerical optimization problems. In BSA, two advanced crossover and mutation operators are used to generate the trial population. These operators differ in structure from those of other evolutionary algorithms (e.g., GA and DE). BSA has only one control parameter and is insensitive to its initial value. As such, it overcomes the drawbacks of metaheuristic methods that have many control parameters, are sensitive to the initial values of these parameters, and suffer from premature convergence and time-consuming computation. The overall optimization process of BSA for feature selection and price prediction has six stages: initialization, selection-I, mutation, crossover, boundary control and selection-II [57].
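The six stages can be sketched for a single-objective, real-valued problem as follows. This is a simplified illustration of a commonly described BSA formulation, not the authors' MATLAB implementation; the mutation scale `3 * N(0, 1)`, the function name and the default settings are assumptions.

```python
import numpy as np

def bsa_minimize(fitness, low, up, n_pop=20, n_iter=200, mixrate=1.0, seed=0):
    """Minimal single-objective BSA sketch (illustrative only)."""
    rng = np.random.default_rng(seed)
    low, up = np.asarray(low, float), np.asarray(up, float)
    n_var = low.size
    # Initialization: population and historical population, uniform in the bounds
    pop = rng.uniform(low, up, size=(n_pop, n_var))
    old_pop = rng.uniform(low, up, size=(n_pop, n_var))
    fit = np.array([fitness(ind) for ind in pop])
    for _ in range(n_iter):
        # Selection-I: occasionally refresh the historical population, then shuffle it
        if rng.random() < rng.random():
            old_pop = pop.copy()
        old_pop = rng.permutation(old_pop)
        # Mutation: move along the direction given by the historical population
        mutant = pop + 3.0 * rng.standard_normal() * (old_pop - pop)
        # Crossover: choose which variables of each individual come from the mutant
        take_mutant = np.zeros((n_pop, n_var), dtype=bool)
        if rng.random() < rng.random():
            for i in range(n_pop):
                k = int(np.ceil(mixrate * rng.random() * n_var))
                take_mutant[i, rng.permutation(n_var)[:k]] = True
        else:
            take_mutant[np.arange(n_pop), rng.integers(n_var, size=n_pop)] = True
        trial = np.where(take_mutant, mutant, pop)
        # Boundary control: regenerate any variable that left the search space
        out = (trial < low) | (trial > up)
        trial = np.where(out, rng.uniform(low, up, size=trial.shape), trial)
        # Selection-II: greedy replacement of individuals by better trials
        trial_fit = np.array([fitness(ind) for ind in trial])
        better = trial_fit < fit
        pop[better] = trial[better]
        fit[better] = trial_fit[better]
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])
```

A call such as `bsa_minimize(lambda v: float(np.sum(v ** 2)), [-5, -5, -5], [5, 5, 5])` runs the sketch on a simple sphere function; because Selection-II is greedy, the best fitness never worsens from one iteration to the next.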
In multi-objective BSA schemes, a number of optimal solutions are generated as a Pareto optimal set instead of a unique solution. To handle the generated dominated and non-dominated solutions properly and to improve the existing algorithm, a mechanism based on the Pareto dominance concept is included [58]. In the initial stage, a considerable number of offspring (T) are generated using the crossover and mutation processes of the multi-objective BSA algorithm. Then, the i-th member of the offspring (Ti) is compared with the i-th member of the population (Pi) based on the notion of Pareto dominance, and the population member is replaced by the offspring when Ti dominates Pi. In the next stage, the BSA algorithm, which originally searches for a single global minimum, is transformed into a multi-objective method. A Pareto optimal set is maintained to store the non-dominated solutions instead of reporting a single global minimum. Establishing the external elitist archive and the crowding procedure in the multi-objective BSA algorithm requires several steps, which are shown in Figure 3 and described in the following order.
Step 1: The initial population (P) is generated randomly, with a size matching the optimization parameters, as given in Equation (5).
Step 2: The fitness of every member of the initial population is evaluated. After the members are classified, only the non-dominated solutions are stored in the external archive.
Step 3: Equation (6) is used to compute the historical archive of the parent population in the BSA optimization algorithm.
Step 4: The archived members of the non-dominated solutions are updated in every iteration of the optimization procedure by following the "if-then" rules in Equation (7).
Step 5: A mutation technique, governed by Equation (9), is applied to the population stored in the historical archive to generate the offspring.
Step 6: Using the crossover strategy in Equation (10), the final form of the offspring (T) is obtained in every iteration from the trial population previously stored in the archive.
Step 7: After the crossover is completed, a member of the offspring population is replaced by an alternative one if it violates the threshold condition (stated in Equation (11)) on the size of the non-dominated set in the external elitist archive.
Step 8: The i-th member of the parent population (Pi) is replaced by the i-th member of the produced offspring (Ti) when Ti dominates Pi.
Step 9: The solutions in the elitist archive are then reordered according to the rules explained in Step 4.
Step 10: The objective space is divided into regions after measuring the crowding distances of the solutions in the external elitist archive. Each solution is then placed in a specific region according to its objective values. When the external archive has no room for a new non-dominated solution, a randomly chosen solution from the most densely populated region is removed to make room for the incoming solution.
Step 11: If the redistribution process does not meet the optimization requirements, set g = g + 1 and repeat the optimization process from Step 4.

Figure 3 (flowchart stages): Step 1 Initialization; Step 2 Selection-I; Step 3 Mutation; Step 4 Crossover; Step 5 Boundary Control; Step 6 Selection-II; End.

$P_{i,j} \sim U(low_j, up_j), \quad i = 1, \ldots, nPop, \quad j = 1, \ldots, nVar$ (5)
where nPop is the population size, nVar is the number of optimization variables, U is the uniform distribution function, and low_j and up_j are the lower and upper bounds of the j-th variable. In Equation (7), := denotes the update operation, and a and b are randomly generated numbers. In the crossover, mixrate is the control parameter of the optimization algorithm.
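The archive maintenance described in Steps 2 and 10 can be sketched as follows. The sketch assumes a fixed-capacity archive of objective vectors and a uniform grid over the objective space for measuring crowding; the function names, the grid resolution and the use of objective tuples alone (without the decision variables) are simplifying assumptions.

```python
import random

def dominates(f_a, f_b):
    """True if f_a Pareto-dominates f_b (all objectives minimized)."""
    return (all(a <= b for a, b in zip(f_a, f_b))
            and any(a < b for a, b in zip(f_a, f_b)))

def update_archive(archive, candidate, capacity, n_cells=5, rng=random):
    """Insert a candidate into a bounded non-dominated archive (illustrative)."""
    # Reject candidates that are dominated by, or identical to, an archived solution
    if any(dominates(a, candidate) or a == candidate for a in archive):
        return archive
    # Drop archived solutions that the candidate dominates, then add it
    archive = [a for a in archive if not dominates(candidate, a)] + [candidate]
    if len(archive) <= capacity:
        return archive
    # Archive is over capacity: split the objective space into grid cells
    n_obj = len(candidate)
    lows = [min(a[k] for a in archive) for k in range(n_obj)]
    highs = [max(a[k] for a in archive) for k in range(n_obj)]

    def cell(a):
        return tuple(
            min(int((a[k] - lows[k]) / ((highs[k] - lows[k]) or 1.0) * n_cells),
                n_cells - 1)
            for k in range(n_obj))

    cells = {}
    for a in archive:
        cells.setdefault(cell(a), []).append(a)
    # Remove a randomly chosen member of the most crowded cell
    crowded = max(cells.values(), key=len)
    archive.remove(rng.choice(crowded))
    return archive
```

Removing a solution from the most crowded cell, rather than at random from the whole archive, keeps the retained solutions spread along the front.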

Selection of Forecast Model Inputs via Multi-Objectives
In this section, feature selection is developed based on filter and wrapper techniques to determine the most effective inputs for forecasting in the Queensland deregulated electricity market. For a particular model type, the wrapper method generally provides the most relevant features, but it requires an efficient search algorithm during the training phase. Because an exhaustive search is normally impractical, a sequential search is often recommended, in which features are added or eliminated until the performance of the model no longer improves. However, the sequential search can stagnate at a local optimum. Therefore, metaheuristic algorithms, also known as randomized search algorithms, are suggested for exploring different regions of the feature subset space. Metaheuristic algorithms for feature selection include random search procedures to avoid becoming trapped in a local optimum. Several metaheuristic algorithms, namely GA, SA, ACO and PSO, have been used in the context of feature selection [42].
When feature selection is formulated as a single-objective optimization problem in metaheuristic algorithms, the number of features must be predefined, and the algorithm then searches for the best feature subset with that fixed number of features. Typically, however, feature selection has two conflicting goals: simultaneously minimizing both the estimation error and the number of features. Consequently, feature selection is formulated as a multi-objective problem with two key goals, namely optimizing the model effectiveness and minimizing the number of features, and the decision is a trade-off between these two objectives. The multi-objective formulation of feature selection yields a non-dominated set of feature subsets in order to fulfill multiple requirements.
NSGA-III, NSGA-II and multi-objective particle swarm optimization (MOPSO) have been investigated for obtaining the Pareto front of feature subsets [58,59], yet a more effective search approach is still required to improve the solution of feature selection problems [60]. The existing multi-objective feature selection algorithms suffer from high computational cost, many control parameters and high sensitivity to the initial values of those parameters. By contrast, BSA has just a single control parameter and is considered computationally less costly than other metaheuristic techniques [54]. A binary-valued BSA (BBSA) is proposed in [61] for solving discrete parameter optimization problems.
Although BBSA is used as an effective search algorithm for feature selection in [60], it treats feature selection as a single-objective problem and is not specifically applied to multi-objective feature selection. Several variants of multi-objective BSA (MOBSA) were established in [54,62]. Statistical analyses in [54] indicate that MOBSA is a promising optimization strategy for solving high-dimensional multi-objective problems compared with several established multi-objective evolutionary algorithms (e.g., MOPSO, NSGA-III and NSGA-II). Hence, this work proposes a BBSA-based algorithm for multi-objective feature selection as a potential means of obtaining the Pareto front of non-dominated feature subsets. In the evolutionary training procedure, the BSA-based multi-objective algorithm can use any learning algorithm (e.g., ANN, SVM, ANFIS) to assess the quality of each candidate feature subset. Owing to its ability to rapidly learn to estimate non-linear functions, ANFIS is known as a universal estimator [60]; thus, it is incorporated in the proposed wrapper-based multi-objective feature selection method as the evaluation method. In particular, ANFIS employs an effective hybrid learning technique that integrates the least-squares method with gradient descent. The least-squares method speeds up the training [63], so ANFIS can develop a predictive model after only a few epochs of training. Models are built for the different combinations of features selected by multi-objective BBSA (MOBBSA); because the least-squares approach is computationally efficient, a single run or a few runs of the least-squares technique are used to train them. To establish the final model, the non-dominated feature subset with the best performance is preferred.
Prior to implementing feature selection to obtain the subsets of input variables with maximum relevance and minimal redundancy for short-term EPF, the dependent and independent variables are randomly divided into two sets: 70% as the training set and 30% as the test set. ANFIS models are constructed on the training set with various subsets of input variables, whilst the test set is used to assess the robustness and usefulness of the generated models.
As the multi-objective feature selection proceeds, MOBBSA searches through diverse combinations of input variables and the selection of the non-dominated feature subsets advances, while ANFIS is used as the evaluation metric to assess the performance of each feature subset. Each MOBBSA individual represents one subset of input variables during the training of the applied learning method (ANFIS). The feature selection method retains the non-dominated feature subsets in an external elitist archive and, using the principle of Pareto dominance, concurrently minimizes the root mean square error (RMSE) on the test set and the number of input variables in order to reach globally optimal solutions. MOBBSA, like the BSA from which it is derived, has only a single control parameter, called "mixrate", which limits the number of individuals involved in the crossover phase. Therefore, the maximum mixrate value (i.e., 100% of the population size) is used in the feature selection approach so that every individual is included in the crossover stage. The Sugeno-based FIS, rather than the Mamdani-based FIS, is used to build the ANFIS framework for feature selection, since this type of model is well suited to modeling non-linear functions by interpolating between multiple linear functions. Following [64], scatter partitioning is used in the training process to enhance feature selection. The main scatter partitioning technique, known as subtractive clustering, is the key to creating the ANFIS for feature selection.
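The wrapper evaluation of one candidate can be sketched as a function that maps a binary feature mask to the two objectives. For brevity, the sketch swaps in an ordinary least-squares linear model as a stand-in for ANFIS; the function name, the 70/30 random split helper and the seed handling are assumptions and not the study's implementation.

```python
import numpy as np

def evaluate_subset(mask, X, y, train_frac=0.7, seed=0):
    """Return (test RMSE, number of selected features) for a binary feature mask.

    A linear least-squares model is used here only as a stand-in for ANFIS.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return float("inf"), 0  # an empty subset cannot be evaluated
    # Random 70/30 split into training and test sets
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    tr, te = idx[:n_train], idx[n_train:]
    # Fit on the selected columns plus an intercept term
    A_tr = np.column_stack([X[tr][:, mask], np.ones(len(tr))])
    A_te = np.column_stack([X[te][:, mask], np.ones(len(te))])
    coef, *_ = np.linalg.lstsq(A_tr, y[tr], rcond=None)
    rmse = float(np.sqrt(np.mean((A_te @ coef - y[te]) ** 2)))
    return rmse, int(mask.sum())
```

Each individual in the binary population would be scored this way, and the resulting (RMSE, feature count) pairs compared by Pareto dominance.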
The mutual information of the input features is evaluated, and insignificant and redundant features are filtered out to create a smaller input subset. The feature selection technique is then applied to the reduced subset to identify a smaller set of features with high predictive accuracy. In the initial stage, the mutual information (MI) between each individual input variable and the output is computed using the MI formulation in [25]. A higher mutual information value indicates a stronger dependence between the output and the input variable. The input features are then sorted in descending order of their computed mutual information.
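The filter stage can be sketched with a simple histogram-based estimate of mutual information. The estimator, the number of bins, the function names and the `threshold` argument are assumptions for illustration; the MI formulation of [25] may be scaled differently, so the values produced here are not directly comparable with those in the study.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram estimate of the mutual information between x and y (in nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = p_xy > 0                            # skip empty cells to avoid log(0)
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))

def filter_by_mi(X, y, threshold):
    """Rank features by MI with the output and keep those at or above a threshold."""
    scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    ranked = sorted(range(X.shape[1]), key=lambda j: scores[j], reverse=True)
    return [j for j in ranked if scores[j] >= threshold], scores
```

The returned indices are already in descending order of mutual information, which matches the sorting described above.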
To reduce the running time of feature selection, a two-stage feature selection is proposed in this work. Input features that have little influence on the output, i.e., whose mutual information is lower than the relevancy threshold (TH), are eliminated. For filtering out these features, the relevancy threshold is set to TH = 0.46. After the filtering procedure, the 69 most important features are retained. Out of these 69 candidates, 27 features with the highest significance and the highest dissimilarity are identified by MOBBSA as inputs for the subsequent prediction procedures. Moreover, to evaluate the effectiveness of the proposed multi-objective feature selection method, it is compared with MOPSO, NSGA-III and NSGA-II. The optimal subsets of input variables found by the multi-objective feature selection methods, along with their RMSE values and computational times, are tabulated in Table 2. According to the obtained results, on the same test set, the suggested feature selection technique outperforms the other techniques since it yields a lower estimation error and fewer features.
Table 2. Optimal subsets of input variables selected by the studied multi-objective feature selection methods, with the RMSE and computational time values that represent their performance.
Figure 4 presents the sequential steps to obtain the AI-based models for short-term EPF, which are carried out for all models as follows:

Sequential Steps to Obtain AI-Based Models for Short-Term EPF
Energies 2021, 14, 6104
Step 1: To evaluate the efficacy of the applied methods for EPF in different seasons, one month in each season was considered (i.e., February (summer), May (autumn), August (winter) and November (spring)) due to seasonality effects.
Step 2: Designing the training phase entails deriving the algorithms responsible for mapping the input variables to the output variables; Equation (2) is used to normalize the input and output variables to speed up the learning process.
Step 3: To predict the electricity price (PoE) precisely, ANFIS-BSA is implemented by minimizing a cost function defined in terms of PoE(t)observed and PoE(t)forecasted, the actual and predicted electricity prices, respectively, and N, the number of observations.
Step 4: The purpose of the testing phase is to evaluate the performance of the AI approaches on datasets that played no role in building the models.
Once the selected features (PoE and DoE) are finalized, the most influential features are transferred as the input variables to the subsequent forecasting process. To evaluate the forecasted price, a data set beyond the training data set, known as the testing data set, is employed. The reliability of the generated model can be determined using the considered scores in the testing process.
The U-statistic always yields a value in the range [0, 1], where zero represents higher forecasting precision, and one indicates that the estimation is as inaccurate as a naïve guess.
The adequacy with which the models describe a given data series is checked through the whiteness test, also known as the Durbin-Watson test [65], which is performed as a confirmatory analysis. The objective of the confirmatory analysis is to confirm the whiteness of the estimated residuals (e(t)), i.e., that they are uncorrelated.
To prove the effectiveness of the proposed model, the Akaike Information Criterion (AIC) for different months is calculated [66] as (18). AIC deals with the trade-off between the goodness of fit of the model and the simplicity of the model. In other words, AIC deals with both the risk of overfitting and the risk of underfitting.
$AIC = n \times \log(RMSE) + 2k$ (18)
where n represents the number of observations, k represents the number of coefficients optimized by a model and RMSE is the root mean square error.
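The evaluation scores can be computed as in the following sketch. The RMSE, MAPE and Durbin-Watson formulas are the standard ones, and the AIC follows Equation (18) with the natural logarithm assumed. The U-statistic is written in the bounded Theil form, which is an assumption chosen because it lies in [0, 1]; the exact expression used in the study is not shown in the text.

```python
import math

def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def theil_u(actual, forecast):
    """Bounded Theil U-statistic: 0 is a perfect forecast, values stay within [0, 1]."""
    denom = (math.sqrt(sum(a * a for a in actual) / len(actual))
             + math.sqrt(sum(f * f for f in forecast) / len(forecast)))
    return rmse(actual, forecast) / denom

def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest uncorrelated residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)

def aic(n, rmse_value, k):
    """AIC as in Equation (18): n * log(RMSE) + 2k (natural log assumed)."""
    return n * math.log(rmse_value) + 2 * k
```

Because the penalty term `2 * k` grows with the number of optimized coefficients, two models with the same RMSE are separated in favor of the simpler one.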

Simulation Results and Discussion
The electricity price forecasting process and all of the feature selection techniques were coded in MATLAB (R2019a) and run on a personal computer with a Core 2 Quad processor at a 2.6 GHz clock speed and 4 GB of RAM. Because the Queensland market is regarded as one of the most volatile electricity markets, ANFIS-BSA is implemented in this study to improve the precision of electricity price forecasting (EPF) for mainland Queensland. In this section, the significant features are selected by combining multi-objective BSA and ANFIS and are used as inputs for the prediction. Additionally, the efficacy of ANFIS-BSA for short-term EPF accuracy is assessed by comparison with well-known AI techniques: ANN, ANFIS, ANN-GA, ANN-PSO, ANFIS-GA, ANN-BSA and ANFIS-PSO. There are two types of variables, dependent and independent. The data are divided into two subsets: the hourly data of the first three weeks of each month are used for the training phase, and the last week of each month is used for the testing phase of the models. As there is no consensus on the optimal parameter settings of AI-based methods, the control parameters of the adopted methods follow settings that have been used effectively for energy price or demand forecasting in the literature.
A backpropagation MLP, which adopts a feed-forward structure trained with a backpropagation (BP) learning algorithm, is used as the ANN model. This network is usually referred to as a universal estimator because of its simple structure and fast computation, which enable it to be trained on large input data sets. Several factors, such as the network architecture, learning algorithm and transfer function, affect the effectiveness of the models developed by ANN. As recommended in [67], two hidden layers are adopted to form an MLP model for short-term EPF, with Levenberg-Marquardt BP learning and the logarithmic sigmoid transfer function. The ANFIS structure is based on the Sugeno-type fuzzy inference system (FIS) instead of the Mamdani-type FIS. The Sugeno type is selected because it is able to model non-linear systems by interpolating between multiple linear models. As suggested in [68], the Gaussian membership function is selected, and ANFIS is set up by subtractive clustering (radii = 0.8). Based on recent studies, different ANFIS models are computed in this work. The results in Table 3 show that subtractive clustering (SC) outperformed the other ANFIS models. GA is implemented as recommended in [69], using the genetic algorithm optimization toolbox (GAOT) with scattered crossover and Gaussian mutation. All the control parameters, such as the crossover rate and mutation rate, are set to their default values as suggested in [70], and the population size of the GA is set to 100.
In this study, a standard form of the PSO algorithm is also implemented, with the population size set to 100. A value of 2.0 is selected for both acceleration factors (c1 and c2). As stipulated in [71], an inertia weight ω that decays from 0.9 to 0.4 as the run progresses is selected. The control parameter of BSA sets the number of individuals in the population that participate in the crossover process and varies between 0% and 100%. Nevertheless, the performance of BSA is not sensitive to the variation of this parameter. Thus, to achieve the best solutions, the maximum value was selected so that all individuals take part. The adjustable parameters of all the techniques are presented in Table 4. For February, May, August and November 2018, the performance of the machine learning methods applied to the EPF of QLD, together with their computational times, is summarized in Tables 5-8, respectively. The obtained RACF values validate the whiteness of the estimated residuals within the asserted confidence range for all established models in the different months. Regarding Table 5, the forecasting methods for the QLD electricity market are ranked by adopting the mean rank approach, an indicator-based multi-criteria evaluation. Considering the mean rank of the methods over the indicators (absolute error, RMSE, U-statistic and MAPE), the ranked order of the applied EPF methods in terms of forecasting effectiveness is ANFIS-BSA > ANFIS-PSO > ANFIS-GA > MLP-BSA > MLP-PSO > MLP-GA > ANFIS > MLP. Table 9 reports the AIC index, which identifies the better-fit model. For the different months of the different seasons, the results indicate that the optimized ANFIS methodologies generate a better-fit estimation than the previously researched techniques.
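The mean rank approach can be sketched as follows: each method is ranked separately on every indicator (lower values are better for all four error indicators), and the ranks are then averaged. The function name is hypothetical, the method labels and numbers in the usage lines are invented placeholders rather than values from Tables 5-8, and ties are not given shared ranks in this simple version.

```python
def mean_rank(scores):
    """Rank methods per indicator (lower is better) and average the ranks.

    scores: {method: {indicator: value}}
    Returns (methods ordered from best to worst, {method: mean rank}).
    """
    methods = list(scores)
    indicators = list(next(iter(scores.values())))
    totals = {m: 0.0 for m in methods}
    for ind in indicators:
        # Rank 1 goes to the method with the smallest value on this indicator
        ordered = sorted(methods, key=lambda m: scores[m][ind])
        for rank, m in enumerate(ordered, start=1):
            totals[m] += rank
    means = {m: totals[m] / len(indicators) for m in methods}
    return sorted(methods, key=means.get), means

# Placeholder values for illustration only
example = {
    "method_a": {"rmse": 1.0, "mape": 2.0},
    "method_b": {"rmse": 2.0, "mape": 3.0},
    "method_c": {"rmse": 3.0, "mape": 4.0},
}
order, means = mean_rank(example)
```

Averaging ranks rather than raw values keeps indicators with different units, such as RMSE and MAPE, from dominating one another.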
Additionally, BSA is found to be the most effective optimization algorithm for training the ANFIS, as ANFIS-BSA attained lower MAPE (%), U-statistic, RMSE and absolute error values than those attained using the other optimization algorithms. The performance of the models developed by ANFIS-BSA is verified through external validation using different statistical methods for the QLD competitive electricity market in the year 2018. To assess the performance of the obtained model, the following criteria were recommended [72][73][74][75].

Conclusions and Recommendations
An investigation has been carried out into the price forecasting methodologies used in recent studies related to the deregulated environment. As a result of the rapid transition in the structure of the power market, highly precise prediction of upcoming prices, which can maximize the profit margin, has become indispensable for market participants. Identifying the most influential input variables is a challenging task because electricity demand and electricity price depend on multiple parameters. To overcome the challenges encountered in forecasting electricity prices, various techniques have been proposed to attain a robust model with high precision. Statistical techniques, artificial intelligence-based techniques and hybrid approaches have been deployed for the short-term forecasting of electricity prices. Although many alternatives for electricity price forecasting have been implemented in the literature, such approaches have limitations. For example, the main deficiency of data-driven approaches is that they have many control parameters to which they are extremely sensitive, making it very complicated to initialize these parameters. Although multiple machine learning techniques are often considered for electricity price forecasting, newer methodologies are still sought to give a more precise electricity price forecast. Moreover, long-term price forecasting was considered in most of the aforementioned literature. The electricity demand trend depends on the season; thus, short-term price forecasting is more useful for real-time decision-making in a deregulated electricity market. Feature selection is another important factor affecting the efficacy and precision of the forecast, and it significantly helps in enhancing forecasting ability and accuracy.
Nevertheless, it is challenging to pick the most robust feature selection approach for electricity price prediction because of the non-linearity of the price signal. In the most recent reference, the authors recommended testing the hybrid ANFIS-BSA method on a market other than Ontario to assess its robustness and efficacy. Therefore, the efficiency of the proposed ANFIS-BSA method is tested on the Queensland electricity market.
The suggested feature selection methodology not only captures the non-linear dependencies of the price signal on its input variables better than the established multi-objective optimization methods, such as MOPSO, NSGA-III and NSGA-II, but also eliminates redundancy among the features so that distinct relevant features are selected. Moreover, the proposed MOBBSA-ANFIS feature selection approach selects fewer input variables while allowing better forecasting accuracy. Using information on the non-redundant features within a large set of candidate inputs, the feature selection technique is also able to choose the most appropriate minimum subset. Hence, the candidate inputs are eventually chosen with minimal redundancy and high relevance to the output.
Finally, the backtracking search algorithm (BSA) is used as an efficient optimization algorithm in the learning phase of the ANFIS approach to provide a more precise prediction engine for forecasting the price of electrical energy. The predicted results are compared with the results of ANN and ANFIS models optimized by particle swarm optimization (PSO) and genetic algorithms (GA) to determine the effectiveness and applicability of the proposed approach for electricity price forecasting. The hybrid ANFIS-BSA approach provides good forecasting accuracy, with an average MAPE of 3.07, 2.77, 4.22 and 2.71 in February, May, August and November, respectively, taking the results of previous publications for different case studies, described in Table 1, as benchmarks. The simulation results were verified using real data sets from the electricity market of Queensland, which is amongst the most volatile electricity markets globally. The BSA-optimized ANFIS strategy is worth considering as a robust and useful forecasting mechanism for fulfilling the actual needs of electricity market participants, including self-producers, traditional generation companies, suppliers/retailers and aggregators. These contributions will help market players bid efficiently, keep their daily business productive and ultimately increase the revenue of the companies.

Data Availability Statement:
The data used in this study were obtained from the Australian Energy Market Operator and are available online: https://www.aemo.com.au/energy-systems/electricity/nationalelectricity-market-nem/data-nem (accessed on 20 September 2021).

Conflicts of Interest:
The authors declare no conflict of interest.
