A Fusion Framework for Forecasting Financial Market Direction Using Enhanced Ensemble Models and Technical Indicators

People continuously hunt for a precise and productive strategy to control the stock exchange because the monetary trade is recognised for its unbelievably different character and unpredictability. Even a minor gain in predicting performance will be extremely profitable and significant. Our novel study implemented six boosting techniques, i.e., XGBoost, AdaBoost, Gradient Boosting, LightGBM, CatBoost, and Histogram-based Gradient Boosting, and these boosting techniques were hybridised using a stacking framework to find out the direction of the stock market. Five different stock datasets were selected from four different countries and were used for our experiment. We used two-way overfitting protection during our model building process, i.e., dynamic reduction technique and cross-validation technique. For model evaluation purposes, we used the performance metrics, i.e., accuracy, ROC curve (AUC), F-score, precision, and recall. The aim of our study was to propose and select a predictive model whose training and testing accuracy difference was minimal in all stocks. The findings revealed that the meta-classifier Meta-LightGBM had training and testing accuracy differences that were very low among all stocks. As a result, a proper model selection might allow investors the freedom to invest in a certain stock in order to successfully control risk and create short-term, sustainable profits.


Introduction
Forecasting future stock values has long been a contentious academic issue. For a significant stretch of time, it was assumed that fluctuations in stock values could not be predicted. The share value index is an important part of the financial system since it represents global economic success. Real-world businesses must be watchful of their security as well as their growth. At almost the same moment, investors and analysts were interested in learning about the overall capital market patterns and trends. As a result, correctness in forecasting is critical for stakeholders' well-being. In the midst of the messy and volatile character of stock markets, forecasting future price movements is a difficult topic on which academicians are seeking to improve forecasting models.
Stock value trend forecasting is a masterpiece and fascinating subject drawn by various trained professionals and researchers from fields such as financial engineering, economics, operations research, statistics, and artificial intelligence. Although a large amount of effort has been put in over the last few years, the exact figure of the stock cost and its directions are still challenging to achieve at this point, even though some high-level AI strategies are used. Globalisation of the economy constantly requires developments in the field of computational science and data innovation. In recent years, monetary exercises have been progressively developing in number with the fast financial turn of events, and their varied pattern has additionally become gradually more intricate. The securities exchange assumes a fundamental part of the monetary space of any country [1].
With the rapid development of machine learning and artificial intelligence over the past 10 years, a growing number of market analysts have begun to build predictive models for index value forecasting, each with its own requirements, and have tried a range of methods [2]. The best standard for judging the performance of a model is to compare its predictions with real data. The existing research shows that, although it is hard to predict the behaviour of the stock market precisely, it is possible to foresee its future trend to some extent and so reduce the risks faced by investors [3].
Although there are numerous approaches to predicting stock market prices, existing methods still have many problems and shortcomings, and extracting more useful information from the available data has become a focus of research. Since traditional analytical techniques have evident weaknesses in handling non-linear problems, machine learning algorithms have been introduced into stock market analysis [4]. A predictive model that can gauge the direction of a stock price movement helps investors make suitable decisions, improve profitability, and consequently reduce potential losses. As a result, precise prediction and analysis of the stock market is both challenging and valuable, and forecasting methods for stock prices must be improved continually. Many researchers, locally and abroad, have devoted themselves to developing predictive financial models to anticipate index value movements. Before the arrival of efficient machine learning algorithms, analysts typically used various statistical techniques to build prediction models. Both linear and non-linear models are used for stock price prediction: most linear models are statistical methods, whereas non-linear models are based on ML algorithms. Building a financial model that achieves accurate index value forecasting has become both a theoretical and a practical task [5]. In principle, both conventional financial models and the emerging artificial intelligence models can predict stock prices, yet their predictive performance differs considerably [6].
Discovering frameworks with better prescient impacts through model blend and examination is beneficial for some researchers, and it likewise has significant hypothetical importance [7]. In actuality, realised information can be infused into the monetary frameworks to anticipate future information. For example, if the stock value gauge is higher than the end cost of the day, the model predicts that the future stock cost may rise, and financial backers can decide to keep holding the stock to acquire higher venture pay [8].
If the stock price forecast is lower than the day's closing price, it indicates that the stock price may fall in the future. It is therefore highly practical to develop a financial model for stock price forecasting [9]. Moreover, accurately predicting stock price movements and volatility trends has significant value for countries, listed companies, and individual investors [10]. Recently, a growing number of studies have examined the direction or trend of financial market movements, driven by the demand for and trends of stock markets. Technical analysis and fundamental analysis are two different strategies for predicting the stock market. Fundamental analysis relies on precise information about the other variables that affect the stock market, such as microeconomics, macroeconomics, and political and even psychological factors.
However, this information is typically not readily available. Technical analysis attempts to make predictions based on past patterns. These patterns, however, are mostly not evident because of noise [11]. Traditional statistical methods struggle to capture this irregularity: they require assuming a functional relationship between input and output and then fitting the data to that relationship. This has encouraged academic researchers and business practitioners to develop more reliable forecasting systems. Many technologies and strategies have been proposed to understand and predict stock prices through multiple approaches. Yet finding the appropriate combination of features and capturing the dynamic behaviour of the stock market remain an open challenge for researchers. With the growing availability of high-frequency trading data and the inconsistency of earlier models, there remains room for local and international researchers to develop a model that provides a reliable outcome [12].

The Inspiration Is as per the Following
Stock market forecasting has always been an interesting and open problem for researchers [8]. As more information becomes available day by day, we face new difficulties in acquiring and processing the data to extract knowledge and examine its impact on stock prices. Finding the best possible approach for predicting the daily return direction of the financial market is always a challenging and debatable topic [13]. The goal of this study is to forecast the future direction of the market. The most fascinating aspect of stock market forecasting is its self-sabotaging behaviour. The rapid development of machine learning models, tools, and technologies gives researchers opportunities to uncover hidden truths of the market and analyse it in their own ways [14]. Proper feature selection increases the predictive performance of machine learning models [15]. Only a few studies have attempted to identify significant input features [16]. More research is needed on technical indicators to find an optimal combination of input features for predicting stock prices [17]. The performance of forecasting models depends on the quality of features, and inappropriate feature selection degrades the model's performance and returns biased results. Just as proper feature selection plays an important role in model building, so does building a reliable forecasting model that can identify risk factors and indicate the positive or negative direction of the market. Thus, proper selection of algorithms during the model building process is a major challenge for researchers.
Past researchers have attempted to adopt hybridisation techniques using either base-level machine learning models or deep learning models, but there is still a question mark as to whether we can hybridise ensemble models; ensemble models are one type of hybridisation of weak learners [18]. Hybridisation of ensemble models can provide better accuracy using voting/averaging techniques. Most researchers' basic selection criteria for finding the best model is to look at the testing accuracy, which is a common and straightforward process that leads to overfitting.
Along these lines, our objective was to develop an ensemble-based hybrid model that learns from the past stock market data and gauges the directional movement of the stock.

• We propose a novel framework in which six ensemble models are hybridised, minimising the model risk and increasing accuracy.
• A new set of input features was designed, providing a real test for future researchers to consider that combination.
• We adopted two-phase overfitting protection: the first phase is LDA, and the second is K-fold cross-validation. Merging these techniques into a single framework makes our model unique.
• Our model selection process is uncommon. Instead of selecting the model that provides the highest accuracy, we selected the model whose difference between training and testing accuracy is minimal, so that the selected model is neither overfitted nor underfitted.
• A long period of data was collected for our experimental setup, which explores the performance level during volatility-stress periods and smooth trending periods and also examines the persistence of financial crisis and clustering.
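The selection rule described in the contributions above can be illustrated with a minimal sketch. The function name `select_min_gap`, the tie-breaking rule (prefer the higher testing accuracy), and the accuracy values are assumptions made for illustration only; they are not results or code from this study.

```python
def select_min_gap(scores):
    """Return the model name with the smallest |train - test| accuracy gap.

    `scores` maps a model name to a (train_accuracy, test_accuracy) pair.
    Ties are broken in favour of the higher test accuracy (an assumption).
    """
    def key(item):
        name, (train_acc, test_acc) = item
        return (abs(train_acc - test_acc), -test_acc)
    return min(scores.items(), key=key)[0]

# Made-up accuracies, for illustration only (not results from this study).
example_scores = {
    "model_a": (0.99, 0.81),  # highest training accuracy, large gap
    "model_b": (0.90, 0.88),  # small gap
    "model_c": (0.86, 0.80),
}
best = select_min_gap(example_scores)
print(best)
```

In this toy input, the model with the highest training accuracy is not selected, because its gap between training and testing accuracy is the largest.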
The rest of the article is figured out as given below. Section 2 portrays the related work, while Section 3 depicts the Materials and Methods, and in Section 4, we explore our proposed framework. In Section 5, we focus on the exploratory outcomes and discuss critical discoveries in our examination. Finally, in Section 6, we discuss the conclusion part of our paper and the future scope of our study.

Related Work
For a long time, investors and researchers believed that stock prices cannot be predicted. This belief arose from the efficient market hypothesis (EMH), a term coined by Fama [19]. According to Fama, the dynamic behaviour and non-stationary nature of financial market data make the financial market unpredictable [19]. The EMH says that once new information enters the securities market, the market reacts instantaneously. Thus, it is impossible to beat the market.
On the other hand, the hypothesis was later revised by its author, and the revised version distinguishes three forms: strong, semi-strong, and weak [20]. The weak form holds that future stock prices cannot be forecast from historical prices. The semi-strong form holds that the stock market reacts instantly as soon as any new publicly available information arrives, so that in practice there is no opportunity to forecast the market. The strong form covers both public and private information and implies that neither gives investors an edge in the market. However, while some researchers accept the EMH, others have disputed it both empirically and theoretically [21][22][23][24]. According to Nti et al. [25], the question of which view of the EMH is correct remains open for discussion. According to Shiller [26], a new era opened for the financial market in the 1990s, when behavioural finance became a focus of academic work. The investigation of the Nobel Laureate Robert Shiller [26] revealed that during the period from 1989 to 2000, the rise and fall of the stock market was influenced by sentiment. At the turn of the century, Thaler [27] used behavioural finance to forecast the collapse of the internet stock boom and criticised the widely held EMH, which assumes that all investors are rational and make unbiased forecasts about the future. According to Shiller [26], behavioural finance stands on the opposite side of the EMH and questions the idea that price changes in the market always reflect genuine information. Shiller [26] showed that stock prices are very volatile over short periods; however, to some degree, the stock market is predictable over long periods.
Thus, we consider the above authors' outcomes that in the current scenario, there is a chance of prediction of the stock market.
Financial market forecasting is commonly based on two kinds of factors: fundamental and technical [12,25,28,29,30]. Fundamental analysis uses the financial standing of the firm, its employees, the board of directors, management decision policies, the company's annual report, balance sheet, income reports, geographical and climatic conditions such as natural disasters, and political information to anticipate the future of the stock market [31][32][33][34]. Fundamental factors normally involve measures such as GDP, CPI, and P/E ratios [35]. A fundamental approach is more suitable for long-run than for short-run forecasting of the stock market [36]. Technical analysts attempt to predict the stock market by studying charts that depict historical market prices and technical indicators [37][38][39]. Technical indicators are statistical measures that are calculated with mathematical formulas using historical prices [40]. The development of artificial intelligence techniques and the growing number of publicly available datasets bring new opportunities for researchers to explore the market. According to Tshilidzi [41], the rapid development of AI techniques influences the EMH theory and provides an efficient way to learn from the market. A growing amount of research [42][43][44][45][46][47][48] has found evidence that the financial market may be predicted to some extent [37,49]. Thus, there is scope for investors to minimise losses and maximise profits when dealing with the stock market [50]. In recent studies, financial market analysis and forecasting fall into two categories, i.e., statistical and machine learning [51].
Mathematics 2021, 9, 2646

Statistical Technique
Before the adoption of machine learning techniques, statistical techniques were used to learn the patterns of stocks and provided an approach to analyse and predict them. A group of statistical approaches has been used for the analysis of the financial market, i.e., ARIMA, ARMA, GARCH, STAR, EMA, LDA, QDA, and regression techniques [52], with the ARIMA, EMA, and regression approaches having predictive capability to some extent [53,54]. As the stock market is dynamic and non-linear in nature, traditional statistical techniques struggle to learn its non-linear behaviour, and the emerging machine learning techniques can avoid these limitations [55].

Machine Learning Technique
A large number of machine learning algorithms have been applied to stock market forecasting [13,16,49,52,56,57,58,59,60,61,62]. Previous studies show that machine learning techniques normally predict the directional movement of the stock market better than other techniques [63]. Leung et al. [64] found that accurate forecasts of stock index movements are critical for building effective trading strategies, so that investors can hedge against the potential risks of the stock market. Even a small improvement in forecasting accuracy can be highly profitable. Machine learning commonly takes two approaches to predicting the stock market: (a) using a single model, and (b) using an ensemble of machine learning models [13,60,63,65,66]. Some researchers have reported that ensemble models provide better performance than a single predictive model [40,67,68]. According to Fatih et al. [69], there is still little research on predicting the stock market using ensemble models.
Compared with traditional models, machine learning models are more flexible. Many machine learning algorithms have been applied in previous studies [70], for example, logistic regression, support vector machine, k-nearest neighbours, random forest, decision tree [40], and neural networks [49,52,60,71]. In the literature, the most commonly used algorithms for stock market forecasting are support vector machines and artificial neural networks [72]. Milosevic et al. [73] proposed a classification framework to predict the financial market over a long window. They suggest that a stock whose value rises 10% in a financial year can be considered a good stock; otherwise, it is a bad stock. During model building, they extracted 11 fundamental ratios and used them as input features for different algorithms. Their study revealed that, compared with naïve Bayes and SVM, random forest achieved a good F-score of 0.751. Ballings et al. [37] discussed how various ML models have been created for discovering the direction of the stock market. Their study adopted different machine learning algorithms, including ensemble methods, such as random forest, AdaBoost, neural network, logistic regression, SVR, and KNN, with datasets chosen from European companies. Their model attempted to predict the long-term price movement of the stock market, and their study revealed that the random forest algorithm performed well on their dataset. Choudhury et al. [74] proposed an ANN model that used a backpropagation algorithm for the training phase and a multilayer feed-forward network for the testing phase to forecast the value of a share. Their proposed model provided 0.996 as the regression value. Boonpeng et al. [75] proposed a multi-class classification problem in which their model classifies whether to buy, hold, or sell the stock.
The author developed two models, one-against-all and one-against-one neural networks, and compared their performances with the traditional neural network. They concluded that one-against-all neural networks performed better than one-against-one and traditional neural network models, with an accuracy of 72.50%. According to Yang et al. [76], for a successful forecasting model, it is necessary to learn the non-linear factors of a stock. The authors proposed a radial basis function based on SVM with a genetic algorithm that is used for the forecasting of the stock market for the short run.
Dey et al. [77] proposed a model to forecast the stock whose input features are technical indicators. Their model was developed by using XGBoost algorithm, obtaining an accuracy level of 87.99% on the dataset Apple and Yahoo indexes. They compare their proposed model with SVM and ANN and finally revealed that their XGBoost models were the best among them. Basak et al. [19] proposed a framework for classification problems that forecast the price of the stock will increase or decrease. They used random forest and XGBoost classifier, and their study revealed ensemble models perform better if the proper combination of technical indicators is used as input features for a model. According to Ernest et al. [69], the ensemble machine learning models provide superior results in comparison with any individual machine learning model. In their study, they focused on the tree-based ensemble models, and their models were trained with three different stock exchange datasets. Their findings show that the extra trees ensemble classifier performed better than other tree-based classifiers.
Yang et al. [78] proposed an ensemble-based multi-layer feedforward network for the forecasting of the Chinese stock market. Their model was trained with backpropagation and Adam algorithms, whereas an ensemble was created with the help of the bagging approach. The performance of the model may increase if the used dataset is normalised further. Fatih et al. [69] proposed two models using multilayer perceptron with genetic algorithm and particle swarm optimisation. To train their model, they used nine technical indicators, which were recorded as the RMSE of 0.732583 and 0.733063 for MLP-GA and MLP-PSO, respectively. Finally, they concluded that hybrid machine learning methods can improve the accuracy level during forecasting. Wang et al. [79] developed a hybrid model with a combined BPNN, ARIMA, and ESM effort to predict the stock market weekly. The datasets used here were Shenzhen Integrated Index and DJIA. After successful modelling, they attempted to analyse every single framework with the combined ensemble framework. They found hybrid models performed better than traditional individual models and found 70.16% accuracy when forecasting the stock market's direction. Chenglin et al. [80] proposed a model that can accurately predict stock prices' direction. They used a combined model, a combination of SVM and ARIMA, and concluded that the performance of combined models performed better than a single predictive model. Tiwari et al. [81] proposed an ensemble model combined with the Markov framework with a decision tree to forecast the Bombay stock exchange. The proposed model provides an accuracy level of 92.1%, and it was concluded that combined models provide better accuracy than any individual models. A comparative study conducted by Prasad et al. [51] used three different algorithms, namely, XGBoost, Kalman filters, and ARIMA, and two different datasets taken, namely, NSE and NYSE. 
Their study was based on individual algorithm forecasting capability as well as a hybrid model also developed by them using Kalman filters and XGBoost. Finally, they compared four models and found the ARIMA and XGBoost to show promising results on both datasets, whereas the accuracy of the Kalman filter was not consistent in both datasets. A total of 87.64% accuracy level was maintained by the ARIMA model on the NSE dataset, whereas 79.44% was maintained on NYSE. On the other hand, an 88.66% accuracy level was maintained by the XGBoost model on the NSE dataset, whereas 79.44% was maintained on NYSE. The Kalman filter model showed a promising accuracy level of 89.09% on the NSE dataset, whereas 64.96% was shown on the NYSE dataset. The hybrid model provided 76.79% on the NSE dataset and 70.91% on NYSE dataset. Instead of suggesting the best model among them all, they left the decision for the users in terms of finding the best one among the four. Qiu et al. [66] proposed a combined model LSTM with attention mechanism, i.e., WLSTM+Attention, on three different indexes, finding that the proposed model MSE was less than 0.05. Moreover, they suggested that proper selection of features can improve the predictive capability of the model.
Several authors used technical indicators as input features to train their model. Weng et al. [18] concluded that only by using macroeconomic indicators with a machine learning approach can one predict the stock index efficiently. Markovic et al. [82] implemented different technical indicators in the LS-SVM model to find the trend movement of the stock market in the Southeast European Market, finding technical indicators to have a certain level of prediction power. Valavanis et al. [83] found that approximately 20% of the stock market predictive models used technical indicators as their input features. According to the authors, technical indicators are generally used to learn the flow of complex patterns from specific stock data and to forecast the upcoming behaviours. Fernández et al. [84] found an optimal combination of technical indicators; the researchers developed several indicators suitable for their use and then found an appropriate mix for their predictive model. Andrade et al. [85], discussed how for finding the optimal combination of indicators for modelling, a large amount of effort has been made. However, still, there is no sophisticated, easy technique available for developers to select appropriate technical indicators.
We refer to a summary of various recent research on short-term finance market prediction for more comprehensive and extensive assessments in Table 1. According to the authors, their model is a conservative model wherein during the bull market, small losses and small gain will be found.
The summary given in Table 1 indicates that there is some scientific study in place to predict the stock market in the short run, and the findings are remarkable [28,63,69,91] to some extent, inducing us to take our direction of study in the short run to forecast the stock market. Zulkernine et al. [92] suggested that it is easier to predict the stock market on a long-term basis than via daily stock forecasting. This is because the daily stock forecasting data constantly fluctuate and are full of noise. Thus, in our study based on fusion ensemble models and technical indicators, a brief review was conducted to some extent in our literature survey. According to Weng et al. [18], the prediction results may increase using the voting or averaging technique of different ensemble models. Singh et al. [93] provide their opinion that in recent times the machine learning models have shown promising results. Consequently, we aimed to build a hybrid model which learns from the historical prices in form of indicators and gauges the direction of the stock on the next day.
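As a minimal sketch of how historical prices can be turned into indicator features and a next-day direction label, consider the example below. The two features (distance of the close from a simple moving average, and a momentum value over the same window) and all function names are assumptions chosen for illustration; they are not the indicator set designed in this study.

```python
def simple_moving_average(prices, window):
    """SMA over `window` closes; None until enough history exists."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out

def next_day_direction(prices):
    """1 if the next close is higher than today's close, else 0.

    The last day has no label because its next close is unknown.
    """
    return [1 if prices[i + 1] > prices[i] else 0
            for i in range(len(prices) - 1)]

def build_dataset(prices, window=3):
    """Pair each day's indicator features with the next-day direction."""
    sma = simple_moving_average(prices, window)
    labels = next_day_direction(prices)
    rows = []
    for i in range(window - 1, len(prices) - 1):
        features = (
            prices[i] - sma[i],                 # close minus moving average
            prices[i] - prices[i - window + 1]  # momentum over the window
        )
        rows.append((features, labels[i]))
    return rows

# Toy closing prices, for illustration only.
closes = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0]
data = build_dataset(closes, window=3)
```

Each row uses only prices available up to day i, while the label looks one day ahead, which avoids leaking future information into the features.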
Some exploration leads to using a combination of unique models to produce their ensemble or hybrid models with technical indicators for the directional movement of individual stocks. Some researchers use pre-existing ensemble models such as XGBoost, CatBoost, etc., for finding the trend of the stock, and some of them use deep learning techniques to forecast the stock market. Thus far, we have noticed that more researchers are trying to develop hybrid models, i.e., a combination of individual machine learning models or deep learning models or their variety. Another challenge researchers face in finding a proper mix of input features is a very fuzzy and tedious job. We believe that there is still a gap in the hybridisation of individual ensemble models into a single frame with a unique combination of input features.
Therefore, after a successful review, we proposed a novel approach wherein ensemble models used a stacking framework, and the stacking framework takes the trained ensemble models as the base-level classifiers. Again, the ensemble models are to be used as the ensemble model meta-classifiers. In addition, we developed a unique combination of input features for the prediction of the stock market.

Materials and Methods
When we attempted to develop an ensemble framework, six types of ensemble algorithms were used: XGB classifier, AdaBoost Classifier, Gradient boosting, LightGBM, CatBoost, and Hist gradient boosting as base learners. In this section, we briefly describe the aforementioned base classifiers, and finally, we explain our final framework. Before discussing our framework, we first establish what ensemble learning is and how it works.

Ensemble Learning
Ensemble learning combines forecasts from several ML models, or from the same model trained at different times, to produce more precise results. A forecast from a single model is unlikely to be as accurate, and thus we need to develop an ensemble-based ML model whose predictive capacity is higher than that of a single algorithm. Figure 1 depicts the common structure of an ensemble model.
Bagging and boosting are two effective techniques used in machine learning to build ensembles of models. In our experiment, we combined boosting models through a stacking classifier, which works as a two-stage process. First, it creates the base learners, whose predictions are used as the input for the final stage. Then, in the final stage, the stacking classifier builds a meta-classifier using the base learners, which are treated as level-1 weak learners.
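The two-stage process can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the framework of this study: the base learners are one-feature threshold stumps rather than boosting models, the meta-classifier is a majority-vote table over base-prediction patterns, and the folds, data, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def fit_stump(X, y, feature):
    """Pick the threshold/polarity on one feature with the best training accuracy."""
    best = None
    for thr in sorted({row[feature] for row in X}):
        for polarity in (1, -1):
            preds = [1 if polarity * (row[feature] - thr) >= 0 else 0 for row in X]
            acc = sum(p == t for p, t in zip(preds, y)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, thr, polarity)
    _, thr, polarity = best
    return lambda row: 1 if polarity * (row[feature] - thr) >= 0 else 0

def out_of_fold_predictions(X, y, feature, k):
    """Predict each sample with a stump trained on the other k-1 folds."""
    oof = [None] * len(X)
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        test_idx = [i for i in range(len(X)) if i % k == fold]
        model = fit_stump([X[i] for i in train_idx], [y[i] for i in train_idx], feature)
        for i in test_idx:
            oof[i] = model(X[i])
    return oof

def fit_stacking(X, y, features=(0, 1), k=3):
    # Stage 1: out-of-fold base predictions become the meta-features.
    oof_columns = [out_of_fold_predictions(X, y, f, k) for f in features]
    meta_X = list(zip(*oof_columns))
    # Stage 2: the meta-classifier maps each base-prediction pattern
    # to the majority label observed for that pattern.
    votes = defaultdict(Counter)
    for pattern, label in zip(meta_X, y):
        votes[pattern][label] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}
    default = Counter(y).most_common(1)[0][0]
    # Refit the base learners on all data for use at prediction time.
    bases = [fit_stump(X, y, f) for f in features]
    def predict(row):
        pattern = tuple(b(row) for b in bases)
        return table.get(pattern, default)
    return predict

# Toy data, for illustration only: two features and a binary direction label.
X = [(0.1, 0.5), (0.9, 0.4), (0.2, 0.6), (0.8, 0.3), (0.3, 0.7),
     (0.7, 0.2), (0.4, 0.8), (0.6, 0.1), (0.15, 0.9)]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0]
predict = fit_stacking(X, y)
```

Training the meta-classifier on out-of-fold predictions, rather than on predictions for samples the base learners have already seen, is a common way to keep the second stage from simply memorising the base learners' training fit.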

Gradient Boosting
Gradient boosting is a technique that merges different weak learners to produce a robust forecasting model [94]. The following principles describe gradient boosting.

1. Initially, develop an error function, which is optimised at the time of model building.
2. Iteratively develop weak models for forecasting.
3. Finally, merge all the weak models to create a robust model that minimises the error function.
Our model-building process begins with a comparatively weak learner model (depending upon our dataset). Then, iteratively, a weak learner is converted to a better classifier F_m(x), and that classifier becomes a robust classifier.
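The three principles above can be sketched with a small from-scratch example. It assumes a squared-error function and one-split regression stumps on a single input, which keeps the code short; the study itself addresses classification, so this is an illustration of the mechanism rather than the configuration used here. The function names and the learning rate are hypothetical.

```python
def fit_regression_stump(xs, residuals):
    """Best single-split regressor on 1-D inputs under squared error."""
    best = None
    for thr in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < thr]
        right = [r for x, r in zip(xs, residuals) if x >= thr]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x < thr else rmean

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    # Principle 1: squared-error function; its negative gradient
    # with respect to the prediction is the residual y - F(x).
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    losses = []
    for _ in range(n_rounds):
        # Principle 2: fit a weak model to the current residuals.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_regression_stump(xs, residuals))
        # Principle 3: the running sum of weak models is the boosted F_m(x).
        losses.append(sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys))
    return predict, losses

# Toy data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
predict, losses = gradient_boost(xs, ys)
```

With a learning rate below 1 and stumps whose leaves hold the mean residual, the training error cannot increase from one round to the next, which is the behaviour the list above describes.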

Materials and Methods
When we attempted to develop an ensemble framework, six types of ensemble algorithms were used: XGB classifier, AdaBoost Classifier, Gradient boosting, LightGBM, CatBoost, and Hist gradient boosting as base learners. In this section, we briefly describe the aforementioned base classifiers, and finally, we explain our final framework. Before discussing our framework, we first establish what ensemble learning is and how it works.

Ensemble Learning
A method takes forecasts from numerous ML models or similar forecasting from the same models at various times to make them more precise results. Forecast from a solitary individual model probably will not produce that many exact outcomes, and thus we require fostering a gathering-based AI model whose prescient limit is much higher than a solitary calculation. Figure 1 depicts the common structure of an ensemble model. Bagging and boosting are two effective techniques used in machine learning to ensemble the models. Here, in our experiment, we used the boosting technique stacking classifier method; the stacking classifier works on a two-stage process. First, it sequentially creates base learners that are used as the input for the final stage. Then, in the final stage, the stacking classifier builds a meta-classifier using the base learners, which are treated as level-1 weak learners.

Gradient Boosting
Gradient boosting is a type of classifier that merges several weak learners to produce a robust predictive model [94]. It follows three steps:
1. Define an error (loss) function, which is optimised during model building.
2. Iteratively develop weak models for forecasting.
3. Merge all the weak models to create a robust model that minimises the error function.
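The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the study: it assumes squared error as the loss, one-split decision stumps as weak models, toy one-dimensional regression data, and hypothetical function names (`fit_stump`, `gradient_boost`).

```python
def fit_stump(x, residuals):
    """Weak model: the single threshold split on x that minimises squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # drop the largest value so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_rounds=20, learning_rate=0.5):
    """Return the per-round training MSE and the final additive model."""
    base = sum(y) / len(y)          # initial constant model
    stumps = []

    def predict(xi):
        # Step 3: the robust model is the sum of all weak models.
        return base + learning_rate * sum(stump(xi) for stump in stumps)

    history = []
    for _ in range(n_rounds):
        preds = [predict(xi) for xi in x]
        # Step 1: the error function being optimised (mean squared error).
        history.append(sum((yi - p) ** 2 for yi, p in zip(y, preds)) / len(y))
        # Step 2: fit the next weak model to the residuals,
        # which are the negative gradient of the squared error.
        residuals = [yi - p for yi, p in zip(y, preds)]
        stumps.append(fit_stump(x, residuals))
    return history, predict


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
history, model = gradient_boost(x, y)
```

The training error recorded in `history` never increases from one round to the next, which is the behaviour the iterative procedure is designed to produce.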

AdaBoost
Freund and Schapire [95] developed a meta-algorithm and named it AdaBoost; they were awarded the Gödel Prize for this work in 2003. AdaBoost can be combined with other machine learning algorithms to increase their accuracy. In this modelling technique, the weak learners are created sequentially.
After training, an AdaBoost classifier can be represented as

$$Fm_T(z) = \sum_{t=1}^{T} fm_t(z) \quad (1)$$

where each weak learner $fm_t$ takes the object $z$ as input and returns a value that indicates the class of the object. For instance, in a binary classification problem, the sign of the weak learner's output gives the predicted class, and its absolute value gives the confidence in that prediction. Similarly, the output of the $T$-th classifier is positive if the sample belongs to the positive class and negative otherwise.
For each sample in the training set, a weak learner generates an output hypothesis $h(z_i)$. At each iteration $t$, a weak learner is selected and assigned a coefficient $\alpha_t$ such that the sum of error terms $Em_t$ of the resulting $t$-stage classifier is minimised:

$$Em_t = \sum_{i} Em\left[Fm_{t-1}(z_i) + \alpha_t h(z_i)\right] \quad (2)$$

Here, $Fm_{t-1}(z_i)$ is the boosted classifier built up to the previous stage of training, $Em(Fm)$ denotes the error of a classifier $Fm$, and $fm_t(z) = \alpha_t h(z)$ is the weak classifier being considered for addition to the final classifier.
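To make the roles of the coefficient α_t and the weak hypothesis h(z) concrete, the following is a minimal sketch of discrete AdaBoost with one-dimensional decision stumps. It assumes labels in {+1, -1} and the exponential-loss weight update of the classical algorithm; the data and the function names are hypothetical and are not taken from the study.

```python
import math


def best_stump(x, y, w):
    """Return (weighted_error, threshold, polarity) of the best decision stump.

    A stump predicts `polarity` when x <= threshold and `-polarity` otherwise.
    """
    best = None
    for threshold in sorted(set(x)):
        for polarity in (1, -1):
            preds = [polarity if xi <= threshold else -polarity for xi in x]
            err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(x, y, n_rounds):
    n = len(x)
    w = [1.0 / n] * n                        # start with uniform sample weights
    ensemble = []                            # (alpha_t, threshold, polarity) per round
    for _ in range(n_rounds):
        err, threshold, polarity = best_stump(x, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # coefficient alpha_t
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * yi * (polarity if xi <= threshold else -polarity))
             for wi, xi, yi in zip(w, x, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def score(ensemble, xi):
    """F_T(z): the sum of the weak classifiers f_t(z) = alpha_t * h_t(z)."""
    return sum(a * (p if xi <= t else -p) for a, t, p in ensemble)


def predict(ensemble, xi):
    return 1 if score(ensemble, xi) >= 0 else -1


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]
ensemble = adaboost(x, y, n_rounds=3)
```

No single stump can separate this toy data, but the weighted sum of three stumps classifies every training sample correctly; the sign of `score` gives the class and its magnitude the confidence.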

Extreme Gradient Boosting (XGBoost)
XGBoost began as a research project of a Ph.D. student at the University of Washington [95]. It is an improved version of the gradient boosting algorithm that is more scalable and efficient. The features that set XGBoost apart are as follows [63]: automatic feature extraction is possible, regularisation is supported to avoid overfitting, and the model can learn from non-linear datasets. Moreover, its parallelisation feature lets XGBoost train on multiple CPU cores. It is a tree-based additive ensemble model composed of multiple base learners. In general, XGBoost can be represented as

$$\hat{y}_i = \sum_{m=1}^{T} f_m(x_i) \quad (3)$$

where $\hat{y}_i$ is the output of the final predictive model, which combines all weak learners, and $x_i$ is the input feature vector passed to each weak learner $f_m$. From [94], we took the objective function for XGBoost:

$$Obj = \sum_{i} L(z_i, \hat{z}_i) + \sum_{m=1}^{T} \Omega(f_m) \quad (4)$$

Equation (4) has two parts: the first is the training loss $L$, either logistic or squared loss, and the second sums the complexity of each tree. Here, $z_i$ is the actual value and $\hat{z}_i$ is the predicted value (the $\hat{y}_i$ of Equation (3)), $\Omega$ is the regularisation term, $T$ denotes the total number of trees, and $f$ is the function represented by a tree.

The LightGBM
LightGBM is a widely used boosting model that, like extreme gradient boosting, supports parallel training [96]. On multi-dimensional datasets, LightGBM generally works better than traditional boosting algorithms or XGBoost. Typically, boosting algorithms grow the tree horizontally (i.e., level-wise growth), whereas LightGBM grows the tree vertically (i.e., leaf-wise growth). Figure 2 compares level-wise and leaf-wise tree growth.

CatBoost
CatBoost is a machine learning algorithm developed in 2017 by researchers at the Russian company Yandex [97]. It is a tree-based boosting algorithm. CatBoost stands for categorical boosting, but it is not limited to categorical features: it also handles other kinds of data and regression problems, and automatic feature engineering is possible. Moreover, CatBoost takes less time to train than other gradient boosting algorithms. Boosting techniques generally follow a standard GBT procedure to construct decision trees, whereas CatBoost offers two ways of constructing a tree: an ordered mode and a basic mode.
In the ordered mode, a random permutation of the training samples is drawn and n supporting models M_1, ..., M_n are maintained, such that M_i is trained using only the first i samples in the permutation. In each iteration, the residual of the j-th sample is obtained using the model M_{j-1}.

Histogram Gradient Boosting
The GBDT framework takes longer to train as the size of the dataset grows, and sometimes the average accuracy also decreases. Histogram-based gradient boosting is therefore very effective for large datasets [98]; it reduces training time without losing accuracy. In other words, histogram-based gradient boosting is a technique for training the decision trees of a gradient boosting ensemble faster. Its splitting principle is as follows: instead of finding the split points on the sorted feature values, the histogram-based algorithm buckets continuous feature values into discrete bins and uses these bins to construct feature histograms during training. Since the histogram-based algorithm is more efficient in both memory consumption and training speed, we developed our work on its basis.
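The binning idea can be illustrated with a small sketch. It assumes equal-width bins, residuals of a squared-error loss as the per-sample gradients, and a variance-reduction split gain; production libraries typically use quantile-based bins and more elaborate gain formulas, so the function names and choices below are illustrative only.

```python
def build_histogram(values, gradients, n_bins):
    """Bucket a continuous feature into equal-width bins and accumulate
    the gradient sum and sample count of each bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1)   # the maximum value goes to the last bin
        grad_sum[b] += g
        count[b] += 1
    upper_edges = [lo + width * (b + 1) for b in range(n_bins)]
    return grad_sum, count, upper_edges


def best_split(grad_sum, count, upper_edges):
    """Scan the bin boundaries (not the sorted samples) for the best split."""
    total_g, total_n = sum(grad_sum), sum(count)
    best_gain, best_edge = 0.0, None
    left_g, left_n = 0.0, 0
    for b in range(len(count) - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        gain = left_g ** 2 / left_n + right_g ** 2 / right_n - total_g ** 2 / total_n
        if gain > best_gain:
            best_gain, best_edge = gain, upper_edges[b]
    return best_edge, best_gain


values = [1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0]
residuals = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
grad_sum, count, upper_edges = build_histogram(values, residuals, n_bins=4)
edge, gain = best_split(grad_sum, count, upper_edges)
```

Only `n_bins - 1` candidate boundaries are examined regardless of the number of samples, which is where the savings in time and memory come from.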

Dimensionality Reduction Technique
Dimensionality reduction minimises the number of input features in the training dataset. It is applied to ML models to avoid overfitting. As the number of dimensions decreases, the number of trainable parameters also decreases, which makes the model simpler and lowers its degrees of freedom. Conversely, more parameters mean more degrees of freedom, which can lead to overfitting: the model performs well on the training dataset but may not perform as well on a test dataset. Dimensionality reduction is a data preparation technique that is applied to the dataset before modelling, but the dataset must be cleaned and scaled first [99].
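As one concrete instance of dimensionality reduction, the sketch below projects two features onto a single discriminant axis. It assumes that the LDA technique named later in the paper refers to linear (Fisher) discriminant analysis, and it handles only the two-feature, two-class case so that the 2x2 scatter matrix can be inverted in closed form; the data and function names are hypothetical.

```python
def fisher_direction(X, y):
    """Two-feature, two-class Fisher discriminant: w = S_w^{-1} (mu_1 - mu_0)."""
    groups = {c: [row for row, label in zip(X, y) if label == c] for c in (0, 1)}
    means = {c: [sum(col) / len(rows) for col in zip(*rows)]
             for c, rows in groups.items()}
    # Within-class scatter matrix S_w (2x2), summed over both classes.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for c, rows in groups.items():
        for row in rows:
            d = [row[0] - means[c][0], row[1] - means[c][1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [means[1][0] - means[0][0], means[1][1] - means[0][1]]
    # Closed-form inverse of the 2x2 matrix applied to the mean difference.
    return [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
            (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]


def project(X, w):
    """Reduce each two-feature row to a single discriminant score."""
    return [row[0] * w[0] + row[1] * w[1] for row in X]


X = [[1, 2], [2, 1], [2, 3], [3, 2], [6, 5], [7, 6], [7, 4], [8, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w = fisher_direction(X, y)
z = project(X, w)
```

After projection, each sample is described by one number instead of two, and the two classes remain separated along that single axis.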

Evaluation Metrics
To examine the performance of our proposed model, we used accuracy, the area under the ROC curve (AUC), and the F-score, together with the precision and recall on which the F-score is built. We thus used a blend of metrics rather than a single one. The performance metrics are given below.

$$Accuracy = \frac{t_p + t_n}{t_p + t_n + f_p + f_n}$$

$$Precision = \frac{t_p}{t_p + f_p}$$

$$Recall = \frac{t_p}{t_p + f_n}$$

$$F\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

where $t_p$ represents the total number of true positive values; $t_n$ represents the total number of true negative values; $f_p$ represents the total number of false positive values; and $f_n$ represents the total number of false negative values.
According to Sokolova et al. [100], the area under the curve is an appropriate evaluation metric for classification problems; the higher the AUC value, the better the prediction performance of the model.
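The metrics can be computed directly from the four confusion-matrix counts. The sketch below uses the standard definitions of accuracy, precision, recall, and F-score, and computes the AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count as one half). The counts and scores are made-up illustrative values, not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking of positives against negatives."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

With these illustrative counts the accuracy is 0.85 and the recall is 0.80; the example AUC is 0.75 because three of the four positive-negative pairs are ranked correctly.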

Tools and Technologies Used
For the technical analysis and model development, we used a Python environment with Anaconda and Google Colab. The whole model development procedure was executed on an Intel processor (Core i5-1035G1, 1.19 GHz) with 8 GB of memory and a 64-bit Windows operating system. The period in which the dataset was taken and tested in our model was from 3 January 2000 to 1 July 2019 for DJIA; for S&P 500 Index, the HSI data were from 2 January 2002 to 1 July 2019; the DAX index data were from 12 December 1987 to 18 August 2021; and the NIKKEI 225 data were from 5 January 1965 to 20 August 2021. The Dow Jones Industrial Average index had 4372 tuples, the S&P 500 index had 4904 tuples, the HSI had 4304 tuples, the DAX index had 8495 tuples, and the NIKKEI 225 index had tuples initially. Initially, each record had the existing features, i.e., trading volume, high, close, open, and low, with the corresponding trading data. The datasets were downloaded from the Yahoo Finance portal, a reliable source.

Data Pre-Processing
Previous related studies give no specific rules for selecting the input features used to forecast the direction of an index. Each technical feature can be said to carry its own hidden behaviour. According to Weng et al. [18], investors use this hidden behaviour to analyse the current situation and decide whether to buy or sell. They concluded that technical indicators, which capture this hidden behaviour, can forecast the monthly closing price of major U.S. indices. Therefore, we used technical indicators and some other features to predict stock index movement in this research.
Once the raw dataset was obtained, it needed to be preprocessed. Data preprocessing followed these steps:
(a) The index data extracted from the web portal has some existing features, i.e., open, close, low, high, etc. We first handled the null and missing values in this dataset.
(b) We extracted 23 technical indicators from the pre-existing features described in the previous step. Apart from the technical indicators, we extracted two more features: the difference between the open and close prices, which reflects the increase or decrease of the stock value on that day, and the volatility, i.e., the difference between the high and low prices.
(c) Label generation: we constructed the response variable, a binary variable $Z_t \in \{0, 1\}$, for each trading day. The response for day $t$ is calculated as

$$Z_t = \begin{cases} 1, & \text{if } Open_t < Close_t \\ 0, & \text{otherwise} \end{cases}$$

where $Z_t$ is the forecast label, named 'TREND', that is used as our predicted variable, $Open_t$ is the opening price of the index on day $t$, and $Close_t$ is the closing price of the index on day $t$. When $Z_t$ is 1, we take the stock price to have increased, and when $Z_t$ is 0, we take it to have decreased.
(d) Although we are dealing with technical indicators that represent a stock's hidden behaviour, we must find a combination of input features with no multicollinearity issues. From a mathematical viewpoint, multicollinearity makes the coefficient estimates unreliable and can make a variable appear statistically insignificant; because of these disadvantages, we should always check for multicollinearity in the dataset.
To check for multicollinearity, we created a correlation matrix with the corr(·) function, which computes the pairwise correlation of all columns in the dataframe. Each entry of the matrix holds the correlation between a pair of variables, so the highly correlated pairs can be identified quickly by inspecting the off-diagonal entries. We removed the features whose correlation values were more than 0.50 (i.e., 50%). During this feature selection process, we used 23 technical indicators along with seven standard features. After correlation testing, only seven input features remained that combined well and were ready to train our model.
These were four technical indicators, two derived features, and one pre-existing feature, which are described below. For the technical analysis, we used the TA-Lib library, which is popular among traders and researchers for calculating technical indicators. This library can be downloaded from the www.ta-lib.org (accessed on 1 January 2021) website [101][102][103]. In this study, we employed technical indicators together with other dummy variables as the input features for our ensemble models.
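The correlation check can be sketched without any external library. The example below computes Pearson correlations by hand and keeps a feature only if its absolute correlation with every feature already kept does not exceed the threshold. The greedy keep-or-drop order, the 0.5 threshold on the [-1, 1] scale, and the toy columns are assumptions made for illustration; the study itself refers to a dataframe corr(·) function.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(features, threshold=0.5):
    """Keep a column only if |r| <= threshold against every column kept so far."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy columns: 'high' moves almost in lockstep with 'open'.
features = {
    "open": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    "high": [10.8, 11.9, 12.7, 13.9, 14.6, 15.8],
    "open_close": [0.5, -0.3, 0.4, -0.6, 0.2, -0.1],
}
selected = drop_correlated(features)
```

Here `high` is dropped because it is almost perfectly correlated with `open`, while `open_close` is retained.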

Triple Exponential Average (TRIX)
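TRIX measures the percentage change between two consecutive values of a triple-smoothed exponential moving average (EMA):

$$TRIX_n = \frac{EMA3_n - EMA3_{n-1}}{EMA3_{n-1}}$$

where $EMA3_n$ denotes the value of the triple-smoothed EMA at period $n$ and $EMA3_{n-1}$ its value at the previous period.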
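The TRIX formula can be sketched in plain Python as follows. The sketch assumes the common EMA smoothing factor 2/(period + 1) and seeds each EMA with the first observation; TA-Lib uses its own seeding and scaling conventions, so the values will not match that library exactly.

```python
def ema(values, period):
    """Exponential moving average seeded with the first observation."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def trix(close, period):
    """Rate of change of the triple-smoothed EMA; the first entry has no predecessor."""
    ema3 = ema(ema(ema(close, period), period), period)
    return [None] + [(cur - prev) / prev for prev, cur in zip(ema3, ema3[1:])]


flat = trix([100.0] * 5, period=3)
rising = trix([100.0, 101.0, 102.0, 103.0, 104.0, 105.0], period=3)
```

A flat price series gives a TRIX of zero, and a steadily rising series gives positive values, which matches the interpretation of TRIX as a smoothed momentum measure.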

Percentage Price Oscillator (PPO)
The PPO is calculated as the difference between two moving averages of different lengths, i.e., a fast-moving average and a slow-moving average, divided by the slow-moving average:

$$PPO = \frac{F\_ma - S\_ma}{S\_ma}$$

where F_ma is the fast-moving average (over a short period) and S_ma is the slow-moving average (over a long period).
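A minimal sketch of the PPO computation is given below. It assumes exponential moving averages for both F_ma and S_ma and uses short illustrative periods (3 and 6); the type of moving average and the period lengths used in the study are not stated in the text, and the result is often multiplied by 100 in practice to express it as a percentage.

```python
def ema(values, period):
    """Exponential moving average seeded with the first observation."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def ppo(close, fast, slow):
    """(fast MA - slow MA) / slow MA for every period."""
    f_ma = ema(close, fast)
    s_ma = ema(close, slow)
    return [(f - s) / s for f, s in zip(f_ma, s_ma)]


flat = ppo([100.0] * 6, fast=3, slow=6)
rising = ppo([100.0, 101.0, 102.0, 103.0, 104.0, 105.0], fast=3, slow=6)
```

In a rising market the fast average sits above the slow one, so the PPO is positive; in a flat market both averages coincide and the PPO is zero.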

Ultimate Oscillator (ULT)
ULT indicates whether a stock is oversold or overbought, so that we can generate signals on whether to buy or sell the stock.

Open
This is a pre-existing feature that denotes the price of the stock at the opening of each trading day.

Open-close
This is a derived feature that gives the difference between the opening and closing values of each trading day.

High-low
This is also a derived feature, which measures the volatility of each trading day. It is calculated as the difference between the high price and the low price of that day.
(e) After obtaining the useful features, we divided our dataset into two parts: 75% of the data reserved for training and 25% for testing our predictive model.
(f) Finally, in the data processing step, we applied a scaling technique to normalise the features that are input to our model. The statistical descriptions of our five datasets are provided in Tables 2-6, which report the max, min, mean, and standard deviation of each feature for every dataset.
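The label rule, the two derived features, the 75/25 split, and the scaling step can be sketched together as follows. Several details are assumptions made for illustration because the text does not specify them: the sign convention open minus close, a chronological (unshuffled) split, standardisation as the scaling technique with statistics computed on the training part only, and the record layout and function names.

```python
def build_features(rows):
    """Derive Open-close, High-low and the TREND label from daily OHLC rows."""
    out = []
    for r in rows:
        out.append({
            "open": r["open"],
            "open_close": r["open"] - r["close"],        # assumed sign convention
            "high_low": r["high"] - r["low"],             # daily volatility
            "trend": 1 if r["open"] < r["close"] else 0,  # Z_t
        })
    return out


def chronological_split(items, train_fraction=0.75):
    """First 75% of the days for training, the remaining 25% for testing."""
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


def standardise(train_col, test_col):
    """Scale with the mean and standard deviation of the training part only."""
    mean = sum(train_col) / len(train_col)
    var = sum((v - mean) ** 2 for v in train_col) / len(train_col)
    std = var ** 0.5 or 1.0
    return ([(v - mean) / std for v in train_col],
            [(v - mean) / std for v in test_col])


rows = [
    {"open": 100.0, "high": 105.0, "low": 99.0, "close": 104.0},
    {"open": 104.0, "high": 106.0, "low": 101.0, "close": 102.0},
    {"open": 102.0, "high": 103.0, "low": 100.0, "close": 102.0},
    {"open": 102.0, "high": 108.0, "low": 102.0, "close": 107.0},
]
features = build_features(rows)
train, test = chronological_split(features)
train_hl, test_hl = standardise([f["high_low"] for f in train],
                                [f["high_low"] for f in test])
```

Computing the scaling statistics on the training part alone keeps information from the test period out of the model inputs.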

Proposed Framework
The objective of our study was to develop a framework in which different ensemble models are combined to form a single predictive model. In a standard stacking framework, the base learners, known as first-level classifiers, are fitted on the training dataset. After training, the outputs of these base-level classifiers are used as the input features for the second-level meta-classifier. However, the level-1 models in ensemble techniques sometimes overfit. We therefore introduced a dimensionality reduction technique to prepare the inputs for the first-level classifiers so that they would not overfit.
In our fusion-based work, we trained all the base-level classifiers and converted them into forecast models, which are used as input for our stacking framework. That is, we gathered each base classifier's predictive output, which is treated as a new set of data for our final model. We thus divided the model into two phases. In the first phase, XGBoost, AdaBoost, GB, LightGBM, CatBoost, and Histogram-based Gradient Boosting are used as level-1 classifiers. In the second phase, the stacking classifier is used as a level-2 classifier, called a meta-classifier, which extracts the hidden features from the outputs of the level-1 classifiers and combines the level-1 classifiers into a strong level-2 classifier. The developed framework and pseudo-code are given in Figure 3 and Algorithm 1.

Algorithm 1
Step 1: Implementation of dimensional reduction technique to prepare a training set for base-level classifiers.
Step 2: Initialise E to 6 (number of base-level classifiers)
For i ← 1 to E do
Read the base-level classifier m_i
Prepare a training set for first-level classifiers
For …

In this investigation, we developed a fusion-based ensemble model using six ensemble algorithms, i.e., XGBoost, AdaBoost, LightGBM, GB, CatBoost, and Histogram-based Gradient Boosting.
Here, we implemented two techniques for avoiding overfitting while training the models. We took five different datasets and seven input features (a detailed description is given in Section 3.2). In any model-building process, data pre-processing plays an essential role in generating a better predictive model. Thus, in our experiment, we went through the data pre-processing steps elaborated upon in Section 3. After data pre-processing, we trained our models. During model building, we used the stacking framework to hybridise the models. A stacking framework develops a hybrid model in two steps. In step 1, the base-level classifiers are chosen and trained, and their outputs are used to prepare a training dataset for the second level. Here, we took all six above-mentioned ensemble models as base-level classifiers, labelled level-1 classifiers.
However, the real challenge arose when we trained the level-1 classifiers with our seven input features: the base-level classifiers simply overfitted. To avoid overfitting in the level-1 models, we applied a dimensionality reduction technique. Of the different dimensionality reduction techniques available, we found that LDA was the most suitable for avoiding overfitting in our level-1 models. After applying it, we successfully developed six ensemble models, whose results are given in Tables 7-11. The outputs of these base-level classifiers are used as the training dataset for the second-level classifier, the meta-classifier. However, another challenge then arose: when we trained the meta-classifiers on the outputs of the level-1 classifiers, we again faced overfitting. To avoid overfitting in the meta-classifier, we applied the cross-validation technique, which not only avoided overfitting but also yielded a generalised model. Tables 13, 15, 17, 19, and 21 and Tables 12, 14, 16, 18, and 20 show the performance of our meta-classifiers with and without cross-validation, respectively.
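The role of cross-validation in stacking can be sketched as follows: each level-1 model predicts only the samples it was not trained on, and these out-of-fold predictions become the training data of the meta-classifier. The sketch is deliberately small and is not the study's implementation. The threshold-based weak learners stand in for the six boosting models, the accuracy-weighted vote stands in for the meta-classifier, and the interleaved fold assignment, the value k = 3, and all names are assumptions made for illustration.

```python
def make_threshold_learner(feature_index):
    """Toy level-1 learner: threshold one feature at the midpoint of the class means."""
    def fit(X, y):
        pos = [row[feature_index] for row, label in zip(X, y) if label == 1]
        neg = [row[feature_index] for row, label in zip(X, y) if label == 0]
        pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
        midpoint = (pos_mean + neg_mean) / 2
        sign = 1 if pos_mean >= neg_mean else -1
        return lambda row: 1 if sign * (row[feature_index] - midpoint) > 0 else 0
    return fit


def out_of_fold_predictions(fit, X, y, k):
    """Predict every sample with a model that never saw it during training."""
    n = len(X)
    oof = [None] * n
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in range(n):
            if i % k == fold:
                oof[i] = model(X[i])
    return oof


def fit_weighted_vote(meta_X, y):
    """Toy meta-classifier: vote weighted by each level-1 model's out-of-fold accuracy."""
    n_models = len(meta_X[0])
    weights = [sum(1 for row, label in zip(meta_X, y) if row[j] == label) / len(y)
               for j in range(n_models)]
    def predict(row):
        total = sum(w * (1 if p == 1 else -1) for w, p in zip(weights, row))
        return 1 if total > 0 else 0
    return predict


def fit_stacked(base_fits, meta_fit, X, y, k=3):
    # Level 1: out-of-fold predictions form the meta-classifier's training set.
    columns = [out_of_fold_predictions(fit, X, y, k) for fit in base_fits]
    meta_X = [list(row) for row in zip(*columns)]
    meta_model = meta_fit(meta_X, y)
    # Refit the level-1 models on all training data for use at prediction time.
    base_models = [fit(X, y) for fit in base_fits]
    return lambda row: meta_model([m(row) for m in base_models])


X = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [1.1, 6.0], [0.9, 2.0], [1.3, 5.5],
     [3.0, 5.2], [3.2, 6.1], [2.8, 3.5], [3.1, 6.5], [2.9, 4.5], [3.3, 7.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
stacked = fit_stacked([make_threshold_learner(0), make_threshold_learner(1)],
                      fit_weighted_vote, X, y)
```

Because the meta-classifier is trained on predictions made for unseen samples, it learns how reliable each level-1 model is on new data rather than how well that model memorised the training set.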

Results and Discussion
In this section, we focus on the experimental results obtained during our development process.
The six boosted ensemble models were trained on the five datasets, and their accuracy measures are given in Tables 7-11. Table 7 gives the training and testing accuracy of the six models. Its accuracy-difference column shows the difference between the training accuracy and the testing accuracy. It is always good practice to develop a model that is as generalised as possible, i.e., a model with little or no difference between the training and testing accuracies. Table 7 shows a small difference between training and testing accuracy, ranging from 0.94 to 2.15; the smaller the difference, the more generalised the model. Here, our CatBoost and HistoGradient Boosting models were more generalised than the other four models. After this, the predictive outputs of the six models were used as input features for the meta-classifier.
The remaining tables in this group (through Table 11) give the training and testing accuracies of the six boosted models for the other datasets. In each case, the trained models were then ready to train our meta-classifier, which was our primary goal in terms of developing a generalised model.

Performances of Fusion-Based Meta-Classifiers
In this section, we attempt to find the best combination of meta-classifiers. To develop the meta-classifiers, we used stacking with cross-validation to combine the level-1 predictive outputs. The aim of our experiment was to find a generalised model, i.e., one whose training accuracy and testing accuracy are close to each other. Table 12 shows the meta-classifiers built without cross-validation, and Table 13 shows the meta-classifiers built with cross-validation. Comparing Tables 12 and 13, the accuracy values of all models without C.V (Table 12) and with C.V (Table 13) appear almost the same, with a negligible difference. However, when looking for a generalised model with good predictive power, Table 13 (with cross-validation) provides the more promising result. Table 13 lists the six meta-classifiers, i.e., XGBoost, AdaBoost, GB, LightGBM, CatBoost, and HistoGradient Boosting. Several of them had good accuracy, but LightGBM and HistoGradient Boosting gave the most promising results, as they proved to be generalised models with good predictive power. The meta-classifier LightGBM provided a training accuracy of 93.33 and a testing accuracy of 93.27, a negligible difference of 0.06. The meta-classifier HistoGradient Boosting provided a training accuracy of 93.96 and a testing accuracy of 93.46, a difference of 0.50.

Performances of DJIA Index
When we compared both models, the meta-classifier HistoGradient Boosting showed the highest predictive accuracy, i.e., 93.46, whereas LightGBM showed 93.27; in terms of accuracy alone, HistoGradient Boosting is therefore more promising than LightGBM. However, if we aim for a generalised model with reasonable accuracy, LightGBM is the better choice: its training and testing accuracy differ by only 0.06 while both remain high, which led us to consider it the best model among all.
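The selection rule described here, which prefers the smallest gap between training and testing accuracy over the highest testing accuracy, can be written down in a few lines. The accuracy figures are the two pairs quoted in the text for the DJIA index; the dictionary layout and function name are illustrative.

```python
# (training accuracy, testing accuracy) as quoted in the text for the DJIA index
results = {
    "Meta-LightGBM": (93.33, 93.27),
    "Meta-HistGradientBoosting": (93.96, 93.46),
}


def accuracy_gap(train_acc, test_acc):
    """Absolute difference between training and testing accuracy."""
    return abs(train_acc - test_acc)


gaps = {name: round(accuracy_gap(*acc), 2) for name, acc in results.items()}
most_generalised = min(results, key=lambda name: accuracy_gap(*results[name]))
most_accurate = max(results, key=lambda name: results[name][1])
```

The two criteria pick different models: the highest testing accuracy belongs to the histogram-based model, while the smallest gap belongs to Meta-LightGBM.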

Performance of HSI
In Table 14, we show the six meta-classifiers that were developed without the cross-validation technique; the purpose was to establish that a combination of models can predict better than a single model. In Table 15, the reported testing accuracies include 95.35, while Meta-LightGBM provided 94.97 as the testing accuracy. However, looking at the accuracy differences, Meta-H.G Boost had a difference of 0.31, whereas Meta-LightGBM had a difference of 0.12, which was the smallest among all six meta-classifiers. Thus, Meta-LightGBM is the most generalised of the models shown in Table 15.
As shown in Table 17, the meta-classifiers were more generalised when the cross-validation technique was applied. Table 17 shows that Meta-AdaBoost, Meta-GB, and Meta-CatBoost had the highest testing accuracy, i.e., 95.43, whereas Meta-LightGBM provided 94.68. However, looking at the accuracy differences, Meta-XGBoost had a difference of 0.28, whereas Meta-LightGBM had 0.05, which was the smallest among all six meta-classifiers. Thus, Meta-LightGBM was the most generalised one among all.
Table 17. Construction of predictive meta-classifiers with cross-validation of S&P-500 index.

Performance of DAX Index
As shown in Table 18, six meta-classifiers were developed without the cross-validation technique. The training accuracies of all models in the table appear similar, with small differences, and likewise the testing accuracies show only a small variation. However, our findings revealed that the accuracy difference increased for the DAX index, and therefore there was a chance of overfitting. As shown in Table 19, the meta-classifiers were more generalised when the cross-validation technique was applied. In Table 19, Meta-AdaBoost had the highest accuracy difference, i.e., 0.47, whereas Meta-LightGBM had 0.10, the smallest among all six meta-classifiers. Thus, Meta-LightGBM was the most generalised of all models. With cross-validation applied, the models were more generalised, and the difference between training and testing accuracy was significantly smaller.

Performance of NIKKEI 225 Index
As shown in Table 20, six meta-classifiers were developed without the cross-validation technique. The training accuracies of all models appear similar, with small differences, and likewise the testing accuracies show only a small variation. However, our findings revealed that the accuracy difference increased for the NIKKEI 225 index when no cross-validation was applied, which may lead to overfitting. As shown in Table 21, the meta-classifiers were more generalised when the cross-validation technique was applied. Meta-XGBoost had the highest accuracy difference, i.e., 1.63, whereas Meta-LightGBM had 0.22, the smallest among all six meta-classifiers. Thus, Meta-LightGBM was the most generalised of all models. With cross-validation applied, the models were more generalised, and the difference between training and testing accuracy was significantly smaller.

Evaluation Metrics of Meta-LightGBM
From the above experiments and discussion, we found Meta-LightGBM to be the most generalised model. Thus, in this section, we report the performance measures of Meta-LightGBM on the different datasets used in our experiment. To assess the accuracy of a classification model and the quality of its predictions, we extract the classification report of that model, which provides the per-class metrics, i.e., recall, precision, and F1-score.
Table 22 shows the classification report of the meta-classifier LightGBM, which had an AUC score of 93.36. On the DJIA index, the precision for class 0 was 0.93, recall was 0.93, and the F1-score was 0.93, whereas for class 1, the precision was 0.94, recall was 0.93, and the F1-score was 0.94. The AUC score was close to its maximum of 1 (i.e., 100%), which indicates that the model performed well on this dataset.
Table 23 shows the classification report of the meta-classifier LightGBM, which had an AUC score of 95.43. On the S&P 500 index, the precision for class 0 was 0.97, recall was 0.94, and the F1-score was 0.95, whereas for class 1, the precision was 0.94, recall was 0.97, and the F1-score was 0.95. The AUC score was again close to its maximum, which indicates that the model performed well on this dataset.
Table 24 shows the classification report of the meta-classifier LightGBM, which had an AUC score of 95.26 for the HSI. On the HSI, the precision for class 0 was 0.95, recall was 0.96, and the F1-score was 0.96, whereas for class 1, the precision was 0.96, recall was 0.94, and the F1-score was 0.95. The AUC score was again close to its maximum, which indicates that the model performed well on this dataset.
Table 25 shows the classification report of the meta-classifier LightGBM, which had an AUC score of 84.71 for the DAX index. On the DAX index, the precision for class 0 was 0.86, recall was 0.87, and the F1-score was 0.87, whereas for class 1, the precision was 0.83, recall was 0.82, and the F1-score was 0.83. Compared with the previous three datasets, the AUC score was somewhat lower but still good enough for acceptable model performance.
Table 26 shows the classification report of the meta-classifier LightGBM, which had an AUC score of 84.41 for the NIKKEI 225 index. On the NIKKEI 225 index, the precision for class 0 was 0.90, recall was 0.94, and the F1-score was 0.92, whereas for class 1, the precision was 0.84, recall was 0.75, and the F1-score was 0.79. Compared with the first three datasets, the AUC score was somewhat lower but still good enough for acceptable model performance. Figure 4 shows a graphical presentation of the AUC scores, and Table 27 shows the AUC scores of the Meta-LightGBM model for the different datasets.

Forecast Accuracy Comparison with Past Work
For a comparative study, we compared the model proposed by Qiu et al. [66], i.e., WLSTM+Attention, with our proposed model, i.e., Meta-LightGBM. As a benchmark, we used the mean absolute error (MAE) of both models; the model with the lower MAE value is the better predictive model [40]. As shown in Table 28, our proposed model had a lower MAE value than the WLSTM+Attention model. Thus, on this measure, the proposed model performs better than the WLSTM+Attention model.
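As an illustration of the benchmark, the sketch below computes the MAE as the mean of the absolute differences between actual and predicted values. The text does not state whether the MAE was computed on class labels or on another target, so the 0/1 labels used here are an assumption, and all values are hypothetical rather than taken from Table 28. For 0/1 labels, the MAE equals the misclassification rate, i.e., one minus the accuracy.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float],
                        y_pred: Sequence[float]) -> float:
    """Mean of |actual - predicted| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical direction labels (1 = up, 0 = down), NOT results from the paper.
actual = [1, 0, 1, 1, 0]
model_a = [1, 0, 0, 1, 0]  # one wrong prediction out of five
model_b = [0, 0, 0, 1, 1]  # three wrong predictions out of five

mae_a = mean_absolute_error(actual, model_a)
mae_b = mean_absolute_error(actual, model_b)
better = "model_a" if mae_a < mae_b else "model_b"
print(mae_a, mae_b, better)
```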

Practical Implications
Every investor's 'dream' is to anticipate the stock price correctly and, as a result, compute the expected return. The proposed method can provide investors with useful information. ML-based tools now offer recommendations about specific stocks, giving investors a preliminary idea of the market and helping them to minimise losses on their investments. Artificial intelligence has a genuine effect on the stock market by mining significant data and providing inexpensive and easily accessible tools that benefit everybody, not only corporations. The investment decisions made by AI are calculated, precise, and impartial, unlike those made by humans, who can be overly emotional about trading securities. The proposed model may be used to develop new trading techniques or to manage stock portfolios by changing equities on the basis of trend predictions. It can help financial institutions to gather information about stock movements so that they can guide their investors to book profits and minimise losses. Furthermore, it provides a new direction for future researchers on how ensemble models can be hybridised with different combinations of technical indicators and on how the outcome changes when different parameters are tuned. Our experimental results revealed that the Meta-LightGBM meta-classifier had a smaller difference between training and testing accuracy, which made our model more generalised. With the help of our model, investors can minimise their losses during trading.

Conclusions and Future Scope
This study is based on the fusion of ensemble models with technical indicators and extracted features to develop an evolutionary ensemble framework for forecasting stock market swings. During the model building process, our goal was to select a generalised model whose training and testing accuracy difference was minimal, rather than the model with the highest accuracy, so that investors and financial decision-makers can minimise biased results by using the proposed model. For our novel approach, we randomly selected five different indexes from four different countries and used seven features: four technical indicators, two features derived from the pre-existing elements, and the open price of the stock indices. Six ensemble models were used as base classifiers in layer one: XGBoost, LightGBM, AdaBoost, Gradient Boosting, CatBoost, and Histogram-based Gradient Boosting. During the individual modelling process (layer-one modelling), we used a dynamic reduction algorithm, i.e., LDA, to generate probable input for the next-layer classifier and to prevent overfitting.
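The two-layer design summarised above can be illustrated with a minimal pure-Python sketch of cross-validated stacking, in which the meta-classifier is trained only on out-of-fold predictions of the base models. This is a sketch under strong simplifying assumptions and is not the implementation used in this study: single-feature threshold rules (`make_stump_fitter`) stand in for the six boosting models, an accuracy-weighted vote (`fit_weighted_vote`) stands in for the LightGBM meta-classifier, the LDA step is omitted, and contiguous, unshuffled folds are assumed. All function names and data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Predictor = Callable[[Row], float]
Fitter = Callable[[List[Row], List[int]], Predictor]


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds of (train_idx, test_idx)."""
    bounds = [i * n // k for i in range(k + 1)]
    folds = []
    for lo, hi in zip(bounds, bounds[1:]):
        test = list(range(lo, hi))
        train = [j for j in range(n) if j < lo or j >= hi]
        folds.append((train, test))
    return folds


def make_stump_fitter(feature: int) -> Fitter:
    """Hypothetical base learner: a threshold on one feature.

    Assumes both classes are present in every training split.
    """
    def fit(X: List[Row], y: List[int]) -> Predictor:
        pos = [row[feature] for row, label in zip(X, y) if label == 1]
        neg = [row[feature] for row, label in zip(X, y) if label == 0]
        pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
        threshold = (pos_mean + neg_mean) / 2
        sign = 1.0 if pos_mean >= neg_mean else -1.0
        return lambda row: 1.0 if sign * (row[feature] - threshold) > 0 else 0.0
    return fit


def out_of_fold_features(fitters: List[Fitter], X: List[Row],
                         y: List[int], k: int) -> List[List[float]]:
    """Meta-features: each row is predicted by models that never saw it."""
    meta = [[0.0] * len(fitters) for _ in X]
    for train_idx, test_idx in k_fold_indices(len(X), k):
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for m, fit in enumerate(fitters):
            predict = fit(X_tr, y_tr)
            for i in test_idx:
                meta[i][m] = predict(X[i])
    return meta


def fit_weighted_vote(meta: List[List[float]],
                      y: List[int]) -> Callable[[Sequence[float]], int]:
    """Hypothetical meta-learner: weight each base model by its
    out-of-fold accuracy and take a weighted majority vote."""
    weights = [sum(1 for row, label in zip(meta, y) if row[m] == label) / len(y)
               for m in range(len(meta[0]))]
    half = sum(weights) / 2
    return lambda row: 1 if sum(w * p for w, p in zip(weights, row)) >= half else 0


def fit_stack(fitters: List[Fitter], X: List[Row], y: List[int],
              k: int = 5) -> Callable[[Row], int]:
    meta_predict = fit_weighted_vote(out_of_fold_features(fitters, X, y, k), y)
    base = [fit(X, y) for fit in fitters]  # refit base models on all rows
    return lambda row: meta_predict([p(row) for p in base])


# Hypothetical toy data: feature 0 is informative, feature 1 is noise.
X = [[i - 9.5, float((2 * i) % 5)] for i in range(20)]
y = [1 if row[0] > 0 else 0 for row in X]
stack = fit_stack([make_stump_fitter(0), make_stump_fitter(1)], X, y, k=5)
print(stack([3.0, 0.0]), stack([-3.0, 4.0]))
```

Because the meta-learner only ever sees predictions made on held-out rows, it cannot reward a base model for memorising its training data, which is the mechanism by which cross-validation narrows the gap between training and testing accuracy.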
Our results revealed that when we fused the ensemble models and developed a meta-classifier without using cross-validation, the difference between training and testing accuracy was larger than for the fusion models built with cross-validation. A fusion model sometimes performed better than a single predictive model, but the fusion models trained with cross-validation showed promising results. During our experiment, we found that when the data size was increased, the performance of the model sometimes decreased. Our goal was to find a fusion model with minimal overfitting and underfitting, i.e., one whose training and testing accuracy difference is very small. Table 29, which records the accuracy differences, shows that the meta-classifier Meta-LightGBM had the smallest training and testing accuracy difference across all the indexes. When both accuracy and generalisation are considered, rather than accuracy alone, Meta-LightGBM with cross-validation was more promising than all other predictive models. The run time of the model (Meta-LightGBM) on different datasets depended on constraints such as the hardware configuration of the system on which it was run. On the Google Colab platform, training the proposed model took 5 s on the S&P 500 index, 4 s on DJIA, 4 s on HSI, 40 s on the NIKKEI 225 index, and 14 s on the DAX index. From this, we found that the execution time may vary from one dataset to another. Finally, we can say that the fusion of ensemble models generalises better when a cross-validation technique is applied: it not only improves predictive accuracy but also yields a generalised model whose training and testing accuracy are very close to each other. Thus, we obtain a model which is neither overfitted nor underfitted.
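The selection criterion used throughout this study, i.e., preferring the model whose training and testing accuracy are closest on every index, can be sketched as follows. The model names and accuracy values are hypothetical and are not taken from Table 29, and aggregating the per-index differences by their worst case is an assumption made for this illustration.

```python
from typing import Dict, List, Tuple

# (training accuracy, testing accuracy) per index for each candidate model.
Results = Dict[str, List[Tuple[float, float]]]


def worst_gap(pairs: List[Tuple[float, float]]) -> float:
    """Largest absolute train/test accuracy difference over all indexes."""
    return max(abs(train - test) for train, test in pairs)


def most_generalised(results: Results) -> str:
    """Name of the model with the smallest worst-case accuracy gap."""
    return min(results, key=lambda name: worst_gap(results[name]))


# Hypothetical accuracies on two indexes, NOT values from the paper.
results: Results = {
    "Meta-A": [(0.99, 0.90), (0.98, 0.93)],
    "Meta-B": [(0.95, 0.94), (0.94, 0.92)],
    "Meta-C": [(0.97, 0.93), (0.96, 0.95)],
}

selected = most_generalised(results)
print(selected, round(worst_gap(results[selected]), 4))
```

In this toy example, Meta-A has the highest training accuracy but also the largest gap, so the criterion passes over it in favour of the model that behaves most consistently on unseen data.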

Limitations and Future Work
Despite the strong predictive performance of our proposed methodology, some limitations remain that may be addressed in future work. Our current study examined stock direction prediction only one day ahead, and the study should therefore be extended to long-term prediction. Our current study focused on the stock exchanges of only four countries, and it may be extended to the stock exchanges of other countries. The research group did not consider other information sources such as fundamental analysis or sentiment analysis, and these sources should be considered in future experiments. In future work, we will use factorisation machines and observe how they help in predicting stock market behaviour [104][105][106][107].

Data Availability Statement: Publicly available datasets were analysed in this study. These data can be found on the 'Yahoo/Finance' portal. The data used to support the findings of this study are available from the first author upon request.