A Machine Learning Method for Prediction of Stock Market Using Real-Time Twitter Data

Abstract: Finances represent one of the key requirements for performing any useful activity. Financial markets, e.g., stock markets, forex, and mercantile exchanges, provide anyone the opportunity to invest and generate finances. However, to reap maximum benefits from these financial markets, effective decision making is required to identify trade directions, e.g., going long/short, by analyzing all the influential factors, e.g., price action, economic policies, and supply/demand estimation, in a timely manner. In this regard, analysis of financial news and Twitter posts plays a significant role in predicting the future behavior of financial markets, estimating public sentiment, and estimating systematic/idiosyncratic risk. In this paper, we analyze Twitter posts and Google Finance data to predict the future behavior of stock markets (one of the key financial markets) over a particular time frame, i.e., hourly, daily, or weekly, through a novel StockSentiWordNet (SSWN) model. The proposed SSWN model extends the standard opinion lexicon SentiWordNet (SWN) with terms specifically related to stock markets and uses the result to train an extreme learning machine (ELM) and a recurrent neural network (RNN) for stock price prediction. Experiments performed on two datasets, i.e., Sentiment140 and a Twitter dataset, achieved an accuracy of 86.06%. Findings show that our work outperforms state-of-the-art approaches with respect to overall accuracy. In the future, we plan to enhance our method by adding other popular social media platforms, e.g., Facebook and Google News.


Introduction
Stock price fluctuation signifies existing market trends and company evolution that can be measured to decide whether to sell or buy stocks. Stock market prediction has been considered one of the most challenging and essential tasks due to the market's nonlinear and dynamic behavior [1]. Stock prices move up and down every minute, or even every second, because of variations in demand and supply. If a group of individuals wants to purchase a specific stock, its price will rise, whereas when most people owning a specific stock want to sell it, its market price will decrease. This association between supply and demand is tied to the news, blogs, and sentiment analysis (SA), etc. Stock market prediction using SA deals with automatically estimating the future performance of the stock market [2]. In this regard, Twitter is the most popular platform for gauging public opinion, so it can be useful for forecasting stock market prices [3].
Nowadays, there has been a debate on the effectiveness of sentiments conveyed via social media in forecasting changes in the stock market. Various researchers have revealed that sentiments might influence stock market movement and act as potential predictors of trading outcomes [4,5]. Furthermore, different sentiment mining methods can be employed differently in numerous stock circumstances [6]. In other words, considerable effort is involved in evaluating opinions about the traits and features of stocks [7,8]. However, the existing techniques do not suggest an absolute reliance on the number of tweets per unit of time. The amount of data gathered and analyzed in existing studies remains inadequate, thus causing predictions with low accuracy [9,10].
Even though extensive techniques have been presented by the research community for stock market prediction, these approaches have some potential limitations. The existing methods are not robust to tackle the versatile nature of stocks. Furthermore, the massive size of data requires such methods which can learn a more reliable set of features to better demonstrate the varying behaviors of stocks over the time. Hence, there is a need for performance enhancement both for the stock trends prediction accuracy and time complexity.
To deal with the issues of current approaches, we propose a technique, namely the SSWN with an ELM classifier, for stock market prediction. The presented method comprises three main steps: data gathering, sentiment computation along with model training, and finally the stock market prediction module. More descriptively, the contributions of this paper are as follows:

1. An efficient framework, namely SSWN, is proposed with ELM and RNN classifiers for stock market behavior prediction.

2. Utilization of SA for stock market prediction and modification of SWN by introducing new terms related to the stock market.

3. Assignment of sentiment scores to the newly introduced stock-market-related terms by applying the information gain method, resulting in the development of a new sentiment lexicon, SSWN.

4. A comparative analysis with other methods to show the effectiveness of the proposed method.
The remainder of this paper is structured as follows: Section 2 shows the related work. The proposed method is presented in Section 3. Experiments and results are described in Section 4, while Section 5 concludes our work.

Related Work
Numerous studies [11][12][13][14][15][16][17][18][19][20][21] have been presented on employing electronic knowledge to forecast stock trends. For instance, Zhang et al. [22] proposed an LSTM-based method to estimate the stock market trend. In the first step, the input is partitioned into three parts: public opinion, stock transaction, and market transaction data. A one-layer LSTM was employed to capture long memory in the public opinion space, whereas two layers of LSTM were applied to train short memory in the stock series and market. After this, the data were combined by using a merge layer, and a linear layer was utilized to enhance the model results. The method predicts market behavior and evaluates the relationship between the emotions of investors and transaction data. However, the method needs further improvements in its emotion abstraction technique. Xu et al. [23] presented a method for forecasting the stock market by introducing SA. Initially, the dataset is gathered by using a heuristic means-end process, and then sentiments are identified from the acquired data. SA was combined with an event study, and the result was used as the input of principal component analysis (PCA), which was used for further analysis. The method predicts market behavior using SA with an accuracy of 84.89%. However, the method faces stability-related issues, and there exists an inequality between the forecast and real values.
Wu et al. [24] proposed a deep learning (DL) method for the prediction of dimensional valence-arousal sentiments in the stock market. The method used the title, keywords, and overview of stock-market-related messages for estimation of all vectors using a hierarchical attention approach. The method achieved success, producing better results. However, it cannot identify words with multiple meanings, and it also needs some stability improvements. Similar to the aforementioned technique, a DL-based method was employed in [22] for stock market prediction using sentiment analysis. The model is based on RNN and LSTM techniques, which are then utilized to classify sentiments into positive and negative classes. The increase or decrease in stock prices is predicted from the sentiment analysis. Ren et al. [15] presented a framework for prediction by examining the sentiments of investors. Initially, financial review content was gathered from two sites, namely Sina Finance and Eastmoney. Then, an SVM was trained over the financial data to predict an essential index in China, namely the SSE 50 Index, by applying a five-fold cross-validation technique. The method confirmed that merging sentiment key points with stock market data yields more robust results than utilizing stock market data alone when estimating movement direction. However, this technique is not robust enough to analyze large data in real time. Bouktif et al. [14] introduced an approach to predict the stock market's future directions. Initially, stock data are gathered from online resources together with public tweets. In the second step, an NLP approach was applied to compute the informative key points from the tweets. Then, several ML-based methods, namely naive Bayes, logistic regression, SVM, ANN, random forest, and XGBoost, were trained to classify the data. The technique needs further improvement for complex textual features.
Kelotra et al. [13] offered a DL-based technique, namely the Rider-monarch butterfly optimization (MBO)-based ConvLSTM framework, for stock market prediction. In the first step, the input data were collected from the live stock market and passed to the key-point computation process to calculate a technical-indicator-based representative set of features. In the next step, a clustering technique, namely sparse fuzzy C-means (FCM), was employed over the extracted key points to group them. After this, the most important key points were passed to the presented Rider-MBO-based Deep-ConvLSTM network to perform prediction. Another sentiment analysis-based stock market prediction approach was presented in [12], which makes use of computed textual deep features. After gathering the stock market data, a CNN and an RNN were employed to compute the deep features. After this, PCA and LDA algorithms were applied to extract the significant set of features. Finally, an SVM classifier was trained over the calculated features for stock market movement prediction. The model performs well for stock market prediction, but it may not exhibit better performance over real-world scenarios. Similarly, in [11], a DL-based framework employing sentiment analysis for stock market prediction was presented. The LSTM model was utilized to forecast the future closing values of a stock market. Although it supports English-only tweets, this method robustly calculates stock market movements.
The user responses from historic articles can be employed to predict consumer behaviors with time. One such method was presented in [25] using a dual CNN approach with user behaviors to embed both the semantic and structural information from text articles. Another approach employing Pillar 3 disclosed information was presented in [26] that focused on the investigation of deposit users' interests and behavior using information from websites that were rooted deeply in commercial bank disclosures. The Pillar 3 regulatory framework's objective was to strengthen price stability by ensuring accountability and improving financial institutions' public disclosures. The work [26] performs well for analyzing consumer behavior. However, the model needs evaluation on a standard dataset.

Proposed Methodology
Our proposed technique encompasses three steps: data gathering and cleansing, sentiment extraction and model training, and stock market prediction.

Data Gathering and Cleansing
First, we gather data from Twitter. This social media platform is selected due to its conciseness. In addition to tweets data directly extracted from Twitter, we have used the state-of-the-art dataset named Sentiment140 [27]. After data acquisition, we cleanse this collected data by removing spam, redundant, meaningless or irrelevant tweets by using a reduction system. The preprocessing step further includes the following:

•
Conversion of tweets into word tokens by using bigrams, meaning that the model evaluates two tokens/words at the same time. This means that if a tweet describes something as "not good", it will be considered a negative remark, rather than a positive one just because it contains the word "good".

•
Removal of tags like the author tag (@). These labels must be eliminated because they contain no valuable knowledge for obtaining sentiments.

After preprocessing, the cleansed dataset is used for feature extraction and sentiment identification by using the ML algorithm. This process transforms the raw Twitter data into a standard dataset containing a feature set and tweets with their predicted sentiments, i.e., Positive, Negative, and Neutral, denoted by 1, −1, and 0, respectively. Furthermore, neutral tweets can cause an imbalance in the training process, which can degrade the performance of the classifier. To remove the neutral tweets, we used a simple algorithm which identified them by their label (i.e., 0) and filtered them out, resulting in a reduced version of the dataset with no neutral tweets. The removal of neutral tweets is necessary for two reasons: (i) neutral tweets do not contain any opinion or sentiment polarity, hence they do not play any significant role in opinion mining, and (ii) the inclusion of neutral tweets results in a bigger dataset, causing extra and unnecessary overhead for the classifier during model training [28][29][30]. The overall architecture is shown in Figure 1.
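The tweet-cleansing steps above can be sketched as follows (a minimal illustration: the sample tweets, labels, and regular expression are hypothetical placeholders, not the exact reduction system used in our experiments):

```python
import re

def preprocess(tweets):
    """Clean labeled tweets: drop author tags, tokenize into bigrams,
    and filter out neutral examples (label 0)."""
    cleaned = []
    for text, label in tweets:
        if label == 0:                               # neutral tweets carry no polarity
            continue
        text = re.sub(r"@\w+", "", text).lower()     # remove author tags (@user)
        tokens = text.split()
        bigrams = list(zip(tokens, tokens[1:]))      # pairwise tokens, so "not good"
        cleaned.append((bigrams, label))             # stays a single negative unit
    return cleaned

sample = [("@bob TSLA is not good today", -1),
          ("AAPL earnings look strong", 1),
          ("just watching the market", 0)]
for bigrams, label in preprocess(sample):
    print(label, bigrams[:2])
```

Because negation words stay paired with what they modify, the bigram representation avoids misreading "not good" as positive.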

Secondly, we also make use of stock market data provided by Google Finance, where global historical stock data are available. The price data of the chosen stocks are selected and downloaded from the service provider in a CSV file. The collected data contain seven features: date, open, high, low, close, volume, and adjusted close. These features indicate the traded date, opening price, highest trading price, lowest trading price, closing price, traded shares, and the stock closing price when investors are paid their dividends, respectively.
These data are also preprocessed by adding some calculated values based on existing features (i.e., 5-day price difference, 10-day price difference, extrapolated prices during holidays, and return of the market (RM)) and removing some columns, including the adjusted close price, volume, and opening price. The reasons for adding these calculated values are as follows: the 5- and 10-day price differences provide a brief view of the past behavior of the stock under discussion; the closing prices for weekends have been extrapolated to complete the timeline of the dataset, which may improve the overall accuracy of the model [4]; and the return of the market (RM) is calculated to provide an investor a probabilistic idea of risk vs. expected profit.
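The derived price features can be sketched as follows (toy closing prices; the holiday extrapolation step is omitted, and RM is taken here as the simple daily return, which is an assumption about the exact formula used):

```python
closes = [10.0, 10.5, 10.2, 10.8, 11.0, 11.3, 11.1, 11.6, 11.9, 12.0, 12.4]

def derived_features(closes):
    """Per-day derived values: 5-/10-day price differences and a simple
    daily return of the market (RM). None marks days where the lookback
    window is not yet available."""
    rows = []
    for i, c in enumerate(closes):
        rows.append({
            "close": c,
            "diff_5": c - closes[i - 5] if i >= 5 else None,     # 5-day price difference
            "diff_10": c - closes[i - 10] if i >= 10 else None,  # 10-day price difference
            "rm": (c - closes[i - 1]) / closes[i - 1] if i >= 1 else None,  # daily return
        })
    return rows

feats = derived_features(closes)
print(feats[5]["diff_5"], feats[10]["diff_10"], round(feats[1]["rm"], 3))
```

Columns the model does not use (adjusted close, volume, open) would simply be dropped from the CSV rows before this step.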
After the preprocessing stage for both data sources has been completed, the next step is model training and stock prediction. ELM- and RNN-based models have been trained using the features extracted from the Twitter and Google Finance datasets. Both datasets are divided into two subsets: the first 70% is reserved for training and the remaining 30% for testing/validation. More details about the incorporated datasets are provided in the results and discussion section.
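The chronological 70/30 split described above can be sketched as a one-line helper (the sample indices are illustrative):

```python
def chrono_split(samples, train_frac=0.7):
    """Chronological split: the first 70% of samples train the model,
    the remaining 30% are held out for testing/validation."""
    cut = int(len(samples) * train_frac)
    return samples[:cut], samples[cut:]

train, test = chrono_split(list(range(10)))
print(len(train), len(test))
```

Splitting by position rather than at random preserves the time ordering of the stock data, so the model is never tested on days that precede its training window.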

Feature Extraction
Once the data are passed from the preprocessing stage, they are forwarded to the feature extraction stage, where further data processing is performed. For this purpose, we propose a novel approach, namely the SSWN. A detailed description of the proposed approach is given in the subsequent sections.

SWN
Several lexical resources are widely utilized in various investigations. A summary of the most commonly applied resources is given in Table 1. The first lexical resource in the table, SenticNet, is a publicly available semantic resource used for performing SA at the concept level. It does not use standard graph mining techniques; rather, it uses a custom-devised 'energy flows' concept for common-sense knowledge representation. AFINN is one of the simplest and most popular lexicons, containing a few thousand words, each associated with a polarity score ranging from −5 to 5. Similarly, SO-CAL is a lexical resource containing more than six thousand entries, assigning each word a polarity score ranging from −5 to 5. Another popular lexical resource is WordNet, which superficially resembles a thesaurus, grouping words together based on their meanings. It is a freely available large lexical database which groups nouns, verbs, adverbs, and adjectives into synsets, also known as cognitive synonym sets. Additionally, WordNet-Affect extends WordNet by further including a subset of synsets which are appropriate for representing affective concepts in correlation with affective words. There are several applications of SWN in SA that can be employed to predict the stock market, as the structure of its key points is convenient for mathematical modeling. SWN is a lexical resource for opinion mining [23] in which every synset of WordNet is assigned a triple of polarity scores, i.e., a positivity, negativity, and objectivity score. SWN was built automatically by applying a mixture of linguistic and statistical classifiers. It has been employed in various opinion-related tasks, e.g., bias analysis and SA, with encouraging findings.

SSWN
For predicting the future trends of the stock market, we have introduced SSWN, which is based on SWN 3.0 and contains a set of feature words specifically helpful for identifying and scoring tweets related to the stock market only. The SSWN creation procedure starts with two seed sets. The first group comprises positive terms, while the other contains negative terms. The seed groups are extended by adding all the synsets from SWN related to the seed words; a particular radius value is chosen for seed expansion. Another set, namely the objective words, is also introduced. In the second step, the computed seeds are used to classify the SSWN synsets into positive and negative classes. In the presented approach, we have employed classifiers with four choices of radius = 0, 2, 4, 6. The outputs from all classifiers are averaged to decide the final value of the synset. Table 2 describes an SSWN sample in which every tuple of SSWN specifies a synset comprising dialogue data, an identifier that links the synset with WordNet, scores, and a gloss that keeps the denotation together with the usage of the values available in each synset. All words/tokens in each row of the cleansed data are replaced with the calculated scores, resulting in a feature matrix which is aligned/standardized with the input requirements of the ELM classifier. The objective score (OS) can be calculated as:

OS = 1 − (PS + NS), (1)

where PS is the positive score and NS is the negative score. The sentiment score (SS) can be calculated using Equation (2):

SS = PS − NS. (2)

The strength of sentiment (ST) can be found through Equation (3), in which r is the rank of the feature. Table 3 demonstrates the relationship between a term t and a class c.
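The objective and sentiment scores can be sketched directly, assuming the standard SentiWordNet relation OS = 1 − (PS + NS) and SS = PS − NS (the example scores are illustrative):

```python
def objective_score(ps, ns):
    """Equation (1): a synset is objective to the extent that it is
    neither positive nor negative."""
    return 1.0 - (ps + ns)

def sentiment_score(ps, ns):
    """Equation (2): net polarity of a synset."""
    return ps - ns

ps, ns = 0.625, 0.125                    # example positivity/negativity scores
print(objective_score(ps, ns))           # objectivity
print(sentiment_score(ps, ns))           # net polarity
```

A fully objective synset (PS = NS = 0) receives OS = 1 and SS = 0, so it contributes nothing to the polarity features.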
Table 1. Summary of highly applied lexical resources (N/A = not applicable/reported).

Resource	Entries/Synsets	Words	Min. Score	Max. Score
AFINN [32]	2477	N/A	−5	5
SO-CAL [33]	6306	N/A	−5	5
Subjectivity Lexicon [34]	8221	N/A	N/A	N/A
Opinion Lexicon [35]	6786	N/A	N/A	N/A
General Inquirer [36]	11,789	N/A	N/A	N/A
SentiSense [37]	2190	5496	N/A	N/A
Micro-WNOp [38]	1105	1960	0	1
WordNet [39]	117,659	155,287	N/A	N/A
WordNet-Affect [40]	2874	4787	N/A	N/A
SentiWordNet [41]	117,659	155,287	0	1

Table 3. Association between t and c.

	Presence of Term t	Absence of Term t
Presence of class c	A	C
Absence of class c	B	D

Information Gain (IG)

IG, also termed expected mutual information, is an ML-based technique that is employed to compute term goodness for a given technique [23]. It works by computing the bits of information based on the presence or absence of a word in a file. For example, let the collection of classes in the target space be represented by c_i, i = 1, . . . , m [30]. Then, the IG for a term t is computed by using the formula in Equation (4):

IG(t) = −∑_{i=1}^{m} P(c_i) log P(c_i) + P(t) ∑_{i=1}^{m} P(c_i|t) log P(c_i|t) + P(t̄) ∑_{i=1}^{m} P(c_i|t̄) log P(c_i|t̄). (4)
It is a simplified form of binary categorization [21], as text categorization approaches typically use an n-ary classification space, i.e., n can range up to tens of thousands. Furthermore, the goodness of a term is calculated globally with respect to all classes on average. The IG value is computed for every distinct term in a specified corpus, and a threshold is defined on the IG score below which terms are eliminated from the corpus. The computational complexity of IG is O(Vn), where V is the vocabulary size and n is the number of categories. By employing the correlation table, the IG value is computed through Equation (5). The greater the value of IG, the stronger the association.
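Using the counts A, B, C, and D from Table 3, the information gain of a term can be computed as H(c) − H(c|t) (a generic sketch of the IG computation for the binary case; the example counts are hypothetical):

```python
from math import log2

def info_gain(A, B, C, D):
    """Information gain of term t for class c from the 2x2 contingency
    counts: A = t present & c present, B = t present & c absent,
    C = t absent & c present, D = t absent & c absent (Table 3)."""
    N = A + B + C + D
    def H(*ps):
        return -sum(p * log2(p) for p in ps if p > 0)
    h_c = H((A + C) / N, (B + D) / N)                  # class entropy H(c)
    p_t, p_not = (A + B) / N, (C + D) / N
    h_c_given_t = 0.0
    if p_t:
        h_c_given_t += p_t * H(A / (A + B), B / (A + B))
    if p_not:
        h_c_given_t += p_not * H(C / (C + D), D / (C + D))
    return h_c - h_c_given_t                           # IG = H(c) - H(c|t)

print(round(info_gain(40, 10, 10, 40), 4))
```

When term and class are independent (A = B = C = D), the IG is zero, and the term would fall below any positive threshold and be eliminated.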

Sentiment Knowledge Base (SKB) Generation Procedure
To produce the SKB, the presented approach follows these steps:

1. Take all rows from SSWN one by one.
2. Compute the synset from each selected row.
3. Calculate the sentiment orientation (SO) for each synset.
4. If the computed SO is found to be subjective, then go to step 5; otherwise, remove the selected synset and jump to step 1 again.
5. For each subjective synset, locate and calculate its part-of-speech information.
6. Find specific words from the synset.
7. Calculate the feature vector by combining all individual terms along with speech chunks, separated by a hash, i.e., term#POS.
8. Save the computed key points in the list of nominated features.
9. Repeat steps 1-8 for all the rows.
10. The same feature can have replicated records with different polarity and sentiment scores in the key-points list because of its sense-ranking-based usage, so this step is performed to locate the distinctive features.
11. The positive and negative occurrences are computed for all the features detected in step 10.
12. Based on the count score computed in step 11, IG is employed to produce sentiment scores.
13. Finally, a distinctive identifier (ID) is allocated to each feature.
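The steps above can be condensed into a sketch (the synset tuple layout, subjectivity threshold, and example scores are illustrative assumptions, and the IG-based scoring of step 12 is omitted for brevity):

```python
def build_skb(synsets, subjectivity_threshold=0.5):
    """Sketch of the SKB procedure: keep subjective synsets, build
    term#POS features, count positive/negative occurrences, and assign
    a distinctive ID to each feature."""
    counts = {}
    for terms, pos, ps, ns in synsets:
        if 1.0 - (ps + ns) > subjectivity_threshold:   # step 4: too objective -> skip
            continue
        for term in terms:                             # steps 5-8
            feat = f"{term}#{pos}"                     # term#POS feature key
            pos_n, neg_n = counts.get(feat, (0, 0))
            # step 11: tally positive vs. negative occurrences
            counts[feat] = (pos_n + (ps > ns), neg_n + (ns > ps))
    # step 13: allocate a distinctive ID to each surviving feature
    return {feat: (i, c) for i, (feat, c) in enumerate(sorted(counts.items()))}

skb = build_skb([(["bullish"], "a", 0.75, 0.0),
                 (["bearish"], "a", 0.0, 0.875),
                 (["market"], "n", 0.0, 0.0)])
print(skb)
```

The fully objective synset ("market") is discarded at step 4, while the subjective ones survive with their polarity counts and IDs.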
The SKBs produced via this procedure are domain-independent, as sentiment strength is computed by employing a generic sentiment lexicon that does not require training on a specific domain. The presented SKBs are capable of dealing with the problems of data absence and data diversity. Moreover, these SKBs can easily locate the sentiment orientation, weight, and sense of words based on their usage. These sentiment resources are used in the introduced technique to improve SA specifically for stock market prediction and for SA in general. Table 4 shows a sample from the proposed lexical resource SSWN. Another challenging problem for effective SA is the constant appearance of new words or sentences. Hence, there is a need for a method that can deal with a database comprising frequent out-of-vocabulary (OOV) words. In natural language processing, words which are present in the testing/real dataset but not available in the training dataset are called out-of-vocabulary (OOV) words. The main issue is that the model mistakenly assigns zero probability to OOV words, which results in the likelihood of a word being equal to zero. This problem commonly occurs when the model is trained on a larger dataset. There are multiple solutions to this problem, including tokenization, smoothing techniques, and semantic representations [42,43]. As OOV terms belong to a specific domain, intensive domain information is needed to specify their strength. To cope with this issue, active learning is usually employed, in which a polarity score is assigned by humans. To avoid bias, we have chosen only those OOV words for which at least ten persons have voted. The final sentiment score is computed by taking the average of all ten scores.

Prediction Phase
The link between stocks and sentiments is definitely nonlinear, and financial markets often follow nonlinear trends. Hence, after discovering a causal association between the moods over the past 3 days and present-day stock prices, we attempted two techniques (ELM and RNN) to discover and examine this association [44]. As discussed earlier, the proposed technique incorporates two datasets, i.e., data extracted from Twitter and a state-of-the-art dataset named Sentiment140. The features extracted from the Twitter data by using SSWN have been incorporated to predict the stock trends by using the past three days' stock data extracted from Google Finance. These extracted features are then utilized to predict the current day's stock trends for a set of specific brands.

Extreme Learning Machine
The important characteristics of text classification include a large number of training samples and high text dimensionality. The high dimensionality of the text results in an increased computational burden for the ELM. A traditional and effective method to resolve this issue is to reduce text dimensionality by using text representations which help increase the classification accuracy. Researchers often use the vector space model (VSM) for text representation in text classification. Compared with other text representation methods, word vector representation has proven to have better text representation ability. Word vectors deal with the dimensionality problem by mapping each term (a distinct word in the textual dataset) to a low-dimensional real vector by training on the unlabeled corpus. We have considered open, high, and low as inputs to the ELM and the closing price as its output. The ELM classifier [45] was initially introduced as a feed-forward neural network with a single hidden layer that does not need to be tuned. The output with L hidden nodes for a training sample x is given in Equation (6):

f(x) = ∑_{i=1}^{L} β_i h_i(x) = h(x)β, (6)

Here, β = {β_1, . . . , β_L}^T represents the output weights between the nodes of the hidden and output layers, while h(x) = {h_1(x), . . . , h_L(x)} is the hidden-layer output vector. The decision function of the ELM classifier is given as:

label(x) = sign(h(x)β). (7)

To obtain robust performance, the ELM aims to achieve the lowest training error and reach the minimum norm of the output weights by reducing the given objective function:

Minimize: ||Hβ − T||^2 and ||β||, (8)

Here, H is the output matrix of the hidden layer.
To reduce the norm of the output weights, the ELM draws an optimal hyperplane to classify the samples into different classes by maximizing the margin 2/||β||, employing the minimal least-squares approach:

β = H†T, (9)

Here, H† denotes the Moore-Penrose generalized inverse of the matrix H, which can be calculated by using orthogonalization, orthogonal projection, or singular value decomposition approaches.
The desired output of the ELM is then obtained by substituting the solved weights, i.e., f(x) = h(x)β = h(x)H†T.
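The closed-form training of Equation (9) can be sketched in NumPy (a minimal illustration with a toy regression target standing in for the (open, high, low) → close mapping; the layer size and random weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, L=50):
    """Minimal ELM sketch: the hidden layer is random and never tuned;
    only the output weights beta are solved in closed form via the
    Moore-Penrose pseudo-inverse, beta = pinv(H) @ T (Equation (9))."""
    W = rng.normal(size=(X.shape[1], L))   # random input weights
    b = rng.normal(size=L)                 # random hidden biases
    H = np.tanh(X @ W + b)                 # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta       # f(x) = h(x) beta

# Toy task standing in for (open, high, low) -> close regression.
X = rng.uniform(size=(200, 3))
T = X.sum(axis=1, keepdims=True)
W, b, beta = elm_train(X, T)
err = np.abs(elm_predict(X, W, b, beta) - T).mean()
print("mean training error:", err)
```

Because no gradient descent is involved, training cost is dominated by a single pseudo-inverse, which is what makes the ELM fast on large feature matrices.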

Recurrent Neural Network
A recurrent neural network (RNN) is suitable for problems in which we must deal with a sequence of data. Many researchers recommend using RNNs for time series analysis [8][9][10]. In this type of work, the model learns from its current observations, also known as the short-term memory of the network, resembling the frontal lobe of the brain. The reason for using an RNN when dealing with sequential data is that the model uses its short-term memory to predict the upcoming data more accurately. Rather than using a fixed deadline for deleting past data, the weights allotted to past data determine how long these data will be kept in memory. Thus, an RNN is more suitable for problems such as sequence labeling, sentiment analysis, and part-of-speech tagging [46,47].
Time series analysis is generally an important problem which can be addressed by using an RNN, as the data are in sequential order and learning draws on the most recent observations, alternatively called short-term memory. This work primarily focuses on text classification, so the RNN in this research is used for classification of Twitter data. We propose a model to predict the closing price of the stock market.
Twitter data are not in a uniform format, meaning that the number of words in a tweet may vary from 3-5 words to 17-20 words, for example. However, our neural network does not accept input in this form, so we need to convert these data into a uniform format. The most appropriate solution to this problem is embedding and padding the data rows. The embedding process involves representing the words with vectors by using the procedure mentioned in the discussion of the ELM; the position of a term or word in a vector space is determined and represented in the feature vector. The embedded data then need to be of uniform length, so we pad the data with zeros.
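The padding step can be illustrated with a small helper (a simplified stand-in for library utilities such as Keras's `pad_sequences`; the token IDs below are arbitrary):

```python
def pad_sequences(seqs, maxlen):
    """Pad variable-length token-ID sequences with zeros (and truncate
    longer ones) so every row has uniform length for the network."""
    return [(s + [0] * maxlen)[:maxlen] for s in seqs]

rows = [[4, 8, 15], [16, 23], [42, 7, 9, 11, 3]]
print(pad_sequences(rows, 4))
```

Short tweets are right-padded with zeros and long ones truncated, so every row matches the fixed input width the network expects.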
An RNN [48] employs links among nodes to build a directed graph over a timeframe, which enables it to exhibit sequential dynamic behavior. An RNN utilizes its memory to handle variable-length input sequences, which makes it appropriate for stock prediction. Every processing unit in an RNN has a time-varying real-valued activation and adaptable weights, which are generated by employing the same set of weights in a loop over a graph-like structure. Equation (12) is used to specify the values of the hidden units:

h_t = f(W h_{t−1} + U x_t + b), (12)

where h_t is the hidden state at time step t, x_t is the current input, W and U are the recurrent and input weight matrices, b is the bias, and f is a nonlinear activation function.
In an RNN, the size of the input remains the same for each learned model, as it is indicated in the form of a shift from one state to another. Moreover, the structure employs an identical transition function with the same parameters for each time step. The RNN stores the output of the previous layers to make predictions, which enables it to work with sequential data. In this work, we have tested the RNN for prediction of stock market behavior.
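The hidden-state recurrence can be sketched in NumPy (a standard Elman-style cell with tanh activation; the dimensions and random weights here are illustrative, not the trained model):

```python
import numpy as np

def rnn_forward(xs, W, U, b):
    """Plain recurrent cell: h_t = tanh(W h_{t-1} + U x_t + b).
    The previous hidden state h_{t-1} acts as the short-term memory
    that is mixed with each new input x_t using the same weights at
    every time step."""
    h = np.zeros(W.shape[0])
    for x in xs:
        h = np.tanh(W @ h + U @ x + b)
    return h  # final state summarizing the whole sequence

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8)) * 0.5   # recurrent weights (shared over time)
U = rng.normal(size=(8, 2)) * 0.5   # input weights
b = np.zeros(8)
seq = rng.normal(size=(5, 2))       # 5 time steps, 2 features each
print(rnn_forward(seq, W, U, b).shape)
```

Because the same W, U, and b are reused at every step, the cell accepts sequences of any length while keeping the parameter count fixed.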

Experimental Results
This section describes the demographics of datasets used, an overview of the evaluation metrics, and a comprehensive discussion of the results achieved along with a comparison with state-of-the-art techniques.

Experimental Setup
The test bed consists of a workstation equipped with an x64 Intel Core i7-6700 CPU clocked at 3.40 GHz, 16 GB of DDR4 RAM, and a 4 GB NVIDIA GeForce graphics card. The storage consists of a 1 TB HDD and a 256 GB SSD. The 64-bit operating system is Microsoft Windows 10 Professional, which is installed on the SSD. The datasets and working environments are stored on the SSD to avoid the mechanical delay caused by the HDD and to speed up the model training and testing process.
Python version 3.7.15, along with necessary libraries such as NLTK, Stanford NER Tagger, BeautifulSoup, NumPy, and Scikit-learn, is installed in an Anaconda environment. We have used the ReLU activation function and a learning rate of 0.001 for model training. For performance evaluation, we have employed different metrics, i.e., accuracy, precision, recall, and F-measure.
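The evaluation metrics can be computed directly from confusion-matrix counts (a generic sketch; the counts below are made up for illustration):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F-measure from the confusion
    counts of a binary classifier."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print([round(m, 4) for m in metrics(80, 10, 10, 100)])
```

F-measure is the harmonic mean of precision and recall, so it penalizes a classifier that trades one heavily for the other.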

Datasets
As described in the previous sections, we incorporated two datasets: Sentiment140, a standard dataset widely used for SA tasks, and a dataset collected directly from the Twitter platform using a Twitter API client, i.e., Tweepy.

The Sentiment140 Dataset
The Sentiment140 dataset contains a total of 1.6 M tweets extracted using a Twitter API [49]. All tweets are annotated as negative = 0, neutral = 2, or positive = 4 and are used to discover their sentiments. The dataset contains six columns, described in Table 5; a detailed description of the Sentiment140 dataset can be found in [27]. We filtered the tweets, retaining only those mentioning one of the specified brand names in the tweet body. This filtration produced a subset of the Sentiment140 dataset consisting of 56 K tweets in total. The neutral tweets were excluded from the dataset, as neutral tweets do not play a significant role in stock prediction.
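The filtering described above, i.e., keeping only brand-mentioning tweets and dropping the neutral class, can be sketched with pandas. The brand list and rows below are illustrative stand-ins, not actual Sentiment140 content:

```python
import pandas as pd

BRANDS = {"apple", "tesla", "microsoft"}   # illustrative subset of the studied brands

# Toy rows mimicking Sentiment140's annotation: 0 = negative, 2 = neutral, 4 = positive
df = pd.DataFrame({
    "target": [0, 2, 4, 4, 0],
    "text": [
        "apple stock is tanking",
        "just had lunch",
        "tesla earnings look great",
        "loving my new phone",
        "microsoft update broke everything",
    ],
})

def mentions_brand(text):
    """True if the tweet body mentions any of the studied brand names."""
    return any(brand in text.lower() for brand in BRANDS)

# Drop neutral tweets and keep only brand-mentioning ones
subset = df[(df["target"] != 2) & (df["text"].apply(mentions_brand))]
print(len(subset))   # 3
```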

Direct Data from Twitter
This dataset was collected using custom code that calls a Twitter API. Tweets mentioning the brands listed in Table 6 and posted between 1 March 2021 and 21 March 2021 were downloaded using the Python library Tweepy. After the preprocessing and cleansing steps described in the previous sections, the gathered data were ready to be used for predicting the stock market value of specific brands. Table 6 presents the demographics of the data collected directly from Twitter. Similar to the steps performed on the Sentiment140 dataset, we downloaded tweets mentioning one of the brands under study using the aforementioned custom code, resulting in a new dataset of approximately 506 K tweets. Additionally, we calculated the term frequency; Figure 2 depicts a word cloud of frequently used words in the dataset.
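A minimal sketch of such a collection script is given below. The `build_query` helper and the bearer-token placeholder are our own illustrative assumptions; the exact Tweepy call depends on the API version and access level available (the 2021 date range used here would need full-archive access, so the network call is shown only as a comment):

```python
BRANDS = ["AAPL", "TSLA", "MSFT"]   # illustrative subset of the brands under study

def build_query(brands):
    """Build a search query matching tweets that mention any of the given brands."""
    return "(" + " OR ".join(brands) + ") -is:retweet lang:en"

query = build_query(BRANDS)
print(query)   # (AAPL OR TSLA OR MSFT) -is:retweet lang:en

# Hypothetical usage -- requires tweepy plus valid credentials and suitable API access:
# import tweepy
# client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")
# for page in tweepy.Paginator(client.search_recent_tweets, query=query, max_results=100):
#     for tweet in page.data or []:
#         print(tweet.text)
```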
The neutral tweets were excluded from the dataset, as they do not play a significant role in stock prediction. After preprocessing and removal of the neutral tweets, the dataset was reduced to a total of 224.2 K tweets belonging to the positive and negative classes only.

Proposed Method Results
This section provides a detailed discussion of the results achieved by the proposed approach. For stock price prediction, we trained two models, i.e., ELM and RNN, over both datasets and report the average results. Figure 3 shows the results of the proposed method in terms of precision, recall, and F-measure. The figure depicts that the proposed model shows variable performance from stock to stock. The reason for this is the availability of training data: some stocks were mentioned less often than others on Twitter, resulting in fewer tweets (i.e., less training data) for those brands. Thus, the more data available for certain stocks, the more accurately the model can predict the output values (stock prices, in our case) for those stocks. From the results, we can say that our method achieved good results for predicting stock market behavior. The average values of our proposed system in terms of precision, recall, and F-measure are 0.8603, 0.811, and 0.8537, respectively. The column graph shows the brand-wise results of our method, in which blue, red, and gray bars show precision, recall, and F-measure, respectively. So, we can say that our method can precisely predict the stock market behavior of all brands.
To further evaluate our method, we plotted the accuracies of all brands in the boxplot shown in Figure 4. Figure 4a describes the results of ELM classification. The prediction accuracies for the brands, i.e., APPL, TSLA, MSFT, WMT, PYPL, NVDA, INTC, FB, TWTR, and AMZN, are 90.3%, 85.01%, 88.21%, 85.18%, 84.716%, 87.35%, 80.733%, 79.25%, 91.05%, and 88.78%, respectively. Thus, the average accuracy of our proposed technique is 86.06%, which is impressive and shows that it can be used to precisely predict stock market behavior. Figure 4b shows the results of the RNN classifier for all brands. Here, our method achieved an average accuracy of 81.4%, which is less than the accuracy achieved by the ELM classifier. According to these results, our proposed approach accurately predicts the stock market trends of any brand.
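As a quick sanity check, the reported 86.06% average follows directly from the per-brand accuracies:

```python
# Per-brand ELM accuracies reported in Figure 4a (APPL through AMZN)
accuracies = [90.3, 85.01, 88.21, 85.18, 84.716, 87.35, 80.733, 79.25, 91.05, 88.78]
average = sum(accuracies) / len(accuracies)
print(round(average, 2))   # 86.06
```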

Classifiers' Performance Evaluation
We selected nine ML algorithms and compared their performance with respect to the prediction accuracy. We trained these algorithms and then tested them on both the datasets to predict future stock market trends. Before applying these ML algorithms, we split the final datasets into two portions, i.e., 70% of the samples as training data and the remaining 30% as testing data. The training and testing of the algorithms are performed by using a Python library for ML named Scikit-learn [31]. Table 7 provides a list of ML algorithms used in this experimentation along with their optimal parameters.
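The 70/30 split described above corresponds to a standard Scikit-learn call; the feature and label arrays below are toy stand-ins for the vectorized tweets and their sentiment labels:

```python
from sklearn.model_selection import train_test_split

# Toy feature/label arrays standing in for the vectorized tweets and sentiment labels
X = list(range(100))
y = [i % 2 for i in range(100)]

# 70% of the samples for training, the remaining 30% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)
print(len(X_train), len(X_test))   # 70 30
```

Stratifying on the label keeps the positive/negative class balance identical in both portions.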

Performance of Algorithms before and after SSWN
We evaluated the performance of the chosen techniques with SSWN and without using it, i.e., by employing the standard SWN. Figure 5 demonstrates an overall increase in the accuracy of all algorithms after employing SSWN.
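The idea behind SSWN, i.e., extending an SWN-style lexicon with stock-specific terms before scoring, can be sketched as follows; all term scores shown are hypothetical illustrations, not actual SWN or SSWN values:

```python
# Base lexicon in SentiWordNet style: term -> (positive score, negative score)
swn = {
    "good": (0.75, 0.0),
    "bad": (0.0, 0.625),
}

# Hypothetical stock-market-specific extensions (illustrative scores only)
stock_terms = {
    "bullish": (0.8, 0.0),
    "bearish": (0.0, 0.8),
    "rally": (0.6, 0.0),
    "selloff": (0.0, 0.7),
}
sswn = {**swn, **stock_terms}   # SSWN = SWN extended with stock-market terms

def tweet_score(tweet, lexicon):
    """Sum (positive - negative) scores over the tweet's tokens found in the lexicon."""
    score = 0.0
    for token in tweet.lower().split():
        pos, neg = lexicon.get(token, (0.0, 0.0))
        score += pos - neg
    return score

tweet = "bullish rally expected good quarter"
print(round(tweet_score(tweet, swn), 2), round(tweet_score(tweet, sswn), 2))   # 0.75 2.15
```

The same tweet receives a much stronger positive score under the extended lexicon, which is why stock-specific terms help the classifiers.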

Performance of Algorithms on Both Datasets
Along with other comparisons, we also compared the performance of the selected algorithms on the standard Sentiment140 dataset, as shown in Table 8. It is evident from Table 8 that the ELM classifier outperforms the other algorithms in terms of accuracy and precision; however, ELM does not remain on top in terms of recall and F-measure. It can also be observed from the table that RNN shows the second-best performance in terms of accuracy and precision while remaining on top in terms of F-measure. Table 9 demonstrates the performance of these algorithms on the Twitter dataset and shows the significant performance improvement obtained by the majority of the classifiers. By incorporating SSWN, the performance of all algorithms except NB and DT increased, i.e., these two algorithms did not show any significant improvement in their performance. RNN, here again, shows the second-best performance in terms of accuracy and precision and the best performance in terms of F-measure.

Time Complexity
The performance of all selected algorithms is also compared with respect to the time taken by the models for training and for assigning the sentiment scores. Figure 6 shows a detailed comparison of the algorithms in terms of time taken, in seconds, for training and scoring. NB took the minimum time for training, while ELM was the fastest at sentiment scoring. Overall, ELM and RNN were found to perform well with respect to accuracy combined with time complexity.
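One simple way such per-model timings can be collected (not necessarily the exact instrumentation used for Figure 6) is with Python's `perf_counter`:

```python
import time

def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Toy stand-ins for model training and sentiment scoring
_, train_seconds = timed(lambda: sum(i * i for i in range(100_000)))
_, score_seconds = timed(lambda: sum(range(10_000)))
print(train_seconds >= 0.0, score_seconds >= 0.0)   # True True
```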

Comparison with State-of-the-Art Techniques
Several researchers have presented ML-based work to predict future trends of the stock market. Therefore, in this section, to assess the prediction robustness of our approach, we performed a comparative analysis of our framework against the latest ML-based approaches. This analysis is evaluated in terms of the employed technique and data, as well as the obtained accuracy and precision.
The comparative results are reported in Table 10. Zhou et al. [33] presented an approach for stock market prediction that uses the online emotions expressed by people to assess their behavior. The method [33] employed SVM to perform classification and attained an accuracy of 64.15%. Nguyen et al. [34] introduced a framework for stock market prediction using sentiments related to a specific topic of the company and employed the SVM classifier for prediction. This method [34] showed an average accuracy of 54.41%. The work in [35] presented a data mining technique for stock market prediction with an obtained accuracy of 66.48%. Khan et al. [25] introduced a framework using social sites along with political events for stock market prediction and attained an accuracy of 75.38%. Similarly, the technique in [36] used the same concept with an ANN classifier and attained an average accuracy of 77.12%. Khan et al. [37] presented another approach using financial news data and obtained an accuracy of 80.6%. The technique in [38] employed people's sentiments from social sites along with naive Bayes and SVM classifiers and showed the best average accuracy of 80.6%. From Table 10, it can be seen that the presented approach achieved an accuracy of 85.7%, which is higher than all the comparative methods. Moreover, the comparative methods attained an average accuracy of 71.24%, compared with 85.7% in our case, so our method obtained a performance gain of 14.46%.
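The reported 14.46% gain is simply the difference, in percentage points, between our accuracy and the comparative average:

```python
ours = 85.7              # accuracy of the proposed approach
comparative_avg = 71.24  # mean accuracy of the compared methods
gain = ours - comparative_avg
print(round(gain, 2))   # 14.46
```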

Technique            Accuracy (%)   Precision
Khan et al. [27]     75.38          0.67
Zhou et al. [50]     64.15          0.65

Discussion
The prediction of stock market prices is an interesting topic of research, and it is a challenging task due to the volatility, diversity, and dynamic behavior of the stock market. Recent research has revealed that sentiments and news might influence stock market movement and act as potential predictors of trading outcomes. Social media platforms can therefore be considered an important source of information, as useful chunks of information can be extracted from the posts already published by users. In this regard, Twitter is a particularly suitable source due to the concise nature of the tweets posted there. However, this conciseness also makes the job more challenging because of shortened words, duplication, and the different types of noise residing in tweets. Combined with the power of machine learning, tweets can be significant for the prediction of stock market prices. In this work, we introduced a novel approach for the prediction of stock prices using SA. For this purpose, we implemented two distinct classifiers, i.e., RNN and ELM, along with other popular ones, based on the proposed sentiment lexicon named SSWN and two datasets, i.e., data directly acquired from Twitter and the standard Sentiment140 dataset. We performed the experimentation on data for ten US market stocks obtained from Google Finance. First, we compared and evaluated the performance of nine different machine learning algorithms on the said stock data, where the performance of ELM remained on top. Second, we compared our work with the state-of-the-art, achieving superior overall accuracy due to the use of a dedicated sentiment lexicon specially proposed for stock market prediction. The scope and performance of the proposed technique can be further enhanced by considering other DL-based approaches.

Conclusions
People use social media to share their personal ideas and opinions regarding a brand, entity, person, or affair. Twitter is a globally recognized, modern social media platform for sharing ideas and opinions in a very concise way. Using the power of SA and ML, social media posts such as tweets can play a significant role in the prediction of stock market behavior. This work introduced a novel approach for stock market prediction using SA. The model is based on the proposed SSWN sentiment lexicon along with RNN and ELM classifiers. We used Twitter data and the Sentiment140 dataset for the performance evaluation of the ML models, considering ten different brands for stock market prediction. We achieved an average accuracy of 81.40% for the RNN and 86.06% for the ELM classifier. We compared our approach with various ML models as well as with state-of-the-art methods and achieved remarkable results. In future, we plan to enhance the capability and coverage of this approach by adding other popular social media platforms, e.g., Facebook and Google News. Furthermore, we may evaluate the proposed approach over other challenging datasets while considering more stocks as well.