Unleashing the Power of Tweets and News in Stock-Price Prediction Using Machine-Learning Techniques

Abstract: Price prediction tools play a significant role in small investors’ behavior. As such, this study aims to propose a method to more effectively predict stock prices in North America. Chiefly, the study addresses crucial questions related to the relevance of news and tweets in stock-price prediction and highlights the potential value of considering such parameters in algorithmic trading strategies—particularly during times of market panic. To this end, we develop innovative multi-layer perceptron (MLP) and long short-term memory (LSTM) neural networks to investigate the influence of Twitter count (TC) and news count (NC) variables on stock-price prediction under both normal and market-panic conditions. To capture the impact of these variables, we integrate technical variables with TC and NC and evaluate the prediction accuracy across different model types. We use Bloomberg Twitter count and news publication count variables in North American stock-price prediction and integrate them into MLP and LSTM neural networks to evaluate their impact during pandemic-driven market panic. The results showcase improved prediction accuracy, promising significant benefits for traders and investors. This strategic integration reflects a nuanced understanding of the market sentiment derived from public opinion on platforms like Twitter.


Introduction
The dynamic landscape of the global stock market plays a significant role in shaping economies, influencing individual financial decisions, and driving continuous innovation in investment strategies. The indisputable significance of the stock market is underscored by the growth in total global capitalization, which exceeded USD 109 trillion in 2023, a remarkable threefold increase from the 2009 figure of USD 25 trillion [1,2]. As per World Bank data from 2017, stock trading's substantial impact on the American economy has been evident since 2013, with the total value of stocks traded on US markets consistently surpassing 200% of the nation's annual GDP. The New York Stock Exchange (NYSE), a symbolic hub of financial activity, boasts an average market capitalization of approximately USD 29 trillion, highlighting its central role in global financial markets [3]. The daily average of about USD 169 billion in stocks traded on the NYSE further emphasizes the fluidity and dynamism of the market [4].
Beyond its macroeconomic influence, the movement of stocks on a micro level is crucial in determining the financial market's overall well-being. Notably, studies such as that of Chan and Woo [5] reveal that stock-market price booms can drive long-term economic growth. The involvement of novice investors in the stock market adds another layer of complexity to the financial ecosystem. In particular, over 54% of US adults now have some form of investment in the stock market [6]. This increase in individual investors can be attributed to the post-2008 financial crisis [7]. In this evolving landscape, the need for reliable stock-price prediction models increases, especially as financial products and services become more accessible to smaller investors.
Moreover, price prediction tools play a significant role in small investors' behavior. As such, numerous research studies (e.g., [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23]) have been conducted to develop appropriate models and techniques that can be widely employed in algorithmic trading. Algorithmic trading, in essence, leverages computational power and mathematical formulas to illuminate buy or sell decisions for financial securities on an exchange and incorporates complex formulas and models with human oversight [24][25][26][27]. These techniques are integral to institutional firms' trading algorithms, as they aid in minimizing transaction costs and market risks [28]. The rise of artificial intelligence (AI) in both the stock market and financial firms has significantly contributed to the growth of the algorithmic trading market. Companies like Sentient have developed AI-powered algorithmic traders, thereby showcasing the potential for these advanced algorithms to function as standalone entities [29,30].
While numerous studies have developed models and techniques utilizing AI methods in stock-price prediction, they often focus on technical indices and general data in a normal market context. More specifically, these studies frequently overlook the impact of news on the decisions of small traders, particularly in market-panic scenarios. The profound impact of the media, most notably the news and Twitter, on investors' decisions in relation to stock buying and selling is evident. This has greatly shaped the modern financial landscape. News that is disseminated through traditional media or online platforms can swiftly influence investor sentiment by providing critical information about companies, industries, and the broader economic landscape. For example, during the COVID-19 pandemic, the demand for oil rapidly declined due to business closures and travel restrictions; this inevitably caused the futures price for West Texas Intermediate (WTI) to plummet from USD 18 a barrel to around −USD 37 a barrel [31]. Real-time updates on corporate earnings, geopolitical events, and market trends can trigger immediate reactions and prompt investors to make rapid decisions in reassessing their positions. The social media platform Twitter, in particular, has become a dynamic space for financial discussions and for disseminating market-related information. Tweets from influential market analysts, financial experts, and even company executives can rapidly circulate and thereby influence investor perceptions and drive fluctuations in stock prices. The accessibility and speed of information on both news outlets and Twitter have made it imperative for investors to stay vigilant, as their decisions are now increasingly shaped by the instantaneous flow of news and opinions.
Evidently, both the news and tweets can potentially impact small traders' trading decisions. As such, this study aims to investigate whether or not these parameters can affect price prediction performance. More specifically, we aim to answer the following research questions:

• Can considering the news and tweets improve the stock-price prediction accuracy?
• Does the impact of the news and tweets on the price prediction differ under normal and market-panic conditions?
Given these research questions, the primary aim of the research is to investigate whether considering news and tweets can enhance stock-price prediction accuracy. This impact is crucial for investors, financial institutions, and algorithmic trading firms, as more accurate predictions can lead to better investment decisions, reduced risks, and improved profitability. Furthermore, by examining the impact of news and tweets on stock-price prediction, the research seeks to deepen understanding of the dynamic interactions between market information, investor sentiment, and stock-price movements. This understanding can help develop more accurate prediction models that capture the complex interaction of factors influencing financial markets. Following the understanding of these impacts, this research aims to investigate the stability of these impacts under panic conditions. In other words, it aims to consider whether and how these impacts may alter during periods of market distress. To this end, we chose the COVID-19 pandemic as the case for our analysis, given its status as one of the most significant instances of market panic in recent history [32][33][34]. Similarly, considering the widespread utilization of LSTM and MLP models in stock-price prediction (e.g., [12,19,21,35-45]), we opted to evaluate these two methods specifically in the context of incorporating tweets and news count as predictive variables, particularly during market distress.
The remainder of this paper is organized as follows. In Section 2, we review the relevant literature to identify extant knowledge gaps and highlight the contributions of our current study. Next, in Section 3, we define the problem under investigation and outline the proposed model for price prediction. Then, in Section 4, we develop the solution algorithms. Moreover, in Section 5, we analyze the results to address the research questions. Lastly, in Section 6, we offer concluding remarks and insights and suggest future avenues for research.

Literature Review
A stock market is a place for publicly listed companies to trade stocks and other financial instruments; likewise, the price of shares is termed the stock price [46]. Initial studies (e.g., [47]) held a view of the stock market as stochastic and, hence, non-predictable. However, later studies (e.g., [22,35,48,49]) argued that the stock market may be predictable to an extent when it is examined from a behavioral economics and socioeconomic theory of finance point of view. Therefore, many research studies have developed different models to predict stock prices and parameters in the stock market. These various approaches and techniques can be categorized into three main categories in terms of strategy: (i) technical analysis, (ii) fundamental analysis, and (iii) sentiment-based analysis [20,22,30]. Technical analysis is the most popular approach. It defines several indicators, such as open price, close price, low price, high price, and trading volume, and applies mathematical and statistical techniques to structured data to predict the stock price based on trends in the past and present stock prices [22,48,50,51,52]. On the contrary, fundamental analysis is concerned with the company that underlies the stock itself instead of the actual stock [53][54][55]. The data that are used by the fundamental analyst are usually unstructured and, thus, pose some challenges. However, this type of data has occasionally been shown to be a good predictor of stock-price movement [56]. Moreover, this approach is utilized by financial analysts daily, as it incorporates various factors, such as economic forecasts, the efficiency of management, business opportunities, and financial statements [57]. Fundamental analysis can, in essence, be defined as a method of finding a stock's inherent value based on financial analysis. Lastly, sentiment-based analysis is based on linguistic feature extraction (e.g., [58][59][60]). This approach has not been as popular, given the difficulty of developing reliable and efficient sentiment analysis tools. This is mainly due to the design complexity and relevant source selection (which is of utmost importance).
Given the popularity of the technical analysis approach, many studies have employed different statistical and mathematical techniques to predict stock prices. These techniques can be generally categorized into three groups: (i) statistical models (STAT), (ii) evolutionary algorithms (EA), and (iii) machine learning (ML), including neural networks (NN). To utilize these techniques, decision-makers should decide on the type of target variable the model aims to predict. These variables can either be the stock price, stock direction, index price, or index direction. The stock price refers to an individual numerical value that the model believes the stock will be priced at soon. The stock direction refers to the direction (i.e., up or down) in which the stock price will move. Furthermore, the index could be another target variable. Unlike the case of stocks, where the data pertain to an individual stock, an index measures a section of the stock market by combining price data from multiple stocks. A few well-known indices are the S&P 500 and the Dow Jones Industrial Average.
Exploring the existing literature reveals that diverse techniques have been used for stock-market prediction. Table 1 summarizes studies that have used various techniques for stock-market prediction. To compile this table, we utilized two prominent research databases, namely the Web of Science (WOS) and Scopus, which are known for their comprehensive coverage across various disciplines. Additionally, we supplemented our search by exploring the reference lists of selected studies not initially found in these databases. Our survey focused primarily on papers published after 2000, particularly those published within the last decade. Subsequently, we meticulously reviewed the available literature to identify studies specifically addressing stock-price prediction through technical or sentiment-based analysis techniques. Each identified article underwent a thorough investigation, enabling us to categorize them based on the type of analysis and techniques employed for stock-price prediction. As shown in Table 1, individual stock outputs are significantly more popular than index-based computations. Likewise, ML and NN are the most popular techniques that have been used in recent years. Hence, given the popularity of these techniques, we will focus on reviewing research studies using ML and NN techniques in order to position our work in the extant literature.

ML Techniques in Stock-Price Prediction
Several ML techniques have been employed in stock-price prediction. Among these techniques, the K-nearest neighbors (KNN), random forests (RF), support vector machines (SVM), regression, support vector regression (SVR), and ARIMA are the most popular. The KNN algorithm is one of the earliest ML algorithms applied to stock-price prediction and is a supervised algorithm used for classification and regression. The primary operation involves identifying the closest neighbors to the queried data point. If the task is classification, the most-occurring neighbor value is returned and used as the output value. However, if regression is the objective, then the average of all neighbor values is returned and used as the output value. Chen and Hao [131] applied the regression format of this algorithm for stock-price prediction. Nevertheless, its application in stock-price prediction has decreased in recent years due to its simplicity, as it results in poor performance when compared to newer ML approaches. For instance, Shynkevich et al. [97] utilized the KNN and SVM algorithms for stock-price prediction and compared their performance to show the superiority of the SVM algorithm. Shynkevich et al. [104] also used ML algorithms to investigate the impact of the forecasting window length on stock-price prediction. They concluded that the SVM achieves appropriate prediction accuracy, whereas the KNN leads to poor performance.
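The neighbor-lookup logic described above can be sketched in a few lines. This is a minimal illustration of the general KNN idea, not the implementation of any cited study, and the sample feature vectors and prices are invented:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3, task="regression"):
    """Predict a value for one query point from its k nearest training neighbors."""
    # Euclidean distance from the query to every training point
    dists = np.linalg.norm(X_train - x_query, axis=1)
    neighbor_idx = np.argsort(dists)[:k]      # indices of the k closest points
    neighbor_vals = y_train[neighbor_idx]
    if task == "classification":
        # the most-occurring neighbor value becomes the output
        vals, counts = np.unique(neighbor_vals, return_counts=True)
        return vals[np.argmax(counts)]
    # regression: the average of the neighbor values becomes the output
    return neighbor_vals.mean()

# Toy example: predict the next price from (price, volume) feature pairs
X = np.array([[10.0, 1.0], [10.5, 1.2], [20.0, 3.0], [20.5, 3.1]])
y = np.array([10.2, 10.6, 20.1, 20.6])
print(knn_predict(X, y, np.array([10.2, 1.1]), k=2))  # averages the two nearby prices
```

The classification branch implements the "most-occurring neighbor" rule, and the regression branch implements the neighbor-average rule, exactly as described in the text.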
Regression is another supervised ML technique used for stock-price prediction. For instance, Jiang et al. [88] incorporated this algorithm to predict stock-price movements. To this end, they divided users into stakeholder groups and then analyzed how stakeholder group postings correlated with events in the company. They then used this information to predict movements in stock prices. Furthermore, Gupta and Chen [58] investigated the sentiments extracted from a vast repository of tweets sourced from StockTwits. They employed three distinct machine-learning algorithms (i.e., Naïve Bayes, SVM, and logistic regression) and additionally explored five different featurization techniques (i.e., bag of words, bigram, trigram, term frequency-inverse document frequency (TF-IDF), and latent semantic analysis (LSA)) in an attempt to comprehensively understand the nuanced relationship between sentiment and stock-market dynamics. The work of Zheng et al. [123] also utilized ML for stock-price prediction. However, they introduced the bat algorithm to optimize the three free parameters of the SVR model to create the BA-SVR hybrid model. This model was then employed to forecast the closing prices of 18 stock indexes in the Chinese stock market. The empirical results demonstrated that the BA-SVR model surpassed both the polynomial kernel SVR model and the sigmoid kernel SVR model, particularly when compared to the latter models without the optimized initial parameters.
Moreover, random forests (RF) are another ML algorithm for stock-price prediction (e.g., [15,83,110,132]). This algorithm aggregates the power of a large number of individual decision-tree algorithms to improve the prediction performance [133]. More specifically, it is based on the simple principle that many relatively simple models that exhibit low correlation will outperform any of the individual models when they operate as a joint group [134]. Lee et al. [83] examined whether textual data improve stock-price prediction when RFs are employed in training their models. The authors' proposed RF model consisted of 2000 trees, and all the models tested were successfully trained. Their results demonstrated that the incorporation of textual data improved next-day price-movement prediction by 10%. Similarly, Weng et al. [110] examined the effectiveness of incorporating both textual data and technical data in stock-price prediction. Applying decision trees was one of their tested methods alongside SVM and NN. The authors also highlighted that incorporating textual data can improve prediction results. Patel et al. [132] compared RF to NN and SVM and concluded that RF outperforms the others. However, it is worth noting that the applied NN was very simple, with only one hidden layer containing two neurons. Later, Zhang et al. [15] utilized RF as a critical part of training in a proprietary stock prediction model. They concluded that the relatively effective performance of their prediction model (in terms of accuracy and returns) is due to the incorporation of RF as one of the integrated models used as a learning method.

Support vector machines (SVM) are another promising model in the domain of ML. This type of model separates data into distinct classes via a decision boundary and then seeks to maximize the margin. Due to the nonlinearity of stock-price data, SVM is an appropriate technique for prediction, as it can project the data points into a higher-dimensional space by performing a function on the data that makes the classes linearly separable. As a result, many studies (e.g., [100,135]) have utilized SVM for stock-price prediction. Additionally, several studies have successfully incorporated textual data into their SVMs for stock prediction [75,91]. Nevertheless, the SVM algorithm is computationally time-consuming. Thus, some studies (e.g., [16]) utilized a kernel function to improve the efficiency of mapping data into a higher-dimensional space. Similar to SVM, support vector regression (SVR) employs the same components and techniques; however, rather than classifying, the mathematical operations are tasked with regressing the data points. Li et al. [79] utilized a multiple-kernel SVR to test whether or not the inclusion of news articles alongside technical indicators improved the predictive power of the SVR model. The results revealed that the multiple-kernel SVR outperformed a normal SVR model. The work of Schumaker et al. [73] employed the AZFinText qualitative stock-price predictor and regressed stock quotes and financial news article data as inputs into an SVR algorithm for stock-price prediction. The authors examined whether incorporating sentiment-based analysis into the AZFinText system would improve the stock direction prediction accuracy. The results showed that incorporating sentiment analysis into the AZFinText system did not improve the overall prediction accuracy.
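The higher-dimensional mapping idea behind SVM kernels can be shown with a toy example. The data below are invented for illustration: in one dimension, no single threshold separates the two classes, but after the simple feature map φ(x) = (x, x²), a linear threshold on the x² coordinate separates them perfectly. This is only a sketch of the lifting principle, not an SVM implementation:

```python
import numpy as np

# 1-D data that no single threshold can separate: class 1 lies between class 0 points
x = np.array([-2.0, -1.5, 1.5, 2.0, -0.5, 0.0, 0.5])
y = np.array([0, 0, 0, 0, 1, 1, 1])

# Kernel-style feature map phi(x) = (x, x^2): in this 2-D space the classes
# become linearly separable (class 1 has small x^2, class 0 has large x^2)
phi = np.column_stack([x, x**2])

# A simple separating rule in the lifted space: a linear threshold on x^2
threshold = 1.0
pred = (phi[:, 1] < threshold).astype(int)
print((pred == y).all())  # the lifted data is perfectly separated
```

A kernelized SVM never forms φ(x) explicitly; it evaluates inner products in the lifted space through the kernel function, which is what makes the mapping computationally feasible.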
Another widely employed method for stock-price prediction involves utilizing the auto-regressive integrated moving average (ARIMA) model, which is also commonly known as the Box-Jenkins model in finance. ARIMA is specifically designed for forecasting time-series data [136]. Functioning as a generalized random walk model, ARIMA is finely tuned to eliminate residual autocorrelation, a statistical measure of the correlation between a variable and its past values. As a generalized exponential smoothing model, ARIMA can incorporate long-term trends and seasonality into its predictions [137]. In recent stock-price prediction research, ARIMA has frequently been integrated into other ML algorithms or used as a benchmark for comparison. An early example is the work of Pai and Lin [66], who developed and tested a hybrid ARIMA and SVM model. They recognized ARIMA's declining popularity and demonstrated its utility in enhancing ML models. The results revealed that the proposed hybrid ARIMA-SVM model outperformed both the standalone ARIMA and SVM models. A subsequent study conducted by Adebiyi et al. [82] compared ARIMA with a three-layer NN and found that the NN consistently outperformed ARIMA in most cases. The graphical representation of their ARIMA predictions indicated a linear pattern, thereby emphasizing its limitation in providing value-based forecasting. Similarly, Chong et al. [109] discovered that NNs significantly outperformed the benchmark autoregressive model in stock-price prediction.
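A stripped-down illustration of the ARIMA idea is an ARIMA(1,1,0)-style model: difference the price series to remove the trend (the "I" part), then fit an autoregression on the differences by ordinary least squares. This is a didactic sketch on an invented toy series; production work would use a dedicated library such as statsmodels:

```python
import numpy as np

def fit_ar1_on_differences(prices):
    """Fit d[t] = c + phi * d[t-1] on first-differenced prices (ARIMA(1,1,0) idea)."""
    d = np.diff(prices)                       # differencing removes the trend
    X = np.column_stack([np.ones(len(d) - 1), d[:-1]])
    c, phi = np.linalg.lstsq(X, d[1:], rcond=None)[0]
    return c, phi

def forecast_next(prices, c, phi):
    """One-step forecast: last price plus the predicted next difference."""
    return prices[-1] + c + phi * (prices[-1] - prices[-2])

# Toy price series with a drift of +1 per step plus small noise
rng = np.random.default_rng(0)
prices = np.cumsum(1.0 + 0.1 * rng.standard_normal(200)) + 100
c, phi = fit_ar1_on_differences(prices)
print(round(forecast_next(prices, c, phi), 2))  # roughly prices[-1] + 1
```

On this series the fitted intercept lands near the drift of 1 and phi near 0, so the forecast is close to the last price plus the drift, which is exactly the "generalized random walk" behavior described above.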
Sun et al. [80] utilized data on stock movements to analyze whether trading behavior could be mapped and used to predict stock prices. For each individual stock, they first analyzed stock trading activities and mapped a network. Then, they classified the trading relationships and grouped them into appropriate categories. Next, they employed Granger causality analysis to prove that the stock prices were indeed correlated with the different trading categories. Moreover, they used a simple three-layer feed-forward NN to test the trading predictability power. The NN incorporated technical indicators as well as trading indicators. The results revealed that the NN performed well overall. Lastly, it is worth emphasizing that this positive result can be considered relatively intuitive, as it is generally well-known that the activities of one group of traders influence another.
Furthermore, Geva and Zahavi [87] investigated whether market data, simple news item counts, business events, and sentiment scores could improve various ML algorithms in stock-price prediction. The authors considered NNs, decision trees, and basic regression. The results demonstrated that, among the algorithms tested, only the NN could fully exploit the more intricate nature of the proposed sentiment/news inputs. The other models could not take advantage of these inputs, given the complicated relationship between price and sentiment/news indicators. Zhuge et al. [111] utilized Shanghai Composite Index data and emotional data. Emotional data, in this case, involved sentiment analysis from the news and microblogs that were related to a specific company. The authors demonstrated that 15 input variables, comprising sentiment and technical indicators, could successfully predict a Chinese company's stock opening prices.
Kooli et al. [117] proposed a simple NN to examine whether the inclusion of accounting variables (generated from the release of accounting disclosures) improved the prediction accuracy of the NNs. The results showed that combining 48,204 daily stock closing prices of 39 companies with the respective accounting disclosure variables improved the NNs' prediction quality. However, this level of improvement drastically dropped when the NN predicted prices in 2011, a time of civil unrest in Tunisia. This extreme example is noteworthy, as it portrays how an observed variable (i.e., one that was able to consistently improve the model accuracy) could lose its impact when emotional events occurred.
Vanstone et al. [122] investigated whether the prediction of the price of 20 Australian stocks by a neural network autoregressive (NNAR) model could be improved with the inclusion of inputs in the form of counts of both news articles and tweets. The sentiment-based indicators used in this study were generated by Bloomberg. These types of sentiment-based indicators are increasingly becoming available. Additionally, due to the overall improvement in data-mining techniques, these indicators should theoretically be more reliable than ever. Their study found that the NNAR that incorporated the Bloomberg-generated news and Twitter-based sentiment indicators produced higher-quality stock-price predictions. Because the indicators were created and made readily available by Bloomberg, the authors did not have to employ any text/data-mining models. Consequently, incorporating the news and Twitter indicators into NNs was as easy as incorporating any other technical indicator.
As shown in Table 1, LSTM stands out as one of the most popular NN techniques in the price prediction literature. This technique, popularized by Olah's widely read 2015 exposition [139], has since been utilized in numerous studies. For example, Jin et al. [19] introduced an LSTM-based model that incorporated sentiment analysis and employed empirical modal decomposition to break down stock-price sequences. The authors' approach enhanced prediction accuracy by leveraging LSTM's capacity to analyze relationships among time-series data through its memory function. Furthermore, Lu et al. [21] introduced the CNN-BiLSTM-AM method, which combined convolutional neural networks, bidirectional long short-term memory, and an attention mechanism. The authors' model aimed to predict the following-day stock closing prices by extracting features using CNN, using BiLSTM for prediction, and employing an attention mechanism to capture feature influences at different times. A comparative analysis against seven other methods for predicting stock closing prices on the Shanghai Composite Index revealed the superior performance of the CNN-BiLSTM-AM method in terms of MAE and RMSE.
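The memory function that these studies exploit can be sketched as a single LSTM cell step in plain NumPy. This is a didactic sketch with small random weights and an invented toy price sequence, not any cited model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step: gates decide what the cell state keeps, adds, and emits."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b            # stacked pre-activations for all four gates
    f = sigmoid(z[:n])                     # forget gate
    i = sigmoid(z[n:2 * n])                # input gate
    g = np.tanh(z[2 * n:3 * n])            # candidate cell update
    o = sigmoid(z[3 * n:])                 # output gate
    c = f * c_prev + i * g                 # new cell state (the long-term memory)
    h = o * np.tanh(c)                     # new hidden state (the short-term output)
    return h, c

# Run a short toy price sequence through the cell with random weights
rng = np.random.default_rng(1)
n_hidden, n_in = 4, 1
W = 0.1 * rng.standard_normal((4 * n_hidden, n_in))
U = 0.1 * rng.standard_normal((4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for price in [101.0, 102.5, 101.8]:
    h, c = lstm_step(np.array([price / 100.0]), h, c, W, U, b)
print(h.shape)  # (4,)
```

The cell state `c` is the mechanism that lets the network retain information across many time steps, which is why LSTMs suit the time-series relationships emphasized in the studies above.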
Vijh et al. [20] employed artificial neural network and random-forest techniques to predict next-day closing prices for stocks across various sectors. By utilizing financial data, such as open, high, low, and close prices, they created new variables as inputs to the model. The evaluation based on the standard indicators RMSE and MAPE highlighted the efficiency of their models for predicting stock closing prices. Wu et al. [60] explored LSTM for stock-price prediction by introducing the S_I_LSTM method, which incorporates multiple data sources and investor sentiment. The authors' approach leveraged sentiment analysis based on convolutional NNs for calculating investors' sentiment index and combined this with technical indicators and historical transaction data as features for LSTM prediction. The results indicated that the predicted stock closing prices aligned more closely with the actual closing prices compared to the traditional LSTM methods. Lastly, Kurani et al. [23] presented a comprehensive study on the use of artificial neural networks (ANN) and support vector machines (SVM) for stock forecasting to provide further insights into the application of machine-learning techniques in the financial domain.
Khan et al. [127] applied algorithms to social media and financial news data to assess their impact on stock-market prediction accuracy over a span of ten subsequent days. They conducted feature selection and minimized spam tweets in the datasets to enhance prediction quality. Furthermore, the study involved experiments to identify stock markets that were challenging to predict and those heavily influenced by social media and financial news. The researchers also compared the outcomes of various algorithms to determine a reliable classifier. The results recommended random forest for stock-trend prediction due to its consistent results in all the cases. Finally, deep-learning techniques were employed, and classifiers were combined to maximize prediction accuracy. The experimental findings revealed the highest prediction accuracies of 80.53% and 75.16% using social media and financial news data, respectively. Shaban et al. [128] introduced a new system based on deep learning to predict the stock price. They combined LSTM and a bidirectional gated recurrent unit (BiGRU) to predict the closing price of the stock market. Then, they applied the proposed method to some stocks and predicted their close price 10 and 30 min before the actual time. Liu et al. [44] conducted another study that considered news in market price prediction. They developed a model based on TrellisNet and a sentiment attention mechanism (SA-TrellisNet) to predict stock-market prices. They integrated the LSTM and CNN models for sentiment analysis while employing a sentiment attention mechanism to allocate weights and a trellis network for stock prediction. The hybrid model includes three components: sentiment analysis, the sentiment attention mechanism, and the prediction model. Finally, they compared the proposed model with general methods to demonstrate its performance.
Recently, the substantial impact of cryptocurrencies on the global financial markets has led to an increasing number of price prediction studies in academic research. Ammer and Aldhyani [45] proposed an LSTM algorithm to forecast the values of four types of cryptocurrencies: AMP, Ethereum, Electro-Optical System, and XRP. To overcome the problem of price-fluctuation prediction, they proposed an LSTM that captures the time-dependency aspects of the prices of cryptocurrencies and proposed an embedding network to capture the hidden representations from linked cryptocurrencies. They then employed these two networks in conjunction with each other to predict price. In addition, Belcastro et al. [129] introduced a methodology aimed at optimizing cryptocurrency trading decisions to enhance profit margins. Their approach integrates various statistical, text analytics, and deep-learning methodologies to support a recommendation trading algorithm. Notably, the study leverages supplementary data points, such as the correlation between social media activity and price movements, causal relationships within price trends, and the sentiment analysis of cryptocurrency-related social media, to generate both buy and sell signals. Finally, the researchers conducted numerous experiments utilizing historical data to evaluate the efficacy of the trading algorithm, achieving an average gain of 194% without factoring in transaction fees and 117% when accounting for fees. Lastly, Al-Nefaie et al. [130] employed AI algorithms, including the gated recurrent unit (GRU) and MLP, for forecasting Bitcoin prices. They evaluated their models using various metrics, such as mean square error (MSE), root mean square error (RMSE), Pearson correlation (R), and R-squared (R2), to assess performance. Their findings indicated that the MLP method outperformed the GRU approach. Given these studies, the primary contributions of the current study to the extant literature are as follows.
• We are the first to use Bloomberg Twitter and news publication count variables as critical inputs for stock-price prediction within the North American context;
• We use a novel approach in employing Twitter and news publication count variables as inputs into multi-layer perceptron (MLP) and long short-term memory (LSTM) NNs. This novel approach seeks to assess the influence of these variables on various NN architectures, allowing us to concurrently evaluate and contrast the stock-price prediction performance of both models;
• We focus on examining the existence of a potential notable decline in model performance during periods rife with market panic (e.g., the COVID-19 pandemic). Therefore, we seek to provide insights into the robustness of the proposed models under stressful conditions in financial markets.

Problem Definition and Formulation
Given the significant influence of social media and news on investor behavior in the stock market [140,141], we employ machine-learning (ML) techniques to leverage such information to predict stock prices in the North American context. Predicting stock prices accurately, especially during periods of heightened market uncertainty, presents a unique challenge for existing prediction models. Notably, stock-price volatility fluctuates substantially during economic downturns [142], which, in turn, adds complexity to accurate forecasting. Trade volume has been shown to serve as a more effective predictor of panic-induced volatility than the traditional inputs commonly used in the literature (e.g., [143]). Furthermore, some studies (e.g., [32,144,145]) revealed that investors' reactions to news differ significantly during times of market panic. Therefore, beyond the development of an ML-based prediction model, our study investigates the impact of public panic on price prediction. Hence, we seek to compare the predictive efficacy of the developed model in periods of public panic with its performance in more stable market conditions. This dual focus aims to provide an understanding of the model's reliability under varying market conditions, which enhances its practical applicability in real-world scenarios.
Multiple-layer perceptron (MLP) and long short-term memory (LSTM) networks have emerged as the most commonly applied techniques in stock-price prediction. Consequently, we adopt these two prevalent network types for formulating an advanced stock-price prediction model. To this end, several crucial design decisions on NN configuration must be made. These decisions encompass determining the optimal number of layers, nodes, and training epochs and selecting the appropriate cost functions and optimizers. It is worth noting that, within the extensive body of research utilizing NNs as the primary algorithm, there is a notable absence of standardization concerning their design. Moreover, the predominant focus in NN design research lies in hyper-parameter optimization through algorithms rather than in establishing empirical standards. This lack of proven baselines hinders researchers from effectively comparing and advancing various NN architectures [148].

Multiple-Layer Perceptron (MLP) Network
A multiple-layer perceptron (MLP) is a specific feed-forward NN designed to analyze non-linear data effectively. It comprises multiple layers of perceptrons, each functioning as an algorithm for supervised learning. The MLP's capacity to handle chaotic and non-linear data is enhanced by increasing the number of perceptrons. A concise representation of the input variable processing within an MLP is depicted through the following mathematical expression:

h = Φ( ∑_{k=1}^{K_{n−1}} w_k X_k + b ),

where X_k denotes the value of the kth variable input into the perceptron, w_k represents the corresponding weight, b is the bias, and Φ is the activation function, such as the sigmoid and tanh functions ([23,49]). Furthermore, K_n denotes the number of neurons in the nth layer. These layers fall into three categories: (i) input, (ii) output, and (iii) hidden layers. Finally, "o" represents values associated with the output neurons, and "h" represents values associated with the hidden neurons. While it has been established that a single hidden layer in a neural network can approximate any univariate function [149], stock-price prediction inherently involves multi-variate complexities. Given the successful approximation of multi-variate functions with just two hidden layers in a simple feed-forward network [150,151], we adopt an NN configuration with two hidden layers for the MLP network.
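The layer computation above can be sketched in a few lines of Python. This is a minimal illustration of the feed-forward pass, not the paper's implementation: the weights are random, and the tanh activation and layer sizes (six inputs, two hidden layers, one output) simply mirror the configuration described in this section.

```python
import numpy as np

def mlp_forward(x, layers, phi=np.tanh):
    """Forward pass through an MLP: each layer computes phi(W @ h + b)."""
    h = x
    for W, b in layers:
        h = phi(W @ h + b)
    return h

rng = np.random.default_rng(0)
# Illustrative configuration: 6 inputs, two hidden layers, 1 output neuron
sizes = [6, 13, 13, 1]
layers = [(rng.normal(size=(n_out, n_in)) * 0.1, np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
y = mlp_forward(rng.normal(size=6), layers)
print(y.shape)  # (1,) -- a single predicted value
```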
Although various cost functions have been explored for the MLP network in the literature, we specifically opt for the mean squared error (MSE). The effectiveness of MSE in optimizing NNs has previously been demonstrated [152], showcasing its ability to handle an extensive magnitude of training data [153]. Additionally, we employ the Adam (adaptive moment estimation) optimization algorithm [154], which has been tested by researchers at OpenAI and Google DeepMind [154,155]. Adam has been found to be successful in handling non-stationary data while also being able to handle both sparse and noisy gradients.
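The two optimization ingredients named above reduce to compact formulas. The sketch below shows the MSE cost and a single Adam update step; the hyper-parameter defaults follow common practice and are our assumptions, not values reported by the paper.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error cost function."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

def adam_step(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: bias-corrected estimates of the gradient's first and
    second moments scale each parameter's step individually."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction for the mean
    v_hat = v / (1 - beta2 ** t)   # bias correction for the variance
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

print(mse([1.0, 2.0], [1.0, 4.0]))  # (0 + 4) / 2 = 2.0

theta, state = np.array([1.0]), (0.0, 0.0, 0)
theta, state = adam_step(theta, grad=np.array([2.5]), state=state, lr=0.01)
# On the first step the update size is ~lr, regardless of gradient magnitude --
# this normalization is what makes Adam robust to sparse and noisy gradients.
```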
The number of hidden nodes is another parameter that must be determined for NN configuration. Unlike the number of epochs, which requires iterative testing, a generally agreed-upon formula can guide the establishment of a testing range for the optimal number of hidden nodes. This formula is presented as follows:

N_h = N_s / (α (N_i + N_o)),

where N_h denotes the number of hidden nodes, N_i denotes the number of input neurons, and N_o and N_s denote the number of output neurons and samples in the training dataset, respectively. Finally, α is an arbitrary scaling factor, usually between 2 and 10. With a training dataset comprising 1000 samples, an input layer of six neurons, and an output layer of one neuron, we have opted for 13 hidden neurons in the MLP. Another critical aspect of the design process involves determining the number of epochs. Since the optimal number of epochs should be established on a case-by-case basis, we have conducted iterative testing to identify the point at which the loss function ceases to decrease. It is worth noting that, despite the iterative approach, we have set a maximum limit of 5000 epochs for the MLP for the sake of simplicity.
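The rule of thumb above can be evaluated directly. With the paper's setting (1000 samples, six inputs, one output), sweeping the scaling factor α produces a testing range; the 13 neurons the authors settled on sits near the conservative (α = 10) end of that range.

```python
def hidden_nodes(n_samples, n_inputs, n_outputs, alpha):
    """Rule-of-thumb bound on hidden nodes: N_h = N_s / (alpha * (N_i + N_o))."""
    return n_samples / (alpha * (n_inputs + n_outputs))

# Testing range implied by alpha in [2, 10] for 1000 samples, 6 inputs, 1 output
for alpha in (2, 5, 10):
    print(alpha, round(hidden_nodes(1000, 6, 1, alpha)))  # 71, 29, 14 nodes
```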

Long Short-Term Memory (LSTM) Networks
LSTM networks represent a prominent class of NNs extensively employed in stock-price prediction research. LSTMs, categorized as recurrent NNs, distinguish themselves from their feed-forward counterparts by incorporating the previous output as an input for the subsequent timestamp. Unlike feed-forward NNs such as the MLP, which treat the first and 1000th inputs identically, recurrent networks process data sequentially. This characteristic makes LSTMs particularly suitable for capturing the temporal dependencies that MLPs cannot. Despite the resemblance of the basic linear transformations within an LSTM to those in an MLP, the pivotal feature contributing to the widespread adoption of LSTMs is the integration of gates and states, which fundamentally alters the nature of the NN.
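The gates and states mentioned above can be made concrete with a single-time-step sketch. The dimensions and random weights below are illustrative only, not the paper's trained network; the equations are the standard LSTM cell.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step: forget/input/output gates modulate the cell state."""
    W, b = params                      # W: (4H, H + D), b: (4H,)
    z = W @ np.concatenate([h_prev, x]) + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])                # forget gate: how much memory to keep
    i = sigmoid(z[H:2*H])              # input gate: how much new info to write
    o = sigmoid(z[2*H:3*H])            # output gate: how much state to expose
    g = np.tanh(z[3*H:4*H])            # candidate cell state
    c = f * c_prev + i * g             # new cell state (the "memory")
    h = o * np.tanh(c)                 # new hidden state / output
    return h, c

rng = np.random.default_rng(1)
D, H = 6, 4                            # illustrative input and hidden sizes
params = (rng.normal(size=(4 * H, H + D)) * 0.1, np.zeros(4 * H))
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):      # feed a short sequence step by step
    h, c = lstm_step(x, h, c, params)
print(h.shape)  # (4,) -- the state carried forward between timestamps
```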
To construct the LSTM network, similar to the MLP, we specify two hidden layers with the objective of minimizing the mean squared error (MSE) as the cost function. Additionally, based on the demonstrated effectiveness in prior studies [156,157], we have chosen to combine the LSTM with the Adam optimization algorithm. Moreover, considering the network's complexity, we set the number of hidden neurons and the maximum number of epochs to 60 and 1000, respectively, when the training sample size is 1000.

Model Construction
The current study aims to examine the specific impact of Twitter and news count variables on stock-price prediction, with a primary emphasis on the North American context. To this end, as shown in Figure 1, we first selected a group of stocks for investigation. A subset of the chosen stocks is collectively known as "FAANG", an acronym representing Facebook Inc. (FB), Apple Inc. (AAPL), Amazon.com Inc. (AMZN), Netflix Inc. (NFLX), and Alphabet Inc. (GOOG). Coined by former Goldman Sachs fund manager Jim Cramer [158], FAANG stocks are particularly significant for North American investors and the overall stock market and are publicly traded on the NASDAQ. As of July 2020, these five companies boasted a combined market capitalization of USD 4.1 trillion, constituting 16.64% of the total S&P 500 market capitalization. The S&P 500, an index comprising 500 companies, historically represents 70% to 80% of the total US stock-market capitalization. The substantial contribution of FAANG stocks to the S&P 500 highlights their broader importance in shaping the North American stock market. The movements of FAANG stocks directly influence North American investors' perceptions of the overall market and thereby impact trading decisions.

Input Parameters Selection
To establish the input parameters for our analysis, we reviewed the parameters commonly explored in other studies. As depicted in Table 2, the most frequently examined inputs include open price, high price, low price, close price, moving average, and trade volume. Among these, the four most prevalent are open, high, low, and close. Consequently, the moving average (representing the price averaged over a specified number of periods) and trade volume (indicating the number of trades executed in a day) are incorporated into only half of the models. In contrast, the remaining half incorporates the four most common inputs alongside Bloomberg-generated Twitter and news count data. Several reasons justify the exclusion of the least utilized variables. Primarily, this choice facilitates a more direct comparison of the individual impact of these variables on enhancing stock-price prediction. Likewise, since the total number of variables remains constant, any observed performance improvement cannot be attributed to an increase in data volume. Additionally, in the context of NNs, adjustments to design decisions, such as the number of neurons, are influenced by changes in the input variable count. Hence, for a methodologically sound comparison, we maintain consistency in the NN's design, irrespective of the variable set in use. It is worth noting that all variables are sourced from Bloomberg and are formatted in a comma-separated value structure. The dataset we used spanned from January 2015 to May 2020. We chose this time period to obtain more comprehensive coverage, that is, to include both normal market conditions (i.e., pre-COVID-19) and market-panic conditions (i.e., the few months following the COVID-19 outbreak). The definitions for each variable can be found in Table 3. The choice of Walmart as a focal point stems from the unique characteristics of the Bloomberg Twitter and news count variables. These variables gauge the frequency of mentions a specific company receives on Twitter or on digital news platforms. Given the recent surge in the popularity of digital and social media, particularly amongst a younger demographic, there is potential variation in the significance of the Twitter and news count variables for companies adhering to traditional business models versus those embracing non-traditional models. To investigate this distinction, we opted to conduct a comparative analysis between Walmart and Amazon. Our aim was to discern any disparities in how the Twitter and news count variables manifested for a conventional brick-and-mortar retailer like Walmart versus a more contemporary retailer like Amazon. Similarly, the selection of Ford and Tesla was motivated by the desire to contrast a traditional automobile manufacturer with a purely electric one. Ford adheres to the traditional car dealership model, while Tesla's showrooms are commonly situated in malls and all transactions occur online [159,160]. Exploring the potential impact of this dichotomy in business models on the Twitter and news count variables, as well as on their effectiveness as inputs in stock-price prediction models, adds an intriguing dimension to our analysis.

Open Price: The dollar value of the first trade after the market opened.
High Price: The highest dollar-value trade of the day.
Low Price: The lowest dollar-value trade of the day.
Close Price: The dollar value of the last trade before the market closed.
30-Day Moving Average: The average dollar value of one share over the last 30 days.
Trade Volume: The total quantity of shares traded during the day.
Twitter Count (TC): The difference between the number of tweets expressing positive sentiment and those expressing negative sentiment towards the parent company over a 24-hour period.
News Publication Count (NC): The total number of news publications mentioning the parent company over a 24-hour period.
As mentioned, the data utilized in our study were extracted from Bloomberg, a reputable financial data provider widely used in academic and industry research. To ensure accuracy and reliability, we accessed Bloomberg's database and retrieved the required information using its data-export functionality. Bloomberg's database is renowned for its accuracy, timeliness, and depth of coverage, making it a preferred choice for researchers and practitioners in the financial industry. Specifically, we employed Bloomberg's Excel API to extract the data directly into an Excel format. This API allowed us to access various financial data, including stocks' daily open, high, low, and close prices from 1 January 2015 to 31 May 2020. It also provided the trading volume, the 30-day moving average, the numbers of tweets with positive, negative, and neutral sentiment, and the news publication count. To capture the sentiment of tweets, we take the difference between the numbers of positive- and negative-sentiment tweets as the Twitter count (TC). However, we do not incorporate sentiment into the news publication count (NC) parameter and instead consider the total number of news publications mentioning the parent company over a 24-hour period. The dataset includes 1412 data items for each stock, and the attributes were selected based on their potential significance for analyzing stock-market dynamics and sentiment, as well as their availability within the Bloomberg database. The daily average numbers of tweets and news publications for the selected companies are as follows: Apple (4952 tweets, 2506 news), Amazon (3009 tweets, 757 news), Facebook (3912 tweets, 701 news), Netflix (1556 tweets, 637 news), Google (3930 tweets, 1196 news), Walmart (743 tweets, 375 news), Tesla (2218 tweets, 594 news), and Ford (192 tweets, 271 news). Furthermore, by utilizing the Excel API, we could integrate the data into our analysis workflow, facilitating further processing and analysis. Additional details regarding the collected data can be found in Table A1 in Appendix A.

Data Splitting, Modeling, and Analysis
Following the collection of daily stock data from Bloomberg, the dataset is divided into two subsets: (i) a technical set (T) and (ii) a technical-plus set incorporating TC and NC (T+). To address skewness, a log transformation is applied to the TC and NC. After this transformation, all TC and NC variables exhibit an acceptable level of skewness. Min-max normalization is then applied to all the remaining variables. Subsequently, the dataset is further categorized into three sets: (i) a training set, (ii) a normal test set, and (iii) a panic test set. The training set encompasses daily variables from January 2015 to December 2018, the normal test set comprises variables from January 2019 to November 2019, and the panic test set includes variables from December 2019 to May 2020. For each stock, four price prediction models are constructed. Each model undergoes training and execution of the NN five times, and the predictions are averaged to obtain the final model prediction. This approach addresses the stochastic nature of NNs, which can result in slight performance variances. The first two models utilize LSTM, where one model incorporates the open price (OP), high price (HP), low price (LP), close price (CP), trade volume (TV), and 30-day moving average (30MA) as inputs and the other uses the OP, HP, LP, CP, TC, and NC as inputs. The next day's CP is defined as the target variable in these models. The third and fourth models are built using MLP with the respective input sets.
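The preprocessing steps described here can be sketched as follows. Since the TC (positive minus negative tweet counts) can be negative, a plain logarithm does not apply; the signed log1p below is our assumption about a workable variant, as the paper does not spell out the exact transform it used. The toy series are illustrative, not study data.

```python
import math

def signed_log1p(x):
    """Log-style transform that tolerates the negative values TC can take
    (assumption: the paper does not specify its exact log variant)."""
    return math.copysign(math.log1p(abs(x)), x)

def min_max(values):
    """Min-max normalization to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Toy daily series: close prices and (already signed) Twitter counts
close = [100.0, 102.5, 101.0, 98.0, 99.5]
tc = [250, -40, 10, 500, -120]

tc_log = [signed_log1p(v) for v in tc]   # reduce skewness of TC/NC
close_norm = min_max(close)              # normalize the remaining variables

# Chronological split: earlier observations train, later ones test
split = int(len(close) * 0.6)
train, test = close_norm[:split], close_norm[split:]
print(train, test)
```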
Each defined model undergoes testing on both the normal test set and the panic test set. The accuracy measure for model evaluation is the root mean squared error (RMSE), where lower RMSE values indicate better stock-price predictions. RMSE is selected over other error measures because it is expressed in the original unit being measured. This characteristic makes RMSE useful for analyzing the error gap between the expected and predicted values [162]. Moreover, RMSE assigns a higher weight to larger errors than other measures do, making it particularly suitable for domains where significant errors are undesirable [161]. Incorporating the magnitude of error into RMSE is pertinent in stock-price prediction research, as larger prediction errors can potentially lead to greater losses in buy or sell decisions. Recent stock-price prediction research adopted RMSE as the error measure for comparative analysis of different companies' stock-price predictions [163].
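The RMSE used for evaluation reduces to a few lines; the squaring inside the mean is what gives larger errors their extra weight.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: same unit as the target, penalizes large errors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0 -- perfect prediction
print(rmse([0.0, 0.0], [3.0, 4.0]))            # sqrt((9 + 16) / 2) = 3.5355...
```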

Results and Discussion
In this section, we investigate the potential enhancement of stock-price prediction by including Twitter and news variables and explore whether their influence varies significantly during periods of market panic. In this regard, we employ the constructed models to forecast the selected stock prices and incorporate the identified input parameters. To distinguish the distinct impact of the various parameters that might influence prediction accuracy, we apply both the MLP and LSTM models with a technical set (T) and a technical-plus set (T+) separately. We calculated the root mean squared error (RMSE) as the accuracy index for each model and test-set configuration. The results of the designed experiments are presented in Table A2 of Appendix A.
To evaluate the improvement in stock-price prediction between the T+ and T models' performance, we assessed the impact of the Twitter and news variables by subtracting the former's RMSE from the latter's. Furthermore, we examined whether this impact differed in a panic market by comparing the results obtained in a normal market with those achieved during a panic market. These results were then subjected to a statistical test for model comparison. Additionally, when exploring the impact of input-data type, test-data type, and NN selection, our analysis emphasized group-level analysis rather than a focus on individual performances. Concentrating on the average relative difference between the T+ input data and T input data for the same test set as a group aimed to provide insights independent of specific decisions (e.g., individual stock selection) that may influence stock-price prediction performance. This approach holds the potential for increased practical replicability and utility. Therefore, the impact of the T+ variables is analyzed based on the RMSE difference compared to models utilizing exclusively technical variables.

Comparing Predictive Models under Panic and Normal Circumstances
To investigate the models' performance under panic and normal circumstances, we created two separate test sets, the normal set (nor-set) and the panic set (pan-set), for each stock. The reason for creating two sets for each stock lies in the nature of Bloomberg's TC and NC variables, which capture the number of times the public mentions a company. Therefore, interesting insights can be gained by analyzing the performance of all models in times of increased panic in the stock market. It is a general consensus that, in the first half of 2020, North America experienced widespread panic related to the performance of the stock market and the economy due to the impact of the global COVID-19 pandemic outbreak [164]. The average RMSE for models tested on the pan-set was 0.0670, while the average RMSE for the nor-set was 0.0354. There is thus an apparent decrease in performance when testing models on the pan-set compared to the nor-set. To test the statistical significance of this difference, a Wilcoxon signed-rank test was conducted between the RMSEs of the models tested on these sets. The results showed that this difference in performance is significant, even at the 1% level (p-value ≈ 0.001).
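The Wilcoxon signed-rank statistic behind this comparison can be sketched with the standard library alone (the study presumably used a statistics package; the paired RMSE values below are illustrative, not the study's figures). The statistic is the smaller of the positive- and negative-signed rank sums; the p-value would then be read from the Wilcoxon distribution.

```python
def wilcoxon_statistic(x, y):
    """Wilcoxon signed-rank statistic for paired samples: rank the absolute
    differences (ties receive averaged ranks), then take the smaller of the
    positive- and negative-signed rank sums."""
    d = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):                          # average ranks over ties
        j = i
        while j < len(order) and abs(d[order[j]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j + 1) / 2                      # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    w_pos = sum(r for r, v in zip(ranks, d) if v > 0)
    w_neg = sum(r for r, v in zip(ranks, d) if v < 0)
    return min(w_pos, w_neg)

# Illustrative paired RMSEs: each model tested on the pan-set vs. the nor-set
pan = [0.081, 0.064, 0.090, 0.055, 0.072, 0.060]
nor = [0.040, 0.031, 0.052, 0.033, 0.029, 0.034]
print(wilcoxon_statistic(pan, nor))  # 0: every pan RMSE exceeds its nor pair
```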
To better understand this discrepancy in performance, an analysis of how the variables of interest differed between the test sets can be found in the Appendix. This analysis provides insights into the changes in the mean, standard deviation, and coefficient of variation (CV) of the T+ variables (including the TC, NC, and close price) across the overall dataset, training dataset, normal test set, and panic test set. It is worth noting that the normal test set spans January 2019 to November 2019 (comprising 218 observations), whereas the panic test set includes 151 observations, covering December 2019 to May 2020 (this start date is mainly due to a report issued by the World Health Organization that triggered panic in the US stock market [34]). Additionally, the training data encompass 1043 observations, covering the period from January 2015 to December 2018. By splitting our analysis into these three distinct sections, we give ourselves a starting point from which to theorize what may have caused the average decrease in performance for the tests conducted on the pan-set versus the nor-set.
Table 4 shows a summary of the analysis of two T+ variables, the TC and the NC, and the price differences between the pan-set and the training data. The companies are displayed in descending order by mean RMSE for the pan-set. As shown in Table 4, when comparing the bottom 50% to the top 50% of the list, the CVs for the TC and NC variables exhibit 19% and 27% absolute differences, respectively. In addition, the percentage changes in the mean TC and NC variables show 18% and 19% absolute differences, respectively. The difference in variance between the data the models learn from and the additional period added to the nor-set to make it the pan-set is critical to our analysis. The average percentage change in the mean of the price data, as well as the average CV of the price data, exhibits a similar difference between the panic data and training data for all the stocks being predicted. This discrepancy in variable variance between the training and test sets may be the primary cause of the drop in performance on the pan-set compared to the nor-set.
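The two quantities compared in Table 4 are straightforward to compute; the TC series below are invented for illustration, not taken from the study.

```python
import statistics

def coeff_variation(xs):
    """Coefficient of variation: population standard deviation over the mean."""
    return statistics.pstdev(xs) / statistics.fmean(xs)

def pct_change_in_mean(train, test):
    """Percentage change in the mean between the training and test periods."""
    m_train, m_test = statistics.fmean(train), statistics.fmean(test)
    return (m_test - m_train) / m_train * 100

train_tc = [120, 135, 128, 140, 132]   # illustrative daily TC, training period
panic_tc = [310, 295, 350, 402, 380]   # illustrative daily TC, panic period

print(round(coeff_variation(train_tc), 4))
print(round(pct_change_in_mean(train_tc, panic_tc), 1))
```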

Comparing the Impact of T and T+ Variables
To determine whether using the T+ variables leads to a significant improvement in price prediction, we analyzed the instances in which utilizing the T+ variables, as opposed to the T variables, reduced the RMSE. As shown in Table 5, there are 25 instances where using the T+ variables led to an RMSE reduction. Among the instances without improvement, four are associated with Walmart, two with Tesla, and one with Facebook. Furthermore, among these seven instances, four are associated with the LSTM model and three with the MLP model. It is also worth noting that Tesla had one model that improved by 42% and another that worsened by 89%. In essence, this signifies a clear need for researchers and traders to test across a wide variety of scenarios before deciding to use a variable as an input in a stock-price prediction model. A summary of these impacts is presented in Table 6. As highlighted in Table 6, the most considerable improvement belongs to Ford, which shows a 36% improvement in stock prediction when using the T+ variables. Notably, both the MLP and LSTM configurations for Ford exhibit superior performance on the pan-test set compared to the normal test set. This ranking will guide the analysis of input variables, test data, and NN selection in future assessments. Notably, no clear trends are apparent concerning the magnitude of the mean TC and NC variables and their relationship to the RMSE% improvement. Table 7, however, reveals insights when comparing the average TC to average NC ratios. The top 50% of the list in terms of RMSE% reduction was associated with data featuring an average TC/NC ratio of 1.66. In contrast, the bottom 50% of the list exhibited an almost 57% higher average TC to average NC ratio, reaching 2.92. A potential interpretation of these findings is that neither a high TC nor a high NC alone necessarily strengthens prediction, emphasizing the nuanced interaction of factors in predictive modeling. In general, within ML, models must be trained on data distinct from the test data. It is therefore necessary to thoroughly examine the input variables across the various stages of both the training and testing processes [165]. The objective is to assess the potential impact of the TC and NC variables on enhancing stock-price prediction. To this end, we conducted comparisons between these variables within the training data and across both test sets. The rationale behind this comparative analysis is to ascertain whether a pre-emptive examination of the input data (prior to actual testing) could reveal the possibility of improving the accuracy of stock-price prediction by incorporating the T+ variables into the model. Table 8 provides an analysis of the change in the CV between the training and the test sets. The CV for the variables exhibits relative stability across both the training and the test data for all models. Although the CV is always non-negative, the change percentage can be positive or negative.
Table 9 presents an analysis of the mean T+ variables across the training data, normal test set (nor-set), and panic test set (pan-set). Notably, there is a consistent improvement in RMSE across all eight tests with the inclusion of the T+ variables. Among the top 50% of performers, there is an average 25% reduction in the mean TC between the training and test sets, suggesting that, on average, the TC is 25% lower in both test sets than in the training set. Conversely, the bottom 50% of performers exhibit an almost 60% decrease in the mean TC between the training data and the average of both test sets. The difference in the mean NC between the training data and test data for the top 50% of models indicates a 40% decrease. In contrast, the bottom 50% of companies (based on the RMSE percentage improvement due to the addition of the T+ variables) show an average 25% increase in the mean NC between the training and testing phases.

Comparing the MLP and LSTM Models
To thoroughly analyze the MLP and LSTM models' performance, we evaluate the mean and variance of the RMSE across various groupings for each model. The corresponding outcomes are presented in Table 10. Table 10 illustrates that the MLP consistently outperformed the LSTM across all subsets, showcasing lower RMSE values. Additionally, the MLP models demonstrated a lower CV, averaging 0.1693. Despite the LSTM being more complex than the MLP, it failed to surpass the MLP in price prediction accuracy. This finding is in line with the literature, such as the work of Hiransha et al. [37], who also found that their LSTM model did not outperform their MLP model for stock-price prediction over a 400-day period. However, when tested over a 10-year period, the LSTM exhibited improved accuracy and outperformed the MLP model; this was mainly attributed to its memory feature. In other words, the memory feature is a key advantage of LSTMs that becomes more beneficial in larger test sets. The intuition behind this phenomenon lies in the increased prediction period, which allows more opportunities for memory utilization. Consequently, the enhanced complexity of the LSTM proves advantageous over longer prediction periods; the same is not true in shorter testing periods. In leveraging the MLP and LSTM techniques, we examined the factors influencing stock-price prediction accuracy, particularly non-technical variables such as Twitter and news-related data. In other words, we investigated how incorporating these variables impacts prediction performance in normal and panic market conditions. By calculating the RMSE, a widely used metric in predictive modeling, we quantitatively assessed the accuracy of these predictions. By comparing the RMSEs of models utilizing only technical variables (T) with those incorporating additional Twitter and news variables (T+), we analyzed how these supplementary factors contribute to improved prediction accuracy. In addition, we conducted several statistical tests to validate the observed differences in the MLP and LSTM models' performance. Despite the LSTM's reputation for handling sequential data and its inherent complexity, the study finds that the MLP consistently outperforms the LSTM across the various subsets. One plausible explanation for the MLP's superiority lies in the volume of data: while the memory feature of LSTMs becomes more advantageous in larger test sets due to the increased prediction periods [166,167], it may not confer the same benefit in smaller datasets. This result highlights the nuanced relationship between model complexity and dataset size in predictive modeling.

Conclusions and Future Research
This study investigated the influence of Twitter count (TC) and news count (NC) variables on stock-price prediction under both normal and market-panic conditions. We incorporated Bloomberg Twitter and news publication count variables into MLP and LSTM neural networks to assess their predictive influence. Additionally, we analyzed these effects during the market panic to evaluate their stability. Our methodology integrates MLP and LSTM neural networks with technical variables (T variables) and with the TC and NC (creating the T+ variables) for price prediction. The models were trained on data from January 2015 to December 2018 and tested on normal (January 2019 to November 2019) and panic (December 2019 to May 2020) periods. We applied statistical analyses to the results, which revealed a notable enhancement in stock-price prediction accuracy across various model types when these additional variables were incorporated. Furthermore, the comparison between the T and T+ variables indicated that both traders and researchers could derive substantial benefits from including the TC and NC variables as inputs in neural network-based stock-price prediction models. This integration not only enhanced prediction accuracy and provided significant value to traders and investors, but it also facilitated the seamless incorporation of public opinion into prediction models. Given the escalating impact of social media on societal perspectives, the inclusion of the TC and NC variables allows traders to consider the public's perception of corporations and products in their analyses. This strategic utilization of social media data empowers traders to make more informed decisions, reflecting a nuanced understanding of market sentiment and public opinion.
While the proposed models aimed to analyze the impact of news and tweets on stock-price prediction accuracy, it is important to acknowledge certain limitations for further research. First, this study focused primarily on the COVID-19 pandemic as a period of market distress. The impact identified in this case might not capture the full range of possible market behaviors under different types of panic conditions, crises, financial crashes, or geopolitical events. In addition, this study used LSTM and MLP models, which, while established, may not represent the cutting edge in predictive modeling. More advanced techniques, such as hybrid approaches, might provide better performance or additional insights. Furthermore, the study used sentiment analysis tools to quantify investor sentiment from news and tweets. The accuracy of these tools can vary, and errors in sentiment classification could affect the overall findings. Finally, the impact of news and tweets on stock prices might vary across regions and industry sectors. The study does not explicitly address whether the findings are consistent across various markets and industries or whether specific segments primarily drive the results.
There are several avenues for future studies in this area. The current configuration of the TC and NC variables in T+ entails countable measures and shares similarities with other variables. In contrast to more generalized Twitter- and news-based indicators, Bloomberg's Twitter count and news count variables employ clearly defined, replicable terms linked to objective measures. Moreover, certain data providers, such as Yahoo Finance, may yield identical values for TC and NC, which can likewise be integrated into price-prediction models. However, further analysis is needed to determine why these variables demonstrate efficacy in the majority of scenarios but exhibit sub-optimal performance in specific instances; exploring these cases and identifying their underlying sources of impact may be a promising avenue for future research. Another prospective area of investigation is whether the influence of the TC and NC variables correlates with company size. More specifically, future studies are encouraged to explore whether including TC and NC variables equally enhances stock-price prediction accuracy for small-, medium-, and large-sized companies.
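The company-size question raised above lends itself to a simple aggregation. The sketch below is purely illustrative (the tickers, size buckets, and RMSE values are invented for demonstration, not results from the study); it shows how per-stock RMSE improvements from T+ could be averaged by market-cap bucket:

```python
import pandas as pd

# Hypothetical per-stock results (illustrative values only): RMSE with the
# T variables and with the T+ variables, plus a market-cap size bucket.
results = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
    "size": ["small", "small", "medium", "medium", "large", "large"],
    "rmse_t": [2.10, 1.80, 1.50, 1.40, 0.90, 1.00],
    "rmse_t_plus": [1.90, 1.75, 1.30, 1.35, 0.85, 0.80],
})

# Percentage RMSE improvement from adding TC and NC (positive = T+ better).
results["rmse_improvement_pct"] = (
    (results["rmse_t"] - results["rmse_t_plus"]) / results["rmse_t"] * 100
)

# Average improvement per company-size bucket.
by_size = results.groupby("size")["rmse_improvement_pct"].mean()
print(by_size.round(2))
```

A systematic comparison of this kind across many stocks would directly answer whether the benefit of TC and NC scales with firm size.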
Furthermore, the observed variations in prediction performance between the MLP and LSTM models can be attributed to the test-period length and the extent of data preprocessing, and traders and researchers should be conscious of these factors when selecting stock-price prediction models. To enhance model applicability across scenarios, we recommend standardizing data-preparation techniques to ensure the optimal performance of each model type. Such standardization should be replicable for every use of a given model, thereby promoting consistency in stock-price prediction analyses and advancing research aimed at refining neural networks. Additionally, exploring alternative neural-network architectures beyond MLP and LSTM is advisable, in an attempt to identify simpler models that may deliver superior stock-price prediction performance. Moreover, the current study considered only the quantity of tweets and news items (TC and NC) while neglecting their sentiment and heterogeneous impacts. Future research could assign distinct impact weights to individual tweets or news items for each stock and incorporate their dominant sentiment. Such considerations are promising areas for future research, as they may further refine prediction accuracy in the dynamic landscape of stock-market forecasting.
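One concrete element of the standardized data preparation recommended above is fitting normalization parameters on the training period only, so that no test-period information leaks into preprocessing. A minimal sketch (our own illustration, not the study's exact procedure) of a replicable min-max normalization step:

```python
import numpy as np

def fit_minmax(train: np.ndarray):
    """Fit min-max scaling parameters on the training period only,
    so test-period information never leaks into preprocessing."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi - lo == 0, 1.0, hi - lo)  # guard constant columns
    return lo, span

def apply_minmax(x: np.ndarray, lo: np.ndarray, span: np.ndarray):
    """Apply previously fitted scaling parameters to any data split."""
    return (x - lo) / span

# Example: four features, scaled identically for train and test splits.
rng = np.random.default_rng(1)
train = rng.normal(size=(100, 4))
test = rng.normal(size=(30, 4))
lo, span = fit_minmax(train)
train_scaled = apply_minmax(train, lo, span)
test_scaled = apply_minmax(test, lo, span)  # same parameters reused on test
```

Because the same fitted parameters are reused on every split, the procedure is exactly replicable across model types, which is the consistency property the text calls for.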

Table 1. A summary of the related literature.

Table 2. The indicators that have been investigated in the literature.

Figure 1. The methodology used for analysis. (Pipeline shown in the figure: stock selection; data gathering/extraction; data preparation; input-parameter selection, including feature selection and normalization; data splitting into a train set, a normal test set, and a panic test set; modeling with MLP and LSTM; and analysis and evaluation. Indicator labels: LP, TV, 30MA, BB, TB, MACD.)

Table 3. Definitions of input parameters.

Table 4. T+ variables' mean and coefficient of variation in the panic test set and the training data.

Table 5. The RMSE change when utilizing T vs. T+ variables.

Table 6. The average RMSE% improvement across all configurations.

Table 7. The mean Twitter count to mean news publication count ratio.

Table 8. The variable CV change between the training and test sets.

Table 9. The mean TC and NC in the training data, the normal test set, and the panic test set.

Table 10. The RMSE analysis for T vs. T+ variables and the panic test set vs. the normal test set.

Table A1. Comparing variables in the overall, training, normal-test, panic-test, and panic-period test sets.